Ileana Streinu, Louis Theran

∗ Research of both authors funded by the NSF under grants NSF CCF-0430990 and NSF-DARPA CARGO CCR-0310661 to the first author.

1. Introduction and preliminaries

The focus of this paper is decompositions of (k, ℓ)-sparse graphs into edge-disjoint subgraphs that certify sparsity. We use graph to mean a multigraph, possibly with loops. We say that a graph is (k, ℓ)-sparse if no subset of n′ vertices spans more than kn′ − ℓ edges in the graph; a (k, ℓ)-sparse graph on n vertices with exactly kn − ℓ edges is (k, ℓ)-tight. We call the range k ≤ ℓ ≤ 2k − 1 the upper range of sparse graphs and 0 ≤ ℓ ≤ k the lower range.

In this paper, we present efficient algorithms for finding decompositions that certify sparsity in the upper range of ℓ. Our algorithms also apply in the lower range, which was already addressed by [3, 4, 5, 6, 19]. A decomposition certifies the sparsity of a graph if the sparse graphs and graphs admitting the decomposition coincide.

Our algorithms are based on a new characterization of sparse graphs, which we call the pebble game with colors. The pebble game with colors is a simple graph construction rule that produces a sparse graph along with a sparsity-certifying decomposition.

We define and study a canonical class of pebble game constructions, which correspond to previously studied decompositions of sparse graphs into edge-disjoint trees. Our results provide a unifying framework for all the previously known special cases, including Nash-Williams-Tutte and [7, 24]. Indeed, in the lower range, canonical pebble game constructions capture the properties of the augmenting paths used in matroid union and intersection algorithms [5, 6]. Since the sparse graphs in the upper range are not known to be unions or intersections of the matroids for which there are efficient augmenting path algorithms, these do not easily apply in the upper range. Pebble game with colors constructions may thus be considered a strengthening of augmenting paths to the upper range of matroidal sparse graphs.

1.1. Sparse graphs

A graph is (k, ℓ)-sparse if for any non-empty subgraph with m′ edges and n′ vertices, m′ ≤ kn′ − ℓ. We observe that this condition implies that 0 ≤ ℓ ≤ 2k − 1, and from now on in this paper we will make this assumption. A sparse graph that has n vertices and exactly kn − ℓ edges is called tight.

For a graph G = (V, E), and V′ ⊂ V, we use the notation span(V′) for the number of edges in the subgraph induced by V′. In a directed graph, out(V′) is the number of edges with the tail in V′ and the head in V − V′; for a subgraph induced by V′, we call such an edge an out-edge.

There are two important types of subgraphs of sparse graphs. A block is a tight subgraph of a sparse graph. A component is a maximal block.

Table 1 summarizes the sparse graph terminology used in this paper.

Term                                        Meaning
Sparse graph G                              Every non-empty subgraph on n′ vertices has ≤ kn′ − ℓ edges
Tight graph G                               G = (V, E) is sparse and |V| = n, |E| = kn − ℓ
Block H in G                                G is sparse, and H is a tight subgraph
Component H of G                            G is sparse and H is a maximal block
Map-graph                                   Graph that admits an out-degree-exactly-one orientation
(k, ℓ)-maps-and-trees                       Edge-disjoint union of ℓ trees and (k − ℓ) map-graphs
ℓTk                                         Union of ℓ trees, each vertex is in exactly k of them
Set of tree-pieces of an ℓTk                Pieces of trees in the ℓTk spanned by E(V′)
  induced on V′ ⊂ V
Proper ℓTk                                  Every V′ ⊂ V contains ≥ ℓ pieces of trees from the ℓTk

Table 1. Sparse graph and decomposition terminology used in this paper.

1.2. Sparsity-certifying decompositions

A k-arborescence is a graph that admits a decomposition into k edge-disjoint spanning trees. Figure 1(a) shows an example of a 3-arborescence. The k-arborescent graphs are described by the well-known theorems of Tutte [23] and Nash-Williams [17] as exactly the (k, k)-tight graphs.
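The sparsity counts of Section 1.1 can be checked directly on small examples. The following is a minimal brute-force sketch, exponential in the number of vertices and unrelated to the efficient algorithms of this paper. The function names `is_sparse` and `is_tight` are hypothetical, and the sketch assumes that "non-empty subgraph" means a subgraph with at least one edge, so vertex sets spanning no edges are skipped.

```python
from itertools import combinations


def is_sparse(n, edges, k, l):
    """Brute-force (k, l)-sparsity test for a multigraph on vertices 0..n-1.

    `edges` is a list of (u, v) pairs; loops and parallel edges are allowed.
    Assumption: only vertex sets that span at least one edge are tested.
    """
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            # Number of edges with both endpoints inside the chosen vertex set.
            spanned = sum(1 for u, v in edges if u in chosen and v in chosen)
            if spanned > 0 and spanned > k * size - l:
                return False
    return True


def is_tight(n, edges, k, l):
    """A sparse graph on n vertices is tight when it has exactly kn - l edges."""
    return is_sparse(n, edges, k, l) and len(edges) == k * n - l


# K4 has 6 = 2*4 - 2 edges, so it can be (2,2)-tight but never (2,3)-sparse.
K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(is_tight(4, K4, 2, 2), is_sparse(4, K4, 2, 3))
```

Testing only vertex-induced subgraphs suffices, because removing edges from a subgraph on a fixed vertex set can only make the count easier to satisfy.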
A map-graph is a graph that admits an orientation such that the out-degree of each vertex is exactly one. A k-map-graph is a graph that admits a decomposition into k edge-disjoint map-graphs. Figure 1(b) shows an example of a 2-map-graph; the edges are oriented in one possible configuration certifying that each color forms a map-graph. Map-graphs may be equivalently defined (see, e.g., [18]) as having exactly one cycle per connected component.¹ A (k, ℓ)-maps-and-trees is a graph that admits a decomposition into k − ℓ edge-disjoint map-graphs and ℓ spanning trees.

Another characterization of map-graphs, which we will use extensively in this paper, is as the (1,0)-tight graphs [8, 24]. The k-map-graphs are evidently (k,0)-tight, and [8, 24] show that the converse holds as well.

¹ Our terminology follows Lovász in [16]. In the matroid literature map-graphs are sometimes known as bases of the bicycle matroid or spanning pseudoforests.

Sparsity-certifying Graph Decompositions

Fig. 1. Examples of sparsity-certifying decompositions: (a) a 3-arborescence; (b) a 2-map-graph; (c) a (2,1)-maps-and-trees. Edges with the same line style belong to the same subgraph. The 2-map-graph is shown with a certifying orientation.

An ℓTk is a decomposition into ℓ edge-disjoint (not necessarily spanning) trees such that each vertex is in exactly k of them. Figure 2(a) shows an example of a 3T2.

Given a subgraph G′ of an ℓTk graph G, the set of tree-pieces in G′ is the collection of the components of the trees in G induced by G′ (since G′ is a subgraph, each tree may contribute multiple pieces to the set of tree-pieces in G′). We observe that these tree-pieces may come from the same tree or be single-vertex “empty trees.” It is also helpful to note that the definition of a tree-piece is relative to a specific subgraph. An ℓTk decomposition is proper if the set of tree-pieces in any subgraph G′ has size at least ℓ.
Figure 2(a) shows a graph with a 3T2 decomposition; we note that one of the trees is an isolated vertex in the bottom-right corner. The subgraph in Figure 2(b) has three black tree-pieces and one gray tree-piece: an isolated vertex at the top-right corner, and two single edges. These count as three tree-pieces, even though they come from the same black tree when the whole graph is considered. Figure 2(c) shows another subgraph; in this case there are three gray tree-pieces and one black one.

Fig. 2. (a) A graph with a 3T2 decomposition; one of the three trees is a single vertex in the bottom right corner. (b) The highlighted subgraph inside the dashed contour has three black tree-pieces and one gray tree-piece. (c) The highlighted subgraph inside the dashed contour has three gray tree-pieces (one is a single vertex) and one black tree-piece.

Table 1 contains the decomposition terminology used in this paper.

The decomposition problem. We define the decomposition problem for sparse graphs as taking a graph as its input and producing as output a decomposition that can be used to certify sparsity. In this paper, we will study three kinds of outputs: maps-and-trees; proper ℓTk decompositions; and the pebble-game-with-colors decomposition, which is defined in the next section.

2. Historical background

The well-known theorems of Tutte [23] and Nash-Williams [17] relate the (k, k)-tight graphs to the existence of decompositions into edge-disjoint spanning trees. Taking a matroidal viewpoint, Edmonds [3, 4] gave another proof of this result using matroid unions. The equivalence of maps-and-trees graphs and tight graphs in the lower range is shown using matroid unions in [24], and matroid augmenting paths are the basis of the algorithms for the lower range of [5, 6, 19].
In rigidity theory a foundational theorem of Laman [11] shows that (2,3)-tight (Laman) graphs correspond to generically minimally rigid bar-and-joint frameworks in the plane. Tay [21] proved an analogous result for body-bar frameworks in any dimension using (k, k)-tight graphs. Rigidity by counts motivated interest in the upper range, and Crapo [2] proved the equivalence of Laman graphs and proper 3T2 graphs. Tay [22] used this condition to give a direct proof of Laman’s theorem and generalized the 3T2 condition to all ℓTk for k ≤ ℓ ≤ 2k − 1. Haas [7] studied ℓTk decompositions in detail and proved the equivalence of tight graphs and proper ℓTk graphs for the general upper range. We observe that aside from our new pebble-game-with-colors decomposition, all the combinatorial characterizations of the upper range of sparse graphs, including the counts, have a geometric interpretation [11, 21, 22, 24].

A pebble game algorithm was first proposed in [10] as an elegant alternative to Hendrickson’s Laman graph algorithms [9]. Berg and Jordan [1] provided the formal analysis of the pebble game of [10] and introduced the idea of playing the game on a directed graph. Lee and Streinu [12] generalized the pebble game to the entire range of parameters 0 ≤ ℓ ≤ 2k − 1, and left as an open problem using the pebble game to find sparsity-certifying decompositions.

3. The pebble game with colors

Our pebble game with colors is a set of rules for constructing graphs indexed by nonnegative integers k and ℓ. We will use the pebble game with colors as the basis of an efficient algorithm for the decomposition problem later in this paper. Since the phrase “with colors” is necessary only for comparison to [12], we will omit it in the rest of the paper when the context is clear.

We now present the pebble game with colors. The game is played by a single player on a fixed finite set of vertices.
The player makes a finite sequence of moves; a move consists of the addition and/or orientation of an edge. At any moment of time, the state of the game is captured by a directed graph H, with colored pebbles on vertices and edges. The edges of H are colored by the pebbles on them. While playing the pebble game all edges are directed, and we use the notation vw to indicate a directed edge from v to w.

We describe the pebble game with colors in terms of its initial configuration and the allowed moves.

Initialization: In the beginning of the pebble game, H has n vertices and no edges. We start by placing k pebbles on each vertex of H, one of each color ci, for i = 1, 2, . . . , k.

Add-edge-with-colors: Let v and w be vertices with at least ℓ + 1 pebbles on them in total. Assume (w.l.o.g.) that v has at least one pebble on it. Pick up a pebble from v, add the oriented edge vw to E(H) and put the pebble picked up from v on the new edge. Figure 3(a) shows examples of the add-edge move.

Pebble-slide: Let w be a vertex with a pebble p on it, and let vw be an edge in H. Replace vw with wv in E(H); put the pebble that was on vw on v; and put p on wv. Note that the color of an edge can change with a pebble-slide move. Figure 3(b) shows examples.

Fig. 3. Examples of pebble game with colors moves: (a) add-edge. (b) pebble-slide. Pebbles on vertices are shown as black or gray dots. Edges are colored with the color of the pebble on them.

The convention in these figures, and throughout this paper, is that pebbles on vertices are represented as colored dots, and that edges are shown in the color of the pebble on them.

From the definition of the pebble-slide move, it is easy to see that a particular pebble is always either on the vertex where it started or on an edge that has this vertex as the tail.
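The initialization and the two moves can be sketched as a small state machine. This is an illustrative sketch only, not the implementation analyzed later in the paper. The class name `PebbleGame` and its methods are hypothetical, the colors c1, ..., ck are represented by the integers 1..k, and the add-edge move here picks the lowest available color, which is an arbitrary choice. The method `pebbles_at_home` checks the observation just made: every pebble sits either on its starting vertex or on an edge whose tail is that vertex.

```python
class PebbleGame:
    """Illustrative sketch of the (k, l)-pebble game with colors on vertices 0..n-1."""

    def __init__(self, n, k, l):
        self.n, self.k, self.l = n, k, l
        # Initialization: one pebble of each color 1..k on every vertex.
        self.pebbles = [set(range(1, k + 1)) for _ in range(n)]
        # Each edge is stored as [tail, head, color of the pebble covering it].
        self.edges = []

    def can_add(self, v, w):
        """True when the endpoints hold at least l + 1 pebbles in total."""
        total = len(self.pebbles[v])
        if v != w:
            total += len(self.pebbles[w])
        return total >= self.l + 1

    def add_edge(self, v, w):
        """Add-edge move: cover the new edge with a pebble taken from its tail."""
        if not self.can_add(v, w):
            raise ValueError("fewer than l + 1 pebbles on the endpoints")
        if not self.pebbles[v]:
            v, w = w, v  # w.l.o.g. the tail is an endpoint that holds a pebble
        color = min(self.pebbles[v])  # arbitrary choice of pebble (assumption)
        self.pebbles[v].remove(color)
        self.edges.append([v, w, color])
        return len(self.edges) - 1  # index of the new edge

    def pebble_slide(self, e, color):
        """Pebble-slide move on edge e = vw, using a pebble of `color` on w."""
        v, w, old = self.edges[e]
        if color not in self.pebbles[w]:
            raise ValueError("no pebble of that color on the head of the edge")
        self.pebbles[w].remove(color)
        self.pebbles[v].add(old)       # the pebble that covered vw returns to v
        self.edges[e] = [w, v, color]  # the edge is reversed and may change color

    def pebbles_at_home(self):
        """Each vertex has each color exactly once: on itself or on an edge leaving it."""
        for v in range(self.n):
            for c in range(1, self.k + 1):
                on_edges = sum(1 for t, _, col in self.edges if t == v and col == c)
                if (c in self.pebbles[v]) + on_edges != 1:
                    return False
        return True
```

For example, with k = 2 and ℓ = 3 on a triangle, the edges 01 and 12 can be added directly, but the third edge needs one pebble-slide on the first edge before four pebbles are available on its endpoints.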
When a sequence of pebble-slide moves reverses the orientation of a path in H, however, it is sometimes convenient to think of this path-reversal sequence as bringing a pebble from the end of the path to the beginning.

The output of playing the pebble game is its complete configuration.

Output: At the end of the game, we obtain the directed graph H, along with the location and colors of the pebbles. Observe that since each edge has exactly one pebble on it, the pebble game configuration colors the edges.

We say that the underlying undirected graph G of H is constructed by the (k, ℓ)-pebble game or that H is a pebble-game graph.

Since each edge of H has exactly one pebble on it, the pebble game’s configuration partitions the edges of H, and thus G, into k different colors. We call this decomposition of H a pebble-game-with-colors decomposition. Figure 4(a) shows an example of a (2,2)-tight graph with a pebble-game decomposition.

Let G = (V, E) be a pebble-game graph with the coloring induced by the pebbles on the edges, and let G′ be a subgraph of G. Then the coloring of G induces a set of monochromatic connected subgraphs of G′ (there may be more than one of the same color). Such a monochromatic subgraph is called a map-graph-piece of G′ if it contains a cycle (in G′) and a tree-piece of G′ otherwise. The set of tree-pieces of G′ is the collection of tree-pieces induced by G′. As with the corresponding definition for ℓTk decompositions, the set of tree-pieces is defined relative to a specific subgraph; in particular a tree-piece may be part of a larger cycle that includes edges not spanned by G′.

The properties of pebble-game decompositions are studied in Section 6, and Theorem 2 shows that each color must be (1,0)-sparse. The orientation of the edges in Figure 4(a) shows this.

For example, Figure 4(a) shows a (2,2)-tight graph with one possible pebble-game decomposition. The whole graph contains a gray tree-piece and a black tree-piece that is an isolated vertex. The subgraph in Figure 4(b) has a black tree and a gray tree, with the edges of the black tree coming from a cycle in the larger graph. In Figure 4(c), however, the black cycle does not contribute a tree-piece. All three tree-pieces in this subgraph are single-vertex gray trees.

Fig. 4. A (2,2)-tight graph with one possible pebble-game decomposition. The edges are oriented to show (1,0)-sparsity for each color. (a) The graph K4 with a pebble-game decomposition. There is an empty black tree at the center vertex and a gray spanning tree. (b) The highlighted subgraph has two black trees and a gray tree; the black edges are part of a larger cycle but contribute a tree to the subgraph. (c) The highlighted subgraph (with a light gray background) has three empty gray trees; the black edges contain a cycle and do not contribute a piece of tree to the subgraph.

In the following discussion, we use the notation peb(v) for the number of pebbles on v and pebi(v) to indicate the number of pebbles of color ci on v.

Table 2 lists the pebble game notation used in this paper.

Notation      Meaning
span(V′)      Number of edges spanned in H by V′ ⊂ V; i.e. |EH(V′)|
peb(V′)       Number of pebbles on V′ ⊂ V
out(V′)       Number of edges vw in H with v ∈ V′ and w ∈ V − V′
pebi(v)       Number of pebbles of color ci on v ∈ V
outi(v)       Number of edges vw colored ci for v ∈ V

Table 2. Pebble game notation used in this paper.

4. Our Results

We describe our results in this section. The rest of the paper provides the proofs.

Our first result is a strengthening of the pebble games of [12] to include colors. It says that sparse graphs are exactly pebble-game graphs.
Recall that from now on, all pebble games discussed in this paper are our pebble game with colors unless noted explicitly.

Theorem 1 (Sparse graphs and pebble-game graphs coincide). A graph G is (k, ℓ)-sparse with 0 ≤ ℓ ≤ 2k − 1 if and only if G is a pebble-game graph.

Next we consider pebble-game decompositions, showing that they are a generalization of proper ℓTk decompositions that extend to the entire matroidal range of sparse graphs.

Theorem 2 (The pebble-game-with-colors decomposition). A graph G is a pebble-game graph if and only if it admits a decomposition into k edge-disjoint subgraphs such that each is (1,0)-sparse and every subgraph of G contains at least ℓ tree-pieces of the (1,0)-sparse graphs in the decomposition.

The (1,0)-sparse subgraphs in the statement of Theorem 2 are the colors of the pebbles; thus Theorem 2 gives a characterization of the pebble-game-with-colors decompositions obtained by playing the pebble game defined in the previous section. Notice the similarity between the requirement that the set of tree-pieces have size at least ℓ in Theorem 2 and the definition of a proper ℓTk.

Our next results show that for any pebble-game graph, we can specialize its pebble game construction to generate a decomposition that is a maps-and-trees or proper ℓTk. We call these specialized pebble game constructions canonical, and using canonical pebble game constructions, we obtain new direct proofs of existing arboricity results.

We observe from Theorem 2 that maps-and-trees are special cases of the pebble-game decomposition: both spanning trees and spanning map-graphs are (1,0)-sparse, and each of the spanning trees contributes at least one piece of tree to every subgraph.

The case of proper ℓTk graphs is more subtle; if each color in a pebble-game decomposition is a forest, then we have found a proper ℓTk, but this class is a subset of all possible proper ℓTk decompositions of a tight graph.
We show that this class of proper ℓTk decompositions is sufficient to certify sparsity.

We now state the main theorem for the upper and lower range.

Theorem 3 (Main Theorem (Lower Range): Maps-and-trees coincide with pebble-game graphs). Let 0 ≤ ℓ ≤ k. A graph G is a tight pebble-game graph if and only if G is a (k, ℓ)-maps-and-trees.

Theorem 4 (Main Theorem (Upper Range): Proper ℓTk graphs coincide with pebble-game graphs). Let k ≤ ℓ ≤ 2k − 1. A graph G is a tight pebble-game graph if and only if it is a proper ℓTk with kn − ℓ edges.

As corollaries, we obtain the existing decomposition results for sparse graphs.

Corollary 5 (Nash-Williams [17], Tutte [23], White and Whiteley [24]). Let ℓ ≤ k. A graph G is tight if and only if it has a (k, ℓ)-maps-and-trees decomposition.

Corollary 6 (Crapo [2], Haas [7]). Let k ≤ ℓ ≤ 2k − 1. A graph G is tight if and only if it is a proper ℓTk.

Efficiently finding canonical pebble game constructions. The proofs of Theorem 3 and Theorem 4 lead to an obvious algorithm with O(n³) running time for the decomposition problem. Our last result improves on this, showing that a canonical pebble game construction, and thus a maps-and-trees or proper ℓTk decomposition, can be found using a pebble game algorithm in O(n²) time and space.

These time and space bounds mean that our algorithm can be combined with those of [12] without any change in complexity.

5. Pebble game graphs

In this section we prove Theorem 1, a strengthening of results from [12] to the pebble game with colors. Since many of the relevant properties of the pebble game with colors carry over directly from the pebble games of [12], we refer the reader there for the proofs.

We begin by establishing some invariants that hold during the execution of the pebble game.

Lemma 7 (Pebble game invariants). During the execution of the pebble game, the following invariants are maintained in H:
(I1) There are at least ℓ pebbles on V. [12]
(I2) For each vertex v, span(v) + out(v) + peb(v) = k. [12]
(I3) For each V′ ⊂ V, span(V′) + out(V′) + peb(V′) = kn′. [12]
(I4) For every vertex v ∈ V and every color ci, outi(v) + pebi(v) = 1.
(I5) Every maximal path consisting only of edges with color ci ends in either the first vertex with a pebble of color ci or a cycle.

Proof. (I1), (I2), and (I3) come directly from [12].
(I4) This invariant clearly holds at the initialization phase of the pebble game with colors. That add-edge and pebble-slide moves preserve (I4) is clear from inspection.
(I5) By (I4), a monochromatic path of edges is forced to end only at a vertex with a pebble of the same color on it. If there is no pebble of that color reachable, then the path must eventually visit some vertex twice.

From these invariants, we can show that the pebble game constructible graphs are sparse.

Lemma 8 (Pebble-game graphs are sparse [12]). Let H be a graph constructed with the pebble game. Then H is sparse. If there are exactly ℓ pebbles on V(H), then H is tight.

The main step in proving that every sparse graph is a pebble-game graph is the following. Recall that by bringing a pebble to v we mean reorienting H with pebble-slide moves to reduce the out-degree of v by one.

Lemma 9 (The ℓ + 1 pebble condition [12]). Let vw be an edge such that H + vw is sparse. If peb({v, w}) < ℓ + 1, then a pebble not on {v, w} can be brought to either v or w.

It follows that any sparse graph has a pebble game construction.

Theorem 1 (Sparse graphs and pebble-game graphs coincide). A graph G is (k, ℓ)-sparse with 0 ≤ ℓ ≤ 2k − 1 if and only if G is a pebble-game graph.

6. The pebble-game-with-colors decomposition

In this section we prove Theorem 2, which characterizes all pebble-game decompositions. We start with the following lemmas about the structure of monochromatic connected components in H, the directed graph maintained during the pebble game.

Lemma 10 (Monochromatic pebble game subgraphs are (1,0)-sparse).
Let Hi be the subgraph of H induced by edges with pebbles of color ci on them. Then Hi is (1,0)-sparse, for i = 1, . . . , k.

Proof. By (I4), Hi is a set of edges with out-degree at most one for every vertex.

Lemma 11 (Tree-pieces in a pebble-game graph). Every subgraph of the directed graph H in a pebble game construction contains at least ℓ monochromatic tree-pieces, and each of these is rooted at either a vertex with a pebble on it or a vertex that is the tail of an out-edge.

Recall that an out-edge from a subgraph H′ = (V′, E′) is an edge vw with v ∈ V′ and vw ∉ E′.

Proof. Let H′ = (V′, E′) be a non-empty subgraph of H, and assume without loss of generality that H′ is induced by V′. By (I3), out(V′) + peb(V′) ≥ ℓ. We will show that each pebble and out-edge tail is the root of a tree-piece.

Consider a vertex v ∈ V′ and a color ci. By (I4) there is a unique monochromatic directed path of color ci starting at v. By (I5), if this path ends at a pebble, it does not have a cycle. Similarly, if this path reaches a vertex that is the tail of an out-edge also in color ci (i.e., if the monochromatic path from v leaves V′), then the path cannot have a cycle in H′.

Since this argument works for any vertex in any color, for each color there is a partitioning of the vertices into those that can reach each pebble, out-edge tail, or cycle. It follows that each pebble and out-edge tail is the root of a monochromatic tree, as desired.

Applied to the whole graph, Lemma 11 gives us the following.

Lemma 12 (Pebbles are the roots of trees). In any pebble game configuration, each pebble of color ci is the root of a (possibly empty) monochromatic tree-piece of color ci.

Remark: Haas showed in [7] that in an ℓTk, a subgraph induced by n′ ≥ 2 vertices with m′ edges has exactly kn′ − m′ tree-pieces in it.
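The count kn′ − m′ can be recovered by a short double-counting argument. The following is a sketch under the definitions of Section 1.2, not a reproduction of the proof in [7]; the names P_1, ..., P_t for the tree-pieces of the induced subgraph are introduced here only for illustration.

```latex
\begin{align*}
\sum_{j=1}^{t} |V(P_j)| &= k n'
  && \text{(each vertex lies in exactly $k$ trees, hence in exactly $k$ tree-pieces)}\\
\sum_{j=1}^{t} |E(P_j)| &= \sum_{j=1}^{t} \bigl(|V(P_j)| - 1\bigr) = k n' - t
  && \text{(each tree-piece is a tree)}\\
\sum_{j=1}^{t} |E(P_j)| &= m'
  && \text{(the trees are edge-disjoint and cover every edge)}\\
t &= k n' - m'
  && \text{(equating the two expressions for the edge count)}
\end{align*}
```

In particular, whenever the subgraph satisfies the sparsity count m′ ≤ kn′ − ℓ, it has t ≥ ℓ tree-pieces.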
Lemma 11 strengthens Haas’ result by extending it to the lower range and giving a construction that finds the tree-pieces, showing the connection between the ℓ + 1 pebble condition and the hereditary condition on proper ℓTk.

We conclude our investigation of arbitrary pebble game constructions with a description of the decomposition induced by the pebble game with colors.

Theorem 2 (The pebble-game-with-colors decomposition). A graph G is a pebble-game graph if and only if it admits a decomposition into k edge-disjoint subgraphs such that each is (1,0)-sparse and every subgraph of G contains at least ℓ tree-pieces of the (1,0)-sparse graphs in the decomposition.

Proof. Let G be a pebble-game graph. The existence of the k edge-disjoint (1,0)-sparse subgraphs was shown in Lemma 10, and Lemma 11 proves the condition on subgraphs.

For the other direction, we observe that a color ci with ti tree-pieces in a given subgraph can span at most n − ti edges; summing over all the colors shows that a graph with a pebble-game decomposition must be sparse. Apply Theorem 1 to complete the proof.

Remark: We observe that a pebble-game decomposition for a Laman graph may be read out of the bipartite matching used in Hendrickson’s Laman graph extraction algorithm [9]. Indeed, pebble game orientations have a natural correspondence with the bipartite matchings used in

Maps-and-trees are a special case of pebble-game decompositions for tight graphs: if there are no cycles in ℓ of the colors, then the trees rooted at the corresponding ℓ pebbles must be spanning, since they have n − 1 edges. Also, if each color forms a forest in an upper range pebble-game decomposition, then the tree-pieces condition ensures that the pebble-game decomposition is a proper ℓTk.

In the next section, we show that the pebble game can be specialized to correspond to maps-and-trees and proper ℓTk decompositions.

7. Canonical Pebble Game Constructions

In this section we prove the main theorems (Theorem 3 and Theorem 4), continuing the investigation of decompositions induced by pebble game constructions by studying the case where a minimum number of monochromatic cycles are created. The main idea, captured in Lemma 15 and illustrated in Figure 6, is to avoid creating cycles while collecting pebbles. We show that this is always possible, implying that monochromatic map-graphs are created only when we add more than k(n′ − 1) edges to some set of n′ vertices. For the lower range, this implies that every color is a forest. Every decomposition characterization of tight graphs discussed above follows immediately from the main theorem, giving new proofs of the previous results in a unified framework.

In the proof, we will use two specializations of the pebble game moves. The first is a modification of the add-edge move.

Canonical add-edge: When performing an add-edge move, cover the new edge with a color that is on both vertices if possible. If not, then take the highest numbered color present.

The second is a restriction on which pebble-slide moves we allow.

Canonical pebble-slide: A pebble-slide move is allowed only when it does not create a monochromatic cycle.

We call a pebble game construction that uses only these moves canonical. In this section we will show that every pebble-game graph has a canonical pebble game construction (Lemma 14 and Lemma 15) and that canonical pebble game constructions correspond to proper ℓTk and maps-and-trees decompositions (Theorem 3 and Theorem 4).

We begin with a technical lemma that motivates the definition of canonical pebble game constructions. It shows that the situations disallowed by the canonical moves are all the ways for cycles to form in the lowest ℓ colors.

Lemma 13 (Monochromatic cycle creation). Let v ∈ V have a pebble p of color ci on it and let w be a vertex in the same tree of color ci as v.
A monochromatic cycle colored ci is created in exactly one of the following ways:
(M1) The edge vw is added with an add-edge move.
(M2) The edge wv is reversed by a pebble-slide move and the pebble p is used to cover the reverse edge vw.

Proof. Observe that the preconditions in the statement of the lemma are implied by Lemma 7. By Lemma 12, monochromatic cycles form when the last pebble of color ci is removed from a connected monochromatic subgraph. (M1) and (M2) are the only ways to do this in a pebble game construction, since the color of an edge only changes when it is inserted the first time or a new pebble is put on it by a pebble-slide move.

Fig. 5. Creating monochromatic cycles in a (2,0)-pebble game. (a) A type (M1) move creates a cycle by adding a black edge. (b) A type (M2) move creates a cycle with a pebble-slide move. The vertices are labeled according to their role in the definition of the moves.

Figure 5(a) and Figure 5(b) show examples of (M1) and (M2) map-graph creation moves, respectively, in a (2,0)-pebble game construction.

We next show that if a graph has a pebble game construction, then it has a canonical pebble game construction. This is done in two steps, considering the cases (M1) and (M2) separately. The proof gives two constructions that implement the canonical add-edge and canonical pebble-slide moves.

Lemma 14 (The canonical add-edge move). Let G be a graph with a pebble game construction. Cycle creation steps of type (M1) can be eliminated in colors ci for 1 ≤ i ≤ ℓ′, where ℓ′ = min{k, ℓ}.

Proof. For add-edge moves, cover the edge with a color present on both v and w if possible. If this is not possible, then there are ℓ + 1 distinct colors present. Use the highest numbered color to cover the new edge.

Remark: We note that in the upper range, there is always a repeated color, so no canonical add-edge moves create cycles in the upper range.
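The color rule in the proof of Lemma 14 and the pigeonhole observation in the remark can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, colors are the integers 1..k, the pebbles on each endpoint are given as sets of colors, and when several colors are shared the sketch picks the highest one, which is an arbitrary choice not prescribed by the text.

```python
from itertools import combinations


def canonical_color(peb_v, peb_w):
    """Choose the color for a canonical add-edge move on the new edge vw.

    Returns (color, owner), where owner is 'v' or 'w', the endpoint whose
    pebble covers the new edge. Assumes at least one pebble is present.
    """
    shared = peb_v & peb_w
    if shared:
        # A color present on both endpoints; which shared color is arbitrary here.
        return max(shared), 'v'
    # No repeated color: use the highest numbered color present.
    color = max(peb_v | peb_w)
    return color, ('v' if color in peb_v else 'w')


def color_subsets(k):
    """All subsets of the colors 1..k, i.e. all possible pebble sets on a vertex."""
    colors = range(1, k + 1)
    return [set(c) for r in range(k + 1) for c in combinations(colors, r)]


def always_repeated_color(k, l):
    """True if every pair of pebble sets with at least l + 1 pebbles shares a color."""
    return all(a & b
               for a in color_subsets(k)
               for b in color_subsets(k)
               if len(a) + len(b) >= l + 1)
```

With k colors and at least ℓ + 1 ≥ k + 1 pebbles on the two endpoints, some color must appear twice, which is why `always_repeated_color(k, l)` holds whenever ℓ ≥ k and can fail in the lower range.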
The canonical pebble-slide move is defined by a global condition. To prove that we obtain the same class of graphs using only canonical pebble-slide moves, we need to extend Lemma 9 to only canonical moves. The main step is to show that if there is any sequence of moves that reorients a path from v to w, then there is a sequence of canonical moves that does the same thing.

Lemma 15 (The canonical pebble-slide move). Any sequence of pebble-slide moves leading to an add-edge move can be replaced with one that has no (M2) steps and allows the same add-edge move.

In other words, if it is possible to collect ℓ + 1 pebbles on the ends of an edge to be added, then it is possible to do this without creating any monochromatic cycles.

Figure 7 and Figure 8 illustrate the construction used in the proof of Lemma 15. We call this the shortcut construction by analogy to matroid union and intersection augmenting paths used in previous work on the lower range.

Figure 6 shows the structure of the proof. The shortcut construction removes an (M2) step at the beginning of a sequence that reorients a path from v to w with pebble-slides. Since one application of the shortcut construction reorients a simple path from a vertex w′ to w, and a path from v to w′ is preserved, the shortcut construction can be applied inductively to find the sequence of moves we want.

Fig. 6. Outline of the shortcut construction: (a) An arbitrary simple path from v to w with curved lines indicating simple paths. (b) An (M2) step. The black edge, about to be flipped, would create a cycle, shown in dashed and solid gray, of the (unique) gray tree rooted at w. The solid gray edges were part of the original path from (a). (c) The shortened path to the gray pebble; the new path follows the gray tree all the way from the first time the original path touched the gray tree at w′. The path from v to w′ is simple, and the shortcut construction can be applied inductively to it.

Proof.
Without loss of generality, we can assume that our sequence of moves reorients a simple path in H, and that the first move (the end of the path) is (M2). The (M2) step moves a pebble of color ci from a vertex w onto the edge vw, which is reversed. Because the move is (M2), v and w are contained in a maximal monochromatic tree of color ci. Call this tree H′i, and observe that it is rooted at w.

Now consider the edges reversed in our sequence of moves. As noted above, before we make any of the moves, these sketch out a simple path in H ending at w. Let z be the first vertex on this path in H′i. We modify our sequence of moves as follows: delete, from the beginning, every move before the one that reverses some edge yz; prepend onto what is left a sequence of moves that moves the pebble on w to z in H′i.

Fig. 7. Eliminating (M2) moves: (a) an (M2) move; (b) avoiding the (M2) by moving along another path. The path where the pebbles move is indicated by doubled lines.

Fig. 8. Eliminating (M2) moves: (a) the first step to move the black pebble along the doubled path is (M2); (b) avoiding the (M2) and simplifying the path.

Since no edges change color in the beginning of the new sequence, we have eliminated the (M2) move. Because our construction does not change any of the edges involved in the remaining tail of the original sequence, the part of the original path that is left in the new sequence will still be a simple path in H, meeting our initial hypothesis.

The rest of the lemma follows by induction.

Together Lemma 14 and Lemma 15 prove the following.

Lemma 16. If G is a pebble-game graph, then G has a canonical pebble game construction.

Using canonical pebble game constructions, we can identify the tight pebble-game graphs with maps-and-trees and ℓTk graphs.

Theorem 3 (Main Theorem (Lower Range): Maps-and-trees coincide with pebble-game graphs). Let 0 ≤ ℓ ≤ k.
A graph G is a tight pebble-game graph if and only if G is a (k, ℓ)-maps-and-trees.

Proof. As observed above, a maps-and-trees decomposition is a special case of the pebble game decomposition. Applying Theorem 2, we see that any maps-and-trees must be a pebble-game graph.

For the reverse direction, consider a canonical pebble game construction of a tight graph. From Lemma 8, we see that there are ℓ pebbles left on G at the end of the construction. The definition of the canonical add-edge move implies that there must be at least one pebble of each color c_i for i = 1, 2, ..., ℓ. It follows that there is exactly one of each of these colors. By Lemma 12, each of these pebbles is the root of a monochromatic tree-piece with n − 1 edges, yielding the required ℓ edge-disjoint spanning trees.

Corollary 5 (Nash-Williams [17], Tutte [23], White and Whiteley [24]). Let ℓ ≤ k. A graph G is tight if and only if it has a (k, ℓ)-maps-and-trees decomposition.

We next consider the decompositions induced by canonical pebble game constructions when ℓ ≥ k + 1.

Theorem 4 (Main Theorem (Upper Range): Proper Trees-and-trees coincide with pebble-game graphs). Let k ≤ ℓ ≤ 2k − 1. A graph G is a tight pebble-game graph if and only if it is a proper ℓTk with kn − ℓ edges.

Proof. As observed above, a proper ℓTk decomposition must be sparse. What we need to show is that a canonical pebble game construction of a tight graph produces a proper ℓTk. By Theorem 2 and Lemma 16, we already have the condition on tree-pieces and the decomposition into ℓ edge-disjoint trees. Finally, an application of (I4) shows that every vertex must be in exactly k of the trees, as required.

Corollary 6 (Crapo [2], Haas [7]). Let k ≤ ℓ ≤ 2k − 1. A graph G is tight if and only if it is a proper ℓTk.

8.
Pebble game algorithms for finding decompositions

A naive implementation of the constructions in the previous section leads to an algorithm requiring Θ(n^2) time to collect each pebble in a canonical construction: in the worst case, Θ(n) applications of the construction in Lemma 15, requiring Θ(n) time each. This gives a total running time of Θ(n^3) for the decomposition problem.

In this section, we describe algorithms for the decomposition problem that run in time O(n^2). We begin with the overall structure of the algorithm.

Algorithm 17 (The canonical pebble game with colors).
Input: A graph G.
Output: A pebble-game graph H.
Method:
– Set V(H) = V(G) and place one pebble of each color on each vertex of H.
– For each edge vw ∈ E(G), try to collect at least ℓ + 1 pebbles on v and w using pebble-slide moves as described by Lemma 15.
– If at least ℓ + 1 pebbles can be collected, add vw to H using an add-edge move as in Lemma 14; otherwise discard vw.
– Finally, return H and the locations of the pebbles.

Correctness. Theorem 1 and the result from [24] that the sparse graphs are the independent sets of a matroid show that H is a maximum-sized sparse subgraph of G. Since the construction found is canonical, the main theorem shows that the coloring of the edges in H gives a maps-and-trees or proper ℓTk decomposition.

Complexity. We start by observing that the running time of Algorithm 17 is the time taken to process O(n) edges added to H and O(m) edges not added to H. We first consider the cost of an edge of G that is added to H. Each of the pebble game moves can be implemented in constant time. What remains is to describe an efficient way to find and move the pebbles. We use the following algorithm as a subroutine of Algorithm 17 to do this.

Algorithm 18 (Finding a canonical path to a pebble).
Input: Vertices v and w, and a pebble game configuration on a directed graph H.
Output: If a pebble was found, ‘yes’, and ‘no’ otherwise.
The configuration of H is updated.
Method:
– Start by doing a depth-first search from v in H. If no pebble is found other than those on w, stop and return ‘no’.
– Otherwise a pebble was found. We now have a path v = v_1, e_1, ..., e_{p−1}, v_p = u, where the v_i are vertices and e_i is the edge v_i v_{i+1}. Let c[e_i] be the color of the pebble on e_i. We will use the array c[] to keep track of the colors of pebbles on vertices and edges after we move them, and the array s[] to sketch out a canonical path from v to u by finding a successor for each edge.
– Set s[u] = ‘end’ and set c[u] to the color of an arbitrary pebble on u. We walk on the path in reverse order: v_p, e_{p−1}, e_{p−2}, ..., e_1, v_1. For each i, check to see if c[v_i] is set; if so, go on to the next i. Otherwise, check to see if c[v_{i+1}] = c[e_i].
– If it is, set s[v_i] = e_i and set c[v_i] = c[e_i], and go on to the next edge.
– Otherwise c[v_{i+1}] ≠ c[e_i]; try to find a monochromatic path in color c[v_{i+1}] from v_i to v_{i+1}. If a vertex x is encountered for which c[x] is set, we have a path v_i = x_1, f_1, x_2, ..., f_{q−1}, x_q = x that is monochromatic in the color of the edges; set c[x_i] = c[f_i] and s[x_i] = f_i for i = 1, 2, ..., q − 1. If c[x] = c[f_{q−1}], stop. Otherwise, recursively check that there is not a monochromatic c[x] path from x_{q−1} to x using this same procedure.
– Finally, slide pebbles along the path from the original endpoints v to u specified by the successor array s[v], s[s[v]], ...

The correctness of Algorithm 18 comes from the fact that it is implementing the shortcut construction. Efficiency comes from the fact that instead of potentially moving the pebble back and forth, Algorithm 18 pre-computes a canonical path crossing each edge of H at most three times: once in the initial depth-first search, and twice while converting the initial path to a canonical one. It follows that each accepted edge takes O(n) time, for a total of O(n^2) time spent processing edges in H.
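The search-and-slide loop described above can be illustrated with a small sketch. The code below is a minimal, non-canonical version of the pebble game with colors, written under stated assumptions: it uses ordinary pebble-slide moves found by depth-first search and does not implement the shortcut construction of Lemma 15, so its color classes are not guaranteed to be forests. It also does not maintain components, so rejected edges cost O(n) each rather than amortized O(1). The names `pebble_game`, `collect` and `count`, and the integer encoding of colors, are illustrative choices and not part of the paper.

```python
from collections import defaultdict


def pebble_game(n, edges, k, l):
    """Play the basic (k, l)-pebble game with colors on vertices 0..n-1.

    Returns (accepted, pebbles): accepted lists (tail, head, color) triples
    for the edges of H, and pebbles[v] lists the colors still on vertex v.
    """
    assert 0 <= l < 2 * k
    pebbles = [list(range(k)) for _ in range(n)]  # one pebble of each color per vertex
    H = {}                  # edge id -> [tail, head, color of the pebble on the edge]
    out = defaultdict(set)  # vertex -> ids of its out-edges

    def collect(v, w):
        """Bring one more pebble to v by reversing a directed path.

        Pebbles already on v or w are never taken.
        """
        parent = {v: None}  # vertex -> id of the edge used to reach it
        stack = [v]
        while stack:
            x = stack.pop()
            if x != v and x != w and pebbles[x]:
                # Pebble-slide moves, from the far end of the path back to v.
                while parent[x] is not None:
                    e = parent[x]
                    tail, _, old_color = H[e]
                    new_color = pebbles[x].pop()     # this pebble moves onto the edge
                    H[e] = [x, tail, new_color]      # the edge is reversed
                    out[tail].discard(e)
                    out[x].add(e)
                    pebbles[tail].append(old_color)  # the edge's old pebble lands on tail
                    x = tail
                return True
            for e in out[x]:
                y = H[e][1]
                if y not in parent:
                    parent[y] = e
                    stack.append(y)
        return False

    def count(v, w):
        return len(pebbles[v]) + (len(pebbles[w]) if w != v else 0)

    for v, w in edges:
        while count(v, w) < l + 1:
            if not (collect(v, w) or (w != v and collect(w, v))):
                break
        if count(v, w) >= l + 1:  # add-edge move
            if not pebbles[v]:
                v, w = w, v
            color = pebbles[v].pop()
            e = len(H)
            H[e] = [v, w, color]
            out[v].add(e)
    return [tuple(t) for t in H.values()], pebbles


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
accepted, pebbles = pebble_game(4, k4, k=2, l=3)
print(len(accepted), sum(len(p) for p in pebbles))
```

On the complete graph K4 with k = 2 and ℓ = 3, the sketch accepts five of the six edges, matching the bound kn − ℓ = 5, and leaves ℓ = 3 pebbles on the vertices.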
Although we have not discussed this explicitly, for the algorithm to be efficient we need to maintain components as in [12]. After each accepted edge, the components of H can be updated in time O(n). Finally, the results of [12, 13] show that the rejected edges take an amortized O(1) time each.

Summarizing, we have shown that the canonical pebble game with colors solves the decomposition problem in time O(n^2).

9. An important special case: Rigidity in dimension 2 and slider-pinning

In this short section we present a new application for the special case of practical importance, k = 2, ℓ = 3. As discussed in the introduction, Laman’s theorem [11] characterizes minimally rigid graphs as the (2,3)-tight graphs. In recent work on slider pinning, developed after the current paper was submitted, we introduced the slider-pinning model of rigidity [15, 20]. Combinatorially, we model the bar-slider frameworks as simple graphs together with some loops placed on their vertices in such a way that there are no more than 2 loops per vertex, one of each color.

We characterize the minimally rigid bar-slider graphs [20] as graphs that are:
1. (2,3)-sparse for subgraphs containing no loops.
2. (2,0)-tight when loops are included.

We call these graphs (2,0,3)-graded-tight, and they are a special case of the graded-sparse graphs studied in our paper [14]. The connection with the pebble games in this paper is the following.

Corollary 19 (Pebble games and slider-pinning). In any (2,3)-pebble game graph, if we replace pebbles by loops, we obtain a (2,0,3)-graded-tight graph.

Proof. Follows from invariant (I3) of Lemma 7.

In [15], we study a special case of slider pinning where every slider is either vertical or horizontal. We model the sliders as pre-colored loops, with the color indicating x or y direction. For this axis-parallel slider case, the minimally rigid graphs are characterized by:
1. (2,3)-sparse for subgraphs containing no loops.
2.
Admit a 2-coloring of the edges so that each color is a forest (i.e., has no cycles), and each monochromatic tree spans exactly one loop of its color.

This also has an interpretation in terms of colored pebble games.

Corollary 20 (The pebble game with colors and slider-pinning). In any canonical (2,3)-pebble-game-with-colors graph, if we replace pebbles by loops of the same color, we obtain the graph of a minimally pinned axis-parallel bar-slider framework.

Proof. Follows from Theorem 4 and Lemma 12.

10. Conclusions and open problems

We presented a new characterization of (k, ℓ)-sparse graphs, the pebble game with colors, and used it to give an efficient algorithm for finding decompositions of sparse graphs into edge-disjoint trees. Our algorithm finds such sparsity-certifying decompositions in the upper range and runs in time O(n^2), which is as fast as the algorithms for recognizing sparse graphs in the upper range from [12].

We also used the pebble game with colors to describe a new sparsity-certifying decomposition that applies to the entire matroidal range of sparse graphs.

We defined and studied a class of canonical pebble game constructions that correspond to either a maps-and-trees or proper ℓTk decomposition. This gives a new proof of the Tutte-Nash-Williams arboricity theorem and a unified proof of the previously studied decomposition certificates of sparsity.

Canonical pebble game constructions also show the relationship between the ℓ + 1 pebble condition, which applies to the upper range of ℓ, and matroid union augmenting paths, which do not apply in the upper range.

Algorithmic consequences and open problems. In [6], Gabow and Westermann give an O(n^{3/2}) algorithm for recognizing sparse graphs in the lower range and extracting sparse subgraphs from dense ones. Their technique is based on efficiently finding matroid union augmenting paths, which extend a maps-and-trees decomposition.
The O(n^{3/2}) algorithm uses two subroutines to find augmenting paths: cyclic scanning, which finds augmenting paths one at a time, and batch scanning, which finds groups of disjoint augmenting paths.

We observe that Algorithm 17 can be used to replace cyclic scanning in Gabow and Westermann’s algorithm without changing the running time. The data structures used in the implementation of the pebble game, detailed in [12, 13], are simpler and easier to implement than those used to support cyclic scanning.

The two major open algorithmic problems related to the pebble game are then:

Problem 1. Develop a pebble game algorithm with the properties of batch scanning and obtain an implementable O(n^{3/2}) algorithm for the lower range.

Problem 2. Extend batch scanning to the ℓ + 1 pebble condition and derive an O(n^{3/2}) pebble game algorithm for the upper range.

In particular, it would be of practical importance to find an implementable O(n^{3/2}) algorithm for decompositions into edge-disjoint spanning trees.

References

1. Berg, A.R., Jordán, T.: Algorithms for graph rigidity and scene analysis. In: Proc. 11th European Symposium on Algorithms (ESA ’03), LNCS, vol. 2832, pp. 78–89. (2003)
2. Crapo, H.: On the generic rigidity of plane frameworks. Tech. Rep. 1278, Institut de recherche d’informatique et d’automatique (1988)
3. Edmonds, J.: Minimum partition of a matroid into independent sets. J. Res. Nat. Bur. Standards Sect. B 69B, 67–72 (1965)
4. Edmonds, J.: Submodular functions, matroids, and certain polyhedra. In: Combinatorial Optimization—Eureka, You Shrink!, no. 2570 in LNCS, pp. 11–26. Springer (2003)
5. Gabow, H.N.: A matroid approach to finding edge connectivity and packing arborescences. Journal of Computer and System Sciences 50, 259–273 (1995)
6. Gabow, H.N., Westermann, H.H.: Forests, frames, and games: Algorithms for matroid sums and applications. Algorithmica 7(1), 465–497 (1992)
7. Haas, R.: Characterizations of arboricity of graphs.
Ars Combinatorica 63, 129–137 (2002)
8. Haas, R., Lee, A., Streinu, I., Theran, L.: Characterizing sparse graphs by map decompositions. Journal of Combinatorial Mathematics and Combinatorial Computing 62, 3–11 (2007)
9. Hendrickson, B.: Conditions for unique graph realizations. SIAM Journal on Computing 21(1), 65–84 (1992)
10. Jacobs, D.J., Hendrickson, B.: An algorithm for two-dimensional rigidity percolation: the pebble game. Journal of Computational Physics 137, 346–365 (1997)
11. Laman, G.: On graphs and rigidity of plane skeletal structures. Journal of Engineering Mathematics 4, 331–340 (1970)
12. Lee, A., Streinu, I.: Pebble game algorithms and sparse graphs. Discrete Mathematics 308(8), 1425–1437 (2008)
13. Lee, A., Streinu, I., Theran, L.: Finding and maintaining rigid components. In: Proc. Canadian Conference of Computational Geometry. Windsor, Ontario (2005). http://cccg.cs.uwindsor.ca/papers/72.pdf
14. Lee, A., Streinu, I., Theran, L.: Graded sparse graphs and matroids. Journal of Universal Computer Science 13(10) (2007)
15. Lee, A., Streinu, I., Theran, L.: The slider-pinning problem. In: Proceedings of the 19th Canadian Conference on Computational Geometry (CCCG’07) (2007)
16. Lovász, L.: Combinatorial Problems and Exercises. Akademiai Kiado and North-Holland, Amsterdam (1979)
17. Nash-Williams, C.S.A.: Decomposition of finite graphs into forests. Journal of the London Mathematical Society 39, 12 (1964)
18. Oxley, J.G.: Matroid theory. The Clarendon Press, Oxford University Press, New York (1992)
19. Roskind, J., Tarjan, R.E.: A note on finding minimum cost edge disjoint spanning trees. Mathematics of Operations Research 10(4), 701–708 (1985)
20. Streinu, I., Theran, L.: Combinatorial genericity and minimal rigidity. In: SCG ’08: Proceedings of the twenty-fourth annual Symposium on Computational Geometry, pp. 365–374. ACM, New York, NY, USA (2008).
21.
Tay, T.S.: Rigidity of multigraphs I: linking rigid bodies in n-space. Journal of Combinatorial Theory, Series B 26, 95–112 (1984)
22. Tay, T.S.: A new proof of Laman’s theorem. Graphs and Combinatorics 9, 365–370 (1993)
23. Tutte, W.T.: On the problem of decomposing a graph into n connected factors. Journal of the London Mathematical Society 142, 221–230 (1961)
24. Whiteley, W.: The union of matroids and the rigidity of frameworks. SIAM Journal on Discrete Mathematics 1(2), 237–255 (1988)

ABSTRACT

We describe a new algorithm, the $(k,\ell)$-pebble game with colors, and use it to obtain a characterization of the family of $(k,\ell)$-sparse graphs and algorithmic solutions to a family of problems concerning tree decompositions of graphs. Special instances of sparse graphs appear in rigidity theory and have received increased attention in recent years. In particular, our colored pebbles generalize and strengthen the previous results of Lee and Streinu and give a new proof of the Tutte-Nash-Williams characterization of arboricity. We also present a new decomposition that certifies sparsity based on the $(k,\ell)$-pebble game with colors. Our work also exposes connections between pebble game algorithms and previous sparse graph algorithms by Gabow, Gabow and Westermann, and Hendrickson.
<|endoftext|><|startoftext|> Introduction The popularly accepted theory for the formation of the Earth-Moon system is that the Moon was formed from debris of a strong impact by a giant planetesimal with the Earth at the close of the planet-forming period (Hartmann and Davis 1975). Since the formation of the Earth-Moon system, it has been evolving at all time scale. It is well known that the Moon is receding from us and both the Earth’s rotation and Moon’s rotation are slowing. The popular theory is that the tidal friction causes all those changes based on the conservation of the angular momentum of the Earth-Moon system. The situation becomes complicated in describing the past evolution of the Earth-Moon system. Because the Moon is moving away from us and the Earth rotation is slowing, this means that the Moon was closer and the Earth rotation was faster in the past. Creationists argue that based on the tidal friction theory, the tidal friction should be stronger and the recessional rate of the Moon should be greater in the past, the distance of the Moon would quickly fall inside the Roche's limit (for earth, 15500 km) in which the Moon would be torn apart by gravity in 1 to 2 billion years ago. However, geological evidence indicates that the recession of the Moon in the past was slower than the present rate, i. e., the recession has been accelerating with time. Therefore, it must be concluded that tidal friction was very much less in the remote past than we would deduce on the basis of present-day observations (Stacey 1977). This was called “geological time scale difficulty” or “Lunar crisis” and is one of the main arguments by creationists against the tidal friction theory (Brush 1983). But we have to consider the case carefully in various aspects. 
One possible scenario is that, because the Earth has been undergoing dynamic evolution on all time scales since its inception, the geological and physical conditions in the remote past (such as continental positions and drift, the crust, and surface temperature fluctuations like the glacial/snowball effect) could have been substantially different from those of today. Tidal friction could then have been much weaker, and the Moon's recession correspondingly slower. Various tidal friction models have been proposed to describe the evolution of the Earth-Moon system in a way that avoids this difficulty and puts the Moon at a comfortable distance from the Earth 4.5 billion years ago (Hansen 1982, Kagan and Maslova 1994, Ray et al. 1999, Finch 1981, Slichter 1963). The tidal friction theories explain that the present rate of tidal dissipation is anomalously high because the tidal force is close to a resonance in the response function of the ocean (Brush 1983). Kagan gave a detailed review of those tidal friction models (Kagan 1997). Those models are based on many assumptions about geological conditions (continental position and drift) and physical conditions in the past, and many parameters (such as the phase lag angle and a multi-mode approximation with time-dependent frequencies of the resonance modes) have to be introduced and carefully adjusted to bring their predictions close to the geological evidence. However, those assumptions and parameters are still challenged, to a certain extent, as concoctions.

The second possible scenario is that another mechanism dominates the evolution of the Earth-Moon system and the role of tidal friction is not significant. At the 2004 Meeting of the Division of Particles and Fields of the American Physical Society, held at the University of California at Riverside, the author proposed a dark matter field fluid model (Pan 2005) with a non-Newtonian approach. The current Moon and Earth data agree with this model very well.
This paper will demonstrate that the past evolution of the Moon-Earth system can be described by the dark matter field fluid model without any assumptions about past geological and physical conditions. Although the evolution of the Earth-Moon system has been extensively studied analytically and numerically, to the author’s knowledge there are no theories similar or equivalent to this model.

2. Invisible matter

In modern cosmology, it has been proposed that visible matter makes up about 2 to 10% of the total matter in the universe, and that the remaining 90 to 98% is currently invisible and is called dark matter and dark energy. Such invisible matter is proposed to have an anti-gravity property that makes the universe expand faster and faster.

If the ratio of the matter components of the universe is close to this hypothesis, then the evolution of the universe should be dominated by the physical mechanism of such invisible matter. That mechanism could be far beyond current Newtonian and Einsteinian physics, which might reflect only the tip of the iceberg of a greater physics.

If the ratio of the matter components of the universe is close to this hypothesis, then it is also reasonable to think that such dominant invisible matter is spread everywhere in the universe (its density may vary from place to place). In other words, all visible matter objects should be surrounded by invisible matter, and their motion should be affected by it if there are interactions between visible and invisible matter.

If the ratio of the matter components of the universe is close to this hypothesis, then the particles of the invisible matter should be very small, below the detection limit of current technology; otherwise, matter present in such a dominant amount would have been detected long ago.
With such invisible matter in mind, we move to the next section to develop the dark matter field fluid model with a non-Newtonian approach. For simplicity, all invisible matter (dark matter, dark energy and possible other terms) is called dark matter here.

3. The dark matter field fluid model

In this proposed model, it is assumed that:

1. A celestial body rotates and moves in space, which, for simplicity, is uniformly filled with dark matter that is in a quiescent state relative to the motion of the celestial body. The dark matter possesses a field property and a fluid property; it can interact with the celestial body through both properties; therefore, it can exchange energy with the celestial body and affect its motion.

2. The fluid property follows the general principles of fluid mechanics. The dark matter field fluid particles may be so small that they can easily permeate ordinary “baryonic” matter; i.e., ordinary matter objects could be saturated with such dark matter field fluid. Thus, the whole celestial body interacts with the dark matter field fluid, in the manner of a sponge moving through water.

The nature of the field property of the dark matter field fluid is unknown. It is here assumed that the interaction of the field associated with the dark matter field fluid with the celestial body is proportional to the mass of the celestial body. The dark matter field fluid is assumed to exert a repulsive force on baryonic matter that opposes the gravitational force. The nature and mechanism of this repulsive force are unknown.

With the assumptions above, one can study how the dark matter field fluid may influence the motion of a celestial body and compare the results with observations. The common shape of celestial bodies is spherical.
According to Stokes's law, a rigid non-permeable sphere moving through a quiescent fluid with a sufficiently low Reynolds number experiences a resistance force F

$F = -6\pi\mu r v$ (1)

where v is the moving velocity, r is the radius of the sphere, and μ is the fluid viscosity constant. The direction of the resistance force F in Eq. 1 is opposite to the direction of the velocity v.

For a rigid sphere moving through the dark matter field fluid, due to the dual properties of the dark matter field fluid and its permeation into the sphere, the force F may not be proportional to the radius of the sphere. Also, F may be proportional to the mass of the sphere due to the field interaction. Therefore, with the combined effects of both fluid and field, the force exerted on the sphere by the dark matter field fluid is assumed to be of the scaled form

$F = -6\pi\eta r^{1-n} m v$ (2)

where n is a parameter arising from saturation by dark matter field fluid, $r^{1-n}$ can be viewed as the effective radius with the same unit as r, m is the mass of the sphere, and η is the dark matter field fluid constant, which is equivalent to μ. The direction of the resistance force F in Eq. 2 is opposite to the direction of the velocity v. The force described by Eq. 2 is velocity-dependent and causes negative acceleration. According to Newton's second law of motion, the equation of motion for the sphere is

$m \frac{dv}{dt} = -6\pi\eta r^{1-n} m v$ (3)

Then

$v = v_0 \exp(-6\pi\eta r^{1-n} t)$ (4)

where $v_0$ is the initial velocity (t = 0) of the sphere.

If the sphere revolves around a massive gravitational center, there are three forces in the line between the sphere and the gravitational center: (1) the gravitational force; (2) the centripetal acceleration force; and (3) the repulsive force of the dark matter field fluid. The drag force in Eq. 3 reduces the orbital velocity and causes the sphere to move inward to the gravitational center.
However, if the sum of the centripetal acceleration force and the repulsive force is stronger than the gravitational force, then the sphere will move outward and recede from the gravitational center. This is the case of interest here. If the velocity change in Eq. 3 is sufficiently slow and the repulsive force is small compared to the gravitational force and centripetal acceleration force, then the rate of receding will be correspondingly slow. Therefore, the gravitational force and the centripetal acceleration force can be treated as approximately in equilibrium at any time. The pseudo equilibrium equation is

$\frac{GMm}{R^2} = \frac{m v^2}{R}$ (5)

where G is the gravitational constant, M is the mass of the gravitational center, and R is the radius of the orbit. Inserting v of Eq. 4 into Eq. 5 yields

$R = \frac{GM}{v_0^2} \exp(12\pi\eta r^{1-n} t)$ (6)

or

$R = R_0 \exp(12\pi\eta r^{1-n} t)$ (7)

where

$R_0 = \frac{GM}{v_0^2}$ (8)

is the initial distance to the gravitational center. Note that R increases exponentially with time. The increase of orbital energy with the receding comes from the repulsive force of the dark matter field fluid. The recessional rate of the sphere is

$\frac{dR}{dt} = 12\pi\eta r^{1-n} R$ (9)

The acceleration of the recession is

$\frac{d^2R}{dt^2} = (12\pi\eta r^{1-n})^2 R$ (10)

The recessional acceleration is positive and proportional to the distance to the gravitational center, so the recession becomes faster and faster.

According to the mechanics of fluids, for a rigid non-permeable sphere rotating about its central axis in a quiescent fluid, the torque T exerted by the fluid on the sphere is

$T = -8\pi\mu r^3 \omega$ (11)

where ω is the angular velocity of the sphere. The direction of the torque in Eq. 11 is opposite to the direction of the rotation. In the case of a sphere rotating in the quiescent dark matter field fluid with angular velocity ω, similar to Eq. 2, the proposed torque T exerted on the sphere is

$T = -8\pi\eta (r^{1-n})^3 m \omega$ (12)

The direction of the torque in Eq. 12 is opposite to the direction of the rotation.
The torque causes the negative angular acceleration

$\frac{d\omega}{dt} = \frac{T}{I}$ (13)

where I is the moment of inertia of the sphere in the dark matter field fluid

$I = \frac{2}{5} m (r^{1-n})^2$ (14)

Therefore, the equation of rotation for the sphere in the dark matter field fluid is

$\frac{d\omega}{dt} = -20\pi\eta r^{1-n} \omega$ (15)

Solving this equation yields

$\omega = \omega_0 \exp(-20\pi\eta r^{1-n} t)$ (16)

where $\omega_0$ is the initial angular velocity. One can see that the angular velocity of the sphere decreases exponentially with time and the angular deceleration is proportional to its angular velocity.

For the same celestial sphere, combining Eq. 9 and Eq. 15 yields

$\frac{\dot{\omega}/\omega}{\dot{R}/R} = -\frac{20}{12} = -\frac{5}{3} = -1.67$ (17)

The significance of Eq. 17 is that it contains only observed data without assumptions and undetermined parameters; therefore, it is a critical test for this model.

For two different celestial spheres in the same system, combining Eq. 9 and Eq. 15 yields

$\left(\frac{r_2}{r_1}\right)^{1-n} \frac{\dot{\omega}_1/\omega_1}{\dot{R}_2/R_2} = -\frac{5}{3} = -1.67$ (18)

where the subscripts 1 and 2 label the two spheres. This is another critical test for this model.

4. The current behavior of the Earth-Moon system agrees with the model

The Moon-Earth system is the simplest gravitational system. The solar system is complex: the Earth and the Moon experience not only the interaction of the Sun but also interactions of other planets. Let us consider the local Earth-Moon gravitational system as an isolated local gravitational system, i.e., the influence of the Sun and other planets on the rotation and orbital motion of the Moon and on the rotation of the Earth is assumed negligible compared to the forces exerted by the Moon and the Earth on each other. In addition, the eccentricity of the Moon's orbit is small enough to be ignored. The data about the Moon and the Earth from references (Dickey et al., 1994, and Lang, 1992) are listed below for the readers' convenience in verifying the calculation, because the data may vary slightly between data sources.
Moon:
Mean radius: r = 1738.0 km
Mass: m = $7.3483 \times 10^{25}$ g
Rotation period = 27.321661 days
Angular velocity of Moon = $2.6617 \times 10^{-6}$ rad s$^{-1}$
Mean distance to Earth: $R_m$ = 384400 km
Mean orbital velocity: v = 1.023 km s$^{-1}$
Orbit eccentricity: e = 0.0549
Angular rotation acceleration rate = −25.88 ± 0.5 arcsec century$^{-2}$ = $(-1.255 \pm 0.024) \times 10^{-4}$ rad century$^{-2}$ = $(-1.260 \pm 0.024) \times 10^{-23}$ rad s$^{-2}$
Receding rate from Earth = 3.82 ± 0.07 cm year$^{-1}$ = $(1.21 \pm 0.02) \times 10^{-9}$ m s$^{-1}$

Earth:
Mean radius: r = 6371.0 km
Mass: m = $5.9742 \times 10^{27}$ g
Rotation period = 23 h 56 m 04.098904 s = 86164.098904 s
Angular velocity of rotation = $7.292115 \times 10^{-5}$ rad s$^{-1}$
Mean distance to the Sun: $R_m$ = 149,597,870.61 km
Mean orbital velocity: v = 29.78 km s$^{-1}$
Angular acceleration of Earth = $(-5.5 \pm 0.5) \times 10^{-22}$ rad s$^{-2}$

The Moon's angular rotation acceleration rate and increase in mean distance to the Earth (receding rate) were obtained from the lunar laser ranging of the Apollo Program (Dickey et al., 1994). By inserting the data of the Moon's rotation and recession into Eq. 17, the result is

$\frac{\dot{\omega}/\omega}{\dot{R}/R} = \frac{(-1.26 \times 10^{-23}) \times (3.92509 \times 10^{8})}{(2.662 \times 10^{-6}) \times (1.21 \times 10^{-9})} = -1.54 \pm 0.039$ (19)

The distance R in Eq. 19 is from the Moon's center to the Earth's center, and the number 384400 km is assumed to be the distance from the Moon's surface to the Earth's surface. Eq. 19 is in good agreement with the theoretical value of −1.67. The result is in accord with the model used here. The difference (about 7.8%) between the values of −1.54 and −1.67 may come from several sources:
1. The Moon's orbit is not a perfect circle.
2. The Moon is not a perfect rigid sphere.
3. The effect of the Sun and other planets.
4. Errors in data.
5. Possible other unknown reasons.

The two parameters n and η in Eq. 9 and Eq. 15 can be determined with two data sets. The third data set can be used to further test the model. If this model correctly describes the situation at hand, it should give consistent results for different motions.
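The arithmetic behind Eq. 19 can be checked directly from the listed data. The sketch below is only a numerical restatement of Eq. 17 and Eq. 19: it forms the ratio of the Moon's fractional spin-down rate to its fractional recession rate and compares it with −20/12. The variable names are illustrative, and the center-to-center distance is built from the listed surface-to-surface distance plus the two mean radii, as the text assumes.

```python
# Present-day lunar data as listed in the text, converted to SI units.
omega_moon = 2.6617e-6       # rad/s, angular velocity of the Moon's rotation
domega_moon = -1.260e-23     # rad/s^2, angular acceleration of the Moon's rotation
R = (384400.0 + 1738.0 + 6371.0) * 1e3   # m, center-to-center distance (3.92509e8 m)
dR = 1.21e-9                 # m/s, recession rate of the Moon

# Left-hand side of Eq. 17 evaluated with observed values (this is Eq. 19).
ratio = (domega_moon / omega_moon) / (dR / R)

# Right-hand side of Eq. 17: the parameter-free prediction -20/12 = -5/3.
predicted = -20.0 / 12.0

print(f"observed {ratio:.2f}, predicted {predicted:.2f}")
```

Because n and η cancel in this ratio, the comparison uses no fitted parameters; only the four measured quantities enter.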
The values of n and η calculated from three different data sets are listed below (note that the mean distance of the Moon to the Earth and the mean radii of the Moon and the Earth are used in the calculation).

The value of n: n = 0.64
From the Moon's rotation: η = $4.27 \times 10^{-22}$ s$^{-1}$ m$^{-1}$
From the Earth's rotation: η = $4.26 \times 10^{-22}$ s$^{-1}$ m$^{-1}$
From the Moon's recession: η = $4.64 \times 10^{-22}$ s$^{-1}$ m$^{-1}$

One can see that the three values of η are consistent within the range of error in the data. The average value of η is η = $(4.39 \pm 0.22) \times 10^{-22}$ s$^{-1}$ m$^{-1}$.

By inserting the data of the Earth's rotation, the Moon’s recession and the value of n into Eq. 18, the result is

$\left(\frac{1738000}{6371000}\right)^{(1-0.64)} \frac{(-5.5 \times 10^{-22}) \times (3.92509 \times 10^{8})}{(7.29 \times 10^{-5}) \times (1.21 \times 10^{-9})} = -1.53 \pm 0.14$ (20)

This is also in accord with the model used here.

The dragging force exerted on the Moon's orbital motion by the dark matter field fluid is $-1.11 \times 10^{8}$ N, which is negligibly small compared to the gravitational force between the Moon and the Earth, about $1.90 \times 10^{20}$ N. The torques exerted by the dark matter field fluid on the Earth’s and the Moon's rotations are T = $-5.49 \times 10^{16}$ N m and $-1.15 \times 10^{12}$ N m, respectively.

5. The evolution of Earth-Moon system

Sonett et al. found that the length of the terrestrial day 900 million years ago was about 19.2 hours, based on laminated tidal sediments on the Earth (Sonett et al., 1996). According to the model presented here, the length of the day at that time was about 19.2 hours, which agrees very well with Sonett et al.'s result.

Another critical aspect of modeling the evolution of the Earth-Moon system is to give a reasonable estimate of the closest distance of the Moon to the Earth when the system was established 4.5 billion years ago. Based on the dark matter field fluid model and the above result, the closest distance of the Moon to the Earth was about 259000 km (center to center) or 250900 km (surface to surface) 4.5 billion years ago, which is far beyond the Roche limit.
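The quoted values of η and the two back-extrapolations can be reproduced from Eq. 9, Eq. 15, Eq. 7 and Eq. 16 together with the data listed earlier. The sketch below is an illustration under stated assumptions: it takes n = 0.64 as given, averages the three η estimates, uses a Julian year of 365.25 days for the time conversion (the text does not say which year length was used), and treats the present sidereal rotation period as the current length of day. The function and variable names are illustrative.

```python
import math

n = 0.64                               # saturation parameter quoted in the text
r_moon, r_earth = 1.738e6, 6.371e6     # m, mean radii
R_now = 3.92509e8                      # m, present center-to-center distance
YEAR = 365.25 * 86400.0                # s, Julian year (an assumption)

# Observed fractional rates, in 1/s.
moon_spin = 1.260e-23 / 2.6617e-6      # |d(omega)/dt| / omega for the Moon
earth_spin = 5.5e-22 / 7.292115e-5     # |d(omega)/dt| / omega for the Earth
moon_recession = 1.21e-9 / R_now       # (dR/dt) / R for the Moon

# Solve Eq. 15 (rotation) and Eq. 9 (recession) for eta.
eta_values = [
    moon_spin / (20 * math.pi * r_moon ** (1 - n)),
    earth_spin / (20 * math.pi * r_earth ** (1 - n)),
    moon_recession / (12 * math.pi * r_moon ** (1 - n)),
]
eta = sum(eta_values) / len(eta_values)


def day_length_hours(years_ago):
    """Length of the Earth's day in the past, from Eq. 16 run backwards."""
    rate = 20 * math.pi * eta * r_earth ** (1 - n)
    return 86164.098904 / 3600.0 * math.exp(-rate * years_ago * YEAR)


def moon_distance_km(years_ago):
    """Center-to-center Earth-Moon distance in the past, from Eq. 7 run backwards."""
    rate = 12 * math.pi * eta * r_moon ** (1 - n)
    return R_now / 1e3 * math.exp(-rate * years_ago * YEAR)


print([f"{e:.2e}" for e in eta_values], f"{eta:.2e}")
print(f"{day_length_hours(9e8):.1f} h, {moon_distance_km(4.5e9):.0f} km")
```

With these inputs the averaged η comes out close to the quoted 4.39 × 10^-22 s^-1 m^-1, and the extrapolations land near 19.2 hours and 259000 km, so the figures in the text follow from the model's exponential laws and present-day data alone.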
In the modern astronomy textbook by Chaisson and McMillan (Chaisson and McMillan, 1993, p. 173), the estimated distance 4.5 billion years ago was 250000 km. This is probably the most reasonable number that most astronomers believe, and it agrees excellently with the result of this model. The closest distance of the Moon to the Earth in Hansen's models was about 38 Earth radii, or 242000 km (Hansen, 1982). According to this model, the length of day of the Earth was about 8 hours 4.5 billion years ago. Fig. 1 shows the evolution of the distance of the Moon to the Earth and of the length of day of the Earth with the age of the Earth-Moon system as described by this model, along with data from Kvale et al. (1999), Sonett et al. (1996) and Scrutton (1978). One can see that those data fit this model very well in their time range. Fig. 2 shows the geological data of solar days year^(-1) from Wells (1963) and from Sonett et al. (1996), together with the description (solid line) given by this dark matter field fluid model for the past 900 million years. One can see that the model agrees with the geological and fossil data very well.

The important difference between this model and earlier models in describing the early evolution of the Earth-Moon system is that this model is based only on current data of the Moon-Earth system, with no assumptions about the conditions of earlier Earth rotation and continental drift. Based on this model, the Earth-Moon system has been evolving smoothly to its current position since it was established, and the recessional rate of the Moon has been gradually increasing. However, this description does not take into account that special events may have happened in the past that caused sudden, significant changes in the motions of the Earth and the Moon, such as strong impacts by giant asteroids and comets, because such impacts are very common in the universe.
The general pattern of the evolution of the Moon-Earth system described by this model agrees with geological evidence. Based on Eq. 9, the recessional rate increases exponentially with time. One may then imagine that the recessional rate will quickly become very large. The increase is in fact extremely slow. The Moon's recessional rate will be 3.04 × 10^(-9) m s^(-1) after 10 billion years and 7.64 × 10^(-9) m s^(-1) after 20 billion years. However, it is not known whether the Moon's recession will continue or whether another mechanism will take over at some later time. It should be understood that tidal friction does affect the evolution of the Earth itself, such as the surface crust structure, continental drift and the evolution of the bio-system; it may also play a role in slowing the Earth's rotation, but that role is not the dominant mechanism.

Unfortunately, there is no data available for the changes of the Earth's orbital motion and of all the other members of the solar system. According to this model and the above results, the recessional rate of the Earth should be 6.86 × 10^(-7) m s^(-1) = 21.6 m year^(-1) = 2.16 km century^(-1), the length of a year increases by about 6.8 ms, and the change of the temperature is -1.8 × 10^(-8) K year^(-1), assuming a constant radiation level of the Sun and a stable environment on the Earth. The length of a year 1 billion years ago would be 80% of the current length of the year. However, much evidence (growth bands of corals and shellfish, among others) suggests that there has been no apparent change in the length of the year over the past billion years and that the Earth's orbital motion is more stable than its rotation. This suggests that the dark matter field fluid is circulating around the Sun in the same direction and at a similar speed as the Earth (at least in the Earth's orbital range). Therefore, the Earth's orbital motion experiences very little or no dragging force from the dark matter field fluid.
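The figures quoted for the Earth's orbit in this passage can be tied together with a rough calculation. The sketch below is not the paper's derivation: it assumes a Julian year, reads the 6.8 ms as an increase per year, and uses Kepler's third law (T proportional to R^(3/2), hence dT/T = 1.5 dR/R) to link the recession to the length of the year. The linear extrapolation over one billion years is deliberately crude. All names are illustrative.

```python
YEAR_S = 365.25 * 86400.0            # Julian year in seconds (assumed)
EARTH_SUN_M = 149_597_870.61e3       # mean Earth-Sun distance from the data list


def earth_recession(rate_m_per_s=6.86e-7):
    """Return the recession in m per year and km per century."""
    per_year_m = rate_m_per_s * YEAR_S
    return per_year_m, per_year_m * 100.0 / 1000.0


def year_lengthening_ms(rate_m_per_s=6.86e-7):
    """Yearly increase of the year length, assuming Kepler's third law."""
    per_year_m, _ = earth_recession(rate_m_per_s)
    return 1.5 * (per_year_m / EARTH_SUN_M) * YEAR_S * 1e3


def year_fraction_one_gyr_ago(rate_m_per_s=6.86e-7):
    """Crude linear extrapolation of the year length one billion years back."""
    lost_seconds = year_lengthening_ms(rate_m_per_s) * 1e-3 * 1e9
    return 1.0 - lost_seconds / YEAR_S


def growth_factors():
    """Growth of the Moon's recession rate over two successive 10-Gyr steps."""
    now, after_10, after_20 = 1.21e-9, 3.04e-9, 7.64e-9
    return after_10 / now, after_20 / after_10


if __name__ == "__main__":
    print(earth_recession())
    print(year_lengthening_ms())
    print(year_fraction_one_gyr_ago())
    print(growth_factors())
```

The two growth factors returned by `growth_factors` are nearly equal, which is what an exponential increase of the recessional rate implies for equal time steps.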
However, this is a conjecture; extensive research has to be conducted to verify whether this is the case.

6. Skeptical description of the evolution of the Mars

The Moon has no liquid on its surface and no air; therefore, there is no ocean-like tidal friction force to slow its rotation. However, the rotation of the Moon is still slowing at a significant rate of (-1.260 ± 0.024) × 10^(-23) rad s^(-2), which agrees with the model very well. Based on this, one may reasonably think that the rotation of Mars should be slowing also.

Mars is our nearest neighbor and has attracted great human attention since ancient times. The exploration of Mars has been heating up in recent decades. NASA and the Russian and European space agencies have sent many spacecraft to Mars to collect data and study this mysterious planet. So far there is still not enough data about the history of this planet to describe its evolution. Like the Earth, Mars rotates about its central axis and revolves around the Sun. However, Mars does not have a massive moon circling it (Mars has two small satellites, Phobos and Deimos), and there is no liquid on its surface; therefore, according to tidal friction theories, there is no apparent ocean-like tidal friction force to slow its rotation. Based on the above result and the current data for Mars, this model predicts that the angular acceleration of Mars should be about -4.38 × 10^(-22) rad s^(-2). Figure 3 describes the possible evolution of the length of day and of the solar days per Mars year; the vertical dashed line marks the current age of Mars, under the assumption that Mars was formed in a similar time period to the Earth. To the author's knowledge, such a description has not been given before, and it is completely skeptical due to the lack of reliable data.
However, with further expansion of the research and exploration on Mars, we can be confident that reliable data about the angular rotation acceleration of Mars will be available in the near future, which will provide a vital test of the prediction of this model. There are also other factors that may affect the rotation rate of Mars, such as mass redistribution due to seasonal change, winds, possible volcanic eruptions and Mars quakes. Therefore, the data has to be carefully analyzed.

7. Discussion about the model

From the above results, one can see that the current Earth-Moon data and the geological and fossil data agree with the model very well, and that the past evolution of the Earth-Moon system can be described by the model without introducing any additional parameters. This model reveals an interesting relationship between the rotation and the recession (Eq. 17 and Eq. 18) of the same celestial body or of different celestial bodies in the same gravitational system; such a relationship was not known before. Given how many data are involved (the current data of the Earth and the Moon, together with the geological and fossil data), such success is hard to explain by “coincidence” or “luck”, even if one thinks that this is just an “ad hoc” or a wrong model, although the chance of such a “coincidence” or “luck” occurring naturally could be greater than that of winning a jackpot lottery. Future Mars data will clarify this; otherwise, a new theory based on a different approach could be developed to give the same or a better description than this model does. It is certain that this model is not perfect and may have defects; further development may be conducted.

James Clerk Maxwell said in 1873: “The vast interplanetary and interstellar regions will no longer be regarded as waste places in the universe, which the Creator has not seen fit to fill with the symbols of the manifold order of His kingdom.
We shall find them to be already full of this wonderful medium; so full, that no human power can remove it from the smallest portion of space, or produce the slightest flaw in its infinite continuity. It extends unbroken from star to star ….” The medium that Maxwell talked about is the aether, which was proposed as the carrier of light wave propagation. The Michelson-Morley experiment only proved that light wave propagation does not depend on such a medium; it did not reject the existence of the medium in interstellar space. In fact, the concept of the interstellar medium has developed dramatically in recent times, with notions such as dark matter, dark energy and cosmic fluid. The dark matter field fluid is just a part of such a wonderful medium, “precisely” as described by Maxwell.

8. Conclusion

The evolution of the Earth-Moon system can be described by the dark matter field fluid model with a non-Newtonian approach, and the current data of the Earth and the Moon fit this model very well. 4.5 billion years ago, the closest distance of the Moon to the Earth could have been about 259000 km, which is far beyond the Roche limit, and the length of day was about 8 hours. The general pattern of the evolution of the Moon-Earth system described by this model agrees with geological and fossil evidence. Tidal friction may not be the primary cause of the evolution of the Earth-Moon system. The rotation of Mars is also slowing, with an angular acceleration rate of about -4.38 × 10^(-22) rad s^(-2).

References

S. G. Brush, 1983. In L. R. Godfrey (editor), Ghost from the Nineteenth century: Creationist Arguments for a young Earth. Scientists confront creationism. W. W. Norton & Company, New York, London, pp 49.
E. Chaisson and S. McMillan, 1993. Astronomy Today, Prentice Hall, Englewood Cliffs, NJ 07632.
J. O. Dickey, et al., 1994. Science, 265, 482.
D. G. Finch, 1981. Earth, Moon, and Planets, 26(1), 109.
K. S. Hansen, 1982. Rev. Geophys. and Space Phys. 20(3), 457.
W. K. Hartmann, D. R.
Davis, 1975. Icarus, 24, 504.
B. A. Kagan, N. B. Maslova, 1994. Earth, Moon and Planets 66, 173.
B. A. Kagan, 1997. Prog. Oceanog. 40, 109.
E. P. Kvale, H. W. Johnson, C. O. Sonett, A. W. Archer, and A. Zawistoski, 1999. J. Sediment. Res. 69(6), 1154.
K. Lang, 1992. Astrophysical Data: Planets and Stars, Springer-Verlag, New York.
H. Pan, 2005. Internat. J. Modern Phys. A, 20(14), 3135.
R. D. Ray, B. G. Bills, B. F. Chao, 1999. J. Geophys. Res. 104(B8), 17653.
C. T. Scrutton, 1978. In P. Brosche, J. Sundermann (editors), Tidal Friction and the Earth’s Rotation. Springer-Verlag, Berlin, pp. 154.
L. B. Slichter, 1963. J. Geophys. Res. 68, 14.
C. P. Sonett, E. P. Kvale, M. A. Chan, T. M. Demko, 1996. Science, 273, 100.
F. D. Stacey, 1977. Physics of the Earth, second edition. John Wiley & Sons.
J. W. Wells, 1963. Nature, 197, 948.

Captions

Figure 1, the evolution of the Moon’s distance and the length of day of the Earth with the age of the Earth-Moon system. Solid lines are calculated according to the dark matter field fluid model. Data sources: the Moon distances are from Kvale et al.; for the length of day, (a and b) are from Scrutton (page 186, fig. 8) and c is from Sonett et al. The dashed line marks the current age of the Earth-Moon system.

Figure 2, the evolution of solar days per year with the age of the Earth-Moon system. The solid line is calculated according to the dark matter field fluid model. The data are from Wells (3.9 ~ 4.435 billion years range), Sonett (3.6 billion years) and the current age (4.5 billion years).

Figure 3, the skeptical description of the evolution of Mars’s length of day and of the solar days per Mars year with the age of Mars (assuming that the age of Mars is about 4.5 billion years). The vertical dashed line marks the current age of Mars.
[Figure 1: Moon's distance and the length of day of Earth versus the age of the Earth-Moon system (10^9 years). Curves and markers: distance, length of day, Roche's limit, Hansen's result.]

[Figure 2: Solar days per year versus the age of the Earth (10^9 years).]

ABSTRACT

The evolution of the Earth-Moon system is described by the dark matter field fluid model proposed in the Meeting of Division of Particle and Field 2004, American Physical Society. The current behavior of the Earth-Moon system agrees with this model very well, and the general pattern of the evolution of the Moon-Earth system described by this model agrees with geological and fossil evidence. The closest distance of the Moon to Earth was about 259000 km 4.5 billion years ago, which is far beyond the Roche limit. The result suggests that tidal friction may not be the primary cause of the evolution of the Earth-Moon system. The average dark matter field fluid constant derived from Earth-Moon system data is 4.39 x 10^(-22) s^(-1)m^(-1). This model predicts that the rotation of Mars is also slowing, with an angular acceleration rate of about -4.38 x 10^(-22) rad s^(-2).

<|endoftext|><|startoftext|>

Introduction

The chief purpose of this paper is to show bijectively that a determinant of Stirling cycle numbers counts unlabeled acyclic single-source automata. Specifically, let Ak(n) denote the kn × kn matrix with (i, j) entry $\left[{\lfloor (i-1)/k \rfloor + 2 \atop \lfloor (i-1)/k \rfloor + 1 + i - j}\right]$, where $\left[{i \atop j}\right]$ is the Stirling cycle number, the number of permutations on [i] with j cycles.
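The definition of Ak(n) can be explored numerically. The following sketch builds the matrix from the standard recurrence for Stirling cycle numbers and evaluates its determinant exactly with rational arithmetic; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling_cycle(n, j):
    """Number of permutations of [n] with j cycles (0 outside 0 <= j <= n)."""
    if n == 0 and j == 0:
        return 1
    if n <= 0 or j <= 0 or j > n:
        return 0
    return stirling_cycle(n - 1, j - 1) + (n - 1) * stirling_cycle(n - 1, j)


def build_A(k, n):
    """The kn x kn matrix A_k(n), with rows and columns indexed from 1."""
    size = k * n
    return [
        [stirling_cycle((i - 1) // k + 2, (i - 1) // k + 1 + i - j)
         for j in range(1, size + 1)]
        for i in range(1, size + 1)
    ]


def exact_det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


if __name__ == "__main__":
    for row in build_A(2, 5):
        print(row)
    print([exact_det(build_A(1, n)) for n in range(1, 6)])
```

For k = 1 the determinants computed this way can be compared with the values 1, 3, 16, 127, 1363 that the paper quotes later for automata on a 2-letter alphabet.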
For example,

\[ A_2(5) = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 3 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 3 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 6 & 11 & 6 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 6 & 11 & 6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 10 & 35 & 50 & 24 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 10 & 35 & 50 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 15 & 85 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 15
\end{pmatrix} \]

http://arxiv.org/abs/0704.0004v1

As evident in the example, Ak(n) is formed from k copies of each of rows 2 through n+1 of the Stirling cycle triangle, arranged so that the first nonzero entry in each row is a 1 and, after the first row, this 1 occurs just before the main diagonal; in other words, Ak(n) is a Hessenberg matrix with 1s on the infra-diagonal. We will show

Main Theorem. The determinant of Ak(n) is the number of unlabeled acyclic single-source automata with n transient states on a (k + 1)-letter input alphabet.

Section 2 reviews basic terminology for automata and recurrence relations to count finite acyclic automata. Section 3 introduces column-marked subdiagonal paths, which play an intermediate role, and a way to code them. Section 4 presents a bijection from these column-marked subdiagonal paths to unlabeled acyclic single-source automata. Finally, Section 5 evaluates det Ak(n) using a sign-reversing involution and shows that the determinant counts the codes for column-marked subdiagonal paths.

2 Automata

A (complete, deterministic) automaton consists of a set of states and an input alphabet whose letters transform the states among themselves: a letter and a state produce another state (possibly the same one). A finite automaton (finite set of states, finite input alphabet of, say, k letters) can be represented as a k-regular directed multigraph with ordered edges: the vertices represent the states and the first, second, . . . edge from a vertex give the effect of the first, second, . . . alphabet letter on that state. A finite automaton cannot be acyclic in the usual sense of no cycles: pick a vertex and follow any path from it. This path must ultimately hit a previously encountered vertex, thereby creating a cycle.
So the term acyclic is used in the looser sense that only one vertex, called the sink, is involved in cycles. This means that all edges from the sink loop back to itself (and may safely be omitted) and all other paths feed into the sink. A non-sink state is called transient. The size of an acyclic automaton is the number of transient states. An acyclic automaton of size n thus has transient states which we label 1, 2, . . . , n and a sink, labeled n + 1.

Liskovets [1] uses the inclusion-exclusion principle (more about this below) to obtain the following recurrence relation for the number ak(n) of acyclic automata of size n on a k-letter input alphabet (k ≥ 1):

\[ a_k(0) = 1; \qquad a_k(n) = \sum_{j=0}^{n-1} \binom{n}{j} (-1)^{n-j-1} (j+1)^{k(n-j)} a_k(j), \quad n \ge 1. \]

A source is a vertex with no incoming edges. A finite acyclic automaton has at least one source because a path traversed backward v1 ← v2 ← v3 ← . . . must have distinct vertices and so cannot continue indefinitely. An automaton is single-source (or initially connected) if it has only one source. Let Bk(n) denote the set of single-source acyclic finite (SAF) automata on a k-letter input alphabet with vertices 1, 2, . . . , n + 1 where 1 is the source and n + 1 is the sink, and set bk(n) = | Bk(n) |. The two-line representation of an automaton in Bk(n) is the 2 × kn matrix whose columns list the edges in order. For example,

\[ B = \begin{pmatrix}
1 & 1 & 1 & 2 & 2 & 2 & 3 & 3 & 3 & 4 & 4 & 4 & 5 & 5 & 5 \\
2 & 4 & 6 & 6 & 6 & 6 & 6 & 6 & 6 & 3 & 5 & 3 & 2 & 2 & 6
\end{pmatrix} \]

is in B3(5) and the source-to-sink paths in B include 1 → 6, 1 → 6, 1 → 6, where the alphabet is {a, b, c}.

Proposition 1. The number bk(n) of SAF automata of size n on a k-letter input alphabet (n, k ≥ 1) is given by

\[ b_k(n) = \sum_{i=1}^{n} (-1)^{n-i} \binom{n-1}{i-1} (i+1)^{k(n-i)} a_k(i). \]

Remark. This formula is a bit more succinct than the recurrence in [1, Theorem 3.2].

Proof. Consider the set A of acyclic automata with transient vertices [n] = {1, 2, . . . , n} in which 1 is a source. Call 2, 3, . . . , n the interior vertices.
For X ⊆ [2, n], let

f(X) = # automata in A whose set of interior source vertices includes X,
g(X) = # automata in A whose set of interior source vertices is precisely X.

Then $f(X) = \sum_{Y : X \subseteq Y \subseteq [2,n]} g(Y)$ and by Möbius inversion [2] on the lattice of subsets of [2, n], $g(X) = \sum_{Y : X \subseteq Y \subseteq [2,n]} \mu(X, Y) f(Y)$, where µ(X, Y) is the Möbius function for this lattice. Since $\mu(X, Y) = (-1)^{|Y| - |X|}$ if X ⊆ Y, we have in particular that

\[ g(\emptyset) = \sum_{Y \subseteq [2,n]} (-1)^{|Y|} f(Y). \qquad (1) \]

Let | Y | = n − i so that 1 ≤ i ≤ n; there are $\binom{n-1}{n-i} = \binom{n-1}{i-1}$ such subsets Y of [2, n]. When Y consists entirely of sources, the vertices in [n + 1]\Y and their incident edges form a subautomaton with i transient states; there are ak(i) such. Also, all edges from the n − i vertices comprising Y go directly into [n + 1]\Y: $(i+1)^{k(n-i)}$ choices. Thus $f(Y) = (i+1)^{k(n-i)} a_k(i)$. By definition, g(∅) is the number of automata in A for which 1 is the only source, that is, g(∅) = bk(n), and the Proposition now follows from (1).

An unlabeled SAF automaton is an equivalence class of SAF automata under relabeling of the interior vertices. Liskovets notes [1] (and we prove below) that Bk(n) has no nontrivial automorphisms, that is, each of the (n − 1)! relabelings of the interior vertices of B ∈ Bk(n) produces a different automaton. So unlabeled SAF automata of size n on a k-letter alphabet are counted by $\frac{1}{(n-1)!} b_k(n)$. The next result establishes a canonical representative in each relabeling class.

Proposition 2. Each equivalence class in Bk(n) under relabeling of interior vertices has size (n − 1)! and contains exactly one SAF automaton with the “last occurrences increasing” property: the last occurrences of the interior vertices 2, 3, . . . , n in the bottom row of its two-line representation occur in that order.

Proof. The first assertion follows from the fact that the interior vertices of an automaton B ∈ Bk(n) can be distinguished intrinsically, that is, independent of their labeling.
To see this, first mark the source, namely 1, with a mark (new label) v1 and observe that there exists at least one interior vertex whose only incoming edge(s) are from the source (the only currently marked vertex), for otherwise a cycle would be present. For each such interior vertex v, choose the last edge from the marked vertex to v using the built-in ordering of these edges. This determines an order on these vertices; mark them in order v2, v3, . . . , vj (j ≥ 2). If there still remain unmarked interior vertices, at least one of them has incoming edges only from marked vertices, or again a cycle would be present. For each such vertex, use the last incoming edge from a marked vertex, where now edges are arranged in order of initial vertex vi with the built-in order breaking ties, to order and mark these vertices vj+1, vj+2, . . .. Proceed similarly until all interior vertices are marked. For example, for

\[ \begin{pmatrix}
1 & 1 & 1 & 2 & 2 & 2 & 3 & 3 & 3 & 4 & 4 & 4 & 5 & 5 & 5 \\
2 & 4 & 6 & 6 & 6 & 6 & 6 & 6 & 6 & 3 & 5 & 3 & 2 & 2 & 6
\end{pmatrix} \]

v1 = 1 and there is just one interior vertex, namely 4, whose only incoming edge is from the source, and so v2 = 4 and 4 becomes a marked vertex. Now all incoming edges to both 3 and 5 are from marked vertices, and the last such edges (built-in order comes into play) are 4 → 5 and 4 → 3, putting vertices 3, 5 in the order 5, 3. So v3 = 5 and v4 = 3. Finally, v5 = 2. This proves the first assertion. By construction of the vs, relabeling each interior vertex i with the subscript of its corresponding v produces an automaton in Bk(n) with the “last occurrences increasing” property, and it is the only relabeling that does so. The example yields

\[ \begin{pmatrix}
1 & 1 & 1 & 2 & 2 & 2 & 3 & 3 & 3 & 4 & 4 & 4 & 5 & 5 & 5 \\
5 & 2 & 6 & 4 & 3 & 4 & 5 & 5 & 6 & 6 & 6 & 6 & 6 & 6 & 6
\end{pmatrix} \]

Now let Ck(n) denote the set of canonical SAF automata in Bk(n) representing unlabeled automata; thus $|C_k(n)| = \frac{1}{(n-1)!} b_k(n)$. Henceforth, we identify an unlabeled automaton with its canonical representative.
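The counting formulas of this section are easy to evaluate by machine. The sketch below implements the recurrence for ak(n), the sum of Proposition 1 for bk(n), and the resulting count of unlabeled automata bk(n)/(n − 1)!. The binomial coefficients follow the inclusion-exclusion argument in the proof of Proposition 1; function names are illustrative.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def acyclic_count(k, n):
    """a_k(n): labeled acyclic automata with n transient states, k letters."""
    if n == 0:
        return 1
    return sum(
        comb(n, j) * (-1) ** (n - j - 1) * (j + 1) ** (k * (n - j)) * acyclic_count(k, j)
        for j in range(n)
    )


def saf_count(k, n):
    """b_k(n): single-source acyclic automata, via Proposition 1."""
    return sum(
        comb(n - 1, i - 1) * (-1) ** (n - i) * (i + 1) ** (k * (n - i)) * acyclic_count(k, i)
        for i in range(1, n + 1)
    )


def unlabeled_saf_count(k, n):
    """|C_k(n)| = b_k(n) / (n-1)!; the division is expected to be exact."""
    quotient, remainder = divmod(saf_count(k, n), factorial(n - 1))
    if remainder:
        raise ValueError("b_k(n) is not divisible by (n-1)!")
    return quotient


if __name__ == "__main__":
    print([acyclic_count(2, n) for n in range(5)])
    print([saf_count(2, n) for n in range(1, 5)])
    print([unlabeled_saf_count(2, n) for n in range(1, 6)])
```

For k = 2 the first values of ak(n) are 1, 1, 7, 142, 5941, and the case n = 2 can be confirmed by hand: of the 7 acyclic automata on two transient states, exactly 3 have vertex 1 as their only source.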
3 Column-Marked Subdiagonal Paths

A subdiagonal (k, n, p)-path is a lattice path of steps E = (1, 0) and N = (0, 1), E for east and N for north, from (0, 0) to (kn, p) that never rises above the line $y = \frac{1}{k} x$. Let Ck(n, p) denote the set of such paths. For k ≥ 1, it is clear that Ck(n, p) is nonempty only for 0 ≤ p ≤ n, and it is known (generalized ballot theorem) that

\[ |C_k(n, p)| = \frac{kn - kp + 1}{kn + p + 1} \binom{kn + p + 1}{p}. \]

A path P in Ck(n, n) can be coded by the heights of its E steps above the line y = −1; this gives a sequence $(b_i)_{i=1}^{kn}$ subject to the restrictions 1 ≤ b1 ≤ b2 ≤ . . . ≤ bkn and bi ≤ ⌈i/k⌉ for all i.

A column-marked subdiagonal (k, n, p)-path is one in which, for each i ∈ [1, kn], one of the lattice squares below the ith E step and above the horizontal line y = −1 is marked, say with a ‘∗’. Let $C^*_k(n, p)$ denote the set of such marked paths.

[Figure: a marked lattice path from (0,0) to (8,4), drawn with the line y = −1 and the bounding diagonal, with ‘∗’ marks in its columns. Caption: A path in $C^*_2(4, 3)$.]

A marked path P∗ in $C^*_k(n, n)$ can be coded by a sequence of pairs $((a_i, b_i))_{i=1}^{kn}$, where $(b_i)_{i=1}^{kn}$ is the code for the underlying path P and ai ∈ [1, bi] gives the position of the ∗ in the ith column. The example is coded by (1, 1), (1, 1), (1, 2), (2, 2), (1, 2), (3, 3), (1, 3), (2, 3).

An explicit sum for $|C^*_k(n, n)|$ is

\[ |C^*_k(n, n)| = \sum_{\substack{1 \le b_1 \le b_2 \le \dots \le b_{kn} \\ b_i \le \lceil i/k \rceil \text{ for all } i}} b_1 b_2 \cdots b_{kn}, \]

because the summand b1 b2 . . . bkn is the number of ways to insert the ‘∗’s in the underlying path coded by $(b_i)_{i=1}^{kn}$. It is also possible to obtain a recurrence for $|C^*_k(n, p)|$, and then, using Prop. 1, to show analytically that $|C^*_k(n, n)| = |C_{k+1}(n)|$. However, it is much more pleasant to give a bijection, and in the next section we will do so. In particular, the number of SAF automata on a 2-letter alphabet is

\[ |C_2(n)| = |C^*_1(n, n)| = \sum_{\substack{1 \le b_1 \le b_2 \le \dots \le b_n \\ b_i \le i \text{ for all } i}} b_1 b_2 \cdots b_n = (1, 3, 16, 127, 1363, \dots)_{n \ge 1}, \]

sequence A082161 in [3].

4 Bijection from Paths to Automata

In this section we exhibit a bijection from $C^*_k(n, n)$ to Ck+1(n).
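The counts from Section 3 can be checked by direct enumeration of path codes. This is a small sketch with illustrative names: it walks over all nondecreasing codes with bi ≤ ⌈i/k⌉, summing either 1 (to count paths in Ck(n, n)) or the product b1 · · · bkn (to count marked paths), and it evaluates the ballot formula for comparison.

```python
from math import comb


def _sum_over_codes(k, n, weighted):
    """Sum over codes (b_1..b_kn): weight 1, or the product of the b_i."""
    length = k * n

    def extend(i, previous):
        if i > length:
            return 1
        cap = -(-i // k)  # ceil(i / k)
        return sum(
            (b if weighted else 1) * extend(i + 1, b)
            for b in range(previous, cap + 1)
        )

    return extend(1, 1)


def path_count_by_enumeration(k, n):
    """|C_k(n, n)| obtained by listing all codes."""
    return _sum_over_codes(k, n, weighted=False)


def marked_path_count(k, n):
    """|C*_k(n, n)| as the sum of b_1 * ... * b_kn over all codes."""
    return _sum_over_codes(k, n, weighted=True)


def ballot_count(k, n, p):
    """|C_k(n, p)| from the generalized ballot formula."""
    return (k * n - k * p + 1) * comb(k * n + p + 1, p) // (k * n + p + 1)


if __name__ == "__main__":
    print([marked_path_count(1, n) for n in range(1, 6)])
    print(path_count_by_enumeration(2, 2), ballot_count(2, 2, 2))
```

The enumeration is exponential in kn, so it is only meant for the small cases used to sanity-check the formulas.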
Using the illustrated path as a working example with k = 2 and n = 4,

[Figure: the marked path of Section 3 from (0,0) to (8,4), repeated.]

first construct the top row of a two-line representation consisting of k + 1 each of 1s, 2s, . . . , ns and number the positions left to right:

position:   1  2  3  4  5  6  7  8  9 10 11 12
top row:    1  1  1  2  2  2  3  3  3  4  4  4

The last step in the path is necessarily an N step. For the second last, third last, . . . N steps in the path, count the number of steps following it. This gives a sequence i1, i2, . . . , in−1 satisfying 1 ≤ i1 < i2 < . . . < in−1 and ij ≤ (k + 1)j for all j. (In the example the path is EENEEENEEENN, so i1 = 1, i2 = 5, i3 = 9.) Circle the positions i1, i2, . . . , in−1 in the two-line representation and then insert (in boldface) 2, 3, . . . , n in the second row in the circled positions:

top row:    1  1  1  2  2  2  3  3  3  4  4  4
second row: 2  _  _  _  3  _  _  _  4  _  _  _

These will be the last occurrences of 2, 3, . . . , n in the second row. Working from the last column in the path back to the first, fill in the blanks in the second row left to right as follows. Count the number of squares from the ∗ up to the path (including the ∗ square) and add this number to the nearest boldface number to the left of the current blank entry (if there are no boldface numbers to the left, add this number to 1) and insert the result in the current blank square. In the example the numbers of squares are 2, 3, 1, 2, 1, 2, 1, 1, yielding

top row:    1  1  1  2  2  2  3  3  3  4  4  4
second row: 2  4  5  3  3  5  4  5  4  5  5  _

This will fill all blank entries except the last. Note that ∗s in the bottom row correspond to sink (that is, n+1) labels in the second row. Finally, insert n+1 into the last remaining blank space to give the image automaton:

\[ \begin{pmatrix}
1 & 1 & 1 & 2 & 2 & 2 & 3 & 3 & 3 & 4 & 4 & 4 \\
2 & 4 & 5 & 3 & 3 & 5 & 4 & 5 & 4 & 5 & 5 & 5
\end{pmatrix} \]

This process is fully reversible and the map is a bijection.

5 Evaluation of det Ak(n)

For simplicity, we treat the case k = 1, leaving the generalization to arbitrary k as a not-too-difficult exercise for the interested reader. Write A(n) for A1(n). Thus

\[ A(n) = \left( \left[{i+1 \atop 2i-j}\right] \right)_{1 \le i, j \le n}. \]
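Returning briefly to the map of Section 4, the construction can be sketched in code. The function below takes the code ((a_i, b_i)) of a marked path and returns the two rows of the image automaton. It is a direct transcription of the steps described in Section 4, with illustrative names, and it does not validate that the input is a legal code.

```python
def path_code_to_automaton(k, n, code):
    """Map the code [(a_1, b_1), ..., (a_kn, b_kn)] of a marked path to the
    two-line representation (top, bottom) of an automaton on k+1 letters."""
    kn = k * n
    if len(code) != kn:
        raise ValueError("code must have k*n pairs")
    heights = [b for _, b in code]
    width = (k + 1) * n
    bottom = [None] * width

    # Circled positions: the m-th entry counts the steps after the (m+1)-th
    # last N step. The j-th N step comes after all E steps with b_i <= j.
    for m in range(1, n):
        j = n - m
        east_before = sum(1 for b in heights if b <= j)
        position = kn - east_before + m          # 1-based position in the row
        bottom[position - 1] = m + 1             # "boldface" entry

    # Squares from the mark up to the path, read from the last column back.
    gaps = iter([b - a + 1 for a, b in reversed(code)])
    nearest_bold = 1
    for index in range(width):
        if bottom[index] is not None:
            nearest_bold = bottom[index]
            continue
        gap = next(gaps, None)
        # The final blank has no column left and receives the sink label.
        bottom[index] = n + 1 if gap is None else nearest_bold + gap

    top = [state for state in range(1, n + 1) for _ in range(k + 1)]
    return top, bottom


if __name__ == "__main__":
    example = [(1, 1), (1, 1), (1, 2), (2, 2), (1, 2), (3, 3), (1, 3), (2, 3)]
    top_row, bottom_row = path_code_to_automaton(2, 4, example)
    print(top_row)
    print(bottom_row)
```

Running it on the worked example with k = 2 and n = 4 gives the image automaton displayed at the end of Section 4.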
From the definition of det A(n) as a sum of signed products, we show that det A(n) is the total weight of certain lists of permutations, each list carrying weight ±1. Then a weight-reversing involution cancels all −1 weights and reduces the problem to counting the surviving lists. These surviving lists are essentially the codes for paths in $C^*_1(n, n)$, and the Main Theorem follows from §4.

To describe the permutations giving a nonzero contribution to $\det A(n) = \sum_{\sigma} \operatorname{sgn} \sigma \prod_{i=1}^{n} a_{i,\sigma(i)}$, define the code of a permutation σ on [n] to be the list $c = (c_i)_{i=1}^{n}$ with ci = σ(i) − (i − 1). Since the (i, j) entry of A(n), $\left[{i+1 \atop 2i-j}\right]$, is 0 unless j ≥ i − 1, we must have σ(i) ≥ i − 1 for all i. It is well known that there are $2^{n-1}$ such permutations, corresponding to compositions of n, with codes characterized by the following four conditions: (i) ci ≥ 0 for all i, (ii) c1 ≥ 1, (iii) each ci ≥ 1 is immediately followed by ci − 1 zeros in the list, (iv) $\sum_{i=1}^{n} c_i = n$. Let us call such a list a padded composition of n: deleting the zeros is a bijection to ordinary compositions of n. For example, (3, 0, 0, 1, 2, 0) is a padded composition of 6. For a permutation σ with padded composition code c, the nonzero entries in c give the cycle lengths of σ. Hence sgn σ, which is the parity of “n − #cycles in σ”, is given by $(-1)^{\#\,0\text{s in } c}$. We have

\[ \det A(n) = \sum_{\sigma} \operatorname{sgn} \sigma \prod_{i=1}^{n} a_{i,\sigma(i)} = \sum_{\sigma} \operatorname{sgn} \sigma \prod_{i=1}^{n} \left[{i+1 \atop 2i-\sigma(i)}\right], \]

and so

\[ \det A(n) = \sum_{c} (-1)^{\#\,0\text{s in } c} \prod_{i=1}^{n} \left[{i+1 \atop i+1-c_i}\right] \qquad (2) \]

where the sum is restricted to padded compositions c of n with ci ≤ i for all i (A002083), because $\left[{i+1 \atop i+1-c_i}\right] = 0$ unless ci ≤ i.

Henceforth, let us write all permutations in standard cycle form, whereby the smallest entry occurs first in each cycle and these smallest entries increase left to right. Thus, with dashes separating cycles, 154-2-36 is the standard cycle form of the permutation $\left( {1\ 2\ 3\ 4\ 5\ 6 \atop 5\ 2\ 6\ 1\ 4\ 3} \right)$. We define a nonfirst entry to be one that does not start a cycle. Thus the preceding permutation has 3 nonfirst entries: 5, 4, 6.
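The statements about permutation codes and standard cycle form can be verified by brute force for small n. The sketch below, with illustrative names, lists the permutations with σ(i) ≥ i − 1, computes their codes, tests the padded-composition conditions, and computes standard cycle form and nonfirst entries.

```python
from itertools import permutations


def admissible_codes(n):
    """Pairs (sigma, code) for permutations of [n] with sigma(i) >= i - 1."""
    result = []
    for sigma in permutations(range(1, n + 1)):
        if all(sigma[i - 1] >= i - 1 for i in range(1, n + 1)):
            code = tuple(sigma[i - 1] - (i - 1) for i in range(1, n + 1))
            result.append((sigma, code))
    return result


def is_padded_composition(code):
    """Check conditions (i)-(iv) by walking from part to part."""
    n = len(code)
    if sum(code) != n or min(code) < 0:
        return False
    i = 0
    while i < n:
        part = code[i]
        if part < 1 or i + part > n or any(code[i + 1:i + part]):
            return False
        i += part
    return True


def standard_cycle_form(sigma):
    """Cycles of sigma (sigma[i-1] is the image of i), smallest entry first,
    with first entries increasing from left to right."""
    seen, cycles = set(), []
    for start in range(1, len(sigma) + 1):
        if start in seen:
            continue
        cycle, current = [start], sigma[start - 1]
        seen.add(start)
        while current != start:
            cycle.append(current)
            seen.add(current)
            current = sigma[current - 1]
        cycles.append(cycle)
    return cycles


def nonfirst_entries(sigma):
    return [x for cycle in standard_cycle_form(sigma) for x in cycle[1:]]


def sign(sigma):
    return (-1) ** (len(sigma) - len(standard_cycle_form(sigma)))


if __name__ == "__main__":
    print(standard_cycle_form((5, 2, 6, 1, 4, 3)))
    print(nonfirst_entries((5, 2, 6, 1, 4, 3)))
    print(len(admissible_codes(5)))
```

Because the outer loop visits starting points in increasing order, each cycle is automatically written with its smallest entry first.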
Note that the number of nonfirst entries is 0 only for the identity permutation. We denote an identity permutation (of any size) by ε.

By definition of Stirling cycle number, the product in (2) counts lists $(\pi_i)_{i=1}^{n}$ of permutations where πi is a permutation on [i+1] with i+1−ci cycles, equivalently, with ci ≤ i nonfirst entries. So define Ln to be the set of all lists of permutations $\pi = (\pi_i)_{i=1}^{n}$ where πi is a permutation on [i + 1]; the number of nonfirst entries in πi is ≤ i; π1 is the transposition (1,2); and each nonidentity permutation πi is immediately followed by ci − 1 ε’s, where ci ≥ 1 is the number of nonfirst entries in πi (so the total number of nonfirst entries is n). Assign a weight to π ∈ Ln by $\mathrm{wt}(\pi) = (-1)^{\#\,\epsilon\text{’s in } \pi}$. Then

\[ \det A(n) = \sum_{\pi \in L_n} \mathrm{wt}(\pi). \]

We now define a weight-reversing involution on (most of) Ln. Given π ∈ Ln, scan the list of its component permutations π1 = (1, 2), π2, π3, . . . left to right. Stop at the first one that either (i) has more than one nonfirst entry, or (ii) has only one nonfirst entry, b say, and b > maximum nonfirst entry m of the next permutation in the list. Say πk is the permutation where we stop.

In case (i) decrement (i.e. decrease by 1) the number of ε’s in the list by splitting πk into two nonidentity permutations as follows. Let m be the largest nonfirst entry of πk and let ℓ be its predecessor. Replace πk and its successor in the list (necessarily an ε) by the following two permutations: first the transposition (ℓ, m) and second the permutation obtained from πk by erasing m from its cycle and turning it into a singleton. Here are two examples of this case (recall permutations are in standard cycle form and, for clarity, singleton cycles are not shown).
First example:

i     1    2    3    4        5        6
πi    12   13   23   14-253   ε        ε
  →
i     1    2    3    4        5        6
πi    12   13   23   25       14-23    ε

Second example:

i     1    2    3    4        5        6
πi    12   23   14   13-24    ε        23
  →
i     1    2    3    4        5        6
πi    12   23   14   24       13       23

The reader may readily check that this sends case (i) to case (ii).

In case (ii), πk is a transposition (a, b) with b > maximum nonfirst entry m of πk+1. In this case, increment the number of ε’s in the list by combining πk and πk+1 into a single permutation followed by an ε: in πk+1, b is a singleton; delete this singleton and insert b immediately after a in πk+1 (in the same cycle). The reader may check that this reverses the result in the two examples above and, in general, sends case (ii) to case (i). Since the map alters the number of ε’s in the list by 1, it is clearly weight-reversing. The map fails only for lists that both consist entirely of transpositions and have the form (a1, b1), (a2, b2), . . . , (an, bn) with b1 ≤ b2 ≤ . . . ≤ bn. Such lists have weight 1. Hence det A(n) is the number of lists $((a_i, b_i))_{i=1}^{n}$ satisfying 1 ≤ ai < bi ≤ i + 1 for 1 ≤ i ≤ n, and b1 ≤ b2 ≤ . . . ≤ bn. After subtracting 1 from each bi, these lists code the paths in $C^*_1(n, n)$ and, using §4, $\det A(n) = |C^*_1(n, n)| = |C_2(n)|$.

References

[1] Valery A. Liskovets, Exact enumeration of acyclic deterministic automata, Disc. Appl. Math., in press, 2006. Earlier version available at http://www.i3s.unice.fr/fpsac/FPSAC03/articles.html
[2] J. H. van Lint and R. M. Wilson, A Course in Combinatorics, 2nd ed., Cambridge University Press, NY, 2001.
[3] Neil J. Sloane (founder and maintainer), The On-Line Encyclopedia of Integer Sequences, http://www.research.att.com:80/~njas/sequences/index.html?blank=1

ABSTRACT

We show that a determinant of Stirling cycle numbers counts unlabeled acyclic single-source automata.
The proof involves a bijection from these automata to certain marked lattice paths and a sign-reversing involution to evaluate the determinant.

<|endoftext|><|startoftext|>

FROM DYADIC Λα TO Λα

WAEL ABU-SHAMMALA AND ALBERTO TORCHINSKY

Abstract. In this paper we show how to compute the Λα norm, α ≥ 0, using the dyadic grid. This result is a consequence of the description of the Hardy spaces H^p(R^N) in terms of dyadic and special atoms.

Recently, several novel methods for computing the BMO norm of a function f in two dimensions were discussed in [9]. Given its importance, it is also of interest to explore the possibility of computing the norm of a BMO function, or more generally a function in the Lipschitz class Λα, using the dyadic grid in R^N. It turns out that the BMO question is closely related to that of approximating functions in the Hardy space H^1(R^N) by the Haar system. The approximation in H^1(R^N) by affine systems was proved in [2], but this result does not apply to the Haar system. Now, if HA(R) denotes the closure of the Haar system in H^1(R), it is not hard to see that the distance d(f, HA) of f ∈ H^1(R) to HA is $\sim \left| \int_0^{\infty} f(x)\,dx \right|$, see [1]. Thus, neither do dyadic atoms suffice to describe the Hardy spaces, nor can the evaluation of the norm in BMO be reduced to a straightforward computation using the dyadic intervals. In this paper we address both of these issues. First, we give a characterization of the Hardy spaces H^p(R^N) in terms of dyadic and special atoms, and then, by a duality argument, we show how to compute the norm in Λα(R^N), α ≥ 0, using the dyadic grid.

We begin by introducing some notations. Let J denote a family of cubes Q in R^N, and Pd the collection of polynomials in R^N of degree less than or equal to d. Given α ≥ 0, Q ∈ J, and a locally integrable function g, let pQ(g) denote the unique polynomial in P[α] such that [g − pQ(g)]χQ has vanishing moments up to order [α].
For a locally square-integrable function $g$, we consider the maximal function $M_{\alpha,J}\, g(x)$ given by

$M_{\alpha,J}\, g(x) = \sup_{x\in Q,\, Q\in J} \frac{1}{|Q|^{\alpha/N}} \Big( \frac{1}{|Q|} \int_Q |g(y) - p_Q(g)(y)|^2\, dy \Big)^{1/2}$.

1991 Mathematics Subject Classification. 42B30, 42B35. arXiv: http://arxiv.org/abs/0704.0005v1

The Lipschitz space $\Lambda_{\alpha,J}$ consists of those functions $g$ such that $M_{\alpha,J}\, g$ is in $L^\infty$, with $\|g\|_{\Lambda_{\alpha,J}} = \|M_{\alpha,J}\, g\|_\infty$; when the family in question contains all cubes in $R^N$, we simply omit the subscript $J$. Of course, $\Lambda_0 = BMO$.

Two other families, of dyadic nature, are of interest to us. Intervals in $R$ of the form $I_{n,k} = [(k-1)2^n, k2^n]$, where $k$ and $n$ are arbitrary integers, positive, negative or 0, are said to be dyadic. In $R^N$, cubes which are the product of dyadic intervals of the same length, i.e., of the form $Q_{n,k} = I_{n,k_1} \times \cdots \times I_{n,k_N}$, are called dyadic, and the collection of all such cubes is denoted $D$.

There is also the family $D_0$. Let $I'_{n,k} = [(k-1)2^n, (k+1)2^n]$, where $k$ and $n$ are arbitrary integers. Clearly $I'_{n,k}$ is dyadic if $k$ is odd, but not if $k$ is even. Now, the collection $\{I'_{n,k} : n, k \text{ integers}\}$ contains all dyadic intervals as well as the shifts $[(k-1)2^n + 2^{n-1}, k2^n + 2^{n-1}]$ of the dyadic intervals by their half length. In $R^N$, put $D_0 = \{Q'_{n,k} : Q'_{n,k} = I'_{n,k_1} \times \cdots \times I'_{n,k_N}\}$; $Q'_{n,k}$ is called a special cube. Note that $D_0$ contains $D$ properly.

Finally, given $I'_{n,k}$, let $I^L_{n,k} = [(k-1)2^n, k2^n]$ and $I^R_{n,k} = [k2^n, (k+1)2^n]$. The $2^N$ subcubes of $Q'_{n,k} = I'_{n,k_1} \times \cdots \times I'_{n,k_N}$ of the form $I^{S_1}_{n,k_1} \times \cdots \times I^{S_N}_{n,k_N}$, $S_j = L$ or $R$, $1 \le j \le N$, are called the dyadic subcubes of $Q'_{n,k}$.

Let $Q_0$ denote the special cube $[-1, 1]^N$. Given $\alpha \ge 0$, we construct a family $S_\alpha$ of piecewise polynomial splines in $L^2(Q_0)$ that will be useful in characterizing $\Lambda_\alpha$. Let $A$ be the subspace of $L^2(Q_0)$ consisting of all functions with vanishing moments up to order $[\alpha]$ which coincide with a polynomial in $P_{[\alpha]}$ on each of the $2^N$ dyadic subcubes of $Q_0$.
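The claim that $I'_{n,k}$ is dyadic exactly when $k$ is odd can be checked mechanically. The following is a minimal sketch in one dimension; the function names `dyadic_interval`, `special_interval` and `is_dyadic` are illustrative choices and do not come from the paper. Endpoints are kept as exact rationals so that negative $n$ causes no rounding.

```python
from fractions import Fraction


def dyadic_interval(n, k):
    """Endpoints of I_{n,k} = [(k-1) 2^n, k 2^n] as exact rationals."""
    scale = Fraction(2) ** n
    return ((k - 1) * scale, k * scale)


def special_interval(n, k):
    """Endpoints of I'_{n,k} = [(k-1) 2^n, (k+1) 2^n] as exact rationals."""
    scale = Fraction(2) ** n
    return ((k - 1) * scale, (k + 1) * scale)


def is_dyadic(interval):
    """True if [a, b] has length 2^m for some integer m and a is a multiple of 2^m."""
    a, b = interval
    length = b - a
    if length <= 0:
        return False
    num, den = length.numerator, length.denominator
    # In lowest terms, a power of two is either 2^m / 1 or 1 / 2^m.
    power_of_two = (
        (num == 1 or den == 1)
        and num & (num - 1) == 0
        and den & (den - 1) == 0
    )
    return power_of_two and (a / length).denominator == 1


if __name__ == "__main__":
    for k in range(-2, 3):
        print(k, special_interval(0, k), is_dyadic(special_interval(0, k)))
```

For even $k$ the special interval is a dyadic interval of length $2^{n+1}$ shifted by half its length, which is why the left endpoint fails the divisibility test.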
$A$ is a finite dimensional subspace of $L^2(Q_0)$, and, therefore, by the Gram-Schmidt orthogonalization process, say, $A$ has an orthonormal basis in $L^2(Q_0)$ consisting of functions $p^1, \ldots, p^M$ with vanishing moments up to order $[\alpha]$, which coincide with a polynomial in $P_{[\alpha]}$ on each dyadic subinterval of $Q_0$. Together with each $p^L$ we also consider all dyadic dilations and integer translations given by

$p^L_{n,k,\alpha}(x) = 2^{n(N+\alpha)}\, p^L(2^n x_1 + k_1, \ldots, 2^n x_N + k_N)$, $1 \le L \le M$,

and let $S_\alpha = \{p^L_{n,k,\alpha} : n, k \text{ integers}, 1 \le L \le M\}$.

Our first result shows how the dyadic grid can be used to compute the norm in $\Lambda_\alpha$.

Theorem A. Let $g$ be a locally square-integrable function and $\alpha \ge 0$. Then, $g \in \Lambda_\alpha$ if, and only if, $g \in \Lambda_{\alpha,D}$ and $A_\alpha(g) = \sup_{p\in S_\alpha} |\langle g, p\rangle| < \infty$. Moreover, $\|g\|_{\Lambda_\alpha} \sim \|g\|_{\Lambda_{\alpha,D}} + A_\alpha(g)$.

Furthermore, it is also true, and the proof is given in Proposition 2.1 below, that $\|g\|_{\Lambda_\alpha} \sim \|g\|_{\Lambda_{\alpha,D_0}}$. However, in this simpler formulation, the tree structure of the cubes in $D$ has been lost.

The proof of Theorem A relies on a close investigation of the predual of $\Lambda_\alpha$, namely, the Hardy space $H^p(R^N)$ with $0 < p = N/(\alpha + N) \le 1$. In the process we characterize $H^p$ in terms of simpler subspaces: $H^p_D$, or dyadic $H^p$, and $H^p_{S_\alpha}$, the space generated by the special atoms in $S_\alpha$. Specifically, we have

Theorem B. Let $0 < p \le 1$, and $\alpha = N(1/p - 1)$. We then have $H^p = H^p_D + H^p_{S_\alpha}$, where the sum is understood in the sense of quasinormed Banach spaces.

The paper is organized as follows. In Section 1 we show that individual $H^p$ atoms can be written as a superposition of dyadic and special atoms; this fact may be thought of as an extension of the one-dimensional result of Fridli concerning $L^\infty$ 1-atoms, see [5] and [1]. Then, we prove Theorem B. In Section 2 we discuss how to pass from $\Lambda_{\alpha,D}$, and $\Lambda_{\alpha,D_0}$, to the Lipschitz space $\Lambda_\alpha$.

1. Characterization of the Hardy spaces $H^p$

We adopt the atomic definition of the Hardy spaces $H^p$, $0 < p \le 1$, see [6] and [10].
Recall that a compactly supported function $a$ with $[N(1/p - 1)]$ vanishing moments is an $L^2$ $p$-atom with defining cube $Q$ if $\mathrm{supp}(a) \subseteq Q$, and

$|Q|^{1/p} \Big( \frac{1}{|Q|} \int_Q |a(x)|^2\, dx \Big)^{1/2} \le 1$.

The Hardy space $H^p(R^N) = H^p$ consists of those distributions $f$ that can be written as $f = \sum_j \lambda_j a_j$, where the $a_j$'s are $H^p$ atoms, $\sum_j |\lambda_j|^p < \infty$, and the convergence is in the sense of distributions as well as in $H^p$. Furthermore,

$\|f\|_{H^p} \sim \inf \big( \sum_j |\lambda_j|^p \big)^{1/p}$,

where the infimum is taken over all possible atomic decompositions of $f$. This last expression has traditionally been called the atomic $H^p$ norm of $f$.

Collections of atoms with special properties can be used to gain a better understanding of the Hardy spaces. Formally, let $A$ be a non-empty subset of $L^2$ $p$-atoms in the unit ball of $H^p$. The atomic space $H^p_A$ spanned by $A$ consists of those $\varphi$ in $H^p$ of the form

$\varphi = \sum_j \lambda_j a_j$, $a_j \in A$, $\sum_j |\lambda_j|^p < \infty$.

It is readily seen that, endowed with the atomic norm

$\|\varphi\|_{H^p_A} = \inf \big\{ \big( \sum_j |\lambda_j|^p \big)^{1/p} : \varphi = \sum_j \lambda_j a_j,\ a_j \in A \big\}$,

$H^p_A$ becomes a complete quasinormed space. Clearly, $H^p_A \subseteq H^p$, and, for $f \in H^p_A$, $\|f\|_{H^p} \le \|f\|_{H^p_A}$.

Two families are of particular interest to us. When $A$ is the collection of all $L^2$ $p$-atoms whose defining cube is dyadic, the resulting space is $H^p_D$, or dyadic $H^p$. Now, although $\|f\|_{H^p} \le \|f\|_{H^p_D}$, the two quasinorms are not equivalent on $H^p_D$. Indeed, for $p = 1$ and $N = 1$, the functions

$f_n(x) = 2^n [\chi_{[1-2^{-n},1]}(x) - \chi_{[1,1+2^{-n}]}(x)]$

satisfy $\|f_n\|_{H^1} = 1$, but $\|f_n\|_{H^1_D} \sim |n|$ tends to infinity with $n$.

Next, when $S_\alpha$ is the family of piecewise polynomial splines constructed above with $\alpha = N(1/p - 1)$, in analogy with the one-dimensional results in [4] and [1], $H^p_{S_\alpha}$ is referred to as the space generated by special atoms.

We are now ready to describe $H^p$ atoms as a superposition of dyadic and special atoms.

Lemma 1.1. Let $a$ be an $L^2$ $p$-atom with defining cube $Q$, $0 < p \le 1$, and $\alpha = N(1/p - 1)$.
Then a can be written as a linear combination of 2N dyadic atoms ai, each supported in one of the dyadic subcubes of the smallest special cube Qn,k containing Q, and a special atom b in Sα. More precisely, a(x) = i=1 di ai(x) + L=1 cL p −n,−k,α(x), with |di| , |cL| ≤ c. Proof. Suppose first that the defining cube of a is Q0, and let Q1, . . . , Q2N denote the dyadic subcubes of Q0. Furthermore, let {e i , . . . , e i } denote an orthonormal basis of the subspace Ai of L 2(Qi) consisting of polynomials in P[α], 1 ≤ i ≤ 2 N . Put αi(x) = a(x)χQi (x)− 〈aχQi , e j(x) , 1 ≤ i ≤ 2 and observe that 〈αi, e j〉 = 0 for 1 ≤ j ≤ M . Therefore, αi has [α] vanishing moments, is supported in Qi, and ‖αi‖2 ≤ ‖aχQi‖2 + ‖aχQi‖2 ≤ (M + 1) ‖aχQi‖2 . ai(x) = 2N(1/2−1/p) M + 1 αi(x) , 1 ≤ i ≤ N , is an L2 p - dyadic atom. Finally, put b(x) = a(x) − M + 1 2N(1/2−1/p) ai(x) . FROM DYADIC Λα TO Λα 5 Clearly b has [α] vanishing moments, is supported in Q0, coincides with a polynomial in P[α] on each dyadic subcube of Q0, and ‖b‖22 ≤ |〈aχQi , e 2 ≤ M ‖a‖22 . So, b ∈ A, and, consequently, b(x) = L=1 cL p L(x), where |cL| = |〈b, p L〉| ≤ c , 1 ≤ L ≤ M . In the general case, let Q be the defining cube of a, side-length Q = ℓ, and let n and k = (k1, . . . , kN ) be chosen so that 2 n−1 ≤ ℓ < 2n, and Q ⊂ [(k1 − 1)2 n, (k1 + 1)2 n]× · · · × [(kN − 1)2 n, (kN + 1)2 Then, (1/2)N ≤ |Q|/2nN < 1. Now, given x ∈ Q0, let a ′ be the translation and dilation of a given by a′(x) = 2nN/pa(2nx1 − k1, . . . , 2 nxN − kN ) . Clearly, [α] moments of a′ vanish, and ‖a′‖2 = 2 nN/p 2−nN/2‖a‖2 ≤ c |Q| 1/p|Q|−1/2‖a‖2 ≤ c . Thus, a′ is a multiple of an atom with defining cube Q0. By the first part of the proof, a′(x) = i(x) + L(x) , x ∈ Q0 . The support of each a′i is contained in one of the dyadic subcubes of Q0, and, consequently, there is a k such that ai(x) = 2 −nN/pa′i(2 −nx1 − k1, . . . , 2 −nxN − kN ) ai is an L 2p -atom supported in one of the dyadic subcubes of Q. Similarly for the pL’s. 
Thus,

$a(x) = \sum_{i=1}^{2^N} d_i\, a_i(x) + \sum_{L=1}^{M} c_L\, p^L_{-n,-k,N(1/p-1)}(x)$,

and we have finished. □

Theorem B follows readily from Lemma 1.1. Clearly, $H^p_D + H^p_{S_\alpha} \hookrightarrow H^p$. Conversely, let $f = \sum_j \lambda_j a_j$ be in $H^p$. By Lemma 1.1 each $a_j$ can be written as a sum of dyadic and special atoms, and, by distributing the sum, we can write $f = f_d + f_s$, with $f_d$ in $H^p_D$, $f_s$ in $H^p_{S_\alpha}$, and

$\|f_d\|_{H^p_D},\ \|f_s\|_{H^p_{S_\alpha}} \le c\, \big( \sum_j |\lambda_j|^p \big)^{1/p}$.

Taking the infimum over the decompositions of $f$ we get $\|f\|_{H^p_D + H^p_{S_\alpha}} \le c\, \|f\|_{H^p}$, and $H^p \hookrightarrow H^p_D + H^p_{S_\alpha}$. This completes the proof.

The meaning of this decomposition is the following. Cubes in $D$ are contained in one of the $2^N$ non-overlapping quadrants of $R^N$. To allow for the information carried by a dyadic cube to be transmitted to an adjacent dyadic cube, they must be connected. The $p^L_{n,k,\alpha}$'s channel information across adjacent dyadic cubes which would otherwise remain disconnected. The reader will have no difficulty in proving the quantitative version of this observation: Let $T$ be a linear mapping defined on $H^p$, $0 < p \le 1$, that assumes values in a quasinormed Banach space $X$. Then, $T$ is continuous if, and only if, the restrictions of $T$ to $H^p_D$ and $H^p_{S_\alpha}$ are continuous.

2. Characterizations of $\Lambda_\alpha$

Theorem A describes how to pass from $\Lambda_{\alpha,D}$ to $\Lambda_\alpha$, and we prove it next. Since $(H^p)^* = \Lambda_\alpha$ and $(H^p_D)^* = \Lambda_{\alpha,D}$, from Theorem B it follows readily that $\Lambda_\alpha = \Lambda_{\alpha,D} \cap (H^p_{S_\alpha})^*$, so it only remains to show that $(H^p_{S_\alpha})^*$ is characterized by the condition $A_\alpha(g) < \infty$.

First note that if $g$ is a locally square-integrable function with $A_\alpha(g) < \infty$ and $f = \sum_{j,L} c_{j,L}\, p^L_{n_j,k_j,\alpha}$, since $0 < p \le 1$,

$|\langle g, f\rangle| \le \sum_{j,L} |c_{j,L}|\, |\langle g, p^L_{n_j,k_j,\alpha}\rangle| \le A_\alpha(g) \sum_{j,L} |c_{j,L}| \le A_\alpha(g) \big( \sum_{j,L} |c_{j,L}|^p \big)^{1/p}$,

and, consequently, taking the infimum over all atomic decompositions of $f$ in $H^p_{S_\alpha}$, we get $g \in (H^p_{S_\alpha})^*$ and $\|g\|_{(H^p_{S_\alpha})^*} \le A_\alpha(g)$.

To prove the converse we proceed as in [3]. Let $Q_n = [-2^n, 2^n]^N$. We begin by observing that functions $f$ in $L^2(Q_n)$ that have vanishing moments up to order $[\alpha]$ and coincide with polynomials of degree $[\alpha]$ on the dyadic subcubes of $Q_n$ belong to $H^p_{S_\alpha}$, and

$\|f\|_{H^p_{S_\alpha}} \le |Q_n|^{1/p-1/2}\, \|f\|_2$.
Given ℓ ∈ (H )∗, for a fixed n let us consider the restriction of ℓ to the space of L2 functions f with [α] vanishing moments that are supported in Qn. Since |ℓ(f)| ≤ ‖ℓ‖ ‖f‖Hp ≤ ‖ℓ‖ |Qn| 1/p−1/2‖f‖2 , this restriction is continuous with respect to the norm in L2 and, consequently, it can be extended to a continuous linear functional in L2 and represented as ℓ(f) = f(x) gn(x) dx , FROM DYADIC Λα TO Λα 7 where gn ∈ L 2(Qn) and satisfies ‖gn‖2 ≤ ‖ℓ‖ |Qn| 1/p−1/2. Clearly, gn is uniquely determined in Qn up to a polynomial pn in P[α]. Therefore, gn(x) − pn(x) = gm(x)− pm(x) , a.e. x ∈ Qmin(n,m) . Consequently, if g(x) = gn(x)− pn(x) , x ∈ Qn , g(x) is well defined a.e. and, if f ∈ L2 has [α] vanishing moments and is supported in Qn, we have ℓ(f) = f(x) gn(x) dx f(x) [gn(x)− pn(x)] dx f(x) g(x) dx . Moreover, since each 2nN/ppL(2n ·+k) is an L2 p-atom, 1 ≤ L ≤ M , it readily follows that Aα(g) = sup 1≤L≤M n,k∈Z |〈g, 2−n/ppL(2n ·+k)〉| ≤ ‖ℓ‖ sup ‖pL‖Hp ≤ ‖ℓ‖ , and, consequently, Aα(g) ≤ ‖ℓ‖ , and (H )∗ is the desired space. � The reader will have no difficulty in showing that this result implies the following: Let T be a bounded linear operator from a quasinormed space X into Λα,D. Then, T is bounded from X into Λα if, and only if, Aα(Tx) ≤ c ‖x‖X for every x ∈ X . The process of averaging the translates of dyadic BMO functions leads to BMO, and is an important tool in obtaining results in BMO once they are known to be true in its dyadic counterpart, BMOd, see [7]. It is also known that BMO can be obtained as the intersection of BMOd and one of its shifted counterparts, see [8]. These results motivate our next proposition, which essentially says that g ∈ Λα if, and only if, g ∈ Λα,D and g is in the Lipschitz class obtained from the shifted dyadic grid. Note that the shifts involved in this class are in all directions parallel to the coordinate axis and depend on the side-length of the cube. Proposition 2.1. Λα = Λα,D0 , and ‖g‖Λα ∼ ‖g‖Λα,D0 . Proof. 
It is obvious that $\|g\|_{\Lambda_{\alpha,D_0}} \le \|g\|_{\Lambda_\alpha}$. To show the other inequality we invoke Theorem A. Since $D \subset D_0$, it suffices to estimate $A_\alpha(g)$, or, equivalently, $|\langle g, p\rangle|$ for $p \in S_\alpha$, $\alpha = N(1/p - 1)$. So, pick $p = p^L_{n,k,\alpha}$ in $S_\alpha$. The defining cube $Q$ of $p^L_{n,k,\alpha}$ is in $D_0$, and, since $p^L_{n,k,\alpha}$ has $[\alpha]$ vanishing moments, $\langle p^L_{n,k,\alpha}, p_Q(g)\rangle = 0$. Therefore,

$|\langle g, p^L_{n,k,\alpha}\rangle| = |\langle g - p_Q(g), p^L_{n,k,\alpha}\rangle| \le \|p^L_{n,k,\alpha}\|_2\, \|g - p_Q(g)\|_{L^2(Q)} \le |Q|^{\alpha/N} |Q|^{1/2}\, \|p^L_{n,k,\alpha}\|_2\, \|g\|_{\Lambda_{\alpha,D_0}}$.

Now, a simple change of variables gives $|Q|^{\alpha/N} |Q|^{1/2}\, \|p^L_{n,k,\alpha}\|_2 \le 1$, and, consequently, also $A_\alpha(g) \le \|g\|_{\Lambda_{\alpha,D_0}}$. □

References

[1] W. Abu-Shammala, J.-L. Shiu, and A. Torchinsky, Characterizations of the Hardy space H1 and BMO, preprint.
[2] H.-Q. Bui and R. S. Laugesen, Approximation and spanning in the Hardy space, by affine systems, Constr. Approx., to appear.
[3] A. P. Calderón and A. Torchinsky, Parabolic maximal functions associated with a distribution, II, Advances in Math. 24 (1977), 101–171.
[4] G. S. de Souza, Spaces formed by special atoms, I, Rocky Mountain J. Math. 14 (1984), no. 2, 423–431.
[5] S. Fridli, Transition from the dyadic to the real nonperiodic Hardy space, Acta Math. Acad. Paedagog. Nyházi. (N.S.) 16 (2000), 1–8 (electronic).
[6] J. García-Cuerva and J. L. Rubio de Francia, Weighted norm inequalities and related topics, Notas de Matemática 116, North Holland, Amsterdam, 1985.
[7] J. Garnett and P. Jones, BMO from dyadic BMO, Pacific J. Math. 99 (1982), no. 2, 351–371.
[8] T. Mei, BMO is the intersection of two translates of dyadic BMO, C. R. Math. Acad. Sci. Paris 336 (2003), no. 12, 1003–1006.
[9] T. M. Le and L. A. Vese, Image decomposition using total variation and div(BMO)*, Multiscale Model. Simul. 4 (2005), no. 2, 390–423.
[10] A. Torchinsky, Real-variable methods in harmonic analysis, Dover Publications, Inc., Mineola, NY, 2004.
Department of Mathematics, Indiana University, Bloomington IN 47405
E-mail address: wabusham@indiana.edu

Department of Mathematics, Indiana University, Bloomington IN 47405
E-mail address: torchins@indiana.edu

ABSTRACT In this paper we show how to compute the $\Lambda_{\alpha}$ norm, $\alpha\ge 0$, using the dyadic grid. This result is a consequence of the description of the Hardy spaces $H^p(R^N)$ in terms of dyadic and special atoms. <|endoftext|><|startoftext|> Polymer Quantum Mechanics and its Continuum Limit

Alejandro Corichi,1, 2, 3, ∗ Tatjana Vukašinac,4, † and José A. Zapata1, ‡

1 Instituto de Matemáticas, Unidad Morelia, Universidad Nacional Autónoma de México, UNAM-Campus Morelia, A. Postal 61-3, Morelia, Michoacán 58090, Mexico
2 Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México, A. Postal 70-543, México D.F. 04510, Mexico
3 Institute for Gravitational Physics and Geometry, Physics Department, Pennsylvania State University, University Park PA 16802, USA
4 Facultad de Ingeniería Civil, Universidad Michoacana de San Nicolas de Hidalgo, Morelia, Michoacán 58000, Mexico

A rather non-standard quantum representation of the canonical commutation relations of quantum mechanical systems, known as the polymer representation, has gained some attention in recent years, due to its possible relation with Planck scale physics. In particular, this approach has been followed in a symmetric sector of loop quantum gravity known as loop quantum cosmology. Here we explore different aspects of the relation between the ordinary Schrödinger theory and the polymer description. The paper has two parts. In the first one, we derive the polymer quantum mechanics starting from the ordinary Schrödinger theory and show that the polymer description arises as an appropriate limit.
In the second part we consider the continuum limit of this theory, namely, the reverse process in which one starts from the discrete theory and tries to recover the ordinary Schrödinger quantum mechanics. We consider several examples of interest, including the harmonic oscillator, the free particle and a simple cosmological model.

PACS numbers: 04.60.Pp, 04.60.Ds, 04.60.Nc, 11.10.Gh.

I. INTRODUCTION

The so-called polymer quantum mechanics, a non-regular and somewhat 'exotic' representation of the canonical commutation relations (CCR) [1], has been used to explore both mathematical and physical issues in background independent theories such as quantum gravity [2, 3]. A notable example of this type of quantization, when applied to minisuperspace models, has given way to what is known as loop quantum cosmology [4, 5]. As in any toy model situation, one hopes to learn about the subtle technical and conceptual issues that are present in full quantum gravity by means of simple, finite dimensional examples. This formalism is not an exception in this regard. Apart from this motivation coming from physics at the Planck scale, one can independently ask for the relation between the standard continuous representations and their polymer cousins at the level of mathematical physics. A deeper understanding of this relation becomes important on its own.

The polymer quantization is made of several steps. The first one is to build a representation of the Heisenberg-Weyl algebra on a kinematical Hilbert space that is "background independent", and that is sometimes referred to as the polymeric Hilbert space $H_{poly}$. The second and most important part, the implementation of dynamics, deals with the definition of a Hamiltonian (or Hamiltonian constraint) on this space.
In the examples studied so far, the first part is fairly well understood, yielding the kinematical Hilbert space $H_{poly}$ that is, however, non-separable. For the second step, a natural implementation of the dynamics has proved to be a bit more difficult, given that a direct definition of the Hamiltonian $\hat H$ of, say, a particle in a potential on the space $H_{poly}$ is not possible, since one of the main features of this representation is that the operators $\hat q$ and $\hat p$ cannot both be simultaneously defined (nor their analogues in theories involving more elaborate variables). Thus, any operator that involves (powers of) the undefined variable has to be regulated by a well defined operator, which normally involves introducing some extra structure on the configuration (or momentum) space, namely a lattice. However, this new structure that plays the role of a regulator cannot be removed when working in $H_{poly}$, and one is left with the ambiguity that is present in any regularization. The freedom in choosing it can sometimes be associated with a length scale (the lattice spacing). For ordinary quantum systems such as a simple harmonic oscillator, which has been studied in detail from the polymer viewpoint, it has been argued that if this length scale is taken to be 'sufficiently small', one can arbitrarily approximate standard Schrödinger quantum mechanics [2, 3]. In the case of loop quantum cosmology, the minimum area gap $A_0$ of the full quantum gravity theory imposes such a scale, which is then taken to be fundamental [4].

∗Electronic address: corichi@matmor.unam.mx
†Electronic address: tatjana@shi.matmor.unam.mx
‡Electronic address: zapata@matmor.unam.mx

A natural question is to ask what happens when we change this scale and go to even smaller 'distances', that is, when we refine the lattice on which the dynamics of the theory is defined. Can we define consistency conditions between these scales? Or even better, can we take the limit and thus find a continuum limit? As it
As it http://arxiv.org/abs/0704.0007v2 mailto:corichi@matmor.unam.mx mailto:tatjana@shi.matmor.unam.mx mailto:zapata@matmor.unam.mx has been shown recently in detail, the answer to both questions is in the affirmative [6]. There, an appropriate notion of scale was defined in such a way that one could define refinements of the theory and pose in a precise fashion the question of the continuum limit of the theory. These results could also be seen as handing a procedure to remove the regulator when working on the appropri- ate space. The purpose of this paper is to further explore different aspects of the relation between the continuum and the polymer representation. In particular in the first part we put forward a novel way of deriving the polymer representation from the ordinary Schrödinger represen- tation as an appropriate limit. In Sec. II we derive two versions of the polymer representation as different lim- its of the Schrödinger theory. In Sec. III we show that these two versions can be seen as different polarizations of the ‘abstract’ polymer representation. These results, to the best of our knowledge, are new and have not been reported elsewhere. In Sec. IV we pose the problem of implementing the dynamics on the polymer representa- tion. In Sec. V we motivate further the question of the continuum limit (i.e. the proper removal of the regulator) and recall the basic constructions of [6]. Several exam- ples are considered in Sec. VI. In particular a simple harmonic oscillator, the polymer free particle and a sim- ple quantum cosmology model are considered. The free particle and the cosmological model represent a general- ization of the results obtained in [6] where only systems with a discrete and non-degenerate spectrum where con- sidered. We end the paper with a discussion in Sec. VII. In order to make the paper self-contained, we will keep the level of rigor in the presentation to that found in the standard theoretical physics literature. II. 
QUANTIZATION AND POLYMER REPRESENTATION

In this section we derive the so-called polymer representation of quantum mechanics starting from a specific reformulation of the ordinary Schrödinger representation. Our starting point will be the simplest of all possible phase spaces, namely $\Gamma = R^2$, corresponding to a particle living on the real line $R$. Let us choose coordinates $(q, p)$ thereon. As a first step we shall consider the quantization of this system that leads to the standard quantum theory in the Schrödinger description. A convenient route is to introduce the necessary structure to define the Fock representation of such a system. From this perspective, the passage to the polymeric case becomes clearest. Roughly speaking, by a quantization one means a passage from the classical algebraic bracket, the Poisson bracket,

$\{q, p\} = 1$ (1)

to a quantum bracket given by the commutator of the corresponding operators,

$[\hat q, \hat p] = i\hbar\, \hat 1$ (2)

These relations, known as the canonical commutation relations (CCR), become the most common cornerstone of the (kinematics of the) quantum theory; they should be satisfied by the quantum system, when represented on a Hilbert space $H$.

There are alternative points of departure for quantum kinematics. Here we consider the algebra generated by the exponentiated versions of $\hat q$ and $\hat p$ that are denoted

$U(\alpha) = e^{i(\alpha \hat q)/\hbar}$ ; $V(\beta) = e^{i(\beta \hat p)/\hbar}$

where $\alpha$ and $\beta$ have dimensions of momentum and length, respectively. The CCR now become

$U(\alpha) \cdot V(\beta) = e^{(-i\alpha\beta)/\hbar}\, V(\beta) \cdot U(\alpha)$ (3)

and the rest of the product is

$U(\alpha_1) \cdot U(\alpha_2) = U(\alpha_1 + \alpha_2)$ ; $V(\beta_1) \cdot V(\beta_2) = V(\beta_1 + \beta_2)$

The Weyl algebra $W$ is generated by taking finite linear combinations of the generators $U(\alpha_i)$ and $V(\beta_i)$, where the product (3) is extended by linearity,

$\sum_i (A_i\, U(\alpha_i) + B_i\, V(\beta_i))$

From this perspective, quantization means finding a unitary representation of the Weyl algebra $W$ on a Hilbert space $H'$ (that could be different from the ordinary Schrödinger representation).
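For readers who want to see how the Weyl relation (3) follows from the CCR (2), the following is a short worked sketch. It assumes the Baker-Campbell-Hausdorff identity for two operators whose commutator is central, taken here for granted on a suitable domain; the auxiliary symbols $A$ and $B$ are introduced only for this computation.

```latex
% Assumption: [A,B] commutes with both A and B, so that
%   e^{A} e^{B} = e^{A+B+\frac12[A,B]}, \quad e^{B} e^{A} = e^{A+B-\frac12[A,B]},
% and hence e^{A} e^{B} = e^{[A,B]}\, e^{B} e^{A}.
\begin{align*}
A &= \tfrac{i}{\hbar}\,\alpha\,\hat q, \qquad B = \tfrac{i}{\hbar}\,\beta\,\hat p, \\
[A,B] &= \Big(\tfrac{i}{\hbar}\Big)^{2}\alpha\beta\,[\hat q,\hat p]
       = -\tfrac{\alpha\beta}{\hbar^{2}}\; i\hbar\,\hat 1
       = -\tfrac{i\,\alpha\beta}{\hbar}\,\hat 1, \\
U(\alpha)\cdot V(\beta) &= e^{A}e^{B}
       = e^{[A,B]}\,e^{B}e^{A}
       = e^{(-i\alpha\beta)/\hbar}\;V(\beta)\cdot U(\alpha).
\end{align*}
```

The same identity with $[A,B] = 0$ gives the additive composition laws for $U$ and $V$ separately.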
At first it might look weird to attempt this approach given that we know how to quantize such a simple system; what do we need such a complicated object as $W$ for? It is infinite dimensional, whereas the set $S = \{\hat 1, \hat q, \hat p\}$, the starting point of the ordinary Dirac quantization, is rather simple. It is in the quantization of field systems that the advantages of the Weyl approach can be fully appreciated, but it is also useful for introducing the polymer quantization and comparing it to the standard quantization. This is the strategy that we follow.

A question that one can ask is whether there is any freedom in quantizing the system to obtain the ordinary Schrödinger representation. At first sight it might seem that there is none, given the Stone-Von Neumann uniqueness theorem. Let us review what would be the argument for the standard construction. Let us ask that the representation we want to build up is of the Schrödinger type, namely, where states are wave functions of configuration space $\psi(q)$. There are two ingredients to the construction of the representation, namely the specification of how the basic operators $(\hat q, \hat p)$ will act, and the nature of the space of functions that $\psi$ belongs to, which is normally fixed by the choice of inner product on $H$, or measure $\mu$ on $R$. The standard choice is to select the Hilbert space to be

$H = L^2(R, dq)$,

the space of square-integrable functions with respect to the Lebesgue measure $dq$ (invariant under constant translations) on $R$. The operators are then represented as

$\hat q \cdot \psi(q) = (q\, \psi)(q)$ and $\hat p \cdot \psi(q) = -i\hbar\, \frac{\partial}{\partial q} \psi(q)$ (4)

Is it possible to find other representations? In order to appreciate this freedom we go to the Weyl algebra and build the quantum theory thereon. The representation of the Weyl algebra that can be called of the 'Fock type' involves the definition of an extra structure on the phase space $\Gamma$: a complex structure $J$. That is, a linear mapping from $\Gamma$ to itself such that $J^2 = -1$.
In 2 dimensions, all the freedom in the choice of $J$ is contained in the choice of a parameter $d$ with dimensions of length. It is also convenient to define $k = p/\hbar$, which has dimensions of $1/L$. We have then

$J_d : (q, k) \mapsto (-d^2 k,\ q/d^2)$

This object together with the symplectic structure $\Omega((q, p); (q', p')) = q\, p' - p\, q'$ defines an inner product on $\Gamma$ by the formula $g_d(\cdot\,;\cdot) = \Omega(\cdot\,; J_d\, \cdot)$ such that

$g_d((q, p); (q', p')) = \frac{1}{d^2}\, q\, q' + \frac{d^2}{\hbar^2}\, p\, p'$

which is dimensionless and positive definite. Note that with these quantities one can define complex coordinates $(\zeta, \bar\zeta)$ as usual:

$\zeta = \frac{1}{d}\, q + i\, \frac{d}{\hbar}\, p$ ; $\bar\zeta = \frac{1}{d}\, q - i\, \frac{d}{\hbar}\, p$

from which one can build the standard Fock representation. Thus, one can alternatively view the introduction of the length parameter $d$ as the quantity needed to define (dimensionless) complex coordinates on the phase space. But what is the relevance of this object ($J$ or $d$)? The definition of complex coordinates is useful for the construction of the Fock space since from them one can define, in a natural way, creation and annihilation operators. But for the Schrödinger representation we are interested in here, it is a bit more subtle. The subtlety is that within this approach one uses the algebraic properties of $W$ to construct the Hilbert space via what is known as the Gel'fand-Naimark-Segal (GNS) construction. This implies that the measure in the Schrödinger representation becomes non-trivial and thus the momentum operator acquires an extra term in order to render the operator self-adjoint. The representation of the Weyl algebra is then, when acting on functions $\phi(q)$ [7]:

$\hat U(\alpha) \cdot \phi(q) := (e^{i\alpha q/\hbar}\, \phi)(q)$

$\hat V(\beta) \cdot \phi(q) := e^{\frac{\beta}{d^2}(q - \beta/2)}\, \phi(q - \beta)$

The Hilbert space structure is introduced by the definition of an algebraic state (a positive linear functional) $\omega_d : W \to C$, that must coincide with the expectation value in the Hilbert space taken on a special state referred to as the vacuum: $\omega_d(a) = \langle \hat a \rangle_{vac}$, for all $a \in W$.
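The algebraic properties of $J_d$ quoted above are easy to check numerically. The following is a minimal sketch, under the assumption that the symplectic form is written in the rescaled coordinates $(q, k)$ with $k = p/\hbar$, so that the resulting metric is dimensionless; the function names are illustrative only.

```python
def J(d, point):
    """Complex structure J_d acting on (q, k), where k = p / hbar."""
    q, k = point
    return (-d**2 * k, q / d**2)


def omega(x, y):
    """Symplectic form in (q, k) coordinates: Omega(x, y) = q k' - k q'."""
    return x[0] * y[1] - x[1] * y[0]


def g(d, x, y):
    """Metric g_d(x, y) = Omega(x, J_d y) = q q'/d^2 + d^2 k k'."""
    return omega(x, J(d, y))


if __name__ == "__main__":
    d = 0.7
    x = (1.3, -0.4)
    y = (0.2, 0.5)
    print("J_d(J_d(x)) =", J(d, J(d, x)))            # equals -x up to rounding
    print("g_d(x, y)   =", g(d, x, y))
    print("closed form =", x[0] * y[0] / d**2 + d**2 * x[1] * y[1])
```

Applying `J` twice returns minus the input, and `g` reproduces the diagonal quadratic form, which is positive for every nonzero point and every $d > 0$.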
In our case this specification of $J$ induces such a unique state $\omega_d$ that yields

$\langle \hat U(\alpha) \rangle_{vac} = e^{-\frac{1}{4} \frac{d^2 \alpha^2}{\hbar^2}}$ (5)

$\langle \hat V(\beta) \rangle_{vac} = e^{-\frac{1}{4} \frac{\beta^2}{d^2}}$ (6)

Note that the exponents in the vacuum expectation values correspond to the metric constructed out of $J$: $\frac{d^2 \alpha^2}{\hbar^2} = g_d((0, \alpha); (0, \alpha))$ and $\frac{\beta^2}{d^2} = g_d((\beta, 0); (\beta, 0))$.

Wave functions belong to the space $L^2(R, d\mu_d)$, where the measure that dictates the inner product in this representation is given by

$d\mu_d = \frac{1}{d\sqrt{\pi}}\, e^{-\frac{q^2}{d^2}}\, dq$

In this representation, the vacuum is given by the identity function $\phi_0(q) = 1$ that is, just as any plane wave, normalized. Note that for each value of $d > 0$, the representation is well defined and continuous in $\alpha$ and $\beta$. Note also that there is an equivalence between the $q$-representation defined by $d$ and the $k$-representation defined by $1/d$.

How can we recover then the standard representation in which the measure is given by the Lebesgue measure and the operators are represented as in (4)? It is easy to see that there is an isometric isomorphism $K$ that maps the $d$-representation in $H_d$ to the standard Schrödinger representation in $H_{schr}$ by

$\psi(q) = K \cdot \phi(q) = \frac{e^{-\frac{q^2}{2 d^2}}}{d^{1/2} \pi^{1/4}}\, \phi(q) \in H_{schr} = L^2(R, dq)$

Thus we see that all $d$-representations are unitarily equivalent. This was to be expected in view of the Stone-Von Neumann uniqueness result. Note also that the vacuum now becomes

$\psi_0(q) = \frac{1}{d^{1/2} \pi^{1/4}}\, e^{-\frac{q^2}{2 d^2}}$,

so even when there is no information about the parameter $d$ in the representation itself, it is contained in the vacuum state. This procedure for constructing the GNS-Schrödinger representation for quantum mechanics has also been generalized to scalar fields on arbitrary curved space in [8]. Note, however, that so far the treatment has all been kinematical, without any knowledge of a Hamiltonian.
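The vacuum expectation values (5) and (6) can be checked by direct numerical integration against the Gaussian measure $d\mu_d$. The following is a minimal sketch using a plain midpoint rule; the helper names (`mu_density`, `integrate`, `vev_U`, `vev_V`), the truncation of the integration range and the default $\hbar = 1$ are our own assumptions, not part of the paper.

```python
import cmath
import math


def mu_density(d, q):
    """Density of dmu_d = exp(-q^2/d^2) / (d sqrt(pi)) dq."""
    return math.exp(-((q / d) ** 2)) / (d * math.sqrt(math.pi))


def integrate(f, a, b, steps=20000):
    """Composite midpoint rule; works for real or complex integrands."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def vev_U(alpha, d, hbar=1.0):
    """<U(alpha)>_vac: integral of e^{i alpha q / hbar} against dmu_d (vacuum phi_0 = 1)."""
    cutoff = 12 * d  # the Gaussian weight is negligible beyond this range
    return integrate(
        lambda q: cmath.exp(1j * alpha * q / hbar) * mu_density(d, q),
        -cutoff, cutoff,
    )


def vev_V(beta, d):
    """<V(beta)>_vac: integral of e^{(beta/d^2)(q - beta/2)} against dmu_d."""
    cutoff = 12 * d + abs(beta)
    return integrate(
        lambda q: math.exp(beta / d**2 * (q - beta / 2)) * mu_density(d, q),
        -cutoff, cutoff,
    )


if __name__ == "__main__":
    alpha, beta, d = 1.5, 0.6, 0.8
    print(abs(vev_U(alpha, d)), math.exp(-0.25 * (d * alpha) ** 2))
    print(vev_V(beta, d), math.exp(-0.25 * (beta / d) ** 2))
```

The two printed pairs should agree to high accuracy, and setting `alpha = 0` confirms that the vacuum $\phi_0 = 1$ has unit norm with respect to $d\mu_d$.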
For the simple harmonic oscillator of mass $m$ and frequency $\omega$, there is a natural choice compatible with the dynamics given by $d = \sqrt{\hbar/(m\omega)}$, in which some calculations simplify (for instance for coherent states), but in principle one can use any value of $d$.

Our study will be simplified by focusing on the fundamental entities in the Hilbert space $H_d$, namely those states generated by acting with $\hat U(\alpha)$ on the vacuum $\phi_0(q) = 1$. Let us denote those states by

$\phi_\alpha(q) = \hat U(\alpha) \cdot \phi_0(q) = e^{i \frac{\alpha q}{\hbar}}$

The inner product between two such states is given by

$\langle \phi_\alpha, \phi_\lambda \rangle_d = \int d\mu_d\; e^{-i \frac{\alpha q}{\hbar}}\, e^{i \frac{\lambda q}{\hbar}} = e^{-\frac{(\lambda - \alpha)^2 d^2}{4 \hbar^2}}$ (7)

Note incidentally that, contrary to some common belief, the 'plane waves' in this GNS Hilbert space are indeed normalizable.

Let us now consider the polymer representation. For that, it is important to note that there are two possible limiting cases for the parameter $d$: i) the limit $1/d \mapsto 0$ and ii) the case $d \mapsto 0$. In both cases, we have expressions that become ill defined in the representation or measure, so one needs to be careful.

A. The $1/d \mapsto 0$ case.

The first observation is that from the expressions (5) and (6) for the algebraic state $\omega_d$, we see that the limiting cases are indeed well defined. In our case we get $\omega_A := \lim_{1/d \to 0} \omega_d$ such that

$\omega_A(\hat U(\alpha)) = \delta_{\alpha,0}$ and $\omega_A(\hat V(\beta)) = 1$ (8)

From this, we can indeed construct the representation by means of the GNS construction. In order to do that and to show how this is obtained we shall consider several expressions. One has to be careful though, since the limit has to be taken with care. Let us consider the measure on the representation, which behaves as

$d\mu_d = \frac{1}{d\sqrt{\pi}}\, e^{-\frac{q^2}{d^2}}\, dq \mapsto \frac{1}{d\sqrt{\pi}}\, dq$

so the measure tends to a homogeneous measure but one whose 'normalization constant' goes to zero, so the limit becomes somewhat subtle. We shall return to this point later.

Let us now see what happens to the inner product between the fundamental entities in the Hilbert space $H_d$ given by (7).
It is immediate to see that in the $1/d \mapsto 0$ limit the inner product becomes

$\langle \phi_\alpha, \phi_\lambda \rangle_d \mapsto \delta_{\alpha,\lambda}$ (9)

with $\delta_{\alpha,\lambda}$ being Kronecker's delta. We see then that the plane waves $\phi_\alpha(q)$ become an orthonormal basis for the new Hilbert space. Therefore, there is a delicate interplay between the two terms that contribute to the measure in order to maintain the normalizability of these functions; we need the measure to become damped (by $1/d$) in order to avoid that the plane waves acquire an infinite norm (as happens with the standard Lebesgue measure), but on the other hand the measure, which for any finite value of $d$ is a Gaussian, becomes more and more spread out.

It is important to note that, in this limit, the operators $\hat U(\alpha)$ become discontinuous with respect to $\alpha$, given that for any two different values $\alpha_1$ and $\alpha_2$, their action on a given basis vector $\phi_\lambda(q)$ yields orthogonal vectors. Since the continuity of these operators is one of the hypotheses of the Stone-Von Neumann theorem, the uniqueness result does not apply here. The representation is inequivalent to the standard one.

Let us now analyze the other operator, namely the action of the operator $\hat V(\beta)$ on the basis $\phi_\alpha(q)$:

$\hat V(\beta) \cdot \phi_\alpha(q) = e^{-\frac{\beta^2}{2 d^2} - i \frac{\alpha \beta}{\hbar}}\; e^{(\beta/d^2 + i\alpha/\hbar)\, q}$

which in the limit $1/d \mapsto 0$ goes to

$\hat V(\beta) \cdot \phi_\alpha(q) \mapsto e^{-i \frac{\alpha \beta}{\hbar}}\, \phi_\alpha(q)$

which is continuous in $\beta$. Thus, in the limit, the operator $\hat p = -i\hbar\, \partial_q$ is well defined. Also, note that in this limit the operator $\hat p$ has $\phi_\alpha(q)$ as its eigenstate with eigenvalue given by $\alpha$:

$\hat p \cdot \phi_\alpha(q) \mapsto \alpha\, \phi_\alpha(q)$

To summarize, the resulting theory obtained by taking the limit $1/d \mapsto 0$ of the ordinary Schrödinger description, which we shall call the 'polymer representation of type A', has the following features: the operators $U(\alpha)$ are well defined but not continuous in $\alpha$, so there is no generator (no operator associated to $q$). The basis vectors $\phi_\alpha$ are orthonormal (for $\alpha$ taking values on a continuous set) and are eigenvectors of the operator $\hat p$ that is well defined.
The resulting Hilbert space $\mathcal{H}_A$ will be the (A-version of the) polymer representation. Let us now consider the other case, namely, the limit when $d \mapsto 0$.

B. The $d \mapsto 0$ case

Let us now explore the other limiting case of the Schrödinger/Fock representations labelled by the parameter $d$. Just as in the previous case, the limiting algebraic state becomes $\omega_B := \lim_{d\to 0}\omega_d$ such that
$$\omega_B(\hat{U}(\alpha)) = 1 \quad\text{and}\quad \omega_B(\hat{V}(\beta)) = \delta_{\beta,0} \tag{10}$$
From this positive linear functional, one can indeed construct the representation using the GNS construction. First let us note that the measure, even when the limit has to be taken with due care, behaves as
$$d\mu_d = \frac{1}{d\sqrt{\pi}}\, e^{-q^2/d^2}\, dq \;\mapsto\; \delta(q)\, dq ,$$
that is, as Dirac’s delta distribution. It is immediate to see that, in the $d \mapsto 0$ limit, the inner product between the fundamental states $\phi_\alpha(q)$ becomes
$$\langle\phi_\alpha,\phi_\lambda\rangle_d \mapsto 1 \tag{11}$$
This in fact means that the vector $\xi = \phi_\alpha - \phi_\lambda$ belongs to the kernel of the limiting inner product, so one has to mod out by these (and all) zero-norm states in order to get the Hilbert space.

Let us now analyze the other operator, namely the action of the operator $\hat{V}(\beta)$ on the vacuum $\phi_0(q) = 1$, which for arbitrary $d$ has the form
$$\tilde{\phi}_\beta := \hat{V}(\beta)\cdot\phi_0(q) = e^{\frac{\beta}{d^2}(q-\beta/2)} .$$
The inner product between two such states is given by
$$\langle\tilde{\phi}_\alpha,\tilde{\phi}_\beta\rangle_d = e^{-\frac{(\alpha-\beta)^2}{4d^2}} .$$
In the limit $d \to 0$, $\langle\tilde{\phi}_\alpha,\tilde{\phi}_\beta\rangle_d \to \delta_{\alpha,\beta}$. We can see then that it is these functions that become the orthonormal, ‘discrete basis’ in the theory. However, the function $\tilde{\phi}_\beta(q)$ becomes ill defined in this limit. For example, for $\beta > 0$, it grows unboundedly for $q > \beta/2$, is equal to one if $q = \beta/2$, and is zero otherwise. In order to overcome these difficulties and make the resulting theory more transparent, we shall consider the other form of the representation, in which the measure is incorporated into the states (and the resulting Hilbert space is $L^2(\mathbb{R}, dq)$).
Thus the new state is
$$\psi_\beta(q) := K\cdot(\hat{V}(\beta)\cdot\phi_0(q)) = \frac{1}{(d\sqrt{\pi})^{1/2}}\; e^{-\frac{(q-\beta)^2}{2d^2}} .$$
We can now take the limit, and what we get is
$$\lim_{d\mapsto 0}\psi_\beta(q) := \delta^{1/2}(q,\beta) ,$$
where by $\delta^{1/2}(q,\beta)$ we mean something like ‘the square root of the Dirac distribution’. What we really mean is an object that satisfies the following property:
$$\delta^{1/2}(q,\beta)\cdot\delta^{1/2}(q,\alpha) = \delta(q,\beta)\,\delta_{\beta,\alpha} .$$
That is, if $\alpha = \beta$ then it is just the ordinary delta; otherwise it is zero. In a sense these objects can be regarded as half-densities that cannot be integrated by themselves, but whose product can. We conclude then that the inner product is
$$\langle\psi_\beta,\psi_\alpha\rangle = \int dq\; \psi_\beta(q)\,\psi_\alpha(q) = \int dq\; \delta(q,\alpha)\,\delta_{\beta,\alpha} = \delta_{\beta,\alpha} ,$$
which is just what we expected. Note that in this representation the vacuum state becomes $\psi_0(q) := \delta^{1/2}(q,0)$, namely, the half-delta with support at the origin. It is important to note that we arrive in a natural way at states as half-densities, whose squares can be integrated without the need for a nontrivial measure on the configuration space. Diffeomorphism invariance then arises in a natural but subtle manner.

Note that as the end result we recover the Kronecker delta inner product for the new fundamental states $\chi_\beta(q) := \delta^{1/2}(q,\beta)$. Thus, in this new B-polymer representation, the Hilbert space $\mathcal{H}_B$ is the completion with respect to the inner product (13) of the states generated by taking (finite) linear combinations of basis elements of the form $\chi_\beta$:
$$\Psi(q) = \sum_i b_i\, \chi_{\beta_i}(q) \tag{14}$$

Let us now introduce an equivalent description of this Hilbert space. Instead of having the basis elements be half-deltas, as elements of a Hilbert space whose inner product is given by the ordinary Lebesgue measure $dq$, we redefine both the basis and the measure. We could consider, instead of a half-delta with support at $\beta$, a Kronecker delta or characteristic function with support at $\beta$:
$$\chi'_\beta(q) := \delta_{q,\beta} .$$
These functions behave similarly to the half-deltas with respect to the product, namely $\chi'_\beta(q)\cdot\chi'_\alpha(q) = \delta_{\beta,\alpha}$.
The main difference is that neither the $\chi'$ nor their squares are integrable with respect to the Lebesgue measure (they have zero norm). In order to fix that problem we have to change the measure so that we recover the basic inner product (13) with our new basis. The needed measure turns out to be the discrete counting measure on $\mathbb{R}$. Thus any state in the ‘half-density basis’ can be written (using the same expression) in terms of the ‘Kronecker basis’. For more details and further motivation see the next section.

Note that in this B-polymer representation, $\hat{U}$ and $\hat{V}$ have their roles interchanged with respect to the A-polymer representation: while $\hat{U}(\alpha)$ is discontinuous, and thus $\hat{q}$ is not defined, in the A-representation, it is $\hat{V}(\beta)$ that has this property in the B-representation. In this case, it is the operator $\hat{p}$ that cannot be defined. We see then that, given a physical system for which the configuration space has a well defined physical meaning, among the possible representations in which wave functions are functions of the configuration variable $q$, the A and B polymer representations are radically different and inequivalent.

Having said this, it is also true that the A and B representations are equivalent in a different sense, by means of the duality between the $q$ and $p$ representations and the $d \leftrightarrow 1/d$ duality: the A-polymer representation in the “$q$-representation” is equivalent to the B-polymer representation in the “$p$-representation”, and conversely. When studying a problem, it is important to decide from the beginning which polymer representation (if any) one should be using (for instance in the $q$-polarization). This choice determines which variable is naturally “quantized” (even if continuous): $p$ for A and $q$ for B. There could be, for instance, a physical criterion for this choice. For example, a fundamental symmetry could suggest that one representation is more natural than another.
This indeed has been recently noted by Chiou in [10], where the Galileo group is investigated and where it is shown that the B representation is better behaved.

In the other polarization, namely for wave functions of $p$, the picture gets reversed: $q$ is discrete for the A-representation, while $p$ is discrete for the B-case. Let us end this section by noting that the procedure of obtaining the polymer quantization by means of an appropriate limit of Fock-Schrödinger representations might prove useful in more general settings in field theory or quantum gravity.

III. POLYMER QUANTUM MECHANICS: KINEMATICS

In previous sections we have derived what we have called the A and B polymer representations (in the $q$-polarization) as limiting cases of ordinary Fock representations. In this section, we shall describe, without any reference to the Schrödinger representation, the ‘abstract’ polymer representation and then make contact with its two possible realizations, closely related to the A and B cases studied before. What we will see is that one of them (the A case) will correspond to the $p$-polarization while the other one corresponds to the $q$-representation, when a choice is made about the physical significance of the variables.

We can start by defining abstract kets $|\mu\rangle$ labelled by a real number $\mu$. These shall belong to the Hilbert space $\mathcal{H}_{\mathrm{poly}}$. From these states, we define generic ‘cylinder states’ that correspond to a choice of a finite collection of numbers $\mu_i \in \mathbb{R}$ with $i = 1, 2, \ldots, N$. Associated to this choice there are $N$ vectors $|\mu_i\rangle$, so we can take a linear combination of them,
$$|\psi\rangle = \sum_{i=1}^{N} a_i\, |\mu_i\rangle \tag{15}$$
The polymer inner product between the fundamental kets is given by
$$\langle\nu|\mu\rangle = \delta_{\nu,\mu} \tag{16}$$
That is, the kets are orthogonal to each other (when $\nu \neq \mu$) and they are normalized ($\langle\mu|\mu\rangle = 1$).
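Extending (16) by linearity and antilinearity to finite combinations of the form (15), only labels common to both states contribute to the inner product. A minimal sketch follows, assuming a cylinder state is stored as a Python dictionary mapping each label $\mu_i$ to its coefficient $a_i$; this data layout and the name `polymer_inner` are hypothetical, chosen only for illustration.

```python
def polymer_inner(phi, psi):
    """Polymer inner product <phi|psi> of two cylinder states.

    Each state is a dict {label mu: coefficient}. By (16) distinct labels are
    orthogonal, so only labels present in both states contribute. Labels are
    compared for exact equality, mirroring the Kronecker delta."""
    return sum(
        complex(b).conjugate() * psi[mu] for mu, b in phi.items() if mu in psi
    )


if __name__ == "__main__":
    phi = {0.0: 1.0, 0.5: 2j}
    psi = {0.5: 3.0, 1.0: 1.0}
    print(polymer_inner(phi, psi))  # only the shared label 0.5 contributes
    print(polymer_inner(psi, psi))  # squared norm: sum of |a_i|^2
```

Because labels are compared exactly, two kets whose labels differ by an arbitrarily small amount are still orthogonal, which is the feature that makes the Hilbert space non-separable.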
Immediately, this implies that, given any two vectors $|\phi\rangle = \sum_{j=1}^{M} b_j\,|\nu_j\rangle$ and $|\psi\rangle = \sum_{i=1}^{N} a_i\,|\mu_i\rangle$, the inner product between them is given by
$$\langle\phi|\psi\rangle = \sum_{i}\sum_{j} \bar{b}_j\, a_i\, \langle\nu_j|\mu_i\rangle = \sum_{k} \bar{b}_k\, a_k ,$$
where the last sum is over $k$, which labels the intersection points between the sets of labels $\{\nu_j\}$ and $\{\mu_i\}$. The Hilbert space $\mathcal{H}_{\mathrm{poly}}$ is the Cauchy completion of finite linear combinations of the form (15) with respect to the inner product (16). $\mathcal{H}_{\mathrm{poly}}$ is non-separable.

There are two basic operators on this Hilbert space: the ‘label operator’ $\hat{\varepsilon}$,
$$\hat{\varepsilon}\,|\mu\rangle := \mu\,|\mu\rangle ,$$
and the displacement operator $\hat{s}(\lambda)$,
$$\hat{s}(\lambda)\,|\mu\rangle := |\mu+\lambda\rangle .$$
The operator $\hat{\varepsilon}$ is symmetric, and the operators $\hat{s}(\lambda)$ define a one-parameter family of unitary operators on $\mathcal{H}_{\mathrm{poly}}$, with adjoint given by $\hat{s}^\dagger(\lambda) = \hat{s}(-\lambda)$. This action is, however, discontinuous with respect to $\lambda$, given that $|\mu\rangle$ and $|\mu+\lambda\rangle$ are always orthogonal, no matter how small $\lambda$ is. Thus, there is no (Hermitian) operator that could generate $\hat{s}(\lambda)$ by exponentiation.

So far we have given the abstract characterization of the Hilbert space, but one would like to make contact with concrete realizations as wave functions, or by identifying the abstract operators $\hat{\varepsilon}$ and $\hat{s}$ with physical operators.

Suppose we have a system with a configuration space with coordinate $q$, and let $p$ denote its canonically conjugate momentum. Suppose also that for physical reasons we decide that the configuration coordinate $q$ will have some “discrete character” (for instance, if it is to be identified with position, one could say that there is an underlying discreteness in position at a small scale). How can we implement such requirements by means of the polymer representation? There are two possibilities, depending on the choice of ‘polarization’ for the wave functions, namely whether they will be functions of the configuration $q$ or the momentum $p$. Let us then divide the discussion into two parts.

A. Momentum polarization

In this polarization, states will be denoted by
$$\psi(p) = \langle p|\psi\rangle , \quad\text{where}\quad \psi_\mu(p) = \langle p|\mu\rangle = e^{i\mu p/\hbar} .$$
How are the operators $\hat{\varepsilon}$ and $\hat{s}$ then represented? Note that if we consider the multiplicative operator
$$\hat{V}(\lambda)\cdot\psi_\mu(p) = e^{i\lambda p/\hbar}\, e^{i\mu p/\hbar} = e^{i(\mu+\lambda)p/\hbar} = \psi_{(\mu+\lambda)}(p) ,$$
we see that the operator $\hat{V}(\lambda)$ corresponds precisely to the shift operator $\hat{s}(\lambda)$. Thus we can also conclude that the operator $\hat{p}$ does not exist. It is now easy to identify the operator $\hat{q}$ with
$$\hat{q}\cdot\psi_\mu(p) = -i\hbar\,\frac{\partial}{\partial p}\,\psi_\mu(p) = \mu\, e^{i\mu p/\hbar} = \mu\,\psi_\mu(p) ,$$
namely, with the abstract operator $\hat{\varepsilon}$. The reason we say that $\hat{q}$ is discrete is that this operator has as its eigenvalue the label $\mu$ of the elementary state $\psi_\mu(p)$, and this label, even though it can take values in a continuum, is to be understood as belonging to a discrete set, given that the states are orthonormal for all values of $\mu$.

Given that states are now functions of $p$, the inner product (16) should be defined by a measure $\mu$ on the space on which the wave functions are defined. In order to know what these two objects are, namely, the quantum “configuration” space $\mathcal{C}$ and the measure thereon¹, we have to make use of the tools available to us from the theory of $C^*$-algebras. If we consider the operators $\hat{V}(\lambda)$, together with their natural product and $*$-relation given by $\hat{V}^*(\lambda) = \hat{V}(-\lambda)$, they have the structure of an Abelian $C^*$-algebra (with unit) $\mathcal{A}$. We know from the representation theory of such objects that $\mathcal{A}$ is isomorphic to the space of continuous functions $C^0(\Delta)$ on a compact space $\Delta$, the spectrum of $\mathcal{A}$. Any representation of $\mathcal{A}$ on a Hilbert space by multiplication operators will be on spaces of the form $L^2(\Delta, d\mu)$. That is, our quantum configuration space is the spectrum of the algebra, which in our case corresponds to the Bohr compactification $\mathbb{R}_b$ of the real line [11]. This space is a compact group and there is a natural probability measure defined on it, the Haar measure $\mu_H$.
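On the quasi-periodic functions $\psi_\mu(p) = e^{i\mu p/\hbar}$, integration against the Haar measure of the Bohr compactification amounts to taking a mean value over the real line, so the orthonormality of these functions can be illustrated numerically. The sketch below assumes $\hbar = 1$ and replaces the infinite-range mean by a finite window $[-L, L]$; the function name and the trapezoidal quadrature are illustrative assumptions.

```python
import numpy as np


def bohr_mean_overlap(mu, lam, half_width, hbar=1.0, n_points=400001):
    """Approximate <psi_mu|psi_lam> by the mean value
    (1 / 2L) * integral_{-L}^{L} conj(psi_mu(p)) psi_lam(p) dp, with L = half_width,
    evaluated with the trapezoidal rule. The exact overlap is the L -> infinity limit."""
    p = np.linspace(-half_width, half_width, n_points)
    integrand = np.conj(np.exp(1j * mu * p / hbar)) * np.exp(1j * lam * p / hbar)
    dp = p[1] - p[0]
    integral = dp * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * half_width)


if __name__ == "__main__":
    # Equal labels give 1; distinct labels give sin(kL)/(kL), which decays as L grows.
    for half_width in (20.0, 200.0, 2000.0):
        print(half_width,
              abs(bohr_mean_overlap(1.0, 1.0, half_width)),
              abs(bohr_mean_overlap(1.0, 1.7, half_width)))
```

For distinct labels the finite-window mean equals $\sin(kL)/(kL)$ with $k = (\lambda-\mu)/\hbar$, so it is bounded by $1/(|k|L)$ and vanishes as the window grows, however close the two labels are. This is also why $\hat{V}(\lambda)$, which maps $\psi_\mu$ to $\psi_{\mu+\lambda}$, is not continuous in $\lambda$.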
Thus, our Hilbert space $\mathcal{H}_{\mathrm{poly}}$ will be isomorphic to the space
$$\mathcal{H}_{\mathrm{poly},p} = L^2(\mathbb{R}_b, d\mu_H) \tag{17}$$
In terms of ‘quasi-periodic functions’ generated by $\psi_\mu(p)$, the inner product takes the form
$$\langle\psi_\mu|\psi_\lambda\rangle := \int_{\mathbb{R}_b} d\mu_H\; \overline{\psi_\mu(p)}\,\psi_\lambda(p) := \lim_{L\mapsto\infty} \frac{1}{2L}\int_{-L}^{L} dp\; \overline{\psi_\mu(p)}\,\psi_\lambda(p) = \delta_{\mu,\lambda} \tag{18}$$
Note that in the $p$-polarization, this characterization corresponds to the ‘A-version’ of the polymer representation of Sec. II (where $p$ and $q$ are interchanged).

¹ Here we use the standard terminology of ‘configuration space’ to denote the domain of the wave function even when, in this case, it corresponds to the physical momenta $p$.

B. q-polarization

Let us now consider the other polarization, in which wave functions will depend on the configuration coordinate $q$:
$$\psi(q) = \langle q|\psi\rangle .$$
The basic functions, which will now be called $\tilde{\psi}_\mu(q)$, should be, in a sense, the dual of the functions $\psi_\mu(p)$ of the previous subsection. We can try to define them via a ‘Fourier transform’:
$$\tilde{\psi}_\mu(q) := \langle q|\mu\rangle = \langle q| \int_{\mathbb{R}_b} d\mu_H\, |p\rangle\langle p|\mu\rangle ,$$
which is given by
$$\tilde{\psi}_\mu(q) := \int_{\mathbb{R}_b} d\mu_H\, \langle q|p\rangle\,\psi_\mu(p) = \int_{\mathbb{R}_b} d\mu_H\; e^{-i p q/\hbar}\, e^{i\mu p/\hbar} = \delta_{q,\mu} \tag{19}$$
That is, the basic objects in this representation are Kronecker deltas. This is precisely what we had found in Sec. II for the B-type representation. How are the basic operators now represented, and what is the form of the inner product? Regarding the operators, we expect that they are represented in the opposite manner to the previous $p$-polarization case, but that they preserve the same features: $\hat{p}$ does not exist (the derivative of the Kronecker delta is ill defined), but its exponentiated version $\hat{V}(\lambda)$ does,
$$\hat{V}(\lambda)\cdot\psi(q) = \psi(q+\lambda) ,$$
and the operator $\hat{q}$, which now acts by multiplication, has as its eigenstates the functions $\tilde{\psi}_\nu(q) = \delta_{\nu,q}$:
$$\hat{q}\cdot\tilde{\psi}_\mu(q) := \mu\,\tilde{\psi}_\mu(q) .$$
What is now the nature of the quantum configuration space $\mathcal{Q}$? And what is the measure $d\mu_q$ thereon that defines the inner product we should have,
$$\langle\tilde{\psi}_\mu(q),\tilde{\psi}_\lambda(q)\rangle = \delta_{\mu,\lambda}\, ?$$
The answer comes from one of the characterizations of the Bohr compactification: we know that it is, in a precise sense, dual to the real line equipped with the discrete topology, $\mathbb{R}_d$. Furthermore, the measure on $\mathbb{R}_d$ will be the ‘counting measure’. In this way we recover the same properties we had for the previous characterization of the polymer Hilbert space. We can thus write
$$\mathcal{H}_{\mathrm{poly},x} := L^2(\mathbb{R}_d, d\mu_c) \tag{20}$$
This completes a precise construction of the B-type polymer representation sketched in the previous section. Note that if we had chosen the opposite physical situation, namely that $q$, the configuration observable, be the quantity that does not have a corresponding operator, then we would have had the opposite realization: in the $q$-polarization we would have had the type-A polymer representation, and the type-B for the $p$-polarization. As we shall see, both scenarios have been considered in the literature.

Up to now we have focused our discussion only on the kinematical aspects of the quantization process. Let us now consider, in the following section, the issue of dynamics, and recall the approach that had been adopted in the literature before the issue of the removal of the regulator was reexamined in [6].

IV. POLYMER QUANTUM MECHANICS: DYNAMICS

As we have seen, the construction of the polymer representation is rather natural and leads to a quantum theory with properties different from those of the usual Schrödinger counterpart, such as its non-separability, the non-existence of certain operators, and the existence of normalized eigenvectors that yield a precise value for one of the phase space coordinates. This has been done without any regard for a Hamiltonian that endows the system with a dynamics, energy and so on.
First let us consider the simplest case of a particle of mass $m$ in a potential $V(q)$, in which the Hamiltonian $H$ takes the form
$$H = \frac{1}{2m}\,p^2 + V(q) .$$
Suppose furthermore that the potential is given by a non-periodic function, such as a polynomial or a rational function. We can immediately see that a direct implementation of the Hamiltonian is out of our reach, for the simple reason that, as we have seen, in the polymer representation we can represent either $q$ or $p$, but not both! What has been done so far in the literature? The simplest thing possible: approximate the non-existing term by a well defined function that can be quantized, and hope for the best. As we shall see in the next sections, there is indeed more that one can do.

At this point there is also an important decision to be made: which variable, $q$ or $p$, should be regarded as “discrete”? Once this choice is made, it implies that the other variable will not exist: if $q$ is regarded as discrete, then $p$ will not exist and we need to approximate the kinetic term $p^2/2m$ by something else; if $p$ is to be the discrete quantity, then $q$ will not be defined and we need to approximate the potential $V(q)$. What happens with a periodic potential? In this case one would be modelling, for instance, a particle on a regular lattice, such as a phonon living on a crystal, and then the natural choice is to have $q$ not well defined. Furthermore, the potential will be well defined and no approximation is needed.

In the literature both scenarios have been considered. For instance, when considering a quantum mechanical system in [2], the position was chosen to be discrete, so $p$ does not exist, and one is then in the A type for the momentum polarization (or the type B for the $q$-polarization). With this choice, it is the kinetic term that has to be approximated; once one has done this, it is immediate to consider any potential, which will thus be well defined.
On the other hand, when considering loop quantum cosmology (LQC), the standard choice is that the configuration variable is not defined [4]. This choice is made given that LQC is regarded as the symmetric sector of full loop quantum gravity, where the connection (which is regarded as the configuration variable) cannot be promoted to an operator and one can only define its exponentiated version, namely, the holonomy. In that case, the canonically conjugate variable, closely related to the volume, becomes ‘discrete’, just as in the full theory. This case is, however, different from the particle in a potential example. First, we could mention that the functional form of the Hamiltonian constraint that implements dynamics has a different structure, but the more important difference lies in the fact that the system is constrained.

Let us return to the case of the particle in a potential and, for definiteness, let us start with the auxiliary kinematical framework in which $q$ is discrete, $p$ cannot be promoted to an operator, and thus we have to approximate the kinetic term $\hat{p}^2/2m$. How is this done? The standard prescription is to define, on the configuration space $\mathcal{C}$, a regular ‘graph’ $\gamma_{\mu_0}$. This consists of a countable set of equidistant points, characterized by a parameter $\mu_0$ that is the (constant) separation between points. The simplest example would be to consider the set
$$\gamma_{\mu_0} = \{ q \in \mathbb{R} \mid q = n\mu_0 ,\ \forall\, n \in \mathbb{Z} \} .$$
This means that the basic kets $|\mu_n\rangle$ that will be considered correspond precisely to labels $\mu_n$ belonging to the graph $\gamma_{\mu_0}$, that is, $\mu_n = n\mu_0$. Thus, we shall only consider states of the form
$$|\psi\rangle = \sum_{n} b_n\, |\mu_n\rangle . \tag{21}$$
This ‘small’ Hilbert space $\mathcal{H}_{\gamma_{\mu_0}}$, the graph Hilbert space, is a subspace of the ‘large’ polymer Hilbert space $\mathcal{H}_{\mathrm{poly}}$, but it is separable. The condition for a state of the form (21) to belong to the Hilbert space $\mathcal{H}_{\gamma_{\mu_0}}$ is that the coefficients $b_n$ satisfy $\sum_n |b_n|^2 < \infty$.

Let us now consider the kinetic term $\hat{p}^2/2m$.
We have to approximate it by means of trigonometric functions, which can be built out of functions of the form $e^{i\lambda p/\hbar}$. As we have seen in previous sections, these functions can indeed be promoted to operators and act as translation operators on the kets $|\mu\rangle$. If we want to remain in the graph $\gamma$, and not create ‘new points’, then we are constrained to consider operators that displace the kets by just the right amount. That is, we want the basic shift operator $\hat{V}(\lambda)$ to be such that it maps the ket with label $\mu_n$ to the next ket, namely $|\mu_{n+1}\rangle$. This can indeed be achieved by fixing, once and for all, the value of the allowed parameter $\lambda$ to be $\lambda = \mu_0$. We then have
$$\hat{V}(\mu_0)\cdot|\mu_n\rangle = |\mu_n + \mu_0\rangle = |\mu_{n+1}\rangle ,$$
which is what we wanted. This basic ‘shift operator’ will be the building block for approximating any (polynomial) function of $p$. In order to do that, we notice that the function $p$ can be approximated by
$$p \approx \frac{\hbar}{\mu_0}\,\sin\!\left(\frac{\mu_0\, p}{\hbar}\right) = \frac{\hbar}{2i\mu_0}\left( e^{i\mu_0 p/\hbar} - e^{-i\mu_0 p/\hbar} \right) ,$$
where the approximation is good for $p \ll \hbar/\mu_0$. Thus, one can define a regulated operator $\hat{p}_{\mu_0}$ that depends on the ‘scale’ $\mu_0$ as
$$\hat{p}_{\mu_0}\cdot|\mu_n\rangle := \frac{\hbar}{2i\mu_0}\,[\hat{V}(\mu_0) - \hat{V}(-\mu_0)]\cdot|\mu_n\rangle = \frac{\hbar}{2i\mu_0}\,\big(|\mu_{n+1}\rangle - |\mu_{n-1}\rangle\big) \tag{22}$$
In order to regulate the operator $\hat{p}^2$, there are (at least) two possibilities, namely to compose the operator $\hat{p}_{\mu_0}$ with itself or to define a new approximation. The operator $\hat{p}_{\mu_0}\cdot\hat{p}_{\mu_0}$ has the feature that it shifts the states two steps in the graph, to both sides. There is, however, another operator that only involves shifting once:
$$\hat{p}^2_{\mu_0}\cdot|\nu_n\rangle := \frac{\hbar^2}{\mu_0^2}\,[2 - \hat{V}(\mu_0) - \hat{V}(-\mu_0)]\cdot|\nu_n\rangle = \frac{\hbar^2}{\mu_0^2}\,\big(2|\nu_n\rangle - |\nu_{n+1}\rangle - |\nu_{n-1}\rangle\big) \tag{23}$$
which corresponds to the approximation $p^2 \approx \frac{2\hbar^2}{\mu_0^2}\,\big(1 - \cos(\mu_0 p/\hbar)\big)$, valid also in the regime $p \ll \hbar/\mu_0$. With these considerations, one can define the operator $\hat{H}_{\mu_0}$, the Hamiltonian at scale $\mu_0$, which in practice ‘lives’ on the space $\mathcal{H}_{\gamma_{\mu_0}}$, as
$$\hat{H}_{\mu_0} := \frac{1}{2m}\,\hat{p}^2_{\mu_0} + \hat{V}(q) , \tag{24}$$
which is a well defined, symmetric operator on $\mathcal{H}_{\gamma_{\mu_0}}$. Notice that the operator is also defined on $\mathcal{H}_{\mathrm{poly}}$, but there its physical interpretation is problematic.
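The difference between the two regularizations of $\hat{p}^2$ can be made concrete with finite matrices. The sketch below represents the shift $\hat{V}(\mu_0)$ on a ring of lattice sites; the periodic wrap-around, the choice $\hbar = 1$, and all function names are assumptions introduced only to obtain finite matrices, and are not part of the construction on the infinite graph.

```python
import numpy as np

HBAR = 1.0  # assumption: units with hbar = 1


def shift_matrix(n_sites):
    """Matrix of V(mu0) on a ring of n_sites points: |mu_n> -> |mu_{n+1}>.
    The periodic wrap-around is an assumption made to get a finite matrix."""
    return np.roll(np.eye(n_sites), 1, axis=0)


def regulated_p(n_sites, mu0, hbar=HBAR):
    """Regulated momentum: hbar / (2 i mu0) * [V(mu0) - V(-mu0)]."""
    s = shift_matrix(n_sites)
    return hbar / (2j * mu0) * (s - s.T)  # V(-mu0) is the transpose of V(mu0)


def regulated_p_squared(n_sites, mu0, hbar=HBAR):
    """Single-step regulated p^2: hbar^2 / mu0^2 * [2 - V(mu0) - V(-mu0)]."""
    s = shift_matrix(n_sites)
    return hbar**2 / mu0**2 * (2.0 * np.eye(n_sites) - s - s.T)


if __name__ == "__main__":
    n_sites, mu0 = 8, 0.5
    p = regulated_p(n_sites, mu0)
    p2 = regulated_p_squared(n_sites, mu0)
    # Row 0 of p @ p couples site 0 to sites +-2; row 0 of p2 couples it to sites +-1.
    print(np.round((p @ p).real[0], 3))
    print(np.round(p2[0], 3))
```

Composing the regulated momentum with itself yields the same stencil as the single-step operator, but with lattice spacing $2\mu_0$, so the two choices are genuinely different operators at a fixed scale.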
For example, it turns out that the expectation value of the kinetic term calculated on most states (states which are not tailored to the exact value of the parameter $\mu_0$) is zero. Even if one takes a state that gives “reasonable” expectation values of the $\mu_0$-kinetic term and uses it to calculate the expectation value of the kinetic term corresponding to a slight perturbation of the parameter $\mu_0$, one would get zero. This problem, and others that arise when working on $\mathcal{H}_{\mathrm{poly}}$, forces one to assign a physical interpretation to the Hamiltonian $\hat{H}_{\mu_0}$ only when its action is restricted to the subspace $\mathcal{H}_{\gamma_{\mu_0}}$.

Let us now explore the form that the Hamiltonian takes in the two possible polarizations. In the $q$-polarization, the basis, labelled by $n$, is given by the functions $\chi_n(q) = \delta_{q,\mu_n}$. That is, the wave functions will only have support on the set $\gamma_{\mu_0}$. Alternatively, one can think of a state as completely characterized by the ‘Fourier coefficients’ $a_n$: $\psi(q) \leftrightarrow a_n$, where $a_n$ is the value that the wave function $\psi(q)$ takes at the point $q = \mu_n = n\mu_0$. Thus, the Hamiltonian takes the form of a difference equation when acting on a general state $\psi(q)$. Solving the time-independent Schrödinger equation $\hat{H}\cdot\psi = E\,\psi$ amounts to solving the difference equation for the coefficients $a_n$.

The momentum polarization has a different structure. In this case, the operator $\hat{p}^2_{\mu_0}$ acts as a multiplication operator,
$$\hat{p}^2_{\mu_0}\cdot\psi(p) = \frac{2\hbar^2}{\mu_0^2}\left[ 1 - \cos\!\left(\frac{\mu_0\, p}{\hbar}\right) \right] \psi(p) \tag{25}$$
The operator corresponding to $q$ will be represented as a derivative operator, $\hat{q}\cdot\psi(p) := i\hbar\,\partial_p\,\psi(p)$. For a generic potential $V(q)$, the corresponding operator has to be defined by means of spectral theory, now defined on a circle. Why on a circle? For the simple reason that, by restricting ourselves to a regular graph $\gamma_{\mu_0}$, the functions of $p$ that preserve it (when acting as shift operators) are of the form $e^{i m \mu_0 p/\hbar}$ for $m$ integer. That is, what we have are Fourier modes, labelled by $m$, of period $2\pi\hbar/\mu_0$ in $p$.
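In the $q$-polarization, the eigenvalue problem for the Hamiltonian at scale $\mu_0$ is a three-term difference equation for the coefficients $a_n$, which can be solved numerically as a tridiagonal matrix problem. The following is a minimal sketch under stated assumptions: the lattice is truncated to $|n| \le n_{\max}$ (so coefficients outside the window are set to zero), units are $m = \hbar = 1$ by default, and the harmonic potential in the usage example is chosen only for illustration. The function names are hypothetical.

```python
import numpy as np


def lattice_hamiltonian(potential, mu0, n_max, mass=1.0, hbar=1.0):
    """Matrix of H = p^2_mu0 / (2 m) + V(q) on the lattice points q_n = n * mu0.

    The kinetic part is the single-step operator
    (hbar^2 / mu0^2) * (2 a_n - a_{n+1} - a_{n-1}) divided by 2 m.
    Truncating to |n| <= n_max is an assumption of this sketch."""
    n = np.arange(-n_max, n_max + 1)
    q = n * mu0
    kinetic_scale = hbar**2 / (2.0 * mass * mu0**2)
    h = np.diag(2.0 * kinetic_scale + potential(q))
    off_diagonal = -kinetic_scale * np.ones(len(q) - 1)
    h += np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    return q, h


def lowest_levels(potential, mu0, n_max, count=3, **kwargs):
    """Lowest eigenvalues of the truncated lattice Hamiltonian."""
    _, h = lattice_hamiltonian(potential, mu0, n_max, **kwargs)
    return np.linalg.eigvalsh(h)[:count]


if __name__ == "__main__":
    def harmonic(q):
        return 0.5 * q**2  # illustrative potential with m = omega = 1

    for mu0, n_max in ((0.2, 50), (0.1, 100), (0.05, 200)):
        print(mu0, lowest_levels(harmonic, mu0, n_max))
```

For the low-lying levels, whose momenta satisfy $p \ll \hbar/\mu_0$, the printed eigenvalues approach the Schrödinger values $\hbar\omega(n + 1/2)$ as $\mu_0$ decreases, in line with the regime of validity of the approximation.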
Can we pretend then that the phase space variable $p$ is now compactified? The answer is in the affirmative. The inner product on periodic functions $\psi_{\mu_0}(p)$ of $p$ coming from the full Hilbert space $\mathcal{H}_{\mathrm{poly}}$, given by
$$\langle\phi(p)|\psi(p)\rangle_{\mathrm{poly}} = \lim_{L\mapsto\infty} \frac{1}{2L}\int_{-L}^{L} dp\; \overline{\phi(p)}\,\psi(p) ,$$
is precisely equivalent to the inner product on the circle given by the uniform measure,
$$\langle\phi(p)|\psi(p)\rangle_{\mu_0} = \frac{\mu_0}{2\pi\hbar}\int_{-\pi\hbar/\mu_0}^{\pi\hbar/\mu_0} dp\; \overline{\phi(p)}\,\psi(p) ,$$
with $p \in (-\pi\hbar/\mu_0, \pi\hbar/\mu_0)$. As long as one restricts attention to the graph $\gamma_{\mu_0}$, one can work in this separable Hilbert space $\mathcal{H}_{\gamma_{\mu_0}}$ of square integrable functions on $S^1$.

Immediately, one can see the limitations of this description. If the mechanical system to be quantized is such that its orbits have values of the momenta $p$ that are not small compared with $\pi\hbar/\mu_0$, then the approximation will be very poor, and we expect neither the effective classical description nor its quantization to be close to the standard one. If, on the other hand, one is always within the region in which the approximation can be regarded as reliable, then both classical and quantum descriptions should approximate the standard description.

What ‘close to the standard description’ exactly means needs, of course, some further clarification. In particular, one is assuming the existence of the usual Schrödinger representation, in which the system has a behavior that is also consistent with observations. If this is the case, the natural question is: how can we approximate such a description from the polymer picture? Is there a fine enough graph $\gamma_{\mu_0}$ that will approximate the system in such a way that all observations are indistinguishable? Or even better, can we define a procedure that involves a refinement of the graph $\gamma_{\mu_0}$ such that one recovers the standard picture?

It could also happen that a continuum limit can be defined but does not coincide with the ‘expected one’. But there might also be physical systems for which there is no standard description, or for which it just does not make sense.
Can the polymer representation, if it exists, provide the correct physical description of the system under consideration in those cases? For instance, if there exists a physical limitation on the minimum scale set by $\mu_0$, as could be the case for a quantum theory of gravity, then the polymer description would provide a true physical bound on the value of certain quantities, such as $p$ in our example. This could be the case for loop quantum cosmology, where there is a minimum value for the physical volume (coming from the full theory), and phase space points near the ‘singularity’ lie in the region where the approximation induced by the scale $\mu_0$ departs from the standard classical description. If in that case the polymer quantum system is regarded as more fundamental than the classical system (or its standard Wheeler-DeWitt quantization), then one would interpret these discrepancies in the behavior as a signal of the breakdown of the classical description (or its ‘naive’ quantization).

In the next section we present a method to remove the regulator $\mu_0$, which was introduced as an intermediate step to construct the dynamics. More precisely, we shall consider the construction of a continuum limit of the polymer description by means of a renormalization procedure.

V. THE CONTINUUM LIMIT

This section has two parts. In the first one we motivate the need for a precise notion of the continuum limit of the polymeric representation, explaining why the most direct and naive approach does not work. In the second part, we shall present the main ideas and results of the paper [6], where the Hamiltonian and the physical Hilbert space in polymer quantum mechanics are constructed as a continuum limit of effective theories, following Wilson’s renormalization group ideas. The resulting physical Hilbert space turns out to be unitarily isomorphic to the ordinary $\mathcal{H}_s = L^2(\mathbb{R}, dq)$ of the Schrödinger theory.
Before describing the results of [6] we should discuss the precise meaning of reaching a theory in the continuum. Let us for concreteness consider the B-type representation in the $q$-polarization. That is, states are functions of $q$ and the orthonormal basis $\chi_\mu(q)$ is given by characteristic functions with support on $q = \mu$. Let us now suppose we have a Schrödinger state $\Psi(q) \in \mathcal{H}_s = L^2(\mathbb{R}, dq)$. What is the relation between $\Psi(q)$ and a state in $\mathcal{H}_{\mathrm{poly},x}$? We are also interested in the opposite question, that is, we would like to know if there is a preferred state in $\mathcal{H}_s$ that is approximated by an arbitrary state $\psi(q)$ in $\mathcal{H}_{\mathrm{poly},x}$.

The first obvious observation is that a Schrödinger state $\Psi(q)$ does not belong to $\mathcal{H}_{\mathrm{poly},x}$, since it would have an infinite norm. To see this, note that the would-be state can be formally expanded in the $\chi_\mu$ basis as
$$\Psi(q) = \sum_{\mu} \Psi(\mu)\, \chi_\mu(q) ,$$
where the sum is over the parameter $\mu \in \mathbb{R}$, but its associated norm in $\mathcal{H}_{\mathrm{poly},x}$ would be
$$|\Psi(q)|^2_{\mathrm{poly}} = \sum_{\mu} |\Psi(\mu)|^2 \to \infty ,$$
which blows up. Note that in order to define a mapping $P : \mathcal{H}_s \to \mathcal{H}_{\mathrm{poly},x}$, there is a huge ambiguity, since the values of the function $\Psi(q)$ are needed in order to expand the polymer wave function. Thus we can only define a mapping on a dense subset $\mathcal{D}$ of $\mathcal{H}_s$ where the values of the functions are well defined (recall that in $\mathcal{H}_s$ the value of a function at a given point has no meaning, since states are equivalence classes of functions). We could for instance ask that the mapping be defined for representatives of the equivalence classes in $\mathcal{H}_s$ that are piecewise continuous. From now on, when we refer to an element of the space $\mathcal{H}_s$ we shall be referring to one of those representatives.

Notice then that an element of $\mathcal{H}_s$ does define an element of $\mathrm{Cyl}^*_\gamma$, the dual to the space $\mathrm{Cyl}_\gamma$, that is, the space of cylinder functions with support on the (finite) lattice $\gamma = \{\mu_1, \mu_2, \ldots, \mu_N\}$, in the following way: $\Psi(q) : \mathrm{Cyl}_\gamma \longrightarrow \mathbb{C}$ such that
$$\Psi(q)[\psi(q)] = (\Psi|\psi\rangle := \sum_{\mu} \Psi(\mu)\, \Big\langle \chi_\mu \Big|\, \sum_{i=1}^{N} \psi_i\, \chi_{\mu_i} \Big\rangle_{\mathrm{poly}_\gamma} = \sum_{i=1}^{N} \Psi(\mu_i)\,\psi_i < \infty \tag{26}$$
Note that this mapping could be seen as consisting of two parts. First, a projection $P_\gamma : \mathrm{Cyl}^* \to \mathrm{Cyl}_\gamma$ such that $P_\gamma(\Psi) = \Psi_\gamma(q) := \sum_i \Psi(\mu_i)\,\chi_{\mu_i}(q) \in \mathrm{Cyl}_\gamma$. The state $\Psi_\gamma$ is sometimes referred to as the ‘shadow of $\Psi(q)$ on the lattice $\gamma$’. The second step is then to take the inner product between the shadow $\Psi_\gamma(q)$ and the state $\psi(q)$ with respect to the polymer inner product, $\langle\Psi_\gamma|\psi\rangle_{\mathrm{poly}_\gamma}$. Now this inner product is well defined.

Notice that for any given lattice $\gamma$ the corresponding projector $P_\gamma$ can be intuitively interpreted as some kind of ‘coarse graining map’ from the continuum to the lattice $\gamma$. In terms of functions of $q$, the projection replaces a continuous function defined on $\mathbb{R}$ with a function over the lattice $\gamma \subset \mathbb{R}$, which is a discrete set, simply by restricting $\Psi$ to $\gamma$. The finer the lattice, the more points we have on the curve. As we shall see in the second part of this section, there is indeed a precise notion of coarse graining that implements this intuitive idea in a concrete fashion. In particular, we shall need to replace the lattice $\gamma$ with a decomposition of the real line into intervals (having the lattice points as end points).

Let us now consider a system in the polymer representation in which a particular lattice $\gamma_0$ was chosen, say with points of the form $\{ q_k \in \mathbb{R} \mid q_k = k a_0 ,\ \forall\, k \in \mathbb{Z} \}$, namely a uniform lattice with spacing equal to $a_0$. In this case, any Schrödinger wave function (of the type that we consider) will have a unique shadow on the lattice $\gamma_0$. If we refine the lattice, $\gamma \mapsto \gamma_n$, by dividing each interval into $2^n$ new intervals of length $a_n = a_0/2^n$, we obtain new shadows that have more and more points on the curve. Intuitively, by refining the graph infinitely we would recover the original function $\Psi(q)$.
Even though at each finite step the corresponding shadow has a finite norm in the polymer Hilbert space, the norm grows unboundedly and the limit cannot be taken, precisely because we cannot embed $\mathcal{H}_s$ into $\mathcal{H}_{\mathrm{poly}}$. Suppose now that we are interested in the reverse process, namely starting from a polymer theory on a lattice and asking for the ‘continuum wave function’ that is best approximated by a wave function over a graph. Suppose furthermore that we want to consider the limit of the graph becoming finer. In order to give precise answers to these (and other) questions we need to introduce some new technology that will allow us to overcome these apparent difficulties. In the remainder of this section we shall recall these constructions for the benefit of the reader. Details can be found in [6] (which is an application of the general formalism discussed in [9]).

The starting point in this construction is the concept of a scale $C$, which allows us to define the effective theories and the concept of continuum limit. In our case a scale is a decomposition of the real line into a union of closed-open intervals that cover the whole line and do not intersect. Intuitively, we are shifting the emphasis from the lattice points to the intervals defined by the same points, with the objective of approximating continuous functions defined on $\mathbb{R}$ with functions that are constant on the intervals defined by the lattice. To be precise, we define an embedding, for each scale $C_n$, from $\mathcal{H}_{\mathrm{poly}}$ to $\mathcal{H}_s$ by means of a step function:
$$\sum_{m} \Psi(m a_n)\, \chi_{m a_n}(q) \;\to\; \sum_{m} \Psi(m a_n)\, \chi_{\alpha_m}(q) \in \mathcal{H}_s ,$$
with $\chi_{\alpha_m}(q)$ the characteristic function of the interval $\alpha_m = [m a_n, (m+1) a_n)$. Thus, the shadows (living on the lattice) were just an intermediate step in the construction of the approximating function; this function is piecewise constant and can be written as a linear combination of step functions with the coefficients provided by the shadows.
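The contrast between the polymer norm of a shadow and the $L^2$ norm of the associated step function can be illustrated numerically. The sketch below assumes a normalized Gaussian as the Schrödinger state, a lattice spacing $a_n = a_0/2^n$, and a truncation of the lattice to a finite window on which the Gaussian tails are negligible; the function names are hypothetical.

```python
import numpy as np


def gaussian_state(q):
    """Normalized Gaussian wave function, used here only as a sample Schroedinger state."""
    return np.pi**-0.25 * np.exp(-(q**2) / 2.0)


def shadow(a0, n, q_max=10.0, wave_function=gaussian_state):
    """Values of the wave function on the lattice points k * a_n with a_n = a0 / 2**n.
    Restricting to |q| <= q_max is an assumption of this sketch."""
    a_n = a0 / 2**n
    k_max = int(q_max / a_n)
    k = np.arange(-k_max, k_max + 1)
    return a_n, wave_function(k * a_n)


def polymer_norm_squared(values):
    """Squared norm of the shadow with respect to the counting measure."""
    return np.sum(np.abs(values) ** 2)


def step_function_norm_squared(a_n, values):
    """Squared L^2(R, dq) norm of the piecewise constant function built from the shadow:
    each interval of length a_n carries the constant value of the shadow."""
    return a_n * np.sum(np.abs(values) ** 2)


if __name__ == "__main__":
    for n in range(6):
        a_n, values = shadow(0.5, n)
        print(n, polymer_norm_squared(values), step_function_norm_squared(a_n, values))
```

For this sample state the polymer norm squared grows like $1/a_n$, doubling at each refinement, while the $L^2$ norm squared of the step function stays close to one. This is the quantitative content of shifting the emphasis from lattice points to intervals.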
The challenge now is to define, in an appropriate sense, how one can approximate all the aspects of the theory by means of these piecewise constant functions. The strategy is that, for any given scale, one can define an effective theory by approximating the kinetic operator by a combination of the translation operators that shift between the vertices of the given decomposition, in other words by a periodic function in p. As a result one has a set of effective theories at given scales which are mutually related by coarse graining maps. This framework was developed in [6]. For the convenience of the reader we briefly recall part of that framework.

Let us denote the kinematic polymer Hilbert space at the scale $C_n$ as $H_{C_n}$, and its basis elements as $e_{\alpha_i,C_n}$, where $\alpha_i = [i a_n, (i+1) a_n) \in C_n$. By construction this basis is orthonormal. The basis elements in the dual Hilbert space $H^*_{C_n}$ are denoted by $\omega_{\alpha_i,C_n}$; they are also orthonormal. The states $\omega_{\alpha_i,C_n}$ have a simple action on Cyl, $\omega_{\alpha_i,C_n}(\delta_{x_0,q}) = \chi_{\alpha_i,C_n}(x_0)$. That is, the result is one if $x_0$ is in the interval $\alpha_i$ of $C_n$ and zero otherwise.

Given any m ≤ n, we define $d^*_{m,n} : H^*_{C_n} \to H^*_{C_m}$ as the ‘coarse graining’ map between the dual Hilbert spaces, which sends part of the elements of the dual basis to zero while keeping the information of the rest: $d^*_{m,n}(\omega_{\alpha_i,C_n}) = \omega_{\beta_j,C_m}$ if $i = j\,2^{n-m}$, and $d^*_{m,n}(\omega_{\alpha_i,C_n}) = 0$ otherwise.

At every scale the corresponding effective theory is given by the Hamiltonian $H_n$. These Hamiltonians will be treated as quadratic forms, $h_n : H_{C_n} \to R$, given by

$$h_n(\psi) = \lambda^2_{C_n}\,(\psi, H_n \psi), \qquad (27)$$

where $\lambda^2_{C_n}$ is a normalization factor. We will see later that this rescaling of the inner product is necessary in order to guarantee the convergence of the renormalized theory. The completely renormalized theory at the scale $C_m$ is obtained as

$$h^{\mathrm{ren}}_m := \lim_{C_n \to \infty} d^\star_{m,n} h_n , \qquad (28)$$

and the renormalized Hamiltonians are compatible with each other, in the sense that $d^\star_{m,n} h^{\mathrm{ren}}_n = h^{\mathrm{ren}}_m$.

In order to analyze the conditions for the convergence in (28), let us express the Hamiltonian in terms of its eigen-covectors and eigenvalues. We will work with effective Hamiltonians that have a purely discrete spectrum (labelled by ν), $H_n \cdot \Psi_{\nu,C_n} = E_{\nu,C_n} \Psi_{\nu,C_n}$. We shall also introduce, as an intermediate step, a cut-off in the energy levels. The origin of this cut-off is in the approximation of the Hamiltonian of our system at a given scale with a Hamiltonian of a periodic system in a regime of small energies, as we explained earlier. Thus, we can write

$$h^{\nu_{\text{cut-off}}}_m = \sum_{\nu \le \nu_{\text{cut-off}}} E_{\nu,C_m}\, \Psi_{\nu,C_m} \otimes \Psi_{\nu,C_m} , \qquad (29)$$

where the eigen-covectors $\Psi_{\nu,C_m}$ are normalized according to the inner product rescaled by $1/\lambda^2_{C_m}$, and the cut-off can vary up to a scale-dependent bound, $\nu_{\text{cut-off}} \le \nu_{\max}(C_m)$. The Hilbert space of covectors together with such inner product will be called $H^{\star\,\mathrm{ren}}_{C_m}$.

In the presence of a cut-off, the convergence of the microscopically corrected Hamiltonians, equation (28), is equivalent to the existence of the following two limits. The first one is the convergence of the energy levels,

$$\lim_{C_n \to \infty} E_{\nu,C_n} = E^{\mathrm{ren}}_\nu . \qquad (30)$$

The second is the existence of the completely renormalized eigen-covectors,

$$\lim_{C_n \to \infty} d^\star_{m,n}\, \Psi_{\nu,C_n} = \Psi^{\mathrm{ren}}_{\nu,C_m} \in H^{\star\,\mathrm{ren}}_{C_m} \subset \mathrm{Cyl}^\star . \qquad (31)$$

We clarify that the existence of the above limit means that $\Psi^{\mathrm{ren}}_{\nu,C_m}(\delta_{x_0,q})$ is well defined for any $\delta_{x_0,q} \in \mathrm{Cyl}$. Notice that this point-wise convergence, if it can take place at all, will require the tuning of the normalization factors $\lambda^2_{C_n}$.

Now we turn to the question of the continuum limit of the renormalized covectors. First we can ask for the existence of the limit

$$\lim_{C_n \to \infty} \Psi^{\mathrm{ren}}_{\nu,C_n}(\delta_{x_0,q}) \qquad (32)$$

for any $\delta_{x_0,q} \in \mathrm{Cyl}$. When this limit exists there is a natural action of the eigen-covectors in the continuum limit. Below we consider another notion of the continuum limit of the renormalized eigen-covectors.
When the completely renormalized eigen-covectors exist, they form a collection that is d⋆-compatible, $d^\star_{m,n} \Psi^{\mathrm{ren}}_{\nu,C_n} = \Psi^{\mathrm{ren}}_{\nu,C_m}$. A sequence of d⋆-compatible normalizable covectors defines an element of $\overleftarrow{H}^{\star\,\mathrm{ren}}_R$, which is the projective limit of the renormalized spaces of covectors,

$$\overleftarrow{H}^{\star\,\mathrm{ren}}_R := \varprojlim_{C_n \in R} H^{\star\,\mathrm{ren}}_{C_n} . \qquad (33)$$

The inner product in this space is defined by

$$(\{\Psi_{C_n}\}, \{\Phi_{C_n}\})^{\mathrm{ren}}_R := \lim_{C_n \to \infty} (\Psi_{C_n}, \Phi_{C_n})^{\mathrm{ren}}_{C_n} .$$

The natural inclusion of $C^\infty_0$ in $\overleftarrow{H}^{\star\,\mathrm{ren}}_R$ is by an antilinear map which assigns to any $\Psi \in C^\infty_0$ the d⋆-compatible collection

$$\Psi^{\mathrm{shad}}_{C_n} := \sum_{\alpha_i} \omega_{\alpha_i}\, \bar\Psi(L(\alpha_i)) \in H^{\star\,\mathrm{ren}}_{C_n} \subset \mathrm{Cyl}^\star .$$

$\Psi^{\mathrm{shad}}_{C_n}$ will be called the shadow of Ψ at scale $C_n$; it acts in Cyl as a piecewise constant function. Clearly other types of test functions, like Schwartz functions, are also naturally included in $\overleftarrow{H}^{\star\,\mathrm{ren}}_R$. In this context a shadow is a state of the effective theory that approximates a state in the continuum theory.

Since the inner product in $\overleftarrow{H}^{\star\,\mathrm{ren}}_R$ is degenerate, the physical Hilbert space is defined as

$$H^\star_{\mathrm{phys}} := \overleftarrow{H}^{\star\,\mathrm{ren}}_R / \ker(\cdot,\cdot)^{\mathrm{ren}}_R , \qquad H_{\mathrm{phys}} := H^{\star\star}_{\mathrm{phys}} .$$

The nature of the physical Hilbert space, whether it is isomorphic to the Schrödinger Hilbert space Hs or not, is determined by the normalization factors $\lambda^2_{C_n}$, which can be obtained from the conditions asking for compatibility of the dynamics of the effective theories at different scales. The dynamics of the system under consideration selects the continuum limit.

Let us now return to the definition of the Hamiltonian in the continuum limit. First consider the continuum limit of the Hamiltonian (with cut-off) in the sense of its point-wise convergence as a quadratic form. It turns out that if the limit of equation (32) exists for all the eigen-covectors allowed by the cut-off, we have $h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_R : H_{\mathrm{poly},x} \to R$ defined by

$$h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_R(\delta_{x_0,q}) := \lim_{C_n \to \infty} h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_n([\delta_{x_0,q}]_{C_n}) . \qquad (34)$$

This Hamiltonian quadratic form in the continuum can be coarse grained to any scale and, as can be expected, it yields the completely renormalized Hamiltonian quadratic forms at that scale.
However, this is not a completely satisfactory continuum limit because we cannot remove the auxiliary cut-off $\nu_{\text{cut-off}}$. If we tried, as we include more and more eigen-covectors in the Hamiltonian the calculations done at a given scale would diverge, and doing them in the continuum is just as divergent. Below we explore a more successful path.

We can use the renormalized inner product to induce an action of the cut-off Hamiltonians on $\overleftarrow{H}^{\star\,\mathrm{ren}}_R$,

$$h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_R(\{\Psi_{C_n}\}) := \lim_{C_n \to \infty} h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_n\big((\Psi_{C_n}, \cdot)^{\mathrm{ren}}_{C_n}\big) ,$$

where we have used the fact that $(\Psi_{C_n}, \cdot)^{\mathrm{ren}}_{C_n} \in H_{C_n}$. The existence of this limit is trivial because the renormalized Hamiltonians are finite sums and the limit exists term by term. These cut-off Hamiltonians descend to the physical Hilbert space,

$$h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_R([\{\Psi_{C_n}\}]) := h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_R(\{\Psi_{C_n}\})$$

for any representative $\{\Psi_{C_n}\} \in [\{\Psi_{C_n}\}] \in H^\star_{\mathrm{phys}}$.

Finally we can address the issue of removal of the cut-off. The Hamiltonian $h^{\mathrm{ren}}_R : \overleftarrow{H}^{\star\,\mathrm{ren}}_R \to R$ is defined by the limit

$$h^{\mathrm{ren}}_R := \lim_{\nu_{\text{cut-off}} \to \infty} h^{\nu_{\text{cut-off}}\,\mathrm{ren}}_R$$

when the limit exists. Its corresponding Hermitian form in Hphys is defined whenever the above limit exists. This concludes our presentation of the main results of [6]. Let us now consider several examples of systems for which the continuum limit can be investigated.

VI. EXAMPLES

In this section we shall develop several examples of systems that have been treated with the polymer quantization. These examples are simple quantum mechanical systems, such as the simple harmonic oscillator and the free particle, as well as a quantum cosmological model known as loop quantum cosmology.

A. The Simple Harmonic Oscillator

In this part, let us consider the example of a Simple Harmonic Oscillator (SHO) with parameters m and ω, classically described by the following Hamiltonian:

$$H = \frac{1}{2m}\,p^2 + \frac{1}{2}\, m\omega^2 x^2 .$$

Recall that from these parameters one can define a length scale $D = \sqrt{\hbar/m\omega}$.
In the standard treatment one uses this scale to define a complex structure $J_D$ (and an inner product from it), as we have described in detail, that uniquely selects the standard Schrödinger representation.

At scale $C_n$ we have an effective Hamiltonian for the Simple Harmonic Oscillator (SHO) given by

$$H_{C_n} = \frac{\hbar^2}{m a_n^2}\left[1 - \cos\frac{a_n p}{\hbar}\right] + \frac{1}{2}\, m\omega^2 x^2 . \qquad (35)$$

If we interchange position and momentum, this Hamiltonian is exactly that of a pendulum of mass m, length l and subject to a constant gravitational field g:

$$\hat H_{C_n} = -\frac{\hbar^2}{2 m l^2}\,\frac{d^2}{d\theta^2} + m g l\,(1 - \cos\theta) ,$$

where those quantities are related to our system by

$$l = \frac{\hbar}{m\omega\, a_n}, \qquad g = \frac{\hbar\,\omega}{m\, a_n}, \qquad \theta = \frac{p\, a_n}{\hbar} .$$

That is, we are approximating, for each scale $C_n$, the SHO by a pendulum. There is, however, an important difference. From our knowledge of the pendulum system, we know that the quantum system will have a spectrum for the energy that has two different asymptotic behaviors, the SHO for low energies and the planar rotor in the higher end, corresponding to oscillating and rotating solutions respectively (see footnote 2). As we refine our scale and both the length of the pendulum and the height of the periodic potential increase, we expect to have an increasing number of oscillating states (for a given pendulum system, there is only a finite number of such states). Thus, it is justified to consider the cut-off in the energy eigenvalues, as discussed in the last section, given that we only expect a finite number of states of the pendulum to approximate SHO eigenstates. With these considerations in mind, the relevant question is whether the conditions for the continuum limit to exist are satisfied. This question has been answered in the affirmative in [6].
What was shown there was that the eigenvalues and eigenfunctions of the discrete systems, which form a discrete and non-degenerate set, approximate those of the continuum, namely those of the standard harmonic oscillator, when the inner product is renormalized by a factor $\lambda^2_{C_n} = 1/2^n$. This convergence implies that the continuum limit exists as we understand it. Let us now consider the simplest possible system, a free particle, which nevertheless has the particular feature that the spectrum of the energy is continuous.

Footnote 2: Note that both types of solutions are, in the phase space, closed. This is the reason behind the purely discrete spectrum. The distinction we are making is between those solutions inside the separatrix, which we call oscillating, and those above it, which we call rotating.

B. Free Polymer Particle

In the limit ω → 0, the Hamiltonian of the Simple Harmonic Oscillator (35) goes to the Hamiltonian of a free particle, and the corresponding time-independent Schrödinger equation, in the p-polarization, is given by

$$\left[\frac{\hbar^2}{m a_n^2}\left(1 - \cos\frac{a_n p}{\hbar}\right) - E_{C_n}\right]\tilde\psi(p) = 0 ,$$

where we now have that $p \in S^1$, with $p \in (-\pi\hbar/a_n, \pi\hbar/a_n)$. Thus, we have

$$E_{C_n} = \frac{\hbar^2}{m a_n^2}\left[1 - \cos\frac{a_n p}{\hbar}\right] \le E_{C_n,\max} \equiv \frac{2\hbar^2}{m a_n^2} . \qquad (36)$$

At each scale the energy of the particle we can describe is bounded from above, and the bound depends on the scale. This imposes an upper bound on the value that the energy of the particle can have, in addition to the bound in the momentum due to its “compactification”. Note also that in this case the spectrum is continuous, which implies that the ordinary eigenfunctions of the Hamiltonian are not normalizable.

Let us first look for eigen-solutions to the time-independent Schrödinger equation, that is, for energy eigenstates. In the case of the ordinary free particle, these correspond to constant momentum plane waves of the form $e^{\pm i p x/\hbar}$ and such that the ordinary dispersion relation $p^2/2m = E$ is satisfied.
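The scale-dependent dispersion relation (36) can be explored with a short numerical sketch. The code below is illustrative only: it assumes units with ħ = m = 1, a base spacing a_0 = 1, and hypothetical function names. It evaluates the polymer energy at fixed momentum for successive refinements a_n = a_0/2^n, compares it with the ordinary value p²/2m, and inverts the relation for the momentum at a given energy.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1
M = 1.0     # assumption: unit particle mass

def spacing(a0, n):
    """Lattice spacing a_n = a0 / 2**n at refinement level n."""
    return a0 / 2 ** n

def polymer_energy(p, a_n):
    """Energy from the polymer dispersion relation (36)."""
    return HBAR ** 2 / (M * a_n ** 2) * (1.0 - math.cos(a_n * p / HBAR))

def energy_bound(a_n):
    """Scale-dependent upper bound on the energy in (36)."""
    return 2.0 * HBAR ** 2 / (M * a_n ** 2)

def polymer_momentum(energy, a_n):
    """Invert (36) for p >= 0; valid for 0 <= energy <= energy_bound(a_n)."""
    return HBAR / a_n * math.acos(1.0 - M * a_n ** 2 * energy / HBAR ** 2)

p = 0.7
for n in range(6):
    a_n = spacing(1.0, n)
    # columns: level, polymer energy, ordinary p^2/2m, energy bound
    print(n, polymer_energy(p, a_n), p * p / (2 * M), energy_bound(a_n))
```

For fixed p the polymer energy approaches p²/2m as the spacing shrinks, while the upper bound grows like 1/a_n², so any given energy is eventually below the bound at a fine enough scale.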
These plane waves are not square integrable and do not belong to the ordinary Hilbert space of the Schrödinger theory, but they are still useful for extracting information about the system. For the polymer free particle we have

$$\tilde\psi_{C_n}(p) = c_1\,\delta(p - P_{C_n}) + c_2\,\delta(p + P_{C_n}) ,$$

where $P_{C_n}$ is a solution of the previous equation for a fixed value of $E_{C_n}$. That is,

$$P_{C_n} = P(E_{C_n}) = \frac{\hbar}{a_n}\,\arccos\left(1 - \frac{m a_n^2}{\hbar^2}\,E_{C_n}\right) .$$

The inverse Fourier transform yields, in the ‘x representation’,

$$\psi_{C_n}(x_j) = \int_{-\pi\hbar/a_n}^{\pi\hbar/a_n} \tilde\psi(p)\, e^{\frac{i}{\hbar} p\, x_j}\, dp = c_1\, e^{i x_j P_{C_n}/\hbar} + c_2\, e^{-i x_j P_{C_n}/\hbar} , \qquad (37)$$

with $x_j = a_n j$ for j ∈ Z. Note that the eigenfunctions are still delta functions (in the p representation) and thus not (square) normalizable with respect to the polymer inner product, which in the p polarization is just given by the ordinary Haar measure on $S^1$, and there is no quantization of the momentum (its spectrum is still truly continuous).

Let us now consider the time-dependent Schrödinger equation,

$$i\hbar\, \partial_t \tilde\Psi(p,t) = \hat H \cdot \tilde\Psi(p,t) ,$$

which now takes the form

$$i\hbar\, \partial_t \tilde\Psi(p,t) = \frac{\hbar^2}{m a_n^2}\,\big(1 - \cos(a_n p/\hbar)\big)\, \tilde\Psi(p,t)$$

and has as its solution

$$\tilde\Psi(p,t) = e^{-\frac{i\hbar}{m a_n^2}\,(1 - \cos(a_n p/\hbar))\, t}\, \tilde\psi(p) = e^{(-i E_{C_n}/\hbar)\, t}\, \tilde\psi(p)$$

for any initial function $\tilde\psi(p)$, where $E_{C_n}$ satisfies the dispersion relation (36). The wave function $\Psi(x_j, t)$, the $x_j$-representation of the wave function, can be obtained for any given time t by Fourier transforming the wave function $\tilde\Psi(p,t)$ with (37).

In order to check the convergence of the microscopically corrected Hamiltonians we should analyze the convergence of the energy levels and of the proper covectors. In the limit n → ∞, $E_{C_n} \to E = p^2/2m$, so we can be certain that the eigenvalues for the energy converge (when fixing the value of p). Let us write the proper covector as $\Psi_{C_n} = (\psi_{C_n}, \cdot)^{\mathrm{ren}}_{C_n} \in H^{\star\,\mathrm{ren}}_{C_n}$. Then we can bring microscopic corrections to scale $C_m$ and look for convergence of such corrections,

$$\Psi^{\mathrm{ren}}_{C_m} = \lim_{C_n \to \infty} d^\star_{m,n} \Psi_{C_n} .$$
It is easy to see that, given any basis vector $e_{\alpha_i} \in H_{C_m}$, the following limit

$$\Psi^{\mathrm{ren}}_{C_m}(e_{\alpha_i,C_m}) = \lim_{C_n \to \infty} \Psi_{C_n}\big(d_{n,m}(e_{\alpha_i,C_m})\big)$$

exists and is equal to

$$\Psi^{\mathrm{shad}}_{C_m}(e_{\alpha_i,C_m}) = [d^\star \Psi^{\mathrm{Schr}}](e_{\alpha_i,C_m}) = \Psi^{\mathrm{Schr}}(i a_m) ,$$

where $\Psi^{\mathrm{shad}}_{C_m}$ is calculated using the free particle Hamiltonian in the Schrödinger representation. This expression defines the completely renormalized proper covector at the scale $C_m$.

C. Polymer Quantum Cosmology

In this section we shall present a version of quantum cosmology that we call polymer quantum cosmology. The idea behind this name is that the main input in the quantization of the corresponding mini-superspace model is the use of a polymer representation as here understood. Another important input is the choice of fundamental variables to be used and the definition of the Hamiltonian constraint. Different research groups have made different choices. We shall take here a simple model that has received much attention recently, namely an isotropic, homogeneous FRW cosmology with k = 0, coupled to a massless scalar field ϕ. As we shall see, a proper treatment of the continuum limit of this system requires new tools under development that are beyond the scope of this work. We will thus restrict ourselves to the introduction of the system and the problems that need to be solved.

The system to be quantized corresponds to the phase space of cosmological spacetimes that are homogeneous and isotropic and for which the homogeneous spatial slices have a flat intrinsic geometry (k = 0 condition). The only matter content is a massless scalar field ϕ. In this case the spacetime geometry is given by metrics of the form

$$ds^2 = -dt^2 + a^2(t)\,(dx^2 + dy^2 + dz^2) ,$$

where the function a(t) carries all the information and degrees of freedom of the gravity part.
In terms of the coordinates $(a, p_a, \phi, p_\phi)$ for the phase space Γ of the theory, all the dynamics is captured in the Hamiltonian constraint

$$C := -\frac{3}{8}\,\frac{p_a^2}{|a|} + 8\pi G\,\frac{p_\phi^2}{2|a|^3} \approx 0 .$$

The first step is to define the constraint on the kinematical Hilbert space to find physical states, and then a physical inner product to construct the physical Hilbert space. First note that one can rewrite the equation as

$$\frac{3}{8}\, p_a^2\, a^2 = 8\pi G\,\frac{p_\phi^2}{2} .$$

If, as is normally done, one chooses ϕ to act as an internal time, the right hand side would be promoted, in the quantum theory, to a second derivative. The left hand side is, furthermore, symmetric in a and $p_a$. At this point we have the freedom of choosing the variable that will be quantized and the variable that will not be well defined in the polymer representation. The standard choice is that $p_a$ is not well defined and thus a, and any geometrical quantity derived from it, is quantized. Furthermore, we have the choice of polarization on the wave function. In this respect the standard choice is to select the a-polarization, in which a acts as multiplication and the approximation of $p_a$, namely $\sin(\lambda p_a)/\lambda$, acts as a difference operator on wave functions of a. For details of this particular choice see [5]. Here we shall adopt the opposite polarization, that is, we shall have wave functions $\Psi(p_a, \phi)$.

Just as we did in the previous cases, in order to gain intuition about the behavior of the polymer quantized theory, it is convenient to look at the equivalent problem in the classical theory, namely the classical system we would get by approximating the observable that is not well defined ($p_a$ in our present case) by a well defined object (made of trigonometric functions). Let us for simplicity choose to replace $p_a \mapsto \sin(\lambda p_a)/\lambda$.
With this choice we get an effective classical Hamiltonian constraint that depends on λ:

$$C_\lambda := -\frac{3}{8}\,\frac{\sin^2(\lambda p_a)}{\lambda^2 |a|} + 8\pi G\,\frac{p_\phi^2}{2|a|^3} .$$

We can now compute effective equations of motion by means of the equations $\dot F := \{F, C_\lambda\}$, for any observable $F \in C^\infty(\Gamma)$, where we are using the effective (first order) action

$$S_\lambda = \int d\tau\,\big(p_a\, \dot a + p_\phi\, \dot\phi - N\, C_\lambda\big)$$

with the choice N = 1. The first thing to notice is that the quantity $p_\phi$ is a constant of the motion, given that the variable ϕ is cyclic. The second observation is that $\dot\phi = 8\pi G\, p_\phi/|a|^3$ has the same sign as $p_\phi$ and never vanishes. Thus ϕ can be used as an (internal) time variable. The next observation is that the equation for $\dot a$, namely the effective Friedmann equation, will have a zero for a non-zero value of a given by

$$a_*^2 = \frac{32\pi G}{3}\,\lambda^2 p_\phi^2 .$$

This is the value at which there will be a bounce if the trajectory started with a large value of a and was contracting. Note that the ‘size’ of the universe when the bounce occurs depends on both the constant $p_\phi$ (which dictates the matter density) and the value of the lattice size λ. Here it is important to stress that for any value of $p_\phi$ (which uniquely fixes the trajectory in the $(a, p_a)$ plane), there will be a bounce. In the original description in terms of Einstein’s equations (without the approximation that depends on λ), there is no such bounce. If $\dot a < 0$ initially, it will remain negative and the universe collapses, reaching the singularity in a finite proper time.

What happens within the effective description if we refine the lattice and go from λ to $\lambda_n := \lambda/2^n$? The only thing that changes, for the same classical orbit labelled by $p_\phi$, is that the bounce occurs at a ‘later time’ and for a smaller value of $a_*$, but the qualitative picture remains the same.

This is the main difference with the systems considered before. In those cases, one could have classical trajectories that remained, for a given choice of parameter λ, within the region where sin(λp)/λ is a good approximation to p.
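The role of the lattice size λ in the effective classical description can be made concrete with a small sketch. It is illustrative only and rests on assumptions that should be stated loudly: the effective constraint is taken in the schematic form C_λ = −A sin²(λ p_a)/(λ²|a|) + B p_ϕ²/(2|a|³), the positive constants A and B are given assumed values (with G = 1), and all function names are hypothetical. With N = 1, the velocity ȧ = ∂C_λ/∂p_a vanishes where cos(λ p_a) = 0, and imposing C_λ = 0 there gives a bounce scale proportional to λ|p_ϕ|, whatever the values of A and B.

```python
import math

G = 1.0                  # assumption: units with G = 1
A = 3.0 / 8.0            # assumed coefficient of the gravitational term
B = 8.0 * math.pi * G    # assumed coefficient of the matter term

def effective_momentum(p, lam):
    """Polymer replacement of p by sin(lam * p) / lam."""
    return math.sin(lam * p) / lam

def constraint(a, p_a, p_phi, lam):
    """Effective constraint in the assumed schematic form."""
    gravity = -A * math.sin(lam * p_a) ** 2 / (lam ** 2 * abs(a))
    matter = B * p_phi ** 2 / (2 * abs(a) ** 3)
    return gravity + matter

def a_dot(a, p_a, lam):
    """Velocity da/dtau = dC/dp_a for lapse N = 1."""
    return -A * math.sin(2 * lam * p_a) / (lam * abs(a))

def bounce_scale(p_phi, lam):
    """Value of a at which a_dot = 0 on the constraint surface."""
    return lam * abs(p_phi) * math.sqrt(B / (2 * A))

p_phi = 2.0
for n in range(4):
    lam_n = 0.1 / 2 ** n
    print(n, lam_n, bounce_scale(p_phi, lam_n))
```

In this sketch, halving λ halves the scale factor at the bounce but never removes the bounce, in line with the statement that refinement changes where the bounce occurs while the qualitative picture stays the same. For small λp the function effective_momentum stays close to p, which is the regime where the effective description approximates the original one.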
Of course there were also classical trajectories that were outside this region, but we could then refine the lattice and find a new value λ′ for which the new classical trajectory is well approximated. In the case of the polymer cosmology, this is never the case: every classical trajectory will pass from a region where the approximation is good to a region where it is not; this is precisely where the ‘quantum corrections’ kick in and the universe bounces.

Given that, in the classical description, the ‘original’ and the ‘corrected’ descriptions are so different, we expect that, upon quantization, the corresponding quantum theories, namely the polymeric and the Wheeler-DeWitt ones, will be related in a non-trivial way (if at all).

In this case, with the choice of polarization and for a particular factor ordering we have, sin(λpa) · Ψ(pa, ϕ) = 0 as the Polymer Wheeler-DeWitt equation.

In order to approach the problem of the continuum limit of this quantum theory, we have to realize that the task is now somewhat different than before. This is so given that the system is now a constrained system with a constraint operator rather than a regular non-singular system with an ordinary Hamiltonian evolution. Fortunately for the system under consideration, the fact that the variable ϕ can be regarded as an internal time allows us to interpret the quantum constraint as a generalized Klein-Gordon equation of the form

$$\frac{\partial^2}{\partial\phi^2}\,\Psi = \Theta_\lambda \cdot \Psi ,$$

where the operator $\Theta_\lambda$ is ‘time independent’. This allows us to split the space of solutions into ‘positive and negative frequency’ parts, introduce a physical inner product on the positive frequency solutions of this equation, and define a set of physical observables in terms of which to describe the system. That is, one reduces in practice the system to one very similar to the Schrödinger case by taking the positive square root of the previous equation: Θλ · Ψ.
The question we are interested in is whether the continuum limit of these theories (labelled by λ) exists and whether it corresponds to the Wheeler-DeWitt theory. A complete treatment of this problem lies, unfortunately, outside the scope of this work and will be reported elsewhere [12].

VII. DISCUSSION

Let us summarize our results. In the first part of the article we showed that the polymer representation of the canonical commutation relations can be obtained as the limiting case of the ordinary Fock-Schrödinger representation in terms of the algebraic state that defines the representation. These limiting cases can also be interpreted in terms of the naturally defined coherent states associated to each representation labelled by the parameter d, when they become infinitely ‘squeezed’. The two possible limits of squeezing lead to two different polymer descriptions that can nevertheless be identified, as we have also shown, with the two possible polarizations for an abstract polymer representation. The resulting theory, however, behaves very differently from the standard one: the Hilbert space is non-separable, the representation is unitarily inequivalent to the Schrödinger one, and natural operators such as p̂ are no longer well defined. This particular limiting construction of the polymer theory can shed some light on more complicated systems such as field theories and gravity.

In the regular treatments of dynamics within the polymer representation, one needs to introduce some extra structure, such as a lattice on configuration space, to construct a Hamiltonian and implement the dynamics for the system via a regularization procedure. How does this resulting theory compare to the original continuum theory one had from the beginning? Can one hope to remove the regulator in the polymer description? As they stand, there is no direct relation or mapping from the polymer to a continuum theory (in case there is one defined).
As we have shown, one can indeed construct such a relation in a systematic fashion by means of some appropriate notions related to the definition of a scale, closely related to the lattice one had to introduce in the regularization. With this important shift in perspective, and an appropriate renormalization of the polymer inner product at each scale, one can, subject to some consistency conditions, define a procedure to remove the regulator and arrive at a Hamiltonian and a Hilbert space.

As we have seen, for some simple examples such as a free particle and the harmonic oscillator one indeed recovers the Schrödinger description. For other systems, such as quantum cosmological models, the answer is not as clear, since the structure of the space of classical solutions is such that the ‘effective description’ introduced by the polymer regularization at different scales is qualitatively different from the original dynamics. A proper treatment of this class of systems is underway and will be reported elsewhere [12].

Perhaps the most important lesson that we have learned here is that there indeed exists a rich interplay between the polymer description and the ordinary Schrödinger representation. The full structure of this relation still needs to be unravelled. We can only hope that a full understanding of these issues will shed some light on the ultimate goal of treating the quantum dynamics of background independent field systems such as general relativity.

Acknowledgments

We thank A. Ashtekar, G. Hossain, T. Pawlowski and P. Singh for discussions. This work was in part supported by CONACyT U47857-F and 40035-F grants, by NSF PHY04-56913, by the Eberly Research Funds of Penn State, by the AMC-FUMEC exchange program and by funds of the CIC-Universidad Michoacana de San Nicolás de Hidalgo.

[1] R. Beaume, J. Manuceau, A. Pellet and M. Sirugue, “Translation Invariant States In Quantum Mechanics,” Commun. Math. Phys. 38, 29 (1974); W. E. Thirring and H.
Narnhofer, “Covariant QED without indefinite metric,” Rev. Math. Phys. 4, 197 (1992); F. Acerbi, G. Morchio and F. Strocchi, “Infrared singular fields and nonregular representations of canonical commutation relation algebras”, J. Math. Phys. 34, 899 (1993); F. Cavallaro, G. Morchio and F. Strocchi, “A generalization of the Stone-von Neumann theorem to non-regular representations of the CCR-algebra”, Lett. Math. Phys. 47, 307 (1999); H. Halvorson, “Complementarity of Representations in quantum mechanics”, Studies in History and Philosophy of Modern Physics 35, 45 (2004).
[2] A. Ashtekar, S. Fairhurst and J.L. Willis, “Quantum gravity, shadow states, and quantum mechanics”, Class. Quant. Grav. 20, 1031 (2003) [arXiv:gr-qc/0207106].
[3] K. Fredenhagen and F. Reszewski, “Polymer state approximations of Schrödinger wave functions”, Class. Quant. Grav. 23, 6577 (2006) [arXiv:gr-qc/0606090].
[4] M. Bojowald, “Loop quantum cosmology”, Living Rev. Rel. 8, 11 (2005) [arXiv:gr-qc/0601085]; A. Ashtekar, M. Bojowald and J. Lewandowski, “Mathematical structure of loop quantum cosmology”, Adv. Theor. Math. Phys. 7, 233 (2003) [arXiv:gr-qc/0304074]; A. Ashtekar, T. Pawlowski and P. Singh, “Quantum nature of the big bang: Improved dynamics”, Phys. Rev. D 74, 084003 (2006) [arXiv:gr-qc/0607039].
[5] V. Husain and O. Winkler, “Semiclassical states for quantum cosmology”, Phys. Rev. D 75, 024014 (2007) [arXiv:gr-qc/0607097]; V. Husain and O. Winkler, “On singularity resolution in quantum gravity”, Phys. Rev. D 69, 084016 (2004) [arXiv:gr-qc/0312094].
[6] A. Corichi, T. Vukasinac and J.A. Zapata, “Hamiltonian and physical Hilbert space in polymer quantum mechanics”, Class. Quant. Grav. 24, 1495 (2007) [arXiv:gr-qc/0610072].
[7] A. Corichi and J. Cortez, “Canonical quantization from an algebraic perspective” (preprint).
[8] A. Corichi, J. Cortez and H. Quevedo, “Schrödinger and Fock Representations for a Field Theory on Curved Spacetime”, Annals Phys.
(NY) 313, 446 (2004) [arXiv:hep-th/0202070].
[9] E. Manrique, R. Oeckl, A. Weber and J.A. Zapata, “Loop quantization as a continuum limit”, Class. Quant. Grav. 23, 3393 (2006) [arXiv:hep-th/0511222]; E. Manrique, R. Oeckl, A. Weber and J.A. Zapata, “Effective theories and continuum limit for canonical loop quantization” (preprint).
[10] D.W. Chiou, “Galileo symmetries in polymer particle representation”, Class. Quant. Grav. 24, 2603 (2007) [arXiv:gr-qc/0612155].
[11] W. Rudin, Fourier analysis on groups (Interscience, New York, 1962).
[12] A. Ashtekar, A. Corichi, P. Singh, “Contrasting LQC and WDW using an exactly soluble model” (preprint); A. Corichi, T. Vukasinac, and J.A. Zapata, “Continuum limit for quantum constrained system” (preprint).

ABSTRACT A rather non-standard quantum representation of the canonical commutation relations of quantum mechanics systems, known as the polymer representation, has gained some attention in recent years, due to its possible relation with Planck scale physics. In particular, this approach has been followed in a symmetric sector of loop quantum gravity known as loop quantum cosmology. Here we explore different aspects of the relation between the ordinary Schrödinger theory and the polymer description. The paper has two parts. In the first one, we derive the polymer quantum mechanics starting from the ordinary Schrödinger theory and show that the polymer description arises as an appropriate limit.
In the second part we consider the continuum limit of this theory, namely the reverse process in which one starts from the discrete theory and tries to recover the ordinary Schrödinger quantum mechanics. We consider several examples of interest, including the harmonic oscillator, the free particle and a simple cosmological model.
<|endoftext|><|startoftext|>
Introduction
Conceptual structure for material properties
Idealized one-dimensional loading
Ramp compression
Shock compression
Accuracy: application to air
Complex behavior of condensed matter
Temperature
Density-temperature equations of state
Temperature model for mechanical equations of state
Strength
Preferred representation of isotropic strength
Beryllium
Phase changes
Composite loading paths
Conclusions
Acknowledgments
References
List of figures

ABSTRACT A general formulation was developed to represent material models for applications in dynamic loading. Numerical methods were devised to calculate response to shock and ramp compression, and ramp decompression, generalizing previous solutions for scalar equations of state. The numerical methods were found to be flexible and robust, and matched analytic results to a high accuracy. The basic ramp and shock solution methods were coupled to solve for composite deformation paths, such as shock-induced impacts, and shock interactions with a planar interface between different materials. These calculations capture much of the physics of typical material dynamics experiments, without requiring spatially-resolving simulations. Example calculations were made of loading histories in metals, illustrating the effects of plastic work on the temperatures induced in quasi-isentropic and shock-release experiments, and the effect of a phase transition.
<|endoftext|><|startoftext|>
Introduction

A hypercube H(X) on a set X is a graph whose vertices are the finite subsets of X; two vertices are joined by an edge if they differ by a singleton.
A partial cube is a graph that can be isometrically embedded into a hypercube.

There are three general graph-theoretical structures that play a prominent role in the theory of partial cubes, namely semicubes, Djoković’s relation θ, and Winkler’s relation Θ. We use these structures, in particular, to characterize bipartite graphs and partial cubes. The characterization problem for partial cubes has been considered an important one, and many characterizations are known. We list contributions in chronological order: Djoković [9] (1973), Avis [2] (1981), Winkler [20] (1984), Roth and Winkler [18] (1986), Chepoi [6, 7] (1988 and 1994). In the paper, we present new proofs for the results of Djoković [9], Winkler [20], and Chepoi [6], and obtain two more characterizations of partial cubes.

The paper is also concerned with some ways of constructing new partial cubes from old ones. Properties of subcubes, the Cartesian product of partial cubes, and expansion and contraction of a partial cube are investigated. We introduce a construction based on pasting two graphs together and show how new partial cubes can be obtained from old ones by pasting them together.

The paper is organized as follows. Hypercubes and partial cubes are introduced in Section 2 together with two basic examples of infinite partial cubes. Vertex sets of partial cubes are described in terms of well graded families of finite sets.

In Section 3 we introduce the concepts of a semicube, Djoković’s θ and Winkler’s Θ relations, and establish some of their properties. Bipartite graphs and partial cubes are characterized by means of these structures. One more characterization of partial cubes is obtained in Section 4, where so-called fundamental sets in a graph are introduced.

The rest of the paper is devoted to constructions: subcubes and the Cartesian product (Section 6), pasting (Section 7), and expansions and contractions (Section 8).
We show that these constructions produce new partial cubes from old ones. Isometric and lattice dimensions of new partial cubes are calculated. These dimensions are introduced in Section 5.

A few words about the conventions used in the paper are in order. The sum (disjoint union) A + B of two sets A and B is the union ({1} × A) ∪ ({2} × B). All graphs in the paper are simple undirected graphs. In the notation G = (V, E), the symbol V stands for the set of vertices of the graph G and E stands for its set of edges. By abuse of language, we often write ab for an edge in a graph; if this is the case, ab is an unordered pair of distinct vertices. We denote by 〈U〉 the graph induced by the set of vertices U ⊆ V. If G is a connected graph, then dG(a, b) stands for the distance between two vertices a and b of the graph G. Wherever it is clear from the context which graph is under consideration, we drop the subscript G in dG(a, b). A subgraph H ⊆ G is an isometric subgraph if dH(a, b) = dG(a, b) for all vertices a and b of H; it is convex if any shortest path in G between vertices of H belongs to H.

2 Hypercubes and partial cubes

Let X be a set. We denote by Pf(X) the set of all finite subsets of X.

Definition 2.1. A graph H(X) has the set Pf(X) as the set of its vertices; a pair of vertices PQ is an edge of H(X) if the symmetric difference P∆Q is a singleton. The graph H(X) is called the hypercube on X [9]. If X is a finite set of cardinality n, then the graph H(X) is the n-cube Qn. The dimension of the hypercube H(X) is the cardinality of the set X.

The shortest path distance d(P, Q) on the hypercube H(X) is the Hamming distance between sets P and Q:

d(P, Q) = |P∆Q| for P, Q ∈ Pf(X). (2.1)

The set Pf(X) is a metric space with the metric d.

Definition 2.2. A graph G is a partial cube if it can be isometrically embedded into a hypercube H(X) for some set X. We often identify G with its isometric image in the hypercube H(X), and say that G is a partial cube on the set X.
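Formula (2.1) can be checked numerically on a small cube. The following Python sketch builds H(X) for a three-element set and compares breadth-first-search distances with |P∆Q|. The helper names `finite_subsets`, `hypercube_edges` and `bfs_distances` are illustrative assumptions and do not come from the paper.

```python
from collections import deque
from itertools import combinations


def finite_subsets(X):
    """All subsets of a finite set X, as frozensets (the vertices of H(X))."""
    items = list(X)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def hypercube_edges(vertices):
    """Pairs (P, Q) whose symmetric difference is a singleton."""
    return [(P, Q) for P, Q in combinations(vertices, 2) if len(P ^ Q) == 1]


def bfs_distances(vertices, edges, source):
    """Shortest-path distances from `source` in an undirected graph."""
    adj = {v: [] for v in vertices}
    for P, Q in edges:
        adj[P].append(Q)
        adj[Q].append(P)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


V = finite_subsets({"a", "b", "c"})   # vertices of the cube Q3
E = hypercube_edges(V)                # edges of Q3
# Formula (2.1): graph distance equals the size of the symmetric difference.
hamming_ok = all(bfs_distances(V, E, P)[Q] == len(P ^ Q) for P in V for Q in V)
```

Here `P ^ Q` is Python's symmetric difference on frozensets, so `len(P ^ Q)` plays the role of |P∆Q|.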
Figure 2.1: A graph and its isometric embedding into Q3.

An example of a partial cube and its isometric embedding into the cube Q3 is shown in Figure 2.1.

Clearly, a family F of finite subsets of X induces a partial cube on X if and only if for any two distinct subsets P, Q ∈ F there is a sequence R0 = P, R1, . . . , Rn = Q of sets in F such that

d(Ri, Ri+1) = 1 for all 0 ≤ i < n, and d(P, Q) = n. (2.2)

The families of sets satisfying condition (2.2) are known as well graded families of sets [10]. Note that a sequence (Ri) satisfying (2.2) is a shortest path from P to Q in H(X) (and in the subgraph induced by F).

Definition 2.3. A family F of arbitrary subsets of X is a wg-family (well graded family of sets) if, for any two distinct subsets P, Q ∈ F, the set P∆Q is finite and there is a sequence R0 = P, R1, . . . , Rn = Q of sets in F such that |Ri∆Ri+1| = 1 for all 0 ≤ i < n and |P∆Q| = n.

Example 2.1. The induced graph can be a partial cube on a different set if the family F is not well graded. Consider, for instance, the family F = {∅, {a}, {a, b}, {a, b, c}, {b, c}} of subsets of X = {a, b, c}. The graph induced by this family is a path of length 4 in the cube Q3 (cf. Figure 2.2). Clearly, F is not well graded. On the other hand, as is easily seen, any path is a partial cube.

Figure 2.2: A nonisometric path in the cube Q3.

Any family F of subsets of X defines a graph GF = (F, EF), where EF = {{P, Q} ⊆ F : |P∆Q| = 1}.

Theorem 2.1. The graph GF defined by a family F of subsets of a set X is isomorphic to a partial cube on X if and only if the family F is well graded.

Proof. We need to prove sufficiency only. Let S be a fixed set in F. We define a mapping f : F → Pf(X) by f(R) = R∆S for R ∈ F. Then d(f(R), f(T)) = |(R∆S)∆(T∆S)| = |R∆T|. Thus f is an isometric embedding of F into Pf(X). Let (Ri) be a sequence of sets in F such that R0 = P, Rn = Q, |P∆Q| = n, and |Ri∆Ri+1| = 1 for all 0 ≤ i < n. Then the sequence (f(Ri)) satisfies conditions (2.2).
The result follows.

A set R ∈ Pf(X) is said to be lattice between sets P, Q ∈ Pf(X) if P ∩ Q ⊆ R ⊆ P ∪ Q. It is metrically between P and Q if d(P, R) + d(R, Q) = d(P, Q). The following theorem is a well-known result about these two betweenness relations on Pf(X) (see, for instance, [3]).

Theorem 2.2. Lattice and metric betweenness relations coincide on Pf(X).

Let F be a family of finite subsets of X. The set of all R ∈ F that are between P, Q ∈ F is the interval I(P, Q) between P and Q in F. Thus, I(P, Q) = F ∩ [P ∩ Q, P ∪ Q], where [P ∩ Q, P ∪ Q] is the usual interval in the lattice Pf(X). Two distinct sets P, Q ∈ F are adjacent in F if I(P, Q) = {P, Q}. If sets P and Q form an edge in the graph induced by F, then P and Q are adjacent in F, but, generally speaking, not vice versa. For instance, in Example 2.1, the vertices ∅ and {b, c} are adjacent in F but do not define an edge in the induced graph (cf. Figure 2.2).

The following theorem is a ‘local’ characterization of wg-families of sets.

Theorem 2.3. A family F ⊆ Pf(X) is well graded if and only if d(P, Q) = 1 for any two sets P and Q that are adjacent in F.

Proof. (Necessity.) Let F be a wg-family of sets. Suppose that P and Q are adjacent in F. There is a sequence R0 = P, R1, . . . , Rn = Q that satisfies conditions (2.2). Since the sequence (Ri) is a shortest path in F, we have d(P, Ri) + d(Ri, Q) = d(P, Q) for all 0 ≤ i ≤ n. Thus, Ri ∈ I(P, Q) = {P, Q}. It follows that d(P, Q) = n = 1.

(Sufficiency.) Let P and Q be two distinct sets in F. We prove by induction on n = d(P, Q) that there is a sequence (Ri) in F satisfying conditions (2.2). The statement is trivial for n = 1. Suppose that n > 1 and that the statement is true for all k < n. Let P and Q be two sets in F such that d(P, Q) = n. Since d(P, Q) > 1, the sets P and Q are not adjacent in F. Therefore there exists R ∈ F that lies between P and Q and is distinct from these two sets. Then d(P, R) + d(R, Q) = d(P, Q) and both distances d(P, R) and d(R, Q) are less than n.
By the induction hypothesis, there is a sequence (Ri) in F such that P = R0, R = Rj, Q = Rn for some 0 < j < n, satisfying conditions (2.2) for 0 ≤ i < j and j ≤ i < n. It follows that F is a wg-family of sets.

We conclude this section with two examples of infinite partial cubes (more examples are found in [17]).

Example 2.2. Let Z be the graph on the set Z of integers with edges defined by pairs of consecutive integers. This graph is a partial cube since its vertex set is isometric to the wg-family of intervals {(−∞, m) : m ∈ Z} in Z.

Example 2.3. Let us consider Z^n as a metric space with respect to the ℓ1-metric. The graph Z^n has Z^n as the vertex set; two vertices in Z^n are connected if they are at unit distance from each other. We will show in Section 6 (Corollary 6.1) that Z^n is a partial cube.

3 Characterizations

Only connected graphs are considered in this section.

Definition 3.1. Let G = (V, E) be a graph and d be its distance function. For any two adjacent vertices a, b ∈ V let Wab be the set of vertices that are closer to a than to b:

Wab = {w ∈ V : d(w, a) < d(w, b)}.

Following [11], we call the sets Wab and induced subgraphs 〈Wab〉 semicubes of the graph G. The semicubes Wab and Wba are called opposite semicubes.

Remark 3.1. The subscript ab in Wab stands for an ordered pair of vertices, not for an edge of G. In his original paper [9], Djoković uses notation G(a, b) (cf. [8]). We use the notation from [15].

Clearly, two opposite semicubes are disjoint. They can be used to characterize bipartite graphs as follows.

Theorem 3.1. A graph G = (V, E) is bipartite if and only if the semicubes Wab and Wba form a partition of V for any edge ab ∈ E.

Proof. Let us recall that a connected graph G is bipartite if and only if for every vertex x there is no edge ab with d(x, a) = d(x, b) (see, for instance, [1]). For any edge ab ∈ E and vertex x ∈ V we clearly have d(x, a) = d(x, b) ⇔ x ∉ Wab ∪ Wba. The result follows.
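Theorem 3.1 translates into a direct, if inefficient, bipartiteness test: compute both opposite semicubes for every edge and check that together they cover V. The Python sketch below assumes a connected graph given as an adjacency dictionary; the function names and the `cycle` helper are hypothetical and serve only as an illustration.

```python
from collections import deque


def distances(adj, source):
    """Breadth-first-search distances from `source` in a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def semicube(adj, a, b):
    """W_ab: the vertices strictly closer to a than to b (Definition 3.1)."""
    da, db = distances(adj, a), distances(adj, b)
    return {w for w in adj if da[w] < db[w]}


def bipartite_by_semicubes(adj):
    """Theorem 3.1: G is bipartite iff W_ab and W_ba cover V for every edge ab."""
    V = set(adj)
    return all(semicube(adj, a, b) | semicube(adj, b, a) == V
               for a in adj for b in adj[a])


def cycle(n):
    """Adjacency dictionary of the cycle C_n on vertices 0, ..., n-1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

For instance, `bipartite_by_semicubes(cycle(4))` holds, while for `cycle(3)` the vertex opposite to an edge is equidistant from both of its ends and belongs to neither semicube.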
The following lemma is instrumental and will be used frequently in the rest of the paper.

Lemma 3.1. Let G = (V, E) be a graph and w ∈ Wab for some edge ab ∈ E. Then d(w, b) = d(w, a) + 1. Accordingly, Wab = {w ∈ V : d(w, b) = d(w, a) + 1}.

Proof. By the triangle inequality, we have d(w, a) < d(w, b) ≤ d(w, a) + d(a, b) = d(w, a) + 1. The result follows, since d takes values in N.

There are two binary relations on the set of edges of a graph that play a central role in characterizing partial cubes.

Definition 3.2. Let G = (V, E) be a graph and e = xy and f = uv be two edges of G.
(i) (Djoković [9]) The relation θ on E is defined by e θ f ⇔ f joins a vertex in Wxy with a vertex in Wyx. The notation can be chosen such that u ∈ Wxy and v ∈ Wyx.
(ii) (Winkler [20]) The relation Θ on E is defined by e Θ f ⇔ d(x, u) + d(y, v) ≠ d(x, v) + d(y, u).

It is clear that both relations θ and Θ are reflexive and Θ is symmetric.

Lemma 3.2. The relation θ is a symmetric relation on E.

Proof. Suppose that xy θ uv with u ∈ Wxy and v ∈ Wyx. By Lemma 3.1 and the triangle inequality, we have

d(u, x) = d(u, y) − 1 ≤ d(u, v) + d(v, y) − 1 = d(v, y) = d(v, x) − 1 ≤ d(v, u) + d(u, x) − 1 = d(u, x).

Hence, d(u, x) = d(v, x) − 1 and d(v, y) = d(u, y) − 1. Therefore, x ∈ Wuv and y ∈ Wvu. It follows that uv θ xy.

Lemma 3.3. θ ⊆ Θ.

Proof. Suppose that xy θ uv with u ∈ Wxy, v ∈ Wyx. By Lemma 3.1, d(x, u) + d(y, v) = d(x, v) − 1 + d(y, u) − 1 ≠ d(x, v) + d(y, u). Hence, xy Θ uv.

Example 3.1. It is easy to verify that θ is the identity relation on the set of edges of the cycle C3. On the other hand, any two edges of C3 stand in the relation Θ. Thus, θ ≠ Θ in this case.

Bipartite graphs can be characterized in terms of relations θ and Θ as follows.

Theorem 3.2. A graph G = (V, E) is bipartite if and only if θ = Θ.

Proof. (Necessity.)
Suppose that G is a bipartite graph, two edges xy and uv stand in the relation Θ, that is, d(x, u) + d(y, v) ≠ d(x, v) + d(y, u), and that edges xy and uv do not stand in the relation θ. By Theorem 3.1, we may assume that u, v ∈ Wxy. By Lemma 3.1, we have d(x, u) + d(y, v) = d(y, u) − 1 + d(x, v) + 1 = d(x, v) + d(y, u), a contradiction. It follows that Θ ⊆ θ. By Lemma 3.3, θ = Θ.

(Sufficiency.) Suppose that G is not bipartite. By Theorem 3.1, there is an edge xy such that Wxy ∪ Wyx is a proper subset of V. Since G is connected, there is an edge uv with u ∉ Wxy ∪ Wyx and v ∈ Wxy ∪ Wyx. Clearly, uv does not stand in the relation θ to xy. On the other hand, d(x, u) + d(y, v) ≠ d(x, v) + d(y, u), since u ∉ Wxy ∪ Wyx and v ∈ Wxy ∪ Wyx. Thus, xy Θ uv, a contradiction, since we assumed that θ = Θ.

By Theorem 3.2, the relations θ and Θ coincide on bipartite graphs. For this reason we use the relation θ in the rest of the paper.

Lemma 3.4. Let G = (V, E) be a bipartite graph such that all its semicubes are convex sets. Then two edges xy and uv stand in the relation θ if and only if the corresponding pairs of mutually opposite semicubes form equal partitions of V:

xy θ uv ⇔ {Wxy, Wyx} = {Wuv, Wvu}.

Proof. (Necessity.) We assume that the notation is chosen such that u ∈ Wxy and v ∈ Wyx. Let z ∈ Wxy ∩ Wvu. By Lemma 3.1, d(z, u) = d(z, v) + d(v, u). Since z, u ∈ Wxy and Wxy is convex, we have v ∈ Wxy, a contradiction to the assumption that v ∈ Wyx. Thus Wxy ∩ Wvu = ∅. Since two opposite semicubes in a bipartite graph form a partition of V, we have Wuv = Wxy and Wvu = Wyx. A similar argument shows that Wuv = Wyx and Wvu = Wxy, if u ∈ Wyx and v ∈ Wxy.

(Sufficiency.) Follows from the definition of the relation θ.

We need another general property of the relation θ (cf. Lemma 2.2 in [15]).

Lemma 3.5. Let P be a shortest path in a graph G. Then no two distinct edges of P stand in the relation θ.

Proof. Let i < j and let x_i x_{i+1} and x_j x_{j+1} be two edges in a shortest path P from x_0 to x_n.
Then d(x_i, x_j) < d(x_i, x_{j+1}) and d(x_{i+1}, x_j) < d(x_{i+1}, x_{j+1}), so x_i and x_{i+1} both belong to the semicube of G defined by the ordered pair (x_j, x_{j+1}). It follows that edges x_i x_{i+1} and x_j x_{j+1} do not stand in the relation θ.

The converse statement is true for bipartite graphs (we omit the proof); in the nonbipartite case, a counterexample is the cycle C5.

Lemma 3.6. Let G = (V, E) be a bipartite graph. The following statements are equivalent:
(i) All semicubes of G are convex.
(ii) The relation θ is an equivalence relation on E.

Proof. (i) ⇒ (ii). Follows from Lemma 3.4.

(ii) ⇒ (i). Suppose that θ is transitive and there is a nonconvex semicube Wab. Then there are two vertices u, v ∈ Wab and a shortest path P from u to v that intersects Wba. This path contains two distinct edges e and f joining vertices of semicubes Wab and Wba. The edges e and f stand in the relation θ to the edge ab. By transitivity of θ, we have e θ f. This contradicts the result of Lemma 3.5. Thus all semicubes of G are convex.

We now establish some basic properties of partial cubes.

Theorem 3.3. Let G = (V, E) be a partial cube. Then
(i) G is a bipartite graph.
(ii) Each pair of opposite semicubes forms a partition of V.
(iii) All semicubes are convex subsets of V.
(iv) θ is an equivalence relation on E.

Proof. We may assume that G is an isometric subgraph of some hypercube H(X), that is, G = (F, EF) for a wg-family F of finite subsets of X.

(i) It suffices to note that if two sets in H(X) are connected by an edge then they have different parity. Thus, H(X) is a bipartite graph and so is G.

(ii) Follows from (i) and Theorem 3.1.

(iii) Let WAB be a semicube of G. By Lemma 3.1 and Theorem 2.2, we have WAB = {S ∈ F : S ∩ B ⊆ A ⊆ S ∪ B}. Let Q, R ∈ WAB and P be a vertex of G such that d(Q, P) + d(P, R) = d(Q, R). By Theorem 2.2, Q ∩ R ⊆ P ⊆ Q ∪ R. Since Q, R ∈ WAB, we have Q ∩ B ⊆ A ⊆ Q ∪ B and R ∩ B ⊆ A ⊆ R ∪ B, which implies

P ∩ B ⊆ (Q ∪ R) ∩ B ⊆ A ⊆ (Q ∩ R) ∪ B ⊆ P ∪ B.

Hence, P ∈ WAB, and the result follows.

(iv) Follows from (iii) and Lemma 3.6.

Remark 3.2.
Since semicubes of a partial cube G = (V, E) are convex subsets of the metric space V, they are half-spaces in V [19]. This terminology is used in [6, 7].

The following theorem presents four characterizations of partial cubes. The first two are due to Djoković [9] and Winkler [20] (cf. Theorem 2.10 in [15]).

Theorem 3.4. Let G = (V, E) be a connected graph. The following statements are equivalent:
(i) G is a partial cube.
(ii) G is bipartite and all semicubes of G are convex.
(iii) G is bipartite and θ is an equivalence relation.
(iv) G is bipartite and, for all xy, uv ∈ E,
xy θ uv ⇒ {Wxy, Wyx} = {Wuv, Wvu}. (3.1)
(v) G is bipartite and, for any pair of adjacent vertices of G, there is a unique pair of opposite semicubes separating these two vertices.

Proof. By Lemma 3.6, the statements (ii) and (iii) are equivalent and, by Theorem 3.3, (i) implies both (ii) and (iii).

(iii) ⇒ (i). By Theorem 3.1, each pair {Wab, Wba} of opposite semicubes of G forms a partition of V. We orient these partitions by calling, in an arbitrary way, one of the two opposite semicubes in each partition a positive semicube. Let us assign to each x ∈ V the set W+(x) of all positive semicubes containing x. In the next paragraph we prove that the family F = {W+(x)}x∈V is well graded and that the assignment x ↦ W+(x) is an isometry between V and F.

Let x and y be two distinct vertices of G. We say that a positive semicube Wab separates x and y if either x ∈ Wab, y ∈ Wba or x ∈ Wba, y ∈ Wab. It is clear that Wab separates x and y if and only if Wab ∈ W+(x)∆W+(y). Let P be a shortest path x0 = x, x1, . . . , xn = y from x to y. By Lemma 3.5, no two distinct edges of P stand in the relation θ. By Lemma 3.4, distinct edges of P define distinct positive semicubes; clearly, these semicubes separate x and y. Let Wab be a positive semicube separating x and y, and, say, x ∈ Wab and y ∈ Wba. There is an edge f ∈ P that joins vertices in Wab and Wba.
Hence, f stands in the relation θ to ab and, by Lemma 3.4, Wab is defined by f. It follows that any semicube in W+(x)∆W+(y) is defined by a unique edge in P and any edge in P defines a semicube in W+(x)∆W+(y). Therefore, d(W+(x), W+(y)) = d(x, y), that is, x ↦ W+(x) is an isometry. Clearly, F is a wg-family of sets. By Theorem 2.1, the family F is isometric to a wg-family of finite sets. Hence, G is a partial cube.

(iv) ⇒ (ii). Suppose that there exists an edge ab such that the semicube Wba is not convex. Let p and q be two vertices in Wba such that there is a shortest path P from p to q that intersects Wab. There are two distinct edges xy and uv in P such that x, u ∈ Wab and y, v ∈ Wba. Since ab θ xy and ab θ uv, we have, by (3.1), Wab = Wxy = Wuv. Hence, u ∈ Wxy and v ∈ Wyx. By Lemma 3.1, d(x, u) = d(x, v) − 1 = 1 + d(v, y) − 1 = d(v, y), a contradiction, since P is a shortest path from p to q.

(ii) ⇒ (iv). Follows from Lemma 3.4.

It is clear that (iv) and (v) are equivalent.

4 Fundamental sets in partial cubes

Semicubes played an important role in the previous section. In this section we introduce three more classes of useful subsets of graphs. We also establish one more characterization of partial cubes.

Let G = (V, E) be a connected graph. For a given edge e = ab ∈ E, we define the following sets (cf. [15, 16]):

Fab = {f ∈ E : e θ f} = {uv ∈ E : u ∈ Wab, v ∈ Wba},
Uab = {w ∈ Wab : w is adjacent to a vertex in Wba},
Uba = {w ∈ Wba : w is adjacent to a vertex in Wab}.

The five sets Wab, Wba, Fab, Uab and Uba are schematically shown in Figure 4.1.

Figure 4.1: Fundamental sets in a partial cube.

Remark 4.1. In the case of a partial cube G = (V, E), the semicubes Wab and Wba are complementary half-spaces in the metric space V (cf. Remark 3.2). Then the set Fab can be regarded as a ‘hyperplane’ separating these half-spaces (see [17] where this analogy is formalized in the context of hyperplane arrangements).

The following theorem generalizes the result obtained in [16] for median graphs (see also [15]).
Theorem 4.1. Let ab be an edge of a connected bipartite graph G. If the semicubes Wab and Wba are convex, then the set Fab is a matching and induces an isomorphism between the graphs 〈Uab〉 and 〈Uba〉.

Proof. Suppose that Fab is not a matching. Then there are distinct edges xu and xv with, say, x ∈ Uab and u, v ∈ Uba. By the triangle inequality, d(u, v) ≤ 2. Since G does not have triangles, d(u, v) ≠ 1. Hence, d(u, v) = 2, which implies that x lies between u and v. This contradicts convexity of Wba, since x ∈ Wab. Therefore Fab is a matching.

To show that Fab induces an isomorphism, let xy, uv ∈ Fab and xu ∈ E, where x, u ∈ Uab and y, v ∈ Uba. Since G does not have odd cycles, d(v, y) ≠ 2. By the triangle inequality, d(v, y) ≤ d(v, u) + d(u, x) + d(x, y) = 3. Since Wba is convex, d(v, y) ≠ 3. Thus d(v, y) = 1, that is, vy is an edge. The result follows by symmetry.

By Theorem 3.4(ii), we have the following corollary.

Corollary 4.1. Let G = (V, E) be a partial cube. For any edge ab the set Fab is a matching and induces an isomorphism between induced graphs 〈Uab〉 and 〈Uba〉.

Figure 4.2: Graph G.

Example 4.1. Let G be the graph depicted in Figure 4.2. The set Fab = {ab, xu, yv} is a matching and defines an isomorphism between the graphs induced by subsets Uab = {a, x, y} and Uba = {b, u, v}. The set Wba is not convex, so G is not a partial cube. Thus the converse of Corollary 4.1 does not hold.

We now establish another characterization of partial cubes that utilizes a geometric property of families Fab.

Theorem 4.2. For a connected graph G the following statements are equivalent:
(i) G is a partial cube.
(ii) G is bipartite and
d(x, u) = d(y, v) and d(x, v) = d(y, u), (4.1)
for any ab ∈ E and xy, uv ∈ Fab.

Proof. (i) ⇒ (ii). We may assume that x, u ∈ Wab and y, v ∈ Wba. Since θ is an equivalence relation, we have xy θ uv θ ab. By Lemma 3.4, Wuv = Wxy = Wab. By Lemma 3.1, d(x, u) = d(x, v) − 1 = d(v, y) + 1 − 1 = d(y, v).
We also have d(x, v) = d(y, v) + 1 = d(y, u), by the same lemma.

(ii) ⇒ (i). Suppose that G is not a partial cube. Then, by Theorem 3.4, there exists an edge ab such that, say, the semicube Wba is not convex. Let p and q be two vertices in Wba such that there is a shortest path P from p to q that intersects Wab. Let uv be the first edge in P which belongs to Fab and xy be the last edge in P with the same property (see Figure 4.3).

Figure 4.3: An illustration of the proof of Theorem 4.2.

Since P is a shortest path, we have d(v, y) = d(v, u) + d(u, x) + d(x, y) ≠ d(x, u), which contradicts condition (4.1). Thus all semicubes of G are convex. By Theorem 3.4, G is a partial cube.

Remark 4.2. One can say that four vertices satisfying conditions (4.1) define a rectangle in G. Then Theorem 4.2 states that a connected graph is a partial cube if and only if it is bipartite and for any edge ab pairs of edges in Fab define rectangles in G.

5 Dimensions of partial cubes

There are many different ways in which a given partial cube can be isometrically embedded into a hypercube. For instance, the graph K2 can be isometrically embedded in different ways into any hypercube H(X) with |X| > 2. Following Djoković [9] (see also [8]), we define the isometric dimension, dimI(G), of a partial cube G as the minimum possible dimension of a hypercube H(X) in which G is isometrically embeddable. Recall (see Section 2) that the dimension of H(X) is the cardinality of the set X.

Theorem 5.1. (Theorem 2 in [9].) Let G = (V, E) be a partial cube. Then

dimI(G) = |E/θ|, (5.1)

where θ is Djoković’s equivalence relation on E and E/θ is the set of its equivalence classes (the quotient-set).

The quotient-set E/θ can be identified with the family of all distinct sets Fab (see Section 4). If G is a finite partial cube, we may consider it as an isometric subgraph of some hypercube Qn. Then the edges in each family Fab are parallel edges in Qn (cf. Theorem 4.2).
This observation essentially proves (5.1) in the finite case.

Let G be a partial cube on a set X. The vertex set of G is a wg-family F of finite subsets of X (see Section 2). We define the retraction of F as a family F′ of subsets of X′ = ∪F \ ∩F consisting of the intersections of sets in F with X′. It is clear that F′ satisfies conditions

∩F′ = ∅ and ∪F′ = X′. (5.2)

Proposition 5.1. The partial cubes induced by a wg-family F and its retraction F′ are isomorphic.

Proof. It suffices to prove that metric spaces F and F′ are isometric. Clearly, α : P ↦ P ∩ X′ is a mapping from F onto F′. For P, Q ∈ F, we have (P ∩ X′)∆(Q ∩ X′) = (P∆Q) ∩ X′ = (P∆Q) ∩ (∪F \ ∩F) = P∆Q. Thus, d(α(P), α(Q)) = d(P, Q). Consequently, α is an isometry.

Let G be a partial cube on some set X induced by a wg-family F satisfying conditions (5.2), and let PQ be an edge of G. By definition, there is x ∈ X such that P∆Q = {x}. The following two lemmas are instrumental.

Lemma 5.1. Let PQ be an edge of a partial cube G on X and let P∆Q = {x}. The two sets {R ∈ F : x ∈ R} and {R ∈ F : x ∉ R} form the same bipartition of the family F as semicubes WPQ and WQP.

Proof. We may assume that Q = P + {x}. Then, for any R ∈ F,

R∆Q = R∆(P + {x}) = (R∆P) \ {x} if x ∈ R, and R∆Q = (R∆P) + {x} if x ∉ R.

Hence, |R∆Q| < |R∆P| if and only if x ∈ R. It follows that WQP = {R ∈ F : x ∈ R}. Similarly, WPQ = {R ∈ F : x ∉ R}.

Lemma 5.2. If F is a wg-family of sets satisfying conditions (5.2), then for any x ∈ X there are sets P, Q ∈ F such that P∆Q = {x}.

Proof. By conditions (5.2), for a given x ∈ X there are sets S and T in F such that x ∈ S and x ∉ T. Let R0 = S, R1, . . . , Rn = T be a sequence of sets in F satisfying conditions (2.2). It is clear that there is i such that x ∈ Ri and x ∉ Ri+1. Hence, Ri∆Ri+1 = {x}, so we can choose P = Ri and Q = Ri+1.

By Lemmas 5.1 and 5.2, there is a one-to-one correspondence between the set X and the quotient-set E/θ. From Theorem 5.1 we obtain the following result.
Theorem 5.2. Let F be a wg-family of finite subsets of a set X such that ∩F = ∅ and ∪F = X, and let G be a partial cube on X induced by F. Then dimI(G) = |X|.

Clearly, a graph which is isometrically embeddable into a partial cube is a partial cube itself. We will show in Section 6 (Corollary 6.1) that the integer lattice Z^n is a partial cube. Thus a graph which is isometrically embeddable into an integer lattice is a partial cube. It follows that a finite graph is a partial cube if and only if it is embeddable in some integer lattice. Examples of infinite partial cubes isometrically embeddable into a finite dimensional integer lattice are found in [17].

We call the minimum possible dimension n of an integer lattice Z^n, in which a given graph G is isometrically embeddable, its lattice dimension and denote it dimZ(G). The lattice dimension of a partial cube can be expressed in terms of maximum matchings in so-called semicube graphs [11].

Definition 5.1. The semicube graph Sc(G) has all semicubes in G as the set of its vertices. Two vertices Wab and Wcd are connected in Sc(G) if

Wab ∪ Wcd = V and Wab ∩ Wcd ≠ ∅. (5.3)

If G is a partial cube, then condition (5.3) is equivalent to each of the two equivalent conditions:

Wba ⊂ Wcd ⇔ Wdc ⊂ Wab, (5.4)

where ⊂ stands for the proper inclusion.

Theorem 5.3. (Theorem 1 in [11].) Let G be a finite partial cube. Then dimZ(G) = dimI(G) − |M|, where M is a maximum matching in the semicube graph Sc(G).

Example 5.1. Let G be the graph shown in Figure 2.1. It is easy to see that dimI(G) = 3 and dimZ(G) = 2.

Example 5.2. Let T be a tree with n edges and m leaves. Then dimI(T) = n and dimZ(T) = ⌈m/2⌉ (cf. [8] and [14], respectively).

Example 5.3. For the cycle C6 we have (see Figure 8.2) dimI(C6) = dimZ(C6) = 3.

6 Subcubes and Cartesian products

Let G be a partial cube. We say that G′ is a subcube of G if it is an isometric subgraph of G. Clearly, a subcube is itself a partial cube.
The converse does not hold; a subgraph of a graph G can be a partial cube but not an isometric subgraph of G (cf. Example 2.1).

If G′ is a subcube of a partial cube G, then dimI(G′) ≤ dimI(G) and dimZ(G′) ≤ dimZ(G). In general, the two inequalities are not strict. For instance, the cycle C6 is an isometric subgraph of the cube Q3 (see Figure 8.2) and dimI(C6) = dimZ(C6) = dimI(Q3) = dimZ(Q3) = 3.

Semicubes of a partial cube are examples of subcubes. Indeed, by Theorem 3.4, semicubes are convex subgraphs and therefore isometric. In general, the converse is not true; a path connecting two opposite vertices in C6 is an isometric subgraph but not a convex one.

Another common way of constructing new partial cubes from old ones is by forming their Cartesian products (see [15] for details and proofs).

Definition 6.1. Given two graphs G1 = (V1, E1) and G2 = (V2, E2), their Cartesian product G = G1 □ G2 has vertex set V = V1 × V2; a vertex u = (u1, u2) is adjacent to a vertex v = (v1, v2) if and only if u1v1 ∈ E1 and u2 = v2, or u1 = v1 and u2v2 ∈ E2.

The operation □ is associative, so we can write G = G1 □ · · · □ Gn = ∏_{i=1}^{n} Gi for the Cartesian product of graphs G1, . . . , Gn. A Cartesian product ∏_{i=1}^{n} Gi is connected if and only if the factors are connected. Then we have

d_G(u, v) = ∑_{i=1}^{n} d_{Gi}(ui, vi). (6.1)

Example 6.1. Let {Xi}_{i=1}^{n} be a family of sets and Y = ∑_{i=1}^{n} Xi be their sum. Then the Cartesian product of the hypercubes H(Xi) is isomorphic to the hypercube H(Y). The isomorphism is established by the mapping f : (P1, . . . , Pn) ↦ ∑_{i=1}^{n} Pi.

Formula (6.1) yields immediately the following results.

Proposition 6.1. Let Hi be isometric subgraphs of graphs Gi for all 1 ≤ i ≤ n. Then the Cartesian product ∏_{i=1}^{n} Hi is an isometric subgraph of the Cartesian product ∏_{i=1}^{n} Gi.

Corollary 6.1. The Cartesian product of a finite family of partial cubes is a partial cube. In particular, the integer lattice Z^n (cf. Examples 2.2 and 2.3) is a partial cube.
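Definition 6.1 and formula (6.1) can be illustrated for two factors with a short Python sketch. It builds the Cartesian product of a path on three vertices and the cycle C4, and checks that distances in the product are sums of distances in the factors. The function names and the two example graphs are assumptions made for illustration only.

```python
from collections import deque


def distances(adj, source):
    """Breadth-first-search distances from `source` in a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def cartesian_product(adj1, adj2):
    """Adjacency dictionary of the Cartesian product (Definition 6.1)."""
    return {
        (u1, u2): [(v1, u2) for v1 in adj1[u1]] + [(u1, v2) for v2 in adj2[u2]]
        for u1 in adj1
        for u2 in adj2
    }


def distance_formula_holds(adj1, adj2):
    """Check formula (6.1) for n = 2 on every pair of product vertices."""
    prod = cartesian_product(adj1, adj2)
    d1 = {u: distances(adj1, u) for u in adj1}
    d2 = {u: distances(adj2, u) for u in adj2}
    return all(
        distances(prod, (u1, u2))[(v1, v2)] == d1[u1][v1] + d2[u2][v2]
        for (u1, u2) in prod
        for (v1, v2) in prod
    )


path3 = {0: [1], 1: [0, 2], 2: [1]}                   # path with two edges
cycle4 = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # the cycle C4
product_graph = cartesian_product(path3, cycle4)
edge_count = sum(len(nbrs) for nbrs in product_graph.values()) // 2
```

The edge count of the product is |V1|·|E2| + |E1|·|V2|, which for these two factors equals 3·4 + 2·4 = 20.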
The results of the next two theorems can be easily extended to arbitrary finite products of finite partial cubes.

Theorem 6.1. Let G = G1 □ G2 be the Cartesian product of two finite partial cubes. Then dimI(G) = dimI(G1) + dimI(G2).

Proof. We may assume that G1 (resp. G2) is induced by a wg-family F1 (resp. F2) of subsets of a finite set X1 (resp. X2) such that ∩F1 = ∅ and ∪F1 = X1 (resp. ∩F2 = ∅ and ∪F2 = X2) (see Section 5). By Theorem 5.2, dimI(G1) = |X1| and dimI(G2) = |X2|. It is clear that the graph G is induced by the wg-family F = F1 + F2 of subsets of the set X = X1 + X2 (cf. Example 6.1) with ∩F = ∅, ∪F = X. By Theorem 5.2, dimI(G) = |X| = |X1| + |X2| = dimI(G1) + dimI(G2).

Theorem 6.2. Let G = (V, E) be the Cartesian product of two finite partial cubes G1 = (V1, E1) and G2 = (V2, E2). Then dimZ(G) = dimZ(G1) + dimZ(G2).

Proof. Let W(a,b)(c,d) be a semicube of the graph G. There are two possible cases:

(i) c = a, bd ∈ E2. Let (x, y) be a vertex of G. Then, by (6.1),

d_G((x, y), (a, b)) = d_{G1}(x, a) + d_{G2}(y, b) and d_G((x, y), (c, d)) = d_{G1}(x, c) + d_{G2}(y, d).

Hence, d_G((x, y), (a, b)) < d_G((x, y), (c, d)) ⇔ d_{G2}(y, b) < d_{G2}(y, d). It follows that

W(a,b)(c,d) = V1 × Wbd. (6.2)

(ii) d = b, ac ∈ E1. As in (i), we have

W(a,b)(c,d) = Wac × V2. (6.3)

Clearly, two semicubes given by (6.2) form an edge in the semicube graph Sc(G) if and only if their second factors form an edge in the semicube graph Sc(G2). The same is true for semicubes in the form (6.3) with respect to their first factors. It is also clear that semicubes in the form (6.2) and in the form (6.3) are not connected by an edge in Sc(G). Therefore the semicube graph Sc(G) is isomorphic to the disjoint union of semicube graphs Sc(G1) and Sc(G2). If M1 is a maximum matching in Sc(G1) and M2 is a maximum matching in Sc(G2), then M = M1 ∪ M2 is a maximum matching in Sc(G). The result follows from Theorems 5.3 and 6.1.

Remark 6.1.
The result of Corollary 6.1 does not hold for infinite Cartesian products of partial cubes, as these products are disconnected. On the other hand, it can be shown that arbitrary weak Cartesian products (connected components of Cartesian products [15]) of partial cubes are partial cubes.

7 Pasting partial cubes

In this section we use the set pasting technique [5, ch.I, §2.5] to build new partial cubes from old ones.

Let G1 = (V1, E1) and G2 = (V2, E2) be two graphs, H1 = (U1, F1) and H2 = (U2, F2) be two isomorphic subgraphs of G1 and G2, respectively, and ψ : U1 → U2 be a bijection defining an isomorphism between H1 and H2. The bijection ψ defines an equivalence relation R on the sum V1 + V2 as follows: any element in (V1 \ U1) ∪ (V2 \ U2) is equivalent to itself only and elements u1 ∈ U1 and u2 ∈ U2 are equivalent if and only if u2 = ψ(u1). We say that the quotient set V = (V1 + V2)/R is obtained by pasting together the sets V1 and V2 along the subsets U1 and U2. Since the graphs H1 and H2 are isomorphic, the pasting of the sets V1 and V2 can be naturally extended to a pasting of sets of edges E1 and E2 resulting in the set E of edges joining vertices in V. We say that the graph G = (V, E) is obtained by pasting together the graphs G1 and G2 along the isomorphic subgraphs H1 and H2. The pasting construction allows for identifying in a natural way the graphs G1 and G2 with subgraphs of G, and the isomorphic graphs H1 and H2 with a common subgraph H of both graphs G1 and G2. We often follow this convention below.

Remark 7.1. Note that in the above construction the resulting graph G depends not only on graphs G1 and G2 and their isomorphic subgraphs H1 and H2 but also on the bijection ψ defining an isomorphism from H1 onto H2 (see the drawings in Figures 7.1 and 7.2).

Figure 7.1: Pasting of two trees.

Figure 7.2: Another pasting of the same trees.
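The pasting construction and the dependence on ψ noted in Remark 7.1 can be sketched in Python as follows. The sketch represents graphs by edge lists (so isolated vertices are ignored), tags the vertices of the sum V1 + V2 as (1, v) and (2, v), and uses (1, u1) as the representative of each class {u1, ψ(u1)}. The function names and the two small paths are hypothetical; they are not the trees drawn in Figures 7.1 and 7.2.

```python
def paste(edges1, edges2, psi):
    """Paste two graphs along the subgraphs matched by the bijection psi: U1 -> U2.

    Returns the edge set of the pasted graph; each edge is a frozenset of
    tagged vertices, so edges identified by the pasting coincide.
    """
    inverse = {u2: u1 for u1, u2 in psi.items()}

    def image2(v):
        # A vertex of G2 in U2 is identified with its partner in U1.
        return (1, inverse[v]) if v in inverse else (2, v)

    edges = {frozenset({(1, u), (1, v)}) for u, v in edges1}
    edges |= {frozenset({image2(u), image2(v)}) for u, v in edges2}
    return edges


def degree_sequence(edges):
    """Sorted list of vertex degrees, used here to tell graphs apart."""
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values())


g1 = [("a", "b"), ("b", "c")]   # path a - b - c
g2 = [("x", "y"), ("y", "z")]   # path x - y - z

# Two bijections between the edge {a, b} of G1 and the edge {x, y} of G2.
pasted_first = paste(g1, g2, {"a": "x", "b": "y"})
pasted_second = paste(g1, g2, {"a": "y", "b": "x"})
```

With the first bijection the result is the star K1,3 centered at b, while the second gives a path on four vertices, so the two pastings are not isomorphic. Vertex-pasting corresponds to a `psi` with a single entry.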
In general, pasting of two partial cubes G1 and G2 along two isomorphic subgraphs H1 and H2 does not produce a partial cube even under strong assumptions about these subgraphs, as the next example illustrates.

Figure 7.3: Pasting partial cubes G1 and G2.

Example 7.1. Pasting of two partial cubes G1 = C6 and G2 = C6 along subgraphs H1 and H2 is shown in Figure 7.3. The resulting graph G is not a partial cube. Indeed, the semicube Wab is not a convex set. Note that subgraphs H1 and H2 are convex subgraphs of the respective partial cubes.

In this section we study two simple pastings of connected graphs together, the vertex-pasting and the edge-pasting, and show that these pastings produce partial cubes from partial cubes. We also compute the isometric and lattice dimensions of the resulting graphs.

Let G1 = (V1, E1) and G2 = (V2, E2) be two connected graphs, a1 ∈ V1, a2 ∈ V2, and H1 = ({a1}, ∅), H2 = ({a2}, ∅). Let G be the graph obtained by pasting G1 and G2 along subgraphs H1 and H2. In this case we say that the graph G is obtained from graphs G1 and G2 by vertex-pasting. We also say that G is obtained from G1 and G2 by identifying vertices a1 and a2. Figure 7.4 illustrates this construction. Note that the vertex a = {a1, a2} is a cut vertex of G, since G1 ∪ G2 = G and G1 ∩ G2 = {a}. (We follow our convention and identify graphs G1 and G2 with subgraphs of G.)

Figure 7.4: An example of vertex-pasting.

In what follows we use superscripts to distinguish subgraphs of the graphs G1 and G2. For instance, W^(2)_ab stands for the semicube of G2 defined by two adjacent vertices a, b ∈ V2.

Theorem 7.1. A graph G = (V, E) obtained by vertex-pasting from partial cubes G1 = (V1, E1) and G2 = (V2, E2) is a partial cube.

Proof. We denote by a = {a1, a2} the vertex of G obtained by identifying vertices a1 ∈ V1 and a2 ∈ V2. Clearly, G is a bipartite graph. Let xy be an edge of G. Without loss of generality we may assume that xy ∈ E1 and a ∈ Wxy.
Note that any path between vertices in V1 and V2 must go through a. Since a ∈ Wxy, we have, for any v ∈ V2,
d(v, x) = d(v, a) + d(a, x) < d(v, a) + d(a, y) = d(v, y),
which implies V2 ⊆ Wxy and Wyx ⊆ V1. It follows that Wxy = W^{(1)}_{xy} ∪ V2 and Wyx = W^{(1)}_{yx}. The sets W^{(1)}_{xy}, W^{(1)}_{yx} and V2 are convex subsets of V. Since W^{(1)}_{xy} ∩ V2 = {a}, the set Wxy = W^{(1)}_{xy} ∪ V2 is also convex. By Theorem 3.4(ii), the graph G is a partial cube.
The vertex-pasting construction introduced above can be generalized as follows. Let 𝒢 = {Gi = (Vi, Ei)}i∈J be a family of connected graphs and A = {ai ∈ Gi}i∈J be a family of distinguished vertices of these graphs. Let G be the graph obtained from the graphs Gi by identifying the vertices in the set A. We say that G is obtained by vertex-pasting together the graphs Gi (along the set A).
Example 7.2. Let J = {1, . . . , n} with n ≥ 2, 𝒢 = {Gi = ({ai, bi}, {aibi})}i∈J, and A = {ai}i∈J. Clearly, each Gi is K2. By vertex-pasting these graphs along A, we obtain the n-star graph K1,n. Since the star K1,n is a tree, it can also be obtained from K1 by successive vertex-pasting as in Example 7.3.
Example 7.3. Let G1 be a tree and G2 = K2. By vertex-pasting these graphs we obtain a new tree. Conversely, let G be a tree and v be its leaf. Let G1 be the tree obtained from G by deleting the leaf v. Clearly, G can be obtained by vertex-pasting G1 and K2. It follows that any tree can be obtained from the graph K1 by successive vertex-pasting of copies of K2 (cf. Theorem 2.3(e) in [12]).
Any connected graph G can be constructed by successive vertex-pasting of its blocks using its block cut-vertex tree structure [4]. Let G1 be an endblock of G with a cut vertex v and G2 be the union of the remaining blocks of G. Then G can be obtained from G1 and G2 by vertex-pasting along the vertex v. It follows that any connected graph can be obtained from its blocks by successive vertex-pastings.
Let G = (V,E) be a partial cube.
We recall that the isometric dimension dimI(G) of G is the cardinality of the quotient set E/θ, where θ is Djoković's equivalence relation on the set E (cf. formula (5.1)).
Theorem 7.2. Let G = (V, E) be a partial cube obtained by vertex-pasting together partial cubes G1 = (V1, E1) and G2 = (V2, E2). Then
dimI(G) = dimI(G1) + dimI(G2).
Proof. It suffices to prove that there are no edges xy ∈ E1 and uv ∈ E2 which are in Djoković's relation θ with each other. Suppose that G1 and G2 are vertex-pasted along vertices a1 ∈ V1 and a2 ∈ V2 and let a = {a1, a2} ∈ V. Let xy ∈ E1 and uv ∈ E2 be two edges in E. We may assume that u ∈ Wxy. Since a is a cut vertex of G and u ∈ Wxy, we have
d(u, a) + d(a, x) = d(u, x) < d(u, y) = d(u, a) + d(a, y).
Hence, d(a, x) < d(a, y), which implies
d(v, x) = d(v, a) + d(a, x) < d(v, a) + d(a, y) = d(v, y).
It follows that v ∈ Wxy. Therefore the edge xy does not stand in the relation θ to the edge uv.
The next result follows immediately from the previous theorem. Note that blocks of a partial cube are partial cubes themselves.
Corollary 7.1. Let G be a partial cube and {G1, . . . , Gn} be the family of its blocks. Then
dimI(G) = dimI(G1) + · · · + dimI(Gn).
In the case of the lattice dimension of a partial cube we can claim only a much weaker result than the one stated in Theorem 7.2 for the isometric dimension. We omit the proof.
Theorem 7.3. Let G be a partial cube obtained by vertex-pasting together partial cubes G1 and G2. Then
max{dimZ(G1), dimZ(G2)} ≤ dimZ(G) ≤ dimZ(G1) + dimZ(G2).
The following example illustrates possible cases for the inequalities in Theorem 7.3. Let us recall that the lattice dimension of a tree with m leaves is ⌈m/2⌉ (cf. [14]).
Example 7.4. The star K1,6 can be obtained from the stars K1,2 and K1,4 by vertex-pasting these two stars along their centers. Clearly,
max{dimZ(K1,2), dimZ(K1,4)} < dimZ(K1,6) = dimZ(K1,2) + dimZ(K1,4).
The same star K1,6 is obtained from two copies of the star K1,3 by vertex-pasting along their centers.
We have dimZ(K1,3) = 2, dimZ(K1,6) = 3, so
max{dimZ(K1,3), dimZ(K1,3)} < dimZ(K1,6) < dimZ(K1,3) + dimZ(K1,3).
Let us vertex-paste two stars K1,3 along two of their leaves. The resulting graph T is a tree with four leaves. Therefore,
max{dimZ(K1,3), dimZ(K1,3)} = dimZ(T) < dimZ(K1,3) + dimZ(K1,3).
We now consider another simple way of pasting two graphs together. Let G1 = (V1, E1) and G2 = (V2, E2) be two connected graphs, a1b1 ∈ E1, a2b2 ∈ E2, and H1 = ({a1, b1}, {a1b1}), H2 = ({a2, b2}, {a2b2}). Let G be the graph obtained by pasting G1 and G2 along the subgraphs H1 and H2. In this case we say that the graph G is obtained from the graphs G1 and G2 by edge-pasting. Figures 7.1, 7.2, and 7.5 illustrate this construction.
Figure 7.5: An example of edge-pasting.
As before, we identify the graphs G1 and G2 with subgraphs of the graph G and denote by a = {a1, a2} and b = {b1, b2} the two vertices obtained by pasting together the vertices a1 and a2 and, respectively, b1 and b2. The edge ab ∈ E is obtained by pasting together the edges a1b1 ∈ E1 and a2b2 ∈ E2 (cf. Figure 7.5). Then G = G1 ∪ G2, V1 ∩ V2 = {a, b} and E1 ∩ E2 = {ab}. We use this notation in the rest of this section.
Proposition 7.1. A graph G obtained by edge-pasting together bipartite graphs G1 and G2 is bipartite.
Proof. Let C be a cycle in G. If C ⊆ G1 or C ⊆ G2, then the length of C is even, since the graphs G1 and G2 are bipartite. Otherwise, the vertices a and b separate C into two paths, each of odd length. Therefore C is a cycle of even length. The result follows.
The following lemma is instrumental; it describes the semicubes of the graph G in terms of the semicubes of the graphs G1 and G2.
Lemma 7.1. Let uv be an edge of G. Then
(i) For uv ∈ E1, a, b ∈ Wuv ⇒ Wuv = W^{(1)}_{uv} ∪ V2, Wvu = W^{(1)}_{vu}.
(ii) For uv ∈ E2, a, b ∈ Wuv ⇒ Wuv = W^{(2)}_{uv} ∪ V1, Wvu = W^{(2)}_{vu}.
(iii) a ∈ Wuv, b ∈ Wvu ⇒ Wuv = Wab.
Figure 7.6: Edge-pasting of graphs G1 and G2.
Proof. We prove parts (i) and (iii) (see Figure 7.6).
(i) Since any path from w ∈ V2 to u or v contains a or b and a, b ∈ Wuv, we have w ∈ Wuv. Hence, Wuv = W^{(1)}_{uv} ∪ V2 and Wvu = W^{(1)}_{vu}.
(iii) Since ab θ uv in G1, we have W^{(1)}_{uv} = W^{(1)}_{ab}, by Theorem 3.4(iv). Let w be a vertex in W^{(2)}_{uv}. Then, by the triangle inequality,
d(w, u) < d(w, v) ≤ d(w, b) + d(b, v) < d(w, b) + d(b, u).
Since any shortest path from w to u contains a or b, we have d(w, a) + d(a, u) = d(w, u). Therefore,
d(w, a) + d(a, u) < d(w, b) + d(b, u).
Since ab θ uv in G1, we have d(a, u) = d(b, v), by Theorem 4.2. It follows that d(w, a) < d(w, b), that is, w ∈ W^{(2)}_{ab}. We proved that W^{(2)}_{uv} ⊆ W^{(2)}_{ab}. By symmetry, W^{(2)}_{vu} ⊆ W^{(2)}_{ba}. Since two opposite semicubes form a partition of V2, we have W^{(2)}_{uv} = W^{(2)}_{ab}. The result follows.
Theorem 7.4. A graph G obtained by edge-pasting together partial cubes G1 and G2 is a partial cube.
Proof. By Theorem 3.4(ii) and Proposition 7.1, we need to show that for any edge uv of G the semicube Wuv is a convex subset of V. There are two possible cases.
(i) uv = ab. The semicube Wab is the union of the semicubes W^{(1)}_{ab} and W^{(2)}_{ab}, which are convex subsets of V1 and V2, respectively. It is clear that any shortest path connecting a vertex in W^{(1)}_{ab} with a vertex in W^{(2)}_{ab} contains the vertex a and therefore is contained in Wab. Hence, Wab is a convex set. A similar argument proves that the set Wba is convex.
(ii) uv ≠ ab. We may assume that uv ∈ E1. To prove that the semicube Wuv is a convex set, we consider two cases.
(a) a, b ∈ Wuv. (The case when a, b ∈ Wvu is treated similarly.) By Lemma 7.1(i), the semicube Wuv is the union of the semicube W^{(1)}_{uv} and the set V2, which are both convex sets. Any shortest path P from a vertex in V2 to a vertex in W^{(1)}_{uv} contains either a or b. It follows that P ⊆ W^{(1)}_{uv} ∪ V2 = Wuv. Therefore the semicube Wuv is convex.
(b) a ∈ Wuv, b ∈ Wvu. (The case when b ∈ Wuv, a ∈ Wvu is treated similarly.) By Lemma 7.1(iii), Wuv = Wab. The result follows from part (i) of the proof.
Theorem 7.5. Let G be a graph obtained by edge-pasting together finite partial cubes G1 and G2.
Then
dimI(G) = dimI(G1) + dimI(G2) − 1.
Proof. Let θ, θ1, and θ2 be Djoković's relations on E, E1, and E2, respectively. By Lemma 7.1, for uv, xy ∈ E1 (resp. uv, xy ∈ E2) we have
uv θ xy ⇔ uv θ1 xy (resp. uv θ xy ⇔ uv θ2 xy).
Let uv ∈ E1, xy ∈ E2, and uv θ xy. Suppose that (uv, ab) ∉ θ. We may assume that a, b ∈ Wuv. By Lemma 7.1(i), V2 ⊂ Wuv, a contradiction, since xy ∈ E2. Hence, uv θ xy θ ab. It follows that each equivalence class of the relation θ is either an equivalence class of θ1, an equivalence class of θ2, or the class containing the edge ab. Therefore
|E/θ| = |E1/θ1| + |E2/θ2| − 1.
The result follows, since the isometric dimension of a partial cube is equal to the cardinality of the set of equivalence classes of Djoković's relation (formula (5.1)).
We need some results about semicube graphs in order to prove an analog of Theorem 7.3 for a partial cube obtained by edge-pasting of two partial cubes.
Lemma 7.2. Let G be a partial cube and WpqWuv, WqpWxy be two edges in the graph Sc(G). Then WxyWuv is an edge in Sc(G).
Proof. By condition (5.4), Wqp ⊂ Wuv and Wyx ⊂ Wqp. Hence, Wyx ⊂ Wuv. By the same condition, WxyWuv ∈ Sc(G).
As before, we identify the partial cubes G1 and G2 with subgraphs of the partial cube G. Then G1 ∪ G2 = G and G1 ∩ G2 = ({a, b}, {ab}) = K2 (cf. Figure 7.6).
Lemma 7.3. Let G be a partial cube obtained by edge-pasting together partial cubes G1 and G2. Let W^{(1)}_{uv}W^{(1)}_{xy} (resp. W^{(2)}_{uv}W^{(2)}_{xy}) be an edge in the semicube graph Sc(G1) (resp. Sc(G2)). Then WuvWxy is an edge in Sc(G).
Figure 7.7: Semicubes forming an edge in Sc(G1).
Proof. It suffices to consider the case of Sc(G1) (see Figure 7.7). By condition (5.4), W^{(1)}_{vu} ⊂ W^{(1)}_{xy} and W^{(1)}_{yx} ⊂ W^{(1)}_{uv}. Suppose that a ∈ W^{(1)}_{vu} and b ∈ W^{(1)}_{yx} (the case when b ∈ W^{(1)}_{vu} and a ∈ W^{(1)}_{yx} is treated similarly). Then ab θ1 xy and ab θ1 uv. By transitivity of θ1, we have uv θ1 xy, a contradiction, since the semicubes W^{(1)}_{uv} and W^{(1)}_{xy} are distinct. Therefore we may assume that, say, a, b ∈ W^{(1)}_{uv}. Then, by Lemma 7.1, Wvu = W^{(1)}_{vu} ⊂ V1.
Since W^{(1)}_{vu} ⊂ W^{(1)}_{xy} ⊆ Wxy, we have Wvu ⊂ Wxy. By condition (5.4), WuvWxy is an edge in Sc(G).
Lemma 7.4. Let M1 and M2 be matchings in the graphs Sc(G1) and Sc(G2). There is a matching M in Sc(G) such that
|M| ≥ |M1| + |M2| − 1.
Proof. By Lemma 7.3, M1 and M2 induce matchings in Sc(G) which we denote by the same symbols. The intersection M1 ∩ M2 is either empty or a subgraph of the empty graph with vertices Wab and Wba. If M1 ∩ M2 is empty, then M = M1 ∪ M2 is a matching in Sc(G) and the result follows. If M1 ∩ M2 is an empty graph with a single vertex, say, in M1, we remove from M1 the edge that has this vertex as its end vertex, resulting in the matching M′1. Clearly, M = M′1 ∪ M2 is a matching in Sc(G) and |M| = |M1| + |M2| − 1. Suppose now that M1 ∩ M2 is the empty graph with vertices Wab and Wba. Let WabWuv, WbaWpq (resp. WabWxy, WbaWrs) be edges in M1 (resp. M2). By Lemma 7.2, WxyWrs is an edge in Sc(G2). Let us replace the edges WabWxy and WbaWrs in M2 by the single edge WxyWrs, resulting in the matching M′2. Then M = M1 ∪ M′2 is a matching in Sc(G) and |M| = |M1| + |M2| − 1.
Corollary 7.2. Let M1 and M2 be maximum matchings in Sc(G1) and Sc(G2), respectively, and M be a maximum matching in Sc(G). Then
|M| ≥ |M1| + |M2| − 1. (7.1)
By Theorem 5.3, we have
dimI(G1) = dimZ(G1) + |M1|, dimI(G2) = dimZ(G2) + |M2|, dimI(G) = dimZ(G) + |M|,
where M1 and M2 are maximum matchings in Sc(G1) and Sc(G2), respectively, and M is a maximum matching in Sc(G). Therefore, by Theorem 7.5 and (7.1), we have the following result (cf. Theorem 7.3).
Theorem 7.6. Let G be a partial cube obtained by edge-pasting from partial cubes G1 and G2. Then
max{dimZ(G1), dimZ(G2)} ≤ dimZ(G) ≤ dimZ(G1) + dimZ(G2).
Example 7.5. Let us consider the two edge-pastings of the stars G1 = K1,3 and G2 = K1,3 of lattice dimension 2 shown in Figures 7.1 and 7.2. In the first case the resulting graph is the star G = K1,5 of lattice dimension 3. Then we have
max{dimZ(G1), dimZ(G2)} < dimZ(G) < dimZ(G1) + dimZ(G2).
In the second case the resulting graph is a tree with 4 leaves. Therefore,
max{dimZ(G1), dimZ(G2)} = dimZ(G) < dimZ(G1) + dimZ(G2).
Let c1a1 and c2a2 be edges of the stars G1 = K1,4 and G2 = K1,4 (each of which has lattice dimension 2), where c1 and c2 are the centers of the respective stars. Let us edge-paste these two graphs by identifying c1 with c2 and a1 with a2, respectively. The resulting graph G is the star K1,7 of lattice dimension 4. Thus,
max{dimZ(G1), dimZ(G2)} ≤ dimZ(G) = dimZ(G1) + dimZ(G2).
8 Expansions and contractions of partial cubes
The graph expansion procedure was introduced by Mulder in [16], where it is shown that a graph is a median graph if and only if it can be obtained from K1 by a sequence of convex expansions (see also [15]). A similar result for partial cubes was established in [6] (see also [7]) as a corollary to a more general result concerning isometric embeddability into Hamming graphs; it was also established in [13] in the framework of oriented matroid theory.
In this section we investigate properties of the (isometric) expansion and contraction operations and, in particular, prove in two different ways that a graph is a partial cube if and only if it can be obtained from the graph K1 by a sequence of expansions.
A remark about notation is in order. In the product {1, 2} × (V1 ∪ V2), we denote V′i = {i} × Vi and x^i = (i, x) for x ∈ Vi, where i, j = 1, 2.
Definition 8.1. Let G = (V, E) be a connected graph, and let G1 = (V1, E1) and G2 = (V2, E2) be two isometric subgraphs of G such that G = G1 ∪ G2. The expansion of G with respect to G1 and G2 is the graph G′ = (V′, E′) constructed as follows from G (see Figure 8.1):
(i) V′ = V1 + V2 = V′1 ∪ V′2;
(ii) E′ = E1 + E2 + M, where M is the matching ⋃_{x∈V1∩V2} {x^1x^2}.
In this case, we also say that G is a contraction of G′.
Figure 8.1: Expansion/contraction processes.
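Definition 8.1 translates directly into a few lines of code. The sketch below is illustrative only: the function name expand and the encoding of a vertex x^i as the pair (i, x) are our own choices, and the function does not check the hypothesis that G1 and G2 are isometric subgraphs covering G. As a test case we take the 4-cycle with vertices a, p, b, q and expand it with respect to its two paths from a to b.

```python
def expand(v1, e1, v2, e2):
    """Expansion of G = G1 ∪ G2 with respect to G1 = (v1, e1) and G2 = (v2, e2).

    The vertex x^i is encoded as the pair (i, x); edges are two-element frozensets.
    The isometry of G1 and G2 in G is assumed, not verified.
    """
    vertices = {(1, x) for x in v1} | {(2, x) for x in v2}
    edges = {frozenset((1, x) for x in e) for e in e1}
    edges |= {frozenset((2, x) for x in e) for e in e2}
    # The matching M joins the two copies x^1, x^2 of every shared vertex x.
    edges |= {frozenset(((1, x), (2, x))) for x in v1 & v2}
    return vertices, edges


# The cycle C4 = a-p-b-q-a, covered by the paths P1 = a-p-b and P2 = a-q-b.
v1, e1 = {"a", "p", "b"}, {frozenset("ap"), frozenset("pb")}
v2, e2 = {"a", "q", "b"}, {frozenset("aq"), frozenset("qb")}

vertices, edges = expand(v1, e1, v2, e2)
degrees = {v: sum(v in e for e in edges) for v in vertices}
print(len(vertices), len(edges), set(degrees.values()))
```

The expanded graph has six vertices and six edges, every vertex has degree 2, and the graph is connected, so it is the cycle C6; the two matching edges a^1a^2 and b^1b^2 are exactly the edges added by M.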
It is clear that the graphs G1 and ⟨V′1⟩ are isomorphic, as well as the graphs G2 and ⟨V′2⟩.
We define a projection p : V′ → V by p(x^i) = x for x ∈ V. Clearly, the restriction of p to V′1 is a bijection p1 : V′1 → V1 and its restriction to V′2 is a bijection p2 : V′2 → V2. These bijections define isomorphisms ⟨V′1⟩ → G1 and ⟨V′2⟩ → G2.
Let P′ be a path in G′. The vertices of G obtained from the vertices in P′ under the projection p define a walk P in G; we call this walk P the projection of the path P′. It is clear that
ℓ(P) = ℓ(P′), if P′ ⊆ ⟨V′1⟩ or P′ ⊆ ⟨V′2⟩. (8.1)
In this case, P is a path in G and either P = p1(P′) or P = p2(P′). On the other hand,
ℓ(P) < ℓ(P′), if P′ ∩ ⟨V′1⟩ ≠ ∅ and P′ ∩ ⟨V′2⟩ ≠ ∅, (8.2)
and P is not necessarily a path.
We will frequently use the results of the following lemma in this section.
Lemma 8.1. (i) For u^1, v^1 ∈ V′1, any shortest path P_{u^1v^1} in G′ belongs to ⟨V′1⟩ and its projection Puv = p1(P_{u^1v^1}) is a shortest path in G. Accordingly, dG′(u^1, v^1) = dG(u, v) and ⟨V′1⟩ is a convex subgraph of G′. A similar statement holds for u^2, v^2 ∈ V′2.
(ii) For u^1 ∈ V′1 and v^2 ∈ V′2,
dG′(u^1, v^2) = dG(u, v) + 1.
Let P_{u^1v^2} be a shortest path in G′. There is a unique edge x^1x^2 ∈ M such that x^1, x^2 ∈ P_{u^1v^2} and the sections P_{u^1x^1} and P_{x^2v^2} of the path P_{u^1v^2} are shortest paths in ⟨V′1⟩ and ⟨V′2⟩, respectively. The projection Puv of the path P_{u^1v^2} is a shortest path in G.
Proof. (i) Let P_{u^1v^1} be a path in G′ that intersects V′2. Since ⟨V1⟩ is an isometric subgraph of G, there is a shortest path Puv in G that belongs to ⟨V1⟩. Then p1^{−1}(Puv) is a path in ⟨V′1⟩ of the same length as Puv. By (8.1) and (8.2),
ℓ(p1^{−1}(Puv)) < ℓ(P_{u^1v^1}).
Therefore any shortest path P_{u^1v^1} in G′ belongs to ⟨V′1⟩. The result follows.
(ii) Let P_{u^1v^2} be a shortest path in G′ and Puv be its projection to V. By (8.2),
dG′(u^1, v^2) = ℓ(P_{u^1v^2}) > ℓ(Puv) ≥ dG(u, v).
Since there is no edge of G joining vertices in V1 \ V2 and V2 \ V1, a shortest path in G from u to v must contain a vertex x ∈ V1 ∩ V2. Since G1 and G2 are isometric subgraphs, there are shortest paths Pux in G1 and Pxv in G2 such that their union is a shortest path from u to v. Then, by the triangle inequality and part (i) of the proof, we have (cf. Figure 8.1)
dG′(u^1, v^2) ≤ dG′(u^1, x^1) + dG′(x^1, x^2) + dG′(x^2, v^2) = dG(u, v) + 1.
The last two displayed formulas imply dG′(u^1, v^2) = dG(u, v) + 1. Since u^1 ∈ V′1 and v^2 ∈ V′2, the path P_{u^1v^2} must contain an edge, say x^1x^2, in M. Since this path is a shortest path in G′, this edge is unique. Then the sections P_{u^1x^1} and P_{x^2v^2} of P_{u^1v^2} are shortest paths in ⟨V′1⟩ and ⟨V′2⟩, respectively. Clearly, Puv is a shortest path in G.
Let a^1a^2 be an edge in the matching M = ⋃_{x∈V1∩V2} {x^1x^2}. This edge defines five fundamental sets (cf. Section 4): the semicubes W_{a^1a^2} and W_{a^2a^1}, the sets of vertices U_{a^1a^2} and U_{a^2a^1}, and the set of edges F_{a^1a^2}. The next theorem follows immediately from Lemma 8.1. It gives a hint of a connection between the expansion process and partial cubes.
Theorem 8.1. Let G′ be an expansion of a connected graph G, with notation chosen as above. Then
(i) W_{a^1a^2} = V′1 and W_{a^2a^1} = V′2 are convex semicubes of G′.
(ii) F_{a^1a^2} = M defines an isomorphism between the induced subgraphs ⟨U_{a^1a^2}⟩ and ⟨U_{a^2a^1}⟩, which are isomorphic to the subgraph G1 ∩ G2.
The result of Theorem 8.1 justifies the following constructive definition of the contraction process.
Definition 8.2. Let ab be an edge of a connected graph G′ = (V′, E′) such that
(i) the semicubes Wab and Wba are convex and form a partition of V′;
(ii) the set Fab is a matching and defines an isomorphism between the subgraphs ⟨Uab⟩ and ⟨Uba⟩.
A graph G obtained from the graphs ⟨Wab⟩ and ⟨Wba⟩ by pasting them along the subgraphs ⟨Uab⟩ and ⟨Uba⟩ is said to be a contraction of the graph G′.
Remark 8.1. If G′ is bipartite, then the semicubes Wab and Wba form a partition of its vertex set.
Then, by Theorem 4.1, condition (i) implies condition (ii). Thus any pair of opposite convex semicubes in a connected bipartite graph defines a contraction of this graph.
By Theorem 8.1, a graph is a contraction of its expansion. It is not difficult to see that any connected graph is also an expansion of its contraction.
The following three examples give geometric illustrations of the expansion and contraction procedures.
Example 8.1. Let a and b be two opposite vertices in the graph G = C4. Clearly, the two distinct paths P1 and P2 from a to b are isometric subgraphs of G defining an expansion G′ = C6 of G (see Figure 8.2). Note that P1 and P2 are not convex subsets of V.
Example 8.2. Another isometric expansion of the graph G = C4 is shown in Figure 8.3. Here, the path P1 is the same as in the previous example and G2 = G.
Figure 8.2: An expansion of the cycle C4.
Figure 8.3: Another isometric expansion of the cycle C4.
Example 8.3. Lemma 8.1 claims, in particular, that the projection of a shortest path in an expansion G′ of a graph G is a shortest path in G. Generally speaking, the converse is not true. Consider the graph G shown in Figure 8.4 and two paths in G: V1 = abcef and V2 = bde. The graph G′ in Figure 8.4 is the convex expansion of G with respect to V1 and V2. The path abdef is a shortest path in G; it is not a projection of a shortest path in G′.
Figure 8.4: A shortest path which is not a projection of a shortest path.
One can say that, in the case of finite partial cubes, the contraction procedure is defined by an orthogonal projection of a hypercube onto one of its facets.
By Theorem 8.1, the sets V′1 and V′2 are opposite semicubes of the graph G′ defined by edges in M. Their projections are the sets V1 and V2, which are not necessarily semicubes of G. For other semicubes in G′ we have the following result.
Lemma 8.2. For any two adjacent vertices u, v ∈ V,
W_{u^iv^i} = p^{−1}(Wuv) for u, v ∈ Vi and i = 1, 2.
Proof.
By Lemma 8.1,
dG′(x^j, u^i) < dG′(x^j, v^i) ⇔ dG(x, u) < dG(x, v)
for x ∈ V and i, j = 1, 2. The result follows.
Corollary 8.1. If uv is an edge of G1 ∩ G2, then W_{u^1v^1} = W_{u^2v^2}.
The following lemma is an immediate consequence of Lemma 8.1. We shall use it implicitly in our arguments later.
Lemma 8.3. Let u, v ∈ V1 and x ∈ V1 ∩ V2. Then
x^1 ∈ W_{u^1v^1} ⇔ x^2 ∈ W_{u^1v^1}.
The same result holds for semicubes of the form W_{u^2v^2}.
Generally speaking, the projection of a convex subgraph of G′ is not a convex subgraph of G. For instance, the projection of the convex path b^2d^2e^2 in Figure 8.4 is the path bde, which is not a convex subgraph of G. On the other hand, we have the following result.
Theorem 8.2. Let G′ = (V′, E′) be an expansion of a graph G = (V, E) with respect to subgraphs G1 = (V1, E1) and G2 = (V2, E2). The projection of a convex semicube of G′ different from ⟨V′1⟩ and ⟨V′2⟩ is a convex semicube of G.
Proof. It suffices to consider the case when Wuv = p(W_{u^1v^1}) for u, v ∈ V1 (cf. Lemma 8.2). Let x, y ∈ Wuv and z ∈ V be a vertex such that
dG(x, z) + dG(z, y) = dG(x, y).
We need to show that z ∈ Wuv.
Figure 8.5: A shortest path from x to y.
(i) x, y ∈ V1 (the case when x, y ∈ V2 is treated similarly). Suppose that z ∈ V1. Then x^1, y^1, z^1 ∈ V′1 and, by Lemma 8.1,
dG′(x^1, z^1) + dG′(z^1, y^1) = dG′(x^1, y^1).
Since x^1, y^1 ∈ W_{u^1v^1} and W_{u^1v^1} is convex, z^1 ∈ W_{u^1v^1}. Hence, z ∈ Wuv.
Suppose now that z ∈ V2 \ V1. Consider a shortest path Pxy in G from x to y containing z. This path contains vertices x′, y′ ∈ V1 ∩ V2 such that (see Figure 8.5)
dG(x, x′) + dG(x′, z) = dG(x, z) and dG(y, y′) + dG(y′, z) = dG(y, z).
Since Pxy is a shortest path in G, we have
dG(x, x′) + dG(x′, y) = dG(x, y), dG(x, y′) + dG(y′, y) = dG(x, y), dG(x′, z) + dG(z, y′) = dG(x′, y′).
Since x, x′, y ∈ V1, we have x^1, (x′)^1, y^1 ∈ V′1. Because x^1, y^1 ∈ W_{u^1v^1} and W_{u^1v^1} is convex, (x′)^1 ∈ W_{u^1v^1}. Hence, x′ ∈ Wuv and, similarly, y′ ∈ Wuv. Since (x′)^2, (y′)^2, z^2 ∈ V′2 and W_{u^1v^1} is convex, z^2 ∈ W_{u^1v^1}.
Hence, z ∈ Wuv.
(ii) x ∈ V1 \ V2 and y ∈ V2 \ V1. We may assume that z ∈ V1. By Lemma 8.1,
dG′(x^1, y^2) = dG(x, y) + 1 = dG(x, z) + dG(z, y) + 1 = dG′(x^1, z^1) + dG′(z^1, y^2).
Since x^1, y^2 ∈ W_{u^1v^1} and W_{u^1v^1} is convex, z^1 ∈ W_{u^1v^1}. Hence, z ∈ Wuv.
By using the results of Lemma 8.1, it is not difficult to show that the class of connected bipartite graphs is closed under the expansion and contraction operations. The next theorem establishes this result for the class of partial cubes.
Theorem 8.3. (i) An expansion G′ of a partial cube G is a partial cube.
(ii) A contraction G of a partial cube G′ is a partial cube.
Proof. (i) Let G = (V, E) be a partial cube and G′ = (V′, E′) be its expansion with respect to isometric subgraphs G1 = (V1, E1) and G2 = (V2, E2). By Theorem 3.4(ii), it suffices to show that the semicubes of G′ are convex. By Lemma 8.1, the semicubes ⟨V′1⟩ and ⟨V′2⟩ are convex, so we consider a semicube of the form W_{u^1v^1} where uv ∈ E1 (the other case is treated similarly). Let P_{x′y′} be a shortest path connecting two vertices in W_{u^1v^1} and Pxy be its projection to G. By Theorem 8.2, x, y ∈ Wuv and, by Lemma 8.1, Pxy is a shortest path in G. Since Wuv is convex, Pxy belongs to Wuv. Let z′ be a vertex in P_{x′y′} and z = p(z′) ∈ Pxy. By Lemma 8.1,
dG(z, u) < dG(z, v) ⇒ dG′(z′, u^1) ≤ dG′(z′, v^1).
Since G′ is a bipartite graph, dG′(z′, u^1) < dG′(z′, v^1). Hence, P_{x′y′} ⊆ W_{u^1v^1}, so W_{u^1v^1} is convex.
(ii) Let G = (V, E) be a contraction of a partial cube G′ = (V′, E′). By Theorem 3.4, we need to show that the semicubes of G are convex. By Theorem 8.2, all semicubes of G are projections of semicubes of G′ distinct from ⟨V′1⟩ and ⟨V′2⟩. By Theorem 8.2, the semicubes of G are convex.
Corollary 8.2. (i) A finite connected graph is a partial cube if and only if it can be obtained from K1 by a sequence of expansions.
(ii) The number of expansions needed to produce a partial cube G from K1 is dimI(G).
Proof. (i) Follows immediately from Theorem 8.3.
(ii) Follows from Theorems 8.2 and 5.1 (see the discussion in Section 5 just before Theorem 5.2).
The processes of expansion and contraction admit useful descriptions in the case of partial cubes on a set. Let G = (V, E) be a partial cube on a set X, that is, an isometric subgraph of the hypercube H(X). Then it is induced by some wg-family F of finite subsets of X (cf. Theorem 2.1). We may assume (see Section 5) that ∩F = ∅ and ∪F = X.
In what follows we present proofs of the results of Theorem 8.3 and Corollary 8.2 given in terms of wg-families of sets.
The expansion process for a partial cube G on X can be described as follows. Let F1 and F2 be wg-families of finite subsets of X such that F1 ∩ F2 ≠ ∅, F1 ∪ F2 = F, and the distance between any two sets P ∈ F1 \ F2 and Q ∈ F2 \ F1 is greater than one. Note that ⟨F1⟩ and ⟨F2⟩ are partial cubes, ⟨F1⟩ ∩ ⟨F2⟩ ≠ ∅, and ⟨F1⟩ ∪ ⟨F2⟩ = ⟨F⟩ = G. Let X′ = X + {p}, where p ∉ X, and
F′2 = {Q + {p} : Q ∈ F2}, F′ = F1 ∪ F′2.
It is quite clear that the graphs ⟨F′2⟩ and ⟨F2⟩ are isomorphic and the graph G′ = ⟨F′⟩ is an isometric expansion of the graph G.
Theorem 8.4. An expansion of a partial cube is a partial cube.
Proof. We need to verify that F′ is a wg-family of finite subsets of X′. By Theorem 2.3, it suffices to show that the distance between any two adjacent sets in F′ is 1. This is obvious if both sets belong to one of the families F1 or F′2. Suppose that P ∈ F1 and Q + {p} ∈ F′2 are adjacent, that is, for any S ∈ F′ we have
P ∩ (Q + {p}) ⊆ S ⊆ P ∪ (Q + {p}) ⇒ S = P or S = Q + {p}. (8.3)
If Q ∈ F1, then P ∩ (Q + {p}) ⊆ Q ⊆ P ∪ (Q + {p}), since p ∉ P. By (8.3), Q = P, implying d(P, Q + {p}) = 1. If Q ∈ F2 \ F1, there is R ∈ F1 ∩ F2 such that
d(P, R) + d(R, Q) = d(P, Q),
since F is well graded. By Theorem 2.2,
P ∩ Q ⊆ R ⊆ P ∪ Q,
which implies
P ∩ (Q + {p}) ⊆ R + {p} ⊆ P ∪ (Q + {p}).
By (8.3), R + {p} = Q + {p}, a contradiction.
It is easy to recognize the fundamental sets (cf.
Section 4) in an isometric expansion G′ of a partial cube G = ⟨F⟩. Let P ∈ F1 ∩ F2 and Q = P + {p} ∈ F′2 be two vertices defining an edge in G′ according to Definition 8.1(ii). Clearly, the families F1 and F′2 are the semicubes WPQ and WQP of the graph G′ (cf. Lemma 5.1) and therefore are convex subsets of F′. The set FPQ is the set of edges defined by p as in Lemma 5.1. In addition, UPQ = F1 ∩ F2 and UQP = {R + {p} : R ∈ F1 ∩ F2}.
Let G be a partial cube induced by a wg-family F of finite subsets of a set X. As before, we assume that ∩F = ∅ and ∪F = X. Let PQ be an edge of G. We may assume that Q = P + {p} for some p ∉ P. Then (see Lemma 5.1)
WPQ = {R ∈ F : p ∉ R} and WQP = {R ∈ F : p ∈ R}.
Let X′ = X \ {p} and F′ = {R \ {p} : R ∈ F}. It is clear that the graph G′ induced by the family F′ is isomorphic to the contraction of G defined by the edge PQ. Geometrically, the graph G′ is the orthogonal projection of the graph G along the edge PQ (cf. Figures 8.2 and 8.3).
Theorem 8.5. (i) A contraction G′ of a partial cube G is a partial cube.
(ii) If G is finite, then dimI(G′) = dimI(G) − 1.
Proof. (i) For p ∈ X we define
F1 = {R ∈ F : p ∉ R}, F2 = {R ∈ F : p ∈ R}, and F′2 = {R \ {p} : R ∈ F, p ∈ R}.
Note that F1 and F2 are semicubes of G and F′2 is isometric to F2. Hence, F1 and F′2 are wg-families of finite subsets of X′. We need to prove that F′ = F1 ∪ F′2 is a wg-family. By Theorem 2.3, it suffices to show that d(P, Q) = 1 for any two adjacent sets P, Q ∈ F′. This is true if P, Q ∈ F1 or P, Q ∈ F′2, since these two families are well graded. For P ∈ F1 \ F′2 and Q ∈ F′2 \ F1, the sets P and Q + {p} are not adjacent in F, since F is well graded and Q ∉ F. Hence there is R ∈ F1 such that
P ∩ (Q + {p}) ⊆ R ⊆ P ∪ (Q + {p})
and R ≠ P. Since p ∉ R, we have
P ∩ Q ⊆ R ⊆ P ∪ Q.
Since R ≠ P and R ≠ Q, the sets P and Q are not adjacent in F′. The result follows.
(ii) If G is a finite partial cube, then, by Theorem 5.2,
dimI(G′) = |X′| = |X| − 1 = dimI(G) − 1.
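The set-theoretic form of the contraction, F′ = {R \ {p} : R ∈ F}, is easy to experiment with. The following sketch is ours and makes some assumptions: sets are encoded as frozensets, the helper names contract and induced_edges are hypothetical, and the family chosen below is a realization of the cycle C6 as an isometric subgraph of the hypercube on X = {1, 2, 3}. Two sets are joined by an edge when their symmetric difference has exactly one element.

```python
def contract(family, p):
    """Contraction of a family of sets along p: remove p from every set.

    Sets that differ only in p are merged, because the result is a set of sets.
    """
    return {r - {p} for r in family}


def induced_edges(family):
    """Edges of the graph induced by the family in the hypercube:
    pairs of sets whose symmetric difference is a single element."""
    fam = list(family)
    return {
        frozenset((a, b))
        for i, a in enumerate(fam)
        for b in fam[i + 1:]
        if len(a ^ b) == 1
    }


# A family of subsets of X = {1, 2, 3} inducing the cycle C6.
c6 = {frozenset(s) for s in [(), (1,), (1, 2), (1, 2, 3), (2, 3), (3,)]}

contracted = contract(c6, 3)
ground_set = frozenset().union(*contracted)
print(len(contracted), len(induced_edges(contracted)), sorted(ground_set))
```

Contracting along p = 3 leaves the four subsets of {1, 2}, which induce the cycle C4, and the ground set shrinks from three elements to two, in agreement with part (ii) of Theorem 8.5.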
9 Conclusion
The paper focuses on two themes of a rather general mathematical nature.
1. The characterization problem. It is a common practice in mathematics to characterize a particular class of objects in different terms. We present new characterizations of the classes of bipartite graphs and partial cubes, and give new proofs for known characterization results.
2. Constructions. The problem of constructing new objects from old ones is a standard topic in many branches of mathematics. For the class of partial cubes, we discuss the operations of forming the Cartesian product, expansion and contraction, and pasting. It is shown that the class of partial cubes is closed under these operations.
Because partial cubes are defined as graphs isometrically embeddable into hypercubes, the theory of partial cubes has a distinctive geometric flavor. The three main structures on a graph (semicubes and Djoković's and Winkler's relations) are defined in terms of the metric structure on a graph. One can say that this theory is a branch of discrete metric geometry. Not surprisingly, geometric structures play an important role in our treatment of the characterization and construction problems.
References
[1] A.S. Asratian, T.M.J. Denley, and R. Häggkvist, Bipartite Graphs and their Applications, Cambridge University Press, 1998.
[2] D. Avis, Hypermetric spaces and the Hamming cone, Canadian Journal of Mathematics 33 (1981) 795–802.
[3] L. Blumenthal, Theory and Applications of Distance Geometry, Oxford University Press, London, Great Britain, 1953.
[4] J.A. Bondy, Basic graph theory: Paths and circuits, in: R.L. Graham, M. Grötschel, and L. Lovász (Eds.), Handbook of Combinatorics, The MIT Press, Cambridge, Massachusetts, 1995, pp. 3–110.
[5] N. Bourbaki, General Topology, Addison-Wesley Publ. Co., 1966.
[6] V. Chepoi, Isometric subgraphs of Hamming graphs and d-convexity, Control and Cybernetics 24 (1988) 6–11.
[7] V.
Chepoi, Separation of two convex sets in convexity structures, Journal of Geometry 50 (1994) 30–51.
[8] M.M. Deza and M. Laurent, Geometry of Cuts and Metrics, Springer, 1997.
[9] D.Ž. Djoković, Distance preserving subgraphs of hypercubes, J. Combin. Theory Ser. B 14 (1973) 263–267.
[10] J.-P. Doignon and J.-Cl. Falmagne, Well-graded families of relations, Discrete Math. 173 (1997) 35–44.
[11] D. Eppstein, The lattice dimension of a graph, European J. Combinatorics 26 (2005) 585–592, doi: 10.1016/j.ejc.2004.05.001.
[12] A. Frank, Connectivity and network flows, in: R.L. Graham, M. Grötschel, and L. Lovász (Eds.), Handbook of Combinatorics, The MIT Press, Cambridge, Massachusetts, 1995, pp. 111–177.
[13] K. Fukuda and K. Handa, Antipodal graphs and oriented matroids, Discrete Mathematics 111 (1993) 245–256.
[14] F. Hadlock and F. Hoffman, Manhattan trees, Util. Math. 13 (1978) 55–67.
[15] W. Imrich and S. Klavžar, Product Graphs, John Wiley & Sons, 2000.
[16] H.M. Mulder, The Interval Function of a Graph, Mathematical Centre Tracts 132, Mathematisch Centrum, Amsterdam, 1980.
[17] S. Ovchinnikov, Media theory: representations and examples, Discrete Applied Mathematics (in review, e-print available at http://arxiv.org/abs/math.CO/0512282).
[18] R.I. Roth and P.M. Winkler, Collapse of the metric hierarchy for bipartite graphs, European Journal of Combinatorics 7 (1986) 371–375.
[19] M.L.J. van de Vel, Theory of Convex Structures, Elsevier, The Netherlands, 1993.
[20] P.M. Winkler, Isometric embedding in products of complete graphs, Discrete Appl. Math. 8 (1984) 209–212.
ABSTRACT
Partial cubes are isometric subgraphs of hypercubes.
Structures on a graph defined by means of semicubes, and Djoković's and Winkler's relations play an important role in the theory of partial cubes. These structures are employed in the paper to characterize bipartite graphs and partial cubes of arbitrary dimension. New characterizations are established and new proofs of some known results are given. The operations of Cartesian product and pasting, and expansion and contraction processes are utilized in the paper to construct new partial cubes from old ones. In particular, the isometric and lattice dimensions of finite partial cubes obtained by means of these operations are calculated.
<|endoftext|><|startoftext|>
Introduction
Let F be a real quadratic field of narrow class number one and let B be the unique (up to isomorphism) quaternion algebra over F which is ramified at both archimedean places of F and unramified everywhere else. Let GU2(B) be the unitary similitude group of B^{⊕2}. This is the set of Q-rational points of an algebraic group G^B defined over Q. The group G^B is an inner form of G := Res_{F/Q}(GSp4) such that G^B(R) is compact modulo its centre. (These notions are reviewed at the beginning of Section 1.) In this paper we develop an algorithm which computes automorphic forms on G^B in the following sense: given an ideal N in O_F and an integer k greater than 2, the algorithm returns the Hecke eigensystems of all automorphic forms f of level N and parallel weight k. More precisely, given a prime p in O_F, the algorithm returns the Hecke eigenvalues of f at p, and hence the Euler factor L_p(f, s), for each eigenform f of level N and parallel weight k. The algorithm is a generalization of the one developed in [D1 2005] to the genus 2 case. Although we have only described the algorithm in the case of a real quadratic field in this paper, it should be clear from our presentation that it can be adapted to any totally real number field of narrow class number one.
Date: October 29, 2018.
1991 Mathematics Subject Classification. Primary: 11F41 (Hilbert and Hilbert-Siegel modular forms).
Key words and phrases. Hilbert-Siegel modular forms, Jacquet-Langlands Correspondence, Brandt matrices, Satake parameters.
http://arxiv.org/abs/0704.0011v3
CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ

The Jacquet-Langlands Correspondence of the title refers to the conjectural map $JL : \Pi(G^B) \to \Pi(G)$ from automorphic representations of $G^B$ to automorphic representations of $G$, which is injective, matches $L$-functions and enjoys other properties compatible with the principle of functoriality; in particular, the image of the Jacquet-Langlands Correspondence is to be contained in the space of holomorphic automorphic representations. If we admit this conjecture, then the algorithm above provides a way to produce examples of cuspidal Hilbert-Siegel modular forms of genus 2 over $F$ and allows us to compute the $L$-factors of the corresponding automorphic representations for arbitrary finite primes $p$ of $F$. In fact, we are also able to use these calculations to provide evidence for the Jacquet-Langlands Correspondence itself by comparing the Euler factors we find with those of known Hilbert-Siegel modular forms obtained by lifting. This we do in the final section of the paper, where we observe that some of the Euler factors we compute match those of lifts of Hilbert modular forms, for the primes we computed. Although this does not definitively establish that these Hilbert-Siegel modular forms are indeed lifts, in principle one can establish equality in this way, using an analogue of the Sturm bound.

The first systematic approach to Siegel modular forms from a computational viewpoint is due to Skoruppa [Sk 1992] who used Jacobi symbols to generate spaces of such forms. His algorithm, which has been extensively exploited by Ryan [R 2006], applies only to the case of full level structure.
More recently, Faber and van der Geer [FvdG1 2004] and [FvdG2 2004] also produced examples of Siegel modular forms by counting points on hyperelliptic curves of genus 2; again their results are available only in the full level structure case. The most substantial progress toward the computation of Siegel modular forms for proper level structure is by Gunnells [Gu 2000] who extended the theory of modular symbols to the symplectic group $\mathrm{Sp}_4/\mathbb{Q}$. However, this work does not see the cuspidal cohomology, which is the only part of the cohomology that is relevant to arithmetic geometric applications. To the best of our knowledge, there are no numerical examples of Hilbert-Siegel modular forms for proper level structure in the literature, with the exception of those produced from liftings of Hilbert modular forms.

The outline of the paper is as follows. In Section 1 we recall the basic properties of Hilbert-Siegel modular forms and algebraic automorphic forms together with the Jacquet-Langlands Correspondence. In Section 2 we give a detailed description of our algorithm. Finally, in Section 3 we present numerical results for the quadratic field $\mathbb{Q}(\sqrt{5})$.

Acknowledgements. During the course of the preparation of this paper, the second author had helpful email exchanges with several people including Alexandru Ghitza, David Helm, Marc-Hubert Nicole, David Pollack, Jacques Tilouine and Eric Urban. The authors wish to thank them all. Also, we would like to thank William Stein for allowing us to use the SAGE computer cluster at the University of Washington. And finally, the second author would like to thank the PIMS institute for their postdoctoral fellowship support, and the University of Calgary for its hospitality.

COMPUTING HILBERT-SIEGEL MODULAR FORMS

1. Hilbert-Siegel modular forms and the Jacquet-Langlands correspondence

Throughout this paper, $F$ denotes a real quadratic field of narrow class number one.
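The two archimedean places of $F$ and the real embeddings of $F$ will both be denoted $v_0$ and $v_1$. For every $a \in F$, we write $a_0$ (resp. $a_1$) for the image of $a$ under $v_0$ (resp. $v_1$). The ring of integers of $F$ is denoted by $O_F$. For every prime ideal $p$ in $O_F$, the completions of $F$ and $O_F$ at $p$ will be denoted by $F_p$ and $O_{F_p}$, respectively.

Let $B$ be the unique (up to isomorphism) totally definite quaternion algebra over $F$ which is unramified at all finite primes of $F$. We fix a maximal order $O_B$ of $B$. Also, we choose a splitting field $K/F$ of $B$ that is Galois over $\mathbb{Q}$ and such that there exists an isomorphism $j : O_B \otimes_{\mathbb{Z}} O_K \cong M_2(O_K) \oplus M_2(O_K)$, where $M_2(A)$ denotes the ring of $2 \times 2$ matrices with entries from a ring $A$. For every finite prime $p$ in $F$, we fix an isomorphism $B_p \cong M_2(F_p)$ which restricts to an isomorphism from $O_{B,p}$ onto $M_2(O_{F_p})$.

The algebraic group $G = \mathrm{Res}_{F/\mathbb{Q}}(\mathrm{GSp}_4)$ is defined as follows. For any $\mathbb{Q}$-algebra $A$, the set of $A$-rational points of $G$ is given by
\[ G(A) = \left\{ \gamma \in \mathrm{GL}_4(A \otimes_{\mathbb{Q}} F) \;\middle|\; \gamma J_2 \gamma^t = \nu_G(\gamma) J_2,\ \nu_G(\gamma) \in (A \otimes_{\mathbb{Q}} F)^\times \right\}, \qquad \text{where } J_2 = \begin{pmatrix} 0 & 1_2 \\ -1_2 & 0 \end{pmatrix}. \]
This group admits an integral model with $A$-rational points for every $\mathbb{Z}$-algebra $A$ given by
\[ G_{\mathbb{Z}}(A) = \left\{ \gamma \in \mathrm{GL}_4(A \otimes_{\mathbb{Z}} O_F) \;\middle|\; \gamma J_2 \gamma^t = \nu_G(\gamma) J_2,\ \nu_G(\gamma) \in (A \otimes_{\mathbb{Z}} O_F)^\times \right\}. \]

For any $\mathbb{Q}$-algebra $A$, the conjugation on $B$ extends in a natural way to the matrix algebra $M_2(B \otimes_{\mathbb{Q}} A)$. The algebraic group $G^B/\mathbb{Q}$ is defined as follows. For any $\mathbb{Q}$-algebra $A$, the set of $A$-rational points of $G^B$ is given by
\[ G^B(A) = \left\{ \gamma \in M_2(B \otimes_{\mathbb{Q}} A) \;\middle|\; \gamma \bar{\gamma}^t = \nu_{G^B}(\gamma) 1_2,\ \nu_{G^B}(\gamma) \in (A \otimes_{\mathbb{Q}} F)^\times \right\}. \]
This group also admits an integral model with $A$-rational points for every $\mathbb{Z}$-algebra $A$ given by
\[ G^B_{\mathbb{Z}}(A) = \left\{ \gamma \in M_2(O_B \otimes_{\mathbb{Z}} A) \;\middle|\; \gamma \bar{\gamma}^t = \nu_{G^B}(\gamma) 1_2,\ \nu_{G^B}(\gamma) \in (A \otimes_{\mathbb{Z}} O_F)^\times \right\}. \]

The group $G^B/\mathbb{Q}$ is an inner form of $G/\mathbb{Q}$ such that $G^B(\mathbb{R})$ is compact modulo its center. Combining the isomorphism $j$ (see above) with conjugation by a permutation matrix, we obtain an isomorphism $G^B_{\mathbb{Z}}(O_K) \cong G_{\mathbb{Z}}(O_K)$, which we fix from now on.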
For every prime ideal p in F , the split- ting of GB at p amounts to the splitting of the quaternion algebra B at p; we refer to [D1 2005] for further details. By the choice of the quaternion algebra B, we have GB(Q̂) ∼= G(Q̂). (We denote the finite adèles of Q (resp. Z) by Q̂ (resp. Ẑ)). 1.1. Hilbert-Siegel modular forms. We fix an integer k ≥ 3 and, for simplicity, we restrict ourselves to Hilbert-Siegel modular forms of parallel weight k. The real embeddings v0 and v1 of F extend to G(Q) = GSp4(F ) in a natural way. We denote by GSp+4 (F ) the subgroup of elements γ with totally positive similitude factor νG(γ). We recall that the Siegel upper-half plane of genus 2 is defined by H2 = {γ ∈ GL2(C) ∣ γt = γ and Im(γ) is positive definite }. We also recall that GSp+4 (F ) acts on H (τ0, τ1) := (a0τ0 + b0)(c0τ0 + d0) −1, (a1τ1 + b1)(c1τ1 + d1) This induces an action on the space of functions f : H22 → C by , f |kγ(τ) = νG(γi) det(ciτi + di)k f(τ). Let N be an ideal in OF and set Γ0(N) = ∈ GSp+4 (OF ) ∣ c ≡ 0(N) A Hilbert-Siegel modular form of level N and parallel weight k is a holomorphic function f : H22 → C such that ∀γ ∈ Γ0(N), f |kγ = f. The space of Hilbert-Siegel modular forms of parallel weight k and level N is denoted Mk(N). Each f ∈Mk(N) admits a Fourier expansion, which by the Koecher principle takes the form ∀τ ∈ H22, f(τ) = {Q}∪{0} 2πiTr(Qτ), where Q ∈ M2(F ) runs over all symmetric totally positive and semi-definite matrices. A Hilbert-Siegel modular forms f is a cusp form if, for all γ ∈ 4 (F ), the constant term in the Fourier expansion of f |kγ is zero. The space of Hilbert-Siegel cusp forms is denoted Sk(N). COMPUTING HILBERT-SIEGEL MODULAR FORMS 5 1.2. The Hecke algebra. The space Sk(N) comes equipped with a Hecke action, which we now recall. Take u ∈ GSp+4 (F ) ∩M4(OF ), and write the finite disjoint union Γ0(N)uΓ0(N) = Γ0(N)ui. Then the Hecke operator [Γ0(N)uΓ0(N)] on Sk(N) is given by [Γ0(N)uΓ0(N)]f = f |kui. 
Let $p$ be a prime ideal in $O_F$ and let $\pi_p$ be a totally positive generator of $p$; let $T_1(p)$ and $T_2(p)$ be the Hecke operators corresponding to the double $\Gamma_0(N)$-cosets of the symplectic similitude matrices
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \pi_p & 0 \\ 0 & 0 & 0 & \pi_p \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \pi_p & 0 & 0 \\ 0 & 0 & \pi_p^2 & 0 \\ 0 & 0 & 0 & \pi_p \end{pmatrix}, \]
respectively. (We remind the reader of the symplectic form $J_2$ fixed at the beginning of Section 1.) The Hecke algebra $T_k(N)$ is the $\mathbb{Z}$-algebra generated by the operators $T_1(p)$ and $T_2(p)$, where $p$ runs over all primes not dividing $N$.

1.3. Algebraic Hilbert-Siegel automorphic forms. We only consider level structure of Siegel type. Namely, we define the compact open subgroup $U_0(N)$ of $G(\hat{\mathbb{Q}})$ by
\[ U_0(N) = \prod_{p \nmid N} \mathrm{GSp}_4(O_{F_p}) \times \prod_{p \mid N} U_0(p^{e_p}), \]
where $N = \prod_{p \mid N} p^{e_p}$ and
\[ U_0(p^{e_p}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{GSp}_4(O_{F_p}) \;\middle|\; c \equiv 0 \bmod p^{e_p} \right\}. \]

The weight representation is defined as follows. Let $L_k$ be the representation of $\mathrm{GSp}_4(\mathbb{C})$ of highest weight $(k-3, k-3)$. We let $V_k = L_k \otimes L_k$ and define the complex representation $(\rho_k, V_k)$ by
\[ \rho_k : G^B(\mathbb{R}) \longrightarrow \mathrm{GL}(V_k), \]
where the action on the first factor is via $v_0$, and the action on the second one is via $v_1$.

The space of algebraic Hilbert-Siegel modular forms of weight $k$ and level $N$ is given by
\[ M^B_k(N) := \left\{ f : G^B(\hat{\mathbb{Q}})/U_0(N) \to V_k \;\middle|\; \forall \gamma \in G^B(\mathbb{Q}),\ f|_k\gamma = f \right\}, \]
where $f|_k\gamma(x) = f(\gamma x)\gamma$, for all $x \in G^B(\hat{\mathbb{Q}})/U_0(N)$. When $k = 3$, we let
\[ I^B_k(N) := \left\{ f : G^B(\mathbb{Q}) \backslash G^B(\hat{\mathbb{Q}})/U_0(N) \to \mathbb{C} \;\middle|\; f \text{ is constant} \right\}. \]
Then, the space of algebraic Hilbert-Siegel cusp forms of weight $k$ and level $N$ is defined by
\[ S^B_k(N) := \begin{cases} M^B_k(N) & \text{if } k > 3, \\ M^B_k(N)/I^B_k(N) & \text{if } k = 3. \end{cases} \]

The action of the Hecke algebra on $S^B_k(N)$ is given as follows. For any $u \in G(\hat{\mathbb{Q}})$, write the finite disjoint union $U_0(N) u U_0(N) = \coprod_i u_i U_0(N)$, and define
\[ [U_0(N) u U_0(N)] : S^B_k(N) \to S^B_k(N), \qquad f \mapsto f|_k[U_0(N) u U_0(N)], \]
\[ f|_k[U_0(N) u U_0(N)](x) = \sum_i f(x u_i), \qquad x \in G(\hat{\mathbb{Q}}). \]
For any prime $p \nmid N$, let $\varpi_p$ be a local uniformizer at $p$.
The local Hecke algebra at $p$ is generated by the Hecke operators $T_1(p)$ and $T_2(p)$ corresponding to the double $U_0(N)$-cosets $\Delta_1(p)$ and $\Delta_2(p)$ of the matrices
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \varpi_p & 0 \\ 0 & 0 & 0 & \varpi_p \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \varpi_p & 0 & 0 \\ 0 & 0 & \varpi_p^2 & 0 \\ 0 & 0 & 0 & \varpi_p \end{pmatrix}, \]
respectively. We let $T^B_k(N)$ be the Hecke algebra generated by $T_1(p)$ and $T_2(p)$ for all primes $p \nmid N$.

1.4. The Jacquet-Langlands Correspondence. The Hecke modules $S_k(N)$ and $S^B_k(N)$ are related by the following conjecture, known as the Jacquet-Langlands Correspondence for symplectic similitude groups.

Conjecture 1. The Hecke algebras $T_k(N)$ and $T^B_k(N)$ are isomorphic and there is a compatible isomorphism of Hecke modules $S_k(N) \xrightarrow{\sim} S^B_k(N)$.

It is common, but perhaps not entirely accurate, to attribute this conjecture to Jacquet-Langlands. To the best of our knowledge, the correspondence in this form was first discussed by Ihara [Ih 1964] in the case $F = \mathbb{Q}$. In [Ib 1984], Ibukiyama provided some numerical evidence. On the other hand, it is appropriate to refer to Conjecture 1 as the Jacquet-Langlands Correspondence (for GSp(4)) since it is an analogue of the Jacquet-Langlands Correspondence (for GL(2)), which relates automorphic representations of the multiplicative group of a quaternion algebra with certain automorphic representations of GL(2) (see [JL 1970]). Both correspondences are, in turn, special consequences of the principle of functoriality, as expounded by Langlands. Finally, it appears that Conjecture 1 may soon be a theorem due to the work of [So 2008] and the forthcoming book by James Arthur on automorphic representations of classical groups.

2. The Algorithm

In this section, we present the algorithm we used in order to compute the Hecke module of (algebraic) Hilbert-Siegel modular forms. The main assumption in this section is that the class number of the principal genus of $G^B$ is 1. (We refer to [D3 2007] to see how one can relax this condition on the class number.)
We recall that since $B$ is totally definite, $G^B$ satisfies Proposition 1.4 in Gross [Gr 1999]. Thus the group $G^B(\mathbb{R})$ is compact modulo its centre, and $\Gamma = G^B(\mathbb{Z})/O_F^\times$ is finite. For any prime $p$ in $F$, let $\mathbb{F}_p = O_F/p$ be the residue field at $p$ and define the reduction map
\[ M_2(O_{B,p}) \to M_4(\mathbb{F}_p), \qquad g \mapsto \tilde{g}, \]
where we use the splitting of $O_{B,p}$ that was fixed at the beginning of Section 1. Now, choose a totally positive generator $\pi_p$ of $p$ and put
\[ \Theta_1(p) := \Gamma \backslash \left\{ u \in M_2(O_B) \;\middle|\; u\bar{u}^t = \pi_p 1_2 \text{ and } \mathrm{rank}(\tilde{u}) = 2 \right\}, \]
\[ \Theta_2(p) := \Gamma \backslash \left\{ u \in M_2(O_B) \;\middle|\; u\bar{u}^t = \pi_p^2 1_2 \text{ and } \mathrm{rank}(\tilde{u}) = 1 \right\}. \]
We let $H_0^2(N) = G(\hat{\mathbb{Z}})/U_0(N)$. Then the group $\Gamma$ acts on $H_0^2(N)$, thus on the space of functions $f : H_0^2(N) \to V_k$ by
\[ \forall x \in H_0^2(N),\ \forall \gamma \in \Gamma, \qquad f|_k\gamma(x) := f(\gamma x)\gamma. \]

Theorem 2. There is an isomorphism of Hecke modules
\[ M^B_k(N) \cong \left\{ f : H_0^2(N) \to V_k \;\middle|\; f|_k\gamma = f,\ \gamma \in \Gamma \right\}, \]
where the Hecke action on the right hand side is given by
\[ f|_kT_1(p) = \sum_{u \in \Theta_1(p)} f|_k u, \qquad f|_kT_2(p) = \sum_{u \in \Theta_2(p)} f|_k u. \]

Proof. The canonical map
\[ \varphi : G^B(\mathbb{Z}) \backslash G^B(\hat{\mathbb{Z}})/U_0(N) \to G^B(\mathbb{Q}) \backslash G^B(\hat{\mathbb{Q}})/U_0(N) \]
is an injection. Making use of the fact that the class number in the principal genus of $G^B$ is one ($G^B(\hat{\mathbb{Q}}) = G^B(\mathbb{Q}) G^B_{\mathbb{Z}}(\hat{\mathbb{Z}})$), we see that $\varphi$ is in fact a bijection. Since each element $f \in M^B_k(N)$ is determined by its values on a set of coset representatives of $G^B(\mathbb{Q}) \backslash G^B(\hat{\mathbb{Q}})/U_0(N)$, the map $\varphi$ induces an isomorphism of complex vector spaces
\[ M^B_k(N) \to \left\{ f : H_0^2(N) \to V_k \;\middle|\; f|_k\gamma = f,\ \gamma \in \Gamma \right\}, \qquad f \mapsto f \circ \varphi. \]
We make this into a Hecke module isomorphism by defining the Hecke action on the right hand side as indicated in the statement of the theorem. □

In the rest of this section, we explain the main steps of the algorithm provided by Theorem 2.

2.1. The quotient $H_0^2(N)$. Keeping the notations of the previous section, we recall that $N = \prod_{p \mid N} p^{e_p}$. Let $p$ be a prime dividing $N$ and consider the rank 4 free $O_{F_p}/p^{e_p}$-module $L = (O_{F_p}/p^{e_p})^4$ endowed with the symplectic pairing $\langle\ ,\ \rangle$ given by the matrix
\[ J_2 = \begin{pmatrix} 0 & 1_2 \\ -1_2 & 0 \end{pmatrix}, \]
where $1_2$ is the identity matrix in $M_2(O_{F_p}/p^{e_p})$.
Let $M$ be a rank 2 $O_{F_p}/p^{e_p}$-submodule which is a direct factor in $L$. We say that $M$ is isotropic if $\langle u, v \rangle = 0$ for all $u, v \in M$. We recall that $\mathrm{GSp}_4(O_{F_p})$ acts transitively on the set of rank 2, isotropic $O_{F_p}/p^{e_p}$-submodules of $L$ and that the stabilizer of the submodule generated by $e_1 = (1, 0, 0, 0)^T$ and $e_2 = (0, 1, 0, 0)^T$ is $U_0(p^{e_p})$. The quotient $H_0^2(p^{e_p}) = \mathrm{GSp}_4(O_{F_p})/U_0(p^{e_p})$ is the set of rank 2, isotropic $O_{F_p}/p^{e_p}$-submodules of $L$. Via the reduction map $\hat{O}_F \to O_F/N$, the quotient $G_{\mathbb{Z}}(\hat{\mathbb{Z}})/U_0(N)$ can be identified with the product
\[ H_0^2(N) = \prod_{p \mid N} H_0^2(p^{e_p}). \]
The cardinality of $H_0^2(N)$ is extremely useful and is determined using the following lemma.

Lemma 1. Let $p$ be a prime in $F$ and $e_p \geq 1$ an integer. Then, the cardinality of the set $H_0^2(p^{e_p})$ is given by
\[ \# H_0^2(p^{e_p}) = N(p)^{3(e_p - 1)} (N(p) + 1)(N(p)^2 + 1). \]

Proof. For $e_p = 1$, the cardinality of the Lagrange variety over the finite field $\mathbb{F}_p = O_F/p$ is given by $(N(p) + 1)(N(p)^2 + 1)$. Proceed by induction on $e_p$. □

We have more to say about elements of $H_0^2(p^{e_p})$ in Subsection 2.5.

2.2. Brandt matrices. Let $\mathcal{F} = \{x_1, \ldots, x_h\}$ be a fundamental domain for the action of $\Gamma$ on $H_0^2(N)$ and, for each $i$, let $\Gamma_i$ be the stabilizer of $x_i$. Then, every element in $M^B_k(N)$ is completely determined by its values on $\mathcal{F}$. Thus, there is an isomorphism of complex spaces
\[ M^B_k(N) \to \bigoplus_{i=1}^{h} V_k^{\Gamma_i}, \qquad f \mapsto (f(x_i)), \]
where $V_k^{\Gamma_i}$ is the subspace of $\Gamma_i$-invariants in $V_k$. For any $x, y \in H_0^2(N)$, we let
\[ \Theta_1(x, y, p) := \left\{ u \in \Theta_1(p) \;\middle|\; \exists \gamma \in \Gamma,\ ux = \gamma y \right\}, \]
\[ \Theta_2(x, y, p) := \left\{ u \in \Theta_2(p) \;\middle|\; \exists \gamma \in \Gamma,\ ux = \gamma y \right\}. \]

Proposition 3. The actions of the Hecke operators $T_s(p)$, $s = 1, 2$, are given by the Brandt matrices $B_s(p) = (b^s_{ij}(p))$, where
\[ b^s_{ji}(p) : V_k^{\Gamma_j} \to V_k^{\Gamma_i}, \qquad v \mapsto v \cdot \sum_{u \in \Theta_s(x_i, x_j, p)} \gamma_u^{-1} u. \]

Proof. The proof of Proposition 3 follows the lines of [D1 2005, §3]. □

2.3. Computing the group $G^B(\mathbb{Z})$. It is enough to compute the subgroup $\Gamma$ consisting of the elements in $G^B(\mathbb{Z})$ with similitude factor 1.
But it is easy to see that
\[ \Gamma = \left\{ \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} : u, v \in O_B^1 \right\} \cup \left\{ \begin{pmatrix} 0 & u \\ v & 0 \end{pmatrix} : u, v \in O_B^1 \right\}, \]
where $O_B^1$ is the group of norm 1 elements.

2.4. Computing the sets $\Theta_1(p)$ and $\Theta_2(p)$. Let us consider the quadratic form on the vector space $V = B^2$ given by
\[ V \to F, \qquad (a, b) \mapsto \|(a, b)\| := \mathrm{nr}(a) + \mathrm{nr}(b), \]
where $\mathrm{nr}$ is the reduced norm on $B$. This determines an inner form
\[ V \times V \to F, \qquad (u, v) \mapsto \langle u, v \rangle. \]
An element of $\Theta_1(p)$ (resp. $\Theta_2(p)$) is a unitary matrix $\gamma \in M_2(O_B)$ with respect to this inner form such that the norm of each row is $\pi_p$ (resp. $\pi_p^2$, and the rank of the reduced matrix is 1). So we first compute all the vectors $u = (a, b) \in O_B^2$ such that $\|u\| = \pi_p$ (resp. $\|u\| = \pi_p^2$). For each such vector $u$, we compute the vectors $v = (c, d) \in O_B^2$ of the same norm such that $\langle u, v \rangle = 0$. The corresponding matrix $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ belongs to $\Theta_1(p)$ (resp. $\Theta_2(p)$) when its reduction mod $p$ has the appropriate rank. We list all these matrices up to equivalence and stop when we reach the right cardinality.

2.5. The implementation of the algorithm. The implementation of the algorithm is similar to that of [D1 2005]. However, it is important to note how we represent elements in $H_0^2(N)$ so that we can retrieve them easily once stored. As in [D1 2005] we choose to work with the product
\[ H_0^2(N) = \prod_{p \mid N} H_0^2(p^{e_p}). \]
Using Plücker coordinates, we can view $H_0^2(p^{e_p})$ as a closed subspace of $\mathbb{P}^5(O_{F_p}/p^{e_p})$. We then represent each element in $H_0^2(p^{e_p})$ by choosing a point $x = (a_0 : \cdots : a_5) = [u \wedge v] \in \mathbb{P}^5(O_{F_p}/p^{e_p})$ such that the submodule $M$ generated by $u$ and $v$ is a Lagrange submodule, and the first invertible coordinate is scaled to 1.

Remark 1. In [LP 2002], Lansky and Pollack describe an algorithm which computes algebraic modular forms on the same inner form of $\mathrm{GSp}_4/\mathbb{Q}$ that we use. We would like to note that there are some differences between the two algorithms.
Although [LP 2002] also uses the flag variety $H_0^2(N)$ in order to determine the double coset space $G^B(\mathbb{Q}) \backslash G^B(\hat{\mathbb{Q}})/U_0(N)$, it later returns to the adelic setting in order to compute the Brandt matrices. In contrast, Theorem 2 and Proposition 3 allow us to avoid that unnecessary step by describing the Hecke action on the flag variety $H_0^2(N)$ directly. As a result, we get an algorithm that is more efficient.

3. Numerical examples: $F = \mathbb{Q}(\sqrt{5})$ and $B = \left(\frac{-1,-1}{F}\right)$

In this section, we provide some numerical examples using the quadratic field $F = \mathbb{Q}(\sqrt{5})$. It is proven by K. Hashimoto and T. Ibukiyama [HI 1980] that, for the Hamilton quaternion algebra $B$ over $F$, the class number of the principal genus of $G^B$ is one. We use our algorithm to compute all the systems of Hecke eigenvalues of Hilbert-Siegel cusp forms of weight 3 and level $N$ that are defined over real quadratic fields, where $N$ runs over all prime ideals of norm less than 50. We then determine which of the forms we obtained are possible lifts of Hilbert cusp forms by comparing the Hecke eigenvalues for those primes.

3.1. Tables of Hilbert-Siegel cusp forms of parallel weight 3. In Table 1 we list all the systems of eigenvalues of Hilbert-Siegel cusp forms of weight 3 and level $N$ that are defined over real quadratic fields, where $N$ runs over all prime ideals in $F$ of norm less than 50. Here are the conventions we use in the tables.
(1) For a quadratic field $K$ of discriminant $D$, we let $\omega_D$ be a generator of the ring of integers $O_K$ of $K$.
(2) The first row contains the level $N$, given in the format $(\mathrm{Norm}(N), \alpha)$ for some generator $\alpha \in F$ of $N$, and the dimensions of the relevant spaces.
(3) The second row lists the Hecke operators that have been computed.
(4) For each eigenform $f$, the Hecke eigenvalues are given in a row, and the last entry of that row indicates if the form $f$ is a probable lift.
(5) The levels and the eigenforms are both listed up to Galois conjugation.
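As a small sanity check on the set of levels, the norms of the prime ideals of $F = \mathbb{Q}(\sqrt{5})$ below 50 can be listed from the standard splitting law for this field: 5 ramifies, a rational prime $\ell \equiv \pm 1 \pmod 5$ splits into two conjugate primes of norm $\ell$, and every other rational prime is inert with norm $\ell^2$. The sketch below is only an illustration and is not part of the authors' implementation; the function names `is_prime` and `prime_ideal_norms` are hypothetical, and each pair of conjugate primes is reported once, following convention (5).

```python
def is_prime(n):
    """Trial-division primality test, adequate for small bounds."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_ideal_norms(bound):
    """Norms of prime ideals of Q(sqrt(5)) below `bound`, one per Galois orbit.

    Returns a sorted list of (norm, splitting type) pairs.
    """
    norms = []
    for p in range(2, bound):
        if not is_prime(p):
            continue
        if p == 5:
            norms.append((5, "ramified"))   # (sqrt(5))^2 = (5)
        elif p % 5 in (1, 4):
            norms.append((p, "split"))      # two conjugate primes of norm p
        elif p * p < bound:
            norms.append((p * p, "inert"))  # (p) stays prime, norm p^2
    return sorted(norms)


if __name__ == "__main__":
    for norm, kind in prime_ideal_norms(50):
        print(norm, kind)
```

With the bound 50 this yields the norms 4, 5, 9, 11, 19, 29, 31, 41 and 49, which are the norms of the levels appearing in Table 1.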
For an eigenform $f$ and a given prime $p \nmid N$, let $a_1(p, f)$ and $a_2(p, f)$ be the eigenvalues of the Hecke operators $T_1(p)$ and $T_2(p)$, respectively. Then the Euler factor $L_p(f, s)$ is given (for example, in [AS 2001, §3.4]) by
\[ L_p(f, s) = Q_p(q^{-s})^{-1}, \]
where
\[ Q_p(x) = 1 - a_1(p, f)x + b_1(p, f)x^2 - a_1(p, f)q^{2k-3}x^3 + q^{4k-6}x^4, \]
\[ b_1(p, f) = a_1(p, f)^2 - a_2(p, f) - q^{2k-4}, \qquad q = N(p). \]

3.2. Tables of Hilbert cusp forms of parallel weight 4. In Table 2, we list all the Hilbert cusp forms of parallel weight 4 and level $N$ that are defined over real quadratic fields, with $N$ running over all prime ideals of norm less than 50. (They are computed by using the algorithm in [D1 2005].) We use this data in order to determine the forms in Table 1 that are possible lifts from $\mathrm{GL}_2$.

3.3. Lifts. There are two types of lifts from $\mathrm{GL}_2$ to $\mathrm{GSp}_4$. The first one corresponds to the homomorphism of $L$-groups determined by the long root embedding into $\mathrm{GSp}_4$, and the second one by the short root embedding. (See [LP 2002] for more details.) Let $f$ be a Hilbert cusp form of parallel weight $k$ and level $N$ with Hecke eigenvalues $a(p, f)$, where $p$ is a prime not dividing $N$. Let $\varphi$ be the lift of $f$ to $\mathrm{GSp}_4$ via the long root, and $\psi$ the one via the short root. Then the Hecke eigenvalues of $\varphi$ are given by
\[ a_1(p, \varphi) = a(p, f)\, N(p)^{\frac{4-k}{2}} + N(p)^2 + N(p), \]
\[ a_2(p, \varphi) = a(p, f)\, N(p)^{\frac{4-k}{2}} (N(p) + 1) + N(p)^2 - 1, \]
and the Hecke eigenvalues of $\psi$ are given by
\[ a_1(p, \psi) = a(p, f)^3\, N(p)^{\frac{6-3k}{2}} - 2\, a(p, f)\, N(p)^{\frac{4-k}{2}}, \]
\[ a_2(p, \psi) = a(p, f)^4\, N(p)^{4-2k} - 3\, a(p, f)^2\, N(p)^{3-k} + N(p)^2 - 1. \]
The second lift $\psi$ is the so-called symmetric cube lifting.

N = (4, 2): dim M^B_3(N) = 2, dim S^B_3(N) = 1
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | −4 | 0 | 20 | −36 | 140 | 580 | yes

N = (5, 2 + ω5): dim M^B_3(N) = 2, dim S^B_3(N) = 1
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 20 | 15 | −5 | 0 | 40 | −420 | yes

N = (9, 3): dim M^B_3(N) = 3, dim S^B_3(N) = 2
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 25 − 3ω41 | 40 − 15ω41 | 30 + 6ω41 | 24 + 36ω41 | −9 | 0 | yes

N = (11, 3 + ω5): dim M^B_3(N) = 3, dim S^B_3(N) = 2
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 24 | 35 | 34 | 48 | 88 | 60 | yes
f2 | −20 | 35 | −10 | 4 | 0 | 60 | no

N = (19, 4 + ω5): dim M^B_3(N) = 5, dim S^B_3(N) = 4
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 4 | 11 | −20 | 28 | 6 | 76 | no
f2 | 7 | −50 | 15 | −66 | 73 | −90 | yes
f3 | 24 + ω161 | 35 + 5ω161 | 36 − ω161 | 60 − 6ω161 | 98 − 3ω161 | 160 − 30ω161 | yes

N = (29, 5 + ω5): dim M^B_3(N) = 9, dim S^B_3(N) = 8
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | −4 | 11 | 10 | 20 | 30 | 60 | no
f2 | 8 | −45 | 30 | 24 | 50 | −320 | yes
f3 | 17 | 0 | 9 | −102 | 86 | 40 | yes

N = (31, 5 + 2ω5): dim M^B_3(N) = 12, dim S^B_3(N) = 11
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 13 | −20 | 20 | −36 | 76 | −60 | yes

N = (41, 6 + ω5): dim M^B_3(N) = 19, dim S^B_3(N) = 18
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 10 | 20 | −10 | 29 | 30 | −20 | no
f2 | −1 | 1 | 5 | 14 | −2 | −56 | no
f3 | 27 | 50 | 40 | 84 | 124 | 420 | yes
f4 | −12 | 19 | 30 | 65 | 0 | 0 | no
f5 | 16 − 2ω21 | −5 − 10ω21 | 21 + 4ω21 | −30 + 24ω21 | 72 − 2ω21 | −100 − 20ω21 | yes
f6 | 2 − 6ω5 | 11 − 2ω5 | 8 + 4ω5 | 11 − 4ω5 | −12 + 54ω5 | 160 + 40ω5 | no

N = (49, 7): dim M^B_3(N) = 26, dim S^B_3(N) = 25
   | T1(2) | T2(2) | T1(√5) | T2(√5) | T1(3) | T2(3) | Lift?
f1 | 5 | −60 | 46 | 120 | 40 | −420 | yes
f2 | 4 + 4ω65 | 32 + 3ω65 | 12 − 4ω65 | 44 − 4ω65 | −6 − 12ω65 | 145 + 8ω65 | no

Table 1.
Hilbert-Siegel eigenforms of weight 3

N | | (4, 2) | (5, 2 + ω5) | (9, 3) | (11, 3 + ω5)
N(p) | p | a(p, f1) | a(p, f1) | a(p, f1) | a(p, f1)
4 | 2 | −4 | 0 | 5 − 3ω41 | 4
5 | 2 + ω5 | −10 | −5 | 6ω41 | 4
9 | 3 | 50 | −50 | −9 | −2
11 | 3 + 2ω5 | −28 | 32 | −18 − 6ω41 | −10
11 | 3 + ω5 | −28 | 32 | −18 − 6ω41 | −11
19 | 4 + 3ω5 | 60 | 100 | −40 + 24ω41 | −94
19 | 4 + ω5 | 60 | 100 | −40 + 24ω41 | 28

N | | (19, 4 + ω5) | (19, 4 + ω5) | (29, 5 + ω5) | (29, 5 + ω5)
N(p) | p | a(p, f1) | a(p, f2) | a(p, f1) | a(p, f2)
4 | 2 | −13 | 5 − ω161 | −12 | −3
5 | 2 + ω5 | −15 | 5 + ω161 | 0 | −21
9 | 3 | −17 | 5 + 3ω161 | −40 | −4
11 | 3 + 2ω5 | −6 | 2 + 8ω161 | −68 | 37
11 | 3 + ω5 | 33 | 7 − 7ω161 | 30 | −66
19 | 4 + 3ω5 | −139 | −15 − 9ω161 | −28 | −40
19 | 4 + ω5 | 19 | −19 | 84 | −9

N | | (31, 5 + 2ω5) | (41, 6 + ω5) | (41, 6 + ω5)
N(p) | p | a(p, f1) | a(p, f1) | a(p, f2)
4 | 2 | −7 | 7 | −4 − 2ω21
5 | 2 + ω5 | −10 | 10 | −9 + 4ω21
9 | 3 | −14 | 34 | −18 − 2ω21
11 | 3 + 2ω5 | −20 | −60 | −19
11 | 3 + ω5 | −28 | −2 | −24 − 4ω21
19 | 4 + 3ω5 | −12 | 74 | 4 − 50ω21
19 | 4 + ω5 | 28 | 16 | −29 + 44ω21

N | | (49, 7) | (49, 7)
N(p) | p | a(p, f1) | a(p, f2)
4 | 2 | −15 | −2
5 | 2 + ω5 | 16 | −10
9 | 3 | −50 | −11
11 | 3 + 2ω5 | −8 | −7 − 28ω13
11 | 3 + ω5 | −8 | −35 + 28ω13
19 | 4 + 3ω5 | −110 | −26 + 14ω13
19 | 4 + ω5 | −110 | −12 − 14ω13

Table 2. Hilbert eigenforms of weight 4

Remark 2. So far, our algorithm has been implemented only for congruence subgroups of Siegel type. We intend to improve the implementation in the near future so as to include additional level structures such as the Klingen type. Indeed, Ramakrishnan and Shahidi [RS 2007] recently showed the existence of symmetric cube lifts for non-CM elliptic curves $E/\mathbb{Q}$ to $\mathrm{GSp}_4/\mathbb{Q}$. Their result should hold for other totally real number fields, with the level structures of the lifts being of Klingen type. Unfortunately, those lifts cannot be seen in our current tables. For example, there are modular elliptic curves over $\mathbb{Q}(\sqrt{5})$ whose conductors have norm 31, 41 and 49, but the corresponding symmetric cube lifts do not appear in Table 1. We would like to remedy that in our next implementation.

References
[D1 2005] L. Dembélé, Explicit computations of Hilbert modular forms on $\mathbb{Q}(\sqrt{5})$. Experiment. Math. 14 (2005), no. 4, 457–466.
[D2 2007] L. Dembélé, Quaternionic M-symbols, Brandt matrices and Hilbert modular forms. Math. Comp. 76 (2007), no. 258, 1039–1057. Also available electronically.
[D3 2007] L. Dembélé, On the computation of algebraic modular forms (submitted).
[AS 2001] Mahdi Asgari and Ralf Schmidt, Siegel modular forms and representations. Manuscripta Math. 104 (2001), 173–200.
[FvdG1 2004] Carel Faber and Gerard van der Geer, Sur la cohomologie des systèmes locaux sur les espaces de modules des courbes de genre 2 et des surfaces abéliennes. I, C. R. Math. Acad. Sci. Paris 338 (2004), no. 5, 381–384.
[FvdG2 2004] Carel Faber and Gerard van der Geer, Sur la cohomologie des systèmes locaux sur les espaces de modules des courbes de genre 2 et des surfaces abéliennes. II, C. R. Math. Acad. Sci. Paris 338 (2004), no. 6, 467–470.
[JL 1970] Hervé Jacquet and Robert Langlands, Automorphic forms on GL(2), Lecture notes in mathematics 114 and 278, 1970.
[Gr 1999] Benedict H. Gross, Algebraic modular forms. Israel J. Math. 113 (1999), 61–93.
[Gu 2000] P. Gunnells, Symplectic modular symbols. Duke Math. J. 102 (2000), no. 2, 329–350.
[HI 1980] K. Hashimoto and T. Ibukiyama, On the class numbers of positive definite binary quaternion hermitian forms. J. Fac. Sci. Univ. Tokyo Sect. IA Math. 27 (1980), 549–601.
[Ib 1984] T. Ibukiyama, On symplectic Euler factors of genus 2. J. Fac. Sci. Univ. Tokyo 30 (1984), 587–614.
[Ih 1964] Y. Ihara, On certain Dirichlet series. J. Math. Soc. Japan 16 (1964), 214–225.
[LP 2002] J. Lansky and D. Pollack, Hecke algebras and automorphic forms. Compositio Math. 130 (2002), no. 1, 21–48.
[RS 2007] Dinakar Ramakrishnan and Freydoon Shahidi, Siegel modular forms of genus 2 attached to elliptic curves (preprint). Available at www.math.arxiv.
[R 2006] N. C. Ryan, Computing the Satake p-parameters of Siegel modular forms (submitted).
[Sk 1992] Nils-Peter Skoruppa, Computations of Siegel modular forms of genus two. Math. Comp. 58 (1992), no. 197, 381–398.
[So 2008] Claus M. Sorensen, Potential level-lowering for GSp(4), arXiv:0804.0588v1.

Department of Mathematics, University of Calgary
E-mail address: cunning@math.ucalgary.ca
Institut für Experimentelle Mathematik, Universität Duisburg-Essen
E-mail address: lassina.dembele@uni-duisburg-essen.de

Introduction
1. Hilbert-Siegel modular forms and the Jacquet-Langlands correspondence
1.1. Hilbert-Siegel modular forms
1.2. The Hecke algebra
1.3. Algebraic Hilbert-Siegel automorphic forms
1.4. The Jacquet-Langlands Correspondence
2. The Algorithm
2.1. The quotient $H_0^2(N)$
2.2. Brandt matrices
2.3. Computing the group $G^B(\mathbb{Z})$
2.4. Computing the sets $\Theta_1(p)$ and $\Theta_2(p)$
2.5. The implementation of the algorithm
3. Numerical examples: $F = \mathbb{Q}(\sqrt{5})$ and $B = \left(\frac{-1,-1}{F}\right)$
3.1. Tables of Hilbert-Siegel cusp forms of parallel weight 3
3.2. Tables of Hilbert cusp forms of parallel weight 4
3.3. Lifts
References

ABSTRACT In this paper we present an algorithm for computing Hecke eigensystems of Hilbert-Siegel cusp forms over real quadratic fields of narrow class number one. We give some illustrative examples using the quadratic field $\mathbb{Q}(\sqrt{5})$. In those examples, we identify Hilbert-Siegel eigenforms that are possible lifts from Hilbert eigenforms.

Introduction and Results

Let $M_{\lambda+\frac{1}{2}}(\Gamma_0(N), \chi)$ and $S_{\lambda+\frac{1}{2}}(\Gamma_0(N), \chi)$ be the spaces, respectively, of modular forms and cusp forms of weight $\lambda + \frac{1}{2}$ on $\Gamma_0(N)$ with a Dirichlet character $\chi$ whose conductor divides $N$. If $f(z) \in M_{\lambda+\frac{1}{2}}(\Gamma_0(N), \chi)$, then $f(z)$ has the form
\[ f(z) = \sum_{n=0}^{\infty} a(n) q^n, \]
where $q := e^{2\pi i z}$. It is well known that the coefficients of $f$ are related to interesting objects in number theory such as special values of $L$-functions, class numbers, traces of singular moduli and so on. In this paper, we study congruence properties of the Fourier coefficients of $f(z) \in M_{\lambda+\frac{1}{2}}(\Gamma_0(N), \chi) \cap \mathbb{Z}[[q]]$ and their applications.
Recently, Bruinier and Ono proved in [3] that $g(z) \in S_{\lambda+\frac{1}{2}}(\Gamma_0(N), \chi) \cap \mathbb{Z}[[q]]$ has a special form modulo $p$ (see (2.1)) when $p$ is an odd prime and the coefficients of $g(z)$ do not satisfy the following property for $p$:

Property A. If $M$ is a positive integer, we say that a sequence $\alpha(n) \in \mathbb{Z}$ satisfies Property A for $M$ if for every integer $r$
\[ \#\{ 1 \leq n \leq X \mid \alpha(n) \equiv r \pmod{M} \text{ and } \gcd(M, n) = 1 \} \gg_{r,M} \begin{cases} \dfrac{\sqrt{X}}{\log X} & \text{if } r \not\equiv 0 \pmod{M}, \\ X & \text{if } r \equiv 0 \pmod{M}. \end{cases} \]

2000 Mathematics Subject Classification. 11F11, 11F33.
Key words and phrases. Modular forms, Congruences.
http://arxiv.org/abs/0704.0012v1
D. CHOI

Let
\[ \theta(f(z)) := \frac{1}{2\pi i} \frac{d}{dz} f(z) = \sum_{n=1}^{\infty} n \cdot a(n) q^n. \]
Using the Rankin-Cohen bracket (see (2.3)), we prove that there exists $\tilde{f}(z) \in S_{\lambda+p+1+\frac{1}{2}}(\Gamma_0(4N), \chi) \cap \mathbb{Z}[[q]]$ such that $\theta(f(z)) \equiv \tilde{f}(z) \pmod{p}$. We extend the results in [3] to modular forms of half integral weight.

Theorem 1. Let $\lambda$ be a non-negative integer. We assume that $f(z) = \sum_{n=0}^{\infty} a(n) q^n \in M_{\lambda+\frac{1}{2}}(\Gamma_0(4N), \chi) \cap \mathbb{Z}[[q]]$, where $\chi$ is a real Dirichlet character. If $p \geq 5$ is a prime and there exists a positive integer $n$ for which $\gcd(a(n), p) = 1$ and $\gcd(n, p) = 1$, then at least one of the following is true:
(1) The coefficients of $\theta^{p-1}(f(z))$ satisfy Property A for $p$.
(2) There are finitely many square-free integers $n_1, n_2, \cdots, n_t$ for which
\[ (1.1) \qquad \theta^{p-1}(f(z)) \equiv \sum_{i=1}^{t} \sum_{m=1}^{\infty} a(n_i m^2) q^{n_i m^2} \pmod{p}. \]
Moreover, if $\gcd(4N, p) = 1$ and an odd prime $\ell$ divides some $n_i$, then $p \mid (\ell - 1)\ell(\ell + 1)N$ or $\ell \mid N$.

Remark 1.1. Note that for every prime $p \geq 5$,
\[ \theta^{p-1}(f(z)) \equiv \sum_{p \nmid n} a(n) q^n \pmod{p}. \]

As an application of Theorem 1, we study the distribution of traces of singular moduli modulo primes $p \geq 5$. Let $j(z)$ be the usual $j$-invariant function. We denote by $\mathcal{F}_d$ the set of positive definite binary quadratic forms $F(x, y) = ax^2 + bxy + cy^2 = [a, b, c]$ with discriminant $-d = b^2 - 4ac$. For each $F(x, y)$, let $\alpha_F$ be the unique complex number in the complex upper half plane which is a root of $F(x, 1)$. We define $\omega_F \in \{1, 2, 3\}$ as
\[ \omega_F := \begin{cases} 2 & \text{if } F \sim_\Gamma [a, 0, a], \\ 3 & \text{if } F \sim_\Gamma [a, a, a], \\ 1 & \text{otherwise}, \end{cases} \]
where $\Gamma := \mathrm{SL}_2(\mathbb{Z})$.
Here, $F \sim_\Gamma [a, b, c]$ denotes that $F(x, y)$ is equivalent to $[a, b, c]$. From these notations, we define the Hecke trace of singular moduli.

DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES

Definition 1.2. If $m \geq 1$, then we define the $m$th Hecke trace of the singular moduli of discriminant $-d$ as
\[ t_m(d) := \sum_{F \in \mathcal{F}_d/\Gamma} \frac{j_m(\alpha_F)}{\omega_F}, \]
where $\mathcal{F}_d/\Gamma$ denotes a set of $\Gamma$-equivalence classes of $\mathcal{F}_d$ and
\[ j_m(z) := j(z)|T_0(m) = \sum_{\substack{d \mid m \\ ad = m}} \sum_{b=0}^{d-1} j\!\left(\frac{az + b}{d}\right). \]
Here, $T_0(m)$ denotes the normalized $m$th weight zero Hecke operator. Note that $t_1(d) = t(d)$, where
\[ t(d) := \sum_{F \in \mathcal{F}_d/\Gamma} \frac{j(\alpha_F) - 744}{\omega_F} \]
is the usual trace of singular moduli. Let
\[ h(z) := \frac{\eta(z)^2}{\eta(2z)} \cdot \frac{E_4(4z)}{\eta(4z)^6} \]
and let $B_m(1, d)$ denote the coefficient of $q^d$ in $h(z)|T(m^2, 1, \chi_0)$, where
\[ E_4(z) := 1 + 240 \sum_{n=1}^{\infty} \sum_{d \mid n} d^3 q^n, \qquad \eta(z) := q^{\frac{1}{24}} \prod_{n=1}^{\infty} (1 - q^n), \]
and $\chi_0$ is a trivial character. Here, $T(m^2, \lambda, \chi)$ denotes the $m$th Hecke operator of weight $\lambda + \frac{1}{2}$ with a Dirichlet character $\chi$ (see VI. §3. in [5] or (2.5)). Zagier proved in [11] that for all $m$ and $d$
\[ (1.2) \qquad t_m(d) = -B_m(1, d). \]
Using these generating functions, Ahlgren and Ono studied the divisibility properties of traces and Hecke traces of singular moduli in terms of the factorization of primes in imaginary quadratic fields (see [2]). For example, they proved that a positive proportion of the primes $\ell$ has the property that $t_m(\ell^3 n) \equiv 0 \pmod{p^s}$ for every positive integer $n$ coprime to $\ell$ such that $p$ is inert or ramified in $\mathbb{Q}(\sqrt{-n\ell})$. Here, $p$ is an odd prime, and $s$ and $m$ are integers with $p \nmid m$. In the following theorem, we give the distribution of traces and Hecke traces of singular moduli modulo primes $p$.

Theorem 2. Suppose that $p \geq 5$ is a prime such that $p \equiv 2 \pmod{3}$.
(1) Then, for every integer $r$, $p \nmid r$,
\[ \#\{ 1 \leq n \leq X \mid t_1(n) \equiv r \pmod{p} \} \gg_{r,p} \begin{cases} \dfrac{\sqrt{X}}{\log X} & \text{if } r \not\equiv 0 \pmod{p}, \\ X & \text{if } r \equiv 0 \pmod{p}. \end{cases} \]
(2) Then, a positive proportion of the primes $\ell$ has the property that
\[ \#\{ 1 \leq n \leq X \mid t_\ell(n) \equiv r \pmod{p} \} \gg_{r,p} \begin{cases} \dfrac{\sqrt{X}}{\log X} & \text{if } r \not\equiv 0 \pmod{p}, \\ X & \text{if } r \equiv 0 \pmod{p} \end{cases} \]
for every integer $r$, $p \nmid r$.
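To make the index set $\mathcal{F}_d/\Gamma$ and the weights $\omega_F$ of Definition 1.2 concrete, the following sketch enumerates one reduced representative $[a, b, c]$ per $\Gamma$-class. It assumes the classical reduction conditions $-a < b \leq a \leq c$, with $b \geq 0$ when $a = c$, and includes imprimitive forms; the function names are hypothetical and the code is only an illustration, not taken from the paper. Since $[a, 0, a]$ and $[a, a, a]$ are themselves reduced, $\omega_F$ can be read off the reduced representative directly.

```python
from fractions import Fraction


def reduced_forms(d):
    """Reduced positive definite forms [a, b, c] with b^2 - 4ac = -d.

    One form per SL_2(Z)-class; imprimitive forms are included.
    """
    forms = []
    a = 1
    # For a reduced form, d = 4ac - b^2 >= 4a^2 - a^2 = 3a^2.
    while 3 * a * a <= d:
        for b in range(-a + 1, a + 1):      # -a < b <= a
            numerator = b * b + d
            if numerator % (4 * a) != 0:
                continue
            c = numerator // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:            # boundary case: keep b >= 0
                continue
            forms.append((a, b, c))
        a += 1
    return forms


def omega(form):
    """Weight omega_F: 2 for [a, 0, a], 3 for [a, a, a], 1 otherwise."""
    a, b, c = form
    if a == c and b == 0:
        return 2
    if a == b == c:
        return 3
    return 1


def weighted_class_count(d):
    """Sum of 1/omega_F over the classes of discriminant -d."""
    return sum(Fraction(1, omega(F)) for F in reduced_forms(d))


if __name__ == "__main__":
    for d in (3, 4, 12, 23):
        print(d, reduced_forms(d), weighted_class_count(d))
```

For example, $d = 23$ gives the three classes $[1, 1, 6]$, $[2, -1, 3]$ and $[2, 1, 3]$, all of weight 1, while $d = 3$ gives the single class $[1, 1, 1]$ of weight 3. Replacing the summand $1/\omega_F$ by $(j(\alpha_F) - 744)/\omega_F$ would turn the same loop into a computation of $t(d)$, given a numerical evaluation of $j$.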
As another application, we study the distribution of Hurwitz class numbers modulo primes $p \ge 5$. The Hurwitz class number $H(-N)$ is defined as follows: the class number of quadratic forms of discriminant $-N$, where each class $C$ is counted with multiplicity $\frac{1}{\mathrm{Aut}(C)}$. The following theorem gives the distribution of Hurwitz class numbers modulo primes $p \ge 5$.

Theorem 3. Suppose that $p \ge 5$ is a prime. Then, for every integer $r$,
$$\#\{1 \le n \le X \mid H(n) \equiv r \pmod{p}\} \gg_{r,p} \begin{cases} \sqrt{X}/\log X & \text{if } r \not\equiv 0 \pmod{p},\\ X & \text{if } r \equiv 0 \pmod{p}. \end{cases}$$

We also use the main theorem to study an analogue of Newman's conjecture for overpartitions. Newman's conjecture concerns the distribution of the ordinary partition function modulo primes $p$.

Newman's Conjecture. Let $P(n)$ be the ordinary partition function. If $M$ is a positive integer, then for every integer $r$ there are infinitely many nonnegative integers $n$ for which $P(n) \equiv r \pmod{M}$.

This conjecture has been studied by many mathematicians (see Chapter 5 in [8]). An overpartition of a natural number $n$ is a partition of $n$ in which the first occurrence of a number may be overlined. Let $\bar{P}(n)$ be the number of overpartitions of an integer $n$. As an analogue of Newman's conjecture, the following theorem gives a distribution property of $\bar{P}(n)$ modulo odd primes $p$.

Theorem 4. Suppose that $p \ge 5$ is a prime such that $p \equiv 2 \pmod{3}$. Then, for every integer $r$,
$$\#\{1 \le n \le X \mid \bar{P}(n) \equiv r \pmod{p}\} \gg_{r,p} \begin{cases} \sqrt{X}/\log X & \text{if } r \not\equiv 0 \pmod{p},\\ X & \text{if } r \equiv 0 \pmod{p}. \end{cases}$$

Remark 1.3. When $r \equiv 0 \pmod{p}$, Theorems 2, 3 and 4 were proved in [2] and [10].

The remaining sections contain the proofs: Section 2 gives a proof of Theorem 1, and Section 3 gives the proofs of Theorems 2, 3, and 4.

2. Proof of Theorem 1

We begin by stating the following theorem proved in [3].

Theorem 2.1 ([3]). Let $\lambda$ be a non-negative integer. Suppose that $g(z) = \sum_{n=0}^{\infty} a_g(n)q^n \in S_{\lambda+\frac{1}{2}}(\Gamma_0(4N),\chi) \cap \mathbb{Z}[[q]]$, where $\chi$ is a real Dirichlet character.
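If $p$ is an odd prime and a positive integer $n$ exists for which $\gcd(a_g(n),p)=1$, then at least one of the following is true:
(1) If $0 \le r < p$, then
$$\#\{1 \le n \le X \mid a_g(n) \equiv r \pmod{p}\} \gg_{r,p} \begin{cases} \sqrt{X}/\log X & \text{if } r \not\equiv 0 \pmod{p},\\ X & \text{if } r \equiv 0 \pmod{p}. \end{cases}$$
(2) There are finitely many square-free integers $n_1, n_2, \cdots, n_t$ for which
$$(2.1)\qquad g(z) \equiv \sum_{i=1}^{t}\sum_{m=1}^{\infty} a_g(n_i m^2)q^{n_i m^2} \pmod{p}.$$
Moreover, if $\gcd(p,4N)=1$, $\epsilon \in \{\pm 1\}$, and $\ell \nmid 4Np$ is a prime with $\left(\frac{n_i}{\ell}\right) \in \{0,\epsilon\}$ for $1 \le i \le t$, then $(\ell-1)g(z)$ is an eigenform modulo $p$ of the half-integral weight Hecke operator $T(\ell^2,\lambda,\chi)$. In particular, we have
$$(2.2)\qquad (\ell-1)g(z)|T(\ell^2,\lambda,\chi) \equiv \epsilon\chi(p)\left(\frac{(-1)^\lambda}{\ell}\right)\left(\ell^\lambda + \ell^{\lambda-1}\right)(\ell-1)g(z) \pmod{p}.$$

Recall that $f(z) = \sum_{n=0}^{\infty} a(n)q^n \in M_{\lambda+\frac{1}{2}}(\Gamma_0(4N),\chi) \cap \mathbb{Z}[[q]]$. Thus, to apply Theorem 2.1, we show that there exists a cusp form $\tilde{f}(z)$ such that $\tilde{f}(z) \equiv \theta^{p-1}(f(z)) \pmod{p}$ for a prime $p \ge 5$.

Lemma 2.2. Suppose that $p \ge 5$ is a prime and
$$f(z) = \sum_{n=0}^{\infty} a(n)q^n \in M_{\lambda+\frac{1}{2}}(\Gamma_0(N),\chi) \cap \mathbb{Z}[[q]].$$
Then, there exists a cusp form $\tilde{f}(z) \in S_{\lambda+(p+1)(p-1)+\frac{1}{2}}(\Gamma_0(N),\chi) \cap \mathbb{Z}[[q]]$ such that $\tilde{f}(z) \equiv \theta^{p-1}(f(z)) \pmod{p}$.

Proof of Lemma 2.2. For $F(z) \in M_{k_1}(\Gamma_0(N),\chi_1)$ and $G(z) \in M_{k_2}(\Gamma_0(N),\chi_2)$, let
$$(2.3)\qquad [F(z),G(z)]_1 := k_2\,\theta(F(z)) \cdot G(z) - k_1\,F(z) \cdot \theta(G(z)).$$
This operator is referred to as a Rankin-Cohen 1-bracket, and it was proved in [4] that
$$[F(z),G(z)]_1 \in S_{k_1+k_2+2}(\Gamma_0(N),\chi_1\chi_2\chi'),$$
where $\chi' = 1$ if $k_1$ and $k_2 \in \mathbb{Z}$, $\chi'(d) = \left(\frac{-4}{d}\right)^{k_i}$ if $k_i \in \mathbb{Z}$ and $k_{3-i} \in \frac{1}{2} + \mathbb{Z}$, and $\chi'(d) = \left(\frac{-4}{d}\right)^{k_1+k_2}$ if $k_1$ and $k_2 \in \frac{1}{2} + \mathbb{Z}$. For even $k \ge 4$, let
$$E_k(z) := 1 - \frac{2k}{B_k}\sum_{n=1}^{\infty}\sum_{d \mid n} d^{k-1}q^n$$
be the usual normalized Eisenstein series of weight $k$. Here, the number $B_k$ denotes the $k$th Bernoulli number. The function $E_k(z)$ is a modular form of weight $k$ on $SL_2(\mathbb{Z})$, and
$$(2.4)\qquad E_{p-1}(z) \equiv 1 \pmod{p}$$
(see [6]). From (2.3) and (2.4), we have
$$[E_{p-1}(z),f(z)]_1 \equiv \theta(f(z)) \pmod{p}$$
and $[E_{p-1}(z),f(z)]_1 \in S_{\lambda+p+1+\frac{1}{2}}(\Gamma_0(N),\chi)$. Repeating this method $p-1$ times, we complete the proof. □

Using the following lemma, we can deal with the divisibility of $a_g(n)$ for positive integers $n$, $p \nmid n$, where $g(z) = \sum_{n=1}^{\infty} a_g(n)q^n \in S_{\lambda+\frac{1}{2}}(\Gamma_0(N),\chi) \cap \mathbb{Z}[[q]]$.

Lemma 2.3 (see Chapter 3 in [8]).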
Suppose that g(z) = n=1 ag(n)q n ∈ Sλ+ 1 (Γ0(N), χ) has coefficients in OK , the algebraic integers of some number field K. Furthermore, suppose that λ ≥ 1 and that m ⊂ OK is an ideal norm M . (1) Then, a positive proportion of the primes Q ≡ −1 (mod 4MN) has the property g(z)|T (Q2, λ, χ) ≡ 0 (mod m). (2) Then a positive proportion of the primes Q ≡ 1 (mod 4MN) has the property that g(z)|T (Q2, λ, χ) ≡ 2g(z) (mod m). We can now prove Theorem 1. Proof of Theorem 1. From Lemma 2.2, there exists a cusp form f̃(z) ∈ Sλ+(p+1)(p−1)+ 1 (Γ0(N), χ) ∩ Z[[q]] such that f̃(z) ≡ θp−1(f(z)) (mod p). Note that, for F (z) = n=0 aF (n)q n ∈ Mk+ 1 (Γ0(N), χ) and each prime Q ∤ N , the half-integral weight Hecke operator T (Q2, λ, χ) is defined as (2.5) F (z)|T (Q2, k, χ) aF (Q 2n) + χ∗(Q) Qk−1aF (n) + χ ∗(Q2)Q2k−1aF (n/Q DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES 7 where χ∗(n) := χ∗(n) (−1)k and aF (n/Q 2) = 0 if Q2 ∤ n. If F (z)|T (Q2, k, χ) ≡ 0 (mod p) for a prime Q ∤ N , then we have aF (Q 2 ·Qn) + χ∗(Q) Qk−1aF (Qn) + χ ∗(Q2)Q2k−1aF Qn/Q2 ≡ aF (Q3n) ≡ 0 (mod p) for every positive integer n such that gcd(Q, n) = 1. Thus, we have the following by Lemma 2.3-(1): ♯{ 1 ≤ n ≤ X | a(n) ≡ 0 (mod p) and gcd(p, n) = 1} ≫ X. We apply Theorem 2.1 with f̃(z). Then the purpose of the remaining part of the proof is to show the following: if gcd(p, 4N) = 1, an odd prime ℓ divides some ni, and (2.6) θp−1(f(z)) ≡ a(nim 2)qnim (mod p), then p|(ℓ− 1)ℓ(ℓ+ 1)N or ℓ | N . We assume that there exists a prime ℓ1 such that ℓ1|n1, p ∤ (ℓ1 − 1)ℓ1(ℓ1 + 1)N and ℓ | N . We also assume that nt = 1 and that ni ∤ n1 for every i, 2 ≤ i ≤ t − 1. Then, we can take a prime ℓi for each i, 2 ≤ i ≤ t − 1, such that ℓi|ni and ℓi ∤ n1. For convention, we define (−1)(n−1)2/8 if n is odd, 0 otherwise, and χQ(d) := for a prime Q. Let ψ(d) := i=2 χℓi(d). We take a prime β such that ψ(n1)χβ(n1) = −1. 
If we denote the ψ-twist of f̃(z) by f̃ψ(z) and the ψχβ-twist of f̃(z) by f̃ψχβ(z), then f̃ψχ2 (z)− f̃ψχβ(z) ≡ 2 gcd(m,β ℓj)=1 a(n1m 2)qn1m (mod p) and f̃ψχβ(z) ∈ Sλ+(p+1)(p−1)+ 1 (Γ0(Nα 2β2), χ) ∩ Z[[q]] (see Chapter 3 in [8]). Note that gcd(Nα2β2, p) = gcd(Nα2β2, ℓ1) = 1. Thus, (f̃ψ(z)− f̃ψχβ(z))|T (ℓ21, λ+ (p+ 1)(p− 1), χ) satisfies the formula (2.2) of Theorem 2.1 for both of ǫ = 1 and ǫ = −1. This results in a contradiction since (f̃ψ(z)− f̃ψχβ(z))|T (ℓ 1, λ+ (p+ 1)(p− 1), χ) 6≡ 0 (mod p) and p ≥ 5. Thus, we complete the proof. � 8 D. CHOI 3. Proofs of Theorem 2, 3, and 4 3.1. Proof of Theorem 2. Note that h(z) = η(z)2 η(2z) ·E4(4z) η(4z)6 is a meromorphic modular form. In [2] it was obtained a holomorphic modular form on Γ0(4p 2) whose Fourier coefficients generate traces of singular moduli modulo p (see the formula (3.1) and (3.2)). Since the level of this modular form is not relatively prime to p, we need the following proposition. Proposition 3.1 ([1]). Suppose that p ≥ 5 is a prime. Also, suppose that p ∤ N , j ≥ 1 is an integer, and g(z) = a(n)qn ∈ Sλ+ 1 (Γ0(Np j)) ∩ Z[[q]]. Then, there exists a cusp form G(z) ∈ Sλ′+ 1 (Γ0(N)) ∩ Z[[q]] such that G(z) ≡ g(z) (mod p), where λ′ + 1 = (λ+ 1 )pj + pe(p− 1) for a sufficiently large e ∈ N. Using Theorem 1 and Proposition 3.1, we give the proof of Theorem 2. Proof of Theorem 2. Let (3.1) h1,p(z) := h(z)− hχp(z), where hχp(z) is the χp-twist of h(z). From (1.2), we have h1,p(z) := −2 − 0<|startoftext|> Introduction and Statement of Main Results Serre obtained the p-adic limits of the integral Fourier coefficients of modular forms on SL2(Z) for p = 2, 3, 5, 7 (see Théorème 7 and Lemma 8 in [20]). In this paper, we extend the result of Serre to weakly holomorphic modular forms of half integral weight on Γ0(4N) forN = 1, 2, 4. The proof is based on linear relations among Fourier coefficients of modular forms of half integral weight. 
As applications of our main result, we obtain congruences for various modular objects, such as those for Borcherds exponents, for Fourier coefficients of quotients of Eisentein series and for Fourier coefficients of Siegel modular forms on the Maass Space. For odd d, let := γtΓ0(4N)tγ where γt = ( c d ) ∈ Γ(1) and γt(t) = ∞. We denote the q-expansion of a modular form f ∈Mλ+ 1 (Γ0(4N)) at each cusp t of Γ0(4N) by (1.1) (f |λ+ 1 γt)(z) = (cz + d) −λ− 1 az + b cz + d atf (n)q t , qt := q where (1.2) r(t) ∈ 2000 Mathematics Subject Classification. 11F11,11F33. Key words and phrases. modular forms, p-adic limit, Borcherds exponents, Maass space . This work was partially supported by KOSEF R01-2003-00011596-0 , ITRC and BRSI-POSTECH. http://arxiv.org/abs/0704.0013v2 2 D. CHOI AND Y. CHOIE When t ∼ ∞, we denote atf (n) by af (n). Note that the number r(t) is independent of the choice of f ∈Mλ+ 1 (Γ0(4N)) and λ. We call t a regular cusp if r(t) = 0 (see Chapter IV. §1. of [15] for a more general definition of a λ-regular cusp ). Remark 1.1. Our definition of a regular cusp is different from the usual one. Let U4N := {t1, · · · , tν(4N)} be the set of all inequivalent regular cusps of Γ0(4N). Note that the genus of Γ0(4N) is zero if and only if 1 ≤ N ≤ 4. LetMλ+ 1 (Γ0(4N)) be the space of weakly holomorphic modular forms of weight λ + 1 on Γ0(4N) and let M0λ+ 1 (Γ0(N)) denote the set of f(z) ∈ Mλ+ 1 (Γ0(N)) such that the constant term of its q-expansion at each cusp is zero. Let Up be the operator defined by (f |Up)(z) := af(pn)q Let OL be the ring of integers of a number field L with a prime ideal p ⊂ OL. For f(z) := af(n)q n and g(z) := ag(n)q n ∈ L[[q−1, q]] we write f(z) ≡ g(z) (mod p) if and only if af (n)− ag(n) ∈ p for every integer n. With these notations we state the following theorem. Theorem 1. For N = 1, 2, 4 consider f(z) := af (n)q n ∈ M0 (Γ0(4N)) ∩ L[[q−1, q]]. 
Suppose that p ⊂ OL is any prime ideal such that p|p, p prime, and that af(n) is p-integral for every integer n ≥ n0. (1) If p = 2 and af (0) = 0, then there exists a positive integer b such that (f |(Up)b)(z) ≡ 0 (mod pj) for each j ∈ N. (2) If p ≥ 3 and f(z) ∈ M0 (Γ0(4N)) with λ ≡ 2 or 2+ (mod p−1 ), then there exists a positive integer b such that (f |(Up)b)(z) ≡ 0 (mod pj) for each j ∈ N. Remark 1.2. The p-adic limit of a sum of Fourier coefficients of f ∈ M 3 (Γ0(4N)) was studied in [13]. Our method only allows to prove a weaker result if f(z) 6∈ M0 (Γ0(4N)). THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 3 Theorem 2. For N = 1, 2 or 4, let f(z) := af (n)q n ∈ Mλ+ 1 (Γ0(4N)) ∩ L[[q−1, q]]. Suppose that p ⊂ OL is any prime ideal with p|p, p prime, p ≥ 5, and that af (n) is p-integral for every integer n ≥ n0. If λ ≡ 2 or 2 + (mod p−1 ), then there exists a positive integer b0 such that p2b−m(p:λ) t∈U4N ∆4N,3−α(p:λ)(z) R4N (z) e·ω(4N) (0)atf (0) (mod p) for every positive integer b > b0 (see Section 3 for detailed notation ). Example 1.3. Recall that the generating function of the overpartition P̄ (n) of n(see [11]) P̄ (n)qn = η(2z) η(z)2 is in M− 1 (Γ0(16)), where η(z) := q n=1(1− qn). Therefore, theorem 2 implies that P̄ (52b) ≡ 1 (mod 5), ∀b ∈ N. 2. Applications: More Congruences In this section, we study congruences for various modular objects such as those for Borcherds exponents and for quotients of Eisenstein series. 2.1. p-adic Limits of Borcherds Exponents. Let MH denote the set of meromorphic modular forms of integral weight on SL2(Z) with Heegner divisor, integer coefficients and leading coefficient 1. Let (Γ0(4)) := {f(z) = af(n)q n ∈ M 1 (Γ0(4)) | a(n) = 0 for n ≡ 2, 3 (mod 4)}. If f(z) = af(n)q n ∈ M+1 (Γ0(4)), then define Ψ(f(z)) by Ψ(f(z)) := q−h (1− qn)af (n2), where h = − 1 af(0) + 10, let rn(u) := ♯{(s1, · · · , sn) ∈ Zn : s21 + · · ·+ s2n = u}. Theorem 4. Suppose that p ≥ 5 is a prime. 
If λ ≡ 2 or 3 (mod p−1 ), then there exists a positive integer C0 such that r2λ+1 p2b−m(p:λ) ≡ − (14− 4α (p : λ)) + 16 )[ λp−1 ]+α(p:λ)m(p:λ) (mod p), for every b > C0. Remark 2.2. As for an example, if λ ≡ 2 (mod p− 1) and p is an odd prime, then there exists a positive integer C0 such that r2λ+1 ≡ 10 (mod p), ∀b > C0 2.3. Quotients of Eisenstein Series. Congruences for the coefficients of quotients of elliptic Eisenstein series have been studied in [3]. Let us consider the Cohen Eisenstein series Hr+ 1 (z) := N=0H(r,N)q n of weight r+ 1 , r ≥ 2 (see [7]). We derive congruences for the coefficients of quotients of Hr+ 1 (z) and Eisenstein series. THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 5 Theorem 5. Let F (z) := E4(z) aF (n)q G(z) := E6(z) aG(n)q W (z) := E6(z) aW (n)q Then there exists a positive integer C0 such that aF (11 2b+1) ≡ 1 (mod 11), aG(11 2b+1) ≡ 6 (mod 11), aW (11 2b+1) ≡ 2 (mod 11) for every integer b > C0. 2.4. The Maass Space. Next we deal with congruences for the Fourier coefficients of a Siegel modular form in the Maass space. To define the Maass space, let us introduce notations given in [17]: let T ∈ M2g(Q) be a rational, half-integral, symmetric, non- degenerate matrix of size 2g with discriminant DT := (−1)g det(2T ). Let DT = DT,0f T , where DT,0 is the corresponding fundamental discriminant. Further- more, let G8 :=  2 0 −1 0 0 0 0 0 0 2 0 −1 0 0 0 0 −1 0 2 −1 0 0 0 0 0 −1 −1 2 −1 0 0 0 0 0 0 −1 2 −1 0 0 0 0 0 0 −1 2 −1 0 0 0 0 0 0 −1 2 −1 0 0 0 0 0 0 −1 2  and G7 be the upper (7, 7)-submatrix of G8. Define Sg := (g−1)/8 2, if g ≡ 1 (mod 8), (g−7)/8 G7, if g ≡ −1 (mod 8). 6 D. CHOI AND Y. 
CHOIE For each m ∈ N such that (−1)gm ≡ 0, 1 (mod 4), define a rational, half-integral, sym- metric, positive definite matrix Tm of size 2g by Tm :=   0 m/4 , if m ≡ 0 (mod 4), e2g−1 e′2g−1 [m+ 2 + (−1)n]/4 , if m ≡ (−1)g (mod 4) Here e2g−1 ∈ Z(2n−1,1) is the standard column vector and e′2g−1 is its transpose. Definition 2.3. (The Maass Space) Take g, k ∈ N such that g ≡ 0, 1 (mod 4) and g ≡ k (mod 2). Let SMaassk+g (Γ2g) F (Z) = A(T )qtr(TZ) ∈ Sk+g(Γ2g) ∣∣∣∣∣∣ A(T ) = ak−1φ(a;T )A(T|DT |/a2) (see (6.2) for details). This space is called the Maass space of genus 2g and weight g + k. In [17] it was proved that the Maass space is the same as the image of the Ikeda lifting when g ≡ 0, 1 (mod 4). Using this fact together with Theorem 1, we derive the following congruences for the Fourier coefficients of F (Z) in SMaassk+g (Γ2g). Theorem 6. For g ≡ 0, 1 (mod 4), let F (Z) := A(T )qtr(TZ) ∈ SMaassk+g (Γ2g) with integral coefficients A(T ), T > 0. If k ≡ 2 or 3 (mod p−1 ) for some prime p, then, for each j ∈ N, there exists a positive integer b for which A(T ) ≡ 0 (mod pj) for every T > 0, det(2T ) ≡ 0 (mod pb). This paper is organized as follows. Section 3 gives a linear relation among Fourier coefficients of modular forms of half integral weight. The remaining sections contain detailed proofs of the main theorems. 3. Linear Relation among Fourier Coefficients of modular forms of Half Integral Weight Let V (N ; k, n) be the subspace of Cn generated by the first n coefficients of the q- expansion of f at ∞ for f ∈ Sk(Γ0(N)), where Sk(Γ0(N)) denotes the space of cusp forms of weight k ∈ Z on Γ0(N). Let L(N ; k, n) be the orthogonal complement of V (N ; k, n) THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 7 in Cn with the usual inner product of Cn. The vector space L(1; k, d(k) + 1), d(k) = dim(Sk(Γ(1))), was studied by Siegel to evaluate the value of the Dedekind zeta function at a certain point. 
The vector space L(1; k, n) is explicitly described in terms of the principal part of negative weight modular forms in [9]. These results were extended in [8] to the groups Γ0(N) of genus zero. For 1 ≤ N ≤ 4, let 4N, λ+ at1f (0), · · · , a tν(4N) f (0), af(1), · · · , af(n) ∈ Cn+ν(4n) ∣∣∣ f ∈Mλ+ 1 (Γ0(4N)) where U4N := {t1, · · · , tν(4N)} is the set of all inequivalent regular cusps of Γ0(4N). We define EL(4N, λ+ 1 ;n) to be the orthogonal complement of EV (4N, λ+ 1 ;n) in Cn+ν(4N). Let ∆4N,λ := q δλ(4N)+O(qδλ(4N)+1) be inMλ+ 1 (Γ0(4N) with the maximum order at ∞, that is, its order at ∞ is bigger than that of any other modular form of the same level and weight. Furthermore, let R4(z) := η(4z)8 η(2z)4 , R8(z) := η(8z)8 η(4z)4 R12(z) := η(12z)12η(2z)2 η(6z)6η(4z)4 and R16(z) := η(16z)8 η(8z)4 For ℓ, n ∈ N, define m(ℓ : n) := ≡ 0 (mod 2) ≡ 1 (mod 2) α(ℓ : n) := n− ℓ− 1 Let ω(4N) be the order of zero of R4N (z) at ∞. Note that R4N (z) ∈ M2(Γ0(4N)) has its only zero at ∞. So, using the definition of η(z) = q 124 n=1(1− qn), we find that (3.1) ω(4) = 1, ω(8) = 2, ω(12) = 4, ω(16) = 4. For each g ∈Mr+ 1 (Γ0(4N)) and e ∈ N, let (3.2) R4N (z)e e·ω(4N)∑ b(4N, e, g; ν)q−ν +O(1) at ∞. With these notations we state the following theorem: Theorem 3.1. Suppose that λ ≥ 0 is an integer and 1 ≤ N ≤ 4. For each e ∈ N such that e ≥ λ − 1, take r = 2e − λ + 1. The linear map Φr,e(4N) : Mr+ 1 (Γ0(4N)) → 8 D. CHOI AND Y. CHOIE EL(4N, λ+ 1 ; e · ω(4N)), defined by Φr,e(4N)(g) R4N (z) (0), · · · , htν(4N)a tν(4N) R4N (z) (0), b(4N, e, g; 1), · · · , b(4N, e, g; e · ω(4N)) is an isomorphism. Proof of Theorem 3.1. Suppose that G(z) is a meromorphic modular form of weight 2 on Γ0(4N). For τ ∈ H∪C4N , let Dτ be the image of τ under the canonical map from H∪C4N to a compact Riemann surface X0(4N). Here H is the usual complex upper half plane, and C4N denotes the set of all inequivalent cusps of Γ0(4N). 
The residue ResDτGdz of G(z) at Dτ ∈ X0(4N) is well-defined since we have a canonical correspondence between a meromorphic modular form of weight 2 on Γ0(4N) and a meromorphic 1-form of X0(4N). If ResτG denotes the residue of G at τ on H, then ResDτGdz = ResτG. Here lτ is the order of the isotropy group at τ . The residue of G at each cusp t ∈ C4N is (3.3) ResDtGdz = ht · atG(0) Now we give a proof of Theorem 3.1. To prove Theorem 3.1, take G(z) = R4N (z)e f(z), where g ∈Mr+ 1 (Γ0(4N)) and f(z) = n=1 af(n)q n ∈Mλ+ 1 (Γ0(4N)). Note that G(z) is holomorphic on H. Since g(z), R4N (z) and f(z) are holomorphic and R4N (z) has no zero on H, it is enough to compute the residues of G(z) only at all inequivalent cusps to apply the Residue Theorem. The q-expansion of R4N (z) ef(z) at ∞ is R4N(z)e f(z) = e·ω(4N)∑ b(4N, e, g; ν)q−ν + a g(z) R4N (z) (0) +O(q) af(n)q Since R4N (z) has no zero at t ≁ ∞, we have R4N (z)e γt = a R4N (z) (0)af(0) +O(qt). Further note that, for an irregular cusp t, at g(z) R4N (z) (0)af(0) = 0. THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 9 So the Residue Theorem and (3.3) imply that (3.4) t∈U4N e·ω(4N) (0)atf(0) + e·ω(4N)∑ b(4N, e, g; ν)af(ν) = 0. This shows that Φr,e(4N) is well-defined. The linearity of the map Φr,e(4N) is clear. It remains to check that Φr,e(4N) is an isomorphism. Since there exists no holomorphic modular form of negative weight except the zero function, we obtain the injectivity of Φr,e(4N). Note that for e ≥ λ−12 , 4N ;λ+ , e · ω(4N) = e · ω(4N) + ν(4N)− dimC Mλ+ 1 (Γ0(4N)) However, the set C4N , 1 ≤ N ≤ 4, of all inequivalent cusps of Γ0(4N) are ∞, 0, 1 ∞, 0, 1 C12 = ∞, 0, 1 C16 = ∞, 0, 1 and it can be checked that (3.5) ν(4) = 2, ν(8) = 3, ν(12) = 4, ν(16) = 6 (see §1 of Chapter 4. in [15] for details). The dimension formula of Mλ+ 1 (Γ0(4N)) (see Table 1) together with the results in (3.1) and (3.5), implies that 4N, λ+ ; e · ω(N) = dimC(Mr+ 1 (Γ0(4N))) since r = 2e− λ+ 1. Table 1. 
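Dimension Formula for $M_k(\Gamma_0(4N))$

N        k = 2n + 1/2    k = 2n + 3/2    k = 2n
N = 1    n + 1           n + 1           n + 1
N = 2    2n + 1          2n + 2          2n + 1
N = 3    4n + 1          4n + 3          4n + 1
N = 4    4n + 2          4n + 4          4n + 1

Since $\Phi_{r,e}(4N)$ is injective and the two dimensions agree, $\Phi_{r,e}(4N)$ is surjective. This completes our claim.

4. Proofs of Theorem 1 and 2

4.1. Proof of Theorem 1. First, we obtain linear relations among Fourier coefficients of modular forms of half integral weight modulo $\mathfrak{p}$. Let
$$O_{\mathfrak{p}} := \{\alpha \in L \mid \alpha \text{ is } \mathfrak{p}\text{-integral}\},$$
$$\widetilde{M}_{\lambda+\frac{1}{2},\mathfrak{p}}(\Gamma_0(4N)) := \Big\{H(z) = \sum a_H(n)q^n \in O_{\mathfrak{p}}/\mathfrak{p}O_{\mathfrak{p}}[[q^{-1},q]] \;\Big|\; H \equiv h \pmod{\mathfrak{p}} \text{ for some } h \in O_{\mathfrak{p}}[[q^{-1},q]] \cap M_{\lambda+\frac{1}{2}}(\Gamma_0(4N))\Big\},$$
$$\widetilde{S}_{\lambda+\frac{1}{2},\mathfrak{p}}(\Gamma_0(4N)) := \Big\{H(z) = \sum a_H(n)q^n \in O_{\mathfrak{p}}/\mathfrak{p}O_{\mathfrak{p}}[[q^{-1},q]] \;\Big|\; H \equiv h \pmod{\mathfrak{p}} \text{ for some } h \in O_{\mathfrak{p}}[[q^{-1},q]] \cap S_{\lambda+\frac{1}{2}}(\Gamma_0(4N))\Big\}.$$
The following lemma gives the dimension of $\widetilde{M}_{\lambda+\frac{1}{2},\mathfrak{p}}(\Gamma_0(4N))$.

Lemma 4.1. Take $\lambda \in \mathbb{N}$, $1 \le N \le 4$ and a prime $p$ such that
$$p \ge 3 \text{ if } N = 1, 2, 4, \qquad p \ge 5 \text{ if } N = 3.$$
Now take any prime ideal $\mathfrak{p} \subset O_L$ with $\mathfrak{p} \mid p$. Then
$$\dim \widetilde{M}_{\lambda+\frac{1}{2},\mathfrak{p}}(\Gamma_0(4N)) = \dim M_{\lambda+\frac{1}{2}}(\Gamma_0(4N)) \quad\text{and}\quad \dim \widetilde{S}_{\lambda+\frac{1}{2},\mathfrak{p}}(\Gamma_0(4N)) = \dim S_{\lambda+\frac{1}{2}}(\Gamma_0(4N)).$$

Proof. Let $j_{4N}(z) = q^{-1} + O(q)$ be a meromorphic modular function with a pole only at $\infty$. Explicitly, these functions are
$$j_4(z) = \frac{\eta(z)^8}{\eta(4z)^8} + 8, \qquad j_8(z) = \frac{\eta(4z)^{12}}{\eta(2z)^4\eta(8z)^8},$$
$$j_{12}(z) = \frac{\eta(4z)^4\eta(6z)^2}{\eta(2z)^2\eta(12z)^4}, \qquad j_{16}(z) = \frac{\eta(z)^2\eta(8z)}{\eta(2z)\eta(16z)^2}.$$
Since the Fourier coefficients of $\eta(z)$ and $\frac{1}{\eta(z)}$ are integral, the $q$-expansion of $j_{4N}(z)$ has integral coefficients. Recall that $\Delta_{4N,\lambda} = q^{\delta_\lambda(4N)} + O(q^{\delta_\lambda(4N)+1})$ is the modular form of weight $\lambda+\frac{1}{2}$ on $\Gamma_0(4N)$ such that the order of its zero at $\infty$ is higher than that of any other modular form of the same level and weight. Denote the order of zero of $\Delta_{4N,\lambda}$ at $\infty$ by $\delta_\lambda(4N)$. Then a basis of $M_{\lambda+\frac{1}{2}}(\Gamma_0(4N))$ can be chosen as
$$(4.1)\qquad \{\Delta_{4N,\lambda}(z)\,j_{4N}(z)^e \mid 0 \le e \le \delta_\lambda(4N)\}.$$
If $\Delta_{4N,\lambda}(z)$ is $\mathfrak{p}$-integral, then $\{\Delta_{4N,\lambda}(z)\,j_{4N}(z)^e \mid 0 \le e \le \delta_\lambda(4N)\}$ also forms a basis of $\widetilde{M}_{\lambda+\frac{1}{2},\mathfrak{p}}(\Gamma_0(4N))$. Note that $\delta_\lambda(4N) = \dim M_{\lambda+\frac{1}{2}}(\Gamma_0(4N)) - 1$. So from Table 1 we have
$$(4.2)\qquad \Delta_{4N,\lambda}(z) = \Delta_{4N,j}(z)\,R_{4N}(z)^{\frac{\lambda-j}{2}},$$
where $\lambda \equiv j \pmod{2}$, $j \in \{0,1\}$.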
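The bookkeeping behind (4.2) can be checked directly. The following is only an illustrative verification, assuming (as stated earlier in the text) that $R_{4N}$ has weight 2 with its only zero at $\infty$ of order $\omega(4N)$, and that $\Delta_{4N,j}$ has weight $j+\frac{1}{2}$ and order $\delta_j(4N)$ at $\infty$; the operators $\operatorname{wt}$ and $\operatorname{ord}_\infty$ are shorthand introduced here, not notation from the paper.

```latex
\begin{align*}
\operatorname{wt}\!\Big(\Delta_{4N,j}\,R_{4N}^{\frac{\lambda-j}{2}}\Big)
  &= \Big(j+\tfrac{1}{2}\Big) + 2\cdot\frac{\lambda-j}{2}
   = \lambda+\tfrac{1}{2},\\[2pt]
\operatorname{ord}_\infty\!\Big(\Delta_{4N,j}\,R_{4N}^{\frac{\lambda-j}{2}}\Big)
  &= \delta_j(4N) + \frac{\lambda-j}{2}\,\omega(4N).
\end{align*}
```

For instance, with $N = 1$ and $\lambda = 2$ (so $j = 0$, $\delta_0(4) = 0$ because $\Delta_{4,0} = \theta$, and $\omega(4) = 1$), the order is $0 + 1\cdot 1 = 1$, which agrees with $\dim M_{\frac{5}{2}}(\Gamma_0(4)) - 1 = 2 - 1$.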
More precisely, one can choose ∆4N,j(z) as followings: ∆4,0(z) = θ(z), ∆4,1(z) = θ(z) ∆8,0(z) = θ(z), ∆8,1(z) = (θ(z)3 − θ(z)θ(2z)2) , ∆12,0(z) = θ(z), ∆12,1(z) = x,y,z∈Z q 3x2+2(y2+z2+yz) − x,y,z∈Z q 3x2+4y2+4z2+4yz ∆16,0(z) = (θ(z)− θ(4z)) , ∆16,1(z) = 18 (θ(z) 3 − 3θ(z)2θ(4z) + 3θ(z)θ(4z)2 − θ(4z)3) . Since θ(z) = 1+ 2 n=1 q n, the coefficients of the q-expansion of ∆4N,j(z), j ∈ {0, 1}, are p-integral. This completes the proof. � Remark 4.2. The proof of Lemma 4.1 implies that the spaces of Mλ+ 1 (Γ0(4N)) for N = 1, 2, 4 are generated by eta-quotients since θ(z) = η(2z)5 η(z)2η(4z)2 For 1 ≤ N ≤ 4 set 4N, λ+ (af(1), · · · , af(n)) ∈ Fnp | f ∈ S̃λ+ 1 (Γ0(4N)) ,Fp := Op/pOp. We define L̃S(4N, λ + ;n) to be the orthogonal complement of ṼS(4N, λ + ;n) in Fn Using Lemma 4.1, we obtain the following proposition. Proposition 4.3. Suppose that λ is a positive integer and 1 ≤ N ≤ 4. For each e ∈ N, e ≥ λ − 1, take r = 2e−λ+1. The linear map ψ̃r,e(4N) : M̃r+ 1 ,p(Γ0(4N)) → L̃S(4N, λ+ ; e · ω(4N)), defined by ψ̃r,e(4N)(g) = (b(4N, e, g; 1), · · · , b(N, e, g; e · ω(4N))) , is an isomorphism. Here b(4N, e, g; ν) is defined in (3.2). Proof. Note that dimS 3 (4N) = 0 and that dimSλ+ 1 (4N) +N + 1 + = dimMλ+ 1 (see [10]). So, from Lemma 4.1 and Table 1, it is enough to show that ψr,e(4N) is injective. If g is in the kernel of ψr,e(4N), then R4N (z) e · R4N (z)e ≡ 0 (mod p) by Sturm’s formula (see [21]). So we have g(z) ≡ 0 (mod p) since R4N(z)e 6≡ 0 (mod p). This completes the proof. � 12 D. CHOI AND Y. CHOIE Theorem 4.4. Take a prime p,N = 1, 2, 4 and f(z) := af(n)q n ∈ Sλ+ 1 (Γ0(4N)) ∩ L[[q]]. Suppose that p ⊂ OL is any prime ideal with p|p and that af (n) is p-integral for every integer n ≥ n0. If λ ≡ 2 or 2 + (mod p−1 ) or p = 2, then there exists a positive integer b such that ≡ 0 (mod p), ∀n ∈ N. Proof of Theorem 4.4. i) First, suppose that p ≥ 3: Take positive integers ℓ and b such (4.3) 3− 2α(p : λ) p2b + pm(p:λ) + ℓ(p− 1) = 2. 
Note that if b is large enough, that is, b > logp 3−2α(p:λ) pm(p:λ) − 2 , then there exists a positive integer ℓ satisfying (4.3). Also note that atf(0) = 0 for every cusp t of Γ0(4N) since f(z) is a cusp form. So, if r = 2e− α(p : λ) + 1, then Theorem 3.1 implies that, for g(z) ∈ M̃r+ 1 (Γ0(4N)), e·ω(4N)∑ b(4N, e, g; ν)af(νp 2b−m(p:λ)) ≡ 0 (mod p), since R4N (z)e f(z)p m(p:λ) Eℓp−1(z) e·ω(4N)∑ b(4N, e, g; ν)q−νp + a g(z) R4N (z) (0) + a g(z) R4N (z) (n)qnp af(n)q npm(p:λ) (mod p). So Proposition 4.3 implies that p2b−m(p:λ) 2p2b−m(p:λ) , · · · , a e · ω(4N)p2b−m(p:λ) ∈ ṼS 4N,α(p : λ) + 1 If α(p : λ) = 2 or 2 + , then dimSα(p:λ)+ 1 (Γ0(4N)) = dim ṼS 4N,α(p : λ) + THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 13 ii) p = 2: Note that ∆4N,1(z) R4N (z) = q−1+O(1) for N = 1, 2, 4. So, there exists a polynomial F (X) ∈ Z[X ] such that F (j4N(z)) ∆4N,1(z) R4N (z) = q−n +O(1). For an integer b, 22 > λ+ 2, let G(z) := F (j4N(z)) ∆4N,1(z) R4N(z) f(z)θ(z)2 1+2b−2λ+3. Since θ(z) ≡ 1 (mod 2), Theorem 3.1 implies that af(2b · n) ≡ 0 (mod p). � To apply Theorem 4.4, we need the following two propositions. Proposition 4.5 (Proposition 3.2 in [22]). Suppose that p is an odd prime, k and N are integers with (N, p) = 1. Let f(z) = a(n)qn ∈ Mλ+ 1 (Γ0(4N)). Suppose that ξ := cp2 d , with ac > 0. Then there exist n0, h0 ∈ N with h0|N, a sequence {a0(n)}n≥n0 and r0 ∈ {0, 1, 2, 3} such that (f |Upm|λ+ 1 ξ)(z) = 4n+r0≡0 (mod p a0(n)q 4n+r0 m , ∀m ≥ 1. Proposition 4.6 (Proposition 5.1 in [1]). Suppose that p is an odd prime such that p ∤ N and consider g(z) = a(n)qn ∈ Sλ+ 1 (Γ0(4Np j)) ∩ L[[q]], for each j ∈ N. Suppose further that p ⊂ OL is any prime ideal with p|p and that a(n) is p-integral for every integer n ≥ 1. Then there exists G(z) ∈ Sλ′+ 1 (Γ0(4N)) ∩OL[[q]] such that G(z) ≡ g(z) (mod p), where λ′ + 1 = (λ+ 1 )pj + pe(p− 1) with eN large. Remark 4.7. Proposition 4.6 was proved for p ≥ 5 in [1]. One can check that this holds also for p = 3. Now we prove Theorem 1. 
Proof of Theorem 1. Take Gp(z) := η(8z)48 η(16z)24 ∈M12(Γ0(16)) if p = 2, η(z)27 η(9z)3 ∈M12(Γ0(9)) if p = 3, η(4z)p η(4p2z) ∈M p2−1 (Γ0(p 2)) if p ≥ 5. 14 D. CHOI AND Y. CHOIE Using properties of eta-quotients (see [12]), note that Gp(z) vanishes at every cusp of Γ0(16) except ∞ if p = 2, and vanishes at every cusp ac of Γ0(4Np 2) with p2 ∤ N if p ≥ 3. Thus, Proposition 4.5 implies that there exist positive integers ℓ,m, k such that (f |Upm)(z)Gp(z)ℓ ∈ Sk+ 1 (Γ0(16)) if p = 2, (f |Upm)(z)Gp(z)ℓ ∈ Sk+ 1 (Γ0(4p 2N)) if p ≥ 3. Note that k ≡ λ (mod p− 1). Using Proposition 4.6, we can find F (z) ∈ Sk′+ 1 (Γ0(4N)) such that F (z) ≡ (f(z)|Upm)Gp(z)ℓ ≡ (f |Upm)(z) (mod p) and k′ ≡ k (mod p − 1). Theorem 4.4 implies that there exists a positive integer b such that (F |Up2b)(z) ≡ 0 (mod p). Thus, we have shown so far that if ρ ∈ p \ p2, all the Fourier coefficients of · F (z)|Upm+2b are p-integral. Repeat this argument to complete our claim. � 4.2. Proof of Theorem 2. Theorem 2 can be derived from Theorem 3.1 by taking a special modular form. Proof of Theorem 2. Take a positive integer ℓ and a positive even integer u such that 3− 2α(p : λ) pm(p:λ) + ℓ(p− 1) = 2. Let F (z) := ∆4N,3−α(p:λ)(z) R4N (z) and G(z) := Ep−1(z) ℓf(z)p m(p:λ) . Since Ep−1(z) ≡ 1 (mod p), we have F (z)G(z) ≡ a∆4N,3−α(p:λ)(z) R4N (z) (n)qnp af(n)q nm(p:λ) (mod p). If Fourier coefficients of f(z) at each cusp are p-integral, then ((F ·G)|2γt) (z) ≡ atF (n)q atG(n)q atf (n)q at∆4N,3−α(p:λ)(z) R4N (z) (mod p) for t ≁ ∞. Since aF (z)G(z)(0) ≡ a∆4N,3−α(p:λ)(z) R4N (z) (0)af (0) + af (p u−m(p:λ)) (mod p) , F (z)G(z) (0) ≡ at∆4N,3−α(p:λ)(z) R4N (z) (0)atf (0) (mod p) for t ≁ ∞, for large u, the Residue Theorem implies Theorem 2 by letting u = 2b. Therefore it is enough to check a p-integral property of Fourier coefficients of f(z) at each cusp: take a positive integer e such that ∆(z)ef(z) is a holomorphic modular form, where THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 15 ∆(z) := q n=1(1− qn)24. 
Note that the q-expansions of j4N (z) and ∆4N,12e+λ(z) at each cusp are p-integral. Thus (4.1) implies that ∆(z)ef(z) = δ12e+λ(4N)∑ cnj4N (z) n∆4N,12e+λ(z). Moreover, cn is p-integral since j4N (z) n∆4N,12e+λ(z) = q δ12e+λ(4N)−n +O qδ12e+λ(4N)−n+1 and f(z) ∈ OL[[q, q−1]]. Note that p ∤ 4N since 1 ≤ N ≤ 4 and p ≥ 5 is a prime. So Fourier coefficients of j4N (z), ∆N,12e+λ(z) and at each cusp are p-integral. This completes our claim. � 5. Proof of Theorem 3 Theorem 3 follows from Theorem 1 and Theorem 2.1. Proof of Theorem 3. Note that j(z) ∈ MH . Let g(z) := Ψ−1(j(z)) and f(z) := Ψ−1(F (z)) = af (n)q It is known (see §14 in [4]) that g(z) = (θ(z))E10(4z) 4πi∆(4z) θ(z) d (E10(4z)) 80πi∆(4z) θ(z). Since the constant terms of the q-expansions at ∞ of f(z), θ(z) and g(z) are 0, a0 (0) = and a0g(0) = · 456 , respectively, we have f(z)− kθ(z)− a0f(0) + k(1− i)/2 a0g(0) g(z) ∈ M01 (Γ0(4)). Applying Theorem 1, one obtains the result. � 6. Proofs of Theorem 4 and 5 We begin with the following proposition. Proposition 6.1. Let p be an odd prime and f(z) := af (n)q n ∈Mλ+ 1 (Γ0(4)) ∩ Zp[[q]]. If λ ≡ 2 or 3 (mod p−1 ), then p2b−m(p:λ) ≡ −(14 − 4α(p : λ))af(0) + 28 2−1 − 2−1i )pb(7−2α(p:λ)) a0f (0) (mod p) 16 D. CHOI AND Y. CHOIE for every integer b > logp 2α(p:λ)−3 pm(p:λ) + 2 Proof of Proposition 6.1. For ν ∈ Z≥0, pm(p:λ) := ν · (p− 1) + α(p : λ) + 1 For an integer b with 3− 2α(p : λ) pm(p:λ) − 2 there exists an ℓ ∈ N such that 3− 2α(p : λ) p2b + pm(p:λ) + ℓ(p− 1) = 2, since 3− 2α(p : λ) p2b + pm(p:λ) − 2 = 3− 2α(p : λ) (p2b − 1) + ν(p− 1). We have F (z) ≡ n=0 af (n)q npm(p:λ) (mod p), G(z) ≡ q−pb + 14− 4α(p : λ) + aG(1)q + · · · (mod p). Note that aG(n) is p-integral for every integer n. Moreover, we obtain F (z)G(z)|2 ( 0 −11 0 ) ≡ a0f (0) + · · · −26pb )pb(7−2α(p:λ)) + · · · (mod p), where a0f (0) is given in (1.1). Note that ∞, 0, 1 is the set of cusps of Γ0(4), so Theorem 2 implies that af (p 2b−m(p:n)) + (14− 4α(p : λ))af(0)− 28a0f (0) )pb(7−2α(p:λ)) ≡ 0 (mod p). 
This proves Proposition 6.1. � 6.1. Proof of Theorem 4. Now we prove Theorem 4. Proof of Theorem 4. Take f(z) := θ2λ+1(z) = 1 + r2λ+1(ℓ)q af(n)q Note that f(z) ∈Mλ+ 1 (Γ0(4)). Since (θ| 1 ( 0 −11 0 ))(z) = , we obtain af(0) = 1 and a f (0) = )2λ+1 THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 17 Since λ ≡ 2, 3 (mod p−1 ) and , we have )p2u(7−2α(p:λ)) a0f (0) pm(p:λ) )p2u(7−2α(p:λ)) ( )pm(p:λ)(2α(p:λ)+(p−1)(2[ λp−1 ]+m(p:λ))+1) )(7−2α(p:λ))(p2u−1)( )8+2(p−1)[ λp−1 ]+m(p:λ)pm(p:λ)(p−1)+(pm(p:λ)−1)(1+2α(p:λ)) )8+2[ λp−1 ](p−1)+2α(p:λ)(pm(p:λ)−1) )[ λp−1 ]+α(p:λ)m(p:λ) (mod p), for some u ∈ N. Applying Proposition 6.1, we obtain the result. � 6.2. Proof of Theorem 5. Consider the Cohen Eisenstein seriesHr+ 1 (z) := N=0H(r,N)q of weight r + 1 , where r ≥ 2 is an integer. If (−1)rN ≡ 0, 1 (mod 4), then H(r,N) = 0. If N = 0, then H(r, 0) = −B2r . If N is a positive integer and Df 2 = (−1)rN , where D is a fundamental discriminant, then (6.1) H(r,N) = L(1− r, χD) µ(d)χD(d)d r−1σ2r−1(f/d). Here µ(d) is the Möbius function. The following theorem implies that the Fourier coeffi- cients of Hr+ 1 (z) are p-integral if p−1 Theorem 6.2 ([6]). Let D be a fundamental discriminant. If D is divisible by at least two different primes, then L(1−n, χD) is an integer for every positive integer n. If D = p, p > 2, then L(1−n, χD) is an integer for every positive integer n unless gcd(p, 1−χD(g)gn) 6= 1, where g is a primitive root (mod p). Proof of Theorem 5. Note that E10(z) = E4(z)E6(z). So, E10(z)F (z), E10(z)G(z) and E10(z)W (z) are modular forms of weights, 8 · 12 , 7 · and 8 · 1 respectively. Moreover, the Fourier coefficients of those modular forms are 11-integral, since the Fourier coefficients of H 5 (z), H 7 (z) and H 9 (z) are 11-integral by Theorem 6.2. We have E10(z)F (z) = +O(q), E10(z)F (z)| 17 ( 0 −11 0 ) = (1 + i)(2i)−5 +O E10(z)G(z) = +O(q), E10(z)G(z)| 15 ( 0 −11 0 ) = (1− i)(2i)−7 +O E10(z)W (z) = +O(q), E10(z)W (z)| 17 ( 0 −11 0 ) = (1 + i)(2i)−9 +O 18 D. 
CHOI AND Y. CHOIE where B2r is the 2rth Bernoulli number. The conclusion now follows from Proposition 6.1. � 6.3. Proof of Theorem 6. We begin by introducing some notations (see [17]). Let V := (F2np , Q) be the quadratic space over Fp, where Q is the quadratic form obtained from a quadratic form x 7→ T [x](x ∈ Z2np ) by reducing modulo p. We denote by < x, y >:= Q(x, y)−Q(x)−Q(y), x, y ∈ F2np , the associated bilinear form and let R(V ) := {x ∈ F2np : < x, y >= 0, ∀y ∈ F2np , Q(x) = 0} be the radical of R(V ). Following [14], define a polynomial Hn,p(T ;X) := 1 if sp = 0,∏[(sp−1)/2] j=1 (1− p2j−1X2) if sp > 0, sp odd, (1 + λp(T )p (sp−1)/2X) ∏[(sp−1)/2] j=1 (1− p2j−1X2) if sp > 0, sp even, where for even sp we denote λp(T ) := 1 if W is a hyperbolic space or sp = 2n, −1 otherwise. Following [16], for a nonnegative integer µ, define ρT (p µ) by ρT (p µ)Xµ := (1−X2)Hn,p(T ;X), if p|fT , 1 otherwise. We extend the functions ρT multiplicatively to natural numbers N by defining ρT (p µ)X−µ := ((1−X2)Hn,p(T ;X)). D(T ) := GL2n(Z) \ {G ∈M2n(Z) ∩GL2n(Q) : T [G−1] half-integral}, where GL2n(Z) operates by left-multiplication and T [G −1] = T ′G−1T . Then D(T ) is finite. For a ∈ N with a|fT , let (6.2) φ(a;T ) := G∈D(T ),|det(G)|=d ρT [G−1](a/d Note that φ(a;T ) ∈ Z for all a. With these notations we state the following theorem: Theorem 6.3 ([17]). Suppose that g ≡ 0, 1 (mod 4) and let k ∈ N with g ≡ k (mod 2). A Siegel modular form F is in SMaassk+n (Γ2g) if and only if there exists a modular form f(z) = c(n)qn ∈ Sk+ 1 (Γ0(4)) THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 19 such that A(T ) = ak−1φ(a;T )c |DT | for all T . Here, DT := (−1)g · det(2T ) and DT = DT,0f T with DT,0 the corresponding fundamental discriminant and fT ∈ N. Remark 6.4. A proof of Theorem 6.3 given in [17] implies that if A(T ) ∈ Z for all T , then c(m) ∈ Z for all m ∈ N. Proof of Theorem 6. 
From Theorem 6.3 we can take f(z) = c(n)qn ∈ Sk+ 1 (Γ0(4)) ∩ Zp[[q]] such that F (Z) = A(T )qtr(TZ) = ak−1φ(a;T )c |DT | qtr(TZ). By Theorem 1, there exists a positive integer b such that, for every positive integer m, c(pbm) ≡ 0 (mod pj), since k ≡ 2 or 3 (mod p−1 ). Suppose that pb+2j ||DT |. If pj|a and a|fT , then ak−1φ(a;T )c |DT | ≡ 0 (mod pj). If pj ∤ a and a|fT , then pb ∣∣∣ |DT |a2 and a k−1φ(a;T )c |DT | ≡ 0 (mod pj). � Acknowledgement We thank the referee for many helpful comments which have improved our exposition. References [1] S. Ahlgren and M. Boylan Central Critical Values of Modular L-functions and Coeffients of Half Integral Weight Modular Forms Modulo ℓ, Amer. J. Math. 129 (2007), no. 2, 429–454. [2] A. Balog, H. Darmon, K. Ono, Congruences for Fourier coefficients of half-integer weight modu- lar forms and special values of L-functions, Analytic Number Theory, 105–128. Progr. Math. 138 Birkhauser, 1996. [3] B. Berndt and A. Yee, Congruences for the coefficients of quotients of Eisenstein series, Acta Arith. 104 (2002), no. 3, 297–308. [4] R. E. Borcherds, Automorphic forms on Os+2,2(R) and infinite products, Invent. Math. 120 (1995) 161–213. [5] J. H. Bruinier, K. Ono, The arithmetic of Borcherds’ exponents, Math. Ann. 327 (2003), no. 2, 293–303. [6] L. Carlitz, Arithmetic properties of generalized Bernoulli numbers, J. Reine Angew. Math. 202 1959 174–182. 20 D. CHOI AND Y. CHOIE [7] H. Cohen, Sums involving the values at negative integers of L-functions of quadratic characters, Math. Ann. 217 (1975), no. 3, 271–285. [8] D. Choi and Y. Choie, Linear Relations among the Fourier Coefficients of Modular Forms on Groups Γ0(N) of Genus Zero and Their Applications, to appear in J. Math. Anal. Appl. 326 (2007), no. 1, 655–666. [9] Y. Choie, W. Kohnen, K. Ono, Linear relations between modular form coefficients and non-ordinary primes, Bull. London Math. Soc. 37 (2005), no. 3, 335–341. [10] H. Cohen and J. 
Oesterlé, Dimensions des espaces de formes modulaires, Lecture Notes in Mathematics, 627 (1977), 69–78.
[11] S. Corteel and J. Lovejoy, Overpartitions, Trans. Amer. Math. Soc. 356 (2004), 1623–1635.
[12] B. Gordon and K. Hughes, Multiplicative properties of eta-product, Cont. Math. 143 (1993), 415–430.
[13] P. Guerzhoy, The Borcherds-Zagier isomorphism and a p-adic version of the Kohnen-Shimura map, Int. Math. Res. Not. 2005, no. 13, 799–814.
[14] Y. Kitaoka, Dirichlet series in the theory of Siegel modular forms, Nagoya Math. J. 95 (1984), 73–84.
[15] N. Koblitz, Introduction to elliptic curves and modular forms, Graduate Texts in Mathematics, 97. Springer-Verlag, New York, 1993.
[16] W. Kohnen, Lifting modular forms of half-integral weight to Siegel modular forms of even genus, Math. Ann. 322 (2002), 787–809.
[17] W. Kohnen and H. Kojima, A Maass space in higher genus, Compos. Math. 141 (2005), no. 2, 313–322.
[18] P. Jenkins and K. Ono, Divisibility criteria for class numbers of imaginary quadratic fields, Acta Arith. 125 (2006), no. 3, 285–289.
[19] T. Miyake, Modular forms, Translated from the Japanese by Yoshitaka Maeda, Springer-Verlag, Berlin, 1989.
[20] J.-P. Serre, Formes modulaires et fonctions zeta p-adiques, Lecture Notes in Math. 350, Modular Functions of One Variable III. Springer, Berlin Heidelberg, 1973, pp. 191–268.
[21] J. Sturm, On the congruence of modular forms, Number theory (New York, 1984–1985), 275–280, Lecture Notes in Math., 1240, Springer, Berlin, 1987.
[22] S. Treneer, Congruences for the Coefficients of Weakly Holomorphic Modular Forms, to appear in the Proceedings of the London Mathematical Society.
[23] D. Zagier, Traces of singular moduli, Motives, polylogarithms and Hodge theory, Part I, Int. Press Lect. Ser., 3, I, Int. Press, Somerville, MA, 2002, pp. 211–244.
School of Liberal Arts and Sciences, Korea Aerospace University, 200-1, Hwajeon- dong, Goyang, Gyeonggi, 412-791, Korea E-mail address : choija@postech.ac.kr Department of Mathematics and Pohang Mathematical Institute, POSTECH, Pohang, 790–784, Korea E-mail address : yjc@postech.ac.kr 1. Introduction and Statement of Main Results 2. Applications: More Congruences 2.1. p-adic Limits of Borcherds Exponents 2.2. Sums of n-Squares 2.3. Quotients of Eisenstein Series 2.4. The Maass Space 3. Linear Relation among Fourier Coefficients of modular forms of Half Integral Weight 4. Proofs of Theorem ?? and ?? 4.1. Proof of Theorem ?? 4.2. Proof of Theorem ?? 5. Proof of Theorem ?? 6. Proofs of Theorem ?? and ?? 6.1. Proof of Theorem ?? 6.2. Proof of Theorem ?? 6.3. Proof of Theorem ?? Acknowledgement References ABSTRACT Serre obtained the p-adic limit of the integral Fourier coefficient of modular forms on $SL_2(\mathbb{Z})$ for $p=2,3,5,7$. In this paper, we extend the result of Serre to weakly holomorphic modular forms of half integral weight on $\Gamma_{0}(4N)$ for $N=1,2,4$. A proof is based on linear relations among Fourier coefficients of modular forms of half integral weight. As applications we obtain congruences of Borcherds exponents, congruences of quotient of Eisentein series and congruences of values of $L$-functions at a certain point are also studied. Furthermore, the congruences of the Fourier coefficients of Siegel modular forms on Maass Space are obtained using Ikeda lifting. <|endoftext|><|startoftext|> Introduction The purpose of this paper is to describe string topology from the viewpoint of Chen’s iterated integrals. Let M be a compact closed oriented d-manifold and LM be the free loop space ofM , the set of unbased smooth maps from S1 toM . Let H∗(LM) be the homology of the free loop space shifted by the dimension of the manifold i.e. H∗(LM) = H∗+d(LM). 
Chas and Sullivan found a product on H∗(LM), which they called the loop product [1]:

Hp(LM) ⊗ Hq(LM) → Hp+q(LM).

They showed that this product makes H∗(LM) an associative, commutative algebra. Merkulov constructed a model for this product based on the theory of iterated integrals, especially of the formal power series connection [10]. He showed that there is an isomorphism of algebras

H∗(LM) ∼= H∗(ΛM ⊗ R),

where ΛM is the de Rham differential graded algebra of M and R the formal completion of the free graded associative algebra generated by some noncommutative indeterminates. On the other hand, Chen showed that the cohomology of the free loop space of a simply-connected manifold is isomorphic to the cohomology of the cyclic bar complex of differential forms via Chen's iterated integrals (see [5] or [8]):

H∗(LM) ∼= H∗(C(ΛM)).

In this paper, we construct a model for the loop product based on the theory of the cyclic bar complex. We define a complex Hom(B(ΛM),ΛM) and its subcomplex $\overline{\mathrm{Hom}}(B(\Lambda M),\Lambda M)$ so that the Poincaré duality induces the isomorphism of vector spaces

H∗(Hom(C(ΛM),R)) ∼= H∗−d(Hom(B(ΛM),ΛM)).

We can define a product on Hom(B(ΛM),ΛM) which realizes the loop product.

http://arxiv.org/abs/0704.0014v1

Theorem 1.1. Let M be a compact closed oriented simply-connected manifold. Assume that H∗(M) is of finite type. Let A be a differential graded subalgebra of ΛM such that H∗(A) ∼= H∗(ΛM) by the inclusion. Then there is an isomorphism of associative, commutative algebras

H∗(LM) ∼= H∗(Hom(B(A), A)).

The product defined on H∗(Hom(B(A), A)) corresponds to the loop product under the isomorphism.

The paper is organized in the following way. In section 2, we briefly review Chen's iterated integrals. In section 3, we give a construction of a complex Hom(B(A), A), and discuss its properties. In section 4, we give a proof of theorem 1.1. In section 5, we study the iterated integrals on the free loop space of non-simply-connected manifolds.
In section 6, we describe a relation between the product on Hom(B(A), A) and the Goldman bracket. In this paper, all homology groups have their coefficients in the field of real numbers.

Acknowledgement: The author would like to thank Professor Toshitake Kohno very much for helpful comments and gentle support.

2 Chen's iterated integrals

We briefly review Chen's iterated integrals (see [5] or [8]). Let M be a finite dimensional smooth manifold and let LM be the free loop space of M, that is, the space of all smooth maps from S1 to M. Let ∆k be the k-simplex

$$\Delta^k = \{(t_1,\dots,t_k)\in\mathbb{R}^k \mid 0\le t_1\le\dots\le t_k\le 1\}.$$

We have an evaluation map $\Phi_k : \Delta^k\times LM\to M^k$ defined by

$$\Phi_k(t_1,\dots,t_k;\gamma) = (\gamma(t_1),\dots,\gamma(t_k)).$$

Then define $P_k$ to be the composition

$$P_k : (\Lambda^*M)^{\otimes k}\to\Lambda^*(M^k)\xrightarrow{\ \Phi_k^*\ }\Lambda^*(\Delta^k\times LM)\xrightarrow{\ p_*\ }\Lambda^{*-k}LM,$$

where $p_*$ is the integration along the fiber of the projection $p : \Delta^k\times LM\to LM$. Given $\omega_1,\dots,\omega_k\in\Lambda^*M$, the iterated integral $\int\omega_1\cdots\omega_k$ is a differential form on LM of total degree $|\omega_1|+\dots+|\omega_k|-k$, defined by the formula

$$\int\omega_1\cdots\omega_k = (-1)^{(k-1)|\omega_1|+(k-2)|\omega_2|+\dots+|\omega_{k-1}|+k(k-1)/2}\,P_k(\omega_1,\dots,\omega_k).$$

3 Preliminaries

In this section, we give a construction of some complexes. Let A be an arbitrary differential graded algebra in this section. Let A∨ denote the dual of A. The bar complex of A, (B(A), dB), is defined by

$$B(A) = \bigoplus_{r\ge 0}\bigotimes^{r} sA,$$

$$d_B(\omega_1,\dots,\omega_r) = -\sum_{i=1}^{r}(-1)^{\varepsilon_{i-1}}(\omega_1,\dots,\omega_{i-1},d\omega_i,\omega_{i+1},\dots,\omega_r) - \sum_{i=1}^{r-1}(-1)^{\varepsilon_i}(\omega_1,\dots,\omega_{i-1},\omega_i\wedge\omega_{i+1},\omega_{i+2},\dots,\omega_r).$$

Here (sA)q = Aq+1 or Aq according as 0 ≤ q or 0 < q, and $\varepsilon_i = \deg(\omega_1,\dots,\omega_i)$. We denote the totality of degree n elements by B(A)n. The coproduct H∗(B(A)) → H∗(B(A)) ⊗ H∗(B(A)) is defined by

$$(\omega_1,\dots,\omega_n)\mapsto\sum_{i=0}^{n}(\omega_1,\dots,\omega_i)\otimes(\omega_{i+1},\dots,\omega_n).$$

Chen proved the following theorem.

Theorem 3.1 (Chen [5]). Let M be a simply-connected manifold and H∗(M) be of finite type. Let A be a differential graded subalgebra of ΛM such that A0 = R and H∗(A) ∼= H∗(ΛM) by the inclusion.
Then there is an isomorphism of coalgebras H∗(B(A)) ∼= H ∗(ΩM) given by (ω1, · · · , ωn) 7→ ω1 · · ·ωn. Let F pB(A) be a filtration of B(A) such that F pB(A) = ⊕0≤r≤p ⊗ r sA. Let Hom(B(A), A∨)n = p+q=n Hom(B(A)p, A q∨) and Hom(B(A), A∨) = n Hom(B(A), A ∨)n. Its boundary is defined by δϕ(ω1, · · · , ωr)(ω) = ϕ(ω1, · · · , ωr)(dω) + (−1) |ω|ϕ(dB(ω1, · · · , ωr))(ω) − (−1)|ω|ϕ(ω2, · · · , ωr)(ω ∧ ω1) +(−1)|ω|+εr−1(|ωr|+1)ϕ(ω1, · · · , ωr−1)(ω ∧ ωr). Let us define the subcomplex of Hom(B(A), A∨), Hom(B(A), A∨), according to the Chen’s normalization of the cyclic bar complex (see [4] or [8]). We define Hom(B(A), A∨) to be the set of elements in Hom(B(A), A∨) which satisfy the following equations for any ω, ωi ∈ A >0 and f ∈ A0: −ϕ(· · ·ωi−2, fωi−1, ωi, · · · )(ω) + ϕ(· · · , ωi−1, fωi, ωi+1, · · · )(ω) +ϕ(· · · , ωi−1, df, ωi, · · · )(ω) = 0, 1 ≤ i ≤ r − 1, −ϕ(ω1, · · · , ωr)(fω) + ϕ(fω1, · · · , ωr)(ω) + ϕ(df, ω1, · · · , ωr)(ω) = 0, −ϕ(ω1, · · · , fwr)(ω) + ϕ(ω1, · · · , ωr)(fω) + ϕ(ω1, · · · , ωr, df)(ω) = 0. It can be easily seen that it is isomorphic to the dual of the normalized cyclic bar complex of A: Hom(B(A), A∨) ∼= C(A) Similarly, let Hom(B(A), A)n = p−q=n Hom(B(A)p, A q) and Hom(B(A), A) n Hom(B(A), A)n. Its boundary is defined by δϕ(ω1, · · · , ωr) = (−1)|ϕ|−εrdϕ(ω1, · · · , ωr)− (−1) |ϕ|−εrϕ(dB(ω1, · · · , ωr)) +(−1)|ϕ|−εrω1 ∧ ϕ(ω2, · · · , ωr) −(−1)(|ωr|+1)(|ϕ|+1)ϕ(ω1 · · · , ωr−1) ∧ ωr. We define Hom(B(A), A) to be the set of elements in Hom(B(A), A) which satisfy the following equations for any ω, ωi ∈ A >0 and f ∈ A0: −ϕ(· · ·ωi−2, fωi−1, ωi, · · · ) + ϕ(· · · , ωi−1, fωi, ωi+1, · · · ) +ϕ(· · · , ωi−1, df, ωi, · · · ) = 0, 1 ≤ i ≤ r − 1, −f ∧ ϕ(ω1, · · · , ωr) + ϕ(fω1, · · · , ωr) + ϕ(df, ω1, · · · , ωr) = 0, −ϕ(ω1, · · · , fwr) + ϕ(ω1, · · · , ωr) ∧ f + ϕ(ω1, · · · , ωr, df) = 0. The cup product on Hom(B(A), A) is defined by ϕ1 ∪ ϕ2(ω1, · · · , ωr) 0≤i≤r (−1)|ϕ1|(|ϕ2|+εr−εi)ϕ1(ω1, · · · , ωi) ∧ ϕ2(ωi+1, · · · , ωr). 
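The cup product above is a convolution over the splittings of a bar word into a left and a right part. The following Python sketch illustrates only this combinatorial shape, under simplifying assumptions that are not part of the construction: cochains take values in the integers instead of A, and all signs are dropped (as they would be for cochains of even degree on even-degree words). The names `deconcatenate`, `cup`, `counit`, `total` and `size_plus_one` are hypothetical.

```python
def deconcatenate(word):
    """Return all splittings (w[:i], w[i:]) of a tuple, for 0 <= i <= len(word)."""
    return [(word[:i], word[i:]) for i in range(len(word) + 1)]


def cup(phi1, phi2):
    """Convolution of two scalar-valued cochains over all splittings.

    Signs are deliberately omitted; this is a toy model of the cup product,
    not the graded formula itself.
    """
    def phi(word):
        return sum(phi1(left) * phi2(right) for left, right in deconcatenate(word))
    return phi


def counit(word):
    """Cochain that is 1 on the empty word and 0 otherwise (unit for cup)."""
    return 1 if len(word) == 0 else 0


def total(word):
    """Toy cochain: sum of the letters of the word."""
    return sum(word)


def size_plus_one(word):
    """Toy cochain: length of the word plus one."""
    return len(word) + 1


w = (2, 3, 5)
print(cup(total, size_plus_one)(w))
# Associativity follows from coassociativity of the splitting.
print(cup(cup(total, size_plus_one), total)(w) == cup(total, cup(size_plus_one, total))(w))
```

In this simplified setting, associativity of the product reduces to the fact that splitting a word twice does not depend on the order of the two cuts.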
Since δ(ϕ1 ∪ ϕ2) = δϕ1 ∪ ϕ2 + (−1) |ϕ1|ϕ1 ∪ δϕ2, H∗(Hom(B(A), A)) becomes an algebra. This product can be induced on H∗(Hom(B(A), A)). The E1-term of their spectral sequences associated with the filtration F pB(A) can be calculated from the cohomology of A. Proposition 3.2. There is an isomorphism of vector spaces H∗(Hom(F pB(A)/F p−1B(A), A∨)) ∼= Hom(⊗ psH(A), H(A)∨) Proof. Let A be a differential graded subalgebra of A such that A = Ap for p > 1, A = R and A1 = dA0 ⊕A There is an isomorphism of vector spaces Hom(F qB(A)/F q−1B(A), A∨) ∼= Hom(F qB(A)/F q−1B(A), A Since A = R, there is an isomorphism H0(Hom(F qB(A)/F q−1B(A), A )) ∼= Hom(⊗sH(A), H(A) Therefore we obtain the proposition. 4 Proof of Theorem 1.1 We give the proof of theorem 1.1 in this section. There is a differential graded subalgebra of A, A, such that A = R and H(A) ∼= H(A) by the inclusion. Then we obtain the isomorphism of algebras H∗(Hom(B(A), A)) ∼= H∗(Hom(B(A), A)) by proposition 3.2. Therefore it suffices to verify the theorem in the case A0 = R. The following result is due to Chen. Theorem 4.1 (Chen [5]). H∗(LM) ∼= H∗(Hom(B(A), A Proof. We define ψ : C∗(LM)→ Hom(B(A), A ∨) by ψ(σ)(ω1, · · · , ωn)(ω) = π∗ω ∧ ω1 · · ·ωn. Let FpC∗(LM) be a filtration of C∗(LM) such that FpCr(LM) = { σ : ∆ r → LM | π ◦ σ = σ′ ◦ π′ for some σ′ ∈ Cq(M), q ≤ p, π′ : ∆r → ∆q } . Let {Erp,q} be the associated spectral sequence. Define a filtration of Hom(B(A), A FpHom(B(A), A) = {f ∈ Hom(B(A), A ∨) | f(ω1, · · · , ωn)(ω) = 0, ∀ω ∈ A ≥p+1}. It can be easily shown that ψ preserves the filtrations of C∗(LM) and Hom(B(A), A On E2-level, the map ψ : Hp(M)⊗Hq(ΩM)→ Hp(A ∨)⊗Hq(B(A) is given by σ1 ⊗ σ2 7−→ (ω1, · · · , ωn 7→ ω1 · · ·ωn) Theorem 3.1 asserts that this is an isomorphism. Therefore we obtain the theorem. Lemma 4.2. H∗(Hom(B(A), A)) ∼= H∗−d(Hom(B(A), A Proof. We define a chain map P : Hom(B(A), A)→ Hom(B(A), A∨) by P (ϕ)(ω1, · · · , ωn)(ω) = ω ∧ ϕ(ω1, · · · , ωn). 
Define a filtration of Hom(B(A), A) by FpHom(B(A), A) = {ϕ ∈ Hom(B(A), A) | ϕ(ω1, · · · , ωn) ∈ A ≥d−p}. The map P preserves those filtrations. On E2-level, the map P : Hd−p(A)⊗Hq(B(A) ∨)→ Hp(A ∨)⊗Hq(B(A) is given by ω ⊗ ϕ 7−→ ω ∧ τ This is isomorphic and we obtain the lemma. Proof of theorem 1.1. We can verify that H∗(LM) is isomorphic to H∗(Hom(B(A), A)) as vector spaces by composing the maps in theorem 4.1 and lemma 4.2. We can also verify that there is an isomorphism of associative, commutative algebras. Indeed, the cup product of Hom(B(A), A) on E2-level Hd−p(A)⊗Hq(B(A) ∨)⊗Hd−s(A)⊗Ht(B(A) ∨)→ H2d−p−s(A)⊗Hq+t(B(A) is given by a⊗ g ⊗ b⊗ h 7→ (−1)(d−p+q)(d−s)a ∧ b⊗ g · h, where g · h satisfies g · h(ω1, · · · , ωn) = g(ω1, · · · , ωi)h(ωi+1, · · · , ωn). Then the following theorem asserts that the loop product and the cup product coincide on E2-level. Theorem 4.3 (Cohen-Jones-Yan [6]). Let M be a simply-connected manifold. Then {Erp,q} becomes an algebra and converges to H∗(LM) as algebras. On E2-level, the product µ : Hp(M ;Hq(LM))⊗Hs(M ;Ht(LM))→ Hp+q−d(M ;Hs+t(LM)) is given by µ((a⊗ g)⊗ (b ⊗ h)) = (−1)(d−s)(p+q−d)(a · b)⊗ (gh) where a ∈ Hp(M), b ∈ Hs(M), g ∈ Hq(ΩM), h ∈ Ht(ΩM), a · b is the intersec- tion product and gh is the Pontryagin product. Therefore we obtain the theorem. 5 The conjugacy classes of fundamental groups Let π denote a fundamental group of a smooth manifold M and J denote an augmentation ideal of the group ring of π, Rπ. Chen showed that the completion of the fundamental group with respect to the powers of its augmentation ideal is isomorphic to the dual of the 0-th cohomology of the bar complex of differential forms via iterated integrals [3]: Rπ/Jp ∼= H 0(B(A))∨ where A is a differential graded subalgebra of ΛM such that A0 = R and H∗(A) ∼= H∗(M). Based on this work, we study iterated integrals on the free loop space of the non-simply-connected manifold. 
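For forms of degree one the sign in the definition of section 2 is +1, and evaluating the iterated integral at a single loop γ gives the integral of f1(t1)⋯fk(tk) over the simplex 0 ≤ t1 ≤ ⋯ ≤ tk ≤ 1, where fi(t) dt is the pullback of ωi along γ. The following Python sketch approximates this number by a midpoint rule. It is an illustration under assumptions that are not taken from the text: the forms dx and dy on the plane, the unit circle as the loop, and the function name `iterated_integral` are all chosen here for the example.

```python
import math


def iterated_integral(pullbacks, n=20000):
    """Approximate the integral of f1(t1)...fk(tk) over 0 <= t1 <= ... <= tk <= 1.

    `pullbacks` is the list [f1, ..., fk] of functions of t in [0, 1].
    acc[j] holds the running value of the length-j iterated integral.
    """
    h = 1.0 / n
    acc = [1.0] + [0.0] * len(pullbacks)
    for step in range(n):
        t = (step + 0.5) * h
        # Update the longest integral first so that each level uses the
        # shorter integral accumulated strictly before this step.
        for j in range(len(pullbacks), 0, -1):
            acc[j] += acc[j - 1] * pullbacks[j - 1](t) * h
    return acc[len(pullbacks)]


# Pullbacks of dx and dy along the loop t -> (cos 2*pi*t, sin 2*pi*t).
def dx(t):
    return -2.0 * math.pi * math.sin(2.0 * math.pi * t)


def dy(t):
    return 2.0 * math.pi * math.cos(2.0 * math.pi * t)


i_x = iterated_integral([dx])
i_y = iterated_integral([dy])
i_xy = iterated_integral([dx, dy])
i_yx = iterated_integral([dy, dx])

print(i_x, i_y)    # both close to 0, since dx and dy are exact
print(i_xy, i_yx)  # close to pi and -pi
```

Both single integrals vanish on this loop, while the length-two integrals do not. Their sum agrees with the product of the single integrals, which is the shuffle relation in length two.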
Let π̃ denote the set of conjugacy classes of π and J̃p denote pr(Jp) where pr is the projection of Rπ onto Rπ̃. Theorem 5.1. Let M be a smooth manifold and H∗(M) is of finite type. Let A be a differential graded subalgebra of ΛM such that the map Hq(A)→ Hq(ΛM) induced by the inclusion is isomorphic if q = 0, 1 and injective if q = 2. Then there is an isomorphism of vector spaces Rπ̃/J̃p ∼= H0(Hom(B(A), A We give the proof of this theorem in this section. Let ∗ be a fixed point in S1. In this section, let LM be a set of smooth maps from S1 to M which are constant maps near ∗. Let ΩxM be a subspace of LM whose elements send ∗ to x ∈ M . Let Diff(S1, ∗) denote diffeomorphisms of S1 which coincide with identity map near ∗. We define α, β : ∆q → LM to be equivalent by a reparameterization iff there is a smooth map τ : ∆q → Diff(S1, ∗) such that β(ξ)(t) = α(ξ)(τ(t, ξ)), ∀(t, ξ) ∈ S1 ×∆q. Let C∗(LM) be a chain complex having as a basis the totality of equiva- lence classes of smooth simplexes of LM . Let C∗(ΩxM) be a chain complex having as a basis the totality of equivalence classes of smooth simplexes of ΩxM . C∗(ΩxM) becomes a noncommutative associative algebra as follows. The prod- uct of σ1 and σ2 in C∗(ΩxM) is defined to be the path product or 0 according as degσ1+degσ2 ≤ 1 or > 1. The augmentation ε : C∗(ΩxM) → R is given by εσ = 1 or 0 according as degσ = 0 or > 0. Let σ be a smooth simplex of M . Define for each σ Cq(LM)(σ) = { niτi ∈ Cq(LM) | π♯τi = σ}. Cq(LM)(σ) becomes a noncommutative associative algebra. Let ε(σ) denote the augmentation of Cq(LM)(σ), given by niτi 7→ ni. Define a filtration of Cq(LM)(σ) by FpCq(LM) = (kerε) p ⊕ (⊕σ:∆q→M (kerε(σ)) Proposition 5.2. The map ψp : FpCq(LM) → Hom(F p−1B(A), A∨) given by (ω1, · · · , ωp) 7→ π∗ω ∧ ω1 · · ·ωp is well-defined, chain map and FpCq(LM) ⊂ kerψp. Proof. 
The well-definedness can be verified by the following lemma which can be verified as in proposition 1.5, proposition 4.1.1 [2], and in proposition 1.5.3 Lemma 5.3 (Chen). (1) If α and β ∈ C∗(LM) are equivalent by a reparame- terization, then ω1 · · ·ωn = β ω1 · · ·ωn. (2) If τ1, τ2 ∈ Cq(LM)(σ), then (τ1 · τ2) ω1 · · ·ωn = ω1 · · ·ωi ∧ τ ωi+1 · · ·ωn. (3) If f ∈ Λ0M , then for any i ω1 · · · fωi−1 · · ·ωn + ω1 · · · fωi · · ·ωn + ω1 · · ·ωi−1df ωi · · ·ωn = 0. To verify FpCq(LM) ⊂ kerψp, it suffices to show (kerε(σ)) p ⊂ kerψp. Let s denote the section of π, which sends points of M to the constant map. Take (σ1 − s♯σ) · (σ2 − s♯σ) · · · · ·(σp − s♯σ) ∈ (kerε(σ)) p, where σ ∈ Cq(M) and σi ∈ Cq(LM)(σ). Then (σ1 − s♯σ) · (σ2 − sσ) · · · · · (σp − s♯σ) π∗ω ∧ ω1 · · ·ωp−1 σ∗ω ∧ (σ1 − s♯σ) ω1 · · · (σk − s♯σ) ∗1 · · · ∧ (σp − s♯σ) Therefore we obtain the proposition. Let C∗(M,x) denote a set of smooth simplexes ofM neighborhood of whose vertices are at x in M . We define C ⊗ sC⊗p = C∗(M,x)⊗ sC∗(M,x) Here (sC∗(M,x))q = Cq+1(M,x) or 0 according as q > 0 or q ≤ 0. Its boundary is given by the sum of the boundary on each complex. Let us construct a chain map Φ : C ⊗ sC⊗p → FpC∗(LM)/Fp+1C∗(LM) considering the following three cases: case 1: If (σ1, · · · , σp) ∈ sC(M,x)⊗p , then Φ : (σ1, · · · , σp) 7−→ (σ1 − x) · (σ2 − x) · · · · · (σp − x) where x is regarded as a constant map. case 2: If (σ1, · · · , σp) ∈ sC(M,x)⊗p , then Φ : (σ1, · · · , σp) 7−→ (σ1 − x) · (σ2 − x) · · ·σi · · · (σp − x) where σi : ∆ 1 ∋ ξ 7→ σi(ξ)(t) ∈ ΩxM is σi(ξ)(t) σi((1 − ξ)((1 − t)v0 + tv2) + ξ(1 − 2t)v0 + 2ξtv1), if 0 ≤ t ≤ 1/2 σi((1 − ξ)((1 − t)v0 + tv2) + ξ(2 − 2t)v1 + ξ(2t− 1)v2), if 1/2 ≤ t ≤ 1 Here v0, v1, v2 are the vertices of the standard simplex ∆ case 3: If (γ, σ1, · · · , σp) ∈ C1(M,x)⊗ sC(M,x)⊗p , then Φ : (γ, σ1, · · · , σp) 7−→ γ t (σ1 − x)γt · · · γ t (σp − x)γt where γt : [0, 1] ∋ s 7→ γ(st) ∈ M , t ∈ ∆ Lemma 5.4. 
The following diagram commutes: C ⊗ sC⊗p −−−−→ FpC1(LM)/Fp+1C1(LM) C ⊗ sC⊗p −−−−→ FpC0(LM)/Fp+1C0(LM) Proof. For case 2, ∂′Φ(σ1, · · · , σp)− Φ∂(σ1, · · · , σp) = (σ1 − x) · · · (σ i · σ i − σ i − σ i + σ i − σ i + x) · · · (σp − x) = (σ1 − x) · · · (σ i − x) · (σ i − x) · · · (σp − x) ∈ Fp+1C0(LM) where σ i , σ i , σ i are the faces of σi. For case 3, ∂′Φ(γ, σ1, · · · , σp)− Φ∂ ′(γ, σ1, · · · , σp) = γ−1 · (σ1 − x) · γ · · · γ −1 · (σp − x) · γ − (σ1 − x) · · · (σp − x) ∈ Fp+1C0(LM). Therefore we obtain the lemma. Proposition 5.2 gives the map Hq(FpC(LM)/Fp−1C(LM))→ Hq(Hom(F pB(A)/F p−1B(A), A∨)). Lemma 5.5. For q = 0, the following map is isomorphic: H0(FpC(LM)/Fp+1C(LM)) ∼= H0(Hom(F pB(A)/F p−1B(A), A∨)). Proof. We obtain the following surjection by lemma 5.4. Φ : H0(C ⊗ sC ⊗p) ։ H0(FpC(LM)/Fp+1C(LM)). Composing with the isomorphism ⊗pH1(M) ∼= H0(C ⊗ sC ⊗p), the map ⊗pH1(M) ։ H0(FpC(LM)/Fp+1C(LM))→ Hom(⊗ pH1(A),R) is given by (σ1, · · · , σn) 7→ (ω1, · · · , ωp) 7→ ω1 · · · This is isomorphic and we obtain the lemma. Lemma 5.6. For q = 1, the following map surjective: H1(FpC(LM)/Fp+1C(LM)) ։ H1(Hom(F pB(A)/F p−1B(A), A∨)). Proof. It suffices to show that the following map obtained by lemma 5.4 is surjective. ker∂ → H1(FpC(LM)/Fp+1C(LM))→ Hom(⊗ psH(A), H(A)∨)1 If (γ, σ1, · · · , σp) ∈ ker∂ ∩ C0(M,x)⊗ sC(M,x)⊗p , then (γ, σ1, · · · , σp) 7→ (ω1, · · · , ωp) 7→ ω1 · · · ωp, if deg ω = 0 0, otherwise through the above map. If (γ, σ1, · · · , σp) ∈ ker∂ ∩ C1(M,x)⊗ sC(M,x)⊗p , then (γ, σ1, · · · , σp) 7→ (ω1, · · · , ωp) 7→ ω1 · · · when deg ω = 1. Then we can verify the surjectivity and obtain the lemma. Proof of theorem 1.1. Consider the spectral sequences ofC(LM)/FpC(LM) and Hom(F p−1B(A), A∨) associated with FqC(LM) and Hom(F qB(A), A∨), re- spectively. Lemma 5.5 asserts that ψp is isomorphic on E1-level at degree 0: H0(FqC(LM)/Fq+1C(LM)) ∼= H0(Hom(F qB(A)/F q−1B(A), A∨)). 
Lemma 5.6 asserts that ψp is surjective on E1-level at degree 1: H1(FqC(LM)/Fq+1C(LM)) ։ H1(Hom(F qB(A)/F q−1B(A), A∨)). Then there is an isomorphism on Er-level at degree 0 for r ≥ 1. We have Rπ̃/J̃p ∼= H0(C(LM)/FpC(LM)) ∼= H0(Hom(F pB(A), A∨)). Therefore we obtain the theorem. 6 The Goldman bracket This section is devoted to the proof of the following theorem. Theorem 6.1. Let M be a compact closed oriented surface with genus g. Then the Goldman bracket induces a Lie algebra structure on lim Rπ̃/J̃pand there is an isomorphism of Lie algebras Rπ̃/J̃p ∼= H0(Hom(B(H ∗(M)), H∗(M)∨)). Goldman showed that the vector space spanned by the free homotopy classes of closed curves on a closed oriented surface has a Lie algebra structure [9]. This work led Chas and Sullivan to the string topology. We would verify that this structure makes lim Rπ̃/J̃p a Lie algebra. On the other hand, we can construct a bracket on H0(Hom(B(H ∗(M)), H∗(M)∨)) by the cup product defined in section 3 and the Connes’s operator. Here we regard H∗(M) as a differential graded algebra with a trivial differential. Theorem 6.1 asserts that those two Lie algebras are isomorphic. First we describe a relation between this bracket and the augmentation ideal of the group ring of the surface group to induce a Lie algebra structure on Rπ̃/J̃p. Then we construct a bracket on H0(Hom(B(A), A ∨)) and verify the isomorphism of Lie algebras Rπ̃/J̃p ∼= H0(Hom(B(A), A Finally we verify the isomorphism H0(Hom(B(A), A ∨)) ∼= H0(Hom(B(H ∗(M)), H∗(M)∨). The following proposition makes lim Rπ̃/J̃p a Lie algebra. Proposition 6.2. (1) If p ≥ 1 and q ≥ 2, then [J̃p, J̃q] ⊂ J̃p+q−2. (2) If p ≥ 2 , then [J̃p,Rπ̃] ⊂ J̃p−1. Proof. We give a proof of (1). Take (σ1−x) · · · (σp−x) ∈ J̃p, (τ1−y) · · · (τq−y) ∈ J̃q, where σi ∈ ΩxM and τi ∈ ΩyM . Assume that all curves are immersions and σi τj intersect transversally for any i, j. Let {σi♯τj} denote the set of intersection points of σi and τj . 
Also assume that all the intersection points are distinct i.e. {σi♯τj} ∩ {σk♯τl} = φ if i 6= k or j 6= l. Then, [σ, τ ] = s∈σi♯τj {ε(s;σi, τj)γs,x · (σi − x) · · · (σp − x)(σ1 − x) · · · ·(σi−1 − x) · γ s,x · ·γs,y · (τj − y) · · · (τq − y)(τ1 − y) · · · (τj−1 − y) · γ −γs,x · (σi+1 − x) · · · (σp − x)(σ1 − x) · · · (σi−1 − x) · γ s,x · ·γs,y · (τj+1 − y) · · · (τq − y)(τ1 − y) · · · (τj−1 − y) · γ ∈ J̃p+q−2. Here γs,x is a path from s to x along σi and γs,y is a path from s to y along τj . The proof of (2) can be verified in the same way. Let A be a differential graded subalgebra of ΛM such thatH∗(A) ∼= H∗(ΛM) by the inclusion. Proposition 6.3. There is an isomorphism of vector spaces H∗(Hom(F pB(A), A)) ∼= H∗−2(Hom(F pB(A), A∨)). Proof. We define P : H∗−2(Hom(F pB(A), A))→ H∗(Hom(F pB(A), A∨)) by P (ϕ)(ω1, · · · , ωp)(ω) = ω ∧ ϕ(ω1, · · · , ωp). This map preserves the filtrations. On E1-level, the map Hom(⊗qH(A), H(A))→ Hom(⊗qH(A), H(A)∨) is isomorphic. Therefore we obtain the proposition. Now we construct a bracket on H0(Hom(B(A), A ∨)). First, we define the Connes’s operator B : H∗(Hom(F pB(A), A∨)) → H∗+1(Hom(F p−1B(A), A∨)) B(ϕ)(ω1, · · · , ωp−1)(ω) 0≤k≤p−1 (−1)(εk+1)(εp−1−εk)ϕ(ωk+1, · · · , ωp−1, ω, ω1, · · ·ωk)(1). Composing these maps and the cup product, we can define a bracket on H0(Hom(F pB(A), A∨)) by [ϕ1, ϕ2] = −P (P −1Bϕ1 ∪ P −1Bϕ2) ∈ H0(Hom(F p−1B(A), A∨)). Take 2g closed 1-forms on M , α1, · · · , αg, β1, · · ·βg, such that αi ∧ βj = δij . Let {E p.q} denote the spectral sequence of Hom(B(A), A ∨) associated with F pB(A). Notice that the cyclic group Z/pZ acts on E ∼= Hom(⊗pH1(A),R) ιϕ(ω1, · · · , ωp) = ϕ(ω2, · · · , ωp, ω1) where ι is a generator of Z/pZ. The bracket [ , ] : E p,−p⊗E q,−q → E p+q−2,−p−q+2 [ϕ1, ϕ2](ω1, · · · , ωp+q−2) i,m,n ιmϕ1(αi, ω1, · · · , ωp−1)̺ nϕ2(βi, ωp, · · · , ωp+q−2) −ιmϕ1(βi, ω1, · · · , ωp−1)̺ nϕ2(αi, ωp, · · · , ωp+q−2) where ι and ̺ are generators of Z/pZ and Z/qZ, respectively. Proposition 6.4. 
The following diagram commutes for p, q ≥ 1: J̃p/ ˜Jp+1 ⊗ J̃q/ ˜Jq+1 −−−−→ E ∞ ⊗ E [ , ] [ , ] J̃p+q−2/J̃p+q−1 −−−−→ E p+q−2,−p−q+2 Proof. Take σ = (σ1 − x) · · · (σp − x) ∈ FpC0(LM), τ = (τ1 − y) · · · (τq − y) ∈ FqC0(LM). Take 2g curves in M , ai, bi, as in Figure 1. Assume that σi and τj , ak, or bk, intersect transversally for any i, j, k. Also assume that τj and ak, or bk, intersect transversally for any j, k. Assume that all the intersection points are distinct. Then for any i, j, k, we can take each tubular neighborhoods of ai and bi so that it does not include some neighborhoods of intersection points of σj and τk. We fix such neighborhoods of intersection points and denote them by Up for each p. We can also take a tubular neighborhood of the diagonal map from M to M×M outside those neighborhoods of intersection points of σi and τj for any i, j i.e. S1 \ ∪pσ i (Up) S1 \ ∪pτ j (Up) = φ, ∀i, j. Here N∆ denotes the tubular neighborhood of the diagonal map. Thom class Φ of this tubular neighborhood satisfies Φ = −ε(p;σi, τj), where ε(p;σi, τj) is the intersecion number of σi and τj at p. Fig. 1 ・ ・ ・ Define e♯ : C0(LM) → C1(LM) by e♯γ(ξ)(t) = γ(ξ + t). Let ωk, 1 ≤ k ≤ n, be differential forms on M which has its support inside the tubular neighborhoods of ai and bi. Then [σi,τj] ω1 · · ·ωn p∈σi♯τj,k ε(p;σi, τj) (σi)p ω1 · · ·ωk (τj)p ωk+1 · · ·ωn p∈σi♯τj ,k ×τj | (σi)p ω1 · · ·ωk (τj)p ωk+1 · · ·ωn e♯σi×e♯τj π∗Φ ∧ p∗1 ω1 · · ·ωk ∧ p ωk+1 · · ·ωn. Here p1, p2 : LM×LM → LM are the projections. The last equality is obtained by the following lemma. Lemma 6.5. If p ∈ σi♯τj and p ′ ∈ Up ∩ σi([0, 1]), then (σi)p ω1 · · ·ωn = (σi)p′ ω1 · · ·ωn. Proof. F Let γ be the curve from p to p′ along σi inside Up. If γ and σ are in the same direction, then (σi)p′ ω1 · · ·ωn = γ·(σi)p′ ω1 · · ·ωn = (σ)p·γ ω1 · · ·ωn ω1 · · ·ωn. 
We can also verify the case where γ is in the direction opposite to σ in the same We have the equality e♯σ×e♯τ π∗Φ ∧ p∗1 ω1 · · ·ωk ∧ p ωk+1 · · ·ωp+q−2 e♯σ×e♯τ − p∗1(α1 ∧ β1)− p 2(α1 ∧ β1) + p 1αj ∧ p 2βj − p 1βj ∧ p ω1 · · ·ωk ∧ p ωk+1 · · ·ωp+q−2 In fact, if η ∈ Λ(M ×M) then (−1)|η|+1 e♯σ×e♯τ π∗dη ∧ p∗1 ω1 · · ·ωk ∧ p ωk+1 · · ·ωp+q−2 e♯σ×e♯τ π∗η ∧ d ω1 · · ·ωk ∧ p ωk+1 · · ·ωp+q−2 +(e♯σ) ω1 · · ·ωk (e♯τ) ωk+1 · · ·ωj ∧ ωj+1 · · ·ωp+q−2 The last equality is obtained by the following lemma. Lemma 6.6. If σ ∈ FpC0(LM), then (e♯σ) ω1 · · ·ωp−2 = 0. Proof. It suffices to show the case σ = (τ1 − x) · · · (τp − x) where x ∈ M and τi ∈ ΩxM . We define τ̄i ∈ ΩxM by τ̄i(t) = τi(pt), if (i − 1)/p ≤ t ≤ i/p 0, otherwise. Let σ̄ denote (τ̄1 − x) · · · (τ̄p − x). It can be shown that e♯σ̄ restricted on [(i − 1)/p, i/p] is contained in Fp−1C1(LM) for any i. Therefore (e♯σ) ω1 · · ·ωp−2 = (e♯σ̄) ω1 · · ·ωp−2 = 0. Jones, Geztler, and Petrack describes the map e♯ in terms of iterated inte- grals by the following theorem. Theorem 6.7 (Geztler-Jones-Petrack [8]). If σ ∈ C0(LM) and ω, ωi ∈ Λ 1 ≤ i ≤ p, then π∗ω ∧ ω1 · · ·ωp = ωk · · ·ωpωω1 · · ·ωk−1. This theorem asserts the equality e♯σ×e♯τ − p∗1(α1 ∧ β1)− p 2(α1 ∧ β1) + p 1αj ∧ p 2βj − p 1βj ∧ p ω1 · · ·ωk ∧ p ωk+1 · · ·ωn j,k,l ωk+1 · · ·ωp−1αjω1 · · ·ωk ωl+1 · · ·ωp+q−2βjωp · · ·ωl ωk+1 · · ·ωp−1βjω1 · · ·ωk ωl+1 · · ·ωp+q−2αjωp · · ·ωl Finally we obtain the equality [σ,τ ] ω1 · · ·ωp+q−2 j,k,l ωk+1 · · ·ωp−1αjω1 · · ·ωk ωl+1 · · ·ωp+q−2βjωp · · ·ωl ωk+1 · · ·ωp−1βjω1 · · ·ωk ωl+1 · · ·ωp+q−2αjωp · · ·ωl Since we can take ωi ∈ H 1(M), 1 ≤ i ≤ p + q − 2, so that their support are inside the tubular neighborhoods of aj and bj, we obtain the proposition. Proof of theorem 6.1. We obtain the following isomorphism of Lie algebras by proposition 6.4. 
Rπ̃/J̃p ∼= H0(Hom(B(A), A To obtain the isomorphism of Lie algebras H0(Hom(B(A), A ∨) ∼= H0(Hom(B(H ∗(M)), H∗(M)∨), we introduce the following lemma, which asserts the formality of the compact Kähler manifolds. Lemma 6.8 (ddcLemma, Deligne-Griffiths-Morgan-Sullivan [7]). Let X be a compact Kähler manifold and dc = J−1dJ where J gives the complex structure in the cotangent bundle. If α is a differential form on X such that dα = 0 and dcα = 0, and such that α = dγ, then α = ddcβ for some β. Cor. There are quasi-isomorphisms of differential graded algebras (ΛX, d)← (kerdc, d)→ (H∗dc(X), 0). Notice that a closed oriented surface endowed with a complex structure become a Kähler manifolds for the dimensional reason. Therefore the following lemma completes the proof of the theorem. Lemma 6.9. If f : A1 → A2 is a quasi-isomorphism of differential graded algebras, then the map induced by f H0(Hom(B(A1), A 1 )→ H0(Hom(B(A2), A is an isomorphism. Proof. It suffices to verify that the map induced by f f : H0(Hom(F pB(A1), A 1 )→ H0(Hom(F pB(A2), A is an isomorphism for any p. On E1-level, the map induced by f Hom(⊗sH(A1), H(A1) ∨)→ Hom(⊗sH(A2), H(A2) is an isomorphism because f is quasi-isomorphism. Therefore we obtain the lemma. Therefore we obtain the theorem. References [1] M. Chas and D. Sullivan, String topology, preprint, 1999, http://arXiv.org /abs/math.GT/9911159. [2] K.T. Chen, Iterated integrals of differential forms and loop space homology, Ann. of Math. (2) 97(1973), 217-246. [3] K.T. Chen, Iterated integrals, fundamental groups and covering spaces, Trans. Amer. Math. Soc. 206 (1975), 83-98. [4] K.T. Chen, Reduced bar constructions on de Rham complexes, in:A.Haller and M.Tierney (eds), (Algebra, topology and category theory, 1977, pp. 19- [5] K.T. Chen, Iterated path integrals, Bull. Amer. Math. Soc. 83 (1977), no.5, 831-879. [6] R.L. Cohen, J.D.S. Jones and J. 
Yan, The loop homology algebra of spheres and projective spaces, Categorical Decomposition Techniques in Algebraic Topology (Isle of Skye, 2001), Progr. Math., vol. 215. Birkhäuser, Basel, 2004, pp.77-92. [7] P. Deligne, P. Griffiths, J. Morgan and D. Sullivan, Real homotopy theory of Kähler manifolds, Invent. Math. 29 (1975), 245-274. [8] E. Getzler, J.D.S. Jones and S. Petrack Differential forms on loop spaces and the cyclic bar complex, Topology 30 (1991), no.3, 339-371. http://arXiv.org [9] W.M. Goldman, Invariant functions on Lie groups and Hamlitonian flows of surface group representation, Invent. Math. 85 (1986), no.2, 263-302. [10] S.A. Merkulov, De Rham Model for String Topology, International Mathe- matics Research Notices 55 (2004), 2955-2981. Introduction Chen's iterated integrals Preliminaries Proof of Theorem 1.1 The conjugacy classes of fundamental groups The Goldman bracket ABSTRACT In this article we discuss a relation between the string topology and differential forms based on the theory of Chen's iterated integrals and the cyclic bar complex. <|endoftext|><|startoftext|> Introduction 1 2. Zero mode integration 2 2.1 Symmetry considerations and tensorial formulae 3 2.2 A spinorial formula 5 2.3 Component-based approach 7 3. One-loop amplitudes 7 3.1 Review: four bosons 8 3.2 Four fermions 10 3.3 Two bosons, two fermions 10 4. Two-loop amplitudes 12 4.1 Review: four bosons 13 4.2 Four fermions 14 4.3 Two bosons, two fermions 15 5. Discussion 16 A. Reduction to kinematic bases 17 A.1 Four bosons 17 A.2 Four fermions 18 A.3 Two bosons, two fermions 20 B. A gamma matrix representation 21 1. Introduction The quantisation of the ten-dimensional superstring using pure spinors as world-sheet ghosts [1] has overcome many difficulties encountered in the Green-Schwarz (GS) and Ramond-Neveu-Schwarz (RNS) formalisms. 
Most notably, by maintaining manifest space-time supersymmetry, the pure spinor formalism has yielded super-Poincaré covariant multi-loop amplitudes, leading to new insights into perturbative finiteness of superstring theory [2, 3].

Counting fermionic zero modes is a powerful technique in the computation of loop amplitudes in the pure spinor formalism and has for example been used to show that at least four external states are needed for a non-vanishing massless loop amplitude [2]. Furthermore, the structure of massless four-point amplitudes is relatively simple because all fermionic worldsheet variables contribute only through their zero modes. In the expressions derived for the one-loop [2] and two-loop [4] amplitudes, supersymmetry was kept manifest by expressing the kinematic factors as integrals over pure spinor superspace [5] involving three pure spinors λ and five fermionic superspace coordinates θ,

\[ K_{\text{1-loop}} = \big\langle (\lambda A)(\lambda\gamma^m W)(\lambda\gamma^n W) F_{mn} \big\rangle , \qquad K_{\text{2-loop}} = \big\langle (\lambda\gamma^{mnpqr}\lambda)(\lambda\gamma^s W) F_{mn}F_{pq}F_{rs} \big\rangle , \qquad (1.1) \]

where the pure spinor superspace integration is denoted by $\langle \ldots \rangle$, and $A_\alpha(x,\theta)$, $W^\alpha(x,\theta)$ and $F_{mn}(x,\theta)$ are the superfields of ten-dimensional Yang-Mills theory. The kinematic factors in (1.1) have been explicitly evaluated for Neveu-Schwarz states at two loops [6] and one loop [7], and were found to match the amplitudes derived in the RNS formalism [8]. This provided important consistency checks in establishing the validity of the pure spinor amplitude prescriptions. (Related one-loop calculations had been reported in [9].)

In this paper, it will be shown how to compute the kinematic factors in (1.1) when the superfields are allowed to contribute fermionic fields, as is relevant for the scattering of fermionic closed string states as well as Ramond/Ramond bosons. It turns out that the calculation of fermionic amplitudes presents no additional difficulties, making (1.1) a good practical starting point for the computation of four-point loop amplitudes in a unified fashion.
This practical aspect of the supersymmetric pure spinor amplitudes was also emphasised in [10], where the tree-level amplitudes were used to construct the fermion and Ramond/Ramond form contributions to the four-point effective action of the type II theories.

This paper is organised as follows. In section 2, different methods to compute pure spinor superspace integrals are explored. These methods are then applied to the explicit evaluation of the kinematic factors of massless four-point amplitudes at the one-loop level in section 3, and at the two-loop level in section 4. In both these sections, the bosonic calculations are briefly reviewed before separately considering the cases of two and four Ramond states. Particular attention will be paid to the constraints imposed by simple exchange symmetries. An appendix contains algorithms which were used to reduce intermediate expressions encountered in the amplitude calculations to a canonical form.

2. Zero mode integration

The calculation of scattering amplitudes in the pure spinor formalism leads to integrals over zero modes of the fermionic worldsheet variables λ and θ. Both θ and λ are 16-component Weyl spinors, the λ are commuting and the θ anticommuting, and λ is subject to the pure spinor constraint $(\lambda\gamma^m\lambda) = 0$. The amplitude prescriptions [1, 2] require three zero modes of λ and five zero modes of θ to be present, and a Lorentz covariant object

\[ \bar T^{\alpha\beta\gamma,\delta_1\ldots\delta_5} \equiv \big\langle \lambda^\alpha\lambda^\beta\lambda^\gamma\theta^{\delta_1}\cdots\theta^{\delta_5} \big\rangle = \bar T^{(\alpha\beta\gamma),[\delta_1\ldots\delta_5]} \qquad (2.1) \]

was constructed such that the Yang-Mills antighost vertex operator $V = (\lambda\gamma^m\theta)(\lambda\gamma^n\theta)(\lambda\gamma^p\theta)(\theta\gamma_{mnp}\theta)$ has

\[ \langle V \rangle = 1 . \qquad (2.2) \]

In this section, different methods of computing such “pure superspace integrals” are explored. As an example, a typical correlator encountered in the two-loop calculations of section 4 is considered:

\[ F(k_i, u_i) = k^2_a k^2_m k^3_p k^4_r \big\langle (\lambda\gamma^{mnpq[r}\lambda)(\lambda\gamma^{s]}u_1)(\theta\gamma_n{}^{ab}\theta)(\theta\gamma_b u_2)(\theta\gamma_q u_3)(\theta\gamma_s u_4) \big\rangle \qquad (2.3) \]

Here, $k_i$ and $u_i$ are the momenta and spinor wavefunctions of the four external particles.
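As a rough orientation for the size of the object defined in (2.1), the following Python sketch counts the components of a tensor that is symmetric in three and antisymmetric in five 16-valued spinor indices. It is only a counting exercise under that assumption: the helper names are illustrative and not part of the original calculation, and neither Lorentz invariance nor the pure spinor constraint is imposed, both of which greatly reduce the number of independent entries.

```python
from math import comb

SPINOR_DIM = 16  # components of a ten-dimensional Weyl spinor


def symmetric_components(dim, rank):
    """Number of components of a totally symmetric tensor of the given rank."""
    return comb(dim + rank - 1, rank)


def antisymmetric_components(dim, rank):
    """Number of components of a totally antisymmetric tensor of the given rank."""
    return comb(dim, rank)


# lambda^3 is symmetric because the lambda commute,
# theta^5 is antisymmetric because the theta anticommute.
lambda_cubed = symmetric_components(SPINOR_DIM, 3)
theta_fifth = antisymmetric_components(SPINOR_DIM, 5)
total_components = lambda_cubed * theta_fifth

print(lambda_cubed, theta_fifth, total_components)
```

The large product explains why the later sections rely on symmetry arguments and computer algebra instead of tabulating components directly.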
2.1 Symmetry considerations and tensorial formulae One systematic approach to evaluate the zero mode integrals is to find expressions for all tensors that can be formed from (2.1). By Fierz transformations, one can always write the product of two θ spinors as (θγ[3]θ), where γ[k] denotes the antisymmetrised product of k gamma matrices. Due to the pure spinor constraint, the only bilinear in λ is (λγ[5]λ), and it is thus sufficient to consider the three cases (λγ[5]λ)(λ{γ[1] or γ[3] or γ[5]}θ)(θγ[3]θ)(θγ[3]θ) . (2.4) Lorentz invariance then implies that it must be possible to express these tensors as sums of suitably symmetrised products of metric tensors, resulting in a parity-even expression, plus a parity-odd part made up from terms which in addition contain an epsilon tensor. The parity-even parts may be constructed [6] starting from the most general ansatz compatible with the symmetries of the correlator and then using spinor identities along with the normalisation (2.2) to determine all coefficients in the ansatz. Duality properties of the spinor bilinears can be used to determine the parity-odd part [7]. An extensive (and almost exhaustive) list of correlators is found in [11], including the (λγ[1]θ) and (λγ[3]θ) cases of the above list: (λγmnpqrλ)(λγuθ)(θγfghθ)(θγjklθ) = − 4 mnpqr m̄n̄p̄q̄r̄ + εmnpqrm̄n̄p̄q̄r̄ δm̄n̄fg δ (δr̄l δ u + δ u − δr̄uδhl ) [fgh][jkl] (2.5) (λγmnpqrλ)(λγstuθ)(θγfghθ)(θγjklθ) = −24 mnpqr m̄n̄p̄q̄r̄ + εmnpqrm̄n̄p̄q̄r̄ δm̄j δ δq̄sδ u − δkhδr̄u) [fgh][jkl](fgh↔jkl) (2.6) (Here, the brackets (fgh↔ jkl) denote symmetrisation under simultaneous interchange of fgh with ijk, with weight one.) The remaining correlator with the (λγ[5]θ) factor can be derived in the same way, using an ansatz consisting of six parity-even structures. Taking a trace between the two γ[5] factors and noting that (λγmnpqrλ)(λγabcdeθ) . . . (λγmnpq[bλ)(λγcde]θ) . . . one finds a relation to (2.6). 
This is sufficient to determine all coefficients in the ansatz, and the result is (λγmnpqrλ)(λγabcdeθ)(θγfghθ)(θγjklθ) mnpqr m̄n̄p̄q̄r̄ + εmnpqrm̄n̄p̄q̄r̄ m̄n̄p̄ (−δehδr̄l + 2δel δr̄h) + δm̄n̄ab δcdfgδ (δehδ l − 3δel δr̄h) [abcde][fgh][jkl](fgh↔jkl) (2.7) – 3 – One may find it surprising that the derivation of these tensorial expressions only made use of properties of (pure) spinors, and of the normalisation condition (2.2). However, it can be seen from representation theory that the correlator (2.1) is uniquely characterised, up to normalisation, by its symmetry. To see this, note that [12] the spinor products λ3 and θ5 transform in λ(αλβλγ) : Sym3 S+ = [00003] ⊕ [10001] θ[δ1 . . . θδ5] : Alt5 S+ = [00030] ⊕ [11010] . (2.8) (Here, λ and θ are taken to be in the S+ irrep of SO(1,9), with Dynkin label [00001].) The tensor product of these contains only one copy of the trivial representation. This applies to any spinors λ, which means that the pure spinor property cannot be essential to the derivation of the tensorial identities. The use of the pure spinor constraint merely allows for simpler derivations of the same identities. As an illustration of this approach, consider the correlator of eq. (2.3). Leaving the momenta aside for the moment by setting F = k2ak r F̃ , the task is to compute (λγmnpq[rλ)(λγs]u1)(θγn abθ)(θγbu2)(θγqu3)(θγsu4) After applying two Fierz transformations, (λγmnpq[r|λ)(λγcθ)(θγn abθ)(θγjklθ) |s]γcγbu2) 3!·16 (λγmnpq[r|λ)(λγcdeθ)(θγn abθ)(θγjklθ) |s]γcdeγbu2) 2·5!·16 (λγmnpq[r|λ)(λγcdefgθ)(θγn abθ)(θγjklθ) |s]γcdefgγbu2) 3!·16(u3γqγjklγsu4) , one obtains a combination of the fundamental correlators listed in (2.5), (2.6) and (2.7). A reliable evaluation of the numerous index symmetrisations is made possible by the use of a computer algebra program. In doing these calculations with Mathematica, an essential tool is the GAMMA package [13], expanding the products of gamma matrices in a γ[k] basis. 
The result consists of two parts, F̃ = F̃ (δ) + F̃ (ε), with F̃ (δ) = 1 mpru2)(u3γ au4) + r (u1γ iu2)(u3γiu4) + . . . ai1i2u2)(u3γ i1i2u4) (92 terms) (2.9) F̃ (ε) = − 1 1209600 εi1...i7 mpr(u1γ i1...i7u2)(u3γ au4) + . . . 604800 εampri1...i6(u1γ i3...i9u2)(u3γ i7i8i9u4) (34 terms) (2.10) The epsilon tensors in the second part can be eliminated using the fact that the ui are chiral spinors: If all the indices on γ[k]ui are contracted into an epsilon tensor, one uses εi1...ik′j1...jkγ j1...jkγ11 = (−) k(k+1) k! γi1...ik′ , (2.11) where γ11 = 1 εi0...i9γ i0...i9 . More generally, if all but r indices of γ[k]ui are contracted, εi1...ik′ j1...jkγ p1...prj1...jkγ11 = (−) k(k+1) (k′ − r)! pr ...p1 [i1...ir γir+1...i′k] . (2.12) – 4 – The result of these manipulations is F̃ (ε) =− 1 mpru2)(u3γ au4)− 1280δ r (u1γ amiu2)(u3γiu4) + . . . 11200 i1i2i3u2)(u3γ i1i2i3u4) (53 terms) (2.13) (Note that while the epsilon terms in the basic correlator formulae were easily obtained from the delta terms by using Poincaré duality, this cannot be done here in any obvious way.) The last step in the evaluation of (2.3) is to contract with the momenta, F = k2ak r F̃ , and to simplify the expressions using the on-shell identities i ki = 0, k i = 0, /kiui = 0. It is shown in appendix A.2 that there are only ten independent scalars, denoted by B1 . . . B10, that can be formed from four momenta and the four spinors u1 . . . u4. With respect to this basis, the result is F (δ) = 1 48·10080 695s12(u1/k3u2)(u3/k1u4) + · · ·+ 233s213(u1γau2)(u3γau4) (7 terms) 48·10080 (695, 775, 0,−80, 356, 356, 0, 233, 233, 0)B1 ...B10 , F (ε) = 1 48·10080 (−23,−7, 0,−16, 28, 28, 0, 7, 7, 0)B1 ...B10 , F = 1 10080 (14, 16, 0,−2, 8, 8, 0, 5, 5, 0)B1 ...B10 , (2.14) where sij = ki · kj . 
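The last line of (2.14) can be checked mechanically. The sketch below assumes that the three coefficient vectors are read exactly as printed above, with the common prefactor 1/(48·10080) for the δ and ε parts and 1/10080 for the total; the variable names are illustrative only.

```python
from fractions import Fraction

# Coefficient vectors with respect to the basis B1 ... B10, as quoted in (2.14).
F_DELTA = (695, 775, 0, -80, 356, 356, 0, 233, 233, 0)
F_EPSILON = (-23, -7, 0, -16, 28, 28, 0, 7, 7, 0)
F_TOTAL = (14, 16, 0, -2, 8, 8, 0, 5, 5, 0)

PREFACTOR_PARTS = Fraction(1, 48 * 10080)
PREFACTOR_TOTAL = Fraction(1, 10080)

# Add the parity-even and parity-odd parts component by component.
combined = tuple(PREFACTOR_PARTS * (d + e) for d, e in zip(F_DELTA, F_EPSILON))
expected = tuple(PREFACTOR_TOTAL * c for c in F_TOTAL)

print(combined == expected)
```

Every entry of the sum of the two integer vectors is divisible by 48, which is what makes the final prefactor collapse to 1/10080.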
2.2 A spinorial formula

While the derivation of tensorial identities for correlators of the form (2.4) is relatively straightforward and elegant, it may be a tedious task to transform the expressions encountered in amplitude calculations to match this pattern. As seen in the example calculated above, this is particularly true if additional spinors are involved, making it necessary to apply Fierz transformations. It is therefore desirable to use a covariant correlator expression with open spinor indices. Such an expression was given in [1, 2]:

\[ \bar T^{\alpha\beta\gamma,\delta_1\ldots\delta_5} = N^{-1} \Big[ (\gamma^m)^{\alpha\delta_1}(\gamma^n)^{\beta\delta_2}(\gamma^p)^{\gamma\delta_3}(\gamma_{mnp})^{\delta_4\delta_5} \Big]_{(\alpha\beta\gamma)[\delta_1\ldots\delta_5]} , \qquad (2.15) \]

where $N$ is a normalisation constant and the brackets ()[] denote (anti-)symmetrisation with weight one. (Note that the right hand side is automatically gamma-matrix traceless: any gamma-trace

\[ (\gamma_r)_{\alpha\beta}\, (\gamma^m)^{\alpha[\delta_1|}(\gamma^n)^{\beta|\delta_2|}(\gamma^p)^{\gamma|\delta_3}(\gamma_{mnp})^{\delta_4\delta_5]} = -(\gamma^{mn}{}_{r})^{[\delta_1\delta_2}(\gamma_{mnp})^{\delta_3\delta_4}(\gamma^p)^{\delta_5]\gamma} = 0 \]

vanishes due to the double-trace identity $(\gamma_{ab}\theta)^\alpha(\theta\gamma^{abc}\theta) = 0$, which follows from the fact that the tensor product $(\mathrm{Alt}^3 S^+) \otimes S^-$ does not contain a vector representation and therefore the vector $(\psi\gamma_{ab}\theta)(\theta\gamma^{abc}\theta)$ has to vanish for all spinors ψ, and can also be shown by applying a Fierz transformation.) This prescription was originally motivated [2] by the fermionic expansion of the Yang-Mills antighost vertex operator $V$,

\[ V = T_{\alpha\beta\gamma,\delta_1\ldots\delta_5}\, \lambda^\alpha\lambda^\beta\lambda^\gamma\theta^{\delta_1}\cdots\theta^{\delta_5} , \qquad T_{\alpha\beta\gamma,\delta_1\ldots\delta_5} = \Big[ (\gamma^m)_{\alpha\delta_1}(\gamma^n)_{\beta\delta_2}(\gamma^p)_{\gamma\delta_3}(\gamma_{mnp})_{\delta_4\delta_5} \Big]_{(\alpha\beta\gamma)[\delta_1\ldots\delta_5]} , \qquad (2.16) \]

where $T$ is related to $\bar T$ by a parity transformation, up to the overall constant $N$. (Since $\bar T$ is uniquely determined by its symmetries, any covariant expression will be proportional to $\bar T$, after symmetrisation of the spinor indices, and this is merely the simplest choice.)

Equation (2.15) immediately yields an algorithm to convert any correlator into traces of gamma matrices or, if additional spinors are involved, bilinears in those spinors. It is, however, already very tiresome to determine the normalisation constant $N$ by hand.
The main advantage of this approach is that it clearly lends itself to implementation on a computer algebra system, which can easily carry out the spinor index symmetrisations, simplify the gamma products (again using the GAMMA package), and compute the traces. For example, N〈V 〉 = (γm)αδ1(γn)βδ2(γp)γδ3(γmnp) (αβγ)[δ1...δ5] (γx)αδ1(γy)βδ2(γz)γδ3(γ xyz)δ4δ5 = − 1 Tr(γxγ m)Tr(γyγ n)Tr(γzγ p)Tr(γxyzγpnm) + . . . Tr(γzγpnmγ zyxγnγxγ p) (60 terms) = 5160960 . The correct normalisation is therefore obtained by setting N = 5160960. Returning to the example correlator (2.3), one finds that the calculation is by far simpler than with the previous method. After carrying out the symmetrisations (αβγ)[δi], one obtains NF̃ = 1 Tr(γxγ mnpq[r|)(u3γqγ xyzγsu4)(u1γ |s]γzγbu2) + . . . (u2γbγ xyzγqu3)(u1γsγyγ mnpq[rγzγ s]u4) , (24 terms) where elementary index re-sorting has reduced the number of terms from 60 to 24. Ex- panding the gamma products leads to NF̃ = 476 δpr (u1γ mu4)(u2γ au3) + · · ·+ 815(u1γ ai1i2i3i4u2)(u3γ i1i2i3i4u4) , (294 terms) which, in contrast to (2.10), contains no epsilon terms as there are not enough free indices present. Note that this intermediate result contains terms with with u1 paired with u3 or u4, so it is not possible to directly compare to eqs. (2.9) and (2.13). However, after contracting with the momenta k2ak r and decomposing the result in the basis B1 . . . B10, one again obtains F = 1 10080 (14, 16, 0,−2, 8, 8, 0, 5, 5, 0)B1 ...B10 , (2.17) in agreement with (2.14). The algorithm just outlined will be the method of choice for all correlator calculations in the later sections of this paper and can easily be applied to a wider range of problems. The only limitation is that the larger the number of gamma matrices and open indices of the correlator, the slower the computer evaluation will be. For example, the correlator considered in eq. 
(5.2) of [11], mnm1n1...m4n4 (λγpγm1n1θ)(λγqγm2n2θ)(λγrγm3n3θ)(θγmγnγpqrγ m4n4θ) = − 2 m1n1...m4n4 εmnm1n1...m4n4 , (2.18) can still be verified with this method but this already requires substantial runtime. – 6 – 2.3 Component-based approach A third method to evaluate the zero mode integrals consists of choosing a gamma matrix representation, expanding the integrand as a polynomial in spinor components, and then applying (2.15) to the individual monomials. This procedure seems particularly appealing if at some stage of the calculation one works with a matrix representation anyhow, in order to reduce the results to a canonical form (e.g. as outlined in appendix A). An efficient decomposition algorithm (of k4u1u2u3u4 scalars, say) only needs a few non-zero momentum and spinor wavefunction components to distinguish all independent scalars, and therefore k and u can be replaced by sparse vectors. Furthermore, a trivial observation allows for a much quicker numeric evaluation of correlator components than a naive use of (2.15): In view of (2.16), one can equivalently compute the components of the parity- transformed expression V̄ = (λ̄γmθ̄)(λ̄γnθ̄)(λ̄γpθ̄)(θ̄γmnpθ̄), where λ̄ and θ̄ are spinors of chirality opposite to that of λ, θ. In the representation given in appendix B, V̄ coincides with V |λ→λ̄,θ→θ̄, and V = 192λ9λ9λ9θ1θ2θ3θ4θ9 + · · ·+ 480λ1λ2λ3θ1θ9θ10θ13θ15 + . . . (100352 terms) The monomials in the fermionic expansion of V̄ then correspond to the arguments of non-zero correlators, and the coefficients of those monomials are, up to normalisation and symmetry factors, the correlator values. Unfortunately, it turns out that the complexity of typical correlators (e.g. the one given in (2.3)) makes it difficult to carry out the expansion in fermionic components in any straightforward way and limits this method to special applications. 
For example, the coefficients in (2.18) can be checked relatively easily by choosing particular index values, such as (λγpγ12θ)(λγqγ21θ)(λγrγ34θ)(θγ0γ0γpqrγ 12λ1λ1λ1θ1θ9θ10θ11θ12 + · · ·+ 12λ16λ16λ16θ5θ6θ7θ8θ16 (For fixed values of pqr, one gets no more than about 105 monomials of the form λ3θ5). This approach may thus still be helpful in situations where the result has been narrowed down to a simple ansatz. 3. One-loop amplitudes The amplitude for the scattering of four massless states of the type IIB superstring was computed [2] in the pure spinor formalism as A = KK̄ (Im τ)5 G(zi, zj) ki·kj , (3.1) where G(zi, zj) is the scalar Green’s function, and the kinematic factor is given by the product KK̄ of left- and right-moving open superstring expressions, K1-loop = (λA1)(λγ mW2)(λγ nW3)F4,mn cycl(234) . (3.2) – 7 – Here the indices 1 . . . 4 label the external states and “· · ·+ cycl(234) ” denotes the addition of two other terms obtained by cyclic permutation of the indices 234. The spinor super- field Aα and its supercovariant derivatives, the vector gauge superfield Am = m DαAβ as well as the spinor and vector field strengths Wα = 1 (γm)αβ(DβAm − ∂mAβ) and Fmn = 18(γmn) β = 2∂[mAn], describe ten-dimensional super-Yang-Mills theory. The physical fields of this theory, a gauge boson and a gaugino, are found in the leading components Am| = ζm and Wα| = ûα and correspond to the Neveu-Schwarz and Ramond superstring states. The superfields Aα and W α as well as the gaugino field ûα are anticommuting.1 To facilitate computer calculations involving polynomials in the spinor components, and for easier comparison with the literature, it will be more convenient to work with commuting fermion wavefunctions uα. Fortunately, as the kinematic factors with fermionic external states are multilinear functions of the distinctly labelled spinors ûi, it is straightforward to translate between the two conventions: Any monomial expression in û1 . . . 
û4 (and possibly fermionic coordinates θ) corresponds to the same expression in u1 . . . u4, multiplied by the signature of the permutation sorting the ûi (and any θ variables) into some fixed order, such as (θ · · · θ)ûα11 û Choosing a gauge where θαAα = 0, the on-shell identities 2D(αAβ) = γ αβAm , DαW β = 1 (γmn)α have been used to derive recursive relations [10, 14, 15] for the fermionic expansion A(n)α = (γmθ)αA (n−1) m , A (θγmW (n−1)) , Wα(n) = − 1 (γmnθ)α∂mA (n−1) where f (n) = 1 θαn · · · θα1(Dα1 · · ·Dαnf)|. These recursion relations were explicitly solved in [10], reducing the fermionic expansion to a simple repeated application of the derivative operator Omq = 12 (θγm qpθ)∂p: A(2k)m = (2k)! [Ok]mqζq , A(2k+1)m = (2k+1)! [Ok]mq(θγqû) . (3.3) With this solution at hand, one has all ingredients to evaluate the kinematic factor (3.2) for the three cases of zero, two, or four fermionic states. 3.1 Review: four bosons The kinematic factor involving four bosons was considered in [7] and this calculation will now be reviewed briefly. First, note that the outcome is not fixed by symmetry: The result must be gauge invariant [2] and therefore expressible in terms of the field strengths F1 . . . F4. The cyclic symmetrisation in (3.2) yields expressions symmetric in F2, F3, F4, and acting on scalars constructed from the Fi only, the (234) symmetrisation is equivalent to complete symmetrisation in all labels (1234). Thus the result must be a linear combination of the 1Thanks to Carlos Mafra for pointing this out. – 8 – two gauge invariant symmetric F 4 scalars, namely the single trace Tr(F(1F2F3F4)) and double trace Tr(F(1F2)Tr(F3F4)), leaving one relative coefficient to be determined. 
Since all four states are of the same kind, one may first evaluate the correlator for one labelling and then carry out the cyclic symmetrisation: 1-loop = (λA1)(λγ mW2)(λγ nW3)F4,mn cycl (234) The different ways to saturate θ5 result in a sum of terms of the form XABCD = 1 )(λγ 2 )(λγ (3.4) with A+B +C +D = 5 and A, B, C odd, D even: (λA1)(λγ mW2)(λγ nW3)F4,mn = X3110 +X1310 +X1130 +X1112 . Note that X1310 and X1130 are related by exchange of the labels 2 and 3. This exchange can be carried out after computing the correlator, an operation which will in the following be denoted by π23. Using (3.3) for the superfield expansions and replacing ∂m → ikm, one obtains X3110 = − 1512F tuX̃3110 , X̃3110 = (λγ[t|γpqθ)(λγ|u]γrsθ)(λγaθ)(θγ amnθ) X1112 = − 1128 ik tuX̃1112 , X̃1112 = (λγ[m|γpqθ)(λγ|a]γrsθ)(λγnθ)(θγa X1310 = − 1384 ik tuX̃1310 , X̃1310 = (λγ[t|γmaθ)(λγ|u]γrsθ)(λγnθ)(θγa The method outlined in section 2.2 is readily applicable to these correlators. For example, for X3111, the trace evaluation yields X̃3110 = N Tr(γaγ z)Tr(γxyzγ anm)Tr(γxγqpγ [t|)Tr(γyγsrγ |u]) + · · · · · ·+ 1 Tr(γ[u|γrsγzyxγqpγ |t]γxγaγ yγmnaγz) (60 terms) δmprs δ tu − 1315δ rs − 145δ δmnpr δ [mn][pq][rs][tu](pq↔rs) Upon contracting with the field strengths, momenta and polarisations, and symmetrising over the cyclic permutations (234) (with weight 3), one finds that all three contributions are separately gauge invariant: X3110 + cycl(234) = − 11 13440 Tr(F(1F2F3F4)) + Tr(F(1F2)Tr(F3F4)) X1112 + cycl(234) = − 19 53760 Tr(F(1F2F3F4)) + 215040 Tr(F(1F2)Tr(F3F4)) (1 + π23)X1310 + cycl(234) = − 1 10240 4Tr(F(1F2F3F4))− Tr(F(1F2)Tr(F3F4)) The sum X3110 +X1112 has the right ratio of single- and double-trace terms to be propor- tional to the well-known result t8F 4, and the last line exhibits the right ratio by itself. 
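The statement about the ratios can be cross-checked on the single-trace coefficients alone. The sketch below assumes the three prefactors are read as −11/13440, −19/53760 and −(1/10240)·4; it adds them up and also computes which double-trace coefficient the sum of the first two contributions needs in order to have the ratio 4 : (−1). The dictionary keys are illustrative labels.

```python
from fractions import Fraction

# Coefficients of the single-trace term Tr(F_(1 F_2 F_3 F_4)) in the three
# cyclically symmetrised contributions quoted above.
single_trace = {
    "X3110": Fraction(-11, 13440),
    "X1112": Fraction(-19, 53760),
    "(1+pi23)X1310": 4 * Fraction(-1, 10240),
}

total_single = sum(single_trace.values())

# For X3110 + X1112 to be proportional to 4 Tr(F^4) - Tr(F^2) Tr(F^2),
# its double-trace coefficient must be -1/4 of its single-trace coefficient.
partial_single = single_trace["X3110"] + single_trace["X1112"]
required_double = -partial_single / 4

print(total_single, required_double)
```

The total equals 4 × (−1/2560), so an overall factor −1/2560 multiplying the combination 4Tr(F(1F2F3F4)) − Tr(F(1F2)Tr(F3F4)) is consistent with the three contributions.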
The overall kinematic factor is therefore

\[ K^{4B}_{\text{1-loop}} = -\frac{1}{2560} \Big[ 4\,\mathrm{Tr}(F_{(1}F_2F_3F_{4)}) - \mathrm{Tr}(F_{(1}F_2)\,\mathrm{Tr}(F_3F_{4)}) \Big] = -\frac{1}{15360}\, t_8F^4 , \qquad (3.5) \]

in agreement with the expressions derived in the RNS [16] and Green-Schwarz [17] formalisms.

3.2 Four fermions

The four-fermion kinematic factor could be evaluated in the same way as in the four-boson case by summing up all terms $X_{ABCD}$, $A + B + C + D = 5$, now with $A$, $B$, $C$ even and $D$ odd. Note however that this time, the outcome is fixed by symmetry: The cyclic symmetrisation in (3.2) leads to a completely symmetric dependence on $\hat u_2$, $\hat u_3$, $\hat u_4$, and therefore to a completely antisymmetric dependence on $u_2$, $u_3$, $u_4$. Acting on scalars of the form $k^2u_1u_2u_3u_4$, antisymmetrising over [234] is equivalent to antisymmetrising over [1234], and there is only one completely antisymmetric $k^2u_1u_2u_3u_4$ scalar. Without further calculation, one can infer that the kinematic factor is proportional to that scalar,

\[ K^{4F}_{\text{1-loop}} = \mathrm{const} \cdot \Big[ (u_1 \not k_3 u_2)(u_3 \not k_1 u_4) - (u_1 \not k_2 u_3)(u_2 \not k_1 u_4) + (u_1 \not k_2 u_4)(u_2 \not k_1 u_3) \Big] , \]

which of course agrees with the RNS amplitude (see e.g. [16], eq. (3.67)).

3.3 Two bosons, two fermions

In evaluating (3.2) for two bosons and two fermions, the cyclic symmetrisations affect whether the $W$ and $F$ superfields contribute bosons or fermions. Only the label of the $A_\alpha$ superfield stays unaffected, and one has to choose whether it should contribute a boson or a fermion. Since its fermionic expansion starts with the bosonic polarisation vector, $A_{1,\alpha} \sim (\not\zeta_1\theta)_\alpha$, the calculation can be simplified by choosing a labelling where particle 1 is a fermion. (Of course, the final result must be independent of this choice.) The assignment of the other three labels is then irrelevant and will be chosen as $f_1f_2b_3b_4$. Writing out the cyclic permutations, two of the three terms are essentially the same because they are related by interchange of the labels 3 and 4.
The kinematic factor is then K2B2F1-loop(f1f2b3b4) = (1 + π34) (even) 1 )(λγ (even) 2 )(λγ (odd) (even) (even) 1 )(λγ (odd) 3 )(λγ (odd) (odd) Unlike in the four-fermion calculation, the result is not fixed by symmetry. There are five independent ku1u2F3F4 scalars (see appendix A, eq. (A.6)), denoted by C1 . . . C5, and there are two independent combinations of these scalars with the required [12](34) symmetry. Expanding the superfields and collecting terms with θ5, the first line yields a combination of terms XABCD with A, B, D odd and C even. There is only one θ 5 combination coming from the second line, which will be denoted by X ′2111 ≡ (−π24)X2111: K2B2F1-loop = (1 + π34) (X4010 +X2210 +X2030 +X2012) +X 2111 , with the correlators X4010 = ζ3c k nX̃4010 , X̃4010 = (λγaθ)(θγa pqθ)(θγpu1)(λγ [mu2)(λγ n]γbcθ) X2210 = − i12k nX̃2210 , X̃2210 = (λγaθ)(θγau1)(λγ [m|γbcθ)(θγcu2)(λγ |n]γdeθ) X2030 = − i36k nX̃2030 , X̃2030 = (λγaθ)(θγau1)(λγ [mu2)(λγ n]γbcθ)(θγc X2012 = − i12k ζ3c k ζ4e X̃2012 , X̃2012 = (λγaθ)(θγau1)(λγ [mu2)(λγ n]γbcθ)(θγn X ′2111 = ζ3c k 2111 , X̃ 2111 = (λγaθ)(θγau1)(λγ [m|γbcθ)(λγ|n]γdeθ)(θγnu2) – 10 – (The numerical coefficient in X ′2111 includes a sign coming from the θ, û ordering: there is an odd number of θs between u1 and u2.) Evaluating these expressions as outlined in section 2.2, the spinor wavefunctions ui present no complication. The last part takes the simplest form: One finds (λγaθ)(θγau1)(λγ mγbcθ)(λγnγdeθ)(θγnu2) = − 1 (2δbcm[d(u1γe]u2) + δ m(u1γ c]deu2)) and therefore X̃ ′2111 = − 1480 δ[bm(u1γ c]γdeu2) + δ m(u1γ e]γbcu2) The result for X̃4010 is X̃4010 = δbqmn(u1γ cu2)− 190δ mq(u1γ nu2) + δbcmn(u1γ qu2)− 12520δ q (u1γ bcnu2) δbq(u1γ cmnu2) + δbm(u1γ cnqu2) + bcmnqu2) [bc][mn] For the evaluation of X̃2210, it is useful to consider the more general correlator (λγaθ)(θγau1)(λγ [m|γbcθ)(λγ|n]γdeθ)(θγxu2) mn(u1γ cu2) + . . . 
201600 δmx (u1γ bcdenu2) + · · · − 11403200 (u1γ bcdemnxu2) [mn][bc][de] (27 terms) 9676800 εbcdemni1i2i3i4(u1γ i1i2i3i4xu2)− 12419200εbcdemnxi1i2i3(u1γ i1i2i3u2) . This time, even using the method of section 2.2, there are sufficiently many open indices and long enough traces for epsilon tensors to appear. Using eqs. (2.11) and (2.12), they can be re-written into γ[5,7] terms: (λγaθ)(θγau1)(λγ [m|γbcθ)(λγ|n]γdeθ)(θγxu2) mn(u1γ cu2) + . . . 16800 δmx (u1γ bcdenu2) + · · · − 133600 (u1γ bcdemnxu2) [mn][bc][de] (27 terms) A good check on the sign of the epsilon contributions is that X̃ ′2111 is recovered when contracting with ηnx, involving a cancellation of all γ [5] terms. To obtain X̃2210, one multiplies by −ηcx: X̃2210 = δdemn(u1γ bu2) + δbdmn(u1γ eu2) + δbmde (u1γ nu2) + 20160 δdm(u1γ benu2) δbm(u1γ denu2) + 20160 δbd(u1γ emnu2) + bdemnu2) [de][mn] For the calculation of X2030 and X2012, one may first evaluate a more general correlator 〈(λγaθ)(θγau1)(λγ[mu2)(λγn]γbcθ)(θγxγdeθ)〉 and then contract with ηcx and ηnx, respec- tively. The results are X̃2030 = δdemn(u1γ bu2) + δbdmn(u1γ eu2)− 11440δ de (u1γ nu2)− 1710080δ m(u1γ benu2) 10080 δbm(u1γ denu2)− 11440δ d(u1γ emnu2) + bdemnu2) [mn][de] X̃2012 = δdebm(u1γ cu2) + δbcdm(u1γ eu2)− 11440δ de(u1γ mu2) + δdm(u1γ bceu2) 10080 δbm(u1γ cdeu2) + 10080 δbd(u1γ cemu2)− 13360 (u1γ bcdemu2) [bc][de] – 11 – After multiplication with the momenta and polarisations, all individual contributions are gauge invariant and can be expanded in the basis C1 . . . 
C5 listed in (A.6): (1 + π34)X4010 = 483840 (−6,−16,−40, 6, 0)C1 ...C5 (1 + π34)X2210 = 483840 (−18,−104,−176, 18, 0)C1 ...C5 (1 + π34)X2030 = 483840 (−21, 42,−42, 21, 0)C1 ...C5 (1 + π34)X2012 = 483840 (−39, 78,−78, 39, 0)C1 ...C5 X ′2111 = − i11520 (1, 0, 4,−1, 0)C1 ...C5 The sum can be written as K2B2F1-loop = X 2111 = − i3840 (1, 0, 4,−1, 0)C1 ...C5 = − i s13(u2/ζ3(/k2 + /k3)/ζ4u1) + s23(u2/ζ4(/k2 + /k4)/ζ3u1) (3.6) and again agrees with the amplitude computed in the RNS result, see [16] eq. (3.37). 4. Two-loop amplitudes The pure spinor formalism was used in [4, 2] to compute the two-loop type-IIB amplitude involving four massless states, d2Ω11d 2Ω12d i,j ki · kj G(zi, zj) (det ImΩ)5 K2-loop(ki, zi) , where Ω is the genus-two period matrix, and the integration over fermionic zero modes is encapsulated in K2-loop = ∆12∆34 (λγmnpqrλ)(λγsW1)F2,mnF3,pqF4,rs perm(1234) (4.1) ≡ ∆12∆34K12 +∆13∆24K13 +∆14∆23K14 . (4.2) The kinematic factors K12, K13, K14 are accompanied by the basic antisymmetric biholo- morphic 1-form ∆, which is related to a canonical basis ω1, ω2 of holomorphic differentials via ∆ij = ∆(zi, zj) = ω1(zi)ω2(zj) − ω2(zi)ω1(zj). The superfields Wαi and Fi,mn are the spinor and vector field strengths of the i-th external state, as in section 3. One encounters superspace integrals of the form Y (abcd) = (λγmnpqrλ)(λγsWa)Fb,mnFc,pqFd,rs . (4.3) The symmetries of the λ3 combination [4] in this correlator include the obvious symmetry under mn↔ pq, and also (λγ[mnpqrλ)(λγs])α = 0 (this holds for pure spinors λ and can be seen by dualising, and holds for unconstrained spinors λ as part of a λ3θ5 scalar, as seen from the representation content (2.8)), and allow one to shuffle the F factors: Y (abcd) = Y (acbd) , Y (abcd) + Y (acdb) + Y (adbc) = 0 . (4.4) – 12 – 4.1 Review: four bosons The case of four Neveu-Schwarz states was considered in [6] and will be briefly reviewed here. 
As all three kinematic factors K12, K13 and K14 are equivalent, it is sufficient to consider K12 in detail. With all external states being identical, the symmetrisations of (4.1) can be carried out at the end of the calculation: K4B12 = 4 W[1F2]F[3F4] W[3F4]F[1F2] = (1− π12)(1− π34)(1 + π13π24) W1F2F3F4 Expanding the superfields and adopting the notation YABCD(abcd) = (λγmnpqrλ)(λγsW (A)a )F F (C)c,pqF the Neveu-Schwarz states come from terms of the form YABCD ≡ YABCD(1234) with A odd and B, C, D even. Using the shuffling identities (4.4) to simplify, one obtains W1F2F3F4 = Y5000 + Y1400 + Y1040 + Y1004 + Y3200 + Y3020 + Y3002 + Y1220 + Y1202 + Y1022 = (1 + π23)(1− π24) Y5000 + Y1400 + Y3200 + Y1022 and therefore K4B12 can be written as the image of a symmetrisation operator S4B: K4B12 = S4B Y5000 + Y1400 + Y3200 + Y1022 S4B = (1− π12)(1− π34)(1 + π13π24)(1 + π23)(1− π24) It is worth noting at this point that, on the sixteen-dimensional space of Lorentz scalars built from the four field strengths Fi and two momenta, the symmetriser S4B has rank four. The correlators were computed in [6], using the method outlined in section 2.1. Two are zero, Y5000 = Y1400 = 0, and the remaining ones are Y3200 = (λγmnpqrλ)(λγsγabθ)(θγb cdθ)(θγn Y1022 = F 1abF (λγmnpq[rλ)(λγs]γabθ)(θγq cdθ)(θγs In reducing those two contributions to a set of independent scalars, one finds that they both are not just sums of (k · k)F 4 terms but also contain terms of the form k · F terms. The latter are projected out by the symmetriser S4B, and the result is K4B12 = S4B(Y3200 + Y1022) = 1120 (s13 − s23) 4Tr(F(1F2F3F4))− Tr(F(1F2)Tr(F3F4)) (s13 − s23)t8F 4 . By trivial index exchange, one obtains K13 and K14, and the total is K4B2-loop = (s13 − s23)∆12∆34 + (s12 − s23)∆13∆24 + (s12 − s13)∆14∆23 4 , (4.5) a product of the completely symmetric one-loop kinematic factor t8F 4 and a completely symmetric combination of the momenta and the ∆ij. 
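The complete symmetry of the momentum and ∆ combination in (4.5) can be tested numerically. The following Python sketch is an illustration under stated assumptions: the Mandelstam-type variables obey s12 = s34, s13 = s24, s14 = s23 and s12 + s13 + s23 = 0 (massless momenta with momentum conservation), and the values of the holomorphic differentials ω1, ω2 at the four insertion points are replaced by random integers, which suffices for checking a polynomial identity. All function names are hypothetical.

```python
from itertools import permutations
import random


def mandelstam_table(s12, s13):
    """s_ij for four massless momenta with k1 + k2 + k3 + k4 = 0."""
    s23 = -s12 - s13
    return {
        frozenset((1, 2)): s12, frozenset((3, 4)): s12,
        frozenset((1, 3)): s13, frozenset((2, 4)): s13,
        frozenset((2, 3)): s23, frozenset((1, 4)): s23,
    }


def combination(labels, s, w1, w2):
    """Bracket of (4.5) with the labels 1234 replaced by `labels`."""
    a, b, c, d = labels

    def S(i, j):
        return s[frozenset((i, j))]

    def D(i, j):
        # Delta_ij = omega1(z_i) omega2(z_j) - omega2(z_i) omega1(z_j)
        return w1[i] * w2[j] - w2[i] * w1[j]

    return ((S(a, c) - S(b, c)) * D(a, b) * D(c, d)
            + (S(a, b) - S(b, c)) * D(a, c) * D(b, d)
            + (S(a, b) - S(a, c)) * D(a, d) * D(b, c))


def is_fully_symmetric(seed=0):
    """Compare all 24 relabellings for one random sample of the inputs."""
    rng = random.Random(seed)
    s = mandelstam_table(rng.randint(-9, 9), rng.randint(-9, 9))
    w1 = {i: rng.randint(-9, 9) for i in range(1, 5)}
    w2 = {i: rng.randint(-9, 9) for i in range(1, 5)}
    reference = combination((1, 2, 3, 4), s, w1, w2)
    return all(combination(p, s, w1, w2) == reference
               for p in permutations((1, 2, 3, 4)))


print(all(is_fully_symmetric(seed) for seed in range(20)))
```

Integer samples keep the comparison exact, so no numerical tolerance is needed.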
– 13 – 4.2 Four fermions The calculation involving four Ramond states is very similar to the bosonic one. Focussing on the K12 part, the symmetrisations in (4.1) can again be rewritten as action of sym- metrisation operators on the correlator of superfields with one particular labelling: K4F12 (ûi) = (1− π12)(1 − π34)(1 + π13π24) W1F2F3F4 û1û2û3û4 = 4(1− π12) W1F2F3F4 û1û2û3û4 The last step follows from the fact that all scalars of the form k4u4 (see appendix A.2), and therefore all k4û4 scalars, are invariant under π13π24 and have π12 = π34. This time, on expanding the superfields, one collects the terms YABCD with A even and B, C, D odd. After using (4.4) to simplify, W1F2F3F4 û1û2û3û4 = Y2111 + Y0311 + Y0131 + Y0113 = (1 + π23)(1− π24) Y2111 + Y0311 and after translating to commuting wavefunctions ui, which multiplies every permutation operator with its signature, one obtains K4F12 (ui) = S4F Y2111(ui) + Y0311(ui) , S4F = 4(1 + π12)(1− π23)(1 + π24) . This symmetriser has rank three, and the result is again not determined by symmetry. Two correlators have to be computed: Y2111(ui) = (−2)k1ak2mk3pk4r (λγmnpq[rλ)(λγs]γabθ)(θγbu1)(θγnu2)(θγqu3)(θγsu4) Y0311(ui) = (−23)k (λγmnpq[rλ)(λγs]u1)(θγn abθ)(θγbu2)(θγqu3)(θγsu4) With four fermions present, the method of section 2.2 is preferred as it does not involve re- arranging the fermions using Fierz identities. The first correlator was covered as an example in that section, and the second one can be evaluated in the same fashion. Expressed in the basis listed in (A.5), the results are Y2111(ui) = (−19,−21, 21, 19,−17,−17, 0, 0, 0, 0)B1 ...B10 , Y0311(ui) = 15120 (−14,−16, 0, 2,−8,−8, 0,−5,−5, 0)B1 ...B10 . 
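The quoted value of Y0311 can be compared with the example of section 2. The sketch below assumes that the prefactor of Y0311 is −2/3 and that its correlator, including the contraction with the momenta, is the quantity F whose decomposition is given in (2.17); under these assumptions Y0311 should equal −2/3 times F. Variable names are illustrative.

```python
from fractions import Fraction

# F from (2.17), expressed in the basis B1 ... B10.
F_EXAMPLE = [Fraction(c, 10080) for c in (14, 16, 0, -2, 8, 8, 0, 5, 5, 0)]

# Y0311 as quoted in this section.
Y0311_QUOTED = [Fraction(c, 15120)
                for c in (-14, -16, 0, 2, -8, -8, 0, -5, -5, 0)]

# Assumed relation: Y0311 = (-2/3) * F.
Y0311_FROM_F = [Fraction(-2, 3) * c for c in F_EXAMPLE]

print(Y0311_FROM_F == Y0311_QUOTED)
```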
After acting with the symmetriser S4F, one obtains the same u4 scalar encountered in the one-loop amplitude, K4F12 (ui) = S4F(13Y2111(ui) + Y0311(ui)) = (−1,−2, 1, 2,−1,−2, 0, 0, 0, 0)B1 ...B10 (s23 − s13) (u1/k3u2)(u3/k1u4)− (u1/k2u3)(u2/k1u4) + (u1/k2u4)(u2/k1u3) The K13 and K14 parts again follow by index exchange, and the total result K4F2-loop(ui) = (s23 − s13)∆12∆34 + (s23 − s12)∆13∆24 + (s13 − s12)∆14∆23 (u1/k3u2)(u3/k1u4)− (u1/k2u3)(u2/k1u4) + (u1/k2u4)(u2/k1u3) (4.6) is again a simple product of the one-loop kinematic factor and a combination of the ∆ij and momenta. – 14 – 4.3 Two bosons, two fermions As in the one-loop calculation of section 3.3, in the mixed case one has to pay some attention to the permutations in (4.1) since they affect which superfields contribute fermionic fields. The complete symmetrisation makes it irrelevant which labels are assigned to the two fermions, and the convention f1f2b3b4 will be used here. The kinematic factor K 12 is then distinguished from the other two, K2B2F13 and K 14 . Carrying out the symmetrisations in (4.1) and using the identities (4.4), one finds K12(û1, û2, ζ3, ζ4) = (1− π12)(1− π34)K̃ , K13(û1, û2, ζ3, ζ4) = (2 · 1+ π12 + π34 + 2π12π34)K̃ , K14(û1, û2, ζ3, ζ4) = (1+ 2π12 + 2π34 + π12π34)K̃ , where, schematically, (even) (odd) (even) (even) (odd) (even) (odd) (odd) . (4.7) In translating to commuting variables u1 and u2, the permutation operator π12 changes sign, and therefore2 K12(u1, u2, ζ3, ζ4) = (1+ π12)(1− π34)K̃ , K13(u1, u2, ζ3, ζ4) = (2 · 1− π12 + π34 − 2π12π34)K̃ , K14(u1, u2, ζ3, ζ4) = (1− 2π12 + 2π34 − π12π34)K̃ . 
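The effect of passing to commuting wavefunctions on the three symmetrisation operators can be reproduced with a few lines of Python. The sketch represents an operator as a dictionary from powers of π12 and π34 to coefficients, using only that both are involutions and that they commute because they act on disjoint labels; the representation and the function names are illustrative choices, not taken from the original calculation.

```python
def multiply(op_a, op_b):
    """Multiply two operators built from pi12 and pi34.

    An operator is a dict {(a, b): coefficient}, where a and b are the
    powers (0 or 1) of pi12 and pi34 respectively.
    """
    product = {}
    for (a1, b1), coeff_a in op_a.items():
        for (a2, b2), coeff_b in op_b.items():
            key = ((a1 + a2) % 2, (b1 + b2) % 2)
            product[key] = product.get(key, 0) + coeff_a * coeff_b
    return {key: c for key, c in product.items() if c != 0}


def to_commuting(op):
    """Flip the sign of every term that contains pi12."""
    return {(a, b): (-c if a == 1 else c) for (a, b), c in op.items()}


ONE, PI12, PI34, PI12_PI34 = (0, 0), (1, 0), (0, 1), (1, 1)

# Operators acting on the anticommuting wavefunctions.
K12_hat = multiply({ONE: 1, PI12: -1}, {ONE: 1, PI34: -1})
K13_hat = {ONE: 2, PI12: 1, PI34: 1, PI12_PI34: 2}
K14_hat = {ONE: 1, PI12: 2, PI34: 2, PI12_PI34: 1}

# Operators acting on the commuting wavefunctions u1, u2.
K12 = to_commuting(K12_hat)
K13 = to_commuting(K13_hat)
K14 = to_commuting(K14_hat)

print(K12, K13, K14)
```

The three results coincide with the operators listed for the commuting variables above.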
Expanding the superfields, the contributions to K̃ are: Y4100 = − i48k (λγmnpqrλ)(λγsγabθ)(θγbγ cdθ)(θγcu1)(θγnu2) Y0500 = (λγmnpqrλ)(λγsu1)(θγn abθ)(θγb cdθ)(θγdu2) Y0140 = (λγmnpqrλ)(λγsu1)(θγnu2)(θγq abθ)(θγb Y0104 = (λγmnpqrλ)(λγsu1)(θγnu2)(θγ|s] abθ)(θγb Y2300 = (λγmnpqrλ)(λγsγabθ)(θγbu1)(θγn cdθ)(θγeu2) Y2120 = (λγmnpqrλ)(λγsγabθ)(θγbu1)(θγnu2)(θγq Y2102 = (λγmnpqrλ)(λγsγabθ)(θγbu1)(θγnu2)(θγ|s] Y0320 = (λγmnpqrλ)(λγsu1)(θγn abθ)(θγbu2)(θγq Y0302 = (λγmnpqrλ)(λγsu1)(θγn abθ)(θγbu2)(θγ|s] Y0122 = (λγmnpqrλ)(λγsu1)(θγnu2)(θγq abθ)(θγs] Y3011 = (λγmnpqrλ)(λγsγabθ)(θγb cdθ)(θγcu1)(θγnu2) Y1211 = F 3abk (λγmnpqrλ)(λγsγabθ)(θγn cdθ)(θγqu1)(θγ|s]u2) Y1031 = F 3abF (λγmnpqrλ)(λγsγabθ)(θγq cdθ)(θγdu1)(θγ|s]u2) Y1013 = F 3abF (λγmnpqrλ)(λγsγabθ)(θγqu1)(θγ|s] cdθ)(θγdu2) 2This sign change is crucial to avoid the erroneous conclusion that the two-boson, two-fermion kinematic factor cannot be of the same product form as in the four-boson or four-fermion cases, which would be in contradiction to the supersymmetric identities derived in [18]. – 15 – These correlators can be evaluated exactly as described in section 3.3. One finds that Y0500 = Y0140 = Y0104 = 0, and the sum of the remaining terms reduces to K̃ = Y4100 + Y2300 + Y2120 + Y2102 + Y0320 + Y0302 + Y0122 + Y3011 + Y1211 + Y1031 + Y1013 (s12 + s13)× (1, 0, 4,−1, 0)C1 ...C5 . After applying the symmetrisation operators, (1+ π12)(1− π34)K̃ = i180 (s12 + 2s13)× (1, 0, 4,−1, 0)C1 ...C5 , (2 · 1− π12 + π34 − 2π12π34)K̃ = i180 (2s12 + s13)× (1, 0, 4,−1, 0)C1 ...C5 , (1− 2π12 + 2π34 − π12π34)K̃ = i180 (s12 − s13)× (1, 0, 4,−1, 0)C1 ...C5 , the total kinematic factor is seen to be K2-loop(u1, u2, ζ3, ζ4) = − i180 (s23−s13)∆12∆34+(s23−s12)∆12∆34+(s13−s12)∆12∆34 × (1, 0, 4,−1, 0)C1 ...C5 (4.8) and displays the same simple product form as in the four-boson and four-fermion case. 5. 
Discussion

In this paper, different methods were discussed to efficiently evaluate the superspace integrals appearing in multiloop amplitudes derived in the pure spinor formalism. Extending previous calculations [6, 7], which were restricted to Neveu-Schwarz states, it was then shown that the treatment of Ramond states poses no additional difficulties. While the bosonic calculations of [6, 7] have, in conjunction with supersymmetry, already established the equivalence of the massless four-point amplitudes derived in the pure spinor and RNS formalisms, it would be interesting to make contact between the results of sections 4.2 / 4.3 and two-loop amplitudes involving Ramond states as computed in the RNS formalism (see for example [19]).

The assistance of a computer algebra system seems indispensable in explicitly evaluating pure spinor superspace integrals. To avoid excessive use of custom-made algorithms, it would be desirable to implement these calculations in a wider computational framework particularly adapted to field theory calculations [20]. The methods outlined in this paper should be easily applicable to future higher-loop amplitude expressions derived from the pure spinor formalism, and, it is hoped, to other superspace integrals.

Acknowledgements

The author would like to thank Louise Dolan for discussions, and Carlos Mafra for valuable correspondence. This work is supported by the U.S. Department of Energy, grant no. DE-FG01-06ER06-01, Task A.

A. Reduction to kinematic bases

In calculating scattering amplitudes one encounters kinematic factors which are Lorentz invariant polynomials in the momenta, polarisations and/or spinor wavefunctions of the scattered particles. It can be a non-trivial task to simplify such expressions, taking into account the on-shell identities $\sum_i k_i = 0$, $k_i^2 = 0$, $k_i \cdot \zeta_i = 0$, $\not k_i u_i = 0$, and, in the case of fermions, re-arrangements stemming from Fierz identities.
More generally, one would like to know how many independent combinations of some given fields (subject to on-shell identities) there are, and how to reduce an arbitrary expression with respect to some chosen basis. This appendix outlines methods to address these problems, with an emphasis on algorithms which can easily be transferred to a computer algebra system. These methods are not limited to pure spinor calculations, but the scope will be restricted to amplitudes of four massless vector or spinor particles in ten dimensions.

A.1 Four bosons

It is not difficult to reduce polynomials in the momenta and polarisations to a canonical form. The momentum conservation constraint $\sum_i k_i = 0$ is solved by eliminating one momentum (for example $k_4$), all $k_i^2$ are set to zero, and one of the three remaining quadratic combinations of momenta is eliminated (for example $s_{23} \to -s_{12} - s_{13}$, where $s_{ij} \equiv k_i \cdot k_j$). Then all products $k_i \cdot \zeta_i$ are set to zero, and one extra $k \cdot \zeta$ product is replaced (when eliminating $k_4$, the replacement is $k_3 \cdot \zeta_4 \to (-k_1 - k_2) \cdot \zeta_4$). The remaining monomials are then independent. (This is at least the case with the low powers of momenta encountered in the calculations of sections 3 and 4, where there are enough spatial directions for all momenta/polarisations to be linearly independent.) The implementation of these reduction rules on a computer is straightforward.

The easiest way to obtain scalars which are also invariant under the gauge symmetry $\zeta_i \to \zeta_i + k_i$ is to start with expressions constructed from the field strengths $F_i^{ab} = 2\,\partial^{[a} \zeta_i^{b]}$. For the one-loop calculations of section 3.1, the relevant basis consists of gauge invariant scalars containing only the four field strengths $F_1 \ldots F_4$. One finds six independent combinations,

Tr(F1F2F3F4)       Tr(F1F2F4F3)       Tr(F1F3F2F4)
Tr(F1F2)Tr(F3F4)   Tr(F1F3)Tr(F2F4)   Tr(F1F4)Tr(F2F3)

In the two-loop calculations of section 4.1, all monomials have two more momenta.
There are sixteen independent gauge invariant scalars of the form kkF1F2F3F4, and twelve of them may be constructed from the previous basis by multiplication with s12 and s13: A1 = s12 Tr(F1F2F3F4), A2 = s13 Tr(F1F2F3F4), etc. One choice for the additional four is A13 = k3 · F1 · F2 · k3 Tr(F3F4) A15 = k3 · F1 · F4 · k2 Tr(F2F3) A14 = k4 · F1 · F3 · k2 Tr(F2F4) A16 = k4 · F2 · F3 · k4 Tr(F1F4) . – 17 – As an example application of the computer algorithms, one may check that the symmetri- sation operator of section 4.1, S4B = (1− π12)(1− π34)(1 + π13π24)(1 + π23)(1− π24) , acts as S4BA1 = 8A1 + 4A2 − 4A3 + 4A4 + 8A5 + 16A6 . . . S4BA16 = −6A1 + 6A3 − 6A5 − 12A6 + 32A7 + 3A8 + A9 + 3A10 + A11 + 3A12 and has rank four. A.2 Four fermions In dealing with the spinor wavefunctions ui one has to face two issues: Fierz identities, and the Dirac equation. Fierz identities not only allow one to change the order of the spinors but also give rise to relations between different expressions in one spinor order. The Dirac equation often simplifies terms with momenta contracted into (uiγ [n]uj) bilinears. In this section it is shown how to construct bases for terms of the form (k2 or k4) × u1u2u3u4. A significant simplification comes from noting that the Dirac equation allows one to rewrite (uiγ [n]uj) bilinears into terms with lower n if more than one momentum is contracted into the γ[n]. A good first step is therefore to disregard the momenta temporarily and find all independent scalars and two-index tensors built from u1, . . . , u4. From the SO(10) representation content, (S+)⊗4 = 2 · 1+ 6 · + 3 ·˜+ (tensors with rank > 2) , one expects two scalars and nine 2-tensors. The scalars are easily found by considering, as in [21], T1(1234) = (u1γ au2)(u3γau4) , T3(1234) = (u1γ abcu2)(u3γabcu4) . and similarly for the other two inequivalent orders of the four spinors. (Note there is no T5 because of self-duality of the γ[5].) 
From Fierz transformations, one learns that all T3 terms can be reduced to T1 by T3(1234) = −12T1(1234)− 24T1(1324) and permutations, and the identity (γa)(αβ(γ a)γ)δ = 0 implies that T1(1234) + T1(1324) + T1(1423) = 0, leaving for example T1(1234) and T1(1324) as independent scalars. Generalising this approach to two-index tensors, it turns out that it is sufficient to start with T11(1234) = (u1γ mu2)(u3γ nu4) , T31(1234) = (u1γ aγmγnu2)(u3γau4) , T33(1234) = (u1γ abγmu2)(u3γabγ nu4) , – 18 – and permutations of the spinor labels. It would be very tiresome to systematically apply a variety of Fierz transformations by hand and to find an independent set. Fortunately, by choosing a gamma matrix representation (such as the one listed in appendix B) and reducing all expressions to polynomials in the independent spinor components u1i , . . . , u this problem can be solved with computer help. As expected, one finds that the Tij(abcd) span a nine-dimensional space, and a basis can be chosen as T11(1234), T11(1324), T11(1423), T11(3412), T11(2413), T11(2314), T31(1234), T31(1324), T31(2314) . (A.1) A typical relation reducing the other Tij(abcd) to this basis is T31(3412) = 2T11(1234) − 2T11(3412) + T31(1324) + T31(2314) + 2ηmnT1(1234) . (A.2) Having solved the first step, it is now easy to include the two or four momenta, taking the Dirac equation into account. Consider first the case of two momenta. Starting from the two-tensors in (A.1), one gets the three independent scalars (u1/k3u2)(u3/k1u4) , (u1/k2u3)(u2/k1u4) , (u1/k2u4)(u2/k1u3) . In addition, there are four products of the two independent scalars T1(1234) and T1(1324) with the two independent momentum invariants s12 and s13. By contracting (A.2) with momenta, one can show that s12T1(1324) − s13T1(1234) = −(u1/k3u2)(u3/k1u4) + (u1/k2u3)(u2/k1u4)− (u1/k2u4)(u2/k1u3) , (A.3) and this relation can be used to eliminate s12T1(1324). 
(It will become clear later that there are no other independent relations of this kind.) There are thus six independent $k^2 u_1 \cdots u_4$ scalars:

$(u_1 \not k_3 u_2)(u_3 \not k_1 u_4)$      $s_{12}\, T_1(1234)$
$(u_1 \not k_2 u_3)(u_2 \not k_1 u_4)$      $s_{13}\, T_1(1234)$      (A.4)
$(u_1 \not k_2 u_4)(u_2 \not k_1 u_3)$      $s_{13}\, T_1(1324)$

Note that there is only one completely antisymmetric combination of those, given by the right hand side of (A.3). Similarly, in the case of four momenta, one finds ten independent $k^4 u_1 \cdots u_4$ scalars:

$B_1 = s_{12}\, (u_1 \not k_3 u_2)(u_3 \not k_1 u_4)$      $B_2 = s_{13}\, (u_1 \not k_3 u_2)(u_3 \not k_1 u_4)$
$B_3 = s_{12}\, (u_1 \not k_2 u_3)(u_2 \not k_1 u_4)$      $B_4 = s_{13}\, (u_1 \not k_2 u_3)(u_2 \not k_1 u_4)$
$B_5 = s_{12}\, (u_1 \not k_2 u_4)(u_2 \not k_1 u_3)$      $B_6 = s_{13}\, (u_1 \not k_2 u_4)(u_2 \not k_1 u_3)$      (A.5)
$B_7 = s_{12}^2\, T_1(1234)$      $B_8 = s_{12} s_{13}\, T_1(1234)$
$B_9 = s_{13}^2\, T_1(1234)$      $B_{10} = s_{13}^2\, T_1(1324)$

Working in a gamma matrix representation, it is again simple to construct a computer algorithm which reduces any given $k^2 u_1 \cdots u_4$ or $k^4 u_1 \cdots u_4$ scalar to a polynomial in the spinor and momentum components. The Dirac equation can then be solved by breaking up the sixteen-component spinors $u_i$ into eight-dimensional chiral spinors $u_i^s$ and $u_i^c$, as in eq. (B.1). One obtains polynomials in the momentum components $k_i^a$ and the independent spinor components $(u_i^c)^{1 \ldots 8}$. However, a great disadvantage of this procedure is that it breaks manifest Lorentz invariance. For example, one encounters expressions which contain subsets of terms proportional to the square of a single momentum and are therefore equal to zero, but it is difficult to recognise this with a simple algorithm. The easiest solution is to choose several sets of particular vectors $k_i$ satisfying $k_i^2 = 0$ and $\sum_i k_i = 0$ and to evaluate all expressions on these vectors. (By choosing integer arithmetic, one easily avoids issues of numerical accuracy.) Substituting these sets of momentum vectors in the bases (A.4) and (A.5) gives full rank six and ten respectively, showing that they are indeed linearly independent.
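The idea of testing linear independence on exact, explicitly chosen light-like momenta can be illustrated with a minimal Python sketch. It is not the implementation used for the results above: the helper names (`null`, `massless_four_point`, `rank`) and the particular vectors are assumptions made for illustration, and only the scalar invariants $s_{ij}$ are tested, since the spinor bases (A.4) and (A.5) would additionally need an explicit gamma matrix representation. Rational arithmetic (`fractions.Fraction`) stands in for the integer arithmetic mentioned in the text; the vectors could be rescaled to integers.

```python
from fractions import Fraction

DIM = 10  # one time-like and nine space-like components


def dot(a, b):
    """Minkowski product with signature (+, -, ..., -)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))


def null(energy, *space):
    """Build a ten-component vector, padded with zeros, and check k^2 = 0."""
    v = [energy, *space] + [0] * (DIM - 1 - len(space))
    assert dot(v, v) == 0
    return v


def massless_four_point(k1, k2, q):
    """Complete null vectors k1, k2 to four null momenta summing to zero.

    k3 is taken proportional to an auxiliary null vector q, and k4 is
    fixed by momentum conservation; the factor lam makes k4 light-like.
    """
    P = [a + b for a, b in zip(k1, k2)]
    lam = Fraction(-dot(P, P), 2 * dot(q, P))
    k3 = [lam * x for x in q]
    k4 = [-(p + x) for p, x in zip(P, k3)]
    return [[Fraction(x) for x in k1], [Fraction(x) for x in k2], k3, k4]


def on_shell(ks):
    """True if every k_i is light-like and the momenta sum to zero."""
    massless = all(dot(k, k) == 0 for k in ks)
    conserved = all(sum(k[i] for k in ks) == 0 for i in range(DIM))
    return massless and conserved


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Integer light-like vectors (illustrative choices).
N1 = null(3, 1, 2, 2)
N2 = null(3, 2, -1, 0, 2)
N3 = null(5, 0, 3, 0, 0, 4)
N4 = null(7, 2, 0, 3, 0, 0, 6)
N5 = null(9, 1, 0, 0, 4, 0, 0, 8)
N6 = null(2, 1, 1, 1, 1)

MOMENTUM_SETS = [
    massless_four_point(N1, N2, N3),
    massless_four_point(N1, N3, N4),
    massless_four_point(N2, N4, N5),
    massless_four_point(N1, N5, N6),
]


def invariants(ks):
    """Return (s12, s13, s23) with s_ij = k_i . k_j."""
    return dot(ks[0], ks[1]), dot(ks[0], ks[2]), dot(ks[1], ks[2])


# One row per momentum set: candidate monomials evaluated on that set.
linear_rows = [list(invariants(ks)) for ks in MOMENTUM_SETS]
quadratic_rows = [
    [s12 * s12, s12 * s13, s13 * s13]
    for s12, s13, _ in (invariants(ks) for ks in MOMENTUM_SETS)
]

if __name__ == "__main__":
    print("all sets on shell:", all(on_shell(ks) for ks in MOMENTUM_SETS))
    print("rank of {s12, s13, s23}:", rank(linear_rows))
    print("rank of {s12^2, s12 s13, s13^2}:", rank(quadratic_rows))
```

The rank of the first matrix exposes the relation $s_{23} = -s_{12} - s_{13}$ (three candidates, only two independent), while the quadratic monomials in $s_{12}$ and $s_{13}$ come out with full rank. The same pattern, with more columns, would apply to candidate bases such as (A.4) and (A.5).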
Equipped with a computer algorithm for these basis decompositions, one finds, for example, that the symmetriser S4F of section 4.2, S4F = 4(1 + π12)(1 − π23)(1 + π24) , acts on the B1 . . . B10 basis as S4FB1 = −12B4 + 12B5 + 12B6 , . . . S4FB10 = 8B1 + 16B2 − 8B3 − 16B4 + 8B5 + 16B6 − 24B7 − 24B8 − 24B9 and has rank three. A.3 Two bosons, two fermions The combined methods of the last two sections can easily be extended to the mixed case of two bosons and two fermions. In the one-loop calculation of section 3.3, one encounters scalars of the form ku1u2F3F4. A basis of such objects is given by C1 = (u1γ au2)k C2 = (u1γ au2)F C3 = (u1γ au2)F c (A.6) C4 = (u1γ abcu2)F C5 = (u1γ abcu2)F There are two combinations antisymmetric in [12] and symmetric in (34): −C1 + 4C2 +C4 and C2 + C3 . Finally, there are ten independent scalars of the form k3u1u2F3F4 (relevant to the two-loop calculation of section 4.3), and they can all be obtained by multiplication of C1 . . . C5 with the two momentum invariants s12 and s13. – 20 – B. A gamma matrix representation A convenient representation of the SO(1,9) gamma matrices is given by the 32×32 matrices 0 (γa)αβ (γa)αβ 0 where (γ0)αβ = 116 = (γ 0)αβ , (γ9)αβ = −18 0 = −(γ9)αβ , and (γa)αβ = −(γa)αβ, a = 1 . . . 8, is a real, symmetric 16×16 representation for the SO(8) Clifford algebra, (γa)αβ = (σa)T 0 , a = 1 . . . 8 , as given in appendix 5.B of [21]. The matrices Γa satisfy the SO(1,9) Clifford algebra relations, {Γa,Γb} = 2ηab132 , ηab = (+−− · · · −) , and bilinears of chiral spinors (with, say, positive chirality) are constructed as (uΓ[a1...ak]v) = (uγ[a1...ak ]v) = uα(γ[a1)αβ(γ a2)βγ . . . 
(γak ])γδv This representation is particularly suitable for the calculations outlined in appendix A because it allows a simple decomposition of SO(1,9) spinors into SO(8) spinors due to its block structure: Γ0 · · ·Γ9 = 116 0 0 −116 , Γ1 · · ·Γ8 = 18 0 0 0 0 −18 0 0 0 0 18 0 0 0 0 −18 Therefore, the Dirac equation for a chiral 16-component spinor u, (γa)αβ∂au α = 0 , can be solved by splitting u into two chiral eight-component spinors of SO(8), with γ1...8 One obtains the coupled equations (∂0 + ∂9)u s − (σ · ∂)uc = 0 (∂0 − ∂9)uc − (σT · ∂)us = 0 (with eight-dimensional dot products). These can be solved for us in terms of uc: (σ · ∂)uc = (σ · k)uc , (B.1) where k+ = −i∂+ = −i√ (∂0 + ∂9). – 21 – References [1] N. Berkovits, Super-Poincaré covariant quantization of the superstring, J. High Energy Phys. 04 (2000) 018 [hep-th/0001035]. [2] N. Berkovits, Multiloop amplitudes and vanishing theorems using the pure spinor formalism for the superstring, J. High Energy Phys. 09 (2004) 047 [hep-th/0406055]. [3] N. Berkovits, New higher-derivative R4 theorems [hep-th/0609006]. [4] N. Berkovits, Super-Poincaré covariant two-loop superstring amplitudes, J. High Energy Phys. 01 (2006) 005 [hep-th/0503197]. [5] N. Berkovits, Explaining pure spinor superspace [hep-th/0612021]. [6] N. Berkovits and C.R. Mafra, Equivalence of two-loop superstring amplitudes in the pure spinor and RNS formalisms, Phys. Rev. Lett. 96 (2006) 011602 [hep-th/0509234]. [7] C.R. Mafra, Four-point one-loop amplitude computation in the pure spinor formalism, J. High Energy Phys. 01 (2006) 075 [hep-th/0512052]. [8] E. D’Hoker and D.H. Phong, Two loop superstrings, 1. Main formulas, Phys. Lett. B 529 (2002) 241, [hep-th/0110247]. [9] L. Anguelova, P.A. Grassi and P. Vanhove, Covariant one-loop amplitudes in D = 11, Nucl. Phys. B 702 (2004) 269 [hep-th/0408171]. [10] G. Policastro and D. Tsimpis, R4, purified, Class. and Quant. Grav. 23 (2006) 4753 [hep-th/0603165]. [11] N. Berkovits and C.R. 
Mafra, Some superstring amplitude computations with the non-minimal pure spinor formalism, J. High Energy Phys. 11 (2006) 079 [hep-th/0607187]. [12] A. Cohen, M. van Leeuwen and B. Lisser, LiE: A Computer algebra package for Lie group computations, v. 2.2 (1998), http://wwwmathlabo.univ-poitiers.fr/~maavl/LiE/ [13] U. Gran, GAMMA: A Mathematica package for performing gamma-matrix algebra and Fierz transformations in arbitrary dimensions [hep-th/010508]. [14] H. Ooguri, J. Rahmfeld, H. Robins, J. Tannenhauser, Holography in superspace, J. High Energy Phys. 07 (2000) 045 [hep-th/0007104]. [15] P.A. Grassi, L. Tamassia, Vertex operators for closed superstrings, J. High Energy Phys. 07 (2004) 071 [hep-th/0405072]. [16] J.J. Atick and A. Sen, Covariant one loop fermion emission amplitudes in closed string theories, Nucl. Phys. B 293 (1987) 317. [17] M. B. Green and J. H. Schwarz, Supersymmetrical dual string theory. 3. Loops and renormalisation, Nucl. Phys. B 198 (1982) 441. [18] C. R. Mafra, Pure Spinor Superspace Identities for Massless Four-point Kinematic Factors, [arXiv:0801.0580 [hep-th]]. [19] C.-J. Zhu, Covariant two-loop fermion emission amplitude in closed superstring theories, Nucl. Phys. B 327 (1989) 744. [20] K. Peeters, Introducing Cadabra: A symbolic computer algebra system for field theory problems [hep-th/0701238]. [21] M.B. Green, J.H. Schwarz and E. Witten, Superstring theory. Vol. 1: Introduction, Cambridge University Press, 1987. 
ABSTRACT The pure spinor
formulation of the ten-dimensional superstring leads to manifestly supersymmetric loop amplitudes, expressed as integrals in pure spinor superspace. This paper explores different methods to evaluate these integrals and then uses them to calculate the kinematic factors of the one-loop and two-loop massless four-point amplitudes involving two and four Ramond states. <|endoftext|><|startoftext|>
Introduction
Formulation for Lifetimes of $\Xi_{cc}^{+}$, $\Xi_{cc}^{++}$, $\Omega_{cc}^{+}$
Spectator Contribution to Lifetimes of $\Xi_{cc}^{+}$, $\Xi_{cc}^{++}$, $\Omega_{cc}^{+}$
Non-spectator Contributions to Inclusive Decays of $\Xi_{cc}^{+}$, $\Xi_{cc}^{++}$, $\Omega_{cc}^{+}$
The hadronic matrix elements
Input parameters and Numerical results
Conclusion and Discussion
References

ABSTRACT In this work, we evaluate the lifetimes of the doubly charmed baryons $\Xi_{cc}^{+}$, $\Xi_{cc}^{++}$ and $\Omega_{cc}^{+}$. We carefully calculate the non-spectator contributions at the quark level, where the Cabibbo-suppressed diagrams are also included. The hadronic matrix elements are evaluated in the simple non-relativistic harmonic oscillator model. Our numerical results are generally consistent with those obtained by other authors who used the diquark model. However, all the theoretical predictions for the lifetimes are one order of magnitude larger than the upper limit set by the recent SELEX measurement. Future experiments should clarify this discrepancy; if more accurate measurements still confirm the value of the SELEX collaboration, there must be some unknown mechanism to be explored.
<|endoftext|><|startoftext|>
Introduction
Observations And Data Reduction
1991 Observations
2001 Observations
The Radial Velocities
Period Searches
Orbital Variations of the Emission Lines
Orbital Tomograms and Trailed Spectra
Spin Variations of the Emission lines
The Spin Radial Velocity Curve
Spin Tomograms and Trailed Spectra
Discussion of the Orbital and Spin Data
White dwarf and secondary masses
The revised model of EX Hya
Summary
References

ABSTRACT Results from spectroscopic observations of the Intermediate Polar (IP) EX Hya in quiescence during 1991 and 2001 are presented. Spin-modulated radial velocities consistent with an outer disc origin were detected for the first time in an IP. The spin pulsation was modulated with velocities of ~500-600 km/s. These velocities are consistent with those of material circulating at the outer edge of the accretion disc, suggesting corotation of the accretion curtain with material near the Roche lobe radius. Furthermore, spin Doppler tomograms have revealed evidence of the accretion curtain emission extending from velocities of ~500 km/s to ~1000 km/s. These findings have confirmed the theoretical model predictions of King & Wynn (1999), Belle et al. (2002) and Norton et al. (2004) for EX Hya, which predict large accretion curtains that extend to a distance close to the Roche lobe radius in this system. Evidence for an overflow stream of material falling onto the magnetosphere was observed, confirming the result of Belle et al. (2005) that disc overflow in EX Hya is present during quiescence as well as outburst. It appears that the Hβ and Hγ spin radial velocities originated from the rotation of the funnel at the outer disc edge, while those of Hα were produced by the flow of material along the field lines far from the white dwarf (narrow component) and close to the white dwarf (broad-base component), in agreement with the accretion curtain model.
<|endoftext|><|startoftext|> Introduction

We do not know what six-dimensional (2, 0) theory really is. It is believed that it can sustain solitonic self-dual strings [1], although no one today knows what a (non-Abelian) self-dual string really is. But if we break the gauge group maximally to U(1)^r, then we should be able to define the charges of these mysterious self-dual strings by the asymptotic behaviour of the U(1) gauge fields. One should expect these asymptotic U(1) fields to be (at least isomorphic with) a copy of the familiar Abelian two-form gauge potentials (with self-dual field strengths). It now seems to make sense to ask a question like: what is the dimension of the moduli space of self-dual strings of a given charge?

If the gauge group is SU(2) and is broken to U(1) by the Higgs vacuum expectation value (which should also determine the tension of the string), then the intuitive answer to this question is 4N, where N is the U(1) charge in a suitable normalization, such that N = 1 corresponds to one self-dual string. One may argue that half the supersymmetry is broken by the string. Therefore one string should sustain 4 fermionic zero modes. Since some (half) of the supersymmetry is unbroken, there should also be 4 corresponding bosonic zero modes. These are naturally identified with the translational zero modes associated with the four directions transverse to the string. Furthermore, the strings, being BPS, should be separable at no cost in energy (thus staying in the moduli space approximation). If we take them far from each other, one may suspect that we can just add 4 bosonic zero modes from each string, to get 4N bosonic zero modes in total in a configuration of N strings [2].

It would of course be nice to have a proof of this conjecture. Could it be proven if one had some index theorem? We will not provide a full solution to this problem in this Letter.
But we will make it plausible that the problem can indeed be solved by computing the index of a certain Dirac operator in loop space.

To address our index problem, we think that one can borrow the methods that Callias [3] used to prove his index theorem in odd-dimensional spaces. In our case we have an even number of dimensions (namely the four transverse directions), so it is apparent that we would have to construct a new type of index. This we do in section 3. In section 2 we recall the Callias method [3] for addressing index problems in open spaces, though we will modify Callias' regularization, using the more convergent exponential function to obtain the index as the limit

$$\lim_{s\to\infty} \mathrm{Tr}\, \gamma\, e^{-sD^2} \qquad (1)$$

(here $D^2 > 0$ and $\gamma = \mathrm{diag}(1, -1)$) rather than

$$\lim_{M^2\to 0} \mathrm{Tr}\, \gamma\, \frac{M^2}{D^2 + M^2} , \qquad (2)$$

which is the regularization that Callias used. We think that using the more convergent regularization of an exponential function is interesting in itself, as it could possibly extend the Callias index theorem to a wider class of index problems. Therefore we will devote the first part of this Letter to this subject. But let us say at once that our regulator probably has no advantages when attacking these old problems. It does not provide us with a way to count the number of zero modes in a multimonopole configuration with a non-maximally broken gauge group, where the index cannot be reliably computed due to a contribution from the continuum portion of the spectrum. What we hope, though, is that our regularization can be useful when attacking our new index problem associated with the moduli space of self-dual strings.

In section 2 we obtain the index in one and three dimensions. In three dimensions we apply this to the multimonopole moduli space and re-derive the result in [4]. A recent review article on monopoles and supersymmetry is [5]. The one- and three-dimensional index problems have also been studied in [6]. We then indicate how our method manages to reproduce the correct results in any odd number of dimensions.
In section 3 we show how one at least in principle should be able to compute the dimension of the moduli space of N self-dual strings by computing a certain index. 2 Computing the Callias index in odd-dimensional spaces For Dirac operators on open n− 1-dimensional space where n− 1 is odd, there is an index theorem by Callias [3]. This applies to Dirac equations of the form Dψ = 0 (3) where the Dirac operator D is of the form D = γiiDi + γnφ. (4) Here i = 1, ..., n− 1 and γµ ≡ (γi, γn) denote the Dirac gamma matrices, {γµ, γν} = 2δµν . (5) We define the gauge covariant derivative as iDis = i∂is +Ais and all our fields are hermitian. If n− 1 is odd, the gamma matrices can be represented as One may use the n-dimesional notation Aµ = (Ai, φ), D = γµiDµ, but one must then remember that space is really n− 1 dimensional. If n − 1 is even there is no Weyl representation of the gamma matrices (because of the inclusion of the ‘gamma-five’), and no index theorem of this form exists. We define the ‘gamma-five’ for even n as γ ≡ −i− 2 γ1···n (7) which then is hermitian, and we define the projectors (1∓ γ) . (8) In odd dimensions n− 1, the Dirac operator splits into two Weyl operators D ≡ P+DP− D† ≡ P−DP+ (9) Because P± andD are all hermitian, it follows thatD† is the hermitian conjugate of D. Also, because D is already of an off-block diagonal form, it suffices to include just one of the projectors, so we can just as well write this as D = P+D = DP− D† = P−D = DP+ (10) The index can now be defined as dimkerD − dimkerD† (11) Since kerD = ker and kerD† = ker we can express this as2 dimker − dimker = dimker . (12) where we have noted that γ = P− − P+. Callias, Weinberg and others used the regulator I(M2) = Tr D2 +M2 to obtain the index as the limit M2 → 0. In this Letter we will be slightly more general. We define Ji(x, y) ≡ tr 〈x |γγif(D)| y〉 , (14) for any function f (and of course D is not dimensionless, so D has to be accom- panied by M in a suitable way). 
Then we notice that W (x, y) ≡ (iγi∂xi + γµAµ(x) +M) 〈x |f (D)| y〉 = 〈x |f (D)| y〉 −iγi∂yi + γµAµ(y) +M where (manifestly) W (x, y) = 〈x |(D +M)f (D)| y〉 . (16) From this, we obtain the following identity ∂xi + ∂yi Ji(x, y) = 2tr 〈x |γDf(D)| y〉 +tr (Aµ(y)−Aµ(x)) 〈x |γγµf(D)| y〉 (17) In odd dimensions, the second term in the right hand side vanishes as x ap- proaches y. This can be seen as being equivalent to the statement that there is no chiral anomaly in odd dimensions (by using point-splitting and inserting a Wilson line). So we get i∂iJi(x, x) = 2tr 〈x |γDf(D)|x〉 (18) 2To see this that kerD = kerD†D we apply the definition of hermitian conjugate with respect to the inner product (ψ, χ) = dxψ†χ and the property of the norm, to 0 = (ψ,D†Dψ) = (Dψ,Dψ). If we wish to compute the index as in Eq (13), then we can take f(D) = D2 +M2 (however there is no unique choice of Ji). We then get Ji(x, y) = tr D2 +M2 −D2 +D2 +M2 D2 +M2 = −tr D2 +M2 . (20) provided = 0 (21) We will see in the next few paragraphs how one can achieve this by using a principal value prescription. The virtue of expressing Eq (13) as a total divergence, is that we then can compute the index as a boundary integral over an (n− 2)-sphere at infinity as I(M2) = dΩn−2r n−2x̂iJi(x, x). (22) where r is the radius of the sphere and dΩn−2 denotes the volume element of the unit sphere. If instead we wish to compute the index as the limit of I(s) = Tr γe−sD . (23) as s→ ∞, then we get Ji(x, y) = tr . (24) It might seem confusing that we can have a plus sign here, when we have a minus sign in Eq (20). These peculiar signs seem to be correct though. Why we can have opposite signs should be a reflection of the fact that these expressions can not be continuously connected with each other, at least not in any obvious way (like taking M to zero and s to zero. In fact s should be taken to plus infinity as M goes to zero). We will now illustrate how one can use this Ji to compute the index in odd dimensions. 
One dimension We choose our gamma matrices as , γ2 = and we have γ = iγ1γ2 = . (26) The Dirac operator reads D = iγ1∂ + γ2φ (27) We need the square of the Dirac operator, D2 = −∂2 + φ2 + γ∂φ. (28) We make the choice J1(x, y) = −tr D2 +M2 We assume that φ(x) converges towards some constant values at x = −∞ and x = +∞. That means that we may ignore ∂φ(x) for sufficiently large |x|, where we then get J1(x, x) = −tr (γγ1γ2) k2 + φ2 +M2 φ2 +M2 The index is now given by (J1(+∞)− J1(−∞)) = ±1 (31) if φ flips the sign an odd number of times when going from −∞ to +∞, and 0 otherwise. If instead we choose J(x, y) = tr then we get J(x, x) = tr (γγ1γ2) k2 + φ2 e−s(k 2+φ2) (33) If we compute the integral over k in the most natural way, then we get a result that vanishes in the limit s → ∞. Could there be another way of defining this integral, such that we do not get zero as the result? We notice that the integral A(s) ≡ e−s(k k2 + 1 for s > 0 is convergent only if we integrate k along a line in the complex plane which is such that it asymptotically is such that −π < θ < π where k = |k|eiθ. Integrating along any such line in the complex plane, we get the same value of this integral. If on the other hand we integrate over a line that asymptotically lies outside this cone, then we get a divergent integral for s > 0. But we get a convergent integral for s < 0. We then define the value of the integral for s > 0 as the analytic continuation of the same integral for s < 0. It remains to compute this convergent integral. Replacing k by ik and s by −s, we get the integral A(−s) = −i e−s(k k2 − 1 (35) We can compute its derivative ′(−s) = −i −s(k2−1) = −i s (36) The right-hand side can obviously be analytically continued to −s, and that is how we will define A(s) where the integral representation does not converge. We can then integrate up A′(s), A(∞) = A(0)− e−s = A(0)− = A(0)− π (37) and we then need to compute A(0) = i k2 − 1 We define this as the principal value. 
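The two ways of treating the integral $A(s)$ can be compared with a short numerical sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the quadrature is a plain midpoint rule, and the comparison uses the standard closed form $\int_{-\infty}^{\infty} dk\, e^{-s(k^2+1)}/(k^2+1) = \pi\, \mathrm{erfc}(\sqrt{s})$ on the real contour, which is not quoted in the text. On the real contour $A(0) = \pi$ and $A(s)$ decays to zero, whereas the rotated contour starts from the principal value of $\int dk/(k^2-1)$, which vanishes; in both cases $A(\infty) = A(0) - \pi$, so only the rotated prescription gives a non-zero limit.

```python
import math


def a_real_contour(s, n=20000):
    """A(s) = integral over real k of exp(-s (k^2 + 1)) / (k^2 + 1).

    With k = tan(theta), dk / (k^2 + 1) = d(theta) and k^2 + 1 = 1 / cos^2,
    so the integral runs over (-pi/2, pi/2). The midpoint rule never
    evaluates the integrand at the end points, where cos(theta) vanishes.
    """
    h = math.pi / n
    total = 0.0
    for j in range(n):
        theta = -math.pi / 2 + (j + 0.5) * h
        total += math.exp(-s / math.cos(theta) ** 2)
    return total * h


def pv_rotated(eps, cutoff=1e9):
    """Principal value of the integral of 1 / (k^2 - 1) over the real line.

    Uses the antiderivative F(k) = (1/2) ln|(k - 1) / (k + 1)| and cuts out
    a symmetric window of half-width eps around the poles at k = +1, -1.
    The upper limit is truncated at `cutoff` instead of infinity.
    """
    def antiderivative(k):
        return 0.5 * math.log(abs((k - 1.0) / (k + 1.0)))

    half = (antiderivative(1.0 - eps) - antiderivative(0.0)) + (
        antiderivative(cutoff) - antiderivative(1.0 + eps)
    )
    return 2.0 * half  # the integrand is even in k


if __name__ == "__main__":
    for s in (0.0, 0.25, 1.0, 4.0):
        exact = math.pi * math.erfc(math.sqrt(s))
        print(f"s = {s}: quadrature {a_real_contour(s):.10f}, closed form {exact:.10f}")

    # Rotated contour: A(0) is i times the principal value, hence zero,
    # and A(infinity) = A(0) - pi then tends to -pi.
    for eps in (1e-2, 1e-4, 1e-6):
        print(f"eps = {eps}: principal value {pv_rotated(eps):.3e}")
```

The printed principal values shrink with `eps`, consistent with $A(0) = 0$ on the rotated contour, while the real-contour values follow $\pi\,\mathrm{erfc}(\sqrt{s})$ and vanish for large $s$.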
This definition is ad hoc: we have no argument why one should define it like this. But if we accept it, then we get A(0) = 0. We conclude that we could just as well define the integral that we had as

$$\lim_{s\to\infty} \int dk\, \frac{e^{-s(k^2+1)}}{k^2+1} = -\pi . \qquad (39)$$

But this requires us to perform the integration over k in the cone where it diverges for s > 0, and then to define the integral by analytic continuation. This seems rather ad hoc. We have three rather weak arguments why one should Wick rotate. First, if we keep x − y as a small number, then we get the factor $e^{ik(x-y)}$, and this can act as a convergence factor only if we Wick rotate. (We illustrate this in the Appendix, where we compute the corresponding integral in any complex number of dimensions.) Second, it seems to be the only way that we could produce a non-trivial answer. Third, with this prescription we will manage to reproduce the right answer in any odd number of dimensions, where we can check our result against the safer regularization used by Callias.

If we compute the integral by this prescription, then we get

$$J(x, x) = \mathrm{tr}\,(\gamma\gamma_1\gamma_2) \lim_{s\to\infty} \int \frac{dk}{2\pi}\, \frac{\phi}{k^2+\phi^2}\, e^{-s(k^2+\phi^2)} = i\,\frac{\phi}{|\phi|} \qquad (40)$$

and we see that we indeed get the right answer.

Three dimensions and magnetic monopoles

The physics problem that we will consider in three dimensions is to compute the number of zero modes of the Bogomolnyi equation

$$F_{ij} = \epsilon_{ijk} D_k \phi . \qquad (41)$$

We choose the convention that our fields are hermitian. It is convenient to group the fields into a 'gauge potential'

$$A_\mu = (A_i, \phi) . \qquad (42)$$

We define $D_\mu = (D_i, \phi)$ such that $iD_\mu = i\partial_\mu + A_\mu$, and we let $G_{\mu\nu} = i[D_\mu, D_\nu]$ be the associated 'field strength'. Then the Bogomolnyi equation reads Gµν = εµνρσGρσ .
(43) Linearizing this, we get DµδAν = εµνρσDρδAσ (44) Contracting with γµν, we get (1 + γ)γµνDµδAν = 0 (45) and if we impose the background gauge condition DµδAµ = 0 (46) which is to say that zero modes are orthogonal to gauge variations with respect to the moduli space metric, then we can write this linearized equation as a Dirac equation Dψ ≡ γµDµψ = 0 (47) where ψ := (1 + γ)γµδAµ. (48) We compute D² = −D_i² + φ² + iγµνGµν (49) Inserting the Bogomolnyi configuration, and thus using the fact that Gµν is selfdual, we can write this as D² = −D_i² + φ² + (1 + γ)iγµνGµν (50) and get a vanishing theorem. Namely, dim ker DD† = 0, as DD† > 0 is strictly positive. Hence we can compute the dimension of the moduli space dim ker D ≡ dim ker D†D just by computing the index of D. To compute the index, we now wish to compute Ji(x, x) = tr γγiγkDk We assume that asymptotically φ approaches a constant value at infinity. This corresponds to a gauge choice where we have a Dirac string singularity. Some further examination reveals that we get a non-negligible contribution to Ji, for a sufficiently large two-sphere, only from the term Ji(x, x) = tr γγiγ4φ (2π)3 k2 + φ2 + 1 iγµνGµν −s(k2+φ2+ 1 iγµνGµν) We thus need to perform an integral of the form A(s) = k2 + 1 e−s(k 2+1) (53) If we choose the same prescription as we did in one dimension, then we get the result A(+∞) = π. (54) For details of such a computation we refer to appendix A. If we apply this result to the integral that we had, we get Ji(x, x) = γγiγ4φ iγµνGµν We expand the square root, iγµνGµν = φ+ iγµνGµν + ... (56) At large distance, in a charge Q monopole configuration, we find that γµνGµν = 2γkγ4(1− γ) Q (57) and so when we trace over the gamma matrices, we get Ji(x, x) = . (58) If we now, for instance, assume an SU(2) gauge group broken to U(1), then if we integrate i Ji over S², we get the index 2Q. The number of bosonic zero modes is twice the index, i.e. −4Q in our conventions [4, 5].
(2m + 1) dimensions

In 2m + 1 dimensions we get the integral A(µ) ≡ lim k2 + µ2 e−s(k 2+µ2) (59) if we use our regulator. Here µ² ≡ v² + G (60) (and G is an abbreviation for 1 iγµνGµν .) This should be compared to the integral B(µ) ≡ − lim (−1)m (k2 + v2 +M2) Gm (61) that we get using the Callias regulator.3 In order to compare these integrals, we rewrite them as A(µ) = µ^{2m−1} a, B(µ) = v^{−1} b G^m (63) where a = lim ξ2 + 1 −s̃(ξ2+1) b = − lim (−1)m ξ2 + 1 + M̃2 We compute a according to the prescription introduced above in one and three dimensions, that is, by Wick rotating ξ and continuing analytically in s. (Details are in appendix A.) We can compute b using residue calculus (introducing a regulator so that we can close the contour on a semi-circle at infinity). The result is a = −(−1)mπ b = (−1)m 1 ) (65) We next expand vA(µ) = v v2 +G )m− 1 = v2ma+ ...+ m + ... vB(µ) = bGm (66) and we find that the coefficient of G^m becomes equal to −(−1)m ) π (67) if one uses our regularization, and equal to (−1)m 1 ) π (68) if one uses the Callias regularization. We see that the two expressions coincide for all m. We have now shown that if we use our prescription of Wick rotating k to compute the integrals over the exponential, then we get the right answer for all cases that can be safely computed using a regulator that is less convergent. We are inclined to think that our prescription for how to compute the integral will also work for index problems where the Callias regulator diverges. But we have no proof. It is perhaps not so obvious that more general index problems can be formulated. In the next section we will give one example of a more general type of index problem.

3 This integral comes from expanding k2 + v2 +G+M2 k2 + v2 +M2 + ... (62) in powers of G as a geometric series [4].

3 Four dimensions and self-dual strings

To introduce the notation, we first consider the free Abelian tensor multiplet theory in 1 + 5 dimensions.
The on-shell field content is a two-form gauge potential Bµν, five scalar fields φ^A and corresponding Weyl fermions ψ. The field strength Hµνρ = ∂µBνρ + ∂ρBµν + ∂νBρµ is selfdual. The supersymmetry variation of the Weyl fermions is ΓµνρHµνρ + Γ µΓA∂µφ ε (69) where we use eleven-dimensional gamma matrices split into SO(1, 5) × SO(5), so that in particular {Γµ, ΓA} = 0. (70) In a static and x5-independent field configuration, in which only φ5 =: φ is non-zero, we find the SUSY variation Γ0i5H0i5 + Γ iΓA=5∂iφ ε (71) If we assume that the classical bosonic field configuration is such that ∂iφ = H0i5 (72) then the SUSY variation reduces to δψ = ∂iφΓ Γ05 + ΓA=5 ε (73) and we find the condition for unbroken SUSY as (1 + Γ05ΓA=5) ε = 0 (74) If we use the Weyl condition Γε = −ε (75) of the (2, 0) supersymmetry parameter ε, then we can also write this as (1 + Γ1234ΓA=5) ε = 0. (76) We may represent the gamma matrices as Γµ = (Γ0, Γi, Γ5) = (1 ⊗ iσ2 ⊗ 1, γi ⊗ σ1 ⊗ 1, γ ⊗ σ1 ⊗ 1), ΓA = 1 ⊗ iσ2 ⊗ σA (77) where σ1,2,3 are the Pauli sigma matrices and γ = γ1234. Then the condition for unbroken SUSY is (1 + γ ⊗ σ) ε = 0 (78) where σ = σ1234 = σA=5. We have found that if Hijk = εijkl∂lφ (79) then half of the SUSY is unbroken. This equation is the Bogomolnyi equation for self-dual strings [1]. We are interested in finding the number of parameters needed to describe solutions of this equation. We can linearize it and get the equation γi∂iχ = 0 (80) for the bosonic zero modes, which we have gathered into a matrix χ ≡ γijδBij + γδφ. (81) For this to work we must also assume the background gauge condition ∂iBij = 0. (82) Now this linearized equation, Eq (80), does not make any reference to the gauge field. So there is no way that we could count the number of parameters of a multi-string configuration just using this equation. This should of course not be a surprise. The strings that we have in the Abelian theory are not solutions of the field equations.
They have to be inserted by hand, that is, we need to insert delta function sources by hand, in the same spirit as for Dirac monopoles. To be able to count the number of zero modes, we must consider some interacting theory which (at the classical level) has solitonic string solutions. To pass to the non-Abelian theory we begin by rewriting the Abelian theory in loop space. Loop space consists of parametrized loops C: s ↦ Cµ(s). We introduce the Abelian ‘loop fields’ [7] Aµs = Bµν(C(s))Ċν(s), φµs = φ(C(s))Ċµ(s), ψµs = ψ(C(s))Ċµ(s) (83) With these definitions, a short computation reveals that Aµs transforms as a vector and φµs as a contravariant vector under diffeomorphisms in loop space induced by diffeomorphisms in space-time. One may then extend these transformation properties to any diffeomorphism in loop space. Space-time diffeomorphisms and reparametrizations of the loops then get unified and are both diffeomorphisms in loop space. The only thing to remember is what is kept fixed under the variation: the parameter of the loop, or the loop itself. The field strength becomes Fµs,νt = Hµνρ(C(s))Ċρ(s)δ(s − t) (84) In terms of these fields, the Bogomolnyi equation will read⁴ Fis,jt = εijkl∂k(sφlt). (85) We pass to the non-Abelian theory by letting these loop fields become non-Abelian, in the sense that Aµs = A a(s) where λa(s) are generators of a loop algebra associated to the gauge group [7]. We introduce a covariant derivative Dµs = ∂µs + Aµs. (86) Local gauge transformations act as δΛAµs = DµsΛ, δΛφµs = [φµs, Λ]. (87) Given a loop C, we automatically get a tangent vector Ċµ(s) that makes no reference to space-time. We can therefore impose the loop space constraints Ċµ(s)Aµs = 0 (88) for each s, and also φµs = Ċµ(s)φ(s;C) (89) for some subtle field φ(s;C) on loop space. As a consequence, we find that Aµsφµs = 0. (90) These constraints are covariant under diffeomorphisms of space-time and reparametrizations of loops.
They are also invariant under local gauge transformations, provided that the gauge parameter is subject to the condition Ċµ(s)∂µsΛ = 0 (91) which is the condition of reparametrization invariance. With the assumption made that λa(s) are generators of a loop algebra, we find that the constraint can also be written as [Aµs, φµt] = 0 (92) A local gauge variation of this constraint is [DµsΛ, φµt] + [Aµs, [φµt, Λ]] = [∂µsΛ, φµt] + [[Aµs, Λ], φµt] + [Aµs, [φµt, Λ]] = [∂µsΛ, φµt] + [Λ, [φµt, Aµs]] (93) The last term vanishes by the constraint. The first term gives us the constraint Eq (92) that we must impose on the gauge parameter Λ = ∫ ds Λa(s, C)λa(s). (94)

4 We denote by ∂is the usual functional derivative with respect to Cµ(s).

We have now introduced non-local non-Abelian fields with infinitely many components. It is also likely that consistency of the theory requires an infinite set of constraints on these fields. It could then be that we may in the end descend to a finite number of degrees of freedom. But this is just speculation. The problem appears to be difficult and ill-defined: how should one define a degree of freedom in a strongly coupled non-local theory? The non-Abelian generalization of the Bogomolnyi equation should be given by [7] Fis,jt = ±εijklDk(sφlt). (95) This equation is gauge invariant and invariant under the residual SO(4) Lorentz group that is preserved by the strings. We cannot think of any reasonable modification of this equation that would preserve these symmetries, so on these grounds alone one could suspect this equation to be correct. Of course this is not the only requirement that the BPS condition imposes. We also get conditions on the 0s and the 5s components. But these BPS equations will be of no interest to us right now. We will show below that the linearized Bogomolnyi equation can be written Di(s + σφi(s χt) = 0 (96) We will also see below that we (presumably) can actually drop the symmetrization in s and t in this equation.
The fields transform in the adjoint representation of the loop algebra, by which we mean that φisχt = [φis, χt]. We define the Dirac operator Ds = γi (Dis + σφis) (97) and the projectors P± = ½(1 ∓ γσ), (98) We can now formulate an index problem in an even-dimensional (loop-)space. The even-dimensional space in this case is given by the 4-dimensional transverse space to the strings, and the index is given by dim ker Ds − dim ker D†s (99) where Ds = P+Ds = DsP−, D†s = P−Ds = DsP+. (100) Since Ds and P± are hermitian, it is manifest that D†s defined this way will be the hermitian conjugate of Ds, thus justifying the notation. Computing the index alone is not sufficient in order to obtain the dimension of the moduli space of self-dual strings. We also need a vanishing theorem that says that dim ker D†s = 0. Linearizing the Bogomolnyi equation, we get 2D[isδAjt] = ±εijkl (Dksδφlt + φksδAlt) (101) Contracting with γij, we get γijD̃isχjt = 0 (102) where we have defined D̃is ≡ Dis ∓ γφis, χis ≡ δAis ∓ γδφis (103) To see that the linearized BPS equation can be written like this, one must use the constraint γijφisδφjt = 0. (104) We can avoid having explicit ± signs by introducing the other chirality matrix at our disposal, namely σ, which lives in a different vector space than γ. We can then hide the ± signs in the tensor product γ ⊗ σ = ±1 (105) which amounts to D̃is ≡ Dis + σφis, χis ≡ δAis + σδφis (106) without any ±.⁵ If we define χs ≡ γiχis (108) then we can write the zero mode equation as γiD̃isχt + D̃ sχit = 0. (109) Let us analyze the second term in this equation. It is given by DisδAit + φ sδφit φisδAit +D sδφit (110) We should not count variations that are gauge variations as bosonic zero modes. We can ensure this by demanding that the zero modes be orthogonal to gauge variations, with respect to the metric on the moduli space, (δΛAis, δAit) + (δΛφis, δφjt) = 0 (111) This leads to the background gauge condition sδAit + φ sδφit = 0.
(112) 5To really understand what is going on, one should apply (1± γσ) on everything, on ψs and on Ds. Then one notices that ∓γ (1∓ γσ) = σ (1∓ γσ) . (107) That is, we can trade ∓γ for σ, once we apply (1± γσ) on everything. This is what we really should do, but to keep the notation simple, we do not spell this out. This condition implies that the gauge variation of the zero modes vanishes, δΛδAis = 0 = δΛδφis (113) To see this, we make a gauge variation δΛδAis = DisΛ, δΛφis = φisΛ, and ask which gauge parameters Λ will respect the background gauge condition. Inserting this gauge variation into the background gauge condition, we get sDit + φ Λ = 0. (114) For this to work nicely, it seems that we must constrain the non-locality of our loop field such that ∂i ∂it) < 0. Then the only solution to this equation is Λ = 0. In other words all gauge variations of the zero modes have to vanish. Furthermore we want the variation to preserve the orthogonality between Ais and φis, (Ais, δφit) + (δAis, φit) = 0 (115) If we make a gauge variation of this, then we get the condition (δΛAis, δφit) + (δAis, δΛφit) = 0 (116) which amounts to φisδAit +D sδφit = 0. (117) We conclude that the zero mode equation can be written as Dsχt = 0 (118) where Ds = γi (Dis + σφis) (119) We are interested in counting the number of such modes in a background of k BPS strings. We compute D2 = (Dis) 2 + (φis) γij (Fis,js + γσǫijklDksφls) (120) (Here D2 ≡ DsDs ≡ DsDs, and analogously for the other fields or opera- tors.) In a BPS configuration, we get is 2 = (Dis) 2 + (φis) ij (1 + γσ)Fis,js (121) Furthermore, in the subspace where 1 + γσ = 0, we find that D2 = (Dis) 2 + (φis) 2 (122) is a strictly negative operator, hence has no zero modes. This means that we have a vanishing theorem, dimkerD† = 0. A small comment The zero mode equation was really D(sχt) = 0 (123) where we should symmetrize in s and t. That means that we should rather consider DsD(sχt) = (DsDsχt +DsDtχs) (DsDsχt +DtDsχs + [Ds, Dt]χs) . 
(124) If now D[sDt] = 0 and Dsχs = 0, then we get DsDsχt = 0 (125) The latter condition, Dsχs = 0, is of course a consequence of D(sχt) = 0 with s = t. The former condition reads 0 = D[sDt] = Di[sDit] + φi[sφit] + σDi[sφit] (126) which we would like to impose as a constraint. Restricting to the Abelian case, this condition is of course true as 0 ≡ ∂i[s∂|i|t]. If we can impose this as a constraint on the non-Abelian fields, then we have now seen that the zero mode equation Eq (123) implies that dsD†sDsχt = 0 (127) because Ds is anti-self-adjoint with respect to the inner product (ψs, χt) = ψ†s(C)χt(C) (128) on loop space. We can also go in the opposite direction. Assuming that Eq (127) holds, we get χt, D sDsχt = (Dsχt, Dsχt) (129) and we conclude that (123) implies Dsχt = 0 (130) with no symmetrization in s, t.

How to compute the index

We should now be able to compute an index associated to self-dual strings, as the limit I(s) = Tr (131) when s → ∞. We define the quantity Jis(C,C ′) = tr γσγiγk (Dks + σφks) (132) (it should be clear that the two s’s involved in this formula are totally unrelated) and find that I(s) = DC∂isJ is(C,C) (133) We can separate the functional integral over parametrized loops C into several pieces. We can keep a point on the loops C(s) = x fixed, and separate it as DxC (134) Then we can write I(s) as an integral over a large three-sphere at spatial infinity, ∂Jis(C) ∂Ci(s) dΩ3x̂ DxCJis(C,C) (135) where x = C(s). If we assume that the gauge group is maximally broken to a product of U(1)’s by the Higgs vacuum expectation values, then we should have U(1) loop fields at spatial infinity. If we assume that the gauge group is SU(2) and that it is broken to U(1), then we need only the asymptotic form of the U(1) fields at spatial infinity, Fis,jt = Hijl(x)Ċl(s)δ(s − t), φks = vĊk(s) (136) Without doing any computations, we can guess what the outcome of the index calculation should be.
A term like ǫijkl DxCtr (Fis,jt(C)Fks,jt(C)) (137) could certainly arise somewhere (in odd dimensions a corresponding term van- ished since there is no chiral anomaly in odd dimensions). In our case this term vanishes identically by the Bogomolnyi equation and the constraint6 Fis,jtDisφjt = 0. (139) Then there can be a term ǫijkl DxCtr Fis,jtφks (140) 6For U(1) fields this would read Fis,jt∂isφjt ∼ Hijk(C(s))∂iφ(C(s))Ċ k(s)Ċj(s)δ(s − t)2 ≡ 0. (138) that should arise in a very similar way as the corresponding term arose for monopoles. If we insert the asymptotic U(1) fields, this term becomes propor- tional to ǫijklHijk(x) (141) That means that the index should be given by some numerical constant, times the magnetic charge H. (142) A Integrals over the exponential The integral we will analyze here is a(s) = k2 + 1 −s(k2+1)eiǫk (143) for any complex number ζ. (The ǫ > 0, say, will be taken towards zero. It arose from ǫ = x− y and we keep it here just as a convergence factor.) We first compute a(0) = k2 + 1 eiǫk (144) In order to make this integral converge for any ζ, we should Wick rotate k to ik, and henceforth we will always mean by i the branch eiπ/2, and by −1 we mean eiπ . Then we get a(0) = −i2ζ+1 k2 − 1 e−ǫk (145) and this integral we evaluate as a principal value. That means to evaluate the residues along the real axis and multiply them not by 2πi, but by half of it, that is, by πi. We get a(0) = (−1)ζπ 1− (−1) . (146) Next we turn to our integral a(s). It is easier to first compute the derivative. We should still work with the Wick rotated integral. Making the substitution ξ = k2 we can put it on the form of two gamma functions. The result is that ′(−s) = −eiπ(ζ+ 1 + (−1)2ζ −ζ− 1 s (147) which we can trivially continue analytically to +s, and then integrate up. The result is a(+∞) = −π 1 + (−1) cos(πζ) + π(−1)ζ 1− (−1) . (148) References [1] P. S. Howe, N. D. Lambert and P. C. West, “The self-dual string soliton,” Nucl. Phys. 
B 515, 203 (1998) [arXiv:hep-th/9709014].
[2] D. S. Berman and J. A. Harvey, “The self-dual string and anomalies in the M5-brane,” JHEP 0411, 015 (2004) [arXiv:hep-th/0408198].
[3] C. Callias, “Index Theorems On Open Spaces,” Commun. Math. Phys. 62, 213 (1978).
[4] E. J. Weinberg, “Parameter Counting For Multi - Monopole Solutions,” Phys. Rev. D 20, 936 (1979).
[5] E. J. Weinberg and P. Yi, “Magnetic monopole dynamics, supersymmetry, and duality,” Phys. Rept. 43, 65 (2007) [arXiv:hep-th/0609055].
[6] M. Hirayama, “Supersymmetric Quantum Mechanics And Index Theorem,” Prog. Theor. Phys. 70, 1444 (1983).
[7] A. Gustavsson, “A reparametrization invariant surface ordering,” JHEP 0511, 035 (2005) [arXiv:hep-th/0508243]. A. Gustavsson, “The non-Abelian tensor multiplet in loop space,” JHEP 0601, 165 (2006) [arXiv:hep-th/0512341].

ABSTRACT We give a prescription for how to compute the Callias index, using as regulator an exponential function. We find agreement with old results in all odd dimensions. We show that the problem of computing the dimension of the moduli space of self-dual strings can be formulated as an index problem in even-dimensional (loop-)space. We think that the regulator used in this Letter can be applied to this index problem.

<|endoftext|><|startoftext|> Introduction

Let $X = \{0, 1\}^{\mathbb{Z}^d}$ denote a configuration space, where $\mathbb{Z}^d$ is the d-dimensional integer lattice. The contact process {ηt : t ≥ 0} is an X-valued continuous-time Markov process. The model was introduced by Harris in 1974 [1] and is considered a simple model for the spread of a disease with infection rate λ. In this setting, an individual at x ∈ Z^d for a configuration η ∈ X is infected if η(x) = 1 and healthy if η(x) = 0. The formal generator is given by $\Omega f(\eta) = \sum_{x \in \mathbb{Z}^d} c(x, \eta)\,[f(\eta^x) - f(\eta)]$, where η^x ∈ X is defined by η^x(y) = η(y) (y ≠ x), and η^x(x) = 1 − η(x).
Here for each x ∈ Z^d and η ∈ X, the transition rate is $c(x, \eta) = (1 - \eta(x)) \times \lambda \sum_{y:|y-x|=1} \eta(y) + \eta(x)$, with |x| = |x1| + ··· + |xd|. In particular, the one-dimensional contact process is: 001 → 011 at rate λ, 100 → 110 at rate λ, 101 → 111 at rate 2λ, 1 → 0 at rate 1.

http://arxiv.org/abs/0704.0019v2

Let Y = {A ⊂ Z^d : |A| < ∞}, where |A| is the number of elements in A. Let ξ^A_t (⊂ Z^d) denote the state at time t of the contact process with ξ^A_0 = A. There is a one-to-one correspondence between ξ^A_t (⊂ Z^d) and ηt ∈ X such that x ∈ ξ^A_t if and only if ηt(x) = 1. For any A ∈ Y, we define the extinction probability of A by $\lim_{t\to\infty} P(\xi^A_t = \emptyset)$. Define νλ(A) = νλ{η : η(x) = 0 for any x ∈ A}, where νλ is an invariant measure of the process starting from the configuration η(x) = 1 (x ∈ Z^d), and is called the upper invariant measure. In other words, let δ_i be the point mass on η ≡ i (i = 0, 1) and let δ_1 S(t) denote the probability measure at time t for the initial probability measure δ_1. Then $\nu_\lambda = \lim_{t\to\infty} \delta_1 S(t)$. Self-duality of the process then implies that $\nu_\lambda(A) = \lim_{t\to\infty} P(\xi^A_t = \emptyset)$. The correlation identities for νλ(A) can be obtained as follows:

Theorem 1.1 For any A ∈ Y,
$\lambda \sum_{x \in A} \sum_{y:|y-x|=1} [\nu_\lambda(A \cup \{y\}) - \nu_\lambda(A)] + \sum_{x \in A} [\nu_\lambda(A \setminus \{x\}) - \nu_\lambda(A)] = 0.$

From now on we consider the one-dimensional case. We introduce the following notation: νλ(◦) = νλ({0}), νλ(◦◦) = νλ({0, 1}), νλ(◦×◦) = νλ({0, 2}), . . . . By Theorem 1.1, we obtain

Corollary 1.2
2λνλ(◦◦) − (2λ + 1)νλ(◦) + 1 = 0, (1)
λνλ(◦◦◦) − (λ + 1)νλ(◦◦) + νλ(◦) = 0, (2)
2λνλ(◦◦◦◦) + νλ(◦×◦) − (2λ + 3)νλ(◦◦◦) + 2νλ(◦◦) = 0, (3)
λνλ(◦◦×◦) − (2λ + 1)νλ(◦×◦) + λνλ(◦◦◦) + νλ(◦) = 0. (4)

A detailed discussion of the results in this section can be found in Konno [2, 3]. If we regard λ, νλ(◦), νλ(◦◦), νλ(◦◦◦), . . . as variables, then the left-hand sides of the correlation identities given by Theorem 1.1 are polynomials of degree at most two.
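The transition rate c(x, η) introduced at the start of this section can be made concrete with a short sketch. The function name `flip_rate` and the dictionary representation of a configuration are illustrative choices made here, not part of the paper; the formula itself is the one-dimensional case of c(x, η).

```python
def flip_rate(eta, x, lam):
    """Rate c(x, eta) at which site x flips in the 1D contact process.

    eta maps integer sites to 0 (healthy) or 1 (infected);
    sites missing from the dictionary are treated as healthy.
    """
    infected_neighbours = eta.get(x - 1, 0) + eta.get(x + 1, 0)
    state = eta.get(x, 0)
    # c(x, eta) = (1 - eta(x)) * lam * sum_{|y-x|=1} eta(y) + eta(x)
    return (1 - state) * lam * infected_neighbours + state
```

With this representation, a healthy site with one infected neighbour flips at rate λ, one with two infected neighbours at rate 2λ, and an infected site recovers at rate 1, which are the four transitions listed for the one-dimensional process.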
In the next section, we give a new procedure for getting a series of approximations for extinction probabilities based on the Gröbner basis by using Corollary 1.2. As for the Gröbner basis, see [4], for example.

2 Our results

Put x = νλ(◦), y = νλ(◦◦), z = νλ(◦◦◦), w = νλ(◦×◦), s = νλ(◦◦◦◦), u = νλ(◦◦×◦). Let ≺ denote the lexicographic order with λ ≺ x ≺ y ≺ w ≺ z ≺ u ≺ s. For m = 1, 2, 3, let I_m be the ideals of a polynomial ring R[x1, x2, . . . , x_{n(m)}] over R as defined below. Here x1 = λ, x2 = x, x3 = y, x4 = z, x5 = w, x6 = s, x7 = u and n(1) = 3, n(2) = 4, n(3) = 7.

2.1 First approximation

We consider the following ideal based on Corollary 1.2 (1): I1 = 〈2λy − 2λx − x + 1, y − x²〉 ⊂ R[λ, x, y]. (5) Here y − x² corresponds to the first (or mean-field) approximation: $\nu^{(1)}_\lambda(\circ\circ) = (\nu^{(1)}_\lambda(\circ))^2$. Then G1 = {(x − 1)(2λx − 1), y − x²} (6) is the reduced Gröbner basis for I1 with respect to ≺. Therefore the solution other than the trivial one x (= y) = 1 is $x = \nu^{(1)}_\lambda(\circ) = 1/(2\lambda)$. Note that the trivial solution means that the invariant measure is δ0. From this, we obtain the first approximation of the density of the particle, ρλ = E_{νλ}(η(x)), as follows: $\rho^{(1)}_\lambda = 1 - \nu^{(1)}_\lambda(\circ) = \frac{2\lambda - 1}{2\lambda}$ for any λ ≥ 1/2. This result gives the first lower bound $\lambda^{(1)}_c$ of the critical value λc of the one-dimensional contact process, that is, $\lambda^{(1)}_c = 1/2 \le \lambda_c$. However, it should be noted that the inequality is not proved in our approach. The estimated value of λc is about 1.649.

2.2 Second approximation

Consider the following ideal based on Corollary 1.2 (1) and (2): I2 = 〈2λy − 2λx − x + 1, λz − λy − y + x, xz − y²〉 ⊂ R[λ, x, y, z]. Here xz − y² corresponds to the second (or pair) approximation: $\nu^{(2)}_\lambda(\circ)\,\nu^{(2)}_\lambda(\circ\circ\circ) = (\nu^{(2)}_\lambda(\circ\circ))^2$. Then G2 = {(x − 1)((2λ − 1)x − 1), 1 + 2λ(y − x) − x, −y − yx + 2x², −z − y(2 + y) + 4x²} is the reduced Gröbner basis for I2 with respect to ≺. Therefore the solution other than the trivial one x (= y = z) = 1 is $x = \nu^{(2)}_\lambda(\circ) = 1/(2\lambda - 1)$.
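The non-trivial solutions of the first two approximations can be checked directly against the correlation identities. The following is a minimal sketch in exact rational arithmetic; the function names are hypothetical, and λ should be passed as a `Fraction` so that the checks are exact. It does not compute a Gröbner basis; it only verifies that the quoted roots satisfy the generators of I1 and I2.

```python
from fractions import Fraction


def corr1(lam, x, y):
    """Left-hand side of Corollary 1.2 (1): 2*lam*y - (2*lam + 1)*x + 1."""
    return 2 * lam * y - (2 * lam + 1) * x + 1


def corr2(lam, x, y, z):
    """Left-hand side of Corollary 1.2 (2): lam*z - (lam + 1)*y + x."""
    return lam * z - (lam + 1) * y + x


def first_approximation(lam):
    """Non-trivial zero of I1: x = 1/(2*lam) with the closure y = x^2."""
    x = 1 / (2 * lam)
    return x, x * x


def second_approximation(lam):
    """Non-trivial zero of I2: x = 1/(2*lam - 1), y from (1), z from x*z = y^2."""
    x = 1 / (2 * lam - 1)
    y = ((2 * lam + 1) * x - 1) / (2 * lam)  # solve identity (1) for y
    z = y * y / x                            # pair closure x*z = y^2
    return x, y, z
```

For example, with λ = 3 the first approximation gives x = 1/6 and density 1 − x = 5/6, and the second gives x = 1/5 and density 4/5; in both cases the relevant identities evaluate to exactly zero.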
In the same way as for the first approximation, we get the second approximation of the density of the particle: $\rho^{(2)}_\lambda = \frac{2(\lambda - 1)}{2\lambda - 1}$ for any λ ≥ 1. This result implies the second lower bound $\lambda^{(2)}_c = 1$. We should remark that if we take I′2 = 〈2λy − 2λx − x + 1, λz − λy − y + x, y − x², z − x³〉 ⊂ R[λ, x, y, z], then G′2 = {z − 1, y − 1, x − 1} is the reduced Gröbner basis for I′2 with respect to ≺. Here y − x² and z − x³ correspond to the approximations $\nu^{(2')}_\lambda(\circ\circ) = (\nu^{(2')}_\lambda(\circ))^2$ and $\nu^{(2')}_\lambda(\circ\circ\circ) = (\nu^{(2')}_\lambda(\circ))^3$, respectively. Then we have only the trivial solution: x = y = z = 1.

2.3 Third approximation

Consider the following ideal based on Corollary 1.2 (1)–(4): I3 = 〈2λy − 2λx − x + 1, λz − λy − y + x, 2λs + w − (2λ + 3)z + 2y, λu − (2λ + 1)w + λz + x, ys − z², xu − yw〉 ⊂ R[λ, x, y, z, w, s, u]. Here ys − z² and xu − yw correspond to the third approximation: $\nu^{(3)}_\lambda(\circ\circ)\,\nu^{(3)}_\lambda(\circ\circ\circ\circ) = (\nu^{(3)}_\lambda(\circ\circ\circ))^2$ and $\nu^{(3)}_\lambda(\circ)\,\nu^{(3)}_\lambda(\circ\circ\times\circ) = \nu^{(3)}_\lambda(\circ\circ)\,\nu^{(3)}_\lambda(\circ\times\circ)$, respectively. Then G3 = {(x − 1)((12λ³ − 5λ − 1)x² − 2λ(2λ + 3)x − λ + 1), . . .} is the reduced Gröbner basis for I3 with respect to ≺. Therefore the solution other than the trivial one x = 1 is $x = \nu^{(3)}_\lambda(\circ) = \frac{\lambda(2\lambda + 3) + \sqrt{D}}{12\lambda^3 - 5\lambda - 1}$, where D = 16λ⁴ + 4λ² + 4λ + 1. Then we obtain the third approximation of the density of the particle: $\rho^{(3)}_\lambda = \frac{4\lambda(3\lambda^2 - \lambda - 3)}{12\lambda^3 - 2\lambda^2 - 8\lambda - 1 + \sqrt{D}}$ for any $\lambda \ge (1 + \sqrt{37})/6$. This result corresponds to the third lower bound $\lambda^{(3)}_c = (1 + \sqrt{37})/6 \approx 1.180$.

3 Summary

We obtain the first, second, and third approximations for the extinction probability, the density of the particle, and the lower bound of the one-dimensional contact process by using the Gröbner basis with respect to a suitable term order. These results coincide with results given by the Harris lemma (more precisely, the Katori-Konno method, see [3]) or the BFKL inequality [5] (see also [3]). As we saw, the generators of I_m in Section 2 have degree at most two in x1, x2, . . ., such as 2λy − 2λx − x + 1 and ys − z² in the case of I3.
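The closed-form results of the third approximation can be cross-checked numerically. The sketch below is an illustration with hypothetical function names; it uses floating-point arithmetic and the expressions as reconstructed from the first element of G3, so it checks internal consistency rather than re-deriving the Gröbner basis.

```python
import math


def density_third(lam):
    """Third-approximation density 1 - x, with x the non-trivial root from G3."""
    d = 16 * lam**4 + 4 * lam**2 + 4 * lam + 1
    numerator = 4 * lam * (3 * lam**2 - lam - 3)
    denominator = 12 * lam**3 - 2 * lam**2 - 8 * lam - 1 + math.sqrt(d)
    return numerator / denominator


def g3_first_element(lam, x):
    """First element of G3: (x - 1)((12 lam^3 - 5 lam - 1) x^2 - 2 lam (2 lam + 3) x - lam + 1)."""
    quadratic = (12 * lam**3 - 5 * lam - 1) * x**2 - 2 * lam * (2 * lam + 3) * x - lam + 1
    return (x - 1) * quadratic


# Lower bounds quoted in the text for the first, second and third approximations.
LOWER_BOUNDS = (0.5, 1.0, (1 + math.sqrt(37)) / 6)
```

Evaluating `density_third` at the third lower bound returns zero up to rounding, and for larger λ the value 1 − `density_third(lam)` is a root of the first element of G3, as expected.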
We expect that this property will make it possible to obtain higher-order approximations of the process (and of other interacting particle systems having a similar property) effectively.

Acknowledgment. The author thanks Takeshi Kajiwara for valuable discussions and comments.

References
[1] T. E. Harris, Contact interactions on a lattice, Ann. Probab. 2: 969–988 (1974).
[2] N. Konno, Phase Transitions on Interacting Particle Systems, World Scientific, Singapore (1994).
[3] N. Konno, Lecture Notes on Interacting Particle Systems, Rokko Lectures in Mathematics, Kobe University, No.3 (1997), http://www.math.kobe-u.ac.jp/publications/rlm03.pdf.
[4] D. A. Cox, J. B. Little, and D. O’Shea, Ideals, Varieties, And Algorithms: An Introduction to Computational Algebraic Geometry And Commutative Algebra, 3rd edition, Undergraduate Texts in Mathematics, Springer Verlag (2007).
[5] V. Belitsky, P. A. Ferrari, N. Konno, and T. M. Liggett, A strong correlation inequality for contact processes and oriented percolation, Stochastic. Process. Appl. 67: 213–225 (1997).

ABSTRACT In this note we give a new method for getting a series of approximations for the extinction probability of the one-dimensional contact process by using the Gröbner basis.
<|endoftext|><|startoftext|> Introduction The f+(q2) hadronic form factor Form factor parameterizations Taylor expansion Model-dependent parameterizations Quantitative expectations Quark Models QCD sum rules Lattice QCD Analyzed parameterizations The BABAR Detector and Dataset Signal reconstruction Signal selection Background rejection q2 measurement Results on the q2 dependence of the hadronic form factor Systematic Uncertainties c-quark hadronization tuning Reconstruction algorithm Resolution on q2 Particle identification Background estimate Fitting procedure and radiative events Control of the statistical accuracy in the SVD approach Summary of systematic errors Comparison with expectations and with other measurements Branching fraction measurement Selection of candidate signal events Efficiency corrections Systematic uncertainties on RD Correlated systematic uncertainties Selection requirement on the Fisher discriminant D*+ counting in D0K- + Decay rate measurement Summary Acknowledgments References ABSTRACT The shape of the hadronic form factor f+(q2) in the decay D0 --> K- e+ nue has been measured in a model independent analysis and compared with theoretical calculations. We use 75 fb(-1) of data recorded by the BABAR detector at the PEPII electron-positron collider. The corresponding decay branching fraction, relative to the decay D0 --> K- pi+, has also been measured to be RD = BR(D0 --> K- e+ nue)/BR(D0 --> K- pi+) = 0.927 +/- 0.007 +/- 0.012. From these results, and using the present world average value for BR(D0 --> K- pi+), the normalization of the form factor at q2=0 is determined to be f+(0)=0.727 +/- 0.007 +/- 0.005 +/- 0.007 where the uncertainties are statistical, systematic, and from external inputs, respectively. <|endoftext|><|startoftext|> Molecular Synchronization Waves in Arrays of Allosterically Regulated Enzymes Vanessa Casagrande,1 Yuichi Togashi,2, ∗ and Alexander S. 
Mikhailov2, †
1Hahn-Meitner-Institut, Glienicker Straße 100, 14109 Berlin, Germany
2Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany

Spatiotemporal pattern formation in a product-activated enzymic reaction at high enzyme concentrations is investigated. Stochastic simulations show that catalytic turnover cycles of individual enzymes can become coherent and that complex wave patterns of molecular synchronization can develop. The analysis based on the mean-field approximation indicates that the observed patterns result from the presence of Hopf and wave bifurcations in the considered system.

PACS numbers: 82.40.Ck, 87.18.Pj, 82.39.Fk, 05.45.Xt

Molecular machines, such as molecular motors, ion pumps and some enzymes, play a fundamental role in biological cells and can also be used in the emerging soft-matter nanotechnology [1]. A protein machine is a cyclic device, where each cycle consists of conformational motions initiated by binding of an energy-bringing ligand [2, 3]. In motors, such internal motions generate mechanical work [4], while in enzymes they enable or facilitate chemical reaction events (see, e.g., [5, 6]). Much attention has been attracted to studies of biomembranes with ion pumps and molecular motors, where membrane instabilities and synchronization effects have been analyzed [7, 8, 9]. Here, a different class of distributed active molecular systems, formed by enzymes, is considered. The catalytic activity of an allosteric enzyme protein is activated or inhibited by binding of small regulatory molecules; the role of such regulatory molecules can be played by products of the same reaction [10]. Previous investigations of simple product-regulated enzymic systems [11, 12] and enzymic networks [13] in a small spatial volume with full diffusional mixing have shown that spontaneous synchronization of molecular turnover cycles can take place there.
External molecular synchronization of enzymes of the photosensitive P-450 dependent monooxygenase system by periodic optical forcing has been experimentally demonstrated [14].

In this Letter, spatiotemporal pattern formation in enzymic arrays is investigated. In such systems, immobile enzymes are attached to a solid planar support immersed into a solution through which fresh substrate is supplied and product molecules are continuously removed. Product molecules released by an enzyme diffuse through the solution and activate catalytic turnover cycles of neighbouring enzymes in the array.

A simple stochastic model [12] of an enzyme as a cyclic machine (a stochastic phase oscillator), shown in Fig. 1, is used. Binding of a substrate molecule to an enzyme i initiates an ordered internal conformational motion, described by the conformational phase coordinate φi. The initial state corresponds to the phase φi = 0. The catalytic conversion event takes place and the product is released at the state φp inside the cycle. After that, the conformational motion continues until the equilibrium state of the enzyme (φi = 1) is finally reached. Initiation of a turnover cycle is a random event, occurring at a certain probability rate. We assume that substrate is present in abundance, and its concentration is not affected by the reactions. Conformational motion inside the cycle is modeled as a stochastic diffusional drift process, described by the equation $\dot{\phi}_i = v + \eta_i(t)$, where v is the mean drift velocity and ηi(t) is an internal white noise with $\langle \eta_i(t)\eta_j(t') \rangle = 2\sigma\delta_{ij}\delta(t - t')$, where σ specifies the intensity of intramolecular fluctuations.

FIG. 1: (Color online) A sketch of the model. Labels in the figure: substrate, enzyme, regulatory molecule, feedback, product.

Allosterically activated enzymes possess a site on their surface where regulatory molecules can become bound. Binding of a regulatory molecule leads to conformational change that enhances catalytic activity of the enzyme.
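The drift process for the phase coordinate described above can be sketched with a simple Euler-Maruyama integration. This is a minimal illustration and not the authors' simulation code: the function name, the time step `dt`, and the absence of any special treatment at φ = 0 are assumptions made here. The noise increment has variance 2σ·dt, matching the stated correlation of η.

```python
import math
import random


def simulate_cycle(v=1.0, sigma=0.0, dt=1e-3, phi_p=0.14, rng=None):
    """Integrate d(phi)/dt = v + eta(t) from phi = 0 until phi reaches 1.

    Returns (t_release, t_end): the first time phi crosses phi_p (product
    release) and the time at which the cycle ends (phi >= 1).
    """
    rng = rng or random.Random(0)
    noise_scale = math.sqrt(2.0 * sigma * dt)  # std. dev. of the noise increment
    phi, t, t_release = 0.0, 0.0, None
    while phi < 1.0:
        phi += v * dt + noise_scale * rng.gauss(0.0, 1.0)
        t += dt
        if t_release is None and phi >= phi_p:
            t_release = t
    return t_release, t
```

With σ = 0 the cycle is deterministic: the product is released at t ≈ φp/v and the cycle ends at t ≈ 1/v, which is the mean cycle time τ. For σ > 0 both times fluctuate from cycle to cycle.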
A regulatory molecule binds to an enzyme with rate constant β and dissociates from it with rate constant κ. Binding of a regulatory molecule at an enzyme raises its probability to start a cycle from α0 to α1. We assume that a regulatory molecule can bind to an enzyme only in its rest state and this molecule is released when the cycle is started. The role of regulatory molecules is played by product molecules of the same reaction. Immobile enzymes are randomly distributed in space with concentration c. Product diffuses with diffusion constant D and undergoes decay at rate constant γ. The characteristic diffusion length of product molecules is $l_{\rm diff} = \sqrt{D/\gamma}$.

http://arxiv.org/abs/0704.0021v2

In our stochastic 2D simulations, the medium was discretized into spatial cells (up to 256 × 256), each containing a number of enzyme molecules. The cells were so small that diffusional mixing of product molecules in a cell within the shortest characteristic time of the reaction could always take place. Each enzyme was described by the stochastic model given above; diffusion of product molecules was modeled as a random walk over a discrete cell lattice. The mean cycle time τ = 1/v was chosen as the time unit (τ = 1). Systems including up to 655 360 enzymes were used in the simulations.

FIG. 2: Stochastic (a,b) and mean-field (c,d) simulations of 2D wave patterns; (a) τp = 0.14, c = 1, and β = 300, (b) τp = 0.25, c = 10, and β = 10, (c) τp = 0.14, c = 1, and β = 300, (d) τp = 0.34, c = 100, and β = 1.42. Other parameters are α0 = 1, α1 = 1000, κ = 10, γ = 10, σ = 0, D = 100. The linear size of the shown area is L = 40 ldiff in all panels.

Figure 2a,b (see also Videos 1 and 2 in ref. [15]) shows two typical examples of stochastic 2D simulations. Here, spatial distributions of product molecules are displayed. Waves of product concentration are propagating through the medium. In a peak of a wave, many locally present enzymes are simultaneously releasing product molecules.
Since product release can take place only at a certain stage inside the cycle, this means that the cycles of enzymes are locally synchronized. Not only regular wave structures, such as rotating spiral waves or target patterns (Fig. 2a), but also complex regimes of wave turbulence (Fig. 2b) have been observed.

To understand and interpret stochastic simulation results, an analytical study of the system in the mean-field approximation, which holds in the limit of high enzyme concentrations, has been performed. In this approximation, the system is characterised by three continuous variables n0(r, t), n1(r, t) and m(r, t) which represent local concentrations of enzymes in the rest state without or with regulatory molecules attached (n0 and n1) and local concentration of the product (m). For simplicity, internal fluctuations in enzymes are neglected (σ = 0). Thus, all enzymes which have started their cycles at some time t would release their products at a definite time t + τp (with τp = φp/v) and finish their cycles, returning to the rest state, at time t + τ. Therefore, the system is described by a set of three reaction-diffusion equations with time delays,

$$\frac{\partial n_1}{\partial t} = \beta m n_0 - \kappa n_1 - \alpha_1 n_1, \tag{1a}$$

$$\frac{\partial n_0}{\partial t} = -\beta m n_0 + \kappa n_1 - \alpha_0 n_0 + \alpha_0 n_0(t-\tau) + \alpha_1 n_1(t-\tau), \tag{1b}$$

$$\frac{\partial m}{\partial t} = -\beta m n_0 + \kappa n_1 + \alpha_1 n_1 - \gamma m + \alpha_0 n_0(t-\tau_p) + \alpha_1 n_1(t-\tau_p) + D\nabla^2 m. \tag{1c}$$

The system always has a uniform stationary state with certain concentrations n0, n1 and m, which can be found as solutions of the respective algebraic equations. This state corresponds to the absence of synchronization. However, it may become unstable if allosteric activation is strong enough. To analyze stability, small perturbations δn0, δn1 and δm are added to the stationary state, equations (1) are linearized and their solutions are sought as δn0 ∼ δn1 ∼ δm ∼ exp(λq t − iqx) with λq = µq + iωq. Thus, each spatial mode with wavevector q is characterized by its frequency ωq and its rate of growth µq.
The quantities µq and ωq are given by the roots of a characteristic equation which is determined by the linearization matrix of equations (1). The steady state becomes unstable when at least one spatial mode with some wavenumber q0 starts to grow (µq0 > 0).

The coefficient β can be chosen as the bifurcation parameter. If regulatory molecules cannot bind to enzymes (β = 0), feedback is absent and instabilities are not possible. On the other hand, allosteric activation becomes strong if regulatory molecules can easily bind and, in this case, emergence of oscillations and wave patterns can be expected. Our bifurcation analysis reveals that, depending on the parameters of the system, it can exhibit either a Hopf or a wave bifurcation [16]. As a result of the Hopf bifurcation, uniform oscillations with q = 0 develop. Because of the presence of delays in equations (1), the characteristic equation is nonpolynomial in terms of λ and, generally, a number of oscillatory solutions with different frequencies ω are possible. Physically, such solutions correspond to formation of several synchronous enzymic groups. This effect has previously been investigated extensively for similar systems in small spatial volumes with full diffusional mixing [11] and we shall not discuss it further here. The most robust uniform oscillations, which we consider, are characterized by the frequency ω ≈ 2π/τ and correspond to single-group synchronization. As a result of a wave bifurcation (also known as the Hopf bifurcation with a finite wave number [17]), the first unstable modes are traveling waves with a certain wavenumber q0. Figure 3 shows the bifurcation diagram in the parameter plane (τp, β). Note the presence of a codimension-2 bifurcation point where the boundaries of the Hopf and the wave bifurcations join.
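The uniform stationary state whose stability is analyzed above can be located numerically. The following is a minimal sketch, not the authors' procedure. It assumes, in addition to equations (1), that the total enzyme concentration c is conserved, so that c = n0 + n1 + τ(α0 n0 + α1 n1), where the last term counts enzymes currently inside a turnover cycle; this closure relation is an assumption made here for illustration, since (1a) and (1b) coincide at steady state. The function name `stationary_state` is hypothetical.

```python
def stationary_state(alpha0, alpha1, beta, kappa, gamma, c, tau=1.0):
    """Uniform stationary state (n0, n1, m) of the mean-field equations (1).

    Assumption (not stated in the text): the total enzyme concentration is
    conserved, c = n0 + n1 + tau * (alpha0 * n0 + alpha1 * n1).
    Requires alpha0 > 0 so that the bisection bracket below is valid.
    """
    def rest_states(m):
        # (1a) at steady state: beta * m * n0 = (kappa + alpha1) * n1
        r = beta * m / (kappa + alpha1)              # ratio n1 / n0
        n0 = c / (1.0 + r + tau * (alpha0 + alpha1 * r))
        return n0, r * n0

    def balance(m):
        # (1c) at steady state reduces to gamma * m = alpha0 * n0 + alpha1 * n1
        n0, n1 = rest_states(m)
        return gamma * m - (alpha0 * n0 + alpha1 * n1)

    # balance(0) < 0, and balance(c / (tau * gamma)) > 0 because the
    # product release rate can never exceed c / tau.
    lo, hi = 0.0, c / (tau * gamma)
    for _ in range(200):                             # plain bisection
        mid = 0.5 * (lo + hi)
        if balance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    m = 0.5 * (lo + hi)
    n0, n1 = rest_states(m)
    return n0, n1, m


# Parameters of Fig. 3 with an illustrative value beta = 1
n0, n1, m = stationary_state(alpha0=1.0, alpha1=1000.0, beta=1.0,
                             kappa=10.0, gamma=10.0, c=100.0)
```

Bisection is used only because it needs no derivative and the bracket is known in advance; any one-dimensional root finder would serve equally well.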
To investigate nonlinear dynamics of the system, numerical simulations of equations (1) have been performed [16]. The explicit Euler integration method has been used; no-flux boundary conditions were applied. Results of 1D simulations are summarized in Fig. 3 and examples of typical observed patterns are shown in Fig. 4. Standing waves (Fig. 4a) develop when the boundary of the wave bifurcation (dash-dotted curve) is crossed and uniform oscillations are observed above the boundary of the Hopf bifurcation. Near the codimension-2 point, more complex behavior was found. This included rippled oscillations (Fig. 4b), self-organized pacemakers (Fig. 4c) and modulated traveling waves (Fig. 4d). The observed patterns are similar to those previously found in reaction-diffusion systems with the wave bifurcation [18]. In the right upper corner of the diagram in Fig. 3, higher frequency oscillations with several synchronous groups take place.

FIG. 3: Phase diagram (α0 = 1, α1 = 1000, κ = 10, γ = 10, c = 100, D = 1000). The Hopf bifurcation (solid line) and the wave bifurcation (dash-dotted line) boundaries are displayed. Gray lines show instability of the stationary state with respect to development of uniform oscillations with two (dashed) and three (dotted) groups in the well-mixed case. Lines separating parameter domains with different kinds of patterns are hand-drawn, based on numerical simulations. Labels in the diagram: uniform oscillations, ripples, pacemakers/waves, traveling waves, standing-traveling waves, standing waves, higher frequency/mixed modes, codimension-2 wave-Hopf bifurcation.

Two-dimensional simulations of reaction-diffusion equations (1) with time delay have been performed for selected parameter values. In 2D simulations, spontaneously developing concentric waves (target patterns) and spiral waves have been observed; target patterns were however unstable and evolved into pairs of rotating spiral waves (Fig. 2c and Video 3 [15]).
Complex wave regimes, which can be qualitatively characterized as turbulence of standing waves, have also been observed (Fig. 2d and Video 4 [15]).

FIG. 4: Spatiotemporal patterns in a 1D system (in each panel, the vertical axis is time, running down, and the horizontal axis is the coordinate). The upper two rows are stochastic simulations (σ = 0) with concentrations c = 1 and c = 10, the bottom row shows mean-field simulations with c = 100. (a) τp = 0.3, β = 95/c, (b) τp = 0.14, β = 260/c, (c) τp = 0.22, β = 600/c, and (d) τp = 0.16, β = 300/c. Other parameters as in Fig. 3; the system size shown is L = 51 ldiff.

The mean-field approximation is based on neglecting statistical fluctuations in concentrations of reacting species [11] and, therefore, it should hold in the high concentration limit. In Fig. 4, two upper panel rows display spatiotemporal patterns which are observed in stochastic simulations with parameter values corresponding to the respective mean-field simulations. To compare mean-field simulations with different enzyme densities, the following property of equations (1) can be used: introducing relative concentrations ñ0 = n0/c, ñ1 = n1/c and m̃ = m/c, it can be noticed that they obey the same equations, but with a rescaled coefficient β̃ = βc. Thus, essentially the same patterns are observed as long as the parameter combination βc remains constant. In the stochastic simulations in Fig. 4, the coefficient β has been increased to compensate for a decrease in the enzyme concentration. For larger enzyme concentrations, good agreement between mean-field predictions and stochastic simulations has been found. In the mean-field equations (1), intramolecular fluctuations are not taken into account (σ = 0 and therefore each turnover cycle has the same fixed duration τ). Stochastic simulations have been, however, also performed when such fluctuations were present.
Synchronization waves could still be found even at internal noise levels which corresponded to the mean relative dispersion ξ of turnover times of about 10% (with $\xi = \langle \delta\tau^2 \rangle^{1/2}/\tau \simeq (2\sigma\tau)^{1/2}$).

Although the emphasis in this Letter is on the phenomena in two-dimensional enzymic arrays, analogous effects should be expected for three-dimensional systems representing aqueous enzymic solutions. The linear stability analysis, yielding Hopf and wave bifurcation boundaries (see Fig. 3), is valid also for the 3D geometry. We have performed preliminary stochastic simulations for thin solution layers with high enzyme concentrations and could observe synchronization patterns similar to those found for the enzymic arrays.

A product molecule, released by an enzyme, diffuses in the solution until it either binds, as a regulatory molecule, to another enzyme or undergoes a decay. Here, it should be taken into account that a regulatory molecule can bind to an allosteric enzyme only at a certain binding site of characteristic radius R. Using the theory of diffusion-controlled reactions, the average time ttransit after which a regulatory product would find a binding site of one of the enzymes can be roughly estimated [11] as $t_{\rm transit} = 1/(cDR)$, if enzymes are uniformly distributed inside the reaction volume with concentration c. Therefore, binding typically occurs within the distance $L_{\rm corr} = (D\,t_{\rm transit})^{1/2} = (cR)^{-1/2}$ from the point where a molecule is released. Obviously, it can only take place if the product molecule has not undergone decay until that moment, i.e. if γ ttransit < 1. This condition puts a restriction on the enzyme concentration c, which must be higher than the critical concentration $c^* = \gamma/(DR)$. Choosing $\gamma = 10^{3}\ {\rm s}^{-1}$, $D = 10^{-5}\ {\rm cm}^2\,{\rm s}^{-1}$ and $R = 10^{-7}\ {\rm cm}$, the critical enzyme concentration is $c^* = 10^{15}\ {\rm cm}^{-3} \approx 10^{-6}\ {\rm M}$.
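The arithmetic behind this estimate is easy to reproduce. The short sketch below evaluates c∗ = γ/(DR) for the quoted parameter values, converts the number density to a molar concentration, and evaluates Lcorr at the critical concentration; the function names are illustrative only and not taken from the paper.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def critical_concentration(gamma, D, R):
    """c* = gamma / (D R), in cm^-3 for gamma in s^-1, D in cm^2/s, R in cm."""
    return gamma / (D * R)


def to_molar(number_density_per_cm3):
    """Convert a number density in cm^-3 to mol/L (1 L = 1000 cm^3)."""
    return number_density_per_cm3 * 1000.0 / AVOGADRO


def correlation_length(c, R):
    """L_corr = (c R)^(-1/2), in cm, for c in cm^-3 and R in cm."""
    return (c * R) ** -0.5


# Parameter values quoted in the text
c_star = critical_concentration(gamma=1e3, D=1e-5, R=1e-7)
c_star_molar = to_molar(c_star)
l_corr_at_c_star = correlation_length(c_star, R=1e-7)
```

With these inputs the molar value comes out somewhat above 10^-6 M, consistent with the order-of-magnitude figure in the text. At c = c∗ the condition γ ttransit = 1 holds, so Lcorr coincides with the diffusion length (D/γ)^(1/2), which is 10^-4 cm (one micrometer) for these values.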
A similar estimate can be obtained when enzymes are immobilized on a plane immersed into a reactive solution; in this case the mean distance between the enzymes on the plane should be less than $l_c = (R\,l_{\rm diff})^{1/2}$ [22]. Although the required enzyme concentrations are relatively large, they are within the range characteristic for biological cells (glycolytic enzymes are present [19] in a cell at even higher concentration of more than 10^-5 M). The characteristic temporal period of developing patterns is determined by the enzyme turnover time τ, which typically varies from milliseconds to seconds. The characteristic length scale of developing wave patterns is determined by the diffusion length ldiff, which can vary under these conditions from a fraction of a micrometer to tens of micrometers.

Our analysis shows that spontaneous molecular synchronization of allosteric product-activated enzymes can be observed in enzymic arrays. Artificial arrays formed by immobilized protein machines (molecular motors) are already used in experiments on active nanoscale transport (see [20]). Many enzymes in biological cells are membrane-bound, thus forming natural enzymic arrays. Similar phenomena are possible in dense enzyme solutions. In the study by Petty et al. [21], traveling waves of NAD(P)H and proton concentrations with a wavelength of about a micrometer were observed inside neutrophil cells. These metabolic waves had a temporal period of about 300 ms, which is two orders of magnitude shorter than the characteristic period of glycolytic oscillations in the cells and lies closer to the time scales of turnover cycles of individual enzymes. An intriguing question, requiring further detailed analysis, is whether molecular synchronization waves may have already been seen in these experiments.

Molecular synchronization waves are fundamentally different from classical concentration waves in reaction-diffusion systems.
Under synchronization conditions, internal conformational states of individual enzyme molecules in their turnover cycles become strongly correlated. In optics, a similar situation is found when a transition to coherent laser generation has taken place. Our theoretical analysis may open a way to the investigations of a new class of spatio-temporal pattern formation in chemically active molecular systems.

The authors are grateful to M. Falcke and P. Stange for valuable discussions. Financial support of Japan Society for the Promotion of Science through a fellowship for research abroad (Y. T.) is acknowledged.

∗ Present address: Nanobiology Laboratories, Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, Osaka 565-0871, Japan; Electronic address: togashi@phys1.med.osaka-u.ac.jp
† Electronic address: mikhailov@fhi-berlin.mpg.de

[1] K. Kinbara, T. Aida, Chem. Rev. 105, 1377 (2005).
[2] L. A. Blumenfeld, A. N. Tikhonov, Biophysical Thermodynamics of Intracellular Processes: Molecular Machines of the Living Cell (Springer, Berlin 1994).
[3] M. Gerstein, A. M. Lesk, C. Chothia, Biochemistry 33, 6739 (1994).
[4] F. Jülicher, A. Ajdari, J. Prost, Rev. Mod. Phys. 69, 1269 (1997).
[5] H.-Ph. Lerch, A. S. Mikhailov, B. Hess, Proc. Natl. Acad. Sci. (USA) 99, 15410 (2002).
[6] H.-Ph. Lerch, R. Rigler, A. S. Mikhailov, Proc. Natl. Acad. Sci. (USA) 102, 10807 (2005).
[7] S. Ramaswamy, J. Toner, J. Prost, Phys. Rev. Lett. 84, 3494 (2000).
[8] P. Lenz, J.-F. Joanny, F. Jülicher, J. Prost, Phys. Rev. Lett. 91, 108104 (2003).
[9] H.-Y. Chen, Phys. Rev. Lett. 92, 168101 (2004).
[10] A. Goldbeter, Biochemical Oscillations and Cellular Rhythms (Cambridge University Press, Cambridge 1996).
[11] P. Stange, A. S. Mikhailov, B. Hess, J. Phys. Chem. B 102, 6273 (1998).
[12] P. Stange, A. S. Mikhailov, B. Hess, J. Phys. Chem. B 103, 6111 (1999).
[13] K. Sun, Q. Ouyang, Phys. Rev. E 64, 026111 (2001).
[14] M. Schienbein, H. Gruler, Phys. Rev. E 56, 7116 (1997).
[15] See EPAPS Document No. E-PRLTAO-99-041730 for dynamical evolutions in the 2D simulations. For more information on EPAPS, see http://www.aip.org/pubservs/epaps.html .
[16] V. Casagrande, Doctoral thesis, Technical University, Berlin (2006), http://opus.kobv.de/tuberlin/volltexte/2006/1273/ .
[17] D. Walgraef, Spatio-Temporal Pattern Formation (Springer, Berlin 1997).
[18] A. M. Zhabotinsky, M. Dolnik, I. R. Epstein, J. Chem. Phys. 103, 10306 (1995).
[19] B. Hess, A. Boiteux, J. Krüger, Adv. Enzyme Regul. 7, 149 (1969).
[20] H. Hess, G. D. Bachand, Materials Today 8 (12, Suppl. 1), 22 (2005).
[21] H. R. Petty, R. G. Worth, A. L. Kindzelskii, Phys. Rev. Lett. 84, 2754 (2000).
[22] Diffusion perpendicular to the plane is considered as dilution within a layer of effective thickness ≃ ldiff.

<|endoftext|><|startoftext|> Introduction. We are interested in designing Lie group numerical schemes for the strong approximation of nonlinear Stratonovich stochastic differential equations of the form

$$y_t = y_0 + \sum_{i=0}^{d} \int_0^t V_i(y_\tau, \tau)\,\mathrm{d}W^i_\tau. \tag{1.1}$$

Here $W^1, \ldots, W^d$ are d independent scalar Wiener processes and $W^0_t \equiv t$. We suppose that the solution y evolves on a smooth n-dimensional submanifold M of R^N with n ≤ N and Vi : M × R+ → TM, i = 0, 1, . . . , d, are smooth vector fields which in local coordinates are $V_i = \sum_{j=1}^{n} V_i^j\,\partial_{y_j}$.
The flow-map ϕt : M → M of the integral equation (1.1) is defined as the map taking the initial data y0 to the solution yt at time t, i.e. yt = ϕt ◦ y0.

Our goal in this paper is to show how the Lie group integration methods developed by Munthe-Kaas and co-authors can be extended to stochastic differential equations on smooth manifolds (see Crouch and Grossman [8] and Munthe-Kaas [40]). Suppose we know that the exact solution of a given system of stochastic differential equations evolves on a smooth manifold M (see Malliavin [36] or Emery [14]), but we can only find the solution pathwise numerically. How can we ensure that our approximate numerical solution also lies in the manifold?

Suppose we are given a finite dimensional Lie group G and Lie group action Λy0 that generates transport across the manifold M from the starting point y0 ∈ M via elements of G. Then with any given elements ξ in the Lie algebra g corresponding to the Lie group G, we can associate the infinitesimal action λξ using the Lie group action Λy0. The map ξ ↦ λξ is a Lie algebra homomorphism from g to X(M), the Lie algebra of vector fields over the manifold M. Further the Lie subalgebra {λξ ∈ X(M) : ξ ∈ g} is isomorphic to a finite dimensional Lie algebra with the same structure constants (see Olver [42], p. 56).

∗Maxwell Institute for Mathematical Sciences and School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK. (S.J.Malham@ma.hw.ac.uk, A.Wiese@hw.ac.uk). (16/10/2007) http://arxiv.org/abs/0704.0022v2

Conversely, suppose we know that the Lie algebra generated by the set of governing vector fields Vi, i = 0, 1, . . . , d, on M is finite dimensional; call this XF(M). Then we know there exists a finite dimensional Lie group G whose Lie algebra g has the same structure constants as XF(M) relative to some basis, and there is a Lie group action Λy0 such that Vi = λξi, i = 0, 1, . . . , d, for some ξi ∈ g (see Olver [42], p.
56 or Kunita [30], p. 194). The choice of group and action is not unique.

In this paper we assume that there is a finite dimensional Lie group G and action Λy0 such that our set of governing vector fields Vi, i = 0, 1, . . . , d, are each infinitesimal Lie group actions generated by some element in g via Λy0, i.e. Vi = λξi for some ξi ∈ g, i = 0, 1, . . . , d. They are said to be fundamental vector fields. This means that we can write down the set of governing vector fields Xξi for a system of stochastic differential equations on the Lie group G that, via the Lie group action Λy0, generates the flow governed by the set of vector fields Vi on the manifold. The vector fields Vi on M are simply the push forward of the vector fields Xξi on G via the Lie group action Λy0.

Typically the flow on the Lie group also needs to be computed numerically. We thus want the approximation to remain in the Lie group so that the Lie group action takes us back to the manifold. To achieve this, we pull back the set of governing vector fields Xξi on G to the set of governing vector fields vξi on g, via the exponential map ‘exp’ from g to G. Thus the stochastic flow generated on g by the vector fields vξi generates the stochastic flow on G generated by the Xξi. The governing vector fields on g are, for each σ ∈ g:

$$v_{\xi_i} \circ \sigma \equiv \sum_{k=0}^{\infty} \frac{B_k}{k!}\,(\mathrm{ad}_\sigma)^k \circ \xi_i, \tag{1.2}$$

where Bk is the kth Bernoulli number and the adjoint operator adσ is a closed operator on g, in fact adσ ◦ ζ = [σ, ζ], the Lie bracket on g.

Now the essential point is that ξi ∈ g and so the series on the right or any truncation of it is closed in g. Hence if we construct an approximation to our stochastic differential equation on g using the vector fields vξi or an approximation of them achieved by truncating the series representation, then that approximation must reside in the Lie algebra g. We can then push the approximation in the Lie algebra forward onto the Lie group and then onto the manifold.
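The closure property of the truncated series (1.2) can be checked directly in a small case. The sketch below is an illustration under stated assumptions rather than code from the paper: it takes g = so(3), represented by 3 × 3 skew-symmetric matrices stored as nested lists, treats ξi as a constant element, and uses the convention B1 = −1/2. The helper names (`hat`, `bracket`, `dexpinv_truncated`) are hypothetical.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba, i.e. ad_a applied to b."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]


def hat(w):
    """Skew-symmetric matrix (an element of so(3)) built from a 3-vector."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


# B_k / k! for k = 0, ..., 4, using B_1 = -1/2 and B_3 = 0
COEFFS = [1.0, -0.5, 1.0 / 12.0, 0.0, -1.0 / 720.0]


def dexpinv_truncated(sigma, xi, order=4):
    """Partial sum of (1.2): sum over k <= order of (B_k / k!) ad_sigma^k xi."""
    if not 0 <= order < len(COEFFS):
        raise ValueError("order must be between 0 and 4")
    total = [[0.0] * 3 for _ in range(3)]
    term = xi                                    # ad_sigma^0 applied to xi
    for k in range(order + 1):
        total = [[total[i][j] + COEFFS[k] * term[i][j] for j in range(3)]
                 for i in range(3)]
        term = bracket(sigma, term)              # next power of ad_sigma
    return total


sigma = hat((0.3, -0.2, 0.5))
xi = hat((1.0, 2.0, 3.0))
v = dexpinv_truncated(sigma, xi)
```

Because the commutator of two skew-symmetric matrices is again skew-symmetric, every partial sum stays in so(3) whatever the truncation order, which is the point made in the text. When σ and ξ commute, all bracket terms vanish and the series returns ξ itself.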
Provided we compute the exponential map and action appropriately, our approximate solution lies in the manifold (to within machine accuracy). In summary, for a given ξ ∈ g and any y0 ∈ M we have the following commutative diagram:

$$\begin{array}{ccccc}
X(g) & \xrightarrow{\ \exp_*\ } & X(G) & \xrightarrow{\ (\Lambda_{y_0})_*\ } & X(M) \\
\uparrow v_\xi & & \uparrow X_\xi & & \uparrow \lambda_\xi \\
g & \xrightarrow{\ \exp\ } & G & \xrightarrow{\ \Lambda_{y_0}\ } & M
\end{array}$$

Stochastic Lie group integrators

We have implicitly separated the governing set of vector fields Vi, i = 0, 1, . . . , d, from the driving path process w ≡ (W^1, . . . , W^d). Together they generate the unique solution process y ∈ M to the stochastic differential equation (1.1). When there is only one driving Wiener process (d = 1) the Itô map w ↦ y is continuous in the topology of uniform convergence. When there are two or more driving processes (d ≥ 2) the Universal Limit Theorem tells us that the Itô map w ↦ y is continuous in the p-variation topology, in particular for 2 ≤ p < 3 (see Lyons [32], Lyons and Qian [33] and Malliavin [36]). A Wiener path with d ≥ 2 has finite p-variation for p > 2. This means that from a pathwise perspective, approximations to y constructed using successively refined approximations to w are only guaranteed to converge to the correct solution y if we include information about the Lévy chordal areas of the driving path process. Note however that the L2-norm of the 2-variation of a Wiener process is finite.

In the Lie group integration procedure prescribed above we must solve a stochastic differential system on the Lie algebra g defined by the set of governing vector fields vξi and the driving path process w ≡ (W^1, . . . , W^d). In light of the Universal Limit Theorem and with future stepsize adaptivity in mind (see Gaines and Lyons [20]), in our examples we use, for instance, order 1 stochastic numerical methods (which include the Lévy chordal area) to solve for the flow on the Lie algebra g. We have thus explained the idea behind Munthe-Kaas methods and how they can be generalized to the stochastic setting.
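The procedure just described can be sketched in its simplest setting: the unit sphere in R^3 with G = SO(3) acting by rotations and constant ξi ∈ so(3). The code below is an illustrative sketch, not the scheme of the paper. It makes the following assumptions: the two rotation axes are hypothetical choices, the Lie algebra step keeps only the lowest-order term σ = Σ ξi ΔW^i and omits the Lévy area (so it is less accurate than the order 1 methods mentioned above), and the comparison scheme is a naive additive update used only to show drift off the manifold.

```python
import math
import random


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def rotate(w, y):
    """Apply exp(hat(w)) to y using Rodrigues' formula (action of SO(3))."""
    theta2 = sum(x * x for x in w)
    theta = math.sqrt(theta2)
    if theta < 1e-8:
        # Taylor expansions of sin(theta)/theta and (1 - cos(theta))/theta^2
        a, b = 1.0 - theta2 / 6.0, 0.5 - theta2 / 24.0
    else:
        a, b = math.sin(theta) / theta, (1.0 - math.cos(theta)) / theta2
    wy = cross(w, y)
    wwy = cross(w, wy)
    return tuple(y[i] + a * wy[i] + b * wwy[i] for i in range(3))


def simulate(n_steps=1000, h=1e-3, seed=0):
    """Return the final norms of the Lie group update and the naive update."""
    rng = random.Random(seed)
    axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))    # hypothetical xi_1, xi_2
    y_lie = y_naive = (0.0, 0.0, 1.0)
    for _ in range(n_steps):
        dW = [rng.gauss(0.0, math.sqrt(h)) for _ in axes]
        # Step in the Lie algebra: sigma = sum_i xi_i * dW^i (Levy area omitted)
        sigma = tuple(sum(dw * ax[j] for dw, ax in zip(dW, axes))
                      for j in range(3))
        # Push forward: exponential map, then the group action on the sphere
        y_lie = rotate(sigma, y_lie)
        # Naive additive update in the embedding space R^3
        step = cross(sigma, y_naive)
        y_naive = tuple(y_naive[i] + step[i] for i in range(3))
    return norm(y_lie), norm(y_naive)


norm_lie, norm_naive = simulate()
```

The Lie group update is a composition of exact rotations, so its norm departs from 1 only through rounding error. The additive update adds a vector orthogonal to the current state at every step, so its norm can only grow.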
The first half of this paper formalizes this procedure.

In the second half of this paper, we consider autonomous vector fields and construct stochastic Lie group integration schemes using Castell–Gaines methods. This approach proceeds as follows. We truncate the stochastic exponential Lie series expansion corresponding to the flow ϕt of the solution process y to the stochastic differential equation (1.1). We then approximate the driving path process w ≡ (W^1, . . . , W^d) by replacing it by a suitable nearby piecewise smooth path in the appropriate variation topology. An approximation to the solution yt requires the exponentiation of the approximate truncated exponential Lie series. This can be achieved by solving the system of ordinary differential equations driven by the vector field that is the approximate truncated exponential Lie series. If we use ordinary Munthe-Kaas methods as the underlying ordinary differential integrator the Castell–Gaines method becomes a stochastic Lie group integrator.

Further, based on the Castell–Gaines approach we then present uniformly accurate exponential Lie series integrators that are globally more accurate than their stochastic Taylor counterpart schemes (these are investigated in detail in Lord, Malham and Wiese [31] for linear stochastic differential equations). They require the assumption that a sufficiently accurate underlying ordinary differential integrator is used; that integrator could for example be an ordinary Lie group Munthe-Kaas method. In the case of two driving Wiener processes we derive the order 1/2, and in the case of one driving Wiener process the order 1 uniformly accurate exponential Lie series integrators. As a consequence we confirm the asymptotic efficiency properties for both these schemes proved by Castell and Gaines [8] (see Newton [41] for more details on the concept of asymptotic efficiency).
We also present in the case of one driving Wiener process a new order 3/2 uniformly accurate exponential Lie series integrator (also see Lord, Malham and Wiese [31]).

We present two physical applications that demonstrate the advantage of using stochastic Munthe-Kaas methods. First we consider a free rigid body which for example could model the dynamics of a satellite. We suppose that it is perturbed by two independent multiplicative stochastic noise processes. The governing vector fields are non-commutative and the corresponding exact stochastic flow evolves on the unit sphere. We show that the stochastic Munthe-Kaas method, with an order 1 stochastic Taylor integrator used to progress along the corresponding Lie algebra, preserves the approximate solution in the unit sphere manifold to within machine error. However when an order 1 stochastic Taylor integrator is used directly, the solution leaves the unit sphere. The contrast between these two methods is more emphatically demonstrated in our second application. Here we consider an autonomous underwater vehicle that is also perturbed by two independent multiplicative stochastic noise processes. The exact stochastic flow evolves on the manifold which is the dual of the Euclidean Lie algebra se(3); two independent Casimirs are conserved by the exact flow. Again the stochastic Munthe-Kaas method preserves the Casimirs to within machine error. However the order 1 stochastic Taylor integrator is not only unstable for large stepsizes, but the approximation drifts off the manifold and makes a dramatic excursion off to infinity in the embedding space R^6.

Preserving the approximate flow on the manifold of the exact dynamics may be a required property for physical or financial systems driven by smooth or rough paths; for general references see Iserles, Munthe-Kaas, Nørsett and Zanna [25], Hairer, Lubich and Wanner [22], Elworthy [13], Lyons and Qian [33] and Milstein and Tretyakov [38].
Stochastic Lie group integrators in the form of Magnus integrators for linear stochastic differential equations were investigated by Burrage and Burrage [5]. They were also used in the guise of Möbius schemes (see Schiff and Shnider [43]) to solve stochastic Riccati equations by Lord, Malham and Wiese [31] where they outperformed direct stochastic Taylor methods. Further applications where they might be applied include: backward stochastic Riccati equations arising in optimal stochastic linear-quadratic control (Kohlmann and Tang [28]); jump diffusion processes on matrix Lie groups for Bayesian inference (Srivastava, Miller and Grenander [44]); fractional Brownian motions on Lie groups (Baudoin and Coutin [3]) and stochastic dynamics triggered by DNA damage (Chickarmane, Ray, Sauro and Nadim [10]).

Our paper is outlined as follows. In Section 2 we present the basic geometric setup, sans stochasticity. In particular we present a generalized right translation vector field on a Lie group that forms the basis of our subsequent transformation from the Lie group to the manifold. Using a Lie group action, this vector field pushes forward to an infinitesimal Lie group action vector field that generates a flow on the smooth manifold. In Section 3 we specialize to the case of a matrix Lie group and using the exponential map, derive the pullback of the generalized right translation vector field on the Lie group to the corresponding vector field on the Lie algebra. To help give some context to our overall scheme, we provide in Section 4 illustrative examples of manifolds and natural choices for associated Lie groups and actions that generate flows on those manifolds. Then in Section 5 we show how a flow on a smooth manifold corresponding to a stochastic differential equation can be generated by a stochastic flow on a Lie algebra via a Lie algebra action. We explicitly present stochastic Munthe-Kaas Lie group integration methods in Section 6.
We start the second half of our paper by reviewing the exponential Lie series for stochastic differential equations in Section 7. We show in Section 8 how to construct geometric stochastic Castell–Gaines numerical methods. In particular we also present uniformly accurate exponential Lie series numerical schemes that not only can be used as geometric stochastic integrators, but also are always more accurate than stochastic Taylor numerical schemes of the corresponding order. In Section 9 we present our concrete numerical examples. Finally in Section 10 we conclude and present some further future applications and directions.

2. Lie group actions. Suppose M is a smooth finite n-dimensional submanifold of R^N with n ≤ N. We use X(M) to denote the Lie algebra of vector fields on the manifold M, equipped with the Lie–Jacobi bracket [U, V] ≡ U · ∇V − V · ∇U, for all U, V ∈ X(M). Let G denote a finite dimensional Lie group.

Definition 2.1 (Lie group action). A left Lie group action of a Lie group G on a manifold M is a smooth map Λ: G × M → M satisfying for all y ∈ M and R, S ∈ G: (1) Λ(id, y) = y; (2) Λ(R, Λ(S, y)) = Λ(RS, y). We denote Λy ◦ S ≡ Λ(S, y).

Hereafter we suppose y0 ∈ M is fixed and focus on the action map Λy0 : G → M. We assume that the Lie group action Λ is transitive, i.e. transport across the manifold from any point y0 ∈ M to any other point y ∈ M can always be achieved via a group element S ∈ G with y = Λy0 ◦ S (Marsden and Ratiu [37], p. 310). We define the Lie algebra g associated with the Lie group G to be the vector space of all right invariant vector fields on G. By standard construction this is isomorphic to the tangent space to G at the identity id ≡ idG (see Olver [42], p. 48 or Marsden and Ratiu [37], p. 269).

Definition 2.2 (Generalized right translation vector field). Suppose we are given a smooth map ξ : M → g.
With each such map ξ we associate a vector field Xξ : G → X(G) defined as follows

$$X_\xi \circ S \equiv \partial_\tau \exp\bigl(\tau\,\xi(\Lambda_{y_0} \circ S)\bigr)\,S\,\Big|_{\tau=0}$$

for S ∈ G, where ‘exp’ is the usual local diffeomorphism exp: g → G from a neighbourhood of the zero element o ∈ g to a neighbourhood of id ∈ G.

Definition 2.3 (Infinitesimal Lie group action). We associate with each vector field Xξ : G → X(G) a vector field λξ : M → X(M) as the push forward of Xξ from G to M by Λy0, i.e. $\lambda_\xi \equiv (\Lambda_{y_0})_* X_\xi$, so that if S ∈ G and y = Λy0 ◦ S ∈ M, then

$$\lambda_\xi \circ y \equiv \partial_\tau\,\Lambda_{y_0} \circ \gamma(\tau)\big|_{\tau=0},$$

where γ(τ) ∈ G, γ(0) = S and ∂τ γ(τ) = Xξ ◦ γ(τ) (the flow generated on G by the vector field Xξ starting at S ∈ G). Naturally, as a vector field λξ is linear, and also $\lambda_\xi \circ y \equiv L_{X_\xi}\Lambda_{y_0} \circ S$, the Lie derivative of Λy0 along Xξ at S ∈ G.

Remarks.
1. The map Λ(S) : M → M defined by y ↦ Λ(S) ◦ y ≡ Λy ◦ S represents a flow on M. Hence if y = Λ(S) ◦ y0, the push forward of λξ by Λ(S) is given by $(\Lambda(S))_* \lambda_\xi \equiv \lambda_{\mathrm{Ad}_S \xi}$ (Marsden and Ratiu [37], p. 317).
2. We define the isotropy subgroup at y0 ∈ M by Gy0 ≡ {S ∈ G : Λy0 ◦ S = y0}; it is a closed subgroup of G (see Helgason [23], p. 121 or Warner [48], p. 123). We define the global isotropy subgroup by GM ≡ ∩y0∈M Gy0 ≡ {S ∈ G : Λy0 ◦ S = y0, ∀y0 ∈ M}; it is a normal subgroup of G (see Olver [42], p. 38).
3. A Lie group action is said to be effective/faithful if the map S ↦ Λ(S) from G to Diff(M), the group of diffeomorphisms on M, is one-to-one. This is equivalent to the condition that different group elements have different actions, i.e. GM ≡ {idG}. A Lie group action is said to be free if Gy0 = {idG} for all y0 ∈ M, i.e. Λy0 is a diffeomorphism from G to M. For more details see Marsden and Ratiu [37], p. 310 and Olver [42], p. 38.
4. The map γ : G/Gy0 → M defined by γ : S · Gy0 ↦ Λy0 ◦ S is a diffeomorphism, i.e. M ≅ G/Gy0 for any y0 ∈ M (a manifold M with a Lie group action Λ: G × M → M defined over it is thus diffeomorphic to a homogeneous manifold; see Warner [48], p. 123 or Olver [42], p. 40).
Further, the induced action of G/GM on M is effective. Hence if Λ is not an effective action of G, we can replace it (without loss of generality) by the induced action of G/GM (see Olver [42], p. 38).
5. Our definition for the generalized right translation vector field Xξ on G is motivated by the standard right translation vector field used to identify g, the vector space of right invariant vector fields on G, with TidG, the tangent space to G at the identity. When ξ ∈ g is constant, Xξ ∈ X(G) is right invariant and a Lie bracket on TidG can be defined via right extension by the corresponding Lie–Jacobi bracket for the vector fields Xξ on X(G). Unless ξ ∈ g is constant, Xξ is not in general right invariant. For further details see Varadarajan [47], Olver [42], or Marsden and Ratiu [37].
6. The infinitesimal generator map ξ ↦ λξ from g to X(M) is a Lie algebra homomorphism. If we identify g as the vector space of left invariant vector fields on G this map becomes an anti-homomorphism. The Lie–Jacobi bracket as defined above gives the right (rather than left) Lie algebra structure over the group of diffeomorphisms on M. If in addition we take the Lie–Jacobi bracket to be minus that defined above (associated with the left Lie algebra structure), then the infinitesimal generator map becomes a homomorphism again. See for example Marsden and Ratiu [37], p. 324 or Munthe-Kaas [40].
7. The image of g under the infinitesimal generator map ξ ↦ λξ forms a finite dimensional Lie algebra of vector fields on M which is isomorphic to the Lie algebra of the effectively acting quotient group G/GM (see Olver [42], p. 56). Thus the tangent space to M at any point is g and M inherits a connection from G/GM. Connections are necessary to define martingales on manifolds, but not for defining semimartingales (our focus here); see Malliavin [36] and Emery [14].
8.
A comprehensive study of the systematic construction of symmetry Lie groups from given vector fields can be found in Olver [42].
9. We assumed above that the vector fields Xξ and λξ are autonomous. However all results in this and subsequent sections up to Section 7 can be straightforwardly extended to non-autonomous vector fields generated by ξ : M × R→g with (y, t) ↦ ξ(y, t) for all y ∈ M and t ∈ R.
10. For full generality we want to suspend reference to embedding spaces as far as possible. However in subsequent sections, to be concise, we will more explicitly reclaim this context.

3. Pull back to the Lie algebra. For ease of presentation, we will assume in this section that G is a matrix Lie group. Recall that the exponential map exp: g → G is a local diffeomorphism from a neighbourhood of o ∈ g to a neighbourhood of id ∈ G. Let vξ : g→g be the pull back of the vector field Xξ : G→X(G) from G to g via the exponential mapping exp: g→G, i.e. vξ ◦ σ ≡ exp∗Xξ ◦ σ. If σ ∈ g then

$$v_\xi \circ \sigma = \mathrm{dexp}^{-1}_\sigma \circ \xi\bigl(\Lambda_{y_0}\circ\exp\sigma\bigr). \tag{3.1}$$

Here dexp−1σ : g→g is the inverse of the right-trivialized tangent map of the exponential dexpσ : g→g defined as follows. If β(τ) is a curve in g such that β(0) = σ and β′(0) = η ∈ g then dexp: g × g→g is the local smooth map (Varadarajan [47], p. 108)

$$\mathrm{dexp}_\sigma \circ \eta \equiv \bigl(\partial_\tau \exp\beta(\tau)\big|_{\tau=0}\bigr)\exp(-\sigma) \equiv \frac{\exp(\mathrm{ad}_\sigma) - \mathrm{id}}{\mathrm{ad}_\sigma}\circ\eta.$$

Note that as a tangent map dexpσ : g→g is linear. The inverse operator dexp−1σ is the operator series (1.2) generated by considering the reciprocal of dexpσ.

To show that (3.1) is true, if exp: g→G with σ ↦ S = exp σ, and β(τ) ∈ g with β(0) = σ and ∂τβ(τ) = vξ ◦ β(τ), then:

$$\exp_* v_\xi \circ S = \partial_\tau \exp\beta(\tau)\big|_{\tau=0} \equiv \bigl(\mathrm{dexp}_\sigma \circ v_\xi \circ \sigma\bigr)\exp(\sigma) \equiv X_\xi \circ S.$$

Since ‘exp’ is a diffeomorphism in a neighbourhood of o ∈ g, this push forward calculation establishes the pull back (3.1) for all σ ∈ g in that neighbourhood.

4. Illustrative examples. Suppose the vector field V : M × R→X(M) generates a flow solution yt ∈ M starting from y0 ∈ M.
Then assume there exists a:
1. Lie group G with corresponding Lie algebra g;
2. Lie group action Λy0 : G→M for which a starting point y0 ∈ M is fixed;
3. Vector field λξ : M × R→X(M) such that V ≡ λξ, i.e. V is a fundamental vector field corresponding to the action Λy0.

Let us suppose G is a matrix Lie group (or can be embedded into a matrix Lie group, for example the Euclidean group SE(3) is naturally embedded into the special linear group SL(4;R)). We have for all S ∈ G and t ∈ R,

$$X_\xi(S, t) \equiv \xi\bigl(\Lambda_{y_0}(S), t\bigr)\,S. \tag{4.1}$$

If V = λξ for some ξ : M→g, some Lie group G and corresponding action Λy0, then the flow generated by Xξ on G drives the flow generated by V on M. In each of the examples below, given the manifold M, we present a natural Lie group and action associated with the manifold structure, and identify vector fields which generate flows on the manifold via the Lie group.

Stiefel manifold Vn,k. Suppose M = Vn,k ≡ {y ∈ R^{n×k} : y^T y = I}. Take G = SO(n), the special orthogonal group, and Λy0(S) ≡ S y0, the action of left multiplication. The corresponding Lie algebra is g = so(n). Then by direct calculation λξ(y) = ξ(y, t) y. Hence if the given vector field is V(y, t) = ξ(y, t) y, then the push forward of the flow generated by Xξ(S, t) on G in (4.1) is the flow generated by V on M. Note that the unit sphere S^2 ≅ V3,1, i.e. S^2 is just a particular Stiefel manifold. In Section 9 as an application, we consider rigid body dynamics evolving on S^2.

Isospectral manifold Sn. Suppose M = Sn = {y ∈ R^{n×n} : y^T = y}, the set of n × n real symmetric matrices. Take G = O(n), the orthogonal group, and Λy0(S) ≡ S y0 S^T, which is an isospectral action (Munthe-Kaas [40]). The corresponding Lie algebra is g = so(n). Again, by direct calculation λξ(y) = ξ(y, t) y − y ξ(y, t). Hence if the given vector field is V(y, t) = ξ(y, t) y − y ξ(y, t), then the push forward of the flow generated by Xξ(S, t) on G in (4.1) is the flow generated by V on M.

Dual of the Euclidean algebra se(3)∗.
Suppose M = se(3)∗ ≅ R^6, the dual of the Euclidean algebra se(3) of the Euclidean group SE(3) = {(s, ρ) : s ∈ SO(3), ρ ∈ R^3}. Take G = SE(3) so g = se(3) and Λ ≡ Ad∗ : G × g∗→g∗, the coadjoint action of G on g∗. Then by direct calculation λξ(y) = −ad∗ξ(y). Since λξ(y) is linear in ξ and −λξ(y) ≡ λ−ξ(y), it follows that if V(y) = ad∗ξ(y), then the push forward of the flow generated by X−ξ(S, t) = −ξ(Λy0(S), t) S on G is the flow generated by V on M. For more details see Section 9 where we investigate the dynamics of an autonomous underwater vehicle evolving on se(3)∗.

Grassmannian manifold Gr(k, n). The Grassmannian manifold M = Gr(k, n) is the space of k-dimensional subspaces of R^n. Take G = GL(n), the general linear matrix group, where if S ∈ GL(n), we identify

$$S = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

where the block matrices α, β, γ and δ are of sizes k × k, k × (n − k), (n − k) × k and (n − k) × (n − k), respectively (see Schiff and Shnider [43]; Munthe-Kaas [40]). We choose the action of GL(n) on Gr(k, n) to be the generalized Möbius transformation Λy0(S) = (αy0 + β)(γy0 + δ)^{−1}. Hence if

$$\xi(t) = \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}$$

then direct calculation reveals that λξ(y) = a(t)y + b(t) − y c(t) y − y d(t). Hence if the given vector field is V(y) = a(t)y + b(t) − y c(t) y − y d(t), then the push forward of the flow generated by Xξ(S, t) = ξ(t)S on G is the flow generated by V on Gr(k, n).

5. Stochastic Lie group integration. We show that if a Lie group action Λ: G × M→M exists, then for y0 ∈ M fixed, the Lie algebra action Λy0 ◦ exp: g→M carries a flow on g to a flow on M.

Theorem 5.1. Suppose there exists a Lie group action Λ: G × M→M. Then if there exists a process σ ∈ g and a stopping time T∗ such that on [0, T∗), σ satisfies the Stratonovich stochastic differential equation

$$\sigma_t = \sum_{i=0}^{d} \int_0^t v_{\xi_i}\circ\sigma_\tau \,\mathrm{d}W^i_\tau, \tag{5.1}$$

then the process y = Λy0 ◦ exp σ ∈ M satisfies the Stratonovich stochastic differential equation on [0, T∗):

$$y_t = y_0 + \sum_{i=0}^{d} \int_0^t \lambda_{\xi_i}\circ y_\tau \,\mathrm{d}W^i_\tau. \tag{5.2}$$

Proof.
Using Itô’s lemma, if σt ∈ g satisfies (5.1) then Λy0 ◦ exp σt satisfies

$$\Lambda_{y_0}\circ\exp\sigma_t = \Lambda_{y_0}\circ\exp o + \sum_{i=0}^{d}\int_0^t \mathcal{L}_{v_{\xi_i}}\circ\Lambda_{y_0}\circ\exp\sigma_\tau \,\mathrm{d}W^i_\tau.$$

Now recall that for each i = 0, 1, . . . , d, Xξi is the push forward of vξi from g to G via the exponential map, and that λξi is the push forward of Xξi from G to M via Λy0, and so the Lie derivative Lvξi ◦ Λy0 ◦ exp σt ≡ λξi ◦ yt. Then since yt = Λy0 ◦ exp σt, we conclude that y ∈ M is a process satisfying the stochastic differential equation (5.2).

Corollary 5.2. Suppose that for each i = 0, 1, . . . , d there exists ξi : M→g such that the vector field Vi : M→X(M) and λξi : M→X(M) can be identified, i.e.

$$V_i \equiv \lambda_{\xi_i}. \tag{5.3}$$

Then the push forward by ‘Λy0 ◦ exp’ of the flow on the Lie algebra manifold g generated by the stochastic differential equation (5.1) is the flow on the smooth manifold M generated by the stochastic differential equation (5.2), whose solution can be expressed in the form yt = Λy0 ◦ exp σt.

Remark. If the action is free then ‘Λy0 ◦ exp’ is a diffeomorphism from a neighbourhood of o ∈ g to a neighbourhood of y0 ∈ M.

6. Stochastic Munthe-Kaas methods. Assuming that the vector fields in our original stochastic differential equation (1.1) are fundamental and satisfy (5.3), then stochastic Munthe-Kaas methods are constructed as follows:
1. Subdivide the global interval of integration [0, T] into subintervals [tn, tn+1].
2. Starting with t0 = 0, repeat the next two steps over successive intervals [tn, tn+1] until tn+1 = T.
3. Compute an approximate solution σ̂tn,tn+1 to (5.1) across [tn, tn+1] using a stochastic Taylor, stochastic Runge–Kutta or Castell–Gaines method.
4. Compute the approximate solution ytn+1 ≈ Λytn ◦ exp σ̂tn,tn+1.

Note that by construction σ̂tn,tn+1 ∈ g because the stochastic differential equation (5.1) (or any stochastic Taylor or other sensible approximation) evolves the solution locally on the Lie algebra g via the vector fields vξi : g→g.
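The four steps above can be sketched in Python for the unit sphere with G = SO(3) and the action of left multiplication. This is a minimal illustration under stated assumptions, not the implementation used for the experiments in Section 9: the names `hat`, `expm_series`, `xi` and `munthe_kaas_path` are hypothetical, the constants are borrowed from Section 9.1, the starting point is arbitrary, and the Lie algebra step keeps only the leading terms J_i vξi ◦ o = J_i ξi(y_n). The sketch is meant to show the structure of the method and the preservation of the manifold; it makes no claim about the order of accuracy.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_series(a, terms=18, squarings=6):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    a = np.asarray(a, dtype=float) / 2.0 ** squarings
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ a / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def xi(y, alpha):
    """Illustrative map xi: M -> so(3), linear in y (an assumption of this sketch)."""
    return hat(y / alpha)

def munthe_kaas_path(y0, alphas, T, n_steps, rng):
    """Steps 1-4 with the lowest-order choice sigma_hat = sum_i J_i xi_i(y_n)."""
    h = T / n_steps                      # step 1: uniform subdivision of [0, T]
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):             # step 2: march over successive intervals
        # J_0 = h (drift); J_i = Wiener increments over [t_n, t_{n+1}] for i >= 1
        J = np.concatenate(([h], rng.normal(0.0, np.sqrt(h), size=len(alphas) - 1)))
        # step 3: approximate solution on the Lie algebra, started at the zero element
        sigma_hat = sum(Ji * xi(y, a) for Ji, a in zip(J, alphas))
        # step 4: push forward to the manifold through the action y -> S y
        y = expm_series(sigma_hat) @ y
    return y

rng = np.random.default_rng(0)
alphas = [np.array([3.0, 1.0, 2.0]),
          np.array([1.0, 0.5, 1.5]),
          np.array([0.25, 1.0, 0.5])]
y_end = munthe_kaas_path([1.0, 0.0, 0.0], alphas, T=1.0, n_steps=200, rng=rng)
print(abs(np.linalg.norm(y_end) - 1.0))  # deviation from the unit sphere
```

Because σ̂ is skew-symmetric at every step, its exponential is orthogonal up to rounding, so the printed deviation from the sphere should stay at the level of floating-point error regardless of the step size.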
Suitable methods for approximating the exponential map to ensure it maps g to G appropriately can be found in Iserles and Zanna [26]. Then by construction ytn+1 ∈ M. For example, with two Wiener processes and autonomous vector fields vξi ◦ σ, an order 1 stochastic Taylor Munthe-Kaas method is based on

$$\hat\sigma_{t_n,t_{n+1}} = \bigl(J_0 v_{\xi_0} + J_1 v_{\xi_1} + J_2 v_{\xi_2} + J_{11} v_{\xi_1}^2 + J_{12} v_{\xi_1}v_{\xi_2} + J_{21} v_{\xi_2}v_{\xi_1} + J_{22} v_{\xi_2}^2\bigr)\circ o, \tag{6.1}$$

evaluated at the zero element o ∈ g. Typically ‘dexp−1σ’ is truncated to only include the necessary low order terms to maintain the order of the numerical scheme.

Remark. It is natural to invoke Ado’s Theorem (see for example Olver [42] p. 54): any finite dimensional Lie algebra is isomorphic to a Lie subalgebra of gl(n) (the general linear algebra) for some n ∈ N. However as Munthe-Kaas [40] points out, directly using a matrix representation for the given Lie group might not lead to the optimal computational implementation (other data structures might do so).

7. Exponential Lie series. The stochastic Taylor series is known in different contexts as the Neumann series, Peano–Baker series or Feynman–Dyson path ordered exponential. If the vector fields in the stochastic differential equation (1.1) are autonomous (which we assume henceforth), i.e. for all i = 0, 1, . . . , d, Vi = Vi(y) only, then the stochastic Taylor series for the flow is

$$\varphi_t = \mathrm{id} + \sum_{m=1}^{\infty}\sum_{\alpha\in P_m} J_{\alpha_1\cdots\alpha_m}(t)\,V_{\alpha_1}\cdots V_{\alpha_m}.$$

Here Pm is the set of all combinations of multi-indices α = (α1, . . . , αm) of length m with αi ∈ {0, 1, . . . , d} and

$$J_{\alpha_1\cdots\alpha_m}(t) \equiv \int_0^t \int_0^{\tau_1}\cdots\int_0^{\tau_{m-1}} \mathrm{d}W^{\alpha_1}_{\tau_m}\cdots \mathrm{d}W^{\alpha_m}_{\tau_1}$$

are multiple Stratonovich integrals.

The logarithm of ϕt is the exponential Lie series, Magnus expansion (Magnus [34]) or Chen–Strichartz formula (Chen [9], Strichartz [45]). In other words we can express the flow map in the form ϕt = exp ψt, where

$$\psi_t = \sum_{i=0}^{d} J_i(t)\,V_i + \frac{1}{2}\sum_{j>i=0}^{d} (J_{ij} - J_{ji})(t)\,[V_i, V_j] + \cdots$$

is the exponential Lie series for our system, and [· , ·] is the Lie–Jacobi bracket on X(M).
See Yamato [49], Kunita [29], Ben Arous [1] and Castell [7] for the derivation and convergence of the exponential Lie series expansion in the stochastic context; Strichartz [45] for the full explicit expansion; Sussmann [46] for a related product expansion and Lyons [32] for extensions to rough paths.

Let us denote the truncated exponential Lie series by

$$\hat\psi_t = \sum_{\alpha\in Q_m} J_\alpha\, c_\alpha, \tag{7.1}$$

where Qm denotes the finite set of multi-indices α for which ‖Jα‖L2 is of order up to and including t^m, where m = 1/2, 1, 3/2, . . .. The terms cα are linear combinations of finitely many (length α) products of the smooth vector fields Vi, i = 0, 1, . . . , d. The following asymptotic convergence result can be established along the lines of the proof for linear stochastic differential equations in Lord, Malham and Wiese [31]; we provide a proof in Appendix A.

Theorem 7.1. Assume the vector fields Vi have 2m + 1 uniformly bounded derivatives, for all i = 0, 1, . . . , d. Then for t ≤ 1, the flow exp ψ̂t ◦ y0 is square-integrable, where ψ̂t is the truncated Lie series (7.1). Further, if y is the solution of the stochastic differential equation (1.1), there exists a constant C(m, ‖y0‖2) such that

$$\bigl\|y_t - \exp\hat\psi_t\circ y_0\bigr\|_{L^2} \le C\bigl(m, \|y_0\|_2\bigr)\, t^{m+1/2}. \tag{7.2}$$

8. Geometric Castell–Gaines methods. Consider the truncated exponential Lie series ψ̂tn,tn+1 across the interval [tn, tn+1]. We approximate higher order multiple Stratonovich integrals across each time-step by their expectations conditioned on the increments of the Wiener processes on suitable subdivisions (Gaines and Lyons [20]). An approximation to the solution of the stochastic differential equation (1.1) across the interval [tn, tn+1] is given by the flow generated by the truncated and conditioned exponential Lie series ψ̂tn,tn+1 via ytn+1 ≈ exp ψ̂tn,tn+1 ◦ ytn.
Hence the solution to the stochastic differential equation (1.1) can be approximately computed by solving the ordinary differential system (see Castell and Gaines [8]; Misawa [39])

$$u'(\tau) = \hat\psi_{t_n,t_{n+1}}\circ u(\tau) \tag{8.1}$$

across the interval τ ∈ [0, 1]. Then if u(0) = ytn we will get u(1) ≈ ytn+1. We must choose a sufficiently accurate ordinary differential integrator to solve (8.1); we implicitly assume this henceforth.

The set of governing vector fields Vi, i = 0, 1, . . . , d, prescribes a map from the driving path process w ≡ (W^1, . . . , W^d) to the unique solution process y ∈ M to the stochastic differential equation (1.1). The map w ↦ y is called the Itô map. Recall that we assume the vector fields are smooth. When there is only one driving Wiener process (d = 1) the Itô map is continuous in the topology of uniform convergence (Theorem 1.1.1 in Lyons and Qian [33]). When there are two or more driving processes (d ≥ 2) the Universal Limit Theorem (Theorem 6.2.2 in Lyons and Qian [33]) tells us that the Itô map is continuous in the p-variation topology, in particular for 2 ≤ p < 3. A Wiener path with d ≥ 2 has p-variation with p > 2, and the p-variation metric in this case includes information about the Lévy chordal areas of the path (Lyons [32]). Hence we must choose suitable piecewise smooth approximations to the driving path process w.

The following result follows from the corresponding result for ordinary differential equations in Hairer, Lubich and Wanner [22] (p. 112) as well as directly from Chapter VIII in Malliavin [36] on the Transfer Principle (see also Emery [15]).

Lemma 8.1. A necessary and sufficient condition for the solution to the stochastic differential equation (1.1) to evolve on a smooth n-dimensional submanifold M of R^N (n ≤ N) up to a stopping time T∗ is that Vi(y, t) ∈ TyM for all y ∈ M, i = 0, 1, . . . , d.

Hence the stochastic Taylor expansion for the flow ϕt is a diffeomorphism on M.
However a truncated version of the stochastic Taylor expansion for the flow ϕ̂t will not in general keep the solution on the manifold, i.e. if y0 ∈ M then ϕ̂t ◦ y0 need not necessarily lie in M. On the other hand, the exponential Lie series ψt, or any truncation ψ̂t of it, lies in X(M). By Lemma 8.1 this is a necessary and sufficient condition for the corresponding flow-map exp ψ̂t to be a diffeomorphism on M. Hence if u(0) = ytn ∈ M, then ytn+1 ≈ u(1) ∈ M. When solving the ordinary differential equation (8.1), classical geometric integration methods, for example Lie group integrators such as Runge–Kutta Munthe-Kaas methods, over the interval τ ∈ [0, 1] will numerically ensure ytn+1 stays in M.

Additionally, as the following result reveals, numerical methods constructed using the Castell–Gaines Lie series approach can also be more accurate (a proof is provided in Appendix B). We define the strong global error at time T associated with an approximate solution ŷT as E ≡ ‖yT − ŷT‖L2.

Theorem 8.2. In the case of two independent Wiener processes and under the assumptions of Theorem 7.1, for any initial condition y0 ∈ M and a sufficiently small fixed stepsize h = tn+1 − tn, the order 1/2 Lie series integrator is globally more accurate in L2 than the order 1/2 stochastic Taylor integrator. In addition, in the case of one Wiener process, the order 1 and 3/2 uniformly accurate exponential Lie series integrators generated by

$$\hat\psi^{(1)}_{t_n,t_{n+1}} = J_0 V_0 + J_1 V_1 + \tfrac{h^2}{12}\,[V_1,[V_1,V_0]],$$

$$\hat\psi^{(3/2)}_{t_n,t_{n+1}} = J_0 V_0 + J_1 V_1 + \tfrac{1}{2}(J_{01} - J_{10})\,[V_0,V_1] + \tfrac{h^2}{12}\,[V_1,[V_1,V_0]],$$

respectively, are globally more accurate in L2 than their corresponding stochastic Taylor integrators. In other words, if E^ls_m denotes the global error of the exponential Lie series integrators of order m above, and E^st_m is the global error of the stochastic Taylor integrators of the corresponding order, then E^ls_m ≤ E^st_m for m = 1/2, 1, 3/2.

Remarks. 1. The result for ψ̂(3/2) is new.
That the order 1/2 Lie series integrator (for two Wiener processes) and the order 1 integrator generated by ψ̂(1) are uniformly more accurate confirms the asymptotically efficient properties of these schemes proved by Castell and Gaines [8]. The proof follows along the lines of an analogous result for linear stochastic systems considered in Lord, Malham and Wiese [31].
2. Consider the order 1/2 exponential Lie series with no vector field commutations. Solving the ordinary differential equation (8.1) using an (ordinary) Euler Munthe-Kaas method and approximating dexp−1σ ≈ id is equivalent to the order 1/2 stochastic Taylor Munthe-Kaas method (for the same Lie group and action).

9. Numerical examples.

9.1. Rigid body. We consider the dynamics of a rigid body such as a satellite (see Marsden and Ratiu [37]). We will suppose that the rigid body is perturbed by two independent multiplicative stochastic processes W^1 and W^2 with the corresponding vector fields Vi(y) ≡ ξi(y) y, for i = 0, 1, 2, with ξi ∈ so(3). If we normalize the initial data y0 so that |y0| = 1 then the dynamics evolves on M = S^2. We naturally suppose G = SO(3), and Λy0(S) ≡ S y0 so that λξi(y) = ξi(y) y, and we can pull back the flow generated by V on M to the flow on G generated by Xξi(S, t) = ξi(Λy0(S)) S, i = 0, 1, 2. We use the following matrix representation for the ξi(y) ∈ so(3):

$$\xi_i(y) = \begin{pmatrix} 0 & -y_3/\alpha_{i,3} & y_2/\alpha_{i,2} \\ y_3/\alpha_{i,3} & 0 & -y_1/\alpha_{i,1} \\ -y_2/\alpha_{i,2} & y_1/\alpha_{i,1} & 0 \end{pmatrix},$$

where the constants αi,j for j = 1, 2, 3 are chosen so that the vector fields Vi and matrices ξi do not commute for i = 0, 1, 2: α0,1 = 3, α0,2 = 1, α0,3 = 2, α1,1 = 1, α1,2 = 1/2, α1,3 = 3/2, α2,1 = 1/4, α2,2 = 1, α2,3 = 1/2. The vector fields Vi satisfy the conditions of Theorem 7.1 since the manifold is compact in this case.
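As a quick numerical check of the rigid body set-up above, the following Python sketch builds the matrices ξi(y) with the constants αi,j quoted in the text and verifies three properties at a sample point: each ξi(y) is skew-symmetric, each Vi(y) = ξi(y) y is tangent to the sphere, and ξ0 and ξ1 do not commute. The function names and the sample point are assumptions made for illustration only.

```python
import numpy as np

# Rows hold (alpha_{i,1}, alpha_{i,2}, alpha_{i,3}) for i = 0, 1, 2, as listed in the text.
ALPHA = np.array([[3.0, 1.0, 2.0],
                  [1.0, 0.5, 1.5],
                  [0.25, 1.0, 0.5]])

def xi(y, alpha):
    """Matrix representation of xi_i(y) in so(3) for one row alpha of ALPHA."""
    y1, y2, y3 = y
    a1, a2, a3 = alpha
    return np.array([[0.0, -y3 / a3, y2 / a2],
                     [y3 / a3, 0.0, -y1 / a1],
                     [-y2 / a2, y1 / a1, 0.0]])

def vector_field(y, alpha):
    """Governing vector field V_i(y) = xi_i(y) y."""
    return xi(y, alpha) @ y

# Hypothetical sample point on the unit sphere (not the initial data of the paper).
y = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)

for i in range(3):
    m = xi(y, ALPHA[i])
    skew = np.allclose(m, -m.T)                       # xi_i(y) lies in so(3)
    normal_part = abs(y @ vector_field(y, ALPHA[i]))  # component of V_i along y
    print(i, skew, normal_part)

# Commutator of xi_0(y) and xi_1(y); a nonzero norm means the matrices do not commute.
commutator = xi(y, ALPHA[0]) @ xi(y, ALPHA[1]) - xi(y, ALPHA[1]) @ xi(y, ALPHA[0])
print(np.linalg.norm(commutator))
```

Tangency follows from skew-symmetry alone, since y · ξ y = 0 for any skew-symmetric ξ, which is the condition of Lemma 8.1 for the unit sphere.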
We will numerically solve (1.1) using three different order 1 methods: stochastic Taylor, stochastic Taylor Munthe-Kaas based on (6.1) and Castell–Gaines (a standard non-geometric Runge–Kutta method is used to solve the ordinary differential equation (8.1)). The vector field compositions ViVj needed for the stochastic Taylor and Castell–Gaines methods are readily computed. For the Munthe-Kaas method we note that we have vξi ◦ o = ξi(y0) and

$$v_{\xi_i}v_{\xi_j}\circ o = \hat{A}(y_0, y_0;\alpha_i,\alpha_j) - \tfrac{1}{2}\,[\xi_i(y_0), \xi_j(y_0)].$$

Here o ∈ so(3) is the zero element of the Lie algebra, and for all y, z ∈ R^3 we define

$$A(y, z;\alpha,\beta) \equiv \begin{pmatrix} \bigl(y_2 z_3/\alpha_2 - y_3 z_2/\alpha_3\bigr)/\beta_1 \\ \bigl(y_3 z_1/\alpha_3 - y_1 z_3/\alpha_1\bigr)/\beta_2 \\ \bigl(y_1 z_2/\alpha_1 - y_2 z_1/\alpha_2\bigr)/\beta_3 \end{pmatrix},$$

and ˆ : R^3→so(3) denotes the vector space isomorphism σ ↦ σ̂ where

$$\hat\sigma = \begin{pmatrix} 0 & -\sigma_3 & \sigma_2 \\ \sigma_3 & 0 & -\sigma_1 \\ -\sigma_2 & \sigma_1 & 0 \end{pmatrix}.$$

Note that ŷ z ≡ y ∧ z (see Marsden and Ratiu [37]). Note also that since σ̂ ∈ so(3), exp σ̂ ∈ SO(3) can be conveniently and cheaply computed using Rodrigues’ formula (see Marsden and Ratiu [37] or Iserles et al. [25]).

In Figure 9.1 we show the distance from the manifold S^2 of each of the three approximations; we start with initial data y0 = ( 2, 0)T. The stochastic Taylor Munthe-Kaas method can be seen to preserve the solution in the unit sphere to within machine error. We also see that the stochastic Taylor method clearly drifts off the sphere as the integration time progresses, as does the non-geometric Castell–Gaines method, which does however remain markedly closer to the manifold than the stochastic Taylor scheme.

Fig. 9.1. Rigid body: We show the log-distance of the approximate solution to the unit sphere as a function of time for each of the methods (stochastic Taylor, Castell–Gaines and Munthe-Kaas). Below we show the approximate solutions as a function of time for the stochastic Taylor (blue) and Munthe-Kaas methods (magenta). The trajectory starts at the top right and eventually drifts over the left horizon.
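The hat map and Rodrigues’ formula mentioned above can be sketched in Python as follows. The closed form used here is the standard one, exp σ̂ = I + (sin θ/θ) σ̂ + ((1 − cos θ)/θ²) σ̂² with θ = |σ|; the function names and the small-angle fallback threshold are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def hat(sigma):
    """Vector space isomorphism R^3 -> so(3): hat(y) @ z equals the cross product y x z."""
    return np.array([[0.0, -sigma[2], sigma[1]],
                     [sigma[2], 0.0, -sigma[0]],
                     [-sigma[1], sigma[0], 0.0]])

def rodrigues(sigma):
    """Exponential map so(3) -> SO(3) evaluated with Rodrigues' formula."""
    theta = np.linalg.norm(sigma)
    k = hat(sigma)
    if theta < 1e-8:
        # Second-order Taylor fallback near the zero element, avoiding 0/0.
        return np.eye(3) + k + 0.5 * (k @ k)
    return (np.eye(3)
            + (np.sin(theta) / theta) * k
            + ((1.0 - np.cos(theta)) / theta ** 2) * (k @ k))

y = np.array([0.2, -1.0, 0.7])
z = np.array([1.5, 0.3, -0.4])
print(np.allclose(hat(y) @ z, np.cross(y, z)))   # hat(y) z = y ^ z

rotation = rodrigues(np.array([0.0, 0.0, np.pi / 2]))  # quarter turn about the third axis
print(np.round(rotation @ np.array([1.0, 0.0, 0.0]), 12))
```

Only one sine, one cosine and one 3 × 3 matrix product are needed per evaluation, which is why the formula is cheap compared with a general-purpose matrix exponential.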
Fig. 9.2. Autonomous underwater vehicle: We show the log-distance of the approximate solution to the two Casimirs C1 = π · p (dotted line) and C2 = |p|^2 (solid line) as a function of time for each of the methods (stochastic Taylor, Castell–Gaines and Munthe-Kaas). Below, we also show the global error as a function of stepsize (number of sampled paths = 100).

9.2. Autonomous underwater vehicle. The dynamics of an ellipsoidal autonomous underwater vehicle is prescribed by the state y = (π, p) ∈ se(3)∗ where π ∈ so(3)∗ is its angular momentum and p ∈ (R^3)∗ its linear momentum (see Holmes, Jenkins and Leonard [24], Egeland, Dalsmo and Sørdalen [12] and Marsden and Ratiu [37]). We suppose that the vehicle is perturbed by two independent multiplicative stochastic processes. The governing vector fields are for i = 0, 1, 2:

$$V_i(y) = \mathrm{ad}^*_{\xi_i(y)}\circ y.$$

Here ξi(y) = (ωi(y), ui(y)) ∈ se(3) where ωi(y) = Ii^{−1} π and ui(y) = Mi^{−1} p are the angular and linear velocity, and Ii = diag(αi,1, αi,2, αi,3) and Mi = diag(βi,1, βi,2, βi,3) are the constant moment of inertia and mass matrices, respectively. Explicitly for ξ ∈ se(3) we have

$$\mathrm{ad}^*_\xi\circ y \equiv (\pi\wedge\omega + p\wedge u,\; p\wedge\omega).$$

The system of vector fields Vi, i = 0, 1, 2 represents the Lie–Poisson dynamics on M = se(3)∗ (Marsden and Ratiu [37]). There are two independent Casimir functions Ck : se(3)∗→R, k = 1, 2, namely C1 = π · p and C2 = |p|^2; these are conserved by the flow on se(3)∗. Note that the Hamiltonian, i.e. total kinetic energy ½(π · ω + p · u), is also exactly conserved (and helpful for establishing the sufficiency conditions in Theorem 7.1), but that is not our focus here.
If G = SE(3) ≅ SO(3) × R^3, then the coadjoint action of SE(3) on se(3)∗, Ad∗ : SE(3) × se(3)∗→se(3)∗, is defined for all S = (s, ρ) ∈ SE(3), where s ∈ SO(3) and ρ ∈ R^3, and y ∈ se(3)∗ by:

$$\Lambda_y\circ S = \mathrm{Ad}^*_{S^{-1}}\circ y \equiv \bigl(s\pi + \rho\wedge(sp),\; sp\bigr).$$

The corresponding infinitesimal action λ : se(3) × se(3)∗→se(3)∗ for all ξ ∈ se(3) and y ∈ se(3)∗ is given by (see Marsden and Ratiu [37], p. 477) λξ ◦ y = −ad∗ξ ◦ y. Since ad∗ξ(y) = −λξ(y) = λ−ξ(y) the governing set of vector fields on se(3)∗ are Vi(y) = λ−ξi ◦ y. We can now pull back this flow on se(3)∗ to a flow on SE(3) via Λy0. The corresponding flow on SE(3) is generated by the governing set of vector fields for i = 0, 1, 2:

$$X_{-\xi_i}\circ S = -\bigl(\omega_i(y)\wedge s,\; \omega_i(y)\wedge\rho + u_i(y)\bigr)$$

with y = Λy0(S). To aid implementation note that SE(3) = {(s, ρ) : s ∈ SO(3), ρ ∈ R^3} embeds into SL(4;R) via the map

$$S = (s,\rho) \mapsto \begin{pmatrix} s & \rho \\ O^{\mathsf T} & 1 \end{pmatrix},$$

where O is the three-vector of zeros. Also se(3) is isomorphic to a Lie subalgebra of sl(4;R) with elements of the form

$$\sigma = (\theta,\zeta) \mapsto \begin{pmatrix} \hat\theta & \zeta \\ O^{\mathsf T} & 0 \end{pmatrix}.$$

Hence the governing vector fields on SE(3) are of the form Xξi = −ξi(y)S, where

$$\xi_i(y) = \begin{pmatrix} \hat\omega_i(\pi) & u_i(p) \\ O^{\mathsf T} & 0 \end{pmatrix}.$$

The governing vector fields on se(3) are vξi(σ) = −dexp−1σ ◦ ξi(Λy0(exp σ)). Again the vector field compositions ViVj needed for the stochastic Taylor and Castell–Gaines methods can be computed straightforwardly. Direct calculation also reveals that in block matrix form

$$v_{\xi_i}v_{\xi_j}\circ o = \begin{pmatrix} \hat{A}(\pi_0,\pi_0;\alpha_i,\alpha_j) + \hat{A}(p_0,p_0;\beta_i,\alpha_j) & A(\pi_0,p_0;\alpha_i,\beta_j) \\ O^{\mathsf T} & 0 \end{pmatrix} - \tfrac{1}{2}\,[\xi_i(y_0),\xi_j(y_0)].$$

Here A(y, z; α, β) is defined as for the rigid body example. Note that the exponential map exp_se(3) : se(3)→SE(3) is defined for all σ = (θ, ζ) ∈ se(3) by

$$\exp_{se(3)}\sigma = \begin{pmatrix} \exp_{so(3)}\hat\theta & f(\theta)\,\zeta \\ O^{\mathsf T} & 1 \end{pmatrix},$$

where exp_so(3) is the exponential map from so(3) to SO(3) which can be computed using Rodrigues’ formula and (see Bullo and Murray [4], p. 5)

$$f(\theta) = I_{3\times 3} + (1-\cos\|\theta\|)\,\hat\theta/\|\theta\|^2 + \bigl(1 - (\sin\|\theta\|)/\|\theta\|\bigr)\,\hat\theta^2/\|\theta\|^2.$$
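The closed form for the exponential map on se(3) can be checked numerically against a generic matrix exponential of the 4 × 4 embedding. The Python sketch below does this; the helper names, the small-angle fallbacks and the sample element (θ, ζ) are assumptions made for illustration, and the reference exponential is a plain scaling-and-squaring Taylor series rather than anything prescribed by the paper.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(theta):
    """Rodrigues' formula for the exponential map so(3) -> SO(3)."""
    n = np.linalg.norm(theta)
    k = hat(theta)
    if n < 1e-8:
        return np.eye(3) + k + 0.5 * (k @ k)
    return np.eye(3) + (np.sin(n) / n) * k + ((1.0 - np.cos(n)) / n ** 2) * (k @ k)

def f(theta):
    """The 3x3 matrix f(theta) multiplying zeta in the closed form for exp on se(3)."""
    n = np.linalg.norm(theta)
    k = hat(theta)
    if n < 1e-8:
        return np.eye(3) + 0.5 * k + (k @ k) / 6.0  # leading terms of the series
    return (np.eye(3)
            + (1.0 - np.cos(n)) * k / n ** 2
            + (1.0 - np.sin(n) / n) * (k @ k) / n ** 2)

def exp_se3(theta, zeta):
    """Closed-form exponential, returned in the 4x4 matrix embedding of SE(3)."""
    S = np.eye(4)
    S[:3, :3] = exp_so3(theta)
    S[:3, 3] = f(theta) @ zeta
    return S

def embed(theta, zeta):
    """4x4 matrix representation of sigma = (theta, zeta) in se(3)."""
    sigma = np.zeros((4, 4))
    sigma[:3, :3] = hat(theta)
    sigma[:3, 3] = zeta
    return sigma

def expm_series(a, terms=20, squarings=8):
    """Reference matrix exponential: scaling and squaring of a truncated Taylor series."""
    a = np.asarray(a, dtype=float) / 2.0 ** squarings
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for j in range(1, terms + 1):
        term = term @ a / j
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

theta = np.array([0.3, -0.5, 0.8])   # hypothetical sample element of se(3)
zeta = np.array([1.0, 2.0, -1.0])
difference = np.abs(exp_se3(theta, zeta) - expm_series(embed(theta, zeta))).max()
print(difference)
```

The closed form costs a handful of 3 × 3 products, whereas the reference routine needs dozens of 4 × 4 products, which is the practical reason for using it inside a time-stepping loop.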
In Figure 9.2 we show the distance from the manifold se(3)∗ of each of the three approximations; in particular how far the individual trajectories stray from the Casimirs C1 = π · p and C2 = |p|^2. We start with the initial data y0 = ( 2, 0, 0, …). As before the stochastic Taylor Munthe-Kaas method can be seen to preserve the Casimirs to within machine error. We also see that the stochastic Taylor method clearly drifts off the manifold as the integration time progresses and, at a particular time depending on the Wiener path, shoots off very rapidly away from the manifold. Note also that for large stepsizes the stochastic Taylor method is unstable. However the non-geometric Castell–Gaines and stochastic Munthe-Kaas methods still give reliable results in that regime. Lastly, although the stochastic Munthe-Kaas method adheres to the manifold to within machine error, the error of the non-geometric Castell–Gaines method is actually smaller.

10. Conclusions. We have established and implemented stochastic Lie group integrators based on stochastic Munthe-Kaas methods and also derived geometric Castell–Gaines methods. We have also revealed several aspects of these integrators that require further investigation.
1. We could construct a stochastic nonlinear Magnus method by approximating the solution to the stochastic differential equation (5.1) on the Lie algebra using Picard iterations (see Casas and Iserles [6]).
2. We would like to develop a practical procedure for implementing ordinary Munthe-Kaas methods for higher order Castell–Gaines integrators. We need to determine the element ξ : M→g so that in (8.1) we have ψ̂ = λξ.
3. We need to determine the properties of the local and global errors for the stochastic Munthe-Kaas methods. Also a thorough investigation of the stability properties of the stochastic Munthe-Kaas and Castell–Gaines methods is required.
For the autonomous underwater vehicle simulations they were both superior to the direct stochastic Taylor method, especially for larger stepsizes. We also need to compare the relative efficiency of the methods concerned, in particular to compare an optimally efficient geometric Castell–Gaines method with the stochastic Munthe-Kaas method.
4. Although we have chiefly confined ourselves to driving paths that are Wiener processes, we can extend Munthe-Kaas and Castell–Gaines methods to rougher driving paths (Lyons and Qian [33], Friz [18], Friz and Victoir [19]). Further, what happens when we consider processes involving jumps? For example Srivastava, Miller and Grenander [44] consider jump diffusion processes on matrix Lie groups for Bayesian inference. Or what if we consider fractional Brownian driving paths; Baudoin and Coutin [3] investigate fractional Brownian motions on Lie groups?
5. Schiff and Shnider [43] have used Lie group methods to derive Möbius schemes for numerically integrating deterministic Riccati systems beyond finite time removable singularities and numerical instabilities. They integrate a linear system of equations on the general linear group GL(n) which corresponds to a Riccati flow on the Grassmannian manifold Gr(k, n) via the Möbius action map. Lord, Malham and Wiese [31] implemented stochastic Möbius schemes and show that they can be more accurate and cost effective than directly solving stochastic Riccati systems using stochastic Taylor methods. We would like to investigate further their effectiveness for stochastic Riccati equations arising in Kalman filtering (Kloeden and Platen [27]) and to backward stochastic Riccati equations arising in optimal stochastic linear-quadratic control (see for example Kohlmann and Tang [28] and Estrade and Pontier [16]).
6.
Other areas of potential application of the methods we have presented in this paper are for example: term-structure interest rate models evolving on finite dimensional invariant manifolds (see Filipovic and Teichmann [17]); stochastic dynamics triggered by DNA damage (Chickarmane, Ray, Sauro and Nadim [10]) and stochastic symplectic integrators for which the gradient of the solution evolves on the symplectic Lie group (see Milstein and Tretyakov [38]).

Acknowledgments. We thank Alex Dragt, Peter Friz, Anders Hansen, Terry Lyons, Per-Christian Moan and Hans Munthe-Kaas for stimulating discussions. We also thank the anonymous referees, whose suggestions and encouragement improved the original manuscript significantly. SJAM would like to acknowledge the invaluable facilities of the Isaac Newton Institute where some of the final touches to this manuscript were completed.

Appendix A. Proof of Theorem 7.1. We follow the proof for linear stochastic differential equations in Lord, Malham and Wiese [31] (where further technical details on estimates for multiple Stratonovich integrals can be found). Suppose ψ̂t ≡ ψ̂t(m) is the truncated Lie series (7.1). First we show that exp ψ̂t ◦ y0 ∈ L2. We see that for any number k, (ψ̂t)^k ◦ y0 is a sum of |Qm|^k terms, each of which is a k-multiple product of terms Jα cα ◦ y0. It follows that

$$\bigl\|(\hat\psi_t)^k\circ y_0\bigr\|_{L^2} \le \Bigl(\max_{\alpha\in Q_m}\|c_\alpha\circ y_0\|\Bigr)^{k} \sum_{\alpha_i\in Q_m,\; i=1,\ldots,k} \|J_{\alpha_1}J_{\alpha_2}\cdots J_{\alpha_k}\|_{L^2}. \tag{A.1}$$

Note that the maximum of the norm of the compositions of vector fields cα ◦ y0 is taken over a finite set. Repeated application of the product rule reveals that for i = 1, . . . , k, each term ‘Jα1Jα2 · · · Jαk’ in (A.1) is the sum of at most 2^{2mk−1} Stratonovich integrals Jβ, where for t ≤ 1, ‖Jβ‖L2 ≤ 2^{4mk−1} t^{k/2}. Since the right hand side of equation (A.1) consists of |Qm|^k 2^{2mk−1} Stratonovich integrals Jβ, we conclude that

$$\bigl\|(\hat\psi_t)^k\circ y_0\bigr\|_{L^2} \le \Bigl(\max_{\alpha\in Q_m}\|c_\alpha\circ y_0\|\cdot|Q_m|\cdot 2^{6m}\cdot t^{1/2}\Bigr)^{k}.$$

Hence exp ψ̂t ◦ y0 is square-integrable. Second we prove (7.2).
Let ŷt denote the stochastic Taylor series solution, trun- cated to included terms of order up to and including tm. We have ∥yt − exp ψ̂t ◦ y0 ∥yt − ŷt ∥ŷt − exp ψ̂t ◦ y0 We know yt ∈ L2—see Lemma III.2.1 in Gihman and Skorohod [21]. Note that the assumptions there are fulfilled, since the uniform boundedness of the derivatives implies uniform Lipschitz continuity of the vector fields by the mean value theorem, and uniform Lipschitz continuity in turn implies a linear growth condition for the vector fields since they are autonomous. Note that ŷt is a strong approximation to yt up to and including terms of order t m, with the remainder consisting of O(tm+1/2) terms (see Proposition 5.9.1 in Kloeden and Platen [27]). It follows from the definition of the exponential Lie series as the logarithm of the stochastic Taylor series, that the terms of order up to and including tm in exp ψ̂t ◦ y0 correspond with ŷt; the error consists of O(tm+1/2) terms. Appendix B. Proof of Theorem 8.2. Our proof follows along the lines of that for uniformly accurate Magnus integrators for linear constant coefficient systems (see Lord, Malham & Wiese [31] and Malham and Wiese [35]). Let ϕtn,tn+1 and ϕ̂tn,tn+1 denote the exact and approximate flow-maps constructed on the interval [tn, tn+1] of length h. We define the local flow remainder as Rtn,tn+1 ≡ ϕtn,tn+1 − ϕ̂tn,tn+1 , and so the local remainder is Rtn,tn+1 ◦ ytn . Let Rls and Rst denote the local flow remainders corresponding to the exponential Lie series and stochastic Taylor approx- imations, respectively. B.1. Order 1/2 integrator: two Wiener processes. For the global order 1/2 integrators we have to leading order Rls = 1 (J12 − J21)[V1, V2] and Rst = J12V1V2 + J21V2V1. Note that we have included the terms J11V 1 and J22V 2 in the integrators. A direct calculation reveals that (Rst ◦ y0)TRst ◦ y0 (Rls ◦ y0)TRls ◦ y0 + h2mUTBU +O . 
(B.1) Here m = 1/2 (for the order 1/2 integrators), U = (V1V2 ◦ y0, V2V1 ◦ y0)T ∈ R2n, and B ∈ R2n×2n consists of n× n diagonal blocks of the form bijIn×n where b = 1 and In×n is the n×n identity matrix. Since b is positive semi-definite, the matrix B = b⊗In×n is positive semi-definite. Hence the order 1/2 exponential Lie series integrator is locally more accurate than the corresponding stochastic Taylor integrator. B.2. Order 1 integrator: one Wiener process. For the global order 1 in- tegrators we have to leading order Rls = 1 (J01 − J10)[V0, V1] and Rst = J01V0V1 + J10V1V0 + J111V h2(V0V 1 + V 1 V0). The terms of order h 2 shown are significant when we consider the global error in Section B.4 below. The estimate (B.1) also applies in this case with m = 1 and U = (V0V1 ◦ y0, V1V0 ◦ y0, V 31 ◦ y0)T ∈ R3n; and B ∈ R3n×3n consists of n× n diagonal blocks of the form bijIn×n where b = 1 3 3 3 3 3 3 3 3 5 Stochastic Lie group integrators 19 Since b is positive semi-definite, the matrix B = b ⊗ In×n is positive semi-definite. Hence the order 1 exponential Lie series integrator is locally more accurate than the corresponding stochastic Taylor integrator. B.3. Order 3/2 integrator: one Wiener process. The local flow remainders are Rls = 1 J110−2J101+J011− 12h [V1, [V1, V0]] and R st = J011V0V 1 +J101V1V0V1+ J110V 1 V0 + J1111V 1 − 14h 2(V0V 1 + V 1 V0 + V 41 ). The terms of order h 2 shown are significant when we consider the global error—but for a different reason this time—see Section B.4 below. Again, the estimate (B.1) applies in this case with m = 3/2 and U = (V0V 1 ◦ y0, V1V0V1 ◦ y0, V 21 V0 ◦ y0, V 41 ◦ y0)T ∈ R4n; and B ∈ R4n×4n consists of n× n diagonal blocks of the form bijIn×n where b = 1 11 8 5 12 8 8 8 12 5 8 11 12 12 12 12 24 Again, B is positive semi-definite and the order 3/2 exponential Lie series integrator is locally more accurate than the corresponding stochastic Taylor integrator. B.4. Global error. 
Recall that we define the strong global error at time T associated with an approximate solution $\hat y_T$ as $\mathcal{E} \equiv \|y_T - \hat y_T\|_{L^2}$. The exact and approximate solutions can be constructed by successively applying the exact and approximate flow maps $\varphi_{t_n,t_{n+1}}$ and $\hat\varphi_{t_n,t_{n+1}}$ on the successive intervals $[t_n, t_{n+1}]$ to the initial data y0. A straightforward calculation shows that, for a small fixed stepsize h,
\[ \mathcal{E}^2 = \mathbb{E}\bigl[(R\circ y_0)^{T} R\circ y_0\bigr], \tag{B.2} \]
up to higher order terms, where
\[ R \equiv \sum_{n=0}^{N-1} \varphi_{t_{n+1},t_N} \circ R_{t_n,t_{n+1}} \circ \varphi_{t_0,t_n} \]
is the standard accumulated local error contribution to the global error. The important conclusion is that when we construct the global error (B.2), the terms of leading order in the local flow remainders $R^{ls}$ or $R^{st}$ with zero expectation lose only a half order of convergence in this accumulation effect. Hence in the local flow remainders shown above, for the terms of zero expectation, the local superior accuracy for the Lie series integrators transfers to the corresponding global errors (see Lord, Malham and Wiese [31] for more details). Terms of non-zero expectation, however, behave like deterministic error terms, losing a whole order (in the local to global convergence); they contribute to the global error through their expectations. Hence we include such terms of order $h^2$ in the order 3/2 integrators above and they appear as the terms subtracted from the remainders shown. For the order 1 integrators we do not need to include the order $h^2$ terms in the integrator to obtain the correct mean-square convergence. However, to guarantee that the global error for the exponential Lie series integrator is always smaller than that for the stochastic Taylor scheme, we include this term in the integrator.

REFERENCES
[1] G. Ben Arous, Flots et series de Taylor stochastiques, Probab. Theory Related Fields, 81 (1989), pp. 29–77. [2] F. Baudoin, An introduction to the geometry of stochastic flows, Imperial College Press, 2004. [3] F. Baudoin and L.
Coutin, Self-similarity and fractional Brownian motions on Lie groups, arXiv:math.PR/0603199 v1, 2006. http://arxiv.org/abs/math/0603199 [4] F. Bullo and R. M. Murray, Proportional derivative (PD) control on the Euclidean group, CDS Technical Report 95-010, 1995. [5] K. Burrage and P. M. Burrage, High strong order methods for non-commutative stochastic ordinary differential equation systems and the Magnus formula, Phys. D, 133 (1999), pp. 34–48. [6] F. Casas and A. Iserles, Explicit Magnus expansions for nonlinear equations, Cambridge NA reports, 2005. [7] F. Castell, Asymptotic expansion of stochastic flows, Probab. Theory Related Fields, 96 (1993), pp. 225–239. [8] F. Castell and J. Gaines, An efficient approximation method for stochastic differential equations by means of the exponential Lie series, Math. Comp. Simulation, 38 (1995), pp. 13–19. [9] K. T. Chen, Integration of paths, geometric invariants and a generalized Baker–Hausdorff formula, Annals of Mathematics, 65(1) (1957), pp. 163–178. [10] V. Chickarmane, A. Ray, H. M. Sauro and A. Nadim, A model for p53 dynamics triggered by DNA damage, SIAM J. Applied Dynamical Systems, 6(1) (2007), pp. 61–78. [11] P. E. Crouch and R. Grossman, Numerical integration of ordinary differential equations on manifolds, J. Nonlinear Sci., 3 (1993), pp. 1–33. [12] O. Egeland, M. Dalsmo and O. J. Sørdalen, Feedback control of a nonholonomic underwater vehicle with a constant desired configuration, The International Journal of Robotics Research, 15(1) (1996), pp. 24–35. [13] K. D. Elworthy, Stochastic differential equations on manifolds, London Mathematical Society Lecture Note Series 70, Cambridge University Press, 1982. [14] M. Emery, Stochastic Calculus on manifolds, Universitext, Springer–Verlag, 1989. [15] M. Emery, On two transfer principles in stochastic differential geometry, Séminaire de probabilités (Strasbourg), 24 (1990), pp. 407–441. [16] A. Estrade and M.
Pontier, Backward stochastic differential equations in a Lie group, Séminaire de probabilités (Strasbourg), 35 (2001), pp. 241–259. [17] D. Filipović and J. Teichmann, On the geometry of the term structure of interest rates, Proc. R. Soc. Lond. A, 460 (2004), pp. 129–167. [18] P. Friz, Continuity of the Itô-map for Hölder rough paths with applications to the support theorem in Hölder norm, arXiv:math.PR/0304501 v2, 2003. [19] P. Friz and N. Victoir, Euler estimates for rough differential equations, Preprint, 2007. [20] J. G. Gaines and T. J. Lyons, Variable step size control in the numerical solution of stochastic differential equations, SIAM J. Appl. Math., 57(5) (1997), pp. 1455–1484. [21] I. I. Gihman, and A. V. Skorohod, The theory of stochastic processes III, Springer, 1979. [22] E. Hairer, C. Lubich and G. Wanner, Geometric Numerical Integration, Springer Series in Computational Mathematics, 2002. [23] S. Helgason, Differential geometry, Lie groups, and symmetric spaces, Academic Press, 1978. [24] P. Holmes, J. Jenkins and N. E. Leonard, Dynamics of the Kirchoff Equations I: coincident centers of gravity and bouyancy, Phys. D, 118 (1998), pp. 311–342. [25] A. Iserles, H. Z. Munthe-Kaas, S. P. Nørsett, and A. Zanna, Lie-group methods, Acta Numer., (2000), pp. 215–365. [26] A. Iserles and A. Zanna, Efficient computation of the matrix exponential by generalized polar decompositions, SIAM J. Numer. Anal., 42(5) (2005), pp. 2218–2256. [27] P. E. Kloeden and E. Platen, Numerical solution of stochastic differential equations, Springer, 1999. [28] M. Kohlmann and S. Tang, Multidimensional backward stochastic Riccati equations and ap- plications, SIAM J. Control Optim., 41(6) (2003), pp. 1696–1721. [29] H. Kunita, On the representation of solutions of stochastic differential equations, LNM 784, Springer–Verlag, 1980, pp. 282–304. [30] , Stochastic flows and stochastic differential equations, Cambridge University Press, 1990. [31] G. Lord, S. J.A. Malham and A. 
Wiese, Efficient strong integrators for linear stochastic systems, 2006, Submitted. [32] T. Lyons, Differential equations driven by rough signals, Rev. Mat. Iberoamericana, 14(2) (1998), pp. 215–310. [33] T. Lyons and Z. Qian, System control and rough paths, Oxford University Press, 2002. [34] W. Magnus, On the exponential solution of differential equations for a linear operator, Comm. Pure Appl. Math., 7 (1954), pp. 649–673. [35] S. J.A. Malham and A. Wiese, Universal optimal stochastic expansions, 2007, Preprint. [36] P. Malliavin, Stochastic analysis, Grundlehren der mathematischen Wissenschaften 313, Springer, 1997. [37] J. E. Marsden and T. S. Ratiu, Introduction to mechanics and symmetry, Second edition, Springer, 1999. [38] G. N. Milstein and M. V. Tretyakov, Stochastic numerics for mathematical physics, Springer, 2004. [39] T. Misawa, A Lie algebraic approach to numerical integration of stochastic differential equations, SIAM J. Sci. Comput., 23(3) (2001), pp. 866–890. [40] H. Munthe-Kaas, High order Runge–Kutta methods on manifolds, Appl. Numer. Math., 29 (1999), pp. 115–127. [41] N. J. Newton, Asymptotically efficient Runge–Kutta methods for a class of Itô and Stratonovich equations, SIAM J. Appl. Math., 51 (1991), pp. 542–567. [42] P. J. Olver, Equivalence, invariants, and symmetry, Cambridge University Press, 1995. [43] J. Schiff and S. Shnider, A natural approach to the numerical integration of Riccati differential equations, SIAM J. Numer. Anal., 36(5) (1999), pp. 1392–1413. [44] A. Srivastava, M. I. Miller and U. Grenander, Jump-diffusion processes on matrix Lie groups for Bayesian inference, preprint, 2000. [45] R. S. Strichartz, The Campbell–Baker–Hausdorff–Dynkin formula and solutions of differential equations, J. Funct. Anal., 72 (1987), pp. 320–345. [46] H. J.
Sussmann, Product expansions of exponential Lie series and the discretization of stochas- tic differential equations, in Stochastic Differential Systems, Stochastic Control Theory, and Applications, W. Fleming and J. Lions, eds., Springer IMA Series, Vol. 10 (1988), pp. 563–582. [47] V. S. Varadarajan, Lie groups, Lie algebras, and their representations, Springer, 1984. [48] F. W. Warner, Foundations of differentiable manifolds and Lie groups, Graduate Texts in Mathematics, Springer–Verlag, 1983. [49] Y. Yamato, Stochastic differential equations and nilpotent Lie algebras, Z. Wahrsch. Verw. Gebiete, 47(2) (1979), pp 213–229. ABSTRACT We present Lie group integrators for nonlinear stochastic differential equations with non-commutative vector fields whose solution evolves on a smooth finite dimensional manifold. Given a Lie group action that generates transport along the manifold, we pull back the stochastic flow on the manifold to the Lie group via the action, and subsequently pull back the flow to the corresponding Lie algebra via the exponential map. We construct an approximation to the stochastic flow in the Lie algebra via closed operations and then push back to the Lie group and then to the manifold, thus ensuring our approximation lies in the manifold. We call such schemes stochastic Munthe-Kaas methods after their deterministic counterparts. We also present stochastic Lie group integration schemes based on Castell--Gaines methods. These involve using an underlying ordinary differential integrator to approximate the flow generated by a truncated stochastic exponential Lie series. They become stochastic Lie group integrator schemes if we use Munthe-Kaas methods as the underlying ordinary differential integrator. Further, we show that some Castell--Gaines methods are uniformly more accurate than the corresponding stochastic Taylor schemes. 
Lastly we demonstrate our methods by simulating the dynamics of a free rigid body such as a satellite and an autonomous underwater vehicle both perturbed by two independent multiplicative stochastic noise processes. <|endoftext|><|startoftext|> Introduction
The chromosphere remains the least understood layer of the solar atmosphere, with the very basics of its structure being hotly debated: is it better described by the classical picture of a steady temperature rise as a function of height, with superposed weak oscillations (e.g. semi-empirical models of Vernazza et al. [8], Fontenla et al. [5]), or does the temperature keep dropping outwards, with very hot shocks producing strong localized heating (radiation hydrodynamic simulations of Carlsson & Stein [3], [4], and Wedemeyer et al. [9])? The latter concept is consistent with the IR observations of carbon monoxide, which require cool gas to be present at chromospheric heights (see, e.g. Ayres [1]).
Thus, existing models cannot provide a complete description of the solar chromosphere. Consequently, two alternative pictures of the chromosphere currently coexist, and the role played by chromospheric dynamics in the structuring of this atmospheric layer is a subject of intense scientific debate.
One reason for conflicting models is that they are based either on atomic chromospheric lines and continua in the UV or on molecular lines in the IR, since UV observations are practically blind to cool gas in a dynamic chromosphere, while the IR observations sample only the cool part of the chromosphere. Improved and more sensitive diagnostics of the chromospheric structure and dynamics, which sample both the hot and the cool gas and should distinguish between the rival models, are provided by observations at millimeter wavelengths with an acceptable spatial resolution, as was proposed by Loukitcheva et al. [6]. In this contribution we review the unique chromospheric observations at 3.5 mm with the Berkeley-Illinois-Maryland Array and the analysis of the intensity variations expected from the model of Carlsson & Stein for mm wavelengths. We set out the requirements for mm observations with future instruments, with emphasis on spatial and temporal resolution. Finally we discuss the prospects for chromospheric studies with ALMA.
http://arxiv.org/abs/0704.0023v1
Fig. 1 Portrait of the solar chromosphere at the center of the Sun's disk at 4 different wavelengths on May 18, 2004. From top left to bottom right: MDI longitudinal photospheric magnetogram, UV 1600 Å image from TRACE, CaII K line center image from BBSO and BIMA image at 3.5 mm.
2 Results
2.1 Analysis of the BIMA observations at 3.5 mm
The Berkeley-Illinois-Maryland Array (BIMA) operating at a wavelength of 3.5 mm (frequency of 85 GHz) has been the only interferometer in the mm range frequently used for solar observations. The BIMA telescopes are now part of the CARMA array, which will also carry out such observations. With the BIMA data obtained in the years 2003 and 2004 we have constructed two-dimensional maps of the solar chromosphere with a resolution of 12′′, which represents the highest spatial resolution achieved so far at this wavelength for non-flare solar observations. The BIMA images have led to new insights into chromospheric structure and to the detection of spatially-resolved chromospheric oscillations at mm wavelengths. The details of the restoration procedure and extensive tests of the sensitivity of the BIMA data to the detection of dynamic signatures can be found in White et al. [11].
With the currently available resolution the contrast of the brightness structures is evaluated to be up to 30% of the quiet-Sun brightness at 3.5 mm (White et al. [11]).
However, the similarity of brightness structures, derived from the mm images and seen in other chromospheric emissions (Fig. 1), in spite of the difference in resolution of the images (1-2′′ resolution of the UV images), implies that the BIMA resolution is not enough to resolve the millimeter fine structure and observations with spatial resolution much higher than 12′′ are required. A detailed analysis of the relations between the millimeter emission, magnetic field and other chromospheric diagnostics is in preparation.
In the millimeter brightness we detected intensity oscillations with typical amplitudes of 50-150 K in the range of periods from 120 to 700 seconds (frequency range 1.5-8 mHz). We found a tendency toward short-period oscillations in internetwork and longer periods in network regions in the quiet Sun, which is in good agreement with the results obtained at other wavelengths. At 3 mm the inner parts of the chromospheric cells exhibit a behavior typical of the internetwork, with the maximum of the Fourier power in the 3-minute range; however, most of the oscillations are quasi-periodic, showing up in wave trains of finite duration lasting for typically 1-3 wave periods (see also Loukitcheva et al. [7]).
2.2 Analysis of the CS model millimeter spectrum
The response of the submillimeter and millimeter radiation to a time-series generated by Carlsson & Stein (CS) was computed under the assumption of thermal free-free radiation by Loukitcheva et al. [6]. The results are depicted in Fig. 2 as the excess intensity as a function of wavelength and time.
Fig. 2 Evolution of the Carlsson & Stein model millimeter spectrum with time (horizontal axis: time in seconds, from 400 to 3600 s). Negative grey scale representing excess intensity as a function of time and wavelength.
Wave periods of approximately 3 min can be clearly distinguished in the intensity at all considered wavelengths.
Though the dominant frequency of the oscillations changes slightly with wavelength, for all mm wavelengths the dominant period lies in the 3-minute range. The difference from one period of time to another can be explained by the presence of merging shocks during certain time intervals. The differences in the light curves at different wavelengths are caused primarily by the difference in the formation heights of the emitted radiation. In general the amplitudes of the oscillations compared to the radiation temperature are large. In this sense mm wavelength radiation combines the advantages of the CO lines, which mainly see the cool gas, with those of atomic lines and UV continua, which mainly sample the hot gas.
On the whole, the brightness temperatures are extremely time-dependent at millimeter wavelengths, following changes in the atmospheric parameters. With increasing wavelength the amplitude of the brightness oscillations grows significantly, reaches its maximum value at 2.2 mm (expected to be 15% of the quiet-Sun brightness temperature), and decreases rapidly towards longer wavelengths. Thus we can identify the range 0.8-5.0 mm as the appropriate range of mm wavelengths at which one can expect the clearest signatures of dynamic effects. A careful look at the mm brightness spectrum as a function of time (see Fig. 2) reveals a time delay between the oscillations at long and short millimeter wavelengths. Hence, it is possible to study wave modes traveling in the chromosphere by comparing sub-mm with mm observations.
3 Discussion
The CS model predicts that spatially and temporally resolved observations should clearly exhibit the signatures of the strong shock waves.
However, a direct comparison of the observational data products (RMS values, histogram skewness, Fourier and wavelet spectra, etc.), referring to regions with weak magnetic field like the quiet Sun internetwork, with the corresponding products expected from the simulations of Carlsson & Stein exhibits large differences. In particular, the RMS of the brightness temperature is nearly an order of magnitude larger in the model (800 K at 3 mm) than in the observations (100 K). Another difference is the absence of longer periods in the model power spectrum. But these discrepancies do not rule out the CS models. On the one hand the model is one-dimensional and hence does not predict a coherence length of the oscillations, while on the other hand we are not able to resolve individual oscillating elements due to the limited spatial resolution of the observations.
Consequently we estimated the influence of the spatial smearing on the model parameters of chromospheric dynamics and on the observed oscillatory power. Thus we confirmed that the very limited spatial resolution currently available hinders a clean separation between cells and network, and typically both network and internetwork areas contribute to the recorded BIMA radiation. From the analysis of the observational data it was found that power in all frequency ranges increases significantly with improving resolution. Consistency between the power predicted by the CS model and the observed power is obtained if the coherence length of oscillating elements is on the order of 1′′.
Our results are consistent with Wedemeyer et al. [10], who computed the millimeter wave signature resulting from the 3-D simulations of Wedemeyer et al. [9].
Although the 3-D simulations suffer from the fact that the radiative transfer of energy is computed entirely in LTE, which becomes a poor assumption at chromospheric heights, the authors believe that the chromospheric pattern and its temporal evolution are representative of the non-magnetic internetwork regions of the solar chromosphere. The simulations display a complex 3D structure of the chromospheric layers, which is highly dynamical on temporal scales of 20-25 s and on spatial scales comparable to solar granulation, which is in good agreement with the 1′′ size of oscillating elements that we deduced. According to Wedemeyer et al. [9] the chromospheric temperature structure is characterized by a pattern of hot shock waves, which originate from convective motions, and cool gas lying between the shocks. The intensity distribution at mm wavelengths follows the pattern of the shocks in the chromosphere with a sub-arcsecond size of the features associated with the shocks. All this complex and dynamic 3D structure can be deduced from observations at mm wavelengths with a sufficiently high spatial resolution of better than
4 Summary
Simultaneous mm-submm observations at different wavelengths can be used for the tomography of the solar atmosphere, as radiation at the different wavelengths originates from different layers, with the average formation height increasing with wavelength. Such observations also provide a strong test of present and future models.
However, observations that might be able to uncover the nature of the chromosphere should meet the following requirements:
– multiband observations in the mm-submm domain (0.8-5.0 mm) to address shock waves and chromospheric oscillation modes
– arcsecond spatial resolution to resolve fine structure
– temporal resolution better than a few seconds to follow its evolution in time
– FOV size of order of 1′
– accurate absolute calibration of the observations (Bastian [2])
These requirements look very similar to the technical specification of the continuum observations with the Atacama Large Millimeter Array (ALMA), which represents an enormous advance over existing instrumentation operating at mm-submm wavelengths. ALMA will produce images of the highest resolution available for the foreseeable future (although the technical problem of sampling both large and small spatial scales simultaneously, required for high-quality imaging of the chromosphere, will remain a challenge) and will be the most sensitive instrument operating at submm-mm wavelengths.
To summarize, ALMA will be an extraordinarily powerful instrument for studying the solar chromosphere. It will finally allow the mapping of the three-dimensional thermal structure of the solar chromosphere, which will be a real breakthrough in solar studies.
Acknowledgements The use of BIMA for scientific research carried out at the University of Maryland is supported by NSF grant AST–0028963. Solar research at the University of Maryland is supported by NSF grant ATM 99-90809 and NASA grants NAG 5-8192, NAG 5-10175, NAG 5-12860 and NAG 5-11872.
References
1. Ayres, T.R.: Does the Sun Have a Full-Time COmosphere? Ap. J. 575, 1104-1115 (2002)
2. Bastian, T. S.: ALMA and the Sun. Astronomische Nachrichten 323, 271-276 (2002)
3. Carlsson, M., & Stein, R.F.: Does a nonmagnetic solar chromosphere exist? Ap. J. 440, L29-L32 (1995)
4. Carlsson, M., & Stein, R.F.: Dynamic Hydrogen Ionization. Ap. J. 572, 626-635 (2002)
5.
Fontenla, J. M.; Avrett, E. H.; Loeser, R.: Energy balance in the solar transition region. III - Helium emission in hydrostatic, constant-abundance models with diffusion. Ap. J. 406, 319-345 (1990)
6. Loukitcheva, M., Solanki, S.K., Carlsson, M., Stein, R.F.: Millimeter observations and chromospheric dynamics. A&A 419, 747-756 (2004)
7. Loukitcheva, M., Solanki, S.K., White, S.: The dynamics of the solar chromosphere: comparison of model predictions with millimeter-interferometer observations. A&A 456, 713-723 (2006)
8. Vernazza, J. E., Avrett, E. H., Loeser, R.: Structure of the solar chromosphere. III - Models of the EUV brightness components of the quiet-sun. Ap. J. Suppl. 45, 635-725 (1981)
9. Wedemeyer, S., Freytag, B., Steffen, M., Ludwig, H.-G., Holweger, H.: Numerical simulation of the three-dimensional structure and dynamics of the non-magnetic solar chromosphere. A&A 414, 1121-1137 (2004)
10. Wedemeyer-Böhm, S., Ludwig, H.-G., Steffen, M., Freytag, B., Holweger, H.: The shock-patterned solar chromosphere in the light of ALMA. In: Favata et al. (eds.) Proceedings of "The 13th Cambridge Workshop on Cool Stars, Stellar Systems and the Sun", Hamburg, Germany, ESA SP-560, pp. 1035-1038 (2005)
11. White, S., Loukitcheva, M., & Solanki, S.K.: High-resolution millimeter-interferometer observations of the solar chromosphere. A&A 456, 697-711 (2006)
ABSTRACT The very nature of the solar chromosphere, its structuring and dynamics, remains far from being properly understood, in spite of intensive research. Here we point out the potential of chromospheric observations at millimeter wavelengths to resolve this long-standing problem.
Computations carried out with a sophisticated dynamic model of the solar chromosphere due to Carlsson and Stein demonstrate that millimeter emission is extremely sensitive to dynamic processes in the chromosphere and the appropriate wavelengths to look for dynamic signatures are in the range 0.8-5.0 mm. The model also suggests that high resolution observations at mm wavelengths, as will be provided by ALMA, will have the unique property of reacting to both the hot and the cool gas, and thus will have the potential of distinguishing between rival models of the solar atmosphere. Thus, initial results obtained from the observations of the quiet Sun at 3.5 mm with the BIMA array (resolution of 12 arcsec) reveal significant oscillations with amplitudes of 50-150 K and frequencies of 1.5-8 mHz with a tendency toward short-period oscillations in internetwork and longer periods in network regions. However, higher spatial resolution, such as that provided by ALMA, is required for a clean separation between the features within the solar atmosphere and for an adequate comparison with the output of the comprehensive dynamic simulations. <|endoftext|><|startoftext|> Formation of quasi-solitons in transverse confined ferromagnetic film media
A.A. Serga 1
Technische Universität Kaiserslautern, Department of Physics and Forschungsschwerpunkt MINAS, D - 67663 Kaiserslautern, Germany
M. Kostylev 2
School of Physics, The University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia
St.Petersburg Electrotechnical University, 197376, St.Petersburg, Russia
B. Hillebrands
Technische Universität Kaiserslautern, Department of Physics and Forschungsschwerpunkt MINAS, D - 67663 Kaiserslautern, Germany
Abstract
The formation of quasi-2D spin-wave waveforms in longitudinally magnetized stripes of ferrimagnetic film was observed by using the time- and space-resolved Brillouin light scattering technique.
In the linear regime it was found that the confinement decreases the amplitude of dynamic magnetization near the lateral stripe edges. Thus, the so-called effective dipolar pinning of dynamic magnetization takes place at the edges. In the nonlinear regime a new stable spin wave packet propagating along a waveguide structure, for which both transversal instability and interaction with the side walls of the waveguide are important, was observed. The experiments and a numerical simulation of the pulse evolution show that the shape of the formed waveforms and their behavior are strongly influenced by the confinement.
We report on the observation of a new type of stable, two-dimensional nonlinear spin wave packet propagating in a magnetic waveguide structure and suggest a theoretical description of our experimental findings. Stable two-dimensional spin wave packets, so-called spin wave bullets, were previously observed, however solely in long and wide samples of a thin ferrimagnetic film of yttrium-iron-garnet (YIG) [1, 2, 3], that were practically unbounded in both in-plane directions compared to the lateral size of the spin wave packets and the wavelength of the carrier spin wave. In a waveguide structure, where the transverse dimension is comparable to the wavelength, to date only quasi-one-dimensional nonlinear spin wave objects have been observed, namely spin wave envelope solitons. Here a typical system is a narrow (≃ 1-2 mm) stripe of a YIG ferrite film [4, 5]. Both for solitons and bullets the spreading in dispersion is compensated by the longitudinal nonlinear compression. Concerning the transverse dimension, solitons have a cosine-like amplitude distribution due to the lateral confinement in the waveguide, whereas bullets show a transverse nonlinear instability compensating pulse widening due to diffraction and leading to transverse confinement.
1 Email address: serha@rhrk.uni-kl.de
2 Email address: kostylev@cyllene.uwa.edu.au
Here we report on the observation of a new stable spin wave packet propagating along a waveguide structure, for which both transversal instability and interaction with the side walls of the waveguide are important. The experiments were carried out using a longitudinally magnetized long YIG film stripe of 2.5 mm width and 7 µm thickness. The magnetizing field was 1831 Oe. The spin waves were excited by a microwave magnetic field created with a microstrip antenna of 25 µm width placed across the stripe and driven by electromagnetic pulses of 20 ns duration at a carrier frequency of 7.125 GHz. As is well known, the backward volume magnetostatic spin wave (BVMSW) [6] excited in the given experimental configuration is able to form both envelope solitons and bullets [4], depending on the geometry. The spatio-temporal behavior of the traveling BVMSW packets was investigated by means of space- and time-resolved Brillouin light scattering spectroscopy.
The results obtained are shown in Fig. 1, where the spatial distributions of the intensity of the spin wave packets are plotted for given moments of time. The spin wave packets propagate here from left to right and decay in the course of their propagation along the waveguide because of magnetic loss. The left set of diagrams corresponds to the linear case, where the power of the driving electromagnetic wave is 20 mW. The right set of diagrams, corresponding to the nonlinear case, was collected for a driving power of 376 mW. Differences between these two cases are clearly observed. First of all, the linear spin wave packet is characterized by a cosine-like lateral profile, while the cross section of the nonlinear pulse is sharply modified relative to the linear case and has a pronounced bell-like shape. Second, the intensity of the linear packet decays monotonically with time, while the intensity of the nonlinear packet initially increases because of its strong transversal compression (see the second diagram from the top in Fig. 1).
Both of these nonlinear features provide clear evidence for the development of a transversal instability and bullet formation. It is interesting that the bell-like cross-section shape survives even at the end of the propagation distance, when the pulse intensity has decreased more than ten times and the nonlinear contribution to the spin wave dynamics should have considerably diminished.
Figure 1: Bullet formation in the transversally confined yttrium-iron-garnet film.
In order to interpret the experimental result we have assumed that the development of nonlinear instabilities in a laterally confined medium is strongly modified by a quantization of the spin wave spectrum. We have therefore transformed the two-dimensional Nonlinear Schrödinger Equation traditionally used for the analysis of bullet dynamics [4] into a system of coupled equations for amplitudes of the spin wave width modes. The specific form of the discrete set of these orthogonal modes is defined by the actual boundary conditions at the lateral edges of the stripe. We developed a two-dimensional theory of linear spin-wave dynamics in magnetic stripes. As an important outcome we found that the Guslienko-Slavin effective boundary condition [8] for dynamic magnetization at the stripe lateral edges, being initially derived for spin waves with vanishing longitudinal wavenumbers, is also valid in the case of propagating width modes with non-vanishing longitudinal wavenumbers [9]. The effective boundary condition shows that the magnetization vector at the lateral stripe edges is highly pinned, which means that the amplitude of dynamic magnetization practically vanishes at the edges. For simplicity it is even possible to consider the stripe width modes to be totally pinned at the stripe lateral edges. As seen from Fig. 1, this conclusion is in good agreement with the experiment.
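The mode expansion described above can be written out schematically. The following is only a sketch under stated assumptions, not the system of equations actually derived by the authors: it assumes totally pinned width modes on a stripe of width $w$ with transverse coordinate $y \in [0, w]$, a cubic nonlinearity of the Nonlinear Schrödinger type, and slowly varying mode amplitudes $a_n(z,t)$; the symbols $\phi_n$, $a_n$ and $T_{npqr}$ are introduced here purely for illustration.

```latex
\begin{align*}
u(y,z,t) &= \sum_{n\ge 1} a_n(z,t)\,\phi_n(y),
  \qquad \phi_n(y) = \sin\!\left(\frac{n\pi y}{w}\right),
  \qquad \phi_n(0) = \phi_n(w) = 0,\\
\int_0^w \phi_n(y)\,\phi_m(y)\,dy &= \frac{w}{2}\,\delta_{nm},\\
\frac{2}{w}\int_0^w \phi_n\,|u|^2 u\,dy &= \sum_{p,q,r} T_{npqr}\,a_p\,a_q^{*}\,a_r,
  \qquad T_{npqr} = \frac{2}{w}\int_0^w \phi_n\,\phi_p\,\phi_q\,\phi_r\,dy .
\end{align*}
```

In this picture the transverse second derivative acts diagonally on the modes, since $\phi_n'' = -(n\pi/w)^2\phi_n$, so any coupling between the amplitudes $a_n$ enters only through the overlap coefficients $T_{npqr}$ of the cubic term.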
The analysis of the system of nonlinear equations derived from the Nonlinear Schrödinger Equation shows that the formation of the two-dimensional waveform can be considered as an enrichment of the spectrum of the width modes. The partial waveforms carried by the modes have the same carrier frequency, equal to that of the initial signal, and carrier wave numbers which satisfy the dispersion relations for the modes. In the linear regime all the modes are orthogonal to each other and do not interact. In the nonlinear (high amplitude) regime the width modes become coupled by the four-wave nonlinear interaction, resulting in an intermodal energy transfer and the enrichment of the mode spectrum. As the spin wave input antenna effectively generates only the lowest width mode, the initial waveform launched in the stripe is determined solely by it. Therefore, to understand the underlying physics of quasi-bullet formation, it is necessary to consider the nonlinear interaction of higher-order width modes with it.

Our theoretical analysis shows that the interaction of the lowest width mode (n = 1) with higher-order modes is different for odd and even higher-order modes. While interacting with even modes, the lowest width mode plays the role of the pumping wave, which parametrically transfers its energy to the higher width modes. The interaction is purely parametric and therefore a threshold process. It needs an initial signal to start the process. This signal usually is a thermally excited mode. Therefore the amplified waveform needs a large distance of propagation and a group velocity equal to the velocity of the lowest width mode in order to reach the soliton amplitude level. If there is a damping of the pumped wave, even modes will never reach an amplitude comparable with that of the lowest mode. As a result they can contribute to the nonlinear waveform profile only if the amplitude of the initial waveform is far beyond the threshold of soliton formation.
Interaction of modes of the same type of symmetry is described by a parametric term as well as by an additional pseudo-linear (tri-linear) excitation term, playing the role of an external source of excitation. Such a pseudo-linear excitation is a threshold-free process. In contrast to parametric processes it does not need an initial amplitude value to start the process. The pseudo-linear excitation is possible only due to the effective dipolar pinning of the magnetization at the stripe edges. If the edge spins were unpinned, the interaction of all the width modes would be purely parametric.

The purely parametric mechanism of developing a transversal instability is typical for the process of bullet formation from a plane-wave waveform in an unconfined medium, which distinguishes it from the process of soliton and bullet formation in waveguide structures. In contrast, the transverse instability of a wave packet in a confined medium starts as a pseudo-linear excitation of higher-order width modes. This mechanism ensures a rapid growth of the symmetric n = 3 mode up to the level where the parametric mechanism starts to work. After that the main mode together with the n = 3 mode are capable of rapidly generating a large set of yet higher modes through both pseudo-linear and parametric mechanisms.

Our theory shows that the efficiency of both nonlinear interaction mechanisms (parametric and tri-linear) strongly depends on the group velocity difference of the modes and the initial length of the nonlinear pulse. In wider stripes the group velocities of the modes are closer to each other. As a result the nonlinearly generated higher-order modes remain longer within the pump pulse. If the pulse is long enough, they reach significant amplitudes and a bullet-like waveform is formed. In narrower stripes the group velocity difference is larger, and consequently the nonlinearly generated higher-order waveforms leave the pumping area faster.
As a result, for the same pulse length, they do not reach significant amplitudes. The nonlinear steepening results in the transformation of the lowest mode into a soliton. The results of our calculations of the lateral shapes of the nonlinear spin wave packets in wide (2.5 mm) and narrow (1 mm) ferrite stripes are shown in Fig. 2. The excellent correspondence with the experimental data provides good evidence for the validity of the developed theory.

Figure 2: Lateral shapes of the nonlinear SW packets. 1 and 2: theoretical results calculated for ferrite stripes of width 2.5 mm and 1 mm, respectively. 3 and 4: experimental profiles observed in YIG waveguides of width 2.5 mm and 1 mm, respectively. 1 and 3: bullets. 2 and 4: solitons.

Support by the Deutsche Forschungsgemeinschaft, the Australian Research Council, and the Russian Foundation for Basic Research is gratefully acknowledged.

References
[1] O. Büttner, M. Bauer, S.O. Demokritov, B. Hillebrands, Yu.S. Kivshar, V. Grimalsky, Yu. Rapoport, A.N. Slavin, 61, 11576 (2000).
[2] A.A. Serga, B. Hillebrands, S.O. Demokritov, A.N. Slavin, 92, 117203 (2004).
[3] A.A. Serga, B. Hillebrands, S.O. Demokritov, A.N. Slavin, P. Wierzbicki, V. Vasyuchka, O. Dzyapko, A. Chumak, 94, 167202 (2005).
[4] A.N. Slavin, O. Büttner, M. Bauer, S.O. Demokritov, B. Hillebrands, M.P. Kostylev, B.A. Kalinikos, V. Grimalsky, Yu. Rapoport, Chaos 13, 693 (2003).
[5] M. Chen, M.A. Tsankov, J.M. Nash, C.E. Patton, 49, 12773 (1994).
[6] F.R. Morgenthaler, Proceedings of the IEEE 76, 138 (1988).
[7] S.O. Demokritov, B. Hillebrands, A.N. Slavin, Phys. Rep. 348, 441 (2001).
[8] K.Y. Guslienko, S.O. Demokritov, B. Hillebrands, A.N. Slavin, 66, 132402 (2002).
[9] M. Kostylev, J.-G. Hu, R.L. Stamps, 90, 012507 (2007).

ABSTRACT
The formation of quasi-2D spin-wave waveforms in longitudinally magnetized stripes of ferrimagnetic film was observed by using the time- and space-resolved Brillouin light scattering technique.
In the linear regime it was found that the confinement decreases the amplitude of dynamic magnetization near the lateral stripe edges. Thus, the so-called effective dipolar pinning of dynamic magnetization takes place at the edges. In the nonlinear regime a new stable spin wave packet propagating along a waveguide structure, for which both transversal instability and interaction with the side walls of the waveguide are important, was observed. The experiments and a numerical simulation of the pulse evolution show that the shape of the formed waveforms and their behavior are strongly influenced by the confinement.
<|endoftext|><|startoftext|>
Introduction

Theoretical study of polarons in a strongly correlated system is like an attempt to view the contents of a Pandora's box embedded in another, even more sinister and obscure, container of riddles, enigmas and mysteries. This desperate situation occurs because the solution is not known even for the simplest polaron problem, i.e. when a perfectly stable quasiparticle (QP) with momentum as its single quantum number interacts with a well defined bath of bosonic elementary excitations. By contrast, the definition of a strongly correlated system implies that QPs might be highly unstable and that the very notion of QPs, in both the electronic and bosonic subsystems, is in question. Thus, one faces the problem of an interplay between ill-defined objects, and it is crucial to solve the problem without approximations. A further difficulty, pertinent to realistic systems, is the interplay of the momentum and other quantum numbers characterizing the internal states of a QP.

The polaron problem originally emerged as that of an electron coupled to phonons (see [1, 2]). In the initial formulation a structureless QP is characterized by a single quantum number, momentum, which changes due to the interaction of the QP with phonons [3, 4].
Later, depending on what can be called “particle” and “environment”, and how they interact with each other, the polaron concept was related to an extreme diversity of physical phenomena. There are many other objects which, having nothing to do with phonons, are isomorphic to the simple polaron [5], e.g. an exciton-polaron in the intraband scattering approximation [6, 7, 8, 9]. Another example is the problem of a hole in an antiferromagnet, which is closely related to the polaron since hole movement is accompanied by spin flips which, in the spin wave approximation, are equivalent to creation and annihilation of magnons [10, 11].

http://arxiv.org/abs/0704.0025v1
A. S. Mishchenko and N. Nagaosa

The concept of the polaron was further generalized to include internal degrees of freedom which, interacting with the environment, change their quantum numbers. An example of a complex QP is the Jahn-Teller polaron, where the electron-phonon interaction (EPI) changes quantum numbers of degenerate electronic states [12, 13, 14]. This generalization is important due to its relevance to the colossal magnetoresistance phenomena in the manganese oxides [15, 16]. Another example is the pseudo Jahn-Teller polaron, where the EPI is inelastic and leads to transitions between electronic levels of a QP that are close in energy [17, 18, 19].

A further generalization is a system of several QPs which interact both with each other and with the environment. For example, the effective interaction of two electrons through exchange of phonons can overcome the Coulomb repulsion and form a bound state, the bipolaron [20, 21, 22, 23, 24]. On the other hand, coupling of an attracting hole and electron to the lattice vibrations [25, 26, 27] can create many qualitatively different objects: a localized exciton, a weakly bound pair of a localized hole and a localized electron, etc. [28, 7].
Scattering by impurities introduces additional complexity to the polaron problem because interference of the impurity potential with the lattice distortion which accompanies the polaron movement can contribute either constructively or destructively to the localization of a QP on an impurity [29, 30, 7].

In addition, a bare QP and a bosonic bath cannot be considered as well defined in correlated systems. Angle Resolved Photoemission Spectra (ARPES), revealing the Lehmann Function (LF) of a quasiparticle, demonstrate broad peaks in many correlated systems: copper oxide high-temperature superconductors [31, 32, 33], colossal magnetoresistive manganites [34, 35, 36], quasi-one-dimensional Peierls conductors [37, 38], and Verwey magnetites [39]. Besides, phonons are also broadened in many correlated systems, e.g. in high-temperature semiconductors [40] and mixed-valent materials [41, 42]. One of the possible reasons for these broadenings is the interaction of the QPs with the lattice degrees of freedom. However, in many realistic cases other subsystems, not explicitly included in the polaron Hamiltonian, are responsible for the decay of the QP and phonons, e.g. other electronic bands, phonon anharmonicity, interaction with nuclear spins, etc. Then, if this auxiliary broadening is known in some approximation, one can formulate an ambitious goal: to study the spectral response when a “bare” quasiparticle with known damping interacts with “broadened” bosonic excitations.

None of the traditional numerical methods, to say nothing of analytical ones, can give approximation-free results for measurable quantities of a polaron, such as the optical conductivity or angle resolved photoemission spectra, in a macroscopic system of arbitrary dimension. Besides, we are not aware of any numerical method which can incorporate in an approximation-free way the information on the damping of the QP and the bosonic bath.
Below we describe the basics of the recently developed Diagrammatic Monte Carlo (DMC) method for numerically exact computation of Green functions and correlation functions in imaginary time for few polarons in a macroscopic system [43, 44, 45, 46, 47, 48, 49, 50, 51]. Analytic continuation of imaginary time functions to real frequencies is performed by a novel approximation-free approach of stochastic optimization (SO) [45, 50, 51], circumventing difficulties of the popular Maximal Entropy method. Finally we focus on results of application of the DMC-SO machinery to various problems [52, 53, 54, 55, 56, 57].

Spectroscopic Properties of Polarons by Exact Monte Carlo

The basic models related to the polaronic objects in correlated systems, which can be solved by DMC-SO methods, are stated in Sect. 1.1. It is followed in Sect. 1.2 by a description of the stumbling blocks encountered by analytic methods. Sect. 2 concerns the basics of DMC-SO methods. However, those who are not interested in the details of the methods can briefly look through the definitions in the introduction to Sect. 2 and turn to Sect. 3, where the LF and optical conductivity of the Fröhlich polaron are discussed (see also [58]). Results of studies of the self-trapping phenomenon are presented in Sect. 4, and application of DMC-SO methods to the exciton problem can be found in Sect. 5. The chapter is completed by Sect. 6, devoted to studies of ARPES of high temperature superconductors.

1.1 Formulation of a General Model with Interacting Polarons

In general terms, the simplest problem of a complex polaronic object, where center-of-mass motion does not separate from the rest of the degrees of freedom, is introduced as a system of two QPs

$$\hat H^{0}_{\text{par}} = \sum_{\mathbf k} \varepsilon_a(\mathbf k)\, a^{\dagger}_{\mathbf k} a_{\mathbf k} + \sum_{\mathbf k} \varepsilon_h(\mathbf k)\, h_{\mathbf k} h^{\dagger}_{\mathbf k} \qquad (1)$$

($a_{\mathbf k}$ and $h_{\mathbf k}$ are annihilation operators, and $\varepsilon_a(\mathbf k)$ and $\varepsilon_h(\mathbf k)$ are dispersions of the QPs), which interact with each other

$$\hat H_{a\text{-}h} = -N^{-1} \sum_{\mathbf p \mathbf k \mathbf k'} U(\mathbf p,\mathbf k,\mathbf k')\, a^{\dagger}_{\mathbf p+\mathbf k} h^{\dagger}_{\mathbf p-\mathbf k} h_{\mathbf p-\mathbf k'} a_{\mathbf p+\mathbf k'}$$
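(2) ($N$ is the number of lattice sites) through the instantaneous Coulomb potential and the scattering by bosons

$$\hat H_{\text{par-bos}} = i \sum_{\kappa=1}^{Q} \sum_{\mathbf k,\mathbf q} \left(b^{\dagger}_{\mathbf q,\kappa} - b_{-\mathbf q,\kappa}\right) \left[\gamma_{aa,\kappa}(\mathbf k,\mathbf q)\, a^{\dagger}_{\mathbf k-\mathbf q} a_{\mathbf k} + \gamma_{hh,\kappa}(\mathbf k,\mathbf q)\, h^{\dagger}_{\mathbf k-\mathbf q} h_{\mathbf k} + \gamma_{ah,\kappa}(\mathbf k,\mathbf q)\, h^{\dagger}_{\mathbf k-\mathbf q} a_{\mathbf k}\right] + \text{h.c.} \qquad (3)$$

($\gamma_{[aa,ah,hh],\kappa}$ are interaction constants), where quanta of $Q$ different branches of bosonic excitations are created or annihilated, which are described by

$$\hat H_{\text{bos}} = \sum_{\kappa=1}^{Q} \sum_{\mathbf q} \omega_{\mathbf q,\kappa}\, b^{\dagger}_{\mathbf q,\kappa} b_{\mathbf q,\kappa} . \qquad (4)$$

In general, each QP can be a composite one with an internal degree of freedom represented by $T$ different states

$$\hat H^{\text{PJT}}_{0} = \sum_{i=1}^{T} \sum_{\mathbf k} \epsilon_i(\mathbf k)\, a^{\dagger}_{i,\mathbf k} a_{i,\mathbf k} , \qquad (5)$$

whose quantum numbers can also be changed due to the nondiagonal part of the particle-boson interaction

$$\hat H_{\text{par-bos}} = i \sum_{\kappa=1}^{Q} \sum_{\mathbf k,\mathbf q} \sum_{i,j=1}^{T} \gamma_{ij,\kappa}(\mathbf k,\mathbf q) \left(b^{\dagger}_{\mathbf q,\kappa} - b_{-\mathbf q,\kappa}\right) a^{\dagger}_{i,\mathbf k-\mathbf q} a_{j,\mathbf k} + \text{h.c.} \qquad (6)$$

The complicated model (1)-(6) is still too far from the cases encountered in strongly correlated systems. Due to coupling of the QPs (1) and (5) and the bosonic fields (4) to additional degrees of freedom, these excitations are not well defined from the outset. Namely, the dispersion relation of the QP spectrum $\epsilon(\mathbf k)$ in a realistic system is ill-defined. One can instead speak of the Lehmann Function (LF) [59, 60, 61] of a QP

$$L_{\mathbf k}(\omega) = \sum_{\nu} \delta(\omega - E_{\nu}(\mathbf k)) \left|\langle \nu | a^{\dagger}_{\mathbf k} | \text{vac} \rangle\right|^{2} , \qquad (7)$$

which is normalized to unity, $\int d\omega\, L_{\mathbf k}(\omega) = 1$, and can be interpreted as the probability that a QP has momentum $\mathbf k$ and energy $\omega$. (Here $\{|\nu\rangle\}$ is a complete set of eigenstates of the Hamiltonian $\hat H$ in a sector of given momentum $\mathbf k$: $\hat H |\nu(\mathbf k)\rangle = E_{\nu}(\mathbf k) |\nu(\mathbf k)\rangle$.) Only for a noninteracting system does the LF reduce to a delta function, $L^{\text{NONINT}}_{\mathbf k}(\omega) = \delta(\omega - \epsilon(\mathbf k))$, and thus set up the dispersion relation $\omega = \epsilon(\mathbf k)$.

Specific cases of model (1)-(6) describe an enormous variety of physical problems. Hamiltonians (1) and (2), in the case of an attractive potential $U(\mathbf p,\mathbf k,\mathbf k') > 0$, describe an exciton with static screening [62, 63]. Besides, expressions (1)-(4) describe a bipolaron for repulsive interaction $U(\mathbf p,\mathbf k,\mathbf k') < 0$ [20, 21, 22, 23, 24] and an exciton-polaron otherwise [25, 26, 27].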
The simplest model for the exciton-phonon interaction, when only the two ($T = 2$) lowest states of relative electron-hole motion are relevant (e.g. in a one-dimensional charge-transfer exciton [64, 65, 66]), is defined by Hamiltonians (4)-(6). The same relations (4)-(6) describe the problems of the Jahn-Teller polaron [all $\epsilon_i$ in Hamiltonian (5) are the same] and the pseudo Jahn-Teller polaron. The problem of a hole in an antiferromagnet in the spin-wave approximation is expressed in terms of Hamiltonians (4)-(6) with $Q = 1$ and $T = 1$. When the hole also interacts with phonons, one has to take into account one more bosonic branch and set $Q = 2$ in (4) and (6). Finally, the simplest nontrivial problem of a polaron, i.e. of a structureless QP interacting with one phonon branch, is described by the noninteracting Hamiltonians of the QP, $\hat H_{\text{par}}$, and of the phonons, $\hat H_{\text{ph}}$,

$$\hat H_{0} = \sum_{\mathbf k} \epsilon(\mathbf k)\, a^{\dagger}_{\mathbf k} a_{\mathbf k} + \sum_{\mathbf q} \omega_{\mathbf q}\, b^{\dagger}_{\mathbf q} b_{\mathbf q} , \qquad (8)$$

and the interaction term

$$\hat H_{\text{int}} = \sum_{\mathbf k,\mathbf q} V(\mathbf k,\mathbf q) \left(b^{\dagger}_{\mathbf q} - b_{-\mathbf q}\right) a^{\dagger}_{\mathbf k-\mathbf q} a_{\mathbf k} + \text{h.c.} \qquad (9)$$

The simplest polaron problem, in turn, can be subdivided into continuous and lattice polaron models.

1.2 Limitations of Analytic Methods in Problem of Polarons

An analytic solution for the problem of an exciton in a rigid lattice is available only for the small radius Frenkel regime [67] and the large radius Wannier regime [68]. However, even the limits of validity of these approximations are not known. Random phase approximation approaches [62, 63] are capable of yielding some qualitative conclusions for the intermediate radius regime, though their quantitative results are not reliable due to uncontrolled errors. The situation is similar for the problem of the structureless polaron, where analytic solutions are known only in the weak and strong coupling regimes. Besides, reliable results for these regimes are available only for ground state properties.
Although several novel methods capable of obtaining properties of excited states were developed recently, with the variational coherent-states expansion [69] and the free propagator momentum average summation [70] as a few examples, all of them, to provide reliable data in a specific regime, need comparison either with exact sum rules [71, 72] or with exact numerical results.

Application of variational methods to the study of excitations is a tricky issue since, strictly speaking, they are valid only for the ground state. As an example of the importance of sum rules in a variational treatment, we refer to the problem of the optical conductivity of the Fröhlich polaron. The possible existence of a Relaxed Excited State (RES), which is a metastable state where the lattice deformation has adjusted to the electronic excitation, rendering stability and a narrow linewidth of the spectroscopic response, was briefly mentioned by S. I. Pekar in the early 1950s. Then, the conception of the RES was rigorously formulated by J. T. Devreese with coworkers and has been a subject of extensive investigations for years [5, 73, 74, 75, 76, 77, 48, 57]. Calculations of the impedance [75] in the framework of the technique [78] supported the existence of a narrow stable peak in the optical conductivity. However, even the authors of [75] were skeptical about the fact that the width of the RES in the strong coupling regime appeared to be narrower than the phonon frequency, i.e. the inverse of the time which, according to the Heisenberg uncertainty principle, is required for the lattice readaptation. In a subsequent paper [77] they realized the importance of many-phonon processes and studied the two-phonon contribution to the optical conductivity. The importance of many-phonon processes was confirmed when the variational results [75] were compared with exact DMC simulations [48]. The variational result reproduced the position of the peak in the exact data well, though it failed to describe the peak width in the strong coupling regime [48].
Finally, when the approach [75] was modified and several sum rules were accurately introduced into the variational model [57], both the position and the width of the peak were quantitatively reproduced. The studies [57] (see Sect. 3.1) do not address the rather philosophical question of whether the RES exists or not, though they do prove that, in contrast to earlier beliefs, there is no stable excited state of the Fröhlich polaron in the strong coupling regime. Note that sometimes excited states cannot be handled by analytic methods even for weak couplings: the perturbation theory expression for the LF of the Fröhlich polaron model diverges at the phonon energy $\omega_{\text{ph}}$ [see (34) in Sect. 3.1] and a more elaborate treatment is necessary.

Difficulties of semianalytic methods increase in the intermediate coupling regime, where results are sometimes wrong even for ground state properties. For example, the variational approach [79], which had been considered an intermediate coupling theory, appeared to be valid only in the weak coupling limit [45]. Special interest in methods giving reliable information on excited states is triggered by the self-trapping phenomenon, which occurs precisely in the intermediate coupling regime. This phenomenon is a dramatic transformation of QP properties when system parameters are slightly changed [3, 7, 9, 80]. In the intermediate coupling regime the “trapped” QP state with strong lattice deformation around it and the “free” state with weakly perturbed lattice may hybridize and resonate because of their close energies at some critical value of the electron-lattice interaction $\gamma_c$. It is clear that, to study self-trapping, one has to apply a method giving reliable information on excited states in the intermediate coupling regime.

2 Diagrammatic Monte Carlo and Stochastic Optimization Methods

In this section we introduce definitions of exciton-polaron properties which can be evaluated by DMC and SO methods.
The idea of the DMC approach for numerically exact calculation of Green functions (GFs) in imaginary time is presented in Sect. 2.1, and a short description of the SO method, which is capable of making an unbiased analytic continuation from imaginary times to real frequencies, is given in Sect. 2.2. Using the combination of DMC and SO, one can often circumvent difficulties of analytic and traditional numerical methods. Therefore, a brief comparative analysis of the advantages and drawbacks of the DMC-SO machinery is given in Sect. 2.3.

To obtain information on QPs it is necessary to calculate the Matsubara GF in the imaginary time representation and make an analytic continuation to real frequencies [60]. For the two-particle problem (1)-(4) the relevant quantity is the two-particle GF [46, 47]

$$G^{\mathbf p \mathbf p'}_{\mathbf k}(\tau) = \langle \text{vac} |\, a_{\mathbf k+\mathbf p'}(\tau)\, h_{\mathbf k-\mathbf p'}(\tau)\, h^{\dagger}_{\mathbf k-\mathbf p}\, a^{\dagger}_{\mathbf k+\mathbf p} \,| \text{vac} \rangle . \qquad (10)$$

(Here $h_{\mathbf k-\mathbf p}(\tau) = e^{\hat H \tau} h_{\mathbf k-\mathbf p} e^{-\hat H \tau}$, $\tau > 0$.) In the case of the exciton-polaron, the vacuum state $|\text{vac}\rangle$ is the state with filled valence and empty conduction bands. For the bipolaron problem it is a system without particles. In the simpler case of a QP with two-level internal structure described by (4)-(6), the relevant quantity is the one-particle matrix GF [52, 47]

$$G_{\mathbf k,ij}(\tau) = \langle \text{vac} |\, a_{i,\mathbf k}(\tau)\, a^{\dagger}_{j,\mathbf k} \,| \text{vac} \rangle , \quad i, j = 1, 2 . \qquad (11)$$

For a structureless polaron the matrix (11) reduces to the one-particle scalar GF

$$G_{\mathbf k}(\tau) = \langle \text{vac} |\, a_{\mathbf k}(\tau)\, a^{\dagger}_{\mathbf k} \,| \text{vac} \rangle . \qquad (12)$$

Information on the response to a weak external perturbation (e.g. optical absorption) is contained in the current-current correlation function $\langle J_{\beta}(\tau) J_{\delta} \rangle$ ($\beta$, $\delta$ are Cartesian indexes).

The Lehmann spectral representation of $G_{\mathbf k}(\tau)$ [60, 61] at zero temperature,

$$G_{\mathbf k}(\tau) = \int d\omega\, L_{\mathbf k}(\omega)\, e^{-\omega\tau} , \qquad (13)$$

with the Lehmann function (LF) $L_{\mathbf k}(\omega)$ given in (7), reveals information on the ground and excited states. Here $\{|\nu\rangle\}$ is a complete set of eigenstates of the Hamiltonian $\hat H$ in a sector of given momentum $\mathbf k$: $\hat H |\nu(\mathbf k)\rangle = E_{\nu}(\mathbf k) |\nu(\mathbf k)\rangle$.
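To make the content of the Lehmann representation (13) concrete, the following minimal Python sketch evaluates $G_{\mathbf k}(\tau)$ for a model LF and reads the lowest energy and its weight off the long-time decay. Everything specific here is an assumption made for illustration only: the helper name `lehmann_green`, the pole weight 0.4 at energy 0.5, and the flat continuum on [1.5, 6.5] are hypothetical and are not taken from the chapter.

```python
import numpy as np

def lehmann_green(tau, pole_weight, pole_energy, omega, continuum):
    """G(tau) = Z*exp(-E*tau) + integral of continuum(omega)*exp(-omega*tau).

    The integral is a midpoint sum on the uniform grid `omega`.
    """
    tau = np.asarray(tau, dtype=float)
    d_omega = omega[1] - omega[0]
    kernel = np.exp(-np.outer(tau, omega))        # shape (n_tau, n_omega)
    return pole_weight * np.exp(-pole_energy * tau) + (kernel @ continuum) * d_omega

# Hypothetical model LF: a delta peak plus a flat incoherent continuum,
# with total spectral weight normalized to unity.
z_true, e_true = 0.4, 0.5
n_grid = 2000
width = 5.0
omega = 1.5 + (np.arange(n_grid) + 0.5) * (width / n_grid)   # midpoints of [1.5, 6.5]
continuum = np.full(n_grid, (1.0 - z_true) / width)

# Normalization of the LF shows up as G(0) = 1.
g0 = lehmann_green([0.0], z_true, e_true, omega, continuum)[0]

# At large tau only the lowest state survives: G ~ Z * exp(-E * tau).
tau1, tau2 = 20.0, 30.0
g1, g2 = lehmann_green([tau1, tau2], z_true, e_true, omega, continuum)
e_est = -np.log(g2 / g1) / (tau2 - tau1)     # slope of log G gives the energy
z_est = g2 * np.exp(e_est * tau2)            # prefactor gives the weight

print(g0, e_est, z_est)
```

The forward direction shown here is numerically harmless; the difficulty discussed in Sect. 2.2 lies entirely in going backwards from noisy $G_{\mathbf k}(\tau)$ to $L_{\mathbf k}(\omega)$.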
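The LF $L_{\mathbf k}(\omega)$ has poles (sharp peaks) at the energies of stable (metastable) states of the particle. For example, if there is a stable state at energy $E(\mathbf k)$, the LF reads $L_{\mathbf k}(\omega) = Z^{(\mathbf k)} \delta(\omega - E(\mathbf k)) + \ldots$, and the state with the lowest energy $E_{\text{g.s.}}(\mathbf k)$ in a sector of given momentum $\mathbf k$ is highlighted by the asymptotic behavior of the GF

$$G_{\mathbf k}\!\left(\tau \gg \max\left[\omega^{-1}_{\mathbf q,\kappa}\right]\right) \to Z^{(\mathbf k)} \exp\left[-E_{\text{g.s.}}(\mathbf k)\,\tau\right] , \qquad (14)$$

where the $Z^{(\mathbf k)}$-factor is the weight of the state. Analyzing the asymptotic behavior of the similar $n$-phonon GFs [45, 52]

$$G_{\mathbf k}(n, \tau; \mathbf q_1, \ldots, \mathbf q_n) = \langle \text{vac} |\, b_{\mathbf q_n}(\tau) \cdots b_{\mathbf q_1}(\tau)\, a_{\mathbf p}(\tau)\, a^{\dagger}_{\mathbf p}\, b^{\dagger}_{\mathbf q_1} \cdots b^{\dagger}_{\mathbf q_n} \,| \text{vac} \rangle , \quad \mathbf p = \mathbf k - \sum_{j=1}^{n} \mathbf q_j , \qquad (15)$$

one obtains detailed information about the lowest state. For example, important characteristics of the lowest state wave function

$$\Psi_{\text{g.s.}}(\mathbf k) = \sum_{i} \sum_{n=0}^{\infty} \sum_{\mathbf q_1 \ldots \mathbf q_n} \theta_i(\mathbf k; \mathbf q_1, \ldots, \mathbf q_n)\, c^{\dagger}_{i,\mathbf k-\mathbf q_1-\ldots-\mathbf q_n}\, b^{\dagger}_{\mathbf q_1} \ldots b^{\dagger}_{\mathbf q_n} \,| \text{vac} \rangle \qquad (16)$$

are the partial $n$-phonon contributions

$$Z^{(\mathbf k)}(n) \equiv \sum_{i} \sum_{\mathbf q_1 \ldots \mathbf q_n} \left|\theta_i(\mathbf k; \mathbf q_1, \ldots, \mathbf q_n)\right|^{2} , \qquad (17)$$

which are normalized to unity, $\sum_{n=0}^{\infty} Z^{(\mathbf k)}(n) \equiv 1$, and the average number of phonons

$$\langle N \rangle \equiv \langle \Psi_{\text{g.s.}}(\mathbf k) |\, \sum_{\mathbf q} b^{\dagger}_{\mathbf q} b_{\mathbf q} \,| \Psi_{\text{g.s.}}(\mathbf k) \rangle = \sum_{n=1}^{\infty} n\, Z^{(\mathbf k)}(n) \qquad (18)$$

in the polaronic cloud. Another example is the wave function of the relative electron-hole motion of an exciton in the lowest state in the sector of given momentum,

$$\Psi_{\text{g.s.}}(\mathbf k) = \sum_{\mathbf p} \xi_{\mathbf k \mathbf p}(\text{g.s.})\, a^{\dagger}_{\mathbf k+\mathbf p} h^{\dagger}_{\mathbf k-\mathbf p} \,| \text{vac} \rangle . \qquad (19)$$

The amplitudes $\xi_{\mathbf k \mathbf p}(\text{g.s.})$ of this wave function can be obtained [46] from the asymptotic behavior of the GF (10):

$$G^{\mathbf p = \mathbf p'}_{\mathbf k}(\tau \to \infty) = \left|\xi_{\mathbf k \mathbf p}(\text{g.s.})\right|^{2} e^{-E_{\text{g.s.}}(\mathbf k)\tau} . \qquad (20)$$

Information on the excited states is obtained by the analytic continuation of the imaginary time GF to real frequencies, which requires solving the Fredholm equation (13), written symbolically as $G_{\mathbf k}(\tau) = \hat F[L_{\mathbf k}(\omega)]$:

$$L_{\mathbf k}(\omega) = \hat F^{-1}_{\omega}\left[G_{\mathbf k}(\tau)\right] . \qquad (21)$$

Equation (13) is a rather general relation between an imaginary time GF or correlator and the spectral properties of the system. For example, the absorption coefficient of light by excitons $I(\omega)$ is obtained as the solution of the same equation

$$I(\omega) = \hat F^{-1}_{\omega}\left[\sum_{\mathbf p \mathbf p'} G^{\mathbf p \mathbf p'}_{\mathbf k=0}(\tau)\right] .$$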
(22) Besides, the real part of the optical conductivity $\sigma_{\beta\delta}(\omega)$ is expressed [48] in terms of the current-current correlation function $\langle J_{\beta}(\tau) J_{\delta} \rangle$ by the relation

$$\sigma_{\beta\delta}(\omega) = \pi \hat F^{-1}_{\omega}\left[\langle J_{\beta}(\tau) J_{\delta} \rangle\right] / \omega . \qquad (23)$$

2.1 Diagrammatic Monte Carlo Method

The DMC method is an algorithm which calculates the GFs (10)-(12) without any systematic errors. This algorithm is described below for the simplest case of a structureless polaron [45]; generalizations to more complex cases can be found in subsequent references (see footnote 4). DMC is based on the Feynman expansion of the Matsubara GF in imaginary time in the interaction representation

$$G_{\mathbf k}(\tau) = \left\langle \text{vac} \left| T_{\tau} \left[ a_{\mathbf k}(\tau)\, a^{\dagger}_{\mathbf k}(0) \exp\left\{ -\int_{0}^{\infty} \hat H_{\text{int}}(\tau')\, d\tau' \right\} \right] \right| \text{vac} \right\rangle_{\text{con}} ; \quad \tau > 0 . \qquad (24)$$

Footnote 4. The generalization of the technique described below to the case of the exciton (1)-(2) is given in [46], and its modification for the pseudo-Jahn-Teller polaron (4)-(6) is developed in [52, 47]. A method for evaluation of the current-current correlation function can be found in [48], and the case of a polaron interacting with two kinds of bosonic fields is considered in [49].

Here $T_{\tau}$ is the imaginary time ordering operator, $|\text{vac}\rangle$ is the vacuum state without particle and phonons, and $\hat H_{\text{int}}$ is the interaction Hamiltonian in (9). The exponential denotes its Taylor expansion, which results in multiple integration over the internal variables $\{\tau'_1, \tau'_2, \ldots\}$. Operators are in the interaction representation $\hat A(\tau) = \exp[\tau(\hat H_{\text{par}} + \hat H_{\text{ph}})]\, \hat A\, \exp[-\tau(\hat H_{\text{par}} + \hat H_{\text{ph}})]$. The index “con” means that the expansion contains only connected terms, where no integral over the internal time variables $\{\tau'_1, \tau'_2, \ldots\}$ can be factorized.

Wick's theorem expresses a matrix element of time-ordered operators as a sum of terms, each being a product of matrix elements of pairs of operators, and expansion (24) becomes an infinite series of integrals with an ever increasing number of integration variables

$$G_{\mathbf k}(\tau) = \sum_{m=0,2,4,\ldots} \sum_{\xi_m} \int dx'_1 \cdots dx'_m\, D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x'_m\}) .$$
(25) Here the index $\xi_m$ stands for the different Feynman diagrams (FDs) of the same order $m$. The term with $m = 0$ is the GF of the noninteracting QP, $G^{(0)}_{\mathbf k}(\tau)$. The function $D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x'_m\})$ of any order $m$ can be expressed as a product of GFs of the noninteracting quasiparticle, GFs of phonons, and interaction vertices $V(\mathbf k,\mathbf q)$. For the simplest case of a Hamiltonian system the expressions for the GFs of the QP, $G^{(0)}_{\mathbf k}(\tau_2 - \tau_1) = \exp[-\epsilon(\mathbf k)(\tau_2 - \tau_1)]$ ($\tau_2 > \tau_1$), and of the phonons, $D^{(0)}_{\mathbf q}(\tau_2 - \tau_1) = \exp[-\omega_{\mathbf q}(\tau_2 - \tau_1)]$ ($\tau_2 > \tau_1$), are well known.

An important feature of the DMC method, which distinguishes it from a number of other exact numerical approaches, is the explicit possibility to include renormalized GFs in the exact expansion without any change of the algorithm. For example, if a damping of the QP, caused by some interactions not included in the Hamiltonian, is known, i.e. the retarded self-energy of the QP $\Sigma_{\text{ret}}(\mathbf k, \omega)$ is available, the renormalized GF

$$\tilde G^{(0)}_{\mathbf k}(\tau) = -\frac{1}{\pi} \int_{-\infty}^{\infty} d\omega\, e^{-\omega\tau}\, \frac{\text{Im}\,\Sigma_{\text{ret}}(\mathbf k, \omega)}{\left[\omega - \epsilon(\mathbf k) - \text{Re}\,\Sigma_{\text{ret}}(\mathbf k, \omega)\right]^{2} + \left[\text{Im}\,\Sigma_{\text{ret}}(\mathbf k, \omega)\right]^{2}} \qquad (26)$$

can be introduced instead of the bare GF $G^{(0)}_{\mathbf k}(\tau)$.

The explicit rules for evaluation of $D^{(\xi_m)}_{m}$ do not depend on the order and topology of the FD. GFs of the noninteracting QP $G^{(0)}_{\mathbf k}(\tau_2 - \tau_1)$ (or $\tilde G^{(0)}_{\mathbf k}(\tau_2 - \tau_1)$) with corresponding times and momenta are ascribed to horizontal lines, and noninteracting GFs of the phonon $D^{(0)}_{\mathbf q}(\tau_2 - \tau_1)$ (multiplied by the product of the corresponding vertices $V(\mathbf k',\mathbf q) V^{*}(\mathbf k'',\mathbf q)$) are attributed to phonon propagator arches (see Fig. 1a). Then, $D^{(\xi_m)}_{m}$ is the product of all GFs. For example, the expression for the weight of the second order term (Fig. 1b) is the following:

$$D_{2}(\tau; \{\tau'_2, \tau'_1, \mathbf q\}) = |V(\mathbf k,\mathbf q)|^{2}\, D^{(0)}_{\mathbf q}(\tau'_2 - \tau'_1)\, G^{(0)}_{\mathbf k}(\tau'_1)\, G^{(0)}_{\mathbf k-\mathbf q}(\tau'_2 - \tau'_1)\, G^{(0)}_{\mathbf k}(\tau - \tau'_2) . \qquad (27)$$

Fig. 1. (a) Typical FD contributing to expansion (25). (b) FD of the second order and (c) of the fourth order.
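The rule that a diagram weight is simply the product of bare propagators and vertices can be illustrated with the second-order term (27). The sketch below is a hedged illustration, not the authors' code: the function names, the one-dimensional scalar momenta, the parabolic dispersion, the dispersionless phonon and the constant vertex are all hypothetical choices made only so that the example runs.

```python
import math

def free_qp(energy, dt):
    """Bare QP propagator G0(dt) = exp(-energy * dt), defined for dt >= 0."""
    if dt < 0:
        raise ValueError("time difference must be non-negative")
    return math.exp(-energy * dt)

def free_phonon(frequency, dt):
    """Bare phonon propagator D0(dt) = exp(-frequency * dt), defined for dt >= 0."""
    if dt < 0:
        raise ValueError("time difference must be non-negative")
    return math.exp(-frequency * dt)

def second_order_weight(tau, tau1, tau2, k, q, dispersion, phonon_freq, vertex):
    """Weight of the second-order diagram, eq. (27), for scalar momenta k and q.

    The internal times must respect the diagram topology 0 <= tau1 <= tau2 <= tau.
    """
    if not 0.0 <= tau1 <= tau2 <= tau:
        raise ValueError("internal times violate the diagram topology")
    return (abs(vertex(k, q)) ** 2
            * free_phonon(phonon_freq(q), tau2 - tau1)    # phonon arch
            * free_qp(dispersion(k), tau1)                # QP line before emission
            * free_qp(dispersion(k - q), tau2 - tau1)     # QP line under the arch
            * free_qp(dispersion(k), tau - tau2))         # QP line after absorption

# Hypothetical model ingredients (assumptions, not from the chapter).
def dispersion(k):
    return 0.5 * k * k          # parabolic band

def phonon_freq(q):
    return 1.0                  # dispersionless phonon

def vertex(k, q):
    return 0.3                  # constant coupling

d2 = second_order_weight(tau=2.0, tau1=0.5, tau2=1.5, k=0.0, q=1.0,
                         dispersion=dispersion, phonon_freq=phonon_freq,
                         vertex=vertex)
print(d2)
```

A higher-order diagram would be evaluated the same way, with one extra phonon factor per arch and one QP factor per horizontal segment, which is why the cost of a local update does not depend on the diagram order.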
The DMC process is a numerical procedure which, based on the Metropolis principle [81, 82], samples different FDs in the parameter space $(\tau, m, \xi_m, \{x'_m\})$ and collects statistics of the external variable $\tau$ in such a way that the result of this statistics converges to the exact GF $G_{\mathbf k}(\tau)$. Although sampling of the internal parameters of one term in (25) and switching between different orders are performed within the framework of one and the same numerical process, it is instructive to start with the procedure of evaluation of a specific term $D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x'_m\})$. Starting from a set $\{\tau; \{x'_1, \ldots, x'_m\}\}$, an update $x^{(\text{old})}_{l} \to x^{(\text{new})}_{l}$ of an arbitrarily chosen parameter is suggested. This update is accepted or rejected according to the Metropolis principle. After many steps, altering all variables, the statistics of the external variable converges to the exact dependence of the term on $\tau$.

The suggestion for the new value of the parameter, $x^{(\text{new})}_{l} = \hat S^{-1}(R)$, is generated from a random number $R \in [0, 1]$ with a normalized distribution function $W(x_l)$ in the range $x^{(\min)}_{l} < x_l < x^{(\max)}_{l}$. There are only two restrictions on this otherwise arbitrary function. First, the new parameter $x^{(\text{new})}_{l}$ must not violate the FD topology; for example, the internal time $\tau'_1$ in Fig. 1c must be in the range $[x^{(\min)} = 0, x^{(\max)} = \tau'_3]$. Second, the distribution must be nonzero over the whole domain allowed by the FD topology. This ergodicity property is crucial since it is necessary to sample the whole domain for convergence to the exact answer. At each step, the update $x^{(\text{old})}_{l} \to x^{(\text{new})}_{l}$ is accepted with probability $P_{\text{acc}} = M$ (if $M < 1$) and always otherwise. The ratio $M$ has the following form:

$$M = \frac{D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x^{(\text{new})}_{l}, \ldots, x'_m\}) / W(x^{(\text{new})}_{l})}{D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x^{(\text{old})}_{l}, \ldots, x'_m\}) / W(x^{(\text{old})}_{l})} . \qquad (28)$$

For the uniform distribution $W = \text{const} = (x^{(\max)}_{l} - x^{(\min)}_{l})^{-1}$, the probability of any combination of parameters is proportional to the weight function $D$.
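The update rule with the ratio (28) can be tried on a toy problem. The following Python sketch is an assumption-laden illustration rather than the DMC algorithm itself: a single internal variable $x \in [0, T]$ has the hypothetical weight $D(x) = e^{-Ax}$, new values are proposed either uniformly or from a distribution matched to $D$ by inverting its cumulative distribution (the role played by $\hat S^{-1}(R)$ in the text), and the sampled mean of $x$ is compared with the exact one. The names `metropolis_mean`, `A` and `T` are invented for this example.

```python
import math
import random

def metropolis_mean(weight, propose, density, n_steps, x0, seed=0):
    """Sample x with probability ~ weight(x) using proposals drawn from `density`.

    Acceptance uses M = [D(new)/W(new)] / [D(old)/W(old)], as in eq. (28).
    Returns the sampled mean of x and the acceptance rate.
    """
    rng = random.Random(seed)
    x, total, accepted = x0, 0.0, 0
    for _ in range(n_steps):
        x_new = propose(rng.random())                 # x_new = S^{-1}(R)
        ratio = (weight(x_new) / density(x_new)) / (weight(x) / density(x))
        if ratio >= 1.0 or rng.random() < ratio:      # accept with probability min(1, M)
            x, accepted = x_new, accepted + 1
        total += x
    return total / n_steps, accepted / n_steps

A, T = 2.0, 3.0                                       # hypothetical decay rate and range

def weight(x):
    return math.exp(-A * x)

exact_mean = 1.0 / A - T / math.expm1(A * T)          # mean of exp(-A x) truncated to [0, T]

# (i) Uniform proposal: W = 1/T.
mean_uniform, acc_uniform = metropolis_mean(
    weight, lambda r: T * r, lambda x: 1.0 / T, 100_000, x0=0.5 * T)

# (ii) Proposal matched to the weight: W(x) = A exp(-A x) / (1 - exp(-A T)).
norm = -math.expm1(-A * T)
mean_matched, acc_matched = metropolis_mean(
    weight,
    lambda r: -math.log1p(-r * norm) / A,             # inverse cumulative distribution
    lambda x: A * math.exp(-A * x) / norm,
    100_000, x0=0.5 * T)

print(exact_mean, mean_uniform, acc_uniform, mean_matched, acc_matched)
```

Both choices of $W$ converge to the same mean, but the matched proposal is accepted essentially always, which is the practical reason for choosing $W$ as close as possible to the actual distribution.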
However, for better convergence the distribution $W(x^{(\text{new})}_{l})$ must be as close as possible to the actual distribution given by the function $D^{(\xi_m)}_{m}(\{\ldots, x^{(\text{new})}_{l}, \ldots\})$.

For sampling over FDs of all orders and topologies it is enough to introduce two complementary updates. Update A transforms the FD $D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x'_m\})$ into the higher order FD $D^{(\xi_{m+2})}_{m+2}(\tau; \{x'_1, \ldots, x'_m; \mathbf q', \tau'_3, \tau'_4\})$ with an extra phonon arch, connecting some time points $\tau'_3$ and $\tau'_4$ by a phonon propagator with momentum $\mathbf q'$ (Fig. 1c). Note that the ratio of weights $D^{(\xi_{m+2})}_{m+2} / D^{(\xi_m)}_{m}$ is not dimensionless. The dimensionless Metropolis ratio

$$M = \frac{p_A}{p_B}\, \frac{D^{(\xi_{m+2})}_{m+2}(\tau; \{x'_1, \ldots, x'_m; \mathbf q', \tau', \tau''\})}{D^{(\xi_m)}_{m}(\tau; \{x'_1, \ldots, x'_m\})\, W(\mathbf q', \tau', \tau'')} \qquad (29)$$

contains the normalized probability function $W(\mathbf q', \tau', \tau'')$, which is used for generating the new parameters (see footnote 5). The complementary update B, removing the phonon propagator, uses the ratio $M^{-1}$ [45].

Note that all updates are local, i.e. they do not depend on the structure of the whole FD. Neither the rules nor the CPU time needed for an update depend on the FD order. The DMC method does not imply any explicit truncation of the FD order due to the finite size of computer memory. Even for strong coupling, where the typical number of phonon propagators $N_{\text{ph}}$ contributing to the result is large, the influence of the finite size of memory is not essential. Indeed, according to the Central Limit Theorem, the number of phonon propagators obeys a Gauss distribution centered at $\bar N_{\text{ph}}$ with a half width of the order of $\sqrt{\bar N_{\text{ph}}}$ [83]. Hence, if memory for at least $2\bar N_{\text{ph}}$ propagators is reserved, the diagram order hardly surpasses this limit.

2.2 Stochastic Optimization Method

The problem of inverting the integral equation (13) is an ill-posed problem.
Due to incomplete noisy information about the GF $G_k(\tau)$, which is known with statistical errors at a finite number of imaginary times in a finite range $[0, \tau_{max}]$, there is an infinite number of approximate solutions which reproduce the GF within some range of deviations, and the problem is to choose “the best one”. Another problem, which has been a stumbling block for decades, is the saw tooth noise instability. It occurs when the solution is obtained by a naive method, e.g. by using the least-squares approach for minimizing the deviation measure

$$D[\tilde{L}_k(\omega)] = \int_0^{\tau_{max}} \left| G_k(\tau) - \tilde{G}_k(\tau) \right| G_k^{-1}(\tau)\, d\tau . \tag{30}$$

Here $\tilde{G}_k(\tau)$ is obtained from the approximate LF $\tilde{L}_k(\omega)$ by applying the integral operator $\tilde{G}_k(\tau) = F[\tilde{L}_k(\omega)]$ in (13). The saw tooth instability corrupts the LF in the ranges where the actual LF is smooth. Fast fluctuations of the solution $\tilde{L}_k(\omega)$ often have a much larger amplitude than the value of the actual LF $L_k(\omega)$. Standard tools for saw tooth noise suppression are based on the early-1960s idea of the Phillips-Tikhonov regularization method [84, 85, 86, 87]. A nonlinear functional, which suppresses large derivatives of the approximate solution $\tilde{L}_k(\omega)$, is added to the linear deviation measure (30). The most popular variant of the regularization methods is the Maximum Entropy Method [61].

However, the typical LF of a QP in a boson field consists of $\delta$-functional peaks and a smooth incoherent continuum with a sharp border [45, 54]. Hence, suppression of high derivatives, as a general strategy of the regularization method, fails. Moreover, any specific implementation of the regularization method uses a predefined mesh in $\omega$ space, which can be absolutely unacceptable in the case of sharp peaks. If the actual location of a sharp peak is between the predefined discrete points, the rest of the spectral density can be distorted beyond recognition.

$^5$ The factor $p_A/p_B$ depends on the probability to address add/remove processes.
Finally, the Maximum Entropy regularization approach requires the assumption of a Gauss distribution of statistical errors in $G_k(\tau)$, which might be invalid in some cases [61].

Recently, a Stochastic Optimization (SO) method, which circumvents the abovementioned difficulties, was developed [45]. The idea of the SO method is to generate a large enough number $M$ of statistically independent nonregularized solutions $\{\tilde{L}^{(s)}_k(\omega)\}$, $s = 1, \ldots, M$, whose deviation measures $D^{(s)}$ are smaller than some upper limit $D_u$, depending on the statistical noise of the GF $G_k(\tau)$. Then, using the linearity of expressions (13), (30), the final solution is found as the average of the particular solutions $\{\tilde{L}^{(s)}_k(\omega)\}$:

$$L_k(\omega) = M^{-1} \sum_{s=1}^{M} \tilde{L}^{(s)}_k(\omega) . \tag{31}$$

A particular solution $\tilde{L}^{(s)}_k(\omega)$ is parameterized in terms of the sum

$$\tilde{L}^{(s)}_k(\omega) = \sum_{t=1}^{K} \chi_{\{P_t\}}(\omega) \tag{32}$$

of rectangles $\{P_t\} = \{h_t, w_t, c_t\}$ with height $h_t > 0$, width $w_t > 0$, and center $c_t$. A configuration

$$C = \{\{P_t\},\ t = 1, \ldots, K\} , \tag{33}$$

which satisfies the normalization condition $\sum_{t=1}^{K} h_t w_t = 1$, defines the function $\tilde{G}_k(\tau)$. The procedure of generating a particular solution starts from a stochastic choice of the initial configuration $C_s^{init}$. Then, the deviation measure is optimized by a randomly chosen sequence of updates until the deviation is less than $D_u$. In addition to updates which do not change the number of terms in the sum (32), there are updates which increase or decrease the number $K$. Hence, since the number of elements $K$ is not fixed, any spectral function can be reproduced with the desired accuracy.

Although each particular solution $\tilde{L}^{(s)}_k(\omega)$ suffers from saw tooth noise in the regions where the LF is smooth, the statistical independence of the solutions leads to self-averaging of this noise in the sum (31). Note that the suppression of noise happens without suppression of high derivatives and, hence, sharp peaks and edges are not smeared out, in contrast to regularization approaches. Therefore, the saw tooth noise instability is defeated without corruption of sharp peaks and edges.
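The rectangle parameterization (32), the deviation measure (30) and the averaging (31) can be sketched in a few lines. This is only an illustration under stated assumptions: the kernel of Eq. (13) is not reproduced in this section, so the sketch assumes $G(\tau) = \int d\omega\, e^{-\omega\tau} L(\omega)$; the stochastic updates themselves are omitted, and all function names are hypothetical.

```python
import math


def spectral_value(config, omega):
    """L(omega) of a configuration: sum of rectangles (h, w, c), as in Eq. (32)."""
    return sum(h for (h, w, c) in config if abs(omega - c) <= 0.5 * w)


def norm(config):
    """Normalization sum over t of h_t * w_t; should equal 1."""
    return sum(h * w for (h, w, c) in config)


def greens_function(config, tau):
    """G(tau) = integral of exp(-omega * tau) * L(omega) (ASSUMED kernel)."""
    total = 0.0
    for h, w, c in config:
        if tau == 0.0:
            total += h * w
        else:
            # Closed form of the integral of exp(-omega * tau) over one rectangle
            total += h * math.exp(-c * tau) * 2.0 * math.sinh(0.5 * w * tau) / tau
    return total


def deviation(config, taus, g_ref):
    """Deviation measure of Eq. (30), trapezoidal rule on the grid taus."""
    f = [abs(g - greens_function(config, t)) / g for t, g in zip(taus, g_ref)]
    return sum(
        0.5 * (f[i] + f[i + 1]) * (taus[i + 1] - taus[i])
        for i in range(len(taus) - 1)
    )


def average_solution(configs, omega):
    """Final answer of Eq. (31): plain average over particular solutions."""
    return sum(spectral_value(cfg, omega) for cfg in configs) / len(configs)


if __name__ == "__main__":
    cfg_a = [(2.0, 0.5, 1.0)]  # one rectangle: height 2, width 0.5, center 1
    cfg_b = [(1.0, 1.0, 1.0)]  # a broader rectangle with the same norm
    taus = [0.1 * i for i in range(21)]
    g_ref = [greens_function(cfg_a, t) for t in taus]
    print(deviation(cfg_b, taus, g_ref), average_solution([cfg_a, cfg_b], 1.0))
```

Because each rectangle contributes to $\tilde{G}_k(\tau)$ in closed form, no mesh in $\omega$ is needed to evaluate the deviation; a mesh would enter only when the averaged solution is tabulated for plotting.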
Moreover, the continuous parameterization (32) does not need a predefined mesh in $\omega$-space. Besides, since the Hilbert space of solutions is sampled directly, no assumption about the distribution of statistical errors is necessary.

The SO method was successfully applied to restore the LF of the Fröhlich polaron [45], the Rashba-Pekar exciton-polaron [54], the hole-polaron in the t-J model [53, 49], and a many-particle spin system [88]. A calculation of the optical conductivity of the polaron by the SO method can be found in [48]. The SO method proved helpful in cases when the GF’s asymptotic limit, giving information about the ground state, cannot be reached. For example, sign fluctuations of the terms in expansion (25) for a hole in the t-J model lead to poor statistics at large times [53], though the SO method is capable of recovering the energy and Z-factor even from a GF known only at small imaginary times [53].

2.3 Advantages and Drawbacks of DMC-SO Machinery

Among numerical methods capable of obtaining quantitative results in the exciton problem (1)-(2), one can list time-dependent density functional theory [89], the Hanke-Sham technique of correcting the particle-hole excitation energy [90, 91], and approaches directly solving the Bethe-Salpeter equation [92, 93, 94]. The latter provide rather accurate information on the two-particle GF. However, the use of a finite mesh in direct/reciprocal space, which is avoided in the DMC method, leads to their failure in the Wannier regime [93].

In contrast to the DMC method, none of the traditional numerical methods can give reliable results for measurable properties of excited states of the polaron at an arbitrary range of electron-phonon interaction for a macroscopic system in the thermodynamic limit. The exact diagonalization method [95, 96, 97, 98] can study excited states, though only on rather small finite-size systems, and the results of this method are not even justified in the variational sense in the thermodynamic limit [99].
There is a batch of rather effective variational “exact translation” methods [99, 100, 101, 102, 103] where the basis is chosen in momentum space and, hence, the variational principle is applied in the thermodynamic limit. Although these methods can reveal a few discrete excited states, they fail for long-range interaction and for dispersive, especially acoustic, phonons due to catastrophic growth of the variational basis. A nonperturbative theory which is able to give information about spectral properties in the thermodynamic limit, at least for one electron, is Dynamical Mean Field Theory [104, 105, 106, 107]. However, it gives an exact solution only in the case of infinite dimension, which does not correspond to a realistic system and can be considered only as a guide for extrapolation to finite dimensions [108].

The recently developed cluster perturbation theory, where exact diagonalization of a cluster is further improved by taking into account inter-cluster interaction [109, 110, 111, 112, 113], is applicable to the study of excited states, but is limited to one-dimensional lattices or two-dimensional systems with short-range interaction. The traditional density-matrix renormalization group method [114, 115, 116, 117, 118] is very effective, though mostly limited to one-dimensional systems and ladders. Finally, the recently developed path integral quantum Monte Carlo algorithm [119, 120, 121, 122] is valid for any dimension and properly takes into account quasi long-range interactions [123]. The path integral method is capable of obtaining the density of states [119, 120] and isotope exponents [121, 124]. However, calculations of measurable characteristics of excited states, such as ARPES or optical conductivity, by this method have never been reported.
In conclusion, none of the methods, except the DMC-SO combination, can at the moment obtain approximation-free results for measurable physical quantities for a few QPs interacting with a macroscopic bosonic bath in the thermodynamic limit. Of course, the DMC and SO methods have their own limitations. The DMC method does not work in many-fermion systems due to the sign problem, and the SO method fails at high temperatures, comparable to the energies of the dominant spectral peaks, because even very small statistical noise of the GFs turns the Fredholm equation (13) into an essentially “ill defined” problem [84].

3 Spectral Properties of the Fröhlich Polaron

Before the development of the DMC-SO methods, the information on the excited states of polaron models, especially the Fröhlich one, was very limited. Knowledge of the LF was based on results of the infinite-dimension approximation [125], exact diagonalization [126, 96, 97], or the strong coupling expansion [127]. None of the above techniques was capable of obtaining the LF of the polaron without approximations, especially for long-range interaction, where the difficulties of traditional numerical methods dramatically increase. In a similar way, the optical conductivity (OC) of the Fröhlich model was known only in the strong coupling expansion approximation [128], within the framework of perturbation theory [129], or was based on the variational Feynman path integral technique [75]. In this section we consider exact DMC-SO results on the LF [45] and OC [48, 57] of the Fröhlich polaron model.

3.1 Lehmann Function of the Fröhlich Polaron

The perturbation theory expression for the high-energy part ($\omega > 0$) of the LF for an arbitrary interaction potential $V(|q|)$ reads [45] (the frequency of the optical phonon $\omega_{ph}$ is set to unity)

Lk=0(ω > 0) = ω − 1 | V ( 2(ω − 1)) |2 θ(ω − 1) . (34)

The low-energy part of the LF for the short-range interaction $V(|q|) \propto (q^2 + \kappa^2)^{-1/2}$, reducing to the Fröhlich one when $\kappa \to 0$, is

Lk=0(ω < 0) = ω + α . (35)

Fig. 2. Comparison of the numerical results (solid lines) and the perturbation theory (dashed lines) for the LFs of the Fröhlich model with $\alpha = 0.05$ (a) and the short-range interaction model with $\alpha = 0.05$ and $\kappa = 1$ (b). LFs of the Fröhlich polaron for $\alpha = 0.5$ (c), $\alpha = 1$ (d) and $\alpha = 2$ (e). Energy is measured from that of the ground state of the polaron. The initial fragment of the LF for $\alpha = 1$ is shown in the inset (f).

Comparison of the low-energy parts of the LF of the Fröhlich model, obtained by DMC-SO and taken from (35), shows perfect agreement for $\alpha = 0.05$: the accuracy for the polaron energy and Z-factor is about $10^{-4}$. On the other hand, the high-energy part of the numerical result (Fig. 2) deviates significantly from that of the analytic expression (34). This is not surprising since for the Fröhlich polaron the perturbation theory expression diverges as $\omega \to \omega_{ph}$ and, therefore, perturbation theory breaks down. When perturbation theory is obviously valid, e.g. for the case of finite $\kappa = 1$, there is perfect agreement between the analytic expression and the DMC-SO results (Fig. 2b). Note that the high-energy part of $L_{k=0}(\omega)$ is successfully restored by the SO method despite the fact that the total weight of the feature for $\alpha = 0.05$ is less than $10^{-2}$.

The main deviation of the actual LF from the perturbation theory result is the extra broad peak in the actual LF at $\omega \sim 3.5$. To study this feature, $L_{k=0}(\omega)$ was calculated for $\alpha = 0.5$, $\alpha = 1$, and $\alpha = 2$ (Fig. 2c-e). The peak is seen for higher values of the interaction constant and its weight grows with $\alpha$.

Fig. 3. Evolution of the spectral density with $\alpha$ in the cross-over region from intermediate to strong couplings. The polaron ground state peak is shown only for $\alpha = 8$. Note that the spectral analysis still resolves it, despite its very small weight $< 10^{-3}$.
Near the threshold, $\omega = 1$, the LF demonstrates the square-root dependence $\sim \sqrt{\omega - 1}$ (Fig. 2f).

To trace the evolution of the peak at higher values of $\alpha$, the LF was calculated [45] for $\alpha = 4$, $\alpha = 6$, and $\alpha = 8$ (Fig. 3). At $\alpha = 4$ the peak at $\omega \sim 4$ already dominates. Moreover, a distinct high-energy shoulder appears at $\alpha = 4$, which transforms into a broad peak at $\omega \sim 8.5$ in the LF for $\alpha = 6$. The LF for $\alpha = 8$ demonstrates further redistribution of the spectral weight between different maxima without a significant shift of the peak positions.

3.2 Optical Conductivity of the Fröhlich Polaron: Validity of the Franck-Condon Principle in the Optical Spectroscopy

The FC principle [130, 131] and its validity have been widely discussed in studies of optical transitions in atoms, molecules [132, 133], and solids [134, 9]. Generally, the FC principle means that if only one of two coupled subsystems, e.g. an electronic subsystem, is affected by an external perturbation, the second subsystem, e.g. the lattice, is not fast enough to follow the reconstruction of the electronic configuration. It is clear that the justification for the FC principle is the short characteristic time of the measurement process, $\tau_{mp} \ll \tau_{ic}$, where $\tau_{mp}$ is related to the energy gap between the initial and final states, $\Delta E$, through the uncertainty principle, $\tau_{mp} \simeq \hbar/\Delta E$, and $\tau_{ic}$ is the time necessary to adjust the lattice when the electronic component is perturbed. Then, the spectroscopic response depends considerably on the value of the ratio $\tau_{mp}/\tau_{ic}$. For example, in mixed valence systems, where the ionic valence fluctuates between the configurations $f^5$ and $f^6$ with a characteristic time $\tau_{ic} \approx 10^{-13}$ s, the spectra of fast and slow experiments are dramatically different [135, 136]. Photoemission experiments with short characteristic times $\tau_{mp} \approx 10^{-16}$ s (FC regime) reveal two lines, corresponding to the $f^5$ and $f^6$ states.
On the other hand, slow Mössbauer isomer shift measurements with $\tau_{mp} \approx 10^{-9}$ s show a single broad peak with a mean frequency lying between the signals from pure $f^5$ and $f^6$ shells. Finally, in accordance with the paradigm of the measurement process time, magnetic neutron scattering with $\tau_{mp} \approx \tau_{ic}$ revealed both coherent lines with all subsystems dynamically adjusted and broad incoherent remnants of the strongly damped excitations of the $f^5$ and $f^6$ shells [137, 138]. Actually, the meaning of the times $\tau_{ic}$ and $\tau_{mp}$ varies with the system and with the measurement process.

To study the interplay between the measurement process time $\tau_{mp}$ and the adjustment time $\tau_{ic}$, the OC of the Fröhlich polaron was studied in [57] from the weak to the strong coupling regime by three methods. The DMC method gives the numerically exact answer, which is compared with the memory function formalism (MFF), which is able to take dynamical lattice relaxation into account, and with the strong coupling expansion (SCE), which assumes the FC approach. It was found that near the critical coupling $\alpha_c \approx 8.5$ a dramatic change of the OC spectrum occurs: the dominating peak of the OC splits into two satellites. In this critical regime the spectral weight of the upper (lower) one quickly decreases (increases) as the value of the coupling constant increases. Besides, while the OC follows the prediction of the MFF at $\alpha < \alpha_c$, its dependence switches to that predicted by the SCE for larger couplings. It was concluded that, for the OC measurement of the polaron, the adjustment time $\tau_{ic} \approx \hbar/D$ is set by the typical nonadiabatic energy $D$. Nonadiabaticity destroys the FC classification at $\alpha < \alpha_c$, while the FC principle rapidly regains its validity at large couplings due to the fast growth of the energy separation between the initial and final states of the optical transitions.
Comparison of the exact DMC-SO data for the OC with existing results of approximate methods showed [48] that the Feynman path integral technique [75] of Devreese, De Sitter, and Goovaerts (DSG), where the OC is calculated starting from the Feynman variational model [139], is the only one that successfully describes the evolution of the energy of the main peak in the OC with the coupling constant $\alpha$ (see [58]). However, starting from the intermediate coupling regime this approach fails to reproduce the peak width. Subsequently, the path integral approach was rewritten in terms of the MFF [140]. Then, in [57] the extended MFF formalism, which introduces dissipation processes fixed by exact sum rules, was developed [141].

As shown in Fig. 4a, in the weak coupling regime the MFF, with or without dissipation, is in very good agreement with the DMC data, showing a significant improvement with respect to the weak coupling perturbation approach [129], which provides a good description of the OC spectra only for very small values of $\alpha$. For $1 \le \alpha \le 8$, where the standard MFF fails to reproduce the peak width (Fig. 4b-d) and even the peak position (Fig. 4c), the damping introduced into the extended MFF scheme becomes crucial. The results of the extended MFF are accurate for the peak energy and quite satisfactory for the peak width (Fig. 4b-e). Note that the broadening of the peak in the DMC data is not a consequence of poor quality of the analytic continuation procedure, since the DMC-SO method is capable of revealing such fine features as the 2- and 3-phonon thresholds of emission (Fig. 4b).

Fig. 4. Comparison of the optical conductivity calculated by the DMC method (circles), the extended MFF (solid line), and DSG [75, 140] (dashed line) for different values of $\alpha$. The slanted arrows indicate the 2- and 3-phonon thresholds of absorption.

Fig. 5. (a)-(c) Comparison of the optical conductivity calculated within the DMC method (circles), the extended MFF (solid line), and the SCE (dotted line) for different values of $\alpha$. (d) The energy of the lower- and higher-frequency features (circles and triangles, respectively) compared with the FC transition energy from the SCE (dashed line) and with the energy of the peak obtained from the extended MFF (solid line). In the inset, the weights of the FC and adiabatically connected transitions are shown as a function of $\alpha$ (for $\eta = 1.3$).

However, a dramatic change of the OC occurs around the critical coupling strength $\alpha_c \approx 8.5$. The dominating peak of the OC splits into two, the energy of the lower one corresponding to the predictions of the SCE and that of the upper one obeying the extended MFF value (Fig. 5a). The shoulder, corresponding to the dynamical extended MFF contribution, rapidly loses its intensity with increasing $\alpha$, and at large $\alpha$ (Fig. 5b-c) the OC is in good agreement with the strong coupling expansion, which assumes the FC scheme. Finally, comparing the energies of the peaks obtained by DMC, the extended MFF and the FC strong coupling expansion (Fig. 5d), we conclude that at the critical coupling $\alpha_c \approx 8.5$ the spectral properties rapidly switch from the dynamic regime, where the lattice relaxes at the transition, to the FC regime, where the nuclei are frozen in the initial configuration.

In order to get an idea of the FC breakdown, the authors of [57] consider the following arguments. The approximate adiabatic states are not exact eigenstates of the system. These states are mixed by nondiagonal matrix elements of the nonadiabatic operator $D$, and the exact eigenstates are linear combinations of the adiabatic wavefunctions.
Since one is interested in the properties of the transition from the ground (g) state to an excited (ex) state whose energy corresponds to that of the OC peak, it is necessary to consider the mixing of only these states and to express the exact wavefunctions as linear combinations [142, 143] of the ground and excited adiabatic states. The coefficients of the superposition are determined by standard techniques [142, 143], where the nondiagonal matrix elements of the nonadiabatic operator [142] are expressed in terms of the matrix elements of the kinetic energy operator $M$, the gap between the excited and ground states $\Delta E = E_{ex} - E_g$, and the number $n_\beta$ of phonons in the adiabatic state:

D± =M(∆E)−1 nβ + 1/2± 1/2 +M2(∆E)−2. (36)

The extent to which the lattice can follow the transition between electronic states depends on the degree of mixing between the initial and final exact eigenstates through the nonadiabatic interaction. If the initial and final states are strongly mixed, the adiabatic classification makes no sense and, therefore, FC processes have no place and the lattice adjusts to the change of electronic states during the transition. In the opposite limit the adiabatic approximation is valid and FC processes dominate. The estimate for the weight of the FC component $I_{FC}$ [57] is equal to unity in the case of zero mixing and to zero in the case of maximal mixing. The weight of the adiabatically connected (AC) transition, $I_{AC} = 1 - I_{FC}$, is defined accordingly. The nondiagonal matrix element $M$ is proportional to the square root of $\alpha$ with a coefficient $\eta$ of the order of unity. In the strong coupling regime, assuming that $\Delta E \approx \gamma\alpha^2$ ($\gamma \approx 0.1$ from MC data) and $n_\beta \approx \Delta E$ ($n_\beta \gg 1$), one gets

IFC = 1 + 4(τmp/τic) , (37)

where $\tau_{mp} = 1/\Delta E$ and $\tau_{ic} = 1/D$. For $\eta$ of the order of unity one obtains a qualitative description of a rather fast transition from the AC- to the FC-dominated regime, in which $I_{FC}$ and $I_{AC}$ exchange half of their weights in the range of $\alpha$ from 7 to 9.
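The scaling behind this switch can be made concrete with a small numerical sketch. It uses only the estimates quoted above, $M = \eta\sqrt{\alpha}$ and $\Delta E \approx \gamma\alpha^2$, with $\eta = 1.3$ taken from the caption of Fig. 5 and $\gamma = 0.1$; the quantity computed is the bare ratio $M/\Delta E$, which is an illustrative proxy for the degree of mixing and is not the expression (37) of [57]. The function name is hypothetical.

```python
def mixing_ratio(alpha, eta=1.3, gamma=0.1):
    """Ratio M / dE with M = eta * sqrt(alpha) and dE = gamma * alpha**2."""
    matrix_element = eta * alpha ** 0.5  # nondiagonal matrix element M
    gap = gamma * alpha ** 2             # energy separation dE
    return matrix_element / gap          # scales as alpha**(-3/2)


if __name__ == "__main__":
    for alpha in (5, 6, 7, 8, 9, 10, 12):
        print(alpha, round(mixing_ratio(alpha), 3))
```

Under these assumptions the ratio falls as $\alpha^{-3/2}$, from about 0.70 at $\alpha = 7$ to about 0.48 at $\alpha = 9$, so the matrix element that mixes the adiabatic states loses ground to the gap quickly in exactly the coupling range where the weights are exchanged.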
The physical reason for such a quick change is the faster growth of the energy separation $\Delta E \sim \alpha^2$ compared to that of the matrix element $M \sim \alpha^{1/2}$. Finally, for large couplings, the initial and final states become adiabatically disconnected. The rapid AC-FC switch has nothing to do with the self-trapping phenomenon, where crossing and hybridization of the ground and excited states occur. This phenomenon is a property of the transition between different states and is related to whether the lattice can or cannot adiabatically follow the change of the electronic state at the transition.

4 Self-Trapping

In this section we consider the self-trapping (ST) phenomenon which, due to the essential importance of the many-particle interaction of the QP with the bosonic bath of a macroscopic system, has never been addressed by an exact method before. We start with a basic definition of the ST phenomenon and introduce the adopted criterion for its existence. Then, generic features of ST are demonstrated on the simple model of the Rashba-Pekar exciton-polaron in Sect. 4.1. It is shown in Sect. 4.2 that the criterion is not a dogma, since even in a one-dimensional system, where ST is forbidden by the criterion of existence, one can observe all the main features of ST due to the peculiar nature of the electronic states.

In general terms [7, 80], ST is a dramatic transformation of the QP properties when the system parameters are slightly changed. The physical reason for ST is a quantum resonance, which happens at some critical interaction constant $\gamma_c$, between a “trapped” (T) state of the QP, with strong lattice deformation around it, and a “free” (F) state. Naturally, the ST transition is not abrupt because of the nonadiabatic interaction between the T and F states, and all properties of the QP are analytic in $\gamma$ [144]. At small $\gamma < \gamma_c$, the ground state is an F state which is weakly coupled to phonons, while the excited states are T states and have a large lattice deformation.
At critical couplings $\gamma \approx \gamma_c$ a crossover and hybridization of these states occurs. Then, for $\gamma > \gamma_c$ the roles of the states are exchanged: the lowest state is a T state while the upper one is an F state.

The first, and up to now the only, quantitative criterion for ST existence was given in terms of the ground state properties in the adiabatic approximation. This criterion considers the stability of the delocalized state in the undistorted lattice, $\Delta = 0$, with respect to the energy gain due to a lattice distortion $\Delta' \neq 0$. The ST phenomenon occurs when the completely delocalized state with $\Delta = 0$ is separated from the distorted state with $\Delta' \neq 0$ by a barrier of the adiabatic potential. One of these states is stable while the other is metastable. The criterion of barrier existence is defined in terms of the stability index

$$s = d - 2(1 + l) , \tag{38}$$

where $d$ is the system dimensionality. The index $l$ determines the range of the force, $\lim_{q \to 0} \psi(q) \sim q^{-l}$, where $\psi(R)$ is the kernel of the interaction $U(R_n) = \sum_{n'} \psi(R_n - R_{n'})\, \nu(R_{n'})$ connecting the potential $U(R_n)$ with the generalized lattice distortion $\nu(R_{n'})$ [7]. The barrier exists for $s > 0$ and does not exist for $s < 0$. The discontinuous change of the polaron state, i.e. ST, occurs in the former case and does not occur in the latter. When $s = 0$, this scaling argument alone cannot settle the presence or absence of ST, and a more detailed discussion of each model is needed.

4.1 Typical Example of the Self-Trapping: Rashba-Pekar Exciton-Polaron

A classical example of a system with the ST phenomenon is the three-dimensional continuous Rashba-Pekar exciton-polaron in the approximation of intraband scattering, i.e. when the polar electron-phonon interaction (EPI) with dispersionless optical phonons, $\omega_{ph} = 1$, does not change the wave function of the internal electron-hole motion. The system is defined as a structureless QP with dispersion $\epsilon(k) = k^2/2$ and short-range coupling to phonons [54, 7]. The general criterion for the existence of ST is satisfied for a three-dimensional system with short-range interaction [54, 7, 50] and, thus, one expects to observe the typical features of the phenomenon.

Fig. 6. The ground-state energy (a), effective mass (b), and average number of phonons (c) as functions of the coupling constant. Partial weights of n-phonon states (d) in the polaron ground state ($k = 0$) at $\gamma = 18$ (circles), $\gamma = 18.35$ (squares), and $\gamma = 19$ (diamonds). The dotted line in panel (a) is the result of the strong coupling limit and the dashed line is the result of perturbation theory.

It is shown [54] that in the vicinity of the critical coupling $\gamma_c \approx 18$ the average number of phonons $\langle N \rangle$ in (18) and the effective mass $m^*$ quickly increase in the ground state by several orders of magnitude (Fig. 6b-c). Besides, a quantum resonance between the polaronic phonon clouds of the F and T states is demonstrated. The distribution of partial n-phonon contributions $Z^{(k=0)}(n)$ in (17) has one maximum at $n = 0$ in the weak coupling regime, which corresponds to weak deformation, and one maximum at $n \gg 1$ in the strong coupling regime, which is the consequence of a strong lattice distortion. However, due to the F-T resonance there are two distinct peaks, at $n = 0$ and at $n \gg 1$, for $\gamma \approx \gamma_c$ (Fig. 6d).

Near the critical coupling $\gamma_c$ the LF of the polaron has several stable states (Fig. 7a-b) below the threshold of the incoherent continuum $E_{gs} + \omega_{ph}$. Any state above the threshold is unstable because emission of a phonon with a transition to the ground state at $k = 0$ with energy $E_{gs}$ is allowed. On the other hand, decay is forbidden by conservation laws for states below the threshold. The dependence of the energies of the ground and excited resonances on the interaction constant resembles a picture of the crossing of several states interacting with each other (Fig. 7c).
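The statement that the three-dimensional short-range model satisfies the existence criterion can be checked directly from the stability index (38). The sketch below is a bare transcription of that criterion; the values of $l$ used in the examples ($l = 0$ for a short-range kernel that tends to a constant at small $q$, $l = 1$ for a kernel behaving as $1/q$) are assumptions made here for illustration, and the function names are hypothetical.

```python
def stability_index(d, l):
    """Stability index s = d - 2 * (1 + l) of Eq. (38)."""
    return d - 2 * (1 + l)


def barrier_exists(d, l):
    """True if s > 0, False if s < 0, None if s == 0 (criterion inconclusive)."""
    s = stability_index(d, l)
    if s > 0:
        return True
    if s < 0:
        return False
    return None


if __name__ == "__main__":
    print("3D, short range (l = 0):", barrier_exists(3, 0))
    print("3D, 1/q kernel  (l = 1):", barrier_exists(3, 1))
    print("1D, short range (l = 0):", barrier_exists(1, 0))
```

For $d = 3$ and $l = 0$ the index is $s = 1 > 0$, so a barrier exists, while for $d = 1$ and $l = 0$ it is $s = -1 < 0$, which is the adiabatic single-band result.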
According to the general picture of the ST phenomenon, the lowest F state in the weak coupling regime at $k = 0$ has a small effective mass $m^* \approx m$, of the order of the bare QP mass $m$. On the contrary, the effective mass of the excited state, $m^* \gg m$, is large. Hence, below the critical coupling the energy of the F state, which is lowest at $k = 0$, has to reach the flat band of the T state at some momentum. Then, the F and T states have to hybridize and exchange in energy. The DMC data visualize this picture (Fig. 7d-e). After the F state crosses the flat band of the excited T state, the average number of phonons increases and the dispersion becomes flat.

Fig. 7. LF $L^{(k=0)}(\omega)$ at the critical coupling $\gamma = \gamma_c$ (a) and for $\gamma > \gamma_c$ (b). Energy is counted from the polaron ground state. (c) Dependence of the energy of the ground state (squares) and stable excited states (circles, diamonds, and triangles) on the coupling constant. The dashed line is the threshold of the incoherent continuum. Dependence of the energy (d) and average number of phonons (e) on the wave vector at $\gamma < \gamma_c$ (circles and rectangles). The dashed line is the effective mass approximation $E(k) = E_{gs} + k^2/2m^*$ for the parameters $E_{gs} = -3.7946$ and $m^* = 2.258$, obtained by DMC estimators for the given value of $\gamma$. The dotted line is a parabolic dispersion law fitted to the last 4 points of the energy dispersion curve, with parameters $E_1 = -3.5273$ and $m^*_1 = 195$. The empty square is the energy of the first excited stable state at zero momentum obtained by the SO method.

It is natural to assume that above the critical coupling the situation is the opposite: the ground state is the T state with a large effective mass, while the excited F state has a small, nearly bare, effective mass. Indeed, this assumption was confirmed in the framework of another model, which is considered in Sect. 6.1. Moreover, it was shown that in the strong coupling regime the excited resonance inherits not only the bare effective mass around $k = 0$ but the whole dispersion law of the bare QP [49].
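The momentum at which the light F band reaches the flat T band can be estimated from the two parabolic fits quoted in the caption of Fig. 7. The sketch below equates $E_{gs} + k^2/2m^*$ with $E_1 + k^2/2m^*_1$ and solves for $k$; it is only a rough estimate, since it ignores the hybridization that rounds off the actual crossing, and the function name is hypothetical.

```python
import math


def crossing_momentum(e_gs, m_light, e_1, m_heavy):
    """Momentum where E_gs + k^2/(2 m_light) equals E_1 + k^2/(2 m_heavy)."""
    gap = e_1 - e_gs                                # offset of the flat band at k = 0
    inverse_mass_diff = 1.0 / m_light - 1.0 / m_heavy
    return math.sqrt(2.0 * gap / inverse_mass_diff)


if __name__ == "__main__":
    # Fit parameters quoted in the caption of Fig. 7
    k_cross = crossing_momentum(-3.7946, 2.258, -3.5273, 195.0)
    print("estimated crossing momentum:", round(k_cross, 3))
```

With the quoted parameters the two parabolas meet at $k \approx 1.1$ in the units of the text, which gives the scale of the momentum beyond which the dispersion flattens and the phonon number rises.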
4.2 Degeneracy Driven Self-Trapping

According to the criterion (38), the ST phenomenon does not occur in a one-dimensional system. Although this statement is probably valid for the case of a single band in the relevant energy range, it does not hold for generic multi-band cases. This fact went unnoticed for many years, which prevented a proper explanation of the puzzling physics of the quasi-one-dimensional compound Anthracene-PMDA, although its optical properties [65, 145, 146, 147, 66, 148] directly suggested a resonance of T and F states. The reason is that in Anthracene-PMDA, in contrast to the conditions under which criterion (38) is obtained, there are two nearly degenerate exciton bands. Then, one can consider a quasi-degenerate self-trapping mechanism, in which the ST phenomenon is driven by the nondiagonal interaction of phonons with quasidegenerate exciton levels [52]. Such a mechanism was already suggested for the explanation of the properties of mixed valence systems [143], though its relevance was never proved by an exact approach.

Fig. 8. Dependence of the energy (a) and average number of phonons (b) on the nondiagonal coupling constant $\lambda_{12}$ at $\lambda_{11} = 0$ and $\lambda_{22} = 0.25$. Phonon distributions in the polaron cloud below the ST point at $\lambda_{12} = 1.0125$ (c), at the ST point at $\lambda_{12} = 1.0435$ (d), and above the ST coupling at $\lambda_{12} = 1.0625$ (e).

The minimal model to demonstrate the mechanism of quasi-degenerate self-trapping involves one optical phonon branch with frequency $\omega_{ph} = 0.1$ and two exciton branches with energies $\epsilon_{1,2}(q) = \Delta_{1,2} + 2[1 - \cos(q)]$, where $\Delta_1 = 0$ and $\Delta_2 = 1$. The presence of short-range diagonal γ22 and nondiagonal γ12 interactions (with corresponding dimensionless constants λ22 = γ 22/(2ω) and λ12 = γ 12/(2ω)) leads to classical self-trapping behavior even in a one-dimensional system [52] (see Fig. 8).
5 Exciton

Despite numerous efforts over the years, there has been no rigorous technique to solve for exciton properties even for the simplest model (1)-(2), which treats electron-electron interactions as a static renormalized Coulomb potential with averaged dynamical screening. The only solvable cases are the Frenkel small-radius limit [67] and the Wannier large-radius limit [68], which describe molecular crystals and wide gap insulators with large dielectric constant, respectively. Meanwhile, even accurate data for the limits of validity of the Wannier and Frenkel approximations have not been available. As discussed in Sects. 1.2 and 2.3, semianalytic approaches have little to add to the problem when quantitative results are needed, whereas traditional numerical methods fail to reproduce them even in the Wannier regime. On the contrary, DMC results do not contain any approximation.

[Figure 9; axis labels recovered from the plot: Bandwidth, Coordinate sphere, Electron-hole distance.]

Fig. 9. Panel (a): dependence of the exciton binding energy on the bandwidth $E_c = E_v$ for the conduction and valence bands. The dashed line corresponds to the Wannier model. The solid line is the cubic spline, the derivatives at the right and left ends being fixed by the Wannier limit and perturbation theory, respectively. Inset in panel (a): the initial part of the plot. Panel (b): the wave function of internal motion in real space for the optically forbidden monopolar exciton. Panels (c)-(e): the wave function of internal motion in real space: (c) Wannier [$E_c = E_v = 60$]; (d) intermediate [$E_c = E_v = 10$]; (e) near-Frenkel [$E_c = E_v = 0.4$] regimes. The solid line in panel (c) is the Wannier model result, while the solid lines in the other panels are guides to the eye only.

To study the conditions of validity of the limiting regimes by the DMC method, the electron-hole spectrum of a three dimensional system was chosen in the form of symmetric valence and conduction bands with width Ec and direct gap Eg at zero momentum [46]. For large ratio W = Ec/Eg, when W > 30, the exciton binding energy is in good agreement with the Wannier approximation results (Fig. 9a) and the probability density of relative electron-hole motion corresponds (Fig. 9c) to the hydrogen-like result. The striking result is the requirement of rather large valence and conduction bandwidths (W > 20) for applicability of the Wannier approximation. For smaller values of W the binding energy and the wave function of relative motion (Fig. 9d) deviate from the large radius results. Similarly, the conditions of validity of the Frenkel approach are rather restricted too. Moreover, even strong localization of the wave function does not guarantee good agreement between the exact and Frenkel approximation results for the binding energy. At 1 < W < 10 the wave function is already strongly localized, though the binding energy differs considerably from the Frenkel approximation result. For example, at W = 0.4 the relative motion is well localized (Fig. 9e), whereas the binding energy of the Frenkel approximation is two times larger than the exact result (inset in Fig. 9a).

A study of the conditions necessary for formation of a charge transfer exciton in three dimensional systems is crucial to conclude the protracted discussion of numerous models concerning the properties of mixed valence semiconductors [149]. A decade ago the unusual properties of SmS and SmB6 were explained by invoking the excitonic instability mechanism, assuming a charge-transfer nature of the optically forbidden exciton [150, 151].
Although this model explained quantitatively the phonon spectra [152, 153], optical properties [154, 155], and magnetic neutron scattering data [138], its basic assumption has been criticized as being groundless [156, 157]. To study the excitonic wave function, the dispersions of the valence and conduction bands were chosen as is typical for mixed valence materials: an almost flat valence band is separated from a broad conduction band having its maximum in the centre and its minimum at the border of the Brillouin zone [46]. The results presented in Fig. 9b support the assumption of [150, 151], since the wave function of relative motion has an almost zero on-site component and maximal charge density at near neighbors.

6 Polarons in Undoped High Temperature Superconductors

It is now well established that the physics of high temperature superconductors is that of hole doping a Mott insulator [158, 159, 160]. Even a single hole in a Mott insulator, i.e. a hole in an antiferromagnet in the case of infinite Hubbard repulsion U, is substantially influenced by many-body effects [10] because its jump to a neighboring site disturbs the antiferromagnetic arrangement of spins. Hence the dynamics of doped holes in Mott insulators has attracted a great deal of recent interest. The two major interactions relevant to the electrons in solids are electron-electron interactions (EEI) and electron-phonon interactions (EPI). The importance of the former at low doping is beyond doubt, since the Mott insulator is driven by strong Hubbard repulsion, while the latter was considered to be largely irrelevant to superconductivity based on the observations of a small isotope effect on the optimal Tc [161] and the absence of a phonon contribution to the resistivity (for review see [162]).

On the other hand, there is now accumulating evidence that the EPI plays an important role in the physics of cuprates, such as (i) an isotope effect on the superfluid density ρs and on Tc away from optimal doping [163], and (ii) neutron and Raman scattering experiments [164, 165, 166] showing strong phonon softening with both temperature and hole doping, indicating that the EPI is strong [167, 168]. Furthermore, recent studies of cuprates by angle resolved photoemission spectroscopy (ARPES), whose spectra are proportional to the LF (7) [32], resulted in the discovery of dispersion "kinks" at around 40-70 meV measured from the Fermi energy, in the correct range of the relevant oxygen related phonons [169, 170, 171]. These particular phonons, the oxygen buckling and half-breathing modes, are known to soften with doping [172, 164] and with temperature [170, 171, 172, 164, 165, 166], indicating strong coupling. Such a quick change of the velocity can be produced by any interaction of a quasiparticle with a bosonic mode, either with a phonon [170, 171] or with a collective magnetic resonance mode [173, 174, 175]. However, the recently discovered "universality" of the kink energy for LSCO over the entire doping range [176] casts doubt on the validity of the latter scenario, since the energy scale of the magnetic excitation changes strongly with doping.

Besides, ARPES measured in undoped high Tc materials revealed an apparent contradiction between the momentum dependence of the energy and the linewidth of the QP peak. On the one hand, the experimental energy dispersion of the broad peak in many underdoped compounds [31, 177] obeys the theoretical predictions [178, 179], whereas the experimental peak width is comparable with the bandwidth and orders of magnitude larger than that obtained from the theory of the Mott insulator [53]. Early attempts to interpret this anomalously short lifetime of a hole by an interaction with additional nonmagnetic bosonic excitations, e.g.
phonons [180], faced a generic question: is it possible that interaction with the medium leaves the energy dispersion absolutely unrenormalized while inducing a decay whose inverse lifetime is comparable to or even larger than the QP energy dispersion? An extrinsic origin of this width can be ruled out, since doping induces further disorder while a sharper peak is observed in the overdoped region.

In order to understand whether phonons can be responsible for the peculiar shape of the ARPES in the undoped cuprates, the LF of a hole interacting with phonons in a Mott insulator was studied by DMC-SO [49]. The case of the LF of a single hole corresponds to the ARPES in an undoped compound. For a system with large Hubbard repulsion U, when U is much larger than the typical bandwidth W of the noninteracting QP, the problem reduces to the t-J model [181, 182, 158, 11]

$$\hat{H}_{t\text{-}J} = -t \sum_{\langle ij \rangle s} c^{\dagger}_{is} c_{js} + J \sum_{\langle ij \rangle} \left( \mathbf{S}_i \mathbf{S}_j - n_i n_j/4 \right) . \qquad (39)$$

Here $c_{js}$ is the projected (to avoid double occupancy) fermion annihilation operator, n_i (< 2) is the occupation number, S_i is the spin-1/2 operator, J is the exchange integral, and 〈ij〉 denotes nearest-neighbor sites on a two dimensional square lattice. Different theoretical approaches [158, 183, 53] revealed the basic properties of the LF. The LF has a sharp peak in the low energy part of the spectrum, which disperses with a bandwidth $W_{J/t} \sim 2J$ and, therefore, the large QP width in experiment cannot be explained. The more complicated tt′t′′-J model takes into account hoppings to the second (t′) and third (t′′) nearest neighbors and, hence, the dispersion of the hole changes [184, 185, 186, 178, 179, 32]. However, for the parameters necessary to describe the dispersion in realistic high Tc superconductors [31, 178], the peak in the low energy part remains sharp and well defined for all momenta [187].

After expressing the spin operators in terms of Holstein-Primakoff spin wave operators and diagonalizing the spin part of Hamiltonian (39) by Fourier and Bogoliubov transformations [188, 10, 189, 190], the tt′t′′-J Hamiltonian is reduced to the boson-holon model, where a hole (annihilation operator $h_k$) with dispersion ε(k) = 4t′ cos(kx) cos(ky) + 2t′′[cos(2kx) + cos(2ky)] propagates in the magnon (annihilation operator $\alpha_k$) bath

$$\hat{H}^{0}_{t\text{-}J} = \sum_{k} \varepsilon(k)\, h^{\dagger}_{k} h_{k} + \sum_{k} \omega_{k}\, \alpha^{\dagger}_{k} \alpha_{k} \qquad (40)$$

with magnon dispersion $\omega_k = 2J\sqrt{1-\gamma_k^2}$, where $\gamma_k = (\cos k_x + \cos k_y)/2$. The hole is scattered by magnons as described by

$$\hat{H}^{h\text{-}m}_{t\text{-}J} = N^{-1/2} \sum_{k,q} M_{k,q} \left[ h^{\dagger}_{k} h_{k-q} \alpha_{q} + \mathrm{h.c.} \right] \qquad (41)$$

with the scattering vertex $M_{k,q}$. The parameters t, t′ and t′′ are hopping amplitudes to the first, second and third near neighbors, respectively. If the hopping integrals t′ and t′′ are set to zero and the bare hole has no dispersion, the problem (40-41) corresponds to the t-J model.

The short range interaction of a hole with dispersionless optical phonons $\hat{H}^{ph} = \Omega_0 \sum_k b^{\dagger}_k b_k$ of frequency Ω0 is introduced by the Holstein Hamiltonian

$$\hat{H}^{e\text{-}ph} = N^{-1/2} \sum_{k,q} \frac{\sigma}{\sqrt{2M\Omega_0}} \left[ h^{\dagger}_{k} h_{k-q} b_{q} + \mathrm{h.c.} \right] , \qquad (42)$$

where σ is the momentum and isotope independent coupling constant, M is the mass of the vibrating lattice ions, and Ω0 is the frequency of the dispersionless phonon. The coefficient in front of the square brackets is the standard Holstein interaction constant $\gamma = \sigma/\sqrt{2M\Omega_0}$. In the following we characterize the strength of EPI in terms of the dimensionless coupling constant λ = γ²/4tΩ0. Note that if the interaction with the magnetic subsystem (41) is neglected and the hole dispersion ε(k) is chosen in the form ε(k) = 2t[cos(kx) + cos(ky)], the problem (40), (42) corresponds to the standard Holstein model, where a hole with near neighbor hopping amplitude t interacts with dispersionless phonons.

We consider the evolution of the ARPES of a single hole in the t-J-Holstein model (40)-(42) from the weak to the strong coupling regime, and the dispersion of the LF in the strong coupling regime, in Sect. 6.1.
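
As an illustration of the quantities entering the boson-holon model (40)-(42), the following minimal sketch evaluates the bare hole dispersion ε(k), the magnon dispersion ω_k, and the dimensionless coupling λ = γ²/4tΩ0. The function names are hypothetical, energies are assumed to be measured in units of t, and the sample values J/t = 0.4, t′/t = −0.34, t′′/t = 0.23 are the ones quoted later in Sect. 6.2; this is not the code used for the DMC calculations.

```python
import math


def gamma_k(kx, ky):
    """Structure factor gamma_k = (cos kx + cos ky) / 2 of the square lattice."""
    return 0.5 * (math.cos(kx) + math.cos(ky))


def magnon_dispersion(kx, ky, J):
    """Magnon energy omega_k = 2 J sqrt(1 - gamma_k^2)."""
    g = gamma_k(kx, ky)
    # Guard against tiny negative arguments caused by floating point rounding.
    return 2.0 * J * math.sqrt(max(0.0, 1.0 - g * g))


def hole_dispersion(kx, ky, t1, t2):
    """Bare hole energy 4 t' cos kx cos ky + 2 t'' [cos 2kx + cos 2ky].

    t1 and t2 stand for t' and t'' (second and third neighbor hoppings).
    """
    return (4.0 * t1 * math.cos(kx) * math.cos(ky)
            + 2.0 * t2 * (math.cos(2.0 * kx) + math.cos(2.0 * ky)))


def holstein_lambda(gamma, t, omega0):
    """Dimensionless EPI constant lambda = gamma^2 / (4 t Omega_0)."""
    return gamma ** 2 / (4.0 * t * omega0)


if __name__ == "__main__":
    J, t1, t2 = 0.4, -0.34, 0.23  # in units of t
    for label, (kx, ky) in {"(0,0)": (0.0, 0.0),
                            "(pi/2,pi/2)": (math.pi / 2, math.pi / 2),
                            "(pi,0)": (math.pi, 0.0)}.items():
        print(label, hole_dispersion(kx, ky, t1, t2), magnon_dispersion(kx, ky, J))
```

The magnon energy vanishes at k = (0, 0) and (π, π), where γ_k² = 1, and reaches its maximum 2J on the line γ_k = 0, which includes the nodal point (π/2, π/2).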
It turns out that the properties of the LF in the strong coupling regime of the EPI explain the puzzle of the broad lineshape in ARPES in underdoped high Tc superconductors. Therefore, in order to suggest a crucial test for the mechanism of phonon-induced broadening, we present calculations of the effect of isotope substitution on the ARPES in Sect. 6.2.

6.1 Spectral Function of a Hole Interacting with Phonons in the t-J Model: Self-Trapping and Momentum Dependence

Previously, the LF of the t-J-Holstein model was studied by the exact diagonalization method on small clusters [191] and in the non-crossing approximation (NCA)⁶ for both phonons and magnons [192, 193]. However, the small system size in the exact diagonalization method implies a discrete spectrum and, therefore, the problem of lineshape could not be addressed. The latter method omits the FDs with mutual crossing of phonon propagators and, hence, is an invalid approximation for phonons at strong and intermediate couplings of EPI. This statement was demonstrated by DMC, which can sum all FDs for the Holstein model both exactly and in the NCA [49]. The exact results and those of NCA are in good agreement for small values λ ≤ 0.4 and drastically different for λ > 1. For example, for Ω0/t = 0.1 the exact result shows a sharp crossover to the strong coupling regime for λ > λ^c_H ≈ 1.2, whereas the NCA result does not undergo such a crossover even at λ = 100. On the other hand, NCA is valid for the interaction of a hole with magnons, since a spin S = 1/2 cannot flip more than once and the number of magnons in the polaronic cloud cannot be large.

Note that the t-J-Holstein model reduces to the problem of a polaron which interacts with several bosonic fields (3)-(4). The DMC expansion in [49] takes into account mutual crossing of phonon propagators and, in the framework of a partial NCA, neglects mutual crossing of magnon propagators to avoid the sign problem.
NCA for magnons is justified for J/t ≤ 0.4 by the good agreement between the results of NCA and exact diagonalization on small clusters [188, 10, 194, 195, 190]. Recently, results of exact diagonalization were compared in the limit of small EPI for the t-J-Holstein model, the boson-holon model (40-42) without NCA, and the boson-holon model with NCA [196]. Although the agreement is not as good as for the pure t-J model, it was concluded that NCA for magnons is still good enough for a qualitative description of the t-J-Holstein model.

⁶ NCA is equivalent to the self-consistent Born approximation (SCBA).

Fig. 10. (a) The LF of a hole in the ground state k = (π/2, π/2) at J/t = 0.3 and λ = 0. Low energy part of the LF of a hole in the ground state k = (π/2, π/2) at J/t = 0.3: (b) λ = 0; (c) λ = 0.3; (d) λ = 0.4; (e) λ = 0.46. Dependence on coupling strength λ at J/t = 0.3: (f) energies of lowest LF resonances; (g) Z-factor of lowest peak; (h) average number of phonons 〈N〉.

Figures 10a-e show the low energy part of the LF in the ground state at k = (π/2, π/2) in the weak, intermediate, and strong coupling regimes of interaction with phonons. The dependence on the coupling constant of the energies of resonances (Fig. 10f), of the Z-factor of the lowest peak at k = (π/2, π/2) (Fig. 10g), and of the average number of phonons in the polaronic cloud 〈N〉 (Fig. 10h) demonstrates a picture which is typical for ST (see [80, 54] and Sect. 4). Two states cross and hybridize in the vicinity of the critical coupling constant λ^c_{t-J} ≈ 0.38, the Z-factor of the lowest resonance sharply drops, and the average number of phonons in the polaronic cloud quickly rises. According to the general understanding of the ST phenomenon, above the critical coupling, λ > λ^c_{t-J}, one expects that the lowest state is dispersionless while the upper one has small effective mass.
This assumption is supported by the momentum dependence of the LF in the strong coupling regime (Fig. 11a-e). The dispersion of the upper broad shake-off Franck-Condon peak nearly perfectly obeys the relation

$$\varepsilon_k = \varepsilon_{min} + \frac{W_{J/t}}{5} \left\{ [\cos k_x + \cos k_y]^2 + [\cos(k_x+k_y) + \cos(k_x-k_y)]^2/4 \right\} , \qquad (43)$$

which describes the dispersion of the pure t-J model in the broad range of exchange constant 0.1 < J/t < 0.9 [194] (Fig. 11f). Note that this property of the shake-off peak is general for the whole strong coupling regime (Fig. 11f).

Fig. 11. The LF of a hole at J/t = 0.3 and λ = 0.46: (a) full energy range for k = (π/2, π/2); (b–e) low energy part for different momenta. Slanted arrows show broad peaks which can be interpreted in ARPES spectra as coherent (C) and incoherent (I) parts. Vertical arrows in panels (b)–(e) indicate the position of the "invisible" lowest resonance. (f) Dispersion of resonance energies at J/t = 0.3: broad resonance (filled circles) and lowest polaron pole (filled squares) at λ = 0.46; broad resonance (open circles) and lowest polaron pole (open squares) at λ = 0.4. The curves are dispersions (43) of a hole in the pure t-J model at J/t = 0.3 (W_{J/t=0.3} = 0.6): εmin = −2.396 (εmin = −2.52) for the dotted (solid) line. Panel (g) shows the ground state potential Q²/2 (solid line), the excited state potential without relaxation D + Q²/2 (dashed line), and the relaxed excited state potential D + (Q − λ)²/2 − λ²/2 (dotted line).

The momentum dependence of the shake-off peak, reproducing that of the free particle, is a direct consequence of the adiabatic regime. Indeed, the phonon frequency Ω0 is much smaller than the coherent bandwidth 2J of the t-J model, giving the adiabatic ratio Ω0/2J = 1/6 ≪ 1. Besides, as experience with the OC of the Fröhlich polaron (Sect. 3.2) shows, there is one more important parameter in the strong coupling limit.
Namely, the ratio between the measurement process time τmp = ħ/∆E, where ∆E is the energy separation of the shake-off hump from the ground state pole, and the characteristic lattice time τ ≈ 1/Ω0 is much less than unity. Hence, the fast photoemission probe sees the ions frozen in one of the possible configurations [197]. The LF in the FC limit is a sum of transitions between the lower Elow(Q) and upper Eup(Q) sheets of the adiabatic potential, weighted by the adiabatic wave function of the lower sheet |ψlow(Q)|² [198]. If EPI is absent both in the initial, Elow(Q) = Q²/2, and final, Eup(Q) = D + Q²/2, states, the LF is peaked at the energy D. Then, if there is EPI ∆Eup(Q) = −λQ only in the final state, i.e. when the hole is removed from the Mott insulator, the upper sheet of the adiabatic potential Eup(Q) = D + Q²/2 − λQ = D − λ²/2 + (Q − λ)²/2 has the same energy D at Q = 0. Since the probability function |ψlow(Q)|² has its maximum at Q = 0, the peak of the LF broadens but its energy does not shift [198] (Fig. 11g).

The behavior of the LF is the same as observed in the ARPES of undoped cuprates. The LF consists of a broad peak and a high energy incoherent continuum (see Fig. 11a). Besides, the dispersion of the broad peak "C" in Fig. 11 reproduces that of the sharp peak in the pure t-J model (Fig. 11b-f). The lowest dispersionless peak, corresponding to the small radius polaron, has very small weight and, hence, cannot be seen in experiment. On the other hand, according to experiment, the momentum dependence of the spectral weight Z(k) of the broad resonance exactly reproduces the dispersion of the Z(k)-factor of the pure t-J model. The reason for such perfect mapping is that in the adiabatic case Ω0/2J ≪ 1 all the weight of the sharp resonance in the t-J model without EPI is transformed at strong EPI into the broad peak.
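
The dispersion relation (43), which the broad peak follows, can be evaluated directly. The sketch below is only an illustration: the function name is hypothetical, and the parameter values εmin = −2.396 and W = 0.6 are the ones quoted in the caption of Fig. 11 for J/t = 0.3. It shows that the expression in curly brackets runs from 0 at k = (π/2, π/2) to 5 at k = (0, 0), so that W is the full width of the band.

```python
import math


def tj_dispersion(kx, ky, eps_min, width):
    """Hole dispersion of Eq. (43) for the pure t-J model.

    eps_min is the band minimum and width the bandwidth W_{J/t}.
    """
    nearest = (math.cos(kx) + math.cos(ky)) ** 2
    diagonal = (math.cos(kx + ky) + math.cos(kx - ky)) ** 2 / 4.0
    return eps_min + width / 5.0 * (nearest + diagonal)


if __name__ == "__main__":
    eps_min, width = -2.396, 0.6  # values quoted for J/t = 0.3 in Fig. 11
    points = {
        "(pi/2, pi/2)": (math.pi / 2, math.pi / 2),  # band minimum
        "(pi, 0)": (math.pi, 0.0),
        "(0, 0)": (0.0, 0.0),                        # band maximum
    }
    for label, (kx, ky) in points.items():
        print(f"k = {label}: eps = {tj_dispersion(kx, ky, eps_min, width):.4f}")
```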
This picture implies that the chemical potential in the heavily underdoped cuprates is not connected with the broad resonance but is pinned to the real quasiparticle pole with small Z-factor. This conclusion was recently confirmed experimentally [177].

Comparing the critical EPI for a hole in the t-J-Holstein model (40-42), λ^c_{t-J} ≈ 0.38, and that for the Holstein model, λ^c_H ≈ 1.2, with the same value of hopping t, we conclude that the spin-hole interaction accelerates the transition into the strong coupling regime. The reason for the enhancement of the role of EPI is found in [196]. Comparison of the EPI driven renormalization of the effective mass in the t-J-Holstein and Holstein models shows that the large effective mass in the t-J model is responsible for this effect. The enhancement of the role of EPI by EEI takes place at least for a single hole at the bottom of the t-J band. Had the comparison been made with the half-filled model, the result would have been a smaller enhancement or no enhancement at all [199]. On the other hand, the coupling constant of the half-breathing phonon is increased by correlations [200]. Finally, we conclude that the enhancement of the effective EPI by EEI is not unambiguous and depends on the details of the interaction and on the filling. However, this effect is present for small filling in the t-J-Holstein model.

6.2 Isotope Effect on ARPES in Underdoped High-Temperature Superconductors

The magnetic resonance mode and the phonon modes are the two major candidates to explain the "kink" structure of the electron energy dispersion around 40-70 meV below the Fermi energy, and the isotope effect (IE) on ARPES should be the smoking-gun experiment to distinguish between the two. Gweon et al. [201] performed the ARPES experiment on O18-replaced Bi2212 at optimal doping and found an appreciable IE, which however cannot be explained within the conventional weak-coupling Migdal-Eliashberg theory.
Namely, the change of the spectral function due to O18 replacement has been observed in the higher energy region beyond the phonon energy (∼ 60 meV). This is in sharp contrast to the weak coupling theory prediction that the IE should occur only near the phonon energy. Hence the IE in optimal Bi2212 still remains a puzzle. On the other hand, the ARPES in undoped materials, as described in Sect. 6.1, has recently been understood in terms of small polaron formation [49, 202, 198]. Therefore, it is essential to compare experiment in undoped systems with the DMC-SO data presented in this section, where theory can offer quantitative results.

In addition to the high-Tc problem, the strong EPI mechanism of ARPES spectra broadening was considered as one of the alternative scenarios for diatomic molecules [203], colossal magnetoresistive manganites [34], quasi-one-dimensional Peierls conductors [37, 38], and Verwey magnetites [39]. Therefore, an exact analysis of the IE on ARPES at strong EPI is of general interest for conclusive experiments in a broad variety of compound classes.

The dimensionless coupling constant λ = γ²/4tΩ in (42) is an invariant quantity for the simplest case of IE. Indeed, assuming the natural relation $\Omega \sim 1/\sqrt{M}$ between phonon frequency and mass, we find that λ does not depend on the isotope factor $\kappa_{iso} = \Omega/\Omega_0 = \sqrt{M_0/M}$, which is defined as the ratio of the phonon frequencies in the isotope substituted (Ω) and normal (Ω0) systems. We adopt parameters of the tt′t′′-J model which reproduce the experimental dispersion of ARPES [178]: J/t = 0.4, t′/t = −0.34, and t′′/t = 0.23. The frequency of the relevant phonon [32] is set to Ω0/t = 0.2, and the isotope factor $\kappa_{iso} = \sqrt{16/18}$ corresponds to substitution of the O18 isotope for O16. To sweep aside any doubts about possible instabilities of the analytic continuation, we calculate the LF for the normal compound (κnor = 1), the isotope substituted compound ($\kappa_{iso} = \sqrt{16/18}$), and the "anti-isotope" substituted compound ($\kappa_{ant} = \sqrt{18/16}$).
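
The invariance of λ under isotope substitution can be made explicit in one line. The derivation below is a sketch under two assumptions: the Holstein constant has the form γ = σ/√(2MΩ), as in the coefficient of (42), and the phonon frequency scales as Ω ∝ M^{-1/2}, so that MΩ² = M_0Ω_0² is fixed by the (isotope independent) force constant.

```latex
\begin{align}
\lambda &= \frac{\gamma^{2}}{4t\,\Omega}
         = \frac{1}{4t\,\Omega}\,\frac{\sigma^{2}}{2M\Omega}
         = \frac{\sigma^{2}}{8t\,M\Omega^{2}}
         = \frac{\sigma^{2}}{8t\,M_{0}\Omega_{0}^{2}} ,
\\
\kappa &= \frac{\Omega}{\Omega_{0}} = \sqrt{\frac{M_{0}}{M}}
        \quad\Longrightarrow\quad
        \kappa \approx 0.943 \ \text{for } M_{0}/M = 16/18 .
\end{align}
```

Under these assumptions only the phonon frequency Ω = κΩ0 changes with the isotope mass, while λ, t, and J stay fixed.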
The monotonic dependence of the LF on κ ensures stability of the analytic continuation and makes it possible to evaluate the error bars of a quantity A using the quantities Aiso − Anor, Anor − Aant, and (Aiso − Aant)/2.

Since the LF is sensitive to the strength of EPI only for low frequencies [55], we concentrate on the low energy part of the spectrum. Figure 12 shows the IE on the hole LF for different couplings in the nodal and antinodal points. The general trend is a shift of all spectral features to larger energies with increase of the isotope mass (κ < 1). One can also note that the shift of the broad FCP is much larger than that of the narrow real-QP peak. Moreover, for large couplings λ the shift of the QP energy approaches zero and only a decrease of the QP spectral weight Z is observed for larger isotope mass. On the other hand, the shift of the FCP is not suppressed for larger couplings. Except for the LF in the nodal point at λ = 0.62 (Fig. 12a, b), where the LF still has significant weight in the QP δ-functional peak, there is one more notable feature of the IE. With increase of the isotope mass the height of the FCP increases. Taking into account the conservation law for the LF, $\int_{-\infty}^{+\infty} L_k(\omega)\,d\omega = 1$, and the insensitivity of the high energy part of the LF to the EPI strength [55], one can conclude that the FCP narrows for larger isotope mass.

Fig. 12. Low energy part of hole LFs: normal compound (solid line), isotope substituted compound (dotted line) and "antiisotope" substituted compound (dashed line). LFs at different couplings in the nodal (a, c, e) and antinodal (g, i, l) points. Insets (b, d, f, h, k) show the low energy peak of the real QP.

To understand the trends of the IE in the strong coupling regime we analyze the exactly solvable independent oscillators model (IOM) [60]. The LF in IOM is the Poisson distribution

$$L(\omega) = \exp[-\xi_0/\kappa] \sum_{l=0}^{\infty} \frac{[\xi_0/\kappa]^{l}}{l!}\, G_{\kappa,l}(\omega) , \qquad (44)$$
where $\xi_0 = \gamma^2/\Omega_0^2 = 4t\lambda/\Omega_0$ is the dimensionless coupling constant for the normal system and $G_{\kappa,l}(\omega) = \delta[\omega + 4t\lambda - \Omega_0\kappa l]$ is the δ-function. The properties of the Poisson distribution quantitatively explain many features of the IE on the LF⁷.

⁷ Caution is needed regarding the approximate form of EPI (42). Strictly speaking, the actual momentum dependence of the interaction constant σ [204, 205] can slightly change the obtained differences between nodal and antinodal points, though the general trends have to remain intact because ST is caused solely by the short range part of EPI [80].

The energy ωQP = −4tλ of the zero-phonon line l = 0 in (44) depends only on isotope independent quantities, which explains the very weak isotope dependence of the QP peak energy in the insets of Fig. 12. Besides, the change of the zero-phonon line weight $Z^{(0)}$ obeys the relation $Z^{(0)}_{iso}/Z^{(0)}_{nor} = \exp[-\xi_0(1-\kappa)/\kappa]$ in IOM. These IOM estimates agree with DMC data within 15% in the nodal point and within 25% in the antinodal one. The IE on the FCP in the strong coupling regime follows from the properties of the zeroth, $M_0 = \int_{-\infty}^{+\infty} L(\omega)\,d\omega = 1$, first, $M_1 = \int_{-\infty}^{+\infty} \omega L(\omega)\,d\omega = 0$, and second, $M_2 = \int_{-\infty}^{+\infty} \omega^2 L(\omega)\,d\omega = \kappa\xi_0\Omega_0^2$, moments of the shifted Poisson distribution (44). The moments M0 and M2 establish the relation $D = h^{FCP}_{iso}/h^{FCP}_{nor} = 1/\sqrt{\kappa} \approx 1.03$ between the heights of the FCP in normal and substituted compounds. DMC data in the antinodal point perfectly agree with the above estimate for all couplings. This is consistent with the idea that the anti-nodal region remains in the strong coupling regime even though the nodal region is in the crossover region. In the nodal point DMC data agree well with the IOM estimate for λ = 0.75 (D ≈ 1.025), whereas at λ = 0.69 and λ = 0.62 the influence of the ST point leads to anomalous values of D: D ≈ 1.07 and D ≈ 0.98, respectively.

Fig. 13. (a) Energies of ground state and broad peaks for normal (triangles), isotope substituted (circles) and "antiisotope" substituted (diamonds) compounds. Comparison of IOM estimates (lines) with DMC data in the nodal (squares) and antinodal (diamonds) points: (b) shift of the FCP top, (c) FCP leading edge at 1/2 of height, and (d) FCP leading edge at 1/3 of height.

The shift of the low energy edge at half maximum, ∆1/2, must be proportional to the change of the square root of the second moment, $\Delta_{\sqrt{M_2}} = \sqrt{\xi_0}\,\Omega_0[1-\sqrt{\kappa}]$. As we found in numerical simulations of (44) with Gaussian functions⁸ $G_{\kappa,l}(\omega)$, the relation $\Delta_{1/2} \approx \Delta_{\sqrt{M_2}}/2$ is accurate to 10% for 0.62 < λ < 0.75. Also, simulations show that the shift of the edge at one third of the maximum, ∆1/3, obeys the relation $\Delta_{1/3} \approx \Delta_{\sqrt{M_2}}$. DMC data and IOM estimates are in good agreement for strong EPI, λ = 0.75 (Fig. 13). However, the shift of the FCP top ∆p and ∆1/2 are considerably enhanced in the self-trapping (ST) transition region. The physical reason for the enhancement of the IE in this region is a general property, regardless of the QP dispersion, range of EPI, etc. The influence of the nonadiabatic matrix element, mixing excited and ground states, on the energies of resonances depends essentially on the phonon frequency. While in the adiabatic approximation the ST transition is sudden and nonanalytic in λ [80], nonadiabatic matrix elements turn it into a smooth crossover [144]. Thus, as illustrated in Fig. 13a, the smaller the frequency, the sharper the kink in the dependence of the excited state energy on the interaction constant.

In the undoped case the present results can be directly compared with experiments. It is found that the IE on the ARPES lineshape of a single hole is anomalously enhanced in the intermediate coupling regime, while it can be described by the simple independent oscillators model in the strong coupling regime.
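
The IOM relations used in this section can be checked numerically from the line positions and Poisson weights of (44). The following is a minimal sketch, not the simulation used for Fig. 13: the function names are hypothetical, the δ-peaks are kept as discrete lines rather than broadened, and κ = √(16/18) is assumed for the O16 → O18 substitution. It verifies M0 = 1, M1 = 0, M2 = κξ0Ω0², the zero-phonon weight ratio exp[−ξ0(1 − κ)/κ], and the height ratio 1/√κ that follows from the FCP width scaling as √M2.

```python
import math


def iom_lines(lam, t, omega0, kappa, lmax=200):
    """Positions and weights of the IOM Lehmann function, Eq. (44).

    Line l sits at omega = -4 t lam + omega0 * kappa * l and carries the
    Poisson weight exp(-xi) xi^l / l! with xi = xi0 / kappa, xi0 = 4 t lam / omega0.
    lmax truncates the sum; it must be well above xi for the tail to be negligible.
    """
    xi = 4.0 * t * lam / omega0 / kappa
    positions, weights = [], []
    weight = math.exp(-xi)          # weight of the zero-phonon line
    for l in range(lmax):
        positions.append(-4.0 * t * lam + omega0 * kappa * l)
        weights.append(weight)
        weight *= xi / (l + 1)      # recursion for the next Poisson weight
    return positions, weights


def moments(positions, weights):
    """Zeroth, first and second moments of the discrete spectrum."""
    m0 = sum(weights)
    m1 = sum(w * p for p, w in zip(positions, weights))
    m2 = sum(w * p * p for p, w in zip(positions, weights))
    return m0, m1, m2


if __name__ == "__main__":
    lam, t, omega0 = 0.75, 1.0, 0.2
    kappa_iso = math.sqrt(16.0 / 18.0)  # assumed isotope factor
    pos_n, w_n = iom_lines(lam, t, omega0, 1.0)
    pos_i, w_i = iom_lines(lam, t, omega0, kappa_iso)
    m2_n = moments(pos_n, w_n)[2]
    m2_i = moments(pos_i, w_i)[2]
    print("moments (isotope):", moments(pos_i, w_i))
    print("Z ratio iso/nor:", w_i[0] / w_n[0])
    print("height ratio estimate sqrt(M2_nor/M2_iso):", math.sqrt(m2_n / m2_i))
```

The first moment vanishes for any κ because the mean phonon number ξ0/κ times the phonon energy κΩ0 always equals the polaron shift 4tλ, which is why only the width and the zero-phonon weight carry the isotope dependence in this model.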
The shift of the FCP top and the change of the FCP height are the relevant quantities to pursue experimentally in the intermediate coupling regime, since the IE on these characteristics is enhanced near the self-trapping point. In contrast, the shift of the leading edge of the broad peak is the relevant quantity in the strong coupling regime, since this value increases with coupling as $\sqrt{\lambda}$. These conclusions, depending on whether the self-trapping phenomenon is encountered in the specific case, can be applied fully or partially to other compounds with strong EPI [34, 37, 38, 39].

⁸ Results are almost independent of the parameter η of the Gaussian distribution $G_{\kappa,l}(\omega) = \frac{1}{\eta\sqrt{2\pi}} \exp\left(-[\omega + 4t\lambda - \Omega_0\kappa l]^2/(2\eta^2)\right)$ in the range [0.12, 0.2].

6.3 Conclusions and Perspectives

In this article, we have focused mainly on the polaron problem in strongly correlated systems. This offers an approach from the limit of low carrier concentration doped into the (Mott) insulator, which is complementary to the conventional Eliashberg-Migdal approach for the EPI in metals. In the latter case, the Fermi energy εF is the relevant energy scale, which is usually much larger than the phonon frequency Ω0. The adiabatic Migdal approximation is then valid, and the vertex corrections, which correspond to the multi-phonon cloud and are essential to the self-trapping phenomenon, are suppressed by the ratio Ω0/εF. Therefore an important issue is the crossover from the strong coupling polaronic picture to the weak coupling Eliashberg-Migdal picture. This occurs as one increases the carrier doping into the insulator. As is observed by ARPES experiments in high temperature superconductors, the polaronic states continue to survive even at finite doping [177]. This suggests a novel polaronic metallic state in underdoped cuprates, which is common also in CMR manganites [36] and is most probably universal in transition metal oxides.
In the optimal and overdoped region, the Eliashberg-Migdal picture becomes appropriate [170, 171], but a nontrivial feature of the EPI remains: its strong momentum dependence, leading to the dichotomy between the nodal and anti-nodal regions. It is an interesting observation that the highest superconducting transition temperature is attained in the crossover region between the two pictures above, which suggests that both the itinerancy and the strong coupling to the phonons are essential to the quantum coherence. It should be noted that this crossover occurs in a nontrivial way also in momentum space, i.e., the nodal and anti-nodal regions behave quite differently, as discussed in Sect. 6.2. However, the relevance of the EPI to high Tc superconductivity is still left for future investigations.

We hope that this article convinces the reader of the vital role of ARPES experiments and numerically exact solutions to the EPI problem; their combination offers a powerful tool for the momentum-energy resolved analysis of these rather complicated strongly correlated electronic systems. This will pave a new path to a deeper understanding of many-body electronic systems.

We thank Y. Toyozawa, Z. X. Shen, T. Cuk, T. Devereaux, J. Zaanen, S. Ishihara, A. Sakamoto, N. V. Prokof’ev, B. V. Svistunov, E. A. Burovski, J. T. Devreese, G. de Filippis, V. Cataudella, P. E. Kornilovitch, O. Gunnarsson, N. M. Plakida, and K. A. Kikoin for collaborations and discussions.

References

1. J. Appel: Solid State Physics, Vol. 21, ed by H. Ehrenreich, F. Seitz and D. Turnbull (Academic, New York 1968)
2. S. I. Pekar: Untersuchungen über die Elektronentheorie der Kristalle, (Akademie Verlag, Berlin 1954)
3. L. D. Landau: Sow. Phys. 3, 664 (1933)
4. H. Fröhlich, H. Pelzer, S. Zienau: Philos. Mag. 41, 221 (1950)
5. J. T. Devreese: Encyclopedia of Applied Physics Vol. 14, ed by G. L. Trigg (VCH, New York 1996), p. 383
6. A. I. Anselm, Yu. A.
Firsov: J. Exp. Theor. Phys. 28, 151 (1955); ibid. 30, 719 (1956)
7. M. Ueta, H. Kanzaki, K. Kobayashi, Y. Toyozawa, E. Hanamura: Excitonic Processes in Solids, (Springer-Verlag, Berlin 1986)
8. Y. Toyozawa: Progr. Theor. Phys. 20, 53 (1958)
9. Y. Toyozawa: Optical Processes in Solids, (University Press, Cambridge 2003)
10. C. L. Kane, P. A. Lee, N. Read: Phys. Rev. B 39, 6880 (1989)
11. Yu. A. Izyumov: Usp. Fiz. Nauk 167, 465 (1997) [Physics-Uspekhi 40, 445 (1997)]
12. J. Kanamori: J. Appl. Phys. 31, S14 (1960)
13. A. Abragam, B. Bleaney: Electron Paramagnetic Resonance of Transition Ions, (Clarendon Press, Oxford 1970)
14. K. I. Kugel, D. I. Khomskii: Sov. Phys. Usp. 25, 231 (1982)
15. A. J. Millis, P. B. Littlewood, B. I. Shraiman: Phys. Rev. Lett. 74, 5144 (1995)
16. A. S. Alexandrov, A. M. Bratkovsky: Phys. Rev. Lett. 82, 141 (1999)
17. E. I. Rashba: Sov. Phys. JETP 23, 708 (1966)
18. Y. Toyozawa, J. Hermanson: Phys. Rev. Lett. 21, 1637 (1968)
19. I. B. Bersuker: The Jahn-Teller Effect, (IFI/Plenum, New York 1983)
20. V. L. Vinetskii: Zh. Exp. Teor. Fiz 40, 1459 (1961) [Sov. Phys. - JETP 13, 1023 (1961)]
21. P. W. Anderson: Phys. Rev. Lett. 34, 953 (1975)
22. H. Hiramoto, Y. Toyozawa: J. Phys. Soc. Jpn. 54, 245 (1985)
23. A. Alexandrov, J. Ranninger: Phys. Rev. B 23, 1796 (1981)
24. A. Alexandrov, J. Ranninger: Phys. Rev. B 24, 1164 (1981)
25. H. Haken: Il Nuovo Cimento 3, 1230 (1956)
26. F. Bassani, G. Pastori Parravicini: Electronic States and Optical Transitions in Solids, (Pergamon, Oxford 1975)
27. J. Pollmann, H. Büttner: Phys. Rev. B 16, 4480 (1977)
28. A. Sumi: J. Phys. Soc. Jpn. 43, 1286 (1977)
29. Y. Shinozuka, Y. Toyozawa: J. Phys. Soc. Jpn. 46, 505 (1979)
30. Y. Toyozawa: Physica 116B, 7 (1983)
31. B. O. Wells, Z.-X. Shen, A. Matsuura et al: Phys. Rev. Lett. 74, 964 (1995)
32. A. Damascelli, Z.-X. Shen, Z. Hussain: Rev. Mod. Phys. 75, 473 (2003)
33. X. J. Zhou, T. Yoshida, D.-H. Lee et al: Phys. Rev. Lett. 92, 187001 (2004)
34. D. S.
Dessau, T. Saitoh, C.-H. Park et al: Phys. Rev. Lett. 81, 192 (1998); 35. N. Mannella, A. Rosenhahn, C. H. Booth et al: Phys. Rev. Lett. 92, 166401 (2004) 36. N. Mannella, W. L. Yang, X. J. Zhou et al: Nature 438, 474 (2005) 37. L. Perfetti, H. Berger, A. Reginelli et al: Phys. Rev. Lett. 87, 216404 (2001) Spectroscopic Properties of Polarons by Exact Monte Carlo 37 38. L. Perfetti, S. Mitrovic, G. Margaritondo et al: Phys. Rev. B 66, 075107 (2002) 39. D. Schrupp, M. Sing, M. Tsunekawa et al: Eur. Phys. Lett. 70, 789 (2005) 40. R. J. Mc Queeney, T. Egami, G. Shirane and Y. Endoh: Phys. Rev. B 54 R9689 (1996) 41. H. A. Mook, R. M. Nicklow: Phys. Rev. B 20 1656 (1979) 42. H. A. Mook, D. B. McWhan, F. Holtzberg: Phys. Rev. B 25 4321 (1982) 43. N. V. Prokof’ev, B. V. Svistunov, I. S. Tupitsyn: J. Exp. Theor. Phys. 114, 570 (1998) [Sov. Phys. - JETP 87, 310 (1998)] 44. N. V. Prokof’ev, B. V. Svistunov: Phys. Rev. Lett. 81, 2514 (1998) 45. A. S. Mishchenko, N. V. Prokof’ev, A. Sakamoto, B. V. Svistunov: Phys. Rev. B 62, 6317 (2000) 46. E. A. Burovski, A. S. Mishchenko, N. V. Prokof’ev, B. V. Svistunov: Phys. Rev. Lett. 87, 186402 (2001) 47. A. S. Mishchenko, N. Nagaosa, N. V. Prokof’ev, B. V. Svistunov, E. A. Burovski: Nonlinear Optics 29, 257 (2002) 48. A. S. Mishchenko, N. Nagaosa, N. V. Prokof’ev, A. Sakamoto, B. V. Svistunov: Phys. Rev. Lett. 91, 236401 (2003) 49. A. S. Mishchenko, N. Nagaosa: Phys. Rev. Lett. 93, 036402 (2004) 50. A. S. Mishchenko: Usp. Phys. Nauk 175, 925 (2005) [Physics-Uspekhi 48, 887 (2005)] 51. A. S. Mishchenko, N. Nagaosa: J. Phys. Soc. J. 75, 011003 (2006) 52. A. S. Mishchenko, N. Nagaosa: Phys. Rev. Lett. 86, 4624 (2001) 53. A. S. Mishchenko, N. V. Prokof’ev, B. V. Svistunov: Phys. Rev. B 64, 033101 (2001) 54. A. S. Mishchenko, N. Nagaosa, N. V. Prokof’ev, A. Sakamoto, B. V. Svistunov: Phys. Rev. B 66 020301 (2002) 55. A. S. Mishchenko, N. Nagaosa: Phys. Rev. B 73, 092502 (2006) 56. A. S. Mishchenko, N. Nagaosa: J. Phys. Chem. 
Solids 67, 259 (2006) 57. G. De Fillipis, V. Cataudella, A. S. Mishchenko, J. T. Devreese, C. A. Perroni: Phys. Rev. Lett. 96, 136405 (2006) 58. J. T. Devreese: Optical Properties of Few and Many Fröhlich Polarons from 3D to 0D, contribution to the present book. 59. A. A. Abrikosov, L. P. Gor’kov, I. E. Dzyaloshinskii: Quantum field theoretical method in statistical physics (Pergamon Press, Oxford 1965) 60. G. D. Mahan: Many particle physics (Plenum Press, Plenum Press 2000) 61. M. Jarrell, J. Gubernatis: Phys. Rep. 269, 133 (1996) 62. R. Knox: Theory of Excitons, (Academic Press, New York 1963) 63. I. Egri: Phys. Rep 119, 364 (1985) 64. D. Haarer: Chem. Phys. Lett 31, 192 (1975) 65. D. Haarer, M. R. Philpot, H. Morawitz: J. Chem. Phys 63, 5238 (1975) 66. A. Elscner, G. Weiser: Chem. Phys 98 465 (1985) 67. J. I. Frenkel: Phys. Rev. 17, 17 (1931) 68. J. H. Wannier: Phys. Rev. 52, 191 (1937) 69. G. De Filippis, V. Cataudella, V. Marigliano Ramaglia, C. A. Perroni: Phys. Rev. B 72, 014307 (2005) 70. M. Berciu: cond-mat/0602195 71. J. T. Devreese, L. F. Lemmens, J. Van Royen: Phys. Rev. B 15, 1212 (1977) 72. P. E. Kornilovitch: Europhys. Lett. 59, 735 (2002) 73. J. Devreese, R. Evrard: Phys. Lett. 11, 278 (1966) 38 A. S. Mishchenko and N. Nagaosa 74. E. Kartheuzer, R. Evrard, J. Devreese: Phys. Rev. Lett. 22, 94 (1969) 75. J. Devreese, J. De Sitter, M. Goovaerts: Phys. Rev. B 5, 2367 (1972) 76. J. T. Devreese: Internal structure of free Fröhlich polarons, optical absorption and cyclotron resonance. In Polarons in Ionic crystals and Polar Semiconduc- tors (North Holland, Amsterdam 1972) pp 83–159 77. M. J. Goovaerts, J. M. De Sitter, J. T. Devreese: Phys. Rev. B 7, 2639 (1973) 78. R. Feynman, R. Hellwarth, C. Iddings, and P. Platzman: Phys. Rev. 127, 1004 (1962) 79. T. D. Lee, F. E. Low, D. Pines: Phys. Rev. 90, 297 (1953) 80. E. I. Rashba: Self-trapping of excitons. In Modern Problems in Condensed Matter Sciences, vol. 2, ed by V. M. Agranovich and A. A. 
Maradudin (Notrh Holland, Amsterdam 1982) pp 543–602 81. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. M. Teller and E. Teller: J. Chem. Phys. 21, 1087 (1953) 82. D. P. Landau, K. Binder: A Guide to Monte Carlo Simulations in Statistical Physics, (University Press, Cambridge 2000) 83. A. W. Sandvik, J. Kurkijärvi: Phys. Rev. B 43, 5950 (1991) 84. A. N. Tikhonov, V. Y. Arsenin: Solutions of Ill-Posed Problems, (Winston, Washington 1977) 85. E. Perchik: math-ph/0302045 86. D. L. Phillips: J. Assoc. Comut. Mach. 9 84 (1962) 87. A. N. Tikhonov: DAN USSR 151 501 (1963) 88. S. S. Aplesnin: J. Exp. Theor. Phys 97 969 (2003) 89. G. Onida, L. Reining, A. Rubio: Rev. Mod. Phys. 74, 601 (2002) 90. L. J. Sham, T. M. Rice: Phys. Rev. 144, 708 (1965). 91. L. X. Benedict, E. L. Shirley, R. B. Bohn: Phys. Rev. Lett. 80, 4514 (1998) 92. S. Albrecht, L. Reining, R. Del Sole, G. Onida: Phys. Rev. Lett. 80, 4510 (1998) 93. M. Rohlfing, S. G. Louie: Phys. Rev. Lett. 81, 2312 (1998) 94. A. Marini, R. Del Sole: Phys. Rev. Lett. 91, 176402 (2003) 95. W. Stephan: Phys. Rev. B 54, 8981 (1996) 96. G. Wellein, H. Fehske: Phys. Rev. B 56, 4513 (1997) 97. H. Fehske, J. Loos, G. Wellein: Z. Phys. B 104, 619 (1997) 98. H. Fehske, J. Loos, G. Wellein: Phys. Rev. B 61, 8016 (2000) 99. J. Bonča, S. A. Trugman, I. Batistić: Phys. Rev. B 60, 1633 (1999) 100. L.-C. Ku, S. A. Trugman, J. Bonča: Phys. Rev. B 65, 174306 (2002) 101. S. E. Shawish, J. Bonča, L.-C. Ku, S. A. Trugman: Phys. Rev. B 67, 014301 (2003) 102. O. S. Barisic: Phys. Rev. B 65, 144301 (2002) 103. O. S. Barisic: Phys. Rev. B 69, 064302 (2004) 104. A. Georges, G. Kotliar: Phys. Rev. B 45, 647 (1992) 105. M. Jarrel: Phys. Rev. Lett. 69, 168 (1992) 106. P. G. J. van Dongen, D. Vollhardt: Phys. Rev. Lett. 65, 1663 (1990) 107. A. Georges, G. Kotliar, W. Krauth, M. J. Rozenberg: Rev. Mod. Phys. 68, 13 (1996) 108. S. Ciuchi, F. de Pasquale, S. Fratini, D. Feinberg: Phys. Rev. B 56 4494 (1997) 109. D. Sénéchal, D. Perez, M. 
Pioro-Landriére: Phys. Rev. Lett. 84, 522 (2000) 110. D. Sénéchal, D. Perez, M. Plouffe: Phys. Rev. B 66, 075129 (2002) 111. M. Hohenadler, M. Aichhorn, W. von der Linden: Phys. Rev. B 68, 18430 (2003) Spectroscopic Properties of Polarons by Exact Monte Carlo 39 112. M. Hohenadler, M. Aichhorn, W. von der Linden: Phys. Rev. B 71, 014302 (2005) 113. M. Hohenadler, D. Neuber, W. von der Linden, G. Wellein, J. Loos, H. Fehske: ibid. 71, 245111 (2005) 114. S. R. White: Phys. Rev. Lett. 69, 2863 (1992) 115. S. R. White: Phys. Rev. B 48, 10345 (1993) 116. S. R. White: Phys. Rev. Lett. 77, 363 (1996) 117. E. Jeckelmann, S. R. White: Phys. Rev. B 57, 6376 (1998) 118. G. Hager, G. Wellein, E.Jeckelmann, H. Fehske: Phys. Rev. B 71, 075108 (2005) 119. P. E. Kornilovitch: Phys. Rev. Lett. 81, 5382 (1998) 120. P. E. Kornilovitch: Phys. Rev. B 60, 3237 (1999) 121. P. E. Spenser, J. H. Samson, P. E. Kornilovitch, A. S. Alexandrov: Phys. Rev. B 71, 184310 (2005) 122. J. P. Hague, P. E. Kornilovitch, A. S. Alexandrov, J. H. Samson: Phys. Rev. B 73, 054303 (2006) 123. A. S. Alexandrov, P. E. Kornilovitch: Phys. Rev. Lett. 82, 807 (1999) 124. A. S. Alexandrov, P. E. Kornilovitch: Phys. Rev. B 70, 224511 (2004) 125. S. Ciuchi, F. de Pasquale, D. Feinberg: Europhys. Lett. 30, 151 (1995) 126. A. S. Alexandrov, V. V. Labanov, D. K. Ray: Phys. Rev. B 49, 9915 (1994) 127. A. S. Alexandrov, J. Ranninger: Phys. Rev. B 45, 13109 (1992) 128. L. D. Landau, S. I. Pekar: Zh. Eksp. Teor. Fiz. 18, 419 (1948) [Sov. Phys. JETP 18, 341 (1948)] 129. V. L. Gurevich, I. G. Lang, Yu. A. Firsov: Fiz. Tverd. Tela (Leningrad) 4, 1252 (1962) [Sov. Phys. Solid State 4, 918 (1962)] 130. J. Franck, E. G. Dymond: Trans. Faraday Soc. 21, 536 (1926) 131. E. U. Condon: Phys. Rev. 32, 858 (1928) 132. D. N. Bertran, J. J. Hopfield: J. Chem. Phys. 81, 5753 (1984) 133. X. Urbain, B. Fabre, E. M. Staice-Casagrande et al: Phys. Rev. Lett. 92, 163004 (2004) 134. M. Lax: J. Chem. Phys. 20, 1752 (1952) 135. D. I. 
Khomskii: Usp. Fiz. Nauk 129, 443 (1979) [Sov. Phys. Usp. 22, 879 (1979)] 136. C. E. T. Goncalves da Silva, L. M. Falicov: Phys. Rev. B 13, 3948 (1976). 137. P. A. Alekseev, J. M. Mignot, J. Rossat-Mignot: J. Phys.: Condens. Matter 7, 289 (1995) 138. K. A. Kikoin, A. S. Mishchenko: J. Phys.: Condens. Matter 7, 307 (1995) 139. R. Feynman: Phys. Rev. 97, 660 (1955) 140. F. M. Peeters, J. T. Devreese: Phys. Rev. B 28, 6051 (1983) 141. V. Cataudella, G. De. Filippis, C. A. Perroni: Single polaron properties in different electron phonon models, contribution to the present book. 142. E. G. Brovman, Yu. Kagan: Zh. Eksp. Teor. Fiz. 52, 557 (1967) [Sov. Phys. JETP 25, 365 (1967)] 143. K. A. Kikoin, A. S. Mishchenko: Zh. Eksp. Teor. Fiz. 104, 3810 (1993) [Sov. Phys. JETP 77, 828 (1993)] 144. B. Gerlach, H. L”owen: Rev. Mod. Phys. 63, 63 (1991) 145. A. Brillante, M. R. Philpott: J. Chem. Phys. 72, 4019 (1980) 146. D. Haarer: Chem. Phys. Lett. 27, 91 (1974) 147. D. Haarer: J. Chem. Phys. 67, 4076 (1977) 148. M. Kuwata-Gonokami, N. Peyghambarian, K. Meissner et al: Nature 367, 47 (1994) 40 A. S. Mishchenko and N. Nagaosa 149. S. Curnoe, K. A. Kikoin: Phys. Rev. B 61, 15714 (2000) 150. K. A. Kikoin, A. S. Mishchenko: Zh. Eksp. Teor. Fiz. 94, 237 (1988) [Sov. Phys. JETP 67, 2309 (1988)] 151. K. A. Kikoin, A. S. Mishchenko: J. Phys.: Condens. Matter 2, 6491 (1990) 152. P. A. Alekseev, A. S. Ivanov, B. Dorner et al: Europhys. Lett. 10, (1989) 457. 153. A. S. Mishchenko, K. A. Kikoin: J. Phys.: Condens. Matter 3, 5937 (1991). 154. G. Trawaglini P. Wachter: Phys. Rev. B 29, 893 (1984) 155. P. Lemmens, A. Hoffman, A. S. Mishchenko et al: Physica B 206&207, 371 (1995) 156. T. Kasuya: Europhys. Lett. 26, 277 (1994) 157. T. Kasuya: Europhys. Lett. 26, 283 (1994) 158. E. Manousakis: Rev. Mod. Phys. 63, 1 (1991) 159. E. Dagotto: Rev. Mod. Phys. 66, 763 (1994) 160. P. A. Lee, N. Nagaosa, X. G. Wen: Rev. Mod. Phys. 78, 17 (2006) 161. B. Batlogg, R. J. Cava, A. Jayaraman et al: Phys. Rev. 
Lett. 58, 2333 (1987) 162. O. Gunnarsson, M. Calandra, J. E. Han: Rev. Mod. Phys. 75, 1085 (2003) 163. R. Khasanov, D. G. Eshchenko, H. Luetkens et.al: Phys. Rev. Lett. 92, 057602 (2004) 164. L. Pintschovius, M. Braden: Phys. Rev. B, 60, R15039 (1999). 165. C. Thomsen, M. Cardona, B. Gegenheimer et. al: Phys. Rev. B 37, 9860 (1988) 166. V. G. Hadjiev, X. Zhou, T. Strohm, et. al: Phys. Rev. B 58, 1043 (1998) 167. G. Khaliullin, P. Horsch: Physica C 282-287, 1751 (1997) 168. O. Rösch, O. Gunnarsson: Phys. Rev. Lett. 93, 237001 (2004) 169. A. Lanzara, P. V. Bogdanov, X. J. Zhou et al: Nature 412, 510 (2001) 170. T. Cuk, F. Baumberger, D. H. Lu et al: Phys. Rev. Lett. 93, 117003 (2004) 171. T. P. Devereaux, T. Cuk, Z.-X. Shen, N. Nagaosa: Phys. Rev. Lett. 93, 117004 (2004) 172. R. J. McQueeney, Y. Petrov, T. Egami et al: Phys. Rev. Lett. 82, 628 (1999) 173. A. V. Chubukov, M. R. Norman: Phys. Rev. B 70, 174505 (2004) 174. M. Eschrig, M. R. Norman: Phys. Rev. Lett. 85, 3261 (2000) 175. M. Eschrig, M. R. Norman: Phys. Rev. B 67, 144503 (2003) 176. X. J. Zhou, T. Yoshida, A. Lanzara et al: Nature 423, 398 (2003) 177. K. M. Shen, F. Ronnig, D. H. Lu et al: Phys. Rev. Lett. 93, 267002 (2004) 178. T. Xiang, J. M. Wheatley: Phys. Rev. B 54, R12653 (1996) 179. B. Kyung, R. A. Ferrell: Phys. Rev. B 54, 10125 (1996) 180. J. J. M. Pothuizen1, R. Eder1, N. T. Hien et al: Phys. Rev. Lett. 78, 717 (1997) 181. K. A. Chao, J. Spalek, A. M. Oles: J. Phys. C 10, L271 (1977) 182. C. Gross, R. Joynt, T. M. Rice: Phys. Rev. B 36, 381 (1987) 183. M. Brunner, F. F. Assaad, A. Muramatsu: Phys. Rev. B 62, 15480 (2000). 184. V. I. Belinicher, A. L. Chernyshev, V. A. Shubin: Phys. Rev. B 53, 335 (1996) 185. V. I. Belinicher, A. L. Chernyshev, V. A. Shubin: Phys. Rev. B 54, 14914 (1996) 186. T. Tohyama, S. Maekawa: Superconductors Science and Technology 13, R17 (2000) 187. J. Ba la, A. M. Oleś, J. Zaanen: Phys. Rev. B 52 4597 (1995) 188. S. Schmitt-Rink, C. M. Varma, A. E. Ruckenstein: Phys. Rev. 
Lett. 60, 2793 (1988) 189. Z. Liu, E. Manousakis: Phys. Rev. B 44, 2414 (1991) 190. Z. Liu, E. Manousakis: Phys. Rev. B 45, 2425 (1992) Spectroscopic Properties of Polarons by Exact Monte Carlo 41 191. B. Bauml, G. Wellein, H. Fehske: Phys. Rev. B 58, 3663 (1998) 192. A. Ramšak, P. Horsch, P. Fulde: Phys. Rev. B 46, 14305 (1992) 193. B. Kyung, S. I. Mukhin, V. N. Kostur, R. A. Ferrell: Phys. Rev. B 54, 13167 (1996) 194. F. Marsiglio F, A. E. Ruckenstein, S. Schmitt-Rink, C. Varma: Phys. Rev. B 43, 10882 (1991) 195. G. Martinez, P. Horsch: Phys. Rev. B 44, 317 (1991) 196. O. Rösch, O. Gunnarsson: Phys. Rev. B 73, 174521 (2006) 197. A. S. Mishchenko: Pis’ma Zh. Eksp. Teor. Fiz. 66, 460 (1997) [JETP Lett. 66, 487 (1997)] 198. O. Rösch, O. Gunnarsson: Europhys. Phys. J. B 43, 11 (2005) 199. G. Sangiovanni, O. Gunnarsson, E. Koch, C. Castellani, M. Capone: cond- mat/0602606. 200. O. Rösch, O. Gunnarsson: Phys. Rev. B 70, 224518 (2004). 201. G.-H. Gweon, T. Sasagawa, S. Y. Zhou et al: Nature 430, 187 (2004) 202. O. Rösch, O. Gunnarsson, X. J. Zhou et al: Phys. Rev. Lett. 95, 227002 (2005) 203. G. A. Sawatzky: Nature (London) 342B, 480 (1989) 204. O. Rösch, O. Gunnarsson: Phys. Rev. Lett. 92, 146403 (2004) 205. S. Ishihara , N. Nagaosa: Phys. Rev. B 69, 144520 (2004) ABSTRACT We present recent advances in understanding of the ground and excited states of the electron-phonon coupled systems obtained by novel methods of Diagrammatic Monte Carlo and Stochastic Optimization, which enable the approximation-free calculation of Matsubara Green function in imaginary times and perform unbiased analytic continuation to real frequencies. We present exact numeric results on the ground state properties, Lehmann spectral function and optical conductivity of different strongly correlated systems: Frohlich polaron, Rashba-Pekar exciton-polaron, pseudo Jahn-Teller polaron, exciton, and interacting with phonons hole in the t-J model. 
<|endoftext|><|startoftext|> Introduction By Way of Reprise: From Box-Kites to ETs

The creation of 2^N-dimensional analogues of Complex Numbers (and it was not a trivial insight of 19th Century algebra that legitimate analogs always have dimension a power of 2) is handled by a now well-known algorithm called the Cayley-Dickson Process (CDP). Its name suggests a compressed account of its history: for Arthur Cayley – simultaneously with, but independently of, John Graves – jumped on Hamilton’s initial generalization of the 2-D Imaginaries to the 4-D Quaternions within weeks of its announcement, producing – by the method later streamlined into Leonard Dickson’s close-to-modern “cookie-cutter” procedure – the 8-D Octonions. The hope, voiced by no less than Gauss, had been that an infinity of new forms of Number were lurking out there, with wondrous properties just awaiting discovery, whose magical utility would more than compensate for the loss of things long taken for granted as their seekers ascended into higher dimensions. But such fantasies were quashed quite abruptly by Adolph Hurwitz’s proof, just a few years before the 20th Century loomed, that it only took four dimension-doublings past the Real Number Line to find trouble: the 16-D Sedenions had zero-divisors, which meant division algebra itself broke down, which meant researchers were so at a loss to find anything good to say about such Numbers that nobody bothered to even give their 32-D immediate successors a name, much less investigate them seriously.

∗Email address: rdemarrais@alum.mit.edu; http://arxiv.org/abs/0704.0026v3
But it is with these 32-D “Pathions” (short for “pathological,” which we’ll call them from now on) that our own account will pick up in this second part of our study of “placeholder substructures” (i.e., “zero divisors”). For, due to a phenomenon we dubbed carrybit overflow in the first installment, strange yet predictable things are found to be afoot in the ZD equivalent of a “Cayley Table.” As we’ll see shortly, this is a listing, in a square array, of the ZD “emanations” (or lack of same) of all ZD “elements” with each other – all, that is, sharing membership in an ensemble defined not by a shared “identity element,” but a common strut constant. What we’ll see is that the lacks are of the essence: for each doubling of N, the Emanation Table (ET) for the 2^(N+1)-ions of same strut-constant will contain that of its predecessor, leading to an infinite “boxes-within-boxes” deployment whose empty cells define, as N grows ever larger, an unmistakable fractal limit.

The full algorithmic analysis of such Matrioshka-doll-like “meta-fractal” aspects – by the simple rules of what we’ll call “recipe theory” (after the R, C, and P values related to the Row label, Column label, and their cell-specific Products in such Tables) – must await our third and last installment. But the colored-quilt-like graphics can be viewed by any interested readers at their leisure, in the Powerpoint slide-show online at Wolfram Science from our mid-June presentation at NKS 2006.[1] (The slide-show’s title is almost identical to that of this monograph, as this latter is meant to be the “theorem/proof” exposition of that iconic, hence largely intuitive and empirically driven narration.) What we’ll need to undertake this voyage is a quick reprise of the results from Part I [2].
As the hardest part (as a hundred years of denial would imply) is finding the right way to think about the phenomenology of zero-division, not understanding its basic workings once they’re hit upon, such a summary can be much more brief and easy to follow than the proofs required to produce and justify it. We need but grasp 3 rather simple things.

First, we must internalize the path and vertex structure of an Octahedron – for, properly annotated and storyboarded, this will provide us with the Box-Kite representation that completely catalogs ZDs in the 16-D arena where they first emerged (and, as we’ll see in our Roundabout Theorem herein, underwrites all higher-dimensional ZD emergences as well).

Second, instead of the cumbersome apparatus of CDP that one finds in algebra texts and the occasional software treatment, we offer two easy algebraic one-liners which (inspired by Dr. Seuss’s “Thing 1” and “Thing 2”), we simply call “Rule 1” and “Rule 2” – which operate, in almost Pythagorean earnest, on triplets of integers (indices of associative triplets among our Hypercomplex Units, as we’ll learn), and which, by so doing, accomplish everything the usual CDP tactics do, but without the all-too-frequent obfuscation. (There is also a very useful, albeit quite trivial, “Rule 0,” which merely states that any integer-triple serving to index an associative triplet for one power of N will continue to do so for all higher powers. What makes this useful is its allowing us to recursively take triplet “givens” for lower-level 2^N-ions than those of current interest and toss them into the central circle of the third thing we must grasp.)

We’ll need, that is, to be able to draw the simplest finite projective group’s 7-line, 7-node representation, the so-called PSL(2,7) triangle. The Rules, plus the Triangle, applied to Box-Kite edge-tracings and nodal indices, are all we’ll need.
Indeed, the Box-Kite itself can be readily derived from the Triangle, by suppressing the central node, and then recognizing four correspondences. First, see the Triangle’s 3 triple-noded sides – two vertices plus midpoint – as the sources of the Box-Kite’s trio of “filled-in” triangles dubbed Trefoil Sails. Second, link the 1 triple-noded circle (which is a projective line, after all), wrapped around the suppressed center and threading the midpoints, as the 4th such triangle, the quite special “Zigzag Sail.” Third, envision the 3 lines from midpoints to angles as underwriting the ZD-challenged part of the diagram (because ZDs housed at the midpoint node cannot mutually zero-divide any housed at the opposite, vertex, node), the struts (whence strut constants). Fourth and last, imagine the other four triangles of the Box-Kite (meeting, as with the first four, each to each, at corners only, like same-colored checkerboard squares) as the vents where the wind blows. They keep the kite afloat, letting the four prettily colored jib-shaped Sails show off, while the trio of wooden or plastic dowels that form the struts thanklessly provide the structural stability that makes the kite able to fly in the first place.

As Euclid knew well, 3 points determine a Triangle as well as a Circle – which is how we can glibly switch gears between representations based on these projective lines. But the easy convertibility of lines to circles is what projective means here – and is, as well, at the very heart of linking the above geometrical images to Imaginary Numbers. From Argand’s diagram to Riemann’s Sphere, this has been the essence of Complex geometry. On the latter image only, place a sphere on a flat tabletop, call the point of contact S (for “South”), and then direct rays from its polar opposite point N. Rays through the equator intersect the table in a circle whose radius we ascribe an absolute value of 1, with center S = 0.
This circle is just the trace of the usual e^(i·2π·θ) exponential-orbit equation, with the i in the exponent, of course, being the standard Imaginary. Any diameter through this circle, extended indefinitely in either direction, is clearly a “projective pencil” of a circular motion in the plane containing both it and N, and centered on the latter.

What each “line,” then, in the PSL(2,7) triangle represents is a coherent system interrelating 3 distinct imaginaries, one per nodal point: that is, a “Quaternion copy” sans the Reals (which latter, like our N,S polar axis in the above, must stand “outside” the Number Space itself, since 3-D visualization is all used up by the nodes’ dimensional requirements). Hence, the 7 lines are the 7 interconnected Quaternion copies which constitute the 8-D Octonions. And what makes this especially rich for our purposes is the built-in recursiveness of this Octonion-labeling scheme for higher-dimensional isomorphs, embedded in the sorts of ensembles we’ll be needing ETs to investigate more thoroughly.

To see how this relates to actual integers, take the prototype of the 7 lines in the Triangle, and consider the Quaternions strictly from the vantage of CDP’s Rule 1. The first task in studying any system of 2^N-ions is generating its units, so start with N = 0. Treat this singleton as the index of the Real axis: i_0, that is, is identically 1. Add a unit whose index = 2^0 = 1 and we have the complex plane. Now, add in a unit whose index is the next available power of 2 – with N = 1, this is 2 itself.
Call this unit and its index G for Generator, and declare this inductive rule: the index of the product of any two units is always the XOR of the indices of the units being multiplied; but, for any unit with index u < G, the product of said unit, written on the left (right), with the Generator written on its right (left), has index equal to their indices’ simple sum, and sign equal (opposite) to the product of the signs of their units’: i_1 · i_2 = +i_3, but i_2 · i_1 = −i_3. But this is just a standard way of summarizing Quaternion multiplication.

Now, set N = 2, making G = 4. Applying the same logic, but slightly generalized, we get three more triplets of indices. Dispensing with the tedious overhead of explicitly writing the indices as subscripts to explicit copies of the letter i, these are written in cyclical positive order (CPO) as follows: (1,4,5); (2,4,6); (3,4,7). (CPO is not mysterious: it just means read the triplet listing in left-right order, and so long as we multiply any unit with any such index by the unit whose index is to the right of it, the third term will result with signing as specified above: e.g., i_4 · i_5 = +i_1; i_4 · i_3 = −i_7.) We now have 4 of the Octonions’ 7 triplets, forming labels on the nodes of 4 of PSL(2,7)’s lines. Call the central circle spanning the medians the Rule 0 line (the Quaternions’ “starter kit” we just fed into our Rule 1 induction machine). Putting G = 4 in the center, the 3 lines through it are our Rule 1 triplets. If we further array the Quaternion index-set (1,2,3) in clockwise order around the 4, starting from the left slope’s midpoint at 10 o’clock, these lines are all oriented pointing into the angles.

Now, with “Rule 2,” let’s construct the lines along the Triangle’s sides. Here’s all that Rule 2 says: given an associative index-triplet (henceforth, trip) like the Quaternions’ (1,2,3), fix any one among them, then take its two CPO successors and add G to them.
Swap the order of the resulting two new units, and you have a new trip. Hence, fixing 1, 2, and 3 in turn, in that order, Rule 2 gives us these 3 triplets: (1,7,6); (2,5,7); (3,6,5). If you’ve drawn PSL(2,7) with the Octonion labels per the instructions in the last paragraph, you’ve already seen these 3 trips are the answers ... and now you know how and why they’re oriented, too. (Clockwise, in parallel with the Rule 0 circle.)

We’ve now laid out all the ingredients we need to do a basic run-through of Box-Kite properties. We’ll merely state and describe them, rather than prove them (but we’ll give the Roman numerals of the theorem numbers from last installment, for those who want to follow them). The first feature in need of elucidating, which should have those who’ve been reading attentively scratching their heads just about now, is this: the relations between the indices at the nodes of PSL(2,7) qua Octonion labeling scheme are clear enough; but how can these same labeled nodes serve to underwrite the 16-D Sedenion framework that Box-Kites reside in?

The answer has two parts. First part: since all Imaginaries have negative Reals as squares, Imaginaries whose products are zero must have different indices – meaning that the simple case (which we call “primitive” ZDs) will always involve products of pairs of differently-indexed units, whose respective planes share no points other than 0 [IV]. Second part: given any such ZD dyad, neither index can ever equal G [II]; and, one must have index > G, while the other has index < G [I, III]. The Octonion labeling scheme maps to the four Sails of a legitimate Sedenion Box-Kite [V], because it only provides the low-index labels at each of the 6 Octahedral vertices. The 4 in the center of our example, meanwhile, is no longer the G for this setup, since that role is now played by 8 (the next power of 2 in the CDP induction).
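The Rule 0, Rule 1 and Rule 2 constructions described above are mechanical enough to sketch in a few lines of Python. The function name `cpo_trips` is hypothetical, and applying Rule 2 to every inherited trip (not only to the Quaternion seed) is an assumption made for illustration; for the Octonions the output reproduces the seven trips listed in the text.

```python
def cpo_trips(n):
    """CPO index triplets for the 2^n-ions (n >= 2), built by Rules 0, 1 and 2."""
    trips = [(1, 2, 3)]                  # Quaternion seed
    for k in range(2, n):
        g = 1 << k                       # generator G = 2^k for this doubling
        inherited = list(trips)          # Rule 0: lower-level trips carry over
        # Rule 1: (u, G, u + G) for every unit index u < G
        trips += [(u, g, u + g) for u in range(1, g)]
        # Rule 2: fix one index, add G to its two CPO successors, swap them
        for a, b, c in inherited:
            for x, y, z in ((a, b, c), (b, c, a), (c, a, b)):
                trips.append((x, z + g, y + g))
    return trips


octonion_trips = cpo_trips(3)
sedenion_trips = cpo_trips(4)
print(octonion_trips)
print(len(sedenion_trips))
```

Every trip produced this way has indices that XOR to zero, which matches the rule that the index of a product is the XOR of the indices being multiplied.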
In the context of the Box-Kite scheme, that central 4 is now represented by a different letter: S, for strut constant – the only Octonion index not on a Box-Kite vertex. Which is why, from one vantage, there are 7 distinct (but isomorphic) Box-Kites in Sedenion space: because we’ve 7 choices of which Octonion to suppress! 6 vertices times 7 gives us the 42 Assessors of our first ZD paper [3], a term we’ll use interchangeably with dyad throughout. We can, in fact, tug on the network of interconnected lines “wok-cooking” style, stirring things into and out of the hot oil in the center of the Box-Kite. (S as “Stir-fry constant”?) To find the “Octonion copy” labeling low indices on Box-Kite vertices where the 5, say, is suppressed, trace the line containing it and the 4, and “rotate”: the 1 now goes from the left slope’s midpoint to the bottom right angle, to be replaced by the 4 while the 5 heads for the middle, with CPO order (and hence, orientation of the line) remaining unchanged. Of the other 2 trips the 5 belongs to, only one will preserve midpoint-to-angle orientation along the 6 o’clock-to-midnight vertical: (2,5,7), as one can check in an instant. (The two possibilities must orient oppositely when placed along the same line, since one is Rule 1, the other Rule 2.) From this point, everything is forced. This is obviously a procedure that is trivial to automate, for any “Octonion copy,” regardless of the ambient dimensionality the Box-Kite it underwrites might float in. This simple insight will be the basis, in fact, of our proof method, both in this paper and its sequel.

Another simple insight will tell us how to find the high-index term for any vertex’s dyad. Two indices per vertex leaves 4 that are suppressed: 0 (for the Reals), G and S, and the XOR (and also simple sum) of the latter two, which we’ll shorthand X. These four clearly form a Quaternion copy – one, in fact, which has no involvement whatsoever in its containing Box-Kite’s zero-divisions.
Putting the index of the one among these which is itself an L-unit center stage gives us the full array of L-index sets (trips composed of those indices of a Sail’s 3 vertices […] 8 and not a power of 2. Correlated with such ZD-free structures are “Type II” box-kites with S < 8 (or, more generally, < G/2), indistinguishable from the standard “Type I” variety but for strut orientations (with exactly 2 of a “Type II”’s 3 struts always being reversed: see Appendix B). Their “twist products” (operating similarly on parallel sides of each of the 3 orthogonal squares or “catamarans” of a box-kite’s orthogonal wire-frame, as opposed to the 4 triangular “sails” which are our sole focus in this monograph) let them act as middlemen between the normal and ZD-free structures. Our arguments here will make no use of such “twist product” subtleties (on which, see Theorem 6 in Part I and the caveat that follows it, and the more developed remarks and diagrams in [4]). Indeed, their phenomenology falls “under the radar” of our Sail-based analysis: strut-opposite Assessors, after all, do not mutually zero-divide. Given our limited purposes here, therefore, our toolkit, once the Viziers are dropped in it, is complete for all our later proofs. (We must simply remember that invocations of VZ1 and VZ3 implicitly concern sign-free relations between Vent and Zigzag terms – that is, indices of XOR products only.)

What’s left to do still: get our hands messy with the plumbing, and then clean up with a last grand construct. Let’s start with the plumbing, and add some notation. Label the Zigzag dyads with the letters A, B, C; label their strut-opposite terms in the Vent F, E, D respectively. Specify the diagonal lines containing all and only ZDs in any such dyad K as (K, /) and (K, \) – for c · (i_K + i_k) and c · (i_K − i_k) respectively, c an arbitrary real scalar.
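As a minimal sketch of the index bookkeeping behind these dyads, the snippet below lists the (L-index, U-index) pair housed at each vertex for a given G and S. The formula U = L xor X is an assumption, inferred from the four suppressed indices 0, G, S, X and from the dyads (1, 13) and (2, 14) used in this section's S = 4 example; the strut-opposite pairing by XOR with S follows the ET conventions of Section 2. The helper names `assessors` and `strut_opposite` are hypothetical.

```python
def assessors(g, s):
    """(L-index, U-index) pairs for the Box-Kite with generator g and strut constant s."""
    x = g ^ s                                    # X = G xor S (equal to G + S when S < G)
    lows = [l for l in range(1, g) if l != s]    # every low index except the suppressed S
    return [(l, l ^ x) for l in lows]            # assumed pairing: U = L xor X


def strut_opposite(l, s):
    """L-index of the Assessor at the other end of the strut through l."""
    return l ^ s


print(assessors(8, 4))
print(sum(len(assessors(8, s)) for s in range(1, 8)))   # Assessors over all 7 Sedenion Box-Kites
```

With G = 8 and S = 4, the Zigzag dyads A, B, C come out as (1, 13), (2, 14), (3, 15), and their strut opposites F, E, D carry L-indices 5, 6, 7.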
The twelve edges of the octahedral grid are so many pipes, through which course the two-way streets of edge-currents: for the 3 edges of the Zigzag (and the 3 defining the opposite Vent), currents joining arbitrary vertices M and N are called negative, since they have this form:

    (M, /) · (N, \) = (M, \) · (N, /) = 0

Tracing the perimeter of the Zigzag with one’s finger, performing ZD products in natural sequence – (A, /) · (B, \), followed by the latter times (C, /), then this times (A, \) and so forth – one should quickly see how the Zigzag’s name was suggested. Suppressing all letters, one is left with just this cyclically repeating sequence: /\/\/\.

Currents along all 6 edges joining Zigzag and Vent dyads, on the contrary, connect similarly sloping diagonals, hence are called positive, yielding the shorthand sequence ///\\\ for Trefoil sail traversals:

    (Z, /) · (V, /) = (Z, \) · (V, \) = 0

Consider the chain of ZD multiplications one can make along the Zigzag, between A and B, then B and C, then C and A, for S = 4. The first term of this 6-cycle of zero products, once fully expanded, is writable thus:

    (A, /) · (B, \) = (i_1 + i_13) · (i_2 − i_14) = (i_3 − i_15 + i_15 − i_3) = (C, /) − (C, /) = (C, \) − (C, \) = 0

We can readily see here where the notion of emanation arises: traversing the edge between any two vertices in a Sail yields a balance-pan pairing of oppositely signed instances of the terms at the Sail’s third vertex ... the 0 being, then, an instance of “balanced bookkeeping” (whence the term “Assessor,” our synonym for “dyad”). This suggests the spontaneous emanation of particle/anti-particle pairings from the quantum vacuum, rather than true “emptiness.”

Finally, a side-effect of such “Sail dynamics” is this astonishing phenomenon: each Sail is an interlacing of 4 associative triplets.
For the Zigzag, these are the L-index (a,b,c), plus the 3 U-index trips obtained by replacing all but one of these lowercase letters with their uppercase partners: ergo, (a,B,C); (A,b,C); (A,B,c). Ultimately this tells us that ZDs are extreme preservers of order, since they maintain associativity in rigorous lock-step patterns, for all $2^N$-ions, no matter how close to ∞ their N might become. Put another way, the century-long aversion reaction experienced by virtually all mathematicians faced with zero-divisors was profoundly misguided.

2 Emanation Tables: Conventions for Construction

Theorem 7 guaranteed the simple structure of ETs: because any Assessor’s uppercase index $i_U$ is strictly determined by G and S, once we are given these two values, the table need only track interactions among the lowercase indices $i_L$. This will only lead to ambiguities in the very place these are meaningful: in the recursive articulation of a boxes-within-boxes tabulation of meta-fractal or Sky behaviors. In such cases, the overlaying will be as rich in significance as the multiplicity of sheets of a Riemann surface in complex analysis.

An ET does for ZD interactivity what a Cayley Table does for abstract groups: it makes things visible we otherwise could not see – and in a similar way. Each Assessor’s L-index is entered (in a manner we’ll soon specify) as a row (R) or column (C) value, with XOR products (P values) among them being placed in the “spreadsheet cell” (r,c) uniquely fixed by R and C. We’ve noted such values only get entered if P is the L-index of a legitimate emanation: that is, the Assessor it represents mutually zero-divides (forms DMZs with, for “divisors making zero”) both the Assessors represented by the R and C labels of its cell. (As already suggested, the natural use of the letters R, C, P here inspired calling the study of NKS-like “simple rules” for cooking fractals from their bit-strings recipe theory.)
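The index bookkeeping just described can be sketched at the XOR level. The following is a minimal illustration only, not the author's code: the function names are hypothetical, and it assumes (as the text's later expressions such as G + (a ⊻ S) suggest) that an Assessor's U-index is its L-index XORed with X = G + S. It tracks indices alone; the signs of products, which depend on the CDP rules, are not modeled.

```python
def u_index(l, G, S):
    """U-index paired with L-index l, assumed to be l XOR (G + S)."""
    return l ^ (G + S)


def sail_trips(l_trip, G, S):
    """The four interlaced trips of a Sail: the L-trip (a,b,c) plus the three
    U-trips obtained by upper-casing all but one of its letters."""
    a, b, c = l_trip
    A, B, C = (u_index(x, G, S) for x in l_trip)
    return [(a, b, c), (a, B, C), (A, b, C), (A, B, c)]


def is_xor_trip(trip):
    """Index-level check only: the third index is the XOR of the first two."""
    return trip[0] ^ trip[1] == trip[2]


def cell_value(r, c):
    """P value an ET would place in cell (R, C), if that cell is filled."""
    return r ^ c


# The S = 4 Sedenion Zigzag used in the expansion of (A,/)·(B,\): G = 8.
trips = sail_trips((1, 2, 3), G=8, S=4)
print(trips)
print(all(is_xor_trip(t) for t in trips))
```

With these assumptions the Assessors A, B, C come out as (1,13), (2,14), (3,15), matching the units that appear in the worked product for S = 4.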
Four conventions are used in building ETs: first, their labeling scheme obeys the same nested-parentheses ordering we’ve already used in designating Assessors A through F, with D, E, F the strut opposites of A, B, C in reverse of the order just written. The L-indices, then, are entered as labels running across the top and down the left. The label of the lowest L-index is placed flush left (abutting the ceiling), with the corresponding label of its strut opposite being entered flush right (atop the floor). As there will always be G− 2 (hence, an even number of) indices to enter, repeating this procedure after each pair has been copied to horizontal and vertical labels will completely exhaust them all. Second convention: As the point of an ET is to display all legitimate DMZs, any cell whose R and C do not mutually zero-divide is left blank – even if, in fact, there is a well-defined XOR value. Hence, if R and C reference the same Assessor, the XOR of their L-indices will be 0; if they reference strut opposites, the XOR will be S. But in both cases, the cell (hence, the P value) is left blank. All “normal” ETs, then, will have both long diagonals populated by blank cells, while all other cells are filled. Third convention: the two ZD diagonals associated with any Assessor are not distinguished in the ET, although various protocols are possible that would make doing so easy. The reasons are parsimony and redundancy: rather than create longer, or twice as many, entries, we assume both entries for the same Box-Kite edge will contain the positive-sloping diagonal when the lower L-index appears as the row label, else the negative-sloping diagonal when the higher L-index appears first instead. Such niceties won’t concern us much here: the key thing is that, in fact, all 24 filled cells of a Box-Kite’s ET entries can be mapped one-to-one to its ZD diagonals. 
Recall, per Theorem 3, that both ZD diagonals of an Assessor form DMZs with the same Assessor, according to the same edge-sign logic. This leads us to the . . .

Fourth convention: Although they are superfluous for many purposes, edge signs provide critical information for others, and so are indicated in all ETs provided here. Each of a Box-Kite’s 12 edges conducts two currents – one per ZD diagonal – and does so according to one or the other orientational option. ZD diagonals are conventionally inscribed so that the horizontal axis of their Assessor plane is the L-indexed unit, while the vertical is the U-indexed unit. But even if this convention were reversed, the diagonal leading from lower left quadrant to upper right would still correspond to the state of synchrony implied by $\pm k(i_L + i_U)$: for some Assessor U, we write (U,/). Conversely, the orthogonal diagonal indicative of anti-synchrony is written (U,\). If DMZs formed by the Assessors bounding an edge are both of same kind, then we call the edge blue or notate it [+]; if Assessors U and V only form DMZs from oppositely oriented ZD diagonals – (U,/) · (V,\) = 0 ⇔ (U,\) · (V,/) = 0 – then we call the edge red or notate it [-]. However, for ET purposes, since the red edges are the most informative (all-red-edged Zigzags providing the stable basis of Box-Kite structure, while all-red-edged DEF Vents play a key role in twist-product interpreting – a deep topic touched upon in Part I, which won’t concern us further here), we leave them unmarked. The six blue edges bounding the hexagonal view of the Box-Kite, however, are preceded by an extra mark (best interpreted as a dash, rather than a minus sign). This has the pragmatic advantage that when zoomed, a large ET will have its entries with an extra mark become unreadable in many software systems (e.g., one sees only asterisks) – and so we want the unmarked entries to be those likely to be of most interest.
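The four conventions can be combined into a small table builder. This is a hedged sketch rather than the author's implementation: the function names are hypothetical, the Zigzag L-trip must be supplied by hand, and the code assumes a "normal" single-Box-Kite ET in which every cell off the long diagonals is filled. For illustration it uses the Sedenion values G = 8, S = 1 with Zigzag L-trip (3,6,5).

```python
def et_labels(G, S):
    """First convention: the lowest unused L-index goes leftmost, its strut
    opposite (index XOR S) goes rightmost, and the procedure repeats inward."""
    remaining = [i for i in range(1, G) if i != S]
    left, right = [], []
    while remaining:
        lo = remaining.pop(0)
        opp = lo ^ S
        remaining.remove(opp)
        left.append(lo)
        right.insert(0, opp)
    return left + right


def emanation_table(G, S, zigzag):
    """Build rows of cell strings for a single-Box-Kite ET (an assumption)."""
    labels = et_labels(G, S)
    zig = set(zigzag)
    rows = []
    for r in labels:
        row = []
        for c in labels:
            p = r ^ c
            if p == 0 or p == S:
                # Second convention: same Assessor or strut opposites stay blank.
                row.append("")
            elif (r in zig) != (c in zig):
                # Fourth convention: blue (Zigzag-to-Vent) edges carry a dash.
                row.append("-" + str(p))
            else:
                # Red edges (Zigzag-Zigzag or Vent-Vent) are left unmarked.
                row.append(str(p))
        rows.append(row)
    return labels, rows


labels, rows = emanation_table(G=8, S=1, zigzag=(3, 6, 5))
print("    " + " ".join(f"{c:>3}" for c in labels))
for r, row in zip(labels, rows):
    print(f"{r:>3} " + " ".join(f"{cell:>3}" for cell in row))
```

The third convention (not distinguishing the two ZD diagonals of an Assessor) is reflected simply by storing one entry per cell.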
Since, given X (or, alternatively, G or N, and S), we can reconstruct a Box-Kite from just its Zigzag’s L-index trip, gleaning this information from an ET is worth explaining. If a given row contains the indices of any such Zigzag L-trips, they will appear as the row label itself, plus two unmarked cell entries, with the column label of the one appearing as the content of the other. (If either cell in such a complementary set be marked with a dash, then we are dealing with a DEF Vent index.) Each Zigzag L-trip will also appear 3 times in an ET, once in each row whose label is one of its indices, its 2 non-label indices appearing in un-dashed cell entries each time.

Here is a readily interpreted emanation table. Having $6 = 2^3 - 2$ rows and columns, G = 8, so N = 4, making this a Sedenion ET (encoding, thereby, a single Box-Kite). And, since 2 ⊻ 3 = 4 ⊻ 5 = 6 ⊻ 7 = 1, the Strut Constant S = 1 as well. A scan of the first row shows 6 and 5 unmarked, under headings 4 and 7 respectively; however, these two labels appear as cell values which are marked, making these edges that connect Assessors in the D, E, F Vent. In the third row of entries, though, column labels 5 and 3 contain cell values 3 and 5 respectively, both unmarked. With their row label 6, then, these form the Zigzag L-index set (3,6,5), which hence must map to Assessors (A,B,C). Using the mirror-opposite logic of the labeling scheme to determine strut opposites, it is clear that the six row and column headings (2,4,6,7,5,3) correspond, in that order, to the Assessors (F,D,B,E,C,A). (The unmarked contents 6 and 5 in the first row, having labels (2,4) and (2,7), thereby map to edges FD and FE, connecting DEF Vent Assessors as claimed.) Finally, the long diagonals are all empty: those cells in the diagonal beginning at the upper left all have identical row and column labels; those in the mirror-opposite slots, meanwhile, have labels which are strut-opposites.
By our second convention, all these cells are left blank.

\[
\begin{array}{r|rrrrrr}
  & 2 & 4 & 6 & 7 & 5 & 3 \\ \hline
2 &    & 6  & -4 & 5  & -7 &    \\
4 & 6  &    & -2 & 3  &    & -7 \\
6 & -4 & -2 &    &    & 3  & 5  \\
7 & 5  & 3  &    &    & -2 & -4 \\
5 & -7 &    & 3  & -2 &    & 6  \\
3 &    & -7 & 5  & -4 & 6  &
\end{array}
\]

Before beginning an in-depth study of emanation tables by type, there is one general result that applies to them all – and whose proof will give us the chance to put the Three Viziers to good use. While seemingly quite concrete, we will use it in roundabout ways to simplify some otherwise quite complicated arguments, beginning with next section’s Theorem 9. This Roundabout Theorem is our

Theorem 8. The number of filled cells in any emanation table is a multiple of 24.

Proof. Since 24 is the number of filled cells in a Sedenion Box-Kite, this is equivalent to claiming that CDP zero-divisors come in clusters no smaller than Box-Kites. We have already seen, in Theorem 5, that the existence of a DMZ implies the 3-Assessor system of a Sail, which further (as Theorem 7 spelled out) entails a system of 4 interlocking trips: the Sail’s L-trip, plus 3 trips comprising each L-trip index plus the U-indices of its Assessor’s 2 “sailing partners.” Since we have an ET, we have a fixed S and fixed G. Hence, if we suppose our DMZ corresponds to a Zigzag edge-current, we immediately can derive its L-trip by Theorem 5, and all 3 Zigzag strut-opposites’ L-indices by VZ 1, and all 6 U-indices by VZ 3. We then can test whether the Trefoil Sails’ edge-currents are all DMZs as follows. As we wrote in Theorem 7, (u,v,w) maps to the Zigzag L-trip in CPO, but not necessarily in (a,b,c), order: hence, $(u_{opp}, w_{opp}, v)$ is an L-trip, and can be mapped to any of the Trefoils. In other words, given the Zigzag’s 3-fold rotational symmetry, proving the truth of the following arithmetical result proves the DMZ status of all Trefoil edges. Yet we can avail ourselves of all 3 Zigzag U-trips in proving it.

\[
\begin{array}{r}
(w_{opp} - W_{opp}) \\
(u_{opp} + U_{opp}) \\ \hline
-V \; - \; v \\
+v \; + \; V
\end{array}
\]

The left bottom result is a given of the trip we started with.
The result to its right is a three-step deduction from one of the Zigzag U-trips: use $(u_{opp}, w, v_{opp})$; Rule 2 gives $(u_{opp}, v_{opp}+G, w+G)$; the Second Vizier tells us this is $(u_{opp}, V, W_{opp})$; but the negative inner sign on the upper dyad reverses the sign this trip implies, yielding +V for the answer. The top results are derived similarly: find which of the 4 Zigzag trips underwrites the Vizier-derived “harmonic” which contains the pair of terms being multiplied, and flip signs as necessary. Hence, the top left uses $(u, w_{opp}, v_{opp})$, then applies Rule 2 and the Second Vizier to get (−V), while the top right uses the Zigzag L-trip itself: $(u,v,w) \to (w+G, v, u+G) \to (W_{opp}, v, U_{opp})$ – which, multiplied by (−1), yields (−v). ∎

Remark. The implication that, regardless of how large N grows, ZDs only increase in their interconnectedness, rather than see their basic structures atrophy, flies in the face of a century’s intuition based on the Hurwitz Proof. That there are no standalone edge-currents, nor even standalone Sails, bespeaks an astonishing (and hitherto quite unsuspected) stability in the realm of ZDs.

Corollary. An easy calculation makes it clear that the maximum number of filled cells in any ET for any $2^N$-ions is just the square of a row or column’s length in cells, minus twice the same number (to remove all the blanks in long diagonals): that is,

\[
(2^{N-1}-2)(2^{N-1}-2) - 2 \cdot 2^{N-1} + 4 = 2^{2N-2} - 6 \cdot 2^{N-1} + 8 = (2^{N-1}-4)(2^{N-1}-2) = 4 \cdot (2^{N-2}-1)(2^{N-2}-2).
\]

By Roundabout, we now know this number is divisible by 24, hence indicates an integer number of Box-Kites. But two dozen into this number is just $(2^{N-2}-1)(2^{N-2}-2)/6$ – the trip count for the $2^{N-2}$-ions! (See Section 2 of Part I.) We have, then, the very important Trip-Count Two-Step: The maximum number of Box-Kites that can fill a $2^N$-ion ET = $\mathrm{Trip}_{N-2}$. We will see just how important this corollary is next section.
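The arithmetic behind the Trip-Count Two-Step is easy to check numerically. The sketch below uses hypothetical function names and only the two formulas stated in the Corollary; it confirms the divisibility by 24 for a range of N but says nothing about which cells are actually filled.

```python
def trip_count(n):
    """Trip_n = (2^n - 1)(2^n - 2) / 6, the trip count quoted for the 2^n-ions."""
    return (2**n - 1) * (2**n - 2) // 6


def max_filled_cells(N):
    """Square of the ET's side length, minus twice that length for the two
    blank long diagonals; the side has 2^(N-1) - 2 labels."""
    side = 2 ** (N - 1) - 2
    return side * side - 2 * side


for N in range(4, 9):
    cells = max_filled_cells(N)
    # Each Box-Kite accounts for 24 filled cells.
    print(N, cells, cells // 24, trip_count(N - 2))
```

For N = 4, 5, 6 the Box-Kite counts come out as 1, 7 and 35, the figures the text cites for the Sedenions, Pathions and 64-D case.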
3 ETs for N > 4 and S ≤ 7

One of the immediate corollaries of our CDP Rules for creating new triplets from old ones is something we might call the Zero-Padding Lemma: if two k-bit-long bitstring representations of two integers R and C being XORed are stuffed with the same number n of 0s between bits j and j+1, 0 ≤ j ≤ k, their XOR will, but for the extra n bits of 0s in the same positions, be unchanged – and so will the sign of the product P of CDP-derived imaginary units with these three bit-strings representing their respective indices.

Examples.
(1,2,3) → (2,4,6) → (4,8,12) [Add 1, then 2, 0s to the right of each bitstring]
(1,2,3) → (1,4,5) → (1,8,9) [Add 1, then 2, 0s just before the rightmost bit in each bitstring]
(3,4,7) → (3,8,11) → (3,16,19) [Add 1, then 2, 0s just after the leftmost bit in each bitstring]

Proof. Rule 1 will create a new unit of index G+L from any unit of index L < G, regardless of what power of 2 G might be. Rule 2, meanwhile, uses any power of 2 which exceeds all indices of the trip it would operate on, then adds this G to two of the members of the trip, creating a new trip with reversed orientation – one of an infinite series of such, differing only in the power of 2 (hence, position of the leftmost bit) used to construct them. The lemma, then, is an obvious restatement of the fundamental implications of the CDP Rules.

But creation of U-indices associated with L-indices in Assessor dyads is the direct result of creating new triplets with G+S as their middle term. Hence, if we call the current generator g and that of the next higher $2^N$-ions G (= 2 · g), then if Assessors with L-indices u and v form DMZs in the Sedenions for a given strut constant S, their U-indices will increment by g in the Pathions, and zero division will remain unaffected. By induction, the emanation table contents of the Sedenion (R,C,P) entries will remain unchanged for all N, for all fixed S ≤ 7. This leads us to

Theorem 9.
All non-long-diagonal cell entries in all ETs for all N, for all fixed S ≤ 7, will be filled.

Proof. Keeping the same notation, the $2^N$-ions will have g more Assessors than their predecessors, with indices ranging from g itself to 2g−1 (= G−1). Consider first some arbitrary Zigzag Assessor with L-index z < g, whose U-index is G + z·S. (If it were a Vent Assessor, or a Zigzag on a reversed-orientation strut in a “Type II” box-kite, the second part of the expression would be reversed: S·z, per the First Vizier. This affects triplet orientation, but not absolute value of the index, however, and it is only the latter which matters at the moment.) Now consider the Assessor whose L-index is the lowest of those new to the $2^N$-ions, g. We know it is a Vent Assessor, in all Box-Kites with S < g, of which there are 7 per each such S in the Pathions, 35 in the 64-D $2^6$-ions, and so on: for it belongs to the trip (S, g, g+S) (Rule 1), so that its U-index appears on its immediate left in the triplet (G+g+S, g, G+S) (Rule 2 and last parentheses). Its U-index, then, is G + (g ⊻ S), or (recall Rule 1) just G+g+S. We claim these Assessors form DMZs; or, writing out the arithmetic, that the following term-by-term multiplication is true:

\[
\begin{array}{r}
+g + (G+g+S) \\
+z + (G+z \cdot S) \\ \hline
-(G+g+z \cdot S) - (z+g) \\
+(z+g) + (G+g+z \cdot S)
\end{array}
\]

Because one Assessor is assumed a Zigzag, while the other is proven a Vent, the inner signs will be the same. (Simple sign reversals, akin to those involving our frequently invoked binary variable sg, will let us generalize our proof to include the Vent-times-Vent case later.) Let’s examine the terms one at a time, starting with the bottom line. Its left term is an obvious application of Rule 1, as z < g, the latter being the Generator of the prior CDP level which also contained z as an L-index. The term on bottom right we derive as follows: we know that z and its U-index partner in the $2^{N-1}$-ions belong to the triplet mediated by g+S: (z, g+z·S, g+S).
Supplementing this CPO expression by adding G to the right-hand terms (Rule 2), we get the triplet containing both multiplicands of the bottom-right quantity: (z, G+g+S, G+g+z·S). The multiplicands appear in this trip in their order of application in forming the product; therefore, their resultant is a plus-signed copy of the trip’s third term, as shown above.

Moving to the left-hand term of the top line, what trip do the multiplicands belong to? Within the prior generation, Rule 1 tells us that z’s strut opposite, z·S, multiplies g on the left to yield g+z·S. Application of Rule 2 to the terms ≠ g reverses order and gives us this: (G+g+z·S, g, G+z·S). But what we’ve written above is the product of multiplying the third and second terms of the trip together, in CPO-reversed order; hence, the negative sign is correct. Finally, we get the negative of (z+g) by similar tactics: the term is the U-index of z’s strut-opposite Assessor in the prior CDP generation, hence belongs to the trip with this CPO expression: (g+S, z+g, z·S). Rule 2 gives us (G+g+S, G+z·S, z+g). Hence, the product written above is properly signed.

Now, what effect does our initial assumption that z is the L-index of a Zigzag Assessor have on the argument? The lower-left term is obviously unaffected. But the upper-left term, perhaps less obviously, also is unchanged: while it seems to depend on z·S, in fact this is only used to define the L-index of z’s strut opposite, which multiplies g on the left to precisely the same effect as z itself, both being less than it. The two terms on the right, just as clearly, do have their signs changed, for in both, the order relations of L- and U-indices vis à vis G+S or X are necessarily invoked. But both signs on the right can be re-reversed to obtain the desired result if we change the inner sign of the topmost expression – which is to say, we have an effect analogous to that achieved in earlier arguments by use of the binary variable sg, as claimed.
Since one CDP level’s G is the g of the next level up, the above demonstration clearly obtains, by the obvious induction, for all $2^N$-ions including and beyond the Pathions. But what if one or both L-indices in a candidate DMZ pairing exceed g? Rather than answer directly, we use the Roundabout Theorem of last section. Given a DMZ involving Assessors with L-indices u < g and g, we are assured a full Box-Kite exists with a Trefoil L-trip (u, g, g+u). The remaining Assessors, being their strut opposites, then have L-indices $u_{opp}$, g+S, and g+u·S. As u varies from 1 to 7, skipping S < 8, zero-padding assures us that all DMZs from prior CDP generations exist for higher N, for all L-indices u,v < 8. Only those Box-Kites created by zero-padding from prior-generation Box-Kites (of which there can be but 1 inherited per fixed S among the 7 found in the Pathions, for instance) will have all L-indices < g. For all others, the model shown with those having g as an L-index must obtain. Hence, only one strut will have L-indices < g, the rest being comprised of some w with L-index ≥ 8, the others deriving their L-indices from the XOR of w with the strut just mentioned, or with S.

But what will guarantee that any edge-currents will exist between arbitrary Assessors with L-indices u < g and g+k, 0 < k < g, since there is not even one DMZ to be found among Assessors with L-indices ≤ g in the candidate Box-Kite they would share? We can now narrow the focus of our original question considerably, by making use of the curious computational fact we called the Trip-Count Two-Step. In Part I’s preliminary arguments concerning CDP, we showed that the number of associative triplets in a given generation of $2^N$-ions, or $\mathrm{Trip}_N$, can be derived from a simple combinatoric formula. Call the count of complete Box-Kites in an ET $BK_{N,S}$. For S < 8, $BK_{N,S} = \mathrm{Trip}_{N-2}$, provided all L-indices g+k, 0 < k < g, form DMZs in the candidate Box-Kites implied.
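The zero-padding invoked above can be illustrated at the level of XOR indices. This is a minimal sketch with hypothetical helper names; it checks only that padding preserves the XOR relation among (R, C, P), and does not model the sign-preservation half of the Zero-Padding Lemma, which rests on the CDP Rules.

```python
def zero_pad(x, j, n):
    """Insert n zero bits into x immediately above its j lowest bits."""
    low = x & ((1 << j) - 1)   # the j lowest bits, left in place
    high = x >> j              # everything above them, to be shifted up
    return (high << (j + n)) | low


def pad_trip(trip, j, n):
    """Apply the same padding to all three indices of a trip."""
    return tuple(zero_pad(x, j, n) for x in trip)


# The lemma's three example families: pad at the right end (j = 0),
# just before the rightmost bit (j = 1), and, for these 3-bit strings,
# just after the leftmost bit (j = 2).
for trip, j in [((1, 2, 3), 0), ((1, 2, 3), 1), ((3, 4, 7), 2)]:
    for n in (1, 2):
        r, c, p = pad_trip(trip, j, n)
        print(trip, "->", (r, c, p), "XOR preserved:", r ^ c == p)
```

Because XOR acts bit by bit, inserting the same run of zeros into all three bitstrings cannot disturb the relation R ⊻ C = P, which is the index-level content of the lemma.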
To begin an induction, let us consider a new construction along familiar lines, which will provide us an easy way to comprehend the Pathion trip-systems of all S < 8. Beginning with N = 5, we designate $\mathrm{Trip}_{N-2}$ trips for each S < 8 as type Rule 0, in the manner the singleton $2^2$-ion trip (1,2,3) was used in our introduction’s “wok-cooking” discussion (which Part I, Section 5, used as the basis of its “slipcover proofs”). But now, instead of putting the Octonions’ G = 4 in the center of the PSL(2,7) triangle, we put the Sedenions’ 8.

For consistency of examples, we continue to assume S = 1, so we’ll begin with (3,6,5), the Zigzag L-trip for S = 1 in the Sedenions, and also, by zero-padding, an L-trip Zigzag for 1 of the 7 Box-Kites with S = 1 among the Pathions. Extending rays from the (3,6,5) midpoints through the center creates Rule 1 trips which end in 11, 14, 13: (a,b,c) get sent to (F,E,D) respectively. The Rule 2 trips along the sides, in order of Zigzag L-index inclusion, then correspond to Trefoil U-trips, all oriented clockwise. They read symbolically (literally) as follows: EaD (14,3,13); DbF (13,6,11); FcE (11,5,14). We claim each of these 7 lines, when its nodes are attached to their strut opposites, maps 1-to-1 to an S = 1 Pathion Box-Kite. We have this as a given for the Rule 0 trip; we need to explain this for the Rule 1 trips (which Roundabout already tells us are Box-Kites); and, we need to prove it for the Rule 2 trips that make the sides. (And, once we do prove it, and frame the suitable induction for all higher N, the task which originally motivated us will be done: for these U-trips house the Assessors with L-indices > g, whose candidate Box-Kites don’t include g.)

The Rule 1 trips, in all instances within this example, correspond to Assessor L-indices (a,d,e). With g = 8 at d, the Third Vizier tells us c = 8 + S = Sedenion X. (a,b,c) thereby reads, within the Sedenions, as (a,A,X).
But in the Pathions, all 3 terms are less than G, hence can comprise an L-index trip for a Sail – and specifically, a Zigzag (else the order of A and X would be reversed). Similarly, the old Sedenion (f, F) are the new Pathion (f, e), with the new trip (f, c, e) being the Third Vizier’s way of saying (f, X, F) from the Sedenions’ vantage.

For the Rule 2 trips, we prove one relation in one of them a DMZ, which Roundabout tells us implies the whole Box-Kite, while symmetry allows us to assume the same of the other two. Consider, then, the aDE Trefoil U-trip, instantiated by (3,13,14) in our example; specifically, compute the product of the Assessors containing a and D = c+g as L-indices. Their U-indices within the Pathions must be (G + a ⊻ S) = (G + f), and (G + g + c ⊻ S) = (G + g + d) respectively. We write their dyads when multiplying with opposite inner signs, as we assume their DMZ is an edge in a Zigzag. We claim the truth of this arithmetic:

\[
\begin{array}{r}
+(c+g) - (G+g+d) \\
+a + (G+f) \\ \hline
+(G+g+e) - (b+g) \\
+(b+g) - (G+g+e)
\end{array}
\]

Bottom left: (a,b,c) → (a, c+g, b+g) (Rule 2, with N = 4.)
Bottom right: (a,d,e) → (a, g+e, g+d) → (a, G+g+d, G+g+e) (Rule 2 twice, N = 4, then N = 5.) Upper dyad’s inner sign reverses that of product.
Top left: (f,c,e) → (e+g, c+g, f) → (G+f, c+g, G+e+g) (Rule 2 twice, N = 4, then N = 5.)
Top right: (f,d,b) → (b+g, d+g, f) → (b+g, G+f, G+g+d) (Rule 2 twice, N = 4, then N = 5.) Upper dyad’s inner sign reverses that of product.

A similar brief exercise with either DMZ formed with the emanated Assessor will show it, too, has a negative inner sign with respect to a positive in its DMZ partner. Two negative edge-signs in one Sail means Zigzag (means three negative edge-signs, in fact). Our proof up through the Pathions is complete; we need only indicate the existence of a constructive mechanism for pursuing this same strategy as N grows arbitrarily large.
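The PSL(2,7) construction described above (one Rule 0 circle, three Rule 1 rays through the central g, three Rule 2 sides) can be sketched as a small "trip machine." The function names here are hypothetical, and the two rules are coded only in the index forms the text itself uses, namely (l, g, g+l) for Rule 1 and (a,b,c) → (a, c+g, b+g) for Rule 2; orientation and sign conventions beyond that ordering are not modeled.

```python
def rule1(l, g):
    """Rule 1 in the form used above: for l < g, the trip (l, g, g + l)."""
    return (l, g, g + l)


def rule2(trip, g):
    """Rule 2 in the form used above: (a, b, c) -> (a, c + g, b + g)."""
    a, b, c = trip
    return (a, c + g, b + g)


def trip_machine(rule0, g):
    """Seven lines of the triangle: the Rule 0 circle, three Rule 1 rays,
    and three Rule 2 sides built from the cyclic rotations of the circle."""
    a, b, c = rule0
    rays = [rule1(x, g) for x in rule0]
    sides = [rule2(t, g) for t in ((a, b, c), (b, c, a), (c, a, b))]
    return [rule0] + rays + sides


lines = trip_machine((3, 6, 5), g=8)
for line in lines:
    print(line, "XOR trip:", line[0] ^ line[1] == line[2])
```

For the S = 1 example the three sides come out as (3,13,14), (6,11,13) and (5,14,11), which are cyclic rotations of the text's EaD, DbF and FcE trips; feeding the resulting Zigzag L-trips back in with a larger central g is one way to picture the induction to higher N.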
Consider now the same PSL(2,7) triangle, but in its center put a 16 (= g = G/2 for the 64-D Chingons, after the 64 Hexagrams of the I Ching, to give them a name). Then, put all 7 of the Pathions’ S = 1 Zigzag L-trips into the Rule 0 circle. One gets 3 · 7 = 21 Rule 2 Zigzag L-trips, and the 10 integers < g found in them and the 7 Rule 0 Zigzag L-trips imply there are 10 Rule 1 Trefoil L-trips, each associated with a distinct Box-Kite. But that would make for 7 + 21 + 10 = 38 Zigzag L-trips, when we know there can only be 35. The extra 3 indicate there’s some double-duty occurring: specifically, 3 of the Rule 1 Trefoil L-trips in fact designate not the standard (a,d,e), but (f,d,b), with d = g = 16 in each instance. When (5,14,11) is fed into our “trip machine” as Rule 0 circle, both (11,16,27) and (14,16,30) map to (f,d,b) trips tied to Rule 0 Zigzag L-indices (10,27,17) and (15,30,17), whose (a,d,e) trips appear as rays on triangles for (3,10,9) and (3,13,14) respectively. (11,16,27) also shows as an (f,d,b) with Rule 0 trip (6,11,13). (Readers are encouraged to use the code in the appendix to [4], to generate ETs for low S and N. Trip-machining details for our S = 1 example are in Appendix A.) For N = 7, use the 35 just-derived S = 1 L-trips as Rule 0 circles with a central 32, and so on. ∎

4 The Number Hub Theorem (S = $2^{N-2}$) for $2^N$-ions

Given the lengths required to prove the fullness of ETs for S < 8, it might be surprising to realize that the infinite number of cases for S = $2^{N-2}$ for all $2^N$-ions are so simple to handle that they almost prove themselves. Yet the proof of this Number Hub Theorem, while technically trivial, has far-reaching implications.

Theorem 10.
For all $2^N$-ions with ZDs (N > 3), and S = g = G/2, all non-long-diagonal entries in the emanation table are filled; more, each such filled cell in the ET’s upper left quadrant is unmarked (indeed, indicates an edge-current in a Zigzag); further, the row, column, and cell entries are isomorphic to those found in an unsigned, CDP-generated, multiplication table for the $2^{N-2}$-ions; finally, the $\mathrm{Trip}_{N-2}$ Zigzag L-index sets which underwrite its Box-Kites are precisely all and only those trips contained in said $2^{N-2}$-ions, the ET effectively serving as their high-level atlas.

Proof. As the largest L-index of any Assessor is 2g−1, and each S in the ETs in question is precisely g, then the row (column) labels will ascend from 1 to g−1 in simple increments from top to bottom (left to right) in the upper left quadrant, making its square of filled cells isomorphic to unsigned entries in the corresponding $2^{N-2}$-ion multiplication table. Also, all these filled cells of the ET will only contain XORs of indices < g. Hence, all and only L-index trips will have the edges of their (necessarily Zigzag) Sails residing in said quadrant. All non-long-diagonal cells in the ET are meanwhile filled, since all candidate Assessors have form M = (m, G+g+m), and for any CPO triplet (a,b,c) whose row and column labels plus cell entry are contained in the upper left quadrant, it is easy to show that the following arithmetic is true:

\[
\begin{array}{r}
+b - (G+g+b) \\
+a + (G+g+a) \\ \hline
+(G+g+c) - c \\
+c - (G+g+c)
\end{array}
\]

Therefore, the $\mathrm{Trip}_{N-2}$ Box-Kites, the Zigzag L-index set of each of which is one of the $\mathrm{Trip}_{N-2}$ trips contained in the $2^{N-2}$-ions, all have this simple form:

\[ (a,b,c,d,e,f) = (a,\, b,\, c,\, g+c,\, g+b,\, g+a) \]

∎

Remarks. As will become ever more evident, powers of 2 – which is to say, singleton 1-bits in indefinitely long binary bitstrings – play a role in ZD number theory most readily analogized to that of primes in traditional studies.
And while integer triples (from Pythagoras to Fermat) play a central role in prime-factor-based traditional studies, all XOR triplets at two CDP generations’ remove from the power of 2 in question are collected by its ET in this new approach. All other integers sufficiently large (meaning > 8) are meanwhile associated with fractal signatures, to each of which is linked a unique infinite-dimensional space spanned by ZD diagonals.

But can such a vantage truly be called Number Theory at all? We say indeed it can: that it is, in fact, the “new kind of number theory” that must accompany Stephen Wolfram’s New Kind of Science. In his massive 2002 book, he tells us that, common wisdom to the contrary, complex behavior can be derived from the simplest arithmetical behavior. The obstacle to seeing this resides in the common wisdom itself [5, p. 116]:

· · · traditional mathematics makes a fundamental idealization: it assumes that numbers are elementary objects whose only relevant attribute is their size. But in a computer, numbers are not elementary objects. Instead, they must be represented explicitly, typically by giving a sequence of digits.

But that ultimately implies strings of 0’s and 1’s, where the matter of importance becomes which places in the string are held, and which are vacant: the original meaning of our decimal notation’s sense of itself as placeholder arithmetic. The study of zero divisors – placeholder substructures – then becomes the natural way to investigate the composite characteristics of Numbers qua bitstrings. When we discover, in what follows, that composite integers (meaning those requiring multiple bits to be represented) are inherently linked, when seen as strut-constant bit-strings, with infinite-dimensional meta-fractals, the continuation of the quote on the following page should ring true:

In traditional mathematics, the details of how operations performed on numbers affect sequences of digits are usually considered quite irrelevant.
But · · · precisely by looking at such details, we will be able to see more clearly how complexity develops in systems based on numbers.

5 The Sand Mandala Flip-Book (8 < S < 16, N = 5)

In the first concrete exploration of ZD phenomenology beyond the Sedenions [6, pp. 13-19], a startling set of patterns were discovered in the ETs for values of S beyond the “Bott limit”: that is, for 8 < S < 16 (the upper bound being the G of the 32-D Pathions), the filled cells sufficed to define not 7, but only 3, Box-Kites for N = 5; more, the primary geometric figures in each such ET transformed into each other with each integer increment of S, in a manner exactly reminiscent of the flip-books which anticipated cartoon animation. While these seemed perplexing in mid-2002 when they were found, their logic is in fact profoundly simple.

First, each such ET’s S is just the X of one already seen in the Sedenions. We continue our convention of using g to indicate the G of the prior CDP generation, employ s for said generation’s S, and reference all prior Assessor indices by suffixing their letters with asterisks. Then, since S = g + s, the trip (s, g, g+s) mandates, by the First Vizier (whose signed version we invoke due to the direct derivation from the Sedenions), that g must belong to the Zigzag Sail if it’s to be an Assessor L-index at all.

Note that this is not a truly legitimate argument, as we’ll see shortly, albeit the results are correct, as shown by other means in [6]: this is because “Type II” box-kites first emerge in this current context – but are not among the 3 x 7 “flip-book” denizens of immediate interest. We will assume, for simplicity of presentation, that the First Vizier does obtain here: proving that it does, however, requires a background argument concerning “Type II” box-kites: their S values must be less than g, hence none of our flip-book candidates can qualify.
(But they are just as numerous as the flip-book box-kites, there being 3 for each of the seven values of S < g. For their listing, and theoretical framing of “Type II” phenomenology, see Appendix B.) We will content ourselves here with giving this as an empirical result, and assume, therefore, the validity of the signed version of the First Vizier in the case at hand.

Based on this assumption, we can further claim that the Sedenion Vent L-indices, f∗, e∗, d∗, must also be associated with Zigzag Assessors. By an argument exactly akin to that of last section, we then have 3 candidate Box-Kites to consider: since the 3 Vent L-indices are all less than g, they must be mapped to the 3 Assessors A, g = 8 must adhere to B (and s = 1 to E), while the L-indices of the C Assessors associated with f∗, e∗, d∗ must be A*, B*, C* respectively. The proof is easy: taking the new A, B Assessors = (f∗, G+g+a∗) and (g, G+s) in that order as readily generalizable representatives, we do the arithmetic.

\[
\begin{array}{r}
+g - (G+s) \\
+f^{*} + (G+g+a^{*}) \\ \hline
+(G+a^{*}) - (f^{*}+g) \\
+(f^{*}+g) - (G+a^{*})
\end{array}
\]

The bottom left is just Rule 1. For the bottom right, start with the First Vizier: (f∗, a∗, s) → (f∗, G+s, G+a∗) → (f∗) · (−(G+s)) = −(G+a∗). The top left is derived thus: (a∗, g, g+a∗) → (g, G+a∗, G+g+a∗) → (G+g+a∗) · g = +(G+a∗). Finally, (a∗, s, f∗) → (g+a∗, g+f∗, s) → (G+g+a∗, G+s, g+f∗), but the negative inner sign of the top dyad reverses sign as shown.

The 3 Box-Kites thus derived are the only among the 7 candidates to be viable: for the Zigzag L-index of the S = 1 Sedenion Box-Kite does not underwrite a Sail; hence, by what lawyers would call a “fruit of the poisoned tree” argument, neither do the 3 U-trips associated with the same failed Zigzag.
Using A* and B*, then invoking the Roundabout Theorem, we see this readily:

        + b∗               + (G + g + e∗)
        + a∗               + (G + g + f∗)
        -----------------------------------
        − (G + g + d∗)     − c∗
        + c∗               − (G + g + d∗)
        NOT ZERO (only c*’s cancel)

With the appending of two successive bits to the left, the bottom-left and top-right products are identical to those obtaining without the (G+g) being included. Similarly, the top-left product uses Rule 2 twice, to similar effect, but with (G+g) included in the outcome: since (f∗, d∗, b∗) is CPO, we then get −(G+g+d∗). For the top-right result, meanwhile, the two high bits induce a double reversal, then are killed by XOR, leaving the product the same as if they hadn’t been there: (f∗, c∗, e∗) → (g+e∗, c∗, g+f∗) → (G+g+f∗, c∗, G+g+e∗), hence −c∗. We have an argument reminiscent of Theorem 2: depending on the inner sign of the upper dyad, one pair of products cancels or the other, but not both.

We see, then, that the construction given without explanation at the end of Part I is correct. The arguments given there concerning the vital relationship of a Box-Kite’s non-ZD structures to semiotic modeling suggest that this “offing” (to use the appropriately binary slang linked to Mafia hitmen) of a Zigzag’s 4 triplets should have a similarly significant role to play in such modeling. This has bearing not just on semiotic, but physical models, since the key dynamic fact implicit in the Zigzag L- and U-trips (or just Z-trips henceforth) is their similarity of orientation: since (a,b,c); (a,B,C); (A,b,C); (A,B,c) are all CPO as written, we are effectively allowed to do pairwise swaps of upper- and lower-case lettering among them without inducing anything a physicist might deem observable (e.g., a 180° reversal or “spin quantum”). This condition of trip sync breaks down as soon as we attempt to allow similar swapping between Z-trips and their Trefoil compatriots: in particular, those 2 which don’t share an Assessor with the Zigzag.
The toy model of [7] would use these features to designate the basis of a “Creation Pressure” that leads to the output of the string theorist’s E8 × E8 symmetry. This symmetry, as discussed there, breaks in the standard models when one of the primordial E8’s decays into an E6 – which has 72 roots to parallel the 72 filled cells of our Sand Mandalas. For present purposes, the key aspect of this correspondence is that, in ZD theory at least, the explosion of a singleton Box-Kite into a Sand Mandalic trinity throws the off-switch on the source of the dynamics: the Z-trips which underwrite trip sync no longer even underwrite Box-Kites. The whole scenario suggests nothing so much as those boxes which, when opened by pushing an external lever, emit an arm which pulls up on the same lever, forcing the box to close and the arm to return to its hiding place inside it.

Let’s turn now to the ET graphics of the flip-book sequence, so suggestive of cellular automata. For each of the 7 ET’s in question, all labels < g are monotonically increasing, since S, and hence their strut opposites, exceed them all. But the only filled (but for long-diagonal crossings) rows and columns will be those with labels equal to S − g = s and its strut-opposite g, for these L-indices reside at E and B respectively in all 3 Box-Kites in the ensemble, hence either dyad containing one of them makes DMZs within each of the trio’s (a,d,e) and (f,d,b) Sails, filling all 12 (= $2^4 - 2$, minus 2 for diagonals) fillable cells in each row or column tagged with these Assessors’ label. Thus, as s is incremented, two parallel sets of perpendicular lines of ET cells start off defining a square missing its corners, then these parallels move in unit increments toward each other, until they form a 2-ply crossbar once s = 7 (S = 15).
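The counts quoted for the flip-book ETs can be cross-checked by brute enumeration. The sketch below is not a zero-divisor computation: it simply assumes that the filled cells are exactly those the text describes (rows and columns labelled s or g, plus cells whose content P is s or g), that P is the XOR of the row and column labels, and that the two long diagonals are the cells with R = C or R XOR C = S. The names `flipbook_tally` and `G_PRIOR` are illustrative only.

```python
G_PRIOR = 8  # g: the G of the Sedenions, i.e. of the prior CDP generation


def flipbook_tally(S, g=G_PRIOR):
    """Tally the filled cells the text describes for one flip-book ET."""
    s = S - g                                   # equals S XOR g when 8 < S < 16
    labels = [i for i in range(1, 2 * g) if i != S]
    in_lines = by_content = 0
    for r in labels:
        for c in labels:
            if r == c or r ^ c == S:
                continue                        # long-diagonal cells stay empty
            if r in (s, g) or c in (s, g):
                in_lines += 1                   # row or column labelled s or g
            elif r ^ c in (s, g):
                by_content += 1                 # content P = r XOR c is s or g
    per_line = len(labels) - 2                  # 2**4 - 2 labels, minus 2 crossings
    return s, per_line, in_lines, by_content


for S in range(9, 16):
    print(S, flipbook_tally(S))
```

Under these assumptions every S from 9 to 15 gives 12 fillable cells per line, 48 cells in the s and g lines, and 24 further cells selected by content, for 72 in all.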
24 cells each have row label R or column label C = s; 24 reside in lines with label = g; and 24 more have their contents P = s or g: these last have an orderliness that is less obvious, but by the last ET in the flip-book, they have arrayed themselves to form the edges of a diamond, orthogonal to the long diagonals and meeting up with the crossbar at its four corners, with s = 7 values filling the upward-pointing edges, and g = 8’s those sloping down.

The graphics for the flip-book first appeared in [6, p. 15]; they were recycled on p. 13 of [8]; larger, easily-read versions of these ETs were then included (along with numerous other Chingon-based flip-books and other graphics we’ll discuss later) as Slides 25-31 of the Powerpoint presentation comprising [1], delivered at Wolfram Science’s June 15-18, 2006, NKS conference in Washington, D.C. All three of these resources are available online, and the reader is especially encouraged to explore the last, whose 78 slides can be thought of as the visual accompaniment to this monograph. (Henceforth, references to numbered Slides will be to those contained and indexed in it.)

6 64-D Spectrography: 3 Ingredients for “Recipe Theory”

In a manner clearly related to Bott periodicity, strut constants fall into types demarcated by multiples of 8. But unlike the familiar modulo 8 categorization of types demonstrated, perhaps most familiarly, in the Clifford algebras of various dimensions, the situation with zero-divisors concerns not typology (which keeps producing new patterns at all dimensions), but granularity. As we shall see, emanation tables for S > 8 (and not a power of 2), aside from diagonally aligned cells in otherwise empty stretches, display checkerboard layouts of parallel and perpendicular near-solid lines (NSLs), whose cells all have emanations save for a pair of long-diagonal crossings, and whose visual rhythms are strictly governed by S and 8 or the latter’s higher multiples.
The rule we found in the 32-D Pathions for the Sand Mandalas indicates that the basic pattern (and BK_{5,S} for 8 < S < 16) is “essentially the same” for all of them. We put the qualifying phrase in quotes, as it is an open question at this point what features, residing at what depth, are indeed “the same,” and which are different. For the moment, we will invoke the term spectrographic equivalence as a sort of promissory note, hoping to stuff ever more elements into its grab-bag of properties, beginning with two.

First is something at once intuitively obvious but not readily proven. (We will include a corollary to a later theorem when we have done so.) Since the first 8 possible strut-constant values all display maximally-filled ETs, and since anomalies displayed by higher values are strictly side-effects of bits to the left of the 8-bit (which are, of course, its multiples), it is natural to assume that any recursive induction upon simpler forms will echo this “octave” structure: that each time S passes a new multiple of 8, it participates in a new type. (As with the Sand Mandalas, we will see this means that BK_{N,S} for the new 7- or 8-element spectral band of new forms will differ from that found in its predecessor band.) This will lead, in the most clear-cut cases – S = 15, or a multiple of 8 not a power of 2, say – to grids composed of 8 x 8 boxes some or all of whose borders are NSLs.

How we determine which cases are clear-cut, meanwhile, and why and how we might want or need to privilege them, leads to our second property to include up-front in our grab-bag. In a manner reminiscent of the various tricks – like minors and cofactors – used in classical matrix theory to prove two matrices are equivalent, we can transform members of a spectral band into each other by certain formal methods of hand-waving.
With the Sand Mandalas, for instance, we could replace concrete indices in the row and column labels with abstract designations referencing the (a,b,c) values of each of their 3 Box-Kites, listed in one of a number of predetermined orders: by least-first CPO ordering of such (a,b,c) triplets, in a sequence determined by the Zigzag L-trip of the Sedenion Box-Kite we can derive them from, for instance (which is equivalent to the 3 sand-mandalic Box-Kites’ d values, as we’ve seen).

Since which cells are filled is strictly determined by S and G, such designations eliminate all individuality among the ETs in question. Hence, if certain display features of one of them seem convenient, we can convert its “tone row” of indices populating its row and column labels into an abstract layout, governed by which index is associated with which Assessor, in the manner sketched last paragraph. We could then use this layout as the template for re-writes of all other ETs in the same spectral band, knowing that results obtained using the specific instantiation of the band could thereby be converted into exactly analogous ones for the other band-members.

We will, in fact, implicitly adopt this tactic by using S = 1 as an exemplary “for instance” in numerous arguments, while employing the highest-valued S found among the Sand Mandalas, 15, to simplify the visualizing (and calculating) of recursive pattern creation for fixed-S, growing N sequences. (S = 15 is chosen because it has all its low bits filled, hence all XORs are derived by simple subtraction, leaving carrybit overflow to show itself only in what matters most to us: the turning off of 4 candidate Box-Kites in the Pathions, and – as we will show two sections hence – 16 in the Chingons, and $4^{N-4}$ in all higher $2^N$-ions.) Where we termed, for reasons already explained, the fixed-N, growing S sequences flip-books, we designate these new displays (for reasons we’ll justify shortly) balloon-rides.
While there is but one abstract type for the Sedenions, with one Box-Kite for each of the 7 possible S values, a second spectral band emerges in the Pathions to include the Sand Mandalas, and two more are added for the 64-D Chingons. By induction from the universally shared first band for all N > 3, where there are Trip_{N−2} Box-Kites in each ET, for each S ≤ 8, the first new spectrographic addition includes the upper multiple of 8 that bounds it, since it is not a power of 2: 16 < S ≤ 24. The second new range, though, is bounded by G, hence does not include it, as it is tautologically a power of 2 (which powers, as we saw two sections ago, comply with a type all their own, with the same Box-Kite-count formula as for the lowest spectral band): 24 < S < 32.

Each of these two new bands displays a distinctive feature which underwrites one of the three key ingredients for the recipe theory we are ultimately aiming for. We call these, for S ascending, (s,g)-modularity and hide/fill involution respectively. The third key ingredient, meanwhile, resides in the band that first emerges in the Pathions – and whose echo in the Chingons has recapitulative features sufficiently rich as to merit the name of recursivity. We will be devoting Part III’s first post-introductory section to a thorough treatment of the simplest instance of this third ingredient, showing how to ascend into the meta-fractal we call the Whorfian Sky (named for the great theorist of linguistics, Benjamin Lee Whorf, whose last-ever lecture on “Language, mind and reality” described the layering of meaning in language in a manner strongly suggesting something akin to it). Among many visionary passages in his descriptions of a future cross-disciplinary science, the following seems most apt to serve as the lead-in quote for the third and final sweep of our argument [9]:

Patterns form wholes, akin to the Gestalten of psychology, which are embraced in larger wholes in continual progression.
Thus the cosmic picture has a serial or hierarchical character, that of a progression of planes or levels. Lacking recognition of such serial order, different sciences chop segments, as it were, out of the world, segments which perhaps cut across the direction of the natural levels, or stop short when, upon reaching a major change of level, the phenomena become of quite different type, or pass out of the ken of the older observational methods. But · · · the facts of the linguistic domain compel recognition of serial planes, each explicitly given by an order of patterning observed. It is as if, looking at a wall covered with fine tracery of lacelike design, we found that this tracery served as the ground for a bolder pattern, yet still delicate, of tiny flowers, and that upon becoming aware of this floral expanse we saw that multitudes of gaps in it made another pattern like scrollwork, and that groups of scrolls made letters, the letters if followed in a proper sequence made words, the words were aligned in columns which listed and classified entities, and so on in continual cross-patterning until we found this wall to be – a great book of wisdom! [10, p. 248]

Appendix A: Genealogy of S = 1 Box-Kites

N = 4: Unique Quaternion L-index set (1,2,3) fed as Rule 0 circle into PSL(2,7) with central g = 4, yielding 7 Octonion trips, each with a different S. For S = 1, have (3,6,5), which becomes singleton Rule 0 for next level.

N = 5: (3,6,5) fed as Rule 0 circle into PSL(2,7) with central g = 8 yields 3 Rule 2 L-trips as triangle’s sides, which (upon affixing their strut opposites as L-indices) generate (along with zero-padded (3,6,5)) 4 Box-Kites with X = G+1 = 17. Triangle’s medians become (a,d,e) Trefoil L-index sets of 3 Rule 1 S = 1 Box-Kites, making 7 in all.
These Zigzag L-index sets become Rule 0 trips for the next level, and are:
Rule 0: (3,6,5)
Rule 1: (3,10,9); (6,15,9); (5,12,9)
Rule 2: (3,13,14); (6,11,13); (5,14,11)

N = 6: The 7 N = 5 Zigzag L-index sets just listed are fed as Rule 0 circles into PSL(2,7) triangles with central g = 16, and are Zigzag L-index sets in their own right for Box-Kites with X = G+1 = 33. 10 Rule 1 medians, 3 redundant (as they generate (f,d,b)’s where (a,d,e)’s are also given: (14,16,30)* and (11,16,27)** in (5,14,11)’s triangle, the latter also in (6,11,13)’s). They are associated with these 7 Zigzag L-index sets:
(3,18,17); (5,20,17); (6,23,17); (9,24,17); (10,27,17)∗; (12,29,17); (15,30,17)∗∗
Rule 2 sides: 3 per each Rule 0 trip, as follows:
(3,6,5) → (3,21,22); (6,19,21); (5,22,19)
(3,10,9) → (3,25,26); (10,19,25); (9,26,19)
(6,15,9) → (6,25,31); (15,22,25); (9,31,22)
(5,12,9) → (5,25,28); (12,21,25); (9,28,21)
(3,13,14) → (3,30,29); (13,19,30); (14,29,19)
(6,11,13) → (6,29,27); (11,22,29); (13,27,22)
(5,14,11) → (5,27,30); (14,21,27); (11,30,21)

N = 7: Feed the just-listed 35 Zigzag L-index sets to PSL(2,7)’s with g = 32, as Rule 0 circles, thereby generating the 155 S = 1 Zigzags found in the $2^7$-ions, or Routions – named for the site of the Internet Bubble’s once-famed “Massachusetts Miracle,” Route 128 – and so on.

Appendix B: A Brief Intro to “Type II” Box-Kites

The recursive generation of Zigzag L-sets just presented calls for some close attention when the box-kites involved are Type II, since they then have the diagonals of their PSL(2,7) triangles oriented differently: instead of all 3 leading from midpoints of the Rule 2 sides to the corners, only 1 of these will preserve orientation for a Type II (with the other two having “reversed VZ1” rules in evidence).
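The recursive generation just presented can be replayed mechanically, at least for the Rule 2 sides. The sketch below rests on a pattern read off the Appendix A listings rather than on a rule stated in the text: a Rule 0 trip (a, b, c) with central g yields the sides (a, g+c, g+b), (b, g+a, g+c), (c, g+b, g+a). The Rule 1 sets are copied from the listings, not derived, and `trip_count` assumes that Trip_n is the number of unordered XOR-closed triples of nonzero n-bit indices. All function and variable names are illustrative.

```python
def rule2_sides(trip, g):
    """Rule 2 sides generated from a Rule 0 trip (a, b, c) around a central g."""
    a, b, c = trip
    return [(a, g + c, g + b), (b, g + a, g + c), (c, g + b, g + a)]


def trip_count(n):
    """Unordered triples {a, b, c} of nonzero n-bit indices with a ^ b ^ c == 0."""
    return (2**n - 1) * (2**n - 2) // 6


def xor_closed(trip):
    """True when the three indices XOR to zero, as any trip's indices must."""
    a, b, c = trip
    return a ^ b ^ c == 0


# N = 5: Rule 0 and Rule 1 copied from the listing, Rule 2 regenerated.
N5_RULE0 = [(3, 6, 5)]
N5_RULE1 = [(3, 10, 9), (6, 15, 9), (5, 12, 9)]
N5_RULE2 = rule2_sides((3, 6, 5), g=8)
N5_ZIGZAGS = N5_RULE0 + N5_RULE1 + N5_RULE2

# N = 6: Rule 1 copied from the listing, Rule 2 regenerated from all 7 N = 5 sets.
N6_RULE1 = [(3, 18, 17), (5, 20, 17), (6, 23, 17), (9, 24, 17),
            (10, 27, 17), (12, 29, 17), (15, 30, 17)]
N6_RULE2 = [side for trip in N5_ZIGZAGS for side in rule2_sides(trip, g=16)]
N6_ZIGZAGS = N5_ZIGZAGS + N6_RULE1 + N6_RULE2

print(N5_RULE2)  # [(3, 13, 14), (6, 11, 13), (5, 14, 11)]
print(len(N5_ZIGZAGS), len(N6_ZIGZAGS), trip_count(5))  # 7 35 155
```

The regenerated sides agree with the 21 listed for N = 6, and the set counts 7, 35 and 155 coincide with `trip_count(N - 2)` for N = 5, 6 and 7.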
We first give a construction for producing all the Type II box-kites in the Pathions, and then indicate the manner in which their workings are intimately connected with the phenomenology of twist products broached in Part I’s Theorem 6.

The construction was presented with different framing in [8], where we deployed a “stereo Fano” representation using side-by-side triangles, the left being a proper PSL(2,7). Within the Pathions, there are 7 distinct box-kites for each S except for the “flip-book” trios, one for each S > 8. And for S = 8 exactly, we saw in our discussion of the Number Hub Theorem that we can build all 7 by placing 8 in the center of the standard Fano (what we’ll call PSL(2,7) henceforth), then taking the Zigzag L-trip for each Sedenion S and placing its units at the sides’ midpoints, in the usual CPO order (in left, right, and bottom sides respectively). Each of these 7 lines then generates a new box-kite in the Pathions for the Sedenion S in question.

If we re-inscribe the starter-kit L-trip, but change G to the Pathion’s 16, applying VZ2 gives us new U-index terms, but the L-index terms for all 6 Assessors remain the same as for the Sedenion box-kite: we call this “Rule 0” instance the zero-padded box-kite (or just ZP) for the S value in question. If we take the 3 “Rule 1” triplets along the struts, and place them not at the A, B, C positions of our new Pathion box-kites, but instead at A, D, E (with 8 always winding up at D), we generate 3 more standard (Type I) box-kites. For S = 1, the Sedenion Zigzag L-trip is just (3,6,5), and each of its units becomes the low-index ‘A’ for a new Pathion box-kite, with L-indices written in “nested parentheses” order (that is, A, B, C, D, E, F) as follows: (3,10,9,8,11,2); (6,15,9,8,14,7); (5,12,9,8,13,4).
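The three S = 1 sextets just listed can be rebuilt from the placement described above. This is a minimal sketch under two assumptions that are consistent with the listing but are not spelled out as a formula in the text: the Rule 1 triplet for a Zigzag unit z is (z, g, z XOR g), sitting at A, D, E, and the strut-opposite pairs (A,F), (B,E), (C,D) each XOR to S. The function name `type1_boxkite` is hypothetical.

```python
def type1_boxkite(z, g, S):
    """L-indices (A, B, C, D, E, F) with the Rule 1 triplet (z, g, z ^ g) at A, D, E."""
    A, D = z, g
    E = A ^ D
    # Strut-opposites in nested-parentheses order: F opposes A, E opposes B,
    # and D opposes C, so each of B, C, F is the XOR of its partner with S.
    return (A, E ^ S, D ^ S, D, E, A ^ S)


for z in (3, 6, 5):  # units of the Sedenion Zigzag L-trip for S = 1
    print(type1_boxkite(z, g=8, S=1))
```

With g = 8 and S = 1 this prints the three sextets quoted above, and in each of them the (A, B, C) indices XOR to zero, as the indices of a Zigzag L-trip must.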
But if we take the 3 “Rule 2” triplets along the edges, mapping the Zigzag unit at the center of each to the low-index ‘A’ of a new box-kite, the 8 doesn’t show at any Assessor, and two of the three struts will have orientations reversed. These “Type II” box-kites, again for S = 1, written per the same convention just used for their 8-bearing siblings, read like this: (3,13,14,15,12,2); (6,11,13,12,10,7); (5,14,11,10,15,4). Since the A and F low-index terms are the same as in the same-S Sedenion box-kite, the strut they make obviously has the standard orientation. (But note that there is nothing essential about the (A,F) strut here: the placing of the lowest-indexed unit of the Zigzag L-trip at A is a convenient convention, and its employment in the Pathions suffices to induce this effect; however, it no longer suffices in higher dimensions, where S can exceed 8 yet still be less than

That their being Type II is an immediate side-effect of “Rule 2” in this method of deriving them should be obvious. What is less obvious is their special relationship with twist products. Here, we review some of the basics: in the Sedenions, whenever two Assessors bound an edge, we can swap a pair of corresponding terms (either L- or U-indices) and then switch the sign joining the L- and U-indices in the resultant pairing, and get an Assessor in another box-kite as a result. Such “twist products,” then, reverse the edge-sign of a given line of ZDs as we move between containing box-kites. Moreover, such twists are naturally investigated in the context of the squares, not the triangles, of the octahedral vertex figure we write Assessors on: the three orthogonal Catamarans, then, instead of the four touching-only-at-the-vertices Sails.
That’s because opposite sides of a Catamaran twist to Assessors in the same box-kite, so that each Catamaran lets one twist to two different box-kites – with the terminal Catamaran, in each case, being further twistable into the box-kite you didn’t twist to in the first instance. As shown in the “Twisted Sister” and “Royal Hunt” diagrams of [4], these triple transforms can be represented in their own Fano planes, with the indices placed on their loci now corresponding to the strut constants of a septet of box-kites.

Each Catamaran comprises the pathways connecting 4 Assessors – meaning it doesn’t connect up with either term of the third strut in its box-kite. It is not hard to see that the strut constant of the box-kite one twists to is equal to the strut-opposite of the term which completes the L-trip of the edge being twisted in the first place. Hence, any L-index term on a Sedenion box-kite corresponds to the strut constant of another such box-kite one can twist to. This suggests expanding the meaning of “twist product” to embrace pairings which share a strut rather than an edge. For, if we allow this, we can then treat the third strut orthogonal to the square hosting twists as the “mast” of the Catamaran, giving us an expanded sense of this latter term which allows us a major simplification: instead of thinking of the Sedenions’ ZDs as distributed among 7 distinct box-kites, we can see them all included in one “embroidered” box-kite diagram, which we call a brocade. Each of the 12 box-kite edges allows twists to a pair of different Assessors – let’s say (A, b) and (B, a), in the box-kite with S = c_opp = d. More, the (S,X) pair – which we can think of as in the box-kite’s center – can be “twisted” with all 6 Assessors in the original box-kite to yield 12 more.
We therefore have 6 + 24 + 12 = the total set of “42 Assessors” in the Sedenions, all representable, on any one of the 7 component box-kites, as a unitary “brocade.”

It would be nice to be able to generalize the “brocade” notion so as to reduce the number of basic structures in higher-order contexts: in the Pathions, for instance, there are 77 box-kites, all but 21 of which are “Type I,” with 21 of those coming in sand-mandala triples, 7 forming the S = 8 “Atlas,” plus 7 ZP’s and 3 · 7 “strongboxes” (so called, because these low-S box-kites contain “pieces of 8”) completing the collection. But if we also count in the 4 · 7 = 28 “missing” box-kites for high S, we can collapse our head count from 105 box-kite-like structures to 15 brocades. Miming the Sedenion situation, the 7 ZP’s form the simplest; the 7 Sand-Mandala trios intermingle with the Atlas septet and the 21 strongboxes to make 7 more brocades; and the 21 Type II box-kites twist into each other (to fill out one Catamaran in each) and into the “hidden box-kites” linked with high S (filling out two more Catamarans per Type II instance), yielding up the final set of 7 brocades. (We note that the Type II situation is not as mysterious as it might appear, once we recall the “slipcover proof” logic of Part I, Section 5: with 2 of 3 strut triplets being reversed, “tugging” on a Type II’s Fano will tend to send a reversed arrow onto an edge 4 times out of 6 – meaning that, in all such cases, the corollary to Theorem 7, and hence the theorem itself, will fail, thereby explaining the “why” of “missing” box-kites!)

We gain the generalized “brocade” simplification at a very small price: relaxing the notion of “twist product” to embrace source and target L- and U-index pairs which aren’t necessarily zero-divisors within the context of the G at hand.
But this is an investment which pays dividends, since it allows us to use Type II structures as “middlemen” to facilitate studying the “hidden box-kite” substructures of the meta-fractal “white space” in high-S ET’s. Given the semiotic and semantic importance of “ZD-free” structures (recall that our transcription of Petitot’s analysis of Greimas’ “Semiotic Square” into zero-divisor theory is based on ZD-free strut opposites), we can expect a richness of results based on Catamaran study that should at least equal that we are conducting based on Sails. (For a “coming attraction,” interested readers should see the online Powerpoint slide-show linked with our NKS 2007 presentation [11], which will play a role with respect to our forthcoming and similarly named monograph, “Voyage by Catamaran,” akin to that our NKS 2006 slide-show did for the theorem/proof exposition you are currently reading.)

References

[1] Robert P. C. de Marrais, “Placeholder Substructures: The Road from NKS to Small-World, Scale-Free Networks Is Paved with Zero-Divisors,” http://wolframscience.com/conference/2006/presentations/materials/demarrais.ppt (Note: the author’s surname is listed under “M,” not “D.”)
[2] Robert P. C. de Marrais, “Placeholder Substructures I: The Road From NKS to Scale-Free Networks is Paved with Zero Divisors,” Complex Systems, 17 (2007), 125-142; arXiv:math.RA/0703745.
[3] Robert P. C. de Marrais, “The 42 Assessors and the Box-Kites They Fly,” arXiv:math.GM/0011260.
[4] Robert P. C. de Marrais, “Presto! Digitization,” arXiv:math.RA/0603281.
[5] Stephen Wolfram, A New Kind of Science (Wolfram Media, Champaign IL, 2002). Electronic version at http://www.wolframscience.com/nksonline.
[6] Robert P. C. de Marrais, “Flying Higher Than A Box-Kite,” arXiv:math.RA/0207003.
[7] Robert P. C. de Marrais, “The Marriage of Nothing and All: Zero-Divisor Box-Kites in a ‘TOE’ Sky,” in Proceedings of the 26th International Colloquium on Group Theoretical Methods in Physics, The Graduate Center of the City University of New York, June 26-30, 2006, forthcoming from Springer–Verlag.
[8] Robert P. C. de Marrais, “The ‘Something From Nothing’ Insertion Point,” http://www.wolframscience.com/conference/2004/presentations/materials/rdemarrais.pdf
[9] Robert P. C. de Marrais, “Placeholder Substructures III: A Bit-String-Driven ‘Recipe Theory’ for Infinite-Dimensional Zero-Divisor Spaces,” arXiv:0704.0112 [math.RA].
[10] Benjamin Lee Whorf, Language, Thought, and Reality, edited by John B. Carroll (M.I.T. Press, Cambridge MA, 1956).
[11] Robert P. C. de Marrais, “Voyage by Catamaran: Long-Distance Semantic Navigation, from Myth Logic to Semantic Web, Can Be Effected by Infinite-Dimensional Zero-Divisor Ensembles,” wolframscience.com/conference/2007/presentations/materials/demarrais.ppt (Note: the author’s surname is listed this time American style, under “D,” not “M.”)

Introduction
By Way of Reprise: From Box-Kites to ETs
Emanation Tables: Conventions for Construction
ETs for N > 4 and S 7
The Number Hub Theorem (S = $2^{N-2}$) for $2^N$-ions
The Sand Mandala Flip-Book (8 < S < 16, N = 5)
64-D Spectrography: 3 Ingredients for “Recipe Theory”

ABSTRACT

Zero-divisors (ZDs) derived by Cayley-Dickson Process (CDP) from N-dimensional hypercomplex numbers (N a power of 2, at least 4) can represent singularities and, as N approaches infinity, fractals -- and thereby, scale-free networks. Any integer greater than 8 and not a power of 2 generates a meta-fractal or "Sky" when it is interpreted as the "strut constant" (S) of an ensemble of octahedral vertex figures called "Box-Kites" (the fundamental building blocks of ZDs).
Remarkably simple bit-manipulation rules or "recipes" provide tools for transforming one fractal genus into others within the context of Wolfram's Class 4 complexity.
<|endoftext|><|startoftext|>
Filling-Factor-Dependent Magnetophonon Resonance in Graphene

M. O. Goerbig,1 J.-N. Fuchs,1 K. Kechedzhi,2 and Vladimir I. Fal’ko2
1 Laboratoire de Physique des Solides, Univ. Paris-Sud, CNRS UMR 8502, F-91405 Orsay, France
2 Department of Physics, Lancaster University, Lancaster, LA1 4YB, United Kingdom
(Dated: October 23, 2018)

We describe a peculiar fine structure acquired by the in-plane optical phonon at the Γ-point in graphene when it is brought into resonance with one of the inter-Landau-level transitions in this material. The effect is most pronounced when this lattice mode (associated with the G-band in graphene Raman spectrum) is in resonance with inter-Landau-level transitions 0 ⇒ +, 1 and −, 1 ⇒ 0, at a magnetic field $B_0 \simeq 30$ T. It can be used to measure the strength of the electron-phonon coupling directly, and its filling-factor dependence can be used experimentally to detect circularly polarized lattice vibrations.

PACS numbers: 78.30.Na, 73.43.-f, 81.05.Uw

In metals and semiconductors the spectra of phonons are renormalized by their interaction with electrons. Some of the best known examples include the Kohn anomaly [1] in the phonon dispersion, which originates from the excitation/de-excitation of electrons across the Fermi level upon the propagation of a phonon through the bulk of a metal and a shift in the longitudinal optical phonon frequency in heavily doped polar semiconductors [2].
However, despite the transparency of theoretical models, the observation of such effects is often obscured by the difficulty of changing the electron density in a material, whereas in semiconductor structures containing two-dimensional (2D) electrons, the density of which can be varied, the influence of the latter on the phonon modes is weak due to the negligibly small volume fraction occupied by the electron gas. In this context, a unique opportunity arises in graphene-based field-effect transistors [3], where the density of carriers in an atomically thin film (monolayer [4, 5, 6] or a bilayer [7]) can be continuously varied from $10^{13}$ cm$^{-2}$ p-type to $10^{13}$ cm$^{-2}$ n-type. Several Raman experiments have already been reported [8, 9] where the variation of carrier density in graphene changes the optical phonon frequency, in agreement with theoretical expectations [10, 11, 12].

When graphene is exposed to a quantizing magnetic field, its electronic spectrum quenches into discrete Landau levels (LLs) [13]. Then, the optical phonon energy in graphene may coincide with the energy of one of the inter-LL transitions, a condition known as magnetophonon resonance [14, 15]. Recently, Ando has suggested [16] that in undoped graphene the magnetophonon resonance enhances the effect of the electron-phonon coupling on the spectrum of the in-plane optical phonons, the $E_{2g}$ modes attributed to the G-band in the Raman spectra in Refs. [8, 9, 17, 18, 19]. In this paper, we investigate a rich structure of the anti-crossing experienced by such lattice modes when a magnetic field makes their energy equal to the energy of one of the valley-antisymmetric interband magnetoexcitons [20].
Most saliently, the difference between circular polarization of various inter-LL transitions [21, 22] makes the magnetophonon resonance distinguishable for lattice vibrations of different circular polarization, which makes the number of split lines in the fine structure acquired by a phonon and the value of splitting dependent on the electronic filling factor, ν.

The in-plane optical phonons in graphene [relative displacement $u = (u_x, u_y)$ of sublattices A and B] have the energy ω ≈ 0.2 eV at the Γ-point (in the center of the Brillouin zone). These phonons and their coupling to electrons can be described using the Hamiltonian [10, 11],

$$ H_{ph} = \sum_{\mu,q} \omega\, b^\dagger_{\mu,q} b_{\mu,q} + g\sqrt{2M\omega}\,(\sigma_x u_y - \sigma_y u_x), \tag{1} $$

$$ u(r) = \sum_{\mu,q} \frac{1}{\sqrt{2N_{uc}M\omega}} \left( b_{\mu,q} + b^\dagger_{\mu,-q} \right) e_{\mu,q}\, e^{-iq\cdot r}, $$

where $b_{\mu,q}$ ($b^\dagger_{\mu,q}$) are annihilation (creation) operators of a phonon with polarisation $e_{\mu,q}$, M is the mass of a carbon atom, and $N_{uc}$ is the number of unit cells. Here and below, we use units ħ ≡ 1. Also, we shall utilize a double degeneracy of the $E_{2g}$ mode at the Γ-point (at q = 0) and describe the in-plane optical phonon in terms of a degenerate pair of circularly polarized modes, $u_\circlearrowleft = (u_x + i u_y)/\sqrt{2}$ and $u_\circlearrowright = u_\circlearrowleft^*$. The constant g in Eq. (1) characterizes the electron-phonon coupling [23]. This coupling has the form of the only invariant linear in u permitted by the symmetry group of the honeycomb crystal. It is constructed using Pauli matrices $\sigma = (\sigma_x, \sigma_y)$ acting in the space of sublattice components of the Bloch functions, $[\phi_{K_+A}, \phi_{K_+B}]$ and $[\phi_{K_-B}, \phi_{K_-A}]$, which describe electron states in the valleys $K_\pm$ (two opposite corners of the hexagonal Brillouin zone) and obey the Hamiltonian, in terms of the electron charge −e < 0 [24],

$$ H_{el} = \xi v\, \sigma\cdot p, \qquad p = -i\nabla + eA, \qquad \partial_x A_y - \partial_y A_x = B. $$

Here, ξ = ± distinguishes between $K_\pm$, and momentum p is calculated with respect to the center of the corresponding valley.
This Hamiltonian represents the dominant term of the next-neighbor tight-binding model of graphene [25, 26, 27], and the electron-phonon coupling in Eq. (1) takes into account the change in the A − B hopping elements due to the sublattice displacement [28].

[Figure 1: panels (a) and (b); legend: A sublattice, B sublattice; level labels −,(n+1) and +,(n+1).]
FIG. 1: (a) Optical phonons are lattice vibrations with an out-of-phase oscillation of the two sublattices. (b) Interband electron-hole excitations coupling to phonon modes with different circular polarization.
http://arxiv.org/abs/0704.0027v4

In a perpendicular magnetic field, $H_{el}$ determines [13] a spectrum of 4-fold (spin and valley) degenerate LLs, $\varepsilon^{\alpha=\pm}_n = \alpha\sqrt{2n}\, v\lambda_B^{-1}$ in the valence band ($\varepsilon^-_{n>0}$), conduction band ($\varepsilon^+_{n>0}$), and at zero energy ($\varepsilon_0 = 0$, exactly at the Dirac point in the electron spectrum), in terms of the magnetic length $\lambda_B = 1/\sqrt{eB}$. Such a spectrum has been confirmed by recent quantum Hall effect measurements [4, 5, 6]. In each of the two valleys, the LL basis is given by two-component states $\frac{1}{\sqrt{2}}[\sqrt{1+\delta_{n,0}}\,\phi_{n,m},\; i\xi\alpha(1-\delta_{n,0})\phi_{n-1,m}]$, where $\phi_{n,m}$ are the LL wave functions described by the quantum numbers n and m, the latter being related to the guiding center degree of freedom. Here, we neglect the Zeeman effect, and simply take into account the two-fold spin degeneracy.

Excitations of electrons between LLs can be described in terms of magnetoexcitons (see Fig. 1). Those relevant for the magnetophonon resonance are

$$ \psi^\dagger_\circlearrowleft(n,\xi) = \frac{\sqrt{1+\delta_{n,0}}}{N^\circlearrowleft_n} \sum_m c^\dagger_{+,n,m;\xi}\, c_{-,(n+1),m;\xi}, \qquad \psi^\dagger_\circlearrowright(n,\xi) = \frac{\sqrt{1+\delta_{n,0}}}{N^\circlearrowright_n} \sum_m c^\dagger_{+,(n+1),m;\xi}\, c_{-,n,m;\xi}, \tag{2} $$

where the index $A = \circlearrowleft, \circlearrowright$ characterizes the angular momentum of the excitation and the operators $c_{\alpha,n,m;\xi}$ ($c^\dagger_{\alpha,n,m;\xi}$) annihilate (create) an electron in the state α, n, m in the valley $K_\xi$.
The normalization factors N n = [(1 + δn,0)NB(ν̄−,(n+1) − ν̄+,n)]1/2 and N�n = [(1 + δn,0)NB(ν̄−,n − ν̄+,(n+1))]1/2 are used to ensure the bosonic commutation relations of the exciton operators, [ψA(n, ξ), ψ ′, ξ′)] = δA,A′δξ,ξ′δn,n′ , where NB is the total number of states per LL in a sample, including the two-fold spin-degeneracy. These commutation rela- tions are obtained within the mean-field approximation with 〈c†α,n,m;ξcα′,n′,m′;ξ′〉 = δξ,ξ′δα,α′δn,n′δm,m′(δα,− + δα,+ν̄α,n), where 0 ≤ ν̄α,n ≤ 1 is the partial filling fac- tor of the n-th LL. Similarly to magneto-optical selec- tion rules in graphene [20, 21, 22], α, n ⇒ α′, n ± 1, - polarized phonons are coupled to electronic transitions with −, (n + 1) ⇒ +, n, and �-polarized phonons to −, n ⇒ +, (n + 1) magneto-excitons, at the same en- ergy Ωn ≡ 2(v/λB)( n+ 1) (Fig. 1), which follows directly from the composition of the LL in graphene and the form of the electron-phonon coupling in Eq. (1). In contrast to photons that couple to the valley-symmetric mode ψA,s(n) = [ψA(n,K+) + ψA(n,K−)]/ 2, electron-phonon interaction in Eq.(1) couples phonons to the valley-antisymmetric magnetoex- citon ψA,as(n) = [ψA(n,K+)− ψA(n,K−)]/ In terms of magnetoexcitons we can, now, rewrite the electron-phonon Hamiltonian in a bosonized form, as τ=s,as A,τ (n)ψA,τ (n) + AbA (3) gA(n) ψA;as(n) + bAψ A;as(n) g (n) = g (1 + δn,0)γ ν̄−,(n+1) − ν̄+,n, g�(n) = g (1 + δn,0)γ ν̄−,n − ν̄+,(n+1), where gA are the effective coupling constants, with γ = 3a2/2πλ2B and a = 1.4Å (distance between neighbor- ing carbon atoms). In the Hamiltonian (3), we have omit- ted electronic excitations with a higher angular momen- tum which do not couple to the in-plane optical phonon modes (e.g., n ⇒ n′, with n′ 6= n ± 1). The dressed phonon operator corresponding to the Hamiltonian (3) is obtained by solving Dyson’s equation. 
The pole of the propagator gives the antisymmetric coupled mode fre- quencies ω̃A, ω̃2A − ω2 = 4ω n=nF+1 ω̃2A − Ω2n ∆nF g A(nF ) ω̃2A −∆2nF , (4) where nF stands for the number of the highest fully occu- pied LL in the spectrum, and ∆n = 2(v/λB)( n+ 1−√ n). In Eq. (4), the sum (extended up to the high- energy cut-off N ∼ (λB/a)2 above which the electronic dispersion is no longer linear) takes into account inter- band magnetoexcitons, and the last term gives a small correction due to an intraband magnetoexciton. In the small-field limit and large doping (nF ≫ 1), solution of Eq. (4) reproduces the zero-field result [10, 11] if one replaces the sum by an integral, n=0 → dn, ap- proximates n+ 1 ≈ 2 n and ∆nF ≈ 0, and, then, linearizes Eq. (4) by replacing ω̃A by ω in the de- nominator, ω̃ ≃ ω̃0 + λ ω + 2 2nFv/λB ω − 2 2nFv/λB ω̃0 ≃ ω + 2 ω2 − Ω2n where λ = (2/ 3π)(g/t)2 ≃ 3.3× 10−3 is the same as in Refs. [10, 16] (t = 2v/3a ∼ 3eV is the A−B hopping am- plitude), and ω̃0 is the renormalized phonon frequency in an undoped graphene sheet at B = 0. The only variation arises at high fields, ω̃0 & 2v/λB, where for nF = 0 the linearized Eq. (4) yields ω̃ ≃ ω̃0 − g2(0) (ω̃0λB/ 2v)2 − 1 The strongest effect of the phonon coupling to elec- tron modes occurs when the frequency of the former coincides with the frequency Ωn of one of the magne- toexcitons ψA,as(n). In such a case, the sum on the right-hand-side of the eigenvalue equation (4) is domi- nated by the resonance term and may be approximated by 2ωg2A(n)/ (ω̃A − Ωn). This results in a fine structure of mixed phonon-magnetoexciton modes, ψA,as(n) cos θ+ bA sin θ with frequency ω̃ A and ψA,as(n) sin θ − bA cos θ with frequency ω̃−A [where cot 2θ = (Ωn−ω̃0)/2gA], which are determined for each polarisation (A = ,� ) sepa- rately, (n) = 1 (Ωn + ω̃0)∓ (Ωn − ω̃0)2 + g2A(n). 
(5)

A generic form of the phonon-magnetoexciton anticrossing and of the formation of the coupled modes $\tilde\omega^{\pm}_A(n)$ in undoped graphene (i.e., $\nu = 0$) is illustrated in Fig. 2(a). Such anticrossing and mode mixing are similar to those described by Ando [16]. They can manifest themselves in Raman spectroscopy, as a fine structure acquired by the G-line (earlier attributed [8, 9, 17, 18, 19] to the in-plane optical phonon at the $\Gamma$-point, the $E_{2g}$ mode) under the magnetophonon resonance conditions. The effect is strongest for the resonance $\Omega_{n=0} \approx \tilde\omega_0$ between the phonon and the magnetoexcitons based upon the $-,1 \Rightarrow 0$ and $0 \Rightarrow +,1$ transitions. When approaching the resonance (by sweeping the magnetic field), the phonon line becomes accompanied by a weak satellite that moves towards it and gains intensity. Exactly at the magnetophonon resonance, where both the upper mode [$\tilde\omega^{+}_A(n)$] and the lower mode [$\tilde\omega^{-}_A(n)$] consist of an equal-weight superposition of the phonon and the resonant exciton, with $\cos\theta = \sin\theta = 1/\sqrt{2}$, the G-band in graphene would appear as two lines.

For $\Omega_{n=0} = \sqrt{2}\,v/\lambda_B \approx 36\sqrt{B[\mathrm{T}]}$ meV (see [16, 24]) and $\tilde\omega_0 \simeq 200$ meV, this resonance occurs in an experimentally accessible field range, $B_0 \simeq 30$ T. For the filling factor $\nu = 0$, the central LL ($n = 0$) is always half-filled. Then the coupling and, therefore, the splitting of the ↻- and ↺-polarized modes coincide, $g_{\circlearrowright} = g_{\circlearrowleft}$, thus giving rise to a pair of peaks at the energies $\tilde\omega^{\pm} = \tilde\omega_0 \pm g_{\circlearrowright}$, sketched in part I of Fig. 2(b). For the magnetic field value $B_0 \simeq 30$ T and $g \simeq 0.28$ eV [12], we estimate this splitting as $2g_A \sim 16$ meV ($\sim 130$ cm$^{-1}$), which largely exceeds the G-band width observed in Refs. [8, 9, 17, 18, 19].

FIG. 2: (a) Coupled phonon and magneto-excitons as a function of the magnetic field. Energies are in units of the bare phonon energy $\omega$. Dashed lines indicate the uncoupled valley-symmetric modes, with $g_A = 0$. (b) Mode splitting as a function of the filling factor, as may be seen in Raman spectroscopy, with the resonance condition $\Omega_{n=0} \approx \tilde\omega_0$, for $\nu = 0$ (in I), $0 < |\nu| < 2$ (in II), and $\nu = \pm 2$ (in III). The absolute intensity of the modes is in arbitrary units, but the height and the width reflect the expected relative intensities. (c) Mode splitting for $n = 0$, as a function of the filling factor $\nu$. (d) Same as in (c) for $n \geq 1$.

Doping of graphene changes the strength of the coupling constants $g_{\circlearrowright}$ and $g_{\circlearrowleft}$, as shown in Fig. 2(c). This is because a higher (lower) occupancy of the $n = 0$ LL reduces (enhances) the oscillator strength of the ↺-polarized transition due to the availability of filled and empty states in the involved LLs, whereas the same change in the electron density has the opposite effect on $g_{\circlearrowright}$. As a result, for an arbitrary filling factor $-2 < \nu < 2$, we predict that, in the vicinity of the magnetophonon resonance, the phonon mode (and, therefore, the G-band in the Raman spectrum) should split into four lines [part II in Fig. 2(b)], with $\tilde\omega^{\pm}_{\circlearrowright} = \tilde\omega_0 \pm g_{\circlearrowright}$ for ↻-polarized and $\tilde\omega^{\pm}_{\circlearrowleft} = \tilde\omega_0 \pm g_{\circlearrowleft}$ for ↺-polarized phonons. In the quantum Hall state at filling factor $\nu = 2$, the transition $-,1 \Rightarrow 0$ becomes blocked and no longer affects the frequency of a ↺-polarized phonon, whereas the transition $0 \Rightarrow +,1$ acquires the maximum strength, thus increasing the coupling parameter $g_{\circlearrowright}$. This leads to a magnetophonon resonance fine structure consisting of three peaks, with an even larger splitting between the side lines, as sketched in part III of Fig. 2(b). Interestingly, this may enable one to directly observe lattice modes with a definite circular polarization. A further increase of the electron filling factor reduces the side-line splitting, which should completely disappear at $\nu = 6$, after the transition $0 \Rightarrow +,1$ becomes blocked by a complete filling of the $+,1$ LL [Fig. 2(c)].
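The numbers quoted above can be cross-checked with a short Python sketch. It is only an illustration under stated assumptions, not part of the original calculation: the magnetic length is taken as $\lambda_B \approx 25.66\ \mathrm{nm}/\sqrt{B[\mathrm{T}]}$, the effective coupling is read from Eq. (3) as $g_A = g\sqrt{(1+\delta_{n,0})\gamma}\sqrt{\Delta\bar\nu}$ with $\gamma = 3\sqrt{3}a^2/2\pi\lambda_B^2$, and the map from $\nu$ to the partial fillings (each LL holding four units of $\nu$) is an assumption made for this example. All function names are hypothetical.

```python
import math

# Assumed constants (not all are stated explicitly in the text).
HBAR_V = 0.6582        # hbar * v in eV*nm for v = 1e8 cm/s, the value of ref. [24]
LAMBDA_B_1T = 25.656   # magnetic length in nm at B = 1 T, lambda_B = sqrt(hbar / (e B))
A_CC = 0.14            # carbon-carbon distance a in nm (1.4 Angstrom)
G_EPH = 0.28           # electron-phonon coupling g in eV
OMEGA_0 = 0.200        # renormalized phonon energy in eV


def magnetic_length(b_tesla):
    """Magnetic length in nm."""
    return LAMBDA_B_1T / math.sqrt(b_tesla)


def exciton_energy(n, b_tesla):
    """Omega_n = sqrt(2) (v / lambda_B) (sqrt(n) + sqrt(n + 1)), in eV."""
    level_sum = math.sqrt(n) + math.sqrt(n + 1)
    return math.sqrt(2) * HBAR_V / magnetic_length(b_tesla) * level_sum


def resonance_field(n):
    """Field (tesla) at which Omega_n equals the phonon energy; Omega_n scales as sqrt(B)."""
    return (OMEGA_0 / exciton_energy(n, 1.0)) ** 2


def partial_fillings(nu):
    """Assumed partial fillings of the (-,1), 0 and (+,1) levels for filling factor nu."""
    def clamp(x):
        return min(1.0, max(0.0, x))
    return clamp((nu + 6) / 4), clamp((nu + 2) / 4), clamp((nu - 2) / 4)


def couplings(nu, b_tesla):
    """Effective couplings (g_left, g_right) in eV for the n = 0 resonance."""
    gamma = 3 * math.sqrt(3) * A_CC ** 2 / (2 * math.pi * magnetic_length(b_tesla) ** 2)
    scale = G_EPH * math.sqrt(2 * gamma)  # (1 + delta_{n,0}) = 2 for n = 0
    nu_m1, nu_0, nu_p1 = partial_fillings(nu)
    g_left = scale * math.sqrt(max(0.0, nu_m1 - nu_0))   # -,1 => 0 transition
    g_right = scale * math.sqrt(max(0.0, nu_0 - nu_p1))  # 0 => +,1 transition
    return g_left, g_right


def line_count(nu, b_tesla):
    """Number of distinct lines: a coupled mode gives two, an uncoupled one stays unshifted."""
    lines = set()
    for g in couplings(nu, b_tesla):
        shifts = (-g, g) if g > 1e-12 else (0.0,)
        lines.update(round(s, 9) for s in shifts)
    return len(lines)


B0 = resonance_field(0)
B1 = resonance_field(1)
g_left0, g_right0 = couplings(0, 30.0)
print(f"B_0 = {B0:.1f} T, B_1 = {B1:.1f} T")
print(f"2 g_A at nu = 0, B = 30 T: {2e3 * g_left0:.1f} meV")
print({nu: line_count(nu, 30.0) for nu in (0, 1, 2, 6)})
```

Under these assumptions the sketch gives a resonance field close to 30 T, a splitting of about 15 meV at $\nu = 0$, and the sequence of two, four, three and one lines described above for $\nu = 0$, $0 < \nu < 2$, $\nu = 2$ and $\nu = 6$.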
The same arguments hold for p-doped graphene, though in this case the roles of the ↻- and ↺-polarized modes are interchanged.

Magnetophonon resonances with other possible inter-LL transitions $n \Rightarrow n+1$ occur at much lower magnetic fields, $B_n = B_0/(\sqrt{n} + \sqrt{n+1})^2$. For example, a resonant phonon coupling with the magnetoexciton $\psi_{A;as}(1)$ is expected to occur at $B_1 \approx 5$ T. Its description remains qualitatively similar, though for $n > 0$ the mode splitting is less pronounced because of the B-field dependence of the coupling constants in Eq. (3). One finds that $g_{\circlearrowright} = g_{\circlearrowleft}$ for $|\nu| < 2(2n - 1)$. At $\nu = 2(2n - 1)$, the filling of the $n$-th LL starts changing, which reduces the splitting of the ↺-polarized mode and gives rise to the four-peak structure. At $\nu = 2(2n + 1)$, where the $+,n$ LL becomes completely filled, the splitting of the ↺-polarized phonon vanishes, thus resulting in the three-peak fine structure [part III in Fig. 2(b)], which would persist up to $\nu = 2(2n + 3)$. This is because the splitting of the ↻-polarized modes remains constant up to the filling factor $\nu = 2(2n + 1)$, above which the population of the $+,(n+1)$ LL starts to suppress the value of $g_{\circlearrowright}$, until the latter vanishes at $\nu = 2(2n + 3)$ [see Fig. 2(d)].

In conclusion, we have predicted a filling-factor dependence of the fine structure acquired by the in-plane ($E_{2g}$) optical phonon in graphene when the latter is in resonance with one of the inter-LL transitions in this material. The effect is expected to be most pronounced when the phonon is resonantly coupled to the $0 \Rightarrow +,1$ and $-,1 \Rightarrow 0$ transitions, which requires a magnetic field $B_0 \simeq 30$ T. The predicted mode splitting may be used to measure directly the strength of the electron-phonon coupling, and also to distinguish between circularly (left- and right-hand) polarized lattice modes.

We thank D. Abergel, A. Ferrari, P. Lederer, and A. Pinczuk for useful discussions.
This work was supported by Agence Nationale de la Recherche Grant ANR-06-NANO-019-03 and EPSRC-Lancaster Portfolio Partnership EP/C511743. We thank the MPI-PKS workshop 'Dynamics and Relaxation in Complex Quantum and Classical Systems and Nanostructures' and the Kavli Institute for Theoretical Physics, UCSB (NSF PHY99-07949) for hospitality.

[1] W. Kohn, Phys. Rev. Lett. 2, 393 (1959).
[2] G.D. Mahan, Many-Particle Physics, Kluwer Academic, New York 2000.
[3] K. Novoselov et al., Science 306, 666 (2004).
[4] K. Novoselov et al., Nature 438, 197 (2005).
[5] Y. Zhang et al., Nature 438, 201 (2005).
[6] Y. Zhang et al., Phys. Rev. Lett. 96, 136806 (2006).
[7] K. Novoselov et al., Nature Phys. 2, 177 (2006).
[8] S. Pisana et al., Nat. Mater. 6, 198 (2007).
[9] J. Yan, Y. Zhang, P. Kim, and A. Pinczuk, Phys. Rev. Lett. 98, 166802 (2007).
[10] T. Ando, J. Phys. Soc. Jpn. 75, 124701 (2006).
[11] A.H. Castro Neto and F. Guinea, Phys. Rev. B 75, 045404 (2007).
[12] M. Lazzeri and F. Mauri, Phys. Rev. Lett. 97, 266407 (2006).
[13] J.W. McClure, Phys. Rev. 104, 666 (1956).
[14] J.P. Maneval, A. Zylberzstejn, and H.F. Budd, Phys. Rev. Lett. 23, 848 (1969); G. Bauer and H. Kahlert, Phys. Rev. B 5, 566 (1972).
[15] R.J. Nicholas, S.J. Sessions, and J.C. Portal, Appl. Phys. Lett. 37, 178 (1980); T.A. Vaughan et al., Phys. Rev. B 53, 16481 (1996).
[16] T. Ando, J. Phys. Soc. Jpn. 76, 024712 (2007).
[17] A.C. Ferrari et al., Phys. Rev. Lett. 97, 187401 (2006).
[18] A. Gupta et al., Nano Lett. 6, 2667 (2006).
[19] D. Graf et al., Nano Lett. 7, 238 (2007).
[20] A. Iyengar et al., Phys. Rev. B 75, 125430 (2007).
[21] M.L. Sadowski et al., Phys. Rev. Lett. 97, 266405 (2006).
[22] D.S.L. Abergel and V.I. Fal'ko, Phys. Rev. B 75, 155430 (2007).
[23] Numerical results yield $g = \sqrt{2\langle g_\Gamma^2\rangle_F} \simeq 0.28$ eV; S. Piscanec et al., Phys. Rev. Lett. 93, 185503 (2004).
[24] We use the reported value $v = 10^8$ cm/s; A.K. Geim and K.S. Novoselov, Nat. Mater. 6, 183 (2007).
[25] P.R. Wallace, Phys. Rev. 71, 622 (1947).
[26] R. Saito, G. Dresselhaus, M.S. Dresselhaus, Physical Properties of Carbon Nanotubes, Imperial College Press, London 1998.
[27] T. Ando, J. Phys. Soc. Jpn. 74, 777 (2005).
[28] The electron-phonon coupling is off-diagonal because a lattice distortion affects the bond length and thus the nearest-neighbor hopping between the two different sublattices [10, 11].

Erratum

In the previous version (v3) of this Letter, we underestimated the numerical value of the mode splitting of the magnetophonon resonance [see the paragraph after Eq. (5)] by a factor of 2 (the text above takes into account the corrected parameters). This is the result of two mistakes.

First, there is a factor of $\sqrt{2}$, which originates in an erroneous normalization of the circularly polarized phonons. They should indeed be defined as $u_{\circlearrowleft} = (u_x + iu_y)/\sqrt{2}$ and $u_{\circlearrowright} = (u_x - iu_y)/\sqrt{2}$ [and not as $u_{\circlearrowleft} = u_x + iu_y$ and $u_{\circlearrowright} = u_x - iu_y$, as incorrectly assumed on page 1, second column], such that the associated phonon operators $b_A$ obey the usual commutation relations $[b_A, b^{\dagger}_{A'}] = \delta_{A,A'}$, with $A = \circlearrowleft, \circlearrowright$. This yields a factor of $\sqrt{2}$ in the definition of the effective coupling constants [Eq. (3)], which read in the corrected form

$$g_{\circlearrowleft}(n) = g\sqrt{(1 + \delta_{n,0})\gamma}\,\sqrt{\bar\nu_{-,(n+1)} - \bar\nu_{+,n}}, \qquad g_{\circlearrowright}(n) = g\sqrt{(1 + \delta_{n,0})\gamma}\,\sqrt{\bar\nu_{-,n} - \bar\nu_{+,(n+1)}}.$$

As a consequence, the zero-field dimensionless coupling constant $\lambda$ [defined in the first column of page 3 of our Letter] is multiplied by a factor of 2 and becomes $\lambda = (2/\sqrt{3}\pi)(g/t)^2$.

Second, we also underestimated the numerical value of the electron-phonon coupling constant $g$ by a factor of $\sqrt{2}$. Indeed, $g$ defined in our work [see Eq. (1)] is related to $\langle g_\Gamma^2\rangle_F \simeq 0.0405$ eV$^2$ computed by Piscanec et al. [2] as $g = \sqrt{2\langle g_\Gamma^2\rangle_F} \simeq 0.28$ eV, and not as $g = \sqrt{\langle g_\Gamma^2\rangle_F} \simeq 0.2$ eV as incorrectly assumed in our Letter.

In addition, there is a substantial uncertainty in the precise value of the constant $g$.
In a tight-binding model, the latter may be related to the derivative of the hopping amplitude $t$ as a function of the carbon-carbon distance $a$ as $g = (-dt/da) \times 3/(2\sqrt{M\omega})$ [1]. Harrison's phenomenological law $t \propto 1/a^2$ then implies that $g \simeq 0.26$ eV. Experiments in graphene in zero magnetic field, Refs. [3] and [4], give for the dimensionless coupling constant $\lambda$ the values $4.4 \times 10^{-3}$ and $5.3 \times 10^{-3}$, respectively. This places $g$ between 0.3 eV and 0.36 eV, where we take into account that the value of $t$ lies between 2.7 and 3 eV. In the end, we have to take $g$ in the range between 0.26 and 0.36 eV [instead of $g \simeq 0.2$ eV], and therefore the dimensionless coupling constant becomes $\lambda \simeq (2.8 \text{ to } 5.3) \times 10^{-3}$ [instead of $\lambda \simeq 10^{-3}$].

As a result of the two factors of $\sqrt{2}$, the numerical estimate for the mode splitting $2g_A$ at $\nu = 0$ and $B \simeq 30$ T [at the discussed resonance with the $-,1 \Rightarrow 0$ and $0 \Rightarrow +,1$ transitions, see the second column of page 3] becomes $2g_A \sim 15$ meV ($\sim 120$ cm$^{-1}$) for $g \simeq 0.26$ eV and $2g_A \sim 20$ meV ($\sim 160$ cm$^{-1}$) for $g \simeq 0.36$ eV [instead of $2g_A \sim 8$ meV]. The effect is therefore twice as large as initially predicted. The conclusions of our work remain unaltered.

We would like to thank C. Faugeras and M. Potemski for drawing our attention to the underestimated value of the mode splitting. See also their recent preprint, in which they measure the magnetophonon resonance [5].

[1] T. Ando, J. Phys. Soc. Jpn. 75, 124701 (2006); ibid. 76, 024712 (2007).
[2] S. Piscanec, M. Lazzeri, F. Mauri, A. C. Ferrari, and J. Robertson, Phys. Rev. Lett. 93, 185503 (2004).
[3] S. Pisana, M. Lazzeri, C. Casiraghi, K. S. Novoselov, A. K. Geim, A. C. Ferrari, and F. Mauri, Nature Materials 6, 198 (2007).
[4] J. Yan, Y. Zhang, P. Kim, and A. Pinczuk, Phys. Rev. Lett. 98, 166802 (2007).
[5] C. Faugeras, M. Amado, P. Kossacki, M. Orlita, M. Sprinkle, C. Berger, W.A. de Heer and M. Potemski, arXiv:0907.5498.
ABSTRACT We describe a peculiar fine structure acquired by the in-plane optical phonon at the Gamma-point in graphene when it is brought into resonance with one of the inter-Landau-level transitions in this material. The effect is most pronounced when this lattice mode (associated with the G-band in the graphene Raman spectrum) is in resonance with the inter-Landau-level transitions 0 -> (+,1) and (-,1) -> 0, at a magnetic field B_0 ~ 30 T. It can be used to measure the strength of the electron-phonon coupling directly, and its filling-factor dependence can be used experimentally to detect circularly polarized lattice modes.
<|endoftext|><|startoftext|>
Pfaffians, hafnians and products of real linear functionals

Péter E. Frenkel
Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences
P.O.B. 127, 1364 Budapest, Hungary
frenkelp@renyi.hu

Abstract

We prove pfaffian and hafnian versions of Lieb's inequalities on determinants and permanents of positive semi-definite matrices. We use the hafnian inequality to improve the lower bound of Révész and Sarantopoulos on the norm of a product of linear functionals on a real Euclidean space (this subject is sometimes called the 'real linear polarization constant' problem).

Mathematics Subject Classification: 46C05, 15A15

Keywords: polarization constant, real Euclidean space, hafnian, pfaffian, positive semi-definite matrix

-1. Introduction

The contents of this paper are as follows. In Section 0, we sketch one part of the historical background: classical inequalities on determinants and permanents of positive semi-definite matrices. In Section 1, we prove pfaffian and hafnian versions of these inequalities, and we formulate Conjecture 1.5, another hafnian inequality.
In Section 2, we apply the hafnian inequality of Theorem 1.4 to our main goal: improving the lower bound of Révész and Sarantopoulos on the norm of a product of linear functionals on a real Euclidean space (this subject is sometimes called the 'real linear polarization constant' problem; its history is sketched at the end of the paper). This is achieved in Theorem 2.3. We point out that Conjecture 1.5 would be sufficient to completely settle the real linear polarization constant problem.

Partially supported by OTKA grants T 046365, K 61116 and NK 72523.

http://arxiv.org/abs/0704.0028v2

0. Old inequalities on determinants and permanents

Recall that the determinant and the permanent of an $n \times n$ matrix $A = (a_{i,j})$ are defined by

$$\det A = \sum_{\pi \in S_n} (-1)^{\pi} \prod_{i=1}^{n} a_{i,\pi(i)}, \qquad \mathrm{per}\, A = \sum_{\pi \in S_n} \prod_{i=1}^{n} a_{i,\pi(i)},$$

where $S_n$ is the symmetric group on $n$ elements. Throughout this section, we assume that $A$ is a positive semi-definite Hermitian $n \times n$ matrix (we write $A \geq 0$). For such $A$, Hadamard proved that

$$\det A \leq \prod_{i=1}^{n} a_{i,i},$$

with equality if and only if $A$ has a zero row or is a diagonal matrix. Fischer generalized this to

$$\det A \leq \det A' \cdot \det A'' \quad \text{for} \quad A = \begin{pmatrix} A' & B \\ B^* & A'' \end{pmatrix} \geq 0, \qquad (1)$$

with equality if and only if $\det A' \cdot \det A'' \cdot B = 0$.

Concerning the permanent of a positive semi-definite matrix, Marcus [Mar1, Mar2] proved that

$$\mathrm{per}\, A \geq \prod_{i=1}^{n} a_{i,i}, \qquad (2)$$

with equality if and only if $A$ has a zero row or is a diagonal matrix. Lieb [L] generalized this to

$$\mathrm{per}\, A \geq \mathrm{per}\, A' \cdot \mathrm{per}\, A'' \qquad (3)$$

for $A$ as in (1), with equality if and only if $A$ has a zero row or $B = 0$. Moreover, he proved that in the polynomial $P(\lambda)$ of degree $n'$ (= size of $A'$) defined by

$$P(\lambda) = \mathrm{per} \begin{pmatrix} \lambda A' & B \\ B^* & A'' \end{pmatrix} = \sum_{t=0}^{n'} c_t \lambda^t,$$

all coefficients $c_t$ are real and non-negative. This is indeed a stronger theorem since it implies

$$\mathrm{per}\, A = P(1) = \sum_{t=0}^{n'} c_t \geq c_{n'} = \mathrm{per}\, A' \cdot \mathrm{per}\, A''.$$

Ðoković [D, Mi] gave a simple proof of Lieb's inequalities, and showed also that if $A'$ and $A''$ are positive definite then $c_{n'-t} = 0$ if and only if all subpermanents of $B$ of order $t$ vanish.
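For small matrices, the classical inequalities recalled above can be checked directly. The following Python sketch is only an illustration, not part of the paper: it builds a real positive semi-definite matrix as the Gram matrix of an arbitrarily chosen integer matrix `G` (a hypothetical example), evaluates determinants and permanents by brute force over permutations, and collects the coefficients $c_t$ of Lieb's polynomial $P(\lambda)$ by counting, for each permutation, how many entries are taken from the block $A'$. All helper names are made up for this example.

```python
from itertools import permutations
from math import prod


def perm_sign(p):
    """Sign of a permutation given as a tuple, via its inversion count."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


def det(m):
    """Determinant by the permutation expansion (fine for small matrices)."""
    n = len(m)
    return sum(
        perm_sign(p) * prod(m[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )


def per(m):
    """Permanent by the permutation expansion."""
    n = len(m)
    return sum(prod(m[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def lieb_coefficients(m, k):
    """Coefficients c_0..c_k of P(lambda) = per [[lambda A', B], [B^T, A'']],
    where A' is the leading k x k block of the real symmetric matrix m."""
    n = len(m)
    c = [0] * (k + 1)
    for p in permutations(range(n)):
        t = sum(1 for i in range(k) if p[i] < k)  # entries taken from A'
        c[t] += prod(m[i][p[i]] for i in range(n))
    return c


def gram(rows):
    """Gram matrix of the given row vectors; always positive semi-definite."""
    return [[sum(x * y for x, y in zip(r, s)) for s in rows] for r in rows]


# Hypothetical example data: any real matrix G gives a valid test case.
G = [[2, -1, 0, 0], [1, 3, -1, 0], [0, 1, 2, -1], [-1, 0, 1, 2]]
A = gram(G)
k = 2
A1 = [row[:k] for row in A[:k]]   # block A'
A2 = [row[k:] for row in A[k:]]   # block A''
diagonal_product = prod(A[i][i] for i in range(len(A)))
c = lieb_coefficients(A, k)

print("det A =", det(A), " prod a_ii =", diagonal_product, " per A =", per(A))
print("Lieb coefficients c_t:", c)
```

The brute-force expansion costs $n!$ operations, so this is meant for sanity checks on small examples only; the integer Gram matrix keeps all comparisons exact.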
Lieb [L℄ also states an analogous (and analogously provable) theorem for determinants: for A as in (1), let D(λ) = det λA′ B B∗ A′′ If detA′ · detA′′ = 0, then D(λ) = 0. If A′ and A′′ are positive de�nite, then (−1)tdn′−t is positive for t ≤ rk B and is zero for t > rk B. Remark. In all of Lieb's inequalities mentioned above, the ondition that the matrix A is positive semi-de�nite an be repla ed by the weaker ondition that the diagonal blo ks A′ and A′′ are positive semi-de�nite. The proof goes through virtually un hanged. Alternatively, this stronger form of the inequali- ties an be easily dedu ed from the seemingly weaker form above. 1 New inequalities on pfa�ans and hafnians For an n × n matrix A = (ai,j) and subsets S, T of N := {1, . . . , n}, we write AS,T := (ai,j)i∈S,j∈T . If |T | = 2t is even, we write (−1)T := (−1)t+ 1.1 Pfa�ans As far as the appli ations in Se tion 2 are on erned, this subse tion may be skipped. Re all that the pfa�an of a 2n × 2n antisymmetri matrix C = (ci,j) is de�ned by pf C = π∈S2n (−1)πcπ(1),π(2) · · · cπ(2n−1),π(2n). We have (pf C) = detC. For antisymmetri A and symmetri B, both of size n× n, we onsider the polynomial (−1)⌊n/2⌋pf −λA B ⌊n/2⌋ Theorem 1.1 Let A and B be real n×n matri es with A antisymmetri and B symmetri . If B is positive semi-de�nite, then pt ≥ 0 for all t. If B is positive de�nite, then pt > 0 for t ≤ (rk A)/2 and pt = 0 for t > (rk A)/2. Proof. If B = (bi,j) is positive semi-de�nite, then there exist ve tors x1, . . . , xn in a real Eu lidean spa e V su h that (xi, xj) = bi,j. Re all that in the exterior tensor algebra V a positive de�nite inner produ t (and the orresponding Eu lidean norm) is de�ned by := det((vi, wj)). We have |S|=2t |T |=2t (−1)S(−1)Tpf AS,S · pf AT,T · detBN\S,N\T = |S|=2t |T |=2t (−1)Spf AS,S · xi, (−1)Tpf AT,T · j 6∈T |S|=2t (−1)Spf AS,S · Assume that B is positive de�nite. Then the ve tors xi are linearly independent. 
It follows that the tensors i6∈S xi are also linearly independent as S runs over the subsets of N . Thus pt = 0 if and only if pf AS,S = 0 for all |S| = 2t, i.e., if and only if 2t > rk A. � Theorem 1.2 Let A and B be real n × n matri es with A antisymmetri and B symmetri . Let λ ≥ 0. If B is positive semi-de�nite, then (−1)⌊n/2⌋pf −λA B ≥ detB. If B is positive de�nite, then equality o urs if and only if λA = 0. Proof. The left hand side is p0 + p1λ+ · · ·+ p⌊n/2⌋λ⌊n/2⌋. The right hand side is p0. � I am grateful to the anonymous referee of this paper for the idea of the following alternative proof of Theorems 1.1 and 1.2. We may assume B > 0, sin e every positive semi-de�nite matrix is a limit of positive de�nite ones. The matrix B−1/2AB−1/2 being real and antisymmetri , there exists a unitary matrix U su h that D := U−1B−1/2AB−1/2U is diagonal with purely imaginary eigenvalues a1 −1, . . . , an −1. The real multiset {a1, . . . , an} is invariant under a ↔ −a. We have = det −λA B = det BUDU−1 BUDU−1 = det −λD 1 0 U−1 = det −1 ai = detB2 · (1 + a2iλ). Extra ting square roots, and hoosing the sign in a ordan e with p0 = +detB, we get t = (−1)⌊n/2⌋pf −λA B = detB · (1 + a2iλ), when e both theorems immediately follow, sin e detB > 0. 1.2 Hafnians Re all that the hafnian of a 2n× 2n symmetri matrix C = (ci,j) is de�ned by haf C = π∈S2n cπ(1),π(2) · · · cπ(2n−1),π(2n). For symmetri A and B, both of size n× n, we onsider the polynomial ⌊n/2⌋ Theorem 1.3 Let A and B be symmetri real n× n matri es. If B is positive semi-de�nite, then ht ≥ 0 for all t. If B is positive de�nite, then ht = 0 if and only if all 2t× 2t subhafnians of A vanish. Proof. If B = (bi,j) is positive semi-de�nite, then there exist ve tors x1, . . . , xn in a real Eu lidean spa e V su h that (xi, xj) = bi,j . Re all [Mar1, Mar2, MN, Mi℄ that in the symmetri tensor algebra SV a positive de�nite inner produ t (and the orresponding Eu lidean norm) is de�ned by := per ((vi, wj)). 
We have

$$h_t = \sum_{|S|=2t} \sum_{|T|=2t} \mathrm{haf}\, A_{S,S} \cdot \mathrm{haf}\, A_{T,T} \cdot \mathrm{per}\, B_{N\setminus S, N\setminus T} = \Big\| \sum_{|S|=2t} \mathrm{haf}\, A_{S,S} \cdot \prod_{i \notin S} x_i \Big\|^2 \geq 0.$$

Assume that $B$ is positive definite. Then the vectors $x_i$ are linearly independent. It follows that the tensors $\prod_{i \notin S} x_i$ are also linearly independent as $S$ runs over the subsets of $N$. Thus $h_t = 0$ if and only if $\mathrm{haf}\, A_{S,S} = 0$ for all $|S| = 2t$. □

Theorem 1.4 Let $A$ and $B$ be symmetric real $n \times n$ matrices. Let $\lambda \geq 0$. If $B$ is positive semi-definite, then

$$\mathrm{haf} \begin{pmatrix} \lambda A & B \\ B & A \end{pmatrix} \geq \mathrm{per}\, B.$$

If $B$ is positive definite, then equality occurs if and only if $A$ is a diagonal matrix or $\lambda = 0$.

Proof. The left hand side is $h_0 + h_1\lambda + \cdots + h_{\lfloor n/2 \rfloor}\lambda^{\lfloor n/2 \rfloor}$. The right hand side is $h_0$. □

Setting $A = B$ and $\lambda = 1$, and combining with Marcus's inequality (2), we arrive at the case $p = 1$ of

Conjecture 1.5 If $A = (a_{i,j})$ is a positive semi-definite symmetric real $n \times n$ matrix, then the hafnian of the $2pn \times 2pn$ matrix consisting of $2p \times 2p$ blocks $A$ is at least $(2p - 1)!!^{\,n} \prod_{i=1}^{n} a_{i,i}^{p}$, with equality if and only if $A$ has a zero row or is a diagonal matrix.

2 Products of real linear functionals

In this section, we apply Theorem 1.4 to products of jointly normal random variables and then to products of real linear functionals, which was the main motivation for this work. The ideas in this section are analogous to those that Arias-de-Reyna [A] used in the complex case.

Let $\xi_1, \ldots, \xi_d$ denote independent random variables with standard Gaussian distribution, i.e., with joint density function $(2\pi)^{-d/2} \exp(-|\xi|^2/2)$, where $|\xi|^2 = \sum_{k=1}^{d} \xi_k^2$. We write $E f(\xi)$ for the expectation of a function $f = f(\xi) = f(\xi_1, \ldots, \xi_d)$. Recall that

$$E\, \xi_k^{2p} = (2p - 1)!! = (2p - 1)(2p - 3) \cdots 3 \cdot 1$$

for $k = 1, \ldots, d$ (easy inductive proof via integration by parts), and thus

$$E \prod_{k=1}^{d} \xi_k^{2p_k} = \prod_{k=1}^{d} (2p_k - 1)!!.$$

In $\mathbb{R}^d$, we write $(\cdot, \cdot)$ for the standard Euclidean inner product. We recall the well-known [B2, G, S, Z]

Wick formula. Let $x_1, \ldots, x_n$ be vectors in $\mathbb{R}^d$ with Gram matrix $A = ((x_i, x_j))$. Then

$$E \prod_{i=1}^{n} (x_i, \xi) = \mathrm{haf}\, A. \qquad (4)$$

(For odd $n$, we define $\mathrm{haf}\, A = 0$.)

Proof.
Both sides are multilinear in the $x_i$, so we may assume that each $x_i$ is an element of the standard orthonormal basis $e_1, \ldots, e_d$. If there is an $e_k$ that occurs an odd number of times among the $x_i$, then both sides are zero. If each $e_k$ occurs $2p_k$ times, then the left hand side is $E \prod_{k=1}^{d} \xi_k^{2p_k}$, and the right hand side is $\prod_{k=1}^{d} (2p_k - 1)!!$, which are equal. □

The following theorems are easy corollaries of Theorem 1.4 together with the Wick formula (4) and Marcus's theorem (2).

Theorem 2.1 If $X_1, \ldots, X_n$ are jointly normal random variables with zero expectation, then

$$E(X_1^2 \cdots X_n^2) \geq E X_1^2 \cdots E X_n^2.$$

Equality holds if and only if they are independent or at least one of them is almost surely zero.

Proof. The variables can be written as $X_i = (x_i, \xi)$ with $\xi$ of standard normal distribution and the $x_i$ constant vectors with a positive semi-definite Gram matrix $A = (a_{i,j}) = ((x_i, x_j))$. Then

$$E \prod_{i=1}^{n} X_i^2 = E \prod_{i=1}^{n} (x_i, \xi)^2 = \mathrm{haf} \begin{pmatrix} A & A \\ A & A \end{pmatrix} \geq \mathrm{per}\, A \geq \prod_{i=1}^{n} a_{i,i} = \prod_{i=1}^{n} E (x_i, \xi)^2 = \prod_{i=1}^{n} E X_i^2,$$

with equality if and only if $A$ is a diagonal matrix or has a zero row, i.e., the $x_i$ are pairwise orthogonal or at least one of them is zero. □

The generalization of Theorem 2.1 to an arbitrary even exponent $2p$ is equivalent to Conjecture 1.5.

Theorem 2.2 For any $x_1, \ldots, x_n \in \mathbb{R}^d$, $|x_i| = 1$, the average of $\prod_{i=1}^{n} (x_i, \xi)^2$ on the unit sphere $\{\xi \in \mathbb{R}^d : |\xi| = 1\}$ is at least

$$\frac{\Gamma(d/2)}{2^n\, \Gamma(d/2 + n)} = \frac{(d - 2)!!}{(d + 2n - 2)!!} = \frac{1}{d(d + 2)(d + 4) \cdots (d + 2n - 2)},$$

with equality if and only if the vectors $x_i$ are pairwise orthogonal.

Proof. The average on the unit sphere is the constant in the theorem times the expectation w.r.t. the standard Gaussian measure (see e.g. [B1]). By Theorem 2.1, the latter expectation is minimal if and only if the $x_i$ are pairwise orthogonal, in which case it is 1. □

Theorem 2.3 For real linear functionals $f_i$ on a real Euclidean space,

$$\|f_1 \cdots f_n\| \geq \frac{\|f_1\| \cdots \|f_n\|}{\sqrt{n(n + 2)(n + 4) \cdots (3n - 2)}}.$$

Here $\|\cdot\|$ means the supremum of the absolute value on the unit sphere. In the infinite-dimensional case, functionals with infinite norm may be allowed. Then the convention $0 \cdot \infty = 0$ should be used on the right hand side.

Proof. We may assume that the space is $\mathbb{R}^d$ with $d \leq n$, and the functionals are given by $f_i(\xi) = (x_i, \xi)$ with $\|f_i\| = |x_i| = 1$. Then $\|f_1 \cdots f_n\|^2$ is at least the average of $\prod_{i=1}^{n} f_i^2(\xi) = \prod_{i=1}^{n} (x_i, \xi)^2$ on the unit sphere, which by Theorem 2.2 and $d \leq n$ is at least $1/(n(n + 2)(n + 4) \cdots (3n - 2))$. □

It is an unsolved problem, raised by Benítez, Sarantopoulos and Tonge [BST] (1998), whether Theorem 2.3 is true with $n^n$ under the square root sign in the denominator on the right hand side. This is called the 'real linear polarization constant' problem. In the complex case, the affirmative answer was proved by Arias-de-Reyna [A] in 1998, based on the complex analog of the Wick formula [A, B2, G] and on Lieb's inequality (3). Keith Ball [Ball] gave another proof of the affirmative answer in the complex case by solving the complex plank problem. In the real case, the affirmative answer for $n \leq 5$ was proved by Pappas and Révész [PR] in 2004. For general $n$, the best estimate known before the present paper was that of Révész and Sarantopoulos [RS] (2004), based on results of [MST], with $(2n)^n/4$ under the square root sign. See [Mat1, Mat2, MM, R] for accounts on this and related questions.

Note that

$$n(n + 2)(n + 4) \cdots (3n - 2) = \exp\big(\log n + \log(n + 2) + \log(n + 4) + \cdots + \log(3n - 2)\big) < \exp\Big(\frac{1}{2} \int_n^{3n} \log u \, du\Big) = \exp\big([u(\log u - 1)]_n^{3n}/2\big)$$

$$= \exp\big((3n \log 3n - 3n - n \log n + n)/2\big) = \exp\big(n(2 \log n + 3 \log 3 - 2)/2\big) = \Big(\frac{3\sqrt{3}}{e}\, n\Big)^n,$$

and $3\sqrt{3}/e < 3 \cdot 1.8/2.7 = 2$, so Theorem 2.3 is an improvement. Note also that the statement with $n^n$ under the square root sign would follow from Conjecture 1.5.

Acknowledgements

I am grateful to Péter Major, Máté Matolcsi and Szilárd Révész for helpful discussions, and to the anonymous referee for useful comments.

References

[A] J. Arias-de-Reyna, Gaussian variables, polynomials and permanents, Lin. Alg. Appl. 285 (1998), 107-114.
The referee of the present paper called my attention to the fact that Arias-de-Reyna used only the special case of (3) where the matrix A is of rank 1. This is much simpler than (3) in general; it can be proved essentially by the argument Marcus used in [Mar1, Mar2] to prove the even more special case n = 1, which still implies inequality (2).

[Ball] K. M. Ball, The complex plank problem, Bull. London Math. Soc. 33 (2001), 433-442.
[B1] A. Barvinok, Estimating L∞ norms by L2k norms for functions on orbits, Found. Comput. Math. 2 (2002), 393-412.
[B2] A. Barvinok, Integration and optimization of multivariate polynomials by restriction onto a random subspace, arXiv preprint: math.OC/0502298
[BST] C. Benítez, Y. Sarantopoulos, A. Tonge, Lower bounds for norms of products of polynomials, Math. Proc. Camb. Phil. Soc. 124 (1998), 395-408.
[D] D. Ž. Ðoković, Simple proof of a theorem on permanents, Glasgow Math. J. 10 (1969), 52-54.
[G] L. Gurvits, Classical complexity and quantum entanglement, J. Comput. System Sci. 69 (2004), no. 3, 448-484.
[L] E. H. Lieb, Proofs of some conjectures on permanents, J. Math. Mech. 16 (1966), 127-134.
[Mar1] M. Marcus, The permanent analogue of the Hadamard determinant theorem, Bull. Amer. Math. Soc. 69 (1963), 494-496.
[Mar2] M. Marcus, The Hadamard theorem for permanents, Proc. Amer. Math. Soc. 15 (1964), 967-973.
[MN] M. Marcus, M. Newman, The permanent function as an inner product, Bull. Amer. Math. Soc. 67 (1961), 223-224.
[Mat1] M. Matolcsi, A geometric estimate on the norm of product of functionals, Lin. Alg. Appl. 405 (2005), 304-310.
[Mat2] M. Matolcsi, The linear polarization constant of R^n, Acta Math. Hungar. 108 (2005), no. 1-2, 129-136.
[MM] M. Matolcsi, G. A. Muñoz, On the real linear polarization constant problem, Math. Inequal. Appl. 9 (2006), no. 3, 485-494.
[Mi] H. Minc, Permanents, Encyclopedia of Mathematics and its Applications, Addison-Wesley, 1978
[MST] G. A. Muñoz, Y. Sarantopoulos, A. Tonge, Complexifications of real Banach spaces, polynomials and multilinear maps, Studia Math. 134 (1999), no. 1, 1-33.
[PR] A. Pappas, Sz. Révész, Linear polarization constants..., J. Math. Anal. Appl. 300 (2004), 129-146.
[R] Sz. Gy. Révész, Inequalities for multivariate polynomials, Annals of the Marie Curie Fellowships 4 (2006), http://www.mariecurie.org/annals/, arXiv preprint: math.CA/0703387
[RS] Sz. Gy. Révész, Y. Sarantopoulos, Plank problems, polarization and Chebyshev constants, J. Korean Math. Soc. 41 (2004), 157-174.
[S] B. Simon, The P(φ)2 Euclidean (Quantum) Field Theory, Princeton Series in Physics, Princeton University Press, 1974
[Z] A. Zvonkin, Matrix integrals and map enumeration: an accessible introduction, Combinatorics and physics (Marseille, 1995), Math. Comput. Modelling 26 (1997), 281-304.

ABSTRACT We prove pfaffian and hafnian versions of Lieb's inequalities on determinants and permanents of positive semi-definite matrices. We use the hafnian inequality to improve the lower bound of R\'ev\'esz and Sarantopoulos on the norm of a product of linear functionals on a real Euclidean space (this subject is sometimes called the `real linear polarization constant' problem).
<|endoftext|><|startoftext|>
Understanding the Flavor Symmetry Breaking and Nucleon Flavor-Spin Structure within Chiral Quark Model

Zhan Shu, Xiao-Lin Chen, and Wei-Zhen Deng∗
Department of Physics, Peking University, Beijing 100871, China

Abstract

In χQM, a quark can emit Goldstone bosons. The flavor symmetry breaking in the Goldstone boson emission process is used to interpret the nucleon flavor-spin structure. In this paper, we study the inner structure of constituent quarks implied in χQM, caused by the Goldstone boson emission process in the nucleon.
From a simplified model Hamiltonian derived from χQM, the intrinsic wave functions of constituent quarks are determined. The resulting transition probabilities for the emission of a Goldstone boson from a quark then give a reasonable interpretation of the flavor symmetry breaking in the nucleon flavor-spin structure.

PACS numbers: 12.39.-x, 12.39.Fe, 14.20.Dh

∗Electronic address: dwz@th.phy.pku.edu.cn

http://arxiv.org/abs/0704.0029v2

I. INTRODUCTION

The measurements of the polarized structure functions of the nucleon in deep inelastic scattering (DIS) experiments[1, 2, 3, 4] reveal the complexity of the proton spin structure. Only a portion of the proton spin is carried by valence quarks. Moreover, several experiments[5, 6, 7] clearly indicate the ū-d̄ asymmetry as well as the existence of the strange quark content s̄ in the proton sea. The strange quarks in the proton sea are also negatively polarized. The DIS results deviate significantly from the naïve quark model (NQM) expectation.

NQM gives many fairly good descriptions of hadron properties. Why does NQM work? It is a puzzle that the quarks inside a hadron can be treated as non-relativistic particles in NQM. The chiral quark model (χQM) tries to bridge QCD and NQM. It was originated by Weinberg[8] and formulated by Manohar and Georgi[9]. Between the QCD confinement scale (ΛQCD ≃ 200 MeV) and the chiral symmetry breaking scale (ΛχSB ≃ 1 GeV), the strong interaction is described by an effective Lagrangian of quarks q, gluons g and Nambu-Goldstone bosons Π. An important feature of the χQM is that, between ΛQCD and ΛχSB, the internal gluon effects in a hadron can be small compared to those of the internal Goldstone bosons Π and quarks q, so the effective degrees of freedom in this region can be q and Π.

It is interesting that χQM can also be used to explain why NQM does not work in the above DIS experiments.
By the emission of a Goldstone boson, χQM allows the fluctuation of a quark q into a recoiling quark plus a Goldstone boson, q → q′Π. The q′Π system then further splits to generate the quark sea through

• the helicity-flipping process
$$ q_\uparrow \longrightarrow \Pi + q'_\downarrow \longrightarrow (q\bar{q}') + q'_\downarrow \tag{1} $$
• and the helicity-non-flipping process
$$ q_\uparrow \longrightarrow \Pi + q'_\uparrow \longrightarrow (q\bar{q}') + q'_\uparrow \tag{2} $$

where the subscript indicates the helicity of the quark. In both processes, q′Π is in a relative P-wave state. In the helicity-flipping process (1), the orbital angular momentum along the helicity direction must be 〈lz〉 = +1. In the helicity-non-flipping process (2), 〈lz〉 = 0. These processes modify the spin content of the nucleon because a quark changes its helicity in (1). They also modify the flavor content because the quark sea generated from Π is flavor dependent[10, 11].

χQM was first used to explain the nucleon sea flavor asymmetry and the smallness of the quark spin fraction by Eichten, Hinchliffe and Quigg[10]. The flavor asymmetry of the sea quark distribution arises from the mass differences between quark flavors and between Goldstone bosons. Only the lightest Goldstone boson, π, was considered since its contribution dominates. From a perturbation calculation, the probability for an up quark to emit a π+ was estimated to be a = 0.083. This would induce a flavor asymmetry in the parton distributions of the nucleon and other hadrons. However, the estimated transition probability is not large enough to fully account for the flavor asymmetry seen in DIS experiments.

Contributions from other Π's and even η′ were considered by Cheng and Li[11]. Explicit SUf(3) breaking in the transition probabilities was later introduced in refs. 12, 13 and further used by several authors[14, 15, 16, 17, 18, 19]. Nevertheless, in all these calculations, the transition probabilities were put into the model by hand.
To fit the experimental data, the probability of an up quark emitting a π+ needs to be set to a ≳ 0.1, which is about 20% larger than the perturbation calculation. Although the probability of π emission can be enlarged by using a higher momentum cutoff Λ > ΛχQM in the perturbation calculation [20], the chiral quark model is no longer valid at arbitrarily high energies Λ ≫ ΛχQM. We should not be surprised by this discrepancy, since the χQM works in a region right above the QCD confinement scale ΛQCD. There one may expect that the confinement effect is important and that the perturbative calculation of QCD may contain large errors.

However, there is another essential difference between the above model calculations and the perturbation calculation. In the perturbation calculation, the emitted Goldstone bosons are virtual particles. In the above model calculations, which are closely related to NQM, the Goldstone bosons are instead close to the mass shell under the non-relativistic approximation.

Since χQM can be a bridge between NQM and QCD, it is interesting to explore χQM from the NQM side, where we use the wave function method. This will give the above model calculations a concrete foundation in NQM and help us further understand the flavor symmetry breaking mechanism.

In this paper, we use the wave function method to investigate the flavor symmetry breaking in χQM. In a conventional quark model[21], a hadron consists of confined constituent quarks and its wave function is constructed in the configuration space of the constituent quarks. To incorporate the χQM transition process of emitting a Goldstone boson into the quark model, the constituent quarks will have intrinsic wave functions within the configuration q + q′Π. In Sec. II, we first present the composite wave function of constituent quarks including components of q′Π. The wave functions and the transition probabilities of q → q′Π are determined from a simplified χQM Hamiltonian. In Sec. III and Sec.
IV, the obtained transition probabilities are used to calculate nucleon flavor-spin structure and baryon octet magnetic moments respectively. The numerical results and a brief summary are presented in Sec. V. II. THE WAVE FUNCTION OF A CONSTITUENT QUARK In χQM, the effective Lagrangian below the chiral symmetry breaking scale ΛχQM involves quarks, gluons, and Goldstone bosons. The first few terms in this Lagrangian are[9]: LχQM = ψ̄(iDµ + Vµ)γµψ + igAψ̄Aµγµγ5ψ −mψ̄ψ + f 2πtr∂ µΣ†∂µΣ+ ... (3) where Dµ = ∂µ + igGµ is the gauge-covariant derivative of QCD, Gµ the gluon field and g the strong coupling constant. The dimensionless axial-vector coupling gA = 0.7524 is determined from the axial charge of the nucleon. m represents the constituent quark masses due to chiral symmetry breaking. The pseudoscalar decay constant is fπ ≈ 93MeV. The Σ field, vector currents Vµ and axial-vector currents Aµ are given in terms of the Goldstone boson fields Φ π0 + 1√ η π+ K+ π− − 1√ π0 + 1√ K− K̄0 − 2√ , (4) Σ = exp(i ), (5) (ξ†∂µξ ± ξ∂µξ†), (6) ξ = exp(i ). (7) An expansion of the currents in powers of Φ/fπ yields the effective interaction between Π and q[10] LI = − ψ̄∂µΦγ µγ5ψ. (8) This allows the fluctuation of a quark into a recoil quark plus a Goldstone boson q → q′Π. In quark model, a hadron is built with constituent quarks. In accordance with χQM, we should treat a constituent quark as a composite particle including such components q′Π. Here we denote the wave function of a composite constituent quark as |q〉〉. At rest, |q〉〉 = zq|q〉+ q′Π|q ′Π〉. (9) In our paper, the state normalization relation is always taken as 〈p|p′〉 = δ3(p− p′). (10) The above wave function is of essential importance in our work. The square of the mod- ulus of the coefficient of each q′Π configuration is just the probability for the corresponding Π emission process Pq→q′Π = |xqq′Π| 2, (11) |zq|2 = (1− Pq→q′Π) is the probability of no Π emission. 
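As a quick consistency check of the normalization stated in Eq. (11), one can plug in the u-quark emission probabilities that the paper quotes later in Table III (no new numbers are introduced here):

```latex
|z_u|^2 \;=\; 1 - \sum_{q'\Pi} P_{u\to q'\Pi}
        \;=\; 1 - (0.072 + 0.003 + 0.145 + 0.010) \;=\; 0.770 ,
```

which matches the "no GB-emission" entry of that table. The s-quark entries of Table IV give 1 − (0.012 + 0.071 + 0.071) = 0.846 in the same way.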
To determine the wave function (9), we first construct a simplified Hamiltonian in the degrees of freedom q and Π, H = H0 +HB +HI . (12) H0 represents the kinetic energies of q and Π. It reads ψ̄(iα · ∇+m)ψ + Tr[Φ̇2 + (∇Φ)2] + m2Π(Φ , (13) where mΠ is the physical mass of Π which is nonzero and nondegenerate. HI = − d3xLI , (14) is the χQM interaction. HB is an accessary interaction which is needed to bind the q together. In our simplified Hamiltonian, we will not disscuss the explicit formalism of HB. Instead, we will put some physical restriction conditions on it later in this section, which is sufficient to our calculation. FromH0, we can expand free fields ψ and Π in terms of creation and annihilation operators ψq(x) = (2π)3/2 q(p, s)e−ip·x + bq† ps(t)v q(p, s)eip·x , (15) ΦΠ(x) = (2π)3/2 e−ip·x + cΠ† eip·x p0=EΠ , (16) where p2 +m2q is the quark energy of flavor q, p2 +m2Π is the energy of Goldstone boson Π. aq† ps and b pr are the creation operators of quark q and anti-quark q̄ pr, a p′s} = {b pr, b p′s} = δ (3)(p− p′)δrs. (17) is the creation operator of Π ] = δ(3)(p− p′). (18) Next, we will replace the field ψ and Φ in the Hamiltonian (12) with the free field of (15) and (16). Then we can express the Hamiltonian in creation and annihilation operators, for example d3p Eq ps + b ps] + d3p EΠ . (19) In all the model calculations [11, 12, 13, 14, 15, 16, 17, 18, 19], the emitted Π is assumed bound to the quark source. To represent that q′Π are bound, we use the well known SHO function as their spatial wave function |qΠ〉 = d3p|p|e− 2λ2 [Y1(θ, φ) c ]1/2 |0〉, (20) |qΠ ↑〉 = 1√ d3p|p|e− 2λ2 Y11(θ, φ) c p↓ |0〉 d3p|p|e− 2λ2 Y10(θ, φ) c p↑ |0〉, (21) where λ is the “characteristic radius” parameter in Gaussian function. 1/ N is the nor- malization factor, dp p4 e πλ5. (22) However, we need a binding interaction HB in the Hamiltonian. Yet we do not know how to write out the explicit form of HB. However, HB should provide enough binding energy. 
That is, for the q′Π system, we must have 〈qΠ|H0 +HB|qΠ〉 ≤ mq +mΠ. (23) That is EB = 〈qΠ|HB|qΠ〉 ≤ mq +mΠ − 〈qΠ|H0|qΠ〉 = mq − Eq +mΠ − EΠ. (24) As a rough estimation, we will take the mininum value of EB EB = −max {Eq −mq + EΠ −mΠ} = −(Eu −mu + Eπ −mπ). (25) Then the wave function of a composite constituent quark is determined from Schrödinger equation H|q〉〉 =Mq|q〉〉. (26) After taking the above simplification, we need only solve a matrix eigen-value problem  =Mq  , (27) where aδ3(0) = 〈q|H|q〉, Bq′Πδ 3(0) = 〈q|H|q′Π〉, Cq′Π;q′′Π′δ 3(0) = 〈q′Π|H|q′′Π′〉, q′Π = x For example, let us consider the process u emitting Π. There are four possible |q′Π〉 states generated by the fluctuations of a u quark: |uπ0〉, |uη〉, |dπ+〉 and |sK+〉. Thus |u〉〉 = zu|u〉+ xuuπ0 |uπ0〉+ xuuη|uη〉+ xudπ+ |dπ+〉+ xusK+|sK+〉. (28) Taking these wave functions as basis, we can calculate the matrix of the Hamiltonian in (27). a = mu. (29) C is diagonalized. Its diagonal matrix elements are calculated from H0 Cuπ0;uπ0 = dp p4 e p2 +m2u + ) + EB, (30) Cuη;uη = dp p4 e p2 +m2u + p+m2η) + EB, (31) Cdπ+;dπ+ = dp p4 e p2 +m2d + ) + EB, (32) CsK+;sK+ = dp p4 e p2 +m2s + ) + EB. (33) B is calculated from HI Buπ0 = − dp p4e , (34) Buη = − dp p4e , (35) Bdπ+ = − dp p4e , (36) BsK+ = − dp p4e . (37) By diagonalizing this Hamiltonian matrix, we will obtain a new mass of the constituent u quark Mu and its composite wave function. The constituent masses and wave functions of d and s quarks can be obtained similarly. We have |d〉〉 = zd|d〉+ xddπ0 |dπ0〉+ xddη|dη〉+ xduπ−|uπ−〉+ xdsK0|sK0〉, (38) |s〉〉 = zs|s〉+ xssη|sη〉+ xsdK̄0 |dK̄ 0〉+ xsuK−|uK−〉. (39) From isospin symmetry, mu = md, we have zd = zu; xddπ0 = −xuuπ0 ; xduπ− = xudπ+ ; ... (40) However, since mu 6= ms, one should notice that zs 6= zu; xsdK̄0 6= x sK0; x uK− 6= xusK+. (41) After the diagonalization, the Goldstone bosons Π are separated from quarks q approx- imately. 
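The structure of the eigenvalue problem (27) can be illustrated with a short sketch. It assumes only what the text states: a single bare-quark element a, a vector B of couplings to the q′Π channels, and a diagonal block C. Eliminating the channel amplitudes then gives a one-dimensional secular equation for the dressed mass. The function name `dress_quark` and all numerical values below are hypothetical and chosen for illustration; they are not the paper's matrix elements.

```python
import math

def dress_quark(a, B, C, iters=200):
    """Lowest eigenpair of the 'arrow' matrix [[a, B^T], [B, diag(C)]].

    a : bare-quark diagonal element <q|H|q>
    B : couplings <q|H|q'Pi> to each q'Pi channel (assumed non-zero)
    C : diagonal energies <q'Pi|H|q'Pi> of those channels
    Returns (M, z, x): dressed mass, bare-quark amplitude, channel amplitudes.
    """
    # Eliminating x_i = B_i z / (M - C_i) from the eigenvalue equations
    # leaves the secular equation f(M) = 0 for the dressed mass M.
    def f(M):
        return M - a - sum(b * b / (M - c) for b, c in zip(B, C))

    # Below min(a, min(C)) the function f is increasing and changes sign,
    # so the lowest eigenvalue can be bracketed and found by bisection.
    hi = min(a, min(C))
    lo = hi - math.sqrt(sum(b * b for b in B)) - 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    M = 0.5 * (lo + hi)

    ratios = [b / (M - c) for b, c in zip(B, C)]
    z = 1.0 / math.sqrt(1.0 + sum(r * r for r in ratios))
    x = [r * z for r in ratios]
    return M, z, x

# Hypothetical inputs in MeV; channel order: u pi0, u eta, d pi+, s K+.
a = 288.0
B = [-60.0, -15.0, -85.0, -30.0]
C = [420.0, 800.0, 420.0, 900.0]
M, z, x = dress_quark(a, B, C)
probs = [xi * xi for xi in x]   # emission probabilities |x|^2, as in Eq. (11)
```

In this sketch the dressed mass M always lies below the bare element a, and the squared amplitudes sum to one, which mirrors the roles of Mq, zq and Pq→q′Π in the text.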
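With only the degrees of freedom q one can rebuild the quark model, and so Mu, Md, Ms should be regarded as the constituent quark masses in the quark model.

III. FLAVOR AND SPIN STRUCTURE OF PROTON

Knowing the wave functions of the constituent quark q and the transition amplitudes for q to emit each Goldstone boson Π, we are able to calculate the quark distribution in a constituent quark following refs. 11, 12, 13. In χQM, Π will further split into a quark-antiquark pair. By substituting the quark contents of Π into the wave functions (28), (38) and (39), we can rewrite the wave functions of the constituent quarks as

$$ |u\rangle\rangle = z_u|u\rangle + \left(\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right)|u(u\bar{u})\rangle + \left(-\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right)|u(d\bar{d})\rangle - \frac{2x^u_{u\eta}}{\sqrt{6}}|u(s\bar{s})\rangle + x^u_{d\pi^+}|d(u\bar{d})\rangle + x^u_{sK^+}|s(u\bar{s})\rangle, \tag{42} $$

$$ |d\rangle\rangle = z_u|d\rangle + \left(-\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right)|d(u\bar{u})\rangle + \left(\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right)|d(d\bar{d})\rangle - \frac{2x^u_{u\eta}}{\sqrt{6}}|d(s\bar{s})\rangle + x^u_{d\pi^+}|u(d\bar{u})\rangle + x^u_{sK^+}|s(d\bar{s})\rangle, \tag{43} $$

$$ |s\rangle\rangle = z_s|s\rangle + \frac{x^s_{s\eta}}{\sqrt{6}}|s(u\bar{u})\rangle + \frac{x^s_{s\eta}}{\sqrt{6}}|s(d\bar{d})\rangle - \frac{2x^s_{s\eta}}{\sqrt{6}}|s(s\bar{s})\rangle + x^s_{d\bar{K}^0}|d(s\bar{d})\rangle + x^s_{uK^-}|u(s\bar{u})\rangle. \tag{44} $$

Then the antiquark and quark flavor contents of the proton (uud) are

$$ \bar{u} = 2\left|\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right|^2 + \left|-\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right|^2 + |x^u_{d\pi^+}|^2, \qquad u = \bar{u} + 2, \tag{45} $$

$$ \bar{d} = 2\left|-\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right|^2 + \left|\frac{x^u_{u\pi^0}}{\sqrt{2}} + \frac{x^u_{u\eta}}{\sqrt{6}}\right|^2 + 2|x^u_{d\pi^+}|^2, \qquad d = \bar{d} + 1, \tag{46} $$

$$ \bar{s} = 2|x^u_{u\eta}|^2 + 3|x^u_{sK^+}|^2, \qquad s = \bar{s}. \tag{47} $$

Some important quantities depending on the above quark distribution are: the Gottfried sum rule $I_G = \frac{1}{3} + \frac{2}{3}(\bar{u} - \bar{d})$, whose deviation from 1/3 indicates the ū-d̄ asymmetry in the proton sea; ū/d̄, measured through the ratio of muon pair production cross sections; and the fractions of quark flavors in the proton, $f_q = (q + \bar{q})/\sum_q (q + \bar{q})$, with f3 = fu − fd and f8 = fu + fd − 2fs.

We can further calculate the spin structure of the proton. Here one should consider the effects of configuration mixing generated by spin-spin forces[22]. We take the baryon wave functions from the quark model calculation[23, 24, 25]. The proton wave function, for example, is expressed as

$$ |P\rangle = 0.90\,|P\,{}^{2}8\,S_S\rangle - 0.34\,|P\,{}^{2}8\,S'_S\rangle - 0.27\,|P\,{}^{2}8\,S_M\rangle, \tag{48} $$

where the baryon SU(6)⊗O(3) wave functions are denoted as $|B\,{}^{2S+1}N\,L_\sigma\rangle$, and N is the SU(3) multiplicity. S and L are the total spin and total orbital angular momentum, while σ = S, M, A denotes the permutation symmetry of SU(6).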
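The flavor contents in Eqs. (45)–(47) and the derived quantities IG and fq can be evaluated numerically with a short sketch. It takes the u-quark emission probabilities quoted later in Table III as inputs and makes two labeled assumptions: the amplitudes are real with the same sign for the π0 and η channels, and the Gottfried sum takes the standard form IG = 1/3 + (2/3)(ū − d̄). Because the inputs are rounded to three decimals, the results agree with the paper's Table V only to about 10⁻³.

```python
import math

# Emission probabilities for a u quark, as quoted in Table III of the paper.
P_pi0, P_eta, P_pip, P_Kp = 0.072, 0.003, 0.145, 0.010

# Assumption: real amplitudes, same sign for the pi0 and eta channels.
x_pi0, x_eta = math.sqrt(P_pi0), math.sqrt(P_eta)

# Squared amplitudes of the q(q qbar) components, including pi0-eta interference.
same = (x_pi0 / math.sqrt(2) + x_eta / math.sqrt(6)) ** 2    # u(u ubar), d(d dbar)
other = (x_pi0 / math.sqrt(2) - x_eta / math.sqrt(6)) ** 2   # u(d dbar), d(u ubar)

# Proton = uud; the d-quark amplitudes follow from isospin symmetry.
ubar = 2 * same + other + P_pip
dbar = 2 * other + same + 2 * P_pip
sbar = 2 * P_eta + 3 * P_Kp

u, d, s = ubar + 2, dbar + 1, sbar
total = (u + ubar) + (d + dbar) + (s + sbar)
f_u = (u + ubar) / total
f_d = (d + dbar) / total
f_s = (s + sbar) / total
f_3 = f_u - f_d
f_8 = f_u + f_d - 2 * f_s

# Assumed standard form of the Gottfried sum.
I_G = 1 / 3 + 2 / 3 * (ubar - dbar)
```

The interference term between the π0 and η amplitudes is what separates ū from d̄ beyond the charged-pion contribution; dropping it would lower ū to about 0.25.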
The spin polarization functions will be re- markably affected by configuration mixing. Following refs. 15, 17, we define the number operator by N̂ = nu↑u↑ + nu↓u↓ + nd↑d↑ + nd↓d↓ + ns↑s↑ + ns↓s↓, where nq↑, nq↓ are the number of q↑, q↓ quarks. The spin structure of the “mixed” proton is given by = (0.902 + 0.342) + 0.272 . (49) The spin structure after considering Π-emission is obtained by replacing for every quark in eq. (49) by q↑,↓ −→ (1− ΣPi)q↑,↓ + Pflipping(q↑,↓) + Pnon−flipping(q↑,↓), (50) where Pflipping(q↑,↓) and Pnon−flipping(q↑,↓)| are the probabilities of quark helicity flipping and non-flipping for q↑,↓ respectively. For example, in the case of u↑ quark we have, Pflipping(u↑) = (|xuuπ0 |2 + |xuuη|2)u↓ + |xudπ+ |2d↓ + |xusK+|2s↓ Pnon−flipping(u↑) = (|xuuπ0 |2 + |xuuη|2)u↑ + |xudπ+ |2d↑ + |xusK+|2s↑ Finally the spin polarization functions defined as ∆q = q↑ − q↓ are ∆u = (0.902 + 0.342) 114|xu |2 + 48|xuuη|2 + 36|xusK+|2 + 0.272 66|xu |2 + 24|xuuη|2 + 18|xusK+|2 , (51) ∆d = (0.902 + 0.342) |2 + 12|xuuη|2 + 9|xusK+|2 + 0.272 42|xu |2 + 12|xuuη|2 + 9|xusK+|2 , (52) ∆s = − . (53) There are several measured quantities which can be expressed in terms of the above spin polarization functions. The quantities usually calculated are ∆3 = ∆u−∆d and ∆8 = ∆u+ ∆d−2∆s, obtained from the neutron β-decay and the weak decays of hyperons respectively. Another important quantity is the flavor singlet component of the total quark spin content defined as 2∆Σ = ∆u + ∆d + ∆s . We also calculate some weak axial-vector form factors which are also related to the spin polarization functions, (GA/GV )Λ→p = (2∆u−∆d−∆s), (GA/GV )Σ−→n = ∆d−∆s, and (GA/GV )Ξ−→Λ = 13(∆u+∆d− 2∆s). IV. 
BARYON OCTET MAGNETIC MOMENTS Considering the relative angular momentum between quark and Goldstone boson Π, the magnetic moment operator of a qΠ system is µ̂qΠ = p2q +m p2Π +m p2q +m p2Π +m p2Π +m p2q +m p2q +m p2Π +m l̂ (54) where eq and eΠ are the electric charges carried by q and Π respectively, ŝ the quark spin operator and l̂ the relative angular momentum bewteen q and Π. The first term in Eq(54) is the intrinsic magnetic moment of quark and the other two terms are the contribution of the orbital angular momentum. Here we have to consider the relativistic effect since the relative momentum of q or Π are comparable to their masses in the qΠ system pq,Π ∼ Λ ∼ mq,Π. With the SHO wave functions of (20), the magnetic moment of qΠ system (54) can be readily calculated. Then we can recalculate the magnetic moments of constituent quarks taking into account of the relativistic effect. For example, the magnetic moments of the u quark is µu = |zu|2〈u↑|µ̂|u↑〉+ Pu→uπ0〈uπ0|µ̂|uπ0〉+ Pu→uη〈uη|µ̂|uη〉 + Pu→dπ+〈dπ+|µ̂|dπ+〉+ Pu→sK+〈sK+|µ̂|sK+〉, (55) where 〈u↑|µ̂|u↑〉 = , (56) and the contribution from qΠ systems are 〈uπ0|µ̂|uπ0〉 = − eu p2 +m2π p2 +m2u + p2 +m2π p2 +m2u λ2 , (57) 〈uη|µ̂|uη〉 = − eu p2 +m2η p2 +m2u + p2 +m2η p2 +m2u λ2 , (58) 〈dπ+|µ̂|dπ+〉 = − p2 +m2π p2 +m2d + p2 +m2π p2 +m2d p2 +m2d p2 +m2d + p2 +m2π p2 +m2π λ2 , (59) 〈sK+|µ̂|sK+〉 = − p2 +m2K p2 +m2s + p2 +m2K p2 +m2s p2 +m2s p2 +m2s + p2 +m2K p2 +m2K λ2 . (60) The magnetic moments of d and s quarks can be calculated similarly. One can easily obtain the octet baryon magnetic moments by replacing the valence quarks inside the baryons with the corresponding constituent quarks. Again we take proton as an example, µp = (0.90 2 + 0.342) + 0.272 . (61) If we replace the µq by (55), µp can be further expressed as the baryon magnetic moment in conventional quark model plus the contribution from the Goldstone boson emission process [26]. The magnetic moments for other octet baryons can be calculated similarly. V. 
NUMERICAL RESULTS AND CONCLUSIONS

In the numerical calculation, most of the parameters can be taken from the experimental data or the chiral quark model. We collect these fixed input parameters of our calculation in Table I. Here we have used the physical masses of the Goldstone bosons[27].

TABLE I: The fixed input parameters from chiral quark model and experimental data.

gA       fπ (MeV)   mπ (MeV)   mK (MeV)   mη (MeV)
0.7524   93         135        494        548

For the quark masses, since our work focuses on the inner structure of the constituent quarks in the quark model, we naturally refer to the quark masses from the quark model instead of the chiral quark model values. Here we use the quark mass values from the widely accepted Isgur quark model[21], as shown in Table II. However, one should be cautious: in our model, it is the quark with Goldstone boson mixing that corresponds to the constituent quark in the quark model. That is, the mass values Mq after the diagonalization process should be set to the quark masses in Isgur's model. Our strategy is to adjust the quark masses mq in the model Hamiltonian to fit the Mq values.

Finally we are left with only one free parameter, λ, which describes the confinement of the emitted Goldstone boson in our model. An overall fit to the experimental data on nucleon flavor-spin structure and octet baryon magnetic moments shows that the best value is λ = 152 MeV. With this value of λ and a minimum binding energy EB = −218 MeV, the “bare” values of the quark masses mq without Goldstone boson mixing are also shown in Table II.

TABLE II: The quark masses with vs. without Goldstone boson mixing.

λ (MeV)   EB (MeV)   mu,d (MeV)   ms (MeV)   Mu,d (MeV)   Ms (MeV)
152       −218       288          474        220          419

Transition probabilities of the light and strange quarks to various q′Π systems are given in Tables III and IV respectively. The probability of a u quark emitting a π+, P(u → d + π+) = 0.145, is significantly larger than the perturbation calculation a = 0.083.
One may notice that the value λ = 152 MeV in our wave function, which is below ΛQCD, is considerably smaller than the other energy scale ΛχQM of the chiral quark model. Surely this will weaken the interaction between q and q′Π. However, one should also notice that the binding energy EB = −218 MeV makes the energy of a qΠ system much closer to the single quark energy. This enhances the mixing of q′Π components in a constituent quark. We also notice the asymmetry between the probabilities of u(d) → s + K and s → u(d) + K̄. Whether this asymmetry leads to any observable consequence in hadron structure needs further investigation.

TABLE III: Transition probabilities of a u quark to various q′Π systems and the mass of constituent u quark.

u → u + π0   u → u + η   u → d + π+   u → s + K+   no GB-emission   Mu
0.072        0.003       0.145        0.010        0.770            220 MeV

TABLE IV: Transition probabilities of a s quark to various q′Π systems and the mass of constituent s quark.

s → s + η   s → u + K−   s → d + K̄0   no GB-emission   Ms
0.012       0.071        0.071        0.846            419 MeV

Next, we compare our calculated results with the experimental data. Since our emphasis is on the substructure of a constituent quark in NQM, here we also quote the results from NQM.

In Table V, the calculated flavor and spin structures of the proton are shown. It should be mentioned that the quark spin polarization functions can be further corrected by the gluon anomaly[13, 15, 17, 28, 29, 30] through

$$ \Delta q(Q^2) = \Delta q - \frac{\alpha_s(Q^2)}{2\pi}\,\Delta g(Q^2), \tag{62} $$

and the flavor singlet component of the total helicity is modified accordingly as

$$ \Delta\Sigma(Q^2) = \Delta\Sigma - \frac{3\alpha_s(Q^2)}{4\pi}\,\Delta g(Q^2), \tag{63} $$

where ∆q(Q²) and ∆Σ(Q²) are the experimentally measured quantities, and ∆q and ∆Σ correspond to the calculated quantities without gluon correction. Using the experimental data ∆Σ(Q² = 5 GeV²) = 0.19 ± 0.02[2], αs(Q² = 5 GeV²) = 0.285 ± 0.013[27], and our result ∆Σ = 0.346, the gluon polarization ∆g(Q²) is estimated to be 2.293. The results both with and without gluon polarization corrections are presented in Table V.
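The estimate of ∆g quoted above can be reproduced with a few lines of arithmetic. The sketch assumes the standard anomaly prefactors, namely a shift of αs/(2π)·∆g for each flavor and hence 3αs/(4π)·∆g for ∆Σ = (∆u + ∆d + ∆s)/2; with these prefactors the quoted value 2.293 and the "With ∆g" column of Table V follow from the "Without ∆g" column to within rounding. Variable names are illustrative.

```python
import math

alpha_s = 0.285         # alpha_s(Q^2 = 5 GeV^2), as quoted in the text
dsigma_exp = 0.19       # measured Delta Sigma(Q^2 = 5 GeV^2)
dsigma_model = 0.346    # model Delta Sigma without gluon correction

# Assumed prefactors: each Delta q shifts by alpha_s/(2 pi) * Delta g, so
# Delta Sigma = (Delta u + Delta d + Delta s)/2 shifts by 3 alpha_s/(4 pi) * Delta g.
delta_g = (dsigma_model - dsigma_exp) * 4 * math.pi / (3 * alpha_s)

# Apply the per-flavor shift to the uncorrected model values of Table V.
shift = alpha_s / (2 * math.pi) * delta_g
without_dg = {"u": 0.968, "d": -0.274, "s": -0.003}
with_dg = {q: value - shift for q, value in without_dg.items()}

# Flavor non-singlet combinations are unchanged by a flavor-blind shift.
delta_3 = with_dg["u"] - with_dg["d"]
```

Because the shift is the same for every flavor, differences such as ∆3 = ∆u − ∆d are unaffected, which is why several rows of Table V show identical values in both model columns.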
The inclusion of gluon polarization leads to a better agreement with experimental data for the spin structure.

The calculated magnetic moments of octet baryons are given in Table VI. Although the deviation is about 30% in the case of Ξ−, our overall fit to the octet baryon magnetic moments is in good agreement with experiments. It should also be mentioned that even in the case of Ξ− the fit can perhaps be improved if corrections due to pion loops are taken into account[32, 33].

In the model calculations [11, 12, 13, 14, 15, 16, 17, 18, 19], the Goldstone boson sector in χQM is usually extended to include the η′ meson with U(3) symmetry. According to Cheng and Li[11], in the large Nc limit of QCD, there are nine Goldstone bosons including the usual octet and the singlet η′. Thus a constituent quark can also make a transition to a quark-η′ system. We have also made a U(3) calculation. With the inclusion of η′, we find that the probabilities for η′-emission from light and strange quarks are P(u → u + η′) = P(d → d + η′) = 0.0021 and P(s → s + η′) = 0.0018, which are negligibly small compared to those of octet Goldstone boson emissions. We therefore conclude that the contribution of η′ is not important, due to the obvious axial U(1) symmetry breaking in the meson mass spectrum, mη′ > mK,η.

To summarize, the χQM builds a bridge between QCD and the low-energy quark model. This allows us to understand the mechanism of flavor symmetry breaking and the nucleon flavor-spin structure in NQM through the consideration of the sea quarks and Goldstone bosons in the substructure of constituent quarks. Using the simple SHO wave function, we have modeled the wave functions of the composite constituent quarks and thus estimated the transition probabilities for Goldstone boson emissions.
These transition probabilities indeed reflect the flavor SU(3) symmetry breaking in χQM arising from the differences in quark masses, ms > mu,d, and in Goldstone boson masses, mK,η > mπ, and they are roughly in agreement with the parametrizations of other model calculations [11, 12, 13, 14, 15, 16, 17, 18, 19]. The fits to both the flavor-spin structure of the nucleon and the octet baryon magnetic moments are in good agreement with experiments.

TABLE V: The calculated values for the quark flavor distribution functions and spin polarization functions in proton, as compared with experimental data and NQM results.

Quantity             Data                  NQM     Our Model (with ∆g)   Our Model (without ∆g)
∆u                   0.85 ± 0.05[2]        1.33    0.864                 0.968
∆d                   −0.41 ± 0.05[2]       −0.33   −0.377                −0.274
∆s                   −0.07 ± 0.05[2]       0       −0.107                −0.003
∆3 = (GA/GV)n→p      1.270 ± 0.003[27]     1.67    1.242                 1.242
(GA/GV)Λ→p           0.718 ± 0.015[27]     1       0.737                 0.737
(GA/GV)Σ→n           −0.340 ± 0.017[27]    −0.33   −0.270                −0.270
(GA/GV)Ξ→Λ           0.25 ± 0.05[27]       0.33    0.234                 0.234
∆8                   0.58 ± 0.025[2]       1       0.701                 0.701
∆Σ                   0.19 ± 0.02[2]        0.5     0.190                 0.346

Quantity             Data                  NQM     Our Model
ū                    −                             0.264
d̄                    −                             0.392
s̄                    −                             0.036
ū − d̄                −0.118 ± 0.015[6]     0       −0.128
ū/d̄                  0.67 ± 0.06[6]        1       0.674
IG                   0.254 ± 0.005[6]      0.33    0.248
fu                   −                             0.577
fd                   −                             0.407
fs                   0.10 ± 0.06[31]       0       0.017
f3                   −                             0.170
f8                   −                             0.950
f3/f8                0.21 ± 0.05[14]       0.33    0.179

TABLE VI: The calculated octet baryon magnetic moments in nuclear magnetons, as compared with experiments and the results of NQM.

Octet baryons   Data[27]           NQM[34]   Our model
p               2.79 ± 0.00        2.72      2.73
n               −1.91 ± 0.00       −1.81     −1.91
Σ−              −1.16 ± 0.025      −1.01     −1.23
Σ+              2.46 ± 0.01        2.61      2.67
Ξ0              −1.25 ± 0.0014     −1.41     −1.36
Ξ−              −0.65 ± 0.002      −0.50     −0.44
Λ               −0.61 ± 0.004      −0.59     −0.56
ΣΛ              1.61 ± 0.08        1.51      1.63

Acknowledgments

Zhan Shu would like to thank Fan-Yong Zou and Yan-Rui Liu for useful discussions. This work was supported by the National Natural Science Foundation of China under Grant 10675008.

[1] J. Ashman et al. (European Muon), Phys. Lett. B206, 364 (1988); Nucl. Phys. B328, 1 (1990).
[2] B. Adeva et al. (Spin Muon), Phys. Lett. B302, 533 (1993); P. Adams et al.
(Spin Muon), Phys. Rev. D56, 5330 (1997).
[3] P. L. Anthony et al. (E142), Phys. Rev. Lett. 71, 959 (1993).
[4] K. Abe et al. (E143), Phys. Rev. Lett. 74, 346 (1995).
[5] P. Amaudruz et al. (New Muon), Phys. Rev. Lett. 66, 2712 (1991); M. Arneodo et al. (New Muon), Phys. Rev. D50, R1 (1994).
[6] E. A. Hawker et al. (E866/NuSea), Phys. Rev. Lett. 80, 3715 (1998); J. C. Peng et al. (E866/NuSea), Phys. Rev. D58, 092004 (1998); R. S. Towell et al. (E866/NuSea), ibid. D64, 052002 (2001).
[7] A. Baldit et al. (NA51), Phys. Lett. B332, 244 (1994).
[8] S. Weinberg, Physica A96, 327 (1979).
[9] A. Manohar and H. Georgi, Nucl. Phys. B234, 327 (1984).
[10] E. J. Eichten, I. Hinchliffe, and C. Quigg, Phys. Rev. D45, 2269 (1992).
[11] T. P. Cheng and L.-F. Li, Phys. Rev. Lett. 74, 2872 (1995).
[12] T. P. Cheng and L.-F. Li, Phys. Rev. D57, 344 (1998).
[13] X. Song, J. S. McCarthy, and H. J. Weber, Phys. Rev. D55, 2624 (1997); X. Song, ibid. D57, 4114 (1998).
[14] T. P. Cheng and L.-F. Li, Phys. Rev. Lett. 80, 2789 (1998).
[15] J. Linde, T. Ohlsson, and H. Snellman, Phys. Rev. D57, 452 (1998); T. Ohlsson and H. Snellman, Eur. Phys. J. C7, 501 (1999).
[16] H. Dahiya and M. Gupta, Phys. Rev. D64, 014013 (2001).
[17] H. Dahiya and M. Gupta, Phys. Rev. D66, 051501(R) (2002); D67, 114015 (2003).
[18] H. Dahiya, M. Gupta, and J. M. S. Rana, Int. J. Mod. Phys. A21, 4255 (2006).
[19] L. Yu, X.-L. Chen, W.-Z. Deng, and S.-L. Zhu, Phys. Rev. D73, 114001 (2006).
[20] S. Baumgartner, H. J. Pirner, K. C. Konigsmann, and B. Povh, Z. Phys. A353, 397 (1996).
[21] S. Godfrey and N. Isgur, Phys. Rev. D32, 189 (1985); S. Capstick and N. Isgur, Phys. Rev. D34, 2809 (1986).
[22] A. De Rujula, H. Georgi, and S. L. Glashow, Phys. Rev. D12, 147 (1975).
[23] N. Isgur and G. Karl, Phys. Rev. D18, 4187 (1978).
[24] R. Koniuk and N. Isgur, Phys. Rev. D21, 1868 (1980).
[25] N. Isgur, G. Karl, and R. Koniuk, Phys. Rev. Lett. 41, 1269 (1978); N. Isgur and G. Karl, Phys. Rev. D21, 3175 (1980).
[26] J.
Franklin, Phys. Rev. D66, 033010 (2002).
[27] W. M. Yao et al. (Particle Data Group), J. Phys. G33, 1 (2006).
[28] G. Altarelli and G. G. Ross, Phys. Lett. B212, 391 (1988).
[29] R. D. Carlitz, J. D. Collins, and A. H. Mueller, Phys. Lett. B214, 229 (1988).
[30] A. V. Efremov and O. V. Teryaev, Dubna Report No. JIN-E2-88-287, 1998.
[31] J. Gasser, H. Leutwyler, and M. E. Sainio, Phys. Lett. B253, 252 (1991); A. O. Bazarko et al. (CCFR), Z. Phys. C65, 189 (1995).
[32] S. Theberge and A. W. Thomas, Phys. Rev. D25, 284 (1982).
[33] J. Cohen and H. J. Weber, Phys. Lett. B165, 229 (1985).
[34] G. Karl, Phys. Rev. D45, 247 (1992).
<|endoftext|><|startoftext|>
Introduction

Mounting experimental evidence from high-Tc cuprates [1], nickelates [2], manganites [3,4] and other interesting materials suggests that large electron-phonon interactions may play a more important role in the physics of strongly correlated electron systems than previously thought. Migdal-Eliashberg and BCS theories have proved extremely successful in describing the effects of phonons in many materials.
However, if the coupling between electrons and the underlying lattice is large, and/or the phonons cannot be treated within an adiabatic approximation, conventional approaches fail.

http://arxiv.org/abs/0704.0030v1

The Holstein model contains most of the fundamental physics of the electron-phonon problem [5]. Tight-binding electrons are coupled to the lattice through a local interaction with Einstein modes. For large phonon frequencies, electrons interact with a strongly correlated Hubbard-like attraction, while for small phonon frequencies the lattice gives rise to a static potential which is essentially uncorrelated. Between these two extreme limits of correlated and uncorrelated behavior, levels of correlation are tuned by the size of the phonon frequency and novel physics is expected. In particular, it is normally the strength of interaction which is said to tune the correlation in e.g. the Hubbard model, whereas in the Holstein model, it can be seen that both interaction strength and phonon frequency may compete with each other to play this role.

The dynamical mean-field theory (DMFT) approach has proved successful in treating the Holstein and other models [6,7,8]. DMFT treats the self-energy as a momentum-independent quantity and is accurate as long as the variation across the Brillouin zone is small. For many aspects of the electron-phonon problem in 3D, correlations are short ranged and DMFT can be successfully applied. The weak coupling phase diagram was studied by Ciuchi et al., where competing charge-order (CO) and superconducting states were found [7]. Freericks et al. developed a quantum Monte-Carlo (QMC) algorithm [8,9] and examined the applicability of several perturbation theory based techniques to the electron-phonon problem [10,11,12]. The prediction of measurable quantities away from certain well-defined limits is severely restricted owing to difficulties inherent in the analytic continuation.
Dynamic properties such as spectral functions can be computed in the case of static phonons [13], and close to the static limit [14]. Alternatively, the limit of high phonon frequency (attractive Hubbard model) has been studied with a QMC algorithm [15].

In the current study we are concerned with the behavior of dynamical properties that could be measured directly with experiment. We use the iterated perturbation theory approximation, which has been demonstrated to be accurate for the Hubbard model, and use maximum entropy to analytically continue the results. We compare the resulting single-particle spectral functions over a wide range of electron-phonon coupling strengths and phonon frequencies. The results obtained using iterated perturbation theory (IPT) are promising and capture generic weak and strong coupling behavior for all phonon frequencies. At intermediate phonon frequencies, we find that electron-phonon interactions produce a spectral function which is simultaneously characteristic of both uncorrelated band (static) and strongly correlated Mott/Hubbard regimes. We also find that the competition between band-like and correlated states causes unusual structures in the optical conductivity and resistivity. Provided a material with high enough phonon frequency can be identified, it is possible that such a state could be observed experimentally.

Fig. 1. Second order contributions to the self-energy, with diagrams labeled (1a), (2a), (2b) and (2c). Straight lines represent electron Green's functions of the host and wavy lines phonon Green's functions.

This paper is organized as follows. First, we introduce the Holstein model, the dynamical mean-field theory and analytic continuation techniques (section 2). In section 3, we use IPT to determine the spectral functions of the Holstein model. We compare IPT with exactly known results in the static limit. This, in conjunction with the conclusions of Ref.
11 leads us to argue that IPT is a reasonable approximation for the calculation of dynamical properties in the intermediate phonon frequency regime. We compute the density of states, optical conductivity and resistivity, and give a heuristic explanation for their behavior.

2. Formalism

The Holstein Hamiltonian is written as,

$$H = -t \sum_{\langle ij \rangle \sigma} c^{\dagger}_{i\sigma} c_{j\sigma} + \sum_{i\sigma} (g x_i - \mu)\, n_{i\sigma} + \sum_{i} \left( \frac{M \omega_0^2 x_i^2}{2} + \frac{p_i^2}{2M} \right) \qquad (1)$$

The first term in this Hamiltonian represents a tight binding model with hopping parameter t. The second term couples the local ion displacement, $x_i$, to the local electron density. The final term can be identified as the non-interacting phonon Hamiltonian. $c^{\dagger}_{i\sigma}$ ($c_{i\sigma}$) create (annihilate) electrons at site i, $p_i$ is the ion momentum, M the ion mass, µ the chemical potential and g the electron-phonon coupling. The phonons are dispersionless with frequency $\omega_0$.

The perturbation theory of this model may be written down in terms of electrons interacting via phonons with the effective interaction,

$$U(i\omega_s) = -\frac{g^2}{M(\omega_s^2 + \omega_0^2)} \qquad (2)$$

Here, $\omega_s = 2\pi s T$ are the Matsubara frequencies for bosons and s is an integer. Taking the limit $\omega_0 \to \infty$, $g \to \infty$, while keeping the ratio $g/\omega_0$ finite, leads to an attractive Hubbard model with a non-retarded on-site interaction $U = -g^2/M\omega_0^2$. Iterated-perturbation theory (IPT) is known to be a reasonable approximation to the half-filled Hubbard model [16,17]. Taking the opposite limit ($\omega_0 \to 0$, $M \to \infty$, keeping $M\omega_0^2 \equiv \kappa$ finite) the phonon kinetic energy term vanishes, and the phonons depend on a static variable $x_i$. As such, the model may be considered as uncorrelated.

We solve the Holstein model using dynamical mean-field theory (DMFT). DMFT freezes spatial fluctuations, leading to a theory which is completely momentum independent, while fully including dynamical effects of excitations. In spite of this simplification, DMFT predicts non-trivial (correlated) physics and may be used as an approximation to 3D models [18]. As discussed in Ref.
6, DMFT involves the solution of a set of coupled equations which are solved self-consistently. The Green's function for the single site problem, $G(i\omega_n)$, can be written in terms of the self-energy $\Sigma(i\omega_n)$ as,

$$G^{-1}(i\omega_n) = G_0^{-1}(i\omega_n) - \Sigma(i\omega_n), \qquad (3)$$

where Σ is a functional of $G_0$, the Green's function for the host of a single impurity model. Here $\omega_n = 2\pi T(n + 1/2)$ are the usual Matsubara frequencies. The assumptions of DMFT are equivalent to taking the self-energy of the original lattice problem to be local, hence G is also given by,

$$G(i\omega_n) = \int d\epsilon\, \frac{D(\epsilon)}{i\omega_n + \mu - \Sigma(i\omega_n) - \epsilon} \qquad (4)$$

where $D(\epsilon)$ is the density of states (DOS) of the non-interacting problem (in our case g = 0). We work with a Gaussian DOS which corresponds to a hypercubic lattice [18], $D(\epsilon) = \exp(-\epsilon^2/2t^2)/(t\sqrt{2\pi})$. Equations (3) and (4) are solved according to the following self-consistent procedure: compute the Green's function from equation (4) and the host Green's function of the effective impurity problem, $G_0$, from equation (3); then calculate a new self-energy from the host or full Green's functions. In the following we will take the hopping parameter t = 0.5, which sets the energy scale. Once the algorithm has converged, and after analytically continuing to the real axis, response functions can be computed.

We use the MAXENT method for the determination of spectral functions from Matsubara axis data. MAXENT treats the analytic continuation as an inverse problem [19]. The Green's function, G(z), is given by the integral transform,

$$G(z) = \int \frac{\rho(x)}{z - x}\, dx \qquad (5)$$

where ρ(x) is the spectral function ($\rho(\omega) = -\mathrm{Im}[G(\omega + i\eta)]/\pi$). The problem of finding ρ is therefore one of inverting the integral transform. Since the data for $G(i\omega_n)$ are incomplete and noisy for any finite set of Matsubara frequencies, the inversion of the kernel of the discretised problem is ill-defined. The MAXENT method selects the distribution ρ(x) which assumes the least structure consistent with the calculated or measured data.
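The forward direction of the transform (5) is simple to evaluate; only its inversion is ill-conditioned. As a minimal sketch, the Python below computes $G(i\omega_n)$ from a known spectral function, here the Gaussian DOS with t = 0.5, which is the same integral as equation (4) with Σ = 0 and µ = 0. It also extracts the host Green's function from equation (3). The function names, the trapezoidal quadrature, the grid size and the energy cutoff are assumptions made for this illustration and are not taken from the calculation described in the text. The self-energy is a placeholder that a full DMFT loop would update on every iteration.

```python
import math

T_HOP = 0.5  # hopping parameter t from the text; sets the energy scale


def gaussian_dos(eps, t=T_HOP):
    """Gaussian DOS of the hypercubic lattice: exp(-eps^2 / 2t^2) / (t sqrt(2 pi))."""
    return math.exp(-eps * eps / (2.0 * t * t)) / (t * math.sqrt(2.0 * math.pi))


def matsubara(n, temperature):
    """Fermionic Matsubara frequency, omega_n = 2 pi T (n + 1/2)."""
    return 2.0 * math.pi * temperature * (n + 0.5)


def local_green(iwn, sigma=0.0, mu=0.0, t=T_HOP, n_grid=4001, cutoff=8.0):
    """Trapezoidal estimate of the energy integral in eq. (4).

    n_grid and cutoff (in units of t) are illustrative choices only.
    """
    lo = -cutoff * t
    step = 2.0 * cutoff * t / (n_grid - 1)
    total = 0j
    for k in range(n_grid):
        eps = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * gaussian_dos(eps, t) / (iwn + mu - sigma - eps)
    return total * step


def host_green(g_loc, sigma):
    """Host Green's function from eq. (3): 1/G0 = 1/G + Sigma."""
    return 1.0 / (1.0 / g_loc + sigma)


w0 = matsubara(0, 0.08)        # lowest Matsubara frequency at T = 0.08
g = local_green(1j * w0)       # Sigma = 0: non-interacting placeholder
g0 = host_green(g, 0.0)        # coincides with g when Sigma = 0
print(w0, g, g0)
```

At half filling with a symmetric DOS the result is purely imaginary, and at large frequency it approaches $1/i\omega_n$, which gives two quick sanity checks on any such quadrature.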
These methods have been extensively reviewed in the context of the inversion of the kernel in Refs. 19,20. The applicability to the current problem has been thoroughly tested, and is found to be accurate.

Within the DMFT formalism, many response functions follow directly from the one-electron spectral function and the electron self-energy (essentially because of the neglect of all connected higher point functions apart from $G_0$). Here we will be interested in the conductivity [6]:

$$\mathrm{Re}[\sigma(\omega)] = \frac{\pi}{\omega} \int d\epsilon\, D(\epsilon) \int d\nu\, \rho(\epsilon, \nu)\, \rho(\epsilon, \nu + \omega)\, [f(\nu) - f(\nu + \omega)] \qquad (6)$$

where f(x) is the Fermi-Dirac distribution. Taking the limit ω → 0 leads to the DC conductivity. (The conductivity is in units of $e^2V/ha^2$, where a is the lattice spacing, V the volume of the unit cell and e and h are the electron charge and Planck's constant respectively.)

3. Results

In this section, we examine the validity of an approximation to the self-energy constructed from only first and second order terms, by comparing the resulting spectral functions with known results at very high and very low phonon frequencies. Finally, we calculate the optical conductivity and resistivity. Spectral functions are shown in figures 2 and 3. The perturbation theory is carried out in the host Green's function (i.e. both electrons and phonons are bare). All diagrams in fig. 1 are considered,

$$\Sigma_{1a}(i\omega_n) = -UT \sum_{s} G_0(i\omega_n - i\omega_s)\, D_0(i\omega_s) \qquad (7)$$

$$\Sigma_{2a}(i\omega_n) = -2U^2T^2 \sum_{s,m} D_0^2(i\omega_{n-m})\, G_0(i\omega_m)\, G_0(i\omega_s)\, G_0(i\omega_{n-m+s}) \qquad (8)$$

$$\Sigma_{2b}(i\omega_n) = U^2T^2 \sum_{s,m} D_0(i\omega_{m-s})\, D_0(i\omega_{n-m})\, G_0(i\omega_m)\, G_0(i\omega_s)\, G_0(i\omega_{n-m+s}) \qquad (9)$$

$$\Sigma_{2c}(i\omega_n) = U^2T^2 \sum_{s,m} D_0(i\omega_{n-m})\, D_0(i\omega_{m-s})\, G_0^2(i\omega_m)\, G_0(i\omega_s) \qquad (10)$$

This also gives the correct weak coupling limit for the electronic Green's function.

We consider first the calculation of spectral functions close to the static and Hubbard limits. In the instantaneous limit the perturbation theories for the Holstein and Hubbard models are equivalent. It is well known that second order perturbation theory in the host Green's function provides a good approximation to the Hubbard model [21].
In the static limit, the exact solution can easily be calculated [13]. Figure 2 shows spectral functions from the exact solution, computed for a hypercubic lattice, and spectral functions computed using 2nd order perturbation theory at a temperature of T = 0.08. The phonon frequency $\omega_0 = T/20$ was chosen so that the effects of the phonon kinetic energy are negligible compared to thermal fluctuations. This allows a direct comparison to be made between the exact and approximate results. The comparison shows that the widths and positions of the major features are closely related.

The results in Figure 2 for the static limit ($\omega_0 \to 0$), together with the fact that second order IPT is known to give reasonable results in the instantaneous limit ($\omega_0 \to \infty$), suggest that the calculation of spectral functions should also be reliable at intermediate frequencies. We note that Freericks et al. also find a reasonable agreement between the IPT and QMC self-energies at half-filling, and that this should lead to a good agreement in the Matsubara axis Green's function. We have therefore solved the IPT equations for the spectral functions at intermediate frequencies. We show the results in Fig. 3.

The results of the IPT calculations in the regime of intermediate coupling (Fig. 3) are consistent with known results for the limiting cases. For frequencies $\omega \gg \omega_0$, the system has the qualitative behavior of the static limit: the original unperturbed density of states splits into two sub-bands centered around ±U/2. For small frequencies ($\omega \ll U, \omega_0$) and interaction strength, U, less than some critical value, the system behaves as an interacting electron (Hubbard) model, since the retarded interaction between particles $U(i\omega_s)$ (see equation 2) is effectively constant for $\omega_s \ll \omega_0$. There is then a narrow quasiparticle band at the Fermi energy with density of states at the Fermi energy pinned at its non-interacting value [6].
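The statement that the retarded interaction is effectively constant for $\omega_s \ll \omega_0$ can be checked directly from the form $U(i\omega_s) = -g^2/[M(\omega_s^2 + \omega_0^2)]$. The short Python sketch below tabulates the ratio $U(i\omega_s)/U(0)$ on the bosonic Matsubara frequencies. The parameter values (M = 1, $\omega_0$ = 2, T = 0.02, and g chosen so that $g^2/M\omega_0^2 = 2$) and the function names are assumptions picked for illustration, loosely following the values quoted for the figures.

```python
import math


def boson_matsubara(s, temperature):
    """Bosonic Matsubara frequency, omega_s = 2 pi s T."""
    return 2.0 * math.pi * s * temperature


def effective_interaction(omega_s, g, mass, omega0):
    """Phonon-mediated interaction U(i omega_s) = -g^2 / (M (omega_s^2 + omega0^2))."""
    return -g * g / (mass * (omega_s * omega_s + omega0 * omega0))


# Illustrative parameters (assumptions): M = 1, omega0 = 2, and g chosen so that
# the instantaneous-limit magnitude g^2 / (M omega0^2) equals 2.
mass, omega0, temperature = 1.0, 2.0, 0.02
g = omega0 * math.sqrt(2.0 * mass)
u_static = effective_interaction(0.0, g, mass, omega0)

for s in (0, 1, 4, 16, 64, 256):
    w = boson_matsubara(s, temperature)
    ratio = effective_interaction(w, g, mass, omega0) / u_static
    print(f"omega_s = {w:8.3f}   U(i omega_s) / U(0) = {ratio:.4f}")
```

The ratio stays close to one while $\omega_s$ is well below $\omega_0$, falls to one half at $\omega_s = \omega_0$, and decays as $\omega_0^2/\omega_s^2$ beyond it, which is the crossover from Hubbard-like to static behavior described above.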
We also note that the results for small coupling and small frequencies are in good agreement with those calculated using Migdal-Eliashberg (ME) theory in the metallic phase [22].

The recent numerical renormalization group (NRG) calculations of Meyer et al. [23,24] also report the spectral function in the intermediate regime. The NRG is in principle an exact method for solving the impurity problem onto which the DMFT equations map. Our results are largely consistent with theirs, adding further support to the use of IPT in the intermediate regime. When comparing with the results of Meyer et al. [23], one should note that the Hamiltonian (1) is exactly the one considered in Ref. 23 but with the quantity $g/\sqrt{2M\omega_0} = \sqrt{U\omega_0/2}$ denoted by g in Ref. 23 and with energies measured in terms of the full bandwidth (instead of the half-bandwidth used here).

In this paper, we work with the Gaussian density of states for the non-interacting electron DOS, whereas Ref. 23 uses the semi-elliptic DOS. In general, we would expect the critical values for the opening of a gap to be larger for the Gaussian case than for the semi-elliptic case. The critical coupling for the parameters in Figure 3(c) lies just above U = 2.0, corresponding in the units used in Ref. 23 to g = 1, compared with the critical value found for the semi-elliptic DOS of g = 0.69 (note that because of the different energy scales $\omega_0 = 2$ in our results corresponds to $\omega_0 = 1$ in the units of Ref. 23). However, the shapes of the spectral functions are similar in both cases, with a five-peaked structure below and a four-peaked structure above the transition. The peaks are narrower in the IPT results than in the NRG results and there is less weight in the high energy peaks. This may reflect the different DOS, or inaccuracies in the NRG method at frequencies far from the Fermi energy resulting from the logarithmic discretisation, but more likely the limitations of the IPT method.

Using the method outlined in section 2.
it is possible to calculate the real-axis self-energy. The temperature evolution of the imaginary part of the self-energy may be seen in figure 4 for U = 2 and $\omega_0 = 2$. The self-energy at low temperatures and small frequencies shows a quadratic (Fermi-liquid like) behavior consistent with the narrow quasiparticle peak seen in the spectral function (Fig. 3) and develops into a broad central peak at higher temperatures. There are also peaks corresponding to the Hubbard sub-bands. With increasing temperature these phonon-induced peaks move together and merge into the single central maximum associated with incoherent on-site scattering. This peak is naturally characterized within the framework of the self-consistent impurity model formulation of the DMFT equations [6] in terms of a Kondo resonance. In this formulation, the dynamical mean field $G_0(\omega)$ is written in terms of a hybridization ∆(ω) between the site orbital and a bath of conduction electrons and is therefore equivalent to an Anderson impurity model with the added complication that ∆ is frequency-dependent and needs to be computed self-consistently. However, many of the properties in the metallic state are similar to those of the Anderson impurity model. In particular the central peak in the spectral function can be viewed as the Kondo resonance of the impurity model.

As all connected point-functions with order higher than $G_0$ are neglected within DMFT, the computation of q = 0 response functions is straightforward. As an example we show in fig. 5 the (real part of the) optical conductivity for various temperatures with U = 2 and $\omega_0 = 2$. The structure seen in the curves reflects the structure of the density of states. There is a strong response at low frequencies as particle-hole pairs are excited within the 'Kondo-like' quasiparticle resonance at the Fermi energy.
The second peak at frequencies ω ∼ 1 arises from excitations between the quasiparticle resonance and the large satellite (Hubbard band), while the third peak around ω ∼ 5.0 involves excitations between the satellites. The first dip at ω = 0.5 is the signature of the small Mott bands close to the Kondo resonance and is the feature most likely to be observable experimentally.

Also calculated is the resistivity as a function of temperature (figure 6). The curves reflect the structure of the self-energy shown in Figure 4: at low temperatures the resistivity rises quadratically as expected for interacting electrons. The temperature scale is given by the quasiparticle bandwidth ('Kondo temperature'). Above this temperature the resistivity drops as the on-site (Kondo) scattering amplitude for electrons reduces. There is a slight second peak at higher temperatures. The structures in ρ can be traced back to the behavior seen in the self-energy. This second peak is the result of an increase in scattering off the phonons: these soften slowly with increasing temperature and, around the second peak in the resistivity curve, outweigh the reduction in Kondo-like scattering as the temperature increases. This effect clearly involves a partial cancellation between two effects and hence may be sensitive to the accuracy of the analytic continuation, which at higher temperatures starts from reduced information (since the majority of Matsubara points simply show asymptotic behaviour).

4. Summary

We have discussed the result of changing the ratio of electron and phonon energies as a method for tuning the amount of correlation in a model of electron-phonon interactions. We use approximate schemes to solve for the spectral functions of the Holstein model.
On the basis that second-order iterated perturbation theory predicts the correct qualitative behavior at a range of couplings in the static limit as well as describing correctly the limit of infinite phonon frequency, we have computed the spectral function at intermediate frequencies and couplings. We have used an adaptation of the standard maximum entropy scheme to obtain the spectral function, the self-energy and the conductivity of the model by analytic continuation. These quantities had not previously been studied.

The results for the intermediate frequency regime are consistent with what might be expected on the basis of the limiting cases (high and low frequencies). At energy scales smaller than $\omega_0$, the system shows behavior similar to that of the Hubbard model found in the instantaneous limit $\omega_0 \to \infty$: there is a narrow central 'Kondo-resonance' or quasiparticle band. At large energies the model behaves as it does in the static regime with a well-defined band splitting. At intermediate frequencies the picture is complicated by the interplay of the loss of coherence in the quasiparticle band and the effective renormalization of the phonon frequency as a function of coupling and temperature. We suggest that if systems with anomalously large phonon frequencies and couplings exist, then the optical conductivity should bear the hallmark of the correlation tuned regime.

5. Acknowledgements

The authors would like to thank F.Essler and F.Gebhard for useful discussions.

REFERENCES

1. A.Lanzara, P.V.Bogdanov, X.J.Zhou, S.A.Kellar, D.L.Feng, E.D.Lu, T.Yoshida, H.Eisaki, A.Fujimori, K.Kishio, J.-I.Shimoyama, T.Noda, S.Uchida, Z.Hussain, and Z.-X.Shen. Nature, 412:6846, 2001.
2. J.M.Tranquada, K.Nakajima, M.Braden, L.Pintschovius, and R.J.McQueeney. Bond-stretching-phonon anomalies in stripe-ordered La1.69Sr0.31NiO4. Phys. Rev. Lett., 88:075505, 2002.
3. G.M.Zhao, K.Conder, H.Keller, and K.A.Müller. Nature, 381:676, 1996.
4. A.J.Millis, R.Mueller, and B.I.Shraiman.
Phys. Rev. B, 54:5405–5417, 1996.
5. T.Holstein. Ann. Phys., 8:325–342, 1959.
6. A.Georges, G.Kotliar, W.Krauth, and M.Rozenberg. Rev. Mod. Phys., 68:13, 1996.
7. S.Ciuchi, F.de Pasquale, C.Masciovecchio, and D.Feinberg. Europhys. Lett., 24:575–580, 1993.
8. J.K.Freericks, M.Jarrell, and D.J.Scalapino. Phys. Rev. B, 48:6302–6314, 1993.
9. J.K.Freericks, M.Jarrell, and D.J.Scalapino. Europhys. Lett., 25:37–42, 1994.
10. J.K.Freericks. Phys. Rev. B, 50:403–417, 1994.
11. J.K.Freericks and M.Jarrell. Phys. Rev. B, 50:6939–6952, 1994.
12. J.K.Freericks, V.Zlatić, W.Chung, and M.Jarrell. Phys. Rev. B, 58:11613–11623, 1998.
13. A.J.Millis, R.Mueller, and B.I.Shraiman. Phys. Rev. B, 54:5389–5404, 1996.
14. P.Benedetti and R.Zeyher. Phys. Rev. B, 58:14320–14334, 1998.
15. M.Keller, W.Metzner, and U.Schollwöck. Dynamical mean-field theory for the normal phase of the attractive Hubbard model. J. Low. Temp. Phys, 126:961, 2002.
16. A.Georges and G.Kotliar. Phys. Rev. B, 45:6479, 1992.
17. M.J.Rozenberg, X.Y.Zhang, and G.Kotliar. Phys. Rev. Lett., 69:1236, 1992.
18. W.Metzner and D.Vollhardt. Phys. Rev. Lett., 62:324, 1989.
19. J.E.Gubernatis, M.Jarrell, R.N.Silver, and D.S.Sivia. Phys. Rev. B, 44:6011, 1991.
20. H.Touchette and D.Poulin. Aspects numériques des simulations du modèle de Hubbard – Monte Carlo quantique et méthode d’entropie maximum. Technical report, Université de Sherbrooke, 2000.
21. X.Y.Zhang, M.J.Rozenberg, and G.Kotliar. Phys. Rev. Lett., 70:1666, 1993.
22. J.P.Hague and N.d’Ambrumenil. cond-mat/0106355, 2001.
23. D.Meyer, A.C.Hewson, and R.Bulla. Gap formation and soft phonon mode in the Holstein model. Phys. Rev. Lett., 89:196401, 2002.
24. A.C.Hewson and D.Meyer. Numerical renormalization group study of the Anderson-Holstein impurity model. J. Phys. Condens. Matt, 14(3):427, 2002.

[Figure 2: panels (a) Static, (b) IPT, (c) Conserving; curves for U = 0.33, 1.17, 2.00 and 4.50; horizontal axis from -6 to 6.]

Fig.
2. The spectral function in the static limit of the half-filled Holstein model computed at temperature T = 0.08 (a) using the exact solution and (b) using 2nd order IPT at a low frequency, $\omega_0 = 0.004$. The IPT solution at this small non-zero frequency is quite close to the exact solution in the static limit. In particular, the band splitting and the positions of the maxima agree. In contrast, panel (c) shows the results of the approximation using the full Green's function (Diagram 2c from figure 1 is not included to avoid overcounting).

[Figure 3: panels (a) ω0 = 0.056, curves for U = 0.33, 1.17, 2.00 and 4.50; (b) ω0 = 0.500 and (c) ω0 = 2.000, curves for U = 0.33, 2.00 and 4.50; horizontal axis from -6 to 6.]

Fig. 3. Spectral functions of the half-filled Holstein model for various electron-phonon couplings U, approximated using 2nd order perturbation theory at T = 0.02 and $\omega_0 = 0.056$ (top), $\omega_0 = 0.5$ (center) and $\omega_0 = 2$ (bottom). In the low frequency limit ($\omega_0 = 0.056$), the spectral functions are similar to those in the static limit shown in Fig. 2, with only a small effect from the non-zero phonon frequency. As the temperature is lower than the phonon frequency, the central quasiparticle peak is clearly resolved for U ≤ 2. For the intermediate frequencies (central panel) the peak around ω = 0 is again clear and has a width ∼ $\omega_0$ at low coupling. In the gapped phase at large couplings two band-splittings are visible. For $\omega \gg \omega_0$ the band splits just as in the static limit, while for $\omega \ll U$ there is a peak at a renormalized phonon frequency (which is less than the bare phonon frequency). In the ungapped phases for $\omega_0 = 0.5$ and 2, the low energy behavior is similar to that found in the Hubbard model with a narrow quasiparticle band forming near the Fermi energy with the value at the Fermi energy pinned to its value in the non-interacting case.

[Figure 4: curves for T = 0.08, 0.16 and 0.32; horizontal axis from -6 to 6.]

Fig. 4.
Imaginary part of the self-energy of the half-filled Holstein model when U = 2 and $\omega_0 = 2$ computed using IPT and analytically continued using MAXENT. At low temperatures the low frequency behavior is Fermi-liquid like (quadratic dependence on ω) down to quite low frequencies (at very low frequencies and low temperatures there are some inaccuracies associated with the truncation in Matsubara frequencies). There are peaks at the frequencies associated with the phonon energy and with U. As the temperature increases the minimum at the Fermi energy (ω = 0) increases as incoherent on-site scattering in the corresponding local impurity increases (see text). At temperatures above the characteristic (Kondo-like) energy scale the central peak subsides and disappears.

[Figure 5: curves for T = 0.08, 0.16 and 0.32; horizontal axis from 0 to 6.]

Fig. 5. The real part of the optical conductivity for a system with U = 2.0 and $\omega_0 = 2.0$ for a range of temperatures. The structure of the spectrum reflects that in the density of states (see fig. 3). At low frequencies, electrons may be excited within the quasiparticle resonance. The second peak at ω ∼ 2.0 represents excitations from the Kondo resonance to the large satellite (Hubbard band), and the peak at ω ∼ 5.0 represents excitations between the satellites.

[Figure 6: curves for U = 2.00, 2.13 and 2.28; horizontal axis from 0 to 0.45.]

Fig. 6. The resistivity as a function of temperature for the Holstein model for $\omega_0 = 2$ for varying electron-phonon coupling strengths. The resistivity is in units of $e^2V/ha^2$ with V the unit cell volume and a the lattice cell spacing. The behavior reflects what is seen in the self-energy. At low temperatures the behavior is similar to that in a Kondo lattice. The resistivity rises sharply with temperature for temperatures smaller than the quasiparticle bandwidth. The resistivity then drops for temperatures larger than this lattice coherence temperature.
A simple logarithmic decay with temperature is not visible because, in addition to the Kondo-like scattering processes, the electrons are scattered from thermally excited phonons whose spectral weight broadens and shifts towards lower frequencies as the temperature rises. This leads to a second peak. In contrast, the second peak is not visible for the Hubbard model, and indicates the presence of two energy scales in the Holstein model. Introduction Formalism Results Summary Acknowledgements ABSTRACT We investigate the effect of tuning the phonon energy on the correlation effects in models of electron-phonon interactions using DMFT. In the regime where itinerant electrons, instantaneous electron-phonon driven correlations and static distortions compete on similar energy scales, we find several interesting results including (1) A crossover from band to Mott behavior in the spectral function, leading to hybrid band/Mott features in the spectral function for phonon frequencies slightly larger than the band width. (2) Since the optical conductivity depends sensitively on the form of the spectral function, we show that such a regime should be observable through the low frequency form of the optical conductivity. (3) The resistivity has a double kondo peak arrangement <|endoftext|><|startoftext|> Introduction A particle entering the crystal lattice parallel to a major crystallographic direction can be captured and channeled by the lattice along a crystal axis or plane [1]. For instance, a positive particle can be channeled between adjacent atomic planes. In a bent crystal, the channeled particles can follow the bend [2]. This led to elegant technique of beam steering by bent channeling crystals [3] now experimentally explored over six decades in energy from low MeV [4] to 1 TeV [5]. 
The technique is used on a permanent basis at IHEP Protvino, where crystal systems extract protons from the 70 GeV main ring with an efficiency of 85% at intensities up to $4\times10^{12}$ protons, using Si crystals just 2 mm long along the beam [6]. Bent crystals channel in good agreement with predictions up to the highest energies [6-9]. Crystal applications at the 7-TeV Large Hadron Collider are considered for beam collimation and extraction [10] and in situ calibration of the CMS and ATLAS calorimeters [11].

In another proposal, a crystal could capture the particles emerging from the interaction point (IP) at small angles and channel them out of the beam [12]. This could help to improve the measurement of small angle elastic and "quasi-elastic" scattering in CMS and ATLAS, where lower momentum transfers might become available for pp elastic scattering and lower proton momentum losses for diffractive physics [12].

Groups in both CMS (with TOTEM) and ATLAS would like to add very forward proton detectors, 420 m downstream on both sides, a project called FP420 [13]. By detecting protons that have lost less than 1% of their longitudinal momentum, a rich QCD, electroweak, Higgs and BSM program becomes accessible, with the potential to make measurements which are unique at the LHC, and difficult even at a future linear collider [13]. The measurement of the displacement x and angle x' (in the horizontal plane) of the outgoing protons relative to the beam allows the momentum loss ξ=∆p/p and transverse momentum of the scattered protons to be reconstructed. Protons emerging from diffractive scattering at the LHC have very small emission angles (10-150 µrad) and fractional momentum loss ($\xi = 10^{-8}$ to 0.1). Hence they are very close to the beam and can only be detected in the Roman Pots downstream if their displacement at the detector location is large enough to escape the beam halo [13].

2. Crystal efficiency

As practice shows, a crystal can be placed in a very limited space and extract particles from there [6].
The most efficient crystal applications are based on the so-called "multipass" mode, where particles can encounter a crystal many times in the ring [6,14]. There are also successful experimental demonstrations of highly efficient channeling in a single pass, with efficiency up to 60% at the CERN SPS [15]. Throughout this paper we consider only single-pass channeling. We show with simulations that at the LHC a crystal can efficiently channel forward protons. For channeling simulations we apply the Monte Carlo code CATCH [16], successfully used for prediction of experiments at CERN SPS [9], IHEP U-70 [6], Tevatron [7], RHIC [8] and KEK [17] and for crystal applications at the LHC [10,11].

Crystal capture is very selective in angle. The critical angle $\theta_C$ within which capture is possible is as small as about $\pm 5\ \mu\mathrm{rad}/\sqrt{E\,[\mathrm{TeV}]}$ at a high energy E in silicon (110) planes. A proton divergence of 150 µrad is almost 100 times $\theta_C$ at 7 TeV. Therefore it is not possible to capture all these protons with a plain crystal. However, we can suggest an efficient solution benefiting from the fact that all diffractively scattered protons originate from a small region at the IP. For standard LHC optics with beta function value at the IP β*=0.55 m, the beam size at the IP is $\sigma_{beam} \approx 16$ µm rms. The spread in the transverse position of the vertex point from which outgoing protons originate is determined by the rms spread of the beam and equals $\sigma_{beam}/\sqrt{2}$ [13]. As protons emerge from diffractive scattering at the LHC with emission angles up to ≈150 µrad and interaction width $\sigma_{beam}/\sqrt{2}$, the emittance of the beam to be trapped by a crystal is only ≤ 2π µrad-mm. This corresponds to the acceptance of a Si crystal of ~1 mm transverse size. The match of the diffractive-proton emittance to the crystal acceptance means that the particles could be trapped and channeled efficiently. To realize this, one has to match the crystal design and location to the application.

♦ http://mail.ihep.ru/~biryukov/
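The numbers in this estimate are easy to reproduce. The Python sketch below evaluates the critical angle from the approximate rule 5 µrad / √E[TeV], compares it with the 150 µrad divergence, and forms a crude phase-space area as the product of the angular extent and the vertex spread. The function names are hypothetical, and taking the area as a simple product is an assumption made for this illustration, not the authors' definition of emittance.

```python
import math


def critical_angle_urad(energy_tev):
    """Approximate critical angle for Si(110) planar channeling: 5 urad / sqrt(E[TeV])."""
    return 5.0 / math.sqrt(energy_tev)


def vertex_spread_um(sigma_beam_um):
    """Rms transverse spread of the interaction vertex: sigma_beam / sqrt(2)."""
    return sigma_beam_um / math.sqrt(2.0)


energy_tev = 7.0
divergence_urad = 150.0   # largest emission angle quoted in the text
sigma_beam_um = 16.0      # rms beam size at the IP for beta* = 0.55 m

theta_c = critical_angle_urad(energy_tev)
sigma_vertex = vertex_spread_um(sigma_beam_um)
ratio = divergence_urad / theta_c
# Crude phase-space area in units of pi urad*mm: angular extent times source size.
area_pi_urad_mm = divergence_urad * sigma_vertex * 1e-3

print(f"theta_C ~ {theta_c:.2f} urad")
print(f"divergence / theta_C ~ {ratio:.0f}")
print(f"source area ~ {area_pi_urad_mm:.2f} pi urad*mm")
```

With these inputs the divergence exceeds the critical angle by nearly two orders of magnitude, while the source area stays below the 2π µrad-mm bound quoted above.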
For the LHC scenario with high luminosity, we find the most efficient design to be a crystal with a point-to-parallel focusing entry face. The focusing crystal proposed by A.I. Smirnov in 1985 has a face shaped so that the tangents to the crystal planes cross at a focus line at some distance $L_F$ from the crystal [3]. This kind of crystal was successfully tested at IHEP, where it efficiently trapped a beam of ±2 mrad divergence (or ~100 times $\theta_C$) [18]. The crystal traps protons emerging from the focus line uniformly over the whole angular range if the entry face has a proper shape [18]. We find that for most efficient channeling in the LHC the focus distance $L_F$ of this crystal should be equal to the effective distance $L_{eff}$ between the crystal location and the IP. In a drift space, $L_{eff}$ is the geometrical distance. In an accelerator lattice, $L_{eff} = \sqrt{\beta^*\beta_C}\,\sin\Delta\psi$, where $\beta_C$ is the β value at the crystal and ∆ψ is the phase advance from the IP to the crystal. A plain crystal with a flat entry face has $L_F = \infty$.

In simulations with the low β* optics settings, a focusing crystal shows best efficiency if installed at a location with effective distance $L_{eff} \geq 15$ m from the IP. It can be a Si (110) or (111) crystal of ≈(0.15 mrad)×$L_{eff}$ ≥ 2.5 mm transverse size in order to capture all diffractive protons efficiently. Simulation predicts that a focusing Si(110) crystal with $L_F = L_{eff}$ traps into channeling mode 90% of the 7 TeV protons emerging from the IP in an angular range of 150 µrad width. The efficiency figure is almost independent of the crystal location provided $L_{eff} \geq 15$ m.

The reason for high efficiency at high $L_{eff}$ is that, at a distance $L_{eff}$ from the IP, any point on the crystal entry face sees the beam source of size σ (at the IP) at an angle of $\sigma/L_{eff}$. Channeling efficiency reduces by a factor of about $\sqrt{1 - (\sigma/L_{eff}\theta_C)^2}$ [3]. The reduction in efficiency by a factor of $\approx 1 - \sigma^2/(2L_{eff}^2\theta_C^2)$ becomes negligible for $L_{eff}^2 \gg \sigma^2/2\theta_C^2 = \beta^*\epsilon/4\theta_C^2$, where ε is the beam emittance.
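As a rough numerical check of this argument, the Python sketch below evaluates the reduction factor $\sqrt{1 - (\sigma/L_{eff}\theta_C)^2}$ and the length scale $\sqrt{\sigma^2/2\theta_C^2}$ for 7 TeV protons, using $\sigma = \sigma_{beam}/\sqrt{2}$ with $\sigma_{beam}$ = 16 µm. The helper names and the sample values of $L_{eff}$ are assumptions chosen for illustration. The analytic factor is only an estimate and is not a substitute for the CATCH simulations quoted in the text.

```python
import math


def effective_distance(beta_star_m, beta_crystal_m, phase_advance_rad):
    """L_eff = sqrt(beta* beta_C) sin(delta psi) in an accelerator lattice."""
    return math.sqrt(beta_star_m * beta_crystal_m) * math.sin(phase_advance_rad)


def efficiency_factor(l_eff_m, sigma_m, theta_c_rad):
    """Reduction factor sqrt(1 - (sigma / (L_eff theta_C))^2).

    Returns 0 when the source subtends more than the critical angle.
    """
    x = sigma_m / (l_eff_m * theta_c_rad)
    return math.sqrt(1.0 - x * x) if x < 1.0 else 0.0


theta_c = 5e-6 / math.sqrt(7.0)    # critical angle at 7 TeV, in rad
sigma = 16e-6 / math.sqrt(2.0)     # vertex spread for sigma_beam = 16 um, in m
# Length scale sqrt(sigma^2 / (2 theta_C^2)) above which the loss is small.
saturation_length = sigma / (math.sqrt(2.0) * theta_c)

print(f"saturation length ~ {saturation_length:.1f} m")
for l_eff in (15.0, 20.0, 30.0):   # sample distances, chosen for illustration
    factor = efficiency_factor(l_eff, sigma, theta_c)
    print(f"L_eff = {l_eff:5.1f} m  ->  factor ~ {factor:.3f}")
```

The factor approaches one quickly once $L_{eff}$ exceeds the saturation length by a factor of a few, which is the sense in which the efficiency becomes almost independent of the crystal location.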
With β*=0.55 m and a Si(110) crystal, channeling efficiency saturates for $L_{eff}^2 \gg$ (4

One more idea for efficient channeling of forward protons in the LHC is that a crystal can be installed with planes parallel to either the x' or the y' plane. For the application, it is not critical whether the crystal bends protons in the horizontal or the vertical plane to produce an offset at the detector. But the distance $L_{eff}$ in the accelerator lattice from the IP to the crystal can be very different in the x and y planes. The channeling efficiency is then very different depending on whether the crystal traps protons in the x' or the y' plane. We suggest in this case installing the crystal for channeling in the plane with the larger $L_{eff}$. For instance, at the location 200 m downstream of IP5 (CMS) and some 20 m ahead of the Roman Pot station at 220 m, where a crystal could be installed, $L_{eff}$ ≈ 6 m in the x' plane and ≈ 20 m in the y' plane. According to the analysis above, channeling efficiency should be high in the y' plane and moderate in the x' plane. Indeed, our simulations for this location show a channeling efficiency of ≈87% in the y' plane and only ≈60% in the x' plane, for β*=0.55 m and optimal Si(110) crystals. A plain Si crystal has a channeling efficiency in the x' plane of just 3.5%, or 17 times lower than a Si focusing crystal adapted to the LHC lattice.

For the run-in phase of the LHC with β*=2 m we find that a channeling efficiency of ≥ 85% can be achieved if the crystal is located at $L_{eff} \geq 30$ m downstream of the IP. The nominal, high luminosity optics of the LHC is not optimized for forward proton detection. Therefore the possibility of using a channeling crystal can be very helpful, as it offers opportunities for diffractive physics studies otherwise inaccessible in the nominal LHC settings.

The LHC options with a high β* (1540 and 90 m) are devised for the studies of diffractive physics. With β*=1540 m, the emittance of diffractively scattered protons increases to ≈50π µrad-mm. This corresponds to the acceptance of a Si crystal of ≥ 30 mm transverse size.
Such a crystal is not out of the question; the problem, however, is where to fit it in the LHC. In terms of $L_{eff}$, good channeling efficiency requires a location with $L_{eff}^2 \gg (130\ \mathrm{m})^2$ in this optics. We simulated channeling at the location 200 m from IP5. In the β*=1540 m optics, a 10-cm Si(110) crystal trapped protons and bent them by 0.5 mrad in the x' plane with an efficiency of 41%. A Ge(110) crystal shows 48% efficiency there, i.e. comparable to Si. All figures assume a perfect match $L_F = L_{eff}$ in the crystal. A plain Si crystal gives an efficiency of <<1%, or 300 times lower than a Si focusing crystal adapted to the LHC lattice at this location.

In the β*=90 m option at the same location, the choice of plane is important because $L_{eff}$ ≈ 10 m in the x' plane and ≈ 170 m in the y' plane. The preferred location should have $L_{eff}^2 \gg (60\ \mathrm{m})^2$, so we expect very different efficiencies in the x' and y' planes. Our simulations give a crystal efficiency of 72% in the y' plane and only 7% in the x' plane for β*=90 m. Here, crystal application is feasible only with bending in the vertical plane.

Low efficiency may exclude the use of a crystal for double-pomeron-exchange events (pp→pXp) with double-arm reconstruction, because the probability of channeling in both arms in coincidence becomes small, e.g. $(41\%)^2 \approx 17\%$. For reconstruction of single-diffraction events (pp→pX), more detailed studies are required before the benefits (or their absence) of using a crystal with the high β* options can be understood.

In this paper we suggest the use of a single crystal for proton extraction from the halo and delivery to the detector. The use of a 2-stage crystal system [12], with a first crystal for extracting a proton and a second one for bending it through a large angle, would reduce the overall efficiency by a factor of ~0.6 (ideally) or less, since the 2nd crystal traps only part of the protons channeled in the 1st one.

Finally, we notice that one can filter diffraction events with a crystal. Instead of trapping all forward protons, crystal acceptance can be made smaller and sample e.g.
only the most forward protons emerging from the IP with the angles of a few µrad. 3. Precise transmission in a single (x, x’) plane Whereas protons are physically delivered from the IP to detector with good efficiency, the essential question is whether the information on phase space (x, x', y, y', E) distribution of particles is lost or corrupted while the particles are captured and transmitted in crystal. The success of experiments on measuring forward high momentum protons at the LHC depends on the angular precision of proton track reconstruction. A plain crystal would destroy the phase space information first by selecting particles from just a single direction and then disturbing the exit angle of particle by coherent and incoherent scattering in crystal. Plain crystal acceptance is ±θC and crystal accuracy in angle transmission is again ±θC. That means, a plain crystal traps and delivers about zero bit information on angle distribution. In this paper we design a crystal with the acceptance of ~100 θC and angle transmission accuracy of ~0.1 θC, although it sounds against the nature of crystal channeling. Suppose particles are coming with a distribution over (x, x', y, y', E). Ideally, we would like the crystal to trap all coming particles and preserve their distribution over (x, x', y, y', E), and then shift an angle of θ each particle towards a physical setup where this distribution can be analyzed in detail. One should solve two problems. One problem was to trap and bend a beam with a divergence much greater than the critical angle. A focusing crystal adapted to the LHC optics solves this problem. In simulations, a focusing crystal traps with 90% efficiency all protons emerging from the IP with the angular distribution ~100 times θC. Notice that the trapped particles fully preserve also their distribution over the angle in the plane orthogonal to the plane of channeling. 
In such a crystal, particles are trapped uniformly from a very broad distribution over x’ and y’. A bent crystal would transform (x, x', y, y') at the entrance into (x, x'+θ, y, y') at the exit. To do so, each trapped particle has to be channeled over the same distance in crystal. Therefore, the shape of the crystal exit face must match the entry face. Then in a bent crystal each channeled particle receives the same bending angle. Although the crystal described above can solve the idea of sampling a broad distribution of particles and delivering it to a required destination, second problem is how to preserve the sampled distribution (x, x', y, y') “frozen” on transmission through the crystal lattice as precise as possible. The coordinates (x, y) of particles are obviously preserved in crystal, so one should take care of the accuracy in transmission of angles x’, y’ only. The protons channeled between atomic planes in crystal are disturbed by (1) oscillations in the channeling plane with an amplitude up to θC and by (2) scattering on a rarefied electronic gas (mostly valence electrons) in both planes, x’ and y’. Notice that nuclear scattering will not disturb the sample of transmitted channeled particles as this process is strongly suppressed for channeled positive particles. Simply saying, any particle nuclear scattered would be dechanneled and thus not present in the sample of bent particles. That gives us the first idea that partially solves the problem of transmission accuracy. The idea is that the information on crystal-captured particles is very well preserved in one plane, e.g. (x, x'), while the particles are trapped and bent in another plane, e.g. (y, y’). Notice that particle distribution in the plane orthogonal to channeling is favored twice. Firstly, they are easily trapped with a broad angular distribution; secondly they are transmitted with a very little scattering. Information in this plane will be best preserved. 
The opportunity to have perfect data on just one plane is interesting for applications. The reconstruction of the Higgs boson mass in reaction pp→p+H+p requires (x, x’) data in horizontal plane only [13]. Figure 1 The difference in proton angles, x’ and y’, before and after a Si(110) crystal. Oscillations in the channeling plane on the atomic coherent potential are a greater problem. Fig. 1 shows a distribution of the difference in proton angle in x’ and y’ planes before and after a channeling in crystal, (x’OUT – x’IN) and (y’OUT – y’IN – 0.1 mrad), as obtained in simulations for a Si(110) bent crystal channeling in y’ plane. The accuracy in x’ transmission in crystal is very good indeed, ~0.1 µrad rms. The width of (y’OUT – y’IN – 0.1 mrad) distribution is much greater due to oscillations in the potential of Si (110) planes. 4. Precise transmission in both planes To solve the problem of accuracy in the other plane, i.e. the plane of channeling, one solution is to use a channel with a lower critical angle, for instance Si(100) instead of (110) or (111). A more universal solution is to use a strongly bent crystal. The critical angle θC is gradually reduced to zero when the crystal curvature approaches a critical value. The strong focusing of a strongly bent crystal suppresses channeling oscillations to any low level needed in the application. Fig. 2 shows the difference in proton angle in x’ and y’ planes before and after channeling in a crystal, (x’OUT – x’IN– 0.1 mrad) and (y’OUT – y’IN), as obtained in simulations for a 2 mm Si(100) crystal bent 0.1 mrad. The protons were channeled in x’ plane. The rms value of angle smearing found in simulations is 0.2 µrad both for x’ and y’. Figure 2 The same as in Fig. 1 but for a strongly bent Si (100) crystal. This accuracy should be compared to the angular resolution of the detectors downstream of the crystal. 
Measuring proton coordinates with ~10 µm resolution [13] over a base of ~8 m, as allowed by a drift space, would give an angular accuracy of ~1.4×10 µm/8 m ≈ 1.8 µrad. Adding the crystal transmission accuracy in quadrature does not change this resolution, so the crystal is more than adequate here. With a much better resolution on the detector side, down to 0.5 µrad, the overall resolution becomes ~0.55 µrad, i.e. only slightly degraded. Crystal transmission in both planes, x’ and y’, is still almost perfect. Because of scattering on the electron gas, the protons lose energy in the crystal. In simulations, the energy loss and its fluctuations in the crystal are ∆E/E ≈ 10^-7 to 10^-6, i.e. much smaller than even the nominal energy spread in the LHC beam, 1.1×10^-4. Energy losses in bent crystals were studied in experiments at the CERN SPS with protons of 450 GeV and Pb ions of 33 TeV, where CATCH predictions were also validated [9]. The diffractively scattered protons would have an energy spread on the order of 100 GeV, or ∆E/E ≈ 1.5%, at the crystal entrance. In simulations with β*=0.55-2 m optics, channeling efficiency was completely independent of energy even for ∆E/E ≈ 10%. In the high β* options, crystal efficiency was uniform within ≈0.7% for ∆E/E ≈ 1.5%. One can say that a phase space distribution (x, y, x', y', E) can be perfectly preserved in the crystal and no information is lost on transmission. Figure 3 An example of beam space (x’, y’) at the entrance to the crystal (a) and at the exit (b). Fig. 3 shows an example of an (x’, y’) plot at the entrance to the crystal (a) and at its exit (b), where we tried to show how accurately a crystal can transmit a signature in angular space (a semicircle was chosen as a probe). The resolution of the image transmitted by a crystal is ~0.2 µrad in both planes. 
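The resolution estimates quoted earlier in this section can be checked with a few lines of arithmetic. The following is only a sketch: the function names are ours, the 1.4 prefactor is taken from the text as given, and the crystal contribution is assumed to be the ~0.2 µrad rms found in the simulations.

```python
import math


def detector_angular_resolution_urad(position_resolution_um, base_m, prefactor=1.4):
    """Angular accuracy in µrad; µm divided by m is numerically µrad.

    The prefactor 1.4 is the one quoted in the text (hypothetical default).
    """
    return prefactor * position_resolution_um / base_m


def add_in_quadrature(*sigmas):
    """Combine independent rms contributions in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))


CRYSTAL_URAD = 0.2  # assumed crystal transmission accuracy, rms

detector = detector_angular_resolution_urad(10.0, 8.0)
nominal_total = add_in_quadrature(detector, CRYSTAL_URAD)
fine_total = add_in_quadrature(0.5, CRYSTAL_URAD)

print(f"detector only: {detector:.2f} urad")
print(f"detector + crystal: {nominal_total:.2f} urad")
print(f"0.5 urad detector + crystal: {fine_total:.2f} urad")
```

With a 1.75 µrad detector term the crystal adds about 0.01 µrad, and with a 0.5 µrad detector the total is about 0.54 µrad, in line with the ~0.55 µrad quoted.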
In terms of the critical channeling angle θC, the obtained resolution is an order of magnitude finer than θC, while the size of the trapped and channeled area can be some orders of magnitude greater than θC. In applications there is no point in making the crystal transmission too perfect. It should match the other sources of inaccuracy, such as multiple scattering in the detectors and vacuum chambers. By tuning the crystal parameters one could, in principle, greatly improve the precision of the beam image downstream of the crystal but lose in brightness of the image, i.e. in statistics rate, as the efficiency of crystal transmission could be affected. Finally, we suggest another idea for the channeling plane (a “microscope idea”) that improves not only the crystal accuracy but even the detector resolution in that plane. A crystal can magnify the beam image in one plane, e.g. transform the entrance values (x, x', y, y') into exit values (x, Nx' +θ, y, y'). The magnification factor N can be as big as 2 or 10 or even 100, and serves to strongly increase the overall angular resolution in x’. In the above examples, the overall resolution was ~1 µrad, defined by the detector resolution. With magnification optics, the overall inaccuracy in x’ would be effectively reduced by a factor N, bringing it below 0.1 µrad rms. Magnification is realized by making the shape of the crystal exit face different from the entry face. With a magnification factor of 10, for example, an entry angular opening of 50 µrad would correspond to an exit opening of 500 µrad. 5. Conclusion We have shown in simulations that a crystal lattice can trap with 90% efficiency a beam with an (x', y') distribution much broader than the critical angle θC. To achieve that, one has to match the crystal focal length to the effective length between the particle source and the crystal in the accelerator lattice. Crystal adaptation to the accelerator lattice improved channeling efficiency up to 300-fold. 
Crystal can transmit the trapped particles in channeled states with the phase space (x, x', y, y', E) distribution preserved with accuracy an order of magnitude finer than θC. Several solutions were proposed and supported by simulations for achieving a fine resolution in crystal transmission. This may give a beam instrument for collision products in colliders. Usually, accelerator beam instruments prepare particles for collision: by cooling them, bending, focusing, etc. Detectors sort out the results of collision. We change this a bit by introducing crystal optics between the collision point and detectors. A crystal adapted to the LHC lattice can trap with 90% efficiency all protons emerging from the IP with divergence of 150 µrad or ~100θC. The trapped protons can be channeled to detectors with precision down to 0.1 µrad rms. This makes feasible a crystal application for the measurement of diffractive scattering in CMS and ATLAS at the LHC. While we showed the physical capabilities of crystal channeling, its actual application in the LHC environment has to take into account many technical considerations to fit into existing infrastructure of accelerator and detectors. Crystal channeling of LHC forward protons can improve proton acceptance in momentum loss ξ and four-momentum transfer t both in TOTEM and FP420 and allow to reach the smallest possible value of the scattering angle [9]. Now the sensitive detector area starts at ~12-15σ from the LHC beam [10]. Crystal can be placed at ~6σ from the LHC beam as it is very small, ~cm Si, and does not provoke beam instability. Such a crystal can trap and deliver a very useful information on most forward high momentum “quasi-elastic” and elastic protons at LHC, unavailable otherwise. There are practical benefits as well. Crystal would relax tough requirements on β* needed for TOTEM. Crystal may allow TOTEM to run at the early start of the LHC, possibly running in parallel to other experiments. 
Thanks to crystal, FP420 detectors could possibly reside out of the cold region. The detectors don’t need to be edgeless. Crystal works best with low β*, where FP420 is interested most. If detectors can be more distanced from the beam, background conditions may improve. For injection, the active areas of the detectors must be kept away from the beams and then moved back; instead, one can move a crystal. Crystal can be introduced to experiment on a later stage in an attempt to expand the horizons of the physics program. References [1] D.S. Gemmel, Rev. Mod. Phys. 46, 1 (1974) [2] E.N. Tsyganov, FNAL TM-682 (1976). A.S. Vodopianov et al., JETP Lett. 30, 474 (1979) [3] V.M. Biryukov, Yu.A. Chesnokov and V.I. Kotov, Crystal Channeling and its Application at High Energy Accelerators. Berlin: Springer (1997) [4] M.B.H. Breese, Nucl. Instr. and Meth. B 132, 540 (1997) [5] R.A. Carrigan et al., Phys. Rev. ST AB 5, 043501 (2002) [6] A.G. Afonin et al., Nucl. Instr. and Meth. B 234, 14 (2005); Phys. Lett. B 435, 240 (1998); JETP Lett. 67, 781 (1998) [7] R.A. Carrigan et al., Phys. Rev. ST AB 1, 022801 (1998); V. Biryukov. Phys. Rev. E 52, 6818 (1995) [8] R.P. Fliller et al. Phys. Rev. ST AB 9, 013501 (2006); Nucl. Instr. Meth. B 234, 47 (2005); AIP Conf. Proc. 693, 192 (2004) [9] S.P. Moller et al. Phys. Rev. A 64, 032902 (2001); S.P. Moller et al., Nucl. Instr. and Meth. B 84, 434 (1994); V. Biryukov, Nucl. Instr. and Meth. B 117, 357 (1996). [10] E. Uggerhoj and U.I. Uggerhoj, Nucl. Instr. and Meth. B 234, 31 (2005); V.M. Biryukov et al., Nucl. Instr. and Meth. B 234, 23 (2005); arXiv:physics/0307027 [11] V.M. Biryukov and S. Bellucci, Nucl. Instr. and Meth. B 252, 7 (2006); arXiv:hep-ex/0504021 [12] K. Eggert and P. Grafstrom. Presented at CARE-HHH-APD Mini-Workshop on Crystal Collimation (CC-2005), Geneva, 2005. M. Albrow. Talk given at CERN (2006). [13] M. Albrow et al., CERN/LHCC 2006-039/G-124. [14] V. Biryukov, Nucl. Instrum. and Meth. B 53, 202 (1991); A. 
Taratin et al., Nucl. Instrum. and Meth. B 58, 103 (1991); V.M. Biryukov, Nucl. Instr. and Meth. B 117, 463 (1996) [15] A. Baurichter et al. Nucl. Instr. and Meth. B 164-165, 27 (2000) [16] V. Biryukov. Phys. Rev. E 51, 3522 (1995); CERN SL/Note 93-74 AP (1993). [17] S. Strokov et al., submitted to J. Phys. Soc. Jap. [18] V.I. Baranov et al., Nucl. Instr. and Meth. B 95, 449 (1995). ABSTRACT We show that crystal can trap a broad (x, x', y, y', E) distribution of particles and channel it preserved with a high precision. This sampled-and-hold distribution can be steered by a bent crystal for analysis downstream. In simulations for the 7 TeV Large Hadron Collider, a crystal adapted to the accelerator lattice traps 90% of diffractively scattered protons emerging from the interaction point with a divergence 100 times the critical angle. We set the criterion for crystal adaptation improving efficiency ~100-fold. Proton angles are preserved in crystal transmission with accuracy down to 0.1 microrad. This makes feasible a crystal application for measuring very forward protons at the LHC. <|endoftext|><|startoftext|> IFIC/07-03 Probing non-standard neutrino interactions with supernova neutrinos A. Esteban-Pretel, R. Tomàs and J. W. F. Valle1 1AHEP Group, Institut de F́ısica Corpuscular - C.S.I.C/Universitat de València Edifici Instituts d’Investigació, Apt. 22085, E-46071 València, Spain (Dated: November 4, 2018) We analyze the possibility of probing non-standard neutrino interactions (NSI, for short) through the detection of neutrinos produced in a future galactic supernova (SN). We consider the effect of NSI on the neutrino propagation through the SN envelope within a three-neutrino framework, paying special attention to the inclusion of NSI-induced resonant conversions, which may take place in the most deleptonised inner layers. 
We study the possibility of detecting NSI effects in a Megaton water Cherenkov detector, either through modulation effects in the ν̄e spectrum due to (i) the passage of shock waves through the SN envelope, (ii) the time dependence of the electron fraction and (iii) the Earth matter effects; or, finally, through the possible detectability of the neutronization νe burst. We find that the ν̄e spectrum can exhibit dramatic features due to the internal NSI-induced resonant conversion. This occurs for non-universal NSI strengths of a few %, and for very small flavor-changing NSI above a few $\times 10^{-5}$.

PACS numbers: 13.15.+g, 14.60.Lm, 14.60.Pq, 14.60.St, 97.60.Bw

I. INTRODUCTION

The very first data of the KamLAND collaboration [1] were enough to isolate neutrino oscillations as the correct mechanism explaining the solar neutrino problem [2, 3], indicating also that large mixing angle (LMA) was the right solution. The 766.3 ton-yr KamLAND data sample further strengthens the validity of the LMA oscillation interpretation of the data [4]. Current data imply that neutrinos have mass. For an updated review of the current status of neutrino oscillations see [5]. Theories of neutrino mass [6, 7] typically require that neutrinos have non-standard properties such as neutrino electromagnetic transition moments [8, 9, 10] or non-standard four-Fermi interactions (NSI, for short) [11, 12, 13]. The expected magnitude of the NSI effects is rather model-dependent. Seesaw-type models lead to a non-trivial structure of the lepton mixing matrix characterizing the charged and neutral current weak interactions [6]. The NSI which are induced by the charged and neutral current gauge interactions may be sizeable [14, 15, 16, 17, 18]. Alternatively, non-standard neutrino interactions may arise in models where neutrino masses are radiatively “calculable” [19, 20]. 
Finally, in some supersymmetric unified models, the strength of non-standard neutrino interactions may arise from renormalization and/or threshold effects [21]. We stress that non-standard interaction strengths are highly model-dependent. In some models NSI strengths are too small to be relevant for neutrino propagation, because they are either suppressed by some large mass scale or restricted by limits on neutrino masses, or both. However, this need not be the case, and there are many theoretically attractive scenarios where moderately large NSI strengths are possible and consistent with the smallness of neutrino masses. In fact one can show that NSI may exist even in the limit of massless neutrinos [14, 15, 16, 17, 18]. This may also occur in the context of fully unified models like SO(10) [22]. We argue that, in addition to the precision determination of the oscillation parameters, it is necessary to test for sub-leading non-oscillation effects that could arise from non-standard neutrino interactions. These are a natural outcome of many neutrino mass models and can be of two types: flavor-changing (FC) and non-universal (NU). These are constrained by existing experiments (see below) and, with neutrino experiments now entering a precision phase [23], an improved determination of neutrino parameters and their theoretical impact constitutes an important goal in astroparticle and high energy physics [5].

http://arxiv.org/abs/0704.0032v1

Here we concentrate on the impact of non-standard neutrino interactions on supernova physics. We show how complementary information on the NSI parameters could be inferred from the detection of core-collapse supernova neutrinos. The motivation for the study is twofold. First, if a future SN event takes place in our Galaxy the number of neutrino events expected in the current or planned neutrino detectors would be enormous, $O(10^4 - 10^5)$ [24]. 
Moreover, the extreme conditions under which neutrinos have to travel from their creation in the SN core, in strongly deleptonised regions at nuclear densities, until they reach the Earth, lead to strong matter effects. In particular the effect of small values of the NSI parameters can be dramatically enhanced, possibly leading to observable consequences. This paper is planned as follows. In Sec. II we summarize the current observational bounds on the parameters describing the NSI, including previous works on NSI in SNe. In Sec. III we describe the neutrino propagation formalism as well as the SN profiles which will be used. In Sec. IV we analyze the effect of NSI on the ν propagation in the inner regions near the neutrinosphere and in the outer regions of the SN envelope. In Sec. V we discuss the possibility of using various observables to probe the presence of NSI in the neutrino signal of a future galactic SN. Finally in Sec. VI we present our conclusions.

II. PRELIMINARIES

A large class of non-standard interactions may be parametrized with the effective low-energy four-fermion operator:

$\mathcal{L}_{NSI} = -\varepsilon^{fP}_{\alpha\beta}\, 2\sqrt{2}\, G_F\, (\bar{\nu}_\alpha \gamma_\mu L \nu_\beta)(\bar{f} \gamma^\mu P f)$ , (1)

where P = L, R and f is a first generation fermion: e, u, d. The coefficients $\varepsilon^{fP}_{\alpha\beta}$ denote the strength of the NSI between the neutrinos of flavors α and β and the P-handed component of the fermion f. Current constraints on $\varepsilon^{fP}_{\alpha\beta}$ come from a variety of different sources, which we now briefly list.

A. Laboratory

Neutrino scattering experiments [25, 26, 27, 28, 29] provide the following bounds, $|\varepsilon^{fP}_{\mu\mu}| \lesssim 10^{-3} - 10^{-2}$, $|\varepsilon^{fP}_{ee}| \lesssim 10^{-1} - 1$, $|\varepsilon^{fP}_{\mu\tau}| \lesssim 0.05$, $|\varepsilon^{fP}_{e\tau}| \lesssim 0.5$ at 90 % C.L. [30, 31, 32]. On the other hand the analysis of the e+e− → νν̄γ cross section measured at LEP II leads to a bound on $|\varepsilon^{eP}_{\tau\tau}| \lesssim 0.5$ [33]. 
Future prospects for improving the current limits involve measuring $\sin^2\vartheta_W$ leptonically in scattering off electrons in the target, as well as in neutrino deep inelastic scattering at a future neutrino factory. The main improvement would be in the case of $|\varepsilon^{fP}_{ee}|$ and $|\varepsilon^{fP}_{e\tau}|$, where values as small as $10^{-3}$ and 0.02, respectively, could be reached [31]. The search for flavor violating processes involving charged leptons is expected to restrict the corresponding neutrino interactions, to the extent that the SU(2) gauge symmetry is assumed. However, this can at most give indicative order-of-magnitude restrictions, since we know SU(2) is not a good symmetry of nature. Using radiative corrections it has been argued that, for example, µ − e conversion on nuclei, as in the case of µ−Ti, also constrains $|\varepsilon^{qP}_{\mu e}| \lesssim 7.7 \times 10^{-4}$ [31]. Non-standard interactions can also affect neutrino propagation through matter, probed in current neutrino oscillation experiments. The bounds so obtained apply to the vector coupling constant of the NSI, $\varepsilon^{fV}_{\alpha\beta} = \varepsilon^{fL}_{\alpha\beta} + \varepsilon^{fR}_{\alpha\beta}$, since only this appears in neutrino propagation in matter [91].

B. Solar and reactor

The role of neutrino NSIs as subleading effects on the solar neutrino oscillations and KamLAND has been recently considered in Refs. [34, 35, 36], with the following bounds at 90 % CL: for $\varepsilon \equiv -\sin\vartheta_{23}\, \varepsilon^{dV}_{e\tau}$ the allowed range is $-0.93 \lesssim \varepsilon \lesssim 0.30$, while for the diagonal term $\varepsilon' \equiv \sin^2\vartheta_{23}\, \varepsilon^{dV}_{\tau\tau} - \varepsilon^{dV}_{ee}$ the only forbidden region is [0.20, 0.78] [36]. Only in the ideal case of an infinitely precise determination of the solar neutrino oscillation parameters would the allowed range “close from the left” for negative NSI parameter values, at −0.6 for ε and −0.7 for ε′.

C. Atmospheric and accelerator neutrinos

Non-standard interactions involving muon neutrinos can be constrained by atmospheric neutrino experiments as well as accelerator neutrino oscillation searches at K2K and MINOS. In Ref. 
[37] Super-Kamiokande and MACRO observations of atmospheric neutrinos were considered in the framework of two neutrinos. The limits obtained were $-0.05 \lesssim \varepsilon^{dV}_{\mu\tau} < 0.04$ and $|\varepsilon^{dV}_{\tau\tau} - \varepsilon^{dV}_{\mu\mu}| \lesssim 0.17$ at 99 % CL. The same data set together with K2K was recently considered in Refs. [38, 39] to study the non-standard neutrino interactions in a three generation scheme under the assumption $\varepsilon_{e\mu} = \varepsilon_{\mu\mu} = \varepsilon_{\mu\tau} = 0$. The allowed region of $\varepsilon_{\tau\tau}$ obtained for values of $\varepsilon_{e\tau}$ smaller than $O(10^{-1})$ becomes $\sum_{f=u,d,e} \varepsilon^{fV}_{\alpha\beta} N_f/N_e \lesssim 0.2$ [39], where $N_f$ stands for the fermion number density.

D. Cosmology

If non-standard interactions with electrons were large they might also lead to important cosmological and astrophysical implications. For instance, neutrinos could be kept in thermal contact with electrons and positrons longer than in the standard case, hence they would share a larger fraction of the entropy release from e± annihilations. This would affect the predicted features of the cosmic background of neutrinos. As recently pointed out in Ref. [40], however, the required couplings are larger than the current laboratory bounds.

E. NSI in Supernovae

According to the currently accepted supernova (SN) paradigm, neutrinos are expected to play a crucial role in SN dynamics. As a result, SN physics provides a laboratory to probe neutrino properties. Moreover, many future large neutrino detectors are currently being discussed [41]. The enormous number of events, $O(10^4 - 10^5)$, that would be “seen” in these detectors indicates that a future SN in our Galaxy would provide a very sensitive probe of non-standard neutrino interaction effects. The presence of NSI can lead to important consequences for the SN neutrino physics both in the highly dense core and in the envelope, where neutrinos basically stream freely. The role of non-forward neutrino scattering processes on heavy nuclei and free nucleons giving rise to flavor change within the SN core has been recently analyzed in Ref. 
[42, 43]. The main effect found was a reduction in the core electron fraction Ye during core collapse. A lower Ye would lead to a lower homologous core mass, a lower shock energy, and a greater nuclear photon-disintegration burden for the shock wave. By allowing a maximum ∆Ye = −0.02 it has been claimed that $\varepsilon_{e\alpha} \lesssim 10^{-3}$, where α = µ, τ [43]. On the other hand, it was noted long ago that the existence of NSI plays an important role in the propagation of SN neutrinos through the envelope, leading to the possibility of a new resonant conversion. In contrast to the well known MSW effect [44, 45] it would take place even for massless neutrinos [13]. Two basic ingredients are necessary: non-universal and flavor-changing NSI. In the original scheme neutrinos were mixed in the leptonic charged current and universality was violated thanks to the effect of mixing with heavy gauge singlet leptons [6, 14]. Such a resonance would induce strong neutrino flavor conversion for neutrinos and antineutrinos simultaneously, possibly affecting the neutrino signal of SN1987A as well as the possibility of having r-process nucleosynthesis. This was first quantitatively considered within a two-flavor νe–ντ scheme, and bounds on the relevant NSI parameters were obtained using both arguments [46]. One of the main features of such an “internal” or “massless” resonant conversion mechanism is that it requires the violation of universality, its position being determined only by the matter chemical composition, namely the value of the electron fraction Ye, and not by the density. In view of the experimental upper bounds on the NSI parameters such a new resonance can only take place in the inner layers of the supernova, near the neutrinosphere, where Ye takes its minimum values. In this region the values of Ye are small enough to allow for resonant conversions to take place in agreement with existing bounds on the strengths of non-universal NSI parameters. 
The SN physics implications of another type of NSI present in supersymmetric R-parity violating models have also been studied in Ref. [47], again for a system of two neutrinos. For definiteness NSI on d-quarks were considered, in two cases: (i) massless neutrinos without mixing in the presence of flavor-changing (FC) and non-universal (NU) NSIs, and (ii) neutrinos with eV masses and FC NSI. Different arguments have been used in order to constrain the parameters describing the NSI, namely, the SN1987A signal, the possibility of successful r-process nucleosynthesis, and the possible enhancement of the energy deposition behind the shock wave to reactivate it. On the other hand several subsequent articles [48, 49, 50] considered the effects of NSI on the neutrino propagation in a three-neutrino mixing scenario for the case Ye > 0.4, typical for the outer SN envelope. Together with the assumption that $\varepsilon^{dV}_{\alpha\beta} \lesssim 10^{-2}$ this prevents the appearance of internal resonances, in contrast to previous references. Motivated by supersymmetric theories without R parity, in Ref. [48] the authors considered the effects of small-strength NSI with d-quarks. Following the formalism developed in Refs. [51, 52] they studied the corrections that such NSI would have on the expressions for the survival probabilities in the standard resonances MSW-H and MSW-L. A similar analysis was performed in Ref. [49] assuming Z-induced NSI interactions originating from additional heavy neutrinos. A phenomenological generalization of these results was carried out in Ref. [50]. The authors found a compact analytical expression for the survival probabilities in which the main effects of the NSI can be embedded through shifts of the mixing angles ϑ12 and ϑ13. In contrast to similar expressions found previously, these directly apply to all mixing angles, and to the case with Earth matter effects. 
The main phenomenological consequence was the identification of a degeneracy between ϑ13 and $\varepsilon_{e\alpha}$, similar to the analogous “confusion” between ϑ13 and the corresponding NSI parameter noted to exist in the context of long-baseline neutrino oscillations [53, 54]. We have now re-considered the general three-neutrino mixing scenario with NSI. In contrast to previous work [48, 49, 50], we have not restricted ourselves to large values of Ye, discussing also small values present in the inner layers. This way our generalized description includes both the possibility of neutrinos having the “massless” NSI-induced resonant conversions in the inner layers of the SN envelope [13, 46, 47], as well as the “outer” oscillation-induced conversions [48, 49, 50] [92].

III. NEUTRINO EVOLUTION

In this section we describe the main ingredients of our analysis. Our emphasis will be on the use of astrophysically realistic SN matter and Ye profiles, characterizing its density and the matter composition. Their details, in particular their time dependence, are crucial in determining the way the non-standard neutrino interactions affect the propagation of neutrinos in the SN medium.

A. Evolution Equation

As discussed in Sec. II, in an unpolarized medium the neutrino propagation in matter will be affected by the vector coupling constant of the NSI, $\varepsilon^{fV}_{\alpha\beta} = \varepsilon^{fL}_{\alpha\beta} + \varepsilon^{fR}_{\alpha\beta}$ [93]. The way the neutral current NSI modifies the neutrino evolution will be parametrized phenomenologically through the effective low-energy four-fermion operator described in Eq. (1). We also assume $\varepsilon^{fV}_{\alpha\beta}$ to be real, neglecting possible CP violation in the new interactions. Under these assumptions the Hamiltonian describing the SN neutrino evolution in the presence of NSI can be cast in the following form [94]

$i \frac{d}{dr} \nu_\alpha = (H_{\rm kin} + H_{\rm int})_{\alpha\beta}\, \nu_\beta$ , (2)

where $H_{\rm kin}$ stands for the kinetic term

$H_{\rm kin} = U \frac{M^2}{2E} U^\dagger$ , (3)

with $M^2 = {\rm diag}(m_1^2, m_2^2, m_3^2)$, and U the three-neutrino lepton mixing matrix [6] in the PDG convention [55] and with no CP phases. 
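To make the kinetic term concrete, the sketch below builds a real mixing matrix in the PDG ordering (rotations 23, 13, 12, no CP phase) and evaluates the kinetic Hamiltonian in the flavor basis. It is an illustration only: the mixing angles, mass splittings and energy are placeholder inputs chosen by us, not values taken from this paper, and the common term proportional to the identity is dropped by using diag(0, Δm²21, Δm²31), which does not affect oscillations.

```python
import math


def mixing_matrix(th12, th13, th23):
    """Real 3x3 mixing matrix, PDG ordering R23 * R13 * R12, no CP phase."""
    c12, s12 = math.cos(th12), math.sin(th12)
    c13, s13 = math.cos(th13), math.sin(th13)
    c23, s23 = math.cos(th23), math.sin(th23)
    return [
        [c12 * c13, s12 * c13, s13],
        [-s12 * c23 - c12 * s23 * s13, c12 * c23 - s12 * s23 * s13, s23 * c13],
        [s12 * s23 - c12 * c23 * s13, -c12 * s23 - s12 * c23 * s13, c23 * c13],
    ]


def kinetic_hamiltonian(u, dm21_ev2, dm31_ev2, energy_ev):
    """H_kin = U diag(0, dm21, dm31) U^T / (2E), in eV, for real U."""
    masses_sq = [0.0, dm21_ev2, dm31_ev2]
    return [
        [
            sum(u[a][i] * masses_sq[i] * u[b][i] for i in range(3)) / (2.0 * energy_ev)
            for b in range(3)
        ]
        for a in range(3)
    ]


# Illustrative inputs only (assumptions, not values from the paper).
U = mixing_matrix(math.asin(math.sqrt(0.3)), 0.1, math.pi / 4)
H_KIN = kinetic_hamiltonian(U, 8.0e-5, 2.5e-3, 1.0e7)  # E = 10 MeV

for row in H_KIN:
    print(["%.3e" % entry for entry in row])
```

Because U is orthogonal here, the resulting matrix is symmetric and its trace equals (Δm²21 + Δm²31)/(2E), which gives a quick consistency check on any implementation.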
The second term of the Hamiltonian accounts for the interaction of neutrinos with matter and can be split into two pieces,

$H_{\rm int} = H^{\rm std}_{\rm int} + H^{\rm nsi}_{\rm int}$ . (4)

The first term, $H^{\rm std}_{\rm int}$, describes the standard interaction with matter and can be written as $H^{\rm std}_{\rm int} = {\rm diag}(V_{CC}, 0, 0)$ up to one loop corrections due to the different masses of the muon and tau leptons [56]. The standard matter potential for neutrinos is given by

$V_{CC} = \sqrt{2}\, G_F N_e = V_0\, \rho\, Y_e$ , (5)

where $V_0 \approx 7.6 \times 10^{-14}$ eV, the density ρ is given in g/cm$^3$, and Ye stands for the relative number of electrons with respect to baryons. For antineutrinos the potential is identical but with the sign changed. The term in the Hamiltonian describing the non-standard neutrino interactions with a fermion f can be expressed as

$(H^{\rm nsi}_{\rm int})_{\alpha\beta} = \sum_{f=e,u,d} (V^f_{\rm nsi})_{\alpha\beta}$ , (6)

with $(V^f_{\rm nsi})_{\alpha\beta} \equiv \sqrt{2}\, G_F N_f\, \varepsilon^f_{\alpha\beta}$. For definiteness, and motivated by actual models, for example those with broken R parity supersymmetry, we take for f the down-type quark. An analogous treatment would apply to the case of NSI on up-type quarks, while the existence of NSI with electrons brings no drastic qualitative differences with respect to the pure oscillation case (see below). Therefore the NSI potential can be expressed as follows,

$(V^d_{\rm nsi})_{\alpha\beta} = \varepsilon^d_{\alpha\beta}\, V_0\, \rho\, (2 - Y_e)$ . (7)

From now on we will not explicitly write the superscript d. In order to further simplify the problem we will redefine the diagonal NSI parameters so that $\varepsilon_{\mu\mu} = 0$, as one can easily see that subtracting a matrix proportional to the identity leaves the physics involved in the neutrino oscillation unaffected.

B. Supernova matter profiles

Neutrino propagation depends on the supernova matter and chemical profile through the effective potential. This profile exhibits an important time dependence during the explosion. Fig. 1 shows the density ρ(t, r) and the electron fraction Ye(t, r) profiles for the SN progenitor as well as at different times post-bounce. 
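As a rough numerical illustration of the potentials in Eqs. (5) and (7), the sketch below evaluates both for a given density and electron fraction. The function names and the sample values of ρ, Ye and ε are our own assumptions. The factor (2 − Ye) is the number of d quarks per baryon in neutral matter: one per proton and two per neutron, i.e. Ye + 2(1 − Ye).

```python
V0_EV = 7.6e-14  # eV per (g/cm^3), the value quoted with Eq. (5)


def v_cc(rho_g_cm3, ye):
    """Standard charged-current potential of Eq. (5), in eV."""
    return V0_EV * rho_g_cm3 * ye


def v_nsi_d(eps, rho_g_cm3, ye):
    """NSI potential on d quarks of Eq. (7), in eV, for one epsilon entry."""
    return eps * V0_EV * rho_g_cm3 * (2.0 - ye)


def nsi_to_cc_ratio(eps, ye):
    """Ratio V_nsi / V_CC, which depends on Ye but not on the density."""
    return eps * (2.0 - ye) / ye


# Hypothetical sample point: eps = 0.01 in the outer envelope and in a
# strongly deleptonised layer.
print(v_cc(1.0e4, 0.5), v_nsi_d(0.01, 1.0e4, 0.5))
print(nsi_to_cc_ratio(0.01, 0.5), nsi_to_cc_ratio(0.01, 0.05))
```

The ratio grows from 3ε at Ye = 0.5 to 39ε at Ye = 0.05, which is one way to see why small NSI strengths matter most in the deleptonised inner layers.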
Progenitor density profiles can be roughly parametrized by a power-law function

$$ \rho(r) = \rho_0 \left(\frac{R_0}{r}\right)^n , \tag{8} $$

where $\rho_0 \sim 10^4$ g/cm³, $R_0 \sim 10^9$ cm, and n ∼ 3. The electron fraction profile varies depending on the matter composition of the different layers. For instance, typical values of Ye between 0.42 and 0.45 in the inner regions are found in stellar evolution simulations [57]. In the intermediate regions, where the MSW H- and L-resonances take place, Ye ≈ 0.5. This value can further increase in the outermost layers of the SN envelope due to the presence of hydrogen.

After the SN core bounce the matter profile is affected in several ways. First, a forward shock wave starts to propagate outwards and eventually ejects the SN envelope. The evolution of the shock wave strongly modifies the density profile and therefore the neutrino propagation [58, 59]. Following Ref. [60] we shall assume that the structure of the shock wave is more complicated and that an additional “reverse wave” appears due to the collision of the neutrino-driven wind with the slowly moving material behind the forward shock, as seen in the upper panel of Fig. 1 [95].

On the other hand, the electron fraction is also affected by the time evolution as the SN explosion proceeds. Once the collapse starts the core density grows, so that the neutrinos eventually become effectively trapped within the so-called “neutrinosphere”. At this point the trapped electron fraction has decreased to values of the order of 0.33 [61]. When the inner core reaches nuclear density it cannot contract any further and bounces. As a consequence a shock wave forms in the inner core and starts propagating outwards. When the newly formed supernova shock reaches densities low enough for the initially trapped neutrinos to begin streaming faster than the shock propagates [62], a breakout pulse of νe is launched.
In the shock-heated matter, which is still rich in electrons and completely disintegrated into free neutrons and protons, a large number of νe are rapidly produced by electron captures on protons. They follow the shock on its way out until they are released in a very luminous flash, the breakout burst, at about the moment when the shock penetrates the neutrinosphere and the neutrinos can escape essentially unhindered. As a consequence, the lepton number in the layer around the neutrinosphere decreases strongly and the matter neutronizes [63]. The value of Ye steadily decreases in these layers down to values of $O(10^{-2})$. Outside the neutrinosphere there is a steep rise up to Ye ≈ 0.5. This is a robust feature of the neutrino-driven baryonic wind. Neutrino heating drives the wind mass loss and causes Ye to rise within a few tens of km from low to high values, between 0.45 and 0.55 [64]; see the bottom panel of Fig. 1.

Inspired by the numerical results of Ref. [60], we have parametrized the behavior of the electron fraction near the neutrinosphere phenomenologically as

$$ Y_e = a + b \arctan[(r - r_0)/r_s] , \tag{9} $$

where a ≈ 0.23-0.26 and b ≈ 0.16-0.20. The parameters r0 and rs describe where the rise takes place and how steep it is, respectively. As can be seen in Fig. 1, both decrease with time.

FIG. 1: Density (upper panel) and electron fraction (bottom panel) profiles for the SN progenitor and at different instants after the core bounce, from Ref. [60]. The regions where the H (yellow) and the L (cyan) resonances take place are also indicated, as well as the NSI-induced I (gray) resonance for the parameters εee = 0, εττ ≲ 0.07 and |εµτ| ≲ 0.05.

IV. THE TWO REGIMES

In order to study the neutrino propagation through the SN envelope we split the problem into two different regions: the inner envelope, defined by the condition $V_{CC} \gg \Delta m^2_{\rm atm}/(2E)$ with $\Delta m^2_{\rm atm} \equiv m_3^2 - m_2^2$, and the outer one, where $\Delta m^2_{\rm atm}/(2E) \gtrsim V_{CC}$. From the upper panel of Fig. 1 one can see that the boundary roughly varies between $r \approx 10^8$ cm and $10^9$ cm, depending on the time considered. This way one can fully characterize all resonances that can take place in the propagation of supernova neutrinos: both the outer resonant conversions related to neutrino masses, indicated as the upper bands in Fig. 1, and the inner resonances that follow from the presence of non-standard neutrino interactions, indicated by the band at the bottom of the same figure. Here we pay special attention to the use of realistic matter and chemical supernova profiles and of three neutrino flavors, thus generalizing previous studies.

A. Neutrino Evolution in the Inner Regions

Let us first write the Hamiltonian in the inner layers, where $H_{\rm int} \gg H_{\rm kin}$. In this case the Hamiltonian can be written as

$$ H \approx H_{\rm int} = V_0\,\rho\,(2 - Y_e) \begin{pmatrix} \frac{Y_e}{2 - Y_e} + \varepsilon_{ee} & \varepsilon_{e\mu} & \varepsilon_{e\tau} \\ \varepsilon_{e\mu} & 0 & \varepsilon_{\mu\tau} \\ \varepsilon_{e\tau} & \varepsilon_{\mu\tau} & \varepsilon_{\tau\tau} \end{pmatrix} . \tag{10} $$

When the values of the εαβ are of the same order as the electron fraction Ye, internal resonances can arise [13]. Taking into account the current constraints on the ε’s discussed in Sec. II, one sees that small values of Ye are required [46, 47]. As a result, these resonances can only take place in the most deleptonized inner layers, close to the neutrinosphere, where the kinetic terms of the Hamiltonian are negligible.

Given the large number of free parameters εαβ involved, we consider one particular case where |εeµ| and |εeτ| are small enough to neglect a possible initial mixing between νe and νµ or ντ. Barring fine tuning, this basically amounts to $|\varepsilon_{e\mu}|, |\varepsilon_{e\tau}| \ll 10^{-2}$. According to the discussion of Sec. II, εeµ automatically satisfies this condition, whereas one expects that the window $|\varepsilon_{e\tau}| \gtrsim 10^{-2}$ will eventually be probed in future experiments.
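To make the structure of the inner-layer Hamiltonian concrete, the sketch below builds the 3×3 matrix with the ee entry V0ρ[Ye + εee(2 − Ye)] and the remaining entries V0ρ(2 − Ye)εαβ, with εµµ = 0 as assumed in the text. It is only an illustration under those assumptions: the dictionary keys, the function name and the sample ε values are hypothetical.

```python
# Minimal sketch of the inner-layer Hamiltonian (kinetic terms neglected).
# Key names and sample values are hypothetical illustration choices.
V0_EV = 7.6e-14  # eV per (g/cm^3), value quoted with Eq. (5)


def inner_hamiltonian(rho, ye, eps):
    """Return H_int as a 3x3 nested list in eV, flavor order (e, mu, tau).

    eps holds the real NSI parameters under the keys
    'ee', 'emu', 'etau', 'mutau', 'tautau'; eps_mumu is set to zero.
    """
    scale = V0_EV * rho * (2.0 - ye)
    return [
        [scale * (ye / (2.0 - ye) + eps["ee"]), scale * eps["emu"], scale * eps["etau"]],
        [scale * eps["emu"], 0.0, scale * eps["mutau"]],
        [scale * eps["etau"], scale * eps["mutau"], scale * eps["tautau"]],
    ]


if __name__ == "__main__":
    sample = {"ee": 0.0, "emu": 0.0, "etau": 1e-4, "mutau": 0.0, "tautau": 0.07}
    for row in inner_hamiltonian(1e11, 0.1, sample):
        print(row)
```

With all ε set to zero the matrix reduces to diag(VCC, 0, 0), the standard matter term.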
Since the initial fluxes of νµ and ντ are expected to be basically identical, it is convenient to redefine the weak basis by performing a rotation in the µ-τ sector:

$$ \nu_\alpha = U(\vartheta'_{23})_{\alpha\beta}\,\nu'_\beta , \qquad U(\vartheta'_{23}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c_{23'} & s_{23'} \\ 0 & -s_{23'} & c_{23'} \end{pmatrix} , \tag{11} $$

where $c_{23'}$ and $s_{23'}$ stand for $\cos(\vartheta'_{23})$ and $\sin(\vartheta'_{23})$, respectively. The angle $\vartheta'_{23}$ can be written as

$$ \tan(2\vartheta'_{23}) \approx \frac{2\varepsilon_{\mu\tau}}{\varepsilon_{\tau\tau}} . \tag{12} $$

In the new basis the Hamiltonian becomes

$$ H'_{\alpha\beta} = U^\dagger(\vartheta'_{23})\,H_{\alpha\beta}\,U(\vartheta'_{23}) \tag{13} $$

$$ \phantom{H'_{\alpha\beta}} = V_0\,\rho\,(2 - Y_e) \begin{pmatrix} \frac{Y_e}{2 - Y_e} + \varepsilon_{ee} & \varepsilon'_{e\mu} & \varepsilon'_{e\tau} \\ \varepsilon'_{e\mu} & \varepsilon'_{\mu\mu} & 0 \\ \varepsilon'_{e\tau} & 0 & \varepsilon'_{\tau\tau} \end{pmatrix} , \tag{14} $$

where

$$ \varepsilon'_{e\mu} = \varepsilon_{e\mu} c_{23'} - \varepsilon_{e\tau} s_{23'} , \tag{15} $$
$$ \varepsilon'_{e\tau} = \varepsilon_{e\mu} s_{23'} + \varepsilon_{e\tau} c_{23'} , \tag{16} $$
$$ \varepsilon'_{\mu\mu} = \left(\varepsilon_{\tau\tau} - \sqrt{\varepsilon^2_{\tau\tau} + 4\varepsilon^2_{\mu\tau}}\right)/2 , \tag{17} $$
$$ \varepsilon'_{\tau\tau} = \left(\varepsilon_{\tau\tau} + \sqrt{\varepsilon^2_{\tau\tau} + 4\varepsilon^2_{\mu\tau}}\right)/2 . \tag{18} $$

With our initial assumptions on εeα, the new basis ν′α basically diagonalizes the Hamiltonian and therefore coincides roughly with the matter eigenstate basis. A novel resonance can arise if the condition $H'_{ee} = H'_{\tau\tau}$ is satisfied. We call this the I-resonance, I standing for “internal” [96]. The corresponding resonance condition can be written as

$$ Y^I_e = \frac{2\varepsilon^I}{1 + \varepsilon^I} , \tag{19} $$

where $\varepsilon^I$ is defined as $\varepsilon'_{\tau\tau} - \varepsilon_{ee}$. In Fig. 2 we represent the range of εee and ε′ττ leading to the I-resonance for an electron fraction profile between different values of $Y_e^{\rm min}$ and $Y_e^{\rm max} = 0.5$. It is important to notice that the value of $Y_e^{\rm min}$ depends on time. Right before the collapse the minimum value of the electron fraction is around 0.4. Hence the window of NSI parameters that would lead to a resonance would be relatively narrow, as indicated by the shaded (yellow) band in Fig. 2. As time goes on, $Y_e^{\rm min}$ decreases to values of the order of a few percent, and as a result the region of parameters giving rise to the I-resonance widens significantly. For example, in the range $|\varepsilon_{ee}| \le 10^{-3}$, possibly accessible to future experiments, one sees that the I-resonance can take place for values of ε′ττ of $O(10^{-2})$. This indicates that the potential sensitivity to NSI parameters that can be achieved in supernova studies is better than that of the current limits.

FIG. 2: Contours of $Y^I_e$ as a function of εee and ε′ττ according to Eq. (19) for different values of Ye. The region in yellow represents the region of parameters that gives rise to the I-resonance before the collapse. The arrows indicate how this region widens with time.

As seen in Fig. 1, in order to fulfill the I-resonance condition for such small values of the NSI parameters, the required values of Ye must indeed lie, as already stated, in the inner layers.

Several comments are in order. First, in contrast to the standard H- and L-resonances, which are related to the kinetic term, the density itself does not explicitly enter into the resonance condition, provided that the density is high enough to neglect the kinetic terms. Analogously, the energy plays no role in the resonance condition, which is determined only by the electron fraction Ye. Moreover, in contrast to the standard resonances, the I-resonance occurs for both neutrinos and antineutrinos simultaneously [13]. Finally, as indicated in Fig. 3, the νe’s (ν̄e) are not created as the heaviest (lightest) state but as the intermediate state; therefore the flavor composition of the neutrinos arriving at the H-resonance is exactly the opposite of the case without NSI. As we show in Sec. V, this fact can lead to important observational consequences.

FIG. 3: Level-crossing schemes. The first panel is for the case of normal hierarchy (oscillations only); the second includes the NSI effect. The two lower panels correspond to the inverse hierarchy, oscillations only and oscillations + NSI, respectively.

In order to calculate the hopping probability between matter eigenstates at the I-resonance we use the Landau-Zener approximation for two flavors,

$$ P^I_{LZ} \approx e^{-\frac{\pi}{2}\gamma^I} , \tag{20} $$

where $\gamma^I$ stands for the adiabaticity parameter, which can be generally written as

$$ \gamma^I = \left| \frac{E^m_2 - E^m_1}{2\,\dot\vartheta^m} \right|_{r_I} , \tag{21} $$

where $\dot\vartheta^m \equiv d\vartheta^m/dr$. If one applies this formula to the e − τ′ box of Eq.
(14), assuming that $\tan 2\vartheta^m_I = 2H'_{e\tau}/(H'_{\tau\tau} - H_{ee})$ and $E^m_2 - E^m_1 = \sqrt{(H'_{\tau\tau} - H_{ee})^2 + 4H'^2_{e\tau}}$, one gets

$$ \gamma^I = \left| \frac{4H'^2_{e\tau}}{\dot H'_{\tau\tau} - \dot H_{ee}} \right|_{r_I} = \left| \frac{16\,V_0\,\rho\,\varepsilon'^2_{e\tau}}{(1 + \varepsilon^I)^3\,\dot Y_e} \right|_{r_I} \approx 4\times 10^9\, r_{s,5}\,\rho_{11}\,\varepsilon'^2_{e\tau}\,f(\varepsilon^I) , \tag{22} $$

where the parametrization of the Ye profile has been defined as in Eq. (9) with b = 0.16. Here $\rho_{11}$ represents the density in units of $10^{11}$ g/cm³, $r_{s,5}$ stands for rs in units of $10^5$ cm, and $f(\varepsilon^I)$ is a function whose value is of O(1) in the range of parameters we are interested in. Taking all these factors into account, it follows that the internal resonance will be adiabatic provided that $\varepsilon'_{e\tau} \gtrsim 10^{-5}$, well below the current limits, in full numerical agreement with, e.g., Ref. [47].

In Fig. 4 we show the resonance condition as well as the adiabaticity in terms of εττ and εeτ, assuming the other εαβ = 0. In order to illustrate the dependence on time we consider profiles inspired by the numerical profiles of Fig. 1 at t = 2 s (upper panel) and 15.7 s (bottom panel). For definiteness we take $Y_e^{\rm min}$ as the electron fraction at which the density has a value of $5\times 10^{11}$ g/cm³. For comparison with Fig. 2 we have assumed $Y_e^{\rm min} = 10^{-2}$ in the case of 15.7 s. We observe how the border of adiabaticity depends on εττ through the value of the density at $r_I$, which in turn depends on time.

FIG. 4: Contours of constant jump probability at the I-resonance in terms of εττ and εeτ for two profiles corresponding to Fig. 1 at 2 s with a = 0.235 and b = 0.175 (upper panel) and at 15.7 s with a = 0.26 and b = 0.195 (bottom panel). For simplicity the other ε’s have been set to zero.

Before moving to the discussion of the outer resonances a comment is in order, namely, how the formalism changes for other non-standard interaction models. First note that the whole treatment presented above also applies to the case of NSI on up-type quarks, except that the position of the internal resonance shifts with respect to the down-quark case. Indeed, in this case the NSI potential

$$ (V^u_{\rm nsi})_{\alpha\beta} = \varepsilon^u_{\alpha\beta}\,V_0\,\rho\,(1 + Y_e) \tag{23} $$

would induce a similar internal resonance for the condition $Y_e = \varepsilon^I/(1 - \varepsilon^I)$. In contrast, for the case of NSI with electrons, the NSI potential is proportional to the electron fraction, and therefore no internal resonance would appear.

B. Neutrino Evolution in the Outer Regions

In the outer layers of the SN envelope neutrinos can undergo important flavor transitions at those points where the matter-induced potential equals the kinetic terms. In the absence of NSI this condition can be expressed as $V_{CC} \approx \Delta m^2/(2E)$. Neutrino oscillation experiments indicate two mass scales, $\Delta m^2_{\rm atm}$ and $\Delta m^2_\odot \equiv m_2^2 - m_1^2$ [5]; hence two different resonance layers arise, the so-called H-resonance and L-resonance, respectively.

The presence of NSI with values of $|\varepsilon_{\alpha\beta}| \lesssim 10^{-2}$ modifies the properties of the H and L transitions [48, 49, 50]. In particular, one finds that the effects of the NSI can be described as in the standard case by embedding the ε’s into effective mixing angles [50]. An analogous “confusion” between sin ϑ13 and the corresponding NSI parameter εeτ has been pointed out in the context of long-baseline neutrino oscillations in Refs. [53, 54].

In this section we perform a more general and complementary study for slightly higher values of the NSI parameters, $|\varepsilon_{\alpha\beta}| \gtrsim$ few × $10^{-2}$, which are still allowed by current limits and for which the I-resonance could occur.

The phenomenological assumption of hierarchical squared mass differences, $|\Delta m^2_{\rm atm}| \gg \Delta m^2_\odot$, allows, for not too large ε’s, a factorization of the 3ν dynamics into two 2ν subsystems roughly decoupled for the H and L transitions [65]. To isolate the dynamics of the H transition, one usually rotates the neutrino flavor basis by $U^\dagger(\vartheta_{23})$ and extracts the submatrix with indices (1,3) [48, 50]. Whereas this method works perfectly for small values of εαβ, it can be dangerous for values above $10^{-2}$.
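The resonance conditions met so far can be collected in a short numerical sketch. Setting the ee and τ′τ′ diagonal entries equal with the down-quark potential V0ρ(2 − Ye)ε gives Ye = 2εI/(1 + εI), the up-quark potential gives Ye = εI/(1 − εI) as quoted above, and the outer resonances sit where VCC ≈ ∆m²/(2E). The mass splittings and the energy used in the example are assumed round numbers, not values taken from this paper.

```python
# Minimal sketch of the resonance conditions. The mass splittings and the
# neutrino energy in the example are assumed illustrative numbers.
V0_EV = 7.6e-14  # eV per (g/cm^3), value quoted with Eq. (5)


def ye_resonance_down(eps_i):
    """Electron fraction of the I-resonance for NSI on down quarks."""
    return 2.0 * eps_i / (1.0 + eps_i)


def ye_resonance_up(eps_i):
    """Electron fraction of the I-resonance for NSI on up quarks."""
    return eps_i / (1.0 - eps_i)


def outer_resonance_density(dm2_ev2, energy_ev, ye=0.5):
    """Density (g/cm^3) at which V_CC = dm2 / (2E), NSI neglected."""
    return dm2_ev2 / (2.0 * energy_ev) / (V0_EV * ye)


if __name__ == "__main__":
    energy = 1e7          # 10 MeV in eV
    dm2_atm = 2.4e-3      # eV^2, assumed
    dm2_sol = 8.0e-5      # eV^2, assumed
    print("Ye^I (d):", ye_resonance_down(0.05))
    print("Ye^I (u):", ye_resonance_up(0.05))
    print("rho_H:", outer_resonance_density(dm2_atm, energy))
    print("rho_L:", outer_resonance_density(dm2_sol, energy))
```

Note that the internal condition involves neither the density nor the energy, while the outer ones fix a density for each energy.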
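In order to analyze how much our case deviates from the simplest approximation we have performed a rotation with the angle $\vartheta''_{23} \equiv \vartheta_{23} - \alpha$ instead of just ϑ23. By requiring that the new rotation diagonalizes the submatrix (2,3) at the H-resonance layer, one obtains the following expression for the correction angle α:

$$ \tan(2\alpha) = \frac{\Delta_\odot\, s2_{12}\, s_{13} + V^{\rm NSI}_{\tau\tau}\, s2_{23} - 2V^{\rm NSI}_{\mu\tau}\, c2_{23}}{\left(\Delta_{\rm atm} + \tfrac{1}{2}\Delta_\odot\right) c^2_{13} + \tfrac{1}{4}\Delta_\odot\, c2_{12}\,(-3 + c2_{13}) + V^{\rm NSI}_{\tau\tau}\, c2_{23} + 2V^{\rm NSI}_{\mu\tau}\, s2_{23}} , \tag{24} $$

where $\Delta_{\rm atm} \equiv \Delta m^2_{\rm atm}/(2E)$ and $\Delta_\odot \equiv \Delta m^2_\odot/(2E)$. In our notation $s_{ij}$ and $s2_{ij}$ represent $\sin\vartheta_{ij}$ and $\sin(2\vartheta_{ij})$, respectively. The parameters $c_{ij}$ and $c2_{ij}$ are defined analogously. In the absence of NSI, α is just a small correction to ϑ23 [97],

$$ \tan(2\alpha) \approx \frac{\Delta_\odot\, s2_{12}\, s_{13}}{\Delta_{\rm atm}\, c^2_{13}} \lesssim O(10^{-3}) . \tag{25} $$

In order to calculate α we need to know the H-resonance point. To calculate it one can proceed as in the case without NSI, namely, make the ϑ′′23 rotation and analyze the submatrix (1,3). The new Hamiltonian $H''_{\alpha\beta}$ now has the form

$$ H''_{ee} = V_0\rho\,[Y_e + \varepsilon_{ee}(2 - Y_e)] + \Delta_{\rm atm}\, s^2_{13} + \Delta_\odot\,(c^2_{13} s^2_{12} + s^2_{13}) , $$
$$ H''_{\tau\tau} = V_0\rho\,(2 - Y_e)\,\varepsilon''_{\tau\tau} + \Delta_{\rm atm}\, c^2_{13} c^2_\alpha + \Delta_\odot \left[ c^2_{13} c^2_\alpha + (s_\alpha c_{12} + c_\alpha s_{12} s_{13})^2 \right] , $$
$$ H''_{e\tau} = V_0\rho\,(2 - Y_e)\,\varepsilon''_{e\tau} + \tfrac{1}{2}\Delta_{\rm atm}\, s2_{13}\, c_\alpha + \tfrac{1}{2}\Delta_\odot \left( -c_{13}\, s_\alpha\, s2_{12} + c^2_{12}\, c_\alpha\, s2_{13} \right) . \tag{26} $$

We have defined $\varepsilon''_{\tau\tau} = \varepsilon_{\tau\tau}\, c^2_{23-\alpha} + \varepsilon_{\mu\tau}\, s2_{23-\alpha}$ and $\varepsilon''_{e\tau} = \varepsilon_{e\tau}\, c_{23-\alpha} + \varepsilon_{e\mu}\, s_{23-\alpha}$, where $s_{23-\alpha} \equiv \sin(\vartheta_{23} - \alpha)$, $c_{23-\alpha} \equiv \cos(\vartheta_{23} - \alpha)$, $s2_{23-\alpha} \equiv \sin(2\vartheta_{23} - 2\alpha)$ and $c2_{23-\alpha} \equiv \cos(2\vartheta_{23} - 2\alpha)$. The resonance condition for the H transition, $H''_{ee} = H''_{\tau\tau}$, can then be written as

$$ V_0\rho_H\,[Y^H_e + (\varepsilon_{ee} - \varepsilon''_{\tau\tau})(2 - Y^H_e)] = \Delta_{\rm atm}\,(c^2_{13} c^2_\alpha - s^2_{13}) + \Delta_\odot \left[ c^2_{12}\,(c^2_{13} - c^2_\alpha s^2_{13}) - s^2_\alpha s^2_{12} + \tfrac{1}{2}\, s2_\alpha\, s2_{12}\, s_{13} \right] . \tag{27} $$

It can easily be checked that in the limit εαβ → 0 one recovers the standard resonance condition,

$$ V_0\rho_H\, Y^H_e \approx \Delta_{\rm atm}\, c2_{13} . \tag{28} $$

In the region where the H-resonance occurs, $Y^H_e \approx 0.5$. Taking into account Eqs. (24) and (27) one can already estimate how the value of α changes with the NSI parameters. In Fig. 5 we show the dependence of α on εττ after fixing the value of the other NSI parameters.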
One can see that for $\varepsilon_{\tau\tau} \gtrsim 10^{-2}$ the approximation of neglecting α significantly worsens. Assuming ϑ23 = π/4 and a fixed value of εµτ, one can easily see that εττ basically affects the numerator in Eq. (24). Therefore one expects a rise of α as the value of εττ increases, as seen in Fig. 5. The dependence of α on εµτ is correlated with the relative sign of the mass hierarchy and εµτ. For instance, for normal mass hierarchy and positive values of εµτ the dependence is inverse, namely, higher values of εµτ lead to a suppression of α. Apart from this general behavior, α also depends on the diagonal term εee, as seen in Fig. 5. This effect occurs by shifting the resonance point through the resonance condition in Eq. (27).

FIG. 5: Angle α as a function of εττ for different values of εee and εµτ, in the case of neutrinos of energy 10 MeV, with normal mass hierarchy, and $s^2_{13} = 10^{-5}$. The other NSI parameters take the following values: εeµ = 0 and $\varepsilon_{e\tau} = 10^{\ldots}$.

One can now calculate the jump probability between matter eigenstates in analogy to the I-resonance by means of the Landau-Zener approximation, see Eqs. (20), (21), and (22),

$$ P^H_{LZ} \approx e^{-\frac{\pi}{2}\gamma^H} , \tag{29} $$

where $\gamma^H$ represents the adiabaticity parameter at the H-resonance, which can be written as

$$ \gamma^H = \left| \frac{4H''^2_{e\tau}}{\dot H''_{\tau\tau} - \dot H''_{ee}} \right|_{r_H} , \tag{30} $$

where the expressions for $H''_{\alpha\beta}$ are given in Eqs. (26).

Let us first consider the case $|\varepsilon_{\alpha\beta}| \lesssim 10^{-2}$. In this case α ≈ 0 and one can rewrite the adiabaticity parameter as

$$ \gamma^H \approx \frac{\Delta_{\rm atm}\,\sin^2(2\vartheta^{\rm eff}_{13})}{\cos(2\vartheta^{\rm eff}_{13})\,|d\ln V/dr|_{r_H}} , \tag{31} $$

where

$$ \vartheta^{\rm eff}_{13} = \vartheta_{13} + \varepsilon''_{e\tau}\,(2 - Y_e)/Y_e , \tag{32} $$

in agreement with Ref. [50]. For slightly larger ε’s there can be significant differences. In Fig. 6 we show $P^H_{LZ}$ in the εeτ-εττ plane for antineutrinos with energy 10 MeV in the case of inverse mass hierarchy, using Eq. (29) with (upper panel) and without (bottom panel) the α correction. The values of ϑ13 and εeτ have been chosen so that the jump probability lies in the transition regime between adiabatic and strongly non-adiabatic.
In the limit of small εττ, α becomes negligible and therefore both results coincide. From Eq. (31) one sees that as the value of εeτ increases, $\gamma^H$ gets larger and therefore the transition becomes more and more adiabatic. For negative values of εeτ there can be a cancellation between εeτ and ϑ13, and as a result the transition becomes non-adiabatic.

An additional consequence of Eq. (32) is that a degeneracy between εeτ and ϑ13 arises. This is seen in Fig. 7, which gives the contours of $P^H_{LZ}$ in terms of εeτ and ϑ13 for $\varepsilon_{\tau\tau} = 10^{-4}$. One sees clearly that the same Landau-Zener hopping probability is obtained for different combinations of εeτ and ϑ13. This leads to an intrinsic “confusion” between the mixing angle and the corresponding NSI parameter, which cannot be disentangled using SN neutrinos alone, as noted in Ref. [50].

We now turn to the case of $|\varepsilon_{\tau\tau}| \ge 10^{-2}$. As |εττ| increases the role of α becomes relevant. Whereas in the bottom panel of Fig. 6 $P^H_{LZ}$ remains basically independent of εττ, in the upper panel $P^H_{LZ}$ becomes strongly sensitive to εττ for $|\varepsilon_{\tau\tau}| \ge 10^{-2}$. One sees that for positive values of εττ the transition tends to adiabaticity, whereas for negative values it tends to non-adiabaticity. This follows from the dependence of $H''_{e\tau}$ on α, essentially through the term $-\Delta_\odot c_{13} s_\alpha s2_{12}$, see Eq. (26). For $|\varepsilon_{\tau\tau}| \ge 10^{-2}$ one sees that sin α starts being important, and as a result this term eventually becomes of the same order as the others in $H''_{e\tau}$. At this point the sign of εττ, and so the sign of sin α, is crucial since it may contribute to the enhancement or reduction of $H''_{e\tau}$. This directly translates into a trend towards adiabaticity or non-adiabaticity, as seen in Fig. 6. Thus, for the range of εττ relevant for the NSI-induced internal resonance, the adiabaticity of the outer H-resonance can be affected in a non-trivial way.
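The degeneracy between ϑ13 and the flavor-changing NSI parameter can be illustrated with a minimal sketch valid only in the α ≈ 0 regime. It assumes the effective angle ϑ13 + ε(2 − Ye)/Ye, the standard two-flavor adiabaticity parameter γ = ∆atm sin²(2ϑ)/(cos(2ϑ) |d ln V/dr|) and the Landau-Zener probability exp(−πγ/2). The function names, the unit conversion through ħc and all sample numbers are choices made here for illustration.

```python
import math

# Minimal sketch for the alpha ~ 0 regime. Names and sample inputs are
# hypothetical; eps_etau plays the role of the rotated e-tau NSI parameter.
HBARC_EV_CM = 1.973e-5  # hbar*c in eV*cm, converts 1/cm into eV


def theta13_eff(theta13, eps_etau, ye=0.5):
    """Effective mixing angle theta13 + eps * (2 - Ye) / Ye, in radians."""
    return theta13 + eps_etau * (2.0 - ye) / ye


def gamma_h(delta_atm_ev, theta_eff, dlnv_dr_per_cm):
    """Two-flavor adiabaticity parameter at the H-resonance (dimensionless)."""
    numerator = delta_atm_ev * math.sin(2.0 * theta_eff) ** 2
    denominator = math.cos(2.0 * theta_eff) * abs(dlnv_dr_per_cm) * HBARC_EV_CM
    return numerator / denominator


def p_lz(gamma):
    """Landau-Zener hopping probability exp(-pi * gamma / 2)."""
    return math.exp(-math.pi * gamma / 2.0)


def degenerate_eps(theta13_a, eps_a, theta13_b, ye=0.5):
    """eps that, combined with theta13_b, reproduces the effective angle
    of the pair (theta13_a, eps_a)."""
    target = theta13_eff(theta13_a, eps_a, ye)
    return (target - theta13_b) * ye / (2.0 - ye)


if __name__ == "__main__":
    delta_atm = 2.4e-3 / (2.0 * 1e7)   # eV, assumed dm2 and E = 10 MeV
    dlnv_dr = 3.0 / 1.5e9              # 1/cm, power law n = 3 at r = 1.5e9 cm
    eps_b = degenerate_eps(3e-3, 0.0, 1e-3)
    for th, eps in [(3e-3, 0.0), (1e-3, eps_b)]:
        print(th, eps, p_lz(gamma_h(delta_atm, theta13_eff(th, eps), dlnv_dr)))
```

Both parameter pairs in the example give the same hopping probability, which is the “confusion” discussed in the text.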
Turning to the case of the L transition, a similar expression can be obtained by rotating the original Hamiltonian by $U(\vartheta_{13})^\dagger U(\vartheta_{23})^\dagger$ [48, 50]. However, in contrast to the case of the H-resonance, where the mixing angle ϑ13 is still unknown, in the case of the L transition the angle ϑ12 has been shown by solar and reactor neutrino experiments to be large [5]. As a result, for the mass scale $\Delta_\odot$ this transition will always be adiabatic irrespective of the values of εαβ, and will affect only neutrinos.

FIG. 6: Landau-Zener jump probability isocontours at the H-resonance in terms of εeτ and εττ for 10 MeV antineutrinos in the case of inverted mass hierarchy. Upper panel: α given by Eq. (24). Bottom panel: α set to zero. The remaining parameters take the following values: $\sin^2\vartheta_{13} = 10^{\ldots}$, $\varepsilon_{e\tau} = 10^{-3}$, εee = εeµ = 0. See text.

FIG. 7: Landau-Zener jump probability isocontours at the H-resonance in terms of εeτ and ϑ13 for $\varepsilon_{\tau\tau} = 10^{-4}$. Antineutrinos with energy 10 MeV and inverted mass hierarchy have been assumed.

V. OBSERVABLES AND SENSITIVITY

As mentioned in the introduction, one of the major motivations to study NSI using the neutrinos emitted in a SN is the enhancement of the NSI effects on the neutrino propagation through the SN envelope due to the extreme matter conditions that characterize it. In this section we analyze how these effects translate into observable effects in the case of a future galactic SN.

Schematically, the neutrino emission by a SN can be divided into four stages: infall phase, neutronization burst, accretion, and Kelvin-Helmholtz cooling phase. During the infall phase and neutronization burst only νe’s are emitted, while the bulk of the neutrino emission is released in all flavors in the last two phases. Whereas the neutrino emission characteristics of the two initial stages are basically independent of the features of the progenitor, such as the core mass or equation of state (EoS), the details of the neutrino spectra and luminosity during the accretion and cooling phases may change significantly for different progenitor models. As a result, a straightforward extraction of oscillation parameters from the bulk of the SN neutrino signal seems hopeless. Only features in the detected neutrino spectra which are independent of unknown SN parameters should be used in such an analysis [66].

The question then arises as to how one can obtain information about the NSI parameters. Taking into account that the main effect of NSI is to generate new internal neutrino flavor transitions, one possibility is to invoke theoretical arguments that involve different aspects of the SN internal dynamics.

In Ref. [47] it was argued that such an internal flavor conversion during the first second after the core bounce might play a positive role in the so-called SN shock re-heating problem. It is observed in numerical simulations [67, 68, 69, 70] that as the shock wave propagates it loses energy until it stalls at a few hundred km. It is currently believed that after neutrinos escape the SN core they can, to some extent, deposit energy right behind the shock wave and help it continue outwards. On the other hand, it is also believed that, due to the matter composition of the proto-neutron star (PNS), the mean energies of the different neutrino spectra obey 〈Eνe〉 < 〈Eν̄e〉 < 〈Eνµ,ντ〉. This means that a resonant conversion between νe (ν̄e) and νµ,τ (ν̄µ,τ) between the neutrinosphere and the position of the stalled shock wave would make the νe (ν̄e) spectra harder, and therefore the energy deposition would be larger, giving rise to a shock wave regeneration effect.
Another argument used in the literature was the possibility that the r-process nucleosynthesis, responsible for synthesizing about half of the heavy elements with mass number A > 70 in nature, could occur in the region above the neutrinosphere in SNe [71, 72]. A necessary condition is Ye < 0.5 in the nucleosynthesis region. The value of the electron fraction depends on the neutrino absorption rates, which are determined in turn by the νe (ν̄e) luminosities and energy distributions. These can be altered by flavor conversion in the inner layers due to the presence of NSI. Therefore, by requiring the electron fraction to be below 0.5, one can get information about the values of the NSI parameters.

While it is commonly accepted that neutrinos will play a crucial role in both the shock wave re-heating and the r-process nucleosynthesis, there are still other astrophysical factors that can affect both. While the issue remains under debate, we prefer to stick to arguments directly related to physical observables in a large water Cherenkov detector. There are several possibilities:

(A) the modulations in the ν̄e spectra due to the passage of shock waves through the supernova [58, 59, ...]

(B) the modulation in the ν̄e spectra due to the time dependence of the electron fraction, induced by the I-resonance

(C) the modulations in the ν̄e spectra due to the Earth matter [73, 74, 75, 76]

(D) detectability of the neutronization νe burst [77, 78]

Three of these observables, (A), (C) and (D), have already been considered in the literature in the context of neutrino oscillations. Here we discuss the potential of the above promising observables in providing information about the NSI parameters. It is important to pay attention to the possible occurrence of the internal I-resonance and to its effect on the external H- and L-resonances. The former can induce a genuinely new observable effect, item (B) above.

TABLE I: Definition of the neutrino schemes considered in terms of the hierarchy, the value of ϑ13, and the presence of NSI, as described in the text. The values of the survival probabilities for νe (Psurv) and ν̄e (P̄surv) for each case are also indicated.

Scheme   Hierarchy   sin² ϑ13    NSI   Psurv       P̄surv
A        normal      ≳ 10⁻⁴      No    0           cos² ϑ12
B        inverted    ≳ 10⁻⁴      No    sin² ϑ12    0
C        any         ≲ 10⁻⁶      No    sin² ϑ12    cos² ϑ12
AI       normal      ≳ 10⁻⁴      Yes   sin² ϑ12    sin² ϑ12
BI       inverted    ≳ 10⁻⁴      Yes   cos² ϑ12    cos² ϑ12
CIa      normal      ≲ 10⁻⁶      Yes   0           sin² ϑ12
CIb      inverted    ≲ 10⁻⁶      Yes   cos² ϑ12    0

Here we concentrate on neutral current-type non-standard interactions; hence there will be no effect on the main reaction in water Cherenkov and scintillator detectors, namely the inverse beta decay, ν̄e + p → e⁺ + n [98]. For definiteness we take NSI with d (down) quarks, in which case the NSI effects will be confined to the neutrino evolution inside the SN and the Earth, through the vector component of the interaction.

From all possible combinations of NSI parameters we will concentrate on those for which the internal I transition does take place, namely $|\varepsilon^I| \gtrsim 10^{-2}$, see Fig. 2. Concerning the FC NSI parameters, we will consider |ε′eτ| between a few × $10^{-5}$ and $10^{-2}$, a range in which the I-resonance is adiabatic, see Fig. 4. In the following discussion we will focus on the extreme cases defined in Table I. One of the motivations for considering these cases is the fact that the resonances involved become either adiabatic or strongly non-adiabatic, and hence the survival probabilities, in the absence of Earth effects or shock wave passage, become energy independent. This assumption simplifies the task of relating the observables to the neutrino schemes.

A. Shock wave propagation

During approximately the first two seconds after the core bounce, the neutrino survival probabilities are constant in time and in energy for all cases mentioned in Table I. Only the Earth effects could introduce an energy dependence.
However, at t ≈ 2 s the H-resonance layer is reached by the outgoing shock wave, see Fig. 1. The way the shock wave passage affects the neutrino propagation strongly depends on the neutrino mixing scenario. In the absence of NSI, cases A and C will not show any evidence of shock wave propagation in the observed ν̄e spectrum, either because there is no resonance in the antineutrino channel, as in scenario A, or because the H-resonance is always strongly non-adiabatic, as in scenario C. However, in scenario B the sudden change in density breaks the adiabaticity of the resonance, leading to a time and energy dependence of the electron antineutrino survival probability P̄surv(E, t). In the upper panel of Fig. 8 we show P̄surv(E, t) in the particular case that two shock waves are present, a forward and a reverse one [60]. The presence of the shocks results in the appearance of bumps in the survival probability at those energies for which the resonance region is passed by the shock waves. All these structures move in time towards higher energies as the shock waves reach regions with lower density, leading to observable consequences in the ν̄e spectrum.

We now turn to the case where NSI are present, which opens the possibility of internal resonances. When such an I-resonance is adiabatic the situation will be similar to the case without NSI. For normal mass hierarchy, cases AI and CIa, ν̄e will not feel the H-resonance and therefore the adiabaticity-breaking effect will basically not alter their propagation. In contrast, for inverted mass hierarchy and large ϑ13, case BI, the H-resonance occurs in the antineutrino channel and therefore ν̄e will feel the shock wave passage. However, in contrast to case B, ν̄e will now reach the H-resonance in a different matter eigenstate: $\bar\nu^m_1$ instead of $\bar\nu^m_3$, see Fig. 3. This means that before the shock wave reaches the H-resonance the ν̄e survival probability will be P̄surv ≈ cos² ϑ12 ≈ 0.7.
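The limiting survival probabilities of the schemes in Table I, before any shock wave or Earth effect, can be encoded in a small lookup. This is only a sketch: the table entries follow the reading of Table I in which the truncated entries are squared sines or cosines of ϑ12, and the value sin² ϑ12 = 0.3 is an assumed input chosen to reproduce cos² ϑ12 ≈ 0.7.

```python
# Minimal lookup of the limiting survival probabilities of Table I.
# SIN2_THETA12 is an assumed input; the scheme labels are those of the table.
SIN2_THETA12 = 0.3

# scheme: (P_surv for nu_e, P_surv for anti-nu_e), encoded symbolically
SCHEMES = {
    "A": ("zero", "cos2"),
    "B": ("sin2", "zero"),
    "C": ("sin2", "cos2"),
    "AI": ("sin2", "sin2"),
    "BI": ("cos2", "cos2"),
    "CIa": ("zero", "sin2"),
    "CIb": ("cos2", "zero"),
}


def _value(tag, sin2_theta12):
    """Translate a symbolic table entry into a number."""
    if tag == "zero":
        return 0.0
    if tag == "sin2":
        return sin2_theta12
    return 1.0 - sin2_theta12  # "cos2"


def survival(scheme, sin2_theta12=SIN2_THETA12):
    """Return (P_surv, P_surv_bar) for the given scheme label."""
    nu_tag, nubar_tag = SCHEMES[scheme]
    return _value(nu_tag, sin2_theta12), _value(nubar_tag, sin2_theta12)


if __name__ == "__main__":
    for name in SCHEMES:
        print(name, survival(name))
```

Comparing, for example, `survival("B")` with `survival("BI")` shows the swap in the antineutrino channel that underlies the opposite shock wave imprint discussed in the text.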
Once the adiabaticity of the H-resonance is broken by the shock wave then ν̄e will partly leave as ν̄ 3 and therefore the survival probability will decrease. As a consequence one expects a pattern in time and energy for the survival FIG. 8: Survival probability P̄surv(E, t) for ν̄e as function of energy at different times averaged in energies with the en- ergy resolution of Super-Kamiokande; for the profile shown in Fig. 1. Upper panel: case B is assumed for sin2 ϑ13 = 10 Bottom panel: case BI , with εττ = 0.07, εeτ = 10 −4 and the rest of NSI parameters put to zero. probability in the case BI to be roughly opposite than in the case B, see bottom panel of Fig. 8. The position of the peaks and dips en each panel do not exactly coincide as the value of εττ roughly shifts the position of the H- resonance. In the left panels of Fig. 9 we represent in light-shaded (yellow) the range of εeτ and εττ for which this opposite shock wave imprint would be observable. In the upper panels we have assumed a minimum value of the electron fraction of 0.06, based on the numerical profiles at t = 2 s of Fig. 1. In the bottom panels Y mine is set to 0.01, inspired in the profiles at t = 15.7 s. It can be seen how as time goes on the range of εττ ’s for which the I-resonance takes place widens towards to smaller and smaller values. This is a direct consequence of the steady deleptonization of the inner layers. For smaller ϑ13, case CIb, the situation is different. Except for relatively large εeτ values theH-resonance will be strongly non-adiabatic, as in case C. Therefore the passage of the shock waves will not significantly change the ν̄e survival probability and will not lead to any ob- servable effect. In the right panels of Fig. 
9 we show the same as in the left panels but for sin2 ϑ13 = 10 Whereas for large values of ϑ13, left panels, the H- resonance is always adiabatic and one has only to ensure the adiabaticity of the I-resonance, for smaller values of ϑ13 the adiabaticity of the H-resonance strongly depends on the values of εeτ and εττ , as discussed in Sec. IVB. This can be seen as a significant reduction of the yel- low area. Only large values of either εeτ or εττ would still allow for a clear identification of the opposite shock wave effects. In dark-shaded (cyan) we show the region of parameters for which PH lies in the transition region between adiabatic and strongly non-adiabatic, and there- fore could still lead to some effect. A useful observable to detect effects of the shock prop- agation is the average of the measured positron energies, 〈Ee〉, produced in inverse beta decays. In Fig. 10, we show 〈Ee〉 together with the one sigma errors expected for a Megaton water Cherenkov detector and a SN at 10 kpc distance, with a time binning of 0.5 s, for different neu- trino schemes: caseB and caseBI with different values of εττ . For the neutrino fluxes we assumed the parametriza- tion given by Refs. [79, 80] with 〈E0(ν̄e)〉 = 15 MeV and 〈E0(ν̄µ,τ )〉 = 18 MeV and the following ratio of the total neutrino fluxes Φ0(ν̄e)/Φ0(ν̄µ,τ ) = 0.8 [99]. One can see how the features of the average positron energy are a direct consequence of the shape of the sur- vival probability, where dips have to be translated into bumps and vice-versa. Thus, it is important to stress that whereas in case B one expects the presence of one or two dips (depend- ing on the structure of the shock wave, see Ref [60]), or nothing in the other cases, one or two bumps are ex- pected in case BI, as seen in the upper left panel of Fig. 10. As discussed in Ref. [60] the details of the dips/bump will depend on the exact shape of the neu- trino fluxes, but as long as general reasonable assump- tions like 〈Eν̄e〉 . 
〈Eν̄µ,τ〉 are considered, the dips/bumps should be observed.

B. Time variation of Ye

We have just seen how the distortion of the density profile due to the passage of the shock wave through the outer SN envelope can induce a time-dependent modulation in the ν̄e spectrum in cases B and BI. However, the time dependence of the electron fraction Ye can also reveal the presence of NSI, leaving a clear imprint in the observed ν̄e spectrum, as we now explain.

FIG. 9: Range of εττ and εeτ for which the effect of the shock wave will be observed. In the upper panels a minimum value Y_e^min = 0.06, based on the numerical profiles at t = 2 s, has been assumed; see Fig. 1. In the lower panels we have considered a case with Y_e^min = 0.01, inspired by the profile at t = 15.7 s. The value of sin² ϑ13 has been assumed to be 10⁻² and 10⁻⁷ in the left and right panels, respectively. We have also superimposed isocontours of constant hopping probability, 0.1 (blue) and 0.9 (red), in the I (solid lines) and H (dashed lines) resonances for inverted mass hierarchy, E = 10 MeV, and antineutrinos. The area in yellow represents the parameter space where both resonances will be adiabatic. In the cyan area the I-resonance is assumed to be adiabatic whereas H lies in the transition region.

As discussed in Sec. IVA, the region of NSI parameters leading to the I-resonance is basically determined by the minimum and maximum values of the electron fraction, Y_e^min and Y_e^max. The crucial point is that, as the deleptonization of the proto-neutron star goes on, the value of Y_e^min steadily decreases with time. As a result, the range of NSI strengths for which the I-resonance takes place increases with time, as can be seen in Fig. 2.

FIG. 10: The average energy of ν̄p → ne⁺ events binned in time for case B (dashed blue) and BI (solid red). In each panel different values of εττ have been assumed. The error bars represent 1σ errors in any bin. εeτ = 10
Let us first discuss the observational consequences of the time dependence of the electron fraction in case BI. If εττ (εI in general) is large enough, the I-resonance will take place right after the core bounce. In this case, as seen in the upper left panel of Fig. 10, the two bumps we have just discussed in Sec. VA would be clearly observed. However, for smaller NSI parameter values the I-resonance may occur only after several seconds. In particular, for the specific Ye profile considered, this delay is roughly 2, 4 or 9 s for εττ = 0.025, 0.02 or 0.015, respectively; see the last three panels of Fig. 10. As can be inferred from the figure, this delay can lead to a misidentification of the pure NSI effect. For instance, in the upper right panel the two bumps might also be interpreted as two dips, given the astrophysical uncertainties. This subtle degeneracy can only be resolved with extra information on, for example, the time dependence of the spectra or the velocity of the shock wave. Given the supernova model, however, the time structure of the signal could eventually not only point to the presence of NSI but even indicate a range of NSI parameters.

Let us now turn to the normal mass hierarchy scenario (cases AI and CIa). In analogy with case BI, if εI is relatively large the onset of the I-resonance will take place early on. As can be inferred from Fig. 3, this implies that ν̄e will escape the SN as ν̄2. For smaller values, though, the I-resonance may become effective only after a few seconds. This means that during the first seconds of the neutrino signal ν̄e would leave the star as ν̄1 (cases A and C). Then, after some point, the electron fraction would be low enough to switch on the I-resonance, and consequently ν̄e would enter the Earth as ν̄2.
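As a rough illustration of why such a switch hardens the detected spectrum, the sketch below computes the flux-weighted mean ν̄e energy at Earth for the two survival probabilities relevant here, P ≈ cos² ϑ12 ≈ 0.7 (ν̄e leaving as ν̄1) and P ≈ sin² ϑ12 ≈ 0.3 (ν̄e leaving as ν̄2). It uses only the mean energies and the flux ratio quoted in this section. Ignoring the detection cross section, the spectral shapes and the detector response is an assumption of this sketch, so the numbers are not the positron energies of Fig. 11; the function name `mean_energy` is hypothetical.

```python
# Toy estimate of the flux-weighted mean antineutrino energy arriving at Earth.
# Assumptions of this sketch (not of the analysis in the text): the detection
# cross section and detector response are ignored, and each primary spectrum is
# represented only by its mean energy and its relative total flux.

E0_NUEBAR = 15.0   # MeV, <E0(anti-nu_e)> quoted in the text
E0_NUXBAR = 18.0   # MeV, <E0(anti-nu_mu,tau)> quoted in the text
FLUX_RATIO = 0.8   # Phi0(anti-nu_e) / Phi0(anti-nu_mu,tau) quoted in the text


def mean_energy(p_surv: float) -> float:
    """Mean energy of the anti-nu_e flux at Earth for survival probability p_surv."""
    w_e = p_surv * FLUX_RATIO      # weight of surviving primary anti-nu_e
    w_x = (1.0 - p_surv) * 1.0     # weight of anti-nu_mu,tau converted to anti-nu_e
    return (w_e * E0_NUEBAR + w_x * E0_NUXBAR) / (w_e + w_x)


before = mean_energy(0.7)  # I-resonance not yet active: P ~ cos^2(theta_12)
after = mean_energy(0.3)   # I-resonance active:         P ~ sin^2(theta_12)
print(f"before: {before:.2f} MeV, after: {after:.2f} MeV")
# prints "before: 16.05 MeV, after: 17.23 MeV"
```

Even in this crude picture the mean energy rises by about 1 MeV when the I-resonance switches on, because a larger share of the detected ν̄e originates from the harder ν̄µ,τ spectrum.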
The onset of the I-resonance would thus result in a transition of the electron antineutrino survival probability from P̄surv ≈ cos² ϑ12 ≈ 0.7 to sin² ϑ12 ≈ 0.3. Given the expected hierarchy in the average neutrino energies, 〈Eν̄e〉 ≲ 〈Eν̄µ,τ〉, it follows that the change in Ye would lead to a hardening of the observed positron spectrum. The effect is quantified in Fig. 11 for different values of εττ. The figure shows the average energy of the ν̄p → ne⁺ events for a Megaton water Cherenkov detector, exactly as in Fig. 10, but for scenarios AI and CIa. For εττ = 0.07 the I-resonance condition is always fulfilled and therefore there is no time dependence. For smaller values, however, one can see a rise at a certain moment, which depends on the magnitude of εττ. A similar effect would occur in case C.

FIG. 11: The average energy of ν̄p → ne⁺ events binned in time for cases AI and CIa and different values of εττ. The error bars represent 1σ errors in any bin. εeτ = 10

C. Earth matter effects

Before the shock wave reaches the H-resonance layer, the dependence of the neutrino survival probability on the neutrino energy E is very weak in the cases we are considering. However, if neutrinos cross the Earth before reaching the detector, the conversion probabilities may become energy-dependent, inducing modulations in the neutrino energy spectrum. These modulations may be observed in the form of local peaks and valleys in the spectrum of the event rate σF^D_ν̄e plotted as a function of 1/E. They arise in the antineutrino channel only when ν̄e leaves the SN as ν̄1 or ν̄2. In the absence of NSI this happens in cases A and C, where ν̄e leaves the star as ν̄1. In the presence of NSI, ν̄e will arrive at the Earth as ν̄1 in case BI, and as ν̄2 in cases AI and CIa. Therefore the observation of these modulations would exclude cases B and CIb.
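Because the Earth-induced modulation is approximately periodic in 1/E, it shows up as a peak in a Fourier power spectrum taken in that variable. The toy sketch below illustrates only this idea: it builds a synthetic spectrum sampled uniformly in y = 1/E (arbitrary units) with a small sinusoidal modulation and recovers the modulation frequency from a direct Fourier sum. The modulation frequency `K_TRUE`, the 20% amplitude, the sampling and the scan range are all hypothetical choices made for illustration; this is not the event-based analysis of Refs. [75, 76].

```python
import cmath
import math

# Synthetic event-rate spectrum, sampled uniformly in y = 1/E (arbitrary units),
# carrying a small sinusoidal modulation of assumed "frequency" K_TRUE.
K_TRUE = 40.0      # hypothetical modulation frequency, for illustration only
N_SAMPLES = 400
Y_MAX = 1.0

ys = [Y_MAX * (j + 0.5) / N_SAMPLES for j in range(N_SAMPLES)]
rate = [1.0 + 0.2 * math.cos(K_TRUE * y) for y in ys]
mean_rate = sum(rate) / N_SAMPLES


def power(k: float) -> float:
    """Power of the mean-subtracted spectrum at frequency k (direct Fourier sum)."""
    amp = sum((r - mean_rate) * cmath.exp(1j * k * y) for r, y in zip(rate, ys))
    return abs(amp) ** 2 / N_SAMPLES


# Scan k from 10 to 100 in steps of 0.5 and locate the strongest peak.
ks = [10.0 + 0.5 * i for i in range(181)]
k_peak = max(ks, key=power)
print(f"peak of the power spectrum near k = {k_peak:.1f}")
```

The peak position depends only on the modulation frequency, not on the smooth part of the spectrum, which is the property that makes this kind of observable robust against uncertainties in the primary fluxes.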
The Earth-induced distortion of the spectra could be measured by comparing the neutrino signal at two or more detectors, chosen such that the neutrinos travel different distances through the Earth before reaching them [73, 74]. However, these Earth matter effects can also be identified in a single detector [75, 76].

By analyzing the power spectrum of the detected neutrino events one can identify peaks located at the frequencies characterizing the modulation. These frequencies do not depend on the primary neutrino spectra, and can be determined to good accuracy from the knowledge of the solar oscillation parameters, the Earth matter density, and the position of the SN in the sky [76]. The latter can be determined with sufficient precision, even if the SN is optically obscured, using the pointing capability of water Cherenkov neutrino detectors [81].

This method turns out to be powerful in detecting the modulations in the spectra due to Earth matter effects, and thus in ruling out cases B and CIb. However, the position of the peaks does not depend on how ν̄e enters the Earth, as ν̄1 or as ν̄2. Hence it cannot discriminate cases AI and CIa from cases A, C, and BI.

The time dependence of Ye, however, can transform case B into BI, and C with inverse hierarchy into CIb, leading respectively to an appearance and a disappearance of these Earth matter effects. In case BI the presence of the shock-wave modulation can spoil a clear identification of the Earth matter effects. Nevertheless, the disappearance of the Earth matter effects in the transition from case C to CIb allows us to pin down case CIb.

D. Neutronization burst

The prompt neutronization burst takes place during the first ∼25 ms after the core bounce, with a typical full width at half maximum of 5–7 ms and a peak luminosity of 3.3–3.5 × 10⁵³ erg s⁻¹.
The striking similarity of the neutrino emission characteristics, despite the variability in the properties of the pre-collapse cores, is caused by a regulation mechanism between the electron number fraction and the target abundances for electron capture. This effectively establishes similar electron fractions in the inner core during collapse, leading to a convergence of the structure of the central part of the collapsing cores, with only small differences in the evolution of different progenitors until shock breakout [77, 78].

Since the SN is likely to be obscured by dust, so that a good estimate of the distance will not be possible, the time structure of the detected neutrino signal should be used as the signature of the neutronization burst. In Ref. [78] it was shown that such a time structure can in principle be cleanly seen in a Megaton water Cherenkov detector. It was also shown that the time evolution of the signal depends strongly on the neutrino mixing scheme. In the absence of NSI the νe peak could be observed provided that the νe survival probability Pνeνe is not zero. As can be seen in Table I, this happens for cases B and C. However, for case A (normal mass hierarchy and “large” ϑ13), νe leaves the SN as ν3. This leads to a survival probability Pνeνe ≈ sin² ϑ13 ≲ 10⁻¹, and therefore the peak remains hidden.

Let us now consider the situation where NSI are present. For normal mass hierarchy, νe, which is born as ν₂ᵐ, passes through three different resonances: I, H and L. Whereas I and L will be adiabatic, the fate of H will depend on the value of ϑ13. For “large” values, case AI, the H-resonance will also be adiabatic. This implies that νe will leave as ν2, the survival probability will be Pνeνe ≈ sin² ϑ12 ≈ 0.3, and therefore the peak will be seen, as in cases B and C. If ϑ13 happens to be very small, case CIa, then H will be strongly non-adiabatic and therefore νe will leave the star as ν3.
As a consequence the neutronization peak will not be seen.

For inverse mass hierarchy, νe is born as ν1 and traverses I and L adiabatically. This implies that it will leave the star as ν1 and therefore the peak will also be observed. However, the survival probability will now be larger, Pνeνe ≈ cos² ϑ12 ≈ 0.7. Thus, for a given known normalization, i.e. the distance to the SN, one expects a larger number of events during the neutronization peak in this case. In Fig. 12 we show the expected number of events per time bin in a water Cherenkov detector for a SN exploding at 10 kpc, for two different neutrino schemes, C and BI, and for different SN progenitor masses. The difference due to the larger survival probability is bigger than the typical error bars associated with the lack of knowledge of the progenitor mass.

FIG. 12: Number of events from elastic scattering on electrons per time bin in a Megaton water Cherenkov detector for a SN at 10 kpc, for cases C (dashed lines) and BI (solid lines). Different progenitor masses have been assumed: 13 M⊙ (n13) in red, 15 M⊙ (s15s7b2) in black, and 25 M⊙ (s25a28) in blue. 1σ errors are also shown for the 15 M⊙ case.

Two comments are in order. The neutronization νe burst takes place during the first milliseconds, before strong deleptonization takes place. As a result, in contrast to the other observables considered in this paper, here the I-resonance will only occur for εI ≳ 10⁻¹. On the other hand, additional NSI with electrons would significantly affect the ν–e cross sections, and consequently the results presented here.

VI. SUMMARY

We have analyzed the possibility of observing clear signatures of non-standard neutrino interactions from the detection of neutrinos produced in a future galactic supernova.

In Secs. III and IV we have reconsidered the effect of ν–d non-standard interactions on the neutrino propagation through the SN envelope within a three-neutrino framework. In contrast to previous works, we have analyzed the neutrino evolution in both the more deleptonized inner layers and the outer regions of the SN envelope. We have also taken into account the time dependence of the SN density and electron fraction profiles.

First, we have found that the small values of the electron fraction typical of the former allow for internal NSI-induced resonant conversions, in addition to the standard MSW-H and MSW-L resonances of the outer envelope. These new flavor conversions take place for a relatively large range of NSI parameters, namely |εαα| between 10⁻² and 10⁻¹, and |εeτ| ≳ few × 10⁻⁵, currently allowed by experiment. For this range of strengths, in particular of εττ, non-standard interactions can significantly affect the adiabaticity of the H-resonance. On the other hand, the NSI-induced resonant conversions may also lead to a modulation of the ν̄e spectra as a result of the time dependence of the electron fraction.

In Sec. V we have studied the possibility of detecting NSI effects in a Megaton water Cherenkov detector using the modulation effects in the ν̄e spectrum due to (i) the passage of shock waves through the SN envelope, (ii) the time dependence of the electron fraction and (iii) the Earth matter effects; and, finally, through the possible detectability of the neutronization νe burst. Note that observable (ii) turns out to be complementary to the observation of the shock wave passage, (i), and offers the possibility to probe NSI effects also for normal hierarchy neutrino spectra.

In Table II we summarize the results obtained for different neutrino schemes. We have found that observable (i) can clearly indicate the existence of NSI in the case of inverse mass hierarchy and large ϑ13 (case BI).
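The way the entries of Table II can be combined to narrow down the neutrino scheme may be sketched as a simple lookup. The entries below are copied from Table II; the column labels, the `"yes*"` encoding of the asterisked entries and the function name `consistent_schemes` are hypothetical choices of this sketch. The asterisk (an effect that differs from the one expected without NSI) is kept in the data but ignored in the matching, so the sketch cannot tell, for example, the dips of case B from the bumps of case BI.

```python
# Expectations copied from Table II. "yes*" marks an effect that differs from
# the one expected in the absence of NSI; the field names are illustrative.
FIELDS = ("nsi", "shock", "ye", "earth", "burst")
TABLE_II = {
    "A":   ("no",  "no",   "no",  "yes", "no"),
    "B":   ("no",  "yes",  "no",  "no",  "yes"),
    "C":   ("no",  "no",   "no",  "yes", "yes"),
    "AI":  ("yes", "no",   "yes", "yes", "yes"),
    "BI":  ("yes", "yes*", "no",  "yes", "yes*"),
    "CIa": ("yes", "no",   "yes", "yes", "no"),
    "CIb": ("yes", "no",   "yes", "no",  "yes*"),
}


def consistent_schemes(observed: dict) -> list:
    """Return the schemes whose Table II entries agree with every observation.

    `observed` maps a field name to True (effect seen) or False (not seen);
    fields that are not listed are left unconstrained.
    """
    matches = []
    for scheme, row in TABLE_II.items():
        expected = dict(zip(FIELDS, row))
        if all(expected[name].startswith("yes") == seen
               for name, seen in observed.items()):
            matches.append(scheme)
    return matches


print(consistent_schemes({"ye": True}))
# prints ['AI', 'CIa', 'CIb']
print(consistent_schemes({"ye": True, "earth": True}))
# prints ['AI', 'CIa']
print(consistent_schemes({"ye": True, "earth": True, "burst": True}))
# prints ['AI']
```

Each additional observable removes schemes from the list, which mirrors how a positive Ye signal, Earth matter effects and the neutronization burst successively lift the degeneracy among the NSI cases.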
On the other hand, observable (ii) allows for an identification of NSI effects in the other cases: normal mass hierarchy (cases AI and CIa) and inverse mass hierarchy with small ϑ13 (case CIb). Therefore a positive signal of either observable (i) or (ii) would establish the existence of NSI. In the latter case this would, however, leave a degeneracy among cases AI, CIa, and CIb. Such a degeneracy can be broken with the help of observable (iii) and the observation of the neutronization νe burst. The detection of Earth matter effects during the whole supernova neutrino signal would rule out case CIb since, as discussed in Sec. VC, a disappearance of Earth matter effects would take place due to a transition from C to CIb. Finally, the (non-)observation of the neutronization burst can be used to distinguish between cases AI and CIa.

Similarly, other degeneracies in Table II may be lifted by suitably combining different observables. For example, a negative result for observable (ii) could mean either negligible NSI strengths or non-universal (NU) NSI parameter values so large that the internal resonance is always present. In this case one could use the observation of the neutronization burst to establish the presence of NSI for the case of inverse mass hierarchy. In addition, the observation of the shock wave imprint in the ν̄e spectrum would provide additional information on ϑ13.

In conclusion, by suitably combining all observables one may establish not only the presence of NSI, but also the mass hierarchy, and probe the magnitude of ϑ13.

Scheme  Hierarchy  sin² ϑ13  NSI  shock  Ye   Earth  νe burst
A       normal     ≳ 10⁻⁴    No   No     No   Yes    No
B       inverted   ≳ 10⁻⁴    No   Yes    No   No     Yes
C       any        ≲ 10⁻⁶    No   No     No   Yes    Yes
AI      normal     ≳ 10⁻⁴    Yes  No     Yes  Yes    Yes
BI      inverted   ≳ 10⁻⁴    Yes  Yes⋆   No   Yes    Yes⋆
CIa     normal     ≲ 10⁻⁶    Yes  No     Yes  Yes    No
CIb     inverted   ≲ 10⁻⁶    Yes  No     Yes  No     Yes⋆

TABLE II: Expectations for the observables discussed in the text: modulation of the ν̄e spectrum due to the shock wave passage, the time variation of Ye, the Earth effect, and the observation of the νe burst within various neutrino schemes. Asterisks indicate that the effect differs from that expected in the absence of NSI. See text.

Acknowledgments

The authors wish to thank H-Th. Janka, O. Miranda, S. Pastor, Th. Schwetz, and M. Tórtola for fruitful discussions. Work supported by the Spanish grant FPA2005-01269 and European Network of Theoretical Astroparticle Physics ILIAS/N6 under contract number RII3-CT-2004-506222. A. E. has been supported by a FPU grant from the Spanish Government. R. T. has been supported by the Juan de la Cierva program from the Spanish Government and by an ERG from the European Commission.

References

[1] KamLAND collaboration, K. Eguchi et al., Phys. Rev. Lett. 90, 021802 (2003), [hep-ex/0212021]. [2] S. Pakvasa and J. W. F. Valle, hep-ph/0301061, Proc. of the Indian National Academy of Sciences on Neutrinos, Vol. 70A, No. 1, p. 189-222 (2004), Eds. D. Indumathi, M.V.N. Murthy and G. Rajasekaran. [3] V. Barger, D. Marfatia and K. Whisnant, hep-ph/0308123. [4] KamLAND collaboration, T. Araki et al., Phys. Rev. Lett. 94, 081801 (2004). [5] M. Maltoni, T. Schwetz, M. A. Tortola and J. W. F. Valle, New J. Phys. 6, 122 (2004), Appendix C in hep-ph/0405172 (v5) provides updated neutrino oscillation results taking into account new SSM, new SNO salt data, latest K2K and MINOS data; previous works by other groups are referenced therein. [6] J. Schechter and J. W. F. Valle, Phys. Rev. D22, 2227 (1980). [7] J. W. F. Valle, J. Phys. Conf. Ser. 53, 473 (2006), [hep-ph/0608101], Review based on lectures at the Corfu Summer Institute on Elementary Particle Physics in September 2005. [8] J. Schechter and J. W. F. Valle, Phys. Rev. D24, 1883 (1981), Err. D25, 283 (1982). [9] C.-S. Lim and W. J. Marciano, Phys. Rev. D37, 1368 (1988). [10] E. K. Akhmedov, Phys. Lett. B213, 64 (1988). [11] L. Wolfenstein, Phys. Rev. D17, 2369 (1978). [12] Mikheev, S. P. 
and Smirnov, A. Yu., (Editions Frontières, Gif-sur-Yvette, 1986, p.355.), 86 Massive Neutrinos in Astrophysics and Particle Physics, Proceedings of the Sixth Moriond Workshop, ed. by Fackler, O. and Tran Thanh Van, J. [13] J. W. F. Valle, Phys. Lett. B199, 432 (1987). [14] R. N. Mohapatra and J. W. F. Valle, Phys. Rev. D34, 1642 (1986). [15] J. Bernabeu et al., Phys. Lett. B187, 303 (1987). [16] G. C. Branco, M. N. Rebelo and J. W. F. Valle, Phys. Lett. B225, 385 (1989). [17] N. Rius and J. W. F. Valle, Phys. Lett.B246, 249 (1990). [18] F. Deppisch and J. W. F. Valle, Phys. Rev. D72, 036001 (2005), [hep-ph/0406040]. [19] A. Zee, Phys. Lett. B93, 389 (1980). [20] K. S. Babu, Phys. Lett. B203, 132 (1988). [21] L. J. Hall, V. A. Kostelecky and S. Raby, Nucl. Phys. B267, 415 (1986). [22] M. Malinsky, J. C. Romao and J. W. F. Valle, Phys. Rev. Lett. 95, 161801 (2005), [hep-ph/0506296]. [23] A. B. McDonald, astro-ph/0406253. [24] K. Scholberg, astro-ph/0701081. [25] LSND, L. B. Auerbach et al., Phys. Rev. D63, 112001 (2001), [hep-ex/0101039]. [26] MUNU, Z. Daraktchieva et al., Phys. Lett. B564, 190 (2003), [hep-ex/0304011]. [27] CHARM, J. Dorenbosch et al., Phys. Lett. B180, 303 (1986). [28] CHARM-II, P. Vilain et al., Phys. Lett. B335, 246 (1994). [29] NuTeV, G. P. Zeller et al., Phys. Rev. Lett. 88, 091802 (2002), [hep-ex/0110059]. [30] V. D. Barger, R. J. N. Phillips and K. Whisnant, Phys. Rev. D44, 1629 (1991). [31] S. Davidson, C. Pena-Garay, N. Rius and A. Santamaria, JHEP 03, 011 (2003), [hep-ph/0302093]. 
[32] J. Barranco, O. G. Miranda, C. A. Moura and J. W. F. Valle, Phys. Rev. D73, 113001 (2006), [hep-ph/0512195]. [33] Z. Berezhiani and A. Rossi, Phys. Lett. B535, 207 (2002), [hep-ph/0111137]. [34] A. Friedland, C. Lunardini and C. Pena-Garay, Phys. Lett. B594, 347 (2004), [hep-ph/0402266]. [35] M. M. Guzzo, P. C. de Holanda and O. L. G. Peres, Phys. Lett. B591, 1 (2004), [hep-ph/0403134]. [36] O. G. Miranda, M. A. Tortola and J. W. F. Valle, JHEP 10, 008 (2006), [hep-ph/0406280]. [37] N. Fornengo et al., Phys. Rev. D65, 013010 (2002), [hep-ph/0108043]. [38] A. Friedland, C. Lunardini and M. Maltoni, Phys. Rev. D70, 111301 (2004), [hep-ph/0408264]. [39] A. Friedland and C. Lunardini, Phys. Rev. D72, 053009 (2005), [hep-ph/0506143]. [40] G. Mangano et al., Nucl. Phys. B756, 100 (2006), [hep-ph/0607267]. [41] S. K. Katsanevas, talk at Workshop on Neutrino Oscillation Physics (NOW 2006), Otranto, Lecce, Italy, 9-16 Sep 2006. [42] P. S. Amanik, G. M. Fuller and B. Grinstein, Astropart. Phys. 24, 160 (2005), [hep-ph/0407130]. [43] P. S. Amanik and G. M. Fuller, astro-ph/0606607. [44] S. P. Mikheev and A. Y. Smirnov, Sov. J. Nucl. Phys. 42, 913 (1985). [45] S. P. Mikheev and A. Y. Smirnov, Nuovo Cim. C9, 17 (1986). [46] H. Nunokawa, Y. Z. Qian, A. Rossi and J. W. F. Valle, Phys. Rev. D54, 4356 (1996), [hep-ph/9605301]. [47] H. Nunokawa, A. Rossi and J. W. F. Valle, Nucl. Phys. B482, 481 (1996), [hep-ph/9606445]. [48] S. Mansour and T.-K. Kuo, Phys. Rev. D58, 013012 (1998), [hep-ph/9711424]. [49] S. 
Bergmann and A. Kagan, Nucl. Phys. B538, 368 (1999), [hep-ph/9803305]. [50] G. L. Fogli, E. Lisi, A. Mirizzi and D. Montanino, Phys. Rev. D66, 013009 (2002), [hep-ph/0202269]. [51] T.-K. Kuo and J. T. Pantaleone, Phys. Rev. D37, 298 (1988). [52] S. Bergmann, Nucl. Phys. B515, 363 (1998), [hep-ph/9707398]. [53] P. Huber, T. Schwetz and J. W. F. Valle, Phys. Rev. Lett. 88, 101804 (2002), [hep-ph/0111224]. [54] P. Huber, T. Schwetz and J. W. F. Valle, Phys. Rev. D66, 013006 (2002), [hep-ph/0202048]. [55] Particle Data Group, W. M. Yao et al., J. Phys. G33, 1 (2006). [56] F. J. Botella, C. S. Lim and W. J. Marciano, Phys. Rev. D35, 896 (1987). [57] S. E. Woosley, A. Heger and T. A. Weaver, Reviews of Modern Physics 74, 1015 (2002). [58] R. C. Schirato and G. M. Fuller, astro-ph/0205390. [59] G. L. Fogli, E. Lisi, D. Montanino and A. Mirizzi, Phys. Rev. D68, 033005 (2003), [hep-ph/0304056]. [60] R. Tomas et al., JCAP 0409, 015 (2004), [astro-ph/0407132]. [61] C. Y. Cardall, astro-ph/0701831. [62] H. A. Bethe, J. H. Applegate and G. E. Brown, Astrophys. J. 241, 343 (1980). [63] A. Burrows and T. J. Mazurek, Astrophys. J. 259, 330 (1982). [64] H. Th. Janka, private communication. [65] T.-K. Kuo and J. T. Pantaleone, Rev. Mod. Phys. 61, 937 (1989). [66] M. Kachelriess and R. Tomas, hep-ph/0412100. [67] M. Liebendoerfer et al., Phys. Rev. D63, 103004 (2001), [astro-ph/0006418]. [68] M. Rampp and H. T. Janka, Astron. Astrophys. 396, 361 (2002), [astro-ph/0203101]. [69] T. A. Thompson, A. Burrows and P. A. Pinto, Astrophys. J. 592, 434 (2003), [astro-ph/0211194]. [70] K. Sumiyoshi et al., Astrophys. J. 629, 922 (2005), [astro-ph/0506620]. [71] Y.-Z. Qian, Prog. Part. Nucl. Phys. 50, 153 (2003), [astro-ph/0301422]. [72] J. Pruet, S. E. Woosley, R. Buras, H.-T. Janka and R. D. Hoffman, Astrophys. J. 623, 325 (2005), [astro-ph/0409446]. [73] C. Lunardini and A. Y. Smirnov, Nucl. Phys. B616, 307 (2001), [hep-ph/0106149]. [74] A. S. Dighe, M. T. 
Keil and G. G. Raffelt, JCAP 0306, 005 (2003), [hep-ph/0303210]. [75] A. S. Dighe, M. T. Keil and G. G. Raffelt, JCAP 0306, 006 (2003), [hep-ph/0304150]. [76] A. S. Dighe, M. Kachelriess, G. G. Raffelt and R. Tomas, JCAP 0401, 004 (2004), [hep-ph/0311172]. [77] K. Takahashi, K. Sato, A. Burrows and T. A. Thompson, Phys. Rev. D68, 113009 (2003), [hep-ph/0306056]. [78] M. Kachelriess et al., Phys. Rev. D71, 063003 (2005), [astro-ph/0412082]. [79] M. T. Keil, PhD thesis TU München 2003 [astro-ph/0308228]. [80] M. T. Keil, G. G. Raffelt and H. T. Janka, Astrophys. J. 590 (2003) 971 [astro-ph/0208035]. [81] R. Tomas, D. Semikoz, G. G. Raffelt, M. Kachelriess and A. S. Dighe, Phys. Rev. D68, 093013 (2003), [hep-ph/0307050]. [82] H. Nunokawa, V. B. Semikoz, A. Y. Smirnov and J. W. F. Valle, Nucl. Phys. B501, 17 (1997), [hep-ph/9701420]. [83] H. Duan, G. M. Fuller, J. Carlson and Y.-Z. Qian, Phys. Rev. D74, 105014 (2006), [astro-ph/0606616]. [84] H. Duan, G. M. Fuller, J. Carlson and Y.-Z. Qian, Phys. Rev. Lett. 97, 241101 (2006), [astro-ph/0608050]. [85] S. Hannestad, G. G. Raffelt, G. Sigl and Y. Y. Y. 
Wong, Phys. Rev. D74, 105010 (2006), [astro-ph/0608695]. [86] G. G. Raffelt and G. G. R. Sigl, hep-ph/0701182. [87] A. B. Balantekin, J. M. Fetter and F. N. Loreti, Phys. Rev. D54, 3941 (1996), [astro-ph/9604061]. [88] H. Nunokawa, A. Rossi, V. B. Semikoz and J. W. F. Valle, Nucl. Phys. B472, 495 (1996), [hep-ph/9602307]. [89] G. L. Fogli, E. Lisi, A. Mirizzi and D. Montanino, JCAP 0606, 012 (2006), [hep-ph/0603033]. [90] A. Friedland and A. 
Gruzinov, astro-ph/0607244. [91] Axial couplings would affect neutrino propagation in polarized media, see Ref. [82]. [92] However we have confined ourselves to values of εeα small enough not to lead to drastic consequences during the core collapse. [93] For the sake of simplicity we will omit the superindex V. [94] The importance of collective flavor neutrino conversions driven by neutrino-neutrino interactions has been recently noted in Refs. [83, 84, 85, 86]. Here we consider only the case where the effective potential felt by neutrinos comes from their interactions with electrons, protons and neutrons. In a future work we plan to include this effect and have a complete picture of the neutrino propagation. [95] Here we neglect the possible effects of density fluctuations [87, 88] taking place during the shock wave propagation. For a detailed study of the phenomenological consequences see Refs. [89, 90]. [96] The alternative condition H ′ee = H µµ would give rise to another internal resonance which can be studied using the same method. For brevity, we will not pursue this in this paper. [97] Note that, in the limit of high densities one recovers the rotation angle obtained for the internal I-resonance 23 → ϑ 23 after neglecting the kinetic terms. [98] For the case of NSI with electrons both the vector and axial components of εeαβ will contribute to the ν−e cross section. [99] We assume that for the values of the NSI parameters considered the initial neutrino spectra do not significantly change. 
ABSTRACT We analyze the possibility of probing non-standard neutrino interactions (NSI, for short) through the detection of neutrinos produced in a future galactic supernova (SN). We consider the effect of NSI on the neutrino propagation through the SN envelope within a three-neutrino framework, paying special attention to the inclusion of NSI-induced resonant conversions, which may take place in the most deleptonized inner layers. We study the possibility of detecting NSI effects in a Megaton water Cherenkov detector, either through modulation effects in the $\bar\nu_e$ spectrum due to (i) the passage of shock waves through the SN envelope, (ii) the time dependence of the electron fraction and (iii) the Earth matter effects; or, finally, through the possible detectability of the neutronization $\nu_e$ burst. We find that the $\bar\nu_e$ spectrum can exhibit dramatic features due to the internal NSI-induced resonant conversion. This occurs for non-universal NSI strengths of a few %, and for very small flavor-changing NSI above a few $\times 10^{-5}$.
<|endoftext|><|startoftext|>
Introduction

The discrete dipole approximation (DDA) is a well-known method to solve the light scattering problem for arbitrarily shaped particles. Since its introduction by Purcell and Pennypacker¹ it has been improved constantly. The formulation of DDA summarized by Draine and Flatau² more than 10 years ago is still the most widely used for many applications,³ partly due to the publicly available, high-quality and user-friendly code DDSCAT.⁴ Although modern improvements of DDA exist (as discussed in detail in Section 2.F), they are still in the research stage because they are not widely used in real applications.
DDA directly discretizes the volume of the scatterer and hence is applicable to arbitrarily shaped particles. However, the drawback of this discretization is the extreme computational complexity of DDA, although it is significantly decreased by advanced numerical techniques.²,⁵ That is why the usual application strategy for DDA is “single computation”, where a discretization is chosen based on the available computational resources and some empirical estimates of the expected errors.³,⁴ These error estimates are based on a limited number of benchmark calculations³ and hence are external to the light scattering problem under investigation. Such error estimates have evident drawbacks; however, no better alternative is available.

Some results of analytical analysis of errors in computational electromagnetics are known (e.g. Refs. 6 and 7); however, they typically consider surface integral equations. To the best of our knowledge, such an analysis has not been done for volume integral equations (such as DDA). Usually errors in DDA are studied as a function of the size parameter of the scatterer x, at a constant or a few different total numbers of dipoles N (e.g. Refs. 2 and 8). Only a small number of papers directly present errors versus a discretization parameter (e.g. d, the size of a single dipole).⁹⁻¹⁷ The range of d typically studied in those papers is limited to a factor of 5 between the minimum and maximum values, with the exception of two papers¹¹,¹² where it is a factor of 15. Those plots of errors versus discretization parameter are always used to illustrate the performance of a new DDA formulation and compare it with others. No conclusions about the convergence properties of DDA as a function of d have been drawn from these plots. To our knowledge, no theoretical analysis of DDA convergence has been performed, and only a few limited empirical studies have appeared in the literature.
In this paper we perform a theoretical analysis of DDA convergence when refining the discretization (Section 2). We derive rigorous theoretical bounds on the error in any measured quantity for any scatterer. In Section 3 we present extensive numerical results of DDA computations for five different scatterers using many different discretizations. These results are discussed in Section 4 to support the conclusions of the theoretical analysis. We formulate the conclusions of the paper in Section 5. In a follow-up paper¹⁸ (which from now on we refer to as Paper 2) the theoretical convergence results are used for an extrapolation technique to increase the accuracy of DDA computations.

2. Theoretical analysis

In this section we analyze theoretically the errors of DDA computations. We formulate the volume integral equation for the internal electric field and its operator counterpart in Section 2.A, and its discretization in Section 2.B. Section 2.C contains integral and discretized formulae for the measured quantities that are the final goal of any light scattering simulation. We derive the main results in Section 2.D, where we consider the errors of the traditional DDA formulation² without shape errors, which are considered separately in Section 2.E. Finally, in Section 2.F we discuss some recent DDA improvements from the viewpoint of our convergence theory.

A. Integral Equation

Throughout this paper we assume the exp(−iωt) time dependence of all fields. The scatterer is assumed to be dielectric but not magnetic (magnetic permeability μ = 1), and the electric permittivity is assumed to be isotropic (a non-isotropic permittivity would significantly complicate the derivations but would not change in principle the main conclusions of Section 2, Eqs. (70) and (87)).
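The general form of the integral equation governing the electric field inside the dielectric scatterer is the following:19,20

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}^{\mathrm{inc}}(\mathbf{r}) + \int_{V\setminus V_0}\mathrm{d}^3r'\,\bar{\mathbf{G}}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') + \mathbf{M}(V_0,\mathbf{r}) - \bar{\mathbf{L}}(\partial V_0,\mathbf{r})\chi(\mathbf{r})\mathbf{E}(\mathbf{r}), \tag{1}$$

where $\mathbf{E}^{\mathrm{inc}}(\mathbf{r})$, $\mathbf{E}(\mathbf{r})$ are the incident and total electric field at location $\mathbf{r}$; $\chi(\mathbf{r}) = (\varepsilon(\mathbf{r}) - 1)/4\pi$ is the susceptibility of the medium at point $\mathbf{r}$ ($\varepsilon(\mathbf{r})$ is the relative permittivity). $V$ is the volume of the particle (more generally, the volume that contains all points where the susceptibility is not zero), $V_0$ is a smaller volume such that $V_0 \subset V$, $\mathbf{r} \in V_0\setminus\partial V_0$. $\bar{\mathbf{G}}(\mathbf{r},\mathbf{r}')$ is the free space dyadic Green’s function, defined as

$$\bar{\mathbf{G}}(\mathbf{r},\mathbf{r}') = \left[k^2\bar{\mathbf{I}} + \hat{\nabla}\hat{\nabla}\right]g(R) = g(R)\left[k^2\left(\bar{\mathbf{I}} - \frac{\hat{R}\hat{R}}{R^2}\right) - \frac{1 - \mathrm{i}kR}{R^2}\left(\bar{\mathbf{I}} - 3\frac{\hat{R}\hat{R}}{R^2}\right)\right], \tag{2}$$

where $\bar{\mathbf{I}}$ is the identity dyadic, $k = \omega/c$ is the free space wave vector, $\mathbf{R} = \mathbf{r} - \mathbf{r}'$, $R = |\mathbf{R}|$, $\hat{R}\hat{R}$ is a dyadic defined as $(\hat{R}\hat{R})_{\mu\nu} = R_\mu R_\nu$ ($\mu$, $\nu$ are Cartesian components of the vector or tensor), and $g(R)$ is the scalar Green’s function

$$g(R) = \frac{\exp(\mathrm{i}kR)}{R}. \tag{3}$$

$\mathbf{M}$ is the following integral associated with the finiteness of the exclusion volume $V_0$:

$$\mathbf{M}(V_0,\mathbf{r}) = \int_{V_0}\mathrm{d}^3r'\left(\bar{\mathbf{G}}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') - \bar{\mathbf{G}}^{\mathrm{s}}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r})\mathbf{E}(\mathbf{r})\right), \tag{4}$$

where $\bar{\mathbf{G}}^{\mathrm{s}}(\mathbf{r},\mathbf{r}')$ is the static limit ($k \to 0$) of $\bar{\mathbf{G}}(\mathbf{r},\mathbf{r}')$:

$$\bar{\mathbf{G}}^{\mathrm{s}}(\mathbf{r},\mathbf{r}') = \hat{\nabla}\hat{\nabla}\frac{1}{R} = -\frac{1}{R^3}\left(\bar{\mathbf{I}} - 3\frac{\hat{R}\hat{R}}{R^2}\right). \tag{5}$$

$\bar{\mathbf{L}}$ is the so-called self-term dyadic:

$$\bar{\mathbf{L}}(\partial V_0,\mathbf{r}) = -\oint_{\partial V_0}\mathrm{d}^2r'\,\frac{\hat{\mathbf{n}}'\mathbf{R}}{R^3}, \tag{6}$$

where $\hat{\mathbf{n}}'$ is an external (as viewed from $\mathbf{r}$) normal to the surface $\partial V_0$ at point $\mathbf{r}'$, and $\hat{\mathbf{n}}'\mathbf{R}$ denotes a dyadic product. Eq. (1) can be rewritten in operator form as follows:

$$\tilde{\mathbf{A}}\cdot\tilde{\mathbf{E}} = \tilde{\mathbf{E}}^{\mathrm{inc}}, \tag{7}$$

where $\tilde{\mathbf{E}} \in H_1 = L^1(V \to \mathbb{C}^3)$, the functions from $V$ to $\mathbb{C}^3$ that have finite L1-norm, and $\tilde{\mathbf{E}}^{\mathrm{inc}} \in H_2$, the subspace of $H_1$ containing all functions that satisfy Maxwell equations in free space. $\tilde{\mathbf{A}}$ is a linear operator $H_1 \to H_2$. Although the Sobolev norm is physically more sound (based on the finiteness of energy of the electric field),6,21 we use the L1-norm. A detailed discussion of all assumptions made for the electric field is given in Section 2.D.

B. Discretization

To solve Eq. (1) numerically a discretization is done in the following way.20 Let $V = \bigcup_{i=1}^{N} V_i$, $V_i \cap V_j = \emptyset$ for $i \neq j$. $N$ denotes the number of subvolumes (dipoles). Assuming $\mathbf{r} \in V_i$ and choosing $V_0 = V_i$, Eq. (1) can be rewritten as

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}^{\mathrm{inc}}(\mathbf{r}) + \sum_{j\neq i}\int_{V_j}\mathrm{d}^3r'\,\bar{\mathbf{G}}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') + \mathbf{M}(V_i,\mathbf{r}) - \bar{\mathbf{L}}(\partial V_i,\mathbf{r})\chi(\mathbf{r})\mathbf{E}(\mathbf{r}). \tag{8}$$

The set of Eqs. (8) (for all $i$) is exact. Further, one fixed point $\mathbf{r}_i$ inside each $V_i$ (its center) is chosen and $\mathbf{r} = \mathbf{r}_i$ is set. The usual approximation20 is to consider $\mathbf{E}$ and $\chi$ constant inside each subvolume:

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}(\mathbf{r}_i) = \mathbf{E}_i,\quad \chi(\mathbf{r}) = \chi(\mathbf{r}_i) = \chi_i \quad\text{for } \mathbf{r} \in V_i. \tag{9}$$

Eq. (8) can then be rewritten as

$$\mathbf{E}_i = \mathbf{E}_i^{\mathrm{inc}} + \sum_{j\neq i}\bar{\mathbf{G}}_{ij}V_j\chi_j\mathbf{E}_j + \left(\bar{\mathbf{M}}_i - \bar{\mathbf{L}}_i\right)\chi_i\mathbf{E}_i, \tag{10}$$

where $\mathbf{E}_i^{\mathrm{inc}} = \mathbf{E}^{\mathrm{inc}}(\mathbf{r}_i)$, $\bar{\mathbf{L}}_i = \bar{\mathbf{L}}(\partial V_i,\mathbf{r}_i)$,

$$\bar{\mathbf{M}}_i = \int_{V_i}\mathrm{d}^3r'\left(\bar{\mathbf{G}}(\mathbf{r}_i,\mathbf{r}') - \bar{\mathbf{G}}^{\mathrm{s}}(\mathbf{r}_i,\mathbf{r}')\right), \tag{11}$$

$$\bar{\mathbf{G}}_{ij} = \frac{1}{V_j}\int_{V_j}\mathrm{d}^3r'\,\bar{\mathbf{G}}(\mathbf{r}_i,\mathbf{r}'). \tag{12}$$

A further approximation, which is used in almost all formulations of DDA, is

$$\bar{\mathbf{G}}_{ij}^{(0)} = \bar{\mathbf{G}}(\mathbf{r}_i,\mathbf{r}_j). \tag{13}$$

This assumption is made implicitly by all formulations that start by replacing the scatterer with a set of point dipoles, as was done originally by Purcell and Pennypacker.1 For a cubical (as well as spherical) cell $V_i$ with $\mathbf{r}_i$ located at the center of the cell, $\bar{\mathbf{L}}_i$ can be calculated analytically, yielding22

$$\bar{\mathbf{L}}_i = \frac{4\pi}{3}\bar{\mathbf{I}}. \tag{14}$$

Eq. (10) together with Eqs. (13) and (14) and completely neglecting $\bar{\mathbf{M}}_i$ is equivalent to the original DDA by Purcell and Pennypacker (PP).1 The diagonal terms in Eq. (10) are then equivalent to the well-known Clausius-Mossotti (CM) polarizability for point dipoles. Modifications introduced by other DDA prescriptions are discussed in Section 2.F. In matrix notation Eq. (10) reads

$$\mathbf{A}^d\mathbf{E}^d = \mathbf{E}^{\mathrm{inc},d}, \tag{15}$$

where $\mathbf{E}^d$, $\mathbf{E}^{\mathrm{inc},d}$ are elements of $(\mathbb{C}^3)^N$ (vectors of size $N$ where each element is a 3D complex vector) and $\mathbf{A}^d$ is an $N\times N$ matrix where each element is a $3\times 3$ tensor. $d$ is the size of one dipole. In operator notation Eq. (8) (for $\mathbf{r} = \mathbf{r}_i$) is as follows:

$$\left(\tilde{\mathbf{A}}\tilde{\mathbf{E}}\right)(\mathbf{r}_i) = \tilde{\mathbf{E}}^{\mathrm{inc}}(\mathbf{r}_i) = \mathbf{E}_i^{\mathrm{inc},d}. \tag{16}$$

We define the discretization error function as

$$\mathbf{h}_i^d = \left(\tilde{\mathbf{A}}\tilde{\mathbf{E}}\right)(\mathbf{r}_i) - \left(\mathbf{A}^d\mathbf{E}^{0,d}\right)_i, \tag{17}$$

where $\mathbf{E}^{0,d}$ is the exact field at the centers of the dipoles, $\mathbf{E}_i^{0,d} = \mathbf{E}(\mathbf{r}_i)$, in contrast to $\mathbf{E}^d$ that is only an approximation obtained from the solution of Eq. (15) (here we neglect the numerical error that appears from the solution of Eq. 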
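(15) itself, which is acceptable if this error is controlled to be much less than other errors). Comparing Eqs. (15) and (17) one can immediately obtain the error in internal fields due to discretization $\delta\mathbf{E}^d$:

$$\delta\mathbf{E}^d = \mathbf{E}^d - \mathbf{E}^{0,d} = \left(\mathbf{A}^d\right)^{-1}\mathbf{h}^d. \tag{18}$$

C. Measured quantities

After having determined the internal electric fields, scattered fields and cross sections can be calculated. Scattered fields are obtained by taking the limit $r \to \infty$ of the integral in Eq. (1) (see, e.g., Ref. 23):

$$\mathbf{E}^{\mathrm{sca}}(\mathbf{r}) = \frac{\exp(\mathrm{i}kr)}{-\mathrm{i}kr}\mathbf{F}(\mathbf{n}), \tag{19}$$

where $\mathbf{n} = \mathbf{r}/r$ is the unit vector in the scattering direction, and $\mathbf{F}$ is the scattering amplitude:

$$\mathbf{F}(\mathbf{n}) = -\mathrm{i}k^3\left(\bar{\mathbf{I}} - \hat{n}\hat{n}\right)\sum_i\int_{V_i}\mathrm{d}^3r'\,\exp(-\mathrm{i}k\,\mathbf{r}'\cdot\mathbf{n})\,\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}'). \tag{20}$$

All other differential scattering properties, such as the amplitude and Mueller scattering matrices, and the asymmetry parameter $\langle\cos\theta\rangle$ can be easily derived from $\mathbf{F}(\mathbf{n})$, calculated for two incident polarizations.24 We consider an incident polarized plane wave:

$$\mathbf{E}^{\mathrm{inc}}(\mathbf{r}) = \mathbf{e}^0\exp(\mathrm{i}\mathbf{k}\cdot\mathbf{r}), \tag{21}$$

where $\mathbf{k} = k\mathbf{a}$, $\mathbf{a}$ is the direction of incidence, and $|\mathbf{e}^0| = 1$ is assumed. The scattering and extinction cross sections ($C_{\mathrm{sca}}$, $C_{\mathrm{ext}}$) are derived from the scattering amplitude:23

$$C_{\mathrm{sca}} = \frac{1}{k^2}\oint\mathrm{d}\Omega\,|\mathbf{F}(\mathbf{n})|^2, \tag{22}$$

$$C_{\mathrm{ext}} = \frac{4\pi}{k^2}\mathrm{Re}\left(\mathbf{F}(\mathbf{a})\cdot\mathbf{e}^{0*}\right), \tag{23}$$

where * denotes complex conjugation. The expression for the absorption cross section ($C_{\mathrm{abs}}$) directly uses the internal fields:23

$$C_{\mathrm{abs}} = 4\pi k\sum_i\int_{V_i}\mathrm{d}^3r'\,\mathrm{Im}(\chi(\mathbf{r}'))\,|\mathbf{E}(\mathbf{r}')|^2. \tag{24}$$

Since only values of the internal field in the centers of dipoles are known, Eqs. (20) and (24) are approximated by (PP)

$$\mathbf{F}(\mathbf{n}) = -\mathrm{i}k^3\left(\bar{\mathbf{I}} - \hat{n}\hat{n}\right)\sum_i V_i\chi_i\mathbf{E}_i\exp(-\mathrm{i}k\,\mathbf{r}_i\cdot\mathbf{n}), \tag{25}$$

$$C_{\mathrm{abs}} = 4\pi k\sum_i V_i\,\mathrm{Im}(\chi_i)\,|\mathbf{E}_i|^2. \tag{26}$$

Corrections to Eq. (26) are discussed in Section 2.F. Both Eqs. (20) (for each component) and (24) can be generalized as $\tilde{\phi}(\tilde{\mathbf{E}})$ (a functional that is not necessarily linear), which is approximated as

$$\tilde{\phi}(\tilde{\mathbf{E}}) = \phi^d(\mathbf{E}^d) + \delta\phi^d, \tag{27}$$

where $\phi^d(\mathbf{E}^d)$ corresponds to Eqs. (25) or (26) respectively, and the error $\delta\phi^d$ consists of two parts:

$$\delta\phi^d = \left[\tilde{\phi}(\tilde{\mathbf{E}}) - \phi^d(\mathbf{E}^{0,d})\right] + \left[\phi^d(\mathbf{E}^{0,d}) - \phi^d(\mathbf{E}^d)\right]. \tag{28}$$

The first one comes from discretization (similar to Eq. 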
(17)), and the second from errors in the internal fields.

D. Error analysis

In this section we perform error analysis for the PP formulation of DDA. Improvements of DDA are further discussed in Section 2.F. We assume cubical subvolumes with size $d$. We also assume that the shape of the particle is exactly described by these cubical subvolumes (we call this a cubically shaped scatterer). Moreover, $\chi$ is a smooth function inside $V$ (exact assumptions on $\chi$ are formulated below). An extension of the theory to shapes that do not satisfy these conditions is presented in Section 2.E. If there are several regions with different values of $\chi$ (smooth inside each region), the analysis is still valid, but interfaces inside $V$ should be considered the same way as the outer boundary of $V$. We further fix the geometry of the scattering problem and the incident field. Therefore we will be interested only in the variation of the discretization (which is characterized by the single parameter $d$); for reasons that will become clear in the sequel, we assume that $kd < 2$ (this bound is not limiting since otherwise DDA is generally inapplicable). We switch to dimensionless parameters by assuming $k = 1$, which is equivalent to measuring all distances in units of $1/k$. The unit of electric field can be chosen arbitrarily but must be kept constant. In all further derivations we will use two sets of constants: $\gamma_i$ and $c_i$. γ1-γ13 are basic constants that do not depend on the discretization $d$, but do depend directly upon all other problem parameters (size parameter $x = kR_{\mathrm{eq}}$, where $R_{\mathrm{eq}}$ is the volume-equivalent radius, $m$, shape, and incident field) or some of them. In contrast, c1-c94 are auxiliary values that either are numerical constants or can be derived in terms of the constants $\gamma_i$. Although the dependencies of $c_i$ on $\gamma_i$ are not explicitly derived in this paper, one can easily obtain them following the derivations of this section. That is the main motivation for using such a large number of constants instead of an “order of magnitude” formalism. 
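However, such explicit derivation has limited application because, as we will see further, constants in the final result depend upon almost all basic constants. A qualitative analysis of these dependencies is performed at the end of this section. It should be noted that the main theoretical results concerning DDA convergence (boundedness of errors by a quadratic function, cf. Eq. (70)) can be formulated and applied without consideration of any constants (which is simpler). However, our full derivation enables us to make additional conclusions related to the behavior of specific error terms. The total number of dipoles used to discretize the scatterer is

$$N = \gamma_1 d^{-3}. \tag{29}$$

We assume that the internal field $\mathbf{E}$ is at least four times differentiable and that all these derivatives are bounded inside $V$:

$$|\mathbf{E}(\mathbf{r})| \le \gamma_2,\quad |\partial_\mu\mathbf{E}(\mathbf{r})| \le \gamma_3,\quad |\partial_\mu\partial_\nu\mathbf{E}(\mathbf{r})| \le \gamma_4,\quad |\partial_\mu\partial_\nu\partial_\rho\mathbf{E}(\mathbf{r})| \le \gamma_5,\quad |\partial_\mu\partial_\nu\partial_\rho\partial_\tau\mathbf{E}(\mathbf{r})| \le \gamma_6 \tag{30}$$

for $\mathbf{r} \in V$ and $\forall\,\mu,\nu,\rho,\tau$. This assumption is acceptable since there are no interfaces inside $V$; therefore $\mathbf{E}$ should be a smooth function. $|\cdot|$ denotes the Euclidean (L2) norm, which is used for all 3D objects: vectors and tensors. We use the L1-norm, $\|\cdot\|_1$, for N-dimensional vectors and matrices as well as for functions and operators. Eq. (30) immediately implies that $\tilde{\mathbf{E}} \in L^1(V)$. We require that $\chi$ satisfies Eq. (30) with constants γ7-γ11. Next we state an estimate for the norm of $\bar{\mathbf{G}}(\mathbf{R})$ and its derivatives. One can easily obtain from Eq. (2) that for $R > 1$ $\bar{\mathbf{G}}(\mathbf{R})$ satisfies Eq. (30) (with constants c1-c5), while for $R \le 2$

$$|\bar{\mathbf{G}}(\mathbf{R})| \le c_6R^{-3},\quad |\partial_\mu\bar{\mathbf{G}}(\mathbf{R})| \le c_7R^{-4},\quad |\partial_\mu\partial_\nu\bar{\mathbf{G}}(\mathbf{R})| \le c_8R^{-5},\quad |\partial_\mu\partial_\nu\partial_\rho\bar{\mathbf{G}}(\mathbf{R})| \le c_9R^{-6},\quad |\partial_\mu\partial_\nu\partial_\rho\partial_\tau\bar{\mathbf{G}}(\mathbf{R})| \le c_{10}R^{-7} \tag{31}$$

for $\forall\,\mu,\nu,\rho,\tau$. Next we state two auxiliary facts that will be used later. Let $V_c$ be a cube with size $d$ and with its center at the origin, and $f(\mathbf{r})$ a four times differentiable function inside $V_c$. Then

$$\left|\int_{V_c}\mathrm{d}^3r\,f(\mathbf{r}) - d^3f(\mathbf{0})\right| \le c_{11}d^5\max_{\mu\nu;\,\mathbf{r}\in V_c}\left|\partial_\mu\partial_\nu f(\mathbf{r})\right|, \tag{32}$$

$$\left|\int_{V_c}\mathrm{d}^3r\,f(\mathbf{r}) - d^3f(\mathbf{0})\right| \le \frac{d^5}{24}\left|\nabla^2f(\mathbf{r})\right|_{\mathbf{r}=\mathbf{0}} + c_{12}d^7\max_{\mu\nu\rho\tau;\,\mathbf{r}\in V_c}\left|\partial_\mu\partial_\nu\partial_\rho\partial_\tau f(\mathbf{r})\right|. \tag{33}$$

Eqs. (32) and (33) are corollaries of expanding $f$ into a Taylor series. Odd orders of the Taylor expansion vanish because of cubical symmetry. 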
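The scaling behind Eqs. (32) and (33) can be checked numerically. The following is a minimal sketch under stated assumptions: the test function f(r) = exp(a·r) and the coefficient values are hypothetical choices (they do not come from this paper), picked only because the integral over a cube is known in closed form. The sketch compares the exact integral with the one-point approximation d³f(0) and with the second-order Taylor term (d⁵/24)∇²f(0), which follows from the fact that the integral of x² over the cube equals d⁵/12.

```python
import math

def cube_integral_exp(a, b, c, d):
    """Exact integral of exp(a*x + b*y + c*z) over the cube [-d/2, d/2]^3.

    The test function is a hypothetical choice made for this illustration.
    """
    def one_dim(coef):
        # Integral of exp(coef * t) for t in [-d/2, d/2]
        return 2.0 * math.sinh(coef * d / 2.0) / coef
    return one_dim(a) * one_dim(b) * one_dim(c)

def midpoint_error(a, b, c, d):
    """Exact integral minus the one-point approximation d^3 * f(0), with f(0) = 1."""
    return cube_integral_exp(a, b, c, d) - d**3

def leading_term(a, b, c, d):
    """Second-order Taylor term (d^5 / 24) * Laplacian of f at the origin."""
    return d**5 / 24.0 * (a * a + b * b + c * c)

# Hypothetical coefficients; odd derivatives of f are non-zero at the origin,
# yet they do not contribute to the integral because of the cubical symmetry.
a, b, c = 0.7, -1.3, 0.4
errors = {d: midpoint_error(a, b, c, d) for d in (0.2, 0.1, 0.05)}

# Observed order: the error should shrink by about 2^5 when d is halved.
order = math.log(errors[0.2] / errors[0.1], 2)
ratio_to_leading = errors[0.05] / leading_term(a, b, c, 0.05)

print(f"observed order = {order:.3f}")
print(f"error / leading term at d = 0.05: {ratio_to_leading:.5f}")
```

Dividing the d⁵ error of a single cell by the cell volume d³ gives a relative error of order d², which is where the quadratic term of the final bound originates.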
Our first goal is to estimate dh . Starting from Eq. (17) we write as dih ),()(),(d )0(33 ii i Vdr rMPGrPrrGh + −′′′= ∑ ∫ , (34) where we have introduced the polarization vector for conciseness )()()( rErrP χ= , )( ii rPP = . (35) It is evident that also satisfies Eq. )(rP (30) (with constants c13-c17). We start by estimating ),( iiV rM . Substituting a Taylor expansion of )(rP ( ) ( )( )∑∑ ∂∂+∂+= ρρ τρ ),,( )()()( RrP0P0PRP RRR , (36) where , into Eq μμ Rr ≤≤ ~0 (4) gives ( ) ( )(∫ ∑∫ ∂∂+−= ii RRRRV τρτρ τρ ),,( )()(d),( 3s3 RrPRGPRGRGrM ) . (37) The norms of these two terms can be estimated as ( ) 2183s3 )(d3 )()(d dcRRgR i ≤=− ∫∫ PIPRGRG , (38) ( ) 21923153 )(d3)),,(~()(d dcRRcRRR ≤≤∂∂ ∫∫ ∑ RGRrPRG τρτρ τρ . (39) Eq. (38) follows directly from the definitions in Eqs. (2), (5). To derive Eq. (39) we used Eq. (31) and the fact that 23RRR ≤∑ τρ . Finally, Eqs. (37)-(39) lead to 20),( dcV ii ≤rM . (40) To estimate the sum in Eq. (34) we consider separately three cases: 1) dipole j lies in a complete shell of dipole i (we define the shell below); 2) j lies in a distant shell of dipole i – 1>−= ijijR rr ; 3) all j that fall between the first two cases (see Fig. 1). We define the first shell (S1(i)) of a cubical dipole as a set of dipoles that touch it (including touching in one point only). The second shell (S2(i)) is a set of dipoles that touch the outer surface of the first shell, and so on. The l-th shell (Sl(i)) is then a set of all dipoles that lie on the boundary of the cube with size and center coinciding with the center of the original dipole. We call a shell complete if all its elements lie inside the volume of the scatterer V. A shell is called a distant dl )12( + Kmax K(i) (2) (3) vacuum scatterer Fig. 1. Partition of the scatterer’s volume into three regions relative to dipole i. shell if all its elements satisfy , i.e. if its order 1>ijR [ ]dKl 1max => . 
Let K(i) be the order of the first incomplete shell, which is an indicator of how close dipole i is to the surface. We demand to separate cases (1) and (2) described above. All j that fall in the third case satisfy (the exact value of this constant – slightly larger than max)( KiK ≤ 2ijR (32), then use Eq. (30) for P(r) and )(rG , and finally invoke Eq. (29): )0(33 )(),(d dcdNcdcdr ijij j i ≤≤≤⎟ −′′′ ∑∑ ∫ PGrPrrG . (54) To analyze the third part of the sum in Eq. (34) we again sum over shells, however since they are incomplete we cannot use symmetry considerations. We apply Eq. (33) to the whole function under the integral and proceed analogous to the derivation of Eq. (51). Using the identity )()(2 rGrG −=∇ , (55) (since we have assumed ) we obtain 1=k ( ) 4372 )()( −= ≤∇ ijRcijRrrPrG , (56) ( ) 738)()(max − ≤′′∂∂∂∂ ij , (57) which leads to )0(33 )(),(d −− −′′′∑ ∫ lcdlcdr PGrPrrG , (58) and then analogous to Eq. (53): )()()(),(d 442 )( )( )0(33 iKcidKcdr iKl iSj −′′′∑ ∑ ∫ PGrPrrG . (59) Collecting Eqs. (40), (53), (54), (59) we finally obtain ( ) 24443442141 )(ln)()( diKcciKcidKcdi +++≤ −−h . (60) Then ( ) ( )442141 max4443 )(ln −− +++≤= ∑∑ KcdKcKnNdKcc d hh , (61) where n(K) is the number of dipoles whose order of the first incomplete shell is equal to K. It is clear that NdnKn 12)1()( γ≤≤ , (62) where γ12 is surface to volume ratio of the scatterer. Finally we obtain ( )[ ]dcddccNd 46245431 ln +−≤h . (63) The last term in Eq. (63) is mostly determined by dipoles that lie on the surface (or few dipoles deep) because it comes from the K-4 term in Eq. (61) (which rapidly decreases when moving from surface). We define surface errors as those associated with the linear term in Eq. (63). Our numerical simulation (see Section 0) show that this term is small compared to other terms for “typical” values of d, however it is always significant for small enough values of d. From Eq. (18) we directly obtain δ ddd hAE ≤ . (64) We assume that a bounded solution of Eq. 
(7) uniquely exists for any , moreover we assume that if inc~ H∈E inc =E then 131 γ≤E . These assumptions are equivalent to the fact that 1~ −A exists and is finite (the operator 1 ~ −A is bounded). Because dA is a discretization of A one would expect that ( ) 131 lim γ== − . (65) Although Eq. (65) seems intuitively correct, its rigorous prove, even if feasible, lies outside the scope of this paper. For an intuitive understanding one may consult the paper by Rahola,25 where he studied the spectrum of the discretized operator (for scattering by a sphere) and showed that it does converge to the spectrum of the integral operator with decreasing d. It should however be noted, that convergence of the spectrum only implies the convergence of the spectral (L2) norm of the operator and not necessarily the convergence of the L1-norm. Therefore Eq. (65) should be considered as an assumption. It implies that there exists a d0 such that for 0dd < ( ) 47 A , (66) where c47 is an arbitrary constant larger then γ13 (although d0 depends on its choice). For example 1347 2γ=c should lead to a rather large d0 (a rigorous estimate of d0 does not seem feasible). Therefore δ dE satisfies the same constrain as dh (Eq. (63)) but with constants c48-c50. Next we estimate the errors in the measured quantities and start with the discretization error (first part in Eq. (28)). Examining Eqs. (20) and (24) one can see that Eq. (32) may be directly applied leading to ( ) ( ) 252551,0~~ dcdc dd ≤≤− ∑EE φφ . (67) The second part in Eq. (28) is estimated as ( ) ( ) ( ) dcddccdcdc d 55541 ,0 lnδδ +−≤≤≤− ∑ EEEE φφ , (68) where we used Eq. (29). The estimation of the error for Cabs additionally uses the fact i c EE δδ 57 c Eδmax 57 . By combining Eqs. (67) and (68) we obtain the final result of this section: ( ) dcddccd 5625558 lnδ +−≤φ . (69) It is important to remember that the derivation was performed for constant x, m, shape, and incident field. There are 13 basic constants (γ1-γ13). γ1 (Eq. 
(29)) characterizes the total volume of the scatterer, hence it depends only on $x$. γ7-γ11 (Eq. (30) for $\chi(\mathbf{r})$) can be easily obtained given the function $\chi(\mathbf{r})$; moreover, this is completely trivial in the common case of homogeneous scatterers. γ12 (surface to volume ratio, Eq. (62)) depends on the shape of the scatterer and is inversely proportional to $x$. It is not feasible (except for certain simple shapes) to obtain the values of the constants γ2-γ6 (Eq. (30)), since this requires an exact solution for the internal fields. These constants definitely depend upon all the parameters of the scattering problem. Moreover, these dependencies can be rapidly varying, especially near the resonance regions. The same is true for γ13 (L1-norm of the inverse of the integral operator, Eq. (65)). Finally, there is the important constant $d_0$ that also depends on all the parameters; however, one may expect it to be large enough (e.g. $d_0 \ge 2$) for most of the problems, in which case its variation can be neglected. Before proceeding we introduce the discretization parameter $y = |m|kd$. We employ the commonly used formula as proposed by Draine;8 however, the exact dependence on $m$ is not important because all the conclusions are still valid for constant $m$. Replacing $d$ by $y$ does not significantly change the dependence of the constants in Eq. (69), since they all already depend on $m$ through the basic constants γ2-γ11, γ13. This leads to

$$\left|\delta\phi^y\right| \le \left(c_{59} - c_{60}\ln y\right)y^2 + c_{61}y. \tag{70}$$

It is not feasible to make any rigorous conclusions about the variation of the constants in Eq. (70) with varying parameters, because all these constants depend on γ2-γ6, γ13, which in turn depend in a complex way upon the parameters of the scattering problem. However, we can make one conclusion about the general trend of this dependency. Following the derivation of Eq. (70) one can observe that $c_{61}$ is proportional to γ12, while $c_{59}$ and $c_{60}$ do not directly depend on it (at least part of the contributions to them are independent of γ12). 
Therefore the general trend will be a decrease of the ratio $c_{61}/c_{59}$ with increasing $x$ (when all other parameters are fixed). This is a mathematical justification of the intuitively evident fact that surface errors are less significant for larger particles. In the analysis of the results of the numerical simulations (Section 4) we will neglect the variation of the logarithm. Eq. (70) then states that the error is bounded by a quadratic function of $y$ (for $d \le d_0$). However, keep in mind that our derivation does not lead to an optimal error estimate, i.e. it overestimates the error and can be improved. For example, the constants γ2-γ6 are usually largest inside a small volume fraction of the scatterer (near the surface or some internal resonance regions), while in the rest of the scatterer the internal electric field and its derivatives are bounded by significantly smaller constants. However, the order of the error is estimated correctly, as we will see in the numerical simulations. It is important to note that Eq. (70) does not imply that $\delta\phi^y$ (which is a signed value) actually depends on $y$ as a quadratic function, but we will see later that this is the case for small enough $y$ (see the numerical results below and the detailed discussion in Paper 2). Moreover, the coefficients of the linear and quadratic terms for $\delta\phi^y$ may have different signs, which may lead to zero error for non-zero $y$ (however, this $y$, if it exists, is unfortunately different for each measured quantity).

E. Shape errors

In this section we extend the error analysis presented in Section 2.D to shapes that cannot be described exactly by a set of cubical subvolumes. We perform the discretization the same way as in Section 2.B, but some of the $V_i$ are not cubical (for $i \in \partial V$, which denotes that dipole $i$ lies on the boundary of the volume $V$). We still set $\mathbf{r}_i$ to be in the center of the cube (circumscribing $V_i$) in order not to break the regularity of the lattice. The standard PP prescription uses equal volumes ( ) in Eqs. 
3dVi = (10), (14), (25), and (26), i.e. the discretization changes the shape of the particle a little bit. We will estimate the errors introduced by these boundary dipoles. These errors should then be added to those obtained in Section 2.D. We start by estimating dh . First we consider for dih Vi ∂∉ −′′′= PGrPrrGh )0(33 )(),(d , (71) which is just a reduction of Eq. (34). For Vi ∂∈ is the same plus the error in the self-term dih iiiiii i VVdr EΙrLrMPGrPrrGh χ −′′′= ∑ ∫ ),(),()(),(d )0(33 . (72) Let us define iij dr PGrPrrGh )0(33sh )(),(d −′′′= ∫ , (73) iiiiiiii VV EΙrLrMh χ ⎛ −∂−= ),(),(sh . (74) We estimate each of the terms in Eq. (73) separately (since there is actually no significant cancellation, and the error is of the same order of magnitude as the values themselves) using Eq. (30) for P(r) and )(rG and Eq. (31). This leads to h (75) To estimate we assume that the surface of the scatterer is a plane on the scale of the size of the dipole. A finite radius of curvature only changes the constants in the following expressions. We will prove that sh cii ≤h , (76) therefore we do not need to consider the third term in Eq. (74) (coming from the unity tensor) at all, since it is bounded by a constant. ( ) ( )∫∫ −′′′+′′−′′= iiii rrV )()(),(d)(),(),(d),( s3s3 rPrPrrGrPrrGrrGrM . (77) −′ ic rrThe function in the first integral is always bounded by . If the same is true for the second integral and hence ii V∈r dcV ii 66),( ≤rM . (78) If ii V∉r we introduce an auxilia r ′′ry point that is symmetric to ri over the particle surface and apply the identity ( ) ( ))()()()()()( ii rPrPrPrPrPrP −′′+′′−′=−′ (79) to the second integral in Eq. (77). Using Taylor expansion of P near and the fact that r ′′ irrrr −′≤′′−′ for iV∈′r one can show that ∫ ′′+≤ii cdcV ),( s36867r ir ),(d rrGM , (80) where the remaining integral can be proven to be equal to ),( iiV rL ∂− . The last prove left (see Eqs. (74) and (80)) is to demonstrate that ),( iiV rL ∂ is bounded by a constant. 
The only potential problem may come from the subsurface of iV∂ th the particle surface (because it may be close to r at is part of ce is i). This subsurfa med planar. We will calculate the integral in Eq. (6) over the infinite plane rρrr +=−′ i uch that 0 s =⋅rρ . Then ρρn ±=′ ( ) 223 2d),plane.inf( mm == ∫irL +∂V r r , (81) which is bounded. The rest of the integral (over the part of the cube surface) is bounded by a constant, which is a manifestation of a more general fact that (by its definition) ), ir ( iVL ∂ does not depend on the size but only on the shape of the volume. Finally we have 69),( cV ii ≤∂ rL , which together with Eqs. (74), (78), and (80) prove Eq. (76). Using Eqs. (75) and (76) we obtain )ln()( 70 cllnc +≤+≤ ∑ ∑∑∑ ∑ −hhh 7271 dccNd , (83) where we have changed the order of the summation in the double sum and split the summation over cubical shells for ⎝∂∈∂∈∂∈ maxKl ≤ and . Then we have grouped everything otal esti maxKl > into one sum over boundary dipoles. Eqs. (41) and (62) were used in the last inequality. Combining Eqs. (63) and (83) one can obtain the t mate of the dh for any scatterer: ( ) ( )[ ]ddccddccNd lnln 7273245431 −+−≤h . (84) Using Eq. (66) we immediately obtain the sam δ de estimate for E . The derivation of the errors in the measured quantities is slightly modified compared to (68) e changed to Section 2.D, by the presence of the shape errors. Eqs. (67) and ar ( ) ( )~~ dcdcdcdc ,0 +≤+≤− ∑∑ EE φφ , (85) ( ) )( ( ) ( )ddccddccdddd lnln 2,0 −+−≤−Eφ . dcd δ 7655541 53≤ EEφ (86) The second term in Eq. (85) comes from surface dipoles for which errors are the same order as the values themselves. Finally the generalization of Eq. (70) is ( ) ( )yyccyyccy lnlnδ 797826059 −+−≤φ . (87) The shape errors “reinforce” the surface errors (the linear term of discretization error), and although they both generally decrease with increasing size parameter x one may expect the linear term in Eq. 
(87) to be significant up to higher values of $y$ than in Eq. (70). All the derivations in this section can in principle be extended to interfaces inside the particle, i.e. when a surface, which cannot be described exactly as a surface of a set of cubes, separates two regions where $\chi(\mathbf{r})$ varies smoothly. The two parts of the cubical dipole on the interface should be considered separately, the same way as was done above. This will, however, not change the main conclusion of this section, Eq. (87), but only the constants.

F. Different DDA formulations

In this section we discuss how different DDA formulations modify the error estimates derived in Sections 2.D and 2.E. Most of the improvements of PP proposed in the literature are concerned with the self-term $\mathbf{M}(V_i,\mathbf{r}_i)$. They are the Radiative Reaction correction (RR),8 the Digitized Green’s Function (DGF),23 the formulation by Lakhtakia (LAK),26,27 the a1-term method,28,29 the Lattice Dispersion Relation (LDR),30 the formulation by Peltoniemi (PEL),31 and the Corrected LDR (CLDR).32 All of them provide an expression for $\mathbf{M}(V_i,\mathbf{r}_i)$ that is of order $d^2$ (except for RR, which is of order $d^3$). For instance, LDR is equivalent to

$$\mathbf{M}(V_i,\mathbf{r}_i) = \left[\left(b_1 + b_2m^2 + b_3m^2S\right)d^2 + (2/3)\mathrm{i}d^3\right]\mathbf{P}_i \tag{88}$$

(remember that we assumed $k = 1$), where $b_1$, $b_2$, $b_3$ are numerical constants and $S$ is a constant that depends only on the propagation and polarization vectors of the incident field. However, none of these formulations can exactly evaluate the integral in Eq. (39), because the variation of the electric field is not known beforehand (PEL solves this problem, but only for a spherical dipole). Therefore they (hopefully) decrease the constant in Eq. (40), thus decreasing the overall error in the measured quantities. However, these formulations are not expected to change the order of the error from $d^2$ to some higher order. 
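To illustrate why a self-term correction of order d³ cannot change the d² order of the overall error, the sketch below compares the Clausius-Mossotti polarizability with its radiative-reaction-corrected version. It assumes the standard textbook forms in Gaussian units, α_CM = (3d³/4π)(m² − 1)/(m² + 2) and α_RR = α_CM / (1 − (2/3)i k³ α_CM); these expressions are quoted from the general DDA literature rather than from this text, and the function names are hypothetical.

```python
import math

def alpha_cm(m, d):
    """Clausius-Mossotti polarizability of a cubical cell of size d.

    Standard textbook form in Gaussian units (an assumption of this sketch).
    """
    eps = m * m
    return 3.0 * d**3 / (4.0 * math.pi) * (eps - 1.0) / (eps + 2.0)

def alpha_rr(m, d, k=1.0):
    """CM polarizability with the radiative reaction correction (standard form)."""
    a = alpha_cm(m, d)
    return a / (1.0 - (2.0 / 3.0) * 1j * k**3 * a)

def relative_correction(m, d, k=1.0):
    """Magnitude of the relative change introduced by the RR correction."""
    return abs(alpha_rr(m, d, k) / alpha_cm(m, d) - 1.0)

m = 1.5  # refractive index used for all test cases in this paper
coarse = relative_correction(m, 0.2)
fine = relative_correction(m, 0.1)

# Halving d should reduce the relative correction by about 2^3.
ratio = coarse / fine
order = math.log(ratio, 2)
print(f"ratio = {ratio:.6f}, observed order = {order:.4f}")
```

Since the correction itself scales as d³, it is smaller than the d² terms that dominate the bound, consistent with the statement that such formulations may reduce constants but not the order of the error.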
We do not analyze the improvements by Rahmani, Chaumet, and Bryant (RCB)33,34 and Surface Corrected LDR (SCLDR),17 as they are limited to certain particle shapes. There exist two improvements of the interaction term in PP: Filtered Coupled Dipoles (FCD)12 and Integration of Green’s Tensor (IT).35 A rigorous analysis of FCD errors is beyond the scope of this paper, but it seems that FCD is not designed to reduce the linear term in Eq. (63) that comes from the incomplete (non-symmetric) shells. This is because FCD employs sampling theory to improve the accuracy of the overall discretization for regular cubical grids. FCD does not improve the accuracy of a single ijG calculation (approximation of an integral over one subvolume). IT, which numerically evaluates the integral in Eq. (12), has a more pronounced effect on the error estimate. Consider dipole j from l-th shell (incomplete) of dipole i, then .),(d),(maxd )(),(d)(),(d ≤′′′+′∂′′≤ −′′′=−′′′ dlcrrcrrc jijij rrGrrG PrPrrGPGrPrrG (89) Here we have used Eq. (36) and Taylor expansion of Green’s tensor up to the first order. Eq. (89) states that the second term in Eq. (58) is completely eliminated and so is the linear term in Eqs. (69) and (70) (surface errors). Therefore convergence of DDA with IT for cubically shaped scatterers is expected to be purely quadratic (neglecting the logarithm). However, for non cubically shaped scatterers the linear term reappears, due to the shape errors. Both IT and FCD also modify the self-term, however the effect is basically the same as for the other formulations. Several papers aimed to reduce shape errors.10,11,36 The first one – Generalized Semi- Analytical (GSA) method10 – modifies the whole DDA scheme, while the other two propose averaging of the susceptibility over the boundary dipoles. We will analyze here Weighted Discretization (WD) by Piller11 as probably the most advanced method to reduce shape errors available today. 
WD modifies the susceptibility and self-term of the boundary subvolume. We slightly modify the definition of the boundary subvolume used in Sections 2.B and 2.E to automatically take into account interfaces inside the scatterer. We define Vi to be always cubical, but with a possible interface inside. The particle surface, crossing the subvolume Vi, is assumed planar and divides the subvolume into two parts: the principal volume (containing the center) and the secondary volume with susceptibilities , and electric fields , respectively. The electric fields are considered constant inside each part and related to each other via the boundary condition tensor iV ii χχ ≡ ii EE ≡ iT : iii ETE = s . (90) In WD the susceptibility of the boundary subvolume is replaced by an effective one, defined ( ) 3ssppe dVV iiiiii TI χχχ += , (91) which gives the correct total polarization of the cubical dipole. The effective self-term is directly evaluated starting from Eq. (4), considering χ and E constant inside each part, ( ) ( ) iii rrV ETrrGrrGrrGrrGrM ′−′′+′−′′= ∫∫ ss3ps3 ),(),(d),(),(d),( χχ . (92) Piller evaluated the integrals in Eq. (92) numerically.11 To take a smooth variation of the electric field and susceptibility into account we define ( r is defined in Section )(s r ′′= χχ i ′′ 2.E) and iT is calculated at the surface between ri and . and r ′′ ii PP ≡ iiiiii ETEP ssss χχ == . Then c rrPrP −≤−′′ min)( 83 s , (93) where we have assumed that Eq. (30) for χ(r) and E(r) is also valid in . siV We start estimating errors of WD with (cf. Eq. shijh (73)) ( ) ( )∫∫ −′′′+−′′′= s)0(3p)0(3sh )(),(d)(),(d jijiij rr PGrPrrGPGrPrrGh , (94) Using Taylor expansions of near r)(rP ′ i and r ′′ in and correspondingly and Eq. piV iV (93) one may obtain that the main contribution comes from the derivative of Green’s tensor, leading to (cf. Eq. (75)) h (95) iih is the following (cf. Eq. 
(74)) ( ) ( ) ( ) ( ) ( ) .),(),( ),(d)(),(d)(),(d ),(),(),(d),(),(d ),(),( pss3s3p3 ess3ps3 iiiiiii iiiiiii PrLErL PPrrGPrPrrGPrPrrG ErLPrrGrrGPrrGrrG PrLrMh −′′+−′′′+−′′′= ∂−′−′′+′−′′− (96) The first two integrals can be easily shown to be dс85≤ (cf. Eq. (77)) and third one is transformed to L the same way as in Eq. (80), thus iiiiiiiiiiii VVVdc ErLPrLPrLh esspp sh ),(),(),( χ∂−∂+∂+≤ , (97) where the second term comes from the fact that averaged PL is not the same as L times averaged P. This error depends on the geometry of the interface inside Vi and generally is of order unity. For example, if the plane interface is described as ε+= izz , taking limit 0→ε gives the error ( )zii sp2 PP −π (using Eq. (81)). Therefore WD does not principally improve the error estimate of given by Eq. shiih (76), although it may significantly decrease the constant. On the other hand, since ),( p iiV rL ∂ and ),( iiV rL ∂ can be (analytically) evaluated for a cube intersected by a plane, WD can be further improved to reduce the error in to linear in d, which is a subject of future research. Proceeding analogous to the derivation of Eq. (83) one can obtain Ndcdccllnc 898887 )( ≤⎟⎟ ++≤ ∑ ∑ h . (98) It can be shown that for the scattering amplitude (Eq. (25)) the error estimate given by Eq. (85) can be improved, since WD correctly evaluates the zeroth order of value for the boundary dipoles, leading to ( ) ( ) 291490551,0~~ dcdcdc dd ≤+≤− ∑∑ EE φφ . (99) In his original paper11 Piller did not specify the expression that should be used for Cabs. Direct application of the susceptibility provided by WD into Eq. (26) does not reduce the order of error when compared with the exact Eq. (24) (except when ), since they are not linear functions of the electric field. However, if we consider separately and (which is equivalent to replacing 0s =iχ ( )( )∗⋅ iiiiV EEeIm χ by 2ss2pp )Im()Im( iiiiiii VV ETE χχ + ) the same estimate as in Eq. (99) can be derived for Cabs. Using Eqs. 
(98), (99), and the first part of Eq. (86) one can derive the final error estimate for WD:

$$\left|\delta\phi^y\right| \le \left(c_{92} - c_{93}\ln y\right)y^2 + c_{94}y, \tag{100}$$

where the constant before the linear term, as compared to Eq. (87), does not contain a logarithm and is expected to be significantly smaller, because several factors contributing to it are eliminated in WD. Although WD has potential for improvement, it does not seem feasible to completely eliminate the linear term in the shape error. The accuracy of evaluation of the interaction term over the boundary dipole (cf. Eq. (94)) can be improved by integrating Green’s tensor over $V_i^{\mathrm{p}}$ and $V_i^{\mathrm{s}}$ separately, but that would ruin the block-Toeplitz structure of the interaction matrix and hinder the FFT-based algorithm for the solution of linear equations.5 Since there is no comparable alternative to FFT nowadays, this method seems inapplicable. Minor modifications of the expression for $C_{\mathrm{abs}}$ are possible. Draine8 proposed a modification of Eq. (26) that was widely used afterwards and which was further modified by Chaumet et al.35 However, for many cases these expressions are equivalent and, even when they are not, the difference is of order $d^3$, which is neglected in our error analysis.

Fig. 2. S11(θ) for all 5 test cases in logarithmic scale (horizontal axis: scattering angle θ, deg, from 0 to 180). The result for the kD = 3 sphere is multiplied by 10 for convenience.

3. Numerical simulations

A. Discrete Dipole Approximation

The basics of the DDA method were summarized by Draine and Flatau.2 In this paper we use the LDR prescription for dipole polarizability,30 which is most widely used nowadays, e.g. 
in the publicly available code DDSCAT 6.1.4 We also employ dipole size correction8 for non-cubically shaped scatterers to ensure that the cubical approximation of the scatterer has the correct volume; this is believed to diminish shape errors, especially for small scatterers.2 We use a standard discretization scheme as described in Section 2.E, without any improvements for boundary dipoles. It is important to note that all the conclusions are valid for any DDA implementation, but with a few changes for specific improvements as discussed in Section 2.F. Our code – Amsterdam DDA (ADDA) – is capable of running on a cluster of computers (parallelizing a single DDA computation), which allows us to use a practically unlimited number of dipoles, since we are not limited by the memory of a single computer.37,38 We used a relative error of residual $< 10^{-8}$ as a stopping criterion for the iterative solution of the DDA linear system. Tests suggest that the relative error of the measured quantities due to the iterative solver is then $< 10^{-7}$ (data not shown) and hence can be neglected (total relative errors in our simulations are $> 10^{-6} \div 10^{-5}$ – see Section 0). More details about our code can be found in Paper 2. All DDA simulations were carried out on the Dutch national compute cluster LISA.39
Table 1. Exact values of Qext for the 5 test cases.
Particle                      Qext
kD = 8 cube                   4.490
discretized kD = 10 sphere    3.916
kD = 3 sphere                 0.753
kD = 10 sphere                3.928
kD = 30 sphere                1.985
Fig. 3. Relative errors of S11 at different angles θ and maximum over all θ versus y for (a) the kD = 8 cube, (b) the cubical discretization of kD = 10 sphere. A log-log scale is used. A linear fit of maximum over θ errors is shown. (m = 1.5). (Horizontal axis: y = kd·m; curves: maximum and θ = 0°, 45°, 90°, 135°, 180°; slope labels in the panels: 0.77 and 0.95.)
B. Results
We study five test cases: one cube with kD = 8, three spheres with kD = 3, 10, 30, and a particle obtained by a cubical discretization of the kD = 10 sphere using 16 dipoles per D (total 2176 dipoles, x equal to that of a sphere; see detailed description in Paper 2). By D we denote the diameter of a sphere or the edge size of a cube. All scatterers are homogeneous with m = 1.5. Although DDA errors significantly depend on m (see e.g. 14), we limit ourselves to one single value and study effects of size and shape of the scatterer. The maximum number of dipoles per D (nD) was 256. The values of nD that we used are of the form $\{4,5,6,7\} \cdot 2^p$ (p is an integer), except for the discretized sphere, where all nD are multiples of 16 (this is required to exactly describe the shape of the particle composed from a number of cubes). The minimum values for nD were 8 for the kD = 3 sphere, 16 for the cube, the kD = 10 sphere, and the discretized sphere, and 40 for the kD = 30 sphere. All the computations use a direction of incidence parallel to one of the principal axes of the cubical dipoles. The scattering plane is parallel to one of the faces of the cubical dipoles.
Fig. 4. Same as Fig. 3 but for (a) kD = 3, (b) kD = 10, and (c) kD = 30 spheres. (Horizontal axis: y = kd·m; curves: maximum and θ = 0°, 45°, 90°, 135°, 180°; slope labels in the panels: 2.29, 0.91 and 1.05.)
Fig. 5. Relative errors of Qext versus y for all 5 test cases. A log-log scale is used. A linear fit through 5 finest discretizations of kD = 3 sphere is shown. (Horizontal axis: y = kd·m; slope label: 0.89.)
In this paper we show results only for the extinction efficiency Qext (for incident light polarized parallel to one of the principal axes of the cubical dipoles) and phase function S11(θ) as the most commonly used in applications. However, the theory applies to any measured quantity.
For instance, we have also confirmed it for other Mueller matrix elements (data not shown). Exact results of S11(θ) for all 5 test cases are shown in Fig. 2. For spheres this is the result of Mie theory (the relative error of the code we used24 is $< 10^{-6}$) and for the cube and discretized sphere an extrapolation over the 5 finest discretizations (the extrapolation technique is presented in Paper 2, together with all details of obtaining these results, including their estimated errors). We use such ‘exact’ results because analytical theory is unavailable for these shapes and because errors of the best discretization are larger than those of the extrapolation. Their use as references for computing real errors (difference between the computed and the exact value) of single DDA calculations is justified because all these real errors are significantly larger than the errors of the references themselves (see Paper 2; in general, real errors obtained this way have an uncertainty of reference error). Exact values of Qext for all test cases are presented in Table 1. In the following we show the results of DDA convergence. Fig. 3 and Fig. 4 present relative errors (absolute values) of S11 at different angles θ and maximum error over all θ versus y in log-log scale. In many cases the maximum errors are reached at the exact backscattering direction, in which case these two sets of points overlap. Deep minima that happen at intermediate values of y for some values of θ (and also sometimes for Qext – Fig. 5) are due to the fact that the differences between simulated and reference values change sign near these values of y (see Paper 2 for detailed description of this behavior). The solid lines are linear fits to all or some points of maximum error. The slopes of these lines are given in the figures. Fig. 5 shows relative errors of Qext for all 5 studied cases in log-log scale. A linear fit through the 5 finest discretizations of the kD = 3 sphere is shown.
More results of these numerical simulations are presented in Paper 2.
4. Discussion
Convergence of DDA for cubically shaped particles (Fig. 3) shows the following trends. All curves have linear and quadratic parts (the non-monotonic behavior of errors for some θ is also a manifestation of the fact that the signed difference can be approximated by a sum of linear and quadratic terms that have different signs). The transition between these two regimes occurs at different y (which indicates the relative importance of linear and quadratic coefficients). While for maximum errors that are close to those of the backscattering direction the linear term is significant for larger y, it is much smaller and not significant in the whole range of y studied for side scattering (θ = 90°). Results of DDA convergence for spheres (Fig. 4) show a different behavior for different sizes. Errors for the small (kD = 3) sphere converge purely linearly (except for a small deviation of errors of S11(90°) for large values of y). Similar results are obtained for the kD = 10 sphere, but with significant oscillations superimposed upon the general trend. Convergence for the large (kD = 30) sphere is quadratic or even faster in the range of y studied, also with significant oscillations. Comparing Fig. 3 and Fig. 4 (especially Fig. 3(b) and Fig. 4(b) showing results for almost the same particles) one can deduce the following differences in DDA convergence for cubically and non-cubically shaped scatterers. The linear term for cubically shaped scatterers is significantly smaller, resulting in smaller total errors, especially for small y. All these conclusions, together with the size dependence of the significance of the linear term in the total errors, are in perfect agreement with the theoretical predictions made in Sections 2.D and 2.E. Errors for non-cubically shaped particles exhibit quasi-random oscillations that are not present for cubically shaped particles.
This can be explained by the sharp variations of shape errors with changing y (discussed in detail in Paper 2). Oscillations for the kD = 3 sphere (Fig. 4(a)) are very small (but still clearly present), which is due to the small size of the particle and hence featurelessness of its light scattering pattern – the surface structure is not that important and one may expect rather small shape errors. Results for Qext (Fig. 5) fully support the conclusions. Errors of Qext for the large sphere at small values of y are unexpectedly smaller than for smaller spheres. This feature requires further study before making any firm conclusions; however, there is definitely no similar tendency for S11(θ) (cf. Fig. 4). We have also studied a porous cube that was obtained by dividing a cube into 27 smaller cubes and then randomly removing 9 of them. All the conclusions are the same as those reported for the cube, but with slightly higher overall errors (data not shown). In this paper we have used a traditional DDA formulation2 for numerical simulations. However, as we showed in Section 2.F, several modern improvements of DDA (namely IT and WD) should significantly change its convergence behavior. IT should completely eliminate the linear term for cubically shaped scatterers, which should improve the accuracy especially for small y. WD should significantly decrease shape and hence total errors for non-cubically shaped particles; moreover, it should significantly decrease the amplitude of quasi-random error oscillations because it takes into account the location of the interface inside the boundary dipoles. Numerical testing of DDA convergence using IT and WD is a subject of a future study.
5. Conclusion
To the best of our knowledge, we conducted for the first time a rigorous theoretical convergence analysis of DDA.
In the range of DDA applicability ( 2
ABSTRACT We performed a rigorous theoretical convergence analysis of the discrete dipole approximation (DDA). We prove that errors in any measured quantity are bounded by a sum of a linear and quadratic term in the size of a dipole d, when the latter is in the range of DDA applicability.
Moreover, the linear term is significantly smaller for cubically than for non-cubically shaped scatterers. Therefore, for small d errors for cubically shaped particles are much smaller than for non-cubically shaped. The relative importance of the linear term decreases with increasing size, hence convergence of DDA for large enough scatterers is quadratic in the common range of d. Extensive numerical simulations were carried out for a wide range of d. Finally we discuss a number of new developments in DDA and their consequences for convergence. <|endoftext|><|startoftext|> Microsoft Word - NatureCorrespondJasmitedit.rtf Sent to Nature in 1990 Abstract This is a supplement to the paper arXiv:q-bio/0701050, containing the text of correspondence sent to Nature in 1990. Origin of adaptive mutants: a quantum measurement? Sir, - Several recent works described non-random induction of adaptive mutations by environmental stimuli 1-3. The most obvious explanation of this striking phenomenon would be that activation of gene expression leads to the enhancement of its mutation rate4. However, this does not work with the lacZ mutations described by Cairns and co-workers as the true inducer of the lac-operon is not lactose as such, but allolactose, a by-product of the β-galactosidase reaction5. So, in lacZ mutants the operon is not induced by lactose6. Besides, induction of respective genes would not explain the high fraction, among the revertants, of suppressor mutations in tRNA genes1,7 Other explanations suggest some special mechanisms for the "acceleration of adaptive evolution", like selection of "useful" protein coupled to specific reverse transcription1. However, any mechanism of this type also should have emerged in evolution. I propose that, to explain the adaptive mutation phenomenon, there is no need for any new ad hoc mechanism. The only thing that is necessary is to return to the old discussion of the role of quantum concepts in our understanding of life. 
This alone will allow the explanation of this manifestly Lamarckian phenomenon by Darwinian selection, occurring not in a population of organisms as usual, but in a "population" of virtual, in the direct quantum theory sense, states of each distinct cell. Thus, this hypothesis may be called "selection of virtual mutations". Detailed substantiation of this concept will be presented in a special publication; below I briefly show how this explanation might work. It has been shown by the Cairns group that the mutations ensuring cell growth begin to accumulate not immediately after plating, but only after conditions are created under which such mutations become "useful", as if the mutations are induced by these conditions1. I suggest that, to explain this phenomenon, we should change our ideas about what a cell is, and consider not actual but virtual mutations. An important distinction of virtual mutations is that they do not accumulate with time in stationary cell, whereas the number of actual mutants would grow linearly from the moment of plating, and this would yield drastically different results in experiments like those shown in Fig. 3 of Ref. 1. Virtual mutations produce "delocalization" of the cell among different states, similarly to the delocalization of electron in physical space. However, for a virtual mutation to become an actual one, certain conditions are necessary, namely the possibility to grow, leading the system away irreversibly from the initial state. Such conditions arise when, for example, lactose is added to a plate with lacZ bacteria. Briefly, this is the essence of the proposed explanation. What is a virtual mutation? The main cause of usual spontaneous mutations is the well- known base tautomerization8 (having the in vitro frequency of about 10-4). Thus could we reduce ‘virtual mutation’ to such tautomerization? 
I believe that this view is not consistent with experiments, as it implies that the same rare tautomeric form should work both in transcription and in replication. If these processes are considered independent, we logically arrive at the leaky mutant, which was refuted by Cairns and coworkers1. Thus we need to postulate a correlation between the recognition of the tautomeric forms in transcription and in replication, which leads us to define "virtual mutation" as a certain state of the cell as a whole. Analogous reasoning is applicable to the "adaptive transpositions" discussed by Cairns. In other words, we consider the whole cell as a quantum system, with non-negligible nonlocality inherent in such systems. Most of all it resembles the systems of "generalized rigidity"9, such as superfluid or superconducting states of matter, whose behavior is linked to quantum correlations; and I believe similar correlations take place in the cell too. I would like to emphasize that the proposed approach does not require a detailed description of molecular processes in the cell. Its main focus is the behavior of the cell as a whole. Similarly, to explain gyroscopic precession there is no need to consider interactions between elements inside the gyroscope; it’s enough to know some motion invariants, defined by space-time symmetries. Starting from this general view, one may express the above ideas using the operator formalism, considering the experiments conducted by Cairns as a measurement of the cells’ capability to propagate under given conditions. I suggest that the trait "ability to reproduce on lactose" (as an example) can be represented by an operator which one may designate "Lac". Importantly, this new operator will act on the state Ψ of the whole cell because the ability to reproduce is a property of the cell as a whole, and not of any part of it. Generally, "Lac" will decompose this Ψ into a superposition of some eigenfunctions.
The components of this superposition are those functions that do not change upon the action of this operator, but are only multiplied by a constant. It reflects the essence of operator formalism in quantum theory, which chooses states compatible with given experimental conditions. There are three such eigenfunctions (I intentionally simplify the situation): ψ1 corresponds to cell death, ψ2 to the stationary state, and ψ3 to the self-reproduction (that is the virtual mutation, in our case). Each function will enter the decomposition of Ψ with a coefficient ci related to the probability of this or that outcome, i.e.: Ψ = c1 ψ1 + c2 ψ2 + c3 ψ3, where Σ| ci |2 = 1 By plating the cells on lactose agar we, in fact, begin to measure their ability to grow under these particular conditions. The rate of accumulation of lac revertants, i.e. the probability to obtain a cell in the mutant state, will correspond to |c3|2, being a small, but finite quantity, appearing, for example, due to base tautomerization. Here, the role of cell growth is dual: on the one hand, it is a factor of irreversibility amplifying the "quantum fluctuation", and on the other hand, it is a selection criterion, as each kind of virtual mutants capable of growth under these conditions can lead to colony formation. Another situation, i.e. glucose/valine agar, will be represented by another operator (Valr), which will decompose the same Ψ function according to another basis, and Valr mutants will be obtained with certain rate. In fact, this is the essence of adaptive mutation phenomenon, where a particular condition induces emergence of respective mutants. Thus, the proposed change of our view on the cell suggests that, in accord with quantum concepts, we are not dealing with the probability for a cell to mutate by itself, independent of experimental conditions. Rather, we are dealing with the probability to observe the cell in the mutant state by plating it on lactose. 
We are certainly simplifying situation, as spontaneous mutations that accumulate during cell growth before plating, make our ensemble ‘mixed’. However, this complication does not change the essence of the explanation, according to which adaptive mutations emerge by measurement of ‘pure’ state. This resembles the passage of a polarized photon through a polarizer turned under some angle to the photon polarization. It will be incorrect to say that the polarization of the photon could turn by the necessary angle by chance, prior to interaction. It is the specific experimental situation that makes us to decompose the state vector according to the respective basis states, and to evaluate the fraction of the component that will pass through polarizer. On the other hand, one may speak about "adaptation" of photon polarization by selection of "fit" eigenstate, and consider this case as the model for our phenomenon. How are all these ideas applicable to the living bacterial cell? Discussion of the possible role of quantum concepts in biology has a rather long history, initiated by Niels Bohr (‘the complementarity principle’). Briefly, one might reduce the essence of this discussion to the principal impossibility to predict precisely the fate of an individual cell. For example, any attempt to determine, whether it is able to reproduce under certain conditions, will lead to irreversible change of the state of the cell, even to its death. This is reminiscent of the two-slit diffraction experiment, where an attempt to determine through which of the two slits the electron actually passes will lead to disappearance of the interference. The two trajectories of the electron can be made physically discernable only by the cost of changing the experimental situation. 
Similarly, the notorious phenomenon of the "wholeness" of the living organism can be formally expressed according to the Feynman rules of calculating probabilities: different indiscernible (in the given experimental conditions) variants should be included in the pure state (i.e. their amplitudes, and not probabilities, should be summed, leading to interference and other quantum effects). Thus, as long as a whole cell exists and is alive, we are obligated to treat its different indiscernible states in this way. Such consideration of operational limitations allows us to explain the adaptive mutation phenomenon (and hopefully other adaptations too) as the consequence of unavoidable quantum scatter in measurement of the cell's capability to propagate under given conditions. In spite of its apparent formal character, this hypothesis allows us to make some predictions of applied (in particular, medical) interest. It predicts that in processes involving somatic mutations (e.g. oncogenesis, or specific antibody generation) the mutations may be induced by conditions allowing the cell that happened to be in the state of virtual mutation to proliferate irreversibly. I believe, this possibility can be tested experimentally. References 1. Cairns,J., Overbaugh,J., Miller,S. Nature 335, 142-145 (1988) 2. Shapiro,J.A. Molec. Gen. Genet. 194, 79-90 (1984) 3. Hall,B.J. Genetics 120, 887-897 (1988) 4. Devis,B.D. Proc.Natl.Acad.Sci.USA 86, 5005-5009 (1989) 5. Lewin,B., Genes, p.236 ( J.Wiley & Sons,1985) 6. Burstein,C., Cohn,M., Kepes,A., Monod,J. Bioch.Bioph.Acta 95, 634 (1965) 7. Savic,D.J.& Kanazir,D.T. Molec. Gen. Genet. 137, 143-150 (1975) 8. Topal,M., Fresco,J. Nature 263, 285-289 (1976) 9. Anderson,P.W.,Stein,D.L. in Self-Organizing Systems, ed. by F.E.Yates, pp.451-452 (Plenum Press, 1987) Comments: This text was written in 1990. The author translated it to English with the kind help of Dr. 
Eugene Koonin (current affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA.) The English version of the text was sent to Nature in 1990 and rejected. At the same time it was also sent to the following correspondents : 1. JOHN CAIRNS Department of Cancer Biology, Harvard School of Public Health, Boston, Massachusetts 02115. 2. BARRY HALL Department of Molecular and Cell Biology, University of Connecticut, Storrs 06269. 3. BERNARD DAVIS Bacterial Physiology Unit, Harvard Medical School, Boston, MA 02115. 4. KOICHIRO MATSUNO Department of BioEngineering, Nagaoka University of Technology, Japan. 5. KONSTANTIN CHUMAKOV Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, Maryland 20852, USA. 6. MIKHAIL V. IVANOV Institute of Microbiology, Russian Academy of Sciences, pr. 60-letiya Oktyabrya 7, k. 2, Moscow, 117811 Russia. , as well as to all participants of the discussion ‘Origin of mutants disputed’ (Nature 336, 525 - 526 (08 December 1988)) : 1. D. CHARLESWORTH, B. CHARLESWORTH & J. J. BULL Department of Ecology and Evolution, University of Chicago, 915 East 57th Street, Chicago, Illinois 60637, USA Department of Zoology, University of Texas, Austin, Texas 78712, USA 2. ALAN GRAFEN Animal Behaviour Research Group, Zoology Department, Oxford University, Oxford OX1 3PS, 3. R. HOLLIDAY & R. F. ROSENBERGER CSIRO Laboratory for Molecular Biology, North Ryde, Sydney, Australia Genetics Division, National Institute for Medical Research, Mill Hill, London NW7 1AA, UK 4. LEIGH M. VAN VALEN Department of Ecology and Evolution, University of Chicago, 915 East 57Street, Chicago, Illinois 60637, USA 5. ANTOINE DANCHIN Institut Pasteur, 28 Rue Dr. Roux, 75724 Paris, Cedex 15, France 6. 
IRWIN TESSMAN Departments of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA <|endoftext|><|startoftext|> Introduction The discrete dipole approximation (DDA) is a well-known method to solve the light scattering problem for arbitrarily shaped particles. Since its introduction by Purcell and Pennypacker1 it has been improved constantly. The formulation of DDA summarized by Draine and Flatau2 more than 10 years ago is still most widely used for different applications,3 partly due to the publicly available high-quality and user-friendly code DDSCAT.4 DDA directly discretizes the volume of the scatterer and hence is applicable to arbitrarily shaped particles. However, the drawback of this discretization is the extreme computational complexity of DDA of $O(N^2)$, where N is the number of dipoles. This complexity is decreased to $O(N \log N)$ by advanced numerical techniques.2,5 Still the usual application strategy for DDA is “single computation”, where a discretization is chosen based on available computational resources and some empirical estimates of the expected errors.3,4 These error estimates are based on a limited number of benchmark calculations3 and hence are external to the light scattering problem under investigation. Such error estimates have evident drawbacks; however, no better alternative is available. Usually errors in DDA are studied as a function of the size parameter of the scatterer x (at a constant or a few different values of N), e.g. 2,6. Only a few papers directly present errors versus the discretization parameter (e.g. d – the size of a single dipole).7-15 The range of d typically studied in those papers is limited to a factor of 5 between minimum and maximum values (with the exception of two papers9,10 where it is a factor of 15).
Only two papers7,15 use extrapolation (to zero d) to get an exact result of some measured quantity; however, they use the simplest linear extrapolation without any theoretical foundation or discussion of its capabilities. It has long been acknowledged that DDA errors are due to two different factors: shape (it is not always possible to describe the particle shape exactly by a collection of cubical cells) and discretization (finite size of each cell).6 However, the question of which of them is more important in different cases is still open. A discussion of this issue spanned several papers16-20 that have not reached any definite conclusions yet. The uncertainty is due to the indirect methods used, which have inherent interpretation problems. In the accompanying paper,21 which from now on we will refer to as Paper 1, we performed a theoretical analysis of DDA convergence when refining the discretization. It provides the basis for this paper, where an extrapolation technique is introduced (Section 2) to improve the accuracy of DDA computations. We thoroughly discuss all free parameters that influence extrapolation performance and provide a step-by-step prescription, which can be used with any existing DDA code without any modifications. It is important to note that although Paper 1 provides a firm theoretical background, it is not necessary to go through all theoretical details to understand and apply the extrapolation technique that we introduce here. In Section 3 we present extensive numerical results of DDA computations for 5 different scatterers using many different discretizations. These results are discussed in Section 4 to evaluate the performance of the extrapolation technique. We also propose a new method to directly separate shape and discretization errors of DDA (described and illustrated in Section 3.B). The results and possible applications are discussed in Section 4. We formulate the conclusions of the paper in Section 5. 2.
Extrapolation In this section we describe a straightforward technique to significantly increase the accuracy of a DDA simulation with a relatively small increase of computation time. This technique does not require any modification of a DDA program but only postprocessing of computed data. Therefore it can be easily implemented in any existing DDA code. In Paper 1 we have proven that the error of any measured quantity is bounded by a quadratic function of the discretization parameter $y = |m|kd$ (k – free space wave vector, m – refractive index of the scatterer):
$|\delta\phi^y| \le (a_2^\phi - b_2^\phi \ln y)\, y^2 + (a_1^\phi - b_1^\phi \ln y)\, y$, (1)
where $\phi^y$ is some measured quantity (e.g. extinction efficiency Qext, Mueller matrix elements at some scattering angle Sij(θ), etc.) and $\delta\phi^y$ its error (difference between a result of the numerical simulation and an exact value). $a_1^\phi$, $a_2^\phi$, $b_1^\phi$, $b_2^\phi$ are constants (independent of y), which are described in detail in Paper 1. Here we proceed and assume that for sufficiently small y, $\delta\phi^y$ can in fact be approximated by a quadratic function of y (taking the logarithmic term as a constant). The applicability of this assumption will be tested empirically in Section 3.B. Introduction of higher-order terms is possible but not necessary (contrary to the quadratic term), and we avoid it in order to keep our technique as simple and robust as possible. We can now write:
$\phi^y = a_0 + a_1 y + a_2 y^2 + \zeta^y$, (2)
where $a_0$, $a_1$, $a_2$ are constants that are chosen such that $\zeta^y$ – the error of the approximation – is minimized. $a_0$ is then an estimate for the exact value of the measured quantity $\phi^0$. A procedure to determine $a_0$ is basically fitting of a quadratic function over several points $\{y, \phi^y\}$, which are obtained by a standard DDA simulation. In the ideal case of $\zeta^y = 0$ one can use any three values of y to obtain the exact value of $\phi^0$. However, in practice different fits will always give different results. We limit ourselves to the usual least-square polynomial fit of the data.
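As a worked illustration of the ideal case mentioned above (a sketch only, assuming $\zeta^y = 0$ exactly; the labels $y_1, y_2, y_3$ for three distinct discretizations are introduced here and are not part of the original text), the quadratic of Eq. (2) passing through three computed points can be evaluated at $y = 0$ by Lagrange interpolation:

```latex
\phi^0 = a_0 = \sum_{i=1}^{3} \phi^{y_i} \prod_{j \neq i} \frac{y_j}{y_j - y_i} .
% Example: for y_1 : y_2 : y_3 = 1 : 2 : 3 the weights are 3, -3, 1, i.e.
% \phi^0 = 3\,\phi^{y_1} - 3\,\phi^{y_2} + \phi^{y_3} .
```

The weights depend only on the ratios of the $y_i$ and always sum to one. Their alternating signs and magnitudes above unity show why errors $\zeta^y$ in the individual points are amplified by the extrapolation, which is one motivation for fitting more than three points.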
There are three questions one should answer before conducting such a fit:
1) how many and which values of y to use?
2) how to weight the influence of different calculated values used in the fitting, i.e. what is the behavior of expected errors $\zeta^y$? (Note that in the polynomial fit we minimize $\chi^2$, the summation of the squared difference between computed values and the fitting function weighted by the inverse of the expected error $\zeta^y$.)
3) how to estimate the difference between $a_0$ and $\phi^0$, i.e. the error of the final result?
It is important to note that, although there are some theoretical hints, answers to these questions are mainly empirical and should be tested. Our approach is based on the test cases presented in Section 3.B. These may not be representative for all scattering problems, but they do show the potential power of our approach. We do not attempt to choose the most suitable fit options, but merely demonstrate the applicability of the technique. We start by analyzing the second question, i.e. what is the expected deviation from the quadratic model, or equivalently what is the functional dependence of $\zeta^y$ on y, to be used as weighting function in the polynomial fitting procedure? For cubically shaped particles, defined in Paper 1 as particles whose shape can be exactly discretized using cubical subvolumes, one expects a smooth variation of the function $\phi^y$, and the error can be attributed to model error, i.e. coming mainly from neglecting higher order terms in the convergence analysis of Paper 1. In that case the error $\zeta^y$ is expected to be a cubical function of y. We have tried cubical, quadratic and linear error functions when fitting results for cubically shaped particles and found that, although the differences are small, cubical errors generally lead to the best fits (data not shown).
Shape errors, which are present for non-cubically shaped particles, are expected to be very sensitive to $y$, because they depend upon the position of the particle surface inside the boundary dipole, which changes considerably with a small variation of $y$ (for details see Paper 1). Therefore shape errors can be viewed as random noise superimposed upon a smooth variation of $\phi^y$. The asymptotic behavior of shape errors is linear in $y$ (see Paper 1). Indeed, in certain cases we found that using linear errors $\zeta^y$ results in significantly better fits than using cubical errors. However, in other cases linear errors performed significantly worse. In our experience, using a cubical error function is generally more reliable, even in the presence of shape errors, because it decreases the influence of points with high values of $y$, where the error is larger and less predictable. Since we want the procedure to be as robust as possible and not to use more complex error functions than strictly needed (e.g. polynomial), we take a cubical dependence of the error $\zeta^y$, both for cubically and non-cubically shaped scatterers. The choice of values of $y$ for computation can be described by the interval $[y_{min}, y_{max}]$, the number of points and their spacing. $y_{min}$ is usually determined by the available computer hardware (time or memory bounds), that is, it is the best discretization that can be computed for a given resource. The goal of the extrapolation procedure is to increase the accuracy beyond this "single DDA boundary". We will show in Section 3.B that the overall performance of this technique strongly depends upon $y_{min}$. The choice of $y_{max}$ is governed by two notions: a larger interval of data points generally leads to better extrapolation, but errors for high values of $y$ are more random and their significance is anyway much smaller (since we use a cubical error function).
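A minimal Python sketch of the weighted quadratic fit discussed above may help to fix ideas. It is not taken from any DDA code: the function names `solve_linear` and `extrapolate` are hypothetical, and the standard error of $a_0$ is computed under the common assumption that only the relative size of the errors ($\propto y^3$) is known, so the covariance is scaled by $\chi^2$ per degree of freedom. The sample data are synthetic.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def extrapolate(ys, phis, error_power=3):
    """Fit phi = a0 + a1*y + a2*y**2 with assumed errors proportional to y**error_power.

    Returns (a0, se_a0): the extrapolated value at y = 0 and its standard error.
    """
    y_min = min(ys)
    ts = [y / y_min for y in ys]  # rescaling improves conditioning; a0 is unchanged
    ws = [t ** (-2 * error_power) for t in ts]  # least-squares weight = 1 / sigma**2
    normal = [[sum(w * t ** (i + j) for w, t in zip(ws, ts)) for j in range(3)]
              for i in range(3)]
    rhs = [sum(w * p * t ** i for w, t, p in zip(ws, ts, phis)) for i in range(3)]
    coeffs = solve_linear(normal, rhs)
    chi2 = sum(w * (p - sum(c * t ** k for k, c in enumerate(coeffs))) ** 2
               for w, t, p in zip(ws, ts, phis))
    dof = len(ys) - 3
    if dof <= 0:
        return coeffs[0], float("nan")
    # The absolute error scale is unknown, so the covariance is scaled by chi2 / dof.
    inv00 = solve_linear(normal, [1.0, 0.0, 0.0])[0]
    return coeffs[0], math.sqrt(chi2 / dof * inv00)


# Synthetic data: exact value 2.0 plus linear, quadratic and (unmodelled) cubic terms.
ys = [0.1 * 8 / k for k in (8, 7, 6, 5, 4)]
phis = [2.0 + 0.3 * y - 0.5 * y ** 2 + 0.8 * y ** 3 for y in ys]
a0, se = extrapolate(ys, phis)
print(a0, se)
```

In this synthetic case the extrapolated $a_0$ is closer to the exact value than the result at the smallest $y$, because the cubic term is the only part the quadratic model cannot absorb.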
We have found that for cubically shaped scatterers a good choice is $y_{max} = 2y_{min}$, while for non-cubically shaped scatterers increasing the interval to $y_{max} = 4y_{min}$ does improve the fits. This is probably because the quality of the fit for non-cubically shaped scatterers is determined by quasi-random shape errors, so that increasing the range leads to larger statistical significance of the result. We also demand that $y_{max}$ is less than 1, since otherwise DDA is definitely far from its asymptotic behavior. The spacing of the sample points depends partly on the problem, especially for cubically shaped scatterers (in that case an arbitrary number of dipoles cannot be used). We space the computational points approximately uniformly on a logarithmic scale, acknowledging the fact that a relative difference in $y$ is more significant than an absolute one. The total number of points should be large enough for statistical significance. However, a large number of points increases computational time. We have used 5 points for cubically shaped particles (ratio of $y^{-1}$ values is 8:7:6:5:4) and 9 points for non-cubically shaped particles (ratio of $y^{-1}$ values is 16:14:12:10:8:7:6:5:4), or fewer if $y_{max} < 4y_{min}$. The estimation of the error of the final result is difficult since this error is due to model imperfection and not to some kind of random noise. The standard least-squares fitting technique [22] provides a standard error (SE) for the parameter $a_0$, which we use as a starting point. Numerical simulations (Section 3.B) show that for spheres (the only non-cubical shape we studied) real errors are less than 2×SE in most cases. That is what one would expect if $\zeta^y$ is considered completely random (which is similar to the expected behavior of the shape errors). For cubical shapes, on the contrary, we have to estimate the error as 10×SE to reliably describe the real errors. It is important to note that an error estimate based on the SE is the simplest one can use.
Its drawback is that we have to use a large multiplier (based on the real errors obtained in some of our simulations), which may lead to significant overestimation of real errors in certain cases. We can now formulate the step-by-step extrapolation technique. We use abbreviations (c) and (nc) for cubically and non-cubically shaped scatterers respectively.
1) Select $y_{min}$ based on your computational resources.
2) Take $y_{max}$ to be 2 (c) or 4 (nc) times $y_{min}$, but not larger than 1.
3) Choose 5 (c) or 9 (nc) points over the interval $[y_{min}, y_{max}]$, approximately uniformly spaced on a logarithmic scale.
4) Perform DDA computations for each $y$.
5) Fit the quadratic function (Eq. (2)) over the points $\{y, \phi^y\}$ using $y^3$ as errors of data points; $a_0$ is then the estimate of $\phi^0$. Multiply the SE of $a_0$ by 10 (c) or 2 (nc) to obtain an estimate of the extrapolation error.
Results of using this procedure are presented in Section 3, together with computational costs. The extrapolation procedure is similar to a Romberg integration method [22], which is adaptive. The error estimate, obtained by extrapolation, is an internal accuracy indicator of DDA computations that is just as important as the increase in the accuracy itself. Our error estimate opens the way to adaptive DDA, i.e. a code that will reach a required accuracy using minimum computational resources.

3. Numerical simulations

A. Discrete Dipole Approximation

The basics of the DDA method were summarized by Draine and Flatau [2]. In this paper we use the LDR prescription for dipole polarizability [23], which is most widely used nowadays, e.g. in the publicly available code DDSCAT 6.1 [4]. We also employ dipole size correction [6] for non-cubically shaped scatterers to ensure that the cubical approximation of the scatterer has the correct volume; this is believed to diminish shape errors, especially for small scatterers [2]. We use a standard discretization scheme without any improvements for boundary dipoles.
The main numerical challenge of DDA is to solve a large system of $3N$ linear equations. This is done iteratively using some Krylov-subspace method [22], while the matrix-vector products are computed using an FFT-based algorithm [5]. Our code, Amsterdam DDA (ADDA), is capable of running on a cluster of computers (parallelizing a single DDA computation), which allows us to use a practically unlimited number of dipoles, since we are not limited by the memory of a single computer [24, 25]. We used a relative error of the residual $< 10^{-8}$ as a stopping criterion. Tests suggest that the relative error of the measured quantities due to the iterative solver is then $< 10^{-7}$ (data not shown) and hence can be neglected (total relative errors in our simulations are $> 10^{-6} \div 10^{-5}$, see Section 3.B). All DDA simulations were carried out on the Dutch national compute cluster LISA [26]. The execution time of one iteration depends solely on $N$; it consists of an arithmetic part, which scales linearly with $N$, and an FFT part, which scales as $N \ln N$. The number of iterations depends only slightly on the discretization parameter $y$ for a fixed geometry of the scatterer. Rahola proved this theoretically for any Krylov-subspace method [27], and our own experience agrees with this conclusion. Therefore the total computational time scales linearly with $N$ ($\propto y^{-3}$) or slightly faster (considering the logarithm and imperfect optimization), which is consistent with our timing results (data not shown). We can now estimate the computational overhead of the extrapolation technique compared to a single DDA computation for $y_{min}$ (time $t(y_{min})$). Considering the spacing of points we used (described in Section 2), the execution time needed for the 5-point computation is $t_5 < 2.5\,t(y_{min})$ and for the 9-point computation $t_9 < 2.7\,t(y_{min})$. Memory requirements are the same as for a single computation.
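The overhead figures above follow directly from the $y^{-3}$ time scaling and the point spacing of Section 2. The short Python check below is only an illustration; `relative_cost` is a hypothetical helper name, and the pure cubic scaling ignores the logarithmic FFT factor.

```python
def relative_cost(inverse_y_ratios, time_exponent=3):
    """Total time of all runs in units of the finest run, assuming t ~ y**(-time_exponent)."""
    finest = max(inverse_y_ratios)
    return sum((r / finest) ** time_exponent for r in inverse_y_ratios)


t5 = relative_cost([8, 7, 6, 5, 4])                     # cubically shaped scatterers
t9 = relative_cost([16, 14, 12, 10, 8, 7, 6, 5, 4])     # non-cubically shaped scatterers
print(f"t5 = {t5:.2f} t(y_min), t9 = {t9:.2f} t(y_min)")
# prints: t5 = 2.46 t(y_min), t9 = 2.64 t(y_min)
```

Both values lie just below the quoted bounds of 2.5 and 2.7, because the coarser runs are much cheaper than the finest one.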
For comparison one should note that an 8 times increase in computational time and memory requirements (for a single DDA computation with $y = y_{min}/2$) gives only a 2 to 4 times increase in accuracy (depending on the error regime, linear or quadratic, in which $y_{min}$ is located).

Fig. 1. Cubical discretization of a sphere using 16 dipoles per diameter (total 2176 dipoles).

B. Results

We study five test cases: one cube with $kD = 8$, three spheres with $kD = 3, 10, 30$, and a particle obtained by a cubical discretization of the $kD = 10$ sphere using 16 dipoles per $D$ (total 2176 dipoles, see Fig. 1, x equal to that of a sphere). By $D$ we denote the diameter of a sphere or the edge size of a cube. All scatterers are homogeneous with $m = 1.5$. Although DDA errors significantly depend on $m$ (see e.g. [12]), we limit ourselves to one single value and study the effects of size and shape of the scatterer. The maximum number of dipoles per $D$ ($n_D$) was 256. The values of $n_D$ that we used are of the form $\{4, 5, 6, 7\} \cdot 2^p$ ($p$ is an integer), except for the discretized sphere, where all $n_D$ are multiples of 16 (this is required to exactly describe the shape of the particle composed of a number of cubes, see Fig. 1). The minimum values for $n_D$ were 8 for the $kD = 3$ sphere, 16 for the cube, the $kD = 10$ sphere, and the discretized sphere, and 40 for the $kD = 30$ sphere. Typical computation time for the finest discretization (for the cube with $y = 0.047$, resulting in $N = 1.7 \cdot 10^7$) currently is 2.5 hours on a cluster of 64 P4-3.4 GHz processors. We expect that it can be improved by an order of magnitude by using modern FFT routines (e.g. the fastest Fourier transform in the West, FFTW [28]) and faster iterative solvers (bi-conjugate gradient stabilized or quasi-minimal residual, which were shown to be clearly superior to the CGNR [29, 30] that we still use). We are currently improving our code along these lines. All computations use a direction of incidence parallel to one of the principal axes of the cubical dipoles.
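The grid of discretizations described above can be enumerated with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, $y$ is computed as $m \cdot kD / n_D$ without the dipole size correction used for spheres, and the dipole count is that of a cube.

```python
def dipoles_per_diameter(min_nd=8, max_nd=256):
    """Return sorted values n_D = b * 2**p with b in {4, 5, 6, 7} inside [min_nd, max_nd]."""
    values = set()
    p = 0
    while 4 * 2 ** p <= max_nd:
        for base in (4, 5, 6, 7):
            nd = base * 2 ** p
            if min_nd <= nd <= max_nd:
                values.add(nd)
        p += 1
    return sorted(values)


def discretization_parameter(kD, nd, m=1.5):
    """y = m * k * d with dipole size d = D / n_D (no dipole size correction)."""
    return m * kD / nd


nds = dipoles_per_diameter()
y_finest = discretization_parameter(kD=8, nd=nds[-1])   # kD = 8 cube on the finest grid
n_dipoles = nds[-1] ** 3                                 # dipoles filling the cube
print(nds)
print(f"y = {y_finest:.3f}, N = {n_dipoles:.1e}")
```

For the cube with $n_D = 256$ this reproduces the quoted $y = 0.047$ and $N \approx 1.7 \cdot 10^7$.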
The scattering plane is parallel to one of the faces of the cubical dipoles. In this paper we show results only for the extinction efficiency $Q_{ext}$ (for incident light polarized parallel to one of the principal axes of the cubical dipoles) and the phase function $S_{11}(\theta)$, as the most commonly used in applications. However, the extrapolation technique is equally applicable to any measured quantity. For instance, we have also applied it to other Mueller matrix elements (data not shown). Reference (exact) results of $S_{11}(\theta)$ and $Q_{ext}$ for spheres are obtained by Mie theory (the relative accuracy of the code we use [31] is at least $10^{-6}$). Unfortunately, no analytical theory is available for the cube and the discretized sphere, which could provide us with exact results. Instead, we use extrapolation over the 5 finest discretizations as reference results for these shapes. To justify this choice we discuss, as an example, simulation results of $Q_{ext}$ for the cube. Instead of showing values of $Q_{ext}$ itself, we show in Fig. 2(a) $Q_{ext}/a_0 - 1$, with $a_0$ obtained through fitting the 5 finest discretizations. The extrapolation through these 5 best points ($y_{min} = 0.047$, $y_{max} = 0.094$) is also shown. The deviation of the fit from the five best points (which overlap in Fig. 2(a)) is very small indeed. This is also characterized by a small estimate of the extrapolation error, $1.8 \times 10^{-6}$ (see Table 1). In Paper 1 we proved that DDA converges to the exact solution, therefore the result of the best extrapolation should be close to the exact result.
The relative difference between the best discretization and the best extrapolation is only $9.0 \times 10^{-5}$, therefore it does not make a big difference which one to use as a reference when evaluating, for instance, the error of the extrapolation through the 5 worst discretizations ($y_{min} = 0.38$, $y_{max} = 0.75$).

[Figure 2: panel (a) cube kD = 8 and discretized sphere kD = 10, each with a fit through the 5 best points; panel (b) spheres kD = 3, kD = 10 and kD = 30, each with a fit through the 9 best points; horizontal axis y = kd·m.]
Fig. 2. Signed relative errors of $Q_{ext}$ versus $y$ and their fits by quadratic functions for (a) kD = 8 cube and discretized kD = 10 sphere, (b) 3 spheres. 5 and 9 best points are used for fits in (a) and (b) respectively.

Table 1. Extrapolation errors of $Q_{ext}$. Estimate of the extrapolation errors is 10×SE for first two particles and 2×SE for spheres.

Particle                     ymin    ymax    Points   Error for ymin   Estimate     Real
kD = 8 cube                  0.047   0.094   5        9.0×10^-5        1.8×10^-6    ----
                             0.094   0.19    5        1.6×10^-4        6.6×10^-6    4.6×10^-6
                             0.19    0.38    5        2.2×10^-4        5.3×10^-5    4.0×10^-5
                             0.38    0.75    5        1.1×10^-4        3.7×10^-4    3.2×10^-4
Discretized kD = 10 sphere   0.058   0.12    5        1.0×10^-4        2.4×10^-5    ----
                             0.12    0.23    5        2.0×10^-4        9.0×10^-6    7.9×10^-6
                             0.23    0.93    4        4.3×10^-4        1.2×10^-3    5.9×10^-4
kD = 3 sphere                0.018   0.070   9        2.2×10^-4        1.0×10^-5    4.1×10^-6
                             0.035   0.14    9        4.0×10^-4        5.9×10^-5    4.8×10^-5
                             0.070   0.28    9        6.8×10^-4        8.7×10^-5    5.7×10^-6
                             0.14    0.54    9        9.0×10^-4        3.7×10^-4    7.0×10^-4
                             0.28    0.54    5        2.4×10^-4        4.3×10^-3    1.8×10^-3
kD = 10 sphere               0.059   0.23    9        2.7×10^-4        2.0×10^-4    2.7×10^-5
                             0.12    0.47    9        5.5×10^-4        5.5×10^-4    3.7×10^-4
                             0.23    0.93    9        1.5×10^-3        3.1×10^-3    2.1×10^-3
kD = 30 sphere               0.18    0.70    9        3.8×10^-4        1.3×10^-3    1.4×10^-3
                             0.18    0.35    5        3.8×10^-4        3.3×10^-3    6.9×10^-4

(ymin, ymax and Points describe the extrapolation; Estimate and Real are the estimated and real errors of the extrapolation.)

Table 2. Comparison of shape and discretization errors of $Q_{ext}$ for kD = 10 sphere discretized with y = 0.93. All errors are relative to the best extrapolation result for the discretized sphere.

         Shape       Discretization   Total
Error    3.1×10^-3   8.3×10^-3        5.2×10^-3
Hence all conclusions with respect to the reliability of the error estimates (as discussed in Section 4) do not depend on the choice of reference if $y_{min}$ is large enough. We also apply this reasoning to smaller $y_{min}$ and assume that the reference value obtained by extrapolation of the finest discretizations is a good enough estimate of the exact value. The same justification is valid for the discretized sphere (see Table 1 for $Q_{ext}$ results). Comparison of errors of different extrapolation results of $S_{11}(\theta)$ (shown in Fig. 3 and Fig. 4) is even more convincing. Reference results themselves (both of $Q_{ext}$ and $S_{11}(\theta)$) can be found in Paper 1. Next we show the results obtained by the extrapolation technique. The dependence of the signed relative errors of $Q_{ext}$ on $y$ for all 5 test cases is shown in Fig. 2. Fig. 2(a) depicts results for the cube and the discretized sphere. The 5 best points for each scatterer are fitted by a quadratic function, using the method described in Section 2. Fig. 2(b) depicts extrapolation results for spheres, using the 9 best points for each of them (cf. Section 2). Since the exact Mie solution is available, the intersection of a fit with the vertical axis is a measure of the accuracy of the extrapolation result. Table 1 summarizes the parameters ($y_{min}$, $y_{max}$, number of points) of all the extrapolations that were carried out, and their performance for $Q_{ext}$.

[Figure 3: three panels; horizontal axis: scattering angle θ, deg; curves in each panel: crudest and finest discretization used, extrapolation, estimate.]
Fig. 3. Errors of $S_{11}(\theta)$ in logarithmic scale for extrapolation using 5 values of $y$ in the intervals (a) [0.047,0.094], (b) [0.094,0.19], and (c) [0.38,0.75] for kD = 8 cube. Estimate of the extrapolation error is 10×SE.

Next we present some of the extrapolation results for $S_{11}(\theta)$. Results for the cube are shown in Fig. 3.
Each subfigure shows real (compared to the best extrapolation, used as reference) and estimated errors together with the errors of the finest and crudest discretizations used. Only the estimate of the error is shown for the best extrapolation, Fig. 3(a). Fig. 3(b) and (c) show extrapolation results using 5 points in the intervals [0.094,0.19] and [0.38,0.75] respectively. The performance of the extrapolation for the discretized sphere is shown in Fig. 4: (a) best extrapolation, (b) and (c) results for extrapolation using 5 and 4 points in the intervals [0.12,0.23] and [0.23,0.93] respectively. The broad spacing of points for the extrapolation depicted in Fig. 4(c) is, as was noted above, due to the complex shape of the discretized sphere, which limits possible values of $y$ to 0.93 divided by an integer (total time for computing these 4 points is $< 1.6\,t(y_{min})$). It is important to note once more that we use 10×SE as an estimate of the extrapolation error for the cube and discretized sphere and 2×SE for spheres (cf. Section 2).

[Figure 4: three panels; horizontal axis: scattering angle θ, deg; curves in each panel: crudest and finest discretization used, extrapolation, estimate.]
Fig. 4. Errors of $S_{11}(\theta)$ in logarithmic scale for extrapolation using 5 values of $y$ in the intervals (a) [0.058,0.12], (b) [0.12,0.23] ((c): 4 values of $y$ in the interval [0.23,0.93]) for the discretized kD = 10 sphere. Estimate of the extrapolation error is 10×SE.

Extrapolation results for the $kD = 3$ sphere are summarized in Fig. 5: (a) shows the best extrapolation (using 9 points in the interval [0.018,0.070]), and (b) shows the worst, but still satisfactory result, i.e. one that shows definite improvement of accuracy over most of the $\theta$ range. The extrapolation using 5 points from the interval [0.28,0.54] is no longer satisfactory (data not shown).

[Figure 5: two panels; horizontal axis: scattering angle θ, deg; curves in each panel: crudest and finest discretization used, extrapolation, estimate.]
Fig. 5. Errors of $S_{11}(\theta)$ in logarithmic scale for extrapolation using 9 values of $y$ in the intervals (a) [0.018,0.070], (b) [0.14,0.55] for kD = 3 sphere. Estimate of the extrapolation error is 2×SE.

Errors of the two best extrapolations for the $kD = 10$ sphere (using 9 points from the intervals [0.059,0.23] and [0.12,0.47]) are shown in Fig. 6(a) and (b) respectively. A third extrapolation for the $kD = 10$ sphere is not satisfactory (data not shown). Both extrapolations for the $kD = 30$ sphere show similar controversial results; only one of them (9 points from the interval [0.18,0.70]), which is overall slightly better, is shown in Fig. 7. The estimate of the extrapolation error is overall slightly higher than the real errors of the extrapolation (data not shown). Results of $S_{11}(\theta)$ for all extrapolations (see Table 1) support the following trend: the quality of the extrapolation (defined as the decrease of error compared to a single DDA computation for $y_{min}$) rapidly degrades with increasing $y_{min}$. The ratio of estimated to real errors increases with increasing $y_{min}$ (which can be considered a degradation of the estimate quality). Computation of exact results for both the $kD = 10$ sphere and its cubical discretization ($y = 0.93$) allows us for the first time to directly separate and compare shape and discretization errors of single DDA computations. The shape error is the difference between some measured quantity for a discretized sphere (calculated to a high accuracy) and that for the exact sphere. The discretization error is the difference between a calculation using a limited number of dipoles (2176) and the exact (very accurate) solution for the cubical discretization of the sphere (first curve in Fig. 4(c)). The total error is just the sum of the two.
These three types of errors for $S_{11}(\theta)$ are shown in Fig. 8, all relative to the exact value for the discretized sphere. Errors of $Q_{ext}$ are shown in Table 2.

[Figure 6: two panels; horizontal axis: scattering angle θ, deg; curves in each panel: crudest and finest discretization used, extrapolation, estimate.]
Fig. 6. Errors of $S_{11}(\theta)$ in logarithmic scale for extrapolation using 9 values of $y$ in the intervals (a) [0.059,0.23], (b) [0.12,0.47] for kD = 10 sphere. Estimate of the extrapolation error is 2×SE.

4. Discussion

In their review Draine and Flatau [2] gave the condition 1
ABSTRACT We propose an extrapolation technique that allows accuracy improvement of the discrete dipole approximation computations. The performance of this technique was studied empirically based on extensive simulations for 5 test cases using many different discretizations. The quality of the extrapolation improves with refining discretization reaching extraordinary performance especially for cubically shaped particles. A two order of magnitude decrease of error was demonstrated. We also propose estimates of the extrapolation error, which were proven to be reliable. Finally we propose a simple method to directly separate shape and discretization errors and illustrated this for one test case.
<|endoftext|><|startoftext|> Introduction A promising approach to handling the complexity of cell signaling pathways is to decompose pathways into small motifs, and analyze the individual motifs. One particular motif that has attracted much attention in recent years is the cycle formed by two or more inter-convertible forms of one protein. The protein, denoted here by S0, is ultimately converted into a product, denoted here by Sn, through a cascade of “activation” reactions triggered or facilitated by an enzyme E; conversely, Sn is transformed back (or “deactivated”) into the original S0, helped on by the action of a second enzyme F . See Figure 1. S S0 2S1 SSS nn−1n−2 Figure 1: A futile cycle of size n. Such structures, often called “futile cycles” (also called substrate cycles, enzymatic cycles, or enzymatic inter-conversions, see [1]), serve as basic blocks in cellular signaling pathways and have pivotal impact on the signaling dynamics. Futile cycles underlie signaling processes such as GTPase cycles [2], bacterial two-component systems and phosphorelays [3, 4] actin treadmilling [5]), and glucose mobilization [6], as well as metabolic control [7] and cell division and apoptosis [8] and cell-cycle checkpoint control [9]. One very important instance is that of Mitogen-Activated Protein Kinase (“MAPK”) cascades, which regulate http://arxiv.org/abs/0704.0036v2 primary cellular activities such as proliferation, differentiation, and apoptosis [10–13] in eukaryotes from yeast to humans. MAPK cascades usually consist of three tiers of similar structures with multiple feedbacks [14–16]. Each individual level of the MAPK cascades is a futile cycle as depicted in Figure 1 with n = 2. Markevich et al.’s paper [17] was the first to demonstrate the possibility of multistationarity at a single cascade level, and motivated the need for analytical studies of the number of steady states. Conradi et al. 
studied the existence of multistationarity in their paper [19], employing algorithms based on Feinberg's chemical reaction network theory (CRNT). (For more details on CRNT, see [31,32].) The CRNT algorithm confirms multistationarity in a single level of MAPK cascades, and provides a set of kinetic constants which can give rise to multistationarity. However, the CRNT algorithm only tests for the existence of multiple steady states, and does not provide information regarding the precise number of steady states. In [18], Gunawardena proposed a novel approach to the study of steady states of futile cycles. His approach, which was focused on the question of determining the proportion of maximally phosphorylated substrate, was developed under the simplifying quasi-steady state assumption that substrate is in excess. Nonetheless, our study of multistationarity uses in a key manner the basic formalism in [18], even for the case when substrate is not in excess. In Section 2, we state our basic assumptions regarding the model. The basic formalism and background for the approach is provided in Section 3. The main focus of this paper is on Section 4, where we derive various bounds on the number of steady states of futile cycles of size n. The first result is a lower bound for the number of steady states. Currently available results on lower bounds, as in [29], can only handle the case when quasi-steady state assumptions are valid; we substantially extend these results to the fully general case by means of a perturbation argument which allows one to get around these restricted assumptions. Another novel feature of our results in this paper is the derivation of an upper bound of 2n − 1, valid for all kinetic constants. Models in molecular cell biology are characterized by a high degree of uncertainty in parameters, hence such results valid over the entire parameter space are of special significance.
However, when more information about the parameters is available, sharper upper bounds can be obtained, see Theorems 4 and 5. We finally conclude our paper in Section 5 with a conjecture of an n + 1 upper bound. We remark that the results given here complement our work dealing with the dynamical behavior of futile cycles. For the case n = 2, [25] showed that the model exhibits generic convergence to steady states but no more complicated behavior, at least within restricted parameter ranges, while [27] showed a persistence property (no species tends to be eliminated) for any possible parameter values. These papers did not address the question of estimating the number of steady states. (An exception is the case n = 1, for which uniqueness of steady states can be proved in several ways, and for which global convergence to these unique equilibria holds [27].)

2 Model assumptions

Before presenting mathematical details, let us first discuss the basic biochemical assumptions that go into the model. In general, phosphorylation and dephosphorylation can follow either a distributive or a processive mechanism. In the processive mechanism, the kinase (phosphatase) facilitates two or more phosphorylations (dephosphorylations) before the final product is released, whereas in the distributive mechanism, the kinase (phosphatase) facilitates at most one phosphorylation (dephosphorylation) in each molecular encounter. In the case of n = 2, a futile cycle that follows the processive mechanism can be represented by reactions as follows:

$$S_0 + E \longleftrightarrow ES_0 \longleftrightarrow ES_1 \longrightarrow S_2 + E$$
$$S_2 + F \longleftrightarrow FS_2 \longleftrightarrow FS_1 \longrightarrow S_0 + F;$$

and the distributive mechanism can be represented by reactions:

$$S_0 + E \longleftrightarrow ES_0 \longrightarrow S_1 + E \longleftrightarrow ES_1 \longrightarrow S_2 + E$$
$$S_2 + F \longleftrightarrow FS_2 \longrightarrow S_1 + F \longleftrightarrow FS_1 \longrightarrow S_0 + F.$$

Biological experiments have demonstrated that both dual phosphorylation and dephosphorylation in MAPK are distributive, see [14–16]. In their paper [19], Conradi et al.
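showed mathematically that if either phosphorylation or dephosphorylation follows a processive mechanism, the steady state will be unique, which, it is argued in [19], contradicts experimental observations. So, to get more interesting results, we assume that both phosphorylations and dephosphorylations in the futile cycles follow the distributive mechanism. Our structure of futile cycles in Figure 1 also implicitly assumes a sequential instead of a random mechanism. By a sequential mechanism, we mean that the kinase phosphorylates the substrates in a specific order, and the phosphatase works in the reversed order. This assumption dramatically reduces the number of different phospho-forms and simplifies our analysis. In a special case when the kinetic constants of each phosphorylation are the same and the kinetic constants of each dephosphorylation are the same, the random mechanism can be easily included in the sequential case. Biologically, there are systems, for instance the auto-phosphorylation of FGF-receptor-1, that have been experimentally shown to follow a sequential mechanism [33]. To model the reactions, we assume mass action kinetics, which is standard in mathematical modeling of molecular events in biology.

3 Mathematical formalism

In this section, we set up a mathematical framework for studying the steady states of futile cycles. Let us first write down all the elementary chemical reactions in Figure 1:

$$\begin{aligned}
S_0 + E &\underset{k_{off_0}}{\overset{k_{on_0}}{\rightleftharpoons}} ES_0 \overset{k_{cat_0}}{\longrightarrow} S_1 + E \\
&\;\;\vdots \\
S_{n-1} + E &\underset{k_{off_{n-1}}}{\overset{k_{on_{n-1}}}{\rightleftharpoons}} ES_{n-1} \overset{k_{cat_{n-1}}}{\longrightarrow} S_n + E \\
S_1 + F &\underset{l_{off_0}}{\overset{l_{on_0}}{\rightleftharpoons}} FS_1 \overset{l_{cat_0}}{\longrightarrow} S_0 + F \\
&\;\;\vdots \\
S_n + F &\underset{l_{off_{n-1}}}{\overset{l_{on_{n-1}}}{\rightleftharpoons}} FS_n \overset{l_{cat_{n-1}}}{\longrightarrow} S_{n-1} + F
\end{aligned}$$

where $k_{on_0}$, etc., are kinetic parameters for binding and unbinding, $ES_0$ denotes the complex consisting of the enzyme $E$ and the substrate $S_0$, and so forth. These reactions can be modeled by $3n + 3$ differential-algebraic equations according to mass action kinetics:

$$\begin{aligned}
\frac{ds_0}{dt} &= -k_{on_0} s_0 e + k_{off_0} c_0 + l_{cat_0} d_1, \\
\frac{ds_i}{dt} &= -k_{on_i} s_i e + k_{off_i} c_i + k_{cat_{i-1}} c_{i-1} - l_{on_{i-1}} s_i f + l_{off_{i-1}} d_i + l_{cat_i} d_{i+1}, \quad i = 1, \ldots, n-1, \\
\frac{dc_j}{dt} &= k_{on_j} s_j e - (k_{off_j} + k_{cat_j}) c_j, \quad j = 0, \ldots, n-1, \\
\frac{dd_k}{dt} &= l_{on_{k-1}} s_k f - (l_{off_{k-1}} + l_{cat_{k-1}}) d_k, \quad k = 1, \ldots, n,
\end{aligned} \qquad (1)$$

together with the algebraic "conservation equations":

$$\begin{aligned}
E_{tot} &= e + \sum_{i=0}^{n-1} c_i, \\
F_{tot} &= f + \sum_{i=1}^{n} d_i, \\
S_{tot} &= \sum_{i=0}^{n} s_i + \sum_{i=0}^{n-1} c_i + \sum_{i=1}^{n} d_i.
\end{aligned} \qquad (2)$$

The variables $s_0, \ldots, s_n, c_0, \ldots, c_{n-1}, d_1, \ldots, d_n, e, f$ stand for the concentrations of $S_0, \ldots, S_n, ES_0, \ldots, ES_{n-1}, FS_1, \ldots, FS_n, E, F$ respectively. For each positive vector

$$\kappa = (k_{on_0}, \ldots, k_{on_{n-1}}, k_{off_0}, \ldots, k_{off_{n-1}}, k_{cat_0}, \ldots, k_{cat_{n-1}}, l_{on_0}, \ldots, l_{on_{n-1}}, l_{off_0}, \ldots, l_{off_{n-1}}, l_{cat_0}, \ldots, l_{cat_{n-1}}) \in \mathbb{R}^{6n}_+$$

(of "kinetic constants") and each positive triple $C = (E_{tot}, F_{tot}, S_{tot})$, we have a different system $\Sigma(\kappa, C)$. Let us write the coordinates of a vector $x \in \mathbb{R}^{3n+3}_+$ as $x = (s_0, \ldots, s_n, c_0, \ldots, c_{n-1}, d_1, \ldots, d_n, e, f)$, and define a mapping

$$\Phi : \mathbb{R}^{3n+3}_+ \times \mathbb{R}^{6n}_+ \times \mathbb{R}^{3}_+ \longrightarrow \mathbb{R}^{3n+3}$$

with components $\Phi_1, \ldots, \Phi_{3n+3}$, where the first $3n$ components are $\Phi_1(x, \kappa, C) = -k_{on_0} s_0 e + k_{off_0} c_0 + l_{cat_0} d_1$, and so forth, listing the right hand sides of the equations (1), $\Phi_{3n+1}$ is $e + \sum_{i=0}^{n-1} c_i - E_{tot}$, and similarly for $\Phi_{3n+2}$ and $\Phi_{3n+3}$, we use the remaining equations in (2). For each $\kappa, C$, let us define a set

$$Z(\kappa, C) = \{x \mid \Phi(x, \kappa, C) = 0\}.$$

Observe that, by definition, given $x \in \mathbb{R}^{3n+3}_+$, $x$ is a positive steady state of $\Sigma(\kappa, C)$ if and only if $x \in Z(\kappa, C)$. So, the mathematical statement of the central problem in this paper is to count the number of elements in $Z(\kappa, C)$. Our analysis will be greatly simplified by a preprocessing step. Let us introduce a function

$$\Psi : \mathbb{R}^{3n+3}_+ \times \mathbb{R}^{6n}_+ \times \mathbb{R}^{3}_+ \longrightarrow \mathbb{R}^{3n+3}$$

with components $\Psi_1, \ldots, \Psi_{3n+3}$ defined as

$$\begin{aligned}
\Psi_1 &= \Phi_1 + \Phi_{n+1}, \\
\Psi_i &= \Phi_i + \Phi_{n+i} + \Phi_{2n+i-1} + \Psi_{i-1}, \quad i = 2, \ldots, n, \\
\Psi_j &= \Phi_j, \quad j = n+1, \ldots, 3n+3.
\end{aligned}$$

It is easy to see that $Z(\kappa, C) = \{x \mid \Psi(x, \kappa, C) = 0\}$, but now the first $3n$ equations are:

$$\begin{aligned}
\Psi_i &= l_{cat_{i-1}} d_i - k_{cat_{i-1}} c_{i-1} = 0, \quad i = 1, \ldots, n, \\
\Psi_{n+1+j} &= k_{on_j} s_j e - (k_{off_j} + k_{cat_j}) c_j = 0, \quad j = 0, \ldots, n-1, \\
\Psi_{2n+k} &= l_{on_{k-1}} s_k f - (l_{off_{k-1}} + l_{cat_{k-1}}) d_k = 0, \quad k = 1, \ldots, n,
\end{aligned}$$

and can be easily solved as:

$$s_{i+1} = \lambda_i (e/f) s_i, \qquad (3)$$
$$c_i = \frac{e s_i}{K_{M_i}}, \qquad (4)$$
$$d_{i+1} = \frac{f s_{i+1}}{L_{M_i}}, \qquad (5)$$

where

$$\lambda_i = \frac{k_{cat_i} L_{M_i}}{K_{M_i} l_{cat_i}}, \quad K_{M_i} = \frac{k_{cat_i} + k_{off_i}}{k_{on_i}}, \quad L_{M_i} = \frac{l_{cat_i} + l_{off_i}}{l_{on_i}}, \quad i = 0, \ldots, n-1. \qquad (6)$$

We may now express $\sum_{i=0}^{n} s_i$, $\sum_{i=0}^{n-1} c_i$ and $\sum_{i=1}^{n} d_i$ in terms of $s_0$, $\kappa$, $e$ and $f$:

$$\begin{aligned}
\sum_{i=0}^{n} s_i &= s_0 \left(1 + \lambda_0 \frac{e}{f} + \lambda_0 \lambda_1 \left(\frac{e}{f}\right)^2 + \cdots + \lambda_0 \cdots \lambda_{n-1} \left(\frac{e}{f}\right)^n\right) := s_0 \varphi_0^\kappa\!\left(\frac{e}{f}\right), \\
\sum_{i=0}^{n-1} c_i &= e s_0 \left(\frac{1}{K_{M_0}} + \frac{\lambda_0}{K_{M_1}} \frac{e}{f} + \cdots + \frac{\lambda_0 \cdots \lambda_{n-2}}{K_{M_{n-1}}} \left(\frac{e}{f}\right)^{n-1}\right) := e s_0 \varphi_1^\kappa\!\left(\frac{e}{f}\right), \\
\sum_{i=1}^{n} d_i &= f s_0 \left(\frac{\lambda_0}{L_{M_0}} \frac{e}{f} + \cdots + \frac{\lambda_0 \cdots \lambda_{n-1}}{L_{M_{n-1}}} \left(\frac{e}{f}\right)^n\right) := f s_0 \varphi_2^\kappa\!\left(\frac{e}{f}\right).
\end{aligned} \qquad (7)$$

Although the equation $\Psi = 0$ represents $3n+3$ equations with $3n+3$ unknowns, next we will show that it can be reduced to two equations with two unknowns, which have the same number of positive solutions as $\Psi = 0$. Let us first define a set

$$S(\kappa, C) = \{(u, v) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid G_1^{\kappa,C}(u, v) = 0,\ G_2^{\kappa,C}(u, v) = 0\},$$

where $G_1^{\kappa,C}, G_2^{\kappa,C} : \mathbb{R}^2_+ \longrightarrow \mathbb{R}$ are given by

$$\begin{aligned}
G_1^{\kappa,C}(u, v) &= v \left(u \varphi_1^\kappa(u) - \varphi_2^\kappa(u) E_{tot}/F_{tot}\right) - E_{tot}/F_{tot} + u, \\
G_2^{\kappa,C}(u, v) &= \varphi_0^\kappa(u) \varphi_2^\kappa(u) v^2 + \left(\varphi_0^\kappa(u) - S_{tot} \varphi_2^\kappa(u) + F_{tot} u \varphi_1^\kappa(u) + F_{tot} \varphi_2^\kappa(u)\right) v - S_{tot}.
\end{aligned}$$

The precise statement is as follows:

Lemma 1 There exists a mapping $\delta : \mathbb{R}^{3n+3} \longrightarrow \mathbb{R}^2$ such that, for each $\kappa, C$, the map $\delta$ restricted to $Z(\kappa, C)$ is a bijection between the sets $Z(\kappa, C)$ and $S(\kappa, C)$.

Proof. Let us define the mapping $\delta : \mathbb{R}^{3n+3} \longrightarrow \mathbb{R}^2$ as $\delta(x) = (e/f, s_0)$, where $x = (s_0, \ldots, s_n, c_0, \ldots, c_{n-1}, d_1, \ldots, d_n, e, f)$. If we can show that $\delta$ induces a bijection between $Z(\kappa, C)$ and $S(\kappa, C)$, we are done. First, we claim that $\delta(Z(\kappa, C)) \subseteq S(\kappa, C)$. Pick any $x \in Z(\kappa, C)$; then $x$ satisfies (3)-(5). Moreover, $\Phi_{3n+1}(x, \kappa, C) = 0$ yields $E_{tot} = e + e s_0 \varphi_1^\kappa(e/f)$, and thus

$$e = \frac{E_{tot}}{1 + s_0 \varphi_1^\kappa(e/f)}. \qquad (8)$$

Using $\Phi_{3n+1}(x, \kappa, C) = 0$ and $\Phi_{3n+2}(x, \kappa, C) = 0$, we get:

$$\frac{E_{tot}}{F_{tot}} = \frac{e\,(1 + s_0 \varphi_1^\kappa(e/f))}{f\,(1 + s_0 \varphi_2^\kappa(e/f))}, \qquad (9)$$

which is $G_1^{\kappa,C}(e/f, s_0) = 0$ after multiplying by $1 + s_0 \varphi_2^\kappa(e/f)$ and rearranging terms. To check that $G_2^{\kappa,C}(e/f, s_0) = 0$, we start with $\Phi_{3n+3}(x, \kappa, C) = 0$, i.e.

$$S_{tot} = \sum_{i=0}^{n} s_i + \sum_{i=0}^{n-1} c_i + \sum_{i=1}^{n} d_i.$$

Using (7) and (8), this expression becomes

$$\begin{aligned}
S_{tot} &= s_0 \varphi_0^\kappa(e/f) + \frac{E_{tot} s_0 \varphi_1^\kappa(e/f)}{1 + s_0 \varphi_1^\kappa(e/f)} + \frac{F_{tot} s_0 \varphi_2^\kappa(e/f)}{1 + s_0 \varphi_2^\kappa(e/f)} \\
&= s_0 \varphi_0^\kappa(e/f) + \frac{e F_{tot} s_0 \varphi_1^\kappa(e/f)}{f\,(1 + s_0 \varphi_2^\kappa(e/f))} + \frac{F_{tot} s_0 \varphi_2^\kappa(e/f)}{1 + s_0 \varphi_2^\kappa(e/f)},
\end{aligned}$$

where the last equality comes from (9).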
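After multiplying by $1 + s_0\varphi_2^\kappa(e/f)$, and simplifying, we get

$$\varphi_0^\kappa(e/f)\,\varphi_2^\kappa(e/f)\,s_0^2 + \left(\varphi_0^\kappa(e/f) - S_{tot}\varphi_2^\kappa(e/f) + F_{tot}\frac{e}{f}\varphi_1^\kappa(e/f) + F_{tot}\varphi_2^\kappa(e/f)\right)s_0 - S_{tot} = 0,$$

that is, $G_2^{\kappa,C}(e/f, s_0) = 0$. Since both $G_1^{\kappa,C}(e/f, s_0)$ and $G_2^{\kappa,C}(e/f, s_0)$ are zero, $\delta(x) \in S(\kappa, C)$.

Next, we will show that $S(\kappa, C) \subseteq \delta(Z(\kappa, C))$. For any $y = (u, v) \in S(\kappa, C)$, let the coordinates of $x$ be defined as:

$$s_0 = v, \qquad s_{i+1} = \lambda_i u s_i, \qquad e = \frac{E_{tot}}{1 + s_0\varphi_1^\kappa(u)}, \qquad f = \frac{e}{u}, \qquad c_i = \frac{e\,s_i}{K_{M_i}}, \qquad d_{i+1} = \frac{f\,s_{i+1}}{L_{M_i}}$$

for $i = 0, \dots, n-1$. It is easy to see that the vector $x = (s_0, \dots, s_n, c_0, \dots, c_{n-1}, d_1, \dots, d_n, e, f)$ satisfies $\Phi_1(x, \kappa, C) = 0, \dots, \Phi_{3n+1}(x, \kappa, C) = 0$. If $\Phi_{3n+2}(x, \kappa, C)$ and $\Phi_{3n+3}(x, \kappa, C)$ are also zero, then $x$ is an element of $Z(\kappa, C)$ with $\delta(x) = y$. Given the condition that $G_i^{\kappa,C}(u, v) = 0$ ($i = 1, 2$) and $u = e/f$, $v = s_0$, we have $G_1^{\kappa,C}(e/f, s_0) = 0$, and therefore (9) holds. Since

$$e = \frac{E_{tot}}{1 + s_0\varphi_1^\kappa(e/f)}$$

in our construction, we have

$$F_{tot} = f\,(1 + s_0\varphi_2^\kappa(e/f)) = f + \sum_{i=1}^{n} d_i.$$

To check $\Phi_{3n+3}(x, \kappa, C) = 0$, we use

$$\frac{G_2^{\kappa,C}(e/f, s_0)}{1 + s_0\varphi_2^\kappa(e/f)} = 0,$$

as $G_2^{\kappa,C}(e/f, s_0) = 0$ and $1 + s_0\varphi_2^\kappa(e/f) > 0$. Applying (7)-(9), we have

$$\sum_{i=0}^{n} s_i + \sum_{i=0}^{n-1} c_i + \sum_{i=1}^{n} d_i = s_0\varphi_0^\kappa(e/f) + \frac{e\,F_{tot}\,s_0\varphi_1^\kappa(e/f)}{f\,(1 + s_0\varphi_2^\kappa(e/f))} + \frac{F_{tot}\,s_0\varphi_2^\kappa(e/f)}{1 + s_0\varphi_2^\kappa(e/f)} = S_{tot}.$$

It remains for us to show that the map $\delta$ is one to one on $Z(\kappa, C)$. Suppose that $\delta(x^1) = \delta(x^2) = (u, v)$, where $x^i = (s_0^i, \dots, s_n^i, c_0^i, \dots, c_{n-1}^i, d_1^i, \dots, d_n^i, e^i, f^i)$, $i = 1, 2$. By the definition of $\delta$, we know that $s_0^1 = s_0^2$ and $e^1/f^1 = e^2/f^2$. Therefore, $s_i^1 = s_i^2$ for $i = 0, \dots, n$. Equation (8) gives

$$e^1 = \frac{E_{tot}}{1 + v\varphi_1^\kappa(u)} = e^2.$$

Thus, $f^1 = f^2$, and $c_i^1 = c_i^2$, $d_{i+1}^1 = d_{i+1}^2$ for $i = 0, \dots, n-1$ because of (3)-(5). Therefore, $x^1 = x^2$, and $\delta$ is one to one.

The above lemma ensures that the two sets $Z(\kappa, C)$ and $S(\kappa, C)$ have the same number of elements. From now on, we will focus on $S(\kappa, C)$, the set of positive solutions of the equations $G_1^{\kappa,C}(u, v) = 0$, $G_2^{\kappa,C}(u, v) = 0$, i.e.

$$G_1^{\kappa,C}(u, v) = v\left(u\varphi_1^\kappa(u) - \varphi_2^\kappa(u)\,E_{tot}/F_{tot}\right) - E_{tot}/F_{tot} + u = 0, \qquad (10)$$
$$G_2^{\kappa,C}(u, v) = \varphi_0^\kappa(u)\varphi_2^\kappa(u)\,v^2 + \left(\varphi_0^\kappa(u) - S_{tot}\varphi_2^\kappa(u) + F_{tot}u\varphi_1^\kappa(u) + F_{tot}\varphi_2^\kappa(u)\right)v - S_{tot} = 0. \qquad (11)$$

4 Number of positive steady states

4.1 Lower bound on the number of positive steady states

One approach to solving (10)-(11) is to view $G_2^{\kappa,C}(u, v)$ as a quadratic polynomial in $v$. Since $G_2^{\kappa,C}(u, 0) = -S_{tot} < 0$, equation (11) has a unique positive root, namely

$$v = \frac{-H^{\kappa,C}(u) + \sqrt{H^{\kappa,C}(u)^2 + 4S_{tot}\varphi_0^\kappa(u)\varphi_2^\kappa(u)}}{2\varphi_0^\kappa(u)\varphi_2^\kappa(u)}, \qquad (12)$$

where

$$H^{\kappa,C}(u) = \varphi_0^\kappa(u) - S_{tot}\varphi_2^\kappa(u) + F_{tot}u\varphi_1^\kappa(u) + F_{tot}\varphi_2^\kappa(u). \qquad (13)$$

Substituting this expression for $v$ into (10), and multiplying by $\varphi_0^\kappa(u)$, we get

$$F^{\kappa,C}(u) := \frac{-H^{\kappa,C}(u) + \sqrt{H^{\kappa,C}(u)^2 + 4S_{tot}\varphi_0^\kappa(u)\varphi_2^\kappa(u)}}{2\varphi_2^\kappa(u)}\left(u\varphi_1^\kappa(u) - \frac{E_{tot}}{F_{tot}}\varphi_2^\kappa(u)\right) - \frac{E_{tot}}{F_{tot}}\varphi_0^\kappa(u) + u\varphi_0^\kappa(u) = 0. \qquad (14)$$

So, any $(u, v) \in S(\kappa, C)$ should satisfy (12) and (14). On the other hand, any positive solution $u$ of (14) (notice that $\varphi_0^\kappa(u) > 0$) and $v$ given by (12) (always positive) provide a positive solution of (10)-(11), that is, $(u, v)$ is an element in $S(\kappa, C)$. Therefore, the number of positive solutions of (10)-(11) is the same as the number of positive solutions of (12) and (14). But $v$ is uniquely determined by $u$ in (12), which further simplifies the problem to one equation (14) with one unknown $u$. Based on this observation, we have:

Theorem 1 For each pair of positive numbers $S_{tot}, \gamma$, there exist $\varepsilon_0 > 0$ and $\kappa \in \mathbb{R}^{6n}_+$ such that the following property holds. Pick any $E_{tot}, F_{tot}$ such that

$$F_{tot} = E_{tot}/\gamma < \varepsilon_0 S_{tot}/\gamma, \qquad (15)$$

then the system $\Sigma(\kappa, C)$ with $C = (E_{tot}, F_{tot}, S_{tot})$ has at least $n + 1$ ($n$) positive steady states when $n$ is even (odd).

Proof. For each $\kappa, \gamma, S_{tot}$, let us define two functions $\mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$ as follows:

$$\tilde H^{\kappa,\gamma,S_{tot}}(\varepsilon, u) = H^{\kappa,(\varepsilon S_{tot},\,\varepsilon S_{tot}/\gamma,\,S_{tot})}(u) = \varphi_0^\kappa(u) - S_{tot}\varphi_2^\kappa(u) + \varepsilon\frac{S_{tot}}{\gamma}u\varphi_1^\kappa(u) + \varepsilon\frac{S_{tot}}{\gamma}\varphi_2^\kappa(u), \qquad (16)$$

$$\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u) = F^{\kappa,(\varepsilon S_{tot},\,\varepsilon S_{tot}/\gamma,\,S_{tot})}(u) = \frac{-\tilde H^{\kappa,\gamma,S_{tot}}(\varepsilon, u) + \sqrt{\tilde H^{\kappa,\gamma,S_{tot}}(\varepsilon, u)^2 + 4S_{tot}\varphi_0^\kappa(u)\varphi_2^\kappa(u)}}{2\varphi_2^\kappa(u)}\left(u\varphi_1^\kappa(u) - \gamma\varphi_2^\kappa(u)\right) - \gamma\varphi_0^\kappa(u) + u\varphi_0^\kappa(u). \qquad (17)$$

By Lemma 1 and the argument before this theorem, it is enough to show that there exist $\varepsilon_0 > 0$ and $\kappa \in \mathbb{R}^{6n}_+$ such that for all $\varepsilon \in (0, \varepsilon_0)$, the equation $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u) = 0$ has at least $n+1$ ($n$) positive solutions when $n$ is even (odd).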
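(Then, given $S_{tot}$, $\gamma$, $E_{tot}$, and $F_{tot}$ satisfying (15), we let $\varepsilon = E_{tot}/S_{tot} < \varepsilon_0$, and apply the result.) A straightforward computation shows that when $\varepsilon = 0$,

$$\tilde F^{\kappa,\gamma,S_{tot}}(0, u) = S_{tot}\left(u\varphi_1^\kappa(u) - \gamma\varphi_2^\kappa(u)\right) - \gamma\varphi_0^\kappa(u) + u\varphi_0^\kappa(u)$$
$$= \lambda_0\cdots\lambda_{n-1}u^{n+1} + \lambda_0\cdots\lambda_{n-2}\left(1 + \frac{S_{tot}}{K_{M_{n-1}}}(1 - \gamma\beta_{n-1}) - \gamma\lambda_{n-1}\right)u^n + \cdots$$
$$+ \lambda_0\cdots\lambda_{i-2}\left(1 + \frac{S_{tot}}{K_{M_{i-1}}}(1 - \gamma\beta_{i-1}) - \gamma\lambda_{i-1}\right)u^i + \cdots + \left(1 + \frac{S_{tot}}{K_{M_0}}(1 - \gamma\beta_0) - \gamma\lambda_0\right)u - \gamma, \qquad (18)$$

where the $\lambda_i$'s and $K_{M_i}$'s are defined as in (6), and $\beta_i = k_{cat_i}/l_{cat_i}$. The polynomial $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ is of degree $n + 1$, so there are at most $n + 1$ positive roots. Notice that $u = 0$ is not a root because $\tilde F^{\kappa,\gamma,S_{tot}}(0, 0) = -\gamma < 0$, which also implies that when $n$ is odd, there cannot be $n + 1$ positive roots: the polynomial then has even degree, a positive leading coefficient and a negative constant term, so it has at least one negative root.

Now fix any $S_{tot}$ and $\gamma$. We will construct a vector $\kappa$ such that $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ has $n + 1$ distinct positive roots when $n$ is even. Let us pick any $n + 1$ positive real numbers $u_1 < \cdots < u_{n+1}$, such that their product is $\gamma$, and assume

$$(u - u_1)\cdots(u - u_{n+1}) = u^{n+1} + a_n u^n + \cdots + a_1 u + a_0, \qquad (19)$$

where $a_0 = -\gamma < 0$, keeping in mind that the $a_i$'s are given. Our goal is to find a vector $\kappa \in \mathbb{R}^{6n}_+$ such that (18) and (19) are the same. For each $i = 0, \dots, n-1$, we pick $\lambda_i = 1$. Comparing the coefficients of $u^{i+1}$ in (18) and (19), we have:

$$\frac{S_{tot}}{K_{M_i}}(1 + a_0\beta_i) = a_{i+1} - a_0 - 1. \qquad (20)$$

Let us pick $K_{M_i} > 0$ such that $\frac{K_{M_i}}{S_{tot}}(a_{i+1} - a_0 - 1) - 1 < 0$, then take

$$\beta_i = \frac{1}{a_0}\left(\frac{K_{M_i}}{S_{tot}}(a_{i+1} - a_0 - 1) - 1\right) > 0$$

in order to satisfy (20). From the given $\lambda_0, \dots, \lambda_{n-1}, K_{M_0}, \dots, K_{M_{n-1}}, \beta_0, \dots, \beta_{n-1}$, we will find a vector

$$\kappa = (k_{on_0}, \dots, k_{on_{n-1}}, k_{off_0}, \dots, k_{off_{n-1}}, k_{cat_0}, \dots, k_{cat_{n-1}}, l_{on_0}, \dots, l_{on_{n-1}}, l_{off_0}, \dots, l_{off_{n-1}}, l_{cat_0}, \dots, l_{cat_{n-1}}) \in \mathbb{R}^{6n}_+$$

such that $\beta_i = k_{cat_i}/l_{cat_i}$, $i = 0, \dots, n-1$, and (6) holds. This vector $\kappa$ will guarantee that $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ has $n + 1$ positive distinct roots. When $n$ is odd, a similar construction will give a vector $\kappa$ such that $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ has $n$ positive roots and one negative root.

One construction of $\kappa$ (given $\lambda_i, K_{M_i}, \beta_i$, $i = 0, \dots, n-1$) is as follows. For each $i = 0, \dots, n-1$, we start by defining:

$$L_{M_i} = \frac{\lambda_i K_{M_i}}{\beta_i},$$

consistently with the definitions in (6). Then, we take

$$k_{on_i} = 1, \qquad l_{on_i} = 1,$$

and

$$k_{off_i} = \alpha_i K_{M_i}, \qquad k_{cat_i} = (1 - \alpha_i)K_{M_i}, \qquad l_{cat_i} = \frac{1 - \alpha_i}{\beta_i}K_{M_i}, \qquad l_{off_i} = L_{M_i} - l_{cat_i},$$

where $\alpha_i \in (0, 1)$ is chosen such that

$$l_{off_i} = L_{M_i} - \frac{1 - \alpha_i}{\beta_i}K_{M_i} > 0.$$

This $\kappa$ satisfies $\beta_i = k_{cat_i}/l_{cat_i}$, $i = 0, \dots, n-1$, and (6).

In order to apply the Implicit Function Theorem, we now view the functions defined by formulas in (16) and (17) as defined also for $\varepsilon \le 0$, i.e. as functions $\mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}$. It is easy to see that $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ is $C^1$ on $\mathbb{R} \times \mathbb{R}_+$ because the polynomial under the square root sign in $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ is never zero. On the other hand, since $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ is a polynomial in $u$ with distinct roots, $\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, u_i) \ne 0$. By the Implicit Function Theorem, for each $i = 1, \dots, n+1$, there exist open intervals $E_i$ containing 0, open intervals $U_i$ containing $u_i$, and a differentiable function $\alpha_i : E_i \to U_i$ such that $\alpha_i(0) = u_i$, $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, \alpha_i(\varepsilon)) = 0$ for all $\varepsilon \in E_i$, and the images $\alpha_i(E_i)$ are non-overlapping. If we take

$$(0, \varepsilon_0) := \bigcap_{i=1}^{n+1} E_i \cap (0, +\infty),$$

then for any $\varepsilon \in (0, \varepsilon_0)$, we have $\{\alpha_i(\varepsilon)\}$ as $n + 1$ distinct positive roots of $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$. The case when $n$ is odd can be proved similarly.

The above theorem shows that when $E_{tot}/S_{tot}$ is sufficiently small, it is always possible for the futile cycle to have $n + 1$ ($n$) steady states when $n$ is even (odd), by choosing appropriate kinetic constants $\kappa$. We should notice that for arbitrary $\kappa$, the derivative of $\tilde F$ at a positive root may become zero, which breaks down the perturbation argument. Here is an example to show that more conditions are needed: with $n = 2$, $\lambda_0 = 1$, $\lambda_1 = 3$, $\gamma = 6$, $\beta_0 = \beta_1 = 1/12$, $K_{M_0} = 1/8$, $K_{M_1} = 1/2$, $S_{tot} = 5$, we have that

$$\tilde F^{\kappa,\gamma,S_{tot}}(0, u) = 3u^3 - 12u^2 + 15u - 6 = 3(u - 1)^2(u - 2)$$

has a double root at $u = 1$. In this case, even for $\varepsilon = 0.01$, there is only one positive root of $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$, see Figure 2.

Figure 2: The plot of the function $\tilde F^{\kappa,\gamma,S_{tot}}(0.01, u)$ on [0, 3].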
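The coefficients in the n = 2 example above can be reproduced directly from the definition of $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$. The sketch below is an illustration only: it represents polynomials as coefficient lists (lowest degree first), uses the identity $\lambda_i = \beta_i L_{M_i}/K_{M_i}$ implied by (6) to write $\varphi_2^\kappa$ in terms of $\beta_i$ and $K_{M_i}$, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr

def poly_add(p, q):
    """Add two coefficient lists (lowest degree first)."""
    m = max(len(p), len(q))
    p = p + [Fr(0)] * (m - len(p))
    q = q + [Fr(0)] * (m - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(p, c):
    return [c * a for a in p]

def times_u(p):
    """Multiply a polynomial by u."""
    return [Fr(0)] + p

def poly_eval(p, u):
    return sum(a * u**k for k, a in enumerate(p))

def poly_deriv(p):
    return [k * a for k, a in enumerate(p)][1:]

def f_tilde_at_zero(lam, KM, beta, gamma, Stot):
    """Coefficients of u*phi0 + Stot*(u*phi1 - gamma*phi2) - gamma*phi0."""
    n = len(lam)
    prod = [Fr(1)]                       # prod[i] = lam_0 * ... * lam_{i-1}
    for l in lam:
        prod.append(prod[-1] * l)
    phi0 = prod[:]
    phi1 = [prod[i] / KM[i] for i in range(n)]
    # prod[i+1] / L_Mi equals prod[i] * beta_i / K_Mi because L_Mi = lam_i*K_Mi/beta_i.
    phi2 = [Fr(0)] + [prod[i] * beta[i] / KM[i] for i in range(n)]
    inner = poly_add(times_u(phi1), poly_scale(phi2, -gamma))
    return poly_add(poly_add(times_u(phi0), poly_scale(inner, Stot)),
                    poly_scale(phi0, -gamma))

coeffs = f_tilde_at_zero(
    lam=[Fr(1), Fr(3)], KM=[Fr(1, 8), Fr(1, 2)],
    beta=[Fr(1, 12), Fr(1, 12)], gamma=Fr(6), Stot=Fr(5),
)
# coeffs is [-6, 15, -12, 3], i.e. 3u^3 - 12u^2 + 15u - 6.
double_root_at_1 = poly_eval(coeffs, 1) == 0 and poly_eval(poly_deriv(coeffs), 1) == 0
simple_root_at_2 = poly_eval(coeffs, 2) == 0 and poly_eval(poly_deriv(coeffs), 2) != 0
```

The vanishing derivative at u = 1 is exactly the situation in which the Implicit Function Theorem argument of Theorem 1 does not apply.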
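There is a unique positive real solution around $u = 2.14$; the double root $u = 1$ of $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ bifurcates to two complex roots with non-zero imaginary parts.

However, the following lemma provides a sufficient condition for $\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, \bar u) \ne 0$, for any positive $\bar u$ such that $\tilde F^{\kappa,\gamma,S_{tot}}(0, \bar u) = 0$.

Lemma 2 For each pair of positive numbers $S_{tot}, \gamma$, and vector $\kappa \in \mathbb{R}^{6n}_+$, if condition (21), which involves $1 - \gamma\beta_j$, holds for all $j = 1, \dots, n-1$, then $\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, \bar u) \ne 0$.

See Appendix for the proof.

Theorem 2 For each pair of positive numbers $S_{tot}, \gamma$, and vector $\kappa \in \mathbb{R}^{6n}_+$ satisfying condition (21), there exists $\varepsilon_1 > 0$ such that for any $F_{tot}, E_{tot}$ satisfying $F_{tot} = E_{tot}/\gamma < \varepsilon_1 S_{tot}/\gamma$, the number of positive steady states of system $\Sigma(\kappa, C)$ is greater than or equal to the number of (positive) roots of $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$.

Proof. Suppose that $\tilde F^{\kappa,\gamma,S_{tot}}(0, u)$ has $m$ roots: $\bar u_1, \dots, \bar u_m$. Applying Lemma 2, we have

$$\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, \bar u_k) \ne 0, \quad k = 1, \dots, m.$$

By the perturbation arguments as in Theorem 1, we have that there exists $\varepsilon_1 > 0$ such that $\tilde F^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ has at least $m$ roots for all $0 < \varepsilon < \varepsilon_1$.

The above result depends heavily on a perturbation argument, which only works when $E_{tot}/S_{tot}$ is sufficiently small. In the next section, we will give an upper bound on the number of steady states with no restrictions on the parameters, independent of $\kappa$ and $C$.

4.2 Upper bound on the number of steady states

Theorem 3 For each $\kappa, C$, the system $\Sigma(\kappa, C)$ has at most $2n - 1$ positive steady states.

Proof. An alternative approach to solving (10)-(11) is to first eliminate $v$ from (10) instead of from (11),

$$v = \frac{E_{tot}/F_{tot} - u}{u\varphi_1^\kappa(u) - (E_{tot}/F_{tot})\varphi_2^\kappa(u)}, \qquad (22)$$

when $u\varphi_1^\kappa(u) - (E_{tot}/F_{tot})\varphi_2^\kappa(u) \ne 0$. Then, we substitute (22) into (11), and multiply by $\left(u\varphi_1^\kappa(u) - (E_{tot}/F_{tot})\varphi_2^\kappa(u)\right)^2$ to get:

$$P^{\kappa,C}(u) := \varphi_0^\kappa\varphi_2^\kappa\left(\frac{E_{tot}}{F_{tot}} - u\right)^2 + \left(\varphi_0^\kappa - S_{tot}\varphi_2^\kappa + F_{tot}u\varphi_1^\kappa + F_{tot}\varphi_2^\kappa\right)\left(\frac{E_{tot}}{F_{tot}} - u\right)\left(u\varphi_1^\kappa - \frac{E_{tot}}{F_{tot}}\varphi_2^\kappa\right) - S_{tot}\left(u\varphi_1^\kappa - \frac{E_{tot}}{F_{tot}}\varphi_2^\kappa\right)^2 = 0. \qquad (23)$$

Therefore, if $u\varphi_1^\kappa(u) - (E_{tot}/F_{tot})\varphi_2^\kappa(u) \ne 0$, the number of positive solutions of (10)-(11) is no greater than the number of positive roots of $P^{\kappa,C}(u)$.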
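The polynomial in (23) is easy to assemble explicitly, which gives a quick way to check the structural facts the proof of Theorem 3 relies on: that $P^{\kappa,C}(u)$ is divisible by $u$, and that $Q^{\kappa,C}(u) = P^{\kappa,C}(u)/u$ has degree $2n+1$ with leading coefficient $(\lambda_0\cdots\lambda_{n-1})^2/L_{M_{n-1}}$ and constant term $E_{tot}/(F_{tot}K_{M_0})$. The following is a minimal sketch with exact rational arithmetic; the parameter values and function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as Fr

def padd(p, q):
    """Add two coefficient lists (lowest degree first)."""
    m = max(len(p), len(q))
    p = p + [Fr(0)] * (m - len(p))
    q = q + [Fr(0)] * (m - len(q))
    return [a + b for a, b in zip(p, q)]

def pscale(p, c):
    return [c * a for a in p]

def pmul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    r = [Fr(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def build_P(lam, KM, LM, Etot, Ftot, Stot):
    """Coefficients of the polynomial P in (23)."""
    n = len(lam)
    prod = [Fr(1)]                                   # prod[i] = lam_0...lam_{i-1}
    for l in lam:
        prod.append(prod[-1] * l)
    phi0 = prod[:]
    phi1 = [prod[i] / KM[i] for i in range(n)]
    phi2 = [Fr(0)] + [prod[i + 1] / LM[i] for i in range(n)]
    g = Etot / Ftot
    A = [g, Fr(-1)]                                  # Etot/Ftot - u
    B = padd([Fr(0)] + phi1, pscale(phi2, -g))       # u*phi1 - (Etot/Ftot)*phi2
    # H = phi0 - Stot*phi2 + Ftot*u*phi1 + Ftot*phi2, as in (13).
    H = padd(padd(phi0, pscale(phi2, Ftot - Stot)), pscale([Fr(0)] + phi1, Ftot))
    term1 = pmul(pmul(phi0, phi2), pmul(A, A))
    term2 = pmul(pmul(H, A), B)
    term3 = pscale(pmul(B, B), -Stot)
    return padd(padd(term1, term2), term3)

# Hypothetical parameters for n = 2 (illustration only).
n = 2
lam, KM, LM = [Fr(1), Fr(3)], [Fr(1, 8), Fr(1, 2)], [Fr(3, 2), Fr(18)]
Etot, Ftot, Stot = Fr(1), Fr(2), Fr(5)
P = build_P(lam, KM, LM, Etot, Ftot, Stot)
Q = P[1:]                                            # Q = P / u, valid because P[0] == 0
```

Since both the leading coefficient and the constant term of Q are positive and its degree is odd, Q has at least one negative root, so it has at most 2n positive roots.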
In the special case when $u\varphi_1^\kappa(u) - (E_{tot}/F_{tot})\varphi_2^\kappa(u) = 0$, by (10), we must have $u = E_{tot}/F_{tot}$, and thus $\varphi_1^\kappa(E_{tot}/F_{tot}) = \varphi_2^\kappa(E_{tot}/F_{tot})$. Substituting into (11), we get a unique $v$ defined as in (12) with $u = E_{tot}/F_{tot}$. But notice that in this case $u = E_{tot}/F_{tot}$ is also a root of $P^{\kappa,C}(u)$, so also in this case the number of positive solutions to (10)-(11) is no greater than the number of positive roots of $P^{\kappa,C}(u)$.

It is easy to see that $P^{\kappa,C}(u)$ is divisible by $u$. Consider the polynomial $Q^{\kappa,C}(u) := P^{\kappa,C}(u)/u$ of degree $2n + 1$. We will first show that $Q^{\kappa,C}(u)$ has no more than $2n$ positive roots, then we will prove by contradiction that $2n$ distinct positive roots cannot be achieved.

It is easy to see that the coefficient of $u^{2n+1}$ is

$$\frac{(\lambda_0\cdots\lambda_{n-1})^2}{L_{M_{n-1}}} > 0,$$

and the constant term is

$$\frac{E_{tot}}{F_{tot}K_{M_0}} > 0.$$

So the polynomial $Q^{\kappa,C}(u)$ has at least one negative root, and thus has no more than $2n$ positive roots.

Suppose that $S(\kappa, C)$ has cardinality $2n$; then $Q^{\kappa,C}(u)$ must have $2n$ distinct positive roots, and each of them has multiplicity one. Let us denote the roots as $u_1, \dots, u_{2n}$ in ascending order. We claim that none of them equals $E_{tot}/F_{tot}$. If one did, we would have $\varphi_1^\kappa(E_{tot}/F_{tot}) = \varphi_2^\kappa(E_{tot}/F_{tot})$, and $E_{tot}/F_{tot}$ would be a double root of $Q^{\kappa,C}(u)$, a contradiction.

Since $Q^{\kappa,C}(0) > 0$, $Q^{\kappa,C}(u)$ is positive on the intervals

$$I_0 = (0, u_1),\ I_1 = (u_2, u_3),\ \dots,\ I_{n-1} = (u_{2n-2}, u_{2n-1}),\ I_n = (u_{2n}, \infty),$$

and negative on the intervals

$$J_1 = (u_1, u_2),\ \dots,\ J_n = (u_{2n-1}, u_{2n}).$$

As remarked earlier, $\varphi_1^\kappa(E_{tot}/F_{tot}) \ne \varphi_2^\kappa(E_{tot}/F_{tot})$, so the polynomial $Q^{\kappa,C}(u)$ evaluated at $E_{tot}/F_{tot}$ is negative, and therefore $E_{tot}/F_{tot}$ belongs to one of the $J$ intervals, say $J_s = (u_{2s-1}, u_{2s})$, for some $s \in \{1, \dots, n\}$.

On the other hand, the denominator of $v$ in (22), denoted as $B(u)$, is a polynomial of degree $n$ and divisible by $u$. If $B(u)$ has no positive root, then it does not change sign on the positive axis of $u$.
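But $v$ changes sign when $u$ passes $E_{tot}/F_{tot}$, thus $v_{2s-1}$ and $v_{2s}$ have opposite signs, and one of $(u_{2s-1}, v_{2s-1})$ and $(u_{2s}, v_{2s})$ is not a solution to (10)-(11), which contradicts the fact that both are in $S(\kappa, C)$.

Otherwise, there exists a positive root $\bar u$ of $B(u)$ such that there is no other positive root of $B(u)$ between $\bar u$ and $E_{tot}/F_{tot}$. Plugging $\bar u$ into $Q^{\kappa,C}(u)$, we see that $Q^{\kappa,C}(\bar u)$ is always positive, therefore $\bar u$ belongs to one of the $I$ intervals, say $I_t = (u_{2t}, u_{2t+1})$ for some $t \in \{0, \dots, n\}$. There are two cases:

1. $E_{tot}/F_{tot} < \bar u$. We have $u_{2s-1} < E_{tot}/F_{tot} < u_{2t} < \bar u$. Notice that $v$ changes sign when $u$ passes $E_{tot}/F_{tot}$, so the corresponding $v_{2s-1}$ and $v_{2t}$ have different signs, and either $(u_{2s-1}, v_{2s-1}) \notin S(\kappa, C)$ or $(u_{2t}, v_{2t}) \notin S(\kappa, C)$, a contradiction.

2. $E_{tot}/F_{tot} > \bar u$. We have $\bar u < u_{2t+1} < E_{tot}/F_{tot} < u_{2s}$. Since $v$ changes sign when $u$ passes $E_{tot}/F_{tot}$, the corresponding $v_{2t+1}$ and $v_{2s}$ have different signs, and either $(u_{2t+1}, v_{2t+1}) \notin S(\kappa, C)$ or $(u_{2s}, v_{2s}) \notin S(\kappa, C)$, a contradiction.

Therefore, $\Sigma(\kappa, C)$ has at most $2n - 1$ steady states.

4.3 Fine-tuned upper bounds

In the previous section, we have seen that any $(u, v) \in S(\kappa, C)$ with $u \ne E_{tot}/F_{tot}$ must satisfy (22)-(23), but not all solutions of (22)-(23) are elements of $S(\kappa, C)$. A solution $(u, v)$ of (22)-(23) is in $S(\kappa, C)$ if and only if $u, v > 0$. In some special cases, for example when the enzyme is in excess, or the substrate is in excess, we can count the number of solutions of (22)-(23) which are not in $S(\kappa, C)$ to get a better upper bound.

The following is a standard result on continuity of roots; see for instance Lemma A.4.1 in [30]:

Lemma 3 Let $g(z) = z^n + a_1 z^{n-1} + \cdots + a_n$ be a polynomial of degree $n$ and complex coefficients having distinct roots $\lambda_1, \dots, \lambda_q$, with multiplicities $n_1 + \cdots + n_q = n$, respectively. Given any small enough $\delta > 0$ there exists an $\varepsilon > 0$ so that, if $h(z) = z^n + b_1 z^{n-1} + \cdots + b_n$, $|a_i - b_i| < \varepsilon$ for $i = 1, \dots, n$, then $h$ has precisely $n_i$ roots in $B_\delta(\lambda_i)$ for each $i = 1, \dots, q$.

Theorem 4 For each $\gamma > 0$ and $\kappa \in \mathbb{R}^{6n}_+$ such that $\varphi_1^\kappa(\gamma) \ne \varphi_2^\kappa(\gamma)$, and each $S_{tot} > 0$, there exists $\varepsilon_2 > 0$ such that for all positive numbers $E_{tot}, F_{tot}$ satisfying $F_{tot} = E_{tot}/\gamma < \varepsilon_2 S_{tot}/\gamma$, the system $\Sigma(\kappa, C)$ has at most $n + 1$ positive steady states.

Proof. Let us define a function $\mathbb{R}_+ \times \mathbb{C} \to \mathbb{C}$ as follows:

$$\tilde Q^{\kappa,\gamma,S_{tot}}(\varepsilon, u) = Q^{\kappa,(\varepsilon S_{tot},\,\varepsilon S_{tot}/\gamma,\,S_{tot})}(u),$$

and a set $B^{\kappa,\gamma,S_{tot}}(\varepsilon)$ consisting of the roots of $\tilde Q^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ which are not positive or whose corresponding $v$'s, determined by the $u$'s as in (22), are not positive. Since $\tilde Q^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ is a polynomial of degree $2n + 1$, if we can show that there exists $\varepsilon_2 > 0$ such that for any $\varepsilon \in (0, \varepsilon_2)$, $\tilde Q^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ has at least $n$ roots counting multiplicities that are in $B^{\kappa,\gamma,S_{tot}}(\varepsilon)$, then we are done.

In order to apply Lemma 3, we regard the function $\tilde Q^{\kappa,\gamma,S_{tot}}$ as defined on $\mathbb{R} \times \mathbb{C}$. At $\varepsilon = 0$:

$$\tilde Q^{\kappa,\gamma,S_{tot}}(0, u) = \left[\varphi_0^\kappa\varphi_2^\kappa(\gamma - u)^2 + (\varphi_0^\kappa - S_{tot}\varphi_2^\kappa)(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)(\gamma - u) - S_{tot}(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)^2\right]/u$$
$$= \left[\varphi_0^\kappa\varphi_2^\kappa(\gamma - u)^2 + \varphi_0^\kappa(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)(\gamma - u) - S_{tot}\varphi_2^\kappa(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)(\gamma - u) - S_{tot}(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)^2\right]/u$$
$$= \left[\varphi_0^\kappa(\gamma - u)u(\varphi_1^\kappa - \varphi_2^\kappa) + S_{tot}u(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)(\varphi_2^\kappa - \varphi_1^\kappa)\right]/u$$
$$= (\varphi_2^\kappa - \varphi_1^\kappa)\left(u\varphi_0^\kappa + S_{tot}(u\varphi_1^\kappa - \gamma\varphi_2^\kappa) - \gamma\varphi_0^\kappa\right)$$
$$= (\varphi_2^\kappa - \varphi_1^\kappa)\,\tilde F^{\kappa,\gamma,S_{tot}}(0, u).$$

Let us denote the distinct roots of $\tilde Q^{\kappa,\gamma,S_{tot}}(0, u)$ as $u_1, \dots, u_q$, with multiplicities $n_1 + \cdots + n_q = 2n + 1$, and the roots of $\varphi_1^\kappa - \varphi_2^\kappa$ as $u_1, \dots, u_p$, $p \le q$, with multiplicities $m_1 + \cdots + m_p = n$, $n_i \ge m_i$, for $i = 1, \dots, p$.

For each $i = 1, \dots, p$, if $u_i$ is real and positive, then there are two cases ($u_i \ne \gamma$ as $\varphi_1^\kappa(\gamma) \ne \varphi_2^\kappa(\gamma)$):

1. $u_i > \gamma$. We have $u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i) > \gamma(\varphi_1^\kappa(u_i) - \varphi_2^\kappa(u_i)) = 0$.

2. $u_i < \gamma$. We have $u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i) < \gamma(\varphi_1^\kappa(u_i) - \varphi_2^\kappa(u_i)) = 0$.

In both cases, $u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i)$ and $\gamma - u_i$ have opposite signs, i.e.

$$\left(u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i)\right)(\gamma - u_i) < 0.$$

Let us pick $\delta > 0$ small enough such that the following conditions hold:

1. For all $i = 1, \dots, p$, if $u_i$ is not real, then $B_\delta(u_i)$ has no intersection with the real axis.

2. For all $i = 1, \dots, p$, if $u_i$ is real and positive, the following inequality holds for any real $u \in B_\delta(u_i)$:

$$\left(u\varphi_1^\kappa(u) - \gamma\varphi_2^\kappa(u)\right)(\gamma - u) < 0. \qquad (24)$$

3. For all $i = 1, \dots, p$, if $u_i$ is real and negative, then $B_\delta(u_i)$ has no intersection with the imaginary axis.

4. $B_\delta(u_j) \cap B_\delta(u_k) = \emptyset$ for all $j \ne k = 1, \dots, q$.

By Lemma 3, there exists $\varepsilon_2 > 0$ such that for all $\varepsilon \in (0, \varepsilon_2)$, the polynomial $\tilde Q^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ has exactly $n_j$ roots in each $B_\delta(u_j)$, $j = 1, \dots, q$, denoted by $u_j^k(\varepsilon)$, $k = 1, \dots, n_j$. We pick one such $\varepsilon$, and we claim that none of the roots in $B_\delta(u_i)$, $i = 1, \dots, p$, with the $v$ defined as in (22), will be an element of $S$. If so, we are done, since there are $\sum_{i=1}^{p} n_i \ge \sum_{i=1}^{p} m_i = n$ such roots of $\tilde Q^{\kappa,\gamma,S_{tot}}(\varepsilon, u)$ which are in $B^{\kappa,\gamma,S_{tot}}(\varepsilon)$. For each $i = 1, \dots, p$, there are three cases:

1. $u_i$ is not real. Then condition 1 guarantees that $u_i^k(\varepsilon)$ is not real for each $k = 1, \dots, n_i$, and thus is in $B^{\kappa,\gamma,S_{tot}}(\varepsilon)$.

2. $u_i$ is real and positive. Pick any root $u_i^k(\varepsilon) \in B_\delta(u_i)$, $k = 1, \dots, n_i$; the corresponding $v_i^k(\varepsilon)$ equals

$$\frac{\gamma - u_i^k(\varepsilon)}{u_i^k(\varepsilon)\varphi_1^\kappa(u_i^k(\varepsilon)) - \gamma\varphi_2^\kappa(u_i^k(\varepsilon))} < 0,$$

which follows from (24). So $(u_i^k(\varepsilon), v_i^k(\varepsilon)) \notin S(\kappa, C)$, and $u_i^k(\varepsilon) \in B^{\kappa,\gamma,S_{tot}}(\varepsilon)$.

3. $u_i$ is real and negative. By conditions 1 and 3, $u_i^k(\varepsilon)$ is not positive for all $k = 1, \dots, n_i$.

The next theorem considers the case when the enzyme is in excess:

Theorem 5 For each $\gamma > 0$, $\kappa \in \mathbb{R}^{6n}_+$ such that $\varphi_1^\kappa(\gamma) \ne \varphi_2^\kappa(\gamma)$, and each $E_{tot} > 0$, there exists $\varepsilon_3 > 0$ such that for all positive numbers $F_{tot}, S_{tot}$ satisfying $F_{tot} = E_{tot}/\gamma > S_{tot}/(\varepsilon_3\gamma)$, the system $\Sigma(\kappa, C)$ has at most one positive steady state.

Proof. For each $\gamma > 0$, $\kappa \in \mathbb{R}^{6n}_+$ such that $\varphi_1^\kappa(\gamma) \ne \varphi_2^\kappa(\gamma)$, and each $E_{tot} > 0$, we define a function $\mathbb{R}_+ \times \mathbb{C} \to \mathbb{C}$ as follows:

$$\bar Q^{\kappa,\gamma,E_{tot}}(\varepsilon, u) = Q^{\kappa,(E_{tot},\,E_{tot}/\gamma,\,\varepsilon E_{tot})}(u).$$

Let us define the set $C^{\kappa,\gamma,E_{tot}}(\varepsilon)$ as the set of roots of $\bar Q^{\kappa,\gamma,E_{tot}}(\varepsilon, u)$ which are not positive or whose corresponding $v$'s, determined by the $u$'s as in (22), are not positive.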
If we can show that there exists $\varepsilon_3 > 0$ such that for any $\varepsilon \in (0, \varepsilon_3)$ there is at most one positive root of $\bar Q^{\kappa,\gamma,E_{tot}}(\varepsilon, u)$ that is not in $C^{\kappa,\gamma,E_{tot}}(\varepsilon)$, we are done.

In order to apply Lemma 3, we now view the function $\bar Q^{\kappa,\gamma,E_{tot}}$ as defined on $\mathbb{R} \times \mathbb{C}$. At $\varepsilon = 0$:

$$\bar Q^{\kappa,\gamma,E_{tot}}(0, u) = (\gamma - u)\left[(\gamma - u)\varphi_0^\kappa\varphi_2^\kappa + \left(\varphi_0^\kappa + \frac{E_{tot}}{\gamma}u\varphi_1^\kappa + \frac{E_{tot}}{\gamma}\varphi_2^\kappa\right)(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)\right]/u := (\gamma - u)R^{\kappa,\gamma,E_{tot}}(u).$$

Let us denote the distinct roots of $\bar Q^{\kappa,\gamma,E_{tot}}(0, u)$ as $u_1 (= \gamma), u_2, \dots, u_q$, with multiplicities $n_1 + \cdots + n_q = 2n + 1$, where $u_2, \dots, u_q$ are the roots of $R^{\kappa,\gamma,E_{tot}}(u)$ other than $\gamma$. Since $\varphi_1^\kappa(\gamma) \ne \varphi_2^\kappa(\gamma)$, $R^{\kappa,\gamma,E_{tot}}(u)$ is not divisible by $u - \gamma$, and thus $n_1 = 1$. For each $i = 2, \dots, q$, we have

$$(\gamma - u_i)\varphi_0^\kappa(u_i)\varphi_2^\kappa(u_i) = -\left(\varphi_0^\kappa(u_i) + \frac{E_{tot}}{\gamma}u_i\varphi_1^\kappa(u_i) + \frac{E_{tot}}{\gamma}\varphi_2^\kappa(u_i)\right)\left(u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i)\right).$$

If $u_i > 0$, then $\varphi_0^\kappa(u_i)\varphi_2^\kappa(u_i)$ and $\varphi_0^\kappa(u_i) + \frac{E_{tot}}{\gamma}u_i\varphi_1^\kappa(u_i) + \frac{E_{tot}}{\gamma}\varphi_2^\kappa(u_i)$ are both positive. Since $u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i)$ and $\gamma - u_i$ are non-zero, they must have opposite signs, that is

$$\left(u_i\varphi_1^\kappa(u_i) - \gamma\varphi_2^\kappa(u_i)\right)(\gamma - u_i) < 0.$$

Let us pick $\delta > 0$ small enough such that the following conditions hold for all $i = 2, \dots, q$:

1. If $u_i$ is not real, then $B_\delta(u_i)$ has no intersection with the real axis.

2. If $u_i$ is real and positive, then for any real $u \in B_\delta(u_i)$, the following inequality holds:

$$\left(u\varphi_1^\kappa(u) - \gamma\varphi_2^\kappa(u)\right)(\gamma - u) < 0. \qquad (25)$$

3. If $u_i$ is real and negative, then $B_\delta(u_i)$ has no intersection with the imaginary axis.

4. $B_\delta(u_j) \cap B_\delta(u_k) = \emptyset$ for all $j \ne k = 2, \dots, q$.

By Lemma 3, there exists $\varepsilon_3 > 0$ such that for all $\varepsilon \in (0, \varepsilon_3)$, the polynomial $\bar Q^{\kappa,\gamma,E_{tot}}(\varepsilon, u)$ has exactly $n_j$ roots in each $B_\delta(u_j)$, $j = 1, \dots, q$, denoted by $u_j^k(\varepsilon)$, $k = 1, \dots, n_j$. We pick one such $\varepsilon$, and if we can show that all of the roots in $B_\delta(u_i)$, $i = 2, \dots, q$, are in $C^{\kappa,\gamma,E_{tot}}(\varepsilon)$, then we are done, since the only roots that may not be in $C^{\kappa,\gamma,E_{tot}}(\varepsilon)$ are the roots in $B_\delta(u_1)$, and there is one root in $B_\delta(u_1)$. For each $i = 2, \dots, q$, there are three cases:

1. $u_i$ is not real. Then condition 1 guarantees that $u_i^k(\varepsilon)$ is not real for all $k = 1, \dots, n_i$.

2. $u_i$ is real and positive. Pick any root $u_i^k(\varepsilon)$, $k = 1, \dots, n_i$; by (25), the corresponding $v_i^k(\varepsilon)$ equals

$$\frac{\gamma - u_i^k(\varepsilon)}{u_i^k(\varepsilon)\varphi_1^\kappa(u_i^k(\varepsilon)) - \gamma\varphi_2^\kappa(u_i^k(\varepsilon))} < 0.$$

So, $u_i^k(\varepsilon)$ is in $C^{\kappa,\gamma,E_{tot}}(\varepsilon)$.

3. $u_i$ is real and negative. By conditions 1 and 3, $u_i^k(\varepsilon)$ is not positive for all $k = 1, \dots, n_i$.

5 Conclusions and discussions

Here we have set up a mathematical model for multisite phosphorylation-dephosphorylation cycles of size $n$, and studied the number of positive steady states based on this model. We reformulated the question of the number of positive steady states as a question about the number of positive roots of certain polynomials, to which we also applied perturbation techniques. Our theoretical results depend on the assumption of mass action kinetics and a distributive sequential mechanism, which are customary in the study of multisite phosphorylation and dephosphorylation.

An upper bound of $2n - 1$ steady states is obtained for arbitrary parameter combinations. Biologically, when the substrate concentration greatly exceeds that of the enzyme, there are at most $n + 1$ ($n$) steady states if $n$ is even (odd). This upper bound can be achieved under proper kinetic conditions; see Theorem 1 for the construction. On the other extreme, when the enzyme is in excess, there is a unique steady state.

In the special case $n = 2$, which applies to a single level of a MAPK cascade, our results guarantee that there are no more than three steady states, consistent with numerical simulations in [17].

We notice that there is an apparent gap between the upper bound $2n - 1$ and the upper bound of $n + 1$ ($n$) if $n$ is even (odd) when the substrate is in excess. If we think of the ratio $E_{tot}/F_{tot}$ as a parameter $\varepsilon$, then when $\varepsilon \ll 1$, there are at most $n + 1$ ($n$) steady states when $n$ is even (odd), which coincides with the largest possible lower bound. When $\varepsilon \gg 1$, there is a unique steady state.
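If the number of steady states changes "continuously" with respect to $\varepsilon$, then we do not expect the number of steady states to exceed $n + 1$ ($n$) if $n$ is even (odd). So a natural conjecture would be that the number of steady states never exceeds $n + 1$ under any conditions.

6 Acknowledgment

We thank Jeremy Gunawardena for very helpful discussions.

7 Appendix

Proof of Lemma 2: Recall that (dropping the $u$'s in $\varphi_i^\kappa$, $i = 0, 1, 2$)

$$\tilde F^{\kappa,\gamma,S_{tot}}(0, u) = u\varphi_0^\kappa + S_{tot}(u\varphi_1^\kappa - \gamma\varphi_2^\kappa) - \gamma\varphi_0^\kappa.$$

So

$$\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, u) = \varphi_0^\kappa + S_{tot}(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)' - (\gamma - u)(\varphi_0^\kappa)'.$$

Since $\tilde F^{\kappa,\gamma,S_{tot}}(0, \bar u) = 0$,

$$S_{tot}(\bar u\varphi_1^\kappa - \gamma\varphi_2^\kappa) = (\gamma - \bar u)\varphi_0^\kappa,$$

that is,

$$\gamma - \bar u = \frac{S_{tot}(\bar u\varphi_1^\kappa - \gamma\varphi_2^\kappa)}{\varphi_0^\kappa}.$$

Therefore,

$$\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, \bar u) = \varphi_0^\kappa + S_{tot}(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)' - \frac{S_{tot}(\bar u\varphi_1^\kappa - \gamma\varphi_2^\kappa)}{\varphi_0^\kappa}(\varphi_0^\kappa)'$$
$$= \varphi_0^\kappa + \frac{S_{tot}}{\varphi_0^\kappa}\left(\varphi_0^\kappa(u\varphi_1^\kappa - \gamma\varphi_2^\kappa)' - (\bar u\varphi_1^\kappa - \gamma\varphi_2^\kappa)(\varphi_0^\kappa)'\right)$$
$$= \varphi_0^\kappa + \frac{S_{tot}}{\varphi_0^\kappa}\Big(\left(1 + \lambda_0\bar u + \lambda_0\lambda_1\bar u^2 + \cdots + \lambda_0\cdots\lambda_{n-1}\bar u^n\right)\Big(\frac{1}{K_{M_0}}(1 - \gamma\beta_0) + 2\frac{\lambda_0}{K_{M_1}}(1 - \gamma\beta_1)\bar u + \cdots + n\frac{\lambda_0\cdots\lambda_{n-2}}{K_{M_{n-1}}}(1 - \gamma\beta_{n-1})\bar u^{n-1}\Big)$$
$$\qquad - \left(\lambda_0 + 2\lambda_0\lambda_1\bar u + \cdots + n\lambda_0\cdots\lambda_{n-1}\bar u^{n-1}\right)\Big(\frac{1}{K_{M_0}}(1 - \gamma\beta_0)\bar u + \frac{\lambda_0}{K_{M_1}}(1 - \gamma\beta_1)\bar u^2 + \cdots + \frac{\lambda_0\cdots\lambda_{n-2}}{K_{M_{n-1}}}(1 - \gamma\beta_{n-1})\bar u^n\Big)\Big)$$
$$= \varphi_0^\kappa + \frac{S_{tot}}{\varphi_0^\kappa}\sum_{i=0}^{n}\sum_{j=0}^{n-1}\lambda_0\cdots\lambda_{i-1}\bar u^i\,(j + 1 - i)\frac{\lambda_0\cdots\lambda_{j-1}}{K_{M_j}}(1 - \gamma\beta_j)\bar u^j$$
$$= \frac{1}{\varphi_0^\kappa}\left(\sum_{i=0}^{n}\sum_{j=0}^{n}\lambda_0\cdots\lambda_{i-1}\bar u^i\,\lambda_0\cdots\lambda_{j-1}\bar u^j + S_{tot}\sum_{i=0}^{n}\sum_{j=0}^{n-1}\lambda_0\cdots\lambda_{i-1}\bar u^i\,(j + 1 - i)\frac{\lambda_0\cdots\lambda_{j-1}}{K_{M_j}}(1 - \gamma\beta_j)\bar u^j\right)$$
$$= \frac{1}{\varphi_0^\kappa}\left(\sum_{i=0}^{n}\lambda_0\cdots\lambda_{i-1}\bar u^i\,\lambda_0\cdots\lambda_{n-1}\bar u^n + \sum_{i=0}^{n}\sum_{j=0}^{n-1}\lambda_0\cdots\lambda_{i-1}\bar u^i\,\lambda_0\cdots\lambda_{j-1}\bar u^j\left(1 + S_{tot}(j + 1 - i)\frac{1 - \gamma\beta_j}{K_{M_j}}\right)\right),$$

where the product $\lambda_0\cdots\lambda_{-1}$ is defined to be 1 for the convenience of notation. Because of (21),

$$S_{tot}(j + 1 - i)\frac{1 - \gamma\beta_j}{K_{M_j}} \ge -1,$$

so we have $\frac{\partial \tilde F^{\kappa,\gamma,S_{tot}}}{\partial u}(0, \bar u) > 0$.

References

[1] M. Samoilov, S. Plyasunov, and A.P. Arkin. Stochastic amplification and signaling in enzymatic futile cycles through noise-induced bistability with oscillations. Proc Natl Acad Sci USA, 102:2310–2315, 2005.

[2] S. Donovan, K.M. Shannon, and G. Bollag. GTPase activating proteins: critical regulators of intracellular signaling. Biochim. Biophys Acta, 1602:23–45, 2002.

[3] J.J. Bijlsma and E.A. Groisman. Making informed decisions: regulatory interactions between two-component systems. Trends Microbiol, 11:359–366, 2003.

[4] A.D. Grossman. Genetic networks controlling the initiation of sporulation and the development of genetic competence in bacillus subtilis.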
Annu Rev Genet., 29:477–508, 1995. [5] H. Chen, B.W. Bernstein, and J.R. Bamburg. Regulating actin filament dynamics in vivo. Trends Biochem. Sci., 25:19–23, 2000. [6] G. Karp. Cell and Molecular Biology. Wiley, 2002. [7] L. Stryer. Biochemistry. Freeman, 1995. [8] M.L. Sulis and R. Parsons. PTEN: from pathology to biology. Trends Cell Biol., 13:478–483, 2003. [9] D.J. Lew and D.J. Burke. The spindle assembly and spindle position checkpoints. Annu Rev Genet., 37:251–282, 2003. [10] A.R. Asthagiri and D.A. Lauffenburger. A computational study of feedback effects on signal dynamics in a mitogen-activated protein kinase (MAPK) pathway model. Biotechnol. Prog., 17:227–239, 2001. [11] L. Chang and M. Karin. Mammalian MAP kinase signaling cascades. Nature, 410:37–40, 2001. [12] C-Y.F. Huang and J.E. Ferrell Jr. Ultrasensitivity in the mitogen-activated protein kinase cascade. Proc. Natl. Acad. Sci. USA, 93:10078–10083, 1996. [13] C. Widmann, G. Spencer, M.B. Jarpe, and G.L. Johnson. Mitogen-activated protein kinase: Conser- vation of a three-kinase module from yeast to human. Physiol. Rev., 79:143–180, 1999. [14] W.R. Burack and T.W. Sturgill. The activating dual phosphorylation of MAPK by MEK is nonpro- cessive. Biochemistry, 36:5929–5933, 1997. [15] J.E. Ferrell and R.R. Bhatt. Mechanistic studies of the dual phosphorylation of mitogen-activated protein kinase. J. Biol. Chem., 272:19008–19016, 1997. [16] Y. Zhao and Z.Y. Zhang. The mechanism of dephosphorylation of extracellular signal-regulated kinase 2 by mitogen-activated protein kinase phosphatase 3. J. Biol. Chem., 276:32382–32391, 2001. [17] N.I. Markevich, J.B. Hoek, and B.N. Kholodenko. Signaling switches and bistability arising from multisite phosphorylation in protein kinase cascades. J. Cell Biol., 164:353–359, 2004. [18] J. Gunawardena. Multisite protein phosphorylation makes a good threshold but can be a poor switch. Proc. Natl. Acad. Sci., 102:14617–14622, 2005. [19] C. Conradi, J. Saez-Rodriguez, E.-D. 
Gilles, and J. Raisch. Using chemical reaction network theory to discard a kinetic mechanism hypothesis. In Proc. FOSBE 2005 (Foundations of Systems Biology in Engineering), Santa Barbara, Aug. 2005, pages 325–328. 2005. [20] T.S. Gardner, C.R. Cantor, and J.J. Collins. Construction of a genetic toggle switch in Escherichia coli. Nature, 403:339–342, 2000. [21] D. Angeli, J. E. Ferrell, and E.D. Sontag. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. Proc Natl Acad Sci USA, 101(7):1822–1827, 2004. [22] E.E. Sel’kov. Stabilization of energy charge, generation of oscillation and multiple steady states in energy metabolism as a result of purely stoichiometric regulation. Eur. J. Biochem, 59(1):151–157, 1975. [23] W. Sha, J. Moore, K. Chen, A.D. Lassaletta, C.S. Yi, J.J. Tyson, and J.C. Sible. Hysteresis drives cell-cycle transitions in Xenopus laevis egg extracts. Proc. Natl. Acad. Sci., 100:975–980, 2003. [24] F. Ortega, J. Garcés, F. Mas, B.N. Kholodenko, and M. Cascante. Bistability from double phos- phorylation in signal transduction: Kinetic and structural requirements. FEBS J, 273:3915–3926, 2006. [25] L. Wang and E.D. Sontag. Singularly perturbed monotone systems and an application to double phosphorylation cycles. (Submitted to IEEE Transactions Autom. Control, Special Issue on Systems Biology, January 2007, Preprint version in arXiv math.OC/0701575, 20 Jan 2007), 2007. [26] L. Wang and E.D. Sontag. Almost global convergence in singular perturbations of strongly monotone systems. In Positive Systems, pages 415–422. Springer-Verlag, Berlin/Heidelberg, 2006. (Lecture Notes in Control and Information Sciences Volume 341, Proceedings of the second Multidisciplinary Inter- national Symposium on Positive Systems: Theory and Applications (POSTA 06) Grenoble, France). [27] D. Angeli, P. de Leenheer, and E.D. Sontag. A Petri net approach to the study of persistence in chemical reaction networks. 
(Submitted to Mathematical Biosciences, also arXiv q-bio.MN/068019v2, 10 Aug 2006), 2007.

[28] D. Angeli and E.D. Sontag. Translation-invariant monotone systems, and a global convergence result for enzymatic futile cycles. Nonlinear Analysis Series B: Real World Applications, to appear, 2007.

[29] M Thompson and J. Gunawardena. Multi-bit information storage by multisite phosphorylation. Submitted, 2007.

[30] E.D. Sontag. Mathematical Control Theory. Deterministic Finite-Dimensional Systems, volume 6 of Texts in Applied Mathematics. Springer-Verlag, New York, second edition, 1998.

[31] M. Feinberg. Chemical reaction network structure and the stability of complex isothermal reactors: II. Multiple steady states for networks of deficiency one. Chem. Eng. Sci., 43,1–25, 1988.

[32] P. Ellison, M. Feinberg. How catalytic mechanisms reveal themselves in multiple steady-state data: I. Basic principles. J. Symbolic Comput., 33, 275–305, 2002.

[33] C.M. Furdui, E.D. Lew, J. Schlessinger, K.S. Anderson. Autophosphorylation of FGFR1 kinase is mediated by a sequential and precisely ordered reaction. Molecular Cell, 21, 711–717, 2006.

ABSTRACT The multisite phosphorylation-dephosphorylation cycle is a motif repeatedly used in cell signaling. This motif itself can generate a variety of dynamic behaviors like bistability and ultrasensitivity without direct positive feedbacks. In this paper, we study the number of positive steady states of a general multisite phosphorylation-dephosphorylation cycle, and how the number of positive steady states varies by changing the biological parameters.
We show analytically that (1) for some parameter ranges, there are at least n+1 (if n is even) or n (if n is odd) steady states; (2) there never are more than 2n-1 steady states (in particular, this implies that for n=2, including single levels of MAPK cascades, there are at most three steady states); (3) for parameters near the standard Michaelis-Menten quasi-steady state conditions, there are at most n+1 steady states; and (4) for parameters far from the standard Michaelis-Menten quasi-steady state conditions, there is at most one steady state. <|endoftext|><|startoftext|> Introduction The discrete dipole approximation (DDA) is a general method to calculate scattering and absorption of electromagnetic waves by particles of arbitrary geometry and composition. The DDA was first proposed by Purcell and Pennypacker [1] and was reviewed by Draine and Flatau in 1994 [2]. A recent review [3] describes the current state of the DDA and its historical development. It also explains the equivalence of the DDA and methods based on the volume integral equation formulation. The reader is referred to this review for an in-depth discussion of the DDA. There are a number of computer programs based on the DDA, some of which were recently compared by Penttila et al. [4]. The most popular among them is DDSCAT [5], which has been widely used by many researchers for more than 10 years. In this paper we present a new program, Amsterdam DDA (ADDA), which recently has been put in the public domain.1 Its main distinctive feature is the ability to parallelize a single DDA simulation over a cluster of computers, which allows simulation of light scattering by very large particles. This is demonstrated for a number of test cases in this manuscript. Validation of ADDA by simulating light scattering by wavelength-sized particles and comparing it with other DDA programs was reported elsewhere [4]. 
Section 2 describes in detail the ADDA computer code, showing its advantages compared to other codes. A number of numerical tests are shown in Section 3, demonstrating that DDA is actually capable of processing large particles, and showing the current capabilities of ADDA. Results of these simulations are discussed in Section 4; the errors are compared with previous results for much smaller particles. Section 5 concludes the manuscript and discusses possible future work.

2 ADDA computer code

ADDA has been developed over a period of more than 10 years at the University of Amsterdam [6-8]. Its main feature (distinctive from other DDA codes) has always been the capability of running on a cluster of computers, parallelizing a single DDA computation, in contrast with e.g. DDSCAT [5], which allows farming several instantiations of a DDA simulation out to different processors. This allows using a practically unlimited number of dipoles, since ADDA is not limited by the memory of a single computer [8,9]. Recently the overall performance of the code has been improved significantly, together with some optimizations specifically for single-processor mode. ADDA's source code and documentation are freely available.

Most of ADDA is written in ANSI C, which ensures wide portability on the source-code level. The code is fully operational under Linux and, in sequential mode, on Windows-based systems. The parallelization over multiple processors is based on a geometric decomposition of the particle and the single-program-multiple-data paradigm of parallel computing. The code is written for distributed memory systems using the message passing interface (MPI).2 Note that ADDA should in principle also run on shared memory computers, but so far this has not been explicitly tested.
The fast Fourier transform (FFT) used for the matrix-vector products in the iterative solver is performed either using routines by Temperton [10] or the more advanced package “Fastest Fourier transform in the West” (FFTW) [11]. The latter is generally considerably faster but requires a separate package installation. ADDA has four options implemented for dipole polarizabilities: Clausius-Mossotti [1], radiative reaction correction [12], lattice dispersion relation (LDR) [13], and corrected LDR [14]. It includes four iterative methods: conjugate gradient applied to normalized equation with minimization of residual norm (CGNR) [15], Bi-CG stabilized (Bi-CGSTAB) [15], Bi-CG [16], and quasi minimal residual (QMR) [16]. The last two iterative methods employ the complex-symmetric property of the DDA interaction matrix to halve the calculation time [16]. The default stopping criterion of the iterative method in ADDA is the relative norm of the residual ε, which must be $< 10^{-5}$.

1 http://www.science.uva.nl/research/scs/Software/adda/
2 http://www.mpi-forum.org

The usual formulation of DDA can be written as [2,3]:

$$\alpha_i^{-1} P_i - \sum_{j \neq i} G_{ij} P_j = E_i, \qquad (1)$$

where $\alpha_i$ is the tensor of dipole polarizability, $E_i$ is the incident electric field, $G_{ij}$ is the free-space Green’s tensor (complex symmetric), and $P_i$ is the unknown dipole polarization. If the polarizability tensor is diagonal for all dipoles then there always exists a $\beta_i$ such that $\beta_i \beta_i = \alpha_i$, i.e. $\beta_i = \sqrt{\alpha_i}$. Moreover, $\beta_i$ is then complex symmetric, and so is the matrix A with elements

$$A_{ij} = I\,\delta_{ij} - (1 - \delta_{ij})\,\beta_i G_{ij} \beta_j, \qquad (2)$$

where I is an identity tensor. A is the interaction matrix that is used in ADDA, i.e. the following system of linear equations is solved:

$$\sum_j A_{ij} x_j = x_i - \beta_i \sum_{j \neq i} G_{ij} \beta_j x_j = \beta_i E_i, \qquad (3)$$

where $x_i = \beta_i^{-1} P_i$ is a new unknown vector. Eq.
(3) is equivalent to the use of Jacobi preconditioning [15] together with keeping the interaction matrix complex-symmetric (for any distribution of refractive index inside the scatterer and for any of the supported polarization prescriptions). We have not studied, however, whether this Jacobi preconditioning improves the convergence of the iterative solver. Flatau showed [17] that in some test cases it helps, while in others there is no improvement. It is important to note also that DDA is not limited to diagonal or symmetric polarizabilities. Any other tensor may be used, but then the interaction matrix is not complex-symmetric; hence, QMR and Bi-CG are less efficient.

ADDA can perform orientation averaging of the scattering quantities over three Euler angles (α, β, γ) of the particle orientation. Averaging over the angle α is done with a single computation of internal fields by computing scattering in different scattering planes, which is comparatively fast. Averaging over the other two Euler angles is done by independent DDA simulations. The averaging itself is performed using a Romberg integration [18], which may be used adaptively (i.e. automatically simulating the required number of different orientations to reach a prescribed accuracy) but limits the possible number of values for each orientation angle to $2^n + 1$, where n is an integer. Moreover, symmetries of the scatterer may be used to decrease the intervals of Euler angles over which to average, and hence accelerate the calculation. This feature of ADDA was tested in a recent benchmark study [4].

Other features of ADDA include computation of scattering of a tightly focused Gaussian beam [6], a checkpoint system to allow for long runs on queuing systems that enforce upper limits on wall clock time for execution, as is usually the case on massively parallel supercomputers, calculation of radiation forces on each of the dipoles [19], use of rotational symmetry of the scatterer to halve the simulation time, and an extended command line interface. Some other features, such as applicability to anisotropic scatterers and a large set of predefined shapes, are planned to be implemented in the near future.

Fig. 1. Current capabilities of the ADDA for spheres with different x and m. The striped region corresponds to full convergence and densely hatched region to incomplete convergence. The dashed lines show two levels of memory requirements for the simulation, according to the “rule of thumb” (see main text for explanation). [Horizontal axis: size parameter x; region labels: $\varepsilon = 10^{-5}$ and $\varepsilon \in (10^{-5}, 10^{-3})$; memory level label: 70 GB.]

There are several factors that allow ADDA's performance to compare favorably with other codes, which was shown in a benchmark study by Penttila et al. [4]. First of all, the FFTW 3 package that is used automatically adapts itself to perform optimally on any particular hardware. Moreover, ADDA does not perform complete 3D FFT transforms in one run, but decomposes them into a set of 1D transforms with data transposition in between. This allows employing the fact that input data for the forward transform contains many zeros, and only part of the output data of the backward transform is used [8]. Second, we have implemented four different Krylov-space-based iterative solvers, allowing us to choose the most suitable one for a particular application. As is known from the literature [17,20,21] and demonstrated in Section 3, there is no single best iterative solver for DDA. Depending on all details of the scattering problem, any of the methods may outperform the others.
Third, dynamic memory allocation and optimized data structures allow all computations, except the FFT, to be performed only for the real (non-void) dipoles and not for the whole computational box. This also decreases ADDA's memory consumption. Moreover, symmetry of the interaction matrix is used to decrease memory required for its Fourier transform. Finally, all float variables in ADDA are represented in double precision. This accelerates convergence in cases when machine precision becomes important. Moreover, basic operations with double-precision numbers can be faster than with single-precision ones on modern processors. This acceleration comes at a cost of increased memory consumption, which is, however, still lower than for other computer codes [4]. More information on ADDA can be found in an extensive manual included in the distribution package.

3 Numerical simulations

3.1 Simulation parameters

In our tests we used ADDA v.0.75, compiled with the Intel C compiler v.9.0 with maximum possible optimizations (default options in ADDA’s makefile). All the tests were run on the Dutch compute cluster LISA,3 using 32 nodes (each with dual Intel Xeon 3.4 GHz processors and 4 GB RAM). LDR was used as the most common polarization formulation. We have tried three different iterative solvers: QMR, Bi-CG, and Bi-CGSTAB. For all of them a default stopping criterion $\varepsilon = 10^{-5}$ was used.

3 http://www.sara.nl/userinfo/lisa/description/

Table 1. Parameters of the numerical simulations.
m     x    λ/md  Number of dipoles (a)  Iterative method  Number of iterations
1.05  20   9.6   2.6×10^5               Bi-CGSTAB         6
1.05  30   9.6   8.8×10^5               Bi-CGSTAB         7
1.05  40   9.6   2.1×10^6               Bi-CGSTAB         9
1.05  60   9.6   7.1×10^6               Bi-CGSTAB         14
1.05  80   9.6   1.7×10^7               Bi-CGSTAB         20
1.05  100  9.6   3.3×10^7               Bi-CGSTAB         27
1.05  130  10.3  9.0×10^7               Bi-CGSTAB         40
1.05  160  9.6   1.3×10^8               Bi-CGSTAB         65
1.2   20   10.5  5.1×10^5               QMR               86
1.2   30   11.2  2.1×10^6               QMR               223
1.2   40   10.5  4.1×10^6               QMR               598
1.2   60   9.8   1.1×10^7               QMR               2120
1.2   80   10.5  3.3×10^7               Bi-CGSTAB         21748
1.2   100  10.1  5.7×10^7               Bi-CGSTAB         6169
1.2   130  10.3  1.3×10^8               Bi-CGSTAB         29200
1.4   20   10.8  8.8×10^5               QMR               1344
1.4   30   10.8  3.0×10^6               QMR               16930
1.4   40   10.8  7.1×10^6               QMR               8164
1.4   60   9.6   1.7×10^7               Bi-CG             127588
1.6   20   11.0  1.4×10^6               QMR               8496
1.6   30   10.5  4.1×10^6               Bi-CG             69748
1.8   20   11.2  2.1×10^6               QMR               28171
1.8   30   10.2  5.5×10^6               Bi-CG             118383
2     20   10.1  2.1×10^6               QMR               58546

(a) This is the total number of dipoles in the rectangular computational grid, which is the main factor determining the computation time of one iteration. For spheres the number of dipoles occupied by the scatterer itself is almost two times smaller.

Fig. 2. Convergence of the QMR iterative solver for the sphere with x = 20 and m = 1.8. The residual as a function of the iteration number is shown. The system of linear equations contains $3\times10^{6}$ unknowns. [Horizontal axis: number of iterations.]

Spheres were used as test objects. Their size parameter x was varied from 20 to 160 and their refractive index m was varied from 1.05 to 2. We limited ourselves to the case of real m. The current capabilities of ADDA are shown as a region of the (x,m)-plane in Fig. 1. The striped region corresponds to full convergence; the densely hatched region corresponds to those cases where ADDA could not fully converge to the required residual norm, but only to $\varepsilon \in (10^{-5}, 10^{-3})$. Although this incomplete convergence probably affects the final accuracy of the scattering quantities only slightly, we remove such results from further consideration because a separate study is required to quantify this effect (see Section 4). For fully converged results, the errors of scattering quantities due to the numerical convergence are much smaller than the total errors (data not shown).

Fig. 3. Total simulation wall clock time (on 64 processors) for spheres with different x and m. Time is shown in logarithmic scale. Horizontal dotted lines corresponding to a minute, an hour, a day, and a week are shown for convenience. [Horizontal axis: size parameter x.]

Fig. 4. Relative errors of the extinction efficiency in logarithmic scale for spheres with different x and m. [Horizontal axis: size parameter x.]

Fig. 5. Same as Fig. 4 but now for the asymmetry parameter.

Fig. 6. Maximum relative errors of S11(θ) in logarithmic scale for spheres with different x and m. [Horizontal axis: size parameter x.]

A complete set of (x,m) pairs, for which ADDA converged, is shown in Table 1. It also shows the number of dipoles per wavelength in the medium (λ/md, where d is the size of the dipole). We tried to keep it equal to 10 according to the “rule of thumb” as formulated by Draine and Flatau [2]; however, it was slightly different because we varied the size of the dipole grid to optimize the parallel efficiency of ADDA.4

4 The best parallel performance is obtained when the grid size is divisible by the number of processors. However, ADDA works with any grid size.

The total number of dipoles in a rectangular computational grid, shown in Table 1, was varied from $2.6\times10^{5}$ to $1.3\times10^{8}$; it can be approximately determined as $(3.18\,mx)^3$. Both memory requirements and computation time of one iteration are proportional to this number. Two dashed lines are shown in Fig. 1 to indicate the memory requirements for different x and m.
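The estimate $(3.18\,mx)^3$ can be traced back to the “rule of thumb” with a short sketch. It assumes the standard definition of the size parameter, x = 2πa/λ for a sphere of radius a, so that the number of dipoles along the sphere diameter is 2a/d = (λ/md)·m·x/π, which equals 3.18mx when λ/md = 10. The function name `grid_dipoles` and the parameter name `dpl` are hypothetical.

```python
import math


def grid_dipoles(x, m, dpl=10.0):
    """Estimate the total number of dipoles in the cubic grid enclosing a sphere.

    x   -- size parameter, assumed here to be 2*pi*a/lambda (sphere radius a)
    m   -- real refractive index
    dpl -- dipoles per wavelength in the medium, lambda/(m*d); 10 is the rule of thumb
    """
    dipoles_per_diameter = dpl * m * x / math.pi  # 3.18*m*x for dpl = 10
    return dipoles_per_diameter ** 3


if __name__ == "__main__":
    # (x, m, lambda/md) for the smallest and largest m = 1.05 spheres in Table 1
    for x, m, dpl in [(20, 1.05, 9.6), (160, 1.05, 9.6)]:
        print(f"x = {x}, m = {m}: about {grid_dipoles(x, m, dpl):.2e} dipoles")
```

For these two rows the estimate stays close to the tabulated $2.6\times10^{5}$ and $1.3\times10^{8}$ dipoles, which also shows how strongly the grid size reacts to small changes in λ/md through the cubic dependence.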
The dashed lines correspond to the typical memory of a modern desktop computer (2 GB) and the maximum total memory used in our simulations (70 GB), respectively. For each sphere we computed the extinction efficiency, the asymmetry parameter, and all Mueller matrix elements in one scattering plane, which is a symmetry plane of the cubical discretization of the sphere. Exact results for the same spheres were obtained using the Mie theory [22]. Spherical symmetry was used by ADDA to get all results from calculations for only one polarization state of the incident field. Therefore computation time is a factor of two smaller than for non-symmetric scatterers with the same x and m. We employed a volume correction to ensure equal volumes of the sphere and its dipole representation [2]. Note, however, that for the very large spheres this correction is extremely small.

3.2 Results

Table 1 shows the iterative solver that provided the best performance for each particular case and the number of iterations to achieve convergence. Fig. 2 illustrates one specific example of convergence of the DDA iterative solver. This is QMR applied to the system of $3\times10^{6}$ linear equations obtained for the sphere with x = 20 and m = 1.8. The total simulation wall clock time t for all particles is shown in Fig. 3. Fig. 4 and Fig. 5 show the relative errors of the extinction efficiency Qext and the asymmetry parameter $\langle\cos\theta\rangle$, respectively. Maximum and root-mean-square (RMS) relative errors of S11 over the whole range of scattering angle are shown in Fig. 6 and Fig. 7, respectively. Errors of other non-trivial Mueller matrix elements behave in a similar way (data not shown).

Fig. 7. Same as Fig. 6 but now for RMS relative errors. [Horizontal axis: size parameter x.]

Fig. 8. DDA results (dotted line) of S11(θ) in logarithmic scale for a sphere with x = 160 and m = 1.05, compared with the results of the Mie theory (solid line). [Horizontal axis: scattering angle θ, deg; inset: 170 to 180 deg.]
DDA results of S11(θ) for a sphere with x = 160 and m = 1.05 are compared with the Mie theory in Fig. 8. The inset shows a magnification of the backscattering region. This is, to the best of our knowledge, the largest particle ever simulated with DDA. Fig. 9 and Fig. 10 show the same comparisons but for x = 60, m = 1.4 and x = 20, m = 2, respectively.

Fig. 9. Same as Fig. 8 but now for x = 60 and m = 1.4. [Horizontal axis: scattering angle θ, deg.]

Fig. 10. Same as Fig. 8 but now for x = 20 and m = 2. [Horizontal axis: scattering angle θ, deg.]

4 Discussion

The convergence of the QMR iterative solver shown in Fig. 2, featuring plateaus and steep descents, is in agreement both with its behavior in general [16] and with particular examples of its application to DDA [20,23]. A distinctive feature of this graph compared to the literature data is that the convergence slows down with iteration number, i.e. the logarithm of the residual norm decreases slower than linearly. This is probably due to the large size of the scatterer and loss of numerical precision (see discussion below).

The total computation times t increase steeply both with x and m (Fig. 3). The time is displayed in a logarithmic scale covering a range from 4 seconds to more than 2 weeks. For m = 1.05, the increase of t with x is mostly due to the increasing number of dipoles needed to model the scatterer, since the number of iterations increases at a slower pace (Table 1). For larger m these two effects are comparable, combining into a very unfavorable scaling, which can be approximately described by a power law $t \approx C(m)\,x^{\alpha(m)}$, where $\alpha > 6$ for $m \geq 1.2$. It should be noted that both the number of iterations and t do not always increase monotonically with x. For example, for x = 80, m = 1.2 and x = 30, m = 1.4 the execution times are unusually high. This may be caused by a large condition number of DDA interaction matrices for these two particular particles.
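The power-law scaling can be illustrated with a rough sketch based on Table 1 alone. Measured times are not tabulated, so the sketch uses a proxy: since the time of one iteration is stated to be proportional to the number of dipoles, the total work is taken as (number of dipoles) × (number of iterations). This proxy, the assignment of the second group of table rows to m = 1.2, and the names `TABLE_1` and `fit_exponent` are assumptions of this example, not part of the original analysis.

```python
import math

# (x, dipoles in the grid, iterations) copied from Table 1
TABLE_1 = {
    1.05: [(20, 2.6e5, 6), (30, 8.8e5, 7), (40, 2.1e6, 9), (60, 7.1e6, 14),
           (80, 1.7e7, 20), (100, 3.3e7, 27), (130, 9.0e7, 40), (160, 1.3e8, 65)],
    1.2: [(20, 5.1e5, 86), (30, 2.1e6, 223), (40, 4.1e6, 598), (60, 1.1e7, 2120),
          (80, 3.3e7, 21748), (100, 5.7e7, 6169), (130, 1.3e8, 29200)],
}


def fit_exponent(rows):
    """Least-squares slope of log(work) versus log(x).

    work = dipoles * iterations is only a stand-in for the wall clock time.
    """
    log_x = [math.log(x) for x, _, _ in rows]
    log_w = [math.log(n * it) for _, n, it in rows]
    mean_x = sum(log_x) / len(log_x)
    mean_w = sum(log_w) / len(log_w)
    sxy = sum((a - mean_x) * (b - mean_w) for a, b in zip(log_x, log_w))
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    return sxy / sxx


if __name__ == "__main__":
    for m, rows in TABLE_1.items():
        print(f"m = {m}: fitted exponent {fit_exponent(rows):.2f}")
```

Under this proxy the fitted exponent is roughly 4 for m = 1.05 and roughly 6 for m = 1.2, in line with the steep scaling described above; the non-monotonic point at x = 80, m = 1.2 is included in the fit and pulls the slope up somewhat.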
Moreover, when the convergence is slow it may suffer from machine precision, the latter determining the limit of x and m for which ADDA will converge at all. Therefore, current size limitations of the DDA for m ≥ 1.2 (see footnote 5) are due to the practically unbearable computation times, and not due to memory requirements. Simulations for larger m are far from the memory limit shown in Fig. 1. Moreover, simply using more processors does not solve the problem. Improving numerical performance is required, e.g. dedicated preconditioning of the iterative solver [15]. On the other hand, extension to larger sizes for m < 1.2 is feasible if more computer resources are available. This facilitates, for example, simulating scattering of visible light by almost all biological cells in suspension.

[...] $\langle\cos\theta\rangle$ (Fig. 5) behave in a similar way. These results are in good agreement with results of other researchers for smaller size parameters [2,13,26], both in terms of the errors themselves and their dependence on m. To express errors on the angular dependencies of S11 we use two integral parameters: the maximum and RMS relative errors (Fig. 6 and Fig. 7, respectively). Although these parameters are not completely objective, as they are significantly influenced by the values of S11 in deep minima, which are completely irrelevant to most real experiments, they do provide a consistent measure of the DDA accuracy. To relate these integral parameters to some other criteria, e.g. visual agreement, three examples are presented in Fig. 8 – Fig. 10. Errors of S11(θ) show the same tendencies as the integral scattering quantities, except that errors for m = 1.05 are relatively large (larger than those for m = 1.2 in the range [...]) and generally decrease with x. This is due to the relative nature of the measured errors and the huge dynamic range of S11(θ) for small refractive indices (see Fig. 8).
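The two integral error measures can be written down as a short sketch. It assumes an unweighted RMS over equally treated sample angles, which the text does not spell out; the function name `relative_error_summary` is hypothetical and the sample values below are made-up illustration numbers, not simulation results.

```python
import math


def relative_error_summary(approx, exact):
    """Return (maximum, RMS) relative error over all sampled scattering angles."""
    if len(approx) != len(exact) or not exact:
        raise ValueError("inputs must be non-empty and of equal length")
    rel = [abs(a - e) / abs(e) for a, e in zip(approx, exact)]
    rms = math.sqrt(sum(r * r for r in rel) / len(rel))
    return max(rel), rms


if __name__ == "__main__":
    # Made-up values: a 1 % error everywhere except one deep minimum,
    # where a tiny absolute deviation becomes a 100 % relative error.
    exact = [100.0, 10.0, 0.01, 5.0]
    approx = [101.0, 10.1, 0.02, 5.05]
    max_err, rms_err = relative_error_summary(approx, exact)
    print(f"max = {max_err:.3f}, RMS = {rms_err:.3f}")
```

The single point in the deep minimum dominates both numbers, which is the sensitivity described above: the measures are consistent, but they weight angles that matter little in most experiments.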
5 The boundary value of m is not well-defined, as it depends on particular hardware and restrictions on computation time; 1.2 is just a convenient value to guide the reader.

Results for smaller size parameters found in the literature [2,26] show a similar increase of errors with m; however, the errors themselves are considerably smaller. For instance, maximum relative errors of S11(θ) for x < 10 [...] $\langle\cos\theta\rangle$ do satisfy the “rule of thumb”; however, this rule does not describe the decrease of errors by two orders of magnitude with decreasing m. The latter can be used to cut down the number of dipoles and hence computation time in cases when only integral scattering quantities need to be calculated for small m. Relative errors of S11(θ) are much larger than those predicted by the “rule of thumb,” which is due to the fact that the latter was derived based on test simulations for x smaller than 10 [2]. See, however, the discussion below on possible changes for complex refractive index and non-spherical shapes. To conclude, the “rule of thumb” has very limited application for the range of x and m considered here. More elaborate empirical functions are required to estimate the number of dipoles needed to reach a prescribed accuracy. They will also allow a more realistic estimate of DDA computational complexity, i.e. the computation time needed to reach a certain accuracy of some scattering quantities for particular x and m. This topic is left for future study.

The test results shown in this paper are limited to real refractive indices and spherically shaped scatterers. In the following we try to generalize our conclusions to complex refractive index and non-spherical shapes. However, we want to stress that this generalization is speculative, and more numerical tests are clearly needed to verify it. It is expected that accuracies of integral scattering quantities should not change significantly for more general cases.
Their accuracy should deteriorate both with increasing real and imaginary parts of the refractive index. The situation for angle-resolved scattering quantities is expected to be different. Large relative errors observed in this paper are due to deep minima that are a consequence of both spherical symmetry and purely real refractive index. It is expected that visual agreement between the DDA results and the exact solution (as shown in Fig. 8 – Fig. 10) should not change significantly for more general cases; however, it will result in smaller relative errors, especially for larger x and smaller m.

5 Conclusion

In this paper we present ADDA, a computer program to simulate light scattering by arbitrarily shaped particles. ADDA can parallelize a single DDA simulation, which allows it not to be limited by the memory of a single computer. Moreover, ADDA is heavily optimized, which allows it to compare favorably with other programs based on DDA when running on a single processor. We showed its capabilities for simulating light scattering by spheres with x up to 160 and m up to 2. The maximum reachable x on a cluster of 64 modern processors decreases rapidly with increasing m: it is 160 for m = 1.05 and only 20-40 (depending on the convergence threshold) for m = 2. This is mostly due to the slow convergence of the iterative solver leading to practically unbearable computation times. It is expected that larger particle sizes can be reached if m has a significant imaginary part. Errors of both integral and angle-resolved scattering quantities show no systematic dependence on x, but generally increase with m. Errors of Qext and $\langle\cos\theta\rangle$ range from less than 0.01 % to 6 %. Maximum and RMS relative errors of S11(θ) are in the ranges 0.2–18 and 0.04–1, respectively.
Error predictions of the traditional “rule of thumb” have very limited application in this range of x and m: it describes the upper limit of errors of Qext and $\langle\cos\theta\rangle$; however, it does not account for the decrease of the errors with decreasing m. Currently, ADDA is capable of simulating light scattering by almost all biological cells in suspension; however, its performance for other cases can be improved. These improvements, left for future work, may include improving the convergence of the iterative solver by preconditioning. It also is desirable to conduct a detailed study of the dependence of the accuracy of the final results on the size of the dipole and convergence thresholds of the iterative solver for different scatterers. Such a study should result in a reduction of the computation time and provide a realistic estimate of DDA complexity over a wide range of x and m.

Acknowledgements

We thank Gorden Videen for critically reading the manuscript and an anonymous reviewer for valuable comments. Our research is supported by the Siberian Branch of the Russian Academy of Sciences through the grant 2006-03.

References

[1] Purcell EM, Pennypacker CR. Scattering and absorption of light by nonspherical dielectric grains. Astrophys J 1973;186:705-714.
[2] Draine BT, Flatau PJ. Discrete-dipole approximation for scattering calculations. J Opt Soc Am A 1994;11:1491-1499.
[3] Yurkin MA, Hoekstra AG. The discrete dipole approximation: an overview and recent developments. J Quant Spectrosc Radiat Transf 2007, doi:10.1016/j.jqsrt.2007.01.034.
[4] Penttila A, Zubko E, Lumme K, Muinonen K, Yurkin MA, Draine BT, Rahola J, Hoekstra AG, Shkuratov Y. Comparison between discrete dipole implementations and exact techniques. J Quant Spectrosc Radiat Transf 2007, doi:10.1016/j.jqsrt.2007.01.026.
[5] Draine BT, Flatau PJ. User guide for the discrete dipole approximation code DDSCAT 6.1. http://xxx.arxiv.org/abs/astro-ph/0409262, 2004.
[6] Hoekstra AG. Computer simulations of elastic light scattering. PhD thesis. University of Amsterdam, Amsterdam, 1994.
[7] Hoekstra AG, Sloot PMA. Coupled dipole simulations of elastic light scattering on parallel systems. Int J Mod Phys C 1995;6:663-679.
[8] Hoekstra AG, Grimminck MD, Sloot PMA. Large scale simulations of elastic light scattering by a fast discrete dipole approximation. Int J Mod Phys C 1998;9:87-102.
[9] Yurkin MA, Semyanov KA, Tarasov PA, Chernyshev AV, Hoekstra AG, Maltsev VP. Experimental and theoretical study of light scattering by individual mature red blood cells with scanning flow cytometry and discrete dipole approximation. Appl Opt 2005;44:5249-5256.
[10] Temperton C. Self-sorting mixed-radix fast Fourier transforms. J Comp Phys 1983;52:1-23.
[11] Frigo M, Johnson SG. FFTW: an adaptive software architecture for the FFT. Proc ICASSP 1998;3:1381-1384.
[12] Draine BT. The discrete-dipole approximation and its application to interstellar graphite grains. Astrophys J 1988;333:848-872.
[13] Draine BT, Goodman JJ. Beyond Clausius-Mossotti: wave propagation on a polarizable point lattice and the discrete dipole approximation. Astrophys J 1993;405:685-697.
[14] Gutkowicz-Krusin D, Draine BT. Propagation of electromagnetic waves on a rectangular lattice of polarizable points. http://xxx.arxiv.org/abs/astro-ph/0403082, 2004.
[15] Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, van der Vorst HA. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. SIAM, 1994.
[16] Freund RW. Conjugate gradient-type methods for linear systems with complex symmetric coefficient matrices. SIAM J Sci Stat Comp 1992;13:425-448.
[17] Flatau PJ. Improvements in the discrete-dipole approximation method of computing scattering and absorption. Opt Lett 1997;22:1205-1207.
[18] Davis PJ, Rabinowitz P. Methods of Numerical Integration. New York: Academic Press, 1975.
[19] Hoekstra AG, Frijlink M, Waters LBFM, Sloot PMA. Radiation forces in the discrete-dipole approximation. J Opt Soc Am A 2001;18:1944-1953.
[20] Rahola J. Solution of dense systems of linear equations in the discrete-dipole approximation. SIAM J Sci Comp 1996;17:78-89.
[21] Fan ZH, Wang DX, Chen RS, Yung EKN. The application of iterative solvers in discrete dipole approximation method for computing electromagnetic scattering. Microwave Opt Tech Lett 2006;48:1741-1746.
[22] Bohren CF, Huffman DR. Absorption and Scattering of Light by Small Particles. New York: Wiley, 1983.
[23] Rahola J. Iterative solution of dense linear systems arising from integral equations. Appl Parall Comput, Lect Not Comp Sci 1998;1541:460-467.
[24] Rahola J. On the eigenvalues of the volume integral operator of electromagnetic scattering. SIAM J Sci Comp 2000;21:1740-1754.
[25] Budko NV, Samokhin AB. Spectrum of the volume integral operator of electromagnetic scattering. SIAM J Sci Comp 2006;28:682-700.
[26] Hoekstra AG, Rahola J, Sloot PMA. Accuracy of internal fields in volume integral equation simulations of light scattering. Appl Opt 1998;37:8482-8497.

ABSTRACT

In this manuscript we investigate the capabilities of the Discrete Dipole Approximation (DDA) to simulate scattering from particles that are much larger than the wavelength of the incident light, and describe an optimized publicly available DDA computer program that processes the large number of dipoles required for such simulations. Numerical simulations of light scattering by spheres with size parameters x up to 160 and 40 for refractive index m = 1.05 and 2, respectively, are presented and compared with exact results of the Mie theory. Errors of both integral and angle-resolved scattering quantities generally increase with m and show no systematic dependence on x. Computational times increase steeply with both x and m, reaching values of more than 2 weeks on a cluster of 64 processors.
The main distinctive feature of the computer program is the ability to parallelize a single DDA simulation over a cluster of computers, which allows it to simulate light scattering by very large particles, like the ones that are considered in this manuscript. Current limitations and possible ways for improvement are discussed. <|endoftext|><|startoftext|>
1 Introduction
2 General framework
3 Various DDA models
3.1 Theoretical base of the DDA
3.2 Accuracy of DDA simulations
3.3 The DDA for clusters of spheres
3.4 Modifications and extensions of the DDA
4 Numerical considerations
4.1 Direct vs. iterative methods
4.2 Scattering order formulation
4.3 Block-Toeplitz
4.4 FFT
4.5 Fast multipole method
4.6 Orientation averaging and repeated calculations
5 Comparison of the DDA to other methods
6 Concluding remarks
Acknowledgements
Appendix. Description of used acronyms and symbols
References

1 Introduction

The discrete dipole approximation (DDA) is a general method to compute scattering and absorption of electromagnetic waves by particles of arbitrary geometry and composition. Initially the DDA was proposed by Purcell and Pennypacker (PP) [1], who replaced the scatterer by a set of point dipoles. These dipoles interact with each other and the incident field, giving rise to a system of linear equations, which is solved to obtain dipole polarizations. All the measured scattering quantities can be obtained from these polarizations. The DDA was further developed by Draine and coworkers [2-5], who popularized the method by developing a publicly available computer code DDSCAT [6]. Later it was shown that the DDA also can be derived from the integral equation for the electric field, which is discretized by dividing the scatterer into small cubical subvolumes. This derivation was apparently first performed by Goedecke and O'Brien [7] and further developed by others (see, for instance, [8-11]). It is important to note that the final equations produced by both lines of derivation of the DDA are essentially the same.
The only difference is that derivations based on the integral equations give more mathematical insight into the approximation, thus pointing at ways to improve the method, while the model based on point dipoles is physically clearer.

The DDA is called the coupled dipole method or approximation by some researchers [12,13]. There are also other methods, such as the volume integral equation formulation [14] and the digitized Green’s function (DGF) [7], which were developed completely independently from PP. However, they were later shown to be equivalent to the DDA [8,15]. In this review we will use the term DDA to refer to all such methods, since we describe them in terms of one general framework. However, it is difficult to separate the DDA unambiguously from other similar methods based on the volume integral equations for the electromagnetic fields, such as the broad range of method of moments (MoM) formulations with different basis and testing functions [16-19]. In our opinion, one fundamental aspect of the DDA is that the solution for the “physically meaningful” internal fields or their direct derivatives, e.g. polarization, plays an integral role in the process. In other words, any DDA formulation can be interpreted as replacing a scatterer by a set of interacting dipoles; this is further discussed in Section 2. An example of a method that is not considered DDA is the MoM with higher-order hierarchical Legendre basis functions [17].

The DDA is a popular method in the light-scattering community and it has been reviewed by several authors. An extensive review by Draine and Flatau [4] covers almost all DDA developments up to 1994. A more recent review by Draine [5] mainly concerns applications and numerical considerations. DDA theory was discussed together with other methods for light scattering simulations in reviews by Wriedt [20], Chiappetta and Torresani [21], and Kahnert [15] and in books by Mishchenko et al. [22] and Tsang et al. [23].
Jones [24] placed the DDA in the context of different methods with respect to particle characterization. However, many important DDA developments since 1994 are not mentioned in any of these manuscripts. Those that are mentioned are usually considered as side-steps, and are not placed into a general framework. Moreover, to the best of our knowledge, numerical aspects of the DDA have never been reviewed extensively; each paper discusses only a few particular aspects. In this review we try to fill this gap.

A general framework is developed in Section 2 to ease the further discussion of different DDA models. This framework is based on the integral equation because it allows a uniform description of all the DDA developments. However, the connection to the physically clearer model of point dipoles is discussed throughout the section. The sources of errors in the DDA formulation are also discussed there.

In Section 3 the physical principles of the DDA are reviewed and results of different models are compared. In Subsection 3.1 different improvements of polarizabilities and interaction terms are reviewed from a theoretical point of view. Different expressions for $C_\mathrm{abs}$ also are discussed. Comparison of simulation results using different formulations is given in Subsection 3.2. Subsection 3.3 covers the special case of a cluster of spheres that allows particular improvements and simplifications. In Subsection 3.4 different significant modifications are reviewed, which do not fall completely into the general framework described in Section 2. Enhancements of the DDA for some special purposes also are discussed.

Different numerical aspects of the DDA are reviewed in Section 4. These are concerned primarily with solving very large systems of linear equations (Subsection 4.1). Subsection 4.2 describes the simplest iterative procedure to solve the DDA linear system, which has a clear physical meaning.
The special structure of the DDA interaction matrix for a rectangular grid and its application to decrease computational costs are described in Subsections 4.3 and 4.4 respectively. General methods to accelerate calculations, which do not require a rectangular grid, are discussed in Subsection 4.5. Subsection 4.6 covers special techniques to increase the efficiency of repeated calculations (e.g. in orientation averaging).

A numerical comparison of the DDA with other methods is reviewed in Section 5; its strong and weak points are discussed. Section 6 concludes the review and discusses future development of the DDA.

2 General framework

The $\exp(-\mathrm{i}\omega t)$ time dependence of all fields is assumed throughout this review. The scatterer is assumed dielectric but not magnetic (magnetic permeability $\mu = 1$). The electric permittivity is assumed isotropic to simplify the derivations; however, extension to arbitrary dielectric tensors is straightforward.¹ The general form of the integral equation governing the electric field inside the dielectric scatterer is the following [8,15]:

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}^\mathrm{inc}(\mathbf{r}) + \int_{V\setminus V_0} \mathrm{d}^3r'\, \mathbf{G}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') + \mathbf{M}(V_0,\mathbf{r}) - \mathbf{L}(\partial V_0,\mathbf{r})\chi(\mathbf{r})\mathbf{E}(\mathbf{r}), \tag{1}$$

¹ In most formulae scalar values can be replaced directly by tensors, but there are exceptions. Extensions of DDA to optically anisotropic scatterers are discussed in Section 3.4.

where $\mathbf{E}^\mathrm{inc}(\mathbf{r})$ and $\mathbf{E}(\mathbf{r})$ are the incident and total electric field at location $\mathbf{r}$; $\chi(\mathbf{r}) = (\varepsilon(\mathbf{r}) - 1)/4\pi$ is the susceptibility of the medium at point $\mathbf{r}$ ($\varepsilon(\mathbf{r})$ is the relative permittivity). $V$ is the volume of the particle, i.e., the volume that contains all points where the susceptibility is not zero. $V_0$ is a smaller volume such that $V_0 \subset V$, $\mathbf{r} \in V_0 \setminus \partial V_0$. $\mathbf{G}(\mathbf{r},\mathbf{r}')$ is the free space dyadic Green’s function, defined as

$$\mathbf{G}(\mathbf{r},\mathbf{r}') = \left[k^2\mathbf{I} + \hat{\nabla}\hat{\nabla}\right]\frac{\exp(\mathrm{i}kR)}{R} = \frac{\exp(\mathrm{i}kR)}{R}\left[k^2\left(\mathbf{I} - \frac{\hat{R}\hat{R}}{R^2}\right) - \frac{1 - \mathrm{i}kR}{R^2}\left(\mathbf{I} - 3\frac{\hat{R}\hat{R}}{R^2}\right)\right], \tag{2}$$

where $\mathbf{I}$ is the identity dyadic, $k = \omega/c$ is the free space wavenumber, $\mathbf{R} = \mathbf{r} - \mathbf{r}'$, $R = |\mathbf{R}|$, and $\hat{R}\hat{R}$ is a dyadic defined as $\hat{R}\hat{R}_{\mu\nu} = R_\mu R_\nu$ ($\mu$ and $\nu$ are Cartesian components of the vector or tensor). $\mathbf{M}$ is the following integral associated with the finiteness of the exclusion volume $V_0$:

$$\mathbf{M}(V_0,\mathbf{r}) = \int_{V_0} \mathrm{d}^3r' \left( \mathbf{G}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') - \mathbf{G}^\mathrm{s}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r})\mathbf{E}(\mathbf{r}) \right), \tag{3}$$

where $\mathbf{G}^\mathrm{s}(\mathbf{r},\mathbf{r}')$ is the static limit ($k \to 0$) of $\mathbf{G}(\mathbf{r},\mathbf{r}')$:

$$\mathbf{G}^\mathrm{s}(\mathbf{r},\mathbf{r}') = \hat{\nabla}\hat{\nabla}\frac{1}{R} = -\frac{1}{R^3}\left(\mathbf{I} - 3\frac{\hat{R}\hat{R}}{R^2}\right). \tag{4}$$

$\mathbf{L}$ is the so-called self-term dyadic:

$$\mathbf{L}(\partial V_0,\mathbf{r}) = -\oint_{\partial V_0} \mathrm{d}^2r'\, \frac{\hat{n}'\hat{R}}{R^3}, \tag{5}$$

where $\hat{n}'$ is an external normal to the surface $\partial V_0$ at point $\mathbf{r}'$. $\mathbf{L}$ is always a real, symmetric dyadic with trace equal to $4\pi$ [25]. It is important to note that $\mathbf{L}$ does not depend on the size of the volume $V_0$, but only on its shape (and the location of the point $\mathbf{r}$ inside it). On the contrary, $\mathbf{M}$ does depend on the size of the volume; moreover, it approaches zero when the size of the volume decreases [8] (if both $\chi(\mathbf{r})$ and $\mathbf{E}(\mathbf{r})$ are continuous inside $V_0$).

When deriving Eq. (1) the singularity of the Green’s function has been treated explicitly, therefore it is preferable to the commonly used formulation [8,15]:

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}^\mathrm{inc}(\mathbf{r}) + \int_V \mathrm{d}^3r'\, \mathbf{G}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}'). \tag{6}$$

Moreover, Yaghjian noted [25] that there exist several methods for treating the singularity in Eq. (6), leading to different results. He also proved that the derivation of Eq. (6) is invalid in the vicinity of the singularity of $\mathbf{G}(\mathbf{r},\mathbf{r}')$. Hence it can be considered correct only if the singularity is then treated in a way similar to that of Lakhtakia [8], resulting in the correct Eq. (1).

Discretization of Eq. (1) is done in the following way [15]. Let $V = \bigcup_{i=1}^{N} V_i$, $V_i \cap V_j = \emptyset$ for $i \neq j$; $N$ denotes the number of subvolumes.² Although the formulation is applicable to any set of subvolumes $V_i$, in most applications standard (equal) cells are used. Then the shape of the scatterer cannot always be described exactly by such standard cells. Hence, the discretization may be only approximately correct. Assuming $\mathbf{r} \in V_i$ and choosing $V_0 = V_i$, Eq. (1) can be rewritten as

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}^\mathrm{inc}(\mathbf{r}) + \sum_{j \neq i} \int_{V_j} \mathrm{d}^3r'\, \mathbf{G}(\mathbf{r},\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') + \mathbf{M}(V_i,\mathbf{r}) - \mathbf{L}(\partial V_i,\mathbf{r})\chi(\mathbf{r})\mathbf{E}(\mathbf{r}). \tag{7}$$

² In the framework of the DDA we usually call a subvolume a dipole.

The set of Eq. (7) (for all $i$) is exact.
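As an illustration of the dyadic Green's function of Eq. (2) and its static limit, Eq. (4), the following minimal sketch evaluates both for a pair of points. It is not taken from any DDA code; the function names `green_dyadic` and `static_green_dyadic` and the nested-list representation of a dyadic are assumptions made for this example only.

```python
import cmath
import math


def green_dyadic(r, r_prime, k):
    """Free-space dyadic Green's function G(r, r') of Eq. (2).

    r, r_prime: 3-component sequences; k: free-space wavenumber.
    Returns a 3x3 nested list of complex numbers (hypothetical layout).
    """
    R_vec = [r[a] - r_prime[a] for a in range(3)]
    R = math.sqrt(sum(c * c for c in R_vec))
    if R == 0.0:
        raise ValueError("G(r, r') is singular at r = r'")
    phase = cmath.exp(1j * k * R) / R
    G = [[0j] * 3 for _ in range(3)]
    for mu in range(3):
        for nu in range(3):
            delta = 1.0 if mu == nu else 0.0
            rr = R_vec[mu] * R_vec[nu] / R**2  # (R_mu R_nu) / R^2
            G[mu][nu] = phase * (
                k**2 * (delta - rr)
                - (1 - 1j * k * R) / R**2 * (delta - 3 * rr)
            )
    return G


def static_green_dyadic(r, r_prime):
    """Static limit (k -> 0) of Eq. (2), i.e. Eq. (4)."""
    return green_dyadic(r, r_prime, 0.0)
```

Two quick consistency checks follow directly from Eqs. (2) and (4): the dyadic is symmetric, and its trace equals $2k^2\exp(\mathrm{i}kR)/R$, which vanishes in the static limit.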
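Further, one fixed point $\mathbf{r}_i$ inside each $V_i$ (its center) is chosen and $\mathbf{r} = \mathbf{r}_i$ is set. In many cases the following assumptions can be made:

$$\int_{V_j} \mathrm{d}^3r'\, \mathbf{G}(\mathbf{r}_i,\mathbf{r}')\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') = V_j\,\mathbf{G}_{ij}\,\chi(\mathbf{r}_j)\mathbf{E}(\mathbf{r}_j), \tag{8}$$

$$\mathbf{M}(V_i,\mathbf{r}_i) = \mathbf{M}_i\,\chi(\mathbf{r}_i)\mathbf{E}(\mathbf{r}_i), \tag{9}$$

which state that the integrals in Eq. (7) depend linearly upon the values of $\chi$ and $\mathbf{E}$ at point $\mathbf{r}_i$. Eq. (7) can then be rewritten as

$$\mathbf{E}_i = \mathbf{E}_i^\mathrm{inc} + \sum_{j \neq i} \mathbf{G}_{ij} V_j \chi_j \mathbf{E}_j + (\mathbf{M}_i - \mathbf{L}_i)\chi_i\mathbf{E}_i, \tag{10}$$

where $\mathbf{E}_i = \mathbf{E}(\mathbf{r}_i)$, $\mathbf{E}_i^\mathrm{inc} = \mathbf{E}^\mathrm{inc}(\mathbf{r}_i)$, $\chi_i = \chi(\mathbf{r}_i)$, $\mathbf{L}_i = \mathbf{L}(\partial V_i,\mathbf{r}_i)$. The usual approximation [15] is to consider $\mathbf{E}$ and $\chi$ constant inside each subvolume:

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}_i, \quad \chi(\mathbf{r}) = \chi_i \quad \text{for } \mathbf{r} \in V_i, \tag{11}$$

which automatically implies Eqs. (8), (9) and

$$\mathbf{M}_i^{(0)} = \int_{V_i} \mathrm{d}^3r' \left( \mathbf{G}(\mathbf{r}_i,\mathbf{r}') - \mathbf{G}^\mathrm{s}(\mathbf{r}_i,\mathbf{r}') \right), \tag{12}$$

$$\mathbf{G}_{ij}^{(0)} = \frac{1}{V_j}\int_{V_j} \mathrm{d}^3r'\, \mathbf{G}(\mathbf{r}_i,\mathbf{r}'). \tag{13}$$

Superscript (0) denotes approximate values of the dyadics. A further approximation, which is used in almost all formulations of the DDA, including e.g. [8], is

$$\mathbf{G}_{ij}^{(0)} = \mathbf{G}(\mathbf{r}_i,\mathbf{r}_j). \tag{14}$$

This assumption is made implicitly by all formulations that start by replacing the scatterer with a set of point dipoles. It is important to note that Eq. (10) and derivations resulting from it require weaker assumptions (Eqs. (8), (9)) than those imposed by Eq. (11) and, moreover, Eq. (14). It is possible to formulate the DDA based on Eq. (10), e.g. the Peltoniemi formulation [26] that is described in Section 3.1. We postulate Eq. (10) as a distinctive feature of the DDA, i.e. a method is called the DDA if and only if its main equation is equivalent to Eq. (10) with any $V_i$, $\chi_i$, $\mathbf{M}_i$, $\mathbf{L}_i$, and $\mathbf{G}_{ij}$.

Kahnert [15] distinguished the DDA from the MoM by the fact that the MoM solves Eq. (10) directly for unknown $\mathbf{E}_i$, while the DDA seeks not the total, but the exciting electric fields

$$\mathbf{E}_i^\mathrm{exc} = \left( \mathbf{I} + (\mathbf{L}_i - \mathbf{M}_i)\chi_i \right)\mathbf{E}_i = \mathbf{E}_i - \mathbf{E}_i^\mathrm{self}, \tag{15}$$

$$\mathbf{E}_i^\mathrm{self} = (\mathbf{M}_i - \mathbf{L}_i)\chi_i\mathbf{E}_i, \tag{16}$$

where $\mathbf{E}_i^\mathrm{self}$ is the field induced by the subvolume on itself. Eq. (10) is then equivalent to

$$\mathbf{E}_i^\mathrm{inc} = \mathbf{E}_i^\mathrm{exc} - \sum_{j \neq i} \mathbf{G}_{ij}\boldsymbol{\alpha}_j\mathbf{E}_j^\mathrm{exc}, \tag{17}$$

where $\boldsymbol{\alpha}_i$ is the polarizability tensor defined as

$$\boldsymbol{\alpha}_i = V_i\chi_i\left( \mathbf{I} + (\mathbf{L}_i - \mathbf{M}_i)\chi_i \right)^{-1}. \tag{18}$$

However, an alternative formulation of the DDA exists [4], seeking a solution for unknown polarizations $\mathbf{P}_i$:

$$\mathbf{P}_i = \boldsymbol{\alpha}_i\mathbf{E}_i^\mathrm{exc} = V_i\chi_i\mathbf{E}_i, \tag{19}$$

$$\mathbf{E}_i^\mathrm{inc} = \boldsymbol{\alpha}_i^{-1}\mathbf{P}_i - \sum_{j \neq i} \mathbf{G}_{ij}\mathbf{P}_j. \tag{20}$$

It is important to note that $\mathbf{P}_i$, defined by Eq. (19), is only an approximation to the polarization of the subvolume $V_i$. This approximation is exact only under the assumption of Eq. (11), while the formulation itself does not require it. The formulation using Eq. (20) can be thought of as an intermediary between the DDA and the MoM as classified by Kahnert [15], therefore revealing complete equivalence of these two formulations. The special structure of the matrix $\mathbf{G}_{ij}$ makes Eq. (20) preferable over Eqs. (10), (17) for finding a numerical solution. This is discussed in Section 4.

Lakhtakia [8] classified strong and weak forms of the DDA as those accounting for or neglecting $\mathbf{M}_i$ respectively. The weak form approaches the strong form when the size of the cell decreases, because $\mathbf{M}_i$ approaches zero. For a cubical cell $V_i$ and with $\mathbf{r}_i$ located at the center of the cell, $\mathbf{L}_i$ can be calculated analytically, yielding [25]

$$\mathbf{L}_i = \frac{4\pi}{3}\mathbf{I}. \tag{21}$$

Using Eq. (18), this results in the well-known Clausius-Mossotti (CM) polarizability (used originally by Purcell and Pennypacker [1]) for the weak form of the DDA:

$$\boldsymbol{\alpha}_i^\mathrm{CM} = \mathbf{I}\alpha_i^\mathrm{CM} = \mathbf{I}\, d^3\, \frac{3}{4\pi}\, \frac{\varepsilon_i - 1}{\varepsilon_i + 2}, \tag{22}$$

where $\varepsilon_i = \varepsilon(\mathbf{r}_i)$, and $d$ is the size of the cubical cell.

After the internal electric fields are determined, the scattered fields and cross sections can be calculated. The scattered fields are obtained by taking the limit $r \to \infty$ of the integral in Eq. (1) (see e.g. [7]):

$$\mathbf{E}^\mathrm{sca}(\mathbf{r}) = \frac{\exp(\mathrm{i}kr)}{-\mathrm{i}kr}\mathbf{F}(\mathbf{n}), \tag{23}$$

where $\mathbf{n} = \mathbf{r}/r$ is the unit vector in the scattering direction, and $\mathbf{F}$ is the scattering amplitude:

$$\mathbf{F}(\mathbf{n}) = -\mathrm{i}k^3(\mathbf{I} - \hat{n}\hat{n})\sum_i \int_{V_i} \mathrm{d}^3r'\, \exp(-\mathrm{i}k\mathbf{r}'\cdot\mathbf{n})\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}'). \tag{24}$$

All other differential scattering properties, such as amplitude and Mueller scattering matrices, and the asymmetry parameter $\langle\cos\theta\rangle$ can be derived from $\mathbf{F}(\mathbf{n})$, calculated for two incident polarizations [27]. Radiation forces also can be calculated [28-30].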
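The step from the general polarizability of Eq. (18) to the CM polarizability of Eq. (22) can be checked numerically. The sketch below treats the isotropic case only, so all dyadics reduce to scalars; the function names are assumptions introduced for this illustration.

```python
import math


def susceptibility(eps):
    """chi = (eps - 1) / (4 pi), as defined after Eq. (1)."""
    return (eps - 1) / (4 * math.pi)


def polarizability_weak_cubic(eps, d):
    """Scalar form of Eq. (18) for a cubical cell of size d in the weak form:
    L_i = 4 pi / 3 (Eq. (21)) and M_i neglected."""
    chi = susceptibility(eps)
    volume = d**3
    return volume * chi / (1 + (4 * math.pi / 3) * chi)


def polarizability_cm(eps, d):
    """Clausius-Mossotti polarizability, Eq. (22)."""
    return d**3 * 3 / (4 * math.pi) * (eps - 1) / (eps + 2)
```

Both functions accept a complex permittivity, e.g. `eps = (1.33 + 0.01j)**2`, and agree to rounding error, which is the algebraic identity behind Eq. (22).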
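Consider an incident polarized plane wave³

$$\mathbf{E}^\mathrm{inc}(\mathbf{r}) = \mathbf{e}^0\exp(\mathrm{i}\mathbf{k}\cdot\mathbf{r}), \tag{25}$$

where $\mathbf{k} = k\mathbf{a}$, $\mathbf{a}$ is the incident direction, and $|\mathbf{e}^0| = 1$. The scattering cross section $C_\mathrm{sca}$ is [27]

$$C_\mathrm{sca} = \frac{1}{k^2}\oint \mathrm{d}\Omega\, |\mathbf{F}(\mathbf{n})|^2. \tag{26}$$

³ DDA can be used for any incident wave, e.g. Gaussian beams [31]; however, we do not discuss this here.

Absorption and extinction cross sections ($C_\mathrm{abs}$, $C_\mathrm{ext}$) are derived [7,14] directly from the internal fields:

$$C_\mathrm{abs} = 4\pi k\sum_i \int_{V_i} \mathrm{d}^3r'\, \mathrm{Im}(\chi(\mathbf{r}'))\,|\mathbf{E}(\mathbf{r}')|^2, \tag{27}$$

$$C_\mathrm{ext} = 4\pi k\sum_i \int_{V_i} \mathrm{d}^3r'\, \mathrm{Im}\left( \chi(\mathbf{r}')\mathbf{E}(\mathbf{r}')\cdot[\mathbf{E}^\mathrm{inc}(\mathbf{r}')]^* \right) = \frac{4\pi}{k^2}\mathrm{Re}\left( \mathbf{F}(\mathbf{a})\cdot\mathbf{e}^{0*} \right), \tag{28}$$

where * denotes a complex conjugate. Conservation of energy necessitates that

$$C_\mathrm{sca} = C_\mathrm{ext} - C_\mathrm{abs}. \tag{29}$$

However, as was noted by Draine [2], use of Eq. (29) for evaluation of $C_\mathrm{sca}$ can lead to larger errors than Eq. (26), especially when $C_\mathrm{abs} \gg C_\mathrm{sca}$.

The easiest way to express Eqs. (24) and (27) in terms of the internal fields in the subvolume centers is to assume Eq. (11), yielding

$$\mathbf{F}^{(0)}(\mathbf{n}) = -\mathrm{i}k^3(\mathbf{I} - \hat{n}\hat{n})\sum_i \chi_i\mathbf{E}_i\int_{V_i} \mathrm{d}^3r'\, \exp(-\mathrm{i}k\mathbf{r}'\cdot\mathbf{n}), \tag{30}$$

$$C_\mathrm{abs}^{(0)} = 4\pi k\sum_i V_i\,\mathrm{Im}(\chi_i)\,|\mathbf{E}_i|^2 = 4\pi k\sum_i \mathrm{Im}(\mathbf{P}_i\cdot\mathbf{E}_i^*). \tag{31}$$

Further approximation of Eq. (30), leaving only the lowest order expansion of the exponent around $\mathbf{r}_i$, leads to

$$\mathbf{F}^{(0)}(\mathbf{n}) = -\mathrm{i}k^3(\mathbf{I} - \hat{n}\hat{n})\sum_i \mathbf{P}_i\exp(-\mathrm{i}k\mathbf{r}_i\cdot\mathbf{n}), \tag{32}$$

which, together with Eq. (28), leads to

$$C_\mathrm{ext}^{(0)} = 4\pi k\sum_i \mathrm{Im}\left( \mathbf{P}_i\cdot\mathbf{E}_i^{\mathrm{inc}*} \right). \tag{33}$$

Eqs. (32) and (33) are identical to those used by Purcell and Pennypacker [1] and then by Draine [2], while expressions for $C_\mathrm{abs}$ (compared to Eq. (31)) are slightly different. These differences are discussed in Subsection 3.1. Unfortunately, many researchers do not specify explicitly how the scattering quantities are obtained from the computed internal fields or polarizations. Those who do usually use Draine’s prescription (Eqs. (26), (32), (33), and (35)).

Errors of the formulation can be classified as associated with the finite cell size $d$ (discretization errors), and with approximating the particle shape with a set of standard cells, e.g. cubical (shape errors).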
Discretization errors result from considering $\mathbf{E}$ constant inside each cell and from the approximate evaluation of $\mathbf{M}_i$ and $\mathbf{G}_{ij}$. Shape errors also can be considered as resulting from the assumption of constant $\chi$ and $\mathbf{E}$ inside bordering cells, which is false since the edge of the particle crosses these cells. On the other hand, shape errors can be viewed as the difference between the results for the exact particle shape and for the shape composed of the set of standard cells. Both errors approach zero when $N \to \infty$, while the geometry of the scatterer and parameters of the incident field are fixed. However, the same does not apply if $kd \to 0$ while $N$ is fixed, i.e. the DDA is not exact in the long-wavelength limit. Moreover, both errors are sensitive to the size of the scatterer in the resonance region (see discussion in Subsection 3.2). The behavior of these errors was studied by Yurkin et al. [32].

3 Various DDA models

3.1 Theoretical base of the DDA

Since the original manuscript by Purcell and Pennypacker [1], many attempts have been made to improve the DDA. The first stage (1988-1993) of these improvements was reviewed by Draine and Flatau [4]. It has been noted [2] that Eq. (22) does not satisfy energy conservation, and results obtained using this formulation do not satisfy the optical theorem. Based on the well-known [33] “radiative reaction” (RR) electric field, a correction to the polarizability for a finite dipole was added [2]:

$$\alpha^\mathrm{RR} = \frac{\alpha^\mathrm{CM}}{1 - (2/3)\mathrm{i}k^3\alpha^\mathrm{CM}}. \tag{34}$$

Draine [2] also proposed the following expression for the absorption cross section:

$$C_\mathrm{abs}^{(0)} = 4\pi k\sum_i \left[ \mathrm{Im}(\mathbf{P}_i\cdot\mathbf{E}_i^{\mathrm{exc}*}) - (2/3)k^3\,\mathbf{P}_i\cdot\mathbf{P}_i^* \right], \tag{35}$$

derived from Eq. (29) applied to a single point dipole. The PP formulation uses Eq. (35) without the second part. It can be verified that Eq. (35) results in zero absorption for any scatterer if the polarizability is of the following form:

$$\boldsymbol{\alpha}_i^{-1} = \mathbf{A}_i - (2/3)\mathrm{i}k^3\mathbf{I}, \quad \mathbf{A}_i^\mathrm{H} = \mathbf{A}_i, \tag{36}$$

where H denotes the conjugate transpose of a tensor.
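The role of the RR correction, Eq. (34), in the energy balance of Eqs. (29), (33), and (35) can be illustrated for the simplest possible case of one isolated dipole. This is only a sketch under stated assumptions: a single scalar dipole, a unit-amplitude incident field so that $\mathbf{E}^\mathrm{exc} = \mathbf{E}^\mathrm{inc}$ and $\mathbf{P} = \alpha\mathbf{e}^0$, and hypothetical function names.

```python
import math


def alpha_cm(m, d):
    """Clausius-Mossotti polarizability, Eq. (22), for refractive index m."""
    eps = m * m
    return d**3 * 3.0 / (4.0 * math.pi) * (eps - 1.0) / (eps + 2.0)


def alpha_rr(m, d, k):
    """CM polarizability with the radiative reaction correction, Eq. (34)."""
    a = alpha_cm(m, d)
    return a / (1.0 - (2.0 / 3.0) * 1j * k**3 * a)


def single_dipole_cross_sections(alpha, k):
    """C_ext (Eq. (33)), C_abs (Eq. (35)) and C_sca (Eq. (29)) for one dipole
    excited by a unit-amplitude plane wave (assumption: no other dipoles)."""
    p = complex(alpha)  # component of P along e0
    c_ext = 4.0 * math.pi * k * p.imag
    c_abs = 4.0 * math.pi * k * (p.imag - (2.0 / 3.0) * k**3 * abs(p) ** 2)
    return c_ext, c_abs, c_ext - c_abs
```

For a real refractive index the RR polarizability satisfies Eq. (36), so `c_abs` vanishes to rounding error and `c_sca` reduces to the dipole scattering cross section $(8\pi/3)k^4|\alpha|^2$.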
For real refractive index $m$, RR and all other expressions specified below result in $\alpha$ satisfying Eq. (36), which makes Eq. (35) clearly favorable over e.g. the PP formulation. It must be noted, however, that the original PP formulation, where CM polarizability was used, also results in zero absorption for real $m$.

The correction in Eq. (34) is $O((kd)^3)$. Several other corrections of $O((kd)^2)$ have been proposed. The first one was proposed by Goedecke and O’Brien [7] and independently in two other manuscripts [34,35]. They started from Eqs. (10)-(12) and used the following simplifying fact for a cubical cell (also valid for spherical cells), resulting from symmetry:

$$\int_{V_i} \mathrm{d}^3R\, \frac{\hat{R}\hat{R}}{R^2} f(R) = \frac{\mathbf{I}}{3}\int_{V_i} \mathrm{d}^3R\, f(R), \tag{37}$$

where the origin is in the center of the cube. Eq. (37) is valid for any $f(R)$ that has a singularity of less than third order for $R \to 0$, i.e. the integrals on both sides are defined. They obtained

$$\mathbf{M}_i^{(0)} = \frac{2}{3}\mathbf{I}k^2\int_{V_i} \mathrm{d}^3R\, \frac{\exp(\mathrm{i}kR)}{R}. \tag{38}$$

By expanding $\exp(\mathrm{i}kR)$ in a Taylor series one can obtain

$$\mathbf{M}_i^{(0)} = \frac{2}{3}\mathbf{I}k^2\left( \int_{V_i} \frac{\mathrm{d}^3R}{R} + \mathrm{i}kd^3 + O(k^2d^4) \right). \tag{39}$$

The remaining integral was evaluated by approximating the cube by a volume-equivalent sphere, resulting in

$$\mathbf{M}_i^{(0)} = \mathbf{I}\left( b_1^\mathrm{DGF}(kd)^2 + (2/3)\mathrm{i}(kd)^3 + O((kd)^4) \right), \tag{40}$$

$$b_1^\mathrm{DGF} = (4\pi/3)^{1/3} \approx 1.611992. \tag{41}$$

An exact evaluation of Eq. (38), obtained without expanding the exponent, for the equivolume sphere with radius $a = d\,(3/4\pi)^{1/3}$ was performed by Livesay and Chen [36] and implemented into the DGF formulation of the DDA by Hage and Greenberg [14,35] and later Lakhtakia [37]:

$$\mathbf{M}_i^{(0)} = (8\pi/3)\,\mathbf{I}\left[ (1 - \mathrm{i}ka)\exp(\mathrm{i}ka) - 1 \right]. \tag{42}$$

In terms of the first two orders of expansion, this yields a result identical to Eq. (40). Finally the polarizability is obtained as

$$\alpha = \frac{\alpha^\mathrm{CM}}{1 - (\alpha^\mathrm{CM}/d^3)\left( b_1^\mathrm{DGF}(kd)^2 + (2/3)\mathrm{i}(kd)^3 \right)}. \tag{43}$$

We denote the method based on Eq. (42) as LAK. Differences between LAK and DGF should be noticeable only for large values of $kd$.

Dungey and Bohren [38], using results by Doyle [39], proposed the following treatment of the polarizability.
First, each cubic cell is replaced by the inscribed sphere, which is called a dipolar subunit, with a higher relative electric permittivity $\varepsilon_\mathrm{s}$ as determined by the Maxwell-Garnett effective medium theory [27]:

$$\frac{\varepsilon_i - 1}{\varepsilon_i + 2} = f\,\frac{\varepsilon_\mathrm{s} - 1}{\varepsilon_\mathrm{s} + 2}, \tag{44}$$

where $f = \pi/6$ is the volume filling factor. Other effective medium theories also may be used [40]. Next, the dipole moment of the equivalent sphere is determined using the Mie theory, and the polarizability is defined as [39]

$$\alpha^\mathrm{M} = \mathrm{i}\frac{3}{2k^3}\alpha_1, \tag{45}$$

where $\alpha_1$ is the electric dipole coefficient from the Mie theory (see e.g. [41]):

$$\alpha_1 = \frac{m_\mathrm{s}\psi_1(m_\mathrm{s}x_\mathrm{s})\psi_1'(x_\mathrm{s}) - \psi_1(x_\mathrm{s})\psi_1'(m_\mathrm{s}x_\mathrm{s})}{m_\mathrm{s}\psi_1(m_\mathrm{s}x_\mathrm{s})\xi_1'(x_\mathrm{s}) - \xi_1(x_\mathrm{s})\psi_1'(m_\mathrm{s}x_\mathrm{s})}, \tag{46}$$

where $\psi_1$, $\xi_1$ are Riccati-Bessel functions; $x_\mathrm{s} = kd/2$ and $m_\mathrm{s} = \sqrt{\varepsilon_\mathrm{s}}$ are the size parameter and the relative refractive index of the equivalent sphere. We denote this formulation for the polarizability as the a1-term method (note that this terminology was introduced later [42]). It has the particular property that $\alpha^\mathrm{M}/\alpha^\mathrm{CM} \to \mathrm{const} \neq 1$ when $kd \to 0$, contrary to all other polarizability prescriptions, for which this ratio approaches 1. It should be noted that the Mie theory is based on the assumption that the external electric field is a plane wave. In most applications of the DDA this is true for the incident electric field, but not for the field created by other subvolumes. Therefore the a1-term method is expected to be correct only for very small cell size. Hence it is not clear whether this method has advantages even compared to CM. On the other hand, this method may be more justified for clusters of small spheres, where each sphere can be considered as a dipole (see Subsection 3.3).

Draine and Goodman [3] pointed out that considering electric fields constant for evaluating integrals over a cell introduces errors of order $O((kd)^2)$. This represents a problem for many polarizability corrections based on integral equations. Draine and Goodman approached this problem from a different angle.
They determined the optimal polarizability in the sense that an infinite lattice of point dipoles with such polarizability would lead to the same propagation of plane waves⁴ as in a medium with a given refractive index. This polarizability was called LDR (Lattice Dispersion Relation) and is, as expected, CM plus high-order corrections. These corrections in turn depend on the direction of propagation $\mathbf{a}$ and the polarization of the incident field $\mathbf{e}^0$:

$$\alpha^\mathrm{LDR} = \frac{\alpha^\mathrm{CM}}{1 - (\alpha^\mathrm{CM}/d^3)\left[ \left( b_1^\mathrm{LDR} + b_2^\mathrm{LDR}m^2 + b_3^\mathrm{LDR}m^2S \right)(kd)^2 + (2/3)\mathrm{i}(kd)^3 \right]}, \tag{47}$$

$$b_1^\mathrm{LDR} \approx 1.8915316, \quad b_2^\mathrm{LDR} \approx -0.1648469, \quad b_3^\mathrm{LDR} \approx 1.7700004, \tag{48}$$

$$S = \sum_\mu \left( a_\mu e_\mu^0 \right)^2. \tag{49}$$

We use a reverse sign convention in the denominator of Eq. (47) and the LDR coefficients as compared to the original paper [3].

Recently it has been shown [43] that the LDR derivation is not completely accurate, since the resulting dipole moment does not satisfy the transversality condition, for which a correction was proposed. This corrected LDR (CLDR) differs principally in the fact that the polarizability tensor cannot be made isotropic but only diagonal [43], though not dependent on the incident polarization:

$$\alpha_{\mu\nu}^\mathrm{CLDR} = \frac{\alpha^\mathrm{CM}\delta_{\mu\nu}}{1 - (\alpha^\mathrm{CM}/d^3)\left[ \left( b_1^\mathrm{LDR} + b_2^\mathrm{LDR}m^2 + b_3^\mathrm{LDR}m^2a_\mu^2 \right)(kd)^2 + (2/3)\mathrm{i}(kd)^3 \right]}. \tag{50}$$

Another flaw of LDR is that it is evidently not correct for dipoles near the particle surface. However, it is not clear how to evaluate the effect of these mistreated surface dipoles on the overall results, e.g. on the scattering cross section.

Further improvement of the DDA was initiated by Peltoniemi [26] (PEL), who showed that the term $\mathbf{M}(V_i)$ in Eq. (7) can be evaluated exactly up to the third order of $kd$ by expanding the term $\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}')$ under the integral in a Taylor series around the point $\mathbf{r}' = \mathbf{r}_i$, yielding

$$M_\mu(V_i) = M_{i,\mu\nu}^{(0)}\chi E_\nu + \frac{1}{2}\int_{V_i} \mathrm{d}^3R\, G_{\mu\nu}(\mathbf{R})\, R_\rho R_\tau\, \partial_\rho\partial_\tau(\chi E_\nu) + O\left( (kd)^4\chi E \right), \tag{51}$$

where $G_{\mu\nu}(\mathbf{R})$ are the components of the dyadic of Eq. (2), summation over repeated indices is implied, and $\chi$, $\mathbf{E}$ and their derivatives are all considered at the point $\mathbf{r}_i$. Eq. (51) is correct up to the third order of $kd$ since the odd-order terms in the Taylor series vanish because of symmetry.
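The polarizability corrections discussed above (the DGF series of Eq. (40), the LAK closed form of Eq. (42), the resulting polarizability of Eq. (43), and LDR of Eqs. (47)-(49)) can be compared with a short scalar sketch. The function names are assumptions made for this example, the sign convention is the one of this review (not of the original LDR paper [3]), and a real polarization vector is assumed in $S$.

```python
import cmath
import math

B1_DGF = (4.0 * math.pi / 3.0) ** (1.0 / 3.0)              # Eq. (41)
B1_LDR, B2_LDR, B3_LDR = 1.8915316, -0.1648469, 1.7700004   # Eq. (48)


def alpha_cm(m, d):
    """Clausius-Mossotti polarizability, Eq. (22)."""
    eps = m * m
    return d**3 * 3.0 / (4.0 * math.pi) * (eps - 1.0) / (eps + 2.0)


def m_term_dgf(kd):
    """Scalar M-term of Eq. (40) without the O((kd)^4) remainder."""
    return B1_DGF * kd**2 + (2.0 / 3.0) * 1j * kd**3


def m_term_lak(kd):
    """Scalar M-term of Eq. (42) for the volume-equivalent sphere."""
    ka = (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0) * kd
    return (8.0 * math.pi / 3.0) * ((1.0 - 1j * ka) * cmath.exp(1j * ka) - 1.0)


def alpha_from_m_term(m, d, m_term):
    """Polarizability in the form of Eq. (43) for a given scalar M-term."""
    a = alpha_cm(m, d)
    return a / (1.0 - a / d**3 * m_term)


def alpha_ldr(m, d, k, a_dir, e0):
    """LDR polarizability, Eqs. (47)-(49); a_dir is the unit propagation
    direction and e0 the (real) unit polarization vector."""
    s = sum((a * e) ** 2 for a, e in zip(a_dir, e0))
    kd = k * d
    corr = (B1_LDR + B2_LDR * m**2 + B3_LDR * m**2 * s) * kd**2 \
        + (2.0 / 3.0) * 1j * kd**3
    a = alpha_cm(m, d)
    return a / (1.0 - a / d**3 * corr)
```

For example, `alpha_from_m_term(1.5, d, m_term_lak(k * d))` gives the LAK polarizability. For small `kd` the DGF and LAK M-terms differ only at order $(kd)^4$, and all prescriptions reduce to CM.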
For spherical $V_i$ of radius $a$, the integrals can be evaluated exactly [26] in a way similar to obtaining Eq. (42), but only terms of less than fourth order of $ka$ are significant, which results in

$$\mathbf{M}(V_i) = \frac{4\pi}{3}a^2\left[ \left( k^2 + \frac{2}{3}\mathrm{i}k^3a \right)\chi\mathbf{E} - \frac{1}{10}\left( \nabla^2(\chi\mathbf{E}) - 3\nabla(\nabla\cdot(\chi\mathbf{E})) \right) \right] + O\left( (ka)^4\chi\mathbf{E} \right). \tag{52}$$

If $\chi$ is constant inside the cell then the Maxwell equations state that

$$\nabla^2\mathbf{E} = -m^2k^2\mathbf{E}, \quad \nabla\cdot\mathbf{E} = 0. \tag{53}$$

Hence Eq. (9) is valid up to the third order of $ka$ and

$$\mathbf{M}_i = (4\pi/3)\,\mathbf{I}\left[ (1 + m^2/10)(ka)^2 + (2/3)\mathrm{i}(ka)^3 \right]. \tag{54}$$

⁴ With a certain direction of propagation and polarization state.

Piller and Martin [44] proposed using sampling theory to evaluate the integrals in Eq. (1). The electric field and the susceptibility are sampled:

$$\chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') = \sum_i h_r(\mathbf{r}' - \mathbf{r}_i)\chi(\mathbf{r}_i)\mathbf{E}(\mathbf{r}_i), \tag{55}$$

where $h_r(\mathbf{r})$ is the impulse response function of an antialiasing filter defined as

$$h_r(\mathbf{r}) = \frac{\sin(qr) - qr\cos(qr)}{2\pi^2r^3}, \tag{56}$$

where $q = 2\pi/d$. Eq. (1) is then transformed to Eq. (10) with the so-called filtered Green’s function, defined as

$$\mathbf{G}_{ij} = \int \mathrm{d}^3r'\, \mathbf{G}(\mathbf{r}_i,\mathbf{r}')\,h_r(\mathbf{r}' - \mathbf{r}_j). \tag{57}$$

Eq. (57) can be viewed as a generalization of Eq. (13). The latter is obtained if a pulse function is considered instead of $h_r$. The integral in Eq. (57) is evaluated analytically [44], taking $V_0$ to be infinitesimally small. The filtered Green’s function does not have a singularity when $\mathbf{r}_i = \mathbf{r}_j$, therefore $\mathbf{M}_i = V_i\mathbf{G}_{ii}$. It was shown that the Fourier spectrum of $\mathbf{E}(\mathbf{r})$ lies on a sphere with radius $m(\mathbf{r})k$, if $m$ is constant in the vicinity of $\mathbf{r}$. Therefore at least two sampling points per wavelength in the scatterer are required. The susceptibility is also filtered, either by a mean value filter or a more complicated one, e.g. a Hanning window. This approach is called FCD (filtered coupled dipoles), and a computer code library for evaluation of the filtered Green’s function is available [45].

Chaumet et al. [11] proposed direct integration of the Green’s tensor (IT) in Eqs. (12), (13).
A Weyl expansion of the Green’s tensor is performed, transforming it to a form allowing efficient numerical computation of the self-term ($\mathbf{M} - \mathbf{L}$). They also proposed a correction to the second term in Draine’s expression for $C_\mathrm{abs}$ (Eq. (35)). Extension of their results to a non-isotropic self-term is

$$C_\mathrm{abs}^{(0)} = 4\pi k\sum_i \left[ \mathrm{Im}(\mathbf{P}_i\cdot\mathbf{E}_i^{\mathrm{exc}*}) + \mathrm{Im}\left( \mathbf{P}_i\cdot(\mathbf{M}_i - \mathbf{L}_i)^*\mathbf{P}_i^* \right)/V_i \right]. \tag{58}$$

The corrected second term is based on the radiation energy of a finite dipole [11], $\mathrm{Im}(\mathbf{P}_i\cdot\mathbf{E}_i^{\mathrm{self}*})$, in contrast to a point dipole used in the derivation of Eq. (35). One can see that Eqs. (58) and (31) are equivalent. Moreover, both of them are equivalent to Eq. (35) if and only if

$$\mathbf{M}_i = \mathbf{A}_i + (2/3)\mathrm{i}k^3V_i\mathbf{I}, \quad \mathbf{A}_i^\mathrm{H} = \mathbf{A}_i. \tag{59}$$

This condition is similar, but not equivalent, to Eq. (36) and is always satisfied for RR, DGF, and LAK. Other polarizability prescriptions satisfy Eq. (59) for real $m$, in which case both Eqs. (58) and (35) result in zero absorption.

Rahmani, Chaumet, and Bryant [46] proposed a new method (RCB) to determine polarizability based on the known solution of the electrostatic problem for the same scatterer. In the static limit the electric field at any point is linearly related to the incident field

$$\mathbf{E}(\mathbf{r}) = \mathbf{C}^{-1}(\mathbf{r})\mathbf{E}^0(\mathbf{r}). \tag{60}$$

Substituting Eq. (60) into Eq. (20) with the static Green’s tensor, one can obtain the polarizability, which would give an exact solution in the static limit, as

$$\boldsymbol{\alpha}_i^\mathrm{RCB} = V_i\chi_i\boldsymbol{\Lambda}_i^{-1}, \tag{61}$$

$$\boldsymbol{\Lambda}_i = \mathbf{C}_i + \sum_{j \neq i} \mathbf{G}^\mathrm{s}(\mathbf{r}_i,\mathbf{r}_j)V_j\chi_j\mathbf{C}_j^{-1}\mathbf{C}_i, \tag{62}$$

where $\mathbf{C}_i = \mathbf{C}(\mathbf{r}_i)$. This static polarizability then replaces the CM polarizability, and the RR (Eq. (34)) is applied to it [46] to obtain the final polarizability for DDA simulations. It was later shown that RCB polarizabilities differ significantly from CM only for dipoles closer than $2d$ to the interface [47]. In their next manuscript [48] Rahmani et al. stated that the previous derivation is correct only if the tensor $\mathbf{C}$ is constant inside the particle (e.g. for ellipsoids), since otherwise the polarizability tensor obtained from Eq.
(61) is generally not symmetric, which is physically impossible in the static case. This shows that a particle with a non-constant $\mathbf{C}$ is not equivalent to any set of physical point dipoles even in the static regime. However, it is equivalent to a set of non-physical dipoles with an asymmetric polarizability. Therefore, the polarizability defined by Eq. (61) formally can be used, by itself or with RR, even when $\mathbf{C}$ is not constant.

Collinge and Draine [47] empirically combined the RCB prescription with CLDR to get the surface-corrected LDR (SCLDR):

$$\boldsymbol{\alpha}^\mathrm{SCLDR} = \boldsymbol{\alpha}^\mathrm{RCB}\left( \mathbf{I} - (\boldsymbol{\alpha}^\mathrm{RCB}/d^3)\mathbf{B} \right)^{-1}, \tag{63}$$

where $\mathbf{B}$ is the correction matrix (analogous to Eq. (50)):

$$B_{\mu\nu} = \left[ \left( b_1^\mathrm{LDR} + b_2^\mathrm{LDR}m^2 + b_3^\mathrm{LDR}m^2a_\mu^2 \right)(kd)^2 + (2/3)\mathrm{i}(kd)^3 \right]\delta_{\mu\nu}. \tag{64}$$

All methods based on the paper by Rahmani et al. [46] are initially limited to very specific shapes of the scatterer (ellipsoids, infinite slabs and cylinders). Expansion of its applicability to other shapes is debatable [48] and would anyway require a preliminary solution of the electrostatic problem for the same shape, which is generally not trivial.

All DDA formulations are schematically depicted in Fig. 1, which also shows interrelations between them. Some formulations can be compared unambiguously in terms of theoretical soundness: one is an improvement of the other, i.e. it employs fewer approximations. Such formulations are depicted in the same column in Fig. 1, while others cannot be compared directly with each other; they give rise to different columns. Comparison between formulations from different columns can be and has been made almost exclusively empirically, by comparing the accuracy of the simulation results (see Subsection 3.2).

All the above techniques are aimed at reducing discretization errors; only a few aim at reducing shape errors. Some of them employ adaptive discretization (different dipole sizes) to better describe the shape of the scatterer (see Subsection 3.4). Another approach is to average susceptibility in boundary subvolumes.
The simplest averaging using the Lorentz-Lorenz mixing rule was proposed by Evans and Stephens [49] for the case of the boundary between the scatterer and its surrounding medium

$$\frac{\chi_i^\mathrm{e}}{1 + (4\pi/3)\chi_i^\mathrm{e}} = f\,\frac{\chi_i}{1 + (4\pi/3)\chi_i}, \tag{65}$$

where $\chi_i^\mathrm{e}$ is the effective susceptibility, and $f$ is the volume fraction of the subvolume actually occupied by the scatterer.

[Fig. 1: flow chart leading from the integral Eq. (1) through discretization (no assumptions) to Eq. (7), then via Eqs. (8), (9) to the general formulation of the DDA, Eq. (20), and via Eq. (11) and Eq. (14) (weak form) to the individual formulations, among them DGF, LAK, CLDR, a1-term, SCLDR, and FCD.]

Fig. 1. Scheme of interrelation between the different DDA models discussed in Section 3.1. Arrows down correspond to assumptions employed. Vertical position of the method qualitatively corresponds to its accuracy (higher = better), however methods in different columns cannot be compared directly.

A more advanced averaging, called the weighted discretization (WD), was proposed by Piller [13]. It modifies the susceptibility and self-term of the boundary subvolume.⁵ The particle surface, crossing the subvolume $V_i$, is assumed linear and divides the subvolume into two parts: the principal $V_i^\mathrm{p}$ that contains the center and a secondary $V_i^\mathrm{s}$, with susceptibilities $\chi_i^\mathrm{p} \equiv \chi_i$, $\chi_i^\mathrm{s}$ and electric fields $\mathbf{E}_i^\mathrm{p} \equiv \mathbf{E}_i$, $\mathbf{E}_i^\mathrm{s}$ respectively. Electric fields are considered constant inside each part and related to each other via a boundary condition tensor $\mathbf{T}_i$:

$$\mathbf{E}_i^\mathrm{s} = \mathbf{T}_i\mathbf{E}_i. \tag{66}$$

⁵ Any subvolume that has non-zero intersection with both the scatterer and the outer medium. All such subvolumes are accounted for.

Then the total polarization of the subvolume can be evaluated as follows:

$$\mathbf{P}_i = \int_{V_i} \mathrm{d}^3r'\, \chi(\mathbf{r}')\mathbf{E}(\mathbf{r}') = V_i^\mathrm{p}\chi_i^\mathrm{p}\mathbf{E}_i^\mathrm{p} + V_i^\mathrm{s}\chi_i^\mathrm{s}\mathbf{E}_i^\mathrm{s} = V_i\boldsymbol{\chi}_i^\mathrm{e}\mathbf{E}_i, \tag{67}$$

$$\boldsymbol{\chi}_i^\mathrm{e} = \left( V_i^\mathrm{p}\chi_i^\mathrm{p}\mathbf{I} + V_i^\mathrm{s}\chi_i^\mathrm{s}\mathbf{T}_i \right)/V_i. \tag{68}$$

The susceptibility of the boundary subvolume is replaced by an effective one. The effective self-term is evaluated directly starting from Eq.
(3), considering χ and E constant inside each part: ( ) ( ) ii rr TrrGrrGrrGrrGM ss3ps3ee ),(),(d),(),(d χχχ ∫∫ ′−′′+′−′′= . (69) Piller [13] evaluated the integrals in Eq. (69) numerically. The final equations are the same as Eq. (20), where polarizabilities are obtained from Eq. (18) using effective susceptibilities and self-terms for boundary subvolumes. Hence, WD does not modify the general numerical scheme. Currently, there are no rigorous theoretical reasons for preferring one formulation over others. However, theoretical analyses of DDA convergence when refining discretization recently conducted by Yurkin et al. [32], showed that IT and WD significantly improve the convergence of shape and discretization errors, respectively. Experimental verification of these theoretical conclusions is still to be performed. Table 1. Accuracy of different DDA formulations for a sphere.a Value Method x a/d y m Error, % Ref. Cext a1-term 1÷2 2÷4 c 0.65 0.85 1.33+0.05i 1.7+0.1i CSec, S11 LAK 9 0.44 0.42 0.51 1.05 1.33+0.01i 2.5+1.4i 0.05, 37 0.5, 35 4, 15 Csca, Cabs DGF ≤3.2c 16 ≤1 4+3i 5, 10÷30 [3] CSec LDR ≤8c 16 ≤0.5, ≤0.1 m-1≤1 1, 2 Csca LDR ≤7c 16 ≤1 ≤0.5, ≤1 2+i 1.5 3, 4 CSec ≤16c 25 ≤1 1.6+0.0008i 2.5+0.02i LDR [51] [4]eCSec LDR any any ≤1 |m|≤3 5 20÷30 Csca LDR ≤10c 16 kd≤0.63 0.69 0.41 0.29 [148] S11 LDR ≤10c 24 kd≤0.42 0.69 0.41 Cext, RMS11S 3.2 Accuracy of DDA simulations Over the years many results on the accuracy of DDA simulations have been published. It is, however, generally hard to systematically compare the relevant manuscripts because they all use different independent parameters, such as the size parameter x, refractive index m, or discretization, as a function of which the error is measured. We will describe discretization by the parameter kdmy = or . The former is used wherever possible; however, in some cases a description of results is more straightforward in terms of y kdmy )Re(Re = Re. 
LDR 20÷160 20÷130 20÷60 20÷30 20÷30 32÷256c 40÷256c 48÷128c 56÷80c 64÷88c 0.61÷0.65d 0.56÷0.64d 0.58÷0.65d 0.57÷0.60d 0.56÷0.62d 0.62d 1.05 0.04, 38 0.4, 23 1, 59 4.4, 56 5.7, 105 2.0, 86 [113] Ψ FCD π, 2π 2.8, 5.6c 1.7 1.5 1 [44] WD-FCD 0.5÷3.2c 1.5÷3.8c 0.9÷1.5c yRe=0.63 |m|<7b |m|<2.5b |m|<4b [10]Ψ IT ≤5.2c CSec ≤2.1c ≤1.1c 8 ≤1 1.5+0.3i 3.5+1.4i 7.1+0.7i Cabs CSec RCB-RR ≤8.2c ≤7.5c ≤5.9c ≤3.4c ≤1.3c 16 ≤1 1.8+0.4i 1.9+i 2.5+i 2.5+4i 7.4+9.4i CSec SCLDR SCLDR ≤7.2c ≤1.5c ≤1.5c 12 ≤0.8 1.33+0.1i 5+4i 5+4i

a All errors are relative. CSec denotes the maximum error over all cross sections, S11 and RMS S11 correspond to the maximum and root mean square errors of S11 over the range of scattering angles, and Ψ is the normalized mean error of the far-field electric fields [44]. In some cases two errors are shown in one cell separated by a comma. They correspond to two values of one of the parameters in the same row.
b Approximate description of the range.
c This value is determined by other values in the same row.
d This value is slightly different for different size parameters.
e This corresponds to the “rule of thumb” for spheres.

Accuracy results for scattering by a sphere are summarized in Table 1. All manuscripts on this subject can be divided into two classes: those that fix x and vary y together with N (or, equivalently, the number of dipoles per sphere radius a/d), and those that fix a/d and vary y together with the size parameter. The former is easier to interpret; the latter is easier to simulate. To facilitate comparison between different methods we provide both x and a/d, although one of them depends on the other. Some additional information on these results follows below.

Draine and Goodman [3] compared RR, DGF, and LDR for cross sections of a sphere with a/d = 16. DGF is generally more accurate than RR. For |m − 1| ≤ 1 LDR gives superior or comparable results to DGF, for m = 2 + i LDR and DGF are comparable, and for m = 4 + 3i DGF is preferable over LDR.

In the review of LDR DDA, Draine and Flatau [4] summarized that for |m| ≤ 2 cross sections can be evaluated to accuracies of a few percent provided y ≤ 1 (cf. Table 1). In that case differential cross sections have satisfactory accuracy: relative errors are up to 20-30%, but only where the absolute value of the differential cross sections is small. For spheres, such results are obtained even for |m| ≤ 3.

Comparison of CLDR to LDR [43] revealed only minor differences. Generally CLDR results in slightly better accuracy for Csca, but worse for Cabs.

Piller and Martin [44] compared FCD to LAK by studying the dependence of the mean relative error of the far-field electric fields (Ψ) on y for spheres with x = π, 2π and m = 1.5. It was shown that FCD (with a Hanning window filter for the electric permittivity ε) is roughly 3 times more accurate than LAK in the range 0.7 ≤ y ≤ 2.5 and gives similar accuracies for y ≤ 0.4 (for larger spheres).

Comparison of WD to traditional methods [13] was performed for spheres with x = π, 2π and m = 1.32, 2.1 + 0.7i. LAK was used to determine polarizabilities. For m = 1.32 in the range 0.4 ≤ y ≤ 1.3 overall accuracy was only slightly improved, but error peaks for certain values of y were smoothed out. For m = 2.1 + 0.7i accuracy was improved 4-5 times over the whole range y ≤ 1.3. Piller also showed [10] that a combination of WD and FCD gives even better results. Generally FCD decreases the negative effects of Re(ε) on accuracy and WD those of Im(ε).

Rahmani et al. [48] showed that RCB was clearly superior to CM in calculating cross sections for fixed a/d = 16 and m from 1.8 + 0.4i to 7.4 + 9.4i in the range y ≤ 1 (cf. Table 1). Two corrections (LDR and RR) over the static case were compared, and they gave similar overall results. Improvement of overall accuracy compared with CM was 2-5 times in all cases studied.
For a thin slab, it was shown [46,48] that the internal fields calculated using RCB differ from those by CM mostly near the interfaces, where RCB yields much smaller errors, almost the same as far from interfaces.

Collinge and Draine [47] compared LDR, RCB, and SCLDR in calculations of cross sections of spheres with a/d = 12. It was shown that for m = 1.33 + 0.01i, LDR and SCLDR are superior in the range y ≤ 0.8, while for m = 5 + 4i, SCLDR and RCB are superior. Convergence of cross sections for spheres and ellipsoids for increasing N with fixed x and different m (from 1.33 + 0.01i to 5 + 4i) also was studied. SCLDR showed the most stable results for all cases, being the most or close to the most accurate one; however, for ellipsoids with large Im(m) RCB gave significantly more accurate results for Csca, especially for larger y.

Performance of the DDA for more complex shapes also was studied by different authors. Flatau et al. [50] compared DDA simulations for a bisphere with an exact solution from a multipole expansion. For m = 1.33 + 0.01i, a/d = 16, and y ≤ 0.8, LDR was several times more accurate than DGF and resulted in errors of less than 0.5% for both Csca and Cabs. Xu and Gustafson [51] made a similar but much more extended study of LDR. For m = 1.6 + 0.008i, a/d = 25, and y ≤ 0.4, errors in Cext, Cabs, and ⟨cos θ⟩ are within 10%. For y = 0.81, errors in the angular dependence of S11 are up to 20% while S12 and S21 were completely wrong. For m = 2.5 + 0.02i, errors in cross sections exceed 10% for y ≥ 0.3. Errors in the angular dependencies of the Mueller matrix elements are within 10-20% for y = 0.3 and increase rapidly with increasing y. For a fixed x = 3 and m = 1.6 + 0.004i, errors in Cext, Cabs, and ⟨cos θ⟩ decrease from 10% to 1% as y decreases from 1 to 0.2. For y = 0.33, the angular dependence of S11 is in good agreement with the rigorous solution, while S12 and S21 differ significantly for certain orientations of the bisphere.
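The sphere studies above are quoted in terms of three interdependent descriptors: the size parameter x, the number of dipoles per radius a/d, and the discretization parameter y. The following minimal sketch shows how they are linked, assuming y = |m|kd as the definition used in this section, x = ka, and vacuum as the outer medium. The function names are illustrative and are not taken from any DDA code.

```python
import math


def discretization_parameter(x, a_over_d, m):
    """Return y = |m| k d for a sphere with size parameter x = k a.

    a_over_d is the number of dipoles per sphere radius, so k d = x / (a/d).
    m is the (possibly complex) refractive index.
    """
    kd = x / a_over_d
    return abs(m) * kd


def dipoles_per_wavelength(y):
    """Number of dipoles per wavelength inside the medium, 2*pi / y.

    Assumes the |m|-based definition of y (not the Re(m)-based y_Re).
    """
    return 2.0 * math.pi / y


# Sphere with x = 5, m = 1.5, discretized with 16 dipoles per diameter (a/d = 8).
y_example = discretization_parameter(x=5.0, a_over_d=8.0, m=1.5)

# Discretization with y = 0.63 in terms of dipoles per wavelength.
n_per_wavelength = dipoles_per_wavelength(0.63)
```

With these assumptions the example sphere has y of about 0.94, and y = 0.63 corresponds to roughly 10 dipoles per wavelength in the medium.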
Hage and Greenberg [14] compared LAK to experimental results obtained from microwave experiments on porous cubes. Using m = 1.362 + 0.005i, y = 0.64 and N = 5504, they obtained a difference of less than 40% with the experimental results of angular scattering patterns, except for deep minima. Light scattering of cubes, tiles, and cylinders with similar parameters also was studied and comparable differences between experiment and theory were obtained. Theoretical errors were estimated to be less than 10%, except for deep minima.

Iskander et al. [34] conducted a limited test of LAK for small elongated spheroids, comparing the results to those obtained using an iterative extended boundary condition method. Using …, calculations were performed for aspect ratios up to 20 with the maximum size parameter of the long axis being 10 and 0.5 for m = 1.33 + 0.01i and 1.76 + 0.28i, respectively. Errors in scattering cross section were 21% and 11%, respectively. Ku [52] compared LAK with CM and the a1-term for different shapes, but his conclusions are based on a large parameter y (up to 2), and are therefore questionable and not further discussed here.

Andersen et al. [53] studied the performance of the DDA for Rayleigh-sized clusters of a few spheres (most DDA formulations are then equivalent to CM). Several constituent materials were tested, all with high refractive indices in the studied region. It was shown that the DDA failed to converge using the fixed computational resources for very high (up to 13.0) and very low (down to 0.12) Re(m); up to 30 dipoles were used per diameter of a single sphere.

It can be concluded that particles with more complex shapes than spheres are more difficult to model with the DDA, leading to larger errors for the same m and y. This effect can be explained in general by the increase of the surface-to-volume ratio and hence a larger fraction of boundary subvolumes [32]. Another possible reason is complex regions, e.g.
contact between two particles in a cluster, where rapid variation of the electric field deteriorates the overall accuracy. There is, however, a notable exception from this general tendency. Shapes that can be modeled exactly by a set of cubical dipoles, e.g. a cube, can be simulated using the DDA much more accurately than spheres, especially for small y [32].

Draine and Flatau [4] have introduced a “rule of thumb” for discretization: use 10 dipoles per wavelength in the medium (i.e. either y or yRe equal to 0.63, depending on the interpretation). Though it is widely used, the accuracy of the results, when using such discretization, is hard to deduce a priori. Draine and Flatau themselves derived an estimate of the error based on a set of test simulations. This estimate is described above and mentioned in Table 1; it is usually cited as a “few percent accuracy in cross sections.” However, it may significantly over- or under-estimate the error, especially for large size parameters. Moreover, it does not completely account for the dependence on m, even in the stated range of its application (|m| ≤ 2), since DDA accuracy deteriorates rapidly with increasing m (see Table 1). Still, the rule of thumb is a good first guess for many applications.

Most studies of DDA accuracy are limited to integral scattering quantities and, at most, the angular dependence of S11. In only a few manuscripts are other scattering quantities studied. For instance, Singham [54] simulated the angular dependence of Mueller matrix element S34 for spheres and less compact particles, using CM polarizability. It was shown that an accurate simulation of this element requires smaller values of y than for S11. For x = 1.55 and m = 1.33 a calculation of S11 was accurate already for y = 0.8, while y ≤ 0.2 was required for S34. It was also reported that for less compact objects like discs and rods, the required y was larger, 0.4 and 0.55 respectively, because of the smaller interaction between the dipoles.
However, Hoekstra and Sloot argued [55] that this effect is mostly caused by the pronounced S34 sensitivity to surface roughness, which is significant for smaller size if y is fixed. They showed that for x = 10.7 and m = 1.05, very high accuracy is achieved with y = 0.66 because of the larger number of dipoles used.

Internal fields are an intermediate result in the DDA. They cannot be directly compared to the experimental results; however, all measured scattering quantities are derived from them. Therefore, a study of their accuracy can give greater insight into the nature of DDA errors. Hoekstra et al. [56] performed such a study for LAK polarizability. Three spheres were examined with x = 9, 9, 5 and m = 1.05, 1.33 + 0.01i, 2.5 + 1.4i, respectively. Values of y were 0.44, 0.42, and 0.51 respectively. The most significant errors in the amplitude of the internal field were localized at the boundary of the spheres with maximum relative errors of 3.4%, 19%, and 120% respectively. Errors in S12, S33, S34 were significant only for the third sphere. It was shown that for a given yRe these errors rapidly increase with m but only slightly depend upon x in the range from 1 to 10. Moreover, the DDA is capable of reproducing resonances of Mie theory, although their positions are slightly shifted (less than 1% in m).

Druger and Bronk [57] studied the accuracy of the internal fields for single and coated spheres. They used x = 1.5, m ≤ 1.8, and CM polarizability. Errors in the internal fields were localized at the interfaces, with average errors larger than 30% for a single sphere with m = 1.8 and y = 0.17, and less than 7% for a single and concentric sphere with m = 1.3 and y = 0.08. The core of the concentric sphere has m = 1.1 and its diameter is half the total diameter. The angular dependence of the absolute values of S1 and S2 had significant errors in the side- and backscattering.
It can be concluded that shape errors contribute mostly to the internal fields near the boundary, and increase with m.

All the literature discussing DDA accuracy shows errors as a function of input parameters and discretization, which is the most straightforward way. The only exception so far is the rule of thumb, which is too general and approximate to be applied in many particular cases. A more useful way to present errors is to fix the desired accuracy for certain input parameters and find the discretization that results in such accuracy. Such an analysis can be applied directly to practical calculations and can be used to derive rigorous estimates of DDA computational requirements [58].

In a number of manuscripts the origin of errors in the DDA was examined to try to separate and compare shape and discretization errors [49,59-62]; however, no definite conclusions were reached. The uncertainty was due to the indirect methods used, which have inherent interpretation problems. Recently, Yurkin et al. [63] proposed a direct method to separate shape and discretization errors, which can be used to study their fundamental properties. This method also can be applied to study the performance of different formulations aimed at decreasing shape errors, e.g. WD. For example, it has been shown that the maximum errors of S11(θ) for a sphere with x = 5 and m = 1.5, discretized using 16 dipoles per diameter (y = 0.93), are mostly due to shape errors. However, the same is not true for all measured quantities. In another manuscript [32] it was suggested that the discretization error should decrease more rapidly with decreasing y than shape errors. However, it is still hard to deduce a priori the importance of shape errors for a certain scatterer and y; hence, further systematic quantitative study is required.

3.3 The DDA for clusters of spheres

There are two main peculiarities when the DDA is applied to clusters of spheres.
First, such particles are generally less compact, yielding smaller interactions between dipoles. This leads to a smaller condition number of the DDA interaction matrix and hence faster convergence of the iterative solver (see Section 4.1). Second, when the constituent spheres are small compared to the wavelength, each sphere can be modeled as one spherical subvolume, yielding some theoretical simplifications. A general theory exists [64] based on the Mie theory (generalized multiparticle Mie solution (GMM) [65]) that allows for highly accurate simulations of clusters of spheres. However, when many small spheres are used one wants to minimize the number of unknowns in the linear system. Direct reduction of the GMM to the lowest order (using only the first order expansion coefficients) leads to DDA + CM [64]. Improving accuracy in the GMM is done by accounting for higher multipole moments, while the DDA introduces higher order corrections to the coefficients of the linear system. It is not clear how the accuracy of these two methods compare with each other; however, the former should lead to a formulation similar to a coupled multipole method (Subsection 3.4) with a larger number of unknowns. DDA-based methods (starting usually with the integral equations introduced in Section 2) should be successful in making the formulation more accurate without increasing the number of unknowns, which is the goal for large clusters of small spheres. Moreover, the DDA may employ fast algorithms for solving the linear system. In this setting, the fast multipole method (FMM) (see Subsection 4.5) seems most promising. It should be noted, however, that a cluster having a small size parameter (i.e. in the electrostatic approximation) does not imply that all expansion coefficients, except the first one, are negligible. 
This is because the size of the constituent particles is also very small and the fields inside them are far from constant, especially when the spheres are located close to each other and have large refractive indices [66]. Therefore, the DDA does have some fundamental difficulties in calculating scattering by clusters of spheres. Mackowski [67], for instance, found that for some systems composed of spheres much smaller than the wavelength, up to 10 expansion terms were necessary to achieve convergence. In studies of osculating spheres, Ngo et al. [68] showed that the GMM could be chaotic, calculated Lyapunov exponents, and found that the slow convergence for the touching spheres was the result of the system lying in an attractor region. A recent paper by Markel et al. [69] presented computationally efficient modifications of the GMM in the static limit and demonstrated the insufficiency of the DDA to compute scattering properties of fractal aggregates accurately. However, Kim et al. [70] showed that the DDA is satisfactory in calculating the static polarizability of dielectric nanoclusters, especially of clusters with a large number of constituents.

The development of DDA-based methods for calculating light scattering by clusters of small spheres was started by Jones [71,72], who developed a method similar to CM. Iskander et al. [34] used a method equivalent to LAK to calculate scattering of chained aerosol clusters. This subject was further investigated by Kozasa [73,74]. Lou and Charalampopoulos [75] (LC) further improved the calculations of the interaction term and scattering quantities. Starting from an integral equation for the internal field equivalent to Eq. (1), they assumed Eq. (11). After that the integrals in Eqs. (12) and (13) over spherical subvolumes can be evaluated analytically. The result for the interaction term is the following:

$$\mathbf{G}_{ij}^{(0)} = \eta(ka)\,\mathbf{G}(\mathbf{r}_i,\mathbf{r}_j) ,$$ (70)

where a correction function η is defined as

$$\eta(x) = 3\,\frac{\sin x - x\cos x}{x^{3}} = 1 - \frac{1}{10}x^{2} + O(x^{4}) .$$
(71)

Eq. (30) also is evaluated analytically, yielding

$$\mathbf{F}^{(0)}(\mathbf{n}) = -\mathrm{i}k^{3}\eta(ka)\,(\mathbf{I}-\hat{n}\hat{n})\sum_i \mathbf{P}_i\exp(-\mathrm{i}k\,\mathbf{r}_i\cdot\mathbf{n}) ,$$ (72)

$$C_{\mathrm{ext}}^{(0)} = 4\pi k\,\eta(ka)\sum_i \mathrm{Im}\left(\mathbf{P}_i\cdot\mathbf{E}_i^{\mathrm{inc}*}\right) .$$ (73)

The following expression for Cabs is stated without derivation:

$$C_{\mathrm{abs}} = 4\pi k\,\eta(ka)\sum_i \mathrm{Im}\left(\mathbf{P}_i\cdot\mathbf{E}_i^{*}\right) .$$ (74)

Markel et al. [76] applied the DDA to fractal clusters of spheres, and studied their optical properties. However, they did not fix the polarizability of a single dipole but rather treated it as a variable, calculating the dependence of a cluster’s optical characteristics upon it.

Pustovit et al. [77] argued that the DDA is inaccurate for touching spheres. They developed a hybrid of the DDA and the GMM, which considers only pair interactions between spheres (as in the DDA) but, when calculating them, accounts for higher multipole terms. This formulation can be considered as one providing a more accurate evaluation of the interaction term (Eq. (13)), and hence similar to LC.

LC was compared to DGF and LAK in a Csca computation of a cluster of 10 particles for m = 1.7 + 0.7i and 0.05 ≤ ka ≤ 0.5. Differences between DGF and LAK are less than 1% (as expected), while the difference between LC and LAK increases quadratically with ka, reaching 10% for ka = 0.5. However, as no exact (e.g. GMM) solution is presented, the accuracy of each individual method is not clear.

Okamoto [42] tested the a1-term method for clusters of up to 3 touching spheres. No effective medium is needed in this case, making the method sounder. It was shown that the a1-term is clearly superior to LDR in cross-section calculations, when each sphere is treated as a single dipole. Errors of the a1-term are less than 10% for y ≤ 1.2 when m = 1.33 + 0.01i. For three collinear touching spheres the errors are 30% and 40% for y ≤ 1.9 and 2.8 when m = 1.33 + 0.01i and 2 + i respectively. However, errors do not seem to diminish significantly for small y (results are presented only down to y = 0.2).
Therefore, the a1-term seems suitable for obtaining quick crude estimations of cross sections. In the sequel of this subsection we mention several applications of the DDA to scattering from clusters of spheres. It was applied to describe the scattering by astrophysical dust aggregates [78,79] using the a1-term method. Hull et al. [80] applied CM DDA to Diesel soot particles. LC was applied [81] to the computation of light scattering by randomly branched chain aggregates. Lumme and Rahola [40] studied scattering properties of clusters of large spheres (each modeled by a set of dipoles) with the a1-term method considering astrophysical applications. Hage and Greenberg [35] studied scattering by porous particles, which were modeled as clusters of cubical cells making their method equivalent to standard LAK. Recently the DDA with LDR was used [82] to model scattering by porous dust grains and compare them to approximate theories, e.g. effective medium theories. It also was used to study light scattering by fractal aggregates [83], especially its dependence on the internal structure [84]. 3.4 Modifications and extensions of the DDA Bourrely et al. [85] proposed to use small d to minimize surface roughness, but larger dipoles inside the particle. Starting with small dipoles with CM polarizability, one dipole is combined with 6 adjacent ones (if they all have the same polarizability) producing a dipole, located at the same point but with a 7 times larger polarizability. This operation is repeated while possible. Interaction terms are considered in their simplest form (Eq. (14)). This method allows the decrease of the shape errors with only a minor increase in the number of dipoles. The authors showed that this method is more than two times more accurate than CM for some test cases. Rouleau and Martin [86] proposed a generalized semi-analytical method. A dynamic grid is used to evaluate the integral in Eq. (1). First, a static grid is built inside the particle. 
Then each point on the static grid is used as an origin of a spherical coordinate system, and the particle is approximated by an ensemble of volume elements in these spherical coordinates. As usual, the polarization inside each subvolume is assumed constant, but Eq. (13) can be evaluated analytically in spherical coordinates. Polarization inside a subvolume is obtained by interpolation of its values at the points of the static grid. In addition, adaptive gridding is employed, where smaller subvolumes are used at the boundary of the particle. Mulholland et al. [87] proposed a coupled electric and magnetic dipole method (CEMD), where a magnetic dipole is considered at each subvolume together with an electric dipole. Polarizabilities are derived from the a1 and b1 terms of the Mie theory. CEMD requires two times more variables in the linear system, since the electric and magnetic fields are interconnected. Lemaire [88] went further and developed the coupled multipole method, considering also the electric quadrupole. Addition of the electric quadrupole can be considered as a more accurate evaluation of the interaction term in Eq. (13), as compared to Eq. (14). It results in even better accuracy than CEMD, but at the expense of additional computation time. The major disadvantage of all these four methods is that the matrix of the system of linear equations does not seem to have any special form, suitable for faster algorithms (see Section 4). Therefore computational costs are much larger compared to regular methods, thus limiting their practical use. In what follows, several DDA extensions are mentioned without further discussion. The theoretical basis for application of the DDA to optically anisotropic particles was summarized by Lakhtakia [89]. Loiko and Molochko [90] applied the DDA to study light scattering by liquid-crystal spherical droplets. Smith and Stokes [91] used the DDA to calculate the Faraday effect for nanoparticles. 
Researchers in the electrical engineering community applied MoM (in a variation that is equivalent to the DDA) to anisotropic scatterers [92,93]. Rectangular parallelepipeds can be used as subvolumes in the DDA [11,23,43]. This allows an accurate description of light scattering by particles with large aspect ratios, using fewer dipoles and is also compatible with FFT techniques (Subsection 4.4). Khlebtsov [94] proposed a simplification of the DDA, based on the assumption that all polarizations are parallel to the incident electric field. The number of variables is thus reduced three times, however at a cost of accuracy. Moreover, depolarization is completely ignored. Markel [95] analytically solved the DDA equations for scattering by an infinite one- dimensional periodic dipole array. This approach is similar to the one used in obtaining the LDR formulation for dipole polarizability [3]. Chaumet et al. [96] generalized the DDA to periodic structures, and further to defects in a periodic grating on a surface [97]. The idea of using the complex Green’s tensor in the standard DDA formulation was summarized by Martin [98]. Yang et al. [99] used the DDA to calculate surface electromagnetic fields and determine Raman intensities for small metal particles of arbitrary shape. Lemaire and Bassrei [100] showed that the shape of an object can be reconstructed from the measured angle dependence of scattered intensities. This procedure can be thought of as an inversion of the dependence between dipole polarizabilities and scattering. This dependence is taken from the DDA. A similar idea is used in recent manuscripts on optical tomography [101-103]. Zubko et al. [104] modified the Green’s tensor used in the DDA to study the backscattering of debris particles. They showed that the far-field part of the Green’s tensor is responsible for both the backscattering brightness surge and the negative polarization branch. 
4 Numerical considerations

In this section the numerical aspects of the DDA are discussed. One should keep in mind, however, that final simulation times depend not only on the chosen numerical methods but also on the particular implementation. Recently, Penttila et al. [105] have compared four different computer programs for the DDA. These are based on almost identical numerical methods: a Krylov-subspace iterative method (Section 4.1) combined with FFT acceleration of the matrix-vector product (Section 4.4). However, simulation times may differ severalfold. Optimizations of computer codes are not further discussed in this review.

4.1 Direct vs. iterative methods

There are two general types of methods to solve linear systems of equations Ax = y, where x is an unknown vector and A and y are a known matrix and vector, respectively: direct and iterative [106]. Direct methods give results in a fixed number of steps, while the number of iterations required in iterative methods is generally not known a priori. The most common example of a direct method is LU decomposition, which allows quick solving for multiple y once the decomposition is performed. Iterative methods are usually faster, less memory consuming and numerically more stable. However, iterative methods cannot be considered superior to direct ones in general, since their performance strongly depends on the problem to be solved [107]. For a general n×n matrix (in the DDA n = 3N) the computation time of LU decomposition is $O(n^3)$ and the storage requirements are $O(n^2)$, while the computation time for one iteration is $O(n^2)$ [107]. Iterative methods for a general matrix converge in O(n) iterations, although some of them may not converge at all. However, in many cases satisfactory accuracy can be obtained after a much smaller number of iterations. In these cases, iterative methods can provide significant increases in speed, especially for large n.
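The scaling argument above can be made concrete with a back-of-the-envelope estimate. The sketch below is only an order-of-magnitude illustration: it keeps the leading powers of n quoted in the text and drops all constant factors, and the function name and the example numbers (10,000 dipoles, 100 iterations) are hypothetical.

```python
def dda_cost_estimate(num_dipoles, num_iterations):
    """Leading-order cost of dense direct vs. iterative solution of Ax = y.

    Constant prefactors are deliberately omitted, so only ratios and
    scaling trends are meaningful.
    """
    n = 3 * num_dipoles  # three field components per dipole
    return {
        "unknowns": n,
        "lu_operations": n ** 3,            # O(n^3) for LU decomposition
        "matrix_entries": n ** 2,           # O(n^2) storage of the full matrix
        "iterative_operations": num_iterations * n ** 2,  # O(n^2) per iteration
    }


estimate = dda_cost_estimate(num_dipoles=10_000, num_iterations=100)
# Ratio of direct to iterative cost; equals n / num_iterations at this level.
speedup = estimate["lu_operations"] / estimate["iterative_operations"]
```

At this level of approximation the iterative solver wins whenever the number of iterations is much smaller than n, which is the regime described in the paragraph above.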
Most iterative methods access the matrix A only through matrix-vector multiplication (sometimes also with the transposed matrix), which allows the construction of special routines for calculation of these products. Such routines may decrease memory requirements, since it is no longer necessary to store the entire matrix, especially for matrices of special form (see Subsection 4.3). A special structure of the matrix may also allow acceleration of the matrix-vector product from $O(n^2)$ to $O(n\ln n)$ (see Subsections 4.4, 4.5). However, the same applies to direct methods (see Subsection 4.3).

Throughout DDA history, mostly iterative methods were employed (however, see Subsection 4.6). At first, they were used to accelerate computations [1], but they also allowed larger numbers of dipoles to be simulated [6,108], since storage of the entire matrix is prohibitive for direct methods. The most widely used iterative methods in the DDA are Krylov-subspace methods, such as [107] conjugate gradient (CG), CG applied to the Normal equations with minimization of the Residual norm (CGNR), Bi-CG, Bi-CG stabilized (Bi-CGSTAB), CG squared (CGS), generalized minimal residual (GMRES), quasi-minimal residual (QMR), transpose-free QMR (TFQMR), and generalized product-type methods based on Bi-CG (GPBi-CG) [109].

An important part of the iterative solver is preconditioning, which effectively decreases the condition number of the matrix A and therefore speeds up convergence. However, this requires additional computational time during both initialization and each iteration. Preconditioning of the initial system can be summarized as [107]

$$\mathbf{M}_1^{-1}\mathbf{A}\mathbf{M}_2^{-1}(\mathbf{M}_2\mathbf{x}) = \mathbf{M}_1^{-1}\mathbf{y} ,$$ (75)

where $\mathbf{M}_1$ and $\mathbf{M}_2$ are left and right preconditioners, respectively. Preconditioners should either allow fast inversion or be integrated into the iteration process. The simplest preconditioner of the first type is the Jacobi (point) preconditioner, which is just the diagonal part of matrix A.
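As an illustration of left Jacobi preconditioning (M1 equal to the diagonal of A and M2 equal to the identity in Eq. (75)), the sketch below uses a plain stationary (Richardson) iteration instead of the Krylov-subspace methods listed above, purely to keep the example short. The small complex-symmetric test matrix is hypothetical and is not a DDA interaction matrix; as in the text, A enters only through matrix-vector products and its diagonal.

```python
from math import sqrt


def matvec(A, v):
    """Dense matrix-vector product; the only way the solver touches A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def norm(v):
    return sqrt(sum(abs(c) ** 2 for c in v))


def jacobi_preconditioned_richardson(A, y, tol=1e-10, max_iter=500):
    """Iterate x <- x + M^{-1} (y - A x) with M = diag(A).

    This is a simple stand-in for a preconditioned Krylov solver and
    converges only when the iteration matrix I - M^{-1} A is a contraction
    (e.g. for strictly diagonally dominant A).
    """
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]  # fast inversion of M
    x = [0j] * len(y)
    y_norm = norm(y)
    for iteration in range(max_iter):
        residual = [yi - axi for yi, axi in zip(y, matvec(A, x))]
        if norm(residual) <= tol * y_norm:
            return x, iteration
        x = [xi + d * r for xi, d, r in zip(x, inv_diag, residual)]
    return x, max_iter


# Hypothetical complex-symmetric, diagonally dominant test system.
A = [[4 + 1j, 1, 0.5j],
     [1, 3 + 0.5j, 1],
     [0.5j, 1, 5 + 2j]]
y = [1, 2j, -1]
x, iterations = jacobi_preconditioned_richardson(A, y)
final_residual = norm([yi - axi for yi, axi in zip(y, matvec(A, x))])
```

The preconditioner costs one division per unknown at initialization and one multiplication per unknown in each iteration, which is the kind of overhead referred to above.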
An example of the second type of preconditioner is the Neumann polynomial preconditioner of order l:

$$\mathbf{M}^{-1} = \sum_{i=0}^{l}(\mathbf{I}-\mathbf{A})^{i} .$$ (76)

QMR and Bi-CG can be made to employ the complex symmetric (CS) property of the DDA interaction matrix to halve the number of matrix-vector multiplications [110] (and thus computational time). Lumme and Rahola [40] were the first to apply QMR(CS) to the DDA and compared it with CGNR. They used m from 1.6 + 0.1i to 3 + 4i, and x from 1.3 to 13.5, corresponding to N from 136 to 20336. For all cases studied QMR(CS) was 2-4 times faster than CGNR.

Rahola [9] further studied QMR(CS) and compared it to CGNR, Bi-CG(CS), Bi-CGSTAB, CGS, and GMRES (full and with different memory depths). For a “typical small problem” (parameters were not specified, unfortunately) the convergence of different methods was tested and QMR(CS) along with Bi-CG(CS) showed the best results. Although full GMRES was able to converge in fewer iterations, GMRES with a memory depth as large as 40 was slower than QMR(CS).

Flatau [111] reviewed the use of iterative algorithms in the DDA and tested many of them, together with several preconditioners. He calculated scattering by a homogeneous sphere with x = 0.1 and m from 1.33 up to 5 + 0.0001i, and with x = 1 and m from 1.33 up to 1.33 + i and 3 + 0.0001i. Left (L) and right (R) Jacobi and first-order Neumann polynomial preconditioners were tested. Unfortunately the number of dipoles N was not specified, which hampers comparison with other studies. For small particles CG(L) was superior for all refractive indices studied. CG and CG(R) showed similar results, while CGNR(L) and Bi-CGSTAB(L) were about 4 times slower. For x = 1 Bi-CGSTAB(L) was superior, while Bi-CGSTAB and Bi-CGSTAB(R), as well as CGS, CGS(L), and CGS(R), were slightly worse. TFQMR (both with and without Jacobi preconditioner) was 3-4 times slower. The first-order Neumann preconditioner showed unsatisfactory results.
It was concluded that Bi-CGSTAB(L) is the most satisfactory choice for the DDA, and that method is the default one used in the DDSCAT program [6].

Recently Fan et al. [112] have compared GMRES, QMR(CS), Bi-CGSTAB, GPBi-CG, and Bi-CG(CS). They tested them on wavelength-sized scatterers (x up to 10) with m up to 4.5 + 0.2i, and concluded that GMRES with memory depth 30 was the fastest, although it required four times more memory than the other methods. However, only the time spent on matrix-vector products was compared, while other parts of the iteration may also take significant time, especially for GMRES(30). Among the less memory-consuming methods, QMR(CS) and Bi-CG(CS) showed a better convergence rate than Bi-CGSTAB and GPBi-CG, especially when m > 2. Moreover, the authors pointed out some flaws in the comparison by Flatau [111], which undermine his conclusions.

Yurkin et al. [113] employed QMR(CS), Bi-CG(CS), and Bi-CGSTAB to simulate light scattering by spheres with x up to 160 and 40 for m = 1.05 and 2, respectively. It was shown that convergence of the iterative methods becomes very slow with increasing x and m (up to $10^5$ iterations are required), and none of them is clearly preferable to the others. Moreover, there seems to be no systematic dependence of the best iterative solver on x and m; however, the difference in computational time was less than a factor of two, except for the largest x and m studied.

Rahola [114] showed that the spectrum of the integral scattering operator for any homogeneous scatterer is a line in the complex plane going from 1 to $m^2$, except for a small number of points, which correspond to refractive indices that cause resonances for the specific shape. The spectrum of A is similar, since this matrix is obtained in the DDA by discretization of the integral operator (see also [9]).
Assuming that the spectrum of A lies exactly on the specified line, it was shown that an estimate for the optimal reduction factor γ (the factor by which the norm of the residual decreases at every iteration) can be given [Eq. (77)]. This estimate is an approximation valid for small particle sizes, where no, or only a few, resonances are present. However, in all cases the spectrum of A resembles the spectrum of the linear operator, which is determined by the shape, size, and refractive index of the scatterer. Therefore the spectrum, and thus the convergence, should not depend significantly on the discretization. This was confirmed empirically in other studies [9,63].
Budko and Samokhin [115] generalized Rahola’s results to arbitrary inhomogeneous and anisotropic scatterers. They described a region in the complex plane that contains the whole spectrum of the integral scattering operator. This region depends only on the values of m inside the scatterer and does not depend on x. They showed that for purely real m, or for m with a very small imaginary part, this region may come close to the origin; therefore the spectrum may contain very small eigenvalues for particles larger than the wavelength. This may explain the extremely slow convergence of the iterative solvers for real m and large x, which was recently observed in numerical simulations [113]. Based on an analysis of the spectrum of the integral scattering operator for particles much smaller than the wavelength, Budko et al. [116] proposed an efficient iteration method for this particular case.
It can be concluded that there are several modern iterative methods (QMR(CS), Bi-CG(CS), and Bi-CGSTAB) that have proved to be efficient when applied to the DDA. However, none of them can be claimed superior to the others, and one should test them for particular light-scattering problems.
Moreover, except for the simplest cases, preconditioning of the DDA interaction matrix has hardly been studied, although it is needed for large x and m, since then all methods converge extremely slowly or even diverge. It seems to us that the next major numerical advance in the DDA will be achieved by developing a powerful preconditioner for the DDA matrix. A large number of dipoles requires large computational power and, hence, parallel computers are commonly used, e.g. [108,113]. Parallel efficiency is not discussed here, but for iterative solvers it is generally close to 1 [117]. However, this is not true for all preconditioners [107], and hence heavy preconditioners that require large computational time should be employed with caution in combination with a parallel DDA implementation.

4.2 Scattering order formulation

The Rayleigh-Debye-Gans (RDG) approximation [27] consists in considering $\mathbf{E}(\mathbf{r})$ equal to $\mathbf{E}^{\mathrm{inc}}(\mathbf{r})$. $\mathbf{F}(\mathbf{n})$ is then obtained directly from Eq. (24). A generalization of the RDG approach is obtained by iteratively solving the integral equation (1), which can be rewritten as
$$\mathbf{E}(\mathbf{r}) = \mathbf{E}^{\mathrm{inc}}(\mathbf{r}) + \Lambda\mathbf{E}(\mathbf{r}), \tag{78}$$
where Λ is a linear integral operator describing the scatterer. The iterative scheme is readily obtained by inserting the current (l-th) iteration of the electric field $\mathbf{E}^{(l)}(\mathbf{r})$ into the right side of Eq. (78) and calculating the next iteration in the left side:
$$\mathbf{E}^{(l+1)}(\mathbf{r}) = \mathbf{E}^{\mathrm{inc}}(\mathbf{r}) + \Lambda\mathbf{E}^{(l)}(\mathbf{r}). \tag{79}$$
The starting value is taken the same as in RDG, $\mathbf{E}^{(0)}(\mathbf{r}) = \mathbf{E}^{\mathrm{inc}}(\mathbf{r})$, and the general formula for the solution is the following:
$$\mathbf{E}(\mathbf{r}) = \sum_{l=0}^{\infty} \Lambda^{l}\,\mathbf{E}^{\mathrm{inc}}(\mathbf{r}), \tag{80}$$
which is a direct implementation of the well-known Neumann series:
$$(\mathbf{I} - \Lambda)^{-1} = \sum_{l=0}^{\infty} \Lambda^{l}, \tag{81}$$
where $\mathbf{I}$ is the identity operator. A necessary and sufficient condition for Neumann-series convergence is
$$\|\Lambda\| < 1. \tag{82}$$
The physical meaning of this iterative method is the successive calculation of the interaction between different parts of the scatterer.
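The iteration of Eqs. (79) and (80) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: Λ is replaced by a small hypothetical complex matrix with scalar entries (in the DDA each entry would be a 3×3 dyadic), its maximum absolute row sum is chosen below one so that the series converges, and the names `scattering_orders` and `residual` are introduced here only for the example.

```python
def matvec(Lam, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in Lam]


def scattering_orders(Lam, E_inc, n_orders):
    """Iterate E^(l+1) = E_inc + Lam E^(l), starting from E^(0) = E_inc (RDG)."""
    E = list(E_inc)
    history = [E]
    for _ in range(n_orders):
        LE = matvec(Lam, E)
        E = [e0 + le for e0, le in zip(E_inc, LE)]
        history.append(E)
    return history


def residual(Lam, E_inc, E):
    """Largest component of E - E_inc - Lam E, i.e. the violation of Eq. (78)."""
    LE = matvec(Lam, E)
    return max(abs(e - e0 - le) for e, e0, le in zip(E, E_inc, LE))


# Hypothetical interaction matrix: zero self-terms, max row sum of |entries| < 1.
Lam = [
    [0.0j, 0.3 + 0.1j, 0.1 + 0.0j],
    [0.3 + 0.1j, 0.0j, 0.0 + 0.2j],
    [0.1 + 0.0j, 0.0 + 0.2j, 0.0j],
]
E_inc = [1.0 + 0.0j, 1.0 + 0.0j, 1.0 + 0.0j]

history = scattering_orders(Lam, E_inc, 40)
for order in (0, 1, 5, 40):
    print(f"order {order}: residual {residual(Lam, E_inc, history[order]):.3e}")
```

The residual after order l equals the size of the next term of the series, Λ^(l+1) applied to the incident field, so it decays geometrically here. If the entries of the toy matrix were scaled up until its norm exceeded one, the same loop would be expected to diverge, which corresponds to the restriction expressed by Eq. (82).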
The zeroth approximation (RDG) accounts for no interaction; the first approximation considers the influence of scattering by each dipole on the others once, and so on. Eq. (82) states that the interaction inside the scatterer should be small, but not as small as required for the applicability of RDG ($\|\Lambda\| \ll 1$). In scattering problems, especially in quantum physics, Eq. (80) is called the Born expansion. Although theoretically clear, the Born expansion is not directly applicable [118], since each successive iteration requires analytical evaluation of multidimensional integrals of increasing complexity, which quickly becomes infeasible even for the simplest scatterers. The most recent result is probably that of Acquista [118], who evaluated the Born expansion for a homogeneous sphere up to second order. Therefore, realistic application of the Born expansion requires discretization of the integral operator, which is naturally done in the DDA. A scattering order formulation (SOF) of the DDA was developed independently by Chiappetta [119] and Singham and Bohren [12,120] by applying the Neumann series to Eq. (17). Λ is then a matrix defined as $\Lambda_{ij} = G_{ij}\alpha_j$, where each element is a dyadic, which can be expressed as a 3×3 matrix. An explicit check of Eq. (82) for a given scatterer is not feasible numerically; however, de Hoop [121] derived a sufficient condition for scalar waves: …

Symbols | Description | Section
$\langle\cos\theta\rangle$ | asymmetry parameter | 2
* | superscript: complex conjugate | 2
A | a matrix | 3.1
a | $\mathbf{k}/k$ | 2
a | radius of (equivalent) sphere | 3.1
B | correction matrix in SCLDR | 3.1, Eq. (64)
b1 – b3 | numerical coefficients in polarization prescriptions | 3.1
C | tensor of electrostatic solution | 3.1, Eq. (60)
Csca, Cabs, Cext | scattering, absorption, extinction cross section | 2
c | speed of light in vacuum | 2
d | size of a cubical cell | 2
E, Einc, Eexc, Eself, Esca | (total) electric field, incident, exciting, self-induced, scattered | 2
e0 | polarization vector of the incident wave, |e0| = 1 | 2
e | superscript: effective | 3.1
F | scattering amplitude | 2
f | a function; volume filling factor |
G | free space dyadic Green’s function (tensor) | 2
$G^{s}$ | G in static limit | 2
$G_{ij}$ | interaction term |
H | superscript: conjugate transpose | 3.1
$h_r$ | impulse response function of a filter | 3.1
I, I | identity dyadic (tensor), operator (matrix) | 2, 4.1
i, j | subscript: vector indices | 4.3
i | imaginary unity | 2
i, j | subscript: number of the dipole | 2
K | order of a BT matrix | 4.3
k | free space wave vector | 2
L | self-term dyadic | 2
M | integral associated with finiteness of V0; preconditioner |
M | dyadic associated with M | 2
m | refractive index (relative) | 3.1
N | total number of dipoles | 2
n | $\mathbf{r}/r$ | 2
n | size of a matrix | 4.1
$\hat{\mathbf{n}}'$ | external normal to the surface | 2
nx, ny, nz | sizes of the rectangular lattice | 4.3
P | polarization | 2
p | superscript: principal | 3.1
q | $2\pi/d$ | 3.1
R | $\mathbf{r} - \mathbf{r}'$ | 2
R0 | radius of the smallest sphere circumscribing the scatterer | 4.2
r, r′ | radius-vectors | 2
S | LDR coefficient dependent on incident polarization | 3.1, Eq. (49)
Si | amplitude matrix element | 3.2
Sij | Mueller matrix element | 3.2
s | superscript: secondary; strong; subscript: equivalent spherical dipole |
T | boundary condition tensor | 3.1, Eq. (66)
t | time | 2
V | volume of the scatterer | 2
V0 | exclusion volume | 2
w | superscript: weak | 4.5
x | unknown vector | 4.1

Table A2 (continued)
Symbols | Description | Section
x | size parameter of scatterer | 3.2
x, y, z | Cartesian coordinates | 4.3
y | a known vector (right side of a linear system) | 4.1
y | $|m|kd$ | 3.2
$y_{\mathrm{Re}}$ | $\mathrm{Re}(m)kd$ | 3.2
α, α | polarizability, tensor | 2
γ | optimal reduction factor | 4.1
δ | Kronecker symbol | 3.1
ε | electric permittivity (relative) | 2
η | correction function | 3.3
Λ | intermediate tensor in RCB method | 3.1, Eq. (62)
Λ | linear integral operator, its matrix | 4.2
μ, ν, ρ, τ, … | sub-, superscript: Cartesian components of vectors (tensors) | 2
ξ, ψ | Riccati-Bessel functions | 3.1
χ | electric susceptibility | 2
Ψ | mean relative error of far-field electric field | 3.2
Ω | solid angle | 2
ω | circular frequency of the harmonic electric field | 2
a Common sub- and superscripts are listed separately. For all vectors, the same symbol in italic (instead of bold) denotes the Euclidean norm of the vector (except for unit vectors).

References
[1] Purcell EM, Pennypacker CR. Scattering and absorption of light by nonspherical dielectric grains. Astrophys J 1973;186:705-714. [2] Draine BT. The discrete-dipole approximation and its application to interstellar graphite grains. Astrophys J 1988;333:848-872. [3] Draine BT, Goodman JJ. Beyond Clausius-Mossotti - wave-propagation on a polarizable point lattice and the discrete dipole approximation. Astrophys J 1993;405:685-697. [4] Draine BT, Flatau PJ. Discrete-dipole approximation for scattering calculations. J Opt Soc Am A 1994;11:1491-1499. [5] Draine BT. The discrete dipole approximation for light scattering by irregular targets. In: Mishchenko MI, Hovenier JW, Travis LD, editors. Light Scattering by Nonspherical Particles, Theory, Measurements, and Applications. New York: Academic Press, 2000. p. 131-145. [6] Draine BT, Flatau PJ. User guide for the discrete dipole approximation code DDSCAT 6.1. http://xxx.arxiv.org/abs/astro-ph/0409262, 2004. [7] Goedecke GH, O'Brien SG. Scattering by irregular inhomogeneous particles via the digitized Green's function algorithm. Appl Opt 1988;27:2431-2438. [8] Lakhtakia A. Strong and weak forms of the method of moments and the coupled dipole method for scattering of time-harmonic electromagnetic-fields. Int J Mod Phys C 1992;3:583-603. [9] Rahola J. Solution of dense systems of linear equations in the discrete-dipole approximation. SIAM J Sci Comp 1996;17:78-89. [10] Piller NB.
Coupled-dipole approximation for high permittivity materials. Opt Comm 1999;160:10-14. [11] Chaumet PC, Sentenac A, Rahmani A. Coupled dipole method for scatterers with large permittivity. Phys Rev E 2004;70:036606. [12] Singham SB, Bohren CF. Light scattering by an arbitrary particle: a physical reformulation of the coupled dipole method. Opt Lett 1987;12:10-12. [13] Piller NB. Influence of the edge meshes on the accuracy of the coupled-dipole approximation. Opt Lett 1997;22:1674-1676. [14] Hage JI, Greenberg JM, Wang RT. Scattering from arbitrarily shaped particles - theory and experiment. Appl Opt 1991;30:1141-1152. [15] Kahnert FM. Numerical methods in electromagnetic scattering theory. J Quant Spectrosc Radiat Transf 2003;79:775-824. [16] Peterson AW, Ray SL, Mittra R. Computational Methods of Electromagnetic Scattering. IEEE Press, 1998. [17] Kim OS, Meincke P, Breinbjerg O, Jorgensen E. Method of moments solution of volume integral equations using higher-order hierarchical Legendre basis functions. Radio Science 2004;39. [18] Lu CC. A fast algorithm based on volume integral equation for analysis of arbitrarily shaped dielectric radomes. IEEE Trans Ant Propag 2003;51:606-612. [19] Ivakhnenko V, Eremin Y. Light scattering by needle-type and disk-type particles. J Quant Spectrosc Radiat Transf 2006;100:165-172. [20] Wriedt T. A review of elastic light scattering theories. Part Part Sys Charact 1998;15:67-74. [21] Chiappetta P, Torresani B. Some approximate methods for computing electromagnetic fields scattered by complex objects. Meas Sci Technol 1998;9:171-182. [22] Mishchenko MI, Travis LD, Lacis AA. Scattering, Absorption, and Emission of Light by Small Particles. Cambridge: Cambridge University Press, 2002. [23] Tsang L, Kong JA, Ding KH, Ao CO. Scattering of Electromagnetic Waves: Numerical Simulations. New York: Wiley, 2001. [24] Jones AR. Light scattering for particle characterization.
Prog Ener Comb Sci 1999;25:1-53. [25] Yaghjian AD. Electric dyadic Green's function in the source region. IEEE Proc 1980;68:248-263. [26] Peltoniemi JI. Variational volume integral equation method for electromagnetic scattering by irregular grains. J Quant Spectrosc Radiat Transf 1996;55:637-647. [27] Bohren CF, Huffman DR. Absorption and Scattering of Light by Small Particles. New York: Wiley, 1983. [28] Draine BT, Weingartner JC. Radiative torques on interstellar grains .1. Superthermal spin-up. Astrophys J 1996;470:551-565. [29] Hoekstra AG, Frijlink M, Waters LBFM, Sloot PMA. Radiation forces in the discrete-dipole approximation. J Opt Soc Am A 2001;18:1944-1953. [30] Chaumet PC, Rahmani A, Sentenac A, Bryant GW. Efficient computation of optical forces with the coupled dipole method. Phys Rev E 2005;72:046708. [31] Hoekstra AG. Computer simulations of elastic light scattering. PhD thesis. University of Amsterdam, Amsterdam, 1994. [32] Yurkin MA, Maltsev VP, Hoekstra AG. Convergence of the discrete dipole approximation. I. Theoretical analysis. J Opt Soc Am A 2006;23:2578-2591. [33] Jackson JD. Classical Electrodynamics. New York: Wiley, 1975. [34] Iskander MF, Chen HY, Penner JE. Optical-scattering and absorption by branched chains of aerosols. Appl Opt 1989;28:3083-3091. [35] Hage JI, Greenberg JM. A model for the optical-properties of porous grains. Astrophys J 1990;361:251-259. [36] Livesay DE, Chen KM. Electromagnetic fields induced inside arbitrarily shaped biological bodies. IEEE Trans Microw Theory Tech 1974;22:1273-1280. [37] Lakhtakia A, Mulholland GW. On 2 numerical techniques for light-scattering by dielectric agglomerated structures. J Res Nat Inst Stand Technol 1993;98:699-716. [38] Dungey CE, Bohren CF. Light-scattering by nonspherical particles - a refinement to the coupled-dipole method. J Opt Soc Am A 1991;8:81-87. [39] Doyle WT. Optical properties of a suspension of metal spheres. Phys Rev B 1989;39:9852-9858. [40] Lumme K, Rahola J.
Light-scattering by porous dust particles in the discrete-dipole approximation. Astrophys J 1994;425:653-667. [41] van de Hulst HC. Light Scattering by Small Particles. New York: Dover, 1981. [42] Okamoto H. Light scattering by clusters: the a1-term method. Opt Rev 1995;2:407-412. [43] Gutkowicz-Krusin D, Draine BT. Propagation of electromagnetic waves on a rectangular lattice of polarizable points. http://xxx.arxiv.org/abs/astro-ph/0403082, 2004. [44] Piller NB, Martin OJF. Increasing the performance of the coupled-dipole approximation: A spectral approach. IEEE Trans Ant Propag 1998;46:1126-1137. [45] Gay-Balmaz P, Martin OJF. A library for computing the filtered and non-filtered 3D Green's tensor associated with infinite homogeneous space and surfaces. Comp Phys Comm 2002;144:111-120. [46] Rahmani A, Chaumet PC, Bryant GW. Coupled dipole method with an exact long-wavelength limit and improved accuracy at finite frequencies. Opt Lett 2002;27:2118-2120. [47] Collinge MJ, Draine BT. Discrete-dipole approximation with polarizabilities that account for both finite wavelength and target geometry. J Opt Soc Am A 2004;21:2023-2028. [48] Rahmani A, Chaumet PC, Bryant GW. On the importance of local-field corrections for polarizable particles on a finite lattice: Application to the discrete dipole approximation. Astrophys J 2004;607:873- 878. [49] Evans KF, Stephens GL. Microwave radiative-transfer through clouds composed of realistically shaped ice crystals .1. Single scattering properties. J Atmos Sci 1995;52:2041-2057. [50] Flatau PJ, Fuller KA, Mackowski DW. Scattering by 2 spheres in contact - comparisons between discrete-dipole approximation and modal-analysis. Appl Opt 1993;32:3302-3305. [51] Xu YL, Gustafson BAS. Comparison between multisphere light-scattering calculations: Rigorous solution and discrete-dipole approximation. Astrophys J 1999;513:894-909. [52] Ku JC. 
Comparisons of coupled-dipole solutions and dipole refractive-indexes for light-scattering and absorption by arbitrarily shaped or agglomerated particles. J Opt Soc Am A 1993;10:336-342. [53] Andersen AC, Mutschke H, Posch T, Min M, Tamanai A. Infrared extinction by homogeneous particle aggregates of SiC, FeO and SiO2: Comparison of different theoretical approaches. J Quant Spectrosc Radiat Transf 2006;100:4-15. [54] Singham SB. Theoretical factors in modeling polarized light scattering by arbitrary particles. Appl Opt 1989;28:5058-5064. [55] Hoekstra AG, Sloot PMA. Dipolar unit size in coupled-dipole calculations of the scattering matrix-elements. Opt Lett 1993;18:1211-1213. [56] Hoekstra AG, Rahola J, Sloot PMA. Accuracy of internal fields in volume integral equation simulations of light scattering. Appl Opt 1998;37:8482-8497. [57] Druger SD, Bronk BV. Internal and scattered electric fields in the discrete dipole approximation. J Opt Soc Am B 1999;16:2239-2246. [58] Yurkin MA, Brock RS, Lu JQ, Hoekstra AG. Systematic comparison of the discrete dipole approximation and the finite difference time domain method. (in preparation) [59] Okamoto H, Macke A, Quante M, Raschke E. Modeling of backscattering by non-spherical ice particles for the interpretation of cloud radar signals at 94 GHz. An error analysis. Contrib Atmos Phys 1995;68:319-334. [60] Liu CL, Illingworth AJ. Error analysis of backscatter from discrete dipole approximation for different ice particle shapes. Atmos Res 1997;44:231-241. [61] Lemke H, Okamoto H, Quante M. Comment on error analysis of backscatter from discrete dipole approximation for different ice particle shapes [Liu, C.-L., Illingworth, A.J., 1997, Atmos. Res. 44, 231-241.]. Atmos Res 1998;49:189-197. [62] Liu CL, Illingworth AJ. Reply to comment by Lemke, Okamoto and Quante on 'Error analysis of backscatter from discrete dipole approximation for different ice particle shapes'.
Atmos Res 1999;50:1-2. [63] Yurkin MA, Maltsev VP, Hoekstra AG. Convergence of the discrete dipole approximation. II. An extrapolation technique to increase the accuracy. J Opt Soc Am A 2006;23:2592-2601. [64] Fuller KA, Mackowski DW. Electromagnetic scattering by compounded spherical particles. In: Mishchenko MI, Hovenier, JW, Travis, LD, editors. Light Scattering by Nonspherical Particles, Theory, Measurements, and Applications. New York: Academic Press, 2000. p. 223-272. [65] Xu YL. Scattering Mueller matrix of an ensemble of variously shaped small particles. J Opt Soc Am A 2003;20:2093-2105. [66] Mackowski DW. Electrostatics analysis of radiative absorption by sphere clusters in the rayleigh limit - application to soot particles. Appl Opt 1995;34:3535-3545. [67] Mackowski DW. Calculation of total cross-sections of multiple-sphere clusters. J Opt Soc Am A 1994;11:2851-2861. [68] Ngo D, Videen G, Dalling R. Chaotic light scattering from a system of osculating, conducting spheres. Physics Letters A 1997;227:197-202. [69] Markel VA, Pustovit VN, Karpov SV, Obuschenko AV, Gerasimov VS, Isaev IL. Electromagnetic density of states and absorption of radiation by aggregates of nanospheres with multipole interactions. Phys Rev B 2004;70:054202. [70] Kim HY, Sofo JO, Velegol D, Cole MW, Mukhopadhyay G. Static polarizabilities of dielectric nanoclusters. Phys Rev A 2005;72:053201. [71] Jones AR. Electromagnetic wave scattering by assemblies of particles in the Rayleigh approximation. Proc R Soc London A 1979;366:111-127. [72] Jones AR. Scattering efficiency factors for agglomerates for small spheres. J Phys D 1979;12:1661-1672. [73] Kozasa T, Blum J, Mukai T. Optical-properties of dust aggregates .1. Wavelength dependence. Astron Astrophys 1992;263:423-432. [74] Kozasa T, Blum J, Okamoto H, Mukai T. Optical-properties of dust aggregates .2. Angular-dependence of scattered-light. Astron Astrophys 1993;276:278-288. [75] Lou WJ, Charalampopoulos TT. 
On the electromagnetic scattering and absorption of agglomerated small spherical-particles. J Phys D 1994;27:2258-2270. [76] Markel VA, Shalaev VM, Stechel EB, Kim W, Armstrong RL. Small-particle composites .1. Linear optical properties. Phys Rev B 1996;53:2425-2436. [77] Pustovit VN, Sotelo JA, Niklasson GA. Coupled multipolar interactions in small-particle metallic clusters. J Opt Soc Am A 2002;19:513-518. [78] Lumme K, Rahola J, Hovenier JW. Light scattering by dense clusters of spheres. Icarus 1997;126:455- 469. [79] Kimura H, Mann I. Light scattering by large clusters of dipoles as an analog for cometary dust aggregates. J Quant Spectrosc Radiat Transf 2004;89:155-164. [80] Hull P, Shepherd I, Hunt A. Modeling light scattering from Diesel soot particles. Appl Opt 2004;43:3433-3441. [81] Venizelos DT, Lou WJ, Charalampopoulos TT. Development of an algorithm for the calculation of the scattering properties of agglomerates. Appl Opt 1996;35:542-548. [82] Voshchinnikov NV, Il'in VB, Henning T. Modelling the optical properties of composite and porous interstellar grains. Astron Astrophys 2005;429:371-381. [83] Kohler M, Kimura H, Mann I. Applicability of the discrete-dipole approximation to light-scattering simulations of large cosmic dust aggregates. Astron Astrophys 2006;448:395-399. [84] Zubko E, Petrov D, Shkuratov Y, Videen G. Discrete dipole approximation simulations of scattering by particles with hierarchical structure. Appl Opt 2005;44:6479-6485. [85] Bourrely C, Chiappetta P, Lemaire TJ, Torresani B. Multidipole formulation of the coupled dipole method for electromagnetic scattering by an arbitrary particle. J Opt Soc Am A 1992;9:1336-1340. [86] Rouleau F, Martin PG. A new method to calculate the extinction properties of irregularly shaped particles. Astrophys J 1993;414:803-814. [87] Mulholland GW, Bohren CF, Fuller KA. Light-scattering by agglomerates - coupled electric and magnetic dipole method. Langmuir 1994;10:2533-2546. [88] Lemaire TJ. 
Coupled-multipole formulation for the treatment of electromagnetic scattering by a small dielectric particle of arbitrary shape. J Opt Soc Am A 1997;14:470-474. [89] Lakhtakia A. General-theory of the purcell-pennypacker scattering approach and its extension to bianisotropic scatterers. Astrophys J 1992;394:494-499. [90] Loiko VA, Molochko VI. Polymer dispersed liquid crystal droplets: Methods of calculation of optical characteristics. Liq Crys 1998;25:603-612. [91] Smith DA, Stokes KL. Discrete dipole approximation for magneto-optical scattering calculations. Opt Expr 2006;14:5746-5754. [92] Su CC. Electromagnetic scattering by a dielectric body with arbitrary inhomogeneity and anisotropy. IEEE Trans Ant Propag 1989;37:384-389. [93] Chen RS, Fan ZH, Yung EKN. Analysis of electromagnetic scattering of three-dimensional dielectric bodies using Krylov subspace FFT iterative methods. Microwave Opt Tech Lett 2003;39:261-267. [94] Khlebtsov NG. An approximate method for calculating scattering and absorption of light by fractal aggregates. Opt Spec 2000;88:594-601. [95] Markel VA. Coupled-dipole approach to scattering of light from a one-dimensional periodic dipole structure. J Mod Opt 1993;40:2281-2291. [96] Chaumet PC, Rahmani A, Bryant GW. Generalization of the coupled dipole method to periodic structures. Phys Rev B 2003;67:165404. [97] Chaumet PC, Sentenac A. Numerical simulations of the electromagnetic field scattered by defects in a double-periodic structure. Phys Rev B 2005;72:205437. [98] Martin OJF. Efficient scattering calculations in complex backgrounds. AEU-Int J Electr Comm 2004;58:93-99. [99] Yang WH, Schatz GC, Vanduyne RP. Discrete dipole approximation for calculating extinction and raman intensities for small particles with arbitrary shapes. J Chem Phys 1995;103:869-875. [100] Lemaire TJ, Bassrei A. Three-dimensional reconstruction of dielectric objects by the coupled-dipole method. Appl Opt 2000;39:1272-1278. [101] Belkebir K, Chaumet PC, Sentenac A. 
Superresolution in total internal reflection tomography. J Opt Soc Am A 2005;22:1889-1897. [102] Chaumet PC, Belkebir K, Sentenac A. Three-dimensional subwavelength optical imaging using the coupled dipole method. Phys Rev B 2004;69:245405. [103] Chaumet PC, Belkebir K, Lencrerot R. Three-dimensional optical imaging in layered media. Opt Expr 2006;14:3415-3426. [104] Zubko E, Shkuratov Y, Videen G. Discrete-dipole analysis of backscatter features of agglomerated debris particles comparable in size with wavelength. J Quant Spectrosc Radiat Transf 2006;100:483-488. [105] Penttila A, Zubko E, Lumme K, Muinonen K, Yurkin MA, Draine BT, Rahola J, Hoekstra AG, Shkuratov Y. Comparison between discrete dipole implementations and exact techniques. J Quant Spectrosc Radiat Transf 2007, doi:10.1016/j.jqsrt.2007.01.26. [106] Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes in C. The Art of Scientific Computing. New York: Cambridge University Press, 1990. [107] Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, van der Vorst HA. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1994. [108] Hoekstra AG, Grimminck MD, Sloot PMA. Large scale simulations of elastic light scattering by a fast discrete dipole approximation. Int J Mod Phys C 1998;9:87-102. [109] Zhang SL. GPBi-CG: Generalized product-type methods based on Bi-CG for solving nonsymmetric linear systems. SIAM J Sci Comp 1997;18:537-551. [110] Freund RW. Conjugate gradient-type methods for linear-systems with complex symmetrical coefficient matrices. SIAM J Sci Stat Comp 1992;13:425-448. [111] Flatau PJ. Improvements in the discrete-dipole approximation method of computing scattering and absorption. Opt Lett 1997;22:1205-1207. [112] Fan ZH, Wang DX, Chen RS, Yung EKN. The application of iterative solvers in discrete dipole approximation method for computing electromagnetic scattering. 
Microwave Opt Tech Lett 2006;48:1741-1746. [113] Yurkin MA, Maltsev VP, Hoekstra AG. The discrete dipole approximation for simulation of light scattering by particles much larger than the wavelength. J Quant Spectrosc Radiat Transf 2007, doi:10.1016/j.jqsrt.2007.01.33. [114] Rahola J. On the eigenvalues of the volume integral operator of electromagnetic scattering. SIAM J Sci Comp 2000;21:1740-1754. [115] Budko NV, Samokhin AB. Spectrum of the volume integral operator of electromagnetic scattering. SIAM J Sci Comp 2006;28:682-700. [116] Budko NV, Samokhin AB, Samokhin AA. A generalized overrelaxation method for solving singular volume integral equations in low-frequency scattering problems. Differ Eq 2005;41:1262-1266. [117] Hoekstra AG, Sloot PMA. Coupled dipole simulations of elastic light scattering on parallel systems. Int J Mod Phys C 1995;6:663-679. [118] Acquista C. Light scattering by tenuous particles: a generalization of the Rayleigh-Gans-Rocard approach. Appl Opt 1976;15:2932-2936. [119] Chiappetta P. Multiple scattering approach to light scattering by arbitrarily shaped particles. J Phys A 1980;13:2101-2108. [120] Singham SB, Bohren CF. Light-scattering by an arbitrary particle - the scattering-order formulation of the coupled-dipole method. J Opt Soc Am A 1988;5:1867-1872. [121] de Hoop AT. Convergence criterion for the time-domain iterative Born approximation to scattering by an inhomogeneous, dispersive object. J Opt Soc Am A 1991;8:1256-1260. [122] Flatau PJ, Stephens GL, Draine BT. Light-scattering by rectangular solids in the discrete-dipole approximation - a new algorithm exploiting the block-Toeplitz structure. J Opt Soc Am A 1990;7:593- 600. [123] Flatau PJ. Fast solvers for one dimensional light scattering in the discrete dipole approximation. Opt Expr 2004;12:3149-3155. [124] Goodman JJ, Draine BT, Flatau PJ. Application of fast-Fourier-transform techniques to the discrete- dipole approximation. Opt Lett 1991;16:1198-1200. 
[125] Barrowes BE, Teixeira FL, Kong JA. Fast algorithm for matrix-vector multiply of asymmetric multilevel block-Toeplitz matrices in 3-D scattering. Microwave Opt Tech Lett 2001;31:28-32. [126] Greengard L, Rokhlin V. A fast algorithm for particle simulations. J Comp Phys 1987;73:325-348. [127] Rahola J. Diagonal forms of the translation operators in the fast multipole algorithm for scattering problems. BIT 1996;36:333-358. [128] Hoekstra AG, Sloot PMA. New computational techniques to simulate light-scattering from arbitrary particles. Part Part Sys Charact 1994;11:189-193. [129] Koc S, Chew WC. Multilevel fast multipole algorithm for the discrete dipole approximation. J Electrom Wav Applic 2001;15:1447-1468. [130] Amini S, Profit ATJ. Multi-level fast multipole solution of the scattering problem. Engin Anal Bound Elem 2003;27:547-564. [131] Darve E. The fast multipole method I: error analysis and asymptotic complexity. SIAM J Num Anal 2000;38:98-128. [132] Dembart B, Yip E. The accuracy of fast multipole methods for Maxwell's equations. IEEE Comp Sci Engin 1998;5:48-56. [133] Barnes JE, Hut P. A hierarchical O(N log N) force-calculation algorithm. Nature 1986;324:446-449. [134] Barnes JE, Hut P. Error analysis of a tree code. Astrophys J Suppl 1989;70:389-417. [135] Ding KH, Tsang L. A sparse matrix iterative approach for modeling tree scattering. Microwave Opt Tech Lett 2003;38:198-202. [136] Singham MK, Singham SB, Salzman GC. The scattering matrix for randomly oriented particles. J Chem Phys 1986;85:3807-3815. [137] Mishchenko MI. Calculation of the amplitude matrix for a nonspherical particle in a fixed orientation. Appl Opt 2000;39:1026-1031. [138] McClain WM, Ghoul WA. Elastic light scattering by randomly oriented macromolecules: Computation of the complete set of observables. J Chem Phys 1986;84:6609-6622. [139] Khlebtsov NG. Orientational averaging of integrated cross sections in the discrete dipole method. Opt Spec 2001;90:408-415. 
[140] Mishchenko MI, Travis LD, Mackowski DW. T-matrix computations of light scattering by nonspherical particles: A review. J Quant Spectrosc Radiat Transf 1996;55:535-575. [141] Mackowski DW. Discrete dipole moment method for calculation of the T matrix for nonspherical particles. J Opt Soc Am A 2002;19:881-893. [142] Mishchenko MI. Light-scattering by size shape distributions of randomly oriented axially-symmetrical particles of a size comparable to a wavelength. Appl Opt 1993;32:4652-4666. [143] Muinonen K, Zubko E. Optimizing the discrete-dipole approximation for sequences of scatterers with identical shapes but differing sizes or refractive indices. J Quant Spectrosc Radiat Transf 2006;100:288-294. [144] Hovenier JW, Lumme K, Mishchenko MI, Voshchinnikov NV, Mackowski DW, Rahola J. Computations of scattering matrices of four types of non-spherical particles using diverse methods. J Quant Spectrosc Radiat Transf 1996;55:695-705. [145] Wriedt T, Comberg U. Comparison of computational scattering methods. J Quant Spectrosc Radiat Transf 1998;60:411-423. [146] Comberg U, Wriedt T. Comparison of scattering calculations for aggregated particles based on different models. J Quant Spectrosc Radiat Transf 1999;63:149-162. [147] Wriedt T, Hellmers J, Eremina E, Schuh R. Light scattering by single erythrocyte: Comparison of different methods. J Quant Spectrosc Radiat Transf 2006;100:444-456. [148] Laczik Z. Discrete-dipole-approximation-based light-scattering calculations for particles with a real refractive index smaller than unity. Appl Opt 1996;35:3736-3745.

1 Introduction
2 General framework
3 Various DDA models
3.1 Theoretical base of the DDA
3.2 Accuracy of DDA simulations
3.3 The DDA for clusters of spheres
3.4 Modifications and extensions of the DDA
4 Numerical considerations
4.1 Direct vs. iterative methods
4.2 Scattering order formulation
4.3 Block-Toeplitz
4.4 FFT
4.5 Fast multipole method
4.6 Orientation averaging and repeated calculations
5 Comparison of the DDA to other methods
6 Concluding remarks
Acknowledgements
Appendix. Description of used acronyms and symbols
References
/VSamples [1 1 1 1] /ColorImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] /JPEG2000ColorACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 /JPEG2000ColorImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 300 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages true /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] /GrayImageDict << /QFactor 0.15 /HSamples [1 1 1 1] /VSamples [1 1 1 1] /JPEG2000GrayACSImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 /JPEG2000GrayImageDict << /TileWidth 256 /TileHeight 256 /Quality 30 /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 1200 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict << /K -1 /AllowPSXObjects false /CheckCompliance [ /None /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 /PDFXOutputIntentProfile () /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False /Description << /CHS /CHT /DAN /DEU /ESP /FRA /ITA /JPN /KOR /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor prepress-afdrukken van hoge kwaliteit. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.) 
/NOR /PTB /SUO /SVE /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing. Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.) /Namespace [ (Adobe) (Common) (1.0) /OtherNamespaces [ << /AsReaderSpreads false /CropImagesToFrames true /ErrorControl /WarnAndContinue /FlattenerIgnoreSpreadOverrides false /IncludeGuidesGrids false /IncludeNonPrinting false /IncludeSlug false /Namespace [ (Adobe) (InDesign) (4.0) ] /OmitPlacedBitmaps false /OmitPlacedEPS false /OmitPlacedPDF false /SimulateOverprint /Legacy >> << /AddBleedMarks false /AddColorBars false /AddCropMarks false /AddPageInfo false /AddRegMarks false /ConvertColors /ConvertToCMYK /DestinationProfileName () /DestinationProfileSelector /DocumentCMYK /Downsample16BitImages true /FlattenerPreset << /PresetSelector /MediumResolution >> /FormElements false /GenerateStructure false /IncludeBookmarks false /IncludeHyperlinks false /IncludeInteractive false /IncludeLayers false /IncludeProfiles false /MultimediaHandling /UseObjectSettings /Namespace [ (Adobe) (CreativeSuite) (2.0) ] /PDFXOutputIntentProfileSelector /DocumentCMYK /PreserveEditing true /UntaggedCMYKHandling /LeaveUntagged /UntaggedRGBHandling /UseDocumentProfile /UseDocumentBleed false >> >> setdistillerparams /HWResolution [2400 2400] /PageSize [612.000 792.000] >> setpagedevice ABSTRACT We present a review of the discrete dipole approximation (DDA), which is a general method to simulate light scattering by arbitrarily shaped particles. We put the method in historical context and discuss recent developments, taking the viewpoint of a general framework based on the integral equations for the electric field. We review both the theory of the DDA and its numerical aspects, the latter being of critical importance for any practical application of the method. 
Finally, the position of the DDA among other methods of light scattering simulation is shown and possible future developments are discussed. <|endoftext|><|startoftext|> Introduction

The scalar form factor of the pion, $\Gamma_\pi(t)$, corresponds to the matrix element

$$\Gamma_\pi(t) = \int d^4x\, e^{-i(q'-q)x}\, \langle \pi(q')|\, m_u \bar{u}(x)u(x) + m_d \bar{d}(x)d(x)\, |\pi(q)\rangle\,, \quad t=(q'-q)^2\,. \qquad (1.1)$$

Performing a Taylor expansion around $t=0$,

$$\Gamma_\pi(t) = \Gamma_\pi(0)\left\{1 + \frac{1}{6}\, t\, \langle r^2\rangle_s^\pi + O(t^2)\right\}, \qquad (1.2)$$

where $\langle r^2\rangle_s^\pi$ is the quadratic scalar radius of the pion. The quantity $\langle r^2\rangle_s^\pi$ contributes around 10% [1] to the values of the S-wave $\pi\pi$ scattering lengths $a_0^0$ and $a_0^2$ as determined in ref.[1], by employing Roy equations and $\chi$PT to two loops. If one takes into account that this reference gives a precision of 2.2% in its calculation of the scattering lengths, a 10% contribution from $\langle r^2\rangle_s^\pi$ is a large one. Related to that, $\langle r^2\rangle_s^\pi$ is also important in $SU(2)\times SU(2)$ $\chi$PT since it gives the low energy constant $\bar{\ell}_4$ that controls the departure of $F_\pi$ from its value in the chiral limit [2, 3] at leading order correction.

Based on one loop $\chi$PT, Gasser and Leutwyler [2] obtained $\langle r^2\rangle_s^\pi = 0.55 \pm 0.15$ fm$^2$. This calculation was improved later on by the same authors together with Donoghue [4], who solved the corresponding Muskhelishvili-Omnès equations with the coupled channels of $\pi\pi$ and $K\bar{K}$. The update of this calculation, performed in ref.[1], gives $\langle r^2\rangle_s^\pi = 0.61 \pm 0.04$ fm$^2$, where the new results on S-wave $I=0$ $\pi\pi$ phase shifts from the Roy equation analysis of ref.[5] are included. Moussallam [6] employs the same approach and obtains values in agreement with the previous result.

One should notice that solutions of the Muskhelishvili-Omnès equations for the scalar form factor rely on non-measured T-matrix elements or on assumptions about which channels matter. Given the importance of $\langle r^2\rangle_s^\pi$, and the possible systematic errors in the analyses based on Muskhelishvili-Omnès equations, other independent approaches are most welcome.
In this respect we quote the works [7, 8, 9], and those of Ynduráin [10, 11, 12]. These latter works have challenged the previous value for $\langle r^2\rangle_s^\pi$, shifting it to the larger $\langle r^2\rangle_s^\pi = 0.75 \pm 0.07$ fm$^2$. From ref.[1] the equations

$$\delta a_0^0 = +0.027\,\Delta_{r^2}\,, \quad \delta a_0^2 = -0.004\,\Delta_{r^2}\,, \qquad (1.3)$$

give the change of the scattering lengths under a variation of $\langle r^2\rangle_s^\pi$ defined by $\langle r^2\rangle_s^\pi = 0.61(1+\Delta_{r^2})$ fm$^2$. For the difference between the central values of $\langle r^2\rangle_s^\pi$ given above from refs.[1, 10], one has $\Delta_{r^2} = +0.23$. This corresponds to $\delta a_0^0 = +0.006$ and $\delta a_0^2 = -0.001$, while the values quoted with their errors are $a_0^0 = 0.220 \pm 0.005$ and $a_0^2 = -0.0444 \pm 0.0010$. We are thus talking about shifts of the central values of the predicted scattering lengths at the level of one sigma. The value taken for $\langle r^2\rangle_s^\pi$ is also important for determining the $O(p^4)$ $\chi$PT coupling $\bar{\ell}_4$. The value of ref.[1] is $\bar{\ell}_4 = 4.4 \pm 0.2$ while that of ref.[10] is $\bar{\ell}_4 = 5.4 \pm 0.5$. The two values are incompatible within errors.

The papers [10, 11, 12] have been questioned in refs.[13, 14]. The value of the $K\pi$ quadratic scalar radius, $\langle r^2\rangle_s^{K\pi}$, obtained by Ynduráin in ref.[10], $\langle r^2\rangle_s^{K\pi} = 0.31 \pm 0.06$ fm$^2$, is not accurate, because he relies on old experiments and on a poor parameterization of the low energy S-wave $I=1/2$ $K\pi$ phase shifts, which assumes dominance of the $\kappa$ resonance as a standard Breit-Wigner pole [15]. Furthermore, $\langle r^2\rangle_s^{K\pi}$ was recently fixed by high statistics experiments in an interval in agreement with the sharp prediction of [15], based on dispersion relations (three-channel Muskhelishvili-Omnès equations from the T-matrix of ref.[16]) and two-loop $\chi$PT [17]. From the recent experiments [18, 19], one has for the charged kaons [18] $\langle r^2\rangle_s^{K^\pm\pi} = 0.235 \pm 0.014 \pm 0.007$ fm$^2$, and for the neutral ones [19] $\langle r^2\rangle_s^{K_L\pi} = 0.165 \pm 0.016$ fm$^2$. The prediction of [15], in the isospin limit, is $\langle r^2\rangle_s^{K\pi} = 0.192 \pm 0.012$ fm$^2$, lying just in the middle of the experimental determinations.
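The arithmetic behind the one-sigma statement following eq.(1.3) can be checked in a few lines. The sketch below only uses the central values and errors quoted in the text; the variable names are ours and nothing here is taken from the code of the cited references.

```python
# Central values of the pion scalar radius (fm^2) quoted in the text.
r2_ref1 = 0.61    # ref.[1]
r2_ref10 = 0.75   # refs.[10, 11]

# Relative shift defined by <r^2> = 0.61 (1 + delta_r2) fm^2.
delta_r2 = r2_ref10 / r2_ref1 - 1.0

# Linear response of the scattering lengths, eq.(1.3).
shift_a00 = +0.027 * delta_r2
shift_a20 = -0.004 * delta_r2

# Errors quoted for a_0^0 and a_0^2.
err_a00, err_a20 = 0.005, 0.0010

print(f"Delta_r2  = {delta_r2:+.2f}")
print(f"delta a00 = {shift_a00:+.4f} ({abs(shift_a00) / err_a00:.1f} sigma)")
print(f"delta a20 = {shift_a20:+.4f} ({abs(shift_a20) / err_a20:.1f} sigma)")
```

Both shifts come out close to the size of the quoted errors, which is the sense in which the text speaks of a one-sigma effect.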
Another issue is Ynduráin's sounder determination of the pionic scalar radius, whose (in)correctness is not settled yet. In this paper we concentrate on the approach of Ynduráin [10, 11, 12] to evaluate the quadratic scalar radius of the pion based on an Omnès representation of the $I=0$ non-strange pion scalar form factor. Our main conclusion will be that this approach [10] and the solution of the Muskhelishvili-Omnès equations [4], with $\pi\pi$ and $K\bar{K}$ as coupled channels, agree with each other if one properly takes into account, for some T-matrices, the presence of a zero in the pion scalar form factor at energies slightly below the $K\bar{K}$ threshold. Precisely these T-matrices are those used in [10] and favoured in [11]. Once this is considered we conclude that $\langle r^2\rangle_s^\pi = 0.63 \pm 0.05$ fm$^2$.

The contents of the paper are organized as follows. In section 2 we discuss the Omnès representation of $\Gamma_\pi(t)$ and derive the expression to calculate $\langle r^2\rangle_s^\pi$. This calculation is performed in section 3, where we consider different parameterizations for experimental data and asymptotic phases for the scalar form factor. Conclusions are given in the last section.

2 Scalar form factor

The pion scalar form factor $\Gamma_\pi(t)$, eq.(1.1), is an analytic function of $t$ with a right hand cut, due to unitarity, for $t \geq 4m_\pi^2$. Performing a dispersion relation of its logarithm, with the possible zeroes of $\Gamma_\pi(t)$ removed, the Omnès representation results,

$$\Gamma_\pi(t) = P(t)\, \exp\left[\frac{t}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\phi(s)}{s(s-t)}\, ds\right]. \qquad (2.1)$$

Here, $P(t)$ is a polynomial made up from the zeroes of $\Gamma_\pi(t)$, with $P(0) = \Gamma_\pi(0)$. In the previous equation, $\phi(s)$ is the phase of $\Gamma_\pi(t)/P(t)$, taken to be continuous and such that $\phi(4m_\pi^2) = 0$. In ref.[10] the scalar form factor is assumed to be free of zeroes and hence $P(t)$ is just the constant $\Gamma_\pi(0)$ (the exponential factor is 1 for $t=0$). Thus,

$$\Gamma_\pi(t) = \Gamma_\pi(0)\, \exp\left[\frac{t}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\phi(s)}{s(s-t)}\, ds\right], \qquad (2.2)$$

from which it follows that

$$\langle r^2\rangle_s^\pi = \frac{6}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\phi(s)}{s^2}\, ds\,. \qquad (2.3)$$

One of the features of the pion scalar form factor of refs.[4, 6, 8], as discussed in ref.[13], is the presence of a strong dip at energies around the $K\bar{K}$ threshold. This feature is also shared by the strong S-wave $I=0$ $\pi\pi$ amplitude, $t_{\pi\pi}$. This is so because $t_{\pi\pi}$ is to a very good approximation purely elastic below the $K\bar{K}$ threshold and hence, neglecting inelasticity altogether in the discussion that follows, it is proportional to $\sin\delta_\pi\, e^{i\delta_\pi}$, with $\delta_\pi$ the S-wave $I=0$ $\pi\pi$ phase shift. It is an experimental fact that $\delta_\pi$ is very close to $\pi$ around the $K\bar{K}$ threshold, as shown in fig.1. Therefore, if $\delta_\pi = \pi$ happens before the opening of this channel the strong amplitude has a zero at that energy. On the other hand, if $\delta_\pi = \pi$ occurs after the $K\bar{K}$ threshold, because inelasticity is then substantial, see eq.(2.4) below, there is not a zero but a pronounced dip in $|t_{\pi\pi}|$. This dip can be arbitrarily close to zero if before the $K\bar{K}$ threshold $\delta_\pi$ approaches $\pi$ more and more, without reaching it.

Figure 1: S-wave $I=0$ $\pi\pi$ phase shift, $\delta_\pi(s)$. Experimental data are from refs.[21, 25, 26, 27]. (Plot not reproduced. Horizontal axis in MeV, from 400 to 1600. Curves and data sets: eq.(3.13) [20], PY [24], CGL [1], Sols. A to E of [27], Kaminski et al. [21], BNL-E865 Coll. [25], NA48/2 Coll. [26]. The insert covers 300 to 450 MeV and shows BNL-E865 Coll. [25], NA48/2 Coll. [26], PY [24] and CGL [1].)

Because of the Watson final state theorem the phase $\phi(s)$ in eq.(2.1) is given by $\delta_\pi(s)$ below the $K\bar{K}$ threshold, neglecting inelasticity due to $4\pi$ or $6\pi$ states as indicated by experiments [20]. The situation above the $K\bar{K}$ threshold is more involved. Let us recall that

$$t_{\pi\pi} = (\eta\, e^{2i\delta_\pi} - 1)/2i\,, \qquad (2.4)$$

with $0 \leq \eta \leq 1$ the elasticity coefficient; the inelasticity is given by $1-\eta^2$. We denote by $\varphi(s)$ the phase of $t_{\pi\pi}$, required to be continuous (below $4m_K^2$ it is given by $\delta_\pi(s)$). By continuity, close enough to the $K\bar{K}$ threshold and above it, $\eta \to 1$ and then we are in the same situation as in the elastic case.
As a result, because of the Watson final state theorem and continuity, the phase $\phi(s)$ must still be given by $\varphi(s)$. For $\delta_\pi(s_K) < \pi$, $s_K = 4m_K^2$, $\varphi(s)$ does not follow the increasing trend with energy of $\delta_\pi(s)$ but drops as a result of eq.(2.4), see fig.2 for $\delta_\pi(s_K) < \pi$. This is easily seen by writing explicitly the real and imaginary parts of $t_{\pi\pi}$ in eq.(2.4),

$$t_{\pi\pi} = \frac{1}{2}\,\eta \sin 2\delta_\pi + \frac{i}{2}\,(1 - \eta\cos 2\delta_\pi)\,. \qquad (2.5)$$

Figure 2: Left panel: Strong phase $\varphi(s)$, eigenvalue phase $\delta_{(+)}(s)$ and asymptotic phase $\phi_{as}(s)$. Right panel: Integrand of $\langle r^2\rangle_s^\pi$ in eq.(3.12) for parameterization I (dashed line) and II (solid line). For more details see the text. Notice that the uncertainty due to $\phi_{as}(s)$ is much reduced in the integrand. (Plot not reproduced. Horizontal axes in MeV, 400 to 2000 in the left panel and 1000 to 1600 in the right panel; the curves are labelled by $\delta_\pi(s_K) < 180^\circ$ and $\delta_\pi(s_K) > 180^\circ$.)

The imaginary part is always positive ($\eta < 1$ between the $K\bar{K}$ threshold and 1.1 GeV [20]) while the real part is negative for $\delta_\pi < \pi$, but in an interval of just a few MeV the real part turns positive as soon as $\delta_\pi > \pi$, fig.1. As a result, $\varphi(s)$ passes quickly from values below but close to $\pi$ to the interval $[0, \pi/2]$. This rapid motion of $\phi(s)$ gives rise to a pronounced minimum of $|\Gamma_\pi(t)|$ at this energy, as indicated in ref.[13] and shown in fig.3. The drop in $\phi(s)$ becomes more and more dramatic as $\delta_\pi(s_K) \to \pi^-$ (with the superscript $+(-)$ indicating that the limit is approached from values above (below), respectively); and in this limit, $\phi(s_K) = \varphi(s_K)$ is discontinuous at $s_K$. This is easily understood from eq.(2.5). Let us call $s_1$ the point at which $\delta_\pi(s_1) = \pi$ with $s_1 > s_K$. Close to and above $s_1$, $\varphi(s) \in [0, \pi/2]$, for the reasons explained above, and $\varphi(s)$ has decreased very rapidly from almost $\pi$ at the $K\bar{K}$ threshold to values below $\pi/2$ just after $s_1$. Then, in the limit $s_1 \to s_K^+$ one has $\phi(s_K^-) = \varphi(s_K^-) = \pi$ on the left, while on the right $\phi(s_K^+) = \varphi(s_K^+) < \pi/2$. As a result $\varphi(s)$ is discontinuous at $s = s_K$.
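The drop of the strong phase described around eqs.(2.4) and (2.5) is easy to see numerically. The following minimal sketch evaluates the phase of $t_{\pi\pi}$ for a phase shift slightly below and slightly above $\pi$; the values of $\eta$ and $\delta_\pi$ are illustrative choices of ours, not fitted numbers.

```python
import cmath
import math

def t_pipi(delta_pi, eta):
    """S-wave amplitude of eq.(2.4): (eta * exp(2i delta) - 1) / 2i."""
    return (eta * cmath.exp(2j * delta_pi) - 1.0) / 2j

def strong_phase(delta_pi, eta):
    """Phase of t_pipi, mapped to [0, 2 pi)."""
    return cmath.phase(t_pipi(delta_pi, eta)) % (2.0 * math.pi)

eta = 0.9                      # illustrative inelastic value
below = math.pi - 0.05         # delta_pi just below pi
above = math.pi + 0.05         # delta_pi just above pi

for delta in (below, above):
    t = t_pipi(delta, eta)
    print(f"delta_pi = {delta:.3f}: Re t = {t.real:+.4f}, "
          f"Im t = {t.imag:+.4f}, phase = {strong_phase(delta, eta):.3f}")

# Elastic limit eta = 1: the phase equals delta_pi below pi
# and jumps to delta_pi - pi once delta_pi exceeds pi.
print(strong_phase(3.0, 1.0), strong_phase(math.pi + 0.05, 1.0))
```

With $\eta < 1$ the imaginary part stays positive while the real part changes sign, so the phase falls from above $\pi/2$ to below $\pi/2$ over a very small change of $\delta_\pi$. In the elastic limit the same change becomes a jump by $\pi$.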
We stress that this discontinuity of $\varphi(s)$ at $s_K$ when $\delta_\pi(s_K) \to \pi^-$ applies rigorously to $\phi(s_K)$ as well since $\eta(s_K) = 1$.

This discontinuity at $s = s_K$ implies also that the integrand in the Omnès representation for $\Gamma_\pi(t)$ develops a logarithmic singularity as

$$\frac{\phi(s_K^-) - \phi(s_K^+)}{\pi}\, \log\frac{\delta}{s_K}\,, \qquad (2.6)$$

with $\delta \to 0^+$. When exponentiating this result one has a zero for $\Gamma_\pi(s_K)$ as $(\delta/s_K)^\nu$, $\nu = (\phi(s_K^-) - \phi(s_K^+))/\pi > 0$ and $\delta \to 0^+$. This zero is a necessary consequence when evolving continuously from $\delta_\pi(s_K) < \pi$ to $\delta_\pi(s_K) > \pi$.#1 This in turn implies rigorously that in the Omnès representation of $\Gamma_\pi(t)$, eq.(2.1), $P(t)$ must be a polynomial of first degree for those cases with $\delta_\pi(s_K) \geq \pi$,#2

$$P(t) = \Gamma_\pi(0)\, \frac{s_1 - t}{s_1}\,, \qquad (2.7)$$

with $s_1$ the position of the zero. Notice that the degree of the polynomial $P(t)$ is discrete and thus by continuity it cannot change unless a singularity develops. This is the case when $\delta_\pi(s_K) = \pi$, changing the degree from 0 to 1. Hence, if $\delta_\pi(s_K) \geq \pi$ for a given $t_{\pi\pi}$, instead of eqs.(2.2) and (2.3) one must then consider

$$\Gamma_\pi(t) = \Gamma_\pi(0)\, \frac{s_1 - t}{s_1}\, \exp\left[\frac{t}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\phi(s)}{s(s-t)}\, ds\right], \qquad (2.8)$$

$$\langle r^2\rangle_s^\pi = -\frac{6}{s_1} + \frac{6}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\phi(s)}{s^2}\, ds\,. \qquad (2.9)$$

For those $t_{\pi\pi}$ for which $\delta_\pi(s_K) > \pi$, $\varphi(s)$ follows $\delta_\pi(s)$ just after the $K\bar{K}$ threshold and there is no drop, as emphasized in ref.[11], see fig.2.

Summarizing, we have shown that $\Gamma_\pi(t)$ has a zero at $s_1$ when $\delta_\pi(s_K) \geq \pi$ as a consequence of the assumption that $\phi(s)$ follows $\varphi(s)$ above the $K\bar{K}$ threshold, along the lines of ref.[11], and by imposing continuity in $\Gamma_\pi(t)$ under small changes in $\delta_\pi(s_K) \simeq \pi$. As a result eqs.(2.8) and (2.9) should be used in the latter case, instead of eqs.(2.2) and (2.3), valid for $\delta_\pi(s_K) < \pi$. This solution was overlooked in refs.[10, 11, 12]. We show in appendix A why the previous discussion on the zero of $\Gamma_\pi(t)$ for $\delta_\pi(s_K) \geq \pi$ at $s_1$ cannot be applied to all pion scalar form factors, in particular to the strange one.
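As a numerical cross-check of how eq.(2.9) follows from eq.(2.8), the sketch below evaluates the Omnès form factor with a zero for a toy phase and compares the radius obtained from the slope at $t=0$ with the sum rule. The phase `phi`, the zero position `S1` and the asymptotic value `PHI_INF` are purely illustrative assumptions; they are not a fit to data and the resulting number has no physical meaning.

```python
import math

S0 = 4 * 0.13957**2     # 4 m_pi^2 in GeV^2
S1 = 0.98               # assumed position of the zero (GeV^2), illustrative
PHI_INF = 2 * math.pi   # assumed asymptotic phase for the case with a zero

def phi(s):
    """Toy phase: vanishes at threshold and tends to PHI_INF (not a fit)."""
    return PHI_INF * (1.0 - S0 / s)

def omnes_integral(t, n=20000):
    """(1/pi) int_{S0}^inf phi(s) / (s (s - t)) ds for real t < S0.

    The substitution s = S0 / u maps the range to u in (0, 1], where the
    integrand becomes phi(S0 / u) / (S0 - t u); a midpoint rule is used.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += phi(S0 / u) / (S0 - t * u)
    return total * h / math.pi

def form_factor(t):
    """Gamma(t) / Gamma(0) from eq.(2.8)."""
    return (S1 - t) / S1 * math.exp(t * omnes_integral(t))

def radius_sum_rule():
    """<r^2> from eq.(2.9), in GeV^-2."""
    return -6.0 / S1 + 6.0 * omnes_integral(0.0)

def radius_from_slope(eps=1e-4):
    """<r^2> = 6 Gamma'(0) / Gamma(0), by a central finite difference."""
    return 6.0 * (form_factor(eps) - form_factor(-eps)) / (2.0 * eps)

print(radius_sum_rule(), radius_from_slope())
```

The two numbers agree within the finite-difference accuracy, and the sum rule differs from the zero-free case, eq.(2.3), exactly by the term $-6/s_1$.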
If eq.(2.2) were used for those $t_{\pi\pi}$ with $\delta_\pi(s_K) \geq \pi$ then a strong maximum of $|\Gamma_\pi(t)|$ would be obtained around the $K\bar{K}$ threshold, instead of the aforementioned zero or the minimum of refs.[4, 6], as shown in fig.3 by the dashed-dotted line. That is also shown in fig.10 of ref.[22] or fig.2 of [13]. This is the situation for the $\Gamma_\pi(t)$ of refs.[10, 11], and it is the reason why the $\langle r^2\rangle_s^\pi$ obtained there is much larger than that of refs.[4, 1, 6]. That is, Ynduráin uses eqs.(2.2), (2.3) for $\delta_\pi(s_K) \geq \pi$, instead of eqs.(2.8), (2.9) (solid line in fig.3). The unique and important role played by $\delta_\pi(s_K)$ (for elastic $t_{\pi\pi}$ below the $K\bar{K}$ threshold) is perfectly recognised in ref.[11]. However, that reference maintained the astonishing conclusion that $\Gamma_\pi(t)$ has two radically different behaviours under tiny variations of $t_{\pi\pi}$. These variations are enough to pass from $\delta_\pi(s_K) < \pi$ to $\delta_\pi(s_K) \geq \pi$ [10], while the T- or S-matrix are fully continuous. Because of this instability of the solution of refs.[10, 11] under tiny changes of $\delta_\pi(s)$, we consider our solution, which produces a continuous $\Gamma_\pi(t)$, to be clearly preferable. We also stress that our solutions, both for $\delta_\pi(s_K) \geq \pi$ and for $\delta_\pi(s_K) < \pi$, are the ones that agree with those obtained by solving the Muskhelishvili-Omnès equations [4, 1, 6] and Unitary $\chi$PT [8].

#1 It can be shown from eq.(2.5) that $\phi(s_K^-) - \phi(s_K^+) = \pi$. Here we are assuming $\eta = 1$ for $s \leq s_K$, which is a very good approximation as indicated by experiment [20, 21].
#2 We are focusing on the physically relevant region of experimentally allowed values for $\delta_\pi(s_K)$, which can be larger or smaller than $\pi$ but close to it.

Figure 3: $|\Gamma_\pi(t)/\Gamma_\pi(0)|$ from eq.(2.2) with $\delta_\pi(s_K) < \pi$, dashed line, and $\delta_\pi(s_K) > \pi$, dashed-dotted line. The solid line corresponds to using eq.(2.8), with $P(t) = \Gamma_\pi(0)(s_1 - t)/s_1$, for the latter case. For this figure we have used parameterization II (defined in section 3) with $\alpha_1 = 2.28$ (dashed line) and 2.20 (dashed-dotted and solid lines). The dashed-double-dotted line is the scalar form factor of ref.[8] that has $\delta_\pi(s_K) > \pi$. (Plot not reproduced. Horizontal axis in MeV, from 0 to 1200.)

Let us now show how to fix $s_1$ in terms of the knowledge of $\delta_\pi(s)$ with $\delta_\pi(s_K) \geq \pi$. For this purpose let us perform a dispersion relation of $\Gamma_\pi(t)$ with two subtractions,

$$\Gamma_\pi(t) = \Gamma_\pi(0) + \frac{1}{6}\,\Gamma_\pi(0)\,\langle r^2\rangle_s^\pi\, t + \frac{t^2}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\mathrm{Im}\,\Gamma_\pi(s)}{s^2(s-t)}\, ds\,. \qquad (2.10)$$

From asymptotic QCD [23] one expects that the scalar form factor vanishes at infinity [10, 12], so the dispersion integral in eq.(2.10) should converge rather fast. Eq.(2.10) is useful because it tells us that the only point around 1 GeV where there can be a zero in $\Gamma_\pi(t)$ is at the energy $s_1$ for which the imaginary part of $\Gamma_\pi(t)$ vanishes. Otherwise, the integral on the right hand side of eq.(2.10) picks up an imaginary part and there is no way to cancel it as $\Gamma_\pi(0)$, $\langle r^2\rangle_s^\pi$ and $t$ are all real. Since $|\mathrm{Im}\,\Gamma_\pi(t)| = |\Gamma_\pi(t)\sin\delta_\pi(t)|$ for $t \leq s_K$, it certainly vanishes at the point $s_1$ where $\delta_\pi(s_1) = \pi$. As there is only one zero at such energies, this determines $s_1$ exactly in terms of the given parameterization for $\delta_\pi(s)$.

One could object to the argument just given to determine $s_1$ that this energy could be complex. However, this would imply two zeroes, at $s_1$ and $s_1^*$, and then the degree of $P(t)$ would be two instead of one. Notice that the degree of the polynomial $P(t)$ is discrete and thus, by smoothness in the continuous parameters of the T-matrix, its value should stay at 1 for some open domain in the parameters with $\delta_\pi(s_K) > \pi$ until a discontinuity develops. Physically, the presence of two zeroes would in turn require that $\phi(s) \to 3\pi$ so as to guarantee that $\Gamma_\pi(t)$ still vanishes as $-1/t$, as required by asymptotic QCD [23, 10]. This value for the asymptotic phase seems to be rather unrealistic as $\varphi(s)$ only reaches $2\pi$ at already quite high energy values, as shown in fig.2.
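Once a parameterization of $\delta_\pi(s)$ is given, locating $s_1$ from the condition $\delta_\pi(s_1) = \pi$ is a one-dimensional root search. The sketch below uses plain bisection on a toy, monotonically increasing phase shift; `delta_toy` and its constants are hypothetical stand-ins chosen by us and are not any of the parameterizations used in this paper.

```python
import math

S0 = 4 * 0.13957**2     # 4 m_pi^2 in GeV^2
S_K = 4 * 0.4957**2     # 4 m_K^2 in GeV^2 (isospin-averaged kaon mass, assumption)

def delta_toy(s, a=1.5, mass=0.97, width=0.05):
    """Toy phase shift: smooth background plus a narrow resonance phase.

    It vanishes at threshold and increases with s; the constants are
    illustrative only.
    """
    def resonance(x):
        return math.atan2(mass * width, mass * mass - x)
    return a * math.sqrt(1.0 - S0 / s) + resonance(s) - resonance(S0)

def find_s1(delta, lo, hi, tol=1e-12):
    """Bisection for delta(s1) = pi on [lo, hi]; needs a sign change."""
    f_lo = delta(lo) - math.pi
    f_hi = delta(hi) - math.pi
    if f_lo * f_hi > 0:
        raise ValueError("delta(s) - pi does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = delta(mid) - math.pi
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

s1 = find_s1(delta_toy, S0, S_K)
print(f"s1 = {s1:.6f} GeV^2, sqrt(s1) = {math.sqrt(s1) * 1000:.1f} MeV")
```

For a phase shift with $\delta_\pi(s_K) < \pi$ the search interval contains no sign change and the function raises, which mirrors the statement that the zero is only present for $\delta_\pi(s_K) \geq \pi$.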
3 Results

Our main result from the previous section is the sum rule to determine $\langle r^2\rangle_s^\pi$,

$$\langle r^2\rangle_s^\pi = -\frac{6}{s_1}\,\theta(\delta_\pi(s_K) - \pi) + \frac{6}{\pi}\int_{4m_\pi^2}^{\infty} \frac{\phi(s)}{s^2}\, ds\,, \qquad (3.11)$$

where $\theta(x) = 0$ for $x < 0$ and 1 for $x \geq 0$. We split $\langle r^2\rangle_s^\pi$ in two parts:

$$\langle r^2\rangle_s^\pi = Q_H + Q_A\,, \quad Q_H = -\frac{6}{s_1}\,\theta(\delta_\pi(s_K) - \pi) + \frac{6}{\pi}\int_{4m_\pi^2}^{s_H} \frac{\phi(s)}{s^2}\, ds\,, \quad Q_A = \frac{6}{\pi}\int_{s_H}^{\infty} \frac{\phi(s)}{s^2}\, ds\,, \qquad (3.12)$$

with $s_H = 2.25$ GeV$^2$. Reasons for fixing $s_H$ to this value are given below.

The main issue in the application of eq.(3.11) is to determine $\phi(s)$ in the integrand. Below the $K\bar{K}$ threshold and neglecting inelasticity, one has that $\phi(s) = \delta_\pi(s)$, $4m_\pi^2 \leq s \leq 4m_K^2$. This follows because of the Watson final state theorem, continuity and the equality $\phi(4m_\pi^2) = \delta_\pi(4m_\pi^2) = 0$. For practical applications we shall consider the S-wave $I=0$ $\pi\pi$ phase shifts given by the K-matrix parameterization of ref.[20] (from its energy dependent analysis of data from 0.6 GeV up to 1.9 GeV) and the parameterizations of ref.[1] (CGL) and ref.[24] (PY). The resulting $\delta_\pi(s)$ for all these parameterizations are shown in fig.1. We use CGL from the $\pi\pi$ threshold up to 0.8 GeV, because this is the upper limit of its analysis, while PY is used up to 0.9 GeV, because at this energy it matches the data of [20] well inside the experimental errors. The K-matrix of ref.[20] is used for energies above 0.8 GeV, when using CGL below this energy (parameterization I), and above 0.9 GeV, when using PY for lower energies (parameterization II). We take the parameterizations CGL and PY because their difference below 0.8 GeV accounts well for the experimental uncertainties in $\delta_\pi$, see fig.1, and they satisfy constraints from $\chi$PT (the former) and dispersion relations (both). The reason why we do not use the parameterization of ref.[20] at lower energies is that one should be as precise as possible there, since this region gives the largest contribution to $\langle r^2\rangle_s^\pi$, as is evident from the right panel of fig.2. It happens that the K-matrix of [20], which fits data above 0.6 GeV, is not compatible with data from $K_{e4}$ decays [25, 26].
We show in the insert of fig.1 the comparison of the parameterizations CGL and PY with the $K_{e4}$ data of [25, 26]. We also show in the same figure the experimental points on $\delta_\pi$ from refs.[20, 21, 27]. Refs.[20, 21] are compatible within errors, with some disagreement above 1.5 GeV. This disagreement does not affect our numerical results since above 1.5 GeV we do not rely on data.

The K-matrix of ref.[20] is given by

$$K_{ij}(s) = \frac{\alpha_i\alpha_j}{x_1^2 - s} + \frac{\beta_i\beta_j}{x_2^2 - s} + \gamma_{ij}\,, \qquad (3.13)$$

where

$$\begin{array}{lll}
x_1^2 = 0.11 \pm 0.15 & x_2^2 = 1.19 \pm 0.01 & \\
\alpha_1 = 2.28 \pm 0.08 & \alpha_2 = 2.02 \pm 0.11 & \\
\beta_1 = -1.00 \pm 0.03 & \beta_2 = 0.47 \pm 0.05 & \\
\gamma_{11} = 2.86 \pm 0.15 & \gamma_{12} = 1.85 \pm 0.18 & \gamma_{22} = 1.00 \pm 0.53\,,
\end{array} \qquad (3.14)$$

with units given in appropriate powers of GeV. In order to calculate the contribution from the phase shifts of this K-matrix we generate Monte-Carlo gaussian samples, taking into account the errors shown in eq.(3.14), and evaluate $Q_H$ according to eq.(3.12). The central value of $\delta_\pi(s_K)$ for the K-matrix of ref.[20] is 3.05, slightly below $\pi$. When generating Monte-Carlo gaussian samples according to eq.(3.14), there are cases with $\delta_\pi(s_K) \geq \pi$, around 30% of the samples. Note that for these cases one also has the contribution $-6/s_1$ in eq.(3.11).

The application of the Watson final state theorem for $s > 4m_K^2$ is not straightforward since inelastic channels are relevant. The first important one is the $K\bar{K}$ channel, associated in turn with the appearance of the narrow $f_0(980)$ resonance, just on top of its threshold. This implies a sudden drop of the elasticity parameter $\eta$, but it rapidly rises again (the $f_0(980)$ resonance is narrow, with a width around 30 MeV) and in the region $1.1^2 \lesssim s \lesssim 1.5^2$ GeV$^2$ it is compatible within errors with $\eta = 1$ [20, 21]. For $\eta \simeq 1$, the Watson final state theorem would imply again that $\phi(s) = \varphi(s)$ but, as emphasized by [13], this equality only holds, in principle, modulo $\pi$.
The reason advocated in ref.[13] is the presence of the region $s_K < s < 1.1^2$ GeV$^2$ where inelasticity can be large, so that continuity arguments alone cannot be applied to guarantee the equality $\phi(s) \simeq \varphi(s)$ for $s \gtrsim 1.1^2$ GeV$^2$. This argument has been shown in ref.[11] to be quite irrelevant in the present case. In order to show this a diagonalization of the $\pi\pi$ and $K\bar{K}$ S-matrix is done. These channels are the relevant ones when $\eta$ is clearly different from 1, between 1 and 1.1 GeV. Above that energy one also has the opening of the $\eta\eta$ channel and the increasing role of multipion states.

We reproduce here the arguments of ref.[11], but deliver expressions directly in terms of the phase shifts and elasticity parameter, instead of K-matrix parameters as done in ref.[11]. For two channel scattering, because of unitarity, the T-matrix can be written as

$$T = \begin{pmatrix} \frac{1}{2i}\left(\eta e^{2i\delta_\pi} - 1\right) & \frac{1}{2}\sqrt{1-\eta^2}\, e^{i(\delta_\pi+\delta_K)} \\ \frac{1}{2}\sqrt{1-\eta^2}\, e^{i(\delta_\pi+\delta_K)} & \frac{1}{2i}\left(\eta e^{2i\delta_K} - 1\right) \end{pmatrix}, \qquad (3.15)$$

with $\delta_K$ the elastic S-wave $I=0$ $K\bar{K}$ phase shift. In terms of the T-matrix the S-wave $I=0$ S-matrix is given by

$$S = I + 2iT\,, \qquad (3.16)$$

satisfying $SS^\dagger = S^\dagger S = I$. The T-matrix can also be written as

$$T = Q^{1/2}\left(K^{-1} - iQ\right)^{-1} Q^{1/2}\,, \qquad (3.17)$$

where the K-matrix is real and symmetric along the real axis for $s \geq 4m_\pi^2$ and $Q = \mathrm{diag}(q_\pi, q_K)$, with $q_\pi$ ($q_K$) the center of mass momentum of pions (kaons). This allows one to diagonalize $K$ with a real orthogonal matrix $C$, and hence both the T- and S-matrices are also diagonalized with the same matrix. Writing

$$C = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \qquad (3.18)$$

one has

$$\cos\theta = \frac{\left[(1-\eta^2)/2\right]^{1/2}}{\left[1 - \eta^2\cos^2\Delta - \eta|\sin\Delta|\sqrt{1-\eta^2\cos^2\Delta}\right]^{1/2}}\,, \quad \sin\theta = -\frac{\sin\Delta}{\sqrt{2}}\, \frac{\eta - \sqrt{1 + (1-\eta^2)\cot^2\Delta}}{\left[1 - \eta^2\cos^2\Delta - \eta|\sin\Delta|\sqrt{1-\eta^2\cos^2\Delta}\right]^{1/2}}\,, \qquad (3.19)$$

with $\Delta = \delta_K - \delta_\pi$. On the other hand, the eigenvalues of the S-matrix are given by

$$e^{2i\delta_{(+)}} = S_{11}\, \frac{1 + e^{2i\Delta}}{2}\left[1 - \frac{i}{\eta}\tan\Delta\,\sqrt{1 + (1-\eta^2)\cot^2\Delta}\right], \qquad (3.20)$$

$$e^{2i\delta_{(-)}} = S_{22}\, \frac{1 + e^{-2i\Delta}}{2}\left[1 + \frac{i}{\eta}\tan\Delta\,\sqrt{1 + (1-\eta^2)\cot^2\Delta}\right]. \qquad (3.21)$$

The eigenvalue phase $\delta_{(+)}$ satisfies $\delta_{(+)}(s_K) = \delta_\pi(s_K)$.
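The diagonalization of eqs.(3.15) to (3.21) can be verified numerically. The sketch below builds the two-channel S-matrix from $\eta$, $\delta_\pi$ and $\delta_K$, rotates it with the matrix $C$ of eq.(3.18) using the mixing angle of eq.(3.19), and compares the diagonal entries with the closed forms of eqs.(3.20) and (3.21). The input values are arbitrary illustrative numbers of ours, and the formulas are coded as written above, so the sketch requires $\eta < 1$, $\sin\Delta \neq 0$ and $\cos\Delta \neq 0$.

```python
import cmath
import math

def s_matrix(eta, d_pi, d_k):
    """S = I + 2iT with T from eq.(3.15)."""
    off = 1j * math.sqrt(1.0 - eta**2) * cmath.exp(1j * (d_pi + d_k))
    return [[eta * cmath.exp(2j * d_pi), off],
            [off, eta * cmath.exp(2j * d_k)]]

def mixing(eta, d_pi, d_k):
    """cos(theta), sin(theta) from eq.(3.19)."""
    dlt = d_k - d_pi
    root = math.sqrt(1.0 - (eta * math.cos(dlt))**2)
    denom = math.sqrt(root**2 - eta * abs(math.sin(dlt)) * root)
    cot2 = (math.cos(dlt) / math.sin(dlt))**2
    cos_t = math.sqrt((1.0 - eta**2) / 2.0) / denom
    sin_t = (-math.sin(dlt) / math.sqrt(2.0)
             * (eta - math.sqrt(1.0 + (1.0 - eta**2) * cot2)) / denom)
    return cos_t, sin_t

def eigenvalues(eta, d_pi, d_k):
    """exp(2i delta_(+)), exp(2i delta_(-)) from eqs.(3.20), (3.21)."""
    dlt = d_k - d_pi
    s = s_matrix(eta, d_pi, d_k)
    cot2 = (math.cos(dlt) / math.sin(dlt))**2
    w = math.tan(dlt) / eta * math.sqrt(1.0 + (1.0 - eta**2) * cot2)
    e_plus = s[0][0] * (1 + cmath.exp(2j * dlt)) / 2 * (1 - 1j * w)
    e_minus = s[1][1] * (1 + cmath.exp(-2j * dlt)) / 2 * (1 + 1j * w)
    return e_plus, e_minus

def rotate(s, cos_t, sin_t):
    """C^T S C with C = [[cos, sin], [-sin, cos]], eq.(3.18)."""
    c = [[cos_t, sin_t], [-sin_t, cos_t]]
    ct = [[cos_t, -sin_t], [sin_t, cos_t]]
    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
    return mul(mul(ct, s), c)

eta, d_pi, d_k = 0.8, 3.3, 0.4          # illustrative inputs
cos_t, sin_t = mixing(eta, d_pi, d_k)
diag = rotate(s_matrix(eta, d_pi, d_k), cos_t, sin_t)
e_plus, e_minus = eigenvalues(eta, d_pi, d_k)
print(abs(diag[0][1]), abs(diag[0][0] - e_plus), abs(diag[1][1] - e_minus))
```

All three printed numbers are at the level of rounding errors, and both eigenvalues have unit modulus, as expected for two decoupled elastic channels.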
The expressions above for $\exp 2i\delta_{(+)}$ and $\exp 2i\delta_{(-)}$ are interchanged when $\tan\Delta$ crosses zero, and simultaneously the sign on the right hand side of eq.(3.19) for $\sin\theta$ changes. This diagonalization allows one to disentangle two elastic scattering channels. The scalar form factors attached to each of these channels, $\Gamma'_1$ and $\Gamma'_2$, will satisfy the Watson final state theorem in the whole energy range and then one has

$$\Gamma' \equiv \begin{pmatrix} \Gamma'_1 \\ \Gamma'_2 \end{pmatrix} = C^T Q^{1/2}\,\Gamma = C^T Q^{1/2} \begin{pmatrix} \Gamma_\pi \\ \Gamma_K \end{pmatrix},$$

$$\Gamma_\pi = q_\pi^{-1/2}\left(\lambda\cos\theta\,|\Gamma'_1|\, e^{i\delta_{(+)}} \pm \sin\theta\,|\Gamma'_2|\, e^{i\delta_{(-)}}\right), \quad \Gamma_K = q_K^{-1/2}\left(\pm\cos\theta\,|\Gamma'_2|\, e^{i\delta_{(-)}} - \lambda\sin\theta\,|\Gamma'_1|\, e^{i\delta_{(+)}}\right). \qquad (3.22)$$

The $\pm$ in front of $|\Gamma'_2|$ is due to the fact that $\Gamma'_2 = 0$ at $s_K$, as follows from its definition in the equation above. Since the Watson final state theorem only fixes the phase of $\Gamma'_2$ modulo $\pi$, and the phase is not defined at the zero, we cannot fix the sign in front at this stage. Next, $\Gamma'_1$ has a zero at $s_1$ when $\delta_\pi(s_K) \geq \pi$. For this case, $-|\Gamma'_1|$ must appear in the previous equation, so as to guarantee continuity of its ascribed phase, and this is why $\lambda = (-1)^{\theta(\delta_\pi(s_K) - \pi)}$. Now, when $\eta \to 1$ then $\sin\theta \to 0$ as $\sqrt{(1-\eta)/2}$ and $\phi(s)$ is then the eigenvalue phase $\delta_{(+)}$. This eigenvalue phase can be calculated given the T-matrix. For those T-matrices employed here, and those of refs.[10, 11, 4, 13], $\delta_{(+)}(s)$ follows $\varphi(s)$ rather closely in the whole energy range. This is shown in fig.2 and was already discussed in detail in ref.[11]. In this way, one guarantees that $\phi(s)$ and $\varphi(s)$ do not differ from each other by an integer multiple of $\pi$ when $\eta \simeq 1$, $1.1^2 \lesssim s \lesssim 1.5^2$ GeV$^2$.

For the calculation of $Q_H$ in eq.(3.12) we shall equate $\phi(s) = \varphi(s)$ for $4m_K^2 < s < 1.5^2$ GeV$^2$. Denoting

$$I_H = \frac{6}{\pi}\int_{4m_\pi^2}^{s_H} \frac{\phi(s)}{s^2}\, ds = I_1 + I_2 + I_3\,, \quad I_1 = \frac{6}{\pi}\int_{4m_\pi^2}^{s_K} \frac{\phi(s)}{s^2}\, ds\,, \quad I_2 = \frac{6}{\pi}\int_{s_K}^{1.1^2} \frac{\phi(s)}{s^2}\, ds\,, \quad I_3 = \frac{6}{\pi}\int_{1.1^2}^{s_H} \frac{\phi(s)}{s^2}\, ds\,, \qquad (3.23)$$

one has

$$Q_H \simeq I_H - \frac{6}{s_1}\,\theta(\delta_\pi(s_K) - \pi)\,. \qquad (3.24)$$

Now, eq.(3.22) can also be used to estimate the error of approximating $\phi(s)$ by $\varphi(s)$ in the range $4m_K^2 < s < 1.5^2$ GeV$^2$ to calculate $I_2$ and $I_3$ as done in eq.(3.23). We could have also used $\delta_{(+)}(s)$ in eq.(3.23). However, notice that when $\eta \lesssim 1$ then $\varphi(s) \simeq \delta_{(+)}(s)$, and when inelasticity could be substantial the difference between $\delta_{(+)}(s)$ and $\varphi(s)$ is well taken into account in the error analysis that follows. Remarkably, consistency of our approach also requires $\phi(s)$ to be closer to $\varphi(s)$ than to $\delta_{(+)}(s)$. The reason is that $\varphi(s)$ for $\delta_\pi(s_K) \geq \pi$ is, to a very good approximation, the $\varphi(s)$ for $\delta_\pi(s_K) < \pi$ plus $\pi$, as is clear from fig.2. This difference is precisely the one required in order to have the same value for $\langle r^2\rangle_s^\pi$ from eq.(3.11) either for $\delta_\pi(s_K) < \pi$ or for $\delta_\pi(s_K) \geq \pi$. However, the difference for $\delta_{(+)}(s)$ between $\delta_\pi(s_K) < \pi$ and $\delta_\pi(s_K) \geq \pi$ is smaller than $\pi$. Indeed, we note that $\phi(s)$ follows $\varphi(s)$ more closely than $\delta_{(+)}(s)$ for the explicit form factors of refs.[8, 4].

Let us consider first the range $1.1^2 < s < 1.5^2$ GeV$^2$ where from experiment [20] $\eta \simeq 1$ within errors. With $\epsilon = \pm\tan\theta\,|\Gamma'_2/\Gamma'_1|$ and $\rho = \delta_{(-)} - \delta_{(+)}$, eq.(3.22) allows us to write

$$\Gamma_\pi = q_\pi^{-1/2}\,\lambda\cos\theta\,|\Gamma'_1|\, e^{i\delta_{(+)}}\,(1 + \epsilon\cos\rho)\left(1 + i\,\frac{\epsilon\sin\rho}{1 + \epsilon\cos\rho}\right). \qquad (3.25)$$

When $\eta \to 1$ then $\epsilon \to 0$, according to the expansion#3

$$\tan\theta = \frac{\sqrt{(1-\eta)/2}}{\sin\Delta}\left[1 - \frac{1 + 3\cos 2\Delta}{8\sin^2\Delta}\,(1-\eta)\right] + O\left((1-\eta)^{5/2}\right). \qquad (3.26)$$

Rewriting

$$1 + i\,\frac{\epsilon\sin\rho}{1 + \epsilon\cos\rho} = \exp\left(i\,\frac{\epsilon\sin\rho}{1 + \epsilon\cos\rho}\right) + O(\epsilon^2)\,, \qquad (3.27)$$

eqs.(3.25) and (3.27) imply a shift in $\delta_{(+)}$ because of inelasticity effects,

$$\delta_{(+)} \to \delta_{(+)} + \frac{\epsilon\sin\rho}{1 + \epsilon\cos\rho}\,. \qquad (3.28)$$

#3 The ratio $|\Gamma'_2/\Gamma'_1|$, present in $\epsilon$, is not expected to be large since the $f_0(1300)$ couples mostly to $4\pi$ and similarly to $\pi\pi$ and $K\bar{K}$, and the $f_0(1500)$ does so mostly to $\pi\pi$ [28].

Using $\eta = 0.8$ in the range $1.1^2 \lesssim s \lesssim 1.5^2$ GeV$^2$, where $\eta \simeq 1$ from the energy dependent analysis of ref.[20] given by the K-matrix of eq.(3.13), one ends with $\epsilon \simeq 0.3$. Taking into account that $\delta_{(+)} \gtrsim 3\pi/2$ for $\delta_\pi(s_K) \geq \pi$ (in this case $\delta_{(+)} \simeq \delta_\pi$), and around $3\pi/4$ for $\delta_\pi(s_K) < \pi$, see fig.2, one ends with relative corrections to $\delta_{(+)}$ around 6% for the former case and 13% for the latter.
Although the K-matrix of ref.[20], eq.(3.13), is given up to 1.9 GeV, one should be aware that taking only the two channels $\pi\pi$ and $K\bar{K}$ in the whole energy range is an oversimplification, particularly above 1.2 GeV. Because of this we finally double the previous estimate. Hence $I_3$ is calculated with a relative error of 12% for $\delta_\pi(s_K) \geq \pi$ and 25% for $\delta_\pi(s_K) < \pi$.

In the narrow region $s_K < s < 1.1^2$ GeV$^2$, $\eta$ can be rather different from 1, due to the $f_0(980)$ that couples very strongly to the just opened $K\bar{K}$ channel. However, the direct measurements of $\pi\pi \to K\bar{K}$ [29], where $1-\eta^2$ is directly measured,#4 provide a better way to determine $\eta$ than $\pi\pi$ scattering [20, 21]. It follows from the former experiments, as shown also by explicit calculations [30, 31, 32], that $\eta$ is not as small as indicated by $\pi\pi$ experiments [20], and one has $\eta \simeq 0.6 - 0.7$ for its minimum value. Employing $\eta = 0.6$ in eq.(3.28) one has $\epsilon \simeq 0.5$. Taking $\delta_{(+)}$ around $\pi/2$ when $\delta_\pi(s_K) < \pi$, this implies a relative error of 30%. For $\delta_\pi(s_K) \geq \pi$ one has instead $\delta_{(+)} \gtrsim \pi$, and an estimated error of 15%. Regarding the ratio of the moduli of form factors entering in $\epsilon$ we expect it to be $\lesssim 1$ (see appendix A). Therefore, our error in the evaluation of $I_2$ is estimated to be 30% and 15% for the cases $\delta_\pi(s_K) < \pi$ and $\delta_\pi(s_K) \geq \pi$, respectively.

As a result of the discussion following eq.(3.24), we consider that the error estimates made for $I_2$ and $I_3$ in the case $\delta_\pi(s_K) < \pi$ are too conservative and that the relative errors given for $\delta_\pi(s_K) > \pi$ are more realistic. Nonetheless, since the absolute errors that one obtains for $I_2$ and $I_3$ are the same in both cases (because $I_2$ and $I_3$ for $\delta_\pi(s_K) < \pi$ are around a factor 2 smaller than those for $\delta_\pi(s_K) \geq \pi$) we keep the errors as given above.
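The percentages quoted for the inelasticity corrections can be reproduced with a short calculation. The sketch below assumes, as a reading of the text on our side, that $\epsilon$ is estimated from the leading term of eq.(3.26) with $|\sin\Delta| = 1$ and $|\Gamma'_2/\Gamma'_1| = 1$, and that the size of the shift in eq.(3.28) is taken to be $\epsilon$ itself.

```python
import math

def epsilon_leading(eta, sin_delta=1.0, ratio=1.0):
    """Leading term of eq.(3.26) times |Gamma'_2 / Gamma'_1| (assumed 1)."""
    return math.sqrt((1.0 - eta) / 2.0) / abs(sin_delta) * ratio

def relative_shift(eps, delta_plus):
    """Shift of eq.(3.28), taken of size eps, relative to delta_(+)."""
    return eps / delta_plus

# Range 1.1^2 < s < 1.5^2 GeV^2 (integral I_3), eta = 0.8, eps rounded to 0.3.
print(f"eps(0.8) = {epsilon_leading(0.8):.2f}")
i3_ge = relative_shift(0.3, 1.5 * math.pi)    # delta_(+) ~ 3 pi / 2
i3_lt = relative_shift(0.3, 0.75 * math.pi)   # delta_(+) ~ 3 pi / 4
print(f"I3: {100 * i3_ge:.0f}% and {100 * i3_lt:.0f}% before doubling")

# Range s_K < s < 1.1^2 GeV^2 (integral I_2), eta = 0.6, eps rounded to 0.5.
print(f"eps(0.6) = {epsilon_leading(0.6):.2f}")
i2_lt = relative_shift(0.5, 0.5 * math.pi)    # delta_(+) ~ pi / 2
i2_ge = relative_shift(0.5, math.pi)          # delta_(+) ~ pi
print(f"I2: {100 * i2_lt:.0f}% and {100 * i2_ge:.0f}%")
```

Under these assumptions the output is close to the 6% and 13% (doubled to 12% and 25%) quoted for $I_3$ and to the 30% and 15% quoted for $I_2$.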
To the previous errors for I2 and I3 due to inelasticity, we also add in quadrature the noise in the calculation of QH due to the error in tππ from the uncertainties in the parameters of the K-matrix, eqs.(3.13) and (3.14), and those in the parameterizations CGL and PY.

We finally employ for s > 2.25 GeV² the knowledge of the asymptotic phase of the pion scalar form factor in order to evaluate QA in eq.(3.12). The function φ(s) is determined so as to match the asymptotic behaviour of Γπ(t) as −1/t from QCD. The Omnès representations of the scalar form factor, eqs.(2.2) and (2.8), tend to t^(−q/π) and t^(−q/π+1) for t → ∞, respectively. Here, q is the asymptotic value of the phase φ(s) when s → ∞. Hence, for δπ(sK) < π the function φ(s) is required to tend to π, while for δπ(sK) ≥ π the asymptotic value should be 2π. The way φ(s) is predicted to approach the limiting value is somewhat ambiguous [11, 12],

\[ \phi_{as}(s) \simeq \pi \left( n \pm \frac{2 d_m}{\log(s/\Lambda^2)} \right) . \tag{3.29} \]

In this equation, 2dm = 24/(33 − 2nf) ≃ 1, Λ² is the QCD scale parameter and n = 1, 2 for δπ(4m²K) < π, ≥ π, respectively. The case n = 2 was not discussed in refs.[10, 11, 12, 13, 14] for the form factor given in eq.(1.1). There is also a controversy between [14] and [12] regarding the ± sign in eq.(3.29). If leading twist contributions dominate [11, 12] then the limiting value is reached from above and one has the plus sign, while if twist three contributions are the dominant ones [14] the minus sign has to be considered [12]. In the left panel of fig.2 we show with the wide bands the values of φas(s) for s > 2.25 GeV² from eq.(3.29), considering both signs, for n = 1 (δπ(sK) < π) and 2 (δπ(sK) ≥ π). We see in the figure that above 1.4−1.5 GeV (1.96−2.25 GeV²) the phases ϕ(s) and φas(s) match, and this is why we take sH = 2.25 GeV² in eq.(3.11), similarly to refs.[10, 11]. In this way, we also avoid entering into hadronic details in a region where η < 1 with the onset of the f0(1500) resonance. The present uncertainty whether the + or − sign holds in eq.(3.29) is taken as a source of error in evaluating QA. The other source of uncertainty comes from the value taken for Λ², 0.1 < Λ² < 0.35 GeV², as suggested in ref.[10]. From fig.2 it is clear that our error estimate for φas(s) is very conservative and should account for uncertainties due to the onset of inelasticity for energies above 1.4 − 1.5 GeV and to the appearance of the f0(1500) resonance. In the right panel of fig.2 we show the integrand for 〈r²〉^π_s, eq.(3.12), for parameterization I (dashed line) and II (solid line). Notice that the large uncertainty in φas(s) is much reduced in the integrand, as happens in the higher energy domain. In table 1 we show the values of I1, I2, I3, QH, QA and 〈r²〉^π_s for the parameterizations I and II and for the two cases δπ(sK) ≥ π and δπ(sK) < π.

#4 Neglecting multipion states.

φ(s)        I               I               II              II
δπ(sK)      ≥ π             < π             ≥ π             < π
I1          0.435 ± 0.013   0.435 ± 0.013   0.483 ± 0.013   0.483 ± 0.013
I2          0.063 ± 0.010   0.020 ± 0.006   0.063 ± 0.010   0.020 ± 0.006
I3          0.143 ± 0.017   0.053 ± 0.013   0.143 ± 0.017   0.053 ± 0.013
QH          0.403 ± 0.024   0.508 ± 0.019   0.452 ± 0.024   0.554 ± 0.019
QA          0.21 ± 0.03     0.10 ± 0.03     0.21 ± 0.03     0.10 ± 0.03
〈r²〉^π_s    0.61 ± 0.04     0.61 ± 0.04     0.66 ± 0.04     0.66 ± 0.04

Table 1: Different contributions to 〈r²〉^π_s as defined in eqs.(3.12) and (3.23). All the units are fm². In the value for 〈r²〉^π_s the errors due to I1, I2, I3 and QA are added in quadrature.
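As a cross-check of the entries of table 1, the following minimal Python sketch recombines the quoted numbers. It assumes that the radius in each column is the sum QH + QA (our reading of eq.(3.12), which is not reproduced in this section) and that its error is the quadrature sum of the errors of I1, I2, I3 and QA, as stated in the table caption. The dictionary layout and function names are illustrative only.

```python
import math

# Values transcribed from table 1, all in fm^2, as (central value, error).
# Keys: (phase parameterization, case for delta_pi(s_K)).
TABLE = {
    ("I", ">=pi"): {"I1": (0.435, 0.013), "I2": (0.063, 0.010), "I3": (0.143, 0.017),
                    "QH": (0.403, 0.024), "QA": (0.21, 0.03), "r2": (0.61, 0.04)},
    ("I", "<pi"): {"I1": (0.435, 0.013), "I2": (0.020, 0.006), "I3": (0.053, 0.013),
                   "QH": (0.508, 0.019), "QA": (0.10, 0.03), "r2": (0.61, 0.04)},
    ("II", ">=pi"): {"I1": (0.483, 0.013), "I2": (0.063, 0.010), "I3": (0.143, 0.017),
                     "QH": (0.452, 0.024), "QA": (0.21, 0.03), "r2": (0.66, 0.04)},
    ("II", "<pi"): {"I1": (0.483, 0.013), "I2": (0.020, 0.006), "I3": (0.053, 0.013),
                    "QH": (0.554, 0.019), "QA": (0.10, 0.03), "r2": (0.66, 0.04)},
}


def quadrature(*errors):
    """Add independent errors in quadrature."""
    return math.sqrt(sum(e * e for e in errors))


def recombine(column):
    """Return (Q_H + Q_A, quadrature error from I1, I2, I3 and Q_A)."""
    value = column["QH"][0] + column["QA"][0]  # assumption: radius = Q_H + Q_A
    error = quadrature(column["I1"][1], column["I2"][1],
                       column["I3"][1], column["QA"][1])
    return value, error


results = {key: recombine(col) for key, col in TABLE.items()}
for key, (value, error) in results.items():
    print(key, f"{value:.3f} +- {error:.3f}", "quoted:", TABLE[key]["r2"])

# Mean of the quoted radii for parameterizations I and II.
mean_r2 = 0.5 * (0.61 + 0.66)
print(f"mean of I and II: {mean_r2:.3f} fm^2")
```

Within the rounding of the table, each recombined value agrees with the quoted radius for both cases of δπ(sK).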
This table shows the disappearance of the disagreement between the cases δπ(sK) ≥ π and δπ(sK) < π from the ππ and KK̄ T-matrix of eq.(3.13), once the zero of Γπ(t) at s1 < sK is taken into account for the former case. This disagreement was the reason for the controversy between Ynduráin and ref.[13] regarding the value of 〈r²〉^π_s. Parameterization II gives rise to a larger value of 〈r²〉^π_s than I because PY follows the upper δπ data below 0.9 GeV, while CGL follows the lower ones, as shown in fig.1. The different errors in table 1 are added in quadrature. The final value for 〈r²〉^π_s is the mean of those of parameterizations I and II, and the error is taken such that it spans the interval of values in table 1 at the level of two sigmas. One ends with: 〈r²〉^π_s = 0.63 ± 0.05 fm² . (3.30) The largest sources of error in 〈r²〉^π_s are the uncertainties in the experimental δπ and in the asymptotic phase φas. The former are enhanced because of their weight in the integrand, see fig.2, and the latter matters because of its large size. Our number above and that of refs.[1, 4], 〈r²〉^π_s = 0.61 ± 0.04 fm², are then compatible. On the other hand, we have also evaluated 〈r²〉^π_s directly from the scalar form factor obtained with the dynamical approach of ref.[8] from Unitary χPT and we obtain 〈r²〉^π_s = 0.64 ± 0.06 fm², in perfect agreement with eq.(3.30). Notice that the scalar form factor of ref.[8] has δπ(sK) > π and we have checked that it has a zero at s1, as it should. This is shown in fig.3 by the dashed-double-dotted line. The value 〈r²〉^π_s = 0.75 ± 0.07 fm² from refs.[10, 11] is much larger than ours because the possibility of a zero at s1 was not taken into account there and another solution was considered. This solution, however, has an unstable behaviour under the transition from δπ(sK) = π − 0⁺ to δπ(sK) = π + 0⁺ and it cannot be connected continuously with the one for δπ(sK) < π.
Our solution for Γπ(t) from Ynduráin’s method does not have this unstable behaviour and it is continuous under changes in the values of the parameters of the K-matrix, eqs.(3.13) and (3.14). This is why, from our results, it also follows that the interesting discussion of ref.[11], regarding whether δπ(sK) < π or ≥ π, is no longer conclusive for explaining the disagreement between the values of refs.[10, 11] and ref.[1] for 〈r²〉^π_s. We can also work out from our determination of 〈r²〉^π_s, eq.(3.30), values for the O(p⁴) SU(2) χPT low energy constant ℓ̄4. We take the two loop expression in χPT for 〈r²〉^π_s [1],

\[ \langle r^2\rangle^\pi_s = \frac{3}{8\pi^2 f_\pi^2}\left\{ \bar{\ell}_4 - \frac{13}{12} + \xi \Delta_r \right\} , \tag{3.31} \]

where fπ = 92.4 MeV is the pion decay constant, ξ = (Mπ/4πfπ)² and Mπ is the pion mass. First, at the one loop level Δr = 0 and then one obtains ℓ̄4 = 4.7 ± 0.3 . (3.32) We now move to the determination of ℓ̄4 based on the full two loop relation between 〈r²〉^π_s and ℓ̄4. The expression for Δr can be found in Appendix C of ref.[1]. Δr is given in terms of one O(p⁶) χPT counterterm, r̃_S2, and four O(p⁴) ones. Taking the values of all these parameters, except for ℓ̄4, from ref.[1], and solving for ℓ̄4, one arrives at ℓ̄4 = 4.5 ± 0.3 . (3.33) This number is in good agreement with ℓ̄4 = 4.4 ± 0.2 [1]. Ref.[12] also points out that one loop χPT fits to the S-, P- and D-wave scattering lengths and effective ranges give rise to much larger values for ℓ̄2 and ℓ̄4 than those of ref.[1]. For more details we refer to [12].

4 Conclusions

In this paper we have addressed the issue of the discrepancies between the values of the quadratic pion scalar radius of Leutwyler et al. [4, 13], 〈r²〉^π_s = 0.61 ± 0.04 fm², and Ynduráin’s papers [10, 11, 12], 〈r²〉^π_s = 0.75 ± 0.07 fm². One of the reasons of interest for having a precise determination of 〈r²〉^π_s is its contribution of about 10% to the scattering lengths a^0_0 and a^2_0, calculated with a precision of 2% in ref.[1]. The value taken for 〈r²〉^π_s is also important for determining the O(p⁴) χPT coupling ℓ̄4.
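The sensitivity of ℓ̄4 to the scalar radius can be illustrated with the one loop relation. The sketch below assumes the one loop form 〈r²〉^π_s = 3/(8π²fπ²) (ℓ̄4 − 13/12), i.e. eq.(3.31) with Δr = 0, together with fπ = 92.4 MeV from the text and the conversion constant ħc ≈ 197.327 MeV fm (our input, not given in the text). Function names are illustrative.

```python
import math

HBARC_MEV_FM = 197.327  # hbar*c in MeV fm, used to convert f_pi to fm^-1 (assumed input)
F_PI_MEV = 92.4         # pion decay constant quoted in the text


def one_loop_prefactor():
    """Return 3 / (8 pi^2 f_pi^2) in fm^2."""
    f_pi = F_PI_MEV / HBARC_MEV_FM  # f_pi in fm^-1
    return 3.0 / (8.0 * math.pi ** 2 * f_pi ** 2)


def l4_one_loop(r2_fm2):
    """Invert <r^2> = prefactor * (l4 - 13/12), i.e. Delta_r = 0."""
    return r2_fm2 / one_loop_prefactor() + 13.0 / 12.0


def l4_error(dr2_fm2):
    """Linear propagation of the radius error to l4."""
    return dr2_fm2 / one_loop_prefactor()


l4 = l4_one_loop(0.63)
dl4 = l4_error(0.05)
print(f"l4 = {l4:.2f} +- {dl4:.2f}")
```

Under these assumptions the inversion of 〈r²〉^π_s = 0.63 ± 0.05 fm² gives a value consistent with the one loop result ℓ̄4 = 4.7 ± 0.3 quoted in eq.(3.32).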
From our study it follows that Ynduráin’s method to calculate 〈r2〉πs [10, 11], based on an Omnès representation of the pion scalar form factor, and that derived by solving the two(three) coupled channel Muskhelishvili-Omnès equations [4, 1, 6], are compatible. It is shown that the reason for the aforementioned discrepancy is the presence of a zero in Γπ(t) for those S-wave I=0 T−matrices with δπ(sK) ≥ π and elastic below the KK̄ threshold, with sK = 4m2K . This zero was overlooked in refs.[10, 11], though, if one imposes continuity in the solution obtained under tiny changes of the ππ phase shifts employed, it is necessarily required by the approach followed there. Once this zero is taken into account the same value for 〈r2〉πs is obtained irrespectively of whether δπ(sK) ≥ π or δπ(sK) < π. Our final result is 〈r2〉πs = 0.63 ± 0.05 fm2. The error estimated takes into account experimental uncertainty in the values of δπ(s), inelasticity effects and present ignorance in the way the phase of the form factor approaches its asymptotic value π, as predicted from QCD. Employing our value for 〈r2〉πs we calculate ℓ̄4 = 4.5 ± 0.3. The values 〈r2〉πs = 0.61± 0.04 fm2 and ℓ̄4 = 4.5± 0.3 of ref.[1] are then in good agreement with ours. Acknowledgements We thank Miguel Albaladejo for providing us numerical results from some unpublished T−matrices and Carlos Schat for his collaboration in a parallel research. We also thank F.J. Ynduráin for long discussions and B. Anathanarayan, I. Caprini, G. Colangelo, J. Gasser and H. Leutwyler for a critical reading of a previous version of the manuscript. This work was supported in part by the MEC (Spain) and FEDER (EC) Grants FPA2004-03470 and Fis2006-03438, the Fundación Séneca (Murcia) grant Ref. 02975/PI/05, the European Commission (EC) RTN Network EURIDICE under Contract No. HPRN-CT2002-00311 and the HadronPhysics I3 Project (EC) Contract No RII3-CT-2004-506078. 
Appendices

A Coupled channel dynamics

We take ππ and KK̄ coupled channels and denote by F1 and F2 their respective I=0 scalar form factors. Unitarity requires

\[ \mathrm{Im}\,F_i = \sum_{j} F_j\, \rho_j\, \theta(t - s'_j)\, t^*_{ji} , \tag{A.1} \]

where ||t_ij|| is the I=0 S-wave T-matrix, s'_i is the threshold energy squared of channel i and ρi = qi/(8π√s), with qi its center of mass three-momentum. A general solution to the previous equations is given by

\[ F = T\,G , \qquad F = \begin{pmatrix} F_1 \\ F_2 \end{pmatrix} , \qquad G = \begin{pmatrix} G_1 \\ G_2 \end{pmatrix} , \tag{A.2} \]

where the functions Gi(t) do not have a right hand cut. This equation is interesting as it tells us that if pion dynamics dominates, |G1| >> |G2|, then F1 ≃ G1 t11 and the form factor phase φ(s) follows ϕ(s). As a result, like t11, it has a zero at s1 below the KK̄ threshold for δπ(sK) ≥ π, as shown in section 3. On the other hand, if kaon dynamics dominates, |G2| >> |G1|, then F1 ≃ G2 t12 and φ(s) follows the phase of t12, which above the KK̄ threshold is clearly above π. This is why for the pion strange scalar form factor there is no zero at s1 ≲ sK for δπ(sK) ≥ π; indeed there is a maximum like that shown in fig.3 by the dashed-dotted line. As in section 3 we now proceed to the diagonalization above the KK̄ threshold of the renormalized T-matrix T′,

\[ T' = \rho^{1/2}\, T\, \rho^{1/2} , \qquad \rho = \begin{pmatrix} \rho_1 & 0 \\ 0 & \rho_2 \end{pmatrix} , \qquad \widetilde{T} = C^T T' C = \begin{pmatrix} \tilde{t}_{11} & 0 \\ 0 & \tilde{t}_{22} \end{pmatrix} , \]
\[ \tilde{t}_{11} = \sin\delta_{(+)}\, e^{i\delta_{(+)}} , \qquad \tilde{t}_{22} = \sin\delta_{(-)}\, e^{i\delta_{(-)}} . \tag{A.3} \]

The corresponding diagonal form factors F′1 and F′2, collected in the vector F′, are

\[ F' = C^T \rho^{1/2} F = \widetilde{T}\, C^T \rho^{-1/2}\, G = \begin{pmatrix} \tilde{t}_{11} \left\{ \cos\theta\, \rho_1^{-1/2} G_1 - \sin\theta\, \rho_2^{-1/2} G_2 \right\} \\ \tilde{t}_{22} \left\{ \sin\theta\, \rho_1^{-1/2} G_1 + \cos\theta\, \rho_2^{-1/2} G_2 \right\} \end{pmatrix} . \tag{A.4} \]

The previous expressions allow one to obtain F1 directly in terms of the eigenphases and with a clean separation between pion, G1, and kaon dynamics, G2. From eq.(3.22) it follows that

\[ F_1 = \tilde{t}_{11} \left\{ \cos^2\theta\, \rho_1^{-1} G_1 - \cos\theta \sin\theta\, \rho_2^{-1/2} \rho_1^{-1/2} G_2 \right\} + \tilde{t}_{22} \left\{ \sin^2\theta\, \rho_1^{-1} G_1 + \cos\theta \sin\theta\, \rho_2^{-1/2} \rho_1^{-1/2} G_2 \right\} . \tag{A.5} \]

For δπ(sK) ≥ π typical values, somewhat above the KK̄ threshold, are e^{2iδ(+)} ≃ +i, e^{2iδ(−)} ≃ −i and sin θ > 0. For dominance of G1 one has F1/G1 ≃ ρ1^{−1} (i + cos 2θ)/2, while for dominance of G2 the result is F1/G2 ≃ − sin θ cos θ ρ2^{−1/2} ρ1^{−1/2} < 0.
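The two limiting ratios quoted above can be checked numerically. The sketch below assumes that F1 is given in terms of the eigen-amplitudes by F1 = t̃11 {cos²θ G1/ρ1 − cos θ sin θ G2/√(ρ1ρ2)} + t̃22 {sin²θ G1/ρ1 + cos θ sin θ G2/√(ρ1ρ2)}, with t̃ = sin δ e^{iδ} = (e^{2iδ} − 1)/(2i), which is our reading of eq.(A.5). The numerical values of θ, ρ1 and ρ2 are hypothetical and chosen only for illustration.

```python
import cmath
import math


def t_tilde(exp_2i_delta):
    """Eigen-amplitude sin(delta) e^{i delta} written as (e^{2 i delta} - 1) / (2i)."""
    return (exp_2i_delta - 1) / 2j


def form_factor_f1(theta, rho1, rho2, g1, g2, e2id_plus, e2id_minus):
    """F1 in terms of the eigenphases, following our reading of eq.(A.5)."""
    t11, t22 = t_tilde(e2id_plus), t_tilde(e2id_minus)
    c, s = math.cos(theta), math.sin(theta)
    mix = c * s / math.sqrt(rho1 * rho2)
    return (t11 * (c * c * g1 / rho1 - mix * g2)
            + t22 * (s * s * g1 / rho1 + mix * g2))


# Hypothetical inputs somewhat above the K Kbar threshold, case delta_pi(s_K) >= pi.
theta, rho1, rho2 = 0.3, 0.04, 0.02

# Pion dominance (G2 = 0): expect F1/G1 = (i + cos 2 theta) / (2 rho1).
pion = form_factor_f1(theta, rho1, rho2, 1.0, 0.0, 1j, -1j)
pion_expected = (1j + math.cos(2 * theta)) / (2 * rho1)

# Kaon dominance (G1 = 0): expect F1/G2 = -sin(theta) cos(theta) / sqrt(rho1 rho2) < 0.
kaon = form_factor_f1(theta, rho1, rho2, 0.0, 1.0, 1j, -1j)
kaon_expected = -math.sin(theta) * math.cos(theta) / math.sqrt(rho1 * rho2)

print("pion dominance:", pion, "phase/pi =", cmath.phase(pion) / math.pi)
print("kaon dominance:", kaon)
```

Swapping the eigenphase inputs to e^{2iδ(+)} = −i and e^{2iδ(−)} = +i with sin θ < 0 gives the corresponding check for δπ(sK) < π.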
The factors G1,2 do not introduce any change in φ(s) with respect to its value before the opening of the KK̄ threshold since they are smooth functions of s.#5 In both cases the phase φ(s) is larger than π and F1 follows the upper trend of phases shown in fig.2 (note that in this case t̃11 is in the first quadrant though δπ > π). Now, doing the same exercise for δπ(sK) < π, one has the typical values e^{2iδ(+)} ≃ −i, e^{2iδ(−)} ≃ +i and sin θ < 0. For pion dominance then F1/G1 ≃ ρ1^{−1} (i − cos 2θ)/2 and for kaon dominance F1/G2 ≃ + sin θ cos θ ρ2^{−1/2} ρ1^{−1/2} < 0. Thus, in the former case the phase is ≳ π/2, and follows the lower trend of phases of fig.2, while in the latter it is ≳ π and follows again the upper trend (this is the case of the strange scalar form factor). The demonstration in section 3 that φ(sK) is discontinuous in the limit δπ(sK) → π⁻, by taking s1 → sK⁺, cannot be applied in the case of kaon dominance (e.g. pion strange scalar form factor). From eq.(A.5) it follows that

\[ F_1(t) \simeq -\cos\theta \sin\theta\, \rho_2^{-1/2} \rho_1^{-1/2}\, G_2 \left( \tilde{t}_{11} - \tilde{t}_{22} \right) . \tag{A.6} \]

The point is that t̃22 for t ≥ s1 (s1 → sK⁺) is of size comparable with that of t̃11 (both tend to zero) and the phase does not follow δ(+). This is not the case for pion dominance because for s1 → sK⁺ then sin²θ → 0, F1(t) ≃ cos²θ ρ1^{−1} G1 t̃11, eq.(A.5), and φ(s) follows δ(+). From eq.(A.4) we can also write |Γ′2/Γ′1| ≃ |t̃11 tan θ/t̃22| for the case of pion dominance. Since typically |t̃11/t̃22| ≃ 1, as shown above for energies somewhat above the KK̄ threshold, then |Γ′2/Γ′1| ≃ | tan θ| < 1. This is why we consider that equating it to 1 in section 3 is a conservative estimate.

#5 Due to the Adler zeroes this is not necessarily the case close to the ππ threshold.

References

[1] G. Colangelo, J. Gasser and H. Leutwyler, Nucl. Phys. B603, 125 (2001).
[2] J. Gasser and H. Leutwyler, Phys. Lett. B125, 325 (1983).
[3] G. Colangelo and S. Dür, Eur. Phys. J. C33, 543 (2004).
[4] J. F. Donoghue, J. Gasser and H. Leutwyler, Nucl. Phys. B343, 341 (1990).
[5] B.
Ananthanarayan, G. Colangelo, J. Gasser and H. Leutwyler, Phys. Rep. 353, 207 (2001).
[6] B. Moussallam, Eur. Phys. J. C14, 111 (2000).
[7] J. Gasser and U.-G. Meißner, Nucl. Phys. B357, 90 (1991).
[8] U. G. Meißner and J. A. Oller, Nucl. Phys. A679, 671 (2001).
[9] J. Bijnens, G. Colangelo and P. Talavera, JHEP 9805, 014 (1998).
[10] F. J. Ynduráin, Phys. Lett. B578, 99 (2004); (E)-ibid. B586, 439 (2004).
[11] F. J. Ynduráin, Phys. Lett. B612, 245 (2005).
[12] F. J. Ynduráin, arXiv:hep-ph/0510317.
[13] B. Ananthanarayan, I. Caprini, G. Colangelo, J. Gasser and H. Leutwyler, Phys. Lett. B602, 218 (2004).
[14] I. Caprini, G. Colangelo and H. Leutwyler, Int. J. Mod. Phys. A21, 954 (2006).
[15] M. Jamin, J. A. Oller and A. Pich, JHEP 0402, 047 (2004); Phys. Rev. D74, 074009 (2006).
[16] M. Jamin, J. A. Oller and A. Pich, Nucl. Phys. B587, 331 (2000).
[17] J. Bijnens and P. Talavera, Nucl. Phys. B669, 341 (2003).
[18] O. P. Yushchenko et al., Phys. Lett. B581, 31 (2004).
[19] T. Alexopoulos et al. [KTeV Collaboration], Phys. Rev. D70, 092007 (2004).
[20] B. Hyams et al., Nucl. Phys. B64, 134 (1973).
[21] R. Kaminski, L. Lesniak and K. Rybicki, Z. Phys. C74, 79 (1997).
[22] F. Guerrero and J. A. Oller, Nucl. Phys. B537, 459 (1999); (E)-ibid. B602, 641 (2001).
[23] S. J. Brodsky and G. P. Lepage, Phys. Rev. D22, 2157 (1980).
[24] J. R. Peláez and F. J. Ynduráin, Phys. Rev. D68, 074005 (2003); ibid. D71, 074016 (2005).
[25] S. Pislak et al. [BNL-E865 Collaboration], Phys. Rev. Lett. 87, 221801; Phys. Rev. D67, 072004 (2003).
[26] L. Masetti [NA48/2 Collaboration], arXiv:hep-ex/0610071.
[27] G. Grayer et al., Nucl. Phys. B75, 189 (1974).
[28] W.-M. Yao et al., Journal of Physics G33, 1 (2006).
[29] W. Wetzel et al., Nucl. Phys. B115, 208 (1976); V. A. Polychromatos et al., Phys. Rev. D19, 1317 (1979); D. Cohen et al., Phys. Rev. D22, 2595 (1980); E. Etkin et al., Phys. Rev. D25, 1786 (1982).
[30] J. A. Oller and E. Oset, Nucl. Phys.
A 620 (1997) 438 [(E)-ibid. A 652 (1999) 407].
[31] J. A. Oller and E. Oset, Phys. Rev. D 60 (1999) 074023.
[32] M. Albaladejo and J. A. Oller, forthcoming. Here the 4π channel is included.
[33] J. A. Oller, Nucl. Phys. A727, 353 (2003).

ABSTRACT The quadratic pion scalar radius, 〈r²〉^π_s, plays an important role for present precise determinations of ππ scattering. Recently, Ynduráin, using an Omnès representation of the null isospin (I) non-strange pion scalar form factor, obtains 〈r²〉^π_s = 0.75 ± 0.07 fm². This value is larger than the one calculated by solving the corresponding Muskhelishvili-Omnès equations, 〈r²〉^π_s = 0.61 ± 0.04 fm². A large discrepancy between both values, given the precision, then results. We reanalyze Ynduráin's method and show that by imposing continuity of the resulting pion scalar form factor under tiny changes in the input ππ phase shifts, a zero in the form factor for some S-wave I=0 T-matrices is then required. Once this is accounted for, the resulting value is 〈r²〉^π_s = 0.65 ± 0.05 fm². The main source of error in our determination is present experimental uncertainties in low energy S-wave I=0 ππ phase shifts. Another important contribution to our error is the not yet settled asymptotic behaviour of the phase of the scalar form factor from QCD. <|endoftext|><|startoftext|> Introduction The paper addresses a topic related to conditionally free (or, shortly, using the term from [2], c-free) probability. This notion was developed in the ’90s (see [1], [2]) as an extension of freeness within the framework of ∗-algebras endowed with not one, but two states. Namely, given a family of unital algebras {Ai}i∈I, each Ai endowed with two expectations ϕi, ψi : Ai → C, their c-free product is the triple (A, ϕ, ψ), where: (i) A = ∗i∈I Ai is the free product of the algebras Ai.
(ii) ψ = ∗i∈I ψi and ϕ = ∗(ψi)i∈I ϕi are expectations given by the relations (a) ψ(a1 · · · an) = 0 (b) ϕ(a1 · · · an) = ϕε(1)(a1) · · · ϕε(n)(an) for all aj ∈ Aε(j), j = 1, . . . , n such that ψε(j)(aj) = 0 and ε(1) ≠ · · · ≠ ε(n). A key result is that if the Ai are ∗-algebras and ϕi, ψi are positive functionals, then ϕ and ψ are also positive. In [6], the positivity of the free product maps ϕ, ψ is proved for the case when ϕ1, ϕ2 are positive conditional expectations in a common C∗-subalgebra, but ψ1, ψ2 remain positive C-valued maps. A more general situation is indeed discussed (see Theorem 3, Section 6, from [6]), but the question whether ϕ, ψ are positive for ϕ1,2, ψ1,2 arbitrary positive conditional expectations is left unanswered. A first answer was given in [8], where we showed that for A a ∗-algebra, the analogous construction with both ϕ and ψ valued in a C∗-subalgebra B of A still retains the positivity. The present paper further develops this result (see Theorem 2.3) and also demonstrates the use of multilinear function series in the c-free setting.

2000 Mathematics Subject Classification. Primary 45L53; Secondary 46L08.
Key words and phrases. conditional freeness, conditional expectation, R-transform, multilinear function series.
http://arxiv.org/abs/0704.0040v3
MIHAI POPA

In [2], a c-free version of Voiculescu’s R-transform is constructed, which we will call the cR-transform, with the property that cRX+Y = cRX + cRY if X and Y are c-free elements from the algebra A relative to ϕ and ψ (i.e. the relations (a) and (b) from the definition of the c-free product hold true for the subalgebras generated by X and Y). The apparatus of multilinear function series is used in recent work of K. Dykema ([3] and [4]) to construct suitable analogues for the R- and S-transforms in the framework of freeness with amalgamation. We will show that this construction is also appropriate for the cR-transform mentioned above.
The techniques used differ from the ones of [3], the Fock space type construction being substituted by combinatorial techniques similar to [2] and [7]. In particular, Theorems 3.3 and 3.6 contain new (shorter) proofs of the results 6.1–6.13 from [3]. The paper is structured in four sections. Section 2 states the basic definitions and proves the main positivity results. Section 3 describes the construction and the basic property of the multilinear function series cR-transform, and Section 4 treats the central limit theorem and a related positivity property.

2. Definitions and positivity results

Definition 2.1. Let {Ai}i∈I be a family of algebras, all containing the subalgebra B. Suppose D is a subalgebra of B and Ψi : Ai → D and Φi : Ai → B are conditional expectations, i ∈ I. We say that the triple (A,Φ,Ψ) = ∗i∈I(Ai,Φi,Ψi) is the conditionally free product with amalgamation over (B,D), or shortly, the c-free product, of the triples (Ai,Φi,Ψi)i∈I if (1) A is the free product with amalgamation over B of the family (Ai)i∈I (2) Ψ = ∗i∈I Ψi and Φ = ∗(Ψi)i∈I Φi are determined by the relations Ψ(a1a2 . . . an) = 0, Φ(a1a2 . . . an) = Φ(a1)Φ(a2) . . . Φ(an), for all ai ∈ Aε(i), ε(i) ∈ I such that ε(1) ≠ ε(2) ≠ · · · ≠ ε(n) and Ψε(i)(ai) = 0. When D = C, this definition reduces to the one given in [6]. When both B and D are equal to C, this definition was given in [2]. When discussing positivity, we need a ∗-structure on our algebras. We will demand that B and D be C∗-algebras, while Ai and A are only required to be ∗-algebras. The following results are slightly modified versions of Lemma 6.4 and Theorem 6.5 from [8].

Lemma 2.2. Let B be a C∗-algebra and A1, A2 be two ∗-algebras containing B as a ∗-subalgebra, endowed with positive conditional expectations Φj : Aj → B, j = 1, 2. If a1, . . . , an ∈ A1, an+1, . . .
, an+m ∈ A2 and A = (Ai,j)i,j ∈ Mn+m(B) is the matrix with the entries

\[ A_{i,j} = \begin{cases} \Phi_1(a_i^* a_j) & \text{if } i, j \le n \\ \Phi_1(a_i^*)\,\Phi_2(a_j) & \text{if } i \le n,\ j > n \\ \Phi_2(a_i^*)\,\Phi_1(a_j) & \text{if } i > n,\ j \le n \\ \Phi_2(a_i^* a_j) & \text{if } i, j > n \end{cases} \]

then A is positive.

MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION

Proof. The vector space E = B ⊕ ker(Φ1) ⊕ ker(Φ2) has a B-bimodule structure given by the algebraic operations on A1 and A2. Consider the B-sesquilinear pairing 〈·, ·〉 : E × E → B determined by the relations:
〈b1, b2〉 = b1∗b2, for b1, b2 ∈ B
〈uj, vj〉 = Φj(uj∗vj), for uj, vj ∈ ker(Φj), j = 1, 2
〈u1, u2〉 = 〈u2, u1〉 = 0, for u1 ∈ ker(Φ1) and u2 ∈ ker(Φ2)
〈b, uj〉 = 〈uj, b〉 = 0, for all b ∈ B, uj ∈ Aj
With this notation, we have that Ai,j = 〈ai, aj〉, hence it suffices to show that 〈a, a〉 ≥ 0 for all a ∈ E. Indeed, for an element a = b + u1 + u2 with b ∈ B, uj ∈ ker(Φj), j = 1, 2, we have:

\[ \langle a, a\rangle = \langle b + u_1 + u_2,\ b + u_1 + u_2\rangle = \langle b, b\rangle + \langle u_1, u_1\rangle + \langle u_2, u_2\rangle = b^*b + \Phi_1(u_1^*u_1) + \Phi_2(u_2^*u_2) \ge 0 . \]

Theorem 2.3. Let B be a C∗-algebra and D a C∗-subalgebra of B. Suppose that A1, A2 are ∗-algebras containing B, each endowed with two positive conditional expectations Φj : Aj → B and Ψj : Aj → D, j = 1, 2, and consider the c-free product (A,Φ,Ψ) = ∗i=1,2(Ai,Φi,Ψi). Then the maps Φ and Ψ are positive.

Proof. The positivity of Ψ is by now a classical result in the theory of free probability with amalgamation over a C∗-algebra (for example, see [9], Theorem 3.5.6). For the positivity of Φ we have to show that Φ(a∗a) ≥ 0 for any a ∈ A. Any element of A can be written as

\[ \sum_{k=1}^{N} s_{1,k} \cdots s_{n(k),k} , \qquad s_{j,k} \in A_{\varepsilon(j,k)} , \quad \varepsilon(1,k) \ne \varepsilon(2,k) \ne \cdots \ne \varepsilon(n(k),k) . \]

Writing s_{j,k} = s_{j,k} − Ψ(s_{j,k}) + Ψ(s_{j,k}) and expanding the product, we can consider a of the form

\[ a = d + \sum_{k=1}^{N} a_{1,k} \cdots a_{n(k),k} \]

with d ∈ D ⊂ B and a_{j,k} ∈ A_{ε(j,k)} such that Ψ_{ε(j,k)}(a_{j,k}) = 0 and ε(1, k) ≠ ε(2, k) ≠ · · · ≠ ε(n(k), k). Therefore

\[ \Phi(a^*a) = \Phi\Big( d^*d + d^* \sum_{k=1}^{N} a_{1,k} \cdots a_{n(k),k} + \Big[ \sum_{k=1}^{N} a_{1,k} \cdots a_{n(k),k} \Big]^* d + \Big[ \sum_{k=1}^{N} a_{1,k} \cdots a_{n(k),k} \Big]^* \Big[ \sum_{k=1}^{N} a_{1,k} \cdots a_{n(k),k} \Big] \Big) . \]

Since Φ is a conditional expectation and d ∈ D ⊂ B, the above equality becomes

\[ \Phi(a^*a) = d^*d + \sum_{k=1}^{N} d^*\, \Phi(a_{1,k} \cdots a_{n(k),k}) + \sum_{k=1}^{N} \Phi(a^*_{n(k),k} \cdots a^*_{1,k})\, d + \sum_{k,l=1}^{N} \Phi\big( a^*_{n(k),k} \cdots a^*_{1,k}\, a_{1,l} \cdots a_{n(l),l} \big) . \]

Using the definition of the conditionally free product with amalgamation over B and that Ψ_{ε(j,k)}(a_{j,k}) = 0 for all j, k, one further has

\[ \Phi(a^*a) = d^*d + \sum_{k=1}^{N} \Phi(d^* a_{1,k})\, \Phi(a_{2,k}) \cdots \Phi(a_{n(k),k}) + \sum_{k=1}^{N} \Phi(a^*_{n(k),k}) \cdots \Phi(a^*_{2,k})\, \Phi(a^*_{1,k} d) + \sum_{k,l=1}^{N} \Phi(a^*_{n(k),k}) \cdots \Phi(a^*_{2,k})\, \Phi(a^*_{1,k} a_{1,l})\, \Phi(a_{2,l}) \cdots \Phi(a_{n(l),l}) , \]

that is

\[ \Phi(a^*a) = d^*d + \sum_{k=1}^{N} \Phi(d^* a_{1,k}) \big[ \Phi(a_{2,k}) \cdots \Phi(a_{n(k),k}) \big] + \sum_{k=1}^{N} \big[ \Phi(a_{2,k}) \cdots \Phi(a_{n(k),k}) \big]^* \Phi(a^*_{1,k} d) + \sum_{k,l=1}^{N} \big[ \Phi(a_{2,k}) \cdots \Phi(a_{n(k),k}) \big]^* \Phi(a^*_{1,k} a_{1,l}) \big[ \Phi(a_{2,l}) \cdots \Phi(a_{n(l),l}) \big] . \]

Denote now a_{1,N+1} = d and v_k = Φ(a_{2,k}) · · · Φ(a_{n(k),k}). From Lemma 2.2, the matrix S = (Φ(a∗_{1,i} a_{1,j}))_{i,j=1}^{N+1} is positive in M_{N+1}(B), therefore S = T∗T for some T ∈ M_{N+1}(B). The identity for Φ(a∗a) becomes:

\[ \Phi(a^*a) = (v_1, \dots, v_N, 1)^*\, T^*T\, (v_1, \dots, v_N, 1) \ge 0 , \]

as claimed.

Theorem 2.4. Assume that I = ⋃_{j∈J} Ij is a partition of I. Then: ∗j∈J (∗i∈Ij (Ai,Φi,Ψi)) = ∗i∈I(Ai,Φi,Ψi)

Proof. The proof is identical to the proofs of similar results in [6] and [2]. Consider ai ∈ Aε(i), 1 ≤ i ≤ m, such that ε(1) ≠ ε(2) ≠ · · · ≠ ε(m) and Ψε(i)(ai) = 0. Let 1 = i0 < i1 < · · · < ik = m and Jl = {ε(i), i_{l−1} ≤ i < i_l}. Since (∗_{j∈Jl} Ψj)(a_{i_{l−1}} · · · a_{i_l}) = 0, it suffices to show that

\[ \Phi(a_1 \cdots a_m) = \prod_{l=1}^{k} \big[ (*_{(\Psi_j),\, j \in J_l} \Phi_j)(a_{i_{l-1}} \cdots a_{i_l}) \big] . \]

But Φ(a1 · · · am) = Φε(1)(a1) · · · Φε(m)(am) while, since Ψε(i)(ai) = 0, (∗_{(Ψj), j∈Jl} Φj)(a_{i_{l−1}} · · · a_{i_l}) = Φ_{i_{l−1}}(a_{i_{l−1}}) · · · Φ_{i_l}(a_{i_l}) and the conclusion follows. □

Definition 2.5. Let A be an algebra (respectively a ∗-algebra), B a subalgebra (∗-subalgebra) of A and D a subalgebra (∗-subalgebra) of B. Suppose A is endowed with the conditional expectations Ψ : A → D and Φ : A → D. (i) The subalgebras (∗-subalgebras) (Ai)i∈I of A are said to be c-free with respect to (Φ,Ψ) if: (a) (Ai)i∈I are free with respect to Ψ.
(b) if ai ∈ Aε(i), 1 ≤ i ≤ m, are such that ε(1) ≠ · · · ≠ ε(m) and Ψ(ai) = 0, then Φ(a1 · · · am) = Φ(a1) · · · Φ(am). (ii) The elements (Xi)i∈I of A are said to be c-free with respect to (Φ,Ψ) if the subalgebras (∗-subalgebras) generated by B and Xi are c-free with respect to (Φ,Ψ).

We will denote by B〈ξ〉 the non-commutative algebra of polynomials in the symbol ξ and with coefficients from B (the coefficients do not commute with the symbol ξ). If I is a family of indices, B〈{ξi}i∈I〉 will denote the algebra of polynomials in the non-commuting variables {ξi}i∈I and with coefficients from B. We will identify B〈{ξi}i∈I〉 with the free product with amalgamation over B of the family {B〈ξi〉}i∈I. If A is a ∗-algebra and B is a C∗-algebra, B〈ξ〉 will also be considered with a ∗-algebra structure, by taking ξ∗ = ξ. If X is a selfadjoint element of A, we define the conditional expectations ΦX, ΨX : B〈ξ〉 → B given by ΦX(f(ξ)) = Φ(f(X)) and ΨX(f(ξ)) = Ψ(f(X)), for any f(ξ) ∈ B〈ξ〉.

Corollary 2.6. Suppose that A is a ∗-algebra and X and Y are c-free selfadjoint elements of A such that the maps ΦX, ΨX and ΦY, ΨY are positive. Then the maps ΦX+Y and ΨX+Y are also positive.

Proof. The positivity of ΨX+Y is an immediate consequence of the fact that X and Y are free with amalgamation over B with respect to Ψ. It remains to prove the positivity of ΦX+Y. Since the maps ΦX : B〈ξ1〉 → B and ΦY : B〈ξ2〉 → B are positive, from Theorem 2.3 so is ΦX ∗(ΨX,ΨY) ΦY : B〈ξ1〉 ∗B B〈ξ2〉 = B〈ξ1, ξ2〉 → B. Remark also that iX+Y : B〈ξ〉 ∋ f(ξ) ↦ f(ξ1 + ξ2) ∈ B〈ξ1〉 ∗B B〈ξ2〉 is a positive B-functional. The conclusion follows from the fact that the c-freeness of X and Y is equivalent to ΦX+Y = (ΦX ∗(ΨX,ΨY) ΦY) ◦ iX+Y.

3. Multilinear function series and the cR-transform

Let A be a ∗-algebra containing the C∗-algebra B, endowed with a conditional expectation Ψ : A → B.
If X is a selfadjoint element of A, then by the moment of order n of X we will understand the map from B × · · · × B (n − 1 times) to B given by

\[ (b_1, \dots, b_{n-1}) \mapsto \Psi(X b_1 X \cdots X b_{n-1} X) . \]

If B = C, then the moment-generating series of X,

\[ m_X(z) = \sum_{n} \Psi(X^n) z^n , \]

encodes all the information about the moments of X. For B ≠ C, the straightforward generalization

\[ m_X(z) = \sum_{n} \Psi(X^n) z^n \]

generally fails to keep track of all the possible moments of X. A solution to this inconvenience was proposed in [3], namely the moment-generating multilinear function series of X. Before defining this notion, we will briefly recall the construction and several results on multilinear function series.

Let B be an algebra over a field K. We set B̃ equal to B if B is unital and to the unitization of B otherwise. For n ≥ 1, we denote by Ln(B) the set of all K-multilinear maps ωn : B × · · · × B (n times) → B. A formal multilinear function series over B is a sequence ω = (ω0, ω1, . . . ), where ω0 ∈ B̃ and ωn ∈ Ln(B) for n ≥ 1. According to [3], the set of all multilinear function series over B will be denoted by Mul[[B]]. For α, β ∈ Mul[[B]], the formal sum α + β and the formal product αβ are the elements from Mul[[B]] defined by:

\[ (\alpha + \beta)_n(b_1, \dots, b_n) = \alpha_n(b_1, \dots, b_n) + \beta_n(b_1, \dots, b_n) \]
\[ (\alpha\beta)_n(b_1, \dots, b_n) = \sum_{k=0}^{n} \alpha_k(b_1, \dots, b_k)\, \beta_{n-k}(b_{k+1}, \dots, b_n) \]

for any b1, . . . , bn ∈ B. If β0 = 0, then the formal composition α ◦ β ∈ Mul[[B]] is defined by (α ◦ β)0 = α0 and, for n ≥ 1, by

\[ (\alpha \circ \beta)_n(b_1, \dots, b_n) = \sum_{k=1}^{n} \sum_{p_1, \dots, p_k} \alpha_k\big( \beta_{p_1}(b_1, \dots, b_{p_1}), \dots, \beta_{p_k}(b_{q_k+1}, \dots, b_{q_k+p_k}) \big) \]

where the second summation is done over all k-tuples p1, . . . , pk ≥ 1 such that p1 + · · · + pk = n and qj = p1 + · · · + pj−1. One can work with elements of Mul[[B]] as if they were formal power series. The relevant properties are described in [3], Proposition 2.3 and Proposition 2.6.
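The formal product and composition just defined can be made concrete with a small sketch. The Python code below is only an illustration under our own modelling choices: a series is represented as a function taking a tuple (b1, ..., bn) and returning an element of B (the empty tuple gives the 0-th term), and B is replaced by 2×2 integer matrices as a simple noncommutative stand-in. The class `Mat2`, the example series `beta` and all function names are hypothetical.

```python
class Mat2:
    """2x2 integer matrix: a small noncommutative stand-in for the algebra B."""

    def __init__(self, a, b, c, d):
        self.e = (a, b, c, d)

    def __add__(self, other):
        return Mat2(*(x + y for x, y in zip(self.e, other.e)))

    def __mul__(self, other):
        a, b, c, d = self.e
        p, q, r, s = other.e
        return Mat2(a * p + b * r, a * q + b * s, c * p + d * r, c * q + d * s)

    def __eq__(self, other):
        return isinstance(other, Mat2) and self.e == other.e


ZERO = Mat2(0, 0, 0, 0)


def identity_series(bs):
    """The series (0, id_B, 0, 0, ...): the identity for formal composition."""
    return bs[0] if len(bs) == 1 else ZERO


def product(alpha, beta):
    """Formal product: sum over k of alpha_k(b1..bk) * beta_{n-k}(b_{k+1}..bn)."""
    def series(bs):
        total = ZERO
        for k in range(len(bs) + 1):
            total = total + alpha(bs[:k]) * beta(bs[k:])
        return total
    return series


def compositions(n, k):
    """All k-tuples (p1, ..., pk) of positive integers with p1 + ... + pk = n."""
    if k == 1:
        yield (n,)
        return
    for p in range(1, n - k + 2):
        for rest in compositions(n - p, k - 1):
            yield (p,) + rest


def compose(alpha, beta):
    """Formal composition alpha o beta, meaningful when beta_0 = 0."""
    def series(bs):
        n = len(bs)
        if n == 0:
            return alpha(())
        total = ZERO
        for k in range(1, n + 1):
            for parts in compositions(n, k):
                args, q = [], 0
                for p in parts:  # q plays the role of q_j = p1 + ... + p_{j-1}
                    args.append(beta(bs[q:q + p]))
                    q += p
                total = total + alpha(tuple(args))
        return total
    return series


C = Mat2(1, 1, 0, 2)


def beta(bs):
    """Hypothetical example series: beta_n(b1, ..., bn) = C b1 C b2 C ... bn C."""
    out = C
    for b in bs:
        out = out * b * C
    return out


X, Y, Z = Mat2(1, 2, 0, 1), Mat2(0, 1, 1, 0), Mat2(2, 0, 1, 1)
I_beta = product(identity_series, beta)
I_beta_I = product(I_beta, identity_series)
```

In this model, multiplying on the left by the series (0, id_B, 0, . . . ) shifts the arguments: `I_beta((X, Y, Z))` equals `X * beta((Y, Z))`, and `I_beta_I((X, Y, Z))` equals `X * beta((Y,)) * Z`.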
As in [3], we use 1, respectively I, to denote the identity elements of Mul[[B]] relative to multiplication, respectively composition. In other words, 1 = (1, 0, 0, . . . ) and I = (0, idB, 0, 0, . . . ). We will also use the fact that an element α ∈Mul[[B]] has an inverse with respect to formal composition, denoted α〈−1〉, if and only if α has the form (0, α1, α2, . . . ) with α1 an invertible element of L1(B). Definition 3.1. With the above notation, the moment-generating multilinear func- tion series MX of X is the element of Mul[[B]] such that: MX,0 = Ψ(X) MX,n(b1, . . . , bn) = Ψ(Xb1X · · ·XbnX). Given an element α ∈ Mul[[B]], the multilinear function series Rα is defined by the following equation (see [3], Def 6.1): (1 + αI) ◦ (I + IαI)〈−1〉. (3.1) A key property of R is that for any X,Y ∈ A free over B, we have RMX+Y = RMX +RMY . (3.2) 8 MIHAI POPA These relations were proved earlier in the particular case B = C. One can also describe Rα by combinatorial means, via the recurrence relation αn(b1, . . . , bn) = [b1αp(1)(b3, . . . , bi1−2)bi1−1], . . . . . . , [bi(k−1)αp(k)(bi(k−1)+1, . . . , bi(k)−2)bi(k)−1] bi(k)αn−ik(bik+1 , . . . , bn) where the second summation is done over all 1 = i(0) < i(1) < · · · < i(k) ≤ n and p(k) = i(k)− i(k − 1)− 2. Following an idea from [2], the above equation can be graphically illustrated by the picture: In the case of scalar c-free probability, an analogue of the Voiculescu’s R- transform is developed in [2]. In order to avoid confusions, we will denote it by cR. The cR-transform has the property that it linearizes the c-free convolution of pairs of compactly supported measures. In particular, if X and Y are c-free elements from some algebra A, then cRX+Y = cRX + cRY . 
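In the scalar case B = C the cR coefficients can be computed directly from the moments. The sketch below assumes the moment-cumulant recurrence of [2] in the form ϕ(X^n) = Σ_{k=1}^{n} cR_k Σ ψ(X^{l(1)}) · · · ψ(X^{l(k−1)}) ϕ(X^{l(k)}), where the inner sum runs over l(1), . . . , l(k) ≥ 0 with l(1) + · · · + l(k) = n − k. Moments are passed as lists indexed by the power, with the 0-th moment equal to 1; the function names are illustrative.

```python
def block_sum(k, m, phi, psi):
    """Sum over l(1)+...+l(k) = m, l(i) >= 0, of psi[l(1)]...psi[l(k-1)] * phi[l(k)]."""
    if k == 1:
        return phi[m]
    return sum(psi[l] * block_sum(k - 1, m - l, phi, psi) for l in range(m + 1))


def c_free_cumulants(phi, psi):
    """Solve the recurrence for cR_1, ..., cR_N given moments phi[0..N], psi[0..N].

    The k = n term of the recurrence has all l(i) = 0 and coefficient 1, so
    cR_n = phi[n] - sum_{k < n} cR_k * block_sum(k, n - k).
    """
    n_max = len(phi) - 1
    cr = [None]  # placeholder so that cr[k] is the k-th coefficient
    for n in range(1, n_max + 1):
        lower = sum(cr[k] * block_sum(k, n - k, phi, psi) for k in range(1, n))
        cr.append(phi[n] - lower)
    return cr[1:]


# phi = psi = standard semicircle moments: the recurrence reduces to the free
# cumulants, which vanish except for the second one.
semicircle = [1, 0, 1, 0, 2, 0, 5]
print(c_free_cumulants(semicircle, semicircle))

# psi = moments of the point mass at 0: the recurrence reduces to the Boolean
# cumulants of phi (here a symmetric Bernoulli distribution).
bernoulli = [1, 0, 1, 0, 1]
delta0 = [1, 0, 0, 0, 0]
print(c_free_cumulants(bernoulli, delta0))
```

The two usage examples reflect the two standard specializations: for ϕ = ψ the coefficients are the free cumulants, and for ψ concentrated at 0 they are the Boolean cumulants.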
If the ∗-algebra A is endowed with the C-valued states ϕ, ψ and X is a selfadjoint element of A, then (see [2]) the coefficients {cRm}m≥0 of cRX are defined by the recurrence:

\[ \varphi(X^n) = \sum_{k=1}^{n} \ \sum_{\substack{l(1), \dots, l(k) \ge 0 \\ l(1) + \cdots + l(k) = n-k}} {}^{c}R_k \cdot \psi(X^{l(1)}) \cdots \psi(X^{l(k-1)})\, \varphi(X^{l(k)}) , \]

an equation that can be graphically illustrated by a picture in which the dark boxes stand for the application of ϕ and the light ones for the application of ψ. The above considerations lead to the following definition:

Definition 3.2. Let β, γ ∈ Mul[[B]]. The multilinear function series cRβ,γ is the element of Mul[[B]] defined by the recurrence relation

\[ \beta_n(b_1, \dots, b_n) = \sum_{k} \sum {}^{c}R_{\beta,\gamma,k}\big( [b_1 \gamma_{p(1)}(b_2, \dots, b_{i(1)-2})\, b_{i(1)-1}], \dots, [b_{i(k-1)} \gamma_{p(k)}(b_{i(k-1)+1}, \dots, b_{i(k)-2})\, b_{i(k)-1}] \big)\, b_{i(k)}\, \beta_{n-i(k)}(b_{i(k)+1}, \dots, b_n) \]

where the second summation is done over all 1 = i(0) < i(1) < · · · < i(k) ≤ n and p(k) = i(k) − i(k − 1) − 2.

The following analytical description of cRβ,γ also shows that it is unique and well-defined:

Theorem 3.3. For any β, γ ∈ Mul[[B]],

\[ {}^{c}R_{\beta,\gamma} = \beta (1 + I\beta)^{-1} \circ (I + I\gamma I)^{\langle -1 \rangle} \tag{3.3} \]

Before proving the theorem, remark that the right-hand side of (3.3) is well-defined and unique, since 1 + Iβ is invertible with respect to the formal multiplication, I + IγI is invertible with respect to formal composition and its inverse has 0 as first component (see [3]). We will need the following auxiliary result:

Lemma 3.4. Let β be an element of Mul[[B]] and I the identity element with respect to formal composition, I = (0, idB, 0, 0, . . . ).
(i) the multilinear function series Iβ is given by: (Iβ)0 = 0, (Iβ)n(b1, . . . , bn) = b1βn−1(b2, . . . , bn)
(ii) the multilinear function series IβI is given by: (IβI)0 = 0, (IβI)1(b1) = 0, (IβI)n(b1, . . . , bn) = b1βn−2(b2, . . . , bn−1)bn

Proof. Since I = (0, idB, 0, . . . ), one has (Iβ)0 = I0β0 = 0. If n ≥ 1,

\[ (I\beta)_n(b_1, \dots, b_n) = \sum_{k=0}^{n} I_k(b_1, \dots, b_k)\, \beta_{n-k}(b_{k+1}, \dots, b_n) = I_1(b_1)\, \beta_{n-1}(b_2, \dots, b_n) = b_1 \beta_{n-1}(b_2, \dots, b_n) . \]

For IβI, the same computations give (IβI)0 = (Iβ)0I0 = 0 and (IβI)1(b1) = (Iβ)0I1(b1) + (Iβ)1(b1)I0 = 0. If n ≥ 2, one has:

\[ (I\beta I)_n(b_1, \dots, b_n) = \sum_{k=0}^{n} (I\beta)_k(b_1, \dots, b_k)\, I_{n-k}(b_{k+1}, \dots, b_n) = (I\beta)_{n-1}(b_1, \dots, b_{n-1})\, I_1(b_n) = b_1 \beta_{n-2}(b_2, \dots, b_{n-1})\, b_n . \]

Proof of Theorem 3.3: Set σ = I + IγI. Then

\[ ({}^{c}R_{\beta,\gamma} \circ \sigma)_n(b_1, \dots, b_n) = \sum_{k=1}^{n} \ \sum_{\substack{p_1, \dots, p_k \ge 1 \\ p_1 + \cdots + p_k = n}} {}^{c}R_{\beta,\gamma,k}\big( \sigma_{p_1}(b_1, \dots, b_{p_1}), \dots, \sigma_{p_k}(b_{q_k+1}, \dots, b_{q_k+p_k}) \big) \]

where qi = p1 + · · · + pi−1. From Lemma 3.4(ii), we have an explicit expression for σn(b1, . . . , bn) = (I + IγI)n(b1, . . . , bn), therefore Definition 3.2 is equivalent to

\[ \beta_n(b_1, \dots, b_n) = \sum_{k} ({}^{c}R_{\beta,\gamma} \circ (I + I\gamma I))_k(b_1, \dots, b_k)\, b_{k+1}\, \beta_{n-k-1}(b_{k+2}, \dots, b_n) . \]

Considering now Lemma 3.4(i), the above relation becomes

\[ \beta_n(b_1, \dots, b_n) = \sum_{k} ({}^{c}R_{\beta,\gamma} \circ (I + I\gamma I))_k(b_1, \dots, b_k)\, (1 + I\beta)_{n-k}(b_{k+1}, \dots, b_n) , \]

therefore β = [cRβ,γ ◦ (I + IγI)] (1 + Iβ), which is equivalent to (3.3). □

Remark 3.5. Up to a shift in the coefficients, equation (3.3) is similar to the result in the case B = C from [2], Theorem 5.1.

Let X be a selfadjoint element of A. If A is endowed with two B-valued conditional expectations Φ, Ψ, the element X will have two moment-generating multilinear function series, one with respect to Ψ, that we will denote by MX, and one with respect to Φ, denoted MX. For brevity, we will use the notation cRX for the multilinear function series cRMX,MX.

Theorem 3.6. Let X and Y be two elements of A that are c-free with respect to the pair of conditional expectations (Φ,Ψ). Then cRX+Y = cRX + cRY.

Proof. Let A be an algebra containing B as a subalgebra and endowed with the conditional expectations Φ, Ψ : A → B. Consider the set A0 = A \ B (set difference).
For $n\ge 1$ define the maps cr on $\underbrace{A_0\times\cdots\times A_0}_{n\ \text{times}}$ given by the recurrence formula: Φ(a1 · · · an) = l(1)<···<

<|startoftext|>

Introduction

Since the formulation of quantum automorphism groups by Wang ([15], [16]), following suggestions of Alain Connes, many interesting examples of such quantum groups, particularly the quantum permutation groups of finite sets and finite graphs, have been extensively studied by a number of mathematicians (see, e.g., [1], [2], [17] and references therein), who have also found applications to and interaction with areas like free probability and subfactor theory. The underlying basic principle of defining a quantum automorphism group corresponding to some given mathematical structure (for example, a finite set, a graph, a C∗ or von Neumann algebra) consists of two steps: first, to identify (if possible) the group of automorphisms of the structure as a universal object in a suitable category, and then to look for the universal object in a similar but bigger category, obtained by replacing groups by quantum groups of appropriate type. However, most of the work done so far concerns some kind of quantum automorphism group of a ‘finite’ structure, for example, of finite sets or finite dimensional matrix algebras. It is thus quite natural to try to extend these ideas to ‘infinite’ or ‘continuous’ mathematical structures, for example classical and noncommutative manifolds. In the present article, we have made an attempt to formulate and study the quantum analogues of the groups of Riemannian isometries, which play a very important role in classical differential geometry.

1 The author gratefully acknowledges support obtained from the Indian National Academy of Sciences through the grants for a project on ‘Noncommutative Geometry and Quantum Groups’, and also wishes to thank The Abdus Salam ICTP (Trieste), where a major part of the work was done during a visit as Junior Associate.

http://arxiv.org/abs/0704.0041v4
The group of Riemannian isometries of a compact Riemannian manifold $M$ can be viewed as the universal object in the category of all compact metrizable groups acting on $M$ with smooth and isometric action. Therefore, to define the quantum isometry group, it is reasonable to consider a category of compact quantum groups which act on the manifold (or more generally, on a noncommutative manifold given by a spectral triple) in a ‘nice’ way, preserving the Riemannian structure in some suitable sense, to be precisely formulated. In this article, we give a definition of such a ‘smooth and isometric’ action by a compact quantum group on a (possibly noncommutative) manifold, extending the notion of smooth and isometric action by a group on a classical manifold. Indeed, an action is isometric precisely when it commutes with the ‘Laplacian’ coming from the spectral triple; this idea was already present in [2], though only in the context of a finite metric space or a finite graph. The universal object in the category of such quantum groups, if it exists, should be thought of as the quantum analogue of the group of isometries, and we have been able to prove its existence under some regularity assumptions, all of which can be verified for a general compact connected Riemannian manifold as well as for the standard examples of noncommutative manifolds.

Motivated by the ideas of Woronowicz and Soltan, we actually consider a bigger category. The isometry group of a classical manifold, viewed as a compact metrizable space (forgetting the group structure), can be seen to be the universal object of a category whose object-class consists of subsets (not necessarily subgroups) of the set of smooth isometries of the manifold. Then it can be proved that this universal compact set has a canonical group structure.
We have formulated a natural quantum analogue of this, called the category of ‘quantum families of smooth isometries’. The underlying C∗-algebra of the quantum isometry group has been identified with the universal object of this category and, moreover, it is shown to be equipped with a canonical coproduct making it into a compact quantum group.

We believe that a detailed study of quantum isometry groups will not only give many new and interesting examples of compact quantum groups, but will also contribute to the understanding of quantum group covariant spectral triples. In fact, we have already made some progress in this direction by constructing a spectral triple (which is often closely related to the original spectral triple) on the Hilbert space of forms which is equivariant with respect to a canonical unitary representation of the quantum isometry group.

In a companion article [3] with J. Bhowmick, we provide explicit computations of quantum isometry groups of a few classical and noncommutative manifolds. However, we briefly quote some of the main results of [3] in the present article. One interesting observation is that the quantum isometry group of the noncommutative two-torus $A_\theta$ (with the canonical spectral triple) is (as a C∗-algebra) a direct sum of two commutative and two noncommutative tori, and contains as a quantum subgroup (which is universal for a certain class of isometric actions called holomorphic isometries) the ‘quantum double-torus’ discovered and studied by Hajac and Masuda ([11]).

2 Definition of the quantum isometry group

2.1 Isometry groups of classical manifolds

We begin with a well-known characterization of the isometry group of a (classical) compact Riemannian manifold.
Let $(M,g)$ be a compact Riemannian manifold and let $\Omega^1 = \Omega^1(M)$ be the space of smooth one-forms, which has a right Hilbert-$C^\infty(M)$-module structure given by the $C^\infty(M)$-valued inner product $\langle\langle\cdot,\cdot\rangle\rangle$ defined by

$$\langle\langle\omega,\eta\rangle\rangle(m) = \langle\omega(m),\eta(m)\rangle|_m,$$

where $\langle\cdot,\cdot\rangle|_m$ is the Riemannian metric on the cotangent space $T^*_mM$ at the point $m\in M$. The Riemannian volume form allows us to make $\Omega^1$ a pre-Hilbert space, and we denote its completion by $H_1$. Let $H_0 = L^2(M,\mathrm{dvol})$ and consider the de Rham differential $d$ as an unbounded linear map from $H_0$ to $H_1$, with the natural domain $C^\infty(M)\subset H_0$, and also denote its closure by $d$. Let $L := -d^*d$. The following identity can be verified by a direct computation in local coordinates:

$$(\partial L)(\phi,\psi) \equiv L(\bar\phi\psi) - L(\bar\phi)\psi - \bar\phi L(\psi) = 2\langle\langle d\phi,d\psi\rangle\rangle \quad\text{for } \phi,\psi\in C^\infty(M). \qquad (*)$$

Proposition 2.1 A smooth map $\gamma : M\to M$ is a Riemannian isometry if and only if $\gamma$ commutes with $L$ in the sense that $L(f\circ\gamma) = (L(f))\circ\gamma$ for all $f\in C^\infty(M)$.

Proof: If $\gamma$ commutes with $L$ then from the identity (∗) we get, for $m\in M$ and $\phi,\psi\in C^\infty(M)$:

$$\begin{aligned}
\langle d\phi|_{\gamma(m)}, d\psi|_{\gamma(m)}\rangle|_{\gamma(m)} &= \langle\langle d\phi,d\psi\rangle\rangle(\gamma(m)) \\
&= \tfrac12\,(\partial L(\phi,\psi)\circ\gamma)(m) \\
&= \tfrac12\,\partial L(\phi\circ\gamma,\psi\circ\gamma)(m) \\
&= \langle\langle d(\phi\circ\gamma), d(\psi\circ\gamma)\rangle\rangle(m) \\
&= \langle d(\phi\circ\gamma)|_m, d(\psi\circ\gamma)|_m\rangle|_m \\
&= \langle (d\gamma|_m)^*(d\phi|_{\gamma(m)}), (d\gamma|_m)^*(d\psi|_{\gamma(m)})\rangle|_m,
\end{aligned}$$

which proves that $(d\gamma|_m)^* : T^*_{\gamma(m)}M\to T^*_mM$ is an isometry. Thus, $\gamma$ is a Riemannian isometry.

Conversely, if $\gamma$ is an isometry, both the maps induced by $\gamma$ on $H_0$ and $H_1$, i.e. $U^0_\gamma : H_0\to H_0$ given by $U^0_\gamma(f) = f\circ\gamma$ and $U^1_\gamma : H_1\to H_1$ given by $U^1_\gamma(f\,d\phi) = (f\circ\gamma)\,d(\phi\circ\gamma)$, are unitaries. Moreover, $d\circ U^0_\gamma = U^1_\gamma\circ d$ on $C^\infty(M)\subset H_0$. From this, it follows that $L = -d^*d$ commutes with $U^0_\gamma$. ✷

Now let us consider a compact metrizable (i.e. second countable) space $Y$ with a continuous map $\theta : M\times Y\to M$. We abbreviate $\theta(m,y)$ as $ym$ and denote by $\xi_y$ the map $M\ni m\mapsto ym$. Let $\alpha : C(M)\to C(M)\otimes C(Y)\cong C(M\times Y)$ be the map given by $\alpha(f)(m,y) := f(ym)$ for $y\in Y$, $m\in M$ and $f\in C(M)$. For a state $\phi$ on $C(Y)$, denote by $\alpha_\phi$ the map $(\mathrm{id}\otimes\phi)\circ\alpha : C(M)\to C(M)$.
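The identity (∗) and the easy direction of Proposition 2.1 can be checked mechanically on the simplest manifold, the unit circle. The sketch below is not from the source: it assumes $M = S^1$ with the flat metric, so that $L = d^2/d\theta^2$, and it stores a trigonometric polynomial $\sum_n c_n e^{in\theta}$ as a dictionary `{n: c_n}`. All function names (`lhs_star`, `rhs_star`, `rotate`, and so on) are hypothetical helpers introduced only for this illustration.

```python
import cmath
from collections import defaultdict

TOL = 1e-9  # coefficients smaller than this are treated as zero


def clean(p):
    """Drop numerically zero Fourier coefficients."""
    return {n: complex(c) for n, c in p.items() if abs(c) > TOL}


def add(p, q, sign=1):
    """Return p + sign * q."""
    out = defaultdict(complex)
    for n, c in p.items():
        out[n] += c
    for n, c in q.items():
        out[n] += sign * c
    return clean(out)


def mul(p, q):
    """Pointwise product: e_m * e_n = e_{m+n}."""
    out = defaultdict(complex)
    for m, a in p.items():
        for n, b in q.items():
            out[m + n] += a * b
    return clean(out)


def conj(p):
    """Complex conjugate: conj(c e_n) = conj(c) e_{-n}."""
    return clean({-n: complex(c).conjugate() for n, c in p.items()})


def deriv(p):
    """d/dtheta: e_n -> i n e_n."""
    return clean({n: 1j * n * c for n, c in p.items()})


def laplacian(p):
    """L = -d^* d = d^2/dtheta^2 on the flat circle: e_n -> -n^2 e_n."""
    return clean({n: -(n * n) * c for n, c in p.items()})


def rotate(p, t):
    """f -> f o gamma for the rotation gamma(theta) = theta + t."""
    return clean({n: cmath.exp(1j * n * t) * c for n, c in p.items()})


def lhs_star(phi, psi):
    """L(conj(phi) psi) - L(conj(phi)) psi - conj(phi) L(psi)."""
    cphi = conj(phi)
    first = laplacian(mul(cphi, psi))
    second = mul(laplacian(cphi), psi)
    third = mul(cphi, laplacian(psi))
    return add(add(first, second, -1), third, -1)


def rhs_star(phi, psi):
    """2 <<d phi, d psi>> = 2 conj(phi') psi' in the flat metric."""
    return clean({n: 2 * c for n, c in mul(conj(deriv(phi)), deriv(psi)).items()})


def same(p, q):
    """True when p and q agree up to TOL."""
    return add(p, q, -1) == {}


if __name__ == "__main__":
    phi = {1: 2 + 1j, -3: 0.5}
    psi = {0: 4.0, 2: 1j, 5: -1.0}
    print("identity (*) holds:", same(lhs_star(phi, psi), rhs_star(phi, psi)))
    print("rotation commutes with L:",
          same(laplacian(rotate(psi, 0.7)), rotate(laplacian(psi), 0.7)))
```

On a single pair of modes the check reduces to the arithmetic $-(n-m)^2 + m^2 + n^2 = 2mn$, which is exactly the coefficient of $e_{n-m}$ in $2\,\overline{e_m'}\,e_n'$.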
We shall also denote by $\mathcal{C}$ the subspace of $C(M)\otimes C(Y)$ generated by elements of the form $\alpha(f)(1\otimes\psi)$, $f\in C(M)$, $\psi\in C(Y)$. Since $C(M)$ and $C(Y)$ are commutative algebras, it is easy to see that $\mathcal{C}$ is a ∗-subalgebra of $C(M)\otimes C(Y)$. Then we have the following

Theorem 2.2 (i) $\mathcal{C}$ is norm-dense in $C(M)\otimes C(Y)$ if and only if for every $y\in Y$, $\xi_y$ is one-to-one.
(ii) The map $\xi_y$ is $C^\infty$ for every $y\in Y$ if and only if $\alpha_\phi(C^\infty(M))\subseteq C^\infty(M)$ for all $\phi$.
(iii) Under the hypothesis of (ii), each $\xi_y$ is also an isometry if and only if $\alpha_\phi$ commutes with $(L-\lambda)^{-1}$ for all states $\phi$ and all $\lambda$ in the resolvent set of $L$ (equivalently, $\alpha_\phi$ commutes with the Laplacian $L$ on $C^\infty(M)$).

Proof:
(i) First, assume that $\xi_y$ is one-to-one for all $y$. By the Stone-Weierstrass Theorem, it is enough to show that $\mathcal{C}$ separates points. Take $(m_1,y_1)\ne(m_2,y_2)$ in $M\times Y$. If $y_1\ne y_2$, we can choose $\psi\in C(Y)$ which separates $y_1$ and $y_2$, hence $(1\otimes\psi)\in\mathcal{C}$ separates $(m_1,y_1)$ and $(m_2,y_2)$. So, we can consider the case when $y_1 = y_2 = y$ (say), but $m_1\ne m_2$. By injectivity of $\xi_y$, we have $ym_1\ne ym_2$, so there exists $f\in C(M)$ such that $f(ym_1)\ne f(ym_2)$, i.e. $\alpha(f)(m_1,y)\ne\alpha(f)(m_2,y)$. This proves the density of $\mathcal{C}$.

For the converse, we argue as in the proof of Proposition 3.3 of [14]. Assume that $\mathcal{C}$ is dense in $C(M)\otimes C(Y)$, and let $y\in Y$, $m_1,m_2\in M$ be such that $ym_1 = ym_2$. Then $\alpha(f)(1\otimes\psi)(m_1,y) = \alpha(f)(1\otimes\psi)(m_2,y)$ for all $f\in C(M)$, $\psi\in C(Y)$. By the density of $\mathcal{C}$ we get $\chi(m_1,y) = \chi(m_2,y)$ for all $\chi\in C(M\times Y)$, so $(m_1,y) = (m_2,y)$, i.e. $m_1 = m_2$.

(ii) The ‘if’ part of (ii) follows by considering the states corresponding to point evaluation, i.e. $C(Y)\ni\psi\mapsto\psi(y)$, $y\in Y$. For the converse, we note that an arbitrary state $\phi$ corresponds to a regular Borel measure $\mu$ on $Y$ so that $\phi(h) = \int h\,d\mu$, and thus $\alpha_\phi(f)(m) = \int f(ym)\,d\mu(y)$ for $f\in C(M)$. From this, by interchanging differentiation and integration (which is allowed by the Dominated Convergence Theorem, since $\mu$ is a finite measure) we can prove that $\alpha_\phi(f)$ is $C^\infty$ whenever $f$ is so.
The assertion (iii) follows from Proposition 2.1 in a straightforward way.

Let us recall a few well-known facts about the Laplacian $L$, viewed as a negative self-adjoint operator on the Hilbert space $L^2(M,\mathrm{dvol})$. It is known (see [12] and references therein) that $L$ has compact resolvents and all its eigenvectors belong to $C^\infty(M)$. Moreover, it follows from the Sobolev Embedding Theorem that $\bigcap_{n\ge 1}\mathrm{Dom}(L^n) = C^\infty(M)$. Let $\{e_{ij},\ j = 1,\dots,d_i;\ i = 1,2,\dots\}$ be the set of (normalised) eigenvectors of $L$, where $e_{ij}\in C^\infty(M)$ is an eigenvector corresponding to the eigenvalue $\lambda_i$, $|\lambda_1| < |\lambda_2| < \dots$. We have the following:

Lemma 2.3 The complex linear span of $\{e_{ij}\}$ is norm-dense in $C(M)$.

Proof: This is a consequence of the asymptotic estimates of the eigenvalues $\lambda_i$, as well as the uniform bound of the eigenfunctions $e_{ij}$. For example, it is known ([9], Theorem 1.2) that there exist constants $C, C'$ such that

$$\|e_{ij}\|_\infty \le C|\lambda_i|^{\frac{n-1}{4}},\qquad d_i\le C'|\lambda_i|^{\frac{n-1}{2}},$$

where $n$ is the dimension of the manifold $M$. Now, for $f\in C^\infty(M)\subseteq\bigcap_{k\ge 1}\mathrm{Dom}(L^k)$, we write $f$ as an a priori $L^2$-convergent series $\sum_{ij} f_{ij}e_{ij}$ ($f_{ij}\in\mathbb{C}$), and observe that $\sum_{ij}|f_{ij}|^2|\lambda_i|^{2k} < \infty$ for every $k\ge 1$. Choose and fix a sufficiently large $k$ such that $\sum_i|\lambda_i|^{n-1-2k} < \infty$, which is possible due to the well-known Weyl asymptotics of eigenvalues of $L$. Now, by the Cauchy-Schwarz inequality and the estimate for $d_i$, we have

$$\sum_{ij}|f_{ij}|\,\|e_{ij}\|_\infty \le C\,(C')^{\frac12}\Big(\sum_{ij}|f_{ij}|^2|\lambda_i|^{2k}\Big)^{\frac12}\Big(\sum_i|\lambda_i|^{n-1-2k}\Big)^{\frac12} < \infty.$$

Thus, the series $\sum_{ij} f_{ij}e_{ij}$ converges to $f$ in sup-norm, so $\mathrm{Sp}\{e_{ij},\ j = 1,2,\dots,d_i;\ i = 1,2,\dots\}$ is dense in sup-norm in $C^\infty(M)$, hence in $C(M)$ as well. ✷

Let us denote $\mathrm{Sp}\{e_{ij},\ j = 1,\dots,d_i;\ i\ge 1\}$ by $A^\infty_0$ from now on. We shall now show that $C^\infty(M)$ can be replaced by the smaller subspace $A^\infty_0$ in Theorem 2.2. We need a lemma for this, which will be useful later on too.

Lemma 2.4 Let $H_1, H_2$ be Hilbert spaces and for $i = 1,2$, let $L_i$ be a (possibly unbounded) self-adjoint operator on $H_i$ with compact resolvents, and let $V_i$ be the linear span of eigenvectors of $L_i$.
Moreover, assume that there is an eigenvalue of $L_i$ for which the eigenspace is one-dimensional, say spanned by a unit vector $\xi_i$. Let $\Psi$ be a linear map from $V_1$ to $V_2$ such that $L_2\Psi = \Psi L_1$ and $\Psi(\xi_1) = \xi_2$. Then we have

$$\langle\xi_2,\Psi(x)\rangle = \langle\xi_1,x\rangle\quad\forall x\in V_1. \qquad (1)$$

Proof: By the hypothesis on $\Psi$, it is clear that there is a common eigenvalue, say $\lambda_0$, of $L_1$ and $L_2$, with the eigenvectors $\xi_1$ and $\xi_2$ respectively. Let us write the set of eigenvalues of $L_i$ as a disjoint union $\{\lambda_0\}\cup\Lambda_i$ ($i = 1,2$), and let the corresponding orthogonal decomposition of $V_i$ be given by $V_i = \mathbb{C}\xi_i\oplus\bigoplus_{\lambda\in\Lambda_i}V^\lambda_i\equiv\mathbb{C}\xi_i\oplus V'_i$, say, where $V^\lambda_i$ denotes the eigenspace of $L_i$ corresponding to the eigenvalue $\lambda$. By assumption, $\Psi$ maps $V^\lambda_1$ to $V^\lambda_2$ whenever $\lambda$ is an eigenvalue of $L_2$, i.e. $V^\lambda_2\ne\{0\}$, and otherwise it maps $V^\lambda_1$ into $\{0\}$. Thus, $\Psi(V'_1)\subseteq V'_2$. Now, (1) is obviously satisfied for $x = \xi_1$, so it is enough to prove (1) for all $x\in V'_1$. But we have $\langle\xi_1,x\rangle = 0$ for $x\in V'_1$, and since $\Psi(x)\in V'_2$, which is orthogonal to $\xi_2$, it follows that $\langle\xi_2,\Psi(x)\rangle = 0 = \langle\xi_1,x\rangle$. ✷

Lemma 2.5 Let $Y$ and $\alpha$ be as in Theorem 2.2. Then the following are equivalent.
(a) For every $y\in Y$, $\xi_y$ is smooth isometric.
(b) For every state $\phi$ on $C(Y)$, we have $\alpha_\phi(A^\infty_0)\subseteq A^\infty_0$, and $\alpha_\phi L = L\alpha_\phi$ on $A^\infty_0$.

Proof: We prove only the nontrivial implication (b) ⇒ (a). Assume (b), i.e. that $\alpha_\phi$ leaves $A^\infty_0$ invariant and commutes with $L$ on it, for every state $\phi$. To prove that $\alpha$ is a smooth isometric action, it is enough (see the proof of Theorem 2.2) to prove that $\alpha_y(A^\infty)\subseteq A^\infty$ for all $y\in Y$, where $A^\infty = C^\infty(M)$ and $\alpha_y(f) := (\mathrm{id}\otimes\mathrm{ev}_y)(\alpha(f)) = f\circ\xi_y$, $\mathrm{ev}_y$ being the evaluation at the point $y$. Let $M_1,\dots,M_k$ be the connected components of the compact manifold $M$. Thus, the Hilbert space $L^2(M,\mathrm{dvol})$ admits an orthogonal decomposition $\bigoplus_{i=1}^k L^2(M_i,\mathrm{dvol})$, and the Laplacian $L$ is of the form $\bigoplus_i L_i$, where $L_i$ denotes the Laplacian on $M_i$. Since each $M_i$ is connected, we have $\mathrm{Ker}(L_i) = \mathbb{C}\chi_i$, where $\chi_i$ is the constant function on $M_i$ equal to 1.
Now, we note that for fixed $y$ and $i$, the image of $M_i$ under the continuous function $\xi_y$ must be contained in a single component, say $M_j$. Thus, by applying Lemma 2.4 with $H_1 = L^2(M_i)$, $H_2 = L^2(M_j)$ and $\Psi$ the map induced by $\xi_y$ (i.e. $f\mapsto f\circ\xi_y$), and using the $L^2$-continuity of the map $f\mapsto\alpha_y(f) = f\circ\xi_y$, we have

$$\int\alpha_y(f)(x)\,\mathrm{dvol}(x) = \int f(x)\,\mathrm{dvol}(x)$$

for all $f$ in the linear span of eigenvectors of $L_i$, hence (by density) for all $f$ in $L^2(M_i)$. It follows that $\int\alpha_y(f)\,\mathrm{dvol} = \int f\,\mathrm{dvol}$ for all $f\in L^2(M)$, in particular for all $f\in C(M)$. Since $\alpha_y$ is a ∗-homomorphism on $C(M)$, we have

$$\langle\alpha_y(f),\alpha_y(g)\rangle = \int\alpha_y(\bar fg)\,\mathrm{dvol} = \int\bar fg\,\mathrm{dvol} = \langle f,g\rangle$$

for all $f,g\in C(M)$. Thus, $\alpha_y$ extends to an isometry on $L^2(M)$, to be denoted by the same notation, which by our assumption commutes with the self-adjoint operator $L$ on the core $A^\infty_0$, and hence $\alpha_y$ commutes with $L^n$ for all $n$. In particular it leaves invariant the domain of each $L^n$, which implies $\alpha_y(A^\infty)\subseteq A^\infty$. ✷

In view of the fact that the set of isometries of $M$, denoted by $ISO(M)$, is a compact second countable (i.e. compact metrizable) group, we see that $ISO(M)$ is the maximal compact second countable group acting on $M$ such that the action is smooth and isometric. In other words, if we consider a category whose objects are compact metrizable groups acting smoothly and isometrically on $M$, and whose morphisms are the group homomorphisms commuting with the actions on $M$, then $ISO(M)$ (with its canonical action on $M$) is the universal object of this category. However, one can take a more general viewpoint and consider the category of compact metrizable spaces $Y$ equipped with a continuous map $\theta : M\times Y\to M$ satisfying (i)-(iii) of Theorem 2.2, or equivalently, the pairs $(B,\alpha)$ consisting of a commutative unital C∗-algebra $B = C(Y)$ and a unital C∗-homomorphism $\alpha : C(M)\to C(M)\otimes B$ satisfying the conditions (i)-(iii). The set of isometries $ISO(M)$ (as a topological space) can be identified with the universal object of this category, and then one can prove that it has a group structure.
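To make the category of pairs $(Y,\theta)$ concrete, here is a short worked example that is not part of the source: the unit circle with its family of rotations. The angle parametrization of $M = S^1$ and the Fourier modes $e_n$ are notational assumptions made only for this sketch.

```latex
% Assumed setting (illustration only):
%   M = S^1 = \mathbb{R}/2\pi\mathbb{Z} with the flat metric, Y = S^1, \theta(m,y) = m + y.
\begin{align*}
e_n(m) &:= e^{inm}, \qquad L e_n = -n^2 e_n, \\
\alpha(e_n)(m,y) &= e_n(m+y) = e_n(m)\,e_n(y),
   \quad\text{i.e. } \alpha(e_n) = e_n \otimes e_n, \\
\alpha_\phi(e_n) &= (\mathrm{id}\otimes\phi)(e_n\otimes e_n) = \phi(e_n)\,e_n, \\
L\,\alpha_\phi(e_n) &= -n^2\,\phi(e_n)\,e_n = \alpha_\phi(L e_n).
\end{align*}
```

Each $\xi_y$ is a rotation, hence injective, smooth and isometric, in agreement with (i)-(iii) of Theorem 2.2. The assignment $y\mapsto\xi_y$ identifies $Y$ with the rotation subgroup $SO(2)$ of $ISO(S^1) = O(2)$, which is the comparison map to the universal object in this case.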
It is quite natural to formulate a quantum analogue of the above by considering, in the spirit of Woronowicz and Soltan (see [19] and [13]), ‘quantum families of isometries’, which can be defined to be a pair $(B,\alpha)$ where $B$ is a (not necessarily commutative) C∗-algebra and $\alpha : C(M)\to C(M)\otimes B$ is a unital C∗-homomorphism satisfying (i)-(iii) of Theorem 2.2, i.e. the linear span of $\alpha(C(M))(1\otimes B)$ (which is not necessarily a ∗-subalgebra any more, $B$ being possibly noncommutative) is norm-dense in $C(M)\otimes B$ and, for every state $\phi$ on $B$, the map $\alpha_\phi$ keeps $C^\infty(M)$ invariant and commutes with the Laplacian $L$. The morphisms of this category are the obvious ones. We shall prove that this category has a universal object, and that this universal object can be equipped with a canonical quantum group structure. This will define the quantum isometry group of a manifold. However, we shall go beyond classical manifolds and define the quantum isometry group $QISO(A^\infty,H,D)$ for a spectral triple $(A^\infty,H,D)$, with $A^\infty$ being unital, and satisfying certain assumptions. To this end, we need to carefully formulate the notion of Laplacian in noncommutative geometry, which is the goal of the next subsection.

2.2 Laplacian in noncommutative geometry

Given a spectral triple $(A^\infty,H,D)$, we recall from [10] and [6] the construction of the space of one-forms. We have a derivation from $A^\infty$ to the $A^\infty$-$A^\infty$ bimodule $B(H)$ given by $a\mapsto[D,a]$. This induces a bimodule morphism $\pi$ from $\Omega^1(A^\infty)$ (the bimodule of universal one-forms on $A^\infty$) to $B(H)$, such that $\pi(\delta(a)) = [D,a]$, where $\delta : A^\infty\to\Omega^1(A^\infty)$ denotes the universal derivation map. We set $\Omega^1_D\equiv\Omega^1_D(A^\infty) := \Omega^1(A^\infty)/\mathrm{Ker}(\pi)\cong\pi(\Omega^1(A^\infty))\subseteq B(H)$. Assume that the spectral triple is of compact type and has a finite dimension in the sense of Connes ([6]), i.e.
there is some $p > 0$ such that the operator $|D|^{-p}$ (interpreted as the inverse of the restriction of $|D|^p$ to the closure of its range, which has finite codimension since $D$ has compact resolvents) has finite nonzero Dixmier trace, denoted by $\mathrm{Tr}_\omega$ (where $\omega$ is some suitable Banach limit; see, e.g., [6], [10]). Consider the canonical ‘volume form’ $\tau$ coming from the Dixmier trace, i.e. $\tau : B(H)\to\mathbb{C}$ defined by

$$\tau(A) := \frac{1}{\mathrm{Tr}_\omega(|D|^{-p})}\,\mathrm{Tr}_\omega(A|D|^{-p}).$$

Let us at this point assume that the spectral triple is $QC^\infty$, i.e. $A^\infty$ and $\{[D,a],\ a\in A^\infty\}$ are contained in the domains of all powers of the derivation $[|D|,\cdot]$. Under this assumption, $\tau$ is a positive faithful trace on the C∗-subalgebra generated by $A^\infty$ and $\{[D,a],\ a\in A^\infty\}$, and the GNS Hilbert space $L^2(A^\infty,\tau)$ is denoted by $H^0_D$. Similarly, we equip $\Omega^1_D$ with a semi-inner product given by $\langle\eta,\eta'\rangle := \tau(\eta^*\eta')$, and denote the Hilbert space obtained from it by $H^1_D$. The map $d_D : H^0_D\to H^1_D$ given by $d_D(\cdot) = [D,\cdot]$ is an unbounded densely defined linear map. Let us assume the following:

Assumption (i): (a) $d_D$ is closable (the closure is denoted again by $d_D$);
(b) $A^\infty\subseteq\mathrm{Dom}(L)$, where $L := -d_D^*d_D$ and $A^\infty$ is viewed as a dense subspace of $H^0_D$.

At this point, let us show that this assumption is valid under a very natural condition on the spectral triple.

Lemma 2.6 Suppose that for every element $a\in A^\infty$, the map $\mathbb{R}\ni t\mapsto\alpha_t(X) := \exp(itD)X\exp(-itD)$ is differentiable at $t = 0$ in the norm topology of $B(H)$, where $X = a$ or $[D,a]$. Then Assumption (i) is satisfied. Moreover, in this case, $L$ maps $A^\infty$ into the weak closure of $A^\infty$ in $B(H^0_D)$.

Proof: We first observe that $\tau(\alpha_t(A)) = \tau(A)$ for all $t$ and for all $A\in B(H)$, since $\exp(itD)$ commutes with $|D|^{-p}$. If, moreover, $A$ belongs to the domain of norm-differentiability (at $t = 0$) of $\alpha_t$, i.e. $\frac{\alpha_t(A)-A}{t}\to i[D,A]$ in operator norm, then it follows from the properties of the Dixmier trace that $\tau([D,A]) = \frac{1}{i}\lim_{t\to 0}\frac{\tau(\alpha_t(A))-\tau(A)}{t} = 0$.
Now, since by assumption we have the norm-differentiability at $t = 0$ of $\alpha_t(A)$ for $A$ belonging to the ∗-subalgebra (say $\mathcal{B}$) generated by $A^\infty$ and $[D,A^\infty]$, it follows that $\tau([D,A]) = 0$ for all $A\in\mathcal{B}$. Let us now fix $a,b,c\in A^\infty$ and observe that

$$\begin{aligned}
\langle a\,d_D(b), d_D(c)\rangle &= \tau\big((a\,d_D(b))^*d_D(c)\big) \\
&= -\tau([D,[D,b^*]a^*c]) + \tau([D,[D,b^*]a^*]c) \\
&= \tau([D,[D,b^*]a^*]c),
\end{aligned}$$

using the fact that $\tau([D,[D,b^*]a^*c]) = 0$. This implies

$$|\langle a\,d_D(b), d_D(c)\rangle| \le \|[D,[D,b^*]a^*]\|\,\tau(c^*c)^{\frac12} = \|[D,[D,b^*]a^*]\|\,\|c\|_2,$$

where $\|c\|_2 = \tau(c^*c)^{\frac12}$ denotes the $L^2$-norm of $c\in H^0_D$. This proves that $a\,d_D(b)$ belongs to the domain of $d_D^*$ for all $a,b\in A^\infty$, so in particular the domain of $d_D^*$ is dense, i.e. $d_D$ is closable. Moreover, taking $a = 1$, we see that $d_D(A^\infty)\subseteq\mathrm{Dom}(d_D^*)$, or in other words, $A^\infty\subseteq\mathrm{Dom}(d_D^*d_D)$. This proves (i)(a) and (i)(b). The last sentence in the statement of the lemma can be proven along the lines of Theorem 2.9, page 129, [10]. ✷

We need a few more assumptions on the operator $L$ to define the quantum isometry group.

Assumption (ii): $L$ has compact resolvents;
Assumption (iii): $L(A^\infty)\subseteq A^\infty$;
Assumption (iv): Each eigenvector of $L$ (which has a discrete spectrum, hence a complete set of eigenvectors) belongs to $A^\infty$;
Assumption (v) (‘connectedness assumption’): the kernel of $L$ is one-dimensional, spanned by the identity 1 of $A^\infty$, viewed as a unit vector in $H^0_D$.

We call $L$ the noncommutative Laplacian and $T_t = \exp(tL)$ the noncommutative heat semigroup. We summarize some simple observations in the form of the following

Lemma 2.7 (a) If the assumptions (i)-(v) are valid, then for $x\in A^\infty$, we have $L(x^*) = (L(x))^*$.
(b) If $T_t := \exp(tL)$ maps $H^0_D$ into $A^\infty$ for all $t > 0$, then the assumption (iv) is satisfied.

Proof: It follows by a simple calculation, using the facts that $\tau$ is a trace, that $L = -d_D^*d_D$ and that $d_D(x^*) = -(d_D(x))^*$, that

$$\tau(L(x^*)^*y) = \tau(d_D(x)d_D(y)) = \tau(d_D(y)d_D(x)) = -\tau\big((d_D(y^*))^*d_D(x)\big) = \langle y^*,L(x)\rangle = \tau(yL(x)) = \tau(L(x)y)$$

for all $y\in A^\infty$. By the density of $A^\infty$ in $H^0_D$, (a) follows.
To prove (b), we note that if $x\in H^0_D$ is an eigenvector of $L$, say $L(x) = \lambda x$ ($\lambda\in\mathbb{C}$), then we have $T_t(x) = e^{\lambda t}x$, hence $x = e^{-\lambda t}T_t(x)\in A^\infty$. ✷

Since, by assumption, $L$ has a countable set of eigenvalues each with finite multiplicity, let us denote them by $\lambda_0 = 0,\lambda_1,\lambda_2,\dots$, let $V_0 = \mathbb{C}1, V_1, V_2,\dots$ be the corresponding (finite dimensional) eigenspaces, and for each $i$, let $\{e_{ij},\ j = 1,\dots,d_i\}$ be an orthonormal basis of $V_i$. By Assumption (iv), $V_i\subseteq A^\infty$ for each $i$, $V_i$ is closed under ∗, and moreover, $\{e^*_{ij},\ j = 1,\dots,d_i\}$ is also an orthonormal basis for $V_i$, since $\tau(x^*y) = \tau(yx^*)$ for $x,y\in A^\infty$. We also make the following

Assumption (vi): The complex linear span of $\{e_{ij},\ i = 0,1,\dots;\ j = 1,\dots,d_i\}$, say $A^\infty_0$, is norm-dense in $A^\infty$.

Definition 2.8 We say that a spectral triple satisfying the assumptions (i)-(vi) is admissible.

Remark 2.9 We have just seen that the classical spectral triple $(A^\infty = C^\infty(M),H,D)$, where $M$ is a compact connected spin manifold, $H$ is the $L^2$ space of square integrable spinors and $D$ is the Dirac operator, is indeed admissible in our sense. Later on we shall discuss how we can weaken the connectedness assumption as well, thus accommodating a general classical (commutative) spectral triple in our set-up. Moreover, the standard examples of noncommutative spectral triples, e.g. those on $A_\theta$, the quantum Heisenberg manifolds etc., do belong to the admissible class.

Lemma 2.10 Let us assume that the spectral triple $(A^\infty,H,D)$ is admissible. Let $\Psi : A^\infty_0\to A^\infty_0$ be a (norm-) bounded linear map, such that $\Psi(1) = 1$ and $\Psi\circ L = L\circ\Psi$ on the subspace $A^\infty_0$ spanned (algebraically) by $V_i$, $i = 1,2,\dots$. Then $\tau(\Psi(x)) = \tau(x)$ for all $x\in A^\infty$.

Proof: By Lemma 2.4 with $H_1 = H_2 = H^0_D$, $\xi_1 = \xi_2 = 1$, we have $\tau(\Psi(x)) = \tau(x)$ for all $x\in A^\infty_0$. By the norm-continuity of $\Psi$ and $\tau$ it extends to the whole of $A^\infty$. ✷

2.3 Definition and existence of the quantum isometry group

We begin by recalling the definition of compact quantum groups and their actions from [18].
A compact quantum group is given by a pair $(S,\Delta)$, where $S$ is a unital separable C∗-algebra equipped with a unital C∗-homomorphism $\Delta : S\to S\otimes S$ (where ⊗ denotes the injective tensor product) satisfying
(ai) $(\Delta\otimes\mathrm{id})\circ\Delta = (\mathrm{id}\otimes\Delta)\circ\Delta$ (co-associativity), and
(aii) the linear spans of $\Delta(S)(S\otimes 1)$ and $\Delta(S)(1\otimes S)$ are norm-dense in $S\otimes S$.
It is well-known (see [18]) that there is a canonical dense ∗-subalgebra $S_0$ of $S$, consisting of the matrix coefficients of the finite dimensional unitary (co)-representations of $S$, and maps $\epsilon : S_0\to\mathbb{C}$ (co-unit) and $\kappa : S_0\to S_0$ (antipode) defined on $S_0$ which make $S_0$ a Hopf ∗-algebra.

We say that the compact quantum group $(S,\Delta)$ acts on a unital C∗-algebra $B$ if there is a unital C∗-homomorphism $\alpha : B\to B\otimes S$ satisfying the following:
(bi) $(\alpha\otimes\mathrm{id})\circ\alpha = (\mathrm{id}\otimes\Delta)\circ\alpha$, and
(bii) the linear span of $\alpha(B)(1\otimes S)$ is norm-dense in $B\otimes S$.

Let us now recall the concept of universal quantum groups as in [17], [15] and references therein. We shall use most of the terminology of [15], e.g. Woronowicz C∗-subalgebra, Woronowicz C∗-ideal etc., with the exception that we shall call the Woronowicz C∗-algebras just compact quantum groups, and not use the term compact quantum groups for the dual objects as done in [15]. For $Q\in GL_n(\mathbb{C})$, let $A_u(Q)$ denote the universal compact quantum group generated by $u_{ij}$, $i,j = 1,\dots,n$, satisfying the relations

$$uu^* = I_n = u^*u,\qquad u'Q\bar uQ^{-1} = I_n = Q\bar uQ^{-1}u',$$

where $u = ((u_{ij}))$, $u' = ((u_{ji}))$ and $\bar u = ((u^*_{ij}))$. The coproduct, say $\tilde\Delta$, is given by

$$\tilde\Delta(u_{ij}) = \sum_k u_{ik}\otimes u_{kj}.$$

We refer the reader to [17] for a detailed discussion on the structure and classification of such quantum groups. Let us denote by $U_i$ the quantum group $A_{d_i}(I)$ (that is, $A_u(Q)$ with $Q$ the $d_i\times d_i$ identity matrix), where $d_i$ is the dimension of the subspace $V_i$. We fix a representation $\beta_i : V_i\to V_i\otimes U_i$ of $U_i$ on the Hilbert space $V_i$, given by $\beta_i(e_{ij}) = \sum_k e_{ik}\otimes u^{(i)}_{kj}$ for $j = 1,\dots,d_i$, where $u^{(i)}\equiv((u^{(i)}_{kj}))$ are the generators of $U_i$ as discussed before. Thus, both $u^{(i)}$ and $\bar u^{(i)}$ are unitaries.
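As a sanity check on the defining relations of $A_u(Q)$, it may help to look at the smallest case $n = 1$, $Q = 1$. This worked example is not in the source; it only uses the relations displayed above.

```latex
% n = 1, Q = 1: a single generator u, with u' = u and \bar u = u^*.
\begin{align*}
u u^* &= 1 = u^* u, \\
u' Q \bar u Q^{-1} &= u u^* = 1, \qquad Q \bar u Q^{-1} u' = u^* u = 1, \\
\tilde\Delta(u) &= u \otimes u .
\end{align*}
% The second pair of relations is automatic, so A_u(1) is the universal
% C^*-algebra generated by one unitary, i.e. C(\mathbb{T}), and \tilde\Delta
% is dual to the multiplication of the circle group \mathbb{T}.
```

For $d_i > 1$ the second pair of relations is a genuine extra condition (unitarity of $\bar u$) and the algebra is noncommutative, which is why the free product $U$ below is large enough to map onto every faithful quantum family.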
It follows from [15] that the representations $\beta_i$ canonically induce a representation $\beta = *_i\beta_i$ of the free product $U := *_iU_i$ (which is a compact quantum group, see [15] for the details) on the Hilbert space $H^0_D$, such that the restriction of $\beta$ to $V_i$ coincides with $\beta_i$ for all $i$.

In view of the characterization of smooth isometric actions on a classical manifold, we make the following definitions.

Definition 2.11 A quantum family of smooth isometries of a noncommutative manifold $A^\infty$ (or, more precisely, of the corresponding spectral triple) is a pair $(S,\alpha)$ where $S$ is a separable unital C∗-algebra and $\alpha : \bar A\to\bar A\otimes S$ (where $\bar A$ denotes the C∗-algebra obtained by completing $A^\infty$ in the norm of $B(H^0_D)$) is a unital C∗-homomorphism, satisfying the following:
(a) $\overline{\mathrm{Sp}}\,\alpha(\bar A)(1\otimes S) = \bar A\otimes S$,
(b) $\alpha_\phi := (\mathrm{id}\otimes\phi)\circ\alpha$ maps $A^\infty_0$ into itself and commutes with $L$ on $A^\infty_0$, for every state $\phi$ on $S$.
In case the C∗-algebra $S$ has a coproduct $\Delta$ such that $(S,\Delta)$ is a compact quantum group and $\alpha$ is an action of $(S,\Delta)$ on $\bar A$, we say that $(S,\Delta)$ acts smoothly and isometrically on the noncommutative manifold.

Fix a spectral triple $(A^\infty,H,D)$. Consider the category $\mathbf{Q}$ with the object-class consisting of all quantum families of isometries $(S,\alpha)$ of the given noncommutative manifold, and the set of morphisms $\mathrm{Mor}((S,\alpha),(S',\alpha'))$ being the set of unital C∗-homomorphisms $\phi : S\to S'$ satisfying $(\mathrm{id}\otimes\phi)\circ\alpha = \alpha'$. We also consider another category $\mathbf{Q}'$ whose objects are triplets $(S,\Delta,\alpha)$ where $(S,\Delta)$ is a compact quantum group acting smoothly and isometrically on the given noncommutative manifold, with $\alpha$ being the corresponding action. The morphisms are the homomorphisms of compact quantum groups which are also morphisms of the underlying quantum families. The forgetful functor $F : \mathbf{Q}'\to\mathbf{Q}$ is clearly faithful, and we can view $F(\mathbf{Q}')$ as a subcategory of $\mathbf{Q}$.

Let us assume from now on that the spectral triple $(A^\infty,H,D)$ is admissible. Our aim is to prove the existence of a universal object in $\mathbf{Q}$.
We shall also prove that the (unique up to isomorphism) universal object belongs to $F(\mathbf{Q}')$, and its pre-image in $\mathbf{Q}'$ is a universal object in the category $\mathbf{Q}'$. To this end, we need some preparatory results.

Lemma 2.12 Consider an admissible spectral triple $(A^\infty,H,D)$ and let $(S,\alpha)$ be a quantum family of smooth isometries of the spectral triple. Moreover, assume that the action $\alpha$ is faithful in the sense that there is no proper C∗-subalgebra $S_1$ of $S$ such that $\alpha(A^\infty)\subseteq A^\infty\otimes S_1$. Then $\tilde\alpha : A^\infty\otimes S\to A^\infty\otimes S$ defined by $\tilde\alpha(a\otimes b) := \alpha(a)(1\otimes b)$ extends to an $S$-linear unitary on the Hilbert $S$-module $H^0_D\otimes S$, denoted again by $\tilde\alpha$. Moreover, we can find a C∗-isomorphism $\phi : U/I\to S$ between $S$ and a quotient of $U$ by a C∗-ideal $I$ of $U$, such that $\alpha = (\mathrm{id}\otimes\phi)\circ(\mathrm{id}\otimes\Pi_I)\circ\beta$ on $A^\infty\subseteq H^0_D$, where $\Pi_I$ denotes the quotient map from $U$ to $U/I$.

If, furthermore, there is a compact quantum group structure on $S$ given by a coproduct $\Delta$ such that $(S,\Delta,\alpha)$ is an object in $\mathbf{Q}'$, the map $\alpha : A^\infty\to A^\infty\otimes S$ extends to a unitary representation (denoted again by $\alpha$) of the compact quantum group $(S,\Delta)$ on $H^0_D$. In this case, the ideal $I$ is a Woronowicz C∗-ideal and the C∗-isomorphism $\phi : U/I\to S$ is a morphism of compact quantum groups.

Proof: Let $\omega$ be any state on $S$. Since the action $\alpha : A^\infty\to A^\infty\otimes S$ is smooth and isometric, we conclude by Lemma 2.10 that $\tau(\alpha_\omega(x)) = \tau(x)\omega(1)$ for all $x\in\bar A$. Since $\omega$ is arbitrary, we have $(\tau\otimes\mathrm{id})\alpha(x) = \tau(x)1_S$ for all $x\in\bar A$. So, $\langle\alpha(x),\alpha(y)\rangle_S = \langle x,y\rangle 1_S$, where $\langle\cdot,\cdot\rangle_S$ denotes the $S$-valued inner product of the Hilbert module $H^0_D\otimes S$. This proves that $\tilde\alpha$ defined by $\tilde\alpha(x\otimes b) := \alpha(x)(1\otimes b)$ ($x\in A^\infty$, $b\in S$) extends to an $S$-linear isometry on the Hilbert $S$-module $H^0_D\otimes S$. Moreover, since $\alpha(A^\infty)(1\otimes S)$ is norm-dense in $\bar A\otimes S$, it is clear that the $S$-linear span of the range of $\alpha(A^\infty)$ is dense in the Hilbert module $H^0_D\otimes S$, or in other words, the isometry $\tilde\alpha$ has a dense range, so it is a unitary. Since $\alpha_\omega$ leaves each $V_i$ invariant, it is clear that $\alpha$ maps $V_i$ into $V_i\otimes S$ for each $i$.
Let $v^{(i)}_{kj}$ ($j,k = 1,\dots,d_i$) be the elements of $S$ such that $\alpha(e_{ij}) = \sum_k e_{ik}\otimes v^{(i)}_{kj}$. Note that $v_i := ((v^{(i)}_{kj}))$ is a unitary in $M_{d_i}(\mathbb{C})\otimes S$. Moreover, the ∗-subalgebra generated by all $\{v^{(i)}_{kj},\ i,j,k\ge 1\}$ must be dense in $S$ by the assumption of faithfulness. We have already remarked that $\{e^*_{ij}\}$ is also an orthonormal basis of $V_i$, and since $\alpha$, being a C∗-action on $\bar A$, is ∗-preserving, we have $\alpha(e^*_{ij}) = (\alpha(e_{ij}))^*$, and therefore $((v^{(i)*}_{kj}))$ is also unitary. By the universality of $U_i$, there is a C∗-homomorphism from $U_i$ to $S$ sending $u^{(i)}_{kj}$ to $v^{(i)}_{kj}$, and by the definition of the free product, this induces a C∗-homomorphism, say $\Pi$, from $U$ onto $S$, so that $U/I\cong S$, where $I := \mathrm{Ker}(\Pi)$.

In case $S$ has a coproduct $\Delta$ making it into a compact quantum group and $\alpha$ is a quantum group action, it is easy to see that the subalgebra of $S$ generated by the $v^{(i)}_{kj}$ is a Hopf algebra, with $\Delta(v^{(i)}_{kj}) = \sum_l v^{(i)}_{kl}\otimes v^{(i)}_{lj}$. From this, it follows that $\Pi$ is a Hopf-algebra morphism, hence $I$ is a Woronowicz C∗-ideal. ✷

Before we state and prove the main theorem, let us note the following elementary fact about C∗-algebras.

Lemma 2.13 Let $C$ be a C∗-algebra and $\mathcal{F}$ be a nonempty collection of C∗-ideals (closed two-sided ideals) of $C$. Then for any $x\in C$, we have

$$\sup_{I\in\mathcal{F}}\|x+I\| = \|x+I_0\|,$$

where $I_0$ denotes the intersection of all $I$ in $\mathcal{F}$ and $\|x+I\| = \inf\{\|x-y\| : y\in I\}$ denotes the norm in $C/I$.

Proof: It is clear that $\sup_{I\in\mathcal{F}}\|x+I\|$ defines a norm on $C/I_0$, which is in fact a C∗-norm since each of the quotient norms $\|\cdot+I\|$ is so. Thus the lemma follows from the uniqueness of the C∗-norm on the C∗-algebra $C/I_0$. ✷

Theorem 2.14 For any admissible spectral triple $(A^\infty,H,D)$, the category $\mathbf{Q}$ of quantum families of smooth isometries has a universal (initial) object, say $(\mathcal{G},\alpha_0)$. Moreover, $\mathcal{G}$ has a coproduct $\Delta_0$ such that $(\mathcal{G},\Delta_0)$ is a compact quantum group and $(\mathcal{G},\Delta_0,\alpha_0)$ is a universal object in the category $\mathbf{Q}'$ of compact quantum groups acting smoothly and isometrically on the given spectral triple. The action $\alpha_0$ is faithful.
Proof: Recall the C∗-algebra $U$ considered before, and the map $\beta$ from $H^0_D$ to $H^0_D\otimes U$. By our definition of $\beta$, it is clear that $\beta(A^\infty_0)\subseteq A^\infty_0\otimes_{\mathrm{alg}}U$. However, $\beta$ is only a linear map (unitary) but not necessarily a ∗-homomorphism. We shall construct the universal object as a suitable quotient of $U$. Let $\mathcal{F}$ be the collection of all those C∗-ideals $I$ of $U$ such that the composition $\Gamma_I := (\mathrm{id}\otimes\Pi_I)\circ\beta : A^\infty_0\to A^\infty_0\otimes_{\mathrm{alg}}(U/I)$ extends to a C∗-homomorphism from $\bar A$ to $\bar A\otimes(U/I)$, where $\Pi_I$ denotes the quotient map from $U$ onto $U/I$. This collection is nonempty, since the trivial one-dimensional C∗-algebra $\mathbb{C}$ gives an object in $\mathbf{Q}$ and by Lemma 2.12 we do get a member of $\mathcal{F}$. Now, let $I_0$ be the intersection of all ideals in $\mathcal{F}$. We claim that $I_0$ is again a member of $\mathcal{F}$. Since any C∗-homomorphism is contractive, we have $\|\Gamma_I(a)\|\equiv\|\beta(a)+\bar A\otimes I\|\le\|a\|$ for all $a\in A^\infty_0$ and $I\in\mathcal{F}$. By Lemma 2.13, we see that $\|\Gamma_{I_0}(a)\|\le\|a\|$ for $a\in A^\infty_0$, so $\Gamma_{I_0}$ extends to a norm-contractive map on $\bar A$ by the density of $A^\infty_0$ in $\bar A$. Moreover, for $a,b\in\bar A$ and for $I\in\mathcal{F}$, we have $\Gamma_I(ab) = \Gamma_I(a)\Gamma_I(b)$. Since $\Pi_I$ factors through $\Pi_{I_0}$, we can rewrite the homomorphic property of $\Gamma_I$ as

$$\Gamma_{I_0}(ab)-\Gamma_{I_0}(a)\Gamma_{I_0}(b)\in\bar A\otimes(I/I_0).$$

Since this holds for every $I\in\mathcal{F}$, we conclude that $\Gamma_{I_0}(ab)-\Gamma_{I_0}(a)\Gamma_{I_0}(b)\in\bigcap_{I\in\mathcal{F}}\bar A\otimes(I/I_0) = (0)$, i.e. $\Gamma_{I_0}$ is a homomorphism. In a similar way, we can show that it is a ∗-homomorphism. Since each $\beta_i$ is a unitary representation of the compact quantum group $U_i$ on the finite dimensional space $V_i$, it follows that $\beta_i(V_i)(1\otimes U_i)$ is total in $V_i\otimes U_i$. In particular, for any $v_i\in V_i$ ($i$ arbitrary), the element $v_i\otimes 1_{U_i} = v_i\otimes 1_U$ belongs to the linear span of $\beta_i(V_i)(1\otimes U_i)\subset\beta(V_i)(1\otimes U)$. Thus, $A^\infty_0\otimes 1_U$ is contained in the linear span of $\beta(A^\infty_0)(1\otimes U)$ and hence $A^\infty_0\otimes 1_{U/I_0}$ is contained in the linear span of $\Gamma_{I_0}(A^\infty_0)(1\otimes U/I_0)$. By the norm-density of $A^\infty_0$ in $\bar A$ and the contractivity of the quotient map, it follows that $\bar A\otimes U/I_0$ is the closed linear span of $\Gamma_{I_0}(A^\infty_0)(1\otimes U/I_0)$. This completes the proof that $(U/I_0,\Gamma_{I_0})$ is indeed an object of $\mathbf{Q}$.
We now show that G := U/I0 is a universal object in Q. To see this, con- sider any object (S, α) of Q. Without loss of generality we can assume the action to be faithful, since otherwise we can replace S by the C∗-subalgebra generated by the elements {v } appearing in the proof of Lemma 2.12. But by Lemma 2.12 we can further assume that S is isomorphic with U/I for some I ∈ F . Since I0 ⊆ I, we have a C ∗-homomorphism from U/I0 onto U/I, sending x+I0 to x+I, which is clearly a morphism in the category Q. This is indeed the unique such morphism, since it is uniquely determined on the dense subalgebra generated by {u + I0, i, j, k ≥ 1} of G. To construct the coproduct on G = U/I0, we first consider α (2) = (ΓI0 ⊗ id) ◦ΓI0 : A → A⊗G ⊗G. It is easy to verify that (G ⊗G, α (2)) is an object in the category Q, so by the universality of (G,ΓI0), we have a unique unital C∗-homomorphism ∆0 : G → G ⊗ G satisfying (id⊗∆0) ◦ ΓI0(x) = α (2)(x) ∀x ∈ A. Taking x = eij, we get eil ⊗ (πI0 ⊗ πI0) eil ⊗∆0(πI0(u Comparing coefficients of eil, and recalling that ∆̃(u (where ∆̃ denotes the coproduct on U), we have (πI0 ⊗ πI0) ◦ ∆̃ = ∆0 ◦ πI0 (2) on the linear span of {u , i, j, k ≥ 1}, and hence on the whole of U . This implies that ∆0 maps I0 = Ker(πI0) into Ker(πI0⊗πI0) = (I0⊗1+1⊗I0) ⊂ U ⊗ U . In other words, I0 is a Hopf C ∗-ideal, and hence G = U/I0 has the canonical compact quantum group structure as a quantum subgroup of U . It is clear from the relation (2) that ∆0 coincides with the canonical coproduct of the quantum subgroup U/I0 inherited from that of U . It is also easy to see that the object (G,∆0,ΓI0) is universal in the category Q ′, using the fact that (by Lemma 2.12) any compact quantum group (G,Φ) acting smoothly and isometrically on the given spectral triple is isomorphic with a quantum subgroup U/I, for some Hopf C∗-ideal I of U . Finally, the faithfulness of α0 follows from the universality by standard arguments which we briefly sketch. 
If G1 ⊂ G is a ∗-subalgebra of G such that α0(A) ⊆ A ⊗ G1, it is easy to see that (G1, ∆0, α0) is also a universal object, and by definition of universality of G it follows that there is a unique morphism, say j, from G to G1. But the map i ◦ j is a morphism from G to itself, where i : G1 → G is the inclusion. Again by universality, we have that i ◦ j = idG, so in particular, i is onto, i.e. G1 = G. ✷ Definition 2.15 We shall call the universal object (G, ∆0) obtained in the theorem above the quantum isometry group of (A∞, H, D) and denote it by QISO(A∞, H, D), or just QISO(A∞) (or sometimes QISO(Ā)) if the spectral triple is understood from the context. Remark 2.16 Assume that an admissible spectral triple (A∞, H, D) also satisfies the condition (i) of Lemma 2.5, i.e. ∩n Dom(Ln) = A∞. Let α : A → A ⊗ S be a smooth isometric action on A∞ by a compact quantum group S. We recall from the proof of Lemma 2.12 that the map α̃ from A ⊗ S to itself extends to an S-linear unitary on the Hilbert S-module H0D ⊗ S, i.e. α̃ can be viewed as a unitary in B(H0D) ⊗ S. Clearly, for any state φ on S, we have αφ = (id ⊗ φ)(α̃) ∈ B(H0D). Now, by the definition of a smooth isometric action, the bounded operator αφ commutes with the self-adjoint operator L on A∞0, which is a core for L. So, αφ must commute with Ln for all n, and in particular keeps A∞ = ∩n Dom(Ln) invariant. Remark 2.17 Let us now briefly indicate how one can weaken the hypothesis of connectedness. Such an extension of our results is desirable to accommodate the classical spaces, including the finite sets and graphs, in our framework. One possible approach could be to consider the category of compact quantum group actions α which are not only ‘smooth’ and ‘isometric’ in our sense, but also satisfy the τ-invariance condition, i.e. (τ ⊗ id)(α(a)) = τ(a)1. It is easy to see that the connectedness assumption has been used by us only to prove that the τ-invariance is automatic for smooth isometric actions. 
Thus, if we work in the smaller category of such τ-invariant actions only, the proof of Theorem 2.14 does go through and thus we can prove the existence of a universal object, to be defined as the quantum isometry group. It is easy to see that for the algebra of functions on a finite set, with the spectral triple given by D = 0, this quantum isometry group coincides with the quantum permutation group defined by Wang. Remark 2.18 It is easy to see how to extend our formulation and results to spectral triples which are not necessarily of type II, i.e. when the trace τ is replaced by some non-tracial positive functional. Indeed, our construction will go through in such a situation more or less verbatim, by replacing the universal quantum groups Adi(I) by Adi(Qi) for some suitable choice of matrices Qi coming from the modularity property of τ. 2.4 Construction of quantum group-equivariant spectral triples In this subsection, we shall briefly discuss the relevance of the quantum isometry group to the problem of constructing quantum group equivariant spectral triples, which is important for understanding the role of quantum groups in the framework of noncommutative geometry. There has been a lot of activity in this direction recently; see, for example, the articles by Chakraborty and Pal ([5]), Connes ([7]), Landi et al. ([8]) and the references therein. In the classical situation, there exists a natural unitary representation of the isometry group G = ISO(M) of a manifold M on the Hilbert space of forms, so that the operator d + d∗ (where d is the de Rham differential operator) commutes with the representation. Indeed, d + d∗ is also a Dirac operator for the spectral triple given by the natural representation of C∞(M) on the Hilbert space of forms, so we have a canonical construction of a G-equivariant spectral triple. 
Our aim in this subsection is to generalize this to the noncommuta- tive framework, by proving that dD + d D is equivariant with respect to a canonical unitary representation on the Hilbert space of ‘noncommutative forms’ (see, for example, [10] for a detailed discussion of such forms). Consider an admissible spectral triple (A∞,H,D) and moreover, make the assumption of Lemma 2.6, i.e. assume that t 7→ eitDxe−itD is norm- differentiable at t = 0 for all x in the ∗-algebra B generated by A∞ and [D,A∞]. Lemma 2.19 In the notation of Lemma 2.6, we have the following (where b, c ∈ A∞): d∗D(dD(b)c) = − (bL(c)− L(b)c− L(bc)) . (3) Proof: Denote by χ(b, c) the right hand side of euqation (3) and fix any a ∈ A∞. Using the facts the the functional τ is a faithful trace on the ∗-algebra B, L = −d∗DdD and that [D,X] = 0 for any X in B, we have, τ(a∗χ(b, c)) {τ(a∗bL(c)) + τ(ca∗L(b)) + τ(a∗L(bc))} {τ([D, a∗b][D, c]) − τ([D, ca∗][D, b])− τ([D, a∗][D, bc])} {τ(a∗[D, b][D, c]) − τ([D, c]a∗[D, b])− τ(c[D, a∗][D, b])− τ([D, a∗][D, b]c)} = −τ([D, a∗][D, b]c) = τ([D, a]∗[D, b]c) = 〈dD(a), dD(b)c〉 = τ(a∗(d∗D(dD(b)c))). From this, we get the following by a simple computation: 〈adD(b), a ′dD(b ′)〉 = − τ(b∗Ψ(a∗a, b′)), (4) for a, b, a′, b′ ∈ A∞, and where Ψ(x, y) := L(xy)−L(x)y+xL(y). Now, let us denote the quantum isometry group of the given spectral triple (A∞,H,D) by (G,∆, α). Let A0 denote the ∗-algebra generated by A 0 , G0 denote ∗- algebra of G generated by matrix elements of irreducible representations. Clearly, α : A0 → A0 ⊗alg G0 is a Hopf-algebraic action of G0 on A0. Define Ψ̃ : (A0 ⊗alg G0)× (A0 ⊗alg G0) → A0 ⊗alg G0 by Ψ̃((x⊗ q), (x′ ⊗ q′)) := Ψ(x, x′)⊗ (qq′). It follows from the relation (L ⊗ id) ◦ α = α ◦ L on A0 that Ψ̃(α(x), α(y)) = α(Ψ(x, y)). (5) We now define a linear map α(1) from the linear span of {adD(b) : a, b ∈ A0} to H1D ⊗ G by setting α(1)(adD(b)) := i dD(b j )⊗ a where for any x ∈ A0 we write α(x) = i ∈ A0⊗algG0 (summation over finitely many terms). 
We shall sometimes use the Sweedler convention of writing the above simply as α(x) = x(1) ⊗ x(2). It then follows from the identities (4) and (5), and also the fact that (τ ⊗ id)(α(a)) = τ(a)1 for all a ∈ A0 that 〈adD(b), a ′dD(b (τ ⊗ id)(α(b∗)Ψ̃(α(a∗a′), α(b′))) (τ ⊗ id)(α(b∗)α(Ψ(a∗a′, b′))) (τ ⊗ id)(α(b∗Ψ(a∗a′, b′))) τ(b∗Ψ(a∗a′, b′))1G = 〈adD(b), a ′dD(b ′)〉1G . This proves that α(1) is indeed well-defined and extends to a G-linear isom- etry on H1D ⊗ G, to be denoted by U (1), which sends (adD(b)) ⊗ q to α(1)(adD(b))(1 ⊗ q), a, b ∈ A0, q ∈ G. Moreover, since the linear span of α(A∞0 )(1 ⊗ G) is dense in H D ⊗ G, it is easily seen that the range of the isometry U (1) is the whole of H1D ⊗G, i.e. U (1) is a unitary. In fact, from its definition it can also be shwon that U (1) is a unitary representation of the compact quantum group G on H1D. In a similar way, we can construct unitary representation U (n) of G on the Hilbert space of n-forms for any n ≥ 1, by defining U (n)((a0dD(a1)dD(a2)...dD(an))⊗q) = a 0 dD(a 1 )...dD(a n )⊗(a 1 ...a n q), ai ∈ A (using Sweedler convention) and verifying that it extends to a unitary. We also denote by U (0) the unitary representation α̃ on H0D discussed be- fore. Finally, we have a unitary representation U = n≥0 U (n) of G on H̃ := D, and also extend dD as a closed densely defined operator on H̃ in the obvious way, by defining dD(a0dD(a1)...dD(an)) = dD(a0)...dD(an). It is now straightforward to see the following: Theorem 2.20 The operator D′ := dD+d D is equivariant in the sense that U(D′ ⊗ 1) = (D′ ⊗ 1)U . We point out that there is a natural representation π of A on H̃ given by π(a)(a0dD(a1)...dD(an)) = aa0dD(a1)...dD(an), and (π(A ∞), H̃,D′) is indeed a spectral triple, which is G-equivariant. Although the relation between spectral properties of D and D′ is not clear in general, in many cases of interest (e.g. 
when there is an underlying type (1, 1) spectral data in the sense of [10]) these two Dirac operators are closely related. As an illustration, consider the canonical spectral on the noncommutative 2-torus Aθ, which is discussed in some details in the next section. In this case, the Dirac operator D acts on L2(Aθ, τ) ⊗ C 2, and it can easily be shown (see [10]) that the Hilbert space of forms is isomorphic with L2(Aθ, τ)⊗C 4 ∼= L2(Aθ)⊗C 2; thus D′ is essentially same as D in this case. 3 Examples and computations We give some simple yet interesting explicit examples of quantum isometry groups here. However, we give only some computational details for the first example, and for the rest, the reader is referred to a companion article ([3]). Example 1 : commutative tori Consider M = T, the one-torus, with the usual Riemannian structure. The ∗-algebra A∞ = C∞(M) is generated by one unitary U , which is the multi- plication operator by z in L2(T). The Laplacian is given by L(Un) = −n2Un. If a compact quantum group (S,∆S) acts on A ∞ smoothly, let An, n ∈ Z be elements of S such that α0(U) = n ⊗An (here α0 : A ∞ → A∞ ⊗alg S is the S-action on A∞). Note that this infinite sum converges at least in the topology of the Hilbert space L2(T) ⊗ L2(S), where L2(S) denotes the GNS space for the Haar state of S. It is clear that the condition (L ⊗ id) ◦ α0 = α0 ◦ L forces to have An = 0 for all but n = ±1. The conditions α0(U)α0(U) ∗ = α0(U) ∗α0(U) = 1 ⊗ 1 further imply the follow- A∗1A1 +A −1A−1 = 1 = A1A 1 +A−1A A∗1A−1 = A −1A1 = A1A −1 = A−1A 1 = 0. It follows that A±1 are partial isometries with orthogonal domains and ranges. Say, A1 has domain P and range Q. Hence the domain and range of A−1 are respectively 1 − P and 1 − Q. Consider the unitary V = A + B, so that V P = A, V (1 − P ) = B. Now, from the fact that (L⊗ id)(α0(U 2)) = α0(L(U 2)) it is easy to see that the coefficient of 1⊗1 in the expression of α0(U) 2 must be 0, i.e. AB+BA = 0. 
From this, it follows that V and P commute and therefore P = Q. By straightforward calculation using the facts that V is unitary, P is a projection and V and P commute, we can verify that α0 given by α0(U) = U ⊗ VP + U−1 ⊗ V(1 − P) extends to a ∗-homomorphism from A∞ to A∞ ⊗ C∗(V, P) satisfying (L ⊗ id) ◦ α0 = α0 ◦ L. It follows that the C∗ algebra QISO(T) is commutative and generated by a unitary V and a projection P, or equivalently by two partial isometries A, B such that A∗A = AA∗, B∗B = BB∗, AB = BA = 0. So, as a C∗ algebra it is isomorphic with C(T) ⊕ C(T) ≅ C(T × Z2). The coproduct (say ∆0) can easily be calculated from the requirement of co-associativity, and the Hopf algebra structure of QISO(T) can be seen to coincide with that of the semi-direct product of T by Z2, where the generator of Z2 acts on T by sending z ↦ z̄. We summarize this in the form of the following. Theorem 3.1 The universal quantum group of isometries QISO(T) of the one-torus T is isomorphic (as a quantum group) with C(T ⋊ Z2) = C(ISO(T)). We can easily extend this result to higher dimensional commutative tori, and can prove that the quantum isometry group coincides with the classical isometry group. This is some kind of rigidity result, and it will be interesting to investigate the nature of quantum isometry groups of more general classical manifolds. Example 2 : Noncommutative torus; holomorphic isometries Next we consider the simplest and well-known example of a noncommutative manifold, namely the noncommutative two-torus Aθ, where θ is a fixed irrational number (see [6]). It is the universal C∗ algebra generated by two unitaries U and V satisfying the commutation relation UV = λVU, where λ = e2πiθ. There is a canonical faithful trace τ on Aθ given by τ(UmVn) = δm0δn0. 
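The commutation relation UV = λVU and the trace τ can be made concrete on finite sums of monomials UmVn. The following is only a minimal sketch, not part of the original construction: the dictionary representation, the names `multiply` and `trace`, and the numerical value of θ are all assumptions made for illustration, and τ is assumed to pick out the coefficient of U0V0.

```python
import cmath

# Hypothetical value of theta for the demonstration; the text only asks
# that theta be a fixed irrational number.
THETA = 2 ** 0.5 - 1
LAMBDA = cmath.exp(2j * cmath.pi * THETA)

# A finite sum of monomials U^m V^n is stored as {(m, n): coefficient}.
U = {(1, 0): 1}
V = {(0, 1): 1}


def multiply(x, y):
    """Product of two finite sums, using V^b U^c = LAMBDA**(-b*c) U^c V^b,
    which follows from U V = LAMBDA V U."""
    out = {}
    for (a, b), cx in x.items():
        for (c, d), cy in y.items():
            key = (a + c, b + d)
            out[key] = out.get(key, 0) + cx * cy * LAMBDA ** (-b * c)
    return out


def trace(x):
    """Assumed normalisation: tau returns the coefficient of U^0 V^0."""
    return x.get((0, 0), 0)


UV = multiply(U, V)
VU = multiply(V, U)

# Two arbitrary test elements (hypothetical coefficients).
x = {(2, -1): 1.0, (0, 3): 2.0}
y = {(-2, 1): 0.5, (1, 1): 1.0}
```

With these definitions one can check numerically that UV and λVU have the same coefficient on the monomial U1V1, and that τ(xy) = τ(yx) for the two test elements, as expected of a trace.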
We consider the canonical spectral triple (A ∞,H,D), where A∞ is the unital ∗-algebra spanned by U, V , H = L2(τ)⊕ L2(τ) and D is given by 0 d1 + id2 d1 − id2 0 where d1 and d2 are closed unbounded linear maps on L 2(τ) given by mV n) = mUmV n, d2(U mV n) = nUmV n. It is easy to compute the space of one-forms Ω1D (see [4], [10], [6]) and the Laplacian L = −d ∗d is given by L(UmV n) = −(m2 + n2)UmV n. For simplicity of computation, instead of the full quantum isometry group we at first concentrate on an interesting quantum subgroup G = QISOhol(A∞,H,D), which is the uni- versal quantum group which leaves invariant the subalgebra ofA∞ consisting of polynomials in U , V and 1, i.e. span of UmV n with m,n ≥ 0. The proof of existence and uniqueness of such a universal quantum group is more or less identical to the proof of existence and uniqueness of QISO. We call G the quantum group of “holomorphic” isometries, and observe in the theorem stated below without proof (see [3]) that this quantum group is nothing but the quantum double torus studied in [11]. Theorem 3.2 Consider the following co-product ∆B on the C ∗ algebra B = C(T2)⊕A2θ, given on the generators A0, B0, C0,D0 as follows ( where A0,D0 correspond to C(T2) and B0, C0 correspond to A2θ) ∆B(A0) = A0 ⊗A0 + C0 ⊗B0, ∆B(B0) = B0 ⊗A0 +D0 ⊗B0, ∆B(C0) = A0 ⊗ C0 +C0 ⊗D0, ∆B(D0) = B0 ⊗ C0 +D0 ⊗D0. Then (B,∆0) is a compact quantum group and it has an action α0 on Aθ given by α0(U) = U ⊗A0 + V ⊗B0, α0(V ) = U ⊗C0 + V ⊗D0. Moreover, (B,∆B) is isomorphic (as quantum group) with G = QISO hol(A∞,H,D). We refer to [3] for a proof of the above result, and to [11] for the computation of the Haar stat and representation theory of the compact quantum group Example 3 : Noncommutative Torus; full quantum isometry group By similar but somewhat tedious calculations (see [3]) one can also describe explicitly the full quantum isometry group QISO(A∞,H,D). 
It is as a C∗ algebra has eight direct summands, four of which are isomorphic with the commutative algebra C(T2), and the other four are irrational rotation algebras. Theorem 3.3 QISO(Aθ) = ⊕ ∗(Uk1, Uk2) (as a C ∗ algebra), where for odd k, Uk1, Uk2 are the two commuting unitary generators of C(T 2), and for even k, Uk1Uk2 = exp(4πiθ)Uk2Uk1, i.e. they generate A2θ. The (co)-action on the generators U, V (say) of Aθ are given by the following : α0(U) = U⊗(U11+U31)+V⊗(U52+U62)+U −1⊗(U21+U41)+V −1⊗(U72+U82), α0(V ) = U⊗(U51+U71)+V⊗(U12+U22)+U −1⊗(U61+U81)+V −1⊗(U32+U42). From the co-associativity condition, the co-product of QISO(Aθ) can easily be calculated. For the detailed description of the coproduct, counit, an- tipode and study of the representation theory of QISO(Aθ), the reader is referred to [3]. It is interesting to mention here that the quantum isometry group of Aθ is a Rieffel type deformation of the isometry group (which is same as the quantum isometry group) of the commutative two-torus. The commutative two-torus is a subgroup of its isometry group, but when the isometry group is deformed into QISO(Aθ), the subgroup relation is not respected, and the deformation of the commutative torus, which is A2θ, sits in QISO(Aθ) just as a C ∗ subalgebra (in fact a direct summand) but not as a quantum subgroup any more. This perhaps provides some explanation of the non-existence of any Hopf algebra structure on the noncommutative torus. Acknowledgement : The author would like to thank P. Hajac for draw- ing his attention to the article [11], and S.L. Woronowicz for many valuable comments and suggestions which led to substantial improvement of the pa- References [1] Banica, T.: Quantum automorphism groups of small metric spaces, Pacific J. Math. 219(2005), no. 1, 27–51. [2] Banica, T.: Quantum automorphism groups of homogeneous graphs, J. Funct. Anal. 224(2005), no. 2, 243–280. [3] Bhowmick, J. 
and Goswami, D.: Quantum isometry groups: examples and computations, preprint (2007), arXiv:0707.2648. [4] Chakraborty, P. S., Goswami, D. and Sinha, Kalyan B.: Probability and geometry on some noncommutative manifolds, J. Operator Theory 49 (2003), no. 1, 185–201. [5] Chakraborty, P. S. and Pal, A.: Equivariant spectral triples on the quantum SU(2) group, K-Theory 28 (2003), 107–126. [6] Connes, A.: “Noncommutative Geometry”, Academic Press, London-New York (1994). [7] Connes, A.: Cyclic cohomology, quantum group symmetries and the local index formula for SUq(2), J. Inst. Math. Jussieu 3 (2004), no. 1, 17–68. [8] Dabrowski, L., Landi, G., Sitarz, A., van Suijlekom, W. and Varilly, Joseph C.: The Dirac operator on SUq(2), Comm. Math. Phys. 259 (2005), no. 3, 729–759. [9] Donnelly, H.: Eigenfunctions of Laplacians on Compact Riemannian Manifolds, Asian J. Math. 10 (2006), no. 1, 115–126. [10] Fröhlich, J., Grandjean, O. and Recknagel, A.: Supersymmetric quantum theory and non-commutative geometry, Comm. Math. Phys. 203 (1999), no. 1, 119–184. [11] Hajac, P. and Masuda, T.: Quantum Double-Torus, Comptes Rendus Acad. Sci. Paris 327(6), Ser. I, Math. (1998), 553–558. [12] Rosenberg, S.: “The Laplacian on a Riemannian Manifold”, Cambridge University Press, Cambridge (1997). [13] Soltan, P. M.: Quantum families of maps and quantum semigroups on finite quantum spaces, preprint, arXiv:math/0610922. [14] Van Daele, A.: Notes on Compact Quantum Groups, arXiv:math/9803122. [15] Wang, S.: Free products of compact quantum groups, Comm. Math. Phys. 167 (1995), no. 3, 671–692. [16] Wang, S.: Quantum symmetry groups of finite spaces, Comm. Math. Phys. 195 (1998), 195–211. [17] Wang, S.: Structure and isomorphism classification of compact quantum groups Au(Q) and Bu(Q), J. Operator Theory 48 (2002), 573–583. [18] Woronowicz, S. L.: “Compact quantum groups”, pp. 845–884 in Symétries quantiques (Quantum symmetries) (Les Houches, 1995), edited by A. 
Connes et al., Elsevier, Amsterdam, 1998. [19] Woronowicz, S. L.: Pseudogroups, pseudospaces and Pontryagin duality, Proceedings of the International Conference on Mathematical Physics, Lausanne (1979), Lecture Notes in Physics 116, pp. 407–412. ABSTRACT We formulate a quantum generalization of the notion of the group of Riemannian isometries for a compact Riemannian manifold, by introducing a natural notion of smooth and isometric action by a compact quantum group on a classical or noncommutative manifold described by spectral triples, and then proving the existence of a universal object (called the quantum isometry group) in the category of compact quantum groups acting smoothly and isometrically on a given (possibly noncommutative) manifold satisfying certain regularity assumptions. In fact, we identify the quantum isometry group with the universal object in a bigger category, namely the category of ‘quantum families of smooth isometries’, defined along the lines of Woronowicz and Soltan. We also construct a spectral triple on the Hilbert space of forms on a noncommutative manifold which is equivariant with respect to a natural unitary representation of the quantum isometry group. We give an explicit description of quantum isometry groups of commutative and noncommutative tori, and in this context, obtain the quantum double torus defined in [11] as the universal quantum group of holomorphic isometries of the noncommutative torus. <|endoftext|><|startoftext|> General System theory, Like-Quantum Semantics and Fuzzy Sets Ignazio Licata Isem, Institute for Scientific Methodology, Pa, Italy Ignazio.licata@ejtp.info Abstract: We outline the possibility of extending the quantum formalism to meet the requirements of general systems theory. This can be done by using a quantum semantics arising from the deep logical structure of quantum theory. 
It thus becomes possible to take into account the logical openness relationship between observer and system. We show that, for systemics, it is more useful to consider the truth-values of quantum propositions within the context of fuzzy sets. In conclusion we propose an example of formal quantum coherence. Key-words: Quantum Theory; Fuzzy Sets; System Theory; Syntax and Semantics of Scientific Theories; Logical Openness. Published in Systemics of Emergence. Research and Development, Minati G., Pessa E., Abram M., Springer, 2006, pages 723-734. 1. The role of syntactics and semantics in general system theory The omologic element breaks specializations up, forces taking into account different things at the same time, stirs up the interdependent game of the separated sub-totalities, hints at a broader totality whose laws are not the ones of its components. In other words, the omologic method is an anti-separatist and reconstructive one, which thing makes it unpleasant to specialists. F. Rossi-Landi 1985 The systemic-cybernetic approach (Wiener, 1961; von Bertalanffy, 1968; Klir, 1991) requires a careful evaluation of epistemology as the critical praxis internal to the building up of the scientific discourse. That is why the usual reference to a “connective tissue” shared by different disciplines could be misleading. As a matter of fact every scientific theory is the outcome of a complex conceptual construction aimed at the peculiar features of its problem, so what we are interested in is not a framework shaping an abstract super-scheme obtained by “filtering” the particular sciences, but research focusing on the global and foundational characteristics of scientific activity in a trans-disciplinary perspective. On this view, we can understand General System Theory (GST) by analogy with metalogic: it deals with the possibilities and boundaries of various formal systems at a higher level than any specific structure. 
A scientific theory presupposes a certain set of relations between observer and system, so the purpose of GST is to investigate the possibility of describing the multiplicity of system-observer relationships. The main goal of GST is to delineate a formal epistemology for studying the formation of scientific knowledge, a science able to speak about science. Succeeding in outlining such a panorama will make it possible to analyse those inter-disciplinary processes which are more and more important in the study of complex systems, and will guarantee the conditions for the “transportability” of a set of models from one field to another. For instance, as a theory develops, its syntax becomes more and more structured, putting univocal constraints on the semantics according to the operative requirements of the problem. Sometimes it can be useful to generalise a syntactic tool to a new semantic domain so as to formulate new problems. Such typically trans-disciplinary work can only be done with the tools of a GST able to discuss new relations between syntactics (formal model) and semantics (model usage). It is useful here to consider again the omologic perspective, which not only identifies analogies and isomorphisms in pre-defined structures, but aims to find a structural and dynamical relation among theories at a higher level of analysis, thus providing new possibilities of use (Rossi-Landi, 1985). This is particularly useful in studying complex systems, where the very essence of the problem makes a dynamic use of models necessary to describe the emergent features of the system (Minati & Brahms, 2002; Collen, 2002). We want here to discuss briefly this reading of GST, and then to show the possibility of modifying the semantics of Quantum Mechanics (QM) so as to obtain a conceptual tool fit for systemic requirements. 2. Observer as emergence surveyor and semantic ambiguity solver What we look at is not Nature in itself, but Nature unveiling to our questioning methods. W. 
Heisenberg, 1958 A very important and interesting question in system theory can be stated as follows: given a set of measurement systems M and of theories T related to a system S, is it always possible to order them such that Ti-1 ≺ Ti, where the partial order symbol ≺ is used to denote the relationship “physically weaker than”? We point out that, in this case, the ith theory of the chain contains more information than the preceding ones. This leads to a second key question: can a unique final theory Tf describe exhaustively each and every aspect of system S? From the informational and metrical side, this is equivalent to stating that all of the information contained in a system S can be extracted by means of adequate measurement processes. The fundamental proposition of reductionism is, in fact, the idea that such a theory chain will be sufficient to give a coherent and complete description of a system S. Reductionism, in the light of our definitions, therefore coincides with the highest degree of semantic space “compression”; each object D ∈ Ti in S has a definition in a theory Ti belonging to the theory chain, and the latter is, in turn, related to the fundamental explanatory level of the “final” theory Tf. This implies that each aspect of a system S is unambiguously determined by the syntax described in Tf. Each system S can be described at a fundamental level, but also by many phenomenological descriptions; each of these descriptions can be considered an approximation of the “final” theory. However, most of the “interesting” systems we deal with cannot be included in this program of syntactic compatibility along a theory chain: we have to consider this important aspect for a correct epistemic definition of the “complexity” of systems. Let us illustrate this point with a simple argument, based upon the concepts of logical openness and intrinsic emergence (Minati, Pessa, Penna, 1998; Licata, 2003b). 
Each measurement operation can in principle be coded on a Turing machine. If a coherent and complete fundamental description Tf exists, then there will also exist a finite, or at most countably infinite, set of measurement operations M which can extract every single piece of information that describes the system S. We shall call such a measurement set a Turing-observer. We can easily imagine the Turing-observer as a robot that executes a series of measurements on a system. The robot is guided by a program built upon rules belonging to the theory T. It can be proved, though, that this is only possible for logically closed systems, or at most for systems with a very low degree of logical openness. When dealing with highly logically open systems, no recursive formal criterion exists that can be as selective as requested (i.e., automatically choose which information is relevant to describe and characterize the system, and which is not), simply because it is not possible to isolate the system from the environment. This implies that the Turing-observer hypothesis does not hold for fundamental reasons, strongly related to the Zermelo-Fraenkel axiom of choice and to Gödel's classical decision problems. In other words, our robot always executes the measurements following the same syntactics, whereas the scenario showing intrinsic emergence is semantically modified. So it is impossible to think of codifying every possible measurement in a logically open system! The observer therefore plays a key role, unavoidable, as a semantic ambiguity solver: only the observer can and will single out intrinsic-observational emergence properties (Bass & Emmeche, 1997; Cariani, 1991), and subsequently plan adequate measurement processes to describe what, as a matter of fact, have turned into new systems. 
System complexity is structurally bound to logical openness and is, at the same time, both an expression of highly organized system behaviours (long-range correlations, hierarchical structure, and so on) and an observer's request for new explanatory models. So, a GST has to make it possible, within the very same theoretical context, to deal with the observer as an emergence surveyor in a logically open system. In particular, it is clear that the observer itself is a logically open system. Moreover, it has to be pointed out that the co-existence of many description levels, compatible but not deducible from each other, leads to situations of intrinsic uncertainty, linked to the different frameworks by which a system property can be defined. 3. Like-quantum semantics I’m not happy with all the analyses that go with just the classical theory, because nature isn’t classical, dammit, and if you want to make a simulation of nature, you’d better make it quantum mechanical, and by golly it’s a wonderful problem, because it doesn’t look so easy. Thank you. R. P. Feynman, 1981 When we modify and/or amplify a theory so as to be able to speak about systems different from the ones it was designed for, it is better to look at the deep structural features of the theory so as to get an abstract perspective able to fulfil the requirements of the omologic approach, which aims to point out a non-trivial conceptual convergence. As is well known, the logic of classical physics is a dichotomic language (tertium non datur), relatively orthocomplemented and able to fulfil the weak distributivity relations by the logical connectives AND/OR. Such features are the core of the Boolean commutative elements of this logic because disjunctions and conjunctions are symmetrical and associative operations. We shall here dwell on the systemic consequences of these properties. A system S may or may not have a given property P. 
Once we fix the truth-value of P, it is possible to carry on our inquiry with a new proposition subordinated to the truth-value of the previous one. Going ahead, we add a new piece of information to our knowledge about the system. So the relative orthocomplementation axiom guarantees that we can follow a succession of steps, each of which diminishes our uncertainty about the system or, in the case of a finite number of steps, lets us define the state of the system by determining all its properties. Each property of the system can be described by a countable infinity of atomic propositions. So, this axiom plays the role of a describability axiom for classical systems. The unconstrained use of this kind of axiom tends to hide the conceptual problems arising from the fact that every description implies a context, as we have seen in the Turing-observer analysis, and it seems to imply that systemic properties are independent of the observer; this is surely not a valid statement when we deal with logically open systems. In particular, the Boolean features mean that it is always possible to carry out exhaustively a synchronic description of the properties of a system. In other words, the answer to any question about the system does not depend on the order in which we ask it, and each question has a fixed answer which we indicate as 0 (false) or 1 (true). It can be noticed at once that emergent features, by contrast, have a diachronic nature and can easily mean that such characteristics cannot be taken for granted. Using Venn diagrams it is possible to represent the complete describability of a system ruled by classical logic. If the state of the system is represented by a point and a property by a set of points, then a complete “blanketing” of the universal set I, which stands for the always true proposition, is always possible (see fig. 1). 
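The Boolean picture just described can be illustrated by a small sketch in which the state of the system is a point and each property is a set of points. The state space, the three properties and all function names below are hypothetical choices made only for illustration; the sketch checks that a property and its negation blanket the universal set I, that the distributive law holds, and that the answers do not depend on the order of the questions.

```python
from itertools import permutations

# Hypothetical finite state space standing in for the universal set I.
I = frozenset(range(8))

# Hypothetical properties, each given as the set of states where it is true.
PROPS = {
    "P": frozenset({0, 1, 2, 3}),
    "Q": frozenset({2, 3, 4, 5}),
    "R": frozenset({1, 3, 5, 7}),
}


def negation(prop):
    """Classical negation: the complement inside the universal set."""
    return I - prop


def ask(state, order):
    """Answer the questions in the given order: 1 = true, 0 = false.
    Asking a question does not change the state."""
    return {name: int(state in PROPS[name]) for name in order}


def order_independent(state):
    """True if every ordering of the questions yields the same answers."""
    answers = [ask(state, order) for order in permutations(PROPS)]
    return all(a == answers[0] for a in answers)


P, Q, R = PROPS["P"], PROPS["Q"], PROPS["R"]
covers = (P | negation(P)) == I                      # complete blanketing of I
distributive = (P & (Q | R)) == ((P & Q) | (P & R))  # distributive law
```

In this classical toy model both `covers` and `distributive` hold, and `order_independent` is true for every state, because a question is just a membership test that leaves the point untouched.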
Quantum logic shows deep differences that could be extremely useful for our goals (Birkhoff & von Neumann, 1936; Piron, 1964). It was born to clarify some counter-intuitive aspects of QM, and later developed as an autonomous field, largely independent of the questions that gave birth to it. We limit the formal references here to an essential survey, focusing on some points of general interest for systemics. The quantum language is a non-Boolean orthomodular structure, that is, it is relatively orthocomplemented but non-commutative, because the distributivity axiom breaks down. This follows naturally from the Heisenberg indeterminacy principle and binds the truth-value of an assertion to the context and to the order in which it is investigated (Griffiths, 1995). A well-known example is the measurement of a particle's spin along a given direction. Here we deal with possibilities that are semantically well defined and yet intrinsically uncertain. Let $\Psi_x$ be the spin measurement along the direction x. By the indeterminacy principle the value of $\Psi_y$ will be totally uncertain, yet the proposition $\Psi_y = 0 \vee \Psi_y = 1$ is necessarily true. In general, if P is a proposition, (−P) its negation and Q a property that does not commute with P, we get a situation that can be represented by a “patchy” blanketing of the set I (see fig. 2). Such a configuration finds its essential meaning only in its relation with the observer. We can therefore state that when a situation can be described by a quantum logic, a system is never completely defined a priori. The measurement process through which the observer acts is a choice that fixes some characteristics of the system and leaves others undefined. This happens because of the very nature of the inter-relationship between observer and system. Each act of observation gives birth to new descriptive possibilities.
The proposition Q in the above example describes properties that cannot be defined by any implicational chain of propositions P. Since intrinsic emergence cannot be regarded as a system property independent of the observer's action, as it is in naïve classical emergentism, Q can be formally considered the expression of an emergent property. We are thus strongly tempted to define as emergent the undefined propositions of a quantum-like non-commutative language. In particular, it can be shown that a non-Boolean, irreducible orthomodular language gives rise to infinitely many propositions. This means that for each pair of propositions P1 and P2 such that neither implies the other, there exist infinitely many propositions Q that imply P1 ∨ P2 without necessarily implying either of them separately: tertium datur. In a sense, the disjunction of the two propositions carries more information than their mere set-theoretic sum, which is the exact opposite of what happens in the Boolean case. It is now easy to grasp the deep relation binding non-commutativity, indeterminacy principles and the holistic global structure of a system. A system describable by a Boolean structure can be completely “solved” by analysing the sub-systems defined by a suitable decomposition process (Heylighen, 1990; Abram, 2002). In the non-commutative case, on the contrary, studying any sub-system modifies the entire system in an irreversible and structural way and produces an uncertainty correlated with the information gained, which makes it entirely natural to extend the indeterminacy principles to a wide range of areas of strong interest for systemics (Volkenshtein, 1988). A key question is how to manage conceptually the infinite cardinality of emergent propositions in a like-quantum semantics. As is well known, traditional QM refers to the frequentist probability worked out within the Copenhagen Interpretation (CIQM). It is essentially an extension of Boolean logic sub specie probabilitatis.
The values in the interval [0, 1], i.e. between the always false proposition O and the completely and always true proposition I, are understood as expectation values, or as the probabilities associated with any measurable property. Without dwelling on the complex debate on the interpretation of QM, many questions of which are still open, we can ask here whether the probabilistic reading of truth-values is the most suitable one for system theory. As usually happens when we deal with trans-disciplinary fields, this will lead us to add a new step to our inquiry, one of remarkable interest for “ordinary” QM too.

4. A Fuzzy Interpretation of Quantum Languages

A slight variation in the founding axioms of a theory can give way to huge changings on the frontier.
S. Gudder, 1988

The study of the structural and logical facets of quantum semantics does not provide any necessary indication about the most suitable algebraic space in which to implement its ideas. One of the great merits of this research has been to call into question the key role of Hilbert space. In our approach we have kept the “internal” problems of QM well separated from its extension to systemic questions. The latter, however, suggest an interpretative possibility bound to fuzzy logic, which can considerably affect traditional QM too. Fuzzy set theory is, in essence, a formal tool created to deal with information characterized by vagueness and indeterminacy. The by now classical paper of Lotfi Zadeh (Zadeh, 1965) brings to a conclusion an old tradition in logic, which counts Charles S. Peirce, Jan C. Smuts, Bertrand Russell, Max Black and Jan Łukasiewicz among its forerunners. At the core of fuzzy theory lies the idea that an element can belong to a set with a variable degree of membership; the same goes for a proposition and its variable relation to the logical constants true and false. We underline here two aspects of particular interest for our aims.
The definition of fuzziness concerns single elements and properties, not a statistical ensemble, so it has to be considered a concept completely different from probability, as should by now be widely understood (Mamdani, 1977; Kosko, 1990). A further essential, if perhaps less evident, point is that fuzzy theory calls for a non-algorithmic “oracle”, an observer (i.e. a logically open system and a solver of semantic ambiguity), to choose the degree of membership. In fact, most of the theory is model-free in its structure: no equation and no numerical value constrains the quantitative evaluation, which is the task of the model builder. There is consequently a deep bond between systemics and fuzziness, successfully expressed by Zadeh's incompatibility principle (Zadeh, 1972), which satisfies our requirement for a generalized indeterminacy principle. It states that as the complexity of a system (i.e. its degree of logical openness) increases, our ability to make exact statements and proven predictions about its behaviour decreases. There already exist many examples of cross-fertilization between fuzzy theory and QM (Dalla Chiara, Giuntini, 1995; Cattaneo, Dalla Chiara, Giuntini 1993). Here we want to outline the usefulness of fuzzy polyvalence for the systemic interpretation of quantum semantics. Let us consider a complex system, such as a social group, a mind or a biological organism. Each of these cases shows typical emergent features owing both to the interaction among its components and to its inter-relations with the environment. An act of the observer will fix some properties and leave others undetermined, according to a non-Boolean logic. The recording of such properties will depend on the succession of the measurement acts and on their very nature.
The kind of complexity in play, on the other hand, prevents us from stating what the state of the system is, and hence from associating a probabilistic expectation value with the measurement of a property. In fact, the examples mentioned above concern macroscopic systems, for which the probabilistic interpretation of QM is patently not valid. Moreover, the traditional application of the concept of probability implies the notion of “possible cases”, and so it also implies a pre-defined knowledge of the properties of the system. The non-commutative logical structure outlined here, however, does not provide any cogent indication on the use of probability. It is therefore appropriate to look to a fuzzy approach in order to describe the measurement acts. We can state that, given a generic system endowed with high logical openness and an indefinite set of properties capable of describing it, each of these properties will belong to the system to a variable degree. Such a viewpoint, which expresses the famous theorem of fuzzy “subsethood”, also known as “the whole in the part” principle, may seem too strong; in fact it is nothing but the most natural expression of actual scientific practice when facing intrinsically emergent systems. At the beginning we have at our disposal indefinite information, which is progressively structured thanks to the feedback between models and measurements. It can be shown that any logically open model of degree n, where n is an integer, will leave a wide range of properties and propositions indeterminate (the Qs in fig. 2). Such a model is a “static” approximation of a process showing aspects of variable closeness and openness. These aspects vary in time, intensity, level and context. It is worth pointing out how “flexible” and context-sensitive such systems are: they change the rules and make use of “contradictions”. This point has to be stressed in order to understand the link between fuzzy logic and quantum languages.
As the logical openness and the unsharp properties of a system increase, it becomes less and less suited to description by a Boolean logic. As a consequence, for a complex system the intersection between a set (of properties or propositions) and its complement is not equal to the empty set, but includes both of them in a fuzzy sense. We thus get a polyvalent semantic situation that is well suited to description by a quantum language. Since for our systemic goal it is the probabilistic interpretation that is of no use, we are going to build a fuzzy reading of the semantics of the formalism. In our case, given a system S and a property Q, let Ψ be a function that associates Q with S; the expression $\Psi_S(Q) \in [0,1]$ is then to be understood not as a probability value but as a degree of membership. This union between the non-commutative side of quantum languages and fuzzy polyvalence appears to be the most suitable and fruitful for systemics. Let us consider the traditional expression of quantum coherence (the property expressing the global and non-local characteristics of QM, i.e. superposition principle, uncertainty, interference of probabilities), $\Psi = a_1\Psi_1 + a_2\Psi_2$. In the fuzzy interpretation, it means that the properties $\Psi_1$ and $\Psi_2$ belong to $\Psi$ with degrees of membership $a_1$ and $a_2$ respectively. In other words, for complex systems Schrödinger's cat can be simultaneously both alive and dead! Indeed, the recent experiments with SQUIDs and others investigating the so-called macroscopic quantum states suggest a form of macro-realism quite close to our fuzzy reading (Leggett, 1980; Chiatti, Cini, Serva, 1995). This may provide in nuce a hint that could prove interesting for the long-debated interpretative problems of QM. In general, let x be a position coordinate of a quantum object and Ψ its wave function; $|\Psi(x)|^2\,dV$ is usually understood as the probability of finding the particle in a region dV of space.
In the fuzzy interpretation, on the contrary, we are compelled to look at the square modulus of Ψ as the degree of membership of the particle in the region dV of space. Unusual as it may seem, this idea should not be dismissed lightly. As a matter of fact, in Quantum Field Theory and in other more advanced quantum scenarios, a particle is not merely an object localized in space, but rather an event emerging from the non-local network of elementary quantum transitions (Licata, 2003a). Measurement is thus a “defuzzification” process which, as stated above, reduces the ambiguity of the system by limiting the semantic space and by defining a fixed quantity of information. If we accept this interpretation, we readily see that quantum coherence behaviours can be observed in situations that are non-quantum and quite far from the range of Planck's constant h. We reconsider here a situation due to Yuri Orlov (Orlov, 1997). Let us consider a Riemann sphere (Dirac, 1947), see fig. 3, and assume that each point on the sphere represents a single interpretation of a given situation, i.e. the assignment of a coherent set of truth-values to a given proposition. Alternatively, we can regard the choice of a vector v from the centre O to a point on the sphere as a logical definition of a world. If we choose a different direction, associated with a different vector w, we can pose the problem of the meaning of the amplitude between the logical descriptions of the two worlds. It is known that this amplitude is expressed by $\tfrac{1}{2}(1+\cos\vartheta)$, where ϑ is the angle between the two interpretations. The amplitude corresponds to a superposition of worlds, thus producing the typical interference patterns, which in vectorial terms are related to $\mathbf{v}\cdot\mathbf{w}$.
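As a small numerical illustration of the expression ½(1 + cos ϑ) quoted above, the following sketch evaluates it for two interpretation vectors. The function name `interpretation_overlap` is hypothetical, and the sketch assumes that ϑ is simply the ordinary angle between v and w, obtained from their scalar product; it is not taken from Orlov's paper.

```python
import math

def interpretation_overlap(v, w):
    """Return (1 + cos(theta)) / 2, with theta the angle between v and w.

    Assumption: v and w are non-zero 3-vectors from the centre of the
    sphere to two points on it (two "worlds" in the text's wording).
    """
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    cos_theta = dot / (norm_v * norm_w)
    return 0.5 * (1.0 + cos_theta)

if __name__ == "__main__":
    north = (0.0, 0.0, 1.0)
    south = (0.0, 0.0, -1.0)
    east = (1.0, 0.0, 0.0)
    print(interpretation_overlap(north, north))  # identical interpretations
    print(interpretation_overlap(north, east))   # orthogonal directions
    print(interpretation_overlap(north, south))  # antipodal interpretations
```

Identical interpretations give the maximal value and antipodal ones the minimal value, with a smooth dependence on the angle in between.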
In this case the traditional use of probability is not necessary, because knowing one of the two worlds with probability p = 1 (certainty) tells us nothing about the probability of the other one. An interpretation is not a quantum object in the proper sense, and yet we are forced to introduce formally a wave function and interference terms whose role is very obscure. The fuzzy approach, instead, clarifies the quantum semantics of this situation by interpreting interference as a measurement in which the properties of the world $\Psi_v + \Psi_w$ are due to the global and indissoluble (non-local) contribution of the overlapping of v and w. In conclusion, the generalized use of quantum semantics, together with new interpretative possibilities, gives systemics a very powerful tool for describing the observer-environment relation and for bringing the several partial attempts made so far to apply the quantum formalism to the study of complex systems back to a comprehensive conceptual root.

ACKNOWLEDGEMENTS

Special thanks to Prof. G. Minati for his kindness and his support during the drafting of this paper. I owe a lot to useful discussions on structural Quantum Mechanics and logic with my good friends Prof. Renato Nobili (who let me use figs. 1 and 2 from his book “Dai Quark alla Mente”, to be published) and Prof. Eliano Pessa. Dedicated to M.V.

REFERENCES

Abram, M. R., 2002, Decomposition of Systems, in Emergence in Complex, Cognitive, Social and Biological Systems (G. Minati and E. Pessa eds.), Kluwer Academic, NY, 2002.
Baas, N. A. and Emmeche, C., 1997, On Emergence and Explanation, in SFI Working Paper, Santa Fé Inst., 97-02-008.
Birkhoff, G. and von Neumann, J., 1936, The Logic of Quantum Mechanics, in Annals of Math., 37.
Cariani, P., 1991, Adaptivity and Emergence in Organism and Devices, in World Futures, 32(111).
Cattaneo, G., Dalla Chiara, M. L., Giuntini, R., 1993, Fuzzy-Intuitionistic Quantum Logics, in Studia Logica, 52.
Chiatti, L., Cini, M., Serva, M., 1995, Is Macroscopic Quantum Coherence Incompatible with Macroscopic Realism?, in Nuovo Cim., 110B (5-6).
Collen, A., 2002, Disciplinarity in the Pursuit of Knowledge, in Emergence in Complex, Cognitive, Social and Biological Systems (G. Minati and E. Pessa eds.), Kluwer Academic, NY, 2002.
Dalla Chiara, M. L. and Giuntini, R., 1995, The Logic of Orthoalgebras, in Studia Logica, 55.
Dirac, P. A. M., 1947, The Principles of Quantum Mechanics, 3rd ed., Oxford Univ. Press, Oxford.
Feynman, R. P., 1982, Simulating Physics with Computers, in Int. J. of Theor. Phys., 21(6/7).
Griffiths, R. B., 1995, Consistent Quantum Reasoning, in arXiv:quant-ph/9505009 v1.
Gudder, S. P., 1988, Quantum Probability, Academic Press, NY.
Heisenberg, W., 1958, Physics and Philosophy: The Revolution in Modern Science, Harper and Row, NY; Prometheus Books; Reprint edition 1999.
Heylighen, F., 1990, Classical and Non-Classical Representations in Physics: Quantum Mechanics, in Cybernetics and Systems, 21.
Klir, J. G. (ed.), 1991, Facets of Systems Science, Plenum Press, NY.
Kosko, B., 1990, Fuzziness vs. Probability, in Int. J. of General Systems, 17(2).
Leggett, A. J., 1980, Macroscopic Quantum Systems and the Quantum Theory of Measurement, in Suppl. Prog. Theor. Phys., 69(80).
Licata, I., 2003a, Osservando la sfinge. La realtà virtuale della fisica quantistica, Di Renzo, Roma.
Licata, I., 2003b, Mente & Computazione, in Sistema Naturae, Annali di Biologia Teorica, 5.
Mamdani, E. H., 1977, Application of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis, in IEEE Trans. on Computers, C26.
Minati, G. and Brahms, S., 2002, The Dynamic Usage of Models (DYSAM), in Emergence in Complex, Cognitive, Social and Biological Systems (G. Minati and E. Pessa eds.), Kluwer Academic, NY, 2002.
Minati, G., Pessa, E., Penna, M. P., 1998, Thermodynamical and Logical Openness in Systems Research and Behavioral Science, 15(3).
Orlov, Y. F., 1997, Quantum-Type Coherence as a Combination of Symmetry and Semantics, in arXiv:quant-ph/9705049 v1.
Piron, C., 1964, Axiomatique Quantique, in Helvetica Physica Acta, 37.
Rossi-Landi, F., 1985, Metodica filosofica e scienza dei segni, Bompiani, Milano.
Volkenshtein, M. V., 1988, Complementarity, Physics and Biology, in Soviet Phys. Uspekhi, 31.
Von Bertalanffy, 1968, General System Theory, Braziller, NY.
Zadeh, L. A., 1965, Fuzzy Sets, in Information and Control, 8.
Zadeh, L. A., 1987, Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh, R. R. Yager, R. M. Tong, S. Ovchinnikov, H. T. Nguyen (eds.), Wiley, NY.
Wiener, N., 1961, Cybernetics: or control and communication in the animal and the machine, MIT Press, Cambridge.

ABSTRACT

We outline the possibility of extending the quantum formalism in relation to the requirements of general systems theory. This can be done by using a quantum semantics arising from the deep logical structure of quantum theory. It thus becomes possible to take into account the logical openness relationship between observer and system. We show how considering the truth-values of quantum propositions within the context of fuzzy sets is more useful for systemics. In conclusion we propose an example of formal quantum coherence.

<|endoftext|><|startoftext|>

Introduction

In 1959, S.K. Godunov [17] demonstrated that a (linear) scheme for a PDE could not, at the same time, be monotone and second order accurate. Hence, we should choose between spurious oscillation in high order non-monotone schemes and additional dissipation in first order schemes.

∗ Corresponding author. Email addresses: r.brownlee@mcs.le.ac.uk (R. A. Brownlee), a.gorban@mcs.le.ac.uk (A. N. Gorban), j.levesley@mcs.le.ac.uk (J. Levesley).
1 This work is supported by EPSRC grant number GR/S95572/01.
Preprint submitted to Physica A 24 October 2018 http://arxiv.org/abs/0704.0043v1
Flux limiter schemes were invented to combine high resolution schemes in areas with smooth fields and first order schemes in areas with sharp gradients. The idea of flux limiters can be illustrated by the computation of the flux $F_{0,1}$ of the conserved quantity u between a cell marked by 0 and one of its two neighbour cells marked by ±1:

$$F_{0,1} = (1-\phi(r))\, f^{\mathrm{low}}_{0,1} + \phi(r)\, f^{\mathrm{high}}_{0,1},$$

where $f^{\mathrm{low}}_{0,1}$, $f^{\mathrm{high}}_{0,1}$ are low and high resolution scheme fluxes, respectively, $r = (u_0 - u_{-1})/(u_1 - u_0)$, and $\phi(r) \ge 0$ is a flux limiter function. For r close to 1, the flux limiter function φ(r) should also be close to 1. Many flux limiter schemes have been invented during the last two decades [43]. No particular limiter works well for all problems, and a choice is usually made on a trial and error basis. Below are several examples of flux limiter functions:

$$\phi_{mm}(r) = \max[0, \min(r, 1)] \quad \text{(minmod, [36])};$$
$$\phi_{os}(r) = \max[0, \min(r, \beta)], \ (1 \le \beta \le 2) \quad \text{(Osher, [10])};$$
$$\phi_{mc}(r) = \max[0, \min(2r, 0.5(1+r), 2)] \quad \text{(monotonised central, [42])};$$
$$\phi_{sb}(r) = \max[0, \min(2r, 1), \min(r, 2)] \quad \text{(superbee, [36])};$$
$$\phi_{sw}(r) = \max[0, \min(\beta r, 1), \min(r, \beta)], \ (1 \le \beta \le 2) \quad \text{(Sweby, [40])}.$$

The lattice Boltzmann method has been proposed as a discretization of Boltzmann's kinetic equation and is now in wide use in fluid dynamics and beyond (for an introduction and review see [38]). Instead of fields of moments M, the lattice Boltzmann method operates with fields of discrete distributions f. This allows us to construct very simple limiters that do not depend on slopes or gradients. All the limiters we construct are based on the representation of distributions f in the form

$$f = f^* + \|f - f^*\| \, \frac{f - f^*}{\|f - f^*\|},$$

where $f^*$ is the corresponding quasiequilibrium (conditional equilibrium) for given moments M, $f - f^*$ is the nonequilibrium “part” of the distribution, which is represented in the form “norm × direction”, and $\|f - f^*\|$ is the norm of that nonequilibrium component (usually this is the entropic norm).
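The flux limiter functions listed above, and the blended flux they enter, can be written down directly. The following is a minimal sketch for scalar arguments; the function names and the default β = 1.5 are illustrative choices, not taken from the cited references.

```python
def minmod(r):
    # phi_mm(r) = max[0, min(r, 1)]
    return max(0.0, min(r, 1.0))

def osher(r, beta=1.5):
    # phi_os(r) = max[0, min(r, beta)], with 1 <= beta <= 2
    return max(0.0, min(r, beta))

def monotonised_central(r):
    # phi_mc(r) = max[0, min(2r, 0.5(1 + r), 2)]
    return max(0.0, min(2.0 * r, 0.5 * (1.0 + r), 2.0))

def superbee(r):
    # phi_sb(r) = max[0, min(2r, 1), min(r, 2)]
    return max(0.0, min(2.0 * r, 1.0), min(r, 2.0))

def sweby(r, beta=1.5):
    # phi_sw(r) = max[0, min(beta r, 1), min(r, beta)], with 1 <= beta <= 2
    return max(0.0, min(beta * r, 1.0), min(r, beta))

def limited_flux(f_low, f_high, r, phi=minmod):
    """Blend low and high resolution fluxes: (1 - phi(r)) f_low + phi(r) f_high."""
    weight = phi(r)
    return (1.0 - weight) * f_low + weight * f_high

if __name__ == "__main__":
    for limiter in (minmod, osher, monotonised_central, superbee, sweby):
        # Every limiter above equals 1 at r = 1 and vanishes for r < 0.
        print(limiter.__name__, limiter(1.0), limiter(-0.5))
```

At r = 1 (a locally smooth field) all five functions return 1, so the blended flux reduces to the high resolution flux, while for r ≤ 0 (a local extremum) they return 0 and the low resolution flux is used.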
Limiters change the norm of the nonequilibrium component $f - f^*$, but do not touch its direction or the equilibrium. In particular, limiters do not change the macroscopic variables, because the moments for f and $f^*$ coincide. All limiters we use are transformations of the form

$$f \mapsto f^* + \phi \times (f - f^*) \tag{1}$$

with φ > 0. If $f - f^*$ is too big, then the limiter should decrease its norm.

The outline of the paper is as follows. In Sec. 2 we introduce the notions and notation from lattice Boltzmann theory that we need. In Sec. 3 we elaborate the idea of entropic limiters in more detail and construct several nonequilibrium entropy limiters for LBM. In Sec. 4 some numerical experiments are described: (1) 1D athermal shock tube examples; (2) steady state vortex centre locations and observation of the first Hopf bifurcation in 2D lid-driven cavity flow. Concluding remarks are given in Sec. 5.

2 Background

The essence of lattice Boltzmann methods was formulated by S. Succi in the following maxim: “Nonlinearity is local, non-locality is linear”². We should even strengthen this statement. Non-locality (a) is linear; (b) is exactly and explicitly solvable for all time steps; (c) space discretization is an exact operation.

The lattice Boltzmann method is a discrete velocity method. A finite set of velocity vectors $\{v_i\}$ (i = 1, ..., m) is selected, and a fluid is described by associating, with each velocity $v_i$, a single-particle distribution function $f_i = f_i(x, t)$ which is evolved by advection and interaction (collision) on a fixed computational lattice. The values $f_i$ are named populations. If we look at all lattice Boltzmann models, we find that there are two steps: free flight for time δt and a local collision operation. The free flight transformation for continuous space is

$$f_i(x, t + \delta t) = f_i(x - v_i \delta t, t).$$

After the free flight step the collision step follows:

$$f_i(x) \mapsto F_i(\{f_j(x)\}), \tag{2}$$

or in the vector form $f(x) \mapsto F(f(x))$.

² S. Succi, “Lattice Boltzmann at all-scales: from turbulence to DNA translocation”, Mathematical Modelling Centre Distinguished Lecture, University of Leicester, Leicester UK, 15th November 2006.

Here, the collision operator F is the set of functions $F_i(\{f_j\})$ (i = 1, ..., m). Each function $F_i$ depends on all $f_j$ (j = 1, ..., m): new values of the populations $f_i$ at a point x are known functions of all previous population values at the same point.

The lattice Boltzmann chain “free flight → collision → free flight → collision ···” can be exactly restricted onto any space lattice which is invariant with respect to space shifts of the vectors $v_i \delta t$ (i = 1, ..., m). Indeed, free flight transforms the population values at sites of the lattice into the population values at sites of the same lattice. The collision operator (2) acts pointwise at each lattice site separately. Much effort has been applied to answer the questions: “how does the lattice Boltzmann chain approximate the transport equation for the moments M?”, and “how does one construct the lattice Boltzmann model for a given macroscopic transport phenomenon?” (a review is presented in the book [38]). In our paper we propose a universal construction of limiters for all possible collision operators, and the detailed construction of $F_i(\{f_j\})$ is not important for this purpose. The only part of this construction we use is the local equilibria (sometimes these states are named conditional equilibria, quasiequilibria, or even simpler, equilibria).

The lattice Boltzmann models should describe the macroscopic dynamic, i.e., the dynamic of macroscopic variables. The macroscopic variables $M_\ell(x)$ are some linear functions of the population values at the same point: $M_\ell(x) = \sum_i m_{\ell i} f_i(x)$, or in the vector form, $M(x) = m(f(x))$. The macroscopic variables are invariants of collisions: $\sum_i m_{\ell i} f_i = \sum_i m_{\ell i} F_i(\{f_j\})$ (or $m(f) = m(F(f))$).
The standard example of the macroscopic variables are the hydrodynamic fields (density, velocity, energy density): $\{n, nu, E\}(x) := \sum_i \{1, v_i, v_i^2/2\} f_i(x)$. But this is not an obligatory choice. If we would like to solve, by LBM methods, the Grad equations [22] or some extended thermodynamic equations [25], we should extend the list of moments (but, at the same time, we should be ready to introduce more discrete velocities for a proper description of these extended moment systems). On the other hand, the athermal lattice Boltzmann models with a shortened list of macroscopic variables $\{n, nu\}$ are very popular.

The quasiequilibrium is the positive fixed point of the collision operator for the given macroscopic variables M. We assume that this point exists, is unique and depends smoothly on M. For the quasiequilibrium population vector for given M we use the notation $f^*_M$, or simply $f^*$, if the corresponding value of M is obvious. We use $\Pi^*$ to denote the equilibration projection operation of a distribution f into the corresponding quasiequilibrium state: $\Pi^*(f) = f^*_{m(f)}$.

For some of the collision models an entropic description of equilibrium is possible: an entropy density function S(f) is defined and the quasiequilibrium point $f^*_M$ is the entropy maximiser for given M [26,39].

As a basic example we shall consider the lattice Bhatnagar–Gross–Krook (LBGK) model with overrelaxation (see, e.g., [3,12,23,28,38]). The LBGK collision operator is

$$F(f) := \Pi^*(f) + (2\beta - 1)(\Pi^*(f) - f), \tag{3}$$

where β ∈ [0, 1]. For β = 0, LBGK collisions do not change f; for β = 1/2 these collisions act as equilibration (this corresponds to the Ehrenfests' coarse graining [15], further developed in [14,19,20]); for β = 1, LBGK collisions act as a point reflection with the center at the quasiequilibrium $\Pi^*(f)$.
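The three special cases of the LBGK operator (3) can be checked on a single lattice site. The sketch below assumes, purely for illustration, a one-dimensional three-velocity athermal lattice with velocities (0, 1, −1), weights (2/3, 1/6, 1/6) and the standard second-order polynomial equilibrium; this choice of quasiequilibrium and all function names are assumptions of the sketch, not necessarily the model used in the paper.

```python
import numpy as np

# Assumed illustrative lattice: velocities and weights of a 1D three-velocity model.
V = np.array([0.0, 1.0, -1.0])
W = np.array([2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0])

def moments(f):
    """Macroscopic variables of an athermal model: density n and velocity u."""
    n = f.sum()
    u = (V * f).sum() / n
    return n, u

def quasiequilibrium(f):
    """Pi*(f): polynomial equilibrium with the same n and u as f (assumed form)."""
    n, u = moments(f)
    vu = V * u
    return W * n * (1.0 + 3.0 * vu + 4.5 * vu**2 - 1.5 * u**2)

def lbgk_collision(f, beta):
    """LBGK collision (3): F(f) = Pi*(f) + (2 beta - 1)(Pi*(f) - f)."""
    f_star = quasiequilibrium(f)
    return f_star + (2.0 * beta - 1.0) * (f_star - f)

if __name__ == "__main__":
    f = np.array([0.5, 0.3, 0.2])
    print(lbgk_collision(f, 0.0))   # beta = 0: f is unchanged
    print(lbgk_collision(f, 0.5))   # beta = 1/2: equilibration, returns Pi*(f)
    print(lbgk_collision(f, 1.0))   # beta = 1: reflection 2 Pi*(f) - f
```

Because the assumed equilibrium reproduces n and nu exactly, the collision conserves both moments for every β, which is the invariance property m(f) = m(F(f)) stated earlier.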
It is shown [8] that under some stability conditions and after an initial period of relaxation, the simplest LBGK collision with overrelaxation [23,38] provides a second order accurate approximation for the macroscopic transport equation with viscosity proportional to $\delta t (1 - \beta)/\beta$.

The entropic LBGK (ELBM) method [5,20,26,39] differs in the definition of (3): for β = 1 it should conserve the entropy, and in general it has the following form:

$$F(f) := (1 - \beta) f + \beta \tilde{f}, \tag{4}$$

where $\tilde{f} = (1 - \alpha) f + \alpha \Pi^*(f)$. The number α = α(f) is chosen so that the constant entropy condition is satisfied: $S(f) = S(\tilde{f})$. For LBGK (3), α = 2. Of course, for ELBM the entropic definition of quasiequilibrium should be valid.

In the low-viscosity regime, LBGK suffers from numerical instabilities which readily manifest themselves as local blow-ups and spurious oscillations.

The LBM experiences the same spurious oscillation problems near sharp gradients as high order schemes do. The physical properties of the LBM schemes allow one to construct new types of limiters: the nonequilibrium entropy limiters. In general, they do the same work for LBM as flux limiters do for finite difference, finite volume and finite element methods, but for LBM the main idea behind the construction of nonequilibrium entropy limiter schemes is to limit a scalar quantity, the nonequilibrium entropy (and not the vectors or tensors of spatial derivatives, as is the case for flux limiters). These limiters introduce some additional dissipation, but all this dissipation can easily be evaluated through analysis of nonequilibrium entropy production.

Two examples of such limiters have been recently proposed: the positivity rule [6,31,41] and the Ehrenfests' regularisation [7]. The positivity rule just provides positivity of distributions: if a collision step produces negative populations, then the positivity rule returns them to the boundary of positivity.
In the Ehrenfests' regularisation, one selects the k sites with highest nonequilibrium entropy (the difference between the entropy of the state f and the entropy of the corresponding quasiequilibrium state $f^*$ at a given space point) that exceed a given threshold, and equilibrates the state at these sites.

The positivity rule and Ehrenfests' regularisation provide rare, intense and localised corrections. It is easy and also computationally cheap to organise a more gentle transformation with a smooth shift of highly nonequilibrium states to quasiequilibrium. The following regularisation transformation distributes its action smoothly: we can just choose in (1) $\phi = \phi(\Delta S(f))$ with a sufficiently smooth function $\phi(\Delta S(f))$. Here f is the state at some site, $f^*$ is the corresponding quasiequilibrium state, S is entropy, and $\Delta S(f) := S(f^*) - S(f)$.

The next step in the development of the nonequilibrium entropy limiters is in the usage of local entropy filters. The filter of choice here is the median filter: it does not erase sharp fronts, and is much more robust than convolution filters.

An important problem is: “how does one create nonequilibrium entropy limiters for LBM with non-entropic quasiequilibria?”. We propose a solution of this problem based on the nonequilibrium Kullback entropy. For entropic quasiequilibrium the Kullback entropy approach gives the same entropic limiters. In thermodynamics, Kullback entropy belongs to the family of Massieu–Planck–Kramers functions (canonical or grandcanonical potentials).

3 Nonequilibrium entropy limiters for LBM

3.1 Positivity rule

There is a simple recipe for positivity preservation [6,31,41]: to substitute nonpositive $I_0(f)(x)$ by the closest nonnegative state that belongs to the straight line

$$\{\lambda f(x) + (1 - \lambda)\Pi^*(f(x)) \mid \lambda \in \mathbb{R}\} \tag{5}$$

defined by the two points, f(x) and the corresponding quasiequilibrium. This operation is to be applied pointwise, at points of the lattice where positivity is violated. The coefficient λ depends on x too.
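The recipe just described can be sketched for a single lattice site as follows. The function name `positivity_rule` and its argument names are hypothetical; the sketch assumes that the quasiequilibrium is strictly positive and that the state to be corrected already lies on the straight line through f(x) and the quasiequilibrium, as a post-collision LBGK state does.

```python
import numpy as np

def positivity_rule(f_post, f_star):
    """Return the nonnegative state closest to f_post on the line through f_star.

    f_post : populations at one site, possibly with negative entries
    f_star : corresponding quasiequilibrium (assumed strictly positive)
    Points of the line are written as f_star + lam * (f_post - f_star).
    """
    f_post = np.asarray(f_post, dtype=float)
    f_star = np.asarray(f_star, dtype=float)
    negative = f_post < 0.0
    if not negative.any():
        return f_post.copy()          # positivity is not violated at this site
    # Largest lam in (0, 1) for which every violating component is still >= 0.
    lam = np.min(f_star[negative] / (f_star[negative] - f_post[negative]))
    return f_star + lam * (f_post - f_star)

if __name__ == "__main__":
    f_post = np.array([-0.1, 0.6, 0.5])
    f_star = np.array([0.2, 0.4, 0.4])
    print(positivity_rule(f_post, f_star))  # first population is moved to zero
```

Since the correction moves along the line through the quasiequilibrium, the macroscopic variables of the state are not changed, only the size of its nonequilibrium part.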
Let us call this recipe the positivity rule (Fig. 1). This recipe preserves positivity of populations and probabilities, but can affect the accuracy of approximation. The same rule is necessary for ELBM (4) when the positive “mirror state” $\tilde{f}$ with the same entropy as f does not exist on the straight line (5).

[Fig. 1. Positivity rule in action. The motion stops at the positivity boundary. Figure labels: F(f), positivity fixation, positivity domain.]

3.2 Ehrenfests' regularisation

To discuss methods with additional dissipation, the entropic approach is very convenient. Let entropy S(f) be defined for each population vector $f = (f_i)$ (below we use the same letter S for the entropy local in space, and hope that the context will always make this notation clear). We assume that the global entropy is a sum of local entropies for all sites. The local nonequilibrium entropy is

$$\Delta S(f) := S(f^*) - S(f), \tag{6}$$

where $f^*$ is the corresponding local quasiequilibrium at the same point.

The Ehrenfests' regularisation [6,7] provides “entropy trimming”: we monitor the local deviation of f from the corresponding quasiequilibrium, and when ΔS(f)(x) exceeds a pre-specified threshold value δ, perform local Ehrenfests' steps to the corresponding quasiequilibrium: $f \mapsto f^*$ at those points.

So that the Ehrenfests' steps are not allowed to degrade the accuracy of LBGK, it is pertinent to select the k sites with highest ΔS > δ. The a posteriori estimates of added dissipation can easily be performed by analysis of entropy production in Ehrenfests' steps. Numerical experiments show (see, e.g., [6,7]) that even a small number of such steps drastically improves stability.

To avoid a change of the order of accuracy “on average”, the number of sites with this step should be ≤ O(Nh/L), where N is the total number of sites, h is the step of the space discretization and L is the macroscopic characteristic length.
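The selection step of the Ehrenfests' regularisation can be sketched as below. The array layout (one row of populations per site), the function name and the argument names are assumptions made for illustration; the nonequilibrium entropy values ΔS are taken as given, since their computation depends on the entropy chosen for the model.

```python
import numpy as np

def ehrenfest_regularisation(f, f_star, delta_S, threshold, k):
    """Equilibrate the (at most) k sites with the highest delta_S above threshold.

    f, f_star : arrays of shape (n_sites, m), populations and quasiequilibria
    delta_S   : array of shape (n_sites,), nonequilibrium entropy per site
    Returns the corrected populations and the indices of the equilibrated sites.
    """
    f_new = f.copy()
    candidates = np.flatnonzero(delta_S > threshold)
    if candidates.size > k:
        # Keep only the k candidates with the largest nonequilibrium entropy.
        order = np.argsort(delta_S[candidates])[::-1][:k]
        candidates = candidates[order]
    f_new[candidates] = f_star[candidates]   # Ehrenfests' step: f -> f*
    return f_new, candidates

if __name__ == "__main__":
    f = np.ones((5, 3))
    f_star = np.zeros((5, 3))
    delta_S = np.array([0.1, 0.5, 0.01, 0.9, 0.3])
    f_new, sites = ehrenfest_regularisation(f, f_star, delta_S, threshold=0.2, k=2)
    print(sorted(sites.tolist()))
```

In the usage example three sites exceed the threshold, but only the two with the largest ΔS are equilibrated, which is how the bound on the number of corrected sites enters.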
But this rough estimate of accuracy on average might be destroyed by concentration of Ehrenfests’ steps in the most nonequilibrium areas, for example, in the boundary layer. In that case, instead of the total number of sites $N$ in $O(Nh/L)$ we should take the number of sites in a specific region. The effects of concentration can easily be analysed a posteriori.

3.3 Smooth limiters of nonequilibrium entropy

The positivity rule and Ehrenfests’ regularisation provide rare, intense and localised corrections. Of course, it is easy and also computationally cheap to organise a more gentle transformation with a smooth shift of highly nonequilibrium states to quasiequilibrium. The following regularisation transformation distributes its action smoothly:

$$f \mapsto f^* + \phi(\Delta S(f))(f - f^*). \qquad (7)$$

The choice of the function $\phi$ is highly ambiguous, for example, $\phi = 1/(1+\alpha\Delta S^k)$ for some $\alpha > 0$ and $k > 0$. There are two significantly different choices: (i) ensemble-independent $\phi$ (i.e., the value of $\phi$ depends on the local value of $\Delta S$ only) and (ii) ensemble-dependent $\phi$, for example

$$\phi(\Delta S) = \frac{1 + (\Delta S/(\alpha E(\Delta S)))^{k-1/2}}{1 + (\Delta S/(\alpha E(\Delta S)))^{k}}, \qquad (8)$$

where $E(\Delta S)$ is the average value of $\Delta S$ in the computational area, $k \ge 1$, and $\alpha \gtrsim 1$. For small $\Delta S$, $\phi(\Delta S) \approx 1$, and for $\Delta S \gg \alpha E(\Delta S)$, $\phi(\Delta S)$ tends to $\sqrt{\alpha E(\Delta S)/\Delta S}$. It is easy to select an ensemble-dependent $\phi$ with control of the total additional dissipation.

3.4 Monitoring of total dissipation

For given $\beta$, the entropy production in one LBGK step, in quadratic approximation for $\Delta S$, is:

$$\delta_{LBGK} S \approx [1 - (2\beta - 1)^2] \sum_x \Delta S(x),$$

where $x$ is the grid point, $\Delta S(x)$ is the nonequilibrium entropy (6) at point $x$, and $\delta_{LBGK} S$ is the total entropy production in a single LBGK step. It would be desirable for the total entropy production of the limiter, $\delta_{lim} S$, to be small relative to $\delta_{LBGK} S$:

$$\delta_{lim} S < \delta_0\, \delta_{LBGK} S. \qquad (9)$$

A simple ensemble-dependent limiter (perhaps the simplest one) for a given $\delta_0$ operates as follows. Let us collect the histogram of the $\Delta S(x)$ distribution, and estimate the distribution density, $p(\Delta S)$.
We have to estimate a value $\Delta S_0$ that satisfies the following equation:

$$\int_{\Delta S_0}^{\infty} p(\Delta S)(\Delta S - \Delta S_0)\, d\Delta S = \delta_0 [1 - (2\beta - 1)^2] \int_0^{\infty} p(\Delta S)\,\Delta S\, d\Delta S. \qquad (10)$$

In order not to affect distributions with a small expectation of $\Delta S$, we choose a threshold $\Delta S_t = \max\{\Delta S_0, \delta\}$, where $\delta$ is some predefined value (as in the Ehrenfests’ regularisation). For states at sites with $\Delta S \ge \Delta S_t$ we apply a homothety with quasiequilibrium centre $f^*$ and coefficient $\sqrt{\Delta S_t/\Delta S}$ (in quadratic approximation for the nonequilibrium entropy):

$$f(x) \mapsto f^*(x) + \sqrt{\frac{\Delta S_t}{\Delta S}}\,(f(x) - f^*(x)). \qquad (11)$$

3.5 Median entropy filter

The limiters described above provide pointwise correction of nonequilibrium entropy at the “most nonequilibrium” points. Due to its pointwise nature, the technique does not introduce any nonisotropic effects, and provides some other benefits. But if we involve the local structure, we can correct local non-monotone irregularities without touching regular fragments. For example, we can regard monotone increase or decrease of nonequilibrium entropy as regular fragments and concentrate our efforts on the reduction of “speckle noise” or “salt and pepper noise”. This approach allows us to use the accessible resource of entropy change (9) more thriftily.

Among all possible filters, we suggest the median filter. The median is a more robust average than the mean (or the weighted mean), so a single very unrepresentative value in a neighbourhood will not affect the median value significantly. Hence, we suppose that the median entropy filter will work better than entropy convolution filters.

The median filter considers each site in turn and looks at its nearby neighbours. It replaces the nonequilibrium entropy value $\Delta S$ at the point with the median of those values, $\Delta S_{med}$, and then updates $f$ by the transformation (11) with the homothety coefficient $\sqrt{\Delta S_{med}/\Delta S}$. The median, $\Delta S_{med}$, is calculated by first sorting all the values from the surrounding neighbourhood into numerical order and then replacing the value under consideration with the middle value.
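The following is a minimal 1D sketch of such a filter with a three-point stencil. It assumes the quadratic approximation, in which $\Delta S$ scales with the square of the distance from quasiequilibrium, so the populations are rescaled by the square root of the entropy ratio. The function name, the tolerance argument `delta`, the treatment of the two end sites (left untouched) and the restriction to entropy-decreasing corrections are assumptions of this sketch.

```python
import numpy as np

def median_entropy_filter_1d(f, f_eq, dS, delta=0.0):
    """Apply a 3-point median filter to the nonequilibrium entropy field.

    f, f_eq : arrays of shape (n_sites, n_populations)
    dS      : array of shape (n_sites,), nonequilibrium entropy per site
    delta   : sites with dS <= delta are left untouched (assumed tolerance rule)
    Only corrections that decrease dS are applied; end sites are not filtered.
    """
    f = np.asarray(f, dtype=float)
    f_eq = np.asarray(f_eq, dtype=float)
    dS = np.asarray(dS, dtype=float)
    out = f.copy()
    for x in range(1, len(dS) - 1):
        med = np.median(dS[x - 1:x + 2])     # middle value of the sorted stencil
        if dS[x] > delta and med < dS[x]:
            c = np.sqrt(med / dS[x])         # homothety coefficient (quadratic approx.)
            out[x] = f_eq[x] + c * (f[x] - f_eq[x])
    return out
```

An isolated spike in $\Delta S$ is pulled down to the level of its neighbours, while a monotone ramp is left unchanged, since the median of a monotone triple is its central value.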
For example, if a point has 3 nearest neighbours including itself, then after sorting we have 3 values $\Delta S$: $\Delta S_1 \le \Delta S_2 \le \Delta S_3$. The median value is $\Delta S_{med} = \Delta S_2$. For 9 nearest neighbours (including itself) we have after sorting $\Delta S_{med} = \Delta S_5$. For 27 nearest neighbours $\Delta S_{med} = \Delta S_{14}$.

We accept only dissipative corrections (those resulting in a decrease of $\Delta S$, $\Delta S_{med} < \Delta S$) because of the second law of thermodynamics. The analogue of (10) is also useful for acceptance of the most significant corrections.

Median filtering is a common step in image processing [34] for the smoothing of signals and the suppression of impulse noise with preservation of edges.

3.6 Entropic steps for non-entropic quasiequilibria

Beyond the quadratic approximation for nonequilibrium entropy, all the logic of the above-mentioned constructions remains the same. There is only one significant change: instead of a simple homothety (11) with coefficient $\sqrt{\Delta S_t/\Delta S}$, the transformation (7) should be applied, where the multiplier $\phi$ is a solution of the nonlinear equation

$$S(f^* + \phi(f - f^*)) = S(f^*) - \Delta S_t.$$

This is essentially the same equation that appears in the definition of ELBM steps (4).

More differences emerge for LBM with non-entropic quasiequilibria. The main idea here is to reason that non-entropic quasiequilibria appear only for technical reasons, and approximate continuous physical entropic quasiequilibria. This is not an approximation of a density function, but an approximation of a measure, i.e., it comes from the cubature formula:

$$f(v) \approx \sum_i f_i\, \delta(v - v_i), \qquad \int \varphi(v) f(v)\, dv \approx \sum_i \varphi(v_i) f_i.$$

The discrete populations $f_i$ are connected to continuous (and sufficiently smooth) densities $f(v)$ by cubature weights, $f_i \approx w_i f(v_i)$. These weights for quasiequilibria are found by moment and flux matching conditions [37].
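As a small illustration of what such matching conditions look like in practice, the sketch below checks that the standard three-velocity polynomial quasiequilibrium (the one used later in Sec. 4.1) reproduces density, momentum and the athermal momentum flux $\rho/3 + \rho u^2$. The squared sound speed of $1/3$ in lattice units is an assumption of the standard athermal model, and the helper names are illustrative.

```python
import numpy as np

# Velocity set of the 1D three-velocity lattice: static, left-moving, right-moving.
V = np.array([0.0, -1.0, 1.0])

def polynomial_quasiequilibrium(rho, u):
    """Standard polynomial quasiequilibrium on the three-velocity lattice."""
    return np.array([
        2.0 * rho / 3.0 * (1.0 - 1.5 * u**2),
        rho / 6.0 * (1.0 - 3.0 * u + 3.0 * u**2),
        rho / 6.0 * (1.0 + 3.0 * u + 3.0 * u**2),
    ])

def moments(f):
    """Zeroth, first and second velocity moments of a population vector."""
    return f.sum(), (V * f).sum(), (V**2 * f).sum()
```

For example, `moments(polynomial_quasiequilibrium(1.2, 0.1))` should return values close to `(1.2, 0.12, 0.412)`, i.e. $\rho$, $\rho u$ and $\rho(1/3 + u^2)$.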
It is impossible to approximate the BGS entropy $-\int f \ln f\, dv$ just by discretization (replacing integration by summation, and the continuous distribution $f$ by discrete $f_i$), because cubature weights appear as additional variables. Nevertheless, the approximate discretization of the Kullback entropy $S_K$ [30] does not change its form:

$$S_K(f) = -\int f(v) \ln\frac{f(v)}{f^*(v)}\, dv \approx -\sum_i f_i \ln\frac{f_i}{f_i^*}, \qquad (12)$$

because $f_i/f_i^*$ approximates the ratio of functions $f(v)/f^*(v)$, and $\sum_i f_i \ldots$ gives the approximation of the integral $\int f(v) \ldots dv$. Here, in (12), the state $f^*$ is the quasiequilibrium with the same values of the macroscopic variables as $f$. Moreover, for given values of the macroscopic variables, $S_K(f)$ achieves its maximum at the point $f = f^*$ (both for continuous and for discrete distributions). The corresponding maximal value is zero. Below, $S_K$ is the discrete Kullback entropy.

If the approximate discrete quasiequilibrium $f^*$ is non-entropic, we can use $-S_K(f)$ instead of $\Delta S(f)$. For entropic quasiequilibria with perfect entropy the discrete Kullback entropy gives the same $\Delta S$: $-S_K(f) = \Delta S(f)$.

Let the discrete entropy have the standard form for an ideal (perfect) mixture [27]:

$$S(f) = -\sum_i f_i \ln\frac{f_i}{W_i}.$$

After the classical work of Zeldovich [44], this function is recognised as a useful instrument for the analysis of kinetic equations (especially in chemical kinetics [21]).

If we define $f^*$ as the conditional entropy maximum for given $M_j = \sum_k m_{jk} f_k$, then

$$\ln f_k^* = \sum_j \mu_j m_{jk},$$

where $\mu_j(M)$ are the Lagrange multipliers (or “potentials”). For this entropy and conditional equilibrium we find

$$\Delta S = S(f^*) - S(f) = \sum_i f_i \ln\frac{f_i}{f_i^*}, \qquad (13)$$

if $f$ and $f^*$ have the same moments, $m(f) = m(f^*)$. The right hand side of (13) is $-S_K(f)$.

In thermodynamics, the Kullback entropy belongs to the family of Massieu–Planck–Kramers functions (canonical or grandcanonical potentials). There is another sense of this quantity: $S_K$ is the relative entropy of $f$ with respect to $f^*$ [18,35].
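A short numerical sketch of the discrete quantity $-S_K(f)$ at a single site may help to fix ideas. It assumes strictly positive populations and that $f$ and $f^*$ share the same total density; the function name is illustrative.

```python
import numpy as np

def kullback_nonequilibrium_entropy(f, f_eq):
    """Return -S_K(f) = sum_i f_i * ln(f_i / f_eq_i) for one lattice site.

    Assumes all entries of f and f_eq are strictly positive.
    """
    f = np.asarray(f, dtype=float)
    f_eq = np.asarray(f_eq, dtype=float)
    return float(np.sum(f * np.log(f / f_eq)))
```

With equal total density the value is nonnegative and vanishes only at $f = f^*$, which is what allows it to play the role of $\Delta S$ in the limiters above when the quasiequilibrium itself is non-entropic.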
In quadratic approximation,

$$-S_K(f) = \sum_i f_i \ln\frac{f_i}{f_i^*} \approx \frac{1}{2}\sum_i \frac{(f_i - f_i^*)^2}{f_i^*}.$$

3.7 ELBM collisions as a smooth limiter

On the basis of numerical tests, the authors of [41] claim that the positivity rule provides the same results (in the sense of stability and absence/presence of spurious oscillations) as the ELBM models, but ELBM provides better accuracy.

For the formal definition of ELBM (4), our tests do not support claims that ELBM erases spurious oscillations (see below). A similar observation for the Burgers equation was previously published in [4]. We understand this situation in the following way. The entropic method consists of at least three components:

(1) entropic quasiequilibrium, defined by entropy maximisation;
(2) entropy balanced collisions (4) that have to provide proper entropy balance;
(3) a method for the solution of the transcendental equation $S(f) = S(\tilde{f})$ to find $\alpha = \alpha(f)$ in (4).

It appears that the first two items do not affect spurious oscillations at all, if we solve the equation for $\alpha(f)$ with high accuracy. Additional viscosity is, potentially, added by explicit analytic formulas for $\alpha(f)$. In order not to decrease entropy, errors in these formulas always increase dissipation. This can be interpreted as a hidden transformation of the form (7), where the coefficients in $\phi$ depend also on $f^*$.

3.8 Monotonic and double monotonic limiters

Two monotonicity properties are important in the theory of nonequilibrium entropy limiters:

(1) a limiter should move the distribution towards equilibrium: in all cases of (1), $0 \le \phi \le 1$. This is the dissipativity condition, which means that limiters never produce negative entropy.
(2) a limiter should not change the order of states on the line: if for two distributions with the same moments, $f$ and $f'$, $\Delta S(f) > \Delta S(f')$ before the limiter transformation, then the same inequality should hold after the limiter transformation too.
For example, for the limiter (7) it means that $\Delta S(f^* + x\phi(\Delta S(f^* + x(f - f^*)))(f - f^*))$ is a monotonically increasing function of $x > 0$. In quadratic approximation,

$$\Delta S(f^* + x(f - f^*)) = x^2 \Delta S(f),$$
$$\Delta S(f^* + x\phi(\Delta S(f^* + x(f - f^*)))(f - f^*)) = x^2 \phi^2(x^2 \Delta S(f))\, \Delta S(f),$$

and the second monotonicity condition transforms into the following requirement: $y\phi(y^2 s)$ is a monotonically increasing (not decreasing) function of $y > 0$ for any $s > 0$.

If a limiter satisfies both monotonicity conditions, we call it “double monotonic”. For example, Ehrenfests’ regularisation satisfies the first monotonicity condition, but obviously violates the second one. The limiter (8) violates the first condition for small $\Delta S$, but is dissipative and satisfies the second one in quadratic approximation for large $\Delta S$. The limiter with $\phi = 1/(1+\alpha\Delta S^k)$ always satisfies the first monotonicity condition, violates the second if $k > 1/2$, and is double monotonic (in quadratic approximation for the second condition) if $0 < k \le 1/2$. The threshold limiters (11) are also double monotonic. Of course, it is not forbidden to use any type of limiter under local and global control of dissipation, but double monotonic limiters provide some natural properties automatically, without additional care.

4 Numerical experiment

To conclude this paper we report some numerical experiments conducted to demonstrate the performance of some of the proposed nonequilibrium entropy limiters for LBM from Sec. 3.

4.1 Velocities and quasiequilibria

We will perform simulations using both entropic and non-entropic quasiequilibria, but we always work with an athermal LBM model. Whenever we use non-entropic quasiequilibria we employ the Kullback entropy (13).

In 1D, we use a lattice with spacing and time step $\delta t = 1$ and a discrete velocity set $\{v_1, v_2, v_3\} := \{0, -1, 1\}$, so that the model consists of static, left- and right-moving populations only.
The subscript $i$ denotes the population (not the lattice site number), and $f_1$, $f_2$ and $f_3$ denote the static, left- and right-moving populations, respectively. The entropy is $S = -H$, with

$$H = f_1 \log(f_1/4) + f_2 \log(f_2) + f_3 \log(f_3)$$

(see, e.g., [27]) and, for this entropy, the local entropic quasiequilibrium state $f^*$ is available explicitly:

$$f_1^* = \frac{2\rho}{3}\left(2 - \sqrt{1 + 3u^2}\right),$$
$$f_2^* = \frac{\rho}{6}\left((3u - 1) + 2\sqrt{1 + 3u^2}\right), \qquad (14)$$
$$f_3^* = -\frac{\rho}{6}\left((3u + 1) - 2\sqrt{1 + 3u^2}\right),$$

where

$$\rho := \sum_i f_i, \qquad u := \frac{1}{\rho}\sum_i v_i f_i. \qquad (15)$$

The standard non-entropic polynomial quasiequilibria [38] are:

$$f_1^* = \frac{2\rho}{3}\left(1 - \frac{3u^2}{2}\right), \qquad f_2^* = \frac{\rho}{6}(1 - 3u + 3u^2), \qquad f_3^* = \frac{\rho}{6}(1 + 3u + 3u^2). \qquad (16)$$

In 2D, we employ a uniform 9-speed square lattice with discrete velocities $\{v_i \mid i = 0, 1, \ldots, 8\}$: $v_0 = 0$, $v_i = (\cos((i-1)\pi/2), \sin((i-1)\pi/2))$ for $i = 1, 2, 3, 4$, and $v_i = \sqrt{2}\,(\cos((i-5)\pi/2 + \pi/4), \sin((i-5)\pi/2 + \pi/4))$ for $i = 5, 6, 7, 8$. The populations $f_0, f_1, \ldots, f_8$ are the static, east-, north-, west-, south-, northeast-, northwest-, southwest- and southeast-moving populations, respectively. As usual, the entropic quasiequilibrium state, $f^*$, can be uniquely determined by maximising an entropy functional

$$S(f) = -\sum_i f_i \log\frac{f_i}{W_i}$$

subject to the constraints of conservation of mass and momentum [2]:

$$f_i^* = \rho W_i \prod_{j=1}^{2} \left(2 - \sqrt{1 + 3u_j^2}\right) \left(\frac{2u_j + \sqrt{1 + 3u_j^2}}{1 - u_j}\right)^{v_{i,j}}. \qquad (17)$$

Here, the lattice weights, $W_i$, are given lattice-specific constants: $W_0 = 4/9$, $W_{1,2,3,4} = 1/9$ and $W_{5,6,7,8} = 1/36$. Analogously to (15), the macroscopic variables $\rho$ and $u = (u_1, u_2)$ are the zeroth and first moments of the distribution $f$, respectively. The standard non-entropic polynomial quasiequilibria [38] are:

$$f_i^* = \rho W_i \left(1 + 3 v_i \cdot u + \frac{9 (v_i \cdot u)^2}{2} - \frac{3u^2}{2}\right). \qquad (18)$$

4.2 LBGK and ELBM

The governing equations for LBGK are

$$f_i(x + v_i, t + 1) = f_i^*(x, t) + (2\beta - 1)(f_i^*(x, t) - f_i(x, t)), \qquad (19)$$

where $\beta = 1/(2\nu + 1)$.

For ELBM (4) the governing equations are:

$$f_i(x + v_i, t + 1) = (1 - \beta) f_i(x, t) + \beta \tilde{f}_i(x, t), \qquad (20)$$

with $\beta$ as above and $\tilde{f} = (1 - \alpha) f + \alpha f^*$. The parameter $\alpha$ is chosen to satisfy a constant entropy condition. This involves finding the nontrivial root of the equation

$$S((1 - \alpha) f + \alpha f^*) = S(f). \qquad (21)$$

To solve (21) numerically we employ a robust routine based on bisection. The root is found to an accuracy of $10^{-15}$ and we always ensure that the returned value of $\alpha$ does not lead to a numerical entropy decrease. We stipulate that if, at some site, no nontrivial root of (21) exists, we will employ the positivity rule instead (Fig. 1).

4.3 Shock tube

The 1D shock tube for a compressible athermal fluid is a standard benchmark test for hydrodynamic codes. Our computational domain will be the interval $[0, 1]$ and we discretize this interval with 801 uniformly spaced lattice sites. We choose the initial density ratio as 1:2, so that for $x \le 400$ we set $\rho = 1.0$, else we set $\rho = 0.5$. We will fix the kinematic viscosity of the fluid at $\nu = 10^{-9}$.

4.3.1 Comparison of LBGK and ELBM

In Fig. 2 we compare the shock tube density profile obtained with LBGK (using entropic quasiequilibria (14)) and ELBM. On the same panel we also display both the total entropy $S(t) := \sum_x S(x, t)$ and total nonequilibrium entropy $\Delta S(t) := \sum_x \Delta S(x, t)$ time histories. As expected, by construction, we observe that total entropy is (effectively) constant for ELBM. On the other hand, LBGK behaves non-entropically for this problem. In both cases we observe that nonequilibrium entropy grows with time.

As we can see, the choice between the two collision formulas, LBGK (19) or ELBM (20), does not affect spurious oscillations, and the reported regularisation [29] is, perhaps, the result of an approximate analytical solution of equation (21). Inaccuracy in the solution of (21) can be interpreted as a hidden nonequilibrium entropy limiter. But it should be mentioned that the entropic method consists not only of the collision formula but, importantly, includes a special choice of quasiequilibrium that could improve stability (see, e.g., [13]).
Indeed, when we compare ELBM with LBGK using either entropic or standard polynomial quasiequilibria, there appears to be some gain in employing entropic quasiequilibria (Fig. 3). We observe that the post-shock region for the LBGK simulations is more oscillatory when polynomial quasiequilibria are used. In Fig. 3 we have also included a panel with the simulation resulting from a much higher viscosity ($\nu = 3.3333 \times 10^{-2}$). Here, we observe no appreciable differences in the results of LBGK and ELBM.

Fig. 2. Density profile of the 1:2 athermal shock tube simulation with $\nu = 10^{-9}$ after 400 time steps using (a) LBGK (19); (b) ELBM (20). In this example, no negative populations are produced by any of the methods, so the positivity rule is redundant. For ELBM in this example, (21) always has a nontrivial root. Total entropy and nonequilibrium entropy time histories are shown in panels (c), (d) and (e), (f) for LBGK and ELBM, respectively.

Fig. 3. Density and velocity profile of the 1:2 isothermal shock tube simulation after 400 time steps using (a) LBGK (19) with polynomial quasiequilibria (16) [$\nu = 3.3333 \times 10^{-2}$]; (b) LBGK (19) with entropic quasiequilibria (14) [$\nu = 3.3333 \times 10^{-2}$]; (c) ELBM (20) [$\nu = 3.3333 \times 10^{-2}$]; (d) LBGK (19) with polynomial quasiequilibria (16) [$\nu = 10^{-9}$]; (e) LBGK (19) with entropic quasiequilibria (14) [$\nu = 10^{-9}$]; (f) ELBM (20) [$\nu = 10^{-9}$].

4.3.2 Nonequilibrium entropy limiters

Now, we would like to demonstrate just a representative sample of the many possibilities of limiters suggested in Sec. 3. In each case the limiter is implemented by a post-processing routine immediately following the collision step (either LBGK (19) or ELBM (20)). Here, we will only consider LBGK collisions and entropic quasiequilibria (14).
The post-processing step adjusts $f$ by the update formula:

$$f \mapsto f^* + \phi(\Delta S)(f - f^*),$$

where $\Delta S$ is defined by (6) and $\phi$ is a limiter function.

For the Ehrenfests’ regularisation one would choose

$$\phi(\Delta S)(x) = \begin{cases} 1, & \Delta S(x) \le \delta, \\ 0, & \text{otherwise}, \end{cases}$$

where $\delta$ is a pre-specified threshold value. Furthermore, it is pertinent to select just the $k$ sites with the highest $\Delta S > \delta$. This limiter has been previously applied to the shock tube problem in [6,7,8] and we will not reproduce those results here.

Instead, our first example will be the following smooth limiter:

$$\phi(\Delta S) = \frac{1}{1 + \alpha \Delta S^k}. \qquad (22)$$

For this limiter, we will fix $k = 1/2$ (so that the limiter is double monotonic in quadratic approximation to entropy) and compare the density profiles for $\alpha = \delta/(E(\Delta S)^k)$, $\delta = 0.1, 0.01, 0.001$ (Fig. 4). The limiter is thus ensemble-dependent, because $\alpha$ depends on the average $E(\Delta S)$. As with Fig. 2, we accompany each panel with the total entropy and nonequilibrium entropy histories. Note the different scales for nonequilibrium entropy. Note also that entropy (necessarily) now grows due to the additional dissipation.

Our next example (Fig. 5) considers the threshold filter (10). In this example we choose the estimates $\Delta S_0 = 5E(\Delta S), 10E(\Delta S), 20E(\Delta S)$ and fix the tolerance $\delta = 0$ so that the influence of the threshold alone can be studied. Only entropic adjustments are accepted in the limiter: $\Delta S_t \le \Delta S$. As the threshold increases, nonequilibrium entropy grows faster and spurious oscillations begin to appear.

Finally, we test the median filter (Fig. 6). We choose a minimal filter so that only the nearest neighbours are considered. As with the threshold filter, we introduce a tolerance $\delta$ and we try the values $\delta = 10^{-3}, 10^{-4}, 10^{-5}$. Only entropic adjustments are accepted in the limiter: $\Delta S_{med} \le \Delta S$.

Fig. 4.
Density profile of the 1:2 athermal shock tube simulation with $\nu = 10^{-9}$ after 400 time steps using LBGK (19) and the smooth limiter (22) with $k = 1/2$, $\alpha = \delta/(E(\Delta S)^k)$ and (a) $\delta = 0.1$; (b) $\delta = 0.01$ and (c) $\delta = 0.001$. Total entropy and nonequilibrium entropy time histories for each parameter set $\{k, \alpha(\delta)\}$ are displayed in the adjacent panels.

We have seen that each of the examples we have considered (Fig. 4, Fig. 5 and Fig. 6) is capable of subduing spurious post-shock oscillations compared with LBGK (or ELBM) on this problem (cf. Fig. 2). Of course, limiting nonequilibrium entropy necessarily results in an increase in entropy.

In our experience, the median filter is the superior choice amongst all the limiters suggested in Sec. 3. The action of the median filter is found to be both extremely gentle and, at the same time, very effective.

4.4 Lid-driven cavity

Our second numerical example is the classical 2D lid-driven cavity flow. A square cavity of side length $L$ is filled with fluid with kinematic viscosity $\nu$ (initially at rest) and driven by the cavity lid moving at a constant velocity $(u_0, 0)$ (from left to right in our geometry).

Fig. 5. Density profile of the 1:2 athermal shock tube simulation with $\nu = 10^{-9}$ after 400 time steps using LBGK (19) and the threshold limiter (10) with (a) $\Delta S_t = 5E(\Delta S)$; (b) $\Delta S_t = 10E(\Delta S)$ and (c) $\Delta S_t = 20E(\Delta S)$. Total entropy and nonequilibrium entropy time histories for each threshold $\Delta S_t$ are displayed in the adjacent panels.

We will simulate the flow on a $100 \times 100$ grid using LBGK regularised with the median filter limiter. Unless otherwise stated, we use entropic quasiequilibria (17).
The implementation of the filter is as follows: the filter is not applied to boundary nodes; for nodes which immediately neighbour the boundary, the stencil consists of the 3 nearest neighbours (including the node itself) closest to the boundary; for all other nodes the minimal stencil of 9 nearest neighbours is used.

We have purposefully selected such a coarse grid simulation because it is readily found that, on this problem, unregularised LBGK fails (blows up) for all but the most modest Reynolds numbers $Re := Lu_0/\nu$.

Fig. 6. Density profile of the 1:2 athermal shock tube simulation with $\nu = 10^{-9}$ after 400 time steps using LBGK (19) and the minimal median limiter with (a) $\delta = 10^{-5}$; (b) $\delta = 10^{-4}$ and (c) $\delta = 10^{-3}$. Total entropy and nonequilibrium entropy time histories for each tolerance $\delta$ are displayed in the adjacent panels.

4.4.1 Steady-state vortex centres

For modest Reynolds numbers the system settles to a steady state in which the dominant features are a primary central rotating vortex, with several counter-rotating secondary vortices located in the bottom-left, bottom-right (and possibly top-left) corners.

The steady state has been extensively investigated in the literature. The study of Hou et al. [24] simulates the flow over a range of Reynolds numbers using unregularised LBGK on a $256 \times 256$ grid. Primary and secondary vortex centre data is provided. We compare this same statistic for the present median-filtered coarse grid simulation.

We will employ the same convergence criterion used in [24]. Namely, we deem that steady state has been reached when the difference between the maximum values of the stream function at successive intervals of 10,000 time steps is less than $10^{-5}$. The stream function, which is not a primary variable in the LBM simulation, is obtained from the velocity data by integration using Simpson’s rule.
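The integration step can be sketched as below. This is one possible arrangement, not necessarily the one used for the reported results: it assumes the horizontal velocity is sampled on a uniform grid as `u[i, j]` at $(x_i, y_j)$, that the stream function vanishes on the wall $y = y_0$, and that $\psi$ is obtained by integrating $u$ in $y$. Composite Simpson's rule is used over an even number of intervals, and a single trapezoid closes the remaining interval when the upper limit has odd index (a simplification of this sketch).

```python
import numpy as np

def stream_function(u, dy):
    """psi[i, j] = integral of u[i, :] from y_0 to y_j on a uniform grid.

    u  : array of shape (nx, ny), horizontal velocity samples
    dy : grid spacing in y
    """
    u = np.asarray(u, dtype=float)
    ny = u.shape[1]
    psi = np.zeros_like(u)
    for j in range(1, ny):
        m = j if j % 2 == 0 else j - 1       # largest even number of intervals
        if m >= 2:
            # Composite Simpson: weights 1, 4, 2, 4, ..., 4, 1 on nodes 0..m
            s = (u[:, 0] + u[:, m]
                 + 4.0 * u[:, 1:m:2].sum(axis=1)
                 + 2.0 * u[:, 2:m:2].sum(axis=1))
            psi[:, j] = dy / 3.0 * s
        if m < j:
            # One trapezoid for the leftover interval [y_m, y_j]
            psi[:, j] += 0.5 * dy * (u[:, m] + u[:, j])
    return psi
```

A candidate for the primary vortex centre is then the grid location of the global extremum, for instance `np.unravel_index(np.argmax(np.abs(psi)), psi.shape)`.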
Vortex centres are characterised as local extrema of the stream function.

We compare our results with the LBGK simulations in [24] and [41]. To align ourselves with these studies we specify the following boundary conditions: the lid profile is constant; the remaining cavity walls are subject to the “bounce-back” condition [38]. In our simulations, the initial uniform fluid density profile is $\rho = 2.7$ and the velocity of the lid is $u_0 = 1/10$ (in lattice units).

Collected in Table 1, for $Re = 2000$, 5000 and 7500, are the coordinates of the primary and secondary vortex centres using (a) unregularised LBGK; (b) LBGK with the median filter limiter ($\delta = 10^{-3}$); (c) LBGK with the median filter limiter ($\delta = 10^{-4}$), all with non-entropic polynomial quasiequilibria (18). Lines (d), (e) and (f) are the same but with entropic quasiequilibria (17). The remaining lines of Table 1 are as follows: (g) literature data [24] (unregularised LBGK on a $256 \times 256$ grid); (h) literature data [41] (positivity rule); (i) literature data [41] (ELBM). With the exception of (g), all simulations are conducted on a $100 \times 100$ grid.

The top-left vortex does not appear at $Re = 2000$ and no data was provided for it in [41] at $Re = 5000$. The unregularised LBGK simulation at $Re = 7500$ blows up in finite time and the simulation becomes meaningless. The y-coordinates of the two lower vortices at $Re = 5000$ in (h) appear anomalously small and were not reproduced by our experiments with the positivity rule (not shown).

We have conducted two runs of the experiment, with the median filter parameter $\delta = 10^{-3}$ and $\delta = 10^{-4}$. Despite the increased number of realisations, the vortex centre locations remain effectively unchanged and we detect no significant variation between the two runs. This demonstrates the gentle nature of the median filter. At $Re = 2000$ the median filter has no effect at all on the vortex centres compared with LBGK.
We find no significant differences between the experiments with entropic and non-entropic polynomial quasiequilibria in this test.

The coordinates of the primary vortex centre for unregularised LBGK at $Re = 5000$ are already quite inaccurate as LBGK begins to lose stability. Stability is lost entirely at some critical Reynolds number $5000 < Re \le 7500$ and the simulation blows up.

Furthermore, we have agreement (within grid resolution) with the data given in [24]. Also compiled in Table 1 is the data from the limiter experiments conducted in [41] (although not explicitly discussed in the language of limiters by the authors of that work). In [41] the authors give vortex centre data for the positivity rule (Fig. 1) and for ELBM (which we interpret as containing a hidden limiter). In [41] the positivity rule is called FIX-UP.

As the Reynolds number increases, the flow in the cavity is no longer steady and a more complicated flow pattern emerges. On the way to a fully developed turbulent flow, the lid-driven cavity flow is known to undergo a series of period doubling Hopf bifurcations. On our coarse grid, we observe that the coordinates of the primary vortex centre (maximum of the stream function) are a very robust feature of the flow, with little change between coordinates (no change in y-coordinates) computed at $Re = 5000$ and $Re = 7500$ with the median filter. On one hand, because of this observation it becomes inconclusive whether the median limiter is adding too much additional dissipation. On the other hand, a more studious choice of control criteria may indicate that the first bifurcation has already occurred by $Re = 7500$.

Table 1
Primary and secondary vortex centre coordinates for the lid-driven cavity flow at Re = 2000, 5000, 7500. A dash marks a simulation that blows up.

              Primary          Lower-left       Lower-right      Top-left
Re    Run     x       y        x       y        x       y        x       y
2000  (a)     0.5253  0.5455   0.0909  0.1010   0.8384  0.1010   Not applicable
2000  (b)     0.5253  0.5455   0.0909  0.1010   0.8384  0.1010   Not applicable
2000  (c)     0.5253  0.5455   0.0909  0.1010   0.8384  0.1010   Not applicable
2000  (d)     0.5253  0.5455   0.0909  0.1010   0.8384  0.1010   Not applicable
2000  (e)     0.5253  0.5455   0.0909  0.1010   0.8384  0.1010   Not applicable
2000  (f)     0.5253  0.5455   0.0909  0.1010   0.8384  0.1010   Not applicable
2000  (g)     0.5255  0.5490   0.0902  0.1059   0.8471  0.0980   Not applicable
2000  (h)     0.5200  0.5450   0.0900  0.1000   0.8300  0.0950   Not applicable
2000  (i)     0.5200  0.5500   0.0890  0.1000   0.8300  0.1000   Not applicable
5000  (a)     0.5152  0.6061   0.0808  0.1313   0.7980  0.0707   0.0505  0.8990
5000  (b)     0.5152  0.5354   0.0808  0.1313   0.8081  0.0808   0.0606  0.8990
5000  (c)     0.5152  0.5354   0.0808  0.1313   0.8081  0.0808   0.0707  0.8889
5000  (d)     0.5152  0.5960   0.0808  0.1313   0.8081  0.0808   0.0505  0.8990
5000  (e)     0.5152  0.5354   0.0808  0.1313   0.8081  0.0808   0.0606  0.8990
5000  (f)     0.5152  0.5354   0.0808  0.1313   0.8081  0.0808   0.0707  0.8889
5000  (g)     0.5176  0.5373   0.0784  0.1373   0.8078  0.0745   0.0667  0.9059
5000  (h)     0.5150  0.5680   0.0950  0.0100   0.8450  0.0100   Not available
5000  (i)     0.5150  0.5400   0.0780  0.1350   0.8050  0.0750   Not available
7500  (a)     -       -        -       -        -       -        -       -
7500  (b)     0.5051  0.5354   0.0707  0.1515   0.7879  0.0707   0.0606  0.8990
7500  (c)     0.5051  0.5354   0.0707  0.1515   0.7879  0.0707   0.0707  0.8889
7500  (d)     -       -        -       -        -       -        -       -
7500  (e)     0.5051  0.5354   0.0707  0.1515   0.7879  0.0707   0.0606  0.8990
7500  (f)     0.5051  0.5354   0.0707  0.1515   0.7879  0.0707   0.0707  0.8889
7500  (g)     0.5176  0.5333   0.0706  0.1529   0.7922  0.0667   0.0706  0.9098
4.4.2 First Hopf bifurcation

A survey of the available literature reveals that the precise value of $Re$ at which the first Hopf bifurcation occurs is somewhat contentious, with most current studies (all of which are for incompressible flow) giving values in the range $Re = 7400$–$8500$ [9,32,33]. Here, we do not intend to give a precise value because it is a well observed grid effect that the critical Reynolds number increases (shifts to the right) with refinement (see, e.g., Fig. 3 in [33]). Rather, we will be content to localise the first bifurcation and, in doing so, demonstrate that limiters are capable of regularising without affecting fundamental flow features.

To localise the first bifurcation we take the following algorithmic approach. Entropic quasiequilibria are in use. The initial uniform fluid density profile is $\rho = 1.0$ and the velocity of the lid is $u_0 = 1/10$ (in lattice units). We record the unsteady velocity data at a single control point with coordinates $(L/16, 13L/16)$ and run the simulation for 5000 non-dimensional time units ($5000L/u_0$ time steps). Let us denote the final 1% of this signal by $(u_{sig}, v_{sig})$. We then compute the energy $E_u$ ($\ell_2$-norm normalised by non-dimensional signal duration) of the deviation of $u_{sig}$ from its mean:

$$E_u := \left\| \sqrt{\frac{L}{u_0 |u_{sig}|}}\,\left(u_{sig} - \overline{u_{sig}}\right) \right\|_{\ell_2}, \qquad (23)$$

where $|u_{sig}|$ and $\overline{u_{sig}}$ denote the length and mean of $u_{sig}$, respectively. We choose this robust statistic instead of attempting to measure signal amplitude because of numerical noise in the LBM simulation. The source of noise in LBM is attributed to the existence of an inherently unavoidable neutral stability direction in the numerical scheme (see, e.g., [8]).

We opt not to employ the “bounce-back” boundary condition used in the previous steady state study. Instead we will use the diffusive Maxwell boundary condition (see, e.g., [11]), which was first applied to LBM in [1].
The essence of the condition is that populations reaching a boundary are reflected, proportional to equilibrium, such that mass-balance (in the bulk) and detail-balance are achieved. The boundary condition coincides with “bounce-back” in each corner of the cavity.

To illustrate, immediately following the advection of populations, consider the situation of a wall, aligned with the lattice, moving with velocity $u_{wall}$ and with outward pointing normal to the wall in the negative y-direction (this is the situation on the lid of the cavity with $u_{wall} = u_0$). The implementation of the diffusive Maxwell boundary condition at a boundary site $(x, y)$ on this wall consists of the update

$$f_i(x, y, t + 1) = \gamma f_i^*(u_{wall}), \qquad i = 4, 7, 8,$$

with

$$\gamma = \frac{f_2(x, y, t) + f_5(x, y, t) + f_6(x, y, t)}{f_4^*(u_{wall}) + f_7^*(u_{wall}) + f_8^*(u_{wall})}.$$

Observe that, because density is a linear factor of the quasiequilibria (17), the density of the wall is inconsequential in the boundary condition and can therefore be taken as unity for convenience. As is usual, only those populations pointing into the fluid at a boundary site are updated. Boundary sites do not undergo the collisional step that the bulk of the sites are subjected to.

We prefer the diffusive boundary condition over the often preferred “bounce-back” boundary condition with constant lid profile. This is because we have experienced difficulty in separating the aforementioned numerical noise from the genuine signal at a single control point using “bounce-back”. We remark that the diffusive boundary condition does not prevent unregularised LBGK from failing at some critical Reynolds number $Re > 5000$.

Now, we conduct an experiment and record (23) over a range of Reynolds numbers. In each case the median filter limiter is employed with parameter $\delta = 10^{-3}$. Since the transition between steady and periodic flow in the lid-driven cavity is known to belong to the class of standard Hopf bifurcations, we are assured that $E_u^2 \propto Re$ [16].
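The two computational ingredients of this procedure, the energy statistic and a straight-line extrapolation of $E_u^2$ against $Re$ to its zero crossing, can be sketched as follows. The sketch assumes that (23) normalises the sum of squared deviations by the non-dimensional duration $u_0 |u_{sig}|/L$, and that only post-onset data points are passed to the fit; both function names are illustrative.

```python
import numpy as np

def signal_energy(u_sig, L, u0):
    """Energy of the deviation of a velocity signal from its mean.

    Assumed form: sqrt( L / (u0 * n) * sum((u - mean(u))**2) ),
    where n is the number of samples in the signal.
    """
    u_sig = np.asarray(u_sig, dtype=float)
    dev = u_sig - u_sig.mean()
    return float(np.sqrt(L / (u0 * u_sig.size) * np.sum(dev**2)))

def critical_reynolds(re_values, energy_squared):
    """Zero crossing of the least-squares line through (Re, E_u^2)."""
    slope, intercept = np.polyfit(re_values, energy_squared, 1)
    return float(-intercept / slope)
```

For a pure sinusoid of amplitude $A$ sampled over whole periods, `signal_energy` returns $A\sqrt{L/(2u_0)}$, so the statistic grows linearly with oscillation amplitude while being insensitive to the signal mean.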
A line of best fit through the resulting data localises the first bifurcation in the lid-driven cavity flow to Re = 7135 (Fig. 7). This value is within the tolerance of Re = 7402 ± 4% given in [33] for a 100 × 100 grid. We also provide a (time-averaged) phase space trajectory and Fourier spectrum for Re = 7375 at the monitoring point (Fig. 8 and Fig. 9), which clearly indicate that the first bifurcation has been observed.

Fig. 7. Plot of energy squared, E_u^2 (23), as a function of Reynolds number, Re, using LBGK regularised with the median filter limiter with δ = 10^{-3} on a 100 × 100 grid. Straight lines are lines of best fit. The intersection of the sloping line with the x-axis occurs close to Re = 7135.

5 Conclusions

Entropy and thermodynamics are important for the stability of lattice Boltzmann methods. This is now clear: almost 10 years of work since the publication of [26] have proved this statement (the main reviews are [5,28,39]). The question now is: “how does one utilise, optimally, entropy and thermodynamic structures in lattice Boltzmann methods?”. In this paper we attempt to propose a solution (a temporary one, at least). Our approach is applicable to both entropic and non-entropic polynomial quasiequilibria.

We have constructed a system of nonequilibrium entropy limiters for the lattice Boltzmann methods (LBM):

• the positivity rule that provides positivity of distribution;
• the pointwise entropy limiters based on selection and correction of most nonequilibrium values;
• filters of nonequilibrium entropy, and the median filter as a filter of choice.

All these limiters exploit physical properties of LBM and allow control of total additional entropy production. In general, they do the same work for LBM as flux limiters do for finite difference, finite volume and finite element methods, and come into operation when sharp gradients are present.
For smoothly changing waves, the limiters do not operate and the spatial derivatives can be represented by higher order approximations without introducing non-physical oscillations. But there are some differences too: for LBM the main idea behind the construction of nonequilibrium entropy limiter schemes is to limit a scalar quantity (the nonequilibrium entropy) or to delete the “salt and pepper” noise from the field of this quantity. We do not touch the vectors or tensors of spatial derivatives, as is done for flux limiters.

Standard test examples demonstrate that the developed limiters erase spurious oscillations without blurring of shocks, and do not affect smooth solutions. The limiters we have tested do not produce a noticeable additional dissipation and allow us to reproduce the first Hopf bifurcation for the 2D lid-driven cavity on a coarse 100 × 100 grid. At the same time the simplest median filter deletes the spurious post-shock oscillations for low viscosity. Perhaps it is impossible to find one best nonequilibrium entropy limiter for all problems. It is a special task to construct the optimal limiters for specific classes of problems.

Fig. 8. Velocity components as a function of time for the signal (usig, vsig) at the monitoring point (L/16, 13L/16) using LBGK regularised with the median filter limiter with δ = 10^{-3} on a 100 × 100 grid (Re = 7375). Dots represent simulation results and the solid line is a 100 step time average of the signal.

Acknowledgments

Discussion of the preliminary version of this work with S. Succi and participants of the lattice Boltzmann workshop held on 15th November 2006 in Leicester (UK) was very important. Author A. N. Gorban is grateful to S. K. Godunov for the course of numerical methods given many years ago at Novosibirsk University. This work is supported by Engineering and Physical Sciences Research Council (EPSRC) grant number GR/S95572/01.

Fig. 9.
Relative amplitude spectrum for the signal usig at the monitoring point (L/16, 13L/16) using LBGK regularised with the median filter limiter with δ = 10^{-3} on a 100 × 100 grid (Re = 7375). We measure a dominant frequency of ω = 0.525.

References

[1] S. Ansumali and I. V. Karlin. Kinetic boundary conditions in the lattice Boltzmann method. Phys. Rev. E, 66:026311, 2002.
[2] S. Ansumali, I. V. Karlin, and H. C. Öttinger. Minimal entropic kinetic models for hydrodynamics. Europhys. Lett., 63(6):798–804, 2003.
[3] R. Benzi, S. Succi, and M. Vergassola. The lattice Boltzmann-equation - theory and applications. Physics Reports, 222(3):145–197, 1992.
[4] B. M. Boghosian, P. J. Love, and J. Yepez. Entropic lattice Boltzmann model for Burgers equation. Phil. Trans. Roy. Soc. A, 362:1691–1702, 2004.
[5] B. M. Boghosian, J. Yepez, P. V. Coveney, and A. J. Wager. Entropic lattice Boltzmann methods. R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci., 457(2007):717–766, 2001.
[6] R. A. Brownlee, A. N. Gorban, and J. Levesley. Stabilisation of the lattice-Boltzmann method using the Ehrenfests’ coarse-graining. cond-mat/0605359, 2006.
[7] R. A. Brownlee, A. N. Gorban, and J. Levesley. Stabilisation of the lattice-Boltzmann method using the Ehrenfests’ coarse-graining. Phys. Rev. E, 74:037703, 2006.
[8] R. A. Brownlee, A. N. Gorban, and J. Levesley. Stability and stabilization of the lattice Boltzmann method. Phys. Rev. E, to appear. cond-mat/0611444, 2006.
[9] C.-H. Bruneau and M. Saad. The 2D lid-driven cavity problem revisited. Comput. Fluids, 35:326–348, 2006.
[10] S. R. Chakravarthy and S. Osher. High resolution applications of the Osher upwind scheme for the Euler equations. AIAA Paper 83-1943, Proc. AIAA 6th Computational Fluid Dynamics Conference, pages 363–373, 1983.
[11] C. Cercignani. Theory and Application of the Boltzmann Equation. Scottish Academic Press, Edinburgh, 1975.
[12] S. Chen and G. D. Doolen.
Lattice Boltzmann method for fluid flows. Annu. Rev. Fluid. Mech., 30:329–364, 1998. [13] S. S. Chikatamarla and I. V. Karlin. Entropy and Galilean Invariance of Lattice Boltzmann Theories. Phys. Rev. Lett. 97, 190601 (2006) [14] A. J. Chorin, O. H. Hald, R. Kupferman. Optimal prediction with memory, Physica D 166 (2002), 239–257. [15] P. Ehrenfest and T. Ehrenfest. The conceptual foundations of the statistical approach in mechanics. Dover Publications Inc., New York, 1990. [16] N. K. Ghaddar, K. Z. Korczak, B. B. Mikic, and A. T. Patera. Numerical investigation of incompressible flow in grooved channels. Part 1. Stability and self-sustained oscillations. J. Fluid Mech., 163:99–127, 1986. [17] S. K. Godunov. A Difference Scheme for Numerical Solution of Discontinuous Solution of Hydrodynamic Equations, Math. Sbornik, 47 (1959), 271-306. [18] A. N. Gorban. Equilibrium encircling. Equations of chemical kinetics and their thermodynamic analysis, Nauka, Novosibirsk, 1984. [19] A. N. Gorban, I. V. Karlin, H. C. Öttinger, and L. L. Tatarinova. Ehrenfest’s argument extended to a formalism of nonequilibrium thermodynamics. Phys. Rev. E, 62:066124, 2001. [20] A. N. Gorban. Basic types of coarse-graining. In A. N. Gorban, N. Kazantzis, I. G. Kevrekidis, H.-C. Öttinger, and C. Theodoropoulos, editors, Model Reduction and Coarse-Graining Approaches for Multiscale Phenomena, pages 117–176. Springer, Berlin-Heidelberg-New York, 2006. cond-mat/0602024. [21] A. Gorban, B. Kaganovich, S. Filippov, A. Keiko, V. Shamansky, I. Shirkalin, Thermodynamic Equilibria and Extrema: Analysis of Attainability Regions and Partial Equilibrium, Springer, Berlin, Heidelberg, New York, 2006. [22] H. Grad. On the kinetic theory of rarefied gases, Comm. Pure and Appl. Math. 2 4, (1949), 331–407. [23] F. Higuera, S. Succi, and R. Benzi. Lattice gas – dynamics with enhanced collisions. Europhys. Lett., 9:345–349, 1989. http://arxiv.org/abs/cond-mat/0611444 http://arxiv.org/abs/cond-mat/0602024 [24] S. 
Hou, Q. Zou, S. Chen, G. Doolen and A. C. Cogley. Simulation of cavity flow by the lattice Boltzmann method. J. Comp. Phys., 118:329–347, 1995. [25] D. Jou, J. Casas-Vázquez, G. Lebon. Extended irreversible thermodynamics, Springer, Berlin, 1993. [26] I. V. Karlin, A. N. Gorban, S. Succi, and V. Boffi. Maximum entropy principle for lattice kinetic equations. Phys. Rev. Lett., 81:6–9, 1998. [27] I. V. Karlin, A. Ferrante, and H. C. Öttinger. Perfect entropy functions of the lattice Boltzmann method. Europhys. Lett., 47:182–188, 1999. [28] I. V. Karlin, S. Ansumali, C. E. Frouzakis, and S. S. Chikatamarla. Elements of the lattice Boltzmann method I: Linear advection equation. Commun. Comput. Phys., 1 (2006), 616–655. [29] I. V. Karlin, S. S. Chikatamarla and S. Ansumali. Elements of the lattice Boltzmann method II: Kinetics and hydrodynamics in one dimension. Commun. Comput. Phys., 2 (2007), 196–238. [30] S. Kullback. Information theory and statistics, Wiley, New York, 1959. [31] Y. Li, R. Shock, R. Zhang, and H. Chen. Numerical study of flow past an impulsively started cylinder by the lattice-Boltzmann method. J. Fluid Mech., 519:273–300, 2004. [32] T. W. Pan, and R. Glowinksi. A projection/wave-like equation method for the numerical simulation of incompressible viscous fluid flow modeled by the Navier–Stokes equations. Comp. Fluid Dyn. J., 9:28–42, 2000. [33] Y.-F. Peng, Y.-H. Shiau, and R. R. Hwang. Transition in a 2-D lid-driven cavity flow. Comput. Fluids, 32:337–352, 2003. [34] W. K. Pratt. Digital Image Processing, Wiley, New York, 1978. [35] H. Qian. Relative entropy: free energy associated with equilibrium fluctuations and nonequilibrium deviations, Phys. Rev. E. 63 (2001), 042103. [36] P. L. Roe. Characteristic-based schemes for the Euler equations, Ann. Rev. Fluid Mech., 18 (1986), 337-365. [37] X. Shan, X-F. Yuan, and H. Chen. Kinetic theory representation of hydrodynamics: a way beyond the NavierStokes equation. J. Fluid Mech. 550 (2006), 413-441. 
[38] S. Succi. The lattice Boltzmann equation for fluid dynamics and beyond. Oxford University Press, New York, 2001.
[39] S. Succi, I. V. Karlin, and H. Chen. Role of the H theorem in lattice Boltzmann hydrodynamic simulations. Rev. Mod. Phys., 74:1203–1220, 2002.
[40] P. K. Sweby. High resolution schemes using flux-limiters for hyperbolic conservation laws. SIAM J. Num. Anal., 21 (1984), 995–1011.
[41] F. Tosi, S. Ubertini, S. Succi, H. Chen, and I. V. Karlin. Numerical stability of entropic versus positivity-enforcing lattice Boltzmann schemes. Math. Comput. Simulation, 72:227–231, 2006.
[42] B. Van Leer. Towards the ultimate conservative difference scheme III. Upstream-centered finite-difference schemes for ideal compressible flow. J. Comp. Phys., 23 (1977), 263–275.
[43] P. Wesseling. Principles of Computational Fluid Dynamics, Springer Series in Computational Mathematics (Springer-Verlag, Berlin, 2001), Vol. 29.
[44] Y. B. Zeldovich. Proof of the Uniqueness of the Solution of the Equations of the Law of Mass Action. In: Selected Works of Yakov Borisovich Zeldovich, Vol. 1, J. P. Ostriker (Ed.), Princeton University Press, Princeton, USA, 1996, 144–148.

Contents: Introduction; Background; Nonequilibrium entropy limiters for LBM; Positivity rule; Ehrenfests' regularisation; Smooth limiters of nonequilibrium entropy; Monitoring of total dissipation; Median entropy filter; Entropic steps for non-entropic quasiequilibria; ELBM collisions as a smooth limiter; Monotonic and double monotonic limiters; Numerical experiment; Velocities and quasiequilibria; LBGK and ELBM; Shock tube; Lid-driven cavity; Conclusions; References

ABSTRACT

We construct a system of nonequilibrium entropy limiters for the lattice Boltzmann methods (LBM). These limiters erase spurious oscillations without blurring of shocks, and do not affect smooth solutions.
In general, they do the same work for LBM as flux limiters do for finite differences, finite volumes and finite elements methods, but for LBM the main idea behind the construction of nonequilibrium entropy limiter schemes is to transform a field of a scalar quantity: the nonequilibrium entropy. There are two families of limiters: (i) based on restriction of nonequilibrium entropy (entropy "trimming") and (ii) based on filtering of nonequilibrium entropy (entropy filtering). The physical properties of LBM provide some additional benefits: the control of entropy production and accurate estimate of introduced artificial dissipation are possible. The constructed limiters are tested on classical numerical examples: 1D athermal shock tubes with an initial density ratio 1:2 and the 2D lid-driven cavity for Reynolds numbers Re between 2000 and 7500 on a coarse 100 × 100 grid. All limiter constructions are applicable for both entropic and non-entropic quasiequilibria. <|endoftext|><|startoftext|> Introduction, observational and numerical evidence makes it safe to assume that the turbulence in such a system will be anisotropic with k‖ ≪ k⊥ (at scales smaller than the outer scale, k‖L ≫ 1; see § 1.3 and § 1.5.1). Let us, therefore, introduce a small parameter ε ∼ k‖/k⊥ and carry out a systematic expansion of Eqs. (7-10) in ε. In this expansion, the fluctuations are treated as small, but not arbitrarily so: in order to estimate their size, we shall adopt the critical-balance conjecture (3), which is now treated not as a detailed scaling prescription but as an ordering assumption. This allows us to introduce the following ordering:

$$\frac{\delta\rho}{\rho_0} \sim \frac{u_\perp}{v_A} \sim \frac{u_\parallel}{v_A} \sim \frac{\delta p}{p_0} \sim \frac{\delta B_\perp}{B_0} \sim \frac{\delta B_\parallel}{B_0} \sim \epsilon, \tag{12}$$

where $v_A = B_0/\sqrt{4\pi\rho_0}$ is the Alfvén speed. Note that this means that we order the Mach number

$$M \sim \frac{u}{c_s} \sim \frac{\epsilon}{\sqrt{\beta}}, \tag{13}$$

where $c_s = (\gamma p_0/\rho_0)^{1/2}$ is the speed of sound and

$$\beta = \frac{8\pi p_0}{B_0^2} = \frac{2}{\gamma}\frac{c_s^2}{v_A^2} \tag{14}$$

is the plasma beta, which is ordered to be order unity in the ε expansion (subsidiary limits of high and low β can be taken after the ε expansion is done; see § 2.4). In Eq.
(12), we made two auxiliary ordering assump- tions: that the velocity and magnetic-field fluctuations have the character of Alfvén and slow waves (δB⊥/B0 ∼ u⊥/vA, δB‖/B0 ∼ u‖/vA) and that the relative amplitudes of the Alfvén-wave-polarized fluctuations (δB⊥/B0, u⊥/vA), slow-wave-polarized fluctuations (δB‖/B0, u‖/vA) and den- sity/pressure/entropy fluctuations (δρ/ρ0, δp/p0) are all the same order. Strictly speaking, whether this is the case depends on the energy sources that drive the turbulence: as we shall see, if no slow waves (or entropy fluctuations) are launched, none will be present. However, in astrophysical contexts, the outer-scale energy input may be assumed random and, there- fore, comparable power is injected into all types of fluctua- tions. We further assume that the characteristic frequency of the fluctuations is ω∼ k‖vA [Eq. (3)], meaning that the fast waves, for which ω ≃ k⊥(v2A + c2s )1/2, are ordered out. This restric- tion must be justified empirically. Observations of the solar- wind turbulence confirm that it is primarily Alfvénic (see, e.g., Bale et al. 2005) and that its compressive component is substantially pressure-balanced (Roberts 1990; Burlaga et al. 1990; Marsch & Tu 1993; Bavassano et al. 2004, see Eq. (22) below). A weak-turbulence calculation of compressible MHD turbulence in low-beta plasmas (Chandran 2005b) suggests that only a small amount of energy is transferred from the fast waves to Alfvén waves with large k‖. A similar conclusion emerges from numerical simulations (Cho & Lazarian 2002, 2003). As the fast waves are also expected to be subject to strong collisionless damping and/or to strong dissipation after they steepen into shocks, we eliminate them from our con- sideration of the problem and concentrate on low-frequency turbulence. 2.2. Alfvén Waves We start by observing that the Alfvén-wave-polarized fluctuations are two-dimensionally solenoidal: since, from Eq. 
(7),

$$\nabla\cdot\mathbf{u} = -\frac{1}{\rho}\frac{d\rho}{dt} = O(\epsilon^2) \tag{15}$$

and ∇·δB = 0 exactly, separating the O(ε) part of these divergences gives ∇⊥·u⊥ = 0 and ∇⊥·δB⊥ = 0. To lowest order in the ε expansion, we may, therefore, express u⊥ and δB⊥ in terms of scalar stream (flux) functions:

$$\mathbf{u}_\perp = \hat{\mathbf{z}}\times\nabla_\perp\Phi, \qquad \frac{\delta\mathbf{B}_\perp}{\sqrt{4\pi\rho_0}} = \hat{\mathbf{z}}\times\nabla_\perp\Psi. \tag{16}$$

Evolution equations for Φ and Ψ are obtained by substituting the expressions (16) into the perpendicular parts of the induction equation (10) and the momentum equation (8) (the curl of the latter is taken to annihilate the pressure term). Keeping only the terms of the lowest order, O(ε²), we get

$$\frac{\partial\Psi}{\partial t} + \{\Phi,\Psi\} = v_A\frac{\partial\Phi}{\partial z}, \tag{17}$$

$$\frac{\partial}{\partial t}\nabla_\perp^2\Phi + \left\{\Phi,\nabla_\perp^2\Phi\right\} = v_A\frac{\partial}{\partial z}\nabla_\perp^2\Psi + \left\{\Psi,\nabla_\perp^2\Psi\right\}, \tag{18}$$

where {Φ,Ψ} = ẑ·(∇⊥Φ×∇⊥Ψ) and we have taken into account that, to lowest order,

$$\frac{d}{dt} = \frac{\partial}{\partial t} + \mathbf{u}_\perp\cdot\nabla_\perp = \frac{\partial}{\partial t} + \{\Phi,\cdots\}, \tag{19}$$

$$\hat{\mathbf{b}}\cdot\nabla = \frac{\partial}{\partial z} + \frac{\delta\mathbf{B}_\perp}{B_0}\cdot\nabla_\perp = \frac{\partial}{\partial z} + \frac{1}{v_A}\{\Psi,\cdots\}. \tag{20}$$

Here b̂ = B/B0 is the unit vector along the perturbed field line.

Equations (17-18) are known as the Reduced Magnetohydrodynamics (RMHD). The first derivations of these equations (in the context of fusion plasmas) are due to Kadomtsev & Pogutse (1974) and to Strauss (1976). These were followed by many systematic derivations and generalizations employing various versions and refinements of the basic expansion, taking into account the non-Alfvénic modes (which we will do in § 2.4), and including the effects of spatial gradients of equilibrium fields (e.g., Strauss 1977; Montgomery 1982; Hazeltine 1983; Zank & Matthaeus 1992; Kinney & McWilliams 1997; Bhattacharjee et al. 1998; Kruger et al. 1998). A comparative review of these expansion schemes and their (often close) relationship to ours is outside the scope of this paper. One important point we wish to emphasize is that we do not assume the plasma beta [defined in Eq. (14)] to be either large or small.

Equations (17) and (18) form a closed set, meaning that the Alfvén-wave cascade decouples from the slow waves and density fluctuations. It is to the turbulence described by Eqs.
(17-18) that the GS theory outlined in § 1.2 applies.¹³ In § 5.3, we will show that Eqs. (17) and (18) correctly describe inertial-range Alfvénic fluctuations even in a collisionless plasma, where the full MHD description [Eqs. (7-10)] is not valid.

¹³ The Alfvén-wave turbulence in the RMHD system has been studied by many authors. Some of the relevant numerical investigations are due to Kinney & McWilliams (1998), Dmitruk et al. (2003), Oughton et al. (2004), Rappazzo et al. (2007, 2008), Perez & Boldyrev (2008, 2009). Analytical theory has mostly been confined to the weak-turbulence paradigm (Ng & Bhattacharjee 1996, 1997; Bhattacharjee & Ng 2001; Galtier et al. 2002; Lithwick & Goldreich 2003; Galtier & Chandran 2006; Nazarenko 2008). We note that adopting the critical balance [Eq. (3)] as an ordering assumption for the expansion in k‖/k⊥ does not preclude one from subsequently attempting a weak-turbulence approach: the latter should simply be treated as a subsidiary expansion. Indeed, implementing the anisotropy assumption on the level of MHD equations rather than simultaneously with the weak-turbulence closure (Galtier et al. 2000) significantly reduces the amount of algebra. One should, however, bear in mind that the weak-turbulence approximation always breaks down at some sufficiently small scale, namely when k⊥ ∼ (vA/U) L, where L is the outer scale of the turbulence, U the velocity at the outer scale, and k‖ the parallel wavenumber of the Alfvén waves (see Goldreich & Sridhar 1997 or the review by Schekochihin & Cowley 2007). Below this scale, interactions cannot be assumed weak.

2.3. Elsasser Fields

The MHD equations (7-10) in the incompressible limit (ρ = const) acquire a symmetric form if written in terms of the Elsasser fields $\mathbf{z}^\pm = \mathbf{u} \pm \delta\mathbf{B}/\sqrt{4\pi\rho}$ (Elsasser 1950). Let us demonstrate how this symmetry manifests itself in the reduced equations derived above.

We introduce Elsasser potentials ζ± = Φ ± Ψ, so that z±⊥ = ẑ×∇⊥ζ±. For these potentials, Eqs. (17-18) become

$$\frac{\partial}{\partial t}\nabla_\perp^2\zeta^\pm \mp v_A\frac{\partial}{\partial z}\nabla_\perp^2\zeta^\pm = -\frac{1}{2}\left[\left\{\zeta^+,\nabla_\perp^2\zeta^-\right\} + \left\{\zeta^-,\nabla_\perp^2\zeta^+\right\} \mp \nabla_\perp^2\left\{\zeta^+,\zeta^-\right\}\right]. \tag{21}$$

These equations show that the RMHD has a simple set of exact solutions: if ζ− = 0 or ζ+ = 0, the nonlinear term vanishes and the other, non-zero, Elsasser potential is simply a fluctuation of arbitrary shape and magnitude propagating along the mean field at the Alfvén speed vA: ζ± = f±(x, y, z ∓ vAt). These solutions are finite-amplitude Alfvén-wave packets of arbitrary shape. Only counterpropagating solutions of this kind can interact and thereby give rise to the Alfvén-wave cascade (Kraichnan 1965). Note that these interactions are conservative in the sense that the “+” and “−” waves scatter off each other without exchanging energy.

Note that the individual conservation of the “+” and “−” waves’ energies means that the energy fluxes associated with these waves need not be equal, so instead of a single Kolmogorov flux ε assumed in the scaling arguments reviewed in § 1.2, we could have ε⁺ ≠ ε⁻. The GS theory can be generalized to this case of imbalanced Alfvénic cascades (Lithwick et al. 2007; Beresnyak & Lazarian 2008a; Chandran 2008), but here we will focus on the balanced turbulence, ε⁺ ∼ ε⁻. If one considers the turbulence forced in a physical way (i.e., without forcing the magnetic field, which would break the flux conservation), the resulting cascade would always be balanced. In the real world, imbalanced Alfvénic fluxes are measured in the fast solar wind, where the influence of initial conditions in the solar atmosphere is more pronounced, while the slow-wind turbulence is approximately balanced (Marsch & Tu 1990a; see also reviews by Tu & Marsch 1995; Bruno & Carbone 2005 and references therein).

2.4. Slow Waves and the Entropy Mode

In order to derive evolution equations for the remaining MHD modes, let us first revisit the perpendicular part of the momentum equation and use Eq. (12) to order terms in it.
In the lowest order, O(ε), we get the pressure balance

$$\nabla_\perp\left(\delta p + \frac{B_0\,\delta B_\parallel}{4\pi}\right) = 0 \quad\Rightarrow\quad \frac{\delta p}{p_0} = -\gamma\frac{v_A^2}{c_s^2}\frac{\delta B_\parallel}{B_0}. \tag{22}$$

Using Eq. (22) and the entropy equation (9), we get

$$\frac{d\,\delta s}{dt} = 0, \qquad \frac{\delta s}{s_0} = \frac{\delta p}{p_0} - \gamma\frac{\delta\rho}{\rho_0} = -\gamma\left(\frac{\delta\rho}{\rho_0} + \frac{v_A^2}{c_s^2}\frac{\delta B_\parallel}{B_0}\right), \tag{23}$$

where $s_0 = p_0/\rho_0^\gamma$. Now, substituting Eq. (15) for ∇·u in the parallel component of the induction equation (10), we get

$$\frac{d}{dt}\left(\frac{\delta B_\parallel}{B_0} - \frac{\delta\rho}{\rho_0}\right) - \hat{\mathbf{b}}\cdot\nabla u_\parallel = 0. \tag{24}$$

Combining Eqs. (23) and (24), we obtain

$$\frac{d}{dt}\frac{\delta\rho}{\rho_0} = -\frac{1}{1 + c_s^2/v_A^2}\,\hat{\mathbf{b}}\cdot\nabla u_\parallel, \tag{25}$$

$$\frac{d}{dt}\frac{\delta B_\parallel}{B_0} = \frac{1}{1 + v_A^2/c_s^2}\,\hat{\mathbf{b}}\cdot\nabla u_\parallel. \tag{26}$$

Finally, we take the parallel component of the momentum equation (8) and notice that, due to the pressure balance (22) and to the smallness of the parallel gradients, the pressure term is O(ε³), while the inertial and tension terms are O(ε²). Therefore,

$$\frac{d u_\parallel}{dt} = v_A^2\,\hat{\mathbf{b}}\cdot\nabla\frac{\delta B_\parallel}{B_0}. \tag{27}$$

Equations (26-27) describe the slow-wave-polarized fluctuations, while Eq. (23) describes the zero-frequency entropy mode, which is decoupled from the slow waves.¹⁴

¹⁴ For other expansion schemes leading to reduced sets of equations for these “compressive” fluctuations see references in § 2.2. Note that the nature of the density fluctuations described above is distinct from the so called “pseudosound” density fluctuations that arise in the “nearly incompressible” MHD theories (Montgomery et al. 1987; Matthaeus & Brown 1988; Matthaeus et al. 1991; Zank & Matthaeus 1993). The “pseudosound” is essentially the density response caused by the nonlinear pressure fluctuations calculated from the incompressibility constraint. The resulting density fluctuations are second order in Mach number and, therefore, order ε² in our expansion [see Eq. (13)]. The passive density fluctuations derived in this section are order ε and, therefore, supersede the “pseudosound” (see review by Tu & Marsch 1995 for a discussion of the relevant solar-wind evidence).

The nonlinearity in Eqs. (26-27) enters via the derivatives defined in Eqs. (19-20) and is due solely to interactions with Alfvén waves.
Thus, both the slow-wave and the entropy-mode cascades occur via passive scattering/mixing by Alfvén waves, in the course of which there is no energy exchange between the cascades.

Note that in the high-beta limit, cs ≫ vA [see Eq. (14)], the entropy mode is dominated by density fluctuations [Eq. (23), cs ≫ vA], which also decouple from the slow-wave cascade [Eq. (25), cs ≫ vA] and are passively mixed by the Alfvén-wave turbulence:

$$\frac{d\,\delta\rho}{dt} = 0. \tag{28}$$

The high-beta limit is equivalent to the incompressible approximation for the slow waves.

In § 5.5, we will derive a kinetic description for the inertial-range compressive fluctuations (density and magnetic-field strength), which is more generally valid in weakly collisional plasmas and which reduces to Eqs. (26-27) in the collisional limit (see Appendix D). While these fluctuations will in general satisfy a kinetic equation, they will remain passive with respect to the Alfvén waves.

2.5. Elsasser Fields for the Slow Waves

The original Elsasser (1950) symmetry was derived for incompressible MHD equations. However, for the “compressive” slow-wave fluctuations, we may introduce generalized Elsasser fields:

$$z_\parallel^\pm = u_\parallel \pm \frac{\delta B_\parallel}{\sqrt{4\pi\rho_0}}\left(1 + \frac{v_A^2}{c_s^2}\right)^{1/2}. \tag{29}$$

Straightforwardly, the evolution equation for these fields is

$$\frac{\partial z_\parallel^\pm}{\partial t} \mp \frac{v_A}{\sqrt{1 + v_A^2/c_s^2}}\frac{\partial z_\parallel^\pm}{\partial z} = -\frac{1}{2}\left(1 \mp \frac{1}{\sqrt{1 + v_A^2/c_s^2}}\right)\left\{\zeta^+, z_\parallel^\pm\right\} - \frac{1}{2}\left(1 \pm \frac{1}{\sqrt{1 + v_A^2/c_s^2}}\right)\left\{\zeta^-, z_\parallel^\pm\right\}. \tag{30}$$

In the high-beta limit (vA ≪ cs), the generalized Elsasser fields (29) become the parallel components of the conventional incompressible Elsasser fields.
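To make this limit explicit, here is a short worked step that uses only the structure of Eq. (30), in which the brackets {ζ+, z‖±} and {ζ−, z‖±} carry the coefficients −(1/2)(1 ∓ 1/√(1 + vA²/cs²)) and −(1/2)(1 ± 1/√(1 + vA²/cs²)), respectively. As vA/cs → 0 the square-root factor tends to unity, so one coefficient vanishes and the other becomes −1:

```latex
\frac{1}{\sqrt{1 + v_A^2/c_s^2}} \to 1
\quad\Longrightarrow\quad
\frac{\partial z_\parallel^\pm}{\partial t}
  \mp v_A \frac{\partial z_\parallel^\pm}{\partial z}
  = -\left\{\zeta^\mp, z_\parallel^\pm\right\}
\qquad (v_A \ll c_s).
```

In this limit each field z‖± is advected only by the Elsasser potential of the opposite sign, ζ∓.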
We see that only in this limit do the slow waves interact exclusively with the counter- propagating Alfvén waves, and so only in this limit does set- ting ζ− = 0 or ζ+ = 0 gives rise to finite-amplitude slow-wave- packet solutions z± = f±(x,y,z∓ vAt) analogous to the finite- amplitude Alfvén-wave packets discussed in § 2.3.15 For gen- eral β, the phase speed of the slow waves is smaller than that of the Alfvén waves and, therefore, Alfvén waves can “catch up” and interact with the slow waves that travel in the same direction. All of these interactions are of scattering type and involve no exchange of energy. 2.6. Scalings for Passive Fluctuations 15 Obviously, setting both ζ± = 0 does always enable these finite- amplitude slow-wave solutions. More non-trivially, such finite-amplitude so- lutions exist in the Lagrangian frame associated with the Alfvén waves—this is discussed in detail in § 6.3. The scaling of the passively mixed scalar fields introduced above is slaved to the scaling of the Alfvénic fluctuations. Consider for example the entropy mode [Eq. (23)]. As in Kolmogorov–Obukhov theory (see § 1.1), one assumes a local-in-scale-space cascade of scalar variance and a constant flux εs of this variance. Then, analogously to Eq. (1), v2thi ∼ εs. (31) Since the cascade time is τ−1λ ∼ u⊥ ·∇⊥ ∼ vA/l‖λ ∼ ε/u2⊥λ, )1/2 u⊥λ , (32) so the scalar fluctuations have the same scaling as the turbu- lence that mixes them (Obukhov 1949; Corrsin 1951). In GS turbulence, the scalar-variance spectrum should, therefore, be ⊥ (Lithwick & Goldreich 2001). The same argument ap- plies to all passive fields. It is the (presumably) passive electron-density spectrum that provides the main evidence of the k−5/3 scaling in the in- terstellar turbulence (Armstrong et al. 1981, 1995; Lazio et al. 2004, see further discussion in § 8.4.1). 
The explanation of this spectrum in terms of passive mixing of the entropy mode, originally proposed by Higdon (1984), was developed on the basis of the GS theory by Lithwick & Goldreich (2001). The turbulent cascade of the compressive fluctuations and the rel- evant solar-wind data is discussed further in § 6.3. In partic- ular, it will emerge that the anisotropy of these fluctuations remains a non-trivial issue: is there an analog of the scaling relation (5)? The scaling argument outlined above does not invoke any assumptions about the relationship between the parallel and perpendicular scales of the compressive fluctu- ations (other than the assumption that they are anisotropic). Lithwick & Goldreich (2001) argue that the parallel scales of the Alfvénic fluctuations will imprint themselves on the pas- sively advected compressive ones, so Eq. (5) holds for the latter as well. In § 6.3, we examine this conclusion in view of the solar-wind evidence and of the fact that the equations for the compressive modes become linear in the Lagrangian frame associated with the Alfvénic turbulence. 2.7. Five RMHD Cascades Thus, the anisotropy and critical balance (3) taken as ordering assumptions lead to a neat decomposition of the MHD turbulent cascade into a decoupled Alfvén-wave cas- cade and cascades of slow waves and entropy fluctuations pas- sively scattered/mixed by the Alfvén waves. More precisely, Eqs. (23), (21) and (30) imply that, for arbitrary β, there are five conserved quantities:16 W±AW = d3rρ0|∇⊥ζ±|2 (Alfven waves), (33) W±sw = d3rρ0|z±‖ | 2 (slow waves), (34) (entropy fluctuations).(35) 16 Note that magnetic helicity of the perturbed field is not an invariant of RMHD, except in two dimensions (see Appendix F.4). In 2D, there is also conservation of the mean square flux, d3r |Ψ|2 (see Appendix F.2). 
$W^+_{\mathrm{AW}}$ and $W^-_{\mathrm{AW}}$ are always cascaded by interaction with each other, $W_s$ is passively mixed by $W^+_{\mathrm{AW}}$ and $W^-_{\mathrm{AW}}$, $W^\pm_{\mathrm{sw}}$ are passively scattered by $W^\mp_{\mathrm{AW}}$ and, unless β ≫ 1, also by $W^\pm_{\mathrm{AW}}$. This is an example of splitting of the overall energy cascade into several channels (recovered as a particular case of the more general kinetic cascade in Appendix D.2), a concept that will repeatedly arise in the kinetic treatment to follow.

The decoupling of the slow- and Alfvén-wave cascades in MHD turbulence was studied in some detail and confirmed in direct numerical simulations by Maron & Goldreich (2001, for β ≫ 1) and by Cho & Lazarian (2002, 2003, for a range of values of β). The derivation given in § 2.2 and § 2.4 (cf. Lithwick & Goldreich 2001) provides a straightforward theoretical basis for these results, assuming anisotropy of the turbulence (which was also confirmed in these numerical studies).

It turns out that the decoupling of the Alfvén-wave cascade that we demonstrated above for the anisotropic MHD turbulence is a uniformly valid property of plasma turbulence at both collisional and collisionless scales and that this cascade is correctly described by the RMHD equations (17-18) all the way down to the ion gyroscale, while the fluctuations of density and magnetic-field strength do not satisfy simple fluid evolution equations anymore and require solving the kinetic equation. In order to prove this, we adopt a kinetic description and apply to it the same ordering (§ 2.1) as we used to reduce the MHD equations. The kinetic theory that emerges as a result is called gyrokinetics.

3. GYROKINETICS

The gyrokinetic formalism was first worked out for linear waves by Rutherford & Frieman (1968) and by Taylor & Hastie (1968) (see also Catto 1978; Antonsen & Lane 1980; Catto et al. 1981) and subsequently extended to the nonlinear regime by Frieman & Chen (1982).
Rigorous derivations of the gyrokinetic equation based on the Hamiltonian formalism were developed by Dubin et al. (1983, electrostatic) and Hahm et al. (1988, electromagnetic). This approach is reviewed in Brizard & Hahm (2007). A more pedestrian, but perhaps also more transparent exposition of the gyrokinetics in a straight mean field can be found in Howes et al. (2006), who also provide a detailed explanation of the gyrokinetic ordering in the context of astrophysical plasma turbulence and a treatment of the linear waves and damping rates. Here we review only the main points so as to allow the reader to understand the present paper without referring elsewhere.

In general, a plasma is completely described by the distribution function fs(t, r, v), the probability density for a particle of species s (= i, e) to be found at the spatial position r moving with velocity v. This function obeys the kinetic Vlasov–Landau (or Boltzmann) equation

$$\frac{\partial f_s}{\partial t} + \mathbf{v}\cdot\nabla f_s + \frac{q_s}{m_s}\left(\mathbf{E} + \frac{\mathbf{v}\times\mathbf{B}}{c}\right)\cdot\frac{\partial f_s}{\partial\mathbf{v}} = \left(\frac{\partial f_s}{\partial t}\right)_c, \tag{36}$$

where qs and ms are the particle’s charge and mass, c is the speed of light, and the right-hand side is the collision term (quadratic in f). The electric and magnetic fields are

$$\mathbf{E} = -\nabla\varphi - \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \tag{37}$$

The first equality is Faraday’s law uncurled, the second the magnetic-field solenoidality condition; we shall use the Coulomb gauge, ∇·A = 0. The fields satisfy the Poisson and the Ampère–Maxwell equations with the charge and current densities determined by fs(t, r, v):

$$\nabla\cdot\mathbf{E} = 4\pi\sum_s q_s n_s = 4\pi\sum_s q_s\int d^3\mathbf{v}\, f_s, \tag{38}$$

$$\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} = \frac{4\pi}{c}\,\mathbf{j} = \frac{4\pi}{c}\sum_s q_s\int d^3\mathbf{v}\,\mathbf{v} f_s. \tag{39}$$

3.1. Gyrokinetic Ordering and Dimensionless Parameters

As in § 2 we set up a static equilibrium with a uniform mean field, B0 = B0ẑ, E0 = 0, assume that the perturbations will be anisotropic with k‖ ≪ k⊥ (at scales smaller than the outer scale, k‖L ≫ 1; see § 1.3 and § 1.5.1), and construct an expansion of the kinetic theory around this equilibrium with respect to the small parameter ε ∼ k‖/k⊥. We adopt the ordering expressed by Eqs.
(3) and (12), i.e., we assume the perturbations to be strongly interacting Alfvén waves plus electron density and magnetic-field-strength fluctuations.

Besides $\epsilon$, several other dimensionless parameters are present, all of which are formally considered to be of order unity in the gyrokinetic expansion: the electron–ion mass ratio $m_e/m_i$, the charge ratio

$$Z = q_i/|q_e| = q_i/e \tag{40}$$

(for hydrogen, this is 1, which applies to most astrophysical plasmas of interest to us), the temperature ratio$^{17}$

$$\tau = T_i/T_e, \tag{41}$$

and the plasma (ion) beta

$$\beta_i = \frac{v_{thi}^2}{v_A^2} = \frac{8\pi n_i T_i}{B_0^2}, \tag{42}$$

where $v_{thi} = (2T_i/m_i)^{1/2}$ is the ion thermal speed and the total $\beta$ was defined in Eq. (14) based on the total pressure $p = n_iT_i + n_eT_e$. We shall occasionally also use the electron beta

$$\beta_e = \frac{8\pi n_e T_e}{B_0^2} = \frac{Z}{\tau}\,\beta_i. \tag{43}$$

The total beta is $\beta = \beta_i + \beta_e$.

3.1.1. Wavenumbers and Frequencies

As we want our theory to be uniformly valid at all (perpendicular) scales above, at or below the ion gyroscale, we order

$$k_\perp\rho_i \sim 1, \tag{44}$$

where $\rho_i = v_{thi}/\Omega_i$ is the ion gyroradius and $\Omega_i = q_iB_0/cm_i$ the ion cyclotron frequency. Note that

$$\rho_e = \frac{Z}{\sqrt{\tau}}\left(\frac{m_e}{m_i}\right)^{1/2}\rho_i. \tag{45}$$

$^{17}$ It can be shown that equilibrium temperatures change on the timescale $\sim (\epsilon^2\omega)^{-1}$ (Howes et al. 2006). On the other hand, from standard theory of collisional transport (e.g., Helander & Sigmar 2002), the ion and electron temperatures equalize on the timescale $\sim \nu_{ie}^{-1} \sim (m_i/m_e)^{1/2}\nu_{ii}^{-1}$ [see Eq. (51)]. Therefore, $\tau$ can depart from unity by an amount of order $\epsilon^2(\omega/\nu_{ii})(m_i/m_e)^{1/2}$. In our ordering scheme [Eq. (49)], this is $O(\epsilon^2)$ and, therefore, we should simply set $\tau = 1 + O(\epsilon^2)$. However, we shall carry the parameter $\tau$ because other ordering schemes are possible that permit arbitrary values of $\tau$. These are appropriate to plasmas with very weak collisions. For example, in the solar wind, $\tau$ appears to be order unity but not exactly 1 (Newbury et al. 1998), while in accretion flows near the black hole, some models predict $\tau \gg 1$ (see § 8.5).

FIG.
3. Regions of validity in the wavenumber space of two primary approximations: the two-fluid (Appendix A.1) and gyrokinetic (§ 3). The gyrokinetic theory holds when $k_\parallel \ll k_\perp$ and $\omega \ll \Omega_i$ [when $k_\parallel \ll k_\perp < \rho_i^{-1}$, the second requirement is automatically satisfied for Alfvén, slow and entropy modes; see Eq. (46)]. The two-fluid equations hold when $k_\parallel\lambda_{mfpi} \ll 1$ (collisional limit) and $k_\perp\rho_i \ll 1$ (magnetized plasma). Note that the gyrokinetic theory holds for all but the very largest (outer) scales, where anisotropy cannot be assumed.

Assuming Alfvénic frequencies implies

$$\frac{\omega}{\Omega_i} \sim \frac{k_\parallel v_A}{\Omega_i} \sim \frac{k_\perp\rho_i}{\sqrt{\beta_i}}\,\epsilon. \tag{46}$$

Thus, gyrokinetics is a low-frequency limit that averages over the timescales associated with the particle gyration. Because we have assumed that the fluctuations are anisotropic and have (by order of magnitude) Alfvénic frequencies, we see from Eq. (46) that their frequency remains far below $\Omega_i$ at all scales, including the ion and even electron gyroscale: the gyrokinetics remains valid at all of these scales and the cyclotron-frequency effects are negligible (cf. Quataert & Gruzinov 1999).

3.1.2. Fluctuations

Equation (3) allows us to order the fluctuations of the scalar potential: on the one hand, we have from Eq. (3) $u_\perp \sim \epsilon v_A$; on the other hand, the plasma mass flow velocity is (to the lowest order) the $\mathbf{E}\times\mathbf{B}$ drift velocity of the ions, $u_\perp \sim cE_\perp/B_0 \sim ck_\perp\varphi/B_0$, so

$$\frac{e\varphi}{T_e} = \frac{\tau}{Z}\frac{Ze\varphi}{T_i} \sim \frac{\tau}{Z}\frac{1}{k_\perp\rho_i\sqrt{\beta_i}}\,\epsilon. \tag{47}$$

All other fluctuations (magnetic, density, parallel velocity) are ordered according to Eq. (12).

Note that the ordering of the flow velocity dictated by Eq. (3) means that we are considering the limit of small Mach numbers:

$$M \sim \frac{u}{v_{thi}} \sim \frac{\epsilon}{\sqrt{\beta_i}}. \tag{48}$$

This means that the gyrokinetic description in the form used below does not extend to large sonic flows that can be present in many astrophysical systems. It is, in principle, possible to extend the gyrokinetics to systems with sonic flows (e.g., in the toroidal geometry; see Artun & Tang 1994; Sugama & Horton 1997).
However, we do not follow this route because such flows belong to the same class of non-universal outer-scale features as background density and temperature gradients, system-specific geometry etc.; these can all be ignored at small scales, where the turbulence should be approximately homogeneous and subsonic (as long as $k_\parallel L \gg 1$, see discussion in § 1.5.1).

3.1.3. Collisions

Finally, we want our theory to be valid both in the collisional and the collisionless regimes, so we do not assume $\omega$ to be either smaller or larger than the (ion) collision frequency $\nu_{ii}$:

$$\frac{\omega}{\nu_{ii}} \sim \frac{k_\parallel\lambda_{mfpi}}{\sqrt{\beta_i}} \sim 1, \tag{49}$$

where $\lambda_{mfpi} = v_{thi}/\nu_{ii}$ is the ion mean free path (this ordering can actually be inferred from equating the gyrokinetic entropy production terms to the collisional entropy production; see extended discussion in Howes et al. 2006). Note that the ordering (49) holds on the understanding that we have ordered $k_\perp\rho_i \sim 1$ [Eq. (44)] because the fluctuation frequency can depend on $k_\perp\rho_i$ in the dissipation range (see § 7.3).

Other collision rates are related to $\nu_{ii}$ via a set of standard formulae (see, e.g., Helander & Sigmar 2002), which will be useful in what follows:

$$\nu_{ei} = Z\nu_{ee} = \frac{\tau^{3/2}}{Z^2}\left(\frac{m_i}{m_e}\right)^{1/2}\nu_{ii}, \tag{50}$$

$$\nu_{ie} = \frac{8}{3\sqrt{\pi}}\,\frac{\tau^{3/2}}{Z}\left(\frac{m_e}{m_i}\right)^{1/2}\nu_{ii}, \tag{51}$$

$$\nu_{ii} = \frac{\sqrt{2}\,\pi Z^4e^4n_i\ln\Lambda}{m_i^{1/2}T_i^{3/2}}, \tag{52}$$

where $\ln\Lambda$ is the Coulomb logarithm and the numerical factor in the definition of $\nu_{ie}$ has been inserted for future notational convenience (see Appendix A). We always define

$$\lambda_{mfpi} = \frac{v_{thi}}{\nu_{ii}}, \qquad \lambda_{mfpe} = \frac{v_{the}}{\nu_{ei}} = \left(\frac{Z}{\tau}\right)^2\lambda_{mfpi}. \tag{53}$$

The ordering of the collision frequency expressed by Eq. (49) means that collisions, while not dominant as in the fluid description (Appendix A), are still retained in the version of the gyrokinetic theory adopted by us. Their presence is required in order for us to be able to assume that the equilibrium distribution is Maxwellian [Eq. (54) below] and for the heating and entropy production to be treated correctly (§ 3.4 and § 3.5).
However, our ordering of collisions and of the fluctuation amplitudes (§ 3.1.2) imposes certain limitations: thus, we cannot treat the class of nonlinear phenomena involving particle trapping by parallel-varying fluctuations, non-Maxwellian tails of particle distributions, plasma instabilities arising from the equilibrium pressure anisotropies (mirror, firehose) and their possible nonlinear evolution to large amplitudes (see discussion in § 8.3).

The region of validity of the gyrokinetic approximation in the wavenumber space is illustrated in Fig. 3: it embraces all of the scales that are expected to be traversed by the anisotropic energy cascade (except the scales close to the outer scale).

As we explained above, $m_e/m_i$, $\beta_i$, $k_\perp\rho_i$ and $k_\parallel\lambda_{mfpi}$ (or $\omega/\nu_{ii}$) are assigned order unity in the gyrokinetic expansion. Subsidiary expansions in small $m_e/m_i$ (§ 4) and in small or large values of the other three parameters (§§ 5-7) can be carried out at a later stage as long as their values are not so large or small as to interfere with the primary expansion in $\epsilon$. These expansions will yield simpler models of turbulence with more restricted domains of validity than gyrokinetics.

3.2. Gyrokinetic Equation

Given the gyrokinetic ordering introduced above, the expansion of the distribution function up to first order in $\epsilon$ can be written as

$$f_s(t,\mathbf{r},\mathbf{v}) = F_{0s}(v) - \frac{q_s\varphi(t,\mathbf{r})}{T_{0s}}\,F_{0s}(v) + h_s(t,\mathbf{R}_s,v_\perp,v_\parallel). \tag{54}$$

To zeroth order, it is a Maxwellian:$^{18}$

$$F_{0s}(v) = \frac{n_{0s}}{(\pi v_{ths}^2)^{3/2}}\exp\left(-\frac{v^2}{v_{ths}^2}\right), \qquad v_{ths} = \left(\frac{2T_{0s}}{m_s}\right)^{1/2}, \tag{55}$$

with uniform density $n_{0s}$ and temperature $T_{0s}$ and no mean flow.

$^{18}$ The use of isotropic equilibrium is a significant idealization; this is discussed in more detail in § 8.3.

As will be explained in more detail in § 3.5, $F_{0s}$ has a slow time dependence via the equilibrium temperature, $T_{0s} = T_{0s}(\epsilon^2t)$. This reflects the slow heating of the plasma as the turbulent energy is dissipated. However, $T_{0s}$ can be treated as a constant with respect to the time dependence of the first-order
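distribution function (the timescale of the turbulent fluctuations). The first-order part of the distribution function is composed of the Boltzmann response [second term in Eq. (54), ordered in Eq. (47)] and the gyrocenter distribution function $h_s$. The spatial dependence of the latter is expressed not by the particle position $\mathbf{r}$ but by the position $\mathbf{R}_s$ of the particle gyrocenter (or guiding center), the center of the ring orbit that the particle follows in a strong guide field:

$$\mathbf{R}_s = \mathbf{r} + \frac{\mathbf{v}_\perp\times\hat{\mathbf{z}}}{\Omega_s}. \tag{56}$$

Thus, some of the velocity dependence of the distribution function is subsumed in the $\mathbf{R}_s$ dependence of $h_s$. Explicitly, $h_s$ depends only on two velocity-space variables: it is customary in the gyrokinetic literature for these to be chosen as the particle energy $\varepsilon_s = m_sv^2/2$ and its first adiabatic invariant $\mu_s = m_sv_\perp^2/2B_0$ (both conserved quantities to two lowest orders in the gyrokinetic expansion). However, in a straight uniform guide field $B_0\hat{\mathbf{z}}$, the pair $(v_\perp, v_\parallel)$ is a simpler choice, which will mostly be used in what follows (we shall sometimes find an alternative pair, $v$ and $\xi = v_\parallel/v$, useful, especially where collisions are concerned). It must be constantly kept in mind that derivatives of $h_s$ with respect to the velocity-space variables are taken at constant $\mathbf{R}_s$, not at constant $\mathbf{r}$.

The function $h_s$ satisfies the gyrokinetic equation:

$$\frac{\partial h_s}{\partial t} + v_\parallel\frac{\partial h_s}{\partial z} + \frac{c}{B_0}\left\{\langle\chi\rangle_{\mathbf{R}_s}, h_s\right\} = \frac{q_sF_{0s}}{T_{0s}}\frac{\partial\langle\chi\rangle_{\mathbf{R}_s}}{\partial t} + \left(\frac{\partial h_s}{\partial t}\right)_c, \tag{57}$$

where

$$\chi(t,\mathbf{r},\mathbf{v}) = \varphi - \frac{v_\parallel A_\parallel}{c} - \frac{\mathbf{v}_\perp\cdot\mathbf{A}_\perp}{c}, \tag{58}$$

the Poisson brackets are defined in the usual way:

$$\left\{\langle\chi\rangle_{\mathbf{R}_s}, h_s\right\} = \hat{\mathbf{z}}\cdot\left(\frac{\partial\langle\chi\rangle_{\mathbf{R}_s}}{\partial\mathbf{R}_s}\times\frac{\partial h_s}{\partial\mathbf{R}_s}\right), \tag{59}$$

and the ring average notation is introduced:

$$\langle\chi(t,\mathbf{r},\mathbf{v})\rangle_{\mathbf{R}_s} = \frac{1}{2\pi}\int_0^{2\pi}d\vartheta\,\chi\left(t,\mathbf{R}_s - \frac{\mathbf{v}_\perp\times\hat{\mathbf{z}}}{\Omega_s},\mathbf{v}\right), \tag{60}$$

where $\vartheta$ is the angle in the velocity space taken in the plane perpendicular to the guide field $B_0\hat{\mathbf{z}}$. Note that, while $\chi$ is a function of $\mathbf{r}$, its ring average is a function of $\mathbf{R}_s$. Note also that the ring averages depend on the species index, as does the gyrocenter variable $\mathbf{R}_s$.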
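As an illustrative numerical check of the ring average in Eq. (60), the following sketch (not part of the original derivation) averages a single Fourier mode $e^{i\mathbf{k}\cdot\mathbf{r}}$ over the gyroangle at a fixed gyrocenter placed at the origin and compares the result with the Bessel function $J_0(k_\perp v_\perp/\Omega_s)$, which is the standard result for such an average. The function names, the quadrature resolution and the sample parameter values are assumptions made for the illustration only.

```python
import cmath
import math


def bessel_j0(a, terms=40):
    """J0(a) from its power series: sum_m (-1)^m (a/2)^(2m) / (m!)^2."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        # Ratio of consecutive series terms.
        term *= -((a / 2.0) ** 2) / ((m + 1) ** 2)
    return total


def ring_average_plane_wave(k_perp, v_perp, omega_s, n_theta=2000):
    """Gyroangle average of exp(i k.r) at fixed gyrocenter R = 0.

    The particle position is r = R - (v_perp x z_hat) / Omega_s, with
    v_perp = v_perp * (cos(theta), sin(theta)) in the plane normal to z_hat.
    """
    kx, ky = k_perp
    acc = 0.0 + 0.0j
    for j in range(n_theta):
        theta = 2.0 * math.pi * j / n_theta
        vx = v_perp * math.cos(theta)
        vy = v_perp * math.sin(theta)
        # v_perp x z_hat = (vy, -vx, 0), so r = (-vy, vx) / Omega_s.
        rx, ry = -vy / omega_s, vx / omega_s
        acc += cmath.exp(1j * (kx * rx + ky * ry))
    return acc / n_theta


if __name__ == "__main__":
    k = (0.6, 0.8)                    # illustrative wavevector, |k| = 1
    v_perp, omega_s = 1.5, 1.0        # illustrative speed and gyrofrequency
    a_s = math.hypot(*k) * v_perp / omega_s
    print(ring_average_plane_wave(k, v_perp, omega_s), bessel_j0(a_s))
```

The uniform sampling in the gyroangle is a periodic trapezoidal rule, which converges very quickly for this smooth periodic integrand; the imaginary part of the average vanishes to rounding error, as expected from the symmetry of the ring.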
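Equation (57) is derived by transforming the first-order kinetic equation to the gyrocenter variable (56) and ring averaging the result (see Howes et al. 2006, or the references given at the beginning of § 3). The ring-averaged collision integral $(\partial h_s/\partial t)_c$ is discussed in Appendix B.

3.3. Field Equations

To Eq. (57), we must append the equations that determine the electromagnetic field, namely, the potentials $\varphi(t,\mathbf{r})$ and $\mathbf{A}(t,\mathbf{r})$ that enter the expression for $\chi$ [Eq. (58)]. In the non-relativistic limit ($v_{thi} \ll c$), these are the plasma quasi-neutrality constraint [which follows from the Poisson equation (38) to lowest order in $v_{thi}/c$]:

$$\sum_s q_s\delta n_s = \sum_s q_s\left[-\frac{q_s\varphi}{T_{0s}}\,n_{0s} + \int d^3\mathbf{v}\,\langle h_s\rangle_{\mathbf{r}}\right] = 0 \tag{61}$$

and the parallel and perpendicular parts of Ampère's law [Eq. (39) to lowest order in $\epsilon$ and in $v_{thi}/c$]:

$$\nabla_\perp^2A_\parallel = -\frac{4\pi}{c}\,j_\parallel = -\frac{4\pi}{c}\sum_s q_s\int d^3\mathbf{v}\,v_\parallel\langle h_s\rangle_{\mathbf{r}}, \tag{62}$$

$$\nabla_\perp^2\delta B_\parallel = -\frac{4\pi}{c}\,\hat{\mathbf{z}}\cdot\left(\nabla_\perp\times\mathbf{j}_\perp\right) = -\frac{4\pi}{c}\,\hat{\mathbf{z}}\cdot\left[\nabla_\perp\times\sum_s q_s\int d^3\mathbf{v}\,\langle\mathbf{v}_\perp h_s\rangle_{\mathbf{r}}\right], \tag{63}$$

where we have used $\delta B_\parallel = \hat{\mathbf{z}}\cdot(\nabla_\perp\times\mathbf{A}_\perp)$ and dropped the displacement current. Since the field variables $\varphi$, $A_\parallel$ and $\delta B_\parallel$ are functions of the spatial variable $\mathbf{r}$, not of the gyrocenter variable $\mathbf{R}_s$, we had to determine the contribution from the gyrocenter distribution function $h_s$ to the charge distribution at fixed $\mathbf{r}$ by performing a gyroaveraging operation dual to the ring average defined in Eq. (60):

$$\langle h_s(t,\mathbf{R}_s,v_\perp,v_\parallel)\rangle_{\mathbf{r}} = \frac{1}{2\pi}\int_0^{2\pi}d\vartheta\,h_s\left(t,\mathbf{r} + \frac{\mathbf{v}_\perp\times\hat{\mathbf{z}}}{\Omega_s},v_\perp,v_\parallel\right). \tag{64}$$

In other words, the velocity-space integrals in Eqs. (61)-(63) are performed over $h_s$ at constant $\mathbf{r}$, rather than constant $\mathbf{R}_s$. If we Fourier transform $h_s$ in $\mathbf{R}_s$, the gyroaveraging operation takes a simple mathematical form:

$$\langle h_s\rangle_{\mathbf{r}} = \sum_{\mathbf{k}}\left\langle e^{i\mathbf{k}\cdot\mathbf{R}_s}\right\rangle_{\mathbf{r}}h_{s\mathbf{k}}(t,v_\perp,v_\parallel) = \sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}\left\langle\exp\left(i\mathbf{k}\cdot\frac{\mathbf{v}_\perp\times\hat{\mathbf{z}}}{\Omega_s}\right)\right\rangle_{\mathbf{r}}h_{s\mathbf{k}}(t,v_\perp,v_\parallel) = \sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}J_0(a_s)\,h_{s\mathbf{k}}(t,v_\perp,v_\parallel), \tag{65}$$

where $a_s = k_\perp v_\perp/\Omega_s$ and $J_0$ is a Bessel function that arose from the angle integral in the velocity space. In Eq. (63), an analogous calculation taking into account the angular dependence of $\mathbf{v}_\perp$ leads to

$$\frac{\delta B_\parallel}{B_0} = -\frac{4\pi}{B_0^2}\sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}\sum_s\int d^3\mathbf{v}\,m_sv_\perp^2\,\frac{J_1(a_s)}{a_s}\,h_{s\mathbf{k}}(t,v_\perp,v_\parallel). \tag{66}$$

Note that Eq. (63) [and, therefore, Eq.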
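(66)] is the gyrokinetic equivalent of the perpendicular pressure balance that appeared in § 2 [Eq. (22)]:

$$\nabla_\perp^2\frac{B_0\delta B_\parallel}{4\pi} = \frac{B_0}{c}\,\nabla_\perp\cdot\sum_s q_s\int d^3\mathbf{v}\,\langle\hat{\mathbf{z}}\times\mathbf{v}_\perp h_s\rangle_{\mathbf{r}} = \nabla_\perp\cdot\sum_s m_s\Omega_s\int d^3\mathbf{v}\,\frac{1}{2\pi}\int_0^{2\pi}d\vartheta\,\frac{\partial\mathbf{v}_\perp}{\partial\vartheta}\,h_s\left(t,\mathbf{r} + \frac{\mathbf{v}_\perp\times\hat{\mathbf{z}}}{\Omega_s},v_\perp,v_\parallel\right) = -\nabla_\perp\nabla_\perp:\sum_s\int d^3\mathbf{v}\,m_s\langle\mathbf{v}_\perp\mathbf{v}_\perp h_s\rangle_{\mathbf{r}} = -\nabla_\perp\nabla_\perp:\delta\mathbf{P}_\perp, \tag{67}$$

where we have integrated by parts with respect to the gyroangle $\vartheta$ and used $\partial\mathbf{v}_\perp/\partial\vartheta = \hat{\mathbf{z}}\times\mathbf{v}_\perp$, $\partial^2\mathbf{v}_\perp/\partial\vartheta^2 = -\mathbf{v}_\perp$ (cf. the Appendix of Roach et al. 2005).

Once the fields are determined, they have to be substituted into $\chi$ [Eq. (58)] and the result ring averaged [Eq. (60)]. Again, we emphasize that $\varphi$, $A_\parallel$ and $\delta B_\parallel$ are functions of $\mathbf{r}$, while $\langle\chi\rangle_{\mathbf{R}_s}$ is a function of $\mathbf{R}_s$. The transformation is accomplished via a calculation analogous to the one that led to Eqs. (65) and (66):

$$\langle\chi\rangle_{\mathbf{R}_s} = \sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{R}_s}\langle\chi\rangle_{\mathbf{R}_s,\mathbf{k}}, \tag{68}$$

$$\langle\chi\rangle_{\mathbf{R}_s,\mathbf{k}} = J_0(a_s)\left(\varphi_{\mathbf{k}} - \frac{v_\parallel A_{\parallel\mathbf{k}}}{c}\right) + \frac{T_{0s}}{q_s}\frac{2v_\perp^2}{v_{ths}^2}\frac{J_1(a_s)}{a_s}\frac{\delta B_{\parallel\mathbf{k}}}{B_0}. \tag{69}$$

The last equation establishes a correspondence between the Fourier transforms of the fields with respect to $\mathbf{r}$ and the Fourier transform of $\langle\chi\rangle_{\mathbf{R}_s}$ with respect to $\mathbf{R}_s$.

3.4. Generalized Energy and the Kinetic Cascade

As promised in § 1.4, the central unifying concept of this paper is now introduced.

If we multiply the gyrokinetic equation (57) by $T_{0s}h_s/F_{0s}$ and integrate over the velocities and gyrocenters, we find that the nonlinear term conserves the variance of $h_s$ and

$$\frac{d}{dt}\int d^3\mathbf{v}\int d^3\mathbf{R}_s\,\frac{T_{0s}h_s^2}{2F_{0s}} = \int d^3\mathbf{v}\int d^3\mathbf{R}_s\,q_s\frac{\partial\langle\chi\rangle_{\mathbf{R}_s}}{\partial t}\,h_s + \int d^3\mathbf{v}\int d^3\mathbf{R}_s\,\frac{T_{0s}h_s}{F_{0s}}\left(\frac{\partial h_s}{\partial t}\right)_c. \tag{70}$$

Let us now sum this equation over all species. The first term on the right-hand side is

$$\sum_s q_s\int d^3\mathbf{v}\int d^3\mathbf{R}_s\,\frac{\partial\langle\chi\rangle_{\mathbf{R}_s}}{\partial t}\,h_s = \int d^3\mathbf{r}\sum_s q_s\left[\frac{\partial\varphi}{\partial t}\int d^3\mathbf{v}\,\langle h_s\rangle_{\mathbf{r}} - \frac{1}{c}\frac{\partial\mathbf{A}}{\partial t}\cdot\int d^3\mathbf{v}\,\langle\mathbf{v}h_s\rangle_{\mathbf{r}}\right] = \frac{d}{dt}\int d^3\mathbf{r}\sum_s\frac{q_s^2\varphi^2n_{0s}}{2T_{0s}} + \int d^3\mathbf{r}\,\mathbf{E}\cdot\mathbf{j}, \tag{71}$$

where we have used Eq. (61) and Ampère's law [Eqs. (62)-(63)] to express the integrals of $h_s$. The second term on the right-hand side is the total work done on plasma per unit time. Using Faraday's law [Eq. (37)] and Ampère's law [Eq. (39)], it can be written as

$$\int d^3\mathbf{r}\,\mathbf{E}\cdot\mathbf{j} = -\frac{d}{dt}\int d^3\mathbf{r}\,\frac{|\delta\mathbf{B}|^2}{8\pi} + P_{\rm ext}, \tag{72}$$

where $P_{\rm ext} \equiv -\int d^3\mathbf{r}\,\mathbf{E}\cdot\mathbf{j}_{\rm ext}$ is the total power injected into the system by the external energy sources (outer-scale stirring; in terms of the Kolmogorov energy flux $\varepsilon$ used in the scaling arguments in § 1.2, $P_{\rm ext} = Vm_in_{0i}\varepsilon$, where $V$ is the system volume). Combining Eqs. (70)-(72), we find (Howes et al.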
2006)

$$\frac{dW}{dt} \equiv \frac{d}{dt}\int d^3\mathbf{r}\left[\sum_s\left(\int d^3\mathbf{v}\,\frac{T_{0s}\langle h_s^2\rangle_{\mathbf{r}}}{2F_{0s}} - \frac{q_s^2\varphi^2n_{0s}}{2T_{0s}}\right) + \frac{|\delta\mathbf{B}|^2}{8\pi}\right] = P_{\rm ext} + \sum_s\int d^3\mathbf{v}\int d^3\mathbf{R}_s\,\frac{T_{0s}h_s}{F_{0s}}\left(\frac{\partial h_s}{\partial t}\right)_c. \tag{73}$$

$W$ is a positive definite quantity; this becomes explicit if we use Eq. (61) to express it in terms of the total perturbed distribution function $\delta f_s = -q_s\varphi F_{0s}/T_{0s} + h_s$ [see Eq. (54)]:

$$W = \int d^3\mathbf{r}\left[\sum_s\int d^3\mathbf{v}\,\frac{T_{0s}\delta f_s^2}{2F_{0s}} + \frac{|\delta\mathbf{B}|^2}{8\pi}\right]. \tag{74}$$

We will refer to $W$ as the generalized energy. We use this term to emphasize the role of $W$ as the cascaded quantity in gyrokinetic turbulence (see below). This quantity is, in fact, the gyrokinetic version of a collisionless kinetic invariant variously referred to as the generalized grand canonical potential (see Hallatschek 2004, who points out the fundamental role of this quantity in plasma turbulence simulations) or free energy (e.g., Fowler 1968; Scott 2007). The non-magnetic part of $W$ is related to the perturbed entropy of the system (Krommes & Hu 1994; Sugama et al. 1996; Howes et al. 2006; Schekochihin et al. 2008b, see discussion in § 3.5).$^{19}$

Equation (73) is a conservation law of the generalized energy: $P_{\rm ext}$ is the source and the second term on the right-hand side, which is negative definite, represents collisional dissipation. This suggests that we might think of kinetic plasma turbulence in terms of the generalized energy $W$ injected by the outer-scale stirring and dissipated by collisions. In order for the dissipation to be important, the collisional term in Eq. (73) has to become comparable to $P_{\rm ext}$. This can happen in two ways:

1. At collisional scales ($k_\parallel\lambda_{mfpi} \sim 1$) due to deviations of the perturbed distribution function from a local perturbed Maxwellian (see § 6.1 and Appendix D);

2. At collisionless scales ($k_\parallel\lambda_{mfpi} \gg 1$) due to the development of small scales in the velocity space: large gradients in $v_\parallel$ (see § 6.2.4) or $v_\perp$ (which is accompanied by the development of small perpendicular scales in the position space; see § 7.9.1).
Thus, the dissipation is only important at particular (small) scales, which are generally well separated from the outer scale. The generalized energy is transferred from the outer scale to the dissipation scales via a nonlinear cascade. We shall call it the kinetic cascade. It is analogous to the energy cascade in fluid or MHD turbulence, but a conceptually new feature is present: the small scales at which dissipation happens are small scales both in the velocity and position space. Whereas the large gradients in $v_\parallel$ are produced by the linear parallel phase mixing, whose role in the kinetic dissipation processes has been appreciated for some time (Landau 1946; Hammett et al. 1991; Krommes & Hu 1994; Krommes 1999; Watanabe & Sugama 2004, see § 6.2.4), the emergence of large gradients in $v_\perp$ is due to an essentially nonlinear phase mixing mechanism (§ 7.9.1). At spatial scales smaller than the ion gyroradius, this nonlinear perpendicular phase mixing turns out to be a faster and, therefore, presumably the dominant way of generating small-scale structure in the velocity space. It was anticipated in the development of gyrofluid moment hierarchies by Dorland & Hammett (1993). Here we treat it for the first time as a phase-space turbulent cascade: this is done in § 7.9 and § 7.10 (see also Schekochihin et al. 2008b).

In the sections that follow, we shall derive particular forms of $W$ for various limiting cases of the gyrokinetic theory (§ 4.7, § 5.6, § 6.2.5, § 7.8, Appendices D.2 and E.2). We shall see that the kinetic cascade of $W$ is, indeed, a direct generalization of the more familiar fluid cascades (such as

$^{19}$ Note also that a quadratic form involving both the perturbed distribution function and the electromagnetic field appears, in a more general form than Eq. (74), in the formulation of the energy principle for the Kinetic MHD approximation (Kruskal & Oberman 1958; Kulsrud 1962, 1964).
Regarding the relationship between Kinetic MHD and gyrokinetics, see footnote 23.

the RMHD cascades discussed in § 2) and that $W$ contains the energy invariants of the fluid models in the appropriate limits. In these limits, the cascade of the generalized energy will split into several decoupled cascades, as it did in the case of RMHD (§ 2.7). Whenever one of the physically important scales (§ 1.5.2) is crossed and a change of physical regime occurs, these cascades are mixed back together into the overall kinetic cascade of $W$, which can then be split in a different way as it emerges on the "opposite side" of the transition region in the scale space. The conversion of the Alfvénic cascade into the KAW cascade and the entropy cascade at $k_\perp\rho_i \sim 1$ is the most interesting example of such a transition, discussed in § 7.

The generalized energy appears to be the only quadratic invariant of gyrokinetics in three dimensions; in two dimensions, many other invariants appear (see Appendix F).

3.5. Heating and Entropy

In a stationary state, all of the turbulent power injected by the external stirring is dissipated and thus transferred into heat. Mathematically, this is expressed as a slow increase in the temperature of the Maxwellian equilibrium. In gyrokinetics, the heating timescale is ordered as $\sim (\epsilon^2\omega)^{-1}$.

Even though the dissipation of turbulent fluctuations may be occurring "collisionlessly" at scales such that $k_\parallel\lambda_{mfpi} \gg 1$ (e.g., via wave–particle interaction at the ion gyroscale; § 7.1), the resulting heating must ultimately be effected with the help of collisions. This is because heating is an irreversible process and it is a small amount of collisions that makes "collisionless" damping irreversible. In other words, slow heating of the Maxwellian equilibrium is equivalent to entropy production, and Boltzmann's H-theorem rigorously requires collisions to make this possible.
Indeed, the total entropy of species $s$ is

$$S_s = -\int d^3\mathbf{r}\int d^3\mathbf{v}\,f_s\ln f_s = -\int d^3\mathbf{r}\int d^3\mathbf{v}\left[F_{0s}\ln F_{0s} + \frac{\delta f_s^2}{2F_{0s}}\right] + O(\epsilon^3), \tag{75}$$

where we took $\int d^3\mathbf{r}\,\delta f_s = 0$. It is then not hard to show that

$$T_{0s}\overline{\frac{dS_s}{dt}} = \frac{3}{2}\,Vn_{0s}\frac{dT_{0s}}{dt} = -\overline{\int d^3\mathbf{v}\int d^3\mathbf{R}_s\,\frac{T_{0s}h_s}{F_{0s}}\left(\frac{\partial h_s}{\partial t}\right)_c}, \tag{76}$$

where the overlines mean averaging over times longer than the characteristic time of the turbulent fluctuations $\sim \omega^{-1}$ but shorter than the typical heating time $\sim (\epsilon^2\omega)^{-1}$ (see Howes et al. 2006; Schekochihin et al. 2008b for a detailed derivation of this and related results on heating in gyrokinetics; see also earlier discussions of the entropy production in gyrokinetics by Krommes & Hu 1994; Krommes 1999; Sugama et al. 1996). We have omitted the term describing the interspecies collisional temperature equalization. Note that both sides of Eq. (76) are order $\epsilon^2\omega$.

If we now time average Eq. (73) in a similar fashion, the left-hand side vanishes because it is a time derivative of a quantity fluctuating on the timescale $\sim \omega^{-1}$, and we confirm that the right-hand side of Eq. (76), summed over species, is simply equal to the average power $P_{\rm ext}$ injected by external stirring. The import of Eq. (76) is that it tells us that heating can only be effected by collisions, while Eq. (73) implies that the injected power gets to the collisional scales in velocity and position space by means of a kinetic cascade of generalized energy.

The first term in the expression for the generalized energy (74) is $-\sum_s T_{0s}\delta S_s$, where $\delta S_s$ is the perturbed entropy [see Eq. (75)]. The second term in Eq. (74) is magnetic energy. Collisionless damping of electromagnetic fluctuations can be thought of as a redistribution of the generalized energy, transferring the electromagnetic energy into entropy fluctuations, while the total $W$ is conserved (a simple example of how that happens for collisionless compressive fluctuations in the inertial range is worked out in § 6.2.3).
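The quadratic form of the perturbed entropy in Eq. (75) can be checked numerically in a reduced setting. The sketch below is an illustration only, under assumptions that do not come from the paper: one spatial and one velocity dimension, a unit-density Maxwellian with $v_{th} = 1$, and an arbitrarily chosen perturbation $\delta f = \epsilon\cos x\,(1 + v/2)\,F_0$ whose spatial average vanishes. It compares the exact entropy $-\int f\ln f$ with the equilibrium part plus the quadratic correction $-\int\delta f^2/2F_0$.

```python
import math


def entropy_check(eps=0.02, nx=64, nv=400, vmax=6.0):
    """Return (S_exact, S_equilibrium, S_quadratic) on a periodic x grid.

    S_exact       = -sum f ln f
    S_equilibrium = -sum F0 ln F0
    S_quadratic   = -sum df^2 / (2 F0)
    with f = F0 + df and df = eps * cos(x) * (1 + v/2) * F0 (assumed shape).
    """
    dx = 2.0 * math.pi / nx
    dv = 2.0 * vmax / nv
    s_exact = s_equil = s_quad = 0.0
    for i in range(nx):
        x = i * dx
        for j in range(nv):
            v = -vmax + (j + 0.5) * dv          # midpoint rule in v
            f0 = math.exp(-v * v) / math.sqrt(math.pi)
            df = eps * math.cos(x) * (1.0 + 0.5 * v) * f0
            f = f0 + df
            w = dx * dv
            s_exact -= f * math.log(f) * w
            s_equil -= f0 * math.log(f0) * w
            s_quad -= df * df / (2.0 * f0) * w
    return s_exact, s_equil, s_quad


if __name__ == "__main__":
    s_exact, s_equil, s_quad = entropy_check()
    residual = s_exact - (s_equil + s_quad)
    print("quadratic term:", s_quad, "residual:", residual)
```

Because the perturbation averages to zero in space, the term linear in $\delta f$ drops out and the residual is of higher order in the amplitude than the quadratic term, which is the content of the expansion above.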
The contribution to the perturbed entropy from the gyrocenter distribution is the integral of $-h_s^2/2F_{0s}$, whose evolution equation (70) can be viewed as the gyrokinetic version of the H-theorem. The first term on the right-hand side of this equation represents the wave–particle interaction (collisionless damping). Under time average, it is related to the work done on plasma [Eq. (71)] and hence to the average externally injected power $P_{\rm ext}$ via time-averaged Eq. (72).$^{20}$ In a stationary state, this is balanced by the second term on the right-hand side of Eq. (70), which is the collisional-heating, or entropy-production, term that also appears in Eq. (76). Thus, the generalized energy channeled by collisionless damping into entropy fluctuations is eventually converted into heat by collisions. The sub-gyroscale entropy cascade, which brings the perturbed distribution function $h_s$ to collisional scales, will be discussed further in § 7.9 and § 7.10 (see also Schekochihin et al. 2008b).

This concludes a short primer on gyrokinetics necessary (and sufficient) for adequate understanding of what is to follow. Formally, all further analytical derivations in this paper are simply subsidiary expansions of the gyrokinetics in the parameters we listed in § 3.1: in § 4, we expand in $(m_e/m_i)^{1/2}$, in § 5 in $k_\perp\rho_i$ (followed by further subsidiary expansions in large and small $k_\parallel\lambda_{mfpi}$ in § 6), and in § 7 in $1/k_\perp\rho_i$.

4. ISOTHERMAL ELECTRON FLUID

In this section, we carry out an expansion of the electron gyrokinetic equation in powers of $(m_e/m_i)^{1/2} \simeq 0.02$ (for hydrogen plasma). In virtually all cases of interest, this expansion can be done while still considering $\beta_i$, $k_\perp\rho_i$, and $k_\parallel\lambda_{mfpi}$ to be order unity.$^{21}$ Note that the assumption $k_\perp\rho_i \sim 1$ together with Eq. (45) means that

$$k_\perp\rho_e \sim k_\perp\rho_i\,(m_e/m_i)^{1/2} \ll 1, \tag{77}$$

i.e., the expansion in $(m_e/m_i)^{1/2}$ means also that we are considering scales larger than the electron gyroradius.
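To make the size of the expansion parameter concrete, the following short sketch evaluates $(m_e/m_i)^{1/2}$ for hydrogen and the corresponding $k_\perp\rho_e$ at $k_\perp\rho_i = 1$. It assumes the relation $\rho_e/\rho_i = (Z/\sqrt{\tau})(m_e/m_i)^{1/2}$, which follows from the definitions $\rho_s = v_{ths}/\Omega_s$, $v_{ths} = (2T_s/m_s)^{1/2}$ and $\Omega_s = |q_s|B_0/cm_s$; the function name and the default arguments are choices made for this illustration.

```python
import math

# Electron-to-proton mass ratio (dimensionless).
ELECTRON_TO_PROTON_MASS = 1.0 / 1836.15267


def electron_to_ion_gyroradius(Z=1.0, tau=1.0,
                               mass_ratio=ELECTRON_TO_PROTON_MASS):
    """rho_e / rho_i = (Z / sqrt(tau)) * sqrt(m_e / m_i)."""
    return Z / math.sqrt(tau) * math.sqrt(mass_ratio)


if __name__ == "__main__":
    ratio = electron_to_ion_gyroradius()
    # For Z = 1 and tau = 1 this is just sqrt(m_e / m_i).
    print("sqrt(m_e/m_i) =", ratio)
    print("k_perp rho_e at k_perp rho_i = 1:", 1.0 * ratio)
```

For hydrogen with equal temperatures the ratio is about 0.023, consistent with the value $\simeq 0.02$ quoted above, so the scales with $k_\perp\rho_i \sim 1$ lie well above the electron gyroradius.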
The idea of such an expansion of the electron kinetic equation has been utilized many times in plasma physics literature. The mass-ratio expansion of the gyrokinetic equation in a form very similar to what is presented below is found in Snyder & Hammett (2001).

$^{20}$ Note that Eq. (72) is valid not only in the integral form but also individually for each wavenumber: indeed, using the Fourier-transformed Faraday and Ampère's laws, we have $\mathbf{E}_{\mathbf{k}}\cdot\mathbf{j}^*_{\mathbf{k}} + \mathbf{E}^*_{\mathbf{k}}\cdot\mathbf{j}_{\mathbf{k}} = \mathbf{E}_{\mathbf{k}}\cdot\mathbf{j}^*_{{\rm ext},\mathbf{k}} + \mathbf{E}^*_{\mathbf{k}}\cdot\mathbf{j}_{{\rm ext},\mathbf{k}} - (1/4\pi)\,\partial|\delta\mathbf{B}_{\mathbf{k}}|^2/\partial t$. In a stationary state, time averaging eliminates the time derivative of the magnetic-fluctuation energy, so $\mathbf{E}_{\mathbf{k}}\cdot\mathbf{j}^*_{\mathbf{k}} + \mathbf{E}^*_{\mathbf{k}}\cdot\mathbf{j}_{\mathbf{k}} = 0$ at all $\mathbf{k}$ except those corresponding to the outer scale, where the external energy injection occurs. This means that below the outer scale, the work done on one species balances the work done on the other. The wave–particle interaction term in the gyrokinetic equation is responsible for this energy exchange.

$^{21}$ One notable exception is the LAPD device at UCLA, where $\beta \sim 10^{-4} - 10^{-3}$ (due mostly to the electron pressure because the ions are cold, $\tau \sim 0.1$, so $\beta_i \sim \beta_e/10$; see, e.g., Morales et al. 1999; Carter et al. 2006). This interferes with the mass-ratio expansion.

The primary import of this section will be technical: we shall dispense with the electron gyrokinetic equation and thus prepare the necessary ground for further approximations. The main results are summarized in § 4.9. A reader who is only interested in following qualitatively the major steps in the derivation may skip to this summary.

4.1. Ordering the Terms in the Kinetic Equation

In view of Eq. (77), $a_e \ll 1$, so we can expand the Bessel functions arising from averaging over the electron ring motion:

$$J_0(a_e) = 1 - \frac{a_e^2}{4} + \cdots, \qquad \frac{J_1(a_e)}{a_e} = \frac{1}{2}\left(1 - \frac{a_e^2}{8} + \cdots\right). \tag{78}$$

Keeping only the lowest-order terms of the above expansions in Eq.
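(69) for $\langle\chi\rangle_{\mathbf{R}_e}$, then substituting this $\langle\chi\rangle_{\mathbf{R}_e}$ and $q_e = -e$ in the electron gyrokinetic equation, we get the following kinetic equation for the electrons, accurate up to and including the first order in $(m_e/m_i)^{1/2}$ (or in $k_\perp\rho_e$):

$$\underbrace{\frac{\partial h_e}{\partial t}}_{(1)} + \underbrace{v_\parallel\frac{\partial h_e}{\partial z}}_{(0)} + \frac{c}{B_0}\Big\{\underbrace{\varphi}_{(1)} - \underbrace{\frac{v_\parallel A_\parallel}{c}}_{(0)} - \underbrace{\frac{T_{0e}}{e}\frac{v_\perp^2}{v_{the}^2}\frac{\delta B_\parallel}{B_0}}_{(1)},\,h_e\Big\} = -\frac{eF_{0e}}{T_{0e}}\frac{\partial}{\partial t}\Big[\underbrace{\varphi}_{(1)} - \underbrace{\frac{v_\parallel A_\parallel}{c}}_{(0)} - \underbrace{\frac{T_{0e}}{e}\frac{v_\perp^2}{v_{the}^2}\frac{\delta B_\parallel}{B_0}}_{(1)}\Big] + \underbrace{\left(\frac{\partial h_e}{\partial t}\right)_c}_{(0)}. \tag{79}$$

Note that $\varphi$, $A_\parallel$, $\delta B_\parallel$ in Eq. (79) are taken at $\mathbf{r} = \mathbf{R}_e$. We have indicated the lowest order to which each of the terms enters if compared with $v_\parallel\partial h_e/\partial z$. In order to obtain these estimates, we have assumed that the physical ordering introduced in § 3.1 holds with respect to the subsidiary expansion in $(m_e/m_i)^{1/2}$ as well as for the primary gyrokinetic expansion in $\epsilon$, so we can use Eqs. (3) and (12) to order terms with respect to $(m_e/m_i)^{1/2}$. We have also made use of Eqs. (45), (47), and of the following three relations:

$$\frac{v_\parallel}{v_A} \sim \frac{v_{the}}{v_A} = \sqrt{\frac{\beta_i}{\tau}}\left(\frac{m_i}{m_e}\right)^{1/2}, \tag{80}$$

$$\frac{(v_\parallel/c)A_\parallel}{\varphi} \sim \frac{v_{the}\,\delta B_\perp}{ck_\perp\varphi} \sim \frac{v_{the}}{v_A}, \tag{81}$$

$$\frac{T_{0e}}{e\varphi}\frac{v_\perp^2}{v_{the}^2}\frac{\delta B_\parallel}{B_0} \sim \frac{Z}{\tau}\,k_\perp\rho_i\sqrt{\beta_i}. \tag{82}$$

The collision term is estimated to be zeroth order because [see Eqs. (49), (50)]

$$\left(\frac{\partial h_e}{\partial t}\right)_c\Big/\,v_\parallel\frac{\partial h_e}{\partial z} \sim \frac{\nu_{ei}}{k_\parallel v_{the}} = \frac{\tau^2}{Z^2}\frac{1}{k_\parallel\lambda_{mfpi}}. \tag{83}$$

The consequences of other possible orderings of the collision terms are discussed in § 4.8. We remind the reader that all dimensionless parameters except $k_\parallel/k_\perp \sim \epsilon$ and $(m_e/m_i)^{1/2}$ are held to be order unity.

We now let $h_e = h_e^{(0)} + h_e^{(1)} + \dots$ and carry out the expansion to two lowest orders in $(m_e/m_i)^{1/2}$.

4.2. Zeroth Order

To zeroth order, the electron kinetic equation is

$$v_\parallel\hat{\mathbf{b}}\cdot\nabla h_e^{(0)} = v_\parallel\frac{eF_{0e}}{cT_{0e}}\frac{\partial A_\parallel}{\partial t} + \left(\frac{\partial h_e^{(0)}}{\partial t}\right)_c, \tag{84}$$

where we have assembled the terms in the left-hand side to take the form of the derivative of the distribution function along the perturbed magnetic field:

$$\hat{\mathbf{b}}\cdot\nabla = \frac{\partial}{\partial z} + \frac{\delta\mathbf{B}_\perp}{B_0}\cdot\nabla = \frac{\partial}{\partial z} - \frac{1}{B_0}\left\{A_\parallel,\cdots\right\}. \tag{85}$$

We now multiply Eq. (84) by $h_e^{(0)}/F_{0e}$ and integrate over $\mathbf{v}$ and $\mathbf{r}$ (since we are only retaining lowest-order terms, the distinction between $\mathbf{r}$ and $\mathbf{R}_e$ does not matter here). Since $\nabla\cdot\mathbf{B} = 0$, the left-hand side vanishes (assuming that all perturbations are either periodic or vanish at the boundaries) and we get

$$\int d^3\mathbf{r}\int d^3\mathbf{v}\,\frac{h_e^{(0)}}{F_{0e}}\left(\frac{\partial h_e^{(0)}}{\partial t}\right)_c = -\frac{en_{0e}}{cT_{0e}}\int d^3\mathbf{r}\,\frac{\partial A_\parallel}{\partial t}\,u_{\parallel e}^{(0)} = 0. \tag{86}$$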
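The right-hand side of this equation is zero because the electron flow velocity is zero in the zeroth order, $u_{\parallel e}^{(0)} = (1/n_{0e})\int d^3\mathbf{v}\,v_\parallel h_e^{(0)} = 0$. This is a consequence of the parallel Ampère's law [Eq. (62)], which can be written as follows:

$$u_{\parallel e} = \frac{c}{4\pi en_{0e}}\nabla_\perp^2A_\parallel + u_{\parallel i}, \tag{87}$$

where

$$u_{\parallel i} = \sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}\frac{1}{n_{0i}}\int d^3\mathbf{v}\,v_\parallel J_0(a_i)\,h_{i\mathbf{k}}. \tag{88}$$

The three terms in Eq. (87) can be estimated as follows:

$$\frac{u_{\parallel e}^{(0)}}{v_A} \sim \frac{\epsilon v_{the}}{v_A} \sim \sqrt{\frac{\beta_i}{\tau}}\left(\frac{m_i}{m_e}\right)^{1/2}\epsilon, \tag{89}$$

$$\frac{u_{\parallel i}}{v_A} \sim \epsilon, \tag{90}$$

$$\frac{c\nabla_\perp^2A_\parallel}{4\pi en_{0e}v_A} \sim \frac{k_\perp\rho_i}{\sqrt{\beta_i}}\,\epsilon, \tag{91}$$

where we have used the fundamental ordering (12) of the slow waves ($u_{\parallel i} \sim \epsilon v_A$) and Alfvén waves ($\delta B_\perp \sim \epsilon B_0$). Thus, the two terms in the right-hand side of Eq. (87) are one order of $(m_e/m_i)^{1/2}$ smaller than $u_{\parallel e}^{(0)}$, which means that to zeroth order, the parallel Ampère's law is $u_{\parallel e}^{(0)} = 0$.

The collision operator in Eq. (86) contains electron–electron and electron–ion collisions. To lowest order in $(m_e/m_i)^{1/2}$, the electron–ion collision operator is simply the pitch-angle scattering operator [see Eq. (B20) in Appendix B and recall that $u_{\parallel i}$ is first order]. Therefore, we may rewrite Eq. (86) as follows:

$$\int d^3\mathbf{r}\int d^3\mathbf{v}\,\frac{h_e^{(0)}}{F_{0e}}\,C_{ee}[h_e^{(0)}] - \int d^3\mathbf{r}\int d^3\mathbf{v}\,\frac{\nu_D^{ei}(v)}{2F_{0e}}\left(1 - \xi^2\right)\left(\frac{\partial h_e^{(0)}}{\partial\xi}\right)^2 = 0. \tag{92}$$

Both terms in this expression are negative definite and must, therefore, vanish individually. This implies that $h_e^{(0)}$ must be a perturbed Maxwellian distribution with zero mean velocity (this follows from the proof of Boltzmann's H theorem; see, e.g., Longmire 1963), i.e., the full electron distribution function to zeroth order in the mass-ratio expansion is [see Eq. (54)]:

$$f_e = F_{0e} + \frac{e\varphi}{T_{0e}}F_{0e} + h_e^{(0)} = \frac{n_e}{(2\pi T_e/m_e)^{3/2}}\exp\left(-\frac{m_ev^2}{2T_e}\right), \tag{93}$$

where $n_e = n_{0e} + \delta n_e$, $T_e = T_{0e} + \delta T_e$. Expanding around the unperturbed Maxwellian $F_{0e}$, we get

$$h_e^{(0)} = \left[\frac{\delta n_e}{n_{0e}} - \frac{e\varphi}{T_{0e}} + \left(\frac{v^2}{v_{the}^2} - \frac{3}{2}\right)\frac{\delta T_e}{T_{0e}}\right]F_{0e}, \tag{94}$$

where the fields are taken at $\mathbf{r} = \mathbf{R}_e$. Now substitute this solution back into Eq. (84). The collision term vanishes and the remaining equation must be satisfied at all values of $v$. This gives

$$\frac{1}{c}\frac{\partial A_\parallel}{\partial t} + \hat{\mathbf{b}}\cdot\nabla\varphi = \hat{\mathbf{b}}\cdot\nabla\frac{T_{0e}}{e}\frac{\delta n_e}{n_{0e}}, \tag{95}$$

$$\hat{\mathbf{b}}\cdot\nabla\delta T_e = 0. \tag{96}$$

The collision term is neglected in Eq. (95) because, for $h_e^{(0)}$ given by Eq. (94), it vanishes to zeroth order.

4.3.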
Flux Conservation

Equation (95) implies that the magnetic flux is conserved and magnetic-field lines cannot be broken to lowest order in the mass-ratio expansion. Indeed, we may follow Cowley (1985) and argue that the left-hand side of Eq. (95) is minus the projection of the electric field on the total magnetic field [see Eq. (37)], so we have

$$\mathbf{E}\cdot\hat{\mathbf{b}} = -\hat{\mathbf{b}}\cdot\nabla\left(\frac{T_{0e}}{e}\frac{\delta n_e}{n_{0e}}\right); \tag{97}$$

hence the total electric field is

$$\mathbf{E} = \left(\hat{\mathbf{I}} - \hat{\mathbf{b}}\hat{\mathbf{b}}\right)\cdot\left(\mathbf{E} + \nabla\frac{T_{0e}\delta n_e}{en_{0e}}\right) - \nabla\frac{T_{0e}\delta n_e}{en_{0e}} \tag{98}$$

and Faraday's law becomes

$$\frac{\partial\mathbf{B}}{\partial t} = -c\nabla\times\mathbf{E} = \nabla\times\left(\mathbf{u}_{\rm eff}\times\mathbf{B}\right), \tag{99}$$

$$\mathbf{u}_{\rm eff} = \frac{c}{B^2}\left(\mathbf{E} + \nabla\frac{T_{0e}\delta n_e}{en_{0e}}\right)\times\mathbf{B}, \tag{100}$$

i.e., the magnetic field lines are frozen into the velocity field $\mathbf{u}_{\rm eff}$. In Appendix C.1, we show that this effective velocity is the part of the electron flow velocity $\mathbf{u}_e$ perpendicular to the total magnetic field $\mathbf{B}$ [see Eq. (C6)].

The flux conservation is broken in the higher orders of the mass-ratio expansion. In the first order, Ohmic resistivity formally enters in Eq. (95) (unless collisions are even weaker than assumed so far; if they are downgraded one order as is done in § 4.8.3, resistivity enters in the second order). In the second order, the electron inertia and the finiteness of the electron gyroradius also lead to unfreezing of the flux. This can be seen formally by keeping second-order terms in Eq. (79), multiplying it by $v_\parallel$ and integrating over velocities. The relative importance of these flux unfreezing mechanisms is evaluated in § 7.7.

4.4. Isothermal Electrons

Equation (96) mandates that the perturbed electron temperature must remain constant along the perturbed field lines. Strictly speaking, this does not preclude $\delta T_e$ varying across the field lines. However, we shall now assume $\delta T_e = {\rm const}$ (has no spatial variation), which is justified, e.g., if the field lines are stochastic. Assuming that no spatially uniform perturbations exist, we may set $\delta T_e = 0$. Equation (94) then reduces to

$$h_e^{(0)} = \left(\frac{\delta n_e}{n_{0e}} - \frac{e\varphi}{T_{0e}}\right)F_{0e}(v), \tag{101}$$

or, using Eq. (54),

$$\delta f_e = \frac{\delta n_e}{n_{0e}}\,F_{0e}(v). \tag{102}$$

Hence follows the equation of state for isothermal electrons:

$$\delta p_e = T_{0e}\delta n_e. \tag{103}$$

4.5.
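First Order

We now integrate Eq. (79) over the velocity space and retain the lowest (first) order terms only. Using Eq. (101), we get

$$\frac{\partial}{\partial t}\left(\frac{\delta n_e}{n_{0e}} - \frac{\delta B_\parallel}{B_0}\right) + \frac{c}{B_0}\left\{\varphi,\frac{\delta n_e}{n_{0e}} - \frac{\delta B_\parallel}{B_0}\right\} + \frac{cT_{0e}}{eB_0}\left\{\frac{\delta n_e}{n_{0e}},\frac{\delta B_\parallel}{B_0}\right\} + \frac{\partial u_{\parallel e}}{\partial z} - \frac{1}{B_0}\left\{A_\parallel,u_{\parallel e}\right\} = 0, \tag{104}$$

where the parallel electron velocity is first order:

$$u_{\parallel e} = u_{\parallel e}^{(1)} = \frac{1}{n_{0e}}\int d^3\mathbf{v}\,v_\parallel h_e^{(1)}. \tag{105}$$

The velocity-space integral of the collision term does not enter because it is subdominant by at least one factor of $(m_e/m_i)^{1/2}$: indeed, as shown in Appendix B.1, the velocity integration leads to an extra factor of $k_\perp^2\rho_e^2$, so that

$$\frac{1}{n_{0e}}\int d^3\mathbf{v}\left(\frac{\partial h_e}{\partial t}\right)_c \sim \nu_{ei}k_\perp^2\rho_e^2\,\frac{\delta n_e}{n_{0e}} \sim \sqrt{\tau}\left(\frac{m_e}{m_i}\right)^{1/2}k_\perp^2\rho_i^2\,\nu_{ii}\,\frac{\delta n_e}{n_{0e}}, \tag{106}$$

where we have used Eqs. (45) and (50). The collision term is subdominant because of the ordering of the ion collision frequency given by Eq. (49).

4.6. Field Equations

Using Eq. (101) and $q_i = Ze$, $n_{0e} = Zn_{0i}$, $T_{0e} = T_{0i}/\tau$, we derive from the quasi-neutrality equation (61) [see also Eq. (65)]

$$\frac{\delta n_e}{n_{0e}} = -\frac{Ze\varphi}{T_{0i}} + \sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}\frac{1}{n_{0i}}\int d^3\mathbf{v}\,J_0(a_i)\,h_{i\mathbf{k}}, \tag{107}$$

and, from the perpendicular part of Ampère's law [Eq. (66), using also Eq. (107)],

$$\frac{\delta B_\parallel}{B_0} = \frac{\beta_i}{2}\left\{\left(1 + \frac{Z}{\tau}\right)\frac{Ze\varphi}{T_{0i}} - \sum_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}}\frac{1}{n_{0i}}\int d^3\mathbf{v}\left[\frac{Z}{\tau}J_0(a_i) + \frac{2v_\perp^2}{v_{thi}^2}\frac{J_1(a_i)}{a_i}\right]h_{i\mathbf{k}}\right\}. \tag{108}$$

The parallel electron velocity, $u_{\parallel e}$, is determined from the parallel part of Ampère's law, Eq. (87).

The ion distribution function $h_i$ that enters these equations has to be determined by solving the ion gyrokinetic equation: Eq. (57) with $s = i$.

4.7. Generalized Energy

The generalized energy (§ 3.4) for the case of isothermal electrons is calculated by substituting Eq. (102) into Eq. (74):

$$W = \int d^3\mathbf{r}\left[\int d^3\mathbf{v}\,\frac{T_{0i}\delta f_i^2}{2F_{0i}} + \frac{n_{0e}T_{0e}}{2}\frac{\delta n_e^2}{n_{0e}^2} + \frac{|\delta\mathbf{B}|^2}{8\pi}\right], \tag{109}$$

where $\delta f_i = h_i - (Ze\varphi/T_{0i})F_{0i}$ [see Eq. (54)].

4.8. Validity of the Mass-Ratio Expansion

Let us examine the range of spatial scales in which the equations derived above are valid. In carrying out the expansion in $(m_e/m_i)^{1/2}$, we ordered $k_\perp\rho_i \sim 1$ [Eq. (77)] and $k_\parallel\lambda_{mfpi} \sim 1$ [Eq. (83)]. Formally, this means that the perpendicular and parallel wavelengths of the perturbations must not be so small or so large as to interfere with the mass-ratio expansion. We now discuss the four conditions that this requirement leads to and whether any of them can be violated without destroying the validity of the equations derived above.

4.8.1.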
k⊥ρi ≪ (mi/me)^{1/2}
This is equivalent to demanding that k⊥ρe ≪ 1, a condition that was, indeed, essential for the expansion to hold [Eq. (78)]. This is not a serious limitation because electrons can be considered well magnetized at virtually all scales of interest for astrophysical applications. However, we do forfeit the detailed information about some important electron physics at k⊥ρe ∼ 1: for example, such effects as wave damping at the electron gyroscale and the electron heating (although the total amount of the electron heating can be deduced by subtracting the ion heating from the total energy input). The breaking of the flux conservation (resistivity) is also an effect that requires incorporation of the finite electron gyroscale physics.
4.8.2. k⊥ρi ≫ (me/mi)^{1/2}
If this condition is broken, the small-k⊥ρi expansion, carried out in § 5, must, formally speaking, precede the mass-ratio expansion. However, it turns out that the small-k⊥ρi expansion commutes with the mass-ratio expansion (Schekochihin et al. 2007, see also footnote 23), so we may use the equations derived in §§ 4.2-4.6 when k⊥ρi ≲ (me/mi)^{1/2}.
4.8.3. k‖λmfpi ≪ (mi/me)^{1/2}
Let us consider what happens if this condition is broken and k‖λmfpi ≳ (mi/me)^{1/2}. In this case, the collisions become even weaker and the expansion procedure must be modified. Namely, the collision term picks up one extra order of (me/mi)^{1/2}, so it is first order in Eq. (79). To zeroth order, the electron kinetic equation no longer contains collisions: instead of Eq. (84), we have
\[ v_\parallel\,\hat{\mathbf{b}}\cdot\nabla h_e^{(0)} = v_\parallel\,\frac{e F_{0e}}{c T_{0e}}\frac{\partial A_\parallel}{\partial t}. \tag{110} \]
We may seek the solution of this equation in the form h_e^{(0)} = H(t,R_e) F0e + h_{e,hom}^{(0)}, where H(t,R_e) is an unknown function to be determined and h_{e,hom}^{(0)} is the homogeneous solution satisfying
\[ \hat{\mathbf{b}}\cdot\nabla h_{e,{\rm hom}}^{(0)} = 0, \tag{111} \]
i.e., h_{e,hom}^{(0)} must be constant along the perturbed magnetic field. This is a generalization of Eq. (96). Again assuming stochastic field lines, we conclude that h_{e,hom}^{(0)} is independent of space. If we rule out spatially uniform perturbations, we may set h_{e,hom}^{(0)} = 0. The unknown function H(t,R_e) is readily expressed in terms of δne and ϕ:
\[ \frac{\delta n_e}{n_{0e}} - \frac{e\varphi}{T_{0e}} = \frac{1}{n_{0e}}\int d^3\mathbf{v}\, h_e^{(0)} \quad\Rightarrow\quad H = \frac{\delta n_e}{n_{0e}} - \frac{e\varphi}{T_{0e}}, \tag{112} \]
so h_e^{(0)} is again given by Eq. (101) and the equations derived in §§ 4.2-4.6 are unaltered. Thus, the mass-ratio expansion remains valid at k‖λmfpi ≳ (mi/me)^{1/2}.
FIG. 4. Region of validity in the wavenumber space of the secondary approximation: isothermal electrons and gyrokinetic ions (§ 4). It is the region of validity of the gyrokinetic approximation (Fig. 3) further circumscribed by two conditions: k‖λmfpi ≫ (me/mi)^{1/2} (isothermal electrons) and k⊥ρe ≪ 1 (magnetized electrons). The region of validity of the strongly magnetized two-fluid theory (Appendix A.2) is also shown. It is the same as for the full two-fluid theory plus the additional constraint k⊥ρi ≪ k‖λmfpi. The region of validity of MHD (or one-fluid theory) is the subset of this with k‖λmfpi ≪ (me/mi)^{1/2} (adiabatic electrons).
4.8.4. k‖λmfpi ≫ (me/mi)^{1/2}
If the parallel wavelength of the fluctuations is so long that this is violated, k‖λmfpi ≲ (me/mi)^{1/2}, the collision term in Eq. (79) is minus first order. This is the lowest-order term in the equation. Setting it to zero obliges h_e^{(0)} to be a perturbed Maxwellian again given by Eq. (94). Instead of Eq. (84), the zeroth-order kinetic equation is
\[ v_\parallel\,\hat{\mathbf{b}}\cdot\nabla h_e^{(0)} = v_\parallel\,\frac{e F_{0e}}{c T_{0e}}\frac{\partial A_\parallel}{\partial t} + \left(\frac{\partial h_e^{(1)}}{\partial t}\right)_{\rm c}. \tag{113} \]
Now the collision term in this order contains h_e^{(1)}, which can be determined from Eq. (113) by inverting the collision operator. This sets up a perturbation theory that in due course leads to the Reduced MHD version of the general MHD equations; this is what was considered in § 2. Equation (96) no longer needs to hold, so the electrons are not isothermal. In this true one-fluid limit, both electrons and ions are adiabatic with equal temperatures [see Eq. (115) below].
The collisional transport terms in this limit (parallel and perpendicular resistivity, viscosity, heat fluxes, etc.) were calculated [starting not from gyrokinetics but from the gen- eral Vlasov–Landau equation (36)] in exhaustive detail by Braginskii (1965). His results and the way RMHD emerges from them are reviewed in Appendix A. In physical terms, the electrons can no longer be isothermal if the parallel electron diffusion time becomes longer than the characteristic time of the fluctuations (the Alfvén time): vtheλmfpik2‖ ⇔ k‖λmfpi . . (114) Furthermore, under a similar condition, electron and ion tem- peratures must equalize: this happens if the ion–electron col- lision time is shorter than the Alfvén time, ⇔ k‖λmfpi . (115) (see Lithwick & Goldreich 2001 for a discussion of these con- ditions in application to the ISM). 4.9. Summary The original gyrokinetic description introduced in § 3 was a system of two kinetic equations [Eq. (57)] that evolved the electron and ion distribution functions he, hi and three field 20 SCHEKOCHIHIN ET AL. equations [Eqs. (61-63)] that related ϕ, A‖ and δB‖ to he and hi. In this section, we have taken advantage of the smallness of the electron mass to treat the electrons as an isothermal magnetized fluid, while ions remained fully gyrokinetic. In mathematical terms, we solved the electron kinetic equa- tion and replaced the gyrokinetics with a simpler closed sys- tem of equations that evolve 6 unknown functions: ϕ, A‖, δB‖, δne, u‖e and hi. These satisfy two fluid-like evolution equa- tions (95) and (104), three integral relations (107), (108), and (87) which involve hi, and the kinetic equation (57) for hi. The system is simpler because the full electron distribution function has been replaced by two scalar fields δne and u‖e. 
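We now summarize this new system of equations: denoting ai = k⊥v⊥/Ωi, we have
\[ \frac{1}{c}\frac{\partial A_\parallel}{\partial t} + \hat{\mathbf{b}}\cdot\nabla\varphi = \hat{\mathbf{b}}\cdot\nabla\left(\frac{T_{0e}}{e}\frac{\delta n_e}{n_{0e}}\right), \tag{116} \]
\[ \frac{d}{dt}\left(\frac{\delta n_e}{n_{0e}} - \frac{\delta B_\parallel}{B_0}\right) + \hat{\mathbf{b}}\cdot\nabla u_{\parallel e} = -\frac{c T_{0e}}{e B_0}\left\{\frac{\delta n_e}{n_{0e}}, \frac{\delta B_\parallel}{B_0}\right\}, \tag{117} \]
\[ \frac{\delta n_e}{n_{0e}} = -\frac{Ze\varphi}{T_{0i}} + \sum_{\mathbf{k}} e^{i\mathbf{k}\cdot\mathbf{r}}\,\frac{1}{n_{0i}}\int d^3\mathbf{v}\, J_0(a_i)\, h_{i\mathbf{k}}, \tag{118} \]
\[ u_{\parallel e} = \frac{c}{4\pi e n_{0e}}\nabla_\perp^2 A_\parallel + \sum_{\mathbf{k}} e^{i\mathbf{k}\cdot\mathbf{r}}\,\frac{1}{n_{0i}}\int d^3\mathbf{v}\, v_\parallel J_0(a_i)\, h_{i\mathbf{k}}, \tag{119} \]
\[ \frac{\delta B_\parallel}{B_0} = \frac{\beta_i}{2}\left\{\left(1 + \frac{Z}{\tau}\right)\frac{Ze\varphi}{T_{0i}} - \sum_{\mathbf{k}} e^{i\mathbf{k}\cdot\mathbf{r}}\,\frac{1}{n_{0i}}\int d^3\mathbf{v}\left[\frac{Z}{\tau}J_0(a_i) + \frac{2v_\perp^2}{v_{{\rm th}i}^2}\frac{J_1(a_i)}{a_i}\right]h_{i\mathbf{k}}\right\}, \tag{120} \]
and Eq. (57) for s = i and ion–ion collisions only:
\[ \frac{\partial h_i}{\partial t} + v_\parallel\frac{\partial h_i}{\partial z} + \frac{c}{B_0}\left\{\langle\chi\rangle_{\mathbf{R}_i}, h_i\right\} = \frac{Ze}{T_{0i}}\frac{\partial\langle\chi\rangle_{\mathbf{R}_i}}{\partial t}F_{0i} + \left\langle C_{ii}[h_i]\right\rangle_{\mathbf{R}_i}, \tag{121} \]
where ⟨Cii[. . .]⟩Ri is the gyrokinetic ion–ion collision operator (see Appendix B) and the ion–electron collisions have been neglected to lowest order in (me/mi)^{1/2} [see Eq. (51)]. Note that Eqs. (116-117) have been written in a compact form, where
\[ \frac{d}{dt} = \frac{\partial}{\partial t} + \mathbf{u}_E\cdot\nabla = \frac{\partial}{\partial t} + \frac{c}{B_0}\left\{\varphi, \cdots\right\} \tag{122} \]
is the convective derivative with respect to the E×B drift velocity, uE = −c∇⊥ϕ× ẑ/B0, and
\[ \hat{\mathbf{b}}\cdot\nabla = \frac{\partial}{\partial z} + \frac{\delta\mathbf{B}_\perp}{B_0}\cdot\nabla = \frac{\partial}{\partial z} - \frac{1}{B_0}\left\{A_\parallel, \cdots\right\} \tag{123} \]
is the gradient along the total magnetic field (mean field plus perturbation).
The generalized energy conserved by Eqs. (116-121) is given by Eq. (109).
It is worth observing that the left-hand side of Eq. (116) is simply minus the component of the electric field along the total magnetic field [see Eq. (37)]. This was used in § 4.3 to prove that the magnetic flux described by Eq. (116) is exactly conserved (see § 7.7 for a discussion of scales at which this conservation is broken). Equation (116) is the projection of the generalized Ohm's law onto the total magnetic field; the right-hand side of this equation is the so-called thermoelectric term. This is discussed in more detail in Appendix C.1, where we also show that Eq. (117) is the parallel part of Faraday's law and give a qualitative non-gyrokinetic derivation of Eqs. (116-117).
We will refer to Eqs. (116-121) as the equations of isothermal electron fluid. They are valid in a broad range of scales: the only constraints are that k‖ ≪ k⊥ (gyrokinetic ordering, § 3.1), k⊥ρe ≪ 1 (electrons are magnetized, § 4.8.1) and k‖λmfpi ≫ (me/mi)^{1/2} (electrons are isothermal, § 4.8.4). The region of validity of Eqs. (116-121) in the wavenumber space is illustrated in Fig. 4.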
A particular advantage of this hybrid fluid-kinetic system is that it is uniformly valid across the transition from magnetized to unmagnetized ions (i.e., from k⊥ρi ≪ 1 to k⊥ρi ≫ 1). 5. TURBULENCE IN THE INERTIAL RANGE: KINETIC RMHD Our goal in this section is to derive a reduced set of equa- tions that describe the magnetized plasma in the limit of small k⊥ρi. Before we proceed with an expansion in k⊥ρi, we need to make a formal technical step, the usefulness of which will become clear shortly. A reader with no patience for this or any of the subsequent technical developments may skip to the summary at the end of this section (§ 5.7). 5.1. A Technical Step Let us formally split the ion gyrocenter distribution function into two parts: v⊥ ·A⊥ F0i + g eik·Ri J0(ai) v2thi J1(ai) F0i + g. (124) Then g satisfies the following equation, obtained by substitut- ing Eq. (124) and the expression for ∂A‖/∂t that follows from Eq. (116) into the ion gyrokinetic equation (121): {〈χ〉Ri ,g}− 〈Cii[g]〉Ri ︸ ︷︷ ︸ A‖,ϕ− 〈ϕ〉Ri ︸ ︷︷ ︸ + b̂ ·∇ v⊥ ·A⊥ ︸ ︷︷ ︸ v⊥ ·A⊥ ︸ ︷︷ ︸ . (125) In the above equation, we have used compact notation in writing out the nonlinear terms: e.g., A‖,ϕ− 〈ϕ〉Ri A‖(r),ϕ(r) 〈A‖〉Ri ,〈ϕ〉Ri , where the first Poisson bracket involves derivatives with respect to r and the second with respect to Ri. KINETIC TURBULENCE IN MAGNETIZED PLASMAS 21 The field equations (118-120) rewritten in terms of g are −Γ1(αi) ︸ ︷︷ ︸ 1 −Γ0(αi) ]Zeϕk ︸ ︷︷ ︸ d3vJ0(ai)gk ︸ ︷︷ ︸ , (126) 4πen0e k2⊥A‖k ︸ ︷︷ ︸ d3vv‖J0(ai)gk ︸ ︷︷ ︸ = u‖ki, (127) ︸ ︷︷ ︸ Γ2(αi) + ︸ ︷︷ ︸ 1 −Γ1(αi) ]Zeϕk ︸ ︷︷ ︸ v2thi J1(ai) ︸ ︷︷ ︸ , (128) where ai = k⊥v⊥/Ωi, αi = k i /2 and we have defined Γ0(αi) = d3v [J0(ai)] 2 F0i = I0(αi)e −αi = 1 −αi + · · · , (129) Γ1(αi) = v2thi J0(ai) J1(ai) F0i = −Γ′0(αi) = [I0(αi) − I1(αi)] e−αi = 1 − αi + · · · , (130) Γ2(αi) = v2thi J1(ai) F0i = 2Γ1(αi). (131) Underneath each term in Eqs. (125-128), we have indicated the lowest order in k⊥ρi to which this term enters. 5.2. 
Subsidiary Ordering in k⊥ρi In order to carry out a subsidiary expansion in small k⊥ρi, we must order all terms in Eqs. (95-104) and (125-128) with respect to k⊥ρi. Let us again assume, like we did when ex- panding the electron equation (§ 4), that the ordering intro- duced for the gyrokinetics in § 3.1 holds also for the sub- sidiary expansion in k⊥ρi. First note that, in view of Eq. (47), we must regard Zeϕ/T0i to be minus first order: . (132) Also, as δB⊥/B0 ∼ ǫ [Eq. (12)], (v‖/c)A‖ ∼ vthiδB⊥ βi, (133) so ϕ and (v‖/c)A‖ are same order. Since u‖ = u‖i (electrons do not contribute to the mass flow), assuming that slow waves and Alfvén waves have comparable energies implies u‖i ∼ u⊥. As u‖i is determined by the second equality in Eq. (127), we can order g [using Eq. (12)]: , (134) so g is zeroth order in k⊥ρi. Similarly, δne/n0e ∼ δB‖/B0 ∼ ǫ are zeroth order in k⊥ρi—this follows directly from Eq. (12). Together with Eq. (3), the above considerations allow us to order all terms in our equations. The ordering of the collision term involving ϕ is explained in Appendix B.2. 5.3. Alfvén Waves: Kinetic Derivation of RMHD We shall now show that the RMHD equations (17-18) hold in this approximation. There is a simple correspondence be- tween the stream and flux functions defined in Eq. (16) and the electromagnetic potentials ϕ and A‖: ϕ, Ψ = − 4πmin0i . (135) The first of these definitions says that the perpendicular flow velocity u⊥ is the E×B drift velocity; the second definition is the standard MHD relation between the magnetic flux func- tion and the parallel component of the vector potential. 5.3.1. Derivation of Eq. (17) Deriving Eq. (17) is straightforward: in Eq. (95), we retain only the lowest—minus first—order terms (those that contain ϕ and A‖). The result is = 0. (136) Using Eq. (135) and the definition of the Alfvén speed, vA = 4πmin0i, we get Eq. (17). By the argument of § 4.3, Eq. 
(136) expresses the fact that that magnetic-field lines are frozen into the E×B velocity field, which is the mean flow velocity associated with the Alfvén waves (see § 5.4). 5.3.2. Derivation of Eq. (18) As we are about to see, in order to derive Eq. (18), we have to separate the first-order part of the k⊥ρi expansion. The easiest way to achieve this, is to integrate Eq. (125) over the velocity space (keeping r constant) and expand the resulting equation in small k⊥ρi. Using Eqs. (126) and (127) to express the velocity-space integrals of g, we get 1 −Γ0(αi) ]Zeϕk ︸ ︷︷ ︸ −Γ1(αi) ︸ ︷︷ ︸ 4πen0e k2⊥A‖k ︸ ︷︷ ︸ d3vJ0(ai){〈χ〉Ri ,g}k ︸ ︷︷ ︸ d3vJ0(ai) v⊥ ·A⊥ ︸ ︷︷ ︸ 22 SCHEKOCHIHIN ET AL. . (137) Underneath each term, the lowest order in k⊥ρi to which it enters is shown. We see that terms containing ϕ are all first order, so it is up to this order that we shall retain terms. The collision term integrated over the velocity space picks up two extra orders of k⊥ρi (see Appendix B.1), so it is second order and can, therefore, be dropped. As a consequence of quasi- neutrality, the zeroth-order part of the above equation exactly coincides with Eq. (104), i.e, δni/n0i = δne/n0e satisfy the same equation. Indeed, neglecting second-order terms (but not first-order ones!), the nonlinear term in Eq. (137) (the last term on the left-hand side) is d3vv‖g v2thi , (138) and, using Eqs. (126-128) to express velocity-space integrals of g in the above expression, we find that the zeroth-order part of the nonlinearity is the same as the nonlinearity in Eq. (104), while the first-order part is ρ2i ∇2⊥ 4πen0e ∇2⊥A‖ , (139) where we have used the expansion (129) of Γ0(αi) and con- verted it back into x space. Thus, if we subtract Eq. (104) from Eq. (137), the remain- der is first order and reads ρ2i ∇2⊥ ρ2i ∇2⊥ 4πen0e ∇2⊥A‖ − 4πen0e ∇2⊥A‖ = 0. (140) Multiplying Eq. (140) by 2T0i/Zeρ i and using Eq. (135), we get the second RMHD equation (18). 
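As a numerical aside, the small-k⊥ρi limits of the functions Γ0 and Γ1 [Eqs. (129) and (130)] that enter this expansion can be checked with a few lines of code. The following is a minimal sketch, not part of the original derivation: it assumes only the closed forms Γ0(α) = I0(α)e^{−α} and Γ1(α) = [I0(α) − I1(α)]e^{−α} = −Γ0′(α), and the function names and the series truncation are our own illustrative choices.

```python
import math

def _bessel_i(order, x, nterms=60):
    """Modified Bessel function I_0 or I_1 from its power series (fine for moderate x)."""
    q = 0.25 * x * x
    term = 1.0 if order == 0 else 0.5 * x  # k = 0 term of the series
    total = term
    for k in range(1, nterms):
        term *= q / (k * (k + order))      # ratio of consecutive series terms
        total += term
    return total

def gamma0(alpha):
    """Gamma_0(alpha) = I_0(alpha) exp(-alpha); alpha = k_perp^2 rho_i^2 / 2."""
    return math.exp(-alpha) * _bessel_i(0, alpha)

def gamma1(alpha):
    """Gamma_1(alpha) = [I_0(alpha) - I_1(alpha)] exp(-alpha) = -dGamma_0/dalpha."""
    return math.exp(-alpha) * (_bessel_i(0, alpha) - _bessel_i(1, alpha))

if __name__ == "__main__":
    for alpha in (1e-3, 1e-2, 1e-1):
        # Compare with the leading small-alpha forms 1 - alpha and 1 - 3*alpha/2.
        print(alpha, gamma0(alpha), 1.0 - alpha, gamma1(alpha), 1.0 - 1.5 * alpha)
```

For α ≪ 1, expanding I0 and I1 gives Γ0 ≃ 1 − α and Γ1 ≃ 1 − 3α/2, so 1 − Γ0 ≃ k⊥²ρi²/2 is the source of the factors ρi²∇⊥² that appear when the expansion is converted back to x space.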
We have established that the Alfvén-wave component of the turbulence is decoupled and fully described by the RMHD equations (17) and (18). This result is the same as that in § 2.2 but now we have proven that collisions do not affect the Alfvén waves and that a fluid-like description only requires k⊥ρi ≪ 1 to be valid.
5.4. Why Alfvén Waves Ignore Collisions
Let us write explicitly the distribution function of the ion gyrocenters [Eq. (124)] to two lowest orders in k⊥ρi:
\[ h_i = \frac{Ze}{T_{0i}}\langle\varphi\rangle_{\mathbf{R}_i}F_{0i} + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}F_{0i} + g + \cdots, \tag{141} \]
where, up to corrections of order k⊥²ρi², the ring-averaged scalar potential is ⟨ϕ⟩Ri = ϕ(Ri), the scalar potential taken at the position of the ion gyrocenter. Note that in Eq. (141), the first term is minus first order in k⊥ρi [see Eq. (132)], the second and third terms are zeroth order [Eq. (134)], and all terms of first and higher orders are omitted. In order to compute the full ion distribution function given by Eq. (54), we have to convert hi to the r space. Keeping terms up to zeroth order, we get
\[ \frac{Ze}{T_{0i}}\langle\varphi\rangle_{\mathbf{R}_i} \simeq \frac{Ze}{T_{0i}}\varphi(\mathbf{R}_i) = \frac{Ze}{T_{0i}}\left[\varphi(\mathbf{r}) + \frac{\mathbf{v}_\perp\times\hat{\mathbf{z}}}{\Omega_i}\cdot\nabla\varphi(\mathbf{r}) + \cdots\right] = \frac{Ze}{T_{0i}}\varphi(\mathbf{r}) + \frac{2\mathbf{v}_\perp\cdot\mathbf{u}_E}{v_{{\rm th}i}^2} + \cdots, \tag{142} \]
where uE = −c∇ϕ(r)× ẑ/B0 is the E×B drift velocity. Substituting Eq. (142) into Eq. (141) and then Eq. (141) into Eq. (54), we find
\[ f_i = F_{0i} + \frac{2\mathbf{v}_\perp\cdot\mathbf{u}_E}{v_{{\rm th}i}^2}F_{0i} + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}F_{0i} + g + \cdots. \tag{143} \]
The first two terms can be combined into a Maxwellian with mean perpendicular flow velocity u⊥ = uE. These are the terms responsible for the Alfvén waves. The remaining terms, which we shall denote δf̃i, are the perturbation of the Maxwellian in the moving frame of the Alfvén waves: they describe the passive (compressive) component of the turbulence (see § 5.5). Thus, the ion distribution function is
\[ f_i = \frac{n_{0i}}{(\pi v_{{\rm th}i}^2)^{3/2}}\exp\left[-\frac{(\mathbf{v}_\perp - \mathbf{u}_E)^2 + v_\parallel^2}{v_{{\rm th}i}^2}\right] + \delta\tilde{f}_i. \tag{144} \]
This sheds some light on the indifference of Alfvén waves to collisions: Alfvénic perturbations do not change the Maxwellian character of the ion distribution.
Unlike in a neutral fluid or gas, where viscosity arises when particles transport the local mean momentum a distance ∼ λmfpi, the particles in a magnetized plasma instantaneously take on the local E×B velocity (they take a cyclotron period to adjust, so, roughly speaking, ρi plays the role of the mean free path). Thus, there is no memory of the mean perpendicular motion and, therefore, no perpendicular momentum transport.
Some readers may find it illuminating to notice that Eq. (140) can be interpreted as stating simply ∇·j = 0: the first two terms represent the divergence of the polarization current, which is perpendicular to the magnetic field (footnote 22); the last two terms are b̂·∇j‖. No contribution to the current arises from the collisional term in Eq. (137) as ion–ion collisions cause no particle transport to lowest order in k⊥ρi.
22 The polarization-drift velocity is formally higher order than uE in the gyrokinetic expansion. However, since uE does not produce any current, the lowest-order contribution to the perpendicular current comes from the polarization drift. The higher-order contributions to the gyrocenter distribution function did not need to be calculated explicitly because the information about the polarization charge is effectively carried by the quasi-neutrality condition (61). We do not belabor this point because, in our approach, the notion of polarization charge is only ever brought in for interpretative purposes, but is not needed to carry out calculations. For further qualitative discussion of the role of the polarization charge and polarization drift in gyrokinetics, we refer the reader to Krommes (2006) and references therein.
5.5. Compressive Fluctuations
The equations that describe the density (δne) and magnetic-field-strength (δB‖) fluctuations follow immediately from Eqs. (125-128) if only zeroth-order terms are kept. In these equations, terms that involve ϕ and A‖ also contain factors ∼ k⊥²ρi² and are, therefore, first order [with the exception of the nonlinearity on the left-hand side of Eq. (125)]. The fact that ⟨Cii[⟨ϕ⟩Ri F0i]⟩Ri in Eq. (125) is first order is proved in Appendix B.2. Dropping these terms along with all other contributions of order higher than zeroth and making use of Eq. (69) to write out ⟨χ⟩Ri, we find that Eq. (125) takes the form
\[ \frac{dg}{dt} + v_\parallel\,\hat{\mathbf{b}}\cdot\nabla\left[g + \left(\frac{Z}{\tau}\frac{\delta n_e}{n_{0e}} + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}\right)F_{0i}\right] = \left\langle C_{ii}\left[g + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}F_{0i}\right]\right\rangle_{\mathbf{R}_i}, \tag{145} \]
where we have used definitions (122-123) of the convective time derivative d/dt and the total gradient along the magnetic field b̂·∇ to write our equation in a compact form. Note that, in view of the correspondence between Φ, Ψ and ϕ, A‖ [Eq. (135)], these nonlinear derivatives are the same as those defined in Eqs. (19-20). The collision term in the right-hand side of the above equation is the zeroth-order limit of the gyrokinetic ion–ion collision operator: a useful model form of it is given in Appendix B.3 [Eq. (B18)].
To zeroth order, Eqs. (126-128) are
\[ \frac{\delta n_e}{n_{0e}} - \frac{\delta B_\parallel}{B_0} = \frac{1}{n_{0i}}\int d^3\mathbf{v}\, g, \tag{146} \]
\[ u_\parallel = \frac{1}{n_{0i}}\int d^3\mathbf{v}\, v_\parallel\, g, \tag{147} \]
\[ \frac{Z}{\tau}\frac{\delta n_e}{n_{0e}} + 2\left(1 + \frac{1}{\beta_i}\right)\frac{\delta B_\parallel}{B_0} = -\frac{1}{n_{0i}}\int d^3\mathbf{v}\,\frac{v_\perp^2}{v_{{\rm th}i}^2}\, g. \tag{148} \]
Note that u‖ is not an independent quantity: it can be computed from the ion distribution but is not needed for the determination of the latter.
Equations (145-148) evolve the ion distribution function g, the “slow-wave quantities” u‖, δB‖, and the density fluctuations δne. The nonlinearities in Eq. (145), contained in d/dt and b̂·∇, involve the Alfvén-wave quantities Φ and Ψ (or, equivalently, ϕ and A‖) determined separately and independently by the RMHD equations (17-18). The situation is qualitatively similar to that in MHD (§ 2.4), except now a kinetic description is necessary [Eqs. (145-148) replace Eqs. (25-27)] and the nonlinear scattering/mixing of the slow waves and the entropy mode by the Alfvén waves takes the form of passive advection of the distribution function g. The density and magnetic-field-strength fluctuations are velocity-space moments of g.
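Another way to understand the passive nature of the compressive component of the turbulence discussed above is to think of it as the perturbation of a local Maxwellian equilibrium associated with the Alfvén waves. Indeed, in § 5.4, we split the full ion distribution function [Eq. (144)] into such a local Maxwellian and its perturbation
\[ \delta\tilde{f}_i = g + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}F_{0i}. \tag{149} \]
It is this perturbation that contains all the information about the compressive component; the second term in the above expression enforces to lowest order the conservation of the first adiabatic invariant µi = mi v⊥²/2B. In terms of the function (149), Eqs. (145-148) take a somewhat more compact form (cf. Schekochihin et al. 2007):
\[ \frac{d}{dt}\left(\delta\tilde{f}_i - \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}F_{0i}\right) + v_\parallel\,\hat{\mathbf{b}}\cdot\nabla\left(\delta\tilde{f}_i + \frac{Z}{\tau}\frac{\delta n_e}{n_{0e}}F_{0i}\right) = \left\langle C_{ii}\left[\delta\tilde{f}_i\right]\right\rangle_{\mathbf{R}_i}, \tag{150} \]
\[ \frac{\delta n_e}{n_{0e}} = \frac{1}{n_{0i}}\int d^3\mathbf{v}\,\delta\tilde{f}_i, \tag{151} \]
\[ \frac{\delta B_\parallel}{B_0} = -\frac{\beta_i}{2}\,\frac{1}{n_{0i}}\int d^3\mathbf{v}\left(\frac{Z}{\tau} + \frac{v_\perp^2}{v_{{\rm th}i}^2}\right)\delta\tilde{f}_i. \tag{152} \]
FIG. 5. Channels of the kinetic cascade of generalized energy (§ 3.4) from large to small scales: see § 2.7 and Appendix D.2 (inertial range, collisional regime), § 5.6 and § 6.2.5 (inertial range, collisionless regime), § 7.8 and § 7.12 (dissipation range). Note that some ion heating probably also results from the collisional and collisionless damping of the compressive fluctuations in the inertial range (see § 6.1.2 and § 6.2.4).
5.6. Generalized Energy: Three KRMHD Cascades
The generalized energy (§ 3.4) in the limit k⊥ρi ≪ 1 is calculated by substituting into Eq. (109) the perturbed ion distribution function δfi = 2v⊥·uE F0i/vthi² + δf̃i [see Eqs. (143) and (149)]. After performing velocity integration, we get
\[ W = \int d^3\mathbf{r}\left[\frac{m_i n_{0i} u_E^2}{2} + \frac{\delta B_\perp^2}{8\pi} + \frac{n_{0i}T_{0i}}{2}\left(\frac{1}{n_{0i}}\int d^3\mathbf{v}\,\frac{\delta\tilde{f}_i^2}{F_{0i}} + \frac{Z}{\tau}\frac{\delta n_e^2}{n_{0e}^2} + \frac{2}{\beta_i}\frac{\delta B_\parallel^2}{B_0^2}\right)\right] = W_{\rm AW} + W_{\rm compr}. \tag{153} \]
We see that the kinetic energy of the Alfvénic fluctuations has emerged from the ion-entropy part of the generalized energy. The first two terms in Eq. (153) are the total (kinetic plus magnetic) energy of the Alfvén waves, denoted WAW.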
As we learned from § 5.3, it cascades independently of the rest of the generalized energy, Wcompr, which contains the compressive component of the turbulence (§ 5.5) and is the invariant conserved by Eqs. (150-152).
In terms of the potentials used in our discussion of RMHD in § 2, we have
\[ W_{\rm AW} = \int d^3\mathbf{r}\,\frac{m_i n_{0i}}{2}\left(|\nabla_\perp\Phi|^2 + |\nabla_\perp\Psi|^2\right) = \int d^3\mathbf{r}\,\frac{m_i n_{0i}}{4}\left(|\nabla_\perp\zeta^+|^2 + |\nabla_\perp\zeta^-|^2\right) = W_{\rm AW}^+ + W_{\rm AW}^-, \tag{154} \]
where W+AW and W−AW are the energies of the “+” and “−” waves [Eq. (33)], which, as we know from § 2.3, cascade by scattering off each other but without exchanging energy.
Thus, the kinetic cascade in the limit k⊥ρi ≪ 1 is split, independently of the collisionality, into three cascades: of W+AW, W−AW and Wcompr. The compressive cascade is, in fact, split into three independent cascades; the splitting is different in the collisional limit (Appendix D.2) and in the collisionless one (§ 6.2.5). Figure 5 schematically summarizes both the splitting of the kinetic cascade that we have worked out so far and the upcoming developments.
5.7. Summary
In § 4, gyrokinetics was reduced to a hybrid fluid-kinetic system by means of an expansion in the electron mass, which was valid for k⊥ρe ≪ 1. In this section, we have further restricted the scale range by taking k⊥ρi ≪ 1 and as a result have been able to achieve a further reduction in the complexity of the kinetic theory describing the turbulent cascades. The reduced theory derived here evolves 5 unknown functions: Φ, Ψ, δB‖, δne and g. The stream and flux functions, Φ and Ψ, are related to the fluid quantities (perpendicular velocity and magnetic field perturbations) via Eq. (16) and to the electromagnetic potentials ϕ, A‖ via Eq. (135). They satisfy a closed system of equations, Eqs. (17-18), which describe the decoupled cascade of Alfvén waves.
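These are the same equations that arise from the MHD approximations, but we have now proven that their validity does not depend on the assumption of high collisionality (the fluid limit) and extends to scales well below the mean free path, but above the ion gyroscale. The physical reasons for this are explained in § 5.4. The density and magnetic-field-strength fluctuations (the “compressive” fluctuations, or the slow waves and the entropy mode in the MHD limit) now require a kinetic description in terms of the ion distribution function g [or δf̃i, Eq. (149)], evolved by the kinetic equation (145) [or Eq. (150)]. The kinetic equation contains δne and δB‖, which are, in turn, calculated in terms of the velocity-space integrals of g via Eqs. (146) and (148) [or Eqs. (151) and (152)]. The nonlinear evolution (turbulent cascade) of g, δB‖ and δne is due solely to passive advection of g by the Alfvén-wave turbulence.
Let us summarize the new set of equations:
\[ \frac{\partial\Psi}{\partial t} = v_A\,\hat{\mathbf{b}}\cdot\nabla\Phi, \tag{155} \]
\[ \frac{d}{dt}\nabla_\perp^2\Phi = v_A\,\hat{\mathbf{b}}\cdot\nabla\nabla_\perp^2\Psi, \tag{156} \]
\[ \frac{dg}{dt} + v_\parallel\,\hat{\mathbf{b}}\cdot\nabla\left[g + \left(\frac{Z}{\tau}\frac{\delta n_e}{n_{0e}} + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}\right)F_{0i}\right] = \left\langle C_{ii}\left[g + \frac{v_\perp^2}{v_{{\rm th}i}^2}\frac{\delta B_\parallel}{B_0}F_{0i}\right]\right\rangle_{\mathbf{R}_i}, \tag{157} \]
\[ \frac{\delta n_e}{n_{0e}} = -\left[\frac{Z}{\tau} + 2\left(1 + \frac{1}{\beta_i}\right)\right]^{-1}\frac{1}{n_{0i}}\int d^3\mathbf{v}\left[\frac{v_\perp^2}{v_{{\rm th}i}^2} - 2\left(1 + \frac{1}{\beta_i}\right)\right]g, \tag{158} \]
\[ \frac{\delta B_\parallel}{B_0} = -\left[\frac{Z}{\tau} + 2\left(1 + \frac{1}{\beta_i}\right)\right]^{-1}\frac{1}{n_{0i}}\int d^3\mathbf{v}\left(\frac{v_\perp^2}{v_{{\rm th}i}^2} + \frac{Z}{\tau}\right)g, \tag{159} \]
where
\[ \frac{d}{dt} = \frac{\partial}{\partial t} + \left\{\Phi, \cdots\right\}, \qquad \hat{\mathbf{b}}\cdot\nabla = \frac{\partial}{\partial z} + \frac{1}{v_A}\left\{\Psi, \cdots\right\}. \tag{160} \]
An explicit form of the collision term in the right-hand side of Eq. (157) is provided in Appendix B.3 [Eq. (B18)].
The generalized energy conserved by Eqs. (155-159) is given by Eq. (153). The kinetic cascade is split, the Alfvénic cascade proceeding independently of the compressive one (see Fig. 5).
The decoupling of the Alfvénic cascade is manifested by Eqs. (155-156) forming a closed subset. As already noted in § 4.9, Eq. (155) is the component of Ohm's law along the total magnetic field, B·E = 0. Equation (156) can be interpreted as the evolution equation for the vorticity of the perpendicular plasma flow velocity, which is the E×B drift velocity.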
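To make the field closure in Eqs. (158) and (159) concrete, the sketch below solves the two zeroth-order relations (146) and (148) for δne/n0e and δB‖/B0, given two velocity-space moments of g. It is an illustration only: the function name and the moment symbols m0 = (1/n0i)∫d³v g and m2 = (1/n0i)∫d³v (v⊥²/vthi²) g are our own shorthand and are not used in the text.

```python
def compressive_fields(m0, m2, beta_i, Z_over_tau):
    """Return (dn, dB) = (delta n_e / n_0e, delta B_par / B_0).

    Solves the linear pair
        dn - dB                               =  m0
        Z_over_tau * dn + 2 (1 + 1/beta_i) dB = -m2
    where m0 and m2 are the (assumed, precomputed) moments of g described above.
    """
    c = 2.0 * (1.0 + 1.0 / beta_i)
    denom = Z_over_tau + c
    dn = -(m2 - c * m0) / denom           # density fluctuation
    dB = -(m2 + Z_over_tau * m0) / denom  # field-strength fluctuation
    return dn, dB

if __name__ == "__main__":
    dn, dB = compressive_fields(m0=0.02, m2=-0.01, beta_i=1.0, Z_over_tau=1.0)
    print(dn, dB)
```

Writing the closure this way makes explicit that δne and δB‖ carry no dynamics of their own: at every time step they are linear functionals of g.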
We shall refer to the system of equations (155-159) as Kinetic Reduced Magnetohydrodynamics (KRMHD) (footnote 23). It is a hybrid fluid-kinetic description of low-frequency turbulence in strongly magnetized weakly collisional plasma that is uniformly valid at all scales satisfying k⊥ρi ≪ min(1, k‖λmfpi) (ions are strongly magnetized; footnote 24) and k‖λmfpi ≫ (me/mi)^{1/2} (electrons are isothermal), as illustrated in Fig. 2. Therefore, it smoothly connects the collisional and collisionless regimes and is the appropriate theory for the study of the turbulent cascades in the inertial range. The KRMHD equations generalize rather straightforwardly to plasmas that are so collisionless that one cannot assume a Maxwellian equilibrium distribution function (Chen et al. 2009), a situation that is relevant in some of the solar-wind measurements (see further discussion in § 8.3).
The KRMHD equations do not describe what happens to the turbulent cascade at or below the ion gyroscale, because they require k⊥ρi ≪ 1. We shall move on to these scales in § 7, but first we would like to discuss the turbulent cascades of density and magnetic-field-strength fluctuations and their damping by collisional and collisionless mechanisms.
6. COMPRESSIVE FLUCTUATIONS IN THE INERTIAL RANGE
Here we first derive the nonlinear equations that govern the evolution of the compressive (density and magnetic-field-strength) fluctuations in the collisional (k‖λmfpi ≪ 1, § 6.1 and Appendix D) and collisionless (k‖λmfpi ≫ 1, § 6.2) limits, discuss the linear damping that these fluctuations undergo in the two limits and work out the form the generalized energy takes for compressive fluctuations (which is particularly interesting in the collisionless limit, §§ 6.2.3-6.2.5). As in previous sections, an impatient reader may skip to § 6.3, where the results of the previous two subsections are summarized and the implications for the structure of the turbulent cascades of the density and field-strength fluctuations are discussed.
6.1. Collisional Regime
6.1.1.
Equations
In the collisional regime, k‖λmfpi ≪ 1, the fluid limit is recovered by expanding Eqs. (155-159) in small k‖λmfpi. The calculation that is necessary to achieve this is done in Appendix D (see also Appendix A.4). The result is a closed set of three fluid equations that evolve δB‖, δne and u‖:
\[ \frac{d}{dt}\frac{\delta B_\parallel}{B_0} = \hat{\mathbf{b}}\cdot\nabla u_\parallel + \frac{d}{dt}\frac{\delta n_e}{n_{0e}}, \tag{161} \]
\[ \frac{du_\parallel}{dt} = v_A^2\,\hat{\mathbf{b}}\cdot\nabla\frac{\delta B_\parallel}{B_0} + \nu_{\parallel i}\,\hat{\mathbf{b}}\cdot\nabla\left(\hat{\mathbf{b}}\cdot\nabla u_\parallel\right), \tag{162} \]
\[ \frac{d}{dt}\frac{\delta T_i}{T_{0i}} = \frac{2}{3}\frac{d}{dt}\frac{\delta n_e}{n_{0e}} + \kappa_{\parallel i}\,\hat{\mathbf{b}}\cdot\nabla\left(\hat{\mathbf{b}}\cdot\nabla\frac{\delta T_i}{T_{0i}}\right), \tag{163} \]
where
\[ \left(1 + \frac{Z}{\tau}\right)\frac{\delta n_e}{n_{0e}} = -\frac{\delta T_i}{T_{0i}} - \frac{2}{\beta_i}\left(\frac{\delta B_\parallel}{B_0} + \frac{1}{3v_A^2}\,\nu_{\parallel i}\,\hat{\mathbf{b}}\cdot\nabla u_\parallel\right), \tag{164} \]
and ν‖i and κ‖i are the coefficients of parallel ion viscosity and thermal diffusivity, respectively. The viscous and thermal diffusion are anisotropic because plasma is magnetized, λmfpi ≫ ρi (Braginskii 1965). The method of calculation of ν‖i and κ‖i is explained in Appendix D.3. Here we shall ignore numerical prefactors of order unity and give order-of-magnitude values for these coefficients:
\[ \nu_{\parallel i} \sim \kappa_{\parallel i} \sim \frac{v_{{\rm th}i}^2}{\nu_{ii}} \sim v_{{\rm th}i}\lambda_{{\rm mfp}i}. \tag{165} \]
23 The term is introduced by analogy with a popular fluid-kinetic system known as Kinetic MHD, or KMHD (see Kulsrud 1964, 1983). KMHD is derived for magnetized plasmas (ρi ≪ λmfpi) under the assumption that kρs ≪ 1 and ω ≪ Ωs but without assuming either strong anisotropy (k‖ ≪ k⊥) or small fluctuations (|δB| ≪ B0). The KRMHD equations (155-159) can be recovered from KMHD by applying to it the GK-RMHD ordering [Eq. (12) and § 3.1] and an expansion in (me/mi)^{1/2} (Schekochihin et al. 2007). This means that the k⊥ρi expansion (§ 5), which for KMHD is the primary expansion, commutes with the gyrokinetic expansion (§ 3) and the (me/mi)^{1/2} expansion (§ 4), both of which preceded it in this paper.
24 The condition k⊥ρi ≪ k‖λmfpi must be satisfied because in our estimates of the collision terms (Appendix B.2) we took k⊥ρi ≪ 1 while assuming that k‖λmfpi ∼ 1.
If we set ν‖i = κ‖i = 0, Eqs. (161-164) are the same as the RMHD equations of § 2 with the sound speed defined as
\[ c_s = v_A\sqrt{\frac{\beta_i}{2}\left(\frac{Z}{\tau} + \frac{5}{3}\right)}. \tag{166} \]
This is the natural definition of cs for the case of adiabatic ions, whose specific heat ratio is γi = 5/3, and isothermal electrons, whose specific heat ratio is γe = 1 [because δpe = T0eδne; see Eq. (103)]. Note that Eq. (164) is equivalent to the pressure balance [Eq. (22) of § 2] with p = niTi + neTe and δpe = T0eδne.
As in § 2, the fluctuations described by Eqs. (161-164) separate into the zero-frequency entropy mode and the left- and right-propagating slow waves with
\[ \omega = \pm\frac{k_\parallel v_A}{\sqrt{1 + v_A^2/c_s^2}} \tag{167} \]
[see Eq. (30)]. All three are cascaded independently of each other via nonlinear interaction with the Alfvén waves. In Appendix D.2, we show that the generalized energy Wcompr for this system, given in § 5.6, splits into the three familiar invariants W+sw, W−sw, and Ws, defined by Eqs. (34-35) (see Fig. 5).
6.1.2. Dissipation
The diffusion terms add dissipation to the equations. Because diffusion occurs along the field lines of the total magnetic field (mean field plus perturbation), the diffusive terms are nonlinear and the dissipation process also involves interaction with the Alfvén waves. We can estimate the characteristic parallel scale at which the diffusion terms become important by balancing the nonlinear cascade time and the typical diffusion time:
\[ k_\parallel v_A \sim v_{{\rm th}i}\lambda_{{\rm mfp}i}k_\parallel^2 \quad\Leftrightarrow\quad k_\parallel\lambda_{{\rm mfp}i} \sim \frac{1}{\sqrt{\beta_i}}, \tag{168} \]
where we have used Eq. (165).
Technically speaking, the cutoff given by Eq. (168) always lies in the range of k‖ that is outside the region of validity of the small-k‖λmfpi expansion adopted in the derivation of Eqs. (161-163). In fact, in the low-beta limit, the collisional cutoff falls manifestly in the collisionless scale range, i.e., the collisional (fluid) approximation breaks down before the slow-wave and entropy cascades are damped and one must use the collisionless (kinetic) limit to calculate the damping (see § 6.2.2).
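The estimates above are simple enough to tabulate. The following sketch evaluates the sound speed (166), the undamped slow-wave frequency (167), and the order-of-magnitude collisional cutoff (168) for given βi and Z/τ. The function names are ours and purely illustrative, and the cutoff inherits the neglect of order-unity prefactors made in Eq. (165), so it should be read as a scaling rather than a precise threshold.

```python
import math

def sound_speed_ratio(beta_i, Z_over_tau):
    """c_s / v_A for adiabatic ions (gamma_i = 5/3) and isothermal electrons, Eq. (166)."""
    return math.sqrt(0.5 * beta_i * (Z_over_tau + 5.0 / 3.0))

def slow_wave_frequency(k_par, v_A, beta_i, Z_over_tau):
    """|omega| of the undamped slow wave, Eq. (167)."""
    r = sound_speed_ratio(beta_i, Z_over_tau)
    return k_par * v_A / math.sqrt(1.0 + 1.0 / r**2)

def collisional_cutoff(beta_i):
    """Order-of-magnitude value of k_par * lambda_mfpi at the diffusive cutoff, Eq. (168)."""
    return 1.0 / math.sqrt(beta_i)

if __name__ == "__main__":
    for beta_i in (0.01, 1.0, 100.0):
        print(beta_i,
              sound_speed_ratio(beta_i, 1.0),
              slow_wave_frequency(1.0, 1.0, beta_i, 1.0),
              collisional_cutoff(beta_i))
```

For βi ≪ 1 the cutoff lands at k‖λmfpi ≫ 1, outside the collisional range, which is the situation described in the text; for βi ≫ 1 the slow-wave frequency approaches k‖vA and the cutoff moves to k‖λmfpi ≪ 1.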
The situation is different in the high-beta limit: in this case, the expansion in small k‖λmfpi can be reformulated as an expansion in small 1/√βi and the cutoff falls within the range of validity of the fluid approximation. Equations (161-163) in this limit are
\[ \frac{d}{dt}\frac{\delta B_\parallel}{B_0} = \hat{\mathbf{b}}\cdot\nabla u_\parallel, \tag{169} \]
\[ \frac{du_\parallel}{dt} = v_A^2\,\hat{\mathbf{b}}\cdot\nabla\frac{\delta B_\parallel}{B_0} + \nu_{\parallel i}\,\hat{\mathbf{b}}\cdot\nabla\left(\hat{\mathbf{b}}\cdot\nabla u_\parallel\right), \tag{170} \]
\[ \frac{d}{dt}\frac{\delta n_e}{n_{0e}} = \frac{1 + Z/\tau}{5/3 + Z/\tau}\,\kappa_{\parallel i}\,\hat{\mathbf{b}}\cdot\nabla\left(\hat{\mathbf{b}}\cdot\nabla\frac{\delta n_e}{n_{0e}}\right). \tag{171} \]
As in § 2 [Eq. (28)], the density fluctuations [Eq. (171)] have decoupled from the slow waves [Eqs. (169-170)]. The former are damped by thermal diffusion, the latter by viscosity. The corresponding linear dispersion relations are
\[ \omega = -i\,\frac{1 + Z/\tau}{5/3 + Z/\tau}\,\kappa_{\parallel i}k_\parallel^2, \tag{172} \]
\[ \omega = \pm k_\parallel v_A\sqrt{1 - \left(\frac{\nu_{\parallel i}k_\parallel}{2v_A}\right)^2} - i\,\frac{\nu_{\parallel i}k_\parallel^2}{2}. \tag{173} \]
Equation (172) describes strong diffusive damping of the density fluctuations. The slow-wave dispersion relation (173) has two distinct regimes:
1. When k‖ < 2vA/ν‖i, it describes viscously damped slow waves. In particular, in the limit k‖λmfpi ≪ 1/√βi, we have
\[ \omega \simeq \pm k_\parallel v_A - i\,\frac{\nu_{\parallel i}k_\parallel^2}{2}. \tag{174} \]
2. For k‖ > 2vA/ν‖i, both solutions become purely imaginary, so the slow waves are converted into aperiodic decaying fluctuations. The stronger-damped (diffusive) branch has ω ≃ −iν‖ik‖², the weaker-damped one has
\[ \omega \simeq -i\,\frac{v_A^2}{\nu_{\parallel i}} \sim -i\,\frac{v_{{\rm th}i}}{\beta_i\lambda_{{\rm mfp}i}} \sim -i\,\frac{v_A}{\sqrt{\beta_i}\,\lambda_{{\rm mfp}i}}. \tag{175} \]
This damping effect is called viscous relaxation. It is valid until k‖λmfpi ∼ 1, where it is replaced by the collisionless damping discussed in § 6.2.2 [see Eq. (190)].
The viscous and thermal-diffusive dissipation mechanisms described above lead, in the limits where they are efficient, to ion heating via the standard fluid (collisional) route, involving the development of small parallel scales in the position space, but not in velocity space (see § 3.4 and § 3.5).
6.2. Collisionless Regime
6.2.1. Equations
In the collisionless regime, k‖λmfpi ≫ 1, the collision integral in the right-hand side of the kinetic equation (157) can be neglected. The v⊥ dependence can then be integrated out of Eq. (157).
Indeed, let us introduce the following two auxiliary functions: Gn(v‖) = − dv⊥ v⊥ v2thi g, (176) GB(v‖) = − dv⊥ v⊥ v2thi g. (177) In terms of these functions, dv‖Gn, dv‖GB (178) and Eq. (157) reduces to the following two coupled one- dimensional kinetic equations + v‖b̂ ·∇Gn = − v‖FM(v‖) ×b̂ ·∇ , (179) + v‖b̂ ·∇GB = v‖FM(v‖) ×b̂ ·∇ , (180) where FM(v‖) = (1/ πvthi)exp(−v2‖/v thi) is a one-dimensional Maxwellian. This system can be diagonalized, so it splits into two decoupled equations +v‖b̂ ·∇G± = v‖FM(v‖) b̂ ·∇ dv′‖ G ±(v′‖), (181) where ± = − (182) and we have introduced a new pair of functions G+ = GB + Gn, G − = Gn + GB, (183) where σ = 1 + . (184) Equation (181) describes two decoupled kinetic cascades, which we will discuss in greater detail in §§ 6.2.3-6.2.5. 6.2.2. Collisionless Damping Fluctuations described by Eq. (181) are subject to collision- less damping. Indeed, let us linearize Eq. (181), Fourier trans- form in time and space, divide through by −i(ω − k‖v‖), and integrate over v‖. This gives the following dispersion relation (the “−” branch is for G−, the “+” branch for G+) ζiZ (ζi) = Λ ± − 1, (185) FIG. 6.— Schematic log-log plot (artist’s impression) of the ratio of the damping rate of magnetic-field-strength fluctuations to the Alfvén frequency k‖vA in the high-beta limit [see Eqs. (173) and (190)]. In Barnes et al. (2009), this plot is reproduced via a direct numerical solution of the linearized ion gyrokinetic equation with collisions. where ζi = ω/|k‖|vthi = ω/|k‖|vA βi and we have used the plasma dispersion function (Fried & Conte 1961) Z (ζi) = x − ζi (186) (the integration is along the Landau contour). This function is not to be confused with the ion charge parameter Z = qi/e. Formally, Eq. (185) has an infinite number of solutions. 
When βi ∼ 1, they are all strongly damped with damping rates Im(ω) ∼ |k‖|vthi ∼ |k‖|vA, so the damping time is compara- ble to the characteristic timescale on which the Alfvén waves cause these fluctuations to cascade to smaller scales. It is interesting to consider the high- and low-beta limits. High-Beta Limit. — When βi ≫ 1, we have in Eq. (185) − − 1≃−2 , G− ≃ Gn, (187) + − 1≃ , G+ ≃ GB + Gn. (188) The “−” branch corresponds to the density fluctuations. The solution of Eq. (185) has Im(ζi) ∼ 1, so these fluctuations are strongly damped: ω ∼ −i|k‖|vA βi. (189) The damping rate is much greater than the Alfvénic rate k‖vA of the nonlinear cascade. In contrast, for the “+” branch, the damping rate is small: it can be obtained by expanding Z(ζi) = π + O (ζi), which gives25 ω = −i |k‖|vthi√ |k‖|vA√ . (190) Since Gn is strongly damped, Eq. (188) implies G + ≃ GB, i.e., the fluctuations that are damped at the rate (190) are predom- inantly of the magnetic-field strength. The damping rate is a 25 This is the gyrokinetic limit (k‖/k⊥ ≪ 1) of the more general damping effect known in astrophysics as the Barnes (1966) damping and in plasma physics as transit-time damping. We remind the reader that our approach was to carry out the gyrokinetic expansion (in small k‖/k⊥) first, and then take the high-beta limit as a subsidiary expansion. A more standard approach in the linear theory of plasma waves is to take the limit of high βi while treating k‖/k⊥ as an arbitrary quantity. A detailed calculation of the damping rates done in this way can be found in Foote & Kulsrud (1979). KINETIC TURBULENCE IN MAGNETIZED PLASMAS 27 constant (independent of k‖) small fraction ∼ 1/ βi of the Alfvénic cascade rate. In Fig. 6, we give a schematic plot of the damping rate of the magnetic-field-strength fluctuations (slow waves) connecting the fluid and kinetic limits for βi ≫ 1. Low-Beta Limit. — When βi ≪ 1, we have − − 1≃− , G− ≃ Gn + GB, (191) + − 1≃ 2 , G+ ≃ GB. 
(192) For the “−” branch, we again have Im(ζi) ∼ 1, so ω ∼ −i|k‖|vA βi, (193) which now is much smaller than the Alfvénic cascade rate k‖vA. For the “+” branch (predominantly the field-strength fluctuations), we seek a solution with ζ = −iζ̃i and ζ̃i ≫ 1. Then Eq. (185) becomes ζiZ(ζi) ≃ 2 π ζ̃i exp(ζ̃i) = 2/βi. Up to logarithmically small corrections, this gives ζ̃i ≃ | lnβi|, whence ω ∼ −i|k‖|vA βi| lnβi|. (194) While this damping rate is slightly greater than that of the “−” branch, it is still much smaller than the Alfvénic cascade rate. 6.2.3. Collisionless Invariants Equation (181) obeys a conservation law, which is very easy to derive. Multiplying Eq. (181) by G±/FM and integrating over space and velocities and performing integration by parts in the right-hand side, we get (G±)2 b̂ ·∇ dv‖v‖G ±. (195) On the other hand, integrating Eq. (181) over v‖ gives ± = −b̂ ·∇ dv‖v‖G ±. (196) Using this to express the right-hand side of Eq. (195) as a full time derivative, we find dW±compr = 0, (197) where the two invariants are W±compr = n0iT0i (G±)2 (198) It is useful (and always possible) to split G± = FM ± + G̃±, (199) where dv‖G̃ ± = 0 by construction. Then W±compr = n0iT0i (G̃±)2 . (200) Written in this form, the two invariants W±compr are mani- festly positive definite quantities because Λ+ > 1 and Λ− < 0. The invariants regulate the two decoupled kinetic cascades of compressive fluctuations in the collisionless regime. The col- lisionless damping derived in § 6.2.2 leads to exponential de- cay of the density and field-strength fluctuations, or, equiva- lently, of ±, while conserving W±compr. This means that the damping is merely a redistribution of the conserved quan- tity W±compr: the first term in Eq. (200) grows to compensate for the decay of the second. 6.2.4. Linear Parallel Phase Mixing In dynamical terms, how does the kinetic system Eq. (181) arrange for the integral of the distribution function G±(v‖) to decay while allowing its norm to grow? 
This is the well-known phenomenon of (linear) phase mixing (Landau 1946; Hammett et al. 1991; Krommes & Hu 1994; Krommes 1999; Watanabe & Sugama 2004). To put it in simple terms, the solution of the linearized Eq. (181) consists of the inhomogeneous part, which contains the collisionless damping, and the homogeneous part (solution of the left-hand side = 0) given by $G^\pm \propto e^{-ik_\parallel v_\parallel t}$, the so-called ballistic response (this is also the nonlinear solution if t and k‖ are interpreted as Lagrangian variables in the frame of the Alfvén waves; see § 6.3). As time goes on, this part of the solution becomes increasingly oscillatory in v‖, so its velocity integral tends to zero, while its amplitude does not decay. It is such ballistic contributions that make up the G̃± term in Eq. (200).

As the velocity gradient of G̃± increases with time, ∂G̃±/∂v‖ ∼ k‖tG±, at some point it can become sufficiently large to activate the collision integral [the right-hand side of Eq. (157)], which has so far been neglected. This way the collisionless damping of compressive fluctuations can be turned into ion heating, a simple example of a more general principle of how electromagnetic fluctuation energy is transferred into heat via the entropy part of the generalized energy (§ 3.5). Indeed, we will prove in § 6.2.5 that the invariants $W^\pm_{\rm compr}$ are constituent parts of the overall generalized energy functional for the compressive fluctuations, so their cascade to small scales in phase space is part of the overall kinetic cascade introduced in § 3.4.

It is not entirely clear how efficient the parallel-phase-mixing route to ion heating is and, therefore, whether the collisionlessly damped energy of compressive fluctuations ends up in the ion heat or rather reaches the ion gyroscale and couples back to the Alfvénic component of the turbulence (§ 7.1).
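As an aside, the behavior of the ballistic response can be checked numerically. The following is a minimal sketch, not part of the original derivation. It assumes a single Fourier mode with a Maxwellian initial condition, G(v‖, 0) = FM(v‖), evolving under the homogeneous equation only, so that G = FM exp(−ik‖v‖t). The function names `maxwellian` and `ballistic_moments`, the velocity grid, and the units (vthi = 1) are illustrative choices.

```python
import cmath
import math


def maxwellian(v, vth=1.0):
    """One-dimensional Maxwellian F_M(v) = exp(-v^2/vth^2) / (sqrt(pi) * vth)."""
    return math.exp(-(v / vth) ** 2) / (math.sqrt(math.pi) * vth)


def ballistic_moments(k_par, t, vth=1.0, vmax=8.0, n=4001):
    """Trapezoidal velocity integrals for G(v, t) = F_M(v) * exp(-i k_par v t).

    Assumed initial condition: G(v, 0) = F_M(v) (an illustrative choice).
    Returns (integral of G dv, integral of |G|^2 / F_M dv).
    """
    dv = 2.0 * vmax * vth / (n - 1)
    density = 0j
    norm = 0.0
    for j in range(n):
        v = -vmax * vth + j * dv
        weight = 0.5 * dv if j in (0, n - 1) else dv  # trapezoid end points
        fm = maxwellian(v, vth)
        g = fm * cmath.exp(-1j * k_par * v * t)
        density += weight * g
        norm += weight * abs(g) ** 2 / fm
    return density, norm


if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 4.0):
        density, norm = ballistic_moments(k_par=1.0, t=t)
        exact = math.exp(-(1.0 * 1.0 * t) ** 2 / 4.0)
        print(f"t={t:3.1f}  |int G dv|={abs(density):.6f}  "
              f"exact={exact:.6f}  norm={norm:.6f}")
```

For this particular initial condition the velocity integral equals exp[−(k‖vthi t)²/4], while the quadratic norm ∫dv‖ |G|²/FM stays equal to 1: the integral decays purely because G becomes oscillatory in v‖. The Gaussian-in-time decay is specific to this ballistic example and should not be confused with the exponential collisionless damping of the inhomogeneous part discussed in § 6.2.2.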
The answer to this question will depend on whether compressive fluctuations can develop large k‖—a non-trivial issue further discussed in § 6.3. 6.2.5. Generalized Energy: Three Collisionless Cascades We will now show how the generalized energy for com- pressive fluctuations in the collisionless regime incorporates the two invariants derived in § 6.2.3. Rewriting the compressive part of the KRMHD generalized energy [Eq. (153)] in terms of the function g [see Eq. (149)], we get Wcompr = n0iT0i . (201) 28 SCHEKOCHIHIN ET AL. Using Eqs. (178) and (183), we can express δne and δB‖ in terms of ± as follows , (202) , (203) where σ was defined in Eq. (184) and . (204) In order to express g in terms of G±, we have to reconstruct the v⊥ dependence of g, which we integrated out at the begin- ning of § 6.2.1. Let us represent the distribution function as follows πv2thi e−xĝ(x,v‖), ĝ(x,v‖) = Ll(x)Gl(v‖), (205) where x = v2⊥/v thi and we have expanded ĝ in Laguerre poly- nomials Ll(x) = (e x/l!)(dl/dxl)xle−x. Since Laguerre polyno- mials are orthogonal, the first term in Eq. (201) splits into a sum of “energies” associated with the expansion coefficients: . (206) The expansion coefficients are determined via the Laguerre transform: Gl(v‖) = dxe−xLl(x)ĝ(x,v‖). (207) As L0 = 1 and L1 = 1 − x, it is easy to see that δne and δB‖ can be expressed as linear combinations of dv‖G0 and dv‖G1 [see Eqs. (176-178)]. Using Eqs. (176), (177), and (183), we can show that G0 = − +G+ + σ − 1 − , (208) σΛ+G+ − , (209) where G± satisfy Eq. (181). As follows from Eq. (157) (ne- glecting the collision integral), all higher-order expansion co- efficients satisfy a simple homogeneous equation: + v‖b̂ ·∇Gl = 0, l > 1. (210) Thus, the distribution function can be explicitly written in terms of G±: G0(v‖) + v2thi G1(v‖) πv2thi thi + g̃, (211) where G0 and G1 are given by Eqs. (208-209) and g̃ com- prises the rest of the Laguerre expansion (all Gl with l > 1), i.e., it is the homogeneous solution of Eq. 
(157) that does not contribute to either density or magnetic-field strength: + v‖b̂ ·∇g̃ = 0, d3v g̃ = 0, v2thi g̃ = 0. (212) Now substituting Eqs. (208) and (209) into Eq. (206) and then substituting the result and Eqs. (202-203) into Eq. (201), we find after some straightforward manipulations Wcompr = T0ig̃ (Λ+)2W +compr (Λ−)2W −compr, (213) where κ is defined by Eq. (204) and W±compr are the two inde- pendent invariants that we derived in § 6.2.3. Thus, the gener- alized energy for compressive fluctuations splits into three in- dependently cascading parts: W±compr associated with the den- sity and magnetic-field-strength fluctuations and a purely ki- netic part given by the first term in Eq. (213) (see Fig. 5). The dynamical evolution of this purely kinetic component is described by Eq. (212)—it is a passively mixed, undamped ballistic-type mode. All three cascade channels lead to small perpendicular spa- tial scales via passive mixing by the Alfvénic turbulence and also to small scales in v‖ via the parallel phase mixing pro- cess discussed in § 6.2.4 (note that g̃ is subject to this process as well). 6.3. Parallel and Perpendicular Cascades Let us return to the kinetic equation (157) and transform it to the Lagrangian frame associated with the velocity field u⊥ = ẑ×∇⊥Φ of the Alfvén waves: (t,r) → (t,r0), where r(t,r0) = r0 + dt ′u⊥(t ′,r(t ′,r0)). (214) In this frame, the convective derivative d/dt defined in Eq. (160) turns into ∂/∂t, while the parallel spatial gradient b̂ ·∇ can be calculated by employing the Cauchy solution for the perturbed magnetic field δB⊥ = ẑ×∇⊥Ψ: b̂(t,r) = ẑ + δB⊥(t,r) = b̂(0,r0) ·∇0r, (215) where r is given by Eq. (214) and ∇0 = ∂/∂r0. Then b̂ ·∇ = b̂(0,r0) · ·∇ = b̂(0,r0) ·∇0 = , (216) where s0 is the arc length along the perturbed magnetic field taken at t = 0 [if δB⊥(0,r0) = 0, s0 = z0]. Thus, in the La- grangian frame associated with the Alfvénic component of the turbulence, Eq. (157) is linear. 
This means that, if the effect of finite ion gyroradius is neglected, the KRMHD system does not give rise to a cascade of density and magnetic-field-strength fluctuations to smaller scales along the moving (perturbed) field lines, i.e., b̂·∇δne and b̂·∇δB‖ do not increase. In contrast, there is a perpendicular cascade (cascade in k⊥): the perpendicular wandering of field lines due to the Alfvénic turbulence causes passive mixing of δne and δB‖ in the direction transverse to the magnetic field (see § 2.6 for a quick recapitulation of the standard scaling argument on the passive cascade that leads to a $k_\perp^{-5/3}$ spectrum in the perpendicular direction). Figure 7 illustrates this situation.26

FIG. 7.— Lagrangian mixing of passive fields: fluctuations develop small scales across, but not along the exact field lines.

We emphasize that this lack of nonlinear refinement of the scale of δne and δB‖ along the moving field lines is a particular property of the compressive component of the turbulence, not shared by the Alfvén waves. Indeed, unlike Eq. (157), the RMHD equations (155-156) do not reduce to a linear form under the Lagrangian transformation (214), so the Alfvén waves should develop small scales both across and along the perturbed magnetic field.

Whether the density and magnetic-field-strength fluctuations develop small scales along the magnetic field has direct physical and observational consequences. Damping of these fluctuations, both in the collisional and collisionless regimes, discussed in § 6.1.2 and § 6.2.2, respectively, depends precisely on their scale along the perturbed field: indeed, the linear results derived there are exact in the Lagrangian frame (214).
To summarize these results, the damping rate of δne and δB‖ at βi ∼ 1 is

$$\gamma \sim v_{{\rm th}i}\lambda_{{\rm mfp}i}k_{\parallel 0}^2, \quad k_{\parallel 0}\lambda_{{\rm mfp}i} \ll 1, \qquad (217)$$

$$\gamma \sim v_{{\rm th}i}k_{\parallel 0}, \quad k_{\parallel 0}\lambda_{{\rm mfp}i} \gg 1, \qquad (218)$$

where k‖0 ∼ b̂·∇ is the wavenumber along the perturbed field (i.e., if there is no parallel cascade, the wavenumber of the large-scale stirring).

Whether this damping cuts off the cascades of δne and δB‖ depends on the relative magnitudes of the damping rate γ for a given k⊥ and the characteristic rate at which the Alfvén waves cause δne and δB‖ to cascade to higher k⊥. This rate is ωA ∼ k‖AvA, where k‖A is the parallel wavenumber of the Alfvén waves that have the same k⊥. Since the Alfvén waves do have a parallel cascade, assuming scale-by-scale critical balance (3) leads to [Eq. (5)]

$$k_{\parallel A} \sim k_\perp^{2/3}\,l_0^{-1/3}. \qquad (219)$$

If, in contrast to the Alfvén waves, δne and δB‖ have no parallel cascade, k‖0 does not grow with k⊥, so, for large enough k⊥, k‖0 ≪ k‖A and γ ≪ ωA. This means that, despite the damping, the density and field-strength fluctuations should have perpendicular cascades extending to the ion gyroscale.

The validity of the argument at the beginning of this section that ruled out the parallel cascade of δne and δB‖ is not quite as obvious as it might appear. Lithwick & Goldreich (2001) argued that the dissipation of δne and δB‖ at the ion gyroscale would cause these fluctuations to become uncorrelated at the same parallel scales as the Alfvénic fluctuations by which they are mixed, i.e., k‖0 ∼ k‖A. The damping rate then becomes comparable to the cascade rate, cutting off the cascades of density and field-strength fluctuations at k‖λmfpi ∼ 1. The corresponding perpendicular cutoff wavenumber is [see Eq. (219)]

$$k_\perp \sim l_0^{1/2}\,\lambda_{{\rm mfp}i}^{-3/2}. \qquad (220)$$

Asymptotically speaking, in a weakly collisional plasma, this cutoff is far above the ion gyroscale, k⊥ρi ≪ 1. However, the relatively small value of λmfpi in the warm ISM, which was the main focus of Lithwick & Goldreich (2001), meant that the numerical value of the perpendicular cutoff scale given by Eq. (220) was, in fact, quite close both to the ion gyroscale (see Table 1) and to the observational estimates for the inner scale of the electron-density fluctuations in the ISM (Spangler & Gwinn 1990; Armstrong et al. 1995). Thus, it was not possible to tell whether Eq. (220), rather than $k_\perp \sim \rho_i^{-1}$, represented the correct prediction.

The situation is rather different in the nearly collisionless case of the solar wind, where the cutoff given by Eq. (220) would mean that hardly any density or field-strength fluctuations should be detected above the ion gyroscale. Observations do not support such a conclusion: the density fluctuations appear to follow a $k^{-5/3}$ law at all scales larger than a few times ρi (Lovelace et al. 1970; Woo & Armstrong 1979; Celnikier et al. 1983, 1987; Coles & Harmon 1989; Marsch & Tu 1990b; Coles et al. 1991), consistent with the expected behavior of an undamped passive scalar field (see § 2.6). An extended range of $k^{-5/3}$ scaling above the ion gyroscale is also observed for the fluctuations of the magnetic-field strength (Marsch & Tu 1990b; Bershadskii & Sreenivasan 2004; Hnat et al. 2005; Alexandrova et al. 2008a).

These observational facts suggest that the cutoff formula (220) does not apply. This does not, however, conclusively vitiate the Lithwick & Goldreich (2001) theory.

26 Note that effectively, there is also a cascade in k‖ if the latter is measured along the unperturbed field (more precisely, a cascade in kz). This is due to the perpendicular deformation of the perturbed magnetic field by the Alfvén-wave turbulence: since ∇⊥ grows while b̂·∇ remains the same, we have from Eq. (123) ∂/∂z ≃ −(δB⊥/B0)·∇⊥.
Heuristically, their argument is plausible, although it is, perhaps, useful to note that in order for the effect of the perpendicular dissipation terms, not present in the KRMHD equations (157-159), to be felt, the density and field-strength fluctuations should reach the ion gyroscale in the first place. Quantitatively, the failure of the compressive fluctuations in the solar wind to be damped could still be consistent with the Lithwick & Goldreich (2001) theory because of the relative weakness of the collisionless damping, especially at low beta (§ 6.2.2), which is the explanation they themselves favor. The way to check observationally whether this explanation suffices would be to make a comparative study of the compressive fluctuations for solar-wind data with different values of βi. If the strength of the damping is the decisive factor, one should always see cascades of both δne and δB‖ at low βi, no cascades at βi ∼ 1, and a cascade of δB‖ but not δne at high βi (in this limit, the damping of the density fluctuations is strong, of the field-strength weak; see § 6.2.2). If, on the other hand, the parallel cascade of the compressive fluctuations is intrinsically inefficient, very little βi dependence is expected and a perpendicular cascade should be seen in all cases.

Obviously, an even more direct observational (or numerical) test would be the detection or non-detection of near-perfect alignment of the density and field-strength structures with the moving field lines (not with the mean magnetic field; see footnote 26), but it is not clear how to measure this reliably. It is interesting, in this context, that in near-the-Sun measurements, the density fluctuations are reported to have the form of highly anisotropic filaments aligned with the magnetic field (Armstrong et al. 1990; Grall et al. 1997; Woo & Habbal 1997). Another intriguing piece of observational evidence is the discovery that the local structure of the magnetic-field-strength and density fluctuations at 1 AU is, in a certain sense, correlated with the solar cycle (Kiyani et al. 2007; Hnat et al. 2007; Wicks et al. 2009); this suggests a dependence on initial conditions that is absent in the Alfvénic fluctuations and that presumably should also disappear in the compressive fluctuations if the latter are fully mixed both in the perpendicular and parallel directions.

7. TURBULENCE IN THE DISSIPATION RANGE: ELECTRON RMHD AND THE ENTROPY CASCADE

7.1. Transition at the Ion Gyroscale

The validity of the theory discussed in § 5 and § 6 breaks down when k⊥ρi ∼ 1. As the ion gyroscale is approached, the Alfvén waves are no longer decoupled from the rest of the plasma dynamics. All modes now contain perturbations of density and magnetic-field strength and can be collisionlessly damped. Because of the low-frequency nature of the Alfvén-wave cascade, ω ≪ Ωi even at k⊥ρi ∼ 1 [Eq. (46)], so the ion cyclotron resonance (ω − k‖v‖ = ±Ωi) is not important, while the Landau one (ω = k‖v‖) is. The linear theory of this collisionless damping in the gyrokinetic approximation is worked out in detail in Howes et al. (2006) (see also Gary & Borovsky 2008). Figure 8 shows the solutions of their dispersion relation that illustrate how the Alfvén wave becomes a dispersive kinetic Alfvén wave (KAW) (see § 7.3) and collisionless damping becomes important as the ion gyroscale is reached.

We stress that this transition occurs at the ion gyroscale, not at the ion inertial scale $d_i = \rho_i/\sqrt{\beta_i}$ (except in the limit of cold ions, τ = T0i/T0e ≪ 1; see Appendix E). This statement is true even when βi is not order unity, as illustrated in Fig. 8: for the three cases plotted there, k⊥di = 1 corresponds to k⊥ρi = 0.1, 1 and 10 for βi = 0.01, 1 and 100, respectively, but there is no trace of the ion inertial scale in the solutions of the linear dispersion relation.
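The correspondence between the two scales quoted above is simple arithmetic and can be reproduced as follows. This is a small illustrative sketch that assumes only the relation between the ion inertial scale and the ion gyroscale, di = ρi/√βi; the function name `k_perp_rho_i` is a hypothetical helper, not something defined in the paper.

```python
import math


def k_perp_rho_i(k_perp_d_i, beta_i):
    """Convert k_perp * d_i to k_perp * rho_i using d_i = rho_i / sqrt(beta_i)."""
    if beta_i <= 0.0:
        raise ValueError("beta_i must be positive")
    return k_perp_d_i * math.sqrt(beta_i)


if __name__ == "__main__":
    # The three cases plotted in Fig. 8.
    for beta_i in (0.01, 1.0, 100.0):
        value = k_perp_rho_i(1.0, beta_i)
        print(f"beta_i={beta_i:g}: k_perp*d_i = 1 corresponds to "
              f"k_perp*rho_i = {value:g}")
```

The conversion factor is √βi, so the two scales are separated by a decade in each of the βi = 0.01 and βi = 100 cases, which is what makes Fig. 8 a clean test of where the transition occurs.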
Nonlinearly, in the limit βi ≪ 1, we may consider the scales k⊥di ∼ 1 and expand the gyrokinetics in k⊥ρi = k⊥di βi ≪ 1 in a way similar to how it was done in § 5 and obtain precisely the same results: Alfvénic fluctuations described by the RMHD equations and compressive fluctua- tions passively advected by them and satisfying the reduced kinetic equation derived in § 5.5. Thus, even though di ≫ ρi at low beta, there is no change in the nature of the turbulent cascade until k⊥ρi ∼ 1 is reached. The nonlinear theory of what happens at k⊥ρi ∼ 1 is very poorly understood. It is, however, possible to make progress by examining what kind of fluctuations emerge on the other side of the transition, at k⊥ρi ≫ 1. As we will demonstrate below, it turns out that another turbulent cascade—this time of KAW—is possible in this so-called dissipation range. It can transfer the energy of KAW-like fluctuations down to the electron gyroscale, where electron Landau damping becomes important (see Howes et al. 2006). Some observational evi- dence of KAW is, indeed, available in the solar wind and the magnetosphere (Bale et al. 2005; Grison et al. 2005, see fur- ther discussion in § 8.2.4). Below we derive the equations that describe KAW-like fluctuations in the scale range k⊥ρi ≫ 1, k⊥ρe ≪ 1 (§ 7.2) and work out a Kolmogorov-style scaling theory for this cascade (§ 7.5). Because of the presence of the collisionless damping at the ion gyroscale, only a certain fraction of the turbulent power arriving there from the inertial range is converted into the KAW cascade, while the rest is Landau-damped. The damp- ing leads to the heating of the ions, but the process of deposit- ing the collisionlessly damped fluctuation energy into the ion heat is non-trivial because, as we explained in § 3.5, collisions do need to play a role in order for true heating to occur. 
As we explained in § 3.5 and will see specifically for the dissi- pation range in § 7.8, the electromagnetic-fluctuation energy does not disappear as a result of the Landau damping but is converted into ion entropy fluctuations, while the generalized energy is conserved. Collisions are then accessed and ion heating achieved via a purely kinetic phenomenon: the ion entropy cascade in phase space (nonlinear phase mixing), for which a theory is developed in § 7.9 and § 7.10. A similar pro- cess of conversion of the KAW energy into electron entropy fluctuations and then electron heat is treated in § 7.12. Figure 5 illustrates the routes energy takes from the ion gy- roscale towards heating. Crucially, it is at k⊥ρi ∼ 1 that it is decided how much energy would eventually go into the ions and how much into electrons.27 How this distribution of energy depends on plasma parameters (βi and T0i/T0e) is an open theoretical question28 of considerable astrophys- ical interest: e.g., the efficiency of ion heating is a key un- known in the theory of advection-dominated accretion flows (Quataert & Gruzinov 1999, see discussion in § 8.5) and of the solar corona (e.g., Cranmer & van Ballegooijen 2003); we will also see in § 7.11 that it may determine the form of the observed dissipation-range spectra in space plasmas. A short summary of this section is given in § 7.14. 7.2. Equations of Electron Reduced MHD The derivation is straightforward: when ai ∼ k⊥ρi ≫ 1, all Bessel functions in Eqs. (118-120) are small, so the integrals of the ion distribution function vanish and Eqs. (118-120) be- , (221) u‖e = 4πen0e ∇2⊥A‖ = − ρi∇2⊥Ψ√ , u‖i = 0, (222) , (223) where we used the definitions (135) of the stream and flux functions Φ and Ψ. These equations are a reflection of the fact that, for k⊥ρi ≫ 1, the ion response is effectively purely Boltzmann, with the gyrokinetic part hi contributing nothing to the fields or flows [see Eq. 
(54) with hi omitted; hi does, however, play an impor- tant role in the energy balance and ion heating, as explained in §§ 7.8-7.10 below]. The Boltzmann response for ion den- sity is expressed by Eq. (221). Equation (222) states that the parallel ion flow velocity can be neglected. Finally, Eq. (223) expresses the pressure balance for Boltzmann (and, therefore, isothermal) electrons [Eq. (103)] and ions: if we write B0δB‖ = −δpi − δpe = −T0iδni − T0eδne, (224) 27 Some of the energy of compressive fluctuations may go into ion heat via collisional (§ 6.1.2) or collisionless (§ 6.2.2) damping of these fluctuations in the inertial range. Whether this is a significant ion heating mechanism depends on the efficiency of the parallel cascade (see § 6.2.4 and § 6.3). 28 How much energy is converted into ion entropy fluctuations in the pro- cess of a nonlinear turbulent cascade is not necessarily directly related to the strength of the linear collisionless damping. KINETIC TURBULENCE IN MAGNETIZED PLASMAS 31 FIG. 8.— Numerical solutions of the linear gyrokinetic dispersion relation (for a detailed treatment of the linear theory, see Howes et al. 2006) showing the transition from the Alfvén wave to KAW between the inertial range (k⊥ρi ≪ 1) and the dissipation range (k⊥ρi & 1). We show three cases: low beta (βi = 0.01), βi = 1, and high beta (βi = 100). In all three cases, τ = 1 and Z = 1. Bold solid lines show the real frequency ω, bold dashed lines the damping rate γ, both normalized by k‖vA (in gyrokinetics, ω/k‖vA and γ/k‖vA are functions of k⊥ only). Dotted lines show the asymptotic KAW solution (230). Horizontal solid line shows the Alfvén wave ω = k‖vA. Vertical solid lines show k⊥ρi = 1 and k⊥ρe = 1. Note that the damping can be considered strong if the characteristic decay time is comparable or shorter than the wave period, i.e., γ/ω & 1/2π. 
Thus, in these plots, the damping at k⊥ρi ∼ 1 is relatively weak for βi = 1, relatively strong for low beta and very strong for high beta. it follows that , (225) which, combined with Eq. (221), gives Eq. (223). We remind the reader that the perpendicular Ampère’s law, from which Eq. (223) was derived [Eq. (66) via Eq. (120)] is, in gyrokinet- ics, indeed equivalent to the statement of perpendicular pres- sure balance (see § 3.3). Substituting Eqs. (221-223) into Eqs. (116-117), we obtain the following closed system of equations b̂ ·∇Φ, (226) 2 +βi 1 + Z/τ ) b̂ ·∇ ρ2i ∇2⊥Ψ . (227) Note that, using Eq. (223), Eqs. (226) and (227) can be recast as two coupled evolution equations for the perpendicular and parallel components of the perturbed magnetic field, respec- tively [Eqs. (C10) in Appendix C.2]. We shall refer to Eqs. (226-227) as Electron Reduced MHD (ERMHD). They are related to the Electron Magnetohydrody- namics (EMHD)—a fluid-like approximation that evolves the magnetic field only and arises if one assumes that the mag- netic field is frozen into the electron flow velocity ue, while the ions are immobile, ui = 0 (Kingsep et al. 1990): 4πen0e ∇× [(∇×B)×B] . (228) As explained in Appendix C.2, the result of applying the RMHD/gyrokinetic ordering (§ 2.1 and § 3.1) to Eq. (228), where B = B0ẑ + δB and ẑ×∇⊥Ψ+ ẑ , (229) coincides with our Eqs. (226-227) in the effectively incom- pressible limits of βi ≫ 1 or βe = βiZ/τ ≫ 1. When betas are arbitrary, density fluctuations cannot be neglected compared to the magnetic-field-strength fluctuations [Eq. (225)] and give rise to perpendicular ion flows with ∇·ui 6= 0. Thus, our ERMHD system constitutes the appropriate generalization of EMHD for low-frequency anisotropic fluctuations without the assumption of incompressibility. 
A (more tenuous) relationship also exists between our ERMHD system and the so-called Hall MHD, which, like EMHD, is based on the magnetic field being frozen into the electron flow, but includes the ion motion via the stan- dard MHD momentum equation [Eq. (8)]. Strictly speak- ing, Hall MHD can only be used in the limit of cold ions, τ = T0i/T0e ≪ 1 (see, e.g., Ito et al. 2004; Hirose et al. 2004, and Appendix E), in which case it can be shown to reduce to Eqs. (226-227) in the appropriate small-scale limit (Ap- pendix E). Although τ ≪ 1 is not a natural assumption for most space and astrophysical plasmas, Hall MHD has, due to its simplicity, been a popular theoretical paradigm in the stud- ies of space and astrophysical plasma turbulence (see § 8.2.6). We have therefore devoted Appendix E to showing how this approximation fits into the theoretical framework proposed here: namely, we derive the anisotropic low-frequency ver- sion of the Hall MHD approximation from gyrokinetics under the assumption τ ≪ 1 and discuss the role of the ion inertial and ion sound scales, which acquire physical significance in this limit. However, outside this Appendix, we assume τ ∼ 1 everywhere and shall not use Hall MHD. The validity of the ERMHD equations as a model for plasma dynamics in the dissipation range is further discussed in § 7.6. 7.3. Kinetic Alfvén Waves The linear modes supported by ERMHD are kinetic Alfvén waves (KAW) with frequencies ωk = ± 1 + Z/τ 2 +βi 1 + Z/τ ) k⊥ρik‖vA. (230) This dispersion relation is illustrated in Fig. 8: note that the transition from Alfvén waves to dispersive KAW always oc- curs at k⊥ρi ∼ 1, even when βi ≪ 1 or βi ≫ 1. In the latter case, there is a sharp frequency jump at the transition (accom- panied by very strong ion Landau damping). The eigenfunctions corresponding to the two waves with 32 SCHEKOCHIHIN ET AL. FIG. 9.— Polarization of the kinetic Alfvén wave, see Eqs. (232) and (233). frequencies (230) are 2 +βi ∓ k⊥Ψk. (231) Using Eqs. 
(229) and (223), the perturbed magnetic-field vec- tor can be expressed as follows = −iẑ× k⊥ 1 + Z/τ 2 +βi 1 + Z/τ (232) so, for a single “+” or “−” wave (corresponding to Θ−k = 0 or k = 0, respectively), δBk rotates in the plane perpendicular to the wave vector k⊥ clockwise with respect to the latter, while the wave propagates parallel or antiparallel to the guide field (Fig. 9). The waves are elliptically right-hand polarized. Indeed, us- ing Eq. (223), the perpendicular electric field is: E⊥k = −ik⊥ϕ+ −ik⊥ + ẑ×k⊥ ϕ (233) (cf. Gary 1986; Hollweg 1999). The second term is small in the gyrokinetic expansion, so this is a very elongated ellipse (Fig. 9). 7.4. Finite-Amplitude Kinetic Alfvén Waves As we are about to argue for a critically balanced KAW turbulence in a fashion analogous to the GS theory for the Alfvén waves (§ 1.2), it is a natural question to ask how simi- lar the nonlinear properties of a putative KAW cascade will be to an Alfvén-wave cascade. As in the case of Alfvén waves, there are two counterpropagating linear modes [Eqs. (230) and (231)], and it turns out that certain superpositions of these modes (KAW packets) are also exact nonlinear solutions of Eqs. (226-227). Let us show that this is the case. We might look for the nonlinear solutions of Eqs. (226-227) by requiring that the nonlinear terms vanish. Since b̂ · ∇ = ∂/∂z + (1/vA){Ψ, · · ·}, this gives {Ψ,Φ} = 0 ⇒ Ψ = c1Φ, (234) {Ψ,ρ2i ∇2⊥Ψ} = 0 ⇒ ρ2i ∇2⊥Ψ = c2Ψ, (235) where c1 and c2 are constants. Whether such solutions are possible is determined by substituting Eqs. (234) and (235) into Eqs. (226) and (227) and demanding that the two result- ing linear equations be consistent with each other (both equa- tions now just evolve Ψ). This is achieved if29 c21 = − 2 +βi , (236) so real solutions exist if c2 < 0. In particular, wave pack- ets consisting of KAW given by one of the linear eigen- modes (231) with an arbitrary shape in z but confined to a single shell |k⊥| = k⊥ = const, satisfy Eqs. 
(234-236) with $c_2 = -k_\perp^2\rho_i^2$. This outcome is, in fact, only mildly non-trivial: in gyrokinetics, the Poisson bracket nonlinearity [Eq. (59)] vanishes for any monochromatic (in k⊥) mode because the Poisson bracket of two modes with wavenumbers k⊥ and k′⊥ is ∝ ẑ · (k⊥ × k′⊥). Therefore, any monochromatic solution of the linearized equations is also an exact nonlinear solution. As we have shown above, a superposition of monochromatic KAW that have a fixed k⊥, or, somewhat more generally, satisfy Eq. (235) with a fixed c2, is still an exact solution.

Note that a similar procedure applied to the RMHD equations (17-18) returns the Elsasser solutions: perturbations of arbitrary shape that satisfy Φ = ±Ψ. The physical difference between these finite-amplitude Alfvén-wave packets and the finite-amplitude KAW packets discussed above is that nonlinear interactions can occur not just between counterpropagating KAW but also between copropagating ones. This is a natural conclusion because KAW are dispersive (their group velocity along the guide field is ∝ vAk⊥ρi), so copropagating waves with different k⊥ can “catch up” with each other and interact.³⁰

7.5. Scalings for KAW Turbulence

A scaling theory for the turbulence described by Eqs. (221-227) can be constructed along the same lines as the GS theory for the Alfvén-wave turbulence (§ 1.2). Namely, we shall assume that the turbulence below the ion gyroscale consists of KAW-like fluctuations with k‖ ≪ k⊥ (Quataert & Gruzinov 1999) and that the interactions between them are critically balanced (Cho & Lazarian 2004), i.e., that the propagation time and nonlinear interaction time are comparable at every scale. We stress that none of these assumptions are, strictly speaking, inevitable³¹ (and, in fact, neither were they inevitable in the case of Alfvén waves). Since we have derived Eqs.
(226-227) from gyrokinetics, the anisotropy of the fluctuations described by these equations is hard-wired, but it is not guaranteed that the actual physical cascade below the ion gyroscale is indeed anisotropic, although analysis of solar-wind measurements does seem to indicate that at least a significant fraction of it is (see Leamon et al. 1998; Hamilton et al. 2008). Numerical simulations based on Eq. (228) (Biskamp et al. 1996, 1999; Ghosh et al. 1996; Ng et al. 2003; Cho & Lazarian 2004; Shaikh & Zank 2005) have revealed that the spectrum of magnetic fluctuations scales as $k_\perp^{-7/3}$, the outcome consistent with the assumptions stated above. Let us outline the argument that leads to this scaling.

²⁹ Formally speaking, c1 and c2 can depend on t and z. If this is allowed, we still recover Eq. (236), but in addition to it, we get the evolution equation c1∂c1/∂t = vA(1 + Z/τ)∂c1/∂z. This allows c1 = const, but there are, of course, other solutions. We shall not consider them here.

³⁰ The calculation above is analogous to the calculation by Mahajan & Krishan (2005) for incompressible Hall MHD (i.e., essentially, the high-βe limit of the equations discussed in Appendix E), but the result is more general in the sense that it holds at arbitrary ion and electron betas. The Mahajan–Krishan solution in the EMHD limit amounts to noticing that Eq. (228) becomes linear for force-free (Beltrami) magnetic perturbations, ∇×δB = λδB. Substituting Eq. (229) into this equation and using Eq. (223), we see that the force-free equation is equivalent to Eqs. (234-236) if c2 = −λ² and the incompressible limit (βi ≫ 1 or βe = βiZ/τ ≫ 1) is taken.

³¹ In fact, the EMHD turbulence was thought to be weak by several authors, who predicted a $k^{-2}$ spectrum of magnetic energy assuming isotropy (Goldreich & Reisenegger 1992) or $k_\perp^{-5/2}$ for the anisotropic case (Voitenko 1998; Galtier & Bhattacharjee 2003; Galtier 2006).
First assume that the fluctuations are KAW-like and that Θ+ and Θ− [Eq. (231)] have similar scaling. This implies

$$\sqrt{1+\beta_i}\;\Phi_\lambda \sim \frac{\rho_i}{\lambda}\,\Psi_\lambda \tag{237}$$

(for the purposes of scaling arguments and order-of-magnitude estimates, we set Z/τ = 1, but keep the βi dependence so that the low- and high-beta limits can be recovered if necessary). The fact that fixed-k⊥ KAW packets, which satisfy Eq. (237) with λ = 1/k⊥, are exact nonlinear solutions of the ERMHD equations (§ 7.4) lends some credence to this assumption.

Assuming scale-space locality of interactions implies a constant-flux KAW cascade: analogously to Eq. (1),

$$\frac{(\Psi_\lambda/\lambda)^2}{\tau_{\rm KAW\lambda}} \sim (1+\beta_i)\,\frac{(\Phi_\lambda/\rho_i)^2}{\tau_{\rm KAW\lambda}} \sim \varepsilon_{\rm KAW} = {\rm const}, \tag{238}$$

where τKAWλ is the cascade time and εKAW is the KAW energy flux proportional to the fraction of the total flux ε (or the total turbulent power Pext; see § 3.4) that was converted into the KAW cascade at the ion gyroscale. Using Eqs. (226-227) and Eq. (237), it is not hard to see that the characteristic nonlinear decorrelation time is λ²/Φλ. If the turbulence is strong, then this time is comparable to the inverse KAW frequency [Eq. (230)] scale by scale and we may assume the cascade time is comparable to either:

$$\tau_{\rm KAW\lambda} \sim \frac{\lambda^2}{\Phi_\lambda} \sim \frac{\sqrt{1+\beta_i}}{v_A}\,\frac{l_{\parallel\lambda}\,\lambda}{\rho_i}. \tag{239}$$

In other words, this says that ∂/∂z ∼ (δB⊥/B0) · ∇⊥ and so δB⊥λ/B0 ∼ λ/l‖λ (note that the last relation confirms that our scaling arguments do not violate the gyrokinetic ordering; see § 2.1 and § 3.1). Equation (239) is the critical-balance assumption for KAW. As in the case of the Alfvén waves (§ 1.2), we might argue physically that the critical balance is set up because the parallel correlation length l‖λ is determined by the condition that a wave can propagate the distance l‖λ in one nonlinear decorrelation time corresponding to the perpendicular correlation length λ.

Combining Eqs. (238) and (239), we get the desired scaling relations for the KAW turbulence:

$$\Phi_\lambda \sim \left(\frac{\varepsilon_{\rm KAW}}{\varepsilon}\right)^{1/3}\frac{v_A}{(1+\beta_i)^{1/3}}\;l_0^{-1/3}\rho_i^{2/3}\lambda^{2/3}, \tag{240}$$

$$l_{\parallel\lambda} \sim \left(\frac{\varepsilon}{\varepsilon_{\rm KAW}}\right)^{1/3}\frac{l_0^{1/3}\rho_i^{1/3}\lambda^{1/3}}{(1+\beta_i)^{1/6}}, \tag{241}$$

where $l_0 = v_A^3/\varepsilon$, as in § 1.2.
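The exponents of λ in the two scaling relations above follow from simple bookkeeping of powers. The short Python sketch below is an illustration added here, not part of the original derivation: it assumes pure power laws Φλ ∝ λ^a and l‖λ ∝ λ^b, drops every order-unity and βi-dependent prefactor, and uses hypothetical function names.

```python
from fractions import Fraction


def kaw_exponents():
    """Exponents (a, b) in Phi ~ lambda^a and l_par ~ lambda^b.

    Relations used, with all prefactors dropped (an assumption of this sketch):
      constant flux:    Phi^2 / tau = const
      nonlinear time:   tau ~ lambda^2 / Phi
      critical balance: tau ~ l_par * lambda
    """
    # Constant flux with tau ~ lambda^2 / Phi gives Phi^3 / lambda^2 = const,
    # i.e. 3a - 2 = 0.
    a = Fraction(2) / 3
    # tau ~ lambda^(2 - a); critical balance then requires b + 1 = 2 - a.
    tau_exponent = 2 - a
    b = tau_exponent - 1
    return a, b


def spectral_index(p):
    """Index of the 1D spectrum E(k) ~ k^index for an amplitude ~ lambda^p."""
    return -(2 * p + 1)


a, b = kaw_exponents()
magnetic_index = spectral_index(a)      # delta B ~ Psi / lambda ~ Phi
electric_index = spectral_index(a - 1)  # E ~ Phi / lambda
print(a, b, magnetic_index, electric_index)
```

Exact rational arithmetic is used so that the exponents come out as fractions rather than rounded floats.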
The first of these scaling relations is equivalent to a $k_\perp^{-7/3}$ spectrum of magnetic energy, the second quantifies the anisotropy (which is stronger than for the GS turbulence). Both scalings were confirmed in the numerical simulations of Cho & Lazarian (2004): it is their detection of the scaling (241) that makes a particularly strong case that KAW turbulence is not weak and that the critical balance hypothesis applies.

For KAW-like fluctuations, the density [Eq. (221)] and magnetic field [Eqs. (223) and (231)] have the same spectrum as the scalar potential, i.e., $k_\perp^{-7/3}$, while the electric field E ∼ k⊥ϕ has a $k_\perp^{-1/3}$ spectrum. The solar-wind fluctuation spectra reported by Bale et al. (2005) indeed are consistent with a transition to KAW turbulence around the ion gyroscale: $k^{-5/3}$ magnetic and electric-field power spectra at kρi ≪ 1 are replaced, for kρi ≳ 1, with what appears to be consistent with a $k^{-7/3}$ scaling for the magnetic-field spectrum and a $k^{-1/3}$ for the electric one (see Fig. 1). A similar result is recovered in fully gyrokinetic simulations with βi = 1, τ = 1 (Howes et al. 2008b).

However, not all solar-wind observations are quite as straightforwardly supportive of the notion of the KAW cascade and much steeper magnetic-fluctuation spectra have also been reported (e.g., Denskat et al. 1983; Leamon et al. 1998; Smith et al. 2006). Possible reasons for this will emerge in § 7.6 and § 7.11 and the solar-wind data are further discussed in § 8.2.4 and § 8.2.5.

7.6. Validity of the Electron RMHD and the Effect of Electron Landau Damping

The ERMHD equations derived in § 7 are valid provided k⊥ρi ≫ 1 and also provided it is sufficient to use the leading order in the mass-ratio expansion (isothermal electrons; see § 4). In particular, this means that the electron Landau damping is neglected. Asymptotically speaking, this is a rigorous limit, but one must be cautious in applying it to real plasmas.
Since the width of the scale range where k⊥ρi ≫ 1 and k⊥ρe ≪ 1 is only ∼ $(m_i/m_e)^{1/2}$ ≃ 43, for some values of the plasma parameters (T0i/T0e and βi) there may not be a very broad interval of scales where the electron Landau damping is truly negligible. Consider, for example, the low-beta limit, βi ≪ 1. In this limit, the KAW frequency is ω ∼ k⊥ρik‖vA [Eq. (230)]. The electron Landau damping becomes important when ω ∼ k‖vthe, or k⊥ρe ∼ $\sqrt{\beta_i}$ ≪ 1, so the ERMHD approximation breaks down and, consequently, the KAW cascade, if any, should be interrupted well before the electron gyroscale is reached. Figure 8 shows the solution of the full gyrokinetic dispersion relation (Howes et al. 2006) for small, unity and large βi. One can judge for which scales and how well (or how badly) the ERMHD approximation holds from the precision with which the exact frequency follows the asymptotic solution Eq. (230) and from the relative strength of the damping compared to the real frequency of the waves.

Non-negligible electron Landau damping may affect turbulence spectra because one can no longer assume a constant flux of KAW energy as we did in § 7.5. To evaluate the consequences of this effect, Howes et al. (2008a) constructed a simple model of spectral energy transfer and concluded that Landau damping leads to steepening of the KAW spectra, one of several possible reasons for steep dissipation-range spectra observed in space plasmas (see also § 7.11).

7.7. Unfreezing of Flux

As ERMHD is a limit of the isothermal-electron-fluid system (§ 4), the magnetic-field lines remain unbroken (see § 4.3). Within the orderings employed above (small mass ratio, νii ∼ ω, βi ∼ 1, τ ∼ 1), the flux unfreezes only in the vicinity of the electron gyroscale. It is interesting to evaluate somewhat more precisely the scale at which this happens as a function of plasma parameters.
Physically, there are three kinds of mechanisms by which the flux conservation is broken: electron inertia, the effects of finite electron gyroradius, and Ohmic resistivity. Let us take the v‖ moment of the electron gyrokinetic equation [Eq. (57), s = e, integration at constant r] and use Eq. (222) to evaluate the inertial term in the resulting parallel electron momentum equation: d2e∇2⊥A‖, (242) where $d_e = \rho_e/\sqrt{\beta_e}$ is the electron inertial scale and βe = Zβi/τ. Comparing this with the ∂A‖/∂t term in the right-hand side of the electron momentum equation, we see that the electron inertia becomes important when k⊥ρe ∼ $\sqrt{\beta_e}$. The finite-gyroradius effects enter when k⊥ρe ∼ 1. Thus, at low βe, the electron inertia becomes important above the electron gyroscale, whereas at high βe, the finite-gyroradius effects enter first. Finally, the Ohmic resistivity comes from the collision term (see Appendix B.4): d3vv‖ νeiu‖e ∼ νeik2⊥d2e A‖. (243) Thus, resistivity starts to act when k⊥de ∼ $(\omega/\nu_{ei})^{1/2}$. Using the KAW frequency [Eq. (230)] to estimate ω and assuming that τ is not small, we get k⊥ρe ∼ k‖λmfpi 1 +βi . (244) Thus, the resistive scale can only be larger than the electron gyroscale if the plasma is collisional (k‖λmfpi ≪ 1) and/or electrons are much colder than ions (τ ≫ 1) and/or βi ≪ 1. Note that if only the last of these conditions is satisfied, the electron inertia still becomes important at larger scales than resistivity.

7.8. Generalized Energy: KAW and Entropy Cascades

The generalized energy (§ 3.4) in the limit k⊥ρi ≫ 1 is calculated by substituting Eqs. (221) and (223) into Eq. (109): T0i〈h2i 〉r n0iT0i =Whi +WKAW. (245) Here the first term, Whi, is the total variance of hi, which is proportional to minus the entropy of the ion gyrocenter distribution (see § 3.5) and whose cascade to collisional scales will be discussed in § 7.9 and § 7.10. The remaining two terms are the independently cascaded KAW energy: WKAW = min0i |∇⊥Ψ|2 min0i |Θ+|2 + |Θ−|2 .
(246) Although we can write WKAW as the sum of the energies of the “+” and “−” linear KAW eigenmodes [Eq. (231)], which are also exact nonlinear solutions (§ 7.4), the two do not cascade independently and can exchange energy. Note that the ERMHD equations also conserve ∫d³r ΨΦ, which is readily interpreted as the helicity of the perturbed magnetic field (see Appendix F.3). However, it does not affect the KAW cascade discussed in § 7.5 because it can be argued to have a tendency to cascade inversely (Appendix F.6).

Comparing the way the generalized energy is split above and below the ion gyroscale (see § 5.6 for the k⊥ρi ≪ 1 limit), we interpret what happens at the k⊥ρi ∼ 1 transition as a redistribution of the power that arrived from large scales between a cascade of KAW and a cascade of the (minus) gyrocenter entropy in the phase space (see Fig. 5). The latter cascade is the way in which the energy diverted from the electromagnetic fluctuations by the collisionless damping (wave–particle interaction) can be transferred to the collisional scales and deposited into heat (§ 7.1). The concept of entropy cascade as the key agent in the heating of the plasma was introduced in § 3.5, where we promised a more detailed discussion later on. We now proceed to this discussion.

7.9. Entropy Cascade

The ion-gyrocenter distribution function hi satisfies the ion gyrokinetic equation (121), where ion–electron collisions are neglected under the mass-ratio expansion. At k⊥ρi ≫ 1, the dominant contribution to 〈χ〉Ri comes from the electromagnetic fluctuations associated with KAW turbulence. Since the KAW cascade is decoupled from the entropy cascade, hi is a passive tracer of the ring-averaged KAW turbulence in phase space. Expanding the Bessel functions in the expression for 〈χ〉Ri,k [ai ≫ 1 in Eq. (69) with s = i] and making use of Eqs. (222-223) and of the KAW scaling Ψ ∼ Φ/k⊥ρi [Eq.
(231)], it is not hard to show that 〈χ〉Ri ,k ≃ 〈ϕ〉Ri ,k = J0(ai)Φk , (247) where

$$J_0(a_i) \simeq \sqrt{\frac{2}{\pi a_i}}\,\cos\!\left(a_i - \frac{\pi}{4}\right), \qquad a_i = k_\perp\rho_i\,\frac{v_\perp}{v_{{\rm th}i}}, \tag{248}$$

so hi satisfies [Eq. (121)]

$$\frac{\partial h_i}{\partial t} + v_\parallel\frac{\partial h_i}{\partial z} + \{\langle\Phi\rangle_{\mathbf R_i},h_i\} = \frac{2}{\sqrt{\beta_i}\,\rho_i v_A}\,\frac{\partial\langle\Phi\rangle_{\mathbf R_i}}{\partial t}\,F_{0i} + \langle C_{ii}[h_i]\rangle_{\mathbf R_i} \tag{249}$$

with the conservation law [Eq. (70), s = i]

$$\frac{1}{T_{0i}}\frac{dW_{h_i}}{dt} \equiv \frac{d}{dt}\int d^3v\int d^3\mathbf R_i\,\frac{h_i^2}{2F_{0i}} = \frac{2}{\sqrt{\beta_i}\,\rho_i v_A}\int d^3v\int d^3\mathbf R_i\,\frac{\partial\langle\Phi\rangle_{\mathbf R_i}}{\partial t}\,h_i + \int d^3v\int d^3\mathbf R_i\,\frac{h_i\,\langle C_{ii}[h_i]\rangle_{\mathbf R_i}}{F_{0i}}. \tag{250}$$

7.9.1. Nonlinear Perpendicular Phase Mixing

The wave–particle interaction term (the first term on the right-hand sides of these two equations) will shortly be seen to be subdominant at k⊥ρi ≫ 1. It represents the source of the invariant Whi due to the collisionless damping at the ion gyroscale of some fraction of the energy arriving from the inertial range. In a stationary turbulent state, we should have dWhi/dt = 0 and this source should be balanced on average by the (negative definite) collisional dissipation term ( = heating; see § 3.5). This balance can only be achieved if hi develops small scales in the velocity space and carries the generalized energy, or, in this case, entropy, to scales in the phase space at which collisions are important. A quick way to see this is by recalling that the collision operator has two velocity derivatives and can only balance the terms on the left-hand side of Eq. (249) if

$$\nu_{ii}\,v_{{\rm th}i}^2\,\frac{\partial^2}{\partial v^2} \sim \omega \;\Rightarrow\; \frac{\delta v}{v_{{\rm th}i}} \sim \left(\frac{\nu_{ii}}{\omega}\right)^{1/2}, \tag{251}$$

where ω is the characteristic frequency of the fluctuations of hi. If νii ≪ ω, δv/vthi ≪ 1. This is certainly true for k⊥ρi ∼ 1: taking ω ∼ k‖vA and using k‖λmfpi ≫ 1 (which is the appropriate limit at and below the ion gyroscale for most of the plasmas of interest; cf. footnote 24), we have νii/ω ∼ $\sqrt{\beta_i}$/k‖λmfpi ≪ 1.

FIG. 10. Nonlinear perpendicular phase-mixing mechanism: the gyrocenter distribution function at Ri of particles with velocities v⊥ and v′⊥ is mixed by turbulent fluctuations of the potential Φ (E×B flows) averaged over particle orbits separated by a distance greater than the correlation length of Φ.
The condition (251) means that the collision rate can be arbitrarily small: this will always be compensated by the sufficiently fine velocity-space structure of the distribution function to produce a finite amount of entropy production (heating) independent of νii in the limit νii → +0. The situation bears some resemblance to the emergence of small spatial scales in neutral-fluid turbulence with arbitrarily small but non-zero viscosity (Kolmogorov 1941). The analogy is not perfect, however, because the ion gyrokinetic equation (249) does not contain a nonlinear interaction term that would explicitly cause a cascade in the velocity space. Instead, the (ring-averaged) KAW turbulence mixes hi in the gyrocenter space via the nonlinear term in Eq. (249), so hi will have small-scale structure in Ri on characteristic scales much smaller than ρi. Let us assume that the dominant nonlinear effect is a local interaction of the small-scale fluctuations of hi with the similarly small-scale component of 〈Φ〉Ri. Since ring averaging is involved and k⊥ρi is large, the values of 〈Φ〉Ri corresponding to two velocities v and v′ will come from spatially decorrelated electromagnetic fluctuations if k⊥v⊥/Ωi and k⊥v′⊥/Ωi [the argument of the Bessel function in Eq. (247)] differ by order unity, i.e., for

$$\frac{\delta v_\perp}{v_{{\rm th}i}} \sim \frac{|v_\perp - v'_\perp|}{v_{{\rm th}i}} \sim \frac{1}{k_\perp\rho_i} \tag{252}$$

(see Fig. 10). This relation gives a correspondence between the decorrelation scales of hi in the position and velocity space. Combining Eqs. (252) and (251), we see that there is a collisional cutoff scale determined by k⊥ρi ∼ $(\omega/\nu_{ii})^{1/2}$ ≫ 1.³² The cutoff scale is much smaller than the ion gyroscale. In the range between these scales, collisional dissipation is small. The ion entropy fluctuations are transferred across this scale range by means of a cascade, for which we will construct a scaling theory in § 7.9.2 (and, for the case without the background KAW turbulence, in § 7.10).
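The cutoff estimate above combines two order-of-magnitude relations: the collisional balance δv/vthi ∼ (νii/ω)^(1/2) and the phase-space correspondence δv⊥/vthi ∼ 1/(k⊥ρi). A minimal Python sketch of that arithmetic follows; the function name and the numerical ratio are illustrative assumptions, and the result is meaningful only up to factors of order unity.

```python
import math


def collisional_cutoff(omega, nu_ii):
    """Order-of-magnitude phase-space cutoff for the ion entropy cascade.

    omega : characteristic fluctuation frequency of h_i
    nu_ii : ion-ion collision frequency (same units as omega)
    Returns (delta_v / v_thi, k_perp * rho_i) at the cutoff.
    """
    if omega <= 0 or nu_ii <= 0:
        raise ValueError("omega and nu_ii must be positive")
    dv_over_vth = math.sqrt(nu_ii / omega)  # collisional balance
    kperp_rho = 1.0 / dv_over_vth           # velocity/position correspondence
    return dv_over_vth, kperp_rho


# Illustrative (made-up) frequency ratio omega / nu_ii = 1e4.
dv, k = collisional_cutoff(omega=1.0e4, nu_ii=1.0)
print(dv, k)
```

A weakly collisional case (νii ≪ ω) therefore gives a cutoff wavenumber far above the ion gyroscale, as the text states.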
It is important to emphasize that no matter how small the collisional cutoff scale is, all of the generalized energy channeled into the entropy cascade at the ion gyroscale eventually reaches it and is converted into heat. Note that the rate at which this happens is in general amplitude-dependent because the process is nonlinear, although we will argue in § 7.9.4 (see also § 7.10.3) that the nonlinear cascade time and the parallel linear propagation (particle streaming) time are related by a critical-balance-like condition (we will also argue there that the linear parallel phase mixing, which can generate small scales in v‖, is a less efficient process than the nonlinear perpendicular one discussed above).

It is interesting to note the connection between the entropy cascade and certain aspects of the gyrofluid closure formalism developed by Dorland & Hammett (1993). In their theory, the emergence of small scales in v⊥ manifested itself as the growth of high-order v⊥ moments of the gyrocenter distribution function. They correctly identified this effect as a consequence of the nonlinear perpendicular phase mixing of the gyrocenter distribution function caused by a perpendicular-velocity-space spread in the ring-averaged E×B velocities (given by 〈uE〉Ri = ẑ×∇〈Φ〉Ri in our notation) arising at and below the ion gyroscale.

7.9.2. Scalings

Since entropy is a conserved quantity, we will follow the well-trodden Kolmogorov path, assume locality of interactions in scale space and constant entropy flux, and conclude, analogously to Eq. (1),

$$\frac{v_{{\rm th}i}^8}{n_{0i}^2}\,\frac{h_{i\lambda}^2}{\tau_{h\lambda}} \sim \varepsilon_h = {\rm const}, \tag{253}$$

where εh is the entropy flux proportional to the fraction of the total turbulent power ε (or Pext; see § 3.4) that was diverted into the entropy cascade at the ion gyroscale, and $\tau_{h\lambda}$ is the cascade time that we now need to find.
By the critical-balance assumption, the decorrelation time of the electromagnetic fluctuations in KAW turbulence is comparable at each scale to the KAW period at that scale and to the nonlinear interaction time [Eq. (239)]:

$$\tau_{\rm KAW\lambda} \sim \frac{\lambda^2}{\Phi_\lambda} \sim \left(\frac{\varepsilon}{\varepsilon_{\rm KAW}}\right)^{1/3}(1+\beta_i)^{1/3}\,\frac{l_0^{1/3}\rho_i^{-2/3}\lambda^{4/3}}{v_A}. \tag{254}$$

The characteristic time associated with the nonlinear term in Eq. (249) is longer than τKAWλ by a factor of $(\rho_i/\lambda)^{1/2}$ due to the ring averaging, which reduces the strength of the nonlinear interaction. This weakness of the nonlinearity makes it possible to develop a systematic analytical theory of the entropy cascade (Schekochihin & Cowley 2009). It is also possible to estimate the cascade time via a more qualitative argument analogous to that first devised by Kraichnan (1965) for the weak turbulence of Alfvén waves: during each KAW correlation time τKAWλ, the nonlinearity changes the amplitude of hi by only a small amount:

$$\Delta h_{i\lambda} \sim (\lambda/\rho_i)^{1/2}\,h_{i\lambda} \ll h_{i\lambda}; \tag{255}$$

these changes accumulate with time as a random walk, so after time t, the cumulative change in amplitude is $\Delta h_{i\lambda}(t/\tau_{\rm KAW\lambda})^{1/2}$; finally, the cascade time $t = \tau_{h\lambda}$ is the time after which the cumulative change in amplitude is comparable to the amplitude itself, which gives, using Eq. (254),

$$\tau_{h\lambda} \sim \frac{\rho_i}{\lambda}\,\tau_{\rm KAW\lambda} \sim \left(\frac{\varepsilon}{\varepsilon_{\rm KAW}}\right)^{1/3}(1+\beta_i)^{1/3}\,\frac{l_0^{1/3}\rho_i^{1/3}\lambda^{1/3}}{v_A}. \tag{256}$$

Substituting this into Eq. (253), we get

$$h_{i\lambda} \sim \frac{n_{0i}}{v_{{\rm th}i}^3}\left(\frac{\varepsilon_h}{\varepsilon}\right)^{1/2}\left(\frac{\varepsilon}{\varepsilon_{\rm KAW}}\right)^{1/6}\frac{(1+\beta_i)^{1/6}}{\sqrt{\beta_i}}\;l_0^{-1/3}\rho_i^{1/6}\lambda^{1/6}, \tag{257}$$

which corresponds to a $k_\perp^{-4/3}$ spectrum of entropy.

³² Another source of small-scale spatial smoothing comes from the perpendicular gyrocenter-diffusion terms ∼ $-\nu_{ii}(v/v_{{\rm th}i})^2k_\perp^2\rho_i^2\,h_{i\mathbf k}$ that arise in the ring-averaged collision operators, e.g., the second term in the model operator (B13). These terms again enforce a cutoff wavenumber such that k⊥ρi ∼ $(\omega/\nu_{ii})^{1/2}$ ≫ 1.

In the argument presented above, we assumed that the scaling of hi was determined by the nonlinear mixing of hi by the ring-averaged KAW fluctuations rather than by the wave–particle interaction term on the right-hand side of Eq. (249). We can now confirm the validity of this assumption.
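The random-walk step in the cascade-time estimate can be illustrated numerically. In the sketch below (an illustration under stated assumptions, not the authors' calculation), each KAW correlation time delivers a kick of fractional size (λ/ρi)^(1/2) with a random sign; the argument in the text then predicts that about ρi/λ kicks are needed for the accumulated change to become of order unity. The value of λ/ρi, the trial count and the seed are arbitrary choices.

```python
import random


def rms_after_random_walk(step, n_steps, n_trials=2000, seed=0):
    """RMS displacement after n_steps kicks of size `step` with random sign."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_trials):
        x = 0.0
        for _ in range(n_steps):
            x += step if rng.random() < 0.5 else -step
        total_sq += x * x
    return (total_sq / n_trials) ** 0.5


lam_over_rho = 0.01                  # illustrative value of lambda / rho_i
step = lam_over_rho ** 0.5           # fractional change of h_i per tau_KAW
n_steps = round(1.0 / lam_over_rho)  # predicted tau_h / tau_KAW = rho_i / lambda
rms = rms_after_random_walk(step, n_steps)
print(n_steps, rms)
```

The printed RMS should be close to one, i.e. after ρi/λ correlation times the cumulative change is comparable to the amplitude itself.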
The change in amplitude of hi in one KAW correlation time τKAWλ due to the wave–particle interaction term is

$$\Delta h_{i\lambda} \sim \frac{n_{0i}}{v_{{\rm th}i}^3}\left(\frac{\lambda}{\rho_i}\right)^{1/2}\frac{\Phi_\lambda}{\sqrt{\beta_i}\,\rho_i v_A} \sim \frac{n_{0i}}{v_{{\rm th}i}^3}\left(\frac{\varepsilon_{\rm KAW}}{\varepsilon}\right)^{1/3}\frac{1}{\sqrt{\beta_i}\,(1+\beta_i)^{1/3}}\;l_0^{-1/3}\rho_i^{-5/6}\lambda^{7/6}, \tag{258}$$

where we have used Eq. (240). Comparing this with Eq. (255) and using Eq. (257), we see that Δhiλ in Eq. (258) is a factor of $(\lambda/\rho_i)^{1/2}$ smaller than Δhiλ due to the nonlinear mixing.

7.9.3. Phase-Space Cutoff

To work out the cutoff scales both in the position and velocity space, we use Eqs. (251) and (252): in Eq. (251), ω ∼ $1/\tau_{h\lambda}$, where $\tau_{h\lambda}$ is the characteristic decorrelation time of hi given by Eq. (256); using Eq. (252), we find the cutoffs:

$$\frac{\delta v_\perp}{v_{{\rm th}i}} \sim \frac{1}{k_\perp\rho_i} \sim (\nu_{ii}\tau_{\rho_i})^{3/5} = {\rm Do}^{-3/5}, \tag{259}$$

where τρi is the cascade time [Eq. (256)] taken at λ = ρi. By a recently established convention, the dimensionless number Do = 1/νiiτρi is called the Dorland number. It plays the role of Reynolds number for kinetic turbulence, measuring the scale separation between the ion gyroscale and the collisional dissipation scale (Schekochihin et al. 2008b; Tatsuno et al. 2009a,b).

7.9.4. Parallel Phase Mixing

Another assumption, which was made implicitly, was that the parallel phase mixing due to the second term on the left-hand side of Eq. (249) could be ignored. This requires justification, especially because it is with this “ballistic” term that one traditionally associates the emergence of small-scale structure in the velocity space (e.g., Krommes & Hu 1994; Krommes 1999; Watanabe & Sugama 2004). The effect of the parallel phase mixing is to produce small scales in velocity space δv‖ ∼ 1/k‖t. Let us assume that the KAW turbulence imparts its parallel decorrelation scale to hi and use the scaling relation (241) to estimate k‖ ∼ $l_{\parallel\lambda}^{-1}$. Then, after one cascade time [Eq. (256)], hi is decorrelated on the parallel velocity scales

$$\frac{\delta v_\parallel}{v_{{\rm th}i}} \sim \frac{1}{k_\parallel v_{{\rm th}i}\,\tau_{h\lambda}} \sim \frac{1}{\sqrt{\beta_i(1+\beta_i)}} \sim 1. \tag{260}$$

We conclude that the nonlinear perpendicular phase mixing [Eq. (259)] is more efficient than the linear parallel one. Note that up to a βi-dependent factor Eq.
(260) is equivalent to a critical-balance-like assumption for hi in the sense that the propagation time is comparable to the cascade time, or $k_\parallel v_\parallel \sim \tau_{h\lambda}^{-1}$ [see Eq. (249)].

7.10. Entropy Cascade in the Absence of KAW Turbulence

It is not currently known how one might determine analytically what fraction of the turbulent power arriving from the inertial range to the ion gyroscale is channeled into the KAW cascade and what fraction is dissipated via the kinetic ion-entropy cascade introduced in § 7.9 (perhaps it can only be determined by direct numerical simulations). It is certainly a fact that in many solar-wind measurements, the relatively shallow magnetic-energy spectra associated with the KAW cascade (§ 7.5) fail to appear and much steeper spectra are detected (close to $k^{-4}$; see Leamon et al. 1998; Smith et al. 2006). In view of this evidence, it is interesting to ask what would be the nature of electromagnetic fluctuations below the ion gyroscale if the KAW cascade failed to be launched, i.e., if all (or most) of the turbulent power were directed into the entropy cascade (i.e., if W ≃ Whi in § 7.8).

7.10.1. Equations

It is again possible to derive a closed set of equations for all fluctuating quantities. Let us assume (and verify a posteriori; § 7.10.4) that the characteristic frequency of such fluctuations is much lower than the KAW frequency [Eq. (230)] so that the first term in Eq. (116) is small and the equation reduces to the balance of the other two terms. This gives

$$\frac{\delta n_e}{n_{0e}} = \frac{e\varphi}{T_{0e}}, \tag{261}$$

meaning that the electrons are purely Boltzmann [he = 0 to lowest order; see Eq. (101)]. Then, from Eq. (118),

$$\left(1+\frac{\tau}{Z}\right)\frac{2\Phi}{\rho_i v_{{\rm th}i}} = \sum_{\mathbf k} e^{i\mathbf k\cdot\mathbf r}\,\frac{1}{n_{0i}}\int d^3v\,J_0(a_i)\,h_{i\mathbf k}. \tag{262}$$

Using Eq. (262), we find from Eq. (120) that the field-strength fluctuations are

$$\frac{\delta B_\parallel}{B_0} = -\frac{\beta_i}{2}\sum_{\mathbf k} e^{i\mathbf k\cdot\mathbf r}\,\frac{1}{n_{0i}}\int d^3v\,\frac{2v_\perp^2}{v_{{\rm th}i}^2}\,\frac{J_1(a_i)}{a_i}\,h_{i\mathbf k}, \tag{263}$$

which is smaller than Zeϕ/T0i by a factor of βi/k⊥ρi. Therefore, we can neglect δB‖/B0 compared to δne/n0e in Eq. (117). Using Eq.
(261), we get what is physically the electron continuity equation:

$$\frac{\partial}{\partial t}\frac{e\varphi}{T_{0e}} + \hat{\mathbf b}\cdot\nabla\left(\frac{c}{4\pi e n_{0e}}\nabla_\perp^2 A_\parallel + u_{\parallel i}\right) = 0, \tag{264}$$

$$u_{\parallel i} = \sum_{\mathbf k} e^{i\mathbf k\cdot\mathbf r}\,\frac{1}{n_{0i}}\int d^3v\,v_\parallel J_0(a_i)\,h_{i\mathbf k}. \tag{265}$$

Note that in terms of the stream and flux functions, Eq. (264) takes the form

$$\frac{\partial}{\partial z}\,\rho_i^2\nabla_\perp^2\Psi = \sqrt{\beta_i}\left(\frac{2\tau}{Z}\,\frac{1}{v_{{\rm th}i}}\frac{\partial\Phi}{\partial t} + \rho_i\frac{\partial u_{\parallel i}}{\partial z}\right), \tag{266}$$

where we have approximated b̂ · ∇ ≃ ∂/∂z, which will, indeed, be shown to be correct in § 7.10.4.

Together with the ion gyrokinetic equation, which determines hi, Eqs. (261-264) form a closed set. They describe low-frequency fluctuations of the density and electromagnetic field due solely to the presence of fluctuations of hi below the ion gyroscale.

It follows from Eq. (263) that δB‖/B0 contributes subdominantly to 〈χ〉Ri [Eq. (69) with s = i and ai ≫ 1]. It will be verified a posteriori (§ 7.10.4) that the same is true for A‖. Therefore, Eqs. (247) and (249) continue to hold, as in the case with KAW. This means that Eqs. (249) and (262) form a closed subset. Thus the kinetic ion-entropy cascade is self-regulating in the sense that hi is no longer passive (as it was in the presence of KAW turbulence; § 7.9) but is mixed by the ring-averaged “electrostatic” fluctuations of the scalar potential, which themselves are produced by hi according to Eq. (262). The magnetic fluctuations are passive and determined by the electrostatic and entropy fluctuations via Eqs. (263) and (264).

7.10.2. Scalings

From Eq. (262), we can establish a correspondence between Φλ and hiλ (the electrostatic fluctuations and the fluctuations of the ion-gyrocenter distribution function):

$$\Phi_\lambda \sim \rho_i v_{{\rm th}i}\left(\frac{\lambda}{\rho_i}\right)^{1/2}\frac{v_{{\rm th}i}^3\,h_{i\lambda}}{n_{0i}}\left(\frac{\delta v_\perp}{v_{{\rm th}i}}\right)^{1/2} \sim \frac{v_{{\rm th}i}^4}{n_{0i}}\,h_{i\lambda}\,\lambda, \tag{267}$$

where the factor of $(\lambda/\rho_i)^{1/2}$ comes from the Bessel function [Eq. (248)] and the factor of $(\delta v_\perp/v_{{\rm th}i})^{1/2}$ results from the v⊥ integration of the oscillatory factor in the Bessel function times hi, which decorrelates on small scales in the velocity space and, therefore, its integral accumulates in a random-walk-like fashion. The velocity-space scales are related to the spatial scales via Eq.
(252), which was arrived at by an argument not specific to KAW-like fluctuations and, therefore, continues to hold.

Using Eq. (267), we find that the wave–particle interaction term in the right-hand side of Eq. (249) is subdominant: comparing it with ∂hi/∂t shows that it is smaller by a factor of $(\lambda/\rho_i)^{3/2}$ ≪ 1. Therefore, it is the nonlinear term in Eq. (249) that controls the scalings of hiλ and Φλ.

We now assume again the scale-space locality and constancy of the entropy flux, so Eq. (253) holds. The cascade (decorrelation) time is equal to the characteristic time associated with the nonlinear term in Eq. (249): $\tau_{h\lambda} \sim (\rho_i/\lambda)^{1/2}\lambda^2/\Phi_\lambda$. Substituting this into Eq. (253) and using Eq. (267), we arrive at the desired scaling relations for the entropy cascade (Schekochihin et al. 2008b):

$$h_{i\lambda} \sim \frac{n_{0i}}{v_{{\rm th}i}^3}\left(\frac{\varepsilon_h}{\varepsilon}\right)^{1/3}\frac{1}{\sqrt{\beta_i}}\;l_0^{-1/3}\rho_i^{1/6}\lambda^{1/6}, \tag{268}$$

$$\Phi_\lambda \sim \left(\frac{\varepsilon_h}{\varepsilon}\right)^{1/3}\frac{v_{{\rm th}i}}{\sqrt{\beta_i}}\;l_0^{-1/3}\rho_i^{1/6}\lambda^{7/6}, \tag{269}$$

$$\tau_{h\lambda} \sim \left(\frac{\varepsilon}{\varepsilon_h}\right)^{1/3}\frac{\sqrt{\beta_i}}{v_{{\rm th}i}}\;l_0^{1/3}\rho_i^{1/3}\lambda^{1/3}, \tag{270}$$

where $l_0 = v_A^3/\varepsilon$, as in § 1.2. Note that since the existence of this cascade depends on it not being overwhelmed by the KAW fluctuations, we should have εKAW ≪ ε and εh = ε − εKAW ≈ ε.

The scaling for the ion-gyrocenter distribution function, Eq. (268), implies a $k_\perp^{-4/3}$ spectrum, the same as for the KAW turbulence [Eq. (257)]. The scaling for the cascade time, Eq. (270), is also similar to that for the KAW turbulence [Eq. (256)]. Therefore the velocity- and gyrocenter-space cutoffs are still given by Eq. (259), where τρi is now given by Eq. (270) taken at λ = ρi.

A new feature is the scaling of the scalar potential, given by Eq. (269), which corresponds to a $k_\perp^{-10/3}$ spectrum (unlike the KAW spectrum, § 7.5). This is a measurable prediction for the electrostatic fluctuations: the implied electric-field spectrum is $k_\perp^{-4/3}$. From Eq. (261), we also conclude that the density fluctuations should have the same spectrum as the scalar potential, $k_\perp^{-10/3}$, another measurable prediction.
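The three exponents in the scaling relations for the electrostatic entropy cascade can be recovered by counting powers of λ. The Python sketch below is an added illustration: it assumes pure power laws Φλ ∝ λ^p, hiλ ∝ λ^q and a cascade time ∝ λ^s, ignores all dimensional prefactors, and uses hypothetical function names.

```python
from fractions import Fraction


def electrostatic_cascade_exponents():
    """Return (p, q, s) for Phi ~ lambda^p, h ~ lambda^q, tau ~ lambda^s.

    Relations used (prefactors dropped, an assumption of this sketch):
      potential from h:  Phi ~ h * lambda                          => p = q + 1
      nonlinear time:    tau ~ (rho/lambda)^(1/2) lambda^2 / Phi   => s = 3/2 - p
      constant flux:     h^2 / tau = const                         => 2q = s
    """
    half = Fraction(1, 2)
    # Eliminate p and s: 2q = 3/2 - (q + 1), so 3q = 1/2.
    q = half / 3
    p = q + 1
    s = 3 * half - p
    return p, q, s


def spectral_index(x):
    """Index of the 1D spectrum E(k) ~ k^index for an amplitude ~ lambda^x."""
    return -(2 * x + 1)


p, q, s = electrostatic_cascade_exponents()
print("Phi, h, tau exponents:", p, q, s)
print("spectra (potential, entropy, electric field):",
      spectral_index(p), spectral_index(q), spectral_index(p - 1))
```

The electric-field index uses E ∼ Φ/λ, which is why it coincides with the entropy index in this cascade.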
The scalings derived above for the spectra of the ion distribution function and of the scalar potential have been confirmed in the numerical simulations by Tatsuno et al. (2009a,b), who studied decaying electrostatic gyrokinetic turbulence in two spatial dimensions. They also found velocity-space scalings in accord with Eq. (252) (using a spectral representation of the correlation functions in the v⊥ space based on the Hankel transform of the distribution function; see Plunk et al. 2009).

7.10.3. Parallel Cascade and Parallel Phase Mixing

We have again ignored the ballistic term (the second on the left-hand side) in Eq. (249). We will estimate the efficiency of the parallel spatial cascade of the ion entropy and of the associated parallel phase mixing by making a conjecture analogous to the critical balance: assuming that any two perpendicular planes only remain correlated provided particles can stream between them in one nonlinear decorrelation time (cf. § 1.2 and § 7.9.4), we conclude that the parallel particle-streaming frequency k‖v‖ should be comparable at each scale to the inverse nonlinear time $\tau_{h\lambda}^{-1}$, so

$$k_\parallel v_{{\rm th}i}\,\tau_{h\lambda} \sim 1. \tag{271}$$

As we explained in § 7.9.4, the parallel scales in the velocity space generated via the ballistic term are related to the parallel wavenumbers by δv‖ ∼ 1/k‖t. From Eq. (271), we find that after one cascade time $\tau_{h\lambda}$, the typical parallel velocity scale is δv‖/vthi ∼ 1, so the parallel phase mixing is again much less efficient than the perpendicular one.

Note that Eq. (271) combined with Eq. (270) means that the anisotropy is again characterized by the scaling relation $k_\parallel \sim k_\perp^{1/3}$, similarly to the case of KAW turbulence [see Eq. (241) and § 7.9.4].

7.10.4. Scalings for the Magnetic Fluctuations

The scaling law for the fluctuations of the magnetic-field strength follows immediately from Eqs.
(263) and (269):

$$\frac{\delta B_{\parallel\lambda}}{B_0} \sim \beta_i\,\frac{\lambda}{\rho_i}\,\frac{\Phi_\lambda}{\rho_i v_{{\rm th}i}} \sim \sqrt{\beta_i}\left(\frac{\varepsilon_h}{\varepsilon}\right)^{1/3} l_0^{-1/3}\rho_i^{-11/6}\lambda^{13/6}, \tag{272}$$

whence the spectrum of these fluctuations is $k_\perp^{-16/3}$.

The scaling of A‖ (the perpendicular magnetic fluctuations) depends on the relation between k‖ and k⊥. Indeed, the ratio between the first and the third terms on the left-hand side of Eq. (264) [or, equivalently, between the first and second terms on the right-hand side of Eq. (266)] is ∼ $k_\parallel v_{{\rm th}i}\tau_{h\lambda}$. For a critically balanced cascade, this makes the two terms comparable [Eq. (271)]. Using the first term to work out the scaling for the perpendicular magnetic fluctuations, we get, using Eq. (269),

$$\frac{\delta B_{\perp\lambda}}{B_0} \sim \beta_i\,\frac{\lambda}{\rho_i}\,\frac{\Phi_\lambda}{\rho_i v_{{\rm th}i}} \sim \sqrt{\beta_i}\left(\frac{\varepsilon_h}{\varepsilon}\right)^{1/3} l_0^{-1/3}\rho_i^{-11/6}\lambda^{13/6}, \tag{273}$$

which is the same scaling as for δB‖/B0 [Eq. (272)].

Using Eq. (273) together with Eqs. (269) and (270), it is now straightforward to confirm the three assumptions made in § 7.10.1 that we promised to verify a posteriori:

1. In Eq. (116), ∂A‖/∂t ≪ cb̂ · ∇ϕ, so Eq. (261) holds (the electrons remain Boltzmann). This means that no KAW can be excited by the cascade.

2. δB⊥/B0 ≪ k‖/k⊥, so b̂ · ∇ ≃ ∂/∂z in Eq. (264). This means that field lines are not significantly perturbed.

3. In the expression for 〈χ〉Ri [Eq. (69)], v‖A‖/c ≪ ϕ, so Eq. (249) holds. This means that the electrostatic fluctuations dominate the cascade.

7.11. Cascades Superposed?

The spectra of magnetic fluctuations obtained in § 7.10.4 are very steep, steeper in fact than those normally observed in the dissipation range of the solar wind (§ 8.2.5). One might speculate that the observed spectra may be due to a superposition of the two cascades realizable below the ion gyroscale: a high-frequency cascade of KAW (§ 7.5) and a low-frequency cascade of electrostatic fluctuations due to the ion entropy fluctuations (§ 7.10). Such a superposition could happen if the power going into the KAW cascade is relatively small, εKAW ≪ ε. One then expects an electrostatic cascade to be set up just below the ion gyroscale with the KAW cascade superseding it deeper into the dissipation range. Comparing Eqs.
(240) and (269), we can estimate the position of the spec- tral break: k⊥ρi ∼ ε/εKAW . (274) Since ρi/ρe ∼ (τmi/me)1/2/Z is not a very large number, the dissipation range is not very wide. It is then conceivable that the observed spectra are not true power laws but simply non- asymptotic superpositions of the electrostatic and KAW spec- tra with the observed range of “effective” spectral exponents due to varying values of the spectral break (274) between the two cascades.33 33 Several alternative theories that aim to explain the dissipation-range spectra exist: see § 8.2.6. The value of εKAW/ε specific to any particular set of param- eters (βi, τ , etc.) is set by what happens at k⊥ρi ∼ 1 (§ 7.1; see § 8.2.2, § 8.2.5, and § 8.5 for further discussion). 7.12. Below the Electron Gyroscale: The Last Cascade Finally, let us consider what happens when k⊥ρe ≫ 1. At these scales, we have to return to the full gyrokinetic sys- tem of equations. The quasi-neutrality [Eq. (61)], parallel [Eq. (62)] and perpendicular [Eq. (66)] Ampère’s law become eik·r d3vJ0(ae)hek, (275) 4πen0e ∇2⊥A‖ = eik·r d3vv‖J0(ae)hek, (276) eik·r v2the J1(ae) hek, (277) where βe = βiZ/τ . We have discarded the velocity integrals of hi both because the gyroaveraging makes them subdom- inant in powers of (me/mi) 1/2 and because the fluctuations of hi are damped by collisions [assuming the collisional cut- off given by Eq. (259) lies above the electron gyroscale]. To Eqs. (275-277), we must append the gyrokinetic equation for he [Eq. (57) with s = e], thus closing the system. The type of turbulence described by these equations is very similar to that discussed in § 7.10. It is easy to show from Eqs. (275-277) that . (278) Hence the magnetic fluctuations are subdominant in the ex- pression for 〈χ〉Re [Eq. (69) with s = e and ae ≫ 1], so 〈χ〉Re ≃ 〈ϕ〉Re . 
The electron gyrokinetic equation then is {〈ϕ〉Re ,he} = , (279) where the wave–particle interaction term in the right-hand side has been dropped because it can be shown to be small via the same argument as in § 7.10.2. Together with Eq. (275), Eq. (279) describes the kinetic cas- cade of electron entropy from the electron gyroscale down to the scale at which electron collisions can dissipate it into heat. This cascade the result of collisionless damping of KAW at k⊥ρe ∼ 1, whereby the power in the KAW cascade is con- verted into the electron-entropy fluctuations: indeed, in the limit k⊥ρe ≫ 1, the generalized energy is simply = Whe (280) (see Fig. 5). The same scaling arguments as in § 7.10.2 apply and scaling relations analogous to Eqs. (268-270), and (272) duly follow: v3the (εKAW 1/6, (281) (εKAW vthe l 7/6, (282) )1/3( , (283) KINETIC TURBULENCE IN MAGNETIZED PLASMAS 39 (εKAW −11/6 13/6, (284) where l0 = v A/ε, as in § 1.2. The formula for the collisional cutoffs in the wavenumber and velocity space is analogous to Eq. (259): ∼ (νeiτρe )3/5, (285) where τρe is the cascade time (283) taken at λ = ρe. 7.13. Validity of Gyrokinetics in the Dissipation Range As the kinetic cascade takes the (generalized) energy to ever smaller scales, the frequency ω of the fluctuations increases. In applying the gyrokinetic theory, one must be mindful of the need for this frequency to stay smaller than Ωi. Using the scaling formulae for the characteristic times of the fluc- tuations derived above [Eqs. (254), (270) and (283)], we can determine the conditions for ω ≪ Ωi. Thus, for the gyroki- netic theory to be valid everywhere in the inertial range, we must have k⊥ρi ≪ β3/4i (286) at all scales down to k⊥ρi ∼ 1, i.e., ρi/l0 ≪ β3/2i , not a very stringent condition. Below the ion gyroscale, the KAW cascade (§ 7.5) remains in the gyrokinetic regime as long as k⊥ρi ≪ i (1 +βi) (287) (we are assuming Ti/Te ∼ 1 everywhere). 
The condition for this still to be true at the electron gyroscale is i (1 +βi) . (288) The ion entropy fluctuations passively mixed by the KAW tur- bulence (§ 7.9) satisfy Eq. (287) at all scales down to the ion collisional cutoff [Eq. (259)] if λmfpi i (1 +βi) . (289) Note that the condition for the ion collisional cutoff to lie above the electron gyroscale is λmfpi βi(1 +βi)1/3 )5/6( (290) In the absence of KAW turbulence, the pure ion-entropy cas- cade (§ 7.10) remains gyrokinetic for k⊥ρi ≪ β3/2i . (291) This is valid at all scales down to the ion collisional cutoff provided λmfpi/l0 ≪ β3i (l0/ρi), an extremely weak condition, which is always satisfied. This is because the ion-entropy fluctuations in this case have much lower frequencies than in the KAW regime. The ion collisional cutoff lies above the electron gyroscale if, similarly to Eq. (290), λmfpi )5/6( . (292) If the condition (290) is satisfied, all fluctuations of the ion distribution function are damped out above the electron gyroscale. This means that below this scale, we only need the electron gyrokinetic equation to be valid, i.e., ω ≪ Ωe. The electron-entropy cascade (§ 7.12), whose characteristic timescale is given by Eq. (283), satisfies this condition for k⊥ρe ≪ β3/2e . (293) This is valid at all scales down to the electron collisional cutoff [Eq. (285)] provided λmfpe/l0 ≪ (ε/εKAW) 2β3e (mi/me) 3(l0/ρe), which is always satisfied. Within the formal expansion we have adopted (k⊥ρi ∼ 1 and k‖λmfpi ∼ βi), it is not hard to see that λmfpi/l0 ∼ ǫ2 and ρi/l0 ∼ ǫ3. Since all other parameters (me/mi, βi, βe etc.) are order unity with respect to ǫ, all of the above con- ditions for the validity of the gyrokinetics are asymptotically correct by construction. However, in application to real as- trophysical plasmas, one should always check whether this construction holds. 
For example, substituting the relevant pa- rameters for the solar wind shows that the gyrokinetic ap- proximation is, in fact, likely to start breaking down some- where between the ion and electron gyroscales (Howes et al. 2008a).34 This releases a variety of high-frequency wave modes, which may be participating in the turbulent cascade around and below the electron gyroscale (see, e.g., the recent detailed observations of these scales in the magnetosheath by Mangeney et al. 2006; Lacombe et al. 2006 or the early mea- surements of high-frequency fluctuations in the solar wind by Denskat et al. 1983; Coroniti et al. 1982). 7.14. Summary In this section, we have analyzed the turbulence in the dissi- pation range, which turned out to have many more essentially kinetic features than the inertial range. At the ion gyroscale, k⊥ρi ∼ 1, the kinetic cascade rear- ranged itself into two distinct components: part of the (gener- alized) energy arriving from the inertial range was collision- lessly damped, giving rise to a purely kinetic cascade of ion- entropy fluctuations, the rest was converted into a cascade of Kinetic Alfvén Waves (KAW) (Fig. 5; see § 7.1 and § 7.8). The KAW cascade is described by two fluid-like equa- tions for two scalar functions, the magnetic flux function Ψ = −A‖/ 4πmin0i and the scalar potential, expressed, for continuity with the results of § 5, in terms of the function Φ = (c/B0)ϕ. The equations are (see § 7.2) b̂ ·∇Φ, (294) 2 +βi 1 + Z/τ ) b̂ ·∇ ρ2i ∇2⊥Ψ , (295) where b̂ · ∇ = ∂/∂z + (1/vA){Ψ, · · ·}. The density and 34 See this paper also for a set of numerical tests of the validity of gy- rokinetics in the dissipation range, a linear theory of the conversion of KAW into ion-cyclotron-damped Bernstein waves, and a discussion of the potential (un)importance of ion cyclotron damping for the dissipation of turbulence. 40 SCHEKOCHIHIN ET AL. magnetic-field-strength fluctuations are directly related to the scalar potential: . (296) We call Eqs. 
(294-296) the Electron Reduced Magnetohydro- dynamics (ERMHD). The ion-entropy cascade is described by the ion gyrokinetic equation: +{〈Φ〉Ri ,hi} = 〈Cii[hi]〉Ri . (297) The ion distribution function is mixed by the ring-averaged scalar potential and undergoes a cascade both in the velocity and gyrocenter space—this phase-space cascade is essential for the conversion of the turbulent energy into the ion heat, which can ultimately only be done by collisions (see § 7.9). If the KAW cascade is strong (its power εKAW is an order- unity fraction of the total injected turbulent power ε), it de- termines Φ in Eq. (297), so the ion-entropy cascade is passive with respect to the KAW turbulence. Equations (294-295) and (297) form a closed system that determines the three func- tions Φ, Ψ, hi, of which the latter is slaved to the first two. One can also compute δne and δB‖, which are proportional to Φ [Eq. (296)]. The generalized energy conserved by these equations is given by Eq. (245). If the KAW cascade is weak (εKAW ≪ ε), the ion-entropy cascade dominates the turbulence in the dissipation range and drives low-frequency mostly electrostatic fluctuations, with a subdominant magnetic component. These are given by the following relations (see § 7.10) ρivthi 2(1 + τ/Z) eik·r d3vJ0(ai)hik, (298) ρivthi , (299) eik·r × 1 + Z/τ J0(ai) hik, (300) eik·r v2thi J1(ai) hik, (301) where ai = k⊥v⊥/Ωi, Equations (297) and (298) form a closed system for Φ and hi. The rest of the fields, namely δne, Ψ and δB‖, are slaved to hi via Eqs. (299-301). The fluid and kinetic models summarized above are valid between the ion and electron gyroscales. Below the electron gyroscale, the collisionless damping of the KAW cascade con- verts it into a cascade of electron entropy, similar in nature to the ion-entropy cascade (§ 7.12). The KAW cascade and the low-frequency turbulence asso- ciated with the ion-entropy cascade have distinct scaling be- haviors. 
For the KAW cascade, the spectra of the electric, density and magnetic fluctuations are (§ 7.5)

E_E(k⊥) ∝ k⊥^{−1/3}, E_n(k⊥) ∝ k⊥^{−7/3}, E_B(k⊥) ∝ k⊥^{−7/3}. (302)

For the ion- and electron-entropy cascades (§ 7.9 and § 7.12),

E_E(k⊥) ∝ k⊥^{−4/3}, E_n(k⊥) ∝ k⊥^{−10/3}, E_B(k⊥) ∝ k⊥^{−16/3}. (303)

We argued in § 7.11 that the observed spectra in the dissipation range of the solar wind could be the result of a superposition of these two cascades, although a number of alternative theories exist (§ 8.2.6).

8. DISCUSSION OF ASTROPHYSICAL APPLICATIONS

We have so far only occasionally referred to some relevant observational evidence for space and astrophysical plasmas. We now discuss in more detail how the theoretical framework laid out above applies to real plasma turbulence in space.

Although we will discuss the interstellar medium, accretion disks and galaxy clusters towards the end of this section, the most rewarding source of observational information about plasma turbulence in astrophysical conditions is the solar wind and the magnetosheath, because only there are direct in situ measurements of all the interesting quantities possible. Measurements of the fluctuating magnetic and velocity fields in the solar wind have been available since the 1960s (Coleman 1968) and a vast literature now exists on their spectra, anisotropy, Alfvénic character and many other aspects (a short recent review is Horbury et al. 2005; two long ones are Tu & Marsch 1995; Bruno & Carbone 2005). It is not our aim here to provide a comprehensive survey of what is known about plasma turbulence in the solar wind.
Instead, we shall limit our discussion to a few points that we consider important in light of the theoretical framework proposed in this paper.35 As we do this, we shall provide copious references to the main body of the paper, so this section can be read as a data-oriented guide to it, aimed both at a thorough reader who has arrived here after going through the preceding sections and at an impatient one who has skipped to this one hoping to find out whether there is anything of “practical” use in the theoretical developments above.

8.1. Inertial-Range Turbulence in the Solar Wind

In the inertial range, i.e., for k⊥ρi ≪ 1, the solar-wind turbulence should be described by the reduced hybrid fluid-kinetic theory derived in § 5 (KRMHD). Its applicability hinges on three key assumptions: (i) the turbulence is Alfvénic, i.e., consists of small (δB/B0 ≪ 1) low-frequency (ω ∼ k‖vA ≪ Ωi) perturbations of an ambient mean magnetic field and corresponding velocity fluctuations; (ii) it is strongly anisotropic, k⊥ ≫ k‖; (iii) the equilibrium distribution can be approximated or, at least, reasonably modeled by a Maxwellian without loss of essential physics (this will be discussed in § 8.3). If these assumptions are satisfied, KRMHD (summarized in § 5.7) is a rigorous set of dynamical equations for the inertial range; a set of Kolmogorov-style scaling predictions for the Alfvénic component of the turbulence can be produced (the GS theory, reviewed in § 1.2), while the considerations of § 6 apply to the compressive fluctuations. So let us examine the observational evidence.

8.1.1. Alfvénic Nature of the Turbulence

The presence of Alfvén waves in the solar wind was already reported in the early works of Unti & Neugebauer (1968) and Belcher & Davis (1971).
Alfvén waves are detected already at very low frequencies (large scales) and, at these low frequencies, have a k^{−1} spectrum.36 This spectrum corresponds to a uniform distribution of scales/frequencies of waves launched by the coronal activity of the Sun. Nonlinear interaction of these waves gives rise to an Alfvénic turbulent cascade of the type that was discussed above. The effective outer scale of this cascade can be detected as a spectral break where the k^{−1} scaling steepens to the Kolmogorov slope k^{−5/3} (see Bavassano et al. 1982; Marsch & Tu 1990a; Horbury et al. 1996 for fast-wind results on the spectral break; for a discussion of the effective outer scale in the slow wind at 1 AU, see Howes et al. 2008a). The particular scale at which this happens increases with the distance from the Sun (Bavassano et al. 1982), reflecting the more developed state of the turbulence at later stages of evolution. At 1 AU, the outer scale is roughly in the range of 10^5 to 10^6 km; the k^{−5/3} range extends down to scales/frequencies that correspond to a few times the ion gyroradius (10^2 to 10^3 km; see Table 1).

35 An extended quantitative discussion of the applicability of the gyrokinetic theory to the turbulence in the slow solar wind was given by Howes et al. (2008a).

The range between the outer scale (the spectral break) and the ion gyroscale is the inertial range. In this range, δB/B0 decreases with scale because of the steep negative spectral slope. Therefore, the assumption of small fluctuations, δB/B0 ≪ 1, while not necessarily true at the outer scale, is increasingly better satisfied further into the inertial range (cf. § 1.3).

Are these fluctuations Alfvénic? In a plasma such as the solar wind, they ought to be because, as shown in § 5.3, for k⊥ρi ≪ 1, these fluctuations are rigorously described by the RMHD equations.
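As a rough illustration of how quickly the small-amplitude assumption improves with scale, the following Python sketch converts a one-dimensional power-law spectrum E(k) ∝ k^{index} into the scale dependence of the fluctuation amplitude, using the standard order-of-magnitude estimate δB_λ² ∼ kE(k). The function name and the three decades of scale separation (an outer scale of about 10^5 km and an ion-gyroscale value of about 10^2 km, picked from the ranges quoted above) are illustrative assumptions, not part of the analysis in this paper.

```python
def amplitude_ratio(scale_ratio, spectral_index=-5.0 / 3.0):
    """Ratio of fluctuation amplitudes at two scales, delta_B(l_small) / delta_B(l_large).

    Assumes a 1D power-law spectrum E(k) ~ k**spectral_index and the estimate
    delta_B_l**2 ~ k * E(k), so that delta_B_l ~ l**(-(1 + spectral_index) / 2).
    scale_ratio is l_small / l_large (a number below 1).
    """
    exponent = -(1.0 + spectral_index) / 2.0
    return scale_ratio ** exponent


# Hypothetical round numbers: outer scale 1e5 km, ion-gyroscale range 1e2 km.
inertial = amplitude_ratio(1e2 / 1e5)                      # k^(-5/3) range
flat = amplitude_ratio(1e2 / 1e5, spectral_index=-1.0)     # k^(-1) range

print(f"k^-5/3 spectrum: amplitude changes by a factor {inertial:.3f}")
print(f"k^-1   spectrum: amplitude changes by a factor {flat:.3f}")
```

Under these assumptions the amplitude scales as λ^{1/3} in the k^{−5/3} range, so it drops by about a factor of 10 over three decades, whereas a k^{−1} spectrum corresponds to an amplitude that does not depend on scale.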
The magnetic flux is frozen into the ion motions, so displacing a parcel of plasma should produce a matching (Alfvénic) perturbation of the magnetic field line and vice versa: in an Alfvén wave, u⊥ = ±δB⊥/√(4πm_i n_0i). The strongest confirmation that this is indeed true for the inertial-range fluctuations in the solar wind was achieved by Bale et al. (2005), who compared the spectra of electric and magnetic fluctuations and found that they both scale as k^{−5/3} and follow each other with remarkable precision (see Fig. 1). The electric field is a very good measure of the perpendicular velocity field because, for k⊥ρi ≪ 1, the plasma velocity is the E×B drift velocity, u⊥ = cE × ẑ/B0 (see § 5.4).

This picture of agreement between basic theory and observations is upset in a disturbing fashion by an extraordinary recent result by Chapman & Hnat (2007), Podesta et al. (2006) and J. E. Borovsky (2008, private communication), who claim different spectral indices for velocity and magnetic fluctuations: k^{−3/2} and k^{−5/3}, respectively. This result is puzzling because, if it is asymptotically correct in the inertial range, it implies either u⊥ ≫ δB⊥ or u⊥ ≪ δB⊥, and it is not clear how perpendicular velocity fluctuations in a near-ideal plasma could fail to produce Alfvénic displacements and, therefore, perpendicular magnetic field fluctuations with matching energies.
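To get a feel for the size of this mismatch, note that if the velocity spectrum scales as k^{−3/2} and the magnetic spectrum as k^{−5/3}, their ratio scales as k^{1/6}. The Python sketch below evaluates how much this ratio changes across a given range of wavenumbers. The function and parameter names are hypothetical, and the three-decade width is an assumed round figure for the inertial range at 1 AU, not a measured value.

```python
def spectrum_ratio_growth(k_ratio, velocity_index=-1.5, magnetic_index=-5.0 / 3.0):
    """Factor by which E_u(k) / E_B(k) changes when k increases by k_ratio.

    Assumes pure power laws E_u ~ k**velocity_index and E_B ~ k**magnetic_index,
    so the ratio scales as k**(velocity_index - magnetic_index).
    """
    return k_ratio ** (velocity_index - magnetic_index)


# Assumed inertial-range width: three decades in wavenumber.
energy_factor = spectrum_ratio_growth(1e3)
amplitude_factor = energy_factor ** 0.5   # ratio of amplitudes u / delta_B

print(f"energy ratio changes by a factor    {energy_factor:.2f}")
print(f"amplitude ratio changes by a factor {amplitude_factor:.2f}")
```

With these indices the energy ratio changes by a factor of about 3 over three decades. The divergence between u⊥ and δB⊥ is therefore an asymptotic statement, which may be hard to distinguish from the matched-energy picture within a single measured range.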
Plausible explanations may be either that the velocity field in these measurements is polluted by a non-Alfvénic component parallel to the magnetic field (although data analysis by Chapman & Hnat 2007 does not support this) or that the flattening of the velocity spectrum is due to some form of a finite-gyroradius effect or even an energy injection into the velocity fluctuations at scales approaching the ion gyroscale (e.g., from the pressure-anisotropy-driven instabilities, § 8.3).

36 Inferred from the frequency spectrum f^{−1} via the Taylor (1938) hypothesis, f ∼ k · Vsw, where Vsw is the mean velocity at which the wind blows past the spacecraft. The Taylor hypothesis is a good assumption for the solar wind because Vsw (∼ 800 km/s in the fast wind, ∼ 300 km/s in the slow wind) is highly supersonic, super-Alfvénic and far exceeds the fluctuating velocities.

8.1.2. Energy Spectrum

How solid is the statement that the observed spectrum has a k^{−5/3} scaling? In individual measurements of the magnetic-energy spectra, very high accuracy is claimed for this scaling: the measured spectral exponent is between 1.6 and 1.7; agreement with the Kolmogorov value 1.67 is often reported to be within a few percent (see, e.g., Horbury et al. 1996; Leamon et al. 1998; Bale et al. 2005; Narita et al. 2006; Alexandrova et al. 2008a; Horbury et al. 2008). There is a somewhat wider scatter of spectral indices if one considers large sets of measurement intervals (Smith et al. 2006), but overall, the observational evidence does not appear to be consistent with the k⊥^{−3/2} spectrum consistently found in the MHD simulations with a strong mean field (Maron & Goldreich 2001; Müller et al. 2003; Mason et al. 2007; Perez & Boldyrev 2008, 2009; Beresnyak & Lazarian 2008b) and defended on theoretical grounds in the recent modifications of the GS theory by Boldyrev (2006) and by Gogoberidze (2007) (see footnote 10).
This discrepancy between observations and simulations remains an unresolved theoretical issue. It is probably best addressed by numerical modeling of the RMHD equations (§ 2.2) and by a detailed comparison of the structure of the Alfvénic fluctuations in such simulations and in the solar wind.

8.1.3. Anisotropy

Building up evidence for anisotropy of turbulent fluctuations has progressed from merely detecting their elongation along the magnetic field (Belcher & Davis 1971), to fitting data to an ad hoc model mixing a 2D perpendicular and a 1D parallel (“slab”) turbulent component in some proportion37 (Matthaeus et al. 1990; Bieber et al. 1996; Dasso et al. 2005; Hamilton et al. 2008), to formal systematic unbiased analyses showing the persistent presence of anisotropy at all scales (Bigazzi et al. 2006; Sorriso-Valvo et al. 2006), to direct measurements of three-dimensional correlation functions (Osman & Horbury 2007), and finally to computing spectral exponents at fixed angles between k and B0 (Horbury et al. 2008). The latter authors appear to have achieved the first direct quantitative confirmation of the GS theory by demonstrating that the magnetic-energy spectrum scales as k⊥^{−5/3} in wavenumbers perpendicular to the mean field and as k‖^{−2} in wavenumbers parallel to it [consistent with the first scaling relation in Eq. (4)]. This is the closest that observations have got to confirming the GS relation k‖ ∼ k⊥^{2/3} [see Eq. (5)] in a real astrophysical turbulent plasma.

8.1.4. Compressive Fluctuations

According to the theory developed in § 5, the density and magnetic-field-strength fluctuations are passive, energetically decoupled from and mixed by the Alfvénic cascade (§ 5.5; these are slow and entropy modes in the collisional MHD limit; see § 2.4 and § 6.1). These fluctuations are expected to be pressure-balanced, as expressed by Eq. (22) or, more generally in gyrokinetics, by Eq. (67).
There is, indeed, strong 37 These techniques originate from the view of MHD turbulence as a su- perposition of a 2D turbulence and an admixture of Alfvén waves (Fyfe et al. 1977; Montgomery & Turner 1981). As we discussed in § 1.2, we consider the Goldreich & Sridhar (1995, 1997) view of a critically balanced Alfvénic cascade to be better physically justified. 42 SCHEKOCHIHIN ET AL. evidence that magnetic and thermal pressures in the solar wind are anticorrelated, although there are some indications of the presence of compressive, fast-wave-like fluctuations as well (Roberts 1990; Burlaga et al. 1990; Marsch & Tu 1993; Bavassano et al. 2004). Measurements of density and field-strength fluctua- tions done by a variety of different methods both at 1 AU (Celnikier et al. 1983, 1987; Marsch & Tu 1990b; Bershadskii & Sreenivasan 2004; Hnat et al. 2005; Kellogg & Horbury 2005; Alexandrova et al. 2008a) and near the Sun (Lovelace et al. 1970; Woo & Armstrong 1979; Coles & Harmon 1989; Coles et al. 1991) show fluctuation levels of order 10% and spectra that appear to have a k−5/3 scaling above scales of order 102 − 103 km, which approxi- mately corresponds to the ion gyroscale. The Kolmogorov value of the spectral exponent is, as in the case of Alfvénic fluctuations, measured quite accurately in individual cases (1.67 ± 0.03 in Celnikier et al. 1987). Interestingly, the higher-order structure function exponents measured for the magnetic-field strength show that it is a more intermittent quantity than the velocity or the vector magnetic field (i.e., than the Alfvénic fluctuations) and that the scaling expo- nents are quantitatively very close to the values found for passive scalars in neutral fluids (Bershadskii & Sreenivasan 2004; Bruno et al. 2007). One might argue that this lends some support to the theoretical expectation of passive magnetic-field-strength fluctuations. 
Considering that in the collisionless regime these fluctua- tions are supposed to be subject to strong kinetic damping (§ 6.2.2), the presence of well-developed Kolmogorov-like and apparently undamped turbulent spectra is more surprising than has perhaps been publicly acknowledged. An extended discussion of this issue was given in § 6.3. Without the in- clusion of the dissipation effects associated with the finite ion gyroscale, the passive cascade of the density and field strength is purely perpendicular to the (exact) local magnetic field and does not lead to any scale refinement along the field. This im- plies highly anisotropic field-aligned structures, whose length is determined by the initial conditions (i.e., conditions in the corona). The kinetic damping is inefficient for such fluctua- tions. While this would seem to explain the presence of fully fledged power-law spectra, it is not entirely obvious that the parallel cascade is really absent once dissipation is taken into account (Lithwick & Goldreich 2001), so the issue is not yet settled. This said, we note that there is plenty of evidence of a high degree of anisotropy and field alignment of the den- sity microstructure in the inner solar wind and outer corona (e.g., Armstrong et al. 1990; Grall et al. 1997; Woo & Habbal 1997). There is also evidence that the local structure of the compressive fluctuations at 1 AU is correlated with the coro- nal activity, implying some form of memory of initial condi- tions (Kiyani et al. 2007; Hnat et al. 2007; Wicks et al. 2009). We note, finally, that whether compressive fluctuations in the inertial range can develop short parallel scales should also tell us how much ion heating can result from their damping (see § 6.2.4). 8.2. Dissipation-Range Turbulence in the Solar Wind and the Magnetosheath At scales approaching the ion gyroscale, k⊥ρi ∼ 1, effects associated with the finite extent of ion gyroorbits start to matter. 
Observationally, this transition manifests itself as a clear break in the spectrum of magnetic fluctuations, with the inertial-range k−5/3 scaling replaced by a steeper slope (see Fig. 1). While the electrons at these scales can be treated as an isothermal fluid (as long as we are considering fluctuations above the electron gyroscale, k⊥ρe ≪ 1; see § 4), the fully gyrokinetic description (§ 3) has to be adopted for the ions. It is, indeed, to understand plasma dynamics at and around k⊥ρi ∼ 1 that gyrokinetics was first designed in fusion plasma theory (Frieman & Chen 1982; Brizard & Hahm 2007). In or- der for gyrokinetics and further dissipation-range approxima- tions that follow from it (§ 7) to be a credible approach in the solar wind and other space plasmas, it has to be estab- lished that fluctuations at and below the ion gyroscale are still strongly anisotropic, k‖ ≪ k⊥. If that is the case, then their frequencies (ω∼ k‖vAk⊥ρi, see § 7.3) will still be smaller than the cyclotron frequency in at least a part of the “dissipation range”38—the range of scales k⊥ρi & 1 (see § 7.13). Note that additional information about the dissipation- range turbulence can be extracted from the measurements in the magnetosheath—while scales above the ion gyroscale are probably non-universal there, the dissipation range appears to display universal behavior, mostly similar to the solar wind (see, e.g., Alexandrova 2008). This complements the obser- vational picture emerging from the solar-wind data and al- lows us to learn more as fluctuation amplitudes in the mag- netosheath are larger and much smaller scales can be probed than in the solar wind (Mangeney et al. 2006; Lacombe et al. 2006; Alexandrova et al. 2008b). 8.2.1. Anisotropy We know with a fair degree of certainty that the fluctu- ations that cascade down to the ion gyroscale from the in- ertial range are strongly anisotropic (§ 8.1.3). 
While it appears likely that the anisotropy persists at k⊥ρi ∼ 1, it is extremely important to have a clear verdict on this assumption from solar wind measurements. While Leamon et al. (1998) and, more recently, Hamilton et al. (2008) did present some evidence that magnetic fluctuations in the solar wind have a degree of anisotropy below the ion gyroscale, no definitive study similar to Horbury et al. (2008) or Bigazzi et al. (2006); Sorriso-Valvo et al. (2006) exists as yet. In the magnetosheath, where the dissipation-range scales are easier to measure than in the solar wind, recent analysis by Sahraoui et al. (2006); Alexandrova et al. (2008b) does show evidence of strong anisotropy.

Besides confirming the presence of the anisotropy, it would be interesting to study its scaling characteristics: e.g., check the scaling prediction k‖ ∼ k⊥^{1/3} [Eq. (241); see also § 7.9.4 and § 7.10.3] in the same way that the GS relation k‖ ∼ k⊥^{2/3} [Eq. (5)] was corroborated by Horbury et al. (2008).

In this paper, we have proceeded on the assumption that the anisotropy and, therefore, low frequencies (ω ≪ Ωi) do characterize fluctuations in the dissipation range or, at least, that the low-frequency anisotropic fluctuations are a significant energy cascade channel and can be considered decoupled from any possible high-frequency dynamics.

38 This term, customary in the space-physics literature, is somewhat of a misnomer because, as we have seen in § 7, rich dissipationless turbulent dynamics are present in this range alongside what is normally thought of as dissipation.

8.2.2. Transition at the Ion Gyroscale: Collisionless Damping and Heating

If the fluctuations at the ion gyroscale have k‖ ≪ k⊥ and ω ≪ Ωi (§ 8.2.1), they are not subject to the cyclotron resonance (ω − k‖v‖ = ±Ωi), but are subject to the Landau one (ω = k‖v‖).
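The difference between the two resonance conditions can be made concrete with a short Python sketch that returns the parallel velocity, in units of vthi, at which each resonance is met. It assumes the frequency estimate ω ∼ k‖vA k⊥ρi quoted in § 8.2 and the convention βi = vthi²/vA² (so that vA/vthi = 1/√βi), with ρi = vthi/Ωi. The function names and the sample parameters are hypothetical.

```python
import math


def landau_resonant_velocity(k_perp_rho_i, beta_i):
    """v_par / v_thi satisfying omega = k_par * v_par.

    Uses omega ~ k_par * v_A * k_perp * rho_i, so v_par = v_A * k_perp * rho_i,
    and v_A / v_thi = 1 / sqrt(beta_i) (assumed convention).
    """
    return k_perp_rho_i / math.sqrt(beta_i)


def cyclotron_resonant_velocity(k_par_rho_i, k_perp_rho_i, beta_i):
    """Smallest |v_par| / v_thi satisfying omega - k_par * v_par = +/- Omega_i."""
    omega_term = k_perp_rho_i / math.sqrt(beta_i)   # omega / (k_par * v_thi)
    cyclotron_term = 1.0 / k_par_rho_i              # Omega_i / (k_par * v_thi)
    return abs(cyclotron_term - omega_term)


# Illustrative anisotropic fluctuation at the ion gyroscale (assumed values).
v_landau = landau_resonant_velocity(k_perp_rho_i=1.0, beta_i=1.0)
v_cyclotron = cyclotron_resonant_velocity(k_par_rho_i=0.01, k_perp_rho_i=1.0, beta_i=1.0)

print(f"Landau resonance at    v_par ~ {v_landau:.1f} v_thi")
print(f"cyclotron resonance at v_par ~ {v_cyclotron:.1f} v_thi")
```

For these sample values the Landau resonance sits in the bulk of the ion distribution, while the cyclotron resonance would require particles moving at roughly 1/(k‖ρi) thermal speeds, of which a Maxwellian contains exponentially few.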
Alfvénic fluctuations at the ion gyroscale are no longer decoupled from the compressive fluctuations and can be Landau-damped (§ 7.1). It seems plausible that it is the inflow of energy from the Alfvénic cascade that ac- counts for a pronounced local flattening of the spectrum of density fluctuations in the solar wind observed just above the ion gyroscale (Woo & Armstrong 1979; Celnikier et al. 1983, 1987; Coles & Harmon 1989; Marsch & Tu 1990b; Coles et al. 1991; Kellogg & Horbury 2005).39 In energetic terms, Landau damping amounts to a redis- tribution of generalized energy from electromagnetic fluctu- ations to entropy fluctuations (§ 3.4, § 7.8). This gives rise to the entropy cascade, ultimately transferring the Landau- damped energy into ion heat (§ 3.5, § 7.9 and § 7.10). How- ever, only part of the inertial-range cascade is so damped be- cause an alternative, electron, cascade channel exists: the ki- netic Alfvén waves (§§ 7.2-7.8). The energy transferred into the KAW-like fluctuations can cascade to the electron gy- roscale, where it is Landau damped on electrons, converting first into the electron entropy cascade and then electron heat (§ 7.12). Thus, the transition at the ion gyroscale ultimately de- cides in what proportion the turbulent energy arriving from the inertial range is distributed between the ion and electron heat. How the fraction of power going into either depends on parameters—βi, Ti/Te, amplitudes, . . . —is a key unanswered question both in space and astrophysical (see, e.g., § 8.5) plas- mas. Gyrokinetics appears to be an ideal tool for addressing this question both analytically and numerically (Howes et al. 2008b). Within the framework outlined in this paper, the min- imal model appropriate for studying the transition at the ion gyroscale is the system of equations for isothermal electrons and gyrokinetic ions derived in § 4 (it is summarized in § 4.9). 8.2.3. Ion Gyroscale vs. 
Ion Inertial Scale

It is often assumed in the space physics literature that it is at the ion inertial scale, di = ρi/√βi, rather than at the ion gyroscale ρi that the spectral break between the inertial and dissipation ranges occurs. The distinction between di and ρi becomes noticeable when βi is significantly different from unity, a relatively rare occurrence in the solar wind. While some attempts to determine at which of these two scales a spectral break between the inertial and dissipation ranges occurs have produced claims that di is a more likely candidate (Smith et al. 2001), more comprehensive studies of the available data sets conclude basically that it is hard to tell (Leamon et al. 2000; Markovskii et al. 2008).

In the gyrokinetic approach advocated in this paper, the ion inertial scale does not play a special role (see § 7.1). The only parameter regime in which di does appear as a special scale is Ti ≪ Te (“cold ions”), when the Hall MHD approximation can be derived in a systematic way (see Appendix E). This, however, is not the right limit for the solar wind or most other astrophysical plasmas of interest because ions are rarely cold. Hall MHD is discussed further in § 8.2.6 and Appendix E.

39 Celnikier et al. (1987) proposed that the flattening might be a k^{−1} spectrum analogous to Batchelor’s spectrum of passive scalar variance in the viscous-convective range. We think this analogy cannot apply because density is not passive at or below the ion gyroscale.

8.2.4. KAW Turbulence

If gyrokinetics is valid at scales k⊥ρi ≳ 1 (i.e., if k‖ ≪ k⊥, ω ≪ Ωi and it is acceptable to at least model the equilibrium distribution as a Maxwellian; see § 8.3), the electromagnetic fluctuations below the ion gyroscale will be described by the fluid approximation that we derived in § 7.2 and referred to as ERMHD.
The wave solutions of this system of equations are the kinetic Alfvén waves (§§ 7.3-7.4) and it is possible to ar- gue for a GS-style critically balanced cascade of KAW-like electromagnetic fluctuations (§ 7.5) between the ion and elec- tron gyroscales (Landau damped on electrons at k⊥ρe ∼ 1; the expression for the KAW damping rate in the gyrokinetic limit is given in Howes et al. 2006; see also Fig. 8). Individual KAW have, indeed, been detected in space plas- mas (e.g., Grison et al. 2005). What about KAW turbulence? How does one tell whether any particular spectral slope one is measuring corresponds to the KAW cascade or fits some alter- native scheme for the dissipation-range turbulence (§ 8.2.6)? It appears to be a sensible program to look for specific rela- tionships between different fields predicted by theory (§ 7.2) and for the corresponding spectral slopes and scaling relations for the anisotropy (§ 7.5). This means that simultaneous mea- surements of magnetic, electric, density and magnetic-field- strength fluctuations are needed. For the solar wind, the spectra of electric and magnetic fluctuations below the ion gyroscale reported by Bale et al. (2005) are consistent with the k−1/3 and k−7/3 scalings pre- dicted for an anisotropic critically balanced KAW cascade (§ 7.5; see Fig. 1 for theoretical scaling fits superimposed on a plot taken from Bale et al. 2005; note, however, that Bale et al. 2005 themselves interpreted their data in a some- what different way and that their resolution was in any case not sufficient to be sure of the scalings). They were also able to check that their fluctuations satisfied the KAW dispersion relation—for critically balanced fluctuations, this is, indeed, plausible. Magnetic-fluctuation spectra recently reported by Alexandrova et al. (2008a) are only slightly steeper than the theoretical k−7/3 KAW spectrum. 
These authors also find a significant amount of magnetic-field-strength fluctuations in the dissipation range, with a spectrum that follows the same scaling; this is again consistent with the theoretical picture of KAW turbulence [see Eq. (223)]. Measurements reported by Czaykowska et al. (2001) and Alexandrova et al. (2008b) for the magnetosheath appear to present a similar picture.

The density spectra measured by Celnikier et al. (1983, 1987) steepen below the ion gyroscale following the flattened segment around k⊥ρi ∼ 1 (discussed in § 8.2.2). For a KAW cascade, the density spectrum should be k^{−7/3} (§ 7.5); without KAW, k^{−10/3} (§ 7.10.2). The slope observed in the papers cited above appears to be somewhat shallower even than k^{−2} (cf. a similar result by Spangler & Gwinn 1990 for the ISM; see § 8.4.1), but, given imperfect resolution, it is neither seriously in contradiction with the prediction based on the KAW cascade nor sufficient to corroborate it. Unfortunately, we have not found published simultaneous measurements of density- and magnetic- or electric-fluctuation spectra.

8.2.5. Variability of the Spectral Slope

While many measurements consistent with the KAW picture can be found, there are also many in which the spectra are much steeper (Denskat et al. 1983; Leamon et al. 1998). Analysis of a large set of measurements of the magnetic-fluctuation spectra in the dissipation range of the solar wind reveals a wide spread in the spectral indices: roughly between −1 and −4 (Smith et al. 2006). There is evidence of a weak positive correlation between steeper dissipation-range spectra and higher ion temperatures (Leamon et al. 1998) or higher cascade rates calculated from the inertial range (Smith et al. 2006).
This suggests that a larger amount of ion heating may correspond to a fully or partially suppressed KAW cascade, which is in line with our view of the ion heating and the KAW cascade as the two competing channels of the overall kinetic cascade (§ 7.8). With a weakened KAW cascade, all or part of the dissipation range would be dominated by the ion entropy cascade, a purely kinetic phenomenon manifested by predominantly electrostatic fluctuations and very steep magnetic-energy spectra (§ 7.10). This might account both for the steepness of the observed spectra and for the spread in their indices (§ 7.11), although many other theories exist (see § 8.2.6).

While we may thus have a plausible argument, this is not yet a satisfactory quantitative theory that would allow us to predict when the KAW cascade is present and when it is not or what dissipation-range spectrum should be expected for given values of the solar-wind parameters (βi, Ti/Te, etc.). Resolution of this issue again appears to hinge on the question of how much turbulent power is diverted into the ion entropy cascade (equivalently, into ion heat) at the ion gyroscale (see § 8.2.2).

8.2.6. Alternative Theories of the Dissipation Range

A number of alternative theories and models have been put forward to explain the observed spectral slopes (and their variability) in the dissipation range. It is not our aim to review or critique them all in detail, but perhaps it is useful to provide a few brief comments about some of them in light of the theoretical framework constructed in this paper.

This entire theoretical framework hinges on adopting gyrokinetics as a valid description or, at least, a sensible model that does not miss any significant channels of energy cascade and dissipation. While we obviously believe this to be the right approach, it is worth spelling out what effects are left out “by construction.”

Parallel Alfvén-wave cascade and ion cyclotron damping.
— The use of gyrokinetics assumes that fluctuations stay anisotropic at all scales, k‖ ≪ k⊥, and, therefore, ω ≪ Ωi, so the cyclotron resonances are ordered out. However, if one insists on routing the Alfvén-wave energy into a parallel cascade, e.g., by forcibly setting k⊥ = 0, it is possible to construct a weak turbulence theory in which it is dissipated by the ion cyclotron damping (Yoon & Fang 2008). Numerical simulations of 3D MHD turbulence do not support the possibility of a parallel Alfvén-wave cascade (Shebalin et al. 1983; Oughton et al. 1994; Cho & Vishniac 2000; Maron & Goldreich 2001; Cho et al. 2002; Müller et al. 2003). Solar-wind evidence that the perpendicular cascade dominates is quite strong for the inertial range (§ 8.1.3) and less so for the dissipation range (§ 8.2.1). While, as stated in § 8.2.1, one cannot yet definitely claim that observations tell us that ω ≪ Ωi at k⊥ρi ∼ 1, it has been argued that observations do not appear to be consistent with cyclotron damping being the main mechanism for the dissipation of the inertial-range Alfvénic turbulence at the ion gyroscale (Leamon et al. 1998, 2000; Smith et al. 2001).

Ion-cyclotron resonance could conceivably be reached somewhere in the dissipation range (see § 7.13). At this point gyrokinetics will formally break down, although, as argued by Howes et al. (2008a, see their § 3.6), this does not necessarily mean that ion cyclotron damping will become the dominant dissipation channel for the turbulence.

Parallel whistler cascade. — A parallel magnetosonic/whistler cascade eventually damped by the electron cyclotron resonance (Stawicki et al. 2001) is also excluded in the construction of gyrokinetics. The whistler cascade has been given some consideration in the Hall MHD approximation (further discussed at the end of this section).
Both weak-turbulence theory (Galtier 2006) and 3D numerical simulations (Cho & Lazarian 2004) concluded that, like in MHD, the turbulent cascade is highly anisotropic, with perpendicular energy transfer dominating over the parallel one.40 The same conclusion appears to have been reached in recent 2D kinetic PIC simulations by Gary et al. (2008) and Saito et al. (2008). Thus, the turbulence again seems to be driven into the gyrokinetically accessible regime.

While theory and numerical simulations appear to make arguing in favor of a parallel cascade and cyclotron heating difficult, there exists some observational evidence in support of them, especially for the near-Sun solar wind (e.g., Harmon & Coles 2005). Thus, the presence or relative importance of the cyclotron heating in the solar wind and, more generally, the mechanism(s) responsible for the observed perpendicular ion heating (Marsch et al. 1983) remain a largely open problem. Besides the theories mentioned above, many other ideas have been proposed, some of which attempted to reconcile the dominance of the low-frequency perpendicular cascade with the possibility of cyclotron heating (e.g., Chandran 2005b; Markovskii et al. 2006; see Hollweg 2008 for a concise recent review of the problem).

Mirror cascade. — Sahraoui et al. (2006) analyzed a set of Cluster multi-spacecraft measurements in the magnetosheath and reported a broad power-law (∼ k^{−8/3}) spectrum of mirror structures at and below the ion gyroscale. They claim that these are not KAW-like fluctuations because their frequency is zero in the plasma frame. Although these structures are highly anisotropic with k‖ ≪ k⊥, they cannot be described by the gyrokinetic theory in its present form because δB‖/B0 is very large (∼ 40%, occasionally reaching unity) and because the particle trapping by fluctuations, which is likely to be important in the nonlinear physics of the mirror instability (Kivelson & Southwood 1996; Pokhotelov et al.
2008; Rincon et al. 2009), is ordered out in gyrokinetics. Thus, if a “mirror cascade” exists, it is not captured in our description. More generally, the effect of the pressure-anisotropy-driven instabilities on the turbulence in the dissipation range is a wide open area, requiring further analytical effort (see § 8.3).

If k‖ ≪ k⊥, ω ≪ Ωi, and δB/B0 ≪ 1 are accepted for the dissipation range and plasma instabilities at the ion gyroscale (§ 8.3) are ignored, the formal gyrokinetic theory and its asymptotic consequences derived above should hold. There are two essential features of the linear physics at and below the ion gyroscale that must play some role: the collisionless (Landau) damping and the dispersive nature of the wave solutions (see Fig. 8 and § 7.3; cf., e.g., Leamon et al. 1999; Stawicki et al. 2001). Both of these features have been employed to explain the spectral break at the ion gyroscale and the spectral slopes below it.

40 It is possible to produce a parallel cascade artificially by running 1D simulations (Matthaeus et al. 2008b).

Landau damping and instrumental effects. — In most of our discussion (§ 7, §§ 8.2.4-8.2.5), we effectively assumed that the Landau damping is only important at k⊥ρi ∼ 1 and k⊥ρe ∼ 1, but not in between, so we could talk about asymptotic scalings and dissipationless cascades. However, as was noted in § 7.6, a properly asymptotic scaling behavior in the dissipation range is probably impossible in nature because the scale separation between the ion and electron gyroscales is only about (mi/me)^{1/2} ≃ 43. In particular, there is not always a wide scale interval where the kinetic damping is negligibly small (especially at low βi; see Fig. 8; cf. Leamon et al. 1999). Howes et al.
(2008a) proposed a model of how the presence of damping combined with instrumental effects (a resolution floor) could lead to measured spectra that look like power laws steeper than k^{−7/3}, with the effective spectral exponent depending on plasma parameters (we refer the reader to that paper for a discussion of how this compares with previous models of a similar kind, e.g., Li et al. 2001). A key physical assumption of theirs and similar models is that the amount of power drained from the Alfvén-wave and KAW cascades into the ion heat is set by the strength of the linear damping. Whether this is justified is not yet clear.

Hall and Electron MHD. — If Landau damping is deemed unimportant in some part of the dissipation range (which can be true in some regimes; see Fig. 8 and Howes et al. 2006, 2008a,b) and the wave dispersion is considered to be the salient feature, it might appear that a fluid, rather than kinetic, description should be sufficient. Hall MHD (Mahajan & Yoshida 1998) or its kdi ≫ 1 limit, the Electron MHD (Kingsep et al. 1990), have been embraced by many authors as such a description, suitable both for analytical arguments (Goldreich & Reisenegger 1992; Krishan & Mahajan 2004; Gogoberidze 2005; Galtier & Bhattacharjee 2003; Galtier 2006; Alexandrova et al. 2008a) and numerical simulations (Biskamp et al. 1996, 1999; Ghosh et al. 1996; Ng et al. 2003; Cho & Lazarian 2004; Shaikh & Zank 2005; Galtier & Buchlin 2007; Matthaeus et al. 2008b).

To what extent does this constitute an approach alternative to (and better than?) gyrokinetics (as suggested, e.g., by Matthaeus et al. 2008b)? For fluctuations with k‖ ≪ k⊥, Hall MHD is merely a particular limit of gyrokinetics: βi ≪ 1 and Ti/Te ≪ 1 (cold-ion limit; see Appendix E). If k‖ is not small compared to k⊥, then gyrokinetics is not valid, while Hall MHD continues to describe the cold-ion limit correctly (e.g., Ito et al. 2004; Hirose et al.
2004), capturing in particular the whistler branch of the dispersion relation. However, as we have already mentioned above, the dominance of the perpendicular energy transfer (k‖ ≪ k⊥) is supported both by weak-turbulence theory for Hall MHD (Galtier 2006) and by 3D numerical simulations of the Electron MHD (Cho & Lazarian 2004).

Thus, the gyrokinetic theory and its rigorous limits, such as ERMHD (§ 7.2), supersede Hall MHD for anisotropic turbulence. Since ions are generally not cold in the solar wind (or any other plasma discussed here), Hall MHD is not formally a relevant approximation. It also entirely misses the kinetic damping and the associated entropy cascade channel leading to particle heating (§ 7.1, § 7.9 and § 7.10). However, Hall MHD does capture the Alfvén waves becoming dispersive and numerical simulations of it do show a spectral break, although, technically speaking, at the wrong scale (di instead of ρi; see § 7.1). Although Hall MHD cannot be rigorously used as a quantitative theory of the spectral break and the associated change in the nature of the turbulent cascade, the Hall MHD equations in the limit kdi ≫ 1 are mathematically similar to our ERMHD equations (see § 7.2 and Appendix E) to within constant coefficients probably not essential for qualitative models of turbulence. Therefore, results of numerical simulations of Hall and Electron MHD cited above are directly useful for understanding the KAW cascade and, indeed, in the limit kdi ≫ 1, kde ≪ 1, they are mostly consistent with the scaling arguments of § 7.5.

Alfvén vortices. — Finally we mention an argument pertaining to the dissipation-range spectra that is not based on energy cascades at all. Based on the evidence of Alfvén vortices in the magnetosheath, Alexandrova (2008) speculated that steep power-law spectra observed in the dissipation range at least in some cases could reflect the geometry of the ion-gyroscale structures rather than a local energy cascade.
If Alfvén vortices are a common feature, this possibility cannot be excluded. However, the resulting geometrical spectra are quite steep (k^{−4} and steeper), so they can become important only if the KAW cascade is weak or suppressed, somewhat similarly to the steep spectra associated with the entropy cascade (§ 7.11).

8.3. Is Equilibrium Distribution Isotropic and Maxwellian?

In rigorous theoretical terms, the weakest point of this paper is the use of a Maxwellian equilibrium. Formally, this is only justified when the collisions are weak but not too weak: we ordered the collision frequency as similar to the fluctuation frequency [Eq. (49)]. This degree of collisionality is sufficient to prove that a Maxwellian equilibrium distribution F0s(v) does indeed emerge in the lowest order of the gyrokinetic expansion (Howes et al. 2006). This argument works well for plasmas such as the ISM (§ 8.4), where collisions are weak (λmfpi ≫ ρi) but non-negligible (λmfpi ≪ L). In space plasmas, the mean free path is of the order of 1 AU, the distance between the Sun and the Earth (see Table 1). Strictly speaking, in so highly collisionless a plasma, the equilibrium distribution does not have to be either Maxwellian or isotropic.

The conservation of the first adiabatic invariant, µ = v⊥^2/2B, suggests that temperature anisotropy with respect to the magnetic-field direction (T0⊥ ≠ T0‖) may exist. When the relative anisotropy is larger than (roughly) 1/βi, it triggers several very fast growing plasma instabilities: most prominently the firehose (T0⊥ < T0‖) and mirror (T0⊥ > T0‖) modes (e.g., Gary et al. 1976). Their growth rates peak around the ion gyroscale, thus giving rise to additional energy injection at k⊥ρi ∼ 1.

No definitive analytical theory of how these fluctuations saturate, cascade and affect the equilibrium distribution has been proposed.
It appears to be a reasonable expectation that the fluctuations resulting from temperature anisotropy will saturate by limiting this anisotropy. This idea has some support in solar-wind observations: while the degree of anisotropy of the core particle distribution functions varies considerably between data sets, the observed anisotropies do seem to populate the part of the parameter plane (T0⊥/T0‖, βi) circumscribed in a rather precise way by the marginal stability boundaries for the mirror and firehose (Gary et al. 2001; Kasper et al. 2002; Marsch et al. 2004; Hellinger et al. 2006; Matteini et al. 2007).41

41 Note that Kellogg et al. (2006) measure the electric-field fluctuations

If we want to study turbulence in data sets that do not lie too close to these stability boundaries, assuming an isotropic Maxwellian equilibrium distribution [Eq. (54)] is probably an acceptable simplification, although not an entirely rigorous one. Further theoretical work is clearly possible on this subject: thus, it is not a problem to formulate gyrokinetics with an arbitrary equilibrium distribution (Frieman & Chen 1982) and, starting from that, one can generalize the results of this paper (for the KRMHD system, § 5, this has been done by Chen et al. 2009). Treating the instabilities themselves might prove more difficult, requiring the gyrokinetic ordering to be modified and the expansion carried to higher orders to incorporate features that are not captured by gyrokinetics, e.g., short parallel scales (Rosin et al. 2009), particle trapping (Pokhotelov et al. 2008; Rincon et al. 2009), or nonlinear finite-gyroradius effects (Califano et al. 2008). Note that the theory of the dissipation-range turbulence will probably need to be modified to account for the additional energy injection from the instabilities and for the (yet unclear) way in which this energy makes its way to dissipation and into heat.
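As a rough illustration of the criterion quoted earlier in this section (instabilities are triggered when the relative temperature anisotropy exceeds roughly 1/βi), the following minimal Python sketch classifies a point in the (T0⊥/T0‖, βi) plane. The function name and the use of exactly 1/βi as a sharp threshold are illustrative assumptions: the actual marginal stability boundaries of the mirror and firehose modes carry order-unity coefficients that are not specified here, so this should be read as an order-of-magnitude check only.

```python
def classify_anisotropy(t_perp_over_t_par: float, beta_i: float) -> str:
    """Order-of-magnitude classification of an ion temperature anisotropy.

    Hypothetical helper: uses the rough criterion |T_perp/T_par - 1| > 1/beta_i
    as a stand-in for the true marginal stability boundaries.
    """
    if t_perp_over_t_par <= 0.0 or beta_i <= 0.0:
        raise ValueError("temperature ratio and beta_i must be positive")

    relative_anisotropy = t_perp_over_t_par - 1.0
    threshold = 1.0 / beta_i  # assumed sharp threshold; really only "roughly" 1/beta_i

    if relative_anisotropy > threshold:
        return "mirror"      # T_perp > T_par branch
    if relative_anisotropy < -threshold:
        return "firehose"    # T_perp < T_par branch
    return "stable"


# Example: a moderately anisotropic distribution at two values of beta_i.
for beta in (1.0, 10.0):
    print(beta, classify_anisotropy(0.5, beta))
```

With this crude threshold, the same anisotropy T0⊥/T0‖ = 0.5 is classified as stable at βi = 1 but as firehose-unstable at βi = 10, which reflects the qualitative point that high-β plasmas tolerate much less anisotropy.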
Besides the anisotropies, the particle distribution functions in the solar wind (especially the electron one) exhibit non-Maxwellian suprathermal tails (see Maksimovic et al. 2005; Marsch 2006, and references therein). These contain small (∼ 5% of the total density) populations of energetic particles. Both the origin of these particles and their effect on turbulence have to be modeled kinetically. Again, it is possible to formulate gyrokinetics for general equilibrium distributions of this kind and examine the interaction between them and the turbulent fluctuations, but we leave such a theory outside the scope of this paper.

Thus, much remains to be done to incorporate realistic equilibrium distribution functions into the gyrokinetic description of the solar wind plasma. In the meanwhile, we believe that the gyrokinetic theory based on a Maxwellian equilibrium distribution as presented in this paper, while idealized and imperfect, is nevertheless a step forward in the analytical treatment of the space-plasma turbulence compared to the fluid descriptions that have prevailed thus far.

8.4. Interstellar Medium

While the solar wind is unmatched by other astrophysical plasmas in the level of detail with which turbulence in it can be measured, the interstellar medium (ISM) also offers an observer a number of ways of diagnosing plasma turbulence, which, in the case of the ISM, is thought to be primarily excited by supernova explosions (Norman & Ferrara 1996). The accuracy and resolution of this analysis are due to improve rapidly thanks to many new observatories, e.g., LOFAR,42 Planck (Enßlin et al. 2006), and, in more distant future, the SKA (Lazio et al. 2004).
The ISM is a spatially inhomogeneous environment consisting of several phases that have different temperatures, densities and degrees of ionization (Ferrière 2001).43 We will use the Warm ISM phase (see Table 1) as our fiducial interstellar plasma and discuss briefly what is known about the two main observationally accessible quantities, the electron density and magnetic fields, and how this information fits into the theoretical framework proposed here.

41 (continued) in the ion-cyclotron frequency range, estimate the resulting velocity-space diffusion and argue that it is sufficient to isotropize the ion distribution

42 http://www.lofar.org

43 And, therefore, different degrees of importance of the neutral particles and the associated ambipolar damping effects; these will not be discussed here (see Lithwick & Goldreich 2001).

8.4.1. Electron Density Fluctuations

The electron-density fluctuations inferred from the interstellar scintillation measurements appear to have a spectrum with an exponent ≃ −1.7, consistent with the Kolmogorov scaling (Armstrong et al. 1981, 1995; Lazio et al. 2004; see, however, dissenting evidence by Smirnova et al. 2006, who claim a spectral exponent closer to −1.5). This holds over about 5 decades of scales: λ ∈ (10^5, 10^10) km. Other observational evidence at larger and smaller scales supports the case for this presumed inertial range to be extended over as many as 12 decades: λ ∈ (10^2, 10^15) km, a fine example of scale separation that prompted an impressed astrophysicist to dub the density scaling “The Great Power Law in the Sky.” The upper cutoff here is consistent with the estimates of the supernova scale of order 100 pc, presumably the outer scale of the turbulence (Norman & Ferrara 1996) and also roughly the scale height of the galactic disk (obviously the upper bound on the validity of any homogeneous model of the ISM turbulence).
The lower cutoff is an estimate for the inner scale below which the logarithmic slope of the density spectrum steepens to about −2 (Spangler & Gwinn 1990).

Higdon (1984) was the first to realize that the electron-density fluctuations in the ISM could be attributed to a cascade of a passive tracer mixed by the ambient turbulence (the MHD entropy mode; see § 2.6). This idea was brought to maturity by Lithwick & Goldreich (2001), who studied the passive cascades of the slow and entropy modes in the framework of the GS theory (see also Maron & Goldreich 2001). If the turbulence is assumed anisotropic, as in the GS theory, the passive nature of the density fluctuations with respect to the decoupled Alfvén-wave cascade becomes a rigorous result both in MHD (§ 2.4) and, as we showed above, in the more general gyrokinetic description appropriate for weakly collisional plasmas (§ 5.5). Anisotropy of the electron-density fluctuations in the ISM is, indeed, observationally supported (Wilkinson et al. 1994; Trotter et al. 1998; Rickett et al. 2002; Dennett-Thorpe & de Bruyn 2003; Heyer et al. 2008, see also Lazio et al. 2004 for a concise discussion), although detailed scale-by-scale measurements are not currently possible.

If the underlying Alfvén-wave turbulence in the ISM has a k⊥^{−5/3} spectrum, as predicted by GS, so should the electron density (see § 2.6). As we discussed in § 6.3, the physical nature of the inner scale for the density fluctuations depends on whether they have a cascade in k‖ and are efficiently damped when k‖λmfpi ∼ 1 or fail to develop small parallel scales and can, therefore, reach k⊥ρi ∼ 1. The observationally estimated inner scale is consistent with the ion gyroscale, ρi ∼ 10^3 km (see Table 1; note that the ion inertial scale di = ρi/√βi is similar to ρi at the moderate values of βi characteristic of the ISM; see further discussion of the (ir)relevance of di in § 7.1, § 8.2.3 and Appendix E).
However, since the mean free path in the ISM is not huge (Table 1), it is not possible to distinguish this from the perpendicular cutoff k⊥^{−1} ∼ λmfpi^{3/2} L^{−1/2} ∼ 500 km implied by the parallel cutoff at k‖λmfpi ∼ 1 [see Eq. (220)], as advocated by Lithwick & Goldreich (2001). Note that the relatively short mean free path means that much of the scale range spanned by the Great Power Law in the Sky is, in fact, well described by the MHD approximation either with adiabatic (§ 2) or isothermal (§ 6.1 and Appendix D) electrons.

Below the ion gyroscale, the −2 spectral exponent reported by Spangler & Gwinn (1990) is measured sufficiently imprecisely to be consistent with the −7/3 expected for the density fluctuations in the KAW cascade (§ 7.5). However, given the high degree of uncertainty about what happens in this “dissipation range” even in the much better resolved case of the solar wind (§ 8.2), it would probably be wise to reserve judgment until better data are available.

8.4.2. Magnetic Fluctuations

The second main observable type of turbulent fluctuations in the ISM are the magnetic fluctuations, accessible indirectly via the measurements of the Faraday rotation of the polarization angle of the pulsar light travelling through the ISM. The structure function of the rotation measure (RM) should have the Kolmogorov slope of 2/3 if the magnetic fluctuations are due to Alfvénic turbulence described by the GS theory. There is a considerable uncertainty in interpreting the available data, primarily due to insufficient spatial resolution (rarely better than a few parsec). Structure function slopes consistent with 2/3 have been reported (Minter & Spangler 1996), but, depending on where one looks, shallower structure functions that seem to steepen at scales of a few parsec are also observed (Haverkorn et al. 2004).

A recent study by Haverkorn et al.
(2005) detected an interesting trend: the RM structure functions computed for regions that lie in the galactic spiral arms are nearly perfectly flat down to the resolution limit, while in the interarm regions, they have detectable slopes (although these are mostly shallower than 2/3). Observations of magnetic fields in external galaxies also reveal a marked difference in the magnetic-field structure between arms and interarms: the spatially regular (mean) fields are stronger in the interarms, while in the arms, the stochastic fields dominate (Beck 2007). This qualitative difference between the magnetic-field structure in the arms and interarms has been attributed to a smaller effective outer scale in the arms (∼ 1 pc, compared to ∼ 10^2 pc in the interarms; see Haverkorn et al. 2008) or to the turbulence in the arms and interarms belonging to the two distinct asymptotic regimes described in § 1.3: closer to the anisotropic Alfvénic turbulence with a strong mean field in the interarms and to the isotropic saturated state of small-scale dynamo in the arms (Schekochihin et al. 2007).

8.5. Accretion Disks

Accretion of plasma onto a central black hole or neutron star is responsible for many of the most energetic phenomena observed in astrophysics (see, e.g., Narayan & Quataert 2005 for a review). It is now believed that a linear instability of differentially rotating plasmas, the magnetorotational instability (MRI), amplifies magnetic fields and gives rise to MHD turbulence in astrophysical disks (Balbus & Hawley 1998). Magnetic stresses due to this turbulence transport angular momentum, allowing plasma to accrete. The MRI converts the gravitational potential energy of the inflowing plasma into turbulence at the outer scale that is comparable to the scale height of the disk. This energy is then cascaded to small scales and dissipated into heat, powering the radiation that we see from accretion flows.
Fluid MHD simulations show that the MRI-generated turbulence in disks is subsonic and has β ∼ 10–100. Thus, on scales much smaller than the scale height of the disk, homogeneous turbulence in the parameter regimes considered in this paper is a valid idealization and the kinetic models developed above should represent a step forward compared to the purely fluid approach.

Turbulence is not yet directly observable in disks, so models of turbulence are mostly used to produce testable predictions of observable properties of disks such as their X-ray and radio emission. One of the best observed cases is the (presumed) accretion flow onto the black hole coincident with the radio source Sgr A∗ in the center of our Galaxy (see review by Quataert 2003).

Depending on the rate of heating and cooling in the inflowing plasma (which in turn depend on accretion rate and other properties of the system under consideration), there are different models that describe the physical properties of accretion flows onto a central object. In one class of models, a geometrically thin, optically thick accretion disk (Shakura & Sunyaev 1973), the inflowing plasma is cold and dense and well described as an MHD fluid. When applied to Sgr A∗, these models produce a prediction for its total luminosity that is several orders of magnitude larger than observed. Another class of models, which appears to be more consistent with the observed properties of Sgr A∗, is called radiatively inefficient accretion flows (RIAFs; see Rees et al. 1982; Narayan & Yi 1995 and review by Quataert 2003 of the applications and observational constraints in Sgr A∗).
In these models, the inflowing plasma near the black hole is believed to adopt a two-temperature configuration, with the ions (Ti ∼ 10^11–10^12 K) hotter than the electrons (Te ∼ 10^9–10^11 K).44 The electron and ion thermodynamics decouple because the densities are so low that the temperature equalization time ∼ νie^{−1} is longer than the time for the plasma to flow into the black hole. Thus, like the solar wind, RIAFs are macroscopically collisionless plasmas (see Table 1 for plasma parameters in the Galactic center; note that these parameters are so extreme that the gyrokinetic description, while probably better than the fluid one, cannot be expected to be rigorously valid; at the very least, it needs to be reformulated in a relativistic form). At the high temperatures appropriate to RIAFs, electrons radiate energy much more efficiently than the ions (by virtue of their much smaller mass) and are, therefore, expected to contribute dominantly to the observed emission, while the thermal energy of the ions is swallowed by the black hole. Since the plasma is collisionless, the electron heating by turbulence largely determines the thermodynamics of the electrons and thus the observable properties of RIAFs. The question of which fraction of the turbulent energy goes into ion and which into electron heating is, therefore, crucial for understanding accretion flows, and the answer to this question depends on the detailed properties of the small-scale kinetic turbulence (e.g., Quataert & Gruzinov 1999; Sharma et al. 2007), as well as on the linear properties of the collisionless MRI (Quataert et al. 2002; Sharma et al. 2003).

Since all of the turbulent power coming down the cascade must be dissipated into either ion or electron heat, it is really the amount of generalized energy diverted at the ion gyroscale into the ion entropy cascade (§§ 7.8-7.9) that decides how much energy is left to heat the electrons via the KAW cascade (§§ 7.2-7.5, § 7.12).
Again, as in the case of the solar wind (§ 8.2.2 and § 8.2.5), the transition around the ion gyroscale from the Alfvénic turbulence at k⊥ρi ≪ 1 to the KAW turbulence at k⊥ρi ≫ 1 emerges as a key unsolved problem.

44 It is partly with this application in mind that we carried the general temperature ratio in our calculations; see footnote 17.

8.6. Galaxy Clusters

Galaxy clusters are the largest plasma objects in the Universe. Like the other examples discussed above, the intracluster plasma is in the weakly collisional regime (see Table 1). Fluctuations of electron density, temperature and of magnetic fields are measured in clusters by X-ray and radio observatories, but the resolution is only just enough to claim that a fairly broad scale range of fluctuations exists (Schuecker et al. 2004; Vogt & Enßlin 2005). No power-law scalings have yet been established beyond reasonable doubt.

What fundamentally hampers quantitative modeling of turbulence and related effects in clusters is that we do not have a definite theory of the basic properties of the intracluster medium: its (effective) viscosity, magnetic diffusivity or thermal conductivity. In a weakly collisional and strongly magnetized plasma, all of these depend on the structure of the magnetic field (Braginskii 1965), which is shaped by the turbulence. If (or at scales where) a reasonable a priori assumption can be made about the field structure, further analytical progress is possible: thus, the theoretical models presented in this paper assume that the magnetic field is a sum of a slowly varying in space “mean field” and small low-frequency perturbations (δB ≪ B0).

In fact, since clusters do not have mean fields of any magnitude that could be considered dynamically significant, but do have stochastic fields, the outer-scale MHD turbulence in clusters falls into the weak-mean-field category (see § 1.3).
The magnetic field should be highly filamentary, organized in long folded direction-reversing structures. It is not currently known what determines the reversal scale.$^{45}$ Observations, while tentatively confirming the existence of very long filaments (Clarke & Enßlin 2006), suggest that the reversal scale is much larger than the ion gyroscale: thus, the magnetic-energy spectrum for the Hydra A cluster core reported by Vogt & Enßlin (2005) peaks at around 1 kpc, compared to $\rho_i \sim 10^5$ km. Below this scale, an Alfvén-wave cascade should exist (as is, indeed, suggested by Vogt & Enßlin’s spectrum being roughly consistent with $k^{-5/3}$ at scales below the peak). As these scales are collisionless ($\lambda_{\rm mfpi} \sim 100$ pc in the cores and $\sim 10$ kpc in the bulk of the clusters), it is to this turbulence that the theory developed in this paper should be applicable.

Another complication exists, similar to that discussed in § 8.3: pressure anisotropies could give rise to fast plasma instabilities whose growth rate peaks just above the ion gyroscale. As was pointed out by Schekochihin et al. (2005), these are, in fact, an inevitable consequence of any large-scale fluid motions that change the strength of the magnetic field. Although a number of interesting and plausible arguments can be made about the way the instabilities might determine the magnetic-field structure (Schekochihin & Cowley 2006; Schekochihin et al. 2008a; Rosin et al. 2009; Rincon et al. 2009), it is not currently understood how the small-scale fluctuations resulting from these instabilities coexist with the Alfvénic cascade.

The uncertainties that result from this imperfect understanding of the nature of the intracluster medium are exemplified by the problem of its thermal conductivity.
The magnetic-field reversal scale in clusters is certainly not larger than the electron diffusion scale, $(m_i/m_e)^{1/2}\lambda_{\rm mfpi}$, which varies from a few kpc in the cores to a few hundred kpc in the bulk. Therefore, one would expect that the approximation of isothermal electron fluid (§ 4) should certainly apply at all scales below the reversal scale, where $\delta B \ll B_0$ presumably holds. Even this, however, is not absolutely clear. One could imagine the electrons being effectively adiabatic if (or in the regions where) the plasma instabilities give rise to large fluctuations of the magnetic field ($\delta B/B_0 \sim 1$) at the ion gyroscale, reducing the mean free path to $\lambda_{\rm mfpi} \sim \rho_i$ (Schekochihin et al. 2008a; Rosin et al. 2009; Rincon et al. 2009). Such fluctuations cannot be described by gyrokinetics in its current form.

$^{45}$ See Schekochihin & Cowley (2006) for a detailed presentation of our views on the interplay between turbulence, magnetic field and plasma effects in clusters; for further discussions and disagreements, see Enßlin & Vogt (2006); Subramanian et al. (2006); Brunetti & Lazarian (2007).

The current state of the observational evidence does not allow one to exclude either of these possibilities. Both isothermal (Fabian et al. 2006; Sanders & Fabian 2006) and non-isothermal (Markevitch & Vikhlinin 2007) coherent structures that appear to be shocks are observed. Disordered fluctuations of temperature can also be detected, which allows one to infer an upper limit for the scale at which the isothermal approximation can start being valid: thus, Markevitch et al. (2003) find temperature variations at all scales down to $\sim 100$ kpc, which is the statistical limit that defines the spatial resolution of their temperature map. In none of these or similar measurements are magnetic-field data available that would make possible a pointwise comparison of the magnetic and thermal structure.
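The range quoted above for the electron diffusion scale can be checked with a few lines of arithmetic. The sketch below assumes a hydrogen plasma, so that $m_i/m_e$ is the proton-to-electron mass ratio, and takes the ion mean free paths quoted earlier for cluster cores (about 100 pc) and for the bulk (about 10 kpc). The function name and the choice of kpc as the unit are illustrative and not part of the original analysis.

```python
import math

# Assumption: hydrogen plasma, so m_i/m_e is the proton-to-electron mass ratio.
PROTON_TO_ELECTRON_MASS_RATIO = 1836.15267


def electron_diffusion_scale_kpc(lambda_mfpi_kpc, mass_ratio=PROTON_TO_ELECTRON_MASS_RATIO):
    """Return (m_i/m_e)^(1/2) * lambda_mfpi, in the same units as the input (kpc)."""
    return math.sqrt(mass_ratio) * lambda_mfpi_kpc


# Ion mean free paths quoted in the text: ~100 pc (cores), ~10 kpc (bulk).
core_scale = electron_diffusion_scale_kpc(0.1)
bulk_scale = electron_diffusion_scale_kpc(10.0)

print(f"cores: {core_scale:.1f} kpc")
print(f"bulk:  {bulk_scale:.1f} kpc")
```

With these inputs the scale comes out at roughly 4 kpc in the cores and roughly 430 kpc in the bulk, consistent with the "few kpc" and "few hundred kpc" stated in the text.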
Because of this lack of information about the state of the magnetized plasma in clusters, theories of the intracluster medium are not sufficiently constrained by observations, so no one theory is in a position to prevail. This uncertain state of affairs might be improved by analyzing the observationally much better resolved case of the solar wind, which should be quite similar to the intracluster medium at very small scales (except for somewhat lower values of $\beta_i$ in the solar wind).

9. CONCLUSION

In this paper, we have considered magnetized plasma turbulence in the astrophysically prevalent regime of weak collisionality. We have shown how the energy injected at the outer scale cascades in phase space, eventually to increase the entropy of the system and heat the particles. In the process, we have explained how one combines plasma physics tools (in particular, the gyrokinetic theory) with the ideas of a turbulent cascade of energy to arrive at a hierarchy of tractable models of turbulence in various physically distinct scale intervals. These models represent the branching pathways of a generalized energy cascade in phase space (the “kinetic cascade”; see Fig. 5) and make explicit the “fluid” and “kinetic” aspects of plasma turbulence.

A detailed outline of these developments was given in the Introduction. Intermediate technical summaries were provided in § 4.9, § 5.7, and § 7.14. An astrophysical summary and discussion of the observational evidence was given in § 8, with a particular emphasis on space plasmas (§§ 8.1-8.3). Our view of how the transformation of the large-scale turbulent energy into heat occurs was encapsulated in the concept of a kinetic cascade of generalized energy. It was previewed in § 1.4 and developed quantitatively in §§ 3.4-3.5, § 4.7, § 5.6, §§ 6.2.3-6.2.5, §§ 7.8-7.12, and Appendices D.2 and E.2.

Following a series of analytical contributions that set up a theoretical framework for astrophysical gyrokinetics (Howes et al.
2006, 2008a; Schekochihin et al. 2007, 2008b, and this paper), an extensive program of fluid, hybrid fluid-kinetic, and fully gyrokinetic$^{46}$ numerical simulations of magnetized plasma turbulence is now underway (for the first results of this program, see Howes et al. 2008b; Tatsuno et al. 2009a,b). Careful comparisons of the fully gyrokinetic simulations with simulations based on the more readily computable models derived in this paper (RMHD, § 2; isothermal electron fluid, § 4; KRMHD, § 5; ERMHD, § 7; HRMHD, Appendix E) as well as with the numerical studies based on various Landau fluid (Snyder et al. 1997; Goswami et al. 2005; Ramos 2005; Sharma et al. 2006, 2007; Passot & Sulem 2007) and gyrofluid (Hammett et al. 1991; Dorland & Hammett 1993; Snyder & Hammett 2001; Scott 2007) closures appear to be the way forward in developing a comprehensive numerical model of the kinetic turbulent cascade from the outer scale to the electron gyroscale. Of the many astrophysical plasmas to which these results apply, the solar wind and, perhaps, the magnetosheath, due to the high quality of turbulence measurements possible in them, appear to be the most suitable test beds for direct and detailed quantitative comparisons of the theory and simulation results with observational evidence. The objective of all this work remains a quantitative characterization of the scaling-range properties (spectra, anisotropy, nature of fluctuations and their interactions), the ion and electron heating, and the transport properties of the magnetized plasma turbulence.

We thank O. Alexandrova, S. Bale, J. Borovsky, T. Carter, S. Chapman, C. Chen, E. Churazov, T. Enßlin, A. Fabian, A. Finoguenov, A. Fletcher, M. Haverkorn, B. Hnat, T. Horbury, K. Issautier, C. Lacombe, M. Markevitch, K. Osman, T. Passot, F. Sahraoui, A. Shukurov, and A. Vikhlinin for helpful discussions of experimental and observational data; I. Abel, M. Barnes, D. Ernst, J.
Hastie, P. Ricci, C. Roach, and B. Rogers for discussions of collisions in gyrokinetics; and G. Plunk for discussions of the theory of gyrokinetic turbulence in two spatial dimensions. The authors’ travel was supported by the US DOE Center for Multiscale Plasma Dynamics and by the Leverhulme Trust (UK) International Academic Network for Magnetized Plasma Turbulence. A.A.S. was supported in part by a PPARC/STFC Advanced Fellowship and by the STFC Grant ST/F002505/1. He also thanks the UCLA Plasma Group for its hospitality on several occasions. S.C.C. and W.D. thank the Kavli Institute for Theoretical Physics and the Aspen Center for Physics for their hospitality. G.W.H. was supported by the US DOE contract DE-AC02-76CH03073. G.G.H. and T.T. were supported by the US DOE Center for Multiscale Plasma Dynamics. E.Q. and G.G.H. were supported in part by the David and Lucile Packard Foundation.

$^{46}$ Using the publicly available GS2 code (developed originally for fusion applications; see http://gs2.sourceforge.net) and the purpose-built AstroGK code (see http://www.physics.uiowa.edu/~ghowes/astrogk/).

APPENDIX

A. BRAGINSKII’S TWO-FLUID EQUATIONS AND REDUCED MHD

Here we explain how the standard one-fluid MHD equations used in § 2 and the collisional limit of the KRMHD system (§ 6.1, derived in Appendix D) both emerge as limiting cases of the two-fluid theory. For the case of anisotropic fluctuations, $k_\parallel/k_\perp \ll 1$, all of this can, of course, be derived from gyrokinetics, but it is useful to provide a connection to the better-known fluid description of collisional plasmas.

A.1. Two-Fluid Equations

The rigorous derivation of the fluid equations for a collisional plasma was done in the classic paper of Braginskii (1965). His equations, valid for $\omega/\nu_{ii} \ll 1$, $k_\parallel\lambda_{\rm mfpi} \ll 1$, $k_\perp\rho_i \ll 1$ (see Fig.
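3), evolve the densities $n_s$, mean velocities $\mathbf{u}_s$ and temperatures $T_s$ of each plasma species ($s = i, e$):

$$\left(\frac{\partial}{\partial t} + \mathbf{u}_s\cdot\nabla\right) n_s = -n_s\nabla\cdot\mathbf{u}_s, \tag{A1}$$

$$m_s n_s\left(\frac{\partial}{\partial t} + \mathbf{u}_s\cdot\nabla\right)\mathbf{u}_s = -\nabla p_s - \nabla\cdot\hat\Pi_s + q_s n_s\left(\mathbf{E} + \frac{\mathbf{u}_s\times\mathbf{B}}{c}\right) + \mathbf{F}_s, \tag{A2}$$

$$\frac{3}{2}\, n_s\left(\frac{\partial}{\partial t} + \mathbf{u}_s\cdot\nabla\right) T_s = -p_s\nabla\cdot\mathbf{u}_s - \nabla\cdot\mathbf{\Gamma}_s - \hat\Pi_s : \nabla\mathbf{u}_s + Q_s, \tag{A3}$$

where $p_s = n_s T_s$ and the expressions for the viscous stress tensor $\hat\Pi_s$, the friction force $\mathbf{F}_s$, the heat flux $\mathbf{\Gamma}_s$ and the interspecies heat exchange $Q_s$ are given in Braginskii (1965). Equations (A1-A3) are complemented with the quasi-neutrality condition, $n_e = Z n_i$, and the Faraday and Ampère laws, which are (in the non-relativistic limit)

$$\frac{\partial\mathbf{B}}{\partial t} = -c\nabla\times\mathbf{E}, \qquad \mathbf{j} = e n_e(\mathbf{u}_i - \mathbf{u}_e) = \frac{c}{4\pi}\nabla\times\mathbf{B}. \tag{A4}$$

Because of quasi-neutrality, we only need one of the continuity equations, say the ion one. We can also use the electron momentum equation [Eq. (A2), $s = e$] to express $\mathbf{E}$, which we then substitute into the ion momentum equation and the Faraday law. The resulting system is

$$\frac{d\rho}{dt} = -\rho\nabla\cdot\mathbf{u}, \tag{A5}$$

$$\rho\frac{d\mathbf{u}}{dt} = -\nabla\left(p + \frac{B^2}{8\pi}\right) - \nabla\cdot\hat\Pi + \frac{\mathbf{B}\cdot\nabla\mathbf{B}}{4\pi} - \frac{Z m_e}{m_i}\,\rho\left(\frac{\partial}{\partial t} + \mathbf{u}_e\cdot\nabla\right)\mathbf{u}_e, \tag{A6}$$

$$\frac{\partial\mathbf{B}}{\partial t} = \nabla\times\left[\mathbf{u}\times\mathbf{B} - \frac{\mathbf{j}\times\mathbf{B}}{e n_e} + \frac{c\nabla p_e}{e n_e} + \frac{c\nabla\cdot\hat\Pi_e}{e n_e} - \frac{c\mathbf{F}_e}{e n_e} + \frac{c m_e}{e}\left(\frac{\partial}{\partial t} + \mathbf{u}_e\cdot\nabla\right)\mathbf{u}_e\right], \tag{A7}$$

where $\rho = m_i n_i$, $\mathbf{u} = \mathbf{u}_i$, $p = p_i + p_e$, $\hat\Pi = \hat\Pi_i + \hat\Pi_e$, $\mathbf{u}_e = \mathbf{u} - \mathbf{j}/e n_e$, $n_e = Z n_i$, $d/dt = \partial/\partial t + \mathbf{u}\cdot\nabla$. The ion and electron temperatures continue to satisfy Eq. (A3).

A.2. Strongly Magnetized Limit

In this form, the two-fluid theory starts resembling the standard one-fluid MHD, which was our starting point in § 2: Eqs. (A5-A7) already look similar to the continuity, momentum and induction equations. The additional terms that appear in these equations and the temperature equations (A3) are brought under control by considering how they depend on a number of dimensionless parameters: $\omega/\nu_{ii}$, $k_\parallel\lambda_{\rm mfpi}$, $k_\perp\rho_i$, $(m_e/m_i)^{1/2}$. While all these are small in Braginskii’s calculation, no assumption is made as to how they compare to each other. We now specify that

$$\frac{\omega}{\nu_{ii}} \sim \frac{k_\parallel\lambda_{\rm mfpi}}{\sqrt{\beta_i}}, \qquad k_\perp\rho_i \ll k_\parallel\lambda_{\rm mfpi} \sim \left(\frac{m_e}{m_i}\right)^{1/2} \ll 1 \tag{A8}$$

(see Fig. 4). Note that the first of these relations is equivalent to assuming that the fluctuation frequencies are Alfvénic, the same assumption as in gyrokinetics [Eq.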
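(49)]. We will refer to the second relation in Eq. (A8) as the strongly magnetized limit. Under the assumptions (A8), the two-fluid equations reduce to the following closed set:$^{47}$

$$\frac{d\rho}{dt} = -\rho\nabla\cdot\mathbf{u}, \tag{A10}$$

$$\rho\frac{d\mathbf{u}}{dt} = -\nabla\left[p + \frac{B^2}{8\pi} + \frac{1}{3}\rho\nu_{\parallel i}\left(\hat{\mathbf{b}}\hat{\mathbf{b}} : \nabla\mathbf{u} - \frac{1}{3}\nabla\cdot\mathbf{u}\right)\right] + \nabla\cdot\left[\hat{\mathbf{b}}\hat{\mathbf{b}}\,\rho\nu_{\parallel i}\left(\hat{\mathbf{b}}\hat{\mathbf{b}} : \nabla\mathbf{u} - \frac{1}{3}\nabla\cdot\mathbf{u}\right)\right] + \frac{\mathbf{B}\cdot\nabla\mathbf{B}}{4\pi}, \tag{A11}$$

$$\frac{d\mathbf{B}}{dt} = \mathbf{B}\cdot\nabla\mathbf{u} - \mathbf{B}\nabla\cdot\mathbf{u}, \tag{A12}$$

$$\frac{dT_i}{dt} = -\frac{2}{3}T_i\nabla\cdot\mathbf{u} + \frac{1}{\rho}\nabla\cdot\left(\hat{\mathbf{b}}\rho\kappa_{\parallel i}\hat{\mathbf{b}}\cdot\nabla T_i\right) - \nu_{ie}(T_i - T_e) + \frac{2}{3}m_i\nu_{\parallel i}\left(\hat{\mathbf{b}}\hat{\mathbf{b}} : \nabla\mathbf{u} - \frac{1}{3}\nabla\cdot\mathbf{u}\right)^2, \tag{A13}$$

$$\frac{dT_e}{dt} = -\frac{2}{3}T_e\nabla\cdot\mathbf{u} + \frac{1}{\rho}\nabla\cdot\left(\hat{\mathbf{b}}\rho\kappa_{\parallel e}\hat{\mathbf{b}}\cdot\nabla T_e\right) - \frac{\nu_{ie}}{Z}(T_e - T_i), \tag{A14}$$

where $\nu_{\parallel i} = 0.90\,v_{thi}\lambda_{\rm mfpi}$ is the parallel ion viscosity, $\kappa_{\parallel i} = 2.45\,v_{thi}\lambda_{\rm mfpi}$ the parallel ion thermal diffusivity, $\kappa_{\parallel e} = 1.40\,v_{the}\lambda_{\rm mfpe} \sim (Z^2/\tau^{5/2})(m_i/m_e)^{1/2}\kappa_{\parallel i}$ the parallel electron thermal diffusivity [here $\lambda_{\rm mfpi} = v_{thi}/\nu_{ii}$ with $\nu_{ii}$ defined in Eq. (52)], and $\nu_{ie}$ the ion–electron collision rate [defined in Eq. (51)]. Note that the last term in Eq. (A13) represents the viscous heating of the ions.

A.3. One-Fluid Equations (MHD)

If we now restrict ourselves to the low-frequency regime where ion–electron collisions dominate over all other terms in the ion-temperature equation (A13),

$$\frac{\omega}{\nu_{ie}} \sim \frac{k_\parallel\lambda_{\rm mfpi}}{\sqrt{\beta_i}}\left(\frac{m_i}{m_e}\right)^{1/2} \ll 1 \tag{A15}$$

[see Eqs. (A8) and (51)], we have, to lowest order in this new subsidiary expansion, $T_i = T_e = T$. We can now write $p = (n_i + n_e)T = (1 + Z)\rho T/m_i$ and, adding Eqs. (A13) and (A14), find the equation for pressure:

$$\frac{dp}{dt} + \frac{5}{3}p\nabla\cdot\mathbf{u} = \nabla\cdot\left(\hat{\mathbf{b}}\,n_e\kappa_{\parallel e}\hat{\mathbf{b}}\cdot\nabla T\right) + \frac{2}{3}n_i m_i\nu_{\parallel i}\left(\hat{\mathbf{b}}\hat{\mathbf{b}} : \nabla\mathbf{u} - \frac{1}{3}\nabla\cdot\mathbf{u}\right)^2, \tag{A16}$$

where we have neglected the ion thermal diffusivity compared to the electron one, but kept the ion heating term to maintain energy conservation. Equation (A16) together with Eqs. (A10-A12) constitutes the conventional one-fluid MHD system. With the dissipative terms [which are small because of Eq. (A15)] neglected, this was the starting point for our fluid derivation of RMHD in § 2. Note that the electrons in this regime are adiabatic because the electron thermal diffusion is small

$$\frac{\kappa_{\parallel e}k_\parallel^2}{\omega} \sim k_\parallel\lambda_{\rm mfpi}\sqrt{\beta_i}\left(\frac{m_i}{m_e}\right)^{1/2} \ll 1, \tag{A17}$$

$^{47}$ The structure of the momentum equation (A11) is best understood by realizing that $\rho\nu_{\parallel i}\left(\hat{\mathbf{b}}\hat{\mathbf{b}} : \nabla\mathbf{u} - \nabla\cdot\mathbf{u}/3\right) = p_\perp - p_\parallel$, the difference between the perpendicular and parallel (ion) pressures.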
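Since the total pressure is $p = (2/3)p_\perp + (1/3)p_\parallel$, Eq. (A11) can be written

$$\rho\frac{d\mathbf{u}}{dt} = -\nabla\left(p_\perp + \frac{B^2}{8\pi}\right) + \nabla\cdot\left[\hat{\mathbf{b}}\hat{\mathbf{b}}\left(p_\perp - p_\parallel\right)\right] + \frac{\mathbf{B}\cdot\nabla\mathbf{B}}{4\pi}. \tag{A9}$$

This is the general form of the momentum equation that is also valid for collisionless plasmas, when $k_\perp\rho_i \ll 1$ but $k_\parallel\lambda_{\rm mfpi}$ is order unity or even large. Equation (A9) together with the continuity equation (A10), the induction equation (A12) and a kinetic equation for the particle distribution function (from the solution of which $p_\perp$ and $p_\parallel$ are determined) form the system known as Kinetic MHD (KMHD, see Kulsrud 1964, 1983). The collisional limit, $k_\parallel\lambda_{\rm mfpi} \ll 1$, of KMHD is again Eqs. (A10-A14).

provided Eq. (A15) holds and $\beta_i$ is order unity. If we take $\beta_i \gg 1$ instead, we can still satisfy Eq. (A15), so $T_i = T_e$ follows from the ion temperature equation (A13) and the one-fluid equations emerge as an expansion in high $\beta_i$. However, these equations now describe two physical regimes: the adiabatic long-wavelength regime that satisfies Eq. (A17) and the shorter-wavelength regime in which $(m_e/m_i)^{1/2}/\sqrt{\beta_i} \ll k_\parallel\lambda_{\rm mfpi} \ll (m_e/m_i)^{1/2}\sqrt{\beta_i}$, so the fluid is isothermal, $T = T_0 = {\rm const}$, $p = [(1+Z)T_0/m_i]\rho = c_s^2\rho$ [Eq. (9) holds with $\gamma = 1$].

A.4. Two-Fluid Equations with Isothermal Electrons

Let us now consider the regime in which the coupling between the ion and electron temperatures is small and the electron diffusion is large [the limit opposite to Eqs. (A15) and (A17)]:

$$\frac{\omega}{\nu_{ie}} \sim \frac{k_\parallel\lambda_{\rm mfpi}}{\sqrt{\beta_i}}\left(\frac{m_i}{m_e}\right)^{1/2} \gg 1, \qquad \frac{\kappa_{\parallel e}k_\parallel^2}{\omega} \sim k_\parallel\lambda_{\rm mfpi}\sqrt{\beta_i}\left(\frac{m_i}{m_e}\right)^{1/2} \gg 1. \tag{A18}$$

Then the electrons are isothermal, $T_e = T_{0e} = {\rm const}$ (with the usual assumption of stochastic field lines, so $\hat{\mathbf{b}}\cdot\nabla T_e = 0$ implies $\nabla T_e = 0$, as in § 4.4), while the ion temperature satisfies

$$\frac{dT_i}{dt} = -\frac{2}{3}T_i\nabla\cdot\mathbf{u} + \frac{1}{\rho}\nabla\cdot\left(\hat{\mathbf{b}}\rho\kappa_{\parallel i}\hat{\mathbf{b}}\cdot\nabla T_i\right) + \frac{2}{3}m_i\nu_{\parallel i}\left(\hat{\mathbf{b}}\hat{\mathbf{b}} : \nabla\mathbf{u} - \frac{1}{3}\nabla\cdot\mathbf{u}\right)^2. \tag{A19}$$

Equation (A19) together with Eqs. (A10-A12) and $p = \rho(T_i + ZT_{0e})/m_i$ form a closed system that describes an MHD-like fluid of adiabatic ions and isothermal electrons.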
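The collisional regimes of §§ A.3 and A.4 are distinguished by two dimensionless ratios: the fluctuation frequency relative to the ion–electron collision rate, and the parallel electron heat diffusion rate relative to the frequency. The following sketch classifies a given pair $(k_\parallel\lambda_{\rm mfpi}, \beta_i)$ using order-of-magnitude estimates only. It assumes Alfvénic frequencies, $\omega \sim k_\parallel v_{thi}/\sqrt{\beta_i}$, together with $\nu_{ie} \sim (m_e/m_i)^{1/2}\nu_{ii}$ and $\kappa_{\parallel e} \sim (m_i/m_e)^{1/2}\kappa_{\parallel i}$, drops all factors of $Z$ and $\tau$, and treats "much less than" as "smaller by a factor `margin`". The function name, the regime labels and the default margin are hypothetical choices made for illustration.

```python
import math


def collisional_regime(k_par_lambda, beta_i, mass_ratio=1836.15267, margin=10.0):
    """Order-of-magnitude regime label for the fluid closures of Appendix A.

    k_par_lambda : k_parallel * lambda_mfpi (dimensionless)
    beta_i       : ion plasma beta
    mass_ratio   : m_i / m_e (default is hydrogen; an assumption)
    margin       : factor standing in for "much less/greater than" (an assumption)
    """
    # The fluid description itself requires k_parallel * lambda_mfpi << 1.
    if k_par_lambda * margin > 1.0:
        return "outside collisional ordering"

    root_m = math.sqrt(mass_ratio)
    # omega / nu_ie ~ (k_par lambda_mfpi / sqrt(beta_i)) * (m_i/m_e)^(1/2)
    freq_over_coupling = k_par_lambda * root_m / math.sqrt(beta_i)
    # kappa_par_e k_par^2 / omega ~ k_par lambda_mfpi * sqrt(beta_i) * (m_i/m_e)^(1/2)
    diffusion_over_freq = k_par_lambda * root_m * math.sqrt(beta_i)

    coupled = freq_over_coupling * margin < 1.0      # T_i = T_e is enforced
    decoupled = freq_over_coupling > margin          # T_i and T_e evolve separately
    adiabatic = diffusion_over_freq * margin < 1.0   # electron conduction negligible
    isothermal = diffusion_over_freq > margin        # electron conduction dominant

    if coupled and adiabatic:
        return "one-fluid MHD, adiabatic"
    if coupled and isothermal:
        return "one-fluid MHD, isothermal"
    if decoupled and isothermal:
        return "two-fluid, isothermal electrons"
    return "intermediate"
```

For example, `collisional_regime(1e-2, 1e4)` falls in the isothermal one-fluid window that opens at high $\beta_i$. At $\beta_i \sim 1$ with the real mass ratio, the window for the regime of § A.4 is narrow and only appears for a modest `margin`, as in `collisional_regime(0.2, 1.0, margin=3.0)`.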
Applying the ordering of § 2.1 to these equations and carrying out an expansion in k‖/k⊥ ≪ 1 entirely analogously to the way it was done in § 2, we arrive at the RMHD equations (17-18) for the Alfvén waves and the following system for the compressive fluctuations (slow and entropy modes): + b̂ ·∇u‖ = 0, (A20) − v2Ab̂ ·∇ = ν‖i b̂ ·∇ b̂ ·∇u‖ + , (A21) = κ‖ib̂ ·∇ b̂ ·∇ , (A22) and the pressure balance b̂ ·∇u‖ + . (A23) Recall that these equations, being the consequence of Braginskii’s two-fluid equations (§ A.1), are an expansion in k‖λmfpi ≪ 1 correct up to first order in this small parameter. Since the dissipative terms are small, we can replace (d/dt)δρ/ρ0 in the viscous terms of Eqs. (A21) and (A23) by its value computed from Eqs. (A20), (A22) and (A23) in neglect of dissipation: (d/dt)δρ/ρ0 = −b̂ · ∇u‖/(1 + c2s/v2A) [cf. Eq. (25)], where the speed of sound cs is defined by Eq. (166). Substituting this into Eqs. (A21) and (A23), we recover the collisional limit of KRMHD derived in Appendix D, see Eqs. (D18-D20) and (D22). B. COLLISIONS IN GYROKINETICS The general collision operator that appears in Eq. (36) is (Landau 1936) = 2π lnΛ q2s q fs′ (v ∂ fs(v) fs(v) ∂ fs′ (v′) , (B1) where w = v − v′ and lnΛ is the Coulomb logarithm. We now take into account the expansion of the distribution function (54), use the fact that the collision operator vanishes when it acts on a Maxwellian, and retain only first-order terms in the gyrokinetic expansion. This gives us the general form of the collision term in Eq. (57): it is the ring-averaged linearized form of the Landau collision operator (B1), (∂hs/∂t)c = 〈Cs[h]〉Rs , where Cs[h] = 2π lnΛ q2s q F0s′(v hs(v) − F0s(v) hs′(v . (B2) Note that the velocity derivatives are taken at constant r, i.e., the gyrocenter distribution functions that appear in the integrand should be understood as hs(v)≡ hs(t,r+v⊥× ẑ/Ωs,v⊥,v‖). 
The explicit form of the gyrokinetic collision operator can be derived in k space as follows: eik·Rhk eik·rCs e−ik·ρhk eik·Rs eik·ρs(v)Cs e−ik·ρhk , (B3) 52 SCHEKOCHIHIN ET AL. where ρs(v) = −v⊥× ẑ/Ωs and Rs = r−ρs(v). Angle brackets with no subscript refer to averages over the gyroangle ϑ of quantities that do not depend on spatial coordinates. Note that inside the operator Cs[. . .], h occurs both with index s and velocity v and with index s′ and velocity v′ (over which summation/integration is done). In the latter case, ρ = ρs′(v′) = −v′⊥× ẑ/Ωs′ in the exponential factor inside the operator. Most of the properties of the collision operator that are used in the main body of this paper to order the collision terms can be established in general, already on the basis of Eq. (B3) (§§ B.1-B.2). If the explicit form of the collision operator is required, we could, in principle, perform the ring average on the linearized operator C [Eq. (B2)] and derive an explicit form of (∂hs/∂t)c. In practice, in gyrokinetics, as in the rest of plasma physics, the full collision operator is only used when it is absolutely unavoidable. In most problems of interest, further simplifications are possible: the same-species collisions are often modeled by simpler operators that share the full collision operator’s conservation properties (§ B.3), while the interspecies collision operators are expanded in the electron–ion mass ratio (§ B.4). B.1. Velocity-Space Integral of the Gyrokinetic Collision Operator Many of our calculations involve integrating the gyrokinetic equation (57) over the velocity space while keeping r constant. Here we estimate the size of the integral of the collision term when k⊥ρs ≪ 1. Using Eq. (B3), d3veik·r−ik·ρs(v) eik·ρs(v)Cs e−ik·ρhk eik·r2π dv⊥ v⊥ e−ik·ρs(v) eik·ρs(v)Cs e−ik·ρhk eik·r e−ik·ρs(v) eik·ρs(v)Cs e−ik·ρhk eik·r d3vJ0(as)e ik·ρs(v)Cs e−ik·ρhk eik·r 1 − ik · v⊥× ẑ k · v⊥× ẑ + . . . e−ik·ρhk . 
(B4) Since the (linearized) collision operator Cs conserves particle number, the first term in the expansion vanishes. The operator Cs = Css +Css′ is a sum of the same-species collision operator [the s′ = s part of the sum in Eq. (B2)] and the interspecies collision operator (the s′ 6= s part). The former conserves total momentum of the particles of species s, so it gives no contribution to the second term in the expansion in Eq. (B4). Therefore, d3v〈〈Css[hs]〉Rs〉r ∼ νssk s δns. (B5) The interspecies collisions do contribute to the second term in Eq. (B4) due to momentum exchange with the species s′. This contribution is readily inferred from the standard formula for the linearized friction force (see, e.g., Helander & Sigmar 2002): d3vvCss′ e−ik·ρhk S (v)e −ik·ρs(v)hsk + ms′νs S (v)e −ik·ρs′ (v)hs′k , (B6) S (v) = 2πn0s′q s′ lnΛ (vths vths′ vths′ erf ′ vths′ , (B7) where erf(x) = (2/ dy exp(−y2) is the error function. From this, via a calculation of ring averages analogous to Eq. (B17), we get −ik · v⊥× ẑ e−ik·ρhk S (v) ik ·ρs(v)e−ik·ρs(v) hsk + S (v) ik ·ρs′(v)e−ik·ρs′ (v) S (v)asJ1(as)hsk + S (v)as′J1(as′)hs′k ∼ νss′k2⊥ρ2sδns + νs′sk2⊥ρ2s′δns′ . (B8) For the ion–electron collisions (s = i, s′ = e), using Eqs. (45) and (51), we find that both terms are ∼ (me/mi)1/2νiik2⊥ρ2i δni. Thus, besides an extra factor of k2⊥ρ i , the ion–electron collisions are also subdominant by one order in the mass-ratio expansion compared to the ion–ion collisions. The same estimate holds for the interspecies contributions to the third and fourth terms in Eq. (B4). In a similar fashion, the integral of the electron–ion collision operator (s = e, s′ = i), is ∼ νeik2⊥ρ2eδne, which is the same order as the integral of the electron–electron collisions. The conclusion of this section is that, both for ion and for electron collisions, the velocity-space integral (at constant r) of the gyrokinetic collision operator is higher order than the collision operator itself by two orders of k⊥ρs. 
This is the property that we relied on in neglecting collision terms in Eqs. (104) and (137). B.2. Ordering of Collision Terms in Eqs. (125) and (137) In § 5, we claimed that the contribution to the ion–ion collision term due to the (Ze〈ϕ〉Ri/T0i)F0i part of the ion distribution function [Eq. (124)] was one order of k⊥ρi smaller than the contributions from the rest of hi. This was used to order collision KINETIC TURBULENCE IN MAGNETIZED PLASMAS 53 terms in Eqs. (125) and (137). Indeed, from Eq. (B3), Ze〈ϕ〉Ri eik·Ri eik·ρiCii e−ik·ρi J0(ai)F0i ]〉 Zeϕk eik·Ri eik·ρiCii 1 − ik ·ρi − (k ·ρi)2 − + · · · ∼ νiik2⊥ρ2i F0i. (B9) This estimate holds because, as it is easy to ascertain using Eq. (B2), the operator Cii annihilates the first two terms in the expansion and only acts non-trivially on an expression that is second order in k⊥ρi. With the aid of Eq. (47), the desired ordering of the term (B9) in Eq. (125) follows. When Eq. (B9) is integrated over velocity space, the result picks up two extra orders in k⊥ρi [a general effect of integrating the gyroaveraged collision operator over the velocity space; see Eq. (B4)]: Ze〈ϕ〉Ri ∼ νiik4⊥ρ4i , (B10) so the resulting term in Eq. (137) is third order, as stated in § 5.3. B.3. Model Pitch-Angle-Scattering Operator for Same-Species Collisions A popular model operator for same-species collisions that conserves particle number, momentum, and energy is constructed by taking the test-particle pitch-angle-scattering operator and correcting it with an additional term that ensures momentum con- servation (Rosenbluth et al. 1972; see also Helander & Sigmar 2002): CM[hs] = ν D (v) 1 − ξ2 ) ∂hs 1 − ξ2 2v ·U[hs] v2ths , U[hs] = d3vvνssD (v)hs d3v (v/vths)2νssD (v)F0s(v) , (B11) νssD (v) = νss (vths v2ths erf ′ , νss = 2πn0sq s lnΛ , (B12) where the velocity derivatives are at constant r. The gyrokinetic version of this operator is (cf. 
Catto & Tsang 1977; Dimits & Cohen 1994) 〈CM[hs]〉Rs = eik·RsνssD (v) 1 − ξ2 ) ∂hsk v2(1 + ξ2) 4v2ths s hsk + 2 v⊥J1(as)U⊥[hsk] + v‖J0(as)U‖[hsk] v2ths , (B13) U⊥[hsk] = d3vv⊥J1(as)νssD (v)hsk(v⊥,v‖) d3v (v/vths)2νssD (v)F0s(v) , U‖[hsk] = d3vv‖J0(as)ν D (v)hsk(v⊥,v‖) d3v (v/vths)2νssD (v)F0s(v) where as = k⊥v⊥/Ωs. The velocity derivatives are now at constant Rs. The spatial diffusion term appearing in the ring-averaged collision operator is physically due to the fact that a change in a particle’s velocity resulting from a collision can lead to a change in the spatial position of its gyrocenter. In order to derive Eq. (B13), we use Eq. (B3). Since, ρs(v) = 1 − ξ2 sinϑ+ ŷv 1 − ξ2 cosϑ /Ωs, it is not hard to see e−ik·ρs(v)hsk = e −ik·ρs(v) 1 − ξ2 ik⊥ · v⊥× ẑ e−ik·ρs(v)hsk = e −ik·ρ(v) ik⊥ ·v⊥ hsk. (B14) Therefore, eik·ρs(v) 1 − ξ2 e−ik·ρs(v)hsk 1 − ξ2 ) ∂hsk k2⊥hsk, eik·ρs(v) e−ik·ρs(v)hsk 1 − ξ2 k2⊥hsk. (B15) Combining these formulae, we obtain the first two terms in Eq. (B13). Now let us work out the U term: eik·ρs(v)v · d3v′ v′νssD (v ′)e−ik·ρs(v ′)hsk v′⊥,v veik·ρs(v) dv′⊥ v dv′‖ν v′e−ik·ρs(v v′⊥,v (B16) Since ve±ik·ρs(v) = ẑv‖ e±ik·ρs(v) v⊥e±ik·ρs(v) , where e±ik·ρs(v) = J0(as) and ±ik·ρs(v) = ẑ× v⊥× ẑ ∓ik⊥ · v⊥× ẑ = ±iΩsẑ× ∓ik⊥ · v⊥× ẑ = ±i ẑ×k⊥ v⊥J1(as), (B17) we obtain the third term in Eq. (B13). 54 SCHEKOCHIHIN ET AL. It is useful to give the lowest-order form of the operator (B13) in the limit k⊥ρs ≪ 1: 〈CM[hs]〉Rs = ν D (v) 1 − ξ2 ) ∂hs d3v′v′ νssD (v ′)hs(v d3v′v′2νssD (v ′)F0s(v′) + O(k2⊥ρ s ). (B18) This is the operator that can be used in the right-hand side of Eq. (145) (as, e.g., is done in the calculation of collisional transport terms in Appendix D.3). In practical numerical computations of gyrokinetic turbulence, the pitch-angle scattering operator is not sufficient because the distribution function develops small scales not only in ξ but also in v (M. Barnes, W. Dorland and T. Tatsuno 2006, unpublished). 
This is, indeed, expected because the phase-space entropy cascade produces small scales in v⊥, rather than just in ξ (see § 7.9.1). In order to provide a cut off in v, an energy-diffusion operator must be added to the pitch-angle-scattering operator derived above. A numerically tractable model gyrokinetic energy-diffusion operator was proposed by Abel et al. (2008); Barnes et al. (2009).48 B.4. Electron–Ion Collision Operator This operator can be expanded in me/mi and to the lowest order is (see, e.g., Helander & Sigmar 2002) Cei[h] = ν D (v) 1 − ξ2 ) ∂he 1 − ξ2 2v ·ui v2the , νeiD (v) = νei (vthe . (B19) The corrections to this form are O(me/mi). This is second order in the expansion of § 4 and, therefore, we need not keep these corrections. The operator (B19) is mathematically similar to the model operator for the same-species collisions [Eq. (B13)]. The gyrokinetic version of this operator is derived in the way analogous to the calculation in Appendix B.3. The result is 〈Cei[h]〉Re = eik·ReνeiD (v) 1 − ξ2 ) ∂hek v2(1 + ξ2) 4v2the v2the J1(ae) 2v′2⊥ v2thi hik + 2v‖J0(ae)u‖ki v2the . (B20) At scales not too close to the electron gyroscale, namely, such that k⊥ρe ∼ (me/mi)1/2, the second and third terms are manifestly second order in (me/mi) 1/2, so have to be neglected along with other O(me/mi) contributions to the electron–ion collisions. 49 The remaining two terms are first order in the mass-ratio expansion: the first term vanishes for he = h e [Eq. (101)], so its contribution is first order; in the fourth term, we can use Eq. (87) to express u‖i in terms of quantities that are also first order. Keeping only the first-order terms, the gyrokinetic electron–ion collision operator is 〈Cei[h]〉Re = ν D (v) 1 − ξ2 ) ∂h(1)e 2v‖u‖i v2the . 
(B21)

Note that the ion drag term is essential to represent the ion–electron friction correctly and, therefore, to capture the Ohmic resistivity (which, however, is rarely more important for unfreezing flux than the electron inertia and the finiteness of the electron gyroradius; see § 7.7).

C. A HEURISTIC DERIVATION OF THE ELECTRON EQUATIONS

Here we show how the equations (116-117) of § 4 and the ERMHD equations (226-227) of § 7 can be derived heuristically from electron fluid dynamics and a number of physical assumptions, without the use of gyrokinetics (§ C.1). This derivation is not rigorous. Its role is to provide an intuitive route to the isothermal electron fluid and ERMHD approximations.

C.1. Derivation of Eqs. (116-117)

We start with the following three equations:

$$\frac{\partial\mathbf{B}}{\partial t} = -c\nabla\times\mathbf{E}, \qquad \frac{\partial n_e}{\partial t} + \nabla\cdot(n_e\mathbf{u}_e) = 0, \qquad \mathbf{E} + \frac{\mathbf{u}_e\times\mathbf{B}}{c} = -\frac{\nabla p_e}{e n_e}. \tag{C1}$$

These are Faraday’s law, the electron continuity equation, and the generalized Ohm’s law, which is the electron momentum equation with all electron inertia terms neglected (i.e., effectively, the lowest order in the expansion in the electron mass $m_e$). The electron pressure is assumed to be scalar by fiat (this can be justified in certain limits: for example in the collisional limit, as in Appendix A, or for the isothermal electron fluid approximation derived in § 4). The electron-pressure term in the right-hand side of Ohm’s law is sometimes called the thermoelectric term.

We now assume the same static uniform equilibrium, $\mathbf{E}_0 = 0$, $\mathbf{B}_0 = B_0\hat{\mathbf{z}}$, that we have used throughout this paper and apply to Eqs. (C1) the fundamental ordering discussed in § 3.1.

$^{48}$ The collision operator now used in the GS2 and AstroGK codes (see footnote 46) is their energy-diffusion operator plus the pitch-angle-scattering operator (B13).

$^{49}$ The third term in Eq. (B20) is, in fact, never important: at the electron scales, $k_\perp\rho_e \sim 1$, it is negligible because of the Bessel function in the velocity integral (Abel et al. 2008).
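First consider the projection of Ohm’s law onto the total magnetic field $\mathbf{B}$, use the definition of $\mathbf{E}$ [Eq. (37)], and keep the leading-order terms in the $\epsilon$ expansion:

$$\mathbf{E}\cdot\hat{\mathbf{b}} = -\frac{1}{e n_e}\hat{\mathbf{b}}\cdot\nabla p_e \quad\Rightarrow\quad \frac{1}{c}\frac{\partial A_\parallel}{\partial t} + \hat{\mathbf{b}}\cdot\nabla\varphi = \hat{\mathbf{b}}\cdot\nabla\frac{\delta p_e}{e n_{0e}}. \tag{C2}$$

This turns into Eq. (116) if we also assume isothermal electrons, $\delta p_e = T_{0e}\delta n_e$ [see Eq. (103)]. With the aid of Ohm’s law, Faraday’s law turns into

$$\frac{\partial\mathbf{B}}{\partial t} = \nabla\times(\mathbf{u}_e\times\mathbf{B}) = -\mathbf{u}_e\cdot\nabla\mathbf{B} + \mathbf{B}\cdot\nabla\mathbf{u}_e - \mathbf{B}\nabla\cdot\mathbf{u}_e. \tag{C3}$$

Keeping the leading-order terms, we find, for the components of Eq. (C3) perpendicular and parallel to the mean field,

$$\left(\frac{\partial}{\partial t} + \mathbf{u}_{\perp e}\cdot\nabla_\perp\right)\frac{\delta\mathbf{B}_\perp}{B_0} = \hat{\mathbf{b}}\cdot\nabla\mathbf{u}_{\perp e}, \qquad \left(\frac{\partial}{\partial t} + \mathbf{u}_{\perp e}\cdot\nabla_\perp\right)\left(\frac{\delta B_\parallel}{B_0} - \frac{\delta n_e}{n_{0e}}\right) = \hat{\mathbf{b}}\cdot\nabla u_{\parallel e}. \tag{C4}$$

In the last equation, we have used the electron continuity equation to write

$$\nabla\cdot\mathbf{u}_e = -\left(\frac{\partial}{\partial t} + \mathbf{u}_{\perp e}\cdot\nabla_\perp\right)\frac{\delta n_e}{n_{0e}}. \tag{C5}$$

From Ohm’s law, we have, to lowest order,

$$\mathbf{u}_{\perp e} = -\hat{\mathbf{z}}\times\frac{c}{B_0}\left(\mathbf{E}_\perp + \frac{\nabla_\perp\delta p_e}{e n_{0e}}\right) = \hat{\mathbf{z}}\times\nabla_\perp\frac{c}{B_0}\left(\varphi - \frac{\delta p_e}{e n_{0e}}\right). \tag{C6}$$

Using this expression in the second of the equations (C4) gives

$$\frac{d}{dt}\left(\frac{\delta B_\parallel}{B_0} - \frac{\delta n_e}{n_{0e}}\right) - \hat{\mathbf{b}}\cdot\nabla u_{\parallel e} = \frac{c}{e n_{0e}B_0}\left(\hat{\mathbf{z}}\times\nabla_\perp\delta p_e\right)\cdot\nabla_\perp\frac{\delta B_\parallel}{B_0} - \frac{c}{e n_{0e}B_0}\left(\hat{\mathbf{z}}\times\nabla_\perp\delta p_e\right)\cdot\nabla_\perp\frac{\delta n_e}{n_{0e}}, \tag{C7}$$

where $d/dt$ is defined in the usual way [Eq. (122)]. Assuming isothermal electrons ($\delta p_e = T_{0e}\delta n_e$) annihilates the second term on the right-hand side and turns the above equation into Eq. (117). As for the first of the equations (C4), the use of Eq. (C6) and substitution of $\delta\mathbf{B}_\perp = -\hat{\mathbf{z}}\times\nabla_\perp A_\parallel$ turns it into the previously derived Eq. (C2), whence follows Eq. (116).

Thus, we have shown that Eqs. (116-117) can be derived as a direct consequence of Faraday’s law, electron fluid dynamics (electron continuity equation and the electron force balance, also known as the generalized Ohm’s law), and the assumption of isothermal electrons, all taken to the leading order in the gyrokinetic ordering given in § 3.1 (i.e., assuming strongly interacting anisotropic fluctuations with $k_\parallel \ll k_\perp$).

We have just proved that Eqs. (116) and (117) are simply the perpendicular and parallel part, respectively, of Eq. (C3). The latter equation means that the magnetic-field lines are frozen into the electron flow velocity $\mathbf{u}_e$, i.e., the flux is conserved, the result formally proven in § 4.3 [see Eq. (99)].

C.2.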
Electron MHD and the Derivation of Eqs. (226-227)

One route to Eqs. (226-227), already explained in § 7.2, is to start with Eqs. (C2) and (C7) and assume Boltzmann electrons and ions and the total pressure balance. Another approach, more standard in the literature on Hall and Electron MHD, is to start with Eq. (C3), which states that the magnetic field is frozen into the electron flow. The electron velocity can be written in terms of the ion velocity and the current density, and the latter then related to the magnetic field via Ampère’s law:

$$\mathbf{u}_e = \mathbf{u}_i - \frac{\mathbf{j}}{e n_e} = \mathbf{u}_i - \frac{c}{4\pi e n_e}\nabla\times\mathbf{B}. \tag{C8}$$

To the leading order in $\epsilon$, the perpendicular and parallel parts of Eq. (C3) are Eqs. (C4), respectively, where the perpendicular and parallel electron velocities are [from Eq. (C8)]

$$\mathbf{u}_{\perp e} = \mathbf{u}_{\perp i} + \frac{c}{4\pi e n_{0e}}\hat{\mathbf{z}}\times\nabla_\perp\delta B_\parallel, \qquad u_{\parallel e} = u_{\parallel i} + \frac{c}{4\pi e n_{0e}}\nabla_\perp^2 A_\parallel. \tag{C9}$$

The relative size of the two terms in each of these expressions is controlled by the size of $k_\perp d_i$, where $d_i = \rho_i/\sqrt{\beta_i}$ is the ion inertial scale. When $k_\perp d_i \gg 1$, we may set $\mathbf{u}_i = 0$. Note, however, that the ion motion is not totally neglected: indeed, in the second of the equations (C4), the $\delta n_e/n_{0e}$ term comes, via Eq. (C5), from the divergence of the ion velocity [from Eq. (C8), $\nabla\cdot\mathbf{u}_i = \nabla\cdot\mathbf{u}_e$]. To complete the derivation, we relate $\delta n_e$ to $\delta B_\parallel$ via the assumption of total pressure balance, as explained in § 7.2, giving us Eq. (225). Substituting this equation and Eqs. (C9) into Eqs. (C4), we obtain

$$\frac{\partial\Psi}{\partial t} = v_A^2 d_i\,\hat{\mathbf{b}}\cdot\nabla\frac{\delta B_\parallel}{B_0}, \qquad \frac{\partial}{\partial t}\frac{\delta B_\parallel}{B_0} = -\frac{d_i}{1 + 2/[\beta_i(1 + Z/\tau)]}\,\hat{\mathbf{b}}\cdot\nabla\nabla_\perp^2\Psi, \tag{C10}$$

where $\Psi = -A_\parallel/\sqrt{4\pi m_i n_{0i}}$. Equations (C10) evolve the perturbed magnetic field. These equations become the ERMHD equations (226-227) if $\delta B_\parallel/B_0$ is expressed in terms of the scalar potential via Eq. (223).

Note that there are two special limits in which the assumption of immobile ions suffices to derive Eqs. (C10) from Eq. (C3) without the need for the pressure balance: $\beta_i \gg 1$ (incompressible ions) or $\tau = T_{0i}/T_{0e} \ll 1$ (cold ions) but $\beta_e = \beta_i Z/\tau \gg 1$. In both cases, Eq.
(225) shows that $\delta n_e/n_{0e} \ll \delta B_\parallel/B_0$, so the density perturbation can be ignored and the coefficient of the right-hand side of the second of the equations (C10) is equal to 1. The limit of cold ions is discussed further in Appendix E.

D. FLUID LIMIT OF THE KINETIC RMHD

Taking the fluid (collisional) limit of the KRMHD system (summarized in § 5.7) means carrying out another subsidiary expansion, this time in $k_\parallel\lambda_{mfpi} \ll 1$. The expansion only affects the equations for the density and magnetic-field-strength fluctuations (§ 5.5) because the Alfvén waves are indifferent to collisional effects.

The calculation presented below follows a standard perturbation algorithm used in the kinetic theory of gases and in plasma physics to derive fluid equations with collisional transport coefficients (Chapman & Cowling 1970). For magnetized plasma, this calculation was carried out in full generality by Braginskii (1965), whose starting point was the full plasma kinetic theory [Eqs. (36-39)]. While what we do below is, strictly speaking, merely a particular case of his calculation (see Appendix A), it has the advantage of relative simplicity and also serves to show how the fluid limit is recovered from the gyrokinetic formalism, a demonstration that we believe to be of value.

It will be convenient to use the KRMHD system written in terms of the function $\delta\tilde f_i = g + (v_\perp^2/v_{thi}^2)(\delta B_\parallel/B_0)F_{0i}$, which is the perturbation of the local Maxwellian in the frame of the Alfvén waves [Eqs. (150-152)]. We want to expand Eq. (150) in powers of $k_\parallel\lambda_{mfpi}$, so we let $\delta\tilde f_i = \delta\tilde f_i^{(0)} + \delta\tilde f_i^{(1)} + \dots$, $\delta B_\parallel = \delta B_\parallel^{(0)} + \delta B_\parallel^{(1)} + \dots$, etc.

D.1. Zeroth Order: Ideal Fluid Equations

Since [see Eq. (49)]

$$\frac{\omega}{\nu_{ii}} \sim \frac{k_\parallel\lambda_{mfpi}}{\sqrt{\beta_i}}, \qquad \frac{k_\parallel v_{thi}}{\nu_{ii}} \sim k_\parallel\lambda_{mfpi}, \tag{D1}$$

to zeroth order Eq. (150) becomes $\langle C_{ii}[\delta\tilde f_i^{(0)}]\rangle_{\mathbf{R}_i} = 0$. The zero mode of the collision operator is a Maxwellian. Therefore, we may write the full ion distribution function up to zeroth order in $k_\parallel\lambda_{mfpi}$ as follows [see Eq.
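(144)]

$$f_i = \frac{n_i}{(2\pi T_i/m_i)^{3/2}}\exp\left\{-\frac{m_i[(\mathbf{v}_\perp - \mathbf{u}_E)^2 + (v_\parallel - u_\parallel)^2]}{2T_i}\right\}, \tag{D2}$$

where $n_i = n_{0i} + \delta n_i$ and $T_i = T_{0i} + \delta T_i$ include both the unperturbed quantities and their perturbations. The $\mathbf{E}\times\mathbf{B}$ drift velocity $\mathbf{u}_E$ comes from the Alfvén waves (see § 5.4) and does not concern us here. Since the perturbations $\delta n_i$, $u_\parallel$ and $\delta T_i$ are small in the original gyrokinetic expansion, Eq. (D2) is equivalent to

$$\delta\tilde f_i^{(0)} = \left[\frac{\delta n_e^{(0)}}{n_{0e}} + \left(\frac{v^2}{v_{thi}^2} - \frac{3}{2}\right)\frac{\delta T_i^{(0)}}{T_{0i}} + \frac{2v_\parallel u_\parallel^{(0)}}{v_{thi}^2}\right]F_{0i}, \tag{D3}$$

where we have used quasi-neutrality to replace $\delta n_i/n_{0i} = \delta n_e/n_{0e}$. This automatically satisfies Eq. (151), while Eq. (152) gives us an expression for the ion-temperature perturbation:

$$\frac{\delta T_i^{(0)}}{T_{0i}} = -\left(1 + \frac{Z}{\tau}\right)\frac{\delta n_e^{(0)}}{n_{0e}} - \frac{2}{\beta_i}\frac{\delta B_\parallel^{(0)}}{B_0}. \tag{D4}$$

Note that this is consistent with the interpretation of the perpendicular Ampère’s law [Eq. (63), which is the progenitor of Eq. (152)] as the pressure balance [see Eq. (67)]: indeed, recalling that the electron pressure perturbation is $\delta p_e = T_{0e}\delta n_e$ [Eq. (103)], we have

$$\frac{B_0\delta B_\parallel}{4\pi} = -\delta p_e - \delta p_i = -\delta n_eT_{0e} - \delta n_iT_{0i} - n_{0i}\delta T_i, \tag{D5}$$

whence follows Eq. (D4) by way of quasi-neutrality ($Zn_i = n_e$) and the definitions of $Z$, $\tau$, $\beta_i$ [Eqs. (40-42)].

Since the collision operator conserves particle number, momentum and energy, we can obtain evolution equations for $\delta n_e^{(0)}/n_{0e}$, $u_\parallel^{(0)}$ and $\delta B_\parallel^{(0)}/B_0$ by multiplying Eq. (150) by 1, $v_\parallel$, $v^2/v_{thi}^2$, respectively, and integrating over the velocity space. The three moments that emerge this way are

$$\frac{1}{n_{0i}}\int d^3\mathbf{v}\,\delta\tilde f_i^{(0)} = \frac{\delta n_e^{(0)}}{n_{0e}}, \qquad \frac{1}{n_{0i}}\int d^3\mathbf{v}\,v_\parallel\,\delta\tilde f_i^{(0)} = u_\parallel^{(0)}, \qquad \frac{1}{n_{0i}}\int d^3\mathbf{v}\,\frac{v^2}{v_{thi}^2}\,\delta\tilde f_i^{(0)} = \frac{3}{2}\left(\frac{\delta n_e^{(0)}}{n_{0e}} + \frac{\delta T_i^{(0)}}{T_{0i}}\right). \tag{D6}$$

The three evolution equations for these moments are

$$\frac{d}{dt}\left(\frac{\delta n_e^{(0)}}{n_{0e}} - \frac{\delta B_\parallel^{(0)}}{B_0}\right) + \hat{\mathbf{b}}\cdot\nabla u_\parallel^{(0)} = 0, \tag{D7}$$

$$\frac{du_\parallel^{(0)}}{dt} - v_A^2\,\hat{\mathbf{b}}\cdot\nabla\frac{\delta B_\parallel^{(0)}}{B_0} = 0, \tag{D8}$$

$$\frac{d}{dt}\left[\frac{3}{2}\left(\frac{\delta n_e^{(0)}}{n_{0e}} + \frac{\delta T_i^{(0)}}{T_{0i}}\right) - \frac{5}{2}\frac{\delta B_\parallel^{(0)}}{B_0}\right] + \frac{5}{2}\,\hat{\mathbf{b}}\cdot\nabla u_\parallel^{(0)} = 0. \tag{D9}$$

These allow us to recover the fluid equations we derived in § 2.4: Eq. (D8) is the parallel component of the MHD momentum equation (27); combining Eqs. (D7), (D9) and (D4), we obtain the continuity equation and the parallel component of the induction equation, which are the same as Eqs.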
(25) and (26): δn(0)e 1 + c2s/v b̂ ·∇u(0) δB(0) 1 + v2A/c2s b̂ ·∇u(0) , (D10) where the sound speed cs is defined by Eq. (166). From Eqs. (D7) and (D9), we also find the analog of the entropy equation (23): δT (0)i δn(0)e δs(0) δs(0) δT (0)i δn(0)e δn(0)e δB(0) . (D11) This implies that the temperature changes due to compressional heating only. D.2. Generalized Energy: Five RMHD Cascades Recovered We now calculate the generalized energy by substituting δ f̃i from Eq. (D3) into Eq. (153) and using Eqs. (D4) and (D11): min0iu min0iu n0iT0i 1 + Z/τ 5/3 + Z/τ =W +AW +W AW +W sw +W n0iT0i 1 + Z/τ 5/3 + Z/τ Ws. (D12) The first two terms are the Alfvén-wave energy [Eq. (154)]. The following two terms are the slow-wave energy, which splits into the independently cascaded energies of “+” and “−” waves (see § 2.5): WSW = W sw +W min0i |z+‖| 2 + |z−‖| . (D13) The last term is the total variance of the entropy mode. Thus, we have recovered the five cascades of the RMHD system (§ 2.7; Fig. 5 maps out the fate of these cascades at kinetic scales). D.3. First Order: Collisional Transport Now let us compute the collisional transport terms for the equations derived above. In order to do this, we have to determine the first-order perturbed distribution function δ f̃ (1)i , which satisfies [see Eq. (150)] δ f̃ (1)i δ f̃ (0)i − v2thi δB(0) + v‖ b̂ ·∇ δ f̃ (0)i + δn(0)e . (D14) We now use Eq. (D3) to substitute for δ f̃ (0)i and Eqs. (D10-D11) and (D8) to compute the time derivatives. Equation (D14) becomes δ f̃ (1)i 1 − 3ξ2 v2thi 2/3 + c2s/v 1 + c2s/v b̂ ·∇u(0) v2thi b̂ ·∇δT F0i(v), (D15) where ξ = v‖/v. Note that the right-hand side gives zero when multiplied by 1, v‖ or v 2 and integrated over the velocity space, as it must do because the collision operator in the left-hand side conserves particle number, momentum and energy. Solving Eq. (D15) requires inverting the collision operator. 
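The vanishing of these moments can be checked directly. The sketch below is an illustration rather than part of the original derivation. It assumes that the velocity dependence on the right-hand side of Eq. (D15) is $(1 - 3\xi^2)\,v^2/v_{thi}^2$ for the term proportional to $\hat{\mathbf{b}}\cdot\nabla u_\parallel^{(0)}$ and $v_\parallel(v^2/v_{thi}^2 - 5/2)$ for the term proportional to $\hat{\mathbf{b}}\cdot\nabla\delta T_i^{(0)}$, both multiplied by a Maxwellian $F_{0i} \propto \exp(-v^2/v_{thi}^2)$. With $x = v/v_{thi}$ and $v_\parallel = v\xi$, every integrand is a polynomial in $x$ and $\xi$, so the integrals reduce to Gamma functions. All function and variable names are hypothetical.

```python
import math

def monomial_integral(p, q):
    """Velocity-space integral of x**p * xi**q * exp(-x**2), up to a constant factor.

    Radial part:  int_0^inf x**(p+2) exp(-x**2) dx = Gamma((p+3)/2) / 2
    Angular part: int_{-1}^{1} xi**q dxi = 2/(q+1) for even q, 0 for odd q
    """
    radial = 0.5 * math.gamma((p + 3) / 2.0)
    angular = 0.0 if q % 2 else 2.0 / (q + 1)
    return radial * angular

def multiply(a, b):
    """Multiply two polynomials stored as {(power_of_x, power_of_xi): coefficient}."""
    out = {}
    for (pa, qa), ca in a.items():
        for (pb, qb), cb in b.items():
            key = (pa + pb, qa + qb)
            out[key] = out.get(key, 0.0) + ca * cb
    return out

def integrate(poly):
    """Integrate a polynomial in (x, xi) against the Maxwellian weight."""
    return sum(c * monomial_integral(p, q) for (p, q), c in poly.items())

# Assumed velocity dependence of the two terms in Eq. (D15), in units of v_thi:
FLOW_TERM = {(2, 0): 1.0, (2, 2): -3.0}   # (1 - 3 xi^2) x^2
HEAT_TERM = {(3, 1): 1.0, (1, 1): -2.5}   # x xi (x^2 - 5/2)

# Weights for particle number, parallel momentum and energy:
WEIGHTS = {"1": {(0, 0): 1.0}, "v_par": {(1, 1): 1.0}, "v^2": {(2, 0): 1.0}}

def conservation_moments():
    """Return all six moments; each should vanish to rounding error."""
    terms = (("flow", FLOW_TERM), ("heat", HEAT_TERM))
    return {
        (name, label): integrate(multiply(weight, term))
        for name, weight in WEIGHTS.items()
        for label, term in terms
    }
```

Calling `conservation_moments()` returns six numbers, one per combination of weight and term. Under the stated assumptions they are all zero to floating-point accuracy, which is the solvability condition for inverting the collision operator.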
While this can be done for the general Landau collision operator (see Braginskii 1965), for our purposes, it is sufficient to use the model operator given in Appendix B.3, Eq. (B18). This simplifies calculations at the expense of an order-one inaccuracy in the numerical values of the transport coefficients. As the exact value of these coefficients will never be crucial for us, this is an acceptable loss of precision. Inverting the collision operator in Eq. (D15) then gives δ f̃ (1)i = ν iiD(v) 1 − 3ξ2 v2thi 2/3 + c2s/v 1 + c2s/v b̂ ·∇u(0) v2thi b̂ ·∇δT F0i(v), (D16) 58 SCHEKOCHIHIN ET AL. where ν iiD(v) is a collision frequency defined in Eq. (B12) and we have chosen the constants of integration in such a way that the three conservation laws are respected: d3vδ f̃ (1)i = 0, d3vv‖δ f̃ i = 0, d3vv2δ f̃ (1)i = 0. These relations mean that δn e = 0, = 0, δT (1)i = 0 and that, in view of Eq. (152), we have δB(1) 2/3 + c2s/v 1 + c2s/v ν‖ib̂ ·∇u‖, (D17) where ν‖i is defined below [Eq. (D21)]. Equations (D16-D17) are now used to calculate the first-order corrections to the moment equations (D7-D9). They become + b̂ ·∇u‖ = 0, (D18) − v2Ab̂ ·∇ 2/3 + c2s/v 1 + c2s/v ν‖i b̂ ·∇ b̂ ·∇u‖ , (D19) = κ‖ib̂ ·∇ b̂ ·∇δTi , (D20) where we have introduced the coefficients of parallel viscosity and parallel thermal diffusivity: ν‖i = ν iiD(v)v F0i(v), κ‖i = ν iiD(v)v v2thi F0i(v). (D21) All perturbed quantities are now accurate up to first order in k‖λmfpi. Note that in Eq. (D19), we used Eq. (D17) to express δB(0) = δB‖ − δB . We do the same in Eq. (D4) and obtain 2/3 + c2s/v 1 + c2s/v ν‖ib̂ ·∇u‖ . (D22) This equation completes the system (D18-D20), which allows us to determine δne, u‖, δTi and δB‖. In § 6.1, we use the equations derived above, but absorb the prefactor (2/3 + c2s/v A)/(1 + c A) into the definition of ν‖i. 
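For orientation, the size of the prefactor $(2/3 + c_s^2/v_A^2)/(1 + c_s^2/v_A^2)$ can be evaluated for given plasma parameters. The short sketch below is illustrative only. It assumes the sound speed satisfies $c_s^2/v_A^2 = \beta_i(Z/\tau + 5/3)/2$, which is the form implied by Eqs. (D4), (D7) and (D9) above; the function names are hypothetical.

```python
def sound_to_alfven_sq(beta_i, z_over_tau):
    """Assumed ratio c_s^2 / v_A^2 = beta_i * (Z/tau + 5/3) / 2."""
    return 0.5 * beta_i * (z_over_tau + 5.0 / 3.0)

def viscosity_prefactor(beta_i, z_over_tau):
    """Prefactor (2/3 + c_s^2/v_A^2) / (1 + c_s^2/v_A^2) multiplying the parallel viscosity."""
    ratio = sound_to_alfven_sq(beta_i, z_over_tau)
    return (2.0 / 3.0 + ratio) / (1.0 + ratio)
```

Under this assumption the prefactor stays between 2/3 (low beta) and 1 (high beta), so absorbing it into the definition of the parallel viscosity changes that coefficient by an order-unity factor at most.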
The same system of equations can also be derived from Braginskii’s two-fluid theory (Appendix A.4), from which we can borrow the quantitatively correct values of the viscosity and ion thermal diffusivity: $\nu_{\parallel i} = 0.90\,v_{thi}^2/\nu_{ii}$, $\kappa_{\parallel i} = 2.45\,v_{thi}^2/\nu_{ii}$, where $\nu_{ii}$ is defined in Eq. (52).

E. HALL REDUCED MHD

The popular Hall MHD approximation consists in assuming that the magnetic field is frozen into the electron flow velocity [Eq. (C3)]. The latter is calculated from the ion flow velocity and the current determined by Ampère’s law [Eq. (C8)]:

$$\frac{\partial\mathbf{B}}{\partial t} = \nabla\times(\mathbf{u}_e\times\mathbf{B}), \qquad \mathbf{u}_e = \mathbf{u}_i - \frac{c}{4\pi en_{0e}}\nabla\times\mathbf{B}, \tag{E1}$$

where the ion flow velocity $\mathbf{u}_i$ satisfies the conventional MHD momentum equation (8). The Hall MHD is an appealing theoretical model that appears to capture both the MHD behavior at long wavelengths (when $\mathbf{u}_e \simeq \mathbf{u}_i$) and some of the kinetic effects that become important at small scales due to decoupling between the electron and ion flows (the appearance of dispersive waves) without bringing in the full complexity of the kinetic theory. However, unlike the kinetic theory, it completely ignores the collisionless damping effects and suggests that the key small-scale physical change is associated with the ion inertial scale $d_i = \rho_i/\sqrt{\beta_i}$ (or, when $\beta_e \ll 1$, the ion sound scale $\rho_s = \rho_i\sqrt{Z/2\tau}$; see § E.3), rather than the ion gyroscale $\rho_i$. Is this an acceptable model for plasma turbulence? Figure 8 illustrates the fact that at $\tau \sim 1$, the ion inertial scale does not play a special role linearly, the MHD Alfvén wave becomes dispersive at the ion gyroscale, not at $d_i$, and that the collisionless damping cannot in general be neglected. A detailed comparison of the Hall MHD linear dispersion relation with the full hot-plasma dispersion relation leads to the conclusion that Hall MHD is only a valid approximation in the limit of cold ions, namely, $\tau = T_{0i}/T_{0e} \ll 1$ (Ito et al. 2004; Hirose et al. 2004).
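To make the ordering of scales concrete, the following sketch evaluates the ion inertial scale and the ion sound scale in units of the ion gyroscale, using only the relations quoted above, $d_i = \rho_i/\sqrt{\beta_i}$ and $\rho_s = \rho_i\sqrt{Z/2\tau}$, together with $\beta_e = \beta_iZ/\tau$. It is an illustration only; the function name and the sample parameter values are hypothetical.

```python
import math

def hall_scales(beta_i, tau, Z=1.0):
    """Return (d_i / rho_i, rho_s / rho_i, beta_e) for ion beta, temperature ratio tau = T_0i/T_0e and ion charge Z."""
    beta_e = beta_i * Z / tau               # electron beta
    d_i = 1.0 / math.sqrt(beta_i)           # ion inertial scale over rho_i
    rho_s = math.sqrt(Z / (2.0 * tau))      # ion sound scale over rho_i
    return d_i, rho_s, beta_e

# Example: cold ions at low beta (beta_i ~ tau << 1).
d_i, rho_s, beta_e = hall_scales(beta_i=0.01, tau=0.02)
```

For these sample values both scales exceed the ion gyroscale, and the two definitions of the sound scale agree, since $\rho_s = d_i\sqrt{\beta_e/2}$.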
In this Appendix, we show that a reduced (low-frequency, anisotropic) version of Hall MHD can, indeed, be derived from gyrokinetics in the limit τ ≪ 1.50 This demonstrates that the Hall MHD model fits into the theoretical framework proposed in this paper as a special limit. However, the parameter regime that gives rise to this special limit is not common in space and astrophysical plasmas of interest. E.1. Gyrokinetic Derivation of Hall Reduced MHD Let us start with the equations of isothermal electron fluid, Eqs. (116-121), i.e., work within the assumptions that allowed us to carry out the mass-ratio expansion (§ 4.8). In Eq. (120) (perpendicular Ampère’s law, or gyrokinetic pressure balance), taking 50 Note that, strictly speaking, our ordering of the collision frequency does not allow us to take this limit (see footnote 17), but this is a minor betrayal of rigor, which does not, in fact, invalidate the results. KINETIC TURBULENCE IN MAGNETIZED PLASMAS 59 the limit τ ≪ 1 gives eik·r d3vJ0(ai)hik , (E2) where we have used Eq. (118) to express the hi integral and the expression for the electron beta βe = βiZ/τ . Note that the above equation is simply the statement of a balance between the magnetic and electron thermal pressure (the ions are relatively cold, so they have fallen out of the pressure balance). Using Eq. (E2) to express δne in terms of δB‖ in Eqs. (116) and (117) and also substituting for u‖e from Eq. (119) [or, equivalently, Eq. (87)], we get = vAb̂ ·∇ Φ+ vAdi 1 + 2/βe b̂ ·∇ u‖i − di∇2⊥Ψ , (E3) where we have used our usual definitions of the stream and flux functions [Eq. (135)] and of the full derivatives [Eq. (160)]. These equations determine the evolution of the magnetic field, but we still need the ion gyrokinetic equation (121) to calculate the ion motion (Φ = cϕ/B0 and u‖i) via Eqs. (118) and (88). There are two limits in which the ion kinetics can be reduced to simple fluid models. E.1.1. 
High-Ion-Beta Limit, βi ≫ 1 In this limit, k⊥ρi = k⊥di βi ≫ 1 as long as k⊥di is not small. Then the ion motion can be neglected because it is averaged out by the Bessel functions in Eqs. (118) and (88)—in the same way as in § 7.2. So we get Φ = (τ/Z)vAdiδB‖/B0 [using Eq. (E2); this is the τ ≪ 1 limit of Eq. (223)] and u‖i = 0. Noting that βe = βiZ/τ ≫ 1 in this limit, we find that Eqs. (E3) reduce to = v2Adi b̂ ·∇ = −di b̂ ·∇∇2⊥Ψ, (E4) which is the τ ≪ 1 limit of our ERMHD equations (226-227) [or, equivalently, Eqs. (C10)]. E.1.2. Low-Ion-Beta Limit, βi ∼ τ ≪ 1 (the Hall Limit) This limit is similar to the RMHD limit worked out in § 5: we take, for now, k⊥di ∼ 1 and βe ∼ 1 (in which subsidiary expansions can be carried out later), and expand the ion gyrokinetics in k⊥ρi = k⊥di βi ≪ 1. Note that ordering βe ∼ 1 means that we have ordered βi ∼ τ ≪ 1. We now proceed analogously to the way we did in § 5: express the ion distribution in terms of the g function defined by Eq. (124) and, using the relation (E2) between δB‖/B0 and δne/n0e, write Eqs. (125-127) as follows: ︸ ︷︷ ︸ ︸ ︷︷ ︸ v⊥ ·A⊥ ︸ ︷︷ ︸ − 〈Cii[g]〉Ri ︸ ︷︷ ︸ A‖,ϕ− 〈ϕ〉Ri ︸ ︷︷ ︸ +b̂ ·∇ ︸ ︷︷ ︸ v⊥ ·A⊥ ︸ ︷︷ ︸ F0i + v⊥ ·A⊥ ︸ ︷︷ ︸ ,(E5) Γ1(αi) + ︸ ︷︷ ︸ 1 −Γ0(αi) ]Zeϕk ︸ ︷︷ ︸ d3vJ0(ai)gk ︸ ︷︷ ︸ , u‖ki d3vv‖J0(ai)gk ︸ ︷︷ ︸ . (E6) All terms in these equations can be ordered with respect to the small parameter βi (an expansion subsidiary to the gyrokinetic expansion in ǫ and the Hall expansion in τ ≪ 1). The lowest order to which they enter is indicated underneath each term. The ordering we use is the same as in § 5.2, but now we count the powers of βi and order formally k⊥di ∼ 1 and βe ∼ 1. It is easy to check that this ordering can be summarized as follows and that the ion and electron terms in Eqs. (E3) are comparable under this ordering, so their competition is retained (in fact, this could be used as the underlying assumption behind the ordering). 
The fluctuation frequency continues to be ordered as the Alfvén frequency, ω ∼ k‖vA. The collision terms are ordered via ω/νii ∼ k‖λmfpi/ βi and k‖λmfpi ∼ 1, although the latter assumption is not essential for what follows, because collisions turn out to be negligible and it is fine to take k‖λmfpi ≫ 1 from the outset and neglect them completely. In Eqs. (E6), we use Eqs. (129) and (130) to write 1 −Γ0(αi) ≃ αi = k2⊥ρ2i /2 and Γ1(αi) ≃ 1. These equations imply that if we expand g = g(−1) + g(0) + . . ., we must have d3vg(−1) = 0, so the contribution to the right-hand side of the first of the equations 60 SCHEKOCHIHIN ET AL. (E6) (the quasi-neutrality equation) comes from g(0), while the parallel ion flow is determined by g(−1). Retaining only the lowest (minus first) order terms in Eq. (E5), we find the equation for g(−1), the v‖ moment of which gives an equation for u‖i: ∂g(−1) {ϕ,g(−1)} = 2 v‖b̂ ·∇ F0i ⇒ = v2Ab̂ ·∇ . (E8) Now integrating Eq. (E5) over the velocity space (at constant r), using the first of the equations (E6) to express the integral of g(0), and retaining only the lowest (zeroth) order terms, we find ρ2i ∇2⊥ + b̂ ·∇u‖i = 0 ⇒ ∇2⊥Φ = vAb̂ ·∇∇2⊥Ψ, (E9) where we have used the second of the equations (E3) to express the time derivative of δB‖/B0. Together with Eqs. (E3), Eqs. (E8) and (E9) form a closed system, which it is natural to call Hall Reduced MHD (HRMHD) because these equations can be straightforwardly derived by applying the RMHD ordering (§ 2.1) to the MHD equations (8-10) with the induction equation (10) replaced by Eq. (E1). Indeed, Eqs. (E8) and (E9) exactly coincide with Eqs. (27) and (18), which are the parallel and perpendicular components of the MHD momentum equation (8) under the RMHD ordering; Eqs. (E3) should be compared Eqs. (17) and (26) while noticing that, in the limit τ ≪ 1, the sound speed is cs = vA βe/2 [see Eq. (166)]. 
The incompressible case (Mahajan & Yoshida 1998) is recovered in the subsidiary limit βe ≫ 1 (i.e., 1 ≫ βi ≫ τ ). E.2. Generalized Energy for Hall RMHD and the Passive Entropy Mode To work out the generalized energy (§ 3.4) for the HRMHD regime, we start with the generalized energy for the isothermal electron fluid [Eq. (109)] and use Eq. (E2) to express the density perturbation: T0iδ f , (E10) where δB⊥ = ẑ×∇⊥Ψ. The perturbed ion distribution function can be written in the same form as it was done in § 5.4 [Eq. (143)]: to lowest order in the βi expansion (§ E.1.2), δ f (−1)i = 2v⊥ ·u⊥ v2thi F0i + g(−1) = 2v⊥ ·u⊥ v2thi F0i + 2v‖u‖i v2thi F0i + g̃, (E11) where u⊥ = ẑ×∇⊥Φ. The last equality above is achieved by noticing that, since g(−1) satisfies Eq. (E8), we may split it into a perturbed Maxwellian with parallel velocity u‖i and the remainder: g (−1) = 2v‖u‖iF0i/v thi + g̃. Then g̃ is the homogeneous solution of the leading-order kinetic equation [see Eq. (E8)]: +{Φ, g̃} = 0, d3v g̃ = 0. (E12) Substituting Eq. (E11) into Eq. (E10) and keeping only the leading-order terms in the βi expansion, we get min0iu min0iu T0ig̃ . (E13) The first four terms are the energy of the Alfvénic and slow-wave-polarized fluctuations [cf. Eq. (D12)]. Unlike in RMHD, these are not decoupled in HRMHD, unless a further subsidiary long-wavelength limit is taken (see § E.4). It is easy to verify that the sum of these four terms is indeed conserved by Eqs. (E3), (E8) and (E9). The last term in Eq. (E13) is an individually conserved kinetic quantity. Its conservation reflects the fact that g̃ is decoupled from the wave dynamics and passively advected by the Alfvénic velocities via Eq. (E12).51 The passive kinetic mode g̃ can be thought of as a kinetic version of the MHD entropy mode and, indeed, reduces to it if the collision operator in Eq. (E5) is upgraded to the leading order by orderingω/νii ∼ 1 (i.e., by considering long parallel wavelengths, k‖λmfpi ∼ βi). 
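In such a collisional limit, $\tilde g$ has to be a perturbed Maxwellian with no density or velocity perturbation [because $\int d^3\mathbf{v}\,\tilde g = 0$, while the velocity perturbation is explicitly separated from $\tilde g$ in Eq. (E11)]. Therefore,

$$\tilde g = \left(\frac{v^2}{v_{thi}^2} - \frac{3}{2}\right)\frac{\delta T_i}{T_{0i}}F_{0i} \quad\Rightarrow\quad \int d^3\mathbf{v}\,\frac{T_{0i}\tilde g^2}{2F_{0i}} = \frac{3}{4}\,n_{0i}T_{0i}\frac{\delta T_i^2}{T_{0i}^2}. \tag{E14}$$

This is to be compared with the $\beta_i \sim \tau \ll 1$ limit of Eqs. (D11) and (D12). As we have established, in the $\beta_i$ expansion, $\delta T_i = \delta T_i^{(-1)}$, $\delta n_i = \delta n_i^{(0)}$, $\delta B_\parallel = \delta B_\parallel^{(0)}$, so to lowest order $\delta s/s_0 = \delta T_i/T_{0i}$ and Eq. (E14) describes the entropy mode in the Hall limit.

51 A similar splitting of the generalized energy cascade into a fluid-like cascade plus a passive cascade of a zero-density part of the distribution function occurs in the Hasegawa–Mima regime, which is the electrostatic version of the Hall limit (Plunk et al. 2009).

E.3. Hall RMHD Dispersion Relation

Linearizing the Hall RMHD equations (E3), (E8) and (E9) (derived in § E.1.2 assuming the ordering $\beta_i \sim \tau \ll 1$), we obtain the following dispersion relation:52

$$\left(\omega^2 - k_\parallel^2v_A^2\right)\left(\omega^2 - \frac{k_\parallel^2v_A^2}{1 + 2/\beta_e}\right) = \omega^2k_\parallel^2v_A^2\,\frac{k_\perp^2d_i^2}{1 + 2/\beta_e}. \tag{E15}$$

When the coupling term on the right-hand side is negligible, $k_\perp d_i/\sqrt{1 + 2/\beta_e} \ll 1$, we recover the MHD Alfvén wave, $\omega^2 = k_\parallel^2v_A^2$, and the MHD slow wave, $\omega^2 = k_\parallel^2v_A^2/(1 + v_A^2/c_s^2)$ [Eq. (167)], where $c_s = v_A\sqrt{\beta_e/2}$ in the limit $\tau \ll 1$ [Eq. (166)]. In the opposite limit, we get the kinetic Alfvén wave, $\omega^2 = k_\parallel^2v_A^2k_\perp^2d_i^2/(1 + 2/\beta_e)$ [same as Eq. (230) with $\tau \ll 1$]. The solution of the dispersion relation (E15) is

$$\omega^2 = \frac{k_\parallel^2v_A^2}{2}\left[1 + \frac{1 + k_\perp^2d_i^2}{1 + 2/\beta_e} \pm \sqrt{\left(1 + \frac{1 + k_\perp^2d_i^2}{1 + 2/\beta_e}\right)^2 - \frac{4}{1 + 2/\beta_e}}\,\right]. \tag{E16}$$

The corresponding eigenfunctions then satisfy53

$$\Psi = -\frac{k_\parallel v_A}{\omega}\left(\Phi + v_Ad_i\frac{\delta B_\parallel}{B_0}\right), \qquad u_{\parallel i} = -\frac{k_\parallel v_A}{\omega}\,v_A\frac{\delta B_\parallel}{B_0}, \qquad \Phi = -\frac{k_\parallel v_A}{\omega}\Psi. \tag{E17}$$

Equation (E16) takes a particularly simple form in the subsidiary limits of high and low electron beta $\beta_e = \beta_iZ/\tau$:

$$\beta_e \gg 1:\ \ \omega^2 = k_\parallel^2v_A^2\left[1 + \frac{k_\perp^2d_i^2}{2} \pm \sqrt{k_\perp^2d_i^2 + \frac{k_\perp^4d_i^4}{4}}\,\right], \qquad \beta_e \ll 1:\ \ \omega^2 = k_\parallel^2v_A^2\left(1 + k_\perp^2\rho_s^2\right)\ \ \text{and}\ \ \omega^2 = \frac{k_\parallel^2c_s^2}{1 + k_\perp^2\rho_s^2}, \tag{E18}$$

where $\rho_s = d_i\sqrt{\beta_e/2} = \rho_i\sqrt{Z/2\tau} = c_s/\Omega_i$ is called the ion sound scale.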
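As a numerical illustration of the Hall RMHD dispersion relation, the sketch below solves it as a quadratic in $\omega^2$. It assumes the dispersion relation has the form $(\omega^2 - k_\parallel^2v_A^2)[\omega^2 - k_\parallel^2v_A^2/(1 + 2/\beta_e)] = \omega^2k_\parallel^2v_A^2k_\perp^2d_i^2/(1 + 2/\beta_e)$, which is what linearizing the Hall RMHD equations gives and which reproduces the Alfvén, slow and kinetic-Alfvén limits quoted in the text. The function name and its arguments are hypothetical.

```python
import math

def hall_rmhd_omega_sq(kperp_di, beta_e, kpar_va=1.0):
    """Return the two roots (larger, smaller) for omega^2 of the Hall RMHD dispersion relation.

    kperp_di : perpendicular wavenumber times the ion inertial scale
    beta_e   : electron beta
    kpar_va  : parallel wavenumber times the Alfven speed (sets the frequency unit)
    """
    a = kpar_va ** 2                      # Alfven-wave value of omega^2
    s = 1.0 / (1.0 + 2.0 / beta_e)        # slow-wave factor 1 / (1 + 2/beta_e)
    # Quadratic: w^2 - total * w + a^2 * s = 0, with w = omega^2
    total = a * (1.0 + s * (1.0 + kperp_di ** 2))
    disc = math.sqrt(total ** 2 - 4.0 * a * a * s)
    return 0.5 * (total + disc), 0.5 * (total - disc)

# Long-wavelength check: the roots reduce to the Alfven and slow waves.
alfven, slow = hall_rmhd_omega_sq(kperp_di=0.0, beta_e=2.0)
```

At $k_\perp d_i = 0$ the two roots are the MHD Alfvén and slow waves. As $k_\perp d_i$ grows, the larger root approaches the kinetic-Alfvén-wave value $k_\parallel^2v_A^2k_\perp^2d_i^2/(1 + 2/\beta_e)$.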
The Alfvén wave and the slow wave (known as the ion acoustic wave in the limit of $\tau \ll 1$, $\beta_e \ll 1$) become dispersive at the ion inertial scale ($k_\perp d_i \sim 1$) when $\beta_e \gg 1$ and at the ion sound scale ($k_\perp\rho_s \sim 1$) when $\beta_e \ll 1$.

E.4. Summary of Hall RMHD and the Role of the Ion Inertial and Ion Sound Scales

We have shown that in the limit of cold ions and low ion beta ($\beta_i \sim \tau \ll 1$, “the Hall limit”), gyrokinetic turbulence can be described by five scalar functions: the stream and flux functions $\Phi$ and $\Psi$ for the Alfvénic fluctuations, the parallel velocity and magnetic-field perturbations $u_{\parallel i}$ and $\delta B_\parallel$ for the slow-wave-polarized fluctuations, and $\tilde g$, the zero-density, zero-velocity part of the ion distribution function, which is the kinetic version of the MHD entropy mode. The first four of these functions satisfy a closed set of four fluid-like equations, derived in § E.1 and collected here:

$$\frac{\partial\Psi}{\partial t} = v_A\hat{\mathbf{b}}\cdot\nabla\left(\Phi + v_Ad_i\frac{\delta B_\parallel}{B_0}\right), \qquad \frac{d}{dt}\frac{\delta B_\parallel}{B_0} = \frac{1}{1 + 2/\beta_e}\,\hat{\mathbf{b}}\cdot\nabla\left(u_{\parallel i} - d_i\nabla_\perp^2\Psi\right), \tag{E19}$$

$$\frac{d}{dt}\nabla_\perp^2\Phi = v_A\hat{\mathbf{b}}\cdot\nabla\nabla_\perp^2\Psi, \qquad \frac{du_{\parallel i}}{dt} = v_A^2\,\hat{\mathbf{b}}\cdot\nabla\frac{\delta B_\parallel}{B_0}. \tag{E20}$$

We call these equations the Hall Reduced Magnetohydrodynamics (HRMHD). To fully account for the generalized energy cascade, one must append to the four HRMHD equations the fifth, kinetic equation (E12) for $\tilde g$, which is energetically decoupled from HRMHD and slaved to the Alfvénic velocity fluctuations (§ E.2).

The equations given above are valid above the ion gyroscale, $k_\perp\rho_i \ll 1$. They contain a special scale, $d_i/\sqrt{1 + 2/\beta_e}$, which is the ion inertial scale $d_i$ for $\beta_e \gg 1$ and the ion sound scale $\rho_s = c_s/\Omega_i$ for $\beta_e \ll 1$. As becomes clear from the linear theory (§ E.3), the Alfvén and slow waves become dispersive at this scale. Nonlinearly, this scale marks the transition from the regime in which the Alfvénic and slow-wave-polarized fluctuations are decoupled to the regime in which they are mixed. Namely, when $k_\perp d_i/\sqrt{1 + 2/\beta_e} \ll 1$, HRMHD turns into RMHD: Eqs. (E19) become Eqs. (17) and (26), while Eqs. (E20) remain unchanged and identical to Eqs. (18) and (27); in the opposite limit, $k_\perp d_i/\sqrt{1 + 2/\beta_e} \gg 1$, the ion motion decouples from the magnetic-field evolution and Eqs. (E19) turn into the ERMHD equations (226-227).

Since we are considering the case $\beta_i \ll 1$, both $d_i$ and $\rho_s$ are much larger than the ion gyroscale $\rho_i$. In the opposite limit of $\beta_i \gg 1$ (§ E.1.1), while $d_i$ is the only scale that appears explicitly in Eqs. (E4), we have $d_i \ll \rho_i$ and the equations themselves represent the dynamics at scales much smaller than the ion gyroscale, so the transition between the RMHD and ERMHD regimes occurs at $k_\perp\rho_i \sim 1$. The same is true for $\beta_i \sim 1$, when $d_i \sim \rho_i$. The ion sound scale $\rho_s \gg \rho_i$ does not play a special role when $\beta_i$ is not small: it is not hard to see that for $k_\perp\rho_s \sim 1$, the ion motion terms in Eqs. (E19) dominate and we simply recover the inertial-range KRMHD model (§ 5) by expanding in $k_\perp\rho_i = k_\perp\rho_s\sqrt{2\tau/Z} \ll 1$.

Various theories of the dissipation-range turbulence based on Hall and Electron MHD are further discussed in § 8.2.6.

52 The full gyrokinetic dispersion relation in a similar limit was worked out in Howes et al. (2006), Appendix D.2.1.

53 Note that wave packets with $|\mathbf{k}_\perp| = k_\perp$ and satisfying Eq. (E17) with $k_\parallel v_A/\omega$ as a function of $k_\perp$ given by Eq. (E16) are exact nonlinear solutions of the HRMHD equations (E3) and (E8-E9). This can be shown via a calculation analogous to that in § 7.3 (for the incompressible Hall MHD, this was done by Mahajan & Krishan 2005).

F. TWO-DIMENSIONAL INVARIANTS IN GYROKINETICS

Since gyrokinetics is in a sense a “quasi-two-dimensional” approximation, it is natural to inquire if this gives rise to additional conservation properties (besides the conservation of the generalized energy discussed in § 3.4) and how they are broken by the presence of parallel propagation terms.
It is important to emphasize that, except in a few special cases, these invariants are only invariants in 2D, so gyrokinetic turbulence in 2D and 3D has fundamentally different properties, despite its seemingly “quasi-2D” nature. It is, therefore, generally not correct to think of the gyrokinetic turbulence (or its special case the MHD turbulence) as essentially a 2D turbulence with an admixture of parallel-propagating waves (Fyfe et al. 1977; Montgomery & Turner 1981).

In this Appendix, we work out the 2D invariants. Without attempting to present a complete analysis of the 2D conservation properties of gyrokinetics, we limit our discussion to showing how some more familiar fluid invariants (most notably, magnetic helicity) emerge from the general 2D invariants in the appropriate asymptotic limits.

F.1. General 2D Invariants

In deriving the generalized energy invariant, we used the fact that $\int d^3\mathbf{R}_s\,h_s\{\langle\chi\rangle_{\mathbf{R}_s}, h_s\} = 0$, so Eq. (57) after multiplication by $T_{0s}h_s/F_{0s}$ and integration over space contains no contribution from the Poisson-bracket nonlinearity. Since we also have $\int d^3\mathbf{R}_s\,\langle\chi\rangle_{\mathbf{R}_s}\{\langle\chi\rangle_{\mathbf{R}_s}, h_s\} = 0$, multiplying Eq. (57) by $q_s\langle\chi\rangle_{\mathbf{R}_s}$ and integrating over space has a similar outcome. Subtracting the latter integrated equation from the former and rearranging terms gives

$$\frac{\partial I_s}{\partial t} \equiv \frac{\partial}{\partial t}\int d^3\mathbf{R}_s\,\frac{T_{0s}}{2F_{0s}}\left(h_s - \frac{q_s\langle\chi\rangle_{\mathbf{R}_s}}{T_{0s}}F_{0s}\right)^2 = q_sv_\parallel\int d^3\mathbf{R}_s\,\langle\chi\rangle_{\mathbf{R}_s}\frac{\partial h_s}{\partial z} + \int d^3\mathbf{R}_s\left(\frac{T_{0s}h_s}{F_{0s}} - q_s\langle\chi\rangle_{\mathbf{R}_s}\right)\left(\frac{\partial h_s}{\partial t}\right)_c. \tag{F1}$$

We see that in a purely 2D situation, when $\partial/\partial z = 0$, we have an infinite family of invariants $I_s = I_s(v_\perp, v_\parallel)$ whose conservation (for each species and for every value of $v_\perp$ and $v_\parallel$) is broken only by collisions. In 3D, the parallel particle streaming (propagation) term in the gyrokinetic equation generally breaks these invariants, although special cases may arise in which the first term on the right-hand side of Eq. (F1) vanishes and a genuine 3D invariant appears.

F.2. “$A_\parallel^2$-Stuff”

Let us apply the mass-ratio expansion (§ 4.1) to Eq. (F1) for electrons.
Using the solution (101) for the electron distribution function, we find T0eF0e v2the d3rA‖ v2the + · · · = −ev‖ v2the F0e − ∂h(1)e d3rA‖ , (F2) where we have kept terms to two leading orders in the expansion. To lowest order, the above equation reduces to d3rA‖ . (F3) This equation can also be obtained directly from Eq. (116) (multiply by A‖ and integrate). In 2D, it expresses a well known conservation law of the “A2‖-stuff.” As this 2D invariant exists already on the level of the mass-ratio expansion of the electron kinetics, with no assumptions about the ions, it is inherited both by the RMHD equations in the limit of k⊥ρi ≪ 1 (§ 5.3) and by the ERMHD equations in the limit of k⊥ρi ≫ 1 (§ 7.2). In the former limit, δne/n0e on the right-hand side of Eq. (F3) is negligible (under the ordering explained in § 5.2); in the latter limit, it is expressed in terms of ϕ via Eq. (221). The conservation of “A2‖-stuff” is a uniquely 2D feature, broken by the parallel propagation term in 3D. F.3. Magnetic Helicity in the Electron Fluid If we now divide Eq. (F2) through by ev‖/c and integrate over velocities, we get, after some integrations by parts, another relation that becomes a conservation law in 2D and that can also easily be derived directly from the equations of the isothermal electron fluid (116-117): d3rA‖ . (F4) In the ERMHD limit k⊥ρi ≫ 1 (§ 7.2), we use Eqs. (221-223) to simplify the above equation and find that the integral on the right-hand side vanishes and we get a genuine 3D conservation law: d3rA‖δB‖ = 0. (F5) KINETIC TURBULENCE IN MAGNETIZED PLASMAS 63 This can also be derived directly from the ERMHD equations (226-227) [using Eq. (223)]. The conserved quantity is readily seen to be the helicity of the perturbed magnetic field: d3rA · δB = ∇⊥×A‖ẑ + A‖δB‖ A‖ẑ · (∇⊥×A⊥) + A‖δB‖ d3rA‖δB‖. (F6) F.4. Magnetic Helicity in the RMHD Limit Unlike in the case of ERMHD, the helicity of the perturbed magnetic field in RMHD is conserved only in 2D. 
This is because the induction equation for the perturbed field has an inhomogeneous term associated with the mean field [Eq. (10) with B = B0ẑ + δB] (this issue has been extensively discussed in the literature; see Matthaeus & Goldstein 1982; Stribling et al. 1994; Berger 1997; Montgomery & Bates 1999; Brandenburg & Matthaeus 2004). Directly from the induction equation or from its RMHD descendants Eqs. (17) and (26), we obtain [note the definitions (135)] d3rA‖δB‖ = 1 + v2A/c2s , (F7) so helicity is conserved only if ∂/∂z = 0. For completeness, let us now show that this 2D conservation law is a particular case of Eq. (F1) for ions. Let us consider the inertial range (k⊥ρi ≪ 1). We substitute Eq. (124) into Eq. (F1) for ions and expand to two leading orders in k⊥ρi using the ordering explained in § 5.2: v‖〈A‖〉Ri Z2e2v2 d3rA‖g + · · · Z2e2v2 d3rA‖ v2thi + Zev‖ d3rA‖ . (F8) The lowest-order terms in the above equations (all proportional to v2‖F0i) simply reproduce the 2D conservation of “A ‖-stuff,” given by Eq. (F3). We now subtract Eq. (F3) multiplied by (Zev‖/c) 2F0i/T0i from Eq. (F8). This leaves us with d3rA‖g = c + v‖F0i v2thi d3rA‖ . (F9) This equation is a general 2D conservation law of the KRMHD equations (see § 5.7) and can also be derived directly from them. If we integrate it over velocities and use Eqs. (146) and (147), we simply recover Eq. (F4). However, since Eq. (F9) holds for every value of v‖ and v⊥, it carries much more information than Eq. (F4). To make connection to MHD, let us consider the fluid (collisional) limit of KRMHD worked out in Appendix D. The distribu- tion function to lowest order in the k‖λmfpi ≪ 1 expansion is g = −(v2⊥/v2thi)δB‖/B0 +δ f̃ i , where δ f̃ i is the perturbed Maxwellian given by Eq. (D3). We can substitute this expression into Eq. (F9). 
Since in this expansion the collision integral is applied to δ f̃ (1)i and is the same order as the rest of the terms (see § D.3), conservation laws are best derived by taking 1, v‖, and v 2/v2thi moments of Eq. (F9) so as to make the collision term vanish. In particular, multiplying Eq. (F9) by 1 + (2τ/3Z)v2/v2thi, integrating over velocities and using Eqs. (D4) and (D6), we obtain the evolution equation for d3rA‖δB‖, which coincides with Eq. (F7). Note that, either proceeding in an analogous way, one can derive similar equations for d3rA‖δne and d3rA‖u‖—these are also 2D invariants of the RMHD system, broken in 3D by the presence of the propagation terms. The same result can be derived directly from the evolution equations (D8) and (D10). F.5. Electrostatic Invariant Interestingly, the existence of the general 2D invariants introduced in § F.1 alongside the generalized energy invariant given by Eq. (73) means that one can construct a 2D invariant of gyrokinetics that does not involve any velocity-space quantities. In order to do that, one must integrate Eq. (F1) over velocities, sum over species, and subtract Eq. (73) from the resulting equation (thus removing the h2s integrals). The result is not particularly edifying in the general case, but it takes a simple form if one considers electrostatic perturbations (δB = 0). In this case, χ = ϕ, and the manipulations described above lead to the following equation d3v Is −W q2s n0s 1−Γ0(αs) |ϕk|2 = d3rE‖ j‖ − d3Rs 〈ϕ〉Rs , (F10) where E‖ = −∂ϕ/∂z, αs = k2⊥ρ s/2 and Γ0 is defined by Eq. (129). In 2D, E‖ = 0 and the above equation expresses a conservation law broken only by collisions. The complete derivation and analysis of 2D conservation properties of gyrokinetics in the electro- static limit, including the invariant (F10), the electrostatic version of Eq. (F1), and their consequences for scalings and cascades, was given by Plunk et al. (2009). Here we briefly consider a few relevant limits. 
For k⊥ρi ≪ 1, we have Γ0(α) = 1 −αs + . . ., so the invariant given by Eq. (F10) is simply the kinetic energy of the E×B flows: s(msn0s/2) d3r |∇⊥Φ|2, where Φ = cϕ/B0. In the limit k⊥ρi ≫ 1, k⊥ρe ≪ 1, we have Y = −n0i d3rZ2e2ϕ2/2T0i. In 64 SCHEKOCHIHIN ET AL. the limit k⊥ρe ≫ 1, we have Y = −(1 + Z/τ )n0e d3re2ϕ2/2T0e. Whereas we are not interested in electrostatic fluctuations in the inertial range, electrostatic turbulence in the dissipation range was discussed in § 7.10 and § 7.12. The electrostatic 2D invariant in the limits k⊥ρi ≫ 1, k⊥ρe ≪ 1 and k⊥ρe ≫ 1 can also be derived directly from the equations given there [in the former limit, use Eq. (264) to express u‖i in terms of j‖ in order to get Eq. (F10)]. Note that, taken separately and integrated over velocities, Eq. (F1) for ions (when k⊥ρi ≫ 1, k⊥ρe ≪ 1) and for electrons (when k⊥ρe ≫ 1), reduces to lowest order to the statement of 3D conservation of d3Ri T0ih2i /2F0i [Whi in Eq. (245)] and∫ d3Re T0eh2e/2F0e [Eq. (280)], respectively. F.6. Implications for Turbulent Cascades and Scalings Since invariants other than the generalized energy or its constituent parts are present in 2D and, in some limits, also in 3D, one might ask how their presence affects the turbulent cascades and scalings. As an example, let us consider the magnetic helicity in KAW turbulence, which is a 3D invariant of the ERMHD equations (§ F.3). A Kolmogorov-style analysis of a local KAW cascade based on a constant flux of helicity gives (proceeding as in § 7.5): τKAWλ 1 +βi τKAWλ 1 +βi ∼ εH = const ⇒ Φλ ∼ (1 +βi)1/6 1/3, (F11) where εH is the helicity flux (omitting constant dimensional factors, the helicity is now defined as d3rΨΦ and assumed to be non-zero). This corresponds to a k ⊥ spectrum of magnetic energy. In order to decide whether we expect the scalings to be determined by the constant-helicity flux or by the constant-energy flux (as assumed in § 7.5), we adapt a standard argument originally due to Fjørtoft (1953). 
If the helicity flux of the KAW turbulence originating at the ion gyroscale (via partial conversion from the inertial-range turbulence; see § 7) is εH , its energy flux is εKAW ∼ εH [set λ = ρi in Eq. (F11) and compare with Eq. (238)]. If the cascade between the ion and electron gyroscales is controlled by maintaining a constant flux of helicity, then the helicity flux arriving to the electron gyroscale is still εH , while the associated energy flux is εHρi/ρe ≫ εKAW, i.e., more energy arrives to ρe than there was at ρi! This is clearly impossible in a stationary state. The way to resolve this contradiction is to conclude that the helicity cascade is, in fact, inverse (i.e., directed towards larger scales), while the energy cascade is direct (to smaller scales). A similar argument based on the constancy of the energy flux εKAW then leads to the conclusion that the helicity flux arriving to the electron gyroscale is εKAWρe/ρi ≪ εH ∼ εKAW, i.e., the helicity indeed does not cascade to smaller scales. It does not, in fact, cascade to large scales either because the ERMHD equations are not valid above the ion gyroscale and the helicity of the perturbed magnetic field in the inertial range is not a 3D invariant (§ F.4). The situation would be different if an energy source existed either at the electron gyroscale or somewhere in between ρe and ρi. In such a case, one would expect an inverse helicity cascade and the consequent shallower scaling [Eq. (F11)] between the energy-injection scale and the ion gyroscale. Other invariants introduced above can in a similar fashion be argued to give rise to inverse cascades in the hypothetical 2D situations where they are valid and provided there is energy injection at small scales (for the electrostatic case, see Plunk et al. 2009 and numerical simulations by Tatsuno et al. 2009b). The view of turbulence advanced in this paper does not generally allow for this to happen. 
First, the fundamentally 3D nature of the turbulence is imposed via the critical balance conjecture and supported by the argument that “two dimensionality” can only be maintained across parallel distances that do not exceed the distance a parallel-propagating wave (or parallel-streaming particles) travels over one nonlinear decorrelation time (see § 1.2, § 7.5 and § 7.10.3). Secondly, the lack of small-scale energy injection was assumed at the outset. This can, however, be violated in real astrophysical plasmas by various small-scale plasma instabilities (e.g., triggered by pressure anisotropies; see discussion in § 8.3). Treatment of such effects falls outside the scope of this paper and remains a matter for future work. REFERENCES Abel, I. G., Barnes, M., Cowley, S. C., Dorland, W., & Schekochihin, A. A. 2008, Phys. Plasmas, 15, 122509 Alexandrova, O. 2008, Nonlinear Process. Geophys., 15, 95 Alexandrova, O., Carbone, V., Veltri, P., & Sorriso-Valvo, L. 2008a, ApJ, 674, 1153 Alexandrova, O., Lacombe, C., & Mangeney, A. 2008b, Ann. Geophys., 26, Antonsen, T. M. & Lane, B. 1980, Phys. Fluids, 23, 1205 Armstrong, J. W., Coles, W. A., Kojima, M., & Rickett, B. J. 1990, ApJ, 358, 685 Armstrong, J. W., Cordes, J. M., & Rickett, B. J. 1981, Nature, 291, 561 Armstrong, J. W., Rickett, B. J., & Spangler, S. R. 1995, ApJ, 443, 209 Artun, M. & Tang, W. M. 1994, Phys. Plasmas, 1, 2682 Balbus, S. A. & Hawley, J. F. 1998, Rev. Mod. Phys., 70, 1 Bale, S. D., Kellogg, P. J., Mozer, F. S., Horbury, T. S., & Reme, H. 2005, Phys. Rev. Lett., 94, 215002 Barnes, A. 1966, Phys. Fluids, 9, 1483 Barnes, M. A., Abel, I. G., Dorland, W., Ernst, D. R., Hammett, G. W., Ricci, P., Rogers, B. N., Schekochihin, A. A., and Tatsuno, T. 2009, Phys. Plasmas, submitted (arXiv:0809.3945) Bavassano, B., Dobrowolny, M., Fanfoni, G., Mariani, F., & Ness, N. F. 1982, J. Geophys. Res., 87, 3617 Bavassano, B., Pietropaolo, E., & Bruno, R. 2004, Ann. Geophys., 22, 689 Beck, R. 
2007, in Polarisation 2005, ed. F. Boulanger & M. A. Miville-Deschenes, EAS Pub. Ser., 23, 19 Belcher, J. W. & Davis, L. 1971, J. Geophys. Res., 76, 3534 Beresnyak, A. & Lazarian, A. 2006, ApJ, 640, L175 Beresnyak, A. & Lazarian, A. 2008a, ApJ, 682, 1070 Beresnyak, A. & Lazarian, A. 2008b, arXiv:0812.0812 Berger, M. 1997, J. Geophys. Res., 102, 2637 Bershadskii, A. & Sreenivasan, K. R. 2004, Phys. Rev. Lett., 93, 064501 Bhattacharjee, A. & Ng, C. S. 2001, ApJ, 548, 318 Bhattacharjee, A., Ng, C. S., & Spangler, S. R. 1998, ApJ, 494, 409 Bieber, J. W., Wanner, W., & Matthaeus, W. H. 1996, J. Geophys. Res., 101, Bigazzi, A., Biferale, L., Gama, S. M. A., & Velli, M. 2006, ApJ, 638, 499 Biskamp, D. & Müller, W.-C. 2000, Phys. Plasmas, 7, 4889 Biskamp, D., Schwartz, E., & Drake, J. F. 1996, Phys. Rev. Lett., 76, 1264 Biskamp, D., Schwartz, E., Zeiler, A., Celani, A., & Drake, J. F. 1999, Phys. Plasmas, 6, 751 Boldyrev, S. A. 2006, Phys. Rev. Lett., 96, 115002 Braginskii, S. I. 1965, Rev. Plasma Phys., 1, 205 Brandenburg, A. & Matthaeus, W. H. 2004, Phys. Rev. E, 69, 056407 Brizard, A. J. & Hahm, T. S. 2007, Rev. Mod. Phys., 79, 421 Brunetti, G. & Lazarian, A. 2007, MNRAS, 378, 245 Bruno, R. & Carbone, V. 2005, Living Rev. Solar Phys., 2, 4 Bruno, R., Carbone, V., Chapman, S., Hnat, B., Noullez, A., & Sorriso-Valvo, L. 2007, Phys. Plasmas, 14, 032901 Burlaga, L. F., Scudder, J. D., Klein, L. W., & Isenburg, P. A. 1990, J. Geophys. Res., 95, 2229 Califano, F., Hellinger, P., Kuznetsov, E., Passot, T., Sulem, P.-L., & Trávnícek 2008, J. Geophys. Res., 113, A08219 Candy, J. & Waltz, R. E. 2003, J. Comput. Phys., 186, 545 Carter, T. A., Brugman, B., Pribyl, P., & Lybarger, W. 2006, Phys. Rev. Lett., 96, 155001 Catto, P. J. 1978, Plasma Phys., 20, 719 Catto, P. J., Tang, W. M., & Baldwin, D. E. 1981, Plasma Phys., 23, 639 Catto, P. J. & Tsang, K. T. 1977, Phys.
Fluids, 20, 396 Celnikier, L. M., Harvey, C. C., Jegou, R., Kemp, M., & Moricet, P. 1983, A&A, 126, 293 Celnikier, L. M., Muschietti, L., & Goldman, M. V. 1987, A&A, 181, 138 Chandran, B. D. G. 2005a, ApJ, 632, 809 Chandran, B. D. G. 2005b, Phys. Rev. Lett., 95, 265004 Chandran, B. D. G. 2008, ApJ, 685, 646 Chapman, S. & Cowling, T. G. 1970, The Mathematical Theory of Non-Uniform Gases (Cambridge: Cambridge Univ. Press) Chapman, S. C. & Hnat, B. 2007, Geophys. Res. Lett., 34, L17103 Chen, Y. & Parker, S. E. 2003, J. Comput. Phys., 189, 463 Chen, C. H. K., Schekochihin, A. A., Cowley, S. C., & Horbury, T. S. 2009, ApJ, submitted Cho, J. & Lazarian, A. 2002, Phys. Rev. Lett., 88, 245001 Cho, J. & Lazarian, A. 2003, MNRAS, 345, 325 Cho, J. & Lazarian, A. 2004, ApJ, 615, L41 Cho, J. & Vishniac, E. T. 2000, ApJ, 539, 273 Cho, J., Lazarian, A., & Vishniac, E. T. 2002, ApJ, 564, 291 Cho, J., Lazarian, A., & Vishniac, E. T. 2003, ApJ, 595, 812 Clarke, T. E. & Enßlin T. A. 2006, AJ, 131, 2900 Coleman, P. J. 1968, ApJ, 153, 371 Coles, W. A. & Harmon, J. K. 1989, ApJ, 337, 1023 Coles, W. A., Liu, W., Harmon, J. K., & Martin, C. L. 1991, J. Geophys. Res., 96, 1745 Coroniti, F. W., Kennel, C. F., Scarf, F. L., & Smith, E. J. 1982, J. Geophys. Res., 87, 6029 Corrsin, S. 1951, J. Appl. Phys., 22, 469 Cowley, S. C. 1985, Ph. D. Thesis, Princeton University Cranmer, S. R. & van Ballegooijen, A. A. 2003, ApJ, 594, 573 Czaykowska, A., Bauer, T. M., Treumann, R. A., & Baumjohann, W. 2001, Ann. Geophys., 19, 275 Dasso, S., Milano, L. J., Matthaeus, W. H., & Smith, C. W. 2005, ApJ, 635, Dennett-Thorpe, J. & de Bruyn, A. G. 2003, A&A, 404, 113 Denskat, K. U., Beinroth, H. J., & Neubauer, F. M. 1983, J. Geophys., 54, 60 Dimits, A. M. & Cohen, B. I. 1994, Phys. Rev. E, 49, 709 Dmitruk, P., Gomez, D. O., & Matthaeus, W. H. 2003, Phys. Plasmas, 10, Dobrowolny, M., Mangeney, A., & Veltri, P.-L. 1980, Phys. Rev. Lett., 45, Dorland, W. & Hammett, G. W. 1993, Phys. Fluids B, 5, 812 Dubin, D. H. 
E., Krommes, J. A., Oberman, C., & Lee, W. W. 1983, Phys. Fluids, 26, 3524 Elsasser, W. M. 1950, Phys. Rev., 79, 183 Enßlin, T. A. & Vogt, C. 2006, A&A, 453, 447 Enßlin, T. A., Waelkens, A., Vogt, C., & Schekochihin, A. A. 2006, Astron. Nachr., 327, 626 Fabian, A. C., Sanders, J. S., Taylor, G. B., Allen, S. W., Crawford, C. S., Johnstone, R. M., & Iwasawa, K. 2006, MNRAS, 366, 417 Ferriere, K. M. 2001, Rev. Mod. Phys., 73, 1031 Fjørtoft, R. 1953, Tellus, 5, 225 Foote, E. A. & Kulsrud, R. M. 1979, ApJ, 233, 302 Fowler, T. K. 1968, Adv. Plasma Phys., 1, 201 Fried, B. D. & Conte, S. D. 1961, The Plasma Dispersion Function (San Diego, CA: Academic Press) Frieman, E. A. & Chen, L. 1982, Phys. Fluids, 25, 502 Fyfe, D., Joyce, G., & Montgomery, D. 1977, J. Plasma Phys., 17, 317 Galtier, S. 2006, J. Plasma Phys., 72, 721 Galtier, S. & Bhattacharjee, A. 2003, Phys. Plasmas, 10, 3065 Galtier, S. & Buchlin, E. 2007, ApJ, 656, 560 Galtier, S. & Chandran, B. D. G. 2006, Phys. Plasmas, 13, 114505 Galtier, S., Nazarenko, S. V., Newell, A. C., & Pouquet, A. 2000, J. Plasma Phys., 63, 447 Galtier, S., Nazarenko, S. V., Newell, A. C., & Pouquet, A. 2002, ApJ, 564, Gary, S. P., Montgomery, M. D., Feldman, W. C., & Forslund, D. W. 1976, J. Geophys. Res., 81, 1241 Gary, S. P. 1986, J. Plasma Phys., 35, 431 Gaty, S. P. & Borovsky, J. 2008, J. Geophys. Res., 113, A12104 Gary, S. P., Saito, S., & Li, H. 2008, Geophys. Res. Lett., 35, L02104 Gary, S. P., Skoug, R. M., Steinberg, J. T., & Smith, C. W. 2001, Geophys. Res. Lett., 28, 2759 Ghosh, S., Siregar, E., Roberts, D. A., & Goldstein, M. L. 1996, J. Geophys. Res., 101, 2493 Gogoberidze, G. 2005, Phys. Rev. E, 72, 046407 Gogoberidze, G. 2007, Phys. Plasmas, 14, 022304 Goldreich, P. & Reisenegger, A. 1992, ApJ, 395, 250 Goldreich, P. & Sridhar, S. 1995, ApJ, 438, 763 Goldreich, P. & Sridhar, S. 1997, ApJ, 485, 680 Goswami, P., Passot, T., & Sulem, P. L. 2005, Phys. Plasmas, 12, 102109 Grall, R. R., Coles, W. A., Spangler, S. 
R., Sakurai, T., & Harmon, J. K. 1997, J. Geophys. Res., 102, 263 Grison, B., Sahraoui, F., Lavraud, B., Chust, T., Cornilleau-Wehrlin, N., Rème, H., Balogh, A., & André, M. 2005, Ann. Geophys., 23, 3699 Hahm, T. S., Lee, W. W., & Brizard, A. 1988, Phys. Fluids, 31, 1940 Hallatschek, K. 2004, Phys. Rev. Lett., 93, 125001 Hamilton, K., Smith, C. W., Vasquez, B. J., & Leamon, R. J. 2008, J. Geophys. Res., 113, A01106 Hammett, G. W., Dorland, W., & Perkins, F. W. 1991, Phys. Fluids B, 4, Harmon, J. K. & Coles, W. A. 2005, J. Geophys. Res., 110, A03101 Haugen, N. E. L., Brandenburg, A., & Dobler, W. 2004, Phys. Rev. E, 70, 016308 Haverkorn, M., Gaensler, B. M., McClure-Griffiths, N. M., Dickey, J. M., & Green, A. J. 2004, ApJ, 609, 776 Haverkorn, M., Gaensler, B. M., Brown, J. C., Bizunok, N. S., McClure-Griffiths, N. M., Dickey, J. M., & Green, A. J. 2005, ApJ, 637, Haverkorn, M., Brown, J. C., Gaensler, B. M., & McClure-Griffiths, N. M. 2008, ApJ, 680, 362 Hazeltine, R. D. 1983, Phys. Fluids, 26, 3242 Helander, P. & Sigmar, D. J. 2002, Collisional Transport in Magnetized Plasmas (Cambridge: Cambridge Univ. Press) Hellinger, P., Trávnícek, P., Kasper, J. C., & Lazarus, A. J., 2006, Geophys. Res. Lett., 33, L09101 Heyer, M., Gong, H., Ostriker, E., & Brunt, C. 2008, ApJ, 680, 420 Higdon, J. C. 1984, ApJ, 285, 109 Hirose, A., Ito, A., Mahajan, S. M., & Ohsaki, S. 2004, Phys. Lett. A, 330, Hnat, B., Chapman, S. C., & Rowlands, G. 2005, Phys. Rev. Lett., 94, 204502 Hnat, B., Chapman, S. C., Kiyani, K., Rowlands, G., & Watkins, N. W. 2007, Geophys. Res. Lett., 34, L15108 Hollweg, J. V. 1999, J. Geophys. Res., 104, 14811 Hollweg, J. V. 2008, J. Astrophys. Astr., 29, 217 Horbury, T. S., Balogh, A., Forsyth, R. J., & Smith E. J. 1996, A&A, 316, Horbury, T. S., Forman, M. A., & Oughton, S. 2005, Plasma Phys. Control. Fusion, 47, B703 Horbury, T. S., Forman, M., & Oughton, S. 2008, Phys. Rev. Lett., 101, 175005 Howes, G. G., Cowley, S. C., Dorland, W., Hammett, G. 
W., Quataert, E., & Schekochihin, A. A. 2006, ApJ, 651, 590 Howes, G. G., Cowley, S. C., Dorland, W., Hammett, G. W., Quataert, E., & Schekochihin, A. A., 2008a, J. Geophys. Res., 113, A05103 Howes, G. G., Cowley, S. C., Dorland, W., Hammett, G. W., Quataert, E., Schekochihin, A. A., & Tatsuno, T. 2008b, Phys. Rev. Lett., 100, 065004 Iroshnikov, R. S. 1963, Astron. Zh., 40, 742 [English translation: 1964, Sov. Astron, 7, 566] Ito, A., Hirose, A., Mahajan, S. M., & Ohsaki, S. 2004, Phys. Plasmas, 11, Jenko, F., Dorland, W., Kotschenreuther, M., & Rogers, B. N. 2000, Phys. Plasmas, 7, 1904 Kadomtsev, B. B. & Pogutse, O. P. 1974, Sov. Phys.—JETP, 38, 283 Kasper, J. C., Lazarus, A. J., & Gary, S. P. 2002, Geophys. Res. Lett., 29, 20 Kellogg, P. J. & Horbury, T. S. 2005, Ann. Geophys., 23, 3765 Kellogg, P. J., Bale, S. D., Mozer, F. S., Horbury, T. S., & Reme, H. 2006, ApJ, 645, 704 Kingsep, A. S., Chukbar, K. V., & Yankov, V. V. 1990, Rev. Plasma Phys., 16, 243 Kinney, R. & McWilliams, J. C., 1997, J. Plasma Phys., 57, 73 Kinney, R. M. & McWilliams, J. C. 1998, Phys. Rev. E, 57, 7111 Kivelson, M. G. & Southwood, D. J. 1996, J. Geophys. Res., 101, 17365 Kiyani, K., Chapman, S. C., Hnat, B. & Nicol, R. M. 2007, Phys. Rev. Lett., 98, 211101 Kolmogorov, A. N. 1941, Dokl. Akad. Nauk SSSR, 30, 299 [English translation: 1991, Proc. R. Soc. A, 434, 9] Kotschenreuther, M., Rewoldt, G., & Tang, W. M. 1995, Comput. Phys. Commun., 88, 128 Kraichnan, R. H. 1965, Phys. Fluids, 8, 1385 Krishan, V. & Mahajan, S. M. 2004, J. Geophys. Res., 109, A11105 Krommes, J. A. 1999, Phys. Plasmas, 6, 1477 Krommes, J. A. 2006, in Turbulence and Coherent Structures in Fluids, Plasmas and Nonlinear Medium, eds. M. Shats & H. Punzmann (Singapore: World Scientific), 115 Krommes, J. A. & Hu, G. 1994, Phys. Plasmas, 1, 3211 Kruger, S. E., Hegna, C. C., & Callen, J. D. 1998, Phys. Plasmas, 5, 4169 Kruskal, M. D. & Oberman, C. R. 1958, Phys. Fluids, 1, 275 Kulsrud, R. 1962, Phys.
Fluids, 5, 192 Kulsrud, R. M. 1964, in Teoria dei plasmi, ed. M. N. Rosenbluth (London: Academic Press), 54 Kulsrud, R. M. 1983, in Handbook of Plasma Physics, Vol. 1, ed. A. A. Galeev & R. N. Sudan (Amsterdam: North–Holland), 115 Lacombe, C., Samsonov, A. A., Mangeney, A., Maksimovic, M., Cornilleau-Wehrlin, N., Harvey, C. C., Bosqued, J.-M., & Trávnícek, P. 2006, Ann. Geophys., 24, 3523 Landau, L. 1936, Zh. Exp. Teor. Fiz., 7, 203 Landau, L. 1946, Zh. Exp. Teor. Fiz., 16, 574 [English translation: 1946, J. Phys. U.S.S.R., 10, 25] Lazio, T. J. W., Cordes, J. M., de Bruyn, A. G., & Macquart, J.-P. 2004, New Astron. Rev., 48, 1439 Leamon, R. J., Smith, C. W., Ness, N. F., Matthaeus, W. H., & Wong, H. K. 1998, J. Geophys. Res., 103, 4775 Leamon, R. J., Smith, C. W., Ness, N. F., & Wong, H. K. 1998, J. Geophys. Res., 104, 22331 Leamon, R. J., Matthaeus, W. H., Smith, C. W., Zank, G. P., & Mullan, D. J. 2000, ApJ, 537, 1054 Li, H., Gary, P., & Stawicki, O. 2001, Geophys. Res. Lett., 28, 1347 Lithwick, Y. & Goldreich, P. 2001, ApJ, 562, 279 Lithwick, Y. & Goldreich, P. 2003, ApJ, 582, 1220 Lithwick, Y., Goldreich, P., & Sridhar, S. 2007, ApJ, 655, 269 Loeb, A. & Waxman, E. 2007, J. Cosmol. Astropart. Phys., 03, 011 Longmire, C. L. 1963, Elementary Plasma Physics (New York: Interscience) Lovelace, R. V. E., Salpeter, E. E., Sharp, L. E., & Harris, D. E. 1970, ApJ, Mahajan, S. M. & Krishan, V. 2005, MNRAS, 359, L27 Mahajan, S. M. & Yoshida, Z. 1998, Phys. Rev. Lett., 81, 4863 Maksimovic, M., Zouganelis, I., Chaufray, J.-Y., Issautier, K., Scime, E. E., Littleton, J. E., Marsch, E., McComas, D. J., Salem, C., Lin, R. P., & Elliott, H. 2005, J. Geophys. Res., 110, A09104 Mangeney, A., Lacombe, C., Maksimovic, M., Samsonov, A. A., Cornilleau-Wehrlin, N., Harvey, C. C., Bosqued, J.-M., & Trávnícek, P. 2006, Ann. Geophys., 24, 3507 Markevitch, M. & Vikhlinin, A. 2007, Phys. 
Rep., 443, 1 Markevitch, M., Mazzotta, P., Vikhlinin, A., Burke, D., Butt, Y., David, L., Donnelly, H., Forman, W. R., Harris, D., Kin, D.-W., Virani, S., & Vrtilek, J. 2003, ApJ, 586, L19 Markovskii, S. A., Vasquez, B. J., Smith, C. W., & Holweg, J. V. 2006, ApJ, 639, 1177 Markovskii, S. A., Vasquez, B. J., & Smith, C. W. 2008, ApJ, 675, 1576 Maron, J. & Goldreich, P. 2001, ApJ, 554, 1175 Marsch, E. 2006, Living Rev. Solar Phys., 3, 1 Marsch, E. & Tu, C.-Y. 1990a, J. Gephys. Res, 95, 8211 Marsch, E. & Tu, C.-Y. 1990b, J. Gephys. Res, 95, 11945 Marsch, E. & Tu, C.-Y. 1993, Ann. Geophys., 11, 659 Marsch, E., Ao, X.-Z.,& Tu, C.-Y. 2004, J. Geophys. Res., 109, A04102 Marsch, E., Mühlhäuser, K. H., Rosenbauer, H., & Schwenn, R. 1983, J. Geophys. Res., 88, 2982 Mason, J., Cattaneo, F., & Boldyrev, S. 2006, Phys. Rev. Lett., 97, 255002 Mason, J., Cattaneo, F., & Boldyrev, S. 2007, Phys. Rev. E, 77, 036403 Matteini, L., Landi, S., Hellinger, P., Pantellini, F., Maksimovic, M., Velli, M., Goldstein, B. E., & Marsch, E. 2007, Geophys. Res. Lett., 34, L20105 Matthaeus, W. H. & Goldstein, M. L. 1982, J. Geophys. Res., 87, 6011 Matthaeus, W. H. & Brown, M. R. 1988, Phys. Fluids, 31, 3634 Matthaeus, W. H., Goldstein, M. L., & Roberts, D. A. 1990, J. Geophys. Res., 95, 20673 Matthaeus, W. H., Klein, K. W., Ghosh, S., & Brown, M. R. 1991, J. Geophys. Res., 96, 5421 Matthaeus, W. H., Pouquet, A., Mininni, P. D., Dmitruk, P., & Breech, B. 2008a, Phys. Rev. Lett., 100, 085003 Matthaeus, W. H., Servidio, S., & Dmitruk, P. 2008b, Phys. Rev. Lett., 101, 149501 Minter, A. H. & Spangler, S. R. 1996, ApJ, 458, 194 Montgomery, D. C. 1982, Phys. Scripta, T2/1, 83 Montgomery, D. C. & Bates, J. W. 1999, Phys. Plasmas, 6, 2727 Montgomery, D. & Turner, L. 1981, Phys. Fluids, 24, 825 Montgomery, D., Brown, M. R., & Matthaeus, W. H. 1987, J. Geophys. Res., 92, 282 Morales, G. J., Maggs, J. E., Burke, A. T., & Peñano, J. R. 1999, Plasma Phys. Control. 
Fusion, 41, A519 Müller, W.-C., Biskamp, D., & Grappin, R. 2003, Phys. Rev. E, 67, 066302 Narayan, R. & Quataert, E. 2005, Science, 307, 77 Narayan, R. & Yi, I. 1995, ApJ, 452, 710 Narita, Y., Glassmeier, K.-H., & Treumann, R. A. 2006, Phys. Rev. Lett., 97, 191101 Nazarenko, S. 2007, New J. Phys, 9, 307 Newbury, J. A., Russell, C. T., Phillips, J. L., & Gary, S. P. 1998, J. Geophys. Res., 103, 9553 Ng, C. S. & Bhattacharjee, A. 1996, ApJ, 465, 845 Ng, C. S. & Bhattacharjee, A. 1997, Phys. Plasmas, 4, 605 Ng, C. S., Bhattacharjee, A., Germaschewski, K., & Galtier, S. 2003, Phys. Plasmas, 10, 1954 Norman, C. A. & Ferrara, A. 1996, ApJ, 467, 280 Obukhov, A. M. 1941, Izv. Akad. Nauk SSSR Ser. Geogr. Geofiz., 5, 453 Obukhov, A. M. 1949, Izv. Akad. Nauk SSSR Ser. Geogr. Geofiz., 13, 58 Osman, K. T. & Horbury, T. S. 2007, 654, L103 Oughton, S., Dmitruk, P., & Matthaeus, W. H. 2004, Phys. Plasmas, 11, 2214 Oughton, S., Priest, E. R., & Matthaeus, W. H. 1994, J. Fluid Mech., 280, 95 Passot, T. & Sulem, P. L. 2007, Phys. Plasmas, 14, 082502 Perez, J. C. & Boldyrev, S. 2008, ApJ, 672, L61 Perez, J. C. & Boldyrev, S. 2009, Phys. Rev. Lett., 102, 025003 Plunk, G. G., Cowley, S. C., Schekochihin, A. A., & Tatsuno, T. 2009, J. Fluid Mech., submitted (arXiv:0904.0243) Podesta, J. J., Roberts, D. A., & Goldstein, M. L. 2006, J. Geophys. Res., 111, A10109 Pokhotelov, O. A., Sagdeev, R. Z., Balikhin, M. A., Onishchenko, O. G., & Fedun, V. N. 2008, J. Geophys. Res., 113, A04225 Quataert, E. 2003, Astron. Nachr., 324, 435 Quataert, E. & Gruzinov, A. 1999, ApJ, 520, 248 Quataert, E., Dorland, W., & Hammett, G. W. 2002, ApJ, 577, 524 Ramos, J. J. 2005, Phys. Plasmas, 12, 052102 Rappazzo, A. F., Velli, M., Einaudi, G., & Dahlburg, R. B. 2007, ApJ, 657, Rappazzo, A. F., Velli, M., Einaudi, G., & Dahlburg, R. B. 2008, ApJ, 677, Rees, M. J., Begelman, M. C., Blandford, R. D., & Phinney, E. S. 1982, Nature, 295, 17 Rickett, B. J., Kedziora-Chudczer, L., & Jauncey, D. L. 
2002, ApJ, 581, 103 Rincon, F., Schekochihin, A. A., & Cowley, S. C. 2009, MNRAS, submitted Roach, C. M., Applegate, D. J., Connor, J. W., Cowley, S. C., Dorland, W. D., Hastie, R. J., Joiner, N., Saarelma, S., Schekochihin, A. A., Akers, R. J., Brickley, C., Field, A. R., Valovic, M., & MAST Team 2005, Plasma Phys. Control. Fusion, 47, B323 Roberts, D. A. 1990, J. Geophys. Res., 95, 1087 Robinson, D. C. & Rusbridge, M. G. 1971, Phys. Fluids, 14, 2499 Rosenbluth, M. N., Hazeltine, R. D., & Hinton, F. L. 1972, Phys. Fluids, 15, Rosin, M. S., Rincon, F., Schekochihin, A. A., & Cowley, S. C. 2009, MNRAS, submitted Rutherford, P. H. & Frieman, E. A. 1968, Phys. Fluids, 11, 569 Sahraoui, F., Belmont, G., Rezeau, L., Cornilleau-Wehrlin, N., Pinçon, J. L., & Balogh, A. 2006, Phys. Rev. Lett., 96, 075002 Saito, S., Gary, S. P., Li, H., & Narita, Y. 2008, Phys. Plasmas, 15, 102305 Sanders, J. S. & Fabian, A. C. 2006, MNRAS, 371, L65 Schekochihin, A. A. & Cowley, S. C. 2006, Phys. Plasmas, 13, 056501 Schekochihin, A. A. & Cowley, S. C. 2007, in Magnetohydrodynamics: Historical Evolution and Trends, ed. S. Molokov, R. Moreau, & H. K. Moffatt, (Berlin: Springer), 85 (arXiv:astro-ph/0507686) Schekochihin, A. A. & Cowley, S. C. 2009, Phys. Rev. Lett., submitted Schekochihin, A. A., Cowley, S. C., Taylor, S. F., Maron, J. L., & McWilliams, J. C. 2004, ApJ, 612, 276 Schekochihin, A. A., Cowley, S. C., Kulsrud, R. M., Hammett, G. W., & Sharma, P. 2005, ApJ, 629, 139 Schekochihin, A. A., Cowley, S. C., & Dorland, W. 2007, Plasma Phys. Control. Fusion, 49, A195 Schekochihin, A. A., Cowley, S. C., Kulsrud, R. M., Rosin, M. S., & Heinemann, T. 2008a, Phys. Rev. Lett., 100, 081301 Schekochihin, A. A., Cowley, S. C., Dorland, W., Hammett, G. W., Howes, G. G., Plunk, G. G., Quataert, E., & Tatsuno, T. 2008b, Plasma Phys. Control.
Fusion, 50, 124024 Schuecker, P., Finoguenov, A., Miniati, F., Böhringer, H., & Briel, U. G. 2004, A&A, 426, 387 Scott, B. D. 2007, Phys. Plasmas, submitted (arXiv:0710.4899) Shaikh, D. & Zank, G. P. 2005, Phys. Plasmas, 12, 122310 Shakura, N. I. & Sunyaev, R. A. 1973, A&A, 24, 337 Sharma, P., Hammett, G. W., & Quataert, E. 2003, ApJ, 596, 1121 Sharma, P., Hammett, G. W., Quataert, E., & Stone, J. M. 2006, ApJ, 637, Sharma, P., Quataert, E., Hammett, G. W., & Stone, J. M. 2007, ApJ, 667, Shebalin, J. V., Matthaeus, W. H., & Montgomery, D. 1983, J. Plasma Phys., 29, 525 Shukurov, A. 2007, in Mathematical Aspects of Natural Dynamos, eds. E. Dormy & A. M. Soward (London: CRC Press), 313 (arXiv:astro-ph/0411739) Smirnova, T. V., Gwinn, C. R., & Shishov, V. I. 2006, A&A, 453, 601 Smith, C. W., Mullan, D. J., Ness, N. F., Skoug, R. M., & Steinberg, J. 2001, J. Geophys. Res., 106, 18625 Smith, C. W., Hamilton, K., Vasquez, B. J., & Leamon, R. J. 2006, ApJ, 645, Snyder, P. B. & Hammett, G. W. 2001, Phys. Plasmas, 8, 3199 Snyder, P. B., Hammett, G. W., & Dorland, W. 1997, Phys. Plasmas, 4, 3974 Sorriso-Valvo, L., Carbone, V., Bruno, R., & Veltri, P. 2006, Europhys. Lett., 75, 832 Spangler, S. R. & Gwinn, C. R. 1990, ApJ, 353, L29 Stawicki, O., Gary, S. P., & Li, H. 2001, J. Geophys. Res., 106, 8273 Strauss, H. R. 1976, Phys. Fluids, 19, 134 Strauss, H. R. 1977, Phys. Fluids, 20, 1354 Stribling, T., Matthaeus, W. H., & Ghosh, S. 1994, J. Geophys. Res., 99, Subramanian, K., Shukurov, A., & Haugen, N. E. L. 2006, MNRAS, 366, Sugama, H. & Horton, W. 1997, Phys. Plasmas, 4, 405 Sugama, H., Okamoto, M., Horton, W., & Wakatani, M. 1996, Phys. Plasmas, 3, 2379 Tatsuno, T., Dorland, W., Schekochihin, A. A., Plunk, G. G., Barnes, M. A., Cowley, S. C., & Howes, G. G. 2009a, Phys. Rev. Lett., submitted (arXiv:0811.2538) Tatsuno, T., Dorland, W., Schekochihin, A. A., Plunk, G. G., Barnes, M. A., Cowley, S. C., & Howes, G. G. 2009b, Phys. Plasmas, submitted Taylor, G. I. 1938, Proc. R. 
Soc. A, 164, 476 Taylor, J. B. & Hastie, R. J. 1968, Plasma Phys., 10, 479 Trotter, A. S., Moran, J. M., & Rodríguez, L. F. 1998, ApJ, 493, 666 Tu, C.-Y. & Marsch, E. 1995, Space Sci. Rev., 73, 1 Unti, T. W. J. & Neugebauer, M. 1968, Phys. Fluids, 11, 563 Vogt, C. & Enßlin, T. A. 2005, A&A, 434, 67 Voitenko, Yu. M. 1998, J. Plasma Phys., 60, 515 Watanabe, T.-H. & Sugama, H. 2004, Phys. Plasmas, 11, 1476 Wicks, R. T., Chapman, S. C., & Dendy, R. O. 2009, ApJ, 690, 734 Wilkinson, P. N., Narayan, R., & Spencer, R. E. 1994, MNRAS, 269, 67 Woo, R. & Armstrong, S. R. 1979, J. Geophys. Res., 84, 7288 Woo, R. & Habbal, S. R. 1997, ApJ, 474, L139 Yoon, P. H. & Fang, T.-M. 2008, Plasma Phys. Control. Fusion, 50, 085007 Yousef, T., Rincon, F., & Schekochihin, A. 2007, J. Fluid Mech., 575, 111 Yousef, T. A., Schekochihin, A. A., & Nazarenko, S. V. 2009, Phys. Rev. Lett., submitted Zank, G. P. & Matthaeus, W. H., 1992, J. Plasma Phys., 48, 85 Zank, G. P. & Matthaeus, W. H., 1993, Phys. Fluids A, 5, 257 Zweben, S. J., Menyuk, C. R. & Taylor, R. J., 1979, Phys. Rev. Lett., 42,

ABSTRACT

We present a theoretical framework for plasma turbulence in astrophysical plasmas (solar wind, interstellar medium, galaxy clusters, accretion disks). The key assumptions are that the turbulence is anisotropic with respect to the mean magnetic field and frequencies are low compared to the ion cyclotron frequency. The energy injected at the outer scale has to be converted into heat, which ultimately cannot be done without collisions. A kinetic cascade develops that brings the energy to collisional scales both in space and velocity. Its nature depends on the physics of plasma fluctuations. In each of the physically distinct scale ranges, the kinetic problem is systematically reduced to a more tractable set of equations.
In the "inertial range" above the ion gyroscale, the kinetic cascade splits into a cascade of Alfvénic fluctuations, which are governed by the RMHD equations at both the collisional and collisionless scales, and a passive cascade of compressive fluctuations, which obey a linear kinetic equation along the moving field lines associated with the Alfvénic component. In the "dissipation range" between the ion and electron gyroscales, there are again two cascades: the kinetic-Alfvén-wave (KAW) cascade governed by two fluid-like Electron RMHD equations and a passive phase-space cascade of ion entropy fluctuations. The latter cascade brings the energy of the inertial-range fluctuations that was damped by collisionless wave-particle interaction at the ion gyroscale to collisional scales in the phase space and leads to ion heating. The KAW energy is similarly damped at the electron gyroscale and converted into electron heat. Kolmogorov-style scaling relations are derived for these cascades. Astrophysical and space-physical applications are discussed in detail. <|endoftext|><|startoftext|> http://arxiv.org/abs/0704.0045v1

Introduction

There have been many studies of the propagation of water waves over a slope, sometimes also subject to the effects of bottom friction. Many of these works have considered linear waves, or have been numerical simulations in the framework of various nonlinear long-wave model equations. Our interest here is in the propagation of weakly nonlinear long water waves over a slope, simultaneously subject to bottom friction, a combination apparently first considered by Miles (1983a,b), albeit for the special case of a single solitary wave or a periodic wavetrain. An appropriate model equation for this scenario is the variable-coefficient perturbed Korteweg-de Vries (KdV) equation (see Grimshaw 1981, Johnson 1973a,b),

$$A_t + cA_x + \frac{c_x}{2}A + \frac{3c}{2h}AA_x + \frac{ch^2}{6}A_{xxx} = -C_D\frac{c}{h^2}|A|A. \tag{1}$$

Here A(x, t) is the free surface elevation above the undisturbed depth h(x) and c(x) = √(gh(x)) is the linear long wave phase speed. The bottom friction term on the right-hand side is represented by the Chezy law, modelling a turbulent boundary layer. Here CD is a non-dimensional drag coefficient, often assumed to have a value around 0.01 (Miles 1983a,b). Other forms of friction could be used (see, for instance, Grimshaw et al 2003) but the Chezy law seems to be the most appropriate for water waves in a shallow depth. In (1) the first two terms on the left-hand side are the dominant terms, and by themselves describe the propagation of a linear long wave with speed c. The remaining terms on the left-hand side represent, respectively, the effect of varying depth, weakly nonlinear effects and weak linear dispersion. The equation is derived using the usual KdV balance in which the linear dispersion, represented by ∂²/∂x², is balanced by nonlinearity, represented by A. Here we have added to this balance weak inhomogeneity so that c_x/c scales as h²∂³/∂x³, and weak friction so that CD scales with h∂/∂x. Within this basic balance of terms, we can cast (1) into the asymptotically equivalent form

$$A_\tau + \frac{h_\tau}{4h}A + \frac{3}{2h}AA_X + \frac{h}{6g}A_{XXX} = -C_D\frac{g^{1/2}}{h^{3/2}}|A|A, \tag{2}$$

where

$$\tau = \int_0^x \frac{dx'}{c(x')}, \qquad X = \tau - t. \tag{3}$$

Here we have h = h(x(τ)), explicitly dependent on the variable τ which describes evolution along the path of the wave. The governing equation (2) can be cast into several equivalent forms. The one most commonly used is the variable-coefficient KdV equation, obtained here by putting

$$B = (gh)^{1/4}A \tag{4}$$

so that

$$B_\tau + \frac{3}{2g^{1/4}h^{5/4}}BB_X + \frac{h}{6g}B_{XXX} = -C_D\frac{g^{1/4}}{h^{7/4}}|B|B. \tag{5}$$

This form shows that, in the absence of the friction term, i.e. when CD ≡ 0, equation (2) has two integrals of motion with the densities proportional to h^{1/4}A and h^{1/2}A². These are often referred to as laws for the conservation of "mass" and "momentum". However, these densities do not necessarily correspond to the corresponding physical entities.
Indeed, to leading order, the "momentum" density is proportional to the wave action flux, while the "mass" density differs slightly from the actual mass density. This latter issue has been explored by Miles (1979), where it was shown that the difference is smaller than the error incurred in the derivation of equation (4), and is due to reflected waves. Our main concern in this paper is with the behaviour of an undular bore over a slope in the presence of bottom friction, using the perturbed KdV equation (2), where we were originally motivated by the possibility that the behaviour of a tsunami approaching the shore might be modeled in this way. The undular bore solution to the unperturbed KdV equation can be constructed using the well-known Gurevich-Pitaevskii (GP) (1974) approach (see also Fornberg and Whitham 1978). In this approach, the undular bore is represented as a modulated nonlinear periodic wave train. The main feature of this unsteady undular bore is the presence of a solitary wave (which is the limiting wave form of the periodic cnoidal wave) at its leading edge. The original initial-value problem for the KdV equation is then replaced by a certain boundary-value problem for the associated modulation Whitham equations. We note, however, that so far, the simplest, "(x/t)"-similarity solutions of the modulation equations have been used for the modelling of undular bores in various contexts (see Grimshaw and Smyth 1986, Smyth 1987 or Apel 2003 for instance). These solutions, while effectively describing many features of undular bores, are degenerate and fail to capture, even qualitatively, some important effects associated with non-self-similar modulation dynamics. In particular, in the classical GP solution for the resolution of an initial jump in the unperturbed KdV equation, the amplitude of the lead solitary wave in the undular bore is constant (twice the value of the initial jump).
On the other hand, the modulation solution for the undular bore evolving from a general monotonically decreasing initial profile shows that the lead solitary wave amplitude in fact grows with time (Gurevich, Krylov and Mazur 1989; Gurevich, Krylov and El 1992; Kamchatnov 2000). As we shall see, the very possibility of such variations in the modulated solutions of the unperturbed KdV equation has a very important fluid dynamics implication: in a general setting, the undular bore lead solitary wave cannot be treated as an individual KdV solitary wave but rather represents a part of the global nonlinear wave structure. In other words, while at every particular moment of time the lead solitary wave has the spatial profile of the familiar KdV soliton, generally, the temporal dependence of its amplitude cannot be obtained in the framework of single solitary wave perturbation theory. In the unperturbed KdV equation, the growth of the lead solitary wave amplitude is caused by the spatial inhomogeneity of the initial data. Here, however, the presence of a perturbation due to topography and/or friction serves as an alternative and/or additional cause for variation of the lead solitary wave amplitude. Thus, in the present case, the variation in the amplitude will have two components (which generally, of course, cannot be separated because of the nonlinear nature of the problem); one is local, described by the adiabatic perturbation theory for a single solitary wave, and the other one is nonlocal, which in principle requires the study of the full modulation solution. Depending on the relative values of the small parameters associated with the slope, friction and spatial non-uniformity of the initial modulations, we can take into account only one of these components, or a combination of them. The structure of the paper is as follows. 
First, in Section 2, we reformulate the basic model (1) as a constant-coefficient KdV equation perturbed by terms representing topography and friction. Then we derive in Section 3 the associated perturbed Whitham modulation equations using methods recently developed by Kamchatnov (2004). Next, in Section 4, this Whitham system is integrated in the solitary-wave limit. Our purpose here is primarily to obtain the equation of a multiple characteristic, which defines the leading edge of a shoaling undular bore in the case when the modulations due to the combined action of the slope and bottom friction are small compared to the existing spatial modulations due to non-uniformity of the initial data. As a by-product of this integration, we reproduce and extend the known results on the adiabatic variation of a single solitary wave (Miles 1983a,b). Then, in Section 5, we carry out an analogous study of a cnoidal wave, propagating over a gradual slope and subject to friction, a case studied previously by Miles (1983b) but under the restriction of zero mean flow, which is removed here. Finally, in Section 6 we study effects of a gradual slope and bottom friction on the front of an undular bore which represents a modulated cnoidal wave transforming into a system of weakly interacting solitons near its leading edge.

2 Problem formulation

For the purpose of the present paper it is convenient to recast (2) into the standard KdV equation form with constant coefficients, modified by certain perturbation terms. Thus we introduce the new variables

$$U = \frac{3g}{2h^2}A, \qquad T = \frac{1}{6g}\int_0^{\tau} h\,d\tau = \frac{1}{6g^{3/2}}\int_0^{x} h^{1/2}(x)\,dx, \tag{6}$$

so that

$$U_T + 6UU_X + U_{XXX} = R = F(T)U - G(T)|U|U, \tag{7}$$

where

$$F(T) = -\frac{9h_T}{4h}, \qquad G(T) = 4C_D\,\frac{g^{1/2}}{h^{1/2}}. \tag{8}$$

In this form, the governing equation (7) has the structure of the integrable KdV equation on the left-hand side, while the separate effects of the varying depth and the bottom friction are represented by the two terms on the right-hand side.
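As a rough illustration of how the coefficients of the perturbed KdV equation (7) depend on the local depth and bottom slope, the sketch below evaluates them pointwise. It assumes the forms dT/dx = h^{1/2}/(6g^{3/2}), F = −9h_T/(4h) and G = 4C_D(g/h)^{1/2}, together with the scaling U = 3gA/(2h²); the function names, the value g = 9.8 and the sample depth, slope and drag values are illustrative choices, not taken from the paper.

```python
import math

G_ACCEL = 9.8  # assumed value of the gravitational acceleration g


def kdv_coefficients(h, dh_dx, c_d, g=G_ACCEL):
    """Return (dT/dx, F, G) for local depth h, bottom slope dh/dx and drag c_d."""
    dT_dx = math.sqrt(h) / (6.0 * g ** 1.5)   # assumed relation between T and x
    dh_dT = dh_dx / dT_dx                     # chain rule: h_T = h_x / T_x
    F = -9.0 * dh_dT / (4.0 * h)              # topography coefficient
    G = 4.0 * c_d * math.sqrt(g / h)          # friction coefficient
    return dT_dx, F, G


def scaled_amplitude(a, h, g=G_ACCEL):
    """Convert a surface elevation A into the scaled variable U = 3 g A / (2 h^2)."""
    return 3.0 * g * a / (2.0 * h * h)


if __name__ == "__main__":
    # Hypothetical sample point: depth 10, gentle upslope, small drag.
    print(kdv_coefficients(10.0, -0.01, 0.01))
```

On a flat bottom the topography coefficient vanishes, and in shoaling water (dh/dx < 0) it is positive, which is the sign one would expect for a term that amplifies the wave as the depth decreases.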
This structure enables us to use the general theory developed in Kamchatnov (2004) for perturbed integrable systems. For much of the subsequent discussion, it is useful to assume that h(x) = constant, CD = 0 for x < 0 in the original equation (1), which corresponds to F (T ) = G(T ) = 0 for T < 0 in (7). We shall also assume that A = 0 for x > 0 at t = 0, which corresponds to U = 0 for X > 0 on X = τ(T ) (see (6)). Then we shall propose two types of initial-value problem for (1), and correspondingly for (7).

(a) Let a solitary wave of a given amplitude a0 initially propagating over a flat bottom without friction (i.e. a soliton described by an unperturbed KdV equation), enter the variable topography and bottom friction region at t = 0, x = 0 (Fig. 1a).

(b) Let an undular bore of a given intensity propagate over a flat bottom without friction (the corresponding solution of the unperturbed KdV equation will be discussed in Section 6). Let the lead solitary wave of this undular bore have the same amplitude a0 and enter the variable topography and bottom friction region at t = 0, x = 0 (Fig. 1b).

In particular, we shall be interested in the comparison of the slow evolution of these two, initially identical, solitary waves in the two different problems described above. The expected essential difference in the evolution is due to the fact that the lead solitary wave in the undular bore is generally not independent of the remaining part of the bore and can exhibit features that cannot be captured by a local perturbation analysis. The well-known example of such a behaviour, when a solitary wave is constrained by the condition of being a part of a global nonlinear wave structure, is provided by the undular bore solution of the KdV-Burgers (KdV-B) equation

$$u_t + 6uu_x + u_{xxx} = \mu u_{xx}, \qquad \mu \ll 1. \tag{9}$$

Figure 1: Isolated solitary wave (a) and undular bore (b) entering the variable topography/bottom friction region; the depth profile h(x) is marked in both panels.
Indeed, the undular bore solution of the KdV-B equation (9) is known to have a solitary wave at its leading edge (see Johnson 1970; Gurevich & Pitaevskii 1987; Avilov, Krichever & Novikov 1987) and this solitary wave: (a) is asymptotically close to a soliton solution of the unperturbed KdV equation; and (b) has the amplitude, say a0, that is constant in time. At the same time, it is clear that if one takes an isolated KdV soliton of the same amplitude a0 as initial data for the KdV-Burgers equation it would damp with time due to dissipation. The physical explanation of such a drastic difference in the behaviour of an isolated soliton and a lead solitary wave in the undular bore for the same weakly dissipative KdV-B equation is that the action of weak dissipation on an expanding undular bore is twofold: on the one hand, the dissipation tends to decrease the amplitude of the wave locally but, on the other hand, it “squeezes” the undular bore so that the interaction (i.e. momentum exchange) between separate solitons within the bore becomes stronger than in the absence of dissipation and this acts as the amplitude increasing factor. The additional momentum is extracted from the upstream flow with a greater depth (see Benjamin and Lighthill 1954). As a result, in the case of the KdV-B equation, an equilibrium non-zero value for the lead solitary wave amplitude in the undular bore is established. Of course, for other types of dissipation, a stationary value of the lead soliton amplitude would not necessarily exist, but in general, due to the expected increase of the soliton interactions near the leading edge, the amplitude of the lead soliton of the undular bore would decay slower than that of an isolated soliton. Indeed, the presence here of variable topography as well can result in an additional “nonlocal” amplitude growth. 
While the problem (a) can be solved using traditional perturbation analysis for a single solitary wave, which leads to an ordinary differential equation along the solitary wave path (see Miles 1983a,b), the undular bore evolution problem (b) requires a more general approach which can be developed on the basis of Whitham’s modulation theory leading to a system of three nonlinear hyperbolic partial differential equations of the first order. Since the Whitham method, being equivalent to a nonlinear multiple scale perturbation procedure, contains the adiabatic theory of slow evolution of a single solitary wave as a particular (albeit singular) limit, it is instructive for the purposes of this paper to treat both problems (a) and (b) using the general Whitham theory.

3 Modulation equations

The original Whitham method (Whitham 1965, 1974) was developed for conservative constant-coefficient nonlinear dispersive equations and is based on the averaging of appropriate conservation laws of the original system over the period of a single-phase periodic travelling wave solution. The resulting system of quasi-linear equations describes the slow evolution of the modulations (i.e. of the mean value, the wavenumber, the amplitude etc.) of the periodic travelling wave. Here, that approach is extended to the perturbed KdV equation (7) following the general approach of Kamchatnov (2004), which extends earlier results for certain specific cases (see Gurevich and Pitaevskii (1987, 1991), Avilov, Krichever and Novikov (1987) and Myint and Grimshaw (1995) for instance). We suppose that the evolution of the nonlinear wave is adiabatically slow, that is, the wave can be locally represented as a solution of the corresponding unperturbed KdV equation (i.e. (7) with zero on the right-hand side) with its parameters slowly varying with space and time.
The one-phase periodic solution of the KdV equation can be written in the form

$$U(X,T) = \lambda_3 - \lambda_1 - \lambda_2 - 2(\lambda_3-\lambda_2)\,\mathrm{sn}^2\!\left(\sqrt{\lambda_3-\lambda_1}\,\theta,\, m\right), \tag{10}$$

where sn(y,m) is the Jacobi elliptic sine function, λ1 ≤ λ2 ≤ λ3 are parameters and the phase variable θ and the modulus m are given by

$$\theta = X - VT, \qquad V = -2(\lambda_1+\lambda_2+\lambda_3), \tag{11}$$

$$m = \frac{\lambda_3-\lambda_2}{\lambda_3-\lambda_1}, \tag{12}$$

and

$$L = \int_{\lambda_2}^{\lambda_3}\frac{d\mu}{\sqrt{-P(\mu)}} = \frac{2K(m)}{\sqrt{\lambda_3-\lambda_1}}, \tag{13}$$

where K(m) is the complete elliptic integral of the first kind, L is the “wavelength” along the X-axis (which is actually a retarded time rather than a true spatial co-ordinate). Here we have used the representation of the basic ordinary differential equation for the KdV travelling wave solution (10) in the form (see Kamchatnov (2000) for a general motivation behind this representation)

$$\frac{d\mu}{d\theta} = 2\sqrt{-P(\mu)}, \tag{14}$$

where

$$\mu = \tfrac{1}{2}(U + s_1), \qquad s_1 = \lambda_1+\lambda_2+\lambda_3, \tag{15}$$

$$P(\mu) = \prod_{i=1}^{3}(\mu-\lambda_i) = \mu^3 - s_1\mu^2 + s_2\mu - s_3, \tag{16}$$

that is, the solution (10) is parameterized by the zeroes λ1, λ2, λ3 of the polynomial P (µ). In a modulated wave, the parameters λ1, λ2, λ3 are allowed to be slow functions of X and T , and their evolution is governed by the Whitham equations. For the unperturbed KdV equation, the evolution of the modulation parameters is due to a spatial non-uniformity of the initial distributions for λj, j = 1, 2, 3 and the typical spatio-temporal scale of the modulation variations is determined by the scale of the initial data. In the case of the perturbed KdV equation (7), the evolution of the parameters λ1, λ2, λ3 is caused not only by their initial spatial non-uniformity, but also by the action of the weak perturbation, so that, generally, at least two independent spatio-temporal scales for the modulations can be involved. However, at this point we shall not introduce any scale separation within the modulation theory and derive general perturbed Whitham equations assuming that the typical values of F (T ) and G(T ) are O(∂λj/∂T, ∂λj/∂X) within the modulation theory.
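The relations between the parameters λ1 ≤ λ2 ≤ λ3 and the modulus, velocity and “wavelength” of the cnoidal wave can be evaluated with a few lines of code. The sketch below assumes m = (λ3 − λ2)/(λ3 − λ1), V = −2(λ1 + λ2 + λ3) and L = 2K(m)/√(λ3 − λ1), and computes K(m) by the arithmetic-geometric mean; the function names and the sample values of λj are hypothetical.

```python
import math


def ellipk(m):
    """Complete elliptic integral of the first kind K(m), parameter m = k^2,
    computed with the arithmetic-geometric mean (AGM) iteration."""
    a, b = 1.0, math.sqrt(1.0 - m)
    for _ in range(40):
        if abs(a - b) < 1e-15:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def cnoidal_parameters(l1, l2, l3):
    """Return (m, V, L) for a cnoidal wave with parameters l1 < l2 <= l3."""
    m = (l3 - l2) / (l3 - l1)                  # elliptic modulus
    V = -2.0 * (l1 + l2 + l3)                  # velocity in the X-T frame
    L = 2.0 * ellipk(m) / math.sqrt(l3 - l1)   # "wavelength" along X
    return m, V, L


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(cnoidal_parameters(-1.0, 0.0, 1.0))
```

As λ2 → λ1 the modulus tends to 1 and K(m) grows logarithmically, so L diverges, which is the solitary-wave limit; as λ2 → λ3 the modulus tends to 0 and L tends to π/√(λ3 − λ1).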
It is instructive to first introduce the Whitham equations for the perturbed KdV equation (7) using the traditional approach of averaging the (perturbed) conservation laws. To this end, we introduce the averaging over the period (13) of the cnoidal wave (10) by

$$\langle \mathcal{F}\rangle = \frac{1}{L}\oint \mathcal{F}\,d\theta = \frac{1}{L}\int_{\lambda_2}^{\lambda_3}\frac{\mathcal{F}\,d\mu}{\sqrt{-P(\mu)}}. \tag{17}$$

In particular,

$$\langle U\rangle = 2\langle\mu\rangle - s_1 = 2(\lambda_3-\lambda_1)\frac{E(m)}{K(m)} + \lambda_1 - \lambda_2 - \lambda_3, \tag{18}$$

$$\langle U^2\rangle = 8\left[-\frac{s_1}{6}(\lambda_3-\lambda_1)\frac{E(m)}{K(m)} - \frac{1}{3}s_1\lambda_1 + \frac{1}{6}(\lambda_1^2-\lambda_2\lambda_3)\right] + s_1^2, \tag{19}$$

where E(m) is the complete elliptic integral of the second kind. Now, one represents the KdV equation (7) in the form of the perturbed conservation laws

$$\frac{\partial P_j}{\partial T} + \frac{\partial Q_j}{\partial X} = R_j, \qquad j = 1, 2, 3, \qquad R_j \ll 1, \tag{20}$$

where Pj and Qj are the standard expressions for the conserved densities (Kruskal integrals) and “fluxes” of the unperturbed KdV equation. Just as in the Whitham (1965) theory for unperturbed dispersive systems, the number of conservation laws required is equal to the number of free parameters in the travelling wave solution, which is three in the present case. Next, one applies the averaging (17) to the system (20) to obtain (see Dubrovin and Novikov 1989)

$$\frac{\partial\langle P_j\rangle}{\partial T} + \frac{\partial\langle Q_j\rangle}{\partial X} = \langle R_j\rangle, \qquad j = 1, 2, 3. \tag{21}$$

The system (21) describes slow evolution of the parameters λj in the cnoidal wave solution (10). Along with this perturbed conservative form of the Whitham equations, we introduce the wave conservation law, which is a general condition for the existence of slowly modulated single-phase travelling wave solutions (10) (see for instance Whitham 1974) and must be consistent with the modulation system (21). This conservation law has the form

$$\frac{\partial k}{\partial T} + \frac{\partial\omega}{\partial X} = 0, \tag{22}$$

where

$$k = \frac{2\pi}{L}, \qquad \omega = kV \tag{23}$$

are the “wavenumber” and the “frequency” respectively (we have put quotation marks here because the actual wavenumber and frequency related to the physical variables x, t are different quantities from those in (23), but are related through the transformations (3, 6)).
The wave conservation law (22) can be introduced instead of any of three inhomogeneous averaged conservation laws comprising the Whitham system (21). It is known that the Whitham system for the homogeneous constant-coefficient KdV equation can be represented in diagonal (Riemann) form (Whitham 1965, 1974) by an appropriate choice of the three parameters characterising the periodic travelling wave solution. In fact, in our solution (10) the parameters λj have already been chosen so that they coincide with the Riemann invariants of the unperturbed KdV modulation system. Introducing them explicitly into the perturbed system (21) we obtain (see Kamchatnov 2004)

$$\frac{\partial\lambda_i}{\partial T} + v_i\frac{\partial\lambda_i}{\partial X} = \frac{L}{\partial L/\partial\lambda_i}\,\frac{\langle(2\lambda_i - s_1 - U)R\rangle}{4\prod_{j\neq i}(\lambda_i-\lambda_j)}, \qquad i = 1, 2, 3, \tag{24}$$

where R is the perturbation term on the right-hand side of the KdV equation (7) and

$$v_i = -2\sum_{j}\lambda_j + \frac{2L}{\partial L/\partial\lambda_i}, \qquad i = 1, 2, 3, \tag{25}$$

are the Whitham characteristic velocities corresponding to the unperturbed KdV equation. It should be noted that the straightforward realisation of the above lucid general algorithm for obtaining the perturbed modulation system in diagonal form is quite a laborious task. In fact, to derive system (24), the so-called finite-gap integration method incorporating the integrable structure of the unperturbed KdV equation has been used. The modulation system (24) in a more particular form corresponding to specific choices of the perturbation term was obtained by Myint and Grimshaw (1995) using a multiple-scale perturbation expansion. In that latter setting, the wave conservation law (22) is an inherent part of the construction, while in the averaging approach used here, it can be obtained as a consequence of the system (24). To obtain an explicit representation of the Whitham equations for the present case of equation (7), we must substitute the perturbation R from the right-hand side of (7) and perform the integration (17) with U given by (10).
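From now on, we are going to consider only the flows where U ≥ 0 so that the perturbation term assumes the form

$$R(U) = F(T)U - G(T)U^2. \tag{26}$$

Substituting (26) into (24) we obtain, after some detailed calculations (see Appendix), the perturbed Whitham system in the form

$$\frac{\partial\lambda_i}{\partial T} + v_i\frac{\partial\lambda_i}{\partial X} = \rho_i = C_i\left[F(T)A_i - G(T)B_i\right], \qquad i = 1, 2, 3, \tag{27}$$

where

$$C_1 = \frac{1}{E}, \qquad C_2 = \frac{1}{E-(1-m)K}, \qquad C_3 = \frac{1}{E-K}; \tag{28}$$

$$A_1 = \tfrac{1}{3}(5\lambda_1-\lambda_2-\lambda_3)E + \tfrac{2}{3}(\lambda_2-\lambda_1)K,$$
$$A_2 = \tfrac{1}{3}(5\lambda_2-\lambda_1-\lambda_3)E - (\lambda_2-\lambda_1)\left(\tfrac{1}{3} + \frac{\lambda_2}{\lambda_3-\lambda_1}\right)K, \tag{29}$$
$$A_3 = \tfrac{1}{3}(5\lambda_3-\lambda_1-\lambda_2)E - \left[\tfrac{1}{3}(\lambda_2-\lambda_1) + \lambda_3\right]K;$$

$$B_1 = \tfrac{1}{15}(-27\lambda_1^2 - 7\lambda_2^2 - 7\lambda_3^2 + 2\lambda_1\lambda_2 + 2\lambda_1\lambda_3 + 22\lambda_2\lambda_3)E - \tfrac{4}{15}(\lambda_2-\lambda_1)(3\lambda_1+\lambda_2+\lambda_3)K,$$
$$B_2 = \tfrac{1}{15}(-7\lambda_1^2 - 27\lambda_2^2 - 7\lambda_3^2 + 2\lambda_1\lambda_2 + 22\lambda_1\lambda_3 + 2\lambda_2\lambda_3)E + \tfrac{1}{15}\,\frac{\lambda_2-\lambda_1}{\lambda_3-\lambda_1}\,(7\lambda_1^2 + 15\lambda_2^2 + 11\lambda_3^2 - 6\lambda_1\lambda_2 - 18\lambda_1\lambda_3 + 6\lambda_2\lambda_3)K, \tag{30}$$
$$B_3 = \tfrac{1}{15}(-7\lambda_1^2 - 7\lambda_2^2 - 27\lambda_3^2 + 22\lambda_1\lambda_2 + 2\lambda_1\lambda_3 + 2\lambda_2\lambda_3)E + \tfrac{1}{15}(7\lambda_1^2 + 11\lambda_2^2 + 15\lambda_3^2 - 18\lambda_1\lambda_2 - 6\lambda_1\lambda_3 + 6\lambda_2\lambda_3)K;$$

and the characteristic velocities are:

$$v_1 = -2\sum_j\lambda_j + \frac{4(\lambda_3-\lambda_1)(1-m)K}{E},$$
$$v_2 = -2\sum_j\lambda_j - \frac{4(\lambda_3-\lambda_2)(1-m)K}{E-(1-m)K}, \tag{31}$$
$$v_3 = -2\sum_j\lambda_j + \frac{4(\lambda_3-\lambda_2)K}{E-K}.$$

The equations (27) – (31) provide a general setting for studying the nonlinear modulated wave evolution over variable topography with bottom friction. In the absence of the perturbation terms (i.e. when F (T ) ≡ 0, G(T ) ≡ 0), the system (27), (31) indeed coincides with the original Whitham equations (Whitham 1965) for the integrable KdV dynamics. In that case the variables λ1, λ2, λ3 become Riemann invariants, so in this general (perturbed) case we shall call them Riemann variables. It is important to study the structure of the perturbed Whitham equations (27) – (31) in two limiting cases when the underlying cnoidal wave degenerates into (i) a small-amplitude sinusoidal wave (linear limit), when λ2 = λ3 (m = 0), and (ii) into a solitary wave when λ2 = λ1 (m = 1).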
Since in both these limits the oscillations do not contribute to the mean flow (they are infinitely small in the linear limit and the distance between them becomes infinitely long in the solitary wave limit) one should expect that in both cases one of the Whitham equations will transform into the dispersionless limit of the original perturbed KdV equation (7), i.e.

$$U_T + 6UU_X = F(T)U - G(T)U^2. \tag{32}$$

Indeed, using formulae (27) – (31) we obtain for m = 0:

$$\lambda_2 = \lambda_3, \qquad \frac{\partial\lambda_1}{\partial T} - 6\lambda_1\frac{\partial\lambda_1}{\partial X} = \lambda_1F + \lambda_1^2G, \qquad \frac{\partial\lambda_3}{\partial T} + (6\lambda_1-12\lambda_3)\frac{\partial\lambda_3}{\partial X} = \lambda_1F + \lambda_1^2G. \tag{33}$$

Similarly, for m = 1, one has

$$\lambda_2 = \lambda_1, \qquad \frac{\partial\lambda_1}{\partial T} - (4\lambda_1+2\lambda_3)\frac{\partial\lambda_1}{\partial X} = \tfrac{1}{3}(4\lambda_1-\lambda_3)F + \tfrac{1}{15}(7\lambda_3^2 - 24\lambda_1\lambda_3 + 32\lambda_1^2)G, \qquad \frac{\partial\lambda_3}{\partial T} - 6\lambda_3\frac{\partial\lambda_3}{\partial X} = \lambda_3F + \lambda_3^2G. \tag{34}$$

We see that, in both cases, one of the Riemann variables (taken with inverted sign) coincides with the solution of the dispersionless equation (32) (recall that in the derivation of the Whitham equations we assumed U ≥ 0 everywhere), namely U = 〈U〉 = −λ1 when λ2 = λ3 (m = 0) and U = 〈U〉 = −λ3 when λ2 = λ1 (m = 1). To conclude this section, we present expressions for the physical wave parameters such as the surface elevation wave amplitude a, mean elevation 〈A〉, speed and wavenumber in terms of the modulation solution λj(X, T ). Using (6) and (10) we obtain for the wave amplitude (peak to trough) and the mean elevation

$$a = \frac{4h^2}{3g}(\lambda_3-\lambda_2), \qquad \langle A\rangle = \frac{2h^2}{3g}\langle U\rangle, \tag{35}$$

where the dependence of 〈U〉 on λj(X, T ), j = 1, 2, 3 is given by (18) and X = X(x, t), T = T (x, t) by (3, 6). In order to obtain the physical wavenumber κ and the frequency Ω we first note that the phase function θ(X, T ) defined in (11) is replaced by a more general expression defined so that k = θX and kV = −θT are the “wavenumber” and “frequency” in the X −T coordinate system. Then we define the physical phase function Θ(x, t) = θ(X, T ) so that we get

$$\kappa = \Theta_x, \qquad \Omega = -\Theta_t. \tag{36}$$

It now follows that

$$\kappa = \frac{k}{\sqrt{gh}}\left(1 - \frac{hV}{6g}\right), \qquad \Omega = k, \qquad \text{and} \qquad \frac{\Omega}{\kappa} = \frac{\sqrt{gh}}{1 - hV/6g}. \tag{37}$$

Note that the physical frequency is the “wavenumber” in the X − T coordinate system, and that the physical phase speed is Ω/κ.
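The conversion from the “wavenumber” k and velocity V of the X−T frame to physical quantities is simple enough to sketch directly. The snippet below assumes the relations κ = (k/√(gh))(1 − hV/6g) and Ω = k, so that the phase speed is √(gh)/(1 − hV/6g); the function name, the choice g = 9.8 and the sample inputs are illustrative only.

```python
import math


def physical_wave(k, V, h, g=9.8):
    """Map (k, V) in the X-T frame to the physical wavenumber kappa,
    frequency Omega and phase speed Omega/kappa at local depth h."""
    factor = 1.0 - h * V / (6.0 * g)
    if factor <= 0.0:
        # The wave would no longer be right-going: the model does not apply.
        raise ValueError("h*V >= 6*g: outside the range of validity")
    kappa = k / math.sqrt(g * h) * factor
    omega = k
    return kappa, omega, omega / kappa


if __name__ == "__main__":
    # Hypothetical values: V = 0 recovers the linear long-wave speed sqrt(g h).
    print(physical_wave(1.0, 0.0, 10.0))
    print(physical_wave(1.0, 0.5, 10.0))
```

For V = 0 the phase speed equals the linear long-wave speed, and a positive V increases it, in line with the expectation that nonlinear waves of elevation travel faster than linear ones.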
Since the validity of the KdV model (1) requires inter alia that the wave be right-going it follows from this expression that the modulation solution remains valid only when hV < 6g. Of course, the validity of (1) also requires that the amplitude remains small, and this would normally also ensure that V remains small.

4 Modulation solution in the solitary wave limit

In this section, we shall integrate the perturbed modulation system (27) along the multiple characteristic corresponding to the merging of two Riemann variables λ2 and λ1. As we shall see later, this characteristic specifies the motion of the leading edge of the shoaling undular bore in the case when the perturbations due to variable topography and bottom friction can be considered as small compared with the existing spatial modulations within the bore. At the same time, as the case λ2 = λ1 (i.e. m = 1) corresponds to the solitary wave limit in the travelling wave solution (10), our results here are expected to be consistent with the results from the traditional perturbation approach to the adiabatic variation of a solitary wave due to topography and bottom friction (see Miles 1983a,b). In the limit m → 1 the periodic solution (10) of the KdV equation goes over to its solitary wave solution

$$U(X,T) = U_0\,\mathrm{sech}^2\!\left[\sqrt{\lambda_3-\lambda_1}\,(X - V_sT)\right] - \lambda_3, \tag{38}$$

where

$$U_0 = 2(\lambda_3-\lambda_1), \qquad V_s = -(4\lambda_1 + 2\lambda_3) \tag{39}$$

are the solitary wave amplitude and “velocity” respectively. The solution (38) depends on two parameters λ1 and λ3 whose adiabatic slow evolution is governed by the reduced modulation system (34). It is important that the second equation in this system is decoupled from the first one. Hence, evolution of the pedestal −λ3 on which the solitary wave rides can be found from the solution of this dispersionless equation by the method of characteristics. When λ3(X, T ) is known, evolution of the parameter λ1 can be found from the solution of the first equation (34).
As a result, we arrive at a complete description of adiabatic slow evolution of the solitary wave parameters taking account of its interaction with the (given) pedestal. However, it is important to note here that while this description of the adiabatic evolution of a solitary wave is complete as far as the solitary wave itself is concerned, it fails to describe the evolution of a trailing shelf, which is needed to conserve total “mass” (see, for instance, Johnson 1973b, Grimshaw 1979 or Grimshaw 2006). This trailing shelf has a very small amplitude, but a very large length scale, and hence can carry the same order of “mass” as the solitary wave. But note that the “momentum” of the trailing shelf is much smaller than that of the solitary wave, whose adiabatic deformation is in fact governed to leading order by conservation of “momentum”, or more precisely, by conservation of wave action flux (strictly speaking, conservation only in the absence of friction). The situation simplifies if the solitary wave propagates into a region of still water so that there is no pedestal ahead of the wave, that is λ3 = 0 in X > τ(T ). But then, since λ3 = 0 is an exact solution of the degenerate Whitham system (34) for this solitary wave configuration, we can put λ3 = 0 both in the solitary wave solution,

$$U(X,T) = -2\lambda_1\,\mathrm{sech}^2\!\left[\sqrt{-\lambda_1}\,(X - V_sT)\right], \qquad V_s = -4\lambda_1, \tag{40}$$

and in equation (34) for the parameter λ1 to obtain

$$\frac{\partial\lambda_1}{\partial T} - 4\lambda_1\frac{\partial\lambda_1}{\partial X} = \frac{4}{3}F\lambda_1 + \frac{32}{15}G\lambda_1^2. \tag{41}$$

As we see, the solitary wave moves with the instant velocity

$$\frac{dX}{dT} = -4\lambda_1, \tag{42}$$

and the parameter λ1 changes with T along the solitary wave trajectory according to the ordinary differential equation

$$\frac{d\lambda_1}{dT} = \frac{4}{3}F(T)\lambda_1 + \frac{32}{15}G(T)\lambda_1^2. \tag{43}$$

It can be shown that equation (43) is consistent with the equation for the solitary wave half-width parameter γ = √(−λ1) obtained by the traditional perturbation approach (see Grimshaw (1979) for instance). Next, we re-write equation (43) in terms of the original independent x-variable.
For that, we find from (6) that

$$dT = \frac{h^{1/2}}{6g^{3/2}}\,dx \tag{44}$$

and

$$F = -\frac{27}{2}\left(\frac{g}{h}\right)^{3/2}\frac{dh}{dx}, \qquad G = 4C_D\left(\frac{g}{h}\right)^{1/2}. \tag{45}$$

Then substituting these expressions into (43) yields the equation

$$\frac{d\lambda_1}{dx} = -3\,\frac{1}{h}\frac{dh}{dx}\,\lambda_1 + \frac{64}{45}\frac{C_D}{g}\,\lambda_1^2, \tag{46}$$

which can be easily integrated to give

$$\frac{1}{\lambda_1} = h^3\left(-C_0 - \frac{64C_D}{45g}\int_0^x\frac{dx}{h^3(x)}\right), \tag{47}$$

where C0 is an integration constant and x = 0 is a reference point where h = h0. According to (40), U0 = −2λ1 is the amplitude of the soliton expressed in terms of variable U(X, T ). Returning to the original surface displacement A(x, t) by means of (6) and denoting C0 = 4/(3ga0h0), we find the dependence of the surface elevation soliton amplitude a = (2h²/3g)U0 on x in the form

$$a = a_0\,\frac{h_0}{h}\left(1 + \frac{16}{15}C_Da_0h_0\int_0^x\frac{dx}{h^3(x)}\right)^{-1}, \tag{48}$$

where a0 is the solitary wave amplitude at x = 0. We note that for CD = 0 this reduces to the classical Boussinesq (1872) result a ∼ h⁻¹, while for h = h0 it reduces to the well-known algebraic decay law a ∼ 1/(1 + constant x) due to Chezy friction. Miles (1983a,b) obtained this expression for a linear depth variation, although we note that there is a factor of 2 difference from (48) (in Miles (1983a,b) the factor 16CD/15 is 8CD/15). The trajectory of the soliton can be now found from (42) and (47):

$$\int_0^x\frac{dx'}{\sqrt{gh(x')}} - t = \frac{a_0h_0}{2\sqrt{g}}\int_0^x dx'\,h^{-5/2}(x')\left(1 + \frac{16}{15}C_Da_0h_0\int_0^{x'}\frac{dx''}{h^3(x'')}\right)^{-1}. \tag{49}$$

This expression determines implicitly the dependence of x on t along the solitary wave path and provides the desired equation for the multiple characteristic of the modulation system for the case m = 1. It is instructive to derive an explicit expression for the solitary wave speed by computing the derivative dx/dt from (49), or more simply, directly from (37),

$$\frac{dx}{dt} = \frac{\sqrt{gh}}{1 - a/2h}. \tag{50}$$

The formula (50) yields the restriction for the relative amplitude γ = a/h < 2 which is clearly beyond the applicability of the KdV approximation (wave breaking occurs already at γ = 0.7 (see Whitham 1974)). In the frictionless case (CD = 0) equation (48) gives a/h = a0h0/h², and so the expression (50) for the speed must fail as h → 0.
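The amplitude law for the solitary wave lends itself to a quick numerical check for an arbitrary depth profile. The sketch below assumes the form a = a0 (h0/h) / (1 + (16/15) CD a0 h0 ∫₀ˣ dx′/h³) and evaluates the integral with a composite trapezoid rule; the function name, the number of quadrature panels and the sample profiles are illustrative choices rather than anything prescribed in the text.

```python
def soliton_amplitude(x, depth, a0, c_d, n=2000):
    """Solitary wave amplitude at position x for a depth profile depth(x).

    Assumed law: a = a0*(h0/h) / (1 + (16/15)*c_d*a0*h0 * I(x)),
    where I(x) is the integral of 1/h^3 from 0 to x (trapezoid rule, n panels).
    """
    h0 = depth(0.0)
    dx = x / n
    integral = 0.0
    for i in range(n):
        left, right = depth(i * dx), depth((i + 1) * dx)
        integral += 0.5 * dx * (1.0 / left ** 3 + 1.0 / right ** 3)
    friction = 1.0 + 16.0 / 15.0 * c_d * a0 * h0 * integral
    return a0 * h0 / depth(x) / friction


if __name__ == "__main__":
    slope = lambda x: 10.0 - 0.01 * x       # hypothetical linear beach
    print(soliton_amplitude(500.0, slope, 1.0, 0.0))    # frictionless shoaling
    print(soliton_amplitude(500.0, slope, 1.0, 0.01))   # shoaling with friction
```

Without friction the routine reproduces the a ∼ 1/h growth, and on a flat bottom it reproduces the algebraic decay a0/(1 + constant × x), so the two limiting cases quoted in the text can be used as sanity checks.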
It is interesting to note that this failure of the KdV model as h → 0 due to appearance of infinite (and further negative!) solitary wave speeds is not apparent from the expression (48) for the solitary wave amplitude, and the implication is that the model cannot be continued as h → 0. Curiously this restriction of the KdV model seems never to have been noticed before in spite of numerous works on this subject. Note that taking account of bottom friction leads to a more complicated formula for the solitary wave speed as a function of h but the qualitative result remains the same. It is straightforward to show from (46) or (48) that

$$\frac{a_x}{a} = -\frac{h_x}{h} - \frac{16}{15}\frac{C_Da_0h_0}{h^3}\left(1 + \frac{16}{15}C_Da_0h_0\int_0^x\frac{dx}{h^3}\right)^{-1}. \tag{51}$$

It follows immediately that for a wave advancing into increasing depth (hx > 0), the amplitude decreases due to a combination of increasing depth and bottom friction. However, for a wave advancing into decreasing depth, there is a tendency to increase the amplitude due to the depth decrease, but to decrease the amplitude due to bottom friction. Hence whether or not the amplitude increases is determined by which of these effects is larger, and this in turn is determined by the slope, the depth, and the consolidated drag parameter CDa0/h0. To illustrate, let us consider the bottom topography in the form

$$h(x) = h_0^{1-\alpha}(h_0 - \delta x)^{\alpha}, \qquad \alpha > 0, \tag{52}$$

which satisfies the condition h(0) = h0; the parameter δ characterizes the slope of the bottom. In this case the formula (48) becomes

$$a = a_0\,\frac{h_0}{h}\left\{1 + \frac{16C_Da_0}{15\,\delta(3\alpha-1)h_0}\left[\left(\frac{h_0}{h}\right)^{(3\alpha-1)/\alpha} - 1\right]\right\}^{-1} \tag{53}$$

if α ≠ 1/3. One can see now that if α < 1/3, then the bottom friction term is relatively unimportant due to the smallness of CD. Of course, for this case we again recover the Boussinesq result, now slightly modified,

$$a \approx a_0\,\frac{h_0}{h}\left(1 + \frac{16C_Da_0h_0}{15\,\delta(1-3\alpha)h_0^2}\right)^{-1}, \qquad 0 < \alpha < \frac{1}{3}, \qquad h \ll h_0. \tag{54}$$

Of course, this result is impractical in the KdV context as the KdV approximation used here requires the ratio a/h to remain small. If α > 1/3, we now obtain the asymptotic formula

$$a \approx \frac{15(3\alpha-1)\delta}{16C_D}\,h_0\left(\frac{h}{h_0}\right)^{(2\alpha-1)/\alpha}, \qquad h \ll h_0, \tag{55}$$

which is independent of the initial amplitude a0.
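For the power-law topography h(x) = h0^{1−α}(h0 − δx)^α the integral of 1/h³ can be done in closed form, which makes it easy to compare the full amplitude law with its shallow-water asymptote. The sketch below assumes the closed form a = a0 (h0/h) {1 + q [(h0/h)^{(3α−1)/α} − 1]}⁻¹ with q = 16 CD a0 / (15 δ (3α − 1) h0), and the asymptote a ≈ [15(3α − 1)δ/(16 CD)] h0 (h/h0)^{(2α−1)/α} for α > 1/3; function names and sample values are hypothetical, and the comparison is purely a check of the formulas, not of the physical validity of the KdV model at such depths.

```python
def amplitude_power_law(h, h0, a0, c_d, delta, alpha):
    """Closed-form amplitude for the power-law bottom profile (alpha != 1/3)."""
    p = (3.0 * alpha - 1.0) / alpha
    q = 16.0 * c_d * a0 / (15.0 * delta * (3.0 * alpha - 1.0) * h0)
    return a0 * (h0 / h) / (1.0 + q * ((h0 / h) ** p - 1.0))


def amplitude_shallow_limit(h, h0, c_d, delta, alpha):
    """Leading-order amplitude for h << h0 when alpha > 1/3 (no a0 dependence)."""
    prefactor = 15.0 * (3.0 * alpha - 1.0) * delta / (16.0 * c_d)
    return prefactor * h0 * (h / h0) ** ((2.0 * alpha - 1.0) / alpha)


if __name__ == "__main__":
    # Hypothetical linear slope (alpha = 1) with delta = c_d = 0.01.
    for depth in (10.0, 1.0, 0.1, 0.01):
        full = amplitude_power_law(depth, 10.0, 1.0, 0.01, 0.01, 1.0)
        asym = amplitude_shallow_limit(depth, 10.0, 0.01, 0.01, 1.0)
        print(depth, full, asym)
```

The sign of the exponent (2α − 1)/α in the asymptote decides whether the amplitude eventually grows, stays constant or decays as h → 0, so the threshold α = 1/2 can be read off directly from the second function.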
This expression is consistent with the small-amplitude KdV approximation as long as (3α− 1)δ/CD is order unity. Simple inspection of (55) shows that the solitary wave amplitude

• increases as h → 0 if 1/3 < α < 1/2
• is constant as h → 0 if α = 1/2
• decreases as h → 0 if α > 1/2

Thus for 1/3 < α < 1/2, as for the case α < 1/3, the amplitude will increase as the depth decreases, in spite of the presence of (sufficiently small) friction. However, for α > 1/2, even though there is usually some initial growth in the amplitude, eventually even small bottom friction will take effect and the amplitude decreases to zero. We note that if α = 1/3 then the integral ∫ h⁻³ dx in (48) diverges logarithmically as h → 0, which just slightly modifies the result (55) for h ≪ h0 and implies growth of the amplitude ∝ 1/(h |ln h|) as h → 0. Of particular interest is the case α = 1. In that case formula (53) becomes

$$a = a_0\,\frac{h_0}{h}\left[1 + \frac{8C_Da_0}{15\,\delta h_0}\left(\frac{h_0^2}{h^2} - 1\right)\right]^{-1} \tag{56}$$

and

$$a \approx \frac{15\,\delta}{8C_D}\,h, \qquad h \ll h_0. \tag{57}$$

These expressions (56, 57) were obtained by Miles (1983a,b) using wave energy conservation (as above, note, however, that in Miles (1983a,b) the numerical coefficient is 15/4 rather than 15/8). Thus, these results obtained from the Whitham theory are indeed consistent, at the leading order, with the traditional perturbation approach for a slowly-varying solitary wave.

5 Adiabatic deformation of a cnoidal wave

Next we consider a modulated cnoidal wave (10) in the special case when the modulation does not depend on X . While this case is, strictly speaking, impractical as it assumes there is an infinitely long wavetrain, it can nevertheless provide some useful insights into the qualitative effects of gradual slope and friction on undular bores which are locally represented as cnoidal waves. In the absence of friction, the slow dependence of the cnoidal wave parameters on T was obtained by Ostrovsky & Pelinovsky (1970, 1975) and Miles (1979) (see also Grimshaw 2006), assuming that the surface displacement had a zero mean (i.e.
〈U〉 = 0), while the effects of friction were taken into account by Miles (1983b) using the same zero-mean displacement assumption. However, this assumption is inconsistent with our aim to study undular bores where the value of 〈U〉 is essentially nonzero. Hence, we need to develop a more general theory enabling us to take into account variations in all the parameters in the cnoidal wave. Such a general setting is provided by the modulation system (27). Thus we consider the case when the Riemann variables in (27) do not depend on the variable X so that the general Whitham equations become ordinary differential equations in T , which can be conveniently reformulated in terms of the original spatial x-coordinate using the relationship (44):

$$\frac{d\lambda_i}{dx} = \frac{h^{1/2}}{6g^{3/2}}\,C_i\left[FA_i - GB_i\right], \qquad i = 1, 2, 3, \tag{58}$$

where all variables are defined above in Section 3 (see (28), (29), (30)). This system can be readily solved numerically. It is instructive, however, to first indicate some general properties of the solution. First, the solution to the system (58) must have the property of conservation of “wavelength” L (or “wavenumber” k = 2π/L)

$$L = \frac{2K(m)}{\sqrt{\lambda_3-\lambda_1}} = \text{constant}. \tag{59}$$

Indeed, the wave conservation law (22) in the absence of X-dependence assumes the form

$$\frac{\partial k}{\partial T} = 0, \tag{60}$$

which yields (59). Thus the system of three equations (58) can be reduced to two equations. Next, applying Whitham averaging directly to (7) yields

$$\frac{dM}{dx} = -\frac{9h_x}{4h}M - \frac{2C_D}{3g}\tilde{P}, \qquad M = \langle U\rangle, \qquad \tilde{P} = \langle|U|U\rangle, \tag{61}$$

$$\frac{dP}{dx} = -\frac{9h_x}{2h}P - \frac{4C_D}{3g}\tilde{Q}, \qquad P = \langle U^2\rangle, \qquad \tilde{Q} = \langle|U|^3\rangle. \tag{62}$$

The equation set (59), (61), (62) comprises a closed modulation system for three independent modulation parameters, say M , P̃ and m. While this system is not as convenient for further analysis as the system (27) in Riemann variables, it does not have the restriction U > 0 inherent in (27), and allows for some straightforward inferences regarding the possible existence of modulation solutions with zero mean elevation, that is with M = 0.
Indeed, one can see that the solution with the zero mean is actually not generally permissible when CD ≠ 0, a situation overlooked in Miles (1983b). Indeed, M = 0 immediately then implies that P̃ = 0 by (61). But then due to (59) we have all three modulation parameters fixed which is clearly inconsistent with the remaining equation (62) (except for the trivial case M = 0, P = 0, Q̃ = 0). However, in the absence of friction, when CD = 0, equation (61) uncouples and permits a nontrivial solution with a zero mean. In general, when CD = 0 equations (61), (62) can be easily integrated to give

$$d = Mh^{9/4} = \text{constant}; \qquad \sigma = Ph^{9/2} = \text{constant}. \tag{63}$$

Then, using (18, 19, 59) one readily gets the formula for the variation of the modulus m, and hence of all the other wave parameters, as a function of h

$$K^2\left[2(2-m)EK - 3E^2 - (1-m)K^2\right] = \frac{3(\sigma - d^2)L^4}{64\,h^{9/2}}. \tag{64}$$

Figure 2: Dependence of the modulus m on the physical space coordinate x in the cases without (CD = 0) and with (CD = 0.01) bottom friction in the X-independent modulation solution.

Formula (64) generalises to the case M ≠ 0 (i.e. d ≠ 0) the expressions of Ostrovsky & Pelinovsky (1970, 1975), Miles (1979) and Grimshaw (2006) (note that in Grimshaw (2006) the zero mean restriction is actually not necessary). We note here that, again with CD = 0, equation (5) implies conservation of 〈B〉 and 〈B2〉 (the averaged wave action flux), which, together with (59), also yield (64). The physical frequency Ω and wavenumber κ in the modulated periodic wave under study are given by the formula (37), and we recall here that k = 2π/L is constant (see (59)). As discussed before at the end of Section 3 we must require that the phase speed stays positive as the wave evolves, and here that requires that the physical wavenumber κ > 0.
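In the frictionless case the variation of the modulus with depth can be obtained without solving any differential equation. The sketch below assumes that, for CD = 0 and constant “wavelength” L, the combination K²[2(2 − m)EK − 3E² − (1 − m)K²] scales as h^{−9/2}, and solves for m by bisection; K and E are computed with the arithmetic-geometric mean. The function names, the bisection bracket and the sample depths are illustrative assumptions, and monotonicity of the left-hand side in m is assumed rather than proved here.

```python
import math


def ellip_ke(m):
    """Complete elliptic integrals K(m), E(m) (parameter m = k^2) via the AGM."""
    a, b, c = 1.0, math.sqrt(1.0 - m), math.sqrt(m)
    total, weight = 0.5 * c * c, 1.0
    for _ in range(40):
        if abs(c) < 1e-15:
            break
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        total += weight * c * c
        weight *= 2.0
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - total)


def modulus_function(m):
    """Left-hand side K^2 [2(2-m)EK - 3E^2 - (1-m)K^2] of the modulus relation."""
    K, E = ellip_ke(m)
    return K * K * (2.0 * (2.0 - m) * E * K - 3.0 * E * E - (1.0 - m) * K * K)


def modulus_at_depth(h, h0, m0, tol=1e-12):
    """Modulus at depth h, given modulus m0 at depth h0 (frictionless case)."""
    target = modulus_function(m0) * (h0 / h) ** 4.5
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if modulus_function(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for depth in (10.0, 8.0, 6.0, 4.0):   # hypothetical depths, m0 = 0.2 at h0 = 10
        print(depth, modulus_at_depth(depth, 10.0, 0.2))
```

Because only the ratio of the two sides at different depths enters, the numerical prefactor and the constants d, σ and L drop out of this calculation; the computed modulus grows towards unity as the depth decreases.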
Since a/h (and hence hV/6g) is supposed to be small within the range of applicability of the KdV equation (2) the expression (37) implies the behaviour κ ≃ Ω/ gh which of course agrees with the well known result for linear waves on a sloping beach (see Johnson 1997 for instance). This effect will be slightly attenuated for the nonlinear cnoidal wave, since V h/6g > 0, but the overall effect will be a “squeezing” of the cnoidal wave, a result important for our further study of undular bores. Next we study numerically the combined effect of slope and friction on a cnoidal wave. As we have shown, in the presence of Chezy friction M 6= 0, and we have also assumed that U > 0, which is necessary when we come to study undular bores. Now we use the stationary modulation system (58) in Riemann variables, which was derived using this as- sumption. We solve the coupled ordinary differential equation system (58) for the case of a linear slope h(x) = h0 − δx (65) with h0 = 10, δ = 0.01, and with the initial conditions λ1 = −0.441, λ2 = 0.147, λ3 = 0.294 at x = 0, (66) which corresponds to a nearly harmonic wave with m = 0.2, a/h0 = 0.2, 〈A〉/h0 ≈ 0.3 at x = 0 (see (35)). Also we note that for the chosen parameters we have V = 0, so at x = 0 we have κ = Ω/ gh0 as in linear theory. It is instructive to compare solutions with (CD = 0.01) and without (CD = 0) friction. In Fig. 2 the dependence of the modulus m 100 200 300 400 500 C = 0 C = 0.01 100 200 300 400 500 h 1/4 C = 0 C = 0.01 Figure 3: Left: Dependence of the mean value 〈A〉 in theX-independent modulation solution on the physical space coordinate x without (dashed line) and with (solid line) bottom friction; Right: Same but multiplied by the Green’s law factor, h1/4 100 200 300 400 500 1. 4 1. 6 1. 8 2. 2 2. 4 C = 0 C = 0.01 Figure 4: Dependence of the surface elevation amplitude a on the space coordinate x. Dashed line corresponds to the frictionless case and solid line to the case with bottom friction. 
on $x$ is shown for both cases. We see that in the frictionless case $m \to 1$ as the depth decreases, i.e. the wave crests assume the shape of solitary waves as one approaches the shoreline. When $C_D \neq 0$ the modulus also grows as the depth decreases but never reaches unity. The dependence on $x$ of the mean surface elevation $\langle A\rangle$ for the cases without and with friction is shown in Fig. 3. We have checked that the “wavelength” $L$ (59) is constant for both solutions. Also, one can see from Fig. 3 (right) that the value $h^{1/4}\langle A\rangle \propto d$ is indeed conserved in the frictionless case but is not constant if friction is present (the same holds true for the value $h^{1/2}\langle A^2\rangle \propto \sigma$, but we do not present the graph here). Finally, in Fig. 4 the dependence of the physical elevation wave amplitude $a$ on the spatial coordinate $x$ is shown. One can see that in the frictionless case the amplitude grows adiabatically with distance due to the effect of the slope but, not unexpectedly, gradually decreases when bottom friction is present; for these parameter settings the frictional decrease is comparable in magnitude to the effect of the slope. In both cases the main qualitative changes occur in the wave shape and the wavelength.

Overall, we can infer from these results that the main local effect of a slope and bottom friction on a cnoidal wave, along with the adiabatic amplitude variations, is twofold: a wave with $m < 1$ at $x = 0$ tends to transform into a sequence of solitary waves as the depth decreases (i.e. as $x$ increases for the slope (65)), and at the same time the distance between subsequent wave crests tends to decrease. This is in sharp contrast with the behaviour of modulated cnoidal waves in problems described by the unperturbed KdV equation, where growth of the modulus $m$ is accompanied by an increase of the distance between the wave crests.
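The "squeezing" described above can be illustrated with the linear estimate $\kappa \simeq \Omega/\sqrt{gh}$ on the slope (65). The following is only a minimal sketch: the values $h_0 = 10$ and $\delta = 0.01$ are taken from the text, while the frequency `omega`, the use of SI units and the function names are assumptions introduced for illustration. It does not solve the modulation system (58).

```python
import math

G = 9.81  # gravitational acceleration; SI units are an assumption made here


def depth(x, h0=10.0, delta=0.01):
    """Linear slope (65): h(x) = h0 - delta * x."""
    return h0 - delta * x


def linear_wavelength(x, omega=1.0, h0=10.0, delta=0.01):
    """Wavelength 2*pi/kappa with kappa = omega / sqrt(g h), the linear-wave estimate.

    omega is a hypothetical frequency chosen only for illustration.
    """
    h = depth(x, h0, delta)
    if h <= 0.0:
        raise ValueError("beyond the shoreline: h(x) <= 0")
    kappa = omega / math.sqrt(G * h)
    return 2.0 * math.pi / kappa


if __name__ == "__main__":
    for x in (0.0, 250.0, 500.0, 750.0):
        print(f"x = {x:6.1f}  h = {depth(x):5.2f}  "
              f"wavelength = {linear_wavelength(x):7.3f}")
```

In this estimate the crest spacing scales as $\sqrt{h}$, so it shrinks monotonically as the wave moves up the slope, whatever value of `omega` is chosen.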
Generally, in the study of the behaviour of unsteady undular bores in the presence of a slope and bottom friction we will have to deal with the combination of these two opposite tendencies.

6 Undular bore propagation over variable topography with bottom friction

6.1 Gurevich-Pitaevskii problem for flat-bottom zero-friction case

We now turn to the problem (b) outlined in Section 2. We study the evolution of an undular bore developing from an initial surface elevation jump $\Delta > 0$, located at some point $x_0 < 0$. As discussed below, the undular bore will expand with time so that at some $t = t_0$ its lead solitary wave enters the gradual slope region, which begins at $x = 0$ (see Fig. 1b). We assume that for $x < 0$ one has $h = h_0 = \text{constant}$ and $C_D \equiv 0$.

We shall first present a formulation of the Gurevich-Pitaevskii problem for the perturbation-free KdV equation and reproduce the well-known similarity modulation solution describing the evolution of the undular bore until the moment it enters the slope. We emphasize that, although this formulation and, especially, this similarity solution are very well known and have been used by many authors, some of the inferences important for the present application to fluid dynamics have not been widely appreciated, as far as we can discern. Pertinent to our main objective in this paper, we undertake a detailed study of the characteristics of the Whitham modulation system in the vicinity of the leading edge of the undular bore solution, and show that the boundary conditions of Gurevich-Pitaevskii type permit only two possible configurations of the characteristics, implying two qualitatively different types of behaviour of the leading solitary wave. Next, we shall show how this Gurevich-Pitaevskii formulation of the problem applies to the perturbed modulation system in the form (27), and finally we will study the effects of the perturbation on the modulations in the vicinity of the leading edge of the undular bore.
In the case of a flat, frictionless bottom the original equation (1) becomes the constant-coefficient KdV equation, which can be cast into the standard form

$$ \eta_\zeta + 6\eta\eta_\xi + \eta_{\xi\xi\xi} = 0 \qquad (67) $$

by introducing the new variables

$$ \eta = \frac{2A}{3h_0}, \qquad \xi \propto x + x_0 - \sqrt{gh_0}\,t, \qquad \zeta \propto t, \qquad (68) $$

where $x_0 < 0$ is an arbitrary constant. In the Gurevich-Pitaevskii (GP) approach, one considers a large-scale initial disturbance $\eta(\xi, 0) = f(\xi)$ in the form of a decreasing profile, $f'(\xi) < 0$ (e.g. a smooth step: $f(\xi) \to 0$ as $\xi \to +\infty$; $f(\xi) \to \eta_0 > 0$ as $\xi \to -\infty$), whose initial evolution until some critical (breaking) time $\zeta_b$ can be described by the dispersionless limit of the KdV equation, i.e. by the Hopf equation,

$$ \zeta < \zeta_b: \quad \eta \approx r(\xi, \zeta), \qquad r_\zeta + 6rr_\xi = 0, \qquad r(\xi, 0) = f(\xi). \qquad (69) $$

The evolution (69) leads to wave-breaking of the $r(\xi)$-profile at some $\zeta = \zeta_b$, with the consequence that the dispersive term in the KdV equation then comes into play, and an undular bore forms, which can be locally represented as a single-phase travelling wave. This travelling wave is modulated in such a way that it acquires the form of a solitary wave at the leading edge $\xi = \xi^+(\zeta)$ and gradually degenerates, via the nonlinear cnoidal-wave regime, to a linear wave packet at the trailing edge $\xi = \xi^-(\zeta)$. It is important that this undular bore is essentially unsteady, i.e. the region $\xi^-(\zeta) < \xi < \xi^+(\zeta)$ expands with time $\zeta$. The single-phase travelling wave solution of the KdV equation (67) has the form (cf. (10))

$$ \eta(\xi, \zeta) = r_3 - r_1 - r_2 - 2(r_3 - r_2)\,\mathrm{sn}^2\!\left(\sqrt{r_3 - r_1}\,\theta, m\right), \qquad (70) $$

$$ \theta = \xi + 2(r_1 + r_2 + r_3)\zeta, \qquad m = \frac{r_3 - r_2}{r_3 - r_1}. \qquad (71) $$

The parameters $r_1 \le r_2 \le r_3 \le 0$ in the undular bore are slowly varying functions of $\xi$, $\zeta$, whose evolution is governed by the Whitham equations

$$ \frac{\partial r_j}{\partial \zeta} + v_j(r_1, r_2, r_3)\frac{\partial r_j}{\partial \xi} = 0, \qquad j = 1, 2, 3. \qquad (72) $$

The characteristic velocities in (72) are given by (31).
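The breaking time $\zeta_b$ of the Hopf stage (69) follows from the characteristics $\xi = \xi_0 + 6f(\xi_0)\zeta$, which first cross when $1 + 6f'(\xi_0)\zeta = 0$, i.e. at $\zeta_b = -1/(6\min f')$. The sketch below evaluates this for a smooth step. The tanh profile, its width and the grid parameters are assumptions chosen for illustration; they are not the initial data used in this paper.

```python
import math


def smooth_step(xi, eta0=1.0, width=5.0):
    """Decreasing profile: f -> eta0 as xi -> -inf, f -> 0 as xi -> +inf (assumed shape)."""
    return 0.5 * eta0 * (1.0 - math.tanh(xi / width))


def breaking_time(f, xi_min=-50.0, xi_max=50.0, n=10001, h=1e-5):
    """zeta_b = -1 / (6 min f'), with f' estimated by central differences on a grid."""
    step = (xi_max - xi_min) / (n - 1)
    steepest = min(
        (f(xi_min + i * step + h) - f(xi_min + i * step - h)) / (2.0 * h)
        for i in range(n)
    )
    if steepest >= 0.0:
        raise ValueError("profile is not decreasing anywhere; no breaking")
    return -1.0 / (6.0 * steepest)


def is_single_valued(f, zeta, xi_min=-50.0, xi_max=50.0, n=10001):
    """True while the characteristics xi0 + 6 f(xi0) zeta keep their initial ordering."""
    step = (xi_max - xi_min) / (n - 1)
    pos = [xi_min + i * step + 6.0 * f(xi_min + i * step) * zeta for i in range(n)]
    return all(b > a for a, b in zip(pos, pos[1:]))


if __name__ == "__main__":
    print("breaking time:", breaking_time(smooth_step))
    print("single-valued at zeta = 1:", is_single_valued(smooth_step, 1.0))
    print("single-valued at zeta = 3:", is_single_valued(smooth_step, 3.0))
```

For this profile the steepest slope is $-\eta_0/(2w)$ with $w$ the width, so the analytic value is $\zeta_b = w/(3\eta_0)$, which the numerical estimate should reproduce.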
We stress that, although the analytical expressions (70) and (10) (as well as (72) and the homogeneous version of (27)) are identical, they are written for completely different sets of variables, both dependent and independent.

The Riemann invariants $r_j(\xi, \zeta)$ are subject to special matching conditions at the free boundaries $\xi = \xi^\pm(\zeta)$, defined by the conditions $m = 0$ (trailing edge) and $m = 1$ (leading edge), formulated in Gurevich and Pitaevskii (1974) (see also Kamchatnov (2000) or El (2005) for a detailed description). At the trailing (harmonic) edge, where the wave amplitude $a = 2(r_3 - r_2)$ vanishes and $m = 0$, one has

$$ \xi = \xi^-(\zeta): \quad r_2 = r_3, \quad -r_1 = r. \qquad (73) $$

At the leading (soliton) edge, where $m = 1$, one has

$$ \xi = \xi^+(\zeta): \quad r_2 = r_1, \quad -r_3 = r. \qquad (74) $$

In both (73) and (74), $r(\xi, \zeta)$ is the solution of the Hopf equation (69). The curves $\xi = \xi^\pm(\zeta)$ are defined for the solution of the GP problem (72), (73), (74) by the ordinary differential equations

$$ \frac{d\xi^-}{d\zeta} = v^-(\xi^-, \zeta), \qquad \frac{d\xi^+}{d\zeta} = v^+(\xi^+, \zeta), \qquad (75) $$

where $v^\pm$ are calculated as the values of the double characteristic velocities of the modulation system at the undular bore edges,

$$ v^- = v_2(r_1, r_3, r_3)|_{\xi=\xi^-(\zeta)} = v_3(r_1, r_3, r_3)|_{\xi=\xi^-(\zeta)}, \qquad (76) $$

$$ v^+ = v_2(r_1, r_1, r_3)|_{\xi=\xi^+(\zeta)} = v_1(r_1, r_1, r_3)|_{\xi=\xi^+(\zeta)}. \qquad (77) $$

The equations (75) essentially represent kinematic boundary conditions for the undular bore (see El 2005). Indeed, the double characteristic velocity $v_2(r_1, r_3, r_3) = v_3(r_1, r_3, r_3)$ can be shown to coincide with the linear group velocity of the small-amplitude KdV wavepacket, while the double characteristic velocity $v_2(r_1, r_1, r_3) = v_1(r_1, r_1, r_3)$ is the soliton speed.

One might infer from this GP formulation of the problem that, since the leading edge of the undular bore specified by (75), (77) is a characteristic of the modulation system, the value of the double Riemann invariant $r^+ \equiv r_2 = r_1$ is constant.
Then, on considering an undular bore propagating into still water, where $r = 0$, one would obtain from the matching condition (74) at the leading edge that $r_3|_{\xi=\xi^+} = 0$, and thus the amplitude of the lead solitary wave $a^+ = 2(r_3 - r_1)|_{\xi=\xi^+} = -2r^+$ would always be constant as well. However, this contradicts the general physical reasoning that the amplitude of the lead solitary wave should be allowed to change in the case of general initial data. The apparent contradiction is resolved by noting that the leading edge specified by (75), (77) can be an envelope of the characteristic family, i.e. a caustic, rather than necessarily a regular characteristic, and hence there is no necessity for the double Riemann invariant $r^+$ to be constant along the curve $\xi = \xi^+(\zeta)$ in the general case. On the other hand, since the leading edge is defined by the condition $m = 1$, the wave form at the leading edge will coincide with the spatial profile of the standard KdV soliton. Thus we arrive at the conclusion that, in general, the amplitude of the leading KdV solitary wave will vary, even in the absence of the perturbation terms. Of course, in the unperturbed KdV equation, such varying solitary waves cannot exist on their own, and require the presence of the rest of the undular bore.

We also stress that these variations of the leading solitary wave in the undular bore, as described here, have a completely different physical nature to the variations of the parameters of an individual solitary wave due to small perturbations as described in Section 4. They are caused by nonlinear wave interactions within the undular bore rather than by a local adiabatic response of the solitary wave to a perturbation induced by topography and friction. Importantly for our study, however, it will transpire that the action of these same perturbation terms on the undular bore can lead to both a local and a nonlocal response of the leading solitary wave.
6.2 Undular bore developing from an initial jump

Next we consider the simplest solution of the modulation system, which describes an undular bore developing from an initial discontinuity placed at the point $x = -x_0$. In $(\eta; \xi, \zeta)$-variables we have the initial conditions

$$ \eta(\xi, 0) = \Delta \ \text{ for } \xi < 0; \qquad \eta(\xi, 0) = 0 \ \text{ for } \xi > 0, \qquad (78) $$

where $\Delta > 0$ is a constant. Then, on using (69), the initial conditions (78) are readily translated into the free-boundary matching conditions (73), (74) for the Riemann invariants. Because of the absence of a length scale in this problem, the corresponding solution of the modulation system must depend on the self-similar variable $\tau = \xi/\zeta$ alone, which reduces the modulation system to the ordinary differential equations

$$ (v_i - \tau)\frac{dr_i}{d\tau} = 0, \qquad i = 1, 2, 3. \qquad (79) $$

Figure 5: Left: Behaviour of the Riemann invariants in the similarity modulation solution for the flat-bottom zero-friction case; Right: corresponding undular bore profile $\eta(\xi)$ at $\zeta = 5$.

The boundary conditions for (79) follow from the matching conditions (73), (74) using the initial condition (78):

$$ \tau = \tau^-: \quad r_2 = r_3, \quad r_1 = -\Delta; \qquad \tau = \tau^+: \quad r_2 = r_1, \quad r_3 = 0, \qquad (80) $$

where $\tau^\pm$ are the self-similar coordinates (speeds) of the leading and trailing edges respectively, $\xi^\pm = \tau^\pm\zeta$. Taking into account the inequality $r_1 \le r_2 \le r_3$, one obtains the well-known modulation solution of Gurevich and Pitaevskii (1974) (see also Fornberg and Whitham 1978) in the form

$$ r_1 = -\Delta, \qquad r_3 = 0, \qquad r_2 = -m\Delta, \qquad (81) $$

$$ \frac{\xi}{\zeta} = v_2(-\Delta, -m\Delta, 0) = 2\Delta\left[(1 + m) - \frac{2m(1-m)K(m)}{E(m) - (1-m)K(m)}\right]. \qquad (82) $$

This modulation solution (81), (82) (see Fig. 5a) replaces, as a result of averaging over the oscillations, the unphysical formal three-valued solution of the dispersionless KdV equation (i.e. of the Hopf equation), which would have occurred in the absence of the dispersive regularisation by the undular bore. We see that (82) describes an expansion fan in the characteristic $(\xi, \zeta)$-plane and thus is a global solution.
Substituting (81), (82) into the travelling wave solution (70) one obtains the asymptotic wave form of the undular bore (see Fig. 5b), which then can be readily represented in terms of the original physical variables using the relationships (68). The equations of the trailing and leading edges of the undular bore are determined from (82) by putting $m = 0$ and $m = 1$ respectively:

$$ \frac{\xi^-}{\zeta} = \tau^- = v_2(-\Delta, 0, 0) = -6\Delta, \qquad \frac{\xi^+}{\zeta} = \tau^+ = v_2(-\Delta, -\Delta, 0) = 4\Delta. \qquad (83) $$

The leading solitary wave amplitude is $\eta_0 = 2(r_3 - r_1) = 2\Delta$, which is exactly twice the height of the initial jump. This corresponds to the amplitude of the surface elevation $a = 3h_0\Delta$ (see (68)). Note that, to get a leading solitary wave of the same initial amplitude $a_0$ as for the separate solitary wave considered in Section 4, one should use the jump value $\Delta_0 = a_0/(3h_0)$; the amplitude $a_0$ is then just $2\tilde\Delta$, where $\tilde\Delta = 3h_0\Delta/2$ is the initial discontinuity in the surface elevation.

6.3 Structure of the undular bore front

We are especially interested in the behaviour of the modulation solution (81), (82) in the vicinity of the leading edge $\xi = \xi^+(\zeta)$. This behaviour is essentially determined by the manner in which the pair of characteristics corresponding to the velocities $v_2$ and $v_1$ merge into a multiple eigenvalue $v^+$ of the modulation system at $\xi = \xi^+(\zeta)$.

First, one can readily infer from the modulation solution (81), (82) that the phase velocity $c = -2(r_1 + r_2 + r_3) = 2\Delta(1 + m) > v_2(-\Delta, -m\Delta, 0)$ for $m < 1$, and $c = v_2$ for $m = 1$. Thus, any individual wave crest generated at the trailing edge of the undular bore moves towards the leading edge, i.e. for any crest $m \to 1$ as $\zeta \to \infty$. Hence, for any particular wave crest, except for the very first one, the solitary wave ‘status’ is achieved only asymptotically as $\zeta \to \infty$.

Without loss of generality we assume in this section that $\Delta = 1$ in (81), (82).
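The edge speeds (83) and the inequality $c > v_2$ for $m < 1$ can be checked numerically from (82). The sketch below is an illustration under stated assumptions: the complete elliptic integrals are evaluated with the standard arithmetic-geometric mean iteration (a choice made here, not a method taken from this paper), and the function names `ellip_ke`, `v2` and `phase_velocity` are hypothetical.

```python
import math


def ellip_ke(m1):
    """Return (K(m), E(m)) for m = 1 - m1 using the arithmetic-geometric mean.

    The complementary parameter m1 is the input so that no precision is lost
    when m is close to 1.
    """
    if not 0.0 < m1 <= 1.0:
        raise ValueError("need 0 < m1 <= 1")
    a, b = 1.0, math.sqrt(m1)
    series = 0.5 * (1.0 - m1)   # n = 0 term: 2^(-1) * c_0^2 with c_0^2 = m
    weight = 1.0                # 2^(n-1) for n = 1
    for _ in range(60):
        if abs(a - b) <= 1e-15 * a:
            break
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        series += weight * c * c
        weight *= 2.0
    k = math.pi / (2.0 * a)
    return k, k * (1.0 - series)


def v2(m, jump=1.0):
    """Characteristic velocity v2(-Delta, -m Delta, 0) from (82); jump is Delta."""
    if m <= 0.0:
        return -6.0 * jump      # trailing-edge limit in (83)
    if m >= 1.0:
        return 4.0 * jump       # leading-edge limit in (83)
    k, e = ellip_ke(1.0 - m)
    return 2.0 * jump * ((1.0 + m) - 2.0 * m * (1.0 - m) * k / (e - (1.0 - m) * k))


def phase_velocity(m, jump=1.0):
    """Phase velocity c = -2(r1 + r2 + r3) = 2 Delta (1 + m) on the solution (81)."""
    return 2.0 * jump * (1.0 + m)


if __name__ == "__main__":
    for m in (1e-6, 0.25, 0.5, 0.75, 1.0 - 1e-9):
        print(f"m = {m:.9f}  v2 = {v2(m):+.6f}  c = {phase_velocity(m):.6f}")
```

The printed values of `v2` should run monotonically from close to $-6\Delta$ to close to $4\Delta$, which is the expansion fan $-6 \le C_2 \le 4$ for $\Delta = 1$, with `c` above `v2` throughout.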
First, as we have already mentioned, the characteristic family $\Gamma_2$: $d\xi/d\zeta = v_2$ is an expansion fan in the $(\xi, \zeta)$-plane,

$$ \Gamma_2: \quad \xi = C_2\zeta, \qquad (84) $$

parameterised by a constant $C_2$, $-6 \le C_2 \le 4$. Next, in (82) we make an asymptotic expansion of $v_2(-1, -m, 0)$ for small $(1-m) \ll 1$, to get

$$ 2(1-m)\ln(16/(1-m)) \simeq \tau^+ - \xi/\zeta \qquad (85) $$

or, with logarithmic accuracy,

$$ (\tau^+ - \xi/\zeta) \ll 1: \quad 1 - m \simeq \frac{\tau^+ - \xi/\zeta}{2\ln[1/(\tau^+ - \xi/\zeta)]}. \qquad (86) $$

Next, expanding $v_1(-1, -m, 0)$ for $(1-m) \ll 1$ and using (86), we get the asymptotic equation for the characteristic family $\Gamma_1$,

$$ \frac{d\xi}{d\zeta} = v_1 = \tau^+ + (\tau^+ - \xi/\zeta) + O(1-m), \qquad (87) $$

which is readily integrated to leading order to give

$$ \Gamma_1: \quad \xi \simeq \tau^+\zeta - \frac{C_1}{\zeta}, \qquad (88) $$

where $C_1 \ge 0$ is an arbitrary constant ‘labeling’ the characteristics; $C_1 = 0$ corresponds to the leading edge of the undular bore. This asymptotic formula (88) is valid as long as $\zeta \gg 1$. The behaviour of the characteristics belonging to the families $\Gamma_1$ and $\Gamma_2$ near the leading edge is shown in Fig. 6a.

Next, expanding the equation for the third characteristic family, $\Gamma_3$: $d\xi/d\zeta = v_3(-1, -m, 0)$, for $(1-m) \ll 1$, we get on using (86)

τ+ − ξ/ζ ln(1/(τ+ − ξ/ζ)) +O(τ+ − ξ/ζ) . (89)

Figure 6: Behaviour of the characteristics for the similarity modulation solution near the leading edge $\xi^+(\zeta)$: (a) families $\Gamma_1$: $d\xi/d\zeta = v_1$ and $\Gamma_2$: $\xi = C_2\zeta$; (b) family $\Gamma_3$: $d\xi/d\zeta = v_3$.

Integrating (89) we obtain to first order

$$ \Gamma_3: \quad \xi \simeq C_3 - g(\zeta), \qquad (90) $$

where

g(ζ) = τ+ζ − C3 ln |τ+ζ − C3| − ln ζ dζ , g(C3/τ+) = 0 , (91)

$C_3$ being an arbitrary constant. The asymptotic formula (90) is valid as long as $g(\zeta)/C_3 \ll 1$. Since the characteristics $\Gamma_3$ intersect the leading edge $\xi = \tau^+\zeta$, we must indicate their behaviour outside the undular bore. It follows from the matching condition (74) and the limiting structure (34) of the characteristic velocities of the Whitham system that the characteristics from the family $\Gamma_3$ match with the Hopf equation characteristics $d\xi/d\zeta = 6r$ carrying the value of the Riemann invariant $r = 0$, corresponding to still water upstream of the undular bore.
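The step from (87) to (88) can be checked directly: writing $\xi = \tau^+\zeta - y$ in the leading-order equation $d\xi/d\zeta = 2\tau^+ - \xi/\zeta$ gives $dy/d\zeta = -y/\zeta$, so $y = C_1/\zeta$. The sketch below integrates the leading-order equation numerically and compares the result with this closed form. The value $\tau^+ = 4$ is that of (83) with $\Delta = 1$; the Runge-Kutta integrator, the step count and the starting point `zeta0` are assumptions made for illustration.

```python
TAU_PLUS = 4.0  # leading-edge speed tau+ from (83) with Delta = 1


def rhs(zeta, xi):
    """Leading-order right-hand side of (87): tau+ + (tau+ - xi/zeta)."""
    return 2.0 * TAU_PLUS - xi / zeta


def integrate_gamma1(c1, zeta0=10.0, zeta1=50.0, steps=4000):
    """Classical fourth-order Runge-Kutta, started on the curve (88) at zeta0."""
    h = (zeta1 - zeta0) / steps
    zeta = zeta0
    xi = TAU_PLUS * zeta0 - c1 / zeta0
    for _ in range(steps):
        k1 = rhs(zeta, xi)
        k2 = rhs(zeta + 0.5 * h, xi + 0.5 * h * k1)
        k3 = rhs(zeta + 0.5 * h, xi + 0.5 * h * k2)
        k4 = rhs(zeta + h, xi + h * k3)
        xi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        zeta += h
    return xi


def closed_form(c1, zeta):
    """Characteristic of the family Gamma_1 as given by (88)."""
    return TAU_PLUS * zeta - c1 / zeta


if __name__ == "__main__":
    for c1 in (0.0, 1.0, 5.0):
        print(c1, integrate_gamma1(c1), closed_form(c1, 50.0))
```

The gap $\tau^+\zeta - \xi = C_1/\zeta$ decays with $\zeta$, so every characteristic of $\Gamma_1$ approaches the leading edge but reaches it only asymptotically.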
Therefore, the sought external characteristics are simply vertical lines $\xi = C_3$. The qualitative behaviour of the characteristics from the family $\Gamma_3$ is shown in Fig. 6b.

It is clear from the asymptotic behaviour of the characteristics that the edge characteristic $\xi = \tau^+\zeta$ corresponding to the motion of the leading solitary wave intersects only with characteristics of the family $\Gamma_3$ carrying the Riemann invariant value $r_3 = 0$ into the undular bore domain. Since, according to the matching condition (80), $r_3 \equiv 0$ everywhere along the edge characteristic, one can infer that the motion of the leading solitary wave is completely specified by its amplitude at $\zeta = 0$. Indeed, in this case, the leading edge represents a genuine multiple characteristic of the modulation system, along which the Riemann invariant $r^+ = r_2 = r_1$ is a constant. Given the constant value $r_1 = -1$ for the solution (81), (82), one infers that the amplitude of the lead soliton of the self-similar undular bore, $\eta_0 = 2(r_3 - r^+) = 2$, is also constant. Thus, in the undular bore evolving from an initial jump, the leading solitary wave represents an independent soliton of the KdV equation. Of course, this fact follows directly from the modulation solution (81), (82), but now we have established its meaning in the context of the characteristics, which will play an important role below.

Figure 7: (a) Leading edge $\xi^+(\zeta)$ of a non-self-similar undular bore as an envelope of pairwise merging characteristics from the families $d\xi/d\zeta = v_1$ and $d\xi/d\zeta = v_2$; (b) behaviour of the Riemann invariants in the non-self-similar modulation solution with $r_3 \equiv 0$.

Next we discuss the structure of the undular bore front in the case when the initial profile $\eta(\xi, 0)$ is not a simple jump discontinuity, and instead has the form of a monotonically decreasing function, for instance, $\eta(\xi, 0) = (-\xi)^{1/2}$ when $\xi \le 0$ and $\eta(\xi, 0) = 0$ for $\xi > 0$. In that case, the modulation solution for the undular bore no longer possesses $x/t$-similarity as in the
jump resolution case and, as a result, the speed (and therefore the amplitude) of the lead solitary wave is not constant. For instance, for the aforementioned square-root initial profile the amplitude of the lead solitary wave grows as $\zeta^2$ (see Gurevich, Krylov and Mazur 1989, or Kamchatnov 2000). Clearly, such an amplitude variation would be impossible if the leading edge $\xi^+(\zeta)$ were a regular characteristic carrying a constant value of the Riemann invariant $r^+$. As discussed above, however, the GP matching conditions (73)-(77) admit another possibility: the leading edge curve is the envelope of the characteristic families $\Gamma_1$: $d\xi/d\zeta = v_1$ and $\Gamma_2$: $d\xi/d\zeta = v_2$ merging when $m = 1$. This configuration is shown in Fig. 7a. In this case, the behaviour of the modulus $m$ in the vicinity of the leading edge is given by the asymptotic formula found in Gurevich & Pitaevskii (1974):

(1−m)2 (r+)2 (ξ+ − ξ) (92)

where the function $r^+(\zeta) \neq \text{constant}$ is assumed to be known. Another specific feature of this (general) configuration is that $dr_{1,2}/d\xi \to \pm\infty$ as $\xi \to \xi^+$ (see Fig. 7b; also found in Gurevich & Pitaevskii 1974, see also Kamchatnov 2000), which is in drastic contrast with the similarity solution (see Fig. 5a). This particular difference was discussed in relation to undular bores in the KdV-Burgers equation in Gurevich and Pitaevskii (1987).

In summary, we see from (92) that the structure of the modulation solution in the vicinity of the leading edge of an undular bore defined as a characteristic envelope is qualitatively different from that for the similarity case (see (85)). The more general (but qualitatively similar to (92)) asymptotic formula, which takes into account small perturbations due to variable topography and bottom friction, will be derived later.
At the moment, it is important for us that in this configuration, when the leading edge is a characteristic envelope rather than just a characteristic, the value r+, and thus, the leading solitary wave amplitude are allowed to vary. The analysis of the corresponding modulation solution in Gurevich, Krylov and Mazur (1989) showed that, while in the case of an initial jump the wave crests generated at the trailing edge reach the leading edge (and therefore, transform into solitary waves) only asymptotically as t → ∞, for the more general case of decreasing initial data each wave crest generated at the trailing edge reaches the leading edge in finite time and replaces (overtakes) the existing leading solitary wave. This process is manifested as a continuous amplitude growth of the (apparent) leading solitary wave. As in classical soliton theory, an alternative explanation of the leading solitary wave amplitude growth can be made in terms of the momentum exchange between the “instantaneous” leading solitary wave and solitary waves of greater amplitude coming from the left. Indeed, as the rigorous analysis of Lax, Levermore and Venakides showed (see Lax, Levermore and Venakides (1994) and the references therein), the whole modulated structure of the undular bore can be asymptotically described in terms of the interactions of a large number of KdV solitons initially ‘packed’ into a non-oscillating large-scale initial profile. This latter interpretation is especially instructive for our purposes. Our point is that the specific cause of the enhanced soliton interactions resulting in amplitude growth at the leading edge is not essential; it can be large-scale spatial variations of the initial profile as just described, but it could also equally well be an effect of small perturbations in the KdV equation itself. 
Indeed, in the weakly perturbed KdV equation, the local wave structure of the undular bore must be described to leading order by the periodic solution (70) of the unperturbed KdV equation. So if one assumes GP boundary conditions analogous to (73)-(77) for the perturbed modulation system (27), one will invariably have to deal with one of the two possible types of behaviour of the characteristics (shown in Figs. 6a and 7a) in the vicinity of the leading edge of the undular bore. This is because the qualitative behaviour is determined only by the structure of the GP boundary conditions and by the associated asymptotic structure of the characteristic velocities of the Whitham system for $(1-m) \ll 1$, which are the same for both the unperturbed and perturbed modulation systems.

Next, we will show that, by using the knowledge of this qualitative behaviour of the characteristics, one is able to construct the asymptotic modulation solution for the undular bore front in the presence of variable topography and bottom friction even if the full solution of the perturbed modulation system is not available.

6.4 Gurevich-Pitaevskii problem for perturbed modulation system

We investigate now how the GP matching problem applies to the perturbed modulation system (27). As in the original GP problem, we postulate the natural physical requirement that the mean value $\langle U\rangle$ is continuous across the undular bore edges, which represent free boundaries and are defined by the conditions $m = 0$ (trailing edge $X = X^-(T)$) and $m = 1$ (leading edge $X = X^+(T)$). Also, we consider propagation of the undular bore into still water, hence $\langle U\rangle|_{X=X^+(T)} = 0$. Now, using the explicit expression (18) for $\langle U\rangle$ in terms of complete elliptic integrals and calculating its limits as $m \to 0$ and $m \to 1$, one has

$$ X = X^-(T): \quad \lambda_2 = \lambda_3, \quad \langle U\rangle = -\lambda_1 = u; \qquad X = X^+(T): \quad \lambda_2 = \lambda_1, \quad \langle U\rangle = -\lambda_3 = 0, \qquad (93) $$

where $u(X, T)$ is the solution of the dispersionless perturbed KdV equation (7), i.e.
$$ u_T + 6uu_X = F(T)u - G(T)u^2, \qquad (94) $$

with the boundary conditions

$$ u = \Delta_0 \ \text{ if } \tau < \tau_0; \qquad u = 0 \ \text{ if } \tau > \tau_0, \qquad (95) $$

where $\tau_0 = -x_0/\sqrt{gh_0}$. The boundary conditions (95) correspond to a discontinuous initial surface elevation $A(x, t)$ at $x = -x_0$, obtained by using transformations (3) and (6) where one sets $t = 0$. As earlier, $\Delta_0 = a_0/(3h_0)$ is the value of the discontinuity in $A$, chosen in such a way that the amplitude of the lead solitary wave in the undular bore is exactly $a_0$ in the flat-bottom zero-friction region (see Section 6.2).

This free-boundary matching problem is then complemented by the kinematic conditions explicitly defining the boundaries $X = X^\pm(T)$. These are formulated using the multiple characteristic directions of the perturbed modulation system (27) in the limits as $m \to 0$ and $m \to 1$ (cf. (75)-(77)),

$$ \frac{dX^-}{dT} = V^-(X^-, T), \qquad \frac{dX^+}{dT} = V^+(X^+, T), \qquad (96) $$

where

$$ V^- = v_2(u, \lambda^-, \lambda^-) = v_3(u, \lambda^-, \lambda^-), \qquad (97) $$

$$ V^+ = v_2(\lambda^+, \lambda^+, 0) = v_1(\lambda^+, \lambda^+, 0), \qquad (98) $$

and

$$ \lambda^- = \lambda_2(X^-, T) = \lambda_3(X^-, T), \qquad \lambda^+ = \lambda_2(X^+, T) = \lambda_1(X^+, T). \qquad (99) $$

Thus, for the perturbed KdV equation the leading and trailing edges of the undular bore are defined mathematically in the same way as for the unperturbed one, albeit for a different set of variables.

6.5 Deformation of the undular bore front due to variable topography and bottom friction

Finally we study the effects of a gradual slope and bottom friction on the leading front of the self-similar expanding undular bore described in Sections 6.2, 6.3. The result will essentially depend on the relative values of the small parameters appearing in the problem. We note that in general there are three distinct relevant small parameters,

$$ \epsilon = \frac{h_0}{|x_0|} \ll 1, \qquad \delta = \max(h_x) \ll 1, \qquad C_D \ll 1. \qquad (100) $$

The first small parameter is determined by the ratio of the equilibrium depth in the flat bottom region to the distance from the beginning of the slope region to the location of the initial jump discontinuity in the surface displacement.
This measures the typical relative spatial variations of the modulation parameters in the undular bore when it reaches the beginning of the slope. The second and third parameters are contained in the KdV equation (1) itself and measure the values of the slope and bottom friction respectively. In terms of the transformed variables appearing in (7), $|F(T)| \sim \delta$, $|G(T)| \sim C_D$ (see (8)). Generally we assume $\delta \sim C_D$ (the possible orderings $\delta \ll C_D$ or $C_D \ll \delta$ can then be considered as particular cases).

To obtain a quantitative description of the vicinity of the leading edge of the undular bore we perform an expansion of the Whitham modulation system (27) for $(1-m) \ll 1$. We first introduce the substitutions

$$ \lambda_i(X, T) = \lambda^+(T) + l_i(\tilde X, T), \qquad v_i = V^+ + v_i', \qquad \rho_i = \rho^+ + \rho_i', \qquad i = 1, 2, \qquad (101) $$

where

$$ \tilde X = X^+ - X, \qquad V^+ = -4\lambda^+, \qquad \rho^+ = F(T)\lambda^+ + G(T)(\lambda^+)^2. \qquad (102) $$

Since $\lambda_2 \ge \lambda_1$, $v_2 \ge v_1$, one always has $l_2 \ge l_1$, $v_2' \ge v_1'$. Assuming $\tilde X/X^+ \ll 1 \Leftrightarrow 1 - m \ll 1$ and using that $\lambda_3 = 0$ to leading order in the vicinity of the leading edge (see the matching condition (93)), we have from asymptotic expansions of (28)-(31) as $(1-m) \ll 1$

v′1 = M1(l2 − l1) ≡ −2 ln(16/(1−m)) 1 + 1 (1−m) ln(16/(1−m)) (l2 − l1), v′2 = M2(l2 − l1) ≡ −2 1− ln(16/(1−m)) (1−m) ln(16/(1−m)) (l2 − l1), (103)

ρ′1 = N1(l2 − l1) ≡ 1 + ln l2 − l1 −16λ+ 2λ+ ln l2 − l1 −16λ+ − 3λ+ (l2 − l1) ρ′2 = N2(l2 − l1) ≡ 5 + ln l2 − l1 −16λ+ 2λ+ ln l2 − l1 −16λ+ + 13λ+ (l2 − l1). (104)

Naturally, $v_i'$ and $\rho_i'$ vanish when $l_2 = l_1$. Now, substituting (101), (102) into the modulation system (27) we obtain

$$ \frac{d\lambda^+}{dT} + \frac{dX^+}{dT}\frac{\partial l_i}{\partial \tilde X} - (V^+ + v_i')\frac{\partial l_i}{\partial \tilde X} = \rho^+ + \rho_i', \qquad i = 1, 2. \qquad (105) $$

On using the kinematic condition (96) at the leading edge, this reduces to

$$ \frac{d\lambda^+}{dT} - v_i'\frac{\partial l_i}{\partial \tilde X} = \rho^+ + \rho_i', \qquad i = 1, 2. \qquad (106) $$

There are two qualitatively different cases to consider:
(i) $\lim_{\tilde X \to 0}|dl_i/d\tilde X| < \infty$, $i = 1, 2$ (Fig. 8a)
(ii) $\lim_{\tilde X \to 0}|dl_i/d\tilde X| = \infty$, $i = 1, 2$ (Fig.
8b)

Case (i) implies that to leading order (106) reduces to

$$ \frac{d\lambda^+}{dT} = \rho^+, \qquad (107) $$

which, together with the kinematic condition $dX^+/dT = -4\lambda^+$, defines the leading edge curve $X^+(T)$. One can observe that this system coincides with (42), (43) defining the motion of a separate solitary wave over a gradual slope with bottom friction. Its integral expressed in terms of the original physical $x$, $t$-variables is given by (49). Therefore, in case (i) the lead solitary wave in the undular bore is, to leading order, not restrained by interactions with the remaining part of the bore and behaves as a separate solitary wave. Physically this case corresponds to adiabatic deformation of the similarity modulation solution (81), (82) and implies the following small parameter ordering: $\delta \ll \epsilon$, $C_D \ll \epsilon$.

Figure 8: Behaviour of the Riemann variables in the vicinity of the leading edge of the undular bore propagating over a gradual slope with bottom friction: (a) adiabatic variations of the similarity GP regime, $\delta \ll \epsilon$, $C_D \ll \epsilon$; (b) general case, $\delta \sim C_D \sim \epsilon$.

Next, we study the structure of this weakly perturbed similarity modulation solution in the vicinity of the leading edge. The next leading order of the system (106) yields

$$ -v_i'\frac{\partial l_i}{\partial \tilde X} = \rho_i', \qquad i = 1, 2, \qquad (108) $$

that is

$$ \frac{\partial l_1}{\partial \tilde X} = -\frac{N_1}{M_1}, \qquad \frac{\partial l_2}{\partial \tilde X} = -\frac{N_2}{M_2}. \qquad (109) $$

Subtraction of one equation (109) from the other, taking into account the relationship $l_2 - l_1 \cong -\lambda^+(1-m)$, leads consistently to leading order to the differential equation for $1 - m$

∂(1 −m) F (T ) 16G(T ) , (110)

This equation should be solved with the initial condition

$$ 1 - m = 0 \quad \text{at } \tilde X = 0. \qquad (111) $$

Elementary integration gives with the accuracy $O(1-m)$ (cf. (85))

(1−m) ln 16 F (T )− 16 λ+G(T ) X+ −X . (112)

This formula determines the dependence of the modulus $m$ on $T$ and $X$ (as long as $1 - m \ll 1$).

Now, we make use of the solution $\lambda^+$ of equation (107) given by (47) with $C_0 = 4/(3ga_0h_0)$ (see (48)).
Under the supposition that the integral $\int h^{-3}\,dx$ diverges as $h \to 0$, so that the turbulent bottom friction plays an essential role in the behaviour of the undular bore front (see Section 4 for a similar approximation for an isolated solitary wave), we obtain for $h \ll h_0$

(1−m) ln 2 + 3h2 (X+ −X). (113)

Finally, if the bottom topography is approximated by the dependence (52), we get with the same accuracy

(1−m) ln 16 (3α− 1)δ (X+ −X) , (114)

where $\alpha > 1/3$. The second term in square brackets tends to zero as $h \to 0$. However, the region where it can be neglected may be very narrow because of the smallness of the parameter $\delta$. We recall that in this formula $X^+$ is given by (49) and $X$ is defined by (3) in terms of the original physical independent variables $x$ and $t$.

Summarising, if the conditions $\delta, C_D \ll \epsilon$ are satisfied, the lead solitary wave of the undular bore behaves as an individual (noninteracting) solitary wave adiabatically varying under small perturbation due to variable topography and bottom friction. The modulation solution in the vicinity of the leading edge also varies adiabatically; however, its qualitative structure considered in Section 6.3 (see Figs. 5, 6) remains unchanged.

In sharp contrast with the described case of adiabatic deformation of an undular bore front is case (ii), when the second term on the left-hand side of (106) contributes to the leading order, i.e. to the motion of the leading edge itself. Namely, we have

$$ \frac{d\lambda^+}{dT} = \rho^+ + v_i'\frac{\partial l_i}{\partial \tilde X}, \qquad i = 1, 2. \qquad (115) $$

Now $d\lambda^+/dT \neq \rho^+$, which means that the amplitude of the lead solitary wave $a = -2\lambda^+$ varies in an essentially different way from that of an isolated solitary wave. Indeed, the term $\rho^+$ on the right-hand side of (115) is responsible for local adiabatic variations of the solitary wave, while the term $v_i'\,\partial l_i/\partial\tilde X$ describes the nonlocal part of the variations associated with the wave interactions within the undular bore.
Using the asymptotic formulae (103), which imply $v_2' \ge 0$, $v_1' \le 0$, and the condition $\lim_{\tilde X \to 0}|dl_{1,2}/d\tilde X| = \infty$ along with $l_2 \ge l_1$, it is not difficult to show that this nonlocal term is always nonnegative, i.e. the lead solitary wave in the undular bore propagating over a gradual slope with bottom friction always moves faster (and, therefore, has greater amplitude) than an isolated solitary wave of the same initial amplitude at the beginning of the slope. Indeed, as we have shown in Section 5, the presence of a slope and bottom friction always results in “squeezing” the cnoidal wave, hence increasing the momentum exchange between solitary waves in the vicinity of the leading edge of the undular bore and accelerating the lead solitary wave itself.

The situation here is qualitatively analogous to that described in Section 6.3, where the general global modulation solution for the unperturbed KdV equation was discussed. Similarly to that case, the leading edge now represents a characteristic envelope, i.e. a caustic (see Fig. 7a); otherwise we are back in case (i), implying $d\lambda^+/dT = \rho^+$. Unlike the case of adiabatic variations of the leading edge, determination of the function $\lambda^+(T)$ now requires knowledge of the full solution of the perturbed modulation system (27) with the matching conditions (93). While analytic methods to construct such a solution for inhomogeneous quasilinear systems are not presently available, it is instructive to assume that $d\lambda^+/dT - \rho^+$ is a known function of $T$ and to study the structure of the solution in the close vicinity of the leading edge. Taking into account the explicit form (103) of the velocity corrections, equations (115) assume the form

= −dλ+/dT − ρ+ 2(l2 − l1) ln[16/(1−m)] (1−m) , (116)

= −dλ+/dT − ρ+ 2(l2 − l1) ln[16/(1−m)] (1−m) . (117)

Taking the difference of (116) and (117), we transform it to the form

∂(1 −m) dλ+/dT − ρ+ (λ+)2 (1−m) ln[16/(1−m)] .
(118) This equation can be readily integrated with the initial condition (111) to give (1−m)2 2(dλ+/dT − ρ+) (λ+)2 (X+ −X). (119) This solution coincides with the asymptotic formula (92) for the behaviour of the modulus in the vicinity of the leading edge of the undular bore in general unperturbed GP problem [16] but instead of the derivative dλ+/dT in (92) we have the difference dλ+/dT −ρ+ (which is always positive as we have established). 7 Conclusions We have studied the effects of a gradual slope and turbulent (Chezy) bottom friction on the propagation of solitary waves, nonlinear periodic waves and undular bores in shallow-water flows in the framework of the variable-coefficient perturbed KdV equation. The analysis has been performed in the most general setting provided by the associated Whitham equations describing slow modulations of a periodic travelling wave due to the slope, bottom friction and spatial nonuniformity of initial data. This modulation theory, developed in general form for perturbed integrable equations in Kamchatnov (2004) was applied here to the perturbed KdV equation and allowed us to take into account slow variations of all three parameters in the cnoidal wave solution. The particular time-independent solutions of the perturbed modulation equations were shown to be consistent with the adiabatically varying solutions for a single solitary wave and for a periodic wave propagating over a slope without bottom friction obtained in Ostrovsky & Pelinovsky (1970, 1975) and Miles (1979, 1983a). It was shown, however, that the assumption of zero mean elevation used in these papers for the description of slow variations of a cnoidal wave, ceases to be valid in the case when the turbulent bottom friction is present. In this case, a more general solution was obtained numerically improving the results of Miles (1983b). 
Further, the derived full time-dependent modulation system was used for the description of the effects of variable topography and bottom friction on the propagation of undular bores, in particular on the variations of the undular bore front representing a system of weakly interacting solitary waves. By the analysis of the characteristics of the Whitham system in the vicinity of the leading edge of the undular bore, two possible configurations have been identified depending on whether the leading edge of the undular bore represents a regular characteristic of the modulation system or its singular characteristic, i.e. a caustic. The first case was shown to correspond to adiabatically slow deformations of the classical Gurevich-Pitaevskii modulation solution and is realised when the perturbations due to variable topography and bottom friction are small compared with the existing spatial non-uniformity of modulations in the undular bore (which is supposed to be formed outside the region of variable topography/bottom friction). In the case when modulations due to the external perturbations are comparable in magnitude with the existing modulations in the undular bore, the leading edge becomes a caustic, and this situation was shown to correspond to enhanced solitary wave interactions within the undular bore front. These enhanced interactions have been shown to lead to a “nonlocal” leading solitary wave amplitude growth, which cannot be predicted in the frame of the traditional local adiabatic approach to propagation of an isolated solitary wave in a variable environment. As we mentioned in the Introduction, one of our original motivations for this study was the possibility to model a shoreward propagating tsunami as an undular bore.
In this context, we would suggest that the second scenario described above is the more relevant, which has the implication that the growth, and eventual breaking of the leading waves in a tsunami wavetrain, cannot be modeled as a local effect for that particular wave, but is determined instead by the whole structure of the wavetrain. Acknowledgements This work was started during the visit of A.M.K. at the Department of Mathematical Sci- ences, Loughborough University, UK. A.M.K. is grateful to EPSRC for financial support. Appendix A: Derivation of the perturbed modulation system We express the integrand function in the right-hand side of (24) in terms of the µ-variable (15): (2λi − s1 − U)R = 8Gµ3 − [8Gλi + 4(F + 2s1G)]µ2 + [4(F + 2s1G)λi + 2s1(s1G+ F )]µ− 2s1(s1G+ F )λi. (120) Then we obtain with the use of (13), (14), and (16) the following expressions: 〈µ〉 = µdθ = −P (µ) 〈µ2〉 = 1 µ2dθ = 〈µ3〉 = µ3dθ = − + s1〈µ2〉 − s2〈µ〉+ s3, (121) where I is a known integral (λ3 − µ)(µ− λ2)(µ− λ1) dµ (λ3 − λ1)5/2[(1−m+m2)E(m)− (1−m)(1−m/2)K(m)], (122) K(m) and E(m) being the complete elliptic integrals of the first and second kind, respec- tively. The derivatives of I with respect to λi are also known table integrals (Gradshtein & Ryzhik 1980): (λ3 − µ)(µ− λ2) µ− λ1 λ3 − λ1[(λ2 + λ3 − 2λ1)E − 2(λ2 − λ1)K], (λ3 − µ)(µ− λ1) µ− λ2 λ3 − λ1[(λ3 − λ1)K + (λ1 + λ3 − 2λ2)E], (µλ2)(µ− λ1) λ3 − µ λ3 − λ1[(2λ3 − λ1 − λ2)E − (λ2 − λ1)K]. (123) We can easily express the si-derivatives in terms of λi derivatives by differentiation of the formulae (see (16)) s1 = λ1 + λ2 + λ3, s2 = λ1λ2 + λ1λ3 + λ2λ3, s3 = λ1λ2λ3 (124) and solving the linear system for differentials. Simple calculation gives (−1)3−k j 6=i(λi − λj) . 
(125) Then, combining (123) and (125), we obtain the derivatives ∂I/∂si and hence the expressions (λ3 − λ1) (s21 − 3s2) (λ2 − λ1)(λ2 + λ3 − 2λ1) + s1λ1 + λ 1 − λ2λ3 (λ3 − λ1) (126) To complete the calculation of the right-hand side of (24), we need also expressions ∂L/∂λ1 = 2(λ2 − λ1) ∂L/∂λ2 2(λ3 − λ2)(1−m)K E − (1−m)K ∂L/∂λ3 2(λ3 − λ2)K (127) Collecting all contributions into perturbations terms, we obtain the Whitham equations in the form = ρi = Ci[F (T )Ai −G(T )Bi], (128) where Cj , Aj , Bj and vj , j = 1, 2, 3 are specified by formulae (28) - (30). References [1] Apel, J.P. 2003 A new analytical model for internal solitons in the ocean, Journ. Phys. Oceanogr. 33, 2247. [2] Avilov, V.V., Krichever,I.M. and Novikov, S.P 1987 Evolution of Whitham zone in the theory of Korteweg-de Vries. Sov. Phys. Dokl. 32, 564 - . [3] Benjamin, T.B. and Lighthill, M.J. 1954 On cnoidal waves and bores. Proc. Roy. Soc. A224, 448 [4] Boussinesq, J. 1982 Théorie des ondes des remous qui se propagent le long d’un canal rectangulaire, en communuuant au liquide contenu dans ce canal des vitesses sensblemnt pareilles de la surface au fond. J. Math. Pures Appl. 17, 55-108. [5] Dubrovin, B.A. and Novikov, S.P. 1989 Hydrodynamics of weakly deformed soliton lattices. Differential geometry and Hamiltonian theory. Russian Math. Surveys 44, 35– [6] El, G.A. 2005 Resolution of a shock in hyperbolic systems modified by weak dispersion. Chaos 15, Art. No 037103. [7] Fornberg, D. and Whitham, G.B. 1978 A numerical and theoretical study of certain nonlinear wave phcnomena. Phil Trans. Roy. Soc. London A 289 373-403. [8] Gradshtein, I.S. and Ryzhik, I.M. 1980 Table of integrals, series, and products, London : Academic Press. [9] Grimshaw, R. 1979 Slowly varying solitary waves. I Korteweg-de Vries equation. Proc. Roy. Soc. 368A, 359-375. [10] Grimshaw, R. 1981 Evolution equations for long nonlinear internal waves in stratified shear flows. Stud. Appl. Math. 65, 159-188. [11] Grimshaw, R. 
2006 Internal solitary waves in a variable medium. Gesellschaft für Angewandte Mathematik (accepted). [12] Grimshaw, R. Pelinovsky, E. and Talipova, T. 2003 Damping of large-amplitude soli- tary waves. Wave Motion 37, 351-364. [13] Grimshaw, R.H.J. and Smyth, N.F. 1986 Resonant flow of a stratified fluid over topography. J. Fluid Mech. 169, 429. [14] Gurevich, A.V., Krylov, A.L. and El, G.A. 1992 Evolution of a Riemann wave in dispersive hydrodynamics. Sov. Phys. JETP, 74 957–962. [15] Gurevich, A.V., Krylov, A.L. and Mazur, N.G. 1989 Quasi-simple waves in Korteweg- de Fries hydrodynamics, Zh. Eksp. Teor. Fiz. 95 1674. [16] Gurevich, A.V. and Pitaevskii, L.P. 1974 Nonstationary structure of a collisionless shock wave. Sov. Phys. JETP 38, 291. [17] Gurevich, A.V. and Pitaevskii, L.P. 1987 Averaged description of waves in the Korteweg-de Vries-Burgers equation. Sov. Phys. JETP 66, 490. [18] Gurevich, A.V. and Pitaevskii, L.P. 1991 Nonlinear waves with dispersion and non-local damping. Sov. Phys. JETP, 72, 821–825. [19] Johnson, R.S. 1970 A non-linear equation incorporating damping and dispersion, J. Fluid Mech. 42, 49-60. [20] Johnson, R.S. 1973a On the development of a solitary wave moving over an uneven bottom. Proc. Camb. Phil. Soc. 73, 183-203. [21] Johnson, R.S. 1973b On an asymptotic solution of the Korteweg - de Vries equation with slowly varying coefficients, J. Fluid Mach., 60, 813-824. [22] Johnson, R.S. 1997 A Modern Introduction to the Mathematical Theory of Water Waves Cambridge University Press, Cambridge. [23] Kamchatnov, A.M. 2000 Nonlinear Periodic Waves and Their Modulations—An In- troductory Course, World Scientific, Singapore. [24] Kamchatnov, A.M. 2004 On Whitham theory for perturbed integrable equations. Physica D188 247–261. [25] Lax, P.D., Levermore, C.D. and Venakides, S. 1994 The generation and propagation of oscillations in dispersive initial value problems and their limiting behavior. Important developments in soliton theory, ed. by A.S. 
Focas and V.E. Zakharov, (Springer Ser. Nonlinear Dynam., Springer, Berlin 1994) p. 205. [26] Miles, J.W. 1979 On the Korteweg - de Vries equation for a gradually varying channel, J. Fluid Mech 91 181-190 [27] Miles J.W. 1983a Solitary wave evolution over a gradual slope with turbulent friction. J. Phys. Oceanography, 13 551–553. [28] Miles, J.W. 1983b Wave evolution over a gradual slope with turbulent friction. J. Fluid Mech 133 207-216 [29] Myint, S. and Grimshaw, R.H.J. 1995 The modulation of nonlinear periodic wavetrains by dissipative terms in the Korteweg-de Vries equation. Wave Motion, 22, 215–238. [30] Ostrovsky, L.A. and Pelinovsky, E.N. 1970 Wave transformation on the surface of a fluid of variable depth. Akad. Nauk SSSR, Izv. Atmos. Ocean Phys. 6, 552-555. [31] Ostrovsky, L.A. and Pelinovsky, E.N. 1975 Refraction of nonlinear sea waves in a coastal zone. Akad. Nauk SSSR, Izv. Atmos. Ocean Phys. 11, 37-41. [32] Smyth, N.F. 1987 Modulation theory for resonant flow over topography, Proc. Roy. Soc. 409A, 79. [33] Whitham, G.B. 1965 Non-linear dispersive waves, Proc. Roy. Soc. London A283, 238. [34] Whitham, G.B. 1974 Linear and Nonlinear Waves, Wiley–Interscience, New York. Introduction Problem formulation Modulation equations Modulation solution in the solitary wave limit Adiabatic deformation of a cnoidal wave Undular bore propagation over variable topography with bottom friction Gurevich-Pitaevskii problem for flat-bottom zero-friction case Undular bore developing from an initial jump Structure of the undular bore front Gurevich-Pitaevskii problem for perturbed modulation system Deformation of the undular bore front due to variable topography and bottom friction Conclusions ABSTRACT This paper considers the propagation of shallow-water solitary and nonlinear periodic waves over a gradual slope with bottom friction in the framework of a variable-coefficient Korteweg-de Vries equation. 
We use the Whitham averaging method, using a recent development of this theory for perturbed integrable equations. This general approach enables us not only to improve known results on the adiabatic evolution of isolated solitary waves and periodic wave trains in the presence of variable topography and bottom friction, modeled by the Chezy law, but also importantly, to study the effects of these factors on the propagation of undular bores, which are essentially unsteady in the system under consideration. In particular, it is shown that the combined action of variable topography and bottom friction generally imposes certain global restrictions on the undular bore propagation so that the evolution of the leading solitary wave can be substantially different from that of an isolated solitary wave with the same initial amplitude. This non-local effect is due to nonlinear wave interactions within the undular bore and can lead to an additional solitary wave amplitude growth, which cannot be predicted in the framework of the traditional adiabatic approach to the propagation of solitary waves in slowly varying media. <|endoftext|><|startoftext|> Introduction It was conjectured by Diósi, Feldmann and Kosloff in [4], based on thermodynamical considerations, that the von Neumann entropy of a quantum state equal to a mixture
\[ R_n := \frac{1}{n}\Big( \sigma\otimes\rho^{\otimes(n-1)} + \rho\otimes\sigma\otimes\rho^{\otimes(n-2)} + \cdots + \rho^{\otimes(n-1)}\otimes\sigma \Big) \]
exceeds the entropy of a component asymptotically by the Umegaki relative entropy S(σ‖ρ), that is,
\[ S(R_n) - (n-1)S(\rho) - S(\sigma) \to S(\sigma\|\rho) \qquad (1) \]
as n → ∞. Here ρ and σ are density matrices acting on a finite dimensional Hilbert space. Recall that S(σ) = −Tr σ log σ and
\[ S(\sigma\|\rho) = \begin{cases} \mathrm{Tr}\,\sigma(\log\sigma - \log\rho) & \text{if } \operatorname{supp}\sigma \le \operatorname{supp}\rho, \\ +\infty & \text{otherwise.} \end{cases} \]
Concerning the background of quantum entropy quantities, we refer to [10, 12]. Apparently no exact proof of (1) has been published even for the classical case, although for that case a heuristic proof is offered in [4].
In this paper, an analytic proof of (1) is first given for the case supp σ ≤ supp ρ, using an inequality between the Umegaki and the Belavkin-Staszewski relative entropies, and the weak law of large numbers in the quantum case. In the second part of the paper, it is clarified that the problem is related to the theory of classical-quantum channels. The essential observation is the fact that S(Rn) − (n−1)S(ρ) − S(σ) in the conjecture is a Holevo quantity (classical-quantum mutual information) for a certain channel for which the relative entropy emerges as the capacity per unit cost. The two different proofs lead to two different generalizations of the conjecture. 2 An analytic proof of the conjecture In this section we assume that supp σ ≤ supp ρ for the support projections of σ and ρ. One can simply compute:
\[ S(R_n\|\rho^{\otimes n}) = \mathrm{Tr}\,\big(R_n\log R_n - R_n\log\rho^{\otimes n}\big) = -S(R_n) - (n-1)\,\mathrm{Tr}\,\rho\log\rho - \mathrm{Tr}\,\sigma\log\rho. \]
Hence the identity
\[ S(R_n\|\rho^{\otimes n}) = -S(R_n) + (n-1)S(\rho) + S(\sigma\|\rho) + S(\sigma) \]
holds. It follows that the conjecture (1) is equivalent to the statement S(Rn‖ρ^{⊗n}) → 0 as n → ∞ when supp σ ≤ supp ρ. Recall the Belavkin-Staszewski relative entropy
\[ S_{BS}(\omega\|\rho) = \mathrm{Tr}\,\big(\omega\log(\omega^{1/2}\rho^{-1}\omega^{1/2})\big) = -\mathrm{Tr}\,\big(\rho\,\eta(\rho^{-1/2}\omega\rho^{-1/2})\big) \]
if supp ω ≤ supp ρ, where η(t) := −t log t, see [1, 10]. It was proved by Hiai and Petz that
\[ S(\omega\|\rho) \le S_{BS}(\omega\|\rho), \qquad (2) \]
see [6], or Proposition 7.11 in [10]. Theorem 1. If supp σ ≤ supp ρ, then S(Rn) − (n−1)S(ρ) − S(σ) → S(σ‖ρ) as n → ∞. Proof: We want to use the quantum law of large numbers, see Proposition 1.17 in [10]. Assume that ρ and σ are d × d density matrices; we may suppose that ρ is invertible. Due to the GNS-construction with respect to the limit ϕ∞ of the product states ϕn(A) = Tr ρ^{⊗n}A on the n-fold tensor product Md(C)^{⊗n}, n ∈ N, all finite tensor products Md(C)^{⊗n} are embedded into a von Neumann algebra M acting on a Hilbert space H. If γ denotes the right shift and X := ρ^{−1/2}σρ^{−1/2}, then Rn is written as
\[ R_n = (\rho^{1/2})^{\otimes n}\Big(\frac{1}{n}\sum_{i=0}^{n-1}\gamma^i(X)\Big)(\rho^{1/2})^{\otimes n}. \]
By inequality (2), we get 0 ≤ S(Rn‖ρ ⊗n) ≤ SBS(Rn‖ρ = −Tr ρ⊗n η (ρ−1/2)⊗nRn(ρ −1/2)⊗n γi(X) , (3) where Ω is the cyclic vector in the GNS-construction. The law of large numbers gives γi(X) → I in the strong operator topology in B(H), since ϕ(X) = Tr ρρ−1/2σρ−1/2 = 1. Since the continuous functional calculus preserves the strong convergence (simply due to approximation by polynomials on a compact set), we obtain γi(X) → η(I) = 0 strongly. This shows that the upper bound (3) converges to 0 and the proof is complete. By the same proof one can obtain that for Rm,n := σ⊗m ⊗ ρ⊗(n−1) + ρ⊗ σ⊗m ⊗ ρ⊗(n−2) + · · · + ρ⊗(n−1) ⊗ σ⊗m the limit relation S(Rm,n) − (n− 1)S(ρ) −mS(σ) → mS(σ‖ρ) (4) holds as n → ∞ when m is fixed. In the next theorem we treat the probabilistic case in a matrix language. The proof includes the case when supp σ ≤ supp ρ is not true. Those readers who are not familiar with the quantum setting of the previous theorem are suggested to follow the arguments below. Theorem 2. Assume that ρ and σ are commuting density matrices. Then S(Rn)− (n− 1)S(ρ) − S(σ) → S(σ‖ρ) as n → ∞. Proof: We may assume that ρ = Diag(µ1, . . . , µℓ, 0, . . . , 0) and σ = Diag(λ1, . . . , λd) are d×d diagonal matrices, µ1, . . . , µℓ > 0 and ℓ < d. (We may consider ρ, σ in a matrix algebra of bigger size if ρ is invertible.) If supp σ ≤ supp ρ, then λℓ+1 = · · · = λd = 0; this will be called the regular case. When supp σ ≤ supp ρ is not true, we may assume that λd > 0 and we refer to the singular case. The eigenvalues of Rn correspond to elements (i1, . . . , in) of {1, . . . , d} (λi1µi2 · · ·µin + µi1λi2µi3 · · ·µin + · · · + µi1 · · ·µin−1λin). (5) We divide the eigenvalues in three different groups as follows: (a) A corresponds to (i1, . . . , in) ∈ {1, . . . , d} n with 1 ≤ i1, . . . , in ≤ ℓ, (b) B corresponds to (i1, . . . , in) ∈ {1, . . . , d} n which contains exactly one d, (c) C is the rest of the eigenvalues. 
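For commuting states the eigenvalue formula (5) makes the limit easy to probe numerically. The following is a minimal sketch, not part of the original argument: it assumes the normalisation of Rn by 1/n, enumerates all index tuples by brute force (so it is only practical for small d and n), and uses hypothetical function names chosen for illustration.

```python
import itertools
import math


def eta(t):
    """eta(t) = -t log t, with the convention eta(0) = 0."""
    return -t * math.log(t) if t > 0 else 0.0


def entropy(p):
    """Entropy of a diagonal density matrix given by its eigenvalues."""
    return sum(eta(t) for t in p)


def relative_entropy(lam, mu):
    """S(sigma||rho) for diagonal sigma = Diag(lam), rho = Diag(mu)."""
    total = 0.0
    for l, m in zip(lam, mu):
        if l > 0:
            if m == 0:
                return math.inf  # supp sigma is not below supp rho
            total += l * (math.log(l) - math.log(m))
    return total


def entropy_excess(lam, mu, n):
    """S(R_n) - (n-1) S(rho) - S(sigma), using the eigenvalues (5) of R_n."""
    d = len(mu)
    s_rn = 0.0
    for idx in itertools.product(range(d), repeat=n):
        kappa = 0.0
        for k in range(n):  # position k carries sigma, the others carry rho
            term = lam[idx[k]]
            for j in range(n):
                if j != k:
                    term *= mu[idx[j]]
            kappa += term
        s_rn += eta(kappa / n)  # assumed 1/n normalisation of the mixture
    return s_rn - (n - 1) * entropy(mu) - entropy(lam)


if __name__ == "__main__":
    lam, mu = (0.4, 0.6), (0.7, 0.3)  # regular case, arbitrary test values
    print("S(sigma||rho) =", relative_entropy(lam, mu))
    for n in (1, 2, 4, 8):
        print(n, entropy_excess(lam, mu, n))
```

In the regular case the printed values increase towards S(σ‖ρ) from below. For the extreme singular example σ = Diag(0, 1), ρ = Diag(1, 0), the matrix Rn has n eigenvalues equal to 1/n, so the same function returns log n, which diverges.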
If the eigenvalue (5) is in group A, then it is (λi1/µi1) + · · · + (λin/µin) µi1µi2 · · ·µin . First we compute η(κ) = i1,...,in (λi1/µi1) + · · · + (λin/µin) µi1 · · ·µin Below the summations are over 1 ≤ i1, . . . , in ≤ ℓ: i1,...,in (λi1/µi1) + · · · + (λin/µin) µi1 · · ·µin i1,...,in (λi1/µi1) + · · · + (λin/µin) µi1 · · ·µin log(µi1 · · ·µin) + Qn i1,...,in λi1µi2 · · ·µin log µik + i1,...,in λi1µi2 · · ·µin logµik + · · · + i1,...,in λi1µi2 · · ·µin log µik (n− 1) µik logµik + λik logµik = (n− 1)S(ρ) − λi logµi + Qn, where Qn := i1,...,in (µi1 · · ·µin)η (λi1/µi1) + · · · + (λin/µin) Consider a probability space (Ω,P) := {1, . . . , ℓ}N, (µ1, . . . , µℓ) where (µ1, . . . , µℓ) N is the product of the measure on {1, . . . , ℓ} with the distribution (µ1, . . . , µℓ). For each n ∈ N let Xn be a random variable on Ω depending on the nth {1, . . . , ℓ} so that the value of Xn at i ∈ {1, . . . , ℓ} is λi/µi. Then X1, X2, . . . are identically distributed independent random variables and Qn is the expectation value of X1 + · · · + Xn The strong law of large numbers says that X1 + · · · + Xn → E(X1) = λi almost surely. Since η((X1 + · · · + Xn)/n) is uniformly bounded, the Lebesgue bounded convergence theorem implies that Qn → η as n → ∞. In the regular case i=1 λi = 1, Qn → 0 and all non-zero eigenvalues are in group A. Hence we have S(Rn) − (n− 1)S(ρ) − S(σ) = − λi logµi + λi log λi + Qn = S(σ‖ρ) + Qn and the statement is clear. Next we consider the singular case, when we have η(κ) = (n− 1)S(ρ) + O(1), and we turn to eigenvalues in B. If the eigenvalue corresponding to (i1, . . . , in) ∈ {1, . . . , d}n is in group B and i1 = d, then the eigenvalue is λdµi2 . . . µin . It follows that i2,...,in (λdµi2 · · ·µin (λdµi2 · · ·µin i2,...,in (µi2 · · ·µin) log(µi2 · · ·µin) − (n− 1)S(ρ) − When i2 = d, . . . 
, in = d, we get the same quantity, so this should be multiplied with n: η(κ) = λd(n− 1)S(ρ) − λd log We make a lower estimate to the entropy of Rn in such a way that we compute κ η(κ) when κ runs over A and B. It is clear now that S(Rn) − (n− 1)S(ρ) − S(σ) ≥ η(κ) + η(κ) − (n− 1)S(ρ) − S(σ) ≥ λd(n− 1)S(ρ) + λd log n + O(1) → +∞ as n → ∞. 3 Interpretation as capacity A classical-quantum channel with classical input alphabet X transfers the input x ∈ X into the output W (x) ≡ ρx which is a density matrix acting on a Hilbert space K. We restrict ourselves to the case when X is finite and K is finite dimensional. If a classical random variable X is chosen to be the input, with probability distribution P = {p(x) : x ∈ X}, then the corresponding output is the quantum state ρX := x∈X p(x)ρx. When a measurement is performed on the output quantum system, it gives rise to an output random variable Y which is jointly distributed with the input X . If a partition of unity {Fy : y ∈ X} in B(K) describes the measurement, then Prob(Y = y |X = x) = Tr ρxFy (x, y ∈ X ). (6) According to the Holevo bound, we have I(X ∧ Y ) := H(Y ) −H(Y |X) ≤ I(X,W ) := S(ρX) − p(x)S(ρx), (7) which is actually a simple consequence of the monotonicity of the relative entropy un- der state transformation [7], see also [11]. I(X,W ) is the so-called Holevo quantity or classical-quantum mutual information, and it satisfies the identity p(x)S(ρx‖ρ) = I(X,W ) + S(ρX‖ρ), (8) where ρ is an arbitrary density. The channel is used to transfer sequences from the classical alphabet; x = (x1, x2, . . . , xn) ∈ X n is transferred into the quantum state W⊗n(x) = ρx := ρx1⊗ρx2⊗. . .⊗ρxn . A code for the channel W⊗n is defined by a subset An ⊂ X n, which is called a codeword set. The de- coder is a measurement {Fy : y ∈ X n}. 
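The Holevo quantity in (7) and the identity (8) can be illustrated in the commuting case, where every density matrix is diagonal and is represented by its list of eigenvalues. This is only a sketch under that assumption; the function names and the numerical values are hypothetical and chosen for illustration.

```python
import math


def entropy(p):
    """Entropy -sum p log p of a diagonal density matrix."""
    return -sum(t * math.log(t) for t in p if t > 0)


def rel_entropy(p, q):
    """Relative entropy S(p||q) for diagonal states; q is assumed to have full support."""
    return sum(a * (math.log(a) - math.log(b)) for a, b in zip(p, q) if a > 0)


def mixture(prior, outputs):
    """Average output state rho_X = sum_x p(x) rho_x."""
    dim = len(outputs[0])
    return [sum(w * rho[i] for w, rho in zip(prior, outputs)) for i in range(dim)]


def holevo(prior, outputs):
    """I(X, W) = S(rho_X) - sum_x p(x) S(rho_x)."""
    mean_entropy = sum(w * entropy(rho) for w, rho in zip(prior, outputs))
    return entropy(mixture(prior, outputs)) - mean_entropy


if __name__ == "__main__":
    prior = [0.25, 0.75]
    outputs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
    ref = [0.2, 0.5, 0.3]  # arbitrary reference density rho in (8)
    lhs = sum(w * rel_entropy(r, ref) for w, r in zip(prior, outputs))
    rhs = holevo(prior, outputs) + rel_entropy(mixture(prior, outputs), ref)
    print(lhs, rhs)  # the two sides of identity (8)
```

Since S(ρX‖ρ) ≥ 0, the identity shows that the averaged relative entropy on the left is an upper bound for I(X, W) for any choice of the reference density.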
The probability of error is Prob(X ≠ Y), where X is the input random variable uniformly distributed on An and the output random variable Y is determined by (6), with the letters x and y replaced by the sequences x and y in X^n. The essential observation is the fact that S(Rn) − (n−1)S(ρ) − S(σ) in the conjecture is a Holevo quantity in case of a channel with input sequences (x1, x2, . . . , xn) ∈ {0, 1}^n and outputs ρx1 ⊗ ρx2 ⊗ . . . ⊗ ρxn, where ρ0 = σ, ρ1 = ρ and the codewords are all sequences containing exactly one 0. More generally, we shall consider Holevo quantities
\[ I(A,\rho_0,\rho_1) := S\Big(\frac{1}{|A|}\sum_{x\in A}\rho_x\Big) - \frac{1}{|A|}\sum_{x\in A} S(\rho_x) \]
defined for any set A ⊂ {0, 1}^n of binary sequences of length n. The concept related to the conjecture we study is the channel capacity per unit cost, which is defined next, for simplicity, only in the case where X = {0, 1}, the cost of a character 0 ∈ X is 1, while the cost of 1 ∈ X is 0. For a memoryless channel with a binary input alphabet X = {0, 1} and an ε > 0, a number R > 0 is called an ε-achievable rate per unit cost if for every δ > 0 and for any sufficiently large T, there exists a code of length n > T with at least e^{T(R−δ)} codewords such that each of the codewords contains at most T 0's and the error probability is at most ε. The largest R which is an ε-achievable rate per unit cost for every ε > 0 is the channel capacity per unit cost. Lemma 1. For an arbitrary A ⊂ {0, 1}^n, I(A, ρ0, ρ1) ≤ c(A)S(ρ0‖ρ1) holds, where
\[ c(A) := \frac{1}{|A|}\sum_{x\in A} |\{i : x_i = 0\}|. \]
Proof: Let c(x) := |{i : xi = 0}| for x ∈ A. Since I(A, ρ0, ρ1) is a particular Holevo quantity I(X, W^{⊗n}), we can use the identity (8) with the reference density ρ1^{⊗n} to get an upper bound
\[ \frac{1}{|A|}\sum_{x\in A} S(\rho_x\|\rho_1^{\otimes n}) = \frac{1}{|A|}\sum_{x\in A} c(x)\,S(\rho_0\|\rho_1) = c(A)\,S(\rho_0\|\rho_1) \]
for I(A, ρ0, ρ1). Lemma 2. If A ⊂ {0, 1}^n is a code of the channel W^{⊗n}, whose probability of error (for some decoding scheme) does not exceed a given 0 < ε < 1, then (1 − ε) log |A| − log 2 ≤ I(A, ρ0, ρ1). Proof: The right-hand side is a bound for the classical mutual information I(X ∧ Y) = H(Y) − H(Y|X), where Y is the channel output, see (7).
Since the error probability Prob(X ≠ Y) is smaller than ε, application of the Fano inequality (see [3]) gives H(X|Y) ≤ ε log |A| + log 2. Therefore I(X ∧ Y) = H(X) − H(X|Y) ≥ (1 − ε) log |A| − log 2, and the proof is complete. The above two lemmas show that the relative entropy S(ρ0‖ρ1) is an upper bound for the channel capacity per unit cost of the channel W(0) = ρ0 and W(1) = ρ1 with a binary input alphabet. In fact, assume that R > 0 is an ε-achievable rate. For every δ > 0 and T > 0 there is a code A ⊂ {0, 1}^n for which we get by Lemmas 1 and 2
\[ T S(\rho_0\|\rho_1) \ge c(A)S(\rho_0\|\rho_1) \ge I(A,\rho_0,\rho_1) \ge (1-\varepsilon)\log|A| - \log 2 \ge (1-\varepsilon)T(R-\delta) - \log 2. \]
Since T is arbitrarily large and ε, δ are arbitrarily small, R ≤ S(ρ0‖ρ1) follows. That S(ρ0‖ρ1) equals the channel capacity per unit cost will be verified below. Theorem 3. Let the classical-quantum channel W : X = {0, 1} → B(K) be defined as W(0) = ρ0 ≡ σ and W(1) = ρ1 ≡ ρ. Assume that An ⊂ {0, 1}^n is chosen such that (a) each element x = (x1, x2, . . . , xn) ∈ An contains at most ℓ copies of 0, (b) log |An| / log n → c as n → ∞, (c) c(An) := (1/|An|) Σ_{x∈An} |{i : xi = 0}| → c as n → ∞ for some real number c > 0 and for some natural number ℓ. If the random variable Xn has a uniform distribution on An, then
\[ \lim_{n\to\infty}\Big( S(\rho_{X_n}) - \frac{1}{|A_n|}\sum_{x\in A_n} S(\rho_x) \Big) = c\,S(\sigma\|\rho). \]
The proof of the theorem is divided into lemmas. We need the direct part of the so-called quantum Stein lemma obtained in [6], see also [2, 5, 9, 12]. Lemma 3. Let ρ0 and ρ1 be density matrices. For every η > 0 and 0 < R < S(ρ0‖ρ1), if N is sufficiently large, then there is a projection E ∈ B(K^{⊗N}) such that αN[E] := Tr ρ0^{⊗N}(I − E) < η and for βN[E] := Tr ρ1^{⊗N}E the estimate (1/N) log βN[E] < −R holds. Note that αN is called the error of the first kind, while βN is the error of the second kind. Lemma 4. Assume that ε > 0, 0 < R < S(ρ0‖ρ1), ℓ is a positive integer and the sequences x in An ⊂ {0, 1}^n contain at most ℓ copies of 0. Let the codewords be the N-fold repetitions x^N = (x, x, . . .
,x) of the sequences x ∈ An. If N is the integer part and n is large enough, then there is a decoding scheme such that the error probability is smaller than ε. Proof: We follow the probabilistic construction in [13]. Let the codewords be the N - fold repetitions xN = (x,x, . . . ,x) of the sequences x ∈ An. The corresponding output density matrices act on the Hilbert space K⊗Nn ≡ (K⊗n)⊗N . We decompose this Hilbert space into an N -fold product in a different way. For each 1 ≤ i ≤ n, let Ki be the tensor product of the factors i, i + n, i + 2n, . . . , i + (N − 1)n. So K is identified with K1 ⊗K2 ⊗ . . .⊗Kn. For each 1 ≤ i ≤ n we perform a hypothesis testing on the Hilbert space Ki. The 0-hypothesis is that the ith component of the actually chosen x ∈ An is 0. Based on the channel outputs at time instances i, i + n, . . . , i + (N − 1)n, the 0-hypothesis is tested against the alternative hypothesis that the ith component of x is 1. According to the quantum Stein lemma (Lemma 3), given any η > 0 and 0 < R < S(σ‖ρ), for N sufficiently large, there exists a test Ei such that the probability of error of the first kind is smaller than η, while the probability of error of the second kind is smaller than e−NR. The projections Ei and I − Ei form a partition of unity in the Hilbert space Ki, and the n-fold tensor product of these commuting projection will give a partition of unity in K⊗Nn. Let y ∈ {0, 1}n and set Fy := ⊗ i=1Fyi , where Fyi = Ei if yi = 0 and Fyi = I −Ei if yi = 1. Therefore, the result of decoding can be an arbitrary 0–1 sequence in {0, 1} The decoding scheme gives y ∈ {0, 1}n in such a way that yi = 0 if the tests accepted the 0-hypothesis for i and yi = 1 if the alternative was accepted. The error probability should be estimated: Prob(Y 6= X|X = x) = y:y 6=x Tr ρ⊗N y:y 6=x Tr ρ⊗Nxi Fyi y:yi 6=xi Tr ρ⊗Nxj Fyj ≤ Tr ρ⊗Nxi (I − Fxi). If xi = 0, then Tr ρ⊗Nxi (I − Fxi) = Tr ρ 0 (I −Ei) ≤ η, because it is an error of the first kind. 
When xi = 1, Tr ρ⊗Nxi (I − Fxi) = Tr ρ 1 Ei ≤ e from the error of the second kind. It follows that ℓη + ne−NR is a bound for the error probability. The first term will be small if η is small. The second term will be small if N is large enough. If both terms are majorized by ε/2, then the statement of the lemma holds. We can choose n so large that N defined by the statement should be large enough. Proof of Theorem 3: Since Lemma 1 gives an upper bound, that is, lim sup S(ρXn) − S(ρx) ≤ cS(σ‖ρ), it remains to prove that lim inf S(ρXn) − S(ρx) ≥ cS(σ‖ρ). Lemma 4 is about the N -times repeated input XN and describes a decoding scheme with error probability at most ε. According to Lemma 2 we have (1 − ε) log |An| − 1 ≤ S(ρXN ) − S(ρxN ). From the subadditivity of the entropy we have S(ρXN ) ≤ NS(ρX) S(ρxN ) = NS(ρx) holds due to the additivity for product. It follows that (1 − ε) log |An| ≤ S(ρX) − S(ρx). From the choice of N in Lemma 4 we have log |An| log n logn + log 2 − log ε log |An| and the lower bound is arbitrarily close to cR. Since R < S(ρ0‖ρ1) was arbitrary, the proof is complete. References [1] V.P. Belavkin and P. Staszewski, C*-algebraic generalization of relative entropy and entropy, Ann. Inst. Henri Poincaré, Sec. A 37(1982), 51–58. [2] I. Bjelaković, J. Deuschel, T. Krüger, R. Seiler, R. Siegmund-Schultze and A. Szko la, A quantum version of Sanov’s theorem, Comm. Math. Phys. 260(2005), 659–671. [3] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second edition, Wiley-Interscience, Hoboken, NJ, 2006. [4] L. Diósi, T. Feldmann and R. Kosloff, On the exact identity between thermodynamic and informatic entropies in a unitary model of friction, Int. J. Quantum Information, 4(2006), 99–104. [5] M. Hayashi, Quantum information. An introduction, Springer, 2006. [6] F. Hiai and D. Petz, The proper formula for relative entropy and its asymptotics in quantum probability, Comm. Math. Phys. 143(1991), 99–114. [7] A.S. 
Holevo, Some estimates for the amount of information transmittable by a quantum communication channel (in Russian), Problemy Peredachi Informacii, 9(1973), 3–11. [8] M.A. Nielsen and I.L. Chuang, Quantum computation and quantum information, Cambridge University Press, Cambridge, 2000. [9] T. Ogawa and H. Nagaoka, Strong converse and Stein’s lemma in quantum hypothesis testing, IEEE Trans. Inf. Theory 46(2000), 2428–2433. [10] M. Ohya and D. Petz, Quantum Entropy and its Use, Springer, 1993. [11] M. Ohya, D. Petz and N. Watanabe, On capacities of quantum channels, Prob. Math. Stat. 17(1997), 179–196. [12] D. Petz, Lectures on quantum information theory and quantum statistics, book manuscript in preparation. [13] S. Verdu, On channel capacity per unit cost, IEEE Trans. Inform. Theory 36(1990), 1019–1030. Introduction An analytic proof of the conjecture Interpretation as capacity ABSTRACT In a quantum mechanical model, Diósi, Feldmann and Kosloff arrived at a conjecture stating that the limit of the entropy of certain mixtures is the relative entropy as system size goes to infinity. The conjecture is proven in this paper for density matrices. The first proof is analytic and uses the quantum law of large numbers. The second one clarifies the relation to channel capacity per unit cost for classical-quantum channels. Both proofs lead to generalizations of the conjecture. <|endoftext|><|startoftext|> Intelligent location of simultaneously active acoustic emission sources: Part I Tadej Kosel and Igor Grabec Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva 6, POB 394, SI-1001 Ljubljana, Slovenia e-mail: tadej.kosel@guest.arnes.si; igor.grabec@fs.uni-lj.si Abstract: The intelligent acoustic emission locator is described in Part I, while Part II discusses blind source separation, time delay estimation and location of two simultaneously active continuous acoustic emission sources.
The location of acoustic emission on complicated aircraft frame structures is a difficult problem of non-destructive testing. This article describes an intelligent acoustic emission source locator. The intelligent locator comprises a sensor antenna and a general regression neural network, which solves the location problem based on learning from examples. Locator performance was tested on different test specimens. Tests have shown that the accuracy of location depends on sound velocity and attenuation in the specimen, the dimensions of the tested area, and the properties of stored data. The location accuracy achieved by the intelligent locator is comparable to that obtained by the conventional triangulation method, while the applicability of the intelligent locator is more general since analysis of sonic ray paths is avoided. This is a promising method for non-destructive testing of aircraft frame structures by the acoustic emission method.

INTRODUCTION

Acoustic emission (AE) testing is a non-destructive method used to locate and characterize developing cracks and defects in material. In non-destructive testing of aviation frame structures, acoustic emission is a well accepted method [8]. The location problem is usually solved by various triangulation techniques based on the analysis of ultrasonic ray trajectories [10], [1], [3]. Solving and programming the related equations is rather cumbersome and cannot be simply performed if the structure of the tested specimen is geometrically complicated. Acoustic emission testing of aircraft structures is a challenging and difficult problem. The structures involve bolts, fasteners and plates, all of which move relative to one another due to differential structural loading during flight. The complex geometry of the airframe results in multiple mode conversions of AE source signals, compounding the difficulty of relating the source event to the detected signal.
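To make the conventional approach concrete, the simplest one-dimensional analogue of triangulation can be sketched as follows. This is an illustrative textbook case, not the procedure used in the cited works: it assumes a straight band with two sensors, a single non-dispersive sound velocity and a source lying between the sensors. The function names and the numerical values are hypothetical.

```python
def arrival_times(x, s1, s2, v, t0=0.0):
    """Arrival times at sensors s1 and s2 of an event emitted at position x, time t0."""
    return t0 + abs(x - s1) / v, t0 + abs(x - s2) / v


def locate_1d(t1, t2, s1, s2, v):
    """Source position from the arrival-time difference, valid for s1 <= x <= s2.

    With dt = t1 - t2 = ((x - s1) - (s2 - x)) / v, solving for x gives
    x = (s1 + s2 + v * dt) / 2; the unknown emission time cancels out.
    """
    dt = t1 - t2
    return 0.5 * (s1 + s2 + v * dt)


if __name__ == "__main__":
    v = 5000.0          # assumed sound velocity in m/s
    s1, s2 = 0.0, 1.0   # assumed sensor positions in m
    t1, t2 = arrival_times(0.33, s1, s2, v, t0=1.2e-3)
    print(locate_1d(t1, t2, s1, s2, v))
```

Even in this idealised setting the result depends on knowing the velocity and the ray path exactly, which is the information that becomes hard to obtain on geometrically complicated structures.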
Manuscript generated: January 31, 2007

In order to avoid difficulties with equation solving and programming of the triangulation procedure, several empirical approaches based on learning from examples have already been proposed [5]. We developed a locator capable of learning from examples, which we therefore called an intelligent locator. The purpose of developing the intelligent locator is to replace information obtained from the analysis of sonic ray trajectories by information obtained directly from simulated AE events on the specimen under test. In this way, the calibration procedure, which has to be performed anyway, could be generalized to the training of the intelligent locator. The development of such an intelligent locator has been described elsewhere [4]. In the locator developed, a general regression neural network (GRNN) is employed [9], which acquires data about the detected AE signals and parameters of their sources during learning. The GRNN uses these data in testing when estimating the unknown source position from detected AE signals. For this purpose, associative GRNN operation is utilized. The basis of such operation is statistical estimation determined by the conditional average [6]. Consequently, the accuracy of the intelligent locator also depends on the learning procedure, and must be examined before testing. This article describes the results obtained by testing the intelligent locator on experimental continuous AE sources. The purpose of this study was to test and examine the advantages of the intelligent locator compared to a conventional locator, as described in Part I. In Part II an experiment will be explained in which an intelligent locator was used to locate two simultaneously active continuous AE sources generated by leakage air flow. 
http://arxiv.org/abs/0704.0047v1

Location of more than one source at the same time on the test specimen is a new approach in acoustic emission testing, and is a very promising method for aircraft and aerospace structural testing. When preparing the experiments, we focused on locating evolving defects in stressed materials and constructions, and leakage of vessels. We therefore performed location experiments on four different specimens with three different AE sources. The specimens comprised bands, plates, rings, and vessels, while the AE sources were simulated by rupture of a pencil lead (pen test), material deformation during tensile test, and leakage air flow through a small hole in a sample. The positions of AE sources used in testing were well specified. Actual positions were compared with estimated ones, and the discrepancy was used to describe the inaccuracy of the locator. In this article, only the experiment with leakage air flow through a small hole in a sample is explained. In Part I, location of one continuous AE source is explained. This Part is intended for better understanding of Part II and comparison of results. In Part II, a new approach to the location of two simultaneously active continuous AE sources is explained. Below, the article first explains the theoretical background for application of the conditional average to the location problem, then describes auxiliary AE signal processing, and finally demonstrates performance of the experimental intelligent locator.

THEORETICAL BACKGROUND

In this section we describe a non-parametric approach to empirical modeling of AE phenomena and solving the location problem. This modeling stems from a description of physical laws in terms of probability distributions. Since it has been explained in detail elsewhere, we present here just its basic concepts [6], [5]. The object of empirical modeling is the relationship between variables which are simultaneously measured by a set of sensors. 
In our example the variables are source coordinates and AE signal characteristics. Let them be represented by a vector of M components: x = (ξ1, . . . , ξM). In the empirical description of an AE phenomenon we repeat the observation N times to create a database of prototype vectors {x1, . . . , xN}. Instead of formulating a relation between the components of x, we treat this vector as a random variable and express the joint probability density function f by the estimator

$$ f(x) = \frac{1}{N} \sum_{n=1}^{N} \delta(x - x_n) . \qquad (1) $$

Here δ denotes Dirac’s delta function. For the purposes of modelling, we must also estimate the probability density in the space between the prototype points. This is achieved by replacing the singular delta function in Eq. (1) by a smooth function, such as the Gaussian

$$ w_n(x - x_n, \sigma) = \exp\left( -\frac{\|x - x_n\|^2}{2\sigma^2} \right) , \quad n = 1, \ldots, N , \qquad (2) $$

in which σ denotes the smoothing parameter. The data vectors determine an empirical model of the probability density function. Their acquisition corresponds to the learning phase of the empirical modeling. Let us further assume that observation of an AE phenomenon provides only partial information that is given by a truncated vector

$$ g = (\xi_1, \ldots, \xi_S; \emptyset) , \qquad (3) $$

in which ∅ denotes missing components. The problem is to estimate the complementary vector of missing or hidden components

$$ h = (\emptyset; \xi_{S+1}, \ldots, \xi_M) , \qquad (4) $$

such that the complete data vector is determined by concatenation

$$ x = g \oplus h = (\xi_1, \ldots, \xi_S, \xi_{S+1}, \ldots, \xi_M) . \qquad (5) $$

A statistically optimal solution to this problem is determined by the conditional average estimator, which is expressed by a superposition of terms [6]

$$ \hat{h} = \sum_{n=1}^{N} B_n(g)\, h_n , \quad \text{where} \qquad (6) $$

$$ B_n(g) = \frac{w(g - g_n, \sigma)}{\sum_{k=1}^{N} w(g - g_k, \sigma)} . \qquad (7) $$

The basis functions Bn(g) represent a measure of similarity between the truncated vector g given by a particular observation and truncated vectors from the database gn. The higher the value of Bn(g), the higher the contribution of hn to the sum (6) estimating ĥ. 
Hence, estimation of the hidden vector ĥ resembles associative recall, which is characteristic of intelligence. The conditional average represents a general non-parametric regression [6]. During the learning phase of operation an intelligent locator of AE sources accepts AE signals and source coordinates and stores prototype data vectors, while during application it accepts only AE signals and estimates the corresponding source position. Each of these phases can be performed in a separate unit which can be interpreted as a layer of a sensory-neural network. In order to ensure acceptable properties of the locator, the smoothing parameter σ must be properly chosen [2]. The purpose of δ function smoothing is to estimate the probability density function between the prototype data points. A unique method for optimal specification of the smoothing parameter is as yet unknown. In this case, it is numerically simpler to specify σ by the half distance to the closest neighbor point:

$$ \sigma_n = 0.5 \min_{i \neq n} \| g_i - g_n \| . \qquad (8) $$

Signal pre-processing

The intelligent locator comprised a sensor antenna, signal pre-processing unit and source locating unit, as shown in Fig. 1. The first unit calculates the time delay ∆t from AE signals y1(t) and y2(t), while the second unit estimates the source position ẑ from the time delay ∆t. AE signals y1(t) and y2(t) are detected by sensors and filtered using a Butterworth bandpass filter. Without the bandpass filter, time delays cannot be easily mapped to source positions on the sample band, and therefore the applicability of this method depends on the proper choice of bandpass filter function H(f). We found on dispersive specimens that information in the continuous AE signal about source position is located in a narrow frequency band. A wave packet with approximately constant wave velocity along the specimen must be extracted by this filter. The filter function H(f) is determined during the training procedure of the locator. 
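The conditional average estimator of Eqs. (6) and (7), together with the half-distance rule of Eq. (8), can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' software: the function names, the array layout (one prototype per row) and the unnormalised Gaussian kernel are assumptions made here. Any constant factor in the kernel cancels in the ratio that defines Bn(g).

```python
import numpy as np


def nearest_neighbour_sigma(g_protos):
    """Half the distance from each prototype g_n to its closest neighbour (cf. Eq. 8).

    g_protos: array of shape (N, S), one truncated prototype vector per row.
    """
    diff = g_protos[:, None, :] - g_protos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude i == n from the minimum
    return 0.5 * dist.min(axis=1)


def conditional_average(g, g_protos, h_protos, sigma):
    """Estimate the hidden vector h for an observed g (cf. Eqs. 6 and 7).

    sigma may be a scalar or an array with one value per prototype.
    """
    d2 = np.sum((g_protos - g) ** 2, axis=1)
    exponent = -d2 / (2.0 * np.asarray(sigma) ** 2)
    exponent -= exponent.max()          # numerical stability; cancels in the ratio
    w = np.exp(exponent)                # Gaussian similarity to each prototype
    b = w / w.sum()                     # basis functions B_n(g)
    return b @ h_protos                 # weighted superposition of stored h_n
```

In the locator setting, `g_protos` would hold the measured time delays and `h_protos` the known source coordinates of the prototype sources; calling `conditional_average` with a newly measured delay then returns the estimated source position.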
Fig. 1. AE signal processing by the intelligent locator. (Block diagram: two sensors on the test specimen, bandpass filter H(f), cross-correlator, time-delay detector and locator; the signal pre-processing unit yields ∆t and the source location unit yields ẑ.)

Two conventional methods for time delay estimation between two signals are known: threshold function and cross-correlation function. Estimation of time delay by the threshold function is simple, but only applicable in the case of discrete AE. More general, but also more demanding, is time delay estimation from the cross-correlation function of AE signals [11]. The cross-correlation function

$$ R_{y_1 y_2}(\tau) = \int y_1(t)\, y_2(t + \tau)\, dt \qquad (9) $$

generally exhibits a peak when parameter τ corresponds to the time delay ∆t between signals y1(t) and y2(t). The time delay is thus determined from the position of the peak of the cross-correlation function. One advantage of the application of the cross-correlation function is that it does not depend on the discrete or continuous character of AE signals. This method for time delay estimation is only applicable when one AE source is active at the time of detection. In the event of two or more simultaneously active continuous AE sources, a different approach should be used, which will be discussed in Part II. A filter function is calculated during calibration of the intelligent locator as follows. During calibration, a set of prototype sources is generated on the test specimen by a pen test at a prepared coordinate net [8]. This net in most cases has linear sections, where the prototype sources are positioned on a straight line. In this case, we know that time delays between signals are also linearly dependent. 
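As an illustration of time delay estimation from the peak of the cross-correlation function in Eq. (9), the following is a minimal sketch for uniformly sampled signals. The function name and the sampling-rate argument `fs` are assumptions introduced here, and the discrete sum computed by `numpy.correlate` stands in for the integral.

```python
import numpy as np


def estimate_delay(y1, y2, fs):
    """Delay of y2 relative to y1, in seconds, from the cross-correlation peak.

    y1, y2: equally sampled signals; fs: sampling rate in Hz.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    y1 = y1 - y1.mean()
    y2 = y2 - y2.mean()
    # r[k] = sum_n y1[n] * y2[n + k], the discrete analogue of R_{y1 y2}(tau)
    r = np.correlate(y2, y1, mode="full")
    lags = np.arange(-(len(y1) - 1), len(y2))
    return lags[np.argmax(r)] / fs
```

A positive result means that y2 lags y1. In practice both signals would first be passed through the bandpass filter H(f) described above; that step is omitted from the sketch.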
If we have a test specimen with a complicated geometrical structure, then a pre-calibration process has to be performed in which we have to choose a geometrically simple part of the specimen and carry out a pre-calibration procedure on this part such that time delays between signals are linearly dependent. For calibration we used AE signals generated by a pen test. We obtained 12 pairs of AE signals from two sensors concatenated with known coordinates of sources. The positions of simulated sources were uniformly distributed along a straight line on a specimen. In such cases, time delay ∆t is linearly related to source position z. This is advantageous for optimal determination of the bandpass filter because the reference is a straight line. Calculation of time delays on the same set of prototype AE signals was repeated 70 times. The bandpass filter of ∆f = 10 kHz was shifted by 1 kHz at each repetition from 5 to 75 kHz. Time delays were calculated at each repetition and the distribution obtained was compared with a straight line, as shown in Fig. 2. The frequency bandwidth was considered optimal when the root mean square error (RMSE) was minimal, as shown in Fig. 3(a). The optimal frequency band for this specimen was 35-45 kHz and the velocity of elastic waves was 1.7 km s−1. The filter was further used for pre-processing samples of prototype as well as test sources. As shown in Fig. 3(b), the pairs (z, ∆t), estimated from filtered signals, fit a straight line, except one outlier, which results from experimental error.

EXPERIMENT

The intelligent AE source locator is shown schematically in Fig. 4. 
It includes an automatic data-acquisition system controlled by computer and a network of AE sensors. The AE sensors are piezoelectric transducers (pinducers). The diameter of the transducer active area is 1.3 mm, and so it can be considered a point-like sensor. The signals from sensors are fed to a digital oscilloscope where they are digitized and transferred to a PC. Operation of the intelligent locator is determined by software in the PC that controls data acquisition and estimates the position of unknown AE sources. The locator operates in two different modes:

1) In learning or calibration mode, a set of N pen tests is performed in which complete information about the AE phenomenon is acquired. The operator must prepare an orientation net the shape of which depends on the shape of the test specimen. The recommended shape is an equidistant net, since such positioning of prototype sources yields a minimum error of the locator. From source coordinates and time delays between pre-processed AE signals, the prototype vectors are created and stored in the memory of the neural network as a database.

2) In application mode, only time delays between AE signals are provided. These are then associated in the neural network with the estimated source coordinates.

Fig. 2. Distribution of time delays ∆t [ms] and their linear approximation along the band specimen, plotted against position l [m] for the frequency bands 5–15, 15–25, 25–35, 35–45, 45–55 and 55–65 kHz. By this procedure an optimal bandpass filter can be determined.
Fig. 3. Time delays for prototype and test sources by using the bandpass filter of frequency 35-45 kHz. a) Deviation of prototype source position from a straight line for different filter frequency bandwidth (horizontal axis: frequency band ∆f [kHz]; the optimum is marked ∆fopt). b) Time delays of prototype and test sources versus actual location z [mm]; Legend: + prototype source, ◦ test source; one outlier is marked.

In the case of discrete AE, the time delay can be estimated visually from a marked jump in the burst of the AE signal, or can be instrumentally determined using a threshold function. In the case of continuous AE, however, time delays cannot be simply estimated, although a cross-correlation function has already been used for this purpose. In our approach, we therefore applied a cross-correlation function. The purpose of this experiment was to determine the accuracy of location of continuous AE sources on a one-dimensional specimen. Two experiments on an aluminum band specimen are explained in this article. We tested the locator on an aluminum band specimen of dimensions 4000 × 40 × 5 mm³. Reflection of AE signals at the ends of the band specimen was reduced by sharpening the ends. For testing we selected a test area in the middle of the band specimen where 23 holes were prepared. The distance between holes was 100 mm and the diameter of holes was 2 mm. Two AE sensors were mounted 100 mm away from the terminal holes. For the purpose of locator training, we generated 12 prototype sources separated by 200 mm, while all 23 holes were applied for locator testing. In this experiment, we calibrated the locator by pen test and examined it by continuous AE generated by air flow. The air flow was produced by expansion of compressed air through a nozzle of 1 mm diameter. The nozzle was mounted 1 mm above the band specimen surface. Two experiments were performed. 
In the first experiment, only one continuous AE source was active on the band specimen, while in the second experiment two continuous AE sources were active simultaneously on the band specimen. Successive simultaneous location of two sources is explained in Part II. Signals were processed as shown in Fig. 1. The first step in processing was calculation of the cross-correlation function of AE signals. The corresponding signal was sent through a bandpass Butterworth filter with a passband from 35 to 45 kHz. Determination of this filter is explained earlier in this article.

RESULTS

The results of locator testing are shown in Fig. 5(a). The absolute location error for each test source is shown in Fig. 5(b). Location error in the experiment ranges from 1.3 mm to 60 mm with average value εa = 20 mm (ignoring the outlier). If we describe the error with respect to the distance between sensors (2.4 m), the relative value is less than 1%. Increasing the number of prototype sources can reduce the error. Despite the complexity of continuous AE signals, the location problem was solved satisfactorily with respect to the accuracy required in non-destructive testing. Results also show that a standard calibration procedure with discrete AE signals generated by pen test can be used for locator training.

Fig. 4. Experimental setup of intelligent locator. (Block diagram: sensors, analog signals, digital oscilloscope, computer, parameter set, operator, calibration by simulated AE sources.)

Fig. 5. Result of continuous AE source location on the band. a) Estimated versus actual location x [mm] of test sources; Legend: + prototype source, ◦ test source; one outlier is marked. b) Absolute location error; εa - average error.
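As a quick check of the relative error quoted in the results, the sensor spacing follows from the geometry given earlier (23 holes at 100 mm pitch span 2200 mm, plus 100 mm to each sensor, giving 2400 mm). The symbols d and ε_max are introduced here only for this illustration, and the arithmetic suggests that the "less than 1%" figure refers to the average error rather than to the largest one:

```latex
\[
\frac{\varepsilon_a}{d} = \frac{20\ \mathrm{mm}}{2400\ \mathrm{mm}} \approx 0.83\,\% < 1\,\%,
\qquad
\frac{\varepsilon_{\max}}{d} = \frac{60\ \mathrm{mm}}{2400\ \mathrm{mm}} = 2.5\,\% .
\]
```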
DISCUSSION AND CONCLUSION

Estimation of source coordinates by the conditional average is subject to systematic error caused by smoothing of the delta function [5]. This error can be reduced by increasing the number of prototype sources. Since it is not always possible to increase the number of prototype sources due to the complexity of experiments, a compromise must be found by trial and error. Experimental error is acceptable, so we decided to make additional tests, as will be discussed in Part II. This study shows that a conventional AE locator operating on the triangulation method can be successfully replaced by an intelligent locator that learns from examples. The results show that the intelligent locator can locate sources with acceptable accuracy in cases of: (1) discrete AE on band and plate, (2) continuous AE on band, (3) discrete AE on plate with hole (ring), (4) discrete AE generated by specimen rupture during the tensile test, and (5) discrete AE on pressure vessel. It has also been shown that the locator can perform zonal locating [7]. Comparing mean errors of all experiments and the distances between prototype sources, we find that the average error is always less than 30% of the distance between prototype sources, while the maximal error is always less than 50% of the distance between prototype sources. The accuracy of the locator can be controlled by the number of prototype sources excited during training. The experimental error of the locator is a consequence of wave dispersion on a specimen that operates as a waveguide, reflections from boundaries, and attenuation. We found for dispersive waves that an optimal wave packet must be found which has approximately constant velocity along the test specimen. Estimation of time delay between AE signals by the cross-correlation function is only applicable for one active AE source. If there are several simultaneously active AE sources, then blind source separation should be used, as will be shown in Part II. 
REFERENCES

[1] Chan, Y. T. and Ho, K. C. (1994), A simple and efficient estimator for hyperbolic location, IEEE Transactions on Signal Processing 42(8), 1905–1915.
[2] Cherkassky, V. and Mulier, F. (1998), Learning from Data: Concepts, Theory, and Methods, John Wiley & Sons Inc., New York.
[3] Friedlander, B. (1987), A passive localization algorithm and its accuracy analysis, IEEE Journal of Oceanic Engineering OE-12(1), 234–245.
[4] Grabec, I. and Antolovič, B. (1994), Intelligent locator of AE sources, in T. Kishi, Y. Mori and M. Enoki, eds, The 12th International Acoustic Emission Symposium, Vol. 7 of Progress in Acoustic Emission, The Japanese Society for Non-Destructive Inspection, Tokyo, Japan, pp. 565–570.
[5] Grabec, I. and Sachse, W. (1991), Automatic modeling of physical phenomena: Application to ultrasonic data, J. Appl. Phys. 69(9), 6233–6244.
[6] Grabec, I. and Sachse, W. (1997), Synergetics of Measurement, Prediction and Control, Springer-Verlag, Berlin.
[7] Kosel, T. and Grabec, I. (1998), Intelligent locator of discrete and continuous acoustic emission sources, in J. Grum, ed., Application of Contemporary Non-destructive Testing in Engineering, The 5th International Conference of Slovenian Society for Nondestructive Testing, Slovenian Society for Nondestructive Testing, Ljubljana, Slovenia, pp. 39–54.
[8] McIntire, P. and Miller, R. K., eds (1987), Acoustic Emission Testing, Vol. 5 of Nondestructive Testing Handbook, 2nd edn, American Society for Nondestructive Testing, Philadelphia, USA.
[9] Specht, D. F. (1991), A general regression neural network, IEEE Trans. on Neural Networks 2(6), 568–576.
[10] Tobias, A. (1976), Acoustic emission source location in two dimensions by an array of three sensors, Non-Destructive Testing 9(2), 9–12.
[11] Ziola, S. M. and Gorman, M. R. (1991), Source location in thin plates using cross-correlation, J. Acoust. Soc. Am. 90(5), 2551–2556.
<|endoftext|><|startoftext|> Introduction The data obtained from LISA [1] will contain a large number of white dwarf binary systems across the whole observational window [2]. At frequencies below ∼ 3 mHz the sources are so abundant that they produce a stochastic foreground whose intensity dominates the instrumental noise [3]. The closer (and louder) sources will still be sufficiently bright to be individually resolvable. Above ∼ 3 mHz the sources become sufficiently sparse in parameter space (and in particular in the frequency domain) that the detectable sources become individually resolvable. 
http://arxiv.org/abs/0704.0048v2 WD MLDC1

The identification of white dwarfs in the LISA data set represents one of the most interesting analysis problems posed by the mission: the total number of signals in the data set is unknown, the effective noise level affecting the measurements is not easily estimated from the data streams, and there is a large number of overlapping sources to the limit of confusion. Bayesian inference provides a clear framework to tackle such a problem [4, 5, 6]. Some of us have carried out exploratory studies and “proof of concept” analyses on simplified problems that have demonstrated that Bayesian techniques do indeed show good potential for LISA applications [11, 10, 12]. Similarly other authors have successfully implemented techniques using Bayesian inference [18, 17, 16]. In this paper we present the first results of an end-to-end analysis pipeline developed in the context of the Mock LISA Data Challenges that has evolved from our earlier work. This pipeline is applied to the simplest single-source challenge data sets 1.1.1a and 1.1.1b and all the results presented here are obtained after the release of the key files. In a companion paper [19], we present results that we have obtained for the analysis of the data sets containing gravitational radiation from a massive-black-hole binary inspiral. Our group submitted an entry for the MLDC analysing the blind data set 1.1.1c [13, 14]: however that result suffered from the fact that the pipeline was not complete, the analysis code was inefficient and we encountered hardware problems with the Beowulf cluster used to perform the analysis. The results that we present here are obtained with a two-stage end-to-end analysis pipeline: (i) we first process the data set with a grid-based coherent algorithm to identify candidate signals; (ii) we then follow up the candidate signals with a Markov Chain Monte Carlo code to obtain probability density functions of the model parameters. 
Our method differs from other MCMC methods that have been proposed and applied to the MLDC data in the context of white dwarf binaries [18, 17, 16]: the MCMC is not used to search, but only in the final stage of the analysis to produce posterior density functions of the model parameters. The noise spectral level is included as one of the unknown parameters and is estimated together with the parameters of the gravitational wave source(s).

2. Analysis method

In this section we describe the two stage approach that we have adopted for the analysis. The signal produced by a white dwarf binary system is modelled as monochromatic in the source reference frame, following the conventions adopted in the first MLDC [7, 8, 9]. It is described by 7 parameters: ecliptic latitude ϑe and longitude ϕe, inclination ι and polarisation angle Ψ, frequency at a reference time f0 and corresponding overall phase Φ0 and amplitude A. The data distributed for the MLDC are the three TDI v1.5 Michelson observables X, Y and Z ‡. From those we construct the two orthogonal TDI outputs

$$ A = (2X - Y - Z)/3 \qquad (1) $$

$$ E = (Z - Y)/\sqrt{3} \qquad (2) $$

by diagonalizing the noise covariance matrix following the procedure presented in [23]. The noise affecting the channels A and E is uncorrelated and described by the one-sided noise spectral density Sn(f). We model the LISA response function in the low frequency limit in order to improve the computational efficiency of our analysis.

‡ In our MCMC analysis we use the data set produced using the LISA Simulator.

2.1. First stage: Grid based search

The first stage of the pipeline consists of a fast search of the data for the best matched filter based on the well-known F-statistic algorithm, first developed for triaxial pulsar signals in the context of ground-based observations [20]. This exploits the Fast Fourier Transform to perform matching in the frequency domain to templates which are generated at an array of fixed points in the parameter space. 
The data from an individual detector in the frequency domain d̃(f) is supposed to contain a signal plus Gaussian noise, d̃(f) = h̃(f) + ñ(f). We define the logarithmic likelihood as a measure of match, as given by

$$ \log L \approx (\tilde{d}|\tilde{h}) - \frac{1}{2} (\tilde{h}|\tilde{h}) $$

with (·|·) denoting the scalar product as defined in [20]. A single signal in the F-statistic algorithm is re-parameterised as a linear function of four orthogonal variables, and the frequency f0. The detection statistic is based on four parameters A_F, B_F, C_F and D_F, found by integrating over the response functions a(t) and b(t) to the two polarisation states of the gravitational wave signal [20],

$$ A_F = \frac{2}{T_{\rm obs}} \int_0^{T_{\rm obs}} a(t)^2\, dt \qquad (3) $$

$$ B_F = \frac{2}{T_{\rm obs}} \int_0^{T_{\rm obs}} b(t)^2\, dt \qquad (4) $$

$$ C_F = \frac{2}{T_{\rm obs}} \int_0^{T_{\rm obs}} a(t)\, b(t)\, dt , \qquad (5) $$

$$ D_F = A_F B_F - C_F^2 \qquad (6) $$

Tobs denotes the total observed time for the data set being analysed. The optimal detection statistic 2F, which is pre-maximised over the nuisance parameters h0, ι, φ0 and ψ, is

$$ 2F = \frac{8}{S_n(f)\, T_{\rm obs}} \, \frac{B_F |F_a|^2 + A_F |F_b|^2 - 2 C_F\, \Re(F_a F_b^*)}{D_F} . \qquad (7) $$

Fa and Fb are the demodulated Fourier transforms of the data,

$$ F_a = \int_0^{T_{\rm obs}} d(t)\, a(t)\, e^{-i\Phi(t)}\, dt ; \qquad F_b = \int_0^{T_{\rm obs}} d(t)\, b(t)\, e^{-i\Phi(t)}\, dt , \qquad (8) $$

Φ(t) is the phase of the gravitational wave signal, as is described in [22]. As the LISA array moves in space, the frequency f0 is affected by Doppler modulations. This modulation changes with differing position of the source in the sky, implying the need to recalculate the modulations and thus a(t) and b(t) for each sky position that is tested - a significant factor in the performance of this approach. The differing modulation structure however also allows us to estimate the location of the source in the sky by maximising the 2F value. The resolution possible on the sky with this method is not as good as from a full Bayesian posterior probability calculation as performed in the parameter estimation stage, as shown in an example for Challenge 1.1.1a in figure 1. 
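To make the structure of the detection statistic concrete, the following is a toy sketch that evaluates 2F for uniformly sampled time series, replacing the integrals by Riemann sums. It is not the Lalapps implementation used in the paper. The function name, the argument layout, the 2/Tobs normalisation of A_F, B_F and C_F, and the division by D_F follow the usual convention of the pulsar F-statistic literature and should be read as assumptions here.

```python
import numpy as np


def two_f(d, a, b, phase, dt, s_n):
    """Toy evaluation of the 2F statistic from sampled time series.

    d: data, a/b: antenna response functions, phase: signal phase Phi(t),
    dt: sampling interval, s_n: one-sided noise spectral density (constant).
    """
    d, a, b, phase = (np.asarray(v, dtype=float) for v in (d, a, b, phase))
    t_obs = dt * len(d)
    # Assumed normalisation: (2 / T_obs) times the time integral
    a_f = 2.0 / t_obs * np.sum(a * a) * dt
    b_f = 2.0 / t_obs * np.sum(b * b) * dt
    c_f = 2.0 / t_obs * np.sum(a * b) * dt
    d_f = a_f * b_f - c_f ** 2
    carrier = np.exp(-1j * phase)
    f_a = np.sum(d * a * carrier) * dt   # demodulated transform weighted by a(t)
    f_b = np.sum(d * b * carrier) * dt   # demodulated transform weighted by b(t)
    numerator = (b_f * abs(f_a) ** 2 + a_f * abs(f_b) ** 2
                 - 2.0 * c_f * np.real(f_a * np.conj(f_b)))
    return 8.0 / (s_n * t_obs) * numerator / d_f
```

In a grid search of the kind described above, a(t), b(t) and Φ(t) would be recomputed for every sky position and frequency bin, and the template with the largest 2F would be retained.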
Nevertheless, since this statistic can be computed fairly quickly it serves as a useful way of finding initial values to feed into the MCMC routine, as adopted within the pipeline. The resolution achievable on the sky increases with frequency, which implies that the mismatch between filter and signal falls off more rapidly at higher frequencies, requiring a greater number of templates to cover the sky. Therefore for challenge 1.1.1b at f ≈ 3 mHz a sky grid of size 5,752 points was used, in comparison with 765 points for challenge 1.1.1a at f ≈ 1 mHz. The F-statistic search was implemented using the LIGO “Lalapps” suite of software [24], in which the pulsar search code was modified by Reinhard Prix and John Whelan to use the LISA response function for the TDI variables X, Y, and Z [21]. These input data streams were given in the form of Short Fourier Transforms, each of length one day, created from the MLDC1 challenge data. For each challenge the full specified range of frequencies was searched for the signal as it would be in a blind search. The code was run on a single CPU and executed in a few hours, with the run-time increasing at higher frequency due to the higher resolution of sky and frequency grid that had to be used. The candidate chosen to pass to the MCMC stage was simply that which triggered the highest value of 2F.

2.2. Second stage: Markov Chain Monte Carlo follow-up

According to Bayes’ theorem, the posterior probability p(m̃|d̃) of a model m̃ given the data d̃ depends on the prior distribution p(m̃), containing the information known before the analysis, the likelihood L(d̃|m̃) of the model and a normalisation factor p(d̃):

$$ p(\tilde{m}|\tilde{d}) = \frac{L(\tilde{d}|\tilde{m})\, p(\tilde{m})}{p(\tilde{d})} . \qquad (9) $$

The posterior probability density function shows the joint probability density of given values of parameters of the model m̃, conditional on the data d̃. 
We implemented Bayes’ theorem using data in the form of TDI variables A and E and modelled our template according to the Long Wavelength Approximation directly in the Fourier domain [25] to gain computational speed. The logarithmic likelihood L(d̃|m̃) in this stage explicitly included its dependence on the one-sided noise spectral density Sn(f)

$$ \log L(\tilde{d}|\tilde{m}) = \mathrm{const.} - \log S_n(f) - (\tilde{d} - \tilde{h}|\tilde{d} - \tilde{h}) , \qquad (10) $$

shown here for either A or E, with the combined likelihood as the sum of the individual likelihoods. We restricted our analysis to a sufficiently narrow frequency window in order to be able to approximate the noise spectral density as constant, Sn(f) = S0. This window was set as the interval in frequency that contains at least 98% of the power of our model m̃, with the interval’s upper and lower limits given by f ± (2/year)(5 + 2πf0 AU/c) [25]. S0 is therefore an additional parameter to be inferred within the model m̃ in Eq. (10).

Figure 1. The variation of 2F values for the search for unknown signal 1.1.1a, as a function of sky position, parameterised by ecliptic latitude β and longitude λ (panel title: 2F as a function of sky position, at a frequency 0.001063 Hz; horizontal axis: ecliptic longitude). The distribution is multi-modal and non-Gaussian, and has a poor resolution in comparison with that which can be achieved with the MCMC and a Bayesian likelihood, but by finding the maximum it serves well as a starting point for the more refined parameter estimation below.

We implemented an automatic Random Walk Metropolis sampler (Stroeer & Vecchio 2007, in prep.) to sample from the posterior probability density function in the form of a Markov chain. Metropolis sampling eliminates the need to explicitly calculate the normalisation constant in Bayes’ theorem, and the evolving Markov chain gives easy access to joint as well as marginalised posterior density distributions. 
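The way Metropolis sampling avoids the normalisation constant can be seen in a minimal random-walk sampler: only the difference of unnormalised log-posterior values enters the acceptance test. The sketch below is a generic textbook version with a fixed step size, not the authors' automatic sampler; the function name and arguments are assumptions made for illustration.

```python
import numpy as np


def random_walk_metropolis(log_post, x0, step, n_steps, rng):
    """Plain random-walk Metropolis sampler with a fixed Gaussian proposal.

    log_post: callable returning the unnormalised log-posterior at a point.
    Returns the chain (n_steps x dim) and the fraction of accepted proposals.
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float)).copy()
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); p(d) cancels
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted += 1
        chain[i] = x
    return chain, accepted / n_steps
```

Marginalised posteriors follow by histogramming single columns of the returned chain after discarding a burn-in segment. An adaptive scheme would additionally tune `step` from the running acceptance fraction.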
The sampler was started from the parameter set which triggered the highest value of 2F in our grid based coherent run of the analysis (see previous section). The automated function of the Metropolis sampling is achieved by controlling the sampling step-size with adaptive acceptance probability techniques [26]. The sampler therefore does not depend on assumptions about the signal in the data set in order to perform successfully and reliably; it develops a suitable algorithm and approach by itself based on the properties of the likelihood as found on the fly, in the initial steps of the sampler. The length of our Markov chain was pre-set to 10^6, with the initial 10^4 chain states discarded as the “burn-in” phase of our sampler. The runtime for one data analysis run is 5 hours on a single 2 GHz CPU on the Tsunami cluster of the University of Birmingham.

Figure 2. The marginalised posterior probability density functions of the eight unknown parameters – the seven parameters that describe the signal and the noise spectral density S0 – for the challenge data set 1.1.1a. The vertical black solid line denotes the true value of the parameter (for the polarisation angle the true value modulo π/2), and the grey dashed line the initial value for the MCMC analysis as determined by the template of the first stage that produces the maximum value of the F-statistic. In the case of the noise spectral density the first stage of the analysis does not provide an estimate; the true value of this parameter is taken to be the value of the instrumental noise spectrum used to generate the data set and provided in [9].

Figure 3. The marginalised posterior probability density functions of the eight unknown parameters for the challenge data set 1.1.1b. Labels are as in Figure 2.

3. Results

We found that the most promising candidate signal from the F-statistic search already matched the true embedded signal to high accuracy, particularly in frequency and sky location. 
Our MCMC sampler, as a post-processing unit, thus needed only 1000 iterations to burn in and to establish a reliable sampling from the posterior. The marginalised posteriors are shown in Figs. 2 and 3. As seen in these figures, the MCMC sampler further refined the initial guesses from the F-statistic, as measured by the absolute difference between the true value of a given parameter and the median of the marginalised posterior recovered for that parameter. Table 1 shows details of the statistics of the recovered posterior distributions. We highlight that the majority of the true values of the parameters are within one standard deviation of the median of the posterior, with a small percentage within two sampled standard deviations. In addition, every true value of a parameter of the signal is within the minimum interval of the posterior to cover 90% of all MCMC state values.

Table 1. Details about the results from Challenge 1.1.1a and Challenge 1.1.1b. S0, the constant one-sided noise spectral density within our narrow frequency window, is compared to the true one-sided noise spectral density at the true frequency of the signal; Ψ is given modulo π/2. Int90 denotes the minimum interval that includes 90% of the MCMC states for a given parameter; ∆mode denotes the absolute difference between the true value of a signal parameter and the mode of its recovered posterior; ∆median and ∆mean denote the equivalent absolute difference for the median and mean of the posterior respectively; σ denotes the sampled standard deviation of the posterior as derived from the median. We further quote the signal-to-noise ratio (SNR) for a template using the true values of the source and for a template using the recovered values of the data analysis run, as derived from the median of the individual posterior distributions, and the correlation C between these two templates.

Challenge 1.1.1a
Parameter          Int90                    ∆mode          ∆median        ∆mean          σ
S0/10^−41 Hz^−1    (3.53257, 4.72639)       -0.42084       -0.440278      -0.452456      0.36704
ϑe/rad             (0.958409, 1.03165)      -0.0147383     -0.0149381     -0.0148725     0.0222861
ϕe/rad             (5.05376, 5.13528)       -0.00550139    -0.00569547    -0.00579889    0.0247886
Ψ/rad              (1.32475, 0.500553)      0.1768         0.1823         0.1902         0.1908
ι/rad              (0.097761, 1.0008)       -0.0459747     0.190001       0.23459        0.295211
A/10^−22           (1.61976, 2.67967)       0.664371       0.358844       0.298978       0.368524
f0/mHz             (1.06273, 1.06273)       -1.19664e-06   -1.22207e-06   -1.22259e-06   1.04422e-06
Φ0/rad             (3.10668, 5.808)         -0.164989      0.00998525     0.229659       0.829146
SNR: true = 51.024497, recovered = 50.648600. C (true vs. recovered) = 0.99689

Challenge 1.1.1b
Parameter          Int90                    ∆mode          ∆median        ∆mean          σ
S0/10^−41 Hz^−1    (0.876833, 1.38959)      -0.0679571     -0.0906557     -0.0996144     0.16017
ϑe/rad             (-0.121611, 0.0116916)   -0.0343353     -0.151185      -0.150328      0.0406552
ϕe/rad             (4.60969, 4.63537)       0.00265723     0.00305564     0.00302203     0.00779893
Ψ/rad              (0.246328, 0.362409)     0.0301541      0.0311747      0.0311268      0.0353938
ι/rad              (1.22036, 1.33338)       -0.0430412     -0.040458      -0.0394818     0.0348383
A/10^−22           (0.45001, 0.542454)      -0.016442      -0.0151921     -0.0149907     0.0281154
f0/mHz             (3.00036, 3.00036)       3.1221e-07     2.49289e-07    2.42807e-07    8.18111e-07
Φ0/rad             (5.83869, 6.19411)       0.137219       0.119301       0.119921       0.502384
SNR: true = 36.587444, recovered = 37.368806. C (true vs. recovered) = 0.97897
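As a rough illustration of how posterior summaries of the kind quoted in Table 1 could be computed from a list of chain states, here is a minimal Python sketch. It is not the code used for the analysis: the function names, the sign convention (estimate minus true value) and the reading of σ as the root-mean-square spread about the median are assumptions made for illustration only. The mode is left out because it needs a density estimate, whose binning would be a further assumption.

```python
import math
import statistics


def min_interval(samples, mass=0.9):
    """Shortest interval that contains at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must hold
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def summarise(samples, true_value):
    """Summary of one marginal posterior (hypothetical helper, not from the paper)."""
    med = statistics.median(samples)
    mean = statistics.fmean(samples)
    # Assumption: sigma is the RMS spread of the samples about the median.
    sigma = math.sqrt(sum((x - med) ** 2 for x in samples) / (len(samples) - 1))
    return {
        "int90": min_interval(samples, 0.9),
        "d_median": med - true_value,  # assumed sign convention
        "d_mean": mean - true_value,
        "sigma": sigma,
    }
```

Calling `summarise(chain, true_value)` on the post-burn-in states of a single parameter returns one row of such a table; for a wrapped angle such as Ψ the samples would first have to be unwrapped, which this sketch does not attempt.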
Recovered signal-to-noise ratios are measured as $\mathrm{SNR} = (s|h)/\sqrt{(h|h)}$. The match between a template constructed from the true values and a template constructed from the median values of the individual posterior distributions is $C = (h_{\rm true}|h_{\rm med})/\sqrt{(h_{\rm true}|h_{\rm true})\,(h_{\rm med}|h_{\rm med})}$, which yields a correlation that is always higher than 0.97. Noise levels are determined accurately and within 1 to 1.5 sampled standard deviations. Nevertheless we note that our run on Challenge 1.1.1a shows a lower match and higher differences between true value and recovered value of parameters as compared to the run on Challenge 1.1.1b. It also exhibits tailing posterior distributions in inclination and amplitude, although the SNR of Challenge 1.1.1a is about 1.4 times that of Challenge 1.1.1b (Table 1).

4. Conclusions

We have presented a new approach to LISA data analysis in the form of an end-to-end pipeline. We first detected and identified candidate signals in the LISA data stream with a grid-based coherent algorithm, and then post-processed the most promising candidate signals with an automatic Markov Chain Monte Carlo code to obtain probability densities for the model's parameters. We demonstrated successful identification and post-processing of the signals from the double white dwarf single source MLDC data sets 1.1.1a and 1.1.1b. Furthermore, the automatic Markov Chain Monte Carlo code successfully identified the noise level within a small frequency window of interest in these data sets. We note that a parallel approach to the data analysis of binary inspiral signals is being developed by Röver et al, with a Markov Chain Monte Carlo method that can successfully post-process a candidate signal generated from the true parameters of the signal. Signal detection in a pre-processing stage is currently being tested within parallel tempered MCMC methods and/or time-frequency analyses [19].
We identify two prominent and promising features of our pipeline: its ability to determine good initial conditions for the MCMC and its ability to run the MCMC automatically. As we have demonstrated in this paper, the width of the marginalised posterior density for the frequency parameter is extremely narrow. It is therefore vital that the initial estimate of the frequency is within this region, as the almost flat structure of the posterior PDF outside this region gives little to no information on the location of the peak. The chances of finding the mode through a random sampling are decreased further still with a larger prior range for the parameter. Adding an F-statistic search as the first stage in the pipeline solves this problem, since the frequency and position in the sky are recovered very accurately, to within the limits of the posterior probability region of interest, before the MCMC performs post-processing and parameter estimation. The automatic feature of the MCMC ensures a successful post-processing for the other astrophysical parameters that may have been located outside the posterior region of interest by the F-statistic approach, as is the case for the amplitude of Challenge 1.1.1a. Convergence is aided by the ability of our code to increase or decrease sampling step-sizes according to its experience of the sampling quality of the posterior during the burn-in phase. We are working on an extension of the pipeline as shown in this document to successfully tackle multi-source data sets, required for the second round of the MLDC. Current work includes the exploration of our grid-based coherent search on such data streams in order to automatically identify the most promising individual candidate signals, and the implementation of an automatic Reversible Jump Markov Chain Monte Carlo routine (e.g.
as already demonstrated in [10]) to find the trans-dimensional probability density functions of the parameters of an unknown total number of signals. We highlight that the noise level determination presented here already serves as a key ingredient to round 2, where the simulation of a galactic white dwarf binary population introduces additional confusion noise levels from unresolvable sources. Acknowledgements Nelson Christensen’s work was supported by the National Science Foundation grant PHY-0553422 and the Fulbright Scholar Program. Alberto Vecchio’s work was partially supported by the Packard Foundation and the National Science Foundation. The University of Auckland group was supported by the Royal Society of New Zealand Marsden Fund Grant UOA-204. References [1] Bender B L et al 1998 LISA Pre-Phase A Report; Second Edition, MPQ 233 [2] Nelemans G, Yungelson L R and Portegies Zwart S F 2001 Astron. and Astrophys. 375 890 [3] Farmer A J and Phinney E S 2003 Mon. Not. R. Astron. Soc 346 1197 [4] Jaynes E T Probability theory: The logic of science 2003 Cambridge University Press [5] Gregory P C Bayesian logical data analysis for the physical sciences 2005 Cambridge University Press [6] Gelman A, Carlin J B, Stern H, and Rubin D B Bayesian data analysis 1997 Chapman & Hall CRC Boca Raton [7] Arnaud K A et al 2006 AIP Conf. Proc. 873 619 Preprint gr-qc/0609105 [8] Arnaud K A et al 2006 AIP Conf. Proc. 873 625 Preprint gr-qc/0609106 [9] Mock LISA Data Challenge Task Force, “Document for Challenge 1,” svn.sourceforge.net/viewvc/lisatools/Docs/challenge1.pdf. 
[10] Stroeer A, Gair J and Vecchio A 2006 Automatic Bayesian inference for LISA data analysis strategies Preprint gr-qc/0609010
[11] Umstätter R, Christensen N, Hendry M, Meyer R, Simha V, Veitch J, Vigeland S and Woan G 2005 Phys. Rev. D 72 022001
[12] Wickham E D L, Stroeer A and Vecchio A 2006 Class. Quantum Grav. 23 819
[13] Bloomer E et al Report on MLDC1 available at http://astrogravs.nasa.gov/docs/mldc/round1/entries.html
[14] Arnaud K A et al 2007 Preprint gr-qc/0701139
[15] Arnaud K A et al 2007 Preprint gr-qc/0701170
[16] Crowder J and Cornish N J 2007 Phys. Rev. D 75 043008
[17] Crowder J, Cornish N J and Reddinger J L 2006 Phys. Rev. D 73 063011
[18] Cornish N J and Crowder J 2005 Phys. Rev. D 72 043005
[19] Röver C et al in this volume
[20] Jaranowski P, Królak A and Schutz B F 1998 Phys. Rev. D 58 063001
[21] Prix R and Whelan J 2006 Technical note
[22] Brady P R, Creighton T, Cutler C and Schutz B F 1997 Phys. Rev. D 57 2101
[23] Prince T A, Tinto M, Larson S L and Armstrong J W 2002 Phys. Rev. D 66 122002
[24] LAL Home Page: http://www.lsc-group.phys.uwm.edu/daswg/projects/lal.html
[25] Cornish N J and Larson S L 2003 Phys. Rev. D 67 103001
[26] Atchade Y F and Rosenthal J S 2005 Bernoulli 11 815-828

ABSTRACT We report on the analysis of selected single source data sets from the first round of the Mock LISA Data Challenges (MLDC) for white dwarf binaries.
We implemented an end-to-end pipeline consisting of a grid-based coherent pre-processing unit for signal detection, and an automatic Markov Chain Monte Carlo post-processing unit for signal evaluation. We demonstrate that signal detection with our coherent approach is secure and accurate, and is increased in accuracy and supplemented with additional information on the signal parameters by our Markov Chain Monte Carlo approach. We also demonstrate that the Markov Chain Monte Carlo routine is additionally able to determine accurately the noise level in the frequency window of interest.
<|endoftext|><|startoftext|>
Introduction

Isomorphism classes of smooth toric Fano varieties of dimension d correspond to isomorphism classes of so-called smooth Fano d-polytopes, which are fully dimensional convex lattice polytopes in Rd, such that the origin is in the interior of the polytopes and the vertices of every facet form a basis of the integral lattice Zd ⊂ Rd. Smooth Fano d-polytopes have been intensively studied for the last decades. They have been completely classified up to isomorphism for d ≤ 4 ([1], [18], [3], [15]). Under additional assumptions there are classification results valid in every dimension. To our knowledge smooth Fano d-polytopes have been classified in the following cases:
• When the number of vertices is d+ 1, d+ 2 or d+ 3 ([9],[2]).
• When the number of vertices is 3d, which turns out to be the upper bound on the number of vertices ([6]).
• When the number of vertices is 3d− 1 ([19]).
• When the polytopes are centrally symmetric ([17]).
• When the polytopes are pseudo-symmetric, i.e. there is a facet F , such that −F is also a facet ([8]).
• When there are many pairs of centrally symmetric vertices ([5]).
• When the corresponding toric d-folds are equipped with an extremal contraction, which contracts a toric divisor to a point ([4]) or a curve ([16]).
Recently a complete classification of smooth Fano 5-polytopes has been announced ([12]). The approach is to recover smooth Fano d-polytopes from their image under the projection along a vertex. This image is a reflexive (d− 1)-polytope (see [3]), which is a fully-dimensional lattice polytope containing the origin in the interior, such that the dual polytope is also a lattice polytope. Reflexive polytopes have been classified up to dimension 4 using the computer program PALP ([10],[11]). Using this classification and PALP the authors of [12] succeed in classifying smooth Fano 5-polytopes.

In this paper we present an algorithm that classifies smooth Fano d-polytopes for any given d ≥ 1. We call this algorithm SFP (for Smooth Fano Polytopes). The input is the positive integer d, nothing else is needed. The algorithm has been implemented in C++, and used to classify smooth Fano d-polytopes for d ≤ 7. For d = 6 and d = 7 our results are new:

Theorem 1.1. There are 7622 isomorphism classes of smooth Fano 6-polytopes and 72256 isomorphism classes of smooth Fano 7-polytopes.

The classification lists of smooth Fano d-polytopes, d ≤ 7, are available on the author's homepage: http://home.imf.au.dk/oebro

A key idea in the algorithm is the notion of a special facet of a smooth Fano d-polytope (defined in section 3.1): A facet F of a smooth Fano d-polytope is called special, if the sum of the vertices of the polytope is a non-negative linear combination of vertices of F . This allows us to identify a finite subset Wd of the lattice Zd, such that any smooth Fano d-polytope is isomorphic to one whose vertices are contained in Wd (theorem 3.6). Thus the problem of classifying smooth Fano d-polytopes is reduced to the problem of considering certain subsets of Wd. We then define a total order on finite subsets of Zd and use this to define a total order on the set of smooth Fano d-polytopes, which respects isomorphism (section 4).
The SFP-algorithm (described in section 5) goes through certain finite subsets of $W_d$ in increasing order, and outputs smooth Fano $d$-polytopes in increasing order, such that any smooth Fano $d$-polytope is isomorphic to exactly one in the output list. As a consequence of the total order on smooth Fano $d$-polytopes, the algorithm need not consult the previous output to check for isomorphism when deciding whether or not to output a constructed polytope.

2 Smooth Fano polytopes

We fix notation and prove some simple facts about smooth Fano polytopes. The convex hull of a set $K \subseteq \mathbb{R}^d$ is denoted by $\operatorname{conv} K$. A polytope is the convex hull of finitely many points. The dimension of a polytope $P$ is the dimension of the affine hull, $\operatorname{aff} P$, of the polytope $P$. A $k$-polytope is a polytope of dimension $k$. A face of a polytope is the intersection of a supporting hyperplane with the polytope. Faces of polytopes are polytopes. Faces of dimension 0 are called vertices, while faces of codimension 1 and 2 are called facets and ridges, respectively. The set of vertices of a polytope $P$ is denoted by $\mathcal{V}(P)$.

Definition 2.1. A convex lattice polytope $P$ in $\mathbb{R}^d$ is called a smooth Fano $d$-polytope, if the origin is contained in the interior of $P$ and the vertices of every facet of $P$ form a $\mathbb{Z}$-basis of the lattice $\mathbb{Z}^d \subset \mathbb{R}^d$.

We consider two smooth Fano $d$-polytopes $P_1$, $P_2$ to be isomorphic, if there exists a bijective linear map $\varphi : \mathbb{R}^d \to \mathbb{R}^d$, such that $\varphi(\mathbb{Z}^d) = \mathbb{Z}^d$ and $\varphi(P_1) = P_2$.

Whenever $F$ is a $(d-1)$-simplex in $\mathbb{R}^d$, such that $0 \notin \operatorname{aff} F$, we let $u_F \in (\mathbb{R}^d)^*$ be the unique element determined by $\langle u_F, F\rangle = \{1\}$. For every $w \in \mathcal{V}(F)$ we define $u_F^w \in (\mathbb{R}^d)^*$ to be the element where $\langle u_F^w, w\rangle = 1$ and $\langle u_F^w, w'\rangle = 0$ for every $w' \in \mathcal{V}(F)$, $w' \ne w$. Then $\{u_F^w \mid w \in \mathcal{V}(F)\}$ is the basis of $(\mathbb{R}^d)^*$ dual to the basis $\mathcal{V}(F)$ of $\mathbb{R}^d$. When $F$ is a facet of a smooth Fano polytope $P$ and $v \in \mathcal{V}(P)$, we certainly have $\langle u_F, v\rangle \in \mathbb{Z}$ and
$$\langle u_F, v\rangle = 1 \iff v \in \mathcal{V}(F) \quad\text{and}\quad \langle u_F, v\rangle \le 0 \iff v \notin \mathcal{V}(F).$$
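The lemma below concerns the relation between the elements $u_F$ and $u_{F'}$, when $F$ and $F'$ are adjacent facets.

Lemma 2.2. Let $F$ be a facet of a smooth Fano polytope $P$ and $v \in \mathcal{V}(F)$. Let $F'$ be the unique facet which intersects $F$ in a ridge $R$ of $P$, $v \notin \mathcal{V}(R)$. Let $v' = \mathcal{V}(F') \setminus \mathcal{V}(R)$. Then
1. $\langle u_F^v, v'\rangle = -1$.
2. $\langle u_F, v'\rangle = \langle u_{F'}, v\rangle$.
3. $\langle u_{F'}, x\rangle = \langle u_F, x\rangle + \langle u_F^v, x\rangle(\langle u_F, v'\rangle - 1)$ for any $x \in \mathbb{R}^d$.
4. In particular, for any $x \in \mathbb{R}^d$,
• $\langle u_F^v, x\rangle < 0$ iff $\langle u_{F'}, x\rangle > \langle u_F, x\rangle$.
• $\langle u_F^v, x\rangle > 0$ iff $\langle u_{F'}, x\rangle < \langle u_F, x\rangle$.
• $\langle u_F^v, x\rangle = 0$ iff $\langle u_{F'}, x\rangle = \langle u_F, x\rangle$.
5. Suppose $x \ne v'$ is a vertex of $P$ where $\langle u_F^v, x\rangle < 0$. Then $\langle u_F, v'\rangle > \langle u_F, x\rangle$.

Proof. The sets $\mathcal{V}(F)$ and $\mathcal{V}(F')$ are both bases of the lattice $\mathbb{Z}^d$ and the first statement follows. We have $v + v' \in \operatorname{span}(F \cap F')$, and then the second statement follows. Use the previous statements to calculate $\langle u_{F'}, x\rangle$:
$$\begin{aligned}
\langle u_{F'}, x\rangle &= \Big\langle u_{F'}, \sum_{w\in\mathcal{V}(F)} \langle u_F^w, x\rangle\, w\Big\rangle \\
&= \sum_{w\in\mathcal{V}(F)\setminus\{v\}} \langle u_F^w, x\rangle + \langle u_F^v, x\rangle\langle u_{F'}, v\rangle \\
&= \langle u_F, x\rangle + \langle u_F^v, x\rangle\big(\langle u_{F'}, v\rangle - 1\big) \\
&= \langle u_F, x\rangle + \langle u_F^v, x\rangle\big(\langle u_F, v'\rangle - 1\big).
\end{aligned}$$
As $\langle u_F, v'\rangle - 1 < 0$ the three equivalences follow directly. Suppose there is a vertex $x \in \mathcal{V}(P)$, $x \ne v'$, such that $\langle u_F^v, x\rangle < 0$ and $\langle u_F, v'\rangle \le \langle u_F, x\rangle$. Then
$$\langle u_{F'}, x\rangle = \langle u_F, x\rangle + \langle u_F^v, x\rangle(\langle u_F, v'\rangle - 1) \ge \langle u_F, x\rangle - (\langle u_F, v'\rangle - 1) \ge 1.$$
Hence $x$ is on the facet $F'$. But this cannot be the case as $\mathcal{V}(F') = \{v'\} \cup \mathcal{V}(F) \setminus \{v\}$. Thus no such $x$ exists. And we're done.

In the next lemma we show a lower bound on the numbers $\langle u_F^w, v\rangle$, $w \in \mathcal{V}(F)$, for any facet $F$ and any vertex $v$ of a smooth Fano $d$-polytope.

Lemma 2.3. Let $F$ be a facet and $v$ a vertex of a smooth Fano polytope $P$. Then
$$\langle u_F^w, v\rangle \ge \begin{cases} 0, & \langle u_F, v\rangle = 1 \\ -1, & \langle u_F, v\rangle = 0 \\ \langle u_F, v\rangle, & \langle u_F, v\rangle < 0 \end{cases}$$
for every $w \in \mathcal{V}(F)$.

Proof. When $\langle u_F, v\rangle = 1$ the statement is obvious. Suppose $\langle u_F, v\rangle = 0$ and $\langle u_F^w, v\rangle < 0$ for some $w \in \mathcal{V}(F)$. Let $F'$ be the unique facet intersecting $F$ in the ridge $\operatorname{conv}\{\mathcal{V}(F) \setminus \{w\}\}$. By lemma 2.2 $\langle u_{F'}, v\rangle > 0$. As $\langle u_{F'}, v\rangle \in \mathbb{Z}$ we must have $\langle u_{F'}, v\rangle = 1$. This implies $\langle u_F^w, v\rangle = -1$.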
Suppose $\langle u_F, v\rangle < 0$ and $\langle u_F^w, v\rangle < \langle u_F, v\rangle \le -1$ for some $w \in \mathcal{V}(F)$. Let $F' \ne F$ be the facet containing the ridge $\operatorname{conv}\{\mathcal{V}(F) \setminus \{w\}\}$, and let $w'$ be the unique vertex in $\mathcal{V}(F') \setminus \mathcal{V}(F)$. Then by lemma 2.2
$$\langle u_{F'}, v\rangle = \langle u_F, v\rangle + \langle u_F^w, v\rangle(\langle u_F, w'\rangle - 1) \ge \langle u_F, v\rangle - \langle u_F^w, v\rangle.$$
If $\langle u_F, v\rangle - \langle u_F^w, v\rangle > 0$, then $v$ is on the facet $F'$. But this is not the case as $\langle u_F^w, v\rangle < -1$. We conclude that $\langle u_F^w, v\rangle \ge \langle u_F, v\rangle$.

When $F$ is a facet and $v$ a vertex of a smooth Fano $d$-polytope $P$, such that $\langle u_F, v\rangle = 0$, we can say something about the face lattice of $P$.

Lemma 2.4 ([7] section 2.3 remark 5(2), [13] lemma 5.5). Let $F$ be a facet and $v$ be a vertex of a smooth Fano polytope $P$. Suppose $\langle u_F, v\rangle = 0$. Then $\operatorname{conv}\{\{v\} \cup \mathcal{V}(F) \setminus \{w\}\}$ is a facet of $P$ for every $w \in \mathcal{V}(F)$ with $\langle u_F^w, v\rangle = -1$.

Proof. Follows from the proof of lemma 2.3.

3 Special embeddings of smooth Fano polytopes

In this section we find a concrete finite subset $W_d$ of $\mathbb{Z}^d$ with the nice property that any smooth Fano $d$-polytope is isomorphic to one whose vertices are contained in $W_d$. The problem of classifying smooth Fano $d$-polytopes is then reduced to considering subsets of $W_d$.

3.1 Special facets

The following definition is a key concept.

Definition 3.1. A facet $F$ of a smooth Fano $d$-polytope $P$ is called special, if the sum of the vertices of $P$ is a non-negative linear combination of $\mathcal{V}(F)$, that is
$$\sum_{v\in\mathcal{V}(P)} v = \sum_{w\in\mathcal{V}(F)} a_w w, \qquad a_w \ge 0.$$

Clearly, any smooth Fano $d$-polytope has at least one special facet. Let $F$ be a special facet of a smooth Fano $d$-polytope $P$. Then
$$0 \le \Big\langle u_F, \sum_{v\in\mathcal{V}(P)} v\Big\rangle = d + \sum_{v\in\mathcal{V}(P),\ \langle u_F, v\rangle < 0} \langle u_F, v\rangle,$$
which implies $-d \le \langle u_F, v\rangle \le 1$ for any vertex $v$ of $P$. By using the lower bound on the numbers $\langle u_F^w, v\rangle$, $w \in \mathcal{V}(F)$ (see lemma 2.3), we can find an explicit finite subset of the lattice $\mathbb{Z}^d$, such that every $v \in \mathcal{V}(P)$ is contained in this subset. In the following lemma we generalize this observation to subsets of $\mathcal{V}(P)$ containing $\mathcal{V}(F)$.

Lemma 3.2. Let $P$ be a smooth Fano polytope.
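Let $F$ be a special facet of $P$ and let $V$ be a subset of $\mathcal{V}(P)$ containing $\mathcal{V}(F)$, whose sum is $\nu = \sum_{v\in V} v$. Then
$$\langle u_F, \nu\rangle \ge 0 \quad\text{and}\quad \langle u_F^w, \nu\rangle \le \langle u_F, \nu\rangle + 1$$
for every $w \in \mathcal{V}(F)$.

Proof. For convenience we set $U = \mathcal{V}(P) \setminus V$ and $\mu = \sum_{v\in U} v$. Since $F$ is a special facet we know that
$$0 \le \Big\langle u_F, \sum_{v\in\mathcal{V}(P)} v\Big\rangle = \langle u_F, \nu\rangle + \langle u_F, \mu\rangle.$$
The set $\mathcal{V}(F)$ is contained in $V$ so $\langle u_F, v\rangle \le 0$ for every $v$ in $U$, hence $\langle u_F, \nu\rangle \ge 0$. Suppose that for some $w \in \mathcal{V}(F)$ we have $\langle u_F^w, \nu\rangle > \langle u_F, \nu\rangle + 1$. By lemma 2.3 we know that
$$\langle u_F^w, v\rangle \ge \begin{cases} -1, & \langle u_F, v\rangle = 0 \\ \langle u_F, v\rangle, & \langle u_F, v\rangle < 0 \end{cases}$$
for every vertex $v \in \mathcal{V}(P) \setminus \mathcal{V}(F)$. There is at most one vertex $v$ of $P$, $\langle u_F, v\rangle = 0$, with negative coefficient $\langle u_F^w, v\rangle$ (lemma 2.4). So $\langle u_F^w, \mu\rangle \ge \langle u_F, \mu\rangle - 1$. Now, consider $\langle u_F^w, \sum_{v\in\mathcal{V}(P)} v\rangle$:
$$\Big\langle u_F^w, \sum_{v\in\mathcal{V}(P)} v\Big\rangle = \langle u_F^w, \nu\rangle + \langle u_F^w, \mu\rangle > \langle u_F, \nu\rangle + \langle u_F, \mu\rangle = \Big\langle u_F, \sum_{v\in\mathcal{V}(P)} v\Big\rangle.$$
Since $u_F = \sum_{x\in\mathcal{V}(F)} u_F^x$, this implies that $\langle u_F^x, \sum_{v\in\mathcal{V}(P)} v\rangle$ is negative for some $x \in \mathcal{V}(F)$. A contradiction.

Corollary 3.3. Let $F$ be a special facet and $v$ any vertex of a smooth Fano $d$-polytope. Then $-d \le \langle u_F, v\rangle \le 1$ and
$$\begin{cases} 0, & \langle u_F, v\rangle = 1 \\ -1, & \langle u_F, v\rangle = 0 \\ \langle u_F, v\rangle, & \langle u_F, v\rangle < 0 \end{cases} \;\le\; \langle u_F^w, v\rangle \;\le\; \begin{cases} 1, & \langle u_F, v\rangle = 1 \\ d-1, & \langle u_F, v\rangle = 0 \\ d + \langle u_F, v\rangle, & \langle u_F, v\rangle < 0 \end{cases}$$
for every $w \in \mathcal{V}(F)$.

Proof. For $\langle u_F, v\rangle = 1$ the statement is obvious. When $\langle u_F, v\rangle = 0$ the coefficients of $v$ with respect to the basis $\mathcal{V}(F)$ are bounded below by $-1$ (lemma 2.3), so no coefficient exceeds $d - 1$. So the case $\langle u_F, v\rangle < 0$ remains. The lower bound is by lemma 2.3. Use lemma 3.2 on the subset $V = \mathcal{V}(F) \cup \{v\}$ to prove the upper bound.

3.2 Special embeddings

Let $(e_1, \dots, e_d)$ be a fixed basis of the lattice $\mathbb{Z}^d \subset \mathbb{R}^d$.

Definition 3.4. Let $P$ be a smooth Fano $d$-polytope. Any smooth Fano $d$-polytope $Q$, with $\operatorname{conv}\{e_1, \dots, e_d\}$ as a special facet, is called a special embedding of $P$, if $P$ and $Q$ are isomorphic.

Obviously, for any smooth Fano polytope $P$, there exists at least one special embedding of $P$. As any polytope has finitely many facets, there exist only finitely many special embeddings of $P$.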
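For a polytope in the standard position of definition 3.4, the conditions above reduce to checks on coordinates: with $F = \operatorname{conv}\{e_1, \dots, e_d\}$, the pairing with $u_F$ is the coordinate sum and the pairing with $u_F^{e_i}$ is the $i$-th coordinate. The following minimal Python sketch tests definition 3.1 and the two inequalities of lemma 3.2 in that setting. The function names and the sample point sets are illustrative assumptions, not part of the paper or of its C++ implementation.

```python
def vertex_sum(points):
    """Coordinate-wise sum of a collection of lattice points."""
    return tuple(map(sum, zip(*points)))


def standard_facet_is_special(vertices):
    """Definition 3.1 for F = conv{e_1, ..., e_d}: the vertex sum must have
    non-negative coordinates with respect to e_1, ..., e_d."""
    return all(c >= 0 for c in vertex_sum(vertices))


def satisfies_lemma_3_2(subset):
    """Necessary conditions of lemma 3.2 for a set containing e_1, ..., e_d."""
    nu = vertex_sum(subset)
    height = sum(nu)  # <u_F, nu> for the standard simplex F
    return height >= 0 and all(c <= height + 1 for c in nu)


# A lattice quadrilateral and a lattice pentagon, both with conv{e1, e2} as a facet.
QUAD = [(1, 0), (0, 1), (-1, 1), (0, -1)]
PENT = [(1, 0), (0, 1), (-1, 0), (-1, -1), (0, -1)]
```

Here `standard_facet_is_special(QUAD)` holds because the vertex sum is (0, 1), while for `PENT` the sum is (−1, −1), so conv{e1, e2} is a facet of the pentagon but not a special one. A set such as {e1, e2, e3, (2, −1, −1), (1, −1, −1)} has sum (4, −1, −1) and fails the second inequality of lemma 3.2, so it cannot be part of the vertex set of a special embedding.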
Now we define a subset of $\mathbb{Z}^d$ which will play an important part in what follows.

Definition 3.5. By $W_d$ we denote the maximal set (with respect to inclusion) of lattice points in $\mathbb{Z}^d$ such that
1. The origin is not contained in $W_d$.
2. The points in $W_d$ are primitive lattice points.
3. If $a_1e_1 + \dots + a_de_d \in W_d$, then $-d \le a \le 1$ for $a = a_1 + \dots + a_d$ and
$$\begin{cases} 0, & a = 1 \\ -1, & a = 0 \\ a, & a < 0 \end{cases} \;\le\; a_i \;\le\; \begin{cases} 1, & a = 1 \\ d-1, & a = 0 \\ d + a, & a < 0 \end{cases}$$
for every $i = 1, \dots, d$.

The next theorem is one of the key results in this paper. It allows us to classify smooth Fano $d$-polytopes by considering subsets of the explicitly given set $W_d$.

Theorem 3.6. Let $P$ be an arbitrary smooth Fano $d$-polytope, and $Q$ any special embedding of $P$. Then $\mathcal{V}(Q)$ is contained in the set $W_d$.

Proof. Follows directly from corollary 3.3 and the definition of $W_d$.

4 Total ordering of smooth Fano polytopes

In this section we define a total order on the set of smooth Fano $d$-polytopes for any fixed $d \ge 1$. Throughout the section $(e_1, \dots, e_d)$ is a fixed basis of the lattice $\mathbb{Z}^d$.

4.1 The order of a lattice point

We begin by defining a total order $\preceq$ on $\mathbb{Z}^d$.

Definition 4.1. Let $x = x_1e_1 + \dots + x_de_d$, $y = y_1e_1 + \dots + y_de_d$ be two lattice points in $\mathbb{Z}^d$. We define $x \preceq y$ if and only if
$$(-x_1 - \dots - x_d, x_1, \dots, x_d) \le_{lex} (-y_1 - \dots - y_d, y_1, \dots, y_d),$$
where $\le_{lex}$ is the lexicographical ordering on the product of $d + 1$ copies of the ordered set $(\mathbb{Z}, \le)$.

The ordering $\preceq$ is a total order on $\mathbb{Z}^d$.

Example. $(0, 1) \prec (-1, 1) \prec (1, -1) \prec (-1, 0)$.

Let $V$ be any nonempty finite subset of lattice points in $\mathbb{Z}^d$. We define $\max V$ to be the maximal element in $V$ with respect to the ordering $\preceq$. Similarly, $\min V$ is defined to be the minimal element in $V$. An important property of the ordering is shown in the following lemma.

Lemma 4.2. Let $P$ be a smooth Fano $d$-polytope, such that $F = \operatorname{conv}\{e_1, \dots, e_d\}$ is a facet of $P$. For every $1 \le i \le d$, let $v_i \ne e_i$ denote the vertex of $P$, such that $\operatorname{conv}\{e_1, \dots, e_{i-1}, v_i, e_{i+1}, \dots, e_d\}$ is a facet of $P$. Then
$$v_i = \min\{v \in \mathcal{V}(P) \mid \langle u_F^{e_i}, v\rangle < 0\}.$$

Proof. By lemma 2.2.(1) the vertex $v_i$ is in the set $\{v \in \mathcal{V}(P) \mid \langle u_F^{e_i}, v\rangle < 0\}$, and by lemma 2.2.(5) and the definition of the ordering $\preceq$, $v_i$ is the minimal element in this set.

In fact, we have chosen the ordering $\preceq$ to obtain the property of lemma 4.2, and any other total order on $\mathbb{Z}^d$ having this property can be used in what follows.

4.2 The order of a smooth Fano d-polytope

We can now define an ordering on finite subsets of $\mathbb{Z}^d$. The ordering is defined recursively.

Definition 4.3. Let $X$ and $Y$ be finite subsets of $\mathbb{Z}^d$. We define $X \preceq Y$ if and only if $X = \emptyset$ or
$$Y \ne \emptyset \wedge \big(\min X \prec \min Y \vee (\min X = \min Y \wedge X \setminus \{\min X\} \preceq Y \setminus \{\min Y\})\big).$$

Example. $\emptyset \prec \{(0, 1)\} \prec \{(0, 1), (-1, 1)\} \prec \{(0, 1), (1, -1)\} \prec \{(-1, 1)\}$.

When $W$ is a nonempty finite set of subsets of $\mathbb{Z}^d$, we define $\max W$ to be the maximal element in $W$ with respect to the ordering of subsets $\preceq$. Similarly, $\min W$ is the minimal element in $W$. Now, we are ready to define the order of a smooth Fano $d$-polytope.

Definition 4.4. Let $P$ be a smooth Fano $d$-polytope. The order of $P$, $\operatorname{ord}(P)$, is defined as
$$\operatorname{ord}(P) := \min\{\mathcal{V}(Q) \mid Q \text{ a special embedding of } P\}.$$
The set is non-empty and finite, so $\operatorname{ord}(P)$ is well-defined. Let $P_1$ and $P_2$ be two smooth Fano $d$-polytopes. We say that $P_1 \le P_2$ if and only if $\operatorname{ord}(P_1) \preceq \operatorname{ord}(P_2)$. This is indeed a total order on the set of isomorphism classes of smooth Fano $d$-polytopes.

4.3 Permutation of basis vectors and presubsets

The group $S_d$ of permutations of $d$ elements acts on $\mathbb{Z}^d$ in the obvious way by permuting the basis vectors:
$$\sigma.(a_1e_1 + \dots + a_de_d) := a_1e_{\sigma(1)} + \dots + a_de_{\sigma(d)}, \qquad \sigma \in S_d.$$
Similarly, $S_d$ acts on subsets of $\mathbb{Z}^d$:
$$\sigma.X := \{\sigma.x \mid x \in X\}.$$
In this notation we clearly have, for any special embedding $P$ of a smooth Fano $d$-polytope,
$$\operatorname{ord}(P) \preceq \min\{\sigma.\mathcal{V}(P) \mid \sigma \in S_d\}.$$
Let $V$ and $W$ be finite subsets of $\mathbb{Z}^d$. We say that $V$ is a presubset of $W$, if $V \subseteq W$ and $v \prec w$ whenever $v \in V$ and $w \in W \setminus V$.

Example.
{(0, 1), (−1, 1)} is a presubset of {(0, 1), (−1, 1), (1,−1)}, while {(0, 1), (1,−1)} is not.

Lemma 4.5. Let P be a smooth Fano polytope. Then every presubset V of ord(P ) is the minimal element in {σ.V | σ ∈ Sd}.

Proof. Let ord(P ) = {v1, . . . , vn}, v1 ≺ . . . ≺ vn. Suppose there exists a permutation σ and a k, 1 ≤ k ≤ n, such that σ.{v1, . . . , vk} = {w1, . . . , wk} ≺ {v1, . . . , vk}, where w1 ≺ . . . ≺ wk. Then there is a number j, 1 ≤ j ≤ k, such that wi = vi for every 1 ≤ i < j and wj ≺ vj. Let σ act on {v1, . . . , vn}: σ.{v1, . . . , vn} = {x1, . . . , xn}, x1 ≺ . . . ≺ xn. Then xi ⪯ vi for every 1 ≤ i < j and xj ≺ vj. So σ.ord(P ) ≺ ord(P ), but this contradicts the definition of ord(P ).

5 The SFP-algorithm

In this section we describe an algorithm that produces the classification list of smooth Fano d-polytopes for any given d ≥ 1. The algorithm works by going through certain finite subsets of Wd in increasing order (with respect to the ordering defined in the previous section). It will output a subset V iff convV is a smooth Fano d-polytope P and ord(P ) = V . Throughout the whole section (e1, . . . , ed) is a fixed basis of Zd and I denotes the (d− 1)-simplex conv{e1, . . . , ed}.

5.1 The SFP-algorithm

The SFP-algorithm consists of three functions, SFP, AddPoint and CheckSubset. The finite subsets of Wd are constructed by the function AddPoint, which takes a subset V , {e1, . . . , ed} ⊆ V ⊆ Wd, together with a finite set F , I ∈ F , of (d − 1)-simplices in Rd as input. It then goes through every v in the set {v ∈ Wd | max V ≺ v} in increasing order, and recursively calls itself with input V ∪ {v} and some set F ′ of (d − 1)-simplices of Rd, F ⊆ F ′. In this way subsets of Wd are considered in increasing order. Whenever AddPoint is called, it checks if the input set V is the vertex set of a special embedding of a smooth Fano d-polytope P such that ord(P ) = V , in which case the polytope P = convV is outputted.
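The orderings of section 4 and the set Wd of definition 3.5 translate directly into sort keys and a brute-force enumeration. The following Python sketch is meant only to make the definitions concrete; it is not the C++ implementation mentioned in the paper. All function names are illustrative, and the lower coordinate bounds used for Wd are the ones given by lemma 2.3 and corollary 3.3.

```python
from functools import reduce
from itertools import permutations, product
from math import gcd


def key(x):
    """Sort key for definition 4.1: compare (-sum(x), x_1, ..., x_d) lexicographically."""
    return (-sum(x), *x)


def set_key(points):
    """Sort key for definition 4.3: the increasing list of point keys.
    Python compares lists lexicographically and puts a proper prefix first."""
    return sorted(key(p) for p in points)


def W(d):
    """All points of W_d (definition 3.5), in increasing order."""
    pts = []
    for x in product(range(-d, d + 1), repeat=d):
        a = sum(x)
        if not -d <= a <= 1:
            continue
        lo, hi = (0, 1) if a == 1 else (-1, d - 1) if a == 0 else (a, d + a)
        # gcd == 1 makes the point primitive and also excludes the origin (gcd 0).
        if all(lo <= xi <= hi for xi in x) and reduce(gcd, map(abs, x)) == 1:
            pts.append(x)
    return sorted(pts, key=key)


def is_minimal_under_permutations(points):
    """Brute-force test of V == min{sigma.V | sigma in S_d}.
    Coordinates are permuted by every element of S_d, so it does not matter
    whether sigma or its inverse is applied."""
    pts = list(points)
    d = len(pts[0])
    base = set_key(pts)
    for sigma in permutations(range(d)):
        image = [tuple(p[sigma[i]] for i in range(d)) for p in pts]
        if set_key(image) < base:
            return False
    return True
```

For d = 2 the enumeration gives the seven points (0, 1), (1, 0), (−1, 1), (1, −1), (−1, 0), (0, −1), (−1, −1) in this order. The permutation test costs d! set comparisons, which is acceptable for a sketch but is presumably not how an efficient implementation would proceed.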
For any given integer d ≥ 1 the function SFP calls the function AddPoint with input {e1, . . . , ed} and {I}. In this way a call SFP(d) will make the algorithm go through every finite subset of Wd containing {e1, . . . , ed}, and smooth Fano d-polytopes are outputted in strictly increasing order.

It is vital for the effectiveness of the SFP-algorithm that there is some efficient way to check if a subset V ⊆ Wd is a presubset of ord(P ) for some smooth Fano d-polytope P . The function AddPoint should perform this check before the recursive call AddPoint(V,F ′). If P is any smooth Fano d-polytope, then any presubset V of ord(P ) is the minimal element in the set {σ.V |σ ∈ Sd} (by lemma 4.5). In other words, if there exists a permutation σ such that σ.V ≺ V , then the algorithm should not make the recursive call AddPoint(V,F ′).

But this is not the only test we wish to perform on a subset V before the recursive call. The function CheckSubset performs another test: It takes a subset V , {e1, . . . , ed} ⊆ V ⊆ Wd as input together with a finite set of (d−1)-simplices F , I ∈ F , and returns a set F ′ of (d−1)-simplices containing F , if there exists a special embedding P of a smooth Fano d-polytope, such that
1. V is a presubset of V(P )
2. F is a subset of the facets of P
This is proved in theorem 5.1. If no such special embedding exists, then CheckSubset returns false in many cases, but not always! Only when CheckSubset(V,F) returns a set F ′ of simplices, we allow the recursive call AddPoint(V,F ′).

Given input V ⊆ Wd and a set F of (d − 1)-simplices of Rd, the function CheckSubset works in the following way: Suppose V is a presubset of V(P ) for some special embedding P of a smooth Fano d-polytope and F is a subset of the facets of P . Deduce as much as possible of the face lattice of P and look for contradictions to the lemmas stated in section 2.
The more facets we know of $P$, the more restrictions we can put on the vertex set $\mathcal{V}(P)$, and then on $V$. If a contradiction arises, return false. Otherwise, return the deduced set of facets of $P$. The following example illustrates how the function CheckSubset works.

5.2 An example of the reasoning in CheckSubset

Let $d = 5$ and $V = \{v_1, \dots, v_8\}$, where
$$v_1 = e_1,\ v_2 = e_2,\ v_3 = e_3,\ v_4 = e_4,\ v_5 = e_5,$$
$$v_6 = -e_1 - e_2 + e_4 + e_5,\quad v_7 = e_2 - e_3 - e_4,\quad v_8 = -e_4 - e_5.$$
Suppose $P$ is a special embedding of a smooth Fano 5-polytope, such that $V$ is a presubset of $\mathcal{V}(P)$. Certainly, the simplex $I$ is a facet of $P$. Notice that $V$ does not violate lemma 3.2:
$$v_1 + \dots + v_8 = e_2 + e_5.$$
If $V$ did contradict lemma 3.2, then the polytope $P$ could not exist, and CheckSubset($V$, $\{I\}$) should return false.

For simplicity we denote any simplex $\operatorname{conv}\{v_{i_1}, \dots, v_{i_k}\}$ by $\{i_1, \dots, i_k\}$. Since $\langle u_I, v_6\rangle = 0$, the simplices $F_1 = \{2, 3, 4, 5, 6\}$ and $F_2 = \{1, 3, 4, 5, 6\}$ are facets of $P$ (lemma 2.4).

There are exactly two facets of $P$ containing the ridge $\{1, 2, 4, 5\}$. One of them is $I$. Suppose the other one is $\{1, 2, 4, 5, 9\}$, where $v_9$ is some lattice point not in $V$, $v_9 \in \mathcal{V}(P)$. Then $\langle u_I, v_9\rangle > \langle u_I, v_7\rangle$ by lemma 2.2.(5) and then $v_9 \prec v_7$ by the definition of the ordering of lattice points in $\mathbb{Z}^d$. But then $V$ is not a presubset of $\mathcal{V}(P)$. This is the nice property of the ordering of $\mathbb{Z}^d$, and the reason why we chose it as we did. We conclude that $F_3 = \{1, 2, 4, 5, 7\}$ is a facet of $P$, and by similar reasoning $F_4 = \{1, 2, 3, 5, 7\}$ and $F_5 = \{1, 2, 3, 4, 8\}$ are facets of $P$.

Now, for each of the facets $F_i$ and every point $v_j \in V$, we check if $\langle u_{F_i}, v_j\rangle = 0$. If this is the case, then by lemma 2.4 $\operatorname{conv}(\{v_j\} \cup \mathcal{V}(F_i) \setminus \{w\})$ is a facet of $P$ for every $w \in \mathcal{V}(F_i)$ where $\langle u_{F_i}^w, v_j\rangle < 0$. In this way we get that
$$\{2, 4, 5, 6, 7\},\ \{1, 4, 5, 6, 7\},\ \{1, 2, 3, 7, 8\},\ \{1, 3, 5, 7, 8\}$$
are facets of $P$. We continue in this way, until we cannot deduce any new facet of $P$.
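The deductions in this example can be replayed mechanically. The sketch below, written in Python for illustration and not taken from the paper's C++ code, applies lemma 4.2 to find the facets adjacent to I and lemma 2.4 to find further facets, using exact rational arithmetic to express a point in the basis given by a facet. Function names are assumptions; labels are 1-based as in the text. For i = 4 the facet found through lemma 4.2 is {1, 2, 3, 5, 7}, because v7 is the smallest point of V with negative e4-coordinate.

```python
from fractions import Fraction


def key(x):
    """Sort key for the ordering of lattice points (definition 4.1)."""
    return (-sum(x), *x)


def coords(facet, v):
    """Coefficients c_w with v = sum_w c_w * w; c_w equals <u_F^w, v>.
    Gauss-Jordan elimination, assuming the facet vertices form a basis."""
    d = len(facet)
    rows = [[Fraction(facet[j][i]) for j in range(d)] + [Fraction(v[i])] for i in range(d)]
    for col in range(d):
        piv = next(r for r in range(col, d) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(d):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][d] for i in range(d)]


D = 5
BASIS = [tuple(int(i == j) for j in range(D)) for i in range(D)]
V = BASIS + [(-1, -1, 0, 1, 1), (0, 1, -1, -1, 0), (0, 0, 0, -1, -1)]  # v1, ..., v8


def facets_adjacent_to_I(V, d):
    """Lemma 4.2, assuming V is a presubset of the vertex set: for each i,
    replace e_i by the smallest point of V with negative i-th coordinate."""
    found = []
    for i in range(d):
        neg = [j for j in range(1, len(V) + 1) if V[j - 1][i] < 0]
        if neg:
            j = min(neg, key=lambda m: key(V[m - 1]))
            found.append(tuple(sorted(set(range(1, d + 1)) - {i + 1} | {j})))
    return found


def lemma_2_4_facets(labels, V):
    """Facets forced by lemma 2.4 from the facet with the given labels."""
    facet = [V[j - 1] for j in labels]
    out = set()
    for j, v in enumerate(V, start=1):
        if j in labels:
            continue
        c = coords(facet, v)
        if sum(c) == 0:  # <u_F, v> = 0
            for w, cw in zip(labels, c):
                if cw == -1:  # by lemma 2.3 a negative coefficient is -1 here
                    out.add(tuple(sorted(set(labels) - {w} | {j})))
    return out
```

`facets_adjacent_to_I(V, 5)` returns the five facets {2,3,4,5,6}, {1,3,4,5,6}, {1,2,4,5,7}, {1,2,3,5,7} and {1,2,3,4,8}. Applying `lemma_2_4_facets` to {1,2,4,5,7} gives {2,4,5,6,7} and {1,4,5,6,7}, and applying it to {1,2,3,5,7} gives {1,2,3,7,8} and {1,3,5,7,8}.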
Every time we find a new facet F we check that v is beneath F (that is 〈uF , v〉 ≤ 1) and that lemma 2.3 holds for any v ∈ V . If not, then CheckSubset(V, {I}) should return false. If no contradiction arises, CheckSubset(V, {I}) returns the set of deduced facets. 5.3 The SFP-algorithm in pseudo-code Input: A positive integer d. Output: A list of special embeddings of smooth Fano d-polytopes, such that 1. Any smooth Fano d-polytope is isomorphic to one and only one poly- tope in the output list. 2. If P is a smooth Fano d-polytope in the output list, then V(P ) = ord(P ). 3. If P1 and P2 are two non-isomorphic smooth Fano d-polytopes in the output list and P1 preceeds P2 in the output list, then ord(P1) ≺ ord(P2). SFP ( an integer d ≥ 1 ) 1. Construct the set V = {e1, . . . , ed} and the simplex I = convV . 2. Call the function AddPoint(V, {I}). 3. End program. AddPoint ( a subset V where {e1, . . . , ed} ⊆ V ⊆ Wd , a set of (d − 1)- simplices F in Rd where I ∈ F ) 1. If P = conv(V(V )) is a smooth Fano d-polytope and V(V ) = ord(P ), then output P . 2. Go through every v ∈ Wd, maxV(V ) ≺ v, in increasing order with respect to the ordering ≺: (a) If CheckSubset(V ∪ {v},F) returns false, then goto (d). Oth- erwise let F ′ be the returned set of (d− 1)-simplices. 5.4 Justification of the SFP-algorithm 13 (b) If V ∪ {v} 6= min{σ.(V ∪ {v}) | σ ∈ Sd}, then goto (d). (c) Call the function AddPoint(V ∪ {v},F ′). (d) Let v be the next element in Wd and go back to (a). 3. Return CheckSubset ( a subset V where {e1, . . . , ed} ⊆ V ⊆ Wd , a set of (d− 1)- simplices F in Rd where I ∈ F ) 1. Let ν = v∈V v. 2. If 〈uI , ν〉 < 0, then return false. 3. If 〈u , ν〉 > 1 + 〈uI , ν〉 for some i, then return false. 4. Let F ′ = F . 5. For every i ∈ {1, . . . , d}: If the set {v ∈ V |〈u , v〉 < 0} is equal to {max V }, then add the simplex conv({max V } ∪ V(I) \ {ei}) to F 6. If there exists F ∈ F ′ such that V(F ) is not a Z-basis of Zd, then return false. 7. 
If there exist F ∈ F′ and v ∈ V such that 〈uF, v〉 > 1, then return false.
8. If there exist F ∈ F′, v ∈ V and w ∈ V(F) such that
\[ \langle u_F^w, v\rangle < \begin{cases} 0 & \text{if } \langle u_F, v\rangle = 1, \\ -1 & \text{if } \langle u_F, v\rangle = 0, \\ \langle u_F, v\rangle & \text{if } \langle u_F, v\rangle < 0, \end{cases} \]
then return false.
9. If there exist F ∈ F′, v ∈ V and w ∈ V(F) such that 〈uF, v〉 = 0 and 〈u^w_F, v〉 = −1, then consider the simplex F′ = conv({v} ∪ V(F) \ {w}). If the simplex F′ is not in the set F′, then add it to the set F′ and go back to step 6.
10. Return F′.

5.4 Justification of the SFP-algorithm

The following theorems justify the SFP-algorithm.

Theorem 5.1. Let P be a special embedding of a smooth Fano d-polytope and V a presubset of V(P), such that {e1, . . . , ed} ⊆ V. Let F be a set of facets of P. Then CheckSubset(V, F) returns a subset F′ of the facets of P and F ⊆ F′.

Proof. By lemma 3.2 the subset V will pass the tests in steps 2 and 3 of CheckSubset. The function CheckSubset constructs a set F′ of (d − 1)-simplices containing the input set F. We now wish to prove that every simplex F in F′ is a facet of P. By the assumptions the subset F ⊆ F′ consists of facets of P. Consider the addition of a simplex Fi, 1 ≤ i ≤ d, in step 5: Fi = conv({max V} ∪ V(I) \ {ei}). As max V is the only element in the set {v ∈ V | 〈u^{ei}_I, v〉 < 0} and V is a presubset of V(P), Fi is a facet of P by lemma 4.2. Consider the addition of simplices in step 9: If F is a facet of P, then by lemma 2.4 the simplex conv({v} ∪ V(F) \ {w}) is a facet of P. By induction we conclude that every simplex in F′ is a facet of P. Then any simplex F ∈ F′ will pass the tests in steps 6–8 (use lemma 2.3 to see that the last test is passed). This proves the theorem.

Theorem 5.2. The SFP-algorithm produces the promised output.

Proof. Let P be a smooth Fano d-polytope. Clearly, P is isomorphic to at most one polytope in the output list. Let Q be a special embedding of P such that V(Q) = ord(P). We need to show that Q is in the output list. Let V(Q) = {e1, . . .
, ed, q1, . . . , qk}, where q1 ≺ . . . ≺ qk, and let Vi = {e1, . . . , ed, q1, . . . , qi} for every 1 ≤ i ≤ k. Certainly the function AddPoint has been called with input {e1, . . . , ed} and {I}. By theorem 5.1 the function call CheckSubset(V1, {I}) returns a set F1 of (d − 1)-simplices which are facets of Q, with I ∈ F1. By lemma 4.5 the set V1 passes the test in 2b in AddPoint. Then AddPoint is called recursively with input V1 and F1. The call CheckSubset(V2, F1) returns a subset F2 of facets of Q, and the set V2 passes the test in 2b in AddPoint. So the call AddPoint(V2, F2) is made. Proceed in this way to see that the call AddPoint(Vk, Fk) is made, and then the polytope Q = conv Vk is output in step 1 of AddPoint.

6 Classification results and where to get them

A modified version of the SFP-algorithm has been implemented in C++, and used to classify smooth Fano d-polytopes for d ≤ 7. On an average home computer our program needs less than one day (January 2007) to construct the classification list of smooth Fano 7-polytopes. These lists can be downloaded from the author's homepage: http://home.imf.au.dk/oebro

An advantage of the SFP-algorithm is that it requires almost no memory: When the algorithm has found a smooth Fano d-polytope P, it need not consult the output list to decide whether to output the polytope P or not. The construction guarantees that V(P) = min{σ.V(P) | σ ∈ Sd} and it remains to check if V(P) = ord(P). Thus there is no need to store the output list.

The table below shows the number of isomorphism classes of smooth Fano d-polytopes with n vertices.

n       d = 1   d = 2   d = 3   d = 4   d = 5   d = 6   d = 7
4               2       1
5               1       4       1
6               1       7       9       1
7                       4       28      15      1
8                       2       47      91      26      1
9                               27      268     257     40
10                              10      312     1318    643
11                              1       137     2807    5347
12                              1       35      2204    19516
13                                      5       771     26312
14                                      2       186     14758
15                                              39      4362
16                                              11      1013
17                                              1       214
18                                              1       43
Total   1       5       18      124     866     7622    72256

References

[1] V. V. Batyrev, Toroidal Fano 3-folds, Math. USSR-Izv. 19 (1982), 13–
[2] V. V.
Batyrev, On the classification of smooth projective toric varieties, Tohoku Math. J. 43 (1991), 569–585.
[3] V. V. Batyrev, On the classification of toric Fano 4-folds, J. Math. Sci. (New York) 94 (1999), 1021–1050.
[4] L. Bonavero, Toric varieties whose blow-up at a point is Fano, Tohoku Math. J. 54 (2002), 593–597.
[5] C. Casagrande, Centrally symmetric generators in toric Fano varieties, Manuscr. Math. 111 (2003), 471–485.
[6] C. Casagrande, The number of vertices of a Fano polytope, Ann. Inst. Fourier 56 (2006), 121–130.
[7] O. Debarre, Toric Fano varieties, in Higher dimensional varieties and rational points, lectures of the summer school and conference, Budapest 2001, Bolyai Society Mathematical Studies 12, Springer, 2001.
[8] G. Ewald, On the classification of toric Fano varieties, Discrete Comput. Geom. 3 (1988), 49–54.
[9] P. Kleinschmidt, A classification of toric varieties with few generators, Aequationes Math. 35 (1988), no. 2-3, 254–266.
[10] M. Kreuzer & H. Skarke, Classification of reflexive polyhedra in three dimensions, Adv. Theor. Math. Phys. 2 (1998), 853–871.
[11] M. Kreuzer & H. Skarke, Complete classification of reflexive polyhedra in four dimensions, Adv. Theor. Math. Phys. 4 (2000), 1209–1230.
[12] M. Kreuzer & B. Nill, Classification of toric Fano 5-folds, Preprint, math.AG/0702890.
[13] B. Nill, Gorenstein toric Fano varieties, Manuscr. Math. 116 (2005), 183–210.
[14] B. Nill, Classification of pseudo-symmetric simplicial reflexive polytopes, Preprint, math.AG/0511294, 2005.
[15] H. Sato, Toward the classification of higher-dimensional toric Fano varieties, Tohoku Math. J. 52 (2000), 383–413.
[16] H. Sato, Toric Fano varieties with divisorial contractions to curves, Math. Nachr. 261/262 (2003), 163–170.
[17] V. E. Voskresenskij & A. Klyachko, Toric Fano varieties and systems of roots, Math. USSR-Izv. 24 (1985), 221–244.
[18] K. Watanabe & M. Watanabe, The classification of Fano 3-folds with torus embeddings, Tokyo Math. J.
5 (1982), 37–48.
[19] M. Øbro, Classification of terminal simplicial reflexive d-polytopes with 3d − 1 vertices, Preprint, math.CO/0703416.

Department of Mathematics, University of Århus, 8000 Århus C, Denmark
E-mail address: oebro@imf.au.dk

Introduction Smooth Fano polytopes Special embeddings of smooth Fano polytopes Special facets Special embeddings Total ordering of smooth Fano polytopes The order of a lattice point The order of a smooth Fano d-polytope Permutation of basisvectors and presubsets The SFP-algorithm The SFP-algorithm An example of the reasoning in CheckSubset The SFP-algorithm in pseudo-code Justification of the SFP-algorithm Classification results and where to get them

ABSTRACT We present an algorithm that produces the classification list of smooth Fano d-polytopes for any given d. The input of the algorithm is a single number, namely the positive integer d. The algorithm has been used to classify smooth Fano d-polytopes for d ≤ 7. There are 7622 isomorphism classes of smooth Fano 6-polytopes and 72256 isomorphism classes of smooth Fano 7-polytopes. <|endoftext|><|startoftext|> Intelligent location of two simultaneously active acoustic emission sources: Part II

Tadej Kosel and Igor Grabec
Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva 6, POB 394, SI-1001 Ljubljana, Slovenia
e-mail: tadej.kosel@guest.arnes.si; igor.grabec@fs.uni-lj.si

Abstract: Part I describes an intelligent acoustic emission locator, while Part II discusses blind source separation, time delay estimation and location of two continuous acoustic emission sources. Acoustic emission (AE) analysis is used for characterization and location of developing defects in materials. AE sources often generate a mixture of various statistically independent signals.
A difficult problem of AE analysis is separation and characterization of signal components when the signals from various sources and the mode of mixing are unknown. Recently, blind source separation (BSS) by independent component analysis (ICA) has been used to solve these problems. The purpose of this paper is to demonstrate the applicability of ICA to locate two independent simultaneously active acoustic emission sources on an aluminum band specimen. The method is promising for non-destructive testing of aircraft frame structures by acoustic emission analysis.

INTRODUCTION

A common goal of many non-destructive testing methods is to detect defects in materials. Acoustic emission analysis (AE) is a passive testing method used to locate and characterize defects which emit sound [10]. There are many ways to deduce the location of an AE source from electrical signals detected by a chain of sensors. The corresponding problems may be classified by the type of acoustic source mechanism as the location of a continuous emission source, such as that generated by a leak, or as the location of discrete emission, such as an AE burst caused by a growing crack. This paper describes a method for processing continuous AE signals to determine the time delay (T-D) between signals and thus to provide information for location of AE sources. It should be pointed out that application of AE source characteristics, such as count, count rate, amplitude distribution, and conventional time delay measurement, becomes meaningless when dealing with continuous acoustic sources.

The basic information for AE source location consists of T-D between stress waves detected at different positions on a specimen. In the case of only one active AE source, T-D of continuous acoustic waves can be estimated using the cross-correlation function (CCF) of sensor signals described in Part I of this article [10], [7].
In the case of two (or more) simultaneously active AE sources, this method is not applicable, since analysis of the CCF leads only to the T-D of the most powerful AE signal. Detection of simultaneously active independent AE source signals therefore requires a more sophisticated approach.

The purpose of our study was to find a suitable method for processing a mixture of two simultaneously active continuous AE signals to determine the T-D and, related to this, the coordinates of both AE sources. We found that the Blind Source Separation (BSS) method solves this problem satisfactorily. BSS is a general signal processing method involving the recovery of the contributions of different sources from a finite set of observations recorded by sensors, independent of the propagation medium and without any prior knowledge of the sources. BSS has already been successfully applied in medicine, telecommunications, image processing etc. [8]. However, it is also a promising method for AE analysis of aircraft structures, because AE signals are often hidden in a mixture of signals from various sources. BSS could extract the specific signature of each AE source, which can further be used for location and characterization purposes, or to isolate AE sources from background noise. We conducted experiments with BSS on an aluminum beam on which two continuous AE sources were generated simultaneously by air flow.

Manuscript generated: January 31, 2007

METHODS

In this section we explain two different methods for time delay estimation of AE sources. The first method is based on analysis of the CCF and is convenient for T-D estimation of one active continuous AE source, as described in Part I [10], [7], [12]. The CCF exhibits a peak when the delay parameter compensates the T-D between the sensor signals [10]. The T-D is thus determined by the position of the highest peak of the CCF.
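The CCF-based estimate for a single source can be sketched as follows. This is a minimal illustration on synthetic white noise, not the processing chain of Part I; the function name `estimate_delay`, the sampling rate and the 37-sample shift are assumptions made only for the example.

```python
import numpy as np


def estimate_delay(x1, x2, fs):
    """Delay of x2 relative to x1 in seconds (positive if x2 lags x1).

    The delay is read off the position of the highest CCF peak.
    """
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    # c[k] = sum_n x2[n + k] * x1[n], for k = -(len(x1) - 1) .. len(x2) - 1
    ccf = np.correlate(x2, x1, mode="full")
    lags = np.arange(-(len(x1) - 1), len(x2))
    return lags[np.argmax(ccf)] / fs


# Synthetic check: one broadband source seen by two sensors,
# the second sensor receiving the same signal 37 samples later.
fs = 1.0e6                      # assumed sampling rate [Hz]
shift = 37                      # assumed delay [samples]
rng = np.random.default_rng(0)
s = rng.standard_normal(1100)
x1 = s[shift:shift + 1000]
x2 = s[:1000]                   # x2[n] = x1[n - shift]

delay = estimate_delay(x1, x2, fs)
print(f"estimated T-D: {delay * 1e6:.1f} microseconds")
```

With two sources of comparable power the same CCF would show several peaks, and only the highest one would be picked, which is the limitation that motivates the second method.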
The second method is based on a BSS algorithm and is convenient for T-D estimation of two (or more) simultaneously active continuous AE sources [9]. Location of two simultaneously active AE sources was performed by an intelligent locator based on a general regression neural network [5], as described in Part I.

Multichannel Blind Source Separation has recently received increased attention due to the importance of its potential applications [3]. It occurs in many fields of engineering and applied sciences, including processing of signals from antenna arrays, speech and geophysical data processing, noise reduction, biological system analysis, etc. It consists of recovering signals emitted by unknown sources and mixed by an unknown medium (the material in which the waves propagate), using only several observations of the mixtures. The only assumptions made are the linearity of the mixing system and the statistical independence of the original signals.

BSS methods may be classified in several ways. One possible classification depends on whether the mixtures are instantaneous or convolutive [4]. Convolutive mixtures correspond to a mixing system with time dependent memory. They represent a more general case than instantaneous mixtures, and they have in particular acoustic applications. Recently, the principle of independent component analysis (ICA) was applied in BSS, and it was found to be a simple and powerful tool [6]. This study deals with the separation of two convolutively mixed independent continuous AE signals by ICA; the intelligent locator was then used to locate the two independent continuous AE sources based on the T-D.

The mixing and filtering processes of unknown input signals sj(t) may have different mathematical or physical backgrounds, depending on specific applications. In this paper, we focus mainly on the simplest case, in which n signals xi(t) are linear mixtures of n unknown, statistically independent, zero mean source signals sj(t).
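The difference between instantaneous and convolutive mixing can be illustrated with a small simulation. The sketch below is only an assumption-laden toy: the function `convolutive_mix`, the filter taps and the signal lengths are hypothetical and are not taken from the experiment.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Return x_i(t) = sum_j (a_ij * s_j)(t), where * is convolution.

    sources: array of shape (n, T); filters[i][j]: FIR taps of a_ij.
    An instantaneous mixture is the special case of one tap per filter.
    """
    n, length = sources.shape
    mixed = np.zeros((n, length))
    for i in range(n):
        for j in range(n):
            # Full convolution truncated to the observation length.
            mixed[i] += np.convolve(sources[j], filters[i][j])[:length]
    return mixed


rng = np.random.default_rng(1)
s = rng.standard_normal((2, 500))        # two independent zero-mean sources

# Hypothetical mixing filters: direct paths without delay,
# cross paths attenuated and delayed by 2 and 3 samples.
filters = [
    [np.array([1.0]), np.array([0.0, 0.0, 0.5])],
    [np.array([0.0, 0.0, 0.0, 0.5]), np.array([1.0])],
]

x = convolutive_mix(s, filters)
print(x.shape)
```

In this toy setting the position of the nonzero tap in each cross filter plays the role of the propagation delay from a source to the more distant sensor.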
The composition is expressed in matrix notation as x = A ∗ s [8], where ‘∗’ denotes a convolution, x = [x1(t), . . . , xn(t)]^T is the vector of sensor signals, s = [s1(t), . . . , sn(t)]^T is the vector of source signals and A is an unknown full rank n × n mixing matrix whose elements are finite impulse response (FIR) filters. We assume that only the vector x is available. The goal of ICA is to find a matrix W by which the vector x can be transformed into estimated source signals u = W ∗ x. In the noise-free case, W is simply the inverse of A. However, when noise corrupts the signals, matrix W must be found by an optimal statistical treatment of the inverse problem. The optimal matrix W can be estimated by a feed-forward neural network operating in the frequency domain. A learning algorithm with Amari’s natural gradient can be written as [1]:
\[ \tilde{u} = \tilde{W}\cdot\tilde{x}, \]
\[ \tilde{W}(\tau+1) = \tilde{W}(\tau) + \alpha\,\Delta\tilde{W}(\tau) + \eta\,\Delta\tilde{W}(\tau-1), \]
\[ \Delta\tilde{W} = [I - \tilde{y}\cdot\tilde{u}^{H}]\,\tilde{W}, \]
\[ \tilde{y} = \tanh(\Re[\tilde{u}]) + i\,\tanh(\Im[\tilde{u}]), \]
where α is the learning rate, η is a learning constant weighting the previous update, I is the identity matrix and the tilde ‘˜’ denotes a frequency domain quantity.

Fig. 1. Block diagram of ICA algorithm (blocks: pre-process, initialize unmixing filters, filter, update rule)

The ICA algorithm runs off-line and proceeds as follows [11] (Fig. 1):
1) Pre-process the time-domain input signals, x(t): subtract the mean from each signal.
2) Initialize the frequency domain unmixing filters, W̃.
3) Take a block of input data and convert it into the frequency domain using the Fast Fourier Transform (FFT).
4) Filter the frequency domain input block, x̃, through W̃ to get the estimated source signals, ũ.
5) Pass ũ through the frequency domain nonlinearity, ỹ.
6) Use W̃, ũ and ỹ along with the natural gradient extension [2] to compute the change in the unmixing filter, ∆W̃.
7) Take the next block of input data, convert it into the frequency domain, and proceed from step 4.
Repeat this process until the unmixing filters have converged upon a solution, passing several times through the data.
8) Normalize W̃ and convert it back into the time domain, using the Inverse Fast Fourier Transform (IFFT).
9) Convolve the time domain unmixing filters, W, with x to get the estimated sources.

EXPERIMENTS

We performed experiments with two independent continuous AE sources on an aluminum band of dimensions 4000 × 40 × 5 mm³. Reflections at the ends of the band were reduced by wrapping the ends in putty. The testing area was on the longitudinal axis in the middle of the band, where 23 holes of diameter 2 mm and mutual separation 100 mm were prepared as shown in Fig. 2.

Fig. 2. AE generation by air flowing through the hole

Two AE sensors were mounted 100 mm away from the terminal holes, that is 2.4 m from each other. The origin of the coordinate system was in the middle of the band and the testing area extended from −1.1 m to +1.1 m. AE signals were excited by two independent air jets flowing through the holes. The source positions were arbitrarily selected at +100 mm and +800 mm. Air jets were formed by two nozzles of diameter 1 mm using a pressure of 7 bar. The experimental set-up consisted of the test specimen (aluminum band), two AE sensors (pinducers), two AE sources (air jets), two amplifiers, a digital oscilloscope (A/D converter) and a computer (BSS module, locator, plotter) as shown in Fig. 3.

Fig. 3. Experimental set-up

Three experiments were performed: (1) T-D estimation using a CCF of two AE signals that were not simultaneously active; (2) T-D estimation using a CCF of two AE signals which were simultaneously active and (3) T-D estimation of AE signals using ICA. Location of sources, based on T-D, by the intelligent locator was performed in all three cases.

In the first experiment only one air jet was activated for a particular measurement.
In the second experiment both air jets were activated. Sensor signals were linear convolutive mixtures of two independent continuous AE sources as shown in Fig. 4. The auto-correlation functions R11, R22 and cross-correlation functions R12, R21 were calculated from sensor signals. Only one T-D of two signals can be estimated from the highest peak in both CCFs, regardless of the number of independent AE sources on the test specimen, as shown in Fig. 5. This means that a CCF cannot be used for automatic T-D estimation of multiple AE signals on the test specimen. The CCF exhibits various peaks which belong to various independent AE sources, but it is usually impossible to relate these peaks to corresponding coordinates of AE sources.

Fig. 4. Mixtures of two independent continuous AE sources acquired by two sensors: (a) sensory signal #1, (b) sensory signal #2 (horizontal axis: t [ms])

Fig. 5. Auto- and cross-correlation functions of sensory signals; down-arrow marks the highest peak

In the third experiment the ICA algorithm was used to solve this problem satisfactorily. The ICA algorithm results in demixing FIR filters which extract the independent source signals from sensory signals. By inverting the demixing filters W we obtain mixing filters A. In the case of two independent AE sources and two sensors, the components of A are four FIR mixing filters, as shown in Fig. 6. There are two direct mixing filters a11, a22 and two cross mixing filters a12, a21. The first index of the filter represents the number of the sensor, while the second index represents the number of the source. The position of the highest peak of the cross FIR filters determines the T-D between two signals from two sensors.
If we subtract the coordinate of the highest peak of a direct mixing FIR filter a11 from the coordinate of the highest peak of cross filter a21 we obtain the T-D of the first independent AE source, since each of the highest peaks in the FIR filters belongs to different independent AE signals.

Fig. 6. Mixing filters obtained by ICA of sensory signals; down-arrow marks the highest peak

RESULTS

The results of T-D estimation of two continuous independent AE sources are shown in Fig. 7. Three experiments were done. In the first experiment, the T-D was estimated by a CCF of two AE sources which were not active simultaneously, as marked by ‘◦’. Locations of these two sources estimated by the intelligent locator were +181 mm and +784 mm. The second experiment was performed with both AE sources active simultaneously. T-D were also estimated by a CCF. The highest peak position corresponds to the source location marked by ‘− −’ and was +784 mm. The third experiment was performed using ICA for T-D estimation and location by the intelligent locator. The result is marked by ‘�’. Estimated positions of these two sources were +179 mm and +784 mm respectively.

If we compare the coordinates of both independent AE sources estimated by the first experiment and by the third experiment, we find a good correspondence. If we compare estimated AE source coordinates with actual coordinates, which were +100 mm and +800 mm respectively, we observe a slight disagreement due to experimental error. The experimental error is about 3% of the distance between the sensors. The absolute error in this case is 79 mm and 16 mm respectively. The results also depend on the number and distribution of prototype sources marked by ‘•’, which are essential for operation of the intelligent locator. If the number of prototype sources is increased, location error is reduced.
In our case the prototype sources were distributed along the beam from −1.1 m to +1.1 m, separated by 0.1 m, so that the systematic error of the locator was set to several percent.

Fig. 7. Results of location of two continuous independent AE sources (horizontal axis: actual position l [m]). Symbols: ‘�’ – AE sources obtained by ICA; ‘◦’ – estimated AE sources obtained by cross-correlation function in two steps, when just one of two AE sources was active at time of measurement; ‘− −’ – estimated AE sources obtained by cross-correlation function when two AE sources were active simultaneously; ‘•’ – prototype AE sources required for location using intelligent locator; ‘−’ – distribution of actual sources.

DISCUSSION AND CONCLUSION

CCF is applicable to T-D estimation only in the case of one active AE source. The goal of our research is to develop a new method to estimate T-D between AE signals in the case of multiple simultaneously active continuous AE sources. We have shown that, for this purpose, ICA is an applicable option. ICA finds a linear coordinate system (the unmixing filters) such that the resulting signals are statistically independent. This is an advantage of ICA over CCF. It represents a new approach to processing of AE data and further expands the applicability of AE analysis in the field of non-destructive testing. In machines or in an industrial environment, multiple sources are usually active simultaneously, often representing environmental disturbances. The corresponding complex signals are not directly applicable to characterization of particular sources. However, separation of contributions by ICA in fact represents a kind of filtering, increasing the applicability of filtered signals to characterization of sources in complex environments. Future research will be focused on location of multiple AE sources on two-dimensional and three-dimensional specimens.

REFERENCES

[1] Amari, S.-I.
1998, Natural gradient works efficiently in learning, Neural Computation 10, 251–276.
[2] Amari, S.-I., Cichocki, A. & Yang, H. H. 1996, A new learning algorithm for blind signal separation, in D. Touretzky, M. Mozer & M. Hasselmo, eds, ‘Advances in Neural Information Processing Systems’, Vol. 8, MIT Press, Cambridge MA, pp. 752–763.
[3] Burel, G. 1992, Blind separation of sources: A nonlinear algorithm, Neural Networks 5, 937–947.
[4] Deville, Y. & Charkani, N. 1997, Analysis of the stability of time-domain source separation algorithms for convolutively mixed signals, in International Conference on Acoustics, Speech, and Signal Processing, pp. 1835–1838.
[5] Grabec, I. & Sachse, W. 1997, Synergetics of Measurement, Prediction and Control, Springer-Verlag, Berlin.
[6] Hyvarinen, A. & Oja, E. 2000, Independent component analysis: algorithms and applications, Neural Networks 13, 411–430.
[7] Kosel, T., Grabec, I. & Mužič, P. 2000, Location of continuous acoustic emission sources generated by air flow, Ultrasonics 38(1–8), 824–826.
[8] Lee, T.-W. 1998, Independent Component Analysis, Theory and Applications, Kluwer Academic Publishers, Boston etc.
[9] Lee, T.-W., Bell, A. J. & Lambert, R. 1997, Blind separation of convolved and delayed sources, Advances in Neural Information Processing Systems 9, 758–764.
[10] McIntire, P. & Miller, R. K., eds 1987, Acoustic Emission Testing, Vol. 5 of Nondestructive Testing Handbook, 2 edn, American Society for Nondestructive Testing, Philadelphia, USA.
[11] Westner, A. G. 1996, Object-based audio capture: Separating acoustically-mixed sources, MSc Thesis, Massachusetts Institute of Technology.
[12] Ziola, S. M. & Gorman, M. R. 1991, Source location in thin plates using cross-correlation, J. Acoust. Soc. Am. 90(5), 2551–2556.

References ABSTRACT Part I describes an intelligent acoustic emission locator, while Part II discusses blind source separation, time delay estimation and location of two continuous acoustic emission sources.
Acoustic emission (AE) analysis is used for characterization and location of developing defects in materials. AE sources often generate a mixture of various statistically independent signals. A difficult problem of AE analysis is separation and characterization of signal components when the signals from various sources and the mode of mixing are unknown. Recently, blind source separation (BSS) by independent component analysis (ICA) has been used to solve these problems. The purpose of this paper is to demonstrate the applicability of ICA to locate two independent simultaneously active acoustic emission sources on an aluminum band specimen. The method is promising for non-destructive testing of aircraft frame structures by acoustic emission analysis. <|endoftext|><|startoftext|> Introduction Probabilities Classical coins and classical probabilities Quantum coins and quantum probabilities Teleportation Visualizing quantum information processing States of quantum coins Measurements on quantum coins Visualizing teleportation Teleporting classical coins Conclusion Acknowledgments References ABSTRACT A novel way of picturing the processing of quantum information is described, allowing a direct visualization of teleportation of quantum states and providing a simple and intuitive understanding of this fascinating phenomenon. The discussion is aimed at providing physicists a method of explaining teleportation to non-scientists. The basic ideas of quantum physics are first explained in lay terms, after which these ideas are used with a graphical description, out of which teleportation arises naturally. <|endoftext|><|startoftext|> Introduction The extension of quantum field theory to curved space-times has led to the discovery of many qualitatively new phenomena which do not occur in the simpler theory on Minkowski space, such as Hawking radiation; for background and historical references, see [2, 6, 18]. 
The reconstruction of quantum field theory on a Lorentz-signature space-time from the corresponding Euclidean quantum field theory makes use of Osterwalder-Schrader (OS) positivity [15, 16] and analytic continuation. On a curved background, there may be no proper definition of time-translation and no Hamiltonian; thus, the mathematical framework of Euclidean quantum field theory may break down. However, on static space-times there is a Hamiltonian and it makes sense to define Euclidean QFT. This approach was recently taken by the authors in [11], where the fundamental properties of Osterwalder-Schrader quantization and some of the fundamental estimates of constructive quantum field theory (footnote 1) were generalized to static space-times.

The previous work [11], however, did not address the analytic continuation which leads from a Euclidean theory to a real-time theory. In the present article, we initiate a treatment of the analytic continuation by constructing unitary operators which form a representation of the isometry group of the Lorentz-signature space-time associated to a static Riemannian space-time. Our approach is similar in spirit to that of Fröhlich [4] and of Klein and Landau [13], who showed how to go from the Euclidean group to the Poincaré group without using the field operators on flat space-time.

Date: February 22, 2007.
Footnote 1: For background on constructive field theory in flat space-times, see [8, 9].
ARTHUR JAFFE AND GORDON RITTER

This work also has applications to representation theory, as it provides a natural (functorial) quantization procedure which constructs nontrivial unitary representations of those Lie groups which arise as isometry groups of static, Lorentz-signature space-times. These groups are typically non-compact. For example, when applied to AdSd+1, our procedure gives a unitary representation of the identity component of SO(d, 2).
Moreover, our procedure makes use of the Cartan decomposition, a standard tool in representation theory.

2. Classical Space-Time

2.1. Structure of Static Space-Times.

Definition 2.1. A quantizable static space-time is a complete, connected, orientable Riemannian manifold (M, gab) with a globally-defined (smooth) Killing field ξ which is orthogonal to a codimension-one hypersurface Σ ⊂ M, such that the orbits of ξ are complete and each orbit intersects Σ exactly once.

Throughout this paper, we assume that M is a quantizable static space-time. Definition 2.1 implies that there is a global time function t defined up to a constant by the requirement that ξ = ∂/∂t. Thus M is foliated by time-slices Mt, and M = Ω− ∪ Σ ∪ Ω+, where the unions are disjoint, Σ = M0, and Ω± are open sets corresponding to t > 0 and t < 0 respectively. We infer existence of an isometry θ which reverses the sign of t, θ : Ω± → Ω∓ such that θ² = 1, θ|Σ = id.

Fix a self-adjoint extension of the Laplacian, and let C = (−∆ + m²)⁻¹ be the resolvent of the Laplacian (also called the free covariance), where m² > 0. Then C is a bounded self-adjoint operator on L²(M). For each s ∈ R, the Sobolev space Hs(M) is a real Hilbert space, defined as the completion of C_c^∞(M) in the norm
(2.1) $\|f\|_s^2 = \langle f, C^{-s} f\rangle$.
The inclusion Hs ↪ Hs+k for k > 0 is Hilbert-Schmidt. Define S := ⋂_{s<0} Hs(M) and S′ := ⋃_{s>0} Hs(M). Then S ⊂ H−1(M) ⊂ S′ form a Gelfand triple, and S is a nuclear space. Recall that S′ has a natural σ-algebra of measurable sets (see for instance [7, 8, 17]). There is a unique Gaussian probability measure µ with mean zero and covariance C defined on the cylinder sets in S′ (see [7]).

QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II. SPACETIME SYMMETRIES

More generally, one may consider a non-Gaussian, countably-additive measure µ on S′ and the space E := L²(S′, µ). We are interested in the case that the monomials of the form A(Φ) = Φ(f1) . .
.Φ(fn) for fi ∈ S are all elements of E, and for which their span is dense in E. This is of course true if µ is the Gaussian measure with covariance C. For an open set Ω ⊂ M, let EΩ denote the closure in E of the set of monomials A(Φ) = ∏_i Φ(fi) where supp(fi) ⊂ Ω for all i. Of particular importance for Euclidean quantum field theory is the positive-time subspace E+ := EΩ+.

2.2. The Operator Induced by an Isometry.

Isometries of the underlying space-time manifold act on a Hilbert space of classical fields arising in the study of a classical field theory. For f ∈ C∞(M) and ψ : M → M an isometry, define f^ψ ≡ (ψ⁻¹)*f = f ◦ ψ⁻¹. Since det(dψ) = 1, the operation f → f^ψ extends to a bounded operator on H±1(M) or on L²(M). A treatment of isometries for static space-times appears in [11].

Definition 2.2. Let ψ be an isometry, and A(Φ) = Φ(f1) . . . Φ(fn) ∈ E a monomial. Define the induced operator
(2.2) $\Gamma(\psi)A \equiv \Phi(f_1^{\psi}) \cdots \Phi(f_n^{\psi})$
and extend Γ(ψ) by linearity to the domain of polynomials in the fields, which is dense in E.

3. Osterwalder-Schrader Quantization

3.1. Quantization of Vectors (The Hilbert Space H of Quantum Theory).

In this section we define the quantization map E+ → H, where H is the Hilbert space of quantum theory. The existence of the quantization map relies on a condition known as Osterwalder-Schrader (or reflection) positivity. A probability measure µ on S′ is said to be reflection positive if
(3.1) $\int \overline{\Gamma(\theta)F}\, F \, d\mu \ge 0$
for all F in the positive-time subspace E+ ⊂ E. Let Θ = Γ(θ) be the reflection on E induced by θ. Define the sesquilinear form (A, B) on E+ × E+ as (A, B) = 〈ΘA, B〉E, so (3.1) states that (F, F) ≥ 0.

Assumption 1 (O-S Positivity). Any measure dµ that we consider is reflection positive with respect to the time-reflection Θ.

Definition 3.1 (OS-Quantization).
Given a reflection-positive measure dµ, the Hilbert space H of quantum theory is the completion of E+/N with respect to the inner product given by the sesquilinear form (A, B), where N ⊂ E+ denotes the null space of this form. Denote the quantization map Π for vectors E+ → H by Π(A) = Â, and write

(3.2)  $\langle \hat{A}, \hat{B} \rangle_H = (A, B) = \langle \Theta A, B \rangle_E$  for A, B ∈ E+.

3.2. Quantization of Operators.

The basic quantization theorem gives a sufficient condition to map a (possibly unbounded) linear operator T on E to its quantization, a linear operator T̂ on H. Consider a densely-defined operator T on E, the unitary time-reflection Θ, and the adjoint T+ = ΘT*Θ. A preliminary version of the following was also given in [10].

Definition 3.2 (Quantization Condition I). The operator T satisfies QC-I if:
i. The operator T has a domain D(T) dense in E.
ii. There is a subdomain D0 ⊂ E+ ∩ D(T) ∩ D(T+), for which D̂0 ⊂ H is dense.
iii. The transformations T and T+ both map D0 into E+.

Theorem 3.3 (Quantization I). If T satisfies QC-I, then
i. The operators T↾D0 and T+↾D0 have quantizations $\hat{T}$ and $\widehat{T^{+}}$ with domain D̂0.
ii. The operators $\hat{T}^{*} = (\hat{T}{\restriction}\hat{D}_0)^{*}$ and $\widehat{T^{+}}$ agree on D̂0.
iii. The operator $\hat{T}{\restriction}\hat{D}_0$ has a closure, namely $\hat{T}^{**}$.

Proof. We wish to define the quantization T̂ with the putative domain D̂0 by

(3.3)  $\hat{T}\hat{A} = \widehat{TA}$.

For any vector A ∈ D0 and for any B ∈ (D0 ∩ N), it is the case that $\hat{A} = \widehat{A+B}$. The transformation T̂ is defined by (3.3) iff $\widehat{TA} = \widehat{T(A+B)} = \widehat{TA} + \widehat{TB}$. Hence one needs to verify that T : D0 ∩ N → N, which we now do.

The assumption D0 ⊂ D(T+), along with the fact that Θ is unitary, ensures that ΘD0 ⊂ D(T*). Therefore for any F ∈ D0,

(3.4)  $\langle \Theta F, TB \rangle_E = \langle T^{*}\Theta F, B \rangle_E = \langle \Theta(\Theta T^{*}\Theta F), B \rangle_E = \langle \Theta T^{+}F, B \rangle_E = \langle \widehat{T^{+}F}, \hat{B} \rangle_H$.

In the last step we use the fact assumed in part (iii) of QC-I that T+ : D0 → E+, yielding the inner product of two vectors in H. We infer from the Schwarz inequality in H that

$|\langle \Theta F, TB \rangle_E| \le \|\widehat{T^{+}F}\|_H \, \|\hat{B}\|_H = 0$.

As $\langle \Theta F, TB \rangle_E = \langle \hat{F}, \widehat{TB} \rangle_H$, this means that $\widehat{TB} \perp \hat{D}_0$. As D̂0 is dense in H by QC-I.ii, we infer $\widehat{TB} = 0$.
In other words, TB ∈ N as required to define T̂.

In order to show that D̂0 ⊂ D(T̂*), perform a similar calculation to (3.4) with arbitrary A ∈ D0 replacing B, namely

(3.5)  $\langle \hat{F}, \hat{T}\hat{A} \rangle_H = \langle \Theta F, TA \rangle_E = \langle \Theta(\Theta T^{*}\Theta F), A \rangle_E = \langle \Theta T^{+}F, A \rangle_E = \langle \widehat{T^{+}F}, \hat{A} \rangle_H$.

The right side is continuous in Â ∈ H, and therefore F̂ ∈ D(T̂*). Furthermore $\hat{T}^{*}\hat{F} = \widehat{T^{+}F}$. This identity shows that if F ∈ N, then $\widehat{T^{+}F} = 0$. Hence T+↾D0 has a quantization $\widehat{T^{+}}$, and we may write (3.5) as

(3.6)  $\hat{T}^{*}\hat{F} = \widehat{T^{+}}\hat{F}$,  for all F ∈ D0.

In particular T̂* is densely defined, so T̂ has a closure. This completes the proof. □

Definition 3.4 (Quantization Condition II). The operator T satisfies QC-II if:
i. Both the operator T and its adjoint T* have dense domains D(T), D(T*) ⊂ E.
ii. There is a domain D0 ⊂ E+ in the common domain of T, T+, T+T, and TT+.
iii. Each operator T, T+, T+T, and TT+ maps D0 into E+.

Theorem 3.5 (Quantization II). If T satisfies QC-II, then
i. The operators T↾D0 and T+↾D0 have quantizations $\hat{T}$ and $\widehat{T^{+}}$ with domain D̂0.
ii. If A, B ∈ D0, one has $\langle \hat{B}, \hat{T}\hat{A} \rangle_H = \langle \widehat{T^{+}}\hat{B}, \hat{A} \rangle_H$.

Remarks.
i. In Theorem 3.5 we drop the assumption that the domain D̂0 is dense, obtaining quantizations $\hat{T}$ and $\widehat{T^{+}}$ whose domains are not necessarily dense. In order to compensate for this, we assume more properties concerning the domain and the range of T+ on E.
ii. As D̂0 need not be dense in H, the adjoint of T̂ need not be defined. Nevertheless, one calls the operator T̂ symmetric in case one has

(3.7)  $\langle \hat{B}, \hat{T}\hat{A} \rangle_H = \langle \hat{T}\hat{B}, \hat{A} \rangle_H$,  for all A, B ∈ D0.

iii. If Ŝ ⊃ T̂ is a densely-defined extension of T̂, then $\hat{S}^{*} = \widehat{T^{+}}$ on the domain D̂0.

Proof. We define the quantization T̂ with the putative domain D̂0. As in the proof of Theorem 3.3, this quantization T̂ is well-defined iff it is the case that T : D0 ∩ N → N. For any F ∈ D0 ∩ N, by definition ‖F̂‖H = 0. Then

$\langle \widehat{TF}, \widehat{TF} \rangle_H = (TF, TF) = \langle \Theta TF, TF \rangle_E = \langle F, T^{*}\Theta TF \rangle_E$,

where one uses the fact that D0 ⊂ D(T+T).
Thus

$\langle \widehat{TF}, \widehat{TF} \rangle_H = \langle \Theta F, T^{+}TF \rangle_E = \langle \hat{F}, \widehat{T^{+}TF} \rangle_H$.

Here we use the fact that T+T maps D0 to E+. Thus one can use the Schwarz inequality on H to obtain

$\langle \widehat{TF}, \widehat{TF} \rangle_H \le \|\hat{F}\|_H \, \|\widehat{T^{+}TF}\|_H = 0$.

Hence T : D0 ∩ N → N, and T has a quantization T̂ with domain D̂0.

In order to verify that T+↾D0 has a quantization, one needs to show that T+ : D0 ∩ N → N. Repeat the argument above with T+ replacing T. The assumption TT+ : D0 → E+ yields for F ∈ D0 ∩ N,

$\langle \widehat{T^{+}F}, \widehat{T^{+}F} \rangle_H = \langle T^{*}\Theta F, T^{+}F \rangle_E = \langle \Theta F, TT^{+}F \rangle_E = \langle \hat{F}, \widehat{TT^{+}F} \rangle_H$.

Use the Schwarz inequality in H to obtain the desired result that

$\langle \widehat{T^{+}F}, \widehat{T^{+}F} \rangle_H \le \|\hat{F}\|_H \, \|\widehat{TT^{+}F}\|_H = 0$.

Hence T+ has a quantization $\widehat{T^{+}}$ with domain D̂0, and for B ∈ D0 one has $\widehat{T^{+}B} = \widehat{T^{+}}\hat{B}$.

In order to establish (ii), assume that A, B ∈ D0. Then

(3.8)  $\langle \hat{B}, \hat{T}\hat{A} \rangle_H = \langle \Theta B, TA \rangle_E = \langle \Theta(\Theta T^{*}\Theta B), A \rangle_E = \langle \Theta T^{+}B, A \rangle_E = \langle \widehat{T^{+}B}, \hat{A} \rangle_H = \langle \widehat{T^{+}}\hat{B}, \hat{A} \rangle_H$.

This completes the proof. □

4. Structure and Representation of the Lie Algebra of Killing Fields

For the remainder of this paper we assume the following, which is clearly true in the Gaussian case as the Laplacian commutes with the isometry group G. (A further explanation was given in [11].)

Assumption 2. The isometry groups G that we consider leave the measure dµ invariant, in the sense that Γ, defined above, is a unitary representation of G on E.

4.1. The Representation of g on E.

Lemma 4.1. Let Gi be an analytic group with Lie algebra gi (i = 1, 2), and let λ : g1 → g2 be a homomorphism. There cannot exist more than one analytic homomorphism π : G1 → G2 for which dπ = λ. If G1 is simply connected then there is always one such π.

Let D = d/dt denote the canonical unit vector field on R. Let G be a real Lie group with algebra g, and let X ∈ g. The map tD ↦ tX (t ∈ R) is a homomorphism of Lie(R) → g, so by the Lemma there is a unique analytic homomorphism ξX : R → G such that dξX(D) = X. Conversely, if η is an analytic homomorphism of R → G, and if we let X = dη(D), it is obvious that η = ξX.
Thus X ↦ ξX is a bijection of g onto the set of analytic homomorphisms R → G. The exponential map is defined by exp(X) := ξX(1). For complex Lie groups, the same argument applies, replacing R with C throughout.

Since g is connected, so is exp(g). Hence exp(g) ⊆ G0, where G0 denotes the connected component of the identity in G. It need not be the case for a general Lie group that exp(g) = G0, but for a large class of examples (the so-called exponential groups) this does hold. For any Lie group, exp(g) contains an open neighborhood of the identity, so the subgroup generated by exp(g) always coincides with G0.

We will apply the above results with G = Iso(M), the isometry group of M, and g = Lie(G) the algebra of global Killing fields. Thus we have a bijective correspondence between Killing fields and 1-parameter groups of isometries. This correspondence has a geometric realization: the 1-parameter group of isometries φs = ξX(s) = exp(sX) corresponding to X ∈ g is the flow generated by X.

Consider the two different 1-parameter groups of unitary operators:
(1) the unitary group φ*s on L2(M), and
(2) the unitary group Γ(φs) on E.
Stone's theorem applies to both of these unitary groups to yield densely-defined self-adjoint operators on the respective Hilbert spaces.

In the first case, the relevant self-adjoint operator is simply an extension of −iX, viewed as a differential operator on C∞c(M). This is because for f ∈ C∞c(M) and p ∈ M, we have:

$X_p f = (L_X f)(p) = \frac{d}{ds} f(\phi_s(p)) \Big|_{s=0}$.

Thus −iX is a densely-defined symmetric operator on L2(M), and Stone's theorem implies that −iX has self-adjoint extensions.

In the second case, the unitary group Γ(φs) on E also has a self-adjoint generator Γ(X), which can be calculated explicitly. By definition,

$e^{-is\Gamma(X)} \Big( \prod_{i=1}^{n} \Phi(f_i) \Big) = \prod_{i=1}^{n} \Phi(f_i \circ \phi_{-s})$.

Now replace s → −s and calculate d/ds|s=0 applied to both sides of the last equation to see that

$\Gamma(X) \Big( \prod_{i=1}^{n} \Phi(f_i) \Big) = \sum_{j=1}^{n} \Phi(f_1) \cdots \Phi(-iX f_j)\, \Phi(f_{j+1}) \cdots \Phi(f_n)$.

One may check that Γ is a Lie algebra representation of g, i.e. Γ([X, Y]) = [Γ(X), Γ(Y)].

4.2. The Cartan Decomposition of g.

For each ξ ∈ g, there exists some dense domain in E on which Γ(ξ) is self-adjoint, as discussed previously. However, the quantizations Γ̂(ξ) acting on H may be hermitian, anti-hermitian, or neither depending on whether there holds a relation of the form

(4.1)  Γ(ξ)Θ = ±ΘΓ(ξ),

with one of the two possible signs, or whether no such relation holds.

Even if (4.1) holds, to complete the construction of a unitary representation one must prove that there exists a dense domain in H on which Γ̂(ξ) is self-adjoint or skew-adjoint. This nontrivial problem will be dealt with in a later section using Theorems 3.3 and 3.5 and the theory of symmetric local semigroups [12, 4]. Presently we determine which elements within g satisfy relations of the form (4.1).

Let ϑ := θ* as an operator on C∞(M), and consider a Killing field X ∈ g also as an operator on C∞(M). Define T : g → g by

(4.2)  T(X) := ϑXϑ.

From (4.2) it is not obvious that the range of T is contained in g. To prove this, we recall some geometric constructions. Let M, N be manifolds, let ψ : M → N be a diffeomorphism, and X ∈ Vect(M). Then

(4.3)  $(\psi^{-1})^{*} X \psi^{*} = X(\,\cdot \circ \psi) \circ \psi^{-1}$

defines an operator on C∞(N). One may check that this operator is a derivation, thus (4.3) defines a vector field on N. The vector field (4.3) is usually denoted $\psi_{*}X = d\psi(X_{\psi^{-1}(p)})$ and referred to as the push-forward of X.

We now wish to show that g = g+ ⊕ g−, where g± are the ±1-eigenspaces of T. This is proven by introducing an inner product (X, Y)g on g with respect to which T is self-adjoint.

Theorem 4.2. Consider g as a Hilbert space with inner product (X, Y)g. The operator T : g → g is self-adjoint with T² = I; hence

(4.4)  g = g+ ⊕ g−

as an orthogonal direct sum of Hilbert spaces, where g± are the ±1-eigenspaces of T. Further, ∂t ∈ g−, hence dim(g−) ≥ 1.
Elements of g− have hermitian quantizations, while elements of g+ have anti-hermitian quantizations.

Proof. Write (4.2) as

(4.5)  $T(X) = (\theta^{-1})^{*} X \theta^{*} = \theta_{*} X$.

Thus T is the operator of push-forward by θ. The push-forward of a Killing field by an isometry is another Killing field, hence the range of T is contained in g. Also, T must have a trivial kernel since T² = I, and this implies that T is surjective. It follows from (4.5) that T is a Hermitian operator on g. Hence T is diagonalizable and has real eigenvalues which are square roots of 1. This establishes the decomposition (4.4). That elements of g− have hermitian quantizations, while elements of g+ have anti-hermitian quantizations, follows from Theorem 3.3. □

² It is not the case that g− consists only of ∂t. In particular, dim(g−) = 2 for M = H². It can occur that dim g+ = 0.

A Cartan involution is a Lie algebra homomorphism g → g which squares to the identity. It follows from (4.2) that T is a Lie algebra homomorphism; thus, Theorem 4.2 implies that T is a Cartan involution of g. This implies that the eigenspaces (g+, g−) form a Cartan pair, meaning that

(4.6)  [g+, g+] ⊂ g+,  [g+, g−] ⊂ g−,  and  [g−, g−] ⊂ g+.

Clearly g+ is a subalgebra while g− is not, and any subalgebra contained in g− is abelian.

5. Reflection-Invariant and Reflected Isometries

Let G = Iso(M) denote the isometry group of M, as above. Then G has a Z2 subgroup {1, θ}. This subgroup acts on G by conjugation, which is just the action ψ → ψ^θ := θψθ. Conjugation is an (inner) automorphism of the group, so

(ψφ)^θ = ψ^θ φ^θ,  (ψ^θ)⁻¹ = (ψ⁻¹)^θ.

Definition 5.1. We say that ψ ∈ G is reflection-invariant if ψ^θ = ψ, and that ψ is reflected if ψ^θ = ψ⁻¹. Let GRI denote the subgroup of G consisting of reflection-invariant elements, and let GR denote the subset of reflected elements.

Note that GRI is the stabilizer of the Z2 action, hence a subgroup.
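Definition 5.1 can be made concrete in the simplest flat example. The sketch below is an illustration added here, not part of the original argument: it assumes M = R² with coordinates (t, x), the flat metric, and θ(t, x) = (−t, x), and it stores an isometry v ↦ Rv + b as a pair (R, b). All function names are hypothetical helpers. It checks numerically that rotations and time translations are reflected, that spatial translations are reflection-invariant, and that the product of two non-commuting reflected isometries is not reflected.

```python
import math

# An affine isometry v -> R v + b of R^2 is stored as (R, b); v = (t, x).
# Assumption for this sketch: flat R^2 with time reflection theta(t, x) = (-t, x).

def compose(f, g):
    """Return the composition f o g of two affine maps."""
    (R1, b1), (R2, b2) = f, g
    R = [[sum(R1[i][k] * R2[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    b = [sum(R1[i][k] * b2[k] for k in range(2)) + b1[i] for i in range(2)]
    return (R, b)

def inverse(f):
    """Inverse of v -> R v + b for orthogonal R: w -> R^T w - R^T b."""
    R, b = f
    Rt = [[R[j][i] for j in range(2)] for i in range(2)]
    return (Rt, [-sum(Rt[i][k] * b[k] for k in range(2)) for i in range(2)])

def close(f, g, tol=1e-12):
    """Entrywise comparison of two affine maps up to a tolerance."""
    (R1, b1), (R2, b2) = f, g
    same_R = all(abs(R1[i][j] - R2[i][j]) < tol
                 for i in range(2) for j in range(2))
    same_b = all(abs(b1[i] - b2[i]) < tol for i in range(2))
    return same_R and same_b

THETA = ([[-1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def conj(psi):
    """psi^theta := theta psi theta."""
    return compose(THETA, compose(psi, THETA))

def is_reflection_invariant(psi):
    return close(conj(psi), psi)

def is_reflected(psi):
    return close(conj(psi), inverse(psi))

def rotation(a):
    c, s = math.cos(a), math.sin(a)
    return ([[c, -s], [s, c]], [0.0, 0.0])

def time_shift(s):
    return (IDENTITY, [s, 0.0])

def space_shift(s):
    return (IDENTITY, [0.0, s])

if __name__ == "__main__":
    print(is_reflected(rotation(0.3)))
    print(is_reflected(time_shift(1.0)))
    print(is_reflection_invariant(space_shift(1.0)))
    print(is_reflected(compose(rotation(0.3), time_shift(1.0))))
```

In this flat model the time translations play the role of the flow of ∂t, so their being reflected is consistent with ∂t ∈ g−.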
An alternate proof that GRI is a subgroup proceeds using GRI = exp(g+). Although GR is closed under the taking of inverses and does contain the identity, the product of two reflected isometries is no longer reflected unless they commute. Generally, the product of an element of GR with an element of GRI is neither an element of GR nor of GRI. The only isometry that is both reflection-invariant and reflected is θ itself. Thus we have:

GR ∩ GRI = {1, θ} ⊂ GR ∪ GRI ⊊ G.

Theorem 5.2. Let G0 denote the connected component of the identity in G. Then G0 is generated by GR ∪ GRI. (This is a form of the Cartan decomposition for G.)

Proof. Since g = g+ ⊕ g− as a direct sum of vector spaces (though not of Lie algebras), we have

$G^0 = \langle \exp(g) \rangle = \langle \exp(g_+) \cup \exp(g_-) \rangle$,

where ⟨·⟩ denotes the subgroup generated by a set. Choose bases {ξ±,i}i=1,...,n± for g± respectively. Then we have:

$G^0 = \big\langle \{\exp(s\xi_{+,i}) : 1 \le i \le n_+,\ s \in \mathbb{R}\} \cup \{\exp(s\xi_{-,j}) : 1 \le j \le n_-,\ s \in \mathbb{R}\} \big\rangle$.

Furthermore, exp(sξ−,i) is reflected, while exp(sξ+,i) is reflection-invariant, completing the proof. □

Corollary 5.3. The Lie algebra of the subgroup GRI is g+.

To summarize, the isometry group of a static space-time can always be generated by a collection of n (= dim g) one-parameter subgroups, each of which consists either of reflected isometries, or reflection-invariant isometries.

6. Construction of Unitary Representations

6.1. Self-adjointness of Semigroups.

In this section, we recall several known results on self-adjointness of semigroups. Roughly speaking, these results imply that if a one-parameter family Sα of unbounded symmetric operators satisfies a semigroup condition of the form SαSβ = Sα+β, then under suitable conditions one may conclude essential self-adjointness.

A theorem of this type appeared in a 1970 paper of Nussbaum [14], who assumed that the semigroup operators have a common dense domain. The result was rediscovered independently by Fröhlich, who applied it to quantum field theory in several important papers [5, 3].
For our intended application to quantum field theory, it turns out to be very convenient to drop the assumption that ∃ a such that the Sα all have a common dense domain for |α| < a, in favor of the weaker assumption that $\bigcup_{\alpha>0} D(S_\alpha)$ is dense.

A generalization of Nussbaum's theorem which allows the domains of the semigroup operators to vary with the parameter, and which only requires the union of the domains to be dense, was later formulated and two independent proofs were given: one by Fröhlich [4], and another by Klein and Landau [12]. The latter also used this theorem in their construction of representations of the Euclidean group and the corresponding analytic continuation to the Lorentz group [13].

In order to keep the present article self-contained, we first define symmetric local semigroups and then recall the refined self-adjointness theorem of Fröhlich, and Klein and Landau.

Definition 6.1. Let H be a Hilbert space, let T > 0 and for each α ∈ [0, T], let Sα be a symmetric linear operator on the domain Dα ⊂ H, such that:
(i) Dα ⊃ Dβ if α ≤ β and $D := \bigcup_{0<\alpha\le T} D_\alpha$ is dense in H,
(ii) α → Sα is weakly continuous,
(iii) S0 = I, Sβ(Dα) ⊂ Dα−β for 0 ≤ β ≤ α ≤ T, and
(iv) SαSβ = Sα+β on Dα+β for α, β, α + β ∈ [0, T].
In this situation, we say that (Sα, Dα, T) is a symmetric local semigroup.

It is important that Dα is not required to be dense in H for each α; the only density requirement is (i).

Theorem 6.2 ([12, 4]). For each symmetric local semigroup (Sα, Dα, T), there exists a unique self-adjoint operator A such that³

$D_\alpha \subset D(e^{-\alpha A})$ and $S_\alpha = e^{-\alpha A}\big|_{D_\alpha}$ for all α ∈ [0, T].

Also, A ≥ −c if and only if $\|S_\alpha f\| \le e^{c\alpha}\|f\|$ for all f ∈ Dα and 0 < α < T.

6.2. Reflection-Invariant Isometries.

Lemma 6.3. Let ψ be a reflection-invariant isometry and assume ∃ p ∈ Ω+ such that ψ(p) ∈ Ω+. Then ψ preserves the positive-time subspace, i.e. ψ(Ω+) ⊆ Ω+.

Proof. We first prove that ψ(Σ) ⊆ Σ.
Suppose not; then ∃ p ∈ Σ with ψ(p) ∉ Σ. Assume ψ(p) ∈ Ω+ (without loss of generality: we could repeat the same argument with ψ(p) ∈ Ω−). Then Ω+ contains (θψθ)(p) = θψ(p) ∈ Ω−, a contradiction since Ω− ∩ Ω+ = ∅. We used the fact that θ|Σ = id so θ(p) = p.

Hence ψ restricts to an isometry of Σ. It follows that the restriction of ψ to M′ = M \ Σ is also an isometry. However, M′ = Ω− ⊔ Ω+, where ⊔ denotes the disjoint union. Therefore ψ(Ω+) is wholly contained in either Ω+ or Ω−, since ψ is a homeomorphism and so ψ(Ω+) is connected. The possibility that ψ(Ω+) ⊆ Ω− is ruled out by our assumption, so ψ(Ω+) ⊆ Ω+. □

Lemma 6.3 has the immediate consequence that if ξ ∈ g+ then the one-parameter group associated to ξ is positive-time-invariant. This result plays a key role in the proof of Theorem 6.4.

6.3. Construction of Unitary Representations.

The rest of this section is devoted to proving that the theory of symmetric local semigroups can be applied to the quantized operators on H corresponding to each of a set of 1-parameter subgroups of G = Iso(M). The proof relies upon Lemma 6.3, and Theorems 3.3, 3.5 and 6.2.

Theorem 6.4. Let (M, gab) be a quantizable static space-time. Let ξ be a Killing field which lies in g+ or g−, with associated one-parameter group of isometries {φα}α∈R. Then there exists a densely-defined self-adjoint operator Aξ on H such that

$\hat{\Gamma}(\phi_\alpha) = \begin{cases} e^{-\alpha A_\xi}, & \text{if } \xi \in g_-, \\ e^{i\alpha A_\xi}, & \text{if } \xi \in g_+. \end{cases}$

³ The authors of [4, 12] also showed that $\hat{D} := \bigcup_{0<\alpha\le S} \bigcup_{0<\beta<\alpha} S_\beta(D_\alpha)$, where 0 < S ≤ T, is a core for A, i.e. (A, D̂) is essentially self-adjoint.

Proof. First suppose that ξ ∈ g−, which implies that the isometries φα are reflected, and so Γ(φα)+ = Γ(φα). Define

$\Omega_{\xi,\alpha} := \phi_\alpha^{-1}(\Omega_+)$.

For all α in some neighborhood of zero, Ωξ,α is a nonempty open subset of Ω+, and moreover, as α → 0⁺, Ωξ,α increases to fill Ω+ with Ωξ,0 = Ω+.
These statements follow immediately from the fact that, for each p ∈ Ω+, φα(p) is continuous with respect to α, and φ0 is the identity map.

Since φα(Ωξ,α) ⊆ Ω+, we infer that Γ(φα)EΩξ,α ⊆ E+. By Theorem 3.5, Γ(φα) has a quantization which is a symmetric operator on the domain

$D_{\xi,\alpha} := \Pi(E_{\Omega_{\xi,\alpha}})$.

Note that Dξ,α is not necessarily dense in H.⁴

We now show that Theorem 6.2 can be applied. Fix some positive constant a with Ωξ,a nonempty. Note that

$\bigcup_{0<\alpha\le a} \Omega_{\xi,\alpha} = \Omega_+ \ \Rightarrow\ \bigcup_{0<\alpha\le a} E_{\Omega_{\xi,\alpha}} = E_+$.

It follows that $D_\xi := \bigcup_{0<\alpha\le a} D_{\xi,\alpha}$ is dense in H. This establishes condition (i) of Definition 6.1, and the other conditions are routine verifications. Theorem 6.2 implies existence of a densely-defined self-adjoint operator Aξ on H, such that

$\hat{\Gamma}(\phi_\alpha) = \exp(-\alpha A_\xi)$ for all α ∈ [0, a].

This proves the theorem in case ξ ∈ g−.

Now suppose that ξ ∈ g+, implying that the isometries φα are reflection-invariant, and

$\Gamma(\phi_\alpha)^{+} = \Gamma(\phi_\alpha)^{-1} = \Gamma(\phi_{-\alpha})$ on E.

Lemma 6.3 implies that Γ(φα)E+ ⊆ E+. By Theorem 3.3, Γ(φα) has a quantization Γ̂(φα) which is defined and satisfies $\hat{\Gamma}(\phi_\alpha)^{*} = \hat{\Gamma}(\phi_\alpha)^{-1}$ on the domain Π(E+), which is dense in H by definition. In this case we do not need Theorem 6.2; for each α, Γ̂(φα) extends by continuity to a one-parameter unitary group defined on all of H (not only for a dense subspace). By Stone's theorem, Γ̂(φα) = exp(iαAξ) for Aξ self-adjoint and for all α ∈ R. The proof is complete. □

⁴ Density of Dξ,α would be implied by a Reeh-Schlieder theorem, which we do not prove except in the free case. Theorem 6.2 removes the need for a Reeh-Schlieder theorem in this argument.

7. Analytic Continuation

Each Riemannian static space-time (M, gab) has a Lorentzian continuation Mlor, which we construct as follows. In adapted coordinates, the metric gab on M takes the form

(7.1)  $ds^2 = F(x)\,dt^2 + G_{\mu\nu}(x)\,dx^\mu dx^\nu$.
The analytic continuation t → −it of (7.1) is standard and gives a metric of Lorentz signature, $ds^2_{lor} = -F\,dt^2 + G\,dx^2$, by which we define the Lorentzian space-time Mlor. Einstein's equation Ricg = k g is preserved by the analytic continuation, but we do not use this fact anywhere in the present paper.

Let $\{\xi_i^{(\pm)} : 1 \le i \le n_\pm\}$ be bases of g±, respectively. Let $A_i^{(\pm)} = A_{\xi_i^{(\pm)}}$ be the densely-defined self-adjoint operators on H, constructed by Theorem 6.4. Let

(7.2)  $U_i^{(\pm)}(\alpha) = \exp(i\alpha A_i^{(\pm)})$,  for 1 ≤ i ≤ n±,

be the associated one-parameter unitary groups on H.

We claim that the group generated by the n = n+ + n− one-parameter unitary groups (7.2) is isomorphic to the identity component of Glor := Iso(Mlor), the group of Lorentzian isometries. Since locally, the group structure is determined by its Lie algebra, it suffices to check that the generators satisfy the defining relations of glor := Lie(Glor). Since quantization of operators preserves multiplication, we have

(7.3)  X, Y, Z ∈ g, [X, Y] = Z ⇒ [Γ̂(X), Γ̂(Y)] = Γ̂(Z).

In what follows, we will use the notation ĝ± for {Γ̂(X) : X ∈ g±}.

Quantization converts the elements of g− from skew operators into Hermitian operators; i.e. elements of ĝ− are Hermitian on H and hence, elements of i ĝ− are skew-symmetric on H. Thus ĝ+ ⊕ i ĝ− is a Lie algebra represented by skew-symmetric operators on H.

Theorem 7.1. We have an isomorphism of Lie algebras:

(7.4)  glor ≅ ĝ+ ⊕ i ĝ−.

Proof. Let MC be the manifold obtained by allowing the t coordinate to take values in C. Define ψ : MC → MC by t ↦ −it. Then glor is generated by

$\{\xi_i^{(+)}\}_{1\le i\le n_+} \cup \{\eta_j\}_{1\le j\le n_-}$,  where  $\eta_j := i\,\psi_{*}\,\xi_j^{(-)}$.

It is possible to define a set of real structure constants fijk such that

(7.5)  $[\xi_i^{(-)}, \xi_j^{(-)}] = \sum_{k=1}^{n_+} f_{ijk}\, \xi_k^{(+)}$.

Applying ψ∗ to both sides of (7.5), the commutation relations of glor are seen to be

(7.6)  $[\eta_i, \eta_j] = -\sum_{k=1}^{n_+} f_{ijk}\, \xi_k^{(+)}$,

together with the same relations for g+ as before.
Now (7.3) implies that (7.6) are precisely the commutation relations of ĝ+ ⊕ i ĝ−, completing the proof of (7.4). □

Corollary 7.2. Let (M, gab) be a quantizable static space-time. The unitary groups (7.2) determine a unitary representation of G0lor on H.

7.1. Conclusions.

We have obtained the following conclusions. There is a unitary representation of the group G0lor on the physical Hilbert space H of quantum field theory on the static space-time M. This representation maps the time-translation subgroup into the unitary group exp(itH), where the energy H ≥ 0 is a positive, densely-defined self-adjoint operator corresponding to the Hamiltonian of the theory. The Hilbert space H contains a ground state $\Psi_0 = \hat{1}$ which is such that HΨ0 = 0 and Ψ0 is invariant under the action of all spacetime symmetries.

We obtain these results via analytic continuation from the Euclidean path integral, under mild assumptions on the measure which should include all physically interesting examples. This is done without introducing the field operators; nonetheless, Theorems 3.3 and 3.5 do suffice to construct field operators.

In the special case M = Rd with G = SO(4), we obtain a unitary representation of the proper orthochronous Lorentz group, $G^0_{lor} = L_+^{\uparrow} = SO^0(3,1)$.

8. Hyperbolic Space and Anti-de Sitter Space

Consider Euclidean quantum field theory on M = H^d. The metric is $ds^2 = r^{-2} \sum_{i=1}^{d} dx_i^2$, where we define r = xd for convenience. The Laplacian is

(8.1)  $\Delta_{H^d} = (2-d)\, r\, \frac{\partial}{\partial r} + r^2 \Delta_{\mathbb{R}^d}$.

The d−1 coordinate vector fields {∂/∂xi : i ≠ d} are all static Killing fields, and any one of the coordinates xi (i ≠ d) is a satisfactory representation of time in this space-time. It is convenient to define t = x1 as before, and to identify t with time. The time-zero slice is M0 = H^{d−1}.

From H^d = {v ∈ R^{d,1} | 〈v, v〉 = −1, v0 > 0} it follows that Isom(H^d) = O+(d, 1) and the orientation-preserving isometry group is SO+(d, 1).

Figure 1.
Flow lines of the Killing field ζ = (t² − r²)∂t + 2tr ∂r on H².

For constant curvature spaces, one may solve Killing's equation LKg = 0 explicitly. Let us illustrate the solutions and their quantizations for d = 2. The three Killing fields

(8.2)  ξ = ∂t,  η = t∂t + r∂r,  ζ = (t² − r²)∂t + 2tr ∂r

are a convenient basis for g. Any d-dimensional manifold satisfies dim g ≤ d(d + 1)/2; manifolds saturating the bound are said to be maximally symmetric, and H^d is maximally symmetric.

Now, ∂t f(−t) = −f′(−t) so ∂tΘ = −Θ∂t, hence ∂t ∈ g−. Similar calculations show [Θ, η] = 0 and Θζ = −ζΘ. Thus η spans g+, while ∂t, ζ span g−. The commutation relations⁵ for g are:

[η, ζ] = ζ,  [η, ∂t] = −∂t,  [ζ, ∂t] = −2η.

These calculations verify that (g+, g−) forms a Cartan pair, as defined in (4.6).

⁵ Note that quite generally [g−, g−] ⊆ g+, so it is automatic that [ζ, ∂t] is proportional to η.

The flows associated to (8.2) are easily visualized: ξ is a right-translation, and η flow-lines are radially outward from the Euclidean origin. The flows of ζ are Euclidean circles, indicated by the darker lines in Figure 1. Hence the flows of η are defined on all of E+, while the flows of ζ are analogous to space-time rotations in R², and hence, must be defined on a wedge of the form

Wα = {(t, r) : t, r > 0, tan⁻¹(r/t) < α}.

The simple geometric idea of Section 6.2 is nicely confirmed in this case: the flows of η (the generator of g+) preserve the t = 0 plane, and are separately isometries of Ω+ and Ω−.

Corollary 7.2 implies that the procedure outlined above defines a unitary representation of the identity component of Iso(AdS2) on the physical Hilbert space H for quantum field theory on this background, including theories with interactions that preserve the symmetry. Since Iso(AdSd+1) = SO(d, 2), we have a unitary representation of SO0(1, 2).
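The commutation relations quoted for the basis (8.2) can be checked mechanically. The following is a minimal sketch, not taken from the paper: it represents each component of a vector field on the (t, r) half-plane as a polynomial stored in a dictionary, and computes the Lie bracket componentwise via [X, Y]^a = X(Y^a) − Y(X^a). The helper names (`add`, `mul`, `diff`, `apply_field`, `bracket`, `scale`) are hypothetical and exist only for this illustration.

```python
# A polynomial in (t, r) is a dict {(i, j): c} standing for sum c * t**i * r**j.
# A vector field X = X^t d/dt + X^r d/dr is a pair of such polynomials.

def add(p, q, sign=1):
    """Return p + sign*q, dropping zero coefficients."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + sign * c
    return {k: c for k, c in out.items() if c != 0}

def mul(p, q):
    """Return the product of two polynomials."""
    out = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + a * b
    return {k: c for k, c in out.items() if c != 0}

def diff(p, var):
    """Partial derivative with respect to t (var=0) or r (var=1)."""
    out = {}
    for (i, j), c in p.items():
        e = (i, j)[var]
        if e > 0:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            out[key] = out.get(key, 0) + c * e
    return out

def apply_field(X, f):
    """Apply the vector field X to the function f."""
    return add(mul(X[0], diff(f, 0)), mul(X[1], diff(f, 1)))

def bracket(X, Y):
    """Lie bracket [X, Y], computed component by component."""
    return tuple(add(apply_field(X, Y[a]), apply_field(Y, X[a]), sign=-1)
                 for a in range(2))

def scale(c, X):
    """Multiply a vector field by a nonzero constant c."""
    return tuple({k: c * v for k, v in comp.items()} for comp in X)

# The basis (8.2): xi = d/dt, eta = t d/dt + r d/dr,
# zeta = (t^2 - r^2) d/dt + 2 t r d/dr.
xi = ({(0, 0): 1}, {})
eta = ({(1, 0): 1}, {(0, 1): 1})
zeta = ({(2, 0): 1, (0, 2): -1}, {(1, 1): 2})

if __name__ == "__main__":
    assert bracket(eta, zeta) == zeta            # [eta, zeta] = zeta
    assert bracket(eta, xi) == scale(-1, xi)     # [eta, d/dt] = -d/dt
    assert bracket(zeta, xi) == scale(-2, eta)   # [zeta, d/dt] = -2 eta
    print("commutation relations for (8.2) verified")
```

The third relation is the instance of [g−, g−] ⊂ g+ in (4.6): both ζ and ∂t lie in g−, and their bracket is a multiple of η, which spans g+.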
The group SO0(1, 2) is a noncompact, semisimple real Lie group, and thus it has no finite-dimensional unitary representations, but a host of interesting infinite-dimensional ones.

Appendix A. Euclidean Reeh-Schlieder Theorem

We prove the Euclidean Reeh-Schlieder property for free theories on curved backgrounds. It is reasonable to expect this property to extend to interacting theories on curved backgrounds, but it would have to be established for each such model since it depends explicitly on the two-point function.

The Reeh-Schlieder theorem guarantees the existence of a dense quantization domain based on any open subset of Ω+. For this reason, one could use the Reeh-Schlieder (RS) theorem with Nussbaum's theorem [14] to construct a second proof of Theorem 6.4 under the additional assumption that M is real-analytic. Fortunately, our proof of Theorem 6.4 is completely independent of the Reeh-Schlieder property. This has two advantages: we do not have to assume M is a real-analytic manifold and, more importantly, our proof of Theorem 6.4 generalizes immediately and transparently to interacting theories as long as the Hilbert space H is not modified by the interaction.

We state and prove this using the one-particle space; however, the result clearly extends to the quantum-field Hilbert space.

Theorem A.1. Let M be a quantizable static space-time endowed with a real-analytic structure, and assume that gab is real-analytic. Let O ⊂ Ω+ and D = C∞(O) ⊂ L2(Ω+). Then D̂⊥ = {0}.

Proof. Let f ∈ L2(Ω+) with f̂ ⊥ D̂. For x ∈ Ω+, define

$\eta(x) := \langle \hat{f}, \hat{\delta}_x \rangle_H = \langle \Theta f, C\delta_x \rangle_{L^2}$.

Real-analyticity of η(x) follows from the real-analyticity of (the integral kernel of) C, which in turn follows from the elliptic regularity theorem in the real-analytic category (see for instance [1, Sec. II.1.3]). Now by assumption, for any g ∈ C∞c(O), we have

$0 = \langle \hat{g}, \hat{f} \rangle_H = \langle \Theta f, Cg \rangle_{L^2(M)}$.

Let g → δx for x ∈ O. Then 0 = 〈Θf, Cδx〉L2 ≡ η(x).
Since η|O = 0, by real-analyticity we infer the vanishing of η on Ω+, completing the proof. □

Acknowledgements. We are grateful to Hanno Gottschalk and Alexander Strohmaier for helpful discussions, and G.R. is grateful to the Universität Bonn for their hospitality during February 2007.

References

[1] Lipman Bers, Fritz John, and Martin Schechter. Partial differential equations. American Mathematical Society, Providence, R.I., 1979. Lectures in Applied Mathematics 3.
[2] N. D. Birrell and P. C. W. Davies. Quantum fields in curved space, volume 7 of Cambridge Monographs on Mathematical Physics. Cambridge University Press, Cambridge, 1982.
[3] W. Driessler and J. Fröhlich. The reconstruction of local observable algebras from the Euclidean Green's functions of relativistic quantum field theory. Ann. Inst. H. Poincaré Sect. A (N.S.), 27(3):221–236, 1977.
[4] J. Fröhlich. Unbounded, symmetric semigroups on a separable Hilbert space are essentially selfadjoint. Adv. in Appl. Math., 1(3):237–256, 1980.
[5] Jürg Fröhlich. The pure phases, the irreducible quantum fields, and dynamical symmetry breaking in Symanzik-Nelson positive quantum field theories. Ann. Physics, 97(1):1–54, 1976.
[6] Stephen A. Fulling. Aspects of quantum field theory in curved space-time, volume 17 of London Mathematical Society Student Texts. Cambridge University Press, Cambridge, 1989.
[7] I. M. Gel′fand and N. Ya. Vilenkin. Generalized functions. Vol. 4. Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1964 [1977]. Applications of harmonic analysis. Translated from the Russian by Amiel Feinstein.
[8] James Glimm and Arthur Jaffe. Quantum physics. Springer-Verlag, New York, second edition, 1987. A functional integral point of view.
[9] Arthur Jaffe. Constructive quantum field theory. In Mathematical physics 2000, pages 111–127. Imp. Coll. Press, London, 2000.
[10] Arthur Jaffe.
Introduction to Quantum Field Theory. 2005. Lecture notes from Harvard Physics 289r, available in part online at http://www.arthurjaffe.com/Assets/pdf/IntroQFT.pdf.
[11] Arthur Jaffe and Gordon Ritter. Quantum field theory on curved backgrounds. I. The Euclidean functional integral. Comm. Math. Phys., 270(2):545–572, 2007.
[12] Abel Klein and Lawrence J. Landau. Construction of a unique self-adjoint generator for a symmetric local semigroup. J. Funct. Anal., 44(2):121–137, 1981.
[13] Abel Klein and Lawrence J. Landau. From the Euclidean group to the Poincaré group via Osterwalder-Schrader positivity. Comm. Math. Phys., 87(4):469–484, 1983.
[14] A. E. Nussbaum. Spectral representation of certain one-parametric families of symmetric operators in Hilbert space. Trans. Amer. Math. Soc., 152:419–429, 1970.
[15] Konrad Osterwalder and Robert Schrader. Axioms for Euclidean Green's functions. Comm. Math. Phys., 31:83–112, 1973.
[16] Konrad Osterwalder and Robert Schrader. Axioms for Euclidean Green's functions. II. Comm. Math. Phys., 42:281–305, 1975. With an appendix by Stephen Summers.
[17] Barry Simon. The P(φ)2 Euclidean (quantum) field theory. Princeton University Press, Princeton, N.J., 1974. Princeton Series in Physics.
[18] Robert M. Wald. Quantum field theory in curved space-time. In Gravitation et quantifications (Les Houches, 1992), pages 63–167. North-Holland, Amsterdam, 1995.

E-mail address: arthur jaffe@harvard.edu
Harvard University, 17 Oxford St., Cambridge, MA 02138

E-mail address: ritter@post.harvard.edu
Harvard University, 17 Oxford St., Cambridge, MA 02138

1. Introduction
2. Classical Space-Time
2.1. Structure of Static Space-Times
2.2. The Operator Induced by an Isometry
3. Osterwalder-Schrader Quantization
3.1. Quantization of Vectors (The Hilbert Space H of Quantum Theory)
3.2. Quantization of Operators
4.
Structure and Representation of the Lie Algebra of Killing Fields
4.1. The Representation of g on E
4.2. The Cartan Decomposition of g
5. Reflection-Invariant and Reflected Isometries
6. Construction of Unitary Representations
6.1. Self-adjointness of Semigroups
6.2. Reflection-Invariant Isometries
6.3. Construction of Unitary Representations
7. Analytic Continuation
7.1. Conclusions
8. Hyperbolic Space and Anti-de Sitter Space
Appendix A. Euclidean Reeh-Schlieder Theorem
Acknowledgements
References

ABSTRACT

We study space-time symmetries in scalar quantum field theory (including interacting theories) on static space-times. We first consider Euclidean quantum field theory on a static Riemannian manifold, and show that the isometry group is generated by one-parameter subgroups which have either self-adjoint or unitary quantizations. We analytically continue the self-adjoint semigroups to one-parameter unitary groups, and thus construct a unitary representation of the isometry group of the associated Lorentzian manifold. The method is illustrated for the example of hyperbolic space, whose Lorentzian continuation is Anti-de Sitter space.

<|endoftext|><|startoftext|>

Introduction

In Finsler geometry all geometric objects depend not only on positional coordinates, as in Riemannian geometry, but also on directional arguments. In Riemannian geometry there is a canonical linear connection on the manifold M, while in Finsler geometry there is a corresponding canonical linear connection, due to E. Cartan, which is not a connection on M but is a connection on π−1(TM), the pullback of the tangent bundle TM by π : TM −→ M (the pullback approach). Moreover, in Riemannian geometry there is one curvature tensor and one torsion tensor associated with a given linear connection on the manifold M, whereas in Finsler geometry there are three curvature tensors and five torsion tensors associated with a given linear connection on π−1(TM).
Most of the special spaces in Finsler geometry are derived from the fact that the π-tensor fields (torsions and curvatures) associated with the Cartan connection satisfy special forms. Consequently, special spaces of Finsler geometry are more numerous than those of Riemannian geometry. Special Finsler spaces have been investigated locally (using local coordinates) by many authors: M. Matsumoto [16], [18], [15], [14] and others [6], [19], [8], [7]. On the other hand, the global (or intrinsic, free from local coordinates) investigation of such spaces is very rare in the literature. Some considerable contributions in this direction are due to A. Tamim [24], [25].

In the present paper, we provide a global presentation of the theory of special Finsler manifolds. We introduce and investigate globally many of the most important and most commonly used special Finsler manifolds: locally Minkowskian, Berwald, Landsberg, general Landsberg, P-reducible, C-reducible, semi-C-reducible, quasi-C-reducible, P∗-Finsler, Ch-recurrent, Cv-recurrent, C0-recurrent, Sv-recurrent, Sv-recurrent of the second order, C2-like, S3-like, S4-like, P2-like, R3-like, P-symmetric, h-isotropic, of scalar curvature, of constant curvature, of p-scalar curvature, of s-ps-curvature.

The paper consists of two parts, preceded by a preliminary section (§1), which provides a brief account of the basic concepts of the pullback approach to Finsler geometry necessary for this work. For more details, the reader is referred to [1], [3], [5] and [24]. In the first part (§2), we introduce the global definitions of the aforementioned special Finsler manifolds in such a way that, when localized, they yield the usual local definitions current in the literature (see the appendix). The definitions are arranged according to the type of the defining property of the special Finsler manifold concerned.
In the second part (§3), various relationships between the different types of the considered special Finsler manifolds are found. Many local results, known in the literature, are proved globally and several new results are obtained. As a by-product of some of the obtained results, interesting identities and properties concerning the torsion tensor fields and the curvature tensor fields are deduced, which in turn play a key role in obtaining other results.

Among the obtained results are: a characterization of Riemannian manifolds, a characterization of Sv-recurrent manifolds, a characterization of P-symmetric manifolds, a characterization of Berwald manifolds (in certain cases), the equivalence of Landsberg and general Landsberg manifolds under certain conditions, a classification of h-isotropic Ch-recurrent manifolds and a presentation of different conditions under which an R3-like Finsler manifold becomes a Finsler manifold of s-ps-curvature. The above results are just a non-exhaustive sample of the global results obtained in this paper.

It should finally be noted that some important results of [8], [9], [11], [13], [19], [20], etc. (obtained in local coordinates) are immediately derived from the obtained global results (when localized). Although our investigation is entirely global, we conclude the paper with an appendix presenting a local counterpart of our global approach and the local definitions of the special Finsler spaces considered. This is done to facilitate comparison and to make the paper more self-contained.

1. Notation and Preliminaries

In this section, we give a brief account of the basic concepts of the pullback formalism of Finsler geometry necessary for this work. For more details, we refer to [1], [3], [5] and [24]. We make the general assumption that all geometric objects we consider are of class C∞.
The following notations will be used throughout this paper:
M : a real differentiable manifold of finite dimension n and of class C∞,
F(M) : the R-algebra of differentiable functions on M,
X(M) : the F(M)-module of vector fields on M,
π_M : TM −→ M : the tangent bundle of M,
π : TM −→ M : the subbundle of nonzero vectors tangent to M,
V(TM) : the vertical subbundle of the bundle TTM,
P : π−1(TM) −→ TM : the pullback of the tangent bundle TM by π,
P∗ : π−1(T∗M) −→ TM : the pullback of the cotangent bundle T∗M by π,
X(π(M)) : the F(TM)-module of differentiable sections of π−1(TM).

Elements of X(π(M)) will be called π-vector fields and will be denoted by barred letters X̄. Tensor fields on π−1(TM) will be called π-tensor fields. The fundamental π-vector field is the π-vector field η̄ defined by η̄(u) = (u, u) for all u ∈ TM. The lift to π−1(TM) of a vector field X on M is the π-vector field X̄ defined by X̄(u) = (u, X(π(u))). The lift to π−1(TM) of a 1-form ω on M is the π-form ω̄ defined by ω̄(u) = (u, ω(π(u))).

The tangent bundle T(TM) is related to the pullback bundle π−1(TM) by the short exact sequence
0 −→ π−1(TM) −γ→ T(TM) −ρ→ π−1(TM) −→ 0,
where the bundle morphisms ρ and γ are defined respectively by ρ = (π_{TM}, dπ) and γ(u, v) = j_u(v), where j_u is the natural isomorphism j_u : T_{π_M(v)}M −→ T_u(T_{π_M(v)}M).

Let ∇ be a linear connection (or simply a connection) in the pullback bundle π−1(TM). We associate to ∇ the map
K : TTM −→ π−1(TM) : X ⟼ ∇_X η̄,
called the connection (or the deflection) map of ∇. A tangent vector X ∈ T_u(TM) is said to be horizontal if K(X) = 0. The vector space H_u(TM) = {X ∈ T_u(TM) : K(X) = 0} of the horizontal vectors at u ∈ TM is called the horizontal space to M at u. The connection ∇ is said to be regular if
T_u(TM) = V_u(TM) ⊕ H_u(TM) ∀u ∈ TM.
If M is endowed with a regular connection, then the vector bundle maps
γ : π−1(TM) −→ V(TM), ρ|_{H(TM)} : H(TM) −→ π−1(TM), K|_{V(TM)} : V(TM) −→ π−1(TM)
are vector bundle isomorphisms.
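The splitting induced by a regular connection can be written out in one display. The first line only restates the exact sequence above; the projectors h and v are names introduced here for illustration (they are not notation fixed by the text), and the verification uses nothing beyond exactness and the fact that ρ restricted to H(TM) is an isomorphism.

```latex
% Exactness of the sequence: \gamma is injective, \rho is surjective, and
\[
0 \longrightarrow \pi^{-1}(TM) \xrightarrow{\ \gamma\ } T(TM)
  \xrightarrow{\ \rho\ } \pi^{-1}(TM) \longrightarrow 0,
\qquad \operatorname{Im}\gamma = \operatorname{Ker}\rho = V(TM).
\]
% Hypothetical names h, v for the horizontal and vertical projectors
% of a regular connection:
\[
h := \bigl(\rho|_{H(TM)}\bigr)^{-1}\circ\rho ,
\qquad
v := \mathrm{id}_{T(TM)} - h .
\]
% Check that vX is vertical: since \rho\circ(\rho|_{H(TM)})^{-1} = \mathrm{id},
\[
\rho(vX) = \rho X - \rho\bigl((\rho|_{H(TM)})^{-1}\rho X\bigr) = \rho X - \rho X = 0
\;\Longrightarrow\; vX \in \operatorname{Ker}\rho = V(TM),
\]
% so every tangent vector splits as
\[
X = hX + vX, \qquad hX \in H_u(TM),\quad vX \in V_u(TM),
\qquad X \in T_u(TM).
\]
```

In particular K(hX) = 0 by the definition of the horizontal space, so only the vertical part vX contributes to K(X).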
Let us denote β := (ρ|_{H(TM)})^{−1}; then
ρ∘β = id_{π−1(TM)}, β∘ρ = id_{H(TM)} on H(TM) and β∘ρ = 0 on V(TM). (1.1)
For a regular connection ∇ we define two covariant derivatives ∇¹ and ∇² (the h- and v-covariant derivatives) as follows: for every vector (1)π-form A, we have
(∇¹A)(X̄, Ȳ) := (∇_{βX̄}A)(Ȳ), (∇²A)(X̄, Ȳ) := (∇_{γX̄}A)(Ȳ).

The classical torsion tensor 𝐓 of the connection ∇ is defined by
𝐓(X, Y) = ∇_X ρY − ∇_Y ρX − ρ[X, Y] ∀X, Y ∈ X(TM).
The horizontal ((h)h-) and mixed ((h)hv-) torsion tensors, denoted respectively by Q and T, are defined by
Q(X̄, Ȳ) = 𝐓(βX̄, βȲ), T(X̄, Ȳ) = 𝐓(γX̄, βȲ) ∀X̄, Ȳ ∈ X(π(M)).
The classical curvature tensor 𝐊 of the connection ∇ is defined by
𝐊(X, Y)ρZ = −∇_X∇_Y ρZ + ∇_Y∇_X ρZ + ∇_{[X,Y]} ρZ ∀X, Y, Z ∈ X(TM).
The horizontal (h-), mixed (hv-) and vertical (v-) curvature tensors, denoted respectively by R, P and S, are defined by
R(X̄, Ȳ)Z̄ = 𝐊(βX̄, βȲ)Z̄, P(X̄, Ȳ)Z̄ = 𝐊(βX̄, γȲ)Z̄, S(X̄, Ȳ)Z̄ = 𝐊(γX̄, γȲ)Z̄.
We also have the (v)h-, (v)hv- and (v)v-torsion tensors, denoted respectively by R̂, P̂ and Ŝ, defined by
R̂(X̄, Ȳ) = R(X̄, Ȳ)η̄, P̂(X̄, Ȳ) = P(X̄, Ȳ)η̄, Ŝ(X̄, Ȳ) = S(X̄, Ȳ)η̄.

Theorem 1.1. [25] Let (M,L) be a Finsler manifold. There exists a unique regular connection ∇ in π−1(TM) such that
(a) ∇ is metric: ∇g = 0,
(b) the horizontal torsion of ∇ vanishes: Q = 0,
(c) the mixed torsion T of ∇ satisfies g(T(X̄, Ȳ), Z̄) = g(T(X̄, Z̄), Ȳ).
Such a connection is called the Cartan connection associated to the Finsler manifold (M,L).

One can show that the torsion T of the Cartan connection has the property that T(X̄, η̄) = 0 for all X̄ ∈ X(π(M)), and associated to T we have:

Definition 1.2. [25] Let ∇ be the Cartan connection associated to (M,L). The torsion tensor field T of the connection ∇ induces a π-tensor field of type (0, 3), called the Cartan tensor and denoted again by T, defined by:
T(X̄, Ȳ, Z̄) = g(T(X̄, Ȳ), Z̄), for all X̄, Ȳ, Z̄ ∈ X(π(M)).
It also induces a π-form C, called the contracted torsion, defined by:
C(X̄) := Tr{Ȳ ⟼ T(X̄, Ȳ)}, for all X̄ ∈ X(π(M)).

Definition 1.3.
[25] With respect to the Cartan connection ∇ associated to (M,L), we have
– The horizontal and vertical Ricci tensors Ric^h and Ric^v are defined respectively by:
Ric^h(X̄, Ȳ) := Tr{Z̄ ⟼ R(X̄, Z̄)Ȳ}, for all X̄, Ȳ ∈ X(π(M)),
Ric^v(X̄, Ȳ) := Tr{Z̄ ⟼ S(X̄, Z̄)Ȳ}, for all X̄, Ȳ ∈ X(π(M)).
– The horizontal and vertical Ricci maps Ric^h_0 and Ric^v_0 are defined respectively by:
g(Ric^h_0(X̄), Ȳ) := Ric^h(X̄, Ȳ), for all X̄, Ȳ ∈ X(π(M)),
g(Ric^v_0(X̄), Ȳ) := Ric^v(X̄, Ȳ), for all X̄, Ȳ ∈ X(π(M)).
– The horizontal and vertical scalar curvatures Sc^h, Sc^v are defined respectively by:
Sc^h := Tr(Ric^h_0), Sc^v := Tr(Ric^v_0),
where R and S are respectively the horizontal and vertical curvature tensors of ∇.

Proposition 1.4. [12] Let (M,L) be a Finsler manifold. The vector field G determined by i_G Ω = −dE is a spray, called the canonical spray associated to the energy E, where E := ½ L² and Ω := dd_J E.

One can show, in this case, that G = β∘η̄, and G is thus horizontal with respect to the Cartan connection ∇.

Theorem 1.5. [26] Let (M,L) be a Finsler manifold. There exists a unique regular connection D in π−1(TM) such that
(a) D is torsion free,
(b) the canonical spray G = β∘η̄ is horizontal with respect to D,
(c) the (v)hv-torsion tensor P̂ of D vanishes.
Such a connection is called the Berwald connection associated to the Finsler manifold (M,L).

2. Special Finsler spaces

In this section, we introduce the global definitions of the most important and commonly used special Finsler spaces in such a way that, when localized, they yield the usual local definitions existing in the literature (see the Appendix). Here we simply set the definitions, postponing investigation of the mutual relationships between these special Finsler spaces to the next section. The definitions are arranged according to the type of defining property of the special Finsler space concerned.
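As a local-coordinate illustration of the Cartan tensor of Definition 1.2, the following sketch estimates its components at a single point by finite differences. It relies on the standard local expression C_ijk = ½ ∂³E/∂y^i∂y^j∂y^k with E = ½L², which is an assumption brought in from the local theory rather than a formula stated in this section. The two sample norms (a Euclidean norm and a Randers-type norm with parameter b), the step size h and all function names are hypothetical choices made only for this example.

```python
import math


def euclidean_norm(y):
    """Riemannian example: L(y) = |y| (no dependence on position)."""
    return math.sqrt(sum(c * c for c in y))


def randers_norm(y, b=0.5):
    """Non-Riemannian example: L(y) = |y| + b*y[0], with |b| < 1 (illustrative)."""
    return euclidean_norm(y) + b * y[0]


def energy(norm, y):
    """Energy E = L^2 / 2 of the norm at the direction y."""
    return 0.5 * norm(y) ** 2


def third_partial(f, y, i, j, k, h=1e-3):
    """Central-difference estimate of d^3 f / (dy_i dy_j dy_k) at y."""
    total = 0.0
    for si in (1, -1):
        for sj in (1, -1):
            for sk in (1, -1):
                z = list(y)
                z[i] += si * h  # repeated indices accumulate shifts,
                z[j] += sj * h  # which is what composing central
                z[k] += sk * h  # differences requires
                total += si * sj * sk * f(z)
    return total / (8.0 * h ** 3)


def cartan_tensor(norm, y, h=1e-3):
    """Approximate components C_ijk = (1/2) d^3 E / (dy_i dy_j dy_k)."""
    n = len(y)

    def f(z):
        return energy(norm, z)

    return [[[0.5 * third_partial(f, y, i, j, k, h) for k in range(n)]
             for j in range(n)] for i in range(n)]


def contract_with_direction(C, y):
    """Matrix C_ijk y^k, the local counterpart of T(X, eta)."""
    n = len(y)
    return [[sum(C[i][j][k] * y[k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_abs(nested):
    """Largest absolute value among the entries of a nested list."""
    if isinstance(nested, list):
        return max(max_abs(item) for item in nested)
    return abs(nested)


if __name__ == "__main__":
    y = [1.0, 2.0]
    for name, norm in (("Euclidean", euclidean_norm), ("Randers", randers_norm)):
        C = cartan_tensor(norm, y)
        print(name, "max |C_ijk| =", max_abs(C),
              " max |C_ijk y^k| =", max_abs(contract_with_direction(C, y)))
```

Up to discretization error, the Euclidean norm gives a vanishing Cartan tensor, in line with the Riemannian case, while the Randers-type norm gives a nonzero tensor whose contraction with y still vanishes, which mirrors the property T(X̄, η̄) = 0.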
Throughout the paper, g, ĝ, ∇ and D denote respectively the Finsler metric in π−1(TM), the induced metric in π−1(T∗M), the Cartan connection and the Berwald connection associated to a given Finsler manifold (M,L). Also, T denotes the torsion tensor of the Cartan connection (or the Cartan tensor) and R, P and S denote respectively the horizontal curvature, the mixed curvature and the vertical curvature of the Cartan connection.

Definition 2.1. A Finsler manifold (M,L) is:
(a) Riemannian if the metric tensor g(x, y) is independent of y or, equivalently, if T(X̄, Ȳ) = 0 for all X̄, Ȳ ∈ X(π(M)).
(b) locally Minkowskian if the metric tensor g(x, y) is independent of x or, equivalently, if ∇_{βX̄} T = 0 and R = 0.

Definition 2.2. A Finsler manifold (M,L) is said to be:
(a) Berwald [24] if the torsion tensor T is horizontally parallel, that is, ∇_{βX̄} T = 0.
(b) Ch-recurrent if the torsion tensor T satisfies the condition ∇_{βX̄} T = λo(X̄) T, where λo is a π-form of order one.
(c) a P∗-Finsler manifold if the π-tensor field ∇_{βη̄} T is expressed in the form ∇_{βη̄} T = λ(x, y) T, where
λ(x, y) = ĝ(∇_{βη̄} C, C)/C² = g(∇_{βη̄} C̄, C̄)/C² and C² := ĝ(C, C) = C(C̄) ≠ 0;
C̄ being the π-vector field defined by g(C̄, X̄) = C(X̄).

Definition 2.3. A Finsler manifold (M,L) is said to be:
(a) Cv-recurrent if the torsion tensor T satisfies the condition (∇_{γX̄} T)(Ȳ, Z̄) = λo(X̄) T(Ȳ, Z̄).
(b) C0-recurrent if the torsion tensor T satisfies the condition (D_{γX̄} T)(Ȳ, Z̄) = λo(X̄) T(Ȳ, Z̄).

Definition 2.4. [25] A Finsler manifold (M,L) is said to be:
(a) semi-C-reducible if dim M ≥ 3 and the Cartan tensor T has the form
T(X̄, Ȳ, Z̄) = (µ/(n+1)) {ħ(X̄, Ȳ)C(Z̄) + ħ(Ȳ, Z̄)C(X̄) + ħ(Z̄, X̄)C(Ȳ)} + (τ/C²) C(X̄)C(Ȳ)C(Z̄),
where µ and τ are scalar functions satisfying µ + τ = 1, ħ = g − ℓ ⊗ ℓ and ℓ(X̄) := L−1 g(X̄, η̄).
(b) C-reducible if dim M ≥ 3 and the Cartan tensor T has the form
T(X̄, Ȳ, Z̄) = (1/(n+1)) {ħ(X̄, Ȳ)C(Z̄) + ħ(Ȳ, Z̄)C(X̄) + ħ(Z̄, X̄)C(Ȳ)}.
(c) C2-like if dim M ≥ 2 and the Cartan tensor T has the form
T(X̄, Ȳ, Z̄) = (1/C²) C(X̄)C(Ȳ)C(Z̄).

Definition 2.5.
A Finsler manifold (M,L), where dim M ≥ 3, is said to be quasi-C-reducible if the Cartan tensor T is written as:
T(X̄, Ȳ, Z̄) = A(X̄, Ȳ)C(Z̄) + A(Ȳ, Z̄)C(X̄) + A(Z̄, X̄)C(Ȳ),
where A is a symmetric indicatory (2)π-form (A(X̄, η̄) = 0 for all X̄).

Definition 2.6. [25] A Finsler manifold (M,L) is said to be:
(a) S3-like if dim(M) ≥ 4 and the vertical curvature tensor S(X̄, Ȳ, Z̄, W̄) := g(S(X̄, Ȳ)Z̄, W̄) has the form:
S(X̄, Ȳ, Z̄, W̄) = (Sc^v/((n−1)(n−2))) {ħ(X̄, Z̄)ħ(Ȳ, W̄) − ħ(X̄, W̄)ħ(Ȳ, Z̄)}.
(b) S4-like if dim(M) ≥ 5 and the vertical curvature tensor S(X̄, Ȳ, Z̄, W̄) has the form:
S(X̄, Ȳ, Z̄, W̄) = ħ(X̄, Z̄)F(Ȳ, W̄) − ħ(Ȳ, Z̄)F(X̄, W̄) + ħ(Ȳ, W̄)F(X̄, Z̄) − ħ(X̄, W̄)F(Ȳ, Z̄), (2.1)
where F is the (2)π-form defined by F = (1/(n−3)) {Ric^v − Sc^v ħ/(2(n−2))}.

Definition 2.7. A Finsler manifold (M,L) is said to be:
(a) Sv-recurrent if the v-curvature tensor S satisfies the condition
(∇_{γX̄} S)(Ȳ, Z̄, W̄) = λ(X̄)S(Ȳ, Z̄)W̄,
where λ is a π-form of order one.
(b) Sv-recurrent of the second order if the v-curvature tensor S satisfies the condition
(∇²∇²S)(Ȳ, X̄, Z̄, W̄, Ū) = Θ(X̄, Ȳ)S(Z̄, W̄)Ū,
where Θ is a π-form of order two and ∇² denotes the v-covariant derivative, (∇²A)(X̄, Ȳ) := (∇_{γX̄}A)(Ȳ).

Definition 2.8. [24] A Finsler manifold (M,L) is said to be:
(a) a Landsberg manifold if P̂(X̄, Ȳ) = P(X̄, Ȳ)η̄ = 0 ∀X̄, Ȳ ∈ X(π(M)), or equivalently ∇_{βη̄} T = 0.
(b) a general Landsberg manifold if Tr{Ȳ ⟼ P̂(X̄, Ȳ)} = 0 ∀X̄ ∈ X(π(M)), or equivalently ∇_{βη̄} C = 0.

Definition 2.9. A Finsler manifold (M,L) is said to be P-symmetric if the mixed curvature tensor P satisfies
P(X̄, Ȳ)Z̄ = P(Ȳ, X̄)Z̄, ∀X̄, Ȳ, Z̄ ∈ X(π(M)).

Definition 2.10. A Finsler manifold (M,L), where dim M ≥ 3, is said to be P2-like if the mixed curvature tensor P has the form:
P(X̄, Ȳ, Z̄, W̄) = α(Z̄)T(X̄, Ȳ, W̄) − α(W̄)T(X̄, Ȳ, Z̄),
where α is a (1)π-form (positively homogeneous of degree 0).

Definition 2.11.
[25] A Finsler manifold (M,L), where dim M ≥ 3, is said to be P-reducible if the π-tensor field P(X̄, Ȳ, Z̄) := g(P(X̄, Ȳ)η̄, Z̄) can be expressed in the form:
P(X̄, Ȳ, Z̄) = δ(X̄)ħ(Ȳ, Z̄) + δ(Ȳ)ħ(Z̄, X̄) + δ(Z̄)ħ(X̄, Ȳ),
where δ is a (1)π-form satisfying δ(η̄) = 0.

Definition 2.12. [2] A Finsler manifold (M,L), where dim M ≥ 3, is said to be h-isotropic if there exists a scalar ko such that the horizontal curvature tensor R has the form
R(X̄, Ȳ)Z̄ = ko{g(Ȳ, Z̄)X̄ − g(X̄, Z̄)Ȳ}.

Definition 2.13. [2] A Finsler manifold (M,L), where dim M ≥ 3, is said to be:
(a) of scalar curvature if there exists a scalar function k : TM −→ R such that the horizontal curvature tensor R(X̄, Ȳ, Z̄, W̄) := g(R(X̄, Ȳ)Z̄, W̄) satisfies the relation
R(η̄, X̄, η̄, Ȳ) = kL²ħ(X̄, Ȳ).
(b) of constant curvature if the function k in (a) is constant.

Definition 2.14. A Finsler manifold (M,L) is said to be R3-like if dim M ≥ 4 and the horizontal curvature tensor R(X̄, Ȳ, Z̄, W̄) is expressed in the form
R(X̄, Ȳ, Z̄, W̄) = g(X̄, Z̄)F(Ȳ, W̄) − g(Ȳ, Z̄)F(X̄, W̄) + g(Ȳ, W̄)F(X̄, Z̄) − g(X̄, W̄)F(Ȳ, Z̄), (2.2)
where F is the (2)π-form defined by F = (1/(n−2)) {Ric^h − Sc^h g/(2(n−1))}.

3. Relationships between different types of special Finsler spaces

This section is devoted to the global investigation of some mutual relationships between the special Finsler spaces introduced in the preceding section. Some consequences are also drawn from these relationships.

We start with some immediate consequences of the definitions:
(a) A locally Minkowskian manifold is a Berwald manifold.
(b) A Berwald manifold is a Landsberg manifold.
(c) A Landsberg manifold is a general Landsberg manifold.
(d) A Berwald manifold is Ch-recurrent (resp. P∗-Finsler).
(e) A P∗-manifold is a Landsberg manifold.
(f) A C-reducible (resp. C2-like) manifold is semi-C-reducible.
(g) A semi-C-reducible manifold is quasi-C-reducible.
(h) A Finsler manifold of constant curvature is of scalar curvature.

The following two lemmas will be used in the sequel.

Lemma 3.1.
[25] For every X̄, Ȳ ∈ X(π(M)), we have:
(a) P(η̄, X̄)Ȳ = 0,
(b) P(X̄, η̄)Ȳ = 0,
(c) P(X̄, Ȳ)η̄ = (∇_{βη̄} T)(X̄, Ȳ).

Lemma 3.2. If φ is the vector π-form defined by
φ(X̄) := X̄ − L−1 ℓ(X̄)η̄, or φ := I − L−1 ℓ ⊗ η̄, (3.1)
where ℓ is the π-form given by ℓ(X̄) = L−1 g(X̄, η̄), then we have:
(a) ħ(X̄, Ȳ) = g(φ(X̄), Ȳ),
(b) φ(η̄) = 0,
(c) φ∘φ = φ,
(d) Tr(φ) = n − 1,
(e) ∇_{βX̄} φ = 0,
(f) ∇_{βX̄} ħ = 0.

As we have seen, a Landsberg manifold is general Landsberg. The converse is not true. Nevertheless, we have

Proposition 3.3. A C-reducible general Landsberg manifold (M,L) is a Landsberg manifold.

Proof. Since (M,L) is a C-reducible manifold, then, by Definition 2.4, Lemma 3.2, the symmetry of ħ and the non-degeneracy of g, we get
T(X̄, Ȳ) = (1/(n+1)) {ħ(X̄, Ȳ)C̄ + C(X̄)φ(Ȳ) + C(Ȳ)φ(X̄)},
where C̄ is the π-vector field defined by g(C̄, X̄) := C(X̄). Taking the h-covariant derivative ∇_{βZ̄} of both sides of the above equation, we obtain
(∇_{βZ̄} T)(X̄, Ȳ) = (1/(n+1)) {(∇_{βZ̄} ħ)(X̄, Ȳ)C̄ + ħ(X̄, Ȳ)∇_{βZ̄} C̄ + C(X̄)(∇_{βZ̄} φ)(Ȳ) + (∇_{βZ̄} C)(X̄)φ(Ȳ) + C(Ȳ)(∇_{βZ̄} φ)(X̄) + (∇_{βZ̄} C)(Ȳ)φ(X̄)},
from which, by setting Z̄ = η̄ and taking into account the fact that ∇_{βZ̄} ħ = 0 and that ∇_{βZ̄} φ = 0 (Lemma 3.2), we get
(∇_{βη̄} T)(X̄, Ȳ) = (1/(n+1)) {ħ(X̄, Ȳ)∇_{βη̄} C̄ + (∇_{βη̄} C)(X̄)φ(Ȳ) + (∇_{βη̄} C)(Ȳ)φ(X̄)}.
Now, under the given assumption that (M,L) is a general Landsberg manifold, ∇_{βη̄} C = 0 (Definition 2.8) and hence ∇_{βη̄} C̄ = 0. Hence ∇_{βη̄} T = 0 and the result follows. □

Also, a Berwald manifold is Landsberg. The converse is not expected to hold, although no counterexample is known: finding a Landsberg manifold which is not Berwald is still an open problem. Nevertheless, we have

Proposition 3.4. [25] A C-reducible Landsberg manifold (M,L) is a Berwald manifold.

Combining the above two propositions, we obtain the more powerful result:

Proposition 3.5. A C-reducible general Landsberg manifold (M,L) is a Berwald manifold.

Summing up, we get:

Theorem 3.6.
Let (M,L) be a C-reducible Finsler manifold. The following assertions are equivalent:
(a) (M,L) is a Berwald manifold.
(b) (M,L) is a Landsberg manifold.
(c) (M,L) is a general Landsberg manifold.

We retrieve here a result of Matsumoto [15], namely

Corollary 3.7. If the h-curvature tensor R and hv-curvature tensor P of a C-reducible manifold vanish, then the manifold is locally Minkowskian.

Remark 3.8. [15] It may be conjectured that a Finsler manifold will be Minkowskian if the h-curvature tensor R and hv-curvature tensor P vanish. As seen above, the conjecture is already verified under the somewhat strong condition of “C-reducibility”.

Theorem 3.9. Let (M,L) be a Finsler manifold. Then we have:
(a) A C-reducible manifold is P-reducible.
(b) A P-reducible general Landsberg manifold is Landsberg.

Proof. (a) Since (M,L) is C-reducible, then by Definition 2.4, we have
T(X̄, Ȳ, Z̄) = (1/(n+1)) 𝔖_{X̄,Ȳ,Z̄}{ħ(X̄, Ȳ)C(Z̄)},
where 𝔖_{X̄,Ȳ,Z̄} denotes the cyclic sum over X̄, Ȳ, Z̄. Applying the h-covariant derivative ∇_{βW̄} on both sides of the above equation, taking into account the fact that (∇_{βW̄} T)(X̄, Ȳ, Z̄) = g((∇_{βW̄} T)(X̄, Ȳ), Z̄) and that ∇_{βW̄} ħ = 0, we obtain
g((∇_{βW̄} T)(X̄, Ȳ), Z̄) = (1/(n+1)) 𝔖_{X̄,Ȳ,Z̄}{ħ(X̄, Ȳ)(∇_{βW̄} C)(Z̄)}.
From this, by setting W̄ = η̄ and noting that P(X̄, Ȳ)η̄ = (∇_{βη̄} T)(X̄, Ȳ), the result follows.

(b) Since (M,L) is a P-reducible manifold, then by Definition 2.11, taking into account the fact that g is nondegenerate, we obtain
P(X̄, Ȳ)η̄ = δ(X̄)φ(Ȳ) + δ(Ȳ)φ(X̄) + ħ(X̄, Ȳ) ζ̄, (3.2)
where ζ̄ is the π-vector field defined by g(ζ̄, X̄) := δ(X̄). Since δ(η̄) = 0, then
Tr{Ȳ ⟼ δ(Ȳ)φ(X̄) + ħ(X̄, Ȳ) ζ̄} = 2δ(X̄).
Taking the trace of both sides of (3.2), using the fact that P(X̄, Ȳ)η̄ = (∇_{βη̄} T)(X̄, Ȳ) (Lemma 3.1) and that Tr{Ȳ ⟼ (∇_{βη̄} T)(X̄, Ȳ)} = (∇_{βη̄} C)(X̄), we get
δ(X̄) = (1/(n+1)) (∇_{βη̄} C)(X̄). (3.3)
Now, from Equations (3.2) and (3.3), we have
g(P(X̄, Ȳ)η̄, Z̄) = (1/(n+1)) 𝔖_{X̄,Ȳ,Z̄}{ħ(X̄, Ȳ)(∇_{βη̄} C)(Z̄)}.
(3.4)
Since the manifold is assumed to be general Landsberg, ∇_{βη̄} C = 0. Therefore, from (3.4), we get P(X̄, Ȳ)η̄ = 0 and hence the manifold is Landsberg. □

Proposition 3.10.
(a) A Ch-recurrent manifold is a P∗-Finsler manifold.
(b) A general Landsberg P∗-Finsler manifold is a Landsberg manifold.

Proof. The proof is straightforward and we omit it. □

Proposition 3.11. A C2-like Finsler manifold is a Berwald manifold if, and only if, the π-tensor field C is horizontally parallel.

Proof. Let (M,L) be C2-like. Then,
T(X̄, Ȳ, Z̄) = (1/C(C̄)) C(X̄)C(Ȳ)C(Z̄),
from which
T(X̄, Ȳ) = (1/C(C̄)) C(X̄)C(Ȳ)C̄.
Taking the h-covariant derivative of both sides, we get
(∇_{βZ̄} T)(X̄, Ȳ) = −((∇_{βZ̄} C(C̄))/(C(C̄))²) C(X̄)C(Ȳ)C̄ + (1/C(C̄)) (∇_{βZ̄} C)(X̄)C(Ȳ)C̄ + (1/C(C̄)) (∇_{βZ̄} C)(Ȳ)C(X̄)C̄ + (1/C(C̄)) C(X̄)C(Ȳ)∇_{βZ̄} C̄.
In view of this relation, ∇_{βZ̄} T = 0 if, and only if, ∇_{βZ̄} C = 0. Hence the result. □

Corollary 3.12. A C2-like general Landsberg manifold is a Landsberg manifold.

In view of the above theorems, we have:

Corollary 3.13. The two notions of being Landsberg and general Landsberg coincide in the case of C-reducibility, P-reducibility, C2-likeness or P∗-Finsler.

As we know, a C-reducible Landsberg manifold is a Berwald manifold (Proposition 3.4). Moreover, a C2-like Finsler manifold is a Berwald manifold if, and only if, the π-tensor field C is horizontally parallel (Proposition 3.11). We shall try to generalize these results to the case of semi-C-reducibility.

Theorem 3.14. A semi-C-reducible Finsler manifold is a Berwald manifold if, and only if, the characteristic scalar µ and the π-tensor field C are horizontally parallel.

Proof. Firstly, if (M,L) is semi-C-reducible, then
T(X̄, Ȳ, Z̄) = (µ/(n+1)) 𝔖_{X̄,Ȳ,Z̄}{ħ(X̄, Ȳ)C(Z̄)} + (τ/C(C̄)) C(X̄)C(Ȳ)C(Z̄).
Taking the h-covariant derivative of both sides, noting that ∇_{βX̄} ħ = 0 and that τ = 1 − µ, we get
(∇_{βW̄} T)(X̄, Ȳ, Z̄) = (1/(n+1)) 𝔖_{X̄,Ȳ,Z̄}{ħ(X̄, Ȳ){µ(∇_{βW̄} C)(Z̄) + (∇_{βW̄} µ)C(Z̄)}} + (τ/C(C̄)) 𝔖_{X̄,Ȳ,Z̄}{(∇_{βW̄} C)(X̄)C(Ȳ)C(Z̄)} − {(∇_{βW̄} µ)/C(C̄) + τ (∇_{βW̄} C(C̄))/(C(C̄))²} C(X̄)C(Ȳ)C(Z̄).
Now, if the characteristic scalar µ and the π-tensor field C are horizontally parallel, then ∇_{βW̄} T = 0 and (M,L) is a Berwald manifold.

Conversely, if (M,L) is a Berwald manifold, then ∇_{βX̄} T = 0 and hence ∇_{βX̄} C = 0, ∇_{βX̄} C̄ = 0. These, together with the above equation, give
∇_{βW̄} µ {(1/(n+1)) 𝔖_{X̄,Ȳ,Z̄}{ħ(X̄, Ȳ)C(Z̄)} − (1/C(C̄)) C(X̄)C(Ȳ)C(Z̄)} = 0,
which implies immediately that ∇_{βW̄} µ = 0. □

The following lemmas will be used in the sequel.

Lemma 3.15. For all X̄, Ȳ ∈ X(π(M)), we have:
(a) [γX̄, γȲ] = γ(∇_{γX̄} Ȳ − ∇_{γȲ} X̄),
(b) [γX̄, βȲ] = −γ(P(Ȳ, X̄)η̄ + ∇_{βȲ} X̄) + β(∇_{γX̄} Ȳ − T(X̄, Ȳ)),
(c) [βX̄, βȲ] = γ(R(X̄, Ȳ)η̄) + β(∇_{βX̄} Ȳ − ∇_{βȲ} X̄).

Lemma 3.16. For all X̄, Ȳ, Z̄, W̄ ∈ X(π(M)) and W ∈ X(TM), we have:
(a) g((∇_W T)(X̄, Ȳ), Z̄) = g((∇_W T)(X̄, Z̄), Ȳ),
(b) g(S(X̄, Ȳ)Z̄, W̄) = −g(S(X̄, Ȳ)W̄, Z̄).

Proof. (a) From the definition of the covariant derivative, we get
g((∇_W T)(X̄, Ȳ), Z̄) = g(∇_W T(X̄, Ȳ), Z̄) − g(T(∇_W X̄, Ȳ), Z̄) − g(T(X̄, ∇_W Ȳ), Z̄). (3.5)
Now, we have
g(∇_W T(X̄, Ȳ), Z̄) = W · g(T(X̄, Ȳ), Z̄) − g(T(X̄, Ȳ), ∇_W Z̄) = W · g(T(X̄, Ȳ), Z̄) − g(T(X̄, ∇_W Z̄), Ȳ).
Similarly,
g(T(X̄, ∇_W Ȳ), Z̄) = W · g(T(X̄, Z̄), Ȳ) − g(∇_W T(X̄, Z̄), Ȳ).
Substituting these two equations into (3.5), noting the property that g(T(∇_W X̄, Ȳ), Z̄) = g(T(∇_W X̄, Z̄), Ȳ) (cf. §1), the result follows.
(b) follows directly from the general formula (which can be easily proved)
g(𝐊(X, Y)Z̄, W̄) + g(𝐊(X, Y)W̄, Z̄) = 0
by setting X = γX̄ and Y = γȲ, where 𝐊 is the classical curvature tensor of the Cartan connection as a linear connection in the pullback bundle (cf. §1). □

Proposition 3.17. Let (M,L) be a Ch-recurrent Finsler manifold (∇_{βX̄} T = λo(X̄)T).
Then, we have: (a) If Ko := λo(øη) = 0, then the hv-curvature tensor P is expressed in the form: P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ) and the (v)hv-torsion P̂ vanishes. (b) If Ko 6= 0, then the v(hv)-torsion tensor P̂ is recurrent: (∇βøZP̂ )(øX, øY ) = (λo(øZ) + ∇βøZKo )P̂ (øX, øY ). Proof. (a) The hv-curvature tensor P can be written in the form [25]: P (øX, øY, øZ, øW ) = g((∇βøZT )(øX, øY ), øW )− g((∇βøWT )(øX, øY ), øZ)+ +g(T (øX, øZ), P̂(øW, øY ))− g(T (øX, øW ), P̂(øZ, øY )). Then, by using P̂ (øX, øY ) = (∇βøηT )(øX, øY ) (Lemma 3.1) and the C h-recurrence condition, we get P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ)− −λo(øη){g(T (øX, øW ), T (øY, øZ))− g(T (øX, øZ), T (øY, øW ))} = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ)− λo(øη)S(øX, øY, øZ, øW ). Now, if λo(øη) = 0, then (a) follows from the above relation. (b) If Ko := λo(øη) 6= 0, then by Lemma 3.1 and the recurrence condition, we have P̂ (øX, øY ) = KoT (øX, øY ), from which (∇βøZP̂ )(øX, øY ) = {∇βøZKo +Koλo(øZ)}T (øX, øY ). Then, (b) follows from the above two equations. � Theorem 3.18. Assume that (M,L) is Ch-recurrent. Then, the v-curvature tensor S is recurrent with respect to the h-covariant differentiation : ∇βøXS = θ(øX)S, where θ is a π-form of order one. Proof. One can easily show that : For all X, Y, Z ∈ X(TM), SX,Y,Z{K(X, Y )ρZ +∇XT(Y, Z) +T(X, [Y, Z])} = 0. Setting X = γøX, Y = γøY and Z = βøZ in the above equation, we get S(øX, øY )øZ = ∇γøY T (øX, øZ)−∇γøXT (øY, øZ)−∇βøZT(γøX, γøZ)− −T(γøX, [γøY, βøZ]) +T(γøY, [γøX, βøZ]) +T([γøX, γøY ], βøZ). Using Lemma 3.15 and the fact that T(γøX, γøZ) = 0, the above equation reduces S(øX, øY )øZ = (∇γøY T )(øX, øZ)− (∇γøXT )(øY, øZ)+ +T (øX, T (øY, øZ))− T (øY, T (øX, øZ)). (3.6) From which, since g(T (øX, øY ), øZ) = g(T (øX, øZ), øY ), we have g(S(øX, øY )øZ, øW ) = g((∇γøY T )(øX, øZ), øW )− g((∇γøXT )(øY, øZ), øW )+ +g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)). 
Similarly, g(S(øX, øY )øW, øZ) = g((∇γøY T )(øX, øW ), øZ)− g((∇γøXT )(øY, øW ), øZ)+ +g(T (øX, øZ), T (øY, øW ))− g(T (øY, øZ), T (øX, øW )). The above two equations, together with Lemma 3.16, yield g((∇γøXT )(øY, øZ), øW ) = g((∇γøY T )(øX, øZ), øW ). (3.7) By (3.6) and (3.7), we obtain S(øX, øY, øZ, øW ) = g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)). (3.8) Now, using the given assumption that the manifold is Ch-recurrent, Equation (3.8) implies that (∇βøXS)(øY, øZ, øV, øW ) = ∇βøXS(øY, øZ, øV, øW )− −S(∇βøXøY, øZ, øV, øW )− S(øY,∇βøXøZ, øV, øW )− −S(øY, øZ,∇βøXøV, øW )− S(øY, øZ, øV,∇βøXøW ). = +∇βøXg(T (øY, øW ), T (øZ, øV ))−∇βøXg(T (øZ, øW ), T (øY, øV ))− −g(T (∇βøXøY, øW ), T (øZ, øV )) + g(T (øZ, øW ), T (∇βøXøY, øV ))− −g(T (øY, øW ), T (∇βøXøZ, øV )) + g(T (∇βøXøZ, øW ), T (øY, øV ))− −g(T (øY, øW ), T (øZ,∇βøXøV )) + g(T (øZ, øW ), T (øY,∇βøXøV ))− −g(T (øY,∇βøXøW ), T (øZ, øV )) + g(T (øZ,∇βøXøW ), T (øY, øV )). = g((∇βøXT )(øY, øW ), T (øZ, øV )) + g(T (øY, øW ), (∇βøXT )(øZ, øV ))− −g((∇βøXT )(øZ, øW ), T (øY, øV ))− g(T (øZ, øW ), (∇βøXT )(øY, øV )). = 2λo(øX)S(øY, øZ, øV, øW ) =: θ(øX)S(øY, øZ, øV, øW ). Hence, the result follows. � Corollary 3.19. In the course of the proof of Theorem 3.18, we have shown that (Equations (3.7) and (3.8)) : (a) (∇γøXT )(øY, øZ) = (∇γøY T )(øX, øZ), (b) S(øX, øY, øZ, øW ) = g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)). Corollary 3.20. Let (M,L) be a C2-like Finsler manifold. Then the the v-curvature tensor S vanishes. Proof. Substituting T (øX, øY ) = 1 C(øC) C(øX)C(øY )øC in Corollary 3.19(b), we get the result. � Corollary 3.21. Let (M,L) be a C-reducible manifold. Then, (a) the v-curvature tensor S has the form S(øX, øY, øZ, øW ) = (n + 1)2 {C2~(øX, øW )~(øY, øZ)− C2~(øY, øW )~(øX, øZ) + +~(øX, øW )C(øY )C(øZ) + ~(øY, øZ)C(øX)C(øW )− −~(øY, øW )C(øX)C(øZ)− ~(øX, øZ)C(øY )C(øW )}. 
(b) the vertical Ricci tensor Ric^v has the form
Ric^v(X̄, Ȳ) = ((3−n)/(n+1)²) C(X̄)C(Ȳ) − ((n−1)/(n+1)²) C² ħ(X̄, Ȳ).
(c) the vertical scalar curvature Sc^v has the form
Sc^v = ((2−n)/(n+1)) C².

Theorem 3.22. A Finsler manifold (M,L) is P-symmetric if, and only if, the v-curvature tensor S satisfies the equation ∇_{βη̄} S = 0.

Proof. One can show that: for all X, Y, Z ∈ X(TM),
𝔖_{X,Y,Z}{∇_Z 𝐊(X, Y) − 𝐊(X, Y)∇_Z − 𝐊([X, Y], Z)} = 0, (3.9)
where 𝔖_{X,Y,Z} denotes the cyclic sum over X, Y, Z and 𝐊 is the classical curvature tensor of the Cartan connection. Setting X = γX̄, Y = γȲ and Z = βZ̄ in the above equation, we get
∇_{βZ̄} S(X̄, Ȳ)W̄ + ∇_{γȲ} P(Z̄, X̄)W̄ − ∇_{γX̄} P(Z̄, Ȳ)W̄ − S(X̄, Ȳ)∇_{βZ̄} W̄ + P(Z̄, Ȳ)∇_{γX̄} W̄ − P(Z̄, X̄)∇_{γȲ} W̄ − 𝐊([γX̄, γȲ], βZ̄)W̄ − 𝐊([γȲ, βZ̄], γX̄)W̄ − 𝐊([βZ̄, γX̄], γȲ)W̄ = 0.
By using Lemma 3.15, the above relation reduces to
(∇_{βZ̄} S)(X̄, Ȳ, W̄) + (∇_{γȲ} P)(Z̄, X̄, W̄) − (∇_{γX̄} P)(Z̄, Ȳ, W̄) + S(P(Z̄, Ȳ)η̄, X̄)W̄ − S(P(Z̄, X̄)η̄, Ȳ)W̄ + P(T(Ȳ, Z̄), X̄)W̄ − P(T(X̄, Z̄), Ȳ)W̄ = 0. (3.10)
Setting Z̄ = η̄ in the above equation, taking into account Lemma 3.1 and the fact that T(X̄, η̄) = 0 and that (∇_{γX̄} P)(η̄, Ȳ, Z̄) = −P(X̄, Ȳ)Z̄, we get
P(X̄, Ȳ)Z̄ = P(Ȳ, X̄)Z̄ − (∇_{βη̄} S)(X̄, Ȳ, Z̄). (3.11)
The result follows immediately from (3.11). □

According to (3.11) and Lemma 3.1, we have:

Corollary 3.23. Let P̂(X̄, Ȳ) := P(X̄, Ȳ)η̄ and T̂(X̄, Ȳ) := (∇_{βη̄} T)(X̄, Ȳ). Then the π-tensor fields P̂ and T̂ are symmetric.

Theorem 3.18 and Theorem 3.22 give rise to the following result.

Theorem 3.24. Assume that a Finsler manifold (M,L) is Ch-recurrent and P-symmetric. If θ(η̄) ≠ 0, then the v-curvature tensor S vanishes identically.

Now, we shall prove the following lemma, which provides some important and useful properties of the torsion tensor T and the v-curvature S:

Lemma 3.25.
For every øX, øY, øZ and øW ∈ X(π(M)), we have (a) T (øX, øY ) = T (øY, øX), (b) T (øη, øX) = 0, (c) SøX,øY,øZS(øX, øY )øZ = 0, (d) g(S(øX, øY )øZ, øW ) = g(S(øZ, øW )øX, øY ), (e) S(øη, øX)øY = 0 = S(øX, øη)øY , (f) (∇γøXS)(øη, øY )øZ = −S(øX, øY )øZ, (∇γøXS)(øη, øX)øη = 0 . (g) S(øX, øY )øZ = −1 {(DγXT )(Y , Z)− (DγY T )(X,Z)}. Consequently, S vanishes if and only if (DγXT )(Y , Z) = (DγY T )(X,Z). Proof. (a) From Corollary 3.19(a), we have (∇γøXT )(øY, øZ) = (∇γøY T )(øX, øZ). Setting øZ = øη and using the fact that T (øX, øη) = 0 and that K oγ = idX(π(M)), the result follows. (b) Follows from (a) together with the relation T (øX, øη) = 0. (c) Setting X = γøX, Y = γøY and Z = γøZ in (3.9) and using Lemma 3.15, we SøX,øY,øZ(∇γøXS)(øY, øZ, øW ) = 0. Again, setting øW = øη in the above equation and using the fact that S(øX, øY )øη = 0 and that K oγ = idX(π(M)), the result follows. (d) Follows from Corollary 3.19(b), noting that T is symmetric. (e) and (f) are clear. (g) From the relation DγXøY = ∇γXøY − T (øX, øY ) [27], we get (DγXT )(øY, øZ) = (∇γXT )(øY, øZ)−T (øX, T (øY, øZ))+T (T (øX, øY ), øZ)+T (øY, T (øX, øZ)), (DγY T )(øX, øZ) = (∇γY T )(øX, øZ)−T (øY, T (øX, øZ))+T (T (øY, øX), øZ)+T (øX, T (øY, øZ)). The result follows from the above two equations, using Corollary 3.19 and the sym- metry of T . � As a direct consequence of the above lemma, we have the Corollary 3.26. A P2-like Finsler manifold is P -symmetric. Proposition 3.27. Assume that (M,L) is Cv-recurrent. Then, the v-curvature ten- sor S is v-recurrent : ∇γøXS = Ψ(øX)S, Ψ being a (1)π-form. Consequently, S vanishes identically. Proof. Taking the v-covariant derivative of both sides of the relation in Corollary 3.19(b) and, then, using the assumption that ∇γXT = λ0(X)T , we get (∇γøXS)(øY, øZ, øV, øW ) = 2λo(øX)S(øY, øZ, øV, øW ) =: ψ(øX)S(øY, øZ, øV, øW ), which shows that S is v-recurrent. 
Now, setting øV = øη in the last equation, using the properties of S and noting that K oγ = idX(π(M)), we conclude that S = 0. � The following result gives a characterization of Riemannian manifolds in terms of Cv-recurrence and C0-recurrence. Theorem 3.28. (a) A Cv-recurrent Finsler manifold is Riemannian, (b) A C0-recurrent Finsler manifold is Riemannian. Proof. (a) Since (M,L) is Cv-recurrent, then (∇γXT )(Y , Z) = λo(X)T (Y , Z), from which, by setting øX = øη and noting that ∇γøηT = −T , we get T (Y , Z) = −λo(η)T (Y , Z). (3.12) But since (∇γøXT )(øY, øZ) = (∇γøY T )(øX, øZ) (Corollary 3.19), then λo(øX)T (øY, øZ) = λo(øY )T (øX, øZ). Hence, λo(η)T (Y , Z) = 0. (3.13) Then, the result follows from (3.12) and (3.13). (b) can be proved similarly. � Theorem 3.29. For a Finsler manifold (M,L), the following assertions are equivalent : (a) (M,L) is Sv-recurrent. (b) The v-curvature tensor S vanishes identically. (c) (M,L) is Sv-recurrent of the second order. Proof. (a) =⇒ (b) : If (M,L) is Sv-recurrent, then by Definition 2.7(a) we have (∇γøWS)(øX, øY, øZ) = λ(øW )S(øY, øX)øZ, from which, by setting øZ = øη, taking into account the fact that S(øX, øY )øη = 0 and that Koγ = idπ−1(TM), the result follows. (b) =⇒ (a) : Trivial. (b) =⇒ (c) : Trivial. (c) =⇒ (b) : If the given manifold (M,L) is Sv-recurrent of the second order, then by Definition 2.7(b) we get Θ(øX, øY )S(øZ, øV )øW = ( ∇ S)(øY, øX, øZ, øV, øW ) = ∇γøY (∇γøXS)(øZ, øV, øW )− (∇γ∇γøY øXS)(øZ, øV, øW )− −(∇γøXS)(∇γøY øZ, øV, øW )− (∇γøXS)(øZ,∇γøY øV, øW )− −(∇γøXS)(øZ, øV,∇γøY øW ). By substituting øZ = øη = øW in the above equation and using Lemma 3.25 and the fact that S(øX, øY )øη = 0, we get S(øX, øY )øZ = −S(øZ, øY )øX and S(øX, øY )øZ = −S(øX, øZ)øY. From this, together with the identity SøX,øY,øZS(øX, øY )øZ = 0, the v-curvature tensor S vanishes identically. � In view of the above theorem we have : Corollary 3.30. (a) An Sv-recurrent (resp. 
a second order Sv-recurrent) manifold (M,L) is S3-like, provided that dimM ≥ 4. (b) An Sv-recurrent (resp. a second order Sv-recurrent) manifold (M,L) is S4-like, provided that dimM ≥ 5. Theorem 3.31. If (M,L) is a P2-like Finsler manifold, then the v-curvature tensor S vanishes or the hv-curvature tensor P vanishes. In the later case, the h-covariant derivative of S vanishes. Proof. As (M,L) is P2-like, then P (X, Y , η, øW ) = α(η)T (X, Y , øW ) =: αoT (X, Y , øW ) and hence P̂ (øX, øY ) = αoT (X, Y ). (3.14) Now, setting øW = øη into (3.10), we get (∇γøY P̂ )(øZ, øX)− (∇γøX P̂ )(øZ, øY )− P (øZ, øX)øY + P (øZ, øY )øX− −P̂ (T (øX, øZ), øY ) + P̂ (T (øY, øZ), øX) = 0. Hence, g((∇γøY P̂ )(øZ, øX), øW )− g((∇γøXP̂ )(øZ, øY ), øW )− P (øZ, øX, øY, øW )+ +P (øZ, øY, øX, øW )− g(P̂ (T (øX, øZ), øY ), øW ) + g(P̂ (T (øY, øZ), øX), øW ) = 0. From which, together with (3.14) and Definition 2.10, taking into account the relation (∇γøY P̂ )(øZ, øX) = (∇γøY αo)T (øZ, øX) + αo(∇γøY T )(øZ, øX), we obtain g((∇γøY αo)T (øZ, øX) + αo(∇γøY T )(øZ, øX), øW )− g((∇γøXαo)T (øZ, øY )+ +αo(∇γøXT )(øZ, øY ), øW ) + α(X)T (Z, Y , øW )− α(W ) T (Z, øY,X)− α(Y )T (Z,X, øW ) +α(W ) T (X, øY, Z)− g(αoT (T (øX, øZ), øY ), øW ) + g(αoT (T (øY, øZ), øX), øW ) = 0. Therefore, using Corollary 3.19, (∇γøY α)(øη)T (øX, øZ, øW )− (∇γøXα)(øη)T (øY, øZ, øW ) = αoS(øX, øY, øW, øZ). It is to be observed that the left-hand side of the above equation is symmetric in the arguments øZ and øW while the right-hand side is skew-symmetric in the same arguments. Hence we have αoS(øX, øY, øW, øZ) = 0, (3.15) ε(øY )T (øX, øZ, øW )− ε(øX)T (øY, øZ, øW ) = 0, (3.16) where ε is the π-form defined by ε(øY ) := (∇γøY α)(øη). Now, If ε 6= 0, it follows from (3.16) that there exists a scalar function Υ such that T (øX, øY, øZ) = Υ ε(øX)ε(øY )ε(øZ). Consequently, T (øX, øY ) = Υ ε(øX)ε(øY )øε, where g(øε, øX) := ε(øX). 
From which S(øX, øY, øZ, øW ) = g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)) = Υ ε(øX)ε(øY )ε(øZ)ε(øW )g(øε, øε)−Υ ε(øX)ε(øY )ε(øZ)ε(øW )g(øε, øε) = 0. On the other hand, if the v-curvature tensor S 6= 0, then it follows from (3.15) that ε = 0 and α(øη) = 0. Hence, α = 0 and the hv-curvature tensor P vanishes. In this case, it follows from the identity (3.10) that ∇βøXS = 0. � Proposition 3.32. A P2-like Finsler manifold (M,L) is a P ∗-Finsler manifold. Proof. As (M,L) is P2-like, then from (3.14), we have P̂ (X, Y ) = αoT (X, Y ). Using Lemma 3.1, we get (∇βøηT )(øX, øY ) = α0T (øX, øY ), from which, by taking the trace, ∇βøηC = α0T , where α0 = bg(∇βη C,C) . Hence the result. � The next definition will be useful in the sequel. Definition 3.33. A π-tensor field Θ is positively homogenous of degree r in the directional argument y (symbolically, h(r)) if it satisfies the condition ∇γη Θ = rΘ, or Dγη Θ = rΘ. Lemma 3.34. Let (M,L) be a Finsler manifold, then we have (a) The Finsler metric g (the angular metric tensor ~) is homogenous of degree 0, (b) The v-curvature tensor S is homogenous of degree −2, (c) The hv-curvature tensor P is homogenous of degree −1, (d) The h-curvature tensor R is homogenous of degree 0, (e) The (h)hv-torsion tensor T is homogenous of degree −1, (f) The (v)hv-torsion tensor P̂ is homogenous of degree 0, (g) The (v)h-torsion tensor R̂ is homogenous of degree 1. Lemma 3.35. For every vector (1)π-form A, we have ∇ A)(øX, øY, øZ)− ( ∇ A)(øY, øX, øZ) = A(R(øX, øY )øZ)− R(øX, øY )A(øZ)+ γ bR(øX,øY ) A)(øZ). Deicke theorem [4] can be formulated globally as follows: Lemma 3.36. Let (M,L) be a Finsler manifold. The following assertions are equivalent: (a) (M,L) is Riemannian, (b) The (h)hv-torsion tensor T vanishes, (c) The π-form C vanishes. Theorem 3.37. Let (M,L) be Finsler manifold which is h-isotropic (of scalar k0) and Ch-recurrent (of recurrence vector λ0). 
Then, (M,L) is necessarily one of the following: (a) A Riemannian manifold of constant curvature, (b) A Finsler manifold of dimension 2, (c) A Finsler manifold of dimensions n ≥ 3 with vanishing scalar k0 and (∇βøXλo)(øY ) = (∇βøY λo)(øX). Proof. For a Ch-recurrent manifold, one can easily show that ∇ T )(øX, øY, øZ, øW )− ( ∇ T )(øY, øX, øZ, øW ) = = {(∇βøXλo)(øY )− (∇βøY λo)(øX)}T (øZ, øW ) =: Ψ(øX, øY )T (øZ, øW ). From which, taking into account Lemma 3.35, we obtain Ψ(øX, øY )T (øZ, øW ) = T (R(øX, øY )øZ, øW ) + T (øZ,R(øX, øY )øW )− −R(øX, øY )T (øZ, øW ) + (∇ γ bR(øX,øY ) T )(øZ, øW ). Now, as (M,L) is h-isotropic of scalar k0, then the h-curvature tensor R has the form R(øX, øY )øZ = k0{g(øX, øZ)øY − g(øY, øZ)øX} ; (n ≥ 3). From the above two equations, we get Ψ(øX, øY )T (øZ, øW ) = k0g(øX, øZ)T (øY, øW )− k0g(øY, øZ)T (øX, øW ) + k0g(øX, øW )T (øZ, øY )− −k0g(øY, øW )T (øZ, øX)− k0g(øX, T (øZ, øW ))øY + k0g(øY, T (øZ, øW ))øX +k0g(øX, øη)(∇γøY T )(øZ, øW )− k0g(øY, øη)(∇γøXT )(øZ, øW ). (3.17) Setting øY = øη, noting that T is h(−1) and g(øη, øη) = L2, we get Ψ(øX, øη)T (øZ, øW ) = −k0g(øη, øZ)T (øX, øW )− k0g(øη, øW )T (øZ, øX)− k0T (øX, øZ, øW )øη − −k0g(øX, øη)T (øZ, øW )− k0L 2(∇γøXT )(øZ, øW ). From which, we have g(øY, øη)Ψ(øX, øη)T (øZ, øW ) = −k0g(øY, øη)g(øη, øZ)T (øX, øW )− k0g(øY, øη)g(øη, øW )T (øZ, øX)− −k0g(øY, øη)T (øX, øZ, øW )øη− k0g(øY, øη)g(øX, øη)T (øZ, øW )− 2g(øY, øη)(∇γøXT )(øZ, øW ), (3.18) whereas g(øX, øη)Ψ(øY, øη)T (øZ, øW ) = −k0g(øX, øη)g(øη, øZ)T (øY, øW )− k0g(øX, øη)g(øη, øW )T (øZ, øY )− −k0g(øX, øη)T (øY, øZ, øW )øη− k0g(øX, øη)g(øY, øη)T (øZ, øW )− 2g(øX, øη)(∇γøY T )(øZ, øW ). (3.19) Now, from (3.17), (3.18) and (3.19), we obtain T (øZ, øW ){L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη)} = = UøX,øY k0L 2{~(øX, øZ)T (øY, øW ) + ~(øX, øW )T (øY, øZ)− φ(øY ) T (øX, øZ, øW )}. 
Taking the trace of both sides of the above equation, we get C(øZ){L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη)} = = 2k0L 2{~(øX, øZ)C(øY )− ~(øY, øZ)C(øX)}. (3.20) Setting øZ = øC, taking into account the fact that ~(øX, øC) = C(øX), the above equation reduces to C(øC){L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη)} = 0. Now, if C(øC) = g(øC, øC) = 0, then øC = 0 and so C = 0. Consequently, by Lemma 3.36, (M,L) is a Riemannian manifold of constant curvature. On the other hand, if (M,L) is not Riemannian, then we have L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη) = 0. From which, together with (3.20), we get k0{~(øX, øZ)C(øY )− ~(øY, øZ)C(øX)} = 0. (3.21) If k0 6= 0, then, by (3.21), ~(øX, øZ)C(øY ) = ~(øY, øZ)C(øX). Setting øY = øC, we get ~(øX, øZ) = 1 C(øX)C(øZ), which implies that dimM = 2. If k0 = 0, then R = 0 and (3.17) yields Ψ(øX, øY ) = 0, which means that (∇βøXλo)(øY ) = (∇βøY λo)(øX). � Now, we focus our attention to the interesting case (c) of the above theorem. In this case, the h-curvature tensor R = 0 and hence the (v)h-torsion tensor R̂ = 0. Therefore, the equation (deduced from (3.9)) (∇γøXR)(øY, øZ, øW ) + (∇βøY P )(øZ, øX, øW )− (∇βøZP )(øY, øX, øW )− −P (øZ, P (øY, øX)øη)øW +R(T (øX, øY ), øZ)øW − S(R(øY, øZ)øη, øX)øW+ +P (øY, P (øZ, øX)øη)øW − R(T (øX, øZ), øY )øW = 0. reduces to (∇βøY P )(øZ, øX, øW )− (∇βøZP )(øY, øX, øW )− −P (øZ, P̂ (øY, øX))øW + P (øY, P̂ (øZ, øX))øW = 0. Setting øW = øη, we get (∇βøY P̂ )(øZ, øX)− (∇βøZP̂ )(øY, øX)− P̂ (øZ, P̂ (øY, øX)) + P̂ (øY, P̂ (øZ, øX)) = 0. (3.22) Since (M,L) is Ch-recurrent, then, by Proposition 3.17, the (v)hv-torsion tensor P̂ satisfies the relations (∇βøZP̂ )(øX, øY ) = (Koλo(øZ) + ∇βøZKo)T (øX, øY ) and P̂ (øX, øY ) = λo(øη)T (øX, øY ) = KoT (øX, øY ). From these, together with (3.22), we get (Koλo(øY ) +∇βøYKo)T (øZ, øX)− (Koλo(øZ) +∇βøZKo)T (øX, øY )− −K2oT (øZ, T (øX, øY )) +K oT (øY, T (øX, øZ)) = 0. 
Hence, by Corollary 3.19, K2oS(øY, øZ, øX, øW ) = UøY,øZ{(Koλo(øY ) +∇βøYKo)T (øX, øZ, øW )}. As S(øY, øZ, øX, øW ) is skew-symmetric in the arguments øX and øW while the right-hand side is symmetric in the same arguments, we obtain K2oS(øY, øZ, øX, øW ) = 0, (3.23) UøY,øZ{(Koλo(øY ) +∇βøYKo)T (øZ, øX, øW )} = 0. (3.24) It follows from (3.23) and () that P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ). On the other hand, if Ko 6= 0, then the v-curvature tensor S vanishes from (3.23). Next, it is seen from (3.24) that, if V(øY ) := Koλo(øY ) +∇βøYKo 6= 0, then there exists a scalar function Υ = T (øX,øZ,øW )T (øX,øY,øZ)T (øY,øZ,øW ) (T (øX,øY,øW ))2(V(øZ))3 such that T (øX, øY, øW ) = ΥV(øX)V(øY )V(øW ). Summing up, we have Theorem 3.38. Let (M,L) be a Finsler manifold of dimensions n ≥ 3. If (M,L) is h-isotropic and Ch-recurrent, then (a) the recurrence vector λo satisfies : (∇βøXλo)(øY ) = (∇βøY λo)(øX), (b) the h-curvature tensor R = 0 and the (v)h-torsion tensor R̂ = 0, (c) the hv-curvature tensor P has the property that P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ), (d) the (v)hv-torsion tensor P̂ (øX, øY ) = KoT (øX, øY ). Moreover, if Ko 6= 0, then (e) the v-curvature tensor S vanishes, (f) the (h)hv-torsion tensor T satisfies : T (øX, øY, øW ) = ΥV(øX)V(øY )V(øW ). By Definition 2.10 and Theorem 3.38, we immediately have : Corollary 3.39. A Finsler manifold (M,L) of dimension n ≥ 3 which is h-isotropic and Ch-recurrent is necessarily P2-like. Now, we define an operator P which aids us to investigate the R3-like manifolds. Definition 3.40. (a) If ω is a π-tensor field of type (1,p), then P · ω is a π-tensor field of the same type defined by : (P · ω)(øX1, ..., øXp) := φ(ω(φ(øX1), ..., φ(øXp))), where φ is the vector π-form defined by (3.1). (b) If ω is a π-tensor field of type (0,p), then P · ω is a π-tensor field of the same type defined by : (P · ω)(øX1, ..., øXp) := ω(φ(øX1), ..., φ(øXp)). Remark 3.41. 
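Since $\phi(\phi(\overline{X})) = \phi(\overline{X})$ for every $\overline{X} \in \mathfrak{X}(\pi(M))$ (Lemma 3.2), the operator $P$ is a projector (i.e. $P\cdot(P\cdot\omega) = P\cdot\omega$).

Definition 3.42. A π-tensor field $\omega$ is said to be indicatory if it satisfies the condition $P\cdot\omega = \omega$.

The following result characterizes the indicatory property for certain types of π-tensor fields:

Lemma 3.43.
(a) A vector (2)π-form $\omega$ is indicatory if, and only if, $\omega(\overline{X},\overline{\eta}) = 0 = \omega(\overline{\eta},\overline{X})$ and $g(\omega(\overline{X},\overline{Y}),\overline{\eta}) = 0$.
(b) A scalar (2)π-form $\omega$ is indicatory if, and only if, $\omega(\overline{X},\overline{\eta}) = 0 = \omega(\overline{\eta},\overline{X})$.

Proof. (a) Let $\omega$ be a vector (2)π-form. By Definition 3.40(a) and taking into account (3.1), we get
$$\begin{aligned}
(P\cdot\omega)(\overline{X},\overline{Y}) &= \phi(\omega(\phi(\overline{X}),\phi(\overline{Y})))\\
&= \phi\{\omega(\overline{X} - L^{-1}\ell(\overline{X})\overline{\eta},\ \overline{Y} - L^{-1}\ell(\overline{Y})\overline{\eta})\}\\
&= \phi\{\omega(\overline{X},\overline{Y}) - L^{-1}\ell(\overline{Y})\,\omega(\overline{X},\overline{\eta}) - L^{-1}\ell(\overline{X})\,\omega(\overline{\eta},\overline{Y}) + L^{-2}\ell(\overline{X})\ell(\overline{Y})\,\omega(\overline{\eta},\overline{\eta})\}\\
&= \omega(\overline{X},\overline{Y}) - L^{-2}g(\omega(\overline{X},\overline{Y}),\overline{\eta})\,\overline{\eta}\\
&\quad - \phi\{L^{-1}\ell(\overline{Y})\,\omega(\overline{X},\overline{\eta}) + L^{-1}\ell(\overline{X})\,\omega(\overline{\eta},\overline{Y}) - L^{-2}\ell(\overline{X})\ell(\overline{Y})\,\omega(\overline{\eta},\overline{\eta})\}.
\end{aligned} \tag{3.25}$$
Now, if $\omega(\overline{X},\overline{\eta}) = 0 = \omega(\overline{\eta},\overline{X})$ and $g(\omega(\overline{X},\overline{Y}),\overline{\eta}) = 0$, then (3.25) implies that $(P\cdot\omega)(\overline{X},\overline{Y}) = \omega(\overline{X},\overline{Y})$ and hence $\omega$ is indicatory.

On the other hand, if $\omega$ is indicatory, then $\omega(\overline{X},\overline{Y}) = \phi(\omega(\phi(\overline{X}),\phi(\overline{Y})))$. Setting $\overline{X} = \overline{\eta}$ (resp. $\overline{Y} = \overline{\eta}$) and taking into account the fact that $\phi(\overline{\eta}) = 0$ (Lemma 3.2), we get $\omega(\overline{\eta},\overline{Y}) = 0$ (resp. $\omega(\overline{X},\overline{\eta}) = 0$). From this, together with $(P\cdot\omega)(\overline{X},\overline{Y}) = \omega(\overline{X},\overline{Y})$, Equation (3.25) implies that $L^{-2}g(\omega(\overline{X},\overline{Y}),\overline{\eta})\,\overline{\eta} = 0$. Consequently, $g(\omega(\overline{X},\overline{Y}),\overline{\eta}) = 0$.
(b) The proof is similar to that of (a) and we omit it. □

Proposition 3.44. For a Finsler manifold (M,L), the following tensors are indicatory:
(a) The π-tensor field $\phi$,
(b) The mixed torsion tensor $T$,
(c) The v-curvature tensor $S$,
(d) The angular metric tensor $\hbar$,
(e) The π-tensor field $P\cdot\omega$ for every π-tensor field $\omega$.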
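The projector property of Remark 3.41 and the indicatory property of the angular metric tensor can be checked numerically at a single point of a fibre. The following is only a sketch under stated assumptions: it uses $\phi(\overline{X}) = \overline{X} - L^{-1}\ell(\overline{X})\overline{\eta}$ as it appears in the proof of Lemma 3.43, it assumes $\ell(\overline{X}) = L^{-1}g(\overline{X},\overline{\eta})$ (consistent with $\ell(\overline{\eta}) = L$ and $g(\overline{\eta},\overline{\eta}) = L^2$), and the matrix `g` and the vector `eta` are hypothetical sample values, not data from the paper.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

# Hypothetical sample data at one point: a symmetric positive definite
# matrix standing in for g and a nonzero vector standing in for eta.
g = [[Fraction(2), Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(3), Fraction(1)],
     [Fraction(0), Fraction(1), Fraction(2)]]
eta = [Fraction(1), Fraction(2), Fraction(1)]
n = len(eta)

# Components of the 1-form g(eta, .) and the value L^2 = g(eta, eta).
g_eta = [sum(g[i][j] * eta[j] for j in range(n)) for i in range(n)]
L2 = sum(eta[i] * g_eta[i] for i in range(n))

# Matrix of phi(X) = X - L^{-1} l(X) eta, assuming l(X) = L^{-1} g(X, eta).
phi = [[(1 if i == j else 0) - eta[i] * g_eta[j] / L2 for j in range(n)]
       for i in range(n)]

# Angular metric tensor hbar = g - l (x) l under the same assumption.
hbar = [[g[i][j] - g_eta[i] * g_eta[j] / L2 for j in range(n)]
        for i in range(n)]

def project(omega):
    """(P . omega)(X, Y) = omega(phi X, phi Y) for a scalar (2)pi-form."""
    return matmul(transpose(phi), matmul(omega, phi))

# phi applied to eta; expected to be the zero vector.
phi_eta = [sum(phi[i][j] * eta[j] for j in range(n)) for i in range(n)]

print(matmul(phi, phi) == phi)   # phi is idempotent
print(project(hbar) == hbar)     # hbar is indicatory
print(project(g) == hbar)        # projecting g gives hbar
```

Exact rational arithmetic is used so that the equalities are tested without rounding error. In this finite-dimensional model the last check mirrors the identity $g(\phi(\overline{X}),\phi(\overline{Y})) = \hbar(\overline{X},\overline{Y})$ used later in the proof of Proposition 3.52.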
Now, we define the following π-tensor fields: F : F (X, Y ) := 1 {Rich(X, Y )− Schg(X,Y ) 2(n−1) Fo : g(Fo(øX), øY ) := F (øX, øY ), F a : F a(øX) := F (øη, øX), F b : F b(øX) := F (øX, øη), m : m(øX, øY ) := (P · F )(øX, øY ), mo : g(mo(øX), øY ) := m(øX, øY ), a : a(øX) := L−1(P · F a)(øX), øa : g(øa, øY ) := a(øX), b : b(øX) := L−1(P · F b)(øX), øb : g(øb, øX) := b(øX), c : c := L−2F (øη, øη), R̂ : R̂(øX, øY ) := R(øX, øY )øη, H : H(øX) := R(øη, øX)øη = R̂(øη, øX).   (3.26) Remark 3.45. One can show that m, mo, a and b are indicatory and H(øη) = 0. Proposition 3.46. If (M,L) is an R3-like Finsler manifold, then the π-tensor field F can be written in the form F (øX, øY ) = m(øX, øY ) + ℓ(øX)a(øY ) + ℓ(øY )b(øX) + c ℓ(øX)ℓ(øY ). (3.27) Proof. The proof follows from Definitions 2.14 and 3.40(b), taking into account Equations (3.1) and (3.26). In more details : (P · F )(øX, øY ) = F (φ(øX), φ(øY )) = F (øX − L−1ℓ(øX)øη, øY − L−1ℓ(øY )øη) = F (øX, øY )− L−1ℓ(øY )F (øX, øη)− −L−1ℓ(øX)F (øη, øY ) + L−2ℓ(øX)ℓ(øY )F (øη, øη) = F (øX, øY )− L−1ℓ(øY ){(P · F b)(øX) + L−1ℓ(øX)F (øη, øη)}− −L−1ℓ(øX){(P · F a)(øY ) + L−1ℓ(øY )F (øη øη)}+ L−2ℓ(øX)ℓ(øY )F (øη, øη) = F (øX, øY )− ℓ(øX)a(øY )− ℓ(øY )b(øX)− c ℓ(øX)ℓ(øY ). � Remark 3.47. One can show that the π-tensor fields a and b satisfy the following relations F a(øX) = L{a(øX) + c ℓ(øX)}, F b(øX) = L{b(øX) + c ℓ(øX)}. (3.28) Proposition 3.48. In an R3-like Finsler manifold (M,L), we have : (a) R(øX, øY )øZ = g(øX, øZ)Fo(øY )+F (øX, øZ)øY−g(øY, øZ)Fo(øX)−F (øY, øZ)øX. (b) R̂(øX, øY ) = g(øX, øη)Fo(øY )+F (øX, øη)øY −g(øY, øη)Fo(øX)−F (øY, øη)øX. (c) H(øY ) = L2Fo(øY ) + c L 2øY − g(øY, øη)Fo(øη)− F (øY, øη)øη. (d) Fo(øX) = mo(øX) + øa ℓ(øX) + L −1b(øX)øη + c L−1ℓ(øX)øη. Consequently, (e) R̂(øX, øY ) = L{ℓ(øX)(mo(øY ) + c φ(øY )) + b(øX)φ(øY )}− − L{ℓ(øY )(mo(øX) + c φ(øX)) + b(øY )φ(øX)}. (f) H(øY ) = L2{mo(øY ) + c φ(øY )}. Proof. 
(a) Since (M,L) is an R3-like manifold, then by Definition 2.14, we have
$$R(\overline{X},\overline{Y},\overline{Z},\overline{W}) = g(\overline{X},\overline{Z})F(\overline{Y},\overline{W}) - g(\overline{Y},\overline{Z})F(\overline{X},\overline{W}) + g(\overline{Y},\overline{W})F(\overline{X},\overline{Z}) - g(\overline{X},\overline{W})F(\overline{Y},\overline{Z}).$$
From this, using the fact that $g(F_o(\overline{X}),\overline{Y}) = F(\overline{X},\overline{Y})$ and that the Finsler metric $g$ is non-degenerate, the result follows.
(b) Follows from (a) by setting $\overline{Z} = \overline{\eta}$.
(c) Follows from (b) by setting $\overline{X} = \overline{\eta}$.
(d) By (3.27) and (3.26), we get
$$g(F_o(\overline{X}),\overline{Y}) = g(m_o(\overline{X}),\overline{Y}) + g(\overline{a},\overline{Y})\,\ell(\overline{X}) + L^{-1}b(\overline{X})\,g(\overline{\eta},\overline{Y}) + c\,L^{-1}\ell(\overline{X})\,g(\overline{\eta},\overline{Y}).$$
Hence, the result follows from the non-degeneracy of $g$.
(e) Follows by substituting $F_o(\overline{X})$ (from (d)) and $F^b(\overline{X})$ (from (3.28)) into (b).
(f) Follows from (e) by setting $\overline{X} = \overline{\eta}$, taking into account Remark 3.45 and the fact that $\ell(\overline{\eta}) = L$. □

Remark 3.49. In view of (3.26) and Lemma 3.2, Definition 2.13(a) can be reformulated as follows: A Finsler manifold (M,L) is of scalar curvature if the π-tensor field $H$ satisfies the relation
$$H(\overline{X}) = L^2\kappa\,\phi(\overline{X}),$$
where $\kappa$ is a scalar function on TM.

Definition 3.50. A Finsler manifold (M,L) is said to be of perpendicular scalar (or of p-scalar) curvature if the h-curvature tensor $R$ satisfies the condition
$$(P\cdot R)(\overline{X},\overline{Y},\overline{Z},\overline{W}) = R_o\{\hbar(\overline{X},\overline{Z})\hbar(\overline{Y},\overline{W}) - \hbar(\overline{X},\overline{W})\hbar(\overline{Y},\overline{Z})\}, \tag{3.29}$$
where $R_o$ is a function called the perpendicular scalar curvature.

Definition 3.51. A Finsler manifold (M,L) is said to be of s-ps curvature if (M,L) is both of scalar curvature and of p-scalar curvature.

Proposition 3.52. If $m_o(\overline{X}) = t\,\phi(\overline{X})$, then an R3-like Finsler manifold is a Finsler manifold of s-ps curvature.

Proof. Under the given assumption and taking into account Proposition 3.48(f), we have
$$H(\overline{X}) = L^2\kappa\,\phi(\overline{X}), \quad\text{with } \kappa = t + c.$$
Thus, the considered manifold is of scalar curvature. Now, we prove that the given manifold is of p-scalar curvature.
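Applying the projection $P$ to the h-curvature tensor $R$ of an R3-like manifold, we get
$$\begin{aligned}
(P\cdot R)(\overline{X},\overline{Y},\overline{Z},\overline{W}) &= R(\phi(\overline{X}),\phi(\overline{Y}),\phi(\overline{Z}),\phi(\overline{W}))\\
&= g(\phi(\overline{X}),\phi(\overline{Z}))(P\cdot F)(\overline{Y},\overline{W}) + g(\phi(\overline{Y}),\phi(\overline{W}))(P\cdot F)(\overline{X},\overline{Z})\\
&\quad - g(\phi(\overline{Y}),\phi(\overline{Z}))(P\cdot F)(\overline{X},\overline{W}) - g(\phi(\overline{X}),\phi(\overline{W}))(P\cdot F)(\overline{Y},\overline{Z})\\
&= g(\phi(\overline{X}),\phi(\overline{Z}))\,m(\overline{Y},\overline{W}) + g(\phi(\overline{Y}),\phi(\overline{W}))\,m(\overline{X},\overline{Z})\\
&\quad - g(\phi(\overline{Y}),\phi(\overline{Z}))\,m(\overline{X},\overline{W}) - g(\phi(\overline{X}),\phi(\overline{W}))\,m(\overline{Y},\overline{Z}).
\end{aligned} \tag{3.30}$$
Since
$$\begin{aligned}
g(\phi(\overline{X}),\phi(\overline{Y})) &= g(\phi(\overline{X}),\ \overline{Y} - L^{-1}\ell(\overline{Y})\overline{\eta})\\
&= g(\phi(\overline{X}),\overline{Y}) - L^{-1}\ell(\overline{Y})\,g(\phi(\overline{X}),\overline{\eta})\\
&= \hbar(\overline{X},\overline{Y}) - L^{-1}\ell(\overline{Y})\,\hbar(\overline{X},\overline{\eta}) = \hbar(\overline{X},\overline{Y}),
\end{aligned}$$
then, by using again the given assumption ($m_o = t\,\phi \Longrightarrow m = t\,\hbar$), Equation (3.30) reduces to
$$\begin{aligned}
(P\cdot R)(\overline{X},\overline{Y},\overline{Z},\overline{W}) &= \hbar(\overline{X},\overline{Z})\,m(\overline{Y},\overline{W}) + \hbar(\overline{Y},\overline{W})\,m(\overline{X},\overline{Z}) - \hbar(\overline{Y},\overline{Z})\,m(\overline{X},\overline{W}) - \hbar(\overline{X},\overline{W})\,m(\overline{Y},\overline{Z})\\
&= 2t\{\hbar(\overline{X},\overline{Z})\hbar(\overline{Y},\overline{W}) - \hbar(\overline{Y},\overline{Z})\hbar(\overline{X},\overline{W})\}.
\end{aligned}$$
Therefore, by taking $R_o = 2t$, we have
$$(P\cdot R)(\overline{X},\overline{Y},\overline{Z},\overline{W}) = R_o\{\hbar(\overline{X},\overline{Z})\hbar(\overline{Y},\overline{W}) - \hbar(\overline{Y},\overline{Z})\hbar(\overline{X},\overline{W})\}.$$
Consequently, the given manifold is of p-scalar curvature. □

Theorem 3.53. If an R3-like Finsler manifold (M,L) is of p-scalar curvature, then it is of s-ps curvature.

Proof. Since the considered manifold is R3-like, then, by the same procedure as in the proof of Proposition 3.52, we have
$$(P\cdot R)(\overline{X},\overline{Y},\overline{Z},\overline{W}) = \hbar(\overline{X},\overline{Z})\,m(\overline{Y},\overline{W}) + \hbar(\overline{Y},\overline{W})\,m(\overline{X},\overline{Z}) - \hbar(\overline{Y},\overline{Z})\,m(\overline{X},\overline{W}) - \hbar(\overline{X},\overline{W})\,m(\overline{Y},\overline{Z}). \tag{3.31}$$
On the other hand, since the considered manifold is of p-scalar curvature, the h-curvature tensor satisfies
$$(P\cdot R)(\overline{X},\overline{Y},\overline{Z},\overline{W}) = R_o\{\hbar(\overline{X},\overline{Z})\hbar(\overline{Y},\overline{W}) - \hbar(\overline{Y},\overline{Z})\hbar(\overline{X},\overline{W})\}. \tag{3.32}$$
Now, from Equations (3.31) and (3.32), we obtain
$$\mathfrak{U}_{\overline{X},\overline{Y}}\{R_o\,\hbar(\overline{X},\overline{Z})\hbar(\overline{Y},\overline{W}) - \hbar(\overline{X},\overline{Z})\,m(\overline{Y},\overline{W}) - \hbar(\overline{Y},\overline{W})\,m(\overline{X},\overline{Z})\} = 0.$$
Using (3.26) and the non-degeneracy of the metric tensor $g$, the above equation reduces to
$$\mathfrak{U}_{\overline{X},\overline{Y}}\{R_o\,\hbar(\overline{X},\overline{Z})\phi(\overline{Y}) - \hbar(\overline{X},\overline{Z})\,m_o(\overline{Y}) - m(\overline{X},\overline{Z})\phi(\overline{Y})\} = 0. \tag{3.33}$$
Since the π-tensor fields $\phi$, $m$ and $m_o$ are indicatory, then
$$\begin{aligned}
\mathrm{Tr}\{\overline{Y} \longmapsto \hbar(\overline{X},\overline{Y})\phi(\overline{Z})\} &= g(\overline{X},\phi(\overline{Z})) = \hbar(\overline{X},\overline{Z}),\\
\mathrm{Tr}\{\overline{Y} \longmapsto \hbar(\overline{X},\overline{Y})\,m_o(\overline{Z})\} &= m(\overline{X},\overline{Z}),\\
\mathrm{Tr}\{\overline{Y} \longmapsto m(\overline{X},\overline{Y})\phi(\overline{Z})\} &= m(\overline{X},\overline{Z}).
\end{aligned}$$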
Consequently, if we take the trace of both sides of Equation (3.33), making use of Lemma 3.43, we get
$$(n-2)R_o\,\hbar(\overline{X},\overline{Z}) - (n-3)\,m(\overline{X},\overline{Z}) - (n-1)\,t\,\hbar(\overline{X},\overline{Z}) = 0,$$
where $t := \frac{1}{n-1}\mathrm{Tr}(m_o)$. From this, using (3.26) and Lemma 3.2, we get
$$(n-2)R_o\,\phi - (n-3)\,m_o - (n-1)\,t\,\phi = 0. \tag{3.34}$$
Again, taking the trace of the above equation (with $\mathrm{Tr}(\phi) = n-1$ and $\mathrm{Tr}(m_o) = (n-1)t$), we obtain
$$(n-1)(n-2)(R_o - 2t) = 0.$$
Substituting the above relation into (3.34), we get $m_o = t\,\phi$. Hence, by Proposition 3.52, the result follows. □

Theorem 3.54. If an R3-like Finsler manifold (M,L) is of scalar curvature, then it is of s-ps curvature.

Proof. Since the given manifold is R3-like, the π-tensor $H$ is given by (cf. Proposition 3.48):
$$H(\overline{X}) = L^2\{m_o(\overline{X}) + c\,\phi(\overline{X})\}. \tag{3.35}$$
And since the considered manifold is of scalar curvature, then
$$H(\overline{X}) = L^2\kappa\,\phi(\overline{X}). \tag{3.36}$$
From Equations (3.35) and (3.36), we deduce that $m_o(\overline{X}) = (\kappa - c)\,\phi(\overline{X}) =: t\,\phi(\overline{X})$. Hence, by Proposition 3.52, the result follows. □

Now, let us define the π-tensor field
$$\Psi(\overline{X},\overline{Y},\overline{Z},\overline{W}) = R(\overline{X},\overline{Y},\overline{Z},\overline{W}) - \frac{1}{n-2}\,\mathfrak{U}_{\overline{X},\overline{Y}}\{g(\overline{X},\overline{Z})\,Ric^h(\overline{Y},\overline{W}) + g(\overline{Y},\overline{W})\,Ric^h(\overline{X},\overline{Z}) - r\,g(\overline{X},\overline{Z})g(\overline{Y},\overline{W})\}, \tag{3.37}$$
where $r = \frac{1}{n-1}Sc^h$. From Definition 2.14 and (3.37), we immediately obtain

Theorem 3.55. An R3-like Finsler manifold is characterized by $\Psi(\overline{X},\overline{Y},\overline{Z},\overline{W}) = 0$.

Since the tensor field $\Psi$ in the above theorem has the same form as the Weyl conformal tensor in Riemannian geometry, we draw the following

Theorem 3.56. An R3-like Riemannian manifold is conformally flat.

Remark 3.57. It should be noted that some important results of [8], [9], [11], [13], [19], [20], etc. (obtained in local coordinates) are retrieved from the above mentioned global results (when localized).

Appendix. Local formulae

For the sake of completeness, we present in this appendix a brief and concise survey of the local expressions of some important geometric objects and the local definitions of the special Finsler manifolds treated in the paper.
Let (U, (xi)) be a system of local coordinates on M and (π−1(U), (xi, yi)) the associated system of local coordinates on TM . We use the following notations : (∂i) := ( ): the natural basis of TxM, x ∈M , (∂̇i) := ( ): the natural basis of Vu(TM), u ∈ TM , (∂i, ∂̇i): the natural basis of Tu(TM), (ø∂i): the natural basis of the fiber over u in π −1(TM) (ø∂i is the lift of ∂i at u). To a Finsler manifold (M,L), we associate the geometric objects : gij := ∂̇i∂̇jL 2 = ∂̇i∂̇jE: the Finsler metric tensor, Cijk := ∂̇k gij : the Cartan tensor, ~ij := gij − ℓiℓj (ℓi := ∂L/∂y i): the angular metric tensor, Gh: the components of the canonical spray, Ghi := ∂̇iG Ghij := ∂̇jG i = ∂̇j ∂̇iG (δi) := (∂i −G i ∂̇h): the basis of Hu(TM) adapted to G (δi, ∂̇i): the basis of Tu(TM) = Hu(TM)⊕ Vu(TM) adapted to G We have : γ(ø∂i) = ∂̇i, ρ(∂i) = ø∂i, ρ(∂̇i) = 0, ρ(δi) = ø∂i, β(ø∂i) = δi, J(∂i) = ∂̇i, J(∂̇i) = 0, J(δi) = ∂̇i, h := βoρ = dxi ⊗ ∂i −G j ⊗ ∂̇i v := γoK = dy i ⊗ ∂̇i +G j ⊗ ∂̇i. We define : γhij := ghℓ(∂i gℓj + ∂j giℓ − ∂ℓ gij), Chij := ghℓ(∂̇i gℓj + ∂̇j giℓ − ∂̇ℓ gij) = ghℓ ∂̇i gjℓ = g hℓCijℓ, Γhij := ghℓ(δi gℓj + δj giℓ − δℓ gij) . Then, we have : • The canonical spray G: Gh = 1 γhij y • The Barthel connection Γ: Ghi = ∂̇iG h = Γhijy j = Ghijy • The Cartan connection CΓ: ( Γhij, G i , C The associated h-covariant (resp. v-covariant) derivative is denoted by p (resp. |), where Ki j|k := δkK mk −K jk and K j |k := ∂̇kK mk −K • The Berwald connection BΓ: ( Ghij, G i , 0). The associated h-covariant (resp. v-covariant) derivative is denoted by p(resp. 
where Ki := δkK mk −K jk and K := ∂̇kK We also have Ghij = Γ ij + C ij |k y k = Γhij + C ij |o, where C ij |o = C ij |k y For the Cartan connection, we have : (v)h-torsion : Rijk = δkG j − δjG k = Ujk{δkG (v)hv-torsion : P ijk = G jk − Γ jk = C jk|my m = C i jk|0, (h)hv-torsion : C ijk = 1/2{g ri∂̇rgjk}, h-curvature : Rihjk = Ujk{δkΓ hj + Γ mk} − C hv-curvature : P ihjk = ∂̇kΓ hj − C + C ihmP v-curvature : Sihjk = C mj − C mk = Ujk{C For the Berwald connection, we have : (v)h-torsion : R∗ijk = δkG j − δjG k = Ujk{δkG h-curvature : R∗ihjk = Ujk{δkG hj +G hv-curvature : P ∗ihjk = ∂̇kG hj =: G In the following, we give the local definitions of the special Finsler spaces treated in the paper. For each special Finsler space (M,L), we set its name, its defining property and a selected reference in which the local definition is located: • Rimaniann manifold [22] : gij(x, y) ≡ gij(x) ⇐⇒ Cijk = 0 ⇐⇒ Ci := C ik = 0 (Deicke’s theorem [4]). • Minkowaskian manifold [22]: gij(x, y) ≡ gij(y) ⇐⇒ C = 0 and Rhijk = 0. • Berwald manifold [22]: Γhij(x, y) ≡ Γ ij(x) (i.e. ∂̇kΓ ij = 0) ⇐⇒ C • Ch-recurrent manifold [13]: Chij|k = µkChij , where µj is a covariant vector field. • P ∗-Finsler manifold [7]: Ch ij|0 = λ(x, y)C where λ(x, y) = PiC ; Pi := P ik = C ik|0 = Ci|0 and C 2 = CiC i 6= 0. • Cv-recurrent manifold [13]: C ijk|l = λl C jk or Cijk|l = λl Cijk. • C0-recurrent manifold [13]: C ijk = λl C jk or Cijk = λl Cijk. • Semi-C-reducible manifold (dimM ≥ 3) [18]: Cijk = (n+ 1) (~ijCk + ~jkCi + ~kiCj) + CiCjCk, C 2 6= 0, where µ and τ are scalar functions satisfying µ+ τ = 1. • C-reducible manifold (dimM ≥ 3) [15]: Cijk = (~ijCk + ~jkCi + ~kiCj). • C2-like manifold (dimM ≥ 2) [17]: Cijk = CiCjCk, C 2 6= 0. • quasi-C-reducible manifold (dimM ≥ 3) [23]: Cijk = AijCk + AjkCi + AkiCj , where Aij(x, y) is a symmetric tensor field satisfying Aijy i = 0. • S3-like manifold (dimM ≥ 4) [6]: Slijk = (n−1)(n−2) {~ik~lj − ~ij~lk}, where S is the vertical scalar curvature. 
• S4-like manifold (dimM ≥ 5) [6]: Slijk = ~ljFik − ~lkFij + ~ikFlj − ~ijFlk, where Fij := {Sij − 2(n−2) S~ij}; Sij being the vertical Ricci tensor. • Sv-recurrent manifold [20], [11]: Shijk|m = λmShijk, where λj(x, y) is a covariant vector field. • Second order Sv-recurrent manifold [20], [11]: Shijk|m|n = ΘmnShijk, where Θij(x, y) is a covariant tensor field. • Landsberg manifold [7]: P hkji y k = 0 ⇐⇒ (∂̇iΓ k = 0 ⇐⇒ Ch yk = 0. • General Landsberg manifold [10]: P rijry i = 0 ⇐⇒ Cj|o = 0. • P -symmetric manifold [19]: Phijk = Phikj. • P2-like manifold (dimM ≥ 3) [14]: Phijk = αhCijk − αiChjk, where αk(x, y) is a covariant vector field. • P -reducible manifold (dimM ≥ 3) [19]: Pijk = (~ij Pk + ~jk Pi + ~ki Pj), where Pijk = ghiP • h-isotropic manifold (dimM ≥ 3) [13]: Rhijk = ko{ghjgik − ghkgij}, for some scalar ko, where Rhijk = gilR • Manifold of scalar curvature [21]: Rijkl y iyk = kL2~jl, for some function k : TM −→ R . • Manifold of constant curvature [21]: the function k in the above definition is constant. • Manifold of perpendicular scalar (or of p-scalar ) curvature [8], [9]: P ·Rhijk := ~ k Rlmnr = Ro{~ik~hj − ~ij~hk}, where Ro is a function called a perpendicular scalar curvature. • Manifold of s-ps curvature [8], [9]: (M,L) is both of scalar curvature and of p-scalar curvature. • R3-like manifold (dimM ≥ 4) [8]: Rhijk = ghjFik − ghkFij + gikFhj − gijFhk, where Fij := {Rij − r gij}; Rij := R ijh, r := References [1] H. AKbar-Zadeh, Les espaces de Finsler et certaines de leurs généralisations, Ann. Ec. Norm. Sup., Série 3, 80 (1963), 1–79. [2] , Sur les espaces de Finsler isotropes, C. R. Acad. Sc. Paris, série A (1979), 53–56. [3] , Initiation to global Finsler geometry, Elsevier, 2006. [4] F. Brickell, A new proof of Deicke’s theorem on homogeneous functions, Proc. Amer. Math. Soc., 16 (1965), 190-191. [5] P. Dazord, Propriétés globales des géodésiques des espaces de Finsler, Thèse d’Etat, (575) Publ. Dept. Math. Lyon, 1969. [6] F. 
Ikedo, On S3- and S4-like Finsler spaces with the T-tensor of a special form, Tensor, N. S., 35 (1981), 345–351.
[7] H. Izumi, On P∗-Finsler spaces, I, Memoirs of the Defense Academy, Japan, No. 4, XVI (1976), 133–138.
[8] H. Izumi and T. N. Srivastava, On R3-like Finsler spaces, Tensor, N. S., 32 (1978), 339–349.
[9] H. Izumi and M. Yoshida, On Finsler spaces of perpendicular scalar curvature, Tensor, N. S., 32 (1978), 219–224.
[10] M. Kitayama, Geometry of transformations of Finsler metrics, Ph. D. Thesis, Hokkaido University of Education, Kushiro, Japan, 2000.
[11] M. Kitayama, Indicatrices of Randers change, 9th International Conference of Tensor Society, Sapporo, Japan, September 4-8, 2006.
[12] J. Klein and A. Voutier, Formes extérieures génératrices de sprays, Ann. Inst. Fourier, Grenoble, 18(1) (1968), 241–260.
[13] M. Matsumoto, On h-isotropic and Ch-recurrent Finsler spaces, J. Math. Kyoto Univ., 11 (1971), 1–9.
[14] M. Matsumoto, On Finsler spaces with curvature tensors of some special forms, Tensor, N. S., 22 (1971), 201–204.
[15] M. Matsumoto, On C-reducible Finsler spaces, Tensor, N. S., 24 (1972), 29–37.
[16] M. Matsumoto, Foundations of Finsler geometry and special Finsler spaces, Kaiseisha Press, Otsu, Japan, 1986.
[17] M. Matsumoto and S. Numata, On semi-C-reducible Finsler spaces with constant coefficients and C2-like Finsler spaces, Tensor, N. S., 34 (1980), 218–222.
[18] M. Matsumoto and C. Shibata, On semi-C-reducibility, T-tensor = 0 and S4-likeness of Finsler spaces, J. Math. Kyoto Univ., 19 (1979), 301–314.
[19] M. Matsumoto and Shimada, On Finsler spaces with the curvature tensors Phijk and Shijk satisfying special conditions, Rep. Math. Phys., 12 (1977), 77–87.
[20] A. Moór, Über Finsler Räume von Zweifach Rekurrenter Krümmung, Acta Math. Acad. Sci. Hungaricae, 22 (1971), 453–465.
[21] S. Numata, On Landsberg spaces of scalar curvature, J. Korean Math. Soc., 12(2) (1975), 97–100.
[22] H. Rund, The differential geometry of Finsler spaces, Springer-Verlag, Berlin, 1959.
[23] C. Shibata, On invariant tensors of β-changes of Finsler metrics, J. Math. Kyoto Univ., 24(1) (1984), 163–188.
[24] A. A. Tamim, General theory of Finsler spaces with applications to Randers spaces, Ph. D. Thesis, Cairo University, 1991.
[25] A. A. Tamim, Special Finsler manifolds, J. Egypt. Math. Soc., 10(2) (2002), 149–177.
[26] A. A. Tamim, On Finsler submanifolds, J. Egypt. Math. Soc., 12(1) (2004), 55–70.
[27] Nabil L. Youssef, S. H. Abed, and A. Soleiman, A global theory of conformal Finsler geometry, submitted. arXiv: math.DG/0610052.

ABSTRACT

The aim of the present paper is to provide a global presentation of the theory of special Finsler manifolds. We introduce and investigate globally (or intrinsically, free from local coordinates) many of the most important and most commonly used special Finsler manifolds: locally Minkowskian, Berwald, Landsberg, general Landsberg, $P$-reducible, $C$-reducible, semi-$C$-reducible, quasi-$C$-reducible, $P^{*}$-Finsler, $C^{h}$-recurrent, $C^{v}$-recurrent, $C^{0}$-recurrent, $S^{v}$-recurrent, $S^{v}$-recurrent of the second order, $C_{2}$-like, $S_{3}$-like, $S_{4}$-like, $P_{2}$-like, $R_{3}$-like, $P$-symmetric, $h$-isotropic, of scalar curvature, of constant curvature, of $p$-scalar curvature, of $s$-$ps$-curvature. The global definitions of these special Finsler manifolds are introduced. Various relationships between the different types of the considered special Finsler manifolds are found. Many local results, known in the literature, are proved globally and several new results are obtained. As a by-product, interesting identities and properties concerning the torsion tensor fields and the curvature tensor fields are deduced. Although our investigation is entirely global, we provide, for comparison, an appendix presenting a local counterpart of our global approach and the local definitions of the special Finsler spaces considered.
<|endoftext|><|startoftext|> The Hardy-Lorentz Spaces Hp,q(Rn) Wael Abu-Shammala and Alberto Torchinsky Abstract In this paper we consider the Hardy-Lorentz spaces Hp,q(Rn), with 0 < p ≤ 1, 0 < q ≤ ∞. We discuss the atomic decomposition of the elements in these spaces, their interpolation properties, and the behavior of singular integrals and other operators acting on them. The real variable theory of the Hardy spaces represents a fruitful setting for the study of maximal functions and singular integral operators. In fact, it is because of the failure of these operators to preserve L1 that the Hardy space H1 assumes its prominent role in harmonic analysis. Now, for many of these operators, the role of L1 can just as well be played by H1,∞, or Weak H1. However, although these operators are amenable to H1 − L1 and H1,∞ − L1,∞ estimates, interpolation between H1 and H1,∞ has not been available. Similar considerations apply to Hp and Weak Hp for 0 < p < 1. The purpose of this paper is to provide an interpolation result for the Hardy-Lorentz spaces Hp,q, 0 < p ≤ 1, 0 < q ≤ ∞, including the case of Weak Hp as and end point for real interpolation. The atomic decomposition is the key ingredient in dealing with interpolation since in this context neither truncations are available, nor reiteration applies. The paper is organized as follows. The Lorentz spaces, including criteria that assure membership in Lp,q, 0 < p < ∞, 0 < q ≤ ∞, are discussed in Section 1. In Section 2 we show that distributions in Hp,q have an atomic decomposition in terms ofHp atoms with coefficients in an appropriate mixed norm space. An interesting application of this decomposition is to Hp,q−Lp,∞ estimates for Calderón-Zygmund singular integral operators, p < q ≤ ∞. Also, by manipulating the different levels of the atomic decomposition, we show that, for 0 < q1 < q < q2 ≤ ∞, H p,q is an intermediate space between Hp,q1 and Hp,q2. 
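This result applies to Calderón-Zygmund singular integral operators, including those with variable kernels, Marcinkiewicz integrals, and other operators.

http://arxiv.org/abs/0704.0054v1

1 The Lorentz spaces

The Lorentz space $L^{p,q}(\mathbb{R}^n) = L^{p,q}$, $0 < p < \infty$, $0 < q \le \infty$, consists of those measurable functions $f$ with finite quasinorm $\|f\|_{p,q}$ given by
$$\|f\|_{p,q} = \Big(\frac{q}{p}\int_0^\infty \big[t^{1/p}f^*(t)\big]^q\,\frac{dt}{t}\Big)^{1/q}, \quad 0 < q < \infty, \qquad \|f\|_{p,\infty} = \sup_{t>0}\big[t^{1/p}f^*(t)\big], \quad q = \infty.$$
The Lorentz quasinorm may also be given in terms of the distribution function $m(f,\lambda) = |\{x \in \mathbb{R}^n : |f(x)| > \lambda\}|$, loosely speaking, the inverse of the non-increasing rearrangement $f^*$ of $f$. Indeed, we have
$$\|f\|_{p,q} = \Big(q\int_0^\infty \lambda^{q-1}m(f,\lambda)^{q/p}\,d\lambda\Big)^{1/q} \sim \Big(\sum_k \big[2^k m(f,2^k)^{1/p}\big]^q\Big)^{1/q}$$
when $0 < q < \infty$, and
$$\|f\|_{p,\infty} \sim \sup_k 2^k m(f,2^k)^{1/p}, \quad q = \infty.$$
Note that, in particular, $L^{p,p} = L^p$, and $L^{p,\infty}$ is weak $L^p$.

The following two results are useful in verifying that a function is in $L^{p,q}$.

Lemma 1.1. Let $0 < p < \infty$, and $0 < q \le \infty$. Assume that the non-negative sequence $\{\mu_k\}$ satisfies $\{2^k\mu_k\} \in \ell^q$. Further suppose that the non-negative function $\varphi$ verifies the following property: there exists $0 < \varepsilon < 1$ such that, given an arbitrary integer $k_0$, we have $\varphi \le \psi_{k_0} + \eta_{k_0}$, where $\psi_{k_0}$ is essentially bounded and satisfies $\|\psi_{k_0}\|_\infty \le c\,2^{k_0}$, and
$$2^{k_0\varepsilon p}\,m(\eta_{k_0},2^{k_0}) \le c\sum_{k=k_0}^{\infty}\big[2^{k\varepsilon}\mu_k\big]^p.$$
Then, $\varphi \in L^{p,q}$, and $\|\varphi\|_{p,q} \le c\,\|\{2^k\mu_k\}\|_{\ell^q}$.

Proof. It clearly suffices to verify that $\|\{2^k\,|\{\varphi > \gamma\,2^k\}|^{1/p}\}\|_{\ell^q} < \infty$, where $\gamma$ is an arbitrary positive constant. Now, given $k_0$, let $\psi_{k_0}$ and $\eta_{k_0}$ be as above, and put $\gamma = c + 1$, where $c$ is the constant in the above inequalities; for this choice of $\gamma$, $\{\varphi > \gamma\,2^{k_0}\} \subseteq \{\eta_{k_0} > 2^{k_0}\}$.

When $q = \infty$, we have
$$2^{k_0\varepsilon}\,m(\eta_{k_0},2^{k_0})^{1/p} \le c\,\Big(\sum_{k=k_0}^{\infty}\big[2^{-k(1-\varepsilon)}\,2^k\mu_k\big]^p\Big)^{1/p} \le c\,2^{-k_0(1-\varepsilon)}\sup_{k\ge k_0}\big[2^k\mu_k\big].$$
Thus, $2^{k_0}\,m(\eta_{k_0},2^{k_0})^{1/p} \le c\,\sup_{k\ge k_0}[2^k\mu_k]$, and, consequently, $2^{k_0}\,m(\varphi,\gamma\,2^{k_0})^{1/p} \le c\,\|\{2^k\mu_k\}\|_{\ell^\infty}$, all $k_0$.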
When 0 < q < ∞, let 1 − ε = 2δ and rewrite the right-hand side above [2k(1−δ)µk] When p < q, by Hölder’s inequality with exponent r = q/p and its conjugate r′, this expression is dominated by 2k δpr )1/r′( 2k(1−δ)µk ]rp )1/r ≤ c 2−k0 δp 2k(1−δ)µk ]q )p/q and, when 0 < q ≤ p, r < 1, and we get a similar bound by simply observing that it does not exceed 2−k0δp [2k(1−δ)µk] ≤ 2−k0δp 2k(1−δ)µk ]q )p/q Whence, continuing with the estimate, we have 2k0εpm(ηk0 , 2 k0) ≤ c 2−k0δp 2k(1−δ)µk ]q )p/q which yields, since 1− ε = 2 δ, 2k0 m(ϕ, γ 2k0)1/p ≤ c 2k0 δ 2k(1−δ)µk ]q )1/q Thus, raising to the q and summing, we get 2k0 m(ϕ, γ 2k0)1/p 2k0 δ q 2k(1−δ)µk which, upon changing the order of summation in the right-hand side of the above inequality, is bounded by 2k(1−δ)µk k0=−∞ 2k0 δ q The reader will have no difficulty in verifying that, for Lemma 1.1 to hold, it suffices that ψx0 satisfies m(ψx0 , 2 k0)1/p ≤ c µk0 , all k0 . This holds, for instance, when ‖ψx0‖ r ≤ c 2 , 0 < r < ∞. In fact, the assumptions of Lemma 1.1 correspond to the limiting case of this inequality as r → ∞. Another useful condition is given by our next result, the proof is left to the reader. Lemma 1.2. Let 0 < p < ∞, and let the non-negative sequence {µk} be such that {2kµk} ∈ ℓ q, 0 < q ≤ ∞. Further, suppose that the non-negative function ϕ satisfies the following property: there exists 0 < ε < 1 such that, given an arbitrary integer k0, we have ϕ ≤ ψk0+ηk0, where ψk0 and ηk0 satisfy 2k0pm(ψk0 , 2 k0)ε ≤ c 2kµεk , 0 < ε < min(1, q/p) , 2k0ε|{ηk0 > 2 k0}| ≤ c 2kεµk Then, ϕ ∈ Lp,q, and ‖ϕ‖p,q ≤ c ‖{2 kµk}‖ℓq. We will also require some basic concepts from the theory of real interpo- lation. Let A0, A1, be a compatible couple of quasinormed Banach spaces, i.e., both A0 and A1 are continuously embedded in a larger topological vector space. The Peetre K functional of f ∈ A0 + A1 at t > 0 is defined by K(t, f ;A0, A1) = inf f=f0+f1 ‖f0‖0 + t ‖f1‖1 , where f = f0 + f1, f0 ∈ A0 and f1 ∈ A1. 
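As a concrete illustration of the K functional just defined, the following sketch evaluates it for the couple (ℓ^1, ℓ^∞) of finite sequences, where the infimum is known in closed form: K(t, a) is the integral over [0, t] of the non-increasing rearrangement a*. The function names are ours and not part of the paper, and the second function only searches over splittings obtained by clipping the sequence at a height h, which is an assumption made to keep the check short.

```python
import numpy as np

def k_functional_l1_linf(a, t):
    """Closed form K(t, a; l^1, l^inf) = integral_0^t a*(s) ds for a finite sequence."""
    a_star = np.sort(np.abs(np.asarray(a, dtype=float)))[::-1]  # non-increasing rearrangement
    n_full = int(np.floor(t))
    if n_full >= a_star.size:
        return float(a_star.sum())
    # sum of the n_full largest entries plus a fraction of the next one
    return float(a_star[:n_full].sum() + (t - n_full) * a_star[n_full])

def k_by_clipping(a, t):
    """Minimum of ||a0||_1 + t ||a1||_inf over splittings a = a0 + a1 with a1 = a clipped at h."""
    abs_a = np.abs(np.asarray(a, dtype=float))
    best = np.inf
    for h in np.concatenate(([0.0], abs_a)):   # the cost is piecewise linear in h
        a1 = np.minimum(abs_a, h)              # bounded part
        a0 = abs_a - a1                        # part above height h
        best = min(best, a0.sum() + t * a1.max())
    return float(best)

a = [3.0, -1.0, 2.0, 0.5]
for t in (1.0, 2.5, 10.0):
    print(t, k_functional_l1_linf(a, t), k_by_clipping(a, t))
```

Splitting a sequence into its largest terms and the remainder is the same device used with the rearranged sequence in the proof of Theorem 2.5.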
In the particular case of the $L^q$ spaces, the K functional can be computed by Holmstedt's formula, see [12]. Specifically, for $0 < q_0 < q_1 \le \infty$, let $\alpha$ be given by $1/\alpha = 1/q_0 - 1/q_1$. Then,
$$K(t, f; L^{q_0}, L^{q_1}) \sim \Big( \int_0^{t^{\alpha}} f^*(s)^{q_0}\, ds \Big)^{1/q_0} + t\, \Big( \int_{t^{\alpha}}^{\infty} f^*(s)^{q_1}\, ds \Big)^{1/q_1} .$$
The intermediate space $(A_0, A_1)_{\eta,q}$, $0 < \eta < 1$, $0 < q \le \infty$, consists of those $f$'s in $A_0 + A_1$ with
$$\|f\|_{(A_0,A_1)_{\eta,q}} = \Big( \int_0^{\infty} \big[ t^{-\eta} K(t, f; A_0, A_1) \big]^q\, \frac{dt}{t} \Big)^{1/q} < \infty , \quad 0 < q < \infty ,$$
$$\|f\|_{(A_0,A_1)_{\eta,\infty}} = \sup_{t>0}\, t^{-\eta} K(t, f; A_0, A_1) < \infty , \quad q = \infty .$$
Finally, for the $L^q$ and $L^{p,q}$ spaces, we have the following result. Let $0 < q_1 < q < q_2 \le \infty$, and suppose that $1/q = (1-\eta)/q_1 + \eta/q_2$. Then, $L^q = (L^{q_1}, L^{q_2})_{\eta,q}$, and $L^{1,q} = (L^{1,q_1}, L^{1,q_2})_{\eta,q}$, see [4].
2 The Hardy-Lorentz spaces $H^{p,q}$
In this paper we adopt the atomic characterization of the Hardy spaces $H^p$, $0 < p \le 1$. Recall that a compactly supported function $a$ with $[n(1/p-1)]$ vanishing moments is an $H^p$ atom with defining interval $I$ (of course, $I$ is a cube in $R^n$), if $\mathrm{supp}(a) \subseteq I$, and $|I|^{1/p}\, |a(x)| \le 1$. The Hardy space $H^p(R^n) = H^p$ consists of those distributions $f$ that can be written as $f = \sum_j \lambda_j a_j$, where the $a_j$'s are $H^p$ atoms, $\sum_j |\lambda_j|^p < \infty$, and the convergence is in the sense of distributions as well as in $H^p$. Furthermore,
$$\|f\|_{H^p} \sim \inf \Big( \sum_j |\lambda_j|^p \Big)^{1/p} ,$$
where the infimum is taken over all possible atomic decompositions of $f$. This last expression has traditionally been called the atomic $H^p$ norm of $f$.
C. Fefferman, Rivière and Sagher identified the intermediate spaces between the Hardy space $H^{p_0}$, $0 < p_0 < 1$, and $L^{\infty}$, as
$$(H^{p_0}, L^{\infty})_{\eta,q} = H^{p,q} , \quad 1/p = (1-\eta)/p_0 , \quad 0 < q \le \infty ,$$
where $H^{p,q}$ consists of those distributions $f$ whose radial maximal function $Mf(x) = \sup_{t>0} |(f * \varphi_t)(x)|$ belongs to $L^{p,q}$. Here $\varphi$ is a compactly supported, smooth function with nonvanishing integral, see [10]. R. Fefferman and Soria studied in detail the space $H^{1,\infty}$, which they called Weak $H^1$, see [11]. Just as in the case of $H^p$, $H^{p,q}$ can be characterized in a number of different ways, including in terms of non-tangential maximal functions and Lusin functions.
In what follows we will calculate the quasinorm of f in Hp,q by the means of the expression 2km(Mf, 2k)1/p , 0 < p ≤ 1, 0 < q ≤ ∞ , where Mf is an appropriate maximal function of f . Passing to the atomic decomposition of Hp,q, the proof is divided in two parts. First, we construct an essentially optimal atomic decomposition; Par- ilov has obtained independently this result for H1,q when 1 ≤ q, see [14]. Also, R. Fefferman and Soria gave the atomic decomposition of Weak H1, see [11], and Alvarez the atomic decomposition of Weak Hp, 0 < p < 1, see Theorem 2.1. Let f ∈ Hp,q, 0 < p ≤ 1, 0 < q ≤ ∞. Then f has an atomic decomposition f = j,k λj,kaj,k, where the aj,k’s are H p atoms with defining intervals Ij,k that have bounded overlap uniformly for each k, the sequence {λj,k} satisfies j |λj,k| < ∞, and the convergence is in the sense of distributions. Furthermore, j |λj,k| ∼ ‖f‖Hp,q . Proof. The idea of constructing an atomic decomposition using Calderón’s reproducing formula is well understood, so we will only sketch it here, for further details, see [5] and [18]. Let Nf(x) = sup{|(f ∗ ψt)(y)| : |x− y| < t} denote the non-tangential maximal function of f with respect to a suitable smooth function ψ with nonvanishing integral. One considers the open sets Ok = {Nf > 2 k}, all integers k, and builds the atoms with defining interval associated to the intervals, actually cubes, of the Whitney decomposition of Ok, and hence satisfying all the required properties. More precisely, one constructs a sequence of bounded functions fk with norm not exceeding c 2 for each k, and such that f − |k|≤n fk → 0 as n→ ∞ in the sense of distri- butions. These functions have the further property that fk(x) = j αj,k(x) , where |αj,k(x)| ≤ c 2 k, c is a constant, each αj,k has vanishing moments up to order [n(1/p − 1)] and is supported in Ij,k - roughly one of the Whitney cubes -, where the Ij,k’s have bounded overlaps for each k, uniformly in k. 
It only remains now to scale αj,k, αj,k(x) = λj,k aj,k(x) , and balance the contribution of each term to the sum. Let λj,k = 2 k|Ij,k| Then, aj,k(x) is essentially an H p atom with defining interval Ij,k, and one j |λj,k| ∼ 2k |Ok| 1/p. Thus, |λj,k| )1/p∥ 2k |Ok| ∼ ‖f‖Hp,q , 0 < q ≤ ∞ . � As an application of this atomic decomposition, the reader should have no difficulty in showing directly the C. Fefferman, Rivière, Sagher character- ization of Hp,q, see [10]. Another interesting application of this decomposition is to Hp,q − Lp,∞ estimates for Calderón-Zygmund singular integral operators T , p < q ≤ ∞. This approach combines the concept of p-quasi local operator of Weisz, see [17], with the idea of variable dilations of R. Fefferman and Soria, see [11]. Intuitively, since Hörmander’s condition implies that T maps H1 into L1, say, for T to be defined in H1,s, 1 < s ≤ ∞, some strengthening of this condition is required. This is accomplished by the variable dilations. Moreover, since we will include p < 1 in our discussion, as p gets smaller, more regularity of the kernel of T will be required. This justifies the following definition. Given 0 < p ≤ 1, let N = [n(1/p − 1)], and, associated to the kernel k(x, y) of a Calderón-Zygmund singular integral operator T , consider the modulus of continuity ωp given by ωp(δ) = sup Rn\(2/δ)I | k(x, y)− |α|≤N (y − yI) αkα(x, yI)| dy where 0 < δ ≤ 1, and the sup is taken over the collection of arbitrary intervals I of Rn centered at yI . Here, for a multi-index α = (α1, . . . , αn), kα(x, yI) = Dαk(x, y) ωp(δ) controls the behavior of T on atoms. More precisely, if a is an H p atom with defining interval I, and 0 < δ < 1, observe that T (a)(x) = [k(x, y)− |α|≤N (y − yI) αkα(x, yI)] a(y) dy , and, consequently, Rn\(2/δ)I |T (a)(x)|p dx ≤ ωp(δ) . We are now ready to prove the Hp,q − Lp,∞ estimate for a Calderón- Zygmund singular integral operator T with kernel k(x, y). Theorem 2.2. Let 0 < p ≤ 1, and p < q ≤ ∞. 
Assume that a Calderón- Zygmund singular integral operator T is of weak-type (r, r) for some 1 < r < ∞, and that the modulus of continuity ωp of the kernel k satisfies a Dini condition of order q/(q − p), namely, Ap,q = ωp(δ) q/(q−p)dδ ](q−p)/q Then T maps Hp,q continuously into Lp,∞, and ‖Tf‖p,∞ ≤ cA p,q ‖f‖Hp,q . Proof. We need to show that 2k0pm(Tf, 2k0) ≤ c ‖f‖ Hp,q , all k0 . Let f = j λj,kaj,k , be the atomic decomposition of f given in Theorem 2.1, and set f1 = j λj,kaj,k, and f2 = f − f1. Further, let µk = j |λj,k| , and recall that ‖{µk}‖ℓq ∼ ‖f‖Hp,q . Since ‖f1‖ r ≤ c 2 k0(r−p)‖f‖ Hp,∞, we have 2pk0m(Tf1, 2 k0) ≤ c ‖f‖ Hp,∞ . Next, put I∗j,k = 2 1/n(3/2)p(k−k0)/nIj,k, and let I∗j,k . Since |I∗j,k| = 2(3/2) p(k−k0)|Ij,k| ∼ 2 −k0p(3/4)p(k−k0)|λj,k| p, we get |Ω| ≤ |I∗j,k| ≤ c 2 (3/4)p(k−k0) |λj,k| ≤ c 2−k0p ≤ c 2−k0p‖f‖ Hp,∞ . Also, since 0 < p ≤ 1, it readily follows that |T (f2)(x)| |λj,k| p|T (aj,k)(x)| and, by Tonelli and the estimate for T (a), we have |T (f2)(x)| p dx ≤ |λj,k| Rn\I∗ |T (aj,k)(x)| )p(k−k0)/n )pk/n)q/(q−p))(q−p)/q ωp(δ) q/(q−p)dδ ](q−p)/q Hp,q . This bound gives at once 2pk0 |{x /∈ Ω : |T (f2)(x)| > 2 k0}| ≤ cAp,q ‖f‖ Hp,q , which implies that 2pk0m(Tf2, 2 k0−1) ≤ 2pk0 |Ω|+ |{x /∈ Ω : |T (f2)(x)| > 2 k0−1}| ≤ c ‖f‖ Hp,∞ + cAp,q ‖f‖ Hp,q . Finally, 2k0pm(Tf, 2k0) ≤ 2k0pm(Tf1, 2 k0−1) + 2k0pm(Tf2, 2 k0−1) ≤ c ‖f‖ Hp,∞ + cAp,q ‖f‖ Hp,q , and, since ‖f‖Hp,∞ ≤ c ‖f‖Hp,q for all q, we have finished. � We pass now to the converse of Theorem 2.1. It is apparent that a condition that relates the coefficients λj with the corresponding atoms aj involved in an atomic decomposition of the form j λjaj(x) is relevant here. More precisely, if Ij denotes the supporting interval of aj , let Ik = {j : 2 k ≤ |λj|/|Ij| 1/p < 2k+1} , and, for λ = {λj}, put ‖λ‖[p,q] = ]q/p)1/q We then have, Theorem 2.3. 
Let 0 < p ≤ 1, 0 < q ≤ ∞, and let f be a distribution given by f = j λj aj(x) , where the aj’s are H p atoms, and the convergence is in the sense of distributions. Further, assume that the family {Ij} consisting of the supports of the aj’s has bounded overlap at each level Ik uniformly in k, and ‖λ‖[p,q] <∞. Then, f ∈ H p,q, and ‖f‖Hp,q ≤ c ‖λ‖[p,q]. Proof. Let Mf(x) = supt>0 |(f ∗ ψt)(x)| denote the radial maximal function of f with respect to a suitable smooth function ψ with support contained in {|x| ≤ 1} and nonvanishing integral. We will verify that Mf satisfies the conditions of Lemma 1.1 and is thus in Lp,q. Fix an integer k0 and let g(x) = λjaj(x) . Since ‖Mg‖∞ ≤ ‖g‖∞ it suffices to estimate |g(x)|. Let C be the bounded overlap constant for the family of the supports of the aj’s. Then, for j ∈ Ik, |λj| |aj(x)| = |Ij|1/p |λj | |Ij| 1/p |aj(x)| ≤ 2 kχIj (x) , and, consequently, |g(x)| ≤ χIj(x) ≤ C 2 Next, let h(x) = λjaj(x) . Since aj has N = [n(1/p − 1)] vanishing moments, it is not hard to see that, if Ij is the defining interval of aj and Ij is centered at xj , and γ = (n+N+1)/n > 1/p, then, with c independent of j, ϕj(x) =Maj(x) satisfies ϕj(x) ≤ c |Ij | γ−1/p (|Ij|+ |x− xj |n)γ Thus, if 1/γ < εp < 1, Mh(x)εp ≤ c j∈Ik,k≥k0 (|λj| |Ij| γ−1/p)εp (|Ij |+ |x− xj |n)γεp which, upon integration, yields Mh(x)εp dx ≤ c j∈Ik,k≥k0 (|λj| |Ij| γ−1/p)εp (|Ij |+ |x− xj |n)γεp The integrals in the right-hand side above are of order |Ij| 1−γεp and, conse- quently, by Chebychev’s inequality, 2k0εp|{Mh > 2k0}| ≤ c j∈Ik,k≥k0 εp |Ij| 1−ε ≤ c |Ij| . Thus, Lemma 1.1 applies with ϕ = Mf , ψk0 = Mg, ηk0 = Mh, and µk = , and we get 2km(Mf, 2k)1/p )1/p}∥ which, since |Ij| ∼ , j ∈ Ik , is bounded by c ‖λ‖[p,q], 0 < q ≤ ∞. � The next result is of interest because it applies to arbitrary decomposi- tions in Hp,q. The proof relies on Lemma 1.2, and is left to the reader. Theorem 2.4. 
Let 0 < p ≤ 1, 0 < q ≤ ∞, and let f be a distribution given by f = j λj aj(x) , where the aj’s are H p atoms, and the convergence is in the sense of distributions. Further, assume that ‖λ‖[η,q] < ∞ for some 0 < η < min(p, q). Then, f ∈ Hp,q, and ‖f‖Hp,q ≤ c ‖λ‖[η,q]. 2.1 Interpolation between Hardy-Lorentz spaces We are now ready to identify the intermediate spaces of a couple of Hardy- Lorentz spaces with the same first index p ≤ 1. Theorem 2.5. Let 0 < p ≤ 1. Given 0 < q1 < q < q2 ≤ ∞, define 0 < η < 1 by the relation 1/q = (1− η)/q1 + η/q2. Then, with equivalent quasinorms, Hp,q = (Hp,q1, Hp,q2)η,q . Proof. Since the non-tangential maximal function Nf of a distribution f in Hp,q1 is in Lp,q1, and that of f in Hp,q2 is in Lp,q2, we have K(t, Nf ;Lp,q1, Lp,q2) ≤ cK(t, f ;Hp,q1, Hp,q2) . Thus, ‖Nf‖p,q ∼ ‖Nf‖(Lp,q1 ,Lp,q2)η,q ≤ c ‖f‖(Hp,q1 ,Hp,q2 )η,q , and (Hp,q1, Hp,q2)η,q →֒ H To show the other embedding, with the notation in the proof of Theorem 2.1, write f = j λj,kaj,k , and recall that for every integer k, the level set Ik = {j : |λj,k|/|Ij,k| 1/p ∼ 2k} contains exclusively the sequence {λj,k}. Let µ |λj,k| p. By construction, k ∼ ‖f‖ Hp,q . Now, rearrange {µk} into {µ l }, and, for each l ≥ 1, let kl be such that µkl = µ l . For l0 ≥ 1, let Kl0 = {k1, . . . , kl0}, and put f1,l0 = k∈Kl0 j λj,kaj,k and f2,l0 = f − f1,l0 . Then, by Theorem 2.2, f1,l0 ∈ H p,q1, f2,l0 ∈ H p,q2, and, with the usual interpretation for q2 = ∞, ‖f1,l0‖Hp,q1 ≤ c )1/q1 , ‖f2,l0‖Hp,q2 ≤ c )1/q2 So, for t > 0 and every positive integer l0, we have K(t, f ;Hp,q1, Hp,q2) ≤ c )1/q1 )1/q2 Now, by Homstedt’s formula, there is a choice of l0 such that the right-hand side above ∼ K(t, {µk}; ℓ q1, ℓq2), and, consequently, K(t, f ;Hp,q1, Hp,q2) ≤ cK(t, {µk}; ℓ q1, ℓq2) . Thus, ‖f‖(Hp,q1 ,Hp,q2 )η,q ≤ c ‖{µk}‖(ℓq1 ,ℓq2)η,q ≤ c ‖{µk}‖ℓq ≤ c ‖f‖Hp,q , and Hp,q →֒ (Hp,q1, Hp,q2)η,q. 
□
The reader will have no difficulty in verifying that Theorem 2.5 gives that if $T$ is a continuous, sublinear map from $H^1$ into $L^1$, and from $H^{1,\infty}$ into $L^{1,\infty}$, then $\|Tf\|_{1,q} \le c\, \|f\|_{H^{1,q}}$ for $1 < q < \infty$. This observation has numerous applications. For instance, consider the Calderón-Zygmund singular integral operators with variable kernel defined by
$$T_\Omega(f)(x) = \mathrm{p.v.} \int_{R^n} \frac{\Omega(x, x-y)}{|x-y|^n}\, f(y)\, dy .$$
Under appropriate growth and smoothness assumptions on $\Omega$, $T_\Omega$ maps $H^1$ continuously into $L^1$, see [6], and $H^{1,\infty}$ continuously into $L^{1,\infty}$, see [8]. Thus, if $\Omega$ satisfies the assumptions of both of these results, $T_\Omega$ maps $H^{1,q}$ continuously into $L^{1,q}$ for $1 < q < \infty$. A similar result follows by invoking the characterization of $H^{1,q}$ given by C. Fefferman, Rivière and Sagher. However, in this case the $H^p - L^p$ estimate requires additional smoothness of $\Omega$, as shown, for instance, in [6]. Similar considerations apply to the Marcinkiewicz integral, see [9], and [7].
Finally, when $p < 1$, our results cover, for instance, the $\delta$-CZ operators satisfying $T^*(1) = 0$ discussed by Alvarez and Milman, see [3]. These operators, as well as a more general related class introduced in [15], preserve $H^p$ and $H^{p,\infty}$ for $n/(n+\delta) < p \le 1$, and, consequently, by Theorem 2.5, they also preserve $H^{p,q}$ for $p$ in that same range, and $q > p$.
References
[1] W. Abu-Shammala and A. Torchinsky, The atomic decomposition for H1,q(Rn), Proceedings of the International Conference on Harmonic Analysis and Ergodic Theory, (2005), to appear.
[2] J. Alvarez, Hp and Weak Hp continuity of Calderón-Zygmund type operators, Lecture Notes in Pure and Appl. Math. 157 (1992), 17–34.
[3] J. Alvarez and M. Milman, Hp continuity of Calderón-Zygmund type operators, J. Math. Anal. Appl. 118 (1986), 63–79.
[4] J. Bergh and J. Löfström, Interpolation spaces, an introduction, Springer-Verlag, 1976.
[5] A. P. Calderón, An atomic decomposition of distributions in parabolic Hp spaces, Advances in Math. 25 (1977), 216–225.
[6] J. Chen, Y. Ding, and D. Fan, A class of integral operators with variable kernels in Hardy spaces, Chinese Annals of Math. (A) 23 (2002), 289–
[7] Y. Ding, C.-C. Lin, and S. Shao, On the Marcinkiewicz integral with variable kernels, Indiana Math. J. 53 (2004), 805–821.
[8] Y. Ding, S. Z. Lu, and S. Shao, Integral operators with variable kernels on weak Hardy spaces, J. Math. Anal. Appl. 317 (2006), 127–135.
[9] Y. Ding, S. Z. Lu, and Q. Xue, Marcinkiewicz integral on Hardy spaces, Integr. Equ. Oper. Theory 42 (2002), 174–182.
[10] C. Fefferman, N. M. Rivière, and Y. Sagher, Interpolation between Hp spaces: the real method, Trans. Amer. Math. Soc. 191 (1974), 75–81.
[11] R. Fefferman and F. Soria, The space Weak H1, Studia Math. 85 (1987), 1–16.
[12] T. Holmstedt, Interpolation of quasi-normed spaces, Math. Scand. 25 (1970), 177–199.
[13] P. Krée, Interpolation d'espaces vectoriels qui ne sont ni normés ni complets. Applications., Ann. Inst. Fourier (Grenoble) 17 (1967), 137–174.
[14] D. V. Parilov, Two theorems on the Hardy-Lorentz classes H1,q, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 327 (2005), 150–167.
[15] T. Quek and D. Yang, Calderón-Zygmund type operators on weighted weak Hardy spaces over Rn, Acta Math. Sinica (Engl. Ser.) 16 (2000), 141–160.
[16] A. Torchinsky, Real-variable methods in harmonic analysis, Dover Publications, Inc., 2004.
[17] F. Weisz, Summability of multi-dimensional Fourier series and Hardy spaces, Kluwer Academic Publishers, 2002.
[18] J. M. Wilson, On the atomic decomposition for Hardy spaces, Pacific J. Math. 116 (1985), 201–207.
DEPARTMENT OF MATHEMATICS, INDIANA UNIVERSITY, BLOOMINGTON, IN 47405 E-mail: wabusham@indiana.edu, torchins@indiana.edu The Lorentz spaces The Hardy-Lorentz spaces Hp,q Interpolation between Hardy-Lorentz spaces ABSTRACT In this paper we consider the Hardy-Lorentz spaces $H^{p,q}(R^n)$, with $0<|startoftext|> Potassium intercalation in graphite: A van der Waals density-functional study Eleni Ziambaras,1 Jesper Kleis,1 Elsebeth Schröder,1 and Per Hyldgaard1, 2, ∗ Department of Applied Physics, Chalmers University of Technology, SE–412 96 Göteborg, Sweden Microtechnology and Nanoscience, MC2, Chalmers University of Technology, SE–412 96 Göteborg, Sweden (Dated: April 1, 2007) Potassium intercalation in graphite is investigated by first-principles theory. The bonding in the potassium-graphite compound is reasonably well accounted for by traditional semilocal density functional theory (DFT) calculations. However, to investigate the intercalate formation energy from pure potassium atoms and graphite requires use of a description of the graphite interlayer binding and thus a consistent account of the nonlocal dispersive interactions. This is included seamlessly with ordinary DFT by a van der Waals density functional (vdW-DF) approach [Phys. Rev. Lett. 92, 246401 (2004)]. The use of the vdW-DF is found to stabilize the graphite crystal, with crystal parameters in fair agreement with experiments. For graphite and potassium-intercalated graphite structural parameters such as binding separation, layer binding energy, formation energy, and bulk modulus are reported. Also the adsorption and sub-surface potassium absorption energies are reported. The vdW-DF description, compared with the traditional semilocal approach, is found to weakly soften the elastic response. I. INTRODUCTION Graphite with its layered structure is easily interca- lated by alkali metals (AM) already at room tempera- ture. 
The intercalated compound has two-dimensional layers of AM between graphite layers [1,2,3,4,5], giving rise to interesting properties, such as superconductivity [6,7]. The formation of an AM-graphite intercalate proceeds with adsorption of AM atoms on graphite and absorption of AM atoms below the top graphite layer, after which further exposure to AM atoms leads to the AM intercalate compound.
Recent experiments [8,9] on the structure and electronic properties of AM/graphite systems use samples of graphite that are prepared by heating SiC crystals to temperatures around 1400 °C [10]. This heat-induced graphitization is of great value for spectroscopic studies of graphitic systems, since the resulting graphite overlayers are of excellent quality [11]. The nature of the bonding between the SiC surfaces and graphite has been explored experimentally with photoemission spectroscopy [12] and theoretically [13] with a van der Waals density functional (vdW-DF) theory approach that accounts for the van der Waals (vdW) forces [14,15,16,17]. Here we investigate with density functional theory (DFT) the effects on the graphite structure, the energetics, and the elastic response when potassium is intercalated. The final intercalate compound is C8K.
The AM intercalate system is interesting in itself and has been the focus of numerous experimental investigations [18,19,20,21,22]. Graphitic systems are also ideal test materials in ongoing theory development that aims at improving the description of the nonlocal interlayer bonds in sparse systems [14,23,24]. Standard DFT approaches are based on local (local density approximation, LDA) and semilocal (generalized gradient approximation, GGA) [25,26,27,28] approximations for the electron exchange and correlation. Such regular DFT tools do not correctly treat the weak vdW binding, e.g., the cohesion between (adjacent) graphite layers.
The failure of traditional DFT for graphite makes it impossible to obtain a meaningful comparison of the energetics in on-surface AM adsorption and subsurface AM absorption. Conversely, investigations of graphitic systems like C8K permit us to test the accuracy of our vdW-DF development work.
We explore the nature of the bonding of graphite, the process leading to intercalation via adsorption and absorption of potassium, and the nature of potassium-intercalated graphite C8K using a recently developed vdW-DF [16]. This choice of functional is essential for a comparison of graphite and C8K properties because of the inability of traditional GGA-based DFT to describe graphite. We calculate the structure and elastic response (bulk modulus $B_0$) of pristine graphite and potassium-intercalated graphite and we present results for the formation energies of the C8K system.
The intercalation of potassium in graphite is preceded by the adsorption of potassium on top of a graphite surface and potassium absorption underneath the top graphite layer of the surface. In this work we study how potassium bonds to graphite in these two parts of the process towards intercalation. Our vdW-DF investigations of the binding of potassium in or on graphite supplement corresponding vdW-DF studies of the binding of polycyclic aromatic hydrocarbon dimers, of the polyethylene crystal, of benzene dimers, and of polycyclic aromatic hydrocarbon and phenol molecules on graphite [29,30,31,32,33,34].
The outline of the paper is as follows. Section II contains a short description of the materials of interest here: graphite, C8K, and graphite with an adsorbed or absorbed K atom layer. The vdW-DF scheme is described in Sec. III. Section IV presents our results, Sec. V the discussion, and conclusions are drawn in Sec. VI.
http://arxiv.org/abs/0704.0055v1
FIG. 1: (Color online) Simple hexagonal graphite (AA stacking) and natural hexagonal graphite (AB stacking).
The two structures differ in that every second carbon layer in AB-stacked graphite is shifted, whereas in AA-stacked graphite all planes are directly above each other. The experimentally obtained in-plane lattice constant and sheet separation of natural graphite are (Ref. 40) $a = 2.459$ Å and $d_{\text{C-C}} = 3.336$ Å, respectively.
II. MATERIAL STRUCTURE
Graphite is a semimetallic solid with strong intra-plane bonds and weakly coupled layers. The presence of these two types of bonding results in a material with different properties along the various crystallographic directions [35]. For example, the thermal and electrical conductivity along the carbon sheets is two orders of magnitude higher than that perpendicular to the sheets. This specific property allows heat to move directionally, which makes it possible to control the heat transfer. The relatively weak vdW forces between the sheets contribute to another industrially important property: graphite is an ideal lubricant. In addition, the anisotropic properties of graphite make the material suitable as a substrate in electronic studies of ultrathin metal films [36,37,38,39].
The natural structure of graphite is an AB stacking, with the graphite layers shifted relative to each other, as illustrated in Fig. 1. The figure also shows hexagonal graphite, consisting of AA-stacked graphite layers. The in-plane lattice constant $a$ and the layer separation $d_{\text{C-C}}$ are also illustrated. In natural graphite the primitive unit cell is hexagonal, includes four carbon atoms in two layers, and has unit cell side lengths $a$ and height $c = 2d_{\text{C-C}}$.
The physical properties of graphite have been studied in a variety of experimental [40,41,42] and theoretical [43,44] work. Some of the DFT work has been performed in LDA, which does not provide a physically meaningful account of binding in layered systems [15,45]. At the same time, using GGA is not an option because it does not bind the graphite layers. For a good description of the graphite structure and nature the vdW interactions must be included [45].
FIG. 2: (Color online) Crystalline structure of C8K showing the AA-stacking of the carbon layers (small balls) and the αβγδ-stacking of the potassium layers (large balls) perpendicular to the graphene sheets. The potassium layers are arranged in a p(2 × 2) structure, with the K atoms occupying the sites over the hollows of every fourth carbon hexagon.
Alkali metals (AM), except Na, easily penetrate the gallery of the graphite, forming alkali metal graphite intercalation compounds. These intercalation compounds are formed through electron exchange between the intercalated layer and the host carbon layers, so that the interlayer bonding is of a different nature than in pristine graphite. The intercalate also affects the conductive properties of graphite, which becomes superconductive in the direction parallel to the planes at critical temperatures below 1 K [6,7].
The structure of AM graphite intercalation compounds is characterized by its stage n, where n is the number of graphite sheets located between the AM layers. In this work we consider only stage-1 intercalated graphite C8K, in which the layers of graphite and potassium alternate throughout the crystal. The primitive unit cell of C8K is orthorhombic and contains sixteen C atoms and two K atoms. In the C8K crystal the K atoms are ordered in a p(2 × 2) registry with K-K separation $2a$, where $a$ is the in-plane lattice constant of graphite. This separation of the potassium atoms is about 8% larger than that in the natural K bcc crystal (based on experimental values). The carbon sheet stacking in C8K is of AA type, with the K atoms occupying the sites over the hollows of every fourth carbon hexagon, each position denoted by α, β, γ, or δ, and the stacking of the K atoms perpendicular to the planes being described by the αβγδ-sequence as illustrated in Fig. 2.
III.
COMPUTATIONAL METHODS
The first-principles total-energy and electronic structure calculations are performed within the framework of DFT. The semilocal Perdew-Burke-Ernzerhof (PBE) flavor [26] of GGA is chosen for the exchange-correlation functional for the traditional self-consistent calculations underlying the vdW-DF calculations. For all GGA calculations we use the open source DFT code Dacapo [46], which employs Vanderbilt ultrasoft pseudopotentials [47], periodic boundary conditions, and a plane-wave basis set. An energy cut-off of 500 eV is used for the expansion of the wave functions and the Brillouin zone (BZ) of the unit cells is sampled according to the Monkhorst-Pack scheme [48]. The self-consistently determined GGA valence electron density $n(\mathbf{r})$ as well as components of the energy from these calculations are passed on to the subsequent vdW-DF calculation of the total energy.
For the adsorption and absorption studies a graphite surface slab consisting of 4 layers is used, with a surface unit cell of side lengths twice those in the graphite bulk unit cell (i.e., side lengths $2a$). The surface calculations are performed with a 4×4×1 k-point sampling of the BZ. The (pure) graphite bulk GGA calculations are performed with an 8×8×4 k-point sampling of the BZ, whereas for the C8K bulk structure, in a unit cell at least double the size in any direction, 4×4×2 k-points are used, consistent also with the choice of k-point sampling of the surface slabs.
We choose to describe C8K by using a hexagonal unit cell with four formula units, lateral side lengths approximately twice those of graphite and with four graphite and four K-layers in the direction perpendicular to the layers. C8K can also be described by the previously mentioned primitive orthorhombic unit cell containing two formula units of atoms, but we retain the hexagonal cell for ease of description and for simple implementation of numerically robust vdW-DF calculations.
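The k-point samplings quoted above can be illustrated with a short sketch that generates the fractional coordinates of a Monkhorst-Pack grid. It assumes the standard unshifted definition u_r = (2r − q − 1)/(2q), r = 1, ..., q, along each reciprocal lattice vector, and it returns the full grid without any symmetry reduction; whether the DFT code reduces or shifts the grid is not stated here, and the function name is ours.

```python
import itertools

def monkhorst_pack(size):
    """Fractional k-points of an unshifted Monkhorst-Pack grid, size = (q1, q2, q3)."""
    axes = [[(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)] for q in size]
    return list(itertools.product(*axes))

graphite_bulk = monkhorst_pack((8, 8, 4))   # pure graphite bulk
c8k_bulk = monkhorst_pack((4, 4, 2))        # C8K cell, doubled in every direction
surface_slab = monkhorst_pack((4, 4, 1))    # four-layer slab with side lengths 2a

print(len(graphite_bulk), len(c8k_bulk), len(surface_slab))
print(sorted({k[0] for k in c8k_bulk}))
```

Halving the number of k-points along each direction when the real-space cell is doubled keeps the sampling density in reciprocal space roughly unchanged, which is the consistency the text refers to.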
In all our studies, except test cases, the Fast Fourier Transform (FFT) grids are chosen such that the separation of neighboring points is at most ∼0.13 Å in any direction in any calculation.
A. vdW density functional calculations
In graphite, the carbon layers bind by vdW interactions only. In the intercalated compound a major part of the attraction is ionic, but also here the vdW interactions cannot be ignored. In order to include the vdW interactions systematically in all of our calculations we use the vdW-DF of Ref. 16. There, the correlation energy functional is divided into a local and a nonlocal part,
$$E_c \approx E_c^{\mathrm{LDA}} + E_c^{\mathrm{nl}} , \qquad (1)$$
where the local part is approximated in the LDA and the nonlocal part $E_c^{\mathrm{nl}}$ is consistently constructed to vanish for a homogeneous system. The nonlocal correlation $E_c^{\mathrm{nl}}$ is calculated from the GGA-based $n(\mathbf{r})$ and its gradients by using information about the many-body response of the weakly inhomogeneous electron gas:
$$E_c^{\mathrm{nl}} = \frac{1}{2} \int_{V_0} d\mathbf{r} \int_{V} d\mathbf{r}'\, n(\mathbf{r})\, \phi(\mathbf{r}, \mathbf{r}')\, n(\mathbf{r}') . \qquad (2)$$
The nonlocal kernel $\phi(\mathbf{r}, \mathbf{r}')$ can be tabulated in terms of the separation $|\mathbf{r} - \mathbf{r}'|$ between the two fragments at positions $\mathbf{r}$ and $\mathbf{r}'$ through the parameters $D = (q_0 + q_0')|\mathbf{r} - \mathbf{r}'|/2$ and $\delta = (q_0 - q_0')/(q_0 + q_0')$. Here $q_0$ is a local parameter that depends on the electron density and its gradient at position $\mathbf{r}$. The analytic expression for the kernel $\phi$ in terms of $D$ and $\delta$ can be found in Ref. 16.
For periodic systems, such as bulk graphite, C8K, and the graphite surface (with adsorbed or absorbed K atoms), the nonlocal correlation per unit cell is simply evaluated from the interaction of the points in the unit cell $V_0$ with points everywhere in space ($V$) in the three (for bulk graphite and C8K) or two (for the graphite surface) dimensions of periodicity. Thus, the $V$-integral in Eq. (2) in principle requires a representation of the electron density infinitely repeated in space.
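The structure of the double integral in Eq. (2) for a periodic system can be sketched as a discrete double sum: points in the unit cell V0 interact with points in V0 and in a finite number of its periodic images. The sketch below is only a toy under loud assumptions: `toy_kernel` is a placeholder with a D^-6 tail and is not the tabulated kernel of Ref. 16, the local parameter q0 is taken to depend on the density alone (the gradient dependence is ignored), and all names and the grid are hypothetical.

```python
import itertools
import numpy as np

def toy_kernel(D, delta):
    """Placeholder for phi(D, delta); NOT the vdW-DF kernel of Ref. 16."""
    return -(1.0 - 0.5 * delta**2) / (1.0 + D**6)

def nonlocal_energy(density, positions, cell, dvol, reps):
    """0.5 * sum over r in V0 and r' in V of n(r) phi(r, r') n(r') dvol^2.

    V consists of the cell V0 and its images out to reps = (m1, m2, m3)
    repetitions on either side along the three lattice vectors (rows of cell).
    """
    q0 = (3.0 * np.pi**2 * density) ** (1.0 / 3.0)   # hypothetical q0(n), gradient ignored
    qsum = q0[:, None] + q0[None, :]
    delta = (q0[:, None] - q0[None, :]) / qsum
    energy = 0.0
    for s in itertools.product(*(range(-m, m + 1) for m in reps)):
        shift = np.array(s) @ cell                   # lattice translation of this image
        diff = positions[:, None, :] - (positions[None, :, :] + shift)
        D = 0.5 * qsum * np.linalg.norm(diff, axis=-1)
        energy += 0.5 * np.sum(density[:, None] * toy_kernel(D, delta) * density[None, :])
    return energy * dvol**2

# tiny cubic test cell with 2 x 2 x 2 grid points
points = np.array(list(itertools.product([0.0, 0.5], repeat=3)))
dens = np.linspace(0.5, 1.5, len(points))
for reps in [(1, 1, 1), (2, 2, 2), (3, 3, 3)]:
    print(reps, nonlocal_energy(dens, points, np.eye(3), 0.125, reps))
```

With this convention, reps = (4, 4, 3) corresponds to a V that extends 9 (7) cells parallel (perpendicular) to the sheets, and reps = (2, 2, 0) to a surface calculation with five cells along the sheets and no repetition in the third direction.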
In practice, the nonlocal correlation rapidly converges [31] and it suffices to repeat the unit cell a few times in each spatial direction. For graphite bulk the $V$-integral is converged when we use a $V$ that extends 9 (7) times the original unit cell in directions parallel (perpendicular) to the sheets. For the potassium investigation a significantly larger original unit cell is adopted (see Fig. 2); here a fully converged $V$ corresponds to a cell extending five (three) times the original cell in the direction parallel (perpendicular) to the sheets for C8K bulk. To describe the nonlocal correlations (2) for the graphite surface a sufficient $V$ extends five times the original unit cell along the carbon sheets.
For the exchange energy $E_x$ we follow the choice of Ref. 16 of using revPBE [27] exchange. Among the functionals that we have easy access to, revPBE has proved to be the best candidate for minimizing the tendency of artificial exchange binding in graphite [15].
Using the scheme described above to evaluate $E_c^{\mathrm{nl}}$, the total energy finally reads:
$$E^{\text{vdW-DF}} = E^{\mathrm{GGA}} - E_c^{\mathrm{GGA}} + E_c^{\mathrm{LDA}} + E_c^{\mathrm{nl}} , \qquad (3)$$
where $E^{\mathrm{GGA}}$ is the GGA total energy with the revPBE choice for the exchange description and $E_c^{\mathrm{GGA}}$ ($E_c^{\mathrm{LDA}}$) the GGA (LDA) correlation energy. As our GGA calculations in this specific application of vdW-DF are carried out in PBE, not revPBE, we further need to explicitly replace the PBE exchange in $E^{\mathrm{GGA}}$ by that of revPBE for the same electron charge density distribution.
B. Convergence of the local and nonlocal energy variation
DFT calculations provide physically meaningful results for energy differences between total energies (3). To understand materials and processes we must compare total energy differences between a system with all constituents at relatively close distance and a system of two or more fragments at "infinite" separation (the reference system).
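The bookkeeping behind Eq. (3) and the comparison with a reference system can be sketched in a few lines. The class, the function names, and the numbers below are hypothetical and serve only to show how the four energy terms are combined; the GGA total energy is assumed to already contain revPBE exchange.

```python
from dataclasses import dataclass

@dataclass
class EnergyTerms:
    e_gga: float    # GGA total energy, assumed to use revPBE exchange
    e_c_gga: float  # GGA correlation energy
    e_c_lda: float  # LDA correlation energy
    e_c_nl: float   # nonlocal correlation energy, Eq. (2)

def vdw_df_total(t):
    """Eq. (3): E_vdW-DF = E_GGA - E_c^GGA + E_c^LDA + E_c^nl."""
    return t.e_gga - t.e_c_gga + t.e_c_lda + t.e_c_nl

def energy_difference(system, reference):
    """Total-energy difference between the interacting system and its reference."""
    return vdw_df_total(system) - vdw_df_total(reference)

# made-up numbers in arbitrary energy units, for illustration only
system = EnergyTerms(e_gga=-100.0, e_c_gga=-10.0, e_c_lda=-12.0, e_c_nl=3.0)
reference = EnergyTerms(e_gga=-99.5, e_c_gga=-9.9, e_c_lda=-11.8, e_c_nl=3.2)
print(vdw_df_total(system), vdw_df_total(reference), energy_difference(system, reference))
```

Only the difference returned by `energy_difference` carries physical meaning, which is why the choice of reference system is treated separately for the short-range and the nonlocal terms.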
Since the total energy (3) consists both of a long-range term and shorter-ranged GGA and LDA terms, it is natural to choose different ways to represent the separated fragments for these different long- or short-range energy terms.

For the shorter-range energy parts (LDA and GGA terms) the reference system is a full system with vacuum between the fragments. For LDA and GGA calculations it normally suffices to make sure that the charge density tails of the fragments do not overlap, but here we find that the surface dipoles cause a slower convergence with layer separation. We use a system with the layer separation between the potassium layer and the nearest graphite layer(s) $d_{\text{C-K}}$ = 12 Å (8 Å) as reference for the adsorption (absorption) study.

The evaluation of the nonlocal correlations $E_c^{\mathrm{nl}}$ requires additional care. This is due to technical reasons pertaining to numerical stability in basing the $E_c^{\mathrm{nl}}$ evaluation on the FFT grid used to converge the underlying traditional-DFT calculations. The evaluation of the nonlocal correlation energy, Eq. (2), involves a weighted double integral of a kernel with a significant short-range variation [16]. The shape of the kernel makes the $E_c^{\mathrm{nl}}$ evaluation sensitive to the particulars of FFT-type gridding [49], for example, to the position of the FFT grid points relative to the nuclei (for a finite grid-point spacing). However, robust evaluation of binding- or cohesive-energy contributions by nonlocal correlations can generally be secured by a further splitting of energy differences into steps that minimize the above-mentioned grid sensitivity. The problem of FFT sensitivity of the $E_c^{\mathrm{nl}}$ evaluation is accentuated because the binding in the $E_c^{\mathrm{nl}}$ channel arises as a smaller energy difference between sizable $E_c^{\mathrm{nl}}$ contributions of the system and of the fragments.
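The grid sensitivity described above can be illustrated with a one-dimensional toy model. The sketch below is not the vdW-DF kernel: it assumes a simple local, nonlinear "energy" (the grid sum of $n(x)^2$) and Gaussian fragment densities that are deliberately narrow compared with the grid spacing. All names and numbers are hypothetical. Two fragments are placed far apart, so the exact binding energy is zero; the example compares a reference in which each fragment keeps its position relative to the grid with one in which a fragment is shifted by half a grid spacing.

```python
import math

def grid_energy(centers, width=0.5, spacing=1.0, half_extent=40):
    """Grid sum of n(x)^2, where n(x) is a sum of Gaussians at `centers`.

    Toy stand-in for a nonlinear energy functional evaluated on a coarse grid.
    """
    total = 0.0
    for i in range(-half_extent, half_extent + 1):
        x = i * spacing
        n = sum(math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers)
        total += n * n * spacing
    return total

xa, xb = -10.0, 10.0  # two well-separated fragments, both centred on grid points

combined = grid_energy([xa, xb])

# Reference fragments kept at the same positions relative to the grid.
binding_consistent = combined - grid_energy([xa]) - grid_energy([xb])

# Reference for fragment A evaluated half a grid spacing off its original position.
binding_shifted = combined - grid_energy([xa + 0.5]) - grid_energy([xb])

print(binding_consistent, binding_shifted)
```

In this toy setting the consistent reference reproduces the vanishing binding energy to rounding accuracy, while the shifted reference produces a spurious contribution that is purely a grid artifact.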
Conversely, convergence in vdW-DF calculations of binding and cohesive energies can be obtained even at a moderate FFT grid accuracy (0.13 Å used here) by devising a calculational scheme that always maintains identical positions of the nuclei relative to grid points in the combined systems as well as in the fragment reference system.

Thus we obtain a numerically robust evaluation of the $E_c^{\mathrm{nl}}$ energy differences by choosing steps for which we can explicitly control the FFT gridding. For adsorption and absorption cases we calculate the reference systems as a sum of $E_c^{\mathrm{nl}}$ contributions for each fragment and we make sure to always place the fragment at exactly the same position as in the interacting system. For bulk systems we choose steps in which we exclusively adjust the inter-plane or in-plane lattice constant. Here the reference system is then simply defined as a system with either double (or in some cases quadruple) lattice constant and with a corresponding doubling of the FFT gridding along the relevant unit-cell vector.

The cost of full convergence is that, in practice, we often do three or more GGA calculations and subsequent $E_c^{\mathrm{nl}}$ calculations for each point on the adsorption, absorption, or formation-energy curve. In addition to the calculations for the full system we have to do one for each of the isolated fragments at identical position in the adsorption/absorption cases and one or more for fragments in the doubled unit-cell and doubled gridding reference. We have explicitly tested that using an FFT grid spacing of < 0.13 Å (but not larger) for such reference calculations is sufficient to ensure full convergence in the reported $E_c^{\mathrm{nl}}$ (and $E^{\text{vdW-DF}}$ total) energy variation for graphitic systems.

C. Material formation and sorption energies

The cohesive energy of graphite (G) is the energy gain, per carbon atom, of creating graphite at in-plane lattice constant $a$ and layer separation $d_{\text{C-C}}$ from isolated (spin-polarized) carbon atoms,
$$E_{\mathrm{G,coh}}(a, d_{\text{C-C}}) = E_{\mathrm{G,tot}}(a, d_{\text{C-C}}) - E_{\text{C-atom,tot}}, \qquad (4)$$

where $E_{\mathrm{G,tot}}$ and $E_{\text{C-atom,tot}}$ are total energies per carbon atom. The graphite structure is stable at the minimum of the cohesive energy, at lattice constants $a = a_G$ and $2d_{\text{C-C}} = c_G$.

The adsorption (absorption) energy for a p(2 × 2) K layer over (under) the top layer of a graphite surface is the difference in total energy [from Eq. (3)] for the system at hand minus the total energy of the initial system, i.e., a clean graphite surface and isolated gas-phase potassium atoms. However, due to the above-mentioned technical issues in using the vdW-DF we calculate the adsorption and absorption energy as a sum of (artificial) stages leading to the desired system. First, the initially isolated, spin-polarized potassium atoms are gathered into a free-floating potassium layer with the structure corresponding to a full cover of potassium atoms. By this the total system gains the energy $\Delta E_{\text{K-layer}}(a_G)$, with

$$\Delta E_{\text{K-layer}}(a) = E_{\mathrm{K,tot}}(a) - E_{\text{K-atom,tot}}. \qquad (5)$$

In adsorption the potassium layer is then simply placed on top of the four-layer (2 × 2) graphite surface (with the K atoms above graphite hollows) at distance $d_{\text{C-K}}$. The system thereby gains a further energy contribution $\Delta E_{\text{K-G}}(d_{\text{C-K}})$. This leads to an adsorption energy per K atom

$$E_{\mathrm{ads}}(d_{\text{C-K}}) = \Delta E_{\text{K-layer}}(a_G) + \Delta E_{\text{K-G}}(d_{\text{C-K}}). \qquad (6)$$

In absorption the top graphite layer is peeled off the (2 × 2) graphite surface and moved to a distance far from the remains of the graphite surface. This process costs the system an ("exfoliation") energy $-\Delta E_{\text{C-G}} = -[E_{\text{tot,C-G}}(d_{\text{C-C}} = c_G/2) - E_{\text{tot,C-G}}(d_{\text{C-C}} \to \infty)]$. At the far distance the isolated graphite layer is moved into AA stacking with the surface, at no extra energy cost. Then, the potassium layer is placed midway between the far-away graphite layer and the remains of the graphite surface. Finally the two layers are gradually moved towards the surface.
At distance $2d_{\text{C-K}}$ between the two topmost graphite layers (sandwiching the K layer) the system has further gained an energy $\Delta E_{\text{C-K-G}}(d_{\text{C-K}})$. The absorption energy per K atom is thus

$$E_{\mathrm{abs}}(d_{\text{C-K}}) = -\Delta E_{\text{C-G}} + \Delta E_{\text{K-layer}}(a_G) + \Delta E_{\text{C-K-G}}(d_{\text{C-K}}). \qquad (7)$$

Similarly, the C8K intercalate compound is formed from graphite by first moving the graphite layers far apart accordion-like (and there shifting the graphite stacking from ABA . . . to AAA . . . at no energy cost), then changing the in-plane lattice constant of the isolated graphene layers from $a_G$ to $a$, then intercalating K layers (in stacking αβγδ) between the graphite layers, and finally moving all the K and graphite layers back like an accordion, with in-plane lattice constant $a$ (which has the value $a_{\mathrm{C8K}}$ at equilibrium). In practice, a unit cell of four periodically repeated graphite layers is used in order to accommodate the potassium αβγδ stacking.

The energy gain of creating a (2 × 2) graphene sheet from 8 isolated carbon atoms is defined similarly to that of the K layer:

$$\Delta E_{\text{C-layer}}(a) = E_{\text{C-layer,tot}}(a) - 8E_{\text{C-atom,tot}}. \qquad (8)$$

The formation energy for the C8K intercalate compound per K atom or formula unit, $E_{\mathrm{form}}$, is thus found from the energy cost of moving four graphite layers apart by expanding the (2 × 2) unit cell to large height, $-\Delta E_{\text{G-acc}}$, the cost of changing the in-plane lattice constant from $a_G$ to $a$ in each of the four isolated graphene layers, $4(\Delta E_{\text{C-layer}}(a) - \Delta E_{\text{C-layer}}(a_G))$, the gain of creating four K layers from isolated K atoms, $4\Delta E_{\text{K-layer}}(a)$, plus the gain of bringing four K layers and four graphite layers together in the C8K structure, $\Delta E_{\text{C8K-acc}}(a, d_{\text{C-K}})$, yielding

$$E_{\mathrm{form}}(a, d_{\text{C-K}}) = -\Delta E_{\text{G-acc}} + 4\Delta E_{\text{C-layer}}(a) - 4\Delta E_{\text{C-layer}}(a_G) + 4\Delta E_{\text{K-layer}}(a) + \Delta E_{\text{C8K-acc}}(a, d_{\text{C-K}}). \qquad (9)$$

The relevant energies to use for comparing the three different mechanisms of including potassium (adsorption, absorption and intercalation) are thus $E_{\mathrm{ads}}(d_{\text{C-K}})$, $E_{\mathrm{abs}}(d_{\text{C-K}})$ and $E_{\mathrm{form}}(a, d_{\text{C-K}})$ at their respective minimum values.

IV.
RESULTS

Experimental observations indicate that the intercalation of potassium into graphite starts with the absorption of evaporated potassium into an initially clean graphite surface [50]. This subsurface absorption is preceded by initial, sparse potassium adsorption onto the surface, and proceeds with further absorption into deeper graphite voids. The general view is that the K atoms enter graphite at the graphite step edges [20]. The amount and position of intercalated K atoms are controlled by the temperature and time of evaporation.

Below, we first describe the initial clean graphite system, and the energy gain in (artificially) creating free-floating K layers from isolated K atoms. Then we present and discuss our results on potassium adsorption and subsurface absorption, followed by a characterization of bulk C8K.

For the adsorption (absorption) system we calculate the adsorption (absorption) energy curve, including the equilibrium structure. As a demonstration of the need for a relatively fine FFT gridding in the vdW-DF calculations we also calculate and compare the absorption curve for a sparser FFT grid. For the bulk systems (graphite and C8K) we determine the lattice parameters and the bulk modulus. We also calculate the formation energy of C8K and the energy needed to peel off one graphite layer from the graphite surface and compare with experiment [51].

A. Graphite bulk structure

The present calculations on pure graphite are for the natural, AB-stacked graphite (lower panel of Fig. 1). The cohesive energy is calculated at a total of 232 structure values $(a, d_{\text{C-C}})$ and the equilibrium structure and bulk modulus $B_0$ are then evaluated using the method described in Ref. 52.

Figure 3 shows a contour plot of the graphite cohesive energy variation $E_{\mathrm{G,coh}}$ as a function of the layer separation $d_{\text{C-C}}$ and the in-plane lattice constant $a$, calculated within the vdW-DF scheme.
The contour spacing is 5 meV per carbon atom, shown relative to the energy minimum located at $(a, d_{\text{C-C}}) = (a_G, c_G/2)$ = (2.476 Å, 3.59 Å). These values are summarized in Table I together with the results obtained from a semilocal PBE calculation. As expected, and discussed in Ref. 14, the semilocal PBE calculation yields unrealistic results for the layer separation. The table also presents the corresponding experimental values. Our calculated lattice values obtained using vdW-DF are in good agreement with experiment [40], and close to those found from the older vdW-DF of Refs. 14 and 15 (in which we assume, for $E_c^{\mathrm{nl}}$, translational invariance of $n(\mathbf{r})$ along the graphite planes), at (2.47 Å, 3.76 Å).

Consistent with experimental reports [18] and our previous calculations [14,15,45] we find graphite to be rather soft, as indicated by the bulk modulus $B_0$ value. Since in-plane compression is very hard in graphite, most of the softness suggested by (the isotropic) $B_0$ comes from compression perpendicular to the graphite layers, and the value of $B_0$ is expected to be almost identical to the $C_{33}$ elastic coefficient [14,18].

FIG. 3: Graphite cohesive energy $E_{\mathrm{G,coh}}$ (AB-stacked), based on vdW-DF, as a function of the carbon layer separation $d_{\text{C-C}}$ [Å] and the in-plane lattice constant $a$. The energy contours are spaced by 5 meV per carbon atom.

TABLE I: Optimized structure parameters and elastic properties for natural hexagonal graphite (AB stacking) and the potassium-intercalated graphite structure C8K in AαAβAγAδAα . . . stacking. The table shows the calculated optimal values of the in-plane lattice constant $a$, the (graphite-)layer-layer separation $d_{\text{C-C}}$, and the bulk modulus $B_0$. In C8K the value of $d_{\text{C-C}}$ is twice the graphite-potassium distance $d_{\text{C-K}}$.

                     Graphite                          C8K
                     PBE      vdW-DF   Exp.            PBE      vdW-DF   Exp.
$a$ (Å)              2.473    2.476    2.459 (a)       2.494    2.494    2.480 (b)
$d_{\text{C-C}}$ (Å)  ≫ 4      3.59     3.336 (a)       5.39     5.53     5.35 (c)
$B_0$ (GPa)                   27       37 (d,e)        37       26       47 (d,e)

(a) Ref. 40. (b) Ref. 53. (c) Ref. 4. (d) Ref. 18. (e) Value presented is for $C_{33}$; for laterally rigid materials, like graphite and C8K, $C_{33}$ is a good approximation of $B_0$.

We find the energy cost of peeling off a graphite layer from the graphite surface (the exfoliation energy) to be $\Delta E_{\text{C-G}}$ = −435 meV per (2 × 2) unit cell, i.e., −55 meV per surface carbon atom (Table II). A recent experiment [51] measured the desorption energy of polycyclic aromatic hydrocarbons (basically flakes of graphite sheets) off a graphite surface. From this experiment the energy cost of peeling off a graphite layer from the graphite surface was deduced to be −52 ± 5 meV/atom. Our value of −55 meV/C-atom is also consistent with a separate vdW-DF determination [29] of the binding (−47 meV per in-plane atom) between two (otherwise) isolated graphene sheets.

For the energies of the absorbate system and of the C8K intercalate a few other graphite-related energy contributions are needed. The energy of collecting C atoms to form a graphene sheet at lattice constant $a$ from isolated (spin-polarized) atoms is given by $\Delta E_{\text{C-layer}}(a)$; we find that changing the lattice constant $a$ from $a_G$ to the equilibrium value $a_{\mathrm{C8K}}$ of C8K causes this energy to change by a mere 30 meV per (2 × 2) sheet. The contribution $\Delta E_{\text{G-acc}}$ is the energy of moving bulk graphite layers (in this case four periodically repeated layers) far away from each other, by expanding the unit cell along the direction perpendicular to the layers. Thus, $\Delta E_{\text{G-acc}} = 32\Delta E_{\mathrm{G,coh}}(a_G, c_G/2) - 4\Delta E_{\text{C-layer}}(a_G)$, taking the number of atoms and layers per unit cell into account. We find the value $\Delta E_{\text{G-acc}}$ = −1600 meV per (2 × 2) four-layer unit cell. This corresponds to −50 meV per C atom, again consistent with our result for the exfoliation energy, $\Delta E_{\text{C-G}}/8$ = −55 meV.

B. Creating a layer of K atoms

The (artificial) step of creating a layer of potassium atoms from isolated atoms releases a significant energy $\Delta E_{\text{K-layer}}$.
This energy contains the energy variation with in-plane lattice constant and the energy cost of changing from a spin-polarized to a spin-balanced electron configuration for the isolated atom [54]. The creation of the K layer provides an energy gain which is about half an eV per potassium atom, depending on the final lattice constant. With the graphite lattice constant $a_G$ the energy change, including the spin-change cost, is $\Delta E_{\text{K-layer}}(a_G)$ = −476 meV per K atom in vdW-DF (−624 meV when calculated within PBE), whereas $\Delta E_{\text{K-layer}}(a_{\mathrm{C8K}})$ = −473 meV in vdW-DF.

C. Graphite-on-surface adsorption of potassium

The potassium atoms are adsorbed on a usual ABA . . .-stacked graphite surface. We consider here full (one monolayer) coverage, which is one potassium atom per (2 × 2) graphite surface unit cell. This orders the potassium atoms in a honeycomb structure with lattice constant $2a_G$, and a nearest-neighbor distance within the K layer of $a_G$.

The unit cell used in the standard DFT calculations for adsorption and absorption has a height of 40 Å and includes a vacuum region sufficiently big that no interactions (within GGA) can occur between the top graphene sheet and the slab bottom in the periodically repeated image of the slab. The vacuum region is also large in order to guarantee that the separation from any atom to the dipole layer [55] always remains larger than 4 Å.

In the top panel of Fig. 4 we show the adsorption energy per potassium atom. The adsorption energy at equilibrium is −937 meV per K atom at distance $d_{\text{C-K}}$ = 3.02 Å from the graphite surface.

For comparison we also show the adsorption curve calculated in a PBE-only traditional DFT calculation. Since the interaction between the K layer and the graphite surface has a short-range component to it, even GGA calculations, such as the PBE curve, show significant binding (−900 meV/K-atom at $d_{\text{C-K}}$ = 2.96 Å). This is in contrast to the pure vdW binding between the layers in clean graphite [14,15]. Note that the asymptote of the PBE curve is different from that of the vdW-DF curve; this is due to the different energy gains ($\Delta E_{\text{K-layer}}$) in collecting a potassium layer from isolated atoms when calculated in PBE or in vdW-DF.

FIG. 4: Potassium adsorption and absorption energy at the graphite surface as a function of the separation $d_{\text{C-K}}$ [Å] of the K-atom layer and the nearest graphite layer(s) (at in-plane lattice constant corresponding to that of the surface, $a_G$). Top panel: Adsorption curve based on vdW-DF calculations (solid line with black circles) and PBE GGA calculations (dashed line). The horizontal lines to the left show the energy gain in creating the isolated K layer from isolated atoms, $\Delta E_{\text{K-layer}}(a_G)$, the asymptote of $E_{\mathrm{ads}}(d_{\text{C-K}})$ in this plot. Bottom panel: Absorption curve based on vdW-DF calculations. The asymptote is here the sum $\Delta E_{\text{K-layer}}(a_G) - \Delta E_{\text{C-G}}$. Inset: Binding energy of the K layer and the top graphite layer ("C-layer") on top of the graphite slab, $\Delta E_{\text{C-K-G}}$. The dashed curve ("sparse") shows our results when, in the $E_c^{\mathrm{nl}}$ evaluation, we ignore every second FFT grid point (in each direction) of the charge density from the underlying GGA calculations; the solid curve with black circles shows the result of using every available FFT grid point.

For K adsorption the vdW-DF and PBE curves agree reasonably well, and the use of vdW-DF for this specific calculation is not urgently necessary. However, in order to compare the adsorption results consistently to absorption, intercalation and clean graphite, it is necessary to include the long-range interactions through vdW-DF. As shown for the graphite bulk results above, PBE yields quantitatively and qualitatively wrong results for the layer separation.

D. Graphite-subsurface absorption of potassium

The first subsurface absorption of K takes place in the void under the topmost graphite layer.
The surface absorption of the first K layer causes a lateral shift of the top graphite sheet, resulting in an A/K/ABAB . . . stacking of the graphite. We have studied the bonding nature of this absorption process by considering a full p(2 × 2)-intercalated potassium layer in the subsurface of a four-layer-thick graphite slab.

Following the recipe of Section III for the absorption energy (7), the energies $\Delta E_{\text{C-K-G}}$ are approximated by those from a four-layer intercalated graphite slab with the stacking A/K/ABA, and the values are shown in the inset of Fig. 4. The absorption energy $E_{\mathrm{abs}}$ is given by the curve in the bottom panel of Fig. 4, and its minimum is −952 meV per K atom at $d_{\text{C-K}}$ = 2.90 Å.

To investigate what grid spacing is sufficiently dense to obtain converged total-energy values in vdW-DF we do additional calculations in the binding distance region with a sparser grid. Specifically, the inset of Fig. 4 compares the vdW-DF calculated at full gridding with one that uses only every other FFT grid point in each direction, implying a grid spacing for $E_c^{\mathrm{nl}}$ (but not for the local terms) of at most 0.26 Å. We note that using the full grid yields smaller absolute values of the absorption energy. We also notice that the effect is more pronounced for small separations than for larger distances. Thus, given resources, the dense FFT grid calculations are preferred, but even the less dense FFT grid calculations yield reasonably well-converged results.

In all calculations (except tests of our graphitic systems) we use a spacing of at most 0.13 Å between grid points. This is a grid spacing for which we have explicitly tested convergence of the vdW-DF for graphitic systems given the computational strategy described and discussed in Sec. III.

E. Potassium-intercalated graphite

When potassium atoms penetrate the gallery of the graphite, they form planes that are ordered in a p(2 × 2) fashion along the planes.
The K intercalation causes a shift of every second carbon layer, resulting in an AA stacking of the graphite sheets. The K atoms then simply occupy the sites over the hollows of every fourth carbon hexagon. The order of the K atoms perpendicular to the planes is described by the αβγδ stacking, illustrated in Fig. 2.

For the potassium intercalated compound C8K we calculate in standard DFT using PBE the total energy at 132 different combinations of the structural parameters $a$ and $d_{\text{C-C}}$. The charge densities and energy terms of these calculations are then used as input to vdW-DF. The equilibrium structure and elastic properties ($B_0$) both for the vdW-DF results and for the PBE results are then evaluated with the same method as in the graphite case [52].

TABLE II: Comparison of the graphite exfoliation energy per surface atom, $\Delta E_{\text{C-G}}/8$, graphite layer binding energy per carbon atom, $\Delta E_{\text{C-acc}}/32$, the energy gain per K atom of collecting K and graphite layers at equilibrium to form C8K, $\Delta E_{\text{C8K-acc}}/4$, and the equilibrium formation energy of C8K, $E_{\mathrm{form}}$.

            $\Delta E_{\text{C-G}}/8$   $\Delta E_{\text{C-acc}}/32$   $\Delta E_{\text{C8K-acc}}/4$   $E_{\mathrm{form}}$
            [meV/atom]     [meV/atom]     [meV/C8K]      [meV/C8K]
vdW-DF      −55            −50            −818           −861
PBE         −              −              −511           −
Exp.        −52 ± 5 (a)                                  −1236 (b)

(a) Ref. 51. (b) Ref. 1.

FIG. 5: Formation energy of C8K, $E_{\mathrm{form}}$, as a function of the carbon-to-carbon layer separation $d_{\text{C-C}}$ [Å] and half the in-plane lattice constant, $a$. The energy contours are spaced by 20 meV per formula unit.

Figure 5 shows a contour plot of the C8K formation energy, calculated in vdW-DF, as a function of the C-C layer separation ($d_{\text{C-C}}$) and the in-plane periodicity ($a$) of the graphite-layer structure. The contours are spaced by 20 meV per formula unit and are shown relative to the energy minimum at $(a, d_{\text{C-C}})$ = (2.494 Å, 5.53 Å).

V. DISCUSSION

Table I presents an overview of our structural results obtained with the vdW-DF for graphite and C8K. The table also contrasts the results with the corresponding values calculated with PBE where available.
The vdW-DF value $d_{\text{C-C}}$ = 5.53 Å for the C8K C-C layer separation is 3% larger than the experimentally observed value, whereas the PBE value corresponds to less than a 1% expansion. Our vdW-DF result for the C8K bulk modulus (26 GPa) is also softer than the PBE result (37 GPa) and further away from the experimental estimates (47 GPa) based on measurements of the $C_{33}$ elastic response [18].

A small overestimation of atomic separation is consistent with the vdW-DF behavior that has been documented in a wide range of both finite and extended systems [14,15,16,17,29,30,33,34]. This overestimation results, at least in part, from our choice of parametrization of the exchange behavior, an aspect that lies beyond the present vdW-DF implementation, which focuses on improving the account of the nonlocal correlations, per se. It is likely that systematic investigations of the exchange effects can further refine the accuracy of vdW-DF implementations [56]. In any case, vdW-DF theory calculations represent, in contrast to PBE, the only approach to obtain a full ab initio characterization of the AM intercalation process.

The C8K system is more compact than graphite and this explains why PBE alone can here provide a good description of the materials structure and at least some materials properties, whereas it fails completely for graphite. The distance between the graphene sheets upon intercalation of potassium atoms is stretched compared to that of pure graphite, but the (K-)layer to (graphite-)layer separation, $d_{\text{C-K}} = d_{\text{C-C}}/2$ = 2.77 Å, is significantly less than the layer-layer separation in pure graphite. This indicates that C8K is likely held together, at least in part, by shorter-ranged interactions.

Table II documents that the vdW binding nevertheless plays an important role in the binding and formation of C8K.
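The vdW-DF entries of Table II can be cross-checked against the stage-wise bookkeeping of Eqs. (6), (7) and (9). The sketch below only re-does arithmetic with values quoted in Sec. IV; the function and variable names are hypothetical. Two assumptions are made and should be read as such: the 30 meV change of $\Delta E_{\text{C-layer}}$ per sheet is taken as a positive cost, and the right-hand side of Eq. (9), evaluated per (2 × 2) four-layer cell, is divided by four to obtain a value per C8K formula unit.

```python
# All energies in meV, vdW-DF values quoted in Sec. IV and Table II.
dE_K_layer_aG = -476.0     # K-layer formation per K atom at a_G
dE_K_layer_aC8K = -473.0   # K-layer formation per K atom at a_C8K
dE_C_G = -435.0            # exfoliation energy per (2x2) cell
dE_G_acc = -1600.0         # graphite accordion energy per (2x2) four-layer cell
dE_C_layer_change = 30.0   # per sheet, a_G -> a_C8K (sign assumed: a cost)
dE_C8K_acc = 4 * -818.0    # C8K accordion energy per four-layer cell
E_ads_min = -937.0         # adsorption minimum per K atom
E_abs_min = -952.0         # absorption minimum per K atom

def formation_energy_cell(dE_G_acc, dE_C_layer_change, dE_K_layer, dE_C8K_acc):
    """Right-hand side of Eq. (9) for a cell with four graphite and four K layers."""
    return -dE_G_acc + 4 * dE_C_layer_change + 4 * dE_K_layer + dE_C8K_acc

# Eq. (9), converted to one formula unit (assumed division by four).
E_form = formation_energy_cell(dE_G_acc, dE_C_layer_change,
                               dE_K_layer_aC8K, dE_C8K_acc) / 4

# Contributions implied (not quoted in the text) by Eqs. (6) and (7) at the minima.
dE_K_G = E_ads_min - dE_K_layer_aG               # Eq. (6) solved for dE_K-G
dE_C_K_G = E_abs_min + dE_C_G - dE_K_layer_aG    # Eq. (7) solved for dE_C-K-G

# Per-atom conversions used in Table II.
exfoliation_per_atom = dE_C_G / 8
accordion_per_atom = dE_G_acc / 32

print(E_form, dE_K_G, dE_C_K_G, exfoliation_per_atom, accordion_per_atom)
```

Under these assumptions the stage-wise sum reproduces the tabulated formation energy of −861 meV per C8K, and the per-atom conversions give about −54 meV and −50 meV for exfoliation and accordion separation, in line with the rounded table entries.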
The table summarizes and contrasts our vdW-DF and PBE results for graphite exfoliation and layer binding energies as well as C8K interlayer binding and formation energies. The vdW-DF result for the C8K formation energy is smaller than experimental measurements by 31% but it nevertheless represents a physically motivated ab initio calculation. In contrast, the C8K formation energy is simply unavailable in PBE because PBE, as indicated, fails to describe the layer binding in graphite. Moreover, for the vdW-DF/PBE comparisons that we can make (for example, of the C8K layer interaction $\Delta E_{\text{C8K-acc}}$) the vdW-DF is found to significantly strengthen the bonding compared with PBE.

It is also interesting to note that the combination of shorter-ranged and vdW bonding components in C8K yields a layer binding energy that is close to that of the graphite case. In spite of the difference in the nature of the interactions, we find almost identical binding energies per layer for the case of the exfoliation and accordion in graphite and for the accordion in C8K. This observation testifies to a perhaps surprising strength of the so-called soft-matter vdW interactions.

In a wider perspective our vdW-DF permits a first comparison of the range of AM-graphite systems from adsorption over absorption to full intercalation and thus insight into the intercalation progress. Assuming a dense 2 × 2 configuration, we find that the energies for potassium adsorption and absorption are nearly degenerate, with an indication that absorption is slightly preferred, consistent with experimental behavior. We also find that the potassium absorption may eventually proceed towards full intercalation thanks to a significant release of formation energy.

VI. CONCLUSIONS

The potassium intercalation process in graphite has been investigated by means of the vdW-DF density functional method. This method includes the dispersive interactions needed for a consistent investigation of the intercalation process.
For clean graphite the vdW-DF predicts, contrary to standard semilocal DFT implementations, a stabilized bulk system with equilibrium crystal parameters in close agreement with experiments. Two limits of the absorption process have been investigated by the vdW-DF, namely single-layer subsurface absorption and the fully potassium-intercalated stage-1 crystal C8K. Here the vdW-DF is shown to enhance the (semi-)local type of bonding described by traditional approaches. The significant impact on the materials behavior indicates that the vdW-DF is needed not only for a consistent description of sparse matter systems that are solely stabilized by dispersion forces, but also for their intercalates.

We thank D.C. Langreth and B.I. Lundqvist for stimulating discussions. Partial support from the Swedish Research Council (VR), the Swedish National Graduate School in Materials Science (NFSM), and the Swedish Foundation for Strategic Research (SSF) through the consortium ATOMICS is gratefully acknowledged, as well as allocation of computer time at UNICC/C3SE (Chalmers) and SNIC (Swedish National Infrastructure for Computing).

∗ Electronic address: hyldgaar@chalmers.se
[1] S. Aronson, F.J. Salzano, and D. Ballafiore, J. Chem. Phys. 49, 434 (1968).
[2] D.E. Nixon and G.S. Parry, J. Phys. D 1, 291 (1968).
[3] R. Clarke, N. Wada, and S.A. Solin, Phys. Rev. Lett. 44, 1616 (1980).
[4] M.S. Dresselhaus and G. Dresselhaus, Adv. Phys. 30, 139 (1981).
[5] D.P. DiVincenzo and E.J. Mele, Phys. Rev. B 32, 2538 (1985).
[6] N.B. Hannay, T.H. Geballe, B.T. Matthias, K. Andreas, P. Schmidt, and D. MacNair, Phys. Rev. Lett. 14, 225 (1965).
[7] R.A. Jishi and M.S. Dresselhaus, Phys. Rev. B 45, 12465 (1992).
[8] T. Kihlgren, T. Balasubramanian, L. Walldén, and R. Yakimova, Surf. Sci. 600, 1160 (2006).
[9] M. Breitholtz, T. Kihlgren, S.-Å. Lindgren, and L. Walldén, Phys. Rev. B 66, 153401 (2002).
[10] I. Forbeaux, J.-M. Themlin, and J.-M. Debever, Phys. Rev. B 58, 16396 (1998).
[11] T. Kihlgren, T.
Balasubramanian, L. Walldén, and R. Yakimova, Phys. Rev. B 66, 235422 (2002).
[12] I. Forbeaux, J.-M. Themlin, A. Charrier, F. Thibaudau, and J.-M. Debever, Appl. Surf. Sci. 162–163, 406 (2000).
[13] E. Ziambaras, Ph.D. thesis, Chalmers (2006).
[14] H. Rydberg, M. Dion, N. Jacobson, E. Schröder, P. Hyldgaard, S.I. Simak, D.C. Langreth, and B.I. Lundqvist, Phys. Rev. Lett. 91, 126402 (2003).
[15] D.C. Langreth, M. Dion, H. Rydberg, E. Schröder, P. Hyldgaard, and B.I. Lundqvist, Int. J. Quantum Chem. 101, 599 (2005).
[16] M. Dion, H. Rydberg, E. Schröder, D.C. Langreth, and B.I. Lundqvist, Phys. Rev. Lett. 92, 246401 (2004); 95, 109902(E) (2005).
[17] T. Thonhauser, V.R. Cooper, S. Li, A. Puzder, P. Hyldgaard, and D.C. Langreth, Van der Waals density functional: Self-consistent potential and the nature of the van der Waals bond, http://arxiv.org/abs/cond-mat/0703442
[18] N. Wada, R. Clarke, and S.A. Solin, Solid State Comm. 35, 675 (1980).
[19] H. Zabel and A. Magerl, Phys. Rev. B 25, 2463 (1982).
[20] J.C. Barnard, K.M. Hock, and R.E. Palmer, Surf. Science 287–288, 178 (1993).
[21] K.M. Hock and R.E. Palmer, Surf. Science 284, 349 (1993).
[22] Z.Y. Li, K.M. Hock, and R.E. Palmer, Phys. Rev. Lett. 67, 1562 (1991).
[23] S.D. Chakarova and E. Schröder, Materials Science and Engineering C 25, 787 (2005).
[24] L.A. Girifalco and M. Hodak, Phys. Rev. B 65, 125404 (2002).
[25] J.P. Perdew, J.A. Chevary, S.H. Vosko, K.A. Jackson, M.R. Pederson, D.J. Singh, and C. Fiolhais, Phys. Rev. B 48, 6671 (1992).
[26] J.P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996).
[27] Y. Zhang and W. Yang, Phys. Rev. Lett. 80, 890 (1998).
[28] B. Hammer, L.B. Hansen, and J.K. Nørskov, Phys. Rev. B 59, 7413 (1999).
[29] S.D. Chakarova-Käck, J. Kleis, and E. Schröder, Appl. Phys. Rep. 2005-16 (2005).
[30] J. Kleis, B.I. Lundqvist, D.C. Langreth, and E.
Schröder, Towards a working density-functional theory for polymers: First-principles determination of the polyethylene crystal structure, http://arxiv.org/abs/cond-mat/0611498
[31] S.D. Chakarova-Käck, E. Schröder, B.I. Lundqvist, and D.C. Langreth, Phys. Rev. Lett. 96, 146107 (2006).
[32] S.D. Chakarova-Käck, Ø. Borck, E. Schröder, and B.I. Lundqvist, Phys. Rev. B 74, 155402 (2006).
[33] A. Puzder, M. Dion, and D.C. Langreth, J. Chem. Phys. 124, 164105 (2006).
[34] T. Thonhauser, A. Puzder, and D.C. Langreth, J. Chem. Phys. 124, 164106 (2006).
[35] D.D.L. Chung, J. Mat. Sci. 37, 1475 (2002).
[36] M. Breitholtz, T. Kihlgren, S.-Å. Lindgren, H. Olin, E. Wahlström, and L. Walldén, Phys. Rev. B 64, 073301 (2001).
[37] Z.P. Hu, N.J. Wu, and A. Ignatiev, Phys. Rev. B 33, 7683 (1986).
[38] J. Cui, J.D. White, R.D. Diehl, J.F. Annett, and M.W. Cole, Surf. Sci. 279, 149 (1992).
[39] L. Österlund, D.V. Chakarov, and B. Kasemo, Surf. Sci. 420, L437 (1991).
[40] Y. Baskin and L. Meyer, Phys. Rev. 100, 544 (1955).
[41] W. Eberhardt, I.T. McGovern, E.W. Plummer, and J.E. Fisher, Phys. Rev. Lett. 44, 200 (1980).
[42] A.R. Law, J.J. Barry, and H.P. Hughes, Phys. Rev. B 28, 5332 (1983).
[43] R. Ahuja, S. Auluck, J. Trygg, J.M. Wills, O. Eriksson, and B. Johansson, Phys. Rev. B 51, 4813 (1995).
[44] N.A.W. Holzwarth, S.G. Louie, and S. Rabii, Phys. Rev. B 26, 5382 (1982).
[45] H. Rydberg, N. Jacobson, P. Hyldgaard, S.I. Simak, B.I. Lundqvist, and D.C. Langreth, Surf. Sci. 532-535, 606 (2003).
[46] Open-source plane-wave DFT computer code Dacapo, http://www.fysik.dtu.dk/CAMPOS/
[47] D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).
[48] H.J. Monkhorst and J.D. Pack, Phys. Rev. B 13, 5188 (1976).
[49] D.C. Langreth, private communication; J. Kleis and P. Hyldgaard, unpublished.
[50] The transition from on-surface adsorption to subsurface absorption is identified in experiment by a work function change, Refs. 20 and 21.
[51] R.
Zacharia, H. Ulbricht, and T. Hertel, Phys. Rev. B 69, 155406 (2004).
[52] E. Ziambaras and E. Schröder, Phys. Rev. B 68, 064112 (2003).
[53] D.E. Nixon and G.S. Parry, J. Phys. C 2, 1732 (1969).
[54] O. Gunnarsson, B.I. Lundqvist, and J.W. Wilkins, Phys. Rev. B 10, 1319 (1974). Since no spin-polarized version of vdW-DF exists at present, we calculate the energy cost for changing the spin of isolated potassium atoms in PBE. The spin-change cost is thus determined to be 26 meV/K-atom.
[55] L. Bengtsson, Phys. Rev. B 59, 12301 (1999), and references therein.
[56] The choice of exchange flavor in vdW-DF was set in Ref. 15 to avoid artificial bonding in noble-gas systems and to better mimic exact exchange calculations for those systems. However, it is far from certain and even unlikely that the conclusions drawn for noble-gas systems carry over to bonding separations smaller than 3 Å.

ABSTRACT

Potassium intercalation in graphite is investigated by first-principles theory. The bonding in the potassium-graphite compound is reasonably well accounted for by traditional semilocal density functional theory (DFT) calculations. However, investigating the intercalate formation energy from pure potassium atoms and graphite requires a description of the graphite interlayer binding and thus a consistent account of the nonlocal dispersive interactions. This is included seamlessly with ordinary DFT by a van der Waals density functional (vdW-DF) approach [Phys. Rev. Lett. 92, 246401 (2004)]. The use of the vdW-DF is found to stabilize the graphite crystal, with crystal parameters in fair agreement with experiments. For graphite and potassium-intercalated graphite, structural parameters such as binding separation, layer binding energy, formation energy, and bulk modulus are reported. The adsorption and subsurface potassium absorption energies are also reported.
The vdW-DF description, compared with the traditional semilocal approach, is found to weakly soften the elastic response. <|endoftext|><|startoftext|> Introduction, one main in- convenience of liquid-crystal simulations is the correct identification of the solid phase(s) of the system, since a plethora of such phases are conceivable and there is no unfailing criterion for choosing those that are really relevant to the specific model under investigation. The actual importance of a given crystal phase can only be judged a posteriori, after proving its mechanical stability in a long simulation run and, ultimately, on the basis of the calculation of its Gibbs free energy, but nothing can nevertheless ensure that no important phase was skipped. Besides these vague indications, we adopted a more strin- gent test in order to select the phases for which it is worth performing the numerically-expensive calculation of the free energy. With specific reference to the model (2.2), we did a comprehensive T = 0 study of the chemical po- tential µ as a function of the pressure for many stretched cubic and hexagonal phases, in such a way as to iden- tify the stable ground states and leave out from further consideration all solids with a very large µ at zero tem- perature. In fact, it is unlikely that such phases can ever play a role for the thermodynamics at non-zero temper- atures. For the interaction potential describing the GCN model, we surmise that all of its stable crystal phases are to be sought among the structures obtained from the common cubic and hexagonal lattices by a suit- able stretching along a high-symmetry crystal axis, with optimal stretching ratios α that are probably close to L/D. Take e.g. the case of BCC. 
We can stretch it along [001], [110], or [111], this way defining new BCC001(α), BCC110(α), and BCC111(α) lattices (the number within parentheses is the stretching ratio; for in- stance, BCC001(2) is a BCC crystal whose unit cell has been expanded by a factor of 2 along ẑ). The same can be done with the simple-cubic (SC) and FCC structures. We further consider hexagonal-close-packed (HCP) and simple-hexagonal (SH) lattices that are stretched along [111], this way arriving at a total of eleven potentially relevant crystal phases. METHOD For fixed T and P values, the most stable of several thermodynamic phases is the one with lowest chemical potential µ (Gibbs free energy per particle). At T = 0, only crystal phases are involved in this competition and, once a list of relevant phases has been compiled, searching for the optimal one at a given P becomes a simple computational exercise. An exact property of the Gaussian-core model (which is the L/D = 1 limit of the GCN model) is that, on increasing pressure, the BCC crystal takes over the FCC crystal at P ∗ ≡ PD3/ǫ ≃ 0.055 [3]. Hence, in the GCN model with L/D > 1 a leading role is naturally expected for the stretched FCC and BCC crystals. For an assigned crystal structure, we calculate the T = 0 chemical potential µ(P ) of the GCN model for a given pressure P by adjusting the stretching ratio α(P ) and the density ρ(P ) until the minimum of (U+PV )/N is found. Once the profile of µ as a function of P is known for each structure, it is straightforward to draw the T = 0 phase diagram for the given L/D. The known thermodynamic behavior at zero tempera- ture provides the general framework for the further simu- lational study at non-zero temperatures. In fact, it is safe to say that the same crystals that are stable at T = 0 also give the underlying lattice structure for the stable solid phases at T > 0. 
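The T = 0 search described above (adjust α and ρ at fixed P until (U+PV)/N is minimal) can be sketched in a few lines of Python for the BCC001(α) family. This is only an illustration under stated assumptions, not the authors' code: model (2.2) is not reproduced in this part of the text, so the pair law u = ε exp[−(x²+y²)/D² − z²/L²] for parallel ellipsoids aligned with ẑ is assumed here, with ε = D = 1 as units. The function names, the grid-refinement minimizer, and the search windows are all hypothetical choices.

```python
import math

def energy_per_particle(a, c, L=3.0, tol=1e-12):
    """U/N of a body-centred tetragonal lattice (cell a x a x c).

    ASSUMPTION: pair energy exp(-(x^2 + y^2) - z^2 / L^2) in units eps = D = 1.
    With a Gaussian law the lattice sum factorizes into 1-D sums.
    """
    def theta(spacing, width, shift):
        # sum over n of exp(-((n + shift) * spacing / width)^2), truncated
        # where the terms drop below tol
        n_max = int(math.ceil(width / spacing * math.sqrt(-math.log(tol)))) + 2
        return sum(math.exp(-((n + shift) * spacing / width) ** 2)
                   for n in range(-n_max, n_max + 1))

    corners = theta(a, 1.0, 0.0) ** 2 * theta(c, L, 0.0) - 1.0  # drop self term
    centres = theta(a, 1.0, 0.5) ** 2 * theta(c, L, 0.5)
    return 0.5 * (corners + centres)

def enthalpy(rho, alpha, P, L=3.0):
    """(U + PV)/N for BCC001(alpha) at number density rho."""
    a = (2.0 / (alpha * rho)) ** (1.0 / 3.0)  # two lattice points per cell
    return energy_per_particle(a, alpha * a, L) + P / rho

def minimize_enthalpy(P, L=3.0, n_grid=21, n_zoom=6):
    """Grid search with successive zooms; returns (mu, rho, alpha)."""
    rho_lo, rho_hi = 0.02, 0.30          # hypothetical search window
    al_lo, al_hi = L - 1.0, L + 1.0      # alpha is expected close to L/D
    best = None
    for _ in range(n_zoom):
        d_rho = (rho_hi - rho_lo) / (n_grid - 1)
        d_al = (al_hi - al_lo) / (n_grid - 1)
        best = min((enthalpy(rho_lo + i * d_rho, al_lo + j * d_al, P, L),
                    rho_lo + i * d_rho, al_lo + j * d_al)
                   for i in range(n_grid) for j in range(n_grid))
        _, rho, al = best
        # shrink the window to two grid steps around the current optimum
        rho_lo, rho_hi = max(rho - 2 * d_rho, 1e-4), rho + 2 * d_rho
        al_lo, al_hi = max(al - 2 * d_al, 0.1), al + 2 * d_al
    return best

if __name__ == "__main__":
    mu, rho, alpha = minimize_enthalpy(P=0.05)
    print(f"mu = {mu:.6f}, rho = {rho:.4f}, alpha = {alpha:.3f}")
```

If the assumed pair law matches model (2.2), the call with P* = 0.05 should land close to the BCC001 row of Table I (α ≈ 3, ρ ≈ 0.086). Other lattice families would need their own basis and stretching axis; a gradient-based minimizer could replace the grid search without changing the idea.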
As we shall see in more detail in the next Section, the only complication is the existence of three degenerate T = 0 structures for not too small pressures, which obliged us to consider each of them as a potentially relevant low-temperature GCN phase. We perform a Monte Carlo (MC) simulation of the GCN model with L/D = 3 in the isothermal-isobaric ensemble, using the standard Metropolis algorithm with periodic boundary conditions and the nearest-image con- vention. For the solid phase, four different types of lattices are considered, namely FCC001(3), BCC110(3), BCC111(3), and BCC001(3) (see Section IV). The num- ber of particles in a given direction is chosen so as to guar- antee a negligible contribution to the interaction energy from pairs of particles separated by half a simulation- box length in that direction. More precisely, our samples consist of 10× 20× 8 = 1600 particles in the FCC001(3) phase, of 8 × 24 × 6 = 1152 particles in the fluid and in the solid BCC110(3) phase, of 10× 12× 18 = 2160 parti- cles in the BCC111(3) phase, and of 12× 12× 10 = 1440 particles in the BCC001(3) phase. Considering the large system sizes employed, we made no attempt to extrapo- late our finite-size results to infinity. At given T and P , equilibration of the sample typically took a few thousand MC sweeps, a sweep consisting of one average attempt per particle to change its center-of- mass position plus one average attempt to change the volume by a isotropic rescaling of particle coordinates. The maximum random displacement of a particle and the maximum volume change in a trial MC move are ad- justed once a sweep during the run so as to keep the acceptance ratio of moves close to 50% and 40%, respec- tively. While the above setup is sufficient when simu- lating a (nematic) fluid system, it could have harmful consequences on the sampling of a solid state to operate with a fixed box shape since this would not allow the system to release the residual stress. 
That is why, after a first rough optimization with a fixed box shape, the equilibrium MC trajectory of a solid state is generated with a modified (so-called constant-stress) Metropolis algorithm, which makes it possible to adjust the lengths of the various sides of the box independently of each other (see e.g. [8]). Ordinarily, however, the simulation box deviates only very little from its original shape. When the opposite occurs, this indicates a mechanical instability of the solid in favor of the fluid, and hence gives a clue as to where melting is located. We note that MC simulations with a varying box shape are not well suited to the fluid phase, since in this case one side of the box usually becomes much larger or smaller than the other two, which seriously compromises the reliability of the simulation results.

In order to locate the melting point for a given pressure, we generate separate sequences of simulation runs, starting from the cold solid on one side and from the hot fluid on the other side. The last configuration produced in a given run is taken as the first of the next run at a slightly different temperature. The starting configuration of a “solid” chain of runs was always a perfect crystal with α = 3 and a density equal to its T = 0 value. Usually, this series of runs is carried on until a sudden change is observed in the difference between the energies/volumes of solid and fluid, so as to avoid averaging over heterogeneous thermodynamic states. Thermodynamic averages are computed over trajectories 10⁴ sweeps long. Much longer trajectories are constructed for estimating the chemical potential of the fluid (see below).

Estimating statistical errors is a critical issue whenever different candidate solid structures compete so closely for thermodynamic stability.
To this aim, we divide the MC trajectory into ten blocks and take the length of the error bars to be twice the standard deviation of the block averages. Typically, the relative errors affecting the energy and the volume of the fluid are found to be very small, a few hundredths of a percent at most (for a solid, they are even smaller).

A more direct clue about the nature of the phase(s) exhibited by the system at intermediate temperatures can be obtained by carefully monitoring, across the state space, a “smectic” order parameter (OP) and two different distribution functions (DFs), transversal and longitudinal with respect to ẑ. The smectic OP is defined as

$$\tau(\lambda) = \left|\left\langle \frac{1}{N}\sum_{j=1}^{N} \exp\!\left(\frac{2\pi i\, z_j}{\lambda}\right) \right\rangle\right| , \qquad (3.1)$$

where z_j is the z coordinate of particle j. This quantity detects the existence of a layered structure along ẑ in the system, be it solid-like or smectic-like. In particular, the λ at which τ takes its largest value gives the nominal distance λmax between the layers. A large value of τ at λmax signals a strong layering along z with period λmax. In order to discriminate between solid and smectic (fluid) layers, we can rely on the in-plane DF g⊥(r⊥), with r⊥ = r − (r · ẑ)ẑ, which indicates how rapidly crystal-like spatial correlations decay in directions perpendicular to ẑ. The persistence of crystal order along ẑ is measured through another DF, g‖(z), which gives indications similar to those of τ(λ). A liquid-like profile of g⊥ along with a sharply peaked τ or g‖ is a faithful indication of a smectic phase. Conversely, a sharply peaked g⊥ along with a structureless g‖ is the imprint of a columnar phase. Both g⊥(r⊥) and g‖(z) are normalized so as to approach 1 at large distances for fully disordered center-of-mass distributions in the respective directions. Slight deviations from this asymptotic value may occur as a result of the variation of the box side lengths during a simulation run.
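As a rough illustration of how the smectic OP of Eq. (3.1) can be evaluated, the sketch below assumes the usual definition: the modulus of the particle-averaged phase factor exp(2πi z_j/λ). Whether the modulus is taken before or after the ensemble average cannot be recovered from the text, so the code handles a single configuration and leaves the averaging over configurations to the caller. The function names are hypothetical.

```python
import cmath
import math

def smectic_op(z, lam):
    """Single-configuration smectic order parameter for trial period lam.

    ASSUMED definition: | (1/N) * sum_j exp(2*pi*i*z_j / lam) |.
    """
    total = sum(cmath.exp(2j * math.pi * zj / lam) for zj in z)
    return abs(total) / len(z)

def best_period(z, trial_periods):
    """Return (tau_max, lambda_max): the largest OP and the period giving it."""
    return max((smectic_op(z, lam), lam) for lam in trial_periods)

if __name__ == "__main__":
    # Perfectly layered sample: particles stacked with spacing 1.5 along z.
    layered = [1.5 * k for k in range(200)]
    print(best_period(layered, [1.0, 1.2, 1.5, 1.7]))
```

For a perfectly layered set of z coordinates the OP equals 1 at the layer spacing (and at its integer fractions), while for N uncorrelated positions it is of order N^(-1/2), which is why a sharp maximum of τ(λ) signals layering.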
The two DFs were constructed with a spatial resolution of ∆r⊥ = D/20 and ∆z = L/20, respectively, and updated every 10 MC sweeps.

We compute the difference in chemical potential between any two equilibrium states of the system (say, 1 and 2) within the same phase (or even in different phases, provided they are separated by a second-order boundary) by the standard thermodynamic-integration method as adapted to the isothermal-isobaric ensemble, i.e., via the combined use of the formulas

$$\mu(T,P_2) - \mu(T,P_1) = \int_{P_1}^{P_2} dP\, v(T,P) \qquad (3.2)$$

and

$$\frac{\mu(T_2,P)}{T_2} - \frac{\mu(T_1,P)}{T_1} = -\int_{T_1}^{T_2} dT\, \frac{u(T,P) + P\,v(T,P)}{T^2} . \qquad (3.3)$$

To be really useful, however, the above equations require an independent estimate of µ for at least one reference state in each phase. For the fluid, a reference state can be any state characterized by a very small density (a nearly ideal gas), since then the excess chemical potential can be estimated accurately through Widom’s particle-insertion method [15]. The use of this technique for small but finite densities avoids the otherwise necessary extrapolation to the ideal-gas limit as a reference state for thermodynamic integration.

In order to calculate the excess Helmholtz free energy of a solid, we resort to the method proposed by Frenkel and Ladd [1], based on a different kind of thermodynamic integration (see Ref. [4] for a full description of this method and of its implementation on a computer). We note that the ellipsoidal symmetry of the GCN particles is not a complication at all, since the particle axes are frozen and the only degrees of freedom left are the center-of-mass positions. The solid excess Helmholtz free energy is calculated through a series of NVT simulation runs, i.e., for fixed density and temperature. As far as the density is concerned, its value is chosen so as to comply with the pressure of the low-temperature reference state, i.e., the one from which the NPT sequence of runs is started.
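The thermodynamic-integration step behind Eqs. (3.2) and (3.3) amounts to two quadratures: along an isotherm µ grows by the integral of v over P, and along an isobar µ/T changes by minus the integral of (u + Pv)/T² over T. A minimal sketch with the trapezoidal rule follows; the function names and the choice of quadrature are assumptions made for illustration, and the inputs stand for averages (specific volume v, and enthalpy per particle h = u + Pv) tabulated from a sequence of runs.

```python
def cumulative_trapezoid(x, y):
    """Running trapezoidal integral of y(x), starting from 0 at x[0]."""
    out = [0.0]
    for i in range(1, len(x)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1]))
    return out

def mu_along_isotherm(P, v, mu_ref):
    """mu(T, P_k) from mu_ref = mu(T, P[0]) and specific volumes v[k]."""
    return [mu_ref + inc for inc in cumulative_trapezoid(P, v)]

def mu_along_isobar(T, h, mu_ref):
    """mu(T_k, P) from mu_ref = mu(T[0], P) and enthalpies h[k] = u + P*v."""
    integrand = [hk / Tk ** 2 for hk, Tk in zip(h, T)]
    integral = cumulative_trapezoid(T, integrand)
    return [Tk * (mu_ref / T[0] - Ik) for Tk, Ik in zip(T, integral)]

if __name__ == "__main__":
    # Toy check with ideal-gas-like data, v = T/P at T = 1.5:
    # the chemical-potential difference should be close to T * ln(P2/P1).
    P = [1.0 + 0.001 * k for k in range(1001)]
    v = [1.5 / p for p in P]
    print(mu_along_isotherm(P, v, mu_ref=0.0)[-1])
```

With simulation data the grid spacing is set by the temperature or pressure step between runs, so the quadrature error should be judged against the statistical error of v and h.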
We wish to emphasize that, thanks to the large sample sizes employed, the density histogram in an NPT run always turned out to be sharply peaked, indicating very limited density fluctuations (hence, negligible ensemble dependence of statistical averages).

RESULTS

Zero-temperature calculations

For various L/D values in the interval between 1.1 and 3, we have calculated the T = 0 chemical potential µ(P) for our eleven candidate ground states, with P ranging from 0 to 0.20. We report in Table I the results for L/D = 3 and two values of P, 0.05 and 0.20. A striking aspect of this Table is the existence of a rich degeneracy that is only partly a result of the effective identity of crystal structures up to a dilation. Take e.g. the five structures with the minimum µ (and with the same density). While the BCC001 lattice with α = 3 is obtained from the FCC001 lattice with α = 3/√2 = 2.12… by a simple √2 dilation, there is no homothety transforming BCC001(3) into BCC110(3) or into BCC111(3) (in turn equivalent to SC111(1.5)): points in these three lattices have different local environments, as can be checked by counting the nth-order neighbors for n up to 4, yet the three stretched BCC crystals of minimum µ share the same U/N. The pairs FCC110(3), FCC111(3) and SC001(3), SC110(3) also consist of topologically different degenerate structures. This fact is an emergent phenomenon whose deep reason remains unclear to us; it must be related to the dependence of u on the ratio r/σ(θ), since the same symmetry holds with a polynomial, rather than Gaussian, dependence.

For the case of L/D = 3, we show in Fig. 1 the overall P dependence at T = 0 of the chemical potential µ for the various solids. The solid with the minimum µ is either of the type FCC001 (with α = 3) or, say, of the type BCC001 (with α = 3), a fact that holds true, but with α = L/D, for all 1 < L/D < 3. Other solids are definitely ruled out, and the same will probably hold for T > 0.
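The equivalence of BCC001(3) and FCC001(2.12) up to a dilation, mentioned in the discussion of the degenerate structures, can be checked by writing both lattices as body-centred tetragonal cells and comparing their axial ratios. The short derivation below is a sketch; the symbols a, b, c, and α′ are introduced only for this illustration.

```latex
% FCC001(\alpha'): conventional cell a \times a \times \alpha' a (4 lattice points).
% The same lattice is body-centred tetragonal on a base rotated by 45 degrees:
\begin{align}
  b_{\rm FCC} &= \frac{a}{\sqrt{2}}, \qquad
  c_{\rm FCC} = \alpha' a, \qquad
  \frac{c_{\rm FCC}}{b_{\rm FCC}} = \sqrt{2}\,\alpha' , \\
% BCC001(\alpha): body-centred tetragonal cell b \times b \times \alpha b, so
  \frac{c_{\rm BCC}}{b_{\rm BCC}} &= \alpha , \\
% The two lattices coincide up to a uniform rescaling when the ratios agree:
  \sqrt{2}\,\alpha' = \alpha
  \;&\Longrightarrow\;
  \alpha' = \frac{3}{\sqrt{2}} \simeq 2.12 \quad \text{for } \alpha = 3 .
\end{align}
```

For equal lattice constant a, the body-centred cell of FCC001(3/√2) has base a/√2, so a uniform dilation by √2 maps it onto BCC001(3) with base a. No such argument is available for BCC110(3) or BCC111(3), whose stretching axes differ.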
On increasing L/D, the transition from a FCC- type to a BCC-type phase occurs at a lower and lower pressure, whose reduced value is slightly less than 0.02 for L/D = 3. Monte Carlo simulation In order to investigate the thermodynamic behavior of the GCN model at non-zero temperatures, we have car- ried out a number of MC simulation runs for a GCN system with L/D = 3, which is the system with the strongest liquid-crystalline features that we can still man- age numerically. We have effected scans of the phase diagram for six different pressure values, P ∗ = 0.01, 0.02, 0.03, 0.05, 0.12, and 0.20. With all probability, FCC001(3) is the sta- ble system phase only in a very small pocket of T -P plane nearby the origin. However, we decided not to embark on a free-energy study of the relative stability of fluid, FCC001(3), and BCC-type phases at such low pres- sures since this would require a numerical accuracy that is beyond our capabilities. To a first approximation, the boundary line between FCC001(3) and, say, BCC111(3) can be assumed to run at constant pressure. For relating data obtained at different pressures, we have carried out two further sequences of MC runs along the isothermal paths for T ∗ = 0.002 (solids) and T ∗ = 0.015 (fluid). The Frenkel-Ladd computation of the excess Helmholtz free energy per particle fex confirms that the BCC001(3), BCC110(3), and BCC111(3) solids are nearly degenerate at low temperature. We take T ∗ = 0.002, P ∗ = 0.05 as a reference state for the calculation of solid free energies. With the density fixed at ρ = 0.08562D−3, in every case corresponding to P ∗ = 0.05, we find βfex = 144.461(2), 144.470(2), and 144.453(3), for the three above solids respectively, implying a weak preference for the BCC111(3) phase. Then, using thermodynamic integration along the T ∗ = 0.002 isotherm (see Eq. (3.2)), we have studied the relative stability of the three solids as a function of pressure, up to P ∗ = 0.20. The results, depicted in Fig. 
2, suggest that BCC111(3) is the stable phase throughout the low-temperature region, the other solids being nonetheless very good solutions with near-optimal chemical potentials.

We then follow the thermal disordering of the BCC-type solids at fixed pressure (three cases are considered, P* = 0.05, 0.12, and 0.20) through sequences of isothermal-isobaric runs, all starting from T* = 0.002, with steps of 0.001. Any such sequence is stopped when the values of the potential energy and specific volume have collapsed onto those of the fluid, signaling that the ultimate bounds of solid stability have been reached (usually, a solid can hardly be superheated). The stability thresholds detected this way are fairly consistent with the indication coming from the DF profiles which, upon increasing temperature, eventually show a fluid-like appearance. Thermodynamic integration (see Eq. (3.3)) is used to propagate the µ calculated for T* = 0.002 to higher temperatures.

As far as the (nematic) fluid is concerned, we have first generated a sequence of NPT simulation runs for P* = 0.05, starting from T* = 0.015. At this initial point, the excess chemical potential µex was estimated by Widom’s insertion method, obtaining µex = 0.986(5). It is worth noting that, in a long simulation run of as many as 5×10⁴ MC sweeps at equilibrium, the chemical-potential value relaxed very soon, with small fluctuations around the average and no significant drift observed. Our analysis of the fluid phase is completed by further simulation runs along the isobaric paths for P* = 0.12 and 0.20, for which we did not need to compute the chemical potential again, since it could be deduced from the volume data along the T* = 0.015 isotherm.

Chemical-potential results along the three isobars on which we focussed are reported in Figs. 3 to 5. As is clear, with increasing temperature the fluid eventually takes over the solids.
Among the solids, the BCC111(3) phase is the preferred one for any temperature and pressure, al- though the chemical potential of the other solid phases is only slightly larger. On increasing pressure, the melting temperature goes down, like in the Gaussian-core model. The necessity of a matching with the zero-temperature melting point for P = 0 will then imply reentrant melt- ing in the GCN model too. The maximum error on the melting temperature Tm, which we estimate to be about 0.003 (hence not that small), entirely depends on the lim- ited precision of the fluid µex, which then constitutes a major source of error on Tm. The only conclusion we can draw from the above chemical-potential study is that BCC111(3) is the most stable solid phase of the system (provided the pressure is not too low). However, a closer look at the DF profiles obtained from the simulation of BCC111(3) raises some doubts about the absolute stability of this phase at in- termediate temperatures, whatever the pressure, calling for a different interpretation of the hitherto considered as BCC111(3) MC data. Take, for instance, the case of P = 0.05. Upon increasing temperature, while g⊥ keeps strongly peaked all the way to melting, the solid-like os- cillations of g‖ undergo progressive damping until they are washed out completely, suggesting a second-order (or very weak first-order at the most) transformation of BCC111(3) into a columnar phase before melting. This is illustrated in Figs. 6 and 7, where the DFs are plotted for a number of temperatures. A similar indication is got from the behavior of the smectic OP, see Fig. 8, whose highest maximum eventually deflates at practically the same temperature, T ∗ ≈ 0.005, at which the oscillations of g‖ disappear. Note that no appearance of a columnar phase is seen during the simulation of either BCC110(3) or BCC001(3), nor in the simulation of FCC001(3) for P ∗ = 0.01. A slice of the columnar phase is depicted in Fig. 9 (right panels). 
In this phase, columns of stacked particles are arranged side by side, tightly packed together so as to project a triangular solid onto the x-y plane. Neighboring columns are not commensurate with each other, as implied by a completely featureless g‖.

The probable reason for the instability of the smectic phase in the GCN model is the absence of an ad hoc mechanism for lateral attraction between the molecules, which is present instead in the model of Ref. [14]. Incidentally, hard ellipsoids do not show a smectic phase either [7], at variance with (long) hard spherocylinders, where particle geometry alone proves sufficient to stabilize a periodic modulation of the number density along ẑ [10].

Given the compelling evidence of a columnar phase in the GCN model, one may now ask whether the conclusions drawn from the chemical-potential data are all flawed. In particular, the µ curves that are tagged as BCC111(3) in Figs. 3 to 5 would be meaningless beyond a certain temperature Tc < Tm. In fact they are not, i.e., they retain full validity up to melting, since the (nearly) continuous character of the transition from BCC111(3) to columnar allows one to safely continue thermodynamic integration across the boundary, with the proviso that what was previously treated as the BCC111(3) chemical potential beyond Tc is to be assigned instead to the columnar phase. As pressure goes up, the transition from BCC111(3) to columnar takes place at lower and lower temperatures.

In order to rule out that the columnar phase, like the fluid, shows reentrant behavior at low pressure, we have also simulated the disordering of a BCC111(3) solid for P* = 0.02 and 0.03 (in fact, no reentrance of the columnar phase is observed). Further points on the melting line for P = 0.01, 0.02, and 0.03 are fixed through the behavior of g⊥ as a function of temperature. All in all, the overall GCN phase diagram appears as sketched in Fig. 10.
This is similar to the phase portrait of the Gaussian-core model, see Fig. 1 of Ref. [4], with the obvious exception of the columnar phase. There is a small discrepancy between the melting points as located through free-energy calculations (full dots in Fig. 10) and those assessed from the evolution of g⊥ (open dots). In our opinion, this is mostly to be attributed to the statistical error associated with the µex of the fluid in its reference state. Notwithstanding their limited precision, however, free-energy calculations are anything but useless in identifying the structure of the solid phase. In conclusion, although some aspects of the equilibrium behavior of the GCN model remain uncertain, especially with regard to the exact location of the solid-solid transition at low pressure, we are confident that the main features of the GCN phase diagram are correctly accounted for by Fig. 10.

Summing up, there are at least two conceivable and mutually exclusive paths for the thermal disordering of a liquid-crystal solid (aside from a direct transformation into a nematic phase). One is through the formation of a smectic phase, which eventually transforms into a nematic fluid. A second possibility is a more gradual release of crystalline order through the appearance of a columnar phase as an intermediate stage between the solid and the nematic phase. Our study showed that it is this second scenario that occurs in the GCN model, with no evidence whatsoever of a smectic phase.

CONCLUSIONS

We have introduced a liquid-crystal model of softly repulsive parallel ellipsoids, named the Gaussian-core nematic (GCN) model, aiming at a complete characterization of its phase behavior, including the solid sector. This requires a preliminary identification of all relevant solid structures, which is generally a far-from-trivial task for model liquid crystals [16].
Through a careful scrutiny of as many as eleven uniaxially-deformed cubic and hexagonal phases, we obtained a thorough de- scription of the T = 0 equilibrium phase portrait of the GCN model, identifying its ground state at any given pressure. In doing so, we discovered a rich and absolutely unexpected structural degeneracy, which is only lifted by going to T > 0. At low temperature, and for not too low pressures, our free-energy calculations indicate that a GCN system with an aspect ratio of 3 is found in just one solid phase, i.e., a stretched BCC solid with the molecules oriented along [111]. Only near zero pressure, the stable phase becomes a stretched FCC solid. With increasing temperature, the BCC-type solid first undergoes a weak transition into a columnar phase, which still retains par- tial crystalline order, before melting completely into the nematic fluid. It is worth emphasizing that our interest in the GCN model is purely theoretical, hard-core ellipsoids provid- ing a more physically realistic model liquid crystal. One could even argue that a Gaussian repulsion is highly irre- alistic for a liquid crystal. In real atomic systems, super- position of particle cores is strongly obstructed, whence the consideration of hard-core or steep inverse-power re- pulsion in the more popular models. However, unless the system density is very high, higher than considered in our study, repulsive Gaussian particles would effectively be blind to an inner hard core, which thus may or may not exist, as evidenced e.g. in the snapshots of Fig. 9 where particles appear well spaced out. The GCN model is a “deformation” of Stillinger’s Gaussian-core model, well known for exhibiting a reentrant-melting transition. Various instances of reen- trant behavior are also known for nematics [17] and in- deed one of the original motivations for the present work was searching for a new kind of reentrance, i.e., re- appearance of a more disordered phase with increasing pressure. 
With this study, we provide yet another exam- ple of reentrant behavior in a model nematic: While this is nothing but the analog of fluid-phase reentrance in the Gaussian-core model, the absolute novelty of our findings is in the nature of the intermediate phase, this being sur- prisingly columnar in a range of pressures rather than genuinely solid. ∗ Electronic address: Santi.Prestipino@unime.it † Electronic address: saija@me.cnr.it [1] D. Frenkel and A. J. C. Ladd, J. Chem. Phys. 81, 3188 (1984); see also J. M. Polson, E. Trizac, S. Pronk, and D. Frenkel, J. Chem. Phys. 112, 5339 (2000). [2] F. Saija and S. Prestipino, Phys. Rev. B 72, 024113 (2005). [3] S. Prestipino, F. Saija, and P. V. Giaquinta, Phys. Rev. E 71, 050102(R) (2005). [4] S. Prestipino, F. Saija, and P. V. Giaquinta, J. Chem. Phys. 123, 144110 (2005). [5] F. H. Stillinger, J. Chem. Phys., 65, 3968 (1976). [6] A. Lang, C. N. Likos, M. Watzlawek, and H. Löwen, J. Phys.: Condens. Matter, 12, 5087 (2000). [7] D. Frenkel, B. M. Mulder, and J. P. McTague, Phys. Rev. Lett. 52, 287 (1984). [8] A. Stroobants, H. N. W. Lekkerkerker, and D. Frenkel, Phys. Rev. A 36, 2929 (1987). [9] J. A. C. Veerman and D. Frenkel, Phys. Rev. A 41, 3237 (1990); ibidem, 43, 4334 (1991). [10] P. Bolhuis and D. Frenkel, J. Chem. Phys. 106, 666 (1997). [11] C. Vega, E. P. A. Paras, and P. A. Monson, J. Chem. Phys. 96, 9060 (1992); ibidem, 97, 8543 (1992). [12] P. Pasini and C. Zannoni eds., Advances in the Computer Simulations of Liquid Crystals (NATO-ASI Series, 1998). [13] S. Singh, Phys. Rep. 324, 107 (2000). [14] E. de Miguel and E. Martin del Rio, Phys. Rev. Lett. 95, 217802 (2005). [15] B. Widom, J. Chem. Phys. 39, 2808 (1963). [16] After completion of this paper, we became aware of the discovery, reported in P. Pfleiderer and T. Schilling, cond-mat/0612151, of a new stable crystal phase in freely-standing hard ellipsoids. 
This further demonstrates that the solid structure of liquid crystals is generally difficult to anticipate, even when the model system is as simple as possible.
[17] The first example of such behavior was discovered by P. E. Cladis, Phys. Rev. Lett. 35, 48 (1975); see also Ref. [14] and references therein.

TABLE I: GCN model for L/D = 3: T = 0 chemical potential µ(P) for eleven different solids and two values of P*, 0.05 and 0.20. Nx, Ny, Nz are the numbers of lattice points along the three spatial directions, ρ = NxNyNz/V is the density, and α is the stretching ratio (for the SH111 lattice, α is the so-called c/a ratio). Nx, Ny, Nz have been chosen so large that the rounding-off error on the total potential energy per particle, U/N, due to the finite lattice size is negligible. The numerical precision on ρ and α is of one unit on the last decimal digit. Looking at the Table, the most stable structures at both pressures are five degenerate crystals, actually belonging to three distinct types which are exemplified by BCC001(3) (equivalent to FCC001(2.12) up to a dilation), BCC110(3), and BCC111(3) (equivalent to SC111(1.5)); the value of α is given within brackets.

crystal   Nx,Ny,Nz   ρ(0.05)   α(0.05)   µ(0.05)    ρ(0.20)   α(0.20)   µ(0.20)
FCC001    10,20,10   0.086     2.12      0.855724   0.157     2.12      2.093695
BCC001    14,14,10   0.086     3.00      0.855724   0.157     3.00      2.093695
SC001     20,20,8    0.086     3.00      0.881586   0.158     3.00      2.105241
FCC110    16,12,12   0.086     3.00      0.856391   0.157     3.00      2.094368
BCC110    10,28,8    0.086     3.00      0.855724   0.157     3.00      2.093695
SC110     14,18,10   0.086     3.00      0.881586   0.158     3.00      2.105241
FCC111    16,18,9    0.086     3.00      0.856391   0.157     3.00      2.094368
BCC111    12,12,18   0.086     3.00      0.855724   0.157     3.00      2.093695
SC111     12,12,18   0.086     1.50      0.855724   0.157     1.50      2.093695
HCP111    18,20,10   0.086     3.00      0.856429   0.157     3.02      2.094474
SH111     18,20,9    0.086     2.75      0.870014   0.158     2.69      2.099565

FIG. 1: T = 0 equilibrium behavior of the GCN model with L/D = 3.
Left: T = 0 chemical potential µ(P ∗) of var- ious crystals relative to BCC110(3), which thus serves as the zero or reference level. The reduced pressure P ∗ is in- cremented by steps of 0.01. Note that, for all P , the five crystals FCC001(2.12), BCC001(3), BCC110(3), BCC111(3), and SC111(1.5) are degenerate (∆µ = 0). Other data points are for FCC001 (continuous line; α = 3 for P ∗ = 0.01, be- ing α = 2.12 otherwise), FCC110(3) and FCC111(3) (dotted line), HCP111 (open dots), SH111 (open squares), SC001(3) and SC110(3) (dashed line). Right: Resulting equation of state in the pressure range from 0 to 0.30. FCC001(3) (open triangle) is stable at very low pressure, up to slightly less than 0.02, while FCC001(2.12), BCC001(3), etc. (open dots) prevail for higher pressures. FIG. 2: GCN model with L/D = 3, chemical-potential results for T ∗ = 0.002. In the picture, we plot the reduced chemical potential of the three T = 0 degenerate structures that exist for not too low pressure, taking BCC111(3) for reference. The latter phase gives the most stable solid for any P in the range from 0.05 to 0.20 (and, most likely, even further). The µ curves are obtained by thermodynamic integration of volume MC data, using as initial conditions those specified by the Frenkel-Ladd calculations that were carried out at P ∗ = 0.05. Though the reported µ values for the BCC-type solids are very close to each other and also affected by some numerical noise, the higher stability of BCC111(3) cannot be truly called into question – a regular pattern is clearly seen behind each curve. FIG. 3: GCN model with L/D = 3, chemical-potential results for P ∗ = 0.05: Chemical potential of the fluid phase (dotted line) as compared with those of the competing solid phases for that pressure (BCC001(3), long-dashed line; BCC110(3), dashed line; BCC111(3), continuous line). While the BCC111(3) solid is stable at low temperature, the fluid phase overcomes it in stability for higher temperatures. 
This is more clearly seen in the inset, where chemical-potential differences are reported, taking the fluid µ for reference. The melting temperature for P* = 0.05, which is where the continuous line crosses zero, is estimated to be T* ≃ 0.0073.

FIG. 4: GCN model with L/D = 3, chemical-potential results for P* = 0.12. Same notation as in Fig. 3, except for the absence of data for BCC001(3), which were not computed. Despite this, a look at the results in Figs. 2 and 3 gives us confidence that the chemical potential of BCC001(3) will be closer to that of BCC110(3) than it is for P* = 0.05.

FIG. 5: GCN model with L/D = 3, chemical-potential results for P* = 0.20. Same notation as in Figs. 3 and 4.

FIG. 6: GCN model with L/D = 3, distribution functions of BCC111(3) for P* = 0.05. Left: T* = 0.002. Right: T* = 0.003. The strength of crystalline order along ẑ, as measured by the amplitude of the g‖ oscillations, decreases with increasing temperature, until complete disorder is left above T* ≃ 0.005 (see Fig. 7). Considering that the crystallinity within the x-y plane persists well beyond T* = 0.005 (the spatial modulation of g⊥ remains solid-like beyond this temperature and up to melting), we conclude that the GCN system is found in a columnar phase for 0.005 < T < Tm.

FIG. 7: GCN model with L/D = 3, distribution functions of BCC111(3) for P* = 0.05. Left: T* = 0.004. Right: T* = 0.005.

FIG. 8: GCN model with L/D = 3, smectic order parameter τ(λ) of BCC111(3) for P* = 0.05. The behavior of τ(λ) faithfully reproduces that seen for g‖(z) (cf. Figs. 6 and 7): the deflation of the highest τ(λ) maximum with increasing temperature closely follows the thermal damping of the g‖(z) oscillations.

FIG. 9: GCN model with L/D = 3, some snapshots of the particle configuration taken at low temperature (T* = 0.002, BCC111(3) solid phase) and at intermediate temperature (T* = 0.006, columnar phase). The reduced pressure is P* = 0.05 in both cases.
Above: side view, i.e., projection of particle coordinates onto the x-z plane. Below: top view, i.e., projection of particle coordinates onto the x-y plane. For clarity, in spite of their mutual interaction being soft, the particles are given sharp ellipsoidal boundaries, corresponding to a unitary short axis (D) and a long axis of L = 3D. While the crystalline order along z is lost already at T ∗ = 0.005 (hence, it is there in the top-left panel while it is absent in the top-right panel), the triangular order within the x-y plane is maintained up to the melting temperature (here, Tm ≃ 0.0073).
FIG. 10: GCN model with L/D = 3, sketch of the phase diagram on the T -P plane. The full dots mark the location of the melting transition as extracted from our free-energy calculations. Open symbols refer instead to the transition thresholds as given by a visual inspection of the DF profiles. Though the latter melting-point estimates are more easily obtained than the former, the free-energy study was essential to identify the correct solid structure of the GCN model at not too low pressure. To help the eye, tentative phase boundaries are drawn as continuous (i.e., first-order) and dashed (nearly second-order) lines through the transition points. In the low-pressure region, the solid-solid boundary is highly hypothetical since we have no data there.
ABSTRACT We study a simple model of a nematic liquid crystal made of parallel ellipsoidal particles interacting via a repulsive Gaussian law. After identifying the relevant solid phases of the system through a careful zero-temperature scrutiny of as many as eleven candidate crystal structures, we determine the melting temperature for various pressure values, also with the help of exact free energy calculations. Among the prominent features of this model are pressure-driven reentrant melting and the stabilization of a columnar phase for intermediate temperatures.
<|endoftext|><|startoftext|> High-spin to low-spin and orbital polarization transitions in multiorbital Mott systems
Philipp Werner and Andrew J. Millis
Columbia University, 538 West, 120th Street, New York, NY 10027, USA
(Dated: June 30, 2007)
We study the interplay of crystal field splitting and Hund coupling in a two-orbital model which captures the essential physics of systems with two electrons or holes in the eg shell. We use single site dynamical mean field theory with a recently developed impurity solver which is able to access strong couplings and low temperatures. The fillings of the orbitals and the location of phase boundaries are computed as a function of Coulomb repulsion, exchange coupling and crystal field splitting. We find that the Hund coupling can drive the system into a novel Mott insulating phase with vanishing orbital susceptibility. Away from half-filling, the crystal field splitting can induce an orbital selective Mott state.
PACS numbers: 71.10.Fd, 71.10.Fd, 71.28.+d, 71.30.+h
The Mott metal-insulator transition plays a fundamental role in electronic condensed matter physics [1]. Much attention has focused on the one-orbital case, in part because of its presumed relevance to high temperature superconductivity [2] and in part because appropriate theoretical tools for the multiorbital case have until recently not been available. In most Mott systems, however, more than one orbital is relevant [3] and the redistribution of electrons among different orbitals leads to new phenomena such as orbital ordering or “orbital selective” Mott transitions. Recent studies of nickelates [4], titanates [5], cobaltates [6], manganates [7], vanadates [8, 9, 10] and ruthenates [3, 11, 12, 13] have focused interest on the interplay between the Mott metal-insulator transition and orbital degeneracy.
A fundamental question in this field, relevant in particular to the issue of lattice distortions in strongly correlated materials, is the response of multiorbital systems to a perturbation which breaks the orbital degeneracy. In this paper, we show that a two orbital model with Hund coupling and crystal field splitting exhibits two fundamentally different Mott phases, one characterized by a vanishing orbital susceptibility, and one adiabatically connected to the band insulating state. We characterize these phases in terms of the atomic ground states.
Multiorbital models are more difficult to study both because of the larger number of degrees of freedom, and because the physically important exchange and pair-hopping terms are not easy to treat by standard Hubbard-Stratonovich methods [14]. Weak coupling approaches [12] have been used to show that exchange and pair hopping interactions act to suppress the response to a crystal field splitting, and some authors have studied the model without exchange and pair hopping terms [9], but a reliable extension of these results to physically relevant Slater-Kanamori interactions and the strong coupling regime has been lacking.
Dynamical mean field theory (DMFT) provides a non-perturbative and computationally tractable framework to study correlation effects and has allowed insights into the Mott metal-insulator transition [15]. In its single site version, DMFT ignores the momentum dependence of the self-energy and reduces the original lattice problem to the self-consistent solution of a quantum impurity model given by the Hamiltonian $H_\mathrm{QI} = H_\mathrm{loc} + H_\mathrm{hyb} + H_\mathrm{bath}$. For multi-orbital models $H_\mathrm{loc} = \sum_m \epsilon_m c^\dagger_m c_m + \sum_{j,k,l,m} U^{jklm} c^\dagger_j c^\dagger_k c_l c_m$, where m = (i, σ) denotes both orbital and spin indices, and $U^{jklm}$ some general four-fermion interaction. $H_\mathrm{hyb}$ and $H_\mathrm{bath}$ are the impurity-bath mixing and bath Hamiltonians, respectively.
While the DMFT approximation simplifies the problem enormously (replacing a 3 + 1 dimensional field theory by a quantum impurity model plus a self consistency condition), the extra complications associated with exchange couplings in multiorbital systems have until recently prohibited extensive numerical work. Interesting progress has been made using a finite temperature exact diagonalization technique [6, 13], but this approach requires a truncation of Hbath to a small number of levels. In Refs. [16, 17] we have introduced a continuous-time impurity solver which can handle the general interactions in Hloc. The method, which is free from systematic errors, is based on a diagrammatic expansion of the partition function in the impurity-bath hybridization Hhyb.
Here, we employ this solver to study the physically relevant case in which the number of electrons matches the number of orbitals. The local Hamiltonian is
$$H_\mathrm{loc} = -\sum_{\alpha=1,2}\sum_{\sigma} \mu\, n_{\alpha,\sigma} + \sum_{\sigma} \Delta\,(n_{1,\sigma} - n_{2,\sigma}) + \sum_{\alpha=1,2} U\, n_{\alpha,\uparrow} n_{\alpha,\downarrow} + \sum_{\sigma} U'\, n_{1,\sigma} n_{2,-\sigma} + \sum_{\sigma} (U' - J)\, n_{1,\sigma} n_{2,\sigma} - J\,(\psi^\dagger_{1,\downarrow}\psi^\dagger_{2,\uparrow}\psi_{2,\downarrow}\psi_{1,\uparrow} + \psi^\dagger_{2,\uparrow}\psi^\dagger_{2,\downarrow}\psi_{1,\uparrow}\psi_{1,\downarrow} + \mathrm{h.c.}), \quad (1)$$
with µ the chemical potential, ∆ the crystal field splitting, U the intra-orbital and U ′ the inter-orbital Coulomb interaction, and J the coefficient of the Hund coupling. We adopt the conventional choice of parameters, U ′ = U − 2J , which follows from symmetry considerations for d-orbitals in free space and is also assumed to hold in solids.
http://arxiv.org/abs/0704.0057v2
FIG. 1: Filling of orbital 1 as a function of U for ∆/t = 0.2, 0.6, 1 and several values of J/U . The different curves for given ∆ correspond (from bottom to top) to J/U = 0, (0.01), (0.02), 0.05, 0.1, 0.15, 0.25, respectively. Open (full) symbols correspond to metallic (insulating) solutions. The metal-insulator transition is characterized by a jump in filling and a coexistence region where both insulating and metallic solutions exist. Our data show the region of stability of the metallic phase.
With this choice the Hamiltonian (1) is rotationally invariant in orbital space and the condition for half-filling becomes $\mu = \mu_{1/2} \equiv \frac{3}{2}U - \frac{5}{2}J$. In the DMFT self-consistency loop we use a semi-circular density of states of bandwidth 4t (Bethe lattice). The temperature, unless otherwise noted, is T/t = 0.02 and we suppress magnetic order by averaging over spin up and down in each orbital. No sign problem is encountered in the simulations.
The main result is shown in Fig. 1, which for several values of ∆ and J/U plots the filling per spin, n1, of orbital 1 as a function of interaction strength. The half filling, non-magnetic condition implies n2 = 1 − n1. In the T → 0 limit, three phases are found: a metallic phase (which may have any value of n1 between 0 and 0.5), an orbitally polarized insulator favored by large ∆ and small J , and a Mott insulator (with n1 = 0.5 = n2) favored by large U , small ∆ and large J . If U is increased from zero to a small value, the orbital splitting either increases (small J/U) or decreases (large J/U), consistent with the findings of Ref. [12]. As interaction strength is further increased, one of several things may happen: at very small J/U , n1 continues to decrease, and the system eventually undergoes a transition to an orbitally polarized insulator (for large ∆ essentially a band insulator). For somewhat larger J/U , the occupancy n1, after an initial decrease, goes through a minimum and begins to increase. At even stronger interactions, one then observes a transition either to an orbitally polarized insulator (where n1 may take a range of values) or into a special type of insulator with n1 = 0.5.
FIG. 2: Phase diagram in the plane of crystal field splitting ∆ and intraorbital Coulomb repulsion U for indicated values of J/U . For J = 0 the phase boundary is a monotonic function of ∆, whereas for J/U > 0 it peaks near ∆ = √2 J (indicated by the dotted lines). The insulating state in the region ∆ ≲ √2 J is characterized by a vanishing orbital susceptibility.
Figure 2 shows the metal-insulator phase diagram in the space of crystal field splitting and Coulomb repulsion for several values of J/U . In the absence of a crystal field splitting (∆ = 0), we observe a metal-insulator transition at a strongly J-dependent critical U . This finding is consistent with data presented in Ref. [18]. As ∆ is increased, the critical U changes. For J = 0 and fixed U , n1 decreases until the band is emptied and a metal-insulator transition occurs. The monotonic decrease of the critical U with ∆ at J = 0 is a special case. For J > 0, the first effect of a small ∆ is to stabilize the metallic phase. Then, at larger ∆, a reentrant insulating phase occurs. We shall show below that this behavior arises from the unusual nature of the insulating state at J > 0 and small ∆, which is characterized at T = 0 by a strictly vanishing orbital susceptibility. If ∆ is increased at large U , this state makes a transition to an orbitally polarized insulator at ∆ ≈ √2 J . We therefore plot in Fig. 2 the curves ∆ = √2 J as dotted lines, and suggest that they correspond to the T = 0 phase boundary between two distinct insulating states.
FIG. 3: Filling of orbital 1 as a function of ∆ for fixed U and indicated values of J/U . Top panel: U/t = 6. Open (full) symbols correspond to metallic (insulating) solutions. Bottom panel: U/t = 9. Here, all solutions are insulating. For crystal field splittings smaller than ∆c = √2 J (indicated by a vertical line) the orbital susceptibility in the T → 0 limit is completely suppressed. Solid lines are for βt = 50, dotted lines show results for βt = 12.5, 25 and 100, respectively.
Figure 3 plots the filling of orbital 1 as a function of crystal field splitting for fixed U/t and several values of J/U . The leftmost curve in the upper panel shows the density variation for J/U = 0.02. At ∆ = 0, the model is metallic. The rapid variation of n1 with ∆ reflects the large, but finite orbital susceptibility of the metal, which for this small value of J is strongly enhanced by U . At ∆/t ≈ 0.325 > √2 J , an apparently first order transition occurs to the orbitally polarized insulating state, which then evolves smoothly (as ∆ is increased) to the band insulator (n1 → 0). The two larger J values reveal a different behavior. For ∆ < ∆c ≈ √2 J the insulating state is characterized by an orbital occupancy which is independent of crystal field splitting. Then, an apparently discontinuous transition occurs to a metallic state with a large orbital susceptibility, which at even larger ∆ exhibits a first order transition to the orbitally polarized insulating state. The lower panel of Fig. 3 shows the behavior for larger U , where the model is always insulating.
FIG. 4: Weight of the different eigenstates of Hloc (horizontal axis: basis state) for U/t = 6, J/U = 0.05 and ∆/t = 0.3, 0.5 and 0.7. The smallest crystal field splitting corresponds to an insulating state with suppressed orbital susceptibility, the intermediate value to a metallic state and the largest splitting to a “band insulator” (see Fig. 3).
Our data for J = 0 exhibit a rapid variation of n1 with ∆. The slope is set by the inverse of the Kugel-Khomskii superexchange ∼ t²/U ; thermal effects are unimportant at βt = 50. For J > 0 and small ∆, the model is insulating, with a vanishing orbital susceptibility, then (near ∆ = √2 J) makes a transition to the orbitally polarized insulating phase with a differential susceptibility ∂n1/∂∆ determined by Kugel-Khomskii physics.
Note that the transition between the two insulators is sharp only at T = 0; for T > 0 a rapid (but smooth) crossover occurs.
To gain insight into these phenomena, we look at the contribution to the partition function from the different eigenstates of the local Hamiltonian. Hloc has 16 eigenstates, which we number essentially as in Table II of Ref. [17]. For the following discussion it is important to note that |6〉, |7〉 and |8〉 are the three spin triplet states (with energy U − 3J − 2µ), while |10〉 and |11〉 are linear combinations of the states |↑↓, 0〉 and |0,↑↓〉 with two electrons in one orbital and none in the other. The latter two states are coupled by the pair hopping and affected by the crystal field splitting. Here, we choose them to be eigenstates of Hloc corresponding to the eigenenergies $U \pm \sqrt{J^2 + 4\Delta^2} - 2\mu$: $(1 + \alpha_\pm^2)^{-1/2}(|\uparrow\downarrow, 0\rangle + \alpha_\pm |0, \uparrow\downarrow\rangle)$, $\alpha_\pm = \pm(\sqrt{J^2 + 4\Delta^2} \mp 2\Delta)/J$. In particular, we choose |10〉 to be the eigenstate with lower energy.
Figure 4 shows the weights of these states for the three phases found at U/t = 6, J/U = 0.05 (see Fig. 3). In the small-∆ phase, the triplet states are occupied, with small excursions into states with occupancy 1 or 3. We therefore call this phase the triplet Mott insulator. The triplet states of course have one electron in each orbital and gain no energy from orbital polarization (the remarkable fact is that this feature is preserved after coupling to the lattice).
FIG. 5: Filling n1(µ), n2(µ) for U/t = 4, J/U = 0.25 and ∆/t = 0.4. Full (open) symbols correspond to insulating (metallic) solutions. At half-filling (µ/t = 3.5), the system is in a triplet Mott insulating state, for 3.9 ≲ µ/t ≲ 4.6 in an orbital selective Mott state, and for µ/t ≳ 4.6 metallic in both bands.
In the metallic phase, a large number of states is visited, while in the orbitally polarized insulator, the dominant local state (whose weight increases continuously with ∆) is a singlet (|10〉).
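The level structure quoted above can be checked with a few lines of arithmetic. The following is only a minimal sketch, not the authors' code: the function names are hypothetical, the parameter values are the ones quoted for Fig. 4 (U/t = 6, J/U = 0.05), and the occupancy formula assumes that the weight of |↑↓, 0〉 in the normalized state gives the filling per spin of orbital 1. Equating the triplet energy U − 3J − 2µ with the lower eigenenergy U − √(J² + 4∆²) − 2µ gives a crossing at ∆ = √2 J.

```python
import math

def triplet_energy(U, J, mu=0.0):
    """Energy of the three spin-triplet states, U - 3J - 2*mu."""
    return U - 3.0 * J - 2.0 * mu

def singlet_energy(U, J, delta, mu=0.0):
    """Lower eigenenergy of the pair-hopping doublet, U - sqrt(J^2 + 4*delta^2) - 2*mu."""
    return U - math.sqrt(J**2 + 4.0 * delta**2) - 2.0 * mu

def critical_splitting(J):
    """Crystal field splitting at which the two levels cross: sqrt(2)*J."""
    return math.sqrt(2.0) * J

def singlet_n1(J, delta):
    """Filling per spin of orbital 1 in the lower state (assumed: weight of |up-down, 0>)."""
    alpha = -(math.sqrt(J**2 + 4.0 * delta**2) + 2.0 * delta) / J
    return 1.0 / (1.0 + alpha**2)

if __name__ == "__main__":
    U, J = 6.0, 0.3  # illustrative values in units of t (J/U = 0.05)
    for delta in (0.3, 0.5, 0.7):
        gap = singlet_energy(U, J, delta) - triplet_energy(U, J)
        label = "triplet" if gap > 0 else "singlet"
        print(f"delta/t = {delta}: lowest atomic state is the {label}")
    print("crossing at delta/t =", critical_splitting(J))
```

For large ∆ the occupancy returned by `singlet_n1` approaches (J/4∆)², the limiting form quoted in the text. The chemical potential drops out of the comparison, so it is left at zero by default.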
The triplet states are almost completely suppressed in the orbitally polarized phase. The large-U insulator-insulator transition exhibits the same features, but without the intermediate metallic phase, and is therefore also a transition between high and low spin states. Comparison of the eigenenergies of the spin triplet states and |10〉 shows that these levels cross at ∆ = √2 J . Thus, the transition from triplet Mott insulator to orbitally polarized insulator occurs at ∆c = √2 J , consistent with our numerical data. We also note that the wave-function of state |10〉 depends on the ratio J/∆, leading in the large-∆ limit to n1(∆) ≈ (J/4∆)². In the low spin phase, the orbital susceptibility has therefore two contributions: one originating from Kugel-Khomskii physics and one of order J²/∆³ from Hloc. The latter explains the roundings seen in the rightmost curve of Fig. 3.
We briefly address the issue of the orbital selective Mott transition, which provides a mechanism for local moment formation in correlated materials, and has been the subject of much recent debate [11]. Previous studies focused on two-orbital models with different band-widths and integer number of electrons. We find that in the presence of a crystal field splitting, shifting the chemical potential can drive the system into an orbital selective Mott state, even if the band-widths are equal. Figure 5 shows the filling per spin in both orbitals as a function of µ, for U/t = 4, J/U = 0.25 and ∆/t = 0.4. Doping occurs first in one of the bands, leaving the other in a Mott state with a magnetic moment. Further change of the chemical potential drives the second band metallic.
In conclusion, we have shown that multiorbital impurity models with realistic couplings can be efficiently simulated with the method of Ref. [17].
We have presented numerical evidence, based on single site DMFT calculations, for the existence of two distinct Mott insulating phases in a half-filled two-orbital model with Hund coupling and crystal field splitting. At strong interactions and ∆ < √2 J , the system, in the T → 0 limit, is in a phase characterized by a vanishing orbital susceptibility and a spin 1 moment on each site. For ∆ > √2 J an orbitally polarized insulator is found. The exchange terms promote insulating behavior at ∆ = 0 but can stabilize a metallic phase at values of ∆ for which the non-interacting model is a band-insulator.
It is interesting to compare our results to recent work on the bilayer Hubbard model [19, 20]. The model which these authors study is equivalent to our model with U = U ′ = J , and ∆ replaced by the interlayer hopping. In the low energy sector of this model, only four states (essentially our three triplets and the pair hopping state |10〉) are relevant, and what these authors describe as the Mott insulator to band insulator crossover corresponds to our transition (apparently sharp at T = 0) between triplet Mott insulator and orbitally polarized insulator.
The existence of two distinct insulating phases raises many interesting questions including the theory of an insulator with strictly vanishing orbital susceptibility (which should exhibit an orbital gauge symmetry) and the nature and properties of the different metal-insulator transitions. The physics near the “triple point” remains to be studied. Our results away from half-filling suggest that lightly doped La2NiO4 is in an orbitally selective Mott phase.
The calculations have been performed on the Hreidar beowulf cluster at ETH Zürich, using the ALPS-library [21]. We thank M. Troyer for the generous allocation of computer time, A. Georges and A. Poteryaev for stimulating discussions and NSF-DMR-040135 for support.
[1] M. Imada, A. Fujimori and Y. Tokura, Rev. Mod. Phys. 70, 1039 (1998).
[2] P. W. Anderson, Science 235, 1196 (1987).
[3] Y. Tokura and N. Nagaosa, Science 288, 462 (2000).
[4] J. Kunes et al., Phys. Rev. B 75, 165115 (2007).
[5] C. Ulrich et al., Phys. Rev. Lett. 97, 157401 (2006).
[6] H. Ishida, M. D. Johannes, and A. Liebsch, Phys. Rev. Lett. 94, 196401 (2005); A. Liebsch and H. Ishida, arXiv:0705.3627.
[7] A. Yamasaki et al., Phys. Rev. Lett. 96, 166401 (2006).
[8] S. Biermann et al., Phys. Rev. Lett. 94, 026404 (2005).
[9] F. Lechermann, S. Biermann and A. Georges, Phys. Rev. Lett. 94, 166402 (2005).
[10] T. Yoshida et al., Phys. Rev. Lett. 95, 146404 (2005).
[11] A. Liebsch, Phys. Rev. Lett. 91, 226401 (2003); A. Koga et al., Phys. Rev. Lett. 92, 216402 (2004).
[12] S. Okamoto and A. J. Millis, Phys. Rev. B 70, 195120 (2004).
[13] A. Liebsch and H. Ishida, Phys. Rev. Lett. 98, 216403 (2007).
[14] S. Sakai, R. Arita, K. Held, and H. Aoki, Phys. Rev. B 74, 155102 (2006).
[15] A. Georges, G. Kotliar, W. Krauth and M. J. Rozenberg, Rev. Mod. Phys. 68, 13 (1996).
[16] P. Werner et al., Phys. Rev. Lett. 97, 076405 (2006).
[17] P. Werner and A. J. Millis, Phys. Rev. B 74, 155107 (2006).
[18] A. Koga, Y. Imai and N. Kawakami, Phys. Rev. B 66, 165107 (2002).
[19] A. Fuhrmann, D. Heilmann and H. Monien, Phys. Rev. B 73, 245118 (2006).
[20] S. S. Kancharla and S. Okamoto, cond-mat/0703728.
[21] M. Troyer et al., Lecture Notes in Computer Science 1505, 191 (1998); F. Alet et al., J. Phys. Soc. Jpn. Suppl. 74, 30 (2005); http://alps.comp-phys.org/ .
ABSTRACT We study the interplay of crystal field splitting and Hund coupling in a two-orbital model which captures the essential physics of systems with two electrons or holes in the e_g shell. We use single site dynamical mean field theory with a recently developed impurity solver which is able to access strong couplings and low temperatures.
The fillings of the orbitals and the location of phase boundaries are computed as a function of Coulomb repulsion, exchange coupling and crystal field splitting. We find that the Hund coupling can drive the system into a novel Mott insulating phase with vanishing orbital susceptibility. Away from half-filling, the crystal field splitting can induce an orbital selective Mott state.
<|endoftext|><|startoftext|> Introduction
As Martin Rees is fond of arguing, absence of evidence is not evidence of absence. How could anyone disagree? But on the question of the existence of extraterrestrial intelligent life, we have an undeniable fact: they aren’t here. That is, extraterrestrial intelligent beings are not obviously present on our planet, or in our solar system. I think even Martin will agree with this! But I claim this fact allows us to conclude that extraterrestrial intelligence (ETI) is absent from our Galaxy and from the Local Group of galaxies. In other words, if they existed, they’d be here! This argument has often been called the Fermi Paradox. I think it is analogous to Olbers’ Paradox in cosmology, which uses an equally obvious fact, known to all of us (the fact that the sky is dark at night), to conclude that the universe must have evolved to its present state. The universe cannot have been the same as it appears now for all eternity. I shall outline in Section II the reasons that the absence of ETI on Earth allows us to conclude that they don’t exist in our galactic neighborhood. I have developed this argument in much more detail elsewhere, addressing all counter-arguments that have been proposed, so I shall only outline it in Section II. I shall also only outline the evolutionary argument against ETI here. Mayr, Dobzhansky, Simpson and Ayala have defended this position at length over the past 40 years, and I’m sure this argument is quite familiar to the readers of this journal.
What I want to develop in this paper is a new argument against the existence of ETI. I shall call it the Limited Resources Argument. It is related to the Fermi Paradox in that it assumes that an intelligent life form will inevitably expand off its planet of origin and that once this expansion begins, it will never stop. But if intelligent life were common in the cosmos, the expansion of technological civilization would use up resources so fast that intelligent life would die out. If intelligent life is rare, the speed of light barrier will prevent life from using up the resources too fast. The immediate reaction to this argument is, so what if intelligent life uses up the resources too fast and dies out? Do we have any reason for believing that intelligent life has some guarantee of survival that other species do not? Most species that have evolved are now extinct, and have left no descendants. Why should Homo sapiens be any different? There is no evidence from evolutionary biology that intelligence should survive indefinitely. But there is evidence from physics for the importance of intelligent life in cosmology. Not of course in the current phase of universal history, but instead near the end of the universe.
http://arXiv.org/abs/0704.0058v1
II. Why Intelligent Life Must Be Rare
A. The Improbable Evolution Argument
The argument against ETI that most readers of this journal will be familiar with goes back to Alfred Russel Wallace, and has more recently been defended by such major evolutionists as George Gaylord Simpson, Theodosius Dobzhansky, and Ernst Mayr. These scientists point out that according to the Modern Synthesis, evolution has no knowledge of goals. Instead, natural selection acts on random mutations, mutations which never appear with the intent of achieving a goal in the distant future.
There are an enormous number of evolutionary pathways, and so few of these lead to intelligent life, that it is unlikely intelligent life will appear more than once in the visible universe, which is the part of the universe within 13.7 billion light years. The universe is observed to be 13.7 billion years old, and so we cannot see out a distance greater than 13.7 billion light years, the distance light could have traveled in that time. (Actually, we can see out a bit further than 13.7 billion light years because of the expansion of the universe, but let me ignore this minor technicality.) Even if we were to assume that all the matter and energy in the visible universe were in the form of Earthlike planets, there would be only (!) about $10^{28}$ Earthlike planets in the visible universe. This number assumes that “earthlike” means only that the mass of the planet is greater than or equal to the mass of the Earth. No assumption is made about the planet’s star, atmosphere, or orbital radius. The well-known evolutionist Francisco Ayala has recently made this argument quantitative. He estimates that the probability of an intelligent species evolving on an Earthlike planet upon which one-cell organisms have appeared is less than 10 to the minus one million power! This number is so tiny that the evolution of intelligent life is exceedingly unlikely to have occurred even once. Ayala’s number is not contradicted by the fact that intelligent life exists on Earth. It is just exceedingly improbable that it exists anywhere else in the universe (at least if the universe is finite in spatial size, as I shall argue in Section IV that it is). Ayala’s number depends on the assumption that gene changes upon which natural selection operates are essentially random. Evolution has no foresight. Mayr has emphasized that intelligence on Earth is limited to the chordate lineage, so, he argues, if the chordates had never appeared on Earth, neither would intelligence.
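The arithmetic behind this estimate can be made explicit. The sketch below is only an illustration under the two figures quoted in the text (about 10^28 Earthlike planets and a probability below 10^-1,000,000 per planet); the function name is hypothetical, and the product is formed in base-10 logarithms because 10^-1,000,000 is far too small to be represented as a floating-point number.

```python
def log10_expected_species(log10_planets: int, log10_probability: int) -> int:
    """Base-10 logarithm of (number of planets) x (probability per planet)."""
    return log10_planets + log10_probability

# Figures quoted in the text: an upper bound on the number of Earthlike planets
# and Ayala's upper bound on the probability per planet.
LOG10_PLANETS = 28
LOG10_PROBABILITY = -1_000_000

log10_expected = log10_expected_species(LOG10_PLANETS, LOG10_PROBABILITY)
print(f"expected number of intelligent species < 10^{log10_expected}")
```

On these assumptions the expected number of planets with intelligent life in the visible universe is below 10^-999,972, which is why the 28 orders of magnitude contributed by the planet count make essentially no difference to the conclusion.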
But chordates first evolved more than half a billion years ago. These animals did not know that they had to evolve so that Homo sapiens would eventually appear. Natural selection can only operate during an animal’s lifetime. It cannot select a genome with the intent of using the genome a billion years later. There is an important caveat to this, a caveat first pointed out by Charles Darwin himself in the last pages of his book The Variation of Animals and Plants under Domestication. Darwin noted that at the ultimate level of physics, the universe is deterministic. This means that at the ultimate level, there are no random events. In particular, the evolution of Homo sapiens was inevitable, determined by the initial state of the universe. “Random” variation does not mean uncaused. It just means unpredictable for human beings. Therefore, at this ultimate physical level, Darwin claims that his own theory is only an approximation. Darwin noted that the advance of science might enable us to obtain enough information to predict these “random” variations. I shall argue below that this time has now come.
B. If They Existed, They’d Be Here
The argument against the existence of extraterrestrial intelligent life that I have developed in most detail is sometimes called the Fermi Paradox: if they existed, they’d be here. The force of this argument is not usually appreciated, because most people, and even most scientists (!), tacitly assume that any alien civilization, no matter when it evolved or how long it has had advanced technology, will nevertheless have essentially the technology of the late 20th century. The reason for this tacit assumption is the usual human weakness: we have an unfortunate habit of trying to impose our current human perspectives on the physical universe. But let us consider the consequences of only slightly more advanced computer technology than we now have.
According to most computer experts, within a century or so we should have computer programs which have human level intelligence, and computers which can run such programs and also make copies of themselves and the programs. Imagine such a machine combined with our rocket technology into a space probe. Such a space probe can reach the nearest star in 40,000 years. Once there in the nearest star system, the probe could make several copies of itself, using the asteroid material which we now know is present in almost all star systems, sending these daughter probes to further star systems, where the process would be repeated. Even with our rocket technology, every star system in the entire Galaxy would have a probe within 100 million years. With a more advanced rocket technology, a rocket technology which is even today being experimented with, it should be possible to send a probe between the stars at 1/10 light speed. With such a speed, probes would cover the entire galaxy within a few million years. And all for the cost of a single probe! Almost any motivation we can imagine would lead an intelligent species with the technology to launch that single probe. Suppose, for example, ET wants to contact other intelligent life forms. Then rather than send out radio signals, they should send out that single probe. With radio, one has to send out the signals to many stars, over many thousands of years. (We would expect evolution to intelligence to require billions of years, as it did on Earth.) But once the probe is launched, coverage of the entire galaxy is automatic. Once in a target star system, the intelligent probe can contact any intelligent life forms that happen to have evolved on any planet in the system. Or if no intelligent life is found, the probe can study the entire system and transmit the results back to Earth. This on the spot investigation is obviously impossible if radio signals are sent out instead of a space probe.
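The "few million years" figure for probes travelling at 1/10 light speed can be reproduced with a back-of-the-envelope model. The sketch below is illustrative only: the function name is hypothetical, the 100,000 light year path length is the diameter of the galactic disc quoted later in the paper, and the 10 light year hop between stops and the 100 year replication pause at each stop are assumptions chosen for the example, not values taken from the text.

```python
def galaxy_coverage_time(distance_ly: float, speed_c: float,
                         hop_ly: float, replication_years: float) -> float:
    """Years for a chain of self-replicating probes to span distance_ly.

    distance_ly       -- total path length in light years
    speed_c           -- cruise speed as a fraction of the speed of light
    hop_ly            -- assumed distance between successive star systems
    replication_years -- assumed pause at each stop to build daughter probes
    """
    travel_years = distance_ly / speed_c  # time spent in flight
    stops = distance_ly / hop_ly          # number of replication stops on the way
    return travel_years + stops * replication_years

if __name__ == "__main__":
    years = galaxy_coverage_time(distance_ly=100_000, speed_c=0.1,
                                 hop_ly=10, replication_years=100)
    print(f"roughly {years:.1e} years to span the galactic disc")
```

With these inputs the flight time and the accumulated replication pauses contribute about a million years each, so the total stays within a few million years unless the pause at each stop is made much longer than assumed here.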
One might think an intelligent species would be reluctant to use probes because of the worry that these machines would eventually escape from the control of the original transmitting species. But the same objection can be made to sending out radio signals. It is impossible to predict what use a recipient species would make of the information in the signal. Many scientists here on Earth have opposed the transmission of signals, fearing that hostile aliens may use the signals to home in on our planet. The fear of losing control of the probes (which, since these machines are rational beings, should be regarded as our mind children) applies with equal force to our biological descendants. “No species now existing will transmit its unaltered likeness to a distant futurity” was how Darwin put it in the closing pages of Origin of Species. We do not know whether they will be good or bad by our standards. We do know that in the far future they won’t be Homo sapiens. But in the long run, our descendants, whatever they look like, whether they are silicon machines or the more familiar DNA devices, must leave the Earth if they are to survive. Within 6 billion years, the Sun’s atmosphere will expand out and engulf the Earth, which will spiral into the Sun and be vaporized. A similar fate is in store for any and all intelligent species that evolve on a water planet. Making the reasonable Darwinian assumption that survival will be a central motivation of all intelligent species, all intelligent species will eventually develop space travel, leave their planet, and colonize their own star system. The universe is 13.7 billion years old, and most stars and their planets are billions of years older than our own. Thus, whatever the probability that intelligent life evolves on an earthlike planet on which one-cell organisms appear, most intelligent species would be billions of years older than we are. They should have left their mother planet billions of years ago.
Once they leave their planet, nothing can stop their expansion into interstellar space. If they existed, they would be here.

C. The Limited Resources Argument

Once an intelligent species begins its expansion into interstellar space, there is only the speed of light barrier to stop the expansion. Furthermore, as Dyson has emphasized, intelligent life will eventually develop the ability to convert any form of matter into living matter and life support devices. Given time, intelligent life can take apart not only asteroids, but also entire Jupiter-sized planets and even stars. Thus a galaxy which has been invaded (infected?) by a space travelling intelligent life form will start to disappear. This, by the way, is yet another argument for human uniqueness in the visible universe. We have never observed galaxies in the process of controlled disintegration. Intelligent life, in the long term, ought to appear as a horde of locusts, devouring all matter in its domain. A galaxy-wide government cannot be set up to stop such behavior because of the speed of light barrier, but even if it could be set up, it would have no choice but to allow such behavior. Survival requires the conversion of matter into energy. Setting an ultimate limit to how much matter can be so converted would merely doom life to extinction. However, the speed of light barrier, which prevents a galactic scale government from being set up to prevent life from devouring all matter, itself imposes a limitation on how fast life can use up resources. The disc of our galaxy is some 100,000 light years across; we cannot use up the material resources of our galaxy in less than 100,000 years. The Virgo cluster is some 60 million light years away. We cannot use up the resources of the Virgo cluster in less than 60 million years. If the universe were closed and decelerating, a single intelligent life form could not devour the entire universe until after the universe had begun to recollapse.
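The light-speed limit on resource use is simple arithmetic: matter at a given distance cannot be reached in less time than that distance divided by the expansion speed. The following is only a minimal sketch of that bound. The function name `min_expansion_years` is hypothetical, the 0.1c case is borrowed from the probe discussion earlier in the essay, and acceleration time, replication stopovers, and cosmic expansion are all ignored.

```python
def min_expansion_years(distance_ly: float, speed_fraction_of_c: float = 1.0) -> float:
    """Lower bound, in years, on the time needed to reach matter
    `distance_ly` light years away at a constant fraction of light speed.

    Illustrative only: ignores acceleration, stopovers, and cosmic expansion.
    """
    if not 0.0 < speed_fraction_of_c <= 1.0:
        raise ValueError("speed must be in (0, 1] as a fraction of light speed")
    return distance_ly / speed_fraction_of_c


GALAXY_DISC_LY = 100_000        # disc diameter quoted in the text
VIRGO_CLUSTER_LY = 60_000_000   # distance quoted in the text

for name, distance in [("galaxy disc", GALAXY_DISC_LY),
                       ("Virgo cluster", VIRGO_CLUSTER_LY)]:
    at_c = min_expansion_years(distance)
    at_tenth_c = min_expansion_years(distance, 0.1)
    print(f"{name}: at least {at_c:,.0f} yr at c, {at_tenth_c:,.0f} yr at 0.1c")
```

At light speed the bound reproduces the 100,000-year and 60-million-year figures above; at one tenth of light speed each bound is ten times longer.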
Actually the universe is currently accelerating. If this acceleration were to continue forever at its present rate, our descendants could devour only the region currently within at most 10 billion light years. This limit is imposed by the speed of light barrier modified by the universal acceleration. But the more intelligent life there is in the universe, the more planets upon which intelligent life independently evolves, the more rapidly resources will be used up. When all the material resources are used up, intelligent life will die. The more common intelligent life is in the universe, the more rapidly it will become extinct. Conversely, if intelligent life is quite rare (a single intelligent species, if the universe were closed and always decelerating), intelligent life would be forced by the laws of physics to use resources at just the right rate to survive to the very end of time. And even more intelligent species could so survive if the universe were to have a period of acceleration in its expansion phase, as the universe is indeed observed to have. But why should the universe adjust the number of intelligent species so that the descendants of the species would survive to the end of time? As Darwin pointed out in the closing pages of Origin of Species, almost all species that have ever existed on Earth have died out, leaving no descendants. Why should an intelligent life form have a survival probability utterly different from almost all other species? I claim that intelligent life will survive until the end of time because the laws of physics require it. Or to put it another way, because such survival is one of the goals of the universe.

III. Unitarity is Teleology

Teleology has been completely rejected by evolutionary biologists. This rejection is unfortunate, because teleology is alive and well in physics, under the name of unitarity. Unitarity is an absolutely central postulate of quantum mechanics, and it has many consequences.
One of these consequences is the CPT theorem, which implies that the g-factors of particles and antiparticles must be exactly equal. This equality (for electrons and positrons) has been verified experimentally to 13 decimal places, the most precise experimental number we have. Which is why very few physicists are willing to give up the postulate of unitarity! Furthermore, unitarity is closely related to the law of conservation of energy, and a violation of unitarity has been shown to result usually in the gigantic creation of energy out of nothing. One model (due to Leonard Susskind) of unitarity violation had the implication that whenever a microwave oven was turned on, so much energy was created that the Earth was blown apart. So physicists are very reluctant to abandon unitarity. Unitarity is most often applied to what physicists call the S-matrix, which is the quantum mechanical linear operator that transforms any state in the ultimate past to a unique state in the ultimate future. But unitarity more generally applies to the time evolution operator, a linear operator that carries the quantum state of the universe at any initial time uniquely into the quantum state of the universe at any chosen future time. Uniquely is a key word. It means that unitarity is the quantum mechanical version of determinism. Contrary to what is generally thought, determinism is alive and well in quantum mechanics. Determinism, however, applies to wave functions (quantum states) rather than to individual particles. Alternatively, we can say that determinism applies to coherent collections of worlds rather than to individuals. There is a sense, which I won’t have room to discuss here, in which quantum mechanics is more deterministic than classical mechanics, and in which Schrödinger derived his famous equation by requiring that classical mechanics in its most general expression (Hamilton-Jacobi theory) be deterministic. (See Tipler (2005) for the mathematical details.)
But the usual past-to-future determinism is not the fundamental meaning of unitarity. What unitarity really means is that the inverse of the time evolution operator exists, and is easily computed from the time evolution operator itself by forming the time evolution operator’s hermitian conjugate. Any operator whose inverse is obtained in this manner is said to be a unitary operator. But in the present context, the important point is that the inverse of the time evolution operator exists. The inverse of any operator is an operator that undoes the effect of the original operator. In the case of the time evolution operator, which generates past-to-future evolution, the inverse operator generates future-to-past evolution. In other words, it carries future quantum states uniquely into past quantum states. Therefore, unitarity tells us that any complete statement of usual past-to-future causation is mathematically equivalent to some complete statement of future-to-past causation. In more traditional language, a complete list of all efficient causes is equivalent to some complete list of final causes. Teleology is reborn! Nevertheless, the Second Law of Thermodynamics says that the complexity of the universe at the microlevel is increasing with time. This means that it will usually be the case that past-to-future causation will be the simpler explanation of the two causal languages. But this will not always be the case. We should always remember that for physical reality the two causation languages are mathematically equivalent. It might occasionally be the case that we humans can understand where the evolution of the universe is taking us only by using future-to-past causation. That is, we can understand what is happening now only by considering the ultimate goal of the universe. To reject this possibility is a terrible mistake. 
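The statement that the inverse of a unitary operator is its hermitian conjugate can be checked on a toy example. The sketch below is purely illustrative: the 2x2 matrix returned by the hypothetical helper `evolution_operator`, its two parameters, and the two-component state are arbitrary choices standing in for the time evolution operator and the quantum state of the universe, not anything taken from the text.

```python
import cmath
import math


def dagger(m):
    """Hermitian conjugate: transpose the matrix and complex-conjugate each entry."""
    n = len(m)
    return [[m[r][c].conjugate() for r in range(n)] for c in range(n)]


def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def evolution_operator(theta, phase):
    """A 2x2 unitary matrix chosen for illustration (rotation plus complex phase)."""
    c, s = math.cos(theta), math.sin(theta)
    e = cmath.exp(1j * phase)
    return [[complex(c), -s * e.conjugate()],
            [s * e, complex(c)]]


past = [0.6 + 0j, 0.8j]                 # a normalized "past" state
U = evolution_operator(0.7, 1.3)

future = apply(U, past)                 # past-to-future evolution
recovered = apply(dagger(U), future)    # future-to-past evolution with U-dagger

norm = lambda v: math.sqrt(sum(abs(x) ** 2 for x in v))
print("norm preserved:", math.isclose(norm(past), norm(future)))
print("past recovered:", all(abs(a - b) < 1e-12 for a, b in zip(past, recovered)))
```

Applying the hermitian conjugate to the evolved state returns the original state, which is the sense in which the future state determines the past state just as uniquely as the past determines the future.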
Humans naturally think in terms of past-to-future causation because our memories are designed (by the laws of physics) to work in this time direction. But the universe is not similarly restricted. It is a mistake to impose human limitations on the physical universe. It was a terrible mistake to require that solar system mechanics look simple in a geocentric frame of reference. Let me now use this future-to-past causation to show that biological evolution cannot be completely random. I shall now argue that the laws of physics require intelligent life to evolve somewhere, and survive to the very end of time.

IV. Why Intelligent Life MUST Exist in the Far Future

The necessity of intelligent life in the far future is an automatic consequence of the laws of physics, specifically quantum mechanics, general relativity, the Standard Model of particle physics, and most importantly, the Second Law of Thermodynamics. I shall show that the mutual consistency of these laws requires three things. First, the universe must be closed (the universe’s spatial topology must be a three-sphere). Second, life must survive to the very end of time. Third, the knowledge possessed by life must increase to infinity as the end of time is approached. I do not assume life survives to the end of time. Life’s survival follows from the laws of physics. If the laws of physics be for us, who can be against us? But before I prove that the laws of physics require life to survive, let me first show that it is possible for life to survive. To survive for infinite experiential time, life requires an unlimited supply of energy. That is, the supply of available energy must diverge to infinity as the end of time is approached. Nevertheless, conservation of energy requires the total energy of the universe to be constant. In fact, Roger Penrose has shown that the total energy of any closed universe is ZERO! The total energy is zero now, was zero in the past, and will be zero at all times in the future.
One might wonder how this is possible. After all, we are now receiving energy from the Sun, we are using food energy as we read this, and we can extract energy from coal, oil, and uranium. Energy, in other words, seems to be non-zero. However, the forms of energy just listed are not all the forms of energy in the universe. There is also gravitational energy, which is negative. So if we were to add all the positive forms of energy (radiant energy, the stored energy in coal, oil, and uranium, and most importantly, the mass-energy of matter) to the negative gravitational energy, the sum is zero. This means that if we can make the gravitational energy even more negative, the positive energy, that is, the energy available for life, necessarily increases, even though the total energy in the universe stays zero. The key property of energy that must always be kept in mind is that it transforms from one form to another. Once we realize that gravitational energy can be transformed into available energy, we understand where life can obtain the unlimited source of available energy it needs for survival: life must make the total gravitational energy approach minus infinity. Life can do this only if the universe is closed, and collapses to zero size as the end of time is approached. Conversely, if the universe is closed and collapses to zero size, then the total gravitational energy goes to minus infinity, since the gravitational energy of a system is inversely proportional to the size of the system. I have shown in my book (Tipler, 1994) that life can in fact extract unlimited available energy from the collapse of the universe. Now let me outline the proof of my three claims above. I can give here only a bare outline. For complete details, the reader is referred to my book (Tipler, 1994) and to papers (Tipler et al., 2000; Tipler, 2001) on arXiv, the physics preprint database (available on the Internet at http://arxiv.org/).
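The energy bookkeeping used in this argument can be written schematically. This is only a Newtonian-style sketch under stated assumptions: the symbols $E_{+}$ (the sum of all positive forms of energy), $E_{\rm grav}$, the mass scale $M$, and the size $R$ are labels introduced here for illustration, and the $GM^{2}/R$ scaling is the familiar order-of-magnitude estimate for a self-gravitating system, not the general-relativistic result cited from (Tipler, 1994).

```latex
\begin{align}
  E_{+} + E_{\rm grav} &= 0
    && \text{(total energy of a closed universe is zero)} \\
  E_{\rm grav} &\sim -\frac{G M^{2}}{R}
    && \text{(gravitational energy inversely proportional to size)} \\
  E_{+} = -E_{\rm grav} &\sim \frac{G M^{2}}{R} \;\longrightarrow\; \infty
    && \text{as } R \to 0 .
\end{align}
```

Read this way, the available (positive) energy grows without bound as the size shrinks to zero while the total stays fixed at zero, which is the qualitative point the paragraph makes.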
Black holes exist, but Hawking proved that were black holes to evaporate completely (as they necessarily would if the universe were to expand forever), the black holes would violate unitarity, the fundamental law of quantum mechanics which I described in the previous section. Hence the universe must eventually stop expanding, collapse, and end in a final singularity. If this final singularity were to be accompanied by event horizons, then the Bekenstein Bound (another law of quantum mechanics, basically the Heisenberg Uncertainty Principle expressed in the language of information theory) would have the following effect. It would force all the microstate information in the universe to go to zero as the universe approaches the final singularity. But the microstate information going to zero would imply that the entropy of the universe would have to go to zero, and this would contradict the Second Law of Thermodynamics, which says that the entropy of the universe can never decrease. But if event horizons do not exist, then the Bekenstein Bound allows the information in the microstates to diverge to infinity as the final singularity is approached. Conversely, ONLY if event horizons do not exist can quantum mechanics (the Bekenstein Bound) be consistent with the Second Law of Thermodynamics. Therefore, event horizons cannot exist, and by Seifert’s Theorem (see (Tipler, 1994), p. 435) the non-existence of event horizons requires the universe to be spatially closed. In Penrose’s c-boundary construction (Tipler, 1994), (Hawking and Ellis, 1973), a singularity without event horizons is a single point. I call such a final singularity the OMEGA POINT. At a Windsor Castle conference, Martin Rees objected that many physicists (in particular, himself) do not accept Hawking’s proof that unitarity would be violated were a black hole to evaporate to completion.
But most of the physicists who reject Hawking’s argument nevertheless accept that there is a Black Hole Information Problem: i.e., that we must explain how the information that falls into a black hole gets out. Many solutions to the Information Problem have been proposed, but all of these solutions (except the one I shall advance) have one feature in common. They all involve proposed new laws of physics. My proposal (that there are no event horizons at all, hence no black hole event horizons, so ALL information at all events is accessible to all observers in the far future) does NOT involve new physical laws. Only classical general relativity is used. I use Hawking’s unitarity argument only to infer the non-existence of event horizons. If we resolve the Black Hole Information Problem by simply assuming the non-existence of event horizons, then I don’t need to use either the Bekenstein Bound or the Second Law of Thermodynamics to infer the existence of the Omega Point, or spatial closure. Resolving the Information Problem using known physics automatically yields no event horizons and spatial closure for the universe. If the universe were to evolve into an Omega Point type final singularity without life being present to guide its evolution, then the non-existence of event horizons would mean that the universe would be evolving into an infinitely improbable state. Such an evolution would contradict the Second Law of Thermodynamics, which requires the universe to evolve from less probable to more probable states. On the other hand, if life is present guiding the evolution of the universe into the final singularity, then the absence of event horizons is actually the MOST probable state, because the absence of event horizons is exactly what life requires in order to survive (details in my book (Tipler, 1994)).
In other words, the validity of the Second Law of Thermodynamics REQUIRES life to be present all the way into the final singularity, and further, the Second Law requires life to guide the universe in such a way as to eliminate the event horizons. Life is the only process consistent with known physical law capable of eliminating event horizons without the universe evolving into an infinitely improbable state. Exactly how life eliminates the event horizons is described in my book (Tipler, 1994). Roughly speaking, life nudges the universe so as to allow light to circumnavigate the universe first in one direction, and then another. This is done repeatedly, an infinite number of times. There are thus an INFINITE number of circumnavigations of light before the Omega Point is reached. If we were to regard a single circumnavigation as a single tick of the light clock, there would be an infinite amount of such time between now and the Omega Point. An even more physical time would be the number of experiences which life has between now and the Omega Point. This “experiential time”, the time experienced by life in the far future, is the most appropriate physical time to use near the Omega Point. It is far more appropriate than the human-based proper time we now use in our clocks.

V. Life in the Future of an Accelerating Universe

As anyone who has read the science columns of the newspapers over the past decade knows, the universe is now accelerating. The most recent WMAP observations of the Cosmic Microwave Background Radiation provide the strongest evidence for acceleration, but there are several independent lines of evidence that lead to the conclusion that the universe is accelerating. The evidence is also strong that the mechanism for the acceleration is a positive cosmological constant.
If this acceleration were to continue forever, then as Barrow and I showed in our book (Barrow and Tipler, 1986), intelligent life will eventually die out, and the entire theory, which I described in section III, would be false. If intelligent life is to continue until the very end of time — as it must if the laws of physics are to hold at all times — then the universe must eventually stop accelerating, slow down until the expansion stops, and then recollapse to a final singularity. In this section, I shall outline a mechanism which can cancel the acceleration. My proposal assumes the validity of the Standard Model of particle physics, a theory which is so far supported by all experiments conducted to date, and which provides only one mechanism for a universal acceleration. The latest WMAP observations of the Cosmic Microwave Background Radiation (CMBR) have provided the following facts. First, the universe is 13.7 billion years old. Second, in the present epoch, the density parameters of the curvature, the ordinary matter, the dark matter, and the dark energy are respectively Ωk << 0.01, Ωm = 0.04, ΩDM = 0.23, and ΩΛ = 0.73. Notice that the subscript on the dark energy is Λ. I use this subscript to emphasize that the WMAP data indicate the dark energy looks observationally like the effect of a positive cosmological constant, traditionally written Λ. Any correct cosmological theory must be consistent with these observations. The Standard Model, minimally coupled to gravity, necessarily has a positive cosmological constant. I predicted in my book (Tipler, 1994) that this cosmological constant would cause the universe to undergo an acceleration. I argued that this acceleration would occur in the collapsing phase of universal history. I did not realize that an acceleration could also occur in the expanding phase. Though I should have, since the Standard Model requires such an acceleration. 
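As a quick consistency check on the density parameters quoted above, the three non-curvature components can be summed and compared with unity. This is only a sketch: the variable names are hypothetical, the relation between the total and the curvature term is written in one common sign convention (an assumption, since the text does not state one), and the inputs are rounded to two decimal places, so agreement is meaningful only at about the 0.01 level.

```python
# Density parameters as quoted in the text (rounded to two decimals).
omega_ordinary_matter = 0.04
omega_dark_matter = 0.23
omega_lambda = 0.73   # dark energy, behaving like a cosmological constant

omega_total = omega_ordinary_matter + omega_dark_matter + omega_lambda

# Assumed convention: the curvature term is the departure of the total from 1.
curvature_residual = 1.0 - omega_total

print(f"sum of components  = {omega_total:.2f}")
print(f"|curvature residual| below 0.01: {abs(curvature_residual) < 0.01}")
```

To the quoted precision the components sum to one, which is what a curvature parameter much smaller than 0.01 requires. The rounding of the inputs means this check cannot distinguish an exactly flat universe from a very slightly closed one.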
The Standard Model requires a positive cosmological constant to cancel the effect of the Higgs vacuum. Recall that according to the Standard Model, the universe is permeated with a non-zero value of the Higgs field, and it is this non-zero value that breaks the electroweak symmetry and gives mass to all the particles. But this symmetry breaking is accomplished via the Higgs potential, which for constant Higgs field acts exactly like a very strong negative cosmological constant. Initially, at the Big Bang singularity, the Higgs field, and hence the Higgs potential, was zero. But zero is not the lowest value of the potential, so as the universe expanded, the Higgs potential dropped to its lowest value, corresponding to a negative cosmological constant. Now in special relativity, this negative constant can be re-normalized out of existence. Not so in general relativity. Any constant in the matter Lagrangian multiplies the invariant volume element, and is equivalent to putting a cosmological constant in the Lagrangian (Weinberg, 1989). The value of the negative cosmological constant corresponding to the Higgs potential can be set by experiment, and it is enormous: −1.0 × 10^26 gm/cm^3, as compared to the energy density of the dark matter and dark energy, only 10^−29 gm/cm^3. The only way to make the Standard Model consistent with general relativity is to add a positive cosmological constant of the same magnitude to the Lagrangian. We would expect the value of the added positive cosmological constant to precisely cancel the value of the Higgs potential, when the Higgs is in its true ground state (the absolute lowest energy density of the potential). But the Higgs field cannot presently be in its true ground state, for a very simple reason: there is more matter than antimatter in the universe.
The Standard Model has a mechanism for generating this observed excess of matter over antimatter, but most cosmologists believe that this cannot be the main mechanism to generate matter, because they think, incorrectly, that it will generate too high a ratio of photons to baryons. I have shown that this large ratio of photons to baryons is a consequence of imposing the wrong boundary conditions in the very early universe. If the only boundary conditions consistent with the Bekenstein Bound (a.k.a. quantum field theory) are imposed, the photon to baryon ratio turns out fine. The Standard Model generation of matter works by electroweak vacuum tunneling. And if this tunneling yields an excess of matter over antimatter, the Higgs field cannot be in its true vacuum. Thus the excess of matter over antimatter in the universe ultimately causes the observed acceleration of the universe! Conversely, if the excess of matter over antimatter were to disappear (if matter were converted into energy via electroweak tunneling), and if this disappearance were to occur rapidly enough, then the Higgs potential would fall toward its true ground state, the positive cosmological constant would be progressively cancelled, and the universe would cease to accelerate. If the universe were spatially a three-sphere (and I have argued in the previous section that it is), then once the acceleration stops, the universe will expand to a maximum size, and then recollapse into the final singularity. Provided, of course, that a mechanism can be found to convert matter into energy via electroweak quantum tunneling. The mechanism would have to be the inverse of the process that created the matter excess in the early universe. But a large amount of matter was created in the early universe because the gauge field energy density was enormous. The gauge field energy density is tiny today: 10^−31 gm/cm^3, and getting smaller as the universe expands.
If the acceleration is to stop, another mechanism must annihilate the matter. I claim that our future descendants will annihilate the matter. Once again, they will annihilate the matter in order to survive. Survival requires energy. If baryon number is conserved, then only a small fraction of the energy content of matter can be extracted. If hydrogen is converted into helium, as in the Sun, only 0.7% of the mass of the hydrogen is converted into energy. But if our descendants use the inverse of baryogenesis (the technical term for the process that generated matter in the early universe), ALL the energy in matter can be extracted. I predict that in the future, a way will be found to use inverse baryogenesis, our descendants will use this process as their main energy source, and as a consequence of using up their matter resources, they will save both themselves and the entire universe. Because if the acceleration can be cancelled and universal recollapse induced, then the gravitational collapse energy can provide an unlimited energy source, as I showed above. But in an accelerating universe, life can only travel to the cosmological event horizon, which is about 10 billion light years away at the present time, given the observed value of the dark energy. (Actually, I should call it the “pseudo event horizon”, since it would be a true event horizon only if life never stops the expansion, and the Omega Point never develops. The Omega Point, recall, means that there are no event horizons.) But quantum non-locality means that the quantum tunneling responsible for baryogenesis generates a uniform density of baryons on large scales. (And since it is the creation of baryons that generates perturbations in the CMBR, the perturbation spectrum must be scale invariant.) This means that the baryons have essentially the same density on large scales everywhere in the universe. This means that the acceleration must be universal.
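The contrast drawn above between hydrogen fusion (0.7% of the rest mass converted) and complete conversion of matter into energy follows directly from E = mc^2. The sketch below is illustrative only: the helper `energy_joules` and the choice of one kilogram of fuel are assumptions made for the example, and nothing here models how inverse baryogenesis would actually be carried out.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0   # exact by SI definition


def energy_joules(mass_kg: float, fraction_converted: float) -> float:
    """Energy released when `fraction_converted` of the rest mass becomes energy."""
    if not 0.0 <= fraction_converted <= 1.0:
        raise ValueError("fraction_converted must lie between 0 and 1")
    return fraction_converted * mass_kg * SPEED_OF_LIGHT_M_PER_S ** 2


fuel_kg = 1.0   # illustrative amount of hydrogen
fusion = energy_joules(fuel_kg, 0.007)        # hydrogen to helium, as in the Sun
full_conversion = energy_joules(fuel_kg, 1.0)  # all rest mass converted

print(f"fusion:          {fusion:.3e} J")
print(f"full conversion: {full_conversion:.3e} J")
print(f"gain factor:     {full_conversion / fusion:.0f}")
```

Under these assumptions complete conversion yields roughly 140 times more energy per kilogram than fusion, which is the quantitative content of the claim that a baryon-conserving process extracts only a small fraction of the energy in matter.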
This means that if the universe is to recollapse, the baryons must be annihilated everywhere, even at distances greater than 10 billion light years, where our descendants cannot travel, even were rockets based on baryon annihilation to be constructed. Such rockets could approach light speed. I have shown (Tipler, 1994) that such rockets can travel cosmological distances, using the expansion of the universe itself to slow down the rocket. Our descendants can reach the pseudo event horizon but no farther. Thus the laws of physics require there to exist other intelligent species in the universe. Because of the Limited Resources Argument, the different intelligent life forms must be rare, roughly one species per Hubble volume. The nearest other intelligent life form must be roughly 10 billion light years away. But were we to look for them, we would not see them, because at 10 billion light years, we would see their galaxy as it was 10 billion years ago, probably long before their planetary system formed.

VI. Conclusion and Proposed Experiments

But sufficiently advanced radio telescopes MIGHT be able to detect their future presence. In other words, I shall now argue that there is a role for SETI! If we cannot detect alien civilizations, we might be able to detect the one-cell organisms out of which they will eventually evolve. Provided that these organisms already existed 10 billion years ago. There is some evidence that the one-cell organisms that were our own ancestors were around billions of years before the Earth formed 4.6 billion years ago. William Schopf (1999, p. 77) has discovered structures in the 3,465 ± 5 million-year-old Apex chert of Australia that closely resemble modern cyanobacteria. Schopf identified these structures as fossil cyanobacteria, an identification that has been recently challenged. But I shall assume that his identification is correct, so I can consider the consequences. Now cyanobacteria are actually very sophisticated biochemical machines.
If the fossils found by Schopf are indeed cyanobacteria, then all the machinery of prokaryotes, including photosynthetic ability, must have been present on Earth almost as soon as the Earth became capable of sustaining life, about 3.8 billion years ago. Schopf himself remarks (1999, p. 98) that it seems extraordinary to suppose that this much sophistication could have evolved in the geologically short period between the solidification of the Earth and the date of the Apex fossils. I agree with Schopf. If indeed the Apex structures are fossils of cyanobacteria, then these organisms cannot have evolved on Earth. They must have evolved their observed level of sophistication on some other planet whose star long ago left the main sequence, and in the process, scattered the cyanobacteria throughout interstellar space. At the Windsor Castle conference, Paul Davies emphasized the consensus opinion that cyanobacteria could survive a trip from one of the Solar System’s planets, but because of the amount of radiation that they would receive, they could not survive an interstellar journey. But the evidence Paul cited was theoretical, rather than experimental. Cyanobacteria are capable of surviving nuclear explosions, and they have been known to live inside nuclear reactors (Schopf, 1999, pp. 232-234). Given the ability of cyanobacteria to survive radiation, their biochemical complexity, and the evidence that they appeared almost instantaneously on Earth, I think that the preponderance of evidence says that cyanobacteria evolved billions of years before the Earth formed, on a planet of a star that has long since disappeared. This hypothesis has consequences. First, our interplanetary space probes should find cyanobacteria wherever in the Solar System there is, or has been, liquid water.
But if cyanobacteria have indeed been dispersed throughout interstellar space billions of years before the Earth formed, we would expect to find cyanobacteria, with the same DNA codons and cellular machinery, wherever there is liquid water in the entire Galaxy. This hypothesis can be rigorously tested only with interstellar space probes. Incidentally, notice that I’ve given in passing yet another reason why interstellar probes will eventually be sent out by any intelligent species: to check how related life is in the Galaxy. But if photosynthetic organisms have existed for billions of years before the Earth formed (on the order of 10 billion years), and if our evolution is typical, we would expect intelligent life near the pseudo event horizon to have evolved from organisms, some of which have photosynthetic ability, which existed on liquid water planets 10 billion years ago. We would also expect there to have been time for the photosynthetic organisms to convert some of these ancient planets’ atmospheres into oxygen atmospheres. This is what we should search for in distant galaxies: the spectral lines of free oxygen. It has long been known that the oxygen in Earth’s atmosphere can be seen at a distance of 10 light years by a one meter orbiting telescope. A million-kilometer telescope would be able to see free oxygen lines in planetary atmospheres near the pseudo event horizon. From the arguments above, some such atmospheres must exist. A million-kilometer telescope is not going to be built in the immediate future. In the short run, I would propose testing the hypothesis that the excess of matter over antimatter is responsible for the universal acceleration, and that a special boundary condition on the fields of the Standard Model generates the excess of matter over antimatter. This can be done rather easily, using a modification of the original equipment that discovered the CMBR.
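The step from a one meter telescope seeing oxygen at 10 light years to a million-kilometer telescope seeing it near the pseudo event horizon can be reproduced with a simple scaling. The sketch below assumes, as an illustration and not as anything stated in the text, that the limiting distance grows linearly with aperture diameter (collected signal scaling as the square of the aperture and falling as the square of the distance). It ignores redshift and other cosmological effects, and the function name `detection_range_ly` is hypothetical.

```python
REFERENCE_APERTURE_M = 1.0    # one meter orbiting telescope (from the text)
REFERENCE_RANGE_LY = 10.0     # distance at which Earth's oxygen is detectable (from the text)


def detection_range_ly(aperture_m: float) -> float:
    """Limiting distance in light years, assuming range scales linearly with aperture."""
    if aperture_m <= 0:
        raise ValueError("aperture must be positive")
    return REFERENCE_RANGE_LY * (aperture_m / REFERENCE_APERTURE_M)


million_km_in_m = 1_000_000 * 1_000.0   # 10^9 meters
print(f"range of a million-kilometer telescope: {detection_range_ly(million_km_in_m):.1e} light years")
```

Under this assumed scaling a million-kilometer aperture reaches about 10 billion light years, the same distance the essay assigns to the pseudo event horizon.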
I have shown in (Tipler, 2001, 2005) that if Standard Model physics is responsible for both the dark matter and the dark energy, then the CMBR should not couple to right-handed electrons, and this can be seen by sending the CMBR through filters consisting of poor conductors. Through such a filter, the CMBR would be more penetrating than thermal radiation of the same temperature. I have shown elsewhere that the same effect is visible in the Sunyaev-Zel’dovich effect (Tipler, 2005), and it is responsible for the great penetrating power of ultrahigh energy cosmic rays (Tipler, 2001, 2005). Two of the arguments against the existence of ETI have been around for a long time. The evolutionary argument goes back to Alfred Wallace, co-discoverer with Darwin of the principle of natural selection. The Fermi Paradox goes back to Enrico Fermi. I’ve added a third, the “Limited Resources Argument”, which connects the rarity of intelligent life in the universe to the unlimited survival of intelligence in the far future. But to appreciate the power of this argument, we must learn to give up anthropocentric ways of thinking. We must abandon the (usually tacit) idea that our technology exhausts what is possible using the known laws of physics. We must abandon the idea that the universe acts according to human thought patterns, that causality works from past to future. We must abandon the idea that the universe evolves us as the highest level of intelligence, and that all other intelligent species will be as limited in space as we are. Finally, we must abandon the idea that there is a limit to what intelligence can accomplish, and that intelligence will never play a role on the cosmological scale. Once we give up these human ways of thinking, we can appreciate the true relation between intelligent life and the cosmos.

References

Barrow, J.D., Tipler, F. J. 1986 The Anthropic Cosmological Principle, Oxford University Press.
Hawking, S.W., Ellis, G.F.R.
1973 The Large-Scale Structure of Space-Time, Cambridge University Press. Schopf, W. 1999 Cradle of Life: the Discovery of Earths Earliest Fossils, Princeton University Press. Tipler, F. J. 1994 The Physics of Immortality, Doubleday. Tipler, F. J., Graber, J., McGinley, M., Nichols-Barrer, J., Staecker 2000 gr-qc/0003082. Tipler, F.J. 2001 astro-ph/0111520. http://arXiv.org/abs/gr-qc/0003082 http://arXiv.org/abs/astro-ph/0111520 Tipler, F. J. 2005, Reports Prog. Phys. 68, pp. 897–964. Weinberg, S. 1989, Rev. Mod. Phys., 61, pp. 1–22. ABSTRACT I shall present three arguments for the proposition that intelligent life is very rare in the universe. First, I shall summarize the consensus opinion of the founders of the Modern Synthesis (Simpson, Dobzhanski, and Mayr) that the evolution of intelligent life is exceedingly improbable. Second, I shall develop the Fermi Paradox: if they existed they'd be here. Third, I shall show that if intelligent life were too common, it would use up all available resources and die out. But I shall show that the quantum mechanical principle of unitarity (actually a form of teleology!) requires intelligent life to survive to the end of time. Finally, I shall argue that, if the universe is indeed accelerating, then survival to the end of time requires that intelligent life, though rare, to have evolved several times in the visible universe. I shall argue that the acceleration is a consequence of the excess of matter over antimatter in the universe. I shall suggest experiments to test these claims. <|endoftext|><|startoftext|> Introduction, we assume that the spin axes of both stars have been aligned with the orbital normal and that the rotation of both stars has been synchronized to the orbital period. 
This allows us to use the observed rotational line broadening of the primary to solve for the radius of the primary in linear units, which in turn allows us to convert the orbital size and secondary radius into linear units from the values of [a/RA] and [RB/RA] derived from the light curves. Using the assumption of synchronization, and that iorb = irot, we see by inspection that

$$R_A = \frac{[P]}{2\pi}\,\frac{[V_{\rm rot}\sin i_{\rm rot}]}{\sin i_{\rm orb}}, \qquad R_B = \frac{[P]}{2\pi}\,\frac{[V_{\rm rot}\sin i_{\rm rot}]}{\sin i_{\rm orb}}\,[R_B/R_A] \qquad (13)$$

where [Vrot sin irot] is the projected rotational broadening of the primary derived from its observed spectra. We may now substitute in Eq.(8) for sin iorb to get both radii in terms of our observables:

$$R_A = \frac{[P]}{2\pi\,(1-[b]^2/[a/R_A]^2)^{1/2}}\,[V_{\rm rot}\sin i_{\rm rot}] \qquad (14)$$

$$R_B = \frac{[P]}{2\pi\,(1-[b]^2/[a/R_A]^2)^{1/2}}\,[R_B/R_A]\,[V_{\rm rot}\sin i_{\rm rot}] \qquad (15)$$

By combining these two statements with Eq.(9) and (10), we arrive at expressions for the masses of each component in terms of just the observable quantities:

$$M_A = \frac{[P]}{2\pi G}\,\frac{[a/R_A]^3}{(1-[b]^2/[a/R_A]^2)^{3/2}}\left(1-\frac{\sqrt{1-[e]^2}\,[K]}{[a/R_A]\,[V_{\rm rot}\sin i_{\rm rot}]}\right)[V_{\rm rot}\sin i_{\rm rot}]^3 \qquad (16)$$

$$M_B = \frac{[P]}{2\pi G}\,\frac{[a/R_A]^2}{(1-[b]^2/[a/R_A]^2)^{3/2}}\,\sqrt{1-[e]^2}\;[K]\,[V_{\rm rot}\sin i_{\rm rot}]^2 \qquad (17)$$

The results for the masses and radii for both components of HAT-TR-205-013 are presented in Table 8. The errors were estimated using Monte-Carlo simulations and were compared with the results of formal error propagation, including the correlation coefficients derived from the light-curve fits. Both approaches delivered similar results. The mass and radius obtained for the primary star are essentially the same for both the g and i light curves, but the mass and radius for the secondary differ by 0.8 and 3 percent, respectively. This radius difference between the two light curves is close to 1-σ, and may be due to uncertainties in the limb-darkening coefficients. Our adopted values are based on the average values of the light curve parameters.

4. DISCUSSION

In Figure 6 we plot our mass and radius for the M-dwarf secondary HAT-TR-205-013 B on a mass-radius diagram, together with isochrones for ages of 0.5 and 5 Gyr from Baraffe et al.
(1998). We also plot the results for 11 M dwarf secondaries from the sample of OGLE planetary candidates analyzed by Bouchy et al. (2005); Pont et al. (2005a,b, 2006) and listed in Table 9. For the systems OGLE-TR-34 (Bouchy et al. 2005), OGLE-TR-120 (Pont et al. 2005b), and the low mass systems OGLE-TR-122 (Pont et al. 2005a) and OGLE-TR-123 (Pont et al. 2006) the authors had to use stellar models to estimate the masses and radii of the primaries without the assumption of synchronization, as synchronization implied masses and radii that were inconsistent with the spectroscopic observations. For the other seven systems, they were able to assume synchronization and to derive the radius of the primary from the observed rotational line broadening. In general the agreement between the OGLE results and the Baraffe et al. (1998) isochrones looks promising, but the observational uncertainties are still too large to allow a critical test of the theoretical models. The OGLE systems are all much fainter than HAT-TR-205-013, which presents significant challenges for both the spectroscopic and photometric follow-up observations. Spectroscopy with the resolution and signal-to-noise ratio suitable for determining accurate values for rotational broadening requires time on large telescopes, and photometry for high-quality light curves also requires large telescopes to achieve the needed photon statistics. Eclipsing binaries identified by wide-angle surveys are much brighter and therefore less challenging on both counts. Our value for the radius of the M-dwarf secondary in HAT-TR-205-013 lies 11 percent, about 3-σ, above the theoretical isochrones. This divergence is further reinforced by Eq.(11), which, as has been previously noted, restricts the position of HAT-TR-205-013 B to lie on a single line that is determined by the surface gravity of the object.
This gravity curve does not rely on any prior assumptions about the HAT-TR-205-013 system, nor does it depend upon our measured value of Vrot sin irot, which is the biggest contributor of uncertainty to our final results. We use the assumption of synchronization and the spectroscopically measured Vrot sin irot to place HAT-TR-205-013 B at a specific location along the curve, but it is important to note that in the region where we find HAT-TR-205-013 B, the curve of allowable locations runs nearly parallel to the theoretical models. This is illustrated in Figure 6 by the red line that passes through our point for HAT-TR-205-013 B. Thus the conclusion that the theoretical models predict a radius for HAT-TR-205-013 B that is too small by about 10 percent is on much firmer ground than the error bar in the observed radius might suggest. Indeed, it would require a 6-σ difference in Vrot sin irot to place HAT-TR-205-013 B onto the Baraffe models. Our result for HAT-TR-205-013 B supports the suggestion from the results for double-lined eclipsing binaries plotted in Figure 1 that the models predict radii for M dwarfs that are too small by up to 10 percent. This discrepancy has been noted before, for example by Torres & Ribas (2002) in the case of YY Gem. Torres et al. (2006) raised the issue of whether short-period eclipsing binaries are representative of isolated field stars and wide binaries where tidal forces are negligible. They suggested that the rapid rotation of the stars in these systems caused by tidal synchronization might give rise to enhanced magnetic activity, thus decreasing the efficiency of energy transport in the convective envelopes and leading to inflated stellar radii. For low mass stars, this effect is examined in more detail by Lopez-Morales (2007). In the case of HAT-TR-205-013, we see no evidence in the photometry of star spots on the primary star, which would be tell-tale indicators of enhanced stellar magnetic activity.
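As a numerical cross-check of Eqs. (14)-(17), the following minimal Python sketch evaluates the radii and masses from the adopted observables in Tables 2, 3 and 6. It assumes the synchronized, aligned geometry used in the text, with sin i_orb obtained from b and a/R_A as in Eq. (14); the function name `physical_parameters` and the rounded physical constants are illustrative choices, not taken from the paper.

```python
import math

# Rounded physical constants (assumed values, SI units).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
R_SUN = 6.957e8      # solar radius, m
M_SUN = 1.989e30     # solar mass, kg
DAY = 86400.0        # seconds per day


def physical_parameters(p_days, k_kms, ecc, vsini_kms, a_over_ra, rb_over_ra, b):
    """Radii (R_sun) and masses (M_sun) of a synchronized single-lined eclipsing binary.

    Hypothetical helper: sin(i_orb) is taken from the impact parameter as
    sqrt(1 - b^2 / (a/R_A)^2), as in Eq. (14).
    """
    period = p_days * DAY
    k = k_kms * 1.0e3
    v = vsini_kms * 1.0e3
    sin_i = math.sqrt(1.0 - (b / a_over_ra) ** 2)

    r_a = period * v / (2.0 * math.pi * sin_i)          # Eq. (14)
    r_b = rb_over_ra * r_a                              # Eq. (15)

    # Kepler's third law with a = [a/R_A] * R_A gives the total mass.
    m_tot = period * (a_over_ra * v / sin_i) ** 3 / (2.0 * math.pi * G)
    # Eq. (17): the spectroscopic orbit fixes the secondary's share of the total mass.
    m_b = m_tot * math.sqrt(1.0 - ecc ** 2) * k / (a_over_ra * v)
    m_a = m_tot - m_b                                   # Eq. (16)

    return r_a / R_SUN, r_b / R_SUN, m_a / M_SUN, m_b / M_SUN


# Adopted observables from Tables 2, 3 and 6.
r_a, r_b, m_a, m_b = physical_parameters(
    p_days=2.23072, k_kms=18.33, ecc=0.012, vsini_kms=28.9,
    a_over_ra=5.92, rb_over_ra=0.1309, b=0.365,
)
print(f"R_A = {r_a:.3f} R_sun, R_B = {r_b:.3f} R_sun")
print(f"M_A = {m_a:.3f} M_sun, M_B = {m_b:.3f} M_sun")
```

With these inputs the sketch should land close to the adopted values of Table 8; rerunning it with Vrot sin irot shifted by a few km/s gives a feel for how strongly the masses, which scale as the second and third powers of the broadening, depend on that one measurement.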
Though HAT-TR-205-013 A is rapidly rotating, the lack of magnetic activity is not surprising, given its spectral type (F7). The star’s outer convective layer is relatively shallow, and it is not unusual for rapidly rotating stars of this type to lack strong magnetic activity (Torres et al. 2006). In some instances it may be possible to independently determine the rotational period of the primary through high-quality light curves used to definitively identify photometric variation outside of eclipse. This would serve as a check on the assumption of tidal synchronization in the system. In future papers we will present the results for several additional single-lined eclipsing binaries with circularized orbits.

We thank Joe Zajac, Perry Berlind, and Mike Calkins for obtaining some of the spectroscopic observations; Bob Davis for maintaining the database for the CfA Digital Speedometers; and John Geary, Andy Szentgyorgyi, Emilio Falco, Ted Groner, and Wayne Peters for their contribution to making KeplerCam such an effective instrument for obtaining high-quality light curves. TGB thanks the Harvard University Origins of Life Initiative for support. GK acknowledges the support of OTKA K-60750. The HATnet project is supported by NASA Grant NNG04GN74G. This research was supported in part by the Kepler Mission under NASA Cooperative Agreement NCC2-1390.

REFERENCES

Andersen, J. 1991, A&AR, 3, 91 Bakos, G. Á., Lázár, J., Papp, I., Sári, P., & Green, E. M. 2002, PASP, 114, 974 Bakos, G., Noyes, R. W., Kovács, G., Stanek, K. Z., Sasselov, D. D., & Domsa, I. 2004, PASP, 116, 266 Baraffe, I., Chabrier, G., Allard, F., & Hauschildt, P. H. 1998, A&A, 337, 403 Baraffe, I., Chabrier, G., Allard, F., & Hauschildt, P. H. 2002, A&A, 382, 563 Borucki, W. J., Caldwell, D., Koch, D. G., Webster, L. D., Jenkins, J. M., Ninkov, Z., & Showen, R. 2001, PASP, 113, 439 Bouchy, F., Pont, F., Melo, C., Santos, N. C., Mayor, M., Queloz, D., & Udry, S.
2005, A&A, 431, 1105 – 17 – Chabrier, G., & Baraffe, I. 1997, A&A, 327, 1039 Claret, A. 2004, A&A, 428, 1001 Creevey, O. L., Benedict, G. F., Brown, T. M., Alonso, R., Cargile, P., Mandushev, G., Charbonneau, D., McArthur, B. E., et al. 2005, ApJ, 625, L127 Cox, Arthur N., ed. 2000. Allen’s Astrophysical Quantities, Fourth Edition. New York: Springer-Verlag. Flower, P. J. 1996, ApJ, 469, 355 Gaudi, B. S., & Winn, J. N. 2007, ApJ, 655, 550 Girardi L., Bressan A., Bertelli G., & Chiosi C. 2000, A&AS, 141, 371 Hut, P. 1981, A&A, 99, 126 Kovács, G., Bakos, G., & Noyes, R. W. 2005, MNRAS, 356, 557 Kovács, G., Zucker, S., & Mazeh, T. 2002, A&A, 391, 369 Kurtz, M. J., & Mink, D. J. 1998, PASP, 110, 934 Kurucz, R. L. 1992, in The Stellar Populations of Galaxies, IAU Symp. No. 149, ed. B. Barbuy and A. Renzini (Kluwer Acad. Publ.: Dordrecht), 225 Lacy, C. H. 1977, ApJ, 218, 444 Latham, D. W. 1992, in IAU Coll. 135, Complementary Approaches to Double and Multiple Star Research, ASP Conf. Ser. 32, eds. H. A. McAlister & W. I. Hartkopf (San Francisco: ASP), 110 Latham, D. W. 2003, in ASP Conf. Ser. 294: Scientific Frontiers in Research on Extrasolar Planets, ed. D. Deming & S. Seager (San Fransisco: ASP), 409 Latham, D. W. 2007, in Transiting Extrasolar Planets Workshop, ed. C. Afonso, ASP Conf. Ser. in press. López-Morales, M., & Ribas, I. 2005, ApJ, 631, 1120 López-Morales, M., Orosz, J. A., Shaw, J. S., Havelka, L., Arevalo, M. J., McIntyre, T., & Lazaro, C. 2006, ApJ, submitted (astro-ph/0610225) Lopez-Morales, M. 2007, ArXiv Astrophysics e-prints, arXiv:astro-ph/0701702 http://arxiv.org/abs/astro-ph/0610225 http://arxiv.org/abs/astro-ph/0701702 – 18 – Maceroni, C., & Montalbán, J. 2004, A&A, 426, 577 Mandel, K., & Agol, E. 2002, ApJ, 580, L171 Metcalfe, T. S., Mathieu, R. D., Latham, D. W., & Torres, G. 1996, ApJ, 456, 356 Mullan, D. J., & MacDonald, J. 2001, ApJ, 559, 353 Nordström, B., Mayor, M., Andersen, J., Holmberg, J., Pont, F., Jogensen, B. R., Olsen, E. 
H., Udry, S., Mowlavi, N. 2004, A&A, 418, 989 O’Donovan, F. T., Charbonneau, D., Alonso, R., Brown, T. M., Mandushev, G., Dunham, E. W., Latham, D. W., Stefanik, R. P., et al. 2007, ApJ, submitted (astro-ph/0610603) Pont, F., Melo, C. H. F., Bouchy, F., Udry, S., Queloz, D., Mayor, M., & Santos, N. C. 2005, A&A, 433, L21 Pont, F., Bouchy, F., Melo, C., Santos, N. C., Mayor, M., Queloz, D., & Udry, S. 2005, A&A, 438, 1123 Pont, F., Moutou, C., Bouchy, F., Behrend, R., Mayor, M., Udry, S., Queloz, D., Santos, N., & Melo, C. 2006, A&A, 447, 1035 Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P., Numerical Recipes, 1992 (Cambridge: Cambridge Univ. Press) Ribas, I. 2003, A&A, 398, 239 Southworth, J., Zucker, S., Maxted, P. F. L., & Smalley, B. 2004, MNRAS, 355, 986 Tody, D. 1986, Proc. SPIE, 627, 733 Tody, D. 1993, in ASP Conf. Ser. 52, Astronomical Data Analysis Software and Systems II, ed. R. J. Hanisch, R. J. V. Brissenden, & J. Barnes (San Francisco: ASP), 173 Torres, G., Lacy, C. H., Marschall, L. A., Sheets, H. A. & Mader, J. A. 2006, ApJ, 640, 1018 Torres, G., & Ribas, I. 2002, ApJ, 567, 1140 Winn, J. N., et al. 2006, ArXiv Astrophysics e-prints, arXiv:astro-ph/0612224 Young, A. T. 1967, AJ, 72, 747 Zahn, J. P. 1989, A&A, 220, 112 This preprint was prepared with the AAS LATEX macros v5.2. http://arxiv.org/abs/astro-ph/0610603 http://arxiv.org/abs/astro-ph/0612224 – 19 – Table 1. 
Individual Radial Velocities HJD Vrad σ(Vrad) (days) (km s−1) (km s−1) 2453034.45642 −2.02 1.38 2453035.47574 −10.11 1.61 2453035.58018 −18.52 1.01 2453036.48778 −12.93 1.53 2453037.46565 −0.47 1.43 2453037.61215 −6.83 0.91 2453038.46413 −25.72 1.48 2453038.57874 −19.76 1.15 2453040.47360 −28.58 1.24 2453042.58686 −27.79 0.91 2453043.58338 +4.84 1.23 2453044.58422 −19.73 1.14 2453045.57911 −6.42 0.73 2453046.46373 −2.11 0.80 2453046.60000 −10.47 0.77 2453047.50881 −20.45 1.49 2453047.58731 −14.83 0.95 2453543.94910 −4.01 1.10 2453658.69572 −20.53 1.16 2453659.75967 +3.14 2.28 2453659.78398 +2.85 1.09 2453660.70213 −25.98 1.57 2453664.70202 −19.39 1.21 – 20 – Table 2. Spectroscopic Orbital Parameters Parameter Value P (days) 2.23072± 0.00005 γ (km s−1) −9.83± 0.30 K (km s−1) 18.33± 0.47 e 0.012± 0.021 ω (◦) 143± 90 Epoch (HJD) 2, 453, 198.61± 0.56 Nobs 23 O − C rms (km s−1) 1.06 f(M) (M⊙3) 0.00142± 0.00023 aA sin i (Gm) 0.562± 0.030 – 21 – Table 3. Rotational Velocity Results log g,[Fe/H] Teff < Vrot > σ(< Vrot >) Correlation (K) (km s−1) (km s−1) Coefficient 4.0,0.0 6340 29.4 0.25 0.826 4.5,0.0 6540 28.4 0.24 0.823 4.0,−0.5 5960 29.2 0.21 0.821 4.5,−0.5 6150 28.2 0.24 0.816 Adopted: 4.24,−0.2 6295 28.9 1.0 – 22 – Table 4. g Band Photometry HJD Flux 2453666.575985 1.00054 2453666.576472 0.99856 2453666.576946 1.00108 2453666.579226 1.00084 2453666.579712 1.00008 2453666.580198 0.99985 2453666.582501 0.99998 2453666.582976 0.99882 2453666.583474 1.00159 Note. — Table 4 is pre- sented in its entirety in the electronic edition of the As- trophysical Journal. A por- tion is shown here for guid- ance regarding its form and content. column (1): Heliocentric Julian Date, column (2): Normalized in- strumental flux. – 23 – Table 5. 
i Band Photometry HJD Flux 2453666.574226 0.99951 2453666.574469 0.99803 2453666.574724 1.00096 2453666.574967 0.99846 2453666.575233 0.99628 2453666.575488 1.00014 2453666.577432 1.00062 2453666.577698 0.99886 2453666.577965 0.99941 2453666.578208 0.99613 2453666.578474 0.99957 2453666.578728 0.99962 2453666.580719 0.99837 2453666.580974 0.99976 2453666.581228 0.99761 2453666.581494 1.00025 2453666.581772 1.00113 2453666.582027 0.99823 Note. — Table 5 is pre- sented in its entirety in the electronic edition of the As- trophysical Journal. A por- tion is shown here for guid- ance regarding its form and content. column (1): Heliocentric Julian Date, – 24 – column (2): Normalized in- strumental flux. – 25 – Table 6. Light-Curve Fit Results Parameter g Band i Band Adopted a/RA 5.93± 0.15 5.91± 0.16 5.92± 0.11 RB/RA 0.1330± 0.0010 0.1288± 0.0007 0.1309± 0.0006 b 0.36± 0.06 0.37± 0.07 0.365± 0.046 – 26 – Table 7. Correlation Coefficients Coefficient g Band i Band (a/RA,RB/RA) 0.28 0.27 (a/RA,b) −0.21 −0.42 (RB/RA,b) 0.04 0.01 – 27 – Table 8. Physical Parameters for HAT-TR-205-013 Parameter g Band i Band Adopted MA (M⊙) 1.04± 0.14 1.03± 0.14 1.04± 0.13 RA (R⊙) 1.28± 0.04 1.28± 0.04 1.28± 0.04 MB (M⊙) 0.124± 0.011 0.123± 0.011 0.124± 0.010 RB (R⊙) 0.169± 0.006 0.164± 0.006 0.167± 0.006 a (AU) 0.0351± 0.0015 0.0351± 0.0015 0.0351± 0.0014 – 28 – Table 9. Masses and Radii for Low-Mass Stars Name M (M⊙) R (R⊙) Type Ref. 
OGLE-TR-123 B 0.085± 0.011 0.133± 0.009 SB1 EB 1 OGLE-TR-122 B 0.092± 0.009 0.120± 0.018 SB1 EB 2,3 OGLE-TR-106 B 0.116± 0.021 0.181± 0.013 SB1 EB 3 HAT-TR-205-013 B 0.123± 0.011 0.167± 0.007 SB1 EB 13 OGLE-TR-125 B 0.209± 0.033 0.211± 0.027 SB1 EB 3 CM Dra B 0.2136± 0.0010 0.2347± 0.0019 SB2 EB 4,5 CM Dra A 0.2307± 0.0010 0.2516± 0.0020 SB2 EB 4,5 OGLE-TR-78 B 0.243± 0.015 0.240± 0.013 SB1 EB 3 OGLE-TR-5 B 0.271± 0.035 0.263± 0.012 SB1 EB 6 OGLE-TR-7 B 0.281± 0.029 0.282± 0.013 SB1 EB 6 OGLE-TR-6 B 0.359± 0.025 0.393± 0.018 SB1 EB 6 OGLE-TR-18 B 0.387± 0.049 0.390± 0.040 SB1 EB 6 CU Cnc B 0.3890± 0.0014 0.3908± 0.0094 SB2 EB 7 OGLE-BW3-V38 B 0.41± 0.09 0.44± 0.06 SB2 EB 8 CU Cnc A 0.4333± 0.0017 0.4317± 0.0052 SB2 EB 7 OGLE-BW3-V38 A 0.44± 0.07 0.51± 0.04 SB2 EB 8 OGLE-TR-120 B 0.47± 0.04 0.42± 0.02 SB1 EB 3 TrES-Her0-07621 B 0.489± 0.003 0.452± 0.050 SB2 EB 9 TrES-Her0-07621 A 0.493± 0.003 0.453± 0.060 SB2 EB 9 NSVS01031772 B 0.4982± 0.0025 0.5088± 0.0030 SB2 EB 10 OGLE-TR-34 B 0.509± 0.038 0.435± 0.033 SB1 EB 6 NSVS01031772 A 0.5428± 0.0027 0.5260± 0.0028 SB2 EB 10 YY Gem A & B 0.5992± 0.0047 0.6191± 0.0057 SB2 EB 11 GU Boo B 0.599± 0.006 0.620± 0.020 SB2 EB 12 GU Boo A 0.610± 0.007 0.623± 0.016 SB2 EB 12 References. — 1. Pont et al. (2006); 2. Pont et al. (2005a); 3. Pont et al. (2005b); 4. Lacy (1977); 5. Metcalfe et al. (1996); 6. Bouchy et al. (2005); – 29 – 7. Ribas (2003); 8. Maceroni & Montalbán (2004); 9. Creevey et al. (2005); 10. López-Morales et al. (2006); 11. Torres & Ribas (2002); 12. López-Morales & Ribas (2005); 13. This paper – 30 – Fig. 1.— The mass-radius diagram for 10 stars in 5 double-lined eclipsing binaries each composed of two M dwarfs, and with errors better than 3 percent. – 31 – 9.98 10.02 10.04 10.06 10.08 10.1 -0.4 -0.2 0 0.2 0.4 Phase HAT-5 TFA data HAT-8 TFA data 10.01 10.02 10.03 10.04 10.05 10.06 -0.1 -0.05 0 0.05 0.1 Fig. 2.— The phase-folded HATnet light curve for HAT-TR-205-013. – 32 – Fig. 
3.— The velocity curve for our orbital solution for HAT-TR-205-013, together with the individual observed velocities. The lower panel shows the O-C velocity residuals from the orbital solution. – 33 – Fig. 4.— KeplerCam light curves for HAT-TR-205-013 in the SDSS g and i bands. Contin- uous lines show the best fit synthetic light curves for each. – 34 – Fig. 5.— Contours of χ2 for the results from fits to the light curves in the g-band (left panels) and i-band (right panels). For each band the three panels show the projections onto the three possible planes involving b, a/RA, and RB/RA. The 1-σ, 2-σ, and 3-σ contours are plotted. – 35 – Fig. 6.— The mass-radius diagram for M dwarfs in single-lined eclipsing binaries. The M dwarfs from Pont et al. (2005a,b, 2006) are plotted as open circles. The red line passing through the point for HAT-TR-205-013 B shows the constraint imposed on its location by Eq.(11) and our observed quantities, without making any explicit assumptions (such as synchronization) about the system. Assuming synchronization, the hash marks on the line show the effect that differences of ± 1, 2, & 3 km s−1 in Vrot have on our final results. INTRODUCTION OBSERVATIONS AND DATA REDUCTION HAT Photometry Follow-up Spectroscopy Follow-up KeplerCam Photometry LIGHT-CURVE ANALYSIS MASSES AND RADII FOR HAT-TR-205-013 DISCUSSION ABSTRACT We derive masses and radii for both components in the single-lined eclipsing binary HAT-TR-205-013, which consists of a F7V primary and a late M-dwarf secondary. The system's period is short, $P=2.230736 \pm 0.000010$ days, with an orbit indistinguishable from circular, $e=0.012 \pm 0.021$. We demonstrate generally that the surface gravity of the secondary star in a single-lined binary undergoing total eclipses can be derived from characteristics of the light curve and spectroscopic orbit. This constrains the secondary to a unique line in the mass-radius diagram with $M/R^2$ = constant. 
For HAT-TR-205-013, we assume the orbit has been tidally circularized, and that the primary's rotation has been synchronized and aligned with the orbital axis. Our observed line broadening, $V_{\rm rot} \sin i_{\rm rot} = 28.9 \pm 1.0$ \kms, gives a primary radius of $R_{\rm A} = 1.28 \pm 0.04$ \rsun. Our light curve analysis leads to the radius of the secondary, $R_{\rm B} = 0.167 \pm 0.006$ \rsun, and the semimajor axis of the orbit, $a = 7.54 \pm 0.30 \rsun = 0.0351 \pm 0.0014$ AU. Our single-lined spectroscopic orbit and the semimajor axis then yield the individual masses, $M_{\rm B} = 0.124 \pm 0.010$ \msun and $M_{\rm A} = 1.04 \pm 0.13$ \msun. Our result for HAT-TR-205-013 B lies above the theoretical mass-radius models from the Lyon group, consistent with results from double-lined eclipsing binaries. The method we describe offers the opportunity to study the very low end of the stellar mass-radius relation. <|endoftext|><|startoftext|> Coulomb excitation of unstable nuclei at intermediate energies C.A. Bertulani1∗, G. Cardella2, M. De Napoli2,3, G. Raciti2,3, and E. Rapisarda2,3 1 Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee 37996, USA 2 Istituto Nazionale di Fisica Nucleare, Sezione di Catania, via Santa Sofia 64, I-95123, Catania, Italy 3 Dipartimento di Fisica e Astronomia, Universitá Catania, via Santa Sofia 64, I-95123, Catania, Italy Abstract We investigate the Coulomb excitation of low-lying states of unstable nuclei in intermediate energy collisions (Elab ∼ 10−500 MeV/nucleon). It is shown that the cross sections for the E1 and E2 transitions are larger at lower energies, much less than 10 MeV/nucleon. Retardation effects and Coulomb distortion are found to be both relevant for energies as low as 10 MeV/nucleon and as high as 500 MeV/nucleon. Implications for studies at radioactive beam facilities are discussed. 
PACS numbers: 25.60.-t, 25.70.-z, 25.70.De
Keywords: Coulomb excitation, cross sections, unstable nuclei.
∗ bertulanica@ornl.gov. http://arxiv.org/abs/0704.0060v2

Unstable nuclei are often studied with reactions induced by secondary radioactive beams. Examples of these reactions are elastic scattering, fragmentation and Coulomb excitation by heavy targets. Coulomb excitation is especially useful since the interaction mechanism is very well known [1]. It is the result of electromagnetic interactions of a projectile (ZP ,AP ) with a target (ZT ,AT ). One of the participating nuclei is excited as it passes through the electromagnetic field of the other. Here we will only consider the excitation of the projectile, as this is of interest in studies carried out in heavy ion facilities around the world, e.g. LNS/Catania, NSCL/MSU, GSI, GANIL, RIKEN, etc. In Coulomb excitation a virtual photon with energy E is absorbed by the projectile. Because in pure Coulomb excitation the participating nuclei stay outside the range of the nuclear strong force, the excitation cross section can be expressed in terms of the same multipole matrix elements that characterize excited-state gamma-ray decay, or the reduced transition probabilities, B(πλ; Ji → Jf). Hence, Coulomb excitation amplitudes are strongly coupled with valuable nuclear structure information. Therefore, this mechanism has been used for many years to study the electromagnetic properties of low-lying nuclear states [1].

Coulomb excitation cross sections are large if the adiabaticity parameter satisfies the condition

$$\xi = \frac{\omega_{fi}\, a_0}{v} < 1 , \qquad (1)$$

where $a_0$ is half the distance of closest approach in a head-on collision for a projectile velocity $v$, and $E_x = \hbar\omega_{fi}$ is the excitation energy. This adiabatic cut-off limits the possible excitation energies to below 1-2 MeV in sub-barrier collisions. A possible way to overcome this limitation, and to excite high-lying states, is to use higher projectile energies.
In this case, the closest approach distance, at which the nuclei still interact only electromagnetically, is of the order of the sum of the nuclear radii, R = RP + RT , where P refers to the projectile and T to the target. For very high energies one has also to take into account the Lorentz contraction of the interaction time by means of the Lorentz factor $\gamma = (1 - v^2/c^2)^{-1/2}$, with c being the speed of light. For such collisions the adiabaticity condition, Eq. (1), becomes

$$\xi(R) = \frac{\omega_{fi}\, R}{\gamma v} < 1 . \qquad (2)$$

From this relation one obtains that for bombarding energies around and above 100 MeV/nucleon, states with energy up to 10-20 MeV can be readily excited [3].

An appropriate description of Coulomb excitation at intermediate energies (Elab = 10 − 500 MeV/nucleon) has been given in ref. [2]. In this energy region neither the non-relativistic Coulomb excitation formalism described in ref. [1], nor the relativistic one formulated in refs. [3, 4] is appropriate. This is discussed in detail in ref. [2], where it is shown that the correct values of the Coulomb excitation cross sections differ by up to 30-40% when compared to the non-relativistic and relativistic treatments used to calculate experimental observables (cross sections, gamma-ray angular distributions, etc.). We follow the formalism of ref. [2] to calculate cross sections for Coulomb excitation at energies ranging from 10 to 500 MeV/nucleon. These are the energies where most radioactive beam facilities are or will be operating around the world. The calculated cross sections will be a useful guide for future experiments. We also compare the accurate calculations with those obtained by using simple analytical formulas and test the regime of their validity.
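To make the adiabaticity condition of Eq. (2) concrete, here is a minimal Python sketch that evaluates ξ(R) = E_x R/(ħc γ β) for a given excitation energy and beam energy. The radius parametrization R = 1.2 (A_P^{1/3} + A_T^{1/3}) fm and the helper names are assumptions made for illustration only; the text does not specify how R is chosen.

```python
import math

HBARC = 197.327   # hbar * c in MeV fm
AMU = 931.494     # atomic mass unit in MeV


def lorentz_factors(e_lab):
    """Return (gamma, beta) for a beam energy e_lab in MeV/nucleon."""
    gamma = 1.0 + e_lab / AMU
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    return gamma, beta


def touching_radius(a_p, a_t, r0=1.2):
    """Sum of nuclear radii in fm, assuming R_i = r0 * A_i^(1/3) (an assumed parametrization)."""
    return r0 * (a_p ** (1.0 / 3.0) + a_t ** (1.0 / 3.0))


def adiabaticity(e_x, e_lab, a_p, a_t):
    """Adiabaticity parameter xi(R) of Eq. (2): E_x in MeV, e_lab in MeV/nucleon."""
    gamma, beta = lorentz_factors(e_lab)
    return e_x * touching_radius(a_p, a_t) / (HBARC * gamma * beta)


# Example: a 10 MeV state in a mass-16 projectile on a Pb target.
for e_lab in (10.0, 100.0, 500.0):
    print(e_lab, round(adiabaticity(10.0, e_lab, 16, 208), 2))
```

Under these assumptions ξ for a 10 MeV state falls from well above 1 at 10 MeV/nucleon to roughly 1 near 100 MeV/nucleon, which is the sense in which higher beam energies open up higher-lying states.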
The cross sections for the transition $J_i \to J_f$ in the projectile are calculated using the equation [2]

$$\frac{d\sigma_{i\to f}}{d\Omega} = \frac{4\pi^2 Z_T^2 e^2}{\hbar^2}\, a^2 \epsilon^4 \sum_{\pi\lambda\mu} \frac{B(\pi\lambda, J_i \to J_f)}{(2\lambda+1)^3}\, |S(\pi\lambda,\mu)|^2 , \qquad (3)$$

where π = E or M stands for the electric or magnetic multipolarity, and

$$B(\pi\lambda, J_i \to J_f) = \frac{1}{2J_i+1}\, |\langle J_f \| M(\pi\lambda) \| J_i \rangle|^2 \qquad (4)$$

are the reduced transition probabilities. In these equations, $\epsilon = 1/\sin(\Theta/2)$, with Θ being the deflection angle, $a_0 = Z_P Z_T e^2/m_0 v^2$ and $a = a_0/\gamma$. The complex functions S(πλ, µ) are integrals along Coulomb trajectories corrected for retardation. Their calculation and how they relate to the non-relativistic and relativistic theories are described in detail in ref. [2]. Here we will introduce another comparison tool for the total cross section, which is obtained by integration of eq. 3 over scattering angles. The code COULINT [2] was used to calculate the orbital integrals S(πλ, µ) and the cross sections of eq. 3 (for more details, see ref. [2]).

Using the theory described in ref. [4], it is easy to show that approximate values of the cross sections for E1, E2, and M1 transitions can be obtained by means of the relations

$$\sigma^{(app)}_{E1} = \frac{32\pi^2}{9}\,\frac{Z_T^2\alpha}{\hbar c}\left(\frac{c}{v}\right)^2 B(E1)\left[\xi K_0 K_1 - \frac{v^2\xi^2}{2c^2}\left(K_1^2 - K_0^2\right)\right],$$

$$\sigma^{(app)}_{E2} = \frac{8\pi^2}{75}\,\frac{Z_T^2\alpha}{(\hbar c)^3}\left(\frac{c}{v}\right)^4 E_x^2\, B(E2)\left[2\left(1-\frac{v^2}{c^2}\right)K_1^2 + \xi\left(2-\frac{v^2}{c^2}\right)^2 K_0 K_1 - \frac{\xi^2 v^4}{2c^4}\left(K_1^2 - K_0^2\right)\right],$$

$$\sigma^{(app)}_{M1} = \frac{32\pi^2}{9}\,\frac{Z_T^2\alpha}{\hbar c}\, B(M1)\left[\xi K_0 K_1 - \frac{\xi^2}{2}\left(K_1^2 - K_0^2\right)\right], \qquad (5)$$

where $K_n$ are the modified Bessel functions of the second kind of order n, evaluated at ξ given by eq. 2, with R corrected for recoil by the modification R → R + πa/2 [3].

Here we will only consider the excitation of the lowest lying states in light and medium heavy nuclei. For nuclear masses A < 20, the TUNL nuclear data evaluation web site was of great help [5]. The electromagnetic transition rates at the TUNL database are given in Weisskopf units and are transformed to the appropriate B(πλ, Ji → Jf)-values by means of the standard Weisskopf relations $B_W(E1; J_i \to J_{gs}) = 0.06446\,A^{2/3}\ e^2{\rm fm}^2$, $B_W(E2; J_i \to J_{gs}) = 0.05940\,A^{4/3}\ e^2{\rm fm}^4$, and $B_W(M1; J_i \to J_{gs}) = 1.79\,(e\hbar/2m_N c)^2$. For comparison, a few medium mass nuclei, as well as a few stable nuclei, were included in the calculation.
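The Weisskopf relations quoted above are simple enough to sketch in a few lines of Python. The function names below are hypothetical, and the conversion of the nuclear magneton squared into e² fm² (using the proton rest mass) is an added convenience that is not spelled out in the text.

```python
HBARC = 197.327          # hbar * c in MeV fm
PROTON_MASS = 938.272    # proton rest energy in MeV (defines the nuclear magneton)

# (e hbar / 2 m_p c)^2 expressed in units of e^2 fm^2.
MU_N_SQUARED = (HBARC / (2.0 * PROTON_MASS)) ** 2


def weisskopf_unit(multipole, mass_number):
    """One Weisskopf unit for E1 (e^2 fm^2), E2 (e^2 fm^4) or M1 (mu_N^2)."""
    if multipole == "E1":
        return 0.06446 * mass_number ** (2.0 / 3.0)
    if multipole == "E2":
        return 0.05940 * mass_number ** (4.0 / 3.0)
    if multipole == "M1":
        return 1.79
    raise ValueError(f"unsupported multipole: {multipole}")


def from_weisskopf(strength_wu, multipole, mass_number, m1_in_e2fm2=False):
    """Convert a strength in W.u. to an absolute B value.

    M1 strengths are returned in mu_N^2 unless m1_in_e2fm2 is True.
    """
    value = strength_wu * weisskopf_unit(multipole, mass_number)
    if multipole == "M1" and m1_in_e2fm2:
        value *= MU_N_SQUARED
    return value


# Example: one Weisskopf unit of each type for a mass-16 nucleus.
print(from_weisskopf(1.0, "E1", 16))
print(from_weisskopf(1.0, "E2", 16))
print(from_weisskopf(1.0, "M1", 16, m1_in_e2fm2=True))
```

Keeping the M1 unit conversion optional avoids silently mixing μ_N² and e² fm² when values are compared with tables that use either convention.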
Other data were taken from refs. [6, 7, 8, 9]. Some cases of nuclei far from the stability line are very interesting and deserve further study, possibly using the method of Coulomb excitation. For example, it is well known that nuclei with open shells tend to have B(E2) values greater than 10 W.u., whereas nuclei with shell closure of neutrons or protons tend to have distinctly smaller B(E2) values. Typical examples of the latter category are the doubly magic nuclei, 16O and 48Ca, whose B(E2) values are 3.17 and 1.58 W.u., respectively. According to an empirical formula adjusted to a global fit of the known transition rates, the energy of the first excited 2+ level, $E_{2^+}$, and B(E2; 0+ → 2+) are related by [10] ($E_{2^+}$ in MeV)

$$B(E2; 0^+ \to 2^+) = 26\,\frac{Z^2}{A^{2/3}\, E_{2^+}}\ e^2{\rm fm}^4 . \qquad (6)$$

The value of B(E2) for 16C based on this formula is at least one order of magnitude larger than what is observed experimentally in a Coulomb dissociation experiment [9]. The anomalously strong hindrance of the 16C transition is not well explained theoretically. This is just one example of the power of Coulomb excitation as a tool to access the new physics inherent in poorly known rare nuclear species.

Another example is the strong E1 transition in 11Be. 11Be is an archetype of a halo nucleus and exhibits the fastest known dipole transition between bound states in nuclei. The B(E1) transition strength between the ground and the only bound excited state (at 0.32 MeV) was determined from lifetime measurements by Millener et al. to be 0.116 e2fm2 [11]. However, Coulomb excitation experiments have obtained a much smaller value of the B(E1), which is still a matter of investigation [12, 13, 14]. It thus seems clear that predictions based on traditional nuclear structure and reaction theory often yield results in disagreement with experimental data. In spite of that, when proper corrections are accounted for (e.g.
channel-coupling, nuclear excitation, relativistic corrections), Coulomb excitation of radioactive beams is a powerful complementary tool to investigate electromagnetic properties of nuclei far from the stability line.

In Table 1 we compare our calculations with several experimentally obtained cross sections for Coulomb excitation of unstable nuclei. The units of energy are MeV, the laboratory energy is in MeV/nucleon, the B-values are in units of e2 fm2λ, and the cross sections are in millibarns. The last two columns give the calculated cross sections obtained by using eqs. 3 and 5, respectively. Since the cross sections of eq. 5 are functions of the minimum impact parameter, the values reported in the Table have been calculated according to the experimental angular ranges reported in the seventh column. Except for the 11Be case, for which the discrepancy between theory and experiment is known (see discussion above), the calculated cross sections are close to the experimental values. Nonetheless, the calculated cross sections tend to be smaller than the experimental ones for 17Ne, 32Mg, 38S, 40S, 42S, 44Ar, and 46Ar projectiles. This is worrisome because the B(πλ) values were extracted from the experimentally obtained cross sections, using equations similar to eq. 5. These experimental B-values would have to be larger by 10-30% according to our calculations.

It is important to stress the fact that many experimental data on unstable nuclei collected up to now have been analyzed by means of theoretical tools (DWBA and coupled-channels codes) which do not include relativistic dynamics (the inclusion of relativistic kinematics is straightforward). This problem was first addressed in ref. [15], where it was shown that the analysis of experimental data at intermediate energies without a proper treatment of relativistic dynamics leads to wrong values of electromagnetic transition probabilities.
We should stress that a full theoretical treatment of relativistic dynamics of strong and electromagnetic interactions in many-body systems is very difficult and still does not exist [15].

Data         Projectile  Target  Elab   πλ   B(πλ)   θrange    Ex (Jπ)             σexp          σth    σapp
1 [16]       11Be        Pb      43.    E1   0.115   < 5°      0.32 (1 )           191 ± 26      328.   323.
2 [16]       11Be        Pb      59.4   E1   0.094   < 3.8°    0.32 (1 )           304 ± 43      213.   211.
3 [18]       11Be        Au      57.6   E1   0.079   < 3.8°    0.32 (1 )           244 ± 31      170.   168.
4 [17]       11Be        Pb      64.    E1   0.099   < 3.8°    0.32 (1 )           302 ± 31      217.   215.
5 [19, 20]   17Ne        Au      60.    M1   0.163   < 4.5°    1.29 (1 )           12 ± 4        12.6   13.0
6 [6]        32Mg        Pb      49.2   E2   454     < 4°      0.885 (0+ → 2+)     91.7 ± 14.4   137.   128.
7 [19]       38S         Au      39.2   E2   235     < 4.1°    1.29 (0+ → 2+)      59 ± 7        48.    45.0
8 [19]       40S         Au      39.5   E2   334     < 4.1°    0.91 (0+ → 2+)      94 ± 9        75.5   70.4
9 [19]       42S         Au      40.6   E2   397     < 4.1°    0.89 (0+ → 2+)      128 ± 19      101.   94.3
10 [19]      44Ar        Au      33.5   E2   345     < 4.1°    1.14 (0+ → 2+)      81 ± 9        62.3   58.3
11 [19]      46Ar        Au      35.2   E2   196     < 4.1°    1.55 (0+ → 2+)      53 ± 10       40.9   38.2
12 [8]       46Ar        Au      76.4   E2   212     < 2.9°    1.55 (0+ → 2+)      68 ± 8        50.0   47.4

Table 1. Cross sections for Coulomb excitation of unstable nuclei. The units of energy are MeV, the laboratory energy is in MeV/nucleon, the B(πλ)-values are in units of e2fm2λ, and the cross sections are in millibarns. The data for different experiments (numbered 1 to 12) were collected from the references listed in column 1. The last two columns give the calculated cross sections obtained by using eqs. 3 and 5, respectively.

In figure 1 we show a comparison between the experimental data and our calculations. We notice that the cross sections calculated with the help of eq. 5 are not much different from those calculated with eq. 3. They are systematically lower, by up to 10%, than the exact calculation following eq. 3. As we discuss below, this is not always the case, especially for the excitation of high-lying states. In fact, this is a good check of eq. 3, which is done in a very different way than the analytical calculations of eq. 5.
But as we will see below, this agreement does not always hold, especially when one includes small impact parameters, for which the sensitivity to the relativistic corrections is higher (see ref. [2]). The dashed curve in figure 1 is a guide to the eye. It helps to see that the experimental cross sections are on average larger than the calculated ones, either with eq. 3 (open circles) or with eq. 5 (open triangles).

FIG. 1: Comparison between experimental Coulomb excitation cross sections (solid stars with error bars) and theoretical ones, calculated either with eq. 3 (open circles), or with eq. 5 (open triangles). The horizontal axis is the data set number (1 to 12).

| Projectile | Ex [MeV] | Ji → Jf | πλ | B(πλ) [$e^2\,{\rm fm}^{2\lambda}$] | 10 | 20 | 30 | 50 | 100 | 200 | 500 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11Be | 0.32 | 1 | E1 | 0.115 | 1128 | 653 | 473 | 315 | 187 | 115 | 69.6 |
| 11B | 2.21 | 3 | M1 | $2.40\times10^{-2}$ | 0.301 | 0.799 | 1.15 | 1.63 | 2.33 | 3.08 | 4.17 |
| 11C | 2.00 | 3 | M1 | $1.52\times10^{-2}$ | 0.196 | 0.551 | 0.793 | 1.12 | 1.57 | 2.07 | 2.76 |
| 12B | 0.953 | 1+ → 2+ | M1 | $4.62\times10^{-3}$ | 0.227 | 0.395 | 0.490 | 0.607 | 0.762 | 0.917 | 1.13 |
| 12C | 4.44 | 0+ → 2+ | E2 | 37.9 | 34.6 | 38.6 | 31.3 | 21.6 | 12.1 | 6.93 | 3.81 |
| 13C | 3.09 | 1 | E1 | $1.39\times10^{-2}$ | 8.37 | 11.3 | 11.0 | 9.61 | 7.28 | 5.39 | 3.89 |
| 13N | 2.37 | 1 | E1 | $3.56\times10^{-2}$ | 38.2 | 43.6 | 39.6 | 32.5 | 23.2 | 16.4 | 11.4 |
| 15C | 0.74 | 1 | E2 | 2.90 | 8.79 | 4.04 | 2.65 | 1.59 | 0.839 | 0.475 | 0.267 |
| 16C | 1.77 | 0+ → 2+ | E2 | 2.12 | 8.81 | 4.41 | 2.92 | 1.76 | 0.920 | 0.517 | 0.285 |
| 16N | 0.12 | 0− → 2− | E2 | 10.2 | 31.0 | 14.1 | 9.21 | 5.53 | 2.91 | 1.64 | 0.926 |
| 17N | 1.37 | 1 | M1 | $5.15\times10^{-3}$ | 0.153 | 0.304 | 0.397 | 0.516 | 0.680 | 0.848 | 1.09 |
| 17O | 0.87 | 5 | E2 | 2.07 | 6.30 | 2.88 | 1.87 | 1.12 | 0.588 | 0.332 | 0.184 |
| 17F | 0.5 | 5 | E2 | 21.6 | 68.3 | 29.7 | 19.3 | 11.6 | 6.08 | 3.44 | 1.92 |
| 18O | 1.98 | 0+ → 2+ | E2 | 44.8 | 109 | 60.7 | 40.9 | 24.8 | 11.6 | 7.27 | 3.99 |
| 18F | 0.94 | 1+ → 3+ | E2 | 37.9 | 115 | 52.5 | 34.1 | 20.4 | 10.7 | 6.01 | 3.34 |
| 18Ne | 1.89 | 0+ → 2+ | E2 | 248 | 615 | 342 | 229 | 138 | 72.0 | 40.1 | 22.1 |
| 19O | 0.1 | 5 | M1 | $2.34\times10^{-4}$ | 0.0495 | 0.0615 | 0.0673 | 0.0737 | 0.0799 | 0.779 | 0.799 |
| 19F | 0.11 | 1 | E1 | $5.51\times10^{-4}$ | 8.07 | 4.36 | 3.06 | 1.97 | 1.10 | 0.592 | 0.337 |
| 19Ne | 0.24 | 1 | E2 | 119 | 361 | 157 | 102 | 61.6 | 32.5 | 18.5 | 10.5 |
| 20O | 1.67 | 0+ → 2+ | E2 | 28.0 | 72 | 37.4 | 24.9 | 15.1 | 7.86 | 4.41 | 2.43 |
| 20F | 0.656 | 2+ → 3+ | M1 | $3.56\times10^{-3}$ | 0.237 | 0.385 | 0.465 | 0.560 | 0.683 | 0.803 | 0.959 |
| 20Ne | 1.63 | 0+ → 2+ | E2 | 319 | 834 | 433 | 287 | 173 | 89.8 | 50.3 | 27.6 |
| 30Ne | 0.791 | 0+ → 2+ | E2 | 460 | 1167 | 550 | 361 | 218 | 115 | 65.0 | 35.2 |
| 32Mg | 0.885 | 0+ → 2+ | E2 | 454 | 1151 | 541 | 355 | 214 | 112 | 63.0 | 36.7 |
| 42S | 0.89 | 0+ → 2+ | E2 | 397 | 945 | 445 | 292 | 175 | 91.9 | 52 | 29.7 |
| 46Ar | 1.55 | 0+ → 2+ | E2 | 190 | 399 | 209 | 140 | 84.4 | 44.1 | 24.7 | 13.6 |
| 54Ni | 1.40 | 0+ → 2+ | E2 | 626 | 1319 | 677 | 447 | 268 | 139 | 78.1 | 43.1 |

Table 2 - Cross sections (in mb) for Coulomb excitation of projectiles incident on Pb targets at bombarding energies ranging from 10 to 500 MeV/nucleon (the numerical column headings). The energy units are MeV, the laboratory energy is in MeV/nucleon, the B(πλ)-values are in units of $e^2\,{\rm fm}^{2\lambda}$.

The cross sections for Coulomb excitation of numerous projectiles incident on Pb targets at bombarding energies ranging from 10 to 500 MeV/nucleon are shown in Table 2. These cross sections were calculated assuming that the detectors collect all Coulomb scattering events. In a real experimental situation, the angular distribution is restricted to angular windows, reducing the available cross sections. Only the lowest-lying transitions have been considered, i.e. from the ground to the first excited states. One observes that some cross sections are very large, especially for 11Be, 18Ne, 30Ne and 54Ni. For these and other similar cases, the measurements are easy to perform, with a large number of events per second even with modest intensities. Cases such as 16C are well within the experimental possibilities of most radioactive beam facilities. Table 2 also shows that, except for M1 excitations, the Coulomb excitation cross sections decrease steadily as the energy increases from 10 to 500 MeV/nucleon. Based on these numbers alone, one could conclude that Coulomb excitation of low-lying states (in contrast to the case of high-lying states, e.g. giant resonances [4]) is better suited for studies at low energies. However, reactions at lower energies, while less influenced by contamination due to nuclear breakup [12, 14], can give rise to large high-order effects [21].
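The energy dependence described above can be read off Table 2 directly. The short Python sketch below, which copies a few rows of the table by hand, classifies the trend of each cross section with bombarding energy. The selection of rows and the helper name `trend` are illustrative assumptions; no physics beyond the tabulated numbers is involved.

```python
# Bombarding energies of Table 2, in MeV/nucleon.
ENERGIES = [10, 20, 30, 50, 100, 200, 500]

# A few rows copied from Table 2: (projectile, multipolarity) -> cross sections in mb.
TABLE2_ROWS = {
    ("11Be", "E1"): [1128, 653, 473, 315, 187, 115, 69.6],
    ("16C", "E2"): [8.81, 4.41, 2.92, 1.76, 0.920, 0.517, 0.285],
    ("54Ni", "E2"): [1319, 677, 447, 268, 139, 78.1, 43.1],
    ("11B", "M1"): [0.301, 0.799, 1.15, 1.63, 2.33, 3.08, 4.17],
    ("20F", "M1"): [0.237, 0.385, 0.465, 0.560, 0.683, 0.803, 0.959],
}


def trend(values):
    """Classify a sequence as 'decreasing', 'increasing' or 'non-monotonic'."""
    pairs = list(zip(values, values[1:]))
    if all(later < earlier for earlier, later in pairs):
        return "decreasing"
    if all(later > earlier for earlier, later in pairs):
        return "increasing"
    return "non-monotonic"


if __name__ == "__main__":
    for (projectile, multipolarity), sigma in TABLE2_ROWS.items():
        drop = sigma[0] / sigma[-1]  # ratio of the 10 and 500 MeV/nucleon values
        print(f"{projectile:5s} {multipolarity}: {trend(sigma)}, "
              f"sigma(10)/sigma(500) = {drop:.2f}")
```

For the rows chosen here the E1 and E2 entries fall with energy while the M1 entries rise; rows with higher excitation energies (for example 12C) need not be monotonic and are left out of this sketch.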
The interpretation of data could be distorted, as in the case of Coulomb dissociation of 8B at low energy [24], which was completely misinterpreted in terms of first-order calculations. In some situations, when higher-order effects are relevant, the effect of the nuclear breakup cannot be neglected either [22, 23]. Thus, the choice of the incident energy would depend on the experimental conditions. Identification of gamma-rays from de-excitation using Doppler shift techniques is often more advantageous at higher energies. Moreover, except for a few cases (e.g. 11C), the cross sections for magnetic dipole transitions are much smaller than those for E1 and E2 transitions. Even for M1 transitions the measurements are within the capabilities of most new experimental facilities. The comparison of the exact calculations, using eq. 3 (solid lines), and the approximations of eq. 5 (dashed lines) is shown in figs. 2(a-d), for 11Be, 11B, 54Ni and 16O, respectively. The 16O case (as well as 12C in Table 2) was included for comparison, as an example of a high-lying excited state. We see from figs. 2a and 2b that the approximations in eq. 5 work quite well for the M1 multipolarity and reasonably well (within 20% at 10 MeV/nucleon and 5% at 50 MeV/nucleon) for the E1 cases. But they fail badly at low and intermediate energies for the E2 case (fig. 2c). The reason is that the E2 Coulomb field (“tidal field”) is very sensitive to the details of the collision dynamics at low energies.

FIG. 2: Coulomb excitation cross section of the first excited state in 11Be, 11B and 54Ni and of the 13.05 MeV state in 16O projectiles incident on Pb targets as a function of the laboratory energy (panels a to d).

These conclusions can be deceiving, since even for the E1 and M1 cases the approximations in eq. 5 may strongly differ from the exact calculations if the excitation energy is large (see discussion in ref. [2]). This is shown in figure 2d, where we plot the Coulomb excitation cross section of the Ex = 13.09 MeV state in 16O.
In this case, the cross section based on eq. 5 is a factor of 10 smaller than the exact calculation at 10 MeV/nucleon. At 100 MeV/nucleon this difference drops to 10%, which still needs to be considered with care. In summary, in this article we have used the formalism of ref. [2] to predict the cross sections for Coulomb excitation of several light projectiles with electromagnetic transitions found in the literature, listed in the TUNL database [5], and for a few other selected cases. These estimates will be useful for planning Coulomb excitation experiments at present and future heavy ion facilities. It is evident that the inclusion of relativistic effects combined with Coulomb distortion is of the utmost relevance. The cross section inferred by using non-relativistic or purely relativistic treatments can be wrong by up to 30% even at 100 MeV/nucleon, as shown here and in ref. [2]. Finally, the use of Coulomb excitation to produce nuclei in high-lying states is an important tool to study particle emission processes. For example, the excitation of 18Ne and its subsequent decay by two-proton emission is a process of large theoretical and experimental interest. Experimental work in this direction is in progress [25].

Acknowledgments

This research was supported by the U.S. Department of Energy under contract No. DE-AC05-00OR22725 (Oak Ridge National Laboratory) with UT-Battelle, LLC., and by DE-FC02-07ER41457 with the University of Washington (UNEDF, SciDAC-2).

References

[1] K. Alder and A. Winther, Electromagnetic Excitation, North-Holland, Amsterdam, 1975.
[2] C.A. Bertulani, A.E. Stuchbery, T.J. Mertzimekis and A.D. Davies, Phys. Rev. C 68 (2003) 044609.
[3] A. Winther and K. Alder, Nucl. Phys. A 319 (1979) 518.
[4] C.A. Bertulani and G. Baur, Nucl. Phys. A 442 (1985) 739.
[5] TUNL Nuclear Data Project: http://www.tunl.duke.edu/nucldata/index.shtml
[6] T. Motobayashi et al., Phys. Lett. B 346 (1995) 9.
[7] H. Scheit et al., Phys. Rev. Lett. 77 (1996) 3967.
[8] A. Gade et al., Phys. Rev. C 68 (2003) 014302.
[9] N. Imai et al., Phys. Rev. Lett. 92 (2004) 062501.
[10] S. Raman, C.W. Nestor, Jr., and K.H. Bhatt, Phys. Rev. C 37 (1988) 805.
[11] D.J. Millener, J.W. Olness, E.K. Warburton, and S.S. Hanna, Phys. Rev. C 28 (1983) 497.
[12] C.A. Bertulani, L.F. Canto, and M.S. Hussein, Phys. Lett. B 353 (1995) 413.
[13] M.S. Hussein, R. Lichtenthäler, F.M. Nunes, and I.J. Thompson, Phys. Lett. B 640 (2006)
[14] R. Chatterjee, [Los Alamos archive: nucl-th/0703083], 2007.
[15] C.A. Bertulani, Phys. Rev. Lett. 94 (2005) 072701.
[16] R. Anne et al., Z. Phys. A 352 (1995) 397.
[17] T. Nakamura et al., Phys. Lett. B 394 (1997) 11.
[18] M. Fauerbach et al., Phys. Rev. C 56 (1997) R1.
[19] M.J. Chromik et al., Phys. Rev. C 55 (1997) 1676.
[20] M.J. Chromik et al., Phys. Rev. C 66 (2002) 024313.
[21] C.A. Bertulani and L.F. Canto, Nucl. Phys. A 539 (1992) 163; G.F. Bertsch and C.A. Bertulani, Nucl. Phys. A 556 (1993) 136.
[22] C.A. Bertulani and M. Gai, Nucl. Phys. A 636 (1998) 227.
[23] C.H. Dasso, S.M. Lenzi, A. Vitturi, Nucl. Phys. A 639 (1998) 635.
[24] J. von Schwarzenberg, J.J. Kolata, D. Peterson, P. Santi, and M. Belbot, Phys. Rev. C 53 (1996) R2598.
[25] E. Rapisarda, G. Cardella, F. Amorini, L. Calabretta, M. De Napoli, P. Figuera, G. Raciti, F. Rizzo, D. Santonocito and C. Sfienti, 7th Int. Conf. on Radioactive Nuclear Beams, Cortina d’Ampezzo, Italy, July 3-7, 2006.

ABSTRACT

We investigate the Coulomb excitation of low-lying states of unstable nuclei in intermediate energy collisions ($E_{lab}\sim10-500$ MeV/nucleon). It is shown that the cross sections for the $E1$ and $E2$ transitions are larger at lower energies, much less than 10 MeV/nucleon. Retardation effects and Coulomb distortion are found to be both relevant for energies as low as 10 MeV/nucleon and as high as 500 MeV/nucleon.
Implications for studies at radioactive beam facilities are discussed. <|endoftext|><|startoftext|> 1. Introduction. 2. Preliminaries. 3. Analytic families of the generalized cosine transforms. 4. Positive definite homogeneous distributions. 5. λ-intersection bodies. 6. Examples of λ-intersection bodies. 7. (q, ℓ)-balls. 8. The generalized cosine transforms and comparison of volumes. 9. Appendix.

2000 Mathematics Subject Classification. Primary 44A12; Secondary 52A38.
Key words and phrases. Spherical Radon transforms, cosine transforms, intersection bodies.
The research was supported in part by the NSF grant DMS-0556157 and the Louisiana EPSCoR program, sponsored by NSF and the Board of Regents Support Fund.
http://arxiv.org/abs/0704.0061v2
BORIS RUBIN

1. Introduction

This is an updated and extended version of our previous preprint [R5]. Intersection bodies interact with Radon transforms and encompass diverse classes of geometric objects associated to sections of star bodies. The concept of intersection body was introduced in the remarkable paper by Lutwak [Lu] and led to a breakthrough in the solution of the long-standing Busemann-Petty problem; see [G], [K4], [Lu], [Z2] for references and historical notes.

We recall some known facts that will be needed in the following. An origin-symmetric (o.s.) star body in $\mathbb{R}^n$, $n \ge 2$, is a compact set $K$ with non-empty interior such that $tK \subset K$ for all $t \in [0, 1]$, $K = -K$, and the radial function $\rho_K(\theta) = \sup\{\lambda \ge 0 : \lambda\theta \in K\}$ is continuous on the unit sphere $S^{n-1}$. In the following, $\mathcal{K}^n$ denotes the set of all o.s. star bodies in $\mathbb{R}^n$, $G_{n,i}$ is the Grassmann manifold of $i$-dimensional linear subspaces of $\mathbb{R}^n$, and $\mathrm{vol}_i(\cdot)$ denotes the $i$-dimensional volume function. The Minkowski functional of a body $K \in \mathcal{K}^n$ is defined by $\|x\|_K = \min\{a \ge 0 : x \in aK\}$, so that $\|\theta\|_K = \rho_K^{-1}(\theta)$, $\theta \in S^{n-1}$.

Definition 1.1.
[Lu] A body $K \in \mathcal{K}^n$ is an intersection body of a body $L \in \mathcal{K}^n$ if $\rho_K(\theta) = \mathrm{vol}_{n-1}(L \cap \theta^\perp)$ for every $\theta \in S^{n-1}$, where $\theta^\perp$ is the central hyperplane orthogonal to $\theta$.

By taking into account that $\mathrm{vol}_{n-1}(L \cap \theta^\perp)$ in Definition 1.1 is a constant multiple of the Minkowski-Funk transform
$$(Mf)(\theta) = \int_{S^{n-1}\cap\theta^\perp} f(u)\, d_\theta u, \qquad f(u) = \rho_L^{n-1}(u),$$
Goodey, Lutwak and Weil [GLW] generalized Definition 1.1 as follows.

Definition 1.2. A body $K \in \mathcal{K}^n$ is an intersection body if $\rho_K = M\mu$ for some even non-negative finite Borel measure $\mu$ on $S^{n-1}$.

A sequence of bodies $K_j \in \mathcal{K}^n$ is said to be convergent to $K \in \mathcal{K}^n$ in the radial metric if $\lim_{j\to\infty} \|\rho_{K_j} - \rho_K\|_{C(S^{n-1})} = 0$.

Proposition 1.3. The class of intersection bodies is the closure of the class of intersection bodies of star bodies in the radial metric.

Proposition 1.4. If $K$ is an intersection body in $\mathbb{R}^n$, $n > 2$, then for every $i = 2, 3, \ldots, n-1$ and every $\eta \in G_{n,i}$, $K \cap \eta$ is an intersection body in $\eta$.

Regarding these two important propositions see [FGW], [GW] and a nice historical survey in [G].

INTERSECTION BODIES

Different generalizations of the concept of intersection body associated to lower dimensional sections were suggested in the literature; see, e.g., [K4], [RZ], [Z1]. The following one, which plays an important role in the study of the lower dimensional Busemann-Petty problem, is due to Zhang [Z1].

Definition 1.5. We say that a body $K \in \mathcal{K}^n$ belongs to Zhang’s class $\mathcal{Z}^n_i$ if there is a non-negative finite Borel measure $m$ on the Grassmann manifold $G_{n,i}$ such that $\rho_K^{n-i} = R_i^* m$, where $R_i^*$ is the dual spherical Radon transform; see (2.2), (2.5).

Another generalization was suggested by Koldobsky [K2] and described in detail in [K4]. This class of bodies will be our main concern.

Definition 1.6. [K4, p. 71] A body $K \in \mathcal{K}^n$ is a $k$-intersection body of a body $L \in \mathcal{K}^n$ (we write $K = IB_k(L)$) if
(1.1) $\mathrm{vol}_k(K \cap \xi) = \mathrm{vol}_{n-k}(L \cap \xi^\perp) \quad \forall \xi \in G_{n,k}.$
We denote by $\mathcal{IB}_{k,n}$ the set of all bodies $K \in \mathcal{K}^n$ satisfying (1.1) for some $L \in \mathcal{K}^n$.
When $k = 1$, this definition coincides with Definition 1.1 up to a constant multiple. An analog of Definition 1.2 was given in Fourier analytic terms as follows.

Definition 1.7. [K4, Definition 4.7] A body $K \in \mathcal{K}^n$ is a $k$-intersection body if there is a non-negative finite Borel measure $\mu$ on $S^{n-1}$, so that for every Schwartz function $\varphi$,
$$\int_{\mathbb{R}^n} \|x\|_K^{-k}\, \varphi(x)\, dx = \int_{S^{n-1}} \Big[ \int_0^\infty t^{k-1}\, \hat\varphi(t\theta)\, dt \Big]\, d\mu(\theta),$$
where $\hat\varphi$ denotes the Fourier transform of $\varphi$. The set of all $k$-intersection bodies in $\mathbb{R}^n$ will be denoted by $\mathcal{I}^n_k$.

Keeping in mind Proposition 1.3 for $k = 1$, one can alternatively define the class $\mathcal{I}^n_k$ as a closure of $\mathcal{IB}_{k,n}$ in the radial metric; cf. [Mi1, p. 532]. However, to apply results from [K4] to such a class, equivalence of this definition to Definition 1.7 must be proved. We will do this in a more general situation in Section 5.2.

From Definitions 1.6 and 1.7 it is not clear for which bodies $L \in \mathcal{K}^n$ the relevant $k$-intersection body $K = IB_k(L)$ exists. It is also not obvious which bodies actually constitute the class $\mathcal{I}^n_k$. The following important characterization is due to Koldobsky.

Theorem 1.8. [K4, Theorem 4.8] A body $K \in \mathcal{K}^n$ is a $k$-intersection body if and only if $\|\cdot\|_K^{-k}$ represents a positive definite tempered distribution on $\mathbb{R}^n$, that is, the Fourier transform $(\|\cdot\|_K^{-k})^\wedge$ is a positive tempered distribution on $\mathbb{R}^n$.

The concept of $k$-intersection body is related to another important development. For $K \in \mathcal{K}^n$, the quasi-normed space $(\mathbb{R}^n, \|\cdot\|_K)$ is said to be isometrically embedded in $L_p$, $p > 0$, if there is a linear operator $T : \mathbb{R}^n \to L_p([0, 1])$ so that $\|x\|_K = \|Tx\|_{L_p([0,1])}$.

Theorem 1.9. [K4, Theorem 6.10] The space $(\mathbb{R}^n, \|\cdot\|_K)$ embeds isometrically in $L_p$, $p > 0$, $p \neq 2, 4, \ldots$, if and only if $\Gamma(-p/2)(\|\cdot\|_K^{p})^\wedge$ is a positive distribution on $\mathbb{R}^n \setminus \{0\}$.

Following Theorems 1.9 and 1.8, one can formally say that $K \in \mathcal{I}^n_k$ if and only if $(\mathbb{R}^n, \|\cdot\|_K)$ embeds isometrically in $L_{-k}$. This observation, combined with Definition 1.7, was used by A.
Koldobsky to define the concept of “isometric embedding in $L_p$” for negative $p$.

Definition 1.10. [K4, Definition 6.14] Let $0 < p < n$, $K \in \mathcal{K}^n$. The space $(\mathbb{R}^n, \|\cdot\|_K)$ is said to be isometrically embedded in $L_{-p}$ if there is a non-negative finite Borel measure $\mu$ on $S^{n-1}$, so that for every Schwartz function $\varphi$,
$$\int_{\mathbb{R}^n} \|x\|_K^{-p}\, \varphi(x)\, dx = \int_{S^{n-1}} \Big[ \int_0^\infty t^{p-1}\, \hat\varphi(t\theta)\, dt \Big]\, d\mu(\theta),$$
where $\hat\varphi$ denotes the Fourier transform of $\varphi$.

Origin-symmetric bodies $K$ in this definition can be regarded as “unit balls of $n$-dimensional subspaces of $L_{-p}$”. Comparing Definitions 1.10 and 1.7, one might call these bodies “$p$-intersection bodies”. Since the meaning of the space $L_{-p}$ itself is not specified in Definition 1.10 and since our paper is mostly focused on geometric properties of bodies (rather than embeddings in $L_p$), in the following we prefer to adopt another name, “λ-intersection body”, where λ is a real number that will be specified in due course. We denote the set of all λ-intersection bodies in $\mathbb{R}^n$ by $\mathcal{I}^n_\lambda$.

Contents of the paper. We will focus on the intimate connection between intersection bodies, spherical Radon transforms, and generalized cosine transforms; see definitions in Section 2.2. This approach is motivated by the fact that the volume of a central cross section of a star body is expressed through the spherical Radon transform, and the latter is a member of the analytic family of the generalized cosine transforms. These transforms were introduced by Semyanistyi [Se] and arise (up to naming and normalization) in different contexts of analysis and geometry; see, e.g., [K4], [R1]-[RZ], [Sa2], [Sa3], [Str1], [Str2].

Sections 2-4 provide analytic background for geometric considerations in Sections 5-7. In Section 2 we establish our notation and define the generalized cosine transforms on the sphere and the relevant dual transforms on Grassmann manifolds.
In Section 3 we present basic properties of these transforms, establish new relations between spheri- cal Radon transforms and the generalized cosine transforms, and prove “restriction theorems”, which are akin to trace theorems in Sobolev spaces. Section 4 deals with positive definite homogeneous distribu- tions, that can be characterized in terms of the generalized cosine transforms. This section serves as a preparation for the forthcoming definition of the concept of λ-intersection body. We investigate which λ’s are appropriate and why. In Section 5 we switch to geometry and define the class Inλ of λ-intersection bodies. The case 0 < λ < n cor- responds to the “unit balls of L−p-spaces” in the spirit of Definition 1.10. The reader will find in this section new proofs of some known facts. We introduce the notion of λ-intersection body of a star body in Rn, which extends Definition 1.6 to all λ < n, λ 6= 0. The class of all such bodies will be denoted by IBnλ . We will prove that for all λ < n, λ 6= 0,−2,−4, . . . , the class Inλ is the closure of IBnλ in the radial metric. The case λ = 1 gives Proposition 1.3. It will be proved that all m-dimensional central sections of λ-intersection bod- ies are λ-intersection bodies in the corresponding m-planes provided λ < m, λ 6= 0. The natural question arises: How to construct λ-intersection bodies? In Section 6 we give a series of examples; some of them are known and some are new. They can be obtained by utilizing auxiliary statements from Section 3. In particular, the famous embedding of Zhang’s class Znn−k into Ink , which was first established in [K3] and studied in [Mi1], [Mi2], will be generalized to the case, when k is replaced by any λ ∈ (0, n). Section 7 is devoted to the so called (q, ℓ)-balls, defined by Bnq,ℓ = {x = (x′, x′′) : |x′|q + |x′′|q ≤ 1; x′ ∈ Rn−ℓ, x′′ ∈ Rℓ}, q > 0. We show that if 0 < q ≤ 2, then Bnq,ℓ ∈ Inλ for all λ ∈ (0, n). If q > 2 and n − 3 ≤ λ < n, we still have Bnq,ℓ ∈ Inλ . 
If q > 2 and 0 < λ < λ0 = max(n − ℓ, ℓ) − 2, then Bnq,ℓ 6∈ Inλ . The case, when q > 2, ℓ > 1, and λ0 ≤ λ < n− 3 represents an open problem. In Section 8 we remind the generalized Busemann-Petty problem (GBP) for i-dimensional central sections of o.s. convex bodies in Rn. This challenging problem is still open for i = 2 and i = 3 (n ≥ 5). It actually inspires the whole investigation. Using properties of the 6 BORIS RUBIN generalized cosine transforms, we give a short direct proof of the fact that an affirmative answer to GBP implies that every smooth o.s. con- vex body in Rn with positive curvature is an (n− i)-intersection body. This fact was discovered by A. Koldobsky. The original proof in [K3] is based on the embedding Inn−i ⊂ Zni and Zhang’s result [Z1, Theorem 6]. The latter heavily relies on the Hahn-Banach separation theorem. Our proof is more constructive and almost self-contained. We conclude the paper by Appendix, which is added for convenience of the reader. The list of references at the end of the paper is far from being com- plete. Further references can be found in cited books and papers. Acknowledgement. I am grateful to Professor Alexander Koldob- sky, who shared with me his knowledge of the subject. Special thanks go to Professors Erwin Lutwak, Deane Yang, and Gaoyong Zhang for useful discussions. 2. Preliminaries 2.1. Notation. In the following, N = {1, 2, . . . } is the set of all nat- ural numbers, Sn−1 is the unit sphere in Rn with the area σn−1 = 2πn/2/Γ(n/2); Ce(S n−1) is the space of even continuous functions on Sn−1; SO(n) is the special orthogonal group of Rn; for θ ∈ Sn−1 and γ ∈ SO(n), dθ and dγ denote the relevant invariant probability mea- sures; D(Sn−1) is the space of C∞-functions on Sn−1 equipped with the standard topology, and D′(Sn−1) stands for the corresponding dual space of distributions. 
The subspaces of even test functions (distribu- tions) are denoted by De(Sn−1) ( D′e(Sn−1)); Gn,i denotes the Grass- mann manifold of i-dimensional subspaces ξ of Rn with the SO(n)- invariant probability measure dξ; D(Gn,i) is the space of infinitely dif- ferentiable functions on Gn,i. We write M(Sn−1) and M(Gn,i) for the spaces of finite Borel mea- sures on Sn−1 and Gn,i; M+(Sn−1) and M+(Gn,i) are the relevant spaces of non-negative measures; Me+(Sn−1) denotes the space of even measures µ ∈ M+(Sn−1). Given a function ϕ on Gn,i, we denote ϕ⊥(η) = ϕ(η⊥), η ∈ Gn,n−i. Similarly, given a measure µ ∈ M(Gn,n−i), the corresponding “orthogonal measure” µ⊥ in M(Gn,i) is defined by (µ⊥, ϕ) = (µ, ϕ⊥), ϕ ∈ C(Gn,i). Let {Yj,k} be an orthonormal basis of spherical harmonics on Sn−1. Here j = 0, 1, 2, . . . , and k = 1, 2, . . . , dn(j), where dn(j) is the di- mension of the subspace of spherical harmonics of degree j. Each function ω ∈ D(Sn−1) admits a decomposition ω = j,k ωj,kYj,k with the Fourier-Laplace coefficients ωj,k = ω(θ)Yj,k(θ)dθ, which decay rapidly as j → ∞. Each distribution f ∈ D′(Sn−1) can be defined by INTERSECTION BODIES 7 (f, ω) = j,k fj,kωj,k where fj,k = (f, Yj,k) grow not faster than j m for some integer m. We will need the Poisson integral, which is defined for f ∈ L1(Sn−1) by (2.1) (Πtf)(θ) = (1− t2) f(u)|θ − tu|−ndu, 0 < t < 1, and has the Fourier-Laplace decomposition Πtf = j,k t jfj,kYj,k [SW]. For f ∈ D′(Sn−1), this decomposition serves as a definition of Πtf . For harmonic analysis on the unit sphere, the reader is referred to [Le], [Mü], [Ne], [SW], and a survey article [Sa3]. 2.2. Basic integral transforms. For integrable functions f on Sn−1 and ϕ onGn,i, 1 ≤ i ≤ n−1, the spherical Radon transform (Rif)(ξ), ξ ∈ Gn,i, and its dual (R iϕ)(θ), θ ∈ Sn−1, are defined by (2.2) (Rif)(ξ) = θ∈Sn−1∩ξ f(θ) dξθ, (R iϕ)(θ) = ϕ(ξ) dθξ, where dξθ and dθξ denote the probability measures on the manifolds Sn−1 ∩ ξ and {ξ ∈ Gn,i : ξ ∋ θ}, respectively. 
The precise meaning of the second integral is (2.3) (R∗iϕ)(θ) = SO(n−1) ϕ(rθγp0) dγ, θ ∈ Sn−1, where p0 is an arbitrarily fixed coordinate i-plane containing the north pole en and rθ ∈ SO(n) is a rotation satisfying rθen = θ. Operators Ri and R i extend to finite Borel measures in a canonical way, using the duality (2.4) (Rif)(ξ)ϕ(ξ)dξ = f(θ)(R∗iϕ)(θ)dθ. Specifically, for µ ∈ M(Sn−1) and m ∈ M(Gn,i), we define Riµ ∈ M(Gn,i) and R∗im ∈ M(Sn−1) by (2.5) (Riµ, ϕ)= (R∗iϕ)(θ)dµ(θ), (R im, f)= (Rif)(ξ)dm(ξ), where ϕ ∈ C(Gn,i), f ∈ C(Sn−1). The generalized cosine transforms are defined by (2.6) (Rαi f)(ξ) = γn,i(α) |Prξ⊥θ|α+i−n f(θ) dθ, (2.7) ( αϕ)(θ) = γn,i(α) |Prξ⊥θ|α+i−n ϕ(ξ) dξ, 8 BORIS RUBIN γn,i(α) = σn−1 Γ((n− α− i)/2) 2π(n−1)/2 Γ(α/2) , Re α > 0, α+i−n 6= 0, 2, 4, . . . . Here Prξ⊥θ stands for the orthogonal projection of θ onto ξ ⊥, the or- thogonal complement of ξ ∈ Gn,i. If f and ϕ are smooth enough, then integrals (2.2) can be regarded (up to a constant multiple) as members of the relevant analytic families (2.6) and (2.7); cf. Lemma 3.1. The particular case i = n − 1 in (2.2) corresponds to the Minkowski-Funk transform (2.8) (Mf)(u) = {θ : θ·u=0} f(θ) duθ = (Rn−1f)(u ⊥), u ∈ Sn−1, which integrates a function f over great circles of codimension 1. This transform is a member of the analytic family (2.9) (Mαf)(u) = (Rαn−1f)(u ⊥) = γn(α) f(θ)|θ · u|α−1 dθ, (2.10) γn(α)= σn−1 Γ (1−α)/2 2π(n−1)/2Γ(α/2) , Re α>0, α 6=1, 3, 5, . . . . The values α = 1, 3, 5, . . . are poles of the Gamma function Γ((1−α)/2). In some occasions we include these values into consideration and set (2.11) (M̃αf)(u) = f(θ)|θ · u|α−1 dθ. Historical notes. Regarding spherical Radon transforms (2.2) and the Minkowski-Funk transform (2.8), see [GGG], [He], [R2], [R3]. The first detailed investigation of the analytic family {Mα} is due to Se- myanistyi [Se], who showed that these operators naturally arise in the Fourier analysis of homogeneous functions. 
The case α = 2 in (2.11) was known before, thanks to W. Blaschke, A.D. Alexandrov, and P. Lévy. Integrals (2.9) (sometimes with different normalization) arise in diverse areas of analysis and geometry; see [K4], [R1] - [R3], [Sa3], [Str1], and references therein. In convex geometry and Banach space theory, operators (2.11) with α − 1 replaced by p are known as the p- cosine transforms. More general analytic families (2.6) and (2.7) were introduced in [R2]. INTERSECTION BODIES 9 3. Analytic Families of the Generalized Cosine Transforms 3.1. Basic properties. Below we review basic properties of integrals (2.6), (2.7), (2.9); see [R2], [R3] for more details. For integrable func- tions f and ϕ and Reα > 0, integrals (2.6), (2.7) and (2.9) are ab- solutely convergent. When f and ϕ are infinitely differentiable, these integrals extend meromorphically to all α ∈ C. Lemma 3.1. If f and ϕ are continuous functions, then Rαi f = R i f = ciRif, ci = 2π(i−1)/2 ;(3.1) 0ϕ = ciR iϕ,(3.2) Mαf = M0f = cn−1Mf, cn−1 = 2π(n−2)/2 .(3.3) Hence, the Radon transform, its dual, and the Minkowski-Funk trans- form can be regarded (up to a constant multiple) as members of the corresponding analytic families {Rαi }, { α}, {Mα}. Proof. Formulas (3.2) and (3.3) follow from (3.1). To prove (3.1), we write (2.6) in bi-spherical coordinates θ = u sin ψ + vcosψ, where u ∈ Sn−1 ∩ ξ ∼ Si−1, v ∈ Sn−1 ∩ ξ⊥ ∼ Sn−i−1, 0 ≤ ψ ≤ π/2. dθ = c sini−1 ψ cosn−i−1ψ dψdudv, c = σi−1σn−i−1/σn−1. This gives (Rαi f)(ξ) = c γn,i(α) ∫ π/2 sini−1 ψ cosα−1ψ dψ Sn−1∩ξ⊥ Sn−1∩ξ f(u sin ψ+vcosψ) du ci(α) Γ(α/2) tα/2−1F (t) dt, where ci(α) = c γn,i(α) Γ(α/2) σi−1σn−i−1 Γ((n− α− i)/2) 2π(n−1)/2 2π(i−1)/2 as α → 0, and F (t) = (1− t2)i/2−1 Sn−1∩ξ⊥ Sn−1∩ξ 1− t2+vt) du. 10 BORIS RUBIN Since Γ(α/2) tα/2−1F (t) dt = F (0) = Sn−1∩ξ f(u)du = (Rif)(ξ), we are done. 
$\square$

Analytic continuation of integrals (2.9) can be realized in spherical harmonics as $M^\alpha f = \sum_{j,k} m_{j,\alpha} f_{j,k} Y_{j,k}$, where
(3.4) $m_{j,\alpha} = (-1)^{j/2}\, \dfrac{\Gamma(j/2 + (1-\alpha)/2)}{\Gamma(j/2 + (n-1+\alpha)/2)}$ if $j$ is even, and $m_{j,\alpha} = 0$ if $j$ is odd;
see [R1], [R3]. If $f \in \mathcal{D}'(S^{n-1})$, then $M^\alpha f$ is a distribution defined by
$$(M^\alpha f, \omega) = (f, M^\alpha \omega) = \sum_{j,k} m_{j,\alpha}\, f_{j,k}\, \omega_{j,k}, \qquad \omega \in \mathcal{D}(S^{n-1});\quad \alpha \neq 1, 3, 5, \ldots.$$

Lemma 3.2. Let $\alpha, \beta \in \mathbb{C}$; $\alpha, \beta \neq 1, 3, 5, \ldots$. If $\alpha + \beta = 2 - n$ and $f \in \mathcal{D}_e(S^{n-1})$ (or $f \in \mathcal{D}'_e(S^{n-1})$), then
(3.5) $M^\alpha M^\beta f = f.$
If $\alpha,\ 2-n-\alpha \neq 1, 3, 5, \ldots$, then $M^\alpha$ is an automorphism of the spaces $\mathcal{D}_e(S^{n-1})$ and $\mathcal{D}'_e(S^{n-1})$.

Proof. The equality (3.5) is equivalent to $m_{j,\alpha}\, m_{j,\beta} = 1$, $\alpha + \beta = 2 - n$. The latter follows from (3.4). The second statement is a consequence of the standard theory of spherical harmonics [Ne], because the Fourier-Laplace multiplier $m_{j,\alpha}$ has a power behavior as $j \to \infty$. $\square$

Corollary 3.3. The Minkowski-Funk transform on the spaces $\mathcal{D}_e(S^{n-1})$ and $\mathcal{D}'_e(S^{n-1})$ can be inverted by the formula
(3.6) $(M)^{-1} = c_{n-1}\, M^{2-n}, \qquad c_{n-1} = \dfrac{\sigma_{n-2}}{2\pi^{(n-2)/2}}.$

Note that there is a wide variety of diverse inversion formulas for the Minkowski-Funk transform (see [GGG], [He], [R3] and references therein), but all of them are, in fact, different realizations of (3.6), depending on classes of functions.

3.2. Auxiliary statements. We establish some connections between operator families defined above.

Lemma 3.4. Let $\alpha, \beta \in \mathbb{C}$; $\alpha, \beta \neq 1, 3, 5, \ldots$. If $\mathrm{Re}\,\alpha > \mathrm{Re}\,\beta$, then $M^\alpha = M^\beta A_{\alpha,\beta}$, where $A_{\alpha,\beta}$ is a spherical convolution operator with the Fourier-Laplace multiplier
(3.7) $a_{\alpha,\beta}(j) = \dfrac{\Gamma(j/2 + (1-\alpha)/2)}{\Gamma(j/2 + (n-1+\alpha)/2)} \cdot \dfrac{\Gamma(j/2 + (n-1+\beta)/2)}{\Gamma(j/2 + (1-\beta)/2)},$
so that $a_{\alpha,\beta}(j) \sim (j/2)^{\beta-\alpha}$ as $j \to \infty$. If $\alpha$ and $\beta$ are real numbers satisfying $\alpha > \beta > 1-n$, $\alpha + \beta < 2$, then $A_{\alpha,\beta}$ is an integral operator such that $A_{\alpha,\beta} f \ge 0$ for every non-negative $f \in L^1(S^{n-1})$.

Proof. The first statement follows from (3.4).
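The multiplier identities behind Lemma 3.2 and the first statement of Lemma 3.4 are easy to test numerically. The Python sketch below encodes the multiplier $m_{j,\alpha}$ as a ratio of Gamma functions, which is how we read (3.4), and checks that $m_{j,\alpha} m_{j,\beta} = 1$ when $\alpha + \beta = 2 - n$ and that $a_{\alpha,\beta}(j) = m_{j,\alpha}/m_{j,\beta}$ behaves like $(j/2)^{\beta-\alpha}$ for large even $j$. The sample parameters and function names are illustrative assumptions; real parameters are used, chosen so that no Gamma function is evaluated at a pole.

```python
import math


def multiplier(j, alpha, n):
    """Fourier-Laplace multiplier m_{j,alpha} as read from (3.4).

    Zero for odd j; for even j it equals
    (-1)**(j/2) * Gamma(j/2 + (1-alpha)/2) / Gamma(j/2 + (n-1+alpha)/2).
    """
    if j % 2:
        return 0.0
    half = j // 2
    sign = -1.0 if half % 2 else 1.0
    return (sign * math.gamma(half + (1 - alpha) / 2)
            / math.gamma(half + (n - 1 + alpha) / 2))


def convolution_multiplier(j, alpha, beta, n):
    """Multiplier a_{alpha,beta}(j) of (3.7), i.e. m_{j,alpha} / m_{j,beta} for even j."""
    return multiplier(j, alpha, n) / multiplier(j, beta, n)


if __name__ == "__main__":
    # Lemma 3.2: alpha + beta = 2 - n implies m_{j,alpha} * m_{j,beta} = 1.
    n, alpha = 5, 0.5
    beta = 2 - n - alpha
    for j in range(0, 21, 2):
        print(j, multiplier(j, alpha, n) * multiplier(j, beta, n))

    # Lemma 3.4: a_{alpha,beta}(j) / (j/2)**(beta - alpha) should tend to 1.
    n, alpha, beta = 3, 0.5, -0.5
    for j in (20, 100, 200):
        ratio = convolution_multiplier(j, alpha, beta, n) / (j / 2) ** (beta - alpha)
        print(j, ratio)
```

`math.gamma` overflows for arguments above roughly 171, so for much larger $j$ one would switch to `math.lgamma` and work with logarithms.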
To prove the second one, we consider integral operators + f)(x) = Γ(µ/2) (1− t2)µ/2−1(Πtf)(x) tn−νdt,(3.8) − f)(x) = Γ(µ/2) (t2 − 1)µ/2−1(Π1/tf)(x) t1−νdt,(3.9) expressed through the Poisson integral (2.1). The Fourier-Laplace mul- tipliers of Q + and Q − are (3.10) q̂ + (j)= Γ((j+n−ν+1)/2) Γ((j+n−ν+1+µ)/2) − (j)= Γ((j+ν−µ)/2) Γ((j+ν)/2) They can be easily computed by taking into account that Πt ∼ tj in the Fourier-Laplace terms. If f ∈ L1(Sn−1) and 0 < µ < ν < n, then integrals (3.8) and (3.9) are absolutely convergent and obey Q ± f ≥ 0 when f ≥ 0. Comparing (3.10) and (3.7), we obtain a factorization Aα,β = Q α−β,1−β α−β,1−β − (set µ = α − β, ν = 1 − β), which implies the second statement of the lemma. � It is convenient to introduce a special notation for the spherical Radon transform and the generalized cosine transform with orthogonal argument. Assuming ξ ∈ Gn,i, we denote (3.11) (Rn−i,⊥f)(ξ) = (Rn−if)(ξ ⊥), (Rαn−i,⊥f)(ξ) = (R n−if)(ξ Lemma 3.5. Let f ∈ L1(Sn−1), Re α > 0; α 6= 1, 3, 5, . . . . Then (3.12) (RiM αf)(ξ) = c (Rα+i−1n−i,⊥ f)(ξ), ξ ∈ Gn,i, c = 2π(i−1)/2 or (replace i by n− i) (3.13) (Rn−i,⊥M αf)(ξ) = 2π(n−i−1)/2 σn−i−1 (Rα+n−i−1i f)(ξ). If f ∈ De(Sn−1), then (3.12) and (3.13) extend to Reα ≤ 0 by analytic continuation. 12 BORIS RUBIN Proof. For Reα > 0, αf)(ξ) = γn(α) Sn−1∩ξ f(θ)|θ · u|α−1 dθ. Since |θ · u| = |Prξθ||vθ · u| for some vθ ∈ Sn−1 ∩ ξ, by changing the order of integration, we obtain αf)(ξ) = γn(α) f(θ)|Prξθ|α−1 dθ Sn−1∩ξ |vθ · u|α−1dξu. The inner integral is independent on vθ and can be easily evaluated: Sn−1∩ξ |vθ · u|α−1dξu = |t|α−1(1− t2)(i−3)/2 dt 2π(i−1)/2 Γ(α/2) σi−1 Γ((i+ α− 1)/2) This implies (3.12). � The following statement is dual to Lemma 3.5. Lemma 3.6. Let µ ∈ M(Gn,i), α 6= 1, 3, 5, . . . . Then (3.14) MαR∗iµ = c Rα+i−1n−i µ ⊥, c = 2π(i−1)/2/σi−1, in the D′(Sn−1)-sense. If Reα > 0 and µ is absolutely continuous with density ϕ ∈ L1(Gn,i), then (3.15) MαR∗iϕ = c Rα+i−1n−i ϕ almost everywhere on Sn−1. 
If ϕ ∈ D(Gn,i), then (3.15) extends to all complex α 6= 1, 3, 5, . . . by analytic continuation. Proof. Let ω ∈ De(Sn−1) (it suffices to consider only even test func- tions). By (2.4) and (3.12), (MαR∗iµ, ω) = (µ,RiM αω) = c (µ,Rα+i−1n−i,⊥ ω) = c (µ ⊥, Rα+i−1n−i ω). This gives the result. � The next statement contains explicit representations of the right in- verse of the dual Radon transform R∗i (note that R i is non-injective on D(Gn,i) when 1 < i < n− 1). Lemma 3.7. Every function f ∈De(Sn−1) is represented as f=R∗iAf , where A : De(Sn−1) → D(Gn,i), (3.16) Af = c1R i f = c2Rn−i,⊥M 2−nf, π(1−i)/2σn−2 σn−i−1 Γ((n− i)/2) Γ((n− 1)/2) , c2 = 2πn/2−1 INTERSECTION BODIES 13 Proof. The coincidence of expressions in (3.16) follows from (3.13). To prove the first equality, we invoke spherical convolutions defined by analytic continuation of the integral (3.17) (Qαf)(θ)= σn−1Γ((n−1−α)/2) 2π(n−1)/2Γ(α/2) (1−|u·θ|2)(α−n+1)/2f(u)du, Reα > 0, α − n 6= 0, 2, 4, . . . , so that Q0f = f [R2]. By Theorem 1.1 from [R2], R∗iR i f = c α+i−1f , and therefore (set α = 1 − i), i f = c 1 f , as desired. � The next statement provides an intriguing factorization of the Minkowski- Funk transform in terms of Radon transforms associated to mutually orthogonal subspaces. This factorization can be useful in different oc- currences. Theorem 3.8. For f ∈ L1(Sn−1) and 0 < i < n, (3.18) Mf = R∗iRn−i,⊥f. Proof. By (2.3), (R∗iRn−i,⊥f)(θ) = SO(n−1) (Rn−i,⊥f)(rθγR i) dγ SO(n−1) (Rn−if)(rθγR n−i) dγ SO(n−1) Sn−1∩rθγR f(v) dv Sn−1∩Rn−i SO(n−1) f(rθγw) dγ. The inner integral is independent on w ∈ Sn−1 ∩ Rn−i and equals (Mf)(θ). This gives (3.18). � 3.3. Restriction theorems. Theorems of such type deal with traces of functions on lower dimensional subspaces and are well known, for in- stance, in the theory of function spaces. 
To the best of our knowledge, traces of functions represented by Radon transforms or, more generally, by the generalized cosine transforms , were not studied systematically and deserve particular attention, because they provide analytic back- ground to a series of results related to sections of star bodies; cf. [R3, Sec. 3.5], [FGW]. Given a subspace η ∈ Gn,m and k < m, we denote by Gk(η) the manifold of all k-dimensional subspaces of η. 14 BORIS RUBIN Theorem 3.9. Let f ∈ Ce(Sn−1), 1 ≤ k < m < n, λ 6= 0,−2,−4, . . . . If Reλ < k, then for every η ∈ Gn,m and every ξ ∈ Gk(η), (3.19) (Rk−λn−kf)(ξ ⊥) = (Rk−λm−kT η f)(ξ ⊥ ∩ η), where (3.20) (T λη f)(u) = c̃ Sn−1∩(η⊥⊕Ru) f(w)|u · w|m−λ−1 dw, u ∈ Sn−1 ∩ η, c̃ = π(m−n)/2 σn−m/2. In particular (let λ→ k), (3.21) (Rn−kf)(ξ ⊥)=c (Rm−kT η f)(ξ ⊥ ∩ η), c= π (n−m)/2 σm−k−1 σn−k−1 Proof. By (2.6), (Rk−λn−kf)(ξ ⊥)=γn,n−k(k − λ) |Prξθ|−λ f(θ) dθ. We represent θ in bi-spherical coordinates as (3.22) θ = ucosψ + v sinψ, where u ∈ Sn−1 ∩ η ∼ Sm−1, v ∈ Sn−1 ∩ η⊥ ∼ Sn−m−1, 0 ≤ ψ ≤ π/2, dθ = c′′ sinn−m−1 ψ cosm−1ψ dψdudv, c′′ = σm−1σn−m−1/σn−1. If ξ ⊂ η, then |Prξθ| = |Prξ[Prηθ]| = |Prξu| cosψ, and therefore, (Rk−λn−kf)(ξ ⊥) = γm,m−k(k − λ) Sn−1∩η |Prξu|−λ(T λη f)(u) du, where (T λη f)(u) = c′′ γn,n−k(k − λ) γm,m−k(k − λ) ∫ π/2 sinn−m−1 ψ cosm−λ−1ψ dψ Sn−1∩η⊥ f(ucosψ+v sin ψ) dv π(m−n)/2 σn−m Sn−1∩(η⊥⊕Ru) f(w)|u · w|m−λ−1 dw. Formula (3.21) follows from (3.19) by (3.1). � INTERSECTION BODIES 15 Theorem 3.10. Let f ∈De(Sn−1), η∈Gn,m, 1 0, 〈F, φ(x/a)〉 = aλ+n 〈F, φ〉. Homogeneous dis- tributions on Rn are intimately connected with distributions on Sn−1. Let first f ∈ L1(Sn−1), (Eλf)(x) = |x|λf(x/|x|), x ∈ Rn \ {0}. The operator Eλ generates a meromorphic S ′-distribution 〈Eλf, φ〉= a.c. rλ+n−1u(r)dr, u(r) = f(θ)φ(rθ)dθ, where “a.c.” denotes analytic continuation in the λ-variable. The dis- tribution Eλf is regular if Reλ > −n and admits simple poles at λ = −n,−n − 1, . . .. 
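The pole structure just described can be checked directly. The following is a sketch of the standard regularization argument (cf. [GS, Chapter 1]), assuming only that $u$ is smooth on $[0,\infty)$ and rapidly decreasing; the cutoff at $r = 1$ and the integer $N$ are illustrative choices introduced here, not notation from the paper.

```latex
% Sketch: analytic continuation of the radial integral, valid for Re(lambda) > -n - N.
\begin{align*}
\text{a.c.}\int_0^\infty r^{\lambda+n-1}u(r)\,dr
 &= \int_0^1 r^{\lambda+n-1}\Big[u(r)-\sum_{j=0}^{N-1}\frac{u^{(j)}(0)}{j!}\,r^{j}\Big]dr
  + \int_1^\infty r^{\lambda+n-1}u(r)\,dr \\
 &\quad + \sum_{j=0}^{N-1}\frac{u^{(j)}(0)}{j!\,(\lambda+n+j)} .
\end{align*}
% The only singularities are simple poles at lambda = -n-j, j = 0, 1, ..., N-1,
% with residue u^{(j)}(0)/j!.
```

In particular, a pole at $\lambda = -n-j$ disappears exactly when $u^{(j)}(0) = 0$.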
The above definition extends to all distributions $f \in \mathcal{D}'(S^{n-1})$ by the formula
\[
\langle E_\lambda f, \phi\rangle = \text{a.c.}\int_0^\infty r^{\lambda+n-1}u(r)\,dr, \qquad u(r) = (f, \phi(r\theta)),
\]
and the map $E_\lambda : \mathcal{D}'(S^{n-1}) \to \mathcal{S}'(\mathbb{R}^n)$ is weakly continuous.¹ If $f$ is orthogonal to all spherical harmonics of degree $j$, then the derivative $u^{(j)}(r)$ equals zero at $r = 0$ and the pole at $\lambda = -n-j$ is removable. In particular, if $f$ is an even distribution, i.e., $(f,\varphi) = (f,\varphi^-)$, $\varphi^-(\theta) = \varphi(-\theta)$ for all $\varphi \in \mathcal{D}(S^{n-1})$, then the only possible poles of $E_\lambda f$ are $-n, -n-2, -n-4, \ldots$.

¹ Here and below, the notations $\langle\cdot,\cdot\rangle$ and $(\cdot,\cdot)$ are used for distributions on $\mathbb{R}^n$ and $S^{n-1}$, respectively.

The Fourier transform of homogeneous distributions has been studied extensively by many authors; see [Sa3] and references therein. We restrict our consideration to even distributions, for which the operator family $\{M^\alpha\}$ defined by (2.9) arises naturally thanks to the formula
\[
[E_{1-n-\alpha}f]^\wedge = 2^{1-\alpha}\pi^{n/2}\,E_{\alpha-1}M^{\alpha}f. \tag{4.1}
\]
This formula goes back to Semyanistyi [Se]. If $f \in \mathcal{D}_e(S^{n-1})$, then (4.1) holds pointwise for $0 < \mathrm{Re}\,\alpha < 1$ (see, e.g., Lemma 3.3 in [R1]) and extends in the $\mathcal{S}'$-sense to all $\alpha \in \mathbb{C}$ satisfying
\[
\alpha \notin \{1, 3, 5, \ldots\} \cup \{1-n, -n-1, -n-3, \ldots\}. \tag{4.2}
\]
Since $\mathcal{D}_e(S^{n-1})$ is dense in $\mathcal{D}'_e(S^{n-1})$ and the maps $E_{1-n-\alpha}$ and $E_{\alpha-1}$ are weakly continuous from $\mathcal{D}'_e(S^{n-1})$ to $\mathcal{S}'(\mathbb{R}^n)$, formula (4.1) extends to all $f \in \mathcal{D}'_e(S^{n-1})$.

Regarding the cases excluded in (4.2), we note that if $\alpha = 1 + 2\ell$ for some $\ell = 0, 1, \ldots$, then (4.1) is meaningful if and only if $f$ is orthogonal to all spherical harmonics of degree $2\ell$. If $\alpha = 1 - n - 2\ell$ for some $\ell = 0, 1, \ldots$, then, according to the spherical harmonic decomposition $f = \sum_{j,k} f_{j,k}Y_{j,k}$, $j$ even, formula (4.1) is replaced by the following:
\[
[E_{2\ell}f]^\wedge(\xi) = (2\pi)^n\Big[\sum_{j \le 2\ell}\sum_k f_{j,k}\,(-\Delta)^{\ell-j/2}\,Y_{j,k}(i\partial)\Big]\delta(\xi)
 + 2^{n+2\ell}\pi^{n/2}\,E_{-n-2\ell}M^{1-n-2\ell}\Big[\sum_{j > 2\ell}\sum_k f_{j,k}Y_{j,k}\Big], \tag{4.3}
\]
where $-\Delta$ is the Laplace operator, $\partial = (\partial/\partial\xi_1, \ldots, \partial/\partial\xi_n)$, and $\delta(\xi)$ is the delta function. It is worth noting that for $\alpha = 1, 3, 5, \ldots$, the distribution $[E_{1-n-\alpha}f]^\wedge$ can also be understood in the regularized sense without any orthogonality assumptions. However, such regularization does not preserve homogeneity; see [Sa1], [Sa3].

Our main concern is positivity and positive definiteness of even homogeneous distributions. The reader is referred to [GV] for the general theory. A distribution $F \in \mathcal{S}'(\mathbb{R}^n)$ is positive if $\langle F, \phi\rangle \ge 0$ for all non-negative $\phi \in \mathcal{S}(\mathbb{R}^n)$. A similar definition holds for distributions on the sphere and on $\mathbb{R}^n \setminus \{0\}$. A distribution $F \in \mathcal{S}'(\mathbb{R}^n)$ is positive definite if $\hat F$ is positive. For our purposes, it is important to know which even homogeneous distributions are positive definite. Let us rewrite (4.1) and (4.2) with $1 - n - \alpha$ replaced by $-\lambda$. We have
\[
[E_{-\lambda}f]^\wedge = 2^{n-\lambda}\pi^{n/2}\,E_{\lambda-n}M^{1+\lambda-n}f, \tag{4.4}
\]
\[
\lambda \notin \Lambda_0, \qquad \Lambda_0 = \{n, n+2, n+4, \ldots\} \cup \{0, -2, -4, \ldots\}. \tag{4.5}
\]

Theorem 4.1. Let $\lambda \in \mathbb{R} \setminus \Lambda_0$, $f \in \mathcal{D}'_e(S^{n-1})$.
(i) If $\lambda < 0$ and $E_{-\lambda}f$ is a positive definite distribution, then $f = 0$.
(ii) For all $\lambda \in \mathbb{R} \setminus \Lambda_0$, the following statements are equivalent:
(a) $[E_{-\lambda}f]^\wedge$ is a positive distribution on $\mathbb{R}^n \setminus \{0\}$ (for $\lambda > 0$, this can be replaced by "$E_{-\lambda}f$ is a positive definite distribution on $\mathbb{R}^n$");
(b) $M^{1+\lambda-n}f \in \mathcal{M}_{e+}(S^{n-1})$;
(c) $f = M^{1-\lambda}\mu$ for some measure $\mu \in \mathcal{M}_{e+}(S^{n-1})$.
Furthermore, for any real $\lambda \neq 0, -2, -4, \ldots$, and any $i = 1, 2, \ldots, n-1$, (c) is equivalent to
(d) $R_i f = R^{i-\lambda}_{n-i,\perp}\mu$ for some measure $\mu \in \mathcal{M}_{e+}(S^{n-1})$.

Proof. (i) Choose $\phi(x) = \exp(-|x|^m)\,p_{t,\theta}(x/|x|)$, where $m \in 2\mathbb{N}$ and $p_{t,\theta}(\cdot)$ is the Poisson kernel
\[
p_{t,\theta}(u) = \frac{1 - t^2}{(1 - 2t\,u\cdot\theta + t^2)^{n/2}}, \qquad 0 < t < 1; \quad u, \theta \in S^{n-1}. \tag{4.6}
\]
Then $\langle E_{\lambda-n}M^{1+\lambda-n}f, \phi\rangle = c_\lambda\,(\Pi_t M^{1+\lambda-n}f)(\theta)$, where
\[
c_\lambda = \text{a.c.}\int_0^\infty r^{\lambda-1}\exp(-r^m)\,dr = m^{-1}\,\Gamma(\lambda/m)
\]
and $(\Pi_t M^{1+\lambda-n}f)(\theta)$ is the Poisson integral of $M^{1+\lambda-n}f$. If $E_{-\lambda}f$ is a positive definite distribution, then, by (4.4), $E_{\lambda-n}M^{1+\lambda-n}f$ is a positive distribution. On the other hand, if $\lambda < 0$ and $m > -\lambda$, then $c_\lambda < 0$.
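The value and the sign of $c_\lambda$ can be verified by a short computation. The sketch below assumes only the standard analytic continuation of the Gamma function.

```latex
% For Re(lambda) > 0, substitute s = r^m (so r = s^{1/m}, dr = m^{-1} s^{1/m - 1} ds):
\begin{align*}
\int_0^\infty r^{\lambda-1}e^{-r^m}\,dr
  &= \frac{1}{m}\int_0^\infty s^{\lambda/m-1}e^{-s}\,ds
   = \frac{1}{m}\,\Gamma\!\Big(\frac{\lambda}{m}\Big).
\end{align*}
% The right-hand side continues analytically to lambda < 0. If -m < lambda < 0, then
% x = lambda/m lies in (-1, 0), and Gamma(x) = Gamma(x+1)/x < 0 because Gamma(x+1) > 0 > x.
\[
c_\lambda = \frac{1}{m}\,\Gamma\!\Big(\frac{\lambda}{m}\Big) < 0
\qquad (-m < \lambda < 0).
\]
```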
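Hence $\langle E_{\lambda-n}M^{1+\lambda-n}f, \phi\rangle$ can be non-negative for every non-negative $\phi \in \mathcal{S}(\mathbb{R}^n)$ only if $(\Pi_t M^{1+\lambda-n}f)(\theta) = 0$ for every $0 < t < 1$ and $\theta \in S^{n-1}$. The latter implies $M^{1+\lambda-n}f = 0$, which is equivalent to $f = 0$ because $M^{1+\lambda-n}$ is injective; see Lemma 3.2.

(ii) Let $[E_{-\lambda}f]^\wedge$ be a positive distribution on $\mathbb{R}^n \setminus \{0\}$. This means that for every $\phi \in \mathcal{S}(\mathbb{R}^n)$ such that $\phi \ge 0$ and $0 \notin \mathrm{supp}\,\phi$, we have $\langle [E_{-\lambda}f]^\wedge, \phi\rangle \ge 0$ or, by (4.4), $\langle E_{\lambda-n}M^{1+\lambda-n}f, \phi\rangle \ge 0$. Choose $\phi(x) = \psi(|x|)\,\omega(x/|x|)$, where $\omega \in \mathcal{D}(S^{n-1})$, $\omega \ge 0$, and $\psi$ is a smooth non-negative function such that $\int_0^\infty r^{\alpha+n-2}\psi(r)\,dr = 1$ with $\alpha = 1 + \lambda - n$ (that is, $\int_0^\infty r^{\lambda-1}\psi(r)\,dr = 1$) and $0 \notin \mathrm{supp}\,\psi$. Then $\langle E_{\lambda-n}M^{1+\lambda-n}f, \phi\rangle = (M^{1+\lambda-n}f, \omega) \ge 0$, and therefore, $M^{1+\lambda-n}f \in \mathcal{M}_{e+}(S^{n-1})$; see Theorem 9.1. Conversely, let $\mu = M^{1+\lambda-n}f \in \mathcal{M}_{e+}(S^{n-1})$ and let $\phi \in \mathcal{S}(\mathbb{R}^n)$, $\phi \ge 0$. In the case $\lambda < 0$ we additionally assume $0 \notin \mathrm{supp}\,\phi$. By (4.4),
\[
\langle [E_{-\lambda}f]^\wedge, \phi\rangle = 2^{n-\lambda}\pi^{n/2}\,\langle E_{\lambda-n}\mu, \phi\rangle
 = 2^{n-\lambda}\pi^{n/2}\int_0^\infty r^{\lambda-1}\,dr\int_{S^{n-1}}\phi(r\theta)\,d\mu(\theta) \ge 0.
\]
This proves the equivalence of (a) and (b). The equivalence of (b) and (c) follows from Lemma 3.2.

Let us prove the equivalence of (c) and (d). If $R_i f = R^{i-\lambda}_{n-i,\perp}\mu$, $\mu \in \mathcal{M}_{e+}(S^{n-1})$, then, by (3.15),
\[
(f, R_i^*\varphi) = (R_i f, \varphi) = (R^{i-\lambda}_{n-i,\perp}\mu, \varphi) = (\mu, \overset{*}{R}{}^{\,i-\lambda}_{n-i}\varphi^\perp) = c^{-1}(\mu, M^{1-\lambda}R_i^*\varphi), \qquad \varphi \in \mathcal{D}(G_{n,i}).
\]
Since any function $\omega \in \mathcal{D}_e(S^{n-1})$ can be expressed as $\omega = R_i^*\varphi$ for some $\varphi \in \mathcal{D}(G_{n,i})$ (see Lemma 3.7), this gives $(f, \omega) = c^{-1}(\mu, M^{1-\lambda}\omega)$, which is (c) (with the measure $c^{-1}\mu$ in place of $\mu$). Conversely, let $f = M^{1-\lambda}\mu$, $\mu \in \mathcal{M}_{e+}(S^{n-1})$, that is, $(f, \omega) = (\mu, M^{1-\lambda}\omega)$ for every $\omega \in \mathcal{D}_e(S^{n-1})$. Choose $\omega = R_i^*\varphi$, $\varphi \in \mathcal{D}(G_{n,i})$. Then, as above,
\[
(f, R_i^*\varphi) = (\mu, M^{1-\lambda}R_i^*\varphi) = c\,(\mu, \overset{*}{R}{}^{\,i-\lambda}_{n-i}\varphi^\perp),
\]
which gives (d) (with the measure $c\mu$ in place of $\mu$). □

5. λ-intersection bodies

5.1. Definitions and comments. We recall that $\mathcal{K}^n$ is the set of all origin-symmetric star bodies $K$ in $\mathbb{R}^n$, $n \ge 2$; $\rho_K$ and $\|\cdot\|_K$ are the radial function and the Minkowski functional of $K$. The following definitions and statements are motivated by Theorem 4.1 and the preceding discussion. Let $\lambda$ be a real number,
\[
s_\lambda = \begin{cases} 1 & \text{if } \lambda > 0,\ \lambda \neq n, n+2, n+4, \ldots,\\ \Gamma(\lambda/2) & \text{if } \lambda < 0,\ \lambda \neq -2, -4, \ldots. \end{cases} \tag{5.1}
\]
The values $\lambda = 0, n, n+2, n+4, \ldots$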
will not be considered in the following, but the values $\lambda = -2, -4, \ldots$ will be included. They become meaningful if we change the normalization. For $\lambda \neq 0, n, n+2, n+4, \ldots$, let $\mathcal{I}^n_\lambda$ be the set of bodies $K \in \mathcal{K}^n$ for which there is a measure $\mu \in \mathcal{M}_{e+}(S^{n-1})$ such that $s_\lambda\rho_K^\lambda = M^{1-\lambda}\mu$ if $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, and $\rho_K^{-2\ell} = \tilde M^{1-\lambda}\mu \equiv \tilde M^{1+2\ell}\mu$ otherwise. The equality $s_\lambda\rho_K^\lambda = M^{1-\lambda}\mu$ means that for any $\varphi \in \mathcal{D}(S^{n-1})$,
\[
s_\lambda\int_{S^{n-1}}\rho_K^\lambda(\theta)\,\varphi(\theta)\,d\theta = \int_{S^{n-1}}(M^{1-\lambda}\varphi)(\theta)\,d\mu(\theta),
\]
where for $\lambda \ge 1$, $(M^{1-\lambda}\varphi)(\theta)$ is understood in the sense of analytic continuation. We recall the notation $\Lambda_0 = \{n, n+2, n+4, \ldots\} \cup \{0, -2, -4, \ldots\}$.

Theorem 5.1. For $\lambda \in \mathbb{R} \setminus \Lambda_0$, the following statements are equivalent:
(a) $K \in \mathcal{I}^n_\lambda$;
(b) The Fourier transform $[s_\lambda\|\cdot\|_K^{-\lambda}]^\wedge$ is a positive distribution on $\mathbb{R}^n \setminus \{0\}$ (for $\lambda > 0$, this can be replaced by "$\|\cdot\|_K^{-\lambda}$ is a positive definite distribution on $\mathbb{R}^n$");
(c) $s_\lambda M^{1+\lambda-n}\rho_K^\lambda \in \mathcal{M}_{e+}(S^{n-1})$.

The theorem is an immediate consequence of Theorem 4.1 applied to $f = s_\lambda\rho_K^\lambda$. Another useful characterization is provided by Theorem 4.1 (d).

Theorem 5.2. Let $\lambda \in \mathbb{R} \setminus \Lambda_0$. If $K \in \mathcal{I}^n_\lambda$, then for every $i \in \{1, 2, \ldots, n-1\}$ there is a measure $\mu \in \mathcal{M}_{e+}(S^{n-1})$ such that $s_\lambda R_i\rho_K^\lambda = R^{i-\lambda}_{n-i,\perp}\mu$. Conversely, if $s_\lambda R_i\rho_K^\lambda = R^{i-\lambda}_{n-i,\perp}\mu$ for some $i \in \{1, 2, \ldots, n-1\}$ and some $\mu \in \mathcal{M}_{e+}(S^{n-1})$, then $K \in \mathcal{I}^n_\lambda$.

Although $\mathcal{I}^n_\lambda$ was called "the set of bodies", the definition of this set is purely analytic and extra work is needed to understand which bodies (if any) actually constitute the class $\mathcal{I}^n_\lambda$. The following comments will be helpful.

1. The case $\lambda > n$ is not so interesting, because by Theorem 5.1(c), $\mathcal{I}^n_\lambda$ is either empty (if $\Gamma((n-\lambda)/2) < 0$) or coincides with the whole class $\mathcal{K}^n$ (if $\Gamma((n-\lambda)/2) > 0$).

2. The case $\lambda \in (0, n)$ agrees with the concept of isometric embedding of the space $(\mathbb{R}^n, \|\cdot\|_K)$ into $L_{-p}$, $p = \lambda$; see Introduction. In the framework of this concept, all bodies $K \in \mathcal{I}^n_\lambda$ can be regarded as "unit balls of $n$-dimensional subspaces of $L_{-\lambda}$".

3.
If $K \in \mathcal{I}^n_\lambda$, where $\lambda < 0$ (one can replace $\lambda$ by $-p$, $p > 0$), then
\[
\|u\|_K^p = \int_{S^{n-1}}|\theta\cdot u|^p\,d\mu(\theta)
\]
for some $\mu \in \mathcal{M}_{e+}(S^{n-1})$. This is the well-known Lévy representation, characterizing isometric embedding of the space $(\mathbb{R}^n, \|\cdot\|_K)$ into $L_p$; see Lemma 6.4 in [K4]. Statement (b) in Theorem 5.1 agrees with Theorem 1.9. Keeping this terminology, we can state the following

Proposition 5.3. Let $p > -n$, $p \neq 0$. Then $(\mathbb{R}^n, \|\cdot\|_K)$ embeds isometrically in $L_p$ if and only if $K \in \mathcal{I}^n_{-p}$.

4. If $\lambda = k \in \{1, 2, \ldots, n-1\}$, then $\mathcal{I}^n_\lambda = \mathcal{I}^n_k$ coincides with the class of $k$-intersection bodies; see Definition 1.7 and Theorem 1.8. Theorems 5.1 and 5.2 provide new characterizations of this class.

These comments inspire the following

Definition 5.4. Let $\lambda < n$, $\lambda \neq 0$. A body $K \in \mathcal{K}^n$ is said to be a $\lambda$-intersection body if $K \in \mathcal{I}^n_\lambda$, or, in other words, if there is a measure $\mu \in \mathcal{M}_{e+}(S^{n-1})$ such that $s_\lambda\rho_K^\lambda = M^{1-\lambda}\mu$ if $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, and $\rho_K^{-2\ell} = \tilde M^{1+2\ell}\mu$ otherwise.

The result of Theorem 5.2 for $\lambda = i = k$ can serve as an alternative definition of $k$-intersection bodies in terms of Radon transforms. This definition agrees with Definition 1.6 and mimics Definition 1.2.

Definition 5.5. Let $k \in \{1, 2, \ldots, n-1\}$. A body $K \in \mathcal{K}^n$ is a $k$-intersection body if there is a non-negative measure $\mu$ on $S^{n-1}$ such that
\[
(R_k\rho_K^k)(\xi) = (R_{n-k}\mu)(\xi^\perp), \qquad \xi \in G_{n,k}. \tag{5.2}
\]
Equality (5.2) is understood in the weak sense according to (2.5). Namely, for $\varphi \in C(G_{n,k})$ and $\varphi^\perp(\eta) = \varphi(\eta^\perp)$, $\eta \in G_{n,n-k}$, (5.2) means
\[
\int_{G_{n,k}}(R_k\rho_K^k)(\xi)\,\varphi(\xi)\,d\xi = \int_{S^{n-1}}(R^*_{n-k}\varphi^\perp)(\theta)\,d\mu(\theta). \tag{5.3}
\]

5.2. λ-intersection bodies of star bodies and closure in the radial metric. As we mentioned in the Introduction, the class of intersection bodies, which coincides with $\mathcal{I}^n_\lambda$ when $\lambda = 1$, is the closure in the radial metric of the class of intersection bodies of star bodies. Below we extend this result to all $\lambda < n$, $\lambda \neq 0$, within a unified approach.
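We recall (see Definition 1.6) that $K \in \mathcal{K}^n$ is a $k$-intersection body of a body $L \in \mathcal{K}^n$, and write $K = \mathcal{IB}_k(L)$, if
\[
\mathrm{vol}_k(K \cap \xi) = \mathrm{vol}_{n-k}(L \cap \xi^\perp) \qquad \forall \xi \in G_{n,k}. \tag{5.4}
\]
Let $\mathcal{IB}_{k,n}$ be the set of all bodies $K \in \mathcal{K}^n$ satisfying (5.4) for some $L \in \mathcal{K}^n$. How can we extend the purely geometric property (5.4) to non-integer values of $k$? To this end, we first express (5.4) in terms of the generalized cosine transforms (2.9).

Lemma 5.6. If $K = \mathcal{IB}_k(L)$ is infinitely smooth, then
\[
\rho_L^{n-k} = c\,M^{1-n+k}\rho_K^k, \qquad \rho_K^k = c^{-1}M^{1-k}\rho_L^{n-k}, \qquad c = \pi^{k-n/2}(n-k)/k. \tag{5.5}
\]

Proof. We make use of (3.13), where we set $i = k$, $\alpha = 1 - n + k$ and $f = \rho_K^k$. By (3.1), this gives
\[
R_k\rho_K^k = \tilde c\,R_{n-k,\perp}M^{1-n+k}\rho_K^k, \qquad \tilde c = \frac{\pi^{k-n/2}\,\sigma_{n-k-1}}{\sigma_{k-1}}. \tag{5.6}
\]
On the other hand, if $K = \mathcal{IB}_k(L)$ is infinitely smooth, then, according to (5.4) and the equality
\[
\mathrm{vol}_k(K \cap \xi) = \frac{\sigma_{k-1}}{k}\,(R_k\rho_K^k)(\xi), \tag{5.7}
\]
we have
\[
R_k\rho_K^k = \frac{k\,\sigma_{n-k-1}}{(n-k)\,\sigma_{k-1}}\,R_{n-k,\perp}\rho_L^{n-k}. \tag{5.8}
\]
Comparing (5.6) and (5.8), owing to the injectivity of the Radon transform, we obtain the first equality in (5.5). The second equality follows from the first one by (3.5). □

Equalities (5.5) are extendable to non-integer values of $k$. We denote $c_{\lambda,n} = \pi^{\lambda-n/2}(n-\lambda)/\lambda$, and let $s_\lambda$ be defined by (5.1).

Definition 5.7. Let $\lambda < n$, $\lambda \neq 0$; $K, L \in \mathcal{K}^n$. We say that $K$ is a $\lambda$-intersection body of $L$ and write $K = \mathcal{IB}_\lambda(L)$ if $s_\lambda\rho_K^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_L^{n-\lambda}$ in the case $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, and $\rho_K^{-2\ell} = \tilde M^{1+2\ell}\rho_L^{n+2\ell}$ otherwise. The set of all $\lambda$-intersection bodies of star bodies will be denoted by $\mathcal{IB}_{\lambda,n}$. We also denote
\[
\mathcal{IB}^\infty_{\lambda,n} = \{K \in \mathcal{IB}_{\lambda,n} : \rho_K \in \mathcal{D}_e(S^{n-1})\}. \tag{5.9}
\]
By (3.5), the equality $s_\lambda\rho_K^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_L^{n-\lambda}$ is equivalent to $\rho_L^{n-\lambda} = s_\lambda c_{\lambda,n}M^{1-n+\lambda}\rho_K^\lambda$. Both equalities are generally understood in the sense of distributions, for instance,
\[
s_\lambda(\rho_K^\lambda, \varphi) = c_{\lambda,n}^{-1}(\rho_L^{n-\lambda}, M^{1-\lambda}\varphi), \qquad \varphi \in \mathcal{D}(S^{n-1}).
\]
If $K$ (or $L$) is smooth, then $s_\lambda\rho_K^\lambda(\theta) = c_{\lambda,n}^{-1}(M^{1-\lambda}\rho_L^{n-\lambda})(\theta)$ pointwise for every $\theta \in S^{n-1}$.

Theorem 5.8. Let $\lambda < n$, $\lambda \neq 0$.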
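If $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, then the class $\mathcal{I}^n_\lambda$ of $\lambda$-intersection bodies is the closure of the classes $\mathcal{IB}_{\lambda,n}$ and $\mathcal{IB}^\infty_{\lambda,n}$ of $\lambda$-intersection bodies of star bodies in the radial metric:
\[
\mathcal{I}^n_\lambda = \mathrm{cl}\,\mathcal{IB}_{\lambda,n} = \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}. \tag{5.10}
\]
If $\lambda = -2\ell$, $\ell \in \mathbb{N}$, then $\mathcal{I}^n_\lambda \subset \mathrm{cl}\,\mathcal{IB}_{\lambda,n} = \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$.

Proof. STEP 1. We first prove that $\mathcal{I}^n_\lambda \subset \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$. Let $K \in \mathcal{I}^n_\lambda$, i.e., (a) $s_\lambda\rho_K^\lambda = M^{1-\lambda}\mu$, $\mu \in \mathcal{M}_{e+}(S^{n-1})$, if $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, and (b) $\rho_K^{-2\ell} = \tilde M^{1+2\ell}\mu$ otherwise. Our aim is to define a sequence $K_j \in \mathcal{IB}^\infty_{\lambda,n}$ such that $\rho_{K_j} \to \rho_K$ in the $C$-norm. Consider the Poisson integral $\Pi_t\rho_K^\lambda$ (see (2.1)), which converges to $\rho_K^\lambda$ in the $C$-norm when $t \to 1$. In the case (a), for any test function $\omega \in \mathcal{D}(S^{n-1})$ we have
\[
(\Pi_t\rho_K^\lambda, \omega) = (\rho_K^\lambda, \Pi_t\omega) = s_\lambda^{-1}(\mu, M^{1-\lambda}\Pi_t\omega) = s_\lambda^{-1}(M^{1-\lambda}\Pi_t\mu, \omega).
\]
Similarly, in the case (b), we have a pointwise equality $(\Pi_t\rho_K^{-2\ell})(\theta) = (\tilde M^{1+2\ell}\Pi_t\mu)(\theta)$, $\theta \in S^{n-1}$. Choose $K_j$ so that $\rho_{K_j}^\lambda = \Pi_{t_j}\rho_K^\lambda$, where $t_j$ is a sequence in $(0, 1)$ approaching 1. Clearly, $K_j$ converges to $K$ in the radial metric. Moreover, $K_j \in \mathcal{IB}^\infty_{\lambda,n}$, because $\rho_{K_j}^\lambda = s_\lambda^{-1}c_{\lambda,n}^{-1}M^{1-\lambda}\rho_{L_j}^{n-\lambda}$ and $\rho_{K_j}^{-2\ell} = \tilde M^{1+2\ell}\rho_{L_j}^{n+2\ell}$, where the bodies $L_j$ are defined by $\rho_{L_j}^{n-\lambda} = c_{\lambda,n}\Pi_{t_j}\mu$ in the case (a), and $\rho_{L_j}^{n+2\ell} = \Pi_{t_j}\mu$ in the case (b), respectively.

Conversely, let $K \in \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$, $\lambda \neq -2, -4, \ldots$. This means that there is a sequence of bodies $K_j \in \mathcal{IB}^\infty_{\lambda,n}$ such that $\lim_{j\to\infty}\|\rho_K - \rho_{K_j}\|_{C(S^{n-1})} = 0$ and $s_\lambda\rho_{K_j}^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_{L_j}^{n-\lambda}$, $\rho_{L_j} \in \mathcal{D}_{e+}(S^{n-1})$. If $j \to \infty$, then for every $\omega \in \mathcal{D}(S^{n-1})$,
\[
s_\lambda(\rho_{K_j}^\lambda, M^{1-n+\lambda}\omega) \to s_\lambda(\rho_K^\lambda, M^{1-n+\lambda}\omega) = s_\lambda(M^{1-n+\lambda}\rho_K^\lambda, \omega). \tag{5.11}
\]
The right-hand side of (5.11) is non-negative for $\omega \in \mathcal{D}_{e+}(S^{n-1})$, because by (3.5), for every $j$ and every such $\omega$,
\[
s_\lambda(\rho_{K_j}^\lambda, M^{1-n+\lambda}\omega) = c_{\lambda,n}^{-1}(M^{1-\lambda}\rho_{L_j}^{n-\lambda}, M^{1-n+\lambda}\omega) = c_{\lambda,n}^{-1}(\rho_{L_j}^{n-\lambda}, \omega) \ge 0.
\]
By Theorem 9.1, it follows that $s_\lambda M^{1-n+\lambda}\rho_K^\lambda$ is a non-negative measure. We denote it by $\mu$. By (3.5), for any $\omega \in \mathcal{D}(S^{n-1})$,
\[
s_\lambda(\rho_K^\lambda, \omega) = s_\lambda(M^{1-n+\lambda}\rho_K^\lambda, M^{1-\lambda}\omega) = (\mu, M^{1-\lambda}\omega) = (M^{1-\lambda}\mu, \omega),
\]
i.e., $K \in \mathcal{I}^n_\lambda$. This gives $\mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n} \subset \mathcal{I}^n_\lambda$ and, by the above, $\mathcal{I}^n_\lambda = \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$.

STEP 2. It remains to prove that $\mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n} = \mathrm{cl}\,\mathcal{IB}_{\lambda,n}$. Since $\mathcal{IB}^\infty_{\lambda,n} \subset \mathcal{IB}_{\lambda,n}$, we have $\mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n} \subset \mathrm{cl}\,\mathcal{IB}_{\lambda,n}$.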
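To prove the opposite inclusion, let $K \in \mathrm{cl}\,\mathcal{IB}_{\lambda,n}$ and consider the case $\lambda \neq -2, -4, \ldots$. We have to show that there is a sequence of smooth bodies $K_j$ which converges to $K$ in the radial metric and such that $s_\lambda\rho_{K_j}^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_{L_j}^{n-\lambda}$ for some bodies $L_j \in \mathcal{K}^n$. Since $K \in \mathrm{cl}\,\mathcal{IB}_{\lambda,n}$, there is a sequence $\tilde K_j \in \mathcal{K}^n$ such that $\lim_{j\to\infty}\|\rho_{\tilde K_j} - \rho_K\|_{C(S^{n-1})} = 0$ and $s_\lambda\rho_{\tilde K_j}^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_{\tilde L_j}^{n-\lambda}$ for some bodies $\tilde L_j \in \mathcal{K}^n$. We define smooth bodies $K_j$ and $L_j$ by
\[
\rho_{K_j}^\lambda = \Pi_{1-1/j}\,\rho_{\tilde K_j}^\lambda, \qquad \rho_{L_j}^{n-\lambda} = \Pi_{1-1/j}\,\rho_{\tilde L_j}^{n-\lambda},
\]
where $\Pi_{1-1/j}$ stands for the Poisson integral with parameter $1 - 1/j$. Since the operators $\Pi_{1-1/j}$ and $M^{1-\lambda}$ commute, we have $s_\lambda\rho_{K_j}^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_{L_j}^{n-\lambda}$, and therefore, $K_j \in \mathcal{IB}^\infty_{\lambda,n}$. On the other hand, by the properties of the Poisson integral [SW],
\[
|\rho_{K_j}^\lambda - \rho_K^\lambda| \le |\Pi_{1-1/j}\rho_{\tilde K_j}^\lambda - \Pi_{1-1/j}\rho_K^\lambda| + |\Pi_{1-1/j}\rho_K^\lambda - \rho_K^\lambda| \to 0
\]
as $j \to \infty$. This means that $K \in \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$, that is, $\mathrm{cl}\,\mathcal{IB}_{\lambda,n} \subset \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$. Hence, by the above, $\mathrm{cl}\,\mathcal{IB}_{\lambda,n} = \mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$. For $\lambda = -2, -4, \ldots$, the argument is similar. □

Remark 5.9. If $\lambda = -2, -4, \ldots$, we cannot prove the coincidence of $\mathcal{I}^n_\lambda$ and $\mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n}$, because the proof of the embedding $\mathrm{cl}\,\mathcal{IB}^\infty_{\lambda,n} \subset \mathcal{I}^n_\lambda$ relies heavily on the fact that $M^{1-\lambda}$ is an isomorphism of $\mathcal{D}_e(S^{n-1})$. If $\lambda = -2, -4, \ldots$, this is not so, and the operator $\tilde M^{1-\lambda}$ has a nontrivial kernel, which consists of spherical harmonics of degree $> 2\ell$; see [R1] for details.

It is interesting to translate Theorem 5.8 for $\lambda = -p$, $p > 0$, into the language of isometric embeddings. Ignoring an unimportant positive constant factor and using polar coordinates, one can replace the equalities $s_\lambda\rho_K^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_L^{n-\lambda}$ and $\rho_K^{-2\ell} = \tilde M^{1+2\ell}\rho_L^{n+2\ell}$ in Definition 5.7 by
\[
\|u\|_K^p = \int_L |x\cdot u|^p\,dx, \qquad u \in S^{n-1}. \tag{5.12}
\]

Corollary 5.10.
(i) The unit ball of every $n$-dimensional subspace of $L_p$ can be approximated in the radial metric by bodies $K$ defined by (5.12), where $L \in \mathcal{K}^n$ has a $C^\infty$ boundary.
(ii) If, moreover, $p \neq 2, 4, \ldots$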
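, then the set of unit balls of all $n$-dimensional subspaces of $L_p$ can be identified with the closure in the radial metric of the set of bodies $K$ satisfying (5.12) for some smooth body $L \in \mathcal{K}^n$ (one can also consider arbitrary bodies $L \in \mathcal{K}^n$).

5.3. Central sections of λ-intersection bodies. It is known that a cross-section $K \cap \eta$ of a body $K \in \mathcal{I}^n_k$ by any $m$-dimensional central plane $\eta$ is a $k$-intersection body in $\eta$ provided $1 \le k < m < n$. This fact was established in [Mi1, Proposition 3.17] by using Theorem 1.8 and a certain approximation procedure. Below we present more general results, including sections of $k$-intersection bodies of star bodies and the case of non-integer $k = \lambda$. These results are consequences of the restriction theorems from Section 3.3.

Theorem 5.11. Let $1 \le k < m < n$, $\eta \in G_{n,m}$. If $K = \mathcal{IB}_k(L)$ in $\mathbb{R}^n$, then $K \cap \eta = \mathcal{IB}_k(\tilde L)$ in $\eta$, where the body $\tilde L$ is defined by
\[
\rho_{\tilde L}^{m-k}(u) = c_{k,m,n}\int_{S^{n-1}\cap(\eta^\perp\oplus\mathbb{R}u)}\rho_L^{n-k}(w)\,|u\cdot w|^{m-k-1}\,dw, \qquad u \in S^{n-1}\cap\eta, \tag{5.13}
\]
\[
c_{k,m,n} = \frac{(m-k)\,\sigma_{n-m}}{2(n-k)}.
\]

Proof. By (5.7) and (3.21) (with $f = \rho_L^{n-k}$), for $\xi \in G_k(\eta)$,
\begin{align*}
\mathrm{vol}_k(K \cap \xi) = \mathrm{vol}_{n-k}(L \cap \xi^\perp)
 &= \frac{\sigma_{n-k-1}}{n-k}\,(R_{n-k}\rho_L^{n-k})(\xi^\perp)
  = \frac{c\,\sigma_{n-k-1}}{n-k}\,(R_{m-k}T^k_\eta\rho_L^{n-k})(\xi^\perp\cap\eta) \tag{5.14}\\
 &= \frac{\sigma_{m-k-1}}{m-k}\,(R_{m-k}\rho_{\tilde L}^{m-k})(\xi^\perp\cap\eta) = \mathrm{vol}_{m-k}(\tilde L \cap \xi^\perp),
\end{align*}
as desired. □

Theorem 5.11 has the following generalization.

Theorem 5.12. Let $1 < m < n$, $\eta \in G_{n,m}$ and suppose that $\lambda < m$, $\lambda \neq 0$. If $K = \mathcal{IB}_\lambda(L)$ in $\mathbb{R}^n$, then $K \cap \eta = \mathcal{IB}_\lambda(\tilde L)$ in $\eta$, where the body $\tilde L$ is defined by
\[
\rho_{\tilde L}^{m-\lambda}(u) = \tilde c\int_{S^{n-1}\cap(\eta^\perp\oplus\mathbb{R}u)}\rho_L^{n-\lambda}(w)\,|u\cdot w|^{m-\lambda-1}\,dw, \qquad u \in S^{n-1}\cap\eta, \tag{5.15}
\]
\[
\tilde c = \begin{cases} \dfrac{(m-\lambda)\,\sigma_{n-m}}{2(n-\lambda)} & \text{if } \lambda \neq -2\ell,\ \ell \in \mathbb{N},\\[2mm] \pi^{(m-n)/2}\,\sigma_{n-m}/2 & \text{otherwise.} \end{cases}
\]
Moreover, if $K \in \mathcal{I}^n_\lambda$ in $\mathbb{R}^n$, then $K \cap \eta \in \mathcal{I}^m_\lambda$ in $\eta$.

Proof. Let $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, and let $\theta \in S^{n-1}\cap\eta$. By Definition 5.7, $s_\lambda\rho_K^\lambda = c_{\lambda,n}^{-1}M^{1-\lambda}\rho_L^{n-\lambda}$, and Theorem 3.12 (with $f = s_\lambda\rho_K^\lambda$ and $g = c_{\lambda,n}^{-1}\rho_L^{n-\lambda}$) yields
\[
s_\lambda\rho_K^\lambda(\theta) = (M^{1-\lambda}_{S^{n-1}\cap\eta}\,T^\lambda_\eta[c_{\lambda,n}^{-1}\rho_L^{n-\lambda}])(\theta) = c_{\lambda,m}^{-1}\,(M^{1-\lambda}_{S^{n-1}\cap\eta}\,\rho_{\tilde L}^{m-\lambda})(\theta),
\]
where $\rho_{\tilde L}^{m-\lambda} = c\,T^\lambda_\eta\rho_L^{n-\lambda}$, $c = \pi^{(n-m)/2}(m-\lambda)/(n-\lambda)$. By Definition 5.7 and (3.20), we are done.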
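As a consistency check on the constants, one can verify that the factor $c$ from the proof, combined with the normalizing constant $\pi^{(m-n)/2}\sigma_{n-m}/2$ of $T^\lambda_\eta$ in (3.20), reproduces the constant $\tilde c$ of (5.15) for $\lambda \neq -2\ell$. The sketch below assumes that $c = c_{\lambda,m}/c_{\lambda,n}$, with $c_{\lambda,n} = \pi^{\lambda-n/2}(n-\lambda)/\lambda$ as in Definition 5.7 and $c_{\lambda,m}$ defined in the same way with $n$ replaced by $m$.

```latex
\begin{align*}
c = \frac{c_{\lambda,m}}{c_{\lambda,n}}
  &= \frac{\pi^{\lambda-m/2}\,(m-\lambda)/\lambda}{\pi^{\lambda-n/2}\,(n-\lambda)/\lambda}
   = \pi^{(n-m)/2}\,\frac{m-\lambda}{n-\lambda},\\[1mm]
c\cdot\frac{\pi^{(m-n)/2}\,\sigma_{n-m}}{2}
  &= \frac{(m-\lambda)\,\sigma_{n-m}}{2(n-\lambda)} = \tilde c .
\end{align*}
```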
If $\lambda = -2\ell$, $\ell \in \mathbb{N}$, then, as above,
\[
\rho_K^{-2\ell}(\theta) = (\tilde M^{1+2\ell}_{S^{n-1}\cap\eta}\,T^{-2\ell}_\eta\rho_L^{n+2\ell})(\theta) = (\tilde M^{1+2\ell}_{S^{n-1}\cap\eta}\,\rho_{\tilde L}^{m+2\ell})(\theta),
\]
where $\rho_{\tilde L}^{m+2\ell} = T^{-2\ell}_\eta\rho_L^{n+2\ell}$. This gives (5.15). Furthermore, if $K \in \mathcal{I}^n_\lambda$, $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, then, by Definition 5.4, $s_\lambda\rho_K^\lambda = M^{1-\lambda}\mu$, $\mu \in \mathcal{M}_{e+}(S^{n-1})$. Hence, by Theorem 3.12, there is a measure $\nu \in \mathcal{M}_{e+}(S^{n-1}\cap\eta)$ such that the restriction of $s_\lambda\rho_K^\lambda$ onto $S^{n-1}\cap\eta$ is represented as $s_\lambda\rho_K^\lambda = M^{1-\lambda}_{S^{n-1}\cap\eta}\nu$. This means that $K \cap \eta \in \mathcal{I}^m_\lambda$ in $\eta$. In the case $\lambda = -2\ell$, $\ell \in \mathbb{N}$, the argument is similar. □

6. Examples of λ-intersection bodies

The definition of the classes $\mathcal{I}^n_\lambda$ and $\mathcal{IB}_{\lambda,n}$ and all known characterizations are purely analytic. Unlike the case $\lambda = 1$, when an intersection body of a star body is explicitly defined by a simple geometric procedure, it is not clear how to construct $\lambda$-intersection bodies in the general case. Below we give some examples in which the radial function of a $\lambda$-intersection body can be explicitly determined. These examples utilize the generalized cosine transforms.

Example 6.1. Let $\lambda < 1$, $\lambda \neq 0$. This case is the simplest. Indeed, given a non-negative measure $\mu$ on $S^{n-1}$, the relevant $\lambda$-intersection body can be directly constructed by the formula $\rho_K^\lambda = M^{1-\lambda}\mu$ if $\lambda \neq -2\ell$, $\ell \in \mathbb{N}$, and $\rho_K^{-2\ell} = \tilde M^{1+2\ell}\mu$ otherwise. In other words (cf. (2.11)),
\[
\rho_K^\lambda(u) = \int_{S^{n-1}}|\theta\cdot u|^{-\lambda}\,d\mu(\theta). \tag{6.1}
\]
This fact (with $\lambda$ replaced by $-p$) is a reformulation of Theorem 6.17 from [K4], which was stated in the language of isometric embeddings and relies on the P. Lévy characterization; see also Lemma 6.4 and Theorem 4.11 in [K4].

Example 6.2. If $n - 3 \le \lambda < n$, $\lambda > 0$, then $\mathcal{I}^n_\lambda$ includes all origin-symmetric convex bodies in $\mathbb{R}^n$. This fact is due to Koldobsky [K4, Corollary 4.9]. It can be proved using a slight modification of the argument from [R3, Sec. 7] as follows. By Theorem 5.1 (c), it suffices to check that for any o.s. convex body $K$, $M^{1+\lambda-n}\rho_K^\lambda \in \mathcal{M}_{e+}(S^{n-1})$. For $\lambda \ge n - 1$, this is obvious. To handle the case $n - 3 \le \lambda < n - 1$, suppose first that $K$ is infinitely smooth.
Using polar coordinates, for Reα > 0, we can write (6.2) (Mαρα+n−1K )(u) = (α + n− 1) γn(α) |x · u|α−1 dx. Then M1+λ−nρλK can be realized as analytic continuation (a.c.) at α = 1 + λ− n of the right-hand side of (6.2). The latter can be written as I(α) = 2(α+ n− 1)γn(α) tα−1AK,u(t) dt, AK,u(t) = voln−1(K∩{tu+u⊥}). Taking analytic continuation (see [GS, Chapter 1]), for −2 < α < 0 (which is equivalent to n−3 ≤ λ < n−1) we get a.c.I(α) = c1 tα−1[AK,u(t)−AK,u(0)] dt. Similarly, a.c.I(α) at α = −2 (which corresponds to λ = n − 3) is K,u(0). Following [GS], one can easily check that constants c1 and c2 are negative. Since K is convex, both analytic continuations are positive, and thereforeM1+λ−nρλK > 0. If K is an arbitrary o.s. convex body, we approximate it in the radial metric by smooth o.s. convex bodies Kj. Then for any test function ω ∈ D+(Sn−1), by the previous step we have (M1+λ−nρλK , ω) = (ρ 1+λ−nω) = lim (ρλKj ,M 1+λ−nω) = lim (M1+λ−nρλKj , ω) ≥ 0. INTERSECTION BODIES 29 Hence, by Theorem 9.1, M1+λ−nρλK is a non-negative measure and the proof is complete. Example 6.3. If ρλK = Ri−λn−iν for some ν ∈ M+(Gn,n−i) and λ ≤ i < n, then K ∈ Inλ . Indeed, for any test function ω ∈ D(Sn−1), by (3.12) (with α = 1−λ) we have (ρλK , ω) = ( Ri−λn−iν, ω) = (ν, R n−iω) = (ν ⊥, Ri−λn−i,⊥ω) = c−1(ν⊥, RiM 1−λω) = c−1(R∗i ν ⊥,M1−λω), c = 2π(i−1)/2 It means that for 0 < λ ≤ i < n and ν ∈ M+(Gn,n−i), (6.3) ρλK = Ri−λn−iν ⇐⇒ {ρλK =M1−λµ, µ = c−1R∗i ν⊥}. By Definition 5.4, this gives the result. The particular case λ = i implies the embedding into Ini of the Zhang’s class Znn−i; see Definition 1.5. This embedding was proved in [K3] and [Mi1] in a different way; see also [Mi2], where it is proved that Znn−i is a proper subset of Ini if 2 ≤ i ≤ n− 2. Example 6.4. If 0 < (i− 1)/2 < λ ≤ i < n and ρλK = M i−λµ for some µ ∈ M+(Sn−1), then K ∈ Inλ . 
Indeed, by Lemma 3.4 (with $\alpha = i - \lambda$, $\beta = 1 - \lambda$), $\rho_K^\lambda = M^{i-\lambda}\mu = M^{1-\lambda}A_{i-\lambda,1-\lambda}\mu$, where $A_{i-\lambda,1-\lambda}$ is an integral operator which preserves positivity provided $i - \lambda > 1 - \lambda > 1 - n$ and $(i - \lambda) + (1 - \lambda) < 2$. This is just our case.

Example 6.5. One can construct bodies $K \in \mathcal{I}^n_\lambda$ from bodies $L \in \mathcal{I}^n_\delta$ by the formula $\rho_K^\lambda = \rho_L^\delta$ provided $0 < \delta < \lambda < n$. Indeed, by Definition 5.4, there is a measure $\mu \in \mathcal{M}_+(S^{n-1})$ so that $\rho_L^\delta = M^{1-\delta}\mu$. Then, by Lemma 3.4 (with $\alpha = 1 - \delta$, $\beta = 1 - \lambda$), $\rho_K^\lambda = \rho_L^\delta = M^{1-\delta}\mu = M^{1-\lambda}A_{1-\delta,1-\lambda}\mu$, and we are done. This example generalizes the corresponding result from [Mi1, p. 533, Statement (c)], which was obtained in a different way for the case when $\lambda$ and $\delta$ are integers.

Example 6.6. Let
\[
B^n_q = \Big\{x \in \mathbb{R}^n : \|x\|_q = \Big(\sum_{k=1}^n |x_k|^q\Big)^{1/q} \le 1\Big\}. \tag{6.4}
\]
If $0 < q \le 2$, then $B^n_q \in \mathcal{I}^n_\lambda$ for all $\lambda \in (0, n)$. If $2 < q < \infty$, $\lambda \in (0, n)$, then $B^n_q \in \mathcal{I}^n_\lambda$ if and only if $\lambda \ge n - 3$. Both statements are due to Koldobsky. The first one follows from the fact that for $0 < q \le 2$ the Fourier transform of $\|x\|_q^{-\lambda}$ is a positive $\mathcal{S}'$-distribution (see Lemmas 3.6 and 2.27 in [K4]). The second statement is a reformulation of Theorem 4.13 from [K4]. The "if" part is a consequence of Example 6.2.

7. (q, ℓ)-balls

In this section we consider one more example, which resembles Example 6.6 but does not fall into its scope and requires a separate consideration. Let
\[
x = (x', x'') \in \mathbb{R}^n, \qquad x' \in \mathbb{R}^{n-\ell} = \bigoplus_{j=1}^{n-\ell}\mathbb{R}e_j, \qquad x'' \in \mathbb{R}^\ell = \bigoplus_{j=n-\ell+1}^{n}\mathbb{R}e_j,
\]
$e_1, \ldots, e_n$ being coordinate unit vectors. Consider the $(q, \ell)$-ball
\[
B^n_{q,\ell} = \{x : \|x\|_{q,\ell} = (|x'|^q + |x''|^q)^{1/q} \le 1\}, \qquad q > 0. \tag{7.1}
\]
We ask for which triples $(q, \ell, n)$ the ball $B^n_{q,\ell}$ is a $\lambda$-intersection body. To study this problem, we need some preparation. Consider the Fourier integral
\[
\gamma_{q,\ell}(\eta) = \int_{\mathbb{R}^\ell} e^{-|y|^q}\,e^{iy\cdot\eta}\,dy, \qquad \eta \in \mathbb{R}^\ell, \quad q > 0. \tag{7.2}
\]
The function $\gamma_{q,\ell}(\eta)$ is uniformly continuous on $\mathbb{R}^\ell$ and vanishes at infinity.

Lemma 7.1. If $0 < q \le 2$, then $\gamma_{q,\ell}(\eta) > 0$ for all $\eta \in \mathbb{R}^\ell$.

Proof. (Cf. [K4, p. 44, for $\ell = 1$].) For $\eta = 0$, the statement is obvious.
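For completeness, the value at $\eta = 0$ can be computed in polar coordinates. The sketch below assumes that (7.2) reads $\gamma_{q,\ell}(\eta) = \int_{\mathbb{R}^\ell} e^{-|y|^q}e^{iy\cdot\eta}\,dy$ and writes $\sigma_{\ell-1} = 2\pi^{\ell/2}/\Gamma(\ell/2)$ for the area of the unit sphere $S^{\ell-1}$.

```latex
% Polar coordinates y = r*theta, then the substitution s = r^q.
\begin{align*}
\gamma_{q,\ell}(0) = \int_{\mathbb{R}^\ell} e^{-|y|^q}\,dy
  = \sigma_{\ell-1}\int_0^\infty r^{\ell-1}e^{-r^q}\,dr
  = \frac{\sigma_{\ell-1}}{q}\,\Gamma\!\Big(\frac{\ell}{q}\Big) > 0 .
\end{align*}
% For example, ell = 1 gives sigma_0 = 2 and gamma_{q,1}(0) = (2/q) * Gamma(1/q).
```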
It is known (see, e.g., [SW]), that (7.3) [e−t|· | ]∧(η) = πℓ/2t−ℓ/2e−|η| 2/4t, t > 0. This gives the result for q = 2. Let 0 < q < 2. By Bernstein’s theorem [F, Chapter 18, Sec. 4], there is a non-negative finite measure µq on [0,∞) so that e−zq/2 = e−tz dµq(t), z ∈ [0,∞). Replace z by |y|2 to (7.4) e−|y| e−t|y| dµq(t). Then (7.3) yields γq,ℓ(η) = eiy·ηdy e−t|y| dµq(t) = dµq(t) eiy·ηe−t|y| = πℓ/2 t−ℓ/2e−|η| 2/4t dµq(t) > 0. INTERSECTION BODIES 31 The Fubini theorem is applicable here, because, by (7.4), |eiy·η|dy e−t|y| dµq = e−|y| dy <∞. Our next concern is the behavior of γq,ℓ(η) when |η| → ∞. If q is even, then e−|·| is a Schwartz function and therefore, γq,ℓ is infin- itely smooth and rapidly decreasing. In the general case, we have the following. Lemma 7.2. For any q > 0, (7.5) lim |η|→∞ |η|ℓ+qγq,ℓ(η) = 2ℓ+qπℓ/2−1Γ(1+ q/2)Γ((ℓ+ q)/2) sin(πq/2). Proof. For ℓ = 1, this statement can be found in [PS, Chapter 3, Prob- lem 154] and in [K4, p. 45]. In the general case, the proof is more sophisticated and relies on the properties of Bessel functions. By the well-known formula for the Fourier transform of a radial function (see, e.g., [SW]), we write γq,ℓ(η) = I(|η|), where I(s) = (2π)ℓ/2s1−ℓ/2 rℓ/2Jℓ/2−1(rs) dr = (2π)ℓ/2s−ℓ [(rs)ℓ/2Jℓ/2(rs)] dr. Integration by parts yields I(s) = q(2π)ℓ/2s−ℓ/2 rℓ/2+q−1Jℓ/2(rs) dr. Changing variable z = sqrq, we obtain sℓ+qI(s) = (2π)ℓ/2A(s1/q), A(δ) = e−zδzℓ/2qJℓ/2(z 1/q) dz. We actually have to compute the limit A0 = lim A(δ). To this end, we invoke Hankel functions H ν (z), so that Jν(z) = ReH ν (z) if z is real [Er]. Let hν(z) = z ν (z). This is a single-valued analytic function in the z-plane with cut (−∞, 0]. Using the properties of the Bessel functions [Er], we get (7.6) lim hν(z) = 2 νΓ(ν)/πi, (7.7) hν(z) ∼ 2/π zν−1/2eiz− (ν+ 1 ), z → ∞. Then we write A(δ) as A(δ) = Re e−zδhℓ/2(z 1/q) dz and change the line of integration from [0,∞) to ℓθ = {z : z = reiθ, r > 0} for 32 BORIS RUBIN small θ < πq/2. 
By Cauchy’s theorem, owing to (7.6) and (7.7), we obtain A(δ) = Re e−zδhℓ/2(z 1/q) dz. Since for z = reiθ, hℓ/2(z 1/q) = O(1) when r = |z| → 0 and hℓ/2(z1/q) = O(r(ℓ−1)/2qe−r 1/q sin(θ/q)) as r → ∞, by the Lebesgue theorem on dominated convergence, we get A0 = Re hℓ/2(z 1/q) dz. To evaluate the last integral, we again use analyticity and replace ℓθ by ℓπq/2 = {z : z = reiπq/2, r > 0} to get A0 = Re eiπq/2 hℓ/2(r 1/qeiπ/2) dr To finalize calculations, we invoke McDonald’s function Kν(z) so that hν(z) = z νH(1)ν (z) = − (ze−iπ/2)νKν(ze −iπ/2). This gives sin(πq/2) sℓ/2+q−1Kℓ/2(s) ds. The last integral can be explicitly evaluated by the formula 2.16.2 (2) from [PBM], and we obtain the result. � Now we can proceed to studying (q, ℓ)-balls Bnq,ℓ; see (7.1). There is an intimate connection between geometric properties of the balls Bnq,ℓ and the Fourier transform of the power function || · ||pq,ℓ. The case q = 2 is well-known and associated with Riesz potentials; see, e.g., [St]. The relevant case of ℓnq -balls, which agrees with ℓ = 1 was considered in Example 6.6. Lemma 7.3. Let q > 0, ξ = (ξ′, ξ′′) ∈ Rn, γq,ℓ(ξ′′) and γq,n−ℓ(ξ′) be the functions of the form (7.2). We define (7.8) hp,q,ℓ(ξ) = Γ(−p/q) tn+p−1 γq,n−ℓ(ξ ′t) γq,ℓ(ξ ′′t) dt. (i) Let ξ′ 6= 0 and ξ′′ 6= 0. If q is even, then the integral (7.8) is abso- lutely convergent for all p > −n. Otherwise, it is absolutely convergent when −n < p < 2q. In these cases, hp,q,ℓ (ξ) is a locally integrable function away from the coordinate subspaces Rℓ and Rn−ℓ. (ii) If −n < p < 0, then hp,q,ℓ (ξ) ∈ L1loc(Rn)∩S ′(Rn) and (||·|| ∧(ξ) = hp,q,ℓ(ξ) in the sense of S ′-distributions. Specifically, for ϕ ∈ S(Rn), (7.9) 〈hp,q,ℓ , ϕ̂〉 = (2π)n〈|| · ||pq,ℓ , ϕ〉. INTERSECTION BODIES 33 Proof. (i) For any 0 < ε < a <∞, ε<|ξ′| −n. The second integral can be estimated by making use of Lemma 7.2. Specifically, if q is not an even integer, then I2 ≤ cε tp−1 dt |z′|>tε |z′|n−ℓ+q |z′′|>tε |z′′|ℓ+q tp−2q−1 dt. 
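If $q$ is even, then $\gamma_{q,\ell}$ and $\gamma_{q,n-\ell}$ are rapidly decreasing and $I_2 \le c\int_1^\infty t^{p-2m-1}\,dt$ for any $m > 0$, with a constant $c$ depending on $\varepsilon$ and $m$. This gives what we need.

(ii) If $-n < p < 0$, the same argument is applicable with $\varepsilon = 0$. In this case, $I_2$ does not exceed $\|\gamma_{q,n-\ell}\|_1\,\|\gamma_{q,\ell}\|_1\int_1^\infty t^{p-1}\,dt$. The latter is finite when $p < 0$, because, by Lemma 7.2, $\gamma_{q,n-\ell}$ and $\gamma_{q,\ell}$ are integrable functions on the respective spaces. When $\xi \to \infty$, one can readily check that $h_{p,q,\ell}(\xi) = O(|\xi|^m)$ for some $m > 0$, and therefore, $h_{p,q,\ell} \in \mathcal{S}'(\mathbb{R}^n)$. To compute the Fourier transform $(\|\cdot\|^p_{q,\ell})^\wedge(\xi)$, we represent $\|x\|^p_{q,\ell}$ by the formula
\[
\|x\|^p_{q,\ell} = \frac{q}{\Gamma(-p/q)}\int_0^\infty t^{p-1}\,e^{-|x'/t|^q-|x''/t|^q}\,dt, \qquad p < 0,
\]
and note that the Fourier transform of the function $x \to e^{-|x'/t|^q-|x''/t|^q}$ is just $t^n\,\gamma_{q,n-\ell}(\xi' t)\,\gamma_{q,\ell}(\xi'' t)$. Then
\begin{align*}
\langle(\|\cdot\|^p_{q,\ell})^\wedge, \hat\varphi\rangle = (2\pi)^n\langle\|\cdot\|^p_{q,\ell}, \varphi\rangle
 &= \frac{(2\pi)^n q}{\Gamma(-p/q)}\int_0^\infty t^{p-1}\,dt\int_{\mathbb{R}^n} e^{-|x'/t|^q-|x''/t|^q}\,\varphi(x)\,dx\\
 &= \frac{q}{\Gamma(-p/q)}\int_0^\infty t^{n+p-1}\,dt\int_{\mathbb{R}^n}\gamma_{q,n-\ell}(\xi' t)\,\gamma_{q,\ell}(\xi'' t)\,\hat\varphi(\xi)\,d\xi.
\end{align*}
The interchange of the order of integration in this argument can be easily justified using the absolute convergence of the integrals under consideration. □

Theorem 7.4. If $0 < q \le 2$, $0 < \ell < n$, then $B^n_{q,\ell}$ is a $\lambda$-intersection body for any $0 < \lambda < n$.

Proof. Owing to Lemma 7.1, the function (7.8) (with $p$ replaced by $-\lambda$) is positive, and therefore, by Lemma 7.3, $\|\cdot\|^{-\lambda}_{q,\ell}$ represents a positive definite distribution. Now the result follows by Theorem 5.1. □

Consider the case $q > 2$. In this case $B^n_{q,\ell}$ is convex, and, owing to Example 6.2, $B^n_{q,\ell} \in \mathcal{I}^n_\lambda$ for all $n - 3 \le \lambda < n$. What about $\lambda < n - 3$? This case is especially intriguing.

Proposition 7.5. If $q > 2$ and $0 < \lambda < \max(n - \ell, \ell) - 2$, then $\|\cdot\|^{-\lambda}_{q,\ell}$ is not a positive definite distribution and therefore, $B^n_{q,\ell} \notin \mathcal{I}^n_\lambda$.

Proof. Let $0 < \lambda < n - \ell - 2$ and suppose the contrary, that $B^n_{q,\ell} \in \mathcal{I}^n_\lambda$. Consider the section of $B^n_{q,\ell}$ by the $(n - \ell + 1)$-dimensional plane $\eta = \mathbb{R}e_n \oplus \mathbb{R}^{n-\ell}$. By Theorem 5.12, $B^n_{q,\ell} \cap \eta \in \mathcal{I}^{n-\ell+1}_\lambda$ in $\eta$, and therefore $\|x_n e_n + x'\|^{-\lambda}_{q,\ell} = (|x_n|^q + |x'|^q)^{-\lambda/q}$ is a positive definite distribution in $\eta$.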
By the second derivative test (see [K4, Theorem 4.19]) this is impossible if $0 < \lambda < n - \ell - 2$. A similar contradiction can be obtained if we assume $0 < \lambda < \ell - 2$ and consider the section of $B^n_{q,\ell}$ by the $(\ell + 1)$-dimensional plane $\mathbb{R}e_1 \oplus \mathbb{R}^\ell$. □

Proposition 7.5 can be proved without using the second derivative test and Theorem 5.12 on sections of $\lambda$-intersection bodies; see [R4]. The bounds for $\lambda$ appear to be the same.

Open problem. Let $q > 2$, $\ell > 1$. Is $B^n_{q,\ell}$ a $\lambda$-intersection body if $\max(n - \ell, \ell) - 2 < \lambda < n - 3$? This problem does not occur in the case $\ell = 1$, as in Example 6.6.

8. The generalized cosine transforms and comparison of volumes

For $1 < i < n$, let $\mathrm{vol}_i(\cdot)$ denote the $i$-dimensional volume function. Suppose that $i$ is fixed, and let $A$ and $B$ be o.s. convex bodies in $\mathbb{R}^n$ satisfying
\[
\mathrm{vol}_i(A \cap \xi) \le \mathrm{vol}_i(B \cap \xi) \qquad \forall \xi \in G_{n,i}. \tag{8.1}
\]
Does it follow that
\[
\mathrm{vol}_n(A) \le \mathrm{vol}_n(B)\,? \tag{8.2}
\]
This question is known as the Generalized Busemann-Petty Problem (GBP); see [G], [RZ], [Z1].

Theorem 8.1. If GBP (8.1)-(8.2) has an affirmative answer, then every smooth origin-symmetric convex body with positive curvature in $\mathbb{R}^n$ is an $(n - i)$-intersection body.

Proof. Suppose that $B$ is an o.s. convex body in $\mathbb{R}^n$ such that the radial function $\rho_B$ is infinitely smooth, the boundary of $B$ has positive curvature and $B \notin \mathcal{I}^n_{n-i}$. By Definition 5.4, there is a function $\varphi \in \mathcal{D}_e(S^{n-1})$, which is negative on some open origin-symmetric set $\Omega \subset S^{n-1}$ and such that $\rho_B^{n-i} = M^{1+i-n}\varphi$. We choose a function $h \in \mathcal{D}_e(S^{n-1})$ so that $h \not\equiv 0$, $h(\theta) \ge 0$ if $\theta \in \Omega$ and $h(\theta) \equiv 0$ otherwise. Define an o.s. smooth body $A$ by $\rho_A^i = \rho_B^i - \varepsilon M^{1-i}h$, $\varepsilon > 0$. If $\varepsilon$ is small enough, then $A$ is convex. Since by (3.12), $R_i M^{1-i}h = c\,R^0_{n-i,\perp}h \ge 0$, we have $R_i\rho_A^i \le R_i\rho_B^i$, which gives (8.1). On the other hand, by (3.5),
\[
(\rho_B^{n-i}, \rho_B^i - \rho_A^i) = \varepsilon\,(M^{1+i-n}\varphi, M^{1-i}h) = \varepsilon\,(\varphi, h) < 0,
\]
or $(\rho_B^{n-i}, \rho_B^i) < (\rho_B^{n-i}, \rho_A^i)$. By Hölder's inequality, the right-hand side does not exceed $(\rho_B^{n-i}, \rho_B^i)^{(n-i)/n}\,(\rho_A^{n-i}, \rho_A^i)^{i/n}$; since $\mathrm{vol}_n(K)$ is a constant multiple of $(\rho_K^{n-i}, \rho_K^i) = \int_{S^{n-1}}\rho_K^n(\theta)\,d\theta$, this implies $\mathrm{vol}_n(B) < \mathrm{vol}_n(A)$, which contradicts (8.2). □

Remark 8.2.
As we noted in the Introduction, Theorem 8.1 is not new, and its proof given in [K3] relies on a sequence of deep facts from functional analysis. The proof presented above is much more elementary and constructive. For instance, it allows us to keep invariance properties of the bodies under control. This advantage was essentially used in our paper [R4].

Theorem 8.1 and Proposition 7.5 imply the following

Corollary 8.3. Let $1 \le \ell \le n/2$; $i > \ell+2$, $B = B^n_{4,\ell}$ (see (7.1)). Then there is a smooth o.s. convex body $A$ in $\mathbb{R}^n$ so that (8.1) holds but (8.2) fails.

Setting $\ell = 1$ in this statement, we obtain the well-known Bourgain-Zhang theorem, which states that GBP has a negative answer when $3 < i < n$; see [BZ], [K4], [RZ] on this subject. For $i = 2$ and $i = 3$ ($n \ge 5$) the GBP is still open. An affirmative answer in these cases was obtained in [R4] for bodies having a certain additional symmetry.

9. Appendix

Every positive distribution $F \in S'(\mathbb{R}^n)$ is given by a tempered non-negative measure $\mu$, i.e., $\langle F, \phi\rangle = \int_{\mathbb{R}^n} \phi(x)\,d\mu(x)$; see, e.g., [GV, p. 147]. For convenience of the reader, we present a similar fact for the sphere.

Theorem 9.1. A distribution $f \in \mathcal{D}'(S^{n-1})$ is positive if and only if there is a measure $\mu \in \mathcal{M}_+(S^{n-1})$ such that $(f, \varphi) = \int_{S^{n-1}} \varphi(\theta)\,d\mu(\theta)$ for all $\varphi \in \mathcal{D}(S^{n-1})$.

Proof. This statement is known; however, we could not find a precise reference and decided to give a proof for convenience of the reader. The "if" part is obvious. To prove the "only if" part, we write a test function $\varphi \in \mathcal{D}(S^{n-1})$ as a sum $\varphi = \varphi_1 + i\varphi_2$, where $\varphi_1 = \mathrm{Re}\,\varphi$, $\varphi_2 = \mathrm{Im}\,\varphi$. Since $-\|\varphi\|_{C(S^{n-1})} \le \varphi_j \le \|\varphi\|_{C(S^{n-1})}$, $j = 1, 2$, and $f$ is positive, then $-(f, 1)\,\|\varphi\|_{C(S^{n-1})} \le (f, \varphi_j) \le (f, 1)\,\|\varphi\|_{C(S^{n-1})}$, and therefore, $|(f, \varphi)| \le |(f, \varphi_1)| + |(f, \varphi_2)| \le 2(f, 1)\,\|\varphi\|_{C(S^{n-1})}$. Since $\mathcal{D}(S^{n-1})$ is dense in $C(S^{n-1})$, then $f$ extends as a linear continuous functional $\tilde f$ on $C(S^{n-1})$ and, by the Riesz theorem, there is a measure $\mu$ on $S^{n-1}$ such that $(\tilde f, \omega) = \int_{S^{n-1}} \omega(\theta)\,d\mu(\theta)$ for every $\omega \in C(S^{n-1})$.
In particular, $(f, \varphi) = (\tilde f, \varphi) = \int_{S^{n-1}} \varphi(\theta)\,d\mu(\theta)$ for every $\varphi \in \mathcal{D}(S^{n-1})$. By taking into account that every non-negative function $\omega \in C(S^{n-1})$ can be uniformly approximated by non-negative functions $\varphi_k \in \mathcal{D}(S^{n-1})$ (for instance, by Poisson integrals of $\omega$), we get

$$\int_{S^{n-1}} \omega(\theta)\,d\mu(\theta) = \lim_{k\to\infty} \int_{S^{n-1}} \varphi_k(\theta)\,d\mu(\theta) = \lim_{k\to\infty} (f, \varphi_k) \ge 0.$$

The latter means that $\mu$ is non-negative. □

References

[BZ] J. Bourgain, G. Zhang, On a generalization of the Busemann-Petty problem, Convex geometric analysis (Berkeley, CA, 1996), 65–76, Math. Sci. Res. Inst. Publ., 34, Cambridge Univ. Press, Cambridge, 1999.
[Er] A. Erdélyi (Editor), Higher transcendental functions, Vol. II, McGraw-Hill, New York, 1953.
[FGW] H. Fallert, P. Goodey, W. Weil, Spherical projections and centrally symmetric sets, Advances in Math., 129 (1997), 301–322.
[F] W. Feller, An introduction to probability theory and its application, Wiley & Sons, New York, 1971.
[G] R.J. Gardner, Geometric tomography, Cambridge University Press, New York, 1995; updates in http://www.ac.wwu.edu/~gardner/.
[GGG] I. M. Gel'fand, S. G. Gindikin, and M. I. Graev, Selected topics in integral geometry, Translations of Mathematical Monographs, AMS, Providence, Rhode Island, 2003.
[GS] I. M. Gelfand, G.E. Shilov, Generalized functions, vol. 1, Properties and Operations, Academic Press, New York, 1964.
[GV] I. M. Gelfand, N. Ya. Vilenkin, Generalized functions, vol. 4, Applications of harmonic analysis, Academic Press, New York, 1964.
[GLW] P. Goodey, E. Lutwak, W. Weil, Functional analytic characterizations of classes of convex bodies, Math. Z. 222 (1996), 363–381.
[GW] P. Goodey, W. Weil, Intersection bodies and ellipsoids, Mathematika, 42 (1995), 295–304.
[GZ] E.L. Grinberg, G. Zhang, Convolutions, transforms, and convex bodies, Proc. London Math. Soc. (3), 78 (1999), 77–115.
[He] S. Helgason, The Radon transform, Birkhäuser, Boston, Second edition, 1999.
[K1] A. Koldobsky, Intersection bodies in $\mathbb{R}^4$, Adv. Math., 136 (1998), 1–14.
[K2] A. Koldobsky, A generalization of the Busemann-Petty problem on sections of convex bodies, Israel J. Math. 110 (1999), 75–91.
[K3] A. Koldobsky, A functional analytic approach to intersection bodies, Geom. Funct. Anal., 10 (2000), 1507–1526.
[K4] A. Koldobsky, Fourier analysis in convex geometry, Mathematical Surveys and Monographs, 116, AMS, 2005.
[Le] C. Lemoine, Fourier transforms of homogeneous distributions, Ann. Scuola Norm. Super. Pisa Sci. Fis. e Mat., 26 (1972), No. 1, 117–149.
[Lu] E. Lutwak, Intersection bodies and dual mixed volumes, Adv. in Math. 71 (1988), 232–261.
[Mi1] E. Milman, Generalized intersection bodies, J. Funct. Anal., 240 (2006), 530–567.
[Mi2] E. Milman, Generalized intersection bodies are not equivalent, math.FA/0701779.
[Mü] Cl. Müller, Spherical harmonics, Springer, Berlin, 1966.
[Ne] U. Neri, Singular integrals, Springer, Berlin, 1971.
[PS] G. Pólya, G. Szegő, Aufgaben und Lehrsätze aus der Analysis, Springer-Verlag, Berlin-New York, 1964.
[PBM] A. P. Prudnikov, Y. A. Brychkov, O. I. Marichev, Integrals and series: special functions, Gordon and Breach Sci. Publ., New York - London, 1986.
[R1] B. Rubin, Inversion of fractional integrals related to the spherical Radon transform, Journal of Functional Analysis, 157 (1998), 470–487.
[R2] B. Rubin, Inversion formulas for the spherical Radon transform and the generalized cosine transform, Advances in Appl. Math. 29 (2002), 471–
[R3] B. Rubin, Notes on Radon transforms in integral geometry, Fractional Calculus and Applied Analysis, 6 (2003), 25–72.
[R4] B. Rubin, The lower dimensional Busemann-Petty problem for bodies with the generalized axial symmetry, math.FA/0701317.
[R5] B. Rubin, Generalized cosine transforms and classes of star bodies, math.FA/0602540.
[RZ] B. Rubin, G. Zhang, Generalizations of the Busemann-Petty problem for sections of convex bodies, J. Funct. Anal., 213 (2004), 473–501.
[Sa1] S. G. Samko, The Fourier transform of the functions $Y_m(x/|x|)/|x|^{n+\alpha}$, Soviet Math. (Iz. VUZ) 22 (1978), no. 7, 6–64.
[Sa2] S. G. Samko, Generalized Riesz potentials and hypersingular integrals with homogeneous characteristics, their symbols and inversion, Proceedings of the Steklov Inst. of Math., 2 (1983), 173–243.
[Sa3] S. G. Samko, Singular integrals over a sphere and the construction of the characteristic from the symbol, Soviet Math. (Iz. VUZ), 27 (1983), No. 4, 35–52.
[Schn] R. Schneider, Convex bodies: The Brunn-Minkowski theory, Cambridge Univ. Press, 1993.
[Schw] L. Schwartz, Théorie des distributions, Tome 1, Paris, Hermann, 1950.
[Se] V.I. Semyanistyi, Some integral transformations and integral geometry in an elliptic space, Trudy Sem. Vektor. Tenzor. Anal., 12 (1963), 397–441 (Russian).
[St] E. M. Stein, Singular integrals and differentiability properties of functions, Princeton Univ. Press, Princeton, NJ, 1970.
[SW] E.M. Stein, G. Weiss, Introduction to Fourier analysis on Euclidean spaces, Princeton Univ. Press, Princeton, NJ, 1971.
[Str1] R.S. Strichartz, Convolutions with kernels having singularities on a sphere, Trans. Amer. Math. Soc., 148 (1970), 461–471.
[Str2] R.S. Strichartz, $L^p$-estimates for Radon transforms in Euclidean and non-Euclidean spaces, Duke Math. J., 48 (1981), 699–727.
[Z1] G. Zhang, Sections of convex bodies, Amer. J. Math., 118 (1996), 319–
[Z2] G. Zhang, A positive solution to the Busemann-Petty problem in $\mathbb{R}^4$, Ann. of Math. (2), 149 (1999), 535–543.

Department of Mathematics, Louisiana State University, Baton Rouge, LA, 70803 USA
E-mail address: borisr@math.lsu.edu

ABSTRACT Intersection bodies represent a remarkable class of geometric objects associated with sections of star bodies and invoking Radon transforms, generalized cosine transforms, and the relevant Fourier analysis. The main focus of this article is the interrelation between generalized cosine transforms of different kinds in the context of their application to the investigation of a certain family of intersection bodies, which we call $\lam$-intersection bodies. The latter include $k$-intersection bodies (in the sense of A.
Koldobsky) and unit balls of finite-dimensional subspaces of $L_p$-spaces. In particular, we show that restrictions onto lower dimensional subspaces of the spherical Radon transforms and the generalized cosine transforms preserve their integral-geometric structure. We apply this result to the study of sections of $\lam$-intersection bodies. New characterizations of this class of bodies are obtained and examples are given. We also review some known facts and give them new proofs. <|endoftext|><|startoftext|>

Introduction

Hidden Markov models (HMMs) are generative probabilistic models that have been successfully used for annotation of sequence data, such as DNA and protein sequences, natural language texts, and sequences of observations or measurements. Their numerous applications include gene finding [1], protein secondary structure prediction [2], and speech recognition [3]. The linear-time Viterbi algorithm [4] is the most commonly used algorithm for these tasks. Unfortunately, the space required by the Viterbi algorithm grows linearly with the length of the sequence (with a high constant factor), which makes it unsuitable for analysis of continuous or very long sequences. For example, the DNA sequence of a single chromosome can be hundreds of megabases long. In this paper, we address this problem by proposing an on-line Viterbi algorithm that on average requires much less memory and that can annotate continuous streams of data on-line without reading the complete input sequence first.

An HMM, composed of states and transitions, is a probabilistic model that generates sequences over a given alphabet. In each step of this generative process, the current state generates one symbol of the sequence according to the emission probabilities associated with that state. Then, an outgoing transition is randomly chosen according to the transition probability table, and this transition is followed to the new state. This process is repeated until the whole sequence is generated.
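The generative process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_hmm`, the explicit initial distribution `init`, and the row-stochastic list layout of `trans` and `emit` are hypothetical choices, not something the paper specifies.

```python
import random


def sample_hmm(n, init, trans, emit, alphabet, rng=None):
    """Generate n symbols from an HMM (illustrative sketch).

    init[j]     : assumed probability of starting in state j
    trans[k][j] : transition probability from state k to state j
    emit[j][a]  : probability that state j emits alphabet[a]
    Returns (symbols, state_path).
    """
    rng = rng or random.Random()
    states = list(range(len(init)))
    state = rng.choices(states, weights=init)[0]
    symbols, path = [], []
    for _ in range(n):
        # The current state emits one symbol ...
        symbols.append(rng.choices(alphabet, weights=emit[state])[0])
        path.append(state)
        # ... and then an outgoing transition is followed.
        state = rng.choices(states, weights=trans[state])[0]
    return symbols, path
```

For instance, `sample_hmm(10, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]], ["0", "1"])` would draw ten symbols together with the hidden state path that produced them.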
The states in the HMM represent distinct features of the observed sequences (such as protein coding and non-coding sequences in a genome), and the emission probabilities in each state represent statistical properties of these features. The HMM thus defines a joint probability $\Pr(X, S)$ over all possible sequences $X$ and all state paths $S$ through the HMM that could generate these sequences. To annotate a given sequence $X$, we want to recover the state path $S$ that maximizes this joint probability. For example, in an HMM with one state for protein-coding sequences, and one state for non-coding sequences, the most probable state path marks each symbol of the input sequence $X$ as either protein coding or non-coding.

http://arxiv.org/abs/0704.0062v1

To compute the most probable state path, we use the Viterbi dynamic programming algorithm [4]. For every prefix $X_1 \dots X_i$ of the given sequence $X$ and for every state $j$, we compute the most probable state path generating this prefix ending in state $j$. We store the probability of this path in table $P(i, j)$ and its second-to-last state in table $B(i, j)$. These values can be computed from left to right, using the recurrence

$$P(i, j) = \max_k \{P(i-1, k) \cdot t_k(j) \cdot e_j(X_i)\},$$

where $t_k(j)$ is the transition probability from state $k$ to state $j$, and $e_j(X_i)$ is the emission probability of the $i$-th symbol of $X$ in state $j$. Back pointer $B(i, j)$ is the value of $k$ that maximizes $P(i, j)$. After computing these values, we can recover the most probable state path $S = s_1, \dots, s_n$ by setting the last state as $s_n = \arg\max_k \{P(n, k)\}$, and then following the back pointers $B(i, j)$ from right to left (i.e., $s_i = B(i+1, s_{i+1})$). For an HMM with $m$ states and a sequence $X$ of length $n$, the running time of the Viterbi algorithm is $\Theta(nm^2)$, and the space is $\Theta(nm)$. This algorithm is well suited for sequences and models of moderate size.
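A minimal sketch of the recurrence and the traceback above, assuming observations are given as symbol indices and assuming an explicit initial distribution `init` (the text does not spell out how the first column is initialised). Logarithms replace products to avoid underflow; this is an implementation choice, not part of the description above.

```python
import math


def viterbi(obs, init, trans, emit):
    """Most probable state path for obs (a non-empty list of symbol indices).

    trans[k][j] plays the role of t_k(j) and emit[j][x] that of e_j(x);
    init[j] is an assumed initial distribution.
    """
    m = len(init)

    def lg(p):
        return math.log(p) if p > 0 else -math.inf

    # P holds the current column of log P(i, j); B collects back-pointer columns.
    P = [lg(init[j]) + lg(emit[j][obs[0]]) for j in range(m)]
    B = []
    for x in obs[1:]:
        col, new_P = [], []
        for j in range(m):
            k = max(range(m), key=lambda k: P[k] + lg(trans[k][j]))
            col.append(k)  # B(i, j)
            new_P.append(P[k] + lg(trans[k][j]) + lg(emit[j][x]))
        B.append(col)
        P = new_P

    # Traceback: s_n = argmax_k P(n, k), then s_i = B(i+1, s_{i+1}).
    s = max(range(m), key=lambda j: P[j])
    path = [s]
    for col in reversed(B):
        s = col[s]
        path.append(s)
    path.reverse()
    return path
```

The list `B` is exactly the $\Theta(nm)$ back-pointer table whose size the rest of the paper is concerned with.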
However, to annotate all 250 million symbols of the human chromosome 1 with a gene finding HMM consisting of a hundred states, we would require 25 GB of memory just to store the back pointers $B(i, j)$. This is clearly impractical on most computational platforms.

Several solutions are used in practice to overcome this problem. For example, most practical gene finding programs process only sequences of limited size. The long input sequence is split into several shorter sequences which are processed separately. Afterwards, the results are merged and conflicts are resolved heuristically. This approach leads to suboptimal solutions, especially if the genes we are looking for cross the boundaries of the split.

Grice et al. [5] proposed a practical checkpointing algorithm that trades running time for space. We divide the input sequence into $K$ blocks of $L$ symbols, and during the forward pass, we only keep the first column of each block. To obtain the most probable state path, we recompute the last block of $L$ columns, and use back pointers to recover the last $L$ states of the most probable path, as well as the last state of the previous block. The information about this last state can now be used to recompute the most probable state path within the previous block in the same way, and the process is repeated for all blocks. Since every value of $P(i, j)$ will be computed twice, this means a two-fold slow-down compared to the Viterbi algorithm, but if we set $K = L = \sqrt{n}$, this algorithm only requires $\Theta(\sqrt{n}\,m)$ memory. Checkpointing can be further generalized to trade an $L$-fold slow-down for memory of $\Theta(\sqrt[L]{n}\,m)$ [6, 7].

In this paper, we propose and analyze an on-line Viterbi algorithm that does not use a fixed amount of memory for a given sequence. Instead, the amount of memory varies depending on the properties of the HMM and the input sequence. In the worst case, our algorithm still requires $\Theta(nm)$ memory; however, in practice the requirements are much lower.
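The memory figures quoted above for the standard algorithm and for checkpointing can be reproduced with a short calculation. The sketch below counts table cells rather than bytes; the helper names are hypothetical, and one byte per back pointer is an assumption that happens to match the 25 GB estimate for a hundred states.

```python
import math


def full_viterbi_cells(n, m):
    """Back-pointer cells kept by the standard algorithm: one per (position, state)."""
    return n * m


def checkpointed_cells(n, m):
    """Cells kept with K = L = ceil(sqrt(n)): one stored column per block
    plus the L columns of the block currently being recomputed."""
    block = math.isqrt(n - 1) + 1      # ceil(sqrt(n)) for n >= 1
    num_blocks = -(-n // block)        # ceil(n / block)
    return (num_blocks + block) * m


n, m = 250_000_000, 100
print(full_viterbi_cells(n, m))    # 25 * 10**9 cells, i.e. 25 GB at one byte each
print(checkpointed_cells(n, m))    # a few million cells
```

Under these assumptions checkpointing reduces the table by roughly four orders of magnitude, at the cost of computing every column twice.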
We prove, by demonstrating an analogy to random walks and using results from the theory of extreme values, that in simple cases the expected space for a sequence of length $n$ is as low as $\Theta(m \log n)$. We also experimentally demonstrate that the memory requirements are low for more complex HMMs.

2 On-line Viterbi algorithm

In our algorithm, we represent the back pointer matrix $B$ in the Viterbi algorithm by a tree structure (see [4]), with node $(i, j)$ for each sequence position $i$ and each state $j$. The parent of node $(i, j)$ is the node $(i-1, B(i, j))$. In this data structure, the most probable state path is a path from the leaf node $(n, j)$ with the highest probability $P(n, j)$ to the root of the tree (see Figure 1). This tree is built as the Viterbi algorithm progresses from left to right.

After processing sequence position $i$, all edges that do not lie on one of the paths ending in a level $i$ node can be removed; these edges will not be used in the most probable path [8]. The remaining $m$ paths represent all possible initial segments of the most probable state path. These paths are not necessarily edge disjoint; in fact, often all the paths share the same prefix up to some node called the coalescence point (see Figure 1).

Fig. 1. Example of the back pointer tree structure (horizontal axis: sequence positions). Dashed lines mark the edges that cannot be part of the most probable state path. The square node marks the coalescence point of the remaining paths.

Left of the coalescence point, there is only a single candidate for the initial segment of the most probable state path. Therefore we can output this segment and remove all edges and nodes of the tree up to the coalescence point. Forney [4] describes an algorithm that after processing $D$ symbols of the input sequence checks whether a coalescence point has been reached; in such a case, the initial segment of the most probable state path is output. If the coalescence point was not reached, one potential initial segment is chosen heuristically.
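To make the notion of a coalescence point concrete, here is a naive sketch that traces all surviving paths backwards through stored back-pointer columns until they merge. The function name and the column layout (`back[i][j]` is the predecessor, at position `i`, of state `j` at position `i + 1`) are assumptions made for illustration; this direct check costs time proportional to the number of stored columns and is not the detection method the paper goes on to develop.

```python
def coalescence_point(back):
    """Return (position, state) of the rightmost node shared by all surviving
    paths, or None if the paths have not merged within the stored columns.

    back[i][j]: predecessor state at position i of state j at position i + 1.
    """
    if not back:
        return None
    alive = set(range(len(back[-1])))   # every state at the last position
    for i in range(len(back) - 1, -1, -1):
        alive = {back[i][j] for j in alive}
        if len(alive) == 1:
            return i, next(iter(alive))
    return None
```

For example, with columns `[[0, 0], [0, 1], [1, 1], [0, 1]]` both paths pass through state 1 at position 2, so everything up to that position could already be output.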
Several studies [9, 10] suggest how to choose $D$ to limit the expected error caused by such heuristic steps in the context of convolution codes.

Here we show how to detect the existence of a coalescence point dynamically without introducing significant overhead to the whole computation. We maintain a compressed version of the back pointer tree, where we omit all internal nodes that have fewer than two children. Any path consisting of such nodes will be contracted to a single edge. This compressed tree has $m$ leaves and at most $m-1$ internal nodes. Each node stores the number of its children and a pointer to its parent node. We also keep a linked list of all the nodes of the compressed tree ordered by the sequence position. Finally, we also keep the list of pointers to all the leaves.

When processing the $i$-th sequence position in the Viterbi algorithm, we update the compressed tree as follows. First, we create a new leaf for each node at position $i$, link it to its parent (one of the former leaves), and insert it into the linked list. Once these new leaves are created, we remove all the former leaves that have no children, and recursively all of their ancestors that would not have any children.

Finally, we need to compress the new tree: we examine all the nodes in the linked list in order of decreasing sequence position. If the node has zero or one child and is not a current leaf, we simply delete it. For each leaf or node that has at least two children, we follow the parent links until we find its first ancestor (if any) that has at least two children and link the current node directly to that ancestor. A node $(\ell, j)$ that does not have an ancestor with at least two children is the coalescence point; it will become a new root. We can output the most probable state path for all sequence positions up to $\ell$, and remove all results of computation for these positions from memory.
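The update steps above can be sketched as a Python generator that yields states as soon as a coalescence point fixes them. Several details are assumptions made for this illustration rather than the authors' implementation: a virtual root at position -1 stands in for the missing common ancestor of the first column, a plain Python list replaces the linked list, `init` is an assumed initial distribution, and each kept node simply walks up its parent links, which is simpler than, and not necessarily as fast as, the O(m) pass described in the text.

```python
import math


class Node:
    """Node (pos, state) of the compressed back-pointer tree."""
    __slots__ = ("pos", "state", "parent", "children")

    def __init__(self, pos, state, parent):
        self.pos, self.state, self.parent = pos, state, parent
        self.children = 0
        if parent is not None:
            parent.children += 1


def online_viterbi(obs, init, trans, emit):
    """Yield the most probable state path, emitting each initial segment
    as soon as a coalescence point makes it certain."""
    m = len(init)

    def lg(p):
        return math.log(p) if p > 0 else -math.inf

    root = Node(-1, None, None)   # virtual root (assumption of this sketch)
    nodes, leaves = [], []        # compressed tree without root, by increasing pos
    back = {}                     # back[p][j]: predecessor state of node (p, j)
    P, pos = None, -1
    for pos, x in enumerate(obs):
        if pos == 0:
            P = [lg(init[j]) + lg(emit[j][x]) for j in range(m)]
            new_leaves = [Node(0, j, root) for j in range(m)]
        else:
            col, new_P = [], []
            for j in range(m):
                k = max(range(m), key=lambda k: P[k] + lg(trans[k][j]))
                col.append(k)
                new_P.append(P[k] + lg(trans[k][j]) + lg(emit[j][x]))
            back[pos], P = col, new_P
            # New leaves hang below the former leaves.
            new_leaves = [Node(pos, j, leaves[col[j]]) for j in range(m)]
            # Remove childless former leaves and ancestors left without children.
            for node in leaves:
                while node is not root and node.children == 0:
                    node.parent.children -= 1
                    node = node.parent
        leaves = new_leaves
        # Compress: keep current leaves and branching nodes, relink each of
        # them to its nearest ancestor with at least two children (or root).
        kept = []
        for node in reversed(nodes + leaves):
            if node.pos == pos or node.children >= 2:
                anc = node.parent
                while anc is not root and anc.children < 2:
                    anc = anc.parent
                node.parent = anc
                kept.append(node)
        kept.reverse()
        nodes = kept
        # A root with a single child means that child is a coalescence point.
        if root.children == 1:
            new_root = nodes.pop(0)
            seg, s = [], new_root.state
            for p in range(new_root.pos, root.pos, -1):
                seg.append(s)
                col = back.pop(p, None)   # no column is stored for position 0
                if col is not None:
                    s = col[s]
            new_root.parent = None
            root = new_root
            yield from reversed(seg)
    # End of input: trace back from the best final state to the last root.
    if pos >= 0:
        seg, s = [], max(range(m), key=lambda j: P[j])
        for p in range(pos, root.pos, -1):
            seg.append(s)
            col = back.get(p)
            if col is not None:
                s = col[s]
        yield from reversed(seg)
```

Because `obs` may be any iterable, the generator can be driven by a stream: in this sketch the dictionary `back` only holds columns to the right of the most recent coalescence point, which is the variable-size table whose length the next section analyzes.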
The running time of this update is $O(m)$ per sequence position, and the representation of the compressed tree takes $O(m)$ space. Thus the asymptotic running time of the Viterbi algorithm is not increased by the maintenance of the compressed tree. Moreover, we have implemented both the standard Viterbi algorithm and our new on-line extension, and the time measurements suggest that the overhead required for the compressed tree updates is less than 5%.

The worst-case space required by this algorithm is still $O(nm)$. However, this is rarely the case for realistic data; the required space changes dynamically depending on the input. In the next section, we show that for simple HMMs the expected maximum space required for processing a sequence of length $n$ is $\Theta(m \log n)$. This is much better than checkpointing, which requires space of $\Theta(m\sqrt{n})$ with a significant increase in running time. We conjecture that this trend extends to more complex cases. We also present experimental results on a gene finding HMM and real DNA sequences showing that the on-line Viterbi algorithm leads to significant savings in memory.

Another advantage of our algorithm is that it can construct initial segments of the most probable state path before the whole input sequence is read. This feature makes it ideal for on-line processing of signal streams (such as sensor readings).

3 Memory requirements of the on-line Viterbi algorithm

In this section, we analyze the memory requirements of the on-line Viterbi algorithm. The memory used by the algorithm is variable throughout the execution of the algorithm, but of special interest are asymptotic bounds on the expected maximum amount of memory used by the algorithm while decoding a sequence of length $n$. We use an analogy to random walks and results in extreme value theory to argue that for symmetric two-state HMMs, the expected maximum memory is $\Theta(m \log n)$. We also conduct experiments on an HMM for gene finding, and both real and simulated DNA sequences.
3.1 Symmetric two-state HMMs

Consider a two-state HMM over a binary alphabet as shown in Figure 2a. For simplicity, we assume $t < 1/2$ and $e < 1/2$. The back pointers between the sequence positions $i$ and $i+1$ can form one of the configurations i–iii shown in Figure 2b. Denote $p_A = \log P(i, A)$ and $p_B = \log P(i, B)$, where $P(i, j)$ is the table of probabilities from the Viterbi algorithm. The recurrence used in the Viterbi algorithm implies that configuration i occurs when $\log t - \log(1-t) \le p_A - p_B \le \log(1-t) - \log t$, configuration ii occurs when $p_A - p_B \ge \log(1-t) - \log t$, and configuration iii occurs when $p_A - p_B \le \log t - \log(1-t)$. Configuration iv never happens for $t < 1/2$.

Fig. 2. (a) Symmetric two-state HMM with two parameters: $e$ for emission probabilities and $t$ for transition probabilities. (b) Possible back-pointer configurations i–iv for the two-state HMM.

Note that for a two-state HMM, a coalescence point occurs whenever one of the configurations ii or iii occurs. Thus the memory used by the algorithm is proportional to the length of a continuous sequence of configurations i. We will call such a sequence of configurations a run.

First, we analyze the length distribution of runs under the assumption that the input sequence $X$ is a sequence of uniform i.i.d. binary random variables. In such a case, we represent the run by a symmetric random walk corresponding to the random variable

$$X = \frac{p_A - p_B - (\log t - \log(1-t))}{\log(1-e) - \log e}.$$

Whenever this variable is within the interval $(0, K)$, where

$$K = \left\lceil \frac{2\,(\log(1-t) - \log t)}{\log(1-e) - \log e} \right\rceil,$$

configuration i occurs, and the quantity $p_A - p_B$ is updated by $\log(1-e) - \log e$ if the symbol at the corresponding sequence position is 0, or by $\log e - \log(1-e)$ if this symbol is 1. These shifts correspond to updating the value of $X$ by $+1$ or $-1$.

When $X$ reaches 0, we have a coalescence point in configuration iii, and $p_A - p_B$ is initialized to $\log t - \log(1-t) \pm (\log e - \log(1-e))$, which either means initialization of $X$ to $+1$, or another coalescence point, depending on the symbol at the corresponding sequence position. The other case, when $X$ reaches $K$ and we have a coalescence point in configuration ii, is symmetric. We can now apply the classical results from the theory of random walks (see [11, ch. 14.3, 14.5]) to analyze the expected length of runs.

Lemma 1. Assuming that the input sequence is uniformly i.i.d., the expected length of a run of a symmetric two-state HMM is $K - 1$.

Therefore the larger $K$ is, the more memory is required to decode the HMM. The worst case is achieved as $e$ approaches $1/2$. In such a case, the two states are indistinguishable and being in state A is equivalent to being in state B. Using the theory of random walks, we can also characterize the distribution of the length of runs.

Lemma 2. Let $R_\ell$ be the event that the length of a run of a symmetric two-state HMM is either $2\ell + 1$ or $2\ell + 2$. Then, assuming that the input sequence is uniformly i.i.d., for some constants $b, c > 0$:

$$b \cdot \cos^{2\ell}\frac{\pi}{K} \le \Pr(R_\ell) \le c \cdot \cos^{2\ell}\frac{\pi}{K}.$$

Proof. For a symmetric random walk on interval $(0, K)$ with absorbing barriers and with starting point $z$, the probability of event $W_{z,n}$ that this random walk ends in point 0 after $n$ steps is zero if $n - z$ is odd, and the following quantity if $n - z$ is even [11, ch. 14.5]: $\Pr(W_{z,n}) = {}$ […]

[…] $\Pr(X_i > k) \le c\,a^k$. Let $N_n$ be the largest index such that $\sum_{i=1}^{N_n} X_i \le n$, and let $Y_n$ be $\max\{X_1, X_2, \dots, X_{N_n},\, n - \sum_{i=1}^{N_n} X_i\}$. Then

$$E[Y_n] = \log_{1/a} n + o(\log n). \qquad (7)$$

Proof. Let $Z_n = \max_{i=1,\dots,n} X_i$ be the maximum of the first $n$ runs. Clearly, $\Pr(Z_n \le k) = \Pr(X_i \le k)^n$, and therefore $(1 - c a^k)^n \le \Pr(Z_n \le k) \le (1 - b a^k)^n$ for all integers $k \ge \log_{1/a}(c)$.

Lower bound: Let $t_n = \log_{1/a} n - \sqrt{\ln n}$. If $Y_n \le t_n$, we need at least $n/t_n$ runs to reach the sum $n$, i.e. $N_n \ge n/t_n - 1$ (discounting the last incomplete run).
Therefore

$$\Pr(Y_n \le t_n) \le \Pr(Z_{n/t_n - 1} \le t_n) \le (1 - b a^{t_n})^{n/t_n - 1} = \left[(1 - b a^{t_n})^{a^{-t_n}}\right]^{a^{t_n}(n/t_n - 1)}.$$

Since $\lim_{n\to\infty} a^{t_n}(n/t_n - 1) = \infty$ and $\lim_{x\to 0}(1 - bx)^{1/x} = e^{-b}$, we get $\lim_{n\to\infty} \Pr(Y_n \le t_n) = 0$. Note that $E[Y_n] \ge t_n(1 - \Pr(Y_n \le t_n))$, and thus we get the desired bound.

Upper bound: Clearly, $Y_n \le Z_n$ and so $E[Y_n] \le E[Z_n]$. Let $Z'_n$ be the maximum of $n$ i.i.d. geometric random variables $X'_1, \dots, X'_n$ such that $\Pr(X'_i \le k) = 1 - a^k$. We will compare $E[Z_n]$ to the expected value of the variable $Z'_n$. Without loss of generality, $c \ge 1$. For any real $x \ge \log_{1/a}(c) + 1$ we have:

$$\Pr(Z_n \le x) \ge (1 - c\,a^{\lfloor x \rfloor})^n = \left(1 - a^{\lfloor x \rfloor - \log_{1/a}(c)}\right)^n \ge \left(1 - a^{\lfloor x - \log_{1/a}(c) - 1 \rfloor}\right)^n = \Pr(Z'_n \le x - \log_{1/a}(c) - 1) = \Pr(Z'_n + \log_{1/a}(c) + 1 \le x).$$

This inequality holds even for $x < \log_{1/a}(c) + 1$, since the right-hand side is zero in such a case. Therefore, $E[Z_n] \le E[Z'_n + \log_{1/a}(c) + 1] = E[Z'_n] + O(1)$. The expected value of $Z'_n$ is $\log_{1/a}(n) + o(\log n)$ [14], which proves our claim. ⊓⊔

Using the results of Lemma 3 together with the characterization of run length distributions by Lemma 2, we can conclude that for symmetric two-state HMMs, the expected maximum memory required to process a uniform i.i.d. input sequence of length $n$ is $(1/\ln(1/\cos(\pi/K))) \cdot \ln n + o(\log n)$.³ Using the Taylor expansion of the constant term as $K$ grows to infinity, $1/\ln(1/\cos(\pi/K)) = 2K^2/\pi^2 + O(1)$, we obtain that the maximum memory grows approximately as $(2K^2/\pi^2) \ln n$.

The asymptotic bound $\Theta(\log n)$ can be easily extended to the sequences that are generated by the symmetric HMM, instead of uniform i.i.d. The underlying process can be described as a random walk with approximately $2K$ states on two $(0, K)$ lines, each line corresponding to sequence symbols generated by one of the two states. The distribution of run lengths still decays geometrically as required by Lemma 3; the base of the exponent is the largest eigenvalue of the transition matrix with absorbing states omitted (see e.g. [15, Claim 2]).

The situation is more complicated in the case of non-symmetric two-state HMMs.
Here, our random walks proceed in steps that are arbitrary real numbers, different in each direction. We are not aware of any results that would help us to directly analyze distributions of runs in these models; however, we conjecture that the size of the longest run is still $\Theta(\log n)$. Perhaps, to obtain bounds on the length distribution of runs, one can approximate the behaviour of such non-discrete random walks by a different model (for example, [16, ch. 7]).

³ We omitted the first run, which has a different starting point and thus does not follow the distribution outlined in Lemma 2. However, the expected length of this run does not depend on $n$ and thus contributes only a lower-order term. We also omitted the runs of length one that start outside the interval $(0, K)$; these runs again contribute only to lower order terms of the lower bound.

3.2 Multi-state HMMs

Our analysis technique cannot be easily extended to HMMs with many states. In two-state HMMs, each new coalescence event clears the memory, and thus the execution of the algorithm can be divided into more or less independent runs. A coalescence event in a multi-state HMM results in a non-trivial tree left in memory, sometimes with a substantial depth. Thus, the sizes of consecutive runs are no longer independent (see Figure 3a).

[Figure 3: panel (a), horizontal axis: section of chromosome 1 (15.2M to 15.5M); panel (b), horizontal axis: sequence length (0 to 20M), series: Human genome (35), HMM generated (100), Random i.i.d. (35).]

Fig. 3. Memory requirements of a gene finding HMM. a) Actual length of table used on a segment of human chromosome 1. b) Average maximum table length needed for prefixes of 20 MB sequences.

To evaluate the memory requirements of our algorithm for multi-state HMMs, we have implemented the algorithm and performed several experiments on both simulated and biological sequences. First, we generalized the symmetric HMMs from the previous section to multiple states.
The symmetric HMM with $m$ states emits symbols over an $m$-letter alphabet, where each state emits one symbol with higher probability than the other symbols. The transition probabilities are equiprobable, except for self-transitions. We have tested the algorithm for $m \le 6$ and sequences generated both by a uniform i.i.d. process, and by the HMM itself. Observed data are consistent with the logarithmic growth of average maximum memory needed to decode a sequence of length $n$ (data not shown).

We have also evaluated the algorithm using a simplified HMM for gene finding with 265 states. The emission probabilities of the states are defined using at most 4-th order Markov chains, and the structure of the HMM reflects known properties of genes (similar to the structure shown in [17]). The HMM was trained on RefSeq annotations of human chromosomes 1 and 22.

In gene finding, we segment the input DNA sequence into exons (protein-coding sequence intervals), introns (non-coding sequence separating exons within a gene), and intergenic regions (sequence separating genes). A common measure of accuracy is exon sensitivity (how many of the real exons we have successfully and exactly predicted). The implementation used here has exon sensitivity 37% on the testing set of genes by Guigo et al. [18]. A realistic gene finder, such as ExonHunter [19], trained on the same data set achieves sensitivity of 53%. This difference is due to additional features that are not implemented in our test, namely GC content levels, non-geometric length distributions, and sophisticated signal models.

We have tested the algorithm on 20 MB long sequences: regions from the human genome, simulated sequences generated by the HMM, and i.i.d. sequences. Regions of the human genome were chosen from the hg18 assembly so that they do not contain sequencing gaps. The distribution for the i.i.d. sequences mirrors the distribution of bases in the human chromosome 1. The results are shown in Figure 3b.
The average maximum length of the table over several samples appears to grow faster than logarithmically with the length of the sequence, though it seems to be bounded by a polylogarithmic function. It is not clear whether the faster growth is an artifact that would disappear with longer sequences or a higher number of samples.

The HMM for gene finding has a special structure, with three copies of the state for introns that have the same emission probabilities and the same self-transition probability. In two-state symmetric HMMs, similar emission probabilities of the two states lead to an increase in the length of individual runs. Intron states of a gene finder are an extreme example of this phenomenon.

Nonetheless, on average a table of length roughly 100,000 is sufficient to process sequences of length 20 MB, which is a 200-fold improvement compared to the trivial Viterbi algorithm. In addition, the length of the table did not exceed 222,000 on any of the 20 MB human segments. As we can see in Figure 3a, most of the time the program keeps only a relatively short table; the average length on the human segments is 11,000. The low average length can be a significant advantage if multiple processes share the same memory.

4 Conclusion

In this paper, we introduced the on-line Viterbi algorithm. Our algorithm is based on efficient detection of coalescence points in trees representing the state paths under consideration by the dynamic programming algorithm. The algorithm requires variable space that depends on the HMM and on the local properties of the analyzed sequence. For two-state symmetric HMMs, we have shown that the expected maximum memory used for analysis of a sequence of length $n$ is approximately only $(2K^2/\pi^2) \ln n$. Our experiments on both simulated and real data suggest that the asymptotic bound $\Theta(m \ln n)$ also extends to multi-state HMMs, and in fact, for most of the time throughout the execution of the algorithm, much less memory is used.
A further advantage of our algorithm is that it can be used for on-line processing of streamed sequences; all previous algorithms that are guaranteed to produce the optimal state path require the whole sequence to be read before the output can be started.

There are still many open problems. We have only been able to analyze the algorithm for two-state HMMs, though trends predicted by our analysis seem to generalize even to more complex cases. Can our analysis be extended to multi-state HMMs? Apparently, the design of the HMM affects the memory needed for the decoding algorithm; for example, the presence of states with similar emission probabilities tends to increase memory requirements. Is it possible to characterize HMMs that require large amounts of memory to decode? Can we characterize the states that are likely to serve as coalescence points?

Acknowledgments: The authors would like to thank Richard Durrett for useful discussions. Recently, we have found out that parallel work on this problem is also being performed by another research group [20]. The focus of their work is on the implementation of an algorithm similar to our on-line Viterbi algorithm in their gene finder, and possible applications to parallelization, while we focus on the expected space analysis.

References

1. Burge, C., Karlin, S.: Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268(1) (1997) 78–94
2. Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 305(3) (2001) 567–570
3. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2) (1989) 257–286
4. Forney Jr., G.D.: The Viterbi algorithm. Proceedings of the IEEE 61(3) (1973) 268–278
5. Grice, J.A., Hughey, R., Speck, D.: Reduced space sequence alignment. Computer Applications in the Biosciences 13(1) (1997) 45–53
6.
Tarnas, C., Hughey, R.: Reduced space hidden Markov model training. Bioinformatics 14(5) (1998) 401–406
7. Wheeler, R., Hughey, R.: Optimizing reduced-space sequence analysis. Bioinformatics 16(12) (2000) 1082–1090
8. Henderson, J., Salzberg, S., Fasman, K.H.: Finding genes in DNA with a hidden Markov model. Journal of Computational Biology 4(2) (1997) 127–131
9. Hemmati, F., Costello, D., J.: Truncation error probability in Viterbi decoding. IEEE Transactions on Communications 25(5) (1977) 530–532
10. Onyszchuk, I.: Truncation length for Viterbi decoding. IEEE Transactions on Communications 39(7) (1991) 1023–1026
11. Feller, W.: An Introduction to Probability Theory and Its Applications, Third Edition, Volume 1. Wiley (1968)
12. Guibas, L.J., Odlyzko, A.M.: Long repetitive patterns in random sequences. Probability Theory and Related Fields 53 (1980) 241–262
13. Gordon, L., Schilling, M.F., Waterman, M.S.: An extreme value theory for long head runs. Probability Theory and Related Fields 72 (1986) 279–287
14. Schuster, E.F.: On overwhelming numerical evidence in the settling of Kinney’s waiting-time conjecture. SIAM Journal on Scientific and Statistical Computing 6(4) (1985) 977–982
15. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. Journal of Computer and System Sciences 70(3) (2005) 342–363
16. Durrett, R.: Probability: Theory and Examples. Duxbury Press (1996)
17. Brejova, B., Brown, D.G., Vinar, T.: Advances in hidden Markov models for sequence annotation. In Mandoiu, I., Zelikovski, A., eds.: Bioinformatics Algorithms: Techniques and Applications. Wiley (2007) To appear.
18. Guigo, R., et al.: EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biology 7(S1) (2006) 1–31
19. Brejova, B., Brown, D.G., Li, M., Vinar, T.: ExonHunter: a comprehensive approach to gene finding. Bioinformatics 21(S1) (2005) i57–65
20.
Keibler, E., Brent, M.: Personal communication (2006)

ABSTRACT In this paper, we introduce the on-line Viterbi algorithm for decoding hidden Markov models (HMMs) in much smaller than linear space. Our analysis of two-state HMMs suggests that the expected maximum memory used to decode a sequence of length $n$ with an $m$-state HMM can be as low as $\Theta(m\log n)$, without a significant slow-down compared to the classical Viterbi algorithm. The classical Viterbi algorithm requires $O(mn)$ space, which is impractical for analysis of long DNA sequences (such as complete human genome chromosomes) and for continuous data streams. We also experimentally demonstrate the performance of the on-line Viterbi algorithm on a simple HMM for gene finding on both simulated and real DNA sequences. <|endoftext|><|startoftext|> Introduction

Neutrinoless double beta decay is one of the most sensitive and promising approaches to testing particle physics beyond the Standard Model. There is immense scope to use 0νββ decay for constraining neutrino masses, left–right–symmetric models, interactions involving R-parity breaking in the supersymmetric model and leptoquark scenarios, as well as effective lepton number violating couplings. Experimental limits on 0νββ decay are not only complementary to accelerator experiments but, at least in some cases, competitive with or superior to the best existing direct search limits. The steadily improving experimental limits on the half-life of 0νββ decay can be translated into more stringent limits on the parameters of these new physics scenarios.

In the process of beta decay an unstable nucleus decays by converting a neutron in the nucleus to a proton and emitting an electron and an anti-neutrino. In order for beta decay to be possible the final nucleus must have a larger binding energy than the original nucleus. For some nuclei, such as Germanium-76, the nucleus with atomic number one higher has a smaller binding energy, preventing beta decay from occurring.
In the case of Germanium-76, the nucleus with atomic number two higher, Selenium-76, has a larger binding energy, so the "double beta decay" process is allowed. In double beta decay two neutrons in the nucleus are converted to protons, and two electrons and two anti-neutrinos are emitted. It is the rarest known kind of radioactive decay; it has been observed for only ten isotopes. For some nuclei, the process occurs as conversion of two protons to neutrons, with emission of two neutrinos and absorption of two orbital electrons (double electron capture). If the mass difference between the parent and daughter atoms is more than 1022 keV (two electron masses), another branch of the process becomes possible, with capture of one orbital electron and emission of one positron. Finally, when the mass difference is more than 2044 keV (four electron masses), a third branch of the decay arises, with emission of two positrons (β+β+ decay).

The processes described above are also known as two-neutrino double beta decay, as two neutrinos (or anti-neutrinos) are emitted. If the neutrino is a Majorana particle, meaning that the anti-neutrino and the neutrino are actually the same particle, then it is possible for neutrinoless double beta decay to occur. In 0νββ decay the emitted neutrino is immediately absorbed (as its anti-particle) by another nucleon of the nucleus, so the total kinetic energy of the two electrons would be exactly the difference in binding energy between the initial and final state nuclei.

† Now at Indiana University – Bloomington, USA

Experiments have been carried out and proposed to search for the 0νββ decay mode, as its discovery would indicate that neutrinos are indeed Majorana particles and allow a calculation of the neutrino mass. While the two-neutrino mode (1.1) is allowed by the Standard Model of particle physics, the neutrinoless mode (0νββ) (1.2) requires violation of lepton number (∆L=2). This mode is possible only if the neutrino is a Majorana particle, i.e. the neutrino is its own antiparticle. Double beta decay, the rarest known nuclear decay process, can occur in different modes:

$$2\nu\beta\beta\text{-decay}: \quad A(Z,N) \rightarrow A(Z+2,\,N-2) + 2e^- + 2\bar{\nu} \qquad (1.1)$$

$$0\nu\beta\beta\text{-decay}: \quad A(Z,N) \rightarrow A(Z+2,\,N-2) + 2e^- \qquad (1.2)$$

$$0\nu(2)\chi\beta\beta\text{-decay}: \quad A(Z,N) \rightarrow A(Z+2,\,N-2) + 2e^- + (2)\chi \qquad (1.3)$$

2. Double Beta Decay: A Rare Process

The process arises in certain cases of even-A nuclei, where A is the mass number, the sum of the number of protons and neutrons (A = Z + N). For even-A nuclei, because of the strong pairing force between like nucleons (neutrons like to be paired with other neutrons in a given nucleus, with the same true for protons), the binding energy of even-even nuclei (even number of protons and even number of neutrons) is larger than that of odd-odd nuclei (odd numbers of protons and neutrons). This fact results in two separate parabolas on a plot of binding energy, one parabola for even-even nuclei and one for odd-odd. Consequently, one occasionally finds a situation where two even-even nuclei for a given mass number A are stable against ordinary beta decay. However, the heavier nucleus is not fully stable and can decay to the lighter nucleus via normal double beta decay, a second-order process whereby the nuclear charge changes by two units. The ground state of the even-even nuclei is 0+ (positive parity) and the nuclear transition is 0+ → 0+.

One particular type of experimental approach that hopes to determine if the neutrino is a massive Majorana particle is the search for neutrinoless double beta decay. This type of experiment is perhaps the only feasible method for determining if the neutrino is a Majorana or Dirac particle. While neutrinoless double beta decay has not yet been experimentally discovered, searches have been conducted for many years, with many continuing today.
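The energetics behind this situation can be checked with a few lines of arithmetic for the 76Ge case used throughout this review. The sketch below assumes approximate atomic masses rounded from standard mass tables (they are not taken from this paper), and the helper name `q_value_kev` is hypothetical. For β⁻-type transitions the electron masses cancel when atomic masses are used, so the released energy is simply the atomic mass difference.

```python
# Approximate atomic masses in unified atomic mass units (assumed values,
# rounded from standard atomic mass tables; not taken from this paper).
U_TO_KEV = 931494.1  # energy equivalent of 1 u in keV

MASS_U = {
    "76Ge": 75.921403,
    "76As": 75.922394,
    "76Se": 75.919214,
}

def q_value_kev(parent, daughter):
    """Energy released (keV) in a beta-minus type transition, from atomic masses."""
    return (MASS_U[parent] - MASS_U[daughter]) * U_TO_KEV

q_single = q_value_kev("76Ge", "76As")  # negative: ordinary beta decay is forbidden
q_double = q_value_kev("76Ge", "76Se")  # positive: double beta decay is allowed

print(f"Q(76Ge -> 76As) = {q_single:.0f} keV")
print(f"Q(76Ge -> 76Se) = {q_double:.0f} keV")
```

With these inputs the single-step transition to the odd-odd neighbour 76As comes out negative, while the two-step transition to 76Se releases roughly 2 MeV, which is the energy region where the 0νββ peak is searched for.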
In fact, the next generation of double beta decay experiments is currently being designed and developed and involves a tremendous increase in the amount of source material to be studied (on the order of a half-ton or more).

In neutrinoless double beta decay an antineutrino emitted at the first vertex is absorbed at the second, as seen in the figure below; equivalently, a virtual neutrino emitted by one neutron is absorbed by the second neutron participating in the double beta decay. The two-neutrino mode is allowed in the Standard Model. The neutrinoless mode can occur only if neutrinos have masses of the Majorana type. The decay rate is proportional to the squared mass. In other words, the half-life is inversely proportional to the squared mass. Experimentally one can distinguish the two modes. In the two-neutrino mode the electrons take away only a fraction of the energy Q released in the decay. The sum energy spectrum is continuous, extending from 0 to Q. In the neutrinoless mode the total energy Q is carried away by the electrons, and the sum energy spectrum is a peak centered at Q, with a width given by the instrumental resolution.

The decay rate for 2νββ decay, which is allowed in the Standard Model of physics, is given by

$$\left[T^{2\nu}_{1/2}(0^+ \rightarrow 0^+)\right]^{-1} = G^{2\nu}(E,Z)\,\left|M^{2\nu}_{GT}\right|^2 \qquad (2.1)$$

The decay rate for the process involving the exchange of the Majorana neutrino in the absence of right-handed currents can be expressed as follows:

$$\left[T^{0\nu}_{1/2}(0^+ \rightarrow 0^+)\right]^{-1} = G^{0\nu}(E,Z)\,\left|M^{0\nu}_{GT} - M^{0\nu}_{F}\right|^2 \langle m_\nu \rangle^2 \qquad (2.2)$$

Here $M_{GT}$ and $M_F$ are the nuclear matrix elements of the Gamow-Teller and Fermi transitions respectively. The nuclear matrix elements of the 0+ → 0+ Gamow-Teller and Fermi transitions for the two-neutrino mode, in second-order perturbation theory of the weak interaction, are given by

$$M^{2\nu}_{F} = \sum_n \frac{\langle 0^+_f |\sum_j \tau^+_j| 1^+_n \rangle \langle 1^+_n |\sum_k \tau^+_k| 0^+_i \rangle}{E_n - E_i + \Delta} \qquad (2.3)$$

$$M^{2\nu}_{GT} = \sum_n \frac{\langle 0^+_f |\sum_j \sigma_j \tau^+_j| 1^+_n \rangle \langle 1^+_n |\sum_k \sigma_k \tau^+_k| 0^+_i \rangle}{E_n - E_i + \Delta} \qquad (2.4)$$

where Δ denotes the average energy and is given by $\Delta = (E_i - E_f)/2$, and $\sum_j \tau^+_j$ and $\sum_j \sigma_j \tau^+_j$ are the Fermi transition operator and the Gamow-Teller transition operator respectively. A complete orthonormal set of intermediate excited states has been introduced, denoted by $|1^+_n\rangle$. Thus the two-neutrino double beta decay mode has been expressed in terms of single beta transitions through the introduction of intermediate excited states via which the transition from the initial 0+ to the final 0+ state occurs.

For the neutrinoless mode, the nuclear matrix elements resulting from Fermi and Gamow-Teller transitions are given by

$$M^{0\nu}_{F} = \langle 0^+_f |\sum_{j,k} H(r_{jk},E)\,\tau^+_j \tau^+_k| 0^+_i \rangle \qquad (2.5)$$

$$M^{0\nu}_{GT} = \langle 0^+_f |\sum_{j,k} H(r_{jk},E)\,\sigma_j \cdot \sigma_k\,\tau^+_j \tau^+_k| 0^+_i \rangle \qquad (2.6)$$

The function H depends on the distance between the nucleons and approximately has the form

$$H(r,E) = \frac{2R}{\pi r}\int_0^\infty \frac{\sin(qr)}{q + E - (E_i + E_f)/2}\,dq \qquad (2.7)$$

where $R = r_0 A^{1/3}$, A being the mass number and $r_0 = 1.2$ fm.

The parameters $G^{2\nu}$ and $G^{0\nu}$ result from integrating over the lepton phase space, and $g_V$ and $g_A$ are the weak vector and axial-vector coupling constants respectively. $\langle m_\nu \rangle$ is the effective electron neutrino mass for the 0νββ-decay process. If the light neutrino ($m_j \ll$ few MeV) exchange is the dominant mechanism and both neutrino currents are left-handed, then the 0νββ-decay amplitude is proportional to the lepton number violating parameters. This effective mass is related to the light neutrino mass eigenvalues ($m_j$) and the mixing parameters ($U_{ej}$) and is given by the relation

$$\langle m_\nu \rangle = \left|\sum_j m_j U_{ej}^2\right| \qquad (2.8)$$

The effective light neutrino mass may be suppressed by a destructive interference between the different contributions in the sum of equation (2.8) if CP is conserved. In this case the mixing matrix satisfies the condition $U_{ej} = U_{ej}^{*}\,\zeta_j$, where $\zeta_j = \pm i$ is the CP parity of the Majorana neutrino $\nu_j$. The absolute value has thus been inserted for convenience, since the quantity inside it is squared in equation (2.8) and is complex if CP is violated.

The ideal 0νββ-decay experiment has the following dream features: the lowest possible background, the best possible energy resolution, the greatest possible mass of the parent isotope, detection efficiency near 100% for valid events, a unique signature and the lowest possible construction cost. The next sections review experiments in such an effort from the isotope 76Ge.

3. International Germanium EXperiment (IGEX)

The nuclear Double Beta Decay is a unique ground to investigate the nature and properties of the neutrino. The neutrinoless decay mode, if it exists, would provide unambiguous evidence of the Majorana nature of the neutrino, its non-zero mass, and the non-conservation of lepton number. After the implication from solar and atmospheric neutrino oscillation results that neutrinos have non-zero mass, the process of neutrinoless Double Beta Decay has become the most relevant place to test the neutrino mass scale and its hierarchy pattern. To achieve the high sensitivity limits of the effective Majorana electron neutrino mass, derived from the neutrinoless half-life lower bound, required for such new objectives, it will require a large number of double beta emitter nuclei, a very low background and a sharp energy resolution in the Q-value region, and effective methods to disentangle signal from noise. A typical example of this type of search was the IGEX.

The International Germanium EXperiment (IGEX) was a search for the neutrinoless double beta decay of 76Ge employing large amounts of HPGe detectors, isotopically enriched to 86% in 76Ge. In the first phase of the experiment three detectors of 0.7 kg active volume each were operated: one in the Homestake gold mine (4000 m.w.e.), another in the Baksan Neutrino Observatory (660 m.w.e.) and the third in the Canfranc underground laboratory (Laboratory 2 at 1380 m.w.e.). A conservative lower bound on the neutrinoless half-life of about $10^{24}$ years was derived.

IGEX took data at the Canfranc Underground Laboratory in Spain at a depth of 2450 m.w.e. in a search for neutrinoless double beta decay. Three Germanium detectors (RG1, RG2 and RG3), of ~2 kg each, enriched to 86% in 76Ge were used. Efforts were made to reduce part of the radioactive background by discriminating it from the expected signal by comparison of the shape of the pulses (PSD) of both types of events. The method was applied to the data recorded by two Ge detectors of the IGEX, which has produced one of the two best current sensitivity limits for the Majorana neutrino mass parameter.

In the second phase, three large detectors (2 kg each) were fabricated (with improvements derived from the analysis of data of Phase 1). They are installed in the Canfranc underground laboratory (Laboratory 3 at 2450 m.w.e.) inside a low background shielding consisting of 40 cm of lead, a PVC box (silicone sealed and flushed with nitrogen), 2 mm of cadmium, 20 cm of polyethylene and an active veto (plastic scintillators). A pulse shape discrimination (PSD) technique capable of distinguishing single-site events (ββ decay events for example) from multisite events (the most dominant background events) is implemented. New limits on the neutrinoless half-life and the neutrino mass parameter were thus obtained.

In large intrinsic Ge detectors, the charge carriers take 300 - 500 ns to reach their respective electrodes. These drift times are long enough for the current pulses to be recorded at a sufficient sampling rate. The current pulse contributions from electrons and holes are displacement currents, and therefore dependent on their instantaneous velocities and locations. Accordingly, events occurring at a single site (ββ-decay events for example) have associated current pulse characteristics which reflect the position in the crystal where the event occurred. More importantly, these single-site events (SSE) frequently have pulse shapes that differ significantly from those due to the background events that produce electron-hole pairs at several sites, by the multi-Compton-scattering process for example (the so-called Multi-Site Events (MSE)). Consequently, pulse-shape analysis was used to distinguish between these two types of energy depositions, since DBD events belong to the SSE class of events and will deposit energy at a single site in the detector while most of the background events belong to the MSE class of events and will deposit energy at several sites. It provided a rejection of ~60% of the events in the region of interest, accepting the criterion that those events having more than two lobes cannot be due to a DBD event.

Figure: IGEX spectrum with and without the PSD background rejection.

The IGEX detectors had the initial objective of the detection of the double beta decay of 76Ge. At the end of 1999 certain modifications were made to adapt the detectors to detection at low energy, where the signal of WIMPs (Weakly Interacting Massive Particles) is relevant. The shielding, shared by three IGEX detectors (2 kg germanium detectors isotopically enriched to 86% in 76Ge) and the COSME detector, included from inside to outside 40 cm of lead, a PVC box (silicone sealed and flushed with nitrogen), 2 mm of cadmium, plastic scintillators working in anticoincidence with the Ge detectors and 20 cm of polyethylene. The shielding was modified in July 2001 so that it included only one 2 kg germanium detector inside a more efficient neutron shielding. These techniques of passive and active shielding, along with the extreme radiopurity of the detectors and their components, allowed a low energy background as well as a low enough threshold which are unique in this type of detectors.
So, very stringent contour limits for cross sections and masses of dark matter particles interacting with Ge nuclei through spin-independent interactions were derived.

The need to understand and reject backgrounds in Ge-diode detector double-beta decay experiments thus gave rise to the development of the pulse shape analysis technique in such detectors to distinguish DBD single-site energy deposits from the multiple-site deposits. Subsequently the analysis was extended by DBD researchers to segmented Ge detectors to study the effectiveness of combining segmentation with pulse shape analysis to identify the multiplicity of the energy deposits.

The IGEX calculation of a lower bound on the half-life for the neutrinoless mode, with fewer than 3.1 candidate events (90% Confidence Level) under a peak having FWHM = 4 keV and centered at 2038.56 keV, corresponded to:

$$T^{0\nu}_{1/2}(^{76}\mathrm{Ge}) > \frac{4.87\times10^{25}\ \mathrm{yr}}{3.1} = 1.57\times10^{25}\ \mathrm{yr}$$

The requirements for a next generation experiment can easily be deduced by reference to

$$T^{0\nu}_{1/2} = \frac{(\ln 2)\,N\,t}{c} \qquad (3.1)$$

where N is the number of parent nuclei, t is the counting time, and c is the upper limit on the number of 0νββ-decay counts consistent with the observed background. To improve the sensitivity of ⟨mν⟩ by a factor of 100, the quantity Nt/c must be increased by a factor of $10^4$. The quantity N can feasibly be increased by a factor of ~$10^2$ over present experiments, so that t/c must also be improved by that amount. Since practical counting times can only be increased by a factor of 2 to 4, the background should be reduced by a factor of 25 to 50 below present levels. These are approximately the target parameters of the next generation neutrinoless double-beta decay experiments.

Figure: Histogram of the IGEX data in the energy region of interest for the 0νββ decay. The limits on the half-life and neutrino mass parameter are also shown.

The Effective ν Mass: The section of KKDK on effective neutrino mass (“Critical View to the IGEX neutrinoless double-beta decay experiment...” published in Phys. Rev. D, Volume 65 (2002) 092007, by H. V. Klapdor-Kleingrothaus, A. Dietz, and I. V. Krivosheina) begins with: “Starting from their incorrectly determined half-life limit the authors claim a range of effective neutrino mass of (0.33-1.35) eV.” In response, the IGEX collaboration stated that KKDK selected only the 52.51 mole·years of the IGEX data that had been subjected to PSD and obtained $T^{0\nu}_{1/2} > 7.1\times10^{24}$ y using the maximum number of counts, 3.1, from the entire 117 mole·years of data, which was erroneous and unjustified. In another case, KKDK also decided to arbitrarily use the entire IGEX data set prior to PSD selection, from which they obtained a bound of $T^{0\nu}_{1/2} > 1.1\times10^{25}$ y. According to IGEX, there was no scientific justification for selecting only PSD-corrected data on the one hand and totally ignoring the PSD-corrected data on the other. The conclusion of KKDK states: “the IGEX paper - apart from the too high half-life limits presented, as a consequence of an arithmetic error - is rather incomplete in its presentation”.

In response to this paper the IGEX collaboration published the article “The IGEX experiment revisited: a response to the critique of Klapdor-Kleingrothaus, Dietz, and Krivosheina”, where they stated that there was absolutely no arithmetic error and that the analysis of the published IGEX data presented in KKDK is illegitimate. To obtain a much shorter bound on the half-life, KKDK arbitrarily analyzed two approximate halves of the data separately. Instead of having $4.88\times10^{25}$ y in the numerator (ln 2 · N·t) they used $2.2\times10^{25}$ y. Yet they used the 90% CL upper limit on the number of counts under the peak, obtained by IGEX from all of the data. In another analysis, they ignored the fact that 52.51 mole·years were corrected with PSD and treated the complete uncorrected data set.
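The numbers in this dispute follow from one division, T > (ln 2)·N·t / c, with the symbols defined for relation (3.1). The short sketch below redoes that arithmetic with the values quoted in the text; the function name `half_life_limit` is a hypothetical label for this illustration. It also reproduces the scaling argument for next-generation experiments, assuming only that ⟨mν⟩ scales as the inverse square root of the half-life, as stated earlier for the decay rate.

```python
def half_life_limit(ln2_n_t, max_counts):
    """Lower bound on the half-life, T > (ln 2) N t / c, in years."""
    return ln2_n_t / max_counts

# Complete IGEX data set: numerator 4.87e25 y, at most 3.1 counts (90% CL).
full = half_life_limit(4.87e25, 3.1)

# KKDK treatment as described above: numerator of a data subset (2.2e25 y)
# combined with the count limit obtained from the complete data set.
subset = half_life_limit(2.2e25, 3.1)

print(f"full data set  : T > {full:.2e} y")
print(f"subset exposure: T > {subset:.2e} y")

# Scaling for next-generation experiments: <m_nu> ~ 1/sqrt(T), so a 100-fold
# gain in mass sensitivity needs a 100**2-fold gain in N t / c.
gain_ntc = 100 ** 2
gain_n = 100                        # feasible increase in the number of nuclei
gain_t_over_c = gain_ntc / gain_n   # remaining factor to be found in t / c
# Counting time can grow by a factor of 2 to 4; the rest must come from background.
background_reduction = [gain_t_over_c / g for g in (2, 4)]
print(background_reduction)
```

The first value corresponds to the bound quoted by IGEX, the second to the shorter bound attributed to KKDK, and the last line gives the factor of 25 to 50 in background reduction mentioned above.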
Naturally, the lower limits on $T^{0\nu}_{1/2}(^{76}\mathrm{Ge})$ obtained by these completely unjustified procedures are shorter than that obtained from properly analyzing the complete data set. The IGEX response therefore concludes that “the lower limit quoted by IGEX, $T^{0\nu}_{1/2} \geq 1.57\times10^{25}$ years, is correct and that there was no arithmetical error as claimed in the Critical View article.”

4. The HEIDELBERG - MOSCOW Experiment

The Heidelberg-Moscow experiment at the Gran Sasso underground laboratory is now claimed to be the most sensitive neutrinoless double beta decay experiment worldwide. It has contributed in an extraordinary way to the research in neutrino physics and particularly to beyond standard model physics, and limits for the latter are competing with those from the largest high-energy accelerators. The emphasis is on the first indication for neutrinoless double beta decay found in the Heidelberg-Moscow experiment, giving first evidence of lepton number violation and a Majorana nature of the neutrinos. Neutrinoless double beta decay could answer questions about the absolute scale of the neutrino mass and the fundamental character of the neutrino, whether it is a Dirac or a Majorana particle.

Figure: Entrance of the highway tunnel under Gran Sasso mountain.

With the support of the LNGS the experimental building of the experiment was built between Halls A and B in Gran Sasso, into which the first enriched 76Ge detector (the first high-purity enriched Ge detector worldwide) was installed in July 1990. First preparation work had been done since 1989 in a provisional tent in Hall C. The full amount of five enriched 76Ge detectors, in total 11 kg, was finally installed in 1995, and they were operated since 1996 with a newly developed pulse shape discrimination method.

High purity germanium crystals, enriched in the Germanium-76 isotope up to 86%, are used as the main detecting elements. Five coaxial detectors with a total weight of 11.5 kg (125 moles in the active volume of the detectors) are used. Each detector is located in a separate cryostat made of electrolytic copper with low content of radioactive impurities. The quantity of other construction materials (iron, bronze, light material insulators) is minimized in order to reduce the possible contribution of radioactive impurities to the total background of the detectors. The detectors were located in two separate shielded boxes. One of them, 270 mm thick, is made of electrolytic copper (detector #4); the other consists of two layers of lead: an inner layer of 100 mm of high purity LCD2-grade lead and an outer layer of 200 mm of low background Boliden lead (detectors ## 1,2,3,5). Each setup is enclosed in a stainless steel casing. Non-radioactive pure nitrogen was blown through the casings to reduce the radon emanation contribution. To reduce the neutron background the casing with detectors ##1,2,3,5 was coated with borated polyethylene, and two anticoincidence plates of plastic scintillator were located over the casing in order to reduce the muon component. The setup was located in the Gran Sasso underground laboratory, Italy, at a depth of 3500 metres of water equivalent. This depth of the lab reduces the influence of cosmic rays on the background conditions of the experiment. The electronics and the data acquisition system allow each event to be recorded: the number (or numbers) of the detector that fired, the amplitude and pulse shape, and the anticoincidence veto.

The Heidelberg-Moscow experiment, with five enriched (86%-88%) high-purity p-type Germanium detectors, in total 10.96 kg of active volume, used the largest source strength of all double beta experiments at present, and reached a record low level of background. The detectors were the first high-purity Ge detectors ever produced. The degree of enrichment was checked by investigation of tiny pieces of Ge after crystal production using the Heidelberg MP-Tandem accelerator as a mass spectrometer.
The detectors, except detector # 4, were operated in a common Pb shielding of 30 cm, which consisted of an inner shielding of 10 cm radiopure LC2-grade Pb followed by 20 cm of Boliden lead. The whole setup was placed in an air-tight steel box and flushed with radiopure nitrogen in order to suppress the 222Rn contamination of the air. The shielding was improved in the course of the measurement. The steel box operated since 1994 centered inside a 10- cm boron-loaded polyethylene shielding to decrease the neutron flux from outside. An active anticoincidence shielding was placed on top of the setup since 1995 to reduce the effect of muons. Detector # 4 was installed in a separate setup, which had an inner shielding of 27.5 cm electrolytical Cu, 20 cm lead, and boron-loaded below the steel box, but no muon shielding. The setup was kept air-tight closed since stallation of detector #5 in February’95. Since then no radioactive contaminations of the inner of the he sensitivity for the 0ν ββ half-life is given by (4.1) ent are: energy resolution, background and e strength ever operated in a double beta decay xperiment. The background reached to the experiment, was 0.113 ± 0.007 events/kg y keV (in the period 1995- was the lowest lim polyethylene shielding experimental setup by air and dust from the tunnel could occur. - 8 - With denoting the degree of enrichment, ε the efficiency of the detector for detection of a double beta event, M the detector (source) mass, ∆E the energy resolution, B the background and t the measuring time, the sensitivity of our 11 kg of enriched 76Ge experiment corresponds to that of an at least 1.2 ton natural Ge experiment. After enrichment - the other most important parameters of a ββ experim source strength. 
The high energy resolution of the Ge detectors of 0.2% or better, assures no background for a 0νββ line from the two-neutrino double beta decay in this experiment (5.5 × 10-9 events expected in the energy range 2035-2039.1keV), in contrast to most other present experimental approaches, where limited energy resolution is a severe drawback. The efficiency of Ge detectors for detection of 0ν ββ decay events is close to 100%. The source strength in the Heidelberg-Moscow experiment of 11kg was the largest sourc ~0 〉〈×ν ε mand . 02/1 Δ ννTBE 2003) in the 0ν ββ decay region (around Qββ). This it ever obtained in such type of experiment. - 9 - he statistics collected in this experiment during 13 years of stable running is the largest ever collected in a presented a paper concerning “Measurement of the Bi spectrum in the energy gion around the Q-value of Ge neutrinoless double-beta decay”. In this work they presented the measurements f the 214Bi spectrum from a 226Ra source with a high purity germanium detector. Their attention was mostly focused on the energy region around the Q-value of 76Ge neutrinoless double-beta decay (2039.006 keV). The results of the measurement strongly relates to the first indication for neutrinoless double beta decay of 76Ge. An analysis of the data collected during ten years of measurements by the Heidelberg-Moscow experiment, at Gran- Sasso Underground Laboratory, yields a first indication for the neutrinoless double beta decay of 76Ge. An important point of this analysis is the interpretation of the background, in the region around the Q-value of the double beta decay (2039.006 keV), as containing several weak photopeaks. It was suggested and has been shown that four of these peaks are produced by a contamination from the isotope 214Bi, whose lines are present throughout the Heidelberg-Moscow background spectrum. In this work they performed a measurement of a 226Ra source with a high-purity germanium detector. 
double beta decay experiment. The experiment took data during ~80% of its installation time. The Q value for neutrinoless double beta decay was recently determined with high precision. The background of the experiment consists of: (1) primordial activities of the natural decay chains from 238U, 232Th, and 40K; (2) anthropogenic radionuclides, like 137Cs, 134Cs, 125Sb, 207Bi; (3) cosmogenic isotopes, produced by activation due to cosmic rays during production and transport; (4) the bremsstrahlung spectrum of 210Bi (daughter of 210Pb); (5) elastic and inelastic neutron scattering; and (6) direct muon-induced events.

H.V. Klapdor-Kleingrothaus, O. Chkvorez, I.V. Krivosheina and C. Tomei, at the Max-Planck-Institut für Kernphysik in the Heidelberg-Moscow group, measured the 214Bi spectrum with a high purity germanium detector in the energy region around the Q-value of 76Ge neutrinoless double-beta decay (2039.006 keV), using a 226Ra source. In a first step the source was positioned on the top of the detector, directly in contact with the copper cap (close geometry), and in a second step the source was moved 15 cm away from the detector cap (far geometry). The results of the measurements show that, if the source is close to the detector, the intensities of the weak 214Bi lines in the energy region 2000-2100 keV are not in the same ratio as reported by the Table of Isotopes.

The analysis of the data collected by the Heidelberg-Moscow experiment with all five detectors, yielding a first indication for the neutrinoless double beta decay of 76Ge, shows that four 214Bi lines are present in the energy region from 2000 to 2080 keV (many other strong lines from the same isotope are present in the spectrum), due to the presence of bismuth in the experimental setup, especially in the copper in the vicinity of the Ge crystals. The aim of this work was to study the spectral shape of the lines in the energy region from 2000 to 2100 keV and, most importantly, to show the difference in this spectral shape when changing the position of the source with respect to the detector, and to verify the effect of True Coincidence Summing (TCS) for the weak 214Bi lines seen in the Heidelberg-Moscow experiment.

The activity of the 226Ra source is 95.2 kBq. The isotope 226Ra appears in the 238U natural decay chain, and 214Bi is also produced from its decays, so the γ-spectrum of 214Bi is clearly visible in the measured 226Ra spectrum. 214Bi is a naturally occurring isotope: it is produced in the 238U natural decay chain through the β- decay of 214Pb and the α decay of 218At. With a subsequent β- decay, 214Bi then decays into 214Po (the branching ratio with respect to the α decay into 210Tl is 99.979%). The decay, however, does not lead directly to the ground state of 214Po, but to its excited states. From the decays of those excited states to the ground state the well-known γ-spectrum of 214Bi is obtained, which contains more than a hundred lines. From the table given below, one can see that in the energy region around the Q-value of the 0νββ decay (2000-2100 keV), four γ-lines and one E0 transition with energy 2016.7 keV are expected. The E0 transition can produce a conversion electron or an electron-positron pair, but it cannot contribute directly to the γ-spectrum in the considered energy region if the source is located outside the detector active volume.

Energy (keV)    Intensity (%)
2010.71         0.050
2016.7          0.0058
2021.8          0.020
2052.94         0.078
2089.7          0.050

The intensity of each line is defined as the number of emitted photons, with the corresponding energy, per 100 decays of the parent nuclide.

The considerations for the measurement were the efficiency of the detector (which depends on the size of the detector and on the source-detector distance) and the effect called True Coincidence Summing (TCS). The lifetimes of the excited nuclear levels are much shorter than the resolving time of the detector. If two gamma-rays are emitted in cascade, there is a certain probability that they will be detected together. If this happens, a pulse will be recorded which represents the sum of the energies of the two individual photons, instead of two separate pulses with different energies. The TCS effect can result both in a lower peak intensity for full-energy peaks and in a higher peak intensity for those transitions whose energy equals the sum of two lower-energy gamma-rays. In this case, the lines at 2010.7 keV and 2016.7 keV can be produced by the coincidence of the 609.312 keV photon (strongest line, intensity = 46.1%) with the 1401.50 keV photon (intensity = 1.27%) or with the 1407.98 keV photon (intensity = 2.15%), respectively. The degree of TCS depends on the probability that two gamma-rays emitted simultaneously will be detected simultaneously, which is a function of the detector geometry and of the solid angle subtended at the detector by the source. For this reason the intensities of the two lines mentioned above (2010.71 keV and 2016.7 keV) are expected to depend on the position of the source with respect to the detector.

The 226Ra γ-ray spectra were measured using a γ-ray spectroscopy system based on an HPGe detector installed in the operation room of the HEIDELBERG-MOSCOW experiment in the Gran Sasso Underground Laboratory, Italy. The coaxial germanium detector had an external diameter of 5.2 cm and a height of 4.9 cm. The distance between the top of the detector and the copper cap was kept at 3.5 cm.
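The sum-peak energies quoted in the True Coincidence Summing discussion can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the energies are the ones quoted in the text, while the function names and the use of the 3.6 keV resolution as a matching window are assumptions made for the example, not part of the original analysis.

```python
# Sum-peak check for True Coincidence Summing (TCS) in the 214Bi spectrum.
# All energies (keV) are the values quoted in the text.
STRONG_LINE_KEV = 609.312              # strongest 214Bi line
CASCADE_PARTNERS_KEV = [1401.50, 1407.98]
WEAK_LINES_KEV = [2010.71, 2016.7]     # lines that TCS can enhance
RESOLUTION_KEV = 3.6                   # assumed matching window (quoted resolution)


def sum_peaks(strong, partners):
    """Energies recorded when the strong line is detected together with a partner."""
    return [strong + partner for partner in partners]


def nearest_line(energy, lines):
    """Return the listed line closest in energy to the given sum energy."""
    return min(lines, key=lambda line: abs(line - energy))


sums = sum_peaks(STRONG_LINE_KEV, CASCADE_PARTNERS_KEV)
for total in sums:
    line = nearest_line(total, WEAK_LINES_KEV)
    offset = abs(total - line)
    print(f"sum {total:.3f} keV -> line {line} keV, offset {offset:.2f} keV, "
          f"within resolution: {offset < RESOLUTION_KEV}")
```

Both sums fall well inside the quoted resolution of the corresponding weak line, which is why summing can add counts to those peaks when the source is close to the detector.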
The relative detection efficiency of the detector was 23%, and the energy resolution was 3.6 keV in the energy range 2000-2100 keV.

The above figure shows the sum spectrum of the 76Ge detectors 1, 2, 3, 4 and 5 over the period August 1990 to May 2003 as recorded by the Heidelberg-Moscow experiment.

In a paper by Klapdor-Kleingrothaus, Dietz, Harney, and Krivosheina (hereafter referred to as KDHK), evidence is claimed for zero-neutrino double-beta decay in 76Ge. The high quality data upon which this claim is based were compiled by the careful efforts of the Heidelberg-Moscow collaboration and are well documented. However, the analysis in KDHK makes an extraordinary claim and therefore requires very solid substantiation, according to another paper, “Comment on Evidence for Neutrinoless Double Beta Decay” by C.E. Aalseth et al. They state that a large number of issues were not addressed in KDHK, some of which are:

1. There is no null hypothesis analysis demonstrating that the data require a peak. Furthermore, no simulation has been presented to demonstrate that the analysis correctly finds true peaks or that it would find no peaks if none existed. Monte Carlo simulations of spectra containing different numbers of peaks are needed to confirm the significance of any found peaks.
2. There are three unidentified peaks in the region of analysis that have greater significance than the 2039-keV peak. There is no discussion of the origin of these peaks.
3. There is no discussion of how sensitive the conclusions are to different mathematical models. There is a previous Heidelberg-Moscow publication that gives a lower limit of 1.9 × 10^25 y (90% confidence level). This is in conflict with the “best value” of a newer KDHK paper of 1.5 × 10^25 y. This indicates a dependence of the results on the analysis model and the background evaluation.

In this paper they state that a number of other cross checks of the result should also be performed. For example, there is no discussion of how a variation of the size of the chosen analysis window affects the significance of the hypothetical peak. There is no relative peak strength analysis of all the 214Bi peaks. Quantitative evaluations should be made on the four 214Bi peaks in the region of interest. There is no statement of the net count rate of the peaks other than the 2039-keV peak. Since the entire spectrum is not presented, it is difficult to compare relative strengths of peaks. There is no discussion of the relative peak strengths before and after the single-site-event cut.

On the other hand, the Heidelberg-Moscow group claims that the signal found at Qββ consists of single-site events and is not a γ line. The signal does not occur in the Ge experiments not enriched in the double beta emitter 76Ge, while neighbouring background lines appear consistently in these experiments. On this basis they translated the observed numbers of events into half-lives for neutrinoless double beta decay. The Heidelberg-Moscow experiment ran regularly from 1990 till 2003, and the analysis of the full data taken in the period 2 August 1990 until 20 May 2003 has been presented. The completed Heidelberg-Moscow 76Ge experiment, with 71.7 kg y after 13 years of operation, quotes a neutrino mass range of mν = 0.24 - 0.58 eV (99.997% C.L.) with a best value of 0.4 eV (95% C.L.).

5. The proposed MAJORANA experiment

While an unambiguous interpretation of all of the neutrino oscillation experiments is not yet possible, it is abundantly clear that neutrinos exhibit properties not included in the standard model, namely mass and flavor mixing. Accordingly, sensitive searches for neutrinoless double-beta decay (0νββ-decay) are more important than ever. Experiments with large quantities of Ge, isotopically enriched in 76Ge, have thus far proven to be the most sensitive, specifically the Heidelberg-Moscow and IGEX experiments, with lower limits on the half-life of 1.9 × 10^25 y and 1.6 × 10^25 y respectively. A new generation of experiments will be required to make significant improvements in sensitivity, one of which is the proposed Majorana Experiment.

The Majorana Experiment is a next-generation Ge double-beta decay search which will employ 500 kg of Ge, isotopically enriched to 86% in 76Ge, in the form of ~200 detectors in a close-packed array for high granularity. Each crystal will be electronically segmented, with each region fitted with pulse-shape analysis electronics. A half-life sensitivity of 4.2 × 10^27 years is predicted, or <mν> ~ 0.02 - 0.07 eV, depending on the nuclear matrix elements used to interpret the data. The Majorana experiment is proposed for a US deep underground laboratory, and requires very little R&D as it stands on the technical shoulders of the IGEX experiment and other previous successful double-beta decay and low-background experiments. Furthermore, new segmented Ge detector technology has recently become commercially available, while Pacific Northwest National Laboratory (PNNL)/University of South Carolina (USC) researchers have developed new pulse-shape discrimination techniques. Several configurations have been evaluated with respect to cryogenic performance and background reduction and rejection. The experiment will concentrate on a conventional modular design using ultra-low background cryostat technology developed by IGEX. It will also utilize new pulse-shape discrimination hardware and software techniques developed by the Majorana collaboration, together with detector segmentation, to reduce background. The Heidelberg-Moscow and IGEX experiments both utilized germanium enriched to 86% in 76Ge and operated deep underground.
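To put the quoted half-life figures in perspective, the short sketch below estimates how much a longer half-life limit tightens the bound on the effective neutrino mass. It assumes the standard light-Majorana-neutrino exchange relation, in which the 0νββ rate scales with the square of the effective mass, so that <mν> scales as T1/2^(-1/2) for a fixed phase-space factor and nuclear matrix element. The function name is hypothetical and the result is an order-of-magnitude guide only.

```python
import math

# Half-life values (years) quoted in the text.
T_HALF_HM = 1.9e25        # Heidelberg-Moscow lower limit
T_HALF_IGEX = 1.6e25      # IGEX lower limit
T_HALF_MAJORANA = 4.2e27  # predicted Majorana sensitivity


def mass_reach_gain(t_new, t_old):
    """Factor by which the <m_nu> bound tightens, assuming <m_nu> ~ T^(-1/2)
    with the same phase-space factor and nuclear matrix element."""
    return math.sqrt(t_new / t_old)


gain_vs_hm = mass_reach_gain(T_HALF_MAJORANA, T_HALF_HM)
gain_vs_igex = mass_reach_gain(T_HALF_MAJORANA, T_HALF_IGEX)
print(f"gain over Heidelberg-Moscow: {gain_vs_hm:.1f}")
print(f"gain over IGEX: {gain_vs_igex:.1f}")
```

Under this assumption a gain of more than two orders of magnitude in half-life translates into roughly a factor of 15 in mass reach, which is the sense in which the quoted <mν> range should be read.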
The projection for Majorana is that the background will be reduced by a factor of 65 over the early IGEX results prior to pulse shape analysis (from 0.2 to ~0.003 keV^-1 kg^-1 y^-1). This will occur mainly by the decay of the internal background due to cosmogenic neutron spallation reactions that produce 56Co, 58Co, 60Co, 65Zn and 68Ge in the germanium, by limiting the time above ground after crystal growth, by careful material selection and by electroforming copper cryostats.

One component of the background reduction will arise from the segmentation and granularity of the detector array. Most of the Compton continuum consists of single Compton scatterings followed by escape of the scattered gamma ray, whereas full-energy events at typical gamma-ray energies are primarily comprised of multiple scattering sequences followed by a photoelectric absorption. The peak-to-Compton ratio can therefore be enhanced by requiring a recorded event to correspond to more than one interaction within the detector before its acceptance. In germanium detectors, this selection is usually accomplished by subdividing the detector into several segments (or providing several adjacent independent detectors) and seeking coincident pulses from two or more of the independent segments. When coincidences are found, the output from all detector segments is summed and recorded. The resulting spectrum is made up only of the full-energy peak lying above a featureless continuum that is greatly suppressed and has no abrupt Compton edges.

New Ge experiments must not simply be a volume expansion of IGEX or Heidelberg–Moscow. They must have superior background rejection and better electronic stability. The summing of 200 individual energy spectra can result in a serious loss of energy resolution for the overall experiment, which can be avoided by segmenting n-type intrinsic Ge detectors, by advanced PSD techniques and by electronic stability in measurement.
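The coincidence selection described above (accept an event only when two or more segments fire, then record the summed energy) can be sketched as follows. This is a toy illustration, not the Majorana collaboration's software: the per-segment threshold, the event list and the function name are all hypothetical.

```python
# Toy sum-coincidence selection for a segmented detector.
# The threshold and the example events are hypothetical values.
THRESHOLD_KEV = 10.0  # assumed per-segment noise threshold


def sum_coincidence(segment_energies, threshold=THRESHOLD_KEV):
    """Return the summed energy if two or more segments fired, else None."""
    hits = [energy for energy in segment_energies if energy > threshold]
    if len(hits) < 2:
        return None  # single-segment event: not accepted in this mode
    return sum(hits)


events = [
    [1332.0, 0.0, 0.0],   # all energy in one segment: rejected
    [800.0, 532.0, 0.0],  # two segments fired: accepted, energies summed
    [400.0, 0.0, 5.0],    # second segment below threshold: rejected
]

spectrum = [total for total in (sum_coincidence(ev) for ev in events)
            if total is not None]
print(spectrum)  # [1332.0]
```

Only the multi-segment event survives, so the recorded spectrum keeps full-energy events built from several interactions and discards the single-scatter events that make up most of the Compton continuum.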
The above figure depicts a standard Ge detector segmentation scheme with a six-azimuthal-segment by two-axial-segment geometry. This is the configuration of the SEGA detector undergoing tests by the Majorana collaboration. Efforts are thus under way with the Majorana experiment to search for neutrinoless double beta decay, which would give a new shape to the standard model of physics.

Majorana cannot simply be a volume expansion of IGEX, but must have superior background rejection. As it was conclusively shown that the limiting background in at least some previous experiments has been cosmogenic activation of the germanium itself, it is necessary to mitigate those background sources. Cosmogenic activity fortunately has certain features which discriminate it from the signal of interest. For example, while 0νββ-decay would deposit 2 MeV between two electrons in a small, perhaps 1 mm^3, volume, internal 60Co decay deposits about 318 keV (endpoint) in beta energy near the decaying atom, while the simultaneous 1173 keV and 1332 keV gammas can deposit energy elsewhere in the crystal, most probably both in more than one location, for a total energy capable of reaching the 2039 keV region of interest. A similar situation exists for internal 68Ge decay. Thus deposition-location multiplicity distinguishes double-beta decay from the important long-lived cosmogenics in germanium. Isotopes such as 56Co, 57Co, 58Co and 68Ge are produced at a rate of roughly 1 atom per day per kilogram on the earth’s surface. Only 60Co and 68Ge have both the energy and half-life to be of concern.

To pursue the multiplicity parameter, firstly, the detector current pulse shape carries with it the record of energy deposition along the electric field lines in the crystal; that is, the radial dimension of cylindrical detectors. This information may be exploited through pulse-shape discrimination.
Secondly, the electrical contacts of the detector may be divided to produce independent regions of charge collection. The ability of new techniques to be easily calibrated for individual detectors makes them practical for large detector arrays. Calibration for single-site event pulses was accomplished by collecting pulses from thorium ore; the 2614.47-keV gamma ray from 208Tl produces a largely single-site double-escape peak at 1592.47 keV. The PSD discriminator was then calibrated to the properties of the double-escape peak. A slightly improved double-escape peak can be obtained from the 2938.22-keV gamma ray of 26Al. This double-escape peak appears at 1916.22 keV, only about 120 keV away from the expected region of interest for 0νββ-decay.

The obvious and direct use of pulse-shape discrimination and segmentation is the rejection of cosmogenic pulses in the germanium itself. However, the approach should also be effective on gamma rays from the shielding and structural materials. The background effects of neutrons of both high energy (cosmic muon generated) and low energy (fission and (α,n) from rock) could be mitigated by the segmentation and granularity of the detectors. These neutrons could also produce other unwanted activities, like the formation of 3H and 14C in nitrogen from high and low energy neutrons, respectively. Fortunately, Majorana detectors will not be surrounded by nitrogen at high density.

GERDA (GERmanium Detector Assembly), another next-generation 76Ge double beta decay experiment at the Gran Sasso Underground Laboratory, has projected a sensitivity in the half-life of the 0νββ-decay mode which is lower than that of the proposed Majorana experiment.

In conclusion, the Majorana project has been designed in a compact, modular way such that it can be built and operated with high confidence in the approach and the technology.
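The energy bookkeeping used earlier for the PSD calibration lines and for the internal 60Co background can be verified directly. The sketch below assumes the standard electron rest energy of 510.999 keV (a value not quoted in the text) and uses a rounded Q-value of 2039 keV; the function name is hypothetical.

```python
# Energy bookkeeping for double-escape peaks and the internal 60Co background.
ELECTRON_REST_ENERGY_KEV = 510.999  # standard value, assumed (not in the text)
Q_BB_KEV = 2039.0                   # rounded 76Ge Q-value used in the text


def double_escape_peak(gamma_kev):
    """Peak energy when both 511 keV annihilation photons escape the crystal."""
    return gamma_kev - 2 * ELECTRON_REST_ENERGY_KEV


for label, gamma_kev in [("208Tl", 2614.47), ("26Al", 2938.22)]:
    peak = double_escape_peak(gamma_kev)
    print(f"{label}: double-escape peak at {peak:.2f} keV, "
          f"{Q_BB_KEV - peak:.0f} keV below the region of interest")

# Internal 60Co: beta endpoint plus both cascade gammas (keV, from the text).
co60_total_kev = 318.0 + 1173.0 + 1332.0
print(f"60Co maximum deposit {co60_total_kev:.0f} keV, "
      f"can reach region of interest: {co60_total_kev >= Q_BB_KEV}")
```

The computed peak positions reproduce the 1592.47 keV and 1916.22 keV values quoted for the calibration lines, and the 60Co total confirms that a fully contained decay can deposit more than the Q-value, which is why multi-site rejection matters for this background.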
The initial years of construction will allow alternate cooling methods to be employed if they have an advantage and should they be shown to overcome long-term concerns due to surface contamination, muon-induced ions, and diffusion. The Majorana Collaboration has made an extensive analysis of the predicted backgrounds and their impact on the final sensitivity of the experiment. The Majorana experiment represents a great increase in Ge mass over IGEX, with new segmented Ge detectors and the newest electronic systems for pulse-shape discrimination. Their conclusion is that with 500 kg of Ge, enriched to 86% in the isotope 76Ge, the Majorana array operating over 10 years, including construction time, can reach a lower limit on T1/2(0ν) of 4 × 10^27 years. This corresponds to an upper bound on <mν> of 0.038 ± 0.007 eV. One advantage of 76Ge is that it may well be a candidate for a future, more reliable microscopic calculation of the 0νββ-decay nuclear matrix element.

6 Conclusion

Neutrinoless double beta decay is thus one of the most sensitive approaches, with great perspectives, to test particle physics beyond the Standard Model. The possibilities to use 0νββ decay for constraining neutrino masses, left-right symmetric models, SUSY and leptoquark scenarios, as well as effective lepton number violating couplings, have been reviewed. It is a very sensitive probe of the lepton number violating terms in the Lagrangian, such as the Majorana mass of the light neutrinos, right-handed weak couplings involving heavy Majorana neutrinos, as well as Higgs and other interactions involving violation of chirality conservation.

In the search for neutrinoless double beta decay, 76Ge as the source material has multiple advantages. It offers high resolution (< 4 keV at Qββ) with no background from the 2ν mode. A huge leap in sensitivity is possible by applying ultra-low background techniques and 0νββ signal discrimination.
There can be a phased approach in the experiment, with increments of the target mass. The source and the detector are the same material, thereby reducing background and maintaining the 4π geometry. It is also the only way to scrutinize the 0ν-DBD claim on a short time scale, since it tests T1/2 and not mν.

The consequences of neutrinoless double beta decay are:

[1] Total lepton number violation: The most important consequence of the observation of neutrinoless double beta decay is that lepton number is not conserved. This is fundamental for particle physics.

[2] Majorana nature of the neutrino: Another fundamental consequence is that the neutrino is a Majorana particle. Both of these conclusions are independent of any discussion of nuclear matrix elements.

[3] Effective neutrino mass: The matrix element enters when we derive a value for the effective neutrino mass, making the most natural assumption that the 0νββ decay amplitude is dominated by exchange of a massive Majorana neutrino.

Acknowledgements

I would like to thank the IGEX collaboration, the Heidelberg-Moscow collaboration and the Majorana collaboration for having used information from their experimental works to write up this brief review.

ABSTRACT

Neutrinoless double beta decay is one of the most sensitive approaches in non-accelerator particle physics to take us into a regime of physics beyond the standard model. This article is a brief review of the experiments in search of neutrinoless double beta decay of 76Ge. Following a brief introduction to the process of double beta decay of 76Ge, the results of the very first experiments, IGEX and Heidelberg-Moscow, which give indications of the existence of a possible neutrinoless double beta decay mode, are reviewed. Then ongoing efforts to substantiate the early findings are presented, and the Majorana experiment, a future experimental approach which will allow a very detailed study of the neutrinoless decay mode, is discussed.
<|endoftext|><|startoftext|> Introduction The geometrical superfield approach [1-8] to Becchi-Rouet-Stora-Tyutin (BRST) formalism is one of the most attractive and intuitive approaches which enables us to gain some physical insights into the beautiful (but abstract mathematical) structures that are associated with the nilpotent (anti-)BRST symmetry transformations and their corresponding generators. The latter quantities play a very decisive role in (i) the covariant canonical quantization of the gauge theories, (ii) the proof of the unitarity of the “quantum” gauge theories at any arbitrary order of perturbative computations for a given physical process (that is allowed by the theory), (iii) the definition of the physical states of the “quantum” gauge theories in the quantum Hilbert space, and (iv) the cohomological description of the physical states of the quantum Hilbert space w.r.t. the conserved and nilpotent BRST charge. To be specific, in the superfield formulation [1-8] of the 4D 1-form gauge theories, one defines the super curvature 2-form F̃ (2) = d̃Ã(1)+ i Ã(1)∧ Ã(1) in terms of the super exterior derivative d̃ = dxµ∂µ + dθ∂θ + dθ̄∂θ̄ (with d̃ 2 = 0) and the super 1-form connection Ã(1) on a (4, 2)-dimensional supermanifold parametrized by the usual spacetime variables xµ (with µ = 0, 1, 2, 3) and a pair of anticommuting (i.e. θ2 = θ̄2 = 0, θθ̄ + θ̄θ = 0) Grassmannian variables θ and θ̄. The above super 2-form is subsequently equated, due to the so-called horizontality condition [1-8], to the ordinary curvature 2-form F (2) = dA(1) + iA(1) ∧ A(1) defined on the ordinary 4D flat Minkowski spacetime manifold in terms of the ordinary exterior derivative d = dxµ∂µ (with d 2 = 0) and the 1-form connection A(1) = dxµAµ. 
The above super exterior derivative d̃ and super 1-form connection Ã(1) are the generalization of the 4D ordinary exterior derivative d and 1-form connection A(1) to the (4, 2)-dimensional supermanifold because d̃ → d, Ã(1) → A(1) in the limit (θ, θ̄) → 0. The above horizontality condition (HC) has been referred to as the soul-flatness condition in [9] which amounts to setting equal to zero all the Grassmannian components of the (anti)symmetric second-rank super tensor that constitutes the super curvature 2-form F̃ (2) on the (4, 2)-dimensional supermanifold. The key consequences, that emerge from the HC, are (i) the derivation of the nilpotent (anti-)BRST symmetry transformations for the gauge and (anti-)ghost fields of a given 4D 1-form gauge theory, (ii) the geometrical interpretation of the (anti-)BRST symmetry transformations for the 4D local fields as the translation of the corresponding superfields along the Grassmannian directions of the supermanifold, (iii) the geometrical interpretation of the nilpotency property as a pair of successive translations of the superfield along a particular Grassmannian direction of the supermanifold, and (iv) the geometrical interpretation of the anticommutativity property of the (anti-)BRST symmetry transformations for a 4D local field as the sum of (a) the translation of the corresponding superfield first along the θ-direction followed by the translation along the θ̄-direction, and (b) the translation of the same superfield first along the θ̄-direction followed by the translation along the θ-direction. It will be noted that the above HC (i.e. F̃ (2) = F (2)) is valid for the non-Abelian (i.e. A(1)(n) ∧ A(1)(n) ≠ 0) 1-form gauge theory as well as the Abelian (i.e. A(1) ∧ A(1) = 0) 1-form gauge theory. As expected, for both types of theories, the HC leads to the derivation of the nilpotent (anti-)BRST symmetry transformations for the gauge and (anti-)ghost fields of the respective theories.
We lay emphasis on the fact that the HC does not shed any light on the derivation of the nilpotent (anti-)BRST symmetry transformations associated with the matter fields of the interacting 4D (non-)Abelian 1-form gauge theories. In a recent set of papers [10-17], the above HC condition has been generalized, in a consistent manner, so as to compute the nilpotent (anti-)BRST symmetry transformations associated with the matter fields of a given 4D interacting 1-form gauge theory (along with the well-known nilpotent transformations for the gauge and (anti-)ghost fields) without spoiling the cute geometrical interpretations of the (anti-)BRST symmetry transformations (and their corresponding generators) that emerge from the HC alone. The latter approach has been christened as the augmented superfield approach to BRST formalism where the restrictions imposed on the (4, 2)-dimensional superfields are (i) the HC plus the invariance of the (super) matter Noether conserved currents [10-14], (ii) the HC plus the equality of any (super) conserved quantities [15], (iii) the HC plus a restriction that owes its origin to the gauge invariance and the (super) covariant derivatives on the matter (super)fields [16,17], and (iv) an alternative to the HC where the gauge invariance and the property of a pair of (super) covariant derivatives on the (super) matter fields (and their intimate connection with the (super) curvatures) play a crucial role [18-20]. In all the above approaches [1-20], however, the invariance of the Lagrangian densities of the 4D (non-)Abelian 1-form gauge theories, under the nilpotent (anti-)BRST symmetry transformations, has not yet been discussed at all. Some attempts in this direction have been made in our earlier works where the specific topological features [21,22] of the 2D free (non-)Abelian 1-form gauge theories have been captured in the superfield formulation [23-25]. 
In particular, the invariance of the Lagrangian density under the nilpotent and anticommuting (anti-)BRST and (anti-)co-BRST symmetry transformations has been expressed in terms of the superfields and the Grassmannian derivatives on them. These are, however, a bit more involved in nature because of the existence of a new set of nilpotent (anti-)co-BRST symmetries in the theory. The geometrical interpretations for the Lagrangian densities and the symmetric energy-momentum tensor (for the above topological theory) have also been provided within the framework of the superfield formulation. The purpose of our present paper is to capture the (anti-)BRST symmetry invariance of the Lagrangian density of the 4D (non-)Abelian 1-form gauge theories within the framework of the superfield approach to BRST formalism and to demonstrate that the above symmetry invariance could be understood in a very simple manner in terms of the translational generators along the Grassmannian directions of the (4, 2)-dimensional supermanifold on which the above 4D ordinary gauge theories are considered. In addition, the reason behind the existence (or non-existence) of any specific nilpotent symmetry transformation could also be explained within the framework of the above superfield approach. We demonstrate the uniqueness of the existence of the nilpotent (anti-)BRST symmetry transformations for the Lagrangian density of a U(1) Abelian 1-form gauge theory. We go a step further and show the existence of the nilpotent BRST symmetry transformations for the specific Lagrangian densities (cf. (4.1) and (4.4) below) of the 4D non-Abelian 1-form gauge theory and clarify the non-existence of the anti-BRST symmetry transformations for these specific Lagrangian densities within the framework of the superfield formulation (cf. section 5 below).
Finally, we provide the geometrical basis for the existence of the off-shell nilpotent and anticommuting (anti-)BRST symmetry transformations (and their corresponding generators) for the specifically defined Lagrangian densities (cf. (4.7) and/or (4.8) below) of the 4D non-Abelian 1-form gauge theory in the Feynman gauge. The motivating factors that have propelled us to pursue our present investigation are as follows. First and foremost, to the best of our knowledge, the property of the symmetry invariance of a given Lagrangian density has not yet been captured in the language of the superfield approach to BRST formalism. Second, the above (anti-)BRST invariance of the theory has never been shown in as simple a fashion as we demonstrate in our present endeavour. The geometrical interpretations for (i) the existence of the above nilpotent (anti-)BRST symmetry invariance, and (ii) the on-shell conditions of the on-shell nilpotent (anti-)BRST symmetries, turn out to be quite transparent in our present work. Third, we establish the uniqueness of the existence of the (anti-)BRST symmetry invariance in their various forms. The non-existence of the specific symmetry transformation is also explained within the framework of the superfield approach to BRST formalism. Finally, our present investigation is the first modest step in the direction of gaining some insights into the existence of the nilpotent symmetry transformations and their invariance for the higher form (e.g. 2-form, 3-form, etc.) gauge theories within the framework of the superfield formulation. The contents of our present paper are organized as follows. In section 2, we recapitulate some of the key points connected with the nilpotent (anti-)BRST symmetry transformations for the free 4D Abelian 1-form gauge theory (having no interaction with matter fields) in the Lagrangian formulation.
The above symmetry transformations as well as the symmetry invariance of the Lagrangian densities are captured in the geometrical superfield approach to BRST formalism in section 3 where the HC on the gauge superfield plays a crucial role. Section 4 deals with the bare essentials of the nilpotent (anti-)BRST symmetry transformations for the 4D non-Abelian 1-form gauge theory in the Lagrangian formulation. The subject matter of section 5 concerns itself with the superfield formulation of the symmetry invariance of the appropriate Lagrangian densities of the above 4D non-Abelian 1-form gauge theory. Finally, in section 6, we summarize our key results, make some concluding remarks and point out a few future directions for further investigations.

2 (Anti-)BRST symmetries in Abelian theory: Lagrangian formulation

Let us begin with the following (anti-)BRST invariant Lagrangian density of the 4D Abelian 1-form gauge theory∗ in the Feynman gauge [26,27,9]

$$
\mathcal{L}_B = -\frac{1}{4}\, F^{\mu\nu} F_{\mu\nu} + B\, (\partial_\mu A^\mu) + \frac{1}{2}\, B^2 - i\, \partial_\mu \bar{C}\, \partial^\mu C, \tag{2.1}
$$

where Fµν = ∂µAν − ∂νAµ is the antisymmetric (Fµν = −Fνµ) curvature tensor that constitutes the Abelian 2-form F (2) = dA(1) ≡ (1/2!) (dxµ ∧ dxν) Fµν, B is the Nakanishi-Lautrup auxiliary multiplier field and (C̄)C are the anticommuting (i.e. C² = C̄² = 0, CC̄ + C̄C = 0) (anti-)ghost fields of the theory. The above Lagrangian density respects the off-shell nilpotent (s²(a)b = 0) (anti-)BRST symmetry transformations s(a)b (with sbsab + sabsb = 0)

$$
\begin{aligned}
s_b A_\mu &= \partial_\mu C, & s_b C &= 0, & s_b \bar{C} &= i B, & s_b B &= 0, & s_b F_{\mu\nu} &= 0, \\
s_{ab} A_\mu &= \partial_\mu \bar{C}, & s_{ab} \bar{C} &= 0, & s_{ab} C &= -i B, & s_{ab} B &= 0, & s_{ab} F_{\mu\nu} &= 0.
\end{aligned} \tag{2.2}
$$

It is clear that, under the nilpotent (anti-)BRST symmetry transformations s(a)b, the curvature tensor Fµν is found to be invariant. In other words, the 2-form F (2), owing its origin to the cohomological operator d = dxµ∂µ, is an (anti-)BRST invariant object for the Abelian U(1) 1-form gauge theory and is, therefore, a physically meaningful (i.e. gauge-invariant) quantity.
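As a cross-check of the statement that the Lagrangian density (2.1) respects the transformations (2.2), the variation can be worked out explicitly. The computation below is a sketch under two stated assumptions: that (2.1) has the form $\mathcal{L}_B = -\frac{1}{4} F^{\mu\nu}F_{\mu\nu} + B(\partial_\mu A^\mu) + \frac{1}{2}B^2 - i\,\partial_\mu \bar{C}\,\partial^\mu C$, and that $s_{(a)b}$ act as graded derivations from the left (picking up a minus sign when passing a fermionic field) and commute with $\partial_\mu$.

$$
\begin{aligned}
s_b \mathcal{L}_B &= B\, \partial_\mu (s_b A^\mu) - i\, \partial_\mu (s_b \bar{C})\, \partial^\mu C
= B\, \Box C + \partial_\mu B\, \partial^\mu C = \partial_\mu \big( B\, \partial^\mu C \big), \\
s_{ab} \mathcal{L}_B &= B\, \partial_\mu (s_{ab} A^\mu) + i\, \partial_\mu \bar{C}\, \partial^\mu (s_{ab} C)
= B\, \Box \bar{C} + \partial_\mu \bar{C}\, \partial^\mu B = \partial_\mu \big( B\, \partial^\mu \bar{C} \big).
\end{aligned}
$$

The terms $-\frac{1}{4} F^{\mu\nu}F_{\mu\nu}$ and $\frac{1}{2} B^2$ drop out because $s_{(a)b} F_{\mu\nu} = 0$ and $s_{(a)b} B = 0$, while $s_b C = 0$ and $s_{ab} \bar{C} = 0$ remove the remaining ghost contributions. In both cases the Lagrangian density changes only by a total spacetime derivative, so the action is invariant.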
These observations will play an important role in our discussion of the horizontality condition that will be exploited in the context of our superfield approach to the (anti-)BRST invariance of the Lagrangian densities in sections 3 and 5 (see below).

A noteworthy point, at this stage, is the observation that the gauge-fixing and Faddeev-Popov ghost terms can be written, modulo a total derivative, in the following fashion

$$s_b\Bigl[-i\,\bar C\,\bigl\{(\partial_\mu A^\mu) + \tfrac{1}{2}B\bigr\}\Bigr], \qquad s_{ab}\Bigl[+i\,C\,\bigl\{(\partial_\mu A^\mu) + \tfrac{1}{2}B\bigr\}\Bigr], \qquad s_b\,s_{ab}\Bigl[\tfrac{i}{2}\,A_\mu A^\mu + \tfrac{1}{2}\,C\bar C\Bigr]. \tag{2.3}$$

The above equation establishes, in a very simple manner, the (anti-)BRST invariance of the 4D Lagrangian density (2.1). The simplicity ensues due to (i) the nilpotency $s_{(a)b}^2 = 0$ of the (anti-)BRST symmetry transformations, (ii) the anticommutativity property (i.e. $s_b s_{ab} + s_{ab}s_b = 0$) of $s_{(a)b}$, and (iii) the invariance of the $F_{\mu\nu}$ term under $s_{(a)b}$.

∗We adopt here the notations and conventions such that the flat Minkowski metric in 4D is $\eta_{\mu\nu} = \mathrm{diag}\,(+1,-1,-1,-1)$ so that $A_\mu B^\mu = \eta_{\mu\nu}A^\mu B^\nu = A_0B_0 - A_iB_i$ for two non-null 4-vectors $A_\mu$ and $B_\mu$. The Greek indices $\mu, \nu, \ldots = 0, 1, 2, 3$ and Latin indices $i, j, k, \ldots = 1, 2, 3$ stand for the 4D spacetime and 3D space directions on the 4D Minkowski spacetime manifold, respectively, and the symbol $\Box = (\partial_0)^2 - (\partial_i)^2$.

†We follow here the notations and conventions adopted in [27]. In its full blaze of glory, the nilpotent (anti-)BRST transformations $\delta_{(A)B}$ are a product of an anticommuting spacetime independent parameter $\eta$ and $s_{(a)b}$ (i.e. $\delta_{(A)B} = \eta\,s_{(a)b}$) where the nilpotency property is encoded in the operators $s_{(a)b}$.

As a side remark, it is interesting to note that the following on-shell (i.e. $\Box C = \Box\bar C = 0$) nilpotent ($\tilde s_{(a)b}^2 = 0$) (anti-)BRST symmetry transformations (with $\tilde s_b\tilde s_{ab} + \tilde s_{ab}\tilde s_b = 0$)

$$\begin{aligned}
\tilde s_b A_\mu &= \partial_\mu C, \quad \tilde s_b C = 0, \quad \tilde s_b\bar C = -i\,(\partial_\mu A^\mu), \quad \tilde s_b F_{\mu\nu} = 0,\\
\tilde s_{ab}A_\mu &= \partial_\mu\bar C, \quad \tilde s_{ab}\bar C = 0, \quad \tilde s_{ab}C = +i\,(\partial_\mu A^\mu), \quad \tilde s_{ab}F_{\mu\nu} = 0,
\end{aligned}\tag{2.4}$$

are the symmetry transformations for the following Lagrangian density

$$\mathcal{L}_b = -\frac{1}{4}\,F^{\mu\nu}F_{\mu\nu} - \frac{1}{2}\,(\partial_\mu A^\mu)^2 - i\,\partial_\mu\bar C\,\partial^\mu C.$$
(2.5)

The above transformations (2.4) and the Lagrangian density (2.5) have been derived from (2.2) and (2.1) by the substitution $B = -(\partial_\mu A^\mu)$. An interesting point, connected with the on-shell nilpotent symmetry transformations, is to express the analogue of (2.3) as‡

$$\tilde s_b\Bigl[\tfrac{i}{2}\,\bar C\,(\partial_\mu A^\mu) + i\,A_\mu\partial^\mu\bar C\Bigr], \qquad \tilde s_{ab}\Bigl[-\tfrac{i}{2}\,C\,(\partial_\mu A^\mu) - i\,A_\mu\partial^\mu C\Bigr], \qquad \tilde s_b\,\tilde s_{ab}\Bigl[\tfrac{i}{2}\,A_\mu A^\mu + \tfrac{1}{2}\,C\bar C\Bigr]. \tag{2.6}$$

It should be noted that, in the above computation, one has to take into account the on-shell ($\Box C = \Box\bar C = 0$) conditions so that, for all practical purposes, $\tilde s_{(a)b}(\partial_\mu A^\mu) = 0$.

The above nilpotent (anti-)BRST symmetry transformations (i.e. $s_r$, $\tilde s_r$ with $r = b, ab$) are connected with the conserved and nilpotent generators (i.e. $Q_r$, $\tilde Q_r$ with $r = b, ab$). This statement can be succinctly expressed, in mathematical form, as

$$s_r\,\Omega = -i\,[\,\Omega, Q_r\,]_{(\pm)}, \qquad \tilde s_r\,\tilde\Omega = -i\,[\,\tilde\Omega, \tilde Q_r\,]_{(\pm)}, \qquad r = b, ab, \tag{2.7}$$

where the subscripts $(\pm)$ on the square bracket indicate that the bracket is an (anti)commutator for the generic fields $\Omega = A_\mu, C, \bar C, B$ and $\tilde\Omega = A_\mu, C, \bar C$ (of the Lagrangian densities (2.1) and (2.5)) being (fermionic) bosonic in nature. The above charges $Q_r$, $\tilde Q_r$ are found to be anticommuting (i.e. $Q_bQ_{ab} + Q_{ab}Q_b = 0$, $\tilde Q_b\tilde Q_{ab} + \tilde Q_{ab}\tilde Q_b = 0$) and off-shell as well as on-shell nilpotent ($Q_{(a)b}^2 = 0$, $\tilde Q_{(a)b}^2 = 0$) in nature, respectively.

3 (Anti-)BRST invariance in Abelian theory: superfield formalism

In this section, we exploit the geometrical superfield approach to BRST formalism, endowed with the theoretical arsenal of the horizontality condition, to express the (anti-)BRST symmetry transformations and the Lagrangian densities (cf. (2.1) and (2.5)) in terms of the superfields defined on the (4, 2)-dimensional supermanifold. The latter is parametrized by the spacetime coordinates $x^\mu$ (with $\mu = 0, 1, 2, 3$) and a pair of Grassmannian variables $\theta$ and $\bar\theta$.
As a consequence, the generalizations of the 4D ordinary exterior derivative $d = dx^\mu\partial_\mu$ and the 1-form connection $A^{(1)} = dx^\mu A_\mu(x)$ on the (4, 2)-dimensional supermanifold are

$$\begin{aligned}
d \;\to\; \tilde d &= dx^\mu\,\partial_\mu + d\theta\,\partial_\theta + d\bar\theta\,\partial_{\bar\theta}, \qquad \tilde d^{\,2} = 0,\\
A^{(1)} \;\to\; \tilde A^{(1)} &= dx^\mu\,\mathcal{B}_\mu(x,\theta,\bar\theta) + d\theta\,\bar{\mathcal F}(x,\theta,\bar\theta) + d\bar\theta\,\mathcal{F}(x,\theta,\bar\theta),
\end{aligned}\tag{3.1}$$

where the mappings from the 4D local fields to the superfields are: $A_\mu(x) \to \mathcal{B}_\mu(x,\theta,\bar\theta)$, $C(x) \to \mathcal{F}(x,\theta,\bar\theta)$ and $\bar C(x) \to \bar{\mathcal F}(x,\theta,\bar\theta)$.

‡We lay emphasis on the fact that (2.6) cannot be derived directly from (2.3) by the simple substitution $B = -(\partial_\mu A^\mu)$. One has to be judicious to deduce the precise expression for (2.6). The logical reasons behind the derivation of (2.6) are encoded in the superfield formulation (cf. (3.9) below).

The super-expansions of the superfields, in terms of the basic fields as well as the secondary fields, are (see, e.g., [4-7, 10-12]):

$$\begin{aligned}
\mathcal{B}_\mu(x,\theta,\bar\theta) &= A_\mu(x) + \theta\,\bar R_\mu(x) + \bar\theta\,R_\mu(x) + i\,\theta\bar\theta\,S_\mu(x),\\
\mathcal{F}(x,\theta,\bar\theta) &= C(x) + i\,\theta\,\bar B_1(x) + i\,\bar\theta\,B_1(x) + i\,\theta\bar\theta\,s(x),\\
\bar{\mathcal F}(x,\theta,\bar\theta) &= \bar C(x) + i\,\theta\,\bar B_2(x) + i\,\bar\theta\,B_2(x) + i\,\theta\bar\theta\,\bar s(x).
\end{aligned}\tag{3.2}$$

It can be readily seen that, in the limiting case of $(\theta,\bar\theta) \to 0$, we get back our 4D basic fields $(A_\mu, C, \bar C)$. Furthermore, on the r.h.s. of the above super-expansion, the bosonic (i.e. $A_\mu, S_\mu, B_1, \bar B_1, B_2, \bar B_2$) and the fermionic ($R_\mu, \bar R_\mu, C, \bar C, s, \bar s$) degrees of freedom match.

At this juncture, we have to recall our observations after equation (2.2). The nilpotent (anti-)BRST symmetry transformations basically owe their origin to the cohomological operator $d$. This is capitalized on in the horizontality condition, where we impose the restriction $\tilde d\tilde A^{(1)} = dA^{(1)}$ on the super 1-form connection $\tilde A^{(1)}$ that contains the superfields defined on the (4, 2)-dimensional supermanifold. The latter condition yields the following relationships (see, e.g., for details, our earlier works [21-25]):

$$B_1 = \bar B_2 = s = \bar s = 0, \qquad \bar B_1 + B_2 = 0, \tag{3.3}$$

where we are free to choose the secondary fields $(B_2, \bar B_1)$ (i.e.
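$B_2 = B \Rightarrow \bar B_1 = -B$) in terms of the Nakanishi-Lautrup auxiliary field $B$ of the BRST invariant Lagrangian density (2.1). The other relations that emerge from the above HC (i.e. $\tilde d\tilde A^{(1)} = dA^{(1)}$) are

$$R_\mu = \partial_\mu C, \qquad \bar R_\mu = \partial_\mu\bar C, \qquad S_\mu = \partial_\mu B, \qquad \partial_\mu\mathcal{B}_\nu - \partial_\nu\mathcal{B}_\mu = \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{3.4}$$

At this stage, the super-curvature tensor $\tilde F_{\mu\nu} = \partial_\mu\mathcal{B}_\nu - \partial_\nu\mathcal{B}_\mu$ is not equal to the ordinary curvature tensor $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ as the former contains Grassmannian dependent terms. The substitution of the above values (cf. (3.3), (3.4)) of the secondary fields, in terms of the basic and auxiliary fields of the Lagrangian density (2.1), leads to

$$\begin{aligned}
\mathcal{B}^{(h)}_\mu(x,\theta,\bar\theta) &= A_\mu + \theta\,\partial_\mu\bar C + \bar\theta\,\partial_\mu C + i\,\theta\bar\theta\,\partial_\mu B,\\
\mathcal{F}^{(h)}(x,\theta,\bar\theta) &= C - i\,\theta\,B,\\
\bar{\mathcal F}^{(h)}(x,\theta,\bar\theta) &= \bar C + i\,\bar\theta\,B,
\end{aligned}\tag{3.5}$$

where the superscript $(h)$ denotes that the above expansions have been obtained after the application of the HC. It can be seen that, due to (3.5), we get

$$\partial_\mu\mathcal{B}^{(h)}_\nu - \partial_\nu\mathcal{B}^{(h)}_\mu = \partial_\mu A_\nu - \partial_\nu A_\mu, \tag{3.6}$$

where there is no Grassmannian $\theta$ and $\bar\theta$ dependence on the l.h.s. In the language of the geometry on the (4, 2)-dimensional supermanifold, the expansions (3.5) imply that the (anti-)BRST symmetry transformations $s_{(a)b}$ (and their corresponding generators $Q_{(a)b}$) for the 4D local fields (cf. (2.7)) are connected with the translational generators $(\partial/\partial\theta, \partial/\partial\bar\theta)$ because the translation of the corresponding (4, 2)-dimensional superfields, along the Grassmannian directions of the supermanifold, produces them. Thus, the Grassmannian independence of the super curvature tensor $\tilde F^{(h)}_{\mu\nu} = \partial_\mu\mathcal{B}^{(h)}_\nu - \partial_\nu\mathcal{B}^{(h)}_\mu$ implies that the 4D curvature tensor $F_{\mu\nu}$ is an (anti-)BRST (i.e. gauge) invariant physical quantity.

In terms of the superfields, equations (2.3) can be expressed as

$$\lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[-i\,\bar{\mathcal F}^{(h)}\bigl\{(\partial^\mu\mathcal{B}^{(h)}_\mu) + \tfrac{1}{2}B\bigr\}\Bigr], \qquad \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}\Bigl[+i\,\mathcal{F}^{(h)}\bigl\{(\partial^\mu\mathcal{B}^{(h)}_\mu) + \tfrac{1}{2}B\bigr\}\Bigr], \qquad \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\Bigl[\tfrac{i}{2}\,\mathcal{B}^{\mu(h)}\mathcal{B}^{(h)}_\mu + \tfrac{1}{2}\,\mathcal{F}^{(h)}\bar{\mathcal F}^{(h)}\Bigr]. \tag{3.7}$$

These equations are unique because there is no other way to express the above equations in terms of the derivatives w.r.t. the Grassmannian variables $\theta$ and $\bar\theta$.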
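Thus, besides (2.3), there is no other possibility to express the gauge-fixing and the Faddeev-Popov ghost terms in the language of the off-shell nilpotent (anti-)BRST symmetry transformations (2.2). The superfield approach to BRST formulation, therefore, establishes the uniqueness of (2.3).

To express (2.6) in terms of the superfields, one has to substitute $B = -(\partial_\mu A^\mu)$ in (3.5). Thus, the expansion (3.5), in terms of the transformations (2.4), becomes§

$$\begin{aligned}
\mathcal{B}^{(h)}_{\mu(o)}(x,\theta,\bar\theta) &= A_\mu + \theta\,\partial_\mu\bar C + \bar\theta\,\partial_\mu C - i\,\theta\bar\theta\,\partial_\mu(\partial^\rho A_\rho)\\
&\equiv A_\mu + \theta\,(\tilde s_{ab}A_\mu) + \bar\theta\,(\tilde s_bA_\mu) + \theta\bar\theta\,(\tilde s_b\tilde s_{ab}A_\mu),\\
\mathcal{F}^{(h)}_{(o)}(x,\theta,\bar\theta) &= C + i\,\theta\,(\partial_\mu A^\mu) \equiv C + \theta\,(\tilde s_{ab}C),\\
\bar{\mathcal F}^{(h)}_{(o)}(x,\theta,\bar\theta) &= \bar C - i\,\bar\theta\,(\partial_\mu A^\mu) \equiv \bar C + \bar\theta\,(\tilde s_b\bar C).
\end{aligned}\tag{3.8}$$

We note that (3.5) and (3.8) are the super-expansions (after the application of the HC) which lead to the derivation of the off-shell nilpotent (anti-)BRST symmetry transformations $s_{(a)b}$ as well as the on-shell nilpotent (anti-)BRST symmetry transformations $\tilde s_{(a)b}$, respectively, for the basic fields $A_\mu$, $C$ and $\bar C$ of the theory.

The gauge-fixing and Faddeev-Popov ghost terms of the Lagrangian density (2.5) can also be expressed in terms of the superfields (3.8). In other words (vis-à-vis (3.7)), we have the following expressions that are the analogues of (2.6), namely:

$$\lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[\tfrac{i}{2}\,\bar{\mathcal F}^{(h)}_{(o)}\,(\partial^\mu A_\mu) + i\,\mathcal{B}^{(h)}_{\mu(o)}\,\partial^\mu\bar{\mathcal F}^{(h)}_{(o)}\Bigr], \qquad \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}\Bigl[-\tfrac{i}{2}\,\mathcal{F}^{(h)}_{(o)}\,(\partial^\mu A_\mu) - i\,\mathcal{B}^{(h)}_{\mu(o)}\,\partial^\mu\mathcal{F}^{(h)}_{(o)}\Bigr], \qquad \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\Bigl[\tfrac{i}{2}\,\mathcal{B}^{(h)}_{\mu(o)}\mathcal{B}^{\mu(h)}_{(o)} + \tfrac{1}{2}\,\mathcal{F}^{(h)}_{(o)}\bar{\mathcal F}^{(h)}_{(o)}\Bigr]. \tag{3.9}$$

We know that, for all practical computational purposes, it is essential to take into account $\tilde s_{(a)b}(\partial_\mu A^\mu) = 0$ because of the on-shell conditions $\Box C = \Box\bar C = 0$. The logical reason behind such a restriction (i.e. $\tilde s_{(a)b}(\partial_\mu A^\mu) = 0$) in (2.6) is encoded in the superfield approach to BRST formalism, as can be seen from a close look at (3.9).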
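The first expression in (3.9) can be evaluated explicitly from the expansions (3.8). The sketch below assumes that the numerical coefficients in the bracket are $i/2$ and $i$ (the values for which the result agrees with the gauge-fixing and ghost terms of (2.5)), and that $\bar\theta$ anticommutes with the ghost fields while commuting with $A_\mu$. Note that $(\partial^\mu A_\mu)$ enters as an ordinary 4D field rather than as a superfield, so no term proportional to $\Box C$ is generated; this is the superfield counterpart of setting $\tilde s_b(\partial_\mu A^\mu) = 0$ in (2.6).

```latex
% At \theta = 0 the superfields (3.8) reduce to
%   \bar{\mathcal F}^{(h)}_{(o)} = \bar C - i\bar\theta(\partial\cdot A),
%   \mathcal B^{(h)}_{\mu(o)}    = A_\mu + \bar\theta\,\partial_\mu C .
\begin{align*}
\tfrac{i}{2}\,\bar{\mathcal F}^{(h)}_{(o)}\,(\partial^\mu A_\mu)
  &= \tfrac{i}{2}\,\bar C\,(\partial^\mu A_\mu)
   + \tfrac{1}{2}\,\bar\theta\,(\partial^\mu A_\mu)^2, \\
i\,\mathcal B^{(h)}_{\mu(o)}\,\partial^\mu \bar{\mathcal F}^{(h)}_{(o)}
  &= i\,A_\mu \partial^\mu \bar C
   + \bar\theta\,\bigl[\, i\,\partial_\mu C\,\partial^\mu \bar C
   + A_\mu \partial^\mu (\partial^\rho A_\rho) \bigr], \\
\lim_{\theta\to 0}\frac{\partial}{\partial\bar\theta}\Bigl[\;\cdots\Bigr]
  &= \tfrac{1}{2}\,(\partial^\mu A_\mu)^2
   + i\,\partial_\mu C\,\partial^\mu \bar C
   + A_\mu \partial^\mu (\partial^\rho A_\rho) \\
  &= -\tfrac{1}{2}\,(\partial_\mu A^\mu)^2
   - i\,\partial_\mu \bar C\,\partial^\mu C
   + \partial^\mu\bigl[ A_\mu\,(\partial^\rho A_\rho) \bigr].
\end{align*}
```

The last line reproduces the gauge-fixing and Faddeev-Popov ghost terms of (2.5) up to a total spacetime derivative.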
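The Lagrangian density (2.1) can be expressed, in terms of the (4, 2)-dimensional superfields, in the following distinct forms

$$\tilde{\mathcal L}^{(1)}_B = -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu}\tilde F^{\mu\nu(h)} + \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[-i\,\bar{\mathcal F}^{(h)}\bigl\{(\partial^\mu\mathcal{B}^{(h)}_\mu) + \tfrac{1}{2}B\bigr\}\Bigr], \tag{3.10}$$

$$\tilde{\mathcal L}^{(2)}_B = -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu}\tilde F^{\mu\nu(h)} + \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}\Bigl[+i\,\mathcal{F}^{(h)}\bigl\{(\partial^\mu\mathcal{B}^{(h)}_\mu) + \tfrac{1}{2}B\bigr\}\Bigr], \tag{3.11}$$

$$\tilde{\mathcal L}^{(3)}_B = -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu}\tilde F^{\mu\nu(h)} + \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\Bigl[\tfrac{i}{2}\,\mathcal{B}^{\mu(h)}\mathcal{B}^{(h)}_\mu + \tfrac{1}{2}\,\mathcal{F}^{(h)}\bar{\mathcal F}^{(h)}\Bigr]. \tag{3.12}$$

It should be noted that the kinetic energy term $-(1/4)\,\tilde F^{(h)}_{\mu\nu}\tilde F^{\mu\nu(h)}$ is independent of the variables $\theta$ and $\bar\theta$ because $\tilde F^{(h)}_{\mu\nu} = F_{\mu\nu}$.

§The on-shell nilpotent (anti-)BRST symmetry transformations $\tilde s_{(a)b}$ can also be obtained by invoking the (anti-)chiral superfields on the appropriately chosen supermanifolds (see, e.g. [23] for details).

In exactly similar fashion, the Lagrangian density (2.5) can be expressed, with the help of the super-expansion (3.8), as

$$\tilde{\mathcal L}^{(1)}_b = -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu(o)}\tilde F^{\mu\nu(h)}_{(o)} + \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[\tfrac{i}{2}\,\bar{\mathcal F}^{(h)}_{(o)}\,(\partial^\mu A_\mu) + i\,\mathcal{B}^{(h)}_{\mu(o)}\,\partial^\mu\bar{\mathcal F}^{(h)}_{(o)}\Bigr], \tag{3.13}$$

$$\tilde{\mathcal L}^{(2)}_b = -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu(o)}\tilde F^{\mu\nu(h)}_{(o)} + \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}\Bigl[-\tfrac{i}{2}\,\mathcal{F}^{(h)}_{(o)}\,(\partial^\mu A_\mu) - i\,\mathcal{B}^{(h)}_{\mu(o)}\,\partial^\mu\mathcal{F}^{(h)}_{(o)}\Bigr], \tag{3.14}$$

$$\tilde{\mathcal L}^{(3)}_b = -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu(o)}\tilde F^{\mu\nu(h)}_{(o)} + \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\Bigl[\tfrac{i}{2}\,\mathcal{B}^{(h)}_{\mu(o)}\mathcal{B}^{\mu(h)}_{(o)} + \tfrac{1}{2}\,\mathcal{F}^{(h)}_{(o)}\bar{\mathcal F}^{(h)}_{(o)}\Bigr]. \tag{3.15}$$

The forms (3.10) to (3.15) of the Lagrangian densities simplify the proof of the (anti-)BRST invariance of the Lagrangian densities (2.1) and (2.5). In the forms (3.10) to (3.12), the BRST invariance $s_b\mathcal{L}_B = 0$ and the anti-BRST invariance $s_{ab}\mathcal{L}_B = 0$ become very transparent and simple because the following equalities and mappings exist, namely:

$$s_b\,\mathcal{L}_B = 0 \;\Rightarrow\; \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\,\tilde{\mathcal L}^{(1)}_B = 0, \qquad s_b \;\Leftrightarrow\; \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}, \qquad s_b^2 = 0 \;\Leftrightarrow\; \Bigl(\frac{\partial}{\partial\bar\theta}\Bigr)^2 = 0, \tag{3.16}$$

$$s_{ab}\,\mathcal{L}_B = 0 \;\Rightarrow\; \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}\,\tilde{\mathcal L}^{(2)}_B = 0, \qquad s_{ab} \;\Leftrightarrow\; \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}, \qquad s_{ab}^2 = 0 \;\Leftrightarrow\; \Bigl(\frac{\partial}{\partial\theta}\Bigr)^2 = 0. \tag{3.17}$$

Similarly, the relation (3.12) leads to the BRST and anti-BRST invariance together. For its proof, one has to use the anticommutativity property $s_bs_{ab} + s_{ab}s_b = 0$ in the language of the translational generators (i.e. $(\partial/\partial\bar\theta)$, $(\partial/\partial\theta)$) along the Grassmannian directions of the supermanifold. This statement can be mathematically expressed as

$$s_{(a)b}\,\mathcal{L}_B = 0 \;\Rightarrow\; \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\,\tilde{\mathcal L}^{(3)}_B = 0, \qquad s_bs_{ab} + s_{ab}s_b = 0 \;\Leftrightarrow\; \frac{\partial}{\partial\theta}\frac{\partial}{\partial\bar\theta} + \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta} = 0.$$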
(3.18)

In exactly similar fashion, the on-shell nilpotent (anti-)BRST symmetry invariance (i.e. $\tilde s_{(a)b}\mathcal{L}_b = 0$) of the Lagrangian density (2.5) can also be captured in the language of the superfields if we exploit the expressions (3.13) to (3.15) for the Lagrangian density. In the latter case, the on-shell nilpotent (anti-)BRST invariance turns out to be like (3.16), (3.17) and (3.18) with the replacements: $s_{(a)b} \to \tilde s_{(a)b}$, $\mathcal{L}_B \to \mathcal{L}_b$, $\tilde{\mathcal L}^{(1,2,3)}_B \to \tilde{\mathcal L}^{(1,2,3)}_b$.

Mathematically, the (anti-)BRST invariance of the Lagrangian density (2.1) is captured in the equations (3.16) to (3.18). In the language of geometry on the (4, 2)-dimensional supermanifold, the (anti-)BRST invariance corresponds to the Grassmannian independence of the supersymmetric versions of the Lagrangian density (2.1). In other words, the translation of the super Lagrangian densities (i.e. (3.10) to (3.12)), along the $(\theta)\bar\theta$ directions of the supermanifold, is zero. This observation captures the (anti-)BRST invariance of (2.1).

4 (Anti-)BRST symmetries in non-Abelian theory: Lagrangian approach

We begin with the following BRST-invariant Lagrangian density, in the Feynman gauge, for the four (3 + 1)-dimensional non-Abelian 1-form gauge theory¶ (see, e.g. [26,27,9])

$$\mathcal{L}^{(n)}_B = -\frac{1}{4}\,F^{\mu\nu}\cdot F_{\mu\nu} + B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,B\cdot B - i\,\partial_\mu\bar C\cdot D^\mu C, \tag{4.1}$$

where the curvature tensor ($F_{\mu\nu}$) is defined through the 2-form $F^{(2)(n)} = dA^{(1)(n)} + i\,A^{(1)(n)}\wedge A^{(1)(n)}$. Here the non-Abelian 1-form gauge connection is $A^{(1)(n)} = dx^\mu(A_\mu\cdot T)$ and the exterior derivative is $d = dx^\mu\partial_\mu$. The Nakanishi-Lautrup auxiliary field $B = B\cdot T$ is required for the linearization of the gauge-fixing term and the (anti-)ghost fields $(\bar C)C$ are essential for the proof of unitarity in the theory. The latter fields are fermionic (i.e. $(C^a)^2 = 0$, $(\bar C^a)^2 = 0$, $C^aC^b + C^bC^a = 0$, $C^a\bar C^b + \bar C^bC^a = 0$, etc.) in nature.
The above Lagrangian density respects the following off-shell nilpotent ($(s^{(n)}_b)^2 = 0$) BRST symmetry transformations $s^{(n)}_b$, namely:

$$\begin{aligned}
s^{(n)}_b A_\mu &= D_\mu C, \qquad s^{(n)}_b C = -\frac{i}{2}\,(C\times C), \qquad s^{(n)}_b\bar C = iB,\\
s^{(n)}_b B &= 0, \qquad s^{(n)}_b F_{\mu\nu} = i\,(F_{\mu\nu}\times C).
\end{aligned}\tag{4.2}$$

It will be noted that (i) the curvature tensor $F_{\mu\nu}\cdot T$ transforms here under the BRST symmetry transformation; however, it can be checked explicitly that the kinetic energy term $-(1/4)\,F_{\mu\nu}\cdot F^{\mu\nu}$ remains invariant under the BRST symmetry transformations, (ii) the nilpotent anti-BRST symmetry transformations corresponding to the above BRST symmetry transformations (4.2) cannot be defined for the Lagrangian density (4.1), and (iii) the on-shell nilpotent version of the above BRST symmetry transformations is also possible if we substitute, in the above symmetry transformations, $B = -(\partial_\mu A^\mu)$. The ensuing on-shell (i.e. $\partial_\mu D^\mu C = 0$) nilpotent BRST symmetry transformations $\tilde s^{(n)}_b$ are

$$\begin{aligned}
\tilde s^{(n)}_b A_\mu &= D_\mu C, \qquad \tilde s^{(n)}_b C = -\frac{i}{2}\,(C\times C),\\
\tilde s^{(n)}_b\bar C &= -i\,(\partial_\mu A^\mu), \qquad \tilde s^{(n)}_b F_{\mu\nu} = i\,(F_{\mu\nu}\times C).
\end{aligned}\tag{4.3}$$

The above on-shell nilpotent transformations leave the following Lagrangian density

$$\mathcal{L}^{(n)}_b = -\frac{1}{4}\,F^{\mu\nu}\cdot F_{\mu\nu} - \frac{1}{2}\,(\partial_\mu A^\mu)\cdot(\partial_\rho A^\rho) - i\,\partial_\mu\bar C\cdot D^\mu C, \tag{4.4}$$

quasi-invariant because it transforms to a total derivative.

¶For the non-Abelian 1-form gauge theory, the notations used in the Lie algebraic space are: $A\cdot B = A^aB^a$, $(A\times B)^a = f^{abc}A^bB^c$, $D_\mu C^a = \partial_\mu C^a + i f^{abc}A^b_\mu C^c \equiv \partial_\mu C^a + i\,(A_\mu\times C)^a$, $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + i\,A_\mu\times A_\nu$, $A_\mu = A_\mu\cdot T$, $[T^a, T^b] = f^{abc}T^c$, where the Latin indices $a, b, c = 1, 2, 3, \ldots, N$ are in the SU(N) Lie algebraic space. The structure constants $f^{abc}$ can be chosen to be totally antisymmetric for any arbitrary semi-simple Lie algebra, which includes SU(N), too (see, e.g., [27]).

The gauge-fixing and Faddeev-Popov ghost terms of the Lagrangian densities (4.1) and (4.4) can be written, modulo a total derivative, as a BRST-exact quantity in terms of the off-shell and on-shell nilpotent BRST symmetry transformations (4.2) and (4.3).
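This statement can be mathematically expressed as follows

$$s^{(n)}_b\Bigl[-i\,\bar C\cdot\bigl\{(\partial_\mu A^\mu) + \tfrac{1}{2}B\bigr\}\Bigr] = B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,B\cdot B - i\,\partial_\mu\bar C\cdot D^\mu C, \tag{4.5}$$

$$\tilde s^{(n)}_b\Bigl[\tfrac{i}{2}\,\bar C\cdot(\partial_\mu A^\mu) + i\,A_\mu\cdot\partial^\mu\bar C\Bigr] = -\frac{1}{2}\,(\partial_\mu A^\mu)\cdot(\partial_\rho A^\rho) - i\,\partial_\mu\bar C\cdot D^\mu C. \tag{4.6}$$

It will be noted that one has to take into account $\tilde s^{(n)}_b(\partial_\mu A^\mu) = \partial_\mu D^\mu C = 0$ in the above proof of the exactness of the expression in (4.6).

The Lagrangian densities that respect the off-shell nilpotent (i.e. $(s^{(n)}_{(a)b})^2 = 0$) and anticommuting ($s^{(n)}_bs^{(n)}_{ab} + s^{(n)}_{ab}s^{(n)}_b = 0$) (anti-)BRST symmetry transformations are

$$\mathcal{L}^{(1)(n)}_b = -\frac{1}{4}\,F^{\mu\nu}\cdot F_{\mu\nu} + B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,(B\cdot B + \bar B\cdot\bar B) - i\,\partial_\mu\bar C\cdot D^\mu C, \tag{4.7}$$

$$\mathcal{L}^{(2)(n)}_b = -\frac{1}{4}\,F^{\mu\nu}\cdot F_{\mu\nu} - \bar B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,(B\cdot B + \bar B\cdot\bar B) - i\,D_\mu\bar C\cdot\partial^\mu C. \tag{4.8}$$

Here the auxiliary fields $B$ and $\bar B$ satisfy the Curci-Ferrari condition $B + \bar B = -(C\times\bar C)$ [28,29]. It is also evident, from this relation, that $B\cdot(\partial_\mu A^\mu) - i\,\partial_\mu\bar C\cdot D^\mu C = -\bar B\cdot(\partial_\mu A^\mu) - i\,D_\mu\bar C\cdot\partial^\mu C$. Furthermore, it should be re-emphasized that the Lagrangian densities (4.1) and (4.4) do not respect anti-BRST symmetry transformations of any kind. The BRST and anti-BRST symmetry transformations, for the above Lagrangian densities, are

$$\begin{aligned}
s^{(n)}_b A_\mu &= D_\mu C, \qquad s^{(n)}_b C = -\frac{i}{2}\,(C\times C), \qquad s^{(n)}_b\bar C = iB,\\
s^{(n)}_b B &= 0, \qquad s^{(n)}_b F_{\mu\nu} = i\,(F_{\mu\nu}\times C), \qquad s^{(n)}_b\bar B = i\,(\bar B\times C),
\end{aligned}\tag{4.9}$$

$$\begin{aligned}
s^{(n)}_{ab}A_\mu &= D_\mu\bar C, \qquad s^{(n)}_{ab}\bar C = -\frac{i}{2}\,(\bar C\times\bar C), \qquad s^{(n)}_{ab}C = i\bar B,\\
s^{(n)}_{ab}\bar B &= 0, \qquad s^{(n)}_{ab}F_{\mu\nu} = i\,(F_{\mu\nu}\times\bar C), \qquad s^{(n)}_{ab}B = i\,(B\times\bar C).
\end{aligned}\tag{4.10}$$

The above off-shell nilpotent (anti-)BRST symmetry transformations leave the Lagrangian densities (4.7) as well as (4.8) quasi-invariant as they transform to total derivatives. The gauge-fixing and Faddeev-Popov ghost terms of the Lagrangian densities (4.7) and (4.8) can be written, in a symmetrical fashion with respect to $s^{(n)}_b$ and $s^{(n)}_{ab}$, as

$$\begin{aligned}
s^{(n)}_b s^{(n)}_{ab}\Bigl[\tfrac{i}{2}\,A_\mu\cdot A^\mu + C\cdot\bar C\Bigr] &= B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,(B\cdot B + \bar B\cdot\bar B) - i\,\partial_\mu\bar C\cdot D^\mu C\\
&\equiv -\bar B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,(B\cdot B + \bar B\cdot\bar B) - i\,D_\mu\bar C\cdot\partial^\mu C.
\end{aligned}\tag{4.11}$$

This demonstrates the key fact that the above gauge-fixing and Faddeev-Popov ghost terms are BRST and anti-BRST invariant together because of the nilpotency and anticommutativity of the (anti-)BRST symmetry transformations $s^{(n)}_{(a)b}$ that are present in the theory.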
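The off-shell nilpotency claimed for the non-Abelian BRST transformations can be checked directly on the gauge field. The sketch below assumes the coefficient $-\frac{i}{2}$ in the ghost transformation $s^{(n)}_b C = -\frac{i}{2}(C\times C)$ and uses the conventions $D_\mu C = \partial_\mu C + i\,(A_\mu\times C)$ and totally antisymmetric $f^{abc}$ quoted for the Lie algebraic space; the operator $s^{(n)}_b$ is taken to commute with $\partial_\mu$ and to act from the left as a graded derivation.

```latex
\begin{align*}
(s^{(n)}_b)^2 A_\mu
  &= s^{(n)}_b\bigl[\partial_\mu C + i\,(A_\mu \times C)\bigr] \\
  &= \partial_\mu (s^{(n)}_b C) + i\,(s^{(n)}_b A_\mu \times C) + i\,(A_\mu \times s^{(n)}_b C) \\
  &= D_\mu (s^{(n)}_b C) + i\,(D_\mu C \times C), \\
D_\mu (s^{(n)}_b C)
  &= -\tfrac{i}{2}\,\bigl[(D_\mu C \times C) + (C \times D_\mu C)\bigr]
   = -\,i\,(D_\mu C \times C), \\
\Rightarrow\quad (s^{(n)}_b)^2 A_\mu &= 0 .
\end{align*}
% (C x D_mu C) = (D_mu C x C) holds because both factors are fermionic
% and f^{abc} is antisymmetric; the Leibniz rule for D_mu on (C x C)
% relies on the Jacobi identity.
```

The same Jacobi identity gives $(s^{(n)}_b)^2 C = -\tfrac{1}{2}\,\bigl((C\times C)\times C\bigr) = 0$, and the anti-BRST case follows with $C \to \bar C$.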
5 (Anti-)BRST invariance in non-Abelian theory: superfield approach

To capture (i) the off-shell as well as the on-shell nilpotent (anti-)BRST symmetry transformations, and (ii) the invariance of the Lagrangian densities, in the language of the superfield approach to BRST formalism, we have to consider the 4D 1-form non-Abelian gauge theory on a (4, 2)-dimensional supermanifold. As a consequence, we have the following mappings:

$$\begin{aligned}
d \;\to\; \tilde d &= dx^\mu\,\partial_\mu + d\theta\,\partial_\theta + d\bar\theta\,\partial_{\bar\theta}, \qquad \tilde d^{\,2} = 0,\\
A^{(1)(n)} \;\to\; \tilde A^{(1)(n)} &= dx^\mu\,(\mathcal{B}_\mu\cdot T)(x,\theta,\bar\theta) + d\theta\,(\bar{\mathcal F}\cdot T)(x,\theta,\bar\theta) + d\bar\theta\,(\mathcal{F}\cdot T)(x,\theta,\bar\theta),
\end{aligned}\tag{5.1}$$

where the (4, 2)-dimensional superfields $(\mathcal{B}_\mu\cdot T, \mathcal{F}\cdot T, \bar{\mathcal F}\cdot T)$ are the generalizations of the 4D basic local fields $(A_\mu\cdot T, C\cdot T, \bar C\cdot T)$ of the Lagrangian densities (4.1), (4.7) and (4.8). These superfields can be expanded along the Grassmannian directions of the supermanifold, in terms of the basic 4D fields, auxiliary fields and secondary fields, as [4,16,19]

$$\begin{aligned}
(\mathcal{B}_\mu\cdot T)(x,\theta,\bar\theta) &= (A_\mu\cdot T)(x) + \theta\,(\bar R_\mu\cdot T)(x) + \bar\theta\,(R_\mu\cdot T)(x) + i\,\theta\bar\theta\,(S_\mu\cdot T)(x),\\
(\mathcal{F}\cdot T)(x,\theta,\bar\theta) &= (C\cdot T)(x) + i\,\theta\,(\bar B_1\cdot T)(x) + i\,\bar\theta\,(B_1\cdot T)(x) + i\,\theta\bar\theta\,(s\cdot T)(x),\\
(\bar{\mathcal F}\cdot T)(x,\theta,\bar\theta) &= (\bar C\cdot T)(x) + i\,\theta\,(\bar B_2\cdot T)(x) + i\,\bar\theta\,(B_2\cdot T)(x) + i\,\theta\bar\theta\,(\bar s\cdot T)(x).
\end{aligned}\tag{5.2}$$

To determine the exact expressions for the secondary fields, in terms of the basic and auxiliary fields of the theory, we have to exploit the HC. The horizontality condition for the non-Abelian gauge theory is the requirement of the equality of the Maurer-Cartan equation on the (super) manifolds. In other words, the covariant reduction of the super 2-form curvature $\tilde F^{(2)(n)}$ to the ordinary 2-form curvature (i.e. $\tilde d\tilde A^{(1)(n)} + i\,\tilde A^{(1)(n)}\wedge\tilde A^{(1)(n)} = dA^{(1)(n)} + i\,A^{(1)(n)}\wedge A^{(1)(n)}$) leads to the determination of the secondary fields in terms of the basic and auxiliary fields of the theory.
The ensuing expansions, in terms of the basic and auxiliary fields, lead to (i) the derivation of the (anti-)BRST symmetry transformations for the basic fields of the theory, and (ii) the geometrical interpretations of the nilpotent (anti-)BRST symmetry transformations (and their corresponding nilpotent generators) for the basic fields of the theory as the translations of the corresponding superfields along the Grassmannian directions of the (4, 2)-dimensional supermanifold (see, e.g., [16,19]).

With the identifications $B_2 = B$ and $\bar B_1 = \bar B$, the following relationships emerge after the application of the horizontality condition‖ (see, e.g., [16]):

$$\begin{aligned}
R_\mu &= D_\mu C, \qquad \bar R_\mu = D_\mu\bar C, \qquad B + \bar B = -(C\times\bar C), \qquad s = i\,(\bar B\times C),\\
S_\mu &= D_\mu B + D_\mu C\times\bar C \equiv -D_\mu\bar B - D_\mu\bar C\times C, \qquad \bar s = -i\,(B\times\bar C),\\
B_1 &= -\frac{1}{2}\,(C\times C), \qquad \bar B_2 = -\frac{1}{2}\,(\bar C\times\bar C).
\end{aligned}\tag{5.3}$$

‖In the rest of our present text, we shall be using the short-hand notations for all the fields, e.g. $A_\mu\cdot T = A_\mu$, $C\cdot T = C$, $B\cdot T = B$, etc., for the sake of brevity.

The substitution of the above expressions, which are obtained after the application of the horizontality condition, leads to the following expansions

$$\begin{aligned}
\mathcal{B}^{(h)}_\mu(x,\theta,\bar\theta) &= A_\mu + \theta\,D_\mu\bar C + \bar\theta\,D_\mu C + i\,\theta\bar\theta\,(D_\mu B + D_\mu C\times\bar C),\\
\mathcal{F}^{(h)}(x,\theta,\bar\theta) &= C + i\,\theta\,\bar B - \frac{i}{2}\,\bar\theta\,(C\times C) - \theta\bar\theta\,(\bar B\times C),\\
\bar{\mathcal F}^{(h)}(x,\theta,\bar\theta) &= \bar C - \frac{i}{2}\,\theta\,(\bar C\times\bar C) + i\,\bar\theta\,B + \theta\bar\theta\,(B\times\bar C).
\end{aligned}\tag{5.4}$$

The above expansions (see, e.g., our earlier works [16,19]) can be expressed in terms of the off-shell nilpotent (anti-)BRST symmetry transformations (4.9) and (4.10). With the above expansions at our disposal, the gauge-fixing and Faddeev-Popov terms of the Lagrangian density (4.1) can be written, modulo a total ordinary derivative, as

$$\lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[-i\,\bar{\mathcal F}^{(h)}\cdot\partial^\mu\mathcal{B}^{(h)}_\mu - \frac{i}{2}\,\bar{\mathcal F}^{(h)}\cdot B\Bigr] = B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,B\cdot B - i\,\partial_\mu\bar C\cdot D^\mu C. \tag{5.5}$$

Furthermore, it can be seen that, due to the validity and consequences of the horizontality condition, the super curvature tensor $\tilde F_{\mu\nu}$ has the following form [16,4]

$$\tilde F^{(h)}_{\mu\nu} = F_{\mu\nu} + i\,\theta\,(F_{\mu\nu}\times\bar C) + i\,\bar\theta\,(F_{\mu\nu}\times C) - \theta\bar\theta\,(F_{\mu\nu}\times B + F_{\mu\nu}\times C\times\bar C).$$
(5.6)

It is clear from the above relationship that the kinetic energy term of the present 4D non-Abelian 1-form gauge theory remains invariant, namely:

$$-\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu}\cdot\tilde F^{\mu\nu(h)} = -\frac{1}{4}\,F_{\mu\nu}\cdot F^{\mu\nu}. \tag{5.7}$$

The Grassmannian independence of the l.h.s. of (5.7) has deep meaning as far as physics is concerned. It implies immediately that the kinetic energy term of the non-Abelian gauge theory is an (anti-)BRST (i.e. gauge) invariant physical quantity.

At this juncture, it is worthwhile to point out that one can also capture the equation (4.6) in the superfield approach to BRST formalism, where the on-shell nilpotent version of the BRST symmetry transformations (i.e. $\tilde s^{(n)}_b$) plays an important role. For this purpose, we have to express the superfield expansion (5.4) for the on-shell nilpotent BRST symmetry transformation, where one has to exploit the replacement $B = -(\partial_\mu A^\mu)$. With this substitution, the equation (5.4) for the superfield expansion becomes

$$\begin{aligned}
\mathcal{B}^{(h)}_{\mu(o)}(x,\theta,\bar\theta) &= A_\mu + \theta\,D_\mu\bar C + \bar\theta\,D_\mu C + i\,\theta\bar\theta\,\bigl[-D_\mu(\partial^\rho A_\rho) + D_\mu C\times\bar C\bigr],\\
\mathcal{F}^{(h)}_{(o)}(x,\theta,\bar\theta) &= C + i\,\theta\,\bar B - \frac{i}{2}\,\bar\theta\,(C\times C) - \theta\bar\theta\,(\bar B\times C),\\
\bar{\mathcal F}^{(h)}_{(o)}(x,\theta,\bar\theta) &= \bar C - \frac{i}{2}\,\theta\,(\bar C\times\bar C) - i\,\bar\theta\,(\partial_\mu A^\mu) - \theta\bar\theta\,\bigl[(\partial_\mu A^\mu)\times\bar C\bigr].
\end{aligned}\tag{5.8}$$

Now, the equation (4.6) can be expressed in terms of the above superfields as

$$\lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[\tfrac{i}{2}\,\bar{\mathcal F}^{(h)}_{(o)}\cdot(\partial^\mu A_\mu) + i\,\mathcal{B}^{(h)}_{\mu(o)}\cdot\partial^\mu\bar{\mathcal F}^{(h)}_{(o)}\Bigr] = -\frac{1}{2}\,(\partial_\mu A^\mu)\cdot(\partial_\rho A^\rho) - i\,\partial_\mu\bar C\cdot D^\mu C. \tag{5.9}$$

Furthermore, it will be noted that the analogue of (5.6) for the on-shell nilpotent BRST symmetry transformation (i.e. $\tilde F^{(h)}_{\mu\nu(o)}$) can be obtained by the replacement $B = -(\partial_\mu A^\mu)$. Once again, the equality (5.7) remains intact even if we take into account the on-shell nilpotent BRST symmetry transformations. Thus, we note that the kinetic energy term (i.e. $-(1/4)\,F^{\mu\nu}\cdot F_{\mu\nu} = -(1/4)\,\tilde F^{\mu\nu(h)}_{(o)}\cdot\tilde F^{(h)}_{\mu\nu(o)}$) of the non-Abelian gauge theory remains independent of the Grassmannian variables $\theta$ and $\bar\theta$ after the application of the HC. This statement is true for the off-shell as well as the on-shell nilpotent (anti-)BRST symmetry transformations.
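Physically, it implies that the kinetic energy term for the gauge field of the non-Abelian theory is an (anti-)BRST (i.e. gauge) invariant quantity.

The above key observation helps in expressing the Lagrangian densities (4.1) and (4.4) in terms of the superfields (obtained after the application of HC), as

$$\begin{aligned}
\tilde{\mathcal L}^{(n)}_B &= -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu}\cdot\tilde F^{\mu\nu(h)} + \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[-i\,\bar{\mathcal F}^{(h)}\cdot\partial^\mu\mathcal{B}^{(h)}_\mu - \frac{i}{2}\,\bar{\mathcal F}^{(h)}\cdot B\Bigr],\\
\tilde{\mathcal L}^{(n)}_b &= -\frac{1}{4}\,\tilde F^{(h)}_{\mu\nu(o)}\cdot\tilde F^{\mu\nu(h)}_{(o)} + \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\Bigl[\tfrac{i}{2}\,\bar{\mathcal F}^{(h)}_{(o)}\cdot(\partial^\mu A_\mu) + i\,\mathcal{B}^{(h)}_{\mu(o)}\cdot\partial^\mu\bar{\mathcal F}^{(h)}_{(o)}\Bigr].
\end{aligned}\tag{5.10}$$

This result, in turn, simplifies the proof of the BRST invariance of the above Lagrangian densities (4.1) and (4.4) (describing the 4D 1-form non-Abelian gauge theory) as follows

$$\lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\,\tilde{\mathcal L}^{(n)}_B = 0 \;\Rightarrow\; s^{(n)}_b\,\mathcal{L}^{(n)}_B = 0, \qquad \lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\,\tilde{\mathcal L}^{(n)}_b = 0 \;\Rightarrow\; \tilde s^{(n)}_b\,\mathcal{L}^{(n)}_b = 0. \tag{5.11}$$

This is a great simplification because the total super Lagrangian densities (5.10) remain independent of the Grassmannian variable $\bar\theta$. This key result is encoded in the mapping $(s^{(n)}_b, \tilde s^{(n)}_b) \Leftrightarrow \lim_{\theta\to0}(\partial/\partial\bar\theta)$ and the nilpotency $(s^{(n)}_b)^2 = 0$, $(\tilde s^{(n)}_b)^2 = 0$, $(\partial/\partial\bar\theta)^2 = 0$.

It can be readily checked that the analogues of (5.5) and (5.9) cannot be expressed as a derivative w.r.t. the Grassmannian variable $\theta$. To check this, one has to exploit the super-expansions (5.4) and (5.8) obtained after the application of the HC (in the context of the derivation of the off-shell as well as the on-shell nilpotent BRST symmetry transformations $s^{(n)}_b$ and $\tilde s^{(n)}_b$). It can be clearly seen that the operation of the derivative w.r.t. the Grassmannian variable $\theta$, on any combination of the superfields from the expansions (5.4) and (5.8), does not lead to the r.h.s. of (5.5) and (5.9). In the language of the superfield approach to BRST formalism, this is the reason behind the non-existence of the anti-BRST symmetry transformations for the Lagrangian densities (4.1) and (4.4).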
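For comparison, the $\bar\theta$-derivative in (5.5) can be evaluated explicitly from the expansions (5.4). The sketch below assumes that the second term in the bracket of (5.5) carries the coefficient $-\frac{i}{2}$ (the value that reproduces $\frac{1}{2}B\cdot B$), and that $\bar\theta$ anticommutes with $\bar C$ while commuting with $A_\mu$ and $B$.

```latex
% At \theta = 0 the superfields (5.4) reduce to
%   \bar{\mathcal F}^{(h)} = \bar C + i\,\bar\theta\,B,
%   \mathcal B^{(h)}_\mu   = A_\mu + \bar\theta\,D_\mu C .
\begin{align*}
-\,i\,\bar{\mathcal F}^{(h)} \cdot \partial^\mu \mathcal B^{(h)}_\mu
  &= -\,i\,\bar C \cdot (\partial^\mu A_\mu)
   + \bar\theta\,\bigl[\, B \cdot (\partial^\mu A_\mu)
   + i\,\bar C \cdot \partial^\mu D_\mu C \,\bigr], \\
-\,\tfrac{i}{2}\,\bar{\mathcal F}^{(h)} \cdot B
  &= -\,\tfrac{i}{2}\,\bar C \cdot B + \tfrac{1}{2}\,\bar\theta\, B \cdot B, \\
\lim_{\theta\to 0}\frac{\partial}{\partial\bar\theta}\Bigl[\;\cdots\Bigr]
  &= B \cdot (\partial_\mu A^\mu) + \tfrac{1}{2}\, B \cdot B
   + i\,\bar C \cdot \partial^\mu D_\mu C \\
  &= B \cdot (\partial_\mu A^\mu) + \tfrac{1}{2}\, B \cdot B
   - i\,\partial_\mu \bar C \cdot D^\mu C
   + \partial^\mu \bigl[\, i\,\bar C \cdot D_\mu C \,\bigr].
\end{align*}
```

The result agrees with the r.h.s. of (5.5) up to a total spacetime derivative, as stated in the text.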
The form of the gauge-fixing and Faddeev-Popov terms (4.11), expressed in terms of the BRST and anti-BRST symmetry transformations together, can be represented in the language of the superfields (obtained after the application of HC) as

$$\frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\Bigl[\tfrac{i}{2}\,\mathcal{B}^{(h)}_\mu\cdot\mathcal{B}^{\mu(h)} + \mathcal{F}^{(h)}\cdot\bar{\mathcal F}^{(h)}\Bigr] = B\cdot(\partial_\mu A^\mu) + \frac{1}{2}\,(B\cdot B + \bar B\cdot\bar B) - i\,\partial_\mu\bar C\cdot D^\mu C. \tag{5.12}$$

As a consequence of the above expression, the Lagrangian densities (4.7) (as well as (4.8)) can be presented, in terms of the superfields, as

$$\tilde{\mathcal L}^{(1,2)(n)}_b = -\frac{1}{4}\,\tilde F^{\mu\nu(h)}\cdot\tilde F^{(h)}_{\mu\nu} + \frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\Bigl[\tfrac{i}{2}\,\mathcal{B}^{(h)}_\mu\cdot\mathcal{B}^{\mu(h)} + \mathcal{F}^{(h)}\cdot\bar{\mathcal F}^{(h)}\Bigr]. \tag{5.13}$$

The BRST and anti-BRST invariance of the above super Lagrangian density (and that of the ordinary 4D Lagrangian densities (4.7) and (4.8)) is encoded in the following simple equations that are expressed in terms of the translational generators along the Grassmannian directions of the (4, 2)-dimensional supermanifold, namely:

$$\lim_{\theta\to0}\frac{\partial}{\partial\bar\theta}\,\tilde{\mathcal L}^{(1,2)(n)}_b = 0 \;\Rightarrow\; s^{(n)}_b\,\mathcal{L}^{(1)(n)}_b = 0, \qquad \lim_{\bar\theta\to0}\frac{\partial}{\partial\theta}\,\tilde{\mathcal L}^{(1,2)(n)}_b = 0 \;\Rightarrow\; s^{(n)}_{ab}\,\mathcal{L}^{(2)(n)}_b = 0. \tag{5.14}$$

This is a tremendous simplification of the (anti-)BRST invariance of the Lagrangian densities (4.7) and (4.8) in the language of the superfield approach to BRST formalism. In other words, if one is able to show the Grassmannian independence of the super Lagrangian densities of the theory, the (anti-)BRST invariance of the 4D theory follows automatically.

In the language of the geometry on the supermanifold, the (anti-)BRST invariance of a 4D Lagrangian density is equivalent to the statement that the translation of the super version of the above Lagrangian density, along the Grassmannian directions of the (4, 2)-dimensional supermanifold, is zero. Thus, the super Lagrangian density of an (anti-)BRST invariant 4D theory is a Lorentz scalar, constructed with the help of (4, 2)-dimensional superfields (obtained after the application of HC), such that, when the partial derivatives w.r.t. the Grassmannian variables ($\theta$ and $\bar\theta$) operate on it, the result is zero.
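The nilpotency and anticommutativity of the Grassmannian translation generators, which mirror the corresponding properties of $s_{(a)b}$ throughout this discussion, can be seen on a generic bosonic superfield of the form (3.2). The following is a minimal illustration; the symbol $\Phi$ is introduced here only for this purpose and left derivatives are assumed.

```latex
% Generic bosonic superfield of the form (3.2): A and S are bosonic,
% R and \bar R are fermionic; \theta\bar\theta = -\bar\theta\theta.
\begin{align*}
\Phi(x,\theta,\bar\theta) &= A + \theta\,\bar R + \bar\theta\,R + i\,\theta\bar\theta\,S, \\
\frac{\partial \Phi}{\partial\theta} &= \bar R + i\,\bar\theta\,S, \qquad
\frac{\partial \Phi}{\partial\bar\theta} = R - i\,\theta\,S, \\
\Bigl(\frac{\partial}{\partial\theta}\Bigr)^{2}\Phi &= 0, \qquad
\Bigl(\frac{\partial}{\partial\bar\theta}\Bigr)^{2}\Phi = 0, \\
\frac{\partial}{\partial\bar\theta}\frac{\partial}{\partial\theta}\,\Phi &= i\,S, \qquad
\frac{\partial}{\partial\theta}\frac{\partial}{\partial\bar\theta}\,\Phi = -\,i\,S .
\end{align*}
```

The last two results add up to zero, which is the superfield counterpart of $s_b s_{ab} + s_{ab} s_b = 0$.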
The nilpotency and anticommutativity properties (that are associated with the conserved (anti-)BRST charges and (anti-)BRST symmetry transformations) are found to be captured very naturally (cf. (3.16)-(3.18)) when we consider the superfield formulation of the (anti-)BRST invariance of the Lagrangian density of a given 1-form gauge theory. We mention, in passing, that one could also derive the analogues of the equations (3.16), (3.17) and (3.18) for the 4D non-Abelian 1-form gauge theory in a straightforward manner.

6 Conclusions

In our present investigation, we have concentrated mainly on the (anti-)BRST invariance of the Lagrangian densities of the free 4D (non-)Abelian 1-form gauge theories (having no interaction with matter fields) within the framework of the superfield approach to BRST formalism. We have been able to provide the geometrical basis for the existence of the (anti-)BRST invariance in the above 4D theories. To be more specific, we have been able to show that the Grassmannian independence of the (4, 2)-dimensional super Lagrangian density, expressed in terms of the appropriate superfields, is a clear-cut proof that there is an (anti-)BRST invariance (cf. (3.16), (3.17), (3.18), (5.11), (5.14)) in the 4D theory.

If the super Lagrangian density can be expressed as a sum of (i) a Grassmannian independent term, and (ii) a derivative w.r.t. the Grassmannian variable, then the corresponding 4D Lagrangian density will automatically respect BRST and/or anti-BRST invariance. In the latter piece of the above super Lagrangian density, the derivative could be either w.r.t. $\theta$ or w.r.t. $\bar\theta$ or w.r.t. both of them put together. More specifically, (i) if the derivative is w.r.t. $\bar\theta$, the nilpotent symmetry would correspond to the BRST, (ii) if the derivative is w.r.t.
$\theta$, the nilpotent symmetry would be that of the anti-BRST type, and (iii) if both the derivatives are present together, both the nilpotent (anti-)BRST symmetries would be present together (and they would turn out to be anticommuting).

For the 4D (non-)Abelian 1-form gauge theories, that are considered on the (4, 2)-dimensional supermanifold, it is the HC on the 1-form super connection $\tilde A^{(1)}$ that plays a very important role in the derivation of the (anti-)BRST symmetry transformations. The cohomological origin for the above HC lies in the (super) exterior derivatives $(\tilde d)d$. This point has been made quite clear in our discussions after the off-shell as well as the on-shell nilpotent (anti-)BRST symmetry transformations (2.2), (2.4), (4.2), (4.3), (4.9) and (4.10). In fact, it is the full kinetic energy term of the above theories (owing its origin to the cohomological operator $d = dx^\mu\partial_\mu$) that remains invariant under the above on-shell as well as the off-shell nilpotent (anti-)BRST symmetry transformations.

The HC produces specifically the nilpotent (anti-)BRST symmetry transformations for the gauge and (anti-)ghost fields because of the fact that the super 1-form connection $\tilde A^{(1)}/\tilde A^{(1)(n)}$ (cf. (3.1) and (5.1)) is constructed with a super vector multiplet $(\mathcal{B}_\mu, \mathcal{F}, \bar{\mathcal F})$ which is the generalization of the gauge field $A_\mu$ and the (anti-)ghost fields $(\bar C)C$ (of the ordinary 4D (non-)Abelian 1-form gauge theories) to the (4, 2)-dimensional supermanifold. As a consequence, only the nilpotent and anticommuting (anti-)BRST symmetry transformations for the 4D local fields $A_\mu$, $C$ and $\bar C$ are obtained when the full potential of the HC is exploited within the framework of the above superfield formulation.
It is worthwhile to point out that geometrically the super Lagrangian densities, expressed in terms of the (4, 2)-dimensional superfields, are equivalent to the sum of the kinetic energy term and the translations of some composite superfields (obtained after the application of the HC) along the Grassmannian directions (i.e. $\theta$ and/or $\bar\theta$) of the (4, 2)-dimensional supermanifold. This observation is distinctly different from our earlier works on the superfield approach to 2D (non-)Abelian 1-form gauge theories [24,25,23] which are found to correspond to the topological field theories. In fact, for the latter theories, the total super Lagrangian density turns out to be a total derivative w.r.t. the Grassmannian variables ($\theta$ and/or $\bar\theta$). That is to say, even the kinetic energy term of the latter theories can be expressed as a total derivative w.r.t. the variables $\theta$ and/or $\bar\theta$.

In our present endeavour, within the framework of the superfield approach to BRST formalism, we have been able to provide (i) the logical reason behind the non-existence of the anti-BRST symmetry transformations for the Lagrangian densities (4.1) and (4.4) for the 4D non-Abelian 1-form gauge theory, (ii) the explicit explanation for the uniqueness of the equations (2.3) and (2.6) for the 4D Abelian 1-form gauge theory, (iii) the proof for the on-shell nilpotent (anti-)BRST invariance of the gauge-fixing term (i.e. $\tilde s_{(a)b}(\partial_\mu A^\mu) = 0$, $\tilde s^{(n)}_{(a)b}(\partial_\mu A^\mu) = 0$) for the (non-)Abelian 1-form gauge theories, and (iv) the arguments for the non-existence of the exact analogue(s) of (2.3) and (2.6) for the non-Abelian 1-form gauge theory. To the best of our knowledge, the logical explanations for the above subtle points (connected with the 1-form gauge theories) are completely new. Thus, the results of our present work are simple, beautiful and original.
It is worthwhile to mention that our superfield construction and its ensuing geometrical interpretations are not specific to the Feynman gauge (which has been taken into account in our present endeavour). To corroborate this assertion, we take the simple case of the 4D Abelian 1-form gauge theory and write the Lagrangian density (2.1) in the arbitrary gauge

$$\mathcal{L}^{(a,\xi)}_B = -\frac{1}{4}\,F^{\mu\nu}F_{\mu\nu} + B\,(\partial_\mu A^\mu) + \frac{\xi}{2}\,B^2 - i\,\partial_\mu\bar C\,\partial^\mu C, \tag{6.1}$$

where $\xi$ is the gauge parameter. It is elementary to check that, in the limit $\xi \to 1$, we get back our Lagrangian density (2.1) for the Abelian theory in the Feynman gauge. The analogue of the equation (2.3) (for the gauge-fixing and Faddeev-Popov ghost terms in the case of the arbitrary gauge) can be expressed as

$$s_b\Bigl[-i\,\bar C\,\bigl\{(\partial_\mu A^\mu) + \tfrac{\xi}{2}B\bigr\}\Bigr], \qquad s_{ab}\Bigl[+i\,C\,\bigl\{(\partial_\mu A^\mu) + \tfrac{\xi}{2}B\bigr\}\Bigr], \qquad s_b\,s_{ab}\Bigl[\tfrac{i}{2}\,A_\mu A^\mu + \tfrac{\xi}{2}\,C\bar C\Bigr]. \tag{6.2}$$

The above expression can be easily generalized to the analogues of the equations (3.10)-(3.12) in terms of the superfields by taking the help of (3.8). Thus, the geometrical interpretations remain intact even in the case of the arbitrary gauge.

In a similar fashion, for the 4D non-Abelian 1-form gauge theory, the equations (4.5), (4.6) and (4.11) can be generalized to the case of arbitrary gauge and, subsequently, can be expressed in terms of superfields as the analogues of (5.5), (5.9) and (5.12). Finally, we can obtain the analogues of (5.7), (5.10) and (5.13) which will lead to the derivation of the analogues of (5.11) and (5.14). Thus, we note that the geometrical interpretations, in the arbitrary gauge, remain the same for the 4D (non-)Abelian 1-form gauge theory within the framework of our superfield approach to BRST formalism.

Our present work can be generalized to the case of the interacting 4D (non-)Abelian 1-form gauge theories where there exists an explicit coupling between the gauge field and the matter fields. In fact, our earlier works [14-18] might turn out to be quite handy in attempting the above problems.
It seems to us that it is the combination of the HC and the restrictions, owing their origin to the (super) covariant derivative on the matter (super) fields and their intimate connection with the (super) curvatures, that would play a decisive role in proving the existence of the (anti-)BRST invariance for the above gauge theories. It is gratifying to state that we have accomplished the above goals in our very recent endeavours [30-32]. In fact, we have been able to provide the geometrical basis for the existence of the (anti-)BRST invariance, in the context of the interacting (non-)Abelian 1-form gauge theories with Dirac as well as complex scalar fields, within the framework of the augmented superfield approach to BRST formalism. As it turns out, here too, the super Lagrangian density is found to be independent of the Grassmannian variables.

In our earlier works [33-35], we have been able to show the existence of the nilpotent (anti-)BRST and (anti-)co-BRST symmetry transformations for the 4D free Abelian 2-form gauge theory. We have also established its quasi-topological nature in [35]. In a recent work [36], the nilpotent (anti-)BRST symmetry transformations have been captured in the framework of the superfield formulation. It would be worthwhile to study the (anti-)BRST and (anti-)co-BRST invariance of the 4D Abelian 2-form gauge theory within the framework of the superfield formulation. At present, this issue and connected problems in the context of the 4D free Abelian 2-form gauge theory are under intensive investigation and our results will be reported in forthcoming publications [37].

Acknowledgement: Financial support from the Department of Science and Technology (DST), Government of India, under the SERC project sanction grant No: - SR/S2/HEP-23/2006, is gratefully acknowledged.

References
[1] J. Thierry-Mieg, J. Math. Phys. 21, 2834 (1980).
[2] J. Thierry-Mieg, Nuovo Cimento A 56, 396 (1980).
[3] M. Quiros, F. J. De Urries, J.
Hoyos, M. L. Mazon and E. Rodrigues, J. Math. Phys. 22, 1767 (1981). [4] L. Bonora and M. Tonin, Phys. Lett. B 98, 48 (1981). [5] L. Bonora, P. Pasti and M. Tonin M, Nuovo Cimento A 63, 353 (1981). [6] R. Delbourgo and P. D. Jarvis, J. Phys. A: Math. Gen. 15, 611 (1981). [7] R. Delbourgo, P. D. Jarvis and G. Thompson, Phys. Lett. B 109, 25 (1982). [8] D. S. Hwang and C. -Y. Lee, J. Math. Phys. 38, 30 (1997). [9] N. Nakanishi and I. Ojima, Covariant Operator Formalism of Gauge Theories and Quantum Gravity (World Scientific, Singapore, 1990). [10] R. P. Malik, Phys. Lett. B 584, 210 (2004), hep-th/0311001. [11] R. P. Malik, Int. J. Geom. Methods Mod. Phys. 1, 467 (2004), hep-th/0403230. [12] R. P. Malik, J. Phys. A: Math. Gen. 37, 5261 (2004), hep-th/031193. [13] R. P. Malik, Int. J. Mod. Phys. A 20, 4899 (2005), hep-th/0402005. R. P. Malik, Int. J. Mod. Phys. A 20, 7285 (2005), hep-th/0402005 (Erratum). [14] R. P. Malik, Mod. Phys. Lett. A 20, 1667 (2005), hep-th/0402123. [15] R. P. Malik, Eur. Phys. J. C 45, 513 (2006), hep-th/0506109. [16] R. P. Malik and B. P. Mandal, Eur. Phys. J. C 47, 219 (2006), hep-th/0512334. [17] R. P. Malik, Eur. Phys. J. C 47, 227 (2006), hep-th/0507127. [18] R. P. Malik, J. Phys. A: Math. Gen. 39, 10575 (2006), hep-th/0510164. [19] R. P. Malik, Eur. Phys. J. C 51, 169 (2007), hep-th/0603049. [20] R. P. Malik, J. Phys. A: Math. Theor. 40, 4877 (2007), hep-th/0605213. [21] R. P. Malik, J. Phys. A: Math. Gen 33, 2437 (2000), hep-th/9902146. [22] R. P. Malik, J. Phys. A: Math. Gen. 34, 4167 (2001), hep-th/0012085. [23] R. P. Malik, Ann. Phys. (N. Y.) 307, 01 (2003), hep-th/0205135. [24] R. P. Malik, J. Phys. A: Math. Gen 35, 6919 (2002), hep-th/0112260. [25] R. P. Malik, J. Phys. A: Math. Gen. 35, 8817 (2002), hep-th/0204015. [26] K. Nishijima, Czech. J. Phys. 46, 01 (1996). [27] S. Weinberg, The Quantum Theory of Fields: Modern Applications Vol. 2 (Cambridge University Press, Cambridge, 1996). [28] G. Curci and R. Ferrari, Phys. Lett. 
B 63, 51 (1976). [29] G. Curci and R. Ferrari, Nuovo Cimento A 32, 151 (1976). [30] R. P. Malik, Nilpotent symmetry invariance in QED with Dirac fields: superfield for- malism, arXiv: 0706.4168 [hep-th]. [31] R. P. Malik and B. P. Mandal, Superfield approach to the nilpotent symmetry invariance in the non-Abelian 1-form gauge theory, arXiv: 0709.2277 [hep-th]. [32] R. P. Malik and B. P. Mandal, Nilpotent symmetry invariance in QED with complex scalar fields: augmented superfield formalism, arXiv: 0711.2389 [hep-th]. [33] E. Harikumar, R. P. Malik and M. Sivakumar, J. Phys. A: Math. Gen. 33, 7149 (2000), hep-th/0004145. [34] R. P. Malik, Int. J. Mod. Phys. A 19, 5663 (2004), hep-th/0212240. [35] R. P. Malik, J. Phys. A: Math. Gen. 36, 5056 (2003), hep-th/0209136. [36] R. P. Malik, Superfield approach to nilpotent (anti-)BRST symmetries for the free Abelian 2-form gauge theory, hep-th/0702039. [37] R. P. Malik, in preparation. ABSTRACT We capture the off-shell as well as the on-shell nilpotent Becchi-Rouet-Stora-Tyutin (BRST) and anti-BRST symmetry invariance of the Lagrangian densities of the four (3 + 1)-dimensional (4D) (non-)Abelian 1-form gauge theories within the framework of the superfield formalism. In particular, we provide the geometrical interpretations for (i) the above nilpotent symmetry invariance, and (ii) the above Lagrangian densities, in the language of the specific quantities defined in the domain of the above superfield formalism. Some of the subtle points, connected with the 4D (non-)Abelian 1-form gauge theories, are clarified within the framework of the above superfield formalism where the 4D ordinary gauge theories are considered on the (4, 2)-dimensional supermanifold parametrized by the four spacetime coordinates x^\mu (with \mu = 0, 1, 2, 3) and a pair of Grassmannian variables \theta and \bar\theta. 
One of the key results of our present investigation is a great deal of simplification in the geometrical understanding of the nilpotent (anti-)BRST symmetry invariance. <|endoftext|><|startoftext|> Introduction

Let $a = (a_i)$, $i \in \mathbb{Z}$, be a sequence of variables. Consider the ring of polynomials $\mathbb{Z}[a]$ in the variables $a_i$ with integer coefficients. Introduce another infinite set of variables $x = (x_1, x_2, \dots)$ and for each nonnegative integer $n$ denote by $\Lambda_n$ the ring of symmetric polynomials in $x_1, \dots, x_n$ with coefficients in $\mathbb{Z}[a]$. The ring $\Lambda_n$ is filtered by the usual degrees of polynomials in $x_1, \dots, x_n$, with the $a_i$ considered to have degree zero. The evaluation map

$$\varphi_n : \Lambda_n \to \Lambda_{n-1}, \qquad P(x_1, \dots, x_n) \mapsto P(x_1, \dots, x_{n-1}, a_n) \tag{1.1}$$

is a homomorphism of filtered rings, so that we can define the inverse limit ring $\Lambda$ by

$$\Lambda = \varprojlim \Lambda_n, \qquad n \to \infty, \tag{1.2}$$

where the limit is taken with respect to the homomorphisms (1.1) in the category of filtered rings. When $a$ is specialized to the sequence of zeros, this reduces to the usual definition of the ring of symmetric functions; see e.g. Macdonald [14]. In that case, a distinguished basis of $\Lambda$ is formed by the Schur functions $s_\lambda(x)$ parameterized by all partitions $\lambda$. The respective analogues of the $s_\lambda(x)$ in the general case are the double Schur functions $s_\lambda(x\|a)$, which form a basis of $\Lambda$ over $\mathbb{Z}[a]$. We introduce the Littlewood–Richardson polynomials $c^{\nu}_{\lambda\mu}(a)$ as the structure coefficients of the ring $\Lambda$ in the basis of double Schur functions,

$$s_\lambda(x\|a)\, s_\mu(x\|a) = \sum_{\nu} c^{\nu}_{\lambda\mu}(a)\, s_\nu(x\|a). \tag{1.3}$$

In the specialization $a = (0)$ the polynomials $c^{\nu}_{\lambda\mu}(a)$ become the classical Littlewood–Richardson coefficients $c^{\nu}_{\lambda\mu}$; see [12]. These are remarkable nonnegative integers which occupy a prominent place in combinatorics, representation theory and geometry; see e.g. Fulton [5], Macdonald [14] and Sagan [21].
The main result of this paper is a combinatorial rule for the calculation of the Littlewood–Richardson polynomials which provides a manifestly positive formula in the sense that $c^{\nu}_{\lambda\mu}(a)$ is written as a polynomial in the differences $a_i - a_j$, $i < j$, with positive integer coefficients. We consider two applications of the rule.

The results of Knutson and Tao [9] imply that under an appropriate specialization, the polynomials $c^{\nu}_{\lambda\mu}(a)$ describe the multiplication rule for the equivariant Schubert classes on Grassmannians; see also Fulton [6] for a more direct argument. Let $n$ and $N$ be nonnegative integers with $n \le N$ and let $\mathrm{Gr}(n,N)$ denote the Grassmannian of the $n$-dimensional vector subspaces of $\mathbb{C}^N$. The torus $T = (\mathbb{C}^*)^N$ acts naturally on $\mathrm{Gr}(n,N)$. The equivariant cohomology ring $H^*_T(\mathrm{Gr}(n,N))$ is a module over the polynomial ring $\mathbb{Z}[t_1, \dots, t_N]$, which can be identified with $H^*_T(\{pt\})$, the equivariant cohomology ring of a point. This module has a basis of the equivariant Schubert classes $\sigma_\lambda$ parameterized by all diagrams $\lambda$ contained in the $n \times m$ rectangle, $m = N - n$; see e.g. [5, 6]. Then

$$\sigma_\lambda\, \sigma_\mu = \sum_{\nu} d^{\,\nu}_{\lambda\mu}\, \sigma_\nu, \tag{1.4}$$

where $d^{\,\nu}_{\lambda\mu} = c^{\nu}_{\lambda\mu}(a)$ with the sequence $a$ specialized by

$$a_{-m+1} = -t_1, \quad a_{-m+2} = -t_2, \quad \dots, \quad a_n = -t_N, \tag{1.5}$$

while the remaining parameters $a_i$ are set to zero (the $t_i$ should be replaced with $y_i$ in the notation of [9]). The coefficients $d^{\,\nu}_{\lambda\mu}$ are given explicitly as polynomials in the $t_i - t_j$, $i > j$, with positive integer coefficients. This positivity property was established by Graham [8] in the general context of the equivariant Schubert calculus. The first manifestly positive formula for the coefficients in the expansion (1.4) was obtained by Knutson and Tao [9] by using combinatorics of puzzles. An earlier rule of Molev and Sagan [17] also calculates $d^{\,\nu}_{\lambda\mu}$ but lacks the explicit positivity property. Our new rule implies a stability property of the coefficients $d^{\,\nu}_{\lambda\mu}$ (see Corollary 3.1 below).
Even though this property was not pointed out in [9], it can be derived directly from the puzzle rule; see also Fulton [6] for its geometrical interpretation and an extension to the equivariant Schubert calculus on the flag variety.

As another application, we obtain a rule for the positive integer expansion of the product of two (virtual) quantum immanants (or the corresponding higher Capelli operators) of Okounkov and Olshanski [18, 19]; cf. [17]. The quantum immanants $S_{\lambda|n}$ are elements of the center $Z(\mathfrak{gl}_n)$ of the universal enveloping algebra $U(\mathfrak{gl}_n)$ parameterized by partitions $\lambda$ with at most $n$ parts; see [18]. The elements $S_{\lambda|n}$ form a basis of $Z(\mathfrak{gl}_n)$, so that we can define the coefficients $f^{\,\nu}_{\lambda\mu}$ by the expansion

$$S_{\lambda|n}\, S_{\mu|n} = \sum_{\nu} f^{\,\nu}_{\lambda\mu}\, S_{\nu|n}.$$

Then $f^{\,\nu}_{\lambda\mu} = c^{\nu}_{\lambda\mu}(a)$ for the specialization $a_i = -i$ for $i \in \mathbb{Z}$. As $n \to \infty$ this yields a multiplication rule for the virtual quantum immanants $S_\lambda$; see Section 3.2 for the definitions.

We define the double Schur function $s_\lambda(x\|a)$ as the sequence of the double Schur polynomials

$$s_\lambda(x_1, \dots, x_n \| a), \qquad n = 1, 2, \dots, \tag{1.6}$$

which are compatible with respect to the homomorphisms (1.1),

$$\varphi_n : s_\lambda(x_1, \dots, x_n \| a) \mapsto s_\lambda(x_1, \dots, x_{n-1} \| a). \tag{1.7}$$

The polynomials (1.6) are closely related to the “factorial” or “double” Schur polynomials $s_\lambda(x|u)$ with $x = (x_1, \dots, x_n)$. The latter were introduced by Goulden and Greene [7] and Macdonald [13] as a generalization of the factorial Schur polynomials of Biedenharn and Louck [1, 2], and they are also a special case of the double Schubert polynomials of Lascoux and Schützenberger; see Lascoux [11]. We follow Chen, Li and Louck [4] and Fulton [6] and use the name “double Schur polynomials” for the related polynomials $s_\lambda(x\|a)$ as well. In more detail, consider a partition $\lambda$, which is a sequence $\lambda = (\lambda_1, \dots, \lambda_n)$ of integers $\lambda_i$ such that $\lambda_1 \ge \dots \ge \lambda_n \ge 0$.
We will identify $\lambda$ with its diagram, represented graphically as the array of left-justified rows of unit boxes with $\lambda_1$ boxes in the top row, $\lambda_2$ boxes in the second row, etc. The total number of boxes in $\lambda$ will be denoted by $|\lambda|$. The transposed diagram $\lambda' = (\lambda'_1, \dots, \lambda'_p)$ is obtained from $\lambda$ by applying the symmetry with respect to the main diagonal, so that $\lambda'_j$ is the number of boxes in the $j$-th column of $\lambda$. Let $u = (u_1, u_2, \dots)$ be a sequence of variables. The polynomials $s_\lambda(x|u)$ can be defined by

$$s_\lambda(x|u) = \sum_{T} \prod_{\alpha \in \lambda} \bigl(x_{T(\alpha)} - u_{T(\alpha)+c(\alpha)}\bigr), \tag{1.8}$$

where $T$ runs over all semistandard (column-strict) tableaux of shape $\lambda$ with entries in $\{1, \dots, n\}$, $T(\alpha)$ is the entry of $T$ in the box $\alpha \in \lambda$, and $c(\alpha) = j - i$ is the content of the box $\alpha = (i, j)$ in row $i$ and column $j$.

By a reverse $\lambda$-tableau $T$ we will mean the tableau obtained by filling in the boxes of $\lambda$ with the numbers $1, 2, \dots, n$ in such a way that the entries weakly decrease along the rows and strictly decrease down the columns. If $\alpha = (i, j)$ is a box of $\lambda$ we let $T(\alpha) = T(i, j)$ denote the entry of $T$ in the box $\alpha$. We define the double Schur polynomials $s_\lambda(x\|a)$ by

$$s_\lambda(x\|a) = \sum_{T} \prod_{\alpha \in \lambda} \bigl(x_{T(\alpha)} - a_{T(\alpha)-c(\alpha)}\bigr), \tag{1.9}$$

summed over the reverse $\lambda$-tableaux $T$. Then we have

$$s_\lambda(x\|a) = s_\lambda(x|u) \tag{1.10}$$

for the sequences $a$ and $u$ related by $a_{n-i+1} = u_i$ with $i = 1, 2, \dots$. In particular, the polynomial $s_\lambda(x\|a)$ only depends on the variables $a_i$ with $i \le n$, $i \in \mathbb{Z}$. The relation (1.10) is verified easily by replacing $x_i$ with $x_{n-i+1}$ in (1.8) for all $i = 1, \dots, n$ and using the fact that $s_\lambda(x|u)$ is a symmetric polynomial in $x$. The property (1.7) of the double Schur polynomials is immediate from their definition. In the specialization of the sequence $a$ with $a_i = -i$, $i \in \mathbb{Z}$, formula (1.9) defines the shifted Schur polynomials of Okounkov and Olshanski [18, 19] in the variables $y_i = x_i + i$.
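The definition (1.9) can be checked numerically on small cases. The following is a minimal sketch, not taken from the paper: the function names `reverse_tableaux` and `double_schur` are hypothetical, the sequence $a$ is passed as an arbitrary integer-valued Python callable, and the tableaux are enumerated by brute force, so it is only practical for small diagrams.

```python
from itertools import permutations


def reverse_tableaux(lam, n):
    """Yield reverse lam-tableaux with entries in 1..n as dicts {(i, j): entry}.

    Rows and columns are 1-based; entries weakly decrease along rows and
    strictly decrease down columns.
    """
    boxes = [(i, j) for i, row in enumerate(lam, 1) for j in range(1, row + 1)]

    def fill(pos, t):
        if pos == len(boxes):
            yield dict(t)
            return
        i, j = boxes[pos]
        hi = n
        if j > 1:
            hi = min(hi, t[(i, j - 1)])      # weakly decreasing along the row
        if i > 1:
            hi = min(hi, t[(i - 1, j)] - 1)  # strictly decreasing down the column
        for v in range(1, hi + 1):
            t[(i, j)] = v
            yield from fill(pos + 1, t)
        t.pop((i, j), None)

    yield from fill(0, {})


def double_schur(lam, x, a):
    """Evaluate s_lam(x_1, ..., x_n || a) by formula (1.9).

    x is a list of numbers (so n = len(x)); a is a callable i -> a_i.
    """
    total = 0
    for t in reverse_tableaux(lam, len(x)):
        term = 1
        for (i, j), entry in t.items():
            # factor x_{T(alpha)} - a_{T(alpha) - c(alpha)} with content c = j - i
            term *= x[entry - 1] - a(entry - (j - i))
        total += term
    return total


if __name__ == "__main__":
    a = lambda i: 3 * i * i - 5 * i + 2   # an arbitrary test sequence
    lam = (2, 1)
    # symmetry in x
    print({double_schur(lam, list(p), a) for p in permutations([7, -2, 4])})
    # stability (1.7): putting x_n = a_n drops the last variable
    print(double_schur(lam, [7, -2, a(3)], a) == double_schur(lam, [7, -2], a))
```

With these assumptions, the symmetry of $s_\lambda(x\|a)$ in $x$ and the stability property (1.7) become plain numerical identities that can be tested for any chosen integer sequence.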
The use of the reverse tableaux was significant in their study of the vanishing and stability properties of these polynomials and associated central elements of the universal enveloping algebra for the Lie algebra $\mathfrak{gl}_n$; see also Section 3.2 below. Note that the stability property (1.7) extends to the double Schubert polynomials (and to the equivariant Schubert calculus on the flag manifold). This follows easily from the Cauchy formula for the Schubert polynomials (e.g., put $x_1 = y_1$ in [15, Formula in 2.5.5]). In a more general context, this was also pointed out in [3].

The double Schur polynomials $s_\lambda(x\|a)$ parameterized by the diagrams $\lambda$ with at most $n$ rows form a basis of the ring $\Lambda_n$. Due to the stability property (1.7), the Littlewood–Richardson polynomials $c^{\nu}_{\lambda\mu}(a)$ can be defined by the expansion (1.3), where $x$ is understood as the set of variables $x = (x_1, \dots, x_n)$ for any positive integer $n$ such that the diagrams $\lambda$, $\mu$ and $\nu$ have at most $n$ rows. This allows us to work with a finite set of variables for the determination of the polynomials $c^{\nu}_{\lambda\mu}(a)$. For the proof of the main theorem (Theorem 2.1) we follow the general approach of [17], using the technique of “barred” tableaux and modifying the corresponding arguments in order to obtain manifestly positive polynomials. This is achieved by imposing a boundness condition on the barred tableaux.

It was observed by Goulden and Greene [7] and Macdonald [13] that $s_\lambda(x|u)$, regarded as a formal power series in the infinite sets of variables $x$ and $u$, admits a “supertableau” representation. We show that this representation has its “finite” counterpart where $x$ is a finite set of variables. We derive the corresponding formula by choosing a certain specialization of the 9th Variation in [13]. This representation leads to a “supertableau” expression for the Littlewood–Richardson polynomials $c^{\nu}_{\lambda\mu}(a)$, although that expression is neither manifestly positive nor stable.
After the first version of this paper was completed we learned of an independent work of V. Kreiman [10], where a positive equivariant Littlewood–Richardson rule was given. That rule is equivalent to our Theorem 2.1, although the proof in [10] is different. Moreover, Kreiman’s paper also provides a weight-preserving bijection between the Knutson–Tao puzzles and the barred tableaux used in Theorem 2.1.

This work was inspired by Bill Fulton’s lectures [6]. I am grateful to Bill for stimulating discussions.

2 Multiplication rule

Let $R$ denote a sequence of diagrams

$$\mu = \rho^{(0)} \to \rho^{(1)} \to \dots \to \rho^{(l-1)} \to \rho^{(l)} = \nu, \tag{2.1}$$

where $\rho \to \sigma$ means that $\sigma$ is obtained from $\rho$ by adding one box. Let $r_i$ denote the row number of the box added to the diagram $\rho^{(i-1)}$. The sequence $r_1 r_2 \dots r_l$ is called the Yamanouchi symbol of $R$. Introduce the ordering on the set of boxes of a diagram $\lambda$ by reading them by columns from left to right and from bottom to top in each column. We call this the column order. We shall write $\alpha \prec \beta$ if $\alpha$ (strictly) precedes $\beta$ with respect to the column order.

Given a sequence $R$, construct the set $\mathcal{T}(\lambda, R)$ of barred reverse $\lambda$-tableaux $T$ with entries from $\{1, 2, \dots\}$ such that $T$ contains boxes $\alpha_1, \dots, \alpha_l$ with $\alpha_1 \prec \dots \prec \alpha_l$ and $T(\alpha_i) = r_i$, $1 \le i \le l$. We will distinguish the entries in $\alpha_1, \dots, \alpha_l$ by barring each of them. So, an element of $\mathcal{T}(\lambda, R)$ is a pair consisting of a reverse $\lambda$-tableau and a chosen sequence of barred entries compatible with $R$. We shall keep the notation $T$ for such a pair. For example, let $R$ be the sequence

$$(3, 1) \to (3, 2) \to (3, 2, 1) \to (3, 3, 1) \to (4, 3, 1),$$

so that the Yamanouchi symbol is $2\,3\,2\,1$. Then for $\lambda = (5, 5, 3)$ the following barred $\lambda$-tableau belongs to $\mathcal{T}(\lambda, R)$:

[Figure: a barred reverse tableau of shape (5, 5, 3)]

For each box $\alpha$ with $\alpha_i \prec \alpha \prec \alpha_{i+1}$, $0 \le i \le l$, set $\rho(\alpha) = \rho^{(i)}$. The barred entries r1, . . . 
, $r_l$ divide the tableau into regions marked by the elements of the sequence $R$, as illustrated:

[Figure: the barred entries $\bar r_1, \bar r_2, \dots$ separate consecutive regions of the tableau labelled $\rho^{(0)}, \rho^{(1)}, \dots$]

Finally, a reverse $\lambda$-tableau $T$ will be called $\nu$-bounded if $T(1, j) \le \nu'_j$ for all $j = 1, \dots, \lambda_1$. Note that $\nu$-bounded $\lambda$-tableaux exist only if $\lambda \subseteq \nu$. We are now in a position to state a rule for the calculation of the Littlewood–Richardson polynomials $c^{\nu}_{\lambda\mu}(a)$ defined by (1.3).

Theorem 2.1. The polynomial $c^{\nu}_{\lambda\mu}(a)$ is zero unless $\mu \subseteq \nu$. If $\mu \subseteq \nu$ then

$$c^{\nu}_{\lambda\mu}(a) = \sum_{R} \sum_{T} \prod_{\substack{\alpha \in \lambda \\ T(\alpha)\ \text{unbarred}}} \bigl(a_{T(\alpha)-\rho(\alpha)_{T(\alpha)}} - a_{T(\alpha)-c(\alpha)}\bigr), \tag{2.2}$$

summed over all sequences $R$ of the form (2.1) and all $\nu$-bounded reverse $\lambda$-tableaux $T \in \mathcal{T}(\lambda, R)$. Moreover, for each factor occurring in the formula (2.2) we have $\rho(\alpha)_{T(\alpha)} > c(\alpha)$.

Before proving the theorem, let us point out some properties of the Littlewood–Richardson polynomials which are immediate from the rule and consider some examples. The polynomial $c^{\nu}_{\lambda\mu}(a)$ is zero unless both diagrams $\lambda$ and $\mu$ are contained in $\nu$ and $|\lambda| + |\mu| \ge |\nu|$. In this case $c^{\nu}_{\lambda\mu}(a)$ is a homogeneous polynomial in the $a_i$ of degree $|\lambda| + |\mu| - |\nu|$. If $|\lambda| + |\mu| - |\nu| = 0$ then the theorem reproduces a version of the classical Littlewood–Richardson rule; see Corollary 2.9 below. Note also that, by the definition, the polynomials have the symmetry $c^{\nu}_{\lambda\mu}(a) = c^{\nu}_{\mu\lambda}(a)$, which is not apparent from the rule.

Example 2.2. For the product of the double Schur functions $s_{(2)}(x\|a)$ and $s_{(2,1)}(x\|a)$ we have

$$\begin{aligned} s_{(2)}(x\|a)\, s_{(2,1)}(x\|a) ={}& s_{(4,1)}(x\|a) + s_{(3,2)}(x\|a) + s_{(3,1,1)}(x\|a) + s_{(2,2,1)}(x\|a) \\ &+ (a_{-1} - a_2 + a_{-2} - a_0)\, s_{(3,1)}(x\|a) + (a_{-1} - a_2)\, s_{(2,2)}(x\|a) \\ &+ (a_{-1} - a_0)\, s_{(2,1,1)}(x\|a) + (a_{-1} - a_2)(a_{-1} - a_0)\, s_{(2,1)}(x\|a). \end{aligned}$$

For instance, the coefficient of $s_{(3,1)}(x\|a)$ is calculated by the following barred $(2)$-tableaux

$$1\ \bar 1, \qquad \bar 1\ 1, \qquad 2\ \bar 1$$

compatible with the sequence $(2, 1) \to (3, 1)$. They contribute respectively $a_{-1} - a_1$, $a_{-2} - a_0$, $a_1 - a_2$, which sum up to the coefficient $a_{-1} - a_2 + a_{-2} - a_0$.
Alternatively, using the symmetry $c^{\nu}_{\lambda\mu}(a) = c^{\nu}_{\mu\lambda}(a)$ we can calculate the coefficient of $s_{(3,1)}(x\|a)$ by considering the barred $(2, 1)$-tableaux compatible with the sequences $(2) \to (3) \to (3, 1)$ and $(2) \to (2, 1) \to (3, 1)$, respectively. Their contributions to the coefficient are $a_{-2} - a_0$ and $a_{-1} - a_2$.

Example 2.3. For the calculation of $c^{(5,2,2)}_{(4,2,1)(2,2)}(a)$ take $\lambda = (4, 2, 1)$, $\mu = (2, 2)$ and $\nu = (5, 2, 2)$. We have ten sequences $R$ of the form (2.1), but the set $\mathcal{T}(\lambda, R)$ contains $\nu$-bounded tableaux only for three of them. For the sequence $R_1$ with the Yamanouchi symbol $1\,3\,3\,1\,1$, the set $\mathcal{T}(\lambda, R_1)$ contains two bounded barred tableaux [figure: two barred tableaux of shape (4, 2, 1)] whose contributions to the Littlewood–Richardson polynomial are $(a_0 - a_3)(a_0 - a_2)$ and $(a_0 - a_3)(a_{-2} - a_1)$, respectively. For the sequence $R_2$ with the Yamanouchi symbol $1\,3\,1\,3\,1$, the set $\mathcal{T}(\lambda, R_2)$ contains two bounded tableaux [figure: two barred tableaux of shape (4, 2, 1)] with the respective contributions $(a_0 - a_3)(a_{-4} - a_{-2})$ and $(a_0 - a_3)(a_{-3} - a_{-1})$. For the sequence $R_3$ with the Yamanouchi symbol $3\,1\,3\,1\,1$, the set $\mathcal{T}(\lambda, R_3)$ contains only one bounded tableau [figure: one barred tableau of shape (4, 2, 1)] with the contribution $(a_{-1} - a_3)(a_0 - a_3)$. Hence,

$$c^{(5,2,2)}_{(4,2,1)(2,2)}(a) = (a_0 - a_3)\,(a_{-4} + a_{-3} + a_0 - a_1 - a_2 - a_3).$$

Taking $\lambda = (2, 2)$, $\mu = (4, 2, 1)$ and $\nu = (5, 2, 2)$ we get two sequences with the Yamanouchi symbols $1\,3$ and $3\,1$. The corresponding sets $\mathcal{T}(\lambda, R)$ consist of five and four bounded barred tableaux, respectively, thus leading to a slightly longer calculation.

Proof of Theorem 2.1. We present the proof as a sequence of lemmas. Due to the stability property (1.7), we may (and will) work with a finite set of variables $x = (x_1, \dots, x_n)$. Accordingly, possible entries of the tableaux are now elements of the set $\{1, \dots, n\}$. Introduce another sequence of variables $b = (b_i)$, $i \in \mathbb{Z}$, and define the Littlewood–Richardson type coefficients $c^{\nu}_{\lambda\mu}(a, b)$ by the expansion

$$s_\lambda(x\|b)\, s_\mu(x\|a) = \sum_{\nu} c^{\nu}_{\lambda\mu}(a, b)\, s_\nu(x\|a). \tag{2.3}$$

Lemma 2.4. The coefficient $c^{\nu}_{\lambda\mu}(a, b)$ is zero unless $\mu \subseteq \nu$.
If $\mu \subseteq \nu$ then

$$c^{\nu}_{\lambda\mu}(a, b) = \sum_{R} \sum_{T} \prod_{\substack{\alpha \in \lambda \\ T(\alpha)\ \text{unbarred}}} \bigl(a_{T(\alpha)-\rho(\alpha)_{T(\alpha)}} - b_{T(\alpha)-c(\alpha)}\bigr), \tag{2.4}$$

summed over all sequences $R$ of the form (2.1) and all reverse $\lambda$-tableaux $T \in \mathcal{T}(\lambda, R)$.

Proof. This is essentially a reformulation of the main result of [17] (Theorem 3.1). Note that the summation in (2.4) is taken over all barred tableaux $T \in \mathcal{T}(\lambda, R)$ (not just over the $\nu$-bounded ones as in (2.2)). Rather than repeating the whole argument of [17], we only sketch the main steps of the proof and indicate the necessary changes to be made. We refer the reader to [17] for the details.

We assume that all diagrams here have at most $n$ rows. If $\rho = (\rho_1, \dots, \rho_n)$ is such a diagram, we set

$$a_\rho = (a_{1-\rho_1}, \dots, a_{n-\rho_n}) \qquad \text{and} \qquad |a_\rho| = a_{1-\rho_1} + \dots + a_{n-\rho_n}.$$

Under the correspondence (1.10) we have $a_\rho = u_\rho = (u_{\rho_1+n}, \dots, u_{\rho_n+1})$; the latter notation was used in [17].

The starting point is the Vanishing Theorem of [18], whose proof was also reproduced in [17]. By that theorem,

$$s_\lambda(a_\rho\|a) = 0 \quad \text{unless } \lambda \subseteq \rho, \qquad s_\lambda(a_\lambda\|a) = \prod_{(i,j) \in \lambda} \bigl(a_{i-\lambda_i} - a_{\lambda'_j - j + 1}\bigr).$$

The first claim of the lemma follows from the Vanishing Theorem, which also implies

$$c^{\mu}_{\lambda\mu}(a, b) = s_\lambda(a_\mu\|b).$$

This proves (2.4) for the case $\nu = \mu$. Now we suppose that $|\nu| - |\mu| \ge 1$ and proceed by induction on $|\nu| - |\mu|$. The induction step is based on the recurrence relation

$$c^{\nu}_{\lambda\mu}(a, b) = \frac{1}{|a_\nu| - |a_\mu|} \Bigl( \sum_{\mu \to \mu^+} c^{\nu}_{\lambda\mu^+}(a, b) - \sum_{\nu^- \to \nu} c^{\nu^-}_{\lambda\mu}(a, b) \Bigr), \tag{2.5}$$

which was proved in [17, Proposition 3.4]; see also [9]. Suppose that the diagram $\nu$ is obtained from $\mu$ by adding one box in row $r$. Then

$$c^{\nu}_{\lambda\mu}(a, b) = \frac{s_\lambda(a_\nu\|b) - s_\lambda(a_\mu\|b)}{(a_\nu)_r - (a_\mu)_r}. \tag{2.6}$$

Now use the definition (1.9) of the double Schur polynomials. Since the $n$-tuples $a_\nu$ and $a_\mu$ only differ at the $r$-th component, the ratio on the right hand side of (2.6) can be expanded by taking into account the entries $r$ of the reverse $\lambda$-tableaux $T$. We need the following formula, where we are thinking of $y = (a_\nu)_r$, $z = (a_\mu)_r$ and $m_i = b_{T(\alpha)-c(\alpha)}$ as $\alpha$ runs over the boxes of $T$ with $T(\alpha) = r$ in column order:

$$\frac{\prod_{i=1}^{k}(y - m_i) - \prod_{i=1}^{k}(z - m_i)}{y - z} = \sum_{j=1}^{k} (z - m_1) \cdots (z - m_{j-1})\,(y - m_{j+1}) \cdots (y - m_k).$$

The right hand side of (2.6) can now be interpreted as the right hand side of (2.4), where $R$ is the only sequence $\mu \to \nu$ and the sum is taken over the reverse $\lambda$-tableaux $T$ with one barred entry $r$, as illustrated schematically by

$$[\,\mu \mid \bar r \mid \nu\,].$$

Here $\rho(\alpha) = \mu$ for all boxes $\alpha$ preceding the box occupied by the barred $r$, and $\rho(\alpha) = \nu$ for all boxes $\alpha$ which follow that box in column order. Note that the variables $y$ and $z$ are now swapped on the right hand side of the above expansion, as compared to [17] (this does not change the polynomial due to the symmetry in $y$ and $z$). Consequently, the column order used in [17] is the opposite to the order on the boxes of $\lambda$ we use here. We can represent the above calculation of $c^{\nu}_{\lambda\mu}(a)$ by the “diagrammatic” relation

$$\bigl(|a_\nu| - |a_\mu|\bigr)\, [\,\mu \mid \bar r \mid \nu\,] = [\,\nu\,] - [\,\mu\,].$$

Consider now the next case where $|\nu| - |\mu| = 2$ and apply the recurrence relation (2.5). We have three subcases: the diagram $\nu$ is obtained from $\mu$ by adding two boxes in different rows and columns; by adding two boxes in the same row; or by adding two boxes in the same column. The first two subcases are dealt with in a way similar to the case $|\nu| - |\mu| = 1$. Additional care is needed for the third subcase, where we suppose that $\nu$ is obtained from $\mu$ by adding the boxes in rows $r$ and $r + 1$. Denote by $\rho$ the diagram obtained from $\mu$ by adding the box in row $r$. Then (2.5) gives

$$c^{\nu}_{\lambda\mu}(a, b) = \frac{c^{\nu}_{\lambda\rho}(a, b) - c^{\rho}_{\lambda\mu}(a, b)}{|a_\nu| - |a_\mu|}.$$

Set $s = r + 1$. Exactly as in the case $|\nu| - |\mu| = 1$, we have the following diagrammatic relations:

$$\bigl(|a_\rho| - |a_\mu|\bigr)\, [\,\mu \mid \bar r \mid \rho \mid \bar s \mid \nu\,] = [\,\rho \mid \bar s \mid \nu\,] - [\,\mu \mid \bar s \mid \nu\,],$$

$$\bigl(|a_\nu| - |a_\rho|\bigr)\, [\,\mu \mid \bar r \mid \rho \mid \bar s \mid \nu\,] = [\,\mu \mid \bar r \mid \nu\,] - [\,\mu \mid \bar r \mid \rho\,].$$

Hence, the desired formula for $c^{\nu}_{\lambda\mu}(a, b)$ will follow if we prove the relation

$$[\,\mu \mid \bar r \mid \nu\,] = [\,\mu \mid \bar s \mid \nu\,].$$

We construct a weight-preserving bijection between the barred reverse $\lambda$-tableaux which are represented by the left and right hand sides of this diagrammatic relation. Here the weight is the product on the right hand side of (2.4) corresponding to a barred tableau.
Let such a tableau with a barred entry $r$ in the box $(i, j)$ be given. Suppose first that the box $(i - 1, j)$ belongs to the diagram and is occupied by $s = r + 1$. Then the image of the tableau under the map is the same tableau, but the entry $T(i, j) = r$ is now unbarred while $T(i - 1, j) = r + 1$ is barred. Since $(a_\nu)_{r+1} = (a_\mu)_r$ and $T(i - 1, j) - c(i - 1, j) = T(i, j) - c(i, j)$, the weights of the tableaux are preserved under the map.

Suppose now that the entry in the box $(i - 1, j)$ is greater than $r + 1$, or this box is outside the diagram. Consider all entries $r$ in the row $i$ to the left of the box $(i, j)$ and suppose that they occupy the boxes $(i, j - m), (i, j - m + 1), \dots, (i, j - 1)$. Then the image of the tableau under the map is the tableau obtained by replacing the entries in each of the boxes $(i, j - m), \dots, (i, j)$ with $s = r + 1$ and barring the entry in the box $(i, j - m)$. The weights of the tableaux are again preserved.

The inverse map is described in a similar way. This gives the desired weight-preserving bijection. The general argument uses similar calculations with the barred diagrams and a similar bijection described in [17].

Remark 2.5. (i) A cohomological interpretation of the coefficients $c^{\nu}_{\lambda\mu}(a, b)$ and their puzzle computation can be found in [9].
(ii) The definition (2.3) of the coefficients $c^{\nu}_{\lambda\mu}(a, b)$ can be extended to the case where $\lambda$ is a skew diagram. Lemma 2.4 and its proof remain valid; see [17].
(iii) In contrast with the Littlewood–Richardson polynomials $c^{\nu}_{\lambda\mu}(a)$, the coefficients $c^{\nu}_{\lambda\mu}(a, b)$ do not have the stability property as they depend on $n$.

Lemma 2.4 implies that the Littlewood–Richardson polynomials can be calculated by (2.4) with $b = a$, that is, $c^{\nu}_{\lambda\mu}(a) = c^{\nu}_{\lambda\mu}(a, a)$. Our strategy now is to show that (unlike the formula of Theorem 3.1 in [17]) the formula (2.4) (with $b = a$) is “nonnegative” in the sense that all nonzero products which occur in the formula are polynomials in the $a_i - a_j$ with $i < j$.
Then we demonstrate that the $\nu$-boundness condition serves to eliminate the unwanted zero terms.

Lemma 2.6. Let $R$ be a sequence of the form (2.1) and let $T \in \mathcal{T}(\lambda, R)$. Suppose that

$$\prod_{\substack{\alpha \in \lambda \\ T(\alpha)\ \text{unbarred}}} \bigl(a_{T(\alpha)-\rho(\alpha)_{T(\alpha)}} - a_{T(\alpha)-c(\alpha)}\bigr) \ne 0. \tag{2.7}$$

Then $\rho(\alpha)_{T(\alpha)} > c(\alpha)$ for all $\alpha \in \lambda$ with unbarred $T(\alpha)$.

Proof. Suppose on the contrary that there exists a box $\alpha = (i, j)$ with an unbarred $T(i, j)$ and the condition $\rho(i, j)_{T(i,j)} < j - i$; the equality $\rho(i, j)_{T(i,j)} = j - i$ is excluded since this would violate (2.7). Choose such a box with the minimum possible value of $j$. If all the entries $T(i, 1), \dots, T(i, j - 1)$ of $T$ are barred then $\rho(i, j)$ is obtained from $\mu$ by adding boxes in rows $T(i, 1) \ge \dots \ge T(i, j - 1)$ and, possibly, by adding other boxes. Since $T(i, j - 1) \ge T(i, j)$, we have $\rho(i, j)_{T(i,j)} \ge j - 1$, a contradiction. So, at least one of the entries $T(i, 1), \dots, T(i, j - 1)$ must be unbarred. Take such an unbarred entry $T(i, k)$ which is the closest to $T(i, j)$, that is, all entries $T(i, k + 1), \dots, T(i, j - 1)$ are barred. Then $\rho(i, j)$ is obtained from $\rho(i, k)$ by adding boxes in rows $T(i, k + 1) \ge \dots \ge T(i, j - 1)$ and, possibly, by adding other boxes. Hence,

$$\rho(i, j)_{T(i,j)} \ge \rho(i, k)_{T(i,k)} + j - k - 1,$$

which implies $\rho(i, k)_{T(i,k)} < k - i + 1$. However, if $\rho(i, k)_{T(i,k)} = k - i$ then the factor in (2.7) corresponding to $\alpha = (i, k)$ is zero, which is impossible. Therefore $\rho(i, k)_{T(i,k)} < k - i$, which contradicts the choice of $j$.

Lemma 2.7. Suppose that $R$ is a sequence of the form (2.1) and $T \in \mathcal{T}(\lambda, R)$. If (2.7) holds then $T$ is $\nu$-bounded.

Proof. By Lemma 2.6, for all unbarred entries $T(1, k)$ of the first row of the tableau $T$ we have $\rho(1, k)_{T(1,k)} \ge k$. This implies $\nu_{T(1,k)} \ge k$. If the entry $T(1, j)$ is barred then $\rho(1, k)_{T(1,k)} \ge k$ for the nearest unbarred entry $T(1, k)$ on its left (if it exists). Then $\nu$ is obtained from $\rho(1, k)$ by adding boxes in rows $T(1, k + 1) \ge \dots \ge T(1, j)$ and, possibly, by adding other boxes.
This implies $\nu_{T(1,j)} \ge j$. Thus, this inequality holds for all $j = 1, \dots, \lambda_1$. This is equivalent to the $\nu$-boundness of $T$.

Lemma 2.8. Suppose that $R$ is a sequence of the form (2.1) and $T \in \mathcal{T}(\lambda, R)$ is $\nu$-bounded. Then $\rho(\alpha)_{T(\alpha)} > c(\alpha)$ for all $\alpha \in \lambda$ with unbarred $T(\alpha)$.

Proof. We argue by contradiction. Taking into account Lemma 2.6, we find that for some $\alpha = (i, j)$ with unbarred $T(\alpha)$ we have $\rho(i, j)_{T(i,j)} = j - i$. Set $t = T(i, j)$ and consider all barred entries of $T$ (assuming for now they exist) which are equal to $t$ and occur to the right of the column $j$. Since $T$ is a reverse tableau, these entries $\bar t$ can only occur in rows $1, 2, \dots, i$. Let $(r, k)$ be the box with the maximum column number $k$ containing $\bar t$. Then the total number of such entries $\bar t$ does not exceed $k - j$. This implies that the number of boxes $\nu_t$ in row $t$ of $\nu$ does not exceed $\rho(i, j)_t + k - j = k - i$. Hence, $\nu'_k \le t - 1$. On the other hand, by the $\nu$-boundness of $T$ we have $t = T(r, k) \le T(1, k) \le \nu'_k$, a contradiction.

If none of the boxes to the right of the column $j$ contains $\bar t$ then $\nu_t = \rho(i, j)_t = j - i$. However, by the assumption, $\nu_t \ge \nu_{T(1,j)} \ge j$, a contradiction.

This completes the proof of the theorem.

By the column word of a tableau $T$ we will mean the sequence of all entries of $T$ written in the column order.

Corollary 2.9. Suppose that $|\nu| = |\lambda| + |\mu|$. The Littlewood–Richardson coefficient $c^{\nu}_{\lambda\mu}$ equals the number of $\nu$-bounded reverse $\lambda$-tableaux $T$ whose column word coincides with the Yamanouchi symbol of a certain sequence $R$ of the form (2.1).

This can be shown to be equivalent to a well-known version of the Littlewood–Richardson rule. Corollary 2.9 also holds with the $\nu$-boundness condition dropped; see Lemma 2.7. By the corollary, $c^{\nu}_{\lambda\mu}$ counts the cardinality of the intersection of two finite sets: the set of column words of $\nu$-bounded reverse $\lambda$-tableaux and the set of Yamanouchi symbols of the sequences of the form (2.1).

Remark 2.10.
Due to (1.10), the multiplication rule for the polynomials $s_\lambda(x|u)$ is obtained from Theorem 2.1 by replacing $a_i$ with $u_{n-i+1}$ for each $i$. The corresponding coefficients are polynomials in the $u_i - u_j$, $i > j$, with positive integer coefficients.

Corollary 2.11. Suppose that the polynomials $c^{\nu}_{\lambda\mu}(a)$ are defined by the expansion (1.3) with $x = (x_1, \dots, x_n)$. Then $c^{\nu}_{\lambda\mu}(a)$ is independent of $n$ as soon as $n \ge \nu'_1$. Moreover, if $n < \nu'_1$ then $c^{\nu}_{\lambda\mu}(a) = 0$.

Proof. This follows from the boundness condition on the reverse tableaux.

3 Applications

3.1 Equivariant Schubert calculus on the Grassmannian

As in the Introduction, consider the equivariant cohomology ring $H^*_T(\mathrm{Gr}(n,N))$ as a module over $\mathbb{Z}[t_1, \dots, t_N]$. Let $x_1, \dots, x_n$ denote the Chern roots of the dual $S^\vee$ of the tautological subbundle $S$ of the trivial bundle $\mathbb{C}^N_{\mathrm{Gr}(n,N)}$, so that for the total equivariant Chern class of $S$ we have

$$c^T(S) = \prod_{i=1}^{n} (1 - x_i).$$

Then, due to [6, Lecture 8, Proposition 1.1] (see also [16]), the equivariant Schubert classes $\sigma_\lambda$ can be expressed by

$$\sigma_\lambda = s_\lambda(x|u), \qquad u = (-t_N, \dots, -t_1, 0, \dots).$$

Hence, Theorem 2.1 yields a multiplication rule for the equivariant Schubert classes. The corresponding stability property is implied by Corollary 2.11.

Corollary 3.1. We have

$$\sigma_\lambda\, \sigma_\mu = \sum_{\nu} d^{\,\nu}_{\lambda\mu}\, \sigma_\nu,$$

where

$$d^{\,\nu}_{\lambda\mu} = \sum_{R} \sum_{T} \prod_{\substack{\alpha \in \lambda \\ T(\alpha)\ \text{unbarred}}} \bigl(t_{m+T(\alpha)-c(\alpha)} - t_{m+T(\alpha)-\rho(\alpha)_{T(\alpha)}}\bigr), \tag{3.1}$$

summed over all sequences $R$ of the form (2.1) and all $\nu$-bounded reverse $\lambda$-tableaux $T \in \mathcal{T}(\lambda, R)$. In particular, the $d^{\,\nu}_{\lambda\mu}$ are polynomials in the $t_i - t_j$, $i > j$, with positive integer coefficients. Moreover, the coefficients $d^{\,\nu}_{\lambda\mu}$, regarded as polynomials in the variables $a_i$ defined in (1.5), are independent of $n$ and $m$, as soon as the inequalities $n \ge \lambda'_1 + \mu'_1$ and $m \ge \lambda_1 + \mu_1$ hold.

Example 3.2. For any $n \ge 3$ and $m \ge 4$ we have

$$\begin{aligned} \sigma_{(2)}\, \sigma_{(2,1)} ={}& \sigma_{(4,1)} + \sigma_{(3,2)} + \sigma_{(3,1,1)} + \sigma_{(2,2,1)} \\ &+ (t_{m+2} - t_{m-1} + t_m - t_{m-2})\, \sigma_{(3,1)} + (t_{m+2} - t_{m-1})\, \sigma_{(2,2)} \\ &+ (t_m - t_{m-1})\, \sigma_{(2,1,1)} + (t_{m+2} - t_{m-1})(t_m - t_{m-1})\, \sigma_{(2,1)}. \end{aligned}$$

This follows from Example 2.2.
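The rule of Theorem 2.1, which underlies Examples 2.2 and 3.2, can be checked by brute force on small diagrams. The sketch below is an illustration only and is not taken from the paper: the names `chains`, `reverse_tableaux` and `lr_value` are hypothetical, the sequence $a$ is passed as an integer-valued Python callable, and the number of variables is taken to be the number of rows of $\nu$ (which suffices by Corollary 2.11). It enumerates the sequences $R$, the $\nu$-bounded reverse $\lambda$-tableaux and the admissible choices of barred boxes, and sums the products in (2.2).

```python
from itertools import combinations


def chains(mu, nu):
    """Yield (symbol, shapes) for every chain mu = rho(0) -> ... -> rho(l) = nu."""
    k = len(nu)
    if len(mu) > k:
        return
    start = tuple(mu) + (0,) * (k - len(mu))
    target = tuple(nu)
    if any(m > v for m, v in zip(start, target)):
        return

    def extend(shape, symbol, shapes):
        if shape == target:
            yield symbol, shapes
            return
        for r in range(k):
            # a box may be added in row r if the result stays a partition inside nu
            if shape[r] < target[r] and (r == 0 or shape[r - 1] > shape[r]):
                nxt = shape[:r] + (shape[r] + 1,) + shape[r + 1:]
                yield from extend(nxt, symbol + [r + 1], shapes + [nxt])

    yield from extend(start, [], [start])


def reverse_tableaux(lam, n):
    """Yield reverse lam-tableaux with entries in 1..n as dicts {(i, j): entry}."""
    boxes = [(i, j) for i, row in enumerate(lam, 1) for j in range(1, row + 1)]

    def fill(pos, t):
        if pos == len(boxes):
            yield dict(t)
            return
        i, j = boxes[pos]
        hi = n
        if j > 1:
            hi = min(hi, t[(i, j - 1)])      # weakly decreasing along the row
        if i > 1:
            hi = min(hi, t[(i - 1, j)] - 1)  # strictly decreasing down the column
        for v in range(1, hi + 1):
            t[(i, j)] = v
            yield from fill(pos + 1, t)
        t.pop((i, j), None)

    yield from fill(0, {})


def lr_value(lam, mu, nu, a):
    """Evaluate c^nu_{lam mu}(a) by the rule (2.2) for a callable a: i -> a_i."""
    n = len(nu)
    conj = lambda j: sum(1 for row in nu if row >= j)  # nu'_j
    # column order: columns left to right, bottom to top inside each column
    order = sorted(((i, j) for i, row in enumerate(lam, 1) for j in range(1, row + 1)),
                   key=lambda box: (box[1], -box[0]))
    width = lam[0] if lam else 0
    total = 0
    for symbol, shapes in chains(mu, nu):
        l = len(symbol)
        for t in reverse_tableaux(lam, n):
            if any(t[(1, j)] > conj(j) for j in range(1, width + 1)):
                continue  # T is not nu-bounded
            for barred in combinations(range(len(order)), l):
                if any(t[order[p]] != r for p, r in zip(barred, symbol)):
                    continue  # barred entries must spell the Yamanouchi symbol
                weight, region = 1, 0
                for p, (i, j) in enumerate(order):
                    if region < l and barred[region] == p:
                        region += 1  # passing a barred box moves to the next rho
                        continue
                    entry, rho = t[(i, j)], shapes[region]
                    weight *= a(entry - rho[entry - 1]) - a(entry - (j - i))
                total += weight
    return total


if __name__ == "__main__":
    a = lambda i: i ** 3 + 7 * i  # an arbitrary test sequence
    print(lr_value((2,), (2, 1), (3, 1), a) == a(-1) - a(2) + a(-2) - a(0))
```

Under these assumptions the coefficients of Example 2.2 are reproduced for any integer sequence $a$; the coefficients of Example 3.2 correspond to the choice $a_j = -t_{j+m}$ from (1.5).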
The first manifestly positive rule for the expansion of σλ σµ was given by Knutson and Tao [9] by using combinatorics of puzzles. Although the stability property was not pointed out in [9], it can be deduced directly from the puzzle rule or by applying the weight-preserving bijection between the puzzles and the barred tableaux constructed by Kreiman [10]. 3.2 Quantum immanants and higher Capelli operators Let gln denote the general linear Lie algebra over C. Consider the center Z(gln) of the universal enveloping algebra U(gln). The algebra U(gln) is equipped with the natural filtration. For all n we identify gln−1 as a subalgebra of gln in a usual way and denote by gl∞ the corresponding inductive limit gl∞ = Due to Olshanski [20], there exist filtration-preserving homomorphisms on : Z(gln) → Z(gln−1), n > 1, (3.2) which allow one to define the algebra Z of the virtual Casimir elements for the Lie algebra gl∞ as the inverse limit Z = lim Z(gln), n → ∞, in the category of filtered algebras. The quantum immanants Sλ|n are elements of the center Z(gln) of the universal enveloping algebra U(gln) parameterized by the diagrams λ with at most n rows; see [18]. The elements Sλ|n form a basis of Z(gln) and they are consistent with the Olshanski homomorphisms (3.2) so that on : Sλ|n 7→ Sλ|n−1, (3.3) where we assume Sλ|n = 0 if the number of rows of λ exceeds n. For any diagram λ, the corresponding virtual quantum immanant Sλ is then defined as the sequence Sλ = ( Sλ|n |n > 0). The elements Sλ parameterized by all diagrams λ form a basis of the algebra Z so that we can define the coefficients f νλµ by the expansion Sλ Sµ = f νλµ Sν . Note that the same coefficients f νλµ determine the multiplication rule for the higher Capelli operators ∆λ, which are defined as the sequences of the images of the quantum immanants Sλ|n, where each image is taken under a natural representation of gln by differential operators; see [18, 19]. Corollary 3.3. 
The coefficient f νλµ is zero unless µ ⊆ ν. If µ ⊆ ν then f νλµ = T (α) unbarred ρ(α)T (α) − c(α) , (3.4) summed over all sequences R of the form (2.1) and all ν-bounded reverse λ-tableaux T ∈ T (λ,R). In particular, the f νλµ are nonnegative integers. Proof. Due to the stability property (3.3) of the quantum immanants, it suffices to calculate the corresponding coefficients for the expansion of the products Sλ|n Sµ|n. The images of the quantum immanants Sλ|n under the Harish-Chandra isomorphism can be identified with the double Schur polynomials sλ(x||a) where the sequence a is specialized to ai = −i; see [18]. Therefore, the coefficients in question coincide with the corresponding specializations of the Littlewood–Richardson polynomials cνλµ(a). Example 3.4. Using Example 2.2 we get S(2) S(2,1) = S(4,1) + S(3,2) + S(3,1,1) + S(2,2,1) + 5 S(3,1) + 3 S(2,2) + S(2,1,1) + 3 S(2,1). In the course of the proof of Corollary 3.3 we also calculated the coefficients for the expansion of the products Sλ|n Sµ|n for any n. Some other formulas for these coefficients were obtained in [17]. In particular, it was shown that the f νλµ are integers, although their positivity property was not established there. Note also that the algebra of virtual Casimir elements Z is isomorphic to the algebra of shifted symmetric functions Λ∗; see [19]. The latter can be regarded as the specialization of Λ (or rather, its extension over C) at ai = −i for all i ∈ Z. 4 Supertableau formulas for sλ(x||a) and c λµ(a) Here we obtain one more rule for the calculation of the Littlewood–Richardson poly- nomials cνλµ(a). It relies on a supertableau representation of the double Schur poly- nomials sλ(x||a) which is implied by the results of [13]. This representation provides a “finite” version of the supertableau formulas of [7] and [13]; cf. [4]. Fix a positive integer n. For r > 1 set u(r) = (u1, . . . 
, ur) and use the 9th Variation in [13] with the indeterminates hrs specialized by hrs = hr(u (n−r−s+1)) if r + s 6 n, and 0 otherwise, where hr denotes the r-th complete symmetric polynomial. Let us write ŝλ/µ(u) for the corresponding Schur functions. Then (8.2) and (9.1) in [13] give ŝλ/µ(u) = α∈λ/µ uT (α), summed over semistandard tableaux T of shape λ/µ, such that the entries of the i-th row do not exceed n− λi + i. Furthermore, using (6.18) 1 and (9.6 ′) in [13] we get sλ(x|u) = sµ(x) ŝλ′/µ′(−u). (4.5) Equivalently, this can be interpreted as a combinatorial expression for the polynomials sλ(x|u) in terms of “supertableaux”. Identify the indices of u with the symbols 1′, 2′, . . . . A supertableau T is obtained by filling in the diagram of λ with the indices 1, . . . , n, 1′, 2′, . . . in such a way that in each row (resp. column) each primed index is to the right (resp. below) of each unprimed index; unprimed indices weakly increase along the rows and strictly increase down the columns; primed indices strictly increase along the rows and weakly increase down the columns; primed indices in column j do not exceed n− λ′j + j. Relation (4.5) implies the following. Proposition 4.1. We have sλ(x|u) = T (α) unprimed xT (α) T (α) primed (−uT (α)), (4.6) summed over all λ-supertableaux T . Using (1.10), we get an analogous representation for the double Schur polynomials sλ(x||a). A reverse supertableau T is obtained by filling in the diagram of λ with 1This formula in [13] should be corrected by replacing a(λj+n−j) with a(λi+n−i). the indices 1, . . . , n, n′, (n − 1)′, . . . (including non-positive primed indices) in such a way that in each row (resp. column) each primed index is to the right (resp. below) of each unprimed index; unprimed indices weakly decrease along the rows and strictly decrease down the columns; primed indices strictly decrease along the rows and weakly decrease down the columns; primed indices in column j are not less than λ′j − j + 1. 
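The defining conditions of a reverse supertableau can be checked mechanically on a small case. The sketch below enumerates all fillings for a given shape by brute force and filters them by the row and column rules stated above; it then compares the resulting weighted sum, with weight $x_{T(\alpha)}$ for unprimed and $-a_{T(\alpha)}$ for primed entries, against the product expression for $s_{(2,1)}(x||a)$ with $n = 2$ quoted in Example 4.3. The function names, the encoding of an entry as a pair `(value, primed)`, and the sample numerical values are assumptions made for illustration only.

```python
from itertools import product


def reverse_supertableaux(lam, n):
    """Yield reverse supertableaux of shape lam as dicts box -> (value, primed)."""
    boxes = [(i, j) for i, row_len in enumerate(lam) for j in range(row_len)]
    conj = [sum(1 for row_len in lam if row_len > j) for j in range(lam[0])]

    def candidates(j):
        unprimed = [(v, False) for v in range(1, n + 1)]
        lowest = conj[j] - (j + 1) + 1  # bound lambda'_j - j + 1 with 1-based column index
        primed = [(v, True) for v in range(lowest, n + 1)]
        return unprimed + primed

    def row_ok(left, right):
        (a, a_primed), (b, b_primed) = left, right
        if a_primed != b_primed:
            return b_primed  # primed entries must sit to the right of unprimed ones
        return b < a if a_primed else b <= a  # primed strictly, unprimed weakly decreasing

    def col_ok(upper, lower):
        (a, a_primed), (b, b_primed) = upper, lower
        if a_primed != b_primed:
            return b_primed  # primed entries must sit below unprimed ones
        return b <= a if a_primed else b < a  # primed weakly, unprimed strictly decreasing

    for filling in product(*(candidates(j) for _, j in boxes)):
        T = dict(zip(boxes, filling))
        rows_fine = all(row_ok(T[i, j - 1], T[i, j]) for i, j in boxes if j > 0)
        cols_fine = all(col_ok(T[i - 1, j], T[i, j]) for i, j in boxes if i > 0)
        if rows_fine and cols_fine:
            yield T


def weight(T, x, a):
    """Product of x_v over unprimed entries and (-a_v) over primed entries."""
    w = 1
    for value, primed in T.values():
        w *= -a[value] if primed else x[value]
    return w


# Small case: n = 2, lambda = (2, 1); arbitrary integer test values.
x = {1: 3, 2: 5}
a = {0: 7, 1: 11, 2: 13}
tableaux = list(reverse_supertableaux((2, 1), 2))
tableau_sum = sum(weight(T, x, a) for T in tableaux)
product_form = ((x[2] - a[2]) * (x[1] - a[0]) * (x[1] - a[2])
                + (x[2] - a[2]) * (x[2] - a[1]) * (x[1] - a[2]))
print(len(tableaux), tableau_sum == product_form)
```

Checking only adjacent boxes is enough here, since each condition is transitive along a row or column. The number of tableaux found should equal the number of monomials in the expansion of Example 4.3.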
The following supertableau representation of the polynomials sλ(x||a) follows from Proposition 4.1. Corollary 4.2. We have sλ(x||a) = T (α) unprimed xT (α) T (α) primed (−aT (α)), (4.7) summed over all reverse λ-supertableaux T . Example 4.3. Let n = 2 and λ = (2, 1). By the definition (1.9), s(2,1)(x||a) = (x2 − a2)(x1 − a0)(x1 − a2) + (x2 − a2)(x2 − a1)(x1 − a2). On the other hand, the reverse (2, 1)-supertableaux are 2 0 ′ 2 1 ′ 2 2 ′ 2 0 ′ 2 1 ′ 2 2 ′ 1 0 ′ 1 1 ′ 1 2 ′ 2 ′ 0 ′ 2 ′ 1 ′ which yield s(2,1)(x||a) = x 1x2 + x1x 2 − x1x2a2 − x 2a2 − x 1a2 − x1x2a0 − x1x2a1 − x1x2a2 + x2a0a2 + x2a1a2 + x2a 2 + x1a0a2 + x1a1a2 + x1a 2 − a0a 2 − a1a Formula (4.5) implies a supertableau representation of the coefficients cνλµ(a, b) and hence, of the Littlewood–Richardson polynomials cνλµ(a). The representation for the latter is neither manifestly positive, nor stable; it provides an expression for cνλµ(a) as an alternating sum of monomials in the ai. Given a sequence R of the form (2.1), construct the set S(λ,R) of barred reverse λ-supertableaux by analogy with T (λ,R). A tableau T ∈ S(λ,R) must contain boxes α1, . . . , αl occupied by unprimed indices r1, r2, . . . , rl listed in the column order which is restricted to the subtableau of T formed by the unprimed indices. As before, we distinguish the entries in α1, . . . , αl by barring each of them. For each box α with αi ≺ α ≺ αi+1, 0 6 i 6 l, which is occupied by an unprimed index, set ρ(α) = ρ(i). Corollary 4.4. The coefficients cνλµ(a, b) defined in (2.3) can be given by cνλµ(a, b) = T (α) unprimed, unbarred aT (α)−ρ(α) T (α) T (α) primed (−bT (α)), (4.8) summed over sequences R of the form (2.1) and reverse supertableaux T ∈ S(λ,R). Proof. Applying formula (4.5) we can reduce the calculation of cνλµ(a, b) to the par- ticular case of the sequence b = (0). Now (4.8) follows from Lemma 2.4. Example 4.5. In order to calculate the Littlewood–Richardson polynomial c (2,1) (2,1) (2) we may take n = 2; see Corollary 2.11. 
The barred reverse supertableaux compatible with the sequence (2) → (2, 1) are (tableau diagrams garbled in extraction; surviving entries: 2 0′, 2 1′, 2 2′, 2 0′, 2 1′, 2 2′) so that

$$ c^{(2,1)}_{(2,1)\,(2)}(a) = a_{-1}^2 + a_{-1}a_1 + a_{-1}a_2 - a_{-1}a_2 - a_1a_2 - a_2^2 - a_{-1}a_0 - a_{-1}a_1 - a_{-1}a_2 + a_0a_2 + a_1a_2 + a_2^2 = a_{-1}^2 - a_{-1}a_0 - a_{-1}a_2 + a_0a_2, $$

which agrees with Example 2.2.

References

[1] L. C. Biedenharn and J. D. Louck, A new class of symmetric polynomials defined in terms of tableaux, Advances in Appl. Math. 10 (1989), 396–438.
[2] L. C. Biedenharn and J. D. Louck, Inhomogeneous basis set of symmetric polynomials defined by tableaux, Proc. Nat. Acad. Sci. U.S.A. 87 (1990), 1441–1445.
[3] A. S. Buch and R. Rimányi, Specializations of Grothendieck polynomials, C. R. Acad. Sci. Paris, Ser. I 339 (2004), 1–4.
[4] W. Y. C. Chen, B. Li and J. D. Louck, The flagged double Schur function, J. Alg. Comb. 15 (2002), 7–26.
[5] W. Fulton, Young tableaux. With applications to representation theory and geometry. London Mathematical Society Student Texts, 35. Cambridge University Press, Cambridge, 1997.
[6] W. Fulton, Equivariant cohomology in algebraic geometry, Eilenberg lectures, Columbia University, Spring 2007. Available at http://www.math.lsa.umich.edu/~dandersn/eilenberg
[7] I. Goulden and C. Greene, A new tableau representation for supersymmetric Schur functions, J. Algebra 170 (1994), 687–703.
[8] W. Graham, Positivity in equivariant Schubert calculus, Duke Math. J. 109 (2001), 599–614.
[9] A. Knutson and T. Tao, Puzzles and (equivariant) cohomology of Grassmannians, Duke Math. J. 119 (2003), 221–260.
[10] V. Kreiman, Equivariant Littlewood-Richardson tableaux, preprint arXiv:0706.3738.
[11] A. Lascoux, Interpolation, Lectures at Tianjin University, June 1996. Available at http://www-igm.univ-mlv.fr/~al/pub−engl.html
[12] D. E. Littlewood and A. R. Richardson, Group characters and algebra, Philos. Trans. Roy. Soc. London Ser. A 233 (1934), 49–141.
[13] I. G. Macdonald, Schur functions: theme and variations, in “Actes 28-e Séminaire Lotharingien”, pp. 5–39. Publ. I.R.M.A. Strasbourg, 1992, 498/S–27.
[14] I. G. Macdonald, Symmetric Functions and Hall Polynomials, Oxford University Press, Oxford, 1995.
[15] L. Manivel, Symmetric functions, Schubert polynomials and degeneracy loci, SMF/AMS Texts and Monographs, Vol. 6, 1998.
[16] L. C. Mihalcea, Giambelli formulae for the equivariant quantum cohomology of the Grassmannian, preprint math.CO/0506335.
[17] A. I. Molev and B. E. Sagan, A Littlewood-Richardson rule for factorial Schur functions, Trans. Amer. Math. Soc. 351 (1999), 4429–4443.
[18] A. Okounkov, Quantum immanants and higher Capelli identities, Transform. Groups 1 (1996), 99–126.
[19] A. Okounkov and G. Olshanski, Shifted Schur functions, St. Petersburg Math. J. 9 (1998), 239–300.
[20] G. I. Olshanski, Representations of infinite-dimensional classical groups, limits of enveloping algebras, and Yangians, in “Topics in Representation Theory”, Advances in Soviet Math. 2, Amer. Math. Soc., Providence RI, 1991, pp. 1–66.
[21] B. E. Sagan, The symmetric group. Representations, combinatorial algorithms, and symmetric functions, 2nd edition, Grad. Texts in Math., 203, Springer-Verlag, New York, 2001.

ABSTRACT We introduce a family of rings of symmetric functions depending on an infinite sequence of parameters. A distinguished basis of such a ring is formed by analogues of the Schur functions. The corresponding structure coefficients are polynomials in the parameters which we call the Littlewood-Richardson polynomials.
We give a combinatorial rule for their calculation by modifying an earlier result of B. Sagan and the author. The new rule provides a formula for these polynomials which is manifestly positive in the sense of W. Graham. We apply this formula to the calculation of the product of equivariant Schubert classes on Grassmannians, which implies a stability property of the structure coefficients. The first manifestly positive formula for such an expansion was given by A. Knutson and T. Tao by using combinatorics of puzzles, while the stability property was not apparent from that formula. We also use the Littlewood-Richardson polynomials to describe the multiplication rule in the algebra of the Casimir elements for the general linear Lie algebra in the basis of the quantum immanants constructed by A. Okounkov and G. Olshanski.

1 Introduction
2 The momentum picture
3 Lagrangians, Euler-Lagrange equations and dynamical variables
4 On the uniqueness of the dynamical variables
5 Heisenberg relations
6 Types of possible commutation relations
6.1 Restrictions related to the momentum operator
6.2 Restrictions related to the charge operator
6.3 Restrictions related to the angular momentum operator(s)
7 Inferences
8 State vectors, vacuum and mean values
9 Commutation relations for several coexisting different free fields
9.1 Commutation relations connected with the momentum operator. Problems and their possible solutions
9.2 Commutation relations connected with the charge and angular momentum operators
9.3 Commutation relations between the dynamical variables
9.4 Commutation relations under the uniqueness conditions
10 Conclusion
References

Abstract

Possible (algebraic) commutation relations in the Lagrangian quantum theory of free (scalar, spinor and vector) fields are considered from a mathematical viewpoint. The sources of these relations are the Heisenberg equations/relations for the dynamical variables and a specific condition for uniqueness of the operators of the dynamical variables (with respect to some class of Lagrangians). The paracommutation relations, or some of their generalizations, are identified as the most general ones that entail the validity of all Heisenberg equations. The simultaneous fulfillment of the Heisenberg equations and the uniqueness requirement turns out to be impossible. This problem is solved via a redefinition of the dynamical variables, similar to the normal ordering procedure and containing it as a special case. That implies corresponding changes in the admissible commutation relations. The introduction of the concept of the vacuum narrows the class of possible commutation relations; in particular, the mentioned redefinition of the dynamical variables is reduced to normal ordering. As a last restriction on that class, we impose the requirement that there exist an effective procedure for calculating vacuum mean values. The standard bilinear commutation relations are identified as the only known ones that satisfy all of the mentioned conditions and do not contradict the existing data.

Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations

1. Introduction

The main subject of this paper is an analysis of possible (algebraic) commutation relations in the Lagrangian quantum theory¹ of free fields. These relations are considered only from a mathematical viewpoint, and their physical consequences, like the statistics of many-particle systems, are not investigated.
The canonical quantization method has its origin in classical Hamiltonian mechanics [9, 10] and naturally leads to the canonical (anti)commutation relations [3, 11, 12]. These relations can be obtained from different assumptions (see, e.g., [1, 13–15]) and are one of the cornerstones of present-day quantum field theory. Theoretically, non-canonical commutation relations are also possible, the best known example being the so-called paracommutation relations [16–18]. However, it seems that none of the presently known particles/fields obeys them.

The present work shows how different classes of commutation relations, understood in a broad sense as algebraic connections between creation and/or annihilation operators, arise from the Lagrangian formalism when it is applied to three types of Lagrangians describing free scalar, spinor and vector fields. Their origin is twofold. On the one hand, a requirement for uniqueness of the dynamical variables (which can be calculated from Lagrangians leading to identical Euler-Lagrange equations) entails a number of specific commutation relations. On the other hand, any one of the so-called Heisenberg relations/equations [3, 11] implies corresponding commutation relations; for example, the paracommutation relations arise from the Heisenberg equations regarding the momentum operator when a ‘charge symmetric’ Lagrangian is employed.² The combination of both methods leads to strong, generally incompatible, restrictions on the admissible types of commutation relations.

The introduction of the concept of vacuum, combined with the mentioned uniqueness of the operators of the dynamical variables, changes the situation and requires a redefinition of these operators in a way similar to normal ordering [1, 3, 11, 12], which is a special case of it. Some natural assumptions reduce the former to the latter; in particular, the paracommutation relations are excluded in that way. However, this does not reduce the possible commutation relations to the canonical ones. Further, the requirement that an effective procedure be available for calculating vacuum mean (expectation) values, to which all predictable results of the theory reduce, imposes a new restriction, whose only realistic solution at present seems to be the standard canonical (anti)commutation relations.

1 In this paper we consider only the Lagrangian (canonical) quantum field theory in which the quantum fields are represented as operators, called field operators, acting on some Hilbert space, which in general is unknown if interacting fields are studied. These operators are supposed to satisfy some equations of motion; from them are constructed conserved quantities satisfying conservation laws, etc. From the viewpoint of present-day quantum field theory, this approach is only a preliminary stage for a more or less rigorous formulation of the theory in which the fields are represented via operator-valued distributions, a fact required even for the description of free fields. Moreover, in non-perturbative directions, like constructive and conformal field theories, the main objects are the vacuum mean (expectation) values of the fields, and from these are reconstructed the Hilbert space of states and the fields acting on it. Regardless of these facts, the Lagrangian (canonical) quantum field theory is an inherent component of most of the ways of presenting quantum field theory adopted explicitly or implicitly in books like [1–8]. Besides, the Lagrangian approach is a source of many ideas for other directions of research, like the axiomatic quantum field theory [3, 7, 8].

2 Ordinarily [3, 11], the commutation relations are postulated and the validity of the Heisenberg relations is then verified. We follow the opposite method by postulating the Heisenberg equations and then looking for commutation relations that are compatible with them.

The layout of the work is as follows.

Sect. 2 gives an idea of the momentum picture of motion and discusses the relations between the creation and annihilation operators in it and in the Heisenberg picture.

In Sect. 3 some basic results from [13–15] are reviewed, part of which can also be found in papers like [1, 3, 11, 12]. In particular, the explicit expressions of the dynamical variables via the creation and annihilation operators are presented (without assuming any commutation relations or normal ordering), and the existence of a family of such variables for a given system of Euler-Lagrange equations for free fields is pointed out. The last fact is analyzed in Sect. 4, where a number of its consequences, having the sense of commutation relations, are drawn.

The Heisenberg relations and the commutation relations between the dynamical variables are reviewed and analyzed in Sect. 5. It is pointed out that the latter should be consequences of the former. Arguments are presented that the Heisenberg equation concerning the angular momentum operator should be split into two independent ones, representing its ‘orbital’ and ‘spin’ parts, respectively.

Sect. 6 contains a method for assigning commutation relations to the Heisenberg equations. It is shown that the Heisenberg equation involving the ‘orbital’ part of the angular momentum gives rise to a differential, not algebraic, commutation relation, and that the one concerning the ‘spin’ part of the angular momentum implies complicated integro-differential connections between the creation and annihilation operators. Special attention is paid to the paracommutation relations, of which the ordinary ones are a particular kind, and which ensure the validity of the Heisenberg equations concerning the momentum operator. The problem of compatibility of the different types of commutation relations derived is partially analyzed. It is proved that some generalization of the paracommutation relations ensures the fulfillment of all of the Heisenberg relations.

Sect. 7 is devoted to consequences of the commutation relations derived in Sect. 6 under the conditions for uniqueness of the dynamical variables presented in Sect. 4. Generally, these requirements are incompatible with the commutation relations. To overcome the problem, a redefinition of the dynamical variables is proposed via a method similar to (and generalizing) normal ordering. This, of course, entails changes in the commutation relations, the new versions of which happen to be compatible with the uniqueness conditions and ensure the validity of the Heisenberg relations.

The concept of the vacuum is introduced in Sect. 8. It reduces (practically) the redefinition of the operators of the dynamical variables to the one obtained via the normal ordering procedure in ordinary quantum field theory but, without additional suppositions, does not reduce the commutation relations to the standard bilinear ones. As a last step in specifying the commutation relations as much as possible, we introduce the requirement that the theory supply an effective way of calculating vacuum mean values of (anti-normally ordered) products of creation and annihilation operators, to which all predictable results are reduced, in particular the mean values of the dynamical variables. The standard bilinear commutation relations seem to be the only ones known at present that survive that last condition; however, their uniqueness in this respect is not investigated.

Sect. 9 deals with the same problems as described above but for systems containing at least two different quantum fields. The main obstacle is the establishment of commutation relations between creation/annihilation operators concerning different fields. An argument is presented that they should contain commutators or anticommutators of these operators.
The majority of the corresponding commutation relations are explicitly written, and the results obtained turn out to be similar to the ones just described, only in a ‘multifield’ version. Section 10 closes the paper by summarizing its main results.

The books [1–3] will be used as standard reference works on quantum field theory. Of course, this is a more or less random selection from the great number of (text)books and papers on the theme, to which the reader is referred for more details or other points of view. To this end, e.g., [4, 12, 19] or the literature cited in [1–4, 12, 19] may be helpful.

Throughout this paper ħ denotes Planck’s constant (divided by 2π), c is the velocity of light in vacuum, and i stands for the imaginary unit. The superscripts † and ⊤ mean respectively Hermitian conjugation and transposition (of operators or matrices), the superscript ∗ denotes complex conjugation, and the symbol ◦ denotes composition of mappings/operators. By $\delta_{fg}$, or $\delta_f^g$ or $\delta^{fg}$ (:= 1 for f = g, := 0 for f ≠ g) is denoted the Kronecker δ-symbol, depending on the arguments f and g, and $\delta^n(y)$, $y \in \mathbb{R}^n$, stands for the n-dimensional Dirac δ-function; $\delta(y) := \delta^1(y)$ for $y \in \mathbb{R}$.

The Minkowski spacetime is denoted by M. The Greek indices run from 0 to dim M − 1 = 3. All Greek indices will be raised and lowered by means of the standard 4-dimensional Lorentz metric tensor $\eta_{\mu\nu}$ and its inverse $\eta^{\mu\nu}$ with signature (+ − − −). The Latin indices a, b, . . . run from 1 to dim M − 1 = 3 and, usually, label the spatial components of some object. Einstein’s summation convention over indices repeated on different levels is assumed over the whole range of their values.
Finally, we ought to explain why this work appears under the general title “Lagrangian quantum field theory in momentum picture” when all considerations in it are done, in fact, in the Heisenberg picture with possible, but not necessary, usage of the creation and annihilation operators in the momentum picture. First of all, we essentially employ the expressions obtained in [13–15] for the dynamical variables in the momentum picture for three types of Lagrangians. The corresponding operators in the Heisenberg picture, which in fact is used in this paper, can be obtained via a direct calculation, as is partially done in, e.g., [1] for one of the mentioned types of Lagrangians. The important point here is that in the Heisenberg picture it suffices to use only the standard Lagrangian formalism, while in the momentum picture one has to suppose the commutativity between the components of the momentum operator and the validity of the Heisenberg relations for it (see equations (2.6) and (2.7) below). Since the fulfillment of these relations is not necessary for the analysis of the commutation relations we intend to carry out (they are subsidiary restrictions on the Lagrangian formalism), the Heisenberg picture of motion is the natural one to use. For this reason, the expressions for the dynamical variables obtained in [13–15] will be used simply as their Heisenberg counterparts, but expressed via the creation and annihilation operators in the momentum picture. The only real advantage one gets in this way is the more natural structure of the orbital angular momentum operator. As the commutation relations considered below are algebraic ones, it is inessential in what picture of motion they are written or investigated.

2. The momentum picture

Since the momentum picture of motion will be used only partially in this work, only its definition and the connection between the creation/annihilation operators in it and in the Heisenberg picture are presented below.
Details concerning the momentum picture can be found in [20, 21] and in the corresponding sections devoted to it in [13–15].

Let us consider a system of quantum fields, represented in the Heisenberg picture of motion by field operators $\tilde{\varphi}_i(x) : \mathcal{F} \to \mathcal{F}$, $i = 1, \dots, n \in \mathbb{N}$, acting on the system’s Hilbert space $\mathcal{F}$ of states and depending on a point x in Minkowski spacetime M. Here and henceforth, all quantities in the Heisenberg picture will be marked by a tilde (wave) “˜” over their kernel symbols. Let $\tilde{\mathcal{P}}_\mu$ denote the system’s (canonical) momentum vectorial operator, defined via the energy-momentum tensorial operator $\tilde{\mathcal{T}}^{\mu\nu}$ of the system, viz.

$$ \tilde{\mathcal{P}}_\mu := \frac{1}{c} \int_{x^0=\mathrm{const}} \tilde{\mathcal{T}}_{0\mu}(x)\, d^3x. \qquad (2.1) $$

Since this operator is Hermitian, $\tilde{\mathcal{P}}_\mu^\dagger = \tilde{\mathcal{P}}_\mu$, the operator

$$ \mathcal{U}(x, x_0) = \exp\Bigl( \frac{1}{i\hbar} \sum_\mu (x^\mu - x_0^\mu)\, \tilde{\mathcal{P}}_\mu \Bigr), \qquad (2.2) $$

where $x_0 \in M$ is arbitrarily fixed and $x \in M$,³ is unitary, i.e. $\mathcal{U}^\dagger(x_0, x) := (\mathcal{U}(x, x_0))^\dagger = \mathcal{U}^{-1}(x, x_0) := (\mathcal{U}(x, x_0))^{-1}$ and, via the formulae

$$ \tilde{\mathcal{X}} \mapsto \mathcal{X}(x) = \mathcal{U}(x, x_0)(\tilde{\mathcal{X}}) \qquad (2.3) $$

$$ \tilde{\mathcal{A}}(x) \mapsto \mathcal{A}(x) = \mathcal{U}(x, x_0) \circ (\tilde{\mathcal{A}}(x)) \circ \mathcal{U}^{-1}(x, x_0), \qquad (2.4) $$

realizes the transition to the momentum picture. Here $\tilde{\mathcal{X}}$ is a state vector in the system’s Hilbert space of states $\mathcal{F}$ and $\tilde{\mathcal{A}}(x) : \mathcal{F} \to \mathcal{F}$ is an (observable or not) operator-valued function of $x \in M$ which, in particular, can be a polynomial or a convergent power series in the field operators $\tilde{\varphi}_i(x)$; respectively $\mathcal{X}(x)$ and $\mathcal{A}(x)$ are the corresponding quantities in the momentum picture. In particular, the field operators transform as

$$ \tilde{\varphi}_i(x) \mapsto \varphi_i(x) = \mathcal{U}(x, x_0) \circ \tilde{\varphi}_i(x) \circ \mathcal{U}^{-1}(x, x_0). \qquad (2.5) $$

Notice, in (2.2) the multiplier $(x^\mu - x_0^\mu)$ is regarded as a real parameter (in which $\tilde{\mathcal{P}}_\mu$ is linear). Generally, $\mathcal{X}(x)$ and $\mathcal{A}(x)$ depend also on the point $x_0$ and, to be quite correct, one should write $\mathcal{X}(x, x_0)$ and $\mathcal{A}(x, x_0)$ for $\mathcal{X}(x)$ and $\mathcal{A}(x)$, respectively. However, in most situations in the present work, this dependence is not essential or, in fact, is not present at all. For that reason, we shall not indicate it explicitly.
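The unitarity claim for the operator (2.2) can be spelled out in one line. The sketch below assumes that (2.2) has the form $\mathcal{U}(x,x_0) = \exp\bigl(\frac{1}{i\hbar} K\bigr)$ with $K := \sum_\mu (x^\mu - x_0^\mu)\tilde{\mathcal{P}}_\mu$; the prefactor $1/(i\hbar)$ and the auxiliary symbol $K$ are assumptions introduced here for illustration, and the manipulation is formal (no domain questions for unbounded operators are addressed).

```latex
% Assumption: U(x,x_0) = exp( K / (i\hbar) ),  K := \sum_\mu (x^\mu - x_0^\mu) \tilde{P}_\mu,
% with real coefficients (x^\mu - x_0^\mu) and Hermitian \tilde{P}_\mu.
\begin{align*}
K^\dagger
  &= \sum_\mu (x^\mu - x_0^\mu)\, \tilde{\mathcal{P}}_\mu^\dagger = K, \\
\bigl(\mathcal{U}(x,x_0)\bigr)^\dagger
  &= \exp\Bigl( \overline{\tfrac{1}{i\hbar}}\, K^\dagger \Bigr)
   = \exp\Bigl( -\tfrac{1}{i\hbar}\, K \Bigr)
   = \mathcal{U}^{-1}(x,x_0)
   = \mathcal{U}(x_0,x).
\end{align*}
```

Only the Hermiticity of the single operator $K$ is used, so this step does not require the components $\tilde{\mathcal{P}}_\mu$ to commute with each other.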
The momentum picture is most suitable in quantum field theories in which the components $\tilde{\mathcal{P}}_\mu$ of the momentum operator commute between themselves and satisfy the Heisenberg relations/equations with the field operators, i.e. when $\tilde{\mathcal{P}}_\mu$ and $\tilde{\varphi}_i(x)$ satisfy the relations:

$$ [\tilde{\mathcal{P}}_\mu, \tilde{\mathcal{P}}_\nu]_{-} = 0 \qquad (2.6) $$

$$ [\tilde{\varphi}_i(x), \tilde{\mathcal{P}}_\mu]_{-} = i\hbar\, \partial_\mu \tilde{\varphi}_i(x). \qquad (2.7) $$

Here $[A, B]_{\pm} := A \circ B \pm B \circ A$, ◦ being the composition of mappings sign, is the commutator/anticommutator of operators (or matrices) A and B. However, the fulfillment of the relations (2.6) and (2.7) will not be supposed in this paper until Sect. 6 (see also Sect. 5).

Let $a_s^{\pm}(k)$ and $a_s^{\dagger\,\pm}(k)$ be the creation/annihilation operators of some free particular field (see Sect. 3 below for a detailed explanation of the notation). We have the connections

$$ \tilde{a}_s^{\pm}(k) = e^{\pm\frac{1}{i\hbar} x^\mu k_\mu}\, \mathcal{U}^{-1}(x, x_0) \circ a_s^{\pm}(k) \circ \mathcal{U}(x, x_0), \qquad \tilde{a}_s^{\dagger\,\pm}(k) = e^{\pm\frac{1}{i\hbar} x^\mu k_\mu}\, \mathcal{U}^{-1}(x, x_0) \circ a_s^{\dagger\,\pm}(k) \circ \mathcal{U}(x, x_0) \qquad \bigl(k_0 = \sqrt{m^2c^2 + k^2}\bigr) \qquad (2.8) $$

whose explicit form is

$$ \tilde{a}_s^{\pm}(k) = e^{\pm\frac{1}{i\hbar} x_0^\mu k_\mu}\, a_s^{\pm}(k), \qquad \tilde{a}_s^{\dagger\,\pm}(k) = e^{\pm\frac{1}{i\hbar} x_0^\mu k_\mu}\, a_s^{\dagger\,\pm}(k) \qquad \bigl(k_0 = \sqrt{m^2c^2 + k^2}\bigr). \qquad (2.9) $$

Further it will be assumed that $\tilde{a}_s^{\pm}(k)$ and $\tilde{a}_s^{\dagger\,\pm}(k)$ are defined in the Heisenberg picture, independently of $a_s^{\pm}(k)$ and $a_s^{\dagger\,\pm}(k)$, by means of the standard Lagrangian formalism. As for the operators $a_s^{\pm}(k)$ and $a_s^{\dagger\,\pm}(k)$, we shall regard them as defined via (2.9); this makes them independent of the momentum picture of motion. The fact that the so-defined operators $a_s^{\pm}(k)$ and $a_s^{\dagger\,\pm}(k)$ coincide with the creation/annihilation operators in the momentum picture (under the conditions (2.6) and (2.7)) will be inessential in almost the whole text.

3 The notation $x_0$, for a fixed point in M, should not be confused with the zeroth covariant coordinate $\eta_{0\mu}x^\mu$ of x which, following the convention $x_\nu := \eta_{\nu\mu}x^\mu$, is denoted by the same symbol $x_0$. From the context, it will always be clear whether $x_0$ refers to a point in M or to the zeroth covariant coordinate of a point $x \in M$.

3.
Lagrangians, Euler-Lagrange equations and dynamical variables In [13–15] we have investigated the Lagrangian quantum field theory of respectively scalar, spin 1 and vector free fields. The main Lagrangians from which it was derived are respectively (see loc. cit. or, e.g. [1, 3, 11,12]): L̃′sc = L̃′sc( ϕ̃, ϕ̃†) =− 1 + τ( ϕ̃) m2c4 ϕ̃(x) ◦ ϕ̃†(x) + 1 1 + τ( ϕ̃) c2~2(∂µ ϕ̃(x)) ◦ (∂µ ϕ̃†(x)) (3.1a) L̃′sp = L̃′sp( ψ̃, ψ) =− 1 i~c{ ˜̆ψ (x)C−1γµ ◦ (∂µ ψ̃(x)) − (∂µ ˜̆ψ (x))C−1γµ ◦ ψ̃(x)}+mc2 ˜̆ψ (x)C−1 ◦ ψ̃(x) (3.1b) L̃′v = L̃′v( Ũ , Ũ†) = 1 + τ( Ũ) Ũ†µ ◦ Ũ 1 + τ( Ũ) −(∂µ Ũ ν) ◦ (∂ µ Ũν) + (∂µ Ũ ) ◦ (∂ν Ũ (3.1c) Here it is used the following notation: ϕ̃(x) is a scalar field, a tilde (wave) over a symbol means that it is in Heisenberg picture, the dagger † denotes Hermitian conjugation, ψ̃ := ( ψ̃0, ψ̃1, ψ̃2, ψ̃3) is a 4-spinor field, ψ := C ψ̃ := C( ψ̃†γ0) is its charge conjugate with γµ being the Dirac gamma matrices and the matrix C satisfies the equations C−1γµC = −γµ and C⊤ = −C, Uµ is a vector field, m is the field’s mass (parameter) and the function τ(A) := 1 for A† = A (Hermitian operator) 0 for A† 6= A (non-Hermitian operator) , (3.2) with A : F → F being an operator on the systems Hilbert space F of states, takes care of is the field charged (non-Hermitian) or neutral (Hermitian, uncharged). Since a spinor field is a charged one, we have τ( ψ̃) = 0; sometimes below the number 0 = τ( ψ̃) will be written explicitly for unification of the notation. We have explored also the consequences from the ‘charge conjugate’ Lagrangians L̃′′sc = L̃′′sc( ϕ̃, ϕ̃†) := L̃′sc( ϕ̃†, ϕ̃) (3.3a) L̃′′sp = L̃′′sp( ψ̃, ψ) := L̃′sp( ψ, ψ̃) (3.3b) L̃′′v = L̃′′v( Ũ , Ũ†) := L̃′v( Ũ†, Ũ), (3.3c) as well as from the ‘charge symmetric’ Lagrangians L̃′′′sc = L̃′′′sc( ϕ̃, ϕ̃†) := L̃′sc + L̃′′sc L̃′sc( ϕ̃, ϕ̃†) + L̃′sc( ϕ̃†, ϕ̃) (3.4a) L̃′′′sp = L̃′′′sp( ψ̃, ψ) := L̃′sp + L̃′′sp L̃′sp( ψ̃, ψ) + L̃′sp( ψ, ψ̃) (3.4b) L̃′′′v = L̃′′′v ( Ũ , Ũ†) := L̃′v + L̃′′v L̃′v( Ũ , Ũ†) + L̃′v( Ũ†, Ũ) . 
(3.4c) It is essential to be noted, for a massless, m = 0, vector field to the Lagrangian formalism are added as subsidiary conditions the Lorenz conditions ∂µ Ũµ = 0 ∂ µ Ũ†µ = 0 (3.5) Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 6 on the solutions of the corresponding Euler-Lagrange equations. Besides, if the opposite is not stated explicitly, no other restrictions, like the (anti)commutation relations, are supposed to be imposed on the above Lagrangians. And a technical remark, for convenience, the fields ϕ̃, ψ̃ and Ũ and their charge conjugate ϕ̃†, ˜̆ψ and Ũ†, respectively, are considered as independent field variables. Let L̃′ denotes any one of the Lagrangians (3.1) and L̃′′ (resp. L̃′′′) the corresponding to it Lagrangian given via (3.3) (resp. (3.4)). Physically the difference between L̃′ and L̃′′ is that the particles for L̃′ are antiparticles for L̃′′ and vice versa. Both of the Lagrangians L̃′ and L̃′′ are not charge symmetric, i.e. the arising from them theories are not invariant under the change particle↔antiparticle (or, in mathematical terms, under some of the changes ϕ̃ ↔ ϕ̃†, ψ̃ ↔ ˜̆ψ, Ũ ↔ Ũ†) unless some additional hypotheses are made. Contrary to this, the Lagrangian L̃′′′ is charge symmetric and, consequently, the formalism on its base is invariant under the change particle↔antiparticle.4 The Euler-Lagrange equations for the Lagrangians L̃′, L̃′′ and L̃′′′ happen to coin- cide [13–15]:5 ∂ L̃′ ( ∂ L̃′ ∂(∂µχ) ≡ ∂ L̃ ( ∂ L̃′′ ∂(∂µχ) ≡ ∂ L̃ ( ∂ L̃′′′ ∂(∂µχ) = 0, (3.6) where χ = ϕ̃, ϕ̃†, ψ̃, ψ, Ũ , Ũ† for respectively scalar, spinor and vector field. Since the creation and annihilation operators are defined only on the base of Euler-La- grange equations [1, 3, 11–15], we can assert that these operators are identical for the La- grangians L̃′, L̃′′ and L̃′′′. We shall denote these operators by a±s (k) and a s (k) with the convention that a+s (k) (resp. a s (k)) creates a particle (resp. 
antiparticle) with 4-momentum $(\sqrt{m^2c^2+\mathbf{k}^2}, \mathbf{k})$, polarization $s$ (see below) and charge $(-q)$ (resp. $(+q)$),⁶ and $a_s^{\dagger-}(\mathbf{k})$ (resp. $a_s^{-}(\mathbf{k})$) annihilates/destroys such a particle (resp. antiparticle). Here and henceforth $\mathbf{k} \in \mathbb{R}^3$ is interpreted as the (anti)particle's 3-momentum, and the values of the polarization index $s$ depend on the field considered: $s = 1$ for a scalar field, $s = 1$ or $s = 1, 2$ for a massless ($m = 0$) or massive ($m \neq 0$) spinor field, respectively, and $s = 1, 2, 3$ for a vector field.⁷ Since the modes of a massless vector field with $s = 3$ may enter only in the spin and orbital angular momentum operators [15], we shall, for convenience, assume that the polarization indices $s, t, \ldots$ take the values from 1 to $2j+1-\delta_{0m}(1-\delta_{0j})$, where $j = 0, \frac{1}{2}, 1$ is the spin of the scalar, spinor and vector field, respectively, and $\delta_{0m} := 1$ for $m = 0$ and $\delta_{0m} := 0$ for $m \neq 0$;⁸ if the value $s = 3$ is important when $j = 1$ and $m = 0$, it will be commented on/considered separately. Of course, the creation and annihilation operators are different for different fields; one should write, e.g., ${}_{j}a_s^{\pm}(\mathbf{k})$ for $a_s^{\pm}(\mathbf{k})$, but we shall not use such a complicated notation and will assume the dependence on $j$ to be implicit.

⁴ Besides, under the same assumptions, the Lagrangian $\tilde L'''$ does not admit quantization via anticommutators (commutators) for an integer (half-integer) spin field, while $\tilde L'$ and $\tilde L''$ do not distinguish between integer and half-integer spin fields.

⁵ Rigorously speaking, the Euler-Lagrange equations for the Lagrangian (3.4b) are identities like 0 = 0; see [22]. However, below we shall handle this exceptional case as pointed out in [14].

⁶ For a neutral field, we put $q = 0$.

⁷ For convenience, in [14], we have set $s = 0$ if $m = 0$ and $s = 1, 2$ if $m \neq 0$ for a spinor field. For a massless vector field, one may set $s = 1, 2$, thus eliminating the 'unphysical' value $s = 3$ for $m = 0$; see [1, 11, 15].
In [13], for a scalar field, the notation $\varphi_0^{\pm}(\mathbf{k})$ and $\varphi_0^{\dagger\pm}(\mathbf{k})$ is used for $a_1^{\pm}(\mathbf{k})$ and $a_1^{\dagger\pm}(\mathbf{k})$, respectively.

⁸ In this way the case $(j, s, m) = (1, 3, 0)$ is excluded from further considerations; if $(j, m) = (1, 0)$ and $q = 0$, the case considered further in this work corresponds to an electromagnetic field in Coulomb gauge, as the modes with $s = 3$ are excluded [15]. However, if the case $(j, s, m) = (1, 3, 0)$ is important for some reason, the reader can easily obtain the corresponding results by applying the ones from [15].

The following settings will be frequently used throughout this chapter:

\[ j := \begin{cases} 0 & \text{for scalar field} \\ \frac{1}{2} & \text{for spinor field} \\ 1 & \text{for vector field} \end{cases} \qquad \tau := \begin{cases} 1 & \text{for } q = 0 \text{ (neutral (Hermitian) field)} \\ 0 & \text{for } q \neq 0 \text{ (charged (non-Hermitian) field)} \end{cases} \qquad \varepsilon := (-1)^{2j} = \begin{cases} +1 & \text{for integer } j \text{ (bose fields)} \\ -1 & \text{for half-integer } j \text{ (fermi fields)} \end{cases} \tag{3.7} \]
\[ [A,B]_{\pm} := [A,B]_{\pm 1} := A\circ B \pm B\circ A, \tag{3.8} \]

where $A$ and $B$ are operators on the system's Hilbert space $F$ of states.

The dynamical variables corresponding to $\tilde L'$, $\tilde L''$ and $\tilde L'''$ are, however, completely different, unless some additional conditions are imposed on the Lagrangian formalism [13–15].
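As a small illustration of the settings (3.7) and of the polarization range $1, \ldots, 2j+1-\delta_{0m}(1-\delta_{0j})$ introduced above, the following sketch tabulates $\varepsilon$, $\tau$ and the number of polarization values for the three kinds of fields. The function names are hypothetical and chosen only for this example; nothing here is taken from [13–15].

```python
from fractions import Fraction


def statistics_sign(j):
    """epsilon = (-1)^(2j): +1 for integer spin, -1 for half-integer spin."""
    return 1 if (2 * j) % 2 == 0 else -1


def tau(q):
    """tau = 1 for a neutral field (q = 0), 0 for a charged one (q != 0)."""
    return 1 if q == 0 else 0


def polarization_count(j, massless):
    """Upper limit 2j + 1 - delta_{0m} (1 - delta_{0j}) of the polarization index s."""
    delta_0m = 1 if massless else 0
    delta_0j = 1 if j == 0 else 0
    return int(2 * j + 1 - delta_0m * (1 - delta_0j))


if __name__ == "__main__":
    fields = [("scalar", Fraction(0)), ("spinor", Fraction(1, 2)), ("vector", Fraction(1))]
    for name, j in fields:
        print(name, "epsilon =", statistics_sign(j),
              "| s_max (m = 0) =", polarization_count(j, massless=True),
              "| s_max (m != 0) =", polarization_count(j, massless=False))
```

For a massless vector field the count is 2, which reflects the exclusion of the mode $s = 3$ described in the text.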
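In particular, the momentum operators $\tilde P^{\omega}_\mu$, charge operators $\tilde Q^{\omega}$, spin operators $\tilde S^{\omega}_{\mu\nu}$ and orbital operators $\tilde L^{\omega}_{\mu\nu}$, where $\omega = {}', {}'', {}'''$, for these Lagrangians are [13–15]:

\[ \tilde P'_\mu = \frac{1}{1+\tau} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, k_\mu\big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \bigl\{ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) + \varepsilon\, a_s^{\dagger-}(\mathbf{k})\circ a_s^{+}(\mathbf{k}) \bigr\} \tag{3.9a} \]
\[ \tilde P''_\mu = \frac{1}{1+\tau} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, k_\mu\big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \bigl\{ a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + \varepsilon\, a_s^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) \bigr\} \tag{3.9b} \]
\[ \tilde P'''_\mu = \frac{1}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, k_\mu\big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \bigl\{ [a_s^{\dagger+}(\mathbf{k}), a_s^{-}(\mathbf{k})]_{\varepsilon} + [a_s^{+}(\mathbf{k}), a_s^{\dagger-}(\mathbf{k})]_{\varepsilon} \bigr\} \tag{3.9c} \]

\[ \tilde Q' = +q \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \bigl\{ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k})\circ a_s^{+}(\mathbf{k}) \bigr\} \tag{3.10a} \]
\[ \tilde Q'' = -q \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \bigl\{ a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) \bigr\} \tag{3.10b} \]
\[ \tilde Q''' = \frac{1}{2}\, q \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \bigl\{ [a_s^{\dagger+}(\mathbf{k}), a_s^{-}(\mathbf{k})]_{\varepsilon} - [a_s^{+}(\mathbf{k}), a_s^{\dagger-}(\mathbf{k})]_{\varepsilon} \bigr\} \tag{3.10c} \]

\[ \tilde S'_{\mu\nu} = \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) + \sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) \bigr\} \tag{3.11a} \]
\[ \tilde S''_{\mu\nu} = \varepsilon\, \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ \sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) \bigr\} \tag{3.11b} \]
\[ \tilde S'''_{\mu\nu} = \frac{(-1)^{j-1/2} j\hbar}{2(1+\tau)} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger+}(\mathbf{k}), a_{s'}^{-}(\mathbf{k})]_{\varepsilon} + \sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger-}(\mathbf{k}), a_{s'}^{+}(\mathbf{k})]_{\varepsilon} \bigr\} \tag{3.11c} \]

\[ \begin{aligned} \tilde L'_{\mu\nu} ={}& x_{0\mu}\tilde P'_\nu - x_{0\nu}\tilde P'_\mu + \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) \bigr\} \\ &+ \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \end{aligned} \tag{3.12a} \]
\[ \begin{aligned} \tilde L''_{\mu\nu} ={}& x_{0\mu}\tilde P''_\nu - x_{0\nu}\tilde P''_\mu + \varepsilon\, \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) \bigr\} \\ &+ \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \end{aligned} \tag{3.12b} \]
\[ \begin{aligned} \tilde L'''_{\mu\nu} ={}& x_{0\mu}\tilde P'''_\nu - x_{0\nu}\tilde P'''_\mu + \frac{(-1)^{j-1/2} j\hbar}{2(1+\tau)} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ l^{ss',-}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger+}(\mathbf{k}), a_{s'}^{-}(\mathbf{k})]_{\varepsilon} + l^{ss',+}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger-}(\mathbf{k}), a_{s'}^{+}(\mathbf{k})]_{\varepsilon} \bigr\} \\ &+ \frac{i\hbar}{4(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger+}(\mathbf{k}) \\ &\qquad + a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \end{aligned} \tag{3.12c} \]

Here we have used the following notation: $(-1)^{n+1/2} := (-1)^n i$ for all $n \in \mathbb{N}$ and $i := +\sqrt{-1}$,

\[ A(\mathbf{k}) \overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} \circ B(\mathbf{k}) := -\Bigl(k_\mu\frac{\partial A(\mathbf{k})}{\partial k^\nu}\Bigr)\circ B(\mathbf{k}) + A(\mathbf{k})\circ\Bigl(k_\mu\frac{\partial B(\mathbf{k})}{\partial k^\nu}\Bigr) = k_\mu\Bigl(A(\mathbf{k}) \overleftrightarrow{\tfrac{\partial}{\partial k^\nu}} \circ B(\mathbf{k})\Bigr) \tag{3.13} \]

for operators $A(\mathbf{k})$ and $B(\mathbf{k})$ having $C^1$ dependence on $\mathbf{k}$,⁹ and $\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k})$ and $l^{ss',\pm}_{\mu\nu}(\mathbf{k})$ are some functions of $\mathbf{k}$ such that¹⁰

\[ \begin{aligned} &\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k}) = -\sigma^{ss',\pm}_{\nu\mu}(\mathbf{k}), \qquad l^{ss',\pm}_{\mu\nu}(\mathbf{k}) = -l^{ss',\pm}_{\nu\mu}(\mathbf{k}) \\ &\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k}) = l^{ss',\pm}_{\mu\nu}(\mathbf{k}) = 0 && \text{for } j = 0 \text{ (scalar field)} \\ &\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) = -\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) =: \sigma^{ss'}_{\mu\nu}(\mathbf{k}) = -\sigma^{s's}_{\mu\nu}(\mathbf{k}) = -\sigma^{ss'}_{\nu\mu}(\mathbf{k}) && \text{for } j = 1 \text{ (vector field)} \\ &l^{ss',-}_{\mu\nu}(\mathbf{k}) = -l^{ss',+}_{\mu\nu}(\mathbf{k}) =: l^{ss'}_{\mu\nu}(\mathbf{k}) = -l^{s's}_{\mu\nu}(\mathbf{k}) = -l^{ss'}_{\nu\mu}(\mathbf{k}) && \text{for } j = 1 \text{ (vector field)}. \end{aligned} \tag{3.14} \]

⁹ More generally, if $\omega : \{F \to F\} \to \{F \to F\}$ is a mapping on the operator space over the system's Hilbert space, we put $A \overleftrightarrow{\omega} \circ B := -\omega(A)\circ B + A\circ\omega(B)$ for any $A, B : F \to F$. Usually [2, 12], this notation is used for $\omega = \partial_\mu$.

A technical remark must be made at this point. The equations (3.9)–(3.12) were derived in [13–15] under some additional conditions, represented by equations (2.6) and (2.7), which are considered below in Sect. 5 and ensure the effectiveness of the momentum picture of motion [21] used in [13–15].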
However, as is partially proved, e.g., in [1], when the quantities (3.9)–(3.12) are expressed via the Heisenberg creation and annihilation operators (see (2.9)), they remain valid, up to a phase factor, without the mentioned assumptions, i.e. these assumptions are unnecessary when one works entirely in the Heisenberg picture. For this reason, we shall consider (3.9)–(3.12) as a pure consequence of the Lagrangian formalism.

We should emphasize that in (3.11) and (3.12) the symbols $\tilde S^{\omega}_{\mu\nu}$ and $\tilde L^{\omega}_{\mu\nu}$, $\omega = {}', {}'', {}'''$, denote the spin and orbital operators, respectively, for $\tilde L^{\omega}$, which are the spacetime-independent parts of the spin and orbital angular momentum operators, respectively [14, 23]; if the latter operators are denoted by $\tilde{\mathcal S}^{\omega}_{\mu\nu}$ and $\tilde{\mathcal L}^{\omega}_{\mu\nu}$, the total angular momentum operator of a system with Lagrangian $\tilde L^{\omega}$ is [23]

\[ \tilde M^{\omega}_{\mu\nu} = \tilde{\mathcal L}^{\omega}_{\mu\nu} + \tilde{\mathcal S}^{\omega}_{\mu\nu} = \tilde L^{\omega}_{\mu\nu} + \tilde S^{\omega}_{\mu\nu}, \qquad \omega = {}', {}'', {}''', \tag{3.15} \]

and $\tilde{\mathcal S}^{\omega}_{\mu\nu} = \tilde S^{\omega}_{\mu\nu}$ (and hence $\tilde{\mathcal L}^{\omega}_{\mu\nu} = \tilde L^{\omega}_{\mu\nu}$) iff $\tilde{\mathcal S}^{\omega}_{\mu\nu}$ is a conserved operator or, equivalently, iff the system's canonical energy-momentum tensor is symmetric.¹¹

Looking ahead (see Sect. 6), we would like to note that the expressions (3.9c) and, consequently, the Lagrangian $\tilde L'''$ are the basis from which the paracommutation relations were first derived [16].

A last remark: above we have expressed the dynamical variables in the Heisenberg picture via the creation and annihilation operators in the momentum picture. If one works entirely in the Heisenberg picture, the operators (2.9), representing the creation and annihilation operators in the Heisenberg picture, should be used. Besides, by virtue of the equations

\[ (a_s^{\pm}(\mathbf{k}))^\dagger = a_s^{\dagger\mp}(\mathbf{k}), \qquad (a_s^{\dagger\pm}(\mathbf{k}))^\dagger = a_s^{\mp}(\mathbf{k}) \tag{3.16} \]
\[ (\tilde a_s^{\pm}(\mathbf{k}))^\dagger = \tilde a_s^{\dagger\mp}(\mathbf{k}), \qquad (\tilde a_s^{\dagger\pm}(\mathbf{k}))^\dagger = \tilde a_s^{\mp}(\mathbf{k}), \tag{3.17} \]

some of the relations concerning $a_s^{\dagger\pm}(\mathbf{k})$, e.g. the Euler-Lagrange and Heisenberg equations, are consequences of the similar ones regarding $a_s^{\pm}(\mathbf{k})$.
In view of (2.9), we shall consider (3.9)–(3.12) as obtained from the corresponding expressions in the Heisenberg picture by making the replacements $\tilde a_s^{\pm}(\mathbf{k}) \mapsto a_s^{\pm}(\mathbf{k})$ and $\tilde a_s^{\dagger\pm}(\mathbf{k}) \mapsto a_s^{\dagger\pm}(\mathbf{k})$. So (3.9)–(3.12) will have, up to a phase factor, the meaning of dynamical variables in the Heisenberg picture expressed via the creation/annihilation operators in the momentum picture.

¹⁰ For the explicit form of these functions, see [13–15]; see also equation (6.57) below.

¹¹ In [14, 23] the spin and orbital operators are labeled with an additional left superscript ◦, which, for brevity, is omitted in the present work as only these operators, not the spin and orbital angular momentum operators $\tilde{\mathcal S}^{\omega}_{\mu\nu}$ and $\tilde{\mathcal L}^{\omega}_{\mu\nu}$, will be considered in it. Notice that the operators $\tilde{\mathcal S}^{\omega}_{\mu\nu}$ and $\tilde{\mathcal L}^{\omega}_{\mu\nu}$ are, generally, time-dependent while the orbital and spin ones are conserved, as a result of which the total angular momentum is a conserved operator too [14, 23].

4. On the uniqueness of the dynamical variables

Let $D = P_\mu, Q, S_{\mu\nu}, L_{\mu\nu}$ denote some dynamical variable, viz. the momentum, charge, spin, or orbital operator, of a system with Lagrangian $L$. Since the Euler-Lagrange equations for the Lagrangians $L'$, $L''$ and $L'''$ coincide (see (3.6)), we can assert that any field satisfying these equations admits at least three classes of conserved operators, viz. $D'$, $D''$ and $D''' = \frac{1}{2}(D' + D'')$. Moreover, it can be proved that the Euler-Lagrange equations for the Lagrangian

\[ L_{\alpha,\beta} := \alpha L' + \beta L'', \qquad \alpha + \beta \neq 0, \tag{4.1} \]

do not depend on $\alpha, \beta \in \mathbb{C}$ and coincide with (3.6). Therefore there exists a two-parameter family of conserved dynamical variables for these equations, given via

\[ D_{\alpha,\beta} := \alpha D' + \beta D'', \qquad \alpha + \beta \neq 0. \tag{4.2} \]

Evidently $L''' = L_{\frac{1}{2},\frac{1}{2}}$ and $D''' = D_{\frac{1}{2},\frac{1}{2}}$.
Since the Euler-Lagrange equations (3.6) are linear and homogeneous (in the cases considered), we can, without loss of generality, restrict the parameters $\alpha, \beta \in \mathbb{C}$ to those for which

\[ \alpha + \beta = 1, \tag{4.3} \]

which can be achieved by an appropriate renormalization (by a factor $(\alpha+\beta)^{-1/2}$) of the field operators. Thus any field satisfying the Euler-Lagrange equations (3.6) admits the family $D_{\alpha,\beta}$, $\alpha + \beta = 1$, of conserved operators. Obviously, this conclusion remains valid if in (4.1) we replace the particular Lagrangians $L'$ and $L''$ (see (3.1) and (3.3)) with any two Lagrangians (of one and the same field variables) which lead to identical Euler-Lagrange equations. However, the essential point in our case is that $L'$ and $L''$ do not differ only by a full divergence, as a result of which the operators $D_{\alpha,\beta}$ are different for different pairs $(\alpha, \beta)$, $\alpha + \beta = 1$.¹²

Since one expects a physical system to possess uniquely defined dynamical characteristics, e.g. energy and total angular momentum, and the Euler-Lagrange equations are considered (in the framework of the Lagrangian formalism) as the ones governing the spacetime evolution of the system considered, the problem arises of when the dynamical operators $D_{\alpha,\beta}$, $\alpha + \beta = 1$, are independent of the particular choice of $\alpha$ and $\beta$, i.e. of the initial Lagrangian one starts from. Writing $D_{\alpha,\beta} = D'' + \alpha(D' - D'')$ for $\beta = 1 - \alpha$, a simple calculation shows that the operators (4.2), under the condition (4.3), are independent of the particular values of the parameters $\alpha$ and $\beta$ if and only if

\[ D' = D''. \tag{4.4} \]

Some consequences of the condition(s) (4.4) will be considered below, as well as possible ways of satisfying these restrictions on the Lagrangian formalism.
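Combining (3.9)–(3.12) with (4.4), for $D = P_\mu, Q, S_{\mu\nu}, L_{\mu\nu}$ respectively, we see that a free scalar, spinor or vector field has uniquely defined dynamical variables if and only if the following equations are fulfilled:

\[ \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, k_\mu\big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \bigl\{ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) - a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + \varepsilon\, a_s^{\dagger-}(\mathbf{k})\circ a_s^{+}(\mathbf{k}) \bigr\} = 0 \tag{4.5} \]
\[ q \times \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \bigl\{ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) + a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k})\circ a_s^{+}(\mathbf{k}) \bigr\} = 0 \tag{4.6} \]
\[ \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) - \varepsilon\,\sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) - \varepsilon\,\sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + \sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) \bigr\} = 0 \tag{4.7} \]
\[ \begin{aligned} &(-1)^{j-1/2} j \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) - \varepsilon\, l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) - \varepsilon\, l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) \bigr\} \\ &+ \frac{i}{2} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) + \varepsilon\, a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger+}(\mathbf{k}) \\ &\qquad - a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} = 0. \end{aligned} \tag{4.8} \]

In (4.6) the constant factor $q$ is retained since in the neutral case it is equal to zero and, consequently, the equation (4.6) reduces to an identity. In (4.8) the terms proportional to $x_{0\mu}$ are omitted, as they vanish by virtue of (4.5).

Since the Euler-Lagrange equations do not impose any restrictions on the creation and annihilation operators, the equations (4.5)–(4.8) can be regarded as subsidiary conditions on the Lagrangian formalism and can serve as equations for the (partial) determination of the creation and annihilation operators. The system of integral equations (4.5)–(4.8) is quite complicated and we are not going to investigate it in the general case. Below we shall restrict ourselves to the analysis of only those solutions of (4.5)–(4.8), if any, for which the integrands in (4.5)–(4.8) vanish.

¹² Note that no commutativity or commutation relations between the field operators and their charge (or Hermitian) conjugates are presupposed, i.e., at the moment, we work in a theory without such relations and without normal ordering.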
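This means that we shall replace the system of integral equations (4.5)–(4.8) for the creation and annihilation operators with the following system of algebraic equations (do not sum over $s$ and $s'$ in (4.12) and (4.13)!):

\[ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) - a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + \varepsilon\, a_s^{\dagger-}(\mathbf{k})\circ a_s^{+}(\mathbf{k}) = 0 \tag{4.9} \]
\[ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) + a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k})\circ a_s^{+}(\mathbf{k}) = 0 \quad \text{if } q \neq 0 \tag{4.10} \]
\[ \begin{aligned} \Bigl\{ & a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) + \varepsilon\, a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger+}(\mathbf{k}) \\ & - a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} = 0 \end{aligned} \tag{4.11} \]
\[ \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) - \varepsilon\,\sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) - \varepsilon\,\sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + \sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) = 0 \tag{4.12} \]
\[ l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) - \varepsilon\, l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) - \varepsilon\, l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) = 0 \tag{4.13} \]

Here $s = 1, \ldots, 2j+1-\delta_{0m}(1-\delta_{0j})$ in (4.9)–(4.11) and $s, s' = 1, \ldots, 2j+1-\delta_{0m}(1-\delta_{1j})$ in (4.12) and (4.13). (Notice that, by virtue of (3.14), the equations (4.12) and (4.13) are identically valid for $j = 0$, i.e. for scalar fields.) Since all polarization indices enter (4.5) and (4.6) on an equal footing, we do not sum over $s$ in (4.9)–(4.11). But in (4.12) and (4.13) we have retained the summation sign, as the modes with definite polarization cannot be singled out in the general case. One may obtain weaker versions of (4.9)–(4.13) by summing in them over the polarization indices, but we shall not consider these conditions below, regardless of the fact that they also ensure uniqueness of the dynamical variables.

First, consider the equations (4.9)–(4.11). Since for a neutral field, $q = 0$, we have $a_s^{\dagger\pm}(\mathbf{k}) = a_s^{\pm}(\mathbf{k})$, which physically means that the field's particles and antiparticles coincide, the equations (4.9)–(4.11) hold identically in this case. Let us now consider the case $q \neq 0$, i.e. let the field investigated be a charged one.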
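Using the standard notation (cf. (3.8))

\[ [A,B]_{\eta} := A\circ B + \eta\, B\circ A, \tag{4.14} \]

for operators $A$ and $B$ and $\eta \in \mathbb{C}$, we rewrite (4.9) and (4.10) as

\[ [a_s^{\dagger+}(\mathbf{k}), a_s^{-}(\mathbf{k})]_{-\varepsilon} - [a_s^{+}(\mathbf{k}), a_s^{\dagger-}(\mathbf{k})]_{-\varepsilon} = 0 \tag{4.9′} \]
\[ [a_s^{\dagger+}(\mathbf{k}), a_s^{-}(\mathbf{k})]_{-\varepsilon} + [a_s^{+}(\mathbf{k}), a_s^{\dagger-}(\mathbf{k})]_{-\varepsilon} = 0 \quad \text{if } q \neq 0, \tag{4.10′} \]

which (taking the sum and the difference of the two equations and using $\varepsilon^2 = 1$) are equivalent to

\[ [a_s^{\dagger\pm}(\mathbf{k}), a_s^{\mp}(\mathbf{k})]_{-\varepsilon} = 0 \quad \text{if } q \neq 0. \tag{4.15} \]

Differentiating (4.15) and inserting the result into (4.11), one can verify that (4.11) is tantamount to

\[ \Bigl\{ \Bigl[ a_s^{\dagger+}(\mathbf{k}),\ \Bigl(k_\mu\frac{\partial}{\partial k^\nu} - k_\nu\frac{\partial}{\partial k^\mu}\Bigr) \circ a_s^{-}(\mathbf{k}) \Bigr]_{-\varepsilon} - \Bigl[ a_s^{+}(\mathbf{k}),\ \Bigl(k_\mu\frac{\partial}{\partial k^\nu} - k_\nu\frac{\partial}{\partial k^\mu}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) \Bigr]_{-\varepsilon} \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} = 0 \quad \text{if } q \neq 0. \tag{4.16} \]

Consider now (4.12) and (4.13). By means of the shorthand (4.14), they read

\[ \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger+}(\mathbf{k}), a_{s'}^{-}(\mathbf{k})]_{-\varepsilon} + \sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger-}(\mathbf{k}), a_{s'}^{+}(\mathbf{k})]_{-\varepsilon} = 0 \tag{4.17} \]
\[ l^{ss',-}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger+}(\mathbf{k}), a_{s'}^{-}(\mathbf{k})]_{-\varepsilon} + l^{ss',+}_{\mu\nu}(\mathbf{k})\, [a_s^{\dagger-}(\mathbf{k}), a_{s'}^{+}(\mathbf{k})]_{-\varepsilon} = 0. \tag{4.18} \]

For a scalar field, $j = 0$, these conditions hold identically, due to (3.14). But for $j \neq 0$ they impose new restrictions on the formalism. In particular, for vector fields, $j = 1$ and $\varepsilon = +1$, they are satisfied iff (see (3.14))

\[ [a_s^{\dagger+}(\mathbf{k}), a_{s'}^{-}(\mathbf{k})]_{-\varepsilon} - [a_s^{\dagger-}(\mathbf{k}), a_{s'}^{+}(\mathbf{k})]_{-\varepsilon} - [a_{s'}^{\dagger+}(\mathbf{k}), a_s^{-}(\mathbf{k})]_{-\varepsilon} + [a_{s'}^{\dagger-}(\mathbf{k}), a_s^{+}(\mathbf{k})]_{-\varepsilon} = 0. \tag{4.19} \]

One can satisfy (4.17) and (4.18) if the following generalization of (4.15) holds:

\[ [a_s^{\dagger\pm}(\mathbf{k}), a_{s'}^{\mp}(\mathbf{k})]_{-\varepsilon} = 0. \tag{4.20} \]

For spin $j = \frac{1}{2}$ (and hence $\varepsilon = -1$, see (3.7)), the conditions (4.12) and (4.13) cannot be simplified much, but, if one requires the vanishing of the operator coefficients of $\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k})$ and $l^{ss',\pm}_{\mu\nu}(\mathbf{k})$, one gets

\[ a_s^{\dagger\pm}(\mathbf{k})\circ a_{s'}^{\mp}(\mathbf{k}) = 0, \qquad j = \tfrac{1}{2},\ \varepsilon = -1. \tag{4.21} \]

Excluding some special cases, e.g. a neutral scalar field ($q = 0$ and $j = 0$), the equations (4.15) and (4.21) are unacceptable from many viewpoints. The main one is that they are incompatible with the ordinary (anti)commutation relations (see, e.g., [1, 11, 12, 18] or Sect.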
6, in particular, equations (6.13) below); for example, (4.21) means that the acts of creation and annihilation of (anti)particles with identical characteristics should be mutually independent, which contradicts the existing theory and experimental data.

Now we shall try another way of achieving uniqueness of the dynamical variables for free fields. Since (anti)commutators between creation and annihilation operators appear naturally in (4.9)–(4.13), and these (anti)commutators vanish under the standard normal ordering [1, 11, 12, 18], one may suppose that the normally ordered expressions of the dynamical variables coincide. Let us analyze this method.

Recall [1, 3, 11, 12] that the normal ordering operator $N$ (for free field theory) is a linear operator on the operator space of the system considered such that to a product (composition) $c_1 \circ \cdots \circ c_n$ of $n \in \mathbb{N}$ creation and/or annihilation operators $c_1, \ldots, c_n$ it assigns the operator $(-1)^f c_{\alpha_1} \circ \cdots \circ c_{\alpha_n}$. Here $(\alpha_1, \ldots, \alpha_n)$ is a permutation of $(1, \ldots, n)$ such that all creation operators stand to the left of all annihilation ones, the relative order among the creation operators and among the annihilation operators is preserved, and $f$ is equal to the number of transpositions among the fermion operators ($j = \frac{1}{2}$) needed to bring $c_1 \circ \cdots \circ c_n$ into the just-described order ("normal order") $c_{\alpha_1} \circ \cdots \circ c_{\alpha_n}$.¹³ In particular, this means that

\[ \begin{aligned} N\bigl(a_s^{+}(\mathbf{k})\circ a_t^{\dagger-}(\mathbf{p})\bigr) &= a_s^{+}(\mathbf{k})\circ a_t^{\dagger-}(\mathbf{p}), & N\bigl(a_s^{\dagger+}(\mathbf{k})\circ a_t^{-}(\mathbf{p})\bigr) &= a_s^{\dagger+}(\mathbf{k})\circ a_t^{-}(\mathbf{p}), \\ N\bigl(a_s^{-}(\mathbf{k})\circ a_t^{\dagger+}(\mathbf{p})\bigr) &= \varepsilon\, a_t^{\dagger+}(\mathbf{p})\circ a_s^{-}(\mathbf{k}), & N\bigl(a_s^{\dagger-}(\mathbf{k})\circ a_t^{+}(\mathbf{p})\bigr) &= \varepsilon\, a_t^{+}(\mathbf{p})\circ a_s^{\dagger-}(\mathbf{k}) \end{aligned} \tag{4.22} \]

and, consequently, we have

\[ N\bigl([a_s^{\dagger\pm}(\mathbf{k}), a_t^{\mp}(\mathbf{p})]_{-\varepsilon}\bigr) = 0, \qquad N\bigl([a_s^{\pm}(\mathbf{k}), a_t^{\dagger\mp}(\mathbf{p})]_{-\varepsilon}\bigr) = 0, \tag{4.23} \]

due to $\varepsilon := (-1)^{2j} = \pm 1$ (see (3.7)). (In fact, below only the equalities (4.22) and (4.23), not the general definition of a normal product, will be applied.)
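The rules (4.22) and their consequence (4.23) can be checked mechanically on formal words of creation and annihilation operators. The following sketch is a toy model under stated assumptions, not the operator-theoretic definition: each operator is encoded as a hypothetical tuple `(symbol, superscript, label)`, where the superscript `"+"` marks a creation operator and `"-"` an annihilation operator, and every transposition of two neighbouring operators contributes a factor $\varepsilon$ (so that, for a single fermi field, the accumulated sign is $(-1)^f$).

```python
def normal_order(word, eps):
    """Return (sign, ordered_word): creation operators moved left, relative order kept."""
    ops = list(word)
    sign = 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(ops) - 1):
            # An annihilation operator directly to the left of a creation operator: swap them.
            if ops[i][1] == "-" and ops[i + 1][1] == "+":
                ops[i], ops[i + 1] = ops[i + 1], ops[i]
                sign *= eps
                swapped = True
    return sign, tuple(ops)


def normal_ordered_bracket(a, b, eps):
    """Coefficients of N(a∘b - eps*b∘a), i.e. of N applied to [a, b]_{-eps}."""
    total = {}
    for coeff, word in ((1, (a, b)), (-eps, (b, a))):
        sign, ordered = normal_order(word, eps)
        total[ordered] = total.get(ordered, 0) + coeff * sign
    return {w: c for w, c in total.items() if c != 0}


if __name__ == "__main__":
    a_dag_plus = ("a†", "+", "s")    # a_s^{†+}: creation
    a_minus = ("a", "-", "t")        # a_t^{-}: annihilation
    a_dag_minus = ("a†", "-", "s")   # a_s^{†-}: annihilation
    a_plus = ("a", "+", "t")         # a_t^{+}: creation
    for eps in (+1, -1):
        print("eps =", eps,
              normal_order((a_minus, a_dag_plus), eps),
              normal_ordered_bracket(a_dag_plus, a_minus, eps),
              normal_ordered_bracket(a_dag_minus, a_plus, eps))
```

An empty dictionary returned by `normal_ordered_bracket` means that all terms cancel after normal ordering, which is the content of (4.23) in this toy encoding.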
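Applying the normal ordering operator to (4.9′), (4.10′), (4.17) and (4.18), we, in view of (4.23), get the identity 0 = 0, which means that the conditions (4.9), (4.10), (4.12) and (4.13) are identically satisfied after normal ordering. This is confirmed by the application of $N$ to (3.9) and (3.10), which results respectively in (see (4.22))

\[ N(\tilde P'_\mu) = N(\tilde P''_\mu) = \frac{1}{1+\tau} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, k_\mu\big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \bigl\{ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) + a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) \bigr\} \tag{4.24} \]
\[ N(\tilde Q') = N(\tilde Q'') = \frac{q}{1+\tau} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \bigl\{ a_s^{\dagger+}(\mathbf{k})\circ a_s^{-}(\mathbf{k}) - a_s^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) \bigr\}. \tag{4.25} \]

(Since $\tau = 0$ for $q \neq 0$, the factor $q/(1+\tau)$ in (4.25) coincides with $q$.) Therefore the normal ordering ensures the uniqueness of the momentum and charge operators, if we redefine them respectively as

\[ \tilde P_\mu := N(\tilde P'_\mu), \qquad \tilde Q := N(\tilde Q'). \tag{4.26} \]

Putting $\omega_{\mu\nu} := k_\mu\frac{\partial}{\partial k^\nu} - k_\nu\frac{\partial}{\partial k^\mu}$ and using (4.22), one can verify that

\[ \begin{aligned} N\bigl(a_s^{+}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{\dagger-}(\mathbf{k})\bigr) &= a_s^{+}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{\dagger-}(\mathbf{k}), & N\bigl(a_s^{\dagger+}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{-}(\mathbf{k})\bigr) &= a_s^{\dagger+}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{-}(\mathbf{k}), \\ N\bigl(a_s^{-}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{\dagger+}(\mathbf{k})\bigr) &= -\varepsilon\, a_s^{\dagger+}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{-}(\mathbf{k}), & N\bigl(a_s^{\dagger-}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{+}(\mathbf{k})\bigr) &= -\varepsilon\, a_s^{+}(\mathbf{k}) \overleftrightarrow{\omega_{\mu\nu}} \circ a_s^{\dagger-}(\mathbf{k}). \end{aligned} \tag{4.27} \]

As a consequence of these equalities, the action of $N$ on the l.h.s. of (4.11) vanishes.

¹³ We have slightly modified the definition given in [1, 3, 11, 12] because no (anti)commutation relations have been introduced in our exposition up to this point. In this paper we are not concerned with the problem of eliminating the 'unphysical' operators $a_3^{\pm}(\mathbf{k})$ and $a_3^{\dagger\pm}(\mathbf{k})$ from the spin and orbital momentum operators when $j = 1$; for details, see [15], where it is proved that, for an electromagnetic field, $j = 1$ and $q = 0$, one way to achieve this is by adding to the number $f$ above the number of transpositions between $a_s^{\pm}(\mathbf{k})$, $s = 1, 2$, and $a_3^{\dagger\pm}(\mathbf{k})$ needed for getting normal order.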
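Combining this result with the mentioned fact that the normal ordering converts (4.12) and (4.13) into identities, we see that the normal ordering procedure also ensures uniqueness of the spin and orbital operators if we redefine them respectively as:

\[ \tilde S_{\mu\nu} := N(\tilde S'_{\mu\nu}) = N(\tilde S''_{\mu\nu}) = \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ \sigma^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) + \varepsilon\,\sigma^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) \bigr\} \tag{4.28} \]
\[ \begin{aligned} \tilde L_{\mu\nu} := N(\tilde L'_{\mu\nu}) = N(\tilde L''_{\mu\nu}) ={}& x_{0\mu}\tilde P_\nu - x_{0\nu}\tilde P_\mu + \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ l^{ss',-}_{\mu\nu}(\mathbf{k})\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) + \varepsilon\, l^{ss',+}_{\mu\nu}(\mathbf{k})\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) \bigr\} \\ &+ \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) + a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}}, \end{aligned} \tag{4.29} \]

where (3.14) was applied.

5. Heisenberg relations

The conserved operators, like the momentum and charge operators, are often identified with the generators of the corresponding transformations under which the action operator is invariant [1, 3, 11, 12]. This leads to a number of commutation relations between the components of these operators and between them and the field operators. The relations of the latter set are known as the Heisenberg relations or equations. Both kinds of commutation relations are of purely geometric origin and, consequently, are completely external to the Lagrangian formalism; one of the reasons is that the mentioned identification is, in general, unacceptable and may be carried out only on some subset of the system's Hilbert space of states [23, 24]. Therefore their validity in a pure Lagrangian theory is questionable and should be verified [11].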
However, the considered relations are weaker conditions than the identification of the corresponding operators, and there is strong evidence that these relations should be valid in a realistic quantum field theory [1, 11]; e.g., the commutativity between the momentum and charge operators (see (5.18) below) expresses the experimental fact that the 4-momentum and charge of any system are simultaneously measurable quantities.

It is known [1, 11] that, in a pure Lagrangian approach, the field equations, which are usually identified with the Euler-Lagrange ones,¹⁴ are the only restrictions on the field operators. Besides, these equations do not determine the field operators uniquely, and the latter can be expressed through the creation and annihilation operators. Since the last operators are left completely arbitrary by a pure Lagrangian formalism, one is free to impose on them any system of compatible restrictions. The best known examples of this kind are the famous canonical (anti)commutation relations and their generalization, the so-called paracommutation relations [16, 18]. In general, the problem of the compatibility of such a system of restrictions, subsidiary to the Lagrangian formalism, with, for instance, the Heisenberg relations is open and requires particular investigation [11]. For example, even the canonical (anti)commutation relations for the electromagnetic field in Coulomb gauge are incompatible with the Heisenberg equation involving the (total) angular momentum operator unless the gauge symmetry of this field is taken into account [11, § 84]. However, the (para)commutation relations are, by construction, compatible with the Heisenberg relations regarding the momentum operator (see [16] or Subsect. 6.1 below).

The ordinary approach is to impose a system of equations on the creation and annihilation operators and then to check its compatibility with, e.g., the Heisenberg relations.
In the next sections we shall investigate the opposite situation: assuming the validity of (some of) the Heisenberg equations, the possible restrictions on the creation and annihilation operators will be explored. For this purpose, below we briefly review the Heisenberg relations and other ones related to them.

Consider a system of quantum fields $\tilde\varphi_i(x)$, $i = 1, \ldots, N \in \mathbb{N}$, where $\tilde\varphi_i(x)$ denote the components of all fields (and their Hermitian conjugates), and let $\tilde P_\mu$, $\tilde Q$ and $\tilde M_{\mu\nu}$ be its momentum, charge and (total) angular momentum operators, respectively. The Heisenberg relations/equations for these operators are [1, 3, 11, 12]

\[ [\tilde\varphi_i(x), \tilde P_\mu]_{-} = i\hbar\,\frac{\partial\tilde\varphi_i(x)}{\partial x^\mu} \tag{5.1} \]
\[ [\tilde\varphi_i(x), \tilde Q]_{-} = e(\tilde\varphi_i)\, q\, \tilde\varphi_i(x) \tag{5.2} \]
\[ [\tilde\varphi_i(x), \tilde M_{\mu\nu}]_{-} = i\hbar\{x_\mu\partial_\nu\tilde\varphi_i(x) - x_\nu\partial_\mu\tilde\varphi_i(x)\} + i\hbar\sum_{i'} I^{i'}_{i\mu\nu}\,\tilde\varphi_{i'}(x). \tag{5.3} \]

Here $q = \mathrm{const}$ is the fields' charge, $e(\tilde\varphi_i) = 0$ if $\tilde\varphi_i^\dagger = \tilde\varphi_i$, $e(\tilde\varphi_i) = \pm 1$ if $\tilde\varphi_i^\dagger \neq \tilde\varphi_i$ with $e(\tilde\varphi_i) + e(\tilde\varphi_i^\dagger) = 0$, and the constants $I^{i'}_{i\mu\nu} = -I^{i'}_{i\nu\mu}$ characterize the transformation properties of the field operators under 4-rotations. (If $e(\tilde\varphi_i) \neq 0$, it is a matter of convention whether to put $e(\tilde\varphi_i) = +1$ or $e(\tilde\varphi_i) = -1$ for a fixed $i$.)

We would like to make some comments on (5.3). Since its r.h.s. is a sum of two operators, the first (second) characterizing the pure orbital (spin) angular momentum properties of the system considered, the idea arises to split (5.3) into two independent equations, one involving the orbital angular momentum operator and another concerning the spin angular momentum operator. This is supported by the observation that, it seems, no process is known for transforming orbital angular momentum into spin angular momentum and vice versa (without destroying the system).

¹⁴ Recall that there are Lagrangians whose classical Euler-Lagrange equations are identities. However, their correct and rigorous treatment [22] reveals that they entail field equations which are mathematically correct and physically sensible.
So one may suppose the existence of operators $\tilde M^{\mathrm{or}}_{\mu\nu}$ and $\tilde M^{\mathrm{sp}}_{\mu\nu}$ such that

\[ [\tilde\varphi_i(x), \tilde M^{\mathrm{or}}_{\mu\nu}]_{-} = i\hbar\{x_\mu\partial_\nu\tilde\varphi_i(x) - x_\nu\partial_\mu\tilde\varphi_i(x)\} \tag{5.4} \]
\[ [\tilde\varphi_i(x), \tilde M^{\mathrm{sp}}_{\mu\nu}]_{-} = i\hbar\sum_{i'} I^{i'}_{i\mu\nu}\,\tilde\varphi_{i'}(x) \tag{5.5} \]
\[ \tilde M_{\mu\nu} = \tilde M^{\mathrm{or}}_{\mu\nu} + \tilde M^{\mathrm{sp}}_{\mu\nu}. \tag{5.6} \]

However, as particular calculations demonstrate [5, 14, 15], neither the spin (resp. orbital) operator nor the spin (resp. orbital) angular momentum operator is a suitable candidate for $\tilde M^{\mathrm{sp}}_{\mu\nu}$ (resp. $\tilde M^{\mathrm{or}}_{\mu\nu}$). If we assume the validity of (5.1), then equations (5.4) and (5.5) can be satisfied if we choose

\[ \tilde M^{\mathrm{or}}_{\mu\nu}(x) = \tilde L^{\mathrm{ext}}_{\mu\nu} := x_\mu\tilde P_\nu - x_\nu\tilde P_\mu \tag{5.7} \]
\[ \tilde M^{\mathrm{sp}}_{\mu\nu}(x) = \tilde M^{(0)}_{\mu\nu}(x) := \tilde M_{\mu\nu} - \tilde L^{\mathrm{ext}}_{\mu\nu} = \tilde S_{\mu\nu} + \tilde L_{\mu\nu} - \{x_\mu\tilde P_\nu - x_\nu\tilde P_\mu\} \tag{5.8} \]

with $\tilde M_{\mu\nu}$ satisfying (5.3). These operators are not conserved. Such a representation is in agreement with the equations (3.12), according to which the operator (5.7) enters additively into the expressions for the orbital operator.¹⁵ The physical meaning of the operator (5.7) is that it represents the orbital angular momentum of the system due to its movement as a whole. Correspondingly, the operator (5.8) describes the system's angular momentum as a result of its internal movement and/or structure.

Since the spin (orbital) angular momentum is associated with the structure (movement) of a system, the spin and orbital angular momenta are mixed in the operator (5.8). These quantities can be separated completely via the following representations of the operators $M^{\mathrm{or}}_{\mu\nu}$ and $M^{\mathrm{sp}}_{\mu\nu}$ in the momentum picture (when (5.1) holds):

\[ M^{\mathrm{or}}_{\mu\nu} = x_\mu P_\nu - x_\nu P_\mu + L^{\mathrm{int}}_{\mu\nu} \tag{5.9} \]
\[ M^{\mathrm{sp}}_{\mu\nu} = M_{\mu\nu} - (x_\mu P_\nu - x_\nu P_\mu) - L^{\mathrm{int}}_{\mu\nu}, \tag{5.10} \]

where $L^{\mathrm{int}}_{\mu\nu}$ describes the 'internal' orbital angular momentum of the system considered and depends on the Lagrangian we have started from. Generally speaking, $L^{\mathrm{int}}_{\mu\nu}$ is the part of the orbital angular momentum operator containing derivatives of the creation and annihilation operators. In particular, for the Lagrangians $L'$, $L''$ and $L'''$ (see Sect.
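3), the explicit forms of the operators (5.9) and (5.10) respectively are:

\[ M'^{\,\mathrm{or}}_{\mu\nu} = x_\mu P'_\nu - x_\nu P'_\mu + \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \tag{5.11a} \]
\[ M''^{\,\mathrm{or}}_{\mu\nu} = x_\mu P''_\nu - x_\nu P''_\mu + \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \tag{5.11b} \]
\[ \begin{aligned} M'''^{\,\mathrm{or}}_{\mu\nu} ={}& x_\mu P'''_\nu - x_\nu P'''_\mu + \frac{i\hbar}{4(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{-}(\mathbf{k}) - \varepsilon\, a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger+}(\mathbf{k}) \\ &\qquad + a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \end{aligned} \tag{5.11c} \]

\[ M'^{\,\mathrm{sp}}_{\mu\nu} = \frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ (\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k}))\, a_s^{\dagger+}(\mathbf{k})\circ a_{s'}^{-}(\mathbf{k}) + (\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k}))\, a_s^{\dagger-}(\mathbf{k})\circ a_{s'}^{+}(\mathbf{k}) \bigr\} \tag{5.12a} \]
\[ M''^{\,\mathrm{sp}}_{\mu\nu} = \varepsilon\,\frac{(-1)^{j-1/2} j\hbar}{1+\tau} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ (\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k}))\, a_{s'}^{+}(\mathbf{k})\circ a_s^{\dagger-}(\mathbf{k}) + (\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k}))\, a_{s'}^{-}(\mathbf{k})\circ a_s^{\dagger+}(\mathbf{k}) \bigr\} \tag{5.12b} \]
\[ M'''^{\,\mathrm{sp}}_{\mu\nu} = \frac{(-1)^{j-1/2} j\hbar}{2(1+\tau)} \sum_{s,s'=1}^{2j+1-\delta_{0m}(1-\delta_{1j})} \int d^3k\, \bigl\{ (\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k}))\, [a_s^{\dagger+}(\mathbf{k}), a_{s'}^{-}(\mathbf{k})]_{\varepsilon} + (\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k}))\, [a_s^{\dagger-}(\mathbf{k}), a_{s'}^{+}(\mathbf{k})]_{\varepsilon} \bigr\} \tag{5.12c} \]

¹⁵ This is evident in the momentum picture of motion, in which $x_\mu$ stands for $x_{0\mu}$ in (3.12); see [13–15].

Obviously (see Sect. 2), the equations (5.12) have the same form in the Heisenberg picture in terms of the operators (2.9) (only tildes over $M$ and $a$ must be added), but the equations (5.11) change substantially due to the presence of derivatives of the creation and annihilation operators in them [13–15]:

\[ \tilde M'^{\,\mathrm{or}}_{\mu\nu} = \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ \tilde a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{-}(\mathbf{k}) - \varepsilon\, \tilde a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \tag{5.13a} \]
\[ \tilde M''^{\,\mathrm{or}}_{\mu\nu} = \frac{i\hbar}{2(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ \tilde a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, \tilde a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{\dagger+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \tag{5.13b} \]
\[ \begin{aligned} \tilde M'''^{\,\mathrm{or}}_{\mu\nu} ={}& \frac{i\hbar}{4(1+\tau)} \sum_{s=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3k\, \Bigl\{ \tilde a_s^{\dagger+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{-}(\mathbf{k}) - \varepsilon\, \tilde a_s^{-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{\dagger+}(\mathbf{k}) \\ &\qquad + \tilde a_s^{+}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{\dagger-}(\mathbf{k}) - \varepsilon\, \tilde a_s^{\dagger-}(\mathbf{k}) \Bigl(\overleftrightarrow{k_\mu\tfrac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu\tfrac{\partial}{\partial k^\mu}}\Bigr) \circ \tilde a_s^{+}(\mathbf{k}) \Bigr\}\Big|_{k_0=\sqrt{m^2c^2+\mathbf{k}^2}} \end{aligned} \tag{5.13c} \]

From (5.13) and (5.12) it is clear that the operators $\tilde M^{\mathrm{or}}_{\mu\nu}$ and $\tilde M^{\mathrm{sp}}_{\mu\nu}$ so defined are conserved (contrary to (5.7) and (5.8)) and do not depend on the validity of the Heisenberg relations (5.1) (contrary to the expressions (5.11) in the momentum picture). The problem of whether the operators (5.12) and (5.13) satisfy the equations (5.4) and (5.5), respectively, will be considered in Sect. 6.

There is an essential difference between (5.4) and (5.5): the equation (5.5) depends on the particular properties of the operators $\tilde\varphi_i(x)$ under 4-rotations via the coefficients $I^{i'}_{i\mu\nu}$ (see (5.25) below), while (5.4) does not depend on them. This is explicitly reflected in (5.11) and (5.12): the former set of equations is valid independently of the geometrical nature of the fields considered, while the latter one depends on it via the 'spin' ('polarization') functions $\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k})$ and $l^{ss',\pm}_{\mu\nu}(\mathbf{k})$. A similar remark concerns (5.3), on the one hand, and (5.1) and (5.2), on the other hand: the particular form of (5.3) essentially depends on the geometric properties of $\tilde\varphi_i(x)$ under 4-rotations, the other equations being independent of them. It should also be noted that the relation (5.3) does not hold for a canonically quantized electromagnetic field in Coulomb gauge unless some additional terms in its r.h.s., reflecting the gauge symmetry of the field, are taken into account [11, § 84].

As was said above, the relations (5.1)–(5.3) are of purely geometrical origin. However, the last discussion, concerning (5.4)–(5.8), reveals that the terms in braces in (5.3) should be connected with the momentum operator in the (pure) Lagrangian approach.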
More precisely, against the background of equations (3.11a)–(3.12c), the Heisenberg relation (5.3) should be replaced with

\[ [\tilde{\varphi}_i(x), \tilde{M}_{\mu\nu}] = x_\mu [\tilde{\varphi}_i(x), \tilde{P}_\nu] - x_\nu [\tilde{\varphi}_i(x), \tilde{P}_\mu] + i\hbar \sum_{i'} I^{i'}_{i\mu\nu} \tilde{\varphi}_{i'}(x), \tag{5.14} \]

which is equivalent to (5.3) if (5.1) is true. An advantage of the last equation is that it is valid in any picture of motion (in the same form), while (5.3) holds only in the Heisenberg picture.^{16} Obviously, (5.14) is equivalent to (5.5) with $\tilde{M}^{\mathrm{sp}}_{\mu\nu}$ defined by (5.8).

16 In other pictures of motion, generally, additional terms in the r.h.s. of (5.3) will appear, i.e. the functional form of the r.h.s. of (5.3) is not invariant under changes of the picture of motion, contrary to (5.14).

The other kind of geometric relations mentioned at the beginning of this section are connected with the basic relations defining the Lie algebra of the Poincaré group [7, pp. 143–147], [8, sect. 7.1]. They require the fulfillment of the following equations between the components $\tilde{P}_\mu$ of the momentum and $\tilde{M}_{\mu\nu}$ of the angular momentum operators [3, 5, 7, 8]:

\[ [\tilde{P}_\mu, \tilde{P}_\nu] = 0 \tag{5.15} \]
\[ [\tilde{M}_{\mu\nu}, \tilde{P}_\lambda] = -i\hbar (\eta_{\lambda\mu} \tilde{P}_\nu - \eta_{\lambda\nu} \tilde{P}_\mu) \tag{5.16} \]
\[ [\tilde{M}_{\kappa\lambda}, \tilde{M}_{\mu\nu}] = -i\hbar \bigl( \eta_{\kappa\mu} \tilde{M}_{\lambda\nu} - \eta_{\lambda\mu} \tilde{M}_{\kappa\nu} - \eta_{\kappa\nu} \tilde{M}_{\lambda\mu} + \eta_{\lambda\nu} \tilde{M}_{\kappa\mu} \bigr). \tag{5.17} \]

We would like to draw attention to the minus sign in the multiplier $(-i\hbar)$ in (5.16) and (5.17) with respect to the above references, where $i\hbar$ stands instead of $-i\hbar$ in these equations. When (a representation of) the Lie algebra of the Poincaré group is considered, this difference in the sign is insignificant, as it can be absorbed into the definition of $\tilde{M}_{\mu\nu}$. However, the change of the sign of the angular momentum operator, $\tilde{M}_{\mu\nu} \mapsto -\tilde{M}_{\mu\nu}$, will result in the change $i\hbar \mapsto -i\hbar$ in the r.h.s. of (5.3). This means that equations (5.16), (5.17) and (5.3), when considered together, require a suitable choice of the signs of the multiplier $i\hbar$ in their right-hand sides, as these signs change simultaneously when $\tilde{M}_{\mu\nu}$ is replaced with $-\tilde{M}_{\mu\nu}$.
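The sign statement about (5.17) can be checked in a small finite-dimensional model. The sketch below is only an illustration under stated assumptions: it takes the 4x4 matrices $I_{\mu\nu}$ with entries $\delta^\sigma_\mu\eta_{\nu\rho} - \delta^\sigma_\nu\eta_{\mu\rho}$ (the vector-field coefficients quoted in (5.25c)), sets $M_{\mu\nu} := i\hbar I_{\mu\nu}$ (a hypothetical choice made here, not the field-theoretic operator), uses units with $\hbar = 1$ and the signature $(+,-,-,-)$, and measures how badly (5.17) is violated for either sign of the multiplier.

```python
import itertools

HBAR = 1.0                      # assumption: units with hbar = 1
ETA = [1.0, -1.0, -1.0, -1.0]   # assumption: metric signature (+,-,-,-)


def eta(mu, nu):
    """Component eta_{mu nu} of the (diagonal) metric."""
    return ETA[mu] if mu == nu else 0.0


def generator(mu, nu, sign=+1):
    """4x4 matrix sign * i*hbar * I_{mu nu} with
    (I_{mu nu})^sigma_rho = delta^sigma_mu eta_{nu rho} - delta^sigma_nu eta_{mu rho}."""
    return [[sign * 1j * HBAR * ((sigma == mu) * eta(nu, rho) - (sigma == nu) * eta(mu, rho))
             for rho in range(4)] for sigma in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def max_violation(sign_m=+1, sign_rhs=-1):
    """Largest entry of
    [M_kl, M_mn] - sign_rhs*i*hbar*(eta_km M_ln - eta_lm M_kn - eta_kn M_lm + eta_ln M_km)
    over all index combinations, for M = sign_m * i*hbar * I."""
    worst = 0.0
    for k, l, m, n in itertools.product(range(4), repeat=4):
        a, b = generator(k, l, sign_m), generator(m, n, sign_m)
        ab, ba = matmul(a, b), matmul(b, a)
        terms = [(eta(k, m), generator(l, n, sign_m)), (-eta(l, m), generator(k, n, sign_m)),
                 (-eta(k, n), generator(l, m, sign_m)), (eta(l, n), generator(k, m, sign_m))]
        for i in range(4):
            for j in range(4):
                rhs = sign_rhs * 1j * HBAR * sum(c * g[i][j] for c, g in terms)
                worst = max(worst, abs(ab[i][j] - ba[i][j] - rhs))
    return worst


print(max_violation(+1, -1), max_violation(-1, +1), max_violation(+1, +1))
```

In this model the relation holds with $-i\hbar$ for $M$ and with $+i\hbar$ for $-M$, which is the simultaneous sign change described in the text; mixing the two conventions leaves a nonzero residual.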
Since equations (5.3), (5.16) and (5.17) hold when $\tilde{M}_{\mu\nu}$ is defined according to Noether's theorem and the ordinary (anti)commutation relations are valid [13–15], we accept these equations in the way they are written above.

To the relations (5.15)–(5.17) should be added the equations [3, p. 78]

\[ [\tilde{Q}, \tilde{P}_\mu] = 0 \tag{5.18} \]
\[ [\tilde{Q}, \tilde{M}_{\mu\nu}] = 0, \tag{5.19} \]

which complete the algebra of observables and express, respectively, the translational and rotational invariance of the charge operator $\tilde{Q}$; physically they mean that the charge and momentum, or the charge and angular momentum, are simultaneously measurable quantities.

Since the spin properties of a system are generally independent of its charge or momentum, one may also expect the validity of the relations^{17}

\[ [\tilde{S}_{\mu\nu}, \tilde{P}_\lambda] = 0 \tag{5.20} \]
\[ [\tilde{S}_{\mu\nu}, \tilde{Q}] = 0. \tag{5.21} \]

But, as the spin describes, in a sense, some of the rotational properties of the system, an equality like $[\tilde{S}_{\mu\nu}, \tilde{L}_{\kappa\lambda}] = 0$ is not likely to hold. Indeed, the considerations in [13–15] reveal that (5.20) and (5.21), but not the last equation, are true in the framework of the Lagrangian formalism with the standard (anti)commutation relations added to it. Notice that if (5.20) and (5.21) hold, then, respectively, (5.16) and (5.19) are equivalent to

\[ [\tilde{L}_{\mu\nu}, \tilde{P}_\lambda] = -i\hbar (\eta_{\lambda\mu} \tilde{P}_\nu - \eta_{\lambda\nu} \tilde{P}_\mu) \tag{5.22} \]
\[ [\tilde{Q}, \tilde{L}_{\mu\nu}] = 0. \tag{5.23} \]

It is intuitively clear that not all of the commutation relations (5.1)–(5.3) and (5.15)–(5.21) are independent: if $\tilde{D}$ denotes some of the operators $\tilde{P}_\mu$, $\tilde{Q}$, $\tilde{M}_{\mu\nu}$, $\tilde{S}_{\mu\nu}$ or $\tilde{L}_{\mu\nu}$ and the commutators $[\tilde{\varphi}_i(x), \tilde{D}]$, $i = 1, \dots, N$, are known, then, in principle, one can calculate the commutators $[\Gamma(\tilde{\varphi}_1(x), \dots, \tilde{\varphi}_N(x)), \tilde{D}]$, where $\Gamma(\tilde{\varphi}_1(x), \dots, \tilde{\varphi}_N(x))$ is, for example, any function/functional bilinear in $\tilde{\varphi}_1(x), \dots, \tilde{\varphi}_N(x)$; to prove this fact, one should apply the identity $[A, B \circ C] = [A, B] \circ C + B \circ [A, C]$ a suitable number of times.
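The repeated use of the identity $[A, B \circ C] = [A, B] \circ C + B \circ [A, C]$ can be made concrete with matrices standing in for the operators. The following is a minimal sketch, assuming small random integer matrices (so that the comparison is exact); the helper names are hypothetical and chosen here only for illustration.

```python
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def comm(a, b):
    """Ordinary commutator [a, b] = a b - b a."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]


def product(factors):
    out = factors[0]
    for f in factors[1:]:
        out = matmul(out, f)
    return out


def comm_with_product(a, factors):
    """[a, f1 f2 ... fn] expanded by applying [A, B C] = [A, B] C + B [A, C] repeatedly."""
    if len(factors) == 1:
        return comm(a, factors[0])
    head, tail = factors[0], factors[1:]
    return add(matmul(comm(a, head), product(tail)),
               matmul(head, comm_with_product(a, tail)))


random.seed(0)


def rand_matrix(n=3):
    return [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]


A, B, C, D = (rand_matrix() for _ in range(4))
# Direct evaluation versus the expansion that only needs the elementary commutators [A, B], [A, C], [A, D].
print(comm(A, product([B, C, D])) == comm_with_product(A, [B, C, D]))
```

The expansion needs only the commutators of $A$ with the individual factors, which is the sense in which knowledge of $[\tilde{\varphi}_i(x), \tilde{D}]$ determines the commutator of $\tilde{D}$ with any polynomial in the fields.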
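In particular, if $\tilde{D}_1$ and $\tilde{D}_2$ denote any two (distinct) operators of the dynamical variables, and $[\tilde{\varphi}_i(x), \tilde{D}_1]$ is known, then the commutator $[\tilde{D}_1, \tilde{D}_2]$ can be calculated explicitly. For this reason, we can expect that:

(i) Equation (5.1) implies (5.15), (5.16), (5.18), (5.20) and (5.22).
(ii) Equation (5.2) implies (5.18), (5.19), (5.21), and (5.23).
(iii) Equation (5.3) implies (5.16), (5.17), and (5.19).

Besides, (5.3) may, possibly, entail equations like (5.17) with $S$ or $L$ for $M$, with the exception of $\tilde{M}_{\mu\nu}$ in the l.h.s., i.e.

\[ [\tilde{S}_{\kappa\lambda}, \tilde{M}_{\mu\nu}] = -i\hbar \bigl( \eta_{\kappa\mu} \tilde{S}_{\lambda\nu} - \eta_{\lambda\mu} \tilde{S}_{\kappa\nu} - \eta_{\kappa\nu} \tilde{S}_{\lambda\mu} + \eta_{\lambda\nu} \tilde{S}_{\kappa\mu} \bigr), \qquad [\tilde{L}_{\kappa\lambda}, \tilde{M}_{\mu\nu}] = -i\hbar \bigl( \eta_{\kappa\mu} \tilde{L}_{\lambda\nu} - \eta_{\lambda\mu} \tilde{L}_{\kappa\nu} - \eta_{\kappa\nu} \tilde{L}_{\lambda\mu} + \eta_{\lambda\nu} \tilde{L}_{\kappa\mu} \bigr). \tag{5.24} \]

17 Recall, $\tilde{S}_{\mu\nu}$ (resp. $\tilde{L}_{\mu\nu}$) is the conserved spin (resp. orbital) operator, not the generally non-conserved spin (resp. orbital) angular momentum operator [23].

The validity of assertions (i)–(iii) above for free scalar, spinor and vector fields, when respectively

\[ \tilde{\varphi}_i(x) \mapsto \tilde{\varphi}(x), \tilde{\varphi}^\dagger(x) \qquad I^{i'}_{i\mu\nu} \mapsto I_{\mu\nu} = 0 \qquad e(\tilde{\varphi}) = -e(\tilde{\varphi}^\dagger) = +1 \tag{5.25a} \]
\[ \tilde{\varphi}_i(x) \mapsto \tilde{\psi}(x), \tilde{\breve{\psi}}(x) \qquad I^{i'}_{i\mu\nu} \mapsto I_{\psi\mu\nu} = I_{\breve{\psi}\mu\nu} = -\tfrac{i}{2} \sigma_{\mu\nu} \qquad e(\tilde{\psi}) = -e(\tilde{\breve{\psi}}) = +1 \tag{5.25b} \]
\[ \tilde{\varphi}_i(x) \mapsto \tilde{U}_\mu(x), \tilde{U}^\dagger_\mu(x) \qquad I^{i'}_{i\mu\nu} \mapsto I^{\sigma}_{\rho\mu\nu} = I^{\dagger\sigma}_{\rho\mu\nu} = \delta^\sigma_\mu \eta_{\nu\rho} - \delta^\sigma_\nu \eta_{\mu\rho} \qquad e(\tilde{U}_\mu) = -e(\tilde{U}^\dagger_\mu) = +1, \tag{5.25c} \]

where $\sigma_{\mu\nu} := \tfrac{i}{2} [\gamma_\mu, \gamma_\nu]$ with $\gamma_\mu$ being the Dirac $\gamma$-matrices [1, 25], is proved in [13–15], respectively. Besides, in loc. cit. it is proved that equations (5.24) hold for scalar and vector fields, but not for a spinor field.^{18} Thus, we see that the Heisenberg relations (5.1)–(5.3) are stronger than the commutation relations (5.15)–(5.23), when imposed on the Lagrangian formalism as subsidiary restrictions.

6. Types of possible commutation relations

In a broad sense, by a commutation relation we shall understand any algebraic relation between the creation and annihilation operators imposed as a subsidiary restriction on the Lagrangian formalism.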
In a narrow sense, the commutation relations are the equations (6.13), with ε = −1, written below and satisfied by the bose creation and annihilation operators. The equations (6.13), with ε = +1, written below and satisfied by the fermi creation and annihilation operators, are known as anticommutation relations. The last two types of relations are often referred to as the bilinear commutation relations [18]. Theoretically, trilinear commutation relations are also possible, an example being the paracommutation relations [16, 18] represented below by equations (6.18) (or (6.20)).

Generally speaking, the commutation relations should be postulated. Alternatively, they could be derived from different assumptions (equivalent to them) added to the Lagrangian formalism. The purpose of this section is to explore possible classes of commutation relations which follow from some natural restrictions on the Lagrangian formalism that are consequences of the considerations in the previous sections. Special attention will be paid to some consequences of the charge symmetric Lagrangians, as the free fields possess such a symmetry [1, 3, 11, 12].

As pointed out in Sect. 3, the Euler-Lagrange equations for the Lagrangians $\tilde{L}'$, $\tilde{L}''$ and $\tilde{L}'''$ coincide and, in quantum field theory, the role of these equations is to single out the independent degrees of freedom of the fields in the form of creation and annihilation operators $a^{\pm}_s(\mathbf{k})$ and $a^{\dagger\pm}_s(\mathbf{k})$ (which are identical for $\tilde{L}'$, $\tilde{L}''$ and $\tilde{L}'''$). Further specialization of these operators is provided by the commutation relations (in the broad sense), which play the role of field equations in this situation (with respect to the mentioned operators).

Before proceeding, we would like to simplify our notation. As a spin variable, $s$ say, is always coupled with a 3-momentum one, $\mathbf{k}$ say, we shall use the letters $l$, $m$ and $n$ to denote pairs like $l = (s, \mathbf{k})$, $m = (t, \mathbf{p})$ and $n = (r, \mathbf{q})$.
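Equipped with this convention, we shall write, e.g., $a^{\pm}_l$ for $a^{\pm}_s(\mathbf{k})$ and $a^{\dagger\pm}_l$ for $a^{\dagger\pm}_s(\mathbf{k})$. We set $\delta_{lm} := \delta_{st}\delta^3(\mathbf{k}-\mathbf{p})$, and a summation sign like $\sum_l$ should be understood as $\sum_s \int d^3\mathbf{k}$, where the range of the polarization variable $s$ will be clear from the context (see, e.g., (3.9)–(3.12)).

18 The problem of the validity of assertions (i)–(iii) or equations (5.24) in the general case of arbitrary fields (Lagrangians) is not a subject of the present work.

6.1. Restrictions related to the momentum operator

First of all, let us examine the consequences of the Heisenberg relation (5.1) involving the momentum operator. Since in terms of creation and annihilation operators it reads [1, 13–15]

\[ [a^{\pm}_s(\mathbf{k}), P_\mu] = \mp k_\mu a^{\pm}_s(\mathbf{k}) \qquad [a^{\dagger\pm}_s(\mathbf{k}), P_\mu] = \mp k_\mu a^{\dagger\pm}_s(\mathbf{k}) \qquad k_0 = \sqrt{m^2c^2 + \mathbf{k}^2}, \tag{6.1} \]

the field equations in terms of creation and annihilation operators for the Lagrangians (3.1), (3.3) and (3.4) respectively are (see [13–15] or (6.1) and (3.9)):

\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int q_\mu \big|_{q_0=\sqrt{m^2c^2+\mathbf{q}^2}} \Bigl\{ [a^{\pm}_s(\mathbf{k}), a^{\dagger+}_t(\mathbf{q}) \circ a^{-}_t(\mathbf{q}) + \varepsilon a^{\dagger-}_t(\mathbf{q}) \circ a^{+}_t(\mathbf{q})] \pm (1+\tau) a^{\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{q}) \Bigr\} d^3\mathbf{q} = 0 \tag{6.2a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int q_\mu \big|_{q_0=\sqrt{m^2c^2+\mathbf{q}^2}} \Bigl\{ [a^{\dagger\pm}_s(\mathbf{k}), a^{\dagger+}_t(\mathbf{q}) \circ a^{-}_t(\mathbf{q}) + \varepsilon a^{\dagger-}_t(\mathbf{q}) \circ a^{+}_t(\mathbf{q})] \pm (1+\tau) a^{\dagger\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{q}) \Bigr\} d^3\mathbf{q} = 0 \tag{6.2b} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int q_\mu \big|_{q_0=\sqrt{m^2c^2+\mathbf{q}^2}} \Bigl\{ [a^{\pm}_s(\mathbf{k}), a^{+}_t(\mathbf{q}) \circ a^{\dagger-}_t(\mathbf{q}) + \varepsilon a^{-}_t(\mathbf{q}) \circ a^{\dagger+}_t(\mathbf{q})] \pm (1+\tau) a^{\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{q}) \Bigr\} d^3\mathbf{q} = 0 \tag{6.3a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int q_\mu \big|_{q_0=\sqrt{m^2c^2+\mathbf{q}^2}} \Bigl\{ [a^{\dagger\pm}_s(\mathbf{k}), a^{+}_t(\mathbf{q}) \circ a^{\dagger-}_t(\mathbf{q}) + \varepsilon a^{-}_t(\mathbf{q}) \circ a^{\dagger+}_t(\mathbf{q})] \pm (1+\tau) a^{\dagger\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{q}) \Bigr\} d^3\mathbf{q} = 0 \tag{6.3b} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int q_\mu \big|_{q_0=\sqrt{m^2c^2+\mathbf{q}^2}} \Bigl\{ \tfrac{1}{2} [a^{\pm}_s(\mathbf{k}), [a^{\dagger+}_t(\mathbf{q}), a^{-}_t(\mathbf{q})]_\varepsilon + [a^{+}_t(\mathbf{q}), a^{\dagger-}_t(\mathbf{q})]_\varepsilon] \pm (1+\tau) a^{\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{q}) \Bigr\} d^3\mathbf{q} = 0 \tag{6.4a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int q_\mu \big|_{q_0=\sqrt{m^2c^2+\mathbf{q}^2}} \Bigl\{ \tfrac{1}{2} [a^{\dagger\pm}_s(\mathbf{k}), [a^{\dagger+}_t(\mathbf{q}), a^{-}_t(\mathbf{q})]_\varepsilon + [a^{+}_t(\mathbf{q}), a^{\dagger-}_t(\mathbf{q})]_\varepsilon] \pm (1+\tau) a^{\dagger\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{q}) \Bigr\} d^3\mathbf{q} = 0, \tag{6.4b} \]

where $j$ and $\varepsilon$ are given via (3.7), the generalized commutation function $[\cdot, \cdot]_\varepsilon$ is defined by (4.14), and the polarization indices take the values

\[ s, t = 1, \dots, 2j+1-\delta_{0m}(1-\delta_{0j}) = \begin{cases} 1 & \text{for } j = 0 \text{ or for } j = \tfrac{1}{2} \text{ and } m = 0 \\ 1, 2 & \text{for } j = \tfrac{1}{2} \text{ and } m \neq 0 \text{ or for } j = 1 \text{ and } m = 0 \\ 1, 2, 3 & \text{for } j = 1 \text{ and } m \neq 0. \end{cases} \tag{6.5} \]

The “b” versions of the equations (6.2)–(6.4) are consequences of the “a” versions and the equalities

\[ (a^{\pm}_l)^\dagger = a^{\dagger\mp}_l \qquad (a^{\dagger\pm}_l)^\dagger = a^{\mp}_l \tag{6.6} \]
\[ ([A, B]_\eta)^\dagger = \eta [A^\dagger, B^\dagger]_\eta \qquad [A, B]_\eta = \eta [B, A]_\eta \qquad \text{for } \eta = \pm 1. \tag{6.7} \]

Applying (6.2)–(6.4) and the identity

\[ [A, B \circ C] = [A, B]_\eta \circ C - \eta B \circ [A, C]_\eta \qquad \text{for } \eta = \pm 1 \tag{6.8} \]

for the choice η = −1, one can prove by a direct calculation that

\[ [\tilde{P}_\mu, \tilde{P}_\nu] = 0 \qquad [\tilde{Q}, \tilde{P}_\mu] = 0 \qquad [\tilde{S}_{\mu\nu}, \tilde{P}_\lambda] = 0 \qquad [\tilde{L}_{\mu\nu}, \tilde{P}_\lambda] = -i\hbar \{\eta_{\lambda\mu} \tilde{P}_\nu - \eta_{\lambda\nu} \tilde{P}_\mu\} \qquad [\tilde{M}_{\mu\nu}, \tilde{P}_\lambda] = -i\hbar \{\eta_{\lambda\mu} \tilde{P}_\nu - \eta_{\lambda\nu} \tilde{P}_\mu\}, \tag{6.9} \]

where the operators $\tilde{P}_\mu$, $\tilde{Q}$, $\tilde{S}_{\mu\nu}$, $\tilde{L}_{\mu\nu}$, and $\tilde{M}_{\mu\nu}$ denote the momentum, charge, spin, orbital and total angular momentum operators, respectively, of the system considered and are calculated from one and the same initial Lagrangian. This result confirms the supposition, made in Sect. 5, that the assertion (i) before (5.24) holds for the fields investigated here.

Below we shall study only those solutions of (6.2)–(6.4) for which the integrands in them vanish, i.e. we shall replace the systems of integral equations (6.2)–(6.4) with the following systems of algebraic equations (see the above convention on the indices $l$ and $m$ and do not sum over indices repeated on one and the same level):

\[ [a^{\pm}_l, a^{\dagger+}_m \circ a^{-}_m + \varepsilon a^{\dagger-}_m \circ a^{+}_m] \pm (1+\tau) \delta_{lm} a^{\pm}_l = 0 \tag{6.10a} \]
\[ [a^{\dagger\pm}_l, a^{\dagger+}_m \circ a^{-}_m + \varepsilon a^{\dagger-}_m \circ a^{+}_m] \pm (1+\tau) \delta_{lm} a^{\dagger\pm}_l = 0 \tag{6.10b} \]
\[ [a^{\pm}_l, a^{+}_m \circ a^{\dagger-}_m + \varepsilon a^{-}_m \circ a^{\dagger+}_m] \pm (1+\tau) \delta_{lm} a^{\pm}_l = 0 \tag{6.11a} \]
\[ [a^{\dagger\pm}_l, a^{+}_m \circ a^{\dagger-}_m + \varepsilon a^{-}_m \circ a^{\dagger+}_m] \pm (1+\tau) \delta_{lm} a^{\dagger\pm}_l = 0 \tag{6.11b} \]
\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon + [a^{+}_m, a^{\dagger-}_m]_\varepsilon] \pm 2(1+\tau) \delta_{lm} a^{\pm}_l = 0 \tag{6.12a} \]
\[ [a^{\dagger\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon + [a^{+}_m, a^{\dagger-}_m]_\varepsilon] \pm 2(1+\tau) \delta_{lm} a^{\dagger\pm}_l = 0. \tag{6.12b} \]

It seems that these are the most general and sensible trilinear commutation relations one may impose on the creation and annihilation operators.

First of all, we should mention that the standard bilinear commutation relations, viz.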
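[1, 3, 11–15]

\[ \begin{aligned} [a^{\pm}_l, a^{\pm}_m]_{-\varepsilon} &= 0 & [a^{\dagger\pm}_l, a^{\dagger\pm}_m]_{-\varepsilon} &= 0 \\ [a^{\mp}_l, a^{\pm}_m]_{-\varepsilon} &= (\pm 1)^{2j+1} \tau \delta_{lm}\, \mathrm{id}_F & [a^{\dagger\mp}_l, a^{\dagger\pm}_m]_{-\varepsilon} &= (\pm 1)^{2j+1} \tau \delta_{lm}\, \mathrm{id}_F \\ [a^{\pm}_l, a^{\dagger\pm}_m]_{-\varepsilon} &= 0 & [a^{\dagger\pm}_l, a^{\pm}_m]_{-\varepsilon} &= 0 \\ [a^{\mp}_l, a^{\dagger\pm}_m]_{-\varepsilon} &= (\pm 1)^{2j+1} \delta_{lm}\, \mathrm{id}_F & [a^{\dagger\mp}_l, a^{\pm}_m]_{-\varepsilon} &= (\pm 1)^{2j+1} \delta_{lm}\, \mathrm{id}_F, \end{aligned} \tag{6.13} \]

provide a solution of any one of the equations (6.10)–(6.12) in the sense that, due to (3.7) and (6.8) with η = −ε, any set of operators satisfying (6.13) converts (6.10)–(6.12) into identities. Besides, this conclusion remains valid also if the normal ordering is taken into account, i.e. if, in this particular case, the changes $a^{\dagger-}_m \circ a^{+}_m \mapsto \varepsilon a^{+}_m \circ a^{\dagger-}_m$ and $a^{-}_m \circ a^{\dagger+}_m \mapsto \varepsilon a^{\dagger+}_m \circ a^{-}_m$ are made in (6.10)–(6.12).

Now we shall demonstrate how the trilinear relations (6.12) lead to the paracommutation relations. Equations (6.12) can be ‘split’ into different kinds of trilinear commutation relations in infinitely many ways. For example, the system of equations

\[ [a^{\pm}_l, [a^{+}_m, a^{\dagger-}_m]_\varepsilon] \pm (1+\tau) \delta_{lm} a^{\pm}_l = 0 \tag{6.14a} \]
\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon] \pm (1+\tau) \delta_{lm} a^{\pm}_l = 0 \tag{6.14b} \]
\[ [a^{\dagger\pm}_l, [a^{+}_m, a^{\dagger-}_m]_\varepsilon] \pm (1+\tau) \delta_{lm} a^{\dagger\pm}_l = 0 \tag{6.14c} \]
\[ [a^{\dagger\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon] \pm (1+\tau) \delta_{lm} a^{\dagger\pm}_l = 0 \tag{6.14d} \]

provides an evident solution of (6.12). However, simple algebra shows that these relations are incompatible with the standard (anti)commutation relations (6.13) and, in this sense, are not suitable as subsidiary restrictions on the Lagrangian formalism. For our purpose, the equations

\[ [a^{+}_l, [a^{+}_m, a^{\dagger-}_m]_\varepsilon] + 2\delta_{lm} a^{+}_l = 0 \tag{6.15a} \]
\[ [a^{+}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon] + 2\tau\delta_{lm} a^{+}_l = 0 \tag{6.15b} \]
\[ [a^{-}_l, [a^{+}_m, a^{\dagger-}_m]_\varepsilon] - 2\tau\delta_{lm} a^{-}_l = 0 \tag{6.15c} \]
\[ [a^{-}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon] - 2\delta_{lm} a^{-}_l = 0 \tag{6.15d} \]

and their Hermitian conjugates provide a solution of (6.12) which is compatible with (6.13), i.e. if (6.13) holds, the equations (6.15) are converted into identities.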
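The claim that (6.13) turns (6.12) and (6.15) into identities can be tested in a finite toy model. The sketch below rests on several assumptions made only for illustration: the fermi case $\varepsilon = -1$ with $j = 1/2$, a charged field ($\tau = 0$), two discrete labels $l$ in place of the continuous $(s, \mathbf{k})$ (so $\delta_{lm}$ is a Kronecker symbol), a Jordan-Wigner matrix realization of the operators, and the reading of (6.15a)–(6.15d) in which the inner bracket is $[a^{+}_m, a^{\dagger-}_m]_\varepsilon$ in (a), (c) and $[a^{\dagger+}_m, a^{-}_m]_\varepsilon$ in (b), (d).

```python
from functools import reduce


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lin(ca, a, cb, b):
    """Entrywise linear combination ca*a + cb*b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bracket(a, b, eta):
    """Generalized commutator [a, b]_eta = a b + eta * b a."""
    return lin(1, matmul(a, b), eta, matmul(b, a))


def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def dagger(a):
    return [list(col) for col in zip(*a)]  # real entries: adjoint = transpose


def is_zero(a):
    return all(x == 0 for row in a for x in row)


I2, Z, LOWER = [[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [0, 0]]


def fermion_modes(n):
    """Jordan-Wigner annihilators f_0..f_{n-1} with {f_i, f_j^dagger} = delta_ij."""
    return [reduce(kron, [Z] * i + [LOWER] + [I2] * (n - i - 1)) for i in range(n)]


EPS, TAU, L = -1, 0, 2          # assumptions: fermi case, charged field, two labels
f = fermion_modes(2 * L)
a_plus = [dagger(f[l]) for l in range(L)]        # a^+_l
adag_minus = [f[l] for l in range(L)]            # a^{dagger -}_l = (a^+_l)^dagger
a_minus = [f[L + l] for l in range(L)]           # a^-_l
adag_plus = [dagger(f[L + l]) for l in range(L)] # a^{dagger +}_l = (a^-_l)^dagger
DIM = 2 ** (2 * L)
ID = [[int(i == j) for j in range(DIM)] for i in range(DIM)]


def check_6_13():
    ok = True
    for l in range(L):
        for m in range(L):
            d = int(l == m)
            ok &= is_zero(lin(1, bracket(a_minus[l], adag_plus[m], -EPS), -d, ID))
            ok &= is_zero(lin(1, bracket(adag_minus[l], a_plus[m], -EPS), -d, ID))
            ok &= is_zero(lin(1, bracket(a_minus[l], a_plus[m], -EPS), -TAU * d, ID))
            ok &= is_zero(bracket(a_plus[l], a_plus[m], -EPS))
            ok &= is_zero(bracket(a_plus[l], adag_plus[m], -EPS))
    return ok


def check_6_12a_and_6_15():
    ok = True
    for l in range(L):
        for m in range(L):
            d = int(l == m)
            n_b = bracket(a_plus[m], adag_minus[m], EPS)   # [a^+_m, a^{dagger -}_m]_eps
            n_c = bracket(adag_plus[m], a_minus[m], EPS)   # [a^{dagger +}_m, a^-_m]_eps
            total = lin(1, n_c, 1, n_b)
            rels = [lin(1, bracket(a_plus[l], total, -1), 2 * (1 + TAU) * d, a_plus[l]),    # (6.12a), upper sign
                    lin(1, bracket(a_minus[l], total, -1), -2 * (1 + TAU) * d, a_minus[l]), # (6.12a), lower sign
                    lin(1, bracket(a_plus[l], n_b, -1), 2 * d, a_plus[l]),                  # (6.15a)
                    lin(1, bracket(a_plus[l], n_c, -1), 2 * TAU * d, a_plus[l]),            # (6.15b)
                    lin(1, bracket(a_minus[l], n_b, -1), -2 * TAU * d, a_minus[l]),         # (6.15c)
                    lin(1, bracket(a_minus[l], n_c, -1), -2 * d, a_minus[l])]               # (6.15d)
            ok &= all(is_zero(r) for r in rels)
    return ok


print(check_6_13(), check_6_12a_and_6_15())
```

Integer matrices keep the comparison exact. The model says nothing about the bose case, for which no finite-dimensional realization of (6.13) exists.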
The idea of the paraquantization lies in the following generalization of (6.15):

\[ [a^{+}_l, [a^{+}_m, a^{\dagger-}_n]_\varepsilon] + 2\delta_{ln} a^{+}_m = 0 \tag{6.16a} \]
\[ [a^{+}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon] + 2\tau\delta_{ln} a^{+}_m = 0 \tag{6.16b} \]
\[ [a^{-}_l, [a^{+}_m, a^{\dagger-}_n]_\varepsilon] - 2\tau\delta_{lm} a^{-}_n = 0 \tag{6.16c} \]
\[ [a^{-}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon] - 2\delta_{lm} a^{-}_n = 0, \tag{6.16d} \]

which reduces to (6.15) for $n = m$ and is a generalization of (6.13) in the sense that any set of operators satisfying (6.13) converts (6.16) into identities, the opposite being generally not valid.^{19}

Suppose that the field considered consists of a single sort of particles, e.g. electrons or photons, created by $b^\dagger_l := a^{+}_l$ and annihilated by $b_l := a^{\dagger-}_l$. Then the equation Hermitian conjugate to (6.15a) reads

\[ [b_l, [b^\dagger_m, b_m]_\varepsilon] = 2\delta_{lm} b_m. \tag{6.17} \]

This is the main relation from which the paper [16] starts. The basic paracommutation relations are [16–18, 26]:

\[ [b_l, [b^\dagger_m, b_n]_\varepsilon] = 2\delta_{lm} b_n \tag{6.18a} \]
\[ [b_l, [b_m, b_n]_\varepsilon] = 0. \tag{6.18b} \]

The first of them is a generalization (stronger version) of (6.17) obtained by replacing the second index $m$ with an arbitrary one, say $n$, and the second one is added (“by hand”) to the theory as an additional assumption. Obviously, (6.18) are a solution of (6.15) and therefore of (6.12) in the considered case of a field consisting of only one sort of particles.

The equations (6.15) also contain the relativistic version of the paracommutation relations, when the existence of antiparticles must be respected [18, sec. 18.1]. Indeed, noticing that the field’s particles (resp. antiparticles) are created by $b^\dagger_l := a^{+}_l$ (resp. $c^\dagger_l := a^{\dagger+}_l$) and annihilated by $b_l := a^{\dagger-}_l$ (resp. $c_l := a^{-}_l$), from (6.15) and the equations Hermitian conjugate to them, we get

\[ [b_l, [b^\dagger_m, b_m]_\varepsilon] = 2\delta_{lm} b_m \qquad [c_l, [c^\dagger_m, c_m]_\varepsilon] = 2\delta_{lm} c_m \tag{6.19a} \]
\[ [b^\dagger_l, [c^\dagger_m, c_m]_\varepsilon] = -2\tau\delta_{lm} b^\dagger_m \qquad [c^\dagger_l, [b^\dagger_m, b_m]_\varepsilon] = -2\tau\delta_{lm} c^\dagger_m. \tag{6.19b} \]

Generalizing these equations in a way similar to the transition from (6.17) to (6.18), we obtain the relativistic paracommutation relations as (cf.
(6.16))

\[ [b_l, [b^\dagger_m, b_n]_\varepsilon] = 2\delta_{lm} b_n \qquad [b_l, [b_m, b_n]_\varepsilon] = 0 \tag{6.20a} \]
\[ [c_l, [c^\dagger_m, c_n]_\varepsilon] = 2\delta_{lm} c_n \qquad [c_l, [c_m, c_n]_\varepsilon] = 0 \tag{6.20b} \]
\[ [b^\dagger_l, [c^\dagger_m, c_n]_\varepsilon] = -2\tau\delta_{ln} b^\dagger_m \qquad [c^\dagger_l, [b^\dagger_m, b_n]_\varepsilon] = -2\tau\delta_{ln} c^\dagger_m. \tag{6.20c} \]

19 Other generalizations of (6.15) are also possible, but they do not agree with (6.13). Moreover, it is easy to prove that any other (non-trivial) arrangement of the indices in (6.16) is incompatible with (6.13).

The equations (6.20a) (resp. (6.20b)) represent the paracommutation relations for the field’s particles (resp. antiparticles) as independent objects, while (6.20c) describes a purely relativistic effect of some “interaction” (or its absence) between the field’s particles and antiparticles and fixes the paracommutation relations involving the $b_l$’s and $c_l$’s, as pointed out in [18, p. 207] (where $b_l$ is denoted by $a_l$ and $c_l$ by $b_l$). The relations (6.17) and (6.20) for ε = +1 (resp. ε = −1) are referred to as the parabose (resp. parafermi) commutation relations [18]. This terminology is also a natural one with respect to the commutation relations (6.16), which will be referred to as the paracommutation relations too.

As first noted in [16], the equations (6.13) provide a solution of (6.20) (or (6.18) in the nonrelativistic case), but the latter equations also admit an infinite number of other solutions. Besides, by taking Hermitian conjugates of (some of) the equations (6.18) or (6.20) and applying generalized Jacobi identities, like

\[ \begin{aligned} &\alpha [[A, B]_\xi, C]_\eta + \xi\eta [[A, C]_{-\alpha/\xi}, B]_{-\alpha/\eta} - \alpha^2 [[B, C]_{\xi\eta/\alpha}, A]_{1/\alpha} = 0 && \alpha\xi\eta \neq 0 \\ &\beta [A, [B, C]_\alpha]_{-\beta\gamma} + \gamma [B, [C, A]_\beta]_{-\gamma\alpha} + \alpha [C, [A, B]_\gamma]_{-\alpha\beta} = 0 && \alpha, \beta, \gamma = \pm 1 \\ &[[A, B]_\eta, C]_{-} + [[B, C]_\eta, A]_{-} + [[C, A]_\eta, B]_{-} = 0 && \eta = \pm 1 \\ &[[A, B]_\xi, [C, D]_\eta]_{-} = [[[A, B]_\xi, C]_{-}, D]_\eta + \eta [[[A, B]_\xi, D]_{-}, C]_{1/\eta} && \eta \neq 0, \end{aligned} \tag{6.21} \]

one can obtain a number of other (para)commutation relations, for which the reader is referred to [16, 18, 26].
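That (6.18) admits solutions other than the bilinear relations (6.13) can be seen in a small example. The sketch below uses a construction commonly called Green's ansatz, which is brought in here as an outside assumption and is not taken from this text: for parafermi order 2 one sets $b_l = f_l \otimes 1 + 1 \otimes f_l$, where $f_l$ are ordinary fermi annihilation operators (Jordan-Wigner matrices for two discrete labels, another assumption). With $\varepsilon = -1$ it tests (6.18a), (6.18b) and the bilinear anticommutation relation.

```python
from functools import reduce


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lin(ca, a, cb, b):
    """Entrywise linear combination ca*a + cb*b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bracket(a, b, eta):
    """Generalized commutator [a, b]_eta = a b + eta * b a."""
    return lin(1, matmul(a, b), eta, matmul(b, a))


def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def dagger(a):
    return [list(col) for col in zip(*a)]  # real entries: adjoint = transpose


def is_zero(a):
    return all(x == 0 for row in a for x in row)


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


I2, Z, LOWER = identity(2), [[1, 0], [0, -1]], [[0, 1], [0, 0]]


def fermion_modes(n):
    """Jordan-Wigner annihilators f_0..f_{n-1} with {f_i, f_j^dagger} = delta_ij."""
    return [reduce(kron, [Z] * i + [LOWER] + [I2] * (n - i - 1)) for i in range(n)]


EPS, N = -1, 2                  # assumptions: parafermi case, two discrete labels
f = fermion_modes(N)
ONE = identity(2 ** N)
# Green components act on different tensor factors, hence commute with each other.
b = [lin(1, kron(f[l], ONE), 1, kron(ONE, f[l])) for l in range(N)]
bd = [dagger(x) for x in b]


def satisfies_6_18():
    for l in range(N):
        for m in range(N):
            for n in range(N):
                rel_a = lin(1, bracket(b[l], bracket(bd[m], b[n], EPS), -1), -2 * (l == m), b[n])
                rel_b = bracket(b[l], bracket(b[m], b[n], EPS), -1)
                if not (is_zero(rel_a) and is_zero(rel_b)):
                    return False
    return True


def satisfies_bilinear():
    """Tests {b_l, b_m^dagger} = delta_lm * identity, the single-species form of (6.13)."""
    big_one = identity(4 ** N)
    return all(is_zero(lin(1, bracket(b[l], bd[m], +1), -(l == m), big_one))
               for l in range(N) for m in range(N))


print(satisfies_6_18(), satisfies_bilinear())
```

In this model the trilinear relations hold while the bilinear anticommutator is not the identity, which illustrates the remark that (6.13) is only one solution among many.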
Of course, the paracommutation relations (6.16), in particular (6.18) and (6.20) as their stronger versions, do not give the general solution of the trilinear relations (6.12). For instance, one may replace (6.12) with the equations

\[ [a^{+}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon + [a^{+}_m, a^{\dagger-}_n]_\varepsilon] + 2(1+\tau)\delta_{ln} a^{+}_m = 0 \tag{6.22a} \]
\[ [a^{-}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon + [a^{+}_m, a^{\dagger-}_n]_\varepsilon] - 2(1+\tau)\delta_{lm} a^{-}_n = 0 \tag{6.22b} \]

and their Hermitian conjugates, which in terms of the operators $b_l$ and $c_l$ introduced above read

\[ [b_l, [b^\dagger_m, b_n]_\varepsilon + [c^\dagger_m, c_n]_\varepsilon] = 2(1+\tau)\delta_{lm} b_n \tag{6.23a} \]
\[ [c_l, [b^\dagger_m, b_n]_\varepsilon + [c^\dagger_m, c_n]_\varepsilon] = 2(1+\tau)\delta_{lm} c_n, \tag{6.23b} \]

and supplement these relations with equations like (6.18b). Obviously, equations (6.16) convert (6.22) into identities and, consequently, the (standard) paracommutation relations (6.20) provide a solution of (6.23). On the basis of (6.23) or other similar equations that can be obtained by generalizing the ones in (6.10)–(6.12), further research on particular classes of trilinear commutation relations can be done; this, however, is not a subject of the present work.

Let us now pay attention to the fact that equations (6.10), (6.11) and (6.12) are generally different (regardless of the existence of some connections between their solutions). The cause for this is that the momentum operators for the Lagrangians $L'$, $L''$ and $L'''$ are generally different unless some additional restrictions are added to the Lagrangian formalism (see Sect. 4). A necessary and sufficient condition for (6.10)–(6.12) to be identical is

\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_m]_{-\varepsilon} - [a^{+}_m, a^{\dagger-}_m]_{-\varepsilon}] = 0, \tag{6.24} \]

which certainly is valid if the condition (4.9′), viz.

\[ [a^{\dagger+}_m, a^{-}_m]_{-\varepsilon} - [a^{+}_m, a^{\dagger-}_m]_{-\varepsilon} = 0, \tag{6.25} \]

ensuring the uniqueness of the momentum operator, holds.
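If one adopts the standard bilinear commutation relations (6.13), then (6.25), and hence (6.24), is identically valid, but in the framework of, e.g., the paracommutation relations (6.16) (or (6.20) in another form) the equations (6.25) should be postulated to ensure uniqueness of the momentum operator and therefore of the field equations.

On the basis of (6.10) or (6.11) one may invent other types of commutation relations, which will not be investigated in this paper because we shall be interested mainly in the case when (6.10), (6.11) and (6.12) are identical (see (6.24)) or, more generally, when the dynamical variables are unique in the sense pointed out in Sect. 4.

6.2. Restrictions related to the charge operator

The consequences of the Heisenberg relations (5.2), involving the charge operator for a charged field, $q \neq 0$ (and hence τ = 0; see (3.7)), will be examined in this subsection. In terms of creation and annihilation operators it is equivalent to [1, 13–15]

\[ [a^{\pm}_s(\mathbf{k}), Q] = q a^{\pm}_s(\mathbf{k}) \qquad [a^{\dagger\pm}_s(\mathbf{k}), Q] = -q a^{\dagger\pm}_s(\mathbf{k}), \tag{6.26} \]

the values of the polarization indices being specified by (6.5). Substituting here (3.10), we see that, for a charged field, the field equations for the Lagrangians $L'$, $L''$ and $L'''$ (see Sect. 3) respectively are:

\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \bigl\{ [a^{\pm}_s(\mathbf{k}), a^{\dagger+}_t(\mathbf{p}) \circ a^{-}_t(\mathbf{p}) - \varepsilon a^{\dagger-}_t(\mathbf{p}) \circ a^{+}_t(\mathbf{p})] - a^{\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{p}) \bigr\} = 0 \tag{6.27a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \bigl\{ [a^{\dagger\pm}_s(\mathbf{k}), a^{\dagger+}_t(\mathbf{p}) \circ a^{-}_t(\mathbf{p}) - \varepsilon a^{\dagger-}_t(\mathbf{p}) \circ a^{+}_t(\mathbf{p})] + a^{\dagger\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{p}) \bigr\} = 0 \tag{6.27b} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \bigl\{ [a^{\pm}_s(\mathbf{k}), a^{+}_t(\mathbf{p}) \circ a^{\dagger-}_t(\mathbf{p}) - \varepsilon a^{-}_t(\mathbf{p}) \circ a^{\dagger+}_t(\mathbf{p})] + a^{\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{p}) \bigr\} = 0 \tag{6.28a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \bigl\{ [a^{\dagger\pm}_s(\mathbf{k}), a^{+}_t(\mathbf{p}) \circ a^{\dagger-}_t(\mathbf{p}) - \varepsilon a^{-}_t(\mathbf{p}) \circ a^{\dagger+}_t(\mathbf{p})] - a^{\dagger\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{p}) \bigr\} = 0 \tag{6.28b} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \bigl\{ [a^{\pm}_s(\mathbf{k}), [a^{\dagger+}_t(\mathbf{p}), a^{-}_t(\mathbf{p})]_\varepsilon - [a^{+}_t(\mathbf{p}), a^{\dagger-}_t(\mathbf{p})]_\varepsilon] - 2 a^{\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{p}) \bigr\} = 0 \tag{6.29a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \bigl\{ [a^{\dagger\pm}_s(\mathbf{k}), [a^{\dagger+}_t(\mathbf{p}), a^{-}_t(\mathbf{p})]_\varepsilon - [a^{+}_t(\mathbf{p}), a^{\dagger-}_t(\mathbf{p})]_\varepsilon] + 2 a^{\dagger\pm}_s(\mathbf{k}) \delta_{st} \delta^3(\mathbf{k}-\mathbf{p}) \bigr\} = 0. \tag{6.29b} \]

Using (6.27)–(6.29) and (6.8), with η = ε = −1, or simply (6.26), one can easily verify the validity of the equations

\[ [\tilde{P}_\mu, \tilde{Q}] = 0 \qquad [\tilde{L}_{\mu\nu}, \tilde{Q}] = 0 \qquad [\tilde{S}_{\mu\nu}, \tilde{Q}] = 0 \qquad [\tilde{M}_{\mu\nu}, \tilde{Q}] = 0, \tag{6.30} \]

where the operators $\tilde{P}_\mu$, $\tilde{Q}$, $\tilde{S}_{\mu\nu}$, $\tilde{L}_{\mu\nu}$ and $\tilde{M}_{\mu\nu}$ are calculated from one and the same initial Lagrangian according to (3.9)–(3.12). This result confirms the validity of assertion (ii) before (5.24) for the fields considered.

Following the above considerations concerning the momentum operator, we shall now replace the systems of integral equations (6.27)–(6.29) with, respectively, the following stronger systems of algebraic equations (by equating to zero the integrands in (6.27)–(6.29)):

\[ [a^{\pm}_l, a^{\dagger+}_m \circ a^{-}_m - \varepsilon a^{\dagger-}_m \circ a^{+}_m] - \delta_{lm} a^{\pm}_l = 0 \tag{6.31a} \]
\[ [a^{\dagger\pm}_l, a^{\dagger+}_m \circ a^{-}_m - \varepsilon a^{\dagger-}_m \circ a^{+}_m] + \delta_{lm} a^{\dagger\pm}_l = 0 \tag{6.31b} \]
\[ [a^{\pm}_l, a^{+}_m \circ a^{\dagger-}_m - \varepsilon a^{-}_m \circ a^{\dagger+}_m] + \delta_{lm} a^{\pm}_l = 0 \tag{6.32a} \]
\[ [a^{\dagger\pm}_l, a^{+}_m \circ a^{\dagger-}_m - \varepsilon a^{-}_m \circ a^{\dagger+}_m] - \delta_{lm} a^{\dagger\pm}_l = 0 \tag{6.32b} \]
\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon - [a^{+}_m, a^{\dagger-}_m]_\varepsilon] - 2\delta_{lm} a^{\pm}_l = 0 \tag{6.33a} \]
\[ [a^{\dagger\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon - [a^{+}_m, a^{\dagger-}_m]_\varepsilon] + 2\delta_{lm} a^{\dagger\pm}_l = 0. \tag{6.33b} \]

These trilinear commutation relations are similar to (6.10)–(6.12) and, consequently, can be treated in an analogous way.

By invoking (6.8), simple algebra shows that the standard bilinear commutation relations (6.13) convert (6.31)–(6.33) into identities. Thus (6.13) are a stronger version of (6.31)–(6.33) and, in this sense, any type of commutation relations which provides a solution of (6.31)–(6.33) and is compatible with (6.13) is a suitable candidate for generalizing (6.13). To illustrate that idea, we shall proceed with (6.33) in a way similar to the ‘derivation’ of the paracommutation relations from (6.12).

Obviously, the equations (cf.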
(6.14) with τ = 0, as now $q \neq 0$)

\[ [a^{\pm}_l, [a^{+}_m, a^{\dagger-}_m]_\varepsilon] + \delta_{lm} a^{\pm}_m = 0 \tag{6.34a} \]
\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_m]_\varepsilon] - \delta_{lm} a^{\pm}_m = 0 \tag{6.34b} \]

and their Hermitian conjugates provide a solution of (6.33), but, as a direct calculation shows, they do not agree with the standard (anti)commutation relations (6.13). A solution of (6.33) compatible with (6.13) is given by the equations (6.15), with τ = 0 as the field considered is a charged one (see (3.7)). Therefore equations (6.16), with τ = 0, also provide a solution of (6.33) compatible with (6.13), from which it immediately follows that the paracommutation relations (6.20), with τ = 0, convert (6.33) into identities. To conclude, we can say that the paracommutation relations (6.20), in particular their special case (6.13), ensure the simultaneous validity of the Heisenberg relations (5.1) and (5.2) for free scalar, spinor and vector fields.

Similarly to (6.22), one may generalize (6.33) to

\[ [a^{+}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon - [a^{+}_m, a^{\dagger-}_n]_\varepsilon] - 2\delta_{ln} a^{+}_m = 0 \tag{6.35a} \]
\[ [a^{-}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon - [a^{+}_m, a^{\dagger-}_n]_\varepsilon] - 2\delta_{lm} a^{-}_n = 0, \tag{6.35b} \]

which agree with (6.13), (6.15), (6.16) and (6.20), but generally do not agree with (6.22), with τ = 0, unless the equations (6.16), with τ = 0, hold.

More generally, we can assert that (6.33) and (6.12), with τ = 0, hold simultaneously if and only if (6.15), with τ = 0, is fulfilled. From here, again, it follows that the paracommutation relations ensure the simultaneous validity of (5.1) and (5.2).

Let us now say a few words on the uniqueness problem for the Heisenberg equations involving the charge operator. The systems of equations (6.31)–(6.33) are identical iff

\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_m]_{-\varepsilon} + [a^{+}_m, a^{\dagger-}_m]_{-\varepsilon}] = 0, \tag{6.36} \]

which, in particular, is satisfied if the condition

\[ [a^{\dagger+}_m, a^{-}_m]_{-\varepsilon} + [a^{+}_m, a^{\dagger-}_m]_{-\varepsilon} = 0, \tag{6.37} \]

ensuring the uniqueness of the charge operator (see (4.10′)), is valid.
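The simultaneous validity of the momentum and charge Heisenberg relations can be illustrated with explicit matrices. The sketch below is a toy model under several assumptions: fermi operators ($\varepsilon = -1$), a charged field ($\tau = 0$), two discrete labels $l$ with one illustrative momentum component $k_l$ each, and operators $P$ and $Q$ built as $P = \frac{1}{2(1+\tau)} \sum_m k_m ([a^{\dagger+}_m, a^{-}_m]_\varepsilon + [a^{+}_m, a^{\dagger-}_m]_\varepsilon)$ and $Q = \frac{q}{2} \sum_m ([a^{\dagger+}_m, a^{-}_m]_\varepsilon - [a^{+}_m, a^{\dagger-}_m]_\varepsilon)$. These forms are inferred here from the structure of (6.12) and (6.33) and are not quoted from (3.9) or (3.10).

```python
from fractions import Fraction
from functools import reduce


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lin(ca, a, cb, b):
    """Entrywise linear combination ca*a + cb*b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bracket(a, b, eta):
    """Generalized commutator [a, b]_eta = a b + eta * b a."""
    return lin(1, matmul(a, b), eta, matmul(b, a))


def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def dagger(a):
    return [list(col) for col in zip(*a)]  # real entries: adjoint = transpose


def is_zero(a):
    return all(x == 0 for row in a for x in row)


I2, Z, LOWER = [[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [0, 0]]


def fermion_modes(n):
    """Jordan-Wigner annihilators f_0..f_{n-1} with {f_i, f_j^dagger} = delta_ij."""
    return [reduce(kron, [Z] * i + [LOWER] + [I2] * (n - i - 1)) for i in range(n)]


EPS, TAU, CHARGE, L = -1, 0, 1, 2   # assumptions: fermi case, charged field, q = 1
K = [3, 5]                           # illustrative momentum component k_l per label
f = fermion_modes(2 * L)
a_plus = [dagger(f[l]) for l in range(L)]         # a^+_l
adag_minus = [f[l] for l in range(L)]             # a^{dagger -}_l
a_minus = [f[L + l] for l in range(L)]            # a^-_l
adag_plus = [dagger(f[L + l]) for l in range(L)]  # a^{dagger +}_l
DIM = 2 ** (2 * L)


def pair(m, sign):
    """[a^{dagger +}_m, a^-_m]_eps + sign * [a^+_m, a^{dagger -}_m]_eps."""
    return lin(1, bracket(adag_plus[m], a_minus[m], EPS),
               sign, bracket(a_plus[m], adag_minus[m], EPS))


P = [[0] * DIM for _ in range(DIM)]
Q = [[0] * DIM for _ in range(DIM)]
for m in range(L):
    P = lin(1, P, Fraction(K[m], 2 * (1 + TAU)), pair(m, +1))
    Q = lin(1, Q, Fraction(CHARGE, 2), pair(m, -1))


def heisenberg_relations_hold():
    for sign, ops in ((+1, a_plus), (-1, a_minus)):
        for l in range(L):
            momentum = lin(1, bracket(ops[l], P, -1), sign * K[l], ops[l])  # (6.1): [a, P] = -/+ k a
            charge = lin(1, bracket(ops[l], Q, -1), -CHARGE, ops[l])        # (6.26): [a, Q] = q a
            if not (is_zero(momentum) and is_zero(charge)):
                return False
    return is_zero(bracket(P, Q, -1))                                       # (6.30): [P, Q] = 0


print(heisenberg_relations_hold())
```

Exact rational arithmetic (`Fraction`) avoids any tolerance in the comparison; the check covers only the first relation of (6.26) and the first of (6.30).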
Evidently, equations (6.36) and (6.24) are compatible iff

\[ [a^{+}_l, [a^{\dagger\pm}_m, a^{\mp}_m]_{-\varepsilon}] = [a^{-}_l, [a^{\dagger\pm}_m, a^{\mp}_m]_{-\varepsilon}] = 0, \tag{6.38} \]

which is a weaker form of (4.15) ensuring simultaneous uniqueness of the momentum and charge operators.

6.3. Restrictions related to the angular momentum operator(s)

It is now time to investigate the restrictions on the creation and annihilation operators that follow from the Heisenberg relations (5.3) concerning the angular momentum operator. They can be obtained by inserting the equations (3.11) and (3.12) into (5.3). As pointed out in Sect. 5, the resulting equalities, however, depend not only on the particular Lagrangian employed, but also on the geometric nature of the field considered, the last dependence being explicitly given via (5.25) and the polarization functions $\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k})$ and $l^{ss',\pm}_{\mu\nu}(\mathbf{k})$ (see also (3.14)).

Consider the terms containing derivatives in (5.3),

\[ \tilde{L}^{\mathrm{or}}_{\mu\nu} := i\hbar \Bigl( x_\mu \frac{\partial}{\partial x^\nu} - x_\nu \frac{\partial}{\partial x^\mu} \Bigr) \tilde{\varphi}_i(x). \tag{6.39} \]

If $\tilde{\varphi}_i(k)$ denotes the Fourier image of $\tilde{\varphi}_i(x)$, i.e.

\[ \tilde{\varphi}_i(x) = \Lambda \int d^4k\, e^{-\frac{1}{i\hbar} k^\mu x_\mu} \tilde{\varphi}_i(k), \tag{6.40} \]

with Λ being a normalization constant, then the Fourier image of (6.39) is

\[ i\hbar \Bigl( k_\mu \frac{\partial}{\partial k^\nu} - k_\nu \frac{\partial}{\partial k^\mu} \Bigr) \tilde{\varphi}_i(k). \tag{6.41} \]

Comparing this expression with equations (3.12), we see that the terms containing derivatives in (3.12) should be responsible for the term (6.39) in (5.3).^{20} For this reason, we shall suppose that the angular momentum operator $\tilde{M}_{\mu\nu}$ admits a representation

\[ \tilde{M}_{\mu\nu} = \tilde{M}^{\mathrm{or}}_{\mu\nu} + \tilde{M}^{\mathrm{sp}}_{\mu\nu} \tag{6.42} \]

such that the operators $\tilde{M}^{\mathrm{or}}_{\mu\nu}$ and $\tilde{M}^{\mathrm{sp}}_{\mu\nu}$ satisfy the relations (5.4) and (5.5), respectively. Thus we shall replace (5.3) with the stronger system of equations (5.4)–(5.5). Besides, we shall assume that the explicit forms of the operators $\tilde{M}^{\mathrm{or}}_{\mu\nu}$ and $\tilde{M}^{\mathrm{sp}}_{\mu\nu}$ are given via (5.13) and (5.12) for the fields investigated in the present work.

Let us consider at first the ‘orbital’ Heisenberg relations (5.4), which are independent of the particular geometrical nature of the fields studied.
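Substituting (5.13) and (6.40) into (5.4), using that $\tilde{\varphi}_i(\pm k)$, with $k^2 = m^2c^2$, is a linear combination of $\tilde{a}^{\pm}_s(\mathbf{k})$ with classical, not operator-valued, functions of $\mathbf{k}$ as coefficients [1, 13–15] and introducing for brevity the operator

\[ \omega_{\mu\nu}(k) := k_\mu \frac{\partial}{\partial k^\nu} - k_\nu \frac{\partial}{\partial k^\mu}, \tag{6.43} \]

we arrive at the following integro-differential systems of equations:

\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \Bigl\{ (-\omega_{\mu\nu}(p) + \omega_{\mu\nu}(q)) \bigl( [\tilde{a}^{\pm}_s(\mathbf{k}), \tilde{a}^{\dagger+}_t(\mathbf{p}) \circ \tilde{a}^{-}_t(\mathbf{q}) - \varepsilon \tilde{a}^{\dagger-}_t(\mathbf{p}) \circ \tilde{a}^{+}_t(\mathbf{q})] \bigr) \Bigr\} \Big|_{q=p,\ p_0=\sqrt{m^2c^2+\mathbf{p}^2}} = 2(1+\tau) \omega_{\mu\nu}(k) (\tilde{a}^{\pm}_s(\mathbf{k})) \tag{6.44a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \Bigl\{ (-\omega_{\mu\nu}(p) + \omega_{\mu\nu}(q)) \bigl( [\tilde{a}^{\dagger\pm}_s(\mathbf{k}), \tilde{a}^{\dagger+}_t(\mathbf{p}) \circ \tilde{a}^{-}_t(\mathbf{q}) - \varepsilon \tilde{a}^{\dagger-}_t(\mathbf{p}) \circ \tilde{a}^{+}_t(\mathbf{q})] \bigr) \Bigr\} \Big|_{q=p,\ p_0=\sqrt{m^2c^2+\mathbf{p}^2}} = 2(1+\tau) \omega_{\mu\nu}(k) (\tilde{a}^{\dagger\pm}_s(\mathbf{k})) \tag{6.44b} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \Bigl\{ (-\omega_{\mu\nu}(p) + \omega_{\mu\nu}(q)) \bigl( [\tilde{a}^{\pm}_s(\mathbf{k}), \tilde{a}^{+}_t(\mathbf{p}) \circ \tilde{a}^{\dagger-}_t(\mathbf{q}) - \varepsilon \tilde{a}^{-}_t(\mathbf{p}) \circ \tilde{a}^{\dagger+}_t(\mathbf{q})] \bigr) \Bigr\} \Big|_{q=p,\ p_0=\sqrt{m^2c^2+\mathbf{p}^2}} = 2(1+\tau) \omega_{\mu\nu}(k) (\tilde{a}^{\pm}_s(\mathbf{k})) \tag{6.45a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \Bigl\{ (-\omega_{\mu\nu}(p) + \omega_{\mu\nu}(q)) \bigl( [\tilde{a}^{\dagger\pm}_s(\mathbf{k}), \tilde{a}^{+}_t(\mathbf{p}) \circ \tilde{a}^{\dagger-}_t(\mathbf{q}) - \varepsilon \tilde{a}^{-}_t(\mathbf{p}) \circ \tilde{a}^{\dagger+}_t(\mathbf{q})] \bigr) \Bigr\} \Big|_{q=p,\ p_0=\sqrt{m^2c^2+\mathbf{p}^2}} = 2(1+\tau) \omega_{\mu\nu}(k) (\tilde{a}^{\dagger\pm}_s(\mathbf{k})) \tag{6.45b} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \Bigl\{ (-\omega_{\mu\nu}(p) + \omega_{\mu\nu}(q)) \bigl( [\tilde{a}^{\pm}_s(\mathbf{k}), [\tilde{a}^{\dagger+}_t(\mathbf{p}), \tilde{a}^{-}_t(\mathbf{q})]_\varepsilon + [\tilde{a}^{+}_t(\mathbf{p}), \tilde{a}^{\dagger-}_t(\mathbf{q})]_\varepsilon] \bigr) \Bigr\} \Big|_{q=p,\ p_0=\sqrt{m^2c^2+\mathbf{p}^2}} = 4(1+\tau) \omega_{\mu\nu}(k) (\tilde{a}^{\pm}_s(\mathbf{k})) \tag{6.46a} \]
\[ \sum_{t=1}^{2j+1-\delta_{0m}(1-\delta_{0j})} \int d^3\mathbf{p}\, \Bigl\{ (-\omega_{\mu\nu}(p) + \omega_{\mu\nu}(q)) \bigl( [\tilde{a}^{\dagger\pm}_s(\mathbf{k}), [\tilde{a}^{\dagger+}_t(\mathbf{p}), \tilde{a}^{-}_t(\mathbf{q})]_\varepsilon + [\tilde{a}^{+}_t(\mathbf{p}), \tilde{a}^{\dagger-}_t(\mathbf{q})]_\varepsilon] \bigr) \Bigr\} \Big|_{q=p,\ p_0=\sqrt{m^2c^2+\mathbf{p}^2}} = 4(1+\tau) \omega_{\mu\nu}(k) (\tilde{a}^{\dagger\pm}_s(\mathbf{k})), \tag{6.46b} \]

where $k_0 = \sqrt{m^2c^2 + \mathbf{k}^2}$ is set after the differentiations are performed (see (6.43)).

20 The terms proportional to the momentum operator in (3.12) disappear if the creation and annihilation operators (2.9) in the Heisenberg picture are employed (see also [13–15]).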
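Following the procedure of the previous considerations, we replace the integro-differential equations (6.44)–(6.46) with the following differential ones:

\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\pm}_l, \tilde{a}^{\dagger+}_m \circ \tilde{a}^{-}_n - \varepsilon \tilde{a}^{\dagger-}_m \circ \tilde{a}^{+}_n] \bigr) \big|_{n=m} = 2(1+\tau) \delta_{lm} \omega^\circ_{\mu\nu}(l) (\tilde{a}^{\pm}_l) \tag{6.47a} \]
\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\dagger\pm}_l, \tilde{a}^{\dagger+}_m \circ \tilde{a}^{-}_n - \varepsilon \tilde{a}^{\dagger-}_m \circ \tilde{a}^{+}_n] \bigr) \big|_{n=m} = 2(1+\tau) \delta_{lm} \omega^\circ_{\mu\nu}(l) (\tilde{a}^{\dagger\pm}_l) \tag{6.47b} \]
\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\pm}_l, \tilde{a}^{+}_m \circ \tilde{a}^{\dagger-}_n - \varepsilon \tilde{a}^{-}_m \circ \tilde{a}^{\dagger+}_n] \bigr) \big|_{n=m} = 2(1+\tau) \delta_{lm} \omega^\circ_{\mu\nu}(l) (\tilde{a}^{\pm}_l) \tag{6.48a} \]
\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\dagger\pm}_l, \tilde{a}^{+}_m \circ \tilde{a}^{\dagger-}_n - \varepsilon \tilde{a}^{-}_m \circ \tilde{a}^{\dagger+}_n] \bigr) \big|_{n=m} = 2(1+\tau) \delta_{lm} \omega^\circ_{\mu\nu}(l) (\tilde{a}^{\dagger\pm}_l) \tag{6.48b} \]
\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\pm}_l, [\tilde{a}^{\dagger+}_m, \tilde{a}^{-}_n]_\varepsilon + [\tilde{a}^{+}_m, \tilde{a}^{\dagger-}_n]_\varepsilon] \bigr) \big|_{n=m} = 4(1+\tau) \delta_{lm} \omega^\circ_{\mu\nu}(l) (\tilde{a}^{\pm}_l) \tag{6.49a} \]
\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\dagger\pm}_l, [\tilde{a}^{\dagger+}_m, \tilde{a}^{-}_n]_\varepsilon + [\tilde{a}^{+}_m, \tilde{a}^{\dagger-}_n]_\varepsilon] \bigr) \big|_{n=m} = 4(1+\tau) \delta_{lm} \omega^\circ_{\mu\nu}(l) (\tilde{a}^{\dagger\pm}_l), \tag{6.49b} \]

where we have set (cf. (6.43))

\[ \omega^\circ_{\mu\nu}(l) := \omega_{\mu\nu}(k) = k_\mu \frac{\partial}{\partial k^\nu} - k_\nu \frac{\partial}{\partial k^\mu} \qquad \text{if } l = (s, \mathbf{k}) \tag{6.50} \]

and $k_0 = \sqrt{m^2c^2 + \mathbf{k}^2}$ is set after the differentiations are performed.

Remark. Instead of (6.47)–(6.49) one can write similar equations in which the operator $-\omega^\circ_{\mu\nu}(m)$ or $+\omega^\circ_{\mu\nu}(n)$ is deleted and the factor $+\tfrac{1}{2}$ or $-\tfrac{1}{2}$, respectively, is added on their right-hand sides. These manipulations correspond to an integration by parts of some of the terms in (6.44)–(6.46).

The main difference between the obtained trilinear relations and the ones considered above is that they are partial differential equations of first order.

The relations (6.49) agree with the equations (6.16) in the sense that if (6.16) hold, then (6.49) become identically valid. Indeed, since

\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) (\tilde{a}^{\pm}_m \delta_{ln}) \big|_{n=m} = -2\delta_{lm} \omega^\circ_{\mu\nu}(m) (\tilde{a}^{\pm}_m) \qquad (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) (\tilde{a}^{\pm}_n \delta_{lm}) \big|_{n=m} = +2\delta_{lm} \omega^\circ_{\mu\nu}(m) (\tilde{a}^{\pm}_m) \tag{6.51} \]

due to (6.50), (6.43) and the equality $\frac{d\delta(x)}{dx} f(x) = -\delta(x) \frac{df(x)}{dx}$ for a $C^1$ function $f$, the application of the operator $(-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n))$ to (6.16) and subsequent setting $n = m$ entails (6.49). In particular, this means that the paracommutation relations (6.20) and, moreover, the standard (anti)commutation relations (6.13) convert (6.49) into identities.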
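Therefore the ‘orbital’ Heisenberg relations (5.4) hold for scalar, spinor and vector fields satisfying the bilinear or para commutation relations.

It should be noted that the paracommutation relations are not the only trilinear commutation relations that are solutions of (6.49). As an example, we shall present the trilinear relations

\[ [a^{+}_l, [a^{+}_m, a^{\dagger-}_n]_\varepsilon] = [a^{+}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon] = -(1+\tau)\delta_{ln} a^{+}_m \tag{6.52a} \]
\[ [a^{-}_l, [a^{+}_m, a^{\dagger-}_n]_\varepsilon] = [a^{-}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon] = +(1+\tau)\delta_{lm} a^{-}_n, \tag{6.52b} \]

which reduce to (6.14) for $n = m$, do not agree with (6.13), but convert (6.49) into identities (see (6.51)). Another example is provided by the equations (6.22), which are compatible with the paracommutation relations and, as a result of (6.51), convert (6.49) into identities.

Prima facie one may suppose that any solution of (6.12) provides a solution of (6.49), but this is not the general case. A counterexample is provided by the commutation relations

\[ [a^{\pm}_l, [a^{\dagger+}_m, a^{-}_n]_\varepsilon + [a^{+}_m, a^{\dagger-}_n]_\varepsilon] \pm 2(1+\tau)\delta_{ln} a^{\pm}_m = 0, \tag{6.53} \]

which reduce to (6.12) for $n = m$, satisfy (6.49) with $\tilde{a}^{+}_l$ for $\tilde{a}^{\pm}_l$, and do not satisfy (6.49) with $\tilde{a}^{-}_l$ for $\tilde{a}^{\pm}_l$ (see (6.51) and cf. (6.22)).

From (5.13) it follows that the operator $\tilde{M}^{\mathrm{or}}_{\mu\nu}$ is independent of the Lagrangian $L'$, $L''$ or $L'''$ one starts from if and only if (see (4.11))

\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\dagger+}_m, \tilde{a}^{-}_n]_{-\varepsilon} - [\tilde{a}^{+}_m, \tilde{a}^{\dagger-}_n]_{-\varepsilon} \bigr) \big|_{n=m} = 0. \tag{6.54} \]

This condition ensures the coincidence of the systems of equations (6.47), (6.48) and (6.49) too. However, the necessary and sufficient condition for the coincidence of these systems is expressed by the weaker equations

\[ (-\omega^\circ_{\mu\nu}(m) + \omega^\circ_{\mu\nu}(n)) \bigl( [\tilde{a}^{\pm}_l, [\tilde{a}^{\dagger+}_m, \tilde{a}^{-}_n]_{-\varepsilon} - [\tilde{a}^{+}_m, \tilde{a}^{\dagger-}_n]_{-\varepsilon}] \bigr) \big|_{n=m} = 0. \tag{6.55} \]

It is now time to consider the ‘spin’ Heisenberg relations (5.5). Recall that the field operators $\varphi_i$ for the fields considered here admit a representation [13–15]

\[ \varphi_i = \Lambda \sum_t \int d^3\mathbf{p}\, \bigl\{ v^{t,+}_i(\mathbf{p}) a^{+}_t(\mathbf{p}) + v^{t,-}_i(\mathbf{p}) a^{-}_t(\mathbf{p}) \bigr\}, \tag{6.56} \]

where Λ is a normalization constant and $v^{t,\pm}_i(\mathbf{p})$ are classical, not operator-valued, complex or real functions which are linearly independent. The particular definition of $v^{t,\pm}_i(\mathbf{p})$ depends on the geometrical nature of $\varphi_i$ and can be found in [13–15] (see also [1]), where the reader can also find a number of relations satisfied by $v^{t,\pm}_i(\mathbf{p})$. Here we shall mention only that $v^{t,\pm}_i(\mathbf{p}) = 1$ for a scalar field and $v^{t,+}_i(\mathbf{p}) = v^{t,-}_i(\mathbf{p}) =: v^{t}_i(\mathbf{p}) = (v^{t}_i(\mathbf{p}))^*$ for a vector field.

The explicit forms of the polarization functions $\sigma^{ss',\pm}_{\mu\nu}(\mathbf{k})$ and $l^{ss',\pm}_{\mu\nu}(\mathbf{k})$ (see Sect. 3, in particular (3.14)) through $v^{s,\pm}_i(\mathbf{k})$ are [13–15]:

\[ \sigma^{ss',\pm}_{\mu\nu}(\mathbf{k}) = \frac{(-1)^j}{j+\delta_{j0}} \sum_{i,i'} (v^{s,\pm}_i(\mathbf{k}))^* I^{i}_{i'\mu\nu} v^{s',\pm}_{i'}(\mathbf{k}) \qquad l^{ss',\pm}_{\mu\nu}(\mathbf{k}) = \frac{(-1)^j}{2j+\delta_{j0}} \sum_{i} (v^{s,\pm}_i(\mathbf{k}))^* \Bigl( \overleftrightarrow{k_\mu \frac{\partial}{\partial k^\nu}} - \overleftrightarrow{k_\nu \frac{\partial}{\partial k^\mu}} \Bigr) v^{s',\pm}_i(\mathbf{k}), \tag{6.57} \]

with the exception that $\sigma^{ss',\pm}_{0a}(\mathbf{k}) = \sigma^{ss',\pm}_{a0}(\mathbf{k}) = 0$, $a = 1, 2, 3$, for a spinor field, $j = \tfrac{1}{2}$ [14]. Evidently, the equations (3.14) follow from the mentioned facts (see also (5.25)).

Substituting (6.56) and (5.12) into (5.5), we obtain the following systems of integral equations (corresponding respectively to the Lagrangians $L'$, $L''$ and $L'''$):

\[ \frac{(-1)^{j+1} j}{1+\tau} \sum_{s,s',t} \int d^3\mathbf{k}\, d^3\mathbf{p}\, v^{t,\pm}_i(\mathbf{p}) \Bigl\{ \bigl(\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k})\bigr) [a^{\pm}_t(\mathbf{p}), a^{\dagger+}_s(\mathbf{k}) \circ a^{-}_{s'}(\mathbf{k})] + \bigl(\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k})\bigr) [a^{\pm}_t(\mathbf{p}), a^{\dagger-}_s(\mathbf{k}) \circ a^{+}_{s'}(\mathbf{k})] \Bigr\} = \sum_{i',t} \int d^3\mathbf{p}\, I^{i'}_{i\mu\nu} v^{t,\pm}_{i'}(\mathbf{p}) a^{\pm}_t(\mathbf{p}) \tag{6.58} \]
\[ \varepsilon \frac{(-1)^{j+1} j}{1+\tau} \sum_{s,s',t} \int d^3\mathbf{k}\, d^3\mathbf{p}\, v^{t,\pm}_i(\mathbf{p}) \Bigl\{ \bigl(\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k})\bigr) [a^{\pm}_t(\mathbf{p}), a^{+}_{s'}(\mathbf{k}) \circ a^{\dagger-}_s(\mathbf{k})] + \bigl(\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k})\bigr) [a^{\pm}_t(\mathbf{p}), a^{-}_{s'}(\mathbf{k}) \circ a^{\dagger+}_s(\mathbf{k})] \Bigr\} = \sum_{i',t} \int d^3\mathbf{p}\, I^{i'}_{i\mu\nu} v^{t,\pm}_{i'}(\mathbf{p}) a^{\pm}_t(\mathbf{p}) \tag{6.59} \]
\[ \frac{(-1)^{j+1} j}{2(1+\tau)} \sum_{s,s',t} \int d^3\mathbf{k}\, d^3\mathbf{p}\, v^{t,\pm}_i(\mathbf{p}) \Bigl\{ \bigl(\sigma^{ss',-}_{\mu\nu}(\mathbf{k}) + l^{ss',-}_{\mu\nu}(\mathbf{k})\bigr) [a^{\pm}_t(\mathbf{p}), [a^{\dagger+}_s(\mathbf{k}), a^{-}_{s'}(\mathbf{k})]_\varepsilon] + \bigl(\sigma^{ss',+}_{\mu\nu}(\mathbf{k}) + l^{ss',+}_{\mu\nu}(\mathbf{k})\bigr) [a^{\pm}_t(\mathbf{p}), [a^{\dagger-}_s(\mathbf{k}), a^{+}_{s'}(\mathbf{k})]_\varepsilon] \Bigr\} = \sum_{i',t} \int d^3\mathbf{p}\, I^{i'}_{i\mu\nu} v^{t,\pm}_{i'}(\mathbf{p}) a^{\pm}_t(\mathbf{p}). \tag{6.60} \]

In contrast to all previously considered systems of integral equations, like (6.2)–(6.4), (6.27)–(6.29) and (6.44)–(6.46), the systems (6.58)–(6.60) cannot be replaced by ones consisting of algebraic (or differential) equations. The cause for this state of affairs is that polarization modes with arbitrary $s$ and $s'$ enter (6.58)–(6.60) and, generally, one cannot ‘diagonalize’ the integrand(s) with respect to $s$ and $s'$; moreover, for a vector field, the modes with $s = s'$ are not present at all (see (3.14)).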
That is why no commutation relations can be extracted from (6.58)–(6.60) unless further assumptions are made. Without going into details, below we shall sketch the proof of the assertion that the commutation relations (6.16) convert (6.60) into identities for massive spinor and vector fields.21 In particular, this entails that the paracommutation and the bilinear commutation relations provide solutions of (6.60). Let (6.16) holds. Combining it with (6.60), we see that the latter splits into the equations 21 The equations (6.58)–(6.60) are identities for scalar fields as for them Iµν = 0 and v i (k) = 1, which reflects the absents of spin for these fields. Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 31 (−1)jj 1 + τ i (p) τ(σst,−µν (p) + l µν (p)) + ε(σ µν (p) + l µν (p)) a+s (p), (p)a+s (p) (6.61a) (−1)j+1j 1 + τ i (p) (σts,−µν (p) + l µν (p)) + ετ(σ µν (p) + l µν (p)) a−s (p), i′ (p)a s (p). (6.61b) Inserting here (6.57), we see that one needs the explicit definition of v i (k) and formulae for sums like ρii′(k) := i (k)(v (k))∗, which are specific for any particular field and can be found in [13–15]. In this way, applying (5.25), (3.7) and the mentioned results from [13–15], one can check the validity of (6.61) for massive fields in a way similar to the proof of (5.3) in [13–15] for scalar, spinor and vector fields, respectively. We shall end the present subsection with the remark that the equations (4.17) and (4.18), which together with (4.15) ensure the uniqueness of the spin and orbital operators, are sufficient conditions for the coincidence of the equations (6.58), (6.59) and (6.60). 7. Inferences To begin with, let us summarize the major conclusions from Sect. 6. Each of the Heisenberg equations (5.1)–(5.3), the equations (5.3) being split into (5.4) and (5.5), induces in a natural way some relations that the creation and annihilation operators should satisfy. 
These relations can be chosen as algebraic trilinear ones in the case of (5.1) and (5.2) (see (6.10)–(6.12) and (6.31)–(6.33), respectively). For (5.4) and (5.5), however, they need not be algebraic: they are differential equations in the case of (5.4) (see (6.47)–(6.49)) and integral equations in the case of (5.5) (see (6.58)–(6.60)). It was pointed out that the cited relations depend on the initial Lagrangian from which the theory is derived, unless some explicitly written conditions hold (see (6.24), (6.37) and (6.55)); in particular, these conditions are satisfied if the equations (4.9)–(4.13), which ensure the uniqueness of the corresponding dynamical operators, are valid. Since the ‘charge symmetric’ Lagrangians (3.4) seem to be the ones that best describe free fields, the (commutation) relations (6.12), (6.33), (6.49) and (6.60) arising from them were studied in more detail. It was proved that the trilinear commutation relations (6.16) convert them into identities, as a result of which the paracommutation relations (6.20) and, in particular, the bilinear commutation relations (6.13) possess the same property. Examples of trilinear commutation relations which are neither ordinary nor para ones were presented; some of them, like (6.14), (6.34) and (6.52), do not agree with (6.13), while others, like (6.16), (6.22) and (6.35), generalize (6.20) and hence are compatible with (6.13). Finally, it was demonstrated that the commutators between the dynamical variables (see (5.15)–(5.23)) are uniquely defined if a Heisenberg relation for one of the operators entering in them is postulated.

The chief aim of the present section is to explore whether all of the reasonable conditions that were mentioned in the previous sections and can be imposed on the creation and annihilation operators can hold simultaneously.
This problem is suggested by the strong evidence that the relations (5.1)–(5.3) and (5.15)–(5.23), with a possible exception of (5.3) (more precisely, of (5.5)) in the massless case, should be valid in a realistic quantum field theory [1, 3, 7, 8, 11, 12]. Besides, to the arguments in loc. cit. we shall add the requirement for uniqueness of the dynamical variables (see Sect. 4).

As it was shown in Sect. 6, the relations (5.1), (5.2), (5.4) and (5.5) are compatible if one starts from a charge symmetric Lagrangian (see (3.4)), which best describes a free field theory; in particular, the commutation relations (6.16) (and hence (6.20) and (6.13)) ensure their simultaneous validity.22 For that reason, we shall investigate below only commutation relations for which (5.1), (5.2), (5.4) and (5.5) hold. It will be assumed that they are such that the equations (6.10)–(6.12), (6.31)–(6.33), (6.47)–(6.49) and (6.58)–(6.60), respectively, hold.

22 The special case(s) when (5.5) may not hold for a massless field will not be considered below.

Consider now the problem of the uniqueness of the dynamical variables and its consistency with the commutation relations just mentioned for a charged field. It will be assumed that this uniqueness is ensured via the equations (4.9)–(4.11). The equation (4.15), viz.

$$[a^{\dagger\pm}_m, a^{\mp}_m]_{-\varepsilon} = 0, \tag{7.1}$$

is a necessary and sufficient condition for the uniqueness of the momentum and charge operators (see Sect. 4 and the notation introduced at the beginning of Sect. 6). Before commenting on this relation, we would like to derive some consequences of it.

Applying consecutively (6.8) for $\eta = -\varepsilon$, (7.1) and the identity

$$[A, B\circ C]_{+} = [A,B]_{\eta}\circ C - \eta\, B\circ[A,C]_{-\eta}, \qquad \eta = \pm 1, \tag{7.2}$$

for $\eta = +\varepsilon, -\varepsilon$, we obtain

$$\begin{aligned}
{}[a^{+}_m, [a^{+}_m, a^{\dagger-}_m]_{\varepsilon}] &= [a^{\dagger-}_m, [a^{+}_m, a^{+}_m]_{-\varepsilon}]_{+} = (1-\varepsilon)\,[a^{\dagger-}_m, a^{+}_m]_{\varepsilon}\circ a^{+}_m \\
[a^{-}_m, [a^{\dagger+}_m, a^{-}_m]_{\varepsilon}] &= \varepsilon\,[a^{\dagger+}_m, [a^{-}_m, a^{-}_m]_{-\varepsilon}]_{+} = \varepsilon(1-\varepsilon)\,[a^{\dagger+}_m, a^{-}_m]_{\varepsilon}\circ a^{-}_m .
\end{aligned} \tag{7.3}$$

Forming the sum and difference of (6.12a), for $\tau = 0$, and (6.33a), we see that the system of equations they form is equivalent to

$$[a^{+}_l, [a^{\dagger+}_m, a^{-}_m]_{\varepsilon}] = 0 \qquad [a^{-}_l, [a^{+}_m, a^{\dagger-}_m]_{\varepsilon}] = 0 \tag{7.4a}$$

$$[a^{+}_l, [a^{+}_m, a^{\dagger-}_m]_{\varepsilon}] + 2\delta_{lm} a^{+}_l = 0 \qquad [a^{-}_l, [a^{\dagger+}_m, a^{-}_m]_{\varepsilon}] - 2\delta_{lm} a^{-}_l = 0. \tag{7.4b}$$

Combining (7.4b), for $l = m$, with (7.3), we get

$$(1-\varepsilon)\,[a^{\dagger-}_m, a^{+}_m]_{\varepsilon}\circ a^{+}_m + 2a^{+}_m = 0 \qquad \varepsilon(1-\varepsilon)\,[a^{\dagger+}_m, a^{-}_m]_{\varepsilon}\circ a^{-}_m - 2a^{-}_m = 0. \tag{7.5}$$

Obviously, these equations reduce to

$$a^{\pm}_m = 0 \tag{7.6}$$

for bose fields, as for them $\varepsilon = +1$ (see (3.7)) and the factor $1-\varepsilon$ vanishes. Since the operators (7.6) describe a completely unobservable field or, more precisely, the absence of any field, the obtained result means that the theory considered cannot describe any really existing physical field with spin $j = 0, 1$. Such a conclusion should be regarded as a contradiction in the theory. For fermi fields, $j = \frac{1}{2}$ and $\varepsilon = -1$, the equations (7.5) have solutions different from (7.6) iff $a^{\pm}_m$ are degenerate operators, i.e. operators with no inverse, in which case (7.4a) is a consequence of (7.5) and (7.1) (see (6.8) and (7.3) too).

The source of the above contradiction is the equation (7.1), which does not agree with the bilinear commutation relations (6.13) and contradicts the existing correlation between creation and annihilation of particles with identical characteristics ($m = (t,\mathbf{p})$ in our case), as (7.1) can be interpreted physically as mutual independence of the acts of creation and annihilation of such particles [1, § 10.1].

At this point, there are two ways of ‘repairing’ the theory. On one hand, one can forget about the uniqueness of the dynamical variables (in the sense of Sect. 4), after which the formalism can be developed by choosing, e.g., the charge symmetric Lagrangians (3.4) and following the usual Lagrangian formalism; in fact, this is the way the parafield theory is built [16, 18].
On the other hand, one may try to change something in the foundations of the theory in such a way that the uniqueness of the dynamical variables is ensured automatically. We shall follow the second method.

As a guiding idea, we shall have in mind that the bilinear commutation relations (6.13) and the normal ordering procedure related to them provide a base for the present-day quantum field theory, which describes sufficiently well the discovered elementary particles/fields. Against this background, an extensive exploration of commutation relations which are incompatible with (6.13) is justified only if some evidence appears for fields/particles that can be described via them. In that connection it should be recalled [17, 18] that, as it seems, all known particles/fields are described via (6.13) and none of them is a para particle/field.

Using the notation introduced at the beginning of Sect. 4, we shall look for a linear mapping (operator) $\mathcal{E}$ on the operator space over the system’s Hilbert space $\mathcal{F}$ of states such that

$$\mathcal{E}(\mathcal{D}') = \mathcal{E}(\mathcal{D}''). \tag{7.7}$$

As it was shown in Sect. 4, an example of an operator $\mathcal{E}$ is provided by the normal ordering operator $\mathcal{N}$. Therefore an operator satisfying (7.7) always exists. To any such operator $\mathcal{E}$ there corresponds a set of dynamical variables defined via

$$\mathcal{D} = \mathcal{E}(\mathcal{D}'). \tag{7.8}$$

Let us examine the properties that the mapping $\mathcal{E}$ should possess due to the requirement (7.7).

First of all, as the operators of the dynamical variables should be Hermitian, we shall require

$$\big(\mathcal{E}(B)\big)^{\dagger} = \mathcal{E}(B^{\dagger}) \tag{7.9}$$

for any operator $B$, which entails

$$\mathcal{D}^{\dagger} = \mathcal{D}, \tag{7.10}$$

due to (3.9)–(3.12) and (7.8).

As in Sect. 4, we shall replace the so-arising integral equations with corresponding algebraic ones. Thus the equations (4.5)–(4.20) remain valid if the operator $\mathcal{E}$ is applied to their left hand sides.

Consider the general case of a charged field, $q \neq 0$. The analogue of (4.15) reads

$$\mathcal{E}\big([a^{\dagger\pm}_m, a^{\mp}_m]_{-\varepsilon}\big) = 0, \tag{7.11}$$

which ensures the uniqueness of the momentum and charge operators.
Respectively, the condition (4.11) transforms into (−ω◦µν(m) + ω◦µν(n)) E([a†+m , a−n ]−ε)− E([a+m, a†−n ]−ε) = 0, (7.12) which, by means of (7.11) can be rewritten as (cf. (4.16)) ω◦µν(n) E([a†+m , a−n ]−ε)− E([a+m, a†−n ]−ε) = 0. (7.13) At the end, equations (4.17) and (4.18) now should be written as µν (k) E [a†+s (k), a (k)]−ε + σss µν (k) E [a†−s (k), a (k)]−ε = 0 (7.14) µν (k) E [a†+s (k), a (k)]−ε + lss µν (k) E [a†−s (k), a (k)]−ε = 0. (7.15) Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 34 These equations can be satisfied if we generalize (7.11) to (cf. (4.20)) [a†±s (k), a (k)]−ε = 0 (7.16) for any s and s′. At last, the following stronger version of (7.16) [a†±m , a n ]−ε = 0, (7.17) for any m = (t,p) and n = (r, q), ensures the validity of (7.14) and (7.15) and thus of the uniqueness of all dynamical variables. It is time now to call attention to the possible commutation relations. The replacement D′, D′′, D′′′ 7→ D := E(D′) = E(D′′) = E(D′′′) results in corresponding changes in the whole of the material of Sect. 6. In particular, the systems of commutation relations (6.10)– (6.12), (6.31)–(6.33), (6.47)–(6.49) and (6.58)–(6.60) should be replaced respectively with:23 a±l , E(a m ◦ a−m) + ε E(a†−m ◦ a+m) ± (1 + τ)δlma±l = 0 (7.18) a±l , E(a m ◦ a−m)− ε E(a†−m ◦ a+m) − δlma±l = 0 (7.19) (−ω◦µν(m) + ω◦µν(n))([ã±l , E(ã m ◦ ã−n )− ε E(ã†−m ◦ ã+n )] ) = 2(1 + τ)δlmω µν(l)(ã (7.20) (−1)j+1j 1 + τ s,s′,t i (p) µν (k) + l ss′,− µν (k))[a t (p), E(a†+s (k) ◦ a−s′(k))] + (σss µν (k) + l ss′,+ µν (k))[a t (p), E(a†−s (k) ◦ a+s′(k))] d3pIi i′ (p)a t (p). (7.21) Due to the uniqueness conditions (7.11)–(7.14), one can rewrite the terms E(a†±m ◦ a∓m) in (7.18)–(7.21) in a number of equivalent ways; e.g. (see (7.11)) E(a†±m ◦ a∓m) = ε E(a∓m ◦ a†±m ) = E([a†±m , a∓m]ε). (7.22) Consider the general case of a charged field, q 6= 0 (and hence τ = 0). 
The system of equations (7.18)–(7.19) is then equivalent to , E(a†±m ◦ a∓m) = 0 (7.23a) , E(a†−m ◦ a+m) + εδlma = 0 (7.23b) a−l , E(a m ◦ a−m) − δlma−l = 0. (7.23c) These (commutation) relations ensure the simultaneous fulfillment of the Heisenberg rela- tions (5.1) and (5.2) involving the momentum and charge operators, respectively. To ensure also the validity of (7.20), with τ = 0, and, consequently, of (5.4), we generalize (7.23) to a±l , E(a m ◦ a∓n ) = 0 (7.24a) , E(a†−m ◦ a+n ) + εδlma n = 0 (7.24b) , E(a†+m ◦ a−n ) − δlma−n = 0, (7.24c) for any l = (s,k), m = (t,p) and n = (t, q) (see also (6.51)). In the way pointed in Sect. 6, one can verify that (7.24) for any l = (s,k), m = (t,p) and n = (r,p) entails (7.21) and hence (5.5). At last, to ensure the validity of all of the mentioned conditions and a 23 To save some space, we do not write the Hermitian conjugate of the below-written equations. Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 35 suitable transition to a case of Hermitian field, for which q = 0 and τ = 1 (see (3.7)), we generalize (7.24) to a+l , E(a m ◦ a−n ) + τδlna m = 0 (7.25a) a−l , E(a m ◦ a+n ) − ετδlna−m = 0 (7.25b) , E(a†−m ◦ a+n ) + εδlma n = 0, (7.25c) , E(a†+m ◦ a−n ) − δlma−n = 0 (7.25d) where l, m and n are arbitrary. As a result of (7.17), which we assume to hold, and τa τa±l (see (3.7)), the equations (7.25a) and (7.25c) (resp. (7.25b) and (7.25d)) become identical when τ = 1 (and hence a l = a l ); for τ = 0 the system (7.25) reduces to (7.24). Recalling that ε = (−1)2j (see (3.7)), we can rewrite (7.25) in a more compact form as a±l , E(a m ◦ a∓n ) + (±1)2j+1τδlna±m = 0 (7.26a) a±l , E(a m ◦ a±n ) − (∓1)2j+1τδlma±n = 0. (7.26b) Since the last equation is equivalent to (see (7.17)) and use that ε = (−1)2j) , E(a±m ◦ a†∓n ) + (±1)2j+1δlna±m = 0, (7.26b′) it is evident that the equations (7.26a) and (7.26b) coincide for a neutral field. 
Let us draw the main moral from the above considerations: the equations (7.17) are sufficient conditions for the uniqueness of the dynamical variables, while (7.26) are sufficient conditions for the validity of the Heisenberg relations (5.1)–(5.5), in which the dynamical variables are redefined according to (7.8). So, any set of operators $a^{\pm}_l$ and $\mathcal{E}$ which are simultaneous solutions of (7.17) and (7.26) ensures the uniqueness of the dynamical variables and, at the same time, the validity of the Heisenberg relations.

Consider the uniqueness problem for the solutions of the system of equations consisting of (7.17) and (7.26). Writing (7.17) as

$$\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n) = \varepsilon\,\mathcal{E}(a^{\mp}_n\circ a^{\dagger\pm}_m) = \tfrac{1}{2}\,\mathcal{E}\big([a^{\dagger\pm}_m, a^{\mp}_n]_{\varepsilon}\big), \tag{7.27}$$

which reduces to (7.22) for $n = m$, and using $\varepsilon = (-1)^{2j}$ (see (3.7)), one can verify that (7.26) is equivalent to

$$\big[a^{+}_l, \mathcal{E}([a^{+}_m, a^{\dagger-}_n]_{\varepsilon})\big] + 2\delta_{ln} a^{+}_m = 0 \tag{7.28a}$$

$$\big[a^{+}_l, \mathcal{E}([a^{\dagger+}_m, a^{-}_n]_{\varepsilon})\big] + 2\tau\delta_{ln} a^{+}_m = 0 \tag{7.28b}$$

$$\big[a^{-}_l, \mathcal{E}([a^{+}_m, a^{\dagger-}_n]_{\varepsilon})\big] - 2\tau\delta_{lm} a^{-}_n = 0 \tag{7.28c}$$

$$\big[a^{-}_l, \mathcal{E}([a^{\dagger+}_m, a^{-}_n]_{\varepsilon})\big] - 2\delta_{lm} a^{-}_n = 0. \tag{7.28d}$$

The similarity between this system of equations and (6.16) is more than evident: (7.28) can be obtained from (6.16) by replacing $[\cdot,\cdot]_{\varepsilon}$ with $\mathcal{E}([\cdot,\cdot]_{\varepsilon})$.

As it was said earlier, the bilinear commutation relations (6.13) and the identification of $\mathcal{E}$ with the normal ordering operator $\mathcal{N}$,

$$\mathcal{E} = \mathcal{N}, \tag{7.29}$$

convert (7.27)–(7.28) into identities; by invoking (6.8), for $\eta = -\varepsilon$, the reader can check this via a direct calculation (see also (4.23)). However, this is not the only possible solution of (7.27)–(7.28). For example, if, in the particular case, one defines an ‘anti-normal’ ordering operator $\mathcal{A}$ as a linear mapping such that

$$\begin{aligned}
\mathcal{A}(a^{+}_m\circ a^{\dagger-}_n) &:= \varepsilon\, a^{\dagger-}_n\circ a^{+}_m & \mathcal{A}(a^{\dagger+}_m\circ a^{-}_n) &:= \varepsilon\, a^{-}_n\circ a^{\dagger+}_m \\
\mathcal{A}(a^{-}_m\circ a^{\dagger+}_n) &:= a^{-}_m\circ a^{\dagger+}_n & \mathcal{A}(a^{\dagger-}_m\circ a^{+}_n) &:= a^{\dagger-}_m\circ a^{+}_n ,
\end{aligned} \tag{7.30}$$

then the bilinear commutation relations (6.13) and the setting $\mathcal{E} = \mathcal{A}$ provide a solution of (7.27)–(7.28); to prove this, apply (6.8) for $\eta = -\varepsilon$.
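The two solutions just named can be spot-checked numerically in a toy setting. The following is only an illustrative sketch and not part of the original derivation: it assumes two fermionic particle modes (ε = −1) realised by Jordan-Wigner matrices, so that the bilinear relations (6.13) hold exactly in a 4-dimensional space, and it evaluates the left-hand side of (7.28a) for E = N and for E = A. All function and variable names are hypothetical.

```python
# Toy check of (7.28a) for two fermionic particle modes (epsilon = -1).
# Assumption: Jordan-Wigner matrices stand in for a^+_l and a^{dagger -}_l.

EPS = -1

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product of two matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def combine(a, b, s):
    """Entrywise a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    """Entrywise c*a."""
    return [[c * x for x in row] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]                  # maps |1> to |0>

ANN = [kron(LOWER, I2), kron(Z, LOWER)]   # a^{dagger -}_l for l = 0, 1
CRE = [transpose(x) for x in ANN]         # a^+_l (real matrices: dagger = transpose)

def normal(m, n):
    # N([a^+_m, a^{dagger -}_n]_eps) = 2 a^+_m a^{dagger -}_n
    return scale(2, matmul(CRE[m], ANN[n]))

def antinormal(m, n):
    # A([a^+_m, a^{dagger -}_n]_eps) = 2 eps a^{dagger -}_n a^+_m, by (7.30)
    return scale(2 * EPS, matmul(ANN[n], CRE[m]))

def residual(ordered, l, m, n):
    """Left-hand side of (7.28a): [a^+_l, E([.,.]_eps)] + 2 delta_ln a^+_m."""
    x = ordered(m, n)
    comm = combine(matmul(CRE[l], x), matmul(x, CRE[l]), -1)
    return combine(comm, scale(2 if l == n else 0, CRE[m]), 1)

def vanishes(ordered):
    """True if the residual is the zero matrix for every l, m, n."""
    return all(v == 0
               for l in range(2) for m in range(2) for n in range(2)
               for row in residual(ordered, l, m, n) for v in row)

ok_normal = vanishes(normal)
ok_antinormal = vanishes(antinormal)
print(ok_normal, ok_antinormal)
```

Integer matrices are used so that the comparison with zero is exact; a bosonic check would need an infinite-dimensional space and is therefore not attempted here.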
Evidently, a linear combination of N and A, together with (6.13), also provides a solution of (7.27)–(7.28).24 Other solution of the same system of equations is given by E = id and operators a± satisfying (6.16), in particular the paracommutation relations (6.20), and a m ◦ a,∓n = εa,∓n ◦ a†±m . The problem for the general solution of (7.27)–(7.28) with respect to E and a±l is open at present. Let us introduce the particle and antiparticle number operators respectively by (see (7.27), (7.9) and (3.16)) Nl := = E(a+ ◦ a†− ) = (Nl)† =: N †l †Nl := l , a = E(a†+l ◦ a l ) = ( †Nl)† =: †Nl†. (7.31) As a result of the commutation relations (7.28), with n = m, they satisfy the equations25 [Nl, a+m]− = δlma+l (7.32a) [ †Nl, a+m]− = τδlma+l (7.32b) [Nl, a†+m ]− = τδlma l (7.32c) [ †Nl, a†+m ]− = δlma l . (7.32d) Combining (3.9)–(3.12) and (5.11)–(5.13) with (7.8), (7.27) and (7.31), we get the following expressions for the operators of the (redefined) dynamical variables: P̃µ = 1 + τ m2c2+k2 (Nl + †Nl) l = (s,k) (7.33) Q̃ = q (−Nl + †Nl) (7.34) S̃µν = (−1)j−1/2j~ 1 + τ {εσmn,+µν Nnm + σmn,−µν †Nmn)} m=(s,k) n=(s′,k) (7.35) L̃µν = x0µ P̃ν − x0 ν P̃µ + (−1)j−1/2j~ 1 + τ {εlmn,+µν Nnm + lmn,−µν †Nmn)} m=(s,k) n=(s′,k) 2(1 + τ) −ω◦µν(l) + ω◦µν(m) (Nl + †Nl) m=l=(s,k) (7.36) M̃spµν = (−1)j−1/2j~ 1 + τ {ε(σmn,+µν + lmn,+µν )Nnm + (σmn,−µν + lmn,−µν ) †Nmn)} m=(s,k) n=(s′,k) (7.37) M̃orµν = 2(1 + τ) −ω◦µν(l) + ω◦µν(m) (Nl + †Nl) m=l=(s,k) . (7.38) 24 If we admit a± to satisfy the ‘anomalous” bilinear commutation relations (8.27) (see below), i.e. (6.13) with ε for −ε and (±1)2j for (±1)2j+1, then E = N , A also provides a solution of (7.27)–(7.28). However, as it was demonstrated in [13–15], the anomalous commutation relations are rejected if one works with the charge symmetric Lagrangians (3.4). 25 The equations (7.32a) and (7.32b) correspond to (7.28a) and (7.28b), respectively, and (7.32c) and (7.32d) correspond to the Hermitian conjugate to (7.28c) and (7.28d), respectively. 
Here $\omega^{\circ}_{\mu\nu}(l)$ is defined via (6.50), we have set

$$\sigma^{mn,\pm}_{\mu\nu} := \sigma^{ss',\pm}_{\mu\nu}(\mathbf{k}) \qquad l^{mn,\pm}_{\mu\nu} := l^{ss',\pm}_{\mu\nu}(\mathbf{k}) \qquad \text{for } m = (s,\mathbf{k}) \text{ and } n = (s',\mathbf{k}), \tag{7.39}$$

and (see (7.27))

$$\begin{aligned}
\mathcal{N}_{lm} &:= \tfrac{1}{2}\,\mathcal{E}\big([a^{+}_l, a^{\dagger-}_m]_{\varepsilon}\big) = \mathcal{E}(a^{+}_l\circ a^{\dagger-}_m) = (\mathcal{N}_{ml})^{\dagger} =: \mathcal{N}^{\dagger}_{ml} \\
{}^{\dagger}\mathcal{N}_{lm} &:= \tfrac{1}{2}\,\mathcal{E}\big([a^{\dagger+}_l, a^{-}_m]_{\varepsilon}\big) = \mathcal{E}(a^{\dagger+}_l\circ a^{-}_m) = ({}^{\dagger}\mathcal{N}_{ml})^{\dagger} =: {}^{\dagger}\mathcal{N}^{\dagger}_{ml}
\end{aligned} \tag{7.40}$$

are respectively the particle and antiparticle transition operators (cf. [26, sec. 1] in the case of parafields). Obviously, we have

$$\mathcal{N}_l = \mathcal{N}_{ll} \qquad {}^{\dagger}\mathcal{N}_l = {}^{\dagger}\mathcal{N}_{ll}. \tag{7.41}$$

The choice (7.29), evidently, reduces (7.33)–(7.36) to (4.24), (4.25), (4.28) and (4.29), respectively. In terms of the operators (7.40), the commutation relations (7.28) can equivalently be rewritten as (see also (7.9))

$$[\mathcal{N}_{lm}, a^{+}_n]_{-} = \delta_{mn} a^{+}_l \tag{7.42a}$$

$$[{}^{\dagger}\mathcal{N}_{lm}, a^{+}_n]_{-} = \tau\delta_{mn} a^{+}_l \tag{7.42b}$$

$$[\mathcal{N}_{lm}, a^{\dagger+}_n]_{-} = \tau\delta_{mn} a^{\dagger+}_l \tag{7.42c}$$

$$[{}^{\dagger}\mathcal{N}_{lm}, a^{\dagger+}_n]_{-} = \delta_{mn} a^{\dagger+}_l. \tag{7.42d}$$

If $m = l$, these relations reduce to (7.32), due to (7.41).

We shall end this section with the remark that the conditions for the uniqueness of the dynamical variables and the validity of the Heisenberg relations are quite general and are not enough for fixing some commutation relations, regardless of the number of additional assumptions made to reduce these conditions to the system of equations (7.27)–(7.28).

8. State vectors, vacuum and mean values

Until now we have looked at the commutation relations only from a purely mathematical viewpoint. In this way, making a number of assumptions, we arrived at the system (7.27)–(7.28) of commutation relations. Further specialization of this system is, however, almost impossible without making contact with physics. For this purpose, we have to recall [1, 3, 11, 12] that the physically measurable quantities are the mean (expectation) values of the dynamical variables (in some state) and the transition amplitudes between different states.
To draw some conclusions from these basic assumptions of quantum theory, we must say rigorously how the states are described as vectors in the system’s Hilbert space $\mathcal{F}$ of states, on which all operators considered act. For this purpose, we shall need the notion of the vacuum or, more precisely, the assumption of the existence of a unique vacuum state (vector) (known also as the no-particle condition). Before rigorously defining this state, which will be denoted by $\mathcal{X}_0$, we shall heuristically analyze the properties it should possess.

First of all, the vacuum state vector $\mathcal{X}_0$ should represent a state of the field without any particles. From here two conclusions may be drawn: (i) as a field is thought of as a collection of particles and a ‘missing’ particle should have vanishing dynamical variables, those of the vacuum should vanish too (or, more generally, be finite constants, which can be set equal to zero by rescaling some of the theory’s parameters) and (ii) since the operators $a^{-}_l$ and $a^{\dagger-}_l$ are interpreted as ones that annihilate a particle characterized by $l = (s,\mathbf{k})$ and charge $-q$ or $+q$, respectively, and one cannot destroy an ‘absent’ particle, these operators should transform the vacuum into the zero vector, which may be interpreted as a complete absence of the field. Thus, we can expect that

$$\mathcal{D}(\mathcal{X}_0) = 0 \tag{8.1a}$$

$$a^{-}_l(\mathcal{X}_0) = 0 \qquad a^{\dagger-}_l(\mathcal{X}_0) = 0. \tag{8.1b}$$

Further, as the operators $a^{+}_l$ and $a^{\dagger+}_l$ are interpreted as ones creating a particle characterized by $l = (s,\mathbf{k})$ and charge $-q$ or $+q$, respectively, state vectors like $a^{+}_l(\mathcal{X}_0)$ and $a^{\dagger+}_l(\mathcal{X}_0)$ should correspond to 1-particle states. Of course, a necessary condition for this is

$$\mathcal{X}_0 \neq 0, \tag{8.2}$$

due to which the vacuum can be normalized to unity,

$$\langle \mathcal{X}_0 | \mathcal{X}_0 \rangle = 1, \tag{8.3}$$

where $\langle\cdot|\cdot\rangle : \mathcal{F}\times\mathcal{F} \to \mathbb{C}$ is the Hermitian scalar (inner) product of $\mathcal{F}$. More generally, if $\mathcal{M}(a^{+}_{l_1}, a^{\dagger+}_{l_2}, \ldots)$ is a monomial only in $i \in \mathbb{N}$ creation operators, the vector

$$\psi_{l_1 l_2 \ldots} := \mathcal{M}(a^{+}_{l_1}, a^{\dagger+}_{l_2}, \ldots)(\mathcal{X}_0) \tag{8.4}$$

may be expected to describe an $i$-particle state (with $i_1$ particles and $i_2$ antiparticles, $i_1 + i_2 = i$, where $i_1$ and $i_2$ are the numbers of operators $a^{+}_l$ and $a^{\dagger+}_l$, respectively, in $\mathcal{M}(a^{+}_{l_1}, a^{\dagger+}_{l_2}, \ldots)$). Moreover, as a free field is intuitively thought of as a collection of particles and antiparticles, it is natural to suppose that the vectors (8.4) form a basis in the Hilbert space $\mathcal{F}$. But the validity of this assumption depends on the accepted commutation relations; for its proof, when the paracommutation relations are adopted, see the proof of [18, p. 26, theorem I-1].

Accepting the last assumption and recalling that the transition amplitude between two states is represented via the scalar product of the state vectors corresponding to them, it is clear that the calculation of such an amplitude requires an effective procedure for calculating scalar products of the form

$$\langle \psi_{l_1 l_2 \ldots} | \varphi_{m_1 m_2 \ldots} \rangle := \langle \mathcal{X}_0 | (\mathcal{M}(a^{+}_{l_1}, a^{\dagger+}_{l_2}, \ldots))^{\dagger} \circ \mathcal{M}'(a^{+}_{m_1}, a^{\dagger+}_{m_2}, \ldots)\mathcal{X}_0 \rangle, \tag{8.5}$$

with $\mathcal{M}$ and $\mathcal{M}'$ being monomials only in the creation operators. Similarly, for the computation of the mean value of some dynamical operator $\mathcal{D}$ in a certain state, one should be equipped with a method for calculating scalar products like

$$\langle \psi_{l_1 l_2 \ldots} | \mathcal{D}\varphi_{m_1 m_2 \ldots} \rangle := \langle \mathcal{X}_0 | (\mathcal{M}(a^{+}_{l_1}, a^{\dagger+}_{l_2}, \ldots))^{\dagger} \circ \mathcal{D} \circ \mathcal{M}'(a^{+}_{m_1}, a^{\dagger+}_{m_2}, \ldots)\mathcal{X}_0 \rangle. \tag{8.6}$$

Supposing, for the moment, the vacuum to be defined via (8.1), let us analyze (8.1)–(8.6). Besides, the validity of (7.27)–(7.28) will be assumed.

From the expressions (7.8) and (3.9)–(3.12) for the dynamical variables, it is clear that the condition (8.1a) can be satisfied if

$$\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n)(\mathcal{X}_0) = 0, \tag{8.7}$$

which, in view of (7.27), is equivalent to any one of the equations

$$\mathcal{E}(a^{\pm}_m\circ a^{\dagger\mp}_n)(\mathcal{X}_0) = 0 \tag{8.8a}$$

$$\mathcal{E}\big([a^{\pm}_m, a^{\dagger\mp}_n]_{\varepsilon}\big)(\mathcal{X}_0) = 0. \tag{8.8b}$$

Equation (8.7) is quite natural as it expresses the vanishing of all modes of the vacuum corresponding to different polarizations, 4-momentum and charge. It will be accepted hereafter.
By means of (8.8) and the commutation relations (7.28) in the form (7.42), in particu- lar (7.32), one can explicitly calculate the action of any one of the operators (7.33)–(7.38) on the vectors (8.4): for the purpose one should simply to commute the operators Nlm (or Nl = Nll) with the creation operators in (8.4) according to (7.42) (resp. (7.32)) until they act on the vacuum and, hence, giving zero, as a result of (8.8) and (7.42) (resp. (7.32)). In particular, we have the equations (k0 = m2c2 + k2): a+l (X0) = kµa l (X0) P̃µ l (X0) = kµa l (X0) l = (s,k) (8.9) = −qa+ (X0) Q̃ = +qa (X0) (8.10) l=(s,k) (−1)j−1/2j~ 1 + τ {εσlm,+µν + τσml,−µν } m=(t,k) a+m|m=(t,k)(X0) l=(s,k) (−1)j−1/2j~ 1 + τ {ετσlm,+µν + σml,−µν } m=(t,k) a†+m |m=(t,k)(X0) (8.11) l=(s,k) = (x0 µkν − x0 νkµ)(a+l )(X0)− i~ ω◦µν(l)(a (−1)j−1/2j~ 1 + τ {εllm,+µν + τ lml,−µν } m=(t,k) a+m|m=(t,k)(X0) l=(s,k) = (x0 µkν − x0 νkµ)(a†+l )(X0)− i~ ω◦µν(l)(a (−1)j−1/2j~ 1 + τ {ετ llm,+µν + lml,−µν } m=(t,k) a†+m |m=(t,k)(X0) (8.12) M̃spµν l=(s,k) (−1)j−1/2j~ 1 + τ {ε(σlm,+µν + llm,+µν ) + τ(σml,−µν + l µν )} m=(t,k) a+m|m=(t,k)(X0) M̃spµν l=(s,k) (−1)j−1/2j~ 1 + τ {ετ(σlm,+µν + llm,+µν ) + (σml,−µν + l µν )} m=(t,k) a†+m |m=(t,k)(X0) (8.13) M̃orµν ã+l (X0) = −i~ ω◦µν(l)(ã (X0) M̃orµν l (X0) = −i~ ω◦µν(l)(ã (X0). (8.14) These equations and similar, but more complicated, ones with an arbitrary monomial in the creation operators for a+ are the base for the particle interpretation of the quantum theory of free fields. For instance, in view of (8.9) and (8.10), the state vectors a+l (X0) and l (X0) are interpreted as ones representing particles with 4-momentum ( m2c2 + k2,k) and charges −q and +q, respectively; similar multiparticle interpretation can be given to the general vectors (8.4) too. The equations (8.9)–(8.12) completely agree with similar ones obtained in [13–15] on the base of the bilinear commutation relations (6.13). 
By means of (8.7), the expression (8.6) can be represented as a linear combination of terms like (8.5). Indeed, as $\mathcal{D}$ is a linear combination of terms like $\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n)$, by means of the relations (7.28) we can commute each of these terms with the creation (resp. annihilation) operators in the monomial $\mathcal{M}'(a^{+}_{m_1}, a^{\dagger+}_{m_2}, \ldots)$ (resp. $(\mathcal{M}(a^{+}_{l_1}, a^{\dagger+}_{l_2}, \ldots))^{\dagger} = \mathcal{M}''(a^{\dagger-}_{l_1}, a^{-}_{l_2}, \ldots)$), thus moving them to the right (resp. left) until they act on the vacuum $\mathcal{X}_0$, giving the zero vector (see (8.7)). In this way the matrix elements of the dynamical variables, in particular their mean values, can be expressed as linear combinations of scalar products of the form (8.5). Therefore the supposition (8.7) reduces the computation of mean values of dynamical variables to that of the vacuum mean value of a product (composition) of creation and annihilation operators in which the former operators stand to the right of the latter ones. (Such a product of creation and annihilation operators can be called their ‘antinormal’ product; cf. the properties (7.30) of the antinormal ordering operator $\mathcal{A}$.)

The calculation of such mean values, like (8.5) for states $\psi, \varphi \neq \mathcal{X}_0$, however, cannot be done (on the base of (7.27)–(7.28), (8.7) and (8.1a)) unless additional assumptions are made. For this purpose one needs some kind of commutation relations by means of which the creation (resp. annihilation) operators on the r.h.s. of (8.5) can be moved to the left (resp. right) until they act on the left (resp. right) vacuum vector $\mathcal{X}_0$; as a result of this operation, the expressions between the two vacuum vectors in (8.5) should transform into a linear combination of constant terms and terms with no contribution to (8.5). (Examples of the last type of terms are $\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n)$ and normally ordered products of creation and annihilation operators.)
An alternative procedure may consist in defining axiomatically the values of all or some of the mean values (8.5) or, more strongly, the explicit action on the vacuum of all or some of the operators entering the r.h.s. of (8.5).26 Clearly, both proposed schemes should be consistent with the relations (7.27)–(7.28), (8.1b) and (8.7)–(8.8).

Let us summarize the problem before us: the operator $\mathcal{E}$ in (7.27)–(7.28) has to be fixed and a method for the computation of scalar products like (8.5) should be given, provided the vacuum vector $\mathcal{X}_0$ satisfies (8.1b), (8.2), (8.3) and (8.7). Two possible ways of exploring this problem were indicated above.

Consider the operator $\mathcal{E}$. Supposing $\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n)$ to be a function only of $a^{\dagger\pm}_m$ and $a^{\mp}_n$, we, in view of (8.1b), can write

$$\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n) = f^{\pm}(a^{\dagger\pm}_m, a^{\mp}_n)\circ b$$

with $b = a^{-}_n$ (upper sign) or $b = a^{\dagger-}_m$ (lower sign) and some functions $f^{\pm}$. Applying (7.27), we obtain (do not sum over $l$)

$$\begin{aligned}
\mathcal{E}(a^{\dagger+}_m\circ a^{-}_l) &= f^{+}(a^{\dagger+}_m, a^{-}_l)\circ a^{-}_l & \mathcal{E}(a^{+}_m\circ a^{\dagger-}_l) &= f^{-}(a^{+}_m, a^{\dagger-}_l)\circ a^{\dagger-}_l \\
\mathcal{E}(a^{-}_l\circ a^{\dagger+}_m) &= \varepsilon f^{+}(a^{\dagger+}_m, a^{-}_l)\circ a^{-}_l & \mathcal{E}(a^{\dagger-}_l\circ a^{+}_m) &= \varepsilon f^{-}(a^{+}_m, a^{\dagger-}_l)\circ a^{\dagger-}_l .
\end{aligned}$$

Since $\mathcal{E}$ is a linear operator, the expression $\mathcal{E}(a^{\dagger\pm}_m\circ a^{\mp}_n)$ turns out to be a linear and homogeneous function of $a^{\dagger\pm}_m$ and $a^{\mp}_n$, which immediately implies $f^{\pm}(A,B) = \lambda^{\pm}A$ for operators $A$ and $B$ and some constants $\lambda^{\pm} \in \mathbb{C}$. For future convenience, we assume $\lambda^{\pm} = 1$, which can be achieved via a suitable renormalization of the creation and annihilation operators.27 Thus, the last equations reduce to

$$\mathcal{E}(a^{\dagger+}_m\circ a^{-}_l) = a^{\dagger+}_m\circ a^{-}_l \qquad \mathcal{E}(a^{+}_m\circ a^{\dagger-}_l) = a^{+}_m\circ a^{\dagger-}_l \tag{8.15a}$$

$$\mathcal{E}(a^{-}_l\circ a^{\dagger+}_m) = \varepsilon\, a^{\dagger+}_m\circ a^{-}_l \qquad \mathcal{E}(a^{\dagger-}_l\circ a^{+}_m) = \varepsilon\, a^{+}_m\circ a^{\dagger-}_l. \tag{8.15b}$$

Evidently, these equations convert (7.27), (8.7) and (8.8) into identities. Comparing (8.15) and (4.22), we see that the identification

$$\mathcal{E} = \mathcal{N} \tag{8.16}$$

of the operator $\mathcal{E}$ with the normal ordering operator $\mathcal{N}$ is quite natural. However, for our purposes, this identification is not necessary as only the equations (8.15), not the general definition of $\mathcal{N}$, will be employed.
26 Such an approach resembles the axiomatic description of the scattering matrix [1, 7, 8].
27 Since $\lambda^{+} = 0$ or/and $\lambda^{-} = 0$ implies $\mathcal{D} = 0$, due to (7.8), these values are excluded for evident reasons.

As a result of (8.15), the commutation relations (7.28) now read:

$$[a^{+}_l, a^{+}_m\circ a^{\dagger-}_n] + \delta_{ln} a^{+}_m = 0 \tag{8.17a}$$

$$[a^{+}_l, a^{\dagger+}_m\circ a^{-}_n] + \tau\delta_{ln} a^{+}_m = 0 \tag{8.17b}$$

$$[a^{-}_l, a^{+}_m\circ a^{\dagger-}_n] - \tau\delta_{lm} a^{-}_n = 0 \tag{8.17c}$$

$$[a^{-}_l, a^{\dagger+}_m\circ a^{-}_n] - \delta_{lm} a^{-}_n = 0. \tag{8.17d}$$

(In a sense, these relations are ‘one half’ of the (para)commutation relations (6.16): the latter are a sum of the former and the ones obtained from (8.17) via the changes $a^{+}_m\circ a^{\dagger-}_n \mapsto \varepsilon\, a^{\dagger-}_n\circ a^{+}_m$ and $a^{\dagger+}_m\circ a^{-}_n \mapsto \varepsilon\, a^{-}_n\circ a^{\dagger+}_m$; the last relations correspond to (7.28) with $\mathcal{E} = \mathcal{A}$, $\mathcal{A}$ being the antinormal ordering operator (see (7.30)). Said differently, up to the rescaling $a^{\pm}_l \mapsto \sqrt{2}\,a^{\pm}_l$ of the operators in (6.16) for all $l$, the relations (8.17) are identical with (6.16) for $\varepsilon = 0$; as noted in [26, the remarks following theorem 2 in sec. 1], this is a quite exceptional case from the viewpoint of parastatistics theory.) By means of (6.8) for $\eta = -\varepsilon$, one can verify that the equations (8.17) agree with the bilinear commutation relations (6.13), i.e. (6.13) convert (8.17) into identities.

The equations (8.15) imply the following explicit forms of the number operators (7.31) and the transition operators (7.40):

$$\mathcal{N}_l = a^{+}_l\circ a^{\dagger-}_l \qquad {}^{\dagger}\mathcal{N}_l = a^{\dagger+}_l\circ a^{-}_l \tag{8.18}$$

$$\mathcal{N}_{lm} = a^{+}_l\circ a^{\dagger-}_m \qquad {}^{\dagger}\mathcal{N}_{lm} = a^{\dagger+}_l\circ a^{-}_m. \tag{8.19}$$

As a result of them, the equations (7.33)–(7.36) are simply a different way of writing (4.24), (4.25), (4.28) and (4.29), respectively.

Let us return to the problem of calculating vacuum mean values of antinormally ordered products like (8.5). In view of (8.1b) and (8.3), the simplest of them are

$$\langle \mathcal{X}_0 | \lambda\, \mathrm{id}_{\mathcal{F}}(\mathcal{X}_0) \rangle = \lambda \qquad \langle \mathcal{X}_0 | \mathcal{M}^{\pm}(\mathcal{X}_0) \rangle = 0 \tag{8.20}$$

where $\lambda \in \mathbb{C}$ and $\mathcal{M}^{+}$ (resp. $\mathcal{M}^{-}$) is any monomial of degree not less than 1 only in the creation (resp. annihilation) operators; e.g. $\mathcal{M}^{\pm} = a^{\pm}_l,\ a^{\dagger\pm}_l,\ a^{\pm}_{l_1}\circ a^{\pm}_{l_2},\ a^{\pm}_{l_1}\circ a^{\dagger\pm}_{l_2}$.
These equations, with $\lambda = 1$, are another form of what is called the stability of the vacuum: if $\mathcal{X}_i$ denotes an $i$-particle state, $i \in \mathbb{N}\cup\{0\}$, then, by virtue of (8.20) and the particle interpretation of (8.4), we have

$$\langle \mathcal{X}_i | \mathcal{X}_0 \rangle = \delta_{i0}, \tag{8.21}$$

i.e. the only non-forbidden transition into (from) the vacuum is from (into) the vacuum. More generally, if $\mathcal{X}_{i',0}$ and $\mathcal{X}_{0,j''}$ denote respectively $i'$-particle and $j''$-antiparticle states, with $\mathcal{X}_{0,0} := \mathcal{X}_0$, then

$$\langle \mathcal{X}_{i',0} | \mathcal{X}_{0,j''} \rangle = \delta_{i'0}\delta_{0j''}, \tag{8.22}$$

i.e. transitions between two states consisting entirely of particles and antiparticles, respectively, are forbidden unless both states coincide with the vacuum. Since we are dealing with free fields, one can expect that the amplitude of a transition from an ($i'$-particle + $j'$-antiparticle) state $\mathcal{X}_{i',j'}$ into an ($i''$-particle + $j''$-antiparticle) state $\mathcal{X}_{i'',j''}$ is

$$\langle \mathcal{X}_{i',j'} | \mathcal{X}_{i'',j''} \rangle = \delta_{i'i''}\delta_{j'j''}, \tag{8.23}$$

but the proof of this hypothesis requires new assumptions (vide infra).

Let us try to employ (8.17) for the calculation of expressions like (8.5). Acting with (8.17) and their Hermitian conjugates on the vacuum, in view of (8.1b), we get

$$\begin{aligned}
a^{+}_m\circ(-a^{\dagger-}_n\circ a^{+}_l + \delta_{ln}\,\mathrm{id}_{\mathcal{F}})(\mathcal{X}_0) &= 0 & a^{\dagger+}_n\circ(a^{-}_m\circ a^{\dagger+}_l - \delta_{lm}\,\mathrm{id}_{\mathcal{F}})(\mathcal{X}_0) &= 0 \\
a^{\dagger+}_m\circ(-a^{-}_n\circ a^{+}_l + \tau\delta_{ln}\,\mathrm{id}_{\mathcal{F}})(\mathcal{X}_0) &= 0 & a^{+}_n\circ(a^{\dagger-}_m\circ a^{\dagger+}_l - \tau\delta_{lm}\,\mathrm{id}_{\mathcal{F}})(\mathcal{X}_0) &= 0.
\end{aligned} \tag{8.24}$$

These equalities, as well as (8.17), cannot help directly to compute vacuum mean values of antinormally ordered products of creation and annihilation operators. But the equations (8.24) suggest the restrictions28

$$\begin{aligned}
a^{\dagger-}_l\circ a^{+}_m(\mathcal{X}_0) &= \delta_{lm}\mathcal{X}_0 & a^{-}_l\circ a^{\dagger+}_m(\mathcal{X}_0) &= \delta_{lm}\mathcal{X}_0 \\
a^{-}_l\circ a^{+}_m(\mathcal{X}_0) &= \tau\delta_{lm}\mathcal{X}_0 & a^{\dagger-}_l\circ a^{\dagger+}_m(\mathcal{X}_0) &= \tau\delta_{lm}\mathcal{X}_0
\end{aligned} \tag{8.25}$$

to be added to the definition of the vacuum. These conditions convert (8.24) into identities and, in this sense, agree with (8.17) and, consequently, with the bilinear commutation relations (6.13).
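As a concrete illustration of the vacuum conditions, the following minimal sketch checks (8.1b), (8.3) and (8.25) in the simplest setting one can write down. It is an added example under stated assumptions, not the author's construction: a charged fermi field (τ = 0) restricted to a single label l, with one particle mode and one antiparticle mode realised by Jordan-Wigner matrices. All names are hypothetical.

```python
# Toy model: one particle mode and one antiparticle mode of a charged
# fermi field (tau = 0), single label l. Jordan-Wigner matrices are an
# assumption made for illustration only.

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def transpose(a):
    return [list(col) for col in zip(*a)]

def apply(a, v):
    """Matrix acting on a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]             # maps |1> to |0>

a_dag_minus = kron(LOWER, I2)        # a^{dagger -}: annihilates the particle
a_minus = kron(Z, LOWER)             # a^-: annihilates the antiparticle
a_plus = transpose(a_dag_minus)      # a^+: creates the particle
a_dag_plus = transpose(a_minus)      # a^{dagger +}: creates the antiparticle

X0 = [1, 0, 0, 0]                    # vacuum: both modes empty
ZERO = [0, 0, 0, 0]

checks = {
    # (8.1b): annihilation operators kill the vacuum
    "8.1b": apply(a_minus, X0) == ZERO and apply(a_dag_minus, X0) == ZERO,
    # (8.3): the vacuum is normalized
    "8.3": sum(x * x for x in X0) == 1,
    # (8.25), first row, with l = m
    "8.25 delta": (apply(a_dag_minus, apply(a_plus, X0)) == X0
                   and apply(a_minus, apply(a_dag_plus, X0)) == X0),
    # (8.25), second row: right-hand sides vanish because tau = 0
    "8.25 tau": (apply(a_minus, apply(a_plus, X0)) == ZERO
                 and apply(a_dag_minus, apply(a_dag_plus, X0)) == ZERO),
}
print(checks)
```

In this model (8.25) follows from the bilinear relations together with (8.1b); the point made in the text is that (8.25) has to be postulated separately when only (8.17) is assumed.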
Recall [16, 18] that the relations (8.25) are similar to the ones accepted in parafield theory and coincide with those for parastatistics of order $p = 1$; however, here we do not suppose the validity of the paracommutation relations (6.20) (or (6.16)). Equipped with (8.25), one is able to calculate the r.h.s. of (8.5) for any monomial $M$ (resp. $M'$) and monomials $M'$ (resp. $M$) of degree 1, $\deg M' = 1$ (resp. $\deg M = 1$).$^{29}$ Indeed, (8.25), (8.1b) and (8.3) entail:

$\langle \mathcal{X}_0 | a^{\dagger-}_l \circ a^+_m(\mathcal{X}_0) \rangle = \langle \mathcal{X}_0 | a^-_l \circ a^{\dagger+}_m(\mathcal{X}_0) \rangle = \delta_{lm}$
$\langle \mathcal{X}_0 | a^-_l \circ a^+_m(\mathcal{X}_0) \rangle = \langle \mathcal{X}_0 | a^{\dagger-}_l \circ a^{\dagger+}_m(\mathcal{X}_0) \rangle = \tau\delta_{lm}$
$\langle \mathcal{X}_0 | (M(a^+_{l_1}, a^{\dagger+}_{l_2}, \dots))^\dagger \circ a^+_m(\mathcal{X}_0) \rangle = \langle \mathcal{X}_0 | (M(a^+_{l_1}, a^{\dagger+}_{l_2}, \dots))^\dagger \circ a^{\dagger+}_m(\mathcal{X}_0) \rangle = 0 \qquad \deg M \ge 2$
$\langle \mathcal{X}_0 | a^-_l \circ M(a^+_{m_1}, a^{\dagger+}_{m_2}, \dots)(\mathcal{X}_0) \rangle = \langle \mathcal{X}_0 | a^{\dagger-}_l \circ M(a^+_{m_1}, a^{\dagger+}_{m_2}, \dots)(\mathcal{X}_0) \rangle = 0 \qquad \deg M \ge 2.$ (8.26)

From here the equation (8.23) follows for $i' + j' = 1$ (resp. $i'' + j'' = 1$) and arbitrary $i''$ and $j''$ (resp. $i'$ and $j'$).

However, as is not difficult to realize, the calculation of (8.5) in cases more general than (8.20) and (8.26) is not possible on the basis of the assumptions made until now.$^{30}$ At this point, one is free to set the r.h.s. of (8.5) in an arbitrary way in the mentioned general case, or to add to (8.17) (and, possibly, (8.25)) other (commutation) relations by means of which the r.h.s. of (8.5) can be calculated explicitly; other approaches, e.g. some mixture of the two just mentioned, for finding the explicit form of (8.5) are evidently also possible. Since expressions like (8.5) are directly connected with observable experimental results, the only criterion for solving the problem of calculating the r.h.s. of (8.5) in the general case can be agreement with the existing experimental data. As is known [1, 3, 11, 12], at present (almost?) all of them are satisfactorily described within the framework of the bilinear commutation relations (6.13). This means that, from a physical point of view, the theory should be considered realistic if the r.h.s.
of (8.5) is the same as if (6.13) are valid, or is reducible to it for some particular realization of an accepted method of calculation, e.g. if one accepts some commutation relations, like the paracommutation ones, which are a generalization of (6.13) and reduce to them as a special case (see, e.g., (6.20)). It should be noted that the conditions (8.1b)–(8.3) and (8.25) are enough for calculating (8.5) if (6.16), or its versions (6.17) or (6.20), are accepted (cf. [16]). The cause of that difference is replacements like $[a^+_m, a^{\dagger-}_n] \mapsto 2a^+_m \circ a^{\dagger-}_n$ when one passes from (6.16) to (8.17); the existence of terms like $a^{\dagger-}_n \circ a^+_m \circ a^+_l$ in (6.16) is responsible for the possibility to calculate (8.5).

$^{28}$ Since the operators $a^\pm_l$ and $a^{\dagger\pm}_l$ are, generally, degenerate (with no inverse ones), we cannot say that (8.24) implies (8.25).

$^{29}$ For $\deg M' = 0$ (resp. $\deg M = 0$), see (8.20).

$^{30}$ It should be noted that the conditions (8.1b)–(8.3) and (8.25) are enough for calculating (8.5) if the relations (6.16), or their version (6.20), are accepted (cf. [16]). The cause of that difference is in replacements like $[a^+_m, a^{\dagger-}_n] \mapsto 2a^+_m \circ a^{\dagger-}_n$ when one passes from (6.16) to (8.17); the existence of terms like $a^{\dagger-}_n \circ a^+_m \circ a^+_l$ in (6.16) is responsible for the possibility to calculate (8.5), in case (6.16) hold.

If evidence appears for events for which (8.5) takes other values, one should look, e.g., for other commutation relations leading to the desired mean values. As an example of the last type, one can point to the following anomalous bilinear commutation relations (cf. (6.13))

$[a^\pm_l, a^\pm_m]_\varepsilon = 0 \qquad [a^{\dagger\pm}_l, a^{\dagger\pm}_m]_\varepsilon = 0$
$[a^\mp_l, a^\pm_m]_\varepsilon = (\pm 1)^{2j}\tau\delta_{lm}\, \mathrm{id}_{\mathcal{F}} \qquad [a^{\dagger\mp}_l, a^{\dagger\pm}_m]_\varepsilon = (\pm 1)^{2j}\tau\delta_{lm}\, \mathrm{id}_{\mathcal{F}}$
$[a^\pm_l, a^{\dagger\pm}_m]_\varepsilon = 0 \qquad [a^{\dagger\pm}_l, a^\pm_m]_\varepsilon = 0$
$[a^\mp_l, a^{\dagger\pm}_m]_\varepsilon = (\pm 1)^{2j}\delta_{lm}\, \mathrm{id}_{\mathcal{F}} \qquad [a^{\dagger\mp}_l, a^\pm_m]_\varepsilon = (\pm 1)^{2j}\delta_{lm}\, \mathrm{id}_{\mathcal{F}},$ (8.27)

which should be imposed after expressions like $\mathcal{E}(a^{\dagger\pm}_m \circ a^\mp_n)$ are explicitly calculated.
These relations convert (8.17) and (8.25) into identities, and by their means the r.h.s. of (8.5) can be calculated explicitly; but, as is well known [1, 3, 11, 12, 27], they lead to deep contradictions in the theory and should therefore be rejected.$^{31}$

At present, it seems, the bilinear commutation relations (6.13) are the only known commutation relations which satisfy all of the mentioned conditions and simultaneously provide an evident procedure for the effective calculation of all expressions of the form (8.5). (Besides, for them and for the paracommutation relations the vectors (8.4) form a base, the Fock base, for the system's Hilbert space of states [18].) In this connection, we want to mention that the paracommutation relations (6.16) (or their conventional version (6.20)), if imposed as additional restrictions on the theory together with (8.17), reduce in this particular case to (6.13), as the conditions (8.25) show that we are dealing with a parafield of order $p = 1$, i.e. with an ordinary field [17, 18].$^{32}$

Ending this section, let us return to the definition of the vacuum $\mathcal{X}_0$. Generally, it depends on the adopted commutation relations. For instance, in the case of the bilinear commutation relations (6.13) it consists of the equations (8.1a)–(8.3), while in the case of the paracommutation relations (6.16) (or other ones generalizing (6.13)) it includes (8.1a)–(8.3) and (8.25).

9. Commutation relations for several coexisting different free fields

Until now we have considered commutation relations for a single free field, which can be a scalar, spinor or vector one. The present section is devoted to a similar treatment of a system consisting of several, not less than two, different free fields. In our context, the fields may differ by their masses and/or charges and/or spins; e.g., the system may consist of a charged scalar field, a neutral scalar field, a massless spinor field, a massive spinor field and a massless neutral vector field.
It is a priori evident that the commutation relations regarding only one field of the system should be as discussed in the previous sections. The problem is to derive or postulate commutation relations concerning different fields. It will be shown that the developed Lagrangian formalism provides a natural base for such an investigation and makes superfluous some of the assumptions made, for example, in [17, p. B 1159, left column] or in [18, sec. 12.1], where systems of different parafields are explored.

$^{31}$ As was demonstrated in [13–15], a quantization like (8.27) contradicts (is rejected by) the charge symmetric Lagrangians (3.4).

$^{32}$ Notice that, as a result of (8.17), the relations (6.16) correspond to (7.28) for $\mathcal{E} = \mathcal{A}$, with $\mathcal{A}$ being the antinormal ordering operator (see (7.30)).

To begin with, let us introduce suitable notation. The indices $\alpha, \beta, \gamma = 1, 2, \dots, N$ will distinguish the different fields of the system, with $N \in \mathbb{N}$, $N \ge 2$, being their number, and the quantities corresponding to them. Let $q^\alpha$ and $j^\alpha$ be respectively the charge and spin of the $\alpha$-th field. Similarly to (3.7), we define

$j^\alpha := \begin{cases} 0 & \text{for a scalar } \alpha\text{-th field} \\ \tfrac{1}{2} & \text{for a spinor } \alpha\text{-th field} \\ 1 & \text{for a vector } \alpha\text{-th field} \end{cases}$
$\tau^\alpha := \begin{cases} 1 & \text{for } q^\alpha = 0 \text{ (neutral (Hermitian) field)} \\ 0 & \text{for } q^\alpha \neq 0 \text{ (charged (non-Hermitian) field)} \end{cases}$
$\varepsilon^\alpha := (-1)^{2j^\alpha} = \begin{cases} +1 & \text{for integer } j^\alpha \text{ (bose fields)} \\ -1 & \text{for half-integer } j^\alpha \text{ (fermi fields).} \end{cases}$ (9.1)

Suppose $L^\alpha$ is the Lagrangian of the $\alpha$-th field. For definiteness, we assume $L^\alpha$ for all $\alpha$ to be given by one and the same set of equations, viz. (3.1), or (3.3), or (3.4). To save some space, below the case (3.4), corresponding to charge symmetric Lagrangians, will be considered in more detail; the reader can explore the other cases as exercises.
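The definitions (9.1) are simple enough to be written as two small functions. The sketch below evaluates them for a sample system modelled on the example given at the start of this section; the field names, the numerical charge values and the choice to treat the spinor field as charged are assumptions made only for illustration.

```python
from fractions import Fraction

def epsilon(spin):
    """epsilon = (-1)^(2j): +1 for integer spin (bose), -1 for half-integer spin (fermi)."""
    twice_spin = 2 * Fraction(spin)
    if twice_spin.denominator != 1 or twice_spin < 0:
        raise ValueError("spin must be a non-negative multiple of 1/2")
    return 1 if twice_spin.numerator % 2 == 0 else -1

def tau(charge):
    """tau = 1 for a neutral (Hermitian) field, 0 for a charged (non-Hermitian) one."""
    return 1 if charge == 0 else 0

# Hypothetical sample system: (name, spin j, charge q); the charges are illustrative.
fields = [
    ("charged scalar", Fraction(0), 1),
    ("neutral scalar", Fraction(0), 0),
    ("charged spinor", Fraction(1, 2), 1),
    ("neutral vector", Fraction(1), 0),
]

# Map each field name to its pair (tau, epsilon) as defined in (9.1).
parameters = {name: (tau(q), epsilon(j)) for name, j, q in fields}
```

Here `parameters["charged spinor"]` is `(0, -1)`, i.e. a charged fermi field, while `parameters["neutral vector"]` is `(1, 1)`, a neutral bose field.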
Since the Lagrangian of our system of free fields is Lα, (9.2) the dynamical variables are Dα (9.3) and the corresponding system of Euler-Lagrange equations consists of the independent equa- tions for each of the fields of the system (see (3.6) with Lα for L). This allows an introduction of independent creation and annihilation operators for each field. The ones for the α-th field will be denoted by a±α,sα(k) and a α,sα(k); notice, the values of the polarization variables generally depend on the field considered and, therefore, they also are labeled with index α for the α-th field. For brevity, we shall use the collective indices lα, mα and nα, with lα := (α, sα,k) etc., in terms of which the last operators are a± and a , respectively. The particular expressions for the dynamical operators Dα are given via (3.9)–(3.12) in which the following changes should be made: τ 7→ τα j 7→ jα ε 7→ εα s 7→ sα s′ 7→ s′α µν (k) 7→ σs αs′α,± µν (k) l ss′,± µν (k) 7→ ls αs′α,± µν (k). (9.4) The content of sections 4 and 5 remains valid mutatis mutandis, viz. provided the just pointed changes (9.4) are made and the (integral) dynamical variables are understood in conformity with (9.3). 9.1. Commutation relations connected with the momentum operator. Problems and their possible solutions In sections 6–8, however, substantial changes occur; for instance, when one passes from (6.12) or (6.15) to (6.16). We shall consider them briefly in a case when one starts from the charge symmetric Lagrangians (3.4). The basic relations (6.12), which arise from the Heisenberg relation (5.1) concerning the momentum operator, now read (here and below, do not sum over α, and/or β and/or γ if the opposite is not indicated explicitly!) a±lα , [a ]εβ + [a ± (1 + τ)δlαmβa±lα = 0 (9.5a) lα , [a ]εβ + [a ± (1 + τ)δlαmβa lα = 0. (9.5b) Bozhidar Z. Iliev: QFT in momentum picture: IV. 
Commutation relations 45 It is trivial to be seen, the following generalizations of respectively (6.14) and (6.15) a±lα , [a ± (1 + τβ)δlαmβa±lα = 0 (9.6a) ± (1 + τβ)δlαmβa±lα = 0 (9.6b) lα , [a ± (1 + τβ)δlαmβa lα = 0 (9.6c) ± (1 + τβ)δlαmβa = 0 (9.6d) a+lα , [a + 2δlαmβa lα = 0 (9.7a) + 2τβδlαmβa = 0 (9.7b) a−lα , [a − 2τβδlαmβa−lα = 0 (9.7c) − 2δlαmβa−lα = 0 (9.7d) provide a solution of (9.5) in a sense that they convert it into identity. As it was said in Sect. 6, the equations (9.6) (resp. (9.7)) for a single field, i.e. for β = α, agree (resp. disagree) with the bilinear commutation relations (6.13). The only problem arises when one tries to generalize, e.g., the relations (9.7) in a way similar to the transition from (6.15) to (6.16). Its essence is in the generalization of expres- sions like [a ]εβ and τ βδlαmβa . When passing from (6.15) to (6.16), the indices l and m are changed so that the obtained equations to be consistent with (6.13); of course, the numbers ε and τ are preserved because this change does not concern the field regarded. But the situation with (9.7) is different in two directions: (i) If we change the pair (mβ,mβ) in [a ]εβ with (m β, nγ), then with what the num- ber εβ should be replace? With εβ , or εγ or with something else? Similarly, if the mentioned changed is performed, with what the multiplier τβ in τβδlαmβa lα should be replaced? The problem is that the numbers εβ and τβ are related to terms like a and a± ◦ a†∓ in the momentum operator, as a whole and we cannot say whether the index β in εβ and τβ originates from the first of second index mβ in these expressions. (ii) When writing (mβ , nγ) for (mβ ,mβ) (see (i) above), then shall we replace δlαmβa with δlαmβa nγ , or δlαnγa , or δmβnγa ? For a single field, γ = β = α, this problem is solved by requiring an agreement of the resulting generalization (of (6.16) in the particular case) with the bilinear commutation relations (6.13). 
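For a single mode of a single neutral field (γ = β = α, τ = 1, l = m = n) the agreement just mentioned can be checked by direct matrix computation. The sketch below assumes that in this case (9.7a) and (9.7c) reduce to [a⁺, [a⁺, a⁻]_ε] + 2a⁺ = 0 and [a⁻, [a⁺, a⁻]_ε] − 2a⁻ = 0, with [A, B]_η := A∘B + ηB∘A, and that (6.13) reduces to [a⁻, a⁺]_{−ε} = id; these reductions, the helper names and the cutoff are assumptions of the example. The fermi mode is represented exactly by 2×2 matrices, while the bose mode is truncated, so its residuals are compared only away from the cutoff.

```python
import numpy as np

def bracket(A, B, eta):
    """[A, B]_eta := A B + eta B A (eta = -1: commutator, eta = +1: anticommutator)."""
    return A @ B + eta * (B @ A)

def single_mode(eps, dim=8):
    """(a_minus, a_plus) for one fermi mode (eps = -1, exact) or one truncated bose mode (eps = +1)."""
    if eps == -1:
        a_minus = np.array([[0.0, 1.0], [0.0, 0.0]])
    else:
        a_minus = np.diag(np.sqrt(np.arange(1, dim)), k=1)
    return a_minus, a_minus.T.copy()

def residuals(eps):
    """Left-hand sides, minus right-hand sides, of the bilinear and the two trilinear relations."""
    a_minus, a_plus = single_mode(eps)
    identity = np.eye(a_minus.shape[0])
    inner = bracket(a_plus, a_minus, eps)                  # [a+, a-]_eps
    bilinear = bracket(a_minus, a_plus, -eps) - identity   # [a-, a+]_{-eps} - id
    tri_plus = bracket(a_plus, inner, -1) + 2 * a_plus     # [a+, [a+, a-]_eps] + 2 a+
    tri_minus = bracket(a_minus, inner, -1) - 2 * a_minus  # [a-, [a+, a-]_eps] - 2 a-
    return bilinear, tri_plus, tri_minus

def max_residual(eps, skip_columns):
    """Largest |entry| of the residuals, ignoring the last `skip_columns` columns (cutoff effects)."""
    keep = slice(None, -skip_columns) if skip_columns else slice(None)
    return max(np.abs(r[:, keep]).max() for r in residuals(eps))

fermi_error = max_residual(-1, skip_columns=0)  # exact representation
bose_error = max_residual(+1, skip_columns=2)   # exclude the states next to the cutoff
```

Both errors vanish up to rounding, which illustrates, for one mode only, that the bilinear relations convert the trilinear ones into identities for bose and fermi statistics alike.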
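So, how shall (6.13) be generalized for several, not less than two, different fields? Obviously, here we meet an obstacle similar to the one described in (i) above, with the only change that $-\varepsilon^\beta$ should stand for $\varepsilon^\beta$. Let $b_{l^\alpha}$ and $c_{l^\alpha}$ denote some creation or annihilation operator of the $\alpha$-th field. Consider the problem of generalizing the (anti)commutator $[b_{l^\alpha}, c_{l^\alpha}]_{\pm\varepsilon^\alpha}$. This means that we are looking for a replacement

$[b_{l^\alpha}, c_{l^\alpha}]_{\pm\varepsilon^\alpha} \mapsto f_\pm(b_{l^\alpha}, c_{m^\beta}; \alpha, \beta),$ (9.8)

where the functions $f_\pm$ are such that

$f_\pm(b_{l^\alpha}, c_{m^\beta}; \alpha, \beta)\big|_{\beta=\alpha} = [b_{l^\alpha}, c_{m^\alpha}]_{\pm\varepsilon^\alpha}.$ (9.9)

Unfortunately, the condition (9.9) is the only restriction on $f_\pm$ that the theory of free fields can provide. Thus the functions $f_\pm$, subjected to equation (9.9), become new free parameters of the quantum theory of different free fields, and it is a matter of convention how to choose or fix them.

It is generally accepted [18, appendix F] that the functions $f_\pm$ should have forms maximally similar to the (anti)commutators they generalize. More precisely, the functions

$f_\pm(b_{l^\alpha}, c_{m^\beta}; \alpha, \beta) = [b_{l^\alpha}, c_{m^\beta}]_{\pm\varepsilon^{\alpha\beta}}$ (9.10)

where $\varepsilon^{\alpha\beta} \in \mathbb{C}$ are such that

$\varepsilon^{\alpha\alpha} = \varepsilon^\alpha,$ (9.11)

are usually considered as the only candidates for $f_\pm$. Notice that, in (9.10), $\varepsilon^{\alpha\beta}$ are functions of $\alpha$ and $\beta$, not of $l^\alpha$ and/or $m^\beta$. Besides, if we assume $\varepsilon^{\alpha\beta}$ to be a function only of $\varepsilon^\alpha$ and $\varepsilon^\beta$, then the general form of $\varepsilon^{\alpha\beta}$ is

$\varepsilon^{\alpha\beta} = u^{\alpha\beta}\varepsilon^\alpha + (1 - u^{\alpha\beta})\varepsilon^\beta + v^{\alpha\beta}(1 - \varepsilon^\alpha\varepsilon^\beta) \qquad u^{\alpha\beta}, v^{\alpha\beta} \in \mathbb{C},$ (9.12)

due to (9.1) and (9.11). (In view of (6.13), the value $\varepsilon^{\alpha\beta} = +1$ (resp. $\varepsilon^{\alpha\beta} = -1$) corresponds to quantization via commutators (resp. anticommutators) of the corresponding fields.)

Let us now turn to the numbers $\tau^\alpha$, which originate from, and are associated with, each term $[b_{l^\alpha}, c_{m^\alpha}]_{\pm\varepsilon^\alpha}$. With every change (9.8) one can associate a replacement

$\tau^\alpha \mapsto g(b_{l^\alpha}, c_{m^\beta}; \alpha, \beta),$ (9.13)

where the function $g$ is such that

$g(b_{l^\alpha}, c_{m^\beta}; \alpha, \beta)\big|_{\beta=\alpha} = \tau^\alpha.$ (9.14)

Of course, the last condition does not define $g$ uniquely and, consequently, the function $g$, satisfying (9.14), enters the theory as a new free parameter. Suppose, as a working hypothesis similar to (9.10)–(9.11), that $g$ is of the form

$g(b_{l^\alpha}, c_{m^\beta}; \alpha, \beta) = \tau^{\alpha\beta},$ (9.15)

where $\tau^{\alpha\beta}$ are complex numbers that may depend only on $\alpha$ and $\beta$ and are such that

$\tau^{\alpha\alpha} = \tau^\alpha.$ (9.16)

Besides, if we suppose $\tau^{\alpha\beta}$ to be functions only of $\tau^\alpha$ and $\tau^\beta$, then

$\tau^{\alpha\beta} = x^{\alpha\beta}\tau^\alpha + y^{\alpha\beta}\tau^\beta + (1 - x^{\alpha\beta} - y^{\alpha\beta})\tau^\alpha\tau^\beta \qquad x^{\alpha\beta}, y^{\alpha\beta} \in \mathbb{C},$ (9.17)

as a result of (9.1) and (9.16).

Let us summarize the above discussion. If we suppose a preservation of the algebraic structure of the bilinear commutation relations (6.13) for a system of different free fields, then the replacements

$[b_{l^\alpha}, c_{l^\alpha}]_{\pm\varepsilon^\alpha} \mapsto [b_{l^\alpha}, c_{m^\beta}]_{\pm\varepsilon^{\alpha\beta}} \qquad \varepsilon^{\alpha\alpha} = \varepsilon^\alpha$ (9.18a)
$\tau^\alpha \mapsto \tau^{\alpha\beta} \qquad \tau^{\alpha\alpha} = \tau^\alpha$ (9.18b)

should be made; accordingly, the relations (6.13) transform into:

$[a^\pm_{l^\alpha}, a^\pm_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = 0 \qquad [a^{\dagger\pm}_{l^\alpha}, a^{\dagger\pm}_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = 0$
$[a^\mp_{l^\alpha}, a^\pm_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = \tau^{\alpha\beta}\delta_{l^\alpha m^\beta}\, \mathrm{id}_{\mathcal{F}} \times \begin{cases} 1 \\ -\varepsilon^{\alpha\beta} \end{cases} \qquad [a^{\dagger\mp}_{l^\alpha}, a^{\dagger\pm}_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = \tau^{\alpha\beta}\delta_{l^\alpha m^\beta}\, \mathrm{id}_{\mathcal{F}} \times \begin{cases} 1 \\ -\varepsilon^{\alpha\beta} \end{cases}$
$[a^\pm_{l^\alpha}, a^{\dagger\pm}_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = 0 \qquad [a^{\dagger\pm}_{l^\alpha}, a^\pm_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = 0$
$[a^\mp_{l^\alpha}, a^{\dagger\pm}_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = \delta_{l^\alpha m^\beta}\, \mathrm{id}_{\mathcal{F}} \times \begin{cases} 1 \\ -\varepsilon^{\alpha\beta} \end{cases} \qquad [a^{\dagger\mp}_{l^\alpha}, a^\pm_{m^\beta}]_{-\varepsilon^{\alpha\beta}} = \delta_{l^\alpha m^\beta}\, \mathrm{id}_{\mathcal{F}} \times \begin{cases} 1 \\ -\varepsilon^{\alpha\beta} \end{cases},$ (9.19)

where 1 (resp. $-\varepsilon^{\alpha\beta}$) in the braces corresponds to the choice of the upper (resp. lower) signs. If we suppose additionally $\varepsilon^{\alpha\beta}$ (resp. $\tau^{\alpha\beta}$) to be a function only of $\varepsilon^\alpha$ and $\varepsilon^\beta$ (resp. of $\tau^\alpha$ and $\tau^\beta$), then these numbers are defined up to two sets of complex parameters:

$\varepsilon^{\alpha\beta} = u^{\alpha\beta}\varepsilon^\alpha + (1 - u^{\alpha\beta})\varepsilon^\beta + v^{\alpha\beta}(1 - \varepsilon^\alpha\varepsilon^\beta) \qquad u^{\alpha\beta}, v^{\alpha\beta} \in \mathbb{C}$ (9.20a)
$\tau^{\alpha\beta} = x^{\alpha\beta}\tau^\alpha + y^{\alpha\beta}\tau^\beta + (1 - x^{\alpha\beta} - y^{\alpha\beta})\tau^\alpha\tau^\beta \qquad x^{\alpha\beta}, y^{\alpha\beta} \in \mathbb{C}.$ (9.20b)

A reasonable further specialization of $\varepsilon^{\alpha\beta}$ and $\tau^{\alpha\beta}$ may be the assumption that their ranges coincide with those of $\varepsilon^\alpha$ and $\tau^\alpha$, respectively. As a result of (9.1), this supposition is equivalent to

$v^{\alpha\beta} = -u^{\alpha\beta},\ -u^{\alpha\beta} + 1,\ u^{\alpha\beta} - 1,\ u^{\alpha\beta} \qquad u^{\alpha\beta} \in \mathbb{C}$ (9.21a)
$(x^{\alpha\beta}, y^{\alpha\beta}) = (0, 0),\ (0, 1),\ (1, 0),\ (1, 1).$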
(9.21b) Other admissible restriction on (9.20) may be the requirement εαβ and ταβ to be symmetric, εαβ(εα, εβ) = εβα(εα, εβ) = εαβ(εβ , εα) (9.22a) ταβ(τα, τβ) = τβα(τα, τβ) = ταβ(τβ, τα), (9.22b) which means that the α-th and β-th fields are treated on equal footing and there is no a priori way to number some of them as the ‘first’ or ‘second’ one.33 In view of (9.20), the conditions (9.22) are equivalent to uαβ = vαβ ∈ C (9.23a) yαβ = xαβ. (9.23b) If both of the restrictions (9.21) and (9.23) are imposed on (9.20), then the arbitrariness of the parameters in (9.20) is reduced to: (uαβ , uαβ) = (9.24a) (xαβ , yαβ) = (0, 0), (1, 1) (9.24b) and, for any fixed pair (α, β), we are left with the following candidates for respectively εαβ and ταβ: (+1 + εα + εβ − εαεβ) (9.25a) (−1 + εα + εβ + εαεβ) (9.25b) 0 := τ α + τβ (9.25c) 1 := τ α + τβ − τατβ. (9.25d) When free fields are considered, as in our case, no further arguments from mathematical or physical nature can help for choosing a particular combination (εαβ , ταβ) from the four possible ones according to (9.25) for a fixed pair (α, β). To end the above considerations of εαβ and ταβ, we have to say that the choice (εαβ , ταβ) = (ε + , τ 0 ) = (+1 + εα + εβ − εαεβ), τα + τβ (9.26) 33 However, nothing can prevent us to make other choices, compatible with (9.18), in the theory of free fields; for instance, one may set εαβ = εαεβεβα and ταβ = 1 (τα + τβ)τβα. Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 48 is known as the normal case [18, appendix F]; in it the relative behavior of bose (resp. fermi) fields is as in the case of a single field, i.e. they are quantized via commutators (resp. anticommutators) as (εαβ , ταβ) = (+1, 0) (resp. (εαβ , ταβ) = (−1, 0)), and the one of bose and fermi field is as in the case of a single fermi field, viz. the quantization is via commutators as (εαβ , ταβ) = (+1, 0). All combinations between ε ± and τ 0,1 different from (9.26) are referred as anomalous cases. 
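The admissible values can be tabulated directly from the general forms (9.20). The sketch below evaluates (9.20a) at u = 1/2, v = ±1/2 and (9.20b) at the two pairs listed in (9.24b). The value u = 1/2 is an assumption of this example: it is what the symmetry requirement (9.22a) gives when applied to (9.20a), and together with (9.21a) it leaves v = ±1/2. The labels "+", "-", "0", "1" mirror the subscripts used in the text, and all function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def eps_pair(eps_a, eps_b, u, v):
    """General form (9.20a) of eps^{alpha beta} in terms of eps^alpha and eps^beta."""
    return u * eps_a + (1 - u) * eps_b + v * (1 - eps_a * eps_b)

def tau_pair(tau_a, tau_b, x, y):
    """General form (9.20b) of tau^{alpha beta} in terms of tau^alpha and tau^beta."""
    return x * tau_a + y * tau_b + (1 - x - y) * tau_a * tau_b

EPS_CHOICES = {"+": (HALF, HALF), "-": (HALF, -HALF)}  # (u, v), assumed as explained above
TAU_CHOICES = {"0": (0, 0), "1": (1, 1)}               # (x, y) as listed in (9.24b)

# eps^{alpha beta} for every choice and every combination of bose (+1) and fermi (-1) fields.
eps_table = {
    (label, ea, eb): eps_pair(ea, eb, u, v)
    for label, (u, v) in EPS_CHOICES.items()
    for ea, eb in product((1, -1), repeat=2)
}

# tau^{alpha beta} for every choice and every combination of neutral (1) and charged (0) fields.
tau_table = {
    (label, ta, tb): tau_pair(ta, tb, x, y)
    for label, (x, y) in TAU_CHOICES.items()
    for ta, tb in product((0, 1), repeat=2)
}
```

With the "+" choice a bose and a fermi field get `eps_table[("+", 1, -1)] == 1`, i.e. they are quantized via commutators, which is the behaviour attributed to the normal case; with the "-" choice the same pair gets −1. In all cases the diagonal values reproduce ε^α and τ^α, as required by (9.11) and (9.16).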
Above we supposed the pair $(\alpha, \beta)$ to be fixed. If $\alpha$ and $\beta$ are arbitrary, the only essential change this implies is in (9.25), where the choice of the subscripts $+$, $-$, 0 and 1 may depend on $\alpha$ and $\beta$. In this general situation, the normal case is defined as the one in which (9.26) holds for all $\alpha$ and $\beta$. All other combinations are referred to as anomalous cases; such are, for instance, the ones in which some fermi and bose operators satisfy anticommutation relations, e.g. (9.19) with $\varepsilon^{\alpha\beta} = -1$ for $\varepsilon^\alpha + \varepsilon^\beta = 0$, or in which some fermi fields are subjected to commutation relations, like (9.19) with $\varepsilon^{\alpha\beta} = +1$ for $\varepsilon^\alpha = \varepsilon^\beta = -1$. For some details on this topic, see, for instance, [18, appendix F], [7, chapter 20] and [27, sect. 4-4].

Fields/operators for which $\varepsilon^{\alpha\beta} = +1$ (resp. $\varepsilon^{\alpha\beta} = -1$), with $\beta \neq \alpha$, are referred to as relative parabose (resp. parafermi) in parafield theory [17, 18]. One can transfer this terminology to the general case and call the fields/operators for which $\varepsilon^{\alpha\beta} = +1$ (resp. $\varepsilon^{\alpha\beta} = -1$), with $\beta \neq \alpha$, relative bose (resp. fermi) fields/operators.

In what follows, the relations (9.19) will be referred to as the multifield bilinear commutation relations, and it will be assumed that they represent the generalization of the bilinear commutation relations (6.13) when we are dealing with several, not less than two, different quantum fields. The particular values of $\varepsilon^{\alpha\beta}$ and $\tau^{\alpha\beta}$ in them are insignificant in the following; if one likes, one can fix them as in the normal case (9.26). Moreover, even the definition (9.19) of $\tau^{\alpha\beta}$ is completely inessential, as $\tau^{\alpha\beta}$ always appears in combinations like $\tau^{\alpha\beta}\delta_{l^\alpha m^\beta}$ (see (9.19) or similar relations, like (9.27), below), which can be non-vanishing only if $\beta = \alpha$, but then $\tau^{\alpha\alpha} = \tau^\alpha$; so one can freely write $\tau^\alpha$ for $\tau^{\alpha\beta}$ in all such cases.

Equipped with (9.19) and (9.18), we can generalize (9.7) in different ways.
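For example, the straightforward generalization of (6.16) is:

$[a^+_{l^\alpha}, [a^+_{m^\beta}, a^{\dagger-}_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] + 2\delta_{l^\alpha n^\gamma}\, a^+_{m^\beta} = 0$ (9.27a)
$[a^+_{l^\alpha}, [a^{\dagger+}_{m^\beta}, a^-_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] + 2\tau^{\alpha\gamma}\delta_{l^\alpha n^\gamma}\, a^+_{m^\beta} = 0$ (9.27b)
$[a^-_{l^\alpha}, [a^+_{m^\beta}, a^{\dagger-}_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] - 2\tau^{\alpha\beta}\delta_{l^\alpha m^\beta}\, a^-_{n^\gamma} = 0$ (9.27c)
$[a^-_{l^\alpha}, [a^{\dagger+}_{m^\beta}, a^-_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] - 2\delta_{l^\alpha m^\beta}\, a^-_{n^\gamma} = 0.$ (9.27d)

However, generally, the relations (9.19) do not convert (9.27) into identities. The reason is that an equality/identity like (cf. (6.8))

$[b_{l^\alpha}, c_{m^\beta} \circ d_{n^\gamma}] = [b_{l^\alpha}, c_{m^\beta}]_{-\varepsilon^{\alpha\beta}} \circ d_{n^\gamma} + \lambda^{\alpha\beta\gamma}\, c_{m^\beta} \circ [b_{l^\alpha}, d_{n^\gamma}]_{-\varepsilon^{\alpha\gamma}},$ (9.28)

where $b_{l^\alpha}$, $c_{m^\beta}$ and $d_{n^\gamma}$ are some creation/annihilation operators and $\lambda^{\alpha\beta\gamma} \in \mathbb{C}$, can be valid only for

$\lambda^{\alpha\beta\gamma} = \varepsilon^{\alpha\beta} \qquad \varepsilon^{\alpha\gamma} = 1/\varepsilon^{\alpha\beta} \quad (\varepsilon^{\alpha\beta} \neq 0),$ (9.29)

which, in particular, is fulfilled if $\gamma = \beta$ and $\varepsilon^{\alpha\beta} = \pm 1$. So, the agreement between (9.19) and (9.27) depends on the concrete choice of the numbers $\varepsilon^{\alpha\beta}$. There exist cases in which even the normal case (9.26) cannot ensure that (9.19) converts (9.27) into identities; e.g. when the $\alpha$-th and $\beta$-th fields are fermion ones and the $\gamma$-th field is a boson one. Moreover, it can be proved that (9.19) and (9.27) are compatible in the general case if unacceptable equalities like $a^\pm_{l^\alpha} \circ a^\pm_{m^\beta} = 0$ hold.

One may call (9.27) the multifield paracommutation relations, as from them a corresponding generalization of (6.18) and/or (6.20) can be derived. For completeness, we shall record the multifield version of (6.20):

$[b_{l^\alpha}, [b^\dagger_{m^\beta}, b_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] = 2\delta_{l^\alpha m^\beta}\, b_{n^\gamma} \qquad [b_{l^\alpha}, [b_{m^\beta}, b_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] = 0$ (9.30a)
$[c_{l^\alpha}, [c^\dagger_{m^\beta}, c_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] = 2\delta_{l^\alpha m^\beta}\, c_{n^\gamma} \qquad [c_{l^\alpha}, [c_{m^\beta}, c_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] = 0$ (9.30b)
$[b^\dagger_{l^\alpha}, [c^\dagger_{m^\beta}, c_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] = -2\tau^{\alpha\gamma}\delta_{l^\alpha n^\gamma}\, b^\dagger_{m^\beta} \qquad [c^\dagger_{l^\alpha}, [b^\dagger_{m^\beta}, b_{n^\gamma}]_{\varepsilon^{\beta\gamma}}] = -2\tau^{\alpha\gamma}\delta_{l^\alpha n^\gamma}\, c^\dagger_{m^\beta}.$ (9.30c)

For details regarding these multifield paracommutation relations, the reader is referred to [17, 18], where the case $\tau^\alpha = \tau^\beta = \tau^{\alpha\beta} = 0$ is considered.

We leave it to the reader as an exercise to write down the multifield versions of the commutation relations (6.22) or (6.23), which provide examples of generalizations of (9.7) and hence of (9.19) and (9.27).

9.2.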
Commutation relations connected with the charge and angular momentum operators In a case of several, not less than two, different fields, the basic trilinear commutation rela- tions (6.33), which ensure the validity of the Heisenberg relation (5.2) concerning the charge operator, read: a±lα , [a ]εβ − [a+mβ , a − 2δlαmβa±lα = 0 (9.31a) lα , [a ]εβ − [a+mβ , a + 2δlαmβa lα = 0. (9.31b) Of course, these relations hold only for those fields which have non-vanishing charges, i.e. in (9.31) is supposed (see (9.1)) τα = 0 τβ = 0 (⇐⇒ qαqβ 6= 0). (9.32) The problem for generalizing (9.31) for these fields is similar to the one for (9.7) in the case of non-vanishing charges, τβ = 0. Without repeating the discussion of Subsect. 9.1, we shall adopt the rule (9.18) for generalizing (anti)commutation relations between cre- ation/annihilation operators of a single field. By its means one can obtain different general- izations of (9.31). For instance, the commutation relations. , a−nγ ]εβγ − [a+mβ , a nγ ]εβγ − 2δlαnγa+mβ = 0 (9.33a) a−lα , [a , a−nγ ]εβγ − [a+mβ , a nγ ]εβγ − 2δlαmβa−nγ = 0 (9.33b) and their Hermitian conjugate contain (9.31) and (6.35) as evident special cases and agree with (9.19) if γ = β and εαβεβγ = +1. Besides, the multifield paracommutation rela- tions (9.27) for charged fields, τα = τβ = τγ = 0, convert (9.33) into identities and, in this sense, (9.33) agree with (contain as special case) (9.27) for charged fields. As an example of commutation relations that do not agree with (9.27) for charged fields and, consequently, with (9.33), we shall point the following ones: a±lα , [a nγ ]εβγ + δlαnγa = 0 (9.34a) , a−nγ ]εβγ− − δlαnγa±mβ = 0, (9.34b) which are a multifield generalization of (6.34). The consideration of commutation relations originating from the ‘orbital’ Heisenberg equation (5.4) is analogous to the one of the same relations regarding the charge operator. 
The multifield version of (6.49) is: (−ω◦µν(mβ) + ω◦µν(nγ))([ã±lα , [ã , ã−nγ ]εβγ + [ã+ nγ ]εβγ ] ) nγ=mβ = 4(1 + ταβ)δlαmβω α)(ã± ) (9.35a) Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 50 (−ω◦µν(mβ) + ω◦µν(nγ))([ã lα , [ã , ã−nγ ]εβγ + [ã+ nγ ]εβγ ] ) nγ=mβ = 4(1 + ταβ)δlαmβω α)(ã lα ) (9.35b) where ω◦µν(l α) := ωµν(k) = kµ if lα = (α, sα,k). (9.36) Applying (6.51), with mβ for m and nγ for n, one can check that the multifield paracom- mutation relations (9.27) convert (9.35) into identities and hence provide a solution of (9.35) and ensure the validity of (5.4), when system of different free fields is considered. An example of a solution of (9.35) which does not agree with (9.27) is provided by the following multifield generalization of (6.52): a+lα , [a nγ ]εβγ a+lα , [a , a−nγ ]εβγ = −(1 + ταγ)δlαnγa+mβ (9.37a) a−lα , [a nγ ]εβγ a−lα , [a , a−nγ ]εβγ = +(1 + ταβ)δlαmβa nγ , (9.37b) which provides a solution of (9.5). Notice, the evident multifield version of (6.53) agrees with (9.5), but disagrees with (9.35) when the lower signs are used. At last, the multifield exploration of the ‘spin’ Heisenberg relations (5.5) is a mutatis mutandis (see (9.35)) version of the corresponding considerations in the second part of Sub- sect. 6.3. The main result here is that the multifield bilinear commutation relations (9.19), as well as their para counterparts (9.27), ensure the validity of (5.5). 9.3. Commutation relations between the dynamical variables The aim of this subsection is to be discussed/proved the commutation relations (5.15)–(5.24) for a system of at least two different quantum fields from the view-point of the commutation relations considered in subsections 9.1 and 9.2. 
To begin with, we rewrite the Heisenberg relations (5.1), (5.2) and (5.4) in terms of creation and annihilation operators for a multifield system [1,11]: , Pµ] = ∓kµa±lα [a , Pµ] = ∓kµa†±lα (9.38) [a±lα , Q] = qa lα [a lα , Q] = −qa lα (9.39) ,Morµν ] = i~ω◦µν(lα) ,Morµν ] = i~ω◦µν(lα) , (9.40) where lα = (α, sα,k), ω◦(lα) is defined by (9.36) and k0 = m2c2 + k2 is set in (9.38) and (9.40) (after the differentiations are performed in the last case). The corresponding version of (5.5) is more complicated and depends on the particular field considered (do not sum over sα!): [a±α,sα(k),Mspµν ] = i~gα αtα,+ µν (k)a α,tα(k) + αtα,− µν (k)a α,tα(k) α,sα(k),Mspµν ] = i~hα αtα,− µν (k)a α,tα(k) + αtα,+ µν (k)a α,tα(k) (9.41) where fsα = −1, 0,+1 (depending on the particular field), gα := −hα := 1jα+δjα0 (−1) jα+1 and sαtα,+ µν (k) and sαtα,− µν (k) are some functions which strongly depend on the particular field considered, with ±σ sαtα,± µν (k) being related to the spin (polarization) functions σ sαtα,± µν (k) (see (3.14) and (3.11)).34 As a result of (5.6), (9.40) and (9.41), one can easily write the Heisenberg relations (5.3) in a form similar to (9.38)–(9.41). 34 If φ̃αi (k) are the Fourier images of the α-th field and i (k) = i (k)ã α,sα(k) + v i (k)ã α,sα(k) , (9.42) Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 51 The commutation relations involving the momentum operator are: [Pµ, Pν ] = 0 [Q, Pµ] = 0 [Sµν , Pλ] = [Mspµν , Pλ] = 0 [Lµν , Pλ] = [Morµν , Pλ] = [Mµν , Pλ] = −i~{ηλµ Pν − ηλν Pµ}. (9.45) We claim that these equations are consequences from (9.38) and the explicit expressions (3.9)– (3.12) and (5.11)–(5.13) for the operators of the dynamical variables of the free fields con- sidered in the present work. 
In fact, since (9.38) implies [b±lα ◦ c , Pµ] = 0 lα = (α, sα,k), mβ = (β, sβ ,k) (9.46a) [b±lα ←−−−→ ω◦µν (l α) ◦ c∓ , Pµ] = ±2(kµηνλ − kνηµλ)b±lα ◦ c , (9.46b) where b±lα , c lα = a lα , a lα and ←−−−→ ω◦µν (l α) is defined via (9.36) and (3.13), the verification of (9.45) reduces to almost trivial algebraic calculations. Further, we assert that any system of commu- tation relations considered in Subsect. 9.1 entails (9.45): as these relations always imply (9.5) (or similar multifield versions of (6.10) and (6.11) in the case of the Lagrangians (3.1) or (3.3), respectively) and, on its turn, (9.5) implies (5.1), the required result follows from the last assertion and the remark that (5.1) and (9.38) are equivalent. As an additional verification of the validity of (9.45), the reader can prove them by invoking the identity (6.8) and any system of commutation relations mentioned in Subsect. 9.1, in particular (9.19) and (9.27). The commutation relations concerning the charge operator read: [Pµ, Q] = 0 [Q, Q] = 0 [Lµν , Q] = [Sµν , Q] = 0 [Morµν , Q] = [Mspµν , Q] = [Mµν , Q] = 0. (9.47) These equations are trivial corollaries from (3.9)–(3.12) and (5.11)–(5.13) and the observation that (9.39) implies lα ◦ a , Q] = [a±lα ◦ a , Q] = 0 if qα = qβ, (9.48) due to (6.8) for η = −1. Since any one of the systems of commutation relations mentioned in Subsect. 9.2 entails (9.31) (or systems of similar multifield versions of (6.31) and (6.32), if the Lagrangians (3.1) or (3.3) are employed), which is equivalent to (9.39), the equations (9.47) hold if some of these systems is valid. 
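The identity (6.8) invoked above is not reproduced in this section. Assuming that it is the η = ±1 case of the operator identity behind (9.28)–(9.29), namely [A, B∘C] = [A, B]_η∘C − ηB∘[A, C]_η with [A, B]_η := A∘B + ηB∘A, it can be spot-checked on random matrices. The sketch below tests the more general statement (9.28) under the condition (9.29), shows that it fails when (9.29) is violated, and recovers the case η = −1 used for (9.48). The matrix size, the random seed and the sample value of ε are arbitrary choices of the example.

```python
import numpy as np

def bracket(A, B, eta):
    """[A, B]_eta := A B + eta B A."""
    return A @ B + eta * (B @ A)

def residual_9_28(b, c, d, eps_ab, eps_ag, lam):
    """Largest |entry| of LHS - RHS of (9.28) for the given parameters."""
    lhs = bracket(b, c @ d, -1)  # ordinary commutator [b, c d]
    rhs = bracket(b, c, -eps_ab) @ d + lam * (c @ bracket(b, d, -eps_ag))
    return np.abs(lhs - rhs).max()

rng = np.random.default_rng(0)
b, c, d = (rng.standard_normal((4, 4)) for _ in range(3))

eps = 0.5 - 2.0j  # arbitrary non-zero complex value, for illustration only

# (9.29) satisfied: lambda = eps^{alpha beta}, eps^{alpha gamma} = 1 / eps^{alpha beta}.
consistent = residual_9_28(b, c, d, eps, 1 / eps, eps)
# (9.29) violated: eps^{alpha gamma} = eps^{alpha beta} with eps**2 != 1.
violated = residual_9_28(b, c, d, eps, eps, eps)
# Special case eta = -1 (all eps equal to +1): [A, BC] = [A, B] C + B [A, C].
eta_minus_one = residual_9_28(b, c, d, 1.0, 1.0, 1.0)
```

The first and third residuals vanish up to rounding, while the second does not, in line with the statement that (9.28) can hold only under (9.29).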
Alternatively, one can prove via a direct calculation that the commutation relations arising from the charge operator entail the validity of (9.47); where v i (k) are linearly independent functions normalize via the condition i (k) i (k) = δ , (9.43) with fs = 1 for jα = 0, 1 and fs = 0,−1 for (jα, sα) = (1, 3) or (jα, sα) = (1, 1), (1, 2), respectively, then µν (k) := i (k) µν (k) := i (k) (9.44) with Ii iµν given via (5.25). Besides, σ µν (k) = µν (k) with an exception that σ µν (k) = 0 for α = 1 and (µ, ν) = (a, 0), (0, a) with a = 1, 2, 3. Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 52 for the purpose the identity (6.8) and the explicit expressions for the dynamical variables via the creation and annihilation operators should be applied. At last, consider the commutation relations involving the different angular momentum operators: [Pλ, Sµν ] = [Pλ,Mspµν ] = 0 [Pλ, Lµν ] = [Pλ,Morµν ] = [Pλ,Mµν ] = +i~{ηλµ Pν − ηλν Pµ} [Q, Lµν ] = [Q, Sµν ] = [Q,Morµν ] = [Q,Mspµν ] = [Q,Mµν ] = 0 [Sκλ,Mµν ] = −i~ ηκµ Sλν − ηλµ Sκν − ηκν Sλµ + ηλν Sκµ [Lκλ,Mµν ] = −i~ ηκµ Lλν − ηλµ Lκν − ηκν Lλµ + ηλν Lκµ [Mκλ,Mµν ] = −i~ ηκµMλν − ηλµMκν − ηκνMλµ + ηλνMκµ (9.49) (The other commutators, that can be form from the different angular momentum operators, are complicated and cannot be expressed in a ‘closed’ form.) The proof of these relations is based on equations like (see (9.40) and (6.8)) [blα ◦ cmβ ,Morµν ] = i~ω◦µν(lα) blα ◦ cmβ lα = (α, sα,k), mβ = (β, sβ ,k), (9.50) with blα , clα = a lα , a lα , a lα , a lα , and similar, but more complicated, ones involving the other angular momentum operators. It, generally, depends on the particular field considered and will be omitted. As it was said in Subsect. 6.3, the Heisenberg relations concerning the angular momentum operator(s) do not give rise to some (algebraic) commutation relations for the creation and annihilation operators. 
For this reason, the only problem is which of the commutation relations discussed in subsections 9.1 and 9.2 imply the validity of the equations (9.49) (or part of them). The general answer of this problem is not known but, however, a direct calculation by means of (9.7), if it holds, and (6.8) shows the validity of (9.49). Since (9.19) and (9.27) imply (9.7), this means that the multifield bilinear and para commutation relations are sufficient for the fulfillment of (9.49). To conclude, let us draw the major moral of the above material: the multifield bilinear commutation relations (9.19) and the multifield paracommutation relations (9.27) ensure the validity of all ‘standard’ commutation relations (9.45), (9.47) and (9.49) between the operators of the dynamical variables characterizing free scalar, spinor and vector fields. 9.4. Commutation relations under the uniqueness conditions As it was said at the end of the introduction to this section, the replacements (9.4) ensure the validity of the material of Sect. 4 in the multifield case. Correspondingly, the considerations in Sect. 7 remain valid in this case provided the changes l 7→ lα m 7→ mβ n 7→ nγ τδlm 7→ ταβδlαmβ = ταδlαmβ [bm, bm]ε 7→ [bmβ , bmβ ]εβ [bm, bn]ε 7→ [bmβ , bnγ ]εβγ , (9.51) with bm (or bmβ ) being any creation/annihilation operator, and, in some cases, (9.4) are made.35 Without going into details, we shall write the final results. The multifield version of (7.27)–(7.28) is: E(a†± ◦ a∓nγ ) = εβγ E(a∓nγ ◦ a E([a†± , a∓nγ ]εβγ) (9.52) 35 As a result of (7.11), (7.16) and (7.17), in expressions like (7.18)–(7.26) the number ε should be replace by εαβ, where α and β are the corresponding field indices of the creation/annihilation operators on which the operator E acts, i.e. ε E(bm ◦ bn) 7→ ε βγ E(bmβ ◦ bnγ ). Bozhidar Z. Iliev: QFT in momentum picture: IV. 
Commutation relations 53 a+lα , E([a nγ ]εβγ) + 2δlαnγa = 0 (9.53a) , E([a†+ , a−nγ ]εβγ ) + 2ταγδlαnγa = 0 (9.53b) a−lα , E([a nγ ]εβγ) − 2ταβδlαmβa−nγ = 0 (9.53c) , E([a†+ , a−nγ ]εβγ ) − 2δlαmβa−nγ = 0 (9.53d) γ =β. (9.53e) As one can expect, the relations (9.53a)–(9.53d) can be obtained from the multifield paracom- mutation relations (9.27) via the replacement [·, ·]ε 7→ E([·, ·]εβγ ). It should be paid special attention on the equation (9.53e). It is due to the fact that in the expressions for the dynami- cal variables do not enter ‘cross-field-products’, like a for β 6= α, and it corresponds to the condition (ii) in [17, p. B 1159]. The equality (9.53e) is quite important as it selects only that part of the ‘ E-transformed’ multifield paracommutation relations (9.27) which is com- patible with the bilinear commutation relations (9.19) (see (9.28) and (9.29)). Besides, (9.53e) makes (9.53a)–(9.53d) independent of the particular definition of εαβ (see (9.11)). The equations (9.52) are the only restrictions on the operator E ; examples of this operator are provided by the normal (resp. antinormal) ordering operator N (resp. A), which has the properties (cf. (4.22) (resp. (7.30)) ◦ a†−nγ := a+ ◦ a†−nγ N ◦ a−nγ ◦ a−nγ ◦ a†+nγ := εβγa nγ ◦ a−mβ N ◦ a+nγ := εβγa+nγ ◦ a (9.54) ◦ a†−nγ := εβγa nγ ◦ a+mβ A ◦ a−nγ := εβγa−nγ ◦ a ◦ a†+nγ := a− ◦ a†+nγ A ◦ a+nγ ◦ a+nγ . (9.55) The material of Sect. 8 has also a multifield variant that can be obtained via the re- placements (9.51) and (9.4). Here is a brief summary of the main results found in that The operator E should possess the properties (9.54) and, in this sense, can be identified with the normal ordering operator, E = N . 
(9.56) As a result of this fact and εββ = εβ (see (9.11)), the commutation relations (9.53) take the final form: a+lα , a ◦ a†− + δlαnβa = 0 (9.57a) a+lα , a + ταβδlαnβa = 0 (9.57b) a−lα , a ◦ a†− − ταβδlαmβa−nβ = 0 (9.57c) − δlαmβa−nβ = 0 (9.57d) which is the multifield version of (8.17) and corresponds, up to the replacement a±lα 7→ 2a±lα , to (9.27) with εβγ = 0. The vacuum state vector X0 is supposed to be uniquely defined by the following equations (cf. (8.1b)–(8.3)): a−lα X0 = 0 a lα X0 = 0 (9.58a) X0 6= 0 (9.58b) 〈 X0| X0〉 = 1 (9.58c) lα ◦ a (X0) = δlαmβ X0 a−lα ◦ a (X0) = δlαmβ X0 (X0) = ταβδlαmβ X0 a−lα ◦ a (X0) = ταβδlαmβ X0. (9.58d) Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 54 The Hilbert space F of state vectors is a direct sum of the Hilbert spaces Fα of the different fields and it is supposed to be spanned by the vectors ... = M(a , . . . )(X0) (9.59) with M(a+ , . . . ) being arbitrary monomial only in the creation operators. Since (9.58a), (9.56) and (9.54) imply the multifield version of (8.7), the computation of the mean values of (8.6), with l1 7→ lα11 etc., of the dynamical variables is reduced to the one of scalar products like (cf. (8.5)) 〈ψlα1 ...|φmβ1 〉 = 〈 X0| , . . . ) )† ◦ M′(a+ , . . . )(X0)〉 (9.60) of basic vectors of the form (9.59). By means of the basic properties (9.58) of the vacuum, one is able to calculate the simplest forms of the vacuum mean values (9.60), viz. the mul- tifield versions (see (9.51)) of (8.20) and (8.26). But more general such expression cannot be calculated by means of (9.57)–(9.58). Prima facie one can suppose that the multifield commutation relations (9.19), which ensure the vectors (9.59) to form a base of the system’s Hilbert space of states, can help for the calculation of (9.60) in more complicated cases. In fact, this is the case which works perfectly well and covers the available experimental data. 
In this connection, we must mention that the applicability of (9.19) to the calculation of (9.60) is ensured by the compatibility between (9.19) and (9.57): by means of (6.8) for η = −εαβ, one can check that (9.19) converts (9.57) into identities (see footnote 36).

The commutation relations (9.57) also admit as a solution the multifield version of the anomalous bilinear commutation relations (8.27), but this solution, as we said earlier, leads to contradictions and must be rejected. The existence of solutions of (9.57) different from this one and from (9.19) seems not to have been investigated. If data appear which do not fit into the description by means of (9.19), one should look for other solutions of (9.57), if any, or for effective procedures, compatible with (9.57), for calculating vacuum mean values like (9.60).

10. Conclusion

In this paper we have investigated two sources of (algebraic) commutation relations in the Lagrangian quantum theory of free scalar, spinor and vector fields: the uniqueness of the dynamical variables (momentum, charge and angular momentum) and the Heisenberg relations/equations for them. If one ignores the former origin, which is the ordinary case, the paracommutation relations or some of their generalizations seem to be the most suitable candidates for the most general commutation relations that ensure the validity of all Heisenberg equations. The simultaneous consideration of both sources reveals, however, their incompatibility in the general case. The way out of this situation is a redefinition of the operators of the dynamical variables, similar to the normal ordering procedure and containing it as a special case. That operation ensures the uniqueness of the new (redefined) dynamical variables and changes the possible types of commutation relations. Again, the commutation relations connected with the Heisenberg relations concerning the (redefined) momentum operator entail the validity of all Heisenberg equations.
36 Recall that equations (9.19) and (9.27), or (9.53a)–(9.53d), for γ ≠ β are generally incompatible. For instance, excluding some special cases, like systems consisting of only fermi (bose) fields or of one fermi (bose) field and an arbitrary number of bose (fermi) fields, the only operators satisfying (9.19) and (9.27) for γ ≠ β and having the normal spin-statistics connection are such that bmβ ◦ bnγ = 0, with γ ≠ β and bmβ and cnγ being any creation/annihilation operators, which, in particular, means that no states with two particles from different fields can exist.

Further constraints on the possible commutation relations follow from the definition/introduction of the concept of the vacuum (vacuum state vector). They practically reduce the redefined dynamical variables to the ones obtained via the normal ordering procedure, which results in the explicit form (8.17) of the admissible commutation relations. In a sense, they happen to be ‘one half’ of the paracommutation ones.

As a last argument on the way to finding the ‘unique true’ commutation relations, we require the existence of a procedure for calculating vacuum mean values of anti-normally ordered products of creation and annihilation operators, to which the mean values of the dynamical variables and the transition amplitudes between different states are reduced. We have pointed out that the standard bilinear commutation relations are, at present, the only known ones that satisfy all of the conditions imposed and do not contradict the existing experimental data.

The consideration of a system of at least two different quantum free fields meets a new problem: the general relations between creation/annihilation operators belonging to different fields turn out to be undefined.
The cause for this is that the commutation relations for any fixed field are well defined only on the Hilbert subspace of the system’s Hilbert space of states corresponding to that field, and their extension to the whole space, as well as the inclusion in them of creation/annihilation operators of other fields, is a matter of convention (when free fields are concerned); formally this is reflected in the structure of the dynamical variables, which are sums of those of the individual fields included in the system under consideration.

We have, however, presented arguments by means of which the a priori existing arbitrariness in the commutation relations involving different field operators can be reduced to the ‘standard’ one: these relations should contain either commutators or anticommutators of the creation/annihilation operators belonging to different fields. A free field theory cannot distinguish between these two possibilities.

Accepting these possibilities, the admissible commutation relations (9.57) for a system of several different fields are considered. They turn out to be the corresponding multifield versions of the ones regarding a single field. Similarly to the single field case, the standard multifield bilinear commutation relations seem to be the only known ones that satisfy all of the imposed restrictions and are in agreement with the existing data.

Acknowledgments

This research was partially supported by the National Science Fund of Bulgaria under Grant No. F 1515/2005.

References

[1] N. N. Bogolyubov and D. V. Shirkov. Introduction to the theory of quantized fields. Nauka, Moscow, third edition, 1976. In Russian. English translation: Wiley, New York, 1980.
[2] J. D. Bjorken and S. D. Drell. Relativistic quantum mechanics, volume 1 and 2. McGraw-Hill Book Company, New York, 1964, 1965. Russian translation: Nauka, Moscow, 1978.
[3] Paul Roman. Introduction to quantum field theory. John Wiley & Sons, Inc., New York-London-Sydney-Toronto, 1969.
[4] Lewis H. Ryder.
Quantum field theory. Cambridge Univ. Press, Cambridge, 1985. Russian translation: Mir, Moscow, 1987.
[5] A. I. Akhiezer and V. B. Berestetskii. Quantum electrodynamics. Nauka, Moscow, 1969. In Russian. English translation: Authorized English ed., rev. and enl. by the author, Translated from the 2d Russian ed. by G.M. Volkoff, New York, Interscience Publishers, 1965. Other English translations: New York, Consultants Bureau, 1957; London, Oldbourne Press, 1964, 1962.
[6] Pierre Ramond. Field theory: a modern primer, volume 51 of Frontiers in physics. Reading, MA Benjamin-Cummings, London-Amsterdam-Don Mills, Ontario-Sidney-Tokio, 1 edition, 1981. 2nd rev. print, Frontiers in physics vol. 74, Adison Wesley Publ. Co., Redwood city, CA, 1989; Russian translation from the first ed.: Moscow, Mir 1984.
[7] N. N. Bogolubov, A. A. Logunov, and I. T. Todorov. Introduction to axiomatic quantum field theory. W. A. Benjamin, Inc., London, 1975. Translation from Russian: Nauka, Moscow, 1969.
[8] N. N. Bogolubov, A. A. Logunov, A. I. Oksak, and I. T. Todorov. General principles of quantum field theory. Nauka, Moscow, 1987. In Russian. English translation: Kluwer Academic Publishers, Dordrecht, 1989.
[9] P. A. M. Dirac. The principles of quantum mechanics. Oxford at the Clarendon Press, Oxford, fourth edition, 1958. Russian translation in: P. Dirac, Principles of quantum mechanics, Moscow, Nauka, 1979.
[10] P. A. M. Dirac. Lectures on quantum mechanics. Belfer graduate school of science, Yeshiva University, New York, 1964. Russian translation in: P. Dirac, Principles of quantum mechanics, Moscow, Nauka, 1979.
[11] J. D. Bjorken and S. D. Drell. Relativistic quantum fields, volume 2. McGraw-Hill Book Company, New York, 1965. Russian translation: Nauka, Moscow, 1978.
[12] C. Itzykson and J.-B. Zuber. Quantum field theory. McGraw-Hill Book Company, New York, 1980.
Russian translation (in two volumes): Mir, Moscow, 1984.
[13] Bozhidar Z. Iliev. Lagrangian quantum field theory in momentum picture. In O. Kovras, editor, Quantum Field Theory: New Research, pages 1–66. Nova Science Publishers, Inc., New York, 2005. http://arXiv.org e-Print archive, E-print No. hep-th/0402006, February 1, 2004.
[14] Bozhidar Z. Iliev. Lagrangian quantum field theory in momentum picture. II. Free spinor fields. http://arXiv.org e-Print archive, E-print No. hep-th/0405008, May 1, 2004.
[15] Bozhidar Z. Iliev. Lagrangian quantum field theory in momentum picture. III. Free vector fields. http://arXiv.org e-Print archive, E-print No. hep-th/0505007, May 1, 2005.
[16] H. S. Green. A generalized method of field quantization. Phys. Rev., 90(2):270–273, 1953.
[17] O. W. Greenberg and A. M. I. Messiah. Selection rules for parafields and the absence of para particles in nature. Phys. Rev., 138B(5B):1155–1167, 1965.
[18] Y. Ohnuki and S. Kamefuchi. Quantum field theory and parafields. University of Tokyo Press, Tokyo, 1982.
[19] Silvan S. Schweber. An introduction to relativistic quantum field theory. Row, Peterson and Co., Evanston, Ill., Elmsford, N.Y., 1961. Russian translation: IL (Foreign Literature Pub.), Moscow, 1963.
[20] Bozhidar Z. Iliev. Pictures and equations of motion in Lagrangian quantum field theory. In Charles V. Benton, editor, Studies in Mathematical Physics Research, pages 83–125. Nova Science Publishers, Inc., New York, 2004. http://arXiv.org e-Print archive, E-print No. hep-th/0302002, February 2003.
[21] Bozhidar Z. Iliev. Momentum picture of motion in Lagrangian quantum field theory. International Journal of Theoretical Physics, Group Theory, and Nonlinear Optics, ??(?):??–??, 2007. To appear.
http://arXiv.org e-Print archive, E-print No. hep-th/0311003, November, 2003.
[22] Bozhidar Z. Iliev. On operator differentiation in the action principle in quantum field theory. In Stancho Dimiev and Kouei Sekigava, editors, Proceedings of the 6th International Workshop on Complex Structures and Vector Fields, 3–6 September 2002, St. Konstantin resort (near Varna), Bulgaria, “Trends in Complex Analysis, Differential Geometry and Mathematical Physics”, pages 76–107. World Scientific, New Jersey-London-Singapore-Hong Kong, 2003. http://arXiv.org e-Print archive, E-print No. hep-th/0204003, April 2002.
[23] Bozhidar Z. Iliev. On angular momentum operator in quantum field theory. In Frank Columbus and Volodymyr Krasnoholovets, editors, Frontiers in quantum physics research, pages 129–142. Nova Science Publishers, Inc., New York, 2004. http://arXiv.org e-Print archive, E-print No. hep-th/0211153, November 2002.
[24] Bozhidar Z. Iliev. On momentum operator in quantum field theory. In Frank Columbus and Volodymyr Krasnoholovets, editors, Frontiers in quantum physics research, pages 143–156. Nova Science Publishers, Inc., New York, 2004. http://arXiv.org e-Print archive, E-print No. hep-th/0206008, June 2002.
[25] J. D. Bjorken and S. D. Drell. Relativistic quantum mechanics, volume 1. McGraw-Hill Book Company, New York, 1964. Russian translation: Nauka, Moscow, 1978.
[26] A. B. Govorkov. Parastatistics and internal symmetries. In N. N. Bogolyubov, editor, Physics of elementary particles and atomic nuclei, volume 14, No. 5, of Particles and nuclei, pages 1229–1272. Energoatomizdat, Moscow, 1983. In Russian.
[27] R. F. Streater and A. S. Wightman. PCT, spin and statistics and all that. W. A. Benjamin, Inc., New York-Amsterdam, 1964. Russian translation: Nauka, Moscow, 1966.
ABSTRACT

Possible (algebraic) commutation relations in the Lagrangian quantum theory of free (scalar, spinor and vector) fields are considered from a mathematical viewpoint. The Heisenberg equations/relations for the dynamical variables and a specific condition for uniqueness of the operators of the dynamical variables (with respect to some class of Lagrangians) are employed as sources of these relations. The paracommutation relations or some of their generalizations are pointed out as the most general ones that entail the validity of all Heisenberg equations. The simultaneous fulfillment of the Heisenberg equations and the uniqueness requirement turns out to be impossible. This problem is solved via a redefinition of the dynamical variables, similar to the normal ordering procedure and containing it as a special case. That implies corresponding changes in the admissible commutation relations.
The introduction of the concept of the vacuum narrows the class of the possible commutation relations; in particular, the mentioned redefinition of the dynamical variables is reduced to normal ordering. As a last restriction on that class, the existence of an effective procedure for calculating vacuum mean values is required. The standard bilinear commutation relations are pointed out as the only known ones that satisfy all of the mentioned conditions and do not contradict the existing data.

Introduction

Epitaxial self-assembled quantum dots (SAQDs) represent an important step in the advancement of semiconductor fabrication at the nanoscale that will allow breakthroughs in optoelectronics and electronics. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] The most frequent optoelectronic applications are high-efficiency lasers with exotic wavelengths and photodetectors. [1, 3, 4, 5, 6, 7, 8, 10, 11, 12] SAQDs are the result of a transition from 2D growth to 3D growth in strained epitaxial films such as SixGe1−x/Si and InxGa1−xAs/GaAs. This process is known as Stranski-Krastanow growth or Volmer-Weber growth. [13, 1, 14, 15] In applications, order is a key factor. There are two types of order, spatial and size. Spatial order refers to the regularity of SAQD placement, and it is necessary for nano-circuitry applications. Size order refers to the uniformity of SAQD size, which determines the voltage and/or energy level quantization of SAQDs. It is reasonable to expect that these types of order are linked, and it is important to understand the factors that determine SAQD order. Further understanding should help in the design and simulation of both spontaneous “bottom up” self-assembly and directed or guided self-assembly to enhance SAQD order. [16, 17, 18, 19, 20, 21, 22, 23] Here, an elaboration and further application of a linear analysis of SAQD order [24] is presented.
The work reported here forms the basis of a non-linear theory and modeling of SAQD order that will be reported in future work. In [24] it was reported that one could calculate a correlation function using a linearized model of SAQD formation. This correlation function included two correlation lengths that could be used to describe SAQD order. It was also found that one effect of a hypothesized wetting potential was to enhance SAQD order when growth occurs near the critical film height for 3D growth. Here, these results are expanded to create a more rigorous linearized theory of SAQD order that will inform non-linear theories. In particular, the model is generalized to any model that combines local energy effects such as surface energy density and non-local elastic destabilization, and the procedure for predicting order based on any linear theory with peak wavelengths is presented. The hypothesized effect of elastic anisotropy in [24] is verified with calculations using linear anisotropic elasticity theory. [25, 26] Details such as statistical fluctuation and convergence are also addressed, along with a discussion of the possible forms of linear anisotropic terms in SAQD growth kinetics, and the effect of an atomic-scale cutoff in the continuum theory is addressed. Finally, the order-enhancing effect of growing near the critical threshold is explored in more detail using calculations appropriate to Ge/Si SAQDs.

In the literature, two modes of SAQD formation are generally discussed, the thermal nucleation mode and the nucleationless mode. [27, 28, 29] In the thermal nucleation mode, a 2D film surface is metastable, and the formation of individual quantum dots is thermally activated. [27] This growth mode leads to the formation of individual quantum dots as uncorrelated or loosely correlated discrete events at essentially random locations. In the nucleationless mode, the 2D film surface transitions from stable (or metastable) to unstable.
In this mode, dots form everywhere at once, appearing at first as a cross-hatched ripple-like disturbance on the 2D film surface and then maturing into recognizable individual dots. [27, 30, 28, 31, 32] These two modes are probably connected via an encompassing conceptual and mathematical model (see footnote 1), and perhaps some of what is observed experimentally is in fact a hybrid mechanism. In agreement with intuition, it appears that the nucleationless mode leads to a more ordered dot pattern than the thermal nucleation mode, which is dominated by randomness (see footnote 2). Thus, the presented analysis applies to the nucleationless mode.

There are various implementations of nucleationless growth models [28, 37, 38, 39, 40, 18, 34], although there is also a great deal of commonality among these models. In particular, they all include a non-local elastic effect and local surface energies and/or local wetting energies. Here, a linear analysis of quantum dot order resulting from this class of model is presented. Particular note is taken of the effects of stochastic initial conditions, crystal anisotropy in general, elastic anisotropy in particular, and the effect of varying film height as a control parameter as first introduced in [33]. A simple model similar to [28, 37, 38, 40, 18] is presented to produce numerical examples and explore the effects of the average film height. Concurrently, a more abstract and general model is presented and analyzed that includes non-local elastic strain effects and a local combined surface and wetting energy. The linear model with stochastic initial conditions and deterministic film height evolution will pave the way for more sophisticated analysis involving a non-linear model of stochastic film height evolution.

As previously stated, one of the goals in the present work is to further explore the role of the wetting potential during growth near the stability threshold in film height.
A wetting potential has been included in the analysis and simulations in [38, 33, 37, 28]. Although somewhat controversial, the wetting potential plays an important phenomenological role. It ensures that growth takes place in the Stranski-Krastanow mode: 3D unstable growth occurs only after a critical layer thickness is achieved, and a residual wetting layer persists. The physical origins and consequences of the wetting potential are discussed in [41, 28]. The analysis presented here is usable in models that neglect the wetting potential by simply setting it to zero. Another possibility is that the wetting potential is simply an approximation to the stabilizing effect of intermixing. [42] That said, if the wetting potential is real, the present analysis shows that it is beneficial to SAQD order to grow near the critical layer thickness.

The presented analytic formulas and linear analysis are intended to complement existing numerical models of SAQD order [43, 37, 44, 45] and to form a basis for future non-linear analytic analysis of SAQD order. The current findings agree with previous work on the beneficial effects of elastic anisotropy to enhance in-plane order. The linear analysis, of course, represents a simplification of the film evolution, and it applies only to the initial stages of SAQD formation, when the nominally flat film surface becomes unstable and transitions to three-dimensional growth. However, the small surface fluctuation stage of SAQD growth determines the initial seeds of order or disorder in an SAQD array; thus, the small fluctuation stage should have an important influence on the final outcome. At later stages, when surface fluctuations are large, there is a natural tendency of SAQDs to either order or ripen. [33, 37, 46, 39, 47] Ordering systems tend to evolve slowly due to critical slowing down [39], while ripening tends to diminish order further.
[37] Thus, it is possible that the linear model could, in fact, yield good predictions of SAQD order. The simplification and linearization facilitate the development of analytic solutions that are more transparent, easily portable to multiple material systems and have no effective limit on system size. Finally, it is virtually impossible to have a thorough understanding of the full non-linear model without first having a thorough understanding of the linear behavior.

The remainder of the paper is organized as follows. Section 2 presents the physical assumptions and mathematical approximations used to model film growth. Section 3 discusses the stochastic initial conditions and the resulting correlation functions and correlation lengths. Section 4 presents a procedure for estimating SAQD order with an application to Ge dots on a Si substrate. Section 5 presents conclusions, while Appendices A-F present additional calculational details.

1 It is likely that there is a transition from stable, to metastable and finally to unstable. The analysis presented in [33] would appear to support such a view, where the film height acts as the control parameter driving the transition. There is also some controversy regarding whether all dot growth is nucleationless or not. [34, 32]

2 Compare various figures in [29, 35, 14, 31, 36].

2 Modeling

The formation of SAQDs is modeled as a deterministic surface diffusion process with stochastic initial conditions. The resulting equations and ultimately the sought-after correlation functions are different depending on whether the film surface is treated as one-dimensional isotropic, two-dimensional isotropic or two-dimensional anisotropic. The 1D and 2D isotropic cases are discussed first, and then the essential differences of the 2D anisotropic model are presented.
The stochastic initial conditions need to be expressed in terms of the correlation functions that are also used to analyze order; consequently, the discussion of the initial conditions is deferred to Sec. 3.2.

It should be noted that the results presented here are fairly general. There has been a good deal of recent work refining the modeling of nucleationless growth processes to incorporate various phenomenological aspects of SAQD growth, for example, the inclusion of orientation-dependent surface energy [38], strain-dependent surface energy [34] and explicit modeling of atomic species segregation and film-substrate interdiffusion. [48] Two models are presented here. One is a simple concrete example. It is the simplest model one can use that includes elastic effects, surface energy and wetting energy. The second model is more abstract and describes the general case of a local potential energy that depends on both the film height and the film height gradient. One effect that is not examined here is that of mixed four-fold and two-fold symmetry. Such a mixing can occur due to diffusional anisotropy or surface energy anisotropy (Sec. 2.2.1.2 and Appendix D). However, a similar analysis procedure should work for these cases as well. The general procedure for possible application to other models is discussed in Sec. 3.5.

The following discussion will use abstract vector notation, e.g. $\mathbf{k}$ instead of $k_i$, etc. Also, because it is sometimes computationally expedient to perform one-dimensional modeling [24, 39, 17, 42], the case of a one-dimensional surface with a two-dimensional volume is discussed along with the case of an isotropic 2D surface. To facilitate this combined discussion, the dimensionality of the surface will be denoted as $d$. In Secs. 3.3 and 3.4, $d = 1, 2$ will be substituted as appropriate. Finally, much of the calculation involves reciprocal space. The convention used for the Fourier transforms is $f(\mathbf{x}) = \int d^d\mathbf{k}\, e^{i\mathbf{k}\cdot\mathbf{x}} f_{\mathbf{k}}$ and $f_{\mathbf{k}} = (2\pi)^{-d} \int d^d\mathbf{x}\, e^{-i\mathbf{k}\cdot\mathbf{x}} f(\mathbf{x})$, following the example of [28].
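As a quick numerical illustration of the Fourier convention quoted above (the $(2\pi)^{-d}$ factor sits on the transform to reciprocal space, with no prefactor on the inverse), the following sketch evaluates both integrals for $d = 1$ on uniform grids. It is not part of the original analysis: the Gaussian test function, the grid limits and the helper names `fourier_component` and `inverse_transform` are assumptions chosen only for illustration.

```python
import numpy as np


def fourier_component(f_vals, x, k):
    """Approximate f_k = (2*pi)^(-1) * integral dx exp(-i k x) f(x) for d = 1."""
    dx = x[1] - x[0]
    return np.sum(np.exp(-1j * k * x) * f_vals) * dx / (2.0 * np.pi)


def inverse_transform(f_k_vals, k, x0):
    """Approximate f(x0) = integral dk exp(i k x0) f_k for d = 1."""
    dk = k[1] - k[0]
    return np.sum(np.exp(1j * k * x0) * f_k_vals) * dk


# Illustrative test function (an assumption): f(x) = exp(-x^2 / 2).
x = np.linspace(-20.0, 20.0, 4001)
k = np.linspace(-10.0, 10.0, 2001)
f_vals = np.exp(-x**2 / 2.0)

# Forward transform on the k grid, compared with the closed form
# f_k = exp(-k^2 / 2) / sqrt(2*pi) that this convention implies.
f_k_vals = np.array([fourier_component(f_vals, x, kk) for kk in k])
analytic = np.exp(-k**2 / 2.0) / np.sqrt(2.0 * np.pi)
max_forward_error = np.max(np.abs(f_k_vals - analytic))

# Round trip: the inverse transform should recover f(0.5) = exp(-0.125).
round_trip = inverse_transform(f_k_vals, k, 0.5).real

print(max_forward_error, round_trip, np.exp(-0.125))
```

With this placement of the $2\pi$ factor, a plane-wave perturbation $h(\mathbf{x}) = h_{\mathbf{k}} e^{i\mathbf{k}\cdot\mathbf{x}}$ carries its amplitude directly as the Fourier component, which is convenient for a mode-by-mode linear analysis.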
2.1 1D and 2D Isotropic model

This discussion pertains to both the 1D model and the 2D isotropic model. The formation of SAQDs is modeled as a surface diffusion process where the film height is a function of the lateral position and time. The system is treated as deterministic with stochastic initial conditions. First, the general non-linear governing equations are presented. Then, the linearized form is presented. Finally, the key behavior is reviewed.

The mathematical model uses the film height $H(\mathbf{x}, t)$ as the dependent variable and the horizontal position $\mathbf{x}$ and time $t$ as the independent variables. The film height evolves over time due to surface diffusion driven by a diffusion potential $\mu(\mathbf{x}, t)$ and a flux of new material $Q$. The surface velocity is thus

$$v_n = n_z \partial_t H = \nabla_S \cdot D \nabla_S \mu(\mathbf{x}, t) + Q \tag{1}$$

where $n_z$ is the vertical component of the surface normal, $n_z = [1 + (\nabla H)^2]^{-1/2}$, $\nabla_S$ is the surface gradient, $D$ is the diffusivity, and $\nabla_S \cdot$ is the surface divergence.

2.1.1 Energetics

The diffusion potential $\mu(\mathbf{x}, t)$ must produce Stranski-Krastanow growth. Thus, it must contain an elastic term that destabilizes planar growth, a surface energy term that stabilizes planar growth and a wetting energy that ensures a wetting layer. The diffusion potential can be derived from a total free energy,

$$F = F_{elast} + F_{surf} + F_{wet} = \int_{volume} dV\, \omega + \int_{surface} dA_{surf}\, \gamma + \int dA\, W(H),$$

where $\omega$ is the elastic energy density, $\gamma$ is the surface energy density, and $W(H)$ is the wetting energy density. The last integral corresponds to $F_{wet}$, and whether the integral should be taken over the film surface or the substrate is ambiguous. The “simple” model (Sec. 2.1.1.1) assumes that the integral is over the substrate, while the “general” model (Sec. 2.1.1.2) can accommodate both cases.

2.1.1.1 simple form

The simplest possible model results if the integral corresponding to $F_{wet}$ is taken over the lateral positions $\mathbf{x}$ rather than over the actual free surface. In concrete terms, one can use $dV = d^2\mathbf{x}\, dz$ and $dA_{surf} = \sqrt{1 + (\nabla H(\mathbf{x}))^2}\, d^2\mathbf{x}$ to obtain the expression

$$F = \int_{volume} d^2\mathbf{x}\, dz\, \omega[H](\mathbf{x}, z) + \int_{x\text{-plane}} d^2\mathbf{x} \left[ \sqrt{1 + (\nabla H(\mathbf{x}))^2}\, \gamma + W(H(\mathbf{x})) \right], \tag{2}$$

where the “$\omega[H]$” indicates that the elastic energy density is a non-local functional of the film height $H$. The diffusion potential $\mu$ can be found, similar to [15], by differentiating $F$ with respect to the surface motion (Appendix A.1), $\mu(\mathbf{x}) = \Omega\, \delta F / \delta H(\mathbf{x})$. Doing so for Eq. (2) (Appendix A.2),

$$\mu(\mathbf{x}) = \Omega \left[ \omega(\mathbf{x}) - \gamma \kappa(\mathbf{x}) + W'(H(\mathbf{x})) \right], \tag{3}$$

where $\Omega$ is the atomic volume, $\omega(\mathbf{x})$ is the elastic energy density at the film surface (implicitly $\omega[H](\mathbf{x}, H(\mathbf{x}))$), $\kappa = \nabla \cdot \left\{ \nabla H(\mathbf{x}) \left[ 1 + (\nabla H(\mathbf{x}))^2 \right]^{-1/2} \right\}$ is the total surface curvature, and $W'(H) = \partial_{H(\mathbf{x})} W(H(\mathbf{x}))$ is the derivative of $W(H(\mathbf{x}))$ evaluated at $\mathbf{x}$.

2.1.1.2 general form

It should be noted that Eq. (3) is not the same diffusion potential used in [38]. The wetting potential used there can be derived by taking $W(H)$ as an energy density of the free surface, not a density in the x-plane. Expressions like Eq. (3) and Eq. (1) in [38] are part of a larger class of surface evolution models with more or less the same linear behavior. The surface and wetting energy can be combined and incorporated into a more general form, with a total free energy $\mathcal{F}_{sw}$ and a free energy density $F_{sw}(H, \nabla H)$ that depends on the film height $H(\mathbf{x})$ and the film height slope or orientation $\nabla H(\mathbf{x})$. The total free energy is thus

$$F = F_{elast} + \mathcal{F}_{sw} = \int_{volume} d^2\mathbf{x}\, dz\, \omega[H](\mathbf{x}, z) + \int_{x\text{-plane}} d^2\mathbf{x}\, F_{sw}(H(\mathbf{x}), \nabla H(\mathbf{x})). \tag{4}$$

$F_{sw}$ may not necessarily be the sum of separate surface energy and wetting energy contributions. It need only be a local function of $H$ and $\nabla H$. The corresponding diffusion potential is

$$\mu(\mathbf{x}) = \Omega \left[ \omega(\mathbf{x}) + F^{(10)}_{sw}(\mathbf{x}) - \nabla \cdot \mathbf{F}^{(01)}_{sw}(\mathbf{x}) \right], \tag{5}$$

where $F^{(mn)}_{sw}$ indicates the $m$th derivative with respect to $H$ and the $n$th derivative with respect to $\nabla H$. That is, $F^{(10)}_{sw}(\mathbf{x}) = \partial_{H(\mathbf{x})} F_{sw}(H(\mathbf{x}), \nabla H(\mathbf{x}))$, and each vector component of $\mathbf{F}^{(01)}_{sw}(\mathbf{x})$ is $[\mathbf{F}^{(01)}_{sw}(\mathbf{x})]_i = \partial_{[\nabla H(\mathbf{x})]_i} F_{sw}(H(\mathbf{x}), \nabla H(\mathbf{x}))$. One can obtain the results of the simple model (Eqs. (2) and (3)) by setting $F_{sw} = \sqrt{1 + (\nabla H(\mathbf{x}))^2}\, \gamma + W(H(\mathbf{x}))$.
(6)

A diffusion potential like Eq. (1) in [38] can be obtained by setting $F_{sw} = \sqrt{1 + (\nabla H(\mathbf{x}))^2}\, [\gamma(\nabla H(\mathbf{x})) + W(H(\mathbf{x}))]$. This is different from Eq. (6) in two ways. First, the surface energy density depends on the surface orientation. Second, the Jacobian, $J = \sqrt{1 + (\nabla H(\mathbf{x}))^2}$, multiplies both the surface energy density and the wetting potential. Despite these differences, the common form of the diffusion potential (Eq. (5)) among different models suggests that they might all lead to similar linearized forms and behavior.

2.1.1.3 Linearization

The diffusion potential is now linearized about the average film height $\bar{H}$. In general, one can control the amount of deposited material, and thus the average film height $\bar{H}$. It is therefore useful to decompose $H(\mathbf{x})$ into the spatially averaged mean value and fluctuations about the average. Similar to [28],

$$H = \bar{H} + h(\mathbf{x}, t). \tag{7}$$

In the present calculation, $\bar{H}$ is specified as constant in time. This assumption corresponds physically to a fast deposition followed by an anneal. It is not too difficult to generalize to a time-dependent $\bar{H}$, but that is beyond the scope of this manuscript. In [38, 49], deposition and evaporation are explicitly modeled.

All terms in $\mu(\mathbf{x}, t)$ are now kept to only linear order in $h(\mathbf{x}, t)$. The elastic energy density $\omega$ is a non-local functional of $h(\mathbf{x}, t)$ [40]; however, the equations generating $\omega(\mathbf{x})$ are translationally invariant. Thus, it is convenient to use reciprocal space for the linearization. The curvature is trivially linearized as $\kappa(\mathbf{x}) \to \nabla^2 h(\mathbf{x})$ in real space or $\kappa_{\mathbf{k}} \to -k^2 h_{\mathbf{k}}$ in reciprocal space. The linearized elastic strain energy $\omega$ can be found in reciprocal space as in [15] to be $\omega_{\mathbf{k}} = -2M(1 + \nu)\epsilon_m^2 k\, h_{\mathbf{k}}$, where $M = E/(1 - \nu)$ is the biaxial modulus, $E$ is the Young modulus, $\nu$ is the Poisson ratio, and $\epsilon_m$ is the film-substrate mismatch strain. This formula neglects possible differences in elastic moduli between the film and substrate as in [28], but a similar method of analysis should apply to that case as well.
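The curvature linearization used above can be checked numerically. The sketch below is an illustration only, not part of the original analysis: it assumes a 1D surface $H = \bar{H} + \varepsilon \cos(kx)$, for which the total curvature $\nabla \cdot \{\nabla H [1 + (\nabla H)^2]^{-1/2}\}$ reduces to $H''/(1 + H'^2)^{3/2}$, and compares it with the linearized form $h''$. The amplitudes, the wave number and the function names are hypothetical choices.

```python
import numpy as np


def curvature_1d(h_prime, h_second):
    """Exact 1D curvature d/dx[H' / sqrt(1 + H'^2)] = H'' / (1 + H'^2)^(3/2)."""
    return h_second / (1.0 + h_prime**2) ** 1.5


def linearization_error(amplitude, k=1.0, n=2001):
    """Max |kappa_exact - kappa_linear| divided by max |kappa_linear|.

    The surface is H = H_bar + amplitude * cos(k x); H_bar drops out because
    the curvature depends only on derivatives of H.
    """
    x = np.linspace(0.0, 2.0 * np.pi / k, n)
    h_prime = -amplitude * k * np.sin(k * x)
    h_second = -amplitude * k**2 * np.cos(k * x)
    exact = curvature_1d(h_prime, h_second)
    linear = h_second  # linearized curvature: kappa -> d^2 h / dx^2
    return np.max(np.abs(exact - linear)) / np.max(np.abs(linear))


# Relative error of the linearization for decreasing fluctuation amplitudes.
errors = {eps: linearization_error(eps) for eps in (1e-1, 1e-2, 1e-3)}
for eps, err in errors.items():
    print(f"amplitude = {eps:g}: relative error = {err:.3e}")
```

The relative error scales roughly as $(\varepsilon k)^2$, so the linear form is accurate as long as the surface slope is small, which is the regime in which the linear analysis is meant to apply.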
Linearizing Eqs. (3) and (5) in reciprocal space, $\mu_\mathbf{k}$ is proportional to $h_\mathbf{k}$ with a proportionality coefficient that depends on $k$ and $\bar{H}$,

$$\mu_{\text{lin},\mathbf{k}} = f(k, \bar{H})\,h_\mathbf{k}, \qquad (8)$$

where $f(k, \bar{H})$ for three different isotropic cases, corresponding to Eqs. (3) and (5) and an abstracted general form, is given by

$$f(k, \bar{H}) = \Omega \times \begin{cases} -2M(1 + \nu)\epsilon_m^2 k + \gamma k^2 + W''(\bar{H}); & \text{case a (Eq. (3))} \\ -2M(1 + \nu)\epsilon_m^2 k + F^{(02)}_{sw} k^2 + F^{(20)}_{sw}; & \text{case b (Eq. (5))} \\ -ak + bk^2 + c; & \text{case c (general)}. \end{cases} \qquad (9)$$

Due to isotropy, $f(k, \bar{H})$ is independent of the direction of $\mathbf{k}$, and only the wave number, $k = \|\mathbf{k}\|$, appears on the right-hand side. $F^{(20)}_{sw}$ is the second derivative of $F_{sw}$ with respect to $H$, and $F^{(02)}_{sw}$ is the second derivative of $F_{sw}$ with respect to $\nabla H$. $F^{(20)}_{sw}$ and $F^{(02)}_{sw}$ depend on $\bar{H}$ only; thus they are constants in the present analysis. See Appendix B.2 for more precise definitions and the derivation of $f(k, \bar{H})$. Using Eq. (6) produces $F^{(02)}_{sw} = \gamma$ and $F^{(20)}_{sw} = W''(\bar{H})$, which is identical to the simple case a of Eq. (9).

Case c, labeled “general,” where $a$, $b$, and $c$ depend implicitly on $\bar{H}$, shows that $f(k, \bar{H})$ for cases a and b has the same relatively simple form. It also emphasizes the dynamic effects as opposed to the physical causes. There is a destabilizing term, $-ak$, a short-wavelength cutoff term, $bk^2$, and a term that stabilizes the entire spectrum, $c$. Despite the label “general,” there are of course limitations to the application of Eqs. (8) and (9). For example, there has been recent work on the effects of strain-dependent surface energies [34]. The second form cannot represent such an effect because the derivation assumes that the surface energy depends only on local quantities ($H$ and $\nabla H$), whereas the strain effect is non-local. However, it is reasonable to conjecture that a more detailed analysis of the effects of a strain-dependent surface energy term would produce a coefficient function $f(k, \bar{H})$ not very different from the case c “general” form of Eq. (9).
Thus, the following analysis may very well apply to this more exotic model, but more study is needed to be certain.

2.1.2 Dynamics

As discussed in Sec. 2.1.1, the dynamics are derived assuming no flux of new material ($Q = 0$) and keeping only terms to linear order in the height fluctuation, $h(\mathbf{x}, t)$. Under these assumptions, Eq. (1) can be decomposed into a trivial equation for $\bar{H}$ and an equation for the film height fluctuation by inserting Eq. (7):

$$d\bar{H}/dt = 0, \qquad (10)$$
$$\partial_t h(\mathbf{x}) = \nabla\cdot\left[D\nabla\mu_{\text{lin}}(\mathbf{x})\right], \qquad (11)$$

where $\mu_{\text{lin}}(\mathbf{x})$ is the inverse Fourier transform of Eqs. (8) and (9), and it depends implicitly on the average film height $\bar{H}$. Note that the time dependence is implicit while the coordinate dependence is explicit. The explicit coordinate dependence serves to distinguish real-space functions such as $h(\mathbf{x})$ from their Fourier components $h_\mathbf{k}$. Assuming that the diffusivity $D$ is constant, the Fourier transform of Eq. (11) gives the linearized differential equation for the evolution of each Fourier component,

$$\partial_t h_\mathbf{k} = -Dk^2\mu_\mathbf{k} = -Dk^2 f(k, \bar{H})\,h_\mathbf{k}. \qquad (12)$$

Solving Eq. (12),

$$h_\mathbf{k}(t) = h_\mathbf{k}(0)\,e^{\sigma_k t}; \qquad (13)$$
$$\sigma_k = -Dk^2 f(k, \bar{H}). \qquad (14)$$

The surface evolves in reciprocal space as an initial condition, $h_\mathbf{k}(0)$, multiplied by an envelope function, $e^{\sigma_k t}$. For most values of $\bar{H}$, this envelope function has a peak. As time passes, this peak narrows and can be approximated by a Gaussian. To analyze this behavior, appropriate dimensionless variables are defined. Then, the stability of the film is discussed. Finally, $\sigma_k$ is expanded about its peak to aid analytic calculations.

Table 1: Characteristic wave numbers, characteristic times and associated dimensionless variables for the three cases addressed in Eq. (9).

| | $k_c$ | $t_c$ | $\alpha$ | $\beta$ |
|---|---|---|---|---|
| case a | $2M(1+\nu)\epsilon_m^2/\gamma$ | $\gamma^3/[16 D\Omega M^4(1+\nu)^4\epsilon_m^8]$ | $k/k_c$ | $\gamma W''(\bar{H})/[4M^2(1+\nu)^2\epsilon_m^4]$ |
| case b | $2M(1+\nu)\epsilon_m^2/F^{(02)}_{sw}$ | $(F^{(02)}_{sw})^3/[16 D\Omega M^4(1+\nu)^4\epsilon_m^8]$ | $k/k_c$ | $F^{(02)}_{sw}F^{(20)}_{sw}/[4M^2(1+\nu)^2\epsilon_m^4]$ |
| case c | $a/b$ | $b^3/(D\Omega a^4)$ | $k/k_c$ | $cb/a^2$ |

The analysis of the time-dependent behavior of the film height fluctuations is facilitated by using a characteristic wave number, a characteristic time and related dimensionless variables. For the “general” case c of Eq.
(9), the characteristic wavenumber is $k_c = a/b$, and the characteristic time is $t_c = 1/(D\Omega b k_c^4) = b^3/(D\Omega a^4)$. These characteristic dimensions can be used to define a dimensionless wave vector, $\boldsymbol{\alpha} = \mathbf{k}/k_c$, and a dimensionless wetting parameter, $\beta = c/(bk_c^2) = cb/a^2$. One can also define a dimensionless time, $\tau = t/t_c$. To obtain the corresponding characteristic scales for cases a and b, one merely has to plug in the appropriate substitutes for $a$, $b$ and $c$ and follow the pattern. For example, for case a, make the substitution $a \to 2M(1 + \nu)\epsilon_m^2$, etc. Table 1 summarizes these values for all three cases. For all three cases, $f(k, \bar{H})$ and the growth constant $\sigma_k$ reduce to the following forms:

$$f(k, \bar{H}) = f(k_c\alpha, \bar{H}) = \Omega b k_c^2\left(-\alpha + \alpha^2 + \beta\right), \qquad (15)$$
$$\sigma_k = \sigma_{k_c\alpha} = t_c^{-1}\alpha^2\left(\alpha - \alpha^2 - \beta\right), \qquad (16)$$

where $\alpha = \|\boldsymbol{\alpha}\| = k/k_c$ is the dimensionless wave number. These forms are plotted in Figs. 1a and 2. Fig. 1a shows $f(k, \bar{H})/\Omega b k_c^2$ vs. $\alpha$ for an isotropic or one-dimensional surface. Fig. 2 shows $t_c\sigma_k$ vs. $\alpha$ for a 2D anisotropic surface (Sec. 2.2). However, the curves marked 0° are identical to the dispersion relation for a 1D or 2D isotropic surface (compare Eqs. (9) and (23)).

2.1.3 Peaks

The peak growth rate and the corresponding wavenumber $k$ can be found from Eq. (16). $\sigma_k$ has a peak at $k_0 = k_c\alpha_0$ where

$$\alpha_0 = \tfrac{1}{8}\left(3 + \sqrt{9 - 32\beta}\right). \qquad (17)$$

Expanding $\sigma_k$ about this peak to second order in $k - k_0$,

$$\sigma_k \approx \sigma_0 - \tfrac{1}{2}\sigma_2(k - k_0)^2.$$

Figure 1: Dimensionless diffusion potential prefactors vs. dimensionless wave number. (a) The one-dimensional or isotropic case with $\beta = 0.3$. (b) The elastically anisotropic case with anisotropy $\epsilon_A = 0.1$ (see Eq. (22)).

Figure 2: Dimensionless growth constant vs. dimensionless wave number. Curves are plotted for the elastically anisotropic case, but the curves marked 0° are the same as for the isotropic cases. In (a), $\beta = 0$. In (b), $\beta = 0.2$.

Figure 3: Exponential envelope $e^{\sigma_k t}$ as a function of $\alpha$ for $\beta = 0.208$ and $t/t_c = 100$. (a) 2D isotropic surface.
(b) 2D anisotropic surface with $\epsilon_A = 0.1236$.

The two constants are

$$\sigma_0 = \tfrac{1}{4}\,t_c^{-1}\alpha_0^2\,(\alpha_0 - 2\beta), \qquad (18)$$
$$\sigma_2 = t_c^{-1}k_c^{-2}\,(3\alpha_0 - 4\beta). \qquad (19)$$

Inserting this approximation for $\sigma_k$ into Eq. (13),

$$h_\mathbf{k}(t) = h_\mathbf{k}(0)\,e^{\sigma_0 t}\,e^{-\frac{1}{2}\sigma_2 t (k - k_0)^2}. \qquad (20)$$

The individual initial surface fluctuation components grow with a Gaussian-shaped envelope. An example of this envelope is plotted in Fig. 3(a). Notice that in two dimensions, the envelope forms a ring, as the peak is about the wave number $k_0$ but not about any particular point in the $\mathbf{k}$-plane.

2.1.4 Stability and wetting potential

Stranski-Krastanow growth is marked by a transition from stable two-dimensional growth to unstable three-dimensional growth once a critical height $H_c$ is reached [1]. Eqs. (17), (18) and (20) are useful for analyzing the transition from stable to unstable growth. In order for this transition to occur, there must be some stabilizing term in the diffusion potential. In the present model, this means that there must be some surface energy-like term that varies strongly with film height. This condition equates to stating that $W''(\bar{H})$ or $F^{(20)}_{sw}$ or $c$ (Eq. (9)) must be rather large if $\bar{H} < H_c$. However, as $\bar{H}$ increases, these terms are reduced. Finally, when $\bar{H} > H_c$, this term is no longer capable of stabilizing the film against fluctuations of all possible wavelengths. The critical value $H_c$ can be found using the analysis from [33]. By inspection of Eqs. (8), (9) and (12), modes with $f > 0$ increase the total free energy $F$ as they grow; thus, they are stable and decay with time. Modes with $f < 0$ decrease the total free energy $F$ as they grow; thus, they are unstable and grow with time. This growth and decay rule is easily verified by inspection of Eq. (14). Thus, stable growth occurs when $f(k, \bar{H}) > 0$ for all values of $k$, and unstable growth occurs when $f(k, \bar{H}) < 0$ for some values of $k$. Thus, the transition from stable to unstable growth occurs when the minimum value of $f(k, \bar{H})$ just becomes negative.
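The peak expansion and the sign rule for $f$ can be checked numerically. The following is a minimal sketch in dimensionless units, assuming the forms $f/(\Omega b k_c^2) = -\alpha + \alpha^2 + \beta$ and $t_c\sigma_k = -\alpha^2 f/(\Omega b k_c^2)$ that follow from Eqs. (12) and (14) with the characteristic scales of Table 1. The function names and the scan grid are illustrative choices, not part of the original analysis.

```python
import math

def f_dimless(alpha, beta):
    """Dimensionless prefactor f / (Omega * b * k_c**2) = -alpha + alpha**2 + beta."""
    return -alpha + alpha**2 + beta

def sigma_dimless(alpha, beta):
    """Dimensionless growth rate t_c * sigma_k = -alpha**2 * f_dimless."""
    return -alpha**2 * f_dimless(alpha, beta)

def alpha_peak(beta):
    """Peak position alpha_0 of the growth rate (requires beta <= 9/32)."""
    return (3.0 + math.sqrt(9.0 - 32.0 * beta)) / 8.0

def sigma0(beta):
    """Peak growth rate in units of 1/t_c."""
    a0 = alpha_peak(beta)
    return 0.25 * a0**2 * (a0 - 2.0 * beta)

def sigma2(beta):
    """Peak curvature coefficient in units of 1/(t_c * k_c**2)."""
    return 3.0 * alpha_peak(beta) - 4.0 * beta

def has_unstable_mode(beta, n=2000, alpha_max=2.0):
    """True if f < 0 (a growing mode) at some point of a uniform scan of alpha."""
    return any(f_dimless(alpha_max * i / n, beta) < 0.0 for i in range(1, n + 1))

if __name__ == "__main__":
    beta = 0.2
    a0 = alpha_peak(beta)
    step = 1e-4
    # Central finite difference for the second derivative at the peak.
    curvature = (sigma_dimless(a0 + step, beta) - 2.0 * sigma_dimless(a0, beta)
                 + sigma_dimless(a0 - step, beta)) / step**2
    print("peak value, direct vs closed form:", sigma_dimless(a0, beta), sigma0(beta))
    print("peak curvature, numeric vs closed form:", curvature, -sigma2(beta))
    print("growing modes present:", has_unstable_mode(beta))
```

Comparing the direct evaluation with the closed forms is a quick consistency check on the expansion coefficients, and the scan makes the rule "unstable if $f < 0$ for some $k$" concrete for a chosen $\beta$.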
Using the same dimensional analysis as in the previous section and following the discussion of [33], one finds that the minimum value, $f_{\min} = \Omega b k_c^2(\beta - 1/4)$, occurs at $k_{\min}/k_c = \alpha_{\min} = 1/2$. $f_{\min}$ first becomes negative, and the transition to unstable growth occurs, when the dimensionless wetting parameter (Table 1) drops to the critical value $\beta = 1/4$: $\beta > 1/4$ gives stable 2D growth, and $\beta < 1/4$ gives unstable 3D growth. It is reasonable to suppose that $W(\bar{H})$, $W''(\bar{H})$, and thus $\beta$ are positive, monotonically decreasing functions of $\bar{H}$ so that the interface becomes less important for large values of $\bar{H}$. For example, in [50] it is assumed that $W(H) = B/H$, where $B$ is constant. When $\beta \to 0$, corresponding to large $\bar{H}$, the case discussed in [28] is obtained. A similar analysis can be done for cases b and c once one specifies how the terms $F^{(20)}_{sw}$ and $F^{(02)}_{sw}$, or $a$, $b$ and $c$, depend on $\bar{H}$.

Using an assumed form for a wetting potential, one can find the critical film height $H_c$ by setting $\beta = 1/4$. Applying this condition to case a (Eq. (3)) gives $W''(H_c) = \gamma k_c^2/4$. Using the wetting potential of [50] as an example, $W(H) = B/H$,

$$H_c = \sqrt[3]{8B/(\gamma k_c^2)} = \sqrt[3]{8B\gamma/(2M(1 + \nu)\epsilon_m^2)^2}. \qquad (21)$$

Conversely, one can fit a wetting potential to an observed or reasonable critical layer thickness from the same condition. Using the example wetting potential from [50],

$$B = \frac{H_c^3\,(2M(1 + \nu)\epsilon_m^2)^2}{8\gamma},$$

as stated in [50].³

2.2 2D Anisotropic case

Crystal anisotropy leads to a dispersion relation $\sigma_\mathbf{k}$ that is both quantitatively and qualitatively different from the isotropic case. Here the effect of elastic anisotropy is discussed in most detail. Other sources of anisotropy are the surface and wetting energies. For example, in [38] the surface energy density is orientation dependent, which introduces a possible anisotropy in the dispersion relation. Possible sources of anisotropy are an anisotropic elastic stiffness tensor, an orientation-dependent surface energy or wetting potential, or anisotropic diffusion.
As discussed below, the form of anisotropy to linear order in the height fluctuation, $h$, is somewhat restricted. Results are presented for 4-fold symmetric surfaces, that is, surfaces that have invariant dynamic evolution laws when rotated by 90°. Possible complications arising from 2-fold symmetric anisotropic terms (with 180° rotational symmetry) are also discussed. As for the isotropic case, first the energetics are discussed, then the dynamics, and finally the expansion about the peaks in the dispersion relation, $\sigma_\mathbf{k}$.

2.2.1 Energetics

The discussion of energetics will first treat the effects of elastic anisotropy and then anisotropy resulting from surface or wetting-like terms.

³ This result from [50] corresponds to the choice $F_{sw}(H, \nabla H) = \sqrt{1 + (\nabla H)^2}\,\gamma + W(H)$. However, the numerical model in [50] appears to use $F_{sw}(H, \nabla H) = \sqrt{1 + (\nabla H)^2}\,[\gamma(\nabla H) + W(H)]$. This difference should lead to a slightly different critical film height in their numerical model from the one that they predicted (Eq. (21)).

Figure 4: Plot of $E_{\theta_k}/(M\epsilon_m^2)$ for various materials. Symbols indicate values calculated using Appendix C. Solid lines are the interpolation (Eq. (22)) using the values from Table 2.

Table 2: Elastic constants [51] and calculated values (see Appendix C) for various materials of interest at T = 300 K.

| | $c_{11}$ ($10^{11}$ erg/cm³) | $c_{12}$ ($10^{11}$ erg/cm³) | $c_{44}$ ($10^{11}$ erg/cm³) | $M$ ($10^{11}$ erg/cm³) | $E_{0°}/(M\epsilon_m^2)$ | $E_{45°}/(M\epsilon_m^2)$ | $\epsilon_A$ |
|---|---|---|---|---|---|---|---|
| Ge | 12.60 | 4.40 | 6.77 | 13.93 | 2.16 | 1.906 | 0.1176 |
| Si | 16.60 | 6.40 | 7.96 | 18.07 | 2.22 | 1.997 | 0.1005 |
| InAs | 8.34 | 4.54 | 3.95 | 7.94 | 2.70 | 2.09 | 0.226 |
| GaAs | 11.90 | 5.34 | 5.96 | 12.45 | 2.15 | 1.87 | 0.1302 |

2.2.1.1 Elastic anisotropy

One would like to obtain a simple symbolic expression for the elastic energy density at the free surface, $\omega_\mathbf{k}$, to first order in $h_\mathbf{k}$ for the elastically anisotropic case. Similar discussions can be found in [25, 26]. For the isotropic case, $\omega_\mathbf{k} = -2M(1 + \nu)\epsilon_m^2\,k\,h_\mathbf{k}$. For the anisotropic case, $\omega_\mathbf{k} = -E_{\theta_k}k\,h_\mathbf{k}$, where the prefactor $E_{\theta_k}$ is the decrease in elastic energy at the surface per unit wave number ($k \to 1$) and unit amplitude ($h_\mathbf{k} \to 1$).
It is not constant, but instead depends on $\theta_k$, the angle that $\mathbf{k}$ makes with the $x$-direction. The case of a cube-symmetry elastic stiffness tensor, such as for Si, is considered, where one must specify three elastic constants $c_{11}$, $c_{12}$ and $c_{44}$ [51]. Growth on a (100) surface will produce an elastic energy prefactor $E_{\theta_k}$ that is four-fold symmetric (symmetric upon rotations by 90°). A procedure similar to [25, 26] based on a first-order perturbation analysis is followed (Appendix C). A relatively simple interpolation formula [24] is hypothesized and then verified numerically. The interpolation procedure suggested in [24] uses the lowest possible order expansion in $\sin(\theta_k)$ and $\cos(\theta_k)$ that has the appropriate four-fold symmetry and then interpolates between $\theta_k = 0°$ and $\theta_k = 45°$. Thus,

$$E_{\theta_k} = E_{0°}\left[1 - \epsilon_A\sin^2(2\theta_k)\right], \qquad (22)$$

where $\epsilon_A = (E_{0°} - E_{45°})/E_{0°}$ is an anisotropy factor. This lowest-order form turns out to be a very good fit to numerical calculations (Fig. 4). Table 2 gives values of $E_{0°}$ and $\epsilon_A$ for some systems of interest. In the elastically isotropic case, $E_{0°} = E_{45°} = 2M(1 + \nu)\epsilon_m^2$ so that $\epsilon_A = 0$.

There are two important differences from the elastically isotropic case. The first is obvious: $E_{\theta_k}$ depends on angular orientation, $\theta_k$. The second is that the peak value of $\omega_\mathbf{k}$ is not the same as that for the elastically isotropic case because, in general, $E_{0°} \neq 2M(1 + \nu)\epsilon_m^2$. In [24], where the purpose was simply to investigate the mechanism by which elastic anisotropy affects order, this second difference was neglected.

2.2.1.2 Surface and Wetting Energy Anisotropy

The surface energy and wetting potential can be additional sources of anisotropy if they depend on the surface orientation so that $\gamma \to \gamma(\nabla H)$ or $W(H) \to W(H, \nabla H)$ (for example, [52, 38]). Then, to first order in $h$,

$$\mu_{\text{surf.},\mathbf{k}} = \Omega\left[\gamma k^2 + \mathbf{k}\cdot\tilde{\gamma}''\cdot\mathbf{k}\right]h_\mathbf{k},$$

where $\tilde{\gamma}''$ is the (2×2) Hessian matrix that results from taking the second derivatives of $\gamma(\nabla H)$ with respect to the two components of $\nabla H$ (Appendix B.1).
Similarly,

$$\mu_{\text{wet},\mathbf{k}} = \Omega\left[W^{(20)} + \mathbf{k}\cdot\tilde{W}^{(02)}\cdot\mathbf{k}\right]h_\mathbf{k},$$

where $W^{(20)}$ and $\tilde{W}^{(02)}$ are the second derivatives of $W(H, \nabla H)$ with respect to $H$ and $\nabla H$ (Appendix B.1). For both $\mu_{\text{surf.},\mathbf{k}}$ and $\mu_{\text{wet},\mathbf{k}}$, the first term is isotropic, and the second term contains any possible anisotropy. The rank of the $\tilde{\gamma}''$ and $\tilde{W}^{(02)}$ matrices greatly restricts the possible forms of the additional anisotropy. These (2×2) matrices must be either two-fold symmetric or perfectly isotropic. Thus, if the surface energy and wetting potential are four-fold symmetric as $E_{\theta_k}$ is, then $\tilde{\gamma}'' \to \gamma''$, a scalar, and $\tilde{W}^{(02)} \to W^{(02)}$, a scalar, and neither one contributes any additional anisotropy. They do, however, help to stabilize or further destabilize the 2D surface as they add terms proportional to $k^2$. The effect of these additional terms is indistinguishable from the effect of varying the value of the surface energy density, $\gamma$ [52, 31].

It should be noted that the (100) surface of diamond or zinc-blende structures allows for anisotropy that is only 2-fold symmetric (rotations by 180°). Thus, such terms could “break” the four-fold symmetry that occurs when one considers the elastic anisotropy alone. However, this “broken” symmetry is somewhat dubious because even the diamond and zinc-blende structures have a screw symmetry (rotations by 90° and translation in the [100] direction by half a lattice vector). Thus, if for example $W(H, \nabla H)$ is anisotropic with two-fold symmetry to linear order, there must be a fast oscillation with changes in the film height $H$. In Appendix D, a similar term related to anisotropic diffusion is discussed. There does not appear to be any evidence for this two-fold symmetry in the case of (100) surfaces of IV/IV systems such as Ge/Si, but in III-V/III-V systems the four-fold symmetry of the (100) surface may indeed be “broken” in this way, corresponding to either a surface energy anisotropy or a diffusional anisotropy [53, 54].
Further analysis of such terms in any more detail would greatly complicate the present discussion, so it is left for future work. Most of the modeling literature avoids this complication by not including the symmetry-breaking of the zinc-blende surface, for example [25, 26, 38].

One can perform a similar analysis of the combined surface and wetting potential, $F_{sw}(H, \nabla H)$ (case b). To linear order, the resulting anisotropic diffusion potential is (Appendix B.2)

$$\mu_{sw,\mathbf{k}} = \Omega\left[F^{(20)}_{sw} + \mathbf{k}\cdot\tilde{F}^{(02)}_{sw}\cdot\mathbf{k}\right]h_\mathbf{k}.$$

Again, $\tilde{F}^{(02)}_{sw}$ is a rank 2 tensor, and all of the same symmetry considerations apply here as well.

Because the two-fold symmetric anisotropic terms are excluded from the current discussion, and isotropic terms simply “renormalize” the effective surface energy, there will be no further consideration of anisotropy resulting from the surface energy or wetting potential in this discussion. Further calculations will proceed assuming that neither the surface energy density, $\gamma$, nor the wetting potential, $W(H)$, depends on $\nabla H$, or similarly that $F_{sw}(H, \nabla H)$ has a purely isotropic dependence on $\nabla H$. This assumption can be made without affecting any of the qualitative results.

2.2.1.3 Total diffusion potential

Having dispensed with the discussion of the various sources of anisotropy, the total diffusion potential is stated for the case of 4-fold symmetric elastic anisotropy and a completely isotropic surface energy and wetting potential: $\mu_\mathbf{k} = f(\mathbf{k}, \bar{H})\,h_\mathbf{k}$ with

$$f(\mathbf{k}, \bar{H}) = \Omega \times \begin{cases} -E_{0°}\left[1 - \epsilon_A\sin^2(2\theta_k)\right]k + \gamma k^2 + W''(\bar{H}); & \text{case a (Eq. (3))} \\ -E_{0°}\left[1 - \epsilon_A\sin^2(2\theta_k)\right]k + F^{(02)}_{sw}k^2 + F^{(20)}_{sw}; & \text{case b (Eq. (5))} \\ -a\left[1 - \epsilon_A\sin^2(2\theta_k)\right]k + bk^2 + c; & \text{case c (general)}. \end{cases} \qquad (23)$$

Table 3: Characteristic wave numbers, characteristic times and associated dimensionless variables for the three cases addressed in Eq. (23).

| | $k_c$ | $t_c$ | $\alpha$ | $\beta$ |
|---|---|---|---|---|
| case a | $E_{0°}/\gamma$ | $\gamma^3/(D\Omega E_{0°}^4)$ | $k/k_c$ | $\gamma W''(\bar{H})/E_{0°}^2$ |
| case b | $E_{0°}/F^{(02)}_{sw}$ | $(F^{(02)}_{sw})^3/(D\Omega E_{0°}^4)$ | $k/k_c$ | $F^{(02)}_{sw}F^{(20)}_{sw}/E_{0°}^2$ |
| case c | $a/b$ | $b^3/(D\Omega a^4)$ | $k/k_c$ | $cb/a^2$ |

2.2.2 Dynamics

The dynamics is governed by surface diffusion, just as for the fully isotropic case.
It is assumed that the diffusivity is isotropic, as was done for the surface energy and the wetting energies; thus, all anisotropy in the film evolution dynamics comes from elastic effects alone. The possibility and effects of an anisotropic diffusion potential are discussed in Appendix D (also see [54]). The time dependence of the surface perturbations simply follows Eqs. (13) and (14), but with Eq. (23) used for $f(\mathbf{k}, \bar{H})$. As for the isotropic case, appropriate characteristic wave numbers ($k_c$) and time scales ($t_c$) can be found for each of the three cases along with the associated dimensionless wave vector $\boldsymbol{\alpha}$ and dimensionless wetting parameter $\beta$. These are listed in Table 3. The dispersion relation, $\sigma_\mathbf{k}$, can be expressed in terms of these dimensionless variables ($\boldsymbol{\alpha}$ and $\beta$), giving

$$\sigma_\mathbf{k} = \sigma_{k_c\boldsymbol{\alpha}} = t_c^{-1}\alpha^2\left\{\alpha\left[1 - \epsilon_A\sin^2(2\theta_k)\right] - \alpha^2 - \beta\right\}. \qquad (24)$$

The stability behavior is essentially the same as for the isotropic case, with a transition occurring at $\beta = 1/4$ corresponding to $\bar{H} = H_c$.

2.2.3 Expansion about peaks

$\sigma_\mathbf{k}$ has 4 peaks at $(k, \theta_k) = (k_0, \pi[n - 1]/2)$ with $k_0 = k_c\alpha_0$ (Eq. (17)) and $n = 1 \dots 4$. In vector form, there are four peaks at

$$\mathbf{k}_n = k_0\left(\cos(\pi(n - 1)/2)\,\mathbf{i} + \sin(\pi(n - 1)/2)\,\mathbf{j}\right).$$

Similar to the isotropic case, $\sigma_\mathbf{k}$ can be expanded about individual peaks so that in the vicinity of peak $n$, $\sigma_\mathbf{k} \approx \sigma_n$ with

$$\sigma_n = \sigma_0 - \tfrac{1}{2}\sigma_\parallel(k - k_0)^2 - \tfrac{1}{2}\sigma_\perp k_0^2\left(\theta_k - (n - 1)\pi/2\right)^2,$$

where $\sigma_0$ is given by Eq. (18), $\sigma_\parallel = \sigma_2$ is given by Eq. (19), and $\sigma_\perp = 8\epsilon_A\alpha_0 t_c^{-1}k_c^{-2}$. In terms of the vector components parallel and perpendicular to $\mathbf{k}_n$, $k_\parallel$ and $k_\perp$ respectively,

$$\sigma_n = \sigma_0 - \tfrac{1}{2}\sigma_\parallel(k_\parallel - k_0)^2 - \tfrac{1}{2}\sigma_\perp k_\perp^2,$$

with $k_\parallel = \cos[\pi(n - 1)/2]\,k_x + \sin[\pi(n - 1)/2]\,k_y$ and $k_\perp = -\sin[\pi(n - 1)/2]\,k_x + \cos[\pi(n - 1)/2]\,k_y$. The time evolution of $h_\mathbf{k}$ in the vicinity of one of the $\mathbf{k}_n$ is

$$h_\mathbf{k}(t) \approx h_\mathbf{k}(0)\,e^{t\left(\sigma_0 - \frac{1}{2}\sigma_2(k_\parallel - k_0)^2 - \frac{1}{2}\sigma_\perp k_\perp^2\right)}.$$

3 Correlation Functions

Correlation functions and associated constants such as correlation lengths can be very useful for characterizing order. In particular, the autocorrelation function (Eq. (25)) and its Fourier transform (Eq.
(26)), also known as the spectrum function, can give a very good characterization of dot order (Figs. 6a and c and 5b, e and h). The autocorrelation function is denoted $C^A(\Delta\mathbf{x})$, where $\Delta\mathbf{x}$ is the difference vector between two points in the $x$-plane. The spectrum function is a function of $\mathbf{k}$, and it is denoted $C^A_\mathbf{k}$. The goal here is to be able to predict these two functions and to describe them quantitatively in a manner that can be used to characterize SAQD order with just a few numbers. The autocorrelation function is the result of a spatial average over one experiment or one simulation (numerical experiment). It is regular and repeatable because it is closely tied to the correlation function and spectrum function that result from an ensemble average (Eqs. X and X). These are denoted as $C(\Delta\mathbf{x})$ and the spectrum $C_\mathbf{k}$ respectively. Note that the ensemble-averaged functions do not have a superscript “A.” These ensemble-average correlation functions are useful in the analysis of stochastic ordinary and partial differential equations [55, 56]. From a strictly technical viewpoint, the spatial average and the ensemble average are not exactly the same; however, they are closely enough connected that it is reasonable to use one as a substitute for the other (Sec. 3.1 and Appendix 3).

In the following, the analysis of SAQD order via autocorrelation and correlation functions is discussed (Sec. 3.1). Then, the stochastic initial conditions are discussed (Sec. 3.2). Then, the prediction of the Fourier transforms of the correlation functions is discussed (Sec. 3.3). The real-space correlation functions are presented (Sec. 3.4). Finally, there are some notes regarding generalizing the analysis method to any dispersion relation that has peaks (Sec. 3.5), for example, peaks related to broken four-fold symmetry or growth on a miscut substrate.

3.1 Correlation Functions and SAQD order

Autocorrelation functions are well-suited for investigating SAQD order.
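The autocorrelation function is defined as

$$C^A(\Delta\mathbf{x}) = \frac{1}{A}\int d^2x'\; h(\Delta\mathbf{x} + \mathbf{x}')\,h(\mathbf{x}')^*. \qquad (25)$$

Its Fourier transform, sometimes called the spectrum [56], spectrum function or power spectrum, is

$$C^A_\mathbf{k} = \frac{1}{(2\pi)^d}\int d^2\Delta x\; e^{-i\mathbf{k}\cdot\Delta\mathbf{x}}\,C^A(\Delta\mathbf{x}) = \frac{(2\pi)^d}{A}\,|h_\mathbf{k}|^2, \qquad (26)$$

where $A$ is the projected area of the film in the $x$-$y$ plane. A periodic array of SAQDs leads to a periodic autocorrelation function. A nearly periodic array leads to a range-limited periodic autocorrelation function. The ensemble mean of these autocorrelation functions can be calculated, and it is a good predictor of SAQD order.

3.1.1 Periodic array

Consider a perfectly periodic height fluctuation corresponding to a perfect lattice of SAQDs,

$$h(\mathbf{x}) \propto h_0\sum_{n=1}^{N}\exp\left[i\mathbf{k}_n\cdot(\mathbf{x} - \mathbf{x}_O)\right] \qquad (27)$$

plus higher-order harmonics, where the dots have a height proportional to $h_0$, $N$ is the degree of symmetry (probably 4-fold or 6-fold), $\mathbf{x}_O$ is a random origin offset, and

$$\mathbf{k}_n = k_0\left[\cos\left(\frac{2\pi(n - 1)}{N}\right)\mathbf{i} + \sin\left(\frac{2\pi(n - 1)}{N}\right)\mathbf{j}\right].$$

In a linear analysis, the higher-order harmonics do not come into play, so they are neglected here. In reciprocal space,

$$h_\mathbf{k} \propto h_0\sum_{n=1}^{N} e^{-i\mathbf{k}_n\cdot\mathbf{x}_O}\,\delta^d(\mathbf{k} - \mathbf{k}_n)$$

plus higher-order harmonics. The autocorrelation function is found by plugging Eq. (27) into Eq. (25) and simplifying,

$$C^A(\Delta\mathbf{x}) \propto h_0^2\sum_{n=1}^{N}\exp\left[i\mathbf{k}_n\cdot\Delta\mathbf{x}\right] \qquad (28)$$

plus higher-order harmonics. In finding Eq. (28), the relation

$$\int d^2x'\; e^{i(\mathbf{k}_m - \mathbf{k}_n)\cdot\mathbf{x}'} = A\,\delta_{\mathbf{k}_m\mathbf{k}_n} = (2\pi)^d\,\delta^d(\mathbf{k}_m - \mathbf{k}_n) \qquad (29)$$

has been used. $\delta_{\mathbf{k}\mathbf{k}'}$ is the Kronecker delta, and $\delta^d(\mathbf{k} - \mathbf{k}')$ is the Dirac delta. Eq. (29) will be helpful whenever it is necessary to take an areal average or sum over Dirac delta functions. In reciprocal space,

$$C^A_\mathbf{k} \propto \frac{(2\pi)^2}{A}\,h_0^2\sum_{m,n=1}^{N}\delta^2(\mathbf{k} - \mathbf{k}_m)\,\delta^2(\mathbf{k} - \mathbf{k}_n) = h_0^2\sum_{i=1}^{N}\delta^2(\mathbf{k} - \mathbf{k}_i) \qquad (30)$$

plus higher-order harmonics, where $\delta^d(\mathbf{k} - \mathbf{k}_n) = (A/(2\pi)^d)\,\delta_{\mathbf{k}\mathbf{k}_n}$.⁴ Thus, the order of the SAQD lattice manifests itself as periodic functions in real space (Eq. (28)) and sharp peaks in reciprocal space (Eq. (30)).

3.1.2 Nearly Periodic array

A nearly periodic array shows deviation from perfect order.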
This deviation shows itself by broadening of the peaks of the spectrum function, $C^A_\mathbf{k}$, and by range-limited periodicity of the real-space autocorrelation function, $C^A(\Delta\mathbf{x})$. These two measures of disorder are naturally related.

The disorder in lateral dot size, $\Delta_{\text{size}}$, and spacing, $\Delta_{\text{spacing}}$, are related to each other and to the broadening of the peaks in $C^A_\mathbf{k}$ (Fig. 6.a and c). Prior to ripening, the size and spatial order should be related, as the volume of a dot should be proportional to the amount of nearby material. If the SAQDs have nearly uniform size and spacing (peak-to-peak distance) $L_0$, the reciprocal-space autocorrelation function will be tightly clustered around the wavenumber characterizing the dot spacing, $k_0 = 2\pi/L_0$. There are a number of such peaks depending on the system symmetry (Fig. 6.a and c), but consider just one. Since the order is not perfect, the peak will have a finite width. Consequently, there will be a scatter in the dot size. Since $L_0 = 2\pi/k_0$, the scatter in dot spacing ($\Delta_{\text{spacing}}$) is related to the scatter in Fourier components ($\Delta k$). Taking the derivative of the spacing-wavenumber relation and rearranging,

$$\frac{\Delta_{\text{spacing}}}{L_0} = \frac{\Delta k}{k_0}.$$

It is reasonable to expect that the fractional disorder in size ($\Delta_{\text{size}}/L_{\text{size}}$) is given by a similar (if not exactly the same) number.

Another way to view spatial order (periodicity) is not by dot-dot distances, but by the distance over which the dot array can be considered periodic. This limited periodicity is evident in the film height autocorrelation function (Eq. (25) and Figs. 5.b, e and h). Consider two distant dots. Their positions will be completely uncorrelated, so it will be completely random as to whether one position corresponds to a peak or a valley. Thus, for a large difference in position the autocorrelation function tends to zero, $C^A(\Delta\mathbf{x}_{\text{large}}) = 0$. Similarly, the mean-square fluctuation of the film height can be large so that $C^A(\Delta\mathbf{x} = 0) \gg 0$.
The distance over which the autocorrelation function, $C^A(\Delta\mathbf{x})$, decays to 0 is the correlation length, $L_{\text{cor}}$. Thus, $L_{\text{cor}}$ is a reasonable measure of spatial order.

The two measures of order, $\Delta_{\text{spacing}}$ and $L_{\text{cor}}$, are intrinsically linked. The well-known rule of Fourier transforms states that the product of the real-space and reciprocal-space widths must be greater than or equal to unity. Thus, $\Delta k\,L_{\text{cor}} \geq 1$, or, using $\Delta_{\text{spacing}} = L_0\,\Delta k/k_0$ with $k_0 = 2\pi/L_0$, $\Delta_{\text{spacing}} \geq L_0^2/(2\pi L_{\text{cor}})$. Similarly, one can expect that $\Delta_{\text{size}} \sim L_{\text{size}}^2/L_{\text{cor}}$. Thus, assuming that dot size is governed by the amount of nearby material, small dispersions in dot size are only possible if there is a long correlation length.

3.1.3 Ensemble Correlation Functions / ergodicity

SAQDs are seeded by random fluctuations. Consequently, each experiment or simulation must be treated as just one possible realization, and the autocorrelation function will be different for each realization. Thus, for analytic predictions, one must rely on ensemble averages. In [24], it was assumed that the ensemble-average correlation function was a good description of SAQD order, an assumption that was borne out by numerical calculations. Now, this relation is put on more solid ground. In particular, it is found that the ensemble correlation functions provide good estimates of the autocorrelation function and spectrum function produced by any particular realization.

First, it is shown that the mean value of the film-height fluctuation is zero. Then the method to calculate the ensemble-averaged autocorrelation function and spectrum function is presented. Additional mathematical details are presented in Appendix 3.

⁴ Eq. (29) has been used to help with summation.

3.1.3.1 Mean fluctuation

It is fairly straightforward to show that the ensemble mean film-height fluctuation is zero. The governing dynamics (Eq. (12)) is invariant upon the substitution $h(\mathbf{x}, t) \to -h(\mathbf{x}, t)$.
Thus, assuming that one does not bias the initial conditions, the mean fluctuations must be zero for all time, $\langle h(\mathbf{x}, t)\rangle = \langle -h(\mathbf{x}, t)\rangle = 0$, and $\langle h_\mathbf{k}(t)\rangle = 0$. This is a common situation, and it is most appropriate to characterize the film height fluctuations using the two-point correlation function (or simply “the correlation function”) [55].

3.1.3.2 Correlation Function

The autocorrelation function can be estimated by its ensemble average. Furthermore, this ensemble average is equivalent to the correlation function that can be easily calculated analytically. These relations are first discussed for the real-space correlation functions and then for their Fourier transforms. First, the statistical properties of the autocorrelation function are discussed. Then the statistical properties of the spectrum function are discussed. The main results are reported here, and details of derivations are reported in Appendix E.

First, it is noted that the autocorrelation function averaged over all realizations is equal to the ensemble correlation function,

$$\left\langle C^A(\Delta\mathbf{x})\right\rangle = C(\Delta\mathbf{x}), \quad\text{where}\quad C(\Delta\mathbf{x}) = \langle h(\Delta\mathbf{x})\,h(0)\rangle, \qquad (31)$$

where $\langle\dots\rangle$ indicates an ensemble average. Eq. (31) assumes that the model of film growth is translationally invariant.⁵ This relationship is fortunate, in that it allows one to predict the “typical” autocorrelation function using analytic tools that apply only to ensemble averages.

Second, it is noted that as the area that is used to calculate the autocorrelation function becomes large, the autocorrelation function tends towards its mean value,

$$C^A(\Delta\mathbf{x}) \approx C(\Delta\mathbf{x}) + O[A^{-1/2}], \qquad (32)$$

where $O[A^{-1/2}]$ indicates statistical fluctuations about the mean value that become smaller and smaller as the area in an experiment or the simulation area in a numerical experiment becomes larger. These fluctuations or noise die out as $A^{-1/2}$. For example, the autocorrelation functions in Figs. 5.e and h are very close to the ensemble-average autocorrelation functions Figs.
5.f and h, but have random fluctuations that are most visible far from the origin. This property, that averaging over a parameter such as position is equivalent to averaging over all realizations, is known as ergodicity. Individual realizations are tightly distributed about a “typical” behavior. This tight distribution lends credibility to the notion that one can have representative experiments or simulations. Unfortunately, the “demonstration” of Eq. (32) in Appendix E is not as general as one might like. Rigorously, it applies when the Fourier components of film height ($h_\mathbf{k}$) are independent and normally distributed; however, it is reasonable to conjecture that a relationship like Eq. (32) holds whenever the statistical distribution of film heights is suitably bounded, as the boundedness of $C^A_\mathbf{k}$ plays an important role in the derivations.

In reciprocal space, one finds that the ensemble-mean spectrum function is

$$\left\langle C^A_\mathbf{k}\right\rangle = C_\mathbf{k}, \qquad (33)$$

where $C_\mathbf{k}$ is defined as the prefactor appearing in the reciprocal-space two-point correlation function,

$$C_{\mathbf{k}\mathbf{k}'} = \langle h_\mathbf{k}h^*_{\mathbf{k}'}\rangle = C_\mathbf{k}\,\delta^d(\mathbf{k} - \mathbf{k}') = C_\mathbf{k}\,\frac{A}{(2\pi)^d}\,\delta_{\mathbf{k}\mathbf{k}'}, \qquad (34)$$

where Eq. (29) has been used. This form for the two-point correlation function in reciprocal space occurs if and only if the system is translationally invariant. Eq. (33) is valuable because one can solve for $C_\mathbf{k}$ analytically in the linear case or using various analytic approximations in the non-linear case.

⁵ A quick survey of the literature will find that virtually all published continuum models of SAQD formation are translationally invariant.

Unlike the autocorrelation function, the spectrum function fluctuates greatly about its mean. In fact, the fluctuations are about 100% (Appendix E.2). These large fluctuations result in the commonly observed speckle pattern for the spectrum function $C^A_\mathbf{k}$ (Figs. 6.a and c). Contrast this pattern with the ensemble-mean spectrum function $C_\mathbf{k}$ shown in Figs. 6.b and d. These speckles can be removed by a smoothing operation, and a relation similar to Eq.
(32) results (Appendix E.2.2). Finally, it should be noted that just as $C^A_\mathbf{k}$ is the Fourier transform of $C^A(\Delta\mathbf{x})$, $C_\mathbf{k}$ is the Fourier transform of $C(\Delta\mathbf{x})$ (Appendix E.1).

3.2 Stochastic Initial Conditions

To model or simulate the formation of SAQDs, it is absolutely essential to include some sort of stochastic effect. An initially flat film, $h(\mathbf{x}, 0) = 0$, is in unstable equilibrium. Thus, to seed the formation of quantum dots, it is necessary to perturb the flat surface. The simplest method to do this is to use stochastic initial conditions with deterministic evolution. One can tentatively suppose that white noise initial conditions do not “bias” the ultimate evolution of the film [57]. Thus, the initial conditions are taken from an ensemble with zero mean,

$$\langle h(\mathbf{x}, 0)\rangle = 0, \qquad (35)$$

and a spatial correlation function,

$$C(\mathbf{x}, \mathbf{x}', 0) = \langle h(\mathbf{x}, 0)\,h(\mathbf{x}', 0)^*\rangle = \Delta^2\,\delta^d(\mathbf{x} - \mathbf{x}'), \qquad (36)$$

where the brackets $\langle\dots\rangle$ indicate an ensemble average, $\Delta$ is the noise amplitude, and $\delta^d(\mathbf{x})$ is the $d$-dimensional Dirac delta function. White noise conditions have an infinite amplitude, which is not physical. Thus, a minimal modification can be made to “cut off” the infinite fluctuations,

$$C(\mathbf{x}, \mathbf{x}', 0) = \frac{\Delta^2}{(2\pi b_0^2)^{d/2}}\exp\left[-\frac{(\mathbf{x} - \mathbf{x}')^2}{2b_0^2}\right].$$

In the limit $b_0 \to 0$, this correlation function reverts to the white noise correlation function. In reciprocal space,

$$C_{\mathbf{k}\mathbf{k}'}(0) = \langle h_\mathbf{k}(0)\,h^*_{\mathbf{k}'}(0)\rangle = (2\pi)^{-2d}\int d^dx\,d^dx'\; e^{(-i\mathbf{k}\cdot\mathbf{x} + i\mathbf{k}'\cdot\mathbf{x}')}\,C(\mathbf{x}, \mathbf{x}', 0) = \frac{\Delta^2}{(2\pi)^d}\,e^{-\frac{1}{2}b_0^2k^2}\,\delta^d(\mathbf{k} - \mathbf{k}').$$

Letting $b_0 \to 0$, the white noise reciprocal-space correlation function is obtained. Thus, the initial spectrum function is

$$C_\mathbf{k}(0) = \frac{\Delta^2}{(2\pi)^d}.$$

The atomic-scale cutoff has a small and short-lived influence on the final film morphology (Appendix F), but the cutoff procedure is useful for choosing a reasonable value of $\Delta^2$. It seems reasonable to choose $\Delta^2$ so that the initial r.m.s. fluctuation, $C(0, 0)^{1/2} = \langle h(0, 0)\,h(0, 0)^*\rangle^{1/2}$, is one monolayer (1 ML). Also choosing $b_0 = 1$ ML as the atomic-scale cutoff gives

$$\Delta^2 = (2\pi)^{d/2}\,(1\ \text{ML})^{2+d}, \qquad (38)$$

where the natural unit 1 ML is, of course, material dependent.
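The choice of noise amplitude can be sketched in a few lines. The example below assumes a Gaussian cutoff of width $b_0$, normalized so that it reduces to the delta function of Eq. (36) as $b_0 \to 0$; that functional form, the function names and the default arguments are assumptions made for illustration. Lengths are measured in monolayers (ML).

```python
import math

def noise_amplitude_sq(d, b0=1.0, rms=1.0):
    """Delta^2 for which the cutoff correlation at zero separation equals rms**2.

    d   : number of surface dimensions (1 or 2)
    b0  : cutoff length in ML (assumed Gaussian width)
    rms : desired initial r.m.s. height fluctuation in ML
    """
    return (2.0 * math.pi * b0**2) ** (d / 2.0) * rms**2

def initial_correlation(r, d, delta_sq, b0=1.0):
    """Assumed Gaussian-cutoff correlation C(r) at separation r (in ML)."""
    norm = (2.0 * math.pi * b0**2) ** (d / 2.0)
    return delta_sq / norm * math.exp(-r**2 / (2.0 * b0**2))

if __name__ == "__main__":
    for d in (1, 2):
        delta_sq = noise_amplitude_sq(d)
        # With b0 = rms = 1 ML this reproduces Delta^2 = (2 pi)^(d/2) ML^(2+d).
        print(d, delta_sq, initial_correlation(0.0, d, delta_sq))
```

With the defaults $b_0 = 1$ ML and an r.m.s. fluctuation of 1 ML, `noise_amplitude_sq` reduces to Eq. (38), and the correlation at zero separation is 1 ML² by construction.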
Using stochastic initial conditions, one can either integrate individual initial conditions to obtain representative samples and then average over many realizations (the Monte Carlo approach), or calculate the statistical measures of the ensemble analytically. The ensemble statistical measures are strongly related to the statistical measures of order for an individual realization, so the second approach is taken here. Thus, the predicted SAQD order is ultimately stated in terms of ensemble correlation functions.

3.3 Reciprocal Space Correlation Functions

The reciprocal space correlation function, $C_{\mathbf{k}\mathbf{k}'}$, and spectrum function, $C_{\mathbf{k}}$, are calculated for the 1D and 2D isotropic cases and then for the 2D anisotropic case. Generally, $C_{\mathbf{k}}$ includes the length scales introduced in Sec. 2.1.3 as well as the atomic scale cutoff $b_0$,
$$C_{\mathbf{k}\mathbf{k}'} = \langle h_{\mathbf{k}}(t)\,h^*_{\mathbf{k}'}(t) \rangle = e^{(\sigma_{\mathbf{k}}+\sigma_{\mathbf{k}'})t}\, \langle h_{\mathbf{k}}(0)\,h_{\mathbf{k}'}(0)^* \rangle = \frac{\Delta^2}{(2\pi)^2}\, e^{(\sigma_{\mathbf{k}}+\sigma_{\mathbf{k}'})t - \frac{1}{2}b_0^2k^2}\, \delta^2(\mathbf{k}-\mathbf{k}'). \tag{39}$$
Without much error, $b_0$ can be neglected in the exponential (Appendix F). Using Eq. (34), the spectrum function is then identified as
$$C_{\mathbf{k}} = \frac{\Delta^2}{(2\pi)^d}\, e^{2\sigma_{\mathbf{k}}t}. \tag{40}$$
$C_{\mathbf{k}}$ is now calculated for each model: 1D isotropic, 2D isotropic and 2D anisotropic.

3.3.1 one-dimensional

The one-dimensional surface is the simplest, so it is treated first. The spectrum function is simply
$$C_{\mathbf{k}} = \frac{\Delta^2}{2\pi}\, e^{2\sigma_0t - \frac{1}{2}(2\sigma_2t)(k-k_0)^2}.$$
$C_{\mathbf{k}}$ has a peak at $\mathbf{k} = \pm k_0\mathbf{i}$. One can easily read off the correlation length as
$$L_{cor}^2 = 2\sigma_2t = k_c^{-2}\,2(3\alpha_0 - 4\beta)(t/t_c), \tag{41}$$
so that
$$C_{\mathbf{k}} = \frac{\Delta^2}{2\pi}\, e^{2\sigma_0t - \frac{1}{2}L_{cor}^2(k-k_0)^2}.$$
This approximation is valid when $k_0L_{cor} \gg 1$. In terms of $k_x$,
$$C_{\mathbf{k}} = \frac{\Delta^2}{2\pi}\, e^{2\sigma_0t} \left[ e^{-\frac{1}{2}L_{cor}^2(k_x-k_0)^2} + e^{-\frac{1}{2}L_{cor}^2(k_x+k_0)^2} \right].$$

3.3.2 2D isotropic

The 2D isotropic case is very similar,
$$C_{\mathbf{k}} = \frac{\Delta^2}{(2\pi)^2}\, e^{2\sigma_0t - \frac{1}{2}L_{cor}^2(k-k_0)^2}, \tag{42}$$
where $L_{cor}$ is the same as in Eq. (41). It has a maximum that forms a ring in the $\mathbf{k}$-plane, as graphed in Fig. 6.b.
3.3.3 anisotropic

The anisotropic spectrum function is
$$C_{\mathbf{k}} = \frac{\Delta^2}{(2\pi)^2}\, e^{2\sigma_0t} \sum_{n=1}^{4} e^{-\frac{1}{2}L_\parallel^2(k_\parallel-k_0)^2 - \frac{1}{2}L_\perp^2k_\perp^2}, \tag{43}$$
where
$$L_\parallel^2 = 2\sigma_\parallel t = k_c^{-2}\,(6\alpha_0 - 8\beta)(t/t_c), \tag{44}$$
$$L_\perp^2 = 2\sigma_\perp t = k_c^{-2}\,16\epsilon_A\alpha_0\,(t/t_c), \tag{45}$$
$k_\parallel = \cos[\pi(n-1)/2]\,k_x + \sin[\pi(n-1)/2]\,k_y$, and $k_\perp = -\sin[\pi(n-1)/2]\,k_x + \cos[\pi(n-1)/2]\,k_y$. It is graphed in Fig. 6.d. This approximation is valid when $k_0L_\parallel \gg 1$ and $k_0L_\perp \gg 1$.

Figure 6: $C^A_{\mathbf{k}}$ and $C_{\mathbf{k}}$ for Ge/Si as discussed in Sec. 4. (a,b) 2D isotropic surface. Eq. (42) is used for $C_{\mathbf{k}}$. (c,d) 2D anisotropic surface. Eq. (43) is used for $C_{\mathbf{k}}$.

Figure 7: 1D isotropic surface in real space for Ge/Si as discussed in Sec. 4. (a) Example of $h(x)$ plotted over a length of $8L_{cor}$. (b) Corresponding real-space correlation functions plotted for the range $\pm 4L_{cor}$. The filled plot is an example of $C^A(\Delta x)$. The solid line is $C(\Delta x)$ (Eq. (46)).

Loosely speaking, one can argue that the isotropic case is similar to letting $\epsilon_A \to 0$ in Eq. (45) so that the perpendicular correlation length is always 0 regardless of time. A more conservative approach would be to argue that $L_\perp \approx 2\pi/k_0$ for the isotropic model via inspection of Figs. 6(a) and (b). Even so, the more conservative result guarantees that the perpendicular correlation length will always be the same as the dot spacing; thus, it will always limit SAQD order to the first nearest neighbor at best.

3.4 Real Space Correlation Functions

The real space correlation functions $C(\Delta\mathbf{x})$ are now calculated for the 1D and 2D isotropic cases and the 2D elastically anisotropic case.

3.4.1 one-dimensional

In one dimension,
$$C(\Delta x) = \int dk_x\, e^{ik_x\Delta x}\, C_{\mathbf{k}} = \frac{\Delta^2}{\sqrt{2\pi}\,L_{cor}}\, e^{2\sigma_0t - \frac{1}{2}\Delta x^2/L_{cor}^2}\; 2\cos(k_0\Delta x). \tag{46}$$
Thus, $C(\Delta x)$ has a damped periodicity, indicating that it is imperfectly periodic (Fig. 7).

3.4.2 2D isotropic

In two dimensions with elastic isotropy,
$$C(\Delta\mathbf{x}) = \int d^2k\, e^{i\mathbf{k}\cdot\Delta\mathbf{x}}\, C_{\mathbf{k}} = \frac{\Delta^2}{(2\pi)^2}\, e^{2\sigma_0t} \int_0^\infty dk\, k \int_0^{2\pi} d\theta_k\, e^{ik\Delta x\cos(\theta_k-\theta_{\Delta x})}\, e^{-\frac{1}{2}L_{cor}^2(k-k_0)^2}.$$
Performing the angular integration first,
$$C(\Delta\mathbf{x}) = \frac{\Delta^2}{2\pi}\, e^{2\sigma_0t} \int_0^\infty dk\, k\, J_0(k\Delta x)\, e^{-\frac{1}{2}L_{cor}^2(k-k_0)^2},$$
where $J_0$ is the zeroth-order Bessel function.
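The radial integral above has no general closed form, so a small numerical sketch may help. The code below is a hedged illustration, not the author's implementation: it assumes the reduced integrand $(2\pi)^{-1}\,k\,J_0(k\Delta x)\,e^{-\frac{1}{2}L_{cor}^2(k-k_0)^2}$, evaluates $J_0$ from its standard integral representation $J_0(z) = \pi^{-1}\int_0^\pi \cos(z\sin\theta)\,d\theta$, and uses the midpoint rule throughout. The function names and the grid sizes `n`, `n_k` and `half_width` are assumptions chosen for illustration.

```python
import math


def bessel_j0(z, n=200):
    """J0(z) = (1/pi) * int_0^pi cos(z*sin(theta)) d(theta), midpoint rule."""
    h = math.pi / n
    return sum(math.cos(z * math.sin((i + 0.5) * h)) for i in range(n)) / n


def c_isotropic_2d(dx, k0, l_cor, n_k=400, half_width=8.0):
    """Reduced correlation function C(dx) / (Delta^2 * exp(2*sigma0*t)).

    Integrates (1/(2*pi)) * k * J0(k*dx) * exp(-0.5*l_cor^2*(k-k0)^2) over k >= 0,
    truncated to +/- half_width Gaussian widths (1/l_cor) around the peak at k0.
    """
    lo = max(0.0, k0 - half_width / l_cor)
    hi = k0 + half_width / l_cor
    dk = (hi - lo) / n_k
    total = 0.0
    for i in range(n_k):
        k = lo + (i + 0.5) * dk
        total += k * bessel_j0(k * dx) * math.exp(-0.5 * (l_cor * (k - k0)) ** 2)
    return total * dk / (2.0 * math.pi)


if __name__ == "__main__":
    # 2D isotropic Ge/Si values at t_large quoted in Sec. 4.1 (nm^-1 and nm).
    k0, l_cor = 0.208, 62.4
    c0 = c_isotropic_2d(0.0, k0, l_cor)
    for dx in (0.0, 10.0, 30.0, 60.0):
        print(f"dx = {dx:5.1f} nm: C(dx)/C(0) = {c_isotropic_2d(dx, k0, l_cor) / c0:+.4f}")
```

At $\Delta x = 0$ the numerical value can be compared with the closed form $k_0/(\sqrt{2\pi}\,L_{cor})$, which is the reduced version of the $\Delta x \to 0$ limit when $k_0L_{cor} \gg 1$.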
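In general, this integral is best performed numerically; however, it can be solved in two important cases: $\Delta x \to 0$ and $L_{cor} \to \infty$ (corresponding to long times). In the first case,
$$C(\Delta\mathbf{x} = 0) = \frac{\Delta^2}{2\pi}\, e^{2\sigma_0t} \int_0^\infty dk\, k\, e^{-\frac{1}{2}L_{cor}^2(k-k_0)^2}.$$
Under the same conditions for which Eq. (42) is valid ($k_0L_{cor} \gg 1$), the lower limit of the integral can be approximated as $-\infty$ so that
$$C(\Delta\mathbf{x} = 0) = \frac{\Delta^2k_0}{\sqrt{2\pi}\,L_{cor}}\, e^{2\sigma_0t}. \tag{47}$$
This function gives the mean square surface height fluctuation. In the second case, where $L_{cor} \to \infty$,
$$e^{-\frac{1}{2}L_{cor}^2(k-k_0)^2} \to (2\pi)^{1/2}\,L_{cor}^{-1}\,\delta(k-k_0),$$
so that
$$C(\Delta\mathbf{x}) = \frac{\Delta^2k_0}{\sqrt{2\pi}\,L_{cor}}\, e^{2\sigma_0t}\, J_0(k_0\Delta x). \tag{48}$$
This correlation function is the most ordered case for a 2D isotropic surface. It is graphed in Fig. 5c.

3.4.3 anisotropic

To find the real-space correlation function for the elastically anisotropic case, it is best to find the contribution from each peak and then sum, so that
$$C(\Delta\mathbf{x}) = \frac{\Delta^2}{(2\pi)^2}\, e^{2\sigma_0t} \sum_{n=1}^{4} C_n(\Delta\mathbf{x}), \tag{49}$$
where
$$C_n(\Delta\mathbf{x}) = \int d^2k\, e^{i\mathbf{k}\cdot\Delta\mathbf{x}}\, e^{-\frac{1}{2}L_\parallel^2(k_\parallel-k_0)^2 - \frac{1}{2}L_\perp^2k_\perp^2}.$$
$\Delta\mathbf{x}$ can be decomposed into the directions parallel and perpendicular to $\mathbf{k}_n$, so that $\Delta x_\parallel = \cos(\pi(n-1)/2)\,\Delta x + \sin(\pi(n-1)/2)\,\Delta y$ and $\Delta x_\perp = -\sin(\pi(n-1)/2)\,\Delta x + \cos(\pi(n-1)/2)\,\Delta y$. Thus,
$$C_n(\Delta\mathbf{x}) = \int dk_\parallel\, e^{ik_\parallel\Delta x_\parallel - \frac{1}{2}L_\parallel^2(k_\parallel-k_0)^2} \int dk_\perp\, e^{ik_\perp\Delta x_\perp - \frac{1}{2}L_\perp^2k_\perp^2} = \frac{2\pi}{L_\parallel L_\perp}\, e^{-\frac{1}{2}(x_\parallel^2/L_\parallel^2 + x_\perp^2/L_\perp^2)}\, e^{ik_0x_\parallel}.$$
Plugging into Eq. (49),
$$C(\Delta\mathbf{x}) = \frac{\Delta^2}{\pi L_\parallel L_\perp}\, e^{2\sigma_0t} \left[ e^{-\frac{1}{2}(x^2/L_\parallel^2 + y^2/L_\perp^2)} \cos(k_0x) + e^{-\frac{1}{2}(x^2/L_\perp^2 + y^2/L_\parallel^2)} \cos(k_0y) \right]. \tag{50}$$

3.5 Generalizability

The dynamics and analysis used here were for a specific model, but the general procedure for analyzing the order resulting from a linearized model should hold for any model with well-separated peaks in the dispersion relation, $\sigma_{\mathbf{k}}$. The procedure to follow is:
1. Generate the dispersion relation, $\sigma_{\mathbf{k}}$, as some function of $\mathbf{k}$.
2. Find the peaks in the dispersion relation, $\mathbf{k}_n$ ($n = 1 \dots N$).
3. Expand about the peaks to generate the peak values, $\sigma_n$, and the local Hessian matrix, $\tilde{H}_n = \left(\partial^2\sigma_{\mathbf{k}}/\partial k_i\partial k_j\right)_{\mathbf{k}=\mathbf{k}_n}$. The spectrum function is then approximately
$$C_{\mathbf{k}}(t) \approx \frac{\Delta^2}{(2\pi)^2} \sum_n e^{2\sigma_nt} \exp\left[ t\,(\mathbf{k}-\mathbf{k}_n)\cdot\tilde{H}_n\cdot(\mathbf{k}-\mathbf{k}_n) \right]. \tag{51}$$
4. Find the eigenvalues of the local Hessian matrix, $(H_n)_I$ and $(H_n)_{II}$.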
They should be negative if there is a peak at $\mathbf{k}_n$.
5. Use the eigenvalues to determine the correlation lengths, $(L_n)_I = \sqrt{2\,|(H_n)_I|\,t}$ and $(L_n)_{II} = \sqrt{2\,|(H_n)_{II}|\,t}$. The real-space correlation function is
$$C(\Delta\mathbf{x},t) \approx \frac{\Delta^2}{4\pi t} \sum_n \frac{1}{\sqrt{(H_n)_I\,(H_n)_{II}}}\, e^{2\sigma_nt} \exp\left[ \frac{1}{4t}\,\mathbf{x}\cdot\tilde{H}_n^{-1}\cdot\mathbf{x} \right] e^{i\mathbf{k}_n\cdot\mathbf{x}}. \tag{52}$$
The “goodness” of these approximate forms requires that $(L_n)_I^{-1}$ and $(L_n)_{II}^{-1}$ be much less than the spacing between peaks in the spectrum function so that the Gaussians do not overlap greatly. A reasonable test for this no-overlap condition is $\|\mathbf{k}_n\|\,(L_n)_I \gg 1$ and $\|\mathbf{k}_n\|\,(L_n)_{II} \gg 1$, assuming that the peaks are not large in number or very closely spaced.

4 Order Predictions

The real-space correlation function formulas (Eqs. (46), (47), and (50)) and correlation length formulas (Eqs. (41), (44) and (45)) can now be used to estimate the order of SAQDs. Ge on Si is chosen for this example because this system has received the most attention from theoretical work [58, 38, 31, 18, 39, 41, 27, 25, 26, and others], and it is the simplest since it involves the diffusion of a single species. The procedure described below predicts the amount of order at the point when an initial atomic-scale fluctuation becomes “large”. “Large” is taken to mean greater than atomic scale. Beyond this point, one would expect non-linear terms to become important. An example is presented for Ge on Si at 600K to compare and contrast the 2D anisotropic results with the 1D isotropic and 2D isotropic results. The predictions are also compared with a linear numerical calculation on a discrete reciprocal-space grid to test the approximations made and to illustrate the relation between the surface profile ($h(\mathbf{x})$), the example autocorrelation functions ($C^A(\mathbf{x})$ and $C^A_{\mathbf{k}}$) and the ensemble correlation functions ($C(\Delta\mathbf{x})$ and $C_{\mathbf{k}}$). Figs. 6, 7 and 5 show these results. Finally, the relation between average film height and order is investigated.

4.1 Ge at 600K

The formulations for the three discussed cases are implemented for Ge/Si at 600K.
The correlation lengths are estimated for the end of the linear regime, where fluctuations become large (greater than atomic scale). First, appropriate physical constants are used to give the corresponding correlation lengths and correlation functions vs. time. These include an initial average film height $\bar{H}$ and a white noise amplitude $\Delta$ (Eq. (38)). These initial conditions approximate a film at the beginning of an anneal that immediately follows a rapid deposition. The time $t_{large}$ is found by solving for the time at which the mean-square fluctuations are atomic scale, $\langle h(\mathbf{x},t)^2 \rangle = C(\Delta\mathbf{x} = 0) = 1\ \mathrm{ML}^2$. At this point, the correlation lengths are calculated.

Physical constants for the 2D anisotropic calculation are taken as follows. The elastic constants for Ge at 600 K are $c_{11} = 1.199 \times 10^{12}$, $c_{12} = 4.01 \times 10^{11}$ (from $c_S = 3.991$), $c_{44} = 6.73$. [51] Using $a_{Ge} = 0.5658$ nm and $a_{Si} = 0.5431$ nm, it is found that $\epsilon_m = 0.0418$. Using the procedure from Appendix C, $M = 1.332 \times 10^{12}$ dyn/cm$^2$, $\mathcal{E}_{0^\circ} = 4.96 \times 10^9$ erg/cm$^3$, and $\mathcal{E}_{45^\circ} = 4.35 \times 10^9$ erg/cm$^3$, giving $\epsilon_A = 0.1236$. The atomic volume is $\Omega = 2.27 \times 10^{-23}$ cm$^3$. The estimated surface energy density is $\gamma = 1927$ erg/cm$^2$. The wetting potential is estimated by picking a plausible critical surface height, $H_c \approx 4$ ML $= 1.132$ nm, and setting $W(H) = \mathcal{E}_{0^\circ}^2H_c^3/(8\gamma H) = 2.315 \times 10^{-6}/H$ erg/cm$^2$. The resulting characteristic wave number is $k_c = 0.257$ nm$^{-1}$. The initial film height is taken to be $\bar{H} = H_c + 0.25$ ML $= 1.203$ nm, and the film is then allowed to evolve naturally. Thus, $\beta = 0.208$, $\alpha_0 = 0.5658$, $k_0 = 0.1456$ nm$^{-1}$, $\sigma_0 = 0.1192/t_c$, $\sigma_\parallel = 0.864/(k_c^2t_c)$, $\sigma_\perp = 0.559/(k_c^2t_c)$, $L_\parallel = 0.744\,k_0^{-1}(t/t_c)^{1/2}$, and $L_\perp = 0.599\,k_0^{-1}(t/t_c)^{1/2}$. The unspecified diffusivity has been absorbed into the characteristic time $t_c$. From Eq. (38), $\Delta^2 = 0.0403$ nm$^4$, and Eq. (50) gives $C(0) = 1.223 \times 10^{-3}\,(t_c/t)\,e^{0.02385\,t/t_c}$ nm$^2$. The initial infinitely rough surface undergoes a smoothing described by the $t_c/t$ factor. Then the surface roughens due to the exponential.
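Several of the derived values above follow from the quoted inputs by simple arithmetic, which the sketch below re-derives as a consistency check. It is an illustration under stated assumptions rather than the author's procedure: it assumes $\epsilon_m = (a_{Ge} - a_{Si})/a_{Si}$ and $M = c_{11} + c_{12} - 2c_{12}^2/c_{11}$ (Appendix C), assumes $k_c = \mathcal{E}_{0^\circ}/\gamma$ (a relation that reproduces the quoted $k_c$ but is not stated in this section), and assumes Eqs. (44) and (45) carry a prefactor $k_c^{-2}$ with $k_0 = \alpha_0k_c$. All names are hypothetical.

```python
import math

# Inputs as quoted in Sec. 4.1 (cgs units unless a name says otherwise).
C11, C12 = 1.199e12, 4.01e11        # elastic constants, dyn/cm^2
A_GE_NM, A_SI_NM = 0.5658, 0.5431   # lattice constants, nm
E_0DEG = 4.96e9                     # elastic energy density at 0 degrees, erg/cm^3
GAMMA = 1927.0                      # surface energy density, erg/cm^2
H_C_CM = 1.132e-7                   # critical height, 4 ML = 1.132 nm, in cm
ALPHA0, BETA, EPS_A = 0.5658, 0.208, 0.1236


def derived_constants():
    """Recompute derived Ge/Si quantities from the quoted inputs."""
    eps_m = (A_GE_NM - A_SI_NM) / A_SI_NM                   # mismatch strain
    m_biaxial = C11 + C12 - 2.0 * C12 ** 2 / C11            # M, dyn/cm^2
    wetting_coeff = E_0DEG ** 2 * H_C_CM ** 3 / (8.0 * GAMMA)  # W(H)*H, erg/cm
    k_c_per_nm = E_0DEG / GAMMA * 1.0e-7                    # ASSUMED k_c = E_0deg/gamma
    # Prefactors of L = (...) * (1/k_0) * sqrt(t/t_c), ASSUMING L^2 = 2*sigma*t with
    # 2*sigma_par*t  = k_c^-2 * (6*alpha0 - 8*beta) * (t/t_c)   (Eq. 44) and
    # 2*sigma_perp*t = k_c^-2 * 16*eps_A*alpha0 * (t/t_c)       (Eq. 45).
    l_par = ALPHA0 * math.sqrt(6.0 * ALPHA0 - 8.0 * BETA)
    l_perp = ALPHA0 * math.sqrt(16.0 * EPS_A * ALPHA0)
    return {
        "eps_m": eps_m,
        "M": m_biaxial,
        "wetting_coeff": wetting_coeff,
        "k_c_per_nm": k_c_per_nm,
        "L_par_prefactor": l_par,
        "L_perp_prefactor": l_perp,
    }


if __name__ == "__main__":
    for name, value in derived_constants().items():
        print(f"{name:18s} = {value:.4g}")
```

Under these assumptions the sketch returns values that agree with the quoted $\epsilon_m$, $M$, wetting-potential coefficient, $k_c$, and the prefactors 0.744 and 0.599 to within rounding.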
The initial divergent roughness is an artifact of the non-physical white noise with the atomic scale cutoff $b_0$ neglected (Appendix F). The time for the fluctuations to become “large” again is found by setting
$$C(0) = h_{large}^2, \tag{53}$$
where $h_{large} = 1$ ML $= 0.283$ nm. The solutions are $t_1 = 0.01527\,t_c$ and $t_2 = 430\,t_c$. The first solution is discarded since it is due to the non-physical white noise. At $t_{large} = t_2$, $L_\parallel = 105.8$ nm and $L_\perp = 85.2$ nm. Taking $L_\perp$ as more limiting, the correlation spans about $n = k_0L_\perp/\pi = 3.95$ islands across. The corresponding reciprocal-space (Eq. (43)) and real-space (Eq. (50)) correlation functions are shown in Figs. 6.d and 5.f respectively.

A corresponding numerical experiment is performed. A periodic surface of size $l = 96(2\pi/k_0)$ is used. Random initial conditions consistent with Eq. (38) are used for $\mathbf{k}$-space points on a square grid bounded by $k_x, k_y = \pm 2k_0$. The relation between discrete and continuous Fourier components, $(h_{\mathbf{k}})_{discrete} = [(2\pi)^d/A]\,h_{\mathbf{k}}$, is used. Eqs. (13) and (14) are used without any additional approximation to find $h_{\mathbf{k}}$ at time $t = t_{large}$. The resulting $C^A_{\mathbf{k}}$, a portion of the height profile $h(\mathbf{x})$, and $C^A(\Delta\mathbf{x})$ are plotted in Figs. 6(c), 5(d) and 5(f) respectively.

Similar calculations can be performed for the one-dimensional and two-dimensional elastically isotropic cases. Isotropic values used previously [58, 24] are about $E = 1.361 \times 10^{12}$ dyn/cm$^2$ and $\nu = 0.198$, giving $M = E/(1-\nu) = 1.697 \times 10^{12}$ dyn/cm$^2$ and $\mathcal{E} = 2M(1+\nu)\epsilon_m^2 = 7.10 \times 10^9$ erg/cm$^3$. Using the same critical surface height, $H_c = 4$ ML, $W(H) = 4.74 \times 10^{-6}/H$ erg/cm$^2$. The resulting characteristic wave number is $k_c = 0.368$ nm$^{-1}$. If the film is grown to $\bar{H} = H_c + 0.25$ ML $= 1.203$ nm and then allowed to evolve naturally, $\beta = 0.208$; thus, $\alpha_0 = 0.5658$, $k_0 = 0.208$ nm$^{-1}$, $\sigma_0 = 0.1192/t_c$, $\sigma_2 = 0.864/(k_c^2t_c)$, and $L_{cor} = 0.744\,k_0^{-1}(t/t_c)^{1/2}$.

In one dimension, Eq. (46) is used to find the mean square height fluctuation. Using Eq.
(38) with $d = 1$, $\Delta^2 = 0.0568$ nm$^3$, and $C(0,t) = 0.01271\,(t/t_c)^{-1/2}\,e^{0.0238\,t/t_c}$. Setting $C(0,t) = (1\ \mathrm{ML})^2 = 0.0801$ nm$^2$ gives $t_1 = 0.0252\,t_c$ and $t_2 = 186.9\,t_c$. At $t_2$, $L_{cor} = 48.8$ nm and $n = k_0L_{cor}/\pi = 3.24$, so about 3 dots in a row should be well correlated. The corresponding numerical calculation of size $l = 96(2\pi/k_0)$ is performed. A portion of $h(x)$, $C^A(\Delta x)$ and $C(\Delta x)$ are shown in Fig. 7.

In two dimensions, Eq. (47) is used to find $\langle h(\mathbf{x},t)^2 \rangle = C(0,t) = 9.40 \times 10^{-4}\,(t/t_c)^{-1/2}\,e^{0.0238\,t/t_c}$. Setting $C(0,t) = 0.0801$ nm$^2$ gives $t_1 = 1.376 \times 10^{-4}\,t_c$ and $t_2 = 306\,t_c$. At $t_2$, $L_{cor} = 62.4$ nm and $n = k_0L_{cor}/\pi = 4.14$, so correlation is expected to extend about 4 dots. However, it should be noted that this correlation is not lattice-like. Corresponding numerical results and ensemble correlation functions are shown in Figs. 6 and 5.a-c.

4.2 General case of β

In [24] it was suggested that allowing the film to evolve with $\beta$ close to the stability threshold could enhance the SAQD correlation. It is interesting to note what happens for different values of $\beta$. Similar analytic and numerical calculations are performed for the large film-height limit, $\beta = 0$, for the 2D anisotropic Ge/Si surface. For $\beta = 0$, $t_{large} = 40.3\,t_c$, $L_\perp = 30.0$ nm, and $n = k_0L_\perp/\pi = 1.84$, so one to two dots in a row are expected to be well correlated. $h(\mathbf{x})$ and the real-space correlation functions are shown in Figs. 5g-i. The range of order is significantly less than for the case $\beta = 0.208$ (Sec. 4.1).

For Si/Ge at 600K, the 2D anisotropic predictions for $t_{large}$ and $L_\perp$ are shown in Fig. 8. In general, the closer $\beta$ is to the critical value 0.25, the longer the correlation length. One can manipulate Eq. (53) to find that $t_{large}/t_c$ varies approximately, but not exactly, as $(\beta - 1/4)^{-1} \times \ln[h_{large}^2\,\epsilon_A/(\Delta^2k_c^2)]$. Consequently, $L_\perp \sim (\beta - 1/4)^{-1/2}$. Furthermore, the appearance of $h_{large}$ and $\Delta^2$ inside the logarithm shows that the final order estimates are not overly sensitive to the guesses for $\Delta^2$ and $h_{large}^2$.
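The crossing times $t_1$ and $t_2$ quoted in Sec. 4.1 solve an equation of the form $a\,(t/t_c)^{-p}\,e^{r\,t/t_c} = h_{large}^2$. A minimal root-finding sketch, assuming that form and using plain bisection, is shown below; the function name `crossing_times` and the use of $r = 0.02385$ for all three cases (the text rounds it to 0.0238 in the isotropic cases) are assumptions made for illustration.

```python
import math


def crossing_times(prefactor, power, rate, target):
    """Solve prefactor * t**(-power) * exp(rate*t) = target for t in units of t_c.

    The left side diverges as t -> 0 and as t -> infinity with one minimum at
    t_min = power/rate, so two roots exist when the minimum lies below target.
    """
    def f(t):  # log(left side / target); monotone on either side of t_min
        return math.log(prefactor / target) - power * math.log(t) + rate * t

    t_min = power / rate
    if f(t_min) >= 0.0:
        raise ValueError("fluctuations never drop below the target value")

    def bisect(lo, hi):
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (f(lo) >= 0.0) == (f(mid) >= 0.0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lo = t_min
    while f(lo) < 0.0:   # walk towards t = 0 until the sign changes
        lo *= 0.5
    hi = t_min
    while f(hi) < 0.0:   # walk towards large t until the sign changes
        hi *= 2.0
    return bisect(lo, t_min), bisect(t_min, hi)


H_LARGE_SQ = 0.0801  # (1 ML)^2 in nm^2
# (prefactor in nm^2, power of t/t_c, L prefactor in units of 1/k_0, k_0 in nm^-1)
CASES = {
    "2D anisotropic": (1.223e-3, 1.0, 0.599, 0.1456),
    "1D isotropic": (0.01271, 0.5, 0.744, 0.208),
    "2D isotropic": (9.40e-4, 0.5, 0.744, 0.208),
}

if __name__ == "__main__":
    for name, (a, p, l_pref, k0) in CASES.items():
        t1, t2 = crossing_times(a, p, 0.02385, H_LARGE_SQ)
        length = l_pref / k0 * math.sqrt(t2)  # limiting correlation length, nm
        print(f"{name}: t1 = {t1:.4g} t_c, t2 = {t2:.4g} t_c, "
              f"L = {length:.1f} nm, n = {k0 * length / math.pi:.2f}")
```

The smaller root in each case corresponds to the non-physical white-noise transient and is discarded, as in Sec. 4.1; the larger root is $t_{large}$.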
The divergence of $L_\perp$ with $\beta - 1/4$ is initially encouraging, but it is clear that for the parameters used for Ge/Si, subatomic control of the film height is needed to yield significantly enhanced long-range correlations. Also, as one approaches this threshold, one can probably expect thermal activation to nucleate subcritical SAQDs whose effect on supercritically formed SAQDs is uncertain. There should be some interesting phenomena as $H \to H_c$.

Figure 8: $t_{large}$ and $L_\perp$ vs. $\beta$ for Si/Ge using the 2D anisotropic model as described in Sec. 4. Quantities are normalized to the characteristic time $t_c$ and the predicted number of correlated dots ($n = k_0L_\perp/\pi$).

5 Discussion/Conclusions

The order of epitaxial self-assembled quantum dots during the initial stages of growth has been studied using a common model of surface diffusion with stochastic initial conditions. It has been shown that correlation functions of small surface height fluctuations can be predicted analytically using the corresponding ensemble average correlation functions. These correlation functions are characterized by correlation lengths that can be predicted by analytic formulas, given certain reasonable assumptions about the diffusion potential and the height and lateral scale of the initial atomic-scale random fluctuations. Thus, the linear model of film surface height evolution via surface diffusion has enabled analytic predictions of epitaxial SAQD order that are valid for small film height fluctuations. To what extent the initial degree of order persists into later stages of growth remains to be studied, but the order of the initial stages should certainly have a strong influence on final outcomes. Furthermore, the linear analysis should provide insight into the less tractable non-linear behavior.

These predictions of SAQD order have been used to investigate the role of crystal anisotropy and initial film height.
Crystal anisotropy has been shown to play an important role in enhancing SAQD order, as observed in previous continuum and atomistic numerical simulations. [43, 37, 44, 45] If a four-fold symmetry is assumed for the governing dynamics, the effect of crystal anisotropy to linear order is felt through elastic anisotropy alone. It is shown that elastic anisotropy is required to produce a lattice-like structure of SAQDs. The enhanced spatial order should in turn lead to enhanced size order, a consequence that must be confirmed with non-linear studies but appears to be true based on the presently available literature.

The initial film height has been shown to greatly influence order. Growth near the critical film height for dot formation can enhance order. This order enhancement comes from increasing the duration of the linear small-fluctuation stage of growth. In fact, the predicted correlation lengths diverge when the initial film height approaches the critical film height from above. Achieving large correlation lengths in this manner is of course practically limited by the ability to control film heights to subatomic accuracy. Additionally, one should be careful when interpreting the continuum model in such a context, as the effect of atomic discreteness might be greater at the transition film height. Finally, it is likely that additional randomizing effects of thermal activation will effectively cut off this divergence when the critical film height is approached from below during deposition.

Finally, the presented method may be useful as a first step in the analysis of methods to enhance SAQD order. It is reasonable to suppose that under some circumstances initial growth stages will be very important, while for others they will not. For example, prior work on vertical stacking appears to confirm the presented ordering mechanism. [44]
Vertical stacking not only achieves vertical correlation of dots, but each layer is more ordered horizontally than the one below. Additionally, a “growth window” was found, whereby to achieve enhanced order, the evolution of each layer must be terminated before ripening begins. The reported simulation [44] supports the following scenario for SAQD order development. Order is enhanced during the small-fluctuation stage as described here. Once the fluctuations are sufficiently large, the seeded dots evolve towards their equilibrium shapes. Finally, the dots begin to ripen and order diminishes. Order is transferred via strain to the next layer so that the next layer gets a head start on its initial ordering. Thus, the multiple layers of dots effectively draw out the linear growth stage. It may be possible to modify the present model to predict the correlation length of each SAQD layer.

A Diffusion Potential

The diffusion potential is calculated in terms of the film height $H$, which is a function of the in-plane coordinates $\mathbf{x} = x\mathbf{i} + y\mathbf{j}$. The elastic and surface energy portions of the diffusion potential can be found in [15]: $\mu_{elast}(\mathbf{x}) = \Omega\,\omega(\mathbf{x})$ and $\mu_{surf} = -\Omega\gamma\kappa(\mathbf{x})$, where $\Omega$ is the atomic volume, $\omega(\mathbf{x})$ is the elastic energy density at the film surface, $\gamma$ is the surface energy density, and $\kappa$ is the total surface curvature. However, other calculations need to be included:
1. $\mu_{wet}$ for the two wetting potential cases, Eqs. (3) and (5),
2. and $\mu_{surf}$ and $\mu_{wet}$ when the surface energy density $\gamma$ and wetting energy density $W$ also depend on surface orientation.
Before these cases are addressed, a general form for the diffusion potential is justified.

A.1 General Form $\mu = \Omega\,\delta F/\delta H(\mathbf{x})$

The diffusion potential, $\mu(\mathbf{x})$, is the change in free energy, $F$, when a particle is added at a position $\mathbf{x}$. Note that $\mu(\mathbf{x})$ and $F$ are relative energies.
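They can be used to compare the binding energy of one site on the surface with that of another site, but should not be interpreted as an absolute binding energy or total formation energy of the surface. If a particle has a volume $\Omega$, then the diffusion potential at $\mathbf{x}$ is related to the variation of free energy with volume,
$$\delta F = \Omega^{-1} \int d^dx\, \mu(\mathbf{x})\,\delta V(\mathbf{x}), \tag{54}$$
where $\delta V(\mathbf{x})$ is the volume variation at $\mathbf{x}$. To calculate $\delta V(\mathbf{x})$, note that $V = \int d^dx\, H(\mathbf{x})$. Therefore, $\delta V(\mathbf{x}) = \delta H(\mathbf{x})$. Substituting into $\delta F$ (Eq. (54)), $\delta F = \Omega^{-1} \int d^dx\, \mu(\mathbf{x})\,\delta H(\mathbf{x})$, or $\mu(\mathbf{x}) = \Omega\,\delta F/\delta H(\mathbf{x})$.

A.2 Simple Model

Starting from Eq. (2), $\mu(\mathbf{x})$ is found by taking the variational derivative,
$$\mu_{elast.}(\mathbf{x}) = \Omega\,\frac{\delta}{\delta H(\mathbf{x})} \int_{volume} d^dx\,dz\; \omega[H](\mathbf{x},z) = \Omega\,\omega(\mathbf{x}),$$
where the “$[H]$” indicates that the elastic energy, $\omega$, is a nonlocal functional of the film height $H$, and $\omega(\mathbf{x}) = \omega[H](\mathbf{x},H(\mathbf{x}))$ is the elastic energy density evaluated above lateral position $\mathbf{x}$ at the free surface ($z = H(\mathbf{x})$). See [15] for details of the derivation. The surface energy diffusion potential is
$$\mu_{surf.}(\mathbf{x},t) = \Omega\,\frac{\delta}{\delta H(\mathbf{x})} \int d^dx\, \gamma\sqrt{1 + (\nabla H(\mathbf{x}))^2} = -\Omega\,\nabla\cdot\left[\frac{\nabla H(\mathbf{x})}{\sqrt{1 + (\nabla H(\mathbf{x}))^2}}\right]\gamma = -\Omega\gamma\kappa(\mathbf{x}).$$
The wetting energy diffusion potential is
$$\mu_{wet}(\mathbf{x}) = \Omega\,\frac{\delta}{\delta H(\mathbf{x})} \int d^dx\, W(H(\mathbf{x})) = \Omega\,W'(H(\mathbf{x})).$$
Putting these three terms together, one obtains Eq. (3).

A.3 General Model

Consider the general form for the combined surface energy and wetting potential, $\mathcal{F}_{sw} = \int d^dx\, F_{sw}(H(\mathbf{x}),\nabla H(\mathbf{x}))$, as in Eq. (4), so that the free energy is an integral over the $\mathbf{x}$-plane of an energy density that depends on $H(\mathbf{x})$ and $\nabla H(\mathbf{x})$ locally. The corresponding diffusion potential is
$$\mu(\mathbf{x}) = \Omega\,\frac{\delta\mathcal{F}_{sw}}{\delta H(\mathbf{x})} = \Omega\left[ F^{(10)}_{sw}(H(\mathbf{x}),\nabla H(\mathbf{x})) - \nabla\cdot\mathbf{F}^{(01)}_{sw}(H(\mathbf{x}),\nabla H(\mathbf{x})) \right].$$

B Linearized Diffusion Potential and Anisotropy

The linearized diffusion potential $\mu_{lin,\mathbf{k}}$ is found by finding $\mu(\mathbf{x})$ to first order in the height fluctuations ($h$) to get $\mu_{lin}(\mathbf{x})$, and then taking the Fourier transform to get $\mu_{lin,\mathbf{k}}$. The linearization of the simple isotropic diffusion potential corresponding to Eqs. (2) and (3) was discussed in Sec. 2.1.1.1.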
Here, the more general diffusion potential corresponding to Eqs. (4) and (5) is linearized and then applied to the anisotropic simple model and the anisotropic general model. Only the surface and wetting parts of the diffusion potential are discussed in this appendix. See ref. [15], Sec. 2.2.1.1 and Appendix C for discussion of $\mu_{elast.}$.

B.1 Linearizing the simple model

Consider a surface energy density and wetting potential that both depend on the film height gradient $\nabla H$: $\gamma \to \gamma(\nabla H)$ and $W(H) \to W(H,\nabla H)$. Starting from Eq. (6) and expanding to second order in the film height fluctuation using $H(\mathbf{x}) = \bar{H} + h(\mathbf{x})$ (Eq. (7)),
$$\left[1 + (\nabla H)^2\right]^{-1/2}\gamma(\nabla H) = \left(1 - \tfrac{1}{2}(\nabla h)^2 + \dots\right)\left(\gamma + \boldsymbol{\gamma}'\cdot\nabla h + \tfrac{1}{2}\tilde{\gamma}'' : \nabla h\nabla h + \dots\right) = \gamma + \boldsymbol{\gamma}'\cdot\nabla h - \tfrac{1}{2}\gamma\,(\nabla h)^2 + \tfrac{1}{2}\tilde{\gamma}'' : \nabla h\nabla h + O[h^3],$$
where $\gamma$ is $\gamma(0)$, and the primes indicate derivatives with respect to the surface height gradient, $\boldsymbol{\gamma}' = \partial_{\nabla H}\gamma(\nabla H)|_{\nabla H=0}$ and $\tilde{\gamma}'' = \partial_{\nabla H}\partial_{\nabla H}\gamma(\nabla H)|_{\nabla H=0}$. Taking the derivative with respect to $\nabla h$ results in a tensor of rank equal to the order of the derivative because $\nabla h$ is a vector (rank 1 tensor). Taking the variational derivative, $\mu_{surf.}(\mathbf{x}) = \Omega\,\delta F_{surf.}/\delta h(\mathbf{x})$,
$$\mu_{surf.,lin}(\mathbf{x}) = \Omega\left[\gamma\nabla^2h(\mathbf{x}) - \tilde{\gamma}'' : \nabla\nabla h(\mathbf{x})\right].$$
The term with $\boldsymbol{\gamma}'$ vanishes because it is the divergence of a constant ($\nabla\cdot\boldsymbol{\gamma}'$). Taking the inverse Fourier transform,
$$\mu_{surf.,lin,\mathbf{k}} = \Omega\left[-\gamma k^2 + \mathbf{k}\cdot\tilde{\gamma}''\cdot\mathbf{k}\right]h_{\mathbf{k}}. \tag{55}$$
The first term is isotropic. The second term is parameterized by a rank 2 symmetric tensor.

Going through the same process, one finds essentially the same result for an orientation-dependent wetting energy. The steps are so close to those for linearizing the more general form, $F_{sw}(H,\nabla H)$, that they are deferred to Appendix B.2. One finds that
$$\mu_{wet,lin,\mathbf{k}} = \Omega\left[W^{(20)} + \mathbf{k}\cdot\tilde{W}^{(02)}\cdot\mathbf{k}\right]h_{\mathbf{k}}, \tag{56}$$
where $W^{(mn)} = \partial^m_H\,\partial^n_{\nabla H}W(H,\nabla H)|_{H=\bar{H},\nabla H=0}$ is the $m$th and $n$th derivative of the wetting energy density with respect to $H$ and $\nabla H$, evaluated for a perfectly flat film of height $\bar{H}$. $W^{(mn)}$ is a tensor of rank $n$.
B.1.1 isotropic case

In the isotropic case, $\tilde{\gamma}'' \to \gamma''\tilde{I}$, where $\tilde{I}$ is the identity operator and $\gamma''$ is a scalar. Similarly, $\tilde{W}^{(02)} \to W^{(02)}\tilde{I}$. One thus gets for the combined surface and wetting parts of the diffusion potential,
$$\mu_{sw,lin,\mathbf{k}} = \Omega\left[\left(-\gamma + \gamma'' + W^{(02)}\right)k^2 + W^{(20)}\right]h_{\mathbf{k}}.$$
Thus, in the isotropic case, the linear-order effect of introducing a surface orientation dependence to either the surface energy or the wetting potential is simply to change the apparent surface energy density by $\gamma \to \gamma - \gamma'' - W^{(02)}$.

B.1.2 anisotropic case

The surface and wetting parts of the diffusion potential (Eqs. (55) and (56)) can admit only a limited anisotropy. They both contain rank 2 symmetric tensors in the $\mathbf{x}$-plane, $\tilde{\gamma}''$ and $\tilde{W}^{(02)}$. For a two-dimensional surface, this means that they can either have two-fold-symmetric (rotations by 180°) anisotropy or none at all. Thus, for the case considered in Sec. 2.2.1.2, four-fold-symmetric anisotropy, the surface and wetting parts of the diffusion potential must be completely isotropic.

As discussed in Sec. 2.2.1.2, the (100) surface of zinc-blende structures, such as the mentioned Ge, Si, InAs and GaAs, presents a rather complicated situation. For simplicity, it is assumed here that the surface and wetting energies are at least four-fold symmetric. Consequently, they are completely isotropic.

Finally, it should be noted that if $F_{sw}$ depends on higher-order derivatives, then the discussion is greatly complicated and a larger class of anisotropic terms is admissible. For example, when $F_{sw} \to F_{sw}(H,\nabla H,\nabla\nabla H,\nabla\nabla\nabla H,\dots)$ is expanded about $H(\mathbf{x}) = \bar{H}$ to quadratic order in $h$, it would contain tensors of rank 6 and maybe even higher.

B.2 Linearizing the general model

The elastic part of the linearized diffusion potential was discussed in Sec. 2.2.1.1 and Appendix C. Eq. (56) can be found by using all of the following steps with the substitution $F_{sw} \to W$.
The surface-wetting part of the diffusion potential $\mu(\mathbf{x})$ is found by expanding $F_{sw}$ to second order in the film-height fluctuation, $h$, and then taking the variational derivative. Expanding $F_{sw}$ about $h = 0$ and $\nabla h = 0$,
$$F_{sw}(\bar{H} + h,\nabla h) = F^{(00)}_{sw} + F^{(10)}_{sw}h + \mathbf{F}^{(01)}_{sw}\cdot\nabla h + h\,\mathbf{F}^{(11)}_{sw}\cdot\nabla h + \tfrac{1}{2}F^{(20)}_{sw}h^2 + \tfrac{1}{2}\tilde{F}^{(02)}_{sw} : \nabla h\nabla h + O[h^3].$$
Note that in this expansion, all the $F^{(mn)}_{sw}$ terms are constant with respect to $h$ and depend implicitly on the average film height, $\bar{H}$. The first index indicates the $m$th derivative with respect to $h$. The second index indicates the $n$th derivative with respect to $\nabla h$. The derivatives are evaluated for a perfectly flat surface of height $\bar{H}$. Thus, $F^{(mn)}_{sw} = \partial^m_H\,\partial^n_{\nabla H}F_{sw}(H,\nabla H)|_{H=\bar{H},\nabla H=0}$. Since $\nabla h$ is a vector in the $\mathbf{x}$-plane, $F^{(mn)}_{sw}$ is a tensor of rank $n$. Taking the variational derivative of $\mathcal{F}_{sw} = \int d^dx\, F_{sw}(H,\nabla H)$ and keeping terms to order $h^1$,
$$\frac{\delta\mathcal{F}_{sw}}{\delta h(\mathbf{x})} = F^{(10)}_{sw} - \nabla\cdot\mathbf{F}^{(01)}_{sw} + F^{(20)}_{sw}h - \nabla\cdot\tilde{F}^{(02)}_{sw}\cdot\nabla h.$$
Note that the $F^{(00)}_{sw}$ term vanishes because it is constant, and the $\mathbf{F}^{(11)}_{sw}$ term vanishes upon simplification. Additionally, the $F^{(10)}_{sw}$ term can be neglected if one enforces the condition that the film-height fluctuations do not add or subtract material from the surface, namely that $\int d^dx\,\delta h(\mathbf{x},t) = 0$. Alternatively, one can discard it in anticipation of taking the gradient of the diffusion potential, since it is a constant. The term $\nabla\cdot\mathbf{F}^{(01)}_{sw} = 0$ for the same reasons, or because $\mathbf{F}^{(01)}_{sw}$ is a constant. Multiplying through by the atomic volume,
$$\mu_{lin}(\mathbf{x}) = \Omega\left[F^{(20)}_{sw}h - \tilde{F}^{(02)}_{sw} : \nabla\nabla h\right]. \tag{57}$$

B.2.1 isotropic case

In the isotropic case, $\tilde{F}^{(02)}_{sw}$ must be proportional to the identity so that $\tilde{F}^{(02)}_{sw} = F^{(02)}_{sw}\tilde{I}$; thus,
$$\mu_{sw,lin}(\mathbf{x}) = \Omega\left[F^{(20)}_{sw}h(\mathbf{x}) - F^{(02)}_{sw}\nabla^2h(\mathbf{x})\right].$$
Taking the inverse Fourier transform of this equation,
$$\mu_{sw,lin,\mathbf{k}} = \Omega\left[F^{(20)}_{sw} + F^{(02)}_{sw}k^2\right]h_{\mathbf{k}}.$$
This gives case b in Eq. (9).

B.2.2 anisotropic case

If the surface is anisotropic, then $\tilde{F}^{(02)}_{sw}$ in Eq. (57) is a rank 2 symmetric tensor in the $\mathbf{x}$-plane.
Thus, it can have two distinct eigenvalues, and it automatically has 2-fold rotational symmetry (rotations by 180°). If any other symmetry is assumed, such as 4-fold symmetry (rotations by 90°), then $\tilde{F}^{(02)}_{sw}$ must be fully isotropic. Taking the inverse Fourier transform,
$$\mu_{sw,lin,\mathbf{k}} = \Omega\left[F^{(20)}_{sw} + \mathbf{k}\cdot\tilde{F}^{(02)}_{sw}\cdot\mathbf{k}\right]h_{\mathbf{k}}.$$
In Eq. (23), case b, it is assumed that there is four-fold symmetry, resulting in a surface-wetting part of the diffusion potential that is completely isotropic.

C Elastic Anisotropy

In principle, the anisotropic elastic energy $\omega_{\mathbf{k}}$ is found in the same fashion as the isotropic elastic energy. [15] The flat film, initially in a state of biaxial stress, is perturbed by a small periodic surface fluctuation of amplitude $h_0$. An appropriate elastic field is added to satisfy the perturbed traction-free boundary condition at the free surface. Finally, the elastic energy is evaluated at the free surface to first order in $h_0$. The coefficient of $h_0$ is the sought-after $\omega_{\mathbf{k}}$. The equations themselves are cumbersome and best solved using a numerical implementation, so an abstract procedure for calculating $\omega_{\mathbf{k}}$ is outlined here.

$\omega_{\mathbf{k}}$ is found for $k = 1$ but arbitrary $\theta_{\mathbf{k}}$. Let the surface have a height variation $h(x) = h_0e^{ikx}$. To first order in $h_0$, the surface normal is $\mathbf{n}(x) = -ikh_0e^{ikx}\,\mathbf{i} + \mathbf{k}$. The elastic energy needs to be calculated to first order in $h_0$. To find the elastic energy, it is necessary to find the perturbing elastic field to first order in $h_0$. The initial unperturbed stress state is
$$\tilde{\sigma}_m = \begin{pmatrix} \sigma_m & 0 & 0 \\ 0 & \sigma_m & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
where $\sigma_m = \left(c_{11} + c_{12} - 2c_{12}^2/c_{11}\right)\epsilon_m$. Note that this stress state is isotropic in the $x$-$y$ plane and thus independent of rotations about the vertical axis. Under this stress state, a flat surface is traction-free. With the height perturbation, the traction is
$$t_j = (\mathbf{n}\cdot\tilde{\sigma}_m)_j = -ikh_0M\epsilon_m\delta_{j1}e^{ikx}. \tag{58}$$
Next, the perturbing elastic fields are found. These are not isotropic in the $x$-$y$ plane, and it is necessary to take into account the angle.
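First, the $3\times3\times3\times3$ elastic stiffness tensor $c_{ijkl}$ is constructed for the cube orientation from the compact $6\times6$ matrix $c_{ij}$. The tensor representation aids in rotation. The stiffness tensor is then passively rotated in the $x$-$y$ plane by an angle $\theta_{\mathbf{k}}$,
$$c_{ijkl}(\theta_{\mathbf{k}}) = \sum_{m,n,p,q=1}^{3} R(\theta_{\mathbf{k}})_{im}\,R(\theta_{\mathbf{k}})_{jn}\,R(\theta_{\mathbf{k}})_{kp}\,R(\theta_{\mathbf{k}})_{lq}\,c_{mnpq},$$
where
$$R(\theta_{\mathbf{k}}) = \begin{pmatrix} \cos(\theta_{\mathbf{k}}) & \sin(\theta_{\mathbf{k}}) & 0 \\ -\sin(\theta_{\mathbf{k}}) & \cos(\theta_{\mathbf{k}}) & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
This passive rotation of $c_{ijkl}$ is equivalent to actively rotating the wave vector $\mathbf{k} = k\mathbf{i}$ by $\theta_{\mathbf{k}}$.

The appropriate form for the perturbing displacement field is found next. Assume a displacement of the form $u_i(x,y,z) = U_ie^{k(ix+\kappa z)}$, where $\kappa$ can have a complex value. The elastic equilibrium equations are
$$\sum_{i,k,l=1}^{3} c_{ijkl}(\theta_{\mathbf{k}})\,\partial_i\partial_k u_l = 0; \quad j = 1\dots3,$$
$$\left(\sum_{l=1}^{3} C_{jl}(\theta_{\mathbf{k}},\kappa)\,U_l\right)k^2e^{k(ix+\kappa z)} = 0, \tag{59}$$
where $C_{jl}(\theta_{\mathbf{k}},\kappa) = \sum_{i,k=1}^{3} c_{ijkl}(\theta_{\mathbf{k}})\,(i\delta_{i1} + \delta_{i3}\kappa)(i\delta_{k1} + \delta_{k3}\kappa)$. Factoring out $k^2e^{k(ix+\kappa z)}$, the part in parentheses must be identically zero. To obtain a non-trivial solution, the determinant of $C_{jl}(\theta_{\mathbf{k}},\kappa)$ is set to zero. Six complex values of $\kappa$ are found. The values of $\kappa$ with $\mathrm{Re}[\kappa] < 0$ are discarded since the corresponding displacements blow up as $z \to -\infty$. Each of the remaining values $\kappa = \kappa_p$ with $p = 1\dots3$ is substituted back into $C_{jl}(\theta_{\mathbf{k}},\kappa)$, and Eq. (59) is solved to find the corresponding eigenvectors, $U^p_l$. The total displacement is thus
$$u_l(x,y,z) = i\epsilon_mh_0\sum_{p=1}^{3} A_pU^p_l\,e^{k(ix+\kappa_pz)},$$
where it is assumed that the perturbing elastic displacement field is proportional to $h_0$ and $\sigma_m$, and the factor of $i$ is put in for convenience.

The coefficients $A_p$ can be found from the traction-free boundary condition at the free surface. The traction formula is
$$t_j = \sum_{i,k,l=1}^{3} n_i\,c_{ijkl}(\theta_{\mathbf{k}})\,\partial_ku_l(x,y,z) = ik\epsilon_mh_0\sum_{i,k,l,p=1}^{3} n_i\,c_{ijkl}(\theta_{\mathbf{k}})\,A_pU^p_l\,(i\delta_{k1} + \kappa_p\delta_{k3})\,e^{k(ix+\kappa_pz)}. \tag{60}$$
The traction is already proportional to $h_0$. Thus, all terms in the sum must be kept to zeroth order in $h_0$, so that $h(x) = 0$ and $\mathbf{n}(x) = \mathbf{k}$. Thus, plugging $z = 0$ into Eq. (60),
$$t_j = ik\epsilon_mh_0\sum_{l,p=1}^{3}\left(ic_{3j1l}(\theta_{\mathbf{k}}) + \kappa_pc_{3j3l}(\theta_{\mathbf{k}})\right)A_pU^p_l\,e^{ikx}. \tag{61}$$
Since the total traction (Eqs.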
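(58) and (61)) must be zero, the coefficients $A_p$ are found from
$$\sum_{p=1}^{3} K_{jp}A_p = R_j, \quad\text{where}\quad K_{jp} = \sum_{l=1}^{3}\left(ic_{3j1l}(\theta_{\mathbf{k}}) + \kappa_pc_{3j3l}(\theta_{\mathbf{k}})\right)U^p_l, \quad R_j = M\delta_{j1},$$
for $j = 1\dots3$. It is worth noting that only for the symmetry directions, $\theta_{\mathbf{k}} = 0^\circ$ and $\theta_{\mathbf{k}} = 45^\circ$, is the strain purely plane-strain as it is for the elastically isotropic case.

The elastic energy at the film surface is found to order $O(h_0)$. If the stress and strain are expanded to first order in $h_0$, $\tilde{\sigma} = \tilde{\sigma}_0 + \tilde{\sigma}_1$ and $\tilde{\epsilon} = \tilde{\epsilon}_0 + \tilde{\epsilon}_1$, then
$$U = \tfrac{1}{2}\,\tilde{\epsilon} : \tilde{c} : \tilde{\epsilon} = \tfrac{1}{2}\,\tilde{\sigma}_0 : \tilde{\epsilon}_0 + \tilde{\sigma}_0 : \tilde{\epsilon}_1 + O(h_0^2).$$
Thus,
$$U = U_0 + M\epsilon_m\left((\epsilon_1)_{11} + (\epsilon_1)_{22}\right),$$
$$(\epsilon_1)_{11} = \partial u_1/\partial x = -\epsilon_mkh_0\sum_{p=1}^{3} A_pU^p_1\,e^{ikx}, \qquad (\epsilon_1)_{22} = \partial u_2/\partial y = 0.$$
Thus,
$$U = U_0 - \mathcal{E}_{\theta_{\mathbf{k}}}kh_0e^{ikx}, \quad\text{where}\quad \mathcal{E}_{\theta_{\mathbf{k}}} = M\epsilon_m^2\sum_{p=1}^{3} A_pU^p_1,$$
and where $A_p$ and $U^p_1$ are implicitly functions of $\theta_{\mathbf{k}}$. This procedure has been used to find the values of $\mathcal{E}_{0^\circ}$ and $\mathcal{E}_{45^\circ}$ for Table 2 and Sec. 4.

D Diffusional Anisotropy

In general, the surface diffusivity can depend on the film height $H(\mathbf{x})$ and the surface orientation $\nabla H(\mathbf{x})$ so that the surface current is
$$\mathbf{J}_S(\mathbf{x}) = \tilde{D}(H(\mathbf{x}),\nabla H(\mathbf{x}))\cdot\nabla_s\mu(\mathbf{x}),$$
where $\nabla_s$ is the surface gradient, and $\tilde{D}$ is a rank 2 tensor in the two-dimensional space tangent to the film surface at $\mathbf{x}$. Linearizing the surface current about a flat surface,
$$\mathbf{J}_S(\mathbf{x}) = \tilde{D}(\bar{H})\cdot\nabla\mu_{lin}(\mathbf{x}),$$
where the diffusivity must be evaluated for $h = 0$ and $\nabla h = 0$, since $\mu_{lin}(\mathbf{x})$ is already proportional to $h(\mathbf{x})$. The linearized diffusivity is a symmetric rank 2 tensor in the $\mathbf{x}$-plane. Thus, it is similar to $\tilde{F}_{sw}$ discussed in Appendix B.2.2. It is automatically either two-fold symmetric (rotations by 180°) or completely isotropic. In Eq. (23), four-fold symmetry of the surface is assumed. Thus, the diffusivity must be completely isotropic; $\tilde{D} \to D$, a scalar. Section 2.2.1.2 and Appendix B.2.2 contain discussions of the symmetry properties of the various rank 2 tensors that appear in the linear evolution equations. A limited case of diffusional anisotropy has been modeled via the kinetic Monte Carlo technique. [54]

E Correlation Functions

E.1 Mean Values

Equations (31) and (33) are central to the presented analysis.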
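Here, they are derived. The two-point correlation functions for a stochastic system are introduced. Then, the average of the autocorrelation function is taken and expressed in terms of the two-point correlation functions. Finally, this average is simplified using the translational invariance of the system (governing equations and ensemble of initial conditions).

The two-point real-space correlation function is

$$C(\mathbf{x},\mathbf{x}') = \langle h(\mathbf{x})\,h(\mathbf{x}')^*\rangle,$$

and the reciprocal space correlation function is

$$C_{\mathbf{k}\mathbf{k}'} = \langle h_{\mathbf{k}}\,h^*_{\mathbf{k}'}\rangle.$$

These are related by the double Fourier transform,

$$C_{\mathbf{k}\mathbf{k}'} = \frac{1}{(2\pi)^{2d}}\int d^dx\,d^dx'\; e^{-i\mathbf{k}\cdot\mathbf{x} + i\mathbf{k}'\cdot\mathbf{x}'}\,C(\mathbf{x},\mathbf{x}'); \tag{62}$$

$$C(\mathbf{x},\mathbf{x}') = \int d^dk\,d^dk'\; e^{i\mathbf{k}\cdot\mathbf{x} - i\mathbf{k}'\cdot\mathbf{x}'}\,C_{\mathbf{k}\mathbf{k}'}. \tag{63}$$

These ensemble correlation functions can be used to give the ensemble-mean autocorrelation function and spectrum function. In real space,

$$\langle C^A(\Delta\mathbf{x})\rangle = \frac{1}{A}\int d^2x'\,\langle h(\Delta\mathbf{x}+\mathbf{x}')\,h(\mathbf{x}')\rangle = \frac{1}{A}\int d^2x'\; C(\Delta\mathbf{x}+\mathbf{x}',\mathbf{x}'). \tag{64}$$

In reciprocal space,

$$\langle C^A_{\mathbf{k}}\rangle = \frac{(2\pi)^d}{A}\,\langle h_{\mathbf{k}}\,h^*_{\mathbf{k}}\rangle = \frac{(2\pi)^d}{A}\,C_{\mathbf{k}\mathbf{k}}. \tag{65}$$

Fortunately, the translational invariance of the system simplifies these relations. Inspecting the governing equations and invoking the translational invariance of the stochastic initial conditions, the resulting ensemble and its statistical measures must also be translationally invariant. Thus, under the translation by $\mathbf{x}'$,

$$C(\Delta\mathbf{x}+\mathbf{x}',\mathbf{x}') = C(\Delta\mathbf{x},\mathbf{0}) = C(\Delta\mathbf{x}), \tag{66}$$

so that the independent variable is reduced to just the difference vector $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}'$. This relation can be used to simplify both the real and reciprocal space relations.

The real space relation simplifies as follows. Inserting Eq. (66) into Eq. (64),

$$\langle C^A(\Delta\mathbf{x})\rangle = \frac{1}{A}\int d^2x'\; C(\Delta\mathbf{x},\mathbf{0}) = C(\Delta\mathbf{x}). \tag{67}$$

The reciprocal space relation (Eq. (62)) simplifies to

$$C_{\mathbf{k}\mathbf{k}'} = C_{\mathbf{k}}\,\delta^2(\mathbf{k}-\mathbf{k}') = C_{\mathbf{k}}\,\frac{A}{(2\pi)^d}\,\delta_{\mathbf{k}\mathbf{k}'}, \tag{68}$$

where

$$C_{\mathbf{k}} = \frac{1}{(2\pi)^d}\int d^2\Delta x\; e^{-i\mathbf{k}\cdot\Delta\mathbf{x}}\,C(\Delta\mathbf{x}).$$

One can see immediately from Eq. (67) that $C_{\mathbf{k}}$ is the Fourier transform of $\langle C^A(\Delta\mathbf{x})\rangle = C(\Delta\mathbf{x})$, or one can plug Eq. (68) into Eq. (65) to get $\langle C^A_{\mathbf{k}}\rangle = C_{\mathbf{k}}$.

E.2 Variance and Convergence

The ergodic hypothesis is that an average with respect to a parameter such as position or time tends towards an ensemble average.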
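The convergence question taken up in this section can be previewed numerically. The following is a minimal sketch under stated assumptions, not the film model itself: the surface is a one-dimensional array of unit-variance Gaussian white noise (so the true spectrum is flat, $C_k = 1$), the raw spectrum estimate is the periodogram $|h_k|^2/n$, and a plain running mean over neighbouring wavenumbers stands in for a smoothing kernel. The names `fft`, `raw_spectrum` and `box_smooth` are illustrative helpers introduced here.

```python
import cmath
import random
import statistics


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def raw_spectrum(n, rng):
    """Periodogram |h_k|^2 / n of Gaussian white noise (assumed true C_k = 1)."""
    heights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h_k = fft(heights)
    # Keep 0 < k < n/2: the k = 0 and Nyquist modes are real-valued special cases.
    return [abs(c) ** 2 / n for c in h_k[1 : n // 2]]


def box_smooth(spectrum, width):
    """Running mean over `width` neighbouring wavenumbers."""
    window_sum = sum(spectrum[:width])
    out = [window_sum / width]
    for i in range(width, len(spectrum)):
        window_sum += spectrum[i] - spectrum[i - width]
        out.append(window_sum / width)
    return out


rng = random.Random(0)              # fixed seed so the run is repeatable
spec = raw_spectrum(2 ** 14, rng)   # raw estimate at every wavenumber
smooth = box_smooth(spec, 64)       # estimate averaged over 64 modes

raw_mean = statistics.fmean(spec)
raw_std = statistics.pstdev(spec)
smooth_std = statistics.pstdev(smooth)
print(f"raw estimate:      mean={raw_mean:.3f}  scatter={raw_std:.3f}")
print(f"smoothed estimate: scatter={smooth_std:.3f}")
```

In this toy setting the scatter of the raw estimate stays comparable to its mean however long the sample is, whereas averaging over a patch of wavenumbers reduces the scatter roughly as the inverse square root of the number of modes in the patch.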
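Applied to the spectrum function and the autocorrelation function, it reads

$$C^A_{\mathbf{k}} \approx \langle C^A_{\mathbf{k}}\rangle = C_{\mathbf{k}}, \tag{69}$$

and

$$C^A(\Delta\mathbf{x}) \approx \langle C^A(\Delta\mathbf{x})\rangle = C(\Delta\mathbf{x}),$$

when the surface area is very large. The ensemble average is a good substitute if the variance about the average vanishes as the substrate area $A$ becomes large. It is found that in reciprocal space,

$$\mathrm{Var}(C^A_{\mathbf{k}}) = \langle (C^A_{\mathbf{k}})^2\rangle - \langle C^A_{\mathbf{k}}\rangle^2 = C^2_{\mathbf{k}}. \tag{70}$$

Thus, the ergodic hypothesis does not hold for $C^A_{\mathbf{k}}$. In practice, $C^A_{\mathbf{k}}$ is a speckled version of $C_{\mathbf{k}}$ (Fig. 6). However, if one smooths $C^A_{\mathbf{k}}$ by averaging over a small patch in reciprocal space of size $k_{\mathrm{smooth}} = 1/\Delta_s$, so that

$$C^A_{\mathbf{k}}(\Delta_s) = \left(\frac{\Delta_s^2}{2\pi}\right)^{d/2}\int d^dk'\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}'-\mathbf{k})^2}\,C^A_{\mathbf{k}'}, \tag{71}$$

then $\mathrm{Var}\,C^A_{\mathbf{k}}(\Delta_s)$ diminishes as $1/A$. For sufficiently large $\Delta_s$,

$$\langle C^A_{\mathbf{k}}(\Delta_s)\rangle \approx C_{\mathbf{k}}, \tag{72}$$

$$\mathrm{Var}\left(C^A_{\mathbf{k}}(\Delta_s)\right) \approx \frac{\pi^{d/2}\Delta_s^d}{A}\,C^2_{\mathbf{k}}. \tag{73}$$

Thus, the ergodic hypothesis (Eq. (69)) only holds for a smoothed version of $C^A_{\mathbf{k}}$. In real space,

$$\mathrm{Var}\left(C^A(\Delta\mathbf{x})\right) = \langle C^A(\Delta\mathbf{x})^2\rangle - \langle C^A(\Delta\mathbf{x})\rangle^2 = \frac{(2\pi)^d}{A}\int d^dk\,\left(e^{2i\mathbf{k}\cdot\Delta\mathbf{x}}\,C^2_{\mathbf{k}} + C^2_{\mathbf{k}}\right), \tag{74}$$

where the integral is bounded (finite) provided that either $t > 0$ or the atomic scale cutoff $b_0 > 0$. Thus, the ergodic hypothesis holds for the real space autocorrelation function.

E.2.1 Eq. (70)

First, $\langle C^A_{\mathbf{k}} C^A_{\mathbf{k}'}\rangle$ is calculated,

$$\langle C^A_{\mathbf{k}} C^A_{\mathbf{k}'}\rangle = \left(\frac{(2\pi)^d}{A}\right)^2 \langle h_{\mathbf{k}} h^*_{\mathbf{k}} h_{\mathbf{k}'} h^*_{\mathbf{k}'}\rangle.$$

Assume that the distribution of $h_{\mathbf{k}}$ is Gaussian. Also, assume that $h(\mathbf{x})$ is real so that $h_{\mathbf{k}} h_{-\mathbf{k}} = |h_{\mathbf{k}}|^2$. Then,

$$\begin{aligned}\langle h_{\mathbf{k}_1} h^*_{\mathbf{k}_2} h_{\mathbf{k}_3} h^*_{\mathbf{k}_4}\rangle ={}& C_{\mathbf{k}_1}C_{\mathbf{k}_2}\,\delta^d(\mathbf{k}_1-\mathbf{k}_4)\,\delta^d(\mathbf{k}_2-\mathbf{k}_3) \\ &+ C_{\mathbf{k}_1}C_{\mathbf{k}_2}\,\delta^d(\mathbf{k}_1+\mathbf{k}_3)\,\delta^d(\mathbf{k}_2+\mathbf{k}_4) \\ &+ C_{\mathbf{k}_1}C_{\mathbf{k}_3}\,\delta^d(\mathbf{k}_1-\mathbf{k}_2)\,\delta^d(\mathbf{k}_3-\mathbf{k}_4).\end{aligned}$$

Thus,

$$\langle C^A_{\mathbf{k}} C^A_{\mathbf{k}'}\rangle = \frac{(2\pi)^d}{A}\left[C^2_{\mathbf{k}}\,\delta^d(\mathbf{k}-\mathbf{k}') + C^2_{\mathbf{k}}\,\delta^d(\mathbf{k}+\mathbf{k}') + C_{\mathbf{k}}C_{\mathbf{k}'}\,\delta^d(\mathbf{0})\right] \tag{75}$$

$$= C^2_{\mathbf{k}}\left(\delta_{\mathbf{k}\mathbf{k}'} + \delta_{\mathbf{k}(-\mathbf{k}')}\right) + C_{\mathbf{k}}C_{\mathbf{k}'}, \tag{76}$$

where Eq. (29) has been used liberally. Setting $\mathbf{k} = \mathbf{k}'$ results in Eq. (70).

E.2.2 Eq. (73)

Now consider $C^A_{\mathbf{k}}$ smoothed over a length $\Delta_s$ (Eq. (71)). The mean value is

$$\langle C^A_{\mathbf{k}}(\Delta_s)\rangle = \left(\frac{\Delta_s^2}{2\pi}\right)^{d/2}\int d^dk'\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}'-\mathbf{k})^2}\,\langle C^A_{\mathbf{k}'}\rangle = \left(\frac{\Delta_s^2}{2\pi}\right)^{d/2}\int d^dk'\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}'-\mathbf{k})^2}\,C_{\mathbf{k}'}.$$

For sufficiently small $k_{\mathrm{smooth}}$ (sufficiently large $\Delta_s$), Eq. (72) results. The variance of $C^A_{\mathbf{k}}(\Delta_s)$ is now calculated. First, it is necessary to calculate

$$\langle C^A_{\mathbf{k}}(\Delta_s)\,C^A_{\mathbf{k}}(\Delta_s)\rangle = \left(\frac{\Delta_s^2}{2\pi}\right)^{d}\int d^dk'\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}'-\mathbf{k})^2} \int d^dk''\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}''-\mathbf{k})^2}\,\langle C^A_{\mathbf{k}'} C^A_{\mathbf{k}''}\rangle.$$

Using Eq. (75) and Eq. (29) as needed,

$$\begin{aligned}\langle C^A_{\mathbf{k}}(\Delta_s)\,C^A_{\mathbf{k}}(\Delta_s)\rangle ={}& \left(\frac{\Delta_s^2}{2\pi}\right)^{d}\int d^dk'\,d^dk''\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}'-\mathbf{k})^2}\,e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}''-\mathbf{k})^2}\,\frac{(2\pi)^d}{A} \\ &\times\left[C^2_{\mathbf{k}'}\,\delta^d(\mathbf{k}'-\mathbf{k}'') + C^2_{\mathbf{k}'}\,\delta^d(\mathbf{k}'+\mathbf{k}'') + C_{\mathbf{k}'}C_{\mathbf{k}''}\,\delta^d(\mathbf{0})\right] \\ ={}& \frac{\Delta_s^{2d}}{A}\int d^dk'\,\left[e^{-\Delta_s^2(\mathbf{k}'-\mathbf{k})^2}\,C^2_{\mathbf{k}'} + e^{-\frac{1}{2}\Delta_s^2[(\mathbf{k}'-\mathbf{k})^2+(\mathbf{k}'+\mathbf{k})^2]}\,C^2_{\mathbf{k}'}\right] \\ &+ \left[\left(\frac{\Delta_s^2}{2\pi}\right)^{d/2}\int d^dk'\; e^{-\frac{1}{2}\Delta_s^2(\mathbf{k}'-\mathbf{k})^2}\,C_{\mathbf{k}'}\right]^2.\end{aligned}$$

The first integral is bounded (finite) because $C_{\mathbf{k}}$ is bounded. Let its finite value be denoted $I$. The second integral is simply $\langle C^A_{\mathbf{k}}(\Delta_s)\rangle$. Thus,

$$\mathrm{Var}\left(C^A_{\mathbf{k}}(\Delta_s)\right) = \frac{\Delta_s^{2d}}{A}\,I,$$

a finite value that decreases as $A^{-1}$, as required for the ergodic hypothesis to hold. For sufficiently small $k_{\mathrm{smooth}}$ (large $\Delta_s$), $I \approx (\pi/\Delta_s^2)^{d/2}\,C^2_{\mathbf{k}}$, and Eq. (73) results. It should also be noted that the large $\Delta_s$ required for this approximation also creates a more stringent requirement that $A$ be large.

E.2.3 Eq. (74)

Now, consider the real space autocorrelation function. First, $\langle C^A(\Delta\mathbf{x})\,C^A(\Delta\mathbf{x})\rangle$ is needed,

$$\langle C^A(\Delta\mathbf{x})\,C^A(\Delta\mathbf{x})\rangle = \int d^dk\,d^dk'\; e^{i(\mathbf{k}+\mathbf{k}')\cdot\Delta\mathbf{x}}\,\langle C^A_{\mathbf{k}} C^A_{\mathbf{k}'}\rangle.$$

Proceeding in a fashion similar to the previous section (making use of Eqs. (75) and (29) as needed),

$$\begin{aligned}\langle C^A(\Delta\mathbf{x})\,C^A(\Delta\mathbf{x})\rangle ={}& \frac{(2\pi)^d}{A}\int d^dk\,d^dk'\; e^{i(\mathbf{k}+\mathbf{k}')\cdot\Delta\mathbf{x}}\left[C^2_{\mathbf{k}}\,\delta^d(\mathbf{k}-\mathbf{k}') + C^2_{\mathbf{k}}\,\delta^d(\mathbf{k}+\mathbf{k}') + C_{\mathbf{k}}C_{\mathbf{k}'}\,\delta^d(\mathbf{0})\right] \\ ={}& \frac{(2\pi)^d}{A}\int d^dk\,\left(e^{2i\mathbf{k}\cdot\Delta\mathbf{x}}\,C^2_{\mathbf{k}} + C^2_{\mathbf{k}}\right) + \int d^dk\; e^{i\mathbf{k}\cdot\Delta\mathbf{x}}\,C_{\mathbf{k}} \int d^dk'\; e^{i\mathbf{k}'\cdot\Delta\mathbf{x}}\,C_{\mathbf{k}'} \\ ={}& \frac{(2\pi)^d}{A}\int d^dk\,\left(e^{2i\mathbf{k}\cdot\Delta\mathbf{x}}\,C^2_{\mathbf{k}} + C^2_{\mathbf{k}}\right) + \langle C^A(\Delta\mathbf{x})\rangle^2.\end{aligned}$$

Thus, Eq. (74) results. For the variance to be vanishing, the integral in Eq. (74) must be bounded (finite). If time $t > 0$, the exponential in Eq. (77) guarantees that the integral is bounded. For time $t = 0$, the integral is only bounded if the atomic scale cutoff $b_0 > 0$.

F Atomic Scale Cutoff

Starting from Eq. (39),

$$C_{\mathbf{k}} = \frac{\cdots}{(2\pi)^d}\; e^{2\sigma_{\mathbf{k}}t - \cdots}. \tag{77}$$

The effect of the small scale cutoff is both small and short-lived, as it only works to suppress fluctuations with large wavenumbers. The most important fluctuations have wavenumbers between 0 and $2k_c$. Thus, the typical size of the cutoff term is about $b_0^2k_c^2$. If a typical dot size or spacing is 10 nm, and a typical atomic scale is $10^{-1}$ nm, a typical value for this term is about $10^{-3}$ to $10^{-2}$. To calculate the effect of the cutoff, it can be absorbed into the time-dependent part with a substitution, so that its effect lasts only as long as a perturbation with atomic scale curvature ($\kappa = b_0$). Thus, Eq.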
(40) is a good approximation.

Acknowledgement

Thanks to L. Fang and C. Kumar for useful comments during the writing of this article.

References

[1] D. Bimberg, M. Grundmann, and N. N. Ledentsov. Quantum Dot Heterostructures. John Wiley & Sons, 1999.
[2] O. P. Pchelyakov, Yu. B. Bolkhovityanov, A. V. Dvurechenski, L. V. Sokolov, A. I. Nikiforov, A. I. Yakimov, and B. Voigtländer. Silicon-Germanium nanostructures with quantum dots: Formation mechanisms and electrical properties. Semiconductors, 34(11):1229–47, 2000. [doi:10.1134/1.1325416].
[3] M. Grundmann. The present status of quantum dot lasers. Physica E, 5:167, 2000. [doi:10.1016/S1386-9477(99)00041-7].
[4] Pierre M. Petroff, Axel Lorke, and Atac Imamoglu. Epitaxially self-assembled quantum dots. Physics Today, pages 46–52, May 2001.
[5] Hui-Yun Liu, Bo Xu, Yong-Qiang Wei, Ding Ding, Jia-Jun Qian, Qin Han, Ji-Ben Liang, and Zhan-Guo Wang. High-power and long-lifetime InAs/GaAs quantum-dot laser at 1080 nm. Applied Physics Letters, 79(18):2868–70, 2001. [doi:10.1063/1.1415416].
[6] F. Heinrichsdorff, M. H. Mao, N. Kirstaedter, A. Krost, D. Bimberg, A. O. Kosogov, and P. Werner. Room-temperature continuous-wave lasing from stacked InAs/GaAs quantum dots grown by metalorganic chemical vapor deposition. Applied Physics Letters, 71(1):22–4, 1997. [doi:10.1063/1.120556].
[7] D. Bimberg, N. N. Ledentsov, and J. A. Lott. Quantum-dot vertical-cavity surface-emitting laser. MRS Bulletin, 27(7):531–7, 2002.
[8] N. N. Ledentsov. Long-wavelength quantum-dot lasers on GaAs substrates: From media to device concepts. IEEE Journal of Selected Topics in Quantum Electronics, 8(5):1015–23, September/October 2002. [doi:10.1109/JSTQE.2002.804236].
[9] M. Friesen, P. Rugheimer, D. E. Savage, M. G. Lagally, D. W. van der Weide, R. Joynt, and M. A. Eriksson. Practical design and simulation of silicon-based quantum-dot qubits. Physical Review B, 67(12):121301(R), 2003. [doi:10.1103/PhysRevB.67.121301].
[10] Yi-Chang Cheng, San-Te (Cing-Ming) Yang, Jyh-Neng Yang, Liann-Be Chang, and Li-Zen Hsieh. Fabrication of a far-infrared photodetector based on InAs/GaAs quantum-dot superlattices. Optical Engineering, 42(1):119–23, 2003. [doi:10.1117/1.1525277].
[11] R. Krebs, S. Deubert, J. P. Reithmaier, and A. Forchel. Improved performance of MBE grown quantum-dot lasers with asymmetric dots in a well design emitting near 1.3 µm. Journal of Crystal Growth, 251:742–7, 2003. [doi:10.1016/S0022-0248(02)02385-0].
[12] Hiroyuki Sakaki. Progress and prospects of advanced quantum nanostructures and roles of molecular beam epitaxy. Journal of Crystal Growth, 251:9–16, 2003. [doi:10.1016/S0022-0248(03)00831-5].
[13] B. J. Spencer, P. W. Voorhees, and S. H. Davis. Morphological instability in epitaxially strained dislocation-free films. Physical Review Letters, 67(26):3696–3699, 1991. [doi:10.1103/PhysRevLett.67.3696].
[14] Karl Brunner. Si/Ge nanostructures. Reports on Progress in Physics, 65(1):27–72, 2002. [doi:10.1088/0034-4885/65/1/202].
[15] L. B. Freund and S. Suresh. Thin Film Materials: Stress, Defect Formation and Surface Evolution, chapter 8. Cambridge University Press, 2003.
[16] S. Yu Shiryaev, E. Verstlund Pedersen, F. Jensen, J. Wulff Petersen, J. Lundsgaard Hansen, and A. Nylandsted Larson. Dislocation patterning - a new tool for spatial manipulation of Ge islands. Thin Solid Films, 294(1-2):311–314, 1997. [doi:10.1016/S0040-6090(96)09240-1].
[17] C. Kumar and L. H. Friedman. Simulation of thermal field directed self assembly of epitaxial self-assembled Ge quantum dots. Journal of Applied Physics, in press.
[18] Lawrence H. Friedman and Jian Xu. Feasibility study for thermal-field directed self-assembly of heteroepitaxial quantum dots. Applied Physics Letters, 88:093105, 2006. [doi:10.1063/1.2179109].
[19] S. Krishna, D. Zhu, J. Xu, and P. Bhattacharya. Structural and luminescence characteristics of cycled submonolayer InAs/GaAs quantum dots with room-temperature emission at 1.3 µm. Journal of Applied Physics, 86:6135–8, 1999. [doi:10.1063/1.371664].
[20] R. Hull, J. L. Gray, M. Kammler, T. Vandervelde, T. Kobayashi, P. Kumar, T. Pernell, J. C. Bean, J. A. Floro, and F. M. Ross. Precision placement of heteroepitaxial semiconductor quantum dots. Materials Science and Engineering B, 101:1–8, 2003. [doi:10.1016/S0921-5107(02)00680-3].
[21] O. Guise, J. T. Yates Jr., J. Levy, J. Ahner, V. Vaithyanathan, and D. G. Schlom. Patterning of sub-10 nm Ge islands on Si(100) by direct self-assembly. Applied Physics Letters, 87:171902, 2005. [doi:10.1063/1.2112198].
[22] X. Niu, R. Vardavas, R. E. Caflisch, and C. Ratsch. Level set simulation of directed self-assembly during epitaxial growth. Physical Review B, 74(19):193403, Nov 2006. [doi:10.1103/PhysRevB.74.193403].
[23] Z. M. Zhao, T. S. Yoon, W. Feng, B. Y. Li, J. H. Kim, J. Liu, O. Hulko, Y. H. Xie, H. M. Kim, K. B. Kim, H. J. Kim, K. L. Wang, C. Ratsch, R. Caflisch, D. Y. Ryu, and T. P. Russell. The challenges in guided self-assembly of Ge and InAs quantum dots on Si. Thin Solid Films, 508(1-2):195–199, Jun 2006. [doi:10.1016/j.tsf.2005.08.407].
[24] Lawrence H. Friedman. Anisotropy and order of epitaxial self-assembled quantum dots. Physical Review B, in press.
[25] Y. Obayashi and K. Shintani. Directional dependence of surface morphological stability of heteroepitaxial layers. Journal of Applied Physics, 84(6):3141, 1998. [doi:10.1063/1.368468].
[26] C. S. Ozkan, W. D. Nix, and H. J. Gao. Stress-driven surface evolution in heteroepitaxial thin films: Anisotropy of the two-dimensional roughening mode. Journal of Materials Research, 14(8):3247–3256, Aug 1999. [doi:10.1557/JMR.1999.043].
[27] J. Tersoff and F. K. LeGoues. Competing relaxation mechanisms in strained layers. Physical Review Letters, 72(22):3570–3573, May 1994. [doi:10.1103/PhysRevLett.72.3570].
[28] B. J. Spencer, P. W. Voorhees, and S. H. Davis. Morphological instability in epitaxially strained dislocation-free solid films: Linear stability theory. Journal of Applied Physics, 73(10):4955–4970, 1993. [doi:10.1063/1.353815].
[29] J. M. Baribeau, X. Wu, N. L. Rowell, and D. J. Lockwood. Ge dots and nanostructures grown epitaxially on Si. Journal of Physics-Condensed Matter, 18(8):R139–R174, Mar 2006. [doi:10.1088/0953-8984/18/8/R01].
[30] D. J. Srolovitz. On the stability of surfaces of stressed solids. Acta Metallurgica, 37(2):621–625, 1989. [doi:10.1016/0001-6160(89)90246-0].
[31] H. J. Gao and W. D. Nix. Surface roughening of heteroepitaxial thin films. Annual Review of Materials Science, 29:173–209, 1999. [doi:10.1146/annurev.matsci.29.1.173].
[32] P. Sutter and M. G. Lagally. Nucleationless three-dimensional island formation in low-misfit heteroepitaxy. Physical Review Letters, 84(20):4637, 2000. [doi:10.1103/PhysRevLett.84.4637].
[33] A. A. Golovin, S. H. Davis, and P. W. Voorhees. Self-organization of quantum dots in epitaxially strained solid films. Physical Review E, 68:056203, 2003. [doi:10.1103/PhysRevE.68.056203].
[34] A. Ramasubramaniam and V. B. Shenoy. Growth and ordering of Si-Ge quantum dots on strain patterned substrates. Journal of Engineering Materials and Technology-Transactions of the ASME, 127(4):434–443, Oct 2005. [doi:10.1115/1.1924559].
[35] I. Berbezier, A. Ronda, F. Volpi, and A. Portavoce. Morphological evolution of SiGe layers. Surface Science, 531:231–243, 2003. [doi:10.1016/S0039-6028(03)00488-6].
[36] J. R. R. Bortoleto, H. R. Gutierrez, M. A. Cotta, J. Bettini, L. P. Cardoso, and M. M. G. de Carvalho. Spatial ordering in InP/InGaP nanostructures. Applied Physics Letters, 82(20):3523–3525, 2003. [doi:10.1063/1.1572553].
[37] P. Liu, Y. W. Zhang, and C. Lu. Formation of self-assembled heteroepitaxial islands in elastically anisotropic films. Physical Review B, 67:165414, 2003. [doi:10.1103/PhysRevB.67.165414].
[38] Y. W. Zhang, A. F. Bower, and P. Liu. Morphological evolution driven by strain induced surface diffusion. Thin Solid Films, 424:9–14, 2003. [doi:10.1016/S0040-6090(02)00897-0].
[39] Yu U. Wang, Yongmei M. Jin, and Armen G. Khachaturyan. Phase field microelasticity modeling of surface instability of heteroepitaxial thin films. Acta Materialia, 52:81–92, 2004. [doi:10.1016/j.actamat.2003.08.027].
[40] W. T. Tekalign and B. J. Spencer. Evolution equation for a thin epitaxial film on a deformable substrate. Journal of Applied Physics, 96(10):5505–5512, 2004. [doi:10.1063/1.1766084].
[41] M. J. Beck, A. van de Walle, and M. Asta. Surface energetics and structure of the Ge wetting layer on Si(100). Physical Review B, 70(20):205337, Nov 2004. [doi:10.1103/PhysRevB.70.205337].
[42] Y. H. Tu and J. Tersoff. Origin of apparent critical thickness for island formation in heteroepitaxy. Physical Review Letters, 93(21):216101, Nov 2004. [doi:10.1103/PhysRevLett.93.216101].
[43] V. Holy, G. Springholz, M. Pinczolits, and G. Bauer. Strain induced vertical and lateral correlations in quantum dot superlattices. Physical Review Letters, 83(2):356–359, 1999. [doi:10.1103/PhysRevLett.83.356].
[44] P. Liu, Y. W. Zhang, and C. Lu. Three-dimensional finite-element simulations of the self-organized growth of quantum dot superlattices. Physical Review B, 68:195314, 2003. [doi:10.1103/PhysRevB.68.195314].
[45] G. Springholz, M. Pinczolits, V. Holy, S. Zerlauth, I. Vavra, and G. Bauer. Vertical and lateral ordering in self-organized quantum dot superlattices. Physica E, 9:149–163, 2001. [doi:10.1016/S1386-9477(00)00189-2].
[46] P. Liu, Y. W. Zhang, and C. Lu. Coarsening kinetics of heteroepitaxial islands in nucleationless Stranski-Krastanov growth. Physical Review B, 68:035402, 2003. [doi:10.1103/PhysRevB.68.035402].
[47] F. M. Ross, J. Tersoff, and R. M. Tromp. Coarsening of self-assembled Ge quantum dots on Si(001). Physical Review Letters, 80(5):984–7, 1998. [doi:10.1103/PhysRevLett.80.984].
[48] J. Tersoff. Kinetic surface segregation and the evolution of nanostructures. Applied Physics Letters, 83(2):353–355, 2003. [doi:10.1063/1.1592304].
[49] A. Ramasubramaniam and V. B. Shenoy. A spectral method for the nonconserved surface evolution of nanocrystalline gratings below the roughening transition. Journal of Applied Physics, 97(11):114312, 2005. [doi:10.1063/1.1897837].
[50] Y. W. Zhang and A. F. Bower. Three-dimensional analysis of shape transitions in strained-heteroepitaxial islands. Applied Physics Letters, 78(18):2706–2708, 2001. [doi:10.1063/1.1354155].
[51] L. E. Vorbyev. Handbook Series On Semiconductor Parameters, volume 1. World Scientific, 1996.
[52] A. A. Golovin, M. S. Levine, T. V. Savina, and S. H. Davis. Faceting instability in the presence of wetting interactions: A mechanism for the formation of quantum dots. Physical Review B, 70:235342, 2004. [doi:10.1103/PhysRevB.70.235342].
[53] B. L. Liang, Zh. M. Wang, Yu. I. Mazur, V. V. Strelchuck, K. Holmes, J. H. Lee, and G. J. Salamo. InGaAs quantum dots grown on B-type high index GaAs substrates: surface morphologies and optical properties. Nanotechnology, 17(11):2736–2740, 2006. [doi:10.1088/0957-4484/17/11/004].
[54] M. Meixner, R. Kunert, and E. Scholl. Control of strain-mediated growth kinetics of self-assembled semiconductor quantum dots. Physical Review B, 67:195301, 2003. [doi:10.1103/PhysRevB.67.195301].
[55] Robert Zwanzig. Nonequilibrium Statistical Mechanics. Oxford University Press, New York, 2001.
[56] C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences. Springer, New York, 3rd edition, 2004.
[57] M. C. Cross and P. C. Hohenberg. Pattern formation outside equilibrium. Reviews of Modern Physics, 65(3):851–1112, 1993. [doi:10.1103/RevModPhys.65.851].
[58] B. J. Spencer, S. H. Davis, and P. W. Voorhees. Morphological instability in epitaxially strained dislocation-free solid films: Nonlinear evolution.
Physical Review B, 47(15):9760, 1993. [doi:10.1103/PhysRevB.47.9760].

ABSTRACT Epitaxial self-assembled quantum dots (SAQDs) are of interest for nanostructured optoelectronic and electronic devices such as lasers, photodetectors and nanoscale logic. Spatial order and size order of SAQDs are important to the development of usable devices. It is likely that these two types of order are strongly linked; thus, a study of spatial order will also have strong implications for size order. Here a study of spatial order is undertaken using a linear analysis of a commonly used model of SAQD formation based on surface diffusion. Analytic formulas for film-height correlation functions are found that characterize quantum dot spatial order and corresponding correlation lengths that quantify order.
Initial atomic-scale random fluctuations result in relatively small correlation lengths (about two dots) when the effect of a wetting potential is negligible; however, the correlation lengths diverge when SAQDs are allowed to form at a near-critical film height. The present work reinforces previous findings about anisotropy and SAQD order and presents an explicit and transparent mechanism for ordering with corresponding analytic equations. In addition, SAQD formation is by its nature a stochastic process, and various mathematical aspects regarding statistical analysis of SAQD formation and order are presented.
<|endoftext|><|startoftext|>
A NOTE ABOUT THE $\{K_i(z)\}_{i=1}^{\infty}$ FUNCTIONS

Branko J. Malešević

In the article [10], A. Petojević verified useful properties of the $K_i(z)$ functions which generalize Kurepa's [1] left factorial function. In this note, we present simplified proofs of two of these results and we answer the open question stated in [10]. Finally, we discuss the differential transcendency of the $K_i(z)$ functions.

A. Petojević [7, p. 3] considered the family of functions

$${}_vM_m(s;a,z) = \sum_{k=1}^{v} (-1)^{k-1}\binom{z+m+1-k}{m+1}\,\mathcal{L}[s;\,{}_2F_1(a,k-z,m+2;1-t)], \tag{1}$$

for $\Re(z) > v-m-2$, where $v \in \mathbb{N}$ is a positive integer; $m \in \{-1,0,1,2,\ldots\}$ is an integer; $s, a, z$ are complex variables; $\mathcal{L}[s;F(t)]$ is the Laplace transform and ${}_2F_1(a,b,c;x)$ is the hypergeometric function ($|x| < 1$). Đ. Kurepa considered in the articles [1, p. 151] and [2, p. 297] a complex function defined by the integral

$$K(z) = \int_0^{\infty} e^{-t}\,\frac{t^z-1}{t-1}\,dt, \tag{2}$$

for $\Re(z) > 0$. In particular, for Kurepa's function $K(z)$ it is true that $K(z) = {}_1M_0(1;1,z)$ for $\Re(z) > 0$, according to [10]. For various values of the parameters $v, m, s, a, z$ in (1), different special functions, as presented in [10], are obtained. A. Petojević considered in the article [10, p. 1640] the following sequence of functions:

$$K_i(z) = \frac{{}_1M_0(1;1,z+i-1) - {}_1M_0(1;1,i-1)}{{}_1M_{-1}(1;1,i)}, \tag{3}$$

for $i \in \mathbb{N}$ and $\Re(z) > -i$.
On the basis of the definition in (3), the following representation via Kurepa's function is true:

$$K_i(z) = \frac{1}{(i-1)!}\,\bigl(K(z+i-1) - K(i-1)\bigr), \tag{4}$$

for $i \in \mathbb{N}$ and $\Re(z) > -i+1$. Note that $K(0) = 0$ [2, p. 297] and therefore $K_1(z) = K(z)$ for $\Re(z) > 0$.

Analytical and differential-algebraic properties of Kurepa's function $K(z)$ are considered in articles [1-12] and in many other articles. On the basis of well-known statements for Kurepa's function $K(z)$, using representation (4), in many cases we can get simple proofs of analogous statements for the $K_i(z)$ functions.

For example, it is a well-known fact that it is possible to analytically continue Kurepa's function to a meromorphic function with simple poles at the integer points $z = -1$ and $z = -m$ $(m \geq 3)$ [2, p. 303], [3, p. 474]. Residues of Kurepa's function at these poles have the following form [2]:

$$\operatorname*{res}_{z=-1} K(z) = -1 \quad\text{and}\quad \operatorname*{res}_{z=-m} K(z) = \sum_{k=2}^{m-1}\frac{(-1)^{k-1}}{k!}, \quad (m \geq 3). \tag{5}$$

Research partially supported by the MNTRS, Serbia, Grant No. 144020.

For Kurepa's function $K(z)$ the infinite point is an essential singularity [3]. Hence, on the basis of (4), each function $K_i(z)$ is meromorphic with simple poles at the integer points $z = -i$ and $z = -(i+m)$ $(m \geq 2)$. On the basis of (4) we have:

$$\operatorname*{res}_{z=-(i+m)} K_i(z) = \frac{1}{(i-1)!}\cdot\operatorname*{res}_{z=-(i+m)} K(z+i-1) = \frac{1}{(i-1)!}\cdot\operatorname*{res}_{z=-(m+1)} K(z), \tag{6}$$

where $m = 0$ or $m \geq 2$. Hence:

$$\operatorname*{res}_{z=-i} K_i(z) = -\frac{1}{(i-1)!} \quad\text{and}\quad \operatorname*{res}_{z=-(i+m)} K_i(z) = \frac{1}{(i-1)!}\sum_{k=2}^{m}\frac{(-1)^{k-1}}{k!}, \quad (m \geq 2). \tag{7}$$

For each $K_i(z)$ function the infinite point is an essential singularity. Therefore, we get Theorem 3.3 from [10].

Next, it is a well-known fact that for Kurepa's function the asymptotic relation $K(x) \sim \Gamma(x)$ is true for real $x$ such that $x \to \infty$, where $\Gamma(x)$ is the gamma function [2, p. 299]. Hence, for fixed $i \in \mathbb{N}$ and real $x > -i+1$, on the basis of (4), we get:

$$\lim_{x\to\infty}\frac{K_i(x)}{\Gamma(x+i-1)} = \frac{1}{(i-1)!}\cdot\lim_{x\to\infty}\frac{K(i+x-1) - K(i-1)}{\Gamma(x+i-1)} = \frac{1}{(i-1)!} \tag{8}$$

and

$$\lim_{x\to\infty}\frac{K_i(x)}{\Gamma(x+i)} = \frac{1}{(i-1)!}\cdot\lim_{x\to\infty}\frac{K(i+x-1) - K(i-1)}{(x+i-1)\,\Gamma(x+i-1)} = 0. \tag{9}$$

Therefore, we get Theorem 3.6 from [10].
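Representation (4) and the limits (8) and (9) can be checked with exact integer arithmetic at positive integer arguments. The sketch below relies on the standard fact that, for a positive integer $n$, the integral (2) reduces to Kurepa's left factorial, $K(n) = 0! + 1! + \cdots + (n-1)!$, because $(t^n - 1)/(t - 1) = 1 + t + \cdots + t^{n-1}$. The helper names `left_factorial`, `K_i`, `scaled_ratio_8` and `ratio_9` are illustrative and do not come from [10].

```python
from fractions import Fraction
from math import factorial


def left_factorial(n):
    """K(n) = 0! + 1! + ... + (n-1)! for integer n >= 0, so K(0) = 0."""
    return sum(factorial(k) for k in range(n))


def K_i(i, n):
    """K_i(n) from representation (4): (K(n + i - 1) - K(i - 1)) / (i - 1)!."""
    numerator = left_factorial(n + i - 1) - left_factorial(i - 1)
    return Fraction(numerator, factorial(i - 1))


def scaled_ratio_8(i, n):
    """(i - 1)! * K_i(n) / Gamma(n + i - 1); by (8) this should approach 1."""
    return K_i(i, n) * factorial(i - 1) / factorial(n + i - 2)


def ratio_9(i, n):
    """K_i(n) / Gamma(n + i); by (9) this should approach 0."""
    return K_i(i, n) / factorial(n + i - 1)


# K_1 coincides with K at positive integers, as noted after (4).
assert all(K_i(1, n) == left_factorial(n) for n in range(1, 10))

for n in (5, 10, 20, 40):
    print(n, float(scaled_ratio_8(3, n)), float(ratio_9(3, n)))
```

The printed columns only illustrate the limits along integer arguments; they are not a substitute for the argument given above for real $x$.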
Next we give a solution to the open problem stated in Question 3.7 in [10]. Namely, the following formula is given in the article [8, p. 35]:

$$K(z) = \frac{\mathrm{Ei}(1) + i\pi}{e} + \frac{(-1)^z\,\Gamma(1+z)\,\Gamma(-z,-1)}{e}, \tag{10}$$

for values $z \in \mathbb{C}\setminus\{-1,-2,-3,-4,\ldots\}$ and $i = \sqrt{-1}$. In the previous formula $\mathrm{Ei}(z)$ and $\Gamma(z,a)$ are the exponential integral and the incomplete gamma function respectively [8]. Then, for fixed $i \in \mathbb{N}$ and values $z \in \mathbb{C}\setminus\{-i,-i-1,-i-2,-i-3,\ldots\}$, on the basis of (4) and (10), we get:

$$\begin{aligned} K_i(z) &= \frac{1}{(i-1)!}\,\bigl(K(z+i-1) - K(i-1)\bigr) \\ &= \frac{\mathrm{Ei}(1)+i\pi}{e\,(i-1)!} + \frac{(-1)^{z+i-1}\,\Gamma(1+z+i-1)\,\Gamma(-z-i+1,-1)}{e\,(i-1)!} - \frac{\mathrm{Ei}(1)+i\pi}{e\,(i-1)!} - \frac{(-1)^{i-1}\,\Gamma(i)\,\Gamma(-i+1,-1)}{e\,(i-1)!} \\ &= (-1)^i e^{-1}\left(\Gamma(1-i,-1) - \frac{(-1)^z\,\Gamma(1-i-z,-1)\,\Gamma(i+z)}{(i-1)!}\right). \end{aligned}$$

Therefore, the answer to Question 3.7 from [10] is affirmative for complex values $z \in \mathbb{C}\setminus\{-i,-i-1,-i-2,-i-3,\ldots\}$.

Finally, at the end of this note let us emphasize one differential-algebraic fact for the sequence of functions $K_i(z)$. On the basis of formula (17) from the article [10], we can conclude that each $K_i(z)$ function satisfies the following recurrence relation:

$$(i-1)!\,K_i(z+1) - (i-1)!\,K_i(z) = \Gamma(z+i).$$

The previous relation can be used to verify the differential transcendency of these functions as discussed in [11, 12]. Therefore, we can conclude that each $K_i(z)$ function is a differentially transcendental function, i.e. it satisfies no algebraic differential equation over the field of complex rational functions.

REFERENCES

[1] Đ. Kurepa: On the left factorial function !n, Mathematica Balkanica 1 (1971), 147-153.
[2] Đ. Kurepa: Left factorial function in complex domain, Mathematica Balkanica 3 (1973), 297-307.
[3] D. Slavić: On the left factorial function of the complex argument, Mathematica Balkanica 3 (1973), 472-477.
[4] A. Ivić, Ž. Mijajlović: On Kurepa problems in number theory, Publications de l'Institut Mathématique, SANU Beograd, 57 (71) (1995), 19-28, available at http://elib.mi.sanu.ac.yu/pages/browse_journals.php .
[5] G.V.
Milovanović: Expansions of the Kurepa function, Publications de l'Institut Mathématique, SANU Beograd 57 (71) (1995), 81-90, available at home page http://gauss.elfak.ni.ac.yu .
[6] G.V. Milovanović, A. Petojević: Generalized factorial functions, numbers and polynomials, Mathematica Balkanica 16 (2002), 113-130.
[7] A. Petojević: The function ${}_vM_m(s;a,z)$ and some well-known sequences, Journal of Integer Sequences, Article 02.1.6, Vol. 5 (2002).
[8] B. Malešević: Some considerations in connection with Kurepa's function, Univerzitet u Beogradu, Publikacije Elektrotehničkog Fakulteta, Serija Matematika, 14 (2003), 26-36, available at http://pefmath.etf.bg.ac.yu/ .
[9] B. Malešević: Some inequalities for Kurepa's function, Journal of Inequalities in Pure and Applied Mathematics, Vol. 5, Issue 4, Article 84, (2004), available at http://jipam.vu.edu.au/ .
[10] A. Petojević: The $\{K_i(z)\}_{i=1}^{\infty}$ functions, Rocky Mountain Journal of Mathematics, Vol. 36, No. 5, (2006), 1637-1650.
[11] Ž. Mijajlović, B. Malešević: Differentially transcendental functions, accepted in Bulletin of the Belgian Mathematical Society - Simon Stevin 2007, available at http://arxiv.org/abs/math.GM/0412354 .
[12] Ž. Mijajlović, B. Malešević: Analytical and differential-algebraic properties of Gamma function, to appear in International Journal of Applied Mathematics & Statistics, J. Rassias (ed.), Functional Equations, Integral Equations, Differential Equations & Applications, http://www.ceser.res.in/ijamas/cont/fida.html , Special Issues dedicated to the Tri-Centennial Birthday Anniversary of L. Euler, 2007, available at http://arxiv.org/abs/math.GM/0605430 .
University of Belgrade, Faculty of Electrical Engineering, P.O.Box 35-54, 11 120 Belgrade, Serbia
malesh@eunet.yu, malesevic@etf.bg.ac.yu
(Received: 04/01/2007) (Accepted: 05/25/2007)

ABSTRACT In the article [Petojevic 2006], A. Petojević verified useful properties of the $K_{i}(z)$ functions which generalize Kurepa's [Kurepa 1971] left factorial function. In this note, we present simplified proofs of two of these results and we answer the open question stated in [Petojevic 2006]. Finally, we discuss the differential transcendency of the $K_{i}(z)$ functions.
<|endoftext|><|startoftext|>
Introduction

Our purpose is to construct invariant dynamical objects for a self map $f : X \to X$ of a compact topological space. We make use of sheaf cohomology and differences in rates of expansion in different terms of a long exact sequence to construct invariant sections of a sheaf. We will show that there are invariant degree 1 currents (or eigencurrents) corresponding to each expanding eigenvector of $H^1(X,\mathbb{R})$. We also show that successive preimages of sufficiently regular degree one currents converge to one of these eigencurrents. We show that if most of the expansion of $f : X \to X$ is "along" an invariant cohomological class $v \in H^k(X,\mathbb{R})$ then there is an invariant current $c$ in that cohomology class, and other sufficiently regular currents in the same class converge to $c$ under successive pullback.

The group cohomology of $\mathbb{Z}$ acting on a space of functions on $X$ via pullback has been studied in the context of dynamical systems [Kat03]. This work seems related to ours, but is pursued in an essentially different direction. Our map $f$ is not assumed to be invertible, so there is not necessarily a $\mathbb{Z}$ action, only an $\mathbb{N}$ action.
Also, we use sheaves rather than functions and make substantial use of cohomological tools. Most importantly, we are particularly interested in the construction of invariant currents, especially when the current is in some sense unique.

Our results are actually motivated by results in higher dimensional holomorphic dynamics showing the existence of a unique closed positive (1, 1) current under a variety of circumstances (just about any recent paper on higher dimensional holomorphic dynamics either proves such results or makes essential use of such results, see e.g. [FS92], [HOV94], [HOV95], [BS91a], [BS91b], [BS92], [BLS93], [BS98a], [BS98b], [BS99], [Can01], [McM02], [FS94a], [FS94b], [FS95b], [FS01], [FS95a], [JW00], [FJ03], [Ued94], [Ued98], [Ued97], and [DS05]).

While invariant measures have been a focal point in dynamics, it seems that invariant currents also have an important role to play. We will show under mild conditions that if some degree one cohomological class of a smooth self map $f$ of a compact manifold is invariant and expanded, there is necessarily an invariant degree one current of a certain type representing that class. We obtain analogous results for higher degree currents given bounds on the local growth rates of $f$.

The uniqueness of these classes is significant. It seems clear that one could modify a map locally near a fixed point to obtain other invariant currents of the same type without affecting the topology. Thus our results also say that any local modification that created an invariant current of the given type must violate the local growth conditions. In other words, as long as things do not grow too fast compared to the growth rate of the cohomology class, the expansion of the cohomology class gives sufficient "marching orders" to points that no other invariant cohomological class of the given type can be created by purely local dynamical behavior. Our results give explicit conditions under which uniqueness is guaranteed.
For degree one currents, no restriction on local growth rates is necessary for our results.

2 Cohomomorphisms

We will make use of sheaves in this paper. There are two standard definitions of sheaves on a topological space $X$, one as a topological space ([Bre97], [GR84]), and one as a functor on the category $\mathrm{Top}_X$ satisfying various axioms ([Har77], [Wei97]). Since we will often want to make use of a topology on sections of a sheaf $\mathcal{A}$ that differs from the topology these inherit using the topological definition of a sheaf, we will instead use the functor definition of a sheaf. Our sheaves will always be sheaves of $K$ modules over some fixed field $K$. We will require that $K$ have an absolute value for which $K$ is complete.

Given a continuous map $f : X \to Y$ and sheaves $\mathcal{A}$ and $\mathcal{B}$ on $X$ and $Y$ respectively, an $f$-cohomomorphism is a generalized notion of a pullback from $\mathcal{B}$ to $\mathcal{A}$ through $f$. Different types of geometric objects pull back differently, and this allows us to handle all cases at once. We take the following facts from [Bre97] pages 14–15.

Definition 1. If $\mathcal{A}$ and $\mathcal{B}$ are sheaves on $X$ and $Y$ then an "$f$-cohomomorphism" $k : \mathcal{B} \to \mathcal{A}$ is a collection of homomorphisms $k_U : \mathcal{B}(U) \to \mathcal{A}(f^{-1}(U))$, for $U$ open in $Y$, compatible with restrictions.

Note that if $\mathcal{A}$ is a sheaf on $X$ and $f : X \to Y$ is continuous then there is a canonical cohomomorphism $f_*\mathcal{A} \rightsquigarrow \mathcal{A}$, where $f_*\mathcal{A}$ is the direct image of $\mathcal{A}$, i.e. given an open $U \subset Y$, $f_*\mathcal{A}(U) = \mathcal{A}(f^{-1}(U))$.

Remark. Given a continuous map $f : X \to Y$ of topological spaces $X$ and $Y$ and sheaves $\mathcal{A}$ and $\mathcal{B}$ on $X$ and $Y$ respectively, all $f$-cohomomorphisms $f : \mathcal{B} \rightsquigarrow \mathcal{A}$ are given by a composition of the form

$$\mathcal{B} \xrightarrow{\;j\;} f_*\mathcal{A} \overset{f^*}{\rightsquigarrow} \mathcal{A},$$

where $j : \mathcal{B} \to f_*\mathcal{A}$ is a sheaf homomorphism, and each such composition is seen to give an $f$-cohomomorphism. The usual notion of "a morphism of sheaves on $X$" is the same as an $\mathrm{id}_X$ cohomomorphism of sheaves on $X$.

2.1 Cohomomorphisms and Γ.

The functor $\Gamma$ returns the global sections of a sheaf.
Given a morphism φ : A → A′ of sheaves on X, Γφ is just the homomorphism A(X) → A′(X). Given sheaves A and B on X and Y and a continuous map f : X → Y, for a sheaf cohomomorphism F : B → A one defines ΓF to be the homomorphism B(Y) → A(X). This extends Γ to a functor on the category of topological spaces with an associated sheaf, where morphisms are given by cohomomorphisms.

3 Invariant Global Sections

Fix a continuous self map f : X → X of a topological space X. We will be interested in f self cohomomorphisms of sheaves A on X. As we will typically have several sheaves of interest on X, each with a corresponding f self cohomomorphism, we let f_A : A ⇝ A be the default notation for an f-cohomomorphism of A.

Assume that X is a manifold and that 0 → A → B → C → 0, with maps p : A → B and q : B → C, is a short exact sequence of sheaves on X. Let f : X → X be a continuous self map of X and assume further that we are given f self cohomomorphisms of each of these sheaves and that
\[
\begin{array}{ccccccccc}
0 & \to & A & \xrightarrow{\;p\;} & B & \xrightarrow{\;q\;} & C & \to & 0\\
 & & \downarrow f_A & & \downarrow f_B & & \downarrow f_C & & \\
0 & \to & A & \xrightarrow{\;p\;} & B & \xrightarrow{\;q\;} & C & \to & 0
\end{array}
\tag{1}
\]
commutes. We will say that a commutative diagram as in (1) is an f self-cohomomorphism of the sequence A → B → C. Applying the functor Γ to this diagram, the rows can be extended in the usual long exact sequence. The resulting diagram is commutative ([Bre97] page 62).
\[
\begin{array}{ccccccccccc}
0 & \to & A(X) & \to & B(X) & \to & C(X) & \xrightarrow{\;\delta\;} & H^1(X,A) & \to & \cdots\\
 & & \downarrow & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & A(X) & \to & B(X) & \to & C(X) & \xrightarrow{\;\delta\;} & H^1(X,A) & \to & \cdots
\end{array}
\]
One can think of B as providing local potentials for members of C and of A as being those potentials which give rise to the zero member of C. It will be assumed that the reader is familiar with interpreting H^1(X, A) as classifying equivalence classes of bundles with transition functions in A. We will frequently refer to members of H^1(X, A) as bundles. Sections of such bundles will be assumed to be given locally by local sections of B, so that every member c of Γ(C) is given locally by potentials in B, and these potentials, taken together, are a section of the corresponding bundle δ(c) ∈ H^1(X, A).

Convention 1.
We will frequently refer to a member v of H^1(X, A) as a bundle, to a member c ∈ Γ(C) as a divisor, and if δ(c) = v we will call c a divisor of the bundle v. We think this substantially adds to the readability of the paper.

Definition 2. The support of a divisor c ∈ Γ(C) is defined to be the complement of the union of all open sets U such that c|_U = 0.

Lemma 3. If an open set U lies outside the support of some c ∈ Γ(C) then f^{-1}(U) lies outside the support of f_C(c).

Proof. By the definition of an f-cohomomorphism f_C : C → C, the map f_C on C(U) is a homomorphism from C(U) to C(f^{-1}(U)), and the induced action of f_C on Γ(C), restricted to U, must agree with this map C(U) → C(f^{-1}(U)). Hence if an open set U is outside the support of c then f^{-1}(U) is outside the support of f_C(c).

The following conditions for a given v ∈ H^1(X, A) will be of interest:

Definition 4. We will refer to a bundle v ∈ H^1(X, A) for which (H^1 p)(v) = 0 as being closed.

Note that this notion depends upon the exact sequence A → B → C, and not just on v. If B is Γ-acyclic then every member of H^1(X, A) is closed.

Definition 5. We will call a bundle v ∈ H^1(X, A) base point free if for every x ∈ X there is some divisor c ∈ Γ(C) associated to v whose support does not contain x.

Lemma 6. If B is soft, X is a regular topological space, and a ∈ H^1(X, A) is a closed bundle then a is base point free.

Proof. From the long exact sequence there is some c′ ∈ Γ(C) with δ(c′) = a. Given any point x ∈ X, since B ↠ C is surjective the germ c′_x of c′ at x is the image under q_x of some germ b″_x of B at x. Choose an open neighborhood U of x on which there is some b′ ∈ B(U) with b′_x = b″_x. The topological assumption on X implies that there is a neighborhood V ⋐ U of x. The fact that B is soft implies there is some b ∈ Γ(B) such that b|_V = b′|_V. Then c = c′ − (Γq)(b) ∈ Γ(C) has δ(c) = a and x ∉ Supp(c).

Definition 7.
We will refer to a bundle a ∈ H^1(X, A) such that f_A(a) = λ·a for some λ ∈ C as a λ eigenbundle.

We also find it useful to introduce a relevant notion of expansiveness of a map f : X → X relative to a base point free closed eigenbundle v ∈ H^1(X, A).

Definition 8. Given a base point free closed eigenbundle v ∈ H^1(X, A), we say that f is cohomologically expansive at x for v if for any open neighborhood U of x and any divisor c ∈ Γ(C) of v, the set U intersects the support of f_C^k(c) for all sufficiently large k.

Remark. It is a corollary of the definition that the set of points at which f is cohomologically expansive for v is closed and forward invariant. If Supp f_C^k(c) = f^{-k}(Supp(c)) for each c ∈ Γ(C) then the set of cohomologically expansive points is totally invariant.

The notion of being cohomologically expansive at x for v means roughly that under iteration by f small neighborhoods U of x always grow to cover enough of X that the pullback of the bundle v to the set f^k(U) is a nontrivial bundle on f^k(U) whenever k is large.

We show that if B is soft and X is a compact metric space then some minimal expansion takes place at points where f is cohomologically expansive for a closed eigenbundle a ∈ H^1(X, A). We use B_ε(x) to denote the ball of radius ε about x.

Lemma 9. Let X be a compact metric space. If B is soft and v is a closed eigenbundle then there exists δ > 0 such that for every ε > 0 there exists some K > 0 such that if f is cohomologically expansive at x then for every k > K, diam f^k(B_ε(x)) > δ.

Proof. The bundle v is base point free by Lemma 6. Using compactness we can conclude that there is a finite open cover U_1, . . . , U_ℓ of X such that for each j, U_j is disjoint from Supp c_j for some c_j ∈ Γ(C) with δ(c_j) = v. We will prove the lemma by contradiction. Let δ be the Lebesgue number of the cover U_1, . . . , U_ℓ.
If the lemma is false there is some ε > 0, some increasing sequence k_n, and points x_n at which f is cohomologically expansive such that diam f^{k_n}(B_ε(x_n)) ≤ δ for each n. By going to a subsequence if necessary we can assume x_n converges to a point x_∞. Letting U = B_{ε/2}(x_∞) we see that U ⊂ B_ε(x_n) for all large n. Since each f^{k_n}(B_ε(x_n)) lies in one of the sets U_j, there is some one c_j of c_1, . . . , c_ℓ such that f^{k_n}(U) is disjoint from Supp c_j for infinitely many values of n. Consequently U is disjoint from Supp f_C^{k_n}(c_j) for infinitely many n, contrary to x_∞ being a point at which f is cohomologically expansive for v.

We included Lemma 9 to show that our notion of cohomological expansion is genuinely expansive. However, depending on the nature of A, being cohomologically expansive can imply that neighborhoods grow a great deal under iteration. In Lemma 10 we show that given any closed set K such that the pullback of a base point free closed eigenbundle a ∈ H^1(X, A) to K is a trivial bundle, any neighborhood U of a point at which f is cohomologically expansive for a is so expanded under iteration that f^k(U) ⊄ int K for all sufficiently large k. The collection of such sets K typically contains very large sets about every point, so no matter where f^k(x) is, the conclusion that f^k(U) does not lie in any int K implies that some points of f^k(U) must lie far away from f^k(x). The point is roughly that large iterates of any neighborhood of x cannot be homotopically contracted to a point in X.

Lemma 10. If B is soft, then for any closed set K ⊂ X such that the image of H^1(X, A) → H^1(K, A) is zero, given any divisor c ∈ Γ(C), there is another divisor c′ ∈ Γ(C) associated to the same bundle such that c′ is supported outside the interior of K. Consequently, if f is cohomologically expansive at x ∈ X for some base point free closed eigenbundle a ∈ H^1(X, A) then necessarily for any neighborhood U of x, f^k(U) ⊄ int K for all large k, where int K is the interior of K.

Proof.
We use the commutative diagram
\[
\begin{array}{ccccc}
H^0(X,B) & \xrightarrow{\;\Gamma q\;} & H^0(X,C) & \xrightarrow{\;\delta\;} & H^1(X,A)\\
\downarrow & & \downarrow & & \downarrow\\
H^0(K,B|_K) & \xrightarrow{\;\Gamma q\;} & H^0(K,C|_K) & \xrightarrow{\;\delta\;} & 0
\end{array}
\]
which we have written using H^0 instead of Γ so it is clear what the ambient space is in each case. From exactness there exists some β ∈ H^0(K, B|_K) such that (Γq)(β) = c|_K. Then since B is soft the map H^0(X, B) → H^0(K, B|_K) is surjective, so there is some b ∈ Γ(B) = H^0(X, B) such that b|_K = β. Then c′ = c − (Γq)(b) has δ(c′) = δ(c) and c′|_K = 0, so Supp(c′) is disjoint from the interior of K.

It is easy to see that if f is cohomologically expansive at x ∈ X for some base point free closed eigenbundle a ∈ H^1(X, A) then necessarily for any neighborhood U of x, f^k(U) ∩ Supp c ≠ ∅ for all large k for any c ∈ Γ(C) such that δ(c) = a. Hence f^k(U) cannot lie in the interior of K for any large k.

Convention 2. We let K be either R or C, although our central theorems only require K to be a complete field with an absolute value.

The following Theorem takes advantage of the fact that in an exact sequence the eigenvalues of nonadjacent terms of the sequence do not have to agree, and gives conditions under which one can uniquely “lift” fixed members of one term of the exact sequence to a fixed member of the preceding term. Interpreted as a statement in the context of sheaf cohomology, we will be able to use this Theorem to make dynamical conclusions. The theorem shows that each closed eigenbundle of the induced map f_A : H^1(X, A) → H^1(X, A) with sufficiently large eigenvalue has a unique associated invariant divisor c ∈ Γ(C).

Definition 11. Given any finite dimensional K vector space V along with a linear map g : V → V and any positive real number r, we let the r chronically expanding subspace of V be the span of the subspaces associated² to eigenvalues of absolute value greater than r. We refer to the 1 chronically expanding subspace simply as the chronically expanding subspace.

Theorem 12 (Unique Invariant Subspace Theorem).
We will assume the following:
• f : X → X is a continuous self map of a topological space X.
• We are given an f self cohomomorphism of a short exact sequence of sheaves on X, 0 → A → B → C → 0, with maps p : A → B and q : B → C.
• Γ(B) is a Banach space over K, and there exist some α, d ∈ R_{>0} such that ‖(Γf_B)^k(B)‖ ≤ d · α^k ‖B‖ for k ∈ N, B ∈ Γ(B).
• Γ(C) is a topological vector space over K.
• If a sequence C_i ∈ Γ(C) of divisors converges to another divisor C_∞ then the support of C_∞ is contained in the closure of the union of the supports of the C_i.
• The maps Γf_C and Γq are continuous.
• We are given a finite dimensional H^1 f_A invariant subspace W of the α chronically expanding subspace of H^1(X, A). We also require W to consist only of closed bundles.

² Meaning that for each eigenvalue λ we include not just the λ eigenspace, but also every v ∈ V such that (g − λ · id_V)^n(v) = 0 for some positive integer n.

Then given any K linear map s : W → Γ(C) such that δs = id_W there is a K linear map τ : W → Γ(B) satisfying
\[ \kappa := \lim_{k\to\infty} (\Gamma f_C)^k\, s\, g^{-k} = s + (\Gamma q)\tau \tag{3} \]
where g : W → W is the restriction of H^1 f_A to W and g^{-1} is its inverse. Under iterated pullback the rescaled pullbacks of any divisor C ∈ Γ(C) of a bundle w ∈ W converge toward the invariant plane of divisors κ(W) ⊂ Γ(C). The map κ : W → Γ(C) is the unique map making the associated diagram (two rows Γ(C) → H^1(X, A), with W mapping to each row; the diagram is not recoverable from the source) commute. Finally, for any base point free eigenbundle v ∈ W the support of the corresponding invariant divisor κ(v) ∈ Γ(C) is contained in the set of points on which f is cohomologically expansive for v.

Proof. We note that δ((Γf_C) s g^{-1} − s) = 0 and so there is a map σ : W → Γ(B) such that (Γq)σ = (Γf_C) s g^{-1} − s. Define Φ : Hom(W, Γ(B)) → Hom(W, Γ(B)) by Φ(σ) = (Γf_B) σ g^{-1}. We will show that the sequence of maps Φ^k is exponentially contracting on Hom(W, Γ(B)). Fix a norm ‖ · ‖ on W. The assumption that W lies in the α chronically expanding subspace of H^1(X, A) implies that there exist some β > α and some c > 0 such that ‖g^{-k}(w)‖ ≤ c β^{-k} ‖w‖ for k ∈ N, w ∈ W.
This, with the assumption on the rate of expansion of Γf_B, easily implies that
\[ \|\Phi^k(\varphi)(w)\| = \|(\Gamma f_B)^k(\varphi(g^{-k}(w)))\| \le cd\left(\frac{\alpha}{\beta}\right)^{k} \|\varphi\| \cdot \|w\| . \]
Thus Φ^k is an operator of norm no more than cd(α/β)^k, where α < β. Letting τ_k = σ + Φ(σ) + Φ^2(σ) + · · · + Φ^k(σ), the sequence τ_k converges as k → ∞ to some map τ. It is easily confirmed that (Γq)τ_k = (Γf_C)^{k+1} s g^{-(k+1)} − s. Equation (3) then follows by continuity of Γq. The conclusions about the map κ are easy consequences of its definition.

For the final conclusion, note that if we just let W be the span of v then we have already shown that if C is the unique invariant member of Γ(C) associated to v, then for any divisor c′ ∈ Γ(C) satisfying δ(c′) = v, letting λ be the eigenvalue of v, we can write c′ = κ(v) + (Γq)(b) for some b ∈ Γ(B), and equation (3) becomes
\[ (\Gamma f_C)^k c'/\lambda^k = \kappa(v) + (\Gamma q)(\Gamma f_B)^k b/\lambda^k \]
where the final term goes to zero as k → ∞ (by our assumptions on the growth rates of g^{-1} and Γf_B). Hence (Γf_C)^k(c′)/λ^k converges to c = κ(v). If U is any open subset of X and if the support of c′ is disjoint from f^n(U) for arbitrarily large values of n, then the support of (Γf_C)^n(c′) must be disjoint from U for arbitrarily large values of n. Since, rescaled, these converge to c, the set U must lie outside the support of c.

Remark. While we have not formally required X to be compact, the requirement that Γ(B) be a Banach space makes this the main case in which Theorem 12 is apt to have interesting applications.

Theorem 12 shows that among all members of Γ(C) representing a cohomology class in W there is a unique invariant linear subspace which can be identified with W, and all other such members of Γ(C) are contracted to this invariant copy of W in Γ(C) under (rescaled) pullback.

Corollary 13. Assume that the hypotheses of Theorem 12 are satisfied, and that g : W → W is dominated by a single simple real eigenvalue r > 0 with eigenvector v. Let C ≡ κ(v) be the unique invariant divisor of v.
Then given a divisor C′ ∈ Γ(C) of any w ∈ W the successive rescaled pullbacks f_C^k(C′)/r^k converge to a multiple (possibly zero) of C.

Proof. This is a direct consequence of equation (3).

The assumption that g : W → W is dominated by a single simple real eigenvalue is meant to handle the most typical situation, and is not an essential restriction.

Remark. Given that for a fixed f : X → X the category of SC sheaves A on X endowed with an f self cohomomorphism F is an abelian category with enough injectives, the functor Fixed Γ which gives the fixed global sections of A under F will be left exact and its right derived functors should be of dynamical interest. In the case where A is a sheaf of functions and f is invertible this is just group cohomology with the group Z acting on Γ(A) and has been an object of study for some time (see, e.g. [Kat03]). We anticipate studying the case of more general sheaves A and the right derived functors of the composition Fixed Γ in a future paper, including the case of endomorphisms.

3.1 Regularity and Positivity

Typically our regularity results for the members of the invariant plane κ(W) will be most easily described in terms of B rather than C. We therefore make the following definition.

Definition 14. Given a subsheaf B′ ⊂ B we will say a divisor C ∈ Γ(C) has local B′ potentials if C ∈ Γ(q(B′)). This is equivalent to requiring that about each point x ∈ X there is an open neighborhood U and some B′ ∈ B′(U) such that q(B′) = C|_U.

The proof of Theorem 12 implicitly provides a method to prove regularity results for members of the invariant plane κ(W). We make this explicit as a corollary (of the proof).

Corollary 15. Assume we are given f : X → X and a short exact sequence of sheaves A → B → C, with maps p : A → B and q : B → C, satisfying the hypotheses of Theorem 12. Assume that B′ is a subsheaf of B and that Γf_B(B′) ⊂ B′. Let C′ be the image of B′ under q : B → C. Let A′ ⊂ A be the kernel of q : B′ → C′.
Assume that the canonical map H^1(X, A′) → H^1(X, A) is injective. Assume that there are basis members w_1, . . . , w_k of W with divisors each of which has local potentials in B′. Let r be the inverse of the absolute value of the largest eigenvalue of g^{-1} (so for all j ≥ 0, g^{-j} is an operator of norm no more than c r^{-j} for some c > 0). Finally assume that for any sequence of numbers a_j, j = 0, 1, 2, . . . such that |a_j| is no more than a constant times r^{-j} as j → ∞, and for any B ∈ Γ(B′), the exponentially decaying series
\[ a_0 B + a_1 (\Gamma f_B)(B) + a_2 (\Gamma f_B)^2(B) + \cdots \tag{4} \]
converges in the Banach space structure on Γ(B) to a member of Γ(B′). Then the map κ : W → Γ(C) lands in Γ(C′).

Proof. Since W lies in the α chronically expanding subspace of H^1(X, A), necessarily α/r < 1. Thus the terms of (4) have exponentially decreasing norms and the series is exponentially decaying. By the assumption of a divisor in Γ(C′) for each member w_j of a basis, the map s : W → Γ(C) in Theorem 12 can be assumed to land in Γ(C′). Then (Γf_C) s g^{-1} − s lands in Γ(C′) and satisfies δ((Γf_C) s g^{-1} − s) = 0. Since H^1(X, A′) → H^1(X, A) injects, it easily follows that for each w_j one can choose σ(w_j) to be a member B_j of Γ(B′). Using the basis w_1, . . . , w_k to write g^{-1} as a matrix A, and letting a_{ij,ℓ} be the ij entry of A^ℓ (so for each ij, a_{ij,ℓ} is bounded by a constant times r^{-ℓ}), we see that
\[
\tau_\ell(w_j) = B_j + (\Gamma f_B)(a_{1j,1}B_1 + \cdots + a_{kj,1}B_k) + (\Gamma f_B)^2(a_{1j,2}B_1 + \cdots + a_{kj,2}B_k) + \cdots + (\Gamma f_B)^\ell(a_{1j,\ell}B_1 + \cdots + a_{kj,\ell}B_k).
\]
Gathering all the B_1 terms, B_2 terms, etc. from the right hand side, we see that τ = lim_{ℓ→∞} τ_ℓ is a member of Γ(B′) and thus that κ lands in Γ(C′) by equation (3).

The following trivial observation will suffice for our needed positivity conclusions.

Observation. Assume we have an f self cohomomorphism of a short exact sequence of sheaves A → B → C, with maps p : A → B and q : B → C, satisfying the hypotheses of Theorem 12, and also a subsheaf C′ ⊂ C such that
1. C′ is closed under multiplication by R_{>0}.
Note that C′ is not necessarily a sheaf of K modules, or even of groups.
2. f_C(C′) ⊂ C′
3. Γ(C′) is closed in Γ(C).

Then for any closed eigenbundle v ∈ H^1(X, A) with eigenvalue in R_{>0} and at least one divisor C′ ∈ Γ(C′), the unique invariant divisor C ∈ Γ(C) of v also lies in Γ(C′).

Proof. The proof is trivial since C = lim_{k→∞} (Γf_C)^k(C′)/λ^k where λ ∈ R_{>0} is the eigenvalue of v.

4 Subsheaf Cohomology

In applications of Theorem 12 it is common that there is a well understood exact sequence of sheaves
\[ S^0 \xrightarrow{\;d_0\;} S^1 \xrightarrow{\;d_1\;} S^2 \xrightarrow{\;d_2\;} \cdots \tag{5} \]
and that B is a subsheaf of S^k for some k, A is the kernel of d_k : B → S^{k+1}, and C is the image of B in S^{k+1}. Moreover, in these cases the self cohomomorphism f on A → B → C is induced by an f self cohomomorphism of the sequence (5). In order to apply Theorem 12 to these cases we need to understand the R module H^1(X, A) and its induced self map.

There does not seem to be a computationally useful way to extract an injective resolution of A using subsheaves of the sequence S^0 → S^1 → · · ·, even if this last sequence is acyclic. Consider for example the case where for each n, S^n is the sheaf of currents of degree n and B ⊂ S^k is a subsheaf of mildly regular currents. It is not clear one could make the regularization method of [dR84] work to compare H^1(X, A) to deRham cohomology groups, because his chain homotopy operator A does not restrict well to B since dA does not preserve regularity. We use a standard sheaf cohomological trick, which we include here as a proposition which we will need and which we expect to be commonly used in conjunction with Theorem 12 because of the requirement that Γ(B) be a Banach space.

Theorem 16 (Subsheaf Cohomology). Assume we are given an exact sequence of sheaves
\[ S^0 \xrightarrow{\;d_0\;} S^1 \xrightarrow{\;d_1\;} S^2 \xrightarrow{\;d_2\;} \cdots \]
and that B is a subsheaf of S^k for some k ≥ 1. Let A = ker d_k|_B, and let B′ be the preimage of B under d_{k−1}.
Further assume that for each j ≥ 1 we have H^j(X, B′) = 0, H^j(X, B) = 0, and for any m satisfying 0 ≤ m ≤ k − 1 we have H^j(X, S^m) = 0 for j ≥ 1. Then for each n ≥ 1 there is a canonical isomorphism H^n(X, A) ≅ H^{n+k}(X, ker d_0).

Proof. While this result is essential for us, its proof is a standard cohomological trick. First one notes that ker d_{k−1}|_{B′} = ker d_{k−1} by the definition of B′. One has the short exact sequences of sheaves:
\[ \ker d_{k-1} \to B' \to \bigl(d_{k-1}(B') = A\bigr), \qquad \ker d_j \to S^j \to \ker d_{j+1}, \quad j = 0, \ldots, k-2. \]
Considering the long exact sequences for these shows that the induced maps H^n(X, A) → H^{n+1}(X, ker d_{k−1}) and H^{n+j}(X, ker d_{k−j}) → H^{n+j+1}(X, ker d_{k−j−1}) are isomorphisms for j = 1, . . . , k − 1. Composing these canonical isomorphisms gives a canonical isomorphism H^n(X, A) → H^{n+k}(X, ker d_0).

Remark. We take it as clear from the functoriality of the δ map in the long exact sequence that, given an f self cohomomorphism of S^0 → S^1 → S^2 → · · · which maps B to itself, the induced map of H^1(X, A) is identified with the induced map of H^{k+1}(X, ker d_0) via the above isomorphism.

We will need one more tool to be able to make effective use of Theorem 16 for calculating sheaf cohomology of subsheaves of sheaves of currents.

Definition 17. By an interval flow h on a bounded open interval I ⊂ R we will mean the flow obtained by integrating a vector field of the form σ(t) ∂/∂t where σ is positive exactly on I and zero elsewhere. We use h(x, t) to denote the location of x ∈ R after following the flow for time t.

Definition 18. By an n-box in R^n we will mean an open subset which is a product of n bounded open intervals I_1, . . . , I_n. By an n-box in an n dimensional manifold we will mean an n-box which is compactly supported in some coordinate patch. By an n-subbox of an n-box U = I_1 × · · · × I_n we will mean an n-box of the form I′_1 × · · · × I′_n where I′_k is a subinterval of I_k for each k ∈ {1, . . . , n}.

Definition 19.
By an n-box flow we will mean the R^n action h on R^n which is the product of n interval flows h_1(t_1), . . . , h_n(t_n) on R^n. That is, h(x, t) = (h_1(x_1, t_1), . . . , h_n(x_n, t_n)) where x = (x_1, . . . , x_n), t = (t_1, . . . , t_n) and h_1, . . . , h_n are interval flows on I_1, . . . , I_n respectively. We refer to the n-box I_1 × · · · × I_n as the open support of the n-box flow. We will often use h_t to denote the diffeomorphism h(·, t) : R^n → R^n.

Definition 20. Let h be an n-box flow on an n-box B. Let ρ be a compactly supported smooth volume form on R^n. With this data we define an operator S_{h,ρ} on smooth k forms on any n-box U containing B by
\[ S_{h,\rho}(\varphi) = \int_{t \in \mathbb{R}^n} h_t^*(\varphi)\,\rho(t). \tag{6} \]
We say S_{h,ρ} defines a box smear on U, or smears U. We will omit the subscript from S_{h,ρ} when the meaning is clear from context.

It is clear that S(φ) is compactly supported in U if φ is. It is clear from the definition of S that if ψ is an n − k form on U then
\[ \int S_{h,\rho}(\varphi) \wedge \psi = \int \varphi \wedge S_{-h,\rho}(\psi) \]
where −h is the family h_t with the parameter negated. From this motivation we define a smear of a current.

Definition 21. Given h, ρ defining a smear on an n-box U we define the smear S_{h,ρ} on currents on U via ⟨S_{h,ρ}(C), φ⟩ ≡ ⟨C, S_{−h,ρ}(φ)⟩.

Lemma 22. Given h, ρ defining a smear S on an n-box U, we have dS(C) = S(dC) for currents C on any open subset of U containing the open support of the smear. Also, on any open subset V of the open support of the smear, S(C) is a smooth form.

Proof. It is clear that dS(φ) = S(dφ) for forms φ, and consequently for currents via the definition. Because on the open support of the smear a smear is just convolution with a smooth function, we see that if V is an open subset of the open support of a smear S on U then for any current C on U, S(C) is a smooth form on V.

Proposition 23. Let B be a sheaf of degree k currents. Assume that B contains the sheaf of smooth k forms on X, and that B(U) is closed under smears on any n-box U ⊂ X.
Let B′ be the preimage under d of B in the sheaf of degree k − 1 currents. Then B′ is soft, and therefore Γ-acyclic.

Proof. To show that B′ is soft it is sufficient to show that B′ is locally soft ([Bre97] page 69). Given an n-box U in X we therefore only need to show that if K is a closed subset of X in U and if W is an open neighborhood of K, then given any member B′_0 of B′(W) there is an open neighborhood W_0 ⊂ W of K and a member B′ ∈ B′(U) such that B′|_{W_0} = B′_0|_{W_0}.

Choose any pair of open sets V_1, V_2 such that K ⋐ V_1 ⋐ V_2 ⋐ W. Then \overline{V_2} \ V_1 is compact and can therefore be covered by finitely many (open) n-subboxes Y_1, . . . , Y_N of U. Moreover these subboxes can all be chosen to be disjoint from K and to lie inside W. Letting S_1, . . . , S_N be smears on U with open support Y_1, . . . , Y_N respectively, let B = S_1(S_2(· · · (S_N(B′_0)) · · · )). Then on each Y_j, B is given by a smooth k form. Also, B agrees with B′_0 away from the open supports Y_1, . . . , Y_N. Finally, we choose a smooth function ψ : U → [0, 1] which is one on a neighborhood of V_1 and zero on a neighborhood of U \ V_2. Then the current B′ ≡ ψB extends (by zero) to a current on all of U. On each Y_j, B′|_{Y_j} is a smooth function times a smooth form. Thus d(B′|_{Y_j}) is a smooth form and lies in B(Y_j). The boxes Y_j cover \overline{V_2} \ V_1. Outside V_2, B′ is identically zero. We know that dB ∈ B(W) by Lemma 22. We also know that ψ ≡ 1 on an open neighborhood W_1 of V_1. Thus d(B′|_{W_1}) = d(B|_{W_1}) ∈ B(W_1). We thus conclude that B′ ∈ B′(U) since its restriction to each Y_j, to W_1 and to U \ V_2 is a section of B′.

Figure 1: A current comprised of parallel submanifolds smeared and cropped.

Letting W_0 = V_2 \ (Y_1 ∪ Y_2 ∪ · · · ∪ Y_N), W_0 is an open neighborhood of K and W_0 ⊂ W_1, so B′|_{W_0} = B′_0|_{W_0} since W_0 is disjoint from the open support of each of the smears S_1, . . . , S_N. This completes the proof that B′ is soft.

The following gives a broad generalization of the equivalence of the cohomology of currents with the deRham cohomology groups. To the author’s knowledge, this result is new.
Corollary 24. Let B be a sheaf of degree k currents. Assume that B contains the sheaf of smooth k forms on X, and that B(U) is closed under smears on any n-box U ⊂ X. Letting A be the subsheaf of d closed members of B, then H^m(X, A) = H^{m+k}(X, K), where K is R or C depending on whether or not we allow complex valued currents and forms.

Proof. This is an immediate consequence of Proposition 23 and Theorem 16.

5 Invariant Currents

Notation 1. If G is some sheaf of functions on a smooth orientable manifold X we will use F^k(G) to denote the sheaf of k forms on X with coefficients in G. We will let F^k_c(G) be the subsheaf of closed (in the sense of currents) members of F^k(G).

It will be convenient to use either the degree or the dimension of a current depending on the context (just as dimension and codimension are both useful for discussing manifolds), so we will not stick to just one of these terms. We will let C^k denote the sheaf of degree k currents, with the index written above as is typical for cohomology since d increases the degree. We will similarly write C_k for the sheaf of dimension k currents, with the index written below since d decreases dimension, as is common for homology.

We use the following convention to realize a form α as a current so that if α is C^1 then dα is the same whether computed as a current or a form.

Definition 25. Given a k form α with L^1 coefficients on an n manifold X we realize α as a degree k current via
\[ \beta \mapsto (-1)^{(\ldots)} \int_X \alpha \wedge \beta . \]

Definition 26. Given a (possibly complex) nonzero deRham cohomology class c ∈ H^k_{deRham}(X) with f^*(c) = α · c for some scalar α ∈ C, we will refer to a current C in the cohomology class c as an eigencurrent for f if f^*(C) = αC.

Currents naturally push forward, rather than pull back. Because we are considering maps which are not necessarily invertible we need to address how this pullback is performed.
If f has critical points it is impossible to define a continuous pullback operation f^* on all currents in a way that agrees with expected cases. For instance, consider f(x) = x^2 and let C_a be the dimension one current on R with C_a(h(x)dx) = h(a), i.e. C_a is a unit mass vector. Then the pullback f^*(C_a) should be the sum of weighted unit masses at the two preimages of this vector (just like the pullback of a point mass is a sum of point masses each weighted by multiplicity), that is, f^*(C_a) = C_{√a} − C_{−√a}. However, these pullbacks do not converge to a current as a → 0, so f^*(C_0) is not defined. Since we want f^* to be continuous, we are forced to work with currents that have some extremely mild regularity. We address this in the next section.

5.1 Nimble Forms and Lenient Currents

Finding a good set of currents to use to study smooth finite self maps (not necessarily invertible) of compact manifolds turns out to be rather delicate. Our solution is to first expand our class of forms to include pushforwards (in the sense of currents) of forms through an appropriate class of smooth maps. Then we restrict our attention to currents which act on this extended class of forms. This solution has the very nice property that it can potentially be adapted directly to study the dynamics of various other categories of smooth maps (by simply changing which forms are considered nimble, according to the class of maps used).

It will be convenient to first define the natural pushforward operator on forms:

Definition 27. Given a compact orientable manifold X we let S_X be the category of smooth maps f : X → X of nonzero degree and having the property that the critical set has measure zero. We use critical set here to mean the points at which Df is not invertible.

It follows from our definition that the image of any set of positive measure under some f ∈ S_X has positive measure.

Definition 28.
Given a compact orientable manifold X we define N^k to be those currents ϕ which are a finite sum of currents of the form p_*(σ) where p : X → X is a map in S_X and σ is a form of degree k. The pushforward p_*(σ) is computed in the sense of currents.

We will later show that nimble forms are also, in fact, bona fide forms.

Definition 29. We topologize N^k by saying ϕ_j → ϕ in N^k if for sufficiently large j there are maps f_1, . . . , f_k and k forms σ_{1j}, . . . , σ_{kj} as well as forms σ_1, . . . , σ_k such that Σ_i f_{i*}(σ_{ij}) = ϕ_j and Σ_i f_{i*}(σ_i) = ϕ (where pushforwards are taken in the sense of currents) and for each i ∈ {1, . . . , k}, the forms σ_{ij} converge to σ_i in the strong sense (i.e. all derivatives converge uniformly).

Lemma 30. Given a compact orientable manifold Y, N^k(Y) is a topological vector space.

Proof. This follows easily from our definition of the topology.

We now define the corresponding space of currents.

Definition 31. We define the dimension k lenient currents L_k(Y) to be the topological dual of N^k(Y). Every member of L_k(Y) is a dimension k current, but with the added structure of its action on all nimble k forms. We give L_k the weak topology, i.e. C_i → C in L_k iff ⟨C_i, ϕ⟩ → ⟨C, ϕ⟩ for every ϕ ∈ N^k. We write L^k for the lenient currents of degree k. We define operations of wedge products with smooth forms as is usual for currents.

It is clear that the lenient dimension k currents give a sheaf on X. The following properties of nimble forms are also immediately clear.

Lemma 32. Let f : X → X be a member of S_X. The pushforward (as a current) of a nimble k form by f is again a nimble form. Moreover f_* : N^k(X) → N^k(X) is continuous (in the topology of nimble forms). Also the exterior derivative of a nimble form (as a current) is a nimble form and d : N^k(X) → N^{k+1}(X) is continuous.

The basic necessary facts about pulling back lenient currents are then immediate. We state them here:

Lemma 33.
Given f : X → X a member of S_X, the induced map f^* on the sheaf of lenient degree k currents is an f cohomomorphism of sheaves. Both f^* : L^k(X) → L^k(X) and d : L^k(X) → L^{k+1}(X) are continuous. Lastly, f^*d = df^* : L^k(X) → L^{k+1}(X).

Proposition 34. Assume that f : X → X is a member of S_X. Let R be the regular set of f. By Sard’s theorem R has full measure. Since the critical set is compact, R is an open subset of X. Since the preimage of a measure zero set has measure zero for S_X maps, f^{-1}(R) is also a full measure open set in X. There is a well defined operation f_⋆ which maps k forms on f^{-1}(R) to k forms on R. Given a k form β on X, f_⋆(β) is defined on any open subset V ⊂ R such that each component U_1, . . . , U_m of f^{-1}(V) maps diffeomorphically onto V by the formula
\[ f_\star(\beta) = \frac{1}{\deg f} \sum_{i=1}^{m} \bigl(f|_{U_i}^{-1}\bigr)^{*}(\beta) \cdot \sigma_i \tag{7} \]
where σ_i ∈ {±1} is the oriented degree of f : U_i → V. The pushforward f_⋆ satisfies:
• f_⋆d = df_⋆ (keeping in mind that f_⋆ returns a current on R)
• f_⋆(1) = 1
• f_⋆(f^*(β) ∧ α) = β ∧ f_⋆(α)
• (f_⋆)^n = (f^n)_⋆
• The formula
\[ \int f^*(\beta) \wedge \alpha = \int \beta \wedge f_\star(\alpha) \tag{8} \]
holds for any k form β with L^∞_loc coefficients on X and any smooth n − k form α on X. This justifies using f_⋆ to pull back currents. (Part of the conclusion is that both sides are integrable.)

Proof. Each statement is a consequence of formula (7) except the integrability conclusion for equation (8). Local charts can be given which are bounded subsets of R^n and for which Df remains uniformly bounded (over each of the charts), and thus f^*(β) will be a form with L^∞_loc coefficients in these charts. Thus the left hand side of (8) is the integral of a bounded function over a finite union of bounded charts and is therefore absolutely integrable. Since f_⋆(f^*(β) ∧ α) = β ∧ f_⋆(α), it is sufficient to show that if γ is an n form with L^∞_loc coefficients then
\[ \int_{f^{-1}(R)} \gamma = \int_{R} f_\star(\gamma). \tag{9} \]
Typically f_⋆(γ) is unbounded, so we need to show that the right hand side of (9) is integrable.
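The following worked example is not part of the original argument; it is a minimal sketch of how f_⋆(γ) can be unbounded and still integrable. It assumes the local model f(x) = x^3 on the interval [−1, 1] with γ = dx. An interval is not a compact manifold without boundary, so this only illustrates the local behavior near a critical point. The map has degree one, so the normalization in (7) plays no role here.

```latex
% Hypothetical local model (an assumption for illustration): f(x) = x^3 on [-1,1], gamma = dx.
% The only critical point is x = 0; every y \neq 0 is a regular value with the
% single preimage x = y^{1/3} (real cube root), and sigma_1 = +1.
\[
  f_\star(dx) \;=\; d\bigl(y^{1/3}\bigr) \;=\; \tfrac{1}{3}\,|y|^{-2/3}\,dy ,
  \qquad y \neq 0 .
\]
% The coefficient blows up as y -> 0, yet it is absolutely integrable and its
% integral matches the integral of gamma over the preimage:
\[
  \int_{-1}^{1} \tfrac{1}{3}\,|y|^{-2/3}\,dy
  \;=\; 2\int_{0}^{1} \tfrac{1}{3}\,y^{-2/3}\,dy
  \;=\; 2\,\bigl[\,y^{1/3}\,\bigr]_{0}^{1}
  \;=\; 2
  \;=\; \int_{-1}^{1} dx .
\]
```

In this model the regular set is R = [−1, 1] \ {0}, and the singularity of f_⋆(dx) sits exactly at the critical value y = 0, which has measure zero.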
About any point x ∈ R we can find an open V such that each of the preimages U_1, . . . , U_k of V is mapped diffeomorphically onto V. Since X is orientable and n dimensional there is a well defined notion of the absolute value of an n form. Then

$$\int_V |f_\star(\gamma)| \le \frac{1}{\deg f}\sum_{i=1}^{k} \int_V \big|\big((f|_{U_i})^{-1}\big)^*(\gamma)\big| = \frac{1}{\deg f}\sum_{i=1}^{k} \int_{U_i} |\gamma| = \frac{1}{\deg f}\int_{f^{-1}(V)} |\gamma|.$$

Now R is covered by countably many such sets V and listing them as V_0, V_1, V_2, . . . , we can let V′_0 = V_0, V′_1 = V_1 \ V_0, V′_2 = V_2 \ (V_0 ∪ V_1), . . . . Then R is the union of the countable collection of disjoint measurable sets V′_j and

$$\int_R |f_\star(\gamma)| = \sum_j \int_{V'_j} |f_\star(\gamma)| \le \frac{1}{\deg f}\sum_j \int_{f^{-1}(V'_j)} |\gamma| = \frac{1}{\deg f}\int_{f^{-1}(R)} |\gamma|.$$

Since $\int_{f^{-1}(R)} |\gamma|$ is finite then f_⋆(γ) is an L^1 form. Using precisely the same argument but with the absolute values removed and the inequalities replaced with equalities then shows

$$\int_R f_\star(\gamma) = \frac{1}{\deg f}\int_{f^{-1}(R)} \gamma.$$

Since R and f^{−1}(R) are open and full measure then f_⋆ is an operator which takes in forms on X and returns forms defined almost everywhere on X.

We now show that nimble forms are bona fide forms.

Lemma 35. If g : X → X is a map in S_X and σ is a smooth k form on X then the current g_∗(σ) is the current of integration against the form g_⋆(σ).

Proof. If ϕ is a smooth n − k form then by definition ⟨g_∗(σ), ϕ⟩ = ⟨σ, g^∗(ϕ)⟩ = (−1)( ∫ σ ∧ g^∗(ϕ) = (−1)( ∫ g_⋆(σ) ∧ ϕ = ⟨g_⋆(σ), ϕ⟩ by formula (8) of Proposition 34.

As described in [Fed69], an inner product on a vector space V can be viewed as an isomorphism ℓ : V → V^∗ satisfying certain properties. The inverse of ℓ gives the induced inner product on V^∗. The fact that ⟨v, w⟩ ≤ ‖v‖ · ‖w‖ with equality iff v and w are scalar multiples implies that the inner product norm on V^∗ is the same as the operator norm of V^∗ acting on V. The induced map ∧^k ℓ : ∧^k V → ∧^k V^∗ gives an inner product on ∧^k V. We call this the canonical inner product on ∧^k V induced by the inner product on V. Hence, given a Riemannian metric on X, there are canonical smoothly varying inner products on ∧^k T_xX and ∧^k T^∗_xX for each x ∈ X. At any point x ∈ X we define ‖∧^k D_x f‖ to be the operator norm of the linear function ∧^k D_x f : ∧^k T_xX → ∧^k T_{f(x)}X.
We define ‖∧^k Df‖ to be the L^∞_loc norm of the map x ↦ ‖∧^k D_x f‖. Also, given a k form ϕ we define the comass ‖ϕ‖_{L^∞_loc} of ϕ to be the L^∞_loc norm of the function x ↦ ‖ϕ_x‖. It is clear that the k forms with the comass norm form a Banach space.

We now show that the k forms with L^∞_loc coefficients are naturally lenient currents. We start by defining the action on nimble forms.

Definition 36. Given an n − k form C with L^∞_loc coefficients we define ⟨C, p_∗(σ)⟩ = (−1)( n−k+1 ∫ C ∧ p_⋆(σ).

Lemma 37. The space F^{n−k}(L^∞_loc) of n − k forms with L^∞_loc coefficients under the comass norm includes continuously into L_k(X), where the action of C ∈ F^{n−k}(L^∞_loc) on some $\varphi = \sum_i f_{i*}(\sigma_i)$ ∈ N^k(X), with each f_i ∈ S_X and each σ_i ∈ F^k(C^∞), is given by $\langle C, \varphi\rangle \equiv \sum_i \int f_i^*(C) \wedge \sigma_i$.

Proof. The assumption that X is compact means that any two Riemannian metrics on X are comparable. Choose one so the notion of the comass norm makes sense. The result is then a straightforward consequence of equation (8), Lemma 35, and our definitions.

Remark. It follows that a current with local F^k(L^∞_loc) potentials is also a lenient current.

Remark. Given a member C of F^k(L^∞_loc) then f^∗(C) is the same whether done as a lenient current or as a form. This, along with the fact that df^∗ = f^∗d, justifies the ad hoc pullback of closed positive (1, 1) currents used so successfully in holomorphic dynamics. Similarly dC gives the same result whether calculated as a lenient current or a form if C ∈ F^k(C^1).

5.2 Hölder Lemmas

We will want to apply Corollary 15 to show that each eigencurrent we construct has local d potentials (or dd^c potentials in the holomorphic case) which are forms with Hölder continuous coefficients. In order to do this we will need a few facts which we include here in order to avoid having to include regularization results as afterthoughts to our main theorems.

Observation. Let H_α be the functions with coefficients that are Hölder of exponent at least equal to some fixed α > 0.
Since diffeomorphisms preserve Hölder exponents and averages of Hölder functions are Hölder then we take it as clear that Corollary 24 applies to show that H^1(X, 𝒜′) = H^1(X, 𝒜) where 𝒜′ is the closed members of F^k(H_α) and 𝒜 is the closed degree k currents.

Lemma 38. Let X be a compact manifold (real or complex) with a Riemannian metric and of real dimension n. Let f : X → X be a smooth map. Then local coordinate charts U_i can be chosen on X (each representing a convex open subset of R^n) so that there is a positive constant 1 < M so that for any k form ϕ, there exist constants c, C > 0 such that writing each f^{k∗}(ϕ) in any of the charts U_i as $f^{k*}(\varphi) = \sum_i a_{ki}\,dx_i$ then each function a_{ki} satisfies

$$|a_{ki}| \le c \cdot \|f^{k*}(\varphi)\|_{\mathrm{comass}} \qquad (10)$$

and for each j ∈ {1, . . . , n},

$$\left|\frac{\partial a_{ki}}{\partial x_j}\right| \le C \cdot M^k.$$

Proof. Equation (10) is a basic fact. The rest is a straightforward consequence of realizing a self map of a manifold as being made up of a bunch of maps between different coordinate patches in R^n. That is, one chooses an open cover of patches U_i of X. Each patch is realized in R^n as a round ball. Thinking of each patch as lying in R^n, we can find explicit maps between open subsets of R^n of the form p_{ij} : U_i ∩ f^{−1}(U_j) → U_j. By shrinking each open ball U_i a small amount the resulting patches still cover X but the derivatives of the maps p_{ij} are all now bounded (since we are working on relatively compact subsets of the domains of the previous maps p_{ij}). Then given any x we can keep track of which patch f^k(x) is in at each time and can then realize the map f^k(x) as a composition p_{i_1 i_2} ◦ p_{i_2 i_3} ◦ · · · ◦ p_{i_{k−1} i_k}. Since each partial derivative of each p_{ij} is uniformly bounded then any partial derivative of the composition grows at most exponentially with k and we are done.

The following observation will also be useful:

Lemma 39.
If there are positive constants c, C, m, M with m < 1 < M such that a sequence of smooth functions h_k on an open convex set U ⊂ R^n satisfies $\|h_k\|_{\sup} < c \cdot m^k$ and $\left\|\frac{\partial h_k}{\partial x_j}\right\|_{\sup} < C \cdot M^k$ for all k ∈ {0, 1, 2, . . .} and all j, then h_1 + h_2 + h_3 + . . . converges to a bounded continuous function which is Hölder of any exponent $\alpha < \frac{\log(m)}{\log(m/M)}$.

Proof. The proof is elementary.

5.3 Eigencurrents for Cohomologically Expanding Smooth Maps

We will call a section V of ∧^k TX a k-vector field. We define ‖V‖_{L^∞_loc} to be the L^∞_loc norm of the function x ↦ ‖V_x‖. Whether Theorem 12 applies to a map will depend on the size of ‖∧^k Df‖. Replacing f with an iterate does not affect the needed estimate so we make the following definition.

Definition 40. We define Υ_k to be the limit supremum as j → ∞ of ‖∧^k D(f^j)‖^{1/j}.

It follows that Υ_1 ≥ e^λ for any Lyapunov exponent λ and that Υ_k ≤ Υ_1^k ([Fed69] page 33). We let ℬ be the sheaf F^{k−1}(L^∞_loc). The norm ‖ · ‖_∞ clearly makes Γ(ℬ) into a Banach space. Given a member B ∈ Γ(ℬ), since the operator norm on T_xX is equal to the norm already defined on T^∗_xX for each x ∈ X then ‖B‖_∞ is equal to the supremum of the L^∞_loc norm of the function x ↦ B(V_x) as V varies over all L^∞_loc k-vector fields of norm no more than one.

Theorem 41. Given f : X → X a map in S_X for the compact orientable manifold X, assume that c ∈ H^k_deRham(X) is a cohomology class (using either real or complex deRham cohomology) which is an eigenvector for f^∗ with eigenvalue β. Assume also that |β| > Υ_{k−1}. Then there exists a unique eigencurrent C with local F^{k−1}(L^∞_loc) potentials representing the class c. Moreover C has local F^{k−1}(H) potentials. Also, given any neighborhood U ⊂ X of any point in the support of C, then for every lenient current C′ with local F^{k−1}(L^∞_loc) potentials and which represents the cohomology class c then f^k(U) ∩ Supp C′ ≠ ∅ for all large k. Assume that the linear map f^∗ : H^k_deRham(X) → H^k_deRham(X) is dominated by a single simple real eigenvalue r.
Given C′ any current which has local F^{k−1}(L^∞_loc) potentials and which represents a cohomology class in the Υ_{k−1} chronically expanding subspace of H^k_deRham(X), then the successive rescaled pullbacks f^{k∗}(C′)/r^k of C′ converge to a multiple of C in the sense of lenient currents (and thus also in the sense of currents).

Proof. We let ℬ = F^{k−1}(L^∞_loc), and let 𝒜 and 𝒞 be the kernel and image respectively of d : ℬ → L^k. By Theorem 24, H^1(X, 𝒜) can be canonically identified with H^k(X, K). Since ℬ is Γ-acyclic then every member of H^1(X, 𝒜) is a closed bundle with respect to the short exact sequence 𝒜 → ℬ → 𝒞. From Lemma 33 there is an induced f cohomomorphism of the short exact sequence $\mathcal{A} \xrightarrow{\iota} \mathcal{B} \xrightarrow{d} \mathcal{C}$. Also Γ(𝒞) is a space of lenient currents by Lemma 33 and thus has a natural structure as a topological vector space. If a sequence B_i ∈ Γ(ℬ) converges to B ∈ Γ(ℬ) then ⟨dB_i, ϕ⟩ = ∫ B_i ∧ dϕ → ∫ B ∧ dϕ = ⟨dB, ϕ⟩ so the map d : Γ(ℬ) → Γ(𝒞) is continuous. The cohomomorphism Γf_ℬ is pullback f^∗ of differential forms.

Fixing any real α satisfying Υ_{k−1} < α < |β| it is clear from the definition of Υ_{k−1} that one can choose a real d > 0 such that ‖∧^{k−1} D(f^ℓ)‖ ≤ d · α^ℓ for all ℓ ∈ N. The ℓth pullback f^{ℓ∗}(B) of B ∈ Γ(ℬ) satisfies ‖f^{ℓ∗}(B)‖_∞ = sup_V ‖B(∧^{k−1} D(f^ℓ)(V))‖_∞ where the supremum is taken over all k-covector fields V with ‖V‖_∞ ≤ 1. However ∧^{k−1} D(f^ℓ)(V) is a k-covector field of norm no more than ‖∧^{k−1} D(f^ℓ)‖, so ‖f^{ℓ∗}(B)‖_∞ ≤ ‖B‖_∞ · ‖∧^{k−1} D(f^ℓ)‖_∞ ≤ d · α^ℓ ‖B‖_∞.

Given any W in the Υ_{k−1} chronically expanding subspace of H^k(X, K), we can alter our choice of α > Υ_{k−1} so that W also lies in the α chronically expanding subspace of H^k(X, K). We can therefore apply Theorem 12 to conclude that there is a (unique) map κ : W → Γ(𝒞) such that f^∗κ = κf^∗, where the first f^∗ is pullback of currents and the second is pullback on H^k(X, K). In fact κ(W) lies in the space of currents with locally Hölder potentials (meaning F^{k−1}(H) potentials) by applying Corollary 15 in conjunction with Observation 5.2, Lemma 38 and Lemma 39.
The second half of the Theorem is a consequence of equation (3).

Remark. Theorem 41 gives regular degree one eigencurrents for every eigenvalue of f^∗ : H^1(X, K) → H^1(X, K) of norm greater than one without requiring any constraints on the local behavior of f. The degree one eigencurrents seem to be, in some sense, more robust than currents of lower dimension, including invariant measures. Moreover, since codimension one closed submanifolds are closed currents with local F^0(L^∞_loc) potentials, successive rescaled preimages of such manifolds in the right cohomological class will converge to the eigencurrent.

Remark. The fact that eigencurrents constructed via Theorem 41 have local potentials which are forms does not imply their support has positive Lebesgue measure, as the classical example of a monotonic nonconstant function which is constant on a set of full measure shows.

Remark. The assumption that f^∗ : H^1_deRham(X) → H^1_deRham(X) is dominated by a single simple real eigenvalue r is not essential, but just meant to handle the simplest case. In fact the proof actually shows that if W lies in the Υ_{k−1} chronically expanding subspace of H^k(X, K) then every current in the invariant plane κ(W) ⊂ Γ(𝒞) of currents has local F^{k−1}(H) potentials and any current with cohomological class in W with local F^{k−1}(L^∞_loc) potentials is attracted to κ(W) under successive rescaled pullback.

Since measures are of particular interest in dynamics, we note that H^1(X, F^{n−1}(L^∞_loc)) = H^n(X, K) = K by Corollary 24 so there is a unique f^∗ eigenvalue and it is precisely the topological degree of f. We thus obtain:

Corollary 42. Given that Υ_{n−1} < deg f then there is a unique dimension zero eigencurrent C with F^{k−1}(L^∞_loc) potentials (and in fact it has F^{k−1}(H) potentials) and the successive rescaled preimages of any C′ with F^{k−1}(L^∞_loc) potentials converge to C.
If additionally there is no point x ∈ X about which f is locally an orientation reversing diffeomorphism then C (and every other member of κ(W)) is a positive distribution and is therefore a Radon measure.

Proof. Since f^∗ pulls back dimension zero currents (i.e. distributions) which are positive to distributions which are positive then by Corollary 3.1 the distribution C is positive. It is therefore a Radon measure (see e.g. [HL99] page 270).

Remark. In the case where f is orientation reversing on some parts of X (but not on all of X) some special remarks apply. If it happens that successive rescaled images of some point converge to a dimension zero eigencurrent then, since preimages of points are counted with multiplicity, when pulled back through a portion of X on which f reverses orientation the sign of a point is flipped. Thus in this case the eigencurrent may not describe so much the distribution of preimages as the relative density of preimages counted negatively as compared to those counted positively. The number of actual preimages of a point may grow exponentially faster than the degree of the map in such cases, so that dividing by the degree does not yield a measure in the limit unless some such "cancellation" takes place in the limit. One would expect that the corresponding eigencurrents have local potentials which are not of bounded variation in such a case.

5.4 Eigencurrents for Smooth Covering Maps

We will call a covering map which is locally a diffeomorphism a smooth covering map. We now consider the special case of smooth self covering maps f : X → X of a compact smooth orientable manifold X. We show that in this case we have a substantially broader collection of currents whose successive pullbacks converge to an eigencurrent, although we need different estimates for Theorem 12 to apply.

We will pull back currents by pushing forward forms with f_⋆. Since the regular set of f is all of X then f_⋆
is a well defined operator from smooth forms to smooth forms.

Definition 43. For a map satisfying the hypothesis of Proposition 34 we define the operation f^∗ from currents on X to currents on Y by ⟨f^∗(C), α⟩ ≡ ⟨C, f_⋆(α)⟩.

Clearly f^∗ preserves the dimension of a current. Let M_{k−1} be the sheaf for which M_{k−1}(U) is the Banach space of bounded linear operations on the topological vector space comprised of F^{k−1}(C^∞)(U) with the ‖ · ‖_∞ norm. Equivalently, M_{k−1} is the sheaf of dimension k − 1 currents of finite mass.

Choose a Riemannian metric on X. If f : X → X is a smooth cover then for each x ∈ X and each ℓ ∈ N, D_x(f^ℓ) : T_xX → T_{f^ℓ(x)}X is invertible. We let ν_k(x, ℓ) be the operator norm of the inverse of ∧^k D_x(f^ℓ) : ∧^k T_xX → ∧^k T_{f^ℓ(x)}X. We define ν_k(ℓ) = sup_{x∈X} ν_k(x, ℓ)^{1/ℓ}. We define ν_k = lim sup_{ℓ→∞} ν_k(ℓ). The iterated pushforward operation f^ℓ_⋆ : F^{k−1}(C^∞)(X) → F^{k−1}(C^∞)(X) satisfies ‖f^ℓ_⋆(ϕ)‖_∞ ≤ ν_k(ℓ) · ‖ϕ‖_∞ as is straightforward to verify. If f is invertible then ν_k is a bound on the growth of the kth wedge product of the derivative under f^{−1}. For non-invertible f, ν_k represents a bound on the growth of the kth wedge product of the derivative under any sequence of successive branches of f^{−1}.

Theorem 44. Let f : X → X be a smooth self covering map and let c ∈ H^k_deRham(X) be a cohomology class (using either real or complex deRham cohomology) which is an eigenvector for f^∗ with eigenvalue β. Assume also that |β| > ν_{k−1}. Then there exists a unique eigencurrent C with local M_{k−1} potentials representing the class c. Moreover C has local F^{k−1}(C^0) potentials. Consequently C is a current of order one. Also, given any neighborhood U ⊂ X of any point in the support of C, then for every lenient current C′ with local M_{k−1} potentials and which represents the cohomology class c then f^k(U) ∩ Supp C′ ≠ ∅ for all large k. Assume that the linear map f^∗ : H^k_deRham(X) → H^k_deRham(X) is dominated by a single simple real eigenvalue r.
Given C′ any current which has local M_{k−1} potentials and which represents a cohomology class in the ν_{k−1} chronically expanding subspace of H^k_deRham(X), then the successive rescaled pullbacks f^{k∗}(C′)/r^k of C′ converge to a multiple of C.

Proof. We let 𝒜 and 𝒞 be the kernel and image respectively of d : M_{k−1} → 𝒞^k. Since df_⋆ = f_⋆d then pullback of currents gives an f cohomomorphism of the short exact sequence of sheaves 𝒜 → M_{k−1} → 𝒞. Since ΓM_{k−1} is the continuous linear operators on a normed vector space then it is a Banach space. From the observations previous to the statement of Theorem 44 one concludes that for any α > ν_{k−1} there is a constant d > 0 such that ‖f^{ℓ∗}(B)‖ ≤ d · α^ℓ ‖B‖ for all ℓ ∈ N.

Since Γ(𝒞) is a space of currents it is naturally a topological vector space over K. The map f^∗ : Γ(𝒞) → Γ(𝒞) is continuous since if C_i → C in Γ(𝒞) then ⟨f^∗(C_i), ϕ⟩ = ⟨C_i, f_⋆(ϕ)⟩ → ⟨C, f_⋆(ϕ)⟩ = ⟨f^∗(C), ϕ⟩. If P_i → P in ΓM_{k−1} (using the Banach space structure) then ‖P_i − P‖ → 0 by assumption, so ‖P(dϕ) − P_i(dϕ)‖ ≤ ‖P − P_i‖ · ‖dϕ‖ → 0. Hence ⟨dP_i, ϕ⟩ = P_i(dϕ) → P(dϕ) = ⟨dP, ϕ⟩ and so we conclude that the map d : ΓM_{k−1} → Γ(𝒞) is continuous.

Given any W in the ν_{k−1} chronically expanding subspace of H^k(X, K), we can alter our choice of α > ν_{k−1} so that W also lies in the α chronically expanding subspace of H^k(X, K). We can therefore apply Theorem 12 to conclude that there is a (unique) map κ : W → Γ(𝒞) such that f^∗κ = κf^∗, where the first f^∗ is pullback of currents and the second is pullback on H^k(X, K). In fact κ(W) lies in the currents with locally continuous potentials by applying Corollary 15 in conjunction with Observation 5.2, Lemma 38 and Lemma 39. The second half of the Theorem is a consequence of equation (3).

Proposition 45. Let Y be an oriented codimension k submanifold of X.
If the cohomological class of Y (as a current) lies in the ν_{k−1} chronically expanding subspace of H^k(X, K) then the successive rescaled preimages of Y converge to the invariant plane of currents κ(W). If f^∗ : H^k(X, K) → H^k(X, K) is dominated by a single real eigenvalue r > ν_{k−1} then the successive rescaled preimages of Y converge to a multiple (possibly zero) of the r eigencurrent. In particular, if ν_{n−1} < deg f then the successive rescaled preimages of any point converge to the unique invariant measure with M_{n−1} potentials.

Proof. This follows immediately from Theorem 44 if we show that Y has local potentials in M_{k−1}. This is equivalent to showing that locally Y = dP where ⟨P, ϕ⟩ ≤ a · ‖ϕ‖_∞ for some a > 0. Let U be a ball in R^n and Y_0 a k-plane in R^n. Then there is a k + 1 half plane P such that, as currents in U, ∂P = Y_0. Moreover it is clear that ⟨P, ϕ⟩ ≤ a‖ϕ‖_∞ for some real a > 0. (There are also local potentials for Y which are given by forms with L^1_loc coefficients. These can be constructed by choosing a projection π from U \ Y_0 to a codimension one cylinder C with axis Y_0, and choosing a volume form σ on C. The local potential is the pullback π^∗(σ).)

Remark. As with Theorem 41, Theorem 44 gives regular degree one eigencurrents for every eigenvalue of f^∗ : H^1(X, K) → H^1(X, K) of norm greater than one without requiring any constraints on the local behavior of f. In holomorphic dynamics much progress has been made in constructing degree one eigencurrents and then constructing dynamically important invariant measures via a generalized wedge product (see the references cited at the beginning of Section 6).

Remark. The proof of Proposition 45 could clearly be modified to apply to many singular manifolds as well.

6 Holomorphic Endomorphisms

We now restrict our interest to holomorphic dynamics. Thus all manifolds are assumed to be complex manifolds and all maps are assumed to be holomorphic unless stated otherwise.
Holomorphic endomorphisms of the Riemann sphere have been studied in great detail. For endomorphisms much of the theory is still in its beginnings. Much attention has been paid to holomorphic automorphisms of C^2 [FM89], [FS92], [HOV94], [HOV95], [BS91a], [BS91b], [BS92], [BLS93], [BS98a], [BS98b], [BS99] or K3 surfaces [Can01], [McM02]; the major developments for endomorphisms have been on P^n, [FS94a], [FS94b], [FS95b], [FS01], [FS95a], [JW00], [FJ03], [Ued94], [Ued98], [Ued97]. Recent significant developments have been made for endomorphisms of Kähler manifolds in [DS05]. The paper [DS05] shows existence of eigencurrents (or Green's currents) for endomorphisms of Kähler manifolds under a simple condition on the comparative rates of growth of volume in two different dimensions. They also show that a specific weighted sum of an arbitrary closed positive smooth current will converge to the Green's current, and that the Green's current has a Hölder continuous potential. In this setting our theorem shows that arbitrary (rescaled) preimages of a broader class of currents will converge to the Green's current. A wide variety of results have been proven in these various circumstances either showing the existence of invariant currents, showing convergence of currents to invariant currents, or studying the properties of these invariant currents. We include here results that follow from the method of this paper, which we are sure substantially overlap with existing results. Presumably our cohomological lifting theorem could be used in conjunction with Theorem 12 to show existence of higher degree (k, k) currents given certain bounds on local growth rates.

6.1 dd^c Cohomology

Let Z be a complex manifold and let f : Z → Z be a holomorphic self map of Z. Let ℋ be the sheaf of pluriharmonic functions, let L^∞_loc be the sheaf of locally bounded functions, and let 𝒞 be the sheaf of currents with local potentials in L^∞_loc, i.e.
currents locally of the form dd^c b, for b a locally bounded function. The members of 𝒞 are closed (1, 1) currents on Z.

Using the usual pullback on functions, and the induced pullback on currents with function potentials (i.e. pull back the current by pulling back its local potentials), we get a self cohomomorphism of the exact sequence of sheaves

$$\mathcal{H} \to L^\infty_{loc} \xrightarrow{\,dd^c\,} \mathcal{C}. \qquad (11)$$

We note that H^1(Z, ℋ) is a finite dimensional R vector space as can be seen from the long exact sequence for the short exact sequence R → 𝒪 → ℋ where the first map is inclusion and the second takes the imaginary part. The terms H^1(Z, 𝒪) → H^1(Z, ℋ) → H^2(Z, R) give the finite dimensionality since 𝒪 is a coherent analytic sheaf (see e.g. [Tay02] page 302). Then from Theorem 12 we obtain:

Corollary 46. Given v any closed eigenbundle of H^1(Z, ℋ) for f^∗ with eigenvalue r > 1, there is a unique closed (1, 1) current C such that f^{k∗}(C′)/r^k converges to C as k → ∞ for any divisor C′ of v.

Remark. We note that the terms "closed eigenbundle" and "divisor" in Corollary 46 are understood using the long exact sequence for (11).

We can apply Corollary 15 to show that

Corollary 47. Any such invariant current C so obtained has Hölder continuous local potentials.

Proof. The result follows from Observation 5.2, Lemma 38, the fact that the dd^c closed Hölder continuous functions are the same as the dd^c closed L^∞_loc functions and from Corollary 15.

Also from Observation 3.1,

Corollary 48. If v has a plurisubharmonic section the current C is positive.

7 Result via Invariant Sections

We stated early on that our construction of invariant members of H^0(𝒞) for a self cohomomorphism of a short exact sequence 𝒜 → ℬ → 𝒞 of sheaves could be done in terms of finding invariant sections of bundles. We illustrate this here in a specific case where we can take advantage of geometry to make further conclusions.
Finding an invariant section of a bundle is equivalent to finding an invariant trivialization of the bundle, and we will make our initial statement in terms of a trivialization.

Let Z be a compact complex manifold. Let f : Z → Z be a holomorphic endomorphism. Let p ∈ H^1(Z, ℋ) be an eigenvector for f^∗ with real eigenvalue λ of norm greater than one. If f^∗ were to have complex eigenvalues of interest, an analogous construction can be made to the one that follows. We note that there is a canonical bundle map f̃ : f^∗(p) → p which gives the map f on the base space. It is easy to show that there is a map τ_λ : p → λp which is the identity on the base space and takes the form r ↦ λr + b on the fibers, where b is a constant. What is more, the map τ_λ is easily seen to be unique up to the addition of a constant. Then define the map f̌ : p → p to be the composition $p \xrightarrow{\tau_\lambda} \lambda p = f^*(p) \xrightarrow{\tilde f} p$. Then f̌ is the map f on the base space and takes the form r ↦ λr + b on the fibers.

Since every pluriharmonic bundle is trivial as a smooth bundle, we can choose a smooth trivialization t : p → R, i.e. t(a + r) = t(a) + r for any a ∈ p, r ∈ R, where a + r is computed in the fiber containing a.

Theorem 49. There is a unique continuous trivialization g : p → R such that: g(a + r) = g(a) + r for a ∈ p and r ∈ R, g(f̌(a)) = λ · g(a) for a ∈ p, moreover g = lim λ^{−k} ◦ t ◦ f̌^{◦k} and the limit converges uniformly. Finally, the zero set of g is the image of a section g : Z → p and is exactly the set of points whose forward image under f̌ remains bounded.

Proof. Define a function T : p → R by T(a) ≡ t(f̌(a)) − λ · t(a). Note that T descends to a well defined continuous function T : Z → R since for an arbitrary r ∈ R one has T(a + r) = t(f̌(a + r)) − λ · t(a + r) = t(f̌(a) + λr) − λ · (t(a) + r) = T(a).
One notes that since the function T is necessarily bounded if Z is compact then defining g(a) ≡ t(a) + λ^{−1} · T(a) + λ^{−2}T(f̌(a)) + λ^{−3}T(f̌^{◦2}(a)) + · · · gives a continuous function g : p → R satisfying the above two properties. Assume g_1 and g_2 are two such functions. Then ∆ ≡ g_1 − g_2 : p → R is a function satisfying ∆(a + r) = ∆(a) for a ∈ p and r ∈ R so ∆ descends to a continuous function ∆ : Z → R satisfying ∆(f̌(a)) = λ · ∆(a). However since λ > 1 one concludes that this is only possible if ∆ ≡ 0, since Z is compact so ∆(Z) has compact image in R. It is easy to check using the definition of T that λ^{−k} ◦ t ◦ f̌^{◦k}(a) is exactly a partial sum of the first k terms of the above series and this gives the convergence result. The conclusion about the section g is trivial.

The above construction can be carried through almost without modification for any subspace of H^1(Z, ℋ) on which f^∗ is expanding. This gives an alternate way of understanding the convergence of preimages of sections. The point is that if s is any section of p, i.e. the potential of a current C, then (1/λ) f^∗(C) is a current with potential which is the setwise preimage of s under f̌ (this is easy to confirm from the construction of f̌). The Green's trivialization g shows that f̌ is uniformly repelling away from the image of the invariant section g. Thus as long as s is bounded in p (not even necessarily continuous), the successive preimages of s will converge uniformly to the section g. Since uniform convergence of potentials implies convergence of currents, the rescaled pullbacks of a current C converge to the current with potential g. We already have this as a theorem, so we have not restated it as such here. This is just an alternative approach. Note that in the case where Z = P^2, [FJ03] gives far more precise control of when the successive rescaled preimages of a current will converge to the eigencurrent.
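The telescoping series in the proof of Theorem 49 can be checked numerically in a toy model. The following is a minimal sketch and not the holomorphic setting of the theorem: it assumes the circle R/Z as base space, the doubling map in place of f, λ = 2, a trivial bundle p = Z × R with t(x, r) = r, and a hypothetical choice T(x) = cos(2πx) for the fiber translation. All function names are illustrative.

```python
import math

LAM = 2.0  # eigenvalue lambda (assumed; chosen to match the doubling map)

def f(x):
    """Base map: the doubling map on the circle R/Z (toy stand-in for f)."""
    return (2.0 * x) % 1.0

def T(x):
    """Assumed continuous function T = t(f_check(a)) - lambda * t(a) on the base."""
    return math.cos(2.0 * math.pi * x)

def f_check(a):
    """Bundle map: f on the base, r -> lambda * r + T(x) on the fiber."""
    x, r = a
    return (f(x), LAM * r + T(x))

def t(a):
    """Smooth trivialization: here simply the fiber coordinate."""
    return a[1]

def g(a, terms=60):
    """Partial sum t(a) + sum over j < terms of lambda^-(j+1) * T(f^j(x))."""
    x, r = a
    total = r
    for j in range(terms):
        total += LAM ** -(j + 1) * T(x)
        x = f(x)
    return total

def rescaled_iterate(a, k):
    """Compute lambda^-k * t(f_check^k(a)), the k-th approximant to g."""
    for _ in range(k):
        a = f_check(a)
    return LAM ** -k * t(a)

a = (0.3, 1.25)
print(abs(g(f_check(a)) - LAM * g(a)))          # equivariance defect, near 0
print(abs(g((a[0], a[1] + 0.5)) - g(a) - 0.5))  # fiber-translation defect, near 0
print(abs(rescaled_iterate(a, 40) - g(a)))      # approximant vs. series, near 0
```

The three printed quantities correspond to the two defining properties of g and to the statement that λ^{−k} ◦ t ◦ f̌^{◦k} is a partial sum of the series. The truncation error of the series is bounded by λ^{−terms} · sup|T|, which is why a modest number of terms suffices in this sketch.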
7.1 Sections version with an Invariant Ample Bundle

It is also interesting to consider the special case where there is an invariant ample bundle ℓ with eigenvalue λ ≥ 2 an integer. Without loss of generality we assume ℓ is very ample. The morphism of sheaves log | · | : 𝒪^∗ → ℋ induces a map from holomorphic line bundles to pluriharmonic bundles. We let p = log |ℓ| be the corresponding pluriharmonic bundle. It is easy to see that there is a holomorphic map ℓ → ℓ^λ which is of the form σ_λ : z ↦ az^λ, a ∈ C^∗ on each fiber and is the identity on the base space. There is also a canonical holomorphic map f̃ : f^∗(ℓ) → ℓ which is a line bundle map and is f on the base space. One then defines the holomorphic map f̆ : ℓ → ℓ which is the composition $\ell \xrightarrow{\sigma_\lambda} \ell^\lambda = f^*(\ell) \xrightarrow{\tilde f} \ell$. This map is of the form z ↦ az^λ on each fiber and is equal to the map f : Z → Z on the base space.

Let ℓ^∗ denote ℓ with its zero section removed, so that log | · | : ℓ^∗ → p is a well defined continuous map. Since the preimage of the zero section of ℓ under f̆ is the zero section then f̆ is a holomorphic self map of ℓ^∗. It is easy to confirm that f̆ : ℓ → ℓ can be rescaled so that the diagram

$$\begin{array}{ccc} \ell^* & \xrightarrow{\ \breve f\ } & \ell^* \\ {\scriptstyle \log|\cdot|}\downarrow & & \downarrow{\scriptstyle \log|\cdot|} \\ p & \xrightarrow{\ \check f\ } & p \end{array}$$

commutes. Our Green's trivialization g : p → R can be pulled back to give a Green's function G : ℓ^∗ → R on the punctured bundle ℓ^∗. It satisfies G(f̆(w)) = λ · G(w) and G(βw) = G(w) + log |β| for w ∈ ℓ^∗ and β ∈ C^∗. Since g is a trivialization of an R bundle over a compact space, g is proper. Since log | · | : ℓ^∗ → p is proper then G is proper.

Thus, in this setting one can construct a Green's function that is exactly analogous to the Green's function constructed on C^{n+1} for a holomorphic endomorphism of P^n. Potentially one could take advantage of the special geometry of very ample bundles to get information about the dynamics in this situation.

8 Bibliography

References

[BLS93] Eric Bedford, Mikhail Lyubich, and John Smillie. Polynomial diffeomorphisms of C2. IV.
The measure of maximal entropy and laminar currents. Invent. Math., 112(1):77–125, 1993.
[Bre97] Glen E. Bredon. Sheaf Theory. Springer-Verlag, 1997.
[BS91a] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2: currents, equilibrium measure and hyperbolicity. Invent. Math., 103(1):69–99, 1991.
[BS91b] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2. II. Stable manifolds and recurrence. J. Amer. Math. Soc., 4(4):657–679, 1991.
[BS92] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2. III. Ergodicity, exponents and entropy of the equilibrium measure. Math. Ann., 294(3):395–420, 1992.
[BS98a] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2. V. Critical points and Lyapunov exponents. J. Geom. Anal., 8(3):349–383, 1998.
[BS98b] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2. VI. Connectivity of J. Ann. of Math. (2), 148(2):695–735, 1998.
[BS99] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2. VII. Hyperbolicity and external rays. Ann. Sci. École Norm. Sup. (4), 32(4):455–497, 1999.
[Can01] Serge Cantat. Dynamique des automorphismes des surfaces K3. Acta Math., 187(1):1–57, 2001.
[dR84] Georges de Rham. Differentiable Manifolds. Springer-Verlag, 1984.
[DS05] Tien-Cuong Dinh and Nessim Sibony. Green currents for holomorphic automorphisms of compact Kähler manifolds. J. Amer. Math. Soc., 18(2):291–312, 2005.
[Fed69] Herbert Federer. Geometric Measure Theory. Springer-Verlag, 1969.
[FJ03] Charles Favre and Mattias Jonsson. Brolin's Theorem for Curves in Two Complex Dimensions. Ann. Inst. Fourier, 53:1461–1501, 2003.
[FM89] Shmuel Friedland and John Milnor. Dynamical properties of plane polynomial automorphisms. Ergodic Theory Dynam. Systems, 9(1):67–99, 1989.
[FS92] John Erik Fornæss and Nessim Sibony. Complex Hénon mappings in C2 and Fatou-Bieberbach domains. Duke Mathematical Journal, 65(2):345–380, 1992.
[FS94a] John Erik Fornæss and Nessim Sibony.
Complex dynamics in higher dimension. In Complex Potential Theory, pages 131–186, 1994.
[FS94b] John Erik Fornæss and Nessim Sibony. Complex dynamics in higher dimension. I. Astérisque, 222:201–231, 1994.
[FS95a] John Erik Fornæss and Nessim Sibony. Classification of recurrent domains for some holomorphic maps. Math. Ann., 301(4):813–820, 1995.
[FS95b] John Erik Fornæss and Nessim Sibony. Complex dynamics in higher dimension. II. In Modern methods in complex analysis (Princeton, NJ, 1992), pages 135–182. Princeton Univ. Press, Princeton, NJ, 1995.
[FS01] John Erik Fornæss and Nessim Sibony. Dynamics of P2 (examples). In Laminations and foliations in dynamics, geometry and topology (Stony Brook, NY, 1998), pages 47–85. Amer. Math. Soc., Providence, RI, 2001.
[GR84] H. Grauert and R. Remmert. Coherent Analytic Sheaves. Springer-Verlag, 1984.
[Har77] Robin Hartshorne. Algebraic Geometry. Springer-Verlag, 1977.
[HL99] Francis Hirsch and Gilles Lacombe. Elements of Functional Analysis, volume 192 of Graduate Texts in Mathematics. Springer-Verlag, 1999. Translated from the 1997 French original by Silvio Levy.
[HOV94] John H. Hubbard and Ralph W. Oberste-Vorth. Hénon mappings in the complex domain. I. The global topology of dynamical space. Inst. Hautes Études Sci. Publ. Math., (79):5–46, 1994.
[HOV95] John H. Hubbard and Ralph W. Oberste-Vorth. Hénon mappings in the complex domain. II. Projective and inductive limits of polynomials. In Real and complex dynamical systems (Hillerød, 1993), volume 464 of NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., pages 89–132. Kluwer Acad. Publ., Dordrecht, 1995.
[JW00] Mattias Jonsson and Brendan Weickert. A nonalgebraic attractor in P2. Proc. Amer. Math. Soc., 128(10):2999–3002, 2000.
[Kat03] Anatole Katok. Combinatorial constructions in ergodic theory and dynamics, volume 30 of University Lecture Series. American Mathematical Society, Providence, RI, 2003.
[McM02] Curtis T. McMullen.
Dynamics on K3 surfaces: Salem numbers and Siegel disks. J. Reine Angew. Math., 545:201–233, 2002. [Tay02] Joseph L. Taylor. Several Complex Variables with Connections to Algebraic Geometry and Lie Groups. American Mathematical Society, 2002. [Ued94] Tetsuo Ueda. Fatou sets in complex dynamics in projective spaces. J. Math. Soc. Japan, 46:545–555, 1994. [Ued97] Tetsuo Ueda. Complex dynamics on Pn and Kobayashi metric. In Complex dynamical systems and related areas, pages 188–191. Surikaisekikenkyusho Kokyuroku No 988, 1997. (Kyoto 1996). [Ued98] Tetsuo Ueda. Critical orbits of holomorphic maps on projective spaces. J. Geometric Analysis, 8(2):319–334, 1998. [Wei97] Charles A. Weibel. An Introduction to Homological Algebra. Cambridge University Press, 1997. Introduction Cohomomorphisms Cohomomorphisms and . Invariant Global Sections Regularity and Positivity Subsheaf Cohomology Invariant Currents Nimble Forms and Lenient Currents Hölder Lemmas Eigencurrents for Cohomologically Expanding Smooth Maps Eigencurrents for Smooth Covering Maps Holomorphic Endomorphisms ddc Cohomology Result via Invariant Sections Sections version with an Invariant Ample Bundle Bibliography ABSTRACT The goal of this paper is to construct invariant dynamical objects for a (not necessarily invertible) smooth self map of a compact manifold. We prove a result that takes advantage of differences in rates of expansion in the terms of a sheaf cohomological long exact sequence to create unique lifts of finite dimensional invariant subspaces of one term of the sequence to invariant subspaces of the preceding term. This allows us to take invariant cohomological classes and, under the right circumstances, construct unique currents of a given type, including unique measures of a given type, that represent those classes and are invariant under pullback. A dynamically interesting self map may have a plethora of invariant measures, so the uniqueness of the constructed currents is important.
It means that if local growth is not too big compared to the growth rate of the cohomological class, then the expanding cohomological class gives sufficient "marching orders" to the system to prohibit the formation of any other such invariant current of the same type (say, from some local dynamical subsystem). Because we use subsheaves of the sheaf of currents, we give conditions under which a subsheaf will have the same cohomology as the sheaf containing it. Using a smoothing argument, this allows us to show that the sheaf cohomology of the currents under consideration can be canonically identified with the de Rham cohomology groups. Our main theorem can be applied in both the smooth and holomorphic settings. <|endoftext|><|startoftext|> Coincidence of the oscillations in the dipole transition and in the persistent current of narrow quantum rings with two electrons

Y. Z. He and C. G. Bao∗
State Key Laboratory of Optoelectronic Materials and Technologies, and Department of Physics, Sun Yat-Sen University, Guangzhou, 510275, P.R. China

The fractional Aharonov-Bohm oscillation (FABO) of narrow quantum rings with two electrons has been studied and explained analytically; the evolution of the period and amplitudes against the magnetic field can be described exactly. Furthermore, the dipole transition of the ground state was found to have essentially two frequencies, whose difference appears as an oscillation matching the oscillation of the persistent current exactly. A number of equalities relating the observables and dynamical parameters have been found.

PACS numbers: 73.23.Ra, 78.66.-w

∗ The corresponding author

Quantum rings containing only a few electrons can now be fabricated in laboratories [1,2]. When a magnetic field $B$ is applied, interesting physical phenomena, e.g., the Aharonov-Bohm oscillation (ABO) and the fractional ABO (FABO) of the ground state (GS) energy $E_o$ and persistent current $J_o$, have been observed [2-4,13].
On the theoretical side, a number of calculations based on exact diagonalization [5-8], the local-spin-density approximation [9,10], and the diffusion Monte Carlo method [11] have been performed. These calculations can in general reproduce the experimental data. For example, in calculations for the 4-electron ring [6,11], the oscillation period $\Phi_0/4$ found in experiments was recovered ($\Phi_0 = hc/e$ is the flux quantum).

In addition to the oscillations in $E_o$ and $J_o$, the oscillation in the optical properties is noticeable [16,17]. In this paper a new kind of oscillation, found in the dipole transition of two-electron (2-e) narrow rings, is reported. The emitted (absorbed) photon of the dipole transition of the GS was found to have essentially two energies; their difference is exactly equal to $hJ_o$, where $h$ is Planck's constant. In other words, the difference of the two photon energies appears as an oscillation which matches exactly the oscillation of $J_o$. This finding is supported by both numerical calculation and analytical analysis, as follows.

The narrow 2-e ring is first considered as one-dimensional; the effect of the width of the ring is evaluated afterward. The Hamiltonian reads
$$H = T + V_{12} + H_{Zeeman}, \tag{1}$$
$$T = \sum_{j=1}^{2} G\left(-i\frac{\partial}{\partial\theta_j} + \Phi\right)^2, \qquad G = \frac{\hbar^2}{2m^*R^2},$$
where $m^*$ is the effective mass, $\theta_j$ is the azimuthal angle of the $j$-th electron, $\Phi = \pi R^2 B/\Phi_0$, where $B$ is a magnetic field perpendicular to the plane of the ring, $V_{12}$ is the e-e Coulomb interaction, and $H_{Zeeman} = -S_Z\mu\Phi$ is the well-known Zeeman energy, where $S_Z$ is the Z-component of the total spin $S$ and $\mu = g^*\mu_B\Phi_0/(\pi R^2)$, where $g^*$ is the effective g-factor and $\mu_B$ is the Bohr magneton. The interaction is adjusted as in [7],
$$V_{12} = e^2\left(2\varepsilon\sqrt{d^2 + R^2\sin^2\left((\theta_1-\theta_2)/2\right)}\right)^{-1},$$
where $\varepsilon$ is the dielectric constant and the parameter $d$ is introduced to account for the effect of the finite thickness of the ring.

We first perform a numerical calculation so that all related quantities can be evaluated quantitatively.
We take $m^* = 0.063\,m_e$, $\varepsilon = 12.4$ (for InGaAs), and $d = 0.05R$, and use the units meV, nm, Tesla, and $\Phi_0$. Accordingly, $G = 604.8/R^2$ and $\mu = 33.53/R^2$. A set of basis functions $\phi_{k_1k_2} = e^{i(k_1\theta_1+k_2\theta_2)}/2\pi$ is introduced to diagonalize the Hamiltonian, where $k_1$ and $k_2$ must be integers to ensure periodicity; the sum of $k_1$ and $k_2$ is just the total orbital angular momentum $L$. $\phi_{k_1k_2}$ must be further (anti-)symmetrized when $S = 0\ (1)$. When about three thousand basis functions are adopted, accurate solutions (at least six significant digits) can be obtained. The low-lying spectrum is plotted in Fig. 1, where the oscillation of the GS energy and the transitions of the GS angular momentum $L_o$ can be clearly seen.

Let $\theta_C = (\theta_2 + \theta_1)/2$ and $\varphi = \theta_2 - \theta_1$. Then
$$H = H_{coll} + H_{int}, \tag{2}$$
where
$$H_{coll} = \frac{1}{2}G\left(-i\frac{\partial}{\partial\theta_C} + 2\Phi\right)^2 + H_{Zeeman}, \qquad H_{int} = 2G\left(-i\frac{\partial}{\partial\varphi}\right)^2 + V_{12}$$
describe the collective and internal motions, respectively. Our numerical results lead to the following points.

(i) Separability: The separability of the one-dimensional ring is well known [5]. For the convenience of the following description, it is briefly summarized here. Each eigenenergy $E$ can be exactly divided into a sum of three terms,
$$E = \frac{1}{2}G(L + 2\Phi)^2 + E_{int} - S_Z\mu\Phi, \tag{3}$$
where the first term is the kinetic energy of the collective motion and $E_{int}$ is the internal energy. Since the basis functions can be rewritten as
$$\phi_{k_1k_2} = e^{iL\theta_C}\,e^{i(k_2-k_1)\varphi/2}/2\pi, \tag{4}$$
the spatial part of each eigenstate $\Psi$ is strictly separable as $\Psi = \frac{1}{\sqrt{2\pi}}e^{iL\theta_C}\psi_{int}$, where the first factor describes the collective motion, while $\psi_{int}$ is a normalized internal state depending only on $\varphi$. In particular, neither $E_{int}$ nor $\psi_{int}$ depends on $B$ (or $\Phi$).

FIG. 1: Low-lying levels (E in meV) of a 2-e ring against $\Phi/\Phi_0$ in the FABO region. When $\Phi$ is positive, $L_o$ is negative; the numbers by the curves are $-L_o$.

(ii) Classification of $\psi_{int}$: When $L$ is even (odd), $(k_2 - k_1)/2$ is an integer (half-integer), thus the period in $\varphi$, as shown in (4), is $2\pi$ ($4\pi$).
Therefore, there are two choices for the periodicity of the internal states. In fact, the difference in periodicity is closely related to the interdependence of the domains of the new variables $\theta_C$ and $\varphi$; this point has been discussed in detail in refs. [14,15]. Let $Q = (-1)^L$. Then the four cases $(Q,S) = (1,0)$, $(-1,0)$, $(-1,1)$ and $(1,1)$ are associated with four types of states labeled a, b, c, and d, respectively. The internal states of Type a are denoted as $\psi_a, \psi_{a^*}, \cdots$ and the associated internal energies as $E_a < E_{a^*} < \cdots$, and so on. Examples of $\psi_{int}$ and $E_{int}$ are plotted in Fig. 2 and listed in Table 1, respectively.

Table 1. The lowest and second lowest internal energies (in meV) of Types a to d, R = 30 nm.

Type        a      b      c      d
E_int       2.626  4.247  2.630  4.272
E*_int      6.342  8.912  6.435  9.158

Due to the e-e repulsion, a dumbbell shape (DB), i.e., $\varphi = 180^\circ$, is advantageous in energy because the two electrons are then farthest from each other. However, a rotation of this geometry by $\pi$ is equivalent to an interchange of the particles; these operations create the factors $(-1)^L$ and $(-1)^S$, respectively, in the wave function. Therefore, the equivalence leads to a constraint: the DB is allowed only for the states with $L + S$ even, i.e., only for Types a and c. Otherwise, the states have an inherent node at the DB and are therefore higher in energy, as shown in Table 1, where $E_a \ll E_b$, $E_c \ll E_d$, and $E_a \approx E_c$.

FIG. 2: Four types of $\psi_{int}$ against $\varphi$ (panels: Type a, Type b, Type c, Type d; $\varphi$ from 0 to 360 degrees), R = 40 nm. The lowest three states of each type are shown; the higher the state, the more nodes it has.

In Fig. 2 the patterns of Type a are one-to-one similar to those of Type c; they all have a peak at the DB. By contrast, all those of Types b and d have the inherent node at the DB. It is noticeable that the states of Types b and c are not continuous at $\varphi = 0$ and $2\pi$, because their periods are not equal to $2\pi$.
It was found that the internal states of all the GSs are either $\psi_a$ or $\psi_c$, without exception, because the favorable DB is allowed in them. When the dynamical parameters vary within reasonable ranges, the qualitative features of Fig. 2 remain the same.

According to (3), an appropriate $L_o$ is chosen to minimize the GS energy. When $\Phi$ increases, $L_o$ undergoes even-odd transitions repeatedly and becomes more negative, as shown in Fig. 1. Correspondingly, the total spin $S_o$ undergoes singlet-triplet transitions, and $\psi_a$ and $\psi_c$ appear in the GS alternately. However, due to the Zeeman effect, when $\Phi$ is larger than a critical value $\Phi_{crit}$, only $S_o = 1$ states are dominant, and accordingly only $\psi_c$ appears in the GS. The region $\Phi < (>)\ \Phi_{crit}$ is called the FABO (ABO) region.

(iii) Persistent current: Let $J_1$ be the current of the particle $e_1$. The expression for $J_1$ is well known [5]. Since it does not depend on the azimuthal angle, it equals its average over $\theta_1$. Thus the total current $J = J_1 + J_2$ is
$$J = \frac{g}{4\pi}\int d\theta_1 d\theta_2 \left[\Psi^*\left(-i\frac{\partial}{\partial\theta_C} + 2\Phi\right)\Psi + c.c.\right], \tag{5}$$
where $g = \hbar/(m^*R^2)$. Using the arguments $\theta_C$ and $\varphi$ and making use of the separability, the integration over $\theta_C$ and $\varphi$ can be performed. Thus we have
$$J = g(L + 2\Phi)/2\pi. \tag{6}$$
This equation demonstrates explicitly the mechanism of the oscillation of the persistent current: it is caused by the step-by-step transition of $L$ as $\Phi$ increases. Examples of $J$ are shown in Fig. 3, where each stronger oscillation (associated with an odd-$L$, $S = 1$ GS) is followed by a weaker oscillation (associated with an even-$L$, $S = 0$ GS).

FIG. 3: The oscillation of the persistent current and the two photon energies of the ground states against $\Phi/\Phi_0$ in the FABO region. The unit of current is $10^{-5}C/R$, where $C$ is the velocity of light. In the lowest panel, the black square (white circle) denotes $\hbar\omega_+$ ($\hbar\omega_-$), namely, the energy associated with the $L_o$ to $L_o + 1$ ($L_o - 1$) transition.
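The interplay of eq. (3) and eq. (6) can be illustrated with a short numerical sketch. It assumes the R = 30 nm parameters quoted in the text ($G = 604.8/R^2$, $\mu = 33.53/R^2$, and the Table 1 values of $E_a$ and $E_c$), takes $S_Z = 1$ for the triplet, and searches a finite window of $L$ values; the function names and the search window are hypothetical choices made for illustration, not part of the original calculation.

```python
# Sketch: ground-state angular momentum L_o and persistent current versus flux,
# based on eq. (3) and eq. (6). All names here are illustrative assumptions.

R = 30.0                  # ring radius in nm (value used for Table 1)
G = 604.8 / R**2          # meV, from the text
MU = 33.53 / R**2         # meV, Zeeman coefficient from the text
E_A, E_C = 2.626, 2.630   # meV, lowest internal energies of types a and c


def gs_energy(L, phi):
    """Lowest energy for a given L according to eq. (3).

    Even L: singlet (S_Z = 0) with internal state psi_a.
    Odd L: triplet with internal state psi_c, assuming S_Z = 1.
    """
    collective = 0.5 * G * (L + 2.0 * phi) ** 2
    if L % 2 == 0:
        return collective + E_A
    return collective + E_C - MU * phi


def ground_state_L(phi, l_values=range(-60, 1)):
    """Return the L (within the assumed search window) that minimizes eq. (3)."""
    return min(l_values, key=lambda L: gs_energy(L, phi))


def current_in_units_of_g_over_2pi(phi):
    """Persistent current of the GS from eq. (6), divided by g / (2 pi)."""
    return ground_state_L(phi) + 2.0 * phi


if __name__ == "__main__":
    for phi in (0.1, 0.3, 1.0, 9.5):
        print(phi, ground_state_L(phi), current_in_units_of_g_over_2pi(phi))
```

Scanning `phi` on a fine grid with these helpers gives a sawtooth current with jumps wherever `ground_state_L` changes, which is the mechanism described in point (iii).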
(iv) Relations among the internal states: Define $O_m = e^{im(\theta_1-\theta_C)} + e^{im(\theta_2-\theta_C)} = 2\cos(m\varphi/2)$. By analyzing the numerical data, we found
$$\tilde{N}(O_1\psi_a) = \psi_b + \xi_a \quad\text{and}\quad \tilde{N}(O_1\psi_c) = \psi_d + \xi_c, \tag{7}$$
where $\tilde{N}$ is the operator of normalization, and both $\xi_a$ and $\xi_c$ are very small functions that depend on the dynamical parameters very weakly. E.g., when $R$ varies from 30 to 90, the weights of $\xi_a$ and $\xi_c$ vary from 0.0004 to 0.0002. They are so small that they can in fact be neglected. Since $O_1$ contains a node at the DB, it must cause a change of type from a to b, or from c to d. Thus it is not surprising that (7) holds. Since $O_1$ is the operator of the dipole transition (see below), eq. (7) provides an additional selection rule, as discussed later.

(v) Dipole transition: The probability of the dipole transition reads $P_{(o),\pm} = (\omega_\pm/c)^3R^2\,|A_{(o),\pm}|^2$, where $\omega_\pm$ is the frequency of the photon and
$$A_{(o),\pm} = \langle\Psi^{(f)}_\pm|e^{\pm i\theta_1} + e^{\pm i\theta_2}|\Psi^{(o)}\rangle = \delta_{L^{(f)},\,L^{(o)}\pm1}\,\langle\psi^{(f)}_{int}|O_1|\psi^{(o)}_{int}\rangle, \tag{8}$$
where $(f)$ and $(o)$ denote the final and initial states, respectively, and the signs $\pm$ are associated with $L^{(f)} = L^{(o)} \pm 1$. Let the initial state be the GS with $L_o$; then $\psi^{(o)}_{int}$ must be $\psi_a$ or $\psi_c$, depending on whether $L_o$ is even or odd. Let $\alpha$ denote the type of the initial state. Due to (7), $\langle\psi^{(f)}_{int}|O_1|\psi^{(o)}_{int}\rangle = \delta_{(f),\alpha}\langle O_1\psi_\alpha|O_1\psi_\alpha\rangle^{1/2}$, where $\delta_{(f),\alpha}$ implies that the final state must be $\psi_b$ ($\psi_d$) if $\alpha = a$ ($c$); otherwise the amplitude is zero. Thus, due to the additional selection rule of eq. (7), the dipole strength of the GS is completely concentrated in two final states having $L^{(f)} = L^{(o)} \pm 1$, both having the same internal state specified by eq. (7). Accordingly, only photons with the two energies
$$\hbar\omega_\pm = E^{(f)}_\pm - E^{(o)} = G\left[\tfrac{1}{2}\left(1 \pm 2(L_o + 2\Phi)\right) + \Delta_\alpha/G\right] \tag{9}$$
can be emitted (absorbed), where $\Delta_\alpha = E_b - E_a$ or $E_d - E_c$, depending on whether $\alpha = a$ or $c$. The oscillation of $\hbar\omega_\pm$ is plotted in the lowest panel of Fig. 3. It turns out that $\Delta_\alpha/G$ depends on $R$ very weakly; thus $\hbar\omega_\pm$ is nearly proportional to $R^{-2}$.
Accordingly, a smaller ring will have a larger transition probability with a higher photon energy.

(vi) FABO region: The oscillation in this region is complicated, as shown in Figs. 1 and 3. Note that the GS energy (3), the persistent current (6), and the photon energies (9) all contain the factor $L_o + 2\Phi$; thus their FABO are completely in phase and share the same mechanism, caused by the transitions of $L_o$ as $\Phi$ varies. In Fig. 1 the abscissa $\Phi$ can be divided into segments; in each segment the GS has a specific $L_o$ and the GS energy is given by a piece of a parabola. A segment is called even (odd) if $L_o$ is even (odd). At the border between two neighboring segments the two GS energies are equal. From this equality and (3), the right and left boundaries of the segment with $L_o$ are obtained as
$$\Phi_{right}(L_o) = \left(1 - (\mu/2G)^2\right)^{-1}\left[1 - \mu(E_c - E_a)/G^2 - 2L_o + (-1)^{L_o}\left(2(E_c - E_a) + \mu(L_o - 1/2)\right)/G\right]/4, \tag{10}$$
$$\Phi_{left}(L_o) = \left(1 - (\mu/2G)^2\right)^{-1}\left[-1 - \mu(E_c - E_a)/G^2 - 2L_o - (-1)^{L_o}\left(2(E_c - E_a) + \mu(L_o + 1/2)\right)/G\right]/4, \tag{11}$$
where $L_o \le 0$ and $\Phi_{right}(L_o) = \Phi_{left}(L_o - 1)$; the terms containing $\mu$ arise from $H_{Zeeman}$. The length of the segment reads
$$d_{L_o} = \Phi_{right}(L_o) - \Phi_{left}(L_o) = \left(1 - (\mu/2G)^2\right)^{-1}\left[1 + (-1)^{L_o}\left(2(E_c - E_a) + \mu L_o\right)/G\right]/2, \tag{12}$$
which is related to the period of the FABO. When $\Phi$ increases, the magnitude of $L_o$ increases. Since $\mu L_o$ is negative, it is clear from eq. (12) that the even (odd) segments become shorter (longer) as $\Phi$ increases. The location of the segment with a given $L_o$ follows from the inequality $\Phi_{left}(L_o) \le \Phi \le \Phi_{right}(L_o)$.

Once the relation between $L_o$ and the segments of $\Phi$ is clear, every detail of the FABO can be explained analytically and exactly via eqs. (3), (6), and (9). In particular, the extrema in each segment are obtained by setting $\Phi = \Phi_{right}$ or $\Phi_{left}$. For example, the maximal current is $g(L_o + 2\Phi_{right})/2\pi$. Incidentally, the minimum of the GS energy in a segment is $E_{min} = E_c - \mu^2/8G + \mu L_o/2$ (if $S_o = 1$), or just equal to $E_a$ (if $S_o = 0$). It is noted that $E_c - E_a$ (cf.
Table 1) and $\mu/G$ (it is 0.0554 in our case) are both small. When $\Phi$ is small, $|L_o|$ is also small. In this case eq. (12) leads to $d_{L_o} \approx 1/2$, i.e., the period is half that of the normal ABO. In fact, (12) provides a quantitative description of the variation of the period of the FABO.

(vii) ABO region: When $\Phi$ becomes sufficiently large, $L_o$ becomes very negative and the even segments disappear because their lengths satisfy $d_{L_o} \le 0$. We can define a critical odd integer $L_{crit}$ such that $d_{L_{crit}-1} \le 0$ while $d_{L_{crit}+1} > 0$; the critical flux separating the FABO and ABO regions can then be defined as
$$\Phi_{crit} = \Phi_{left}(L_{crit}). \tag{13}$$
Once $\Phi > \Phi_{crit}$, $L_o$ remains odd and the system stays polarized. Let $I_X$ be the largest even integer smaller than $-(G + 2(E_c - E_a))/\mu$. It follows from eq. (12) that $L_{crit} = I_X + 1$. With our parameters, $L_{crit} = -19$ and accordingly $\Phi_{crit} = 9.003$ (refer to Fig. 1). Both $L_{crit}$ and $\Phi_{crit}$ depend on $R$ very weakly, but depend sensitively on the effective mass $m^*$.

In the ABO region ($\Phi > \Phi_{crit}$), eqs. (10) to (12) do not hold. Instead we have $\Phi_{right} = -(L_o - 1)/2$, $\Phi_{left} = -(L_o + 1)/2$, and $d_{L_o} = 1$. Thus the normal ABO is recovered. Evaluated from (6), the current ranges from $-g/2\pi$ to $g/2\pi$ (for comparison, it ranges from $-g/4\pi$ to $g/4\pi$ for 1-e rings). From (9), the photon energy $\hbar\omega_+$ ranges from $\Delta_c - G/2$ to $\Delta_c + 3G/2$, while at the same time $\hbar\omega_-$ ranges from $\Delta_c + 3G/2$ to $\Delta_c - G/2$.

(viii) Relations between the photon energies and other physical quantities: Due to (7), the emitted (absorbed) dipole photon has only two frequencies; therefore it is meaningful to define $\Delta\hbar\omega = \hbar(\omega_+ - \omega_-)$. Directly from (9) and (6), we have
$$\Delta\hbar\omega = hJ_o, \tag{14}$$
where $h$ is Planck's constant and $J_o$ is the persistent current of the GS. For comparison, 1-e rings have $\Delta\hbar\omega = 2hJ_o$. Eq. (14) demonstrates that the oscillation of $\Delta\hbar\omega$ and the oscillation of $J_o$ match each other exactly; they remain strictly proportional to each other as $\Phi$ varies.
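For the reader's convenience, the step from (6) and (9) to (14) can be written out explicitly. The following is a short sketch that uses only the definitions $G = \hbar^2/(2m^*R^2)$ and $g = \hbar/(m^*R^2)$ given earlier, so that $\hbar g = 2G$:

```latex
\begin{align*}
\Delta\hbar\omega
  &= \hbar\omega_+ - \hbar\omega_-
   = G\left[\tfrac{1}{2}\bigl(1 + 2(L_o + 2\Phi)\bigr)
          - \tfrac{1}{2}\bigl(1 - 2(L_o + 2\Phi)\bigr)\right]
   = 2G\,(L_o + 2\Phi), \\
hJ_o
  &= 2\pi\hbar\cdot\frac{g\,(L_o + 2\Phi)}{2\pi}
   = \hbar g\,(L_o + 2\Phi)
   = \frac{\hbar^2}{m^*R^2}\,(L_o + 2\Phi)
   = 2G\,(L_o + 2\Phi).
\end{align*}
```

The internal-energy difference $\Delta_\alpha$ cancels in the first line because both final states share the same internal state, which is why the two sides agree at every value of $\Phi$.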
The maxima of $\Delta\hbar\omega$ measured in the ABO and FABO regions, respectively, read
$$(\Delta\hbar\omega)^{AB}_{max} = 2G, \tag{15}$$
$$(\Delta\hbar\omega)^{FAB}_{max} = 2G(L_o + 2\Phi_{right}). \tag{16}$$
Obviously, (15) provides a way to determine $G$, from which $m^*$ can be obtained. Eq. (16) can be rewritten as
$$E_c - E_a = (G - \mu L_o)/2 - \frac{2G - \mu}{4G}\,(\Delta\hbar\omega)^{FAB}_{max}. \tag{17}$$
This equation can be used to determine $E_c - E_a$. Furthermore, we define
$$\Gamma\hbar\omega = \hbar(\omega_+ + \omega_-) = G + 2\Delta_\alpha. \tag{18}$$
Once $G$ is known, (18) can be used to determine $E_b - E_a$ and $E_d - E_c$. Since the spectrum can be generated from the internal energies via (3), the evolution of the spectrum and of the persistent current against $\Phi$ can be understood simply by measuring the photon energies.

FIG. 4: Evolution of $hJ_o$ (solid line) and $\Delta\hbar\omega$ (dotted line) against $B$ (Tesla) for a 2-e ring with $r_a = 50$ and $r_b = 120$ nm.

(ix) Effect of the width: We now consider a two-dimensional model in which the two electrons are strictly confined in an annular region by a potential $U(r)$, which is zero if $r_a < r < r_b$ and infinite otherwise. Under this model we have performed numerical calculations to obtain $\Delta\hbar\omega$ and $hJ_o$, where $J_o$ is now the total angular current inside the ring (from $r_a$ to $r_b$). The result is shown in Fig. 4, where $r_a = 50$ and $r_b = 120$ are assumed; the two quantities are slightly different from each other. However, when the width becomes smaller, say $r_b - r_a < 30$, the two curves overlap. Thus (14) works not only for one-dimensional but also for two-dimensional narrow rings.

Let us define $r = \hbar/\sqrt{m^*(\Delta\hbar\omega)^{AB}_{max}}$. For one-dimensional rings, from (15), we have $r = R$, where $R$ is the radius of the ring. For two-dimensional rings, it was found from our numerical calculation that $r \approx (r_b + r_a)/2$ if $r_b - r_a < 30$. E.g., when $r_b = 100$ and $r_a = 70$, $r = 85.03$; when $r_b = 100$ and $r_a = 90$, $r = 95.00$. Thus (15) also works well for two-dimensional narrow rings if the $R$ in $G$ is replaced by the average radius.
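The segment boundaries of eqs. (10) to (12) and the critical values of point (vii) can be checked numerically. The sketch below assumes the R = 30 nm parameters from the text and Table 1; the function names are hypothetical, and the formulas are coded from the boundary condition that neighboring ground-state energies in eq. (3) are equal.

```python
import math

# Assumed parameters (R = 30 nm, values quoted in the text and Table 1).
R = 30.0
G = 604.8 / R**2            # meV
MU = 33.53 / R**2           # meV
DELTA_CA = 2.630 - 2.626    # E_c - E_a in meV


def _parity_sign(L):
    """(-1)^L for an integer L, including negative L."""
    return 1 if L % 2 == 0 else -1


def phi_right(L):
    """Right boundary of the segment with angular momentum L, eq. (10)."""
    prefactor = 1.0 / (1.0 - (MU / (2.0 * G)) ** 2)
    bracket = (1.0 - MU * DELTA_CA / G**2 - 2.0 * L
               + _parity_sign(L) * (2.0 * DELTA_CA + MU * (L - 0.5)) / G)
    return prefactor * bracket / 4.0


def phi_left(L):
    """Left boundary of the segment with angular momentum L, eq. (11)."""
    prefactor = 1.0 / (1.0 - (MU / (2.0 * G)) ** 2)
    bracket = (-1.0 - MU * DELTA_CA / G**2 - 2.0 * L
               - _parity_sign(L) * (2.0 * DELTA_CA + MU * (L + 0.5)) / G)
    return prefactor * bracket / 4.0


def segment_length(L):
    """Length of the segment with angular momentum L, eq. (12)."""
    prefactor = 1.0 / (1.0 - (MU / (2.0 * G)) ** 2)
    return prefactor * (1.0 + _parity_sign(L) * (2.0 * DELTA_CA + MU * L) / G) / 2.0


def critical_L():
    """L_crit = I_X + 1, with I_X the largest even integer below -(G + 2(E_c - E_a))/mu."""
    bound = -(G + 2.0 * DELTA_CA) / MU
    i_x = math.ceil(bound) - 1      # largest integer strictly below the bound
    if i_x % 2 != 0:
        i_x -= 1                    # step down to the nearest even integer
    return i_x + 1


if __name__ == "__main__":
    l_crit = critical_L()
    print("L_crit =", l_crit, " Phi_crit =", round(phi_left(l_crit), 3))
```

With these inputs the sketch gives `L_crit = -19` and a critical flux close to the value 9.003 quoted in point (vii), and `segment_length` changes sign between the even values L = -18 and L = -20.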
It is noted that the band structure and related optical properties of 2-e rings have already been studied in detail by Wendler and coauthors [18]. They classify the eigenstates according to their radial motion, relative angular motion, and collective rotation. In our paper the relative angular motion is further classified into four types according to the inherent nodal structures and periodicity of the wave functions, i.e., according to whether the DB shape is allowed and whether the wave function is continuous at $\varphi = 2\pi$. The DB-accessibility turns out to be important because it affects the eigenenergies decisively. In fact, the classification of states based on inherent nodal structures was found to be crucial in atomic physics [19]; this should also be true in two-dimensional systems. Furthermore, a selection rule for the dipole transition has been proposed in ref. [18]. In our paper, an additional rule (namely, eq. (7)) is proposed, based on the possible transitions of internal structures. This rule affects the dipole spectrum strongly because the emission (absorption) is thereby concentrated into two frequencies. The difference of these two frequencies turns out to be proportional to the persistent current. Therefore a measurement of this difference can be used to determine the magnitude of the current.

In summary, we have studied the FABO both analytically and numerically. The analytical formalism provides not only a basis for qualitative understanding, but also a number of formulae for quantitative description. The domain of $\Phi$ is divided into segments, each corresponding to an $L_o$. This division describes exactly how $L_o$ changes with $\Phi$, which directly causes the FABO. Thereby the variation of the period and amplitude of the oscillation of the GS energy, the persistent current, and the frequencies of the dipole transition in the FABO region can be described exactly.
A number of equalities relating the physical quantities and dynamical parameters have been found. In particular, a new oscillation, namely the oscillation of $\Delta\hbar\omega$, was found to match exactly the oscillation of $J_o$. Since the photon energies can be measured more accurately, other observables and parameters can thereby be determined via the equalities. Since the separability of the Hamiltonian and the existence of inherent nodes are common, the above description can be more or less generalized to N-electron rings; this deserves further study.

Acknowledgment: This work is supported by the NSFC of China under the grants 10574163 and 90306016.

REFERENCES
[1] A. Lorke, R.J. Luyken, A.O. Govorov, J.P. Kotthaus, J.M. Garcia, and P.M. Petroff, Phys. Rev. Lett. 84, 2223 (2000).
[2] U.F. Keyser, C. Fühner, S. Borck, R.J. Haug, M. Bichler, G. Abstreiter, and W. Wegscheider, Phys. Rev. Lett. 90, 196601 (2003).
[3] D. Mailly, C. Chapelier, and A. Benoit, Phys. Rev. Lett. 70, 2020 (1993).
[4] A. Fuhrer, S. Lüscher, T. Ihn, T. Heinzel, K. Ensslin, W. Wegscheider, and M. Bichler, Nature (London) 413, 822 (2001).
[5] S. Viefers, P. Koskinen, P. Singha Deo, and M. Manninen, Physica E 21, 1 (2004).
[6] K. Niemelä, P. Pietiläinen, P. Hyvönen, and T. Chakraborty, Europhys. Lett. 36, 533 (1996).
[7] M. Korkusinski, P. Hawrylak, and M. Bayer, Phys. Stat. Sol. B 234, 273 (2002).
[8] Z. Barticevic, G. Fuster, and M. Pacheco, Phys. Rev. B 65, 193307 (2002).
[9] M. Ferconi and G. Vignale, Phys. Rev. B 50, 14722 (1994).
[10] Li. Serra, M. Barranco, A. Emperador, M. Pi, and E. Lipparini, Phys. Rev. B 59, 15290 (1999).
[11] A. Emperador, F. Pederiva, and E. Lipparini, Phys. Rev. B 68, 115312 (2003).
[12] C.G. Bao, G.M. Huang, and Y.M. Liu, Phys. Rev. B 72, 195310 (2005).
[13] A.E. Hansen, A. Kristensen, S. Pedersen, C.B. Sorensen, and P.E. Lindelof, Physica E (Amsterdam) 12, 770 (2002).
[14] K. Moulopoulos and M. Constantinou, Phys. Rev. B 70, 235327 (2004).
[15] J. Planelles, J.I. Climente, and J.L. Movilla, arXiv:cond-mat/0506691 (2005).
[16] J.I. Climente and J. Planelles, Phys. Rev. B 72, 155322 (2005).
[17] A.O. Govorov, S.E. Ulloa, K. Karrai, and R.J. Warburton, Phys. Rev. B 66, 081309 (2002).
[18] L. Wendler, V.M. Fomin, A.V. Chaplik, and A.O. Govorov, Phys. Rev. B 54, 4794 (1996).
[19] M.D. Poulsen and L.B. Madsen, Phys. Rev. A 72, 042501 (2005).
<|endoftext|><|startoftext|> Introduction

Being rare or ‘exotic’ is a relative phenomenon. From a Samoan point of view Burushaski is an extremely exotic language, but from the point of view of Telugu much less so. In this brief note we want to look at how different and how similar languages turn out to be in pairwise comparisons, and at the role that genealogical relatedness plays in this regard. We are interested in knowing whether there is a cut-off point Shigh in the amount of similarity such that we can be sure that language pairs with more than Shigh similarities are all generally thought to be related, and also whether there is a cut-off point Slow at the other end of the scale such that all languages having fewer similarities than Slow are thought to be unrelated. In other words, if a language is ‘normal’ relative to some other language (as Burushaski is to Telugu), does this imply that the two languages are related according to commonly accepted classifications?
Or, if two languages are mutually very exotic (as Burushaski and Samoan are), does this imply that they are not thought to be related in commonly accepted classifications?

The data we use, as well as the genealogical classification, are from the World Atlas of Language Structures (Haspelmath et al., ed., henceforth WALS). The conclusions must of course be seen in relation to this particular dataset. Thus, when we observe a certain amount of typological similarity between two languages, this is strictly and only similarity in terms of the kinds of features investigated in WALS. The dataset includes 134 nonredundant features, each of which has from two to nine discrete values. All of these are quite generic typological features. Our conclusions are also limited by the amount of data available. We have required that for any language pair in our sample there should be 45 or more features attested for both members of the pair (a motivation for this precise number follows shortly). This has limited our sample to 320 languages and 29,810 pairs of languages compared. Among these pairs, there are 1,099 which are related according to the classification used in WALS. Henceforth we substitute ‘related’ for the more cumbersome ‘related according to the WALS classification’. We follow this classification because it seeks to meet a consensus view.

(Footnote: We would like to thank Bernard Comrie, Cecil Brown, and Dietrich Stauffer for comments on this manuscript.)

1. Results

Figure 1 presents the overall results of the investigation. As can be seen, the more similar languages get, the greater the probability is that they are related. The figures on which the curve is based are presented in Table 1. Percent similarity was defined as the percentage of attested features for which both languages have the same value. We have binned language pairs in 5% intervals from 10% to 90% similarity. For the plot in Figure 1 the mean percent similarity in each interval was used.
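The similarity measure and the binning just described can be sketched in a few lines. This is a minimal illustration, assuming each language is represented as a mapping from feature identifiers to discrete values with unattested features simply absent; the function names and the toy data are hypothetical and are not taken from WALS.

```python
import math
from typing import Dict, Optional

MIN_SHARED_FEATURES = 45  # threshold used in the text


def percent_similarity(lang_a: Dict[str, int],
                       lang_b: Dict[str, int],
                       min_shared: int = MIN_SHARED_FEATURES) -> Optional[float]:
    """Percentage of features attested in both languages that have the same value.

    Returns None when fewer than `min_shared` features are attested for both
    languages, i.e. when the pair would be excluded from the sample.
    """
    shared = [feature for feature in lang_a if feature in lang_b]
    if len(shared) < min_shared:
        return None
    matches = sum(1 for feature in shared if lang_a[feature] == lang_b[feature])
    return 100.0 * matches / len(shared)


def similarity_bin(percent: float, width: float = 5.0) -> float:
    """Lower edge of the 5% interval (10.0, 15.0, ...) containing `percent`."""
    return width * math.floor(percent / width)


if __name__ == "__main__":
    # Toy data: 48 shared features, of which 5 have identical values.
    a = {f"F{i}": 1 for i in range(48)}
    b = {f"F{i}": (1 if i < 5 else 2) for i in range(48)}
    sim = percent_similarity(a, b)
    print(round(sim, 1), similarity_bin(sim))
```

In the toy example, 5 matching values out of 48 shared features give about 10.4%, which falls in the 10.0-14.9 interval.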
Table 1 gives some additional information: it also shows how many language pairs belong in each interval. This is important for the interpretation of the results, as we shall see shortly.

Before giving our interpretation let us explain why we have chosen the criterion that language pairs should have 45 or more features attested for both languages. It turns out that for a criterion of 30 or more features the curve is rather similar but not quite as steep, showing less dependence between the amount of similarity and the probability of finding related pairs. This indicates that the fewer features one operates with, the more prominent is random sampling variability in percent similarity. When operating with a criterion of 60 or more attested features the curve becomes uneven, indicating that the higher criterion passes too few pairs for stable results. This becomes even more pronounced when the criterion is 75 or more features. Obviously, with a more extended database the number of features taken to be criterial could be raised, but 45 is a number that suits the data available in WALS.

Figure 1. The probability of finding related languages (vertical axis: probability of finding related languages; horizontal axis: % similarity, 0 to 100).

Table 1. Data. (%SIM = % typological similarity between members of pairs; %REL = % of language pairs that are related; PAIRS = number of language pairs in range)

%SIM         %REL    PAIRS
10.0-14.9    0       11
15.0-19.9    0       91
20.0-24.9    0       443
25.0-29.9    0.26    1566
30.0-34.9    0.33    3904
35.0-39.9    0.4     6019
40.0-44.9    1.2     6772
45.0-49.9    3.26    4873
50.0-54.9    6.68    3520
55.0-59.9    15.41   1551
60.0-64.9    23.72   666
65.0-69.9    38.24   238
70.0-74.9    54.26   94
75.0-79.9    61.54   39
80.0-84.9    85      20
85.0-89.9    100     2
90.0-94.9    100     1

It may be of interest to mention the language pairs that fall in the lower and upper ranges of the percentage of shared values.
Collectors of linguistic trivia may find it interesting that the members of the most divergent language pair in the world (in our dataset), i.e. Tümpisa Shoshone and Wari’, are found in the same general area, namely the Americas. They may also note that someone who is tired of Romance linguistics should turn to Nivkh, and that someone fed up with Swedish should visit the Koasatis, when looking for something as radically different as it gets. Lists of the 20 most divergent language pairs and the 20 most similar ones are provided in Tables 2 and 3.

Table 2. The 20 most divergent language pairs in the sample

Language A               Language B           Number of features compared    % Similarity
Tümpisa Shoshone         Wari'                48                             10.4
Archi                    Tukang Besi          46                             13
Maybrat                  Limbu                45                             13.3
Italian                  Nivkh                51                             13.7
Burushaski               Samoan               49                             14.3
Tzutujil                 Burmese              49                             14.3
Ju|'hoan                 Yup'ik (Central)     56                             14.3
Maybrat                  Tamil                55                             14.5
Nubian (Dongolese)       Acehnese             48                             14.6
Swedish                  Koasati              47                             14.9
Klamath                  Wari'                47                             14.9
Kongo                    Ladakhi              46                             15.2
Bashkir                  Maori                46                             15.2
Berber (Middle Atlas)    Waorani              45                             15.6
Lango                    Archi                45                             15.6
Archi                    Thai                 45                             15.6
Thai                     Retuarã              45                             15.6
Ijo (Kolokuma)           Kutenai              50                             16
Kongo                    Evenki               56                             16.1
Arabic (Egyptian)        Tümpisa Shoshone     48                             16.7

Table 3. The 20 most similar language pairs in the sample

Language A        Language B         Relatedness                      Number of features compared    % Similarity
Lango             Luo                same genus                       46                             80.4
Luvale            Zulu               same genus                       97                             80.4
Khmer             Vietnamese         same family, different genera    89                             80.9
Vietnamese        Thai               different families               110                            80.9
Khalkha           Tuvan              same family, different genera    48                             81.3
Lithuanian        Russian            same family, different genera    64                             81.3
Greek (Modern)    Bulgarian          same family, different genera    64                             81.3
Khmer             Thai               different families               91                             81.3
Polish            Russian            same genus                       71                             81.7
Russian           Serbian-Croatian   same genus                       45                             82.2
Swahili           Zulu               same genus                       107                            82.2
Dagur             Turkish            same family, different genera    46                             82.6
Telugu            Kannada            same family, different genera    47                             83
Kongo             Nkore-Kiga         same genus                       48                             83.3
Dutch             German             same genus                       56                             83.9
Italian           Spanish            same genus                       63                             84.1
Drehu             Iaai               same genus                       46                             84.8
English           Swedish            same genus                       60                             85
French            Italian            same genus                       64                             85.9
Hindi             Panjabi            same genus                       49                             91.8

While Table 2 does not point in any specific direction and remains a curiosity, Table 3 provides fragments of information which fit into the larger picture that emerges from our study. We note that two pairs of unrelated languages, Vietnamese-Thai and Khmer-Thai, turn up in this list, which otherwise consists of genealogically related language pairs. Furthermore, the rest of the pairs represent a mixture of languages related to different degrees (see Dryer 1992, 2005 for a definition of ‘genera’).

Returning to Figure 1 and the associated data in Table 1, let us proceed to overall interpretations. We set out asking whether there is some degree of similarity in typological profiles beyond which it is certain that languages are related. The answer is positive, but nevertheless discouraging. Members of language pairs in the sample that are 81.5% or more similar are all related. But only 12 pairs of languages are that similar, in spite of the fact that there are 1099 pairs of related languages in the sample!
On the other hand, if fewer than 25% of the feature values are shared, all language pairs are unrelated, and this goes for 545 pairs in the sample. If one allows for a very small margin of error (around 1%), it can be predicted that less than 40% shared feature values implies unrelatedness. That goes for 12,034 language pairs in the sample, close to half of the total of 29,810. Thus, lack of similarity is a good predictor of unrelatedness, but presence of similarity is a bad predictor of relatedness.

2. Are there ways of improving the results?

We next consider the question of whether the prediction of relatedness could be improved somehow. In other studies (Holman et al. 2006a,b, Brown et al. 2006) we have made exact quantitative explorations of the relationship between typological similarity and geographical distance among languages. Not surprisingly, the greater the geographical proximity between languages, the more similar they tend to be (this goes for both related and unrelated languages). If one takes the areal factor into account, this might move the cut-off point and allow more accurate predictions of relatedness. Testing this strategy was unsuccessful. We were not able to obtain markedly different results by adjusting the measure of similarity relative to geographical distance: the correlation between adjusted and unadjusted measures was 0.96. The reason for this is probably that the distance measure, as given in the WALS database, identifies the location of a given language (roughly) with its center of extension. This means that some neighbouring languages, such as German and Dutch, are treated as having a certain distance between them when in reality they don’t have any. The more widespread the languages compared are, the bigger this problem gets.
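The similarity percentages and cut-offs discussed so far can be made concrete with a short sketch. It is only an illustration, not the authors' implementation: it assumes that "% Similarity" is the share of identical values among the features attested for both languages, that pairs with fewer than 45 jointly attested features are skipped, and that language profiles are plain dictionaries. The function names `percent_similarity` and `prediction` and the toy profiles are hypothetical.

```python
def percent_similarity(profile_a, profile_b, min_features=45):
    """Compare two typological profiles (dicts: feature id -> value).

    Returns (number of features compared, % similarity rounded to one
    decimal), or None when fewer than `min_features` features are attested
    in both languages. The minimum of 45 is an assumption taken from the
    smallest "features compared" counts reported in the tables.
    """
    shared = [f for f in profile_a if f in profile_b]
    if len(shared) < min_features:
        return None
    same = sum(1 for f in shared if profile_a[f] == profile_b[f])
    return len(shared), round(100.0 * same / len(shared), 1)


def prediction(similarity):
    """Read the reported cut-offs as a decision rule (illustrative only)."""
    if similarity >= 81.5:
        return "related"             # all such pairs in the sample are related
    if similarity < 25.0:
        return "unrelated"           # no related pair falls below 25%
    if similarity < 40.0:
        return "probably unrelated"  # holds with a margin of error around 1%
    return "undetermined"            # similarity alone does not decide


# Toy profiles: 48 shared features, 5 of them with identical values.
lang_a = {f: 0 for f in range(48)}
lang_b = {f: (0 if f < 5 else 1) for f in range(48)}
n_compared, sim = percent_similarity(lang_a, lang_b)
print(n_compared, sim, prediction(sim))  # prints: 48 10.4 unrelated
```

The toy pair reproduces the arithmetic behind the first row of Table 2 (5 of 48 features shared gives 10.4%). The large "undetermined" band between 40% and 81.5% is where similarity alone does not predict relatedness.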
Since it is impossible to provide an adequate measure of geographical distance for 29,810 language pairs without taking recourse to a mechanical measure of distance from one WALS dot to another, it is not viable to improve on the cut-off point in such a way.

Also, the 134 features differ appreciably in the distribution of rarity and commonness among their values. It is possible to imagine that taking into account the relative rarity of feature values might improve the predictions. We again failed to obtain markedly different results by adjusting the measure of similarity relative to differences among features: the correlation between adjusted and unadjusted measures was 0.98. The probable reason is that differences among features tend to average out in a sample of at least 45 attested features.

Another strategy to try to improve the power of prediction concerning relatedness would be to weight different features or values of features according to their stability. We have explored ways of measuring stability and have come up with a ranked order of stability for WALS features (Wichmann et al. 2006). Conceivably, if the features shared among languages were weighted for their stability, the cut-off point could be pushed a bit. We expect, however, that the results would be similar to the results for taking into account rarity, since stable and unstable features would also average out.

A final strategy to improve the results would be to take into account the areality of features. The linguistic typological literature abounds with statements concerning the susceptibility to diffusion of certain features as opposed to others. In practice, however, it turns out to be virtually impossible to define areas and measure areality in a consistent way. A major contribution of WALS has been to show that most typological features are ‘areal’ to various extents.
Browsing the maps will make it clear to anyone that almost any feature can spread and that whatever features diffuse are the features that happen to exist in an area. Thus, ‘areality’ is not amenable to quantification in any straightforward way.

3. Deviant language pairs

The results reported on in Figure 1 and Table 1 show that there are a few pairs of languages which are related even though they show less than 40% similarity, which is the point below which pairs tend overwhelmingly not to be related. It serves the record to provide a list of the pairs of related languages that are deviant in the sense that they show fewer similarities than related languages normally do. This list is provided in Table 4.

Table 4. Related languages that have unusually different typological profiles (less than 40% similarities)

Language A                Language B           Language family   Number of features compared   % Similarity
Luvale                    Ijo (Kolokuma)       Niger-Congo       52                            28.8
Zulu                      Ijo (Kolokuma)       Niger-Congo       52                            28.8
Maidu (Northeast)         Tsimshian (Coast)    Penutian          48                            29.2
Ngiti                     Koyra Chiini         Nilo-Saharan      47                            29.8
Yoruba                    Ijo (Kolokuma)       Niger-Congo       51                            31.4
Mundari                   Semelai              Austro-Asiatic    66                            31.8
Swahili                   Ijo (Kolokuma)       Niger-Congo       50                            32
Maung                     Yidiny               Australian        81                            32.1
Mundari                   Khmer                Austro-Asiatic    78                            32.1
Koyraboro Senni           Murle                Nilo-Saharan      65                            32.3
Koromfe                   Ijo (Kolokuma)       Niger-Congo       49                            32.7
Beja                      Margi                Afro-Asiatic      45                            33.3
Sango                     Ijo (Kolokuma)       Niger-Congo       51                            33.3
Nandi                     Koyraboro Senni      Nilo-Saharan      47                            34
Nandi                     Koyra Chiini         Nilo-Saharan      52                            34.6
Marathi                   Spanish              Indo-European     52                            34.6
Margi                     Amharic              Afro-Asiatic      49                            34.7
Mundari                   Vietnamese           Austro-Asiatic    88                            35.2
Garo                      Cantonese            Sino-Tibetan      51                            35.3
Berber (Middle Atlas)     Kera                 Afro-Asiatic      65                            35.4
Irish                     Marathi              Indo-European     45                            35.6
Paamese                   Acehnese             Austronesian      45                            35.6
Limbu                     Mandarin             Sino-Tibetan      45                            35.6
Mandarin                  Bawm                 Sino-Tibetan      76                            36.8
Ijo (Kolokuma)            Diola-Fogny          Niger-Congo       46                            37
Ngiti                     Nubian (Dongolese)   Nilo-Saharan      54                            37
Miwok (Southern Sierra)   Tsimshian (Coast)    Penutian          62                            37.1
Mundari                   Khmu'                Austro-Asiatic    70                            37.1
Bagirmi                   Nubian (Dongolese)   Nilo-Saharan      64                            37.5
Beja                      Hausa                Afro-Asiatic      82                            37.8
Koromfe                   Kisi                 Niger-Congo       45                            37.8
Yidiny                    Tiwi                 Australian        90                            37.8
Limbu                     Meithei              Sino-Tibetan      45                            37.8
Kera                      Amharic              Afro-Asiatic      50                            38
Zulu                      Yoruba               Niger-Congo       104                           38.5
Beja                      Kera                 Afro-Asiatic      57                            38.6
Ngiyambaa                 Maranungku           Australian        74                            39.2
Malagasy                  Acehnese             Austronesian      56                            39.3
Ngiti                     Nandi                Nilo-Saharan      48                            39.6
Lugbara                   Lango                Nilo-Saharan      53                            39.6
Fur                       Ngiti                Nilo-Saharan      58                            39.7

Experts in the different families involved will surely have good explanations for these deviant cases. In some cases a pair may in reality not belong to the same family, as in the case of large and not altogether uncontroversial families such as Australian and Nilo-Saharan. In other cases, such as the two pairs featuring Marathi, a wide separation both temporally and geographically and interaction with widely different types of languages may conspire to make a related pair stand out as unusually different. In any case, measuring the amount of typological similarity provides a clue that ‘something is going on’: either the classification is potentially wrong or heavy language contact is involved. So the method of comparing typological profiles is potentially useful for someone wishing to probe into the behavior of different languages within a proposed family.

4. Conclusions

The results reported on in this note were, in part, unsurprising and, in part, unexpected. Figure 1 showed a close correlation between relatedness and typological similarity. This is what we had expected. But we also expected to find some minimal amount of typological similarity among language pairs which would suffice to predict that two languages are related. It turned out to be the case, however, that the amount of similarity required to make this prediction is so high (81.5%) that only a few language pairs qualify.
In practice, this means that typological features such as those of WALS are not useful for identifying relatedness among languages when it comes to comparisons of single pairs (when groups of languages are compared the situation may be different, but this issue is beyond the scope of this paper). At the other end of the scale we found that typological dissimilarity is a good predictor of unrelatedness: with only a small margin of error one can predict that languages which have 60% or more differences are not related according to the WALS classification. Our finding that a certain amount of typological difference can be used to predict that languages are not commonly believed to be related means that typological differences are a yardstick for gauging the limits of the traditional comparative method.

While it was not surprising to find a correlation between relatedness and the amount of typological difference among language pairs, this finding may nevertheless steer us in new directions. Presumably there is a correlation between the amount of shared basic vocabulary and relatedness as well. If so, the amount of shared basic vocabulary and the amount of typological similarity should also be correlated, and it may even be possible to start considering whether there is such a thing as a ‘typological clock’ such that the time of separation of languages of a given family may be inferred from the amount of typological differences within the family. The fact that unrelated languages may be as similar typologically as related ones indicates that for a ‘typological clock’ to work reasonably well, several pairwise comparisons should be made. How, in practice, this kind of methodology could be developed would be an item for future research.

References

Brown, Cecil H., Eric W. Holman, Christian Schulze, Dietrich Stauffer, and Søren Wichmann. 2006. Are similarities among languages of the Americas due to diffusion or inheritance? An exploration of the WALS evidence.
Paper presented at the conference “Genes and Languages”, University of California Santa Barbara, September 8-10, 2006.

Dryer, Matthew S. 1992. The Greenbergian word order correlations. Language 68:81-138.

Dryer, Matthew S. 2005. “Genealogical language list,” in The World Atlas of Language Structures, edited by Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie, pp. 584-643. Oxford: Oxford University Press.

Haspelmath, Martin, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.). 2005. The World Atlas of Language Structures. Oxford: Oxford University Press.

Holman, Eric W., Dietrich Stauffer, Christian Schulze, and Søren Wichmann. 2006. On the relation between structural diversity and geographical distance among languages: observations and computer simulations. (Revised version under review for Linguistic Typology).

Holman, Eric W., Søren Wichmann, and Cecil H. Brown. 2006. Linguistic and cultural diffusion in a comparative perspective. Submitted.

Wichmann, Søren, Eric W. Holman, and Hans-Jörg Bibiko. 2006. How computer simulations may help linguists: recent progress and prospects for more. Paper presented at the conference “Language and physics”, Warsaw, September 11-15, 2006.

ABSTRACT No abstract given; compares pairs of languages from World Atlas of Language Structures.
<|endoftext|><|startoftext|> Introduction

Nonlinear partial differential equations (PDEs) play a very important role in many fields of mathematics, physics, chemistry, and biology, and have numerous applications. While for nonlinear ordinary differential equations (ODEs) one can observe incontestable progress in their automatic solving, the situation for nonlinear PDEs seems nearly hopeless.
Despite the fact that various methods for solving nonlinear PDEs were developed in the 19th and 20th centuries, such as suitable groups of transformations (point or contact transformations), differential substitutions, and Bäcklund transformations, the most powerful method for explicit integration of second-order nonlinear PDEs in two independent variables remains the method of Darboux [1]-[4]. The original Darboux method (as Darboux already stated in [1]) is extendable in principle to equations of all orders in an arbitrary number of independent variables, even to systems of equations; however, in [1]-[2] and subsequent papers by many authors, the detailed calculations were performed only for a single second-order equation with one dependent and two independent variables.

http://arxiv.org/abs/0704.0072v1

The Darboux method was refined in recent years into a more precise and efficient (although not completely algorithmic) form; see [5]-[8] and references therein. Nevertheless, these approaches suffer from high complexity and require some tricks. There were some partially successful attempts to extend modern variants of the Darboux method, based on the Laplace cascade method, to higher-order PDEs and to PDEs with more than two independent variables [10]-[13], but they suffer from high complexity too. There is an original approach to the problem, proposed in [14], based on a special type of local change of variables which leads to an order reduction of the initial PDE; it is suitable for high-dimensional problems, though only for a very special class.

In the present paper we propose a seemingly new method for finding solutions of some types of nonlinear PDEs in closed form. The method is based on the decomposition of nonlinear operators into a sequence of operators of lower orders. It is shown that the decomposition process can be done by iterative procedure(s), each step of which is reduced to the solution of some auxiliary PDE system(s) for one dependent variable.
Moreover, we find in this way the explicit expression of the first-order PDE(s) for the first integral of a decomposable initial PDE. Remarkably, this first-order PDE is linear if the initial PDE is linear in its highest derivatives.

The developed method is implemented in a Maple procedure, which can actually solve many PDEs of different orders with different numbers of independent variables. Examples of PDEs with their calculated general solutions demonstrate the potential of the method for automatic solving of nonlinear PDEs.

2 Bases of the method

2.1 Decomposable PDEs

The simplest second-order non-linear PDE for w = w(t, x)

\[ w \frac{\partial^2 w}{\partial t\,\partial x} - \frac{\partial w}{\partial t}\,\frac{\partial w}{\partial x} = 0 \tag{1} \]

can be easily transformed to the following decomposed form

\[ \frac{\partial}{\partial t}\left( \frac{1}{w}\,\frac{\partial w}{\partial x} \right) = 0 , \tag{2} \]

from which we can without difficulty obtain the general solution to PDE (1) in two steps. The first step gives us

\[ \frac{1}{w}\,\frac{\partial w}{\partial x} = \frac{d \ln(G(x))}{dx} , \tag{3} \]

where G(x) is an arbitrary function. Then, solving equation (3) in the second step, we obtain

\[ w(t, x) = F(t)\,G(x) , \tag{4} \]

where F(t) is one more arbitrary function.

The main observations on analyzing the grounds of solvability of the PDE (1) by the above method are the following.

1. The PDE (1) is "decomposable", i.e., it can be represented as a composition of successive differential operators of type (5) (not necessarily linear). It is clear that this type of decomposition can be done for some PDEs of any order and with any number of independent variables in the following manner

\[ D_1(w) = u_1 , \quad D_2(u_1) = u_2 , \quad \dots , \quad D_n(u_{n-1}) = 0 , \tag{5} \]

where \vec{x} = (x_1, \dots, x_m), w = w(\vec{x}), u_i = u_i(\vec{x}) and

\[ D_i(u) = V_i\left( \vec{x}, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_m} \right) . \]

Assuming that the V_i are arbitrary functions, and eliminating the u_i by successive substitutions in system (5), we get a family of PDEs for w of nth order

\[ D_n(D_{n-1}(\dots D_1(w) \dots)) = 0 , \tag{6} \]

which are "decomposable", and in principle their solutions, general or particular, can be obtained by integration of the split system (5). The PDE (6) is nonlinear if at least one of the operators D_i is nonlinear. Not all PDEs admit such a representation. And in positive cases such a representation is not unique in general.

Note that, as a matter of fact, the D_i need not be first-order differential operators. So the composition procedure for an nth-order PDE, when n > 2, can be as follows

\[ D_1^{n_1}(w) = u , \quad D_2^{n_2}(u) = 0 , \tag{7} \]

where n_1, n_2 are integers and n_1 + n_2 = n, w = w(\vec{x}), u = u(\vec{x}), and (k ≤ j)

\[ D_i^{j}(u) = V_i\left( \vec{x}, u, \frac{\partial u}{\partial x_1}, \dots, \left. \frac{\partial^k u}{\partial x_1^{k_1} \cdots \partial x_m^{k_m}} \right|_{k_1+\dots+k_m=k\le j}, \dots, \frac{\partial^j u}{\partial x_m^j} \right) . \]

The latter representation allows us to carry out the PDE's decomposition or order reduction gradually, bit by bit. We have to stress here that in general the representations (5) and (7) may have different meanings. For example, some PDEs do not admit representation (5) but permit the form (7) with both DEs solvable.

2. Each step of the solving process for a decomposed PDE requires solving a differential equation D_i(u_{i-1}) = u_i (or D_i^{j}(u_{i-1}) = u_i), so all such DEs must be solvable. Note that only the first step, D_n(u_{n-1}) = 0, is free from arbitrary functions.

So one of the strategies for solving PDEs may be as follows. First of all we try to decompose the given PDE. In order to do so we have to solve the corresponding auxiliary nonlinear PDE system for the unknown functions V_i; it is sufficient to find a particular solution here. If this is successful, then, deciding between the variants, we try to solve each DE arising from the chain (5). The main obstacle here, beginning at the second step, is the just mentioned necessity to solve DEs with arbitrary functions. There is a rather narrow circle of DEs with an arbitrary function as a parameter that are solvable (in the sense of general solutions).

Another (classification) approach can be based on the usage of only solvable DEs. That is, we can form a composition of successive solvable differential operators and as a result obtain families of solvable PDEs. This way leads to extensive nontrivial families of different types of nonlinear PDEs whose general solutions can be expressed in closed form.
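The composition idea can be tried out symbolically. The following SymPy sketch is only an illustration under stated assumptions: the two first-order operators `D1` and `D2` are chosen here by hand (they are not taken from the paper's text), so that the chain D1(w) = u, D2(u) = 0 reproduces the product solution w = F(t)G(x) of Section 2.1. The script composes the operators into a single second-order PDE and checks that the product form satisfies it.

```python
import sympy as sp

t, x = sp.symbols("t x")
w = sp.Function("w")(t, x)
F = sp.Function("F")(t)   # arbitrary function of t
G = sp.Function("G")(x)   # arbitrary function of x


def D1(v):
    """Assumed first operator: logarithmic x-derivative, D1(v) = v_x / v."""
    return sp.diff(v, x) / v


def D2(u):
    """Assumed second operator: plain t-derivative, D2(u) = u_t."""
    return sp.diff(u, t)


# Eliminating u from D1(w) = u, D2(u) = 0 gives the composed PDE D2(D1(w)) = 0.
# Multiplying by w**2 clears the denominators.
composed = sp.expand(D2(D1(w)) * w**2)
print(composed)  # the left-hand side of the composed second-order PDE

# Solving the chain backwards gives u = g(x) and then w = F(t)*G(x);
# substituting the product form must therefore make the residual vanish.
residual = sp.simplify(D2(D1(F * G)))
print(residual)
```

Other choices of solvable operators give other members of the family; the same two lines (compose, then substitute a candidate) can be reused to test them.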
But in this way we encounter the difficulty of circumscribing such families as a whole and are forced to consider particular subfamilies. Nevertheless it yields an extensive field of PDEs for testing methods [15].

2.2 Decomposition algorithm for decomposable PDEs

For an nth-order PDE with n > 2 there are some slightly different approaches, which are dictated by the goals of the problem. If the goal is to decompose a given nonlinear operator, then we have to use the scheme (7) with n_1 = 1, n_2 = n − 1. Conversely, we have to use the scheme (7) with n_1 = n − 1, n_2 = 1 if the goal is to solve the given PDE. The latter procedure in some features resembles the well-known technique of reducing the order of ODEs, e.g., by the first integral method. Of course, it is possible to use intermediate cases. All the above cases can be treated in the same way as we consider below, but each of them leads to auxiliary PDE systems of different order, viz. n_2 + 1, with corresponding computational complexity. In the sequel we will consider, for brevity, only the case n_1 = n − 1, n_2 = 1, as the more practical one for solving PDEs.

Let us consider the decomposition of type (7) with D_1^{n-1}(w) as a solution of the following equation with respect to u = u(\vec{x})

\[ J\left( u, \vec{x}, w, \frac{\partial w}{\partial x_1}, \dots, \left. \frac{\partial^k w}{\partial x_1^{k_1} \cdots \partial x_m^{k_m}} \right|_{k_1+\dots+k_m=k\le n-1}, \dots, \frac{\partial^{n-1} w}{\partial x_m^{n-1}} \right) = 0 \tag{8} \]

and

\[ D_2(u) = V\left( \vec{x}, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_m} \right) . \tag{9} \]

If we substitute u = D_1^{n-1}(w) into (9) we obtain the decomposable n-th order PDE

\[ V(\vec{x}, U_0, U_{x_1}, \dots, U_{x_m}) = 0 , \tag{10} \]

where (we use below the notation w = W_0 and \partial^k w / (\partial x_1^{k_1} \cdots \partial x_m^{k_m}) = W_{k_1,\dots,k_m})

\[ D_1^{n-1}(w) = U_0 , \tag{11} \]

\[ \frac{\partial U_0}{\partial x_i} + \sum \frac{\partial U_0}{\partial W_{k_1,\dots,k_m}}\, W_{k_1^*,\dots,k_m^*} = U_{x_i} \quad (i = 1, \dots, m) , \tag{12} \]

where k_j^* = k_j + 1 if j = i and k_j^* = k_j otherwise, and it is supposed that the differentiations in the sum are carried out over all indexed W's which are involved in U_0.

Here we can introduce U_0 and U_{x_1}, \dots, U_{x_m} as new independent variables if we express m variables from the set {W_{k_1,\dots,k_m}} with k_1 + \dots + k_m = n using the linear system (12). Assuming that the given PDE of order n

\[ F\left( \vec{x}, w, \frac{\partial w}{\partial x_1}, \dots, \left. \frac{\partial^k w}{\partial x_1^{k_1} \cdots \partial x_m^{k_m}} \right|_{k_1+\dots+k_m=k\le n}, \dots, \frac{\partial^{n} w}{\partial x_m^{n}} \right) = 0 \tag{13} \]

is decomposable, we find that, after substitution of the new variables, the left-hand side of the given PDE must turn into (10) with some V. The left-hand side of the given PDE expressed in the new variables is a first-order differential expression with respect to

\[ J(U_0, \vec{x}, W_0, W_{1,0,\dots,0}, \dots, W_{k_1,\dots,k_m}|_{k_1+\dots+k_m=k\le n-1}, \dots, W_{0,0,\dots,n-1}) \]

and must not depend on any of the indexed W's; that is, the derivatives of F expressed in the new variables with respect to all indexed W's are equal to zero. The sequence of such derivatives of F equated to zero forms a second-order PDE system for J. So a solution (a particular one as well) of this PDE system gives a possible expression of the differential operator D_1^{n-1}(w) through (8), and of the differential operator D_2(u) by substituting the solution for J into the left-hand side of the given PDE expressed in the new variables.

Of course, there are problems where only an operator decomposition is required. But in most cases the obtained decomposition is intended for finding solutions of the given PDE. If in the obtained decomposition the corresponding PDE D_2(u) = 0 is solvable, then substitution of the obtained u into J expressed in the original variables gives us a first integral (see its definition in the next subsection) of the given PDE. It is easy to see that for decomposable PDEs the first integral is a differential equation, so we can try to solve it or to find a first integral for this new DE (or decompose it) by the scheme described above until we come to a first-order DE.

Remarkably, in the approach under consideration the finding of first integrals can be done more directly and effectively.

2.3 Differential equation for first integral of decomposable PDEs

The first integral I of the PDE is an expression, involving one arbitrary function, which is equivalent in some sense to the given PDE. The first integral vanishes on the set of solutions of the given PDE.
And (in accordance with [4]) all differential consequences of the equation I = 0 coincide with the respective differential consequences of the given PDE (e.g., elimination of the arbitrary function leads to the given PDE).

Our goal here is to find a PDE for the first integral of a decomposable PDE. To do so we first of all have to take into account that u(\vec{x}) is the solution of the corresponding PDE

\[ V\left( \vec{x}, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_m} \right) = 0 , \]

so u(\vec{x}) depends only on \vec{x} but in no way on the indexed W's. Secondly, the dependent variable in this case, namely

\[ J(u(\vec{x}), \vec{x}, W_0, W_{1,0,\dots,0}, \dots, W_{k_1,\dots,k_m}|_{k_1+\dots+k_m=k\le n-1}, \dots, W_{0,0,\dots,n-1}) \]

of the given PDE (13) expressed in the new variables, does not depend on U_{x_1}, \dots, U_{x_m} and is a first integral of the given PDE.

If we now consider u(\vec{x}) as an unknown function, we can denote the first integral

\[ I(\vec{x}, W_0, W_{1,0,\dots,0}, \dots, W_{k_1,\dots,k_m}|_{k_1+\dots+k_m=k\le n-1}, \dots, W_{0,0,\dots,n-1}) = J(u(\vec{x}), \vec{x}, W_0, W_{1,0,\dots,0}, \dots, W_{k_1,\dots,k_m}|_{k_1+\dots+k_m=k\le n-1}, \dots, W_{0,0,\dots,n-1}) \]

and, instead of (12) in the form

\[ \frac{ \dfrac{\partial J}{\partial x_i} + \sum \dfrac{\partial J}{\partial W_{k_1,\dots,k_m}}\, W_{k_1^*,\dots,k_m^*} }{ \dfrac{\partial J}{\partial U_0} } = -U_{x_i} \quad (i = 1, \dots, m) , \]

we arrive at the following system

\[ \frac{\partial I}{\partial x_i} + \sum \frac{\partial I}{\partial W_{k_1,\dots,k_m}}\, W_{k_1^*,\dots,k_m^*} = 0 \quad (i = 1, \dots, m) . \tag{14} \]

If we express m variables from the set {W_{k_1,\dots,k_m}} with k_1 + \dots + k_m = n (at least one of which actually occurs in the given PDE; note that as a rule there are several variants here, so we can obtain several consistent PDEs at this stage) using the linear system (14) and substitute them into the given PDE (13), we obtain a first-order PDE (even a linear one if PDE (13) is linear in its highest derivatives) with respect to the first integral I. It remains only to solve this PDE(s) to find a first integral of the given PDE. Note that the given PDE is decomposable iff there exists a solution of such first-order PDE(s).

3 Examples

To facilitate the necessary calculations in the process of finding first integrals, I have implemented the above-described method in a prototype Maple procedure reduce_PDE_order (see Appendix).
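As a small sanity check of the first-integral condition (14), a candidate first integral can be tested by hand before any solver is involved. The sketch below is not the Maple procedure; it is a hypothetical SymPy illustration that assumes the example PDE w w_tx − w_t w_x = 0 (our reading of the product-solution example of Section 2.1) and the candidate I = W_x / W_0 − g(x), with the jet variables W_0, W_t, W_x, W_tx treated as independent symbols.

```python
import sympy as sp

# Independent variables and jet variables (w and its derivatives as symbols).
t, x = sp.symbols("t x")
W0, Wt, Wx, Wtx = sp.symbols("W0 W_t W_x W_tx")
g = sp.Function("g")(x)   # the arbitrary function carried by the first integral

# Candidate first integral (an assumption for this example, independent of W_t).
I = Wx / W0 - g

# Condition of type (14) for the variable t: total t-derivative of I.
# Because I does not involve W_t, no W_tt term appears in the sum.
total_t = sp.diff(I, t) + sp.diff(I, W0) * Wt + sp.diff(I, Wx) * Wtx

# On solutions of the assumed PDE  W0*W_tx - W_t*W_x = 0  the mixed
# derivative is W_tx = W_t*W_x/W0; the total derivative must then vanish.
on_solutions = sp.simplify(total_t.subs(Wtx, Wt * Wx / W0))
print(on_solutions)  # prints: 0
```

Setting I = 0 gives w_x / w = g(x), which integrates to the product form w = F(t)G(x), in line with the two-step solution described in Section 2.1.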
The input data of the procedure are a given PDE of any order and the dependent variable of the PDE with any number of independent variables. The procedure tries to find first integral(s) of the input linear or nonlinear PDE. The Maple built-in procedure pdsolve is used inside my procedure to solve the first-order PDE for the first integral. As different Maple versions have different PDE-solving abilities, the output depends on the Maple version. In the following examples I refer to Maple 11.

The procedure reduce_PDE_order is able to find first integrals for many known and unknown linear and nonlinear PDEs. Here we give examples of PDEs for which it is possible to find their general solutions in the end. More examples can be found in the collection of solvable nonlinear PDEs [15].

3.1 Second-order PDE with two independent variables

For PDE (w = w(t, x))

−kw− bc = 0 (15)

with a ≠ 0 and 4ak − b^2 ≠ 0 the procedure reduce_PDE_order outputs the following first integral

I = F1 4ak − b2 − 2 arctan c+ 2a∂w 4ak − b2 4ak − b2

with arbitrary function F1.

The ODE I = 0 can be solved and one obtains (after some hand simplifications and editing) the following general solution to (15)

w(t, x) = exp(t b2 − 4ak)F (x)(b + b2 − 4ak)− b2 − 4ak + b 1 + exp(t b2 − 4ak)F (x) G(t)} exp exp(t b2 − 4ak)F (x)(b + b2 − 4ak)− b2 − 4ak + b 1 + exp(t b2 − 4ak)F (x)

where F(x) and G(t) are arbitrary functions.

3.2 Second-order PDE with four independent variables

For PDE

∂x1∂x4 ∂x2∂x4 ∂x3∂x4 + C0 +B1 C1(A1 +B1w +B0)+ C2(A1 +B1w +B0) 2 = 0 , (16)

where w = w(x1, x2, x3, x4) and Ai, Bi, Ci are constants, the procedure reduce_PDE_order outputs the following first integral

I = F1 x1, x2, x3, x4 + 2 arctan 2C2(A1 +B1w +B0) + C1 4C0C2 − C21 4C0C2 − C21

with arbitrary function F1.
The PDE I = 0 can be solved and one obtains the following general solution to (16)

w(x1, x2, x3, x4) = 2A1C2 exp(−B1x1 )(2B0C2 + C1 + tan[ 4C0C2 − C21 +G(ξ, (A2ξ +A1x2 −A2x1), (A3ξ +A1x3 −A3x1))] 4C0C2 − C21 )dξ + exp(−B1x1 )F [(A1x2 − A2x1), (A1x3 −A3x1), x4] ,

where F(t1, t2, t3) and G(t1, t2, t3) are arbitrary functions and c is an arbitrary constant.

3.3 Third order PDE with two independent variables

For PDE (w = w(t, x))

∂t∂x2 − 2w ∂ − w∂w − aw3 = 0 (17)

the procedure reduce_PDE_order outputs the following first integrals

I1 = F1 − axw2), 1 ax2w2 + 2w( − x ∂ ) + 2x

I2 = F1 − atw2 −

with arbitrary function F1.

We can form some PDEs from I1, and to solve them we can repeat the process of order reduction with the procedure reduce_PDE_order. The ODE I2 = 0 can be solved directly, and either way one obtains the following general solution to

w(t, x) = F (t) exp − xH(t) + x G(x)dx − xG(x)dx

where F(t), H(t) and G(x) are arbitrary functions.

3.4 Fourth order PDE with two independent variables

For PDE (w = w(t, x))

∂t2∂x2 − 2w2 ∂t2∂x ∂t∂x2 − 2∂w = 0 (18)

the procedure reduce_PDE_order outputs the following first integrals

I1 = F1(t, ∂t2∂x − 2w∂w − x ∂ ∂t2∂x w − 2x∂w

I2 = F1(x, ∂t∂x2 w + 2 − t ∂ ∂t∂x2 w − 2t∂w

with arbitrary function F1.

The wealth of first integrals here allows us to operate with them in many different ways. Apart from the aforesaid subsequent order reduction we can, for example, from

∂t∂x2 − 2w∂w w + 2 = F (x) ∂t∂x2 ] = G(x) ,

where F(x) and G(x) are arbitrary functions, algebraically eliminate the mixed derivative and obtain the following ODE

+ [tF (x)−G(x)]w2 = 0 ,

which gives the general solution to (18)

w(t, x) = H(t) exp xF (x) dx − tx F (x) dx+ G(x) dx − xG(x) dx + xK(t)

where F(x), H(t), G(x) and K(t) are arbitrary functions.

4 Conclusion

The method considered above is efficient enough for solving decomposable PDEs of relatively high order with many independent variables.
The main limitation here is the ability to solve the corresponding auxiliary first-order PDEs for first integrals. The adaptation of the method to PDEs which are not decomposable but whose general solutions can be expressed in closed form remains an open problem. But it can be shown on examples that there are some ways to extend the method to some types of such PDEs. These approaches deserve further thorough study in another publication.

References

[1] G. Darboux, Sur les équations aux dérivées partielles du second ordre. Ann. Sci. École Norm. Sup. 1870, v. 7, pp. 163-173.
[2] G. Darboux, Leçons sur la théorie générale des surfaces. v. II. Paris: Hermann, 1915.
[3] E. Goursat, Leçons sur l’intégration des équations aux dérivées partielles du second ordre à deux variables indépendantes. V. I, II. Paris: Hermann, 1896, 1898.
[4] A.R. Forsyth, Theory of differential equations. Part 4. Partial differential equations, vol. 6, Dover Press, New York, 1959.
[5] M. Juras, Generalized Laplace invariants and classical integration methods for second-order scalar hyperbolic partial differential equations in the plane, Differential Geometry and Applications: Proc. Conf. Brno (Czech Republic), 28 Aug.-1 Sept. 1995, Brno: Masaryk Univ., 1996, pp. 275-284.
[6] M. Juras, Geometric aspects of second-order scalar hyperbolic partial differential equations in the plane, Ph.D. thesis, 1997, Utah State University.
[7] V.V. Sokolov, A.V. Zhiber, On the Darboux integrable hyperbolic equations. Phys. Lett. A, v. 208, pp. 303-308, 1995.
[8] A.V. Zhiber, V.V. Sokolov, Exact integrable Liouville type hyperbolic equations [in Russian], Uspekhi Mat. Nauk, Vol. 56, No. 1, pp. 64-104, 2001.
[9] S.P. Tsarev, On Darboux integrable nonlinear partial differential equations, Proc. Steklov Institute of Mathematics, v. 225, pp. 372-381, 1999.
[10] J. Le Roux, Extensions de la méthode de Laplace aux équations linéaires aux dérivées partielles d’ordre supérieur au second. Bull. Soc. Math.
de France, v. 27, pp. 237-262, 1899. A digitized copy is obtainable from http://www.numdam.org/
[11] U. Dini, Sopra una classe di equazioni a derivate parziali di second’ordine con un numero qualunque di variabili. Atti Acc. Naz. dei Lincei. Mem. Classe fis., mat., nat. (ser. 5) v. 4, 1901, pp. 121-178. Also Opere v. III, pp. 489-566.
[12] U. Dini, Sopra una classe di equazioni a derivate parziali di second’ordine. Atti Acc. Naz. dei Lincei. Mem. Classe fis., mat., nat. (ser. 5) v. 4, 1902, pp. 431-467. Also Opere v. III, pp. 613-660.
[13] S.P. Tsarev, On factorization and solution of multidimensional linear partial differential equations. http://arxiv.org/abs/cs.SC/0609075, 2006.
[14] V.M. Boyko, W.I. Fushchych, Lowering of order and general solutions of some classes of partial differential equations, Reports on Math. Phys., V. 41, No. 3, pp. 311-318, 1998.
[15] Yu.N. Kosovtsov, The general solutions of some nonlinear second and third order PDEs with constant and nonconstant parameters. http://arxiv.org/abs/math-ph/0609003, 2006.

5 Appendix. Maple procedure reduce_PDE_order

reduce_PDE_order:=proc(pde,unk)
local B,W,N,NN,ARG,acargs,i,M,pde0,DN,IND,IND2,IND3,IND4,ARGS,SUB,SUB0,
      Z0,Bargs,EQS,XXX,WW,BB,PP,pdeI,IV,s,AN;
option `Copyright (c) 2006-2007 by Yuri N. Kosovtsov. All rights reserved.`;
# N: differential order of the PDE with respect to the unknown function
N:=PDETools[difforder](op(1,[selectremove(has,indets(pde,function),unk)]));
NN:=op(1,[selectremove(has,op(1,[selectremove(has,indets(pde,function),unk)]),diff)]);
# acargs: arguments of the unknown with respect to which it is actually differentiated
ARG:=[op(unk)];
acargs:={};
for i from 1 to nops(ARG) do
  if PDETools[difforder](NN,op(i,ARG))=0 then else acargs:=acargs union {op(i,ARG)} fi;
od;
acargs:=convert(acargs,list);
M:=op(0,unk)(op(acargs));
# pde0: the PDE written as an expression equal to zero
if type(pde,equation)=true then
  pde0:=lhs(subs(unk=M,pde))-rhs(subs(unk=M,pde))
else
  pde0:=subs(unk=M,pde)
fi;
# index lists of derivatives up to order N, N-2, and of orders N-1 and N
DN:=[seq(seq(i,i=1..nops(acargs)),j=1..N)];
IND:=seq(op(combinat[choose](DN,i)),i=1..N);
IND2:=seq(op(combinat[choose](DN,i)),i=1..N-2);
IND3:=op(combinat[choose](DN,N-1));
IND4:=op(combinat[choose](DN,N));
ARGS:=op(unk),M,seq(convert(D[op(op(i,[IND2]))](op(0,unk))
      (op(acargs)),diff),i=1..nops([IND2]));
# SUB replaces derivatives by indexed names W[...]; SUB0 restores them
SUB:={M=W[0],seq(convert(D[op(op(i,[IND]))](op(0,unk))
      (op(acargs)),diff)=W[op(op(i,[IND]))],i=1..nops([IND]))};
SUB0:={W[0]=op(0,unk)(op(ARG)),
       seq(W[op(op(i,[IND]))]=subs(M=op(0,unk)(op(ARG)),
       convert(D[op(op(i,[IND]))](op(0,unk))(op(acargs)),diff)),i=1..nops([IND]))};
# Z0: the unknown first integral B as a function of the derivatives of order < N
Z0:=B(ARGS,seq(convert(D[op(op(i,[IND3]))](op(0,unk))(op(acargs)),diff),
      i=1..nops([IND3])));
Bargs:=op(indets(subs(SUB,Z0),name));
EQS:=convert(subs(SUB,{seq(diff(Z0,op(i,acargs))=0,i=1..nops(acargs))}),diff);
XXX:={seq(W[op(op(i,[IND4]))],i=1..nops([IND4]))};
WW:=select(type,indets(subs(SUB,pde0)),'name') intersect
    {seq(W[op(op(i,[IND4]))],i=1..nops([IND4]))};
BB:=select(has,combinat[choose](XXX,nops(acargs)),WW);
PP:={};
pdeI:={seq({subs(subs(solve(EQS,op(i,BB)),subs(SUB,pde0)))},i=1..nops(BB))};
IV:={seq(W[op(op(i,[IND4]))],i=1..nops([IND4]))};
# solve each first-order PDE for B; variants on which pdsolve fails are skipped
for s from 1 to nops(pdeI) do
  try
    AN:=pdsolve(op(s,pdeI),{B},ivars=IV);
    for i from 1 to nops(AN) do
      if op(0,lhs(op(i,AN)))=B then PP:=PP union {rhs(op(i,AN))} fi;
    od;
  catch:
  end try;
od;
PP:=subs(SUB0,PP);
RETURN(PP);
end proc:

Calling Sequence: reduce_PDE_order(PDE, f(\vec{x}));
PDE - partial differential equation;
f(\vec{x}) - indeterminate function with its arguments.
ABSTRACT In the present paper we propose a seemingly new method for finding solutions of some types of nonlinear PDEs in closed form. The method is based on the decomposition of nonlinear operators into a sequence of operators of lower orders. It is shown that the decomposition process can be done by iterative procedure(s), each step of which is reduced to the solution of some auxiliary PDE system(s) for one dependent variable. Moreover, we find in this way the explicit expression of the first-order PDE(s) for the first integral of a decomposable initial PDE. Remarkably, this first-order PDE is linear if the initial PDE is linear in its highest derivatives. The developed method is implemented in a Maple procedure, which can actually solve many PDEs of different orders with different numbers of independent variables. Examples of PDEs with their calculated general solutions demonstrate the potential of the method for automatic solving of nonlinear PDEs.
<|endoftext|><|startoftext|> Introduction

Morita contexts, in general, and (semi-)strict Morita contexts (with surjective connecting bilinear morphisms), in particular, were extensively studied and developed exponentially during the last few decades (e.g. [AGH-Z1997]). However, we sincerely feel that there is a gap in the literature on injective Morita contexts (i.e. those with injective connecting bilinear morphisms).
Apart from the results in [Nau1994-a], [Nau1994-b] (where the second author initially explored this notion) and from an application to Grothendieck groups in the recent paper [Nau2004], it seems that injective Morita contexts have not been studied systematically at all.

∗Corresponding Author
http://arxiv.org/abs/0704.0074v2

We noticed that in several results of [Nau1993], [Nau1994-a] and [Nau1994-b] that are related to Morita contexts, only one trace ideal is used. Observing this fact, we introduce the notions of Morita semi-contexts and Morita data and investigate them. Several results are then proved for injective Morita semi-contexts and/or injective Morita data.

Consider a Morita datum $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S)$, with not necessarily compatible bimodule morphisms $\langle,\rangle_T : P \otimes_S Q \to T$ and $\langle,\rangle_S : Q \otimes_T P \to S$. We say that $\mathbf{M}$ is injective, iff $\langle,\rangle_T$ and $\langle,\rangle_S$ are injective, and that $\mathbf{M}$ is a Morita α-datum, iff the associated dual pairings $P_l := (Q, {}_T P)$, $P_r := (Q, P_S)$, $Q_l := (P, {}_S Q)$ and $Q_r := (P, Q_T)$ satisfy the α-condition (which is closely related to the notion of local projectivity in the sense of Zimmermann-Huisgen [Z-H1976]). The α-condition was introduced in [AG-TL2001] and further investigated by the first author in [Abu2005].

While (semi-)strict unital Morita contexts induce equivalences between the whole module categories of the rings under consideration, we show in this paper how injective Morita (semi-)contexts and injective Morita data play an important role in establishing equivalences between suitable intersecting subcategories of module categories (e.g. intersections of subcategories that are localized/colocalized by trace ideals of a Morita datum with subcategories of static/adstatic modules, etc.).
Our main applications, in addition to equivalences related to the Kato-Ohtake-Müller localization-colocalization theory (developed in [Kat1978], [KO1979] and [Mül1974]), will be to ∗-modules (introduced by Menini and Orsatti [MO1989]) and to right wide Morita contexts (introduced by F. Castaño Iglesias and J. Gómez-Torrecillas [C-IG-T1995]).

Most of our results will be stated for left modules, while deriving the "dual" versions for right modules is left to the interested reader. Moreover, for Morita contexts, some results are stated/proved for only one of the Morita semi-contexts, as the ones corresponding to the second semi-context can be obtained analogously. For the convenience of the reader, we have tried to make the paper self-contained, so that it can serve as a reference on injective Morita (semi-)contexts and their applications. In this respect, and for the sake of completeness, we have included some previous results of the authors that are (in most cases) either provided with new shorter proofs, or are obtained under weaker conditions.

This paper is organized as follows: After this brief introduction, we give in Section 2 some preliminaries including the basic properties of dual α-pairings, which play a central role in the rest of the work. The notions of Morita semi-contexts and Morita data are introduced in Section 3, where we clarify their relations with the dual pairings and the so-called elementary rngs. Injective Morita (semi-)contexts appear in Section 4, where we study their interplay with dual α-pairings and provide some examples and a counter-example. In Section 5 we include some observations regarding static and adstatic modules and use them to obtain equivalences among suitable intersecting subcategories of modules related to a Morita (semi-)context.
In the last section, more applications are presented, mainly to subcategories of modules that are localized or colocalized by a trace ideal of an injective Morita (semi-)context, to ∗-modules and to injective right wide Morita contexts.

2 Preliminaries

Throughout, $R$ denotes a commutative ring with $1_R \neq 0_R$ and $A, A', B, B'$ are unital $R$-algebras. We reserve the term "ring" for an associative ring with a multiplicative unity, and we use the term "rng" for a general associative ring (not necessarily with unity). All modules over rings are assumed to be unitary, and ring morphisms are assumed to respect multiplicative unities. If $\mathcal{T}$ and $\mathcal{S}$ are categories, then we write $\mathcal{T} < \mathcal{S}$ ($\mathcal{T} \leq \mathcal{S}$) to mean that $\mathcal{T}$ is a (full) subcategory of $\mathcal{S}$, and $\mathcal{T} \approx \mathcal{S}$ to indicate that $\mathcal{T}$ and $\mathcal{S}$ are equivalent.

Rngs and their modules

2.1. By an $A$-rng $(T, \mu_T)$, we mean an $(A,A)$-bimodule $T$ with an $(A,A)$-bilinear morphism $\mu_T : T \otimes_A T \to T$, such that $\mu_T \circ (\mu_T \otimes_A \mathrm{id}_T) = \mu_T \circ (\mathrm{id}_T \otimes_A \mu_T)$. We call an $A$-rng $(T, \mu_T)$ an $A$-ring, iff there exists in addition an $(A,A)$-bilinear morphism $\eta_T : A \to T$, called the unity map, such that $\mu_T \circ (\eta_T \otimes_A \mathrm{id}_T) = \vartheta_T^l$ and $\mu_T \circ (\mathrm{id}_T \otimes_A \eta_T) = \vartheta_T^r$ (where $\vartheta_T^l : A \otimes_A T \simeq T$ and $\vartheta_T^r : T \otimes_A A \simeq T$ are the canonical isomorphisms). So, an $A$-ring is a unital $A$-rng; and an $A$-rng is (roughly speaking) an $A$-ring not necessarily with unity.

2.2. A morphism of rngs $(\psi : \delta) : (T : A) \to (T' : A')$ consists of a morphism of $R$-algebras $\delta : A \to A'$ and an $(A,A)$-bilinear morphism $\psi : T \to T'$, such that $\mu_{T'} \circ \chi^{(A,A')}_{(T',T')} \circ (\psi \otimes_A \psi) = \psi \circ \mu_T$ (where $\chi^{(A,A')}_{(T',T')} : T' \otimes_A T' \to T' \otimes_{A'} T'$ is the canonical map induced by $\delta$). By $\mathbb{RNG}$ we denote the category of associative rngs with morphisms being rng morphisms, and by $\mathbb{URNG} < \mathbb{RNG}$ the (non-full) subcategory of unital rings with morphisms being the morphisms in $\mathbb{RNG}$ which respect multiplicative unities.

2.3. Let $(T, \mu_T)$ be an $A$-rng.
By a left $T$-module we mean a left $A$-module $N$ with a left $A$-linear morphism $\phi_T^N : T \otimes_A N \to N$, such that $\phi_T^N \circ (\mu_T \otimes_A \mathrm{id}_N) = \phi_T^N \circ (\mathrm{id}_T \otimes_A \phi_T^N)$. For left $T$-modules $M, N$, we call a left $A$-linear morphism $f : M \to N$ a $T$-linear morphism, iff $f(tm) = tf(m)$ for all $t \in T$ and $m \in M$. The category of left $T$-modules and left $T$-linear morphisms is denoted by ${}_T\mathbb{M}$. The category $\mathbb{M}_T$ of right $T$-modules is defined analogously.

Let $(T : A)$ and $(T' : A')$ be rngs. We call an $(A,A')$-bimodule $N$ a $(T,T')$-bimodule, iff $(N, \phi_T^N)$ is a left $T$-module and $(N, \phi_{T'}^N)$ is a right $T'$-module, such that $\phi_{T'}^N \circ (\phi_T^N \otimes_{A'} \mathrm{id}_{T'}) = \phi_T^N \circ (\mathrm{id}_T \otimes_A \phi_{T'}^N)$. For $(T,T')$-bimodules $M, N$, we call an $(A,A')$-bilinear morphism $f : M \to N$ $(T,T')$-bilinear, provided $f$ is left $T$-linear and right $T'$-linear. The category of $(T,T')$-bimodules is denoted by ${}_T\mathbb{M}_{T'}$. In particular, for any $A$-rng $T$, a left (right) $T$-module $M$ has a canonical structure of a unitary right (left) $S$-module, where $S := \mathrm{End}({}_T M)$ ($S := \mathrm{End}(M_T)$); and moreover, with this structure $M$ becomes a $(T,S)$-bimodule (an $(S,T)$-bimodule).

Remark 2.4. Similarly, one can define rngs over arbitrary (not necessarily unital) ground rngs and rng morphisms between them. Moreover, one can define (bi)modules over such rngs and (bi)linear morphisms between them.

Notation. Let $T$ be an $A$-rng. We write ${}_T U$ ($U_T$) to denote that $U$ is a left (right) $T$-module. For a left (right) $T$-module $U$, we consider the set ${}^*U := \mathrm{Hom}_{T-}(U, T)$ ($U^* := \mathrm{Hom}_{-T}(U, T)$) of all left (right) $T$-linear morphisms from $U$ to $T$ with the canonical right (left) $T$-module structure.

Generators and cogenerators

Definition 2.5. Let $T$ be an $A$-rng.
For a left $T$-module ${}_T U$ consider the following subclasses of ${}_T\mathbb{M}$:

$\mathrm{Gen}({}_T U) := \{ {}_T V \mid \exists \text{ a set } \Lambda \text{ and an exact sequence } U^{(\Lambda)} \to V \to 0 \}$;
$\mathrm{Cogen}({}_T U) := \{ {}_T W \mid \exists \text{ a set } \Lambda \text{ and an exact sequence } 0 \to W \to U^{\Lambda} \}$;
$\mathrm{Pres}({}_T U) := \{ {}_T V \mid \exists \text{ sets } \Lambda_1, \Lambda_2 \text{ and an exact sequence } U^{(\Lambda_2)} \to U^{(\Lambda_1)} \to V \to 0 \}$;
$\mathrm{Copres}({}_T U) := \{ {}_T W \mid \exists \text{ sets } \Lambda_1, \Lambda_2 \text{ and an exact sequence } 0 \to W \to U^{\Lambda_1} \to U^{\Lambda_2} \}$.

A left $T$-module in $\mathrm{Gen}({}_T U)$ (respectively $\mathrm{Cogen}({}_T U)$, $\mathrm{Pres}({}_T U)$, $\mathrm{Copres}({}_T U)$) is said to be $U$-generated (respectively $U$-cogenerated, $U$-presented, $U$-copresented). Moreover, we say that ${}_T U$ is a generator (respectively cogenerator, presentor, copresentor), iff $\mathrm{Gen}({}_T U) = {}_T\mathbb{M}$ (respectively $\mathrm{Cogen}({}_T U) = {}_T\mathbb{M}$, $\mathrm{Pres}({}_T U) = {}_T\mathbb{M}$, $\mathrm{Copres}({}_T U) = {}_T\mathbb{M}$).

Dual α-pairings

In what follows we recall the definition and properties of dual α-pairings introduced in [AG-TL2001, Definition 2.3.] and studied further in [Abu2005].

2.6. Let $T$ be an $A$-rng. A dual left $T$-pairing $P_l = (V, {}_T W)$ consists of a left $T$-module $W$ and a right $T$-module $V$ with a right $T$-linear morphism $\kappa_{P_l} : V \to {}^*W$ (equivalently a left $T$-linear morphism $\chi_{P_l} : W \to V^*$). For dual left pairings $P_l = (V, {}_T W)$, $P'_l = (V', {}_{T'} W')$, a morphism of dual left pairings $(\xi, \theta) : (V', W') \to (V, W)$ consists of a triple $(\xi, \theta : \varsigma) : (V, {}_T W) \to (V', {}_{T'} W')$, where $\xi : V \to V'$ and $\theta : W' \to W$ are $T$-linear and $\varsigma : T \to T'$ is a morphism of rngs, such that, considering the induced maps $\langle,\rangle_T : V \times W \to T$ and $\langle,\rangle_{T'} : V' \times W' \to T'$, we have

$$\langle \xi(v), w' \rangle_{T'} = \varsigma(\langle v, \theta(w') \rangle_T) \text{ for all } v \in V \text{ and } w' \in W'. \qquad (1)$$

The dual left pairings with the morphisms defined above form a category, which we denote by $\mathcal{P}_l$. By $\mathcal{P}_l(T) \leq \mathcal{P}_l$ we denote the full subcategory of dual $T$-pairings. The category $\mathcal{P}_r$ of dual right pairings and its full subcategory $\mathcal{P}_r(T) \leq \mathcal{P}_r$ of dual right $T$-pairings are defined analogously.

Remark 2.7.
The reader should be warned that (in general) for a non-commutative rng $T$ and a dual left $T$-pairing $P_l = (V, {}_T W)$, the following map induced by the right $T$-linear morphism $\kappa_{P_l} : V \to {}^*W$:

$$\langle,\rangle_T : V \times W \to T, \quad \langle v, w \rangle_T := \kappa_{P_l}(v)(w)$$

is not necessarily $T$-balanced, and so does not induce (in general) a map $V \otimes_T W \to T$. In fact, for all $v \in V$, $w \in W$ and $t \in T$ we have

$$\langle vt, w \rangle_T = \kappa_{P_l}(vt)(w) = [\kappa_{P_l}(v)t](w) = [\kappa_{P_l}(v)(w)]t = \langle v, w \rangle_T\, t;$$
$$\langle v, tw \rangle_T = \kappa_{P_l}(v)(tw) = t[\kappa_{P_l}(v)(w)] = t \langle v, w \rangle_T.$$

2.8. Let $T$ be an $A$-rng, $N, W$ be left $T$-modules and identify $N^W$ with the set of all mappings from $W$ to $N$. Considering $N$ with the discrete topology and $N^W$ with the product topology, the induced relative topology on $\mathrm{Hom}_{T-}(W, N) \hookrightarrow N^W$ is a linear topology (called the finite topology), for which the basis of neighborhoods of 0 is given by the set of annihilator submodules:

$$\mathcal{B}_f(0) := \{ F^{\perp}(\mathrm{Hom}_{T-}(W, N)) \mid F = \{w_1, ..., w_k\} \subset W \text{ is a finite subset} \},$$

where $F^{\perp}(\mathrm{Hom}_{T-}(W, N)) := \{ f \in \mathrm{Hom}_{T-}(W, N) \mid f(F) = 0 \}$.

2.9. Let $T$ be an $A$-rng, $P_l = (V, {}_T W)$ a dual left $T$-pairing and consider for every right $T$-module $U_T$ the following canonical map

$$\alpha_U^{P_l} : U \otimes_T W \to \mathrm{Hom}_{-T}(V, U), \quad \sum u_i \otimes_T w_i \mapsto [v \mapsto \sum u_i \langle v, w_i \rangle_T]. \qquad (2)$$

We say that $P_l = (V, {}_T W) \in \mathcal{P}_l(T)$ satisfies the left α-condition (or is a dual left α-pairing), iff $\alpha_U^{P_l}$ is injective for every right $T$-module $U_T$. By $\mathcal{P}_l^{\alpha}(T) \leq \mathcal{P}_l(T)$ we denote the full subcategory of dual left $T$-pairings satisfying the left α-condition. The full subcategory of dual right α-pairings $\mathcal{P}_r^{\alpha}(T) \leq \mathcal{P}_r(T)$ is defined analogously.

Definition 2.10. Let $T$ be an $A$-rng, $P_l = (V, {}_T W)$ be a dual left $T$-pairing and consider $\kappa_{P_l} : V \to {}^*W$ and $\alpha_V^{P_l} : V \otimes_T W \to \mathrm{End}(V_T)$. We say $P_l \in \mathcal{P}_l(T)$ is
dense, iff $\kappa_{P_l}(V) \subseteq {}^*W$ is dense (w.r.t. the finite topology on ${}^*W \hookrightarrow T^W$);
injective (resp. semi-strict, strict), iff $\alpha_V^{P_l}$ is injective (resp. surjective, bijective);
non-degenerate, iff $V \hookrightarrow {}^*W$ and $W \hookrightarrow V^*$ canonically.

2.11. Let $T$ be an $A$-rng. We call a $T$-module $W$ locally projective (in the sense of B.
Zimmermann-Huisgen [Z-H1976]), iff for every diagram of $T$-modules

$$\begin{array}{ccccccc} 0 & \longrightarrow & F & \xrightarrow{\ \iota\ } & W & & \\ & & & & \downarrow{\scriptstyle g} & & \\ & & L & \xrightarrow{\ \pi\ } & N & \longrightarrow & 0 \end{array}$$

with exact rows and finitely generated $T$-submodule $F \subseteq W$: for every $T$-linear morphism $g : W \to N$, there exists a $T$-linear morphism $g' : W \to L$, such that $g \circ \iota = \pi \circ g' \circ \iota$.

For proofs of the following basic properties of locally projective modules and dual α-pairings see [Abu2005] and [Z-H1976]:

Proposition 2.12. Let $T$ be an $A$-ring and $P_l = (V, {}_T W) \in \mathcal{P}_l(T)$.
1. The left $T$-module ${}_T W$ is locally projective if and only if $({}^*W, W)$ is an α-pairing.
2. The left $T$-module ${}_T W$ is locally projective, iff for any finite subset $\{w_1, ..., w_k\} \subseteq W$, there exists $\{(f_i, \tilde{w}_i)\}_{i=1}^{n} \subset {}^*W \times W$ such that $w_j = \sum_{i=1}^{n} f_i(w_j)\tilde{w}_i$ for all $j = 1, ..., k$.
3. If ${}_T W$ is locally projective, then ${}_T W$ is flat and $T$-cogenerated.
4. If $P_l \in \mathcal{P}_l^{\alpha}(T)$, then ${}_T W$ is locally projective.
5. If ${}_T W$ is locally projective and $\kappa_{P_l}(V) \subseteq {}^*W$ is dense, then $P_l \in \mathcal{P}_l^{\alpha}(T)$.
6. Assume $T_T$ is an injective cogenerator. Then $P_l \in \mathcal{P}_l^{\alpha}(T)$ if and only if ${}_T W$ is locally projective and $\kappa_{P_l}(V) \subseteq {}^*W$ is dense.
7. If $T$ is a QF ring, then $P_l \in \mathcal{P}_l^{\alpha}(T)$ if and only if ${}_T W$ is projective and $W \hookrightarrow V^*$.

The following result completes the nice observation [BW2003, 42.13.] about locally projective modules:

Proposition 2.13. Let $T$ be a ring, ${}_T W$ a left $T$-module, $S := \mathrm{End}({}_T W)^{op}$ and consider the canonical $(S,S)$-bilinear morphism

$$[,]_W : {}^*W \otimes_T W \to \mathrm{End}({}_T W), \quad f \otimes_T w \mapsto [\tilde{w} \mapsto f(\tilde{w})w].$$

1. ${}_T W$ is finitely generated projective if and only if $[,]_W$ is surjective.
2. ${}_T W$ is locally projective if and only if $\mathrm{Im}([,]_W) \subseteq \mathrm{End}({}_T W)$ is dense.

Proof. 1. This follows by [Fai1981, 12.8.].
2. Assume ${}_T W$ is locally projective and consider for every left $T$-module $N$ the canonical mapping

$$[,]_N^W : {}^*W \otimes_T N \to \mathrm{Hom}_T(W, N), \quad f \otimes_T n \mapsto [\tilde{w} \mapsto f(\tilde{w})n].$$

It follows then by [BW2003, 42.13.] that $\mathrm{Im}([,]_N^W) \subseteq \mathrm{Hom}_T(W, N)$ is dense. In particular, setting $N = W$ we conclude that $\mathrm{Im}([,]_W) \subseteq \mathrm{End}({}_T W)$ is dense. On the other hand, assume $\mathrm{Im}([,]_W) \subseteq \mathrm{End}({}_T W)$ is dense.
Then for every finite subset $\{w_1, ..., w_k\} \subseteq W$, there exists $\sum_{i=1}^{n} \tilde{g}_i \otimes_T \tilde{w}_i \in {}^*W \otimes_T W$ with

$$w_j = \mathrm{id}_W(w_j) = [,]_W\Big(\sum_{i=1}^{n} \tilde{g}_i \otimes_T \tilde{w}_i\Big)(w_j) = \sum_{i=1}^{n} \tilde{g}_i(w_j)\tilde{w}_i$$

for $j = 1, ..., k$. It follows then by Proposition 2.12 "2" that ${}_T W$ is locally projective. ∎

3 Morita (Semi)contexts

We noticed, in the proofs of some results on equivalences between subcategories of module categories associated to a given Morita context, that no use is made of the compatibility between the connecting bimodule morphisms (or even that only one trace ideal is used and so only one of the two bilinear morphisms is really in action). Some results of this type appeared, for example, in [Nau1993], [Nau1994-a] and [Nau1994-b]. Moreover, in our considerations some Morita contexts will be formed for arbitrary associative rngs (i.e. not necessarily unital rings). These considerations motivate us to make the following general definitions:

3.1. By a Morita semi-context we mean a tuple

$$\mathbf{m}_T = ((T : A), (S : B), P, Q, \langle,\rangle_T, I), \qquad (3)$$

where $T$ is an $A$-rng, $S$ is a $B$-rng, $P$ is a $(T,S)$-bimodule, $Q$ is an $(S,T)$-bimodule, $\langle,\rangle_T : P \otimes_S Q \to T$ is a $(T,T)$-bilinear morphism and $I := \mathrm{Im}(\langle,\rangle_T) \lhd T$ (called the trace ideal associated to $\mathbf{m}_T$). We drop the ground rings $A, B$ and the trace ideal $I \lhd T$, if they are not explicitly in action. If $\mathbf{m}_T$ (3) is a Morita semi-context and $T, S$ are unital rings, then we call $\mathbf{m}_T$ a unital Morita semi-context.

3.2. Let $\mathbf{m}_T = ((T : A), (S : B), P, Q, \langle,\rangle_T)$, $\mathbf{m}_{T'} = ((T' : A'), (S' : B'), P', Q', \langle,\rangle_{T'})$ be Morita semi-contexts. By a morphism of Morita semi-contexts from $\mathbf{m}_T$ to $\mathbf{m}_{T'}$ we mean a fourfold set of morphisms

$$((\beta : \delta), (\gamma : \sigma), \phi, \psi) : ((T : A), (S : B), P, Q) \to ((T' : A'), (S' : B'), P', Q'),$$

where $(\beta : \delta) : (T : A) \to (T' : A')$ and $(\gamma : \sigma) : (S : B) \to (S' : B')$ are rng morphisms, $\phi : P \to P'$ is $(T,S)$-bilinear and $\psi : Q \to Q'$ is $(S,T)$-bilinear, such that

$$\beta(\langle p, q \rangle_T) = \langle \phi(p), \psi(q) \rangle_{T'} \text{ for all } p \in P,\ q \in Q.$$
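Notice that we consider $P'$ as a $(T,S)$-bimodule and $Q'$ as an $(S,T)$-bimodule with actions induced by the morphisms of rngs $(\beta : \delta)$ and $(\gamma : \sigma)$. By $\mathbb{MSC}$ we denote the category of Morita semi-contexts with morphisms defined as above, and by $\mathbb{UMSC} < \mathbb{MSC}$ the (non-full) subcategory of unital Morita semi-contexts.

Morita semi-contexts are closely related to dual pairings in the sense of [Abu2005]:

3.3. Let $(T, S, P, Q, \langle,\rangle_T) \in \mathbb{MSC}$ and consider the canonical isomorphisms of Abelian groups

$$\mathrm{Hom}_{(S,T)}(Q, {}^*P) \stackrel{\xi}{\simeq} \mathrm{Hom}_{(T,T)}(P \otimes_S Q, T) \stackrel{\zeta}{\simeq} \mathrm{Hom}_{(T,S)}(P, Q^*).$$

This means that we have two dual $T$-pairings $P_l := (Q, {}_T P) \in \mathcal{P}_l(T)$ and $Q_r := (P, Q_T) \in \mathcal{P}_r(T)$, induced by the canonical $T$-linear morphisms

$$\kappa_{P_l} := \xi^{-1}(\langle,\rangle_T) : Q \to {}^*P \quad \text{and} \quad \kappa_{Q_r} := \zeta(\langle,\rangle_T) : P \to Q^*.$$

On the other hand, let $(S, T, Q, P, \langle,\rangle_S) \in \mathbb{MSC}$ and consider the canonical isomorphisms of Abelian groups

$$\mathrm{Hom}_{(S,T)}(Q, P^*) \stackrel{\xi'}{\simeq} \mathrm{Hom}_{(S,S)}(Q \otimes_T P, S) \stackrel{\zeta'}{\simeq} \mathrm{Hom}_{(T,S)}(P, {}^*Q).$$

Then we have two dual $S$-pairings $P_r := (Q, P_S) \in \mathcal{P}_r(S)$ and $Q_l := (P, {}_S Q) \in \mathcal{P}_l(S)$, induced by the canonical morphisms

$$\kappa_{P_r} := \xi'^{-1}(\langle,\rangle_S) : Q \to P^* \quad \text{and} \quad \kappa_{Q_l} := \zeta'(\langle,\rangle_S) : P \to {}^*Q.$$

3.4. By a Morita datum we mean a tuple

$$\mathbf{M} = ((T : A), (S : B), P, Q, \langle,\rangle_T, \langle,\rangle_S, I, J), \qquad (4)$$

where the following are Morita semi-contexts:

$$\mathbf{M}_T := ((T : A), (S : B), P, Q, \langle,\rangle_T, I) \quad \text{and} \quad \mathbf{M}_S := ((S : B), (T : A), Q, P, \langle,\rangle_S, J). \qquad (5)$$

If, moreover, the bilinear morphisms $\langle,\rangle_T : P \otimes_S Q \to T$ and $\langle,\rangle_S : Q \otimes_T P \to S$ are compatible, in the sense that

$$\langle q, p \rangle_S\, q' = q \langle p, q' \rangle_T \quad \text{and} \quad p \langle q, p' \rangle_S = \langle p, q \rangle_T\, p' \quad \forall\ p, p' \in P,\ q, q' \in Q, \qquad (6)$$

then we call $\mathbf{M}$ a Morita context. If $T, S$ in a Morita datum (context) $\mathbf{M}$ are unital, then we call $\mathbf{M}$ a unital Morita datum (context).

3.5. Let $\mathbf{M} = ((T : A), (S : B), P, Q, \langle,\rangle_T, \langle,\rangle_S)$ and $\mathbf{M}' = ((T' : A'), (S' : B'), P', Q', \langle,\rangle_{T'}, \langle,\rangle_{S'})$ be Morita contexts.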
Extending [Ami1971, Page 275], we mean by a morphism of Morita contexts from $\mathbf{M}$ to $\mathbf{M}'$ a fourfold set of maps

$$((\beta : \delta), (\gamma : \sigma), \phi, \psi) : ((T : A), (S : B), P, Q) \to ((T' : A'), (S' : B'), P', Q'),$$

where $(\beta : \delta) : (T : A) \to (T' : A')$, $(\gamma : \sigma) : (S : B) \to (S' : B')$ are rng morphisms, $\phi : P \to P'$ is $(T,S)$-bilinear and $\psi : Q \to Q'$ is $(S,T)$-bilinear, such that

$$\beta(\langle p, q \rangle_T) = \langle \phi(p), \psi(q) \rangle_{T'} \quad \text{and} \quad \gamma(\langle q, p \rangle_S) = \langle \psi(q), \phi(p) \rangle_{S'} \quad \forall\ p \in P,\ q \in Q.$$

By $\mathbb{MC}$ we denote the category of Morita contexts with morphisms defined as above, and by $\mathbb{UMC}$ [...] $R$) yields a Morita context $(R, R, P, Q, \langle,\rangle_R, [,]_R)$, where $[,]_R := Q \otimes_R P \simeq P \otimes_R Q \longrightarrow R$. ∎

3.7. We call a Morita semi-context $\mathbf{m}_T = (T, S, P, Q, \langle,\rangle_T)$ semi-derived (derived), iff $S = \mathrm{End}({}_T P)^{op}$ (and $Q = {}^*P$). We call a Morita datum, or a Morita context, $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S)$ semi-derived (derived), iff $S = \mathrm{End}({}_T P)^{op}$, or $T = \mathrm{End}(P_S)$ ($S = \mathrm{End}({}_T P)^{op}$ and $Q = {}^*P$, or $T = \mathrm{End}(P_S)$ and $Q = P^*$).

Remark 3.8. Following [Cae1998, 1.2.] (however, dropping the condition that the bilinear map $\langle,\rangle_T : P \otimes_S Q \to T$ is surjective), Morita semi-contexts $(T, S, P, Q, \langle,\rangle_T)$ in our sense were called dual pairs in [Ver2006]. However, we think the terminology we are using is more informative and avoids confusion with other notions of dual pairings in the literature (e.g. the ones studied by the first author in [Abu2005]). The reason for this specific terminology (i.e. Morita semi-contexts) is that every Morita context contains two Morita semi-contexts, as is clear from the definition; and that any Morita semi-context can be extended to a (not necessarily unital) Morita context in a natural way, as explained below.

Elementary rngs

In what follows we demonstrate how to build new Morita (semi-)contexts from a given Morita semi-context. These constructions are inspired by the notion of elementary rngs in [Cae1998, 1.2.] (and [Ver2006, Remark 3.8.]):

Lemma 3.9. Let $\mathbf{m}_T := ((T : A), (S : B), P, Q, \langle,\rangle_T) \in \mathbb{MSC}$. 1.
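The $(T,T)$-bimodule $\mathbb{T} := P \otimes_S Q$ has a structure of a $T$-rng ($A$-rng) with multiplication

$$(p \otimes_S q) \cdot_{\mathbb{T}} (p' \otimes_S q') := \langle p, q \rangle_T\, p' \otimes_S q' \quad \forall\ p, p' \in P,\ q, q' \in Q,$$

such that $\langle,\rangle_T : \mathbb{T} \to T$ is a morphism of $A$-rngs, $P$ is a $(\mathbb{T}, S)$-bimodule and $Q$ is an $(S, \mathbb{T})$-bimodule, where

$$(p \otimes_S q) \rightharpoonup \tilde{p} := \langle p, q \rangle_T\, \tilde{p} \quad \text{and} \quad \tilde{q} \leftharpoonup (p \otimes_S q) := \tilde{q} \langle p, q \rangle_T.$$

Moreover, we have morphisms of $T$-rngs ($A$-rngs)

$$\psi : \mathbb{T} \to \mathrm{End}(P_S),\ p \otimes_S q \mapsto [\tilde{p} \mapsto \langle p, q \rangle_T\, \tilde{p}]; \qquad \phi : \mathbb{T} \to \mathrm{End}({}_S Q)^{op},\ p \otimes_S q \mapsto [\tilde{q} \mapsto \tilde{q} \langle p, q \rangle_T],$$

$((\mathbb{T} : A), (S : B), P, Q, \mathrm{id}_{\mathbb{T}}) \in \mathbb{MSC}$ and we have a morphism of Morita semi-contexts

$$(\langle,\rangle_T, \mathrm{id}_S, \mathrm{id}_P, \mathrm{id}_Q) : (\mathbb{T}, S, P, Q, \mathrm{id}_{\mathbb{T}}) \to (T, S, P, Q, \langle,\rangle_T).$$

2. The $(S,S)$-bimodule $\mathbb{S} := Q \otimes_T P$ has a structure of an $S$-rng ($B$-rng) with multiplication

$$(q \otimes_T p) \cdot_{\mathbb{S}} (q' \otimes_T p') := q \langle p, q' \rangle_T \otimes_T p' = q \otimes_T \langle p, q' \rangle_T\, p' \quad \forall\ p, p' \in P,\ q, q' \in Q,$$

such that $\langle,\rangle_S : \mathbb{S} \to S$ is a morphism of $B$-rngs, $P$ is a $(T, \mathbb{S})$-bimodule and $Q$ is an $(\mathbb{S}, T)$-bimodule, where

$$\tilde{p} \leftharpoonup (q \otimes_T p) := \langle \tilde{p}, q \rangle_T\, p \quad \text{and} \quad (q \otimes_T p) \rightharpoonup \tilde{q} := q \langle p, \tilde{q} \rangle_T.$$

Moreover, we have morphisms of $S$-rngs ($B$-rngs)

$$\Psi : \mathbb{S} \to \mathrm{End}({}_T P)^{op},\ q \otimes_T p \mapsto [\tilde{p} \mapsto \langle \tilde{p}, q \rangle_T\, p]; \qquad \Phi : \mathbb{S} \to \mathrm{End}(Q_T),\ q \otimes_T p \mapsto [\tilde{q} \mapsto q \langle p, \tilde{q} \rangle_T],$$

and $\mathbf{M} := ((T : A), (\mathbb{S} : B), P, Q, \langle,\rangle_T, \mathrm{id}_{\mathbb{S}})$ is a Morita context.

Remarks 3.10. 1. Given $((S : B), (T : A), Q, P, \langle,\rangle_S) \in \mathbb{MSC}$, the $(S,S)$-bimodule $\mathbf{S} := Q \otimes_T P$ becomes an $S$-rng with multiplication

$$(q \otimes_T p) \cdot_{\mathbf{S}} (q' \otimes_T p') := \langle q, p \rangle_S\, q' \otimes_T p' \quad \forall\ p, p' \in P,\ q, q' \in Q;$$

and the $(T,T)$-bimodule $\mathbf{T} := P \otimes_S Q$ becomes a $T$-rng with multiplication

$$(p \otimes_S q) \cdot_{\mathbf{T}} (p' \otimes_S q') := p \langle q, p' \rangle_S \otimes_S q' = p \otimes_S \langle q, p' \rangle_S\, q' \quad \forall\ p, p' \in P,\ q, q' \in Q.$$

Analogous results to those in Lemma 3.9 can be obtained for the $S$-rng $\mathbf{S}$ and the $T$-rng $\mathbf{T}$.
2. Given a Morita semi-context $(T, S, P, Q, \langle,\rangle_T)$, several equivalent conditions for the $T$-rng $\mathbb{T} := P \otimes_S Q$ to be unital and the modules ${}_{\mathbb{T}} P$, $Q_{\mathbb{T}}$ to be firm can be found in [Ver2006, Theorem 3.3.]. Analogous results can be formulated for the $S$-rng $Q \otimes_T P$ and the modules $P_{\mathbf{S}}$, ${}_{\mathbf{S}} Q$ corresponding to any $(S, T, Q, P, \langle,\rangle_S) \in \mathbb{MSC}$.

Proposition 3.11. 1.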
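Let $\mathbf{m}_T = (T, S, P, Q, \langle,\rangle_T) \in \mathbb{UMSC}$ and assume the $A$-rng $\mathbb{T} := P \otimes_S Q$ to be unital. If $\langle,\rangle_T : \mathbb{T} \to T$ respects unities (and $\mathbf{m}_T$ is injective), then $\langle,\rangle_T$ is surjective ($\mathbb{T} \simeq T$ as $A$-rings).
2. Let $\mathbf{m}_S = (S, T, Q, P, \langle,\rangle_S) \in \mathbb{UMSC}$ and assume the $B$-rng $\mathbf{S} := Q \otimes_T P$ to be unital. If $\langle,\rangle_S : \mathbf{S} \to S$ respects unities (and $\mathbf{m}_S$ is injective), then $\langle,\rangle_S$ is surjective ($\mathbf{S} \simeq S$ as $B$-rings).
3. Let $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S) \in \mathbb{UMC}$ and assume the rngs $\mathbb{T} := P \otimes_S Q$ and $\mathbf{S} := Q \otimes_T P$ to be unital. If $\langle,\rangle_T : \mathbb{T} \to T$ and $\langle,\rangle_S : \mathbf{S} \to S$ respect unities, then $\mathbb{T} \simeq T$ as $A$-rings, $\mathbf{S} \simeq S$ as $B$-rings and we have equivalences of categories ${}_{\mathbb{T}}\mathbb{M} \approx {}_{\mathbf{S}}\mathbb{M}$ (and $\mathbb{M}_{\mathbb{T}} \approx \mathbb{M}_{\mathbf{S}}$).

Proof. Assume $\mathbb{T}$ is unital with $1_{\mathbb{T}} = \sum_{i=1}^{n} p_i \otimes_S q_i$. If $\langle,\rangle_T$ respects unities, then we have $\sum_{i=1}^{n} \langle p_i, q_i \rangle_T = 1_T$, and so for any $t \in T$ we get

$$t = t 1_T = t \sum_{i=1}^{n} \langle p_i, q_i \rangle_T = \sum_{i=1}^{n} \langle t p_i, q_i \rangle_T \in \mathrm{Im}(\langle,\rangle_T).$$

One can prove "2" analogously. As for "3", it is well known that a unital Morita context with surjective connecting bimodule morphisms is strict (e.g. [Fai1981, 12.7.]), hence $\mathbb{T} \simeq T$, $\mathbf{S} \simeq S$. The equivalences of categories ${}_{\mathbb{T}}\mathbb{M} \simeq {}_T\mathbb{M} \approx {}_S\mathbb{M} \simeq {}_{\mathbf{S}}\mathbb{M}$ (and $\mathbb{M}_{\mathbb{T}} \simeq \mathbb{M}_T \approx \mathbb{M}_S \simeq \mathbb{M}_{\mathbf{S}}$) follow then by classical Morita Theory (e.g. [Fai1981, Chapter 12]). ∎

Definition 3.12. Let $T$ be an $A$-rng, $V_T$ a right $T$-module and consider for every left $T$-module ${}_T L$ the annihilator

$$\mathrm{ann}^{\otimes}_L(V_T) := \{ l \in L \mid V \otimes_T l = 0 \}.$$

Following [AF1974, Exercises 19], we say $V_T$ is $L$-faithful, iff $\mathrm{ann}^{\otimes}_L(V_T) = 0$; and completely faithful, iff $V_T$ is $L$-faithful for every left $T$-module ${}_T L$. Similarly, we can define completely faithful left $T$-modules.

Under suitable conditions, the following result characterizes the Morita data which are Morita contexts:

Proposition 3.13. Let $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S)$ be a Morita datum.
1. If $\mathbf{M} \in \mathbb{MC}$, then $\mathbb{S} \simeq \mathbf{S}$ and $\mathbb{T} \simeq \mathbf{T}$ as rngs.
2. Assume ${}_T P$ is $Q$-faithful and $Q_T$ is $P$-faithful. Then $\mathbf{M} \in \mathbb{MC}$ if and only if $\mathbb{S} \simeq \mathbf{S}$ and $\mathbb{T} \simeq \mathbf{T}$ as rngs.

Proof. 1. Obvious.
2. Assume $\mathbb{S} \simeq \mathbf{S}$ and $\mathbb{T} \simeq \mathbf{T}$ as rngs.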
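If $p \in P$ and $q, q' \in Q$ are arbitrary, then we have for any $\tilde{p} \in P$:

$$\langle q, p \rangle_S\, q' \otimes_T \tilde{p} = (q \otimes_T p) \cdot_{\mathbf{S}} (q' \otimes_T \tilde{p}) = (q \otimes_T p) \cdot_{\mathbb{S}} (q' \otimes_T \tilde{p}) = q \langle p, q' \rangle_T \otimes_T \tilde{p},$$

hence $\langle q, p \rangle_S\, q' - q \langle p, q' \rangle_T \in \mathrm{ann}^{\otimes}_Q({}_T P) = 0$ (since ${}_T P$ is $Q$-faithful), i.e. $\langle q, p \rangle_S\, q' = q \langle p, q' \rangle_T$ for all $p \in P$ and $q, q' \in Q$. Assuming $Q_T$ is $P$-faithful, one can prove analogously that $\langle p, q \rangle_T\, p' = p \langle q, p' \rangle_S$ for all $p, p' \in P$ and $q \in Q$. Consequently, $\mathbf{M}$ is a Morita context. ∎

4 Injective Morita (Semi-)Contexts

Definition 4.1. We call a Morita semi-context $\mathbf{m}_T = (T, S, P, Q, \langle,\rangle_T, I)$:
injective (resp. semi-strict, strict), iff $\langle,\rangle_T : P \otimes_S Q \to T$ is injective (resp. surjective, bijective);
non-degenerate, iff $Q \hookrightarrow {}^*P$ and $P \hookrightarrow Q^*$ canonically;
a Morita α-semi-context, iff $P_l := (Q, {}_T P) \in \mathcal{P}_l^{\alpha}(T)$ and $Q_r := (P, Q_T) \in \mathcal{P}_r^{\alpha}(T)$.

Notation. By $\mathbb{MSC}^{\alpha} \leq \mathbb{MSC}$ ($\mathbb{UMSC}^{\alpha} \leq \mathbb{UMSC}$) we denote the full subcategory of (unital) Morita semi-contexts satisfying the α-condition. Moreover, we denote by $\mathbb{IMSC} \leq \mathbb{MSC}$ ($\mathbb{IUMSC} \leq \mathbb{UMSC}$) the full subcategory of injective (unital) Morita semi-contexts.

Definition 4.2. We say a Morita datum (context) $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S, I, J)$:
is injective (resp. semi-strict, strict), iff $\langle,\rangle_T : P \otimes_S Q \to T$ and $\langle,\rangle_S : Q \otimes_T P \to S$ are injective (resp. surjective, bijective);
is non-degenerate, iff $Q \hookrightarrow {}^*P$, $P \hookrightarrow Q^*$, $Q \hookrightarrow P^*$ and $P \hookrightarrow {}^*Q$ canonically;
satisfies the left α-condition, iff $P_l := (Q, {}_T P) \in \mathcal{P}_l^{\alpha}(T)$ and $Q_l := (P, {}_S Q) \in \mathcal{P}_l^{\alpha}(S)$;
satisfies the right α-condition, iff $Q_r := (P, Q_T) \in \mathcal{P}_r^{\alpha}(T)$ and $P_r := (Q, P_S) \in \mathcal{P}_r^{\alpha}(S)$;
satisfies the α-condition, or $\mathbf{M}$ is a Morita α-datum (Morita α-context), iff $\mathbf{M}$ satisfies both the left and the right α-conditions.

Notation. By $\mathbb{MC}^{\alpha}_l \leq \mathbb{MC}$ ($\mathbb{UMC}^{\alpha}_l \leq \mathbb{UMC}$) we denote the full subcategory of (unital) Morita contexts satisfying the left α-condition, and by $\mathbb{MC}^{\alpha}_r \leq \mathbb{MC}$ ($\mathbb{UMC}^{\alpha}_r \leq \mathbb{UMC}$) the full subcategory of (unital) Morita contexts satisfying the right α-condition. Moreover, we set $\mathbb{MC}^{\alpha} := \mathbb{MC}^{\alpha}_l \cap \mathbb{MC}^{\alpha}_r$ and $\mathbb{UMC}^{\alpha} := \mathbb{UMC}^{\alpha}_l \cap \mathbb{UMC}^{\alpha}_r$.

Lemma 4.3. Let $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S, I, J) \in \mathbb{MC}$.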
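Consider the Morita semi-context $\mathbf{M}_S := (S, T, Q, P, \langle,\rangle_S)$, the dual pairings $P_l := (Q, {}_T P) \in \mathcal{P}_l(T)$, $Q_r := (P, Q_T) \in \mathcal{P}_r(T)$ and the canonical morphisms of rings $\rho_P : S \to \mathrm{End}({}_T P)^{op}$ and $\lambda_Q : S \to \mathrm{End}(Q_T)$.
1. If $Q_r$ is injective (semi-strict), then $\mathbf{M}_S$ is injective ($\rho_P : S \to \mathrm{End}({}_T P)^{op}$ is a surjective morphism of $B$-rngs).
2. Assume $P_S$ is faithful and let $Q_r$ be semi-strict. Then $S \simeq \mathrm{End}({}_T P)^{op}$ (an isomorphism of unital $B$-rings) and $\mathbf{M}_S$ is strict.
3. If $P_l$ is injective (semi-strict), then $\mathbf{M}_S$ is injective ($\lambda_Q : S \to \mathrm{End}(Q_T)$ is a surjective morphism of $B$-rngs).
4. Assume ${}_S Q$ is faithful and let $P_l$ be semi-strict. Then $S \simeq \mathrm{End}(Q_T)$ (an isomorphism of unital $B$-rings) and $\mathbf{M}_S$ is strict.

Proof. We prove only "1" and "2", as "3" and "4" can be proved analogously. Consider the following butterfly diagram with canonical morphisms

[Diagram (7): butterfly diagram centred at $Q \otimes_T P$. Wings: $\mathrm{id}_Q \otimes_T \kappa_{Q_r} : Q \otimes_T P \to Q \otimes_T Q^*$ and $\kappa_{P_l} \otimes_T \mathrm{id}_P : Q \otimes_T P \to {}^*P \otimes_T P$. Inner maps: $\alpha_Q : Q \otimes_T P \to \mathrm{Hom}_{-T}({}^*P, Q)$ followed by $(\kappa_{P_l}, Q) : \mathrm{Hom}_{-T}({}^*P, Q) \to \mathrm{End}(Q_T)$, and $\alpha_P : Q \otimes_T P \to \mathrm{Hom}_{T-}(Q^*, P)$ followed by $(\kappa_{Q_r}, P) : \mathrm{Hom}_{T-}(Q^*, P) \to \mathrm{End}({}_T P)$. Outer maps: $[,]^r_Q : Q \otimes_T Q^* \to \mathrm{End}(Q_T)$ and $[,]^l_P : {}^*P \otimes_T P \to \mathrm{End}({}_T P)$. Bottom: $\langle,\rangle_S : Q \otimes_T P \to S$ with $\lambda_Q : S \to \mathrm{End}(Q_T)$ and $\rho_P : S \to \mathrm{End}({}_T P)$.]

Let $\sum q_i \otimes_T p_i \in Q \otimes_T P$ be arbitrary. For every $\tilde{p} \in P$ we have

$$[((\kappa_{Q_r}, P) \circ \alpha_P)(\sum q_i \otimes_T p_i)](\tilde{p}) = \sum \langle \tilde{p}, q_i \rangle_T\, p_i = \sum \tilde{p} \langle q_i, p_i \rangle_S = \rho_P(\sum \langle q_i, p_i \rangle_S)(\tilde{p}) = [(\rho_P \circ \langle,\rangle_S)(\sum q_i \otimes_T p_i)](\tilde{p}),$$

i.e. $\alpha_P^{Q_r} := (\kappa_{Q_r}, P) \circ \alpha_P = \rho_P \circ \langle,\rangle_S$; and

$$[([,]^l_P \circ (\kappa_{P_l} \otimes_T \mathrm{id}_P))(\sum q_i \otimes_T p_i)](\tilde{p}) = \sum \kappa_{P_l}(q_i)(\tilde{p})\, p_i = \sum \langle \tilde{p}, q_i \rangle_T\, p_i = \sum \tilde{p} \langle q_i, p_i \rangle_S = \rho_P(\sum \langle q_i, p_i \rangle_S)(\tilde{p}) = [(\rho_P \circ \langle,\rangle_S)(\sum q_i \otimes_T p_i)](\tilde{p}),$$

i.e. $[,]^l_P \circ (\kappa_{P_l} \otimes_T \mathrm{id}_P) = \rho_P \circ \langle,\rangle_S$. On the other hand, for every $\tilde{q} \in Q$ we have

$$[((\kappa_{P_l}, Q) \circ \alpha_Q)(\sum q_i \otimes_T p_i)](\tilde{q}) = \sum q_i \langle p_i, \tilde{q} \rangle_T = (\sum \langle q_i, p_i \rangle_S)\tilde{q} = \lambda_Q(\sum \langle q_i, p_i \rangle_S)(\tilde{q}) = [(\lambda_Q \circ \langle,\rangle_S)(\sum q_i \otimes_T p_i)](\tilde{q}),$$

i.e. $\alpha_Q^{P_l} := (\kappa_{P_l}, Q) \circ \alpha_Q = \lambda_Q \circ \langle,\rangle_S$; and

$$[([,]^r_Q \circ (\mathrm{id}_Q \otimes_T \kappa_{Q_r}))(\sum q_i \otimes_T p_i)](\tilde{q}) = \sum q_i\, \kappa_{Q_r}(p_i)(\tilde{q}) = \sum q_i \langle p_i, \tilde{q} \rangle_T = \sum \langle q_i, p_i \rangle_S\, \tilde{q} = \lambda_Q(\sum \langle q_i, p_i \rangle_S)(\tilde{q}) = [(\lambda_Q \circ \langle,\rangle_S)(\sum q_i \otimes_T p_i)](\tilde{q}),$$

i.e. $[,]^r_Q \circ (\mathrm{id}_Q \otimes_T \kappa_{Q_r}) = \lambda_Q \circ \langle,\rangle_S$. Hence Diagram (7) is commutative.

(1) Follows directly from the assumptions and the equality $\alpha_P^{Q_r} = \rho_P \circ \langle,\rangle_S$.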
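(2) Let $P_S$ be faithful, so that the canonical left $S$-linear map $\rho_P : S \to \mathrm{End}({}_T P)^{op}$ is injective. Assume now that $Q_r$ is semi-strict. Then $\rho_P$ is surjective by "1", whence bijective. Since rings of endomorphisms are unital, we conclude that $S \simeq \mathrm{End}({}_T P)^{op}$ is a unital $B$-ring as well (with unity $\rho_P^{-1}(\mathrm{id}_P)$). Moreover, the surjectivity of $\alpha_P^{Q_r} = \rho_P \circ \langle,\rangle_S$ implies that $\langle,\rangle_S$ is surjective (since $\rho_P$ is injective), say $1_S = \sum_j \langle \tilde{q}_j, \tilde{p}_j \rangle_S$ for some $\{(\tilde{q}_j, \tilde{p}_j)\}_J \subseteq Q \times P$. For any $\sum_i q_i \otimes_T p_i \in \mathrm{Ker}(\langle,\rangle_S)$, we have then

$$\sum_i q_i \otimes_T p_i = (\sum_i q_i \otimes_T p_i) \cdot 1_S = \sum_i (q_i \otimes_T p_i) \cdot (\sum_j \langle \tilde{q}_j, \tilde{p}_j \rangle_S) = \sum_{i,j} q_i \otimes_T p_i \langle \tilde{q}_j, \tilde{p}_j \rangle_S = \sum_{i,j} q_i \otimes_T \langle p_i, \tilde{q}_j \rangle_T\, \tilde{p}_j$$
$$= \sum_{i,j} q_i \langle p_i, \tilde{q}_j \rangle_T \otimes_T \tilde{p}_j = \sum_{i,j} \langle q_i, p_i \rangle_S\, \tilde{q}_j \otimes_T \tilde{p}_j = \sum_j (\sum_i \langle q_i, p_i \rangle_S)\tilde{q}_j \otimes_T \tilde{p}_j = 0,$$

i.e. $\langle,\rangle_S$ is injective, whence an isomorphism. ∎

The following result shows that Morita α-contexts are injective:

Corollary 4.4. $\mathbb{MC}^{\alpha}_l \cup \mathbb{MC}^{\alpha}_r \leq \mathbb{IMC}$.

Example 4.5. Let $\mathbf{m}_T = (T, S, P, Q, \langle,\rangle_T)$ be a non-degenerate Morita semi-context. If $T$ is a QF ring and the $T$-modules ${}_T P$, $Q_T$ are projective, then by Proposition 2.12 "7" $P_l := (Q, {}_T P) \in \mathcal{P}_l^{\alpha}(T)$ and $Q_r := (P, Q_T) \in \mathcal{P}_r^{\alpha}(T)$ (i.e. $\mathbf{m}_T$ is a Morita α-semi-context, whence injective). On the other hand, let $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S)$ be a non-degenerate Morita datum. If $T, S$ are QF rings and the modules ${}_T P$, $Q_T$, $P_S$, ${}_S Q$ are projective, then $\mathbf{M}$ is a Morita α-datum (whence injective). ∎

Every semi-strict unital Morita context is injective (whence strict, e.g. [Fai1981, 12.7.]). The following example, which is a modification of [Lam1999, Example 18.30], shows that the converse is not necessarily true:

Example 4.6. Let $T = M_2(\mathbb{Z}_2)$ be the ring of $2 \times 2$ matrices with entries in $\mathbb{Z}_2$. Notice that $e := \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \in T$ is an idempotent, and that $eTe \simeq \mathbb{Z}_2$ as rings. Set

$$P := Te = \left\{ \begin{bmatrix} a' & 0 \\ c' & 0 \end{bmatrix} \mid a', c' \in \mathbb{Z}_2 \right\} \quad \text{and} \quad Q := eT = \left\{ \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix} \mid a, b \in \mathbb{Z}_2 \right\}.$$

Then $P = Te$ is a $(T, eTe)$-bimodule and $Q = eT$ is an $(eTe, T)$-bimodule. Moreover, we have a Morita context $\mathbf{M}_e = (T, eTe, Te, eT, \langle,\rangle_T, \langle,\rangle_{eTe})$, where the connecting bilinear maps are

$$\langle,\rangle_T : Te \otimes_{eTe} eT \to T, \quad \begin{bmatrix} a' & 0 \\ c' & 0 \end{bmatrix} \otimes_{eTe} \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix} \mapsto \begin{bmatrix} a'a & a'b \\ c'a & c'b \end{bmatrix},$$
$$\langle,\rangle_{eTe} : eT \otimes_T Te \to eTe, \quad \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix} \otimes_T \begin{bmatrix} a' & 0 \\ c' & 0 \end{bmatrix} \mapsto \begin{bmatrix} aa' + bc' & 0 \\ 0 & 0 \end{bmatrix}.$$

Straightforward computations show that $\langle,\rangle_T$ is injective but not surjective (as [...] $\mathrm{Im}(\langle,\rangle_T)$) and that $\langle,\rangle_{eTe}$ is in fact an isomorphism. This means that $\mathbf{M}_e$ is an injective Morita context that is not semi-strict (whence not strict). ∎

Definition 4.7. Let $T$ be a rng and $I \lhd T$ an ideal. For every left $T$-module ${}_T V$ consider the canonical $T$-linear map

$$\zeta_{I,V} : V \to \mathrm{Hom}_T(I, V), \quad v \mapsto [t \mapsto tv].$$

We say ${}_T I$ is strongly $V$-faithful, iff $\mathrm{ann}_V(I) := \mathrm{Ker}(\zeta_{I,V}) = 0$. Moreover, we say $I$ is strongly faithful, iff ${}_T I$ is strongly $V$-faithful for every left $T$-module ${}_T V$. Strong faithfulness of $I$ w.r.t. right $T$-modules can be defined analogously.

Remark 4.8. Let $T$ be a rng, $I \lhd T$ an ideal and ${}_T U$ a left $T$-module. It is clear that $\mathrm{ann}^{\otimes}_U(I_T) \subseteq \mathrm{ann}_U(I) := \mathrm{Ker}(\zeta_{I,U})$. Hence, if ${}_T I$ is strongly $U$-faithful, then $I_T$ is $U$-faithful (which justifies our terminology). In particular, if ${}_T I$ is strongly faithful, then $I_T$ is completely faithful.

Morita α-contexts are injective by Corollary 4.4. The following result gives a partial converse:

Lemma 4.9. Let $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S, I, J) \in \mathbb{MC}$ and assume the Morita semi-context $\mathbf{M}_S := (S, T, Q, P, \langle,\rangle_S, J)$ is injective.
1. If ${}_S J$ is strongly faithful, then $Q_r := (P, Q_T) \in \mathcal{P}_r^{\alpha}(T)$.
2. If $J_S$ is strongly faithful, then $P_l := (Q, {}_T P) \in \mathcal{P}_l^{\alpha}(T)$.

Proof. We prove only "1", since "2" can be proved similarly. Assume $\mathbf{M}_S$ is injective and consider for every left $T$-module $U$ the following diagram

$$Q \otimes_T U \xrightarrow{\ \alpha_U^{Q_r}\ } \mathrm{Hom}_{T-}(P, U) \xrightarrow{\ \psi_{Q,U}\ } \mathrm{Hom}_{S-}(J, Q \otimes_T U), \qquad \zeta_{J, Q \otimes_T U} : Q \otimes_T U \to \mathrm{Hom}_{S-}(J, Q \otimes_T U), \qquad (8)$$

where for all $f \in \mathrm{Hom}_{T-}(P, U)$ and $\sum_j \langle q_j, p_j \rangle_S \in J$ we define

$$\psi_{Q,U}(f)(\sum_j \langle q_j, p_j \rangle_S) := \sum_j q_j \otimes_T f(p_j).$$

Then we have for every $\sum_i \tilde{q}_i \otimes_T \tilde{u}_i \in Q \otimes_T U$ and $s = \sum_j \langle q_j, p_j \rangle_S \in J$:

$$(\psi_{Q,U} \circ \alpha_U^{Q_r})(\sum_i \tilde{q}_i \otimes_T \tilde{u}_i)(s) = \sum_j q_j \otimes_T [\alpha_U^{Q_r}(\sum_i \tilde{q}_i \otimes_T \tilde{u}_i)](p_j) = \sum_j q_j \otimes_T [\sum_i \langle p_j, \tilde{q}_i \rangle_T\, \tilde{u}_i] = \sum_{i,j} q_j \otimes_T \langle p_j, \tilde{q}_i \rangle_T\, \tilde{u}_i$$
$$= \sum_{i,j} q_j \langle p_j, \tilde{q}_i \rangle_T \otimes_T \tilde{u}_i = \sum_{i,j} \langle q_j, p_j \rangle_S\, \tilde{q}_i \otimes_T \tilde{u}_i = \zeta_{J, Q \otimes_T U}(\sum_i \tilde{q}_i \otimes_T \tilde{u}_i)(s),$$

i.e. diagram (8) is commutative.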
If ${}_S J$ is strongly faithful, then $\mathrm{Ker}(\zeta_{J, Q \otimes_T U}) = \mathrm{ann}_{Q \otimes_T U}(J) = 0$, hence $\zeta_{J, Q \otimes_T U}$ is injective and it follows then that $\alpha_U^{Q_r}$ is injective. ∎

Proposition 4.10. Let $\mathbf{M} = (T, S, P, Q, \langle,\rangle_T, \langle,\rangle_S, I, J) \in \mathbb{IMC}$. If ${}_T I$, $I_T$, ${}_S J$ and $J_S$ are strongly faithful, then $\mathbf{M} \in \mathbb{MC}^{\alpha}$.

5 Equivalences of Categories

In this section we give some applications of injective Morita (semi-)contexts and injective Morita data to equivalences between suitable subcategories of modules arising in the Kato-Müller-Ohtake localization-colocalization theory (as developed e.g. in [Kat1978], [KO1979], [Mül1974]). All rings, hence all Morita (semi-)contexts and data, in this section are unital.

Static and Adstatic Modules

5.1. ([C-IG-TW2003]) Let $\mathcal{A}$ and $\mathcal{B}$ be two complete cocomplete Abelian categories, $R : \mathcal{A} \to \mathcal{B}$ an additive covariant functor with left adjoint $L : \mathcal{B} \to \mathcal{A}$ and let $\omega : LR \to 1_{\mathcal{A}}$ and $\eta : 1_{\mathcal{B}} \to RL$ be the induced natural transformations (called the counit and the unit of the adjunction, respectively). Related to the adjoint pair $(L, R)$ are two full subcategories of $\mathcal{A}$ and $\mathcal{B}$:

$$\mathrm{Stat}(R) := \{ X \in \mathcal{A} \mid LR(X) \simeq X \} \quad \text{and} \quad \mathrm{Adstat}(R) := \{ Y \in \mathcal{B} \mid Y \simeq RL(Y) \},$$

whose members are called $R$-static objects and $R$-adstatic objects, respectively. It is evident (from the definition) that we have an equivalence of categories $\mathrm{Stat}(R) \approx \mathrm{Adstat}(R)$.

A typical situation in which static and adstatic objects arise naturally is the following:

5.2. Let $T, S$ be rings, ${}_T U_S$ a $(T,S)$-bimodule and consider the covariant functors

$$H^l_U := \mathrm{Hom}_T(U, -) : {}_T\mathbb{M} \to {}_S\mathbb{M} \quad \text{and} \quad T^l_U := U \otimes_S - : {}_S\mathbb{M} \to {}_T\mathbb{M}.$$

It is well known that $(T^l_U, H^l_U)$ is an adjoint pair of covariant functors via the natural isomorphisms

$$\mathrm{Hom}_T(U \otimes_S M, N) \simeq \mathrm{Hom}_S(M, \mathrm{Hom}_T(U, N)) \text{ for all } M \in {}_S\mathbb{M} \text{ and } N \in {}_T\mathbb{M},$$

and the natural transformations

$$\omega^l_U : U \otimes_S \mathrm{Hom}_T(U, -) \to 1_{{}_T\mathbb{M}} \quad \text{and} \quad \eta^l_U : 1_{{}_S\mathbb{M}} \to \mathrm{Hom}_T(U, U \otimes_S -)$$

yield for every ${}_T K$ and ${}_S L$ the canonical morphisms

$$\omega^l_{U,K} : U \otimes_S \mathrm{Hom}_T(U, K) \to K \quad \text{and} \quad \eta^l_{U,L} : L \to \mathrm{Hom}_T(U, U \otimes_S L). \qquad (9)$$

We call the $H^l_U$-static modules $U$-static w.r.t.
$S$ and set

$$\mathrm{Stat}^l({}_T U_S) := \mathrm{Stat}(H^l_U) = \{ {}_T K \mid U \otimes_S \mathrm{Hom}_{T-}(U, K) \simeq K \};$$

and the $H^l_U$-adstatic modules $U$-adstatic w.r.t. $S$ and set

$$\mathrm{Adstat}^l({}_T U_S) := \mathrm{Adstat}(H^l_U) = \{ {}_S L \mid L \simeq \mathrm{Hom}_{T-}(U, U \otimes_S L) \}.$$

By [Nau1990a] and [Nau1990b], there are equivalences of categories

$$\mathrm{Stat}^l({}_T U_S) \approx \mathrm{Adstat}^l({}_T U_S). \qquad (10)$$

On the other hand, one can define the full subcategories $\mathrm{Stat}^r({}_T U_S) \approx \mathrm{Adstat}^r({}_T U_S)$:

$$\mathrm{Stat}^r({}_T U_S) := \{ K_S \mid \mathrm{Hom}_{-S}(U, K) \otimes_T U \simeq K \}; \qquad \mathrm{Adstat}^r({}_T U_S) := \{ L_T \mid L \simeq \mathrm{Hom}_{-S}(U, L \otimes_T U) \}.$$

In particular, setting

$$\mathrm{Stat}({}_T U) := \mathrm{Stat}^l({}_T U_{\mathrm{End}({}_T U)^{op}}); \qquad \mathrm{Adstat}({}_T U) := \mathrm{Adstat}^l({}_T U_{\mathrm{End}({}_T U)^{op}});$$
$$\mathrm{Stat}(U_S) := \mathrm{Stat}^r({}_{\mathrm{End}(U_S)} U_S); \qquad \mathrm{Adstat}(U_S) := \mathrm{Adstat}^r({}_{\mathrm{End}(U_S)} U_S),$$

there are equivalences of categories:

$$\mathrm{Stat}({}_T U) \approx \mathrm{Adstat}({}_T U) \quad \text{and} \quad \mathrm{Stat}(U_S) \approx \mathrm{Adstat}(U_S). \qquad (11)$$

Remark 5.3. The theory of static and adstatic modules was developed in a series of papers by the second author (see the references). They were also considered by several other authors (e.g. [Alp1990], [CF2004]). For other terminologies used by different authors, the interested reader may refer to a comprehensive treatment of the subject by R. Wisbauer in [Wis2000].

Intersecting subcategories

Several intersecting subcategories related to Morita contexts were introduced in the literature (e.g. [Nau1993], [Nau1994-b]). In what follows we introduce more and we show that many of these coincide, if one starts with an injective Morita semi-context. Moreover, other results on equivalences between some intersecting subcategories related to an injective Morita context will be reframed for arbitrary (not necessarily compatible) injective Morita data.

Definition 5.4. 1. For a right $T$-module $X$, a $T$-submodule $X' \subseteq X$ is called $K$-pure for some left $T$-module ${}_T K$, iff the following sequence of Abelian groups is exact

$$0 \to X' \otimes_T K \to X \otimes_T K \to X/X' \otimes_T K \to 0;$$

2. For a left $T$-module $Y$, a $T$-submodule $Y' \subseteq Y$ is called $L$-copure for some left $T$-module ${}_T L$, iff the following sequence of Abelian groups is exact

$$0 \to \mathrm{Hom}_T(Y/Y', L) \to \mathrm{Hom}_T(Y, L) \to \mathrm{Hom}_T(Y', L) \to 0.$$

Definition 5.5.
(Compare [KO1979, Theorems 1.3., 2.3.]) Let T be a ring, I ⊳ T an ideal, U a left T -module and consider the canonical T -linear morphisms ζI,U : U → HomT (I, U) and ξI,U : I ⊗T U → U.
1. We say TU is I-divisible, iff ξI,U is surjective (equivalently, iff IU = U).
2. We say TU is I-localized, iff U ≃ HomT (I, U) canonically (equivalently, iff T I is strongly U -faithful and T I ⊆ T is U -copure).
3. We say TU is I-colocalized, iff I ⊗T U ≃ U canonically (equivalently, iff TU is I-divisible and IT ⊆ T is U -pure).
Notation. For a ring T, an ideal I ⊳ T, and with morphisms being the canonical ones, we set
ID := {TU | IU = U}; IF := {TU | U ↪ HomT−(I, U)}; IL := {TU | U ≃ HomT (I, U)}; IC := {TU | I ⊗T U ≃ U};
DI := {UT | UI = U}; FI := {UT | U ↪ Hom−T (I, U)}; LI := {UT | U ≃ HomT (I, U)}; CI := {UT | U ⊗T I ≃ U}.
The following result is due to T. Kato, K. Ohtake and B. Müller (e.g. [Mül1974], [Kat1978], [KO1979]):
Proposition 5.6. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ UMC. Then there are equivalences of categories IC ≈ JC, CI ≈ CJ , IL ≈ JL and LI ≈ LJ .
5.7. Let mT = (T, S, P,Q,<,>T , I) ∈ UMSC and consider the dual pairings Pl := (Q, TP ) ∈ Pl(T ) and Qr := (P,QT ) ∈ Pr(T ). For every left (right) T -module U consider the canonical S-linear morphism induced by <,>T : αU : Q ⊗T U → HomT−(P, U) (αU : U ⊗T P → Hom−T (Q,U)). We define
Dl(mT ) := {TU | Q ⊗T U ≃ HomT−(P, U)}; Dr(mT ) := {UT | U ⊗T P ≃ Hom−T (Q,U)}.
Moreover, set Ul(mT ) := Stat l(TPS) ∩Adstat l(SQT ); Ur(mT ) := Stat r(SQT ) ∩Adstat r(TPS); Vl(mT ) := Stat l(TPS) ∩ Dl(mT ); Vr(mT ) := Stat r(SQT ) ∩ Dr(mT ); Vl(mT ) := IC ∩ Dl(mT ); Vr(mT ) := CI ∩ Dr(mT ); V̂l(mT ) := Vl(mT )∩ IL; V̂r(mT ) := Vr(mT ) ∩ LI ; Wl(mT ) := Adstat l(SQT ) ∩ Dl(mT ); Wr(mT ) := Adstat r(TPS) ∩ Dr(mT ); Wl(mT ) := IL ∩ Dl(mT ); Wr(mT ) := LI ∩ Dr(mT ); Ŵl(mT ) := Wl(mT )∩ IC; Ŵr(mT ) := Wr(mT ) ∩ CI ; Xl(mT ) := Vl(mT ) ∩Wl(mT ); Xr(mT ) := Vr(mT ) ∩Wr(mT ); Xl(mT ) := Vl(mT ) ∩Wl(mT ); Xr(mT ) := Vr(mT ) ∩Wr(mT ). X ∗l (mT ) := {S(Q⊗T U) | V ∈ Xl(mT )}; X r (mT ) := {(U ⊗T P )S | V ∈ Xr(mT )}; l (mT ) := {S(Q⊗T U) | V ∈ Xl(mT )}; X r(mT ) := {(U ⊗T P )S | V ∈ Xr(mT )}. Given mS = (S, T,Q, P,<,>S, J) ∈ UMSC one can define analogously, the corresponding intersecting subcategories of SM and MS. As an immediate consequence of Proposition 5.6 we get Corollary 5.8. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ IUMC and consider the asso- ciated Morita semi-contexts MT and MS (5). 1. If IC ≤ Dl(MT ) and JC ≤ Dl(MS), then Vl(MT ) ≈ Vl(MS). Similarly, if CI ≤ Dr(MT ) and CJ ≤ Dr(MS), then Vr(MT ) ≈ Vr(MS). 2. If IL ≤ Dl(MT ) and JL ≤ Dl(MS), then Wl(MT ) ≈ Wl(MS). Similarly, if LI ≤ Dr(MT ) and LJ ≤ Dr(MS), then Wr(MT ) ≈ Wr(MS). Starting with a Morita context, the following result was obtained in [Nau1993, Theorem 3.2.]. We restate the result for an arbitrary (not necessarily compatible) Morita datum and sketch its proof: Lemma 5.9. Let M = (T, S, P,Q,<,>T , <,>S, I, J) be a unital Morita datum and con- sider the associated Morita semi-contexts MT and MS in (5). Then there are equivalences of categories Xl(MT ) HomT−(P,−) HomS−(Q,−) Xl(MS) and Xr(MT ) Hom−T (Q,−) Hom−S(P,−) Xr(MS). Proof. Let TV ∈ Xl(MT ). By the equivalence Stat l(TPS) HomT (P,−) ≈ Adstatl(TPS) in 5.2 we have HomT−(P, V ) ∈ Adstat l(TPS). 
Moreover, V ∈ Dl(M), hence HomT−(P, V ) ≃ Q⊗T V canonically and it follows then from the equivalence Adstatl(SQT ) ≈ Statl(SQT ) that HomT−(P, V ) ∈ Stat l(SQT ). Moreover, we have the following natural isomorphisms P ⊗S HomT−(P, V ) ≃ V ≃ HomS−(Q,Q⊗T V ) ≃ HomS−(Q,HomT−(P, V )), (13) i.e. HomT−(P, V ) ∈ Dl(MS). Consequently, HomT−(P, V ) ∈ Xl(MS). Moreover, (13) yields a natural isomorphism V ≃ HomS−(Q,HomT−(P, V )). Analogously, one can show for everyW ∈ Xl(MS) that HomS−(Q,W ) ∈ Xl(MT ) and thatW ≃ HomT−(P,HomS−(Q,W )) naturally. Consequently, Xl(MT ) ≈ Xl(MS). The equivalences Xr(MT ) ≈ Xr(MS) can be proved analogously.� Proposition 5.10. Let M = (T, S, P,Q,<,>T , <,>S, I, J) be a unital injective Morita datum and consider the associated Morita semi-contexts MT and MS in (5). 1. There are equivalences of categories Statl(T IT ) ≈ Adstat l(T IT ); Stat l(SJS) ≈ Adstat l(SJS); Statr(T IT ) ≈ Adstat r(T IT ); Stat r(SJS) ≈ Adstat r(SJS). 2. If Statl(T IT ) ≤ X l (MS) and Stat l(SJS) ≤ X l (MT ), then there are equivalences of categories Statl(T IT ) ≈ Stat l(SJS) and Adstat l(T IT ) ≈ Adstat l(SJS). 3. If Statr(T IT ) ≤ X r (MS) and Stat r(SJS) ≤ X r (MT ), then there are equivalences of categories Statr(T IT ) ≈ Stat r(SJS) and Adstat r(T IT ) ≈ Adstat r(SJS). Proof. To prove “1”, notice that since M is an injective Morita datum, P ⊗S Q and Q⊗T P ≃ J as bimodules and so the four equivalences of categories result from 5.2. To prove “2”, one can use an argument similar to that in [Nau1994-b, Theorem 3.9.] to show that the inclusion Statl(T IT ) = Stat l(T (P ⊗S Q)T ) ≤ X l (MS) implies Stat l(T IT ) = Statl(T (P ⊗S Q)T ) = Xl(MT ) and that the inclusion Stat l(SJS) = Stat l(S(Q ⊗T P )S) ≤ X ∗l (MT ) implies Stat l(SJS) = Stat l(S(Q ⊗T P )S) = Xl(MS). The result follows then by Lemma 5.9. The proof of “3” is analogous to that of “2”.� For injective Morita semi-contexts, several subcategories in (12) are shown in the following result to be equal: Theorem 5.11. 
Let mT = (T, S, P,Q,<,>T , I) ∈ IUMS. Then 1. Vl(mT ) = Vl(mT ), Wl(mT ) = Wl(mT ), whence V̂l(mT ) = Ŵl(mT ) = Xl(mT ) = Xl(mT ) = IC∩Dl(mT )∩ IL and X l (mT ) = X l (mT ). 2. Vr(mT ) = Vr(mT ), Wr(mT ) = Wr(mT ), whence V̂r(mT ) = Ŵr(mT ) = Xr(mT ) = Xr(mT ) = CI∩Dr(mT )∩LI and X r (mT ) = X r(mT ). Proof. We prove only “1” as “2” can be proved analogously. Assume the Morita semi- context mT = (T, S, P,Q,<,>T , I) is injective. By our assumption we have for every V ∈ Dl(mT ) the commutative diagram P ⊗S (Q⊗T V ) idP⊗S(α (P ⊗S Q)⊗T V <,>T⊗T idV≃ P ⊗S HomT−(P, V ) // V I ⊗T VξI,V Then it becomes obvious that ωlP,V : P ⊗S HomT (P, V ) → V is an isomorphism if and only if ξI,V : I ⊗T V → V is an isomorphism. Consequently V(mT ) = Dl(mT ) ∩ Stat l(TPS) = Dl(mT ) ∩ IC = V(mT ). On the other hand, we have for every V ∈ Dl(mT ) the following commutative diagram HomS−(Q,HomT−(P, V )) // HomT−(P ⊗S Q, V ) HomS−(Q,Q⊗T V ) // HomT−(I, V ) (<,>T ,V )≃ It follows then that ηlP,L : V → HomS(Q,Q ⊗T P ) is an isomorphism if and only if ζI,V : V → HomT (I, V ) is an isomorphism. Consequently, W(mT ) = Dl(mT ) ∩ Adstat l(TPS) = Dl(mT ) ∩I L = W(mT ). Moreover, we have V̂l(mT ) := Vl(mT ) ∩ IL = Vl(mT ) ∩ IL = IC ∩ Dl(mT )∩ IL = IC ∩Wl(mT ) = IC ∩Wl(mT ) = Ŵl(mT ). On the other hand, we have Xl(mT ) = Vl(mT ) ∩Wl(mT ) = Vl(mT ) ∩Wl(mT ) = Xl(mT ) and so the equalities V̂l(mT ) = Ŵl(mT ) = Xl(mT ) = Xl(mT ) and X l (mT ) = X l (mT ) are established.� In addition to establishing several other equivalences of intersecting subcategories, the following results reframe the equivalence of categories V̂ ≈ Ŵ in [Nau1994-b, Theorem 4.9.] for an arbitrary (not necessarily compatible) injective Morita datum: Theorem 5.12. Let M = (T, S, P,Q,<,>T , <,>S, I, J) be an injective Morita datum and consider the associated Morita semi-contexts MT and MS (5). 1. The following subcategories are mutually equivalent: V̂l(MT ) = Ŵl(MT ) = Xl(MT ) = Xl(MT ) ≈ Xl(MS) = Xl(MS) = Ŵl(MS) = V̂l(MS). 2. 
If Vl(MT ) ≤ IL and Wl(MS) ≤ JC, then Vl(MT ) ≈ Wl(MS). If Wl(MT ) ≤ IC and Vl(MS) ≤ JL, then Wl(MT ) ≈ Vl(MS).
3. The following subcategories are mutually equivalent: V̂r(MT ) = Ŵr(MT ) = Xr(MT ) = Xr(MT ) ≈ Xr(MS) = Xr(MS) = Ŵr(MS) = V̂r(MS).
4. If Vr(MT ) ≤ LI and Wr(MS) ≤ CJ , then Vr(MT ) ≈ Wr(MS). If Wr(MT ) ≤ CI and Vr(MS) ≤ LJ , then Vr(MS) ≈ Wr(MT ).
Proof. By Lemma 5.9, Xl(MT ) ≈ Xl(MS) and so “1” follows by Theorem 5.11. If Vl(MT ) ≤ IL and Wl(MS) ≤ JC, then we have Vl(MT ) = Vl(MT ) ∩ IL = V̂l(MT ) ≈ Ŵl(MS) = Wl(MS) ∩ JC = Wl(MS). On the other hand, if Wl(MT ) ≤ IC and Vl(MS) ≤ JL, then Wl(MT ) = Wl(MT ) ∩ IC = Ŵl(MT ) ≈ V̂l(MS) = Vl(MS) ∩ JL = Vl(MS). So we have established “2”. The results in “3” and “4” can be obtained analogously. □
6 More applications
In this final section we give more applications of Morita α-(semi-)contexts and injective Morita (semi-)contexts. All rings in this section are unital, whence all Morita (semi-)contexts are unital. Moreover, for any ring T we denote by TE an arbitrary, but fixed, injective cogenerator in TM.
Notation. Let T be an A-ring. For any left T -module TV, we set #V := HomT (V, TE). If, moreover, TVS is a (T, S)-bimodule for some B-ring S, then we consider S#V with the left S-module structure induced by that of VS.
Lemma 6.1. (Compare [Col1990, Lemma 3.2.], [CF2004, Lemmas 2.1.2., 2.1.3.]) Let T be an A-ring, S a B-ring and TVS a (T, S)-bimodule.
1. A left T -module TK is V -generated if and only if the canonical T -linear morphism ωlV,K : V ⊗S HomT (V,K) → K (18) is surjective. Moreover, V ⊗S W ⊆ Pres(TV ) ⊆ Gen(TV ) for every left S-module SW.
2. A left S-module SL is S#V -cogenerated if and only if the canonical S-linear morphism ηlV,L : L → HomT (V, V ⊗S L) (19) is injective. Moreover, HomT (V,M) ⊆ Copres(S#V ) ⊆ Cogen(S#V ) for every left T -module TM.
Remark 6.2. Let T be an A-ring, S a B-ring and TVS a (T, S)-bimodule.
Notice that for any left S-module SL we have ann⊗L(VS) := {l ∈ L | V ⊗S l = 0} = Ker(η V,L), whence (by Lemma 6.1 “2” ) VS is L-faithful if and only if SL is S V -cogenerated. It follows then that VS is completely faithful if and only if S V is a cogenerator. Localization and colocalization In what follows we clarify the relations between static (adstatic) modules and subcate- gories colocalized (localized) by a trace ideal of a Morita context satisfying the α-condition. Recall that for any (T, S)-bimodule TPS we have by Lemma 6.1: Statl(TPS) ⊆ Gen(TP ) and Adstat l(TPS) ⊆ Cogen( S P ). (20) Theorem 6.3. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ UMC. Then we have IC ⊆ ID ⊆ Gen(TP ). (21) Assume Pr := (Q,PS) ∈ P r (S). Then 1. Gen(TP ) = Stat l(TPS) ⊆ IF. 2. If Gen(TP ) ⊆ IC, then IC = ID = Gen(TP ) = Stat l(TPS). 3. If Qr := (P,QT ) ∈ P r (T ), then T I ⊆ TT is pure and IC = ID. Proof. For every left T -module TK, consider the following diagram with canonical mor- phisms and let α2 := ζI,K ◦ ω P,K. It is easy to see that both rectangles and the two right triangles commutes: P ⊗S Q⊗T K idP⊗Sα <,>T⊗T idK P ⊗S HomT (P,K) HomT (P,K)// HomS(Q,HomT (P,K)) HomT (P ⊗S Q,K) I ⊗T K // HomT (I,K) (<,>T ,K) It follows directly from the definitions that IC ⊆ ID and Stat l(TPS) ⊆ Gen(TP ). If TK is I-divisible, then ξI,K◦ <,>T ⊗T idK = ω P,K ◦ idP ⊗S α K is surjective, whence ω is surjective and we conclude that TK is P -generated by Lemma 6.1 “1”. Consequently, ID ⊆ Gen(TP ). Assume now that Pr ∈ P r (S). Considering the canonical map ρQ : T → End(SQ) the map ρQ◦ <,>T= α Q is injective and so the bilinear map <,>T is injective (i.e. P ⊗S Q ≃ I). Define α1 := (idP ⊗S α K ) ◦ (<,>T ⊗T idK) −1, so that the left triangles commute. Notice that αPr HomT (P,K) is injective and the commutativity of the upper right triangle in Diagram (22) implies that α2 is injective (whence ω P,K is injective by the commutativity of the lower right triangle). 1. 
If K ∈ Statl(TPS), then the commutativity of the lower right triangle (22) and the injectivity of α2 show that ζI,K is injective; hence, Stat l(TPS) ⊆ IF. On the other hand, if TK is P -generated, then ω P,K is surjective by Lemma 6.1 (1), thence bijective, i.e. K ∈ Statl(TPS). Consequently, Gen(TP ) = Stat l(TPS). 2. This follows directly from the inclusions in (21) and “1”. 3. Assume Qr := (P,QT ) ∈ P r (T ). Since Pr ∈ P r (S), it follows by analogy to Propo- sition 2.12 “3” that PS is flat, hence idP ⊗S α K is injective. The commutativity of the upper left triangle in Diagram (22) implies then that α1 is injective, thence ξI,K is injective by commutativity of the lower left triangle (i.e. T I ⊆ TT is K-pure). If TK is divisible, then K ⊗T I ≃ K (i.e. K ∈ IC).� Theorem 6.4. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ UMC. Then we have JL ⊆ JF ⊆ Cogen( S P ) and Adstat l(TPS) ⊆ Cogen( S P ). Assume Qr := (P,QT ) ∈ P r (T ). Then 1. JS ⊆ SS is pure and JC ⊆ Cogen( S P ). 2. If Pr := (Q,PS) ∈ P r (S), then JL ⊆ Adstat l(TPS) ⊆ Cogen( S P ) ⊆ JF. 3. If Pr ∈ P r (S) and Cogen( S P ) ⊆ JL, then JL = Cogen( S P ) = Adstat l(TPS). Proof. For every right S-module L consider the commutative diagram with canonical morphisms and let α3 be so defined, that the left triangles become commutative J ⊗S L ξJ,L // ζJ,L // HomS(J, L) (<,>S ,L) HomS(Q⊗T P, L) ≃ can Q⊗T P ⊗S L (<,>S)⊗S idL // HomT (P, P ⊗S L) HomT (P,HomS(Q,L)) By definition JL ⊆ JF and Adstat l(TPS) ⊆ Cogen( S P ). If SL ∈ JF, then ζJ,L is injective and it follows by commutativity of the right rectangle in Diagram (23) that ηlP,L is injective, hence SL is S P -cogenerated by Lemma 6.1 “2”. Consequently, JF ⊆ Cogen( S P ). Assume now that Qr ∈ P r (T ). Then it follows from Lemma 4.3 that <,>S is injective (hence Q⊗T P ≃ J) and so α4 := (can ◦ (<,>S, L)) −1 ◦ (P, αPrL ) is injective. 1. Since α3 is injective, ξJ,L is also injective for every SL, i.e. JS ⊆ SS is pure. 
If SL ∈ JC, then it follows from the commutativity of the left rectangle in Diagram (23) that ηlP,L is injective, hence L ∈ Cogen( S P ) by Lemma 6.1 (2).
2. Assume that Pr ∈ P r (S), so that α4 is injective. If SL ∈ JL, then ζJ,L is an isomorphism, hence ηlP,L is surjective (notice that α4 is injective). Consequently, JL ⊆ Adstat l(TPS).
3. This follows directly from the assumptions and “2”. □
∗-Modules
To the end of this section, we fix a unital ring T, a left T -module TP and set S := End(TP ).
Definition 6.5. ([MO1989]) We call TP a ∗-module, iff Gen(TP ) ≈ Cogen( S P ).
Remark 6.6. It was shown by J. Trlifaj [Trl1994] that all ∗-modules are finitely generated.
By definition, Statl(TPS) ≤ TM and Adstat l(TPS) ≤ SM are the largest subcategories between which the adjunction (P ⊗S −,HomT (P,−)) induces an equivalence. On the other hand, Lemma 6.1 shows that Gen(TP ) ≤ TM and Cogen( S P ) ≤ SM are the largest such subcategories (see [Col1990, Section 3] for more details). This suggests the following observation:
Proposition 6.7. ([Xin1999, Lemma 2.3.]) We have: TP is a ∗-module ⇔ Stat(TP ) = Gen(TP ) and Adstat(TP ) = Cogen( S P ).
Definition 6.8. A left T -module TU is said to be
semi-∑-quasi-projective (abbr. s-∑-quasi-projective), iff for any left T -module TV ∈ Pres(TU) and any U -presentation U(Λ) → U(Λ′) → V → 0 of TV (if any), the following induced sequence is exact: HomT (U, U(Λ)) → HomT (U, U(Λ′)) → HomT (U, V ) → 0;
weakly-∑-quasi-projective (abbr. w-∑-quasi-projective), iff for any left T -module TV and any short exact sequence 0 → K → U(Λ′) → V → 0 with K ∈ Gen(TU) (if any), the following induced sequence is exact: 0 → HomT (U,K) → HomT (U, U(Λ′)) → HomT (U, V ) → 0;
self-tilting, iff TU is w-∑-quasi-projective and Gen(TU) = Pres(TU);
∑-self-static, iff any direct sum U(Λ) is U -static;
(self)-small, iff HomT (U,−) commutes with direct sums (of TU).
Proposition 6.9. Assume M = (T, S, P,Q,<,>T , <,>S) is a unital Morita context. 1.
If Pr := (Q,PS) ∈ P r (S), then: (a) Gen(TP ) = Stat l(TPS); (b) there is an equivalence of categories Gen(TP ) ≈ Cop( S P ); (c) TP is -self-static and Statl(TPS) is closed under factor modules. (d) Gen(TP ) = Pres(TP ); 2. If M ∈ UMCαr and Cogen( S P ) ⊆ JL, then: (a) Gen(TP ) = Stat l(TPS) and Cogen( S P ) = Adstat l(TPS); (b) there is an equivalence of categories Cogen( S P ) ≈ Gen(TP ); (c) TP is a ∗-module; (d) TP is self-tilting and self-small. Proof. 1. If Pr ∈ P r (S), then it follows by Theorem 6.3 that Gen(TP ) = Stat l(TPS), which is equivalent to each of “b” and “c” by [Wis2000, 4.4.] and to “d” by [Wis2000, 4.3.]. 2. It follows by the assumptions, Theorems 6.3, 6.4 and 5.2 that Gen(TP ) = Stat l(TPS) ≈ Adstatl(TPS) = Cogen( S P ), whence Gen(TP ) ≈ Cogen( S P ) (which is the definition of ∗-modules). Hence “a” ⇔“b” ⇔“c”. The equivalence “a” ⇔ “d” is evident by [Wis2000, Corollary 4.7.] and we are done.� Wide Morita Contexts Wide Morita contexts were introduced by F. Castaño Iglesias and J. Gómez-Torrecillas [C-IG-T1995] and [C-IG-T1996] as an extension of classical Morita contexts to Abelian categories. Definition 6.10. Let A and B be Abelian categories. A right (left) wide Morita context between A and B is a datum Wr = (G,A,B, F, η, ρ), where G : A ⇄ B : F are right (left) exact covariant functors and η : F ◦ G −→ 1A, ρ : G ◦ F −→ 1B (η : 1A −→ F ◦ G, ρ : 1B −→ G ◦ F ) are natural transformations, such that for every pair of objects (A,B) ∈ A× B the compatibility conditions G(ηA) = ρG(A) and F (ρB) = ηF (B) hold. Definition 6.11. Let A and B be Abelian categories and W = (G,A,B, F, η, ρ) be a right (left) wide Morita context. We call W injective (respectively semi-strict, strict), iff η and ρ are monomorphisms (respectively epimorphisms, isomorphisms) Remarks 6.12. Let W = (G,A,B, F, η, ρ) be a right (left) wide Morita context. 1. It follows by [CDN2005, Propositions 1.1., 1.4.] 
that if either η or ρ is an epimorphism (monomorphism), then W is strict, whence A ≈ B. 2. The resemblance of injective left wide Morita contexts is with the Morita-Takeuchi contexts for comodules of coalgebras, i.e. the so called pre-equivalence data for cate- gories of comodules introduced in [Tak1977] (see [C-IG-T1998] for more details). Injective Right wide Morita contexts In a recent work [CDN2005, 5.1.], Chifan, et. al. clarified (for module categories) the relation between classical Morita contexts and right wide Morita contexts. For the conve- nience of the reader and for later reference, we include in what follows a brief description of this relation. 6.13. Let T, S be rings, A := TM and B := SM. Associated to each Morita context M = (T, S, P,Q,<,>T , <,>S) is a wide Morita context as follows: Define G : A ⇄ B : F by G(−) = Q ⊗T − and F (−) = P ⊗S −. Then there are natural transformations η : F ◦G −→ 1 and ρ : G ◦ F −→ 1 such that for each TV and WS : ηV : P ⊗S (Q⊗T V ) → V, pi ⊗S (qi ⊗T vi) 7→ < pi, qi >T vi, ρW : Q⊗T (P ⊗S W ) → W, qi ⊗T (pi ⊗S wi) 7→ < qi, pi >S wi. Then the datum Wr(M) := (G, TM, SM, F, η, ρ) is a right wide Morita context. Conversely, let T ′, S ′ be two rings and W ′r = (G ′, T ′M, S′M, F ′, η′, ρ′) be a right wide Morita context between T ′M and S′M such that the right exact functors G ′ : T ′M ⇄ S′M : F ′ commute with direct sums. By Watts’ Theorems (e.g. [Gol1979]), there exists a (T, S)-bimodule P ′ (e.g. F ′(S ′)) such that F ′ ≃ P ′⊗S′ −, an (S, T )-bimodule Q ′ such that G′ ≃ Q′ ⊗T ′ − and there should exist two bilinear forms <,>T ′: P ′ ⊗S′ Q ′ → T ′ and <,>S′: Q ′ ⊗T ′ P ′ → S ′, such that the natural transformations η′ : F ′ ◦G′ → 1 , ρ : G′ ◦ F ′ → 1 are given by η′V ′(p ′ ⊗S′ q ′ ⊗T ′ v ′) =< p′, q′ >T ′ v ′ and ρ′W ′(q ′ ⊗T p ′ ⊗S w ′) =< q′, p′ >S′ w for all V ′ ∈ T ′M, W ′ ∈ S′M, p ′ ∈ P ′, q′ ∈ Q′, v′ ∈ V ′ and w′ ∈ W ′. 
It can be shown that in this way one obtains a Morita context M′ = M′(W ′r) := (T ′, S ′, P ′, Q′, <,>T ′, <,>S′). Moreover, it turns out that given a wide Morita context Wr, we have Wr ≃ Wr(M(Wr)). The following result clarifies the relation between injective Morita contexts and injective right wide Morita contexts. Theorem 6.14. Let M = (T, S, P,Q,<,>T , <,>S) be a Morita context, A := TM, B := SM and consider the induced right wide Morita context Wr(M) := (G,A,B, F, η, ρ). 1. If Wr(M) is an injective right wide Morita context, then M is an injective Morita context. 2. If M ∈ UMCαr , then Wr(M) is an injective right wide Morita context. Proof. 1. Let Wr(M) be an injective right wide Morita context. Then in particular, <,>T= ηT and <,>S= ρS are injective, i.e. M is an injective Morita context. 2. Assume that M satisfies the right α-condition. Suppose there exists some TV and∑ pi ⊗S (qi ⊗T vi) ∈ Ker(ηV ). Then for any q ∈ Q we have 0 = q ⊗T ηV ( (pi ⊗S qi)⊗T vi) = q⊗T < pi, qi >T vi q < pi, qi >T ⊗T vi = < q, pi >S qi ⊗T vi < q, pi >S (qi ⊗T vi) = α pi ⊗S (qi ⊗T vi))(q). Since Pr := (Q,PS) ∈ P r (S), the morphism α is injective and so pi⊗S (qi⊗T vi) = 0, i.e. ηV is injective. Analogously, suppose qi ⊗T (pi ⊗S wi) ∈ Ker(ρW ). Then for any p ∈ P we have 0 = p⊗S ρW ( qi ⊗T (pi ⊗S wi) = p⊗S < qi, pi >S wi p < qi, pi >S ⊗Swi = < p, qi >T pi ⊗S wi < p, qi >T (pi ⊗S wi) = α qi ⊗T (pi ⊗S wi))(p). Since Qr := (P,QT ) ∈ P r (T ), the morphism α is injective and so qi ⊗T (pi ⊗S wi) = 0, i.e. ρW is injective. Consequently, the induced right wide Morita context Wr(M) is injective.� Acknowledgement: The authors thank the referee for his/her careful reading of the paper and for the fruitful suggestions, comments and corrections, which helped in improving several parts of the paper. Moreover, they acknowledge the excellent research facilities as well as the support of their respective institutions, King Fahd University of Petroleum and Minerals and King AbdulAziz University. 
References [Abr1983] G.D. Abrams, Morita equivalence for rings with local units, Comm. Algebra 11 (1983), 801-837. [Abu2005] J.Y. Abuhlail, On the linear weak topology and dual pairings over rings, Topol- ogy Appl. 149 (2005), 161-175. [AF1974] F. Anderson and K. Fuller, Rings and Categories of Modules, Springer-Verlag (1974). [AGH-Z1997] A.V. Arhangélskii, K.R. Goodearl and B. Huisgen-Zimmermann, Kiiti Morita, (1915-1995 ), Notices Amer. Math. Soc. 44(6) (1997), 680-684. [AG-TL2001] J.Y. Abuhlail, J. Gómez-Torrecillas and F. Lobillo, Duality and rational modules in Hopf algebras over commutative rings, J. Algebra 240 (2001), 165- [Alp1990] J.L. Alperin, Static modules and nonnormal Clifford theory, J. Austral. Math. Soc. Ser. A 49(3) (1990), 347-353. [AM1987] P.N. Ánh and L. Márki, Morita equivalence for rings without identity, Tsukuba J. Math 11 (1987), 1-16. [Ami1971] S.A. Amitsur, Rings of quotients and Morita contexts, J. Algebra 17 (1971), 273-298. [Ber2003] I. Berbee, The Morita-Takeuchi theory for quotient categories, Comm. Algebra 31(2) (2003), 843-858. [C-IG-T1995] F. Castaño Iglesias and J. Gomez-Torrecillas, Wide Morita contexts, Comm. Algebra 23 (1995), 601-622. [C-IG-T1996] F. Castaño Iglesias and J. Gomez-Torrecillas, Wide left Morita contexts and equivalences, Rev. Roum. Math. Pures Appl. 4(1-2) (1996), 17-26. [C-IG-T1998] F. Castaño Iglesias and J. Gomez-Torrecillas, Wide Morita contexts and equivalences of comodule categories, J. Pure Appl. Algebra 131 (1998), 213- [BW2003] T. Brzeziński and R. Wisbauer, Corings and Comodules, Lond. Math. Soc. Lec. Not. Ser. 309, Cambridge University Press (2003). [Cae1998] S. Caenepeel, Brauer Groups, Hopf Algebras and Galois Theory, Kluwer Aca- demic Publishers (1998). [C-IG-TW2003] F. Castaño Iglesias, J. Gómez-Torrecillas and R. Wisbauer, Adjoint func- tors and equivalence of subcategories, Bull. Sci. Math. 127 (2003), 279-395. [CDN2005] N. Chifan, S. Dăscălescu and C. 
Năstăsescu, Wide Morita contexts, relative injectivity and equivalence results, J. Algebra 284 (2005), 705-736. [Col1990] R. Colpi, Some remarks on equivalences between categories of modules, Comm. Algebra 18 (1990), 1935-1951. [CF2004] R. Colby and K. Fuller, Equivalence and Duality for Module Categories. With Tilting and Cotilting for Rings, Cambridge University Press (2004). [Fai1981] C. Faith, Algebra I, Rings, Modules and Categories, Springer-Verlag (1981). [Gol1979] J. Golan, An Introduction to Homological Algebra, Academic Press (1979). [HS1998] Z. Hao and K.-P. Shum, The Grothendieck groups of rings of Morita contexts, Group theory (Beijing, 1996), 88-97, Springer (1998). [Kat1978] T. Kato, Duality between colocalization and localization, J. Algebra 55 (1978), 351-374. [KO1979] T. Kato and K. Ohtake, Morita contexts and equivalences. J. Algebra 61 (1979), 360-366. [Lam1999] T.Y. Lam, Lectures on Modules and Rings, Springer (1999). [MO1989] C. Menini and A. Orsatti, Representable equivalences between categories of mod- ules and applications. Rend. Sem. Mat. Univ. Padova 82 (1989), 203-231. [Mül1974] B.J. Müller, The quotient category of a Morita context, J. Algebra 28 (1974), 389-407. [Nau1990a] S.K. Nauman, Static modules, Morita contexts, and equivalences. J. Algebra 135 (1990), 192-202. [Nau1990b] S.K. Nauman, Static modules and stable Clifford theory, J. Algebra 128(2) (1990), 497-509. [Nau1993] S.K. Nauman, Intersecting subcategories of static modules and their equiva- lences, J. Algebra 155(1) (1993), 252-265. [Nau1994-a] S.K. Nauman, An alternate criterion of localized modules, J. Algebra 164 (1994), 256-263. [Nau1994-b] S.K. Nauman, Intersecting subcategories of static modules, stable Clifford the- ory and colocalization-localization, J. Algebra 170(2) (1994), 400-421. [Nau2004] S.K. Nauman, Morita similar matrix rings and their Grothendieck groups, Ali- garh Bull. Math. 23(1-2) (2004), 49-60. [Sat1978] M. Sato, Fuller’s Theorem of equivalences, J. 
Algebra 52 (1978), 274-284. [Tak1977] M. Takeuchi, Morita theorems for categories of comodules, J. Fac. Univ. Tokyo 24 (1977), 629-644. [Trl1994] J. Trlifaj, Every ∗-module is finitely generated, J. Algebra 169 (1994), 392-398. [Ver2006] J. Vercruysse, Local units versus local dualisations: corings with local structure maps, Commun. Algebra 34 (2006), 2079-2103. [Wis1991] R. Wisbauer, Foundations of Module and Ring Theory, a Handbook for Study and Research, Gordon and Breach Science Publishers (1991). [Wis1998] R. Wisbauer, Tilting in module categories, in “Abelian groups, module theory and topology”, LNPAM 201 (1998), 421-444. [Wis2000] R. Wisbauer, Static modules and equivalences, in “Interactions between Ring Theory and Representation Theory”, Ed. V. Oystaeyen, M. Saorin, Marcel Dekker (2000), 423-449. [Xin1999] Lin Xin, A note on ∗-modules, Algebra Colloq. 6(2) (1999), 231-240. [Z-H1976] B. Zimmermann-Huisgen, Pure submodules of direct products of free modules, Math. Ann. 224 (1976), 233-245.
Introduction Preliminaries Morita (Semi)contexts Injective Morita (Semi-)Contexts Equivalences of Categories More applications
ABSTRACT This paper is an exposition of the so-called injective Morita contexts (in which the connecting bimodule morphisms are injective) and Morita $\alpha$-contexts (in which the connecting bimodules enjoy some local projectivity in the sense of Zimmermann-Huisgen). Motivated by situations in which only one trace ideal is in action, or the compatibility between the bimodule morphisms is not needed, we introduce the notions of Morita semi-contexts and Morita data, and investigate them. Injective Morita data will be used (with the help of static and adstatic modules) to establish equivalences between some intersecting subcategories related to subcategories of modules that are localized or colocalized by trace ideals of a Morita datum. We end with applications of Morita $\alpha$-contexts to $\ast$-modules and injective right wide Morita contexts.
<|endoftext|><|startoftext|> Introduction The notations and conventions of charmed baryon The 3P0 model The strong decays of charmed baryon Numerical results Discussion and conclusion Appendix The harmonic oscillator wave functions used in our calculation The momentum space integration Acknowledgments References ABSTRACT There has been important experimental progress in the sector of heavy baryons in the past several years. We study the strong decays of the S-wave, P-wave, D-wave and radially excited charmed baryons using the $^3P_0$ model. After comparing the calculated decay pattern and total width with the available data, we discuss the possible internal structure and quantum numbers of those charmed baryons observed recently. <|endoftext|><|startoftext|> Introduction It took thirty-seven years from the discovery of a tiny CP violating effect of order 10−3 inKL → π+π− 1 to a first observation of a breakdown of CP symmetry outside the strange meson system. A large CP asymmetry of order one between rates of initial B0 and B̄0 decays to J/ψKS was measured in summer 2001 by the Babar and Belle Collaborations.2 A sizable however smaller asymmetry had been anticipated twenty years earlier 3 in the framework of the Kobayashi-Maskawa (KM) model of CP violation,4 in the absence of crucial information on b quark couplings. The asymmetry was observed in a time-dependent measurement as suggested,5 thanks to the long B0 lifetime and the large B0-B̄0 mixing.6 The measured asymmetry, fixing (in the standard phase convention7) the sine of the phase 2β (≡ 2φ1) ≡ 2arg(VtbV td) of the top-quark dominated B 0-B̄0 mixing amplitude, was found to be in good agreement with other determinations of Cabibbo-Kobayasi-Maskawa (CKM) parameters,8,9 including a recent precise measurement of Bs-B̄s mixing. 
This showed that the CKM phase γ (≡ φ3) ≡ arg(V∗ub), which seems to be unable to account for the observed cosmological baryon asymmetry,11 is the dominant source of CP violation in flavor-changing processes. With this confirmation the next pressing question became whether small contributions beyond the CKM framework occur in CP violating flavor-changing processes, and whether such effects can be observed in beauty decays.
One way of answering this question is by over-constraining the CKM unitarity triangle through precise CP conserving measurements related to the lengths of the sides of the triangle. An alternative and more direct way, focusing on the origin of CP violation in the CKM framework, is to measure β and γ in a variety of B decay modes. Different values obtained from asymmetries in several processes, or values different from those imposed by other constraints, could provide clues for new sources of CP violation and for new flavor-changing interactions. Such phases and interactions occur in the low energy effective Hamiltonian of extensions of the Standard Model (SM) including models based on supersymmetry.12
∗Based partially on review talks given at recent conferences. http://arxiv.org/abs/0704.0076v2
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review M. Gronau
In this presentation we will focus on the latter approach based primarily on CP asymmetries, using also complementary information on hadronic B decay rates which are expected to be related to each other in the CKM framework. In the next section we outline several of the most relevant processes and the theoretical tools applied for their studies, quoting numerous papers where these ideas have been originally proposed and where more details can be found.13 Sections 3, 4 and 5 describe a number of methods in some detail, summarizing at the end of each section the current experimental situation. Section 6 discusses several tests for NP effects, while Section 7 concludes.
2.
Processes, methods and New Physics effects
Whereas testing the KM origin of CP violation in most hadronic B decays requires separating strong and weak interaction effects, in a few “golden modes” CP asymmetries are unaffected by strong interactions. For instance, the decay B0 → J/ψKS is dominated by a single tree-level quark transition b̄ → c̄cs̄, up to a correction smaller than a fraction of a percent.14,15,16,17 Thus, the asymmetries measured in this process and in other decays dominated by b̄ → c̄cs̄ have already provided a rather precise measurement of sin 2β,18,19,20
sin 2β = 0.678 ± 0.025 . (1)
This value permits two solutions for β at 21.3° and at 68.7°. Time-dependent angular studies of B0 → J/ψK∗0,21 and time-dependent Dalitz analyses of B0 → Dh0 (D → KSπ+π−, h0 = π0, η, ω)22 measuring cos 2β > 0 have excluded the second solution at a high confidence level, implying
β = (21.3 ± 1.0)° . (2)
Since B0 → J/ψKS proceeds through a CKM-favored quark transition, contributions to the decay amplitude from physics at a higher scale are expected to be very small, potentially identifiable by a tiny direct asymmetry in this process or in B+ → J/ψK+.23
Another process where the determination of a weak phase is not affected by strong interactions is B+ → DK+, proceeding through tree-level amplitudes b̄ → c̄us̄ and b̄ → ūcs̄. The interference of these two amplitudes, from D̄0 and D0 which can always decay to a common hadronic final state, leads to decay rates and a CP asymmetry which measure very cleanly the relative phase γ between these amplitudes.24,25 The trick here lies in recognizing the measurements which yield this fundamental CP-violating quantity. Physics beyond the SM is expected to have a negligible effect on this determination of γ which relies on the interference of two tree amplitudes.
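As a quick numerical check of Eqs. (1) and (2), the following minimal sketch inverts the quoted value of sin 2β to recover the two solutions for β and a propagated uncertainty. It assumes simple linear error propagation, δβ = δ(sin 2β)/(2 |cos 2β|), which is an illustrative choice and not necessarily the procedure behind the quoted numbers; the function name `beta_solutions` is hypothetical.

```python
import math


def beta_solutions(sin2beta, sigma):
    """Invert sin(2*beta) for beta in [0, 90] degrees.

    Returns (beta_1, beta_2, sigma_beta) in degrees, where beta_2 = 90 - beta_1
    is the mirror solution sharing the same sin(2*beta), and sigma_beta comes
    from linear error propagation (an assumption of this sketch).
    """
    two_beta = math.asin(sin2beta)            # principal value, in radians
    beta_1 = math.degrees(two_beta) / 2.0     # solution with cos(2*beta) > 0
    beta_2 = 90.0 - beta_1                    # solution with cos(2*beta) < 0
    # d(beta) = d(sin 2beta) / (2 |cos 2beta|), converted to degrees
    sigma_beta = math.degrees(sigma / (2.0 * abs(math.cos(two_beta))))
    return beta_1, beta_2, sigma_beta


beta_1, beta_2, sigma_beta = beta_solutions(0.678, 0.025)
print(f"beta = {beta_1:.1f} or {beta_2:.1f} degrees, uncertainty {sigma_beta:.1f} degrees")
```

Under these assumptions the sketch gives values consistent with the two solutions and the uncertainty quoted in the text; the sign of cos 2β then selects the first solution.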
B decays into pairs of charmless mesons, such as B → ππ (or B → ρρ) and B → Kπ (or B → K∗ρ), involve contributions of both tree and penguin amplitudes which carry different weak and strong phases.14,26,27 Contrary to the case of B → DK, the determination of β and γ using CP asymmetries in charmless B decays involves two correlated aspects which must be considered: its dependence on strong interaction dynamics and its sensitivity to potential New Physics (NP) effects. This sensitivity follows from the CKM and loop suppression of penguin amplitudes, implying that new heavy particles at the TeV mass range, replacing the W boson and the top-quark in the penguin loop, may have sizable effects.28 In order to claim evidence for physics beyond the SM from a determination of β and γ in these processes one must first handle the question of dynamics.

There are two approaches for treating the dynamics of charmless hadronic B decays:
(1) Study systematically strong interaction effects in the framework of QCD.
(2) Identify by symmetry observables which do not depend on QCD dynamics.

The first approach faces the difficulty of having to treat precisely long distance effects of QCD including final state interactions. Remarkable theoretical progress has been made recently in proving a leading-order (in 1/mb) factorization formula for these amplitudes in a heavy quark effective theory approach to perturbative QCD.29,30,31 However, the different approaches still differ in their treatment of power counting, the scale of Wilson coefficients, end-point quark distribution functions of light mesons, and nonperturbative contributions from charm loops.32 Also, the nonperturbative input parameters in these calculations involve non-negligible uncertainties. These parameters include heavy-to-light form factors at small momentum transfer, light-cone distribution amplitudes, and the average inverse momentum fraction of the spectator quark in the B meson.
The resulting inaccuracies in calculating magnitudes and strong phases of amplitudes prohibit a precise determination of γ from measured decay rates and CP asymmetries. Also, the calculated rates and asymmetries cannot provide a clear case for physics beyond the SM in cases where the results of a calculation deviate slightly from the measurements.

In the second approach one applies isospin symmetry to obtain relations among several decay amplitudes. For instance, using the distinct behavior under isospin of tree and penguin operators contributing in B → ππ, a judicious choice of observables permits a determination of γ or α (≡ φ2) = π − β − γ.33 The same analysis applies in B decays to pairs of longitudinally polarized ρ mesons. If an observable related to the subdominant penguin amplitude is not measured with sufficient precision, it may be replaced in the analysis by a CKM-enhanced SU(3)-related observable, in which a large theoretical uncertainty is translated to a small error in γ.
The precision of this method is increased by including contributions of higher order electroweak penguin amplitudes, which are related by isospin to tree amplitudes.34,35 With sufficient statistics one should also take into account isospin-breaking corrections of order (md − mu)/ΛQCD ∼ 0.02,36,37 and an effect caused by the ρ meson width.38 A similar analysis proposed for extracting γ in B → Kπ 39,40 requires using flavor SU(3) instead of isospin for relating electroweak penguin contributions and tree amplitudes.35,41 While flavor SU(3) is usually assumed to be broken by corrections of order (ms − md)/ΛQCD ∼ 0.3, in this particular case a rather precise recipe for SU(3) breaking is provided by QCD factorization, reducing the theoretical uncertainty in γ to only a few degrees.42

Charmless B decays, which are sensitive to physics beyond the SM,28 provide a rich laboratory for studying various signatures of NP. A large variety of theories have been studied in this context, including supersymmetric models, models involving tree-level flavor-changing Z or Z′ couplings, models with anomalous three-gauge-boson couplings and other models involving an enhanced chromomagnetic dipole operator.43,44 The following effects have been studied and will be discussed in Section 6 in a model-independent manner:
(1) Within the SM, the three values of γ extracted from B → ππ, B → Kπ and B+ → DK+ are equal. As we will explain, these three values are expected to be different in extensions of the SM involving new low energy four-fermion operators behaving as ∆I = 3/2 in B → ππ and as ∆I = 1 in B → Kπ.
(2) Other signatures of anomalously large ∆I = 1 operators contributing to B → Kπ are violations of isospin sum rules, holding in the SM for both decay rates and CP asymmetries in these decays.45,46,47
(3) Time-dependent asymmetries in B0 → π0KS, B0 → φKS and B0 → η′KS and in other b → s penguin-dominated decays may differ substantially from the asymmetry sin 2β sin ∆mt, predicted approximately in the SM.26,43,48 Significant deviations are expected in models involving anomalous |∆S| = 1 operators behaving as ∆I = 0 or ∆I = 1.
(4) An interesting question, which may provide a clue to the underlying New Physics once deviations from SM predictions are observed, is how to diagnose the value of ∆I in NP operators contributing to |∆S| = 1 charmless B decays. We will discuss an answer to this question which has been proposed recently.49

3. Determining γ in B → DK

In this section we will discuss at some length a rather rich and very precise method for determining γ in processes of the form B → D(∗)K(∗), which uses both charged and neutral B mesons and a large variety of final states. It is based on a broad idea that any coherent admixture of a state involving a D̄0 from b̄ → c̄us̄ and a state with D0 from b̄ → ūcs̄ can decay to a common final state.24,25 The interference between the two channels, B → D(∗)0K(∗), D0 → fD and B → D̄(∗)0K(∗), D̄0 → fD, involves the weak phase difference γ, which may be determined with a high theoretical precision using a suitable choice of measurements. Effects of D0-D̄0 mixing are negligible.50 While some of these processes are statistically limited, combining them together is expected to reduce the experimental error in γ.
In addition to (quasi) two-body B decays, the D or D∗ in the final state may be accompanied by any multi-body final state with quantum numbers of a kaon.25 Each process in this large class of neutral and charged B decays is characterized by two pairs of parameters, describing complex ratios of amplitudes for D0 and D̄0 for the two steps of the decay chain (we use a convention rB, rf ≥ 0, 0 ≤ δB, δf < 2π):

$$\frac{A(B \to D^{(*)0}K^{(*)})}{A(B \to \bar{D}^{(*)0}K^{(*)})} = r_B\, e^{i(\delta_B+\gamma)} , \qquad \frac{A(D^0 \to f_D)}{A(\bar{D}^0 \to f_D)} = r_f\, e^{i\delta_f} . \tag{3}$$

In three-body decays of B and D mesons, such as B → DKπ and D → Kππ, the two pairs of parameters (rB, δB) and (rf, δf) are actually functions of two corresponding Dalitz variables describing the kinematics of the above three-body decays. The sensitivity of determining γ depends on rB and rf because this determination relies on an interference of D0 and D̄0 amplitudes. For D decay modes with rf ∼ 1 (see discussion below) the sensitivity increases with the magnitude of rB.

For each of the eight sub-classes of processes, B+,0 → D(∗)K(∗)+,0, one may study a variety of final states in neutral D decays. The states fD may be divided into four families, distinguished qualitatively by their parameters (rf, δf) defined in Eq. (3):
(1) fD = CP-eigenstate 24,25,51 (K+K−, KSπ0, etc.); rf = 1, δf = 0, π.
(2) fD = flavorless but non-CP state 52 (K+K∗−, K∗+K−, etc.); rf = O(1).
(3) fD = flavor state 53 (K+π−, K+π−π0, etc.); rf ∼ tan²θc.
(4) fD = 3-body self-conjugate state 54 (KSπ+π−); rf, δf vary across the Dalitz plane.

In the first family, CP-odd states occur in Cabibbo-favored D0 and D̄0 decays, while CP-even states occur in singly Cabibbo-suppressed decays. The second family of states occurs in singly Cabibbo-suppressed decays, the third family occurs in Cabibbo-favored D̄0 decays and in doubly Cabibbo-suppressed D0 decays, while the last state is formally a Cabibbo-favored mode for both D0 and D̄0.
The parameters rB and δB in B → D(∗)K(∗) depend on whether the B meson is charged or neutral, and may differ for K vs K∗,55 and for D vs D∗, where a neutral D∗ can be observed in D∗ → Dπ0 or D∗ → Dγ.56 The ratio rB involves a CKM factor |VubVcs/VcbVus| ≃ 0.4 in both B+ and B0 decays, and a color-suppression factor in B+ decays, while in B0 decays both b̄ → c̄us̄ and b̄ → ūcs̄ amplitudes are color-suppressed. A rough estimate of the color-suppression factor in these decays may be obtained from the color-suppression measured in corresponding CKM-favored decays, B → Dπ, D∗π, Dρ, D∗ρ, where the suppression is found to be in the range 0.3−0.5.57 Thus, one expects rB(B0) ∼ 0.4, rB(B+) = (0.3−0.5) rB(B0) in all the processes B+,0 → D(∗)K(∗)+,0. We note that three-body B+ decays, such as B+ → D0K+π0, are not color-suppressed, making these processes advantageous by their potentially large value of rB which varies in phase space.58,59

The above comparison of rB(B+) and rB(B0) may be quantified more precisely by expressing the four ratios rB(B0)/rB(B+) in B → D(∗)K(∗) in terms of reciprocal ratios of known magnitudes of amplitudes:60

$$\frac{r_B(B^0 \to D^{(*)}K^{(*)0})}{r_B(B^+ \to D^{(*)}K^{(*)+})} \simeq \sqrt{\frac{\mathcal{B}(B^+ \to \bar{D}^{(*)0}K^{(*)+})}{\mathcal{B}(B^0 \to \bar{D}^{(*)0}K^{(*)0})}} . \tag{4}$$

This follows from an approximation,

$$A(B^0 \to D^{(*)0}K^{(*)0}) \simeq A(B^+ \to D^{(*)0}K^{(*)+}) , \tag{5}$$

where the B0 and B+ processes are related to each other by replacing a spectator d quark by a u quark. While formally Eq. (5) is not an isospin prediction, it may be obtained using an isospin triangle relation,61

$$A(B^0 \to D^{(*)0}K^{(*)0}) = A(B^+ \to D^{(*)0}K^{(*)+}) + A(B^+ \to D^{(*)+}K^{(*)0}) , \tag{6}$$

and neglecting the second amplitude on the right-hand side, which is “pure annihilation”.62 This amplitude is expected to be suppressed by a factor of four or five relative to the other two amplitudes appearing in (6), which are color-suppressed.
Evidence for this kind of suppression is provided by corresponding ratios of CKM-favored amplitudes,57 |A(B0 → Ds−K+)/√2A(D̄0π0)| = 0.23 ± 0.03, |A(B0 → Ds∗−K+)/√2A(D̄∗0π0)| < 0.24. Applying Eq. (4) to measured branching ratios,57,63 one finds

                  B → DK       B → DK∗      B → D∗K    B → D∗K∗
rB(B0)/rB(B+)     2.9 ± 0.4    3.7 ± 0.3    > 2.2      > 3.0        (7)

This agrees with values of rB(B0) near 0.4 and rB(B+) between 0.1 and 0.2.

Note that in spite of the expected larger values of rB in B0 decays, from the point of view of statistics alone (without considering the question of flavor tagging and the efficiency of detecting a KS in B0 → D(∗)K0), B+ and B0 decays may fare comparably when studying γ. This follows from (5) because the statistical error on γ scales roughly as the inverse of the smaller of the two interfering amplitudes.

We will now discuss the actual manner by which γ can be determined using separately three of the above-mentioned families of final states fD. We will mention advantages and disadvantages in each case. For illustration of the method we will consider B+ → fDK+. We will also summarize the current status of these measurements in all eight decay modes B+,0 → D(∗)K(∗)+,0.

3.1. fD = CP-eigenstates

One considers four observables consisting of two charge-averaged decay rates for even and odd CP states, normalized by the decay rate into a D0 flavor state,

$$R_{CP\pm} \equiv \frac{\Gamma(D_{CP\pm}K^-) + \Gamma(D_{CP\pm}K^+)}{\Gamma(D^0K^-)} , \tag{8}$$

and two CP asymmetries for even and odd CP states,

$$A_{CP\pm} \equiv \frac{\Gamma(D_{CP\pm}K^-) - \Gamma(D_{CP\pm}K^+)}{\Gamma(D_{CP\pm}K^-) + \Gamma(D_{CP\pm}K^+)} . \tag{9}$$

In order to avoid dependence of RCP± on errors in D0 and DCP branching ratio measurements one uses a definition of RCP± in terms of ratios of B decay branching ratios into DK and Dπ final states.59 The four observables RCP± and ACP± provide three independent equations for rB, δB and γ (only three, because ACP+RCP+ = −ACP−RCP−),

$$R_{CP\pm} = 1 + r_B^2 \pm 2 r_B \cos\delta_B \cos\gamma , \tag{10}$$

$$A_{CP\pm} = \pm 2 r_B \sin\delta_B \sin\gamma / R_{CP\pm} . \tag{11}$$

While in principle this is the simplest and most precise method for extracting γ, up to a discrete ambiguity, in practice this method is sensitive to rB², because (RCP+ + RCP−)/2 = 1 + rB². This becomes very difficult for charged B decays where one expects rB ∼ 0.1−0.2, but may be feasible for neutral B decays where rB ∼ 0.4. An obvious signature for a non-zero value of rB would be observing a difference between RCP+ and RCP−, which is linear in this quantity.

Studies of B+ → DCPK+, B+ → DCPK∗+ and B+ → D∗CPK+ have been carried out recently,64,65,66 each consisting of a few tens of events. A nonzero difference RCP+ − RCP− at 2.6 standard deviations, measured in B+ → DCPK∗+,64 is probably a statistical fluctuation. A larger difference is anticipated in B0 → DCPK∗0, as the value of rB in this process is expected to be three or four times larger than in B+ → DK∗+. [See Eq. (7).] Higher statistics are required for a measurement of γ using this method.

3.2. fD = flavor state

Consider a flavor state fD in Cabibbo-favored D̄0 decays, accessible also to doubly Cabibbo-suppressed D0 decays, such that one has rf ∼ tan²θc in Eq. (3). One studies the ratio of two charge-averaged decay rates, for decays into f̄DK and fDK,

$$R_f \equiv \frac{\Gamma(f_D K^-) + \Gamma(\bar{f}_D K^+)}{\Gamma(\bar{f}_D K^-) + \Gamma(f_D K^+)} , \tag{12}$$

and the CP asymmetry,

$$A_f \equiv \frac{\Gamma(f_D K^-) - \Gamma(\bar{f}_D K^+)}{\Gamma(f_D K^-) + \Gamma(\bar{f}_D K^+)} . \tag{13}$$

These observables are given by

$$R_f = r_B^2 + r_f^2 + 2 r_B r_f \cos(\delta_B - \delta_f) \cos\gamma , \tag{14}$$

$$A_f = 2 r_B r_f \sin(\delta_B - \delta_f) \sin\gamma / R_f , \tag{15}$$

where a multiplicative correction 1 + O(rB rf) ∼ 1.01 has been neglected in (14). These two observables involve three unknowns, rB, δB − δf and γ. One assumes rf to be given by the measured ratio of doubly Cabibbo-suppressed and Cabibbo-favored branching ratios. Thus, one needs at least two flavor states, fD and f′D, for which two pairs of observables (Rf, Af) and (Rf′, Af′) provide four equations for the four unknowns, rB, δB − δf, δB − δf′, γ.
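The dependence of the observables in Eqs. (10), (11), (14) and (15) on rB can be explored with a short numerical sketch. The input values below (rB = 0.1, rf = 0.06, the strong phases and γ = 65°) are hypothetical numbers chosen only for illustration; they are not measurements quoted in this review, and the function names are likewise illustrative.

```python
import math

def cp_eigenstate_observables(r_b, delta_b, gamma):
    """Eqs. (10)-(11): return (R_CP+, R_CP-, A_CP+, A_CP-); angles in radians."""
    interference = 2.0 * r_b * math.cos(delta_b) * math.cos(gamma)
    r_plus = 1.0 + r_b**2 + interference
    r_minus = 1.0 + r_b**2 - interference
    numerator = 2.0 * r_b * math.sin(delta_b) * math.sin(gamma)
    return r_plus, r_minus, numerator / r_plus, -numerator / r_minus

def flavor_state_observables(r_b, r_f, delta_bf, gamma):
    """Eqs. (14)-(15): return (R_f, A_f); delta_bf stands for delta_B - delta_f."""
    r_f_obs = r_b**2 + r_f**2 + 2.0 * r_b * r_f * math.cos(delta_bf) * math.cos(gamma)
    a_f_obs = 2.0 * r_b * r_f * math.sin(delta_bf) * math.sin(gamma) / r_f_obs
    return r_f_obs, a_f_obs

# Hypothetical inputs, for illustration only.
r_b, r_f = 0.1, 0.06
gamma = math.radians(65.0)
delta_b = math.radians(120.0)
delta_bf = math.radians(100.0)

r_plus, r_minus, a_plus, a_minus = cp_eigenstate_observables(r_b, delta_b, gamma)
r_f_obs, a_f_obs = flavor_state_observables(r_b, r_f, delta_bf, gamma)

print(f"(R_CP+ + R_CP-)/2 = {(r_plus + r_minus) / 2:.4f}  (equals 1 + r_B^2)")
print(f"R_CP+ - R_CP-     = {r_plus - r_minus:.4f}  (linear in r_B)")
print(f"R_f = {r_f_obs:.5f}, A_f = {a_f_obs:.3f}")
```

With rB of order 0.1 the average of RCP+ and RCP− differs from one only at the percent level, while the asymmetry Af for a flavor state can be of order one, which is the qualitative point made in Sections 3.1 and 3.2.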
The strong phases δf, δf′ can actually be measured at a ψ′′ charm factory,67 thereby reducing the number of unknowns to three.

While the decay rate in the numerator of Rf is rather low, the asymmetry Af may be large for small values of rB around 0.1, as it involves two amplitudes with a relative magnitude rf/rB. So far, only upper bounds have been measured for Rf, implying upper limits on rB in several processes, rB(B+ → DK+) < 0.2,68,69,70 rB(B+ → D∗K+) < 0.2,68 rB(B+ → DK∗+) < 0.4,71 and rB(B0 → DK∗0) < 0.4.63,72 Further constraints on rB in the first three processes have been obtained by studying D decays into CP-eigenstates and into the state KSπ+π−. Using rB(B0 → DK∗0)/rB(B+ → DK∗+) = 3.7 ± 0.3 in (7) and assuming that rB(B+ → DK∗+) is not smaller than about 0.1, one may conclude that a nonzero value of rB(B0 → DK∗0) should be measured soon. The signature for B0 → D0K∗0 events would be two kaons with opposite charges.

3.3. fD = KSπ+π−

The amplitude for B+ → (KSπ+π−)DK+ is a function of the two invariant-mass variables, m±² ≡ (pKS + pπ±)², and may be written as

$$A(B^+ \to (K_S\pi^+\pi^-)_D K^+) = f(m_+^2, m_-^2) + r_B\, e^{i(\delta_B+\gamma)} f(m_-^2, m_+^2) . \tag{16}$$

In B− decay one replaces m+ ↔ m−, γ → −γ. The function f may be written as a sum of about twenty resonant and nonresonant contributions modeled to describe the amplitude for flavor-tagged D̄0 → KSπ+π−, which is measured separately.73,74 This introduces a model-dependent uncertainty in the analysis. Using the measured function f as an input and fitting the rates for B± → (KSπ+π−)DK± to the parameters rB, δB and γ, one then determines these three parameters.

The advantage of using D → KSπ+π− decays over CP and flavor states is that they are Cabibbo-favored and involve regions in phase space with a potentially large interference between D0 and D̄0 decay amplitudes. The main disadvantage is the uncertainty introduced by modeling the function f.
Two recent analyses of comparable statistics by Belle and Babar, combining B± → DK±, B± → D∗K± and B± → DK∗±, obtained values 73 γ = [53 +15 −18 ± 3 ± 9(model)]° and γ = [92 ± 41 ± 11 ± 12(model)]°.74 [This second value does not use the process B+ → D(KSπ+π−)K∗+, also studied by the same group.75] The larger errors in the second analysis are correlated with smaller values of the extracted parameters rB in comparison with those extracted in the first study. The model-dependent errors may be reduced by studying at CLEO-c the decays DCP± → KSπ+π−, providing further information on strong phases in D decays.67

Conclusion: The currently most precise value of γ is γ = [53 +15 −18 ± 3 ± 9(model)]°, obtained from B± → D(∗)K(∗)± using D → KSπ+π−. These errors may be reduced in the future by combining the study of all D decay modes in B+,0 → D(∗)K(∗)+,0. The decay B0 → DK∗0 seems to carry a high potential because of its expected large value of rB. Decays B0 → D(∗)K0 may also turn out to be useful, as they have been shown to provide information on γ without the need for flavor tagging of the initial B0.60,76

4. The currently most precise determination of γ: B → ππ, ρρ, ρπ

4.1. B → ππ

The amplitude for B0 → π+π− contains two terms, conventionally denoted “tree” (T) and “penguin” (P) amplitudes,14,26 involving a weak CP-violating phase γ and a strong CP-conserving phase δ, respectively:

$$A(B^0 \to \pi^+\pi^-) = |T| e^{i\gamma} + |P| e^{i\delta} . \tag{17}$$

Time-dependent decay rates, for an initial B0 or a B̄0, are given by

$$\Gamma(B^0(t)/\bar{B}^0(t) \to \pi^+\pi^-) = e^{-\Gamma t}\, \Gamma_{\pi^+\pi^-} \left[ 1 \pm C_{+-} \cos\Delta m t \mp S_{+-} \sin\Delta m t \right] , \tag{18}$$

where

$$S_{+-} = \frac{2\,\mathrm{Im}(\lambda_{\pi\pi})}{1 + |\lambda_{\pi\pi}|^2} , \qquad C_{+-} = \frac{1 - |\lambda_{\pi\pi}|^2}{1 + |\lambda_{\pi\pi}|^2} , \qquad \lambda_{\pi\pi} \equiv e^{-2i\beta}\, \frac{A(\bar{B}^0 \to \pi^+\pi^-)}{A(B^0 \to \pi^+\pi^-)} . \tag{19}$$

One has14

$$S_{+-} = \sin 2\alpha + 2|P/T| \cos 2\alpha \sin(\beta+\alpha) \cos\delta + O(|P/T|^2) , \qquad C_{+-} = 2|P/T| \sin(\beta+\alpha) \sin\delta + O(|P/T|^2) . \tag{20}$$

This tells us two things:
(1) The deviation of S+− from sin 2α and the magnitude of C+− increase with |P/T|, which can be estimated to be |P/T| ∼ 0.5 by comparing B → ππ rates with penguin-dominated B → Kπ rates.77
(2) Γπ+π−, S+− and C+− are insufficient for determining |T|, |P|, δ and γ (or α).
Further information on these quantities may be obtained by applying isospin symmetry to all B → ππ decays.

In order to carry out an isospin analysis,33 one uses the fact that the three physical B → ππ decay amplitudes and the three B̄ → ππ decay amplitudes, depending each on two isospin amplitudes, obey triangle relations of the form

$$A(B^0 \to \pi^+\pi^-)/\sqrt{2} + A(B^0 \to \pi^0\pi^0) - A(B^+ \to \pi^+\pi^0) = 0 . \tag{21}$$

Furthermore, the penguin amplitude is pure ∆I = 1/2; hence the ∆I = 3/2 amplitude carries a weak phase γ, A(B+ → π+π0) = e^{2iγ}A(B− → π−π0). Defining sin 2αeff ≡ S+−/(1 − C+−²)^{1/2}, the difference αeff − α is then determined by an angle between corresponding sides of the two isospin triangles sharing a common base, |A(B+ → π+π0)| = |A(B− → π−π0)|. A sign ambiguity in αeff − α is resolved by two model-independent features which are confirmed experimentally, |P|/|T| ≤ 1, |δ| ≤ π/2. This implies α < αeff.78

Table I. Branching ratios and CP asymmetries in B → ππ, B → ρρ.

Decay mode      Branching ratio (10^−6)   ACP = −C           S
B0 → π+π−       5.16 ± 0.22               0.38 ± 0.07        −0.61 ± 0.08
B+ → π+π0       5.7 ± 0.4                 0.04 ± 0.05
B0 → π0π0       1.31 ± 0.21               0.36 +0.33 −0.31
B0 → ρ+ρ−       23.1                      0.11 ± 0.13        −0.06 ± 0.18
B+ → ρ+ρ0       18.2 ± 3.0                −0.08 ± 0.13
B0 → ρ0ρ0       1.07 ± 0.38

Current CP-averaged branching ratios and CP asymmetries for B → ππ and B → ρρ decays are given in Table I,20 where ACP ≡ −C for decays to CP eigenstates. Impressive experimental progress has been achieved in the past two years in extracting a precise value for αeff, αeff = (110.6 −3.2)°. However, the error on αeff − α using the isospin triangles is still large.
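The definition sin 2αeff ≡ S+−/(1 − C+−²)^{1/2} can be evaluated directly from the B0 → π+π− entries of Table I (S = −0.61, C = −ACP = −0.38). The sketch below is illustrative only: the function name is hypothetical, and the choice of the solution with αeff between 45° and 135° is an assumption made here to select the branch that matches the central value quoted in the text.

```python
import math

def alpha_eff_deg(s_pm, c_pm):
    """Solve sin(2*alpha_eff) = S / sqrt(1 - C^2) for alpha_eff in degrees.

    Assumption: of the two solutions for 2*alpha_eff in [0, 360) degrees,
    return the one with alpha_eff in [45, 135] degrees.
    """
    sin_2alpha = s_pm / math.sqrt(1.0 - c_pm**2)
    principal = math.degrees(math.asin(sin_2alpha))  # lies in [-90, 90]
    return (180.0 - principal) / 2.0

alpha_eff = alpha_eff_deg(-0.61, -0.38)
print(f"alpha_eff = {alpha_eff:.1f} deg")
```

The central value obtained in this way agrees with the quoted αeff ≈ 110.6°; the experimental errors are not propagated in this sketch.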
An upper bound, given by CP-averaged rates and a direct CP asymmetry in B0 → π+π−,79,80

$$\cos 2(\alpha_{\rm eff} - \alpha) \ge \frac{\left( \tfrac{1}{2}\Gamma_{+-} + \Gamma_{+0} - \Gamma_{00} \right)^2 - \Gamma_{+-}\Gamma_{+0}}{\Gamma_{+-}\Gamma_{+0} \sqrt{1 - C_{+-}^2}} , \tag{22}$$

leads to 0 < αeff − α < 31° at 1σ. Adding in quadrature the error in αeff and the uncertainty in α − αeff, this implies α = (95 ± 16)° or γ = (64 ± 16)° by α = π − β − γ. A similar central value but a smaller error, α = (97 ± 11)°, has been reported recently by the Belle Collaboration.81 The possibility that a penguin amplitude in B0 → π+π− may lead to a large CP asymmetry S for values of α near 90°, where sin 2α = 0, was anticipated fifteen years ago.82

The bound on αeff − α may be improved considerably by measuring a nonzero direct CP asymmetry in B0 → π0π0. This asymmetry can be shown to be large and positive (see Eq. (46) in Sec. 5.2), implying a large rate for B̄0 but a small rate for B0. Namely, the triangle (21) is expected to be squashed, while the B̄ triangle is roughly equal sided.

An alternative way of treating the penguin amplitude in B0 → π+π− is by combining within flavor SU(3) the decay rate and asymmetries in this process with rates and asymmetries in B+ → K0π+ or B0 → K+π−.77 The ratio of ∆S = 1 and ∆S = 0 tree amplitudes in these processes, excluding CKM factors, is taken to be given by fK/fπ assuming factorization, while the ratio of corresponding penguin amplitudes is allowed to vary by ±0.22 around one. A current update of this rather conservative analysis obtains 83

γ = (73 ± 4 +10)° , (23)

where the first error is experimental, while the second one is due to an uncertainty in SU(3) breaking. A discussion of SU(3) breaking factors relating B0 → π+π− and B0 → K+π− is included in Section 5.2.

4.2. B → ρρ

Angular analyses of the pions in ρ decays have shown that B0 → ρ+ρ− is dominated almost 100% by longitudinal polarization.20
This simplifies the isospin analysis of CP asymmetries in these decays, making it similar to that of B0 → π+π−. The advantage of B → ρρ over B → ππ is the relatively small value of B(ρ0ρ0) in comparison with B(ρ+ρ−) and B(ρ+ρ0) (see Table I), indicating a smaller |P/T| in B0 → ρ+ρ− (|P/T| < 0.3 8) than in B0 → π+π− (|P/T| ∼ 0.5 77). Eq. (22) leads to an upper bound on αeff − α in B → ρρ, 0 < αeff − α < 17° (at 1σ). The asymmetries for longitudinal ρ’s given in Table I imply αeff = (91.7 −5.2)°. Thus, one finds α = (83 ± 10)° or γ = (76 ± 10)° by adding errors in quadrature.

A stronger bound on |P/T| in B0 → ρ+ρ−, leading to a more precise value of γ, may be obtained by relating this process to B+ → K∗0ρ+ within flavor SU(3).84 One uses the branching ratio and fraction of longitudinal rate measured for this process,20 B(K∗0ρ+) = (9.2 ± 1.5) × 10^−6, fL(K∗0ρ+) = 0.48 ± 0.08, to normalize the penguin amplitude in B0 → ρ+ρ−. Including a conservative uncertainty from SU(3) breaking and smaller amplitudes, one finds a value

γ = (71.4 +5.8 −1.7)° , (24)

where the first error is experimental and the second one theoretical.

The current small theoretical error in γ requires including isospin breaking effects in studies based on isospin symmetry. The effect of electroweak penguin amplitudes on the isospin analyses of B → ππ and B → ρρ has been calculated and was found to move γ slightly higher by an amount ∆γEWP = 1.5°.34,35 Other corrections, relevant to methods using π0 and ρ0, including π0-η-η′ mixing, ρ-ω mixing, and a small I = 1 ρρ contribution allowed by the ρ-width, are each smaller than one degree.36,37,38

Conclusion: Taking an average of the two values of γ in (23) and (24) obtained from B0 → π+π− and B0 → ρ+ρ−, and including the above-mentioned EWP correction, one finds

$$\gamma = (73.5 \pm 5.7)^\circ . \tag{25}$$

A third method of measuring γ (or α) in time-dependent Dalitz analyses of B0 → (ρπ)0 involves a much larger error,85 and has a small effect on the overall averaged value of the weak phase. We note that sin γ is close to one and its relative error is only 3%, the same as the relative error in sin 2β and slightly smaller than the relative error in sin β.

5. Rates, asymmetries, and γ in B → Kπ

5.1. Extracting γ in B → Kπ

The four decays B0 → K+π−, B0 → K0π0, B+ → K0π+, B+ → K+π0 involve a potential for extracting γ, provided that one is sensitive to interference between a dominant isoscalar penguin amplitude and a small tree amplitude contributing to these processes. This idea has led to numerous suggestions for determining γ in these decays starting with a proposal made in 1994.86,87 An interference between penguin and tree amplitudes may be identified in two ways:
(1) Two different properly normalized B → Kπ rates.
(2) Nonzero direct CP asymmetries.

Table II. Branching ratios and asymmetries in B → Kπ.

Decay mode      Branching ratio (10^−6)   ACP
B0 → K+π−       19.4 ± 0.6                −0.097 ± 0.012
B+ → K+π0       12.8 ± 0.6                0.047 ± 0.026
B+ → K0π+       23.1 ± 1.0                0.009 ± 0.025
B0 → K0π0       10.0 ± 0.6                −0.12 ± 0.11

Current branching ratios and CP asymmetries are summarized in Table II.20 Three ratios of rates, calculated using the ratio of B+ and B0 lifetimes, τ+/τ0 = 1.076 ± 0.008,20 are:

$$R \equiv \frac{\Gamma(B^0 \to K^+\pi^-)}{\Gamma(B^+ \to K^0\pi^+)} = 0.90 \pm 0.05 , \quad R_c \equiv \frac{2\Gamma(B^+ \to K^+\pi^0)}{\Gamma(B^+ \to K^0\pi^+)} = 1.11 \pm 0.07 , \quad R_n \equiv \frac{\Gamma(B^0 \to K^+\pi^-)}{2\Gamma(B^0 \to K^0\pi^0)} = 0.97 \pm 0.07 . \tag{26}$$

The largest deviation from one, observed in the ratio R at 2σ, is insufficient for claiming unambiguous evidence for a non-penguin contribution. An upper limit, R < 0.965 at 90% confidence level, would imply γ ≤ 79° using sin²γ ≤ R,88 which neglects however “color-suppressed” EWP contributions.89 As we will argue now, these contributions and “color-suppressed” tree amplitudes are actually not suppressed as naively expected.
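The central values of the three ratios in Eq. (26) follow from the branching ratios in Table II once the charged-B entries are converted to rates with the lifetime ratio τ+/τ0 = 1.076. The sketch below reproduces this arithmetic and the bound γ ≤ 79° implied by sin²γ ≤ R for R < 0.965. Variable and function names are illustrative, and experimental errors are not propagated.

```python
import math

TAU_RATIO = 1.076  # tau(B+)/tau(B0), as quoted in the text

# CP-averaged branching ratios in units of 1e-6 (Table II).
BRANCHING = {"K+pi-": 19.4, "K+pi0": 12.8, "K0pi+": 23.1, "K0pi0": 10.0}
CHARGED_B_MODES = {"K+pi0", "K0pi+"}

def rate(mode):
    """Rate in arbitrary units: branching ratio divided by the relative B lifetime."""
    lifetime = TAU_RATIO if mode in CHARGED_B_MODES else 1.0
    return BRANCHING[mode] / lifetime

R = rate("K+pi-") / rate("K0pi+")
R_c = 2.0 * rate("K+pi0") / rate("K0pi+")
R_n = rate("K+pi-") / (2.0 * rate("K0pi0"))

# Bound on gamma from sin^2(gamma) <= R, evaluated at the hypothetical limit R < 0.965.
gamma_max = math.degrees(math.asin(math.sqrt(0.965)))

print(f"R = {R:.2f}, R_c = {R_c:.2f}, R_n = {R_n:.2f}")
print(f"gamma <= {gamma_max:.0f} deg")
```

Only R mixes charged and neutral B decays, so it is the only ratio in which the lifetime correction matters.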
The nonzero asymmetry measured in B0 → K+π− provides first evidence for an interference between penguin (P′) and tree (T′) amplitudes with a nonzero relative strong phase. Such an interference occurs also in B+ → K+π0, where no asymmetry has been observed. An assumption that other contributions to the latter asymmetry are negligible has raised some questions about the validity of the CKM framework. In fact, a color-suppressed tree amplitude (C′), also occurring in B+ → K+π0,86 resolves this “puzzle” if this amplitude is comparable in magnitude to T′. Indeed, several studies have shown that this is the case,90,91,92,93,94 also implying that color-suppressed and color-favored EWP amplitudes are of comparable magnitudes.35 For consistency between the two CP asymmetries in B0 → K+π− and B+ → K+π0, the strong phase difference between C′ and T′ must be negative and cannot be very small.95 This seems to stand in contrast to QCD calculations using a factorization theorem.29,31,94

The small asymmetry ACP(B+ → K+π0) implies bounds on the sine of the strong phase difference δc between T′ + C′ and P′. The cosine of this phase affects Rc − 1, involving the decay rates for B+ → K0π+ and B+ → K+π0. A question studied recently is whether the two upper bounds on |sin δc| and |cos δc| are consistent with each other or, perhaps, indicate effects of NP. Consistency was shown by proving a sum rule involving ACP(B+ → K+π0) and Rc − 1, in which an electroweak penguin (EWP) amplitude plays an important role.
We will now present a proof of the sum rule, which may provide important information on γ.95 The two amplitudes for B+ → K0π+, K+π0 are given in terms of topological contributions including P′, T′ and C′,

$$A(B^+ \to K^0\pi^+) = (P' - \tfrac{1}{3}P'^{c}_{EW}) + A' , \qquad \sqrt{2}\,A(B^+ \to K^+\pi^0) = (P' - \tfrac{1}{3}P'^{c}_{EW}) + (T' + P'^{c}_{EW}) + (C' + P'_{EW}) + A' , \tag{27}$$

where P′EW and P′cEW are color-favored and color-suppressed EWP contributions. The small annihilation amplitude A′ and a small u quark contribution to P′ involving a CKM factor V∗ubVus will be neglected (|V∗ubVus|/|V∗cbVcs| = 0.02). Evidence for the smallness of these terms can be found in the small CP asymmetry measured for B+ → K0π+. Large terms would require rescattering and a sizable strong phase difference between these terms and P′.

Flavor SU(3) symmetry relates ∆I = 1, I(Kπ) = 3/2 electroweak penguin and tree amplitudes through a calculable ratio δEW,35,41

$$T' + C' + P'_{EW} + P'^{c}_{EW} = (T' + C')(1 - \delta_{EW}\, e^{-i\gamma}) , \qquad \delta_{EW} = -\frac{3}{2}\, \frac{c_9 + c_{10}}{c_1 + c_2}\, \frac{|V^*_{tb}V_{ts}|}{|V^*_{ub}V_{us}|} = 0.60 \pm 0.05 . \tag{28}$$

The error in δEW is dominated by the current uncertainty in |Vub|/|Vcb| = 0.104 ± 0.007,57 including also a smaller error from SU(3) breaking estimated using QCD factorization. Eqs. (27) and (28) imply 96

$$R_c = 1 - 2 r_c \cos\delta_c\, (\cos\gamma - \delta_{EW}) + r_c^2\, (1 - 2\delta_{EW}\cos\gamma + \delta_{EW}^2) , \tag{29}$$

$$A_{CP}(B^+ \to K^+\pi^0) = -2 r_c \sin\delta_c \sin\gamma / R_c , \tag{30}$$

where rc ≡ |T′ + C′|/|P′ − (1/3)P′cEW| and δc is the strong phase difference between T′ + C′ and P′ − (1/3)P′cEW.

The parameter rc is calculable in terms of measured decay rates, using broken flavor SU(3) which relates T′ + C′ and T + C dominating B+ → π+π0 by a factorization factor fK/fπ (neglecting a tiny EWP term in B+ → π+π0),87

$$|T' + C'| = \sqrt{2}\, \frac{V_{us}}{V_{ud}}\, \frac{f_K}{f_\pi}\, |A(B^+ \to \pi^+\pi^0)| . \tag{31}$$

Using branching ratios from Tables I and II, one finds

$$r_c = \sqrt{2}\, \frac{V_{us}}{V_{ud}}\, \frac{f_K}{f_\pi}\, \sqrt{\frac{\mathcal{B}(B^+ \to \pi^+\pi^0)}{\mathcal{B}(B^+ \to K^0\pi^+)}} = 0.198 \pm 0.008 . \tag{32}$$

The error in rc does not include an uncertainty from assuming factorization for SU(3) breaking in T′ + C′.
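The value of rc in Eq. (32) and the dependence of Rc and ACP(B+ → K+π0) on δc and γ in Eqs. (29) and (30) can be illustrated with a short sketch. It assumes the factorization relation |T′ + C′| = √2 (Vus/Vud)(fK/fπ)|A(B+ → π+π0)|. The numerical inputs |Vus| = 0.2257, |Vud| = 0.9738 and fK/fπ = 1.22 are representative values assumed here and are not quoted in this review; the values of γ and δc at the end are purely illustrative.

```python
import math

# Branching ratios in units of 1e-6, from Tables I and II (both are B+ decays,
# so no lifetime correction is needed in their ratio).
BR_PIPLUS_PI0 = 5.7   # B+ -> pi+ pi0
BR_K0_PIPLUS = 23.1   # B+ -> K0 pi+
DELTA_EW = 0.60       # central value of Eq. (28)

# Assumed external inputs (not quoted in the text).
VUS_OVER_VUD = 0.2257 / 0.9738
FK_OVER_FPI = 1.22

def r_c_from_rates(br_pipi0, br_k0pi):
    """Eq. (32): r_c from the ratio of branching ratios."""
    return math.sqrt(2.0) * VUS_OVER_VUD * FK_OVER_FPI * math.sqrt(br_pipi0 / br_k0pi)

def rate_ratio(r_c, delta_c, gamma, delta_ew=DELTA_EW):
    """Eq. (29): R_c as a function of r_c, delta_c and gamma (radians)."""
    return (1.0
            - 2.0 * r_c * math.cos(delta_c) * (math.cos(gamma) - delta_ew)
            + r_c**2 * (1.0 - 2.0 * delta_ew * math.cos(gamma) + delta_ew**2))

def asymmetry(r_c, delta_c, gamma, delta_ew=DELTA_EW):
    """Eq. (30): A_CP(B+ -> K+ pi0)."""
    return (-2.0 * r_c * math.sin(delta_c) * math.sin(gamma)
            / rate_ratio(r_c, delta_c, gamma, delta_ew))

r_c = r_c_from_rates(BR_PIPLUS_PI0, BR_K0_PIPLUS)
gamma = math.radians(65.0)     # illustrative value
delta_c = math.radians(-10.0)  # illustrative value

print(f"r_c = {r_c:.3f}")
print(f"R_c = {rate_ratio(r_c, delta_c, gamma):.3f}, "
      f"A_CP = {asymmetry(r_c, delta_c, gamma):.3f}")
```

With these assumed inputs the sketch gives rc close to the quoted central value 0.198; the uncertainty from the factorization assumption is not modeled.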
While this assumption should hold well for T′, it may not be a good approximation for C′, which as we have mentioned is comparable in magnitude to T′ and carries a strong phase relative to it. Thus one should allow a 10% theoretical error when using factorization for relating B → Kπ and B → ππ T + C amplitudes, so that

$$r_c = 0.20 \pm 0.01\,(\mathrm{exp}) \pm 0.02\,(\mathrm{th}) . \tag{33}$$

Eliminating δc in Eqs. (29) and (30) by retaining terms which are linear in rc, one finds

$$\left( \frac{R_c - 1}{\cos\gamma - \delta_{EW}} \right)^2 + \left( \frac{A_{CP}(B^+ \to K^+\pi^0)}{\sin\gamma} \right)^2 = (2 r_c)^2 + O(r_c^3) . \tag{34}$$

This sum rule implies that at least one of the two terms whose squares occur on the left-hand side must be sizable, of the order of 2rc = 0.4. The second term, |ACP(B+ → K+π0)|/sin γ, is already smaller than ≃ 0.1, using the current 2σ bounds on γ and |ACP(B+ → K+π0)|. Thus, the first term must provide a dominant contribution. For Rc ≃ 1, this implies γ ≃ arccos δEW ≃ (53.1 ± 3.5)°. This range is expanded by including errors in Rc and ACP(B+ → K+π0). For instance, an upper bound Rc < 1.1 would imply an important upper limit, γ < 70°. Currently one only obtains an upper limit γ ≤ 88° at 90% confidence level.95 This bound is consistent with the value obtained in (25) from B → ππ and B → ρρ, but is not competitive with the latter precision.

Conclusion: The current constraint obtained from Rc and ACP(B+ → K+π0) is γ ≤ 88° at 90% confidence level. Further improvement in the measurement of Rc (which may, in fact, be very close to one) is required in order to achieve a precision in γ comparable to that obtained in B → ππ, ρρ. (A conclusion concerning the different CP asymmetries measured in B0 → K+π− and B+ → K+π0 will be given at the end of the next subsection.)

5.2. Symmetry relations for B → Kπ rates and asymmetries

The following two features imply rather precise sum rules in the CKM framework, both for B → Kπ decay rates and CP asymmetries:
(1) The dominant penguin amplitude is ∆I = 0.
(2) The four decay amplitudes obey a linear isospin relation,39

$$A(K^+\pi^-) - A(K^0\pi^+) - \sqrt{2}\,A(K^+\pi^0) + \sqrt{2}\,A(K^0\pi^0) = 0 . \tag{35}$$

An immediate consequence of these features are two isospin sum rules, which hold up to terms which are quadratic in small ratios of non-penguin to penguin amplitudes,45,46,47

$$\Gamma(K^+\pi^-) + \Gamma(K^0\pi^+) = 2\Gamma(K^+\pi^0) + 2\Gamma(K^0\pi^0) , \tag{36}$$

$$\Delta(K^+\pi^-) + \Delta(K^0\pi^+) = 2\Delta(K^+\pi^0) + 2\Delta(K^0\pi^0) , \tag{37}$$

where

$$\Delta(K\pi) \equiv \Gamma(\bar{B} \to \bar{K}\bar{\pi}) - \Gamma(B \to K\pi) . \tag{38}$$

Quadratic corrections to (36) have been calculated in the SM and were found to be a few percent.97,98,99 This is the level expected in general for isospin-breaking corrections, which must therefore also be considered. The above two features imply that these ∆I = 1 corrections are suppressed by a small ratio of non-penguin to penguin amplitudes and are therefore negligible.100 Indeed, this sum rule holds experimentally within a 5% error.101 One expects the other sum rule (37) to hold at a similar precision.

The CP rate asymmetry sum rule (37), relating the four CP asymmetries, leads to a prediction for the asymmetry in B0 → K0π0 in terms of the other three asymmetries which have been measured with higher precision,

$$A_{CP}(B^0 \to K^0\pi^0) = -0.140 \pm 0.043 . \tag{39}$$

While this value is consistent with experiment (see Table II), higher accuracy in this asymmetry measurement is required for testing this straightforward prediction.

Relations between CP asymmetries in B → Kπ and B → ππ following from approximate flavor SU(3) symmetry of QCD 102 are not expected to hold as precisely as isospin relations, but may still be interesting and useful. An important question relevant to such relations is how to include SU(3)-breaking effects, which are expected to be at a level of 20-30%.
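The prediction in Eq. (39) can be reproduced from the entries of Table II by solving the sum rule (37) for the B0 → K0π0 term, using ∆ ∝ ACP × Γ and converting charged-B branching ratios to rates with τ+/τ0 = 1.076. The sketch below is illustrative: names are hypothetical, and the quoted error is estimated here by propagating only the three measured asymmetry errors in quadrature, which is an assumption about how the uncertainty in Eq. (39) was obtained.

```python
import math

TAU_RATIO = 1.076  # tau(B+)/tau(B0), as quoted in the text

# mode: (branching ratio in 1e-6, A_CP, error on A_CP, is it a B+ decay?)  [Table II]
TABLE_II = {
    "K+pi-": (19.4, -0.097, 0.012, False),
    "K+pi0": (12.8, 0.047, 0.026, True),
    "K0pi+": (23.1, 0.009, 0.025, True),
}
BR_K0PI0 = 10.0  # B0 -> K0 pi0, units of 1e-6

def rate(branching, charged):
    """Rate in arbitrary units: branching ratio divided by the relative B lifetime."""
    return branching / (TAU_RATIO if charged else 1.0)

# Coefficients with which each mode enters 2*Delta(K0 pi0), from Eq. (37).
WEIGHTS = {"K+pi-": 1.0, "K0pi+": 1.0, "K+pi0": -2.0}

weighted_sum = 0.0
variance = 0.0
for mode, weight in WEIGHTS.items():
    branching, a_cp, a_cp_err, charged = TABLE_II[mode]
    gamma_mode = rate(branching, charged)
    weighted_sum += weight * a_cp * gamma_mode
    variance += (weight * a_cp_err * gamma_mode) ** 2

norm = 2.0 * rate(BR_K0PI0, False)
a_cp_k0pi0 = weighted_sum / norm
a_cp_k0pi0_err = math.sqrt(variance) / norm

print(f"A_CP(B0 -> K0 pi0) = {a_cp_k0pi0:.3f} +- {a_cp_k0pi0_err:.3f}")
```

Under these assumptions the result agrees with the quoted −0.140 ± 0.043.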
Here we wish to discuss two SU(3) relations proposed twelve years ago [103,104], one of which holds experimentally within expectation, providing some lesson about SU(3) breaking, while the other has an interesting implication for future applications of the isospin analysis in B → ππ.

A most convenient proof of SU(3) relations is based on a diagrammatic approach, in which diagrams with given flavor topologies replace reduced SU(3) matrix elements [86]. In this language, the amplitudes for B0 decays into pairs of charged or neutral pions, and pairs of charged or neutral π and K, are given by:

$$ \begin{aligned}
-A(B^0 \to \pi^+\pi^-) &= T + (P + 2P^{c}_{EW}/3) + E + PA , \\
\sqrt{2}A(B^0 \to \pi^0\pi^0) &= C - (P - P_{EW} - P^{c}_{EW}/3) - E - PA , \\
-A(B^0 \to K^+\pi^-) &= T' + (P' + 2P^{\prime c}_{EW}/3) , \\
\sqrt{2}A(B^0 \to K^0\pi^0) &= C' - (P' - P'_{EW} - P^{\prime c}_{EW}/3) .
\end{aligned} \tag{40} $$

The combination E + PA, representing exchange and penguin annihilation topologies, is expected to be 1/m_b-suppressed relative to T and C [31,62], as demonstrated by the small branching ratio measured for B0 → K+K− [20]. This term will be neglected. Expressing topological amplitudes in terms of CKM factors, SU(3)-invariant amplitudes and SU(3)-invariant strong phases, one may write

$$ \begin{aligned}
T &\equiv V^*_{ub}V_{ud}\,|T + P_{uc}| , & P + 2P^{c}_{EW}/3 &\equiv V^*_{tb}V_{td}\,|P_{tc}|\,e^{i\delta} , \\
T' &\equiv V^*_{ub}V_{us}\,|T + P_{uc}| , & P' + 2P^{\prime c}_{EW}/3 &\equiv V^*_{tb}V_{ts}\,|P_{tc}|\,e^{i\delta} , \\
C &\equiv V^*_{ub}V_{ud}\,|C - P_{uc}| , & P - P_{EW} - P^{c}_{EW}/3 &\equiv V^*_{tb}V_{td}\,|\tilde P_{tc}|\,e^{i\tilde\delta} , \\
C' &\equiv V^*_{ub}V_{us}\,|C - P_{uc}| , & P' - P'_{EW} - P^{\prime c}_{EW}/3 &\equiv V^*_{tb}V_{ts}\,|\tilde P_{tc}|\,e^{i\tilde\delta} .
\end{aligned} \tag{41} $$

Unitarity of the CKM matrix, V*_cb V_cd(s) = −V*_tb V_td(s) − V*_ub V_ud(s), has been used to absorb in T(′) and C(′) a penguin term P_uc ≡ P_u − P_c multiplying V*_ub V_ud(s), while P_tc ≡ P_t − P_c and P̃_tc ≡ P̃_t − P̃_c contain two distinct combinations of EWP contributions. Using the identity

$$ \mathrm{Im}\,(V^*_{ub}V_{ud}V_{tb}V^*_{td}) = -\mathrm{Im}\,(V^*_{ub}V_{us}V_{tb}V^*_{ts}) , \tag{42} $$

one finds [103,104]

$$ \Delta(B^0 \to K^+\pi^-) = -\Delta(B^0 \to \pi^+\pi^-) , \tag{43} $$

$$ \Delta(B^0 \to K^0\pi^0) = -\Delta(B^0 \to \pi^0\pi^0) , \tag{44} $$

where Δ is the CP rate difference defined in (38).
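The identity (42) is a consequence of CKM unitarity (both sides equal the Jarlskog invariant), and it can be checked numerically. The sketch below builds an exactly unitary matrix in the standard three-angle, one-phase parametrization; the function name and the numerical mixing parameters are illustrative assumptions, not fitted values.

```python
import cmath
import math


def ckm_standard(s12, s23, s13, delta):
    """Exactly unitary CKM matrix in the standard parametrization."""
    c12, c23, c13 = (math.sqrt(1.0 - s * s) for s in (s12, s23, s13))
    ep = cmath.exp(1j * delta)   # e^{+i delta}
    em = cmath.exp(-1j * delta)  # e^{-i delta}
    return {
        "ud": c12 * c13, "us": s12 * c13, "ub": s13 * em,
        "cd": -s12 * c23 - c12 * s23 * s13 * ep,
        "cs": c12 * c23 - s12 * s23 * s13 * ep,
        "cb": s23 * c13,
        "td": s12 * s23 - c12 * c23 * s13 * ep,
        "ts": -c12 * s23 - s12 * c23 * s13 * ep,
        "tb": c23 * c13,
    }


# Illustrative mixing parameters (assumed, not a fit result).
V = ckm_standard(s12=0.225, s23=0.041, s13=0.0036, delta=1.2)

# Left- and right-hand sides of Eq. (42).
lhs = (V["ub"].conjugate() * V["ud"] * V["tb"] * V["td"].conjugate()).imag
rhs = -(V["ub"].conjugate() * V["us"] * V["tb"] * V["ts"].conjugate()).imag
print(lhs, rhs)
```

Because the b → d and b → s rate differences are proportional to these two imaginary parts with the same hadronic factors in the SU(3) limit, the equal magnitudes and opposite signs translate directly into (43) and (44).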
Quoting products of branching ratios and asymmetries from Tables I and II, Eq. (43) reads

$$ -1.88 \pm 0.24 = -1.96 \pm 0.37 . \tag{45} $$

This SU(3) relation works well and requires no SU(3) breaking. An SU(3) breaking factor f_K/f_π in T but not in P, or in both T and P, is currently excluded at a level of 1.0σ or 1.75σ, respectively. More precise CP asymmetry measurements in B0 → K+π− and B0 → π+π− are required for determining the pattern of SU(3) breaking in tree and penguin amplitudes.

Using the prediction (39) of the B → Kπ asymmetry sum rule, Eq. (44) predicts

$$ A_{CP}(B^0 \to \pi^0\pi^0) = 1.07 \pm 0.38 . \tag{46} $$

The error is dominated by current errors in CP asymmetries for B+ → K0π+ and B+ → K+π0, and to a lesser extent by the error in the branching ratio B(π0π0). SU(3) breaking in amplitudes could modify this prediction by a factor f_π/f_K if this factor applies to C, and less likely by (f_π/f_K)². A large positive CP asymmetry, favored in all three cases, will affect future applications of the isospin analysis in B → ππ. It implies that while the B̄ isospin triangle is roughly equal-sided, the B triangle is squashed. A twofold ambiguity in the value of γ disappears in the limit of a flat B triangle [24].

Conclusion: The isospin sum rule for B → Kπ decay rates holds well, while the CP asymmetry sum rule predicts A_CP(B0 → K0π0) = −0.140 ± 0.043. The different asymmetries in B0 → K+π− and B+ → K+π0 can be explained by an amplitude C′ comparable to T′ and involving a relative negative strong phase, and should not be considered a “puzzle”. An SU(3) relation for B0 → ππ and B0 → Kπ CP asymmetries works well for charged modes. The corresponding relation for neutral modes predicts a large positive asymmetry in B0 → π0π0. Improving asymmetry measurements can provide tests for SU(3) breaking factors.

6. Tests for small New Physics effects

6.1.
Values of γ

We have described three ways for extracting a value for γ relying on interference of distinct pairs of quark amplitudes, (b → cūs, b → uc̄s), (b → cc̄s, b → uūs) and (b → cc̄d, b → uūd). The three pairs provide a specific pattern for CP violation in the CKM framework, which is expected to be violated in many extensions of the SM. The rather precise value of γ (25) extracted from B → ππ, ρρ, ρπ is consistent with constraints on γ from CP conserving measurements related to the sides of the unitarity triangle [8,9]. The values of γ obtained in B → D(∗)K(∗) and B → Kπ are consistent with those extracted in B → ππ, ρρ, ρπ, but are not yet sufficiently precise for testing small NP effects in charmless B decays. Further experimental improvements are required, in particular in the former two types of processes.

While the value of γ in B → D(∗)K(∗) is not expected to be affected by NP, the other two classes of processes involving penguin loops are susceptible to such effects. The extraction of γ in B → ππ, ρρ assumes that γ is the phase of a ΔI = 3/2 tree amplitude, while an additional ΔI = 3/2 EWP contribution is included using isospin. The extracted value could be modified by a new ΔI = 3/2 effective operator originating in physics beyond the SM, but not by a new ΔI = 1/2 operator. Similarly, the value of γ extracted in B → Kπ is affected by a potential new ΔI = 1 operator, but not by a new ΔI = 0 operator, because the amplitude (28), playing an essential role in this method, is pure ΔI = 1.

6.2. B → Kπ sum rule

Charmless |ΔS| = 1 B and Bs decays are particularly sensitive to NP effects, as new heavy particles at the TeV mass range may replace the W boson and top quark in the penguin loop dominating these amplitudes [28]. The sum rule (36) for B → Kπ decay rates provides a test for such effects. However, as we have argued from isospin considerations, it is only affected by quadratic ΔI = 1 amplitudes including NP contributions. Small NP amplitudes, contributing quadratically to the sum rule, cannot be separated from SM corrections, which are by themselves at a level of a few percent. This is the level to which the sum rule has already been tested. We will argue below for evidence showing that potential NP contributions to |ΔS| = 1 charmless decays must be suppressed by roughly an order of magnitude relative to the dominant b → s penguin amplitudes.

6.3. Values of S, C in |ΔS| = 1 B0 → f_CP decays

A class of b → s penguin-dominated B0 decays to CP-eigenstates has recently attracted considerable attention. This includes final states XKS and XKL, where X = φ, π0, η′, ω, f0, ρ0, K+K−, KSKS, π0π0, for which measured asymmetries −η_CP S and C are quoted in Table III. [The asymmetries S and C = −A_CP were defined in (18) for B0 → π+π−. Observed modes with KL in the final states obey η_CP(XKL) = −η_CP(XKS).] In these processes, a value S = −η_CP sin 2β (for states with CP-eigenvalue η_CP) is expected approximately [26,43].

Table III. Asymmetries S and C in B0 → XKS.

X          φ              π0             η′              ω               f0(980)
−η_CP S    0.39 ± 0.18    0.33 ± 0.21    0.61 ± 0.07     0.48 ± 0.24     0.42 ± 0.17
C          0.01 ± 0.13    0.12 ± 0.11    −0.09 ± 0.06    −0.21 ± 0.19    −0.02 ± 0.13

X          ρ0             K+K−                KSKS            π0π0
−η_CP S    0.20 ± 0.57    0.58 +0.18/−0.13    0.58 ± 0.20     −0.72 ± 0.71
C          0.64 ± 0.46    0.15 ± 0.09         −0.14 ± 0.15    0.23 ± 0.54

These predictions involve hadronic uncertainties at a level of several percent, of order λ², λ ∼ 0.2. It has been pointed out some time ago [105] that it is difficult to separate these hadronic uncertainties within the SM from NP contributions to decay amplitudes if the latter are small. In the next subsection we will discuss indirect experimental evidence showing that NP contributions to S and C must be small.
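To get a feeling for the combined weight of the Table III measurements, one can form a naive inverse-variance average of the −η_CP S values. The sketch below is only illustrative: it treats the measurements as uncorrelated, and symmetrizing the asymmetric K+K− error to 0.155 is an assumption made here, so the result need not coincide with an official average.

```python
import math

# (-eta_CP * S, error) per mode, copied from Table III.
# Assumption: the asymmetric K+K- error (+0.18, -0.13) is symmetrized to 0.155.
measurements = {
    "phi": (0.39, 0.18),
    "pi0": (0.33, 0.21),
    "eta'": (0.61, 0.07),
    "omega": (0.48, 0.24),
    "f0(980)": (0.42, 0.17),
    "rho0": (0.20, 0.57),
    "K+K-": (0.58, 0.155),
    "KSKS": (0.58, 0.20),
    "pi0pi0": (-0.72, 0.71),
}


def weighted_average(values):
    """Inverse-variance weighted mean and its error for uncorrelated inputs."""
    weights = [1.0 / err ** 2 for _, err in values]
    total = sum(weights)
    mean = sum(w * val for w, (val, _) in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


mean, err = weighted_average(list(measurements.values()))
print(f"naive average of -eta_CP*S: {mean:.2f} +/- {err:.2f}")
```

The average is dominated by the η′KS mode, whose error is less than half that of any other entry in the table.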
Corrections to S = −η_CP sin 2β and values for the asymmetries C have been calculated in the SM using methods based on QCD factorization [106,107] and flavor SU(3) [90,108,109], and were found to range from a few percent to above ten percent within hadronic uncertainties. Whereas the deviation of S from −η_CP sin 2β is process-dependent, a generic result has been proven a long time ago for both S and C, to first order in |c/p| [14],

$$ \Delta S \equiv -\eta_{CP}S - \sin 2\beta = 2\,\frac{|c|}{|p|}\cos 2\beta\,\sin\gamma\,\cos\Delta , \qquad C = 2\,\frac{|c|}{|p|}\sin\gamma\,\sin\Delta . \tag{47} $$

Here p and c are penguin and color-suppressed tree amplitudes involving a small ratio and relative weak and strong phases γ and Δ, respectively. This implies ΔS > 0 for |Δ| < π/2, which can be argued for several of the above decays using QCD arguments [106,107] or SU(3) fits [109]. (Note that while |p| is measurable in certain decay rates up to first order corrections, |c| and Δ involve sizable hadronic uncertainties in QCD calculations.) In contrast to this expectation, the central values measured for ΔS are negative for all decays. (See Table III.) Consequently, one finds an averaged value sin 2β_eff = 0.53 ± 0.05 [20], to be compared with sin 2β = 0.678 ± 0.025. Two measurements which seem particularly interesting are −η_CP S(φKS) = 0.39 ± 0.18, where a positive correction of a few percent to sin 2β is expected in the SM [106,107], and −η_CP S(π0KS) = 0.33 ± 0.21, where a rather large positive correction to sin 2β is expected, shifting this asymmetry to a value just above 0.8 [90]. While the current averaged value of sin 2β_eff is tantalizing, experimental errors in S and C must be reduced further to make a clear case for physics beyond the SM.

Assuming that the discrepancy between improved measurements and calculated values of S and C persists beyond theoretical uncertainties, can this provide a clue to the underlying New Physics?
Since many models could give rise to a discrepancy [28,43,44], one would seek signatures characterizing classes of models rather than studying the effects in specific models. One way of classifying extensions of the SM is by the isospin behavior of the new effective operators contributing to b → sqq̄ transitions.

6.4. Diagnosis of ΔI for New Physics operators

Four-quark operators in the effective Hamiltonian associated with NP in b → sqq̄ transitions can be either isoscalar or isovector operators. We will now discuss a study proposed recently in order to isolate ΔI = 0 or ΔI = 1 operators, thus determining corresponding NP amplitudes and CP violating phases [49]. We will show that since S and C in the above processes combine ΔI = 0 and ΔI = 1 contributions, separating these contributions also requires information from two other asymmetries, which are provided by isospin-reflected decay processes.

Two |ΔS| = 1 charmless B (or Bs) decay processes, related by isospin reflection, R_I : u ↔ d, ū ↔ −d̄, can always be expressed in terms of common ΔI = 0 and ΔI = 1 amplitudes B and A in the form:

$$ A(B^+ \to f) = B + A , \qquad A(B^0 \to R_I f) = \pm(B - A) . \tag{48} $$

A proof of this relation uses a sign change of Clebsch-Gordan coefficients under m ↔ −m [49]. The description (48) applies, in particular, to pairs of processes involving all the B0 decay modes listed in Table III, and B+ decay modes where final states are obtained by isospin reflection from corresponding B0 decay modes.

Decay rates for pairs of isospin-reflected B decay processes, and for B̄ decays to corresponding charge conjugate final states, are therefore given by (we omit inessential common kinematic factors)

$$ \Gamma_+ = |B + A|^2 , \quad \Gamma_0 = |B - A|^2 , \quad \Gamma_- = |\bar B + \bar A|^2 , \quad \Gamma_{\bar 0} = |\bar B - \bar A|^2 . \tag{49} $$

The amplitudes B̄ and Ā are related to B and A by a change in sign of all weak phases, whereas strong phases are left unchanged.
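The rule that B̄ and Ā follow from B and A by flipping the sign of all weak phases can be illustrated with a short numerical sketch of the rates in Eq. (49). Each amplitude is modeled as a sum of terms with a magnitude, a strong phase and a weak phase; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def amplitude(terms, cp_conjugate=False):
    """Sum of terms (magnitude, strong_phase, weak_phase).

    Under CP conjugation the weak phases change sign, the strong phases do not.
    """
    sign = -1.0 if cp_conjugate else 1.0
    return sum(mag * cmath.exp(1j * (strong + sign * weak))
               for mag, strong, weak in terms)


def rates(b_terms, a_terms):
    """Rates of Eq. (49), common kinematic factors omitted."""
    B, A = amplitude(b_terms), amplitude(a_terms)
    Bbar, Abar = amplitude(b_terms, True), amplitude(a_terms, True)
    return {
        "Gamma+": abs(B + A) ** 2,
        "Gamma0": abs(B - A) ** 2,
        "Gamma-": abs(Bbar + Abar) ** 2,
        "Gamma0bar": abs(Bbar - Abar) ** 2,
    }


# Hypothetical inputs: a dominant isoscalar term with zero phases and a
# small isovector term with strong phase 0.4 and weak phase 1.1 (radians).
b_terms = [(1.0, 0.0, 0.0)]
a_terms = [(0.1, 0.4, 1.1)]
r = rates(b_terms, a_terms)
print(r["Gamma-"] - r["Gamma+"], r["Gamma0bar"] - r["Gamma0"])
```

For two interfering terms the CP rate difference is proportional to the product of the sines of the strong and weak phase differences, so it vanishes if either the strong or the weak phase difference is zero.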
For each pair of processes one defines four asymmetries: an isospin-dependent CP-conserving asymmetry,

$$ A_I \equiv \frac{\Gamma_+ + \Gamma_- - \Gamma_0 - \Gamma_{\bar 0}}{\Gamma_+ + \Gamma_- + \Gamma_0 + \Gamma_{\bar 0}} , \tag{50} $$

two CP-violating asymmetries for B+ and B0,

$$ A^+_{CP} \equiv \frac{\Gamma_- - \Gamma_+}{\Gamma_- + \Gamma_+} , \qquad -C \equiv A^0_{CP} \equiv \frac{\Gamma_{\bar 0} - \Gamma_0}{\Gamma_{\bar 0} + \Gamma_0} , \tag{51} $$

and the time-dependent asymmetry S in B0 decays,

$$ S = \frac{2\,\mathrm{Im}\,\lambda}{1 + |\lambda|^2} , \qquad \lambda \equiv \eta_{CP}\,\frac{\bar B - \bar A}{B - A}\,e^{-2i\beta} . \tag{52} $$

In the Standard Model, the isoscalar amplitude B contains a dominant penguin contribution, B_P, with a CKM factor V*_cb V_cs. The residual isoscalar amplitude,

$$ \Delta B \equiv B - B_P , \tag{53} $$

and the amplitude A each consist of contributions smaller than B_P by about an order of magnitude [29,30,31,32,86]. These contributions include terms with a much smaller CKM factor V*_ub V_us, and a higher order electroweak penguin amplitude with CKM factor V*_tb V_ts. Thus, one expects

$$ |\Delta B| \ll |B_P| , \qquad |A| \ll |B_P| . \tag{54} $$

Consequently, the asymmetries A_I, A+_CP, A0_CP and ΔS are expected to be small, of order 2|A|/|B| and 2|ΔB|/|B_P|. In contrast, potentially large contributions to ΔB and A from NP, comparable to B_P, would most likely lead to large asymmetries of order one. An unlikely exception is the case when both ΔB/B_P and A/B_P are purely imaginary, or almost purely imaginary. This would require very special circumstances such as fine-tuning in specific models. Excluding cancellations between NP and SM contributions in both CP-conserving and CP-violating asymmetries, tests for the hierarchy (54) become tests for the smallness of corresponding potential NP contributions to B and A.

There exists ample experimental information showing that asymmetries A+_CP are small in processes related by isospin reflection to the decay modes in Table III. Upper limits on the magnitudes of most asymmetries are at a level of ten or fifteen percent [e.g., A+_CP(K+φ) = 0.034 ± 0.044, A+_CP(K+η′) = 0.031 ± 0.026], while others may be as large as twenty or thirty percent [A+_CP(K+ρ0) = 0.31 +0.11/−0.10].
Similar values have been measured for isospin asymmetries A_I [e.g., A_I(K+φ) = −0.037 ± 0.077, A_I(K+η′) = −0.001 ± 0.033, A_I(K+ρ0) = −0.16 ± 0.10] [49]. Since these two types of asymmetries are of order 2|ΔB|/|B_P| and 2|A|/|B_P|, this confirms the hierarchy (54), which can be assumed to hold also in the presence of NP.

We will take by convention the dominant penguin amplitude B_P to have a zero weak phase and a zero strong phase, referring all other strong phases to it. Writing

$$ B = B_P + \Delta B , \qquad \bar B = B_P + \Delta\bar B , \tag{55} $$

and expanding the four asymmetries to leading order in ΔB/B_P or A/B_P, one has

$$ \Delta S = \cos 2\beta \left[ \frac{\mathrm{Im}(\bar A - A)}{B_P} - \frac{\mathrm{Im}(\Delta\bar B - \Delta B)}{B_P} \right] , \tag{56} $$

$$ A_I = \frac{\mathrm{Re}(\bar A + A)}{B_P} , \tag{57} $$

$$ A^+_{CP} = \frac{\mathrm{Re}(\bar A - A)}{B_P} + \frac{\mathrm{Re}(\Delta\bar B - \Delta B)}{B_P} , \tag{58} $$

$$ A^0_{CP} = -\frac{\mathrm{Re}(\bar A - A)}{B_P} + \frac{\mathrm{Re}(\Delta\bar B - \Delta B)}{B_P} . \tag{59} $$

The four asymmetries provide the following information:

• The ΔI = 0 and ΔI = 1 contributions in CP asymmetries are separated by taking sums and differences,

$$ A^{\Delta I=0}_{CP} \equiv \tfrac{1}{2}(A^+_{CP} + A^0_{CP}) = \frac{\mathrm{Re}(\Delta\bar B - \Delta B)}{B_P} , \tag{60} $$

$$ A^{\Delta I=1}_{CP} \equiv \tfrac{1}{2}(A^+_{CP} - A^0_{CP}) = \frac{\mathrm{Re}(\bar A - A)}{B_P} . \tag{61} $$

• Re A/B_P and Re Ā/B_P may be separated by using information from A_CP^{ΔI=1} and A_I.

• ΔS is governed by an imaginary part of a combination of ΔI = 0 and ΔI = 1 terms which cannot be separated in B decays. Such a separation is possible in Bs decays to pairs of isospin-reflected decays, e.g. Bs → K+K−, KSKS or Bs → K∗+K∗−, K∗0K̄∗0, where 2β in the definition of ΔS (47) is now replaced by the small phase of Bs-B̄s mixing.

One may take one step further under the assumption that strong phases associated with NP amplitudes are small relative to those of the SM and can be neglected [110]. This assumption, which must be confronted by data, is reasonable because rescattering from a leading b → scc̄ amplitude is likely the main source of strong phases, while rescattering from a smaller b → sqq̄ NP amplitude is then a second-order effect.
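The leading-order expressions (56)-(59) can be checked numerically against the exact definitions of the asymmetries. The sketch below assumes the rates (49), the asymmetry definitions (50)-(52) with S = 2 Im λ/(1 + |λ|²) and λ = η_CP (B̄ − Ā)/(B − A) e^{−2iβ}, and the convention that B_P is real and positive. The function names and the amplitude values are hypothetical.

```python
import cmath
import math


def cp_pair(mag, strong, weak):
    """Amplitude and its CP conjugate: only the weak phase flips sign."""
    return (mag * cmath.exp(1j * (strong + weak)),
            mag * cmath.exp(1j * (strong - weak)))


def exact_asymmetries(BP, dB, dBbar, A, Abar, two_beta, eta_cp=1.0):
    """(A_I, A+_CP, A0_CP, Delta S) from the exact rates and lambda."""
    B, Bbar = BP + dB, BP + dBbar
    g_p, g_0 = abs(B + A) ** 2, abs(B - A) ** 2
    g_m, g_0bar = abs(Bbar + Abar) ** 2, abs(Bbar - Abar) ** 2
    a_i = (g_p + g_m - g_0 - g_0bar) / (g_p + g_m + g_0 + g_0bar)
    a_plus = (g_m - g_p) / (g_m + g_p)
    a_zero = (g_0bar - g_0) / (g_0bar + g_0)
    lam = eta_cp * (Bbar - Abar) / (B - A) * cmath.exp(-1j * two_beta)
    s = 2.0 * lam.imag / (1.0 + abs(lam) ** 2)
    return a_i, a_plus, a_zero, -eta_cp * s - math.sin(two_beta)


def leading_order(BP, dB, dBbar, A, Abar, two_beta):
    """Same four quantities from the first-order formulas (56)-(59)."""
    a_i = (Abar + A).real / BP
    a_plus = ((Abar - A).real + (dBbar - dB).real) / BP
    a_zero = (-(Abar - A).real + (dBbar - dB).real) / BP
    delta_s = math.cos(two_beta) * ((Abar - A).imag - (dBbar - dB).imag) / BP
    return a_i, a_plus, a_zero, delta_s


# Hypothetical small amplitudes relative to BP = 1, sharing one strong phase.
dB, dBbar = cp_pair(0.05, 0.3, 0.8)
A, Abar = cp_pair(0.08, 0.3, 1.1)
two_beta = math.asin(0.678)  # sin 2beta = 0.678 as quoted in the text
print(exact_asymmetries(1.0, dB, dBbar, A, Abar, two_beta))
print(leading_order(1.0, dB, dBbar, A, Abar, two_beta))
```

The two printed tuples differ only by terms of second order in the small ratios, and shrinking the magnitudes of ΔB and A makes the agreement correspondingly tighter.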
In the convention (55), where the strong phase of B_P is set equal to zero, ΔB and A have the same CP-conserving strong phase δ, and involve CP-violating phases φ_B and φ_A, respectively,

$$ \Delta B = |\Delta B|\,e^{i\delta}e^{i\phi_B} , \qquad A = |A|\,e^{i\delta}e^{i\phi_A} . \tag{62} $$

Since the four asymmetries (56)-(59) are first order in small ratios of amplitudes, one may take B_P in their expressions to be given by the square root of Γ+ or Γ0, thereby neglecting second order terms. These four observables can then be shown to determine |A|, φ_A and |ΔB| sin φ_B [49]. The combination |ΔB| cos φ_B adds coherently to B_P and cannot be fixed independently.

The amplitudes ΔB and A consist of process-dependent SM and potential NP contributions. Assuming that the former are calculable, either using methods based on QCD factorization or by fitting within flavor SU(3) these and other B decay rates and asymmetries, the four asymmetries determine the magnitude and CP violating phase of a ΔI = 1 NP amplitude and the imaginary part of a ΔI = 0 NP amplitude. In certain cases, e.g., B → φK or B → η′KS, stringent upper bounds on SM contributions to ΔB and A may suffice if some of the four measured asymmetries are larger than permitted by these bounds. In the pair B+ → K+π0, B0 → K0π0, the four measured asymmetries [using the predicted value (39)] are A_I = 0.087 ± 0.038, A_CP^{ΔI=0} = −0.047 ± 0.025, A_CP^{ΔI=1} = 0.094 ± 0.025, ΔS = −0.35 ± 0.21. Some reduction of errors is required for a useful implementation of this method.

Conclusion: There exists ample experimental evidence in pairs of isospin-reflected b → s penguin-dominated decays that potential NP amplitudes must be small.
Assuming that these amplitudes involve negligible strong phases, and assuming that small SM non-penguin contributions are calculable or can be strictly bounded, one may determine the magnitude and CP violating phase of a NP ΔI = 1 amplitude, and the imaginary part of a NP ΔI = 0 amplitude in each pair of isospin-reflected decays.

6.5. Null or nearly-null tests

We have not discussed null tests of the CKM framework [111]. Evidence for physics beyond the Standard Model may show up as (small) nonzero asymmetries in processes where they are predicted to be extremely small in the CKM framework. A well-known example is B+ → π+π0, where the CP asymmetry is expected to be a small fraction of a percent including EWP amplitudes [34,35]. We have only discussed exclusive hadronic B decays, where QCD calculations involve hadronic uncertainties. A more robust calculation exists for the direct CP asymmetry in inclusive radiative decays B → X_s γ, found to be smaller than one percent [112]. The current upper limit on this asymmetry is at least an order of magnitude larger [113].

Time-dependent asymmetries in radiative decays B0 → KSπ0γ, for a KSπ0 invariant mass in the K∗ region and for a larger invariant-mass range including this region, are interesting because they test the photon helicity, predicted to be dominantly right-handed in B0 decays and left-handed in B̄0 decays [105,114]. The asymmetry, suppressed by m_s/m_b, is expected to be several percent in the SM, and can be very large in extensions where spin-flip is allowed in b → sγ. While dimensional arguments seem to indicate a possible larger asymmetry in the SM, of order Λ_QCD/m_b ∼ 10% [115], calculations using perturbative QCD [116] and QCD factorization [117] find asymmetries of a few percent.
The current averaged values, for the K∗ region and for a larger invariant-mass range including this region, are S((KSπ0)_{K∗}γ) = −0.28 ± 0.26 and S(KSπ0γ) = −0.09 ± 0.24 [20,118]. These measurements must be improved in order to become sensitive to the level predicted in the SM, or to provide evidence for physics beyond the SM.

7. Summary

The Standard Model passed with great success numerous tests in the flavor sector, including a variety of measurements of CP asymmetries related to the CKM phases β and γ. Small potential New Physics corrections may occur in ΔS = 0 and |ΔS| = 1 penguin amplitudes, affecting the extraction of γ and modifying CP-violating and isospin-dependent asymmetries in |ΔS| = 1 B0 decays and isospin-related B+ decays. Higher precision than achieved so far is required for claiming evidence for such effects and for sorting out their isospin structure.

Similar studies can be performed with Bs mesons produced at hadron colliders and at e+e− colliders running at the Υ(5S) resonance. Time-dependence in Bs → Ds−K+ and Bs → J/ψφ or Bs → J/ψη measures γ and the small phase of the Bs-B̄s mixing amplitude [119]. Comparing time-dependence and angular analysis in Bs → J/ψφ with b → s penguin-dominated processes including Bs → φφ, Bs → K∗+K∗−, Bs → K∗0K̄∗0 provides a methodical search for potential NP effects. Work on Bs decays has just begun at the Tevatron [120]. One is looking forward to first results from the LHC.

Acknowledgments

I am grateful to numerous collaborators, in particular to Jonathan Rosner, whose collaboration continued without interruption for many years. This work was supported in part by the Israel Science Foundation under Grant No. 1052/04 and by the German-Israeli Foundation under Grant No. I-781-55.14/2003.

References

1. J. H. Christenson, J. W. Cronin, V. L. Fitch and R. Turlay, Phys. Rev. Lett. 13, 138 (1964).
2. B. Aubert et al.
[BABAR Collaboration], Phys. Rev. Lett. 87, 091801 (2001); K. Abe et al. [Belle Collaboration], Phys. Rev. Lett. 87, 091802 (2001).
3. A. B. Carter and A. I. Sanda, Phys. Rev. Lett. 45, 952 (1980); Phys. Rev. D 23, 1567 (1981); I. I. Y. Bigi and A. I. Sanda, Nucl. Phys. B 193, 85 (1981).
4. M. Kobayashi and T. Maskawa, Prog. Theor. Phys. 49, 652 (1973).
5. I. Dunietz and J. L. Rosner, Phys. Rev. D 34, 1404 (1986); I. I. Y. Bigi and A. I. Sanda, Nucl. Phys. B 281, 41 (1987).
6. H. Albrecht et al. [ARGUS Collaboration], Phys. Lett. B 192, 245 (1987); S. L. Wu, Nucl. Phys. Proc. Suppl. 3, 39 (1988).
7. L. Wolfenstein, Phys. Rev. Lett. 51, 1945 (1983). We use a standard phase convention in which V_ub and V_td are complex, while all other CKM matrix elements are real to a good approximation.
8. J. Charles et al. [CKMfitter Collaboration], eConf C060409, 043 (2006), presenting updated results periodically on the web site http://www.slac.stanford.edu/xorg/ckmfitter/.
9. M. Bona et al. [UTfit Collaboration], JHEP 0610, 081 (2006), presenting updated results periodically on the web site http://www.utfit.org/.
10. V. M. Abazov et al. [D0 Collaboration], Phys. Rev. Lett. 97, 021802 (2006); A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 97, 242003 (2006).
11. For a recent review see A. D. Dolgov, arXiv:hep-ph/0511213.
12. See e.g. E. Gabrielli, A. Masiero and L. Silvestrini, Phys. Lett. B 374, 80 (1996).
13. This review, which is only 27 pages long (the number of Hebrew alphabet letters), includes 120 references, as a Jewish blessing says “May you live to be 120!” It is too short to include other hundreds or thousands of relevant papers. I apologize to their many authors.
14. M. Gronau, Phys. Rev. Lett. 63, 1451 (1989).
15. H. Boos, T. Mannel and J. Reuter, Phys. Rev. D 70, 036006 (2004).
16. M. Ciuchini, M. Pierini and L. Silvestrini, Phys. Rev. Lett. 95, 221804 (2005).
17. H. n.
Li and S. Mishima, arXiv:hep-ph/0610120.
18. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607107.
19. K. F. Chen et al. [Belle Collaboration], arXiv:hep-ex/0608039.
20. E. Barberio et al. [Heavy Flavor Averaging Group], arXiv:hep-ex/0603003; updates are available at http://www.slac.stanford.edu/xorg/hfag/.
21. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 71, 032005 (2005); R. Itoh et al. [Belle Collaboration], Phys. Rev. Lett. 95, 091601 (2005).
22. P. Krokovny et al. [Belle Collaboration], Phys. Rev. Lett. 97, 081801 (2006); B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607105.
23. R. Fleischer and T. Mannel, Phys. Lett. B 506, 311 (2001).
24. M. Gronau and D. London, Phys. Lett. B 253, 483 (1991).
25. M. Gronau and D. Wyler, Phys. Lett. B 265, 172 (1991).
26. D. London and R. D. Peccei, Phys. Lett. B 223, 257 (1989).
27. B. Grinstein, Phys. Lett. B 229, 280 (1989).
28. M. Gronau and D. London, Phys. Rev. D 55, 2845 (1997).
29. M. Beneke, G. Buchalla, M. Neubert and C. T. Sachrajda, Phys. Rev. Lett. 83, 1914 (1999); Nucl. Phys. B 606, 245 (2001); Phys. Rev. D 72, 098501 (2005).
30. Y. Y. Keum, H. n. Li and A. I. Sanda, Phys. Lett. B 504, 6 (2001); Phys. Rev. D 63, 054008 (2001).
31. C. W. Bauer, D. Pirjol, I. Z. Rothstein and I. W. Stewart, Phys. Rev. D 70, 054015 (2004); C. W. Bauer, D. Pirjol, I. Z. Rothstein and I. W. Stewart, Phys. Rev. D 72, 098502 (2005).
32. M. Ciuchini, E. Franco, G. Martinelli and L. Silvestrini, Nucl. Phys. B 501, 271 (1997); M. Ciuchini, R. Contino, E. Franco, G. Martinelli and L. Silvestrini, Nucl. Phys. B 512, 3 (1998) [Erratum-ibid. B 531, 656 (1998)]; M. Ciuchini, E. Franco, G. Martinelli, M. Pierini and L. Silvestrini, Phys. Lett. B 515, 33 (2001).
33. M. Gronau and D. London, Phys. Rev. Lett. 65, 3381 (1990).
34. A. J. Buras and R. Fleischer, Eur. Phys. J. C 11, 93 (1999).
35. M. Gronau, D. Pirjol and T. M. Yan, Phys. Rev. D 60, 034021 (1999) [Erratum-ibid. D 69, 119901 (2004)].
36. S. Gardner, Phys. Rev.
D 59, 077502 (1999); S. Gardner, Phys. Rev. D 72, 034015 (2005).
37. M. Gronau and J. Zupan, Phys. Rev. D 71, 074017 (2005).
38. A. F. Falk, Z. Ligeti, Y. Nir and H. Quinn, Phys. Rev. D 69, 011502 (2004).
39. Y. Nir and H. R. Quinn, Phys. Rev. Lett. 67, 541 (1991); H. J. Lipkin, Y. Nir, H. R. Quinn and A. Snyder, Phys. Rev. D 44, 1454 (1991); M. Gronau, Phys. Lett. B 265, 389 (1991).
40. See, however, N. G. Deshpande and X. G. He, Phys. Rev. Lett. 74, 26 (1995) [Erratum-ibid. 74, 4099 (1995)].
41. M. Neubert and J. L. Rosner, Phys. Lett. B 441, 403 (1998); Phys. Rev. Lett. 81, 5076 (1998).
42. M. Neubert, JHEP 9902, 014 (1999); M. Beneke and S. Jager, arXiv:hep-ph/0610322.
43. Y. Grossman and M. P. Worah, Phys. Lett. B 395, 241 (1997).
44. M. Ciuchini, E. Franco, G. Martinelli, A. Masiero and L. Silvestrini, Phys. Rev. Lett. 79, 978 (1997); R. Barbieri and A. Strumia, Nucl. Phys. B 508, 3 (1997); S. A. Abel, W. N. Cottingham and I. B. Whittingham, Phys. Rev. D 58, 073006 (1998); Y. Grossman, M. Neubert and A. L. Kagan, JHEP 9910, 029 (1999); X. G. He, C. L. Hsueh and J. Q. Shi, Phys. Rev. Lett. 84, 18 (2000); G. Hiller, Phys. Rev. D 66, 071502 (2002); N. G. Deshpande and D. K. Ghosh, Phys. Lett. B 593, 135 (2004); V. Barger, C. W. Chiang, P. Langacker and H. S. Lee, Phys. Lett. B 580, 186 (2004); ibid. 598, 218 (2004).
45. M. Gronau and J. L. Rosner, Phys. Rev. D 59, 113002 (1999); H. J. Lipkin, Phys. Lett. B 445, 403 (1999).
46. D. Atwood and A. Soni, Phys. Rev. D 58, 036005 (1998); M. Gronau, Phys. Lett. B 627, 82 (2005).
47.
A sum rule involving three asymmetries, based on the expectation that the asymmetry in B+ → K0π+ should be very small, is discussed in M. Gronau and J. L. Rosner, Phys. Rev. D 71, 074019 (2005).
48. D. London and A. Soni, Phys. Lett. B 407, 61 (1997).
49. M. Gronau and J. L. Rosner, arXiv:hep-ph/0702193, to be published in Phys. Rev.
50. Y. Grossman, A. Soffer and J. Zupan, Phys. Rev. D 72, 031501 (2005). Evidence for D0-D̄0 mixing has been reported recently, B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0703020; K. Abe et al. [Belle Collaboration], arXiv:hep-ex/0703036.
51. M. Gronau, Phys. Rev. D 58, 037301 (1998).
52. Y. Grossman, Z. Ligeti and A. Soffer, Phys. Rev. D 67, 071301 (2003).
53. D. Atwood, I. Dunietz and A. Soni, Phys. Rev. Lett. 78, 3257 (1997); D. Atwood, I. Dunietz and A. Soni, Phys. Rev. D 63, 036005 (2001).
54. A. Giri, Y. Grossman, A. Soffer and J. Zupan, Phys. Rev. D 68, 054018 (2003); A. Bondar, Proceedings of BINP Special Analysis Meeting on Data Analysis, 24–26 September 2002, unpublished.
55. I. Dunietz, Phys. Lett. B 270, 75 (1991).
56. A. Bondar and T. Gershon, Phys. Rev. D 70, 091503 (2004).
57. W. M. Yao et al. [Particle Data Group], J. Phys. G 33, 1 (2006).
58. R. Aleksan, T. C. Petersen and A. Soffer, Phys. Rev. D 67, 096002 (2003).
59. M. Gronau, Phys. Lett. B 557, 198 (2003).
60. M. Gronau, Y. Grossman, N. Shuhmaher, A. Soffer and J. Zupan, Phys. Rev. D 69, 113003 (2004).
61. M. Gronau and J. L. Rosner, Phys. Lett. B 439, 171 (1998); Z. z. Xing, Phys. Rev. D 58, 093005 (1998); J. H. Jang and P. Ko, Phys. Rev. D 58, 111302 (1998).
62. B. Blok, M. Gronau and J. L. Rosner, Phys. Rev. Lett. 78, 3999 (1997).
63. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 74, 031101 (2006).
64. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 72, 071103 (2005).
65. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 73, 051105 (2006).
66. K. Abe et al. [BELLE Collaboration], Phys. Rev. D 73, 051106 (2006).
67. J. P. Silva and A.
Soffer, Phys. Rev. D 61, 112001 (2000); M. Gronau, Y. Grossman and J. L. Rosner, Phys. Lett. B 508, 37 (2001).
68. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 72, 032004 (2005).
69. K. Abe et al. [Belle Collaboration], arXiv:hep-ex/0508048.
70. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607065.
71. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 72, 071104 (2005).
72. See also P. Krokovny et al. [Belle Collaboration], Phys. Rev. Lett. 90, 141802 (2003); K. Abe et al. [Belle Collaboration], arXiv:hep-ex/0408108.
73. A. Poluektov et al. [Belle Collaboration], Phys. Rev. D 73, 112009 (2006).
74. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607104. See also B. Aubert et al. [BABAR Collaboration], Phys. Rev. Lett. 95, 121802 (2005).
75. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0507101.
76. M. Gronau, Y. Grossman, Z. Surujon and J. Zupan, arXiv:hep-ph/0702011, to be published in Phys. Lett. B.
77. M. Gronau and J. L. Rosner, Phys. Lett. B 595, 339 (2004).
78. M. Gronau, E. Lunghi and D. Wyler, Phys. Lett. B 606, 95 (2005).
79. M. Gronau, D. London, N. Sinha and R. Sinha, Phys. Lett. B 514, 315 (2001).
80. For two somewhat weaker bounds, which are included in this bound, see Y. Grossman and H. R. Quinn, Phys. Rev. D 58, 017504 (1998); J. Charles, Phys. Rev. D 59, 054007 (1999).
81. H. Ishino et al. [Belle Collaboration], BELLE-PREPRINT-2006-33.
82. M. Gronau, Phys. Lett. B 300, 163 (1993).
83. M. Gronau and J. L. Rosner, work in progress.
84. M. Beneke, M. Gronau, J. Rohrer and M. Spranger, Phys. Lett. B 638, 68 (2006).
85. A. E. Snyder and H. R. Quinn, Phys. Rev. D 48, 2139 (1993); A. Kusaka et al. [Belle Collaboration], arXiv:hep-ex/0701015; B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0703008.
86. M.
Gronau, O. F. Hernandez, D. London and J. L. Rosner, Phys. Rev. D 50, 4529 (1994); ibid. 52, 6374 (1995).
87. M. Gronau, J. L. Rosner and D. London, Phys. Rev. Lett. 73, 21 (1994).
88. R. Fleischer and T. Mannel, Phys. Rev. D 57, 2752 (1998).
89. M. Gronau and J. L. Rosner, Phys. Rev. D 57, 6843 (1998).
90. C. W. Chiang, M. Gronau, J. L. Rosner and D. A. Suprun, Phys. Rev. D 70, 034020 (2004).
91. S. Baek, P. Hamel, D. London, A. Datta and D. A. Suprun, Phys. Rev. D 71, 057502 (2005).
92. A. J. Buras, R. Fleischer, S. Recksiegel and F. Schwab, Phys. Rev. Lett. 92, 101804 (2004).
93. H. n. Li, S. Mishima and A. I. Sanda, Phys. Rev. D 72, 114005 (2005).
94. M. Beneke and S. Jager, Nucl. Phys. B 751, 160 (2006).
95. M. Gronau and J. L. Rosner, Phys. Lett. B 644, 237 (2007).
96. M. Gronau and J. L. Rosner, Phys. Rev. D 65, 013004 (2002) [Erratum-ibid. D 65, 079901 (2002)].
97. M. Gronau and J. L. Rosner, Phys. Lett. B 572, 43 (2003).
98. M. Beneke and M. Neubert, Nucl. Phys. B 675, 333 (2003).
99. C. W. Bauer, I. Z. Rothstein and I. W. Stewart, Phys. Rev. D 74, 034010 (2006).
100. M. Gronau, Y. Grossman, G. Raz and J. L. Rosner, Phys. Lett. B 635, 207 (2006).
101. M. Gronau and J. L. Rosner, Phys. Rev. D 74, 057503 (2006).
102. D. Zeppenfeld, Z. Phys. C 8, 77 (1981); M. J. Savage and M. B. Wise, Phys. Rev. D 39, 3346 (1989) [Erratum-ibid. D 40, 3127 (1989)]; L. L. Chau, H. Y. Cheng, W. K. Sze, H. Yao and B. Tseng, Phys. Rev. D 43, 2176 (1991) [Erratum-ibid. D 58, 019902 (1998)].
103. N. G. Deshpande and X. G. He, Phys. Rev. Lett. 75, 1703 (1995); X. G. He, Eur. Phys. J. C 9, 443 (1999).
104. M.
Gronau and J. L. Rosner, Phys. Rev. Lett. 76, 1200 (1996); A. S. Dighe, M. Gronau and J. L. Rosner, Phys. Rev. D 54, 3309 (1996). 105. D. Atwood, M. Gronau and A. Soni, Phys. Rev. Lett. 79, 185 (1997). 106. M. Beneke, Phys. Lett. B 620, 143 (2005). 107. H. Y. Cheng, C. K. Chua and A. Soni, Phys. Rev. D 72, 014006 (2005); H. Y. Cheng, C. K. Chua and A. Soni, Phys. Rev. D 72, 094003 (2005). 108. Y. Grossman, Z. Ligeti, Y. Nir and H. Quinn, Phys. Rev. D 68, 015004 (2003); G. Engelhard, Y. Nir and G. Raz, Phys. Rev. D 72, 075013 (2005); G. Engelhard and G. Raz, Phys. Rev. D 72, 114017 (2005). 109. M. Gronau and J. L. Rosner, Phys. Lett. B 564, 90 (2003); C. W. Chiang, M. Gronau and J. L. Rosner, Phys. Rev. D 68, 074012 (2003); C. W. Chiang, M. Gronau, Z. Luo, J. L. Rosner and D. A. Suprun, Phys. Rev. D 69, 034001 (2004); M. Gronau, J. L. Ros- ner and J. Zupan, Phys. Lett. B 596, 107 (2004); M. Gronau, J. L. Rosner and J. Zupan, Phys. Rev. D 74, 093003 (2006). 110. A. Datta and D. London, Phys. Lett. B 595, 453 (2004); S. Baek, P. Hamel, D. Lon- don, A. Datta and D. A. Suprun, Phys. Rev. D 71, 057502 (2005); A. Datta, M. Im- beault, D. London, V. Page, N. Sinha and R. Sinha, Phys. Rev. D 71, 096002 (2005). 111. T. Gershon and A. Soni, J. Phys. G 33, 479 (2007). 112. J. M. Soares, Nucl. Phys. B 367, 575 (1991); A. L. Kagan and M. Neubert, Phys. Rev. D 58, 094012 (1998). 113. B. Aubert et al. [BABAR Collaboration], Phys. Rev. Lett. 93, 021804 (2004); Phys. Rev. Lett. 97, 171803 (2006); S. Nishida et al. [BELLE Collaboration], Phys. Rev. Lett. 93, 031803 (2004). 114. D. Atwood, T. Gershon, M. Hazumi and A. Soni, Phys. Rev. D 71, 076003 (2005). 115. B. Grinstein, Y. Grossman, Z. Ligeti and D. Pirjol, Phys. Rev. D 71, 011504 (2005); B. Grinstein and D. Pirjol, Phys. Rev. D 73, 014013 (2006). 116. M. Matsumori and A. I. Sanda, Phys. Rev. D 73, 114022 (2006). 117. P. Ball and R. Zwicky, Phys. Lett. B 642, 478 (2006). 118. B. Aubert et al. [BaBar Collaboration], Phys. Rev. 
D 72, 051103 (2005); Y. Ushiroda et al. [Belle Collaboration], Phys. Rev. D 74, 111104 (2006). 119. R. Aleksan, I. Dunietz and B. Kayser, Z. Phys. C 54, 653 (1992). 120. M. Paulini, arXiv:hep-ex/0702047; G. Punzi [CDF - Run II Collaboration], arXiv:hep-ex/0703029. http://arxiv.org/abs/hep-ex/0702047 http://arxiv.org/abs/hep-ex/0703029 ABSTRACT Precision tests of the Kobayashi-Maskawa model of CP violation are discussed, pointing out possible signatures for other sources of CP violation and for new flavor-changing operators. The current status of the most accurate tests is summarized. <|endoftext|><|startoftext|> Introduction of Reichenbach’s book [3]. He called the concept as ” ... of great interest for the methodology of physics but what has so far not received the attention it deserves”. In this paper we shall try to rectify for this failure of appreciating the concept of the Universal Force - albeit in a somewhat altered and improved form. Reichenbach defines two kind of forces - Differential Forces and Univer- sal Forces. It may be pointed out that the term ”force” here should not be taken strictly as defined in physics but in a broad and general framework. In fact Carnap has suggested that the term ”effect” instead of ”force’ would better serve the purpose [5] and which allows it be used in different frame- works. Hence to conform with the accepted practice, though in this paper we shall continue to use the term ”Universal Force” the reader may do well to remember that what we really mean is ”Universal Effect”. One calls a force Differetial if it acts differently on different substances. It is called Universal if it is quantitatively the same for all the substances [3,5]. If we heat a rod of initial length l0 from initial temperature T0 to tempetature T then its length is given as l = l0[1 + β(T − T0)] (1) where β the coefficient for thermal expansion is different for different materials. Hence this is a Differential Force. 
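As a small numerical illustration of why thermal expansion counts as a Differential Force, the following sketch evaluates Eq. (1) for three materials. The function name and the expansion coefficients are assumptions introduced here (rough textbook figures, not taken from the text); the only point is that the same heating produces a different length for each material.

```python
# Illustrative sketch of Eq. (1): l = l0 * [1 + beta * (T - T0)].
# The beta values below are rough textbook figures, used only as assumptions.

def expanded_length(l0, beta, t0, t):
    """Length of a rod of initial length l0 after heating from t0 to t."""
    return l0 * (1.0 + beta * (t - t0))

# Approximate linear expansion coefficients in 1/K (illustrative only).
BETA = {"steel": 12e-6, "copper": 17e-6, "aluminium": 23e-6}

# Heat a 1 m rod of each material from 20 to 120 degrees.
lengths = {name: expanded_length(1.0, beta, 20.0, 120.0)
           for name, beta in BETA.items()}

for name, length in lengths.items():
    print(f"{name:10s} {length:.6f} m")
```

Because the result depends on the material through $\beta$, rods of different substances can be compared against one another to detect and correct for the effect, which is what distinguishes it from a Universal Force.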
Now the correction factor due to the influence of gravitation on the length of the rod is $l = l_0\left[1 - C\,\frac{m}{r}\cos^2\phi\right]$ (2) Here the rod is placed at a distance $r$ from the sun, whose mass is $m$, and $\phi$ is the angle of the rod with respect to the line from the sun to the rod. $C$ is a universal constant (in CGS units, $C = 3.7 \times 10^{-29}$). As this acts in the same manner for any material, gravity is a Universal Force as per the above definition. Reichenbach also gives a general definition of Universal Forces [3, p 12] as forces (1) that affect all materials in the same manner and (2) against which there are no insulating walls. We saw above that gravity is such a force. Indeed gravity is a Universal Force par excellence. It affects all matter in the same manner. The equality of the gravitational and inertial masses is what ensures this physically. If the gravitational and inertial masses were not found to be equal, then one would not have been able to visualize the paths of freely falling mass points as geodesics in four-dimensional space-time. In that case different geodesics would have resulted for mass points of different materials [3]. Therefore the universal effect of gravitation on different kinds of measuring instruments is to define a single geometry for all of them. Viewed this way, one may say that gravity is geometrized. "It is not theory of gravitation that becomes geometry, but it is geometry that becomes the experience of the gravitational field" [3, p 256]. Why does the planet follow the curved path? Not because it is acted upon by a force but because the curved space-time manifold leaves it with no other choice! So as per Einstein's theory of relativity, one does not speak of a change produced by the gravitational field in the measuring instruments, but regards the measuring instruments as free from any deforming forces. Gravity being a Universal Force, in Einstein's Theory of Relativity it basically disappears and is replaced by geometry.
In fact Reichenbach [3, p 22] shows how one can give a consistent definition of a rigid rod, the same rigid rods which are needed in relativity to measure all lengths. "Rigid rods are solid bodies which are not affected by Differential Forces, or concerning which the influence of Differential Forces has been eliminated by corrections; Universal Forces are disregarded. We do not neglect Universal Forces. We set them to zero by definition. Without such a rule a rigid body cannot be defined." In fact this rule also helps in defining a closed system. All this was formalized in terms of a theorem by Reichenbach [3, p 33]. THEOREM θ: "Given the geometry $G_0$ to which the measuring instruments conform, we can imagine a Universal Force $F$ which affects the instruments in such a way that the actual geometry is an arbitrary geometry $G$, while the observed deviation from $G$ is due to universal deformation of the measuring instruments." $G_0 + F = G$ (3) Hence only the combination $G_0 + F$ is testable. As per Reichenbach's principle one prefers the theory wherein we put $F = 0$. If we accept Reichenbach's principle of putting the Universal Force of gravity to zero, then the arbitrariness in the choice of the measuring procedure is avoided and the question of the geometrical structure of physical space has a unique answer determined by physical measurement. It is this principle which Carnap praises highly [5, p 171]: "Whenever there is a system of physics in which a certain universal effect is asserted by a law that specifies under what conditions in what amount the effect occurs, then the theory should be transformed so that the amount of effect would be reduced to zero.
This is what Einstein did in regard to contraction and expansion of bodies in gravitational field." The left hand side of Einstein's equation (below) gives the relevant non-Euclidean geometry $G_{\mu\nu} = 8\pi G\,\langle\phi|T_{\mu\nu}|\phi\rangle$ (4) In the case of gravity, and inasmuch as Einstein's Theory of Relativity has been well tested experimentally, we treat the above concept as well placed empirically. But from this single success Reichenbach generalizes this to a fundamental principle for all cases where Universal Forces may arise. As Carnap states [5, p 171], "Whenever universal effects are found in physics, Reichenbach maintained that it is always possible to eliminate them by suitable transformation of theory; such a transformation should be made because of the overall simplicity that would result. This is a useful general principle, deserving more attention than it has received. It applies not only to relativity theory, but also to situations that may arise in the future in which other universal effects may be observed. Without the adoption of this rule there is no way to give unique answer to the question - what is the structure of space?". As such, Reichenbach goes ahead and tries to apply this principle of elimination of Universal Forces to another universal effect that he finds, one which arises from considerations of the topology (as an additional consideration over and above that of geometry) of the space-time of the universe. Theorem θ is limited to talking about the geometry of space-time only. It does not take account of specific topological issues that may arise. To take account of the topology of space-time we shall have to extend the said theorem appropriately. What would one experience if space had different topological properties? To drive the point home Reichenbach considers a torus-space [3, p 63]. This is quite detailed and extensive.
However, for the purpose of simplifying and shortening the discussion here, we shall talk of a two-dimensional being who lives on the surface of a sphere. His measurements tell him so. But in spite of this he insists that he lives on a plane. He may actually do so, as per our discussion above, if he confines himself to metrical relations only. With an appropriate Universal Force he can justify living on a plane. But the surface of a sphere is topologically different from that of a plane. On a sphere, if he starts at a point X and goes on a world tour, he may come back to the same point X. But this is impossible on a plane. Hence, to account for coming back to the "same point", he has to maintain that on the plane he has actually come back to a different point Y, which is nevertheless identical to X in all other respects. One option for him is to accept that he is actually living on a sphere. However, if he still wants to maintain his position that he is living on a plane, then he has to explain how point Y is physically identical to point X in spite of the fact that X and Y are different and distinct points of space. Indeed he can do so by visualizing a fictitious force as an effect of some kind of "pre-established harmony" [3, p 65], by proposing that everything that occurs at X also occurs at the point Y. As it would affect all matter in the same manner, this corresponds to a Universal Force/Effect as per Reichenbach's definition. This interdependence of corresponding points, which is essential in this "pre-established harmony", cannot be interpreted as ordinary causality, as it does not require ordinary time to transmit it and also does not spread continuously through the intervening space. Hence there is no mysterious causal connection between the points X and Y. Thus this necessarily entails proposing a "causal anomaly" [3, p 65].
In short, connecting different topologies through a fictitious Universal Effect of "pre-established harmony" necessarily calls for the introduction of "causal anomalies". Call this new hypothesized Universal Force $A$ and let Theorem θ be extended to read $G_0 + F + A = \mathcal{G}$ (5) where on the right hand side we have written a different capital G, $\mathcal{G}$, which reduces to the $G$ of the original Theorem θ when $A$ is set equal to zero. Now, as per Reichenbach's law of preferring that physical reality wherein all Universal Forces are put to zero, he advocates putting $A$ to zero. He pointed out that this has the advantage of retaining physical "causality" in our science. This he takes as a success of his methodology. As per Reichenbach [3, p 65], "The principle of causality is one of its (physics) sacred laws, which it will not abandon lightly; pre-established harmony, however is incompatible with this law". However, as the said "causal anomaly" is of topological origin, we cannot be sure in what manner it will manifest itself physically. In addition, will not the Universal Force/Effect of "pre-established harmony" compensate for it in some manner? So what one is saying is that it is possible that Reichenbach was wrong in putting all Universal Forces to zero. It was acceptable to put $F$ to zero, which justified the geometrical interpretation of gravity. But in the case of this new topological Universal Force we really do not know enough, so let us not be governed by any theoretical prejudice and let Nature decide what is happening. That is, let us look at modern cosmology to see if it is throwing up any new Universal Forces which may be identified with our "pre-established harmony" here. To understand this let us look at Einstein's equation given above.
Harvey and Schucking [11], correcting for Einstein's error in understanding the role of the cosmological term $\lambda$, have derived the most general equation of motion to be $G_{\mu\nu} + \lambda g_{\mu\nu} = 8\pi G\,\langle\phi|T_{\mu\nu}|\phi\rangle$ (6) They showed [11] that the Cosmological Constant $\lambda$ above provides a new repulsive force proportional to mass $m$, repelling every particle of mass $m$ with a force $F = m c^2\,\frac{\lambda}{3}\,x$ (7) Recent data [1] on $\lambda$ is what leads to the crisis of Dark Energy. Quite clearly this repulsive force is a new Universal Force as per our definition and hence conforms to the "pre-established harmony" aspect of the "causal anomaly". Thus we see that, as per the recent data on the accelerating universe, we have indeed stumbled upon this new Universal Force which is of topological origin. Hence the source of dark energy is the "causal anomaly" arising from the unique topological structure of our universe. This solves the mystery of the origin of Dark Energy. So we would like to emphasize that it is the accelerating universe (and hence the Dark Energy) which is forcing us to accept the incorporation of this "causal anomaly" of topological origin. Implications of this new concept in physics have now to be explored. Note that, as per Theorem θ, when one puts $F$ to zero one obtains the proper non-Euclidean geometry of Einstein's equation. But now we know that the full structure is the sum of this non-Euclidean geometry plus $A$, the new Universal Force (as per the modified theorem above), and this is what the accelerating universe is forcing us to accept. This is what we called capital G above. We feel that the DASI data on $\Omega_0$ being close to one, and thus showing that the Universe is flat [1], is consistent with capital G being equal to $G + A$.
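The sense in which the repulsive force of Eq. (7) is "universal" can be made concrete with a short sketch. It assumes the Newtonian-limit form $F = m c^2 (\lambda/3)\, x$ for the force, and the numerical value of $\lambda$ is only an assumed order of magnitude; the function names are hypothetical. What it shows is that the resulting acceleration does not depend on the mass of the test particle.

```python
import math

# Sketch only: assumes the force law F = m * c^2 * (lambda / 3) * x.
C_LIGHT = 2.998e8     # speed of light in m/s
LAMBDA = 1.1e-52      # cosmological constant in 1/m^2 (assumed order of magnitude)

def repulsive_force(mass, x, lam=LAMBDA):
    """Repulsive force on a particle of the given mass at distance x."""
    return mass * C_LIGHT ** 2 * (lam / 3.0) * x

def acceleration(mass, x, lam=LAMBDA):
    """Acceleration of the particle; the mass cancels out."""
    return repulsive_force(mass, x, lam) / mass

x = 1.0e22  # metres, an arbitrary cosmological-scale distance
a_small = acceleration(1.0e-3, x)
a_large = acceleration(1.0e30, x)
print(a_small, a_large, math.isclose(a_small, a_large, rel_tol=1e-12))
```

Since the mass drops out, no comparison between different test bodies can reveal the effect, which is the defining property of a Universal Force in Reichenbach's sense.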
In principle, just as per the original Theorem θ one may add a Universal Force $F$ to Einstein's non-Euclidean geometry to obtain a physically relevant Euclidean geometry, so in the same manner, given a non-Euclidean geometry of Einstein, one can add an appropriate Universal Force $A$ to provide a flat universe. And this is exactly what capital G is telling us. Thus the observed flatness of the universe may be treated as a success of the new idea proposed here. One would like to ask in what other manner the incorporation of this new "causal anomaly" may help us in understanding Nature better. Will it provide new perspectives as answers to the quantum mechanical puzzles of quantum jumps, non-locality, etc.? These are open questions to be tackled in future. REFERENCES 1. M S Turner, "Making sense of the new cosmology", Int J Mod Phys, A17S1 (2002) 180-196 2. Hans Reichenbach (1891-1953) can properly be called a philosopher-scientist. As a leading philosopher of science he was founder of the Berlin Circle and a proponent of logical positivism. Among his teachers were David Hilbert, Max Planck, Max Born and Albert Einstein. He wrote extensively on the theory of probability, theory of relativity and quantum mechanics. His philosophical writings have a definite scientific touch in them, very much akin to that of Descartes, Leibniz and Huygens. 3. H Reichenbach, "The philosophy of space and time", Dover, New York (1957) (Original German edition in 1928) 4. C Callender and N Huggett, "Physics meets philosophy at the Planck scale", Cambridge University Press, UK (2001) 5. R Carnap, "An introduction to the philosophy of science", Basic Books, New York (1966) 6. E Nagel, "The structure of science", Routledge and Kegan Paul, London (1961) 7. D Dieks, "Gravitation as a Universal Force", Synthese, 73 (1987) 381-397 8. B Ellis, "Universal and Differential Forces", Brit J Phil Sc, 14 (1963) 177-194 9.
A Gruenbaum, "Philosophical problems of space and time", D Reidel, Dordrecht, Holland (1973) or Alfred A Knopf, New York (1963) 10. R Torretti, "Relativity and geometry", Pergamon Press (1983) 11. A Harvey and E Schucking, "Einstein's mistake and the cosmological constant", Am J Phys, 68 (2000) 723-727 ABSTRACT The Dark Energy problem is forcing us to re-examine our models and our understanding of relativity and space-time. Here a novel idea of Fundamental Forces is introduced. This allows us to perceive the General Theory of Relativity and Einstein's Equation from a new perspective. In addition to providing us with an improved understanding of space and time, it will be shown how it leads to a resolution of the Dark Energy problem. <|endoftext|><|startoftext|> Introduction An important aspect of any geometric gravitational theory is the analysis of how to match two spacetimes. This is true in particular for General Relativity and its perturbation theory. Despite the relevance and maturity of the matching theory, one often finds papers where the matching conditions are not properly used. Most of the difficulties arise from the fact that the matching conditions are imposed in specific coordinate systems in a manner which is not completely coordinate independent. More specifically, matching two spacetimes requires identifying the boundaries pointwise, and sometimes this identification is done implicitly by fixing spacetime coordinates, without paying enough attention to the fact that solving the matching involves finding an identification of the boundary and that this should not be fixed a priori. In perturbation theory this problem also arises, and it is complicated by the fact that the fields to be matched (such as the perturbed metric) are gauge dependent. So, in addition to a priori choices of identifications of the boundary, there is also the problem that particular gauges are often used.
It may be argued that the matching theory must be gauge independent and therefore it can be performed in any gauge. This is true, but only when due care is taken to ensure that the choice of gauge does not restrict, a priori, the perturbed identification of the boundaries. A complete description of the linearized matching conditions has been achieved only recently by Carter and Battye [5] and independently by Mukohyama [6]. To second order, the matching conditions have been recently found in [7]. Despite these papers, we believe that some confusion still lingers in the field, in particular with respect to the existing gauge invariant formulations. The aim of this paper is to try to clarify these issues. In order to do that, we will critically discuss some of the approaches proposed in the literature, trying to make clear which implicit assumptions are made and to what extent they are justified. The first papers discussing the perturbed matching theory are, as far as we know, the classic papers by Gerlach and Sengupta [2, 3]. However, as explained below, their description of the perturbed matching theory contains imprecisions, and we will therefore start by discussing their approach, pointing out the difficulties they encounter. A first attempt to justify the claims in [2, 3] is due to Martín-García and Gundlach [4], who propose a different but nevertheless closely related set of linearized matching conditions. Pointing out the implicit assumption made by these authors will also help us to explain the subtleties inherent to the perturbed matching theory. In [6] the linearized matching conditions are described for arbitrary backgrounds, perturbations and matching hypersurfaces, and then applied to the case of two background spacetimes with a high degree of symmetry, namely those which admit a maximal group of isometries acting on codimension two spacelike submanifolds (e.g. spherically symmetric spacetimes).
In order to simplify the matching conditions, Mukohyama derives a set of matching conditions for so-called doubly gauge invariants. However, a gap arises in his final conclusions, as the presented set of conditions on doubly gauge invariant quantities for the linearized matching of spacetimes is only shown to be necessary. Analysing sufficiency touches directly on the issue we are trying to emphasize in this paper, so we devote one section to clarifying this point, where we show how these conditions are, strictly speaking, not sufficient. Since the matching conditions in terms of doubly gauge invariants are widely used in the literature, we consider it important to close this gap. Moreover, the construction of gauge invariant quantities using spherical harmonic decompositions leaves out the l = 0 and l = 1 sectors. We will discuss this issue and its consequences. The paper is organized as follows. We start by summarising the perturbed matching conditions in Section 2, where we also describe the gauge freedom involved. Then, the procedures used in the classic papers [2, 3], together with the justifications and further developments in [4], are reviewed in Section 3. Section 4 focuses on the consequences of the existence of symmetries in the background configuration, which will be relevant in our final discussion. Section 5 has three subsections. The first one briefly presents the procedure and results discussed in [6], particularised to the case of spherically symmetric backgrounds. In the second subsection we analyse the sufficiency of the doubly gauge invariant matching conditions in [6]. The last subsection is devoted to the study of the freedom left in the perturbation of the matching hypersurface once the metric perturbations have been fixed at both sides. We finish with an appendix where explicit expressions for the discontinuities of the perturbed second fundamental forms in the spherical case are given.
Some of these expressions are used in the main text.

2 Linearized matching

In this section we describe the gauge freedom involved in the linearised spacetime matching and summarise the perturbed matching conditions.

2.1 Gauge freedom

The purpose of the matching theory is to construct a new spacetime out of two spacetimes $M^{\pm}$ with boundary by finding a suitable diffeomorphism between the boundaries which allows for their pointwise identification. In particular, the matched spacetime cannot be thought to exist beforehand. Another aspect to bear in mind is that the matching conditions involve exclusively tensors on the identified boundary $\Sigma$ and hence any coordinate system in $M^{\pm}$ is equally valid. This is well known but it is still sometimes a source of confusion. In perturbed matching theory, not only are the metrics perturbed but the matching hypersurfaces may also be deformed. Furthermore, as for the metric, the "deviation" of the matching hypersurface is also a gauge dependent quantity. This can be best understood by viewing perturbations as $\varepsilon$-derivatives (at $\varepsilon = 0$) of a one-parameter family of spacetimes $(M^{+}_{\varepsilon}, g^{+}_{\varepsilon})$ with boundary $\Sigma^{+}_{\varepsilon}$. It is convenient to embed $M^{+}_{\varepsilon}$ within a larger manifold (without boundary) $V^{+}_{\varepsilon}$ to clarify the discussion. A priori, the manifolds $(M^{+}_{\varepsilon}, g^{+}_{\varepsilon})$ are completely distinct, so it makes no direct sense to talk about $\varepsilon$-derivatives. It is necessary to first identify the different manifolds so that a single point $p$ refers to one point on each of the manifolds. Obviously, there are infinitely many ways to identify the manifolds, all of them equally valid a priori. This freedom leads to the gauge dependence of the perturbed metric (and of any other geometrically defined tensor). The identification above may, or may not, map the boundaries $\Sigma^{+}_{\varepsilon}$ among themselves.
A priori, a point in $\Sigma^{+}_{0}$ may be mapped, for $\varepsilon \neq 0$, to a point on $\Sigma^{+}_{\varepsilon}$, to a point interior to $M^{+}_{\varepsilon}$ or to a point exterior to $M^{+}_{\varepsilon}$ (within the extension $V^{+}_{\varepsilon}$) which is not part of the manifold. How can we then take derivatives with respect to $\varepsilon$ at those latter points? Since only derivatives at $\varepsilon = 0$ are needed, restricting to infinitesimal values of $\varepsilon$ entails no loss of generality. Then, if for some small $\varepsilon$ a point $q \in \Sigma^{+}_{0}$ is mapped to the exterior of $M^{+}_{\varepsilon}$, it follows from differentiability with respect to $\varepsilon$ that $q$ is mapped, for the reverse value $-\varepsilon$, to a point interior to $M^{+}_{-\varepsilon}$. Thus, perturbations can be defined at the boundary by taking one-sided derivatives, i.e. by taking limits $\varepsilon \to 0$ with a sign restriction on $\varepsilon$ (cf. [7] for an alternative discussion). However, an important issue remains: how do we describe the deformation of the boundary $\Sigma^{+}_{0}$? As a set of points each boundary $\Sigma^{+}_{\varepsilon}$ maps, with the above identification, into a hypersurface of the background spacetime, which we call $\hat{\Sigma}^{+}_{\varepsilon}$. In general, this hypersurface will not coincide with $\Sigma^{+}_{0}$ and may well touch it or cross it. This gives us an idea of how the boundary is deformed, but only as a subset, not pointwise. In order to know how the boundary actually moves within the background, we need to prescribe a priori a pointwise identification of $\Sigma^{+}_{0}$ with $\Sigma^{+}_{\varepsilon}$. This identification is completely different and independent from the one described above involving spacetime points, and involves only the points on the boundaries. As before, there are infinitely many ways to identify the boundaries, and this defines a second gauge freedom, which involves objects intrinsically defined on the boundary. This gauge freedom will be referred to as hypersurface gauge, as opposed to the usual spacetime gauge described above. With both identifications chosen, the deformation of the boundary within the background can already be described: fix a point $q$ on the background boundary $\Sigma^{+}_{0}$.
The identification of the boundaries defines a point $q_{\varepsilon}$ on $\Sigma^{+}_{\varepsilon}$, for each $\varepsilon$. The spacetime identification takes this point $q_{\varepsilon}$ and maps it into a point $\hat{q}_{\varepsilon}$ of the background $M^{+}_{0}$ (perhaps after a sign restriction on $\varepsilon$). Obviously $\hat{q}_{\varepsilon}$ belongs to the perturbed hypersurface $\hat{\Sigma}^{+}_{\varepsilon}$. We have therefore not only a deformation of the background hypersurface as a set of points, but also pointwise information. It only remains to take the tangent vector of the curve $\hat{q}_{\varepsilon}$ at $\varepsilon = 0$, i.e. $\vec{Z}^{+} = \left.\frac{d\hat{q}_{\varepsilon}}{d\varepsilon}\right|_{\varepsilon=0}$, which encodes completely the deformation of the matching hypersurface as seen from the background spacetime. Two final remarks are in order: (i) $\vec{Z}^{+}$ is defined exclusively on $\Sigma^{+}_{0}$, no extension thereof is defined or required, and (ii) $\vec{Z}^{+}$ depends on both the spacetime and hypersurface gauges, since its defining curve is constructed using both identifications. However, decomposing $\vec{Z}^{+} = Q^{+}\vec{n}^{(0)+} + \vec{T}^{+}$, where $\vec{n}^{(0)+}$ is the unit normal of $\Sigma^{+}_{0}$ (assumed non-null everywhere) and $\vec{T}^{+}$ is tangent to it, it turns out that $Q^{+}$ depends on the spacetime gauge but not on the hypersurface gauge. This is because changing the hypersurface gauge reorganizes the points within each $\hat{\Sigma}^{+}_{\varepsilon}$, but cannot modify any of them as a set of points. Tensors defined intrinsically on the boundaries $\Sigma^{+}_{\varepsilon}$ are completely unaffected by the spacetime identification, and are therefore invariant under spacetime gauge transformations. Recall that the matching conditions involve only objects intrinsic to the matching hypersurfaces. Since the perturbed matching conditions are, formally, just their $\varepsilon$-derivatives, it follows by construction that the perturbed matching conditions must be invariant under spacetime gauge transformations. This may seem surprising at first sight, since the matching conditions must involve the perturbed metric, which is obviously gauge dependent. However, the conditions turn out to be gauge independent because they also involve the deformation vector $\vec{Z}^{+}$, which is spacetime gauge dependent.
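The splitting of the deformation vector into normal and tangential parts can be illustrated with a minimal sketch. It assumes a flat Minkowski background with signature (-,+,+,+) and a unit, non-null normal; the helper names `inner` and `decompose` are hypothetical. With $\epsilon = g(\vec n, \vec n) = \pm 1$, the normal component is $Q = \epsilon\, g(\vec Z, \vec n)$ and the remainder $\vec T = \vec Z - Q\vec n$ is tangent to the hypersurface.

```python
# Sketch under stated assumptions: flat background, unit non-null normal.

def inner(g, u, v):
    """Scalar product g(u, v) for a metric given as a nested list."""
    return sum(g[a][b] * u[a] * v[b]
               for a in range(len(u)) for b in range(len(v)))

def decompose(g, n, z):
    """Split z into Q * n + T, with T orthogonal to the unit normal n."""
    eps = inner(g, n, n)            # +1 for a spacelike normal, -1 for a timelike one
    q = eps * inner(g, z, n)        # normal component Q
    t = [za - q * na for za, na in zip(z, n)]
    return q, t

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

Z = [0.3, 0.5, -0.2, 0.1]           # an arbitrary deformation vector

# Timelike hypersurface x = const: spacelike unit normal along x.
Q_space, T_space = decompose(ETA, [0, 1, 0, 0], Z)

# Spacelike hypersurface t = const: timelike unit normal along t.
Q_time, T_time = decompose(ETA, [1, 0, 0, 0], Z)

print(Q_space, T_space)
print(Q_time, T_time)
```

In this picture a change of hypersurface gauge only alters the tangential part, which is why $Q^{+}$ is insensitive to it.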
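This vector is therefore of fundamental importance and must be taken into account in any sensible approach to the problem, as we shall see next.

2.2 Matching conditions

Let $(M^{\pm}_{0}, g^{(0)\pm})$ be n-dimensional spacetimes with non-null boundaries $\Sigma^{\pm}_{0}$. Matching them requires an identification of the boundaries, i.e. a pair of embeddings $\Phi_{\pm} : \Sigma_0 \to M^{\pm}_{0}$ with $\Phi_{\pm}(\Sigma_0) = \Sigma^{\pm}_{0}$, where $\Sigma_0$ is an abstract copy of any of the boundaries. Let $\xi^{i}$ ($i, j, \ldots = 1, \ldots, n-1$) be a coordinate system on $\Sigma_0$. Tangent vectors to $\Sigma^{\pm}_{0}$ are obtained by $e^{\pm\alpha}_{i} = \partial\Phi^{\alpha}_{\pm} / \partial\xi^{i}$ ($\alpha, \beta, \ldots = 0, \ldots, n-1$). There are also unique (up to orientation) unit normal vectors $n^{(0)\pm\alpha}$ to the boundaries. We choose them so that if $n^{(0)+\alpha}$ points towards $M^{+}$ then $n^{(0)-\alpha}$ points outside of $M^{-}$, or vice versa. The first and second fundamental forms are simply $q^{(0)\pm}_{ij} \equiv e^{\pm\alpha}_{i} e^{\pm\beta}_{j}\, g^{(0)\pm}_{\alpha\beta}$, $K^{(0)\pm}_{ij} = -n^{(0)\pm}_{\alpha}\, e^{\pm\beta}_{i} \nabla^{\pm}_{\beta} e^{\pm\alpha}_{j}$. The matching conditions (in the absence of shells) require the equality of the first and second fundamental forms on $\Sigma^{\pm}_{0}$, i.e. $q^{(0)+}_{ij} = q^{(0)-}_{ij}$, $K^{(0)+}_{ij} = K^{(0)-}_{ij}$. (1) Under a perturbation of the background metric $g^{\pm}_{pert} = g^{(0)\pm} + g^{(1)\pm}$ and of the matching hypersurfaces via $\vec{Z}^{\pm} = Q^{\pm}\vec{n}^{(0)\pm} + \vec{T}^{\pm}$, the matching conditions will be perturbatively satisfied if and only if [6] $q^{(1)+}_{ij} = q^{(1)-}_{ij}$, $K^{(1)+}_{ij} = K^{(1)-}_{ij}$, (2) where $q^{(1)\pm}_{ij} = \mathcal{L}_{\vec{T}^{\pm}} q^{(0)\pm}_{ij} + 2 Q^{\pm} K^{(0)\pm}_{ij} + e^{\pm\alpha}_{i} e^{\pm\beta}_{j}\, g^{(1)\pm}_{\alpha\beta}$, (3) $K^{(1)\pm}_{ij} = \mathcal{L}_{\vec{T}^{\pm}} K^{(0)\pm}_{ij} - \epsilon D_i D_j Q^{\pm} + Q^{\pm}\left(-n^{(0)\pm\mu} n^{(0)\pm\nu} R^{(0)\pm}_{\alpha\mu\beta\nu}\, e^{\pm\alpha}_{i} e^{\pm\beta}_{j} + K^{(0)\pm}_{il} K^{(0)\pm\,l}{}_{j}\right) + \frac{\epsilon}{2}\, g^{(1)\pm}_{\alpha\beta}\, n^{(0)\pm\alpha} n^{(0)\pm\beta} K^{(0)\pm}_{ij} - n^{(0)\pm}_{\mu} S^{(1)\pm\mu}{}_{\alpha\beta}\, e^{\pm\alpha}_{i} e^{\pm\beta}_{j}$, (4) where $\epsilon = n^{(0)}_{\alpha} n^{(0)\alpha}$, $D$ is the covariant derivative of $(\Sigma_0, q^{(0)\pm})$ and $S^{(1)\pm\alpha}{}_{\beta\gamma} \equiv \frac{1}{2}\left(\nabla^{\pm}_{\beta}\, g^{(1)\pm\alpha}{}_{\gamma} + \nabla^{\pm}_{\gamma}\, g^{(1)\pm\alpha}{}_{\beta} - \nabla^{\pm\alpha} g^{(1)\pm}_{\beta\gamma}\right)$. In these equations, $Q^{\pm}$ and $\vec{T}^{\pm}$ are a priori unknown quantities, and fulfilling the matching conditions requires showing that two vectors $\vec{Z}^{\pm}$ exist such that (2) are satisfied. The spacetime gauge freedom can be exploited to fix either or both vectors $\vec{Z}^{\pm}$ a priori, but this should be avoided (or at least carefully analysed) if additional spacetime gauge choices are made, in order not to restrict a priori the possible matchings.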
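To make the background objects entering (1) concrete, the following sketch evaluates the first and second fundamental forms for a simple assumed example: a sphere of radius $R$ embedded in flat three-dimensional Euclidean space, with intrinsic coordinates $\xi^{i} = (\theta, \phi)$ and outward unit normal. In Cartesian coordinates the covariant derivative reduces to a partial derivative, so $K_{ij} = -n_{\alpha}\,\partial_i \partial_j \Phi^{\alpha}$. Derivatives are taken by finite differences, and all function names are hypothetical.

```python
import math

def embed(R, xi):
    """Embedding Phi of the sphere of radius R; xi = (theta, phi)."""
    theta, phi = xi
    return [R * math.sin(theta) * math.cos(phi),
            R * math.sin(theta) * math.sin(phi),
            R * math.cos(theta)]

def shifted(xi, i, d):
    """Copy of xi with its i-th coordinate shifted by d."""
    out = list(xi)
    out[i] += d
    return out

def tangent(R, xi, i, h=1e-5):
    """Tangent vector e_i = dPhi/dxi^i by central differences."""
    p, m = embed(R, shifted(xi, i, h)), embed(R, shifted(xi, i, -h))
    return [(a - b) / (2 * h) for a, b in zip(p, m)]

def second_derivative(R, xi, i, j, h=1e-4):
    """Second derivative d^2 Phi / dxi^i dxi^j by central differences."""
    pp = embed(R, shifted(shifted(xi, i, h), j, h))
    pm = embed(R, shifted(shifted(xi, i, h), j, -h))
    mp = embed(R, shifted(shifted(xi, i, -h), j, h))
    mm = embed(R, shifted(shifted(xi, i, -h), j, -h))
    return [(a - b - c + d) / (4 * h * h) for a, b, c, d in zip(pp, pm, mp, mm)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fundamental_forms(R, xi):
    """Return (q_ij, K_ij) with q = e_i . e_j and K = -n . d_i d_j Phi."""
    n = [c / R for c in embed(R, xi)]          # outward unit normal
    e = [tangent(R, xi, i) for i in range(2)]
    q = [[dot(e[i], e[j]) for j in range(2)] for i in range(2)]
    K = [[-dot(n, second_derivative(R, xi, i, j)) for j in range(2)]
         for i in range(2)]
    return q, K

R = 2.0
q, K = fundamental_forms(R, (1.0, 0.5))
print(q)
print(K)
```

For the sphere one finds $K_{ij} = q_{ij}/R$ with this sign convention. In an actual matching, the same two tensors would be computed from each side and compared on $\Sigma_0$.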
Regarding the hypersurface gauge, this can be used to fix one of the vectors $\vec{T}^{+}$ or $\vec{T}^{-}$, but not both. As already stressed, the linearized matching conditions are by construction spacetime gauge invariant (in fact each of the tensors $q^{(1)\pm}_{ij}$, $K^{(1)\pm}_{ij}$ is). Moreover, the set of conditions (2) is hypersurface gauge invariant, provided the background is properly matched, since [6] under such a gauge transformation given by the vector $\vec{\zeta}$ on $\Sigma_0$, $q^{(1)}_{ij}$ transforms as $q^{(1)}_{ij} + \mathcal{L}_{\vec{\zeta}}\, q^{(0)}_{ij}$, and similarly for $K^{(1)}_{ij}$. [Footnote 1: We will slightly abuse the notation and refer to vectors on $\Sigma_0$ and their images on spacetime with the same symbol. The meaning should be clear from the context.]

3 On previous spacetime gauge invariant formalisms

The first attempt to derive a general formalism for the matching conditions in linearized gravity is, to our knowledge, due to Gerlach and Sengupta [2]. Their approach is based on the description of the matching hypersurface $\Sigma$ as a level set of a function $f$ defined on the spacetime. Assuming the level sets $\{f = \mathrm{const}\}$ to be timelike, a field of spacelike unit normals is defined as $n_{\mu} = (g^{\alpha\beta} f_{,\alpha} f_{,\beta})^{-1/2} f_{,\mu}$. The unperturbed matching conditions correspond to the continuity everywhere (in particular across $\Sigma$) of the tensors $q_{\alpha\beta} \equiv g_{\alpha\beta} - n_{\alpha} n_{\beta}$, $K_{\alpha\beta} \equiv q_{\alpha}{}^{\mu} q_{\beta}{}^{\nu} \nabla_{\mu} n_{\nu}$, (5) which are the spacetime versions of the first and second fundamental forms introduced above. Since $f$ is defined everywhere, it makes sense to perturb it in order to describe the variation of the matching hypersurface. Obviously, by perturbing $f$ one also perturbs $n_{\mu}$. The perturbed matching conditions proposed in [2] read $q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \Delta(q_{\alpha\beta})^{+} = q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \Delta(q_{\alpha\beta})^{-}$, $q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \Delta(K_{\alpha\beta})^{+} = q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \Delta(K_{\alpha\beta})^{-}$, (6) where $q^{\beta}{}_{\alpha}$ is the projector onto $\Sigma$, $\Delta$ stands for perturbation, and $+$ and $-$ denote the quantities as computed from either side of the matching hypersurface $\Sigma$. These expressions involve the projections of the perturbations of $q_{\alpha\beta}$ and $K_{\alpha\beta}$ onto $\Sigma$. The need of
considering only the projected components is justified in [2] since the matching conditions need to be intrinsic to the matching hypersurfaces. However, Gerlach and Sengupta themselves note that conditions (6) are not gauge$^2$ invariant. Since the main interest in [2, 3] lies in spherically symmetric backgrounds, this “ambiguity” is fixed in that case by finding suitable gauge invariant combinations of the linearized matching conditions, which turn out to give a correct set of necessary perturbed matching conditions in spherical symmetry. It should be stressed, however, that the authors also consider this gauge invariant subset to be sufficient, with no further justification.

We know from the discussion in Sect. 2.1 above that (6) cannot be correct, as it leads to a set of gauge dependent conditions. Since, on the other hand, the proposal (6) may look plausible, it is of interest to point out where, and in which sense, it fails to be correct. The first source of problems comes from assuming that the matched spacetime is given beforehand. Indeed, $q_{\alpha\beta}$ and $K_{\alpha\beta}$ are spacetime tensors and they can only exist (and be continuous) once the matched spacetime is constructed. But this is precisely the purpose of the matching conditions, so the conditions become circular. Another aspect of the same problem is that one can only talk about continuity once the pointwise identification of the boundaries is chosen. But a level set of a function defines only a set of points and not the way those points must be identified. A third instance of the same issue is that tensor components must be expressed in some basis, e.g. a common coordinate system covering both sides of $\Sigma$. But again this cannot be assumed a priori: it needs to be constructed. Let us mention, however, that once the pointwise identification of the boundaries is chosen, the use of spacetime tensors is allowed provided they are finally projected onto the hypersurface.
In that sense, and when properly used, spacetime indices may simplify some calculations notably (see Carter and Battye [5], where this notation is used to derive the perturbed matching conditions).

Besides this aspect (which already affects the background matching), the perturbed equations (6) suffer from one extra problem. The perturbations $\triangle(q_{\alpha\beta})(p)$ and $\triangle(K_{\alpha\beta})(p)$ at a point $p$ in the background can be defined by taking $\varepsilon$-derivatives at fixed $p$ and $\varepsilon = 0$ of the corresponding tensors (defined by $g_{\alpha\beta}(\varepsilon)$ and $f_{\varepsilon}$). For each value of $\varepsilon$, the matching conditions impose the continuity of $q_{\alpha\beta}(\varepsilon)$ and $K_{\alpha\beta}(\varepsilon)$ everywhere (with the caveat already mentioned regarding the identification of the boundaries). However, continuity of $\triangle(q_{\alpha\beta})$ and $\triangle(K_{\alpha\beta})$ at $p$ would only follow if derivatives of continuous functions with respect to an external parameter were necessarily continuous (in our case, the derivative with respect to $\varepsilon$), which is not true in general. A trivial example is given by the function $u(\varepsilon, x) = |x + \varepsilon|$, with $x \in \mathbb{R}$. For each $\varepsilon$ this function is continuous in $x$. However, the derivative with respect to $\varepsilon$ does not even exist at $x = 0$, $\varepsilon = 0$. This reflects the fact that subtracting continuous tensors at a fixed spacetime point $p$ leads to objects that need not be continuous. This is in fact the main problem of (6) as linearized matching conditions.

An immediate question arises: why is the gauge invariant subset of matching conditions found in [2, 3] for spherically symmetric backgrounds correct? In order to understand this, let us rewrite (6) using the formalism of section 2.2. First of all, since $\triangle(n_{\alpha} n_{\beta})$ will contain at least one free $n^{(0)}_{\alpha}$, we have
$$q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \triangle(q_{\alpha\beta})^{\pm} = q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta}\, g^{(1)\pm}_{\alpha\beta}. \tag{7}$$

Footnote 2: Throughout this section gauge will refer to spacetime gauge. Hypersurface gauges will only appear briefly towards the end of the section.

Moreover, a simple calculation gives $\triangle(\nabla_{\alpha} n_{\beta}) = \nabla_{\alpha}(\triangle n_{\beta}) - S^{(1)\mu}_{\alpha\beta}\, n^{(0)}_{\mu}$ and $\triangle(q^{\alpha}{}_{\beta}) = -g^{(1)\alpha\mu} n^{(0)}_{\mu} n^{(0)}_{\beta} + g^{(0)\alpha\mu} \triangle(n_{\mu})\, n^{(0)}_{\beta} + n^{(0)\alpha} \triangle(n_{\beta})$.
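These, together with standard properties of the projector, lead to
$$q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \triangle(K_{\alpha\beta})^{\pm} = \left( a^{(0)}_{\nu}\, q_{\mu}{}^{\alpha} \triangle(n_{\alpha}) + q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} \nabla_{\alpha}(\triangle n_{\beta}) - q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} S^{(1)\rho}_{\alpha\beta}\, n^{(0)}_{\rho} \right)^{\pm}, \tag{8}$$
where $a^{(0)}_{\nu} \equiv n^{(0)\alpha} \nabla_{\alpha} n^{(0)}_{\nu}$. In general, these expressions do not agree with (3) and (4). However, when the gauges are chosen so that $\vec Z^{\pm} = 0$, then $\triangle f \equiv 0$ on $\Sigma$ because the matching hypersurface is unperturbed as seen from the background. Consequently $\partial_{\alpha}(\triangle f) \propto n^{(0)}_{\alpha}$ on $\Sigma$, which implies $\triangle(n_{\alpha}) = h\, n^{(0)}_{\alpha}$ for some function $h$. Imposing $\vec n(\varepsilon)$ to be unit for all $\varepsilon$ fixes $h = \frac{\epsilon}{2}\, g^{(1)}_{\alpha\beta}\, n^{(0)\alpha} n^{(0)\beta}$. Inserting this into (8), the matching conditions (6) become
$$q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta}\, g^{(1)+}_{\alpha\beta} = q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta}\, g^{(1)-}_{\alpha\beta}, \qquad \left( \frac{\epsilon}{2}\, g^{(1)}_{\alpha\beta}\, n^{(0)\alpha} n^{(0)\beta} K_{\mu\nu} - q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} S^{(1)\rho}_{\alpha\beta}\, n^{(0)}_{\rho} \right)^{+} = \left( \frac{\epsilon}{2}\, g^{(1)}_{\alpha\beta}\, n^{(0)\alpha} n^{(0)\beta} K_{\mu\nu} - q_{\mu}{}^{\alpha} q_{\nu}{}^{\beta} S^{(1)\rho}_{\alpha\beta}\, n^{(0)}_{\rho} \right)^{-}, \tag{9}$$
which agree with (2) (with the exception that (9) refers to spacetime tensors and (2) are defined on $\Sigma$). Since Gerlach and Sengupta derive a subset of gauge invariant matching conditions out of (6) in the spherically symmetric case, and their conditions are correct in one gauge, it follows that the invariant subset is correct in any gauge. This is the reason why the results in [2, 3] involving spherically symmetric backgrounds turn out to be fine.

Substantial progress in the linearized matching problem was made by Martín-García and Gundlach [4]. These authors pointed out the lack of justification in [2, 3] for the choice of (6) as matching conditions. It was also argued that for spacetimes with boundary it only makes sense to define perturbations by using gauges where the perturbed matching hypersurface is mapped onto the background matching hypersurface. Perturbations in this gauge, called “surface gauge” (not to be confused with hypersurface gauge), are denoted by $\bar\triangle$, and the defining property of the gauge is $\bar\triangle f = 0$. The idea was to write down the matching conditions in this gauge and then transform into any other gauge if necessary. As noticed by the authors, the surface gauge is not unique since there are still three degrees of freedom left, which correspond to the three directions tangent to $\Sigma$.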
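The counterexample $u(\varepsilon, x) = |x + \varepsilon|$ used earlier in this section can be checked numerically. The following is a minimal sketch in Python; the names `u` and `one_sided_quotients` are illustrative only and do not belong to any of the formalisms discussed. It evaluates the one-sided difference quotients in $\varepsilon$ at $\varepsilon = 0$: at $x = 0$ they are $-1$ and $+1$, so the $\varepsilon$-derivative does not exist there, while for $x \neq 0$ both quotients agree with $\mathrm{sign}(x)$, which is discontinuous across $x = 0$.

```python
from typing import Tuple


def u(eps: float, x: float) -> float:
    """Example function u(eps, x) = |x + eps|; continuous in x for every eps."""
    return abs(x + eps)


def one_sided_quotients(x: float, h: float) -> Tuple[float, float]:
    """Left and right difference quotients of u with respect to eps at eps = 0.

    h is a small positive step; for x != 0 it should satisfy h < |x|.
    """
    right = (u(h, x) - u(0.0, x)) / h
    left = (u(-h, x) - u(0.0, x)) / (-h)
    return left, right


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.5):
        for h in (1e-1, 1e-3, 1e-6):
            left, right = one_sided_quotients(x, h)
            print(f"x = {x:+.1f}, h = {h:.0e}: left = {left:+.6f}, right = {right:+.6f}")
```

At $x = 0$ the two quotients stay at opposite values however small the step is, which is the numerical signature of the non-existence of the derivative claimed in the text.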
A relevant observation made in [4] was that the continuity of tensorial perturbations may depend on the index position in the tensors. The authors argue that the tensors truly intrinsic to the hypersurfaces are $q^{\alpha\beta}$, $K^{\alpha\beta}$ (with indices upstairs) and propose the following perturbed matching conditions
$$\bar\triangle(q^{\alpha\beta})^{+} = \bar\triangle(q^{\alpha\beta})^{-}, \qquad \bar\triangle(K^{\alpha\beta})^{+} = \bar\triangle(K^{\alpha\beta})^{-}, \tag{10}$$
which are demonstrated to become exactly (9). This shows the equivalence of both proposals in the surface gauge, as explicitly stated in [4], and partially justifies the validity of both approaches in that gauge. However, the justification is not complete because of the issue we discuss next.

Indeed, conditions (10) still carry one implicit assumption that needs to be clarified. As already stressed, the perturbed matching conditions have two inherent and independent kinds of gauge freedom. The approach by Martín-García and Gundlach involves only spacetime objects, and therefore can only notice the spacetime gauge freedom. This leads to an incorrect statement in [4], as it is not true that the linearized matching conditions read (10) in any surface gauge. Conditions (10) will only be valid when the spacetime gauge maps pairs of background points (identified via the background matching) to pairs of points on the perturbed boundaries $\Sigma^{\pm}_{\varepsilon}$ which are also identified through the matching. Notice that not all surface gauges have this property. In explicit terms, this means that the vectors $\vec Z^{\pm}$ must (i) only have tangential components (so that we are in surface gauge) and (ii) have the same components when written in terms of an intrinsic basis of $\Sigma_0$. In less precise, but more intuitive terms, condition (ii) states that $\vec Z^{+}$ and $\vec Z^{-}$ are the same vector, i.e. that the gauges in both regions are chosen such that the displacement of one fixed point of the background hypersurface is identical in both regions (the displaced point, of course, stays on the unperturbed hypersurface, due to the choice of surface gauge).
Observe finally that if $Q^{\pm} = 0$ and $\vec T^{+} = \vec T^{-}$, then the linearized matching conditions (2) indeed reduce to conditions (9), once the latter are projected on $\Sigma$. This shows the correctness of the approaches by Gerlach and Sengupta and by Martín-García and Gundlach in special gauges.

4 Freedom in matching due to symmetries

We devote this section to the study of the consequences of the existence of background symmetries on perturbed spacetime matchings. The existence of symmetries in the background configuration introduces two issues which are important to take into consideration. The first corresponds to the freedom introduced by the matching procedure, when preserving the symmetries, at the background level [9] (cf. [10] for an application). The second corresponds to the consequences that the symmetries in the background configuration may have on the perturbation of the matching.

It must be stressed here that the arbitrariness introduced by the presence of symmetries in the background configuration is completely independent from both the hypersurface and spacetime gauge freedoms. However, that arbitrariness is gauge dependent and therefore a gauge choice can be made to remove it. As we will show, an isometry in the background implies that there is a direction along which the difference $[\vec T] \equiv \vec T^{+} - \vec T^{-}$ cannot be determined by the perturbed matching equations. But, as we have discussed at the end of section 2, one could eventually use part of the spacetime gauges (if there is any freedom left) to fix $[\vec T]$. Note, finally, that a change of hypersurface gauge leaves $[\vec T]$ invariant.

4.1 Isometries

We shall now consider the presence of isometries in the background configuration. Let us assume that one of the sides, say $(M^{+}_0, g^{(0)+})$, admits an isometry generated by the Killing vector field $\vec\xi^{+}$ tangent to the boundary $\Sigma^{+}_0$.
The commutation of the Lie derivative and the pull-back implies [9]
$$\mathcal{L}_{\vec\xi^{+}}\, q^{(0)+}_{ij} = e^{+\alpha}_i e^{+\beta}_j\, \mathcal{L}_{\vec\xi^{+}}\, g^{(0)+}_{\alpha\beta}\big|_{\Sigma_0} = 0,$$
which means that $\vec\xi^{+}$ is a Killing vector of $(\Sigma_0, q^{(0)+})$. This implies from expression (3) that $q^{(1)+}_{ij}$ is invariant under the transformation $\vec T^{+} \to \vec T^{+} + \vec\xi^{+}|_{\Sigma_0}$.

As for $K^{(1)+}_{ij}$, from its expression (4) it is again clear that the previous transformations of $\vec T^{+}$ will leave $K^{(1)+}_{ij}$ invariant provided $\mathcal{L}_{\vec\xi^{+}} K^{(0)+}_{ij} = 0$. But this is precisely the case since $\vec\xi^{+}$ is a Killing vector orthogonal to $n^{(0)+}$, which implies $\mathcal{L}_{\vec\xi^{+}}\, n^{(0)+}_{\beta}\big|_{\Sigma^{+}_0} = 0$, and hence
$$\mathcal{L}_{\vec\xi^{+}} K^{(0)+}_{ij} = e^{+\alpha}_i e^{+\beta}_j\, \mathcal{L}_{\vec\xi^{+}} (\nabla n^{(0)+})_{\alpha\beta}\big|_{\Sigma_0} = e^{+\alpha}_i e^{+\beta}_j\, \nabla_{\alpha} \mathcal{L}_{\vec\xi^{+}}\, n^{(0)+}_{\beta}\big|_{\Sigma_0} = 0.$$
Of course, all this discussion also applies to the $(-)$ side.

The combination of the invariance of $q^{(1)\pm}_{ij}$ and $K^{(1)\pm}_{ij}$ leads to the fact that the first order perturbed matching conditions are invariant under a change of the vectors $\vec T^{\pm}$ along the direction of any isometry of the background configuration (preserved by the matching). Then, as expected, when symmetries are present the linearized matching conditions cannot determine the difference $[\vec T]$ completely: they leave undetermined the relative (between the two sides) deformation of the hypersurface along the direction of the symmetry. Note that, still, the shape of the perturbed hypersurface is completely determined, since that is driven by $Q^{\pm}$.

The overall picture is as follows: at the background level we have the arbitrariness of the identification of $\Sigma^{+}_0$ with $\Sigma^{-}_0$ [9], which can be seen as a “sliding” between $\Sigma^{+}_0$ and $\Sigma^{-}_0$. The perturbation adds to this an arbitrary shift of the deformation of the matching hypersurface at each side along the orbits of the isometry group.

As an example, in the description of stationary and axisymmetric compact bodies discussed in [10, 9], the background sliding corresponds to an arbitrary constant rotation of the interior with respect to the exterior. Note that, in that case, this rotation is only relevant because the exterior is taken to be asymptotically flat.
As a result, two identical interiors can, in principle, give rise to two exteriors that differ by a constant rate rotation [10]. The shift of the surface deformation would, in principle, lead to an arbitrary constant rotation along the axial coordinate of the surface deformation of the body. Likewise, two identical perturbations in the interior of the body may produce two different perturbations in the exterior, which may differ by a relative constant rate rotation. A choice of spacetime gauge could be used to relate the deformations inside and outside. However, this may interfere with other gauge fixings that may have been made.

5 n-dimensional spherically symmetric backgrounds

In this section we shall revisit Mukohyama’s theory for linearized matching in the special case of spherical symmetry. Similar results [6] hold for backgrounds admitting isometry groups of dimension $(n-1)(n-2)/2$ acting on non-null codimension-two orbits of arbitrary topology (strictly speaking the orbits need to be compact).

5.1 The approach of Mukohyama

Concentrating on one of the two spacetimes to be matched, either $+$ or $-$, we consider a spherically symmetric background metric of the form
$$g^{(0)}_{\alpha\beta}\, dx^{\alpha} dx^{\beta} = \gamma_{ab}\, dx^{a} dx^{b} + r^{2}\, \Omega_{AB}\, d\theta^{A} d\theta^{B}, \tag{11}$$
where $\gamma_{ab}$ ($a, b, \ldots = 0, 1$) is a Lorentzian two-dimensional metric (depending only on $\{x^{a}\}$), $r > 0$ is a function of $\{x^{a}\}$, and $\Omega_{AB}\, d\theta^{A} d\theta^{B}$ is the $(n-2)$-dimensional unit sphere metric with coordinates $\{\theta^{A}\}$ ($A, B, \ldots = 2, 3, \ldots, n-1$). A general spherically symmetric background hypersurface can be given in parametric form as
$$\Sigma_0 := \{x^{0} = Z^{(0)0}(\lambda),\ x^{1} = Z^{(0)1}(\lambda),\ \theta^{A} = \vartheta^{A}\}, \tag{12}$$
where $\{\xi^{i}\} = \{\lambda, \vartheta^{A}\}$ is a coordinate system in $\Sigma_0$ adapted to the spherical symmetry. The tangent vectors to $\Sigma_0$ read
$$\vec e_{\lambda} = \dot Z^{(0)0}\, \partial_{x^{0}} + \dot Z^{(0)1}\, \partial_{x^{1}}, \qquad \vec e_{\vartheta^{A}} = \partial_{\theta^{A}}\big|_{\Sigma_0}, \tag{13}$$
where the dot denotes the derivative with respect to $\lambda$.
With $N^{2} \equiv -\epsilon\, e^{a}_{\lambda} e^{b}_{\lambda} \gamma_{ab}|_{\Sigma_0}$, so that $\epsilon = 1$ ($\epsilon = -1$) corresponds to a timelike (spacelike) hypersurface, the unit normal to $\Sigma_0$ reads
$$\boldsymbol{n}^{(0)} = \frac{\sqrt{-\det\gamma}}{N} \left( -\dot Z^{(0)1}\, dx^{0} + \dot Z^{(0)0}\, dx^{1} \right)\Big|_{\Sigma_0}, \tag{14}$$
where the sign choice of $N$ corresponds to the choice of orientation of the normal. The background induced metric and second fundamental form on $\Sigma_0$ read
$$q^{(0)}_{ij}\, d\xi^{i} d\xi^{j} = -\epsilon N^{2} d\lambda^{2} + r^{2}|_{\Sigma_0}\, \Omega_{AB}|_{\Sigma_0}\, d\vartheta^{A} d\vartheta^{B}, \tag{15}$$
$$K^{(0)}_{ij}\, d\xi^{i} d\xi^{j} = N^{2} K\, d\lambda^{2} + r^{2} \bar K|_{\Sigma_0}\, \Omega_{AB}|_{\Sigma_0}\, d\vartheta^{A} d\vartheta^{B}, \tag{16}$$
where $K \equiv N^{-2} e^{a}_{\lambda} e^{b}_{\lambda} \nabla_{a} n^{(0)}_{b}$ and $\bar K = n^{(0)a} \partial_{x^{a}} \ln r$. It follows that the background matching conditions (1) are
$$N^{2}_{+} = N^{2}_{-}, \qquad r^{2}_{+}|_{\Sigma_0} = r^{2}_{-}|_{\Sigma_0}, \qquad K_{+} = K_{-}, \qquad \bar K_{+} = \bar K_{-}. \tag{17}$$
Using (3) and (4) we could now compute the first order perturbations $q^{(1)}_{ij}$ and $K^{(1)}_{ij}$ in terms of the above quantities and $\vec Z$ (or equivalently $Q$ and $\vec T$), cf. Eqs. (45) and (46) in [6].

Let us recall (see subsection 2.2) that while the individual tensors $q^{(1)}_{ij}$ and $K^{(1)}_{ij}$ are not hypersurface gauge invariant, their respective differences from the $+$ and $-$ sides (i.e. the linearized matching conditions) are. Those tensors depend on the hypersurface gauge through the tangent vectors $\vec T^{+}$ and $\vec T^{-}$, which under a gauge change transform simply by adding the gauge vector. It follows that only their difference $[\vec T]$ can appear in the linearized matching conditions. Consequently there are three degrees of freedom that cannot be fixed by the equations, but can be fixed by choosing the hypersurface gauge, for instance to set $\vec T^{+}$. Thus, the linearized matching conditions can be regarded as equations for the difference $[\vec T]$ as well as for $Q^{+}$ and $Q^{-}$, i.e. for five objects. If these equations admit solutions, then the linearized matching is possible, and it is impossible otherwise.

Mukohyama emphasizes the convenience of looking for doubly gauge invariant quantities to write down the linearized matching conditions. However, the matching conditions are already gauge invariant (both for the spacetime and hypersurface gauges).
Looking for gauge invariant combinations on each side amounts to writing equations where the difference vector $[\vec T]$ simply drops out. Indeed, in many cases, knowing the value of this vector in a specific matching is not interesting. In that sense, using doubly gauge invariant quantities is useful as it lowers the number of equations to analyse. However, we want to stress that this is not related to obtaining gauge invariant linearized matching equations. It is just related to not solving for superfluous information. In fact, a set of equations where $Q^{+}$ and $Q^{-}$ have also disappeared would be even more convenient from this point of view, provided one is not interested in knowing how the hypersurfaces are deformed in the specific spacetime gauge being used.

Since doubly gauge invariant matching conditions are used extensively, let us recall their main ingredients in order to discuss whether they really are equivalent to the full set of linearized matching equations, and in which sense. To that aim Mukohyama [6] decomposes the perturbation tensors $q^{(1)}_{ij}$ and $K^{(1)}_{ij}$ in terms of scalar $Y$, vector $V_{A}$ and tensor $T_{AB}$ harmonics on the sphere, as
$$q^{(1)}_{ij}\, d\xi^{i} d\xi^{j} = \sum_{l \geq 0} \left( \sigma_{00} Y\, d\lambda^{2} + \sigma_{(Y)} T_{(Y)AB}\, d\vartheta^{A} d\vartheta^{B} \right) + \sum_{l \geq 1} 2 \left( \sigma_{(T)0} V_{(T)A} + \sigma_{(L)0} V_{(L)A} \right) d\lambda\, d\vartheta^{A} + \sum_{l \geq 2} \left( \sigma_{(T)} T_{(T)AB} + \sigma_{(LT)} T_{(LT)AB} + \sigma_{(LL)} T_{(LL)AB} \right) d\vartheta^{A} d\vartheta^{B}, \tag{18}$$
$$K^{(1)}_{ij}\, d\xi^{i} d\xi^{j} = \sum_{l \geq 0} \left( \kappa_{00} Y\, d\lambda^{2} + \kappa_{(Y)} T_{(Y)AB}\, d\vartheta^{A} d\vartheta^{B} \right) + \sum_{l \geq 1} 2 \left( \kappa_{(T)0} V_{(T)A} + \kappa_{(L)0} V_{(L)A} \right) d\lambda\, d\vartheta^{A} + \sum_{l \geq 2} \left( \kappa_{(T)} T_{(T)AB} + \kappa_{(LT)} T_{(LT)AB} + \kappa_{(LL)} T_{(LL)AB} \right) d\vartheta^{A} d\vartheta^{B}, \tag{19}$$
where all the scalar coefficients depend only on $\lambda$. Each coefficient in the decomposition has indices $l$ and $m$ which have been dropped for notational simplicity. Notice that each coefficient $\sigma$ and $\kappa$ is defined in the range of $l$’s appearing in the corresponding sum. By construction, each of the $\sigma$ and $\kappa$ is spacetime-gauge invariant (but not hypersurface-gauge invariant). For $l \geq 2$ they can even be written down [6] explicitly in terms of spacetime-gauge invariant quantities.
In a similar way, the doubly gauge- invariant quantities presented in [6], are only defined for l ≥ 2 (except k(T )0, which is also defined for l = 1), and read l ≥ 2 : f00 ≡ σ00 − 2N∂λ l ≥ 2 : f ≡ σ(Y ) + ǫN−2χ∂λ r2|Σ0 k2l σ(LL), l ≥ 2 : f0 ≡ σ(T )0 − r2|Σ0∂λ r−2|Σ0σ(LT ) l ≥ 2 : f(T ) ≡ σ(T ), l ≥ 2 : k00 ≡ κ00 + ǫKσ00 + ǫχ∂λK, l ≥ 1 : k(T )0 ≡ κ(T )0 − K̄σ(T )0, (20) l ≥ 2 : k(L)0 ≡ κ(L)0 + (ǫK − K̄)σ(L)0 + (ǫK + K̄) χ− r2|Σ0∂λ(r−2|Σ0σ(LL)) l ≥ 2 : k(LT ) ≡ κ(LT ) − K̄σ(LT ), l ≥ 2 : k(LL) ≡ κ(LL) − K̄σ(LL), l ≥ 2 : k(Y ) ≡ κ(Y ) − K̄σ(Y ) + ǫN−2r2|Σ0χ∂λK̄, l ≥ 2 : k(T ) ≡ κ(T ) − K̄σ(T ), 3The ranges of l’s are not made explicit in [6] in order to include also non-compact homogeneous spaces, where the index l is continuous. However, to discuss sufficiency of the equations we need to be precise on the range of validity of each equation. where k2l = l(l + n− 3) and l ≥ 2 : χ ≡ σ(L)0 − r2|Σ0∂λ(r−2|Σ0σ(LL)). The orthogonality properties of the scalar, vector and tensor harmonics imply that the equalities of the coefficients σ and κ for each l and m is equivalent to the equality of the perturbation tensors (18) and (19) at both sides of Σ0. Thus, recalling the notation [f ] ≡ f+|Σ0 − f−|Σ0 , the equations l ≥ 0 : [σ00] = [σ(Y )] = 0 l ≥ 1 : [σ(L)0] = [σ(T )0] = 0 l ≥ 2 : [σ(T )] = [σ(LT )] = [σ(LL)] = 0 l ≥ 0 : [κ00] = [κ(Y )] = 0 l ≥ 1 : [κ(L)0] = [κ(T )0] = 0 l ≥ 2 : [κ(T )] = [κ(LT )] = [κ(LL)] = 0 are equivalent to (2) and therefore correspond exactly to the linearized matching condi- tions in this setting. Notice that each of the equalities in (21) and (22) is in fact one equation for each l and m in the appropriate range. We will however refer to them simply as equations. The full linearized matching conditions obviously imply the following equalities in terms of the doubly-gauge invariant quantities (20), l ≥ 2 : [f00] = [f ] = [f0] = [f(T )] = 0 (23) l ≥ 1 : [k(T )0] = 0 l ≥ 2 : [k00] = [k(Y )] = [k(L)0] = [k(LL)] = [k(LT )] = [k(T )] = 0. 
Whether these equations can be regarded as the full set of linearized matching conditions or not requires studying their sufficiency, i.e. whether they imply (21)-(22) or not. This point was not mentioned in [6] and in fact the answer turns out to be negative, although in a mild way, as we discuss in the next subsection.

5.2 On the sufficiency of the continuity of the doubly-gauge invariants

Let us recall that fulfilling the matching conditions requires finding two vectors $\vec Z^{\pm}$ such that (21)-(22) are satisfied. The key issue for the matching is therefore to show the existence of deformation vectors $\vec Z^{\pm}$ so that all the equations hold. A plausibility argument in favour of the sufficiency of (23)-(24) comes from simple equation counting. Indeed, as already discussed, the linearized matching conditions are spacetime and hypersurface gauge invariant and therefore can only involve the difference vector $[\vec T]$, i.e. three quantities. Since constructing doubly gauge invariant quantities on each side eliminates this vector, the number of equations should be reduced exactly by three if they are to remain equivalent to the original set. This is precisely what happens as we go from the original fourteen equations in (21)-(22) down to eleven equations in (23)-(24). This argument, however, is not conclusive, both because it is not rigorous and because each equation in those expressions is, in fact, a set of equations depending on $l$ and $m$, and the range of $l$’s changes with the equations. Let us therefore analyse this issue in detail. In particular we need to discuss the consequences of the non-existence of doubly gauge-invariant variables for $l = 0$ and $l = 1$ (except for $k_{(T)0}$, which exists for $l = 1$), something not mentioned in [6].

Let us start by finding explicit expressions for the $\sigma$’s valid in the whole range of $l$’s.
As in [6], we decompose g(1) in harmonics as g(1)αβdx αdxβ = (habY dx adxb + h(Y )T(Y )ABdθ AdθB) 2(h(T )aV(T )A + h(L)aV(L)A)dx (h(T )T(T )AB + h(LT )T(LT )AB + h(LL)T(LL)AB)dθ AdθB, (25) and ~Z as zaY dx (z(T )V(T )A + z(L)V(L)A)dθ QY n(0) − ǫN−2zλY eλ (z(T )V(T )A + z(L)V(L)A)dθ A, (26) which implies Tαdx l=0(−ǫN−2zλY eλ) + l=1(z(T )V(T )A + z(L)V(L)A)dθ A. Inserting these expressions into (2) and expanding in spherical harmonics it is straightforward to l ≥ 0 : [σ00] = 0 ⇔ [hλλ] + 2[Q]N2K + 2N∂λ N−1[zλ] l ≥ 1 : [σ(L)0] = 0 ⇔ [zλ] + [h(L)λ] + r2|Σ0∂λ(r−2|Σ0[z(L)]) = 0, (27) l ≥ 2 : [σ(LL)] = 0 ⇔ [z(L)] + [h(LL)] = 0, (28) l ≥ 0 : [σ(Y )] = 0 ⇔ [h(Y )] + 2[Q]r2|Σ0K̄ − ǫN−2[zλ]∂λ(r2|Σ0)− k2l [z(L)] = 0, l ≥ 1 : [σ(T )0] = 0 ⇔ [h(T )λ] + r2|Σ0∂λ(r−2|Σ0[z(T )]) = 0, l ≥ 2 : [σ(LT )] = 0 ⇔ [z(T )] + [h(LT )] = 0, (29) l ≥ 2 : [σ(T )] = 0 ⇔ [h(T )] = 0, where [hλλ], [h(L)λ], etc. denote eλ b[hab], eλ a[h(L)a], etc. Later on we will also write down the explicit expressions for (22) but they are not needed in this subsection. It is obvious by the form of f ’s and κ’s (20) that the set of equations (21)-(22) are equivalent to (23)-(24) together with l ≥ 2 : [σ(L)0] = [σ(LT )] = [σ(LL)] = 0 (30) l = 0, 1 : [σ00] = [σ(Y )] = 0 l = 1 : [σ(L)0] = [σ(T )0] = 0 l = 0, 1 : [κ00] = [κ(Y )] = 0 l = 1 : [κ(L)0] = 0. Sufficiency of Mukohyama’s doubly gauge invariant matching conditions would follow if these equations serve exclusively to determine the discontinuity [~T ], i.e. [zλ] for l ≥ 0 and [z(T )], [z(L)] for l ≥ 1. Now, the explicit expressions (27), (29), (28) show that (30) determine uniquely [zλ], [z(T )] and [z(L)] for l ≥ 2. So, restricted to the sector l ≥ 2 Mukoyama’s doubly gauge invariant matching conditions can be regarded as equivalent to the full set of matching conditions. Taking all l’s into account, however, the equations turn out not to be sufficient. 
To show this, it is enough to display one equation involving the discontinuity of the background metric perturbations and $[Q]$ (but not $[\vec T]$) which holds as a consequence of the full set of matching conditions (21)-(22) but not as a consequence of (23)-(24). Using the fact that each $l = 1$ expression refers to $n - 1$ objects (one for each $m$), the number of equations in (31)-(32) is $7n - 3$, while the number of unknowns in $[\vec T]$ not yet determined by (30), i.e. $[z_{\lambda}]$ for $l = 0, 1$ and $[z_{(T)}]$, $[z_{(L)}]$ for $l = 1$, is $3n - 2$, which is smaller. It is to be expected, therefore, that (31), (32) imply conditions where these variables do not appear. This can be made explicit, for instance, by combining $[\sigma_{00}]_{l=0} = 0$ with $[\sigma_{(Y)}]_{l=0} = 0$, which yields
$$l = 0 : \quad [h_{\lambda\lambda}] + 2[Q] N^{2} K + 2 \epsilon N \partial_{\lambda} \left( \frac{N}{\partial_{\lambda}(r^{2}|_{\Sigma_0})} \left( [h_{(Y)}] + 2[Q]\, r^{2}|_{\Sigma_0} \bar K \right) \right) = 0$$
whenever $\partial_{\lambda}(r^{2}|_{\Sigma_0}) \neq 0$. (If $\partial_{\lambda}(r^{2}|_{\Sigma_0}) = 0$ it is enough to consider $[\sigma_{(Y)}]_{l=0} = 0$.) This relation is enough to show that the continuity of the doubly-gauge invariant variables of Mukohyama is not sufficient to ensure the existence of the perturbed matching. Of course, this does not invalidate Mukohyama’s approach in any way, which remains interesting and useful. It only means that, when using this approach to solve linearized matchings, one still needs to look more carefully into the $l = 0$ and $l = 1$ sectors to make sure that the remaining equations (31) and (32) hold.

On the other hand, equations (31), (32) do not completely determine $[\vec T]$. The variable $[z_{(T)}]_{l=1}$ only appears in $[\sigma_{(T)0}]_{l=1} = 0$, in the term $\partial_{\lambda}(r^{-2}|_{\Sigma_0} [z_{(T)}])$. As a result, the matching conditions do not fix $[z_{(T)}]_{l=1}$ completely, but only up to the addition of a constant multiple of $r^{2}|_{\Sigma_0}$ (for each $m$). Recalling that $V_{(T)A}\, d\vartheta^{A}$ for $l = 1$ correspond to the three Killing vectors on the sphere, this arbitrary constant (for each $m$) accounts for the addition to $[\vec T]$ of an arbitrary Killing vector of the sphere. This is in accordance with the discussion in Section 4.
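The two counting arguments used in this subsection (fourteen versus eleven equations, and $7n - 3$ equations versus $3n - 2$ unknowns in the $l = 0, 1$ sector) can be reproduced mechanically. The following Python sketch is only a bookkeeping aid: the list names and the function `count_low_l_sector` are hypothetical labels introduced here, and the multiplicity $n - 1$ of the $l = 1$ harmonics is taken from the statement in the text rather than derived.

```python
from typing import Tuple

# Coefficients whose jumps appear in (21)-(22): seven sigma's and seven kappa's.
FULL_CONDITIONS = [
    "sigma_00", "sigma_Y", "sigma_L0", "sigma_T0", "sigma_T", "sigma_LT", "sigma_LL",
    "kappa_00", "kappa_Y", "kappa_L0", "kappa_T0", "kappa_T", "kappa_LT", "kappa_LL",
]

# Doubly gauge invariant quantities whose jumps appear in (23)-(24).
DOUBLY_INVARIANT_CONDITIONS = [
    "f_00", "f", "f_0", "f_T",
    "k_T0", "k_00", "k_Y", "k_L0", "k_LL", "k_LT", "k_T",
]


def count_low_l_sector(n: int) -> Tuple[int, int]:
    """Return (equations in (31)-(32), unknowns of [T] left undetermined by (30))."""
    m0 = 1       # a single harmonic for l = 0
    m1 = n - 1   # number of l = 1 harmonics, as stated in the text
    equations = (
        2 * (m0 + m1)    # [sigma_00] = [sigma_(Y)] = 0 for l = 0, 1
        + 2 * m1         # [sigma_(L)0] = [sigma_(T)0] = 0 for l = 1
        + 2 * (m0 + m1)  # [kappa_00] = [kappa_(Y)] = 0 for l = 0, 1
        + 1 * m1         # [kappa_(L)0] = 0 for l = 1
    )
    unknowns = (
        (m0 + m1)        # [z_lambda] for l = 0, 1
        + 2 * m1         # [z_(T)] and [z_(L)] for l = 1
    )
    return equations, unknowns


if __name__ == "__main__":
    print(len(FULL_CONDITIONS), len(DOUBLY_INVARIANT_CONDITIONS))
    for n in (4, 5, 6):
        print(n, count_low_l_sector(n))
```

For every $n$ the function returns the pair $(7n - 3,\ 3n - 2)$, so the excess of equations over unknowns is $4n - 1$, which is why conditions free of $[\vec T]$ are to be expected.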
We devote the following subsection to completing the study of the freedom left in the matching.

5.3 Freedom in the matching

As already emphasized, solving the linearized matching amounts to finding perturbation vectors $\vec Z^{+}$ and $\vec Z^{-}$. Assume now that a linearized matching between two given backgrounds and perturbations has been done. It is natural to ask what is the most general matching between those two spaces, i.e. what is the most general solution for $\vec Z^{+}$ and $\vec Z^{-}$ of the matching conditions. Geometrically, this means finding all the possible deformations of the matching hypersurface $\Sigma_0$ which allow the two spaces to be matched. This problem is of interest not only when the full matching conditions are imposed but also in situations where layers of matter are present (e.g. in brane-world or shell cosmologies), so that jumps in the second fundamental forms are allowed. We will therefore analyse this issue in two steps. First, we will study the equations involving the perturbed first fundamental forms and determine the freedom they admit. In a second step we will write down the extra conditions coming from the equality of the second fundamental forms.

Thus, let us consider two perturbation configurations of the same background matching and denote their respective sets of difference variables on $\Sigma_0$ as $[f]$ and $[f]'$ for any given variable $f$. We define the difference between the two configurations as $\langle f \rangle \equiv [f]' - [f]$ for any variable $f$. The assumption that the perturbation on each side is fixed once and for all implies $\langle g^{(1)} \rangle = 0$. We are assuming that the linearized matching conditions are satisfied in each case, and so we can subtract them. Linearity implies that the differences of the linearized matching equations become equations for the difference vector $\langle \vec Z \rangle$. The general solution of these equations clearly determines the freedom in the deformation of the hypersurface.
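As an illustration of this subtraction, consider the $l \geq 2$ condition $[\sigma_{(LT)}] = 0$, which by (29) is equivalent to $[z_{(T)}] + [h_{(LT)}] = 0$. The worked step below is a sketch that uses only the definitions already introduced; primed brackets refer to the second configuration.

```latex
\begin{align*}
[z_{(T)}] + [h_{(LT)}] &= 0, \\
[z_{(T)}]' + [h_{(LT)}]' &= 0, \\
\Longrightarrow \quad \langle z_{(T)} \rangle + \langle h_{(LT)} \rangle &= 0 .
\end{align*}
```

Since $\langle g^{(1)} \rangle = 0$ implies $\langle h_{(LT)} \rangle = 0$, the difference reduces to $\langle z_{(T)} \rangle = 0$ for $l \geq 2$. The remaining conditions are treated in the same way, each metric-perturbation term dropping out of the difference.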
The difference of the equations in (21) for the two configurations using < g(1) >= 0 give the following set of equations l ≥ 0 : < σ00 >= 0 → < Q > N2K +N∂λ N−1 < zλ > = 0, (33) l ≥ 1 : < σ(L)0 >= 0 → < zλ > +r2|Σ0∂λ(r−2|Σ0 < z(L) >) = 0, (34) l ≥ 2 : < σ(LL) >= 0 → < z(L) >= 0, (35) l ≥ 0 : < σ(Y ) >= 0 → 2 < Q > r2|Σ0K̄ − ǫN−2 < zλ > ∂λ(r2|Σ0) k2l < z(L) >= 0, (36) l ≥ 1 : < σ(T )0 >= 0 → ∂λ(r−2|Σ0 < z(T ) >) = 0, (37) l ≥ 2 : < σ(LT ) >= 0 → < z(T ) >= 0, (38) l ≥ 2 : < σ(T ) >= 0 → 0 = 0. Expressions (35) and (38) readily determine < z(L) >l≥2=< z(T ) >l≥2= 0, which sub- stituted in (34) give < zλ >l≥2= 0. As a result, (36) for l ≥ 2 lead to < Q >l≥2= 0. Clearly all the equations for l ≥ 2 are now satisfied. We now concentrate on the l = 1 equations. Equation (37) implies that < z(T ) >l=1= ar 2|Σ0, where a is a constant for each m. Combining equations (33), (34) and (36) for l = 1 we obtain the following equation for r−2|Σ0 < z(L) >l=1, K̄∂2λ(r−2|Σ0 < z(L) >l=1) + (2K̄ + ǫK)∂λ(ln r|Σ0)− K̄∂λ lnN −2|Σ0 < z(L) >l=1) r−2|Σ0 < z(L) >l=1= 0, (39) while (34) and (33) determine < zλ >l=1 and < Q >l=1 respectively (provided K 6= 0, which occurs generically). The two equations for l = 0 can be rearranged onto K̄∂λ(N−1 < zλ >l=0) +N−1 < zλ >l=0 ǫK∂λ ln(r|Σ0) = 0 (40) plus the equation (33) for l = 0, which determines < Q >l=0. Summarizing, we have found that the freedom in the deformation of the hypersurface compatible with the linearized matching conditions involving the first fundamental form [~Z]′ − [~Z] = < Q > Y ~n(0) − ǫN−2 < zλ > Y ~eλ +am~V (T ) + r −2|Σ0 < z(L) >l=1,m ~V m(L), where r−2|Σ0 < z(L) >l=1,m, satisfy (39), < zλ >l=0 satisfy (40) and the rest of the variables are completely determined as described above. The term in am corresponds to adding Killing vectors on the sphere, something already discussed in Section 4. The rest of terms involve combinations (with functions) of the conformal Killing vectors on the sphere and tangential vectors along λ. 
Notice that the coefficients of the conformal Killing (i.e. < z(L) >l=1,m ) determine all the rest of the l = 1 coefficients. In particular when < z(L) >l=1,m vanishes, then all the l = 1 terms vanish and the freedom becomes radially symmetric. We now add to the analysis the difference of the equations in (22). Due to the fact that all coefficients in < ~Z > vanish for l ≥ 2 we only need to consider the equations for l = 0, 1, i.e. (32). We refer the reader to Appendix A for the explicit expressions of (32) in terms of the metric perturbations and ~Z. For the sake of completeness we also include all the explicit expressions of (22) in Appendix A. The difference of equations (32), see (44)-(46), whenever < g(1) >= 0 read l = 0, 1 : < κ00 >= 0 ⇔ (41) − < QR(γ)dbac > n (0)dn(0)aeλ c − ǫ∂2λ < Q > + 2∂λ < Q > −ǫ < Q > K2N2 −ǫKN2∂λ(N−2 < zλ >)− ǫ∂λ(K < zλ >) = 0, l = 1 : < κ(L)0 >= 0 ⇔ (42) −ǫ∂λ < Q > +ǫK < zλ > +ǫ < Q > ∂λ ln(r|Σ0) + r2|Σ0K̄∂λ(r−2|Σ0 < z(L) >) = 0 l = 0, 1 : < κ(Y ) >= 0 ⇔ (43) N−2∂λ(r 2|Σ0) (∂λ < Q > +K < zλ >) + < Qn(0)an(0)b∇a∇br2 > N−2eλ a < zλn (0)b∇b∇ar2 > + l(l + n− 3) ǫ < Q > −2K̄ < z(L) > It can be checked that in general these equations overdetermine the previous equations, i.e. (39) and (40), although there may be particular cases for which they are compatible. Therefore, generically, they will imply that < z(L) >l=1,m= 0 and < zλ >l=0= 0, and thence all the rest of the variables vanish, < zλ >l=1,m=< Q >l=1,m= 0, < zλ >l=0=< Q >l=0= 0, so that the only freedom left is given by [~Z]′ − [~Z] = am~V m(T ). Finding in which particular cases equations (39)-(43) are compatible is straightforward but tedious and will not be carried out explicitly here. Acknowledgements FM and MM thank CRUP(Portugal)/MCT(Spain) for grant E-113/04. FM thanks FCT (Portugal) for grant SFRH/BPD/12137/2003 and CMAT, University of Minho, for sup- port. 
MM was supported by the projects FIS2006-05319 of the Spanish Ministerio de Educación y Tecnoloǵıa and SA010CO of the Junta de Castilla y León. RV was supported by the Irish IRCSET, Ref. PD/2002/108, and now is funded by the Basque Government Ref. BFI05.335. A Appendix For the sake of completeness we devote this appendix to present the explicit expressions of (22) in terms of the metric perturbations and ~Z, which read l ≥ 0 : [κ00] = 0 ⇔ (44) N2K[hnn]− n(0)aeλ c(2∇c[hab]−∇a[hbc])− [QR(γ)dbac]n (0)dn(0)aeλ −ǫ∂2λ[Q] + 2∂λ[Q]− ǫ[Q]K2N2 − ǫKN2∂λ(N−2[zλ])− ǫ∂λ(K[zλ]) = 0, l ≥ 1 : [κ(L)0] = 0 ⇔ (45) [hnλ]− n(0)aeλ b(∂b[h(L)a]− ∂a[h(L)b])− ǫ∂λ[Q]− ǫK[zλ] +(ǫ[Q] + [h(L)n])∂λ ln(r|Σ0) + r2|Σ0K̄∂λ(r−2|Σ0[z(L)]) = 0 l ≥ 0 : [κ(Y )] = 0 ⇔ (46) r2|Σ0K̄[hnn] + N−2∂λ(r 2|Σ0) (ǫ[hnλ] + ∂λ[Q] +K[zλ]) + [n(0)a∂ah(Y )] [Qn(0)an(0)b∇a∇br2]− N−2eλ a[zλn (0)b∇b∇ar2] l(l + n− 3) [h(L)n] + ǫ[Q]− 2K̄[z(L)] l ≥ 1 : [κ(T )0] = 0 ⇔ n(0)aeλ b(∂b[h(T )a]− ∂a[h(T )b]) + [h(T )n]∂λ ln(r|Σ0) + r2|Σ0K̄∂λ(r−2|Σ0[z(T )]) = 0, l ≥ 2 : [κ(LT ] = 0 ⇔ − [h(T )n] + n(0)a∂a[h(LT )] + K̄[z(T )] = 0, l ≥ 2 : [κ(LT ] = 0 ⇔ − [h(L)n] + n(0)a∂a[h(LL)] + K̄[z(L)]− [Q] = 0, l ≥ 2 : [κ(T ] = 0 ⇔ n(0)a∂a[h(T )] = 0. References [1] Gerlach U H and Sengupta U K (1979) “Gauge-invariant perturbations on most gen- eral spherically symmetric space-times” Phys. Rev. D 19 2268-2272 [2] Gerlach U H and Sengupta U K (1979) “Junction conditions for odd-parity perturba- tions on most general spherically symmetric space-times” Phys. Rev. D 20 3009-3014 [3] Gerlach U H and Sengupta U K (1979) “Even parity junction conditions for pertur- bations on most spherically symmetric space-times” J. Math. Phys. 20 2540-2546 [4] Mart́ın-Garćıa J M and Gundlach (2001) “Gauge-invariant and coordinate- independent perturbations of stellar collapse II: matching to the exterior” Phys. Rev. D 64 024012 [5] Carter B and Battye R A (1995) “Gravitational Perturbations of Relativistic Mem- branes and Strings” Phys. Lett. 
B35 29-35
[6] Mukohyama S (2000) “Perturbation of the junction conditions and doubly gauge-invariant variables” Class. Quantum Grav. 17 4777-4797
[7] Mars M (2005) “First and second order perturbations of hypersurfaces” Class. Quantum Grav. 22 3325-3347
[8] Mars M and Senovilla J M M (1993) “Geometry of general hypersurfaces in spacetime: junction conditions” Class. Quantum Grav. 10 1865-1897
[9] Vera R (2002) “Symmetry-preserving matchings” Class. Quantum Grav. 19 5249-5264
[10] Mars M and Senovilla J M M (1998) “On the construction of global models describing rotating bodies; uniqueness of the exterior gravitational field” Mod. Phys. Lett. A13 1509-1519

ABSTRACT We present a critical review of the study of linear perturbations of matched spacetimes, including gauge problems. We analyse the freedom introduced in the perturbed matching by the presence of background symmetries and revisit the particular case of spherical symmetry in n dimensions. This analysis includes settings with boundary layers such as brane world models and shell cosmologies.
<|endoftext|><|startoftext|>
Introduction
The unilateral shift on complex separable Hilbert space generates two fundamental operator algebras, namely the norm closed (unital) algebra and the weak operator topology closed algebra. The former is naturally isomorphic to the disc algebra of holomorphic functions on the unit disc, continuous to the boundary, while the latter is isomorphic to $H^\infty$. The freely noncommuting multivariable generalisations of these algebras arise from the freely noncommuting shifts $L_{e_1}$, . . .
, $L_{e_n}$ given by the left creation operators on the Fock space $\mathcal{F}_n = \bigoplus_{k=0}^{\infty} (\mathbb{C}^n)^{\otimes k}$. Here the generated operator algebras, denoted $\mathcal{A}_n$ and $\mathcal{L}_n$ for the norm and weak topologies, are known as the noncommutative disc algebra and the free semigroup algebra. They have been studied extensively with respect to operator algebra structure, representation theory and the multivariable operator theory of row contractions. See for example [2], [9].

Higher rank generalisations of these algebras arise when one considers several families of freely noncommuting generators between which there are commutation relations. In the present paper we consider a very general form of such relations, namely
$$L_{e_i} L_{f_j} = \sum_{k,l} u_{i,j,k,l}\, L_{f_l} L_{e_k},$$
where $L_{e_1}, \dots, L_{e_n}$ and $L_{f_1}, \dots, L_{f_m}$ are freely noncommuting and $u = (u_{i,j,k,l})$ is an $nm \times nm$ unitary matrix. The associated operator algebras are denoted $\mathcal{A}_u$ and $\mathcal{L}_u$ and we classify them up to various forms of isomorphism in terms of the unitary matrices $u$. Such unitary relations arose originally in the context of the general dilation theorem proven in Solel ([12], [13]) for two row contractions $[T_1 \cdots T_n]$ and $[S_1 \cdots S_m]$ satisfying the unitary commutation relations.

For $n = m = 1$, we have $u = [\alpha]$ with $|\alpha| = 1$ and $\mathcal{A}_u$ is the subalgebra of the rotation C*-algebra for the relations $uv = \alpha vu$. When $u$ is a permutation unitary matrix arising from a permutation $\theta$ in $S_{nm}$ then the relations are those associated with a single vertex rank 2 graph in the sense of Kumjian and Pask, and the algebras in this case have been considered in Kribs and Power [5] and Power [10]. In particular, in [10] it was shown that there are 9 operator algebras $\mathcal{A}_\theta$ arising from the 24 permutations in the case $n = m = 2$. In contrast, we see below in Section 6 that for general 2 by 2 unitaries $u$ there are uncountably many isomorphism classes of the unitary relation algebras $\mathcal{A}_u$, expressed in terms of a nine-fold real parametrisation of isomorphism types.
The algebras $\mathcal{A}_\theta$ are easily defined; they are determined by the left regular representation of the semigroup $\mathbb{F}_\theta^+$ whose generators are $e_1, \dots, e_n, f_1, \dots, f_m$ subject to the relations $e_i f_j = f_l e_k$ where $\theta(i,j) = (k,l)$. On the other hand the unitary relation algebras $\mathcal{A}_u$ are generated by creation operators on a $\mathbb{Z}_+^2$-graded Fock space $\bigoplus_{k,l} (\mathbb{C}^n)^{\otimes k} \otimes (\mathbb{C}^m)^{\otimes l}$ with relations arising from the identification $u : \mathbb{C}^n \otimes \mathbb{C}^m \to \mathbb{C}^m \otimes \mathbb{C}^n$. In particular, $\mathcal{A}_u$ is a representation of the non-selfadjoint tensor algebra of a rank 2 correspondence (or a product system over $\mathbb{N}^2$) in the sense of [13]. See also [3].

In the main results, summarised partly in Theorem 5.10, we see that if $\mathcal{A}_u$ and $\mathcal{A}_v$ are isomorphic then the two families of generators have matching cardinalities. Furthermore, if $n \neq m$ then the algebras are isomorphic if and only if the unitaries $u, v$ in $M_{nm}(\mathbb{C})$ are unitarily equivalent by a unitary $A \otimes B$ in $M_n(\mathbb{C}) \otimes M_m(\mathbb{C})$. As in [10] we term this product unitary equivalence (with respect to the fixed tensor product decomposition). The case $n = m$ admits an extra possibility, in view of the possibility of generator exchanging isomorphisms, namely that $u, \tilde{v}$ are product unitary equivalent, where $\tilde{v}_{i,j,k,l} = \bar{v}_{l,k,j,i}$.

The theorem is proven as follows. After some preliminaries we identify, in Section 3, the character space $M(\mathcal{A}_u)$ and the set of $w^*$-continuous characters on $\mathcal{L}_u$. These are subsets of the closed unit ball product $\overline{\mathbb{B}}_n \times \overline{\mathbb{B}}_m$ which are associated with a variety $V_u$ in $\mathbb{C}^n \times \mathbb{C}^m$ determined by $u$. We then define the core $\Omega_u^0$, a closed subset of the realised character space $\Omega_u = M(\mathcal{A}_u)$, and we identify this intrinsically (algebraically) in terms of representations of $\mathcal{A}_u$ into $T_2$, the algebra of upper triangular matrices in $M_2(\mathbb{C})$. The importance of the core is that we are able to show that the interior is a minimal automorphism invariant subset on which automorphisms act transitively. This allows us to infer the existence of graded isomorphisms from general isomorphisms.
To construct automorphisms we first review, in Section 4, Voiculescu’s construction of a unitary action of the Lie group $U(1,n)$ on the Cuntz algebra $\mathcal{O}_n$ and the operator algebras $\mathcal{A}_n$ and $\mathcal{L}_n$. This provides, in particular, unitary automorphisms $\Theta_\alpha$, for $\alpha \in \mathbb{B}_n$, which act transitively on the interior ball, $\mathbb{B}_n$, of the character space of $\mathcal{A}_n$. For these explicit unitary automorphisms of the $e_i$-generated copy of $\mathcal{A}_n$ in $\mathcal{A}_u$, we establish unitary commutation relations for the tuples $\Theta_\alpha(L_{e_1}), \dots, \Theta_\alpha(L_{e_n})$ and $L_{f_1}, \dots, L_{f_m}$, when $(\alpha, 0)$ is a point in the core. This enables us to define natural unitary automorphisms of $\mathcal{A}_u$ itself, and in Theorem 4.8 the relative interior of the core is identified as an automorphism invariant set in the Gelfand space $\Omega_u$.

In Section 5 we determine the graded and bigraded isomorphisms in terms of product unitary equivalence. To do this we observe that such isomorphisms induce an origin preserving biholomorphic map between the cores $\Omega_u^0$ and $\Omega_v^0$ and that these maps, by a generalised Schwarz’s Lemma, are implemented by a product unitary. We then prove the main classification theorem.

In Section 6 we analyse in detail the case $n = m = 2$ and consider the special case of permutation unitaries. Finally, in Section 7 we show that the algebra $\mathcal{A}_u$ is contained in a tensor algebra $\mathcal{T}_+(X)$, associated with a correspondence $X$ as in [7]. Moreover, at least when $n \neq m$, every automorphism of $\mathcal{A}_u$ extends to an automorphism of $\mathcal{T}_+(X)$. The advantage of the tensor algebra is that its representation theory is known ([7]) while this is not yet the case for the algebra $\mathcal{A}_u$.

2 Preliminaries
Fix two finite dimensional Hilbert spaces $E = \mathbb{C}^n$ and $F = \mathbb{C}^m$ and a unitary $mn \times mn$ matrix $u$. The rows and columns of $u$ are indexed by $\{1, \dots, n\} \times \{1, \dots, m\}$ (so $u = (u_{(i,j),(k,l)})$) and when we write $u$ as an $mn \times mn$ matrix we assume that $\{1, \dots, n\} \times \{1, \dots, m\}$ is ordered lexicographically (so that, for example, the second row is the row indexed by $(1,2)$).
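We also fix orthonormal bases $\{e_i\}$ and $\{f_j\}$ for $E$ and $F$ respectively and the matrix $u$ is used to identify $E \otimes F$ with $F \otimes E$ through the equation
$$e_i \otimes f_j = \sum_{k,l} u_{(i,j),(k,l)}\, f_l \otimes e_k. \tag{1}$$
Equivalently, we write
$$f_l \otimes e_k = \sum_{i,j} \bar{u}_{(i,j),(k,l)}\, e_i \otimes f_j. \tag{2}$$
For every $k, l \in \mathbb{N}$, we write $X(k,l)$ for $E^{\otimes k} \otimes F^{\otimes l}$. Using successive applications of (1), we can identify $X(k,l)$ with $E^{\otimes k_1} \otimes F^{\otimes l_1} \otimes E^{\otimes k_2} \otimes \cdots \otimes F^{\otimes l_r}$ whenever $k = \sum_i k_i$ and $l = \sum_j l_j$.

Let $\mathcal{F}(n,m,u)$ be the Fock space given by the Hilbert space direct sum
$$\sum_{k,l}^{\oplus} X(k,l) = \sum_{k,l}^{\oplus} E^{\otimes k} \otimes F^{\otimes l},$$
and, for $e \in E$ and $f \in F$, write $L_e$ and $L_f$ for the “shift” operators
$$L_e\,(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}) = e \otimes e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l},$$
$$L_f\,(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}) = f \otimes e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l},$$
where, in the last equation, we use (1) to identify the resulting vector as a vector of $E^{\otimes k} \otimes F^{\otimes (l+1)}$.

The unital semigroup generated by $\{I, L_e, L_f : e \in E, f \in F\}$ is denoted $\mathbb{F}_u^+$ and the algebra it generates is denoted $\mathbb{C}[\mathbb{F}_u^+]$. The norm closure of $\mathbb{C}[\mathbb{F}_u^+]$ will be written $\mathcal{A}_u$ and its closure in the weak* operator topology will be written $\mathcal{L}_u$. In particular, the algebras $\mathcal{L}_\theta$ and $\mathcal{A}_\theta$ studied in [10] are the algebras $\mathcal{L}_u$ and $\mathcal{A}_u$ for $u$ a permutation matrix.

The results of Section 2 in [5] hold here too with minor changes. Every $A \in \mathcal{L}_u$ is the limit (in the strong operator topology) of its Cesaro sums
$$\Sigma_p(A) = \sum_{k<p} \Big(1 - \frac{k}{p}\Big) \Phi_k(A),$$
where $\Phi_k(A)$ lies in $\mathcal{L}_u$ and is “supported” on $\sum_{l}^{\oplus} E^{\otimes l} \otimes F^{\otimes (k-l)}$. In fact, let $Q_k$ be the projection of $\mathcal{F}(n,m,u)$ onto $\sum_{l}^{\oplus} E^{\otimes l} \otimes F^{\otimes (k-l)}$, form the one-parameter unitary group $\{U_t\}$ defined by $U_t := \sum_{k=0}^{\infty} e^{ikt} Q_k$ and set $\gamma_t = \mathrm{Ad}\, U_t$. Then $\{\gamma_t\}_{t \in \mathbb{R}}$ is a $w^*$-continuous action of $\mathbb{R}$ on $\mathcal{L}(\mathcal{F}(n,m,u))$ that normalizes both $\mathcal{A}_u$ and $\mathcal{L}_u$, and
$$\Phi_k(a) = \frac{1}{2\pi} \int_0^{2\pi} e^{-ikt} \gamma_t(a)\, dt$$
for all $a \in \mathcal{L}(\mathcal{F}(n,m,u))$. Then $\Phi_k$ leaves $\mathcal{L}_u$ invariant.

We can define the algebra $\mathcal{R}_u$ generated by the right shifts $R_e$ and $R_f$ defined by
$$R_e\,(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}) = e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l} \otimes e,$$
$$R_f\,(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}) = e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l} \otimes f.$$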
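The identification (1) and its inverse (2) can be checked numerically. The following is a minimal sketch, not part of the paper: the helper names `random_unitary`, `index_ef` and `swap_matrix` are hypothetical, indices are 0-based, and the unitary is just a random test matrix. It builds the matrix of the map $E \otimes F \to F \otimes E$ determined by (1), using the lexicographic ordering of index pairs, and verifies that this map is unitary and that its inverse has the coefficients appearing in (2).

```python
import numpy as np


def random_unitary(d, seed=0):
    """Return a d x d unitary matrix (Q factor of a random complex matrix)."""
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(g)
    return q


def index_ef(i, j, m):
    """0-based lexicographic position of the pair (i, j), with j in range(m)."""
    return i * m + j


def swap_matrix(u, n, m):
    """Matrix of e_i (x) f_j -> sum_{k,l} u[(i,j),(k,l)] f_l (x) e_k, as in (1).

    Columns are indexed by e_i (x) f_j and rows by f_l (x) e_k,
    both in lexicographic order.
    """
    t = np.zeros((n * m, n * m), dtype=complex)
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    row = l * n + k  # position of f_l (x) e_k in F (x) E
                    t[row, index_ef(i, j, m)] = u[index_ef(i, j, m), index_ef(k, l, m)]
    return t


n, m = 2, 3
u = random_unitary(n * m)
t = swap_matrix(u, n, m)

# (1) defines a unitary map E (x) F -> F (x) E because u is unitary.
is_unitary = np.allclose(t.conj().T @ t, np.eye(n * m))

# (2): the coefficient of e_i (x) f_j in the preimage of f_l (x) e_k
# is the complex conjugate of u[(i,j),(k,l)].
t_inv = np.linalg.inv(t)
i, j, k, l = 1, 2, 0, 1
matches_eq2 = np.isclose(
    t_inv[index_ef(i, j, m), l * n + k],
    np.conj(u[index_ef(i, j, m), index_ef(k, l, m)]),
)
```

With 0-based indices the pair written $(1,2)$ in the text corresponds to `index_ef(0, 1, m) == 1`, that is, the second row of `u`.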
The techniques of the proof of Proposition 2.3 of [5] can be applied here to show that the commutant of $\mathcal{R}_u$ is $\mathcal{L}_u$. Also, mapping $e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes f_{j_2} \otimes \cdots \otimes f_{j_l}$ to $f_{j_l} \otimes f_{j_{l-1}} \otimes \cdots \otimes f_{j_1} \otimes e_{i_k} \otimes e_{i_{k-1}} \otimes \cdots \otimes e_{i_1}$, we get a unitary operator $W : \mathcal{F}(n,m,u) \to \mathcal{F}(n,m,u^*)$ implementing a unitary equivalence of $\mathcal{L}_u$ and $\mathcal{R}_{u^*}$. In fact, it is easy to check that $R_{e_i} W = W L_{e_i}$ and $R_{f_j} W = W L_{f_j}$ for every $i, j$. To see that the commutation relation in the range is given by $u^*$, apply $W$ to (2) to get (in the range of $W$)
$$e_k \otimes f_l = \sum_{i,j} \bar{u}_{(i,j),(k,l)}\, f_j \otimes e_i = \sum_{i,j} (u^*)_{(k,l),(i,j)}\, f_j \otimes e_i,$$
which is equation (1) with $u^*$ instead of $u$. As in [5], we conclude that $(\mathcal{L}_u)' = \mathcal{R}_u$ and $(\mathcal{L}_u)'' = \mathcal{L}_u$.

3 The character space and its core
In the following proposition we describe the structure of the character spaces $M(\mathcal{L}_u)$ and $M(\mathcal{A}_u)$ (equipped with the weak* topology). Similar results were obtained in [5] for algebras defined for higher rank graphs and in [2] for analytic Toeplitz algebras. (See also [10].) It will be convenient to write
$$V_u = \Big\{(z,w) \in \mathbb{C}^n \times \mathbb{C}^m : z_i w_j = \sum_{k,l} u_{(i,j),(k,l)}\, z_k w_l \text{ for all } i, j\Big\} \tag{3}$$
and
$$\Omega_u = V_u \cap (\overline{\mathbb{B}}_n \times \overline{\mathbb{B}}_m), \tag{4}$$
where $\mathbb{B}_n$ is the open unit ball of $\mathbb{C}^n$. We refer to $V_u$ as the variety associated with $u$.

Proposition 3.1 (1) The linear multiplicative functionals on $\mathbb{C}[\mathbb{F}_u^+]$ are in one-to-one correspondence with points $(z,w)$ in $V_u$.
(2) $M(\mathcal{A}_u)$ is homeomorphic to $\Omega_u$.
(3) For $(z,w) \in \Omega_u$, write $\alpha_{(z,w)}$ for the corresponding character of $\mathcal{A}_u$. Then $\alpha_{(z,w)}$ extends to a $w^*$-continuous character on $\mathcal{L}_u$ if and only if $(z,w) \in \mathbb{B}_n \times \mathbb{B}_m$.

Proof. Part (1) follows immediately from (1). Fix $\alpha \in M(\mathcal{A}_u)$ and write $z_i = \alpha(L_{e_i})$, $1 \le i \le n$, and $w_j = \alpha(L_{f_j})$, $1 \le j \le m$. From the multiplicativity and linearity of $\alpha$ and (1), it follows that $(z,w) \in V_u$. Since $\alpha$ is contractive and maps $\sum_i a_i L_{e_i}$ to $\sum_i a_i z_i$, it follows that $\|z\| \le 1$ and similarly $\|w\| \le 1$. Thus $(z,w) \in \Omega_u$. For the other direction, fix first $(z,w) \in \Omega_u$ with $\|z\| < 1$ and $\|w\| < 1$.
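It follows from the definition of $\Omega_u$ and from (1) that $(z,w)$ defines a linear and multiplicative map $\alpha$ on the algebra $\mathbb{C}[\mathbb{F}_u^+]$ such that $\alpha(L_{e_i}) = z_i$ and $\alpha(L_{f_j}) = w_j$. Abusing notation slightly, we write $\alpha(x)$ for $\alpha(L_x)$ for every $x \in E^{\otimes k} \otimes F^{\otimes l}$. Also, for $\mathbf{i} = (i_1, \dots, i_k)$ and $\mathbf{j} = (j_1, \dots, j_l)$, we write $e_{\mathbf{i}} f_{\mathbf{j}}$ for $e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}$. These elements form an orthonormal basis for $E^{\otimes k} \otimes F^{\otimes l}$ and we now set
$$w_\alpha = \sum_{k,l} \sum_{\mathbf{i},\mathbf{j}} \overline{\alpha(e_{\mathbf{i}} f_{\mathbf{j}})}\, e_{\mathbf{i}} f_{\mathbf{j}} \in \mathcal{F}(n,m,u).$$
If $p_i \ge 0$ and $p_1 + \dots + p_n = k$ then there are $\frac{k!}{p_1! \cdots p_n!}$ terms $e_{i_1} \otimes \cdots \otimes e_{i_k}$ with $\alpha(e_{i_1} \otimes \cdots \otimes e_{i_k}) = z_1^{p_1} z_2^{p_2} \cdots z_n^{p_n}$. It follows that
$$\sum_{\mathbf{i}} |\alpha(e_{\mathbf{i}})|^2 = \sum_{\mathbf{i} = (i_1,\dots,i_k)} |\alpha(e_{i_1})|^2 \cdots |\alpha(e_{i_k})|^2 = \|z\|^{2k}.$$
Thus
$$\|w_\alpha\|^2 = \sum_{\mathbf{i},\mathbf{j},k,l} |\alpha(e_{\mathbf{i}} f_{\mathbf{j}})|^2 = (1 - \|z\|^2)^{-1} (1 - \|w\|^2)^{-1} < \infty.$$
Note that, for every $x \in E^{\otimes k} \otimes F^{\otimes l}$, $\langle x, w_\alpha \rangle = \alpha(x)$. Thus, for $e \in E$,
$$\langle x, L_e^* w_\alpha \rangle = \langle L_e x, w_\alpha \rangle = \alpha(e \otimes x) = \alpha(e)\alpha(x) = \langle x, \overline{\alpha(e)}\, w_\alpha \rangle$$
and, similarly, $\langle x, L_f^* w_\alpha \rangle = \langle x, \overline{\alpha(f)}\, w_\alpha \rangle$ for $f \in F$. Thus
$$\langle w_\alpha, L_e^* w_\alpha \rangle = \alpha(e) \sum |\alpha(e_{\mathbf{i}} f_{\mathbf{j}})|^2 = \alpha(e) \|w_\alpha\|^2.$$
Similarly, $\langle w_\alpha, L_f^* w_\alpha \rangle = \alpha(f) \sum |\alpha(e_{\mathbf{i}} f_{\mathbf{j}})|^2 = \alpha(f) \|w_\alpha\|^2$ for $f \in F$. Thus if we write $\nu_\alpha = w_\alpha / \|w_\alpha\|$ then $\alpha(x) = \langle L_x \nu_\alpha, \nu_\alpha \rangle$ for every $x \in E^{\otimes k} \otimes F^{\otimes l}$ (for every $k, l$). This shows that $\alpha$ is contractive and is $w^*$-continuous. We can, therefore, extend it to an element of $M(\mathcal{L}_u)$, also denoted $\alpha$.

The analysis above shows that the image of the map $\alpha \mapsto (z,w) \in \Omega_u$ defined above (on $M(\mathcal{A}_u)$) contains $V_u \cap (\mathbb{B}_n \times \mathbb{B}_m)$. Since $M(\mathcal{A}_u)$ is compact and the map is $w^*$-continuous, its image contains (and, thus, is equal to) $\Omega_u$. This completes the proof of (2).

To complete the proof of (3), we need to show that, if $(z,w) \in \Omega_u$ and the corresponding character extends to a $w^*$-continuous character on $\mathcal{L}_u$, then $\|z\| < 1$ and $\|w\| < 1$. For this, write $\mathcal{L}$ for the $w^*$-closed subalgebra of $\mathcal{L}_u$ generated by $\{L_e : e \in E\} \cup \{I\}$. Let $P$ be the projection of $\mathcal{F}(n,m,u)$ onto $\mathcal{F}(E) = \mathbb{C} \oplus E \oplus (E \otimes E) \oplus \cdots$. Then $P\mathcal{L}P = P\mathcal{L}_u P$ and the map $T \mapsto PTP$ is a $w^*$-continuous isomorphism of $\mathcal{L}$ onto $P\mathcal{L}_u P$. The latter algebra is unitarily equivalent to the algebra $\mathcal{L}_n$ studied in [2].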
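A $w^*$-continuous character of $\mathcal{L}_u$ gives rise, therefore, to a $w^*$-continuous character on $\mathcal{L}_n$. It follows from [2, Theorem 2.3] that $z \in \mathbb{B}_n$. Similarly, one shows that $w \in \mathbb{B}_m$. □

To state the next result, we first write $u_{(i,j)}$ for the $n \times m$ matrix whose $k,l$-entry is $u_{(i,j),(k,l)}$. Thus, the $(i,j)$ row of $u$ provides the $n$ rows of $u_{(i,j)}$. We then compute
$$\sum_{k,l} u_{(i,j),(k,l)} z_k w_l = \sum_k \Big(\sum_l u_{(i,j),(k,l)} w_l\Big) z_k = \sum_k (u_{(i,j)} w)_k z_k = \langle u_{(i,j)} w, \bar{z} \rangle. \tag{5}$$
Write $E_{i,j}$ for the $n \times m$ matrix whose $i,j$-entry is 1 and all other entries are 0 (so that $\langle E_{i,j} w, \bar{z} \rangle = z_i w_j$) and write $C_{(i,j)}$ for the matrix $u_{(i,j)} - E_{i,j}$. Then the computation above yields the following.

Lemma 3.2 With $C_{(i,j)}$ defined as above, we have
$$V_u = \{(z,w) \in \mathbb{C}^n \times \mathbb{C}^m : \langle C_{(i,j)} w, \bar{z} \rangle = 0 \text{ for all } i, j\}.$$

Definition 3.3 The core of $\Omega_u$ is the subset given by
$$\Omega_u^0 := \{(z,w) \in \overline{\mathbb{B}}_n \times \overline{\mathbb{B}}_m : C_{(i,j)} w = 0,\ C_{(i,j)}^t z = 0 \text{ for all } i, j\}.$$

Fix $(z,w) \in \Omega_u^0$. We have $u_{(i,j)} w = E_{i,j} w$ for all $i, j$. Thus, for every $k$,
$$\sum_l u_{(i,j),(k,l)} w_l = \delta_{i,k} w_j \tag{6}$$
(where $\delta_{i,k}$ is 1 if $i = k$ and 0 otherwise) and, for $a_1, a_2, \dots, a_n$ in $\mathbb{C}$, we have $\sum_{k,l} u_{(i,j),(k,l)} a_k w_l = a_i w_j$. Hence, if we let $\tilde{w}^{(i)}$ be the vector in $\mathbb{C}^{mn}$ defined by $\tilde{w}^{(i)}_{(k,l)} = \delta_{k,i} w_l$, we get $u \tilde{w}^{(i)} = \tilde{w}^{(i)}$. Similarly, for $z$, we have
$$\sum_k u_{(i,j),(k,l)} z_k = \delta_{j,l} z_i \tag{7}$$
and for scalars $b_1, \dots, b_m$ we have $\sum_{k,l} u_{(i,j),(k,l)} b_l z_k = b_j z_i$. Thus, writing $\tilde{z}^{(j)}$ for the vector defined by $(\tilde{z}^{(j)})_{(k,l)} = \delta_{l,j} z_k$, we have $u \tilde{z}^{(j)} = \tilde{z}^{(j)}$. The vector $\tilde{w}^{(i)}$ in $\mathbb{C}^{nm} = \mathbb{C}^n \otimes \mathbb{C}^m$ is also expressible as $\delta_i \otimes w$ where $\{\delta_1, \dots, \delta_n\}$ is the standard basis of $\mathbb{C}^n$, and, similarly, $\tilde{z}^{(j)} = z \otimes \delta_j$. We therefore obtain Lemma 3.4, which will be useful in Section 6.

We note also the following companion formula. Suppose $(z,w) \in \Omega_u^0$. Then, as we noted above, $u \tilde{z}^{(j)} = \tilde{z}^{(j)}$ and, thus, $u^* \tilde{z}^{(j)} = \tilde{z}^{(j)}$. Writing this explicitly, we have, for all $i, j, l$,
$$\sum_k u_{(k,l),(i,j)} \bar{z}_k = \delta_{j,l} \bar{z}_i. \tag{8}$$

Lemma 3.4 Let $(z,w)$ be a vector in the core $\Omega_u^0$. Then
$$\mathrm{span}\{\tilde{z}^{(j)}, \tilde{w}^{(i)} : 1 \le i \le n,\ 1 \le j \le m\} \subseteq \mathrm{Ker}(u - I).$$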
In particular,
(i) If the core contains a vector $(z,w)$ with $z \neq 0$, then $\dim(\mathrm{Ker}(u - I)) \ge m$.
(ii) If the core contains a vector $(z,w)$ with $w \neq 0$, then $\dim(\mathrm{Ker}(u - I)) \ge n$.
(iii) If the core contains a vector $(z,w)$ with $z \neq 0$ and $w \neq 0$, then $\dim(\mathrm{Ker}(u - I)) \ge m + n - 1$.

We now characterise the core in an algebraic manner in terms of representations into the algebra $T_2$ of upper triangular $2 \times 2$ matrices. We remark that nest representations such as these have proven useful in the algebraic structure theory of nonself-adjoint algebras [?], [11].

Let $\rho : \mathbb{C}[\mathbb{F}_u^+] \to T_2$ with
$$\rho(T) = \begin{pmatrix} \rho_{1,1}(T) & \rho_{1,2}(T) \\ 0 & \rho_{2,2}(T) \end{pmatrix}.$$
Then $\rho_{1,1}$ and $\rho_{2,2}$ are characters and $\rho_{1,2}$ is a linear functional that satisfies
$$\rho_{1,2}(TS) = \rho_{1,1}(T)\rho_{1,2}(S) + \rho_{1,2}(T)\rho_{2,2}(S) \tag{9}$$
for $T, S \in \mathbb{C}[\mathbb{F}_u^+]$. We now restrict to the case where $\rho_{1,1} = \rho_{2,2}$. By Proposition 3.1(1), both are associated with a point $(z,w)$ in $V_u$. It follows from (9) that $\rho_{1,2}$ is determined by its values on $L_{e_i}$ and $L_{f_j}$. Setting $\lambda_i = \rho_{1,2}(L_{e_i})$ and $\mu_j = \rho_{1,2}(L_{f_j})$, we associate with each homomorphism $\rho$ (as discussed above) a quadruple $(z, w, \lambda, \mu)$ where $(z,w) \in V_u$ and, for every $i, j$,
$$z_i \mu_j + \lambda_i w_j = \sum_{k,l} u_{(i,j),(k,l)} (w_l \lambda_k + \mu_l z_k). \tag{10}$$
(The last equation follows from (1).) Using (5) we can write the last equation as
$$\langle u_{(i,j)} w, \bar{\lambda} \rangle + \langle u_{(i,j)} \mu, \bar{z} \rangle = z_i \mu_j + \lambda_i w_j = \langle E_{i,j} w, \bar{\lambda} \rangle + \langle E_{i,j} \mu, \bar{z} \rangle.$$
That is,
$$\langle C_{(i,j)} w, \bar{\lambda} \rangle + \langle \mu, \overline{C_{(i,j)}^t z} \rangle = 0. \tag{11}$$
The following lemma now follows from the definition of the core.

Lemma 3.5 A point $(z,w) \in \Omega_u$ lies in the core $\Omega_u^0$ if and only if every $(\lambda, \mu) \in \mathbb{C}^n \times \mathbb{C}^m$ defines a homomorphism $\rho : \mathbb{C}[\mathbb{F}_u^+] \to T_2$ such that
$$\rho(L_{e_i}) = \begin{pmatrix} z_i & \lambda_i \\ 0 & z_i \end{pmatrix}, \qquad \rho(L_{f_j}) = \begin{pmatrix} w_j & \mu_j \\ 0 & w_j \end{pmatrix}$$
for all $i, j$.

4 Automorphisms of $\mathcal{L}_n$ and $\mathcal{L}_u$
We first derive the unitary automorphisms of $\mathcal{L}_n$ and $\mathcal{A}_n$ associated with $U(1,n)$. These were obtained by Voiculescu [14] in the setting of the Cuntz-Toeplitz algebra. However the automorphisms restrict to an action of $U(1,n)$ on the free semigroup algebra.
The result is rather fundamental, being a higher dimensional version of the familiar Möbius automorphism group on $H^\infty$. For the reader’s convenience we provide complete proofs. See also the discussion in Davidson and Pitts [2], and in [1], [10].

Lemma 4.1 Let $\alpha \in \mathbb{B}_n$ and write (i) $x_0 = (1 - \|\alpha\|^2)^{-1/2}$, (ii) $\eta = x_0 \alpha$, and (iii) $X_1 = (I_{\mathbb{C}^n} + \eta\eta^*)^{1/2}$. Then
(1) $\|\eta\|^2 = |x_0|^2 - 1$,
(2) $X_1 \eta = x_0 \eta$, and
(3) $X_1^2 = I + \eta\eta^*$.
In particular, the matrix
$$X = \begin{pmatrix} x_0 & \eta^* \\ \eta & X_1 \end{pmatrix}$$
satisfies $X^* J X = J$, where $J = \begin{pmatrix} 1 & 0 \\ 0 & -I \end{pmatrix}$.

Proof. Part (1) is an easy computation and part (3) follows from the definition of $X_1$. For (2), note that $X_1^2 \eta = (I + \eta\eta^*)\eta = \eta + \|\eta\|^2 \eta = x_0^2 \eta$ and, for every $\zeta \in \eta^\perp$, $X_1 \zeta = \zeta$. Suppose $X_1 \eta = a\eta + \zeta$ ($\zeta \in \eta^\perp$). Then $x_0^2 \eta = X_1^2 \eta = a^2 \eta + (a+1)\zeta$ and, since $a \ge 0$ (as $X_1 \ge 0$), it follows that $a = x_0$ and $\zeta = 0$. □

The lemma exhibits specific matrices ($X_1$ is nonnegative) in $U(1,n)$ associated with points in the open ball. One can similarly check (see [2] or [10] for example) that the general form of a matrix $Z$ in $U(1,n)$ is
$$Z = \begin{pmatrix} z_0 & \eta_1^* \\ \eta_2 & Z_1 \end{pmatrix}$$
where
$$\|\eta_1\|^2 = \|\eta_2\|^2 = |z_0|^2 - 1, \quad Z_1 \eta_1 = \bar{z}_0 \eta_2, \quad Z_1^* \eta_2 = z_0 \eta_1,$$
$$Z_1^* Z_1 = I_n + \eta_1 \eta_1^*, \quad Z_1 Z_1^* = I_n + \eta_2 \eta_2^*.$$
It is these equations that are equivalent to the single matrix equation $Z^* J Z = J$.

It is well known that the map $\theta_X$ defined on $\mathbb{B}_n$ by
$$\theta_X(\lambda) = \frac{X_1 \lambda + \eta}{x_0 + \langle \lambda, \eta \rangle}, \qquad \lambda \in \mathbb{B}_n,$$
is an automorphism of $\mathbb{B}_n$ with inverse $\theta_{X^{-1}}$. See Lemma 4.9 of [2] and Lemma 8.1 of [10] for example. We make use of this in the proof of Voiculescu’s theorem below.

Let $L_1, \dots, L_n$ be the generators of the norm closed algebra $\mathcal{A}_n$ and for $\zeta \in \mathbb{C}^n$ write $L_\zeta = \sum_i \zeta_i L_i$. Recall that the character space $M(\mathcal{A}_n)$ is naturally identifiable with the closed ball $\overline{\mathbb{B}}_n$, with $\lambda$ in this ball providing a character $\phi_\lambda$ for which $\phi_\lambda(L_i) = \lambda_i$. The proof is a reduced version of that given above for $M(\mathcal{A}_\theta)$.

Theorem 4.2 Let $\alpha \in \mathbb{B}_n$ and let $X_1$, $x_0$, $\eta$ and $X$ be associated with $\alpha$ as in Lemma 4.1.
Then (i) there is an automorphism ΘX of Ln such that Θα(Lζ) = (x0I + Lη) −1(LX1ζ + 〈ζ, η̄〉I), (12) (ii) the inverse automorphism Θ−1X is ΘX−1, and X −1 is the matrix in U(1, n) associated with −α, (iii) there is a unitary UX on Fn such that for a ∈ An, UXaξ0 = Θα(a)(x0I + Lη) and ΘX(a) = UXaU Proof. Let Fn be the Fock space for Ln, In = IFn , and let L̃ = [In L1 · · ·Ln] viewed as an operator from (C⊕ Cn)⊗Fn = Fn ⊕ (Cn ⊗Fn) to Fn. Then L̃(J ⊗ I)L̃∗ = In − L̃L̃∗ = In − (L1L∗1 + . . . LnL∗n) = P0 where P0 is the vacuum vector projection from Fn to C. Also, since XJX = J , we have L̃(J ⊗ I)L̃∗ = L̃(X ⊗ In)(J ⊗ I)(X ⊗ In)L̃∗ = [Y0 Y1](J ⊗ I)[Y0 Y1]∗ where [Y0 Y1] = [In L] x0 ⊗ In η∗ ⊗ In η ⊗ In X1 ⊗ In Thus Y0Y 0 − Y1Y ∗1 = P0. Also Y0 = x0 ⊗ In + L(η ⊗ In) = x0In + Lη, Y1 = η ∗ ⊗ In + L(X1 ⊗ In) = η∗ ⊗ In + [LX1e1 . . . LX1en] where, here, e1, . . . , en is the standard basis for C The operator V = Y −10 Y1 is a row isometry [V1 · · · Vn], from Cn ⊗Fn to Fn with defect 1. To see this we compute I − V V ∗ = I − Y −10 Y1Y ∗1 Y ∗−10 = I − Y −10 (−P0 + Y0Y ∗0 )Y ∗−10 = I + Y −10 P0Y 0 − I = ξ 0 = Y 0 ξ0 = (x0In + Lη) −1ξ0 = x (x−10 Lη) and so ‖ξ′0‖ = |x0|−2 |x0|−2j‖η‖2j = x20 − ‖η‖2 Considering the path t → tα for 0 ≤ t ≤ 1 and the corresponding path of partial isometries V it follows from the stability of Fredholm index that the index of V and L coincide and so in fact V is a row isometry. Thus V1, . . . , Vn are isometries with orthogonal ranges. We now have a contractive algebra homomorphism An → L(Fn) deter- mined by the correspondence Lei → Vi, i = 1, . . . , n. In fact it is an algebra endomorphism Θ : An → An. Indeed, for ξ = (ξ1, . . . , ξn) we have Θ(Lξ) = ξiVi = 0 Y1(ei ⊗ In) ζi(x0In + Lη) −1(η∗ ⊗ In + [LX1e1 . . . LX1en])[In · · · In]t = (x0In + Lη) −1(〈ζ, η〉In + LX1ζ). Thus far we have followed Voiculescu’s proof [14]. The following argument shows that Θ is an automorphism and is an alternative to the calculation suggested in [14]. 
The calculation shows that φλ ◦ΘX = φθ We have φλ ◦ΘX(Lζ) = φλ((x0In + Lη)−1(〈ζ, η〉In + LX1ζ)) = (x0 + 〈λ, η〉)−1(〈ζ, η〉+ 〈X1ζ, λ〉) = φµ(Lζ) where X∗1λ + η x0 + 〈λ, η〉 X1λ+ η x0 + 〈λ, η〉 = θX(λ). Write ΘX for the contractive endomorphism Θ of An as constructed above. It follows that the composition Φ = ΘX−1 ◦ ΘX is a contractive endomorphism which, by the remarks preceding the statement of the theo- rem, induces the identity map on the character space, so that φλ = φλ ◦Φ−1 for all λ ∈ Bn. Such a map must be the identity. Indeed, suppose that we have the Fourier series representation Φ−1(Le1) = a1Le1 + . . . + anLen + X where X is a series with terms of total degree greater than one. It follows t−1φ(t,0,...,0)(Φ −1(Le1)) = a1 while t−1φ(t,0,...,0)(Le1) = 1. Since the induced map is the identity, we have a1 = 1 and ak = 0 for k ≥ 2. In this way we see that the image of each Li has the form Li+Ti where Ti has only terms of total degree greater than one. Since Liξ0 is orthogonal to Tiξ0 and Φ−1(Li) is a contraction, we have 1 ≥ ‖Φ−1(Li)ξ0‖2 = ‖Liξ0 + Tiξ0‖2 = ‖Liξ0‖2 + ‖Tiξ0‖2 = 1 + ‖Tiξ0‖2. Thus Tiξ0 = 0 and, consequently, Ti = 0 and so the composition Φ is the identity map. Finally, we show that Θα is unitarily implemented. Define UX on Anξ0 by UXaξ0 = ΘX(a)ξ 0 = ΘX(a)(x0I + Lη) −1ξ0 for a ∈ A. Since ΘX is an automorphism, (UXa)bξ0 = UXabξ0 = ΘX(a)ΘX(b)ξ 0 = ΘX(a)UXbξ0, for a, b ∈ An, and it follows that UXa = ΘX(a)UX , as linear transformations on the dense space Anξ0. Now, V = [V1, . . . , Vn] is a row isometry with defect space spanned by ξ The map UX maps ξi = Liξ0 to ΘX(Li)ξ 0 = Viξ 0 and, if w = w(e1, . . . , en) is a word in e1, . . . , en , then UXξw = UXw(L1, . . . , Ln)ξ0 = ΘX(w(L1, . . . , Ln))ξ 0 = w(V1, . . . , Vn)ξ Since V is a row isometry and ξ′0 is a unit wandering vector for V , it follows that {w(V1, . . . , Vn)ξ′0} is an orthonormal set. Thus, UX is an isometry. 
Since the range of UX contains UXAnξ0 = ΘX(An)ξ′0 = An(x0I + Lη)−1ξ0 = Anξ0 we see that UX is unitary. � Remark 4.3 With the same calculations as in the proof above and slightly more notation, one can show that each invertible matrix Z ∈ U(1, n) defines an automorphism ΘZ and that Z → ΘZ is an action of U(1, n) on An and, in particular, ΘZΘX = ΘZX . Moreover, Z → UZ is a unitary representation of U(1, n) implementing this as the following calculation indicates. Let W = be the matrix in U(1, n) associated with β ∈ Bn as in Lemma 4.1. Then UXUWaξ0 = UX(Θβ(a)(w0 + Lω) −1ξ0) = Θα(Θβ(a)(w0 + Lω) −1)(x0In + Lη) = Θα(Θβ(a))Θα((w0 + Lω) −1)(x0In + Lη) = ΘXW (a)Θα((w0 + Lω) −1)(x0In + Lη) = ΘXW (a)[w0In + (x0In + Lη) −1(LX1ω + 〈ω, η〉In)] (x0In + Lη) = ΘXW (a)[w0x0In + w0Lη + LX1ω + 〈ω, η〉In)] = ΘXW (a)[(w0x0In + 〈ω, η〉)In + Lω0η+X1ω] One readily checks that this is the same as UXW (a)ξ0 It is evident from the last theorem and its proof that the unitary auto- morphisms of An and Ln act transitively on the open subset Bn associated with the weak star continuous characters. We shall show that a version of this holds for the unitary relation algebras with respect to the open core of the character space. As a first step to constructing automorphisms of Au we obtain unitary commutation relations for the n-tuples [Θ(Le1), . . . ,Θ(Len)] and [Lf1 , . . . , Lfm ] for certain automorphisms Θ of the copy of An in Au. Lemma 4.4 Suppose (z, w) ∈ Ω0u∩(Bn×Bm). Write α for z̄ and let Θ := Θα be as in (12). Then, for every 1 ≤ i ≤ n and 1 ≤ j ≤ m, Θ(Lei)Lfj = u(i,j),(k,l)LflΘ(Lek). (13) Proof. Write Y for ηη∗ and β for (x0 + 1) −1. Since X21 = I + ηη X1 = I + βηη ∗ = I + βY and Y = (Yi,j) where Yi,j = ηiη̄j = x 0z̄izj . We now compute (X1ei)fj = eifj + βYt,ietfj = eifj + t,k,l βYt,iu(t,j),(k,l)flek u(i,j),(k,l)flek + t,k,l βx20z̄tziu(t,j),(k,l)flek u(i,j),(k,l)flek + βx t,k,l z̄tu(t,j),(k,l)flek. 
Using the core equation (8), the last expression is equal to u(i,j),(k,l)flek + βx δj,lz̄kflek u(i,j),(k,l)flek + βx z̄kfjek u(i,j),(k,l)flek + βx (δj,lzi)z̄kflek. Using the core equation (7), this is equal to u(i,j),(k,l)flek + βx u(i,j),(t,l)zt)z̄kflek u(i,j),(k,l)flek + βx k,l,t u(i,j),(k,l)zkz̄tflet u(i,j),(k,l)flek + β k,l,t u(i,j),(k,l)Yt,kflet u(i,j),(k,l)flek + β u(i,j),(k,l)flY ek u(i,j),(k,l)flX1ek. LX1eiLfj = u(i,j),(k,l)LflLX1ek . (14) Next, we compute i z̄ieifj = i,k,l u(i,j),(k,l)z̄iflek. Using (8), this is equal k,l δj,lz̄kflek = k z̄kfjek. Thus z̄ieifj = z̄ifjei (15) and, hence, Lη commutes with Lfj . It follows that Lfj (x0I − Lη)−1 = (x0I − Lη)−1Lfj . (16) We have, using (14) and (16), (x0I − Lη)−1LX1eiLfj = u(i,j),(k,l)(x0I − Lη)−1LflLX1ek u(i,j),(k,l)Lfl(x0I − Lη) −1LX1ek . Also, applying (7) and (16), we get (x0I − Lη)−1〈ei, η〉Lfj = ziLfj (x0I − Lη)−1 δj,lziLfl(x0I − Lη) u(i,j),(k,l)zkLfl(x0I − Lη) Subtracting the last two equations, we get (13). � Corollary 4.5 In the notation of Lemma 4.4, for every i, j, L∗fjΘ(Lei) = u(i,l),(k,j)Θ(Lek)L Proof. It follows from (13) that Θ(Lei)Lfl = k,t u(i,l),(k,t)LftΘ(Lek) for every i, l. Thus, for i, j, l, L∗fjΘ(Lei)LflL u(i,l),(k,t)L LftΘ(Lek)L u(i,l),(k,t)δj,tΘ(Lek)L u(i,l),(k,j)Θ(Lek)L Summing over l, we get L∗fjΘ(Lei)( u(i,l),(k,j)Θ(Lek)L l LflL = I − P where P is the projection onto the subspace C ⊕ E ⊕ (E ⊗ E) ⊕ . . .. Note that P is left invariant under the operators in the algebra generated by {Lei : 1 ≤ i ≤ n} and, in particular, by Θ(Lei). Thus L∗fjΘ(Lei)P = L PΘ(Lei)P = 0 = k,l u(i,l),(k,j)Θ(Lek)L P . This completes the proof of the corollary. � Proposition 4.6 Suppose (z, w) ∈ Ω0u ∩ (Bn × Bm). Then there is a auto- morphism Θ̃z of Au that is unitarily implemented and such that, for every X ∈ Au, α(0,w)(Θ̃ z (X)) = α(z,w)(X) (17) where α(z,w) is the character associated with (z, w) by Proposition 3.1. Proof. Let U be the unitary operator implementing Θ. 
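We can view $\mathcal{F}(n,m,u)$ as the sum
$$\mathcal{F}(n,m,u) = \sum_{k}^{\oplus} F^{\otimes k} \otimes \mathcal{F}(E),$$
where $\mathcal{F}(E) = \mathbb{C} \oplus E \oplus (E \otimes E) \oplus \cdots$. We now let $V$ be the unitary operator whose restriction to $F^{\otimes k} \otimes \mathcal{F}(E)$ is $I_k \otimes U$ (where $I_k$ is the identity operator on $F^{\otimes k}$). It is easy to check that, for every $f_j$, $V L_{f_j} V^* = L_{f_j}$. Now, fix $i$. We shall show, by induction, that, for every $k$ and every $\xi \in F^{\otimes k} \otimes \mathcal{F}(E)$,
$$(I_k \otimes U) L_{e_i} \xi = \Theta(L_{e_i}) (I_k \otimes U) \xi. \tag{18}$$
For $k = 0$ this is just the fact that $U$ implements $\Theta$. Suppose we know this for $k$ and fix $f_j \in F$. Then, for $\xi \in F^{\otimes k} \otimes \mathcal{F}(E)$ we have
$$(I_{k+1} \otimes U) L_{e_i} L_{f_j} \xi = \sum_{r,s} u_{(i,j),(r,s)} (I_{k+1} \otimes U) L_{f_s} L_{e_r} \xi = \sum_{r,s} u_{(i,j),(r,s)} L_{f_s} (I_k \otimes U) L_{e_r} \xi.$$
Applying the induction hypothesis, this is equal to $\sum_{r,s} u_{(i,j),(r,s)} L_{f_s} \Theta(L_{e_r}) (I_k \otimes U) \xi$. Using (13), this is $\Theta(L_{e_i}) L_{f_j} (I_k \otimes U) \xi = \Theta(L_{e_i}) (I_{k+1} \otimes U) L_{f_j} \xi$. Since $F^{\otimes (k+1)} \otimes \mathcal{F}(E)$ is spanned by elements of the form $L_{f_j} \xi$ (as above), the equality follows.

From the relations of Lemma 4.4 it follows that the map $\tilde{\Theta}_z : X \mapsto V X V^*$ defines a unitary endomorphism of $\mathcal{A}_u$. Since $\Theta$ is an automorphism of $\mathcal{A}_n$ it follows that $\tilde{\Theta}_z$ gives the desired automorphism. □

Clearly, in Proposition 4.6, we can interchange $z$ and $w$ to get the following, where $\Theta_{z,w} = \tilde{\Theta}_z \tilde{\Theta}_w$.

Proposition 4.7 Suppose $(z,w) \in \Omega_u^0 \cap (\mathbb{B}_n \times \mathbb{B}_m)$. Then there is a unitary automorphism $\Theta_{z,w}$ of $\mathcal{L}_u$ which is a homeomorphism with respect to the $w^*$-topologies and which restricts to an automorphism of $\mathcal{A}_u$. Moreover, for every $X \in \mathcal{L}_u$,
$$\alpha_{(0,0)}(\Theta_{z,w}(X)) = \alpha_{(z,w)}(X), \tag{19}$$
where $\alpha_{(z,w)}$ is the character associated with $(z,w)$ as in Proposition 3.1.

An automorphism $\Psi$ of $\mathcal{A}_u$ defines a map on the character space of $\mathcal{A}_u$, namely $\phi \mapsto \phi \circ \Psi^{-1}$. Thus, using Proposition 3.1, we have a homeomorphism $\theta_\Psi$ of $\Omega_u$. Also, since $\Omega_u \cap (\mathbb{B}_n \times \mathbb{B}_m)$ is the interior of $\Omega_u$, $\theta_\Psi$ maps $\Omega_u \cap (\mathbb{B}_n \times \mathbb{B}_m)$ onto itself. Similarly, if $\Psi$ is an automorphism of $\mathcal{L}_u$ which is a homeomorphism with respect to the $w^*$-topologies, then $\theta_\Psi$ is a homeomorphism of $\Omega_u \cap (\mathbb{B}_n \times \mathbb{B}_m)$.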
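Since the core of Definition 3.3 plays the central role in what follows, it may help to see the defining conditions in computational form. The sketch below is an illustration only, under the assumption of 0-based indices and the lexicographic ordering of index pairs; the function names `in_variety` and `in_core` are hypothetical. It tests membership of a point $(z,w)$ in the variety $V_u$ of (3) and in the core, using $C_{(i,j)} = u_{(i,j)} - E_{i,j}$.

```python
import numpy as np


def in_variety(u, z, w, tol=1e-10):
    """Check z_i w_j = sum_{k,l} u[(i,j),(k,l)] z_k w_l for all i, j."""
    n, m = len(z), len(w)
    u4 = np.asarray(u, dtype=complex).reshape(n, m, n, m)  # u4[i, j, k, l]
    rhs = np.einsum("ijkl,k,l->ij", u4, z, w)
    return bool(np.allclose(np.outer(z, w), rhs, atol=tol))


def in_core(u, z, w, tol=1e-10):
    """Check C_(i,j) w = 0 and C_(i,j)^t z = 0 for all i, j, inside the closed balls."""
    n, m = len(z), len(w)
    if np.linalg.norm(z) > 1 + tol or np.linalg.norm(w) > 1 + tol:
        return False
    u4 = np.asarray(u, dtype=complex).reshape(n, m, n, m)
    for i in range(n):
        for j in range(m):
            c = u4[i, j].copy()  # the n x m matrix u_(i,j)
            c[i, j] -= 1.0       # subtract the matrix unit E_{i,j}
            if not np.allclose(c @ w, 0, atol=tol):
                return False
            if not np.allclose(c.T @ z, 0, atol=tol):
                return False
    return True


# Flip relations e_i f_j = f_j e_i (u is the identity): every point of the
# closed ball product satisfies the core conditions.
z = np.array([0.3, 0.4j])
w = np.array([0.1, 0.2, -0.5])
flip_in_core = in_core(np.eye(6), z, w)

# n = m = 1 with u = [alpha], alpha != 1: the variety is {zw = 0},
# while only the origin satisfies the core conditions.
alpha = np.exp(0.7j)
u1 = np.array([[alpha]])
point_in_variety = in_variety(u1, np.array([0.5]), np.array([0.0]))
point_in_core = in_core(u1, np.array([0.5]), np.array([0.0]))
origin_in_core = in_core(u1, np.array([0.0]), np.array([0.0]))
```

The second example reflects the rotation-algebra relation $uv = \alpha vu$ mentioned in the introduction: for $\alpha \neq 1$ the point $(0.5, 0)$ lies in the variety but not in the core.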
In the following theorem we identify the relative interior of the core as the orbit of $(0,0)$ under the group of maps $\theta_\Psi$ associated with automorphisms $\Psi$.

Theorem 4.8 For $(z,w) \in \mathbb{B}_n \times \mathbb{B}_m$ the following conditions are equivalent.
(1) $(z,w) \in \Omega_u^0$.
(2) There exists a completely isometric automorphism $\Psi$ of $\mathcal{L}_u$ that is a homeomorphism with respect to the $w^*$-topologies and restricts to an automorphism of $\mathcal{A}_u$, such that $\theta_\Psi(0,0) = (z,w)$.
(3) There exists an algebraic automorphism $\Psi$ of $\mathcal{A}_u$ such that $\theta_\Psi(0,0) = (z,w)$.

Proof. The proof that (1) implies (2) follows from Proposition 4.7. Clearly (2) implies (3). It is left to show that (3) implies (1).

Given a point $(z,w) \in \Omega_u$, we saw in Lemma 3.5 that, for every $(\lambda, \mu)$ satisfying (11), there is a homomorphism $\rho_{z,w,\lambda,\mu} : \mathbb{C}[\mathbb{F}_u^+] \to T_2$. For $(z,w) = (0,0)$ equation (11) holds for every pair $(\lambda, \mu)$. Since $\rho_{0,0,\lambda,\mu}$ vanishes off a finite dimensional subspace, it is a bounded homomorphism. In fact, for every $(\lambda, \mu)$, $\|\rho_{0,0,\lambda,\mu}\| \le 1 + \|\lambda\| + \|\mu\|$.

Given $\Psi$ and $(z,w)$ as in (3), for every $(\lambda, \mu) \in \mathbb{C}^n \times \mathbb{C}^m$, $\rho_{0,0,\lambda,\mu} \circ \Psi^{-1}$ is a homomorphism on $\mathbb{C}[\mathbb{F}_u^+]$ and, thus, it is of the form $\rho_{z,w,\lambda',\mu'}$ for some (unique) $(\lambda', \mu')$ satisfying (11). Write $\psi(\lambda, \mu) = (\lambda', \mu')$ and note that this defines a continuous map. To prove the continuity, suppose $(\lambda_n, \mu_n) \to (\lambda, \mu)$ and write $\rho_n$ for $\rho_{0,0,\lambda_n,\mu_n}$ and $\rho$ for $\rho_{0,0,\lambda,\mu}$. Then (using the estimate on the norm of $\rho_{0,0,\lambda,\mu}$) there is some $M$ such that $\|\rho_n\| \le M$ for all $n$ and $\|\rho\| \le M$. For every $Y \in \mathbb{C}[\mathbb{F}_u^+]$, $\rho_n(Y) \to \rho(Y)$. Now fix $X \in \mathcal{A}_u$ and $\epsilon > 0$. There is some $Y \in \mathbb{C}[\mathbb{F}_u^+]$ such that $\|X - Y\| \le \epsilon$ and there is some $N$ such that $\|\rho_n(Y) - \rho(Y)\| \le \epsilon$ for $n \ge N$. Thus, for such $n$, $\|\rho_n(X) - \rho(X)\| \le (2M + 1)\epsilon$. Setting $X = \Psi^{-1}(L_{e_i})$, we get $\lambda'_n \to \lambda'$, and similarly for $\mu'$.

If $(z,w)$ is not in $\Omega_u^0$, then the set of all $(\lambda, \mu)$ satisfying (11) is a subspace of $\mathbb{C}^n \times \mathbb{C}^m$ of dimension strictly smaller than $n + m$ and, as is shown above, it contains the continuous image (under the injective map $\psi$) of $\mathbb{C}^n \times \mathbb{C}^m$. This is impossible.
□

5 Isomorphic algebras
In this section we shall find conditions for algebras $\mathcal{A}_u$ and $\mathcal{A}_v$ to be (isometrically) isomorphic. The characterisation also applies to the weak star closed algebras $\mathcal{L}_u$.

We start by considering a special type of isomorphism. We shall now assume that the set $\{n, m\}$ for both algebras is the same. In fact, by interchanging $E$ and $F$, we can assume that the corresponding dimensions are the same and the algebras are defined on $\mathcal{F}(n,m,u)$ and $\mathcal{F}(n,m,v)$ respectively. This assumption will be in place in the discussion below up to the end of Lemma 5.5.

The algebra $\mathcal{A}_u$ carries a natural $\mathbb{Z}_+^2$-grading, with the $(k,l)$ labeled subspace being spanned by products of the form $L_{e_{i_1}} L_{e_{i_2}} \cdots L_{e_{i_k}} L_{f_{j_1}} L_{f_{j_2}} \cdots L_{f_{j_l}}$. Also, the total length of such operators provides a natural $\mathbb{Z}_+$-grading. Note that an algebra isomorphism $\Psi : \mathcal{A}_u \to \mathcal{A}_v$ which respects the $\mathbb{Z}_+$-grading is determined by a linear map between the spans of the generators $L_{e_1}, \dots, L_{e_n}, L_{f_1}, \dots, L_{f_m}$. Here we use the same notation for the generators of $\mathcal{A}_u$ and $\mathcal{A}_v$. Such an isomorphism will be called graded.

We now consider two types of graded isomorphisms, namely, either bigraded, as in the following definition, or, in case $n = m$, bigraded after relabeling generators.

Definition 5.1 (i) An isomorphism $\Psi : \mathcal{A}_u \to \mathcal{A}_v$ is said to be a bigraded isomorphism if there are unitary matrices $A$ ($n \times n$) and $B$ ($m \times m$) such that
$$\Psi(L_{e_i}) = \sum_j a_{i,j} L_{e_j}, \qquad \Psi(L_{f_k}) = \sum_l b_{k,l} L_{f_l}.$$
(ii) If $m = n$ and $\Psi$ is a graded isomorphism such that
$$\Psi(L_{e_i}) = \sum_j a_{i,j} L_{f_j}, \qquad \Psi(L_{f_k}) = \sum_l b_{k,l} L_{e_l}$$
for $n \times n$ unitary matrices $A$ and $B$, then we say that $\Psi$ is a graded exchange isomorphism.

We write $\Psi_{A,B}$ for the bigraded isomorphism (as in (i)) and $\tilde{\Psi}_{A,B}$ for the graded exchange isomorphism. Abusing notation, we write $\Psi(e_i) = \sum_j a_{i,j} e_j$ instead of $\Psi(L_{e_i}) = \sum_j a_{i,j} L_{e_j}$ for a bigraded isomorphism (and similarly for the other expressions). For unitary permutation matrices the following lemma was proved in [10, Theorem 5.1(iii)].
Lemma 5.2 (i) If ΨA,B is a bigraded isomorphism then
\[ (A \otimes B)v = u(A \otimes B) \tag{20} \]
where A ⊗ B is the mn × mn matrix whose (i, j), (k, l) entry is ai,k bj,l.
(ii) If m = n and Ψ̃A,B is a graded exchange isomorphism then
\[ (A \otimes B)\tilde{v} = u(A \otimes B) \tag{21} \]
where ṽ(i,j),(k,l) = v̄(l,k),(j,i).

Proof. Assume Ψ = ΨA,B is a bigraded isomorphism. For i, j,
\[
\begin{aligned}
\Psi(e_i \otimes f_j) &= \Big(\sum_k a_{i,k} e_k\Big) \otimes \Big(\sum_l b_{j,l} f_l\Big)
 = \sum_{k,l} (A \otimes B)_{(i,j),(k,l)}\, e_k \otimes f_l \\
 &= \sum_{k,l,r,t} (A \otimes B)_{(i,j),(k,l)}\, v_{(k,l),(r,t)}\, f_t \otimes e_r
 = \sum_{r,t} ((A \otimes B)v)_{(i,j),(r,t)}\, f_t \otimes e_r .
\end{aligned}
\]
On the other hand,
\[
\Psi(e_i \otimes f_j) = \Psi\Big(\sum_{k,l} u_{(i,j),(k,l)}\, f_l \otimes e_k\Big)
 = \sum_{k,l,t,r} u_{(i,j),(k,l)}\, b_{l,t}\, a_{k,r}\, f_t \otimes e_r
 = \sum_{r,t} (u(A \otimes B))_{(i,j),(r,t)}\, f_t \otimes e_r .
\]
This proves equation (20). A similar argument can be used to verify equation (21). □

Definition 5.3 If u, v are mn × mn unitary matrices and there exist unitary matrices A and B satisfying (20), we say that u and v are product unitary equivalent.

Now suppose that A and B are unitary matrices satisfying (20). The same computation as in Lemma 5.2 shows that WA,B : E ⊗u F → E ⊗v F defined by
\[ W_{A,B}(e_i \otimes f_j) = \sum_{k,l} (A \otimes B)_{(i,j),(k,l)}\, e_k \otimes f_l \]
is a well defined unitary operator. Here the notation E ⊗u F indicates that this is E ⊗ F as a subspace of F(n, m, u). Similarly, one defines a unitary operator, also denoted WA,B, from E⊗k ⊗ F⊗l in F(n, m, u) to E⊗k ⊗ F⊗l in F(n, m, v) by
\[
W_{A,B}(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l})
 = \sum_{r_1,\dots,r_k,\,t_1,\dots,t_l} a_{i_1,r_1} \cdots a_{i_k,r_k}\, b_{j_1,t_1} \cdots b_{j_l,t_l}\;
 e_{r_1} \otimes \cdots \otimes e_{r_k} \otimes f_{t_1} \otimes \cdots \otimes f_{t_l} .
\]
This gives a well defined unitary operator WA,B : F(n, m, u) → F(n, m, v).

Lemma 5.4 For every i, j, write Aei = Σk ai,k ek and Bfj = Σl bj,l fl. Then, for g1, g2, . . . , gr in {e1, . . . , en, f1, . . . , fm},
\[ W_{A,B}(g_1 \otimes g_2 \otimes \cdots \otimes g_r) = Cg_1 \otimes Cg_2 \otimes \cdots \otimes Cg_r \tag{22} \]
where Cgi = Agi if gi ∈ {e1, . . . , en} and Cgi = Bgi if gi ∈ {f1, . . . , fm}.

Proof. If the gi's are ordered such that the first ones are from E and the following vectors are from F, then the result is clear from the definition of WA,B.
Since we can get any other arrangement by starting with one of this kind and interchanging pairs gl, gl+1 successively (with gl ∈ {e1, . . . , en} and gl+1 ∈ {f1, . . . , fm}), it is enough to show that if (22) holds for a given arrangement of e's and f's and we apply such an interchange, then it still holds. So, we assume gl = ek, gl+1 = fs and we write g′ = g1 ⊗ · · · ⊗ gl−1, g′′ = gl+2 ⊗ · · · ⊗ gr, Cg′ = Cg1 ⊗ · · · ⊗ Cgl−1 and Cg′′ = Cgl+2 ⊗ · · · ⊗ Cgr and compute
\[ W_{A,B}(g' \otimes f_s \otimes e_k \otimes g'') = W_{A,B}\Big(\sum_{i,j} \bar{u}_{(i,j),(k,s)}\, g' \otimes e_i \otimes f_j \otimes g''\Big). \]
Using our assumption, this is equal to
\[
\begin{aligned}
\sum_{i,j} \bar{u}_{(i,j),(k,s)}\, Cg' \otimes \Big(\sum_t a_{i,t} e_t\Big) \otimes \Big(\sum_q b_{j,q} f_q\Big) \otimes Cg''
 &= \sum_{i,j,t,q} \bar{u}_{(i,j),(k,s)}\, a_{i,t}\, b_{j,q}\, Cg' \otimes e_t \otimes f_q \otimes Cg'' \\
 &= \sum_{i,j,t,q,d,p} \bar{u}_{(i,j),(k,s)}\, a_{i,t}\, b_{j,q}\, v_{(t,q),(d,p)}\, Cg' \otimes f_p \otimes e_d \otimes Cg'' \\
 &= \sum_{i,j,t,q,d,p} (u^*)_{(k,s),(i,j)}\, (A \otimes B)_{(i,j),(t,q)}\, v_{(t,q),(d,p)}\, Cg' \otimes f_p \otimes e_d \otimes Cg'' \\
 &= \sum_{d,p} (A \otimes B)_{(k,s),(d,p)}\, Cg' \otimes f_p \otimes e_d \otimes Cg'' \\
 &= \sum_{d,p} a_{k,d}\, b_{s,p}\, Cg' \otimes f_p \otimes e_d \otimes Cg''
 = Cg' \otimes Bf_s \otimes Ae_k \otimes Cg'',
\end{aligned}
\]
where the fourth equality uses (20) in the form u∗(A ⊗ B)v = A ⊗ B. This completes the proof. □

The following lemma was proved in [10, Section 7] and it shows that the necessary conditions of Lemma 5.2 are also sufficient conditions on A ⊗ B for the existence of a unitarily implemented isomorphism ΨA,B.

Lemma 5.5 For unitary matrices A, B satisfying (20) and X ∈ Au, the map X ↦ WA,B X W∗A,B is the bigraded isomorphism ΨA,B : Au → Av. Moreover ΨA,B extends to a unitary isomorphism Lu → Lv, and similar statements hold for graded exchange isomorphisms (when m = n).

Proof. It will suffice to show the equality ΨA,B(X)WA,B = WA,B X for X = Lei and for X = Lfj. Let X = Lfj and apply both sides of the equation to ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl. Using Lemma 5.4, we get
\[
\begin{aligned}
\Psi_{A,B}(L_{f_j}) W_{A,B}(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l})
 &= \sum_r b_{j,r} L_{f_r}(Ae_{i_1} \otimes \cdots \otimes Ae_{i_k} \otimes Bf_{j_1} \otimes \cdots \otimes Bf_{j_l}) \\
 &= Bf_j \otimes Ae_{i_1} \otimes \cdots \otimes Ae_{i_k} \otimes Bf_{j_1} \otimes \cdots \otimes Bf_{j_l} \\
 &= W_{A,B}(f_j \otimes e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}) \\
 &= W_{A,B} L_{f_j}(e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes f_{j_1} \otimes \cdots \otimes f_{j_l}).
\end{aligned}
\]
This proves the equality for X = Lfj. The proof for X = Lei is similar.
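Relation (20) and Definition 5.3 can be checked numerically on a small example. The following is a minimal sketch in plain Python, not part of the paper's argument: the helper names (`matmul`, `adjoint`, `kron`, `diag`, `max_abs_diff`) and the particular unitaries A and B are illustrative assumptions. It starts from the diagonal unitary u = diag(1, −1, −1, 1), sets v = (A ⊗ B)∗ u (A ⊗ B), and confirms that v is unitary and that (A ⊗ B)v = u(A ⊗ B), so that u and v are product unitary equivalent in the sense of Definition 5.3.

```python
import cmath
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adjoint(X):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def kron(A, B):
    """A (x) B with ((i,j),(k,l)) entry a[i][k] * b[j][l].

    The index pair (i, j) is stored at position i * m + j, matching the
    ordering 11, 12, 21, 22 used in the text for n = m = 2.
    """
    n, m = len(A), len(B)
    return [[A[i][k] * B[j][l] for k in range(n) for l in range(m)]
            for i in range(n) for j in range(m)]

def diag(entries):
    size = len(entries)
    return [[entries[i] if i == j else 0 for j in range(size)] for i in range(size)]

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j])
               for i in range(len(X)) for j in range(len(X[0])))

# Illustrative (assumed) 2 x 2 unitaries: a rotation and a diagonal phase matrix.
angle = 0.7
A = [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]
B = [[1, 0], [0, cmath.exp(0.3j)]]

u = diag([1, -1, -1, 1])
AB = kron(A, B)

# Define v so that (A (x) B) v = u (A (x) B), i.e. relation (20).
v = matmul(adjoint(AB), matmul(u, AB))

residual_20 = max_abs_diff(matmul(AB, v), matmul(u, AB))
unitarity_defect = max_abs_diff(matmul(v, adjoint(v)), diag([1, 1, 1, 1]))
print(residual_20, unitarity_defect)  # both at rounding-error level
```

The same helpers can be reused to test a candidate pair (A, B) against given u and v: the pair realises a bigraded isomorphism only if `residual_20` vanishes up to rounding.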
□

At this point we drop our assumption that the set {n, m} is the same for both algebras and write {n′, m′} for the dimensions associated with Av. We shall see in Proposition 5.8 (and Remark 5.11(i)) that, if the algebras are isomorphic, then necessarily {n, m} = {n′, m′}.

Given an isomorphism Ψ : Au → Av we get a homeomorphism θΨ : Ωu → Ωv (as in the discussion preceding Theorem 4.8). The arguments used in the proof of Theorem 4.8 to show that part (3) implies part (1) apply also to isomorphisms and thus θΨ(0, 0) ∈ Ω0v.

Proposition 5.6 Let Ψ : Au → Av be an (algebraic) isomorphism. Then θΨ(Ω0u) = Ω0v and θΨ(Ω0u ∩ (Bn × Bm)) = Ω0v ∩ (Bn × Bm).

Proof. Fix (z, w) in Ω0u and use Theorem 4.8 to get an automorphism Φ of Au such that θΦ(0, 0) = (z, w). But then θΨ◦Φ(0, 0) = θΨ(z, w) and, as we noted above, this implies that θΨ(z, w) ∈ Ω0v. It follows that θΨ(Ω0u) ⊆ Ω0v and, applying this to Ψ−1, the proposition follows. □

Lemma 5.7 The map θΨ is a biholomorphic map.

Proof. The coordinate functions for θΨ are (z, w) ↦ α(z,w)(Ψ−1(ei)) (and (z, w) ↦ α(z,w)(Ψ−1(fj))) where α(z,w) is the character associated with (z, w) by Proposition 3.1. For every Y ∈ C[F+v], α(z,w)(Y) is a polynomial in (z, w) (for (z, w) ∈ Ωv) and, therefore, an analytic function. Each X ∈ Av is a norm limit of elements in C[F+v] and, thus, α(z,w)(X) is an analytic function, being a uniform limit of analytic functions on compact subsets of Ωv. Hence, for every (z, w) ∈ Ωv, there is a power series that converges in some non-empty circular neighborhood C of (z, w) and represents α(z,w)(X) on C ∩ Ωv. Taking for X the operators Ψ−1(ei) and Ψ−1(fj), we see that θΨ is analytic. The same arguments apply to the inverse map. □

The facts in the following proposition were obtained in [10] in the case of permutation matrices.

Proposition 5.8 Let Ψ : Au → Av be an algebraic isomorphism and let θΨ : Ωu → Ωv be the associated map between the character spaces. Suppose θΨ(0, 0) = (0, 0). Then we have the following.
(1) {n, m} = {n′, m′} and we shall assume that n = n′ and m = m′ (interchanging E and F and changing u to u∗ if necessary).
(2) There are unitary matrices U (n × n) and V (m × m) such that θΨ(z, w) = (Uz, Vw) for (z, w) ∈ Ωu. (If n = m it is also possible that θΨ(z, w) = (Vw, Uz).)
(3) If Ψ is an isometric isomorphism, then Ψ is a bigraded isomorphism. (Or, if m = n, it may be a graded exchange isomorphism.)

Proof. The proof of Proposition 6.3 in [10], giving (1) and (2) in the permutation case, is based essentially on Schwarz's lemma for holomorphic maps from the unit disc. It applies without change to the case of unitary matrices. For (3) we may assume m = m′ and n = n′. From (2) we have, for each i, Ψ(Lei) = LUei + X where X is a sum of higher order terms. Since Ψ(Lei) is a contraction and LUei is an isometry it follows, as in the proof of Voiculescu's theorem, that X = 0. Similarly, Ψ(Lfj) = LVfj and it follows that Ψ is bigraded. □

Since every graded isomorphism Ψ satisfies θΨ(0, 0) = (0, 0), we conclude the following.

Corollary 5.9 Every graded isometric isomorphism is bigraded if n ≠ m and otherwise is either bigraded or is a graded exchange isomorphism.

Theorem 5.10 The following statements are equivalent for unitary matrices u, v in Mn(C) ⊗ Mm(C).
(i) There is an isometric isomorphism Ψ : Au → Av.
(ii) There is a graded isometric isomorphism Ψ : Au → Av.
(iii) The matrices u, v are product unitary equivalent or (in case n = m) the matrices u, ṽ are product unitary equivalent, where ṽ(i,j),(k,l) = v̄(l,k),(j,i).
(iv) There is an isometric w*-continuous isomorphism Γ : Lu → Lv.

Proof. Given Ψ in (i), let (z, w) = θΨ(0, 0). By Proposition 5.6, (z, w) lies in the interior of Ω0v. By Theorem 4.8 there is a completely isometric automorphism Φ of Av such that θΦ(0, 0) = (z, w) and, therefore, θΦ−1◦Ψ(0, 0) = (0, 0). By Proposition 5.8, Φ−1 ◦ Ψ is a graded isometric isomorphism and (ii) holds. Lemma 5.2 shows that (ii) implies (iii) and Lemma 5.5 that (iii) implies (i).
Finally, (iii) implies (iv) follows from Lemma 5.5, and (iv) implies (ii) is entirely similar to (i) implies (ii). □

Remark 5.11 The argument at the beginning of the proof of Theorem 5.10 shows that, whenever Au and Av are isomorphic, we have {n, m} = {n′, m′}.

Theorem 5.12 For n ≠ m the isometric automorphisms of Au are of the form ΨA,BΘz,w where (z, w) ∈ Ω0u and (A ⊗ B)u = u(A ⊗ B). In case n = m the isometric automorphisms include, in addition, those of the form Ψ̃A,BΘz,w where (A ⊗ B)ũ = u(A ⊗ B).

6 Special cases

6.1 The case n = m = 2

Even in the low dimensions n = m = 2 there are many isomorphism classes and special cases. Note that the product unitary equivalence class orbit O(u) of the 4 × 4 unitary matrix u takes the form

O(u) = {(A ⊗ B)u(A ⊗ B)∗ : A, B ∈ SU2(C)},

and so the product unitary equivalence classes are parametrised by the set of orbits, U4(C)/Ad(SU2(C) × SU2(C)). This set admits a 10-fold parametrisation, since, as is easily checked, U4(C) and SU2(C) × SU2(C) are real algebraic varieties of dimension 16 and 6 respectively. It follows that the isometric isomorphism types of the algebras Au admit a 10-fold real parametrisation, with coincidences only for pairs O(u), O(v) with u = ṽ.

We now look at some special cases in more detail. Let d = dim Ker(u − I).

Case I: d = 0
For every (z, w) ∈ B2 × B2, we have (z, w) ∈ Ωu if and only if the vector (z1w1, z1w2, z2w1, z2w2)^t lies in Ker(u − I). Thus, in case I, Ωu is as small as possible and is equal to

Ωmin := (B2 × {0}) ∪ ({0} × B2).

It follows from Lemma 3.4 that, in this case, Ω0u = {(0, 0)}. By Proposition 5.8 every isometric automorphism of Au is graded and the isometric automorphisms of Au are given by pairs (A, B) of unitary matrices such that A ⊗ B either commutes with u or intertwines u and ũ.

Case II: d = 1
When d = 1 it still follows from Lemma 3.4 that Ω0u = {(0, 0)} but now it is possible for Ωu to be larger than Ωmin.
In fact, if the non-zero vector (a, b, c, d)^t spanning Ker(u − I) satisfies ad ≠ bc then Ωu = Ωmin, but if ad = bc then the 2 × 2 matrix with rows (a, b) and (c, d) is of rank one and can be written as (z1, z2)^t (w1, w2). Thus, (z, w) ∈ Vu and Ωu contains some (z, w) with non-zero z and w.

Since Ω0u = {(0, 0)}, it is still true that isometric isomorphisms and automorphisms of these algebras are graded.

Case III: d = 2
When d = 2 it is possible that Ω0u will contain non-zero vectors (z, w) but, as Lemma 3.4 shows, it does not contain a vector with both z ≠ 0 and w ≠ 0. All other possibilities may occur. For example, write u1, u2 and u3 for the three diagonal matrices

u1 = diag(1, −1, −1, 1), u2 = diag(1, −1, 1, −1), u3 = diag(1, 1, −1, −1).

Using the definition of the core, we easily see that

Ω0u1 = {(0, 0)}, Ω0u2 = {(0, 0, w1, 0) : |w1| ≤ 1}, Ω0u3 = {(z1, 0, 0, 0) : |z1| ≤ 1}.

Thus, the only isometric automorphisms of Au1 are graded, and the isometric automorphisms of Au2 are formed by composing graded automorphisms with automorphisms of the type described in Proposition 4.7 (with z = (0, 0) and w = (w1, 0)). Similarly, for the automorphisms of Au3, we use Proposition 4.6.

Case IV: d = 3
In this case we are able to obtain an explicit 2-fold parametrization of the isomorphism types of the algebra Au.

Every 4 × 4 unitary matrix u with dim(Ker(u − I)) = 3 is determined by a unit eigenvector x and its eigenvalue λ (different from 1), so that ux = λx, ‖x‖ = 1, |λ| = 1 and λ ≠ 1.

Suppose u and v are product unitary equivalent; that is, (A ⊗ B)u = v(A ⊗ B) for unitary matrices A, B, and write x, λ for the unit eigenvector and eigenvalue of u. (Of course, x is determined only up to a multiple by a scalar of absolute value 1.) Then y = (A ⊗ B)x is a unit eigenvector of v with eigenvalue λ.

For unit vectors x, y (in C4) we write x ∼ y if there are unitary (2 × 2) matrices A, B with y = (A ⊗ B)x. For the statement of the next lemma recall that the entries of the vectors x and y in C4 are indexed by {(i, j) : 1 ≤ i, j ≤ 2}.
Lemma 6.1 For a vector x = {x(i,j)} in C4, write c(x) for the 2 × 2 matrix
\[ c(x) = \begin{pmatrix} x_{(1,1)} & x_{(1,2)} \\ x_{(2,1)} & x_{(2,2)} \end{pmatrix}. \]
Then x ∼ y if and only if there are unitary matrices A, B such that c(x) = Ac(y)B. (In this case, we shall write c(x) ∼ c(y).)

Proof. Suppose y = (A ⊗ B)x for some unitary matrices A = (ai,j) and B = (bi,j). Then
\[ c(y)_{i,j} = y_{(i,j)} = \sum_{k,l} (A \otimes B)_{(i,j),(k,l)}\, x_{(k,l)} = \sum_{k,l} a_{i,k}\, b_{j,l}\, c(x)_{k,l} = (A\, c(x)\, B^{t})_{i,j}, \]
and B^t is unitary whenever B is. The converse follows by reading the same computation backwards. □

Using the polar decomposition c(x) = U|c(x)| and diagonalizing |c(x)| = V diag(a, d) V∗, we find that c(x) ∼ diag(a, d) = c(y) where y = (a, 0, 0, d) and a, d ≥ 0. Then a, d (the eigenvalues of |c(x)|) are uniquely determined once we choose them such that a ≤ d and, if ‖x‖ = 1, then a² + d² = 1 (so that 0 ≤ a ≤ 1/√2 and a determines d).

In this way, we associate to each unitary matrix u as above a pair (a, λ) with 0 ≤ a ≤ 1/√2, λ ≠ 1 and |λ| = 1. Using Lemma 6.1 and the discussion preceding it, we have the following.

Corollary 6.2 For every 4 × 4 unitary matrix u with dim(Ker(u − I)) = 3, there are numbers λ (with |λ| = 1 and λ ≠ 1) and a (0 ≤ a ≤ 1/√2) such that u and v are product unitary equivalent if and only if they have the same a, λ.

Proof. Let u and v be unitary matrices with dim(Ker(u − I)) = dim(Ker(v − I)) = 3 and let (a, λ), (b, µ) be the pairs associated to u and v (respectively) as above. Also write x for the unit eigenvector of u associated to the eigenvalue λ and let y be the unit eigenvector of v associated to µ.

Suppose u and v are product unitary equivalent. Then they are unitarily equivalent and, thus, λ = µ. Write (A ⊗ B)u = v(A ⊗ B) for unitary matrices A, B. As we saw above, y can be chosen to be (A ⊗ B)x so that x ∼ y and, by Lemma 6.1, c(x) ∼ c(y). It follows that a = b.

Conversely, assume that a = b and λ = µ. Then c(x) ∼ c(y) and, thus, x ∼ y so we can write y = (A ⊗ B)x for some unitary matrices A, B. Writing v′ = (A ⊗ B)u(A ⊗ B)∗, we find that y is the unit eigenvector of v′ associated to λ. Thus v = v′, completing the proof.
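The invariant a attached to an eigenvector x above is the smaller singular value of c(x), and for a 2 × 2 matrix it can be computed without a full decomposition: the two singular values satisfy a² + d² = ‖x‖² and ad = |det c(x)|. The following plain-Python sketch illustrates this and checks that a does not change when x is replaced by (A ⊗ B)x. The function names `invariant_a` and `apply_product`, and the sample unitaries, are assumptions made for illustration only.

```python
import cmath
import math

def apply_product(A, B, x):
    """Return y = (A (x) B) x for x = [x11, x12, x21, x22].

    Uses y_(i,j) = sum_{k,l} a[i][k] * b[j][l] * x_(k,l), the entry formula
    for A (x) B given in Lemma 5.2.
    """
    return [sum(A[i][k] * B[j][l] * x[2 * k + l]
                for k in range(2) for l in range(2))
            for i in range(2) for j in range(2)]

def invariant_a(x):
    """Smaller singular value of c(x) = [[x11, x12], [x21, x22]].

    The squared singular values are the roots of
    t^2 - ||x||^2 t + |det c(x)|^2 = 0.
    """
    x11, x12, x21, x22 = x
    norm2 = sum(abs(entry) ** 2 for entry in x)
    det = abs(x11 * x22 - x12 * x21)
    disc = max(norm2 ** 2 - 4 * det ** 2, 0.0)  # guard against rounding
    return math.sqrt(max((norm2 - math.sqrt(disc)) / 2, 0.0))

# Illustrative (assumed) unitaries and a unit vector already in normal form.
angle = 1.1
A = [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]
B = [[cmath.exp(0.4j), 0], [0, cmath.exp(-0.9j)]]
x = [0.6, 0, 0, 0.8]          # (a, 0, 0, d) with a = 0.6, d = 0.8
y = apply_product(A, B, x)    # a vector with x ~ y

print(invariant_a(x), invariant_a(y))  # both approximately 0.6
print(invariant_a([1, 0, 0, 0]))       # product vector e1 (x) f1: a = 0
```

In this form, a = 0 exactly when det c(x) = 0, that is, when x is a product vector z ⊗ w; this is the case a = 0 that is singled out in the discussion following Corollary 6.2.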
□

For every a, λ as in Corollary 6.2 we let u(a, λ) be the following 4 × 4 matrix:
\[
u(a, \lambda) = \begin{pmatrix}
(\lambda - 1)a^2 + 1 & 0 & 0 & (\lambda - 1)a(1 - a^2)^{1/2} \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
(\lambda - 1)a(1 - a^2)^{1/2} & 0 & 0 & \lambda + (1 - \lambda)a^2
\end{pmatrix}.
\]
It is a straightforward computation to verify that dim(Ker(u(a, λ) − I)) = 3 and that λ is an eigenvalue of u(a, λ) with eigenvector (a, 0, 0, (1 − a²)^{1/2})^t. Thus the pair associated to u(a, λ) is a, λ and we have

Corollary 6.3 Every matrix u with dim(Ker(u − I)) = 3 is product unitary equivalent to a unique matrix of the form u(a, λ) (with 0 ≤ a ≤ 1/√2, |λ| = 1 and λ ≠ 1).

Using the definition of the core, we immediately get the following.

Proposition 6.4 If a = 0, |λ| = 1, λ ≠ 1, then Ωu(0,λ) is the union

{(z1, z2, w1, 0) : z ∈ B2; |w1| ≤ 1} ∪ {(z1, 0, w1, w2) : w ∈ B2; |z1| ≤ 1},

and Ω0u(0,λ) = {(z1, 0, w1, 0) : |z1| ≤ 1; |w1| ≤ 1}. If a ≠ 0 then

Ωu(a,λ) = {(z1, z2, w1, w2) : az1w1 + (1 − a²)^{1/2} z2w2 = 0, (z, w) ∈ B2 × B2}

and Ω0u(a,λ) = {(0, 0)}.

Proof. The space Ωu(a,λ) consists of points (z, w) for which
\[ (z_1w_1, z_1w_2, z_2w_1, z_2w_2)^t = u(a, \lambda)\,(z_1w_1, z_1w_2, z_2w_1, z_2w_2)^t, \]
that is, for which
\[
\begin{aligned}
((\lambda - 1)a^2 + 1)\, z_1w_1 + (\lambda - 1)a(1 - a^2)^{1/2}\, z_2w_2 &= z_1w_1, \\
(\lambda - 1)a(1 - a^2)^{1/2}\, z_1w_1 + (\lambda + (1 - \lambda)a^2)\, z_2w_2 &= z_2w_2 .
\end{aligned}
\]
If a = 0 this implies z2w2 = 0, while if a ≠ 0 then (z1w1, 0, 0, z2w2) is a fixed vector for u(a, λ) and so, for some scalar µ, (z1w1, z2w2) = µ((1 − a²)^{1/2}, −a). The descriptions of Ωu(a,λ) follow. From the definition of the core and the fact that here C12 = C21 = 0 and
\[
C_{11} = \begin{pmatrix} (\lambda - 1)a^2 & 0 \\ 0 & (\lambda - 1)a(1 - a^2)^{1/2} \end{pmatrix},
\qquad
C_{22} = \begin{pmatrix} (\lambda - 1)a(1 - a^2)^{1/2} & 0 \\ 0 & (\lambda - 1) + (1 - \lambda)a^2 \end{pmatrix},
\]
we see that for a = 0 we have w2 = z2 = 0 while for a ≠ 0, z1 = z2 = w1 = w2 = 0. □

Recall that, for a 4 × 4 unitary matrix v we defined the matrix ṽ by ṽ(i,j),(k,l) = v̄(l,k),(j,i) and showed (Theorem 5.10) that Au and Av are isometrically isomorphic if and only if either u and v or u and ṽ are product unitary equivalent.

Now, it is easy to check that ũ(a, λ) = u(a, λ̄) and so, using Proposition 3.3 and previous results, we obtain the following.
Theorem 6.5 Let 0 ≤ a, b ≤ 1/√2, |λ| = |µ| = 1, λ, µ ≠ 1. Then
(1) Au(a,λ) and Au(b,µ) are isometrically isomorphic if and only if a = b and λ equals either µ or µ̄.
(2) When a ≠ 0 the isometric automorphisms of Au(a,λ) are all bigraded.
(3) If a = 0 then there are isometric isomorphisms that are not graded.

Case V: d = 4
This is the case where u = I. We have Ωu = Ω0u = Bn × Bm and the isometric automorphisms are obtained by composing graded automorphisms and the automorphisms described by Proposition 4.6 and Proposition 4.7.

6.2 Permutation unitary relation algebras

With more structure assumed for a class of unitaries u it may be possible to derive an appropriately more definitive classification of the algebras Au. We indicate this now for the class of permutation unitaries. A fuller discussion is in [10].

Let θ ∈ S4, viewed as a permutation of the product set {1, 2} × {1, 2} = {11, 12, 21, 22}. Associate with θ the matrix uθ = (u(i,j),(k,l)) where u(i,j),(k,l) = 1 if (k, l) = θ(i, j) and is zero otherwise. If τ ∈ S4 is product conjugate to θ in the sense that τ = σθσ−1 with σ in S2 × S2, then it follows that uτ and uθ are product unitarily equivalent. Thus we need only consider product conjugacy classes. It turns out that these classes are the same as the product unitary equivalence classes of the matrices uθ.

It can be helpful to view a permutation θ in Snm as a permutation of the entries of an n × m rectangular array, since product conjugacy corresponds to conjugation through row permutations and column permutations. Considering this for n = m = 2 one can verify firstly that there are at most 9 isomorphism types for the algebras Aθ, corresponding to the following permutations:

θ1 = id, θ2 = (11, 12), θ3 = (11, 22),
θ4a = (11, 22, 12), θ4b = θ4a−1 = (11, 12, 22),
θ5 = ((11, 12), (21, 22)), θ6 = ((11, 22), (12, 21)),
θ7 = (11, 12, 22, 21), θ8 = (11, 12, 21, 22).
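The count of at most 9 candidate types can be reproduced by brute force. The sketch below, in plain Python, enumerates S4 acting on {11, 12, 21, 22}, groups permutations by product conjugacy (conjugation by S2 × S2), and then also identifies θ with its generator exchange θ̃. Two points are assumptions of this sketch rather than statements from the text: the function and variable names are illustrative, and the formula θ̃ = τθ−1τ with τ(i, j) = (j, i) is what one obtains by applying ṽ(i,j),(k,l) = v̄(l,k),(j,i) to the permutation matrix uθ.

```python
from itertools import permutations

# Points of the product set {1,2} x {1,2}, written 11, 12, 21, 22 in the text.
POINTS = [(1, 1), (1, 2), (2, 1), (2, 2)]

def compose(p, q):
    """Return the permutation x -> p(q(x))."""
    return {x: p[q[x]] for x in POINTS}

def inverse(p):
    return {image: x for x, image in p.items()}

def conjugate(s, t):
    """Return s t s^{-1}."""
    return compose(compose(s, t), inverse(s))

def as_key(p):
    return tuple(p[x] for x in POINTS)

S4 = [dict(zip(POINTS, images)) for images in permutations(POINTS)]

# S2 x S2: permute the row index and the column index independently.
SWAPS = [{1: 1, 2: 2}, {1: 2, 2: 1}]
PRODUCT_GROUP = [{(i, j): (r[i], c[j]) for (i, j) in POINTS}
                 for r in SWAPS for c in SWAPS]

# Generator exchange (i, j) -> (j, i).
FLIP = {(i, j): (j, i) for (i, j) in POINTS}

def exchange(t):
    """Assumed form of theta-tilde: flip . theta^{-1} . flip."""
    return conjugate(FLIP, inverse(t))

def class_representatives(with_exchange):
    """One representative per class, found by exploring each orbit."""
    seen, reps = set(), []
    for t in S4:
        if as_key(t) in seen:
            continue
        reps.append(t)
        stack = [t]
        while stack:
            cur = stack.pop()
            if as_key(cur) in seen:
                continue
            seen.add(as_key(cur))
            stack.extend(conjugate(s, cur) for s in PRODUCT_GROUP)
            if with_exchange:
                stack.append(exchange(cur))
    return reps

product_classes = class_representatives(with_exchange=False)
isomorphism_candidates = class_representatives(with_exchange=True)
print(len(product_classes), len(isomorphism_candidates))  # 12 9
```

Under these assumptions there are 12 product conjugacy classes, which merge into 9 classes once exchange is allowed, matching the nine permutations listed above. The enumeration only bounds the number of isomorphism types from above; showing that the 9 classes are genuinely distinct requires the arguments in the text.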
The Gelfand spaces of the algebras Aθ (and Lθ) distinguish all of these algebras except for the pairs {θ4a, θ4b} and {θ7, θ8}. However, one can verify in both cases that neither the pair u, v nor the pair u, ṽ are product unitary equivalent. Theorem 5.10 now applies to yield the following result from [10].

Theorem 6.6 For n = m = 2 there are 9 isometric isomorphism classes for the algebras Aθ and for the algebras Lθ.

To a higher rank graph (Λ, d) in the sense of Kumjian and Pask [6] one can associate nonself-adjoint Toeplitz algebras AΛ, LΛ, as in Kribs and Power [5]. In the single vertex rank 2 case it is easy to see that AΛ is equal to the algebra Au for some permutation matrix u = uθ with θ in Snm. Thus Theorem 5.10 classifies these algebras in terms of product unitary equivalence restricted to Snm, as stated formally in the next theorem. In the rank 2 case this is a significant improvement on the results in [10] which, although covering general rank, were restricted to the case of trivial core for the character space. With θ̃ the permutation for the permutation matrix ũθ (which corresponds to generator exchange) we have:

Theorem 6.7 Let Λ1 and Λ2 be single vertex 2-graphs with relations determined by the permutations θ1 and θ2. Then the rank 2 graph algebras AΛ1, AΛ2 are isometrically isomorphic if and only if the pair θ1, θ2 or the pair θ1, θ̃2 are product unitary equivalent.

It is natural to expect that, as in the (2, 2) case, product unitary equivalence will correspond to product conjugacy.

7 Au as a subalgebra of a tensor algebra

Let En be the Toeplitz extension of the Cuntz algebra On and write H for the Fock space associated with E (that is, H = C ⊕ E ⊕ (E ⊗ E) ⊕ · · ·). Note that En acts naturally on H (by the “shift” or “creation” operators Li = Lei, 1 ≤ i ≤ n). In fact, Le1, . . . , Len generate En as a C∗-algebra.

Consider also the space F(F) ⊗ H = H ⊕ (F ⊗ H) ⊕ ((F ⊗ F) ⊗ H) ⊕ · · ·.
This space is isomorphic to F(E, F, u) and we write w : F(F) ⊗ H → F(E, F, u) for the isomorphism. It will be convenient to write wk for the restriction of w to the summand F⊗k ⊗ H (which is an isomorphism onto its image). Note that, for a fixed k, {w∗k Lei wk : 1 ≤ i ≤ n} is a set of n isometries with orthogonal ranges. Thus it defines a representation ρk of En on F⊗k ⊗ H (with ρk(Lei) = w∗k Lei wk). (Note that we are using Lei for the creation operators both on H and on F(E, F, u). This should cause no confusion.) We also write ρ∞ for the representation Σk ⊕ρk of En on F(F) ⊗ H (where ρ0 is the representation of En on H).

Let X be the column space Cm(En). This is a C∗-module over En. As a vector space it is the direct sum of m copies of En. The right module action of En on X is given by (ai) · b = (ai b) and the En-valued inner product is ⟨(ai), (bi)⟩ = Σi a∗i bi. For every 1 ≤ i ≤ n, we write S̃i for the operator in L(X) defined by
\[ \tilde{S}_i\,(a_j)_{j=1}^{m} = \Big(\sum_{j,k} u_{(i,j),(k,l)}\, L_{e_k} a_j\Big)_{l=1}^{m}. \]
Note that
\[
\begin{aligned}
\Big\langle \Big(\sum_{j,k} u_{(i,j),(k,l)} L_{e_k} a_j\Big)_{l=1}^{m},\ \Big(\sum_{j',k'} u_{(i,j'),(k',l)} L_{e_{k'}} b_{j'}\Big)_{l=1}^{m} \Big\rangle
 &= \sum_{j,j',k,k',l} \bar{u}_{(i,j),(k,l)}\, a_j^{*} L_{e_k}^{*} L_{e_{k'}} b_{j'}\, u_{(i,j'),(k',l)} \\
 &= \sum_{j,j'} (uu^{*})_{(i,j'),(i,j)}\, a_j^{*} b_{j'}
 = \sum_{j} a_j^{*} b_j = \langle (a_j), (b_j) \rangle .
\end{aligned}
\]
Thus S̃i is an isometry. A similar computation shows that these isometries have orthogonal ranges and, thus, this family defines a ∗-homomorphism ϕ : En → L(X), with ϕ(Lei) = S̃i, 1 ≤ i ≤ n, making X a C∗-correspondence over En (in the sense of [8] and [7]).

Once we have a correspondence we can form X ⊗ X and, more generally, X⊗k. Recall that to define X ⊗ X one defines the sesquilinear form ⟨x ⊗ y, x′ ⊗ y′⟩ = ⟨y, ϕ(⟨x, x′⟩)y′⟩ on the algebraic tensor product and then lets X ⊗ X be the Hausdorff completion. The right action of En on X ⊗ X is (x ⊗ y) · a = x ⊗ (y · a) and the left action is given by the map ϕ2, where ϕ2(a)(x ⊗ y) = ϕ(a)x ⊗ y. The definition of X⊗k is similar (and the left action map is denoted ϕk). For k = 0 we set X⊗0 = En and ϕ0 is defined by left multiplication. Also write ϕ∞ for Σk ⊕ϕk, the left action of En on F(X).
One can then define the Hilbert space X⊗k ⊗En H by defining the sesquilinear form ⟨x ⊗ h, y ⊗ k⟩ = ⟨h, ⟨x, y⟩k⟩ (x, y ∈ X⊗k) and applying the Hausdorff completion.

Now define the map v : X ⊗En H → F ⊗ H by setting
\[ v((a_i) \otimes h) = \sum_{i} f_i \otimes a_i h. \]
It is straightforward to check that this map is a well defined Hilbert space isomorphism. By induction, we also define maps vk : X⊗k ⊗En H → F⊗k ⊗ H by
\[ v_{k+1}((a_j) \otimes z) = \sum_{j} f_j \otimes v_k((\varphi_k(a_j) \otimes I_H)z) \tag{23} \]
for z ∈ X⊗k ⊗En H, where v0 is the identity map from En ⊗En H (which is isomorphic to H) to F⊗0 ⊗ H = H. Assume that vk is a Hilbert space isomorphism of X⊗k ⊗En H onto F⊗k ⊗ H and compute, for (aj), (bj) ∈ X and z, z′ ∈ X⊗k ⊗ H,
\[
\begin{aligned}
\langle v_{k+1}((a_j) \otimes z),\ v_{k+1}((b_j) \otimes z') \rangle
 &= \sum_{j,j'} \langle f_j \otimes v_k((\varphi_k(a_j) \otimes I_H)z),\ f_{j'} \otimes v_k((\varphi_k(b_{j'}) \otimes I_H)z') \rangle \\
 &= \sum_{j} \langle v_k((\varphi_k(a_j) \otimes I_H)z),\ v_k((\varphi_k(b_j) \otimes I_H)z') \rangle \\
 &= \sum_{j} \langle z,\ (\varphi_k(a_j^{*} b_j) \otimes I_H)z' \rangle
 = \langle (a_j) \otimes z,\ (b_j) \otimes z' \rangle .
\end{aligned}
\]
Thus, by induction, each map vk is a Hilbert space isomorphism and, summing up, we get a Hilbert space isomorphism

v∞ := Σk ⊕vk : F(X) ⊗En H → F(F) ⊗ H.

Lemma 7.1 v∞ is a Hilbert space isomorphism and intertwines the actions of En. That is, v∞ ◦ (ϕ∞(a) ⊗ IH) = ρ∞(a) ◦ v∞ for a ∈ En.

Proof. We show that, for every p ≥ 0 and a ∈ En, we have
\[ v_p \circ (\varphi_p(a) \otimes I_H) = \rho_p(a) \circ v_p. \tag{24} \]
The proof will proceed by induction on p. For p = 0 this is clear so we now assume that it holds for p. For 1 ≤ i ≤ n, (aj) ∈ X and z ∈ X⊗p ⊗ H, we have

vp+1((ϕp+1(Lei) ⊗ IH)((aj) ⊗ z)) = vp+1(ϕ(Lei)(aj) ⊗ z) = Σl,k,j u(i,j),(k,l) fl ⊗ vp((ϕp(Lek aj) ⊗ IH)z).

Using the induction hypothesis, this is equal to

Σl,k,j u(i,j),(k,l) fl ⊗ ρp(Lek)ρp(aj)vp z = Σl,k,j u(i,j),(k,l) fl ⊗ w∗p Lek wp ρp(aj)vp z = Σl,k,j u(i,j),(k,l) fl ⊗ ek ρp(aj)vp z = w∗∞ Σj ei ⊗ fj ⊗ ρp(aj)vp z = ρp+1(Lei) w Σj fj ⊗ ρp(aj)vp z.

Using the induction hypothesis again, we get ρp+1(Lei) w Σj fj ⊗ vp((ϕp(aj) ⊗ IH)z) = ρp+1(Lei)vp+1((aj) ⊗ z). This proves (24) for p + 1 and the generators of En. Since both ρp+1 and vp+1(ϕp+1(·) ⊗ IH)v∗p+1 are ∗-homomorphisms, (24) holds for p + 1 and every a ∈ En, completing the induction step.
Thus, (24) holds for every p and this implies the statement of the lemma. □

Write δl for the vector (aj) in X such that al = I and aj = 0 if l ≠ j. The tensor algebra T+(X) is generated by the operators Tδl (where Tδl is the creation operator on F(X) associated with δl) and the C∗-algebra ϕ∞(En). The latter algebra is generated (as a C∗-algebra) by the operators ϕ∞(Li) where {Li} is the set of generators of En. We have

Lemma 7.2 For every 1 ≤ i ≤ n, 1 ≤ j ≤ m and k ≥ 0,
(i) w ◦ vk ◦ (ϕ∞(Li) ⊗ IH) = Lei ◦ w ◦ vk.
(ii) w ◦ vk+1 ◦ (Tδj ⊗ IH) = Lfj ◦ w ◦ vk.

Proof. Part (i) follows from (24) and part (ii) from (23) (with δj in place of (aj)). □

Recalling that w ◦ v∞ is a unitary operator mapping F(X) ⊗ H onto F(E, F, u), we get

Theorem 7.3 (1) The algebra Au is unitarily isomorphic to the (norm closed) subalgebra of the tensor algebra T+(X) that is generated by {ϕ∞(Li), Tδj : 1 ≤ i ≤ n, 1 ≤ j ≤ m}.
(2) The (norm closed) subalgebra of B(F(E, F, u)) that is generated by {Lei, L∗ei, Lfj : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is unitarily isomorphic to the tensor algebra T+(X) (and contains Au).
(3) The (norm closed) subalgebra of B(F(E, F, u)) that is generated by {Lei, L∗fj, Lfj : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is unitarily isomorphic to a tensor algebra T+(Y) (and contains Au).

Proof. Parts (1) and (2) follow from Lemma 7.2. For part (3), note that one can interchange the roles of E and F. More precisely, one defines the C∗-module Y over Em to be Y = Cn(Em) and the left action of Em on Y by
\[ \varphi_Y(L_{f_l})\,(b_k)_{k=1}^{n} = \Big(\sum_{j,k} \bar{u}_{(i,j),(k,l)}\, L_{f_j} b_k\Big)_{i=1}^{n}. \]
This makes Y into a C∗-correspondence over Em and the rest of the proof proceeds along similar lines as above. □

Suppose m = 1. Then X is the correspondence associated with the automorphism α of En given by mapping Ti to Σj ui,j Tj, with j running from 1 to n (note that u, in this case, is an n × n matrix).
The tensor algebra T+(X) is the analytic crossed product En ×α Z+ and Au is unitarily isomorphic to the subalgebra of this analytic crossed product that can be written An ×α Z+. One can also embed Au in T+(Y) (as in Theorem 7.3(3)). Here Em is simply the (classical) Toeplitz algebra T and Y = Cn(T) with ϕY(Tz)(bk)k = (Σk ūi,k Tz bk)i (where Tz is the generator of T).

Remark 7.4 Since the automorphisms Θz,w and ΨA,B of Au are both unitarily implemented, they can be extended to T+(X). It is easy to check that they map T+(X) into itself and, thus, are automorphisms of T+(X). Hence, at least when n ≠ m, every automorphism of Au can be extended to an automorphism of the tensor algebra T+(X) that contains it (see Theorem 5.12).

References

[1] K.R. Davidson, Free semigroup algebras: a survey, Systems, approximation, singular integral operators, and related topics (Bordeaux, 2000), 209–240, Oper. Theory Adv. Appl. 129, Birkhäuser, Basel, 2001.
[2] K.R. Davidson and D.R. Pitts, The algebraic structure of noncommutative analytic Toeplitz algebras, Math. Ann. 311 (1998), 275–303.
[3] N. Fowler, Discrete product systems of Hilbert bimodules, Pacific J. Math. 204 (2002), 335–375.
[4] E. Katsoulis and D.W. Kribs, Isomorphisms of algebras associated with directed graphs, Math. Ann. 330 (2004), 709–728.
[5] D.W. Kribs and S.C. Power, The H∞ algebras of higher rank graphs, Math. Proc. of the Royal Irish Acad. 106 (2006), 199–218.
[6] A. Kumjian and D. Pask, Higher rank graph C∗-algebras, New York J. Math. 6 (2000), 1–20.
[7] P. Muhly and B. Solel, Tensor algebras over C∗-correspondences (Representations, dilations, and C∗-envelopes), J. Functional Anal. 158 (1998), 389–457.
[8] M. Pimsner, A class of C∗-algebras generalizing both Cuntz-Krieger algebras and crossed products by Z, in Free Probability Theory, D. Voiculescu, Ed., Fields Institute Communications 12, 189–212, Amer. Math. Soc., Providence, 1997.
[9] G. Popescu, Von Neumann inequality for (B(H)n)1, Math.
Scand. 68 (1991), 292–304.
[10] S.C. Power, Classifying higher rank analytic Toeplitz algebras, preprint 2006, preprint Archive no. math.OA/0604630.
[11] B. Solel, You can see the arrows in a quiver algebra, J. Australian Math. Soc. 77 (2004), 111–122.
[12] B. Solel, Regular dilations of representations of product systems, preprint Archive no. math.OA/0504129.
[13] B. Solel, Representations of product systems over semigroups and dilations of commuting CP maps, J. Funct. Anal. 235 (2006), 593–618.
[14] D. Voiculescu, Symmetries of some reduced free product C∗-algebras, Lect. Notes Math. 1132, 556–588, Springer-Verlag, New York, 1985.

ABSTRACT We define nonselfadjoint operator algebras with generators $L_{e_1},..., L_{e_n}, L_{f_1},...,L_{f_m}$ subject to the unitary commutation relations of the form \[ L_{e_i}L_{f_j} = \sum_{k,l} u_{i,j,k,l} L_{f_l}L_{e_k}\] where $u= (u_{i,j,k,l})$ is an $nm \times nm$ unitary matrix. These algebras, which generalise the analytic Toeplitz algebras of rank 2 graphs with a single vertex, are classified up to isometric isomorphism in terms of the matrix $u$. <|endoftext|><|startoftext|> THE ASTROPHYSICAL JOURNAL, 679:1272–1287, 2008 JUNE 1

SHAPING THE GLOBULAR CLUSTER MASS FUNCTION BY STELLAR-DYNAMICAL EVAPORATION

DEAN E. MCLAUGHLIN1,2 AND S. MICHAEL FALL3,4

ABSTRACT We show that the globular cluster mass function (GCMF) in the Milky Way depends on cluster half-mass density, ρh, in the sense that the turnover mass MTO increases with ρh while the width of the GCMF decreases.
We argue that this is the expected signature of the slow erosion of a mass function that initially rose towards low masses, predominantly through cluster evaporation driven by internal two-body relaxation. We find excellent agreement between the observed GCMF (including its dependence on internal density ρh, central concentration c, and Galactocentric distance rgc) and a simple model in which the relaxation-driven mass-loss rates of clusters are approximated by −dM/dt = µev ∝ ρh^{1/2}. In particular, we recover the well-known insensitivity of MTO to rgc. This feature does not derive from a literal “universality” of the GCMF turnover mass, but rather from a significant variation of MTO with ρh (the expected outcome of relaxation-driven cluster disruption) plus significant scatter in ρh as a function of rgc. Our conclusions are the same if the evaporation rates are assumed to depend instead on the mean volume or surface densities of clusters inside their tidal radii, as µev ∝ ρt^{1/2} or µev ∝ Σt^{3/4}; these alternative prescriptions are physically motivated but involve cluster properties (ρt and Σt) that are not as well defined or as readily observable as ρh. In all cases, the normalization of µev required to fit the GCMF implies cluster lifetimes that are within the range of standard values (although falling towards the low end of this range). Our analysis does not depend on any assumptions or information about velocity anisotropy in the globular cluster system.

Subject headings: galaxies: star clusters; globular clusters: general

1. INTRODUCTION

The mass functions of star cluster systems provide an important point of reference for attempts to understand the connection between old globular clusters (GCs) and the young massive clusters that form in local starbursts and galaxy mergers.
When expressed as the number per unit logarithmic mass, dN/d log M, the GC mass function (GCMF) is characterized by a peak, or turnover, at a mass MTO ≈ 1–2 × 10^5 M⊙ that is empirically very similar in most galaxies. By contrast, the mass functions of young clusters show no such feature but instead rise monotonically towards low masses over the full observed range (10^6 M⊙ ≳ M ≳ 10^4 M⊙ in the best-studied cases), in a way that is well described by a power law, dN/d log M ∝ M^{1−β} with β ≃ 2 (e.g., Zhang & Fall 1999). At the same time, for high M > MTO, old GCMFs closely resemble the mass functions of young clusters, and of molecular clouds in the Milky Way and other galaxies (Harris & Pudritz 1994; Elmegreen & Efremov 1997); and it is well known that a number of dynamical processes cause star clusters to lose mass and can lead to their complete destruction as they orbit for a Hubble time in the potential wells of their parent galaxies (e.g., Fall & Rees 1977; Caputo & Castellani 1984; Aguilar, Hut, & Ostriker 1988; Chernoff & Weinberg 1990; Gnedin & Ostriker 1997; Murali & Weinberg 1997). It is therefore natural to ask whether the peaks in GCMFs can be explained by the depletion over many Gyr of globulars from initial mass distributions that were similar to those of young clusters below MTO as well as above.

¹ Dept. of Physics and Astronomy, University of Leicester, University Road, Leicester, UK LE1 7RH
² Permanent address: Astrophysics Group, Lennard-Jones Laboratories, Keele University, Keele, Staffordshire, UK ST5 5BG; dem@astro.keele.ac.uk
³ Institute for Advanced Study, Einstein Drive, Princeton, NJ 08450
⁴ Permanent address: Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218; fall@stsci.edu
Our chief purpose in this paper is to establish and interpret an aspect of the Galactic GCMF that appears fundamental but has gone largely unnoticed to date: dN/d log M has a strong and systematic dependence on GC half-mass density, ρh ≡ 3M/8πrh^3 (rh being the cluster half-mass radius), in the sense that the turnover mass MTO increases and the width of the distribution decreases with increasing ρh. As observed facts, these must be explained by any theory of the GCMF. We argue here that they are an expected signature of slow dynamical evolution from a mass function that initially increased towards M < MTO, if the long-term mass loss from surviving GCs has been dominated by stellar escape due to internal, two-body relaxation (which we refer to from now on as either relaxation-driven evaporation or simply evaporation).

Fall & Zhang (2001; hereafter FZ01) explain in detail why cluster evaporation dominates the long-term evolution of the low-mass shape of observable GCMFs. Briefly, stellar evolution removes (through supernovae and winds) the same fraction of mass from all clusters of a given age, and so cannot change the shape of dN/d log M (unless special initial conditions are invoked; cf. Vesperini & Zepf 2003). Meanwhile, for GCs like those that have survived for a Hubble time in the Milky Way, the mass loss from gravitational shocks during disk crossings and bulge passages is generally less than that due to evaporation for M < MTO (FZ01; Prieto & Gnedin 2006).⁵

⁵ It is possible that there existed a past population of GCs with low densities or concentrations, or perhaps on extreme orbits, that were destroyed in less than a Hubble time by shocks or stellar evolution. Our discussion does not cover such clusters.

As we discuss further in §2 below, the evaporation of tidally
limited clusters proceeds at a rate, µev ≡ −dM/dt, that is approximately constant in time and primarily determined by cluster density. FZ01 show that a constant mass-loss rate leads to a power-law scaling dN/d log M ∝ M^{1−β} with β → 0 (corresponding to a flat distribution of clusters per unit linear mass) at sufficiently low M < µev t in the evolved mass function of coeval GCs that began with any nontrivial initial dN/d log M0.⁶ To accommodate this when dN/d log M0 originally increased towards low masses as a power law, a time-dependent peak must develop in the GCMF at a mass of order MTO(t) ∼ µev t (FZ01). But then, since µev depends fundamentally on cluster density, so too must MTO.

A β ≃ 0 power-law scaling below the turnover mass has been confirmed directly in the GCMFs of the Milky Way (FZ01) and the giant elliptical M87 (Waters et al. 2006), while Jordán et al. (2007) show it to be consistent with dN/d log M data for 89 Virgo Cluster galaxies, and it is apparent in deep observations of some other GCMFs (e.g., in the Sombrero galaxy, M104; Spitler et al. 2006). As regards the peak itself, old GCs are observed (e.g., Jordán et al. 2005) to have rather similar densities on average, and therefore similar typical µev, in galaxies with widely different total luminosities and Hubble types. (Inasmuch as cluster densities are set by tides, this is probably related to the mild variation of mean galaxy density with total luminosity; see FZ01, and also Jordán et al. 2007.) Thus, an evaporation-dominated evolutionary origin for a turnover in the GCMF appears to be consistent with the well-known fact that the mass scale MTO generally differs very little among galaxies (e.g., Harris 2001; Jordán et al. 2006).

McLAUGHLIN & FALL, http://arxiv.org/abs/0704.0080v4
If this picture is basically correct, it implies that, even though MTO may appear nearly universal when considering the global mass functions of entire GC systems, in fact the GCMFs of subsamples of clusters with similar ages but different densities should have different turnovers. In §2, we show (working with the half-mass density, ρh, for definiteness and because it is relatively easy to observe) that this is the case for globulars in the Milky Way. We fit the observed dN/d log M for GCs in bins of different ρh with models assuming that (1) the initial distribution increased as a β = 2 power law at low masses and (2) the mass-loss rates of individual clusters can be estimated from their half-mass densities by the rule µev ∝ ρh^{1/2}. In §3 we discuss the validity of this prescription for µev, which is certainly approximate but captures the main physical dependence of relaxation-driven mass loss. In particular, we show that the alternative mass-loss laws µev ∝ ρt^{1/2} and µev ∝ Σt^{3/4}, where ρt and Σt are the mean volume and surface densities inside cluster tidal radii, lead to models for the GCMF that are essentially indistinguishable from those based on µev ∝ ρh^{1/2}. The normalization of µev required to fit the observed GCMF implies cluster lifetimes that are within a factor of ≈ 2 (perhaps slightly on the low side, if the initial power-law exponent at low masses was β = 2) of typical values in theories and simulations of two-body relaxation in tidally limited GCs.

⁶ Throughout this paper, we use “initial” to mean at a relatively early time in the development of long-lived clusters, after they have dispersed any remnants of their natal gas clouds, survived the bulk of stellar-evolution mass loss, and come into virial equilibrium in the tidal field of a galaxy.

We also show in §2 that when the observed densities of individual clusters are used in our models to predict GCMFs in different bins of Galactocentric radius (rgc), they fit the
much weaker variation of dN/d log M and MTO as functions of rgc, which is well known in the Milky Way and other large galaxies (see Harris 2001; Harris, Harris, & McLaughlin 1998; Barmby, Huchra, & Brodie 2001; Vesperini et al. 2003; Jordán et al. 2007). Similarly, applying our models to the GCs in two bins of central concentration, with only the measured ρh of the clusters in each subsample as input, suffices to account for previously noted differences between the mass functions of low- and high-concentration Galactic globulars (Smith & Burkert 2002). The most fundamental feature of the GCMF therefore appears to be its dependence on cluster density, which can be understood at least qualitatively (and even quantitatively, to within a factor of 2) in terms of evaporation-dominated cluster disruption.

There is a widespread perception that if the GCMF evolved slowly from a rising power law at low masses, then a weak or null variation of MTO with rgc can be achieved only in GC systems with strongly radially anisotropic velocity distributions, which are not observed (see especially Vesperini et al. 2003). This apparent inconsistency has been cited to bolster some recent attempts to identify a mechanism by which a “universal” peak at MTO ∼ 10^5 M⊙ might have been imprinted on the GCMF at the time of cluster formation, or very shortly afterwards, and little affected by the subsequent destruction of lower-mass GCs (e.g., Vesperini & Zepf 2003; Parmentier & Gilmore 2007). However, given the real successes of an evaporation-dominated evolutionary scenario for the origin of MTO, as summarized above and added to below, it would be premature to reject the idea in favor of requiring a near-formation origin, solely on the basis of difficulties with GC kinematics. (And, in any event, formation-oriented models must now be reconsidered in light of the non-universality of MTO as a function of cluster density.)
We are not concerned in this paper with velocity anisotropy in GC systems, because we only predict an evaporation-evolved dN/d log M as a function of cluster density (and age) and take the observed distribution of ρh versus rgc in the Milky Way as a given, to show consistency with the observed behavior of MTO as a function of rgc. Most other models (FZ01; Vesperini et al. 2003; and references therein) predict dynamically evolved GCMFs directly in terms of rgc, and in doing so are forced also to derive theoretical dependences of cluster density on rgc. It is only at this stage that GC orbital distributions enter the problem, and then only in conjunction with several other assumptions and simplifications. As we discuss further in §3 below, the radially biased GC velocity distributions that appear in such models could well be consequences of one or more of these other assumptions, rather than of the main hypothesis about evaporation-dominated GCMF evolution.

2. THE GALACTIC GCMF AS A FUNCTION OF CLUSTER DENSITY

In this section we define and model the dependence of the Galactic GCMF on cluster density. First, we describe the dependence that is expected to arise from evaporation-dominated evolution.

Two-body relaxation in a tidally limited GC leads to a roughly steady rate of mass loss, µev ≡ −dM/dt ≃ constant in time. Thus, the total cluster mass decreases approximately linearly, as M(t) ≃ M0 − µev t. This behavior is exact in some classic models of GC evolution (Hénon 1961) and is found to be a good approximation in most other calculations (e.g., Lee & Ostriker 1987; Chernoff & Weinberg 1990; Vesperini & Heggie 1997; Gnedin, Lee, & Ostriker 1999; Baumgardt 2001; Giersz 2001; Baumgardt & Makino 2003; Trenti, Heggie, & Hut 2007).
The result comes from a variety of computational methods (semi-analytical, Fokker-Planck, Monte Carlo, and N-body simulation) applied to clusters with different initial conditions (densities and concentrations) on different kinds of orbits (circular and eccentric; with and without external gravitational shocks) and with different internal processes and ingredients (with or without stellar mass spectra, binaries, and central black holes). To be sure, deviations from perfect linearity in M(t) do occur, but these are generally small (especially away from the endpoints of the evolution, i.e., for 0.9 ≳ M(t)/M0 ≳ 0.1), and neglecting them to assume an approximately constant dM/dt is entirely appropriate for our purposes. When gravitational shocks are subdominant to relaxation-driven evaporation, as they generally appear to be for extant GCs, they work to boost the mass-loss rate µev slightly without altering the basic linearity of M(t) (e.g., Vesperini & Heggie 1997; Gnedin, Lee, & Ostriker 1999; see also Figure 1 of FZ01). A time-dependent mass scale ∆ ≡ µev t is then associated naturally with any system of coeval clusters having a common mass-loss rate: all those with initial M0 ≤ ∆ are disrupted by time t, and replaced with the remnants of objects that began with M0 > ∆. As we mentioned in §1, if the initial GCMF increased towards low masses as a power law, then ∆ is closely related to a peak in the evolved distribution, which eventually decreases towards low M < ∆ as dN/d log M ∝ M^{1−β} with β = 0 (FZ01).

In standard theory (e.g., Spitzer 1987; Binney & Tremaine 1987, Section 8.3), the lifetime of a cluster against evaporation is a multiple of its two-body relaxation time, trlx. For a total mass M of stars within a radius r, this scales to first order (ignoring a weak mass dependence in the Coulomb logarithm) as trlx(r) ∝ (M r^3)^{1/2} ∝ M/ρ^{1/2}, where ρ ∝ M/r^3.
In a concentrated cluster with an internal density gradient, trlx(r) of course varies throughout the cluster, and the global relaxation timescale is an average of the local values (see the early discussion by King 1958). This can still be written as trlx ∝ M/ρ^{1/2}, with M the total cluster mass and ρ an appropriate reference density. We then have for the instantaneous mass-loss rate, µev ≡ −dM/dt ∝ M/trlx ∝ ρ^{1/2}. Insofar as this is approximately constant in time, a GCMF evolving from an initial β > 1 power law at low masses should therefore develop a peak at a mass that depends on cluster density and age through the parameter ∆ ∝ ρ^{1/2} t.

It remains to identify the best measure of ρ in this context. A standard choice in the literature, and the one that we eventually make to derive our main results in this paper, is the half-mass density ρh = 3M/8πrh^3. However, in a steady tidal field, the mean density ρt inside the tidal radius of a cluster is constant by definition, and thus choosing ρ = ρt instead is the simplest way to ensure that µev ∝ ρ^{1/2} and µev ≃ constant in time are mutually consistent. In fact, King (1966) found from direct calculations of the escape rate at each radius within his standard (lowered Maxwellian) models that the coefficient in µev ∝ ρt^{1/2} is only a weak function of the internal density structure (concentration) of the models, and thus only a weak function of time for a cluster evolving quasistatically through a series of such models.

The rule µev ∝ ρt^{1/2} is routinely used to set the GC mass-loss rates in models for the dynamical evolution of the GCMF, although such studies normally express µev immediately in terms of orbital pericenters, rp, most often by assuming ρt ∝ rp^{−2} as for GCs in galaxies whose total mass distributions follow a singular isothermal sphere (e.g., Vesperini 1997, 1998, 2000, 2001; Vesperini et al. 2003; Baumgardt 1998; FZ01).
This bypasses any explicit examination of the GCMF as a function of cluster density, which is our main goal in this paper. But it is done in part because tidal radii are the most poorly constrained of all structural parameters for GCs in the Milky Way (their theoretical definition is imprecise and their empirical estimation is highly model-dependent and sensitive to low-surface-brightness data), and they are exceedingly difficult if not impossible to measure in distant galaxies. We deal with this here by focusing on the GCMF as a function of cluster density ρh inside the less ambiguous, empirically better determined and more robust half-mass radius, asking how simple models with µev ∝ ρh^{1/2} fare against the data.

Taking µev ∝ ρh^{1/2} in place of µev ∝ ρt^{1/2}, which we do to construct evaporation-evolved model GCMFs in §2.2, is most appropriate if the ratio ρt/ρh is the same for all clusters and constant in time. This is the case in Hénon’s (1961) model of GC evolution, and in this limit (adopted by FZ01 in their models for the Galactic GCMF) our analysis is rigorously justified. However, real clusters are not homologous (ρt/ρh differs among clusters) and they do not evolve self-similarly (ρh may vary in time even if ρt does not). The key assumption in our models is that µev is approximately independent of time for any GC, which is well founded in any case. By using current ρh values to estimate µev, we do not suppose that the half-mass densities are also constant, but we in effect use a single number for all GCs to represent a range of (ρt/ρh)^{1/2}. Equivalently, we ignore a dependence on cluster concentration in the normalization of µev ∝ ρh^{1/2}. As we discuss further in §3, it is reasonable to neglect this complication in a first approximation because (ρt/ρh)^{1/2} varies much less among Galactic globulars than ρt and ρh do separately.
We demonstrate this explicitly by repeating our analysis with ρh replaced by ρt and recover essentially the same results for the GCMF.

In §3 we also discuss some recent results, which indicate that the timescale for relaxation-driven evaporation depends on a slightly less-than-linear power of trlx (Baumgardt 2001; Baumgardt & Makino 2003). We point out that this implies that µev may increase as a modest power of the average surface density of a cluster as well as (or, in an important special case, instead of) the usual volume density. However, we show in detail that making the appropriate changes throughout the rest of the present section to reflect this possibility does not change any of our conclusions.

2.1. Data

FIG. 1. Left: Mass versus three-dimensional half-mass density, ρh ≡ 3M/8πrh^3, and versus Galactocentric radius, rgc, for 146 Milky Way GCs in the catalogue of Harris (1996). The dashed line in the first panel is M ∝ ρh^{1/2}, a locus of approximately constant lifetime against evaporation. Right: Half-mass density versus rgc for the same clusters.

Figure 1 shows the distribution of mass against half-mass density and against Galactocentric radius for 146 Milky Way GCs in the catalogue of Harris (1996; Feb. 2003 version, see http://physwww.mcmaster.ca/~harris/mwgc.dat), along with the distribution of ρh versus rgc linking the two mass plots. The Harris catalogue actually records the absolute V magnitudes of the GCs. We obtain masses from these by applying the population-synthesis model mass-to-light ratios ΥV computed by McLaughlin & van der Marel (2005) for individual clusters based on their metallicities and an assumed age of 13 Gyr. However, we first multiplied all of the
McLaughlin & van der Marel ΥV values by a factor of 0.8 so as to obtain a median Υ̂V ≃ 1.5 M⊙ L⊙^{−1} in the end,⁸ consistent with direct dynamical estimates (see McLaughlin 2000 and McLaughlin & van der Marel 2005; also Barmby et al. 2007).

By assigning mass-to-light ratios to GCs in this way, we allow for expected differences between clusters with different metallicities. Our application of a corrective factor to the population-synthesis values, ΥV^pop, is motivated empirically by the fact that their distribution among Galactic GCs is strongly peaked around a median Υ̂V^pop ≃ 1.9 M⊙ L⊙^{−1}, while the observed (dynamical) ΥV^dyn lie in a fairly narrow range around Υ̂V^dyn ≃ 1.5 M⊙ L⊙^{−1} (McLaughlin & van der Marel 2005). However, it is worth noting that the size of this difference is similar to what is found in some numerical simulations of two-body relaxation over a Hubble time in clusters with a spectrum of stellar masses (e.g., Baumgardt & Makino 2003). In such simulations, ΥV^dyn falls below ΥV^pop due to the preferential escape of low-mass stars with high individual M∗/L∗ (population-synthesis models do not incorporate this or any other stellar-dynamical effect). Thus, a median Υ̂V^dyn < Υ̂V^pop may itself be a signature of cluster evaporation. We might then also expect that more dynamically evolved clusters (that is, those with shorter relaxation times) could have systematically lower ratios of ΥV^dyn/ΥV^pop. However, this is a relatively small effect, which is not well quantified theoretically and is not clearly evident in real data (the numbers published by McLaughlin & van der Marel 2005 show no significant correlation between ΥV and trh for Galactic globulars). We therefore proceed, as stated, with a single ΥV^dyn/ΥV^pop = 0.8 assumed for all GCs.
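The conversion described above, from a catalogued absolute V magnitude to a cluster mass and then to a half-mass density, can be sketched in a few lines of Python. This is only an illustrative sketch: the solar absolute magnitude of 4.83 is an assumed value that does not appear in the text, the function names are hypothetical, and the example cluster is made up rather than taken from the Harris catalogue.

```python
import math

# Assumed value, not given in the text: absolute V magnitude of the Sun.
SOLAR_ABS_MAG_V = 4.83
# Correction factor applied in the text to the population-synthesis ratios.
UPSILON_CORRECTION = 0.8


def luminosity_v(abs_mag_v):
    """V-band luminosity in solar units from an absolute V magnitude."""
    return 10.0 ** (0.4 * (SOLAR_ABS_MAG_V - abs_mag_v))


def cluster_mass(abs_mag_v, upsilon_pop):
    """Cluster mass in Msun: corrected mass-to-light ratio times luminosity."""
    return UPSILON_CORRECTION * upsilon_pop * luminosity_v(abs_mag_v)


def half_mass_density(mass, r_half):
    """Half-mass density rho_h = 3M / (8 pi r_h^3), in Msun pc^-3."""
    return 3.0 * mass / (8.0 * math.pi * r_half ** 3)


# Hypothetical cluster: M_V = -7.5, population-synthesis ratio 1.9, r_h = 4 pc.
mass = cluster_mass(-7.5, 1.9)
rho_h = half_mass_density(mass, 4.0)
print(f"M = {mass:.3g} Msun, rho_h = {rho_h:.3g} Msun/pc^3")
```

With the median population-synthesis ratio of 1.9 quoted above, the corrected ratio is 0.8 × 1.9 ≈ 1.5 in solar units, which is the median adopted in the text.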
Harris (1996) gives the projected half-light radius Rh for 141 of the clusters with a mass estimated in this way, and for these we obtain the three-dimensional half-mass radius from the general rule rh = (4/3)Rh (Spitzer 1987), which assumes no internal mass segregation. The remaining five objects have mass estimates but no size measurements. To each of these clusters, we assign an rh equal to the median value for those of the other 141 GCs having masses within a factor of two of the one with unknown rh. In all cases, the half-mass density is ρh ≡ 3M/8πrh^3.

⁸ Throughout this paper, we use x̂ to denote the median of any quantity x.

The leftmost panel in Figure 1 shows immediately that the cluster mass distribution has a strong dependence on half-mass density: the median M̂ increases with ρh while the scatter in log M (that is, the width of the GCMF) decreases. The first of these points is related to the fact that rh correlates poorly with M (e.g., Djorgovski & Meylan 1994; McLaughlin 2000). The second point, that the dispersion of dN/d log M decreases with increasing ρh, is behind the finding (Kavelaars & Hanes 1997; Gnedin 1997) that the GCMF is broader at very large Galactocentric radii. We return to this in §2.2.

A natural concern, when plotting M against ρh as we have done here, is that any apparent correlation might only be a trivial reflection of the definition ρh ∝ M/rh^3. This may seem particularly worrisome because, as we just mentioned, it is known that size does not correlate especially well with mass for GCs in the Milky Way (or, indeed, in other galaxies). However, the lack of a tight M–rh correlation does not imply that all GCs have the same rh, even within the unavoidable measurement errors.
The root-mean-square (rms) scatter of log rh about its average value is ±0.3 for Galactic GCs, and the 68-percentile spread in log rh is slightly greater than 0.5, or more than a factor of 3 in linear terms (from the data in Harris 1996; see, e.g., Figure 8 of McLaughlin 2000). This compares to an rms random measurement error (from formal, χ^2 fitting uncertainties) of δ(log rh) ≈ 0.05, or about 10% relative error; and an rms systematic measurement error (i.e., differences in the rh inferred from fitting different structural models to a single cluster) of perhaps δ(log rh) ≲ 0.03; see McLaughlin & van der Marel (2005). Most of the scatter in plots of observed half-light radius versus mass is therefore real and contains physical information. The left-hand panel of Figure 1 displays this information in a form that highlights clear, nontrivial overall trends requiring physical explanation.

The dashed line in the plot of mass against density traces the proportionality M ∝ ρh^{1/2}, or M rh^3 = constant. Insofar as the half-mass relaxation time scales as trh ∝ (M rh^3)^{1/2}, and to the extent that µev ∝ M/trh ∝ ρh^{1/2} approximates the average rate of relaxation-driven mass loss, this line is one of equal evaporation time.
That such a locus nicely bounds the lower envelope of the observed cluster distribution is itself a strong hint that relaxation-driven cluster disruption has significantly modified the GCMF at low masses (recall that M rh^3 = constant defines one side of the GC “survival triangle” when the M–ρh plot is recast as rh versus M: Fall & Rees 1977; Okazaki & Tosa 1995; Ostriker & Gnedin 1997; Gnedin & Ostriker 1997). It is also further evidence that the weak correlation of observed rh with M is due to significant and real differences in cluster radii, since if rh were intrinsically the same for all GCs, then we would see M ∝ ρh instead.

TABLE 1
MILKY WAY GC PROPERTIES IN BINS OF DENSITY AND GALACTOCENTRIC RADIUS

Bin                                    N    ρ̂h (a)       r̂gc (a)   Mmin         Mmax         M̂ (a)        MTO (b)
                                            [M⊙ pc^−3]   [kpc]     [M⊙]         [M⊙]         [M⊙]         [M⊙]
ρh bins
0.034 ≤ ρh ≤ 76.5 M⊙ pc^−3             48   8.48         12.9      5.63×10^2    8.84×10^5    4.12×10^4    3.98×10^4
78.8 ≤ ρh ≤ 526 M⊙ pc^−3               49   232          5.6       8.37×10^3    1.67×10^6    1.22×10^5    1.58×10^5
579 ≤ ρh ≤ 5.65×10^4 M⊙ pc^−3          49   973          3.2       1.93×10^4    1.30×10^6    2.82×10^5    2.88×10^5
rgc bins
0.6 ≤ rgc ≤ 3.2 kpc                    47   597          1.9       4.47×10^3    1.02×10^6    1.15×10^5    2.14×10^5
3.3 ≤ rgc ≤ 9.4 kpc                    50   261          5.2       2.02×10^3    1.67×10^6    1.27×10^5    1.66×10^5
9.6 ≤ rgc ≤ 123 kpc                    49   18.4         18.3      5.63×10^2    1.30×10^6    7.42×10^4    8.71×10^4

(a) The notation x̂ represents the median of quantity x.
(b) MTO is the peak mass of the model GCMFs traced by the solid curves in each panel of Figure 2, which are given by equation (3) of the text with β = 2, Mc = 10^6 M⊙, and individual ∆ given by the observed ρh of each cluster through equation (4).

The middle panel of Figure 1 shows the well-known result that the typical GC mass depends weakly if at all on Galactocentric radius, at least until large rgc ≳ 30–40 kpc, where there are too few clusters to discern any trend.
The right-hand panel of the figure shows why this is true even though the GCMF depends significantly on cluster density: although there is a correlation between half-mass density and Galactocentric position, the large scatter about it is such that convolving the observed M versus ρh with the observed ρh versus rgc results in an almost null dependence of M on rgc.

We now divide the GC sample in Figure 1 roughly into thirds, in two different ways: first on the basis of half-mass density, and second by Galactocentric radius. These ρh and rgc bins are defined in Table 1, which also gives a few summary statistics for the globulars in each subsample. We count the clusters in every subsample in about 10 equal-width bins of log M to obtain histogram representations of dN/d log M, first as a function of ρh and then as a function of rgc. These GCMFs are shown by the points in Figure 2, with errorbars indicating standard Poisson uncertainties. The curves in the figure trace model GCMFs, which we describe in §2.2. For the moment, it is important to note that the dashed curve is the same in every panel, apart from minor differences in normalization, and is proportional to the GCMF for the whole sample of 146 GCs. (In the middle-left panel of Figure 2, which pertains to clusters distributed tightly around the median ρh of the entire GC system, the dashed curve is coincident with the solid curve running through the data.)

The left-hand panels of Figure 2 show directly that the GCMF is peaked for clusters at any density, and that the mass of the peak increases systematically with ρh (see also the last column of Table 1, but note that the turnover masses there refer to the model GCMFs that we develop below). The statistical significance of this is very high, and qualitatively it is the behavior expected if MTO owes its existence to cluster disruption at a rate that increases with ρh, as is the case with relaxation-driven evaporation.
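The histogram construction described above (equal-width bins in log M, with Poisson errorbars) can be illustrated with a short Python sketch. The binning range, the treatment of the upper edge, and the sample masses are all assumptions made here for illustration; the text does not specify these details, and the masses below are invented rather than drawn from the catalogue.

```python
import math


def log_mass_histogram(masses, n_bins=10):
    """Bin cluster masses in equal-width bins of log10(M).

    Returns (centers, counts, dn_dlogm, errors): bin centers in log10(M),
    raw counts, counts per unit log10(M), and Poisson errors sqrt(N)
    expressed in the same units as dn_dlogm.
    """
    logs = [math.log10(m) for m in masses]
    lo, hi = min(logs), max(logs)
    if hi <= lo:
        raise ValueError("need at least two distinct masses")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in logs:
        # The largest mass falls on the upper edge; keep it in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    dn_dlogm = [c / width for c in counts]
    errors = [math.sqrt(c) / width for c in counts]
    return centers, counts, dn_dlogm, errors


# Made-up masses (Msun), purely to exercise the function.
sample = [6.0e2, 4.0e3, 2.0e4, 5.0e4, 9.0e4, 1.2e5, 1.5e5, 3.0e5, 8.0e5, 1.6e6]
centers, counts, dn_dlogm, errors = log_mass_histogram(sample, n_bins=5)
for c, n, e in zip(centers, dn_dlogm, errors):
    print(f"log M = {c:.2f}: dN/dlogM = {n:.2f} +/- {e:.2f}")
```

Dividing both the counts and their square roots by the bin width keeps the values and errorbars in the same units of clusters per dex.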
The right-hand panels of Figure 2 confirm once again that the GCMF peak mass is a very weak function of Galactocentric position. In fact, the observed distributions in the two rgc bins inside ≃10 kpc are statistically indistinguishable in their entirety, and the main difference at larger rgc ≳ 10 kpc is a slightly higher proportion of low-mass clusters rather than a large change in MTO. All of this is consistent with the primary dependence of the GCMF being that on ρh, since Figure 1 shows that the GC density distribution is not sensitive to Galactocentric position for rgc ≲ 10–20 kpc but has a substantial low-density tail at larger radii (with a broader associated GCMF, as seen in the upper-left panel of Figure 2).

2.2. Simple Models

We now assess more quantitatively whether these results are consistent with evaporation-dominated evolution of the GCMF from an initial distribution like that observed for young clusters in the local universe. We model the time-evolution of the distribution of M versus ρh in Figure 1 but do not attempt this for the distribution of ρh over rgc, the details of which likely depend on a complicated interplay between the tidal field of the Galaxy, the present and past orbital parameters of clusters, and the structural nonhomology of GCs. To compare our models to the current GCMF as a function of rgc, we simply calculate them using the observed ρh of individual clusters in different ranges of Galactocentric radius.

We assume that the initial GCMF was independent of cluster density, and that all globulars surviving to the present day have been losing mass for the past Hubble time at constant rates. We use the current half-mass density of each cluster to estimate µev ∝ ρh^{1/2}. As we discussed earlier, an approximately time-independent µev is indicated by most calculations of two-body relaxation in tidally limited GCs.
We give a more detailed, a posteriori justification in §3 for using ρh, rather than other plausible measures of cluster density, to estimate µev.

FIG. 2. GCMF as a function of half-mass density, ρh ≡ 3M/8πrh^3 (left panels), and as a function of Galactocentric radius, rgc (right panels), for 146 Milky Way GCs in the catalogue of Harris (1996). The dashed curve in all cases is an evolved Schechter function for the entire GC system (Jordán et al. 2007): equation (3) with β = 2, Mc = 10^6 M⊙, and ∆ ≡ 2.3×10^5 M⊙ for all clusters (from equation [4] and a median ρ̂h = 246 M⊙ pc^−3), giving a peak at MTO = 1.6×10^5 M⊙. Solid curves are the GCMFs predicted by equation (3) with β = 2 and Mc = 10^6 M⊙ but individual ∆ given by the observed ρh of each cluster (equation [4]) in the different subsamples.

Consider first a group of coeval GCs with an initial mass function dN/d log M0 and a single, time-independent mass-loss rate µev. The mass of every cluster decreases linearly as M(t) = M0 − µev t, and at any later time each has lost the same amount ∆ ≡ M0 − M(t) = µev t. FZ01 show rigorously that in this case, the evolved and initial GCMFs are related by

$$\frac{dN}{d\log M} = \frac{dN}{d\log M_0}\,\frac{\partial \log M_0}{\partial \log M} = \frac{M}{(M+\Delta)}\,\frac{dN}{d\log (M+\Delta)} . \tag{1}$$

This is the basis for the claim that the mass function scales generically as dN/d log M ∝ M^{+1} (a β = 0 power law) at low enough M(t) < ∆ (that is, for the surviving remnants of clusters with M0 ≈ ∆), just so long as the initial distribution was not a delta function.

We follow FZ01 (see also Jordán et al. 2007) in adopting a Schechter (1976) function for the initial GCMF:

$$\frac{dN}{d\log M_0} \propto M_0^{1-\beta}\,\exp\left(-M_0/M_c\right) . \tag{2}$$

With β ≃ 2, this distribution describes the power-law mass functions of young massive clusters in systems like the Antennae galaxies (e.g., Zhang & Fall 1999).
An exponential cutoff at Mc ∼ 10^6 M⊙ is generally consistent with such data, even if not always demanded by them; here we require it mainly to match the curvature observed at high masses in old GCMFs (e.g., Burkert & Smith 2000; Jordán et al. 2007).

Combining equations (1) and (2) gives the probability density that a single GC with known evaporation rate and age has an instantaneous mass M. The time-dependent GCMF of a system of N GCs with a range of µev (or ages, or both) is then just the sum of all such individual probability densities:

$$\frac{dN}{d\log M} = \sum_{i=1}^{N} A_i\,\frac{M}{\left[M+\Delta_i\right]^{\beta}}\,\exp\left(-\frac{M+\Delta_i}{M_c}\right) . \tag{3}$$

Here the total mass losses ∆i = (µev t)i may differ from cluster to cluster (ti being the age of a single GC) but both β and Mc are assumed to be constants, independent of ρh in particular.⁹ Given each ∆i, the normalizations Ai in equation (3) are defined so that the integral over d log M of each term in the summation is unity.

Jordán et al. (2007) have introduced a specialization of equation (3) in which all clusters have the same ∆. They refer to this as an evolved Schechter function and describe its properties in detail (including giving a formula for the turnover mass MTO as a function of ∆ and Mc) for the case β = 2. Here we note only that, at very young cluster ages or for slow mass-loss rates, such that ∆ ≪ Mc and only the low-mass, power-law part of the initial GCMF is significantly eroded, any one evolved Schechter function has a peak at MTO ≃ ∆/(β − 1) (for β > 1). As ∆ increases relative to Mc, the turnover at first increases proportionately and the width of the distribution decreases (since the high-mass end at M ≳ MTO is largely unchanged). For large ∆ ≫ Mc, however, the peak is bounded above by MTO → Mc and the width approaches a lower limit. Thus, the dependence of MTO on ∆ is weaker than linear when Mc is finite in the initial GCMF of equation (2).
Any peak in the full equation (3) for a system of GCs with individual ∆ values is an average of N different turnovers and must be calculated numerically.

⁹ Note that Mc appears to take on different values in the GCMFs of other galaxies, varying systematically with the total luminosity Lgal (Jordán et al. 2007). The reasons for this are unclear, as is the origin of this mass scale in the first place.
¹⁰ The increase of MTO and the decrease of the full width of dN/d log M for increasing ∆ eventually saturate when the mass loss per GC is so high that it affects clusters in the exponential part of the initial Schechter-function GCMF. This is because dN/d log M ∝ M^{+1} exp(−M/Mc) is a self-similar solution to equation (1).

In their modeling of the Milky Way GC system, FZ01 effectively compute mass functions of the type (3), based on the same initial conditions and dynamical evolution, with a distribution of ∆ values determined by the orbital parameters of clusters in an idealized, spherical and static logarithmic Galaxy potential (used both to fix µev in terms of cluster tidal densities and to estimate additional mass loss due to gravitational shocks). Jordán et al. (2007) fit GCMF data in the Milky Way and scores of Virgo Cluster galaxies with their version of equation (3) in which all GCs have the same ∆. They thus estimate the dynamical mass loss from typical clusters in these systems. Here, we construct models for the Milky Way GCMF using ∆ values given directly by the observed half-mass densities of individual GCs.

We adopt β = 2 for the initial low-mass power-law index in equation (2), which carries over into equation (3) for the evolved dN/d log M. Jordán et al. (2007) have fitted the full Galactic GCMF with an evolved Schechter function assuming β = 2 and a single ∆ ≡ ∆̂ for all surviving globulars. They find Mc ≃ 10^6 M⊙ and ∆̂ = 2.3×10^5 M⊙.
We use this value of Mc in equation (3) and we associate ∆̂ with the mass loss from clusters at the median half-mass density of the entire GC system, which is ρ̂h = 246 M⊙ pc^−3 from the data in Figure 1. Since we are assuming that ∆ = µev t ∝ ρh^{1/2} t for coeval GCs, we therefore stipulate

$$ \Delta = 1.45\times10^{4}\,M_\odot\,\left(\rho_h / M_\odot\,{\rm pc}^{-3}\right)^{1/2} \qquad (4) $$

for globulars with arbitrary ρh. Assuming a typical GC age of t = 13 Gyr, this corresponds to a mass-loss rate of

$$ \mu_{\rm ev} \simeq 1100\,M_\odot\,{\rm Gyr}^{-1}\,\left(\rho_h / M_\odot\,{\rm pc}^{-3}\right)^{1/2} . \qquad (5) $$

In §3 we discuss the cluster lifetimes implied by this value of µev. We emphasize here that the scaling of µev and ∆ with ρh^{1/2} follows rather generically from our hypothesis of evaporation-dominated cluster evolution, while the numerical coefficients in equations (4) and (5) are specific to the assumption of β = 2 for the power-law index at low masses in the initial GCMF.

The dashed curve shown in every panel of Figure 2 is the evolved Schechter function fitted to the entire GCMF of the Milky Way by Jordán et al. (2007). This has a peak at MTO ≃ 1.6 × 10^5 M⊙ (magnitude MV ≃ −7.4 for a typical V-band mass-to-light ratio of 1.5 in solar units) and gives a very good description of the observed dN/d log M in the middle density bin, 79 ≲ ρh ≲ 530 M⊙ pc^−3, and in the two inner radius bins, rgc ≤ 9.4 kpc. This is expected, since the median half-mass density in each of these cluster subsamples is very close to the system-wide median ρ̂h = 246 M⊙ pc^−3 (see Table 1). Even in the outermost rgc bin, a Kolmogorov-Smirnov (KS) test only marginally rejects the dashed-line model (at the ≃95% level), because this subsample still includes many GCs at or near the global median ρ̂h (see Figure 1). By contrast, the average GCMF is strongly rejected as a model for the lowest- and highest-density GCs on the left-hand side of Figure 2: the KS probabilities that these data are drawn from the dashed distribution are <10^−4 in both cases.
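The numbers quoted for equations (4) and (5) can be checked with a short sketch. The function names below are hypothetical, and the closed-form turnover is our own rearrangement of the condition d ln f / d ln M = 0 for a single β = 2 term of equation (3); it is not taken from Jordán et al. (2007), whose formula may be written differently.

```python
import math

M_C = 1.0e6       # Schechter cut-off mass in Msun, as adopted in the text
AGE_GYR = 13.0    # typical GC age assumed in the text

def delta_from_rho_h(rho_h):
    """Equation (4): total evaporative mass loss (Msun) for rho_h in Msun/pc^3."""
    return 1.45e4 * math.sqrt(rho_h)

def mu_ev_from_rho_h(rho_h, age_gyr=AGE_GYR):
    """Equation (5): mass-loss rate (Msun/Gyr), i.e. Delta divided by the age."""
    return delta_from_rho_h(rho_h) / age_gyr

def turnover_beta2(delta, m_c=M_C):
    """Peak in dN/dlogM of f(M) = M/(M+Delta)^2 * exp(-(M+Delta)/Mc), beta = 2 only.

    d ln f / d ln M = 1 - 2M/(M+Delta) - M/Mc = 0 rearranges to
    M^2 + (Mc + Delta) M - Mc Delta = 0; the positive root is returned.
    """
    s = m_c + delta
    return 0.5 * (-s + math.sqrt(s * s + 4.0 * m_c * delta))

delta_hat = delta_from_rho_h(246.0)   # median half-mass density from the text
print(delta_hat, mu_ev_from_rho_h(1.0), turnover_beta2(delta_hat))
```

In the two limits discussed after equation (3), the root tends to ∆ for ∆ ≪ Mc and to Mc for ∆ ≫ Mc, which is the sub-linear behavior of MTO described there.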
This is also expected since, by construction, these bins only contain clusters with densities well away from the median of the full GC system, for which the total mass lost by evaporation should be significantly different from the typical ∆̂ = ∆(ρ̂h).

The solid curves in Figure 2, which are different in every panel, are the superpositions of many different evolved Schechter functions, as in equation (3), with distinct ∆ values given by equation (4) using the observed ρh of each cluster in the corresponding subsample. These models provide excellent matches to the observed dN/d log M in every ρh and rgc bin, with χ^2 < 1.3 per degree of freedom in all cases. This is the main result of this paper.

The last column of Table 1 gives the mass MTO at which each of the solid model GCMFs in Figure 2 peaks. We note that these turnovers increase roughly as MTO ∼ ρ̂h^{0.3–0.4} for our specific binnings in ρh and rgc, somewhat shallower than the ρh^{1/2} scaling of the cluster mass-loss rate that defines the models. This is partly because of the averaging over individual turnovers implied by the summation of many evolved Schechter functions in each GC bin, and partly because, as we discussed just after equation (3), the turnover mass of any one evolved Schechter function cannot increase indefinitely in direct proportion to ∆ ∝ ρh^{1/2}, but has a strict upper limit of MTO ≤ Mc.

Our models are naturally consistent with the fact that the GCMF is narrower for clusters with higher densities. This is obvious in the left-hand panels of Figure 2; in the discussion immediately after equation (3), we described how it follows from the increase of MTO with ∆ ∝ ρh^{1/2} for a single evolved Schechter function. In addition, the superposition of many such functions with separate, density-dependent turnovers and widths results in wider GCMFs for cluster subsamples spanning larger ranges of ρh. This accounts in particular for the breadth of the mass function at rgc ≥ 9.4 kpc.
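The superposition behind the solid curves can be sketched as follows. This is only an illustration of equations (3) and (4) under stated assumptions: the four half-mass densities are hypothetical placeholders rather than catalogue values, the normalizations Ai are obtained by trapezoidal integration on a fixed grid in log10 M, and the helper names are ours.

```python
import math

def evolved_schechter(m, delta, beta=2.0, m_c=1.0e6):
    """One un-normalized term of equation (3), as a density in log M."""
    return m / (m + delta) ** beta * math.exp(-(m + delta) / m_c)

def trapezoid(xs, ys):
    """Trapezoidal integral of tabulated ys over xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def composite_gcmf(log_m_grid, deltas, beta=2.0, m_c=1.0e6):
    """Sum of evolved Schechter terms, each scaled to unit area in log10 M."""
    total = [0.0] * len(log_m_grid)
    for delta in deltas:
        term = [evolved_schechter(10.0 ** x, delta, beta, m_c) for x in log_m_grid]
        area = trapezoid(log_m_grid, term)      # plays the role of 1/A_i
        for i, value in enumerate(term):
            total[i] += value / area
    return total

def peak_mass(log_m_grid, values):
    """Grid mass at which the tabulated dN/dlogM is largest."""
    i_max = max(range(len(values)), key=values.__getitem__)
    return 10.0 ** log_m_grid[i_max]

# Hypothetical half-mass densities (Msun/pc^3), for illustration only.
rho_h_sample = [1.0, 30.0, 246.0, 2000.0]
deltas = [1.45e4 * math.sqrt(rho) for rho in rho_h_sample]   # equation (4)
grid = [1.0 + 0.01 * i for i in range(701)]                  # log10 M from 1 to 8
gcmf = composite_gcmf(grid, deltas)
print(peak_mass(grid, gcmf))
```

Because every term is single-peaked in log M, the composite peak has to fall between the lowest and highest individual turnovers, which is one way to see why the binned MTO rises more slowly than ρh^{1/2}.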
The globulars at these radii have 0.034 ≤ ρh ≤ 4.1 × 10^3 M⊙ pc^−3, corresponding to individual evolved Schechter functions with turnovers at 2.7 × 10^3 ≲ MTO ≲ 4.0 × 10^5 M⊙. The composite GCMF in the lower-right panel of Figure 2 is therefore extremely broad and shows a very flat peak, such that an overall MTO cannot be established precisely from the data alone. This explains the findings of Kavelaars & Hanes (1997), who pointed out that the GCMF of the outermost third of the Milky Way cluster system has a turnover that is statistically consistent with the full-Galaxy average, but a larger dispersion (see also Gnedin 1997).

Finally, if the GCMF evolved dynamically from initial conditions similar to those we have adopted, then the data and models in the left-hand panels of Figure 2 argue against the notion that external gravitational shocks, rather than internal two-body relaxation, were primarily responsible for shaping the present-day GCMF. This is because the mass-loss rate caused by shocks alone, −dM/dt = µsh ∝ M/ρh, differs significantly from that caused by evaporation alone, −dM/dt = µev ∝ ρh^{1/2}. The direct dependence of µsh on M ensures that shocks become progressively less important compared to evaporation as clusters lose mass (at a given ρh), and consequently shocks are not likely to have had much effect on the observed GCMF for M < MTO. Furthermore, the inverse dependence of µsh on ρh is contrary to the direct dependence of MTO on ρh shown in Figure 2. The different roles played by shocks and evaporation in shaping the observed GCMF are discussed more fully by FZ01. We note here that gravitational shocks may have been important in destroying very massive or very low-density clusters early in the history of our Galaxy.

2.3. Other Cluster Properties

If the current shape of the GCMF is fundamentally the result of long-term cluster disruption according to a mass-loss rule like µev ∝ ρh^{1/2}, then it should be possible to reproduce the distribution as a function of any other cluster attribute by using the observed ρh of individual GCs in equations (3) and (4) to build model dN/d log M for subsamples of the Galactic cluster system defined by that attribute, as we did for the rgc binning of §2.2. Here we explore one example in which differences in the GCMFs of two groups of globulars can be seen in this way to follow from differences in their ρh distributions.

Smith & Burkert (2002) have shown that the mass function of Galactic globulars with King (1966) model concentrations c < 0.99 has a less massive peak than that for c ≥ 0.99. [Here c ≡ log(rt/r0), where rt is the fitted tidal radius and r0 a core scale.] They further find that a power-law fit to the low-c GCMF just below its peak returns dN/d log M ∝ M^{+0.5}, shallower than the M^{+1} expected generically for a mass-loss rate that is constant in time, but they confirm that the latter slope applies for the GCMF at c ≥ 0.99. They discuss various options to explain these results, including a suggestion that, if the mass functions of both low- and high-concentration clusters evolved slowly from the same, young-cluster–like initial distribution, then the mass-loss law for low-c GCs may have differed from that for high-c clusters. However, they give no physical explanation for such a difference, and we can show now that none is required.

The upper panel of Figure 3 plots concentration against half-mass density for the same 146 GCs from Figure 1; the filled circles distinguish 24 clusters with c < 0.99. There is a correlation of sorts between c and ρh, which either derives from or causes the better-known correlation between c and M (e.g., Djorgovski & Meylan 1994; McLaughlin 2000).
The important point here is that the ρh distribution is offset to lower values and has a higher dispersion at c < 0.99. Following the discussion in §2.2, we therefore expect the low-concentration GCMF to have a smaller MTO, a flatter shape around the peak, and a larger full width than the high-concentration GCMF.

The lower panel of Figure 3 shows the GCMFs for c < 0.99 (filled circles) and c ≥ 0.99 (open circles). The curves are again given by equation (3) with β = 2, Mc = 10^6 M⊙, and individual ∆ calculated from the observed cluster ρh through equation (4). These models peak at MTO ≃ 4.3× 10 for the c < 0.99 subsample but at MTO ≃ 1.8 × 10^5 M⊙ for c ≥ 0.99, entirely as a result of the different ρh involved. The larger width of dN/d log M and its shallower slope at any M ≲ 10^5 M⊙ for the low-concentration GCs are also clear, in the model curves as well as the data. It is further evident that there are no low-c Galactic globulars observed with M ≳ 2 × 10^5 M⊙, above the nominal turnover of the full GCMF (as Smith & Burkert 2002 noted). But this is not surprising, given that there are so few low-concentration clusters in total and they are expected to be dominated by low-mass objects because of their generally low densities. Thus, the solid curve in Figure 3 predicts perhaps ≃ 3 high-mass clusters with c < 0.99, where none is found.

The apparent variation of the Milky Way GCMF with internal concentration is therefore consistent with the same density-based model for evaporation-dominated dynamical evolution that we compared to dN/d log M as a function of ρh and rgc in §2.2. To show this, we have made use of the densities ρh exactly as observed within the two concentration bins indicated in Figure 3, just as we also took ρh directly from the data for GCs in different ranges of rgc to construct models for comparison with the observed dN/d log M in the right-hand panels of Figure 2. Of course, this is not the same as explaining the distribution of ρh versus rgc or c. Doing so would certainly be of interest in its own right, but it is beyond the scope of our work here.

FIG. 3. Top: Concentration parameter as a function of half-mass density for 146 Galactic GCs. The line of points at c ≡ 2.5 comes from the practice of assigning this value to core-collapsed clusters in the Harris (1996) catalogue and its sources. Bottom: GCMF data and models (eqs. [3] and [4]) for 24 clusters with c < 0.99 (filled circles and solid curve) and 122 clusters with c ≥ 0.99 (open circles and dashed curve).

3. DISCUSSION

In this section, we first show that the mass-loss rate in equation (5) above implies cluster lifetimes that compare favorably with those expected from relaxation-driven evaporation. Then we discuss why it is reasonable to approximate µev ∝ ρh^{1/2} in the first place. Finally, we address the issue of possible conflict, in some other models for evaporation-dominated GCMF evolution, between the near-constancy of MTO as a function of rgc and the observed kinematics of GC systems.

3.1. Cluster Lifetimes

The disruption time of a GC with mass M and a steady mass-loss rate µev is just tdis = M/µev. It is convenient, for purposes of comparison with evaporation times in the literature, to normalize tdis to the relaxation time of a cluster at its half-mass radius. In general, this is

$$ t_{\rm rh} = \frac{0.138\,M^{1/2}\,r_h^{3/2}}{G^{1/2}\,m_*\,\ln(\gamma M/m_*)} , $$

where m∗ is the mean stellar mass. For clusters of stars with a single mass, m∗ ≃ 0.7 M⊙ and γ ≃ 0.4 are appropriate (Spitzer 1987; Binney & Tremaine 1987, equation [8-72]), in which case equation (5) for µev from our GCMF modeling implies

$$ \frac{t_{\rm dis}}{t_{\rm rh}} \equiv \frac{M}{\mu_{\rm ev}\,t_{\rm rh}} \simeq 10\,\frac{\ln(0.57\,M/M_\odot)}{\ln(0.57\times10^{5})} . \qquad (6) $$

Clusters with realistic stellar mass spectra will have slightly different values of m∗ and a smaller γ in the calculation of the relaxation time (Giersz & Heggie 1996), which changes the numerical value of tdis/trh somewhat but does not alter any scalings.
We obtained the normalization of µev ∝ ρh^{1/2} in §2.2 by fitting to observed GCMFs constructed by applying a specific mass-to-light ratio ΥV to every cluster, with models assuming a specific form for the initial dN/d log M0. Thus, the result in equation (6) depends both on the median Υ̂V and on the power-law index β at low masses in the original Schechter-function GCMF. The net scaling, for either single- or multiple-mass clusters, is

$$ t_{\rm dis}/t_{\rm rh} \propto \hat{\Upsilon}_V^{-1/2}\,(\beta-1)^{-1} . \qquad (7) $$

To see the dependence of this dimensionless lifetime on Υ̂V, note that we require µev ∝ ∆ ∝ ΥV to fit the mass losses of clusters with a given distribution of luminosities (the direct observables), whereas M/trh is proportional to ρh^{1/2} ∝ ΥV^{1/2} (L/rh^3)^{1/2}. Therefore, tdis/trh ∝ (M/trh)/µev ∝ ΥV^{−1/2}. The mass-to-light ratios adopted in this paper, with a median value Υ̂V ≃ 1.5 M⊙ L⊙^{−1}, are tied directly to dynamical determinations (§2.1).

To understand the dependence on β in equation (7), recall first that the coefficients in our expressions for ∆ and µev as functions of ρh (eqs. [4] and [5]) followed from choosing β = 2 for the power-law exponent at low masses in the initial GCMF (equation [2]). As we mentioned just after equation (3), the turnover mass of an evolved Schechter function with any β > 1 is MTO ≃ ∆/(β − 1) in the limit of low ∆ ∝ ρh^{1/2}, and MTO → Mc for very high ∆. In this sense, the strongest observational constraints on the normalizations of ∆ and µev come from the low-density clusters. All other things being equal, their GCMF can be reproduced with β ≠ 2 if ∆ and µev are multiplied by (β − 1) at fixed ρh. Therefore, tdis ∝ 1/µev ∝ 1/(β − 1). Observations of young massive clusters (e.g., Zhang & Fall 1999) indicate that β is near 2; but if it were slightly shallower, then the cluster lifetimes we infer from the old GCMF would increase accordingly. Even a relatively minor change to β = 1.5 would double tdis/trh from ≈10 to ≈20.
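The value tdis/trh ≈ 10 and its 1/(β − 1) scaling can be reproduced numerically. The sketch below assumes the half-mass density definition ρh = 3M/8πrh^3 (consistent with the density ratio ρh/ρt = rt^3/2rh^3 used in §3.2.1), a value of G converted by us to pc^3 M⊙^−1 Gyr^−2, and the single-mass values m∗ = 0.7 M⊙ and γ = 0.4; the function names are hypothetical.

```python
import math

# G = 4.30091e-3 pc Msun^-1 (km/s)^2, with 1 km/s = 1022.712 pc/Gyr (assumed values).
G_PC3_MSUN_GYR2 = 4.30091e-3 * 1022.712 ** 2

def t_rh_gyr(m, r_h, m_star=0.7, gamma=0.4):
    """Half-mass relaxation time in Gyr for M in Msun and r_h in pc."""
    coulomb_log = math.log(gamma * m / m_star)
    return (0.138 * math.sqrt(m) * r_h ** 1.5
            / (math.sqrt(G_PC3_MSUN_GYR2) * m_star * coulomb_log))

def t_dis_over_t_rh(m, r_h, beta=2.0):
    """Dimensionless lifetime M / (mu_ev * t_rh) using equation (5).

    The factor (beta - 1) rescales mu_ev as described in the text for beta != 2.
    """
    rho_h = 3.0 * m / (8.0 * math.pi * r_h ** 3)     # Msun / pc^3
    mu_ev = 1100.0 * (beta - 1.0) * math.sqrt(rho_h)  # Msun / Gyr
    return (m / mu_ev) / t_rh_gyr(m, r_h)

print(t_dis_over_t_rh(1.0e5, 3.0), t_dis_over_t_rh(1.0e5, 3.0, beta=1.5))
```

The half-mass radius cancels between tdis and trh, so only the Coulomb logarithm carries any dependence on cluster properties, as stated for equation (6).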
In the model of Hénon (1961) for single-mass clusters evolving self-similarly (fixed ratio ρt/ρh of mean densities inside the tidal and half-mass radii) in a steady tidal field (ρt constant in time), a cluster loses 4.5% of its remaining mass every half-mass relaxation time. The time to complete disruption is therefore tdis/trh = 1/0.045 ≃ 22. For non-homologous clusters in a steady tidal field, tdis/trh is a function of central concentration and can differ from the Hénon value by factors of about two. From one-dimensional Fokker-Planck calculations, Gnedin & Ostriker (1997) find tdis/trh ≃ 10–40 for King (1966) model clusters with c values similar to those found in real GCs and with gravitational shocks suppressed (see their Figure 6). Thus, even though the evaporation time in equation (6) may be slightly shorter than is typically found in theoretical calculations, it is certainly within the range of such calculations.

Moreover, the assumptions of a steady tidal field and a single stellar mass in Hénon (1961) and Gnedin & Ostriker (1997) are important. Part of the difference between the typical lifetimes in these particular theoretical treatments and our estimate of tdis/trh from the GCMF is that the former do not include gravitational shocks, which may have accelerated somewhat the evolution of real clusters (although we stress again that shocks do not appear in general to have dominated the evolution of extant Galactic GCs and are not expected to affect the basic time-independence of the net mass-loss rate; see Vesperini & Heggie 1997, Gnedin, Lee, & Ostriker 1999, FZ01, and Prieto & Gnedin 2006). A spectrum of stellar masses in the clusters may also have contributed to an increase in evaporation rate over the single-mass values (e.g., Johnstone 1993; Lee & Goodman 1995).
Estimates of evaporation times from other numerical methods and for models of multimass clusters can be rather sensitive to the detailed computational techniques and input assumptions and approximations, and differences at roughly the factor-of-two level in tdis/trh between different analyses are not uncommon; see, e.g., Vesperini & Heggie (1997), Takahashi & Portegies Zwart (1998, 2000), Baumgardt (2001), Joshi, Nave, & Rasio (2001), Giersz (2001), and Baumgardt & Makino (2003). Thus, although the lifetimes in these studies tend to be broadly comparable to those in Hénon (1961) and Gnedin & Ostriker (1997), noticeably shorter values do occur in some models.

In any case, we are encouraged by consistency to within factors of two or three between estimates of tdis or µev by such vastly different methods (one purely observational, based on the mass functions of cluster systems; the other purely theoretical, based on idealized models for the evolution of individual clusters), particularly since each method involves several uncertain inputs and parameters.

3.2. Approximating µev ∝ ρh^{1/2}

3.2.1. Half-mass versus Tidal Density

The dimensionless disruption time in equation (6) is independent of any cluster property other than the Coulomb logarithm because we have used GC half-mass densities to estimate tdis = M/µev ∝ M/ρh^{1/2}, while trh also scales as M/ρh^{1/2}. However, as we mentioned above, the Fokker-Planck calculations of Gnedin & Ostriker (1997) in particular show that tdis/trh is actually a function of central concentration, c, for King (1966) model clusters in steady tidal fields. The constant of proportionality in µev ∝ ρh^{1/2} should therefore also depend on c, a detail that we have neglected to this point. We show now that this has not biased any of our analysis or affected our conclusions.

The dotted curve in Figure 4 illustrates the dependence of tdis/trh on c for single-mass King models, as given by equation (30) of Gnedin & Ostriker (1997).
The solid curve is proportional to (ρh/ρt)^{1/2} = (rt^3/2rh^3)^{1/2}, which we have calculated as a function of c for these models and multiplied by a constant to compare directly with tdis/trh. Evidently, there is an approximate equality tdis/trh ≈ 2.15 (ρh/ρt)^{1/2}, which holds to within <15% over the range of concentrations shown in Figure 4 (note that all but 6 Galactic GCs have 0.7 ≤ c ≤ 2.5, corresponding to central potentials 3 ≲ W0 ≲ 11). Thus, if the evaporation time is written as tdis ∝ trh (ρh/ρt)^{1/2} ∝ M/ρt^{1/2}, then the constant of proportionality in the mass-loss rate µev ∝ M/tdis ∝ ρt^{1/2} should be nearly independent of c. In fact, King (1966) originally concluded, from quite basic arguments, that the evaporation rate of a cluster with a lowered-Maxwellian velocity distribution would take the form µev ∝ ρt^{1/2} with only a weak dependence on c. An essentially concentration-independent scaling of µev with ρt^{1/2} is also found in N-body simulations of tidally limited, multimass clusters (e.g., Vesperini & Heggie 1997) and so is not an artifact of any assumptions specific to the calculations of either King (1966) or Gnedin & Ostriker (1997).

FIG. 4. Dependence of tdis/trh (dotted line; from Gnedin & Ostriker 1997) and (ρh/ρt)^{1/2} (solid line; after scaling by a factor of 2.15) on central concentration for single-mass King-model clusters. Over the range of c shown, which includes nearly all Galactic globulars, the approximate proportionality tdis/trh ∝ (ρh/ρt)^{1/2} holds to within better than 15%. Thus, to this level of accuracy the evaporation time tdis is roughly the same multiple of trh (ρh/ρt)^{1/2} for clusters with any internal density profile.

This suggests that it might have been more natural to specify cluster evaporation rates proportional to ρt^{1/2} rather than ρh^{1/2} when developing our GCMF models in §2.
For any cluster in a steady tidal field, with a constant ρt, such a choice would also have been automatically consistent with an approximately time-independent µev and the corresponding linear M(t) dependence that we have adopted throughout this paper. As we discussed at the beginning of §2, our decision to work with ρh rather than ρt was motivated by the fact that the half-mass density is much better defined in principle and more accurately observed in practice. Nevertheless, re-writing µev ∝ ρt^{1/2} as µev ∝ (ρt/ρh)^{1/2} × ρh^{1/2} makes it clear that the validity of our models, with a fixed coefficient in µev ∝ ρh^{1/2}, depends on the extent to which variations in (ρt/ρh)^{1/2} can safely be ignored.

Figure 4 shows that the full range of possible values for (ρh/ρt)^{1/2} in King-model clusters with c ≥ 0.7 is only a factor of ≃ 4 between minimum and maximum. Therefore, using a single, intermediate value of this density ratio to describe all GCs (or a single GC evolving in time through a series of quasi-static King models), which we have effectively done by using a GCMF fit to normalize ∆ and µev in equations (4) and (5), should never be in error by more than a factor of 2 or so. This is a relatively small inaccuracy, given that measured GC densities range over four to five orders of magnitude.

To confirm more directly that our models with µev ∝ ρh^{1/2} are good approximations to GCMF evolution under a mass-loss law µev ∝ ρt^{1/2}, we have repeated the analysis of §2 in full but using the GC tidal densities ρt (derived from the values of rt listed by Harris 1996) in place of ρh throughout. All of our main results persist.
For example, the two panels of Figure 5, which are analogous to the left- and rightmost panels of Figure 1 above, show that (1) the GC mass distribution has a clear dependence on ρt, with a lower envelope that is well matched by a line of constant evaporation time, M ∝ ρt^{1/2} (the dashed line in the plot); and (2) although the scatter in the distribution of ρt over Galactocentric radius is smaller than the scatter in ρh versus rgc, it is still significant. Because the M–rgc distribution can now be viewed as the convolution of the M–ρt distribution with the ρt–rgc distribution, the scatter in ρt versus rgc is again critical in explaining the weak or null dependence of the GCMF on Galactocentric radius. (The M–rgc distribution is, of course, unchanged from that shown in the middle panel of Figure 1.)¹¹

Figure 6 shows the Milky Way GCMF for globulars in three equally populated bins of tidal density (defined as indicated in the left-hand panels of the plot) and in the same three bins of Galactocentric radius that we used in §2.2 above. Our models for these distributions are based as before on equation (3) with β = 2, but now the total mass lost from any GC is estimated from its tidal density rather than its half-mass density. Specifically, we take

$$ \Delta = 2.1\times10^{5}\,M_\odot\,\left(\rho_t / M_\odot\,{\rm pc}^{-3}\right)^{1/2} . \qquad (8) $$

The numerical coefficient in equation (8) is such that it gives a ∆ identical to that in equation (4) for a GC with ρh/ρt = 210, which is the median value of this density ratio for the 146 GCs in the Harris (1996) catalogue.

As in Figure 2, the dashed curve in every panel of Figure 6 is the same, representing a fit to the average dN/d log M of the entire Galactic GC system. Thus, it is immediately clear that the peak mass of the GCMF increases significantly and systematically with increasing ρt, just as it does with increasing ρh.
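As a quick arithmetic check on the coefficient in equation (8), the following sketch compares the two prescriptions for a cluster at the median density ratio ρh/ρt = 210. The function names and the choice ρt = 1 M⊙ pc^−3 are illustrative assumptions only.

```python
import math

def delta_from_rho_h(rho_h):
    """Equation (4): mass loss (Msun) from the half-mass density in Msun/pc^3."""
    return 1.45e4 * math.sqrt(rho_h)

def delta_from_rho_t(rho_t):
    """Equation (8): mass loss (Msun) from the tidal density in Msun/pc^3."""
    return 2.1e5 * math.sqrt(rho_t)

rho_t = 1.0               # arbitrary illustrative tidal density
rho_h = 210.0 * rho_t     # median ratio quoted in the text
print(delta_from_rho_h(rho_h) / delta_from_rho_t(rho_t))
```

The ratio is close to unity because 1.45 × 10^4 multiplied by the square root of 210 is approximately 2.1 × 10^5.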
Meanwhile, the solid curves are subsample-specific model GCMFs, obtained by using the observed tidal density of each cluster in any ρt or rgc bin to specify individual ∆ values via equation (8) for each of the evolved Schechter functions in the summation of equation (3). As expected, there is no appreciable difference, in terms of the fits to any of the observed GCMFs, between these models based on evaporation rates µev ∝ ρt^{1/2} and our original models with µev ∝ ρh^{1/2}.

¹¹ As was also the case with our earlier plots involving ρh in Figure 1, the scatter and structure in both panels of Figure 5 are real, since the rms scatter of log rt about the best-fit lines to either of log M or log rgc is 0.3–0.35 while the rms errorbars based on formal fitting uncertainties are in the range δ(log rt) ≃ 0.05–0.15 for a variety of models (McLaughlin & van der Marel 2005).

FIG. 5. Scatter plots of mass M versus mean density inside the tidal radius (ρt ≡ 3M/4πrt^3) and of ρt versus Galactocentric radius rgc, for 146 Galactic GCs from the Harris (1996) catalogue. These plots are analogous to the left- and rightmost panels of Figure 1. The dashed line in the left-hand plot traces the relation M ∝ ρt^{1/2}, which defines a locus of constant evaporation time for µev ∝ ρt^{1/2}.

FIG. 6. Observed GCMF (points, with Poisson errorbars) and models (curves) as a function of mean cluster density inside the tidal radius, ρt ≡ 3M/4πrt^3 (left-hand panels), and as a function of Galactocentric radius, rgc (right-hand panels). The dashed curve in every panel is an evolved Schechter function representing the entire GC system: equation (3) with β = 2, Mc = 10^6 M⊙, and a single ∆, common to all clusters, evaluated from equation (8) using the median ρ̂t of all 146 Galactic GCs. Solid curves are subsample-specific models using equation (3) with β = 2 and Mc = 10^6 M⊙ but a different ∆ value for every cluster (obtained from equation [8] using individual observational estimates of ρt) in any ρt or rgc bin.

3.2.2. Retarded Evaporation

Another potential concern comes from recent arguments (see especially Baumgardt 2001; Baumgardt & Makino 2003) that the total evaporation time of a tidally limited cluster is not simply a multiple of an internal two-body relaxation time, trlx ∝ (M r^3)^{1/2}, but depends on both trlx and the crossing time tcr ∝ (M/r^3)^{−1/2} through the combination tdis ∝ trlx^x tcr^{1−x} with x < 1. The mass-loss rate µev ∝ M/tdis then scales as M^{3/2−x} r^{−3/2}, which for x ≠ 1 differs from the rates µev ∝ ρh^{1/2} and µev ∝ ρt^{1/2} that we have so far adopted. However, our GCMF models are still meaningful, because postulating tdis ∝ trlx^x tcr^{1−x} implies a dependence of µev on a measure of cluster density that is, once again, well approximated by ρh^{1/2} for Galactic GCs. Before showing this, we briefly discuss the reasons and the evidence for a possible dependence of tdis on both trlx and tcr.

If stars are assumed to escape a cluster as soon as they have attained energies above some critical value as a result of two-body relaxation, then tdis ∝ trlx is expected (and confirmed by N-body simulations; e.g., Baumgardt 2001). However, more complicated behavior may arise when escape not only depends on stars satisfying such an energy criterion, but also requires them to cross a spatial boundary. Then, although the stars are still scattered to near- and above-escape energies on the timescale trlx, they require some additional time to actually leave the cluster. This escape timescale is related fundamentally to tcr (but also depends on details of the stellar orbits, the external tidal field, and the shape of the zero-energy surface). The longer this extra time, the higher is the probability that further encounters with bound cluster stars may scatter any potential escapers back down to sub-escape energies.
The net result is a slow-down (“retardation”) of the overall evaporation rate (Chandrasekhar 1942; King 1959; Takahashi & Portegies Zwart 1998, 2000; Fukushige & Heggie 2000; Baumgardt 2001) and a lengthening of the cluster lifetime tdis, by a factor that can be expected to increase with the ratio tcr/trlx. If this factor scales as (tcr/trlx)^{1−x} for some x < 1, then tdis ∝ trlx (tcr/trlx)^{1−x} = trlx^x tcr^{1−x}.

While such a retardation of evaporation can be expected to occur at some level in all clusters, there are physical subtleties in the effect that are probably not captured adequately by a simple re-parametrization of lifetimes as tdis ∝ trlx^x tcr^{1−x}. In particular, it is unlikely that this expression can hold for clusters of all masses with a single value of x < 1. Since tcr/trlx ∝ M^{−1}, very massive clusters have tcr ≪ trlx, and stars scattered to greater than escape energies by relaxation cross the tidal boundary effectively instantaneously, implying that the standard tdis ∝ trlx, or x → 1, applies in the high-mass limit. Indeed, if this were not the case, and a fixed x < 1 held for all M, then an unphysical tdis < trlx would obtain at high enough masses; see Baumgardt (2001) for further discussion. Unfortunately, “very massive” is not well quantified in this context, and it is not yet clear if a single value of x is accurate for the entire GC mass regime. So far, it has been checked directly only for initial cluster masses below the current peak of the GCMF.

It is also worth noting that the analysis and simulations aimed at this problem to date have dealt with clusters on circular or moderately eccentric orbits in galactic potentials that are static and spherical. This means that any tidal perturbations felt by stars within the clusters are relatively weak and/or slow compared to their own orbital periods, leading to nearly adiabatic or at least non-impulsive responses.
In more realistic situations, the galactic potential would be time-dependent and non-spherical and there might be additional tidal perturbations, including disk and bulge shocks. These perturbations could in some cases accelerate the escape of weakly bound stars from the clusters and thus counteract the retardation effect to some degree. Further study is therefore needed to determine the regime of validity of the formula tdis ∝ trlx^x tcr^{1−x} and its possible modification outside this regime.

In the meantime, Baumgardt (2001) and Baumgardt & Makino (2003; hereafter BM03) have fitted this formula to the lifetimes of a suite of N-body clusters with initial masses M0 ≲ 7 × 10^4 M⊙ and several different initial concentrations and orbital eccentricities. BM03 at first write tdis in terms of the relaxation and crossing times of clusters at their half-mass radii, so that trlx ∝ (M rh^3)^{1/2}, tcr ∝ (M/rh^3)^{−1/2}, and tdis ∝ M^{x−1/2} rh^{3/2} (see their equation [5]). However, they immediately take a factor of (rt/rh)^{3/2} out from the normalization of this scaling, in effect to obtain tdis ∝ M^{x−1/2} rt^{3/2} with a different constant of proportionality, and then use a simple definition of the tidal radius (their equation [1], rt^3 = G M rp^2/2Vc^2, which is appropriate for a circular orbit of radius rp in a logarithmic potential with circular speed Vc; see Innanen, Harris, & Webbink 1983) to obtain the total lifetime of a cluster as a function of its initial mass, perigalactic distance, and Vc (their equation [7]). A single exponent x ≃ 0.75 and a single normalization in this function then suffice to predict to within 10% the lifetimes of the simulated clusters, regardless of their initial concentrations. By implication, if trlx and tcr were fixed at rh rather than rt, then tdis would have an additional concentration dependence, related to the ratio (rt/rh)^{3/2}, very similar to what we discussed in §3.2.1 for the case x = 1.
We now re-examine the Milky Way GCMF in terms of this prescription for retarded evaporation (bearing in mind the caveats mentioned above). To avoid any explicit dependences on concentration, we also focus on the tidal radius and write tdis ∝ M^{x−1/2} rt^{3/2} for general x ≤ 1; but we do not substitute a potential- and orbit-specific formula for rt in terms of rp and galactic properties such as Vc. Instead, to keep the emphasis entirely on cluster densities, we re-write the scaling of the lifetime in terms of the mean surface density inside the tidal radius, Σt ≡ M/πrt^2, and the corresponding volume density ρt = 3M/4πrt^3. This leads to tdis ∝ M Σt^{−3(1−x)} ρt^{−2(x−3/4)}, which then implies

$$ \mu_{\rm ev} \equiv -\frac{dM}{dt} \propto \frac{M}{t_{\rm dis}} \propto \Sigma_t^{3(1-x)}\,\rho_t^{2(x-3/4)} . \qquad (9) $$

Clearly, the standard µev ∝ ρt^{1/2}, which we have already discussed, is recovered for x = 1; while for x = 0.75, we have the equally straightforward µev ∝ Σt^{3/4}.

BM03 find that, even with the retarded evaporation implied by x ≃ 0.75, the masses of their simulated clusters still decrease approximately linearly with time after stellar-evolution effects (which are only important for the first few 10^8 yr) are separated out; see especially their Figure 6, equation (12), and related discussion. Thus, if the GCMF initially rose towards low masses and has been eroded by slow, relaxation-driven cluster destruction, then in this modified description of evaporation we might expect the current mass function to depend fundamentally on Σt rather than ρh or ρt. But because M(t) still decreases nearly linearly with t, only now with µev ∝ Σt^{3/4} for each cluster, the shape of the evolved GCMF and its dependence on Σt should resemble our earlier results for ρh and ρt.

We have confirmed this expectation by repeating all of our analyses in §2 again, now using µev ∝ Σt^{3/4} to estimate cluster mass-loss rates.
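The exponent bookkeeping behind equation (9) is easy to get wrong, so a small exact-arithmetic sketch may help. It tracks only the powers of M and rt, using trlx ∝ (M rt^3)^{1/2}, tcr ∝ (M/rt^3)^{−1/2}, Σt ∝ M rt^{−2} and ρt ∝ M rt^{−3} from the text; the function names are ours.

```python
from fractions import Fraction

def tdis_exponents(x):
    """(power of M, power of r_t) in t_dis ∝ t_rlx^x * t_cr^(1-x)."""
    half = Fraction(1, 2)
    m_pow = x * half + (1 - x) * (-half)           # M^(1/2) and M^(-1/2)
    r_pow = x * Fraction(3, 2) + (1 - x) * Fraction(3, 2)
    return m_pow, r_pow

def mu_ev_exponents_direct(x):
    """Powers of (M, r_t) in mu_ev ∝ M / t_dis."""
    m_pow, r_pow = tdis_exponents(x)
    return 1 - m_pow, -r_pow

def mu_ev_exponents_eq9(x):
    """Powers of (M, r_t) in Sigma_t^(3(1-x)) * rho_t^(2(x-3/4)), equation (9)."""
    a = 3 * (1 - x)                  # exponent on Sigma_t ∝ M r_t^-2
    b = 2 * (x - Fraction(3, 4))     # exponent on rho_t   ∝ M r_t^-3
    return a + b, -2 * a - 3 * b

for x in (Fraction(1), Fraction(3, 4), Fraction(1, 2)):
    print(x, mu_ev_exponents_direct(x), mu_ev_exponents_eq9(x))
```

For every x the two routes give M^{3/2−x} rt^{−3/2}, and at x = 3/4 the ρt exponent vanishes, leaving µev ∝ Σt^{3/4}.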
As before, we calculate $\Sigma_t$ from the data in the Harris (1996) catalogue, although we caution once more that the tidal radii, and thus the derived $\Sigma_t$, are more uncertain than $r_h$ and $\rho_h$. Figure 7, which should be compared to Figures 1 and 5 above, shows that the average Galactic GC mass increases systematically with $\Sigma_t$; that the lower envelope of the $M$–$\Sigma_t$ distribution is described well by $M \propto \Sigma_t^{3/4}$ (the dashed line in the left-hand panel of Figure 7), which is a locus of constant lifetime against evaporation for $\mu_{\rm ev} \propto \Sigma_t^{3/4}$; and that the scatter in the distribution of cluster $\Sigma_t$ versus Galactocentric radius (right-hand panel of the figure) is substantial, as required to account for the almost non-existent correlation between $M$ and $r_{\rm gc}$.

The left-hand side of Figure 8 shows the mass functions of globulars in three bins of $\Sigma_t$, as defined in each panel. The right-hand side of the figure shows $dN/d\log M$ in the same three intervals of $r_{\rm gc}$ as in Figures 2 and 6 above. As in those earlier plots, the dashed curve in all panels of Figure 8 is a model GCMF with the same parameters in every case, representing the mass function of the entire Galactic GC system. Once again, compared to the average $M_{\rm TO}$, the observed turnover mass is significantly lower for clusters in the lowest $\Sigma_t$ bin and higher for clusters in the highest $\Sigma_t$ bin, while the width of $dN/d\log M$ decreases noticeably as $\Sigma_t$ increases.

The solid curves in Figure 8 are again different in every panel. They are the sums of evaporation-evolved Schechter functions as in equation (3), with the usual $\beta = 2$ assumed but with total mass losses estimated individually for each GC in any $\Sigma_t$ or $r_{\rm gc}$ bin according to $\Delta \propto \Sigma_t^{3/4}$ rather than $\Delta \propto \rho_h^{1/2}$ or $\Delta \propto \rho_t^{1/2}$. However, it turns out not to be necessary to change the normalization of $\Delta \propto \rho_h^{1/2}$ in equation (4) to achieve good fits to the observed GCMF as a function of either $\Sigma_t$ or $r_{\rm gc}$. Thus, in Figure 8 we have simply used $\Delta = 1.45\times10^4\,M_\odot\,(\Sigma_t/M_\odot\,{\rm pc}^{-2})^{3/4}$ .
(10) The fits of these models, based on $t_{\rm dis} \propto t_{\rm rlx}^{x}\,t_{\rm cr}^{1-x}$ with $x \simeq 0.75$, are indistinguishable from the fits of our original models based on the standard $t_{\rm dis} \propto t_{\rm rlx}$, i.e., $x = 1$. (We have confirmed that adopting individual $\Delta$ given by equation [10] also reproduces the GCMFs of low- and high-concentration GCs in Figure 3 as well as before.) It was somewhat unexpected that equation (10) and equation (4) should have the same numerical coefficient, but we note that this follows empirically from the fact that the measured $\rho_h$ and $\Sigma_t$ of Galactic GCs are consistent with the simple near-equality, $\rho_h/M_\odot\,{\rm pc}^{-3} \simeq (\Sigma_t/M_\odot\,{\rm pc}^{-2})^{1.5}$ in the mean. This is illustrated in Figure 9, which also shows that there is significant scatter about the relation.¹² However, this scatter does not correlate with cluster mass or Galactocentric radius. From a pragmatic point of view, therefore, $\rho_h^{1/2}$ and $\Sigma_t^{3/4}$ are near enough to interchangeable for our purposes, and there is no practical difference between GCMF models based on one or the other measure of GC density.

¹² Although it may be only a coincidence that the constant of proportionality in $\rho_h \propto \Sigma_t^{3/2}$ is so near unity, the basic scaling itself holds because combining the observed correlation between cluster mass and central concentration (Djorgovski & Meylan 1994; McLaughlin 2000) with the intrinsic dependence of $r_t/r_h$ on $c$ in King models leads roughly to $(r_t/r_h) \propto M^{1/6}$.

One further check on this is to verify that the mass-loss rate associated with equation (10) is roughly in keeping with that implied by the N-body simulations pointing to $x = 0.75$ in the first place. Thus, we compare the rate

$\mu_{\rm ev} = \Delta/(13\ {\rm Gyr}) \simeq 1100\,M_\odot\,{\rm Gyr}^{-1}\,(\Sigma_t/M_\odot\,{\rm pc}^{-2})^{3/4}$ (11)

to a formula implicit in BM03.
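The arithmetic linking equations (10) and (11), and the equivalence of the $\rho_h$- and $\Sigma_t$-based prescriptions under the mean relation $\rho_h \simeq \Sigma_t^{1.5}$, can be reproduced with a few lines of Python. This is a sketch only: the function names are ours, and the 13 Gyr age and the coefficient $1.45\times10^4\,M_\odot$ are taken from the text.

```python
AGE_GYR = 13.0          # cluster age adopted in equation (11)
COEFF_MSUN = 1.45e4     # common coefficient of equations (4) and (10)

def delta_from_sigma_t(sigma_t):
    """Equation (10): total mass lost [M_sun] for Sigma_t in M_sun pc^-2."""
    return COEFF_MSUN * sigma_t ** 0.75

def delta_from_rho_h(rho_h):
    """Equation (4) as described in the text: Delta ~ rho_h^(1/2), rho_h in M_sun pc^-3."""
    return COEFF_MSUN * rho_h ** 0.5

def mu_ev(sigma_t):
    """Equation (11): mean mass-loss rate [M_sun / Gyr]."""
    return delta_from_sigma_t(sigma_t) / AGE_GYR

print(round(mu_ev(1.0)))            # coefficient of equation (11), about 1100
sigma_t = 30.0                      # arbitrary illustrative value
rho_h = sigma_t ** 1.5              # mean empirical relation quoted in the text
print(delta_from_sigma_t(sigma_t), delta_from_rho_h(rho_h))
```

Because $(\Sigma_t^{1.5})^{1/2} = \Sigma_t^{3/4}$, a cluster lying exactly on the mean relation receives the same $\Delta$ from either prescription, which is why the two sets of fits share one normalization.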
Starting with their equation (7) for the lifetime $t_{\rm dis}$ as a function of initial cluster mass and perigalactic distance and circular speed in a logarithmic halo potential; using their $x = 0.75$ and their normalization of $1.91\times10^6$ yr, multiplied as in their equation (9) by $(1 + e)$ to allow for eccentric orbits with apo- and perigalactic distances related by $e \equiv (r_a - r_p)/(r_a + r_p)$; inserting their equation (1) for $r_t$; taking the mean mass of cluster stars to be $m_* = 0.55\,M_\odot$, as they do; using $\gamma = 0.02$ as they do in the Coulomb logarithm, $\ln(\gamma M_0/m_*)$; and defining $\Sigma_{t,0} \equiv M_0/\pi r_{t,0}^2$ (the subscript 0 denoting initial values), we obtain

$\mu_{\rm ev}({\rm BM03}) \simeq \dfrac{0.7\,M_0}{t_{\rm dis}} \simeq \dfrac{\ldots}{1+e}\ M_\odot\,{\rm Gyr}^{-1} \left[\dfrac{\ln(0.036\,M_0/M_\odot)}{\ln(0.036\times10^5)}\right]^{3/4} \left(\dfrac{\Sigma_{t,0}}{M_\odot\,{\rm pc}^{-2}}\right)^{3/4}$ . (12)

This is appropriate for clusters that just fill their Roche lobes at perigalacticon, which is where $\Sigma_{t,0}$ is specified. The factor of 0.7 in the first equality accounts for mass loss due to stellar evolution in the BM03 simulations, which, as they discuss, can be treated as having occurred almost immediately and in full at the beginning of a cluster's life.

Our GCMF-based $\mu_{\rm ev}$ is a factor of ≈ 2 faster than the N-body value for clusters on circular orbits (with $e = 0$ and in steady tidal fields) in the simulations; and our $\mu_{\rm ev}$ is still within a factor of about three of the N-body rate for clusters on eccentric orbits with $e = 0.5$ in BM03 ($e \simeq 0.5$–$0.6$ is typical for tracers with an isotropic velocity distribution in a logarithmic potential; van den Bosch et al. 1999). This is very similar to the comparison of lifetimes in §3.1 for our original models based on $\mu_{\rm ev} \propto \rho_h^{1/2}$. Moreover, our new estimate of $\mu_{\rm ev}$ and that in BM03 are still subject to their own, separate uncertainties and reflect different idealizations and assumptions.
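The chain of substitutions described above can be followed numerically. The Python sketch below is a rough reconstruction under stated assumptions, not the authors' or BM03's code: it assumes that BM03's lifetime formula has the form $t_{\rm dis} = 1.91\,{\rm Myr}\,[N/\ln(\gamma N)]^{x}\,(r_p/{\rm kpc})\,(V_c/220\ {\rm km\,s^{-1}})^{-1}\,(1+e)$ (the kpc and 220 km s$^{-1}$ units are our assumption), uses $G \simeq 4.301\times10^{-3}$ pc (km/s)$^2$ $M_\odot^{-1}$, and takes $r_t^3 = G M r_p^2/2V_c^2$ for the tidal radius.

```python
import math

G = 4.30091e-3      # gravitational constant in pc (km/s)^2 / M_sun
M_STAR = 0.55       # mean stellar mass [M_sun], as in the text
GAMMA = 0.02        # Coulomb-logarithm factor, as in the text
X = 0.75            # exponent fitted by BM03
T_NORM_MYR = 1.91   # BM03 normalization (1.91e6 yr)

def bm03_rate(m0, sigma_t0, ecc=0.0, v_c=220.0):
    """Approximate BM03 mass-loss rate [M_sun/Gyr] for initial mass m0 [M_sun]
    and initial tidal surface density sigma_t0 [M_sun/pc^2] (assumed formula)."""
    r_t = math.sqrt(m0 / (math.pi * sigma_t0))           # tidal radius [pc]
    r_p = v_c * math.sqrt(2.0 * r_t ** 3 / (G * m0))      # pericenter [pc]
    n_stars = m0 / M_STAR
    t_dis_myr = (T_NORM_MYR * (n_stars / math.log(GAMMA * n_stars)) ** X
                 * (r_p / 1000.0) * (220.0 / v_c) * (1.0 + ecc))
    return 0.7 * m0 / (t_dis_myr / 1000.0)               # 0.7: stellar-evolution loss

def gcmf_rate(sigma_t):
    """Equation (11): GCMF-based rate [M_sun/Gyr]."""
    return 1.45e4 * sigma_t ** 0.75 / 13.0

m0, sigma = 1.0e5, 1.0
print(gcmf_rate(sigma) / bm03_rate(m0, sigma, ecc=0.0))   # circular orbit
print(gcmf_rate(sigma) / bm03_rate(m0, sigma, ecc=0.5))   # eccentric orbit
```

Under these assumptions the ratio of the GCMF-based rate to the N-body rate is close to 2 for circular orbits and close to 3 for $e = 0.5$ at $M_0 = 10^5\,M_\odot$, in line with the comparison in the text; the ratio does not depend on the adopted $V_c$ or $\Sigma_t$, which cancel.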
For example, our rate still depends on the exact power-law exponent $\beta$ at low masses in the initial GCMF, as discussed after equation (7); while the rate from BM03 still neglects gravitational shocks from disk crossings and passages by a discrete galactic bulge, and may additionally be biased low for $M_0 > 10^5\,M_\odot$ if $x > 0.75$ at such masses. All of this (not to mention again the large uncertainties and possible systematics in the estimates of tidal radii needed to calculate $\Sigma_t$) makes the near agreement between equations (11) and (12) more striking than any apparent discrepancy.

In summary, although the relation $\mu_{\rm ev} \propto \rho_h^{1/2} \simeq$ constant in time is rigorously correct only in rather specific circumstances, our GCMF models based on it in §2 are good proxies, in all respects, for models based on other plausible characterizations of relaxation-driven cluster mass loss. This result will likely be important for future studies of the mass functions of extragalactic cluster systems, where it may well be necessary to adopt procedures based on $\rho_h$ rather than $\rho_t$ or $\Sigma_t$ because of the difficulty or impossibility of estimating tidal radii.

FIG. 7. Scatter plots of mass $M$ versus mean surface density inside the tidal radius ($\Sigma_t \equiv M/\pi r_t^2$) and of $\Sigma_t$ versus Galactocentric radius $r_{\rm gc}$, for 146 Galactic GCs from the Harris (1996) catalogue. These plots are analogous to the left- and rightmost panels of Figure 1, and the two panels of Figure 5. The dashed line in the left-hand plot traces the relation $M \propto \Sigma_t^{3/4}$, which defines a locus of constant evaporation time for $\mu_{\rm ev} \propto \Sigma_t^{3/4}$.

FIG. 8. Observed GCMF (points, with Poisson errorbars) and models (curves) as a function of mean surface density inside the tidal radius, $\Sigma_t \equiv M/\pi r_t^2$ (left-hand panels), and as a function of Galactocentric radius, $r_{\rm gc}$ (right-hand panels). The dashed curve in every panel is an evolved Schechter function representing the entire GC system: equation (3) with $\beta = 2$, $M_c = 10^6\,M_\odot$, and a single $\Delta$, common to all clusters, evaluated from equation (10) using the median $\widehat{\Sigma}_t$ of all 146 Galactic GCs. Solid curves are subsample-specific models using equation (3) with $\beta = 2$ and $M_c = 10^6\,M_\odot$ but a different $\Delta$ value for every cluster (obtained from equation [10] using individual observational estimates of $\Sigma_t$) in any $\Sigma_t$ or $r_{\rm gc}$ bin.

FIG. 9. Half-mass density, $\rho_h = 3M/8\pi r_h^3$, against mean surface density inside the tidal radius, $\Sigma_t = M/\pi r_t^2$, for 146 clusters with data in Harris (1996). The straight line is $\rho_h = \Sigma_t^{3/2}$.

3.3. $M_{\rm TO}$ versus $r_{\rm gc}$, and Velocity Anisotropy in GC Systems

In this paper we have directly modeled $dN/d\log M$ as a function only of GC density and age, and used the observed $\rho_h$ (or $\rho_t$, or $\Sigma_t$) of clusters in relatively narrow ranges of Galactocentric position to show that such models are consistent with the current near-constancy of the GCMF as a function of $r_{\rm gc}$. Most other models in the literature for evaporation-dominated GCMF evolution, in either the Milky Way or other galaxies, instead predict the distribution explicitly as a function of $r_{\rm gc}$ at any time. They therefore need, in effect, to derive theoretical density–position relations for clusters in galaxies alongside their main GCMF calculations. This usually begins with the adoption of analytical potentials to describe the parent galaxies of GCs. Taking these to be spherical and static for a Hubble time allows the use of standard tidal-limitation formulae to write GC densities ab initio in terms of the (fixed) pericenters $r_p$ of unique orbits in the adopted potentials. Cluster relaxation times and mass-loss rates $\mu_{\rm ev}$ then follow as functions of $r_p$ as well.
Finally, specific initial mass, space, and velocity (or orbital eccentricity) distributions are chosen for entire GC systems, so that at all later times it is known what the dynamically evolved $dN/d\log M$ is for globulars with any single $r_p$; how many clusters with a given $r_p$ survive; and what the distributions of $r_p$ and all dependent cluster properties are at any instantaneous position $r_{\rm gc}$.

In this approach, if the GCMF began with a power-law rise towards low masses and its current peak is due entirely to cluster disruption, then a dependence of $M_{\rm TO}$ on $r_p$ is expected in general, because the densities of tidally limited GCs decrease with increasing $r_p$. Thus, models along these lines that assume the orbit distribution of a GC system to be the same at all radii in a galaxy (i.e., that the time average of the ratio $r_{\rm gc}/r_p$ is independent of position) have typically had difficulty in accounting for the observed weak or non-correlation between $M_{\rm TO}$ and present $r_{\rm gc}$ in large galaxies. This is particularly a problem if it is assumed that the initial GCMF was a pure power law, with the same index at arbitrarily high masses as low (e.g., Baumgardt 1998; Vesperini 2001). It is potentially less of a concern if $dN/d\log M$ started as a Schechter function with an exponential cut-off at masses $M > M_c$, as we have assumed, since then the existence of a strict upper bound $M_{\rm TO} \le M_c$ (§2.2) means that the dependence of an evaporation-evolved $M_{\rm TO}$ on $r_p$ and $r_{\rm gc}$ must saturate for small enough galactocentric radii (high enough GC densities). Even so, the “scale-free” models of FZ01, in which $M_c \simeq 10^{\ldots}$ and all GCs in a Milky Way-like galaxy potential have the same time-averaged $r_{\rm gc}/r_p$, still predict a gradient in $M_{\rm TO}$ versus $r_{\rm gc}$ that is stronger than observed.

FZ01 showed that, if they left all of their other assumptions unchanged, then a dependence of GCMF peak mass on $r_{\rm gc}$ could be effectively erased by an appropriately varying radial velocity anisotropy in the initial GC system.
Thus, in their “Eddington” models the eccentricity of a typical cluster orbit increases with galactocentric distance (the time average of $r_{\rm gc}/r_p$ increases with radius), such that globulars spread over a larger range of current $r_{\rm gc}$ can have more similar $r_p$ and associated $M_{\rm TO}$. However, the initial velocity-anisotropy gradient required to fit the Milky Way GCMF data specifically is only marginally consistent with the observed kinematics of the GC system (e.g., Dinescu, Girard, & van Altena 1999).¹³ Subsequently, Vesperini et al. (2003) constructed broadly similar models for the GCMF of the Virgo elliptical M87 and concluded that there, too, a variable radial velocity anisotropy is required to match the observed $M_{\rm TO}$ versus $r_{\rm gc}$; but the model anisotropy profile in this case is clearly inconsistent with the true velocity distribution of the GC system, which is observed to be isotropic out to large $r_{\rm gc}$ (Romanowsky & Kochanek 2001; Côté et al. 2001).

These results certainly suggest that some element is lacking in $r_{\rm gc}$-oriented GCMF models developed as outlined above. But they do not mean that the fault lies with the main hypothesis, that the difference between the mass functions of young clusters and old GCs is due to the effects of slow, relaxation-driven disruption in the latter case. Any conclusions about velocity anisotropy depend on the totality of steps taken to connect the densities and positions of clusters; and it is possible that reasonable changes to one or more of these ancillary assumptions could make the models compatible with the observed kinematics of GCs in both the Milky Way and M87, without abandoning a basic physical picture of evaporation-dominated GCMF evolution that is otherwise quite successful.

One issue is that previous models have always specified evaporation rates a priori as functions of cluster density (or orbital pericenter), usually normalizing $\mu_{\rm ev}$ so that $t_{\rm dis}/t_{\rm rh} \simeq 20$–$40$ as in standard treatments of two-body relaxation.
However, following our discussion in §3.1 and §3.2, it would seem worthwhile to investigate these models with $\mu_{\rm ev}$ increased at fixed $\rho_h$ or $r_p$ to allow $t_{\rm dis}/t_{\rm rh} \approx 10$ (if $\beta \simeq 2$ for the low-mass power-law part of the initial GCMF).

¹³ The fact that clusters on radial orbits are preferentially disrupted lessens any inconsistency between the radial anisotropy required in the initial velocity distribution and observational constraints on the present velocity distribution.

FZ01 and Vesperini et al. (2003) both consider velocity distributions parametrized by a galactocentric anisotropy radius, $R_A$, inside of which a cluster system is essentially isotropic and beyond which it is increasingly dominated by radial orbits. In these terms, the difficulty with the published models is that, to reproduce the observed insensitivity of $M_{\rm TO}$ to $r_{\rm gc}$ given standard normalizations of $\mu_{\rm ev}$, they require values of $R_A$ that are smaller than allowed by observations (especially for M87). Increasing $R_A$ to more realistic values while keeping the normalization of $\mu_{\rm ev}$ fixed leads to a stronger gradient in $M_{\rm TO}$: the orbits of GCs at small $r_{\rm gc} \lesssim R_A$ remain closely isotropic and the typical $r_p$ and $M_{\rm TO}$ are essentially unchanged, while at large galactocentric distances the cluster orbits are on average less radial than before, with larger $r_p$, lower densities, and lower evolved $M_{\rm TO}$ for a given $r_{\rm gc}$. This effect is illustrated, for example, in Figure 9 of FZ01. However, it can be compensated at least in part by increasing $\mu_{\rm ev}$ by a common factor for all GCs, with the new, larger $R_A$ fixed, if the initial mass function is assumed to have been a Schechter function rather than a pure power law extending to arbitrarily high masses.
A faster evaporation rate will then lead to a (roughly) proportionate increase in the evolved GCMF peak mass for GCs with relatively low densities, i.e., those at large $r_{\rm gc}$ and $r_p$; but the increase in $M_{\rm TO}$ will be smaller, and eventually even negligible, for higher-density clusters at progressively smaller $r_{\rm gc}$, again because $M_{\rm TO}$ grows less than linearly with $\mu_{\rm ev} \propto \rho_h^{1/2}$ when there is an upper limit $M_{\rm TO} < M_c$ due to an exponential cut-off in the initial $dN/d\log M_0$. Thus, the qualitative effect of increasing the normalization of $\mu_{\rm ev}$ in models with radially varying GC velocity anisotropy is to weaken the amount of radial-orbit bias required to fit an observed $M_{\rm TO}$ versus $r_{\rm gc}$.

Another point, emphasized by FZ01, has to do with the standard starting assumption that GCs orbit in galaxies that are perfectly static and spherical. In reality, galaxies grow hierarchically. In this case, even if the values of $\mu_{\rm ev}$ are not changed, much of the burden for the weakening or erasing of any initial gradients in $M_{\rm TO}$ versus $r_{\rm gc}$ may be transferred from velocity anisotropy to the time-dependent evolution of the galaxies themselves. Violent relaxation, major mergers, and smaller accretion events all work to move clusters between different parts of galaxies and between different progenitors, scrambling and combining any number of pericenter–density–$M_{\rm TO}$ relations. Any position dependences in the GC $\rho_h$ distribution and in $M_{\rm TO}$ itself for the final galaxy are therefore bound to be weaker, more scattered, and more difficult to relate accurately to a cluster velocity distribution than in the case of a monolithic, non-evolving potential. Allowing for a non-spherical galaxy potential would have qualitatively the same effect, because in this case every cluster explores a range of pericenters and different maximum tidal fields on each of its orbits.
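The saturation of $M_{\rm TO}$ at high evaporation rates, invoked at the start of this discussion, can be made concrete with a small calculation. The sketch below assumes that a single-$\Delta$ evolved Schechter function has the form $dN/d\log M \propto M\,(M+\Delta)^{-\beta}\exp[-(M+\Delta)/M_c]$, which is what linear mass loss $M = M_0 - \Delta$ applied to an initial Schechter function implies; the function name is hypothetical. Setting the logarithmic derivative to zero gives a quadratic for the peak mass.

```python
import math

def turnover_mass(delta, m_c=1.0e6, beta=2.0):
    """Peak of M (M + delta)^-beta exp(-(M + delta)/m_c) in log M.

    The peak satisfies 1 - beta*M/(M + delta) - M/m_c = 0, i.e.
    M^2 + b*M - delta*m_c = 0 with b = (beta - 1)*m_c + delta.
    """
    b = (beta - 1.0) * m_c + delta
    # Numerically stable form of the positive root (-b + sqrt(b^2 + 4 delta m_c)) / 2
    return 2.0 * delta * m_c / (b + math.sqrt(b * b + 4.0 * delta * m_c))

for delta in (1.0e3, 1.0e4, 1.0e5, 1.0e6, 1.0e7):
    print(f"Delta = {delta:.0e}  ->  M_TO = {turnover_mass(delta):.3e}")
```

For $\beta = 2$ and $\Delta \ll M_c$ the peak sits at $M_{\rm TO} \simeq \Delta$, so it scales with $\mu_{\rm ev}$; as $\Delta$ approaches and exceeds $M_c$ the peak grows more slowly and stays below $M_c$, which is the behavior the argument relies on.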
In this situation, it may be important to ask how evaporation rates can still be approximately constant in time (so that cluster masses still decrease approximately linearly with $t$, as our models assume) if the tidal field around any given GC changes significantly over time. Thus, consider first a system of GCs in a single, static galaxy potential. The mass-evolution curve for each cluster is approximately a straight line, $M(t) \simeq M_0 - \mu_{\rm ev} t$, with $\mu_{\rm ev}$ depending on some measure of internal density, which may be $\rho_h^{1/2}$, $\rho_t^{1/2}$, or $\Sigma_t^{3/4}$. The average mass-evolution curve for the entire system of clusters is also approximately linear, $\langle M(t)\rangle \simeq \langle M_0\rangle - \langle\mu_{\rm ev}\rangle t$. If now a merger or other event rearranges the clusters in the galaxy, then after the event the mass-loss rates of some clusters will be higher than before and the rates of other clusters will be lower than before. However, if the mean density of the galaxy as a whole is roughly the same after the event as before, then so too will be the average of the GC densities, because of tidal limitation. The average $\langle\mu_{\rm ev}\rangle \propto \langle\rho_h^{1/2}\rangle$ (say) will differ even less between the pre- and post-merger systems. Thus, although using instantaneous densities to estimate the past $\mu_{\rm ev}$ of individual clusters may err on the high side for some clusters and on the low side for others, these errors will average away to a small or even zero net bias. The approximation $\mu_{\rm ev} \simeq$ constant in time in our GCMF models will then still be valid in the mean, and the average $\langle M(t)\rangle$ dependence of sufficiently large numbers of clusters will remain roughly linear.

This type of scenario might be expected to pertain at least to galaxies that evolve on the fundamental plane, since this entails a connection between the total (baryonic plus dark) masses and circular speeds of galaxies, of the form $M_{\rm gal} \propto V_c^3$ or $M_{\rm gal} \propto V_c^4$. By the virial theorem, the average densities scale as $\rho_{\rm gal} \propto V_c^6/M_{\rm gal}^2$, and thus $\rho_{\rm gal} \propto M_{\rm gal}^{0}$ or $\rho_{\rm gal} \propto M_{\rm gal}^{-1/2}$.
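The averaging argument for rearranged clusters can be illustrated with a toy experiment. The sketch below is purely schematic and rests on assumptions of ours: the merger is modeled as a random permutation of a fixed set of mass-loss rates among the clusters (so the mean rate is conserved exactly), the mass and rate ranges are arbitrary, and no cluster is fully destroyed.

```python
import random

def merger_experiment(n=1000, t_event=6.0, t_end=13.0, seed=1):
    """Toy model: constant rates that are permuted among clusters at t_event.

    Masses are in M_sun and rates in M_sun/Gyr, drawn from arbitrary uniform
    ranges chosen so that every cluster survives to t_end.
    """
    rng = random.Random(seed)
    m0 = [rng.uniform(3.0e5, 1.0e6) for _ in range(n)]
    mu_before = [rng.uniform(1.0e3, 2.0e4) for _ in range(n)]
    mu_after = mu_before[:]
    rng.shuffle(mu_after)   # the "merger": same set of rates, new owners

    # True final masses: old rate until t_event, new rate afterwards.
    m_end = [m - a * t_event - b * (t_end - t_event)
             for m, a, b in zip(m0, mu_before, mu_after)]
    # Error made by assuming the present-day rate applied at all times.
    errors = [b * t_end - (m - me)
              for b, m, me in zip(mu_after, m0, m_end)]

    def mean(xs):
        return sum(xs) / len(xs)

    linear_prediction = mean(m0) - mean(mu_before) * t_end
    return mean(m_end), linear_prediction, mean(errors), max(abs(e) for e in errors)

mean_final, linear, mean_err, max_err = merger_experiment()
print(mean_final, linear)     # system-average mass follows the linear law
print(mean_err, max_err)      # individual errors are large, their mean vanishes
```

Individual clusters can be mis-estimated by a large amount, but the errors cancel in the mean, so $\langle M(t)\rangle$ stays on the straight line $\langle M_0\rangle - \langle\mu_{\rm ev}\rangle t$. In a real merger the mean rate is only approximately conserved, so the cancellation would be approximate rather than exact.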
Insofar as $\langle\rho_h\rangle \propto \langle\rho_t\rangle \propto \rho_{\rm gal}$ for the GCs, the system-wide average $\langle\mu_{\rm ev}\rangle \propto \langle\rho_h^{1/2}\rangle$ should therefore not change drastically even after a major merger between two fundamental-plane galaxies; at most, the ratio of final to initial $\langle\mu_{\rm ev}\rangle$ will be roughly of order the $-1/4$ power of the ratio of final to initial $M_{\rm gal}$. Note that this line of reasoning is closely related to that applied by FZ01 to explain the small observed galaxy-to-galaxy differences in the average turnover masses of entire GC systems (although non-zero differences do exist, and can be accommodated in these sorts of arguments; see Jordán et al. 2006, 2007).

A full exploration of questions such as these, about the wide range of ingredients in current GC-plus-galaxy models, will most likely require large N-body simulations set in a realistic, cold dark matter cosmology. Until these can be carried out, it is our view that the kinematics of globular cluster systems cannot be used as decisive side constraints on theories for the GCMF.

4. CONCLUSIONS

We have shown that the mass function $dN/d\log M$ of globular clusters in the Milky Way depends significantly on cluster half-mass density, $\rho_h$, with the peak or turnover mass $M_{\rm TO}$ increasing and the width of the distribution decreasing as $\rho_h$ increases. This behavior is expected if the GCMF initially rose towards masses below the present turnover scale (as the mass functions of young cluster systems like that in the Antennae galaxies do) and has evolved to its current shape via the slow depletion of low-mass clusters over Gyr timescales, primarily through relaxation-driven evaporation. The fact that $M_{\rm TO}$ increases with cluster density favors evaporation over external gravitational shocks as the primary mechanism of low-mass cluster disruption, since the mass-loss rates associated with shocks depend inversely on cluster density and directly on cluster mass.
Our results therefore add to previous arguments supporting an interpretation of the GCMF in terms of evaporation-dominated evolution, based on the fact that $dN/d\log M$ scales as $M^{1-\beta}$ with $\beta \simeq 0$ in the low-mass limit (Fall & Zhang 2001).

The observed GCMF as a function of $\rho_h$ is fitted well by simple models in which the initial distribution was a Schechter function, $dN/d\log M_0 \propto M_0^{1-\beta} \exp(-M_0/M_c)$ with $\beta = 2$ and $M_c \simeq 10^6\,M_\odot$ assumed, and in which clusters have been losing mass for a Hubble time at roughly steady rates that can be estimated from their current half-mass densities as $\mu_{\rm ev} \propto \rho_h^{1/2}$. We have shown that, although this prescription is approximate, it captures the main physical dependence of relaxation-driven evaporation. In particular, it leads to model GCMFs that are entirely consistent with those resulting from alternative characterizations of evaporation rates in terms of cluster tidal densities $\rho_t$ or mean surface densities $\Sigma_t$ (§3.2). The normalization of $\mu_{\rm ev}$ at a given $\rho_h$ (or $\rho_t$, or $\Sigma_t$) required to fit the GCMF implies total cluster lifetimes that are within range of the lifetimes typically obtained in theoretical studies of two-body relaxation, although our values may be slightly shorter than the theoretical ones if the low-mass, power-law part of the initial cluster mass function was as steep as we have assumed.

Taking clusters in various bins of central concentration $c$ and Galactocentric radius $r_{\rm gc}$ and using their (individual) observed densities as direct input to our models yields dynamically evolved GCMFs as functions of $c$ and $r_{\rm gc}$ that agree well with all data. This again indicates that the most fundamental physical dependence in the GCMF is that on cluster density. Moreover, our models for $dN/d\log M$ versus $r_{\rm gc}$ obtained in this way are consistent in particular with the well-known insensitivity of the GCMF peak mass to Galactocentric position.
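The model summarized above combines an initial Schechter function with linear mass loss, $M = M_0 - \Delta$. As a hedged illustration (our own sketch, assuming a single $\Delta$ and an arbitrary normalization, with hypothetical function names), the evolved distribution and its logarithmic slope can be evaluated as follows.

```python
import math

def evolved_schechter(m, delta, m_c=1.0e6, beta=2.0):
    """dN/dlogM (arbitrary units) after every cluster has lost a mass delta.

    Follows from dN/dM0 ~ M0^-beta exp(-M0/m_c) with M0 = M + delta.
    """
    m0 = m + delta
    return m * m0 ** (-beta) * math.exp(-m0 / m_c)

def log_slope(m, delta, eps=1.0e-4, **kwargs):
    """Numerical d ln(dN/dlogM) / d ln M by central differences."""
    lo = evolved_schechter(m * (1.0 - eps), delta, **kwargs)
    hi = evolved_schechter(m * (1.0 + eps), delta, **kwargs)
    return (math.log(hi) - math.log(lo)) / (math.log(1.0 + eps) - math.log(1.0 - eps))

delta = 1.0e5   # illustrative total mass loss [M_sun]
print(log_slope(10.0, delta))       # low-mass limit
print(log_slope(3.0e5, delta))      # above the turnover
```

Well below the turnover the slope tends to $+1$, i.e. $dN/d\log M \propto M^{1-\beta}$ with an effective $\beta \simeq 0$, while above it the slope turns negative and steepens towards the exponential cut-off near $M_c$.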
This is seen to follow from a significant variation of $M_{\rm TO}$ with $\rho_h$ (or $\rho_t$, or $\Sigma_t$), due in our analysis to evaporation-dominated cluster disruption, combined with substantial scatter in the GC densities at any Galactocentric position.

We have not invoked an anisotropic GC velocity distribution to explain the observed weak variation of $M_{\rm TO}$ with $r_{\rm gc}$; indeed, we have made no predictions or assumptions whatsoever about velocity anisotropy. We have emphasized that, when velocity anisotropy enters other long-term dynamical-evolution models for the GCMF, it is only in conjunction with several additional, interrelated assumptions made as part of larger efforts to derive theoretical density–$r_{\rm gc}$ relations for GCs, which we have not attempted to do here. The apparent need in some current models for a strong bias towards high-eccentricity cluster orbits to explain the near-constancy of $M_{\rm TO}$ versus $r_{\rm gc}$ might well be avoided by changing one or more ancillary assumptions in the models, without having to discard the underlying idea that the peak and low-mass shape of the GCMF are the result of relaxation-driven cluster disruption.

It clearly will be of interest to test and refine the main ideas in this paper through modeling of the GCMFs in other galaxies. For the time being at least, doing so will require the estimation of approximate mass-loss rates using cluster half-mass densities rather than tidal quantities, simply because GC half-light radii can be measured accurately in many systems beyond the Local Group, whereas tidal radii are much more model-dependent and difficult to observe. Chandar, Fall, & McLaughlin (2007) have recently shown that the peak mass of the GCMF in the Sombrero galaxy (M104) increases with $\rho_h$ in a way that is reasonably well described by sums of evolved Schechter (1976) functions as in the models presented in this paper. It should be relatively straightforward to pursue similar studies in other nearby galaxies.
We thank Michele Trenti, Douglas Heggie, Bill Harris, Rupali Chandar, and Bruce Elmegreen for helpful discussions and comments. SMF acknowledges support from the Ambrose Monell Foundation and from NASA grant AR-09539.1-A, awarded by the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555.

REFERENCES

Aguilar, L., Hut, P., & Ostriker, J. P. 1988, ApJ, 335, 720
Barmby, P., Huchra, J. P., & Brodie, J. P. 2001, AJ, 121, 1482
Barmby, P., McLaughlin, D. E., Harris, W. E., Harris, G. L. H., & Forbes, D. A. 2007, AJ, 133, 2764
Baumgardt, H. 1998, A&A, 330, 480
Baumgardt, H. 2001, MNRAS, 325, 1323
Baumgardt, H., & Makino, J. 2003, MNRAS, 340, 227 (BM03)
Binney, J., & Tremaine, S. 1987, Galactic Dynamics (Princeton: Princeton University Press)
Burkert, A., & Smith, G. H. 2000, ApJ, 542, L95
Caputo, F., & Castellani, V. 1984, MNRAS, 207, 185
Chandar, R., Fall, S. M., & McLaughlin, D. E. 2007, ApJ, 668, L119
Chandrasekhar, S. 1942, Principles of Stellar Dynamics (Chicago: University of Chicago Press)
Chernoff, D. F., & Weinberg, M. D. 1990, ApJ, 351, 121
Côté, P., et al. 2001, ApJ, 559, 828
Dinescu, D. I., Girard, T. M., & van Altena, W. F. 1999, AJ, 117, 1792
Djorgovski, S., & Meylan, G. 1994, AJ, 108, 1292
Elmegreen, B. G., & Efremov, Y. N. 1997, ApJ, 480, 235
Fall, S. M., & Rees, M. J. 1977, MNRAS, 181, 37P
Fall, S. M., & Zhang, Q. 2001, ApJ, 561, 751 (FZ01)
Fukushige, T., & Heggie, D. C. 2000, MNRAS, 318, 753
Giersz, M. 2001, MNRAS, 324, 218
Giersz, M., & Heggie, D. C. 1996, MNRAS, 279, 1037
Gnedin, O. Y. 1997, ApJ, 487, 663
Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
Gnedin, O. Y., Lee, H. M., & Ostriker, J. P. 1999, ApJ, 522, 935
Harris, W. E. 1996, AJ, 112, 1487
Harris, W. E. 2001, in Star Clusters (28th Saas-Fee Advanced Course), ed. L. Labhardt & B. Binggeli (Berlin: Springer), 223
Harris, W. E., & Pudritz, R. E. 1994, ApJ, 429, 177
Harris, W. E., Harris, G. L. H., & McLaughlin, D. E. 1998, AJ, 115, 1801
Hénon, M. 1961, Ann. d’Astrophys., 24, 369
Innanen, K. A., Harris, W. E., & Webbink, R. F. 1983, AJ, 88, 338
Johnstone, D. 1993, AJ, 105, 155
Jordán, A., et al. 2005, ApJ, 634, 1002
Jordán, A., et al. 2006, ApJ, 651, L25
Jordán, A., et al. 2007, ApJS, 171, 101
Joshi, K. J., Nave, C. P., & Rasio, F. A. 2001, ApJ, 550, 691
Kavelaars, J. J., & Hanes, D. A. 1997, MNRAS, 285, L31
King, I. 1958, AJ, 63, 109
King, I. 1959, AJ, 64, 351
King, I. R. 1966, AJ, 71, 64
Lee, H. M., & Goodman, J. 1995, ApJ, 443, 109
Lee, H. M., & Ostriker, J. P. 1987, ApJ, 322, 123
McLaughlin, D. E. 2000, ApJ, 539, 618
McLaughlin, D. E., & van der Marel, R. P. 2005, ApJS, 161, 304
Murali, C., & Weinberg, M. D. 1997, MNRAS, 291, 717
Okazaki, T., & Tosa, M. 1995, MNRAS, 274, 48
Ostriker, J. P., & Gnedin, O. Y. 1997, ApJ, 487, 667
Parmentier, G., & Gilmore, G. 2007, MNRAS, 377, 352
Prieto, J. L., & Gnedin, O. Y. 2006, preprint (astro-ph/0608069)
Romanowsky, A. J., & Kochanek, C. S. 2001, ApJ, 553, 722
Schechter, P. 1976, ApJ, 203, 297
Smith, G. H., & Burkert, A. 2002, ApJ, 578, L51
Spitler, L. R., Larsen, S. S., Strader, J., Brodie, J. P., Forbes, D. A., & Beasley, M. A. 2006, AJ, 132, 1593
Spitzer, L. 1987, Dynamical Evolution of Globular Clusters (Princeton: Princeton Univ. Press)
Takahashi, K., & Portegies Zwart, S. F. 1998, ApJ, 503, L49
Takahashi, K., & Portegies Zwart, S. F. 2000, ApJ, 535, 759
Trenti, M., Heggie, D. C., & Hut, P. 2007, MNRAS, 374, 344
van den Bosch, F. C., Lewis, G. F., Lake, G., & Stadel, J. 1999, ApJ, 515, 50
Vesperini, E. 1997, MNRAS, 287, 915
Vesperini, E. 1998, MNRAS, 299, 1019
Vesperini, E. 2000, MNRAS, 318, 841
Vesperini, E. 2001, MNRAS, 322, 247
Vesperini, E., & Heggie, D. C. 1997, MNRAS, 289, 898
Vesperini, E., & Zepf, S. E. 2003, ApJ, 587, L97
Vesperini, E., Zepf, S. E., Kundu, A., & Ashman, K. M. 2003, ApJ, 593, 760
Waters, C. Z., Zepf, S. E., Lauer, T. R., Baltz, E. A., & Silk, J. 2006, ApJ, 650, 885
Zhang, Q., & Fall, S. M. 1999, ApJ, 527, L81

ABSTRACT

We show that the globular cluster mass function (GCMF) in the Milky Way depends on cluster half-mass density ($\rho_h$) in the sense that the turnover mass $M_{\rm TO}$ increases with $\rho_h$ while the width of the GCMF decreases. We argue that this is the expected signature of the slow erosion of a mass function that initially rose towards low masses, predominantly through cluster evaporation driven by internal two-body relaxation. We find excellent agreement between the observed GCMF (including its dependence on internal density $\rho_h$, central concentration $c$, and Galactocentric distance $r_{\rm gc}$) and a simple model in which the relaxation-driven mass-loss rates of clusters are approximated by $-dM/dt = \mu_{\rm ev} \propto \rho_h^{1/2}$. In particular, we recover the well-known insensitivity of $M_{\rm TO}$ to $r_{\rm gc}$. This feature does not derive from a literal "universality" of the GCMF turnover mass, but rather from a significant variation of $M_{\rm TO}$ with $\rho_h$ (the expected outcome of relaxation-driven cluster disruption) plus significant scatter in $\rho_h$ as a function of $r_{\rm gc}$. Our conclusions are the same if the evaporation rates are assumed to depend instead on the mean volume or surface densities of clusters inside their tidal radii, as $\mu_{\rm ev} \propto \rho_t^{1/2}$ or $\mu_{\rm ev} \propto \Sigma_t^{3/4}$; these alternative prescriptions are physically motivated but involve cluster properties ($\rho_t$ and $\Sigma_t$) that are not as well defined or as readily observable as $\rho_h$. In all cases, the normalization of $\mu_{\rm ev}$ required to fit the GCMF implies cluster lifetimes that are within the range of standard values (although falling towards the low end of this range). Our analysis does not depend on any assumptions or information about velocity anisotropy in the globular cluster system.
<|endoftext|><|startoftext|> Introduction

The quantum deformations of relativistic symmetries are described by Hopf-algebraic deformations of Lorentz and Poincaré algebras. Such quantum deformations are classified by Lorentz and Poincaré Poisson structures. These Poisson structures, given by classical r-matrices, were classified some time ago by S. Zakrzewski in [1] for the Lorentz algebra and in [2] for the Poincaré algebra. In the case of the Lorentz algebra, a complete list of classical r-matrices involves four independent formulas, and the corresponding quantum deformations have already been discussed in various forms in the literature (see [3, 4, 5, 6, 7]). In the case of the Poincaré algebra, the total list of classical r-matrices that satisfy the homogeneous classical Yang–Baxter equation consists of 20 cases, which have various numbers of free parameters. Analysis of these twenty solutions shows that each of them can be presented as a sum of subordinated r-matrices, almost all of which are of Abelian and Jordanian types. Some of the twists corresponding to the r-matrices of the Zakrzewski classification are given in explicit form.

∗ Invited talk at the XXII Max Born Symposium "Quantum, Super and Twistors", September 27-29, 2006, Wroclaw (Poland), in honour of Jerzy Lukierski.
† Supported by the grants RFBR-05-01-01086 and FNRA NT05-241455GIPM.
http://arxiv.org/abs/0704.0081v1

2 Preliminaries

Let $r$ be a classical r-matrix of a Lie algebra $\mathfrak{g}$, i.e. $r \in \wedge^2\mathfrak{g}$ and $r$ satisfies the classical Yang–Baxter equation (CYBE)

$[r^{12}, r^{13} + r^{23}] + [r^{13}, r^{23}] = \Omega$ , (2.1)

where $\Omega$ is a $\mathfrak{g}$-invariant element, $\Omega \in (\wedge^3\mathfrak{g})^{\mathfrak{g}}$. We consider two types of classical r-matrices and the corresponding twists.

Let the classical r-matrix $r = r_A$ have the form

$r_A = \sum_{i=1}^{n} x_i \wedge y_i$ , (2.2)

where all elements $x_i, y_i$ ($i = 1, \ldots, n$) commute among themselves. Such an r-matrix is said to be of Abelian type. The corresponding twist is given as follows: $F_{r_A} = \exp\dfrac{r_A}{2} = \exp\Big(\dfrac{1}{2}\sum_{i=1}^{n} x_i \wedge y_i\Big)$ .
(2.3) This twisting two-tensor $F := F_r$ satisfies the cocycle equation $F^{12}(\Delta \otimes \mathrm{id})(F) = F^{23}(\mathrm{id} \otimes \Delta)(F)$ , (2.4) and the "unital" normalization condition $(\epsilon \otimes \mathrm{id})(F) = (\mathrm{id} \otimes \epsilon)(F) = 1$ . (2.5) The twisting element F defines a deformation of the universal enveloping algebra U(g) considered as a Hopf algebra. The new deformed coproduct and antipode are given by $\Delta^{(F)}(a) = F \Delta(a) F^{-1}$ , $S^{(F)}(a) = u S(a) u^{-1}$ (2.6) for any a ∈ U(g), where ∆(a) is the coproduct before twisting, and $u = \sum_i f_i^{(1)} S(f_i^{(2)})$ if $F = \sum_i f_i^{(1)} \otimes f_i^{(2)}$. Let the classical r-matrix $r = r_J(\xi)$ have the form $r_J(\xi) = \xi \bigl( \sum_{\nu=0}^{n} x_\nu \wedge y_\nu \bigr)$ , (2.7) where the elements $x_\nu$, $y_\nu$ (ν = 0, 1, ..., n) satisfy the relations $[x_0, y_0] = y_0$ , $[x_0, x_i] = (1 - t_i) x_i$ , $[x_0, y_i] = t_i y_i$ , $[x_i, y_j] = \delta_{ij} y_0$ , $[x_i, x_j] = [y_i, y_j] = 0$ , $[y_0, x_j] = [y_0, y_j] = 0$ , (2.8) (i, j = 1, ..., n), ($t_i \in \mathbb{C}$). Such an r-matrix is said to be of Jordanian type. The corresponding twist is given as follows [8, 9]: $F_{r_J} = \exp\bigl( \xi \sum_{i=1}^{n} x_i \otimes y_i \, e^{-2 t_i \sigma} \bigr) \exp(2 x_0 \otimes \sigma)$ , (2.9) where $\sigma := \frac{1}{2} \ln(1 + \xi y_0)$. Footnote 1: Here the introduction of the deformation parameter ξ is a matter of convenience. Footnote 2: It is easy to verify that the two-tensor (2.7) indeed satisfies the homogeneous classical Yang-Baxter equation (2.1) (with Ω = 0), if the elements $x_\nu$, $y_\nu$ (ν = 0, 1, ..., n) are subject to the relations (2.8). Let r be an arbitrary r-matrix of g. We denote the support of r by Sup(r) (see footnote 4). The following definition is useful. Definition 2.1 Let $r_1$ and $r_2$ be two arbitrary classical r-matrices. We say that $r_2$ is subordinated to $r_1$, $r_1 \succ r_2$, if $\delta_{r_1}(\mathrm{Sup}(r_2)) = 0$, i.e. $\delta_{r_1}(x) := [x \otimes 1 + 1 \otimes x, r_1] = 0$ , $\forall x \in \mathrm{Sup}(r_2)$ . (2.10) If $r_1 \succ r_2$ then $r = r_1 + r_2$ is also a classical r-matrix (see [15]). Subordination enables us to construct a correct sequence of quantizations. For instance, if the r-matrix of Jordanian type (2.7) is subordinated to the r-matrix of Abelian type (2.2), $r_A \succ r_J$, then the total twist corresponding to the resulting r-matrix $r = r_A + r_J$ is given as follows: $F_r = F_{r_J} F_{r_A}$ . (2.11) The following definition is also useful.
Definition 2.2 A twisting two-tensor $F_r(\xi)$ of a Hopf algebra, satisfying the conditions (2.4) and (2.5), is called locally r-symmetric if the expansion of $F_r(\xi)$ in powers of the deformation parameter ξ has the form $F_r(\xi) = 1 + c\, r + O(\xi^2)$ , (2.12) where r is a classical r-matrix, and c is a numerical coefficient, c ≠ 0. It is evident that the Abelian twist (2.3) is globally r-symmetric and the twist of Jordanian type (2.9) does not satisfy the relation (2.12), i.e. it is not locally r-symmetric. 3 Quantum deformations of Lorentz algebra The results of this section have already been discussed in various forms in the literature (see [3, 4, 5, 6, 7]). The classical canonical basis of the D = 4 Lorentz algebra, o(3, 1), can be described by six anti-Hermitian generators $(h, e_\pm, h', e'_\pm)$ satisfying the following non-vanishing commutation relations (see footnote 5): $[h, e_\pm] = \pm e_\pm$ , $[e_+, e_-] = 2h$ , (3.1) $[h, e'_\pm] = \pm e'_\pm$ , $[h', e_\pm] = \pm e'_\pm$ , $[e_\pm, e'_\mp] = \pm 2h'$ , (3.2) $[h', e'_\pm] = \mp e_\pm$ , $[e'_+, e'_-] = -2h$ , (3.3) and moreover $x^* = -x$ ($\forall x \in o(3, 1)$) . (3.4) Footnote 3: The corresponding twists for the Lie algebras sl(n), so(n) and sp(2n) were first constructed in the papers [10, 11, 12, 13]. Footnote 4: The support Sup(r) is the subalgebra of g generated by the elements $\{x_i, y_i\}$ if $r = \sum_i x_i \wedge y_i$. Footnote 5: Since the real Lie algebra o(3, 1) is the standard realification of the complex Lie algebra sl(2,C), these relations are easily obtained from the defining relations for sl(2,C), i.e. from (3.1). A complete list of classical r-matrices which describe all Poisson structures and generate quantum deformations for o(3, 1) involves four independent formulas [1]: $r_1 = \alpha\, e_+ \wedge h$ , (3.5) $r_2 = \alpha (e_+ \wedge h - e'_+ \wedge h') + 2\beta\, e'_+ \wedge e_+$ , (3.6) $r_3 = \alpha (e'_+ \wedge e_- + e_+ \wedge e'_-) + \beta (e_+ \wedge e_- - e'_+ \wedge e'_-) - 2\gamma\, h \wedge h'$ , (3.7) $r_4 = \alpha (e'_+ \wedge e_- + e_+ \wedge e'_- - 2 h \wedge h') \pm e_+ \wedge e'_+$ . (3.8) If the universal R-matrices of the quantum deformations corresponding to the classical r-matrices (3.5)–(3.8) are unitary then these r-matrices are anti-Hermitian, i.e. 
r∗j = −rj (j = 1, 2, 3, 4) . (3.9) Therefore the ∗-operation (3.4) should be lifted to the tensor product o(3, 1) ⊗ o(3, 1). There are two variants of this lifting: direct and flipped, namely, (x⊗ y)∗ = x∗ ⊗ y∗ (∗ − direct) , (3.10) (x⊗ y)∗ = y∗ ⊗ x∗ (∗ − flipped) . (3.11) We see that if the ”direct” lifting of the ∗-operation (3.4) is used then all parameters in (3.5)–(3.8) are pure imaginary. In the case of the ”flipped” lifting (3.11) all parameters in (3.5)–(3.8) are real. The first two r-matrices (3.5) and (3.6) satisfy the homogeneous CYBE and they are of Jordanian type. If we assume (3.10), the corresponding quantum deformations were described detailed in the paper [6] and they are entire defined by the twist of Jordanian type: = exp (h⊗ σ) , σ = ln(1 + αe+) (3.12) for the r-matrix (3.5), and = exp σ ∧ ϕ exp (h⊗ σ − h′ ⊗ ϕ) , (3.13) (1 + αe+) 2+ (αe′+) , ϕ = arctan 1 + αe+ (3.14) for the r-matrix (3.6). It should be recalled that the twists (3.12) and (3.13) are not locally r-symmetric. A locally r-symmetric twist for the r-matrix (3.5) was obtained in [14] and it has the following complicated formula: = exp ∆(h)− sinhαe+ ⊗ e−αe++ eαe+⊗ h sinhαe+ α∆(e+) sinhα∆(e+) , (3.15) where ∆ is a primitive coproduct. The last two r-matrices (3.7) and (3.8) satisfy the non-homogeneous (modified) CYBE and they can be easily obtained from the solutions of the complex algebra o(4,C) ≃ sl(2,C)⊕ sl(2,C) which describes the complexification of o(3, 1). Indeed, let us introduce the complex basis of Lorentz algebra (o(3, 1) ≃ sl(2;C) ⊕ sl(2,C)) described by two commuting sets of complex generators: (h + ıh′) , E1± = (e± + ıe ±) , (3.16) (h− ıh′) , E2± = (e± − ıe ±) , (3.17) which satisfy the relations (compare with (3.1)) [Hk, Ek±] = ±Ek± , [Ek+, Ek−] = 2Hk (k = 1, 2) . (3.18) The ∗-operation describing the real structure acts on the generatorsHk, and Ek± (k = 1, 2) as follows H∗1 = −H2 , E 1± = −E2± , H 2 = −H1 , E 2± = −E1± . 
(3.19) The classical r-matrix r3, (3.7), and r4, (3.8), in terms of the complex basis (3.16), (3.17) take the form r3 = r 1 + r r′3 := 2(β + ıα)E1+ ∧ E1− + 2(β − ıα)E2+ ∧ E2− , r′′3 := 4ıγ H2 ∧H1 , (3.20) r4 = r 4 + r r′4 := 2ıα(E1+ ∧ E1− −E2+ ∧ E2− − 2H1 ∧H2) , r′′4 := 4ıλE1+ ∧ E2+ (3.21) For the sake of convenience we introduce parameter6λ in r′′4 . It should be noted that r′3, r 3 and r 4 are themselves classical r-matrices. We see that the r-matrix r 3 is simply a sum of two standard r-matrices of sl(2;C), satisfying the anti-Hermitian condition r∗ = −r. Analogously, it is not hard to see that the r-matrix r4 corresponds to a Belavin- Drinfeld triple [15] for the Lie algebra sl(2;C) ⊕ sl(2,C)). Indeed, applying the Cartan automorphism E2± → E2∓, H2 → −H2 we see that this is really correct (see also [16]). We firstly describe quantum deformation corresponding to the classical r-matrix r3 (3.20). Since the r-matrix r′′3 is Abelian and it is subordinated to r 3 therefore the algebra o(3, 1) is firstly quantized in the direction r′3 and then an Abelian twist corresponding to the r-matrix r′′3 is applied. We introduce the complex notations z± := β ± ıα. It should be noted that z− = z + if the parameters α and β are real, and z− = −z the parameters α and β are pure imaginary. From structure of the classical r-matrix r′3 it follows that a quantum deformation Ur′ (o(3, 1)) is a combination of two q-analogs of U(sl(2;C)) with the parameter qz and qz , where qz := exp z±. Thus Ur′3(o(3, 1)) (sl(2;C))⊗Uq (sl(2;C)) and the standard generators q±H1z , E1± and q , E2± satisfy 6We can reduce this parameter λ to ± 1 by automorphism of o(4,C). the following non-vanishing defining relations qH1z+ E1± = q E1± q , [E1+, E1−] = q2H1z+ − q qz+ − q , (3.22) qH2z− E2± = q E2± q , [E2+, E2−] = q2H2z− − q qz− − q . 
(3.23) In this case the co-product ∆r′ and antipode Sr′ for the generators q±H1z , E1± and q E2± can be given by the formulas: (q±H1z+ ) = q ⊗ q±H1z+ , ∆r′1 (E1±) = E1± ⊗ q + q−H1z+ ⊗ E1± , (3.24) (q±H2z− ) = q ⊗ q±H2z− , ∆r′1 (E2±) = E2± ⊗ q + q−H2z− ⊗ E2± , (3.25) (q±H1z+ ) = q , Sr′ (E1±) = −q E1± , (3.26) (q±H2z ) = q∓H2z− , Sr′1 (E2±) = −q E2± . (3.27) The ∗-involution describing the real structure on the generators (3.8) can be adapted to the quantum generators as follows (q±H1z+ ) ∗ = q∓H2 , E∗1± = −E2± , (q )∗ = q∓H1 , E∗2± = −E1± , (3.28) and there exit two ∗-liftings: direct and flipped, namely, (a⊗ b)∗ = a∗ ⊗ b∗ (∗ − direct) , (3.29) (a⊗ b)∗ = b∗ ⊗ a∗ (∗ − flipped) (3.30) for any a ⊗ b ∈ Ur′ (o(3, 1)) ⊗ Ur′ (o(3, 1)), where the ∗-direct involution corresponds to the case of the pure imaginary parameters α, β and the ∗-flipped involution corresponds to the case of the real deformation parameters α, β. It should be stressed that the Hopf structure on Ur′ (o(3, 1)) satisfy the consistency conditions under the ∗-involution (a∗) = (∆r′ (a))∗, Sr′ ((Sr′ (a∗))∗) = a (∀x ∈ Ur′ (o(3, 1)) . (3.31) Now we consider deformation of the quantum algebra Ur′ (o(3, 1)) (secondary quan- tization of U(o(3, 1))) corresponding to the additional r-matrix r′′3 , (3.20). Since the generators H1 and H2 have the trivial coproduct (Hk) = Hk ⊗ 1 + 1⊗Hk (k = 1, 2) , (3.32) therefore the unitary two-tensor := qH1∧H2ıγ (F = F−1 ) (3.33) satisfies the cocycle condition (2.4) and the ”unital” normalization condition (2.5). Thus the complete deformation corresponding to the r-matrix r3 is the twisted deformation of (o(3, 1)), i.e. the resulting coproduct ∆r is given as follows (x) = Fr′′ (x)F−1 (∀x ∈ Ur′ (o(3, 1)) . (3.34) and in this case the resulting antipode Sr does not change, Sr . 
Applying the twisting two-tensor (3.33) to the formulas (3.24) and (3.25) we obtain (q±H1z+ ) = q ⊗ q±H1z+ , ∆r′1(q ) = q±H2z− ⊗ q , (3.35) (E1+) = E1+ ⊗ q qH2ıγ + q q−H2ıγ ⊗ E1+ , (3.36) (E1−) = E1− ⊗ q q−H2ıγ + q qH2ıγ ⊗ E1− , (3.37) ∆r3(E2+) = E2+ ⊗ q q−H1ıγ + q qH1ıγ ⊗ E2+ , (3.38) ∆r3(E2−) = E2− ⊗ q qH1ıγ + q q−H1ıγ ⊗ E2− . (3.39) Next, we describe quantum deformation corresponding to the classical r-matrix r4 (3.21). Since the r-matrix r′4(α) := r 4 is a particular case of r3(α, β, γ) := r3, namely r′4(α) = r3(α, β = 0, γ = α), therefore a quantum deformation corresponding to the r- matrix r′4 is obtained from the previous case by setting β = 0, γ = α, and we have the following formulas for the coproducts ∆r′ ) = q (k = 1, 2) , (3.40) (E1+) = E1+ ⊗ q H1+H2 + q−H1−H2 ⊗ E1+ , (3.41) (E1−) = E1− ⊗ q H1−H2 + q−H1+H2 ⊗ E1− , (3.42) (E2+) = E2+ ⊗ q −H1−H2 ξ + q H1+H2 ξ ⊗ E2+ , (3.43) (E2−) = E2− ⊗ q H1−H2 ξ + q −H1+H2 ξ ⊗ E2− , (3.44) where we set ξ := ıα. Consider the two-tensor := exp λE1+q H1+H2 ξ ⊗ E2+q H1+H2 . (3.45) Using properties of q-exponentials (see [17]) is not hard to verify that Fr′′ satisfies the co- cycle equation (2.4). Thus the quantization corresponding to the r-matrix r4 is the twisted q-deformation Ur′ (o(3, 1)). Explicit formulas of the co-products ∆r (·) = F (·)F−1 and antipodes Sr4(·) in the complex and real Cartan-Weyl bases of Ur′4(o(3, 1)) will be presented in the outgoing paper [7]. 4 Quantum deformations of Poincare algebra The Poincaré algebra P(3, 1) of the 4-dimensional space-time is generated by 10 elements: the six-dimensinal Lorentz algebra o(3, 1) with the generators Mi, Ni (i = 1, 2, 3): [Mi, Mj ] = ıǫijk Mk, [Mi, Nj ] = ıǫijk Nk, [Ni, Nj ] = −ıǫijk Mk, (4.1) and the four-momenta Pj, P0 (j = 1, 2, 3) with the standard commutation relations: [Mj , Pk] = ıǫjkl Pl , [Mj , P0] = 0 , (4.2) [Nj, Pk] = −ıδjk P0 , [Nj , P0] = −ıPj . 
(4.3) The physical generators of the Lorentz algebra, $M_i$, $N_i$ (i = 1, 2, 3), are related to the canonical basis $h, h', e_\pm, e'_\pm$ as follows: $h = \imath N_3$ , $e_\pm = \imath (N_1 \pm M_2)$ , (4.4) $h' = -\imath M_3$ , $e'_\pm = \imath (\pm N_2 - M_1)$ . (4.5) The subalgebra generated by the four-momenta $P_j$, $P_0$ (j = 1, 2, 3) will be denoted by P and we also set $P_\pm := P_0 \pm P_3$. S. Zakrzewski has shown in [2] that each classical r-matrix, $r \in P(3, 1) \wedge P(3, 1)$, has a decomposition $r = a + b + c$ , (4.6) where $a \in P \wedge P$, $b \in o(3, 1) \wedge P$, $c \in o(3, 1) \wedge o(3, 1)$ satisfy the following relations: $[[c, c]] = 0$ , (4.7) $[[b, c]] = 0$ , (4.8) $2[[a, c]] + [[b, b]] = t\Omega$ ($t \in \mathbb{R}$) , (4.9) $[[a, b]] = 0$ . (4.10) Here $[[\cdot, \cdot]]$ denotes the Schouten bracket. Moreover, a complete list of the classical r-matrices for the case c ≠ 0 and also for the case c = 0, t = 0 was found (see footnote 7). It was shown that there are fifteen solutions for the case c = 0, t = 0, and six solutions for the case c ≠ 0, of which only one has t ≠ 0. Thus Zakrzewski found twenty r-matrices which satisfy the homogeneous classical Yang-Baxter equation (t = 0 in (4.9)). Analysis of these twenty solutions shows that each of them can be presented as a sum of subordinated r-matrices, almost all of which are of Abelian and Jordanian types. Therefore these r-matrices correspond to twisted deformations of the Poincaré algebra P(3, 1). We present here the r-matrices only for the case c ≠ 0, t = 0: $r_1 = \gamma h' \wedge h + \alpha (P_+ \wedge P_- - P_1 \wedge P_2)$ , (4.11) $r_2 = \gamma e'_+ \wedge e_+ + \beta_1 (e_+ \wedge P_1 - e'_+ \wedge P_2 + h \wedge P_+) + \beta_2 h' \wedge P_+$ , (4.12) $r_3 = \gamma e'_+ \wedge e_+ + \beta (e_+ \wedge P_1 - e'_+ \wedge P_2 + h \wedge P_+) + \alpha P_1 \wedge P_+$ , (4.13) $r_4 = \gamma (e'_+ \wedge e_+ + e_+ \wedge P_1 + e'_+ \wedge P_2 - P_1 \wedge P_2) + P_+ \wedge (\alpha_1 P_1 + \alpha_2 P_2)$ , (4.14) $r_5 = \gamma_1 (h \wedge e_+ - h' \wedge e'_+) + \gamma_2 e_+ \wedge e'_+$ . (4.15) The first r-matrix $r_1$ is a sum of two subordinated Abelian r-matrices: $r_1 := r'_1 + r''_1$ , $r'_1 \succ r''_1$ , $r'_1 := \alpha (P_+ \wedge P_- - P_1 \wedge P_2)$ , $r''_1 := \gamma h' \wedge h$ . 
(4.16) Therefore the total twist defining quantization in the direction to this r-matrix is the ordered product of two the Abelian twits = Fr′′ = exp γh′ ∧ h α(P+ ∧ P− − P1 ∧ P2) . (4.17) 7Classification of the r-matrices for the case c = 0, t 6= 0 is an open problem up to now. The second r-matrix r2 is a sum of three subordinated r-matrices where two of them are of Abelian type and one is of Jordanian type r2 := r 3 + r 2 + r 2 , r 2 ≻ r 2 ≻ r r′2 := β1(e+ ∧ P1 − e + ∧ P2 + h ∧ P+) , r′′2 := γe + ∧ e+ , r 2 := β2h ′ ∧ P+ . (4.18) Corresponding twist is given by the following formulas = Fr′′′ , (4.19) where = exp β1(e+ ⊗ P1 − e + ⊗ P2) exp(2h⊗ σ+) , = exp(γe′+ ∧ e+) , Fr′′′2 = exp(β2h ′ ∧ σ+) . (4.20) Here and below we set σ+ := ln(1 + β1P+). The third r-matrix r3 is a sum of two subordinated r-matrices where one is of Abelian type and another is a more complicated r-matrix which we call mixed Jordanian-Abelian r3 := r 3 + r 3 , r 3 ≻ r r′3 := β1(e+ ∧ P1 − e + ∧ P2 + h ∧ P+) + αP1 ∧ P+ , r′′3 := γe + ∧ e+ . (4.21) Corresponding twist is given by the following formulas = Fr′′ , (4.22) where = exp β1(e+ ⊗ P1 − e + ⊗ P2) exp(αP1 ∧ σ+) exp(2h⊗ σ+) , = exp(γe′+ ∧ e+) . (4.23) The fourth r-matrix r4 is a sum of two subordinated r-matrices of Abelian type r4 := r 4 + r 4 , r 4 ≻ r r′4 := P+ ∧ (α1P1 + α+P2) , r′′4 := γ(e + − P1) ∧ (e+ + P2) . (4.24) Corresponding twist is given by the following formulas = Fr′′ , (4.25) where = exp (P+ ⊗ (α1P1 + α2P2) = exp γ(e′+ − P1) ∧ (e+ + P2) (4.26) The fifth r-matrix r5 is the r-matrix of the Lorentz algebra, (3.6), and the correspond- ing twist is given by the formula (3.13). References [1] S. Zakrzewski, Lett. Math. Phys., 32, 11 (1994). [2] S. Zakrzewski, Commun. Math. Phys., 187, 285 (1997); http://arxiv.org/abs/q-al/9602001. [3] M. Chaichian and A. Demichev, Phys. Lett., B34, 220 (1994) [4] A. Mudrov, Yadernaya Fizika, 60, No.5, 946 (1997). [5] A. Borowiec, J. Lukierski, V.N. Tolstoy, Czech. J. 
Phys., 55, 11 (2005); http://xxx.lanl.gov/abs/hep-th/0301033. [6] A. Borowiec, J. Lukierski, V.N. Tolstoy, Eur. Phys. J., C48, 336 (2006); arXiv:hep-th/0604146. [7] A. Borowiec, J. Lukierski, V.N. Tolstoy, in preparation. [8] V.N. Tolstoy, Proc. of International Workshop "Supersymmetries and Quantum Symmetries (SQS'03)", Russia, Dubna, July, 2003, eds: E. Ivanov and A. Pashnev, publ. JINR, Dubna, p. 242 (2004); http://xxx.lanl.gov/abs/math.QA/0402433. [9] V.N. Tolstoy, Nankai Tracts in Mathematics "Differential Geometry and Physics". Proceedings of the 23-th International Conference of Differential Geometric Methods in Theoretical Physics (Tianjin, China, 20-26 August, 2005). Editors: Mo-Lin Ge and Weiping Zhang. World Scientific, 2006, Vol. 10, 443-452; http://xxx.lanl.gov/abs/math.QA/0701079. [10] P.P. Kulish, V.D. Lyakhovsky and A.I. Mudrov, Journ. Math. Phys., 40, 4569 (1999). [11] P.P. Kulish, V.D. Lyakhovsky and M.A. del Olmo, Journ. Phys. A: Math. Gen., 32, 8671 (1999). [12] V.D. Lyakhovsky, S. Stolin and P.P. Kulish, J. Math. Phys. Gen., 42, 5006 (2000). [13] D.N. Ananikyan, P.P. Kulish and V.D. Lyakhovsky, St.Petersburg Math. J., 14, 385 (2003). [14] Ch. Ohn, Lett. Math. Phys., 25, 85 (1992). [15] A. Belavin and V. Drinfeld, Functional Anal. Appl., 16(3), 159 (1983); translated from Funktsional. Anal. i Prilozhen, 16, 1 (1982) (Russian). [16] A.P. Isaev and O.V. Ogievetsky, Phys. Atomic Nuclei, 64(12), 2126 (2001); math.QA/0010190. [17] S.M. Khoroshkin and V. Tolstoy, Comm. Math. Phys., 141(3), 599 (1991). ABSTRACT We discussed quantum deformations of D=4 Lorentz and Poincare algebras. In the case of Poincare algebra it is shown that almost all classical r-matrices of S. 
Zakrzewski classification correspond to twisted deformations of Abelian and Jordanian types. A part of twists corresponding to the r-matrices of Zakrzewski classification are given in explicit form. <|endoftext|><|startoftext|> Introduction In 2002, matter-wave bright solitons in quasi-one-dimensional (1D) Bose-Einstein conden- sates (BECs) were observed experimentally.1, 2) Bright solitons propagate in most cases with much larger amplitudes than dark solitons,3, 4) and are expected to have the potential for various applications such as coherent transport and atom interferometry. Soliton propagation in BEC can be described by the Gross-Pitaevskii (GP) equation. The GP equation, called the nonlinear Schrödinger (NLS) equation in nonlinear science, is integrable and has soliton solu- tions in a one-dimensional and uniform system. Recent experimental and theoretical advances about matter-wave bright solitons are reviewed, for instance, in ref. 5. The experimental creation of matter-wave solitons has been so far achieved only for single- component BEC. It is, nevertheless, very interesting to consider soliton propagation in BEC with internal degrees of freedom, so-called, spinor BEC. When BEC of ultracold alkali atoms is trapped exclusively by optical means, the hyperfine spin of atoms remains liberated. The spinor BEC was realized in such a way.6–8) Internal degrees of freedom endow solitons with a multiplicity. The multiple solitons will show a rich variety of dynamics. Here, we focus on the boson system in the F = 1 hyperfine spin state, exemplified by 23Na, 39K and 87Rb. The multi-component GP equation for F = 1 spinor BEC turns to an integrable model at special points, which is mathematically equivalent to the matrix NLS equation. 
An integrable model with a self-focusing nonlinearity enables one to perform exact analysis via the inverse scattering method (ISM) for the matrix NLS equation.9) In particular, bright soliton solutions under vanishing boundary conditions (VBC) are obtained, whose properties are investigated in refs. 10 and 11. Recently, the ISM for the matrix NLS equation under nonvanishing boundary conditions (NVBC) has been formulated.12) Dark solitons in the F = 1 spinor BEC can be investigated by applying the ISM under NVBC to an integrable model with a self-defocusing nonlinearity.13) Although the ISM under NVBC is dedicated mainly to the self-defocusing case, we note that this technique is also applicable to an integrable model with a self-focusing nonlinearity, which gives access to bright soliton solutions with a finite background. In this paper, we investigate in further detail matter-wave bright solitons in the quasi-1D F = 1 spinor BEC, based on an integrable model. We consider matter-wave spinor bright solitons traveling on a finite background of the condensate. We write down new soliton solutions explicitly, and verify that the obtained soliton solutions have properties similar to those of solitons without a background. In the usual experimental setups, the condensates are confined in a finite-size region, and the matter-wave bright solitons will be accompanied by a finite background. The study given in this paper is meaningful in such realistic circumstances. The paper is organized as follows. In § 2, the GP equation for quasi-1D F = 1 spinor BEC is introduced. In particular, the integrable model is presented. There, the interactions between two atoms are supposed to be attractive and ferromagnetic, which lead to bright solitons. J. Phys. Soc. Jpn. Full Paper In § 3, the inverse scattering method under nonvanishing boundary conditions is applied to the integrable model. This application leads to bright soliton solutions with a finite background. 
Several conserved quantities of the model are also provided. One-soliton solutions are investigated in § 4. The spin states of one-solitons are classified, assuming that the discrete eigenvalues are purely imaginary. Two-soliton solutions are discussed in § 5. The last section, § 6, is devoted to the concluding remarks. 2. GP Equation for F = 1 Spinor Bose-Einstein Condensates For BEC of ultracold alkali atoms, the mean-field theory works well, because almost all atoms go into condensation and the condensate is dilute. In this paper, we deal with the quasi-one-dimensional system. Atoms in the F = 1 hyperfine spin state have three magnetic substates labeled by the magnetic quantum number $m_F = 1, 0, -1$. The system is characterized by a vectorial field operator with the components corresponding to each substate, $\hat{\Phi} = (\hat{\Phi}_1, \hat{\Phi}_0, \hat{\Phi}_{-1})^T$, satisfying the equal-time commutation relations $[\hat{\Phi}_\alpha(x, t), \hat{\Phi}^\dagger_\beta(x', t)] = \delta_{\alpha\beta}\, \delta(x - x')$, (1) where the subscripts α, β take on 1, 0, −1. In the framework of the mean-field theory for BEC, the quantum field is replaced with the order parameter: $\Phi(x, t) \equiv \langle \hat{\Phi}(x, t) \rangle = (\Phi_1(x, t), \Phi_0(x, t), \Phi_{-1}(x, t))^T$. (2) Φ(x, t) is often called the spinor condensate wavefunction, which is normalized to the total number of atoms $N_T$: $\int dx\, \Phi^\dagger(x, t) \Phi(x, t) = N_T$. (3) The spinor condensate wavefunction obeys a set of coupled evolution equations, namely, the multi-component GP equation: $i\hbar \partial_t \Phi_1 = -\frac{\hbar^2}{2m} \partial_x^2 \Phi_1 + (\bar{c}_0 + \bar{c}_2)(|\Phi_1|^2 + |\Phi_0|^2)\Phi_1 + (\bar{c}_0 - \bar{c}_2)|\Phi_{-1}|^2 \Phi_1 + \bar{c}_2 \Phi_{-1}^* \Phi_0^2$, $i\hbar \partial_t \Phi_0 = -\frac{\hbar^2}{2m} \partial_x^2 \Phi_0 + (\bar{c}_0 + \bar{c}_2)(|\Phi_1|^2 + |\Phi_{-1}|^2)\Phi_0 + \bar{c}_0 |\Phi_0|^2 \Phi_0 + 2\bar{c}_2 \Phi_0^* \Phi_1 \Phi_{-1}$, $i\hbar \partial_t \Phi_{-1} = -\frac{\hbar^2}{2m} \partial_x^2 \Phi_{-1} + (\bar{c}_0 + \bar{c}_2)(|\Phi_{-1}|^2 + |\Phi_0|^2)\Phi_{-1} + (\bar{c}_0 - \bar{c}_2)|\Phi_1|^2 \Phi_{-1} + \bar{c}_2 \Phi_1^* \Phi_0^2$, (4) where $\bar{c}_0 = (\bar{g}_0 + 2\bar{g}_2)/3$ and $\bar{c}_2 = (\bar{g}_2 - \bar{g}_0)/3$ denote effective 1D coupling constants for the mean-field and the spin-exchange interaction, respectively. Here, the effective 1D coupling constants $\bar{g}_f$ are given by $\bar{g}_f = \frac{4\hbar^2 a_f}{m a_\perp^2} \, \frac{1}{1 - C a_f / a_\perp}$, (5)
where $a_f$ are the s-wave scattering lengths in the total hyperfine spin f channel, $a_\perp$ is the size of the transverse ground state, m is the atomic mass, and C = −ζ(1/2) ≃ 1.46. Note that one may change the values of $\bar{c}_0$ and $\bar{c}_2$ by tuning $a_\perp$. Equation (4) is derived as follows. The interaction between two atoms in the F = 1 hyperfine spin state has the form15, 16) $\hat{V}(x_1 - x_2) = \delta(x_1 - x_2) \bigl( \bar{c}_0 + \bar{c}_2 \hat{F}_1 \cdot \hat{F}_2 \bigr)$, (6) where $\hat{F}_i$ is the spin operator. The Gross-Pitaevskii energy functional is thus given by $E_{GP}[\Phi] = \int dx \Bigl\{ \frac{\hbar^2}{2m} \partial_x \Phi_\alpha^* \partial_x \Phi_\alpha + \frac{\bar{c}_0}{2} \Phi_\alpha^* \Phi_{\alpha'}^* \Phi_{\alpha'} \Phi_\alpha + \frac{\bar{c}_2}{2} \Phi_\alpha^* \Phi_{\alpha'}^* \mathbf{f}_{\alpha\beta} \cdot \mathbf{f}_{\alpha'\beta'} \Phi_{\beta'} \Phi_\beta \Bigr\}$, (7) where repeated subscripts (α, β, α′, β′ = 1, 0, −1) should be summed over and $\mathbf{f} = (f^x, f^y, f^z)^T$ with $f^i$ (i = x, y, z) being the 3 × 3 spin-1 matrices. Then, the variational principle $i\hbar \partial_t \Phi_\alpha(x, t) = \delta E_{GP}[\Phi] / \delta \Phi_\alpha^*(x, t)$, for α = 1, 0, −1, yields eq. (4). An important fact is that eq. (4) possesses a completely integrable point when $\bar{c}_0 = \bar{c}_2 \equiv -c < 0$, equivalently, $2\bar{g}_0 = -\bar{g}_2 > 0$.10, 11) This condition is realized when $a_\perp = 3C \frac{a_0 a_2}{2a_0 + a_2}$, (8) assuming that $a_0 a_2 (a_2 - a_0) > 0$ holds. The situation corresponds to attractive ($\bar{c}_0 < 0$) and ferromagnetic ($\bar{c}_2 < 0$) interaction. When we change the wavefunction by $\Phi = (\phi_1, \sqrt{2}\phi_0, \phi_{-1})^T$ and measure time and length in units of $\bar{t} = \hbar a_\perp / c$ and $\bar{x} = \hbar \sqrt{a_\perp / 2mc}$, respectively, we can rewrite eq. (4) with $\bar{c}_0 = \bar{c}_2 \equiv -c < 0$ in the dimensionless form $i\partial_t \phi_1 = -\partial_x^2 \phi_1 - 2(|\phi_1|^2 + 2|\phi_0|^2)\phi_1 - 2\phi_{-1}^* \phi_0^2$, $i\partial_t \phi_0 = -\partial_x^2 \phi_0 - 2(|\phi_{-1}|^2 + |\phi_0|^2 + |\phi_1|^2)\phi_0 - 2\phi_0^* \phi_1 \phi_{-1}$, $i\partial_t \phi_{-1} = -\partial_x^2 \phi_{-1} - 2(|\phi_{-1}|^2 + 2|\phi_0|^2)\phi_{-1} - 2\phi_1^* \phi_0^2$. (9) Then, eq. (9) is found to be equivalent to a 2 × 2 matrix version of the NLS equation with a self-focusing nonlinearity: $i\partial_t Q + \partial_x^2 Q + 2 Q Q^\dagger Q = O$, (10) with the identification $Q = \begin{pmatrix} \phi_1 & \phi_0 \\ \phi_0 & \phi_{-1} \end{pmatrix}$. (11) The matrix NLS equation (10) is integrable in the sense that the initial value problem can be solved via the inverse scattering method.9, 12) The integrability of the reduced equations (9) is thus proved automatically. Thus, we have derived the integrable spinor model. Another integrable point of eq. 
(4) is c̄0 = c̄2 ≡ c > 0, i.e., the matrix NLS equation with a self- defocusing nonlineality.13) Special solutions for generic coupling constants c̄0, c̄2 are given in ref. 17. J. Phys. Soc. Jpn. Full Paper 3. Bright Solitons with a Finite Background We consider bright soliton solutions of the integrable model (9) under NVBC, whereas those under VBC are studied in refs. 10 and 11. We summarize briefly the results of the inverse scattering method for eq. (9) with NVBC.12) We define the nonvanishing boundary conditions as Q(x, t) → Q±, x→ ±∞, ±Q± = Q±Q ± = λ 0I, (12) where λ0 is a positive real constant and I denotes a 2 × 2 unit matrix. Note that vanishing boundary conditions are recovered as λ0 → 0. The analysis of the ISM under NVBC yields the standard form of the multiple soliton solutions of the 2 × 2 matrix NLS equation with a self-focusing nonlineality (10) as Q(x, t) = λ0 e I + 2i(I · · · I ︸ ︷︷ ︸ . (13) Here, a 2 × 2 complex matrix Πj is called the polarization matrix. S is a 2N × 2N matrix defined by Sij = ζj + λj δijI + i(ζi + ζj) ζi + λi ζj + λj 1 ≤ i, j ≤ N, (14) where λj is a complex discrete eigenvalue for the bound state and ζj = (λ j + λ 1/2 with Im ζj > 0 for j = 1, . . . , N . It is required for the ISM under NVBC that a two-sheet Riemann surface is introduced appropriately, due to a double-valued function ζj. The phase of the carrier wave, φ(x, t), is given by φ(x, t) = kx− (k2 − 2λ20)t+ δ, (15) and the coordinate function is given by χj ≡ χj(x, t) = 2iζj(x− 2(λj + k)t). (16) The above solution is the M(= N/2)-soliton solution. The ISM under NVBC for the self- focusing case results in pairs of discrete eigenvalues corresponding to each Riemann sheet. The constraint should be imposed on λj and ζj (j = 1, · · · , N) such that λ2l−1 = λ∗2l and ζ2l−1 = −ζ∗2l for l = 1, · · · , N/2. At the same time, Πj must satisfy that Π2l−1 = Π For our reduction to the integrable model for F = 1 spinor BEC, we must make the potential Q symmetric, noting eq. (11). 
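The need for a symmetric Q can be checked directly: with the identification (11), the entries of the cubic term $Q Q^\dagger Q$ in eq. (10) should reproduce the nonlinear terms of eq. (9) up to the common factor −2. The following is only a minimal numerical sketch of that check, using plain Python complex numbers; the helper names (`cubic_from_matrix`, `cubic_from_components`, `max_mismatch`) are hypothetical and introduced here purely for illustration.

```python
import random

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Hermitian conjugate of a 2x2 matrix of complex numbers."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def cubic_from_matrix(phi1, phi0, phim1):
    """Q Q^dagger Q for the symmetric Q of eq. (11)."""
    Q = [[phi1, phi0], [phi0, phim1]]
    return matmul(matmul(Q, dagger(Q)), Q)

def cubic_from_components(phi1, phi0, phim1):
    """Nonlinear terms of eq. (9), with the common factor -2 removed."""
    n1, n0, nm1 = abs(phi1) ** 2, abs(phi0) ** 2, abs(phim1) ** 2
    c1 = (n1 + 2 * n0) * phi1 + phim1.conjugate() * phi0 ** 2
    c0 = (nm1 + n0 + n1) * phi0 + phi0.conjugate() * phi1 * phim1
    cm1 = (nm1 + 2 * n0) * phim1 + phi1.conjugate() * phi0 ** 2
    return [[c1, c0], [c0, cm1]]

def max_mismatch(seed=0):
    """Largest entrywise difference for random complex amplitudes."""
    rng = random.Random(seed)
    phi1, phi0, phim1 = (complex(rng.gauss(0, 1), rng.gauss(0, 1))
                         for _ in range(3))
    M = cubic_from_matrix(phi1, phi0, phim1)
    C = cubic_from_components(phi1, phi0, phim1)
    return max(abs(M[i][j] - C[i][j]) for i in range(2) for j in range(2))

print(max_mismatch())  # expected to be at the level of rounding error
```

The two off-diagonal entries of $Q Q^\dagger Q$ coincide only because Q is symmetric, which is why the same symmetry has to be imposed on the solutions of eq. (10).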
The symmetry of Q is naturally reflected in $\Pi_j$. When we take every $\Pi_j$ to be symmetric in eq. (13), soliton solutions of the integrable model (9) under NVBC are obtained. The form (13) is the standard form of soliton solutions in the sense that the boundary value at x → ∞ (where $e^{\chi_j} \to 0$) is supposed to be fixed as $Q(x, t)\, e^{-i\phi(x,t)} \to \lambda_0 I$, x → ∞. (17) The spinor model, however, allows the SU(2) transformation of the solutions, if they are kept symmetric. To be concrete, let U be a 2 × 2 unitary matrix. When Q is a solution of eq. (10) with eq. (11), then $Q' = U Q U^T$ (18) is also a solution. Assuming that Q is the standard form (13), the limit x → ∞ of Q′ becomes $Q'\, e^{-i\phi(x,t)} \to \lambda_0 U U^T \equiv \lambda_0 Q'_+$, x → ∞. (19) $Q'_+ = U U^T$ is the so-called Cholesky decomposition. Arbitrary boundary conditions $Q'_+$ other than eq. (17) are thus realized via the SU(2) transformation. On the other hand, the behavior in the limit x → −∞ varies depending on whether det Π = 0 or not, which will be discussed later for the one-soliton case. There is another important concept concerning an integrable model. Due to the integrability, the model has infinitely many conservation laws, which restrict the dynamics of the system in an essential way. Several conserved quantities, related to the physical quantities of the system, are listed below: Total number: $\bar{N}_T = \int dx\, \bar{n}(x, t)$, $\bar{n}(x, t) = \mathrm{tr}(Q^\dagger Q) - \mathrm{tr}(Q_\pm^\dagger Q_\pm)$. (20) Total spin: $\mathbf{F}_T = \int dx\, \mathbf{f}(x, t)$, $\mathbf{f}(x, t) = \mathrm{tr}(Q^\dagger \boldsymbol{\sigma} Q)$. (21) Total momentum: $\bar{P}_T = \int dx\, \bar{p}(x, t)$, $\bar{p}(x, t) = -i\hbar [\mathrm{tr}(Q^\dagger Q_x) - \mathrm{tr}(Q_\pm^\dagger Q_{\pm,x})]$. (22) Total energy: $\bar{E}_T = \int dx\, \bar{e}(x, t)$, $\bar{e}(x, t) = c [\mathrm{tr}(Q_x^\dagger Q_x - Q^\dagger Q Q^\dagger Q) - \mathrm{tr}(Q_{\pm,x}^\dagger Q_{\pm,x} - Q_\pm^\dagger Q_\pm Q_\pm^\dagger Q_\pm)]$. (23) Here, $\boldsymbol{\sigma} = (\sigma^x, \sigma^y, \sigma^z)^T$ are the Pauli matrices. To avoid the divergence of the integrals, we should subtract the contribution of the background from the physical quantities, except for the total spin, to which the background does not contribute explicitly. These subtractions are emphasized by the bars on the conserved densities and quantities. 
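As a small illustration of how the densities in eqs. (20) and (21) behave under the transformation (18), the sketch below evaluates $\mathrm{tr}(Q^\dagger Q)$ and $\mathrm{tr}(Q^\dagger \boldsymbol{\sigma} Q)$ for a sample symmetric Q and for $Q' = U Q U^T$. It is only a sketch under stated assumptions: the sample matrix, the angle parametrization of U in `su2`, and all function names are hypothetical choices made here, and the constant background term of eq. (20) is left out because it is the same for Q and Q′.

```python
import cmath
import math

# Pauli matrices as nested lists
PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Hermitian conjugate of a 2x2 matrix of complex numbers."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def number_density(Q):
    """tr(Q^dagger Q), i.e. eq. (20) without the background subtraction."""
    return trace(matmul(dagger(Q), Q)).real

def spin_density(Q):
    """(f_x, f_y, f_z) with f_a = tr(Q^dagger sigma_a Q), as in eq. (21)."""
    return tuple(trace(matmul(matmul(dagger(Q), PAULI[a]), Q)).real
                 for a in "xyz")

def su2(theta, phi, psi):
    """A generic SU(2) matrix built from three angles (illustrative choice)."""
    a = cmath.exp(1j * phi) * math.cos(theta)
    b = cmath.exp(1j * psi) * math.sin(theta)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def rotate(Q, U):
    """The transformation Q' = U Q U^T of eq. (18)."""
    return matmul(matmul(U, Q), transpose(U))

Q = [[0.3 + 0.1j, 0.2 - 0.4j], [0.2 - 0.4j, -0.5 + 0.7j]]  # sample symmetric Q
Qr = rotate(Q, su2(0.4, 1.1, -0.7))

print(number_density(Q), number_density(Qr))  # the two values should agree
print(spin_density(Q), spin_density(Qr))      # same length, different direction
```

In this sketch the rotated matrix stays symmetric and the number density is unchanged, while the spin density vector keeps its length but changes direction, as expected for a global spin rotation.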
The local spin density f = (fx, fy, fz) is covariant under the SU(2) transformation (18), whereas the other densities such as n̄(x, t), J. Phys. Soc. Jpn. Full Paper p̄(x, t) and ē(x, t) are invariant. The total macroscopic spin is directed to face to an arbitrary direction by the global spin rotation. The SU(2) symmetry of the system causes the energy degenerated states of solitons for this spin rotation. 4. One-Soliton Solutions In this section, one-soliton solutions of the integrable spinor model (9) are investigated in detail. We can derive the explicit form of one-soliton solutions by setting N = 2 (M = 1) in the formula (13). The calculation is complicated but straightforward. The result is as follows: Q = λ0 e iφ(x,t) I + 2i , (24) where detS is given by detS = κ21κ 2 − eχ1ν1κ1κ22 trΠ− ex2ν2κ21κ2 trΠ† +eχ1+χ2κ1κ2 ̟ + ν1ν2(|tr Π|2 − 1) +e2χ1ν21κ 2 detΠ + e 2χ2ν22κ 1 detΠ − e2χ1+χ2ν1κ2̟ trΠ† detΠ− eχ1+2χ2ν2κ1̟ trΠdetΠ† +e2χ1+2χ2̟2|detΠ|2, (25) and T is a 2× 2 matrix such that T = eχ1κ1κ 2 Π+ e χ2κ21κ2 Π − eχ1+χ2κ1κ2 ς1 trΠ · Π† + ς2 trΠ† ·Π+ µ(|trΠ|2 − 1)I − e2χ1µ1κ22 detΠ · I − e2χ2µ2κ21 detΠ† · I +e2χ1+χ2κ2 detΠ ς21 Π † +̟ trΠ† · I +eχ1+2χ2κ1 detΠ ς22 Π+̟ trΠ · I − e2χ1+2χ2̟ (ς1 + ς2) |detΠ|2 · I. (26) We explain physical meanings of notations. The phase of the carrier wave is given by φ(x, t) = kx− (k2 − 2λ20)t+ δ. (27) Let λj and ζj = (λ j + λ 1/2 for j = 1, 2 be complex constants satisfying λ1 = λ 2 and ζ1 = −ζ∗2 . Without loss of generality, we assume that (λ1, ζ1) ((λ2, ζ2)) belongs to the upper (lower) Riemann sheet, which is characterized such that Im ζ Im λ > 0 (Im ζ Im λ < 0). χj(x, t) is expressed in terms of them as χ1 ≡ χ1(x, t) = 2iζ1(x− 2(λ1 + k)t), (28) χ2 ≡ χ2(x, t) = 2iζ2(x− 2(λ2 + k)t). (29) Note that χ1 = χ 2 ≡ χ holds. Reχ thus denotes the coordinate of the envelope soliton, whereas Imχ implies the self-modulation phase. The polarization matrix Π is a 2×2 symmetric J. Phys. Soc. Jpn. Full Paper matrix. 
Here, the normalization in a sense of the square norm is imposed on Π as a matter of convenience: , 2|α|2 + |β|2 + |γ|2 = 1. (30) The other parameters are expedient functions of λ0, λ1 and λ2: ζj + λj , µ = iλ0 κ1 + κ2 ζ1 + ζ2 , νj = iλ0κj , ̟ = ν1ν2 − µ2, ςj = νj − µ, (31) for j = 1, 2. We list the meaning of each parameter as follows: k : wave number of soliton’s carrier wave. λ0 : amplitude of soliton’s carrier wave. φ(x, t) : phase of soliton’s carrier wave. Re χ(x, t) : coordinate of soliton’s envelope. Im χ(x, t) : self-modulation phase of soliton. Π : symmetric polarization matrix of soliton. Equations (24)-(26) are new soliton solutions which had never been written down explicitly in the literatures. If we take the vanishing limit λ0 → 0, ζ1 and ζ2 converge at λ1 and −λ2, respectively. Then, · 2kR Πe−(χR+ρ/2) + (σyΠ†σy) eχR+ρ/2detΠ e−(2χR+ρ) + 1 + e2χR+ρ|detΠ|2 eiχI , (32) with notations eρ/2 ≡ 1 , (33) χR ≡ χR(x, t) = kR(x− 2kI t)− ǫ, (34) χI ≡ χI(x, t) = kIx+ (k2R − k2I )t, (35) each of which holds the following correspondence respectively: kR = −2Imλ1, (36) kI = 2Reλ1 + k, (37) ǫ = − ln(4|λ1|). (38) Equations (32)-(35) are the same forms as those in ref. 11, except a phase factor. This con- sequence is natural but non-trivial, because the formula of solitons under VBC9) is quite different from that under NVBC (13), in particular, in the form of the matrix S. Actually, the initial displacement ǫ can be arbitrarily changed, regardless of eq. (38), by the parallel shift of the position x. We have shown that the soliton solutions (24)-(26) can be regarded as a general form of bright soliton solutions, including the case of VBC. J. Phys. Soc. Jpn. Full Paper 4.1 Classification by the boundary conditions We shall show that there are two kinds of one-soliton solutions depending on the boundary conditions, detΠ = 0 or detΠ 6= 0. 
A similar classification by boundary conditions also exists for dark solitons.13) Examples of snapshots of one-soliton density profiles are shown in Fig. 1. The upper row is for det Π = 0, and the lower row is for det Π ≠ 0. The envelope soliton looks like a locally oscillating wave rather than, literally, a solitary wave, because of the self-modulation due to the complex velocity.

For det Π = 0, the boundary conditions of the standard form (24)-(26) are
\[
\begin{aligned}
Q\, e^{-i\phi} &\to \lambda_0 I, && x \to \infty, \\
Q\, e^{-i\phi} &\to \lambda_0 \left[ I - 2i\, \frac{\varsigma_1\, \mathrm{tr}\,\Pi \cdot \Pi^\dagger + \varsigma_2\, \mathrm{tr}\,\Pi^\dagger \cdot \Pi + \mu\,(|\mathrm{tr}\,\Pi|^2 - 1)\, I}{\varpi + \nu_1 \nu_2\,(|\mathrm{tr}\,\Pi|^2 - 1)} \right], && x \to -\infty.
\end{aligned}
\tag{39}
\]
The left and right boundary values differ, in general, not only in the global phase but also in the population of each component. That is, these are SU(2)-rotated boundary conditions. In the upper row of Fig. 1, we see that the envelope soliton of each component forms a domain-wall (DW) shape, although this is not manifest in the total number density. On the other hand, for det Π ≠ 0, the boundary conditions are
\[
\begin{aligned}
Q\, e^{-i\phi} &\to \lambda_0 I, && x \to \infty, \\
Q\, e^{-i\phi} &\to \lambda_0 \left( 1 - 2i\, \frac{\varsigma_1 + \varsigma_2}{\varpi} \right) I, && x \to -\infty.
\end{aligned}
\tag{40}
\]
In contrast to the case det Π = 0, both boundary values are diagonal matrices, and only a phase shift (PS) occurs. That is, these are U(1)-rotated boundary conditions. For the above reasons, we call one-soliton solutions the DW-type for det Π = 0 and the PS-type for det Π ≠ 0. Note the following: the spin density profile of the DW-type suggests that the total spin is nonzero, whereas that of the PS-type is dipole-shaped, implying that the total spin amounts to zero. See the right panel of Fig. 1. This observation will be solidified in § 4.3.

4.2 Case of purely imaginary discrete eigenvalues
The ISM performed on Riemann sheets involves a double-valued function of the spectral parameter, and it usually renders a very complicated representation of N-soliton solutions even for N = 1, as is seen from eqs. (24)-(26).
To simplify an explicit representation, it is convenient to assume that λj and ζj are purely imaginary. 18) The similar approach is employed for the analysis of N -soliton solutions of the derivative NLS equation under NVBC.20) If we take a pair of discrete eigenvalues as (λ1, ζ1) ≡ (iλ0λ, iλ0ζ), (λ2, ζ2) ≡ (−iλ0λ, iλ0ζ), (41) where λ and ζ are positive real numbers such that λ > 1, ζ = λ2 − 1, (42) J. Phys. Soc. Jpn. Full Paper (a) (b) (c) Fig. 1. Snapshots of one-soliton density profiles. The upper row is plotted for detΠ = 0 at the moment t = 0, with k = 0, λ0 = 1, λ1 = 1+i, ξ1 = 1.27+0.79i (χ(x, t) = −(1.57−2.54i)x− (8.23−1.94i)t) and Π = 4/5 2/5 2/5 1/5 . The lower row is plotted for detΠ 6= 0 at the moment t = 0, with the same parameters except for Π = 2 2/5 2/5 3/(5 . The left panel (a) depicts the local density for each component, |φ1|2 (solid line), |φ0|2 (chain line) and |φ−1|2 (dotted line). The center panel (b) depicts the local number density n, where the contribution of the background is included. The right panel (c) depicts the local spin densities, fx (solid line) and fz (dotted line). fy vanishes identically due to a choice of a real matrix Π. we obtain a relatively simple form of one-soliton solutions as Q = λ0 e iφ(x,t) I + 2i , (43) detS = 1− (tr Π(t) + trΠ†(t)) + e2χP |trΠ(t)|2 (detΠ(t) + detΠ†(t))− trΠ†(t)detΠ(t) + trΠ(t)detΠ†(t) e4χP |detΠ(t)|2, (44) 2i T = 2(|tr Π(t)|2 − 1) e2χP + 2e2χP detΠ(t) + detΠ†(t) tr Π†(t)detΠ(t) + trΠ(t)detΠ†(t) 10/18 J. Phys. Soc. Jpn. Full Paper −2(ζ + λ) eχP + 2 e2χP trΠ†(t) + 2 e3χP detΠ†(t) −2(ζ − λ) eχP − 2 e2χP trΠ(t) + 2 e3χP detΠ(t) Π†(t), (45) where the coordinate of the envelope soliton is given by χP (x, t) = −2λ0ζ(x− 2kt), (46) and the time dependence of the phase modulation is embedded in the polarization matrix, namely, Π(t) ≡ Πe4iλ20ζλt. (47) When we take the limit λ0 → 0 with λ0λ and λ0ζ kept finite in eqs. (43)-(45), Q converges to the form of eqs. 
(32)-(35), accompanying the parameters kR = −2λ0λ, kI = k and ǫ = − ln(4λ0λ). Here, kR and kI are independent free parameters, apart from the trivial initial displacement ǫ. In this sense, in spite of the reduction, we can still regard the form of one- soliton solutions (43)-(45) as a general form of those under VBC. We can also take another limit. That is, we consider the reduction to the single-component case. If we set k = 0, Π = eiθ 0 , (48) the (1,1)-component of Q becomes Q11 · e−i(2λ t+δ) = λ0 − 2λ0ζ ζ cos(4λ20ζλt+ θ) + iλ sin(4λ 0ζλt+ θ) λ cosh(2λ0ζx+ ψ)− cos(4λ20ζλt+ θ) , (49) where eψ = ζ/λ. The form (49) was given in ref. 18. We thus verify that our soliton solutions are also generalization of those for the single-component NLS equation under NVBC.19) 4.3 Spin states In this subsection, we discuss the spin states of one-soliton solutions with a finite back- ground, by calculating the total spin. The conservation law guarantees that we obtain the total spin FT from integrating at arbitrary time. Therefore, we can select the time so that the calculation becomes easier. We concentrate on the case of purely imaginary discrete eigenval- ues. As a result, we see that the DW-type is associated with the ferromagnetic state, whereas the PS-type is associated with the polar state. One-soliton solutions for purely imaginary dis- crete eigenvalues (43)-(45) include those under VBC apart from an initial displacement, and therefore the classification about the spin states presented below is wider than that performed before.10, 11) 4.3.1 Ferromagnetic state For detΠ = 0 (DW-type), we substitute eqs. (43)-(45) into eq. (21) and calculate the total spin. The time t′ such that trΠ(t′) + trΠ†(t′) = 0 is suitable for the calculation. The result is 11/18 J. Phys. Soc. Jpn. Full Paper as follows: FT = 4λ0τ 2λRe{α∗(β + γ)} −2ζIm{α∗(β − γ)} λ(|β|2 − |γ|2) , τ ≡ |β + γ|2 , (50) with the modulus |F T|2 = (4λλ0)2τ. 
(51) The total number of the particles transformed into a soliton, N̄T, is calculated by eq. (20), N̄T = 4λ0ζ, (52) and the range of the value taken by |F T|2 is expressed in terms of N̄T, N̄2T ≤ |F T|2 ≤ N̄2T + (4λ0)2. (53) Remark that, in the vanishing limit λ0 → 0, the modulus of the total spin is always equal to the total number of the particles, namely, |F T| → NT. With nonzero total spin, the DW-type of solitons belongs to the ferromagnetic state. Since inter-atomic ferromagnetic interactions are supposed here, solitons tend to take the ferromagnetic state or DW-type. In various contexts of physics, the domain-walls are topo- logical solitons related with the symmetry breaking. Here, resulting from the domain-walls, the magnetic entity emerges as the spontaneously broken symmetry. It is worthy to notice that the case |N̄T| < |FT| may happen. The background is spinless, but its internal spin state appears to be affected on the ground that the ferromagnetic soliton runs over the background. Thereby, the background contributes to the total spin. 4.3.2 Polar state If detΠ 6= 0 (PS-type), the solitons show the other magnetic property. The time t′ such that detΠ(t′) = detΠ†(t′) > 0 is suitable for the analysis. After lengthy calculations, the local spin density at such time is derived as = 8λ20 e −3υΞ−2(χP ′) β − γ ζ e2υΞ(χP ′) sinh(χP ′) (ζ2 − λ2)tr Π(t′) + (ζ2 + λ2)trΠ†(t′) sinh(2χP ′) λ2/ζ · ((tr Π†(t′))2 − |tr Π(t′)|2) + 4ζdetΠ(t′) + 2ζ(|tr Π(t′)|2 − 1) sinh(χP ′) +h.c., (54) fy = 32λ −3υIm{α∗(β − γ)}Ξ−2(χP ′) 2ζ eυ sinh(2χP ′)− (trΠ(t′) + trΠ†(t′)) sinh(χP ′) , (55) 12/18 J. Phys. Soc. Jpn. Full Paper where Ξ(χP ′) is an even function of χP ′ , χP ′(x, t) is a parallel-shifted coordinate, and υ is a constant: Ξ(χP ′) ≡ 2 cosh(2χP ′)− 2 e−υ (trΠ(t′) + tr Π†(t′)) cosh(χP ′) ζ2 + |trΠ(t′)|2 λ2detΠ(t′) , (56) χP ′ ≡ χP + υ = −2λ0ζ(x− 2kt′) + υ, (57) υ ≡ ln (detΠ(t′))1/2 . (58) Note that fx and fz share the same functional form. 
Each component of the above local spin density is an odd function of χP′ and, in particular, has a node at the same point x0 such that χP′(x0, t′) = 0, namely, x0 = 2kt′ + (2λ0ζ)^{−1}υ. Consequently, the total spin amounts to zero:
\[
F_T = \int dx\, f(\chi_{P'}) = (0, 0, 0)^T. \tag{59}
\]
For this reason, the PS-type of solitons, on average, belongs to the polar state.15)

5. Two-Soliton Collision
We proceed to the discussion of two-soliton collisions in the integrable spinor model (9). Two-soliton solutions are obtained by setting N = 4 (M = 2) in the formula (13). There exist two independent discrete eigenvalues and two independent symmetric polarization matrices, i.e., $\lambda_1 = \lambda_2^*$ and $\Pi_1 = \Pi_2^\dagger$ for one of the solitons, and $\lambda_3 = \lambda_4^*$ and $\Pi_3 = \Pi_4^\dagger$ for the other. The solitons are well separated at t = ±∞, so a two-soliton solution is asymptotically two one-solitons. The classification of one-soliton solutions based on the values of the determinants of the polarization matrices, discussed in the previous section, is thus valid for two-soliton solutions.

The derivation of the explicit form is more complicated than in the case of one-soliton solutions. For the derivative NLS equation under NVBC, explicit two-soliton solutions and shifts of soliton positions due to collisions between solitons have been obtained analytically in the case of purely imaginary eigenvalues, where the complexity of the calculation is considerably reduced.20) This strategy, however, does not work for our NLS equation under NVBC. The reason is understood from eq. (46). In the spinor model, purely imaginary eigenvalues give two solitons with the same velocity 2k, and they do not collide with each other. Accordingly, we cannot investigate the properties of collisions for purely imaginary discrete eigenvalues. Explicit multi-soliton solutions of the NLS equation under NVBC have not been studied, even in the single-component case, because of the computational complexity.
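The statement that purely imaginary eigenvalues give both solitons the same velocity 2k can be checked numerically from the envelope coordinate Re χj of eqs. (28)-(29). The sketch below is only an illustration and not part of the original analysis. It assumes ζj = (λj² + λ0²)^{1/2} with the branch chosen so that Im ζ Im λ > 0 (the upper sheet), and it reads the velocity off as the ratio of the t- and x-coefficients of Re χj. The function name `envelope_velocity` and the sample eigenvalues are hypothetical.

```python
import cmath


def envelope_velocity(lam: complex, lam0: float, k: float) -> float:
    """Velocity of the soliton envelope, read off from Re chi.

    Assumes chi = 2i*zeta*(x - 2*(lam + k)*t), as in eqs. (28)-(29), and
    zeta = sqrt(lam**2 + lam0**2) on the sheet where Im(zeta)*Im(lam) > 0.
    """
    zeta = cmath.sqrt(lam * lam + lam0 * lam0)
    # cmath.sqrt returns the principal branch; flip the sign if the
    # upper-sheet condition Im(zeta)*Im(lam) > 0 is violated.
    if zeta.imag * lam.imag < 0:
        zeta = -zeta
    if zeta.imag == 0:
        raise ValueError("Re chi does not depend on x for this eigenvalue")
    # Re chi = -2*Im(zeta) * (x - v*t), with v = 2*Im(zeta*(lam + k))/Im(zeta)
    return 2.0 * (zeta * (lam + k)).imag / zeta.imag


if __name__ == "__main__":
    k, lam0 = 1.0, 1.0
    # Purely imaginary eigenvalues with |lam| > lam0: each gives velocity 2k.
    for lam in (1.03j, 1.5j, 2.0j):
        print(lam, envelope_velocity(lam, lam0, k))
    # A generic complex eigenvalue (hypothetical value) moves differently.
    print(envelope_velocity(-1.05 + 1.0j, lam0, k))
```

Under these assumptions every purely imaginary eigenvalue yields the velocity 2k, so two such solitons never meet, whereas a complex eigenvalue generally yields a different velocity and a collision becomes possible.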
Here, we graphically show the characteristic behaviors of two-soliton collisions in the spinor model, using the exact solutions given by the ISM, and discuss them qualitatively. Although the graphs are drawn for specific parameters, much the same behavior is observed for arbitrarily selected parameters.

Figure 2 illustrates a mutual collision between two PS-types, where det Π1 ≠ 0 and det Π3 ≠ 0. One can see that, in all three components, both solitons retain their shapes before and after the collision, a property shared with solitons in the single-component case. In this sense, a PS-PS soliton collision is essentially equivalent to a two-soliton collision of the single-component NLS equation.

Figure 3 illustrates a mutual collision between a DW-type and a PS-type, where det Π1 = 0 and det Π3 ≠ 0. The behavior of collisions between a DW-type and a PS-type is qualitatively different from that between two PS-types. One observes that, in the PS-type, much of the population initially occupying the hyperfine substates |F = 1, mF = ±1〉 is transferred into the hyperfine substate |F = 1, mF = 0〉 by the collision. In contrast, in the DW-type, such spin transfer does not occur, and the domain-wall shape is preserved through the collision. This phenomenon can be interpreted as follows. The DW-type, with nonzero spin, can affect the internal spin state of the PS-type, whereas the PS-type, which is expected to have zero total spin, does not affect the internal spin state of the DW-type. This kind of spin-transfer phenomenon, called the spin-switching, was first predicted for the case of VBC.11) Due to the conservation laws, the total number, the total spin and so on are invariant throughout the collision process. Population-mixing among internal degrees of freedom is permitted, as long as the conservation laws are not violated.
The spin-switching is one of the dynamical processes which make the spinor solitons more interesting.

Finally, for a mutual collision between two DW-types, where det Π1 = det Π3 = 0, the shapes of both solitons are expected to be deformed by the collision, since each soliton carries nonzero total spin. In fact, drastic population-mixing is seen in Fig. 4, which shows an example of this kind of collision. One finds that the domain-walls "repel" in the collision region rather than collide.

Fig. 2. Density plots of |φ1|² (a), |φ0|² (b) and |φ−1|² (c) for a mutual collision between two PS-types. The parameters used here are k = 1, λ0 = 1, λ1 = 1.03i, λ3 = 1.05 + i, Π1 = 2 i/2 i/2 0 0 i/2 i/2 1/ . The velocity of the right (left) moving soliton is 2.00 (−3.41). The collision takes place at t = 0.

Fig. 3. Density plots of |φ1|² (a), |φ0|² (b) and |φ−1|² (c) for a mutual collision between DW-type and PS-type. The parameters used here are the same as those of Fig. 2, except for 2i/3 −1/3 , Π3 = 0 −1/ . The right (left) moving soliton is DW-type (PS-type).

Fig. 4. Density plots in the (x, t) plane of |φ1|² (a), |φ0|² (b) and |φ−1|² (c) for a mutual collision between two DW-types. The parameters used here are the same as those of Fig. 2, except for $\Pi_1 = \begin{pmatrix} 1/2 & i/2 \\ i/2 & -1/2 \end{pmatrix}$. Values greater than 2 are colored white.

6. Concluding Remarks
In this paper, we have investigated dynamical properties of matter-wave bright solitons with a finite background in the F = 1 spinor Bose-Einstein condensate. To perform our analysis concretely, we have exploited an integrable spinor model with a self-focusing nonlinearity and the inverse scattering method under nonvanishing boundary conditions. The situation in which matter-wave solitons are located on a finite background fits the experiments. One-soliton solutions are derived explicitly and studied in detail.
From the mathematical point of view, they offer general forms of bright soliton solutions of the NLS equation. We have confirmed that our one-soliton solutions include those obtained in previous works. One-soliton solutions are classified into two kinds according to their boundary conditions: DW-type and PS-type. The spin density profiles of one-solitons vary depending on the boundary conditions. In the case of purely imaginary discrete eigenvalues, we have analytically shown that the DW-type is in the ferromagnetic state, whereas the PS-type is in the polar state. The existence of two distinct magnetic properties for one-soliton solutions also gives rise to fascinating phenomena in two-soliton collisions, for example, the spin-switching. The above results for bright solitons with a finite background agree with those for bright solitons under VBC10, 11) and dark solitons.13)

Several problems still remain. It is desirable to extend our analysis to the case of general discrete eigenvalues. The computation of conserved quantities other than the total spin is also required. (One approach is given in the Appendix.) In addition, we wish to investigate analytical properties of general N-soliton solutions under NVBC in the spinor model. These problems inevitably involve very complicated calculations and should be discussed elsewhere.

We conclude that the properties of the multiple matter-wave solitons in the spinor BECs are interesting and should be useful in various applications. Bright solitons are preferable to dark solitons for applications, because of their advantage in propagation distance. We hope that our analysis contributes to illuminating dynamical properties of solitons in the spinor BECs, which should be demonstrated experimentally in the near future.

Acknowledgment
One of the authors (TK) acknowledges Dr. J. Ieda and Dr. M. Uchiyama for valuable comments and discussions.
Appendix: Several Conserved Quantities of One-Soliton Solutions The conserved quantities help us to understand the dynamics of the system. In this ap- pendix, we calculate the total number, the total spin, the total momentum and the total energy of the spinor model. We assume that, in addition to purely imaginary discrete eigen- values, Π is a real symmetric 2× 2 matrix. The condition that Π is a real symmetric matrix is inherent in the self-defocusing case, i.e., dark solitons.12) For Π = Π†, one-soliton solutions of purely imaginary discrete eigenvalues (43)-(45) be- come the following form at t = t′ = (4n− 1)π/8λ20ξλ for n = 0,±1, . . . : Q = λ0 e iφ(x,t) I + 4iζ Πe−(χP+ρ ′/2) + (σyΠσy) eχP+ρ ′/2detΠ e−(2χP+ρ ′) + 1 + e2χP+ρ (detΠ)2 , (A·1) 16/18 J. Phys. Soc. Jpn. Full Paper where eρ ′/2 ≡ λ/ζ. This form is suitable for calculations, since the imaginary part is separated from the real one. One can see clearly that the one-soliton solutions under VBC (32) are located on a finite background in the form (A·1). Note that the domain-wall shape is lost even for detΠ = 0 there. Several conserved quantities of the solitons (A·1) are calculated by use of eqs. (20)-(23). The results for detΠ = 0 are given by N̄T = 4λ0ζ, (A·2) FT = N̄T 2α(β + γ) β2 − γ2 , |F T| = N̄T, (A·3) P̄T = N̄T~k, (A·4) ĒT = N̄Tc (k2 − 2λ20)− , (A·5) and those for detΠ 6= 0 are given by N̄T = 8λ0ζ, (A·6) FT = (0, 0, 0) T , (A·7) P̄T = N̄T~k, (A·8) ĒT = N̄Tc (k2 − 2λ20)− . (A·9) It is intriguing that, for fixed amplitude and discrete eigenvalue, N̄T, P̄T and ĒT of the PS-type (detΠ 6= 0) have just twice values as those of the DW-type (detΠ = 0), respectively. This enables us to interpret that the PS-type of solitons is a bound state of the two DW-types of solitons. Additionally, for fixed amplitude and total number, the total energy ĒT of the DW-type is lower than that of the PS-type: ĒDWT − ĒPST = −N̄3Tc/16 < 0, which suggests that the DW-type is physically preferable. 
This result is consistent with inter-atomic ferromagnetic interaction, i.e., c̄2 < 0.

References
1) K. E. Strecker, G. B. Partridge, A. G. Truscott and R. G. Hulet: Nature (London) 417 (2002) 150.
2) L. Khaykovich, F. Schreck, G. Ferrari, T. Bourdel, J. Cubizolles, L. D. Carr, Y. Castin and C. Salomon: Science 296 (2002) 1290.
3) S. Burger, K. Bongs, S. Dettmer, W. Ertmer and K. Sengstock: Phys. Rev. Lett. 83 (1999) 5198.
4) J. Denschlag, J. E. Simsarian, D. L. Feder, C. W. Clark, L. A. Collins, J. Cubizolles, L. Deng, E. W. Hagley, K. Helmerson, W. P. Reinhardt, S. L. Rolston, B. I. Schneider and W. D. Phillips: Science 287 (2000) 97.
5) F. K. Abdullaev, A. Gammal, A. M. Kamchatnov and L. Tomio: Int. J. Mod. Phys. B 19 (2005) 3415.
6) J. Stenger, S. Inouye, D. M. Stamper-Kurn, H.-J. Miesner, A. P. Chikkatur and W. Ketterle: Nature 396 (1998) 345.
7) D. M. Stamper-Kurn, M. R. Andrews, A. P. Chikkatur, S. Inouye, H.-J. Miesner, J. Stenger and W. Ketterle: Phys. Rev. Lett. 80 (1998) 2027.
8) H.-J. Miesner, D. M. Stamper-Kurn, J. Stenger, S. Inouye, A. P. Chikkatur and W. Ketterle: Phys. Rev. Lett. 82 (1999) 2228.
9) T. Tsuchida and M. Wadati: J. Phys. Soc. Jpn. 67 (1998) 1175.
10) J. Ieda, T. Miyakawa and M. Wadati: Phys. Rev. Lett. 93 (2004) 194102.
11) J. Ieda, T. Miyakawa and M. Wadati: J. Phys. Soc. Jpn. 73 (2004) 2996.
12) J. Ieda, M. Uchiyama and M. Wadati: J. Math. Phys. 48 (2007) 013507.
13) M. Uchiyama, J. Ieda and M. Wadati: J. Phys. Soc. Jpn. 75 (2006) 064002.
14) M. Olshanii: Phys. Rev. Lett. 81 (1998) 938.
15) T.-L. Ho: Phys. Rev. Lett. 81 (1998) 742.
16) T. Ohmi and K. Machida: J. Phys. Soc. Jpn. 67 (1998) 1822.
17) M. Wadati and N. Tsuchida: J. Phys. Soc. Jpn. 75 (2006) 014301.
18) T. Kawata and H. Inoue: J. Phys. Soc. Jpn. 44 (1978) 1722.
19) In ref. 18, the right-hand side of eq. (49) is λ0 + 2λ0ζ · · · . We suspect that the sign there is a misprint.
20) X.-J. Chen, J. Yang and W. K. Lam: J. Phys. A 39 (2006) 3263.

ABSTRACT
We investigate dynamical properties of bright solitons with a finite background in the F=1 spinor Bose-Einstein condensate (BEC), based on an integrable spinor model which is equivalent to the matrix nonlinear Schr\"{o}dinger equation with a self-focusing nonlinearity. We apply the inverse scattering method formulated for nonvanishing boundary conditions. The resulting soliton solutions can be regarded as a generalization of those under vanishing boundary conditions. One-soliton solutions are derived in an explicit manner. According to their behavior at infinity, they are classified into two kinds, domain-wall (DW) type and phase-shift (PS) type. The DW-type implies the ferromagnetic state with nonzero total spin and the PS-type implies the polar state, where the total spin amounts to zero. We also discuss two-soliton collisions. In particular, the spin-mixing phenomenon is confirmed in a collision involving the DW-type. The results are consistent with those of the previous studies for bright solitons under vanishing boundary conditions and dark solitons. As a result, we establish the robustness and the usefulness of the multiple matter-wave solitons in the spinor BECs.
<|endoftext|><|startoftext|> Why there is something rather than nothing (out of everything)?
A.O.Barvinsky
Theory Department, Lebedev Physics Institute, Leninsky Prospect 53, 119991 Moscow, Russia

The path integral over Euclidean geometries for the recently suggested density matrix of the Universe is shown to describe a microcanonical ensemble in quantum cosmology. This ensemble corresponds to a uniform (weight one) distribution in phase space of true physical variables, but in terms of the observable spacetime geometry it is peaked about complex saddle-points of the Lorentzian path integral. They are represented by the recently obtained cosmological instantons limited to a bounded range of the cosmological constant.
Inflationary cosmologies generated by these instantons at late stages of expansion undergo acceleration whose low-energy scale can be attained within the concept of dynamically evolving extra dimensions. Thus, together with the bounded range of the early cosmological constant, this cosmological ensemble suggests the mechanism of constraining the landscape of string vacua and, simultaneously, a possible solution to the dark energy problem in the form of the quasi-equilibrium decay of the microcanonical state of the Universe.

PACS numbers: 04.60.Gw, 04.62.+v, 98.80.Bp, 98.80.Qc

Euclidean quantum gravity (EQG) is a lame duck in modern particle physics and cosmology. After its summit in the early and late eighties (in the form of the cosmological wavefunction proposals [1, 2] and the baby universes boom [3]) the interest in this theory gradually declined, especially in the cosmological context, where the problem of quantum initial conditions was superseded by the concept of stochastic inflation [4]. EQG could not stand the burden of indefiniteness of the Euclidean gravitational action [5] and the cosmology debate of the tunneling vs no-boundary proposals [6]. Thus, a recently suggested EQG density matrix of the Universe [7] is hardly believed to be a viable candidate for the initial state of the Universe, even though it avoids the infrared catastrophe of small cosmological constant Λ, generates an ensemble of universes in the limited range of Λ, and suggests a strong selection mechanism for the landscape of string vacua [7, 8]. Here we want to justify this result by deriving it from first principles of Lorentzian quantum gravity applied to a microcanonical ensemble of closed cosmological models.

Thermal properties in quantum cosmology [9] are incorporated by a mixed physical state, which is dynamically preferable to a pure state of the Hartle-Hawking type. This follows from the path integral for the EQG statistical sum [7, 8].
It can be cast into the form of the integral over a minisuperspace of the lapse function N(τ) and scale factor a(τ) of spatially closed FRW metric ds2 = N2(τ) dτ2 + a2(τ) d2Ω(3), e−Γ = periodic D[ a,N ] e−ΓE [ a,N ], (1) e−ΓE [ a,N ] = periodic Dφ(x) e−SE [ a,N ;φ(x) ]. (2) Here ΓE [ a, N ] is the Euclidean effective action of all inhomogeneous “matter” fields which include also met- ric perturbations on minisuperspace background Φ(x) = (φ(x), ψ(x), Aµ(x), hµν (x), ...). SE [a,N ;φ(x)] is the clas- sical Eucidean action, and the integration runs over pe- riodic fields on the Euclidean spacetime with a compact- ified time τ (of S1 × S3 topology). For free matter fields φ(x) conformally coupled to grav- ity (which are assumed to be dominating in the sys- tem) the effective action has the form [7] ΓE [ a,N ] = dτ NL(a, a′) + F (η), a′ ≡ da/Ndτ . Here NL(a, a′) is the effective Lagrangian of its local part including the classical Einstein term (with the cosmological constant Λ = 3H2) and the contribution of the conformal anomaly of quantum fields and their vacuum (Casimir) energy, L(a, a′) = −aa′2−a+H2a3+B . (3) F (η) is the free energy of their quasi-equilibrium excita- tions with the temperature given by the inverse of the conformal time η = dτ N/a. This is a typical boson or fermion sum F (η) = ± 1∓ e−ωη over field oscil- lators with energies ω on a unit 3-sphere. We work in units of mP = (3π/4G) 1/2, and B is the constant deter- mined by the coefficient of the Gauss-Bonnet term in the overall conformal anomaly of all fields φ(x). Semiclassically the integral (1) is dominated by the saddle points — solutions of the Friedmann equation −H2 − C , (4) modified by the quantum B-term and the radiation term C/a4 with the constant C satisfying the bootstrap equa- tion C = B/2 + dF (η)/dη. Such solutions represent garland-type instantons which exist only in the limited range 0 < Λmin < Λ < 3m P/B [7, 8] and eliminate the infrared catastrophe of Λ = 0. 
The period of these quasi-thermal instantons is not a freely specifiable parameter, but follows from this bootstrap as a function of Λ. Therefore this is not a canonical ensemble.

http://arxiv.org/abs/0704.0083v2

Contrary to the construction above, the density matrix that we advocate here is given by the canonical path integral of Lorentzian quantum gravity. Its kernel in the representation of 3-metrics and matter fields, denoted below as q, reads
\[
\rho(q_+, q_-) = e^{\Gamma} \int_{q(t_\pm) = q_\pm} D[\, q, p, N \,]\; e^{\, i \int_{t_-}^{t_+} dt\, (p\, \dot q - N^\mu H_\mu)}, \tag{5}
\]
where the integration runs over histories of phase-space variables (q(t), p(t)) interpolating between q± at t± and over the Lagrange multipliers of the gravitational constraints $H_\mu = H_\mu(q, p)$, the lapse and shift functions $N(t) = N^\mu(t)$. The measure D[ q, p, N ] includes the gauge-fixing factor containing the delta function $\delta(\chi) = \prod_\mu \delta(\chi^\mu)$ of gauge conditions $\chi^\mu$ and the ghost factor [10, 11] (the condensed index µ also includes continuous spatial labels). It is important that the integration range of $N^\mu$,
\[
-\infty < N < +\infty, \tag{6}
\]
is such that it generates in the integrand the delta-functions of these constraints, $\delta(H) = \prod_\mu \delta(H_\mu)$. As a consequence the kernel (5) satisfies the set of quantum Dirac constraints, the Wheeler-DeWitt equations
\[
\hat H_\mu\big( q, \partial/i\partial q \big)\, \rho(q, q_-) = 0, \tag{7}
\]
and the density matrix (5) can be regarded as an operator delta-function of these constraints
\[
\hat\rho \sim \text{``}\prod_\mu \delta(\hat H_\mu)\text{''}. \tag{8}
\]
This notation should not be understood literally because this multiple delta-function is not well defined, for the operators $\hat H_\mu$ do not commute and form a quasi-algebra with nonvanishing structure functions. Moreover, the exact operator realization $\hat H_\mu$ is not known except for the first two orders of a semiclassical ħ-expansion [12]. Fortunately, we do not need a precise form of these constraints, because we will proceed with their path-integral solutions well adjusted to the semiclassical perturbation theory.
This strategy does not extend beyond typical field-theoretic considerations, in which the path integral is regarded as more fundamental than the Schrödinger equation marred with the problems of divergent equal-time commutators, operator ordering, etc.

The very essence of our proposal is the interpretation of (5) and (8) as the density matrix of a microcanonical ensemble in spatially closed quantum cosmology. The simplest analogy is the density matrix of an unconstrained system having a conserved Hamiltonian Ĥ in the microcanonical state with a fixed energy E, ρ̂ ∼ δ(Ĥ − E). The major distinction of (8) from this case is that spatially closed cosmology does not have freely specifiable constants of motion like the energy or other global charges. Rather, it has as constants of motion the Hamiltonian and momentum constraints Hµ, all having a particular value, zero. Therefore, the expression (8) can be considered as the most general and natural candidate for the quantum state of the closed Universe. Below we confirm this by showing that in the physical sector the corresponding statistical sum is just a uniformly distributed (with a unit weight) integral over the entire phase space of true physical degrees of freedom. Thus, this is a sum over Everything. However, in terms of the observable quantities, like spacetime geometry, this distribution turns out to be nontrivially peaked around a particular set of universes. Semiclassically this distribution is given by the EQG density matrix and the saddle-point instantons of the above type [7].

From the normalization of the density matrix in the physical Hilbert space the statistical sum follows as the path integral
\[
1 = \mathrm{Tr}_{\rm phys}\, \hat\rho = \int dq\; \mu\big( q, \partial/i\partial q \big)\, \rho(q, q')\Big|_{q' = q} = e^{\Gamma} \int_{\rm periodic} D[\, q, p, N \,]\; e^{\, i \int dt\, (p\, \dot q - N^\mu H_\mu)}, \tag{9}
\]
where the integration runs over histories of q = q(t) that are periodic in time.
Here µ q, ∂/i∂q = µ̂ is the mea- sure which distinguishes the physical inner product in the space of solutions of the Wheeler-DeWitt equations (ψ1|ψ2) = 〈ψ1|µ̂|ψ2〉 from that of the space of square- integrable functions, 〈ψ1|ψ2〉 = dq ψ∗1ψ2. This measure includes the delta-function of unitary gauge conditions and an operator factor built with the aid of the relevant ghost determinant [12]. On the other hand, in terms of the physical phase space variables the Faddeev-Popov path integral equals [10, 11] D[ q, p,N ] e i dt (p q̇−NµHµ) DqphysDpphys e dt (pphys q̇phys−Hphys(t)) = Trphys T e−i dt Ĥphys(t) , (10) where T denotes the chronological ordering. Here the physical Hamiltonian Hphys(t) and its operator realiza- tion Ĥphys(t) are nonvanishing only in unitary gauges ex- plicitly depending on time [12], ∂tχ µ(q, p, t) 6= 0. In static gauges, ∂tχ µ = 0, they identically vanish, because in spa- tially closed cosmology the full Hamiltonian reduces to the combination of constraints. The path integral (10) is gauge-independent on-shell [10, 11] and coincides with that in the static gauge. Therefore, from Eqs.(9)-(10) with Ĥphys = 0, the sta- tistical sum of our microcanonical ensemble equals e−Γ = Trphys Iphys = dqphys dpphys = sum over Everything. (11) This ultimate equipartition, not modulated by any non- trivial density of states, is a result of general covariance and closed nature of the Universe lacking any freely speci- fiable constants of motion. The volume integral of entire physical phase space, whose structure and topology is not known, is very nontrivial. However, below we show that semiclassically it is determined by EQG methods and supported by instantons of [7] spanning a bounded range of the cosmological constant. Integration over momenta in (9) yields a Lagrangian path integral with a relevant measure and action e−Γ = D[ q,N ] eiSL[ q, N ]. 
(12) Integration runs over periodic fields (not indicated ex- plicitly but assumed everywhere below) even despite the Lorentzian signature of the underlying spacetime. Sim- ilarly to the procedure of [7, 8] leading to (1)-(2) in the Euclidean case, we decompose [ q,N ] into a min- isuperspace [ aL(t), NL(t) ] and the “matter” φL(x) vari- ables, the subscript L indicating their Lorentzian na- ture. With a relevant decomposition of the measure D[ q,N ] = D[ aL, NL ]×DφL(x), the microcanonical sum takes the form e−Γ = D[ aL, NL ] e iΓL[ aL, NL ], (13) eiΓL[ aL, NL ] = DφL(x) e iSL[ aL, NL;φL(x)], (14) where ΓL[ aL, NL ] is a Lorentzian effective action. The stationary point of (13) is a solution of the effective equa- tion δΓL/δNL(t) = 0. In the gauge NL = 1 it reads as a Lorentzian version of the Euclidean equation (4) and the bootstrap equation for the radiation constant C with the Wick rotated τ = it, a(τ) = aL(t), η = i dt/aL(t). However, with these identifications C turns out to be purely imaginary (in view of the complex nature of the free energy F (i dt/aL)). Therefore, no periodic solu- tions exist in spacetime with a real Lorentzian metric. On the contrary, such solutions exist in the Euclidean spacetime. Alternatively, the latter can be obtained with the time variable unchanged t = τ , aL(t) = a(τ), but with the Wick rotated lapse function NL = −iN, iSL[ aL, NL;φL] = −SE[ a,N ;φ ]. (15) In the gauge N = 1 (NL = −i) these solutions exactly coincide with the instantons of [7]. The corresponding saddle points of (13) can be attained by deforming the integration contour (6) of NL into the complex plane to pass through the point NL = −i and relabeling the real Lorentzian t with the Euclidean τ . In terms of the Eu- clidean N(τ), a(τ) and φ(x) the integrals (13) and (14) take the form of the path integrals (1) and (2) in EQG, iΓL[ aL, NL] = −ΓE [ a, N ]. 
(16) However, the integration contour for the Euclidean N(τ) runs from −i∞ to +i∞ through the saddle point N = 1. This is the source of the conformal rotation in Euclidean quantum gravity, which is called to resolve the problem of unboundedness of the gravitational action and effectively renders the instantons a thermal nature, even though they originate from the microcanonical ensemble. This mechanism implements the justification of EQG from canonical quantization of gravity [14] (see also [15] in black hole context). To show this we calculate (1) in the one-loop approx- imation with the measure inherited from the canonical path integral (5) D[ a,N ] = DaDN µ[ a,N ] δ[χ ] DetQ. Here µ[ a,N ] is a local measure determined by the La- grangian NL(a, a′), (3), in the local part of ΓE [ a,N ], µ1−loop = ∂2(NL) ∂ȧ ∂ȧ N a2a′2 D = a a′2(a2 −B +B a′2). (17) The Faddeev-Popov factor δ[χ ] DetQ contains a gauge condition χ = χ(a,N) fixing the one-dimensional dif- feomorphism, τ → τ̄ = τ − f/N , which for infinitesi- mal f = f(τ) has the form ∆fN ≡ N̄(τ) − N(τ) = ḟ , ∆fa ≡ ā(τ) − a(τ) = ȧ f/N , and Q = Q(d/dτ) is a ghost operator determined by the gauge transformation of χ(a,N), ∆fχ = Q(d/dτ) f(τ). The conformal mode σ of the perturbations δa = σa0 and δN = σN0 on the saddle-point background (labeled below by zero, a = a0 + δa, N = N0 + δN) origi- nates from imposing the background gauge χ(a,N) = δN − (N0/a0) δa. In this gauge Q = a(d/dτ)a−1, and the quadratic part of ΓE takes the form [13] δ2σΓE = − 3πm2P , (18) where D is given by (17). As is known from [7] for the background instantons a20(τ) ≥ a2− > B (a− is the turn- ing point with the smallest value of a0(τ)), so that D > 0 everywhere except the turning points where D degener- ates to zero. 
Therefore δ2σΓE < 0 for real σ, but the Gaussian integration runs along the imaginary axes and yields the functional determinant of the positive operator — the kernel of the quadratic form (18) e−Γ1−loop = e−Γ0 DetQ0 D/a′2 = e−Γ0×Det )]−1/2 In view of periodic boundary conditions for both oper- ators their determinants cancel each other (their zero modes to be eliminated because they correspond to the conformal Killing symmetry of FRW instantons) [13]. Therefore, the contribution of the conformal mode re- duces to the selection of instantons with a fixed time pe- riod, effectively endowing them with a thermal nature. As suggested in [7, 8, 16] these instantons serve as initial conditions for inflationary universes which evolve according to the Lorentzian version of Eq.(4) and, at late stages, have two branches of a cosmological acceleration with Hubble scales H2 = (m2P /B)(1±(1−2BH2)1/2). If the initial Λ = 3H2 is a composite inflaton field decaying at the end of inflation, then one of the branches under- goes acceleration with H2+ = 2m P/B. This is determined by the new quantum gravity scale suggested in [8] – the upper bound of the instanton Λ-range, Λmax = 3m P /B. To match the model with inflation and the dark energy phenomenon, one needs B of the order of the inflation scale in the very early Universe and B ∼ 10120 now, so that this parameter should effectively be a growing func- tion of time. This picture seems to fit into string theory at its low- energy field-theoretic level. Then, with a bounded range of Λ, it might constrain the landscape of string vacua [7, 8]. Moreover, string theory has a qualitative mecha- nism to promote the constant B to the level of a mod- uli variable indefinitely growing with the evolving size R(t) of extra dimensions. Indeed B as a coefficient in the overall conformal anomaly of 4-dimensional quantum fields basically counts their number N , B ∼ N . 
In the Kaluza-Klein (KK) theory and string theory the effective 4-dimensional fields arise as KK and winding modes with the masses [17]

$$m^2_{n,w} = \frac{n^2}{R^2} + \frac{w^2}{\alpha'^2}\,R^2 \qquad (19)$$

(enumerated by the KK and winding numbers), which break their conformal symmetry. These modes remain approximately conformally invariant as long as their masses are much smaller than the spacetime curvature, $m^2_{n,w} \ll H_+^2 \sim m_P^2/N$. Therefore the number of conformally invariant modes changes with R. Simple estimates show that for pure KK modes ($w = 0$, $n \le N$) their number grows with R as $N \sim (m_P R)^{2/3}$, whereas for pure winding modes ($n = 0$, $w \le N$) their number grows with the decreasing R as $N \sim (m_P\alpha'/R)^{2/3}$. Thus, the effect of indefinitely growing B is possible for both scenarios, with expanding or contracting extra dimensions. In the first case this is the growing tower of superhorizon KK modes which make the horizon scale $H_+ = m_P\sqrt{2/B} \sim m_P/(m_P R)^{1/3}$ indefinitely decreasing with $R \to \infty$. In the second case this is the tower of superhorizon winding modes which make this acceleration scale decrease with the decreasing R as $H_+ \sim m_P (R/m_P\alpha')^{1/3}$. This effect is flexible enough to accommodate the present day acceleration scale $H_0 \sim 10^{-60} m_P$ (though at the price of fine-tuning an enormous coefficient of expansion or contraction of R). This gives a new dark energy mechanism driven by the conformal anomaly and transcending the inflationary and matter-domination stages starting with the state of the microcanonical distribution.

To summarize, within a minimum set of assumptions (the equipartition in the physical phase space (11)), we seem to have a mechanism for constraining the landscape of string vacua and to obtain the full evolution of the Universe as a quasi-equilibrium decay of its initial microcanonical state. Thus, contrary to the anticipations of Sidney Coleman, “there is Nothing rather than Something” [3], one can say that Something (rather than Nothing) comes from Everything.
The author thanks O. Andreev, C. Deffayet, A. Kamenshchik, J. Khoury, H. Tye, A. Tseytlin, I. Tyutin and B. Voronov for thought provoking discussions and especially Andrei Linde, this work having appeared as an unintended response to his discontent with EQG initial conditions. The work was supported by the RFBR grant 05-02-17661, the grant LSS 4401.2006.2 and SFB 375 grant at the Ludwig-Maximilians University in Munich.

[1] J.B. Hartle and S.W. Hawking, Phys. Rev. D 28, 2960 (1983); S.W. Hawking, Nucl. Phys. B 239, 257 (1984).
[2] A.D. Linde, JETP 60, 211 (1984); A. Vilenkin, Phys. Rev. D 30, 509 (1984).
[3] S.R. Coleman, Nucl. Phys. B 310, 643 (1988).
[4] A.A. Starobinsky, in Field Theory, Quantum Gravity and Strings, 107 (eds. H. De Vega and N. Sanchez, Springer, 1986); A.D. Linde, Particle physics and inflationary cosmology (Harwood, Chur, Switzerland, 1990).
[5] G.W. Gibbons, S.W. Hawking and M. Perry, Nucl. Phys. B 138, 141 (1978).
[6] A. Vilenkin, Phys. Rev. D 58, 067301 (1998), gr-qc/9804051; gr-qc/9812027.
[7] A.O. Barvinsky and A.Yu. Kamenshchik, J. Cosmol. Astropart. Phys. 09, 014 (2006), hep-th/0605132.
[8] A.O. Barvinsky and A.Yu. Kamenshchik, Phys. Rev. D 74, 121502 (2006), hep-th/0611206.
[9] H. Firouzjahi et al, JHEP 0409, 060 (2004); S. Sarangi and S.-H.H. Tye, hep-th/0505104; R. Brustein and S.P. de Alwis, Phys. Rev. D 73, 046009 (2006).
[10] L.D. Faddeev, Theor. Math. Phys. 1, 1 (1970).
[11] A.O. Barvinsky, Phys. Rep. 230, 237 (1993); Nucl. Phys. B 520, 533 (1998).
[12] A.O. Barvinsky and V. Krykhtin, Class. Quantum Grav. 10, 1957 (1993); A.O. Barvinsky, Geometry of the Dirac and reduced phase space quantization of constrained systems, gr-qc/9612003; M. Henneaux and C. Teitelboim, Quantization of Gauge Systems (Princeton University Press, Princeton, 1992).
[13] A.O. Barvinsky and A.Yu. Kamenshchik, in preparation.
[14] J.B. Hartle and K. Schleich, in Quantum field theory and quantum statistics, 67 (eds. I. Batalin et al, Hilger, Bristol, 1988); K. Schleich, Phys. Rev. D 36, 2342 (1987).
[15] D. Brown and J.W. York, Jr., Phys. Rev. D 47, 1420 (1993), gr-qc/9209014.
[16] A.O. Barvinsky and A.Yu. Kamenshchik, Cosmological landscape and Euclidean quantum gravity, to appear in J. Phys. A, hep-th/0701201.
[17] J. Polchinski, String Theory (Cambridge University Press, Cambridge, 1998).

ABSTRACT The path integral over Euclidean geometries for the recently suggested density matrix of the Universe is shown to describe a microcanonical ensemble in quantum cosmology. This ensemble corresponds to a uniform (weight one) distribution in phase space of true physical variables, but in terms of the observable spacetime geometry it is peaked about complex saddle-points of the {\em Lorentzian} path integral. They are represented by the recently obtained cosmological instantons limited to a bounded range of the cosmological constant. Inflationary cosmologies generated by these instantons at late stages of expansion undergo acceleration whose low-energy scale can be attained within the concept of dynamically evolving extra dimensions. Thus, together with the bounded range of the early cosmological constant, this cosmological ensemble suggests the mechanism of constraining the landscape of string vacua and, simultaneously, a possible solution to the dark energy problem in the form of the quasi-equilibrium decay of the microcanonical state of the Universe. <|endoftext|><|startoftext|> Introduction. 1.1. Description of the model. We give a quantitative analysis of clustering in a stochastic model of one-dimensional gas. At time zero, the gas consists of n point particles, each one of mass 1/n.
These particles are randomly distributed on the real line and have zero initial speeds. Particles begin to move under the forces of mutual attraction. When two or more particles collide, they stick together forming a new particle, called a cluster, whose mass and speed are defined by the laws of mass and momentum conservation. Between collisions, particles move according to the laws of Newtonian mechanics.

V. V. VYSOTSKY. Received March 2007; revised September 2007. Supported in part by the Grants NSh-4222.2006.1 and DFG-RFBR 436 RUS 113/773/0-1(R). AMS 2000 subject classifications. Primary 60K35, 82C22; secondary 60F17, 70F99. Key words and phrases. Sticky particles, particle systems, gravitating particles, number of clusters, aggregation, adhesion. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2008, Vol. 18, No. 3, 1026–1058. This reprint differs from the original in pagination and typographic detail. http://arxiv.org/abs/0704.0086v2 http://dx.doi.org/10.1214/07-AAP481

We suppose that the force of mutual attraction does not depend on distance and equals the product of masses. This assumption is natural for one-dimensional models because, by the Gauss law applied to the flux of the gravitational field, gravitation is proportional to the distance to the power one minus the dimension of the space. At any moment, the acceleration of a particle is thus equal to the difference of the masses located to the right and to the left of the particle.

Random initial positions of particles are usually described (see [8, 16, 25]) by the following natural models: in the uniform model, n particles are independently and uniformly spread on [0,1]; in the Poisson model, particles are located at points $\frac1n S_1, \frac1n S_2, \dots, \frac1n S_n$, where $S_i$ is a standard exponential random walk. In other words, particles are located at the points of the first n jumps of a Poisson process with intensity n.

These two models are the most natural and interesting; let us call them the main models of initial positions. However, we will see that the behavior of the Poisson model is essentially defined by the independence of the initial distances between particles rather than by the particular type of the distances’ distribution. Therefore, it is of great mathematical interest to generalize the Poisson model by introducing the i.d. model, where “i.d.” stands for “independent distances,” as follows. Particles are initially located at $\frac1n S_1, \frac1n S_2, \dots, \frac1n S_n$, where $S_i$ is a positive random walk whose nonnegative i.i.d. increments $X_i$ satisfy the normalization condition $EX_i = 1$.

Note that if we proceed to the limit as n→∞, we consider a system of total mass one, which consists of, roughly speaking, infinitesimal particles homogeneously spread on [0,1]; this is true for all the mentioned models of initial positions.

The mathematical interest in sticky particles systems arises mainly from relations between these systems and some nonlinear partial differential equations originating from fluid mechanics, for example, the Burgers equation. These equations admit an interpretation in terms of sticky particles; see Gurbatov et al. [10], Brenier and Grenier [4] or E, Rykov and Sinai [6]. Sticky particles models are also used for the numerical solution of other partial differential equations; see Chertock et al. [5] for explanations and further references.

As time goes on, particles aggregate in clusters. Clusters become larger and larger while the number of clusters decreases until they merge into a single cluster containing all initial particles. This process of mass aggregation is strongly connected with additive coalescence; see Bertoin [2] and Giraud [9] for the most recent results and references.
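The models of initial positions described above can be sampled in a few lines. The following is a minimal Python sketch, not taken from the paper: the function names and the `increment` parameter are illustrative assumptions, and the Poisson model is obtained as the special case of the i.d. model with standard exponential increments.

```python
import random


def uniform_positions(n, rng):
    """Uniform model: n points drawn independently and uniformly on [0, 1], sorted."""
    return sorted(rng.random() for _ in range(n))


def id_positions(n, rng, increment=None):
    """i.d. model: particle i sits at S_i / n, where S_i is a random walk with
    nonnegative i.i.d. increments of mean 1.

    `increment` is a hypothetical hook: a zero-argument callable returning one
    increment X_i. By default it draws standard exponential increments, which
    gives the Poisson model.
    """
    if increment is None:
        increment = lambda: rng.expovariate(1.0)
    positions, s = [], 0.0
    for _ in range(n):
        s += increment()          # S_i = X_1 + ... + X_i
        positions.append(s / n)   # rescale so that spacings have mean 1/n
    return positions
```

Passing `increment=lambda: 1.0` reproduces the deterministic lattice i/n, which is a convenient sanity check for any simulation built on top of these samplers.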
The aggregation process resembles the formation of a star from dispersed space dust, and sticky particles models indeed have relations to astrophysics. It is appropriate to clarify these relations since they are not so direct and cause a lot of misunderstanding.

CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS

It is known that the distribution of galaxies in the universe is very inhomogeneous and the regions of high density form a peculiar cellular structure. The first attempt to understand the formation of such structures was made in 1970 by Zeldovich. Most of the mass in the universe is believed to exist in the form of particles that practically do not collide with each other and interact only gravitationally, for example, neutrinos. In his model, Zeldovich considered an initially homogeneous collisionless medium of particles moving by pure inertia; the gravitational interaction was taken away by an appropriate time change. He showed that singularities, that is, thin regions of very high density of particles, so called “pancakes,” appear even if the initial speeds of particles form a smooth velocity field.

Zeldovich’s approximate model, however, does not explain the formation of the cellular structure of matter. His approximation does not take into account that particles hitting a “pancake” are hampered by its strong gravitational field and start oscillating inside the “pancake” instead of flying away. Although this gravitational adhesion of collisionless particles is not precisely the same as real sticking, the model of sticky particles serves as a reasonable approximation. The effect of gravitational adhesion was then analyzed by the use of the Burgers equation; Gurbatov, Saichev and Shandarin proposed it in 1984 to extend Zeldovich’s approximation, which is invalid after the formation of “pancakes.” The model of sticky particles is directly mentioned in Gurbatov et al.
[11]; a comprehensive survey of the formation of the Universe’s large-scale structure can be found in Shandarin and Zeldovich [23].

1.2. Statement of the problem and the results. In general, the problem is to describe the process of mass aggregation. How fast is it? How large are the clusters? Where do clusters appear most intensively, and so forth? Numerous papers on the model (e.g., [8, 14, 16, 20, 25]) are dedicated to the probabilistic description of various properties of the aggregation process as the number of initial particles n tends to infinity. Thus, the behavior of a typical system consisting of a large number of particles is studied.

In this paper, we are interested in the asymptotic behavior of $K_n(t)$, which denotes the number of clusters at time t in the system with n initial particles. This variable is a decreasing random step function satisfying $K_n(0) = n$ and $K_n(t) = 1$ for $t \ge T_n^{\rm last}$, where $T_n^{\rm last}$ denotes the moment of the last collision. While calculating $K_n(t)$, we also count initial particles that have not experienced any collisions; in other words, $K_n(t)$ is the total number of particles existing at time t. It is very important to know the behavior of $K_n(t)$: it gives us a deep understanding of the aggregation process since the average size of a cluster at time t is $n/K_n(t)$.

At first we give a short deterministic example. Suppose that particles are located at points $\frac1n, \frac2n, \dots, \frac nn$, that is, $S_i = i$. By simple calculations, we find that there would not be any collisions before t = 1. At the moment t = 1, all particles simultaneously stick together, hence $K_n(t) = n$ for $0 \le t < 1$ and $K_n(t) = 1$ for $t \ge 1$.

However, when the initial positions are random, the aggregation process behaves entirely differently. In [25], the author proved the following statement.

Fact 1.
There exists a deterministic function a(t) such that both in the Poisson and the uniform models of initial positions, for any $t \ge 0$, we have

$$\frac{K_n(t)}{n} \xrightarrow{P} a(t), \qquad n \to \infty. \qquad (1)$$

The function a(t) is continuous, a(0) = 1, and a(t) = 0 for $t \ge 1$. We conjecture, on the basis of numerical simulations, that $a(t) = 1 - t^2$ for $0 \le t \le 1$.

The relation a(t) = 0 for t > 1 is not a surprise because we know from Giraud [8] that both in the Poisson and the uniform models, $T_n^{\rm last} \xrightarrow{P} 1$ (the limit constant is so “fine” due to the proper scaling of the model). Therefore, we say that the moment t = 1 is critical; note that this moment coincides with the moment of the total collision in the deterministic model.

The aim of this paper is to strengthen the result of [25]. We first generalize Fact 1 and prove it for the i.d. model. We will see [relations (19) and (27) below] that a(t) is equal to the probability of a certain event that is expressed in terms of $X_i$. Also, we will prove that a(t) depends on the common distribution of $X_i$ as follows: a(t) = 1 on $[0, \sqrt{\mu})$, where $\mu := \sup\{y : P\{X_i < y\} = 0\}$; $a(t) \in (0,1)$ on $(\sqrt{\mu}, 1)$; and a(t) = 0 on $(1, \infty)$.

Furthermore, the recent results of the author [26] allow us to prove the conjecture from Fact 1 that $a^{\rm Poiss}(t) = a^{\rm Unif}(t) = 1 - t^2$ for $0 \le t \le 1$. There is an amazing contrast between the simplicity of this formula and the hard calculations one needs to obtain it. It is remarkable that now we know the limit function a(t) for the main models of initial positions.

Our main goal is to improve (1) by finding the next term in the asymptotics of $K_n(t)$. The result is the following statement, where the standard symbol $\xrightarrow{D}$ denotes weak convergence and D denotes the Skorohod space.

Theorem 1. In the i.d. model with continuous $X_i$ satisfying $EX_i^\gamma < \infty$ for some γ > 4, there exists a centered Gaussian process K(·) on [0,1) such that

$$\frac{K_n(\cdot) - n\,a(\cdot)}{\sqrt n} \xrightarrow{D} K(\cdot) \quad \text{in } D[0, 1-\varepsilon] \text{ for all } \varepsilon \in (0,1) \qquad (2)$$

as n → ∞.
The process K(·) depends on the distribution of $X_i$. This process satisfies K(0) = 0 and has a.s. continuous trajectories. The covariance function R(s, t) of K(·) is continuous on $[0,1)^2$, R(s, t) > 0 on $(\sqrt\mu, 1)^2$, and R(s, t) = 0 on $[0,1)^2 \setminus (\sqrt\mu, 1)^2$.

In the uniform model, (2) holds for some centered Gaussian process $K^{\rm Unif}(\cdot)$ on [0,1). This process satisfies $K^{\rm Unif}(0) = 0$ and has a.s. continuous trajectories. The covariance function $R^{\rm Unif}(s,t)$ of $K^{\rm Unif}(\cdot)$ is continuous on $[0,1)^2$, and $R^{\rm Unif}(s,t) = R^{\rm Poiss}(s,t) - s^2t^2$.

Thus, the Poisson and the uniform models lead to different limit processes $K^{\rm Poiss}(\cdot)$ and $K^{\rm Unif}(\cdot)$, although $a^{\rm Poiss}(\cdot) = a^{\rm Unif}(\cdot)$.

As an immediate corollary of Theorem 1 (see Billingsley [3], Section 15), we get

$$\frac{K_n(t) - n\,a(t)}{\sqrt n} \xrightarrow{D} N(0, \sigma^2(t)), \qquad n \to \infty \qquad (3)$$

for any t < 1, where $\sigma^2(t) := R(t,t)$. It is possible to show that in the i.d. model, (3) holds for all $t \neq 1$ under the less restrictive condition $EX_i^2 < \infty$, with $\sigma^2(t) = 0$ for t > 1; continuity of $X_i$ is not required.

We also study convergence of the left-hand side of (3) at the critical moment t = 1. Apparently, the limit is not Gaussian, but this complicated problem is related to a curious, but hardly provable, conjecture on integrated random walks. In view of this non-Gaussianity, it seems impossible to prove any extended version of Theorem 1 that describes the weak convergence of trajectories on the whole interval [0,1]; we refer to Section 7 for further discussion.

We finish this subsection with a note on scaling. In our model, the masses of particles are equal to 1/n and the distances between them are of the order 1/n. Let us rescale the i.d. model by multiplying all masses and distances by n: the system of particles of mass one each, initially located at points $S_1 - S_{[n/2]}, S_2 - S_{[n/2]}, \dots, S_n - S_{[n/2]}$, is called the expanding model. The particles are shifted by $S_{[n/2]}$ because we want the system to expand “filling” the whole line as n→∞ rather than only the positive half-line.
All results of our paper hold true for the expanding model. This is not unexpected because the shift does not produce any changes and the rescaling of masses is equivalent to the time contraction by n times while the rescaling of distances is equivalent to the time expansion by n times. We refer the reader to Section 2 below or to Lifshits and Shi [16] for rigorous arguments.

1.3. Organization of the paper. In Section 2 we describe a general method which is used to study systems of sticky particles. This method is applied for studying the i.d. model in Section 3, where we investigate some properties of the aggregation process. We will show that the aggregation process is highly local, that is, the behavior of a particle is essentially defined by the motion of neighbor particles. This localization property suggests that we could use limit theorems for weakly dependent variables to prove both Fact 1 and Theorem 1 for the i.d. model; this will be done in Section 4. Then we will prove Theorem 1 for the uniform model in Section 5. In Section 6 we study the number of clusters at the critical moment t = 1. Some open questions are discussed in Section 7.

2. Method of barycenters. In this section we briefly describe the method of barycenters, which is the main tool used to study systems of sticky particles; it is also applicable to more general models where particles could have nonzero initial speeds and different masses. The method of barycenters was independently introduced by E, Rykov and Sinai [6] and Martin and Piasecki [20].

Let us start with several definitions. We always number particles from left to right and identify particles with their numbers. A block of particles is a nonempty set $J \subset [1, n]$ consisting of consecutive numbers. For example, the block (i, i+k] consists of particles i+1, . . . , i+k.
Note that there is no relation between blocks and clusters: for example, a block’s particles could be contained in different clusters and these clusters could even contain particles that do not belong to the block.

It is convenient to assume that initial particles do not vanish at collisions but continue to exist in created clusters. Then the coordinate $x_{i,n}(t)$ of a particle i could be defined as the coordinate of the cluster that contains the particle at time t. The second subscript n always indicates the number of initial particles; we will omit this subscript as often as possible.

By $x_J(t) := |J|^{-1}\sum_{i\in J} x_i(t)$ denote the position of the barycenter of a block J at time t. Further, define

$$x^*_J(t) := x_J(0) + \frac{M^{(R)}_J - M^{(L)}_J}{2}\,t^2,$$

where $M^{(R)}_J := n^{-1}(n - \max_{j\in J} j)$ and $M^{(L)}_J := n^{-1}(\min_{j\in J} j - 1)$ are the total masses of particles located to the right and to the left of the block J, respectively.

A block is free from the right up to time t if, up to this time, the block’s particles did not collide with particles initially located to the right of the block. We similarly define blocks that are free from the left and say that a block is free up to time t if it is both free from the right and from the left.

The next statement plays the key role in the analysis of sticky particles systems. The barycenter of a free block moves as an imaginary particle consisting of all particles of the block put together at the initial barycenter. In a more precise and general way, we state the following.

Proposition 1. If a block J is free from the right (resp. left) up to time t, then $x_J(s) \ge x^*_J(s)$ for $s \in [0,t]$ [resp. $x_J(s) \le x^*_J(s)$]. If a block J is free up to time t, then $x_J(s) = x^*_J(s)$ for $s \in [0,t]$.

This statement could be found, for example, in Lifshits and Shi [16], Proposition 4.1. The easy proof is based on the property of conservation of momentum.
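As a rough numerical illustration of how free barycenters can be used, the Python sketch below computes the number of clusters $K_n(t)$ for given initial positions. It is a reading under stated assumptions rather than the paper's own algorithm: it takes particles of mass 1/n with zero initial speeds, uses $x^*_J(s) = x_J(0) + (M^{(R)}_J - M^{(L)}_J)s^2/2$, and assumes (following the merging-time characterization of Lifshits and Shi [16]) that neighbours j and j+1 merge at the earliest time at which the free barycenters of some adjacent blocks (l, j] and (j, k] meet, which gives $s^2 = 2n\,(x_{(j,k]}(0) - x_{(l,j]}(0))/(k-l)$. All function names are hypothetical.

```python
def merging_times_squared(positions):
    """Return [T_1^2, ..., T_{n-1}^2] for n particles of mass 1/n that start
    at rest at the given sorted positions (assumed model, see lead-in)."""
    n = len(positions)
    prefix = [0.0]
    for x in positions:
        prefix.append(prefix[-1] + x)

    def barycenter(lo, hi):
        # Mean initial position of the block (lo, hi], i.e. particles lo+1..hi.
        return (prefix[hi] - prefix[lo]) / (hi - lo)

    result = []
    for j in range(1, n):
        # Free barycenters of (l, j] and (j, k] approach each other with
        # relative acceleration -(k - l)/n, so they meet when
        # s^2 = 2 n (x_(j,k](0) - x_(l,j](0)) / (k - l).
        best = min(
            2.0 * n * (barycenter(j, k) - barycenter(l, j)) / (k - l)
            for l in range(0, j)
            for k in range(j + 1, n + 1)
        )
        result.append(best)
    return result


def cluster_count(positions, t):
    """K_n(t): one plus the number of neighbouring pairs not yet merged at time t."""
    return 1 + sum(1 for t2 in merging_times_squared(positions) if t * t < t2)
```

For the lattice i/n every squared merging time equals 1, so `cluster_count` returns n before t = 1 and 1 afterwards, in agreement with the deterministic example. The double minimum costs on the order of n^3 operations, so the sketch is meant for small n only.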
The moment when a particle j sticks with its right-hand side neighbor j + 1 is called the merging time Tj,n of the particle j. In other words, Tj,n is the first moment when particles j and j + 1 are contained in a common cluster; here j ∈ [1, n− 1]. Proposition 4.3 from Lifshits and Shi [16], which is stated below, gives us a way to calculate Tj,n. Proposition 2. For every j ∈ [1, n− 1], we have Tj,n = min j j, the block [j + 1, k] is free from the left. By Proposition 1, x∗(l,j](u)≤ x(l,j](u)≤ xj(u) j such that the blocks (l, j] and (j, k] are free up to time Tj,n (clusters containing particles from these blocks collide exactly at time Tj,n). In view of Proposition 1, x∗(l,j](Tj,n) = x(l,j](Tj,n) = x(j,k](Tj,n) = x (j,k](Tj,n); hence Tj,n = {s≥ 0 :x∗(j,k](s) = x (l,j] (s)} and Tj,n ≥min{· · ·}. � 3. Study of the i.d. model. The localization property. At first, note that Kn(t) = 1+ 1{tM : 1 (Si − it2)M : 1 (Si − it2) (Si − it2) 1M i=1 (Si − it2), we conclude that the considered event implies ∃k >M : 1 (Si − it2)< 0 ∃k >M : (Si − it2)< 0 Clearly, the latter implies {∃i≥M :Si − it2 < 0}= hence, combining all the estimates together, we get P{1{t≤Tj} 6= 1{t≤T (M) }} ≤ 2P .(13) Note that we obtained (13) without any assumptions on the moments of Xi. We now estimate the right-hand side of (13); recall that EXi = 1. Then the first part of (12) immediately follows from the classical result of Baum and Katz [1] (see their Theorem 3 and Lemma): 12 V. V. VYSOTSKY Fact 2. If EXi = a and E|Xi|γ <∞ for some γ ≥ 1, then = o(k1−γ), k→∞ for any ε > 0. In addition, the series k=1P{supi≥k |Sii − a|> ε} converges for all ε > 0 if γ = 2. The estimation of the second probability in the left-hand side of (12) is completely analogous, since {Tj 6= T (M)j , T j ≤ t} = {Tj < T (M)j ≤ t} 1≤k,l Fk,j,l(T j )< 0, min 1≤k,l≤M Fk,j,l(T j ) = 0, T j ≤ t We put j :=−1, repeat the estimates, and get P{Tj 6= T (M)j , T j ≤ t} ≤ 2P{∃i≥M :Si − i[T < 0, T −1 ≤ t} instead of (13). 
The right-hand side does not exceed 2P{∃i≥M : Si − it2 < 0}, hence P{Tj 6= T (M)j , T j ≤ t} ≤ 2P .(14) 3.3. The distribution function of T0 in the Poisson model. It is amazing that in the Poisson model, the distribution function of T0 could be found explicitly. This is important because by (27) below, the limit function a(t) equals P{T0 > t} for the i.d. model. Also, in the proof of Theorem 1 for the uniform model, we will need aPoiss(t) = P{TPoiss0 ≥ t} to be twice differen- tiable and have a continuous second derivative. Lemma 2. In the Poisson model, for 0≤ t≤ 1, we have P{T0 ≥ t}= 1− t2.(15) In addition, for t≥ 0, n≥ 2, and 1≤ j ≤ n− 1, we have P{Tj,n ≥ t}= et 1≤k≤j (Si − it2)≥ 0 1≤k≤n−j (Si − it2)≥ 0 where Si is a standard exponential random walk. CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 13 Proof. We start with (16). By (8), (9) and properties of Fk,j,l(·), P{Tj,n ≥ t}= P 1≤k≤n−j 1≤l≤j Fk,j,l(t)≥ 0 1≤k≤n−j (k− i)(Xj+i+1 − t2)(17) + min 1≤l≤j (l− i)(Xj−i+1 − t2) +Xj+1 − t2 ≥ 0 In the right-hand side of the last equality, by Y denote the first minimum and by Ỹ denote the second one. Suppose X is a standard exponential r.v., Z is a nonnegative r.v., and that X and Z are independent; then P{Z ≤X}= P{Z ≤ x}e−x dx E1{Z≤x}e −x dx= E 1{Z≤x}e −x dx= Ee−Z . Hence in view of independence of Y , Ỹ , Xj+1 we get P{Y + Ỹ +Xj+1 − t2 ≥ 0}= EeY+Ỹ−t EeY−t EeỸ−t and therefore, P{Tj,n ≥ t}= et P{Y +Xj+1 − t2 ≥ 0} · P{Ỹ +Xj+1 − t2 ≥ 0}. Now, by P{Ỹ +Xj+1 − t2 ≥ 0} 1≤l≤j (l− i)(Xj−i+1 − t2) +Xj+1 − t2 ≥ 0 1≤l≤j (l− i)(Xi+1 − t2) + l(X1 − t2) 1≤l≤j (l− i+1)(Xi − t2)≥ 0 we conclude the proof of (16). Indeed, the expression in the last line equals the first probability in the right-hand side of (16). 14 V. V. VYSOTSKY Now let us prove (15). From the definition of T0 and T 0 we see that 1{t≤T (k) } → 1{t≤T0} a.s. as k→∞; then by (16), P{T0 ≥ t}= et (Si − it2)≥ 0 Then we need to check that (Si − it)≥ 0 1− te−t/2 for 0 ≤ t ≤ 1. 
The complicated calculations of this probability take more then ten pages. Therefore, they were separated into independent paper [26]. Although these calculations seem to be technical, they are based on quite original ideas. � 3.4. Some properties of the variables Ti. In this subsection we prove several important properties of the r.v.’s Ti. 1. The sequence Ti is stationary. Proof. This statement immediately follows from the definition of Ti and stationarity of Xi, which are i.i.d. 2. The common distribution function of Ti is defined by P{Ti ≥ t}= P (k− i)(Xi − t2) + inf (l− i)(X−i − t2) + (X0 − t2)≥ 0 Proof. This formula follows from (9). 3. We have P{√µ ≤ Ti ≤ 1} = 1 while sup{y :P{Ti < y} = 0} = µ and inf{y :P{Ti < y} = 1} = 1; recall that µ = sup{y :P{Xi < y} = 0}. In addi- tion, if 0 1. If t= 1 and 0 1, then for any s, t ∈ [0,1) and k ∈ N, we have |cov(1{s≤T0},1{t≤Tk})| ≤ 2 γ(ρ(s) + ρ(t))k1−γ .(21) Proof. The idea is to approximate 1{s≤T0} and 1{t≤Tk} by 1{s≤T (k/2)0 } 1{t≤T (k/2) }, respectively; here by k/2 we mean ⌈k/2⌉, where ⌈x⌉=min{m ∈ Z :m≥ x}. Note that 1{s≤T (k/2)0 } and 1{t≤T (k/2) } are independent because the first is a function of {Xi}i≤k/2 while the second is a function of {Xi}i≥k/2+1. We then have |cov(1{s≤T0},1{t≤Tk})| = |cov(1{s≤T0},1{t≤Tk})− cov(1{s≤T (k/2) },1{t≤T (k/2) ≤ |E(1{s≤T0}1{t≤Tk} − 1{s≤T (k/2) }1{t≤T (k/2) + |E(1{s≤T0} − 1{s≤T (k/2) })|+ |E(1{t≤Tk} − 1{t≤T (k/2) })|(22) = P{1{s≤T0}1{t≤Tk} 6= 1{s≤T (k/2) }1{t≤T (k/2) + P{1{s≤T0} 6= 1{s≤T (k/2)0 }}+ P{1{t≤Tk} 6= 1{t≤T (k/2)k } P{1{s≤T0}1{t≤Tk} 6= 1{s≤T (k/2) }1{t≤T (k/2) ≤ P{1{s≤T0} 6= 1{s≤T (k/2) } ∪ 1{t≤Tk} 6= 1{t≤T (k/2) therefore the result follows from Lemma 1. 6. The r.v.’s {Ti}i∈Z, {T (k)i }i∈Z, and {Ti,n} i=1 are associated ; the author owes this observation to M. A. Lifshits. CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 17 Proof. Let us first recall the definition and some basic properties of as- sociated variables. R.v.’s ξ1, . . . 
, ξm are associated if for any coordinate-wise nondecreasing functions f, g :Rm →R, it is true that cov(f(ξ1, . . . , ξm), g(ξ1, . . . , ξm))≥ 0 (assuming that the left-hand side is well defined). An infinite set of r.v.’s is associated if any finite subset of its variables is associated. The following sufficient conditions of association are well known; see [7]. (a) Independent variables are associated. (b) Coordinate-wise nondecreasing functions (of finite number of argu- ments) of associated r.v.’s are associated. (c) If the variables ξ1,k, . . . , ξm,k are associated for every k and (ξ1,k, . . . , ξm,k) D−→ (ξ1, . . . , ξm) as k→∞, then ξ1, . . . , ξm are associated. (d) If two sets of associated variables are independent, then the union of these sets is also associated. Then {Ti,n}n−1i=1 are associated for every n by (a), (b) and (20). Analo- gously, {T (k)i }i∈Z are associated for every k. Finally, since T i → Ti a.s. as k→∞ for every i, (c) ensures the association of {Ti}i∈Z. 7. For any s, t ∈R and k ∈ Z, cov(1{T0≤s},1{Tk≤t})≥ 0.(23) Proof. This inequality follows from cov(1{T0≤s},1{Tk≤t}) = cov(1{s 2. For γ = 2, we get α(k)≤ 16 i=k/2 P{inf i≥M Sii < t 2}= o(1) using the same argument and applying (13), (14), and Fact 2 instead of Lemma 1. 3.5. The last collision. We finish this section with a statement on the convergence of the moments of the last collision. Proposition 3. In the i.d. model, T lastn P−→ 1 as n→∞ if EX2i <∞. This result is well known for the Poisson model; see Giraud [8]. Proof of Proposition 3. Let us first prove that P{T lastn ≥ t}→ 0 as n→∞ for all t > 1. 
Since T lastn =max1≤j≤n−1Tj,n, we have P{T lastn ≥ t} ≤ P{Tj,n ≥ t}.(25) CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 19 By taking into account that the minima in (17) are nonpositive and by arguing as in (18), P{Tj,n ≥ t} ≤ P 1≤k≤j∨n−j (k− i)(Xj+i+1 − t2) +Xj+1 − t2 ≥ 0 1≤k≤j∨n−j (k− i+1)(Xi − t2)≥ 0 1≤k≤n/2 (Si − it2)≥ 0 We claim that (without any assumptions on the moments of Xi) P{Tj,n ≥ t} ≤ P i≥(t−1)/4tn 1 + t2 ;(26) recall that t > 1. Clearly, (26) follows if we check that 1≤k≤n/2 (Si − it2)≥ 0 i≥(t−1)/4tn 1 + t2 Assume the converse; then, by the nonnegativity of Si, (Si − it2) = (Si − it2) + i=cn+1 (Si − it2) (Scn − it2) + i=cn+1 1 + t2 − it2 where c := t−1 . We estimate the last expression with cnScn − (cn)2 t2 − (n/2) 2 − (cn)2 2 − 1 n2 − 1/4− c 2 − 1 It is simple to check that the right-hand side is negative, thus we have a contradiction. Then from (25), (26) and Fact 2 it follows that P{T lastn ≥ t} = i=1 o((cn) −1) = o(1) for all t > 1. Now let us prove that P{T lastn < t} → 0 as n → ∞ for all t < 1. Since T lastn =max1≤j≤n−1Tj,n, we estimate P{T lastn < t} ≤ P n,n < t P{1{t≤Tj√n,n} 6= 1{t≤T ( 20 V. V. VYSOTSKY In view of (10) and Lemma 1, the sum is j=1 o(n −1/2) = o(1), hence it remains to check that the first probability in the last line tends to zero. For a fixed n, all T are independent because each one is a function of {Xi}|j√n−i|≤√n/2 (to be precise, of Xj√n−√n/2+2, . . . ,Xj√n+√n/2). Thus, n−1{T ( n/2)√ < t} ≤ P n−1{T0 < t}, which tends to zero; indeed, P{T0 < t}< 1 by Property 3, Section 3.4. � 4. Proofs of Fact 1 and Theorem 1 for the i.d. model. Recall that the number of clusters Kn(t) is given by (6). Our idea is to study i=1 1{t t}.(27) Let us first prove (1) for all t < 1. It is sufficient to check that Kn(t) 1{t 1. Using (26) gives E Kn(t) i=1 P{Ti,n > t} → 0 as n→∞ and Kn(t) P−→ a(t) = 0 follows from the Chebyshev inequality. It remains to check that (1) holds for t = 1 if EX2i < ∞ to conclude the proof. 
If DXi = 0, then the situation is deterministic, this case was described in Introduction. Here we always have Kn(1) = 1 and (1) is true. If 0 t} is continuous at t= 1. Then (1) is true for t = 1 since 0 < Kn(1) ≤ Kn(t) P−→ a(t) for any t ∈ (0,1) and a(t)→ a(1) = 0 as tր 1. � Now we prove Theorem 1 for the i.d. model. We think of D[0,1] as of a separable metric space equipped with the Skorohod metric d, which induces the Skorohod topology. Proof of Theorem 1. At first, we prove (2). In view of representation (6) for Kn(t), relation (2) follows from the relation 0≤t≤1−ε 1{t 0. Then the empirical processes of ξi weakly con- verge in D[0,1] to a centered Gaussian process with the covariance function i∈Z cov(1{ξ0≤s},1{ξi≤t}). Remark. The limit Gaussian process is a.s. continuous on [0,1]. Fact 4 also holds true if F is an arbitrary continuous distribution function. The a.s. continuity of the limit process could be concluded by a compar- ison of the proof from Lin and Lu [17] with the proof of Theorem 22.1 from Billingsley [3]. The statements and the proofs of these theorems are identical, CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 23 but Lin and Lu do not state the continuity while Billingsley does. Further, since F (ξi) is uniformly distributed on [0,1] if F is continuous, Fact 4 holds true for every continuous F ; see the proof of Theorem 22.1 by Billingsley [3] for explanations. Recall that we need to prove the convergence of the empirical process of Ti. It seems that the r.v.’s Ti are not strongly mixing; but min{Ti,1− ε} are strongly mixing because of Property 8, Section 3.4. These variables are not continuous and so we need to fix them. Let us fix an ε ∈ (0,1), and let αi be i.i.d. r.v.’s independent of all Ti and, say, uniformly distributed on [0, ε]; we define T̃i := min{Ti,1− ε}+ 1{Ti≥1−ε}αi. 
The stationary variables T̃i are distributed on [0,1], their common dis- tribution function G is continuous, and the coefficients of strong mixing of G(T̃i) decrease as o(k 2−γ). The proof of the last statement is the same as the proof of Property 8 from Section 3.4. Indeed, approximate the vari- ables G(T̃0),G(T̃−1), . . . from the “past” by G(T̃ (k/2) 0 ),G(T̃ (k/2+1) −1 ), . . . where i := min{T i ,1− ε}+ 1{T (m) ≥1−ε}αi; use the analogous approximation for the variables from the “future”; and then repeat word for word the ar- guments of the previous proof. Now, recalling that γ > 4, we see that T̃i satisfy the assumptions of Fact 4, with the only difference that their distribution is not uniform. By Ũn(·) denote the empirical process of T̃i; clearly, Ũn(·) coincides with the empirical process Un(·) of Ti on [0,1− ε]. By the remark to Fact 4, we conclude that first, Ũn(·) D−→ K̃(·) in D[0,1],(33) where K̃(·) is a centered Gaussian process with the covariance function R̃(s, t) := cov(1{T̃0≤s},1{T̃i≤t}) and, second, trajectories of K̃(·) are a.s. continuous on [0,1]. [There exists a simpler and more elegant proof of (33). Note that {T̃i}i∈Z are associated as coordinate-wise nondecreasing functions of associated r.v.’s {Ti, αi}i∈Z, see (a), (b) and (d) from Property 6, Section 3.4. Then we can obtain (33) applying the result of Louhichi [18] on convergence of empirical processes of stationary associated r.v.’s ξi instead of using Fact 4. This the- orem requires only cov(F (ξ0), F (ξk)) = O(k −(4+δ)), which could be proved analogously to Property 5, Section 3.4. Thus we avoid the complicated esti- mations of the strong mixing coefficients, and the proof of (33) is becomes much simpler. The only problem is that this proof requires γ > 5. We also note that the a.s. continuity of K̃(·) could be proved directly, without referring to the proof of Fact 4. The arguments should be the same as in the proof of the continuity of KUnif(·) in Section 5.] 24 V. V. 
VYSOTSKY Define R(s, t) := cov(1{T0≤s},1{Ti≤t}),(34) which is, evidently, equal to R̃(s, t) on [0,1 − ε]2. Since R̃(s, t) is positive definite and ε > 0 is arbitrary, the function R(s, t) is positive definite on [0,1)2. Therefore, by Lifshits [15], Section 4, there exists a centered Gaussian process K(·) on [0,1) with the covariance function R(s, t). The trajectories of K(·) are a.s. continuous on [0,1) by K(·) D= K̃(·) on [0,1−ε], arbitrariness of ε > 0, and the a.s. continuity of K̃(·) on [0,1]. Finally, by (33), Ũn(·) = Un(·) on [0,1− ε], K̃(·) D=K(·) on [0,1− ε], and the a.s. continuity of K̃(·), we get (32). Since (32) implies (30), we conclude the proof of (2). Only the stated properties of R(s, t) remain to be proven. The continuity of the joint distribution function of continuous variables T0 and Ti implies that cov(1{T0≤s},1{Ti≤t}) is continuous on [0,1) 2 for every i ≥ 0. Then, in view of (21), R(s, t) is continuous on [0,1)2 as a sum of uniformly converging series of continuous functions. The strict positivity of R(s, t) on ( µ,1)2 trivially follows from (34), (23) and cov(1{T0≤s},1{T0≤t}) = a(s∨ t)(1−a(s∧ t)) > 0; the last inequality holds by Property 3, Section 3.4. The R(s, t) = 0 on [0,1)2 \ (√µ,1)2 follows from P{Ti ≤ µ}= 0, which holds by Properties 3 and 4 from Section 3.4. � We note that (3) holds for t 6= 1 under the less restrictive condition EX2i < ∞. For t < 1, the proof is almost the same: By (29), which is true for γ > 3/2, we conclude that (3) holds if the stationary associated sequence 1{t 1, relation (3) holds true with σ2(t) = 0 because of Proposition 3. Finally, note that the process K(·) is associated, that is, the r.v.’s {K(t)}t∈[0,1) are associated. In fact, by (6), Property 6 from Section 3.4, and Condition (b) from the same Property 6, the processes Kn(·)−na(·)√ associated for every n. Then K(·) is associated by (2) and (c), Property 6. 5. Proof of Theorem 1 for the uniform model. 
There exists a simple method that allows one to extend results from the Poisson model to the uniform model and vice versa. The method is based on the next statement (see Karlin [13], Section 9.1).

Fact 5. Let $S_i$ be an exponential random walk. Then for any $k \ge 1$, we have

$$\left(\frac{S_1}{S_{k+1}}, \dots, \frac{S_k}{S_{k+1}}\right) \stackrel{D}{=} (U_{1,k}, U_{2,k}, \dots, U_{k,k}), \tag{35}$$

where $U_{i,k}$ are the order statistics of $k$ i.i.d. r.v.’s uniformly distributed on $[0,1]$. Moreover, the random vector in the left-hand side of (35) is independent of $S_{k+1}$.

Therefore, if $x^{\mathrm{Poiss}}_{j,n}(0) = S_j$ are the initial positions of particles in the Poisson model, then for the initial positions of particles in the uniform model, we have $x^{\mathrm{Unif}}_{j,n}(0) = \frac{n}{S_{n+1}} \cdot x^{\mathrm{Poiss}}_{j,n}(0)$. By Proposition 2 and (5), we conclude

$$T^{\mathrm{Unif}}_{j,n} = \beta_n^{-1}\, T^{\mathrm{Poiss}}_{j,n}, \qquad \beta_n := \sqrt{\frac{S_{n+1}}{n}}, \tag{36}$$

and hence, using (6), we get

$$K^{\mathrm{Unif}}_n(t) = K^{\mathrm{Poiss}}_n(\beta_n t). \tag{37}$$

Note that the process $K^{\mathrm{Unif}}_n(\cdot)$ and the r.v. $\beta_n$ are independent since the values of the process are defined by $x^{\mathrm{Unif}}_{1,n}(0), \dots, x^{\mathrm{Unif}}_{n,n}(0)$, which are jointly independent of $\beta_n$ by Fact 5.

Now we prove Theorem 1 for the uniform model.

Proof of Theorem 1. Denote

$$Y_n(t) := \frac{K^{\mathrm{Unif}}_n(t) - na(t)}{\sqrt{n}}, \qquad Z_n(t) := \sqrt{n}\,\bigl(a(t) - a(\beta_n t)\bigr);$$

we stress that $Y_n(\cdot)$ and $Z_n(\cdot)$ are independent. Fix an $\varepsilon \in (0,1)$. First, it follows from (2) for the Poisson model and (37) that

$$Y_n(\cdot) + Z_n(\cdot) \xrightarrow{D} K^{\mathrm{Poiss}}(\cdot) \quad \text{in } D[0,1-\varepsilon]. \tag{38}$$

Indeed, the process $Y_n(\cdot) + Z_n(\cdot)$ is obtained from $\frac{1}{\sqrt{n}}(K^{\mathrm{Poiss}}_n(\cdot) - na(\cdot))$ by the random time change $t \mapsto \beta_n t$; and since $\|\beta_n t - t\|_{C[0,1-\varepsilon]} \xrightarrow{P} 0$, we have

$$d\left(Y_n(\cdot) + Z_n(\cdot),\ \frac{K^{\mathrm{Poiss}}_n(\cdot) - na(\cdot)}{\sqrt{n}}\right) \xrightarrow{P} 0$$

by the definition of the Skorohod metric $d$. Second, from Fact 1, (15), and (27) it follows that $a^{\mathrm{Unif}}(t) = a^{\mathrm{Poiss}}(t) = P\{T^{\mathrm{Poiss}}_0 \ge t\} = 1 - t^2$ for $0 \le t \le 1$, and by the central limit theorem,

$$Z_n(t) \xrightarrow{D} t^2 \eta \quad \text{in } D[0,1-\varepsilon], \tag{39}$$

where $\eta$ is a standard Gaussian r.v. We claim that (38), the independence of $Y_n(\cdot)$ and $Z_n(\cdot)$, and (39) yield the weak convergence of $Y_n(\cdot)$ in $D[0,1-\varepsilon]$.
Let us check the tightness of $Y_n(\cdot)$ and the convergence of their finite-dimensional distributions.

The tightness of $Y_n(\cdot)$ in $D[0,1-\varepsilon]$ follows from $Y_n(\cdot) = (Y_n(\cdot) + Z_n(\cdot)) - Z_n(\cdot)$, (38), and (39). Indeed, by the Prokhorov theorem, (38) and (39) yield that both sequences $Y_n(\cdot) + Z_n(\cdot)$ and $-Z_n(\cdot)$ are tight. But trajectories of $-Z_n(\cdot)$ are a.s. continuous because of the continuity of $a(\cdot)$, and the tightness follows from the continuity of addition $+ : D \times C \to D$ and the fact that under any continuous mapping, the image of a compact set is also a compact set.

Now we study the convergence of finite-dimensional distributions of $Y_n(\cdot)$. Recall that the characteristic function of a centered Gaussian vector in $\mathbb{R}^m$ is $e^{-\frac{1}{2}(Ru,u)}$, where $u \in \mathbb{R}^m$ and $R$ is the covariance matrix of the vector. Then (38), the independence of $Y_n(\cdot)$ and $Z_n(\cdot)$, and (39) yield that for the characteristic functions of all finite-dimensional distributions of $Y_n(\cdot)$, we have

$$E e^{i(Y_n(\mathbf{t}),u)} \longrightarrow \exp\left(-\tfrac{1}{2}\bigl(\{R^{\mathrm{Poiss}}(t_j,t_k) - t_j^2 t_k^2\}_{j,k=1}^m u, u\bigr)\right), \tag{40}$$

where $u \in \mathbb{R}^m$, $\mathbf{t} = (t_1, \dots, t_m) \in [0,1-\varepsilon]^m$, and $Y_n(\mathbf{t}) := (Y_n(t_1), \dots, Y_n(t_m))$. We stress that (40) is true for every $\mathbf{t} \in [0,1-\varepsilon]^m$ since the limit processes in (38) and (39) have continuous trajectories.

We see that the matrix $\{R^{\mathrm{Poiss}}(t_j,t_k) - t_j^2 t_k^2\}_{j,k=1}^m$ is positive definite for any $\mathbf{t} = (t_1, \dots, t_m) \in [0,1-\varepsilon]^m$ and $m \ge 1$ since the absolute value of the left-hand side of (40) does not exceed one. Putting $R^{\mathrm{Unif}}(s,t) := R^{\mathrm{Poiss}}(s,t) - s^2 t^2$, we have $\{R^{\mathrm{Poiss}}(t_j,t_k) - t_j^2 t_k^2\}_{j,k=1}^m = \{R^{\mathrm{Unif}}(t_j,t_k)\}_{j,k=1}^m$; then the function $R^{\mathrm{Unif}}(s,t)$ is positive definite on $[0,1)^2$ since $\varepsilon > 0$ is arbitrary. Thus, by Lifshits [15], Section 4, $R^{\mathrm{Unif}}(s,t)$ is the covariance function of some centered Gaussian process $K^{\mathrm{Unif}}(\cdot)$ on $[0,1)$. Relation (2) is thus proved.

Now we check that $K^{\mathrm{Unif}}(\cdot) \in C[0,1-\varepsilon]$ a.s. to conclude the proof of Theorem 1 for the uniform model. For this purpose, let us prove that a.s., trajectories of $Y_n(\cdot)$ have jumps of size $1/\sqrt{n}$ only.
In fact, the jumps of $Y_n(\cdot)$ coincide with the jumps of $K^{\mathrm{Unif}}_n(\cdot)/\sqrt{n}$, which are of size $1/\sqrt{n}$ if and only if $T^{\mathrm{Unif}}_{j_1,n} \ne T^{\mathrm{Unif}}_{j_2,n}$ for $1 \le j_1 \ne j_2 \le n-1$. By (36), we need to verify that $T^{\mathrm{Poiss}}_{j_1,n} \ne T^{\mathrm{Poiss}}_{j_2,n}$ for $1 \le j_1 \ne j_2 \le n-1$. This relation follows from (20) if $H(k_1,j_1,l_1) \ne H(k_2,j_2,l_2)$ a.s. for $j_1 \ne j_2$ and $k_1,k_2,l_1,l_2 \ge 1$. The last a.s. relation is obvious because if the equality held, then a certain nontrivial linear combination of i.i.d. exponential $X_i$ would equal zero.

Then there exist a.s. continuous $\tilde{Y}_n(\cdot)$ such that $\sup_{t \in [0,1-\varepsilon]} |\tilde{Y}_n(t) - Y_n(t)| \le 1/\sqrt{n}$ a.s.; consequently, $d(\tilde{Y}_n, Y_n) \le 1/\sqrt{n}$ a.s. Then by $Y_n(\cdot) \xrightarrow{D} K^{\mathrm{Unif}}(\cdot)$, we also have $\tilde{Y}_n(\cdot) \xrightarrow{D} K^{\mathrm{Unif}}(\cdot)$. But

$$1 = \liminf_{n\to\infty} P\{\tilde{Y}_n(\cdot) \in C\} \le P\{K^{\mathrm{Unif}}(\cdot) \in C\}$$

since $C \subset D$ is closed in the Skorohod topology; therefore, a.s., $K^{\mathrm{Unif}}(\cdot)$ is continuous on $[0,1-\varepsilon]$. Since $\varepsilon \in (0,1)$ is arbitrary, a.s., $K^{\mathrm{Unif}}(\cdot)$ is continuous on the whole interval $[0,1)$. The function $R^{\mathrm{Unif}}(s,t) = R^{\mathrm{Poiss}}(s,t) - s^2 t^2$ is continuous on $[0,1)^2$ because $R^{\mathrm{Poiss}}(s,t)$ is. □

6. The number of clusters at the critical moment. Now we turn our attention to the number of clusters at the critical moment $t = 1$. We are interested in the behavior of

$$\frac{K_n(1) - na(1)}{\sqrt{n}} = \frac{K_n(1)}{\sqrt{n}},$$

which is the left-hand side of (3) at $t = 1$; here we have $a(1) = 0$ under $EX_i^2 < \infty$, see Property 3, Section 3.4.

We do not know if this sequence is weakly convergent, but we hope that it is. We also have a naive guess that its limit is Gaussian because the limit in Theorem 1 is Gaussian. In view of $K_n(1) \ge 1$, this conjectured weak limit is nonnegative, hence it is Gaussian if and only if it is identically equal to zero. However, the results of this section show that the limit is nonzero, thus our guess on Gaussianity fails.

The study of convergence of $K_n(1)/\sqrt{n}$ is quite complicated. Therefore, in this section, we consider only the Poisson model. First, let us prove the following statement.

Proposition 4.
In the Poisson model, we have limn→∞P{Kn(1) = 1}> 0. Proof. On the one hand, Kn(1) = 1 is equivalent to T n;Poiss ≤ 1, where T lastn;Poiss denotes the moment of the last collision in the Poisson model. On the other hand, a result by Giraud [8] states that in the uniform model, n(T lastn;Unif − 1) D−→ sup 0≤x≤1 W (y)dy − W (y)dy =: τ, where W (·) is a Brownian bridge. Now, by (36), we have T lastn;Unif = β−1n T lastn;Poiss, hence n(β−1n T n;Poiss − 1) D−→ τ.(41) 28 V. V. VYSOTSKY But from the central limit theorem and the law of large numbers, n(β−1n − 1) =− Sn+1 − n√ Sn+1( Sn+1 + D−→ η ,(42) where η is a standard Gaussian r.v. and Si is a standard exponential ran- dom walk that defines initial positions of particles. Since, in view of Fact 5, T lastn;Unif = β n;Poiss and βn are independent, from (41), (42), and the law of large numbers it follows that n(T lastn;Poiss − 1) D−→ τ − η = τ + where τ and η are independent. Thus, P{Kn(1) = 1}= lim P{T lastn;Poiss ≤ 1}= P The main advantage of the Poisson model is that, by Lemma 2 and Prop- erty 4, Section 3.4 we have P{Tj,n > 1}= epjpn−j , where pk := P 1≤m≤k (Si − ESi)≥ 0 and Si is a standard exponential random walk. We say that the sequence of r.v.’s i=1(Si−ESi) is an integrated random walk. In the proof of Property 3, Section 3.4, we showed that pk → 0 as k→∞. Therefore, it is reasonable to say that pk are the unilateral small deviation probabilities of an integrated centered random walk. We need to obtain the asymptotics of pk → 0 to continue the study of convergence of Kn(1)√ . Unfortunately, the results of the rest of this section are completely dependent on the correctness of the following conjecture. Conjecture 1. We have pk ∼ c1k−1/4 as k→∞ for some c1 ∈ (0,∞). Simulations show that the conjecture is true and c1 ≈ 0.36. The weaker form pk ≍ k−1/4 of Conjecture 1 was proved by Sinai [22], but only for inte- grated symmetric Bernoulli random walks. 
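The simulations mentioned after Conjecture 1 can be imitated with a small Monte Carlo sketch. It is not the author's simulation code. It assumes that the definition of $p_k$ reads $p_k = P\{\min_{1\le m\le k} \sum_{i=1}^m (S_i - ES_i) \ge 0\}$ with $S_i$ a standard exponential random walk (so $ES_i = i$); the function name `estimate_p`, the trial count and the seed are illustrative choices. Under this reading $p_1 = P\{X_1 \ge 1\} = e^{-1}$, which gives a simple sanity check.

```python
import math
import random


def estimate_p(k, trials=20000, seed=0):
    """Monte Carlo estimate of p_k = P{ min_{1<=m<=k} sum_{i<=m} (S_i - i) >= 0 }.

    S_i is a random walk with i.i.d. standard exponential steps (assumed
    reading of the definition in the text), so E S_i = i.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        walk = 0.0  # S_i, the exponential random walk
        area = 0.0  # sum_{i<=m} (S_i - E S_i), the integrated centered walk
        stayed_nonnegative = True
        for i in range(1, k + 1):
            walk += rng.expovariate(1.0)
            area += walk - i
            if area < 0.0:
                stayed_nonnegative = False
                break
        hits += stayed_nonnegative
    return hits / trials


if __name__ == "__main__":
    print("exact p_1 =", math.exp(-1))
    for k in (1, 4, 16, 64, 256):
        p = estimate_p(k)
        # If Conjecture 1 holds, the last column should stabilize near c_1.
        print(k, p, p * k ** 0.25)
```

Printing $k^{1/4} p_k$ for growing $k$ shows whether the rescaled estimates level off; such a run can only illustrate the conjecture, not prove it.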
It also interesting to note that, by McKean [19], the unilateral small deviation probabilities of an integrated Wiener process have the same order as T →∞: 0≤s≤T W (u)du≥−1 ∼ c2T−1/4(43) for some c2 ∈ (0,∞). The left-hand side of (43) is a unilateral small deviation probability since 0≤s≤T W (u)du≥−1 0≤s≤1 W (u)du≥−T−3/2 .(44) CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 29 To be precise, McKean was interested in a more general problem, and some calculations are required to obtain (43) from his results. Therefore, we additionally refer to Isozaki and Watanabe [12] who state (43) explic- itly. By the results mentioned above, we also suppose that Conjecture 1 is true for other integrated centered random walks that satisfy some moment conditions. Now we are able to prove the following result on convergence of Kn(1)√ Proposition 5. Suppose Conjecture 1 holds true. Then in the Poisson model, we have Kn(1)√ = c3, sup Kn(1)√ <∞(45) for some c3 ∈ (0,∞); the sequence Kn(1)√n is tight and uniformly integrable; and the limit of any weakly converging subsequence of Kn(1)√ takes value zero with positive probability, but is not identically equal to zero. Numerical simulations show that Kn(1)√ is weakly convergent and that this convergence is quite fast. In Figure 1 we present the (empirical) distribution function of Kn(1)√ for n = 10,000. Since the simulations performed for n = 40,000 showed a hardly perceptible difference, this function seems to be a good candidate for the distribution function of the conjectured limit. Note that if we weaken Conjecture 1 to pk ≍ k−1/4, then Proposition 5 still holds true with the only difference that E Kn(1)√ Proof of Proposition 5. We start with the convergence of the ex- pectation. On the one hand, by (6) and Lemma 2, Kn(1)√ pipn−i, and on the other hand, i−1/4(n− i)−1/4 = 1 )−1/4( )−1/4 −→B(3/4,3/4) as the integral sum of Beta function. 
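The convergence of the integral sum to the Beta function used above can be checked numerically. The sketch below relies on the elementary identity $\frac{1}{\sqrt{n}}\sum_{i=1}^{n-1} i^{-1/4}(n-i)^{-1/4} = \frac{1}{n}\sum_{i=1}^{n-1} (i/n)^{-1/4}(1-i/n)^{-1/4}$, whose right-hand side is a Riemann sum for $\int_0^1 x^{-1/4}(1-x)^{-1/4}\,dx = B(3/4,3/4)$. The function names are illustrative.

```python
import math


def beta_riemann_sum(n):
    """(1/n) * sum_{i=1}^{n-1} (i/n)^(-1/4) * (1 - i/n)^(-1/4)."""
    total = 0.0
    for i in range(1, n):
        x = i / n
        total += (x * (1.0 - x)) ** -0.25
    return total / n


def beta(a, b):
    """Beta function expressed through the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


if __name__ == "__main__":
    target = beta(0.75, 0.75)
    for n in (10, 100, 1000, 10000):
        s = beta_riemann_sum(n)
        print(n, s, target - s)
```

Because the integrand has integrable singularities at both endpoints, the sums approach the limit from below and rather slowly, so the printed differences shrink only gradually as $n$ grows.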
Then it follows from Conjecture 1 and standard arguments that E Kn(1)√ converges to c3 := ec 1B(3/4,3/4) > 0. 30 V. V. VYSOTSKY Fig. 1. The distribution function of Kn(1)√ for n= 10,000. Now we check the uniform boundedness of E( Kn(1)√ )2. By (6) it is sufficient to prove that i,j=1,i 6=j P{Ti,n > 1, Tj,n > 1}<∞.(46) Suppose i < j; then by using (8) and properties of Fk,j,l(·), we get P{Ti,n > 1, Tj,n > 1}= P 1≤k≤n−i 1≤l≤i Fk,i,l(1)> 0, min 1≤k≤n−j 1≤l≤j Fk,j,l(1)> 0 1≤k≤(j−i)/2 1≤l≤i Fk,i,l(1)> 0, min 1≤k≤n−j 1≤l≤(j−i)/2 Fk,j,l(1)> 0 where by (j − i)/2 we mean ⌈(j − i)/2⌉. The minima in the last expres- sion are independent as functions of {Xm}m≤(i+j)/2 and {Xm}m≥(i+j)/2+1, respectively; hence P{Ti,n > 1, Tj,n > 1} ≤ P 1≤k≤(j−i)/2 1≤l≤i Fk,i,l(1)> 0 1≤k≤n−j 1≤l≤(j−i)/2 Fk,j,l(1)> 0 CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 31 = P{Ti,i+(j−i)/2 > 1} · P{T(j−i)/2,n−j+(j−i)/2 > 1} = e2pip ⌈(j−i)/2⌉pn−j, where the first equality follows from (8) and the second follows from Lemma 2. Recalling Conjecture 1, we get i,j=1,i 6=j P{Ti,n > 1, Tj,n > 1} ≤ i,j=1,i 6=j e2pip ⌈|j−i|/2⌉pn−j i,j=1,i 6=j i−1/4⌈|j − i|/2⌉−1/2(n− j)−1/4 i,j=1,i 6=j )−1/4∣ −1/2( )−1/4 for some c > 0. The last expression is an integral sum converging to x−1/4|x− y|−1/2(1− y)−1/4 dxdy, and it is a simple exercise to check that the integral is finite. This concludes (46). The uniform integrability of Kn(1)√ follows from the second relation from (45), see Billingsley [3], Section 5, and the tightness follows from the uniform integrability. Finally, suppose Kni(1)√ D−→ ξ for some subsequence ni → ∞ and some r.v. ξ. Then Eξ = c3 > 0 by the uniform integrability and (45), and hence ξ is not identically equal to zero. But the distribution of ξ has an atom at zero since by Proposition 4 and properties of weak convergence, P{ξ = 0}= lim P{ξ ≤ ε} ≥ lim lim sup Kni(1)√ ≥ lim P{Kni(1) = 1}> 0. � 7. Open questions. 1. The number of clusters at the critical moment t= 1. 
Here the main question is whether Conjecture 1 holds true. Even by itself, this problem is worth studying. But even if Conjecture 1 is true, we still do not have a proof of the weak convergence of $K_n(1)/\sqrt{n}$; it is only known that this sequence is tight. The author strongly believes, relying on numerical simulations, that the limit exists. It would be interesting to find this conjectured limit, which should be nontrivial by Proposition 5, in an explicit form.

2. The weak convergence of $(K_n(\cdot) - na(\cdot))/\sqrt{n}$ on the whole interval $[0,1]$. It is very natural to ask if it is possible to strengthen Theorem 1 by proving the weak convergence of $(K_n(\cdot) - na(\cdot))/\sqrt{n}$ in $D[0,1]$. This complicated problem returns us again to Question 1 because the weak convergence of $(K_n(\cdot) - na(\cdot))/\sqrt{n}$ in $D[0,1]$ implies the weak convergence of $(K_n(1) - na(1))/\sqrt{n} = K_n(1)/\sqrt{n}$, see Billingsley [3], Section 15. But even if $K_n(1)/\sqrt{n}$ converges, its weak limit $K(1)$ is not Gaussian, hence the limit process $K(\cdot)$, which is Gaussian on $[0,1)$, is no longer Gaussian on $[0,1]$. Therefore, it is doubtful that Theorem 1 is true in $D[0,1]$; at least, one should provide a proof completely different from the presented one. Also, it is unclear how to define the finite-dimensional distributions of the non-Gaussian $K(\cdot)$ on $[0,1]$ because simulations show that $K(1)$ would not be independent of $K(t)$ for $t < 1$.

3. The number of clusters in the warm gas. In the presented case, initial speeds of particles are zero. This model is often called the cold gas according to its zero initial temperature. We introduce a new model stating that initial speeds of particles are $a_n v_1, a_n v_2, \dots, a_n v_n$, where $v_i$ are some i.i.d. r.v.’s and $a_n$ is a sequence of normalization constants. This model, called the warm gas, was studied in many papers, for example, [14, 16, 20, 25]. It is of great interest to study the behavior of $K_n(t)$ in the warm gas.
In [25], the author proved that in the basic case where $a_n = 1$ for all $n$ and $Ev_i^2 < \infty$, we have $K_n(t)/n \xrightarrow{P} 0$ for all $t > 0$. The question is to find a normalization of $K_n(t)$ leading to some nontrivial limit. Clearly, this normalization depends on $a_n$, but it is quite possible that there is a phase transition effect similar to the one discovered by Lifshits and Shi [16]: if $a_n$ are small enough, then the gas has a low temperature and the normalization is the same as in the cold gas; if $a_n$ are big enough, as in the basic case $a_n \equiv 1$, then the normalization and the behavior of the gas differ entirely from the case of the cold gas. The author believes that the localization property, which is described in Section 3, could be helpful in a study of these questions.

It is also interesting to compare the behavior of $K_n(1)$ in the warm and in the cold gases; in the warm gas, the moment $t = 1$ plays the same “critical” role as in the cold gas, see Lifshits and Shi [16]. The variable $K_n(1)$ was studied by Suidan [24], who considered the warm gas with $a_n \equiv 1$ and deterministic initial positions of particles (his initial positions were 1 , . . . , n). For this case, Suidan found the distribution of $K_n(1)$ and showed that $EK_n(1) \sim \log n$. Recall that in the presented case, $EK_n(1) \sim c_3 \sqrt{n}$.

4. The number of clusters in ballistic systems of sticky particles. A sticky particles model is called ballistic if it evolves according to the laws introduced in Section 1, but in the absence of gravitation. Such models are, in some sense, more natural than gravitational ones because the basic assumption that gravitation does not depend on distance is sometimes confusing. However, an unpublished paper of Lifshits and Kuoza shows that certain gravitational and ballistic models are tightly connected. It seems interesting to study the number of clusters in the ballistic model. The author does not know any results in this field.

Acknowledgments.
I am grateful to my adviser Mikhail A. Lifshits for drawing my attention to the subject and for his guidance. I also thank the anonymous referees for carefully reading this paper and for useful comments.

REFERENCES

[1] Baum, L. E. and Katz, M. (1965). Convergence rates in the law of large numbers. Trans. Amer. Math. Soc. 120 108–123. MR0198524
[2] Bertoin, J. (2002). Self-attracting Poisson clouds in an expanding universe. Comm. Math. Phys. 232 59–81. MR1942857
[3] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. MR0233396
[4] Brenier, Y. and Grenier, E. (1998). Sticky particles and scalar conservation laws. SIAM J. Numer. Anal. 35 2317–2328. MR1655848
[5] Chertock, A., Kurganov, A. and Rykov, Yu. (2007). A new sticky particle method for pressureless gas dynamics. SIAM J. Numer. Anal. 45 2408–2441. MR2361896
[6] E, W., Rykov, Yu. G. and Sinai, Ya. G. (1996). Generalized variational principles, global weak solutions and behavior with random initial data for systems of conservation laws arising in adhesion particle dynamics. Comm. Math. Phys. 177 349–380. MR1384139
[7] Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random variables, with applications. Ann. Math. Stat. 38 1466–1474. MR0217826
[8] Giraud, C. (2001). Clustering in a self-gravitating one-dimensional gas at zero temperature. J. Statist. Phys. 105 585–604. MR1871658
[9] Giraud, C. (2005). Gravitational clustering and additive coalescence. Stochastic Process. Appl. 115 1302–1322. MR2152376
[10] Gurbatov, S. N., Malakhov, A. N. and Saichev, A. I. (1991). Nonlinear Random Waves and Turbulence in Nondispersive Media: Waves, Rays, Particles. Manchester Univ. Press. MR1255826
[11] Gurbatov, S. N., Saichev, A. I. and Shandarin, S. F. (1989). The large-scale structure of the universe in the frame of the model equation of nonlinear diffusion. Mon. Not. R. Astr. Soc. 236 385–402.
[12] Isozaki, Y. and Watanabe, S. (1994). An asymptotic formula for the Kolmogorov diffusion and a refinement of Sinai’s estimates for the integral of Brownian motion. Proc. Japan Acad. Ser. A Math. Sci. 70 271–276. MR1313176
[13] Karlin, S. (1968). A First Course in Stochastic Processes. Academic Press, New York. MR0208657
[14] Kuoza, L. V. and Lifshits, M. A. (2006). Aggregation in one-dimensional gas model with stable initial data. J. Math. Sci. 133 1298–1307. MR2092206
[15] Lifshits, M. A. (1995). Gaussian Random Functions. Kluwer, Dordrecht. MR1472736
[16] Lifshits, M. and Shi, Z. (2005). Aggregation rates in one-dimensional stochastic systems with adhesion and gravitation. Ann. Probab. 33 53–81. MR2118859
[17] Lin, Z. and Lu, C. (1996). Limit Theory for Mixing Dependent Random Variables. Kluwer, Dordrecht. MR1486580
[18] Louhichi, S. (2000). Weak convergence for empirical processes of associated sequences. Ann. Inst. H. Poincare Probab. Statist. 36 547–567. MR1792655
[19] McKean, H. P. (1963). A winding problem for a resonator driven by a white noise. J. Math. Kyoto Univ. 2 227–235. MR0156389
[20] Martin, Ph. A. and Piasecki, J. (1996). Aggregation dynamics in a self-gravitating one-dimensional gas. J. Statist. Phys. 84 837–857. MR1400187
[21] Newman, C. M. (1980). Normal fluctuations and the FKG inequalities. Comm. Math. Phys. 74 119–128. MR0576267
[22] Sinai, Ya. G. (1992). Distribution of some functionals of the integral of a random walk. Theor. Math. Phys. 90 219–241. MR1182301
[23] Shandarin, S. F. and Zeldovich, Ya. B. (1989). The large-scale structure of the universe: Turbulence, intermittency, structures in a self-gravitating medium. Rev. Modern Phys. 61 185–220. MR0989562
[24] Suidan, T. M. (2001). A one-dimensional gravitationally interacting gas and the convex minorant of Brownian motion. Russ. Math. Surv. 56 687–708. MR1861441
[25] Vysotsky, V. V. (2006). On energy and clusters in stochastic systems of sticky gravitating particles. Theory Probab. Appl. 50 265–283. MR2221711
[26] Vysotsky, V. V. (2007). The area of exponential random walk and partial sums of uniform order statistics. J. Math. Sci. 147 6873–6883.

Department of Probability Theory and Mathematical Statistics
Faculty of Mathematics and Mechanics
St. Petersburg State University
Bibliotechnaya pl. 2
Stary Peterhof 198504
Russia
E-mail: vlad.vysotsky@gmail.com

Introduction Description of the model Statement of the problem and the results Organization of the paper Method of barycenters Study of the i.d. model. The localization property The initial study Localization property of the aggregation process The distribution function of T0 in the Poisson model Some properties of the variables Ti The last collision Proofs of Fact 1 and Theorem 1 for the i.d.
model Proof of Theorem 1 for the uniform model The number of clusters at the critical moment Open questions Acknowledgments References Author's addresses ABSTRACT We give a quantitative analysis of clustering in a stochastic model of one-dimensional gas. At time zero, the gas consists of $n$ identical particles that are randomly distributed on the real line and have zero initial speeds. Particles begin to move under the forces of mutual attraction. When particles collide, they stick together forming a new particle, called cluster, whose mass and speed are defined by the laws of conservation. We are interested in the asymptotic behavior of $K_n(t)$ as $n\to \infty$, where $K_n(t)$ denotes the number of clusters at time $t$ in the system with $n$ initial particles. Our main result is a functional limit theorem for $K_n(t)$. Its proof is based on the discovered localization property of the aggregation process, which states that the behavior of each particle is essentially defined by the motion of neighbor particles. <|endoftext|><|startoftext|> Introduction Let Hm and Hn are hyperbolic spaces with dimensions m ≥ 2 and n correspond- ingly. For convenience, we use the upper-half space models for Hm and Hn. So Hm = {(x1, ..., xm) ∈ IRm : xm > 0}, Hn = {(y1, ..., yn) ∈ IRn : yn > 0} with metrics d2Hm = (xm)2 ((dx1)2 + ...+ (dxm)2), d2Hn = (yn)2 ((dy1)2 + ...+ (dyn)2). So the tension fields of u = (y1, ..., yn) is τα = (xm)2(∆0y < ∇0y α,∇0y n >), for 1 ≤ α ≤ n− 1 and τn(u) = (xm)2(∆0y α|2 − |∇0y n|2)), where ∇0 is the Euclidean gradient and ∆0 is the Euclidean Laplacian. A C2 map u : Hm → Hn is called a harmonic map if τ(u)s = 0 for all s = 1, 2, ..., n. The literature about harmonic maps between Riemannian manifolds are abundant, we refer the readers to the classical work [4]. 
One of the interesting problems for harmonic maps is the Dirichlet problem at infinity: given the geometric boundaries $\partial H^m$ and $\partial H^n$ of $H^m$ and $H^n$, and given a continuous map $f : \partial H^m \to \partial H^n$ (here continuity is understood in the Euclidean sense), is there a harmonic map $u : H^m \to H^n$ such that, in the Euclidean sense, $u$ is continuous up to the boundary $\partial H^m$ and takes the boundary value $f$?

Date: October 22, 2018. 2000 Mathematics Subject Classification. 53A35. Key words and phrases. Dirichlet problems; Harmonic functions; Hyperbolic spaces. This work was initiated when the second author was at the Department of Mathematics, University of Natural Sciences, Hochiminh city, Vietnam. He would like to thank Professor Dang Duc Trong for his invaluable help. He would also like to express his thanks to Professor F. Helein, Professor R. Schoen, and Mr. Le Quang Nam for their generous help. http://arxiv.org/abs/0704.0087v2 DUONG MINH DUC AND TRUONG TRUNG TUYEN

For this problem, under some additional smoothness requirements on $f$, there are many results. In three papers [8], [9] and [7], Li and Tam established the existence and uniqueness of a harmonic map $u$ which is $C^1$ up to the boundary and has boundary value $f$, provided $f$ is $C^1$. But for more general types of $f$, to our knowledge, there is no answer to the question of existence of a solution $u$.

In this paper we establish the existence of approximate solutions to the Dirichlet problem for harmonic maps between two hyperbolic spaces with prescribed boundary value. More explicitly, we prove the following result.

Theorem 1. Let $f : \partial H^m \to \partial H^n$ be a bounded uniformly continuous map. Let the functions $g$ and $\varphi$ be as in Section 2. Assume that $\int_0^1 t^{-1} g(t)\,dt < \infty$; in particular, this condition is satisfied if $f$ is Hölder continuous. For each $\epsilon > 0$, there exists a harmonic map $u_\epsilon : H^m \to H^n$ which is continuous up to the boundary $\partial H^m$ and satisfies $u_\epsilon|_{\partial H^m} = (f^1, \dots, f^{n-1}, \epsilon)$.
Our strategy for proving this result is as follows. First, we construct an initial map, i.e., a $C^2$ map $v = (v^1, \dots, v^{n-1}, v^n) : H^m \to H^n$ which has boundary value $f$, for any continuous map $f : \partial H^m \to \partial H^n$. For this step we follow the ideas in [9], with some changes: since the function $f$ need not be differentiable, we cannot take $v^n$ as in [9], and our function $v^n$ is a function of the single variable $x^m$. Then we use this initial map to produce harmonic maps $u_\epsilon : H^m \to H^n$ which take the boundary value $(f^1, \dots, f^{n-1}, v^n + \epsilon)$ for every $\epsilon > 0$.

2. Initial maps

In this part, we use the techniques in [9] to construct good initial maps $v$ having the map $f : \partial H^m \to \partial H^n$ as the boundary value. Let $f : \mathbb{R}^{m-1} \to \mathbb{R}^{n-1}$ be a uniformly continuous bounded function. Let $g : H^m \to (0, \infty)$ be $C^2$ and bounded, with $\lim_{x^m \to 0} g(x', x^m) = 0$ uniformly in $x'$. We denote by $v = \{f, g\} : H^m \to H^n$ the extension of $f$ defined as follows:

$$v^\alpha(x', x^m) = c_m \int_{\mathbb{R}^{m-1}} \frac{x^m f^\alpha(y')}{(|x' - y'|^2 + (x^m)^2)^{m/2}}\,dy' \quad \text{for } 1 \le \alpha \le n-1,$$

where $c_m$ is the normalizing constant of the Poisson kernel, and

$$v^n(x', x^m) = g(x', x^m).$$

By results in [9] (pp. 628-630) we have

(i) $v$ is $C^2$, and it is continuous up to the boundary given by $x^m = 0$.

(ii) If $1 \le \alpha \le n-1$ then $\lim_{x^m \to 0} x^m |\nabla_0 v^\alpha| = 0$, uniformly in $x'$.

Moreover, by estimates for elliptic PDEs (see Theorem 2.10 in [5]), noting that $v^\alpha$ is bounded, there exists a constant $C > 0$ such that

$$\max\{(x^m)^3 |D^3 v^\alpha|,\ (x^m)^2 |D^2 v^\alpha|,\ (x^m) |\nabla_0 v^\alpha|\} \le C. \tag{2.1}$$

APPROXIMATE SOLUTIONS TO THE DIRICHLET PROBLEM FOR HARMONIC MAPS BETWEEN HYPERBOLIC SPACES

We put

$$g(r) = \sup_{x', y' \in \mathbb{R}^{m-1},\ |x' - y'| \le r} |f(y') - f(x')|, \qquad \varphi(r) = \int_0^\infty \frac{r}{s^2 + r^2}\, g(s)\,ds.$$

Since $g$ is monotone, it is Lebesgue measurable. Moreover, since $g$ is bounded, $\varphi$ is well-defined. Using polar coordinates with center at $x'$ we see that there exists a constant $C > 0$ such that

$$\int_{\mathbb{R}^{m-1}} \frac{x^m |f(y') - f(x')|}{(|x' - y'|^2 + (x^m)^2)^{m/2}}\,dy' \le C \varphi(x^m) \quad \text{for all } x' \in \mathbb{R}^{m-1}.$$

Since $f$ is uniformly continuous we see that $\lim_{r \to 0} g(r) = 0$. Now we show that $\lim_{x^m \to 0} \varphi(x^m) = 0$. Indeed, for any $\epsilon > 0$, we find $\delta > 0$ such that $g(s) \le \epsilon$ if $0 < s \le \delta$.
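So, if $K = \sup_{s \in \mathbb{R}} g(s)$, we have

$$\varphi(r) = \int_0^\delta \frac{r}{s^2 + r^2}\, g(s)\,ds + \int_\delta^\infty \frac{r}{s^2 + r^2}\, g(s)\,ds \le \epsilon \int_0^\delta \frac{r}{s^2 + r^2}\,ds + K \int_\delta^\infty \frac{r}{s^2 + r^2}\,ds = \epsilon \arctan(\delta/r) + K(\pi/2 - \arctan(\delta/r)).$$

Letting $r \to 0$ we see that $\limsup_{r \to 0} \varphi(r) \le \epsilon\pi/2$. Since $\epsilon > 0$ is arbitrary, we see that $\lim_{r \to 0} \varphi(r) = 0$. Thus, if we put $v = \{f, \varphi(x^m)\}$, we see that $v$ is an extension of $f$. Moreover, we have the following result.

Lemma 1. Let $f : \partial H^m \to \partial H^n$ be nonconstant, uniformly continuous and bounded. Put $v = \{f, \varphi(x^m)\}$ as above. Then $v$ is smooth, it is continuous up to the boundary, $v|_{\mathbb{R}^{m-1}} = f$, and there exists $C > 0$ such that for $x^m$ near 0 we have

$$\|\tau(v)\|^2 \le C.$$

Proof. By Section 6 in [9] we have

$$|(x^m) \nabla_0 v^\alpha| \le C_3 |\varphi(x^m)|,$$

where $1 \le \alpha \le n-1$ and $C_3$ is a positive constant. Direct computation gives

$$\varphi'(r) = \int_0^\infty \frac{s^2 - r^2}{(s^2 + r^2)^2}\, g(s)\,ds, \qquad \varphi''(r) = \int_0^\infty \frac{-2r}{(s^2 + r^2)^2}\, g(s)\,ds + \int_0^\infty \frac{-4r(s^2 - r^2)}{(s^2 + r^2)^3}\, g(s)\,ds.$$

Hence

$$\max\{|r\varphi'(r)|,\ |r^2 \varphi''(r)|\} \le C_4 \varphi(r),$$

where $C_4$ is a constant. Since $g$ is increasing, $g'$ exists almost everywhere and $g' \ge 0$. Using integration by parts, and noting that $\frac{d}{ds}\left(\frac{-s}{s^2 + r^2}\right) = \frac{s^2 - r^2}{(s^2 + r^2)^2}$, we have

$$\varphi'(r) = \int_0^\infty \frac{s^2 - r^2}{(s^2 + r^2)^2}\, g(s)\,ds = \left.\frac{-s}{s^2 + r^2}\, g(s)\right|_0^\infty + \int_0^\infty \frac{s}{s^2 + r^2}\, g'(s)\,ds = \int_0^\infty \frac{s}{s^2 + r^2}\, g'(s)\,ds.$$

Differentiating the last term in the above equality we get

$$\varphi''(r) = -\int_0^\infty \frac{2rs}{(s^2 + r^2)^2}\, g'(s)\,ds.$$

Since $f$ is nonconstant we see easily that $g' \not\equiv 0$ (in fact, we do not need this restriction, since we can add to $g$ a non-constant positive function, for example $(x^m)^{1/2}$). So, since $g' \ge 0$, it follows from the above equalities that

$$\varphi'(r) > 0, \qquad |r\varphi''(r)| \le C_5 \varphi'(r),$$

where $C_5$ is a positive constant. Then, using the formula for the tension field, we are done. □

3. Proof of Theorem 1

Proof. Fix $\epsilon > 0$. We define $v_\epsilon : H^m \to H^n$ as follows:

$$v_\epsilon(x) = (v^1(x), v^2(x), \dots, v^{n-1}(x), \varphi(x^m) + \epsilon).$$

For each $\delta > 0$ denote by $u_{\epsilon,\delta} : H^m \supset \Omega_\delta = \{x^m > \delta\} \to H^n$ the harmonic map taking the value $v_\epsilon$ on $\partial\Omega_\delta$. By inequality (2.1) in [2] and the properties of $v$ and $v_\epsilon$ (see Lemma 1) we have

$$\Delta_{H^m} d_{H^n}(u_{\epsilon,\delta}, v_\epsilon) \ge -|\tau(v_\epsilon)| \ge -C\, \frac{\varphi(x^m)}{\varphi(x^m) + \epsilon} \ge -\frac{C}{\epsilon}\, \varphi(x^m)$$

for all $x \in \Omega_\delta$, where $C$ is the constant from Lemma 1.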
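As a numerical aside on the limit $\varphi(r) \to 0$ used in Section 2, the following minimal sketch evaluates $\varphi(r) = \int_0^\infty \frac{r}{s^2+r^2}\, g(s)\,ds$ (the form read off from the arctan estimate) for a sample modulus of continuity. The choice $g(s) = \min(s^{1/2}, 1)$, the function names and the midpoint rule are illustrative assumptions, not part of the paper. The substitution $s = r\tan\theta$ turns the integral into $\int_0^{\pi/2} g(r\tan\theta)\,d\theta$, which is bounded and easy to integrate.

```python
import math


def modulus_example(s, alpha=0.5):
    """Hypothetical modulus of continuity g(s) = min(s**alpha, 1) of a bounded Hölder map."""
    return min(s ** alpha, 1.0)


def phi(r, g=modulus_example, n=20000):
    """Approximate phi(r) = int_0^inf r/(s^2 + r^2) * g(s) ds.

    With s = r*tan(theta) the integral becomes int_0^{pi/2} g(r*tan(theta)) d(theta),
    which is evaluated here with the midpoint rule on n cells.
    """
    h = (math.pi / 2.0) / n
    return h * sum(g(r * math.tan((k + 0.5) * h)) for k in range(n))


if __name__ == "__main__":
    for r in (1.0, 1e-2, 1e-4, 1e-6):
        print(f"r = {r:g}  phi(r) ~ {phi(r):.5f}")
```

For this sample $g$ the printed values decrease towards 0 as $r \to 0$ and never exceed $\frac{\pi}{2}\sup g$, in line with the bound $\epsilon \arctan(\delta/r) + K(\pi/2 - \arctan(\delta/r))$.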
APPROXIMATE SOLUTIONS TO THE DIRICHLET PROBLEM FOR HARMONIC MAPS BETWEEN HYPERBOLIC SPACES5 We claim that the function ψ(r) = u−2ϕ(u) du ds is well-defined for r ≥ 0. In fact, using the formula for ϕ we have ψ(r) = u−2ϕ du ds = u−1(u2 + t2)−1g(t)dt du ds. Since the integrand is non-negative, using Fubini’s theorem we have∫ r u−1(u2 + t2)−1g(t)dt du ds = u−1(u2 + t2)−1g(t)du dt ds log(1 + )g(t)dt ds log(1 + )g(t)ds dt πt2 − 2 arctan( t )t2 + r log(1 + t g(t)dt. Now since g(t) is bounded we have πt2 − 2 arctan( t g(t)dt is convergent. Fixed r ≥ 0, near t = 0 we have r log(1 + t g(t) ≈ t−1g(t), and when t→ ∞ we have r log(1 + t g(t) ≈ t−3g(t), hence since g(t) is bounded and the assumption that t−1g(t) converges, our claim is verified. We use the same ψ to denote the function ψ : Hm → IR defined by ψ(x) = ψ(xm) for x = (x1, ..., xm−1, xm) ∈ Hm. Now we have ψ′(r) = u−2ϕ du > 0 and ψ”(r) = −r−2ϕ(r), since m ≥ 2 we have ∆Hm(−ψ(x)) = −(x m)2[ψ”(xm)− (m− 2) ψ′(xm)] ≥ −(xm)2ψ”(xm) = ϕ(r). Hence ∆Hm (dHn(uǫ,δ, vǫ)− C ψ) ≥ 0, for x ∈ Ωδ. Hence by maximum principle we have dHn(uǫ,δ, vǫ) ≤ C ψ(xm). This bound for dHn(uǫ,δ, vǫ) is independent of δ, hence by standard arguments (see the proof of Theorem 6.4 in [9]) we have a harmonic map uǫ : H m → Hn which is the subsequent limit of uǫ,δ. Moreover for all x ∈ H m we have dHn(uǫ, vǫ) ≤ C ψ(xm). 6 DUONG MINH DUC AND TRUONG TRUNG TUYEN Hence dHn(uǫ, vǫ) = 0, which shows that uǫ is continuous up to the boundary and takes boundary value 1, ..., xm−1, 0) = (f1, f2, ..., fn−1, ǫ). � References [1] Shiu-Yuen Cheng, Liouville theorem for harmonic maps, Proc. Symp. Pure Math. 36, 1980, 147–151. [2] Wei-Yue Ding and Youde Wang, Harmonic maps of complete noncompact Riemannian man- ifolds, Internat. J. Math. 2, 1991, 617–633. [3] Duong Minh Duc and Alberto Verjovsky, Proper harmonic maps with Lipschitz boundary values, preprint. [4] James Eells, Jr. and J. H. Sampson, Harmonic mappings of Riemannian manifolds, Ams. J. Math. 86 (1), 1964, pp. 
109–160.
[5] David Gilbarg and Neil S. Trudinger, Elliptic Partial Differential Equations of Second Order, Springer-Verlag, Berlin-Heidelberg-New York-Tokyo, 1983.
[6] Frederic Helein, Regularite des applications faiblement harmoniques entre une surface et une variete riemannienne, C. R. Acad. Sci. Paris 312 (1), 1991, 591–596.
[7] Peter Li and Luen-Fai Tam, The heat equation and harmonic maps of complete manifolds, Invent. Math. 105, 1991, 1–46.
[8] Peter Li and Luen-Fai Tam, Uniqueness and regularity of proper harmonic maps, Annals of Mathematics 137, 1993, pp. 167–201.
[9] Peter Li and Luen-Fai Tam, Uniqueness and regularity of proper harmonic maps II, Indiana University Mathematics Journal 42 (2), 1993, pp. 591–635.
[10] Richard Schoen and Shing Tung Yau, Compact group actions and the topology of manifolds with non-positive curvature, Topology 18, 1979, 361–380.

Department of Mathematics, University of natural sciences, Hochiminh city, Vietnam
E-mail address: dmduc@hcmuns.edu.vn
Department of Mathematics, Indiana University, Rawles Hall, Bloomington, IN 47405
E-mail address: truongt@indiana.edu

1. Introduction 2. Initial maps 3. Proof of Theorem 1 References ABSTRACT Our main result in this paper is the following. Let $H^m$ and $H^n$ be hyperbolic spaces of dimensions $m$ and $n$, respectively, and let $f=(f^1,...,f^{n-1}):\partial H^m\to \partial H^n$ be a Hölder continuous map between the geometric boundaries of $H^m$ and $H^n$. Then for each $\epsilon >0$ there exists a harmonic map $u:H^m\to H^n$ which is continuous up to the boundary (in the Euclidean sense) and satisfies $u|_{\partial H^m}=(f^1,...,f^{n-1},\epsilon)$. <|endoftext|><|startoftext|> Introduction

In [1-3] a new effect, called the photonic flame effect, was found and some of its properties were studied. This effect is determined by the properties of photonic crystals. Photonic crystals have attracted great attention since the first papers concerning such structures [4-6].
Among the most important photonic crystals are artificial opals, self-assembled structures composed of SiO2 spheres arranged in a face-centered cubic lattice. The size of the spheres varies between 200 nm and 400 nm and defines the parameters of the face-centered cubic lattice and the photonic bandgap. The possibility of infiltrating the opal with different media allows effective control of the properties of light propagating through the crystal. The voids in the opal structure can be filled with semiconductors, superconductors, ferromagnetic materials, or fluorescent media [7], which opens wide possibilities for practical applications of such structures in optoelectronics. The linear optical properties of the photonic band gap have been the subject of many theoretical and experimental works and still remain to be investigated further [7,8]. The theoretical description of the electromagnetic field inside photonic crystal structures (obtained by the transfer matrix method [9] or coupled mode theory [10]) gives a clear picture of the transmitted and reflected spectra, of the electromagnetic field distribution inside the crystal, and of their dependence on the parameters of the photonic crystal structure (period, number of periods, refractive index contrast). Strong localization of the electromagnetic field in some regions leads one to expect a strong enhancement of nonlinear wave-matter interaction in comparison with bulk crystals. Second harmonic generation in different types of photonic crystals was investigated in [11,12]. A properly chosen photonic crystal exhibits negative refraction under some conditions [13]. Some features of stimulated Raman scattering in a one-dimensional photonic structure were considered in [14]. A fully quantum mechanical treatment of the generation of entangled photons in nonlinear photonic crystals in the process of down-conversion was given in [15].
The photonic band gap properties of photonic crystals are being actively used to investigate photon-exciton interaction [16]. Acoustic modes excited in the SiO2 balls which compose an opal photonic crystal show the effect of phonon mode quantization [17] and are the cause of stimulated globular scattering [3]. Specific features of acoustic wave propagation in photonic structures make it possible to focus a diverging ultrasonic beam into a narrow focal spot with a large focal depth [18]. Optical parametric oscillation via four-wave mixing in isotropic photonic crystals shows the possibility of effective frequency conversion [19].

The aim of this work is to give a short review of the results of [1-3] and to study the collective behavior of several photonic crystals. The crystals are placed on a Cu plate at the temperature of liquid nitrogen. One of the photonic crystals is illuminated by a laser pulse, and the laser light is focused on this crystal only. The phenomenon which we observe is the appearance of luminescence of the other photonic crystals. The duration of the luminescence of the other crystals, which are spatially separated from the crystal illuminated by the laser pulse, is of the order of seconds. The luminescence appears with some time delay relative to the laser pulse. The form of these light spots on the other crystals and their slow motion along the crystal resemble a small flame. This inspired us to give the name “photonic flame effect” (PFE) to the observed effect. When the surface of the Cu plate is covered with a liquid (acetone, ethanol, water), blue luminescence is seen in the frozen liquid after the PFE is excited in the opal situated on this plate. The temporal characteristics of this luminescence are the same as for a single opal crystal.

The paper is organized as follows.
In Sec.2 the experimental setup, the laser, and the photonic crystals (artificial opals) used in the experiment are described. In Sec.3 the “photonic flame effect” observed in the experiment is discussed. In Sec.4 prospects and possible explanations are presented.

2. Photonic crystals and laser used in experiment.

One of the most promising three-dimensional photonic crystals is artificial opal. Opal is a crystal with a face-centered cubic lattice consisting of monodisperse close-packed SiO2 spheres with a diameter of several hundred nanometers. Because the refractive index contrast (ratio nSiO2/nair) is about 1.45, a complete photonic band gap does not exist, but a photonic pseudogap is present. The empty cavities among these globules have octahedral and tetrahedral form. It is possible to investigate both initial opals (opal matrices) and nanocomposites, in which the cavities are filled with organic or inorganic materials, for instance semiconductors, superconductors, ferromagnetic substances, dielectrics displaying different types of nonlinearities, and so on.

Fig.1. Common appearance of a globular photonic crystal, built of spherical particles (globules).

By filling the voids of the photonic crystal with materials of different refractive index one can effectively tune the parameters of the photonic pseudogap. A giant pulse of a ruby laser (λ = 694.3 nm, τ = 20 ns, Emax = 0.3 J, spectral width of the initial light 0.015 cm⁻¹) was used as the source of excitation. The exciting light was focused on the material by lenses with different focal lengths (50, 90, and 150 mm). The samples of opal crystals had the size 3x5x5 mm and were cut parallel to the (111) plane (see Fig.2). The angle of incidence of the laser beam on the (111) plane varied from 0° to 60°. The distance of the sample from the focusing system and the exciting light energy were different in different runs of the experiment.
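The quantities quoted above (sphere diameter, index ratio of about 1.45, angle of incidence, pulse energy and duration) can be combined into two rough estimates. This is a sketch under stated assumptions: the Bragg formula for the (111) planes with a volume-averaged permittivity is a standard textbook approximation, not the authors' analysis, and the 0.2 cm spot radius in the usage lines is a hypothetical value, since the focal spot size is not given in the text.

```python
import math

FCC_FILL = 0.74  # volume fraction of spheres in a close-packed fcc lattice


def effective_index(n_sphere=1.45, n_void=1.0, fill=FCC_FILL):
    """Effective refractive index from the volume-averaged permittivity (assumed mixing rule)."""
    return math.sqrt(fill * n_sphere ** 2 + (1.0 - fill) * n_void ** 2)


def bragg_wavelength_nm(diameter_nm, theta_deg=0.0, n_sphere=1.45, n_void=1.0):
    """Bragg wavelength of the (111) planes; theta is measured from the [111] normal in air."""
    d111 = diameter_nm * math.sqrt(2.0 / 3.0)  # (111) interplanar spacing for touching spheres
    n_eff = effective_index(n_sphere, n_void)
    sin_theta = math.sin(math.radians(theta_deg))
    return 2.0 * d111 * math.sqrt(n_eff ** 2 - sin_theta ** 2)


def peak_power_w(energy_j=0.3, duration_s=20e-9):
    """Peak power of a pulse, approximated as energy divided by duration."""
    return energy_j / duration_s


def power_density_gw_cm2(energy_j, duration_s, spot_radius_cm):
    """Power density on a circular spot of the given radius (hypothetical geometry)."""
    area_cm2 = math.pi * spot_radius_cm ** 2
    return peak_power_w(energy_j, duration_s) / area_cm2 / 1e9


if __name__ == "__main__":
    for d in (200.0, 230.0, 250.0):
        print(f"D = {d:.0f} nm: Bragg wavelength at normal incidence ~ {bragg_wavelength_nm(d):.0f} nm")
    print(f"peak power ~ {peak_power_w() / 1e6:.0f} MW")
    print(f"power density for a 0.2 cm spot radius ~ {power_density_gw_cm2(0.3, 20e-9, 0.2):.2f} GW/cm^2")
```

Under these assumptions the estimated pseudogap moves to shorter wavelengths as the angle of incidence grows, and the full pulse energy focused into a millimeter-scale spot gives power densities of the order of 0.1 GW/cm².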
This made it possible to carry out measurements for different power densities at the entrance of the sample and for different field distributions inside the sample. Opal crystals consisting of close-packed amorphous spheres with diameters of 200 nm, 230 nm and 250 nm, and nanocomposites (opal crystals with voids filled with acetone or ethanol), were investigated.

Fig.2. Scheme of illuminating the sample (axes x, y, z; angle θ; direction [111]). The XY plane corresponds to the Cu plate surface.

3. Characteristics of “photonic flame”

Opal crystals were placed on a Cu plate which was put into a cell with liquid nitrogen (see Fig.3). The number of crystals varied from 1 to 5. The distance (d) between the crystals was of the order of several centimeters (the maximum value of d was 5 centimeters and was determined by the Cu plate size). One of the crystals was illuminated by the focused laser pulse. When the threshold was reached, visible (blue) luminescence appeared. The luminescence lasted from 1 to 12 seconds and looked like an inhomogeneous spot changing its spatial distribution and position on the surface of the crystal during this time.

Fig.3. Experimental setup (labels: liquid N2, opal crystals at distance d, exciting light, Cu plate).

The parameters of the luminescence (duration, threshold) were determined by the geometry of the illumination and the refractive index contrast of the sample. For the optimal excitation geometry the power density threshold was 0.12 GW/cm² for the opal crystal, 0.05 GW/cm² for opal crystals filled with ethanol, and 0.03 GW/cm² for the opal crystal filled with acetone. A typical temporal distribution of the luminescence, measured for the brightest part of the crystal, is shown in Fig.4. The same behavior is typical for all cases of luminescence under these excitation conditions, but the luminescence duration fluctuated from shot to shot.

Fig.4 (a, b). Temporal distribution of the visible luminescence.
The duration of the luminescence fluctuated from 1 to 12 seconds and showed an oscillating structure. In some cases the temporal distribution had a maximum at the beginning of the luminescence, in other cases a minimum. Fig.4 a) and b) show the luminescence of the pure opal matrix, of nearly the same duration, under the same geometrical and energy conditions of excitation near the excitation threshold (0.12 GW/cm²). The beginning of the measurements corresponds to a 0.3 s delay after the laser shot (the laser pulse duration is 20 ns).

The secondary emission spectrum observed in the photonic flame effect was investigated with the help of the setup shown in Fig.5.

Fig. 5. The experimental setup for the PFE spectrum study. 1 - ruby laser; 2 - lens; 3, 4, 5 - photonic crystals; 6 - Cu plate; 7 - cell with liquid nitrogen; 8 - fiber wave guide; 9 - minipolychromator; 10 - computer; 11 - camera; 12 - computer.

Spectra of the light emitted by the photonic crystal for different pumping light power densities are shown in Fig. 6 (a and b).

Fig. 6. Secondary emission spectrum of a photonic crystal (intensity I versus wavelength λ, nm) for different laser light power densities: a - I = 0.12 GW/cm², b - I = 0.14 GW/cm².

The spectrum consisted of sharp lines with wavelengths 429.0, 453.0, 489.0, 555.0 and 643.0 nm, which lie in the anti-Stokes spectral range for the exciting line at 694.3 nm. The line intensities in the spectrum strongly depended on the laser pumping intensity, which is evidence of the stimulated character of the emission.

In the case of several crystals placed on the Cu plate, only one of them was irradiated by the laser pulse. Luminescence appeared in this crystal when the threshold was reached. Bright shining of the other crystals began with some time delay after the laser shot.
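To make the statement about the anti-Stokes range concrete, the listed line positions can be converted to frequency shifts relative to the 694.3 nm exciting line. The following minimal sketch uses only the wavelengths quoted above; the function names are illustrative, and the conversion 1 cm⁻¹ = 10⁷ / (λ in nm) is the standard vacuum-wavelength relation.

```python
PUMP_NM = 694.3  # ruby laser line
LINES_NM = [429.0, 453.0, 489.0, 555.0, 643.0]  # emission lines reported in the text


def wavenumber_cm1(wavelength_nm):
    """Convert a vacuum wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


def anti_stokes_shift_cm1(line_nm, pump_nm=PUMP_NM):
    """Shift of an emission line relative to the pump; positive means anti-Stokes."""
    return wavenumber_cm1(line_nm) - wavenumber_cm1(pump_nm)


if __name__ == "__main__":
    for nm in LINES_NM:
        print(f"{nm:6.1f} nm  shift = {anti_stokes_shift_cm1(nm):8.1f} cm^-1")
```

All five shifts come out positive, from roughly 1.1 × 10³ cm⁻¹ for the 643.0 nm line to roughly 8.9 × 10³ cm⁻¹ for the 429.0 nm line, so every listed line carries more energy per photon than the pump.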
The value of this delay (and the intensity of the luminescence) was determined by the spatial position of the crystals on the plate. A steel screen put between the crystals (in order to avoid irradiation of the crystals by the light scattered by the crystal excited by the laser) did not stop the appearance of the luminescence if the distance between the Cu plate and the screen was more than 0.5 mm. The duration of the luminescence was of the order of several seconds, and the temporal behavior was like that shown in Fig.4. The typical features of such a distribution were the existence of a maximum and a long plateau with nearly constant intensity. In order to show the role of the plate material, we repeated these measurements with plates of the same size but made of steel and quartz, on which the opal crystals were placed as in the previous experiments. Luminescence of the same kind took place in the irradiated crystal, but luminescence of the other samples situated on these plates was not observed. The effect also depended on the angle of incidence (Fig.2). For the samples used, the angle was chosen experimentally to achieve the maximal luminescence (it is worth mentioning that this value differed from 0° and was about 40°). The effect was excited more easily in unprocessed samples. In Fig.7 one can see the luminescence of crystals situated at a distance of about 1 centimeter from the irradiated crystal.

Fig.7. Visible luminescence of the opal crystals when one of them is irradiated (the irradiated crystal can be recognized by its bright red light; in the left picture it is the crystal in the center, in the right picture it is the crystal on the left). The left picture corresponds to the case where the crystals are infiltrated with acetone. The right picture corresponds to the case of opal crystals without infiltration.
At large laser energy (several times above the threshold), or if the crystal was irradiated by several laser pulses, the opal can be destroyed, and the fragments of the crystal produce luminescence with the spectral and temporal properties described above (Fig.8).

Fig.8. The opal crystal is destroyed, and 3 large pieces and several small pieces continue to produce the luminescence.

In order to clarify the role of the Cu plate surface in the energy transport between the crystals, the following experiment was carried out: a pure opal matrix placed on the Cu plate was irradiated by the ruby laser pulse and showed strong luminescence lasting a few seconds, with the properties described above (Fig.9).

Fig.9. Luminescence of the single opal matrix.

The next step was covering the surface of the Cu plate with a liquid (experiments were made with water, acetone or ethanol). The thickness of the frozen liquid on the plate surface was about 1 mm. The transverse size of the frozen liquid was about 1 cm. After the crystal is illuminated by the ruby laser pulse, luminescence of the crystal appears, and bright blue luminescence of the frozen liquid appears as well. The temporal characteristics of the luminescence in the crystal and in the frozen liquid are approximately the same (the luminescence lasts several seconds). The luminescence of the frozen liquid persists even when a screen is put between the crystal and the liquid. This shows that the luminescence of the liquid is not a reflection of the light emitted by the crystal. Fig.10 shows the luminescence of the crystal and of the frozen liquid (in this case, water). The pictures were taken at intervals of 1 second. Analogous behaviour is shown by acetone and ethanol.
The luminescence of the area covered with frozen liquid takes place even if this area is at a distance of several centimeters from the irradiated crystal. The blue luminescence of the frozen liquid can be explained in several ways, and additional experiments are necessary to clarify the reasons for its appearance. The laser intensity in the experiments is about 0.12 GW/cm², and the large enhancement of this field due to the Mie resonance [20], together with the interference effect caused by the structure of the opal matrix, can lead to an extremely large field enhancement, which can play an important role in this effect.

Fig.10. Luminescence of the opal matrix (bright round spot) and of the frozen liquid (large blue spot) on the surface of the Cu plate.

4. Conclusions.

In this paper we have reported some new features of the photonic flame effect. The main features of the PFE are:
- When an artificial opal crystal placed on a Cu plate at the temperature of liquid nitrogen is excited by a ruby laser pulse of nanosecond duration, long-lasting optical luminescence takes place if the threshold of the process is reached;
- When several opal crystals are put on the Cu plate and one of them is irradiated, bright visible luminescence occurs in all samples;
- The temporal behavior and thresholds of the luminescence have been determined. Photonic crystals infiltrated with different nonlinear liquids and without infiltration have been investigated. The observed transport of the excitation between samples spatially separated by several centimeters opens the possibility of practical applications of the PFE;
- Blue luminescence of the frozen liquid on the surface of the Cu plate takes place in the presence of the photonic flame effect.

The photonic flame effect may have different explanations. Probably an essential role is played by plasma properties.
The slow transport of the excitation from the irradiated crystal to the other photonic crystals can be associated with sound waves created by the interaction of the laser pulse with the sample. An exciton mechanism and surface waveguides on the surface of the Cu plate can also play an important role. It was checked that a change of the properties of the plate surface led to a change of the photonic flame effect. Removing the oxide layer from the plate changed the PFE threshold. The luminescence of the frozen liquids on the surface of the Cu plate shows the important role of the electromagnetic field enhancement due to the Mie resonance and Bragg diffraction on the photonic crystal lattice. The electromagnetic field enhancement can lead to the production of laser plasma, electron acceleration and x-ray production.

References
1. N.V.Tcherniega, A.D.Kudryavtseva, ArXiv Physics/0608150 (2006).
2. N.V.Tcherniega, A.D.Kudryavtseva, Journal of Russian Laser Research, V.27, N 5, pp. 400-409 (2006).
3. A.A.Esakov, V.S.Gorelik, A.D.Kudryavtseva, M.V.Tareeva and N.V.Tcherniega, SPIE Proceedings, V 6369, 6369 OE1 - 6369 OE12, Photonic Crystals and Photonic Crystal Fibers for Sensing Applications II; Henry H. Du, Ryan Bise; Eds, (Oct.2006).
4. P. Bykov, J. Eksp. Teor. Fiz., 35, 269 (1972).
5. E.Yablonovitch, Phys. Rev. Lett., 58, 2059 (1987).
6. S.John, Phys. Rev. Lett., 58, 2486 (1987).
7. V. N. Astratov, V. N. Bogomolov, A. A. Kaplyanskii, A. V. Prokofiev, L. A. Samoilovich, S. M. Samoilovich, Yu. A. Vlasov, Nuovo Cimento, D 17, 1349 (1995).
8. A. V. Baryshev, A. A. Kaplyanskii, V. A. Kosobukin, M. F. Limonov, K. B. Samsuev, Fiz.Tverd.Tela, 45, 434 (2003), in Russian.
9. M. Born, E. Wolf, Principles of Optics, Macmillan, New York (1964).
10. A. Yariv, Quantum Electronics, John Wiley and Sons, Inc., New York, London, Sydney (1967).
11. M. G. Martemyanov, D. G. Gusev, I. V. Soboleva, T.V. Dolgova, A. A. Fedyanin, O. A. Aktsipetrov, and G. Marovsky, Laser Physics, 14, 677 (2004).
12. A. A. Fedyanin, O. A.
Aktsipetrov, D. A. Kurdyukov, V. G. Golubev, M. Inoue, Appl.Phys.Letters, 87, 151111 (2005).
13. Foteinopoulou, E.N.Economou, C.M.Soukoulis, Phys.Rev.Lett., 90, 107402 (2003).
14. R. G. Zaporozhchenko, S. Ya. Kilin, A. G. Smirnov, Quantum Electronics, 30, 997 (2000), in Russian.
15. W. T. M. Irvine, M. J. A. de Dood, D. Bouwmeester, Phys.Rev.A 72, 043815 (2005).
16. N. A. Gippius, S. G. Tihodeev, A. Christ, J. Kuhl, H. Giessen, Fiz. Tverd. Tela, 47, 139 (2005).
17. M.H.Kuok, H.S.Lim, S.C.Ng, N.N.Liu, Z.K.Wang, Phys.Rev.Lett., 90, 255502 (2003).
18. Suxia Yang, J.H.Page, Zhengyou Liu, M.L.Cowan, C.T.Chan, Ping Sheng, Phys.Rev.Lett., 93, 024301 (2004).
19. Claudio Conti, Andrea Di Falco, Gaetano Assanto, Optics Express, 12, 823 (2004).
20. G.Mie, Ann.Phys. (Berlin), 25, 377 (1908).

ABSTRACT The spectral, energy and temporal characteristics of radiation in the presence of the photonic flame effect are presented. An artificial opal placed on a Cu plate at the temperature of the liquid nitrogen boiling point (77 K), when irradiated by a nanosecond ruby laser pulse, produces long-term luminescence with a duration of up to ten seconds and a finely structured spectrum in the anti-Stokes part of the spectrum. Analogous visible luminescence, appearing with a time delay, was observed in other samples of artificial opal placed on the same plate. When the opal is infiltrated with different nonlinear liquids, the luminescence threshold is reduced and the spatial distribution of the bright emitting area on the opal surface changes. When frozen nonlinear liquids were put on the Cu plate, long-term bright blue luminescence took place in the frozen liquids. The temporal characteristics of this luminescence are nearly the same as in opal matrices.
<|endoftext|><|startoftext|> Introduction Fundamentals of nonparametric modeling Description of kernel function Nonparametric estimation of PDF pertaining to experimental data Estimation of a physical law Characteristics of the model Predictor quality Redundancy and predictor cost function Example Discussion Conclusions Acknowledgments References ABSTRACT Statistical modeling of experimental physical laws is based on the probability density function of measured variables. It is expressed by experimental data via a kernel estimator. The kernel is determined objectively by the scattering of data during calibration of experimental setup. A physical law, which relates measured variables, is optimally extracted from experimental data by the conditional average estimator. It is derived directly from the kernel estimator and corresponds to a general nonparametric regression. The proposed method is demonstrated by the modeling of a return map of noisy chaotic data. In this example, the nonparametric regression is used to predict a future value of chaotic time series from the present one. The mean predictor error is used in the definition of predictor quality, while the redundancy is expressed by the mean square distance between data points. Both statistics are used in a new definition of predictor cost function. From the minimum of the predictor cost function, a proper number of data in the model is estimated. <|endoftext|><|startoftext|> Introduction This paper is a brief description of a methodology of developing options (in the sense of financial options, e.g., with all Greeks), to be applied in collaboration with Michael Bowman, as a first example to scheduling a massive US Army project, Future Combat Systems (FCS) [1]. The major focus is to develop Real Options for non-financial projects, as discussed in other earlier papers [3,4,12]. Data and some guidance on its use has been reported in a previous study of FCS [2,5]. 
The need for tools for fairly scheduling and pricing such a complex project has been emphasized in Recommendations for Executive Action in a report by the U.S. General Accounting Office (GAO) on FCS [14], and the GAO also emphasizes the need for management of FCS business plans [13].

2. Goals

A given Plan results in $S(t)$, the money allocated by the client/government, defined in terms of Projects $S_i(t)$,

$$S(t) = \sum_i a_i(t)\, S_i(t)$$

where $a_i(t)$ may be some scheduled constraints. PATHTREE processes a probability tree developed over the life of the plan $T$, divided into $N$ nodes at times $\{t_n\}$, each with mean epoch length $dt$ [11]. Options, including all Greeks familiar to financial markets, are calculated for quite arbitrary nonlinear means and variances of multiplicative noise [6,9]. This ability to process nonlinear functions in probability distributions is essential for real-world applications.

Each Task has a range of durations, with nonzero $A_i$, with a disbursement of funds used, defining $S_i(t_n)$. Any Task dependent on a Task completion is slaved to its precursor(s).

We develop the Plan conditional probability density (CPD) in terms of differenced costs $dS$,

$$P(S \pm dS; t_n + dt \mid S; t_n).$$

$P$ is modeled/cast/fit into the functional form

$$P(S \pm dS; t_n + dt \mid S; t_n) = (2\pi g^2 dt)^{-1/2} \exp(-L\,dt), \qquad L = \frac{(dS - f\,dt)^2}{2 g^2 dt^2},$$

where $f$ and $g$ are nonlinear functions of cost $S$ and time $t$. The $g^2$ variance function absorbs the multiple Task cost and schedule statistical spreads, to determine $P(dS, t)$, giving rise to the stochastic nature of dollars spent on the Plan.

A given Project $i$ with Task $k$ has a mean duration $t_{ik}$, with a mean cost $S_{ik}$. The spread in $dS$ has two components arising from: (1) a stochastic duration around the mean duration, and (2) a stochastic spread of mean dollars around a deterministic disbursement at a given time. Different finite-width asymmetric distributions are used for durations and costs.
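The two ingredients of this section can be sketched in a few lines. The first function reads the CPD above as the short-time Gaussian form $P = (2\pi g^2 dt)^{-1/2}\exp(-L\,dt)$ with $L = (dS - f\,dt)^2/(2 g^2 dt^2)$; this reading of the partly illegible equation is an assumption. The second part draws from one possible finite-range asymmetric distribution, using the ASA density $1/(2(|y|+q)\ln(1+1/q))$ that the paper plots in Fig. 1, by inverting its cumulative distribution on each side of the mean. All function and parameter names (`plan_cpd`, `asa_magnitude`, `sample_asymmetric`, `p_high`, `q_low`, `q_high`) are hypothetical and are not taken from PATHTREE or the ASA code.

```python
import math
import random


def plan_cpd(dS, S, t, dt, f, g):
    """Short-time Gaussian CPD P(S + dS; t + dt | S; t) with drift f(S, t) and spread g(S, t)."""
    variance = g(S, t) ** 2 * dt
    lagrangian_dt = (dS - f(S, t) * dt) ** 2 / (2.0 * variance)  # equals L * dt
    return math.exp(-lagrangian_dt) / math.sqrt(2.0 * math.pi * variance)


def asa_magnitude(v, q):
    """Map v in [0, 1] to |y| in [0, 1] for the density 1 / ((|y| + q) * ln(1 + 1/q))."""
    return q * ((1.0 + 1.0 / q) ** v - 1.0)


def sample_asymmetric(rng, p_high, q_low, q_high):
    """Choose a side of the mean with probability p_high for 'higher', then draw an ASA magnitude.

    The result lies in [-1, 1] with the mean placed at 0; rescaling to a duration
    or cost interval is left to the caller.
    """
    if rng.random() < p_high:
        return asa_magnitude(rng.random(), q_high)
    return -asa_magnitude(rng.random(), q_low)


if __name__ == "__main__":
    drift = lambda S, t: 2.0    # hypothetical constant drift
    spread = lambda S, t: 3.0   # hypothetical constant spread
    print(plan_cpd(1.0, 100.0, 0.0, 0.5, drift, spread))
    rng = random.Random(0)
    print([round(sample_asymmetric(rng, 0.7, 0.1, 0.5), 3) for _ in range(5)])
```

A smaller `q` concentrates the draws near the mean on that side, so different values of `q_low` and `q_high` give the asymmetric, finite-width shapes described in the text.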
For example, the distribution created for Adaptive Simulated Annealing (ASA) [8], originally called Very Fast Simulated Re-annealing [7], is a finite-ranged distribution with shape determined by a “temperature” parameter q. For each state (whether duration or cost): (a) A random binary choice can be made to be higher or lower than the mean, using any ratio of probabilities selected by the client. (b) Then, an ASA distribution is used on the chosen side. Each side has a different q, each falling off from the mean. This is illustrated and further described in Fig. 1.

At the end of the tree at a time T (T also can be a parameter), there is a total cost at each node S(T), called a final “strike” in financial language. (A final strike might also appear at any node before T due to cancellation of the Project using a particular kind of schedule alternative.) Working backwards, options are calculated at time t0. Greeks (functional derivatives of the option) assess sensitivity to various variables, e.g., like those discussed in previous papers [12], but here we deliver precise numbers based on as much real-world information as is available.

Lester Ingber - 3 - Real Options for Project Schedules (ROPS)

[Figure: ASA (q = 0.1), the curve 1/(2 * (abs(y) + q) * log(1 + 1/q)) plotted for y from -1 to 1.]

Fig. 1. The ASA distribution can be used to develop finite-range asymmetric distributions from which a value can be chosen for a given state of duration or cost. (a) A random binary distribution is selected for a lower-than or higher-than mean, using any ratio of probabilities selected by the client. Each side of the mean has its own temperature q. Here an ASA distribution is given for q = 0.1. The range can be scaled to any finite interval and the mean placed within this range. (b) A uniform random distribution selects a value from [-1,1], and a normalized ASA value is read off for the given state.

3. Data

The following data are used to develop Plan CPD.
Each Task i has
(a) a Projected allocated cost, $C_i$
(b) a Projected time schedule, $T_i$
(c) a CPD with a statistical width of funds spent, $SWS_i$
(d) a distribution with a statistical width of duration, $SWT_i$
(e) a range of durations, $RT_i$
(f) a range of costs, $RS_i$
Expert guesses need to be provided for (c)-(f) for the prototype study. A given Plan must be constructed among all Tasks, specifying the ordering of Tasks, e.g., obeying any sequential constraints among Tasks.

4. Three Recursive Shells

4.1. Outer Shell

There may be several parameters in the Project, e.g., as coefficients of variables in means and variances of different CPDs. These are optimized in an outer shell using ASA [8]. This end product, including MULTI_MIN states returned by ASA, gives the client flexibility to apply during a full Project [12]. We may wish to minimize Cost/T, or (CostOverrun - CostInitial)/T, etc.

4.2. Middle Shell

To obtain the Plan CPD, a middle shell of Monte Carlo (MC) states is generated from recursive calculations. A Weibull or some other asymmetric finite distribution might be used for Task durations. For a given state in the outer shell, an MC state has durations and mean cost disbursements defined for each Task.

4.3. Inner Shell

At each time, for each Task, the differenced cost $(S_{ik}(t + dt) - S_{ik}(t))$ is subjected to an inner shell stochastic variation, e.g., some asymmetric finite distribution. The net costs $dS_{ik}(t)$ for each Project i and Task k are added to define dS(t) for the Plan. The inner shell cost CPD is re-applied many times to get a set of {dS} at each time.

5. Real Options

5.1. Plan Options

After the Outer MC sampling is completed, there are histograms generated of the Plan’s dS(t) and dS(t)/S(t − dt) at each time t. The histograms are normalized at each time to give P(dS, t). At each time t, the data representing P is “curve-fit” to the form of Eq.
(0), where f and g are functions needed to get good fits, e.g., fitting coefficients of parameters {x}

$$f = x_{f0} + x_{f1} S + x_{f2} S^2 + \dots$$

$$g = x_{g0} + x_{g1} S + x_{g2} S^2 + \dots$$

At each time t, the functions f and g are fit to the function ln(P(dS, t)), which includes the prefactor containing g and the function L, which may be viewed as a Padé approximant of these polynomials. Complex constraints as functions of $S_{ik}(t)$ can be easily incorporated in this approach, e.g., due to regular reviews by funding agencies or executives. These P’s are input into PATHTREE to calculate options for a given strategy or Plan.

5.2. Risk Management of Project Options

If some measure of risk among Projects is desired, then during the MC calculations developed for the top-level Plan, sets of differenced costs for each Project, $dS_i(t)$ and $dS_i(t)/S_i(t - dt)$, are stored from each of the Project’s Tasks. Then, histograms and Project CPDs are developed, similar to the development of the Plan CPD. A copula analysis, coded in TRD for risk management of financial markets, is applied to develop a relative risk analysis among these Projects [10]. In such an analysis, the Project marginal CPDs are all transformed to Gaussian spaces, where it makes sense to calculate covariances and correlations. An audit trail back to the original Project spaces permits analysis of risk dependent on the tails of the Project CPDs.

6. Generic Applications

ROPS can be applied to any complex scheduling of tasks similar to the FCS project. The need for government agencies to plan and monitor such large projects is becoming increasingly difficult and necessary [15]. Many large businesses have similar projects and similar requirements to manage their complex projects.

References

[1] M. Bowman and L. Ingber, “Real Options for US Army Future Combat Systems,” Report 2007:ROFCS, Lester Ingber Research, Ashland, OR, 2007.
[2] G.G. Brown, R.T. Grose, and R.A.
Koyak, “Estimating total program cost of a long-term, high-technology, high-risk project with task durations and costs that may increase over time,” Military Operations Research 11, 41-62 (2006). [URL http://www.nps.navy.mil/orfacpag/resumePages/papers/Brownpa/Estimating_total_program_cost.pdf]
[3] T.E. Copeland and P.T. Keenan, “Making real options real,” McKinsey Quarterly 128-141 (1998). [URL http://faculty.fuqua.duke.edu/~charvey/Teaching/BA456_2006/McK98_3.pdf]
[4] G. Glaros, “Real options for defense,” Transformation Trends June, 1-11 (2003). [URL http://www.oft.osd.mil/library/library_files/trends_205_transformation_trends_9_june%202003_issue.pdf]
[5] R. Grose, “Cost-constrained project scheduling with task durations and costs that may increase over time: Demonstrated with the U.S. Army future combat systems,” Thesis, Naval Postgraduate School, Monterey, CA, 2004. [URL http://www.stormingmedia.us/75/7594/A759424.html]
[6] J.C. Hull, Options, Futures, and Other Derivatives, 4th Edition (Prentice Hall, Upper Saddle River, NJ, 2000).
[7] L. Ingber, “Very fast simulated re-annealing,” Mathl. Comput. Modelling 12, 967-973 (1989). [URL http://www.ingber.com/asa89_vfsr.pdf]
[8] L. Ingber, “Adaptive Simulated Annealing (ASA),” Global optimization C-code, Caltech Alumni Association, Pasadena, CA, 1993. [URL http://www.ingber.com/#ASA-CODE]
[9] L. Ingber, “Statistical mechanics of portfolios of options,” Report 2002:SMPO, Lester Ingber Research, Chicago, IL, 2002. [URL http://www.ingber.com/markets02_portfolio.pdf]
[10] L. Ingber, “Trading in Risk Dimensions (TRD),” Report 2005:TRD, Lester Ingber Research, Ashland, OR, 2005.
[11] L. Ingber, C. Chen, R.P. Mondescu, D. Muzzall, and M. Renedo, “Probability tree algorithm for general diffusion processes,” Phys. Rev. E 64, 056702-056707 (2001). [URL http://www.ingber.com/path01_pathtree.pdf]
[12] K.J. Leslie and M.P. Michaels, “The real power of real options,” McKinsey Quarterly 4-22 (1997).
[URL http://faculty.fuqua.duke.edu/~charvey/Teaching/BA456_2006/McK97_3.pdf]
[13] General Accounting Office, “Future Combat System Risks Underscore the Importance of Oversight,” Report GAO-07-672T, GAO, Washington DC, 2007. [URL http://www.gao.gov/cgi-bin/getrpt?GAO-07-672T]
[14] General Accounting Office, “Key Decisions to Be Made on Future Combat System,” Report GAO-07-376, GAO, Washington DC, 2007. [URL http://www.gao.gov/cgi-bin/getrpt?GAO-07-376]
[15] B. Wysocki, Jr, “Is U.S. Government ‘Outsourcing Its Brain’?,” Wall Street Journal March 30, 1 (2007).

ABSTRACT

Real Options for Project Schedules (ROPS) has three recursive sampling/optimization shells. An outer Adaptive Simulated Annealing (ASA) optimization shell optimizes parameters of strategic Plans containing multiple Projects containing ordered Tasks. A middle shell samples probability distributions of durations of Tasks. An inner shell samples probability distributions of costs of Tasks. PATHTREE is used to develop options on schedules. Algorithms used for Trading in Risk Dimensions (TRD) are applied to develop a relative risk analysis among projects. <|endoftext|><|startoftext|> Introduction

We shall start with

Definition. Suppose that n ≥ 2 is an integer. We will say that a group M has the property (nCC) if there are exactly n conjugacy classes of elements in M.

Note that a group M has (2CC) if and only if any two non-trivial elements are conjugate in M. For two elements x, y of some group G, we shall write x ∼ y if x and y are conjugate in G, and x ≁ y if they are not. For a group G, denote by π(G) the set of all finite orders of elements of G.

A classical theorem of G. Higman, B. Neumann and H. Neumann ([8]) states that every countable group G can be embedded into a countable (but infinitely generated) group M, where any two elements of the same order are conjugate and π(M) = π(G). For any integer n ≥ 2, take $G = \mathbb{Z}/2^{n-2}\mathbb{Z}$ and embed G into a countable group M according to the theorem above.
Then card(π(M)) = card(π(G)) = n − 1. Since, in addition, M will always contain an element of infinite order, the theorem of Higman-Neumann-Neumann implies that M has (nCC).

Another way to construct infinite groups with finitely many conjugacy classes was suggested by S. Ivanov [15, Thm. 41.2], who showed that for every sufficiently large prime p there is an infinite 2-generated group $M_p$ of exponent p possessing exactly p conjugacy classes. The group $M_p$ is constructed as a direct limit of word hyperbolic groups, and, as noted in [21], it is impossible to obtain an infinite group with (2CC) in the same manner.

2000 Mathematics Subject Classification. 20F65, 20E45, 20F28.
Key words and phrases. Conjugacy Classes, Relatively Hyperbolic Groups, Outer Automorphism Groups.
This work was supported by the Swiss National Science Foundation Grant ♯ PP002-68627.
http://arxiv.org/abs/0704.0091v2
ASHOT MINASYAN

In the recent paper [21] D. Osin developed a theory of small cancellation over relatively hyperbolic groups and used it to obtain the following remarkable result:

Theorem 1.1 ([21], Thm. 1.1). Any countable group G can be embedded into a 2-generated group M such that any two elements of the same order are conjugate in M and π(M) = π(G).

Applying this theorem to the group $G = \mathbb{Z}/2^{n-2}\mathbb{Z}$ one can show that for each integer n ≥ 2 there exists a 2-generated group with (nCC). And when n = 2 we get a 2-generated torsion-free group that has exactly two conjugacy classes.

The presence of elements of finite orders in the above constructions was important, because if two elements have different orders, they can never be conjugate. So, naturally, one can ask the following

Question 1. Do there exist torsion-free (finitely generated) groups with (nCC), for any integer n ≥ 3?

Note that if G is the finitely generated group with (2CC) constructed by Osin, then the m-th direct power $G^m$ of G is also a finitely generated torsion-free group which satisfies ($2^m$CC).
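The count behind this remark is multiplicative: two tuples in a direct product are conjugate exactly when they are conjugate coordinate by coordinate, so a direct product of groups with $c_1$ and $c_2$ conjugacy classes has $c_1 c_2$ classes. The following Python sketch checks this by brute force. It is only an illustration on small finite groups, which are stand-ins chosen here (Osin's (2CC) group is infinite and is not modelled), and all helper names are hypothetical.

```python
from itertools import permutations, product


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def class_count(elements, mul, inv):
    """Count conjugacy classes of a finite group given by its element list."""
    remaining = set(elements)
    count = 0
    while remaining:
        x = next(iter(remaining))
        # The conjugacy class of x is {g x g^-1 : g in the group}.
        remaining -= {mul(mul(g, x), inv(g)) for g in elements}
        count += 1
    return count


def direct_power(elements, mul, inv, m):
    """Return (elements, mul, inv) for the m-th direct power of a group."""
    elems = list(product(elements, repeat=m))

    def pmul(a, b):
        return tuple(mul(x, y) for x, y in zip(a, b))

    def pinv(a):
        return tuple(inv(x) for x in a)

    return elems, pmul, pinv


# Assumed toy examples: Z/2 (the only finite group with exactly 2 classes)
# and the symmetric group S_3 (3 classes).
z2 = list(permutations(range(2)))
s3 = list(permutations(range(3)))

for name, group in (("Z/2", z2), ("S_3", s3)):
    base = class_count(group, compose, inverse)
    for m in (1, 2, 3):
        elems, pmul, pinv = direct_power(group, compose, inverse, m)
        print(name, "m =", m, "classes =", class_count(elems, pmul, pinv),
              "expected =", base ** m)
```

For Z/2 the loop reproduces the pattern 2, 4, 8 that the remark asserts for powers of a (2CC) group.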
But what if we want to achieve a torsion-free group with (3CC)? With this purpose one could come up with

Question 2. Suppose that G is a countable torsion-free group and x, y ∈ G are non-conjugate. Is it possible to embed G into a group M, which has (3CC), so that x and y stay non-conjugate in M?

Unfortunately, the answer to Question 2 is negative as the following example shows.

Example 1. Consider the group

(1.1)  $G_1 = \langle a, t \,\|\, tat^{-1} = a^{-1} \rangle$

which is isomorphic to the non-trivial semidirect product Z ⋊ Z. Note that $G_1$ is torsion-free, and t is not conjugate to $t^{-1}$ in $G_1$ because $t \nsim t^{-1}$ in the infinite cyclic group ⟨t⟩, which is canonically isomorphic to the quotient of $G_1$ by the normal closure of a. However, if $G_1$ is embedded into a (3CC)-group M, it is easy to see that every element of M will be conjugate to its inverse. Indeed, a ∼ $a^{-1}$ already in $G_1$; if y ∈ M \ {1} and $y \nsim y^{-1}$ in M, then y and $y^{-1}$ represent the two non-trivial classes of M, so $y^{\epsilon} \sim a$ for some ε ∈ {1, −1}, hence $y^{-\epsilon} \sim a^{-1} \sim a \sim y^{\epsilon}$, a contradiction. In particular, t ∼ $t^{-1}$ in M.

An analog of the above example can be given for each n ≥ 3 (see Section 3).

This example shows that, in order to get a positive result, one would have to strengthen the assumptions of Question 2. Let G be a group. Two elements x, y ∈ G are said to be commensurable if there exist k, l ∈ Z \ {0} such that $x^k$ is conjugate to $y^l$. We will use the notation x ≈ y if x and y are commensurable in G. In the case when x is not commensurable with y we will write x ≉ y. Observe that commensurability, as well as conjugacy, defines an equivalence relation on the set of elements of G. It is somewhat surprising that if one replaces the words “non-conjugate” with the words “non-commensurable” in Question 2, the answer becomes positive:

Corollary 1.2. Assume that G is a countable torsion-free group, n ∈ N, n ≥ 2, and $x_1, \dots, x_{n-1} \in G \setminus \{1\}$ are pairwise non-commensurable. Then there exists a group M and an injective homomorphism ϕ : G → M such that 1.
M is torsion-free and generated by two elements;
2. M has (nCC);
3. M is 2-boundedly simple;
4. the elements $\varphi(x_1), \dots, \varphi(x_{n-1})$ are pairwise non-commensurable in M.

GROUPS WITH FINITELY MANY CONJUGACY CLASSES

Recall that a group G is said to be k-boundedly simple if for any x, y ∈ G \ {1} there exist l ≤ k and $g_1, \dots, g_l \in G$ such that $x = g_1 y g_1^{-1} \cdots g_l y g_l^{-1}$ in G. A group is called boundedly simple if it is k-boundedly simple for some k ∈ N. Evidently every boundedly simple group is simple; the converse is not true in general. For example, the infinite alternating group $A_\infty$ is simple but not boundedly simple because conjugation preserves the type of the decomposition of a permutation into a product of cycles. First examples of torsion-free finitely generated boundedly simple groups were constructed by A. Muranov (see [12, Thm. 2], [13, Thm. 1]).

Corollary 1.2 is an immediate consequence of a more general Theorem 3.5 that will be proved in Section 3. Applying Corollary 1.2 to the group $G = F(x_1, \dots, x_{n-1})$, which is free on the set $\{x_1, \dots, x_{n-1}\}$, and its non-commensurable elements $x_1, \dots, x_{n-1}$, we obtain a positive answer to Question 1:

Corollary 1.3. For every integer n ≥ 3 there exists a torsion-free 2-boundedly simple group satisfying (nCC) and generated by two elements.

(In the case when n = 2 the above statement was obtained by Osin in [21, Cor. 1.3].)

In fact, for any (finitely generated) torsion-free group H we can set $G = H * F(x_1, \dots, x_{n-1})$, and then use Corollary 1.2 to embed G into a group M enjoying the properties 1–4 from its claim. Since there is a continuum of pairwise non-isomorphic 2-generated torsion-free groups ([4]), and a finitely generated group can contain at most countably many different 2-generated subgroups, this shows that there must be continuum many pairwise non-isomorphic groups satisfying properties 1–3 from Corollary 1.2.
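The obstruction for the infinite alternating group mentioned above can be seen concretely in a finite symmetric group: every conjugate of a 3-cycle is again a 3-cycle, so a product of k conjugates moves at most 3k points, and no fixed k can reach permutations with arbitrarily large support. The sketch below checks both facts numerically. It is an illustration under assumptions made here (permutations of 20 points, conjugators drawn from the whole symmetric group rather than the alternating group, hypothetical helper names), not a proof.

```python
import random


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def cycle_type(p):
    """Return the cycle lengths of p, sorted in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)


def support_size(p):
    """Number of points moved by p."""
    return sum(1 for i, image in enumerate(p) if image != i)


def product_of_conjugates(y, k, rng):
    """Multiply k conjugates g y g^-1 of y, with each g chosen at random."""
    n = len(y)
    result = tuple(range(n))
    for _ in range(k):
        g = list(range(n))
        rng.shuffle(g)
        g = tuple(g)
        conjugate = compose(compose(g, y), inverse(g))
        # Conjugation never changes the cycle type.
        assert cycle_type(conjugate) == cycle_type(y)
        result = compose(result, conjugate)
    return result


n = 20
three_cycle = (1, 2, 0) + tuple(range(3, n))
rng = random.Random(0)
for k in range(1, 5):
    x = product_of_conjugates(three_cycle, k, rng)
    print("k =", k, "support size =", support_size(x), "bound =", 3 * k)
```

A permutation moving more than 3k points therefore cannot be written as a product of k conjugates of a 3-cycle, which is the reason given in the text for the failure of bounded simplicity.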
Recall that the rank rank(G) of a group G is the minimal number of elements required to generate G. In Section 4 we show how the classical theory of HNN-extensions allows one to construct different embeddings into (infinitely generated) groups that have finitely many classes of conjugate elements, and in Section 5 we use Osin’s results (from [21]) regarding quotients of relatively hyperbolic groups to prove

Theorem 1.4. Let H be a torsion-free countable group and let M ⊳ H be a non-trivial normal subgroup. Then H can be isomorphically embedded into a torsion-free group Q, possessing a normal subgroup N ⊳ Q, such that
• Q = H · N and H ∩ N = M (hence Q/N ≅ H/M);
• N has (2CC);
• for all x, y ∈ Q \ {1}, x ∼ y in Q if and only if ϕ(x) ∼ ϕ(y) in Q/N, where ϕ : Q → Q/N is the natural homomorphism;
• rank(N) = 2 and rank(Q) ≤ rank(H/M) + 2.

This theorem implies that if Q/N ≅ H/M has exactly (n − 1) conjugacy classes (e.g., if it is finite), then the group Q will have (nCC) and will not be simple (if n ≥ 3). Thus it may be used to build (nCC)-groups in a recursive manner.

It also allows one to obtain embeddings of countable torsion-free groups into (nCC)-groups, which we could not get by using Corollary 1.2. For instance, as we saw in Example 1, the fundamental group of the Klein bottle $G_1$, given by (1.1), cannot be embedded into a (3CC)-group M so that $t \nsim t^{-1}$. However, with 4 conjugacy classes this is already possible: see Corollary 5.5 in Section 5. The idea is as follows: the group $G_1$ can be mapped onto Z/3Z in such a way that the images of the elements t and $t^{-1}$ are distinct. Let M be the kernel of this homomorphism. One can apply Theorem 1.4 to the pair ($G_1$, M) to obtain the required embedding of $G_1$ into a group Q. And since Z/3Z has exactly 3 conjugacy classes, the group Q will have (4CC).

An application of Theorem 1.4 to the case when H = Z and M = 2Z ⊳ H also provides an affirmative answer to a question of A. Izosov from [9, Q.
11.42], asking whether there exists a torsion-free (3CC)-group Q that contains a normal subgroup N of index 2.

The goal of the second part of this article is to show that every countable group can be realized as a group of outer automorphisms of some finitely generated (2CC)-group. This problem has some historical background: in [11] T. Matumoto proved that every group is a group of outer automorphisms of some group (in contrast, there are groups, e.g., Z, that are not full automorphism groups of any group); M. Droste, M. Giraudet, R. Göbel ([7]) showed that for every group C there exists a simple group S such that Out(S) ≅ C; I. Bumagina and D. Wise in [3] proved that each countable group C is isomorphic to Out(N) where N is a 2-generated subgroup of a countable C′(1/6)-group, and if, in addition, C is finitely presented then one can choose N to be residually finite.

In Section 6 we establish a few useful statements regarding paths in the Cayley graph of a relatively hyperbolic group G, and apply them in Section 7 to obtain small cancellation quotients of G satisfying certain conditions. Finally, in Section 8 we prove the following

Theorem 1.5. Let C be an arbitrary countable group. Then for every non-elementary torsion-free word hyperbolic group $F_1$ there exists a torsion-free group N satisfying the following properties:
• N is a 2-generated quotient of $F_1$;
• N has (2CC);
• Out(N) ≅ C.

The principal difference between this theorem and the result of [3] is that our group N is torsion-free and simple. Moreover, if one applies Theorem 1.5 to the case when $F_1$ is a torsion-free hyperbolic group with Kazhdan’s property (T) (and recalls that every quotient of a group with property (T) also has (T)), one will get

Corollary 1.6. For any countable group C there is a 2-generated group N such that N has (2CC) and Kazhdan’s property (T), and Out(N) ≅ C.

The reason why Kazhdan’s property (T) is interesting in this context is the question from [6, p.
134] which asked whether there exist groups that satisfy property (T) and have infinite outer automorphism groups (it can be motivated by a theorem of F. Paulin [22] which claims that the outer automorphism group is finite for any word hyperbolic group with property (T)). Positive answers to this question were obtained (using different methods) by Y. Ollivier and D. Wise [14], Y. de Cornulier [5], and I. Belegradek and D. Osin [2]. Corollary 1.6 not only shows that the group of outer automorphisms of a group N with property (T) can be infinite, but also demonstrates that there are no restrictions whatsoever on Out(N).

Acknowledgements. The author would like to thank D. Osin for fruitful discussions and encouragement.

2. Relatively hyperbolic groups

Assume that G is a group, $\{H_\lambda\}_{\lambda\in\Lambda}$ is a fixed collection of subgroups of G (called peripheral subgroups), and X is a subset of G. The subset X is called a relative generating set of G with respect to $\{H_\lambda\}_{\lambda\in\Lambda}$ if G is generated by $X \cup \bigcup_{\lambda\in\Lambda} H_\lambda$. In this case G is a quotient of the free product $F = (*_{\lambda\in\Lambda} H_\lambda) * F(X)$, where F(X) is the free group with basis X. Let $\mathcal{R}$ be a subset of F such that the kernel of the natural epimorphism F → G is the normal closure of $\mathcal{R}$ in the group F; then we will say that G has relative presentation

(2.1)  $\langle X, \{H_\lambda\}_{\lambda\in\Lambda} \,\|\, R = 1,\ R \in \mathcal{R} \rangle$.

If the sets X and $\mathcal{R}$ are finite, the relative presentation (2.1) is said to be finite.

Set $\mathcal{H} = \bigsqcup_{\lambda\in\Lambda} (H_\lambda \setminus \{1\})$. A finite relative presentation (2.1) is said to satisfy a linear relative isoperimetric inequality if there exists C > 0 such that, for every word w in the alphabet $X \cup \mathcal{H}$ (for convenience, we will further assume that $X^{-1} = X$) representing the identity in the group G, one has

$$w = \prod_{i=1}^{k} f_i^{-1} R_i^{\pm 1} f_i,$$

with equality in the group F, where $R_i \in \mathcal{R}$, $f_i \in F$, for i = 1, . . . , k, and k ≤ C‖w‖, where ‖w‖ is the length of the word w.

The next definition is due to Osin (see [20]):

Definition.
The group G is called hyperbolic relative to (the collection of peripheral subgroups) $\{H_\lambda\}_{\lambda\in\Lambda}$, if G admits a finite relative presentation (2.1) satisfying a linear relative isoperimetric inequality.

This definition is independent of the choice of the finite generating set X and the finite set $\mathcal{R}$ in (2.1) (see [20]). We would also like to note that, in general, it does not require the group G to be finitely generated, which will be important in this paper. The definition immediately implies the following basic facts:

Remark 2.1 ([20]). (a) Let $\{H_\lambda\}_{\lambda\in\Lambda}$ be an arbitrary family of groups. Then the free product $G = *_{\lambda\in\Lambda} H_\lambda$ will be hyperbolic relative to $\{H_\lambda\}_{\lambda\in\Lambda}$.
(b) Any word hyperbolic group (in the sense of Gromov) is hyperbolic relative to the family {{1}}, where {1} denotes the trivial subgroup.

Recall that a group H is called elementary if it has a cyclic subgroup of finite index. Further in this section we will assume that G is a non-elementary group hyperbolic relative to a family of proper subgroups $\{H_\lambda\}_{\lambda\in\Lambda}$. An element g ∈ G is said to be parabolic if it is conjugate to an element of $H_\lambda$ for some λ ∈ Λ. Otherwise g is said to be hyperbolic. Given a subgroup S ≤ G, we denote by $S^0$ the set of all hyperbolic elements of S of infinite order.

Lemma 2.2 ([17], Thm. 4.3, Cor. 1.7). For every $g \in G^0$ the following conditions hold.
1) The element g is contained in a unique maximal elementary subgroup $E_G(g)$ of G, where

(2.2)  $E_G(g) = \{f \in G : f g^n f^{-1} = g^{\pm n} \text{ for some } n \in \mathbb{N}\}$.

2) The group G is hyperbolic relative to the collection $\{H_\lambda\}_{\lambda\in\Lambda} \cup \{E_G(g)\}$.

Recall that a non-trivial subgroup H ≤ G is called malnormal if for every g ∈ G \ H, $H \cap gHg^{-1} = \{1\}$. The next lemma is a special case of Theorem 1.4 from [20]:

Lemma 2.3. For any λ ∈ Λ and any $g \notin H_\lambda$, the intersection $H_\lambda \cap gH_\lambda g^{-1}$ is finite. If h ∈ G, µ ∈ Λ and µ ≠ λ, then the intersection $H_\lambda \cap hH_\mu h^{-1}$ is finite. In particular, if G is torsion-free then $H_\lambda$ is malnormal (provided that $H_\lambda \neq \{1\}$).
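To make the definition of malnormality concrete, the following Python sketch tests it by brute force in the symmetric group on three points. This is a finite toy example chosen here for illustration only: it has torsion, so it is not an instance of the torsion-free setting of Lemma 2.3, and the helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def is_malnormal(subgroup, group):
    """Check that H ∩ gHg^-1 = {1} for every g in the group outside H."""
    h_set = set(subgroup)
    identity = tuple(range(len(subgroup[0])))
    for g in group:
        if g in h_set:
            continue
        conjugate = {compose(compose(g, h), inverse(g)) for h in h_set}
        if h_set & conjugate != {identity}:
            return False
    return True


s3 = list(permutations(range(3)))
identity = (0, 1, 2)
order_two = [identity, (1, 0, 2)]         # generated by one transposition
a3 = [identity, (1, 2, 0), (2, 0, 1)]     # the normal subgroup of 3-cycles

print("order-2 subgroup malnormal:", is_malnormal(order_two, s3))
print("A_3 malnormal:", is_malnormal(a3, s3))
```

The subgroup generated by a transposition passes the test because only its own elements centralize the transposition, while a non-trivial proper normal subgroup such as the group of 3-cycles always fails it, since it coincides with each of its conjugates.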
Lemma 2.4 ([20], Thm. 2.40). Suppose that a group G is hyperbolic relative to a collection of subgroups $\{H_\lambda\}_{\lambda\in\Lambda} \cup \{S_1, \dots, S_m\}$, where $S_1, \dots, S_m$ are word hyperbolic (in the ordinary non-relative sense). Then G is hyperbolic relative to $\{H_\lambda\}_{\lambda\in\Lambda}$.

Lemma 2.5 ([19], Cor. 1.4). Let G be a group which is hyperbolic relative to a collection of subgroups $\{H_\lambda\}_{\lambda\in\Lambda} \cup \{K\}$. Suppose that K is finitely generated and there is a monomorphism $\alpha : K \to H_\nu$ for some ν ∈ Λ. Then the HNN-extension $\langle G, t \,\|\, txt^{-1} = \alpha(x),\ x \in K \rangle$ is hyperbolic with respect to $\{H_\lambda\}_{\lambda\in\Lambda}$.

In [21] Osin introduced the following notion: a subgroup S ≤ G is suitable if there exist two elements $g_1, g_2 \in S^0$ such that $g_1 \not\approx g_2$ and $E_G(g_1) \cap E_G(g_2) = \{1\}$. For any S ≤ G with $S^0 \neq \emptyset$, one sets

(2.3)  $E_G(S) = \bigcap_{g \in S^0} E_G(g)$,

which is obviously a subgroup of G normalized by S. Note that $E_G(S) = \{1\}$ if the subgroup S is suitable in G. As shown in [1, Lemma 3.3], if S is non-elementary and $S^0 \neq \emptyset$ then $E_G(S)$ is the unique maximal finite subgroup of G normalized by S.

Lemma 2.6. Let $\{H_\lambda\}_{\lambda\in\Lambda}$ be a family of groups and let F be a torsion-free non-elementary word hyperbolic group. Then the free product $G = (*_{\lambda\in\Lambda} H_\lambda) * F$ is hyperbolic relative to $\{H_\lambda\}_{\lambda\in\Lambda}$ and F is a suitable subgroup of G.

Proof. Indeed, G is hyperbolic relative to $\{H_\lambda\}_{\lambda\in\Lambda}$ by Remark 2.1 and Lemma 2.4. Since F is non-elementary, there are elements of infinite order x, y ∈ F such that $x \not\approx y$ (see, for example, [16, Lemma 3.2]). Evidently, x and y are hyperbolic elements of G that are not commensurable with each other, and the subgroups $E_G(x) = E_F(x) \le F$, $E_G(y) = E_F(y) \le F$ are cyclic (as elementary subgroups of a torsion-free group). Hence $E_G(x) \cap E_G(y) = \{1\}$, and thus F is suitable in G. □

Lemma 2.7 ([21], Lemma 2.3). Suppose that G is a group hyperbolic relative to a family of subgroups $\{H_\lambda\}_{\lambda\in\Lambda}$ and S ≤ G is a suitable subgroup. Then one can find infinitely many pairwise non-commensurable (in G) elements $g_1, g_2, \dots \in S^0$ such that $E_G(g_i) \cap E_G(g_j) = \{1\}$ for all i ≠ j.
The following theorem was proved by Osin in [21] using the theory of small cancellation over relatively hyperbolic groups, and represents our main tool for obtaining new quotients of such groups having a number of prescribed properties:

Theorem 2.8 ([21], Thm. 2.4). Let G be a torsion-free group hyperbolic relative to a collection of subgroups $\{H_\lambda\}_{\lambda\in\Lambda}$, let S be a suitable subgroup of G, and let T, U be arbitrary finite subsets of G. Then there exist a group $G_1$ and an epimorphism $\eta : G \to G_1$ such that:
(i) the restriction of η to $\bigcup_{\lambda\in\Lambda} H_\lambda \cup U$ is injective, and the group $G_1$ is hyperbolic relative to the collection $\{\eta(H_\lambda)\}_{\lambda\in\Lambda}$;
(ii) for every t ∈ T, we have η(t) ∈ η(S);
(iii) η(S) is a suitable subgroup of $G_1$;
(iv) $G_1$ is torsion-free;
(v) the kernel ker(η) of η is generated (as a normal subgroup of G) by a finite collection of elements belonging to T · S.

We have slightly changed the original formulation of the above theorem from [21], demanding the injectivity on $V = \bigcup_{\lambda\in\Lambda} H_\lambda \cup U$ (instead of just $\bigcup_{\lambda\in\Lambda} H_\lambda$) and adding the last point concerning the generators of the kernel. The latter follows from the explicit form of the relations imposed on G (see the proof of Thm. 2.4 in [21]), and the former follows from part 2 of Lemma 5.1 in [21] and the fact that any element from V has length (in the alphabet $X \cup \mathcal{H}$) at most N, where $N = \max\{|h|_{X\cup\mathcal{H}} : h \in U\} + 1$.

3. Groups with finitely many conjugacy classes

Lemma 3.1. Let G be a group and let $x_1, x_2, x_3, x_4 \in G$ be elements of infinite order such that $x_1 \not\approx x_i$ in G, i = 2, 3, 4. Let $H = \langle G, t \,\|\, tx_3t^{-1} = x_4 \rangle$ be the HNN-extension of G with associated cyclic subgroups generated by $x_3$ and $x_4$. Then $x_1 \not\approx x_2$ in H.

Proof. Arguing by contradiction, assume that $h x_1^l h^{-1} x_2^m = 1$ for some h ∈ H, l, m ∈ Z \ {0}. The element h has a reduced presentation of the form $h = g_0 t^{\epsilon_1} g_1 t^{\epsilon_2} \cdots t^{\epsilon_k} g_k$ where $g_0, \dots, g_k \in G$, ǫ1, . . .
, ǫk ∈ Z \ {0}, and

$g_j \notin \langle x_3 \rangle$ if 1 ≤ j ≤ k − 1 and $\epsilon_j > 0$, $\epsilon_{j+1} < 0$,
$g_j \notin \langle x_4 \rangle$ if 1 ≤ j ≤ k − 1 and $\epsilon_j < 0$, $\epsilon_{j+1} > 0$.

By the assumptions, $x_1 \not\approx x_2$ in G, hence k ≥ 1, and in the group H we have

(3.1)  $h x_1^l h^{-1} x_2^m = g_0 t^{\epsilon_1} g_1 t^{\epsilon_2} \cdots t^{\epsilon_k} g_k x_1^l g_k^{-1} t^{-\epsilon_k} \cdots t^{-\epsilon_2} g_1^{-1} t^{-\epsilon_1} \tilde{g}_0 = 1$,

where $\tilde{g}_0 = g_0^{-1} x_2^m \in G$. By Britton’s Lemma (see [10, IV.2]), the left-hand side in (3.1) cannot be a reduced word, and this can happen only if $g_k x_1^l g_k^{-1}$ belongs to either $\langle x_3 \rangle$ or $\langle x_4 \rangle$ in G, which would contradict the assumptions. Thus the lemma is proved. □

Definition. Suppose that G is a group and $X_i \subset G$, i ∈ I, is a family of subsets. We shall say that $X_i$, i ∈ I, are independent if no element of $X_i$ is commensurable with an element of $X_j$ whenever i ≠ j, i, j ∈ I.

Lemma 3.2. Assume that G is a countable torsion-free group, n ∈ N, n ≥ 2, and non-empty subsets $X_i \subset G \setminus \{1\}$, i = 1, . . . , n − 1, are independent in G. Then G can be (isomorphically) embedded into a countable torsion-free group M in such a way that M has (nCC) and the subsets $X_i$, i = 1, . . . , n − 1, remain independent in M.

Proof. For each i = 1, . . . , n − 1, fix an element $x_i \in X_i$. First we embed G into a countable torsion-free group $G_1$ such that for each non-trivial element g ∈ G there exist j ∈ {1, . . . , n − 1} and $t \in G_1$ satisfying $tgt^{-1} = x_j$ in $G_1$, and the subsets $X_i$, i = 1, . . . , n − 1, stay independent in $G_1$.

Let $g_1, g_2, \dots$ be an enumeration of all non-trivial elements of G. Set G(0) = G and suppose that we have already constructed the group G(k), containing G, so that for each l ∈ {1, . . . , k} there is j ∈ {1, . . . , n − 1} such that the element $g_l$ is conjugate in G(k) to $x_j$, and $X_i$, i = 1, . . . , n − 1, are independent in G(k).

Suppose, at first, that $g_{k+1}$ is commensurable in G(k) with an element of $X_j$ for some j. Then $g_{k+1} \not\approx h$ for every $h \in \bigcup_{i=1, i\neq j}^{n-1} X_i$. Define G(k + 1) to be the HNN-extension $\langle G(k), t_{k+1} \,\|\, t_{k+1} g_{k+1} t_{k+1}^{-1} = x_j \rangle$. By Lemma 3.1 the subsets $X_i$, i = 1, . . . , n − 1, will remain independent in G(k + 1).
Thus we can assume that $g_{k+1}$ is not commensurable with any element from $\bigcup_{i=1}^{n-1} X_i$ in G(k). According to the induction hypotheses one can apply Lemma 3.1 to the HNN-extension $G(k+1) = \langle G(k), t_{k+1} \,\|\, t_{k+1} g_{k+1} t_{k+1}^{-1} = x_1 \rangle$ to see that the subsets $X_i \subset G \le G(k+1)$, i = 1, . . . , n − 1, are independent in G(k + 1).

Now, set $G_1 = \bigcup_{k=0}^{\infty} G(k)$. Evidently $G_1$ has the required properties. In the same manner, one can embed $G_1$ into a countable torsion-free group $G_2$ so that each non-trivial element of $G_1$ will be conjugate to $x_i$ in $G_2$, for some i ∈ {1, . . . , n − 1}, and the subsets $X_i$, i = 1, . . . , n − 1, continue to be independent in $G_2$. Proceeding like that we obtain the desired group $M = \bigcup_{s=1}^{\infty} G_s$. By the construction, M is a torsion-free countable group which has exactly n conjugacy classes: $[1], [x_1], \dots, [x_{n-1}]$. The subsets $X_i$, i = 1, . . . , n − 1, are independent in M because they are independent in $G_s$ for each s ∈ N. □

Corollary 3.3. In Lemma 3.2 one can add that the group M is 2-boundedly simple.

Proof. Let a torsion-free countable group G and its non-empty independent subsets $X_i$, i = 1, . . . , n − 1, be as in Lemma 3.2. Let $F = F(a_1, \dots, a_{n-1}, b_1, \dots, b_{n-1})$ be the free group with the free generating set $\{a_1, \dots, a_{n-1}, b_1, \dots, b_{n-1}\}$, and consider the group $\bar{G} = G * F$. For each i = 1, . . . , n − 1, define

$$\bar{X}_i = X_i \cup \{a_i, a_i^{-1}\} \cup \{[a_j, b_i] \mid j = 1, \dots, n-1,\ j \neq i\} \subset \bar{G},$$

where $[a_j, b_i] = a_j b_i a_j^{-1} b_i^{-1}$. Using the universal properties of free groups and free products one can easily see that the subsets $\bar{X}_i$, i = 1, . . . , n − 1, are independent in $\bar{G}$.

Now we apply Lemma 3.2 to find a countable torsion-free (nCC)-group M, containing $\bar{G}$, such that $\bar{X}_i$, i = 1, . . . , n − 1, are independent in M. Observe that this implies that for any given i = 1, . . . , n − 1, any two elements of $\bar{X}_i$ are conjugate in M. For arbitrary x, y ∈ M \ {1} there exist i, j ∈ {1, . . . , n − 1} such that $x \sim a_i$ and $y \sim a_j$. If i = j then x ∼ y.
Otherwise, $y \sim a_j^{-1}$ and $x \sim [a_j, b_i]$, which is a product of two conjugates of $a_j$, and, hence, of y. Therefore the group M is 2-boundedly simple, and since $G \le \bar{G} \le M$, the corollary is proved. □

Below is a particular (torsion-free) case of a theorem proved by Osin in [21, Thm. 2.6]:

Lemma 3.4. Any countable torsion-free group S can be embedded into a 2-generated group M so that S is malnormal in M and every element of M is conjugate to an element of S in M.

Proof. Following Osin’s proof of Theorem 2.6 from [21], we see that the required group M can be constructed as an inductive limit of relatively hyperbolic groups G(i), i ∈ N. More precisely, one sets $G(0) = S * F_2$, where $F_2$ is a free group of rank 2, $\xi_0 = \mathrm{id}_{G(0)} : G(0) \to G(0)$, and for each i ∈ N one constructs a group G(i) and an epimorphism $\xi_i : G(0) \to G(i)$ so that $\xi_i$ is injective on S, G(i) is torsion-free and hyperbolic relative to $\{\xi_i(S)\}$, and $\xi_i$ factors through $\xi_{i-1}$. The group M is defined to be the direct limit of $(G(i), \xi_i)$ as i → ∞, i.e., M = G(0)/N where $N = \bigcup_{i\in\mathbb{N}} \ker(\xi_i)$.

By Lemma 2.3, $\xi_i(S)$ is malnormal in G(i), hence the image of S will also be malnormal in M. □

Theorem 3.5. Let G be a torsion-free countable group, n ∈ N, n ≥ 2, and non-empty subsets $X_i \subset G \setminus \{1\}$, i = 1, . . . , n − 1, be independent in G. Then G can be embedded into a 2-generated torsion-free group M which has (nCC), so that the subsets $X_i$, i = 1, . . . , n − 1, stay independent in M. Moreover, one can choose M to be 2-boundedly simple.

Proof. First, according to Corollary 3.3, we can embed the group G into a countable torsion-free group S such that S has (nCC) and is 2-boundedly simple, and $X_i$, i = 1, . . . , n − 1, are independent in S. Second, we apply Lemma 3.4 to find the 2-generated group M from its claim. Choose any i, j ∈ {1, . . . , n − 1}, i ≠ j, and x ∈ Xi, y ∈ Xj.
If x and y were commensurable in M, the malnormality of S would imply that x and y must be commensurable in S, contradicting the construction. Hence $X_i$, i = 1, . . . , n − 1, are independent in M. Since each element of M is conjugate to an element of S, it is evident that M has (nCC), is torsion-free and 2-boundedly simple. □

Remark 3.6. A more direct proof of Theorem 3.5, not using Lemma 3.4, can be extracted from the proof of Theorem 5.1 (see Section 5), applied to the case when H = M.

It is easy to see that Theorem 3.5 immediately implies Corollary 1.2 that was formulated in the Introduction.

As promised, we now give a counterexample to Question 2 (formulated in the Introduction) for any n ≥ 3.

Example 2. Let $G_2 = \langle a, t \,\|\, tat^{-1} = a^2 \rangle$ be the Baumslag-Solitar BS(1, 2)-group. Then $G_2$ is torsion-free, and the elements $t^{2}, t^{4}, \dots, t^{2^{n-1}}$ are pairwise non-conjugate in $G_2$ (since this holds in the quotient of $G_2$ by the normal closure of a). Suppose that $G_2$ is embedded into a group M having (nCC) so that $t^{2}, t^{4}, \dots, t^{2^{n-1}}$ are pairwise non-conjugate in M. Then $t^{2}, \dots, t^{2^{n-1}}$ is the list of representatives of all non-trivial conjugacy classes of M. Therefore there exist k, l ∈ {1, . . . , n − 1} such that $t \sim t^{2^k}$ and $a \sim t^{2^l}$ in M. Consequently $t^{2^l} \sim a \sim a^2 \sim t^{2^{l+1}}$ and $t^{2} \sim t^{2^{k+1}}$, hence k = l = n − 1 according to the assumptions. But this yields $t \sim t^{2^{n-1}} \sim a \sim a^2 \sim t^2$ in M, implying that $t^2 \sim t^4$, which contradicts our assumptions. Thus $G_2$ cannot be embedded into an (nCC)-group M in such a way that $t^{2}, \dots, t^{2^{n-1}}$ remain pairwise non-conjugate in M.

4. Normal subgroups with (nCC)

If M is a normal subgroup of a group H, then H naturally acts on M by conjugation. We shall say that this action preserves the conjugacy classes of M if for any h ∈ H and a ∈ M there exists b ∈ M such that $hah^{-1} = bab^{-1}$.

Lemma 4.1. Let G be a torsion-free group, N ⊳ G and $x_1, \dots, x_l \in N \setminus \{1\}$ be pairwise non-commensurable (in G) elements.
Then there exists a partition N \ {1} = ⋃_{k=1}^{l} X_k of N \ {1} into a (disjoint) union of G-independent subsets X_1, …, X_l such that x_k ∈ X_k for every k ∈ {1, …, l}. Moreover, each subset X_k will be invariant under conjugation by elements of G.

Proof. Since ≈ is an equivalence relation on G \ {1}, one can find the corresponding decomposition: G \ {1} = ⋃_{j∈J} Y_j, where Y_j is an equivalence class for each j ∈ J. For each k = 1, …, l, there exists j(k) ∈ J such that x_k ∈ Y_{j(k)}. Note that j(k) ≠ j(m) if k ≠ m since x_k ≉ x_m. Denote J′ = J \ {j(1), …, j(l−1)}, X_1 = Y_{j(1)} ∩ N, …, X_{l−1} = Y_{j(l−1)} ∩ N, and X_l = (⋃_{j∈J′} Y_j) ∩ N. Evidently N \ {1} = ⋃_{k=1}^{l} X_k, the subsets X_1, …, X_l are independent in G and x_k ∈ X_k for each k = 1, …, l. The final property follows from the construction since for any a ∈ G and j ∈ J we have aY_ja^{-1} = Y_j and aNa^{-1} = N. □

Lemma 4.2. For every countable group C and each n ∈ N, n ≥ 2, there exists a countable torsion-free group H having a normal subgroup M ⊳ H such that
(i) M satisfies (nCC);
(ii) M is 2-boundedly simple;
(iii) the natural action of H on M preserves the conjugacy classes of M;
(iv) H/M ≅ C.

Proof. Let H′_0 be the free group of infinite countable rank. Choose N′_0 ⊳ H′_0 so that H′_0/N′_0 ≅ C. Let F = F(x_1, …, x_{n−1}) denote the free group freely generated by x_1, …, x_{n−1}. Define H_0 = H′_0 ∗ F and let N_0 be the normal closure of N′_0 ∪ F in H_0. Evidently, H_0/N_0 ≅ H′_0/N′_0 ≅ C and the elements x_1, …, x_{n−1} ∈ N_0 \ {1} are pairwise non-commensurable in H_0. By Lemma 4.1, one can choose a partition of N_0 \ {1} into the union of H_0-independent subsets: N_0 \ {1} = ⋃_{k=1}^{n−1} X_{0k}, so that x_k ∈ X_{0k} for each k = 1, …, n−1. By Corollary 3.3 there exists a countable torsion-free 2-boundedly simple group M_1 with the property (nCC) containing a copy of N_0, such that the subsets X_{0k}, k = 1, 2, …, n−1, are independent in M_1.
Denote by H_1 = H_0 ∗_{N_0} M_1 the amalgamated product of H_0 and M_1 along N_0, and let N_1 be the normal closure of M_1 in H_1. Note that H_1 is torsion-free as an amalgamated product of two torsion-free groups ([10, IV.2.7]).

We need to verify that the elements x_1, …, x_{n−1} are pairwise non-commensurable in H_1. Indeed, if a ∈ X_{0k} and b ∈ X_{0l}, k ≠ l, are conjugate in H_1 then there must exist y_1, …, y_t ∈ M_1 \ N_0 and z_1, …, z_{t−1} ∈ H_0 \ N_0, z_0, z_t ∈ H_0 such that

z_0 y_1 z_1 ⋯ z_{t−1} y_t z_t a z_t^{-1} y_t^{-1} z_{t−1}^{-1} ⋯ y_1^{-1} z_0^{-1} = b in H_1.

Suppose that t is minimal possible with this property. As conjugation by elements of H_0 preserves X_{0k} and X_{0l}, we can assume that z_0 = z_t = 1. Hence

y_1 z_1 ⋯ z_{t−1} y_t a y_t^{-1} z_{t−1}^{-1} ⋯ z_1^{-1} y_1^{-1} b^{-1} = 1 in H_1.

By the properties of amalgamated products (see [10, Ch. IV]), the left-hand side in this equality cannot be reduced, consequently y_t a y_t^{-1} ∈ N_0 \ {1} = ⋃_{k=1}^{n−1} X_{0k}. But then y_t a y_t^{-1} ∈ X_{0k} by the properties of M_1, contradicting the minimality of t. Thus, we have shown that x_k ≉ x_l whenever k ≠ l.

Assume that the group H_i = H_{i−1} ∗_{N_{i−1}} M_i, i ≥ 1, has already been constructed, so that
0) H_i is countable and torsion-free;
1) N_{i−1} ⊳ H_{i−1};
2) H_{i−1} = H_0·N_{i−1} and H_0 ∩ N_{i−1} = N_0;
3) M_i satisfies (nCC);
4) x_1, …, x_{n−1} are pairwise non-commensurable in H_i.

Let N_i be the normal closure of M_i in H_i. Because of the condition 4) and Lemma 4.1, one can find a partition of N_i \ {1} into a union of H_i-independent subsets: N_i \ {1} = ⋃_{k=1}^{n−1} X_{ik}, so that x_k ∈ X_{ik} for each k = 1, …, n−1. By Lemma 3.2 there is a countable group M_{i+1}, with (nCC), containing a copy of N_i, in which the subsets X_{ik}, k = 1, …, n−1, remain independent. Set H_{i+1} = H_i ∗_{N_i} M_{i+1}. Now, it is easy to verify that the analogs of the conditions 0)-3) hold for H_{i+1} and

(4.1) N_{i−1} ≤ M_i ≤ N_i ≤ M_{i+1}.

The analog of the condition 4) is true in H_{i+1} by the same considerations as before (in the case of H_1). Define the group H = ⋃_{i=1}^{∞} H_i and its subgroup M = ⋃_{i=1}^{∞} N_i.
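For orientation, here is a short display of how the inclusions (4.1) are used; it is only a sketch of a routine step, with all unions taken inside H. Since (4.1) holds for every i ≥ 1, the two ascending sequences of subgroups are interleaved and therefore have the same union:

```latex
\[
  N_0 \le M_1 \le N_1 \le M_2 \le N_2 \le \cdots
  \quad\Longrightarrow\quad
  \bigcup_{i=1}^{\infty} N_i \;=\; \bigcup_{i=1}^{\infty} M_i .
\]
```

In particular, every element of M lies in some M_i, which is the form in which the property (nCC) of the groups M_i is applied to M.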
Observe that the condition 0) implies that H is torsion-free, condition 1) implies that M is normal in H, and 2) implies that H = H_0·M and H_0 ∩ M = N_0. Hence H/M ≅ H_0/(H_0 ∩ M) ≅ C. Applying (4.1) we get M = ⋃_{i=1}^{∞} M_i, and thus, by the conditions 3), 4), it enjoys the property (nCC): each element of M will be a conjugate of x_k for some k ∈ {1, …, n−1}. Since x_1, …, x_{n−1} ∈ M_1 ≤ M and M_1 is 2-boundedly simple, so is M. Finally, 4) implies that x_k ≁ x_l whenever k ≠ l, and, consequently, the natural action of H on M preserves its conjugacy classes. Q.e.d. □

Lemma 4.3. Suppose that G is a group, N ⊳ G, A, B ≤ G and ϕ : A → B is an isomorphism such that ϕ(a) ∈ aN (i.e., the canonical images of a and ϕ(a) in G/N coincide) for each a ∈ A. Let L = ⟨G, t ‖ tat^{-1} = ϕ(a), ∀ a ∈ A⟩ be the HNN-extension of G with associated subgroups A and B, and let K be the normal closure of ⟨N, t⟩ in L. Then G ∩ K = N.

Proof. This statement easily follows from the universal property of HNN-extensions and is left as an exercise for the reader. □

The next lemma will allow us to construct (nCC)-groups that are not simple:

Lemma 4.4. Assume that H is a torsion-free countable group and M ⊳ H is a non-trivial normal subgroup. Then H can be isomorphically embedded into a countable torsion-free group G possessing a normal subgroup K ⊳ G such that
1) G = HK and H ∩ K = M;
2) ∀ x, y ∈ G \ {1}, ϕ(x) = ϕ(y) if and only if ∃ h ∈ K such that x = hyh^{-1}, where ϕ : G → G/K is the natural homomorphism; in particular, K will have (2CC);
3) ∀ x, y ∈ G \ {1}, x ∼ y if and only if ϕ(x) ∼ ϕ(y).

Proof. Choose a set of representatives Z ⊂ H of cosets of H modulo M, in such a way that each coset is represented by a unique element from Z and 1 ∉ Z. Define G(0) = H and K(0) = M. Enumerate the elements of G(0) \ {1}: g_1, g_2, ….
First we embed the group G(0) into a countable torsion-free group G_1, having a normal subgroup K_1 ⊳ G_1, such that G_1 = HK_1, H ∩ K_1 = M and for every i ≥ 1 there are t_i ∈ K_1 and z_i ∈ Z satisfying t_i g_i t_i^{-1} = z_i.

Suppose that the (countable torsion-free) group G(j), j ≥ 0, and K(j) ⊳ G(j), have already been constructed so that H ≤ G(j), G(j) = HK(j), H ∩ K(j) = M and, if j ≥ 1, then t_j g_j t_j^{-1} = z_j for some t_j ∈ K(j) and z_j ∈ Z. The group G(j+1), containing G(j), is defined as the following HNN-extension:

G(j+1) = ⟨G(j), t_{j+1} ‖ t_{j+1} g_{j+1} t_{j+1}^{-1} = z_{j+1}⟩,

where z_{j+1} ∈ Z ⊂ H is the unique representative satisfying g_{j+1} ∈ z_{j+1}K(j) in G(j). Denote by K(j+1) ⊳ G(j+1) the normal closure of ⟨K(j), t_{j+1}⟩ in G(j+1). Evidently the group G(j+1) is countable and torsion-free, H ≤ G(j) ≤ G(j+1), G(j+1) = HK(j+1) and H ∩ K(j+1) = H ∩ K(j) = M by Lemma 4.3.

Now, it is easy to verify that the group G_1 = ⋃_{j=0}^{∞} G(j) and its normal subgroup K_1 = ⋃_{j=0}^{∞} K(j) enjoy the required properties.

In the same way we can embed G_1 into a countable torsion-free group G_2, that has a normal subgroup K_2 ⊳ G_2, so that G_2 = HK_2, H ∩ K_2 = M and each element of G_1 \ {1} is conjugate in G_2 to a corresponding element of Z. Performing such a procedure infinitely many times we obtain the group G = ⋃_{i=1}^{∞} G_i and a normal subgroup K = ⋃_{i=1}^{∞} K_i ⊳ G that satisfy the claims 1) and 2) of the lemma. It is easy to see that the claim 2) implies 3), thus the proof is finished. □

5. Adding finite generation

Theorem 5.1. Assume that H is a countable torsion-free group and M is a non-trivial normal subgroup of H. Let F be an arbitrary non-elementary torsion-free word hyperbolic group. Then there exist a countable torsion-free group Q, containing H, and a normal subgroup N ⊳ Q with the following properties:
1. H is malnormal in Q;
2. Q = H·N and N ∩ H = M;
3. N is a quotient of F;
4. the centralizer C_Q(N) of N in Q is trivial;
5. for every q ∈ Q there is z ∈ H such that q ∼ z in Q.

Proof.
The group Q will be constructed as a direct limit of relatively hyperbolic groups.

Step 0. Set G(0) = H ∗ F and F(0) = F; then G(0) is hyperbolic relative to its subgroup H and F(0) is a suitable subgroup of G(0) by Lemma 2.6. Let N(0) ⊳ G(0) be the normal closure of the subgroup ⟨M, F⟩ in G(0). Evidently G(0) = H·N(0) and H ∩ N(0) = M. Enumerate all the elements of N(0): {g_0, g_1, g_2, …}, and of G(0): {q_0, q_1, q_2, …}, in such a way that g_0 = q_0 = 1.

Steps 0-i. Assume the groups G(j), j = 0, …, i, i ≥ 0, have already been constructed, so that
1◦. for each 1 ≤ j ≤ i there is an epimorphism ψ_{j−1} : G(j−1) → G(j) which is injective on (the image of) H in G(j−1). Denote F(j) = ψ_{j−1}(F(j−1)), N(j) = ψ_{j−1}(N(j−1));
2◦. G(j) is torsion-free and hyperbolic relative to (the image of) H, and F(j) ≤ G(j) is a suitable subgroup, j = 0, …, i;
3◦. G(j) = H·N(j), N(j) ⊳ G(j) and H ∩ N(j) = M, j = 0, …, i;
4◦. the natural image ḡ_j of g_j in G(j) belongs to F(j), j = 0, …, i;
5◦. there exists z_j ∈ H such that q̄_j ∼ z_j, j = 0, …, i, where q̄_j is the image of q_j in G(j).

Step i+1. Let q̂_{i+1} ∈ G(i), ĝ_{i+1} ∈ N(i) be the images of q_{i+1} and g_{i+1} in G(i). First we construct the group G(i+1/2), its normal subgroup K_{i+1} and its element t_{i+1} as follows.

If for some f ∈ G(i), f q̂_{i+1} f^{-1} = z ∈ H, then set G(i+1/2) = G(i), K_{i+1} = N(i) ⊳ G(i+1/2) and t_{i+1} = 1.

Otherwise, q̂_{i+1} is a hyperbolic element of infinite order in G(i). Since G(i) is torsion-free, the elementary subgroup E_{G(i)}(q̂_{i+1}) is cyclic, thus E_{G(i)}(q̂_{i+1}) = ⟨hx⟩ for some h ∈ H and x ∈ N(i) (by 3◦), and q̂_{i+1} = (hx)^m for some m ∈ Z. Now, by Lemma 2.2, G(i) is hyperbolic relative to {H, ⟨hx⟩}. Choose y ∈ M so that hy ≠ 1 and let G(i+1/2) be the following HNN-extension of G(i):

G(i+1/2) = ⟨G(i), t_{i+1} ‖ t_{i+1}(hx)t_{i+1}^{-1} = hy⟩.

The group G(i+1/2) is torsion-free and hyperbolic relative to H by Lemma 2.5. Let us now verify that the subgroup F(i) is suitable in G(i+1/2).
Indeed, according to Lemma 2.7, there are two hyperbolic elements f_1, f_2 ∈ F(i) of infinite order in G(i) such that f_l ≉ hx, f_l ≉ hy, l = 1, 2, and f_1 ≉ f_2. Then f_1 ≉ f_2 in G(i+1/2) by Lemma 3.1. It remains to check that f_l is a hyperbolic element of G(i+1/2) for each l = 1, 2. Choose an arbitrary element w ∈ H and observe that f_l ≉ w (since H is malnormal in G(i) by Lemma 2.3, a non-trivial power of f_l is conjugate to an element of H if and only if f_l is conjugate to an element of H in G(i), but the latter is impossible because f_l is hyperbolic in G(i)). Applying Lemma 3.1 again, we get that f_l ≉ w in G(i+1/2) for any w ∈ H. Hence f_1, f_2 ∈ F(i) are hyperbolic elements of infinite order in G(i+1/2). The intersection E_{G(i+1/2)}(f_1) ∩ E_{G(i+1/2)}(f_2) must be finite, since these groups are virtually cyclic (by Lemma 2.2), and f_1 is not commensurable with f_2 in G(i+1/2). But G(i+1/2) is torsion-free, therefore E_{G(i+1/2)}(f_1) ∩ E_{G(i+1/2)}(f_2) = {1}. Thus F(i) is a suitable subgroup of G(i+1/2).

Lemma 4.3 assures that H ∩ K_{i+1} = M where K_{i+1} ⊳ G(i+1/2) is the normal closure of ⟨N(i), t_{i+1}⟩ in G(i+1/2). Finally, note that t_{i+1} q̂_{i+1} t_{i+1}^{-1} = t_{i+1}(hx)^m t_{i+1}^{-1} = (hy)^m = z ∈ H in G(i+1/2).

Now that the group G(i+1/2) has been constructed, set T_{i+1} = {ĝ_{i+1}, t_{i+1}} ⊂ K_{i+1} and define G(i+1) as follows. Since T_{i+1}·F(i) ⊂ K_{i+1} ⊳ G(i+1/2), we can apply Theorem 2.8 to find a group G(i+1) and an epimorphism ϕ_i : G(i+1/2) → G(i+1) such that ϕ_i is injective on H, G(i+1) is torsion-free and hyperbolic relative to (the image of) H, {ϕ_i(ĝ_{i+1}), ϕ_i(t_{i+1})} ⊂ ϕ_i(F(i)), ϕ_i(F(i)) is a suitable subgroup of G(i+1), and ker(ϕ_i) ≤ K_{i+1}. Denote by ψ_i the restriction of ϕ_i to G(i). Then ψ_i(G(i)) = ϕ_i(G(i)) = G(i+1) because G(i+1/2) was generated by G(i) and t_{i+1}, and according to the construction, ϕ_i(t_{i+1}) ∈ ϕ_i(F(i)) ≤ ϕ_i(G(i)).
Now, after defining F(i+1) = ψ_i(F(i)), N(i+1) = ψ_i(N(i)), ḡ_{i+1} = ϕ_i(ĝ_{i+1}) ∈ F(i+1) and z_{i+1} = ϕ_i(z) ∈ H, we see that the conditions 1◦, 2◦, 4◦ and 5◦ hold in the case when j = i+1. The properties G(i+1) = H·N(i+1) and N(i+1) ⊳ G(i+1) are immediate consequences of their analogs for G(i) and N(i). Finally, observe that

ϕ_i^{-1}(H ∩ N(i+1)) = H·ker(ϕ_i) ∩ N(i)·ker(ϕ_i) = (H ∩ N(i)·ker(ϕ_i))·ker(ϕ_i) ⊆ (H ∩ K_{i+1})·ker(ϕ_i) = M·ker(ϕ_i).

Therefore H ∩ N(i+1) = M and the condition 3◦ holds for G(i+1).

Let Q = G(∞) be the direct limit of the sequence (G(i), ψ_i) as i → ∞, and let F(∞) and N = N(∞) be the limits of the corresponding subgroups. Then Q is torsion-free by 2◦, N ⊳ Q, Q = H·N and H ∩ N = M by 3◦, N ≤ F(∞) by 4◦, and 5◦ implies the condition 5 from the claim. Since F(0) ≤ N(0) we get F(∞) ≤ N. Thus N = F(∞) is a homomorphic image of F(0) = F.

For any i, j ∈ N ∪ {∞}, i < j, we have a natural epimorphism ζ_{ij} : G(i) → G(j) such that if i < j < k then ζ_{jk} ∘ ζ_{ij} = ζ_{ik}. Take any g ∈ G(0). Since F = F(0) is finitely generated, using the properties of direct limits one can show that if w = ζ_{0∞}(g) ∈ C_Q(F(∞)) in Q, then ζ_{0j}(g) ∈ C_{G(j)}(F(j)) for some j ∈ N. But C_{G(j)}(F(j)) ≤ E_{G(j)}(F(j)) = {1} (by formulas (2.2) and (2.3)) because F(j) is a suitable subgroup of G(j), hence w = ζ_{j∞}(ζ_{0j}(g)) = 1, that is, C_Q(F(∞)) = C_Q(N) = {1}. This concludes the proof. □

The next statement is well-known:

Lemma 5.2. Assume G is a group and N ⊳ G is a normal subgroup such that C_G(N) ⊆ N, where C_G(N) is the centralizer of N in G. Then the quotient-group G/N embeds into the outer automorphism group Out(N).

Proof. The action of G on N by conjugation induces a natural homomorphism ϕ from G to the automorphism group Aut(N) of N. Since ϕ(N) is exactly the group of inner automorphisms Inn(N) of N, one can define a new homomorphism ϕ̄ : G/N → Out(N) = Aut(N)/Inn(N) in the natural way: ϕ̄(gN) = ϕ(g)Inn(N) for every gN ∈ G/N.
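As a routine check that the text leaves implicit, the map ϕ̄ is well defined: two representatives of the same coset of N differ by an element of N, and ϕ sends that element to an inner automorphism. In the display below, g′ and n are auxiliary symbols introduced only for this verification:

```latex
\[
  g' = g n,\quad n \in N
  \;\Longrightarrow\;
  \varphi(g') = \varphi(g)\,\varphi(n) \in \varphi(g)\,\mathrm{Inn}(N)
  \;\Longrightarrow\;
  \bar\varphi(g'N) = \bar\varphi(gN).
\]
```

The homomorphism property of ϕ̄ is inherited from that of ϕ in the same way.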
It remains to check that ϕ̄ is injective, i.e., if g ∈ G \ N then ϕ̄(gN) ≠ 1 in Out(N); or, equivalently, ϕ(g) ∉ Inn(N). Indeed, otherwise there would exist a ∈ N such that ghg^{-1} = aha^{-1} for every h ∈ N, thus a^{-1}g ∈ C_G(N) \ N, contradicting the assumptions. Q.e.d. □

Note that for an arbitrary group N, any subgroup C ≤ Out(N) naturally acts on the set of conjugacy classes C(N) of the group N.

Theorem 5.3. For any n ∈ N, n ≥ 2, and an arbitrary countable group C, C can be isomorphically embedded into the outer automorphism group Out(N) of a group N satisfying the following conditions:
• N is torsion-free;
• N is generated by two elements;
• N has (nCC) and the natural action of C on C(N) is trivial;
• N is 2-boundedly simple.

Proof. By Lemma 4.2 we can find a countable torsion-free group H and its normal subgroup M enjoying the properties (i)-(iv) from its claim. Now, if F denotes the free group of rank 2, we can obtain a countable torsion-free group Q together with its normal subgroup N that satisfy the conditions 1-5 from the statement of Theorem 5.1. Then N is torsion-free and generated by two elements (as a quotient of F). Condition 2 implies that Q/N ≅ H/M ≅ C and, by 4 and Lemma 5.2, C embeds into the group Out(N).

Using property 5, for each g ∈ N we can find u ∈ Q and z ∈ H such that ugu^{-1} = z ∈ N ∩ H = M. Since Q = HN, there are h ∈ H and x ∈ N such that u = hx. Since z, h^{-1}zh ∈ M and the action of H on M preserves the conjugacy classes of M, there is r ∈ M such that rh^{-1}zhr^{-1} = z, hence z = rh^{-1}ugu^{-1}(rh^{-1})^{-1} = rxgx^{-1}r^{-1} = vgv^{-1}, where v = rx ∈ N. Thus for every g ∈ N there is v ∈ N such that vgv^{-1} ∈ M. Evidently, this implies that N is also 2-boundedly simple. Since M has (nCC), the number of conjugacy classes in N will be at most n. Suppose x_1, x_2 ∈ M and x_1 ≁ x_2 in M. Then x_1 ≁ x_2 in H (by the property (iii) from the claim of Lemma 4.2), and since H is malnormal in Q we get x_1 ≁ x_2 in Q.
Hence x_1 ≁ x_2 in N, i.e., N also enjoys (nCC). The fact that the natural action of C on C(N) is trivial follows from the same property for the action of H on C(M) and the malnormality of H in Q. Q.e.d. □

Now, let us proceed with the

Proof of Theorem 1.4. First we apply Lemma 4.4 to construct a group G and a normal subgroup K ⊳ G according to its claim. Now, by Theorem 5.1, there is a group Q, having a normal subgroup N ⊳ Q such that G is malnormal in Q, Q = GN, G ∩ N = K, rank(N) ≤ 2 (if one takes the free group of rank 2 as F) and every element q ∈ Q is conjugate (in Q) to an element of G. By claim 2) of Lemma 4.4, K has (2CC), and an argument similar to the one used in the proof of Theorem 5.3 shows that N will also have (2CC). Consequently, rank(N) > 1 because N is torsion-free, hence rank(N) = 2.

Since G = HK and H ∩ K = M we have Q = HKN = HN and H ∩ N = H ∩ K = M. Since Q/N ≅ H/M and N can be generated by two elements, we can conclude that rank(Q) ≤ rank(H/M) + 2.

Consider arbitrary x, y ∈ Q \ {1} and suppose that ϕ(x) ∼ ϕ(y). By Theorem 5.1, there are w, z ∈ G \ {1} such that x ∼ w and y ∼ z. Therefore ϕ(w) ∼ ϕ(z), hence the images of w and z in G/K are also conjugate. By claim 3) of Lemma 4.4, w ∼ z, implying x ∼ y. □

Theorem 1.4 provides an alternative way of obtaining torsion-free groups that have finitely many conjugacy classes: for any countable group C we can choose a free group H of countable rank and a normal subgroup {1} ≠ M ⊳ H so that H/M ≅ C, and then apply Theorem 1.4 to the pair (H, M) to get

Corollary 5.4. Assume that n ∈ N, n ≥ 2, and C is a countable group that contains exactly (n−1) distinct conjugacy classes. Then there exists a torsion-free group Q and N ⊳ Q such that
• Q/N ≅ C;
• N has (2CC) and Q has (nCC);
• rank(N) = 2 and rank(Q) ≤ rank(C) + 2.

Corollary 5.5. The group G_1, given by presentation (1.1), can be isomorphically embedded into a 2-generated torsion-free group Q satisfying (4CC) in such a way that t ≁ t^{-1}.
Proof. Denote by K the kernel of the homomorphism ϕ : G_1 → Z_3, for which ϕ(a) = 0 and ϕ(t) = 1, where Z_3 is the group of integers modulo 3. Now, apply Theorem 1.4 to the pair (G_1, K) to find the group Q, containing G_1, and the normal subgroup N ⊳ Q from its claim. Since Q/N ≅ G_1/K ≅ Z_3 has (3CC), the group Q will have (4CC). We also have t ≁ t^{-1} because the images of t and t^{-1} are not conjugate in Q/N.

Choose an element q_1 ∈ Q \ N. Then q_2 = q_1^3 ∈ N \ {1} and since N is 2-generated and has (2CC), there is q_3 ∈ N such that N = ⟨q_2, q_3⟩ in Q. As Q/N is generated by the image of q_1, the group Q will be generated by {q_1, q_2, q_3}, and, consequently, by {q_1, q_3}. Q.e.d. □

6. Combinatorics of paths in relatively hyperbolic groups

Let G be a group hyperbolic relative to a family of proper subgroups {H_λ}_{λ∈Λ}, and let X be a finite symmetrized relative generating set of G. Denote H = ⋃_{λ∈Λ} (H_λ \ {1}). For a combinatorial path p in the Cayley graph Γ(G, X ∪ H) (of G with respect to X ∪ H), p_−, p_+, L(p), and Lab(p) will denote the initial point, the ending point, the length (that is, the number of edges) and the label of p respectively. p^{-1} will be the path obtained from p by following it in the reverse direction. Further, if Ω is a subset of G and g ∈ ⟨Ω⟩ ≤ G, then |g|_Ω will be used to denote the length of a shortest word in Ω^{±1} representing g.

We will be using the following terminology from [20]. Suppose q is a path in Γ(G, X ∪ H). A subpath p of q is called an H_λ-component for some λ ∈ Λ (or simply a component) of q, if the label of p is a word in the alphabet H_λ \ {1} and p is not contained in a bigger subpath of q with this property.

Two components p_1, p_2 of a path q in Γ(G, X ∪ H) are called connected if they are H_λ-components for the same λ ∈ Λ and there exists a path c in Γ(G, X ∪ H) connecting a vertex of p_1 to a vertex of p_2 such that Lab(c) entirely consists of letters from H_λ.
In algebraic terms this means that all vertices of p_1 and p_2 belong to the same coset gH_λ for a certain g ∈ G. We can always assume c to have length at most 1, as every nontrivial element of H_λ is included in the set of generators. An H_λ-component p of a path q is called isolated if no other H_λ-component of q is connected to p.

The next statement is a particular case of Lemma 2.27 from [20]; we shall formulate it in a slightly more general form, as it appears in [18, Lemma 2.7]:

Lemma 6.1. Suppose that a group G is hyperbolic relative to a family of subgroups {H_λ}_{λ∈Λ}. Then there exists a finite subset Ω ⊆ G and a constant K ∈ N such that the following holds. Let q be a cycle in Γ(G, X ∪ H), p_1, …, p_k be a collection of isolated components of q and g_1, …, g_k be the elements of G represented by Lab(p_1), …, Lab(p_k) respectively. Then g_1, …, g_k belong to the subgroup ⟨Ω⟩ ≤ G and the word lengths of the g_i's with respect to Ω satisfy

Σ_{i=1}^{k} |g_i|_Ω ≤ K L(q).

Definition. Suppose that m ∈ N and Ω is a finite subset of G. Define W(Ω, m) to be the set of all words W over the alphabet X ∪ H that have the following form:

W ≡ x_0 h_0 x_1 h_1 … x_l h_l x_{l+1},

where l ∈ Z, l ≥ −2 (if l = −2 then W is the empty word; if l = −1 then W ≡ x_0), h_i and x_i are considered as single letters and
1) x_i ∈ X ∪ {1}, i = 0, …, l+1, and for each i = 0, …, l, there exists λ(i) ∈ Λ such that h_i ∈ H_{λ(i)};
2) if λ(i) = λ(i+1) then x_{i+1} ∉ H_{λ(i)} for each i = 0, …, l−1;
3) h_i ∉ {h ∈ ⟨Ω⟩ : |h|_Ω ≤ m}, i = 0, …, l.

Choose the finite subset Ω ⊂ G and the constant K > 0 according to the claim of Lemma 6.1. Recall that a path q in Γ(G, X ∪ H) is said to be without backtracking if all of its components are isolated.

Lemma 6.2. Let q be a path in the Cayley graph Γ(G, X ∪ H) with Lab(q) ∈ W(Ω, m) and m ≥ 5K. Then q is without backtracking.

Proof. Assume the contrary to the claim. Then one can choose a path q providing a counterexample of the smallest possible length. Thus if p_1, …
, p_l is the (consecutive) list of all components of q then l ≥ 2, p_1 and p_l must be connected H_{λ′}-components, for some λ′ ∈ Λ, the components p_2, …, p_{l−1} must be isolated, and q starts with p_1 and ends with p_l. Since Lab(q) ∈ W(Ω, m) we have L(q) ≤ 2l−1. If l = 2 then the (X ∪ {1})-letter between p_1 and p_2 would belong to H_{λ′}, contradicting the property 2) from the definition of W(Ω, m). Therefore l ≥ 3.

Since p_1 and p_l are connected, there exists a path v in Γ(G, X ∪ H) between (p_l)_− and (p_1)_+ with Lab(v) ∈ H_{λ′} (thus we can assume that L(v) ≤ 1). Denote by q̂ the subpath of q starting with (p_1)_+ and ending with (p_l)_−. Note that L(q̂) = L(q) − 2 ≤ 2l−3, and p_2, …, p_{l−1} is the list of components of q̂, all of which are isolated. If one of them were connected to v it would imply that it is connected to p_1, contradicting the minimality of q. Hence the cycle o = q̂v possesses k = l−2 ≥ 1 isolated components, which represent elements h_1, …, h_k ∈ H. Consequently, applying Lemma 6.1 one obtains that h_i ∈ ⟨Ω⟩, i = 1, …, k, and

Σ_{i=1}^{k} |h_i|_Ω ≤ K L(o) ≤ K(L(q̂) + 1) ≤ K(2l−2).

By the condition 3) from the definition of W(Ω, m) one has |h_i|_Ω > m ≥ 5K for each i = 1, …, k. Hence k·5K ≤ Σ_{i=1}^{k} |h_i|_Ω ≤ K(2l−2), or 5 ≤ (2l−2)/k, which contradicts the inequality k ≥ l−2 (recall that l ≥ 3). Q.e.d. □

Definition. Consider an arbitrary cycle o = rqr′q′ in Γ(G, X ∪ H), where Lab(q) and Lab(q′) belong to W(Ω, m). Let p be a component of q (or q′). We will say that p is regular if it is not an isolated component of o. If m ≥ 5K, and hence q and q′ are without backtracking by Lemma 6.2, this means that p is either connected to some component of q′ (respectively q), or to a component of r or r′.

Lemma 6.3. In the above notations, suppose that m ≥ 7K and denote C = max{L(r), L(r′)}. Then
(a) if C ≤ 1 then every component of q or q′ is regular;
(b) if C ≥ 2 then each of q and q′ can have at most 4C components which are not regular.
(c) if l is the number of components of q, then at least (l − 6C) of the components of q are connected to components of q′; and two distinct components of q cannot be connected to the same component of q′. Similarly for q′.

Proof. Assume the contrary to (a). Then one can choose a cycle o = rqr′q′ with L(r), L(r′) ≤ 1, having at least one isolated component on q or q′, and such that L(q) + L(q′) is minimal. Clearly the latter condition implies that each component of q or q′ is an isolated component of o. Therefore q and q′ together contain k distinct isolated components of o, representing elements h_1, …, h_k ∈ H, where k ≥ 1 and k ≥ (L(q) − 1)/2 + (L(q′) − 1)/2. Applying Lemma 6.1 we obtain h_i ∈ ⟨Ω⟩, i = 1, …, k, and Σ_{i=1}^{k} |h_i|_Ω ≤ K L(o) ≤ K(L(q) + L(q′) + 2). Recall that |h_i|_Ω > m ≥ 7K by the property 3) from the definition of W(Ω, m). Therefore Σ_{i=1}^{k} |h_i|_Ω ≥ k·7K, implying

7k ≤ L(q) + L(q′) + 2 ≤ 2k + 4,

which yields a contradiction since k ≥ 1.

Let us prove (b). Suppose that C ≥ 2 and q contains more than 4C isolated components of o. We shall consider two cases:

Case 1. No component of q is connected to a component of q′. Then a component of q or q′ can be regular only if it is connected to a component of r or r′. Since, by Lemma 6.2, q and q′ are without backtracking, two distinct components of q or q′ cannot be connected to the same component of r (or r′). Hence q and q′ together can contain at most 2C regular components. Thus the cycle o has k isolated components, representing elements h_1, …, h_k ∈ H, where k ≥ 4C > 4 and k ≥ (L(q) − 1)/2 + (L(q′) − 1)/2 − 2C. By Lemma 6.1, h_i ∈ ⟨Ω⟩ for each i = 1, …, k, and Σ_{i=1}^{k} |h_i|_Ω ≤ K(L(q) + L(q′) + 2C). Once again we can use the property 3) from the definition of W(Ω, m) to achieve

7k ≤ L(q) + L(q′) + 2C ≤ 2k + 6C + 2,

yielding a contradiction, because k ≥ 4C and C ≥ 2.

Case 2. The path q has at least one component which is connected to a component of q′. Let p_1, …
, p_l denote the sequence of all components of q. By part (a), if p_s and p_t, 1 ≤ s ≤ t ≤ l, are connected to components of q′, then for any j, s ≤ j ≤ t, p_j is connected to some component of q′ (because q is without backtracking by Lemma 6.2). We can take s (respectively t) to be minimal (respectively maximal) possible. Consequently p_1, …, p_{s−1}, p_{t+1}, …, p_l will contain the set of all isolated components of o that belong to q, and none of these components will be connected to a component of q′.

Without loss of generality we may assume that s − 1 ≥ 4C/2 = 2C. Since p_s is connected to some component p′ of q′, there exists a path v in Γ(G, X ∪ H) satisfying v_− = (p_s)_−, v_+ = p′_+, Lab(v) ∈ H ∪ {1}, L(v) ≤ 1. Let q̄ (respectively q̄′) denote the subpath of q (respectively q′) from q_− to (p_s)_− (respectively from p′_+ to q′_+). Consider a new cycle ō = rq̄vq̄′. Reasoning as before, one can show that ō has k isolated components, where k ≥ 2C ≥ 4 and k ≥ (L(q̄) − 1)/2 + (L(q̄′) − 1)/2 − C − 1. Now, an application of Lemma 6.1 to the cycle ō together with the property 3) from the definition of W(Ω, m) will lead to a contradiction as before.

By the symmetry, the statement (b) of the lemma also holds for q′.

The claim (c) follows from (b) and the estimate L(r) + L(r′) ≤ 2C because if two different components p and p̄ of q were connected to the same component of some path in Γ(G, X ∪ H), then p and p̄ would also be connected with each other, which would contradict Lemma 6.2. □

Lemma 6.4. In the previous notations, let m ≥ 7K, C = max{L(r), L(r′)}, and let p_1, …, p_l, p′_1, …, p′_{l′} be the consecutive lists of the components of q and q′^{-1} respectively. If l ≥ 12 max{C, 1} + 2, then there are indices s, t, s′ ∈ N such that 1 ≤ s ≤ 6C + 1, l − 6 max{C, 1} ≤ t ≤ l and for every i ∈ {0, 1, …, t − s}, the component p_{s+i} of q is connected to the component p′_{s′+i} of q′^{-1}.

Proof.
By part (c) of Lemma 6.3, there exists s ≤ 6C + 1 such that the component p_s is connected to a component p′_{s′} for some s′ ∈ {1, …, l′}. Thus there is a path r_1 between (p′_{s′})_+ and (p_s)_+ with L(r_1) ≤ 1. Consider a new cycle o_1 = r_1 q_1 r′ q′_1 where q_1 is the segment of q from (p_s)_+ to q_+ = r′_− and q′_1 is the segment of q′ from q′_− = r′_+ to (p′_{s′})_+. Observe that p_{s+1}, …, p_l is the list of all components of q_1 and l − s ≥ l − 6C − 1 ≥ 6 max{1, C} + 1, hence, according to part (c) of Lemma 6.3 applied to o_1, there is t ≥ l − 6 max{1, C} > s such that p_t is connected to p′_{t′} by means of a path r′_1, where s′ + 1 ≤ t′ ≤ l′, (r′_1)_− = (p_t)_+, (r′_1)_+ = (p′_{t′})_+ and L(r′_1) ≤ 1. Consider the cycle o_2 = r_1 q_2 r′_1 q′_2 in which q_2 and q′_2 are the segments of q_1 and q′_1 from (p_s)_+ = (r_1)_+ to (p_t)_+ and from (p′_{t′})_+ to (p′_{s′})_+ = (r_1)_− respectively (Fig. 1).

[Figure 1]

Note that p_{s+1}, …, p_t is the list of all components of q_2 and p′_{s′+1}, …, p′_{t′} is the list of all components of q′_2^{-1}. The cycle o_2 satisfies the assumptions of part (a) of Lemma 6.3, therefore for every i ∈ {1, …, t − s} there exists i′ ∈ {1, …, t′ − s′} such that p_{s+i} is connected to p′_{s′+i′} (p_{s+i} cannot be connected to r_1 [r′_1] because in this case it would be connected to p_s [p_t], but q is without backtracking by Lemma 6.2). It remains to show that i′ = i for every such i. Indeed, if i′ < i for some i ∈ {1, …, t − s} then one can consider the cycle o_3 = r_1 q_3 r′_3 q′_3, where q_3 and q′_3 are segments of q_2 and q′_2 from (q_2)_− = (r_1)_+ to (p_{s+i})_+ and from (p′_{s′+i′})_+ to (q′_2)_+ = (r_1)_− respectively, and (r′_3)_− = (q_3)_+, (r′_3)_+ = (q′_3)_−, L(r′_3) ≤ 1. According to part (a) of Lemma 6.3, each of the components p_{s+1}, …, p_{s+i} of q_3 must be connected to one of p′_{s′+1}, …, p′_{s′+i′}. Hence, since i′ < i, two distinct components of q_3 will be connected to the same component of q′_3, which is impossible by part (c) of Lemma 6.3.
The inequality i′ > i would lead to a contradiction after an application of a symmetric argument to q′_3. Therefore i′ = i and the lemma is proved. □

Lemma 6.5. In the above notations, let m ≥ 7K and C = max{L(r), L(r′)}. For any positive integer d there exists a constant L = L(C, d) ∈ N such that if L(q) ≥ L then there are d consecutive components p_s, …, p_{s+d−1} of q and p′_{s′}, …, p′_{s′+d−1} of q′^{-1}, so that p_{s+i} is connected to p′_{s′+i} for each i = 0, …, d−1.

Proof. Choose the constant L so that (L − 1)/2 ≥ 12 max{C, 1} + 2 + d. Let p_1, …, p_l be the consecutive list of all components of q. Since Lab(q) ∈ W(Ω, m), we have l ≥ (L − 1)/2 (due to the form of any word from W(Ω, m)). Thus we can apply Lemma 6.4 to find indices s, t from its claim. By the choice of s and t, and the estimate on l, we have t − s ≥ d + 1, yielding the statement of the lemma. □

Corollary 6.6. Let G be a group hyperbolic relative to a family of proper subgroups {H_λ}_{λ∈Λ}. Suppose that a ∈ H_{λ_0}, for some λ_0 ∈ Λ, is an element of infinite order, and x_1, x_2 ∈ G \ H_{λ_0}. Then there exists k ∈ N such that g = a^{k_1} x_1 a^{k_2} x_2 is a hyperbolic element of infinite order in G whenever |k_1|, |k_2| ≥ k.

Proof. Without loss of generality we can assume that x_1, x_2 ∈ X, since relative hyperbolicity does not depend on the choice of the finite relative generating set ([20, Thm. 2.34]). Choose the finite subset Ω ⊂ G and the constant K ∈ N according to the claim of Lemma 6.1, and set m = 7K. As the order of a is infinite, there is k ∈ N such that a^{k′} ∉ {h ∈ ⟨Ω⟩ : |h|_Ω ≤ m} whenever |k′| ≥ k. Assume that |k_1|, |k_2| ≥ k.

Suppose, first, that g^l = 1 for some l ∈ N. Consider the cycle o = rqr′q′ in Γ(G, X ∪ H) where q_− = q_+ = 1, Lab(q) ≡ (a^{k_1} x_1 a^{k_2} x_2)^l ∈ W(Ω, m) (the a^{k_j} are considered as single letters from the alphabet X ∪ H) and r, r′, q′ are trivial paths (consisting of a single point).
Then, by part (a) of Lemma 6.3, every component of q must be regular in o, which is impossible since q is without backtracking according to Lemma 6.2. Hence g has infinite order in G.

Suppose, now, that there exist λ′ ∈ Λ, u ∈ H_{λ′} and y ∈ G such that ygy^{-1} = u. Denote C = |y|_{X∪H}. Since the element u ∈ G has infinite order, there exists l ∈ N such that 2l ≥ 6C + 2 and u^l ∉ {h ∈ ⟨Ω⟩ : |h|_Ω ≤ m}. The equality yg^ly^{-1}u^{-l} = 1 gives rise to the cycle o = rqr′q′ in Γ(G, X ∪ H), where r and r′ are paths of length C whose labels represent y and y^{-1} in G, respectively, r_− = 1, q_− = r_+ = y, Lab(q) ≡ (a^{k_1} x_1 a^{k_2} x_2)^l ∈ W(Ω, m), r′_− = q_+, q′_− = r′_+ = y(a^{k_1} x_1 a^{k_2} x_2)^l y^{-1} and Lab(q′) ≡ u^{-l} ∈ W(Ω, m), L(q′) = 1. By part (c) of Lemma 6.3, at least 2l − 6C ≥ 2 distinct components of q must be connected to distinct components of q′, which is impossible as q′ has only one component. The contradiction shows that g must be a hyperbolic element of G. □

Lemma 6.7. Let G be a torsion-free group hyperbolic relative to a family of proper subgroups {H_λ}_{λ∈Λ}, a ∈ H_{λ_0} \ {1}, for some λ_0 ∈ Λ, and t, u ∈ G \ H_{λ_0}. Suppose that there exists k̂ ∈ N such that for every k ≥ k̂ the element g_1 = a^k t a^k t^{-1} is commensurable with g_2 = a^k u a^k u^{-1} in G. Then there are β, γ ∈ H_{λ_0} and ε, ξ ∈ {−1, 1} such that u = γt^ξβ, βaβ^{-1} = a^ε, γ^{-1}aγ = a^ε.

Proof. Changing the finite relative generating set X of G, if necessary, we can assume that t, u, t^{-1}, u^{-1} ∈ X. Let the finite subset Ω ⊂ G and the constant K ∈ N be chosen according to Lemma 6.1. Define m = 7K and suppose that k is large enough to satisfy a^k ∉ {h ∈ ⟨Ω⟩ : |h|_Ω ≤ m}.

Since g_1 and g_2 are commensurable, there exist l, l′ ∈ Z \ {0} and y ∈ G such that y g_2^l y^{-1} = g_1^{l′}. Let C = |y|_{X∪H}, d = 8 and L = L(C, d) be the constant from Lemma 6.5. Without loss of generality, assume that 4l ≥ L.
Consider the cycle o = rqr′q′ in Γ(G,X ∪ H) such that r and r′ are paths of length C whose labels represent y in G, r− = 1, q− = r+ = y, Lab(q) ≡ (a kuaku−1)l ∈ W(Ω,m), L(q) = 4l, r′− = q+, q′− = r + = yg −1, Lab(q′) ≡ (aktakt−1)l ∈ W(Ω,m), L(q′) = 4l′. Now, by Lemma 6.5, there are subpaths q̃ = p1s1p2s2p3s3p4 of q and q̃ 4 of q ′−1 such that Lab(pi) ≡ a k, Lab(p′i) ≡ a ǫk, i = 1, 2, 3, 4, for some ǫ ∈ {−1, 1} (which depends on the sign of l′), Lab(s1) ≡ Lab(s3) ≡ u, Lab(s2) ≡ u −1, Lab(s′1) ≡ Lab(s 3) ≡ t ξ, Lab(s′2) ≡ t −ξ, for some ξ ∈ {−1, 1}, and pi is connected in Γ(G,X∪H) to p i for each i = 1, 2, 3, 4. Therefore there exist paths p̃1, p̃2, p̃3, p̃4 whose labels represent the elements α, β, γ, δ ∈ Hλ0 respectively, such that (p̃1)− = (p1)+, (p̃1)+ = (p 1)+, (p̃2)− = (p 2)+, (p̃2)+ = (p2)+, (p̃3)− = (p3)−, (p̃3)+ = (p 3)−, (p̃4)− = (p 4)−, (p̃4)+ = (p4)− (see Fig. 2). The cycles s−11 p̃1s 2p̃2p 2 , s2p̃3s p̃2 and s 3 p̃3p 3p̃4 give rise to the fol- lowing equalities in the group G: u = αtξaǫkβa−k, u = γtξβ and u = a−kγaǫktξδ. 22 ASHOT MINASYAN p1 s1 p2 s2 p3 s3 p4 p′1 s p̃1 p̃2 p̃3 p̃4 ak u ak u−1 ak u aǫkaǫkaǫk t−ξtξ tξ Figure 2. Consequently, recalling that Hλ0 is malnormal (Lemma 2.3) and that t ξ /∈ Hλ0 , we βakβ−1a−ǫk = t−ξγ−1αtξ ∈ Hλ0 ∩ t −ξHλ0t ξ = {1}, and a−ǫkγ−1akγ = tξδβ−1t−ξ ∈ Hλ0 ∩ t ξHλ0t −ξ = {1}. (6.1) βakβ−1 = aǫk and γ−1akγ = aǫk for some β = β(k), γ = γ(k) ∈ Hλ0 and ǫ = ǫ(k), ξ = ξ(k) ∈ {−1, 1}. Note that the proof works for any sufficiently large k, therefore we can find two mutually prime positive integers k, k′ with the above properties such that ǫ(k) = ǫ(k′) = ǫ and ξ(k) = ξ(k′) = ξ. Denote β′ = β(k′) and γ′ = γ(k′), then γtξβ = u = γ′tξβ′, implying γ−1γ′ = tξββ′ t−ξ ∈ Hλ0 ∩ t ξHλ0t −ξ = {1}. Hence β′ = β, γ′ = γ, (6.2) βak β−1 = aǫk and γ−1ak γ = aǫk It remains to observe that since k and k′ are mutually prime, the formulas (6.1) and (6.2) together yield βaβ−1 = aǫ and γ−1aγ = aǫ, q.e.d. � 7. 
Small cancellation over relatively hyperbolic groups

Let G be a group generated by a subset A ⊆ G and let O be the set of all words in the alphabet A^{±1} that are trivial in G. Then G has a presentation of the following form:

(7.1) G = 〈A ‖ O〉.

Given a symmetrized set of words R over the alphabet A, consider the group G1 defined by

(7.2) G1 = 〈A ‖ O ∪ R〉 = 〈G ‖ R〉.

During the proof of the main result of this section we use presentations (7.2) (or, equivalently, the sets of additional relators R) that satisfy the generalized small cancellation condition C1(ε, µ, λ, c, ρ). In the case of word hyperbolic groups this condition was suggested by Ol’shanskii in [16], and was afterwards generalized to relatively hyperbolic groups by Osin in [21]. For the definition and detailed theory we refer the reader to the paper [21], as we will only use the properties that were already established there. The following observation is an immediate consequence of the definition:

Remark 7.1. Let the constants ε_j, µ_j, λ, c, ρ_j, j = 1, 2, satisfy 0 < λ ≤ 1, 0 ≤ ε_1 ≤ ε_2, c ≥ 0, 0 < µ_2 ≤ µ_1, ρ_2 ≥ ρ_1 > 0. If the presentation (7.2) enjoys the condition C1(ε_2, µ_2, λ, c, ρ_2) then it also enjoys the condition C1(ε_1, µ_1, λ, c, ρ_1).

We will also assume that the reader is familiar with the notion of a van Kampen diagram over the group presentation (7.2) (see [10, Ch. V] or [15, Ch. 4]). Let ∆ be such a diagram. A cell Π of ∆ is called an R-cell if the label of its boundary contour ∂Π (i.e., the word written on it starting with some vertex in the counter-clockwise direction) belongs to R.

Consider a simple closed path o = rqr′q′ in a diagram ∆ over the presentation (7.2), such that q is a subpath of the boundary cycle of an R-cell Π and q′ is a subpath of ∂∆. Let Γ denote the subdiagram of ∆ bounded by o. Assuming that Γ has no holes, no R-cells and L(r), L(r′) ≤ ε, it will be called an ε-contiguity subdiagram of Π to ∂∆.
The ratio L(q)/L(∂Π) will be called the contiguity degree of Π to ∂∆ and denoted (Π, Γ, ∂∆). A diagram is said to be reduced if it has a minimal number of R-cells among all the diagrams with the same boundary label.

If G is a group hyperbolic relative to a family of proper subgroups {H_i}_{i∈I}, with a finite relative generating set X, then G is generated by the set A = X ∪ ⋃_{i∈I}(H_i \ {1}), and the Cayley graph Γ(G,A) is a hyperbolic metric space [20, Cor. 2.54].

As for every condition of small cancellation, the main statement of the theory is the following analogue of Greendlinger’s Lemma, claiming the existence of a cell, a large part of whose contour lies on the boundary of the van Kampen diagram.

Lemma 7.2 ([21], Cor. 4.4). Suppose that the group G is generated by a subset A such that the Cayley graph Γ(G,A) is hyperbolic. Then for any 0 < λ ≤ 1 there is µ_0 > 0 such that for any µ ∈ (0, µ_0] and c ≥ 0 there are ε_0 ≥ 0 and ρ_0 > 0 with the following property. Let the symmetrized presentation (7.2) satisfy the C1(ε_0, µ, λ, c, ρ_0)-condition. Further, let ∆ be a reduced van Kampen diagram over G1 whose boundary contour is (λ, c)-quasigeodesic in G. Then, provided ∆ has an R-cell, there exists an R-cell Π in ∆ and an ε_0-contiguity subdiagram Γ of Π to ∂∆, such that (Π, Γ, ∂∆) > 1 − 23µ.

The main application of this particular small cancellation condition is

Lemma 7.3 ([21], Lemmas 5.1 and 6.3). For any 0 < λ ≤ 1, c ≥ 0 and N > 0 there exist µ_1 > 0, ε_1 ≥ 0 and ρ_1 > 0 such that for any symmetrized set of words R satisfying the C1(ε_1, µ_1, λ, c, ρ_1)-condition the following hold.
1. The group G1 defined by (7.2) is hyperbolic relative to the collection of images {η(H_i)}_{i∈I} under the natural homomorphism η : G → G1.
2. The restriction of η to the subset of elements having length at most N with respect to A is injective.
3. Any element that has a finite order in G1 is an image of an element of finite order in G.
Below is the principal lemma of this section that will later be used to prove Theorem 1.5. 24 ASHOT MINASYAN Lemma 7.4. Assume that G is a torsion-free group hyperbolic relative to a family of proper subgroups {Hi}i∈I , X is a finite relative generating set of G, S is a suitable subgroup of G and U ⊂ G is a finite subset. Suppose that i0 ∈ I, a ∈ Hi0 \ {1} and v1, v2 ∈ G are hyperbolic elements which are not commensurable to each other. Then there exists a word W (x, y) over the alphabet {x, y} such that the following is true. Denote w1 = W (a, v1) ∈ G, w2 = W (a, v2) ∈ G, and let 〈〈w2〉〉 be the normal closure of w2 in G, G1 = G/〈〈w2〉〉 and η : G → G1 be the natural epimorphism. • η is injective on {Hλ}λ∈Λ ∪ U and G1 is hyperbolic relative to the family {η(Hλ)}λ∈Λ; • η(S) is a suitable subgroup of G1; • G1 is torsion-free; • η(w1) 6= 1. Proof. By Lemma 2.7 there are hyperbolic elements v3, v4 ∈ S such that vi 6≈ vj if 1 ≤ i < j ≤ 4. Then by Lemma 2.2, the group G is hyperbolic relative to the finite collection of subgroups {Hi}i∈I ∪ j=1{EG(vj)}, and generated by the set A = X ∪ EG(vj)  \ {1}. Let Ω ⊂ G and K ∈ N denote the finite subset and the constant achieved after an application of Lemma 6.1 to this new collection of peripheral subgroups. Define m = 7K, λ = 1/3, c = 2 and N = max{|u|A : u ∈ U} + 1. Choose µj > 0, εj ≥ 0 and ρj > 0, j = 0, 1, according to the claims of Lemmas 7.2 and 7.3. Let ε = max{ε0, ε1}, and let L = L(C, d) > 0 be the constant given by Lemma 6.5 where C = ε0 and d = 2. Evidently there exists n ∈ N such that, for µ = (3ε+ 11)/n, one has 0 < µ ≤ min{µ0, µ1}, 2n(1− 23µ) > L, and 2n > max{ρ0, ρ1}. F(ε) = h ∈ 〈Ω〉 : |h| ≤ max{K(32ε+ 70),m} Since the subset F(ε) is finite, we can find k ∈ N such that ak 1 , v 2 /∈ F(ε) whenever k′ ≥ k. Consider the word W (x, y) ≡ xkykxk+1yk+1 . . . xk+n−1yk+n−1. Let wj ∈ G be the element represented by the wordW (a, vj) in G, j = 1, 2, and let R be the set of all cyclic shifts ofW (a, v2) and their inverses. 
By Lemma 2.3, Hi0 ∩ EG(v2) = {1} because G is torsion-free, hence by [21, Thm. 7.5] the presentation (7.2) satisfies the condition C1(ε, µ, 1/3, 2, 2n), and therefore, by Remark 7.1, it satisfies the conditions C1(ε0, µ, 1/3, 2, ρ0) and C1(ε1, µ1, 1/3, 2, ρ1). Observe that w1 6= 1 in G because, otherwise, there would have existed a closed path q in Γ(G,A) labelled by the word W (a, v1), and, by part (a) of Lemma 6.3, all components of q would have been regular in the cycle o = rqr′q′ (where r, r′, q′ are trivial paths), which is obviously impossible. Denote G1 = G/〈〈w2〉〉 and let η : G → G1 be the natural epimorphism. Then, according to Lemma 7.3, the group G1 is is torsion-free, hyperbolic relative to {η(Hi)}i∈I∪ j=1{η(EG(vj))} and η is injective on the set i∈I Hi∪ j=1 EG(vj)∪ GROUPS WITH FINITELY MANY CONJUGACY CLASSES 25 U (because the length in A of any element from this set is at most N). Since any elementary group is word hyperbolic, G1 is also hyperbolic relative to {η(Hi)}i∈I (by Lemma 2.4) and η(v3), η(v4) ∈ η(S) become hyperbolic elements of infinite or- der in G1, that are not commensurable with each other (by Lemma 2.3). Therefore EG1(η(v3)) ∩ EG2(η(v4)) = {1} (recall that these subgroups are cyclic by Lemma 2.2 and because G1 is torsion-free), and, consequently, η(S) is a suitable subgroup of G1. Suppose that η(w1) = 1. By van Kampen’s Lemma there exists a reduced planar diagram ∆ over the presentation (7.2) with the wordW (a, v1) written on its boundary. SinceW (a, v1) 6= 1, ∆ possesses at least one R-cell. It was proved in [21, Lemma 7.1] that any path in Γ(G,A) labelled byW (a, v1) is (1/3, 2)-quasigeodesic, hence we can apply Lemma 7.2 to find an R-cell Π of ∆ and an ε0-contiguity subdiagram Γ (containing no R-cells) between Π and ∂∆ such that (Π,Γ, ∂∆) > 1− 23µ. 
Thus there exists a cycle o = rqr′q′ in Γ(G,A) such that q is labelled by a subword of (a cyclic shift of) W (a, v2), q ′ is labelled by a subword of (a cyclic shift of) W (a, v1) ±1, L(r),L(r′) ≤ ε0 = C and L(q) > (1− 23µ) · L(∂Π) = (1− 23µ) · 2n > L. In particular, Lab(q),Lab(q′) ∈ W(Ω,m). Therefore we can apply Lemma 6.5 to find two consecutive components of q that are connected to some components of q′. Due to the form of the word W (a, v2), one of the formers will have to be an EG(v2)-component, but q ′ can have only EG(v1)- or Hi0 -components. This yields a contradiction because EG(v2) 6= EG(v1) and EG(v2) 6= Hi0 . Hence η(w1) 6= 1 in G1, and the proof is complete. � 8. Every group is a group of outer automorphisms of a (2CC)-group Lemma 8.1. There exists a word R(x, y) over the two-letter alphabet {x, y} such that every non-elementary torsion-free word hyperbolic group F1 has a non-elemen- tary torsion-free word hyperbolic quotient F that is generated by two elements a, b ∈ F satisfying (8.1) R(a, b) 6= 1, R(a−1, b−1) = 1, R(b, a) = 1, R(b−1, a−1) Proof. Consider the word R(x, y) ≡ xy101x2y102 . . . x100y200. Denote by F (a, b) the free group with the free generators a, b. Let R1 = {R(a, b), R(a −1, b−1), R(b, a), R(b−1, a−1)}, and R2 be the set of all cyclic permutations of words from R 1 . It is easy to see that the set R2 satisfies the classical small cancellation condition C ′(1/8) (see [10, Ch. V]). Denote by Ñ the normal closure of the set R3 = {R(a −1, b−1), R(b, a), R(b−1, a−1)} in F (a, b). Since the symmetrization of R3 also satisfies C ′(1/8), the group F̃ = F (a, b)/Ñ is a torsion-free ([10, Thm. V.10.1]) word hyperbolic group (because it has a finite presentation for which the Dehn function is linear by [10, Thm. V.4.4]) such that R(a, b) 6= 1 but R(a−1, b−1) = R(b, a) = R(b−1, a−1) 26 ASHOT MINASYAN Indeed, if the word R(a, b) were trivial in F̃ then, by Greendlinger’s Lemma [10, Thm. 
V.4.4], it would contain more than a half of a relator from (the symmetriza- tion of) R3 as a subword, which would contradict the fact that R2 enjoys C ′(1/8). The group F̃ is non-elementary because every torsion-free elementary group is cyclic, hence, abelian, but in any abelian group the relation R(a−1, b−1) = 1 implies R(a, b) = 1. Now, the free product G̃ = F̃ ∗F1 is a torsion-free hyperbolic group. Its subgroups F̃ and F1 are non-elementary, hence, according to a theorem of Ol’shanskii [16, Thm. 2], there exists a non-elementary torsion-free word hyperbolic group F and a homomorphism φ : G̃ → F such that φ(F̃ ) = φ(F1) = F and φ(R(a, b)) 6= 1 in F . Therefore F is a quotient of F1, the (φ-images of the) elements a, b generate F and enjoy the required relations. � We are now ready to prove Theorem 1.5. Proof of Theorem 1.5. The argument will be similar to the one used to prove The- orem 5.1. First, set n = 2 and apply Lemma 4.2 to find a countable torsion-free group H and a normal subgroup M ⊳H , where H/M ∼= C and M has (2CC) (alternatively, one could start with a free group H ′ and M ′ ⊳H ′ such that H ′/M ′ ∼= C, and then apply Lemma 4.4 to the pair (H ′,M ′) to obtain H and M with these properties). Consider the word R(x, y) and the torsion-free hyperbolic group F , generated by the elements a, b ∈ F which satisfy (8.1), given by Lemma 8.1. Denote G(−2) = H ∗ F and let N(−2) be the normal closure of 〈M,F 〉 in G(−2), F (−2) = F , R(−2) = {R(a, b)} – a finite subset of F (−2). By Lemma 2.6, G(−2) will be hyperbolic relative to the subgroup H , G(−2) = H ·N(−2), H ∩N(−2) =M and F (−2) will be a suitable subgroup of G(−2). The element a ∈ F (−2) will be hyperbolic in G(−2) and since the group G(−2) is torsion-free, the maximal elementary subgroup EG(−2)(a) will be cyclic generated by some element h−2x−2, where h−2 ∈ H , x−2 ∈ N(−2). Choose y−2 ∈ M so that h−2y−2 6= 1. 
By Lemmas 2.2 and 2.5, the HNN- extension G(−3/2) = 〈G(−2), t−1 ‖ t−1h−2x−2t −1 = h−2y−2〉 is hyperbolic relative to H . As in proof of Theorem 5.1, one can verify that F (−3) is a suitable subgroup of G(−3/2), and apply Theorem 2.8 to find an epimorphism η−2 : G(−3/2) → G(−1) such that G(−1) is a torsion-free group hyperbolic relative to η−2(H), η−2 is injective on H ∪ R(−2) and η−2(t−1) ∈ F (−1) where F (−1) = η−2(F (−2)) is a suitable subgroup of G(−1). Hence η−2(G(−2)) = G(−1) as G(−3/2) was generated by G(−2) and t−1. Denote N(−1) = η−2(N(−2)), R(−1) = η−2(R(−2)) and ψ−2 = η−2|G(−2) : G(−2) ։ G(−1). One can show thatG(−1) = H ·N(−1) andH∩N(−1) =M using the same arguments as in the proof of Theorem 5.1. According to the construction, we have η−2(t−1)η−2(a)η−2(t −1) = η(t−1at −1) ∈ N(−1) ∩H =M in G(−1), therefore, since the conjugation by η−2(t−1) is an inner automorphism of F (−1), we can assume that F (−1) is generated by a−1 and b−1, where a−1 ∈M and R(a−1, b−1) 6= 1 in F (−1) (because η−2(R(a, b)) 6= 1 in F (−1)). GROUPS WITH FINITELY MANY CONJUGACY CLASSES 27 Now, if b−1 is not a hyperbolic element of G(−1), i.e., if b−1 G(−1) ∼ c for some c ∈ H , then c ∈ N(−1)∩H =M , and since M has (2CC) we can find s−1 ∈ G(−1) such that b−1 = s−1a−1s −1. In this case we define G(0) = G(−1), N(0) = N(−1), F (0) = F (−1), R(0) = R(−1), a0 = a−1, s0 = s−1 and ψ−1 = idG(−1). Otherwise, if b−1 is hyperbolic in G(−1), then we construct the group G(0), and an epimorphism ψ−1 : G(−1) → G(0) in an analogous way, to make sure that η−1 is injective on H ∪ R(−1), G(0) torsion-free and hyperbolic relative to (the image of) H , F (0) = ψ−1(F (−1)) is a suitable subgroup of G(0), G(0) = H ·N(0) and H ∩N(0) = M where N(0) = ψ−1(N(−1)), and b0 = s0a0s 0 in G(0) where b0 = ψ−1(b−1), a0 = ψ−1(a−1) for some s0 ∈ G(0) Enumerate all elements of N(0): {g0, g1, g2, . . . }, and of G(0): {q0, q1, q2, . . . }, so that g0 = q0 = 1. 
The groups G(j) together with N(j)⊳G(j), F (j) ≤ G(j), finite subsets R(j) ⊂ G(j), and elements aj, sj ∈ G(j), j = 1, 2, . . . , that we will construct shall satisfy the following properties: 1◦. for each j ∈ N there is an epimorphism ψj−1 : G(j − 1) → G(j) which is injective on H ∪R(j− 1). F (j) = ψj−1(F (j − 1)), N(j) = ψj−1(N(j − 1)), aj = ψj−1(aj−1) ∈M , sj = ψj−1(sj−1) ∈ G(j); 2◦. G(j) is torsion-free and hyperbolic relative to (the image of) H , and F (j) ≤ G(j) is a suitable subgroup generated by aj and sjajs 3◦. G(j) = H ·N(j), N(j)⊳G(j) and H ∩N(j) =M ; 4◦. the natural image ḡj of gj in G(j) belongs to F (j); 5◦. there exists zj ∈ H such that q̄j ∼ zj, where q̄j is the image of qj in G(j); 6◦. if j ≥ 1, q̄j−1 ∈ G(j − 1) \H and for each k̂ ∈ N there is k ≥ k̂ such that akj−1sj−1a G(j−1) 6≈ akj−1q̄j−1a j−1q̄ j−1, then there is a word Rj−1(x, y) over the two-letter alphabet {x, y} which satisfies R(j) ∋ ψj−1 Rj−1(aj−1, sj−1aj−1s 6= 1 and Rj−1(aj−1, q̄j−1aj−1q̄ =1 in G(j). Suppose that the groups G(0), . . . , G(i) have already been defined. The group G(i+ 1) will be constructed in three steps. First, assume that q̄i ∈ G(i) \ H and for each k̂ ∈ N there is k ≥ k̂ such that aki sia 6≈ aki q̄ia i . Observe that si /∈ H because, otherwise, one would have F (i) ⊂ H , which is impossible as F (i) is suitable in G(i). Therefore, by Corollary 6.6, we can suppose that k is so large that the elements v1 = a i sia and v2 = a i q̄ia i are hyperbolic in G(i). Applying Lemma 7.4 we can find a word W (x, y) over {x, y} such that the group G(i+ 1/3) = G(i)/〈〈W (ai, v2)〉〉 and the natural epimorphism η : G(i) → G(i + 1/3) satisfy the following: η is injective on H ∪ R(i), G(i + 1/3) is torsion-free and hyperbolic relative to (the image of) H , η(F (i)) ≤ G(i + 1/3) is a suitable subgroup, and η(W (ai, v1)) 6= 1. Define the word Ri(x, y) ≡ W (x, x kyk). 
Then Ri(ai, siais i ) = W (ai, v1), Ri(ai, q̄iaiq̄ i ) = W (ai, v2) in G(i), hence Ri(ai, siais 6= 1 and η Ri(ai, q̄iaiq̄ = 1 in G(i + 1/3). 28 ASHOT MINASYAN If, on the other hand, q̄i ∈ H or there is k̂ ∈ N such that for every k ≥ k̂ one has aki sia ≈ aki q̄ia i , then we define G(i+ 1/3) = G(i), η : G(i) → G(i+ 1/3) to be the identical homomorphism and Ri(x, y) to be the empty word. Let ĝi+1 and q̂i+1 denote the images of gi+1 and qi+1 in G(i + 1/3), N̂(i) = η(N(i)), F̂ (i) = η(F (i)) and R̂(i) = η R(i) ∪ {Ri(ai, siais . Then, using 3◦, we get G(i + 1/3) = H · N̂(i) and H ∩ N̂(i) = M because ker(η) ≤ N(i) (as ai, q̄iaiq̄ i ∈ N(i)). Now we construct the group G(i + 2/3) in exactly the same way as the group G(i+ 1/2) was constructed in during the proof of Theorem 5.1. If for some f ∈ G(i + 1/3), f q̂i+1f −1 = z ∈ H , then set G(i + 2/3) = G(i), Ki+1 = N̂(i)⊳G(i + 2/3) and ti+1 = 1. Otherwise, q̂i+1 is a hyperbolic element of infinite order in G(i + 1/3). Since G(i + 1/3) is torsion-free, one has EG(i+1/3)(q̂i+1) = 〈hx〉 for some h ∈ H and x ∈ N̂(i), and there is m ∈ Z such that q̂i+1 = (hx) m. Now, by Lemma 2.2, G(i + 1/3) is hyperbolic relative to {H, 〈hx〉}. Choose y ∈ M so that hy 6= 1 and let G(i+ 2/3) be the following HNN-extension of G(i + 1/3): G(i + 2/3) = 〈G(i + 1/3), ti+1 ‖ ti+1(hx)t i+1 = hy〉. The group G(i + 2/3) is torsion-free and hyperbolic relative to H by Lemma 2.5. One can show that F̂ (i) is a suitable subgroup of G(i + 2/3) in the same way as during the proof of Theorem 5.1. Lemma 4.3 assures that H ∩Ki+1 = M where Ki+1 ⊳G(i+ 2/3) is the normal closure of 〈N̂(i), ti+1〉 in G(i+2/3). Finally, note ti+1q̂i+1t i+1 = ti+1(hx) mt−1i+1 = (hy) m = z ∈ H in G(i + 2/3). Define Ti+1 = {ĝi+1, ti+1} ⊂ Ki+1. The group G(i + 1) is constructed from G(i+2/3) as follows. 
Since Ti+1 · F̂ (i) ⊂ Ki+1⊳G(i+2/3), we can apply Theorem 2.8 to find a group G(i + 1) and an epimorphism ϕi : G(i + 2/3) → G(i + 1) such that ϕi is injective on H ∪ R̂(i), G(i + 1) is torsion-free and hyperbolic relative to (the image of) H , {ϕi(ĝi+1), ϕi(ti+1)} ⊂ ϕi(F̂ (i)), ϕi(F̂ (i)) is a suitable subgroup of G(i + 1), and ker(ϕi) ≤ Ki+1. Denote by ψi : G(i) → G(i + 1) the composition ϕi ◦ η. Then ψi(G(i)) = ϕi(G(i)) = G(i + 1) because G(i + 2/3) was generated by G(i) and ti+1, and according to the construction, ti+1 ∈ ϕi(F̂ (i)) ≤ ϕi(G(i)). Now, after defining F (i+1) = ψi(F (i)), N(i+1) = ψi(N(i)), R(i+1) = ϕi(R̂(i)), ḡi+1 = ϕi(ĝi+1) ∈ F (i+1) and zi+1 = ϕi(z) ∈ H , we see that the conditions 1 ◦ - 5◦ hold in the case when j = i+ 1, as in the proof of Theorem 5.1. The last property 6◦ follows from the way we constructed the group G(i + 1/3). Let Q = G(∞) be the direct limit of the sequence (G(i), ψi) as i → ∞, and let F (∞) and N = N(∞) be the limits of the corresponding subgroups. Let a∞, b∞ and s∞ be the images of a0, b0 and s0 in Q respectively. Then b∞ = s∞a∞s ∞ , Q is torsion-free by 2◦, N ⊳Q, Q = H ·N and H ∩N =M by 3◦, N ≤ F (∞) by 4◦. Hence Q/N ∼= H/M ∼= C. Since F (0) ≤ N(0) we get F (∞) ≤ N . Thus N = F (∞) is a homomorphic image of F (0) = F , and, consequently, it is a quotient of F1. By 5 ◦, for any q ∈ N there are z ∈ H and p ∈ Q such that pqp−1 = z. Consequently z ∈ H ∩ N = M . Choose x ∈ N and h ∈ H so that p = hx. Since M has (2CC) and h−1zh ∈ M , there is y ∈ M such that yh−1zhy−1 = z, therefore (yx)q(yx)−1 = z ∈ M and GROUPS WITH FINITELY MANY CONJUGACY CLASSES 29 yx ∈ MN = N . Hence each element q of N will be conjugated (in N) to an element ofM , and since M has (2CC), therefore the group N will also have (2CC). The property that CQ(N) = {1} can be established in the same way as in Theorem 5.1. Therefore the natural homomorphism Q → Aut(N) is injective. 
It remains to show that it is surjective, that is for every φ ∈ Aut(N) there is g ∈ Q such that φ(x) = gxg−1 for every x ∈ N . Since all non-trivial elements of N are conjugated, after composing φ with an inner automorphism of N , we can assume that φ(a∞) = a∞. On the other hand, there exist q∞ ∈ N and i ∈ N such that φ(b∞) = q∞a∞q ∞ and q∞ is the image of qi in Q. Note that s∞ /∈ H because si ∈ G(i) \ H for every i ∈ N. This implies that H is a proper subgroup of N , thus q∞ /∈ H since N = F (∞) = 〈a∞, q∞a∞q ∞ 〉 ≤ Q, and a∞ ∈ H . Hence q̄i ∈ G(i) \H . Now we have to consider two possibilities. Case 1: for each k̂ ∈ N there is k ≥ k̂ such that aki sia 6≈ aki q̄ia Then there is a word Ri(x, y) such that the property 6 ◦ holds for j = i + 1. And, since each ψj is injective on {1} ∪Rj (by 2 ◦), we conclude that Ri(a∞, s∞a∞s ∞ ) 6= 1 and Ri(a∞, q∞a∞q ∞ ) = 1 in Q, which contradicts the injectivity of φ. Hence Case 1 is impossible. Case 2: the assumptions of Case 1 fail. Then we can use Lemma 6.7 to find β, γ ∈ H and ǫ, ξ ∈ {−1, 1} such that q̄i = γs iβ, βaiβ −1 = aǫi and γ −1aiγ = a in G(i). Denote by γ∞ the image γ in Q, and for any y ∈ Q let Cy be the automorphism of N defined by Cy(x) = yxy −1 for all x ∈ N . If ξ = −1 then γ−1∞ a∞γ∞ = a ∞ and φ(b∞) = q∞a∞q ∞ = γ∞s hence Aut(N) ∋ Cs∞γ−1∞ ◦ φ : a∞ 7→ s∞a ∞ = b b∞ = s∞a∞s ∞ 7→ a But N has no such automorphisms because R(a∞, b∞) 6= 1 and R(b ∞) = 1 in N (since N is a quotient of F and 1 6= R(a0, b0) ∈ R(0) in G(0)). Therefore ξ = 1. Similarly, ǫ = 1, as otherwise we would obtain a contradiction with the fact that R(a−1∞ , b ∞ ) = 1 in N . Thus Aut(N) ∋ Cγ−1∞ ◦ φ : a∞ 7→ a∞ b∞ = s∞a∞s ∞ 7→ s∞a∞s ∞ = b∞ And since a∞ and b∞ generate N we conclude that for all x ∈ N , φ(x) = gxg where g = γ∞ ∈ Q. Thus the natural homomorphism fromQ to Aut(N) is bijective, implying that Out(N) = Aut(N)/Inn(N) ∼= Q/N ∼= C. Q.e.d. � References [1] G. Arzhantseva, A. Minasyan, D. 
Osin, The SQ-universality and residual properties of relatively hyperbolic groups, J. Algebra 315 (2007), no. 1, 165-177.
[2] I. Belegradek, D. Osin, Rips construction and Kazhdan property (T), preprint (2006). arXiv: math.GR/0605553
[3] I. Bumagin, D.T. Wise, Every group is an outer automorphism group of a finitely generated group, J. Pure Appl. Algebra 200 (2005), no. 1-2, 137-147.
[4] R. Camm, Simple Free Products, J. London Math. Soc. 28 (1953), 66-76.
[5] Y. de Cornulier, Finitely presentable, non-Hopfian groups with Kazhdan’s Property and infinite outer automorphism group, Proc. Amer. Math. Soc. 135 (2007), no. 4, 951-959.
[6] P. de la Harpe, A. Valette, La propriété (T) de Kazhdan pour les groupes localement compacts, (avec un appendice de Marc Burger). Astérisque 175, 1989.
[7] M. Droste, M. Giraudet, R. Göbel, All groups are outer automorphism groups of simple groups, J. London Math. Soc. (2) 64 (2001), no. 3, 565-575.
[8] G. Higman, B.H. Neumann, H. Neumann, Embedding theorems for groups, J. London Math. Soc. 24 (1949), 247-254.
[9] The Kourovka notebook. Unsolved problems in group theory, 16th augmented edition, V. D. Mazurov and E. I. Khukhro eds., Rossĭıskaya Akademiya Nauk, Sibirskoe Otdelenie, Institut Matematiki (Siberian branch of Russian Academy of Sciences, Mathematical Institute), Novosibirsk, 2006.
[10] R. Lyndon and P. Schupp, Combinatorial Group Theory, Springer-Verlag, 1977.
[11] T. Matumoto, Any group is represented by an outer automorphism group, Hiroshima Math. J. 19 (1989), no. 1, 209-219.
[12] A. Muranov, Diagrams with selection and method for constructing boundedly generated and boundedly simple groups, Comm. Algebra 33 (2005), no. 4, 1217-1258.
[13] A. Muranov, Finitely generated infinite simple groups of infinite commutator width, Int. J. Algebra Comput. 17 (2007), no. 3, 607-659.
[14] Y. Ollivier, D.T. Wise, Kazhdan groups with infinite outer automorphism group, Trans. Amer.
Math. Soc. 359 (2007), no. 5, 1959-1976.
[15] A.Yu. Ol’shanskii, Geometry of defining relations in groups, Moscow, Nauka, 1989 (in Russian); English translation in Mathematics and its Applications (Soviet Series), 70. Kluwer Academic Publishers Group, Dordrecht, 1991.
[16] A.Yu. Ol’shanskii, On residualing homomorphisms and G-subgroups of hyperbolic groups, Internat. J. Algebra Comput. 3, no. 4 (1993), 365-409.
[17] D.V. Osin, Elementary subgroups of relatively hyperbolic groups and bounded generation, Internat. J. Algebra Comput. 16 (2006), no. 1, 99-118.
[18] D.V. Osin, Peripheral fillings of relatively hyperbolic groups, Invent. Math. 167 (2007), no. 2, 295-326.
[19] D.V. Osin, Relative Dehn functions of HNN-extensions and amalgamated products, Topological and asymptotic aspects of group theory, Contemp. Math. 394, 209-220, Amer. Math. Soc., Providence, RI, 2006.
[20] D.V. Osin, Relatively hyperbolic groups: intrinsic geometry, algebraic properties, and algorithmic problems, Mem. Amer. Math. Soc. 179 (2006), no. 843, vi+100 pp.
[21] D.V. Osin, Small cancellations over relatively hyperbolic groups and embedding theorems, Annals of Math., to appear. arXiv: math.GR/0411039
[22] F. Paulin, Outer automorphisms of hyperbolic groups and small actions on R-trees, Arboreal Group Theory (MSRI, Berkeley, 1988), R.C. Alperin ed., Math. Sci. Res. Inst. Publ. 19, Springer, New York, 1991.

School of Mathematics, University of Southampton, Highfield, Southampton, SO17 1BJ, United Kingdom.
E-mail address: aminasyan@gmail.com

1. Introduction 2. Relatively hyperbolic groups 3. Groups with finitely many conjugacy classes 4. Normal subgroups with (nCC) 5. Adding finite generation 6. Combinatorics of paths in relatively hyperbolic groups 7. Small cancellation over relatively hyperbolic groups 8.
Every group is a group of outer automorphisms of a (2CC)-group References ABSTRACT We combine classical methods of combinatorial group theory with the theory of small cancellations over relatively hyperbolic groups to construct finitely generated torsion-free groups that have only finitely many classes of conjugate elements. Moreover, we present several results concerning embeddings into such groups. As another application of these techniques, we prove that every countable group $C$ can be realized as a group of outer automorphisms of a group $N$, where $N$ is a finitely generated group having Kazhdan's property (T) and containing exactly two conjugacy classes. <|endoftext|><|startoftext|> Energy density for chiral lattice fermions with chemical potential

Christof Gattringer^a and Ludovit Liptak^b
^a Institut für Physik, FB Theoretische Physik, Universität Graz, Universitätsplatz 5, 8010 Graz, Austria
^b Institute of Physics, Slovak Academy of Sciences, Dúbravská cesta 9, 845 11 Bratislava 45, Slovak Republic

We study a recently proposed formulation of overlap fermions at finite density. In particular we compute the energy density as a function of the chemical potential and the temperature. It is shown that overlap fermions with chemical potential approach the correct continuum behavior.

PACS numbers: 11.15.Ha, 12.38.Gc

I. INTRODUCTION

Over the last two decades lattice gauge theory was turned into a powerful quantitative tool for analyzing QCD. This progress is in part due to the advances in algorithms and computer technology, but also on the conceptual side important breakthroughs were made. Most prominent among these is the correct implementation of chiral symmetry on the lattice based on the Ginsparg-Wilson equation for the Dirac operator [1].

An application of lattice techniques which has seen a lot of attention in recent years is the study of QCD at finite density.
The lattice implementation of the chemical potential µ, necessary for such an analysis, is not straightforward, however. It is well known [2] that a naive introduction leads to $\mu^2/a^2$ contributions which diverge in the continuum limit when the lattice spacing a is sent to zero. For more traditional formulations, such as the Wilson or staggered Dirac operators, the problem has been solved by introducing the chemical potential in the same way as the 4-component of the gauge field.

A satisfactory implementation of the chemical potential should be compatible with chiral symmetry on the lattice based on the Ginsparg-Wilson equation. When attempting to introduce the chemical potential into the only solution of the Ginsparg-Wilson equation known in closed form, the overlap operator [3], a potential problem quickly surfaces: defining the sign function of a non-hermitian matrix. In [4] Bloch and Wettig proposed a solution based on an analytic continuation of the sign function into the complex plane. It was shown that the eigenvalue spectra of this construction match the expectations from random matrix theory.

In this letter we analyze the proposal [4] further and study the energy density of free, massless overlap fermions with chemical potential. The dependence of the energy density on µ and the temperature T allows for a detailed analysis of the lattice formulation at finite density. Of particular interest will be the question whether the analytic continuation of the sign function produces divergent $\mu^2/a^2$ terms. Our study indicates the absence of such contributions and we find that the µ and T dependence of the energy density is approached correctly.

II. SETUP OF THE CALCULATION

The overlap Dirac operator D(µ) for fermions with a chemical potential µ is given as

$$D(\mu) = \frac{1}{a}\left[1 - \gamma_5\,\mathrm{sign}\,H(\mu)\right], \qquad H(\mu) = \gamma_5\left[1 - a\,D_W(\mu)\right]. \tag{1}$$

The sign function may be defined through the spectral theorem for matrices.
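The spectral definition can be made concrete with a minimal numerical sketch. It assumes a diagonalizable matrix and, following the continuation of [4] that the text states in Eq. (3), assigns to each complex eigenvalue the sign of its real part. The helper name `matrix_sign` and the random test matrix are illustrative assumptions and are not taken from the letter.

```python
import numpy as np

def matrix_sign(A):
    """Sign function of a diagonalizable matrix via its spectral decomposition.

    A = V diag(alpha) V^{-1}: the columns of V are right eigenvectors and the
    rows of V^{-1} are the matching left eigenvectors. Each eigenvalue is
    replaced by the sign of its real part.
    """
    alpha, V = np.linalg.eig(A)
    return V @ np.diag(np.sign(alpha.real)) @ np.linalg.inv(V)

# Illustrative non-Hermitian input: a Hermitian matrix plus a small
# non-Hermitian perturbation (an assumption made only for this demo).
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (B + B.conj().T) / 2 + 0.1 * B
S = matrix_sign(A)

# A matrix sign function squares to the identity and commutes with A.
print(np.allclose(S @ S, np.eye(4)), np.allclose(S @ A, A @ S))
```

For the free operator considered later this generic route is not needed, since the 4 × 4 momentum-space blocks can be treated in closed form.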
$D_W(\mu)$ denotes the usual Wilson Dirac operator,

$$D_W(\mu)_{x,y} = \mathbb{1}\left[\frac{3}{a}+\frac{1}{a_4}\right]\delta_{x,y} - \sum_{j=1}^{3}\left[\frac{1-\gamma_j}{2a}\,U_j(x)\,\delta_{x+\hat{j},y} + \frac{1+\gamma_j}{2a}\,U_j(x-\hat{j})^\dagger\,\delta_{x-\hat{j},y}\right] - \frac{1-\gamma_4}{2a_4}\,U_4(x)\,e^{\mu a_4}\,\delta_{x+\hat{4},y} - \frac{1+\gamma_4}{2a_4}\,U_4(x-\hat{4})^\dagger\,e^{-\mu a_4}\,\delta_{x-\hat{4},y}. \tag{2}$$

For later use we distinguish between the lattice spacing a in spatial direction and the temporal lattice constant $a_4$. Periodic boundary conditions are used in the spatial directions, while in time direction we apply anti-periodic boundary conditions. The chemical potential µ is coupled in the usual exponential form [2].

For vanishing µ the Wilson Dirac operator is γ5-hermitian, i.e., $\gamma_5 D_W(0)\gamma_5 = D_W(0)^\dagger$. This implies that H(0) is a hermitian matrix. As soon as the chemical potential µ is turned on, γ5-hermiticity no longer holds, and H(µ) is a non-hermitian, general matrix. This fact has two important consequences: Firstly, the eigenvalues of H(µ) are no longer real and the sign function for a complex number has to be defined in the spectral representation of sign H(µ). Secondly, the spectral representation has to be formulated using left and right eigenvectors. This latter problem will be dealt with later when we discuss the evaluation of sign H(µ). For the sign function of a complex number we use the analytic continuation proposed in [4] and define the sign function through the sign of the real part

$$\mathrm{sign}(x+iy) = \mathrm{sign}(x). \tag{3}$$

The observable we study here is the energy density defined as

$$\epsilon(\mu) = \frac{1}{V}\langle\mathcal{H}\rangle = \frac{1}{V}\frac{1}{Z}\,\mathrm{Tr}\left[\mathcal{H}\,e^{-\beta(\mathcal{H}-\mu\mathcal{N})}\right] = -\frac{1}{V}\frac{1}{Z}\frac{\partial}{\partial\beta}\,\mathrm{Tr}\left[e^{-\beta(\mathcal{H}-\mu\mathcal{N})}\right]\bigg|_{\beta\mu=c} = -\frac{1}{V}\frac{\partial\ln Z}{\partial\beta}\bigg|_{\beta\mu=c}. \tag{4}$$

Here $\mathcal{H}$ is the Hamiltonian of the system, $\mathcal{N}$ denotes the number operator and β = 1/T is the inverse temperature (in our units the Boltzmann constant k is set to k = 1). The derivatives in the second line are taken such that βµ = c = const. The continuum result for the subtracted energy density of free massless fermions reads (see, e.g., [5])

$$\epsilon(\mu)-\epsilon(0) = \frac{\mu^4}{4\pi^2} + \frac{\mu^2T^2}{2}. \tag{5}$$

When working on the lattice, the inverse temperature β is given by the lattice extent in 4-direction, i.e., $\beta = N_4 a_4$.
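Thus the derivative $\partial/\partial\beta$ in (4) turns into $N_4^{-1}\, \partial/\partial a_4$. The partition function $Z$ is given by the fermion determinant $\det D$, which we write as the product over all eigenvalues $\lambda_n$. We thus find

$$ \epsilon(\mu) = -\frac{1}{V N_4} \left. \frac{\partial \ln \det D}{\partial a_4} \right|_{a_4\mu = c} = -\frac{1}{V N_4} \sum_n \left. \frac{\partial \ln \lambda_n}{\partial a_4} \right|_{a_4\mu = c} = -\frac{1}{V N_4} \sum_n \frac{1}{\lambda_n} \left. \frac{\partial \lambda_n}{\partial a_4} \right|_{a_4\mu = c}. \tag{6} $$

III. EVALUATION OF THE EIGENVALUES

According to (6), the eigenvalues $\lambda_n$ of the Dirac operator $D$ have to be computed for the evaluation of $\epsilon(\mu)$. This is done in three steps. First we bring the Dirac operator for free fermions to $4 \times 4$ block-diagonal form, using Fourier transformation. Subsequently the spectral representation is applied to the $4 \times 4$ blocks of $H$ to evaluate $\mathrm{sign}\, H$. Finally the eigenvalues of the blocks of $D$ are computed, and by summing over the discrete momenta all eigenvalues are obtained.

Following this strategy, one finds for the Fourier transform $\hat{H}$ of $H$,

$$ \hat{H} = \gamma_5 h_5 + i \gamma_5 \sum_{\nu=1}^{4} \gamma_\nu h_\nu, \tag{7} $$

$$ h_5 = 1 - \sum_{j=1}^{3} \left[ 1 - \cos(a p_j) \right] - \frac{a}{a_4} \left[ 1 - \cos(a_4 (p_4 - i\mu)) \right], $$
$$ h_j = -\sin(a p_j) \quad \text{for } j = 1, 2, 3, \qquad h_4 = -\frac{a}{a_4} \sin(a_4 (p_4 - i\mu)). \tag{8} $$

The spatial momenta are given by $p_j = 2\pi k_j / (aN)$, where $N$ is the number of lattice points in the spatial directions and $k_j = 0, 1, \ldots, N-1$. The momenta in the time direction are $p_4 = \pi (2 k_4 + 1) / (a_4 N_4)$, $k_4 = 0, 1, \ldots, N_4 - 1$.

The remaining diagonalization of $\hat{H}$ is similar to the construction of the left- and right-eigenfunctions for the free Dirac operator. One finds that $\hat{H}$ has two different, doubly degenerate eigenvalues,

$$ \alpha_1 = \alpha_2 = +s, \qquad \alpha_3 = \alpha_4 = -s, \qquad s = \sqrt{h^2 + h_5^2}, \tag{9} $$

where $h^2 = \sum_\nu h_\nu^2$. The corresponding left- and right-eigenvectors, $l_j$ and $r_j$, are given by

$$ l_1 = l_1^{(0)} [\hat{H} + s \mathbb{1}], \quad l_2 = l_2^{(0)} [\hat{H} + s \mathbb{1}], \quad l_3 = l_3^{(0)} [\hat{H} - s \mathbb{1}], \quad l_4 = l_4^{(0)} [\hat{H} - s \mathbb{1}], $$
$$ r_1 = [\hat{H} + s \mathbb{1}]\, r_1^{(0)}, \quad r_2 = [\hat{H} + s \mathbb{1}]\, r_2^{(0)}, \quad r_3 = [\hat{H} - s \mathbb{1}]\, r_3^{(0)}, \quad r_4 = [\hat{H} - s \mathbb{1}]\, r_4^{(0)}. \tag{10} $$

The constant spinors $l_j^{(0)}$, $r_j^{(0)}$ are ($T$ is transposition)

$$ l_1^{(0)} = r_1^{(0)\,T} = c\, (1, 0, 0, 0), \qquad l_2^{(0)} = r_2^{(0)\,T} = c\, (0, 1, 0, 0), $$
$$ l_3^{(0)} = r_3^{(0)\,T} = c\, (0, 0, 1, 0), \qquad l_4^{(0)} = r_4^{(0)\,T} = c\, (0, 0, 0, 1). \tag{11} $$

The constant $c = (2s(s + h_5))^{-1/2}$ ensures the correct normalization, such that the eigenvectors obey $l_i r_j = \delta_{ij}$.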
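The eigenvalue structure (9) can be checked numerically at a single momentum. The sketch below sets $a = a_4 = 1$, builds $\hat{H}$ from (7) and (8), and compares a numerical diagonalization with $\pm s$. The chiral representation of the Euclidean gamma matrices, the lattice size, the momentum labels and the value of $\mu$ are arbitrary illustrative choices, and all function names are hypothetical.

```python
import numpy as np

# Pauli matrices and 2x2 helper blocks.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)

# Hermitian Euclidean gamma matrices in a chiral representation (one possible choice).
gammas = [np.block([[Z2, -1j * s], [1j * s, Z2]]) for s in (sx, sy, sz)]
gammas.append(np.block([[Z2, I2], [I2, Z2]]))
gamma5 = gammas[0] @ gammas[1] @ gammas[2] @ gammas[3]


def h_components(p, mu):
    """Return (h5, h) of Eq. (8) for a = a4 = 1; p = (p1, p2, p3, p4)."""
    p = np.asarray(p, dtype=float)
    p4 = p[3] - 1j * mu  # the chemical potential shifts p4 -> p4 - i mu
    h5 = 1.0 - np.sum(1.0 - np.cos(p[:3])) - (1.0 - np.cos(p4))
    h = np.array([*(-np.sin(p[:3])), -np.sin(p4)], dtype=complex)
    return h5, h


def H_hat(p, mu):
    """4x4 block of Eq. (7): gamma5 h5 + i gamma5 sum_nu gamma_nu h_nu."""
    h5, h = h_components(p, mu)
    slash = sum(g * hv for g, hv in zip(gammas, h))
    return gamma5 * h5 + 1j * gamma5 @ slash


# Hypothetical 4^3 x 4 lattice, momentum labels k = (1, 2, 0), k4 = 1.
N, N4, mu = 4, 4, 0.3
p = [2 * np.pi * 1 / N, 2 * np.pi * 2 / N, 0.0, np.pi * (2 * 1 + 1) / N4]

H = H_hat(p, mu)
h5, h = h_components(p, mu)
s = np.sqrt(np.sum(h**2) + h5**2)  # h^2 is a plain sum of squares, no conjugation
eigs = np.linalg.eigvals(H)

print("s =", s)
print("eigenvalues of H_hat:", eigs)
print("H_hat^2 = s^2 * 1:", np.allclose(H @ H, s**2 * np.eye(4)))
```

Because $\gamma_5$ anticommutes with every $\gamma_\nu$, the square of $\hat{H}$ is proportional to the unit matrix even for complex $h_\nu$, which is why the eigenvalues come out as $\pm s$ at non-zero $\mu$ as well.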
Using these eigenvectors and the spectral theorem we find for $\mathrm{sign}\, \hat{H}$ the simple result

$$ \mathrm{sign}\, \hat{H} = \sum_{j=1}^{4} \mathrm{sign}(\alpha_j)\, r_j\, l_j = \frac{\mathrm{sign}(s)}{s}\, \hat{H}. \tag{12} $$

Plugging this back into the overlap formula (1) and diagonalizing the remaining $4 \times 4$ problem, one finds two different eigenvalues for the overlap operator at a given momentum,

$$ \lambda_\pm = 1 - \mathrm{sign}\!\left( \sqrt{h^2 + h_5^2} \right) \frac{h_5 \pm i \sqrt{h^2}}{\sqrt{h^2 + h_5^2}}, \tag{13} $$

where each of the two eigenvalues is twofold degenerate. The momentum dependence enters through the components $h_\nu$, $h_5$ defined in (8). In the spectral sum (6) the label $n$ runs over all momenta and over the eigenvalues at fixed momentum as given in (13). The necessary derivative with respect to $a_4$ is straightforward to compute in closed form, and the spectral sum (6) can then be summed numerically. The argument of the sign function cannot become purely imaginary on a finite lattice, and no $\delta$-like terms occur. We remark that after taking the derivative with respect to $a_4$, we set $a = a_4 = 1$, i.e., all the results we present are in lattice units.

FIG. 1: The energy density $\epsilon(\mu) - \epsilon(0)$ as a function of $\mu^4$ (all in lattice units). The symbols (connected to guide the eye) are for various lattice sizes, the dashed line is the continuum result $\mu^4/4\pi^2$.

IV. RESULTS

We begin the discussion of our results with Fig. 1, where we show the subtracted energy density $\epsilon(\mu) - \epsilon(0)$ as a function of $\mu^4$ for three different lattice volumes. For those lattices all 4 sides have equal length, i.e., in the thermodynamic limit they correspond to zero temperature. Thus, according to (5), we expect the data (symbols in Fig. 1) to approach the continuum form $\mu^4/4\pi^2$ (dashed line) as the 4-d volume is sent to infinity.

The figure clearly shows that the lattice data are predominantly linear when plotted versus $\mu^4$ and that for small $\mu$ they approach the continuum curve when the volume is increased.
It is, however, evident that even on our largest lattice a discrepancy remains for larger $\mu$. In particular one finds a slight upward curvature, a discretization effect which here, since the lattice spacing is just the inverse lattice extension, is also a finite size effect. Furthermore, for small $\mu$ one expects to see finite temperature corrections according to (5).

In order to study these finite temperature corrections systematically, we analyzed lattices with short temporal extent, i.e., lattices with non-vanishing temperature. Fig. 2 shows the corresponding results, where we again plot the subtracted energy density as a function of $\mu^4$. The lattice with the shortest temporal extent, $128^3 \times 8$, which corresponds to the largest temperature, shows a clear curvature. This curvature is due to the $T^2\mu^2/2$ term in (5), which appears as a square root when plotted as a function of $\mu^4$. The effect is also visible for the other lattices, but becomes less pronounced as the temporal extent is increased, i.e., as the temperature $T$ is lowered.

FIG. 2: The energy density $\epsilon(\mu) - \epsilon(0)$ as a function of $\mu^4$, now for finite temperature lattices with temporal extent $N_4 = 8, 12, 16, 24$ (all in lattice units).

In order to study this effect quantitatively, we fit the finite temperature results to the continuum form (5) plus two terms even in $\mu$ which parameterize the cutoff effects observed in Fig. 1. The fit function is given by

$$ \epsilon(\mu) - \epsilon(0) = c_2\, \mu^2 + c_4\, \mu^4 + c_6\, \mu^6 + c_8\, \mu^8. \tag{14} $$

Due to (5) the coefficient of the quadratic term should scale with the temperature such that one expects

$$ c_2 \sim T^2/2 = N_4^{-2}/2. \tag{15} $$

The coefficient of the quartic term should be constant,

$$ c_4 \sim 1/4\pi^2 = 0.02533. \tag{16} $$

The results of the fit for the data used in Fig. 2 and for the largest lattice of Fig. 1 are given in Table I.
The table shows that with increasing $N_4$ the two physically significant parameters $c_2$ and $c_4$ approach the values expected from the continuum formula (5): $c_2$ gets closer to $N_4^{-2}/2$ as listed in the second column, and $c_4$ approaches $1/4\pi^2 = 0.02533$. For the largest finite temperature lattice, $128^3 \times 24$, the discrepancy is down to 9 % for $c_2$ and 2 % for $c_4$. The larger discrepancy for small $N_4$ can be understood as a discretization effect, since the temporal lattice spacing $a_4$ is related to the temporal extension through $a_4 = 1/N_4$ and thus larger $N_4$ implies a smaller $a_4$. For comparison we also display the fit results for the $128^4$ lattice, which corresponds to zero temperature. There we find excellent agreement (less than 1 % discrepancy) for the parameter $c_4$, which governs the leading term at $T = 0$.

    N_4    N_4^{-2}/2    c_2         c_4        c_6      c_8
    8      0.007812      0.010125    0.03519    0.010    -0.021
    12     0.003472      0.004125    0.03178    0.023    -0.013
    16     0.001953      0.002192    0.02803    0.029    -0.015
    24     0.000868      0.000947    0.02587    0.025    -0.030
    128    0.000030      0.000032    0.02543    0.015     0.016

TABLE I: Results of the fits to the form (14). The spatial volume is always $128^3$. The temporal extension $N_4$ is given in the first column. In the second column we list the corresponding value of $N_4^{-2}/2$, which is what one expects for the fitting coefficient $c_2$ in the third column. The coefficient $c_4$ in the fourth column is expected to approach the constant value $1/4\pi^2 = 0.02533$.

FIG. 3: The ratio $(\epsilon(\mu) - \epsilon(0))/\mu^4$ as a function of $\mu$ (in lattice units). We compare the results for overlap fermions to those from Wilson fermions.

The overall picture obtained from the fit results is that overlap fermions with chemical potential reproduce very well both the $\mu^4$ term and the finite temperature contribution $T^2\mu^2/2$. We conclude that the analytic continuation of the sign function does not introduce lattice artifacts, such as the $\mu^2/a^2$ term known to be present in a naive implementation of the chemical potential.
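The percentages quoted for the fit coefficients can be reproduced directly from Table I. The short sketch below recomputes the expected values $N_4^{-2}/2$ and $1/4\pi^2$ and the relative deviations of the fitted $c_2$ and $c_4$. The dictionary `fits` simply copies the table entries, and the helper name `deviations` is hypothetical.

```python
import math

# N4 -> (c2, c4), copied from Table I (spatial volume 128^3 throughout).
fits = {
    8: (0.010125, 0.03519),
    12: (0.004125, 0.03178),
    16: (0.002192, 0.02803),
    24: (0.000947, 0.02587),
    128: (0.000032, 0.02543),
}

C4_CONTINUUM = 1.0 / (4.0 * math.pi**2)  # coefficient of mu^4 in Eq. (5)


def deviations(n4):
    """Relative deviation of the fitted (c2, c4) from Eqs. (15) and (16)."""
    c2, c4 = fits[n4]
    c2_expected = 0.5 / n4**2  # T^2 / 2 with T = 1 / N4 in lattice units
    return (c2 - c2_expected) / c2_expected, (c4 - C4_CONTINUUM) / C4_CONTINUUM


for n4 in fits:
    d2, d4 = deviations(n4)
    print(f"N4 = {n4:3d}: c2 off by {100 * d2:5.1f} %, c4 off by {100 * d4:5.1f} %")
```

The $c_2$ entry for $N_4 = 128$ carries only two significant digits in the table, so its relative deviation is dominated by rounding and should not be over-interpreted.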
In the final step of our analysis we study the discretization effect for larger values of $\mu$ and compare the results to the data from the standard Wilson operator. In Fig. 3 we plot the ratio $(\epsilon(\mu) - \epsilon(0))/\mu^4$ as a function of $\mu$. In the continuum at $T = 0$ this ratio has the value $1/4\pi^2 = 0.02533$, indicated by the horizontal line. For small $\mu$, up to about $\mu \sim 0.7$, the Wilson and overlap data fall on top of each other. For very small $\mu$ both operators show a prominent increase, a left-over finite temperature effect that shows up as a $1/\mu^2$ term in the ratio $(\epsilon(\mu) - \epsilon(0))/\mu^4$. In the range between $\mu = 0.1$ and $0.5$ the data are close to the continuum value. Beyond $0.5$ the discretization effects set in and the overlap and Wilson results start to differ. A comparison with the equivalent plot in [6], where the results from various other lattice Dirac operators were presented, shows that the discretization effects of the overlap operator at large $\mu$ are comparable to those of other formulations.

V. SUMMARY

In this article we have analyzed the energy density of the overlap operator at finite chemical potential. Following [4], the sign function in the overlap operator was implemented through the spectral theorem using the analytic continuation of the sign function into the complex plane. The subtracted energy density $\epsilon(\mu) - \epsilon(0)$ was analyzed for finite and zero temperature lattices. Fits of the data show that the expected continuum behavior is approached. No trace of unphysical $\mu^2/a^2$ terms was found. We conclude that overlap fermions with chemical potential [4] provide both chiral symmetry and the correct description of fermions at finite density.

Acknowledgments: We thank Leonard Fister, Gabriele Jaritz, Christian Lang, Stefan Olejnik, Tilo Wettig, and Florian Wodlei for discussions and for checking some of our calculations. This work is supported by the Slovak Science and Technology Assistance Agency under Contract No. APVT–51–005704, and the Austrian Exchange Service ÖAD.

[1] P. H.
Ginsparg and K. G. Wilson, Phys. Rev. D 25, 2649 (1982).
[2] P. Hasenfratz and F. Karsch, Phys. Lett. B 125, 308 (1983).
[3] R. Narayanan and H. Neuberger, Nucl. Phys. B 443, 305 (1995); H. Neuberger, Phys. Lett. B 417, 141 (1998).
[4] J. Bloch and T. Wettig, Phys. Rev. Lett. 97, 012003 (2006); J. Bloch and T. Wettig, contribution to Lattice 2006 (hep-lat/0609020, http://arxiv.org/abs/hep-lat/0609020).
[5] J. Kapusta, Finite temperature field theory, Cambridge University Press, Cambridge (1989).
[6] W. Bietenholz and U. J. Wiese, Phys. Lett. B 426, 114 (1998).

ABSTRACT

We study a recently proposed formulation of overlap fermions at finite density. In particular we compute the energy density as a function of the chemical potential and the temperature. It is shown that overlap fermions with chemical potential reproduce the correct continuum behavior.

<|endoftext|><|startoftext|>

Aspects of Electron-Phonon Self-Energy Revealed from Angle-Resolved Photoemission Spectroscopy

W.S. Lee,1 S. Johnston,2 T.P. Devereaux,2 and Z.-X. Shen1
1 Department of Physics, Applied Physics, and Stanford Synchrotron Radiation Laboratory, Stanford University, Stanford, CA 94305
2 Department of Physics, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
(Dated: November 4, 2018)

Lattice contribution to the electronic self-energy in complex correlated oxides is a fascinating subject that has lately stimulated lively discussions. Expectations of electron-phonon self-energy effects for simpler materials, such as Pd and Al, have resulted in several misconceptions in strongly correlated oxides. Here we analyze a number of arguments claiming that phonons cannot be the origin of certain self-energy effects seen in high-$T_c$ cuprate superconductors via angle-resolved photoemission experiments (ARPES), including the temperature dependence, doping dependence of the renormalization effects, the inter-band scattering in the bilayer systems, and impurity substitution.
We show that, in light of experimental evidence and detailed simulations, these arguments are not well founded.

I. INTRODUCTION

The microscopic pairing mechanism of high-$T_c$ superconductivity remains an unsolved question even twenty years after its discovery. Observations of a kink at around 40-70 meV in the band dispersion along the diagonal of the Brillouin zone (nodal direction) and a peak-dip-hump (PDH) structure at the zone boundary by angle-resolved photoemission spectroscopy (ARPES)$^{1-13}$ have drawn a great deal of recent attention, as they may shed some light on this problem. Although there is agreement that the kink and the PDH structure are signatures of electrons coupled to a sharp bosonic mode, the origin of this bosonic mode is still strongly debated. Influenced by the fact that the high-$T_c$ cuprate is a doped antiferromagnetic insulator, a common belief is that this bosonic mode has a magnetic origin$^{2-9}$. However, an alternative view is that the electron-phonon coupling in such a doped insulator can be very strong and anomalous because of a number of unusual effects, such as poor screening, complex structure, and the interplay with correlations. Indeed, oxygen-related optical phonons have been invoked to explain the temperature and doping dependence of the renormalization effects$^{10-14}$. The idea that phonons are mainly responsible for the low-energy band renormalization observed by ARPES has stimulated intense debate. There is currently no consensus on whether a phonon, a set of phonons, or a magnetic mode is the primary cause of the band renormalization.
As mentioned, some important reasons to invoke a phonon interpretation of the ARPES data are: the presence of a universal energy scale in all materials at all dopings$^{10,15}$, particularly in the normal state of very low $T_c$ materials$^{16}$; the strong inferred momentum dependence$^{11,14}$; the existence of multiple bosonic mode couplings$^{12}$; and the decrease in the overall coupling strength with increased doping, interpreted as a screening effect, especially for phonons with eigenvectors along the c-axis$^{13}$. Yet there is still a widespread belief that phonons are not responsible for the kink features.

In the following sections, with a comprehensive look at all experimental data as well as some recent simulations, we address some of the criticisms that have commonly been used to argue against the phonon interpretation. These include the temperature and doping dependence of the renormalization effects, inter-band scattering in bilayer systems, and the ARPES experiments on impurity-substituted Bi2212 crystals, Bi$_2$Sr$_2$Ca(Cu$_{2-x}$M$_x$)O$_{8+\delta}$ with M = Zn or Ni. Our goal is to show that these misconceptions arise from oversimplifying the effects of electron-phonon coupling in cuprates as well as in other strongly correlated transition metal oxides.

http://arxiv.org/abs/0704.0093v1

II. ASPECTS OF THE ELECTRON-PHONON SELF-ENERGY

A. Temperature Dependence

In the standard treatment of electron-phonon coupling effects, the Debye temperature sets a characteristic temperature scale, which is well above $T_c$ in conventional superconducting materials. However, in the cuprates and other low Fermi energy systems, these two energy scales can be comparable. As a result, the temperature dependence of phonon-induced self-energies can be very different from that of conventional superconductors. According to the ARPES measurements on the Bi2212 system, the band renormalization in the antinodal region (peak-dip-hump structure) shows a dramatic superconductivity-induced enhancement when the system goes through the phase transition from the normal state to the superconducting state. It has been argued that only a mode which emerges in the superconducting state and vanishes in the normal state can explain this temperature-dependent renormalization effect$^{2-5}$. Phonons are thereby excluded.

The sharpness of the renormalization effects due to electron-phonon coupling is strongly temperature dependent, given that the $T_c$ of optimally doped Bi2212 is close to 100 K. To demonstrate this temperature dependence, we consider the normal state (120 K) and the superconducting state (10 K) of a d-wave superconductor coupled to 36 meV $B_{1g}$, 55 meV oxygen $A_{1g}$, and 70 meV breathing phonons$^{14,17}$. The electron-phonon couplings for the $B_{1g}$ and breathing phonons are those used in Ref. 14. The $A_{1g}$ modes involve c-axis motions of both planar and apical oxygens, and have two branches around 55 and 80 meV. The apical electron-phonon coupling, derived in Ref. 17, involves a charge transfer from the apex oxygen into the CuO$_2$ plane via the Cu 4s orbital, the same pathway that gives rise to bilayer splitting. However, for simplicity, the apical electron-phonon coupling is treated as a constant in the calculations presented in this paper. The choice of three modes in the calculation was inspired by the earlier success of the two-mode calculation$^{11}$ as well as the recent discovery of multiple mode coupling$^{12,13}$. For this calculation, the tight-binding band structure described in Ref. 19 has been used. The real and imaginary parts of the self-energy $\Sigma(k, \omega)$ and the spectral functions $A(k, \omega)$ at $k = (0, \pi)$ are then obtained within the weak-coupling Eliashberg formalism$^{14}$ and plotted in Fig. 1.
Details of the calculations are presented in the Appendix.

At high temperature, neither $\mathrm{Re}\,\Sigma(k, \omega)$ nor $\mathrm{Im}\,\Sigma(k, \omega)$ exhibits a sharp renormalization feature, as shown by the dashed curves in Fig. 1(a) and Fig. 1(b), respectively. This demonstrates that thermal broadening smears out the sharp renormalization features; in addition, broadening due to additional many-body effects would smear out the renormalization features further. Thus, one should not expect to observe any sharp renormalization features at $k = (0, \pi)$ in the normal state ($\sim 100$ K) from the electron-phonon coupling. In the superconducting state, the renormalization features of the self-energy become much sharper, due to smaller thermal broadening as well as the opening of a superconducting gap. Consistent with the optimally doped Bi2212 and Bi2223 measurements$^{2,4,11,18}$, the PDH structure of the spectral function at $k = (0, \pi)$ emerges at low temperature and disappears at high temperature (normal state), as illustrated in Fig. 1(d) and Fig. 1(c), respectively. While this behavior is generally expected for any phonon, we note that, in addition, the self-energies from electron-phonon couplings which involve momentum transfers within and between antinodal regions of the Fermi surface, such as the apex $A_{1g}$ and $B_{1g}$ phonons, are greatly enhanced for all k-points due to the large density-of-states enhancement in these regions via the opening of a d-wave gap. A detailed momentum dependence of the renormalization effects in the normal state and superconducting state due to the coupling to the $B_{1g}$ phonon has been discussed in Ref. 11 and Ref. 14. Furthermore, both the dispersion kink and the PDH structure in the nodal region have been clearly observed in the normal state when measured at a low temperature on samples with a lower $T_c$$^{16}$. This lends further support to the strongly temperature-dependent renormalization features due to electron-phonon coupling.

FIG. 1: The calculated (a) real part, $\mathrm{Re}\,\Sigma$, and (b) imaginary part, $\mathrm{Im}\,\Sigma$, of the self-energy, and the corresponding spectral functions $A(k, \omega)$ in (c) the normal state and (d) the superconducting state. An extra 5 meV is added to the imaginary part of the self-energy in the 120 K simulation to account for the thermal broadening of the quasiparticle lifetime. The location for this calculation is indicated in the inset of (a) by a red dot, with a red curve representing the FS. The insets of (c) and (d) are the data of the optimally doped Bi2223 system ($T_c$ = 110 K) taken in the normal state (120 K) and the superconducting state (25 K)$^{18}$, respectively.

B. Doping Dependence

Another problematic argument against the phonon scenario stems from the apparent strong doping dependence of the kink position and strength. Based on the conventional wisdom for good metals, phonons should not have a strong doping dependence, either in the frequency of the mode or in the strength of the coupling. This is not necessarily valid for layered, doped insulators with strong correlation effects, such as cuprates, where doping dramatically changes the metallicity and the ability of electrons to screen charge fluctuations. We first note that many experiments on various cuprates have reported strongly doping-dependent anomalies for several phonons, which implies a strongly doping-dependent e-p coupling for these phonons. For example, in inelastic neutron scattering measurements, the breathing mode, half-breathing mode, and the bond-stretching modes exhibit prominent doping dependence of dispersion and energy renormalizations$^{20,21}$. In Raman and infrared spectroscopy, the Fano lineshapes of phonon modes in $B_{1g}$ and $B_{1u}$ symmetry show strong doping dependences. Furthermore, the strength of the phonon energy shift and linewidth variation across $T_c$ also changes strongly with doping$^{23}$.

FIG. 2: The intensity plots of (a) the spectral functions without resolution convolution and (b) the resolution-convoluted spectral functions in the superconducting state (10 K) along the nodal direction, as indicated in the inset of (b) by the blue line. Black curves are the band dispersions extracted from the maximum positions of the momentum distribution curves (MDCs), which cut the spectral functions at a fixed energy. The MDC-derived dispersions in (a) exhibit three sharp "subkinks" due to the coupling to the three phonon modes used in the model, while in (b) the subkinks are washed out by the finite instrument resolution, leaving an apparent single kink in the band dispersion. The white dashed line illustrates the bare band used for extracting the $\mathrm{Re}\,\Sigma$ shown in Fig. 3(a).

Second, the doping dependence of the renormalization effects on the electronic self-energy is intricate, as inferred from two recent ARPES studies. One is the observation of multiple bosonic mode couplings along the nodal direction$^{12}$. The other is the doping dependence of the c-axis screening effect on the coupling between the electrons and c-axis phonons. As proposed by Meevasana et al.$^{13,24}$ and Devereaux et al.$^{17}$, for electron-phonon coupling at long wavelengths, the screening becomes more effective at reducing the coupling strength when the c-axis conductivity becomes more metallic. Given these two results plus the variation of the superconducting gap magnitude with doping, the doping dependence of the kink energy is highly convoluted in Bi2212, whose superconducting gap has an energy comparable to some of the phonons.

In Fig. 2, we present the intensity plots of calculated spectral functions demonstrating a doping dependence of the dispersion kink in the superconducting state. The superconducting gap size was set to 40, 20, and 10 meV for the optimally doped and the two progressively more overdoped systems, respectively.
In addition, the coupling strength of the breathing mode, whose coupling is appreciable only for short wavelengths and large momentum transfers$^{20,21}$, remains unchanged in our doping-dependence simulation, while a filter function, $\omega^2/(\omega^2 + \omega_{sc}^2)$, with different values of the c-axis screening frequency $\omega_{sc}$, is applied to the c-axis phonons ($B_{1g}$ and $A_{1g}$) to simulate the doping-dependent coupling strength due to the change of the c-axis screening effect$^{13,24}$. We note that although this is a simplification, it represents the general behavior of screening for phonons involving small in-plane momentum transfers. Full consideration of screening has been given in Ref. 17 and Ref. 24. In addition, a component $0.005 + \omega^2$ eV is added to the imaginary part of the self-energy to mimic the quasiparticle lifetime broadening due to electron-electron interaction.

FIG. 3: (a) The $\mathrm{Re}\,\Sigma$ extracted from Fig. 2(a) (dashed lines) and Fig. 2(b) (solid lines) by subtracting a linear bare band (dashed line in Fig. 2(a)) from the band dispersion. The arrows indicate the maximum positions of the $\mathrm{Re}\,\Sigma$, where the "single" apparent kink in the band dispersion is usually defined. (b) Summary of the doping dependence of the apparent kink energy and the apparent mode energy extracted by assuming a single-mode scenario.

As shown in Fig. 2(a), the coupling to multiple phonon modes induces several "subkinks" in the dispersion. The positions of these subkinks mostly correspond to the energies of the phonons plus the maximum d-wave SC gap, $\Delta_0$, even though there is no gap along the nodal direction. This is because, when calculating the self-energy, one needs to integrate the contributions from the entire zone, which makes the electrons in the nodal region "feel" the presence of the gap. Furthermore, as revealed by the extracted real part of the self-energy, $\mathrm{Re}\,\Sigma$ (dashed curves in Fig. 3(a)), the dominant feature in $\mathrm{Re}\,\Sigma$ for the OP case is induced by the 36 meV $B_{1g}$ mode, while for the OD1 and OD2 cases, the features of the 55 meV $A_{1g}$ mode and the 70 meV breathing mode gradually outweigh the contribution from the $B_{1g}$ mode. This demonstrates that the change of the SC gap magnitude and the increasing screening of these phonons with increased doping alter the relative strength of the subkinks caused by different modes.

To simulate the experimental data, we convoluted the spectral functions shown in Fig. 2(a) with a typical ARPES instrumental resolution: 12.5 meV in energy and 0.012 $\pi/a$ in momentum. As illustrated in Fig. 2(b) and the extracted $\mathrm{Re}\,\Sigma$ (solid curves in Fig. 3(a)), the subkinks are less obvious and merge into a broadened "single" kink in the dispersion, which is located at roughly the energy of the dominant phonon feature, determined by the maximum position of the $\mathrm{Re}\,\Sigma$ (the arrows in Fig. 3(a)). The doping dependence of the kink position is summarized by the solid symbols in Fig. 3(b). Assuming a single-mode scenario, one can obtain the "doping dependence" of the mode energy by subtracting out the SC gap size. However, we note that this extracted "apparent" mode energy does not match any of the modes used in the model; instead, it is an average between the dominant features (open symbols in Fig. 3(b)). Clearly, since the kink energy is a sum of the superconducting gap and a spectrum of bosonic modes, it should not be taken as a precise measurement of the energy of any particular bosonic mode. This casts doubt on analyses of the doping-dependent properties of the kink in the nodal band dispersion that are based on the single bosonic mode coupling scenario$^{7,8}$. More importantly, it illustrates the complex nature of lattice effects in these oxides.

C.
Interband Scattering

Borisenko et al.$^{8,9}$ observed that the scattering rates of the bonding and antibonding bands along the nodal direction cross each other near the energy of the Van Hove singularity, suggesting a strong inter-band scattering between the bonding and antibonding bands. They argued that only a mode with "odd" symmetry, such as the spin resonance mode, can mediate such inter-band scattering. The question of whether phonons can induce such inter-band scattering has also been raised by these authors.

First, we note that recent high energy and momentum resolution ARPES experiments on Bi2212 using low-energy photons (<10 eV) have better resolved the bilayer splitting at the nodal point$^{25}$. However, as shown in Fig. 2 of Ref. 25, the scattering rates of the bonding and antibonding bands do not exhibit the crossover behavior reported by Borisenko et al. The inconsistency of the data between the two groups implies that more experiments and better analysis are needed to verify whether this inter-band scattering effect is genuine.

Second, empirically, it has been known for over 15 years that interband electron-phonon coupling in the cuprates is very large. The evidence comes from the strong resonance profiles of many Raman-active phonons, which display large intensity variations$^{26}$. This is generally understood as a result of strong interband coupling, whereby phonons can be brought in and out of resonance by tuning the incident photon energy$^{27}$. Since, in general, phonons can also provide momentum to scatter electrons along the c-axis, direct inter-bilayer scattering can occur, which involves mixing of different symmetries of phonons. This can be seen in a simplified way even if we first neglect direct interband scattering and consider a bilayer system coupled to c-axis phonons. For $q_z = 0$, a simple classification of c-axis modes is possible:

$$ H = \sum_{k,\sigma,\alpha=1,2} \epsilon_\alpha(k)\, c^\dagger_{k,\alpha,\sigma} c_{k,\alpha,\sigma} + \sum_{k,\sigma} t_\perp(k) \left[ c^\dagger_{k,1,\sigma} c_{k,2,\sigma} + \mathrm{h.c.} \right] + \sum_{k,q,\sigma,\alpha=1,2,\nu} g_{\nu,\alpha}(k, q)\, c^\dagger_{k+q,\alpha,\sigma} c_{k,\alpha,\sigma} \left[ a_\nu(-q) + a^\dagger_\nu(q) \right] + \mathrm{h.c.}, \tag{1} $$

where $\alpha$ is the index for the electronic states of the different layers, $\epsilon_1(k) = \epsilon_2(k)$, $t_\perp(k)$ describes the hopping of electrons between the two layers, and the index $\nu$ labels either gerade or ungerade active c-axis modes, classified by the symmetry of the displacement eigenvectors with respect to the inversion center of the cell, as depicted in Fig. 4. After diagonalizing the first two terms by a canonical transformation, the electron-phonon coupling can be recast as

$$ H_{e\text{-}ph} = \sum_{(g,u)} \sum_{k,q,\sigma} g_{(g,u)}(k, q) \left[ a_{(g,u)}(q) + a^\dagger_{(g,u)}(-q) \right] \left[ c^\dagger_{k+q,+,\sigma} c_{k,(+,-),\sigma} + c^\dagger_{k+q,-,\sigma} c_{k,(-,+),\sigma} \right] + \mathrm{h.c.} \tag{2} $$

We have used $c_+$ and $c_-$ for the even and odd linear combinations of $c_1$ and $c_2$, and the subscripts $g$ and $u$ for the gerade and ungerade modes, respectively.

FIG. 4: Illustration of the gerade and ungerade c-axis phonons. The eigenmode of the gerade (ungerade) phonons is even (odd) with respect to the mirror plane between the two CuO$_2$ layers at $q_z = 0$, while their definitions are swapped at $q_z = \pi/c$. The black, grey, and white circles represent the Cu, Ca, and O atoms, respectively.

Thus for $q_z = 0$, where this classification is possible, gerade phonons induce intra-band scattering (even channel), while the ungerade phonons mediate the inter-band scattering process (odd channel), even without direct electron-phonon coupling across the layers. Yet for $q_z = \pi/c$ the classification inverts: gerade modes become ungerade and vice versa, as illustrated in Fig. 4. Thus, even in this simple case, modes at different $q_z$ contribute to both intra- and interband scattering, and the net weight of the coupling appearing in the self-energy is then largely determined by the specific momentum structure of $g(k, q)$. Since the self-energy generally involves sums over $q_z$, and direct coupling of electrons in adjacent layers via phonons is non-negligible, the inter-band scattering phenomena clearly cannot be used to argue against
the phonons being important to the electronic states.

We also add a remark concerning the electron-phonon coupling derived from Raman measurements$^{28}$ in YBa$_2$Cu$_3$O$_7$ and Bi-2212 compared to that obtained from ARPES. While one might naively expect the couplings from Raman and ARPES to be comparable, the situation is markedly different if the coupling is strongly momentum dependent and whenever correlations are appreciable. Since Raman measures phonons with net zero momentum transfer and ARPES involves a sum over all transfers, a sizeable difference in coupling may be discernible. This is specifically the case for the $B_{1g}$ phonon, where scattering involving momentum transfers across the necks of the Fermi surface near $(\pi, 0)$$^{14}$, further enhanced via correlations$^{29}$, yields a strong contribution to the electron self-energy that is absent in phonon self-energies. Moreover, a sum rule analysis presented in Ref. 30 highlights in general how electron and phonon self-energies may be qualitatively different in strongly correlated systems.

D. ARPES Experiments on Zn and Ni substituted Bi2212

In this section, we comment on recent experiments on the renormalization effects in Zn- and Ni-substituted Bi2212 crystals$^{31,32}$. The strength of the sharp renormalizations in these substituted crystals is found to be weakened compared to the pristine crystals. Since the magnetic properties are expected to be modified by the substitution of Cu with these impurities, the authors concluded that the sharp renormalization effects are induced by magnetic modes, not phonons.

In fact, a close examination of the data published by V. B. Zabolotnyy et al.$^{31}$ and K. Terashima et al.$^{32}$ implies that the magnetic property is not the only modification due to the substitution by Zn and Ni.
First, although both sets of data are consistent in the antinodal region, where the strength of the band renormalization is reduced, they are inconsistent with each other regarding the kink strength along the nodal direction. In the data set of V. B. Zabolotnyy et al., the kink is weaker in the Zn or Ni doped samples, whereas there is no detectable change in the data set reported by K. Terashima et al. Second, the data from K. Terashima et al. (Fig. 1(d)-(f) in Ref. 32) suggest that the bilayer-splitting structure is much clearer in the pristine crystals than in the Zn and Ni doped crystals. Since the authors have ruled out the possibility of a significant doping level difference between pristine and impurity-doped crystals, the distinct variation in the visibility of the bilayer structure implies an impurity-related change in the electronic structure. These two observations imply that not only the magnetic properties but also the band structure and scattering behavior could be affected by these impurities. It is possible that these changes of the electronic structure could "weaken" the renormalization features observed in the ARPES spectrum. Furthermore, we note that the strength of the electron-phonon coupling could also be modified by the substituted impurities: this can be inferred from the change of the Fano lineshape of the B1g 340 cm^-1 phonon in Raman spectra for Zn-doped YBCO [33] and Th-doped YBCO [34], resulting from an increase in the phonon linewidth due to impurity scattering. Therefore, the experiments on Ni and Zn substituted Bi2212 crystals cannot conclusively distinguish phonon and magnetic modes as the origin of the renormalization effects.
III. CONCLUSION
We have shown that the temperature and doping dependence of the renormalization effects, inter-band scattering, and the results on Zn and Ni doped materials can be understood in the framework of electron-phonon coupling.
On the other hand, the issues that make a spin origin of the sharp kink implausible, especially for the spin resonance mode, remain: i) the nearly constant energy scale as a function of doping in small gap systems [12]; ii) the multiple modes [12]; iii) the presence of a clear kink in the normal state [4,13,16]; iv) the detailed agreement of the B1g phonon based explanation with the mode coupling as a function of momentum [11,14], while the spin resonance with its tiny spectral weight (2%) is unlikely to explain both nodal and antinodal renormalization; v) the accumulated evidence for lattice polaron effects in underdoped and deeply underdoped systems [35,36]. Given these weaknesses of the spin resonance interpretation, a lattice effect is a more plausible explanation of the renormalization effects. It remains possible that spin fluctuations and other strong correlation effects are also very important in determining the electronic structure of cuprates; they likely contribute to a smooth renormalization of the band and may be more relevant at higher binding energy. However, optical phonons are the most probable origin of the renormalization effects due to sharp modes near 40-70 meV, which is also supported by the recent findings of STM experiments [37,38].
Acknowledgments
W. S. Lee acknowledges support from SSRL, which is operated by the DOE Office of Basic Energy Science, Division of Chemical Science and Material Science, under contract DE-AC02-76SF00515. T. P. Devereaux would like to acknowledge support from NSERC, ONR grant N00014-05-1-0127, and the A. von Humboldt Foundation.
APPENDIX A: MIGDAL-ELIASHBERG BASED APPROACH
In the calculations presented herein, we evaluate electronic self-energies and spectral functions via a Migdal-Eliashberg treatment, as discussed in Ref. 39.
The dressed Green's function in the superconducting state is given in Nambu notation by
$\hat{G}(k,\omega) = \dfrac{\omega Z(k,\omega)\hat{\tau}_0 + [\epsilon(k) + \chi(k,\omega)]\hat{\tau}_3 + \phi(k,\omega)\hat{\tau}_1}{[\omega Z(k,\omega)]^2 - [\epsilon(k) + \chi(k,\omega)]^2 - \phi^2(k,\omega)}$, (A1)
from which the spectral function follows, $A(k,\omega) = -\frac{1}{\pi} G''_{1,1}(k,\omega)$, as shown in Figs. 1c, 1d, and 2. The momentum-dependent components of the Nambu self-energy are given as generalizations of those found in Ref. 39:
$\omega Z_2(k,\omega) = \frac{\pi}{2N}\sum_{p,\nu} |g_\nu(k,p-k)|^2 \big\{ [n_b(\Omega_\nu) + n_f(E_p)][\delta(\omega+\Omega_\nu-E_p) + \delta(\omega-\Omega_\nu+E_p)] + [n_b(\Omega_\nu) + n_f(-E_p)][\delta(\omega-\Omega_\nu-E_p) + \delta(\omega+\Omega_\nu+E_p)] \big\}$,
$\chi_2(k,\omega) = -\frac{\pi}{2N}\sum_{p,\nu} |g_\nu(k,p-k)|^2 \frac{\epsilon_p}{E_p} \big\{ [n_b(\Omega_\nu) + n_f(E_p)][\delta(\omega+\Omega_\nu-E_p) - \delta(\omega-\Omega_\nu+E_p)] + [n_b(\Omega_\nu) + n_f(-E_p)][\delta(\omega-\Omega_\nu-E_p) - \delta(\omega+\Omega_\nu+E_p)] \big\}$,
$\phi_2(k,\omega) = \frac{\pi}{2N}\sum_{p,\nu} |g_\nu(k,p-k)|^2 \frac{\Delta_p}{E_p} \big\{ [n_b(\Omega_\nu) + n_f(E_p)][\delta(\omega+\Omega_\nu-E_p) - \delta(\omega-\Omega_\nu+E_p)] + [n_b(\Omega_\nu) + n_f(-E_p)][\delta(\omega-\Omega_\nu-E_p) - \delta(\omega+\Omega_\nu+E_p)] \big\}$,
where ν denotes the phonon mode index, and nf and nb are the Fermi and Bose occupation factors. gν(k,q) are the corresponding electron-phonon couplings for mode ν, given in Ref. 14 for the B1g and breathing modes. We choose to model the A1g coupling via a momentum independent coupling. Further details can be found in Ref. 17.
1 P. V. Bogdanov, A. Lanzara, S. A. Kellar, X. J. Zhou, E. D. Lu, W. J. Zheng, G. Gu, J.-I. Shimoyama, K. Kishio, H. Ikeda, R. Yoshizaki, Z. Hussain, and Z. X. Shen, Phys. Rev. Lett. 85, 2581 (2000).
2 A. Kaminski, M. Randeria, J. C. Campuzano, M. R. Norman, H. Fretwell, J. Mesot, T. Sato, T. Takahashi, and K. Kadowaki, Phys. Rev. Lett. 86, 1070 (2001).
3 T. K. Kim, A. A. Kordyuk, S. V. Borisenko, A. Koitzsch, M. Knupfer, H. Berger, and J. Fink, Phys. Rev. Lett. 91, 167002 (2003).
4 T. Sato, H. Matsui, T. Takahashi, H. Ding, H.-B. Yang, S.-C. Wang, T. Fujii, T. Watanabe, A. Matsuda, T. Terashima, and K. Kadowaki, Phys. Rev. Lett. 91, 157003 (2003).
5 M. R. Norman, H. Ding, J. C. Campuzano, T. Takeuchi, M. Randeria, T. Yokoya, T. Takahashi, T. Mochiku, and K. Kadowaki, Phys. Rev. Lett. 79, 3506 (1997).
6 A. D. Gromko, A. V. Fedorov, Y.-D. Chuang, J. D.
Ko- ralek, Y. Aiura, Y. Yamaguchi, K. Oka, Yoichi Ando, and D. S. Dessau Phys. Rev. B 68, 174520 (2003) 7 A. A. Kordyuk, S. V. Borisenko, V. B. Zabolotnyy, J. Geck, M. Knupfer, J. Fink, B. Büchner, C. T. Lin, B. Keimer, H. Berger, A.V. Pan, Seiki Komiya, and Yoichi Ando, Phys. Rev. Lett. 97, 017002(2006). 8 S. V. Borisenko, A. A. Kordyuk, V. Zabolotnyy, J. Geck, D. Inosov, A. Koitzsch, J. Fink, M. Knupfer, B. Büchner, V. Hinkov, C. T. Lin, B. Keimer, T. Wolf, S. G. Chi- uzbăian, L. Patthey, and R. Follath, Phys. Rev. Lett. 96, 117004 (2006). 9 S. V. Borisenko, A. A. Kordyuk, A. Koitzsch, J. Fink, J. Geck, V. Zabolotnyy, M. Knupfer, B. Büchner, H. Berger, M. Falub, M. Shi, J. Krempasky, and L. Patthey, Phys. Rev. Lett. 96, 067001 (2006). 10 A. Lanzara, P. V. Bogdanov, X. J. Zhou, S. A. Kellar, D. L. Feng, E. D. Lu, T. Yoshida, H. Eisaki, A. Fujimori, K. Kishio, J.-I. Shimoyama, T. Noda, S. Uchida, Z. Hussain, Z.-X. Shen, Nature (London) 412, 510 (2001). 11 T. Cuk, F. Baumberger, D. H. Lu, N. Ingle, X. J. Zhou, H. Eisaki, N. Kaneko, Z. Hussain, T. P. Devereaux, N. Na- gaosa, and Z.-X. Shen, Phys. Rev. Lett. 93, 117003 (2004). 12 X. J. Zhou, Junren Shi, T. Yoshida, T. Cuk, W. L. Yang, V. Brouet, J. Nakamura, N. Mannella, Seiki Komiya, Yoichi Ando, F. Zhou, W. X. Ti, J. W. Xiong, Z. X. Zhao, T. Sasagawa, T. Kakeshita, H. Eisaki, S. Uchida, A. Fu- jimori, Zhenyu Zhang, E. W. Plummer, R. B. Laughlin, Z. Hussain, and Z.-X. Shen, Phys. Rev. Lett. 95, 117001 (2005). 13 W. Meevasana, N. J. C. Ingle, D. H. Lu, J. R. Shi, F. Baumberger, K. M. Shen, W. S. Lee, T. Cuk, H. Eisaki, T. P. Devereaux, N. Nagaosa, J. Zaanen, and Z.-X. Shen, Phys. Rev. Lett. 96, 157003 (2006). 14 T. P. Devereaux, T. Cuk, Z.-X. Shen, and N. Nagaosa, Phys. Rev. Lett. 93, 117004 (2004). 15 X.J. Zhou, T. Yoshida, A. Lanzara, P.V. Bogdanov, S.A. Kellar, K.M. Shen, W.L. Yang, F. Ronning, T. Sasagawa, T. Kakeshita, T. Noda, H. Eisaki, S. Uchida, C.T. Lin, F. Zhou, J.W. Xiong, W.X. Ti, Z.X. Zhao, A. 
Fujimori, Z. Hussain, and Z.-X. Shen, Nature 423, 398 (2003). 16 A. Lanzara, P. V. Bogdanov, X. J. Zhou, N. Kaneko, H. Eisaki, M. Greven, Z. Hussain, and Z. -X. Shen, cond-mat/0412178. 17 T. P. Devereaux, Z.-X. Shen, N. Nagaosa, and J. Zaanen, preprint. 18 W.S. Lee et al., unpublished. 19 M. Eschrig and M. R. Norman, Phys. Rev. B 67, 144503 (2003). 20 L. Pintschovius, Phys. stat. sol. (b) 242, 30 (2005), and the references herein. 21 D. Reznik, L. Pintschovius, M. Ito, S. Likubo, M. Sato, H. Goka, M. Fujita, K. Yamada, G. D. Gu, and J. M. Tranquada, Nature 440, 1170 (2006). 22 M. Opel, R. Hackl, T. P. Devereaux, A. Virosztek and A. Zawadowski, A. Erb and E. Walker, H. Berger and L. Forró, Phys. Rev. B 60, 9836 (1999); C. Bernhard, D. Munzar, A. Golnik, C. T. Lin, A. Wittlin, J. Humliček, and M. Cardona, ibid. 61, 618-626 (2000). 23 E. Altendorf, X. K. Chen, J. C. Irwin, R. Liang and W. N. Hardy, Phys. Rev. B 47, 8140(1993); K. C. Hewitt, X. K. Chen, C. Roch, J. Chrzanowski, J. C. Irwin, E. H. Altendorf, R. Liang, D. Bonn, and W. N. Hardy, ibid. 69 064514(2004). 24 W. Meevasana, T. P. Devereaux, N. Nagaosa, Z.-X. Shen, and J. Zaanen, Phys. Rev. B 74, 174524 (2006). 25 T. Yamasaki, K. Yamazaki, A. Ino, M. Arita, H. Na- matame, M. Taniguchi, A. Fujimori, Z.-X. Shen, M. Ishikado, and S. Uchida, cond-mat/0603006. 26 E. T. Heyen, S. N. Rashkeev, I. I. Mazin, O. K. Andersen, R. Liu, M. Cardona, and O. Jepsen, Phys. Rev. Lett. 65, 3048-3051 (1990); B. Friedl, C. Thomsen, H.-U. Haber- meier, and M. Cardona, Solid State Commun. 78, 291 (1991); D. Reznik, S.L. Cooper, M.V. Klein, W.C. Lee, D.M. Ginsberg, A.A. Maksimov, A.V. Puchkov, I.I. Tar- takovskii, and S-W. Cheong, Phys. Rev. B 48, 7624 (1993); M. Kang, G. Blumberg, M. V. Klein, and N. N. Kolesnikov Phys. Rev. Lett. 77, 4434 (1996); X. Zhou, M. Cardona, D. Colson, and V. Viallet, Phys. Rev. B 55, 12 770 (1997); V.G. Hadjiev, X. Zhou, T. Strohm, M. Cardona, Q.M. Lin, and C.W. Chu, ibid. 58, 1043 (1998). 27 See, e.g., E. 
Ya. Sherman and C. Ambrosch-Draxl, Phys. Rev. B 62, 9713 (2000), and references therein. 28 Considering Y-123 and Bi-2212, earlier Raman measure- ments, when fit with a Fano profile, indicated that B1g cou- pling in Y-123 is more appreciable than in Bi-2212, which was thought to be due to the different electrostatic environ- ment surrounding the CuO2 planes. [T.P. Devereaux, A. Virosztek, A. Zawadowski, M. Opel, P.F. Müller, C. Hoff- mann, R. Philipp, R. Nemetschek, R. Hackl, H. Berger, L. Forro, A. Erb, and E. Walker, Solid State Commun. 108, 407 (1998)]. This at the time was supported by electrostatic calculations of the c-axis oriented crystal field in Y-123 [J. Li and J. Ladik, Solid State Commun. 95, 35 (1995)], but no calculations had been performed for Bi2212. A re- examination of the Raman data indicate that the extracted coupling for Bi-2212 may be affected by intrinsic inhomo- geneity of phonon lines in Bi-2212 compared to Y-123, as well to differences in the B1g electronic background. While λ was estimated to be 0.0013, with inhomogeneity of the phonon line taken into account along with a different choice of background, λ = 0.02 may be obtained, comparable to Y-123. This is supported by recent Ewald calculations for Bi-2212, which gives a value of local crystal field 1.25 eV/cm, comparable to that obtained for Y-123. 29 Carsten Honerkamp, Henry C. Fu, and Dung-Hai Lee, cond-mat/0605161. 30 O. Rösch, and O. Gunnarsson, Phys. Rev. Lett. 93, 237001(2004); O. Rösch, G. Sangiovanni, and O. Gunnars- son, cond-mat/0607612. 31 V. B. Zabolotnyy, S.V. Borisenko, A. A. Kordyuk, J. Fink, J. Geck, A. Koitzsch, M. Knupfer, B. Büchner, H. Berger, A. Erb, C. T. Lin, B. Keimer, and R. Follath, Phys. Rev. Lett. 96, 037003 (2006). 32 K. Terashima, H. Matsui, D. Hashimoto, T. Sato, T. Taka- hashi, H. Ding, T. Yamamoto AND K. Kadowaki, Nature Physics 2, 27 (2006). 33 M. Limonov, D. Shantsev, S. Tajima, and A. Yamanaka, Phys. Rev. B 65, 024515(2001). 34 E. Altendorf, J. C. 
Irwin, W. N. Hardy, and R. Liang, Physica C 185-189, 1375 (1991).
35 K.M. Shen, F. Ronning, D.H. Lu, W.S. Lee, N.J.C. Ingle, W. Meevasana, F. Baumberger, A. Damascelli, N.P. Armitage, L.L. Miller, Y. Kohsaka, M. Azuma, M. Takano, H. Takagi, and Z.-X. Shen, Phys. Rev. Lett. 93, 267002 (2004).
36 O. Rösch, O. Gunnarsson, X. J. Zhou, T. Yoshida, T. Sasagawa, A. Fujimori, Z. Hussain, Z.-X. Shen, and S. Uchida, Phys. Rev. Lett. 95, 227002 (2005).
37 Jinho Lee, K. Fujita, K. McElroy, J. A. Slezak, M. Wang, Y. Aiura, H. Bando, M. Ishikado, T. Masui, J.-X. Zhu, A. V. Balatsky, H. Eisaki, S. Uchida, and J. C. Davis, Nature 442, 546 (2006).
38 Jian-Xin Zhu, A. V. Balatsky, T. P. Devereaux, Qimiao Si, J. Lee, K. McElroy, and J. C. Davis, Phys. Rev. B 73, 014511 (2006); cond-mat/0507621.
39 D. J. Scalapino, in Superconductivity, Vol. 1, edited by R. Parks, Dekker, 1969.
ABSTRACT
Lattice contribution to the electronic self-energy in complex correlated oxides is a fascinating subject that has lately stimulated lively discussions. Expectations of electron-phonon self-energy effects for simpler materials, such as Pd and Al, have resulted in several misconceptions in strongly correlated oxides. Here we analyze a number of arguments claiming that phonons cannot be the origin of certain self-energy effects seen in high-$T_c$ cuprate superconductors via angle resolved photoemission experiments (ARPES), including the temperature dependence and doping dependence of the renormalization effects, the inter-band scattering in the bilayer systems, and impurity substitution. We show that in light of experimental evidence and detailed simulations, these arguments are not well founded.
<|endoftext|><|startoftext|> arXiv:0704.0094v1 [astro-ph] 2 Apr 2007
Timing and Lensing of the Colliding Bullet Clusters: barely enough time and gravity to accelerate the bullet
HongSheng Zhao
University of St Andrews, Scottish University Physics Alliances, KY16 9SS, UK
We present a semi-analytical constraint on the amount of dark matter in the merging bullet galaxy cluster using the classical Local Group timing arguments. We consider particle orbits in potential models which fit the lensing data. Marginally consistent CDM models in Newtonian gravity are found with a total mass M_CDM = 1 × 10^15 M⊙ of Cold DM: the bullet subhalo can move with V_DM = 3000 km s^-1, and the "bullet" X-ray gas can move with V_gas = 4200 km s^-1. These are nearly the maximum speeds that can be reached under the gravity of two truncated CDM halos in a Hubble time, even without ram pressure. Consistency breaks down if one adopts the higher end of the error bars for the bullet gas speed (5000-5400 km s^-1), and the bullet gas would not be bound by the sub-cluster halo for the Hubble time. Models with V_DM ∼ 4500 km s^-1 ∼ V_gas would invoke an unrealistically large amount, M_CDM = 7 × 10^15 M⊙, of CDM for a cluster containing only ∼ 10^14 M⊙ of gas. Our results are generalisable beyond General Relativity, e.g., a speed of 4500 km s^-1 is easily obtained in the relativistic MONDian lensing model of Angus et al. (2007). However, a MONDian model with hot dark matter M_HDM ≤ 0.6 × 10^15 M⊙ and a CDM model with a halo mass ≤ 1 × 10^15 M⊙ are barely consistent with lensing and velocity data.
PACS numbers: 98.10.+z, 98.62.Dm, 95.35.+d; submitted to Physical Review D, rapid publications
I. POTENTIAL FROM TIMING
Timing is a unique technique to establish the case for dark matter halos, first and most thoroughly explored in the context of the Local Group (Kahn & Woltjer 1959, Fich & Tremaine 1991, Peebles 1989, Schmoldt & Saha 1998).
In its simplest version the Local Group consists of the Milky Way and M31 as two isolated point masses, which formed close to each other, moved apart due to the Hubble expansion, and slowed down and moved towards each other up to their present velocity ∼ 120 km s^-1 and separation (about 700 kpc) due to their mutual gravity. The age of the universe sets the upper limit on the period of this galaxy pair, and hence the total mass of the pair through Kepler's 3rd law, assuming Newtonian gravity.
Timing also finds a timely application in the pair of merging galaxy clusters 1E0657-56 at redshift z = 0.3, which is largely a scaled-up extragalactic analogue of the M31-MW system. The sub-cluster, called the "bullet", presently penetrates 400-700 kpc through the main cluster with an apparent speed of ∼ 4750 (+710, −550) km s^-1 (Markevitch 2006). The X-ray gas of the bullet (amounting to 2 × 10^13 M⊙) collides with the X-ray gas of the main cluster (with the total gas up to 10^14 M⊙) and forms a Mach-3 cone in front of the "bullet". The two clusters have at least four different centers, which are offset by 400 kpc between the pair of X-ray gas centers and by 700 kpc between the pair of star-light centers, which coincide with the gravitational lensing centers and (dark matter) potential centers (Clowe et al. 2006). The penetration speed is unusually high, hard for standard cosmology to explain statistically (Hayashi & White 2006), and a modified force law has been suggested (Farrar & Rosen 2006, Angus et al. 2007).
∗ Electronic address: hz4@st-andrews.ac.uk
The timing method applies in MONDian gravity as well as Newtonian. Like lensing, timing is merely a method for constraining the potential distribution, and is only indirectly related to the matter distribution.
In this Letter we model the bullet clusters as a pair of mass concentrations formed at high redshift, and set constraints on their mutual force using the simple fact that their radial oscillation period must be close to the age of the universe at z = 0.3. We check the consistency with the lensing signal of the cluster and give interpretations in terms of standard CDM and MOND.
First we can understand the speed of the bullet cluster analytically in simplified scenarios. Approximating the two clusters as points of fixed masses M1 and M2 on a head-on orbit, we can apply the usual MW-M31 timing argument. The total mass M0 = M1 + M2 is constant. The radial orbital period is computed from
$T = 2\int_0^{r_{\max}} \dfrac{dr}{V(r)}$, (1)
$\;\; = \pi \sqrt{\dfrac{r_{\max}^3}{2GM_0}}$, Newtonian, p = 2 (2)
$\;\; = \dfrac{\sqrt{2\pi}\, r_{\max}}{V_M}$, deep-MONDian, p = 1 (3)
$\;\; \propto K^{-1/2}\, r_{\max}^{(1+p)/2}$, for a $K/r^p$ gravity, (4)
where rmax is the apocenter and is related to the present relative velocity V(r) at separation r = 700 kpc by energy conservation
$V(r)^2 = 2GM_0 \left( \dfrac{1}{r} - \dfrac{1}{r_{\max}} \right)$, Newtonian (5)
$\;\; = 2V_M^2 \left( \ln r_{\max} - \ln r \right)$, deep-MONDian (6)
$\;\; = \dfrac{2K}{p-1} \left( r^{1-p} - r_{\max}^{1-p} \right)$, for a $K/r^p$ gravity, (7)
where VM = ξ(GM0 a0)^{1/4} is the MOND circular velocity of two point masses, a0, the MOND acceleration scale, equals one Angstrom per square second, and the dimensionless factor ξ, a function of M1 and M2, is ∼ 0.81 ∼ 1 (cf. Milgrom 1994, Zhao 2007, in preparation) for a typical mass ratio.
The predictions for simple Newtonian Keplerian gravity are given in Fig. 1; the more subtle case for a MONDian cluster is discussed in the final section. Setting the orbital period T = 10 Gyrs, the age of the universe at the cluster redshift, yields presently V ∼ 3200 km s^-1 in Newtonian gravity for a normal combined mass of M1 + M2 = (0.7 − 1) × 10^15 M⊙ for the clusters, which is about 7-10 times their baryonic gas content (∼ 10^14 M⊙), as expected for a Newtonian universe with Ω = 0.3 in cold dark matter.
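As a rough numerical illustration of the Newtonian timing argument, the following Python sketch solves the period relation for the apocenter and then applies energy conservation at r = 700 kpc. It is not the author's code: it assumes a radial Kepler orbit with T = π sqrt(rmax^3 / (2 G M0)) and V(r)^2 = 2 G M0 (1/r − 1/rmax), takes G ≈ 4.301 × 10^-6 kpc (km/s)^2 / M⊙, and the function names are hypothetical.

```python
import math

G = 4.30091e-6                   # Newton's constant in kpc (km/s)^2 / Msun
GYR_PER_KPC_PER_KMS = 0.977792   # 1 kpc/(km/s) expressed in Gyr


def apocenter_kpc(m0_msun, period_gyr):
    """Apocenter r_max (kpc) of a radial Kepler orbit with full period T."""
    period = period_gyr / GYR_PER_KPC_PER_KMS   # T in kpc/(km/s)
    # Invert T = pi * sqrt(r_max^3 / (2 G M0)) for r_max.
    return (period * math.sqrt(2.0 * G * m0_msun) / math.pi) ** (2.0 / 3.0)


def timing_speed_kms(m0_msun, period_gyr, r_kpc=700.0):
    """Relative speed (km/s) at separation r from energy conservation."""
    r_max = apocenter_kpc(m0_msun, period_gyr)
    if r_kpc >= r_max:
        raise ValueError("separation exceeds the apocenter")
    return math.sqrt(2.0 * G * m0_msun * (1.0 / r_kpc - 1.0 / r_max))


if __name__ == "__main__":
    # Assumed inputs: combined masses from the text, T = 10 Gyr.
    for m0 in (0.7e15, 1.0e15):
        print(f"M0 = {m0:.1e} Msun: r_max = {apocenter_kpc(m0, 10.0):.0f} kpc, "
              f"V(700 kpc) = {timing_speed_kms(m0, 10.0):.0f} km/s")
```

Under these assumptions, M0 = 10^15 M⊙ and T = 10 Gyrs give an apocenter of about 4.5 Mpc and a speed of roughly 3200 km s^-1 at 700 kpc.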
In agreement with Farrar & Rosen and Hayashi & White, the simple timing argument suggests that dark halo velocities of 4750 km s^-1, as high as the "bullet" X-ray gas, would require halos with unrealistically large masses of dark matter, ∼ 10^16 M⊙, an order of magnitude more than what a universal baryon-dark ratio implies. As a sanity check, assuming a conventional 3 × 10^12 M⊙ Local Group dark matter mass, Fig. 1 predicts a relative velocity of ∼ 100 km/s for the M31-MW system at a separation of 700 kpc after 14 Gyrs, consistent with observation (Binney & Tremaine 1987).
These analytical arguments, while straightforward, are not precise given their simplifying assumptions. For one, clusters do not form immediately at redshift infinity, and the cluster mass and size might grow gradually with time. More important is that point mass Newtonian halo models are far from fitting the weak lensing data of 1E0657-56. A shallower Newtonian potential makes it even more difficult to accelerate the bullet. On the other hand, Angus, Shan, Zhao, Famaey (2007) show that there are MOND-inspired potentials that fit lensing. As commented in their conclusion, the same potential is deep enough that a V = 4750 km s^-1 "bullet" is bound in an orbit with an apocenter rmax of a few Mpc, so the two clusters could be accelerated by mutual gravity from a zero velocity apocenter to 4750 km/s within the clusters' lifetime. This line of thought was further explored by the more systematic numerical study of Angus & McGaugh (2007).
Our paper is a spin-off of these works and the works of Hayashi & White and Farrar & Rosen. We emphasize the unification of the semi-analytical timing perspective and the lensing perspective, and aim to derive robust constraints on the potential, without being limited to a
[Figure 1 plot; horizontal axis: Log (Combined Mass); labels: X-ray bullet speed, Baryon.]
FIG. 1: Analytical timing-predicted dynamical mass vs.
the relative speed of two objects separated by 700 kpc after 10 ± 4 Gyrs (three lines in increasing order for increasing time), assuming a Keplerian potential of point masses. Three vertical lines indicate the typical Local Group halo mass, the baryonic mass in galaxy clusters, and the most massive CDM halo masses. Three horizontal lines indicate the error bar of the speed of the X-ray "bullet" gas.
specific gravity theory or dark matter candidate.
Towards the completion of this work, we were made aware by the preprint of Springel & Farrar (2007) that the unobserved bullet DM halo could be moving slower than its observed stripped X-ray gas. These authors, as well as the preprint of Milosavljević et al. (2007), emphasized the effect of hydrodynamical pressure, which we will not be able to model realistically here. To address the velocity differences, we instead treat the X-ray gas as a "ballistic particle". We argue that our hypothetical ballistic particle must move slowly enough to be bound to the vicinity of the subhalo before the collision, but moves somewhat faster than 4700 (+700, −550) km s^-1 now, since it does not experience the ram pressure of the gas. This model follows the spirit of classical timing models of the separation of the Large and Small Magellanic Clouds and the Magellanic Stream (Lin & Lynden-Bell 1982).
II. 3D POTENTIAL FROM LENSING
The weak lensing shear map of Clowe et al. (2006) has been fitted by Angus et al. (2007) using a four-component analytical potential, each component being spherical but on a different centre. For our purpose we redistribute the minor components and simplify the potential into two components, centred on the moving centroid of galaxy light of the main cluster, with the present spatial coordinates r1(t) = (−564, −176, 0) kpc, and on the subcluster galaxy centroid, r2(t) = (145, 0, 0) kpc; the coordinate origin is set at the present brightest point of the "bullet" X-ray gas; presently the cluster is at z = 0.3 or cosmic time t = 10 Gyrs.
We also apply a Keplerian truncation to the potential beyond the truncation radius rt. So the following 3D potential is adopted for the cluster 1E0657-56 at time t,
$\Phi(X,Y,Z,t) = (1800\ \mathrm{km\,s^{-1}})^2\, \phi(|\mathbf{r}-\mathbf{r}_1|) + (1270\ \mathrm{km\,s^{-1}})^2\, \phi(|\mathbf{r}-\mathbf{r}_2|)$, (8)
$\phi(|\mathbf{r}-\mathbf{r}_i(t)|) = \ln \sqrt{1 + \dfrac{|\mathbf{r}-\mathbf{r}_i(t)|^2}{(180\ \mathrm{kpc})^2}} + \mathrm{cst}$, r < rt (9)
$\;\; = -\dfrac{\tilde{r}_t}{|\mathbf{r}-\mathbf{r}_i(t)|}$, r ≥ rt(t) = C × t, (10)
where $\tilde{r}_t \equiv r_t^3/(r_t^2 + 180^2)$ is chosen to ensure a continuous and smooth transition of the potential across the truncation radius rt. The truncation rt evolves with time, since a pre-cluster region collapses gradually after the big bang, and its boundary and total mass grow with time until it reaches the size of a cluster. In the interests of simplicity rather than rigour, we use a linear model rt = C × t, where C is a constant in units of kpc/Gyr.
To check that the simplified potential is still consistent with weak lensing data, we recompute the 3D weak lensing convergence (Taylor et al. 2004) for sources at distance D(0, zs) at the redshift zs,
$\kappa(X,Y,z_s) = \sum_{i=X,Y} \dfrac{1}{2}\, \partial_i \left[ \int_0^{D(0,z_s)} \dfrac{2D(z,z_s)}{c^2}\, (\partial_i \Phi)\, dZ \right]$,
where the integrations in square brackets are the deflection angles for a source at zs, and the usual lensing effective distance is related to the comoving distances by $D(z,z_s) = (1+z)^{-1} \tilde{D}(z) \left[ 1 - \tilde{D}(z)/\tilde{D}(z_s) \right]$, which is 587 Mpc for the bullet cluster at z = 0.3 lensing sources at zs = 1; the distance increases by a factor 1.3 to 1.6 for source redshifts of 3 to infinity. Fig. 2 shows the predicted κ along the line joining the two dark centers; the result is insensitive to the cluster truncation radius as long as rt ≥ 1000 kpc presently. The lensing model predicts a signal in between that of the weak lensing data of Clowe et al. and the strong lensing data of Bradac et al. It is known that these two data sets are somewhat discrepant with each other, so the fit here is reasonable. The method of deprojection is essentially similar to the decomposition method of Bradac et al., whose explicit assumption of Einsteinian gravity is, however, unnecessary.
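For reference, the truncated two-component potential can be sketched in code. The block below is illustrative only and rests on stated assumptions: a cored logarithmic form φ = ln sqrt(1 + r^2/(180 kpc)^2) + cst inside rt, a Keplerian tail −r̃t/r outside, and r̃t and cst fixed by requiring φ and dφ/dr to be continuous at rt, which gives r̃t = rt^3/(rt^2 + 180^2). All function names are hypothetical, and rt > 0 is assumed.

```python
import math

R_CORE = 180.0                     # core radius in kpc
V1_KMS, V2_KMS = 1800.0, 1270.0    # velocity scales of main cluster and subcluster
R1 = (-564.0, -176.0, 0.0)         # present main-cluster centroid (kpc)
R2 = (145.0, 0.0, 0.0)             # present subcluster centroid (kpc)


def r_tilde(r_t):
    """Keplerian scale that makes d(phi)/dr continuous at r_t (assumed form)."""
    return r_t**3 / (r_t**2 + R_CORE**2)


def phi(r, r_t):
    """Dimensionless potential of one component at radius r (kpc)."""
    if r >= r_t:
        return -r_tilde(r_t) / r
    # The constant is fixed by continuity of phi at r = r_t.
    cst = -r_tilde(r_t) / r_t - 0.5 * math.log1p((r_t / R_CORE) ** 2)
    return 0.5 * math.log1p((r / R_CORE) ** 2) + cst


def dphi_dr(r, r_t):
    """Radial derivative of phi (1/kpc)."""
    if r >= r_t:
        return r_tilde(r_t) / r**2
    return r / (r**2 + R_CORE**2)


def potential(pos, r_t):
    """Total potential Phi in (km/s)^2 at position pos (kpc)."""
    d1 = math.dist(pos, R1)
    d2 = math.dist(pos, R2)
    return V1_KMS**2 * phi(d1, r_t) + V2_KMS**2 * phi(d2, r_t)


if __name__ == "__main__":
    r_t = 1000.0   # normal truncation at t = 10 Gyr (kpc)
    print("r_tilde =", r_tilde(r_t), "kpc")
    print("Phi at the origin =", potential((0.0, 0.0, 0.0), r_t), "(km/s)^2")
```

Writing the matching conditions out explicitly makes it easy to check numerically that both the potential and the force are continuous at the truncation radius for any choice of rt.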
The important thing here is that as far as deprojecting the above potential is concerned, no assumption is needed on the gravity theory as long as light rays follow geodesics, a feature built into most alternative gravity theories. Similarly, orbits of massive particles are also (different) geodesics in these theories. The meaning of the potential in such theories is that the potential (scaled by a factor 2/c^2) represents metric perturbations to the flat space-time, especially to the term $g_{00}(c\,dt)^2 = -(1 + 2\Phi/c^2)(c\,dt)^2$, so the Christoffel symbol $\Gamma^i_{00} \sim \partial\Phi/\partial X^i$; it can be shown that the geodesic equations have the same form as the Einsteinian ones in the weak-field limit: $d^2\mathbf{R}/dt^2 \approx -(1 + v^2/c^2)\,\nabla_{\mathbf{R}}\Phi$, where R is the pair of spatial coordinates perpendicular to the instantaneous velocity v; the paths of light rays are deflected twice as much by the metric perturbation 2Φ/c^2 as those of low-speed particles.
[Figure 2 plot; horizontal axis: X/kpc.]
FIG. 2: Predicted bullet cluster convergence (rescaled for sources at infinity) along the line Y = 0.3X + cst connecting our two potential centroids. The model predicts a lensing signal in between that of the observed weak lensing data from sources at zs = 1 (Clowe et al., lower end of error bars) and the combined weak and strong lensing (zs = 3) data (Bradac et al., upper part of error bars); the mismatch between these two datasets is presently unresolved.
III. ORBITS OF THE COLLIDING CLUSTERS
We now use this potential to predict the relative speed of the two clusters. This is possible using the classical timing argument, in the style of Kahn & Woltjer (1959), Fich & Tremaine (1991) and Valtonen et al. (1993); we postpone more rigorous least action models (Peebles 1989, Schmoldt & Saha 1998) for later investigations, since these require modeling a cosmological constant and other mass concentrations along the orbital path of the bullet clusters, which have technical issues in non-Newtonian gravity.
We trace the orbits of the two centroids of the potentials according to the equation of motion $d^2\mathbf{r}_i/dt^2 = -\nabla\Phi(\mathbf{r}_i)$. We assign different relative velocities presently (at z = 0.3), integrate backward in time, and require the two centroids of the potential to be close together at a time 10 Gyrs ago. The motions are primarily in the sky plane, but we allow for a 600 km/s relative velocity component along the line of sight. Clearly at earlier times, when t is small, the two centroids are well-separated compared to their sizes, so they move in the growing Keplerian potential of each other. At later times the centroids come close and move in the cored isothermal potential.
We shall consider models with a normal truncation rt = C × t = 1000 kpc at time t = 10 Gyrs. We also consider models with a very large truncation C × t = 10000 kpc. In the language of CDM, the truncation corresponds to the virial radius of the halo. The present instantaneous escape speed of the model can be computed by $V_{esc} = \sqrt{-2\Phi(X,Y,Z,t)}$. We find Vesc ∼ 4200-4500 km s^-1 in the central region of the shallower potential model with a present truncation of 1000 kpc. The escape speed increases to Vesc ∼ 5700 km s^-1 for models with a present truncation of 10000 kpc.
Fig. 3 shows the predicted orbits for different present relative velocities $V_{DM} = |d\mathbf{r}_2/dt - d\mathbf{r}_1/dt|$. Among models with a normal truncation, we find VDM ∼ 2950 km s^-1; a model with relative velocity VDM < 2800 km s^-1 would predict an unphysical orbital crossing at high redshift, while models with VDM > 3000 km s^-1 would predict that the two potential centroids were never close at high redshift. Larger halo velocities are only possible in models with very large truncation. If the relative velocity is 4200 km s^-1 < VDM < 4750 km s^-1 between the two cluster gravity centroids, then the truncation must be as big as 10 Mpc at z = 0.3.
We also track the orbit of the bullet X-ray gas centroid as a tracer particle in the above bi-centric potential.
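The backward integration of the centroid separation described in this section can be illustrated with a deliberately simplified sketch. It is not the calculation behind Fig. 3: it assumes a head-on, one-dimensional orbit (the paper's orbits are three-dimensional), a cored logarithmic potential with core radius 180 kpc and Keplerian tail beyond rt = C × t with r̃t = rt^3/(rt^2 + 180^2), a relative acceleration equal to the sum of the two components' pulls, a present separation of about 730 kpc, and a leapfrog integrator chosen here for convenience. The names `accel`, `integrate_back` and the `r_t_fixed` switch are hypothetical.

```python
import math

KMS = 1.02271              # 1 km/s expressed in kpc/Gyr
R_CORE = 180.0             # core radius in kpc
AMP = (1800.0**2 + 1270.0**2) * KMS**2   # (V1^2 + V2^2) in (kpc/Gyr)^2


def accel(x, r_t):
    """Relative acceleration (kpc/Gyr^2) at signed separation x (kpc)."""
    d = abs(x)
    if d == 0.0:
        return 0.0
    if d < r_t:
        # Cored logarithmic region.
        return -AMP * x / (x * x + R_CORE**2)
    # Keplerian tail beyond the truncation radius.
    r_tilde = r_t**3 / (r_t**2 + R_CORE**2)
    return -AMP * math.copysign(r_tilde / (d * d), x)


def integrate_back(v_now_kms, growth_c, x_now=730.0, t_now=10.0,
                   dt=1e-3, r_t_fixed=None):
    """Leapfrog (kick-drift-kick) integration from t_now back to t = 0.

    Returns a list of (t, x, v) with t in Gyr, x in kpc and v in kpc/Gyr.
    If r_t_fixed is given, the truncation radius is frozen at that value.
    """
    def r_t_at(t):
        return r_t_fixed if r_t_fixed is not None else growth_c * max(t, 0.0)

    x, v, t = x_now, v_now_kms * KMS, t_now
    h = -dt                                   # negative step: backward in time
    path = [(t, x, v)]
    for _ in range(int(round(t_now / dt))):
        v += 0.5 * h * accel(x, r_t_at(t))    # half kick
        x += h * v                            # drift
        t += h
        v += 0.5 * h * accel(x, r_t_at(t))    # half kick
        path.append((t, x, v))
    return path


if __name__ == "__main__":
    # Scan a few assumed present speeds for C = 100 kpc/Gyr.
    for v_now in (2850.0, 2950.0, 3050.0):
        t_end, x_end, _ = integrate_back(v_now, 100.0)[-1]
        print(f"V_DM = {v_now:.0f} km/s: separation at t = {t_end:.2f} Gyr "
              f"is {abs(x_end):.0f} kpc")
```

Scanning the present speed and inspecting the separation near t = 0 mimics the selection criterion used in the text; the numbers from this one-dimensional toy should not be expected to reproduce the velocity windows quoted above.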
We look for orbits where the bullet X-ray gas is always bound to one member of the binary system, since the ram pressure in a hydrodynamical collision is unlikely to be efficient enough to eject the X-ray gas out of the potential wells of both the main and sub-clusters. This means that the bullet speed must not greatly exceed the present instantaneous escape speed of the model, which is ∼ 4200-4500 km s^-1 in the central region of the shallow potential of a model with a present truncation of 1000 kpc. The escape speed increases to ∼ 5700 km s^-1 for models with a present truncation of 10000 kpc. The model with normal truncation is marginally consistent with the observed gas speed Vgas ∼ 4750 (+710, −550) km s^-1. The problem would become more severe if the potential were made shallower by an even smaller truncation. The gas speed is less of an issue in models with larger truncation.
In short, the present velocity and lensing data are more easily explained with potential models of very large truncation. Models with normal truncation have smaller gravitational power and can only accelerate the subhalo to 3000 km s^-1 in 10 Gyrs. Models with normal CDM truncation can only accelerate the bullet X-ray gas cloud to ∼ 4200-4400 km s^-1, the escape speed, marginally consistent with observations.
The above simulation results are sensitive to the present cluster separation, but insensitive to the present direction of the velocity vector. Unmodeled effects such as dynamical friction associated with a live halo will reduce the predicted VDM for the same potential, but the effect is mild since the actual collision is brief, ∼ 0.1-0.3 Gyrs, and the factor exp(−M^2/2) in Chandrasekhar's formulae
[Figure 3 plot; horizontal axis: X (kpc); panel labels: V_DM=2850, C=100; C=100, V_DM=2950; C=1000, V_DM=4200; C=1000 kpc/Gyr, V_DM=4750 km/s.]
FIG.
3: The orbit of the bullet subcluster X-ray gas (red: with present Vgas = 5400 km s^-1, for the 10 Gyrs in the past; pink: for the future 4 Gyrs), and the orbits of the colliding main cluster halo (blue dashes) and subhalo (black dashes) in the potential (eqs. 8-10) determined by lensing data; dashes indicate the length traveled in 0.5 Gyr steps. No explicit assumption of gravity is needed for these calculations. Orbits with different present halo relative velocity VDM and halo growth rate C are shown after a vertical shift for clarity. Timing requires the present cluster relative velocity to be in the range 2800 km s^-1 < VDM < 3000 km s^-1 for potentials of normal truncation (lowest panels, where the cluster truncation grows from zero to C × 10 Gyr = 1000 kpc), and 4200 km s^-1 < VDM < 4750 km s^-1 for potentials with large truncation (two upper panels, where the cluster truncation grows from zero to C × 10 Gyr = 10000 kpc).
sharply reduces dynamical friction for a supersonic body, where M ∼ 2-3 is the Mach number for the bullet.
IV. NEWTONIAN AND MONDIAN MEANINGS OF THE POTENTIAL MODEL
Assuming Newtonian gravity, the models with normal truncation rt = 1 Mpc at t = 10 Gyrs correspond to cluster (dark) masses of M1 = 0.745 × 10^15 M⊙ and M2 = 0.345 × 10^15 M⊙; the larger truncation rt = 10 Mpc corresponds to M1 = 7.45 × 10^15 M⊙ and M2 = 3.45 × 10^15 M⊙ in Newtonian gravity. All these models fit lensing.
Interpreted in MONDian gravity, the truncation is due to the external field effect and the cosmic background, which make the MOND potential finite and hence escapable (Famaey, Bruneton, Zhao 2007). Beyond the truncation radius, the MOND potential becomes nearly Keplerian. The MONDian models, insensitive to truncation, would have masses of only M1 = 0.66 × 10^15 M⊙ and M2 = 0.16 × 10^15 M⊙.
These masses are still higher than their baryonic content ∼ 10¹⁴M⊙, implying the need for, e.g., massive neutrinos; the neutrino density is too low in galaxies to affect normal MONDian fits to galaxy rotation curves, but is high enough to bend light and orbits significantly on 1 Mpc scales. The neutrino-to-baryon ratio, approximately 7:1 in the bullet cluster, would be a reasonable assumption for a MONDian universe with Ω_b ∼ 0.04 plus 2 eV neutrinos as hot dark matter, Ω_HDM ∼ 0.25 ∼ 7 × Ω_b (Sanders 2003, Pointecouteau & Silk 2005, Skordis et al. 2006, Angus et al. 2007). The amount of hot dark matter inferred here is the same as in Angus et al. (2007), since their potential parameters are fixed by the same lensing data.

V. CONCLUSION

In short, a consistent set of simple lensing and dynamical models of the bullet cluster is found. The present relative speed between galaxies of the two clusters is predicted to be V_DM ∼ 2900 km s⁻¹ in CDM and V_DM ∼ 4500 km s⁻¹ in µHDM (MOND + Hot Dark Matter) if the two clusters were born close to each other 10 Gyrs ago; both models assume a close to universal gas-DM ratio in clusters, i.e., about (0.6 − 1) × 10¹⁵M⊙ of Hot or Cold DM. Modeling the bullet X-ray gas as a ballistic particle, we find that a gas particle with a speed of V_gas = 4200 km/s (at the lower end of the observed speed) is bound to the potential of the subcluster for most of the Hubble time in both of the above models, insensitive to the preferred law of gravity. But if future relative proper motion measurements of the subcluster galaxy speed are as high as V_DM = 4500 km/s, or the gas speed is as high as V_gas ∼ 5400 km s⁻¹, then Newtonian models would need to invoke unlikely 7 × 10¹⁵M⊙ DM halos around 10¹⁴M⊙ of gas.

Angus G.W. & McGaugh S.D. 2007, astro-ph/0703xxx
Angus G.W., Shan H., Zhao H., Famaey B. 2007, ApJ, 654, L13
Bekenstein J. 2004, Phys. Rev. D, 70, 3509
Binney J. & Tremaine S. 1987, Galactic Dynamics, Princeton University Press, Princeton, New Jersey, Ch. 7
Bradac M., Clowe D., Gonzalez A.H., et al. 2006, astro-ph/0608408 (B06)
Clowe D., Bradac M., Gonzalez A.H., et al. 2006, astro-ph/0608407 (C06)
Farrar G. & Rosen R.A., astro-ph/0610298
Famaey B., Bruneton J.P., Zhao H.S. 2007, MNRAS, in press (astro-ph/072275)
Inga M.S. & Saha P. 1998, ApJ, 115, 2231
Lin D.N.C. & Lynden-Bell D. 1982, MNRAS, 198, 707
Markevitch M. 2006, in ESA SP-604: The X-ray Universe 2005, ed. A. Wilson, 723
Milgrom M. 1994, ApJ, 429, 540
Kahn F.D. & Woltjer L. 1959, ApJ, 130, 705
Peebles P.J.E. 1989, ApJ, 344, L53
Pointecouteau E. & Silk J. 2005, MNRAS, 364, 654
Skordis C. et al. 2006, Phys. Rev. Lett., 96, 1301
Sanders R. 2003, MNRAS, 343, 901
Taylor A.N., Bacon D.J., et al. 2004, MNRAS, 353, 1176
Fich M. & Tremaine S. 1991, ARAA, 29, 409
Valtonen M.J., Byrd G.G., McCall M., Innanen K.A. 1993, AJ, 105, 886

ABSTRACT

We present a semi-analytical constraint on the amount of dark matter in the merging bullet galaxy cluster using the classical Local Group timing arguments. We consider particle orbits in potential models which fit the lensing data. Marginally consistent CDM models in Newtonian gravity are found with a total mass M_CDM = 1 × 10¹⁵M⊙ of Cold DM: the bullet subhalo can move with V_DM = 3000 km/s, and the "bullet" X-ray gas can move with V_gas = 4200 km/s. These are nearly the maximum speeds that can be reached under the gravity of two truncated CDM halos in a Hubble time, even without the ram pressure. Consistency breaks down if one adopts the higher end of the error bars for the bullet gas speed (5000-5400 km/s), and the bullet gas would not be bound by the sub-cluster halo for the Hubble time. Models with V_DM ∼ 4500 km/s ∼ V_gas would invoke an unrealistically large amount, M_CDM = 7 × 10¹⁵M⊙, of CDM for a cluster containing only ∼ 10¹⁴M⊙ of gas.
Our results are generalisable beyond General Relativity, e.g., a speed of 4500 km/s is easily obtained in the relativistic MONDian lensing model of Angus et al. (2007). However, a MONDian model with little hot dark matter, M_HDM ≤ 0.6 × 10¹⁵M⊙, and a CDM model with a small halo mass, ≤ 1 × 10¹⁵M⊙, are barely consistent with lensing and velocity data.
<|endoftext|><|startoftext|>
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH
EMMANUEL BREUILLARD
Date: April 2012. http://arxiv.org/abs/0704.0095v2

Contents
1. Introduction
2. Quasi-norms and the geometry of nilpotent Lie groups
3. The nilshadow
4. Periodic metrics
5. Reduction to the nilpotent case
6. The nilpotent case
7. Locally compact G and proofs of the main results
8. Coarsely geodesic distances and speed of convergence
9. Appendix: the Heisenberg groups
References

1. Introduction

1.1. Groups with polynomial growth. Let G be a locally compact group with left Haar measure vol_G. We will assume that G is generated by a compact symmetric subset Ω. Classically, G is said to have polynomial growth if there exist C > 0 and k > 0 such that for any integer n ≥ 1

$$\mathrm{vol}_G(\Omega^n) \le C \cdot n^k,$$

where Ω^n = Ω · . . . · Ω is the n-fold product set. Another choice for Ω would only change the constant C, but not the polynomial nature of the bound. One of the consequences of the analysis carried out in this paper is the following theorem:

Theorem 1.1 (Volume asymptotics). Let G be a locally compact group with polynomial growth and Ω a compact symmetric generating subset of G. Then there exist c(Ω) > 0 and an integer d(G) ≥ 0 depending on G only such that the following holds:

$$\lim_{n\to+\infty} \frac{\mathrm{vol}_G(\Omega^n)}{n^{d(G)}} = c(\Omega).$$

This extends the main result of Pansu [27]. The integer d(G) coincides with the exponent of growth of a naturally associated graded nilpotent Lie group, the asymptotic cone of G, and is given by the Bass-Guivarc'h formula (4) below.
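As a sanity check of Theorem 1.1 in the simplest case, the following worked computation takes G = Z^d with counting measure. The generating set Ω = {0, ±e_1, ..., ±e_d} (including the identity) is an assumption made here for illustration and is not taken from the text.

```latex
% Assumption (illustrative): G = \mathbb{Z}^d with counting measure,
% \Omega = \{0, \pm e_1, \dots, \pm e_d\}, group law written additively.
\[
\Omega^n = \{x \in \mathbb{Z}^d : |x_1| + \dots + |x_d| \le n\},
\qquad
\lim_{n\to+\infty} \frac{|\Omega^n|}{n^{d}}
= \mathrm{vol}\{x \in \mathbb{R}^d : |x_1| + \dots + |x_d| \le 1\}
= \frac{2^d}{d!}.
\]
% Check for d = 2: |\Omega^n| = 2n^2 + 2n + 1, so the limit is 2 = 2^2/2!.
```

In this example d(G) = d and c(Ω) = 2^d/d!, the Lebesgue volume of the unit ball of the ℓ¹ norm.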
The constant c(Ω) will be interpreted as the volume of the unit ball of a sub-Riemannian Finsler metric on this nilpotent Lie group. Theorem 1.1 is a by-product of our study of the asymptotic behavior of periodic pseudodistances on G, that is, pseudodistances that are invariant under a co-compact subgroup of G and satisfy a weak form of the existence-of-geodesics axiom (see Definition 4.1).

Our first task is to get a better understanding of the structure of locally compact groups of polynomial growth. Guivarc'h [21] proved that locally compact groups of polynomial growth are amenable and unimodular and that every compactly generated (see footnote 1) closed subgroup also has polynomial growth. Guivarc'h [21] and Jenkins [15] also characterized connected Lie groups with polynomial growth: a connected Lie group S has polynomial growth if and only if it is of type (R), that is, if for all x ∈ Lie(S), ad(x) has only purely imaginary eigenvalues. Such groups are solvable-by-compact and any connected nilpotent Lie group is of type (R). It is much more difficult to characterize discrete groups with polynomial growth, and this was done in a celebrated paper of Gromov [17], proving that they are virtually nilpotent. Losert [24] generalized Gromov's method of proof and showed that it applies with little modification to arbitrary locally compact groups with polynomial growth. In particular he showed that they contain a normal compact subgroup modulo which the quotient is a (not necessarily connected) Lie group. We will prove the following refinement.

Theorem 1.2 (Lie shadow). Let G be a locally compact group of polynomial growth. Then there exists a connected and simply connected solvable Lie group S of type (R), which is weakly commensurable to G. We call such a Lie group a Lie shadow of G.

Two locally compact groups are said to be weakly commensurable if, up to modding out by a compact kernel, they have a common closed co-compact subgroup.
More precisely, we will show that, for some normal compact subgroup K, G/K has a co-compact subgroup H/K which can be embedded as a closed and co-compact subgroup of a connected and simply connected solvable Lie group S of type (R).

Footnote 1: In fact it follows from the Gromov-Losert structure theory that every closed subgroup is compactly generated.

We must be aware that being weakly commensurable is not an equivalence relation among locally compact groups (unlike among finitely generated groups). Additionally, the Lie shadow S is not unique up to isomorphism (e.g. Z³ is a co-compact lattice in both R³ and the universal cover of the group of motions of the plane).

We cannot replace the word solvable by the word nilpotent in the above theorem. We refer the reader to Example 7.9 for an example of a connected solvable Lie group of type (R) without compact normal subgroups, which admits no co-compact nilpotent subgroup. In fact this is typical for Lie groups of type (R). So in the general locally compact case (or just the Lie case) groups of polynomial growth can be genuinely not nilpotent, unlike what happens in the discrete case. There are important differences between the discrete case and the general case. For example, we will show that no rate of convergence can be expected in Theorem 1.1 when G is solvable but not nilpotent, while some polynomial rate always holds in the nilpotent discrete case [9].

Theorem 1.2 will enable us to reduce most geometric questions about locally compact groups of polynomial growth, and in particular the proof of Theorem 1.1, to the connected Lie group case. Observe also that Theorem 1.2 subsumes Gromov's theorem on polynomial growth, because it is not hard to see that a co-compact lattice in a solvable Lie group of polynomial growth must be virtually nilpotent (see Remark 7.8).
Of course in the proof we make use of Gromov's theorem, in its generalized form for locally compact groups due to Losert. The rest of the proof combines ideas of Y. Guivarc'h, D. Mostow and a crucial embedding theorem of H.C. Wang. It is given in Paragraph 7.1 and is largely independent of the rest of the paper.

1.2. Asymptotic shapes. The main part of the paper is devoted to the asymptotic behavior of periodic pseudodistances on G. We refer the reader to Definition 4.1 for the precise definition of this term; suffice it to say now that it is a class of pseudodistances which contains both left-invariant word metrics on G and geodesic metrics on G that are left-invariant under a co-compact subgroup of G. Theorem 1.2 enables us to assume that G is a co-compact subgroup of a simply connected solvable Lie group S, and rather than looking at pseudodistances on G, we will look at pseudodistances on S that are left-invariant under a co-compact subgroup H. More precisely, a direct consequence of Theorem 1.2 is the following:

Proposition 1.3. Let G be a locally compact group with polynomial growth and ρ a periodic metric on G. Then (G, ρ) is (1, C)-quasi-isometric to (S, ρ_S) for some finite C > 0, where S is a connected and simply connected solvable Lie group of type (R) and ρ_S some periodic metric on S.

Recall that two metric spaces (X, d_X) and (Y, d_Y) are called (1, C)-quasi-isometric if there exists a map φ : X → Y such that any y ∈ Y is at distance at most C from some element in the image of φ and if |d_Y(φ(x), φ(x′)) − d_X(x, x′)| ≤ C for all x, x′ ∈ X.

In the case when S is R^d and H is Z^d, it is a simple exercise to show that any periodic pseudodistance is asymptotic to a norm on R^d, i.e. ρ(e, x)/‖x‖ → 1 as x → ∞, where $\|x\| = \lim_{n\to\infty} \frac{1}{n}\rho(e, nx)$ is a well defined norm on R^d.
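To make the case of R^d and Z^d concrete, here is a worked example for d = 2. The choice of generating set is an assumption made for illustration and does not come from the text.

```latex
% Assumption (illustrative): \rho is the word metric on \mathbb{Z}^2 for
% \Omega = \{\pm e_1, \pm e_2, \pm(e_1 + e_2)\}, group law written additively.
\[
\rho(0,(a,b)) =
\begin{cases}
\max(|a|,|b|) & \text{if } ab \ge 0,\\
|a| + |b|     & \text{if } ab < 0,
\end{cases}
\qquad (a,b) \in \mathbb{Z}^2 .
\]
% The right-hand side is homogeneous of degree one, hence for x \in \mathbb{Z}^2
\[
\|x\| = \lim_{n\to\infty} \tfrac{1}{n}\,\rho(0, nx) = \rho(0, x).
\]
% The unit ball of \|\cdot\| on \mathbb{R}^2 is the hexagon with vertices
% \pm e_1, \pm e_2, \pm(e_1 + e_2), i.e. the convex hull of \Omega.
```

In this example the limit norm is not Euclidean, and its unit ball is determined by the generating set alone.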
Burago in [6] showed a much finer result, namely that if ρ is coarsely geodesic, then ρ(e, x) − ‖x‖ is bounded when x ranges over R^d. When S is a nilpotent Lie group and H a lattice in S, Pansu proved in his thesis [27] that a similar result holds, namely that ρ(e, x)/|x| → 1 for some (unique only after a choice of a one-parameter group of dilations) homogeneous quasi-norm |x| on the nilpotent Lie group. However, we show in Section 8 that it is not true in general that ρ(e, x) − |x| stays bounded, even for finitely generated nilpotent groups, thus answering a question of Burago (see also Gromov [20]). Our main purpose here will be to extend Pansu's result to solvable Lie groups of polynomial growth.

As was first noticed by Guivarc'h in his thesis [21], when dealing with geometric properties of solvable Lie groups, it is useful to consider the so-called nilshadow of the group, a construction first introduced by Auslander and Green in [2]. According to this construction, it is possible to modify the Lie product on S in a natural way, by so to speak removing the semisimple part of the action on the nilradical, in order to turn S into a nilpotent Lie group, its nilshadow S_N. The two Lie groups have the same underlying manifold, which is diffeomorphic to R^n, only a different Lie product. They also share the same Haar measure. This "semisimple part" is a commutative relatively compact subgroup T(S) of automorphisms of S, the image of S under a homomorphism T : S → Aut(S). The new product g ∗ h is defined as follows by twisting the old one g · h by means of T(S):

$$g * h := g \cdot T(g^{-1})h \tag{1}$$

The two groups S and S_N are easily seen to be quasi-isometric, and this is why any locally compact group of polynomial growth G is quasi-isometric to some nilpotent Lie group. In particular, their asymptotic cones are bi-Lipschitz.
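The twisted product (1) can be checked by hand on the universal cover of the group of motions of the plane. The coordinates and the explicit form of T used below are assumptions chosen for illustration (T is taken to be the rotation part of the action on the R² factor); the general construction is the one recalled in Section 3.

```latex
% Assumption (illustrative): S = \mathbb{R} \ltimes \mathbb{R}^2, where R_t denotes
% the rotation of angle 2\pi t of \mathbb{R}^2.
\[
(t,v)\cdot(s,w) = (t+s,\; v + R_t w),
\qquad
T(t,v)(s,w) := (s,\; R_t w).
\]
% T(g) is an automorphism of S because rotations commute with each other,
% and T is a homomorphism because R_{t+t'} = R_t R_{t'}.
\[
(t,v) * (s,w)
= (t,v)\cdot T\big((t,v)^{-1}\big)(s,w)
= (t,v)\cdot(s,\; R_{-t}w)
= (t+s,\; v+w).
\]
% Under these assumptions the new product is vector addition on \mathbb{R}^3.
```

With this choice of T, the nilshadow of S is the abelian group R³: the twist removes exactly the rotation part of the original product.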
The asymptotic cone of a nilpotent Lie group is a certain associated graded nilpotent Lie group endowed with a left invariant geodesic distance (or Carnot group). The graded group associated to S_N will be called the graded nilshadow of S. Section 3 will be devoted to the construction and basic properties of the nilshadow and its graded group.

In this paper, we are dealing with a finer relation than quasi-isometry. We will be interested in when two left invariant (or periodic) distances are asymptotic (in the sense that d_1(e, g)/d_2(e, g) → 1 when g → ∞). In particular, for every locally compact group G with polynomial growth, we will identify its asymptotic cone up to isometry and not only up to quasi-isometry or bi-Lipschitz equivalence (see Corollary 1.9 below).

Footnote 2: Yet a finer equivalence relation is (1, C)-quasi-isometry, i.e. being at bounded distance in Gromov-Hausdorff metric; classifying periodic metrics up to this kind of equivalence is much harder.

One of our main results is the following:

Theorem 1.4 (Main theorem). Let S be a simply connected solvable Lie group with polynomial growth. Let ρ(x, y) be a periodic pseudodistance on S which is invariant under a co-compact subgroup H of S (see Def. 4.1). On the manifold S, one can put a new Lie group structure, which turns S into a stratified nilpotent Lie group, the graded nilshadow of S, and a subFinsler metric d_∞(x, y) on S which is left-invariant for this new group structure such that

$$\frac{\rho(e, g)}{d_\infty(e, g)} \to 1$$

as g → ∞ in S. Moreover every automorphism in T(H) is an isometry of d_∞.

The reader who wishes to see a simple illustration of this theorem can go directly to subsection 8.1, where we have treated in detail a specific example of a periodic metric on the universal cover of the group of motions of the plane.
The new stratified nilpotent Lie group structure on S given by the graded nilshadow comes with a one-parameter family of so-called homogeneous dilations {δ_t}_{t>0}. It also comes with an extra group of automorphisms, namely the image of H under the homomorphism T. This yields automorphisms of S for both the original group structure on S and the new graded nilshadow group structure. Moreover the dilations {δ_t}_{t>0} are automorphisms of the graded nilshadow and they commute with T(H).

A subFinsler metric is a geodesic distance which is defined exactly as subRiemannian (or Carnot-Caratheodory) metrics on Carnot groups are defined (see e.g. [25]), except that the norm used to compute the length of horizontal paths is not necessarily a Euclidean norm. We refer the reader to Section 2.1 for a precise definition.

In Theorem 1.4 the subFinsler metric d_∞ is left invariant for the new Lie structure on S and it is also invariant under all automorphisms in T(H) (these form a relatively compact commutative group of automorphisms). Moreover it satisfies the following pleasing scaling law:

$$d_\infty(\delta_t(x), \delta_t(y)) = t\, d_\infty(x, y) \quad \forall t > 0.$$

The proof of Theorem 1.4 splits into two important steps. The first is a reduction to the nilpotent case and is performed in Section 5. Using a double averaging of the pseudodistance ρ over both K := T(H) and S/H, we construct an associated pseudodistance, which is periodic for the nilshadow structure on S (i.e. left-invariant by a co-compact subgroup for this structure), and we prove that it is asymptotic to the original ρ. This reduces the problem to nilpotent Lie groups. The key to this reduction is the following crucial observation: unipotent automorphisms of S induce only a sublinear distortion, forcing the metric ρ to be asymptotically invariant under T(H). The second step of the proof assumes that S is nilpotent. This part is dealt with in Section 6 and is essentially a reformulation of the arguments used by Pansu in [27].
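To illustrate the homogeneous dilations and the scaling law above, consider the 3-dimensional Heisenberg group. The coordinates and the product formula below are a standard model assumed here for illustration; they are not notation taken from the text.

```latex
% Assumption (illustrative): Heisenberg group on \mathbb{R}^3 with product and dilations
\[
(x,y,z)\cdot(x',y',z') = \big(x+x',\; y+y',\; z+z'+\tfrac12(xy'-yx')\big),
\qquad
\delta_t(x,y,z) = (tx,\; ty,\; t^2 z).
\]
% \delta_t is an automorphism: both sides below equal
% (t(x+x'),\, t(y+y'),\, t^2(z+z'+\tfrac12(xy'-yx'))).
\[
\delta_t(g)\cdot\delta_t(g') = \delta_t(g\cdot g').
\]
% If d is left-invariant and d(\delta_t g, \delta_t g') = t\,d(g,g'), then the ball
% of radius t around the identity is B_d(t) = \delta_t(B_d(1)), and for Lebesgue measure
\[
\mathrm{vol}(B_d(t)) = \det(\delta_t)\,\mathrm{vol}(B_d(1)) = t^{4}\,\mathrm{vol}(B_d(1)).
\]
```

The exponent 4 = 1 + 1 + 2 is the sum of the weights of the dilation, which is why balls in this group grow like t⁴ rather than t³.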
Incidentally, we stress the fact that the generality in which Section 6 is treated (i.e. for general coarsely geodesic, and even asymptotically geodesic periodic metrics) is necessary to prove even the most basic case (i.e. word metrics) of Theorem 1.4 for non-nilpotent solvable groups. So even if we were only interested in the asymptotics of left invariant word metrics on a solvable Lie group of polynomial growth S, we would still need to understand the asymptotics of arbitrary coarsely geodesic left invariant distances (and not only word metrics!) on nilpotent Lie groups. This is because the new pseudodistance obtained by averaging, see (30), is no longer a word metric.

The subFinsler metric d_∞(e, x) in the above theorem is induced by a certain T(H)-invariant norm on the first stratum m_1 of the graded nilshadow (which is a T(H)-invariant complementary subspace of the commutator subalgebra of the nilshadow). This norm can be described rather explicitly as follows. Recall that we have (see footnote 3) a canonical map π_1 : S → m_1, which is a group homomorphism for both the nilshadow and graded nilshadow structures. Then:

$$\{v \in m_1,\ \|v\|_\infty \le 1\} = \bigcap_{F} \overline{\mathrm{CvxHull}}\left\{ \frac{\pi_1(h)}{\rho(e, h)},\ h \in H \setminus F \right\}$$

where the right hand side is the intersection over all compact subsets F of S of the closed convex hull of the points π_1(h)/ρ(e, h) for h ∈ H\F.

Figure 1 gives an illustration of the limit shape corresponding to the word metric on the 3-dimensional discrete Heisenberg group with standard generators. We explain in the Appendix how one can compute explicitly the geodesics of the limit metric and the limit shape in this example.
Footnote 3: The subspace m_1 can be identified with the abelianized nilshadow (or abelianized graded nilshadow) by first identifying the nilshadow with its Lie algebra via the exponential map and then projecting modulo the commutator subalgebra. The map does not depend on the choice involved in the construction of the nilshadow. See also Remark 3.7.

When S itself is nilpotent to begin with and ρ is (in restriction to H) the word metric associated to a symmetric compact generating set Ω of H (namely ρ_Ω(e, h) := inf{n ∈ N; h ∈ Ω^n}), the above norm takes the following simple form:

$$\{v \in m_1,\ \|v\|_\infty \le 1\} = \mathrm{CvxHull}\{\pi_1(\omega),\ \omega \in \Omega\} \tag{2}$$

For instance, in the special case when H is a torsion-free finitely generated nilpotent group with generating set Ω and S is its Malcev closure, the unit ball {v ∈ m_1, ‖v‖_∞ ≤ 1} is a polyhedron in m_1. This was Pansu's description in [27].

However when S is not nilpotent, and is equipped with a word metric ρ_Ω on a co-compact subgroup, then the determination of the limit shape, i.e. the determination of the limit norm ‖·‖_∞ on the abelianized nilshadow, is much more difficult. Clearly ‖·‖_∞ is K-invariant and it is a simple observation that the unit ball for ‖·‖_∞ is always contained in the convex hull of the K-orbit of π_1(Ω). Nevertheless the unit ball is typically smaller than that (unless Ω was K-invariant to begin with). In general it would be interesting to determine whether there exists a simple description of the limit shape of an arbitrary word metric on a solvable Lie group with polynomial growth. We refer the reader to Section 8 and Paragraph 8.2 for an example of a class of word metrics on the universal cover of the group of motions of the plane, for which we were able to compute the limit shape.

Another by-product of Theorem 1.4 is the following result.

Corollary 1.5 (Asymptotic shape).
Let S be a simply connected solvable Lie group with polynomial growth and H a co-compact subgroup. Let ρ be an H-periodic pseudodistance on S. Then in the Hausdorff metric,

$$\lim_{t\to+\infty} \delta_{1/t}(B_\rho(t)) = C,$$

where C is a T(H)-invariant compact neighborhood of the identity in S, B_ρ(t) is the ρ-ball of radius t in S and {δ_t}_{t>0} is a one-parameter group of dilations on S (equipped with the graded nilshadow structure). Moreover, C = {g ∈ S, d_∞(e, g) ≤ 1} is the unit ball of the limit subFinsler metric from Theorem 1.4.

Proof. By Theorem 1.4, for every ε > 0 we have B_{d_∞}(t − εt) ⊂ B_ρ(t) ⊂ B_{d_∞}(t + εt) if t is large enough. Since δ_{1/t}(B_{d_∞}(t)) = C for all t > 0, we are done. □

Combining this with Theorem 1.2, we also get the following corollary, of which Theorem 1.1 is only a special case with ρ the word metric associated to the generating set Ω.

Corollary 1.6 (Volume asymptotics). Suppose that G is a locally compact group with polynomial growth and ρ is a periodic pseudodistance on G. Let B_ρ(t) be the ρ-ball of radius t in G, i.e. B_ρ(t) = {x ∈ G, ρ(e, x) ≤ t}. Then there exists a constant c(ρ) > 0 such that the following limit exists:

$$\lim_{t\to+\infty} \frac{\mathrm{vol}_G(B_\rho(t))}{t^{d(G)}} = c(\rho) \tag{3}$$

Here d(G) is the integer d(S_N), the so-called homogeneous dimension of the nilshadow S_N of a Lie shadow S of G (obtained by Theorem 1.2), and is given by the Bass-Guivarc'h formula:

$$d(S_N) = \sum_{k} \dim(C^k(S_N)) \tag{4}$$

where {C^k(S_N)}_k is the descending central series of S_N.

The limit c(ρ) is equal to the volume vol_S(C) of the limit shape C from Corollary 1.5 once we make the right choice of Haar measure on a Lie shadow S of G. Let us explain this choice. Recall that according to Theorem 1.2, G/K admits a co-compact subgroup H/K which embeds co-compactly in S. Starting with a Haar measure vol_G on G, we get a Haar measure on G/K after fixing the Haar measure of K to be of total mass 1, and we may then choose a Haar measure on H/K so that the compact quotient G/H has volume 1.
Finally we choose the Haar measure on S so that the other compact quotient S/(H/K) has volume 1. This gives the desired Haar measure vol_S such that c(ρ) = vol_S(C).

Figure 1. The asymptotic shape of large balls in the Cayley graph of the Heisenberg group H(Z) = 〈x, y | [x, [x, y]] = [y, [x, y]] = 1〉 viewed in exponential coordinates.

Note that Haar measure on S is also invariant under the group of automorphisms T(S) and is thus left invariant for the nilshadow structure on S. It is also left invariant for the graded nilshadow structure. In both exponential coordinates of the first kind (on S_N) and of the second kind (as in Lemma 3.10), Haar measure is just Lebesgue measure.

In the case of the discrete Heisenberg group of dimension 3 equipped with the word metric given by the standard generators, it is possible to compute the constant c(ρ) and the volume of the limit shape as shown in Figure 1. In this case the volume is 31/72 (see the Appendix). The 5-dimensional Heisenberg group can also be worked out and the volume of its limit shape (associated to the word metric given by standard generators) is equal to

$$\frac{2009}{21870} + \frac{\log 2}{32805}.$$

The fact that this number is transcendental implies that the growth series of this group, i.e. the formal power series $\sum_{n\ge 0} |B_\rho(n)| z^n$, is not algebraic in the sense that it is not a solution of a polynomial equation with rational functions in C(z) as coefficients (see [33, Prop. 3.3.]). This was observed by Stoll in [33] by more direct combinatorial means. Stoll also shows there the interesting fact that the growth series can be rational for some other choices of generating sets in the 5-dimensional Heisenberg group. So rationality of the growth series depends on the generating set.

Another interesting feature is asymptotic invariance:

Corollary 1.7 (Asymptotic invariance).
Let S be a simply connected solvable Lie group with polynomial growth and ρ a periodic pseudodistance on S. Let ∗ be the new Lie product on S given by the nilshadow group structure (or the graded nilshadow group structure). Then ρ(e, g ∗ x)/ρ(e, x) → 1 as x → ∞ for every g ∈ S.

This follows immediately from Theorem 1.4, when ∗ is the graded nilshadow product, and from Theorem 6.2 below in the case ∗ is the nilshadow group structure.

It is worth observing that we may not in general replace ∗ by the ordinary product on S. Indeed, let for instance S = R ⋉ R² be the universal cover of the group of motions of the Euclidean plane; then S, like its nilshadow R³, admits a lattice Γ ≃ Z³. The quotient S/Γ is diffeomorphic to the 3-torus R³/Z³ and it is easy to find Riemannian metrics on this torus so that their lift to R³ is not invariant under rotation around the z-axis. Hence this metric, viewed on the Lie group S, will not be asymptotically invariant under left translation by elements of S.

Nevertheless, if the metric is left-invariant and not just periodic, then we have the following corollary of the proof of Theorem 1.4.

Corollary 1.8 (Left-invariant pseudodistances are asymptotic to subFinsler metrics). Let S be a simply connected solvable Lie group of polynomial growth and ρ be a periodic pseudodistance on S which is invariant under all left-translations by elements of S (e.g. a left-invariant coarsely geodesic metric on S). Then there is a left-invariant subFinsler metric d on S which is asymptotic to ρ in the sense that ρ(e, g)/d(e, g) → 1 as g → ∞.

We already mentioned above that determining the exact limit shape of a word metric on S is a difficult task. Consequently so is the task of telling when two distinct word metrics are asymptotic. The above statement says that in any case every word metric on S is asymptotic to some left-invariant subFinsler metric. So the set of possible limit shapes is no richer for word metrics than for left-invariant subFinsler metrics.
We note that in the case of nilpotent Lie groups (where K is trivial), Theorem 1.4 shows that every periodic metric is asymptotic to a left-invariant metric. It is still an open problem to determine whether every coarsely geodesic periodic metric is at a bounded distance from a left-invariant metric (this is Burago's theorem in R^n, more about it below).

Theorems 1.2 and 1.4 allow us to describe the asymptotic cone of (G, ρ) for any periodic pseudodistance ρ on any locally compact group with polynomial growth.

Corollary 1.9 (Asymptotic cone). Let G be a locally compact group with polynomial growth and ρ a periodic pseudodistance on G. Then the sequence of pointed metric spaces {(G, (1/n)ρ, e)}_{n≥1} converges in the Gromov-Hausdorff topology. The limit is the metric space (N, d_∞, e), where N is a graded simply connected nilpotent Lie group and d_∞ a left invariant subFinsler metric on N. Moreover the Lie group N is (up to isomorphism) independent of ρ. The space (N, d_∞) is isometric to "the asymptotic cone" associated to (G, ρ). This asymptotic cone is independent of the choice of ultrafilter used to define it.

This corollary is a generalization of Pansu's theorem ((10) in [27]). We refer the reader to the book [18] for the definitions of the asymptotic cone and the Gromov-Hausdorff convergence. We discuss in Section 8 the speed of convergence (in the Gromov-Hausdorff metric) in this theorem and its corollaries about volume growth. In particular there is a major difference between the discrete nilpotent case and the solvable non-nilpotent case. In the former, one can find a polynomial rate of convergence [9], while in the latter no such rate exists in general (see Theorem 8.1).

1.3. Folner sets and ergodic theory. A consequence of Corollary 1.6 is that sequences of balls with radius going to infinity are Folner sequences, namely:

Corollary 1.10. Let G be a locally compact group with polynomial growth and ρ a periodic pseudodistance on G.
Let B_ρ(t) be the ρ-ball of radius t in G. Then {B_ρ(t)}_{t>0} form a Folner family of subsets of G, namely, for any compact set F in G, we have (∆ denotes the symmetric difference)

$$\lim_{t\to+\infty} \frac{\mathrm{vol}_G(F B_\rho(t)\, \Delta\, B_\rho(t))}{\mathrm{vol}_G(B_\rho(t))} = 0 \tag{5}$$

Proof. Indeed F B_ρ(t) ∆ B_ρ(t) ⊂ B_ρ(t + c) \ B_ρ(t) for some c > 0 depending on F. Hence (5) follows from (3). □

This settles the so-called "localization problem" of Greenleaf for locally compact groups of polynomial growth (see [16]), i.e. determining whether the powers of a compact generating set {Ω^n}_n form a Folner sequence. At the same time it implies that the ergodic theorem for G-actions holds along any sequence of balls with radius going to infinity.

Theorem 1.11 (Ergodic Theorem). Let G be a locally compact group with polynomial growth, together with a measurable G-space X endowed with a G-invariant ergodic probability measure m. Let ρ be a periodic pseudodistance on G and B_ρ(t) the ρ-ball of radius t in G. Then for any p, 1 ≤ p < ∞, and any function f ∈ L^p(X, m) we have

$$\lim_{t\to+\infty} \frac{1}{\mathrm{vol}_G(B_\rho(t))} \int_{B_\rho(t)} f(gx)\, dg = \int_X f\, dm$$

for m-almost every x ∈ X and also in L^p(X, m).

In fact, Corollary 1.10 above was the "missing block" in the proof of the ergodic theorem on groups of polynomial growth. So far and to my knowledge, Corollary 1.10 and Theorem 1.11 were known only along some subsequence of balls {B_ρ(t_n)}_n chosen so that (5) holds (see for instance [10] or [34]). This issue was drawn to my attention by A. Nevo and was my initial motivation for the present work. We refer the reader to A. Nevo's survey paper [26], Section 5.

It later turned out that the mere fact that balls are Folner in a given polynomial growth locally compact group can also be derived from the fact that these groups are doubling metric spaces (which is an easier result than the precise asymptotics vol(Ω^n) ∼ c_Ω n^{d(G)} proved in this paper and only requires lower and upper bounds of the form c_1 n^{d(G)} ≤ vol(Ω^n) ≤ c_2 n^{d(G)}). This was observed by R.
Tessera [35], who rediscovered a cute argument of Colding and Minicozzi [11, Lemma 3.3.] showing that the volume of spheres Ω^{n+1} \ Ω^n is at most some O(n^{−δ}) times the volume of the ball Ω^n, where δ > 0 is a positive constant depending only on the doubling constant of the word metric induced by Ω in G.

In [9], we give a better upper bound (which depends only on the nilpotency class and not on the doubling constant) for the volume of spheres in the case of finitely generated nilpotent groups. This is done by showing the following error term in the asymptotics of the volume of balls: we have vol(Ω^n) = c_Ω n^{d(G)} + O(n^{d(G)−α_r}), where α_r > 0 depends only on the nilpotency class r of G. We refer the reader to Section 8 and to the preprint [9] for more information on this. We only note here that although the above Colding-Minicozzi-Tessera upper bound on the volume of spheres holds generally for all locally compact groups G with polynomial growth, unless G is nilpotent, there is no error term in general in the asymptotics of the volume of balls. An example with arbitrarily small speed is given in §8.1.

1.4. A conjecture of Burago and Margulis. In [7] D. Burago and G. Margulis conjectured that any two word metrics on a finitely generated group which are asymptotic (in the sense that ρ_1(e, γ)/ρ_2(e, γ) tends to 1 at infinity) must be at a bounded distance from one another (in the sense that |ρ_1(e, γ) − ρ_2(e, γ)| = O(1)). This holds for abelian groups. An analogous result was proved by Abels and Margulis for word metrics on reductive groups [1]. S. Krat [23] established this property for word metrics on the Heisenberg group H_3(Z). However, using Theorem 1.4 (which in this particular case of finitely generated nilpotent groups is just Pansu's theorem [27]), we will show in Section 8.3 that there are counter-examples, and exhibit two word metrics on H_3(Z) × Z which are asymptotic and yet are not at a bounded distance.
For more on this counter-example, and how to adequately modify the conjecture of Burago and Margulis, we refer the interested reader to

1.5. Organization of the paper. Sections 2-4 are devoted to preliminaries. In Section 2 we present the basic nilpotent theory as can be found in Guivarc’h’s thesis [21]. In particular, a full proof of the Bass-Guivarc’h formula is given. In Section 3, we recall the construction of the nilshadow of a solvable Lie group. In Section 4 we set up the axioms and basic properties of the (pseudo)distance functions that are studied in this paper. Sections 5-7 contain the core of the proof of the main theorems. In Section 5, we assume that $G$ is a simply connected solvable Lie group and reduce the problem to the nilpotent case. In Section 6, we assume that $G$ is a simply connected nilpotent Lie group and prove Theorem 1.4 in this case following the strategy used by Pansu in [27]. In Section 7, we prove Theorem 1.2 for general locally compact groups and reduce the proof of the results of the introduction to the Lie case. In the last section we make further comments about the speed of convergence. In particular we give examples answering negatively the aforementioned question of Burago and Margulis. The Appendix is devoted to the discrete Heisenberg groups of dimension 3 and 5. We compute their limit balls, explain Figure 1, and recover the main result of Stoll [33].

The reader who is mainly interested in the nilpotent group case can read Section 6 directly while keeping an eye on Sections 2 and 4 for background notation and elementary facts. Finally, let us mention that the results and methods of this paper were largely inspired by the works of Y. Guivarc’h [21] and P. Pansu [27].

1.6. Nota Bene. A version of this article has circulated since 2007.
The present version contains essentially the same material; only the exposition has been improved and several somewhat sketchy arguments have been replaced by full-fledged proofs (in particular in Sections 3 and 7). This delay is due to the fact that I was planning for a long time to improve Section 6 and show an error term in the volume asymptotics of balls in nilpotent groups. E. Le Donne and I recently managed to achieve this and it has now become an independent joint paper [9].

2. Quasi-norms and the geometry of nilpotent Lie groups

In this section, we review the necessary background material on nilpotent Lie groups. In paragraph 2.4, we give some crucial properties of homogeneous quasi-norms and reproduce some lemmas originally due to Y. Guivarc’h which will be used in the sequel. Meanwhile, we prove the Bass-Guivarc’h formula for the degree of polynomial growth of nilpotent Lie groups, following Guivarc’h’s original argument.

2.1. Carnot-Carathéodory metrics. Let $G$ be a connected Lie group with Lie algebra $\mathfrak{g}$ and let $\mathfrak{m}_1$ be a vector subspace of $\mathfrak{g}$. We denote by $\|\cdot\|$ a norm on $\mathfrak{m}_1$.

We now recall the definition of a left-invariant Carnot-Carathéodory metric, also called subFinsler metric, on $G$. Let $x, y \in G$. We consider all possible piecewise smooth paths $\xi : [0,1] \to G$ going from $\xi(0) = x$ to $\xi(1) = y$. Let $\xi'(u)$ be the tangent vector which is pulled back to the identity by a left translation, i.e.

(6)  $\frac{d\xi}{du} = \xi(u)\cdot\xi'(u)$

where $\xi'(u) \in \mathfrak{g}$ and the notation $\xi(u)\cdot\xi'(u)$ means the image of $\xi'(u)$ under the differential at the identity of the left translation by the group element $\xi(u)$. We say that the path $\xi$ is horizontal if the vector $\xi'(u)$ belongs to $\mathfrak{m}_1$ for all $u \in [0,1]$. We denote by $H$ the set of piecewise smooth horizontal paths.
The Carnot-Carathéodory metric associated to the norm $\|\cdot\|$ is defined by:

$d(x,y) = \inf\left\{ \int_0^1 \|\xi'(u)\|\,du \;:\; \xi \in H,\ \xi(0) = x,\ \xi(1) = y \right\}$

where the infimum is taken over all piecewise smooth paths $\xi : [0,1] \to G$ with $\xi(0) = x$, $\xi(1) = y$ that are horizontal in the sense that $\xi'(u) \in \mathfrak{m}_1$ for all $u$. If $\|\cdot\|$ is a Euclidean norm, the metric $d(x,y)$ is also called subRiemannian. In this paper however the norm $\|\cdot\|$ will typically not be Euclidean (it can be polyhedral, as in the case of word metrics on finitely generated nilpotent groups) and $d(x,y)$ will only be subFinsler. If $\mathfrak{m}_1 = \mathfrak{g}$, and $\|\cdot\|$ is a Euclidean (resp. arbitrary) norm on $\mathfrak{g}$, then $d$ is simply the usual left-invariant Riemannian (resp. Finsler) metric associated to $\|\cdot\|$.

Chow’s theorem (e.g. see [19] or [25]) tells us that $d(x,y)$ is finite for all $x$ and $y$ in $G$ if and only if the vector subspace $\mathfrak{m}_1$, together with all brackets of elements of $\mathfrak{m}_1$, generates the full Lie algebra $\mathfrak{g}$. If this condition is satisfied, then $d$ is a distance on $G$ which induces the original topology of $G$.

In this paper, we will only be concerned with Carnot-Carathéodory metrics on a simply connected nilpotent Lie group $N$. In the sequel, whenever we speak of a Carnot-Carathéodory metric on $N$, we mean one that is associated to a norm $\|\cdot\|$ on a subspace $\mathfrak{m}_1$ such that $\mathfrak{n} = \mathfrak{m}_1 \oplus [\mathfrak{n},\mathfrak{n}]$ where $\mathfrak{n} = \mathrm{Lie}(N)$. It is easy to check that any such $\mathfrak{m}_1$ generates the Lie algebra $\mathfrak{n}$.

Remark 2.1. Let us observe here that for such a metric $d$ on $N$, we have the following description of the unit ball for $\|\cdot\|$:

$\{v \in \mathfrak{m}_1,\ \|v\| \le 1\} = \left\{ \frac{\pi_1(x)}{d(e,x)},\ x \in N\setminus\{e\} \right\}$

where $\pi_1$ is the linear projection from $\mathfrak{n}$ (identified with $N$ via exp) to $\mathfrak{m}_1$ with kernel $[\mathfrak{n},\mathfrak{n}]$. Indeed, $\pi_1$ gives rise to a homomorphism from $N$ to the vector space $\mathfrak{m}_1$. And if $\xi(u)$ is a horizontal path from $e$ to $x$, then applying $\pi_1$ to (6) we get $\frac{d}{du}\pi_1(\xi(u)) = \xi'(u)$, hence $\pi_1(x) = \int_0^1 \xi'(u)\,du$. Hence $\|\pi_1(x)\| \le d(e,x)$, with equality if $x \in \mathfrak{m}_1$.

2.2. Dilations on a nilpotent Lie group and the associated graded group.
We now focus on the case of simply connected nilpotent Lie groups. Let $N$ be such a group with Lie algebra $\mathfrak{n}$ and nilpotency class $r$. For background about analysis on such groups, we refer the reader to the book [12]. The exponential map is a diffeomorphism between $\mathfrak{n}$ and $N$. Most of the time, if $x \in \mathfrak{n}$, we will abuse notation and denote the group element $\exp(x)$ simply by $x$. We denote by $\{C^p(\mathfrak{n})\}_p$ the central descending series for $\mathfrak{n}$, i.e. $C^{p+1}(\mathfrak{n}) = [\mathfrak{n}, C^p(\mathfrak{n})]$ with $C^0(\mathfrak{n}) = \mathfrak{n}$ and $C^r(\mathfrak{n}) = \{0\}$.

Let $(\mathfrak{m}_p)_{p\ge1}$ be a collection of vector subspaces of $\mathfrak{n}$ such that for each $p \ge 1$,

(7)  $C^{p-1}(\mathfrak{n}) = C^p(\mathfrak{n}) \oplus \mathfrak{m}_p.$

Then $\mathfrak{n} = \oplus_{p\ge1}\mathfrak{m}_p$ and in this decomposition, any element $x$ in $\mathfrak{n}$ (or $N$ by abuse of notation) will be written in the form

$x = \sum_{p\ge1} \pi_p(x)$

where $\pi_p(x)$ is the linear projection onto $\mathfrak{m}_p$.

To such a decomposition is associated a one-parameter group of dilations $(\delta_t)_{t>0}$. These are the linear endomorphisms of $\mathfrak{n}$ defined by

$\delta_t(x) = t^p x$

for any $x \in \mathfrak{m}_p$ and for every $p$. Conversely, the one-parameter group $(\delta_t)_{t>0}$ determines the $(\mathfrak{m}_p)_{p\ge1}$’s since they appear as eigenspaces of each $\delta_t$, $t \ne 1$. The dilations $\delta_t$ do not a priori preserve the Lie bracket on $\mathfrak{n}$. This is the case if and only if

(8)  $[\mathfrak{m}_p, \mathfrak{m}_q] \subseteq \mathfrak{m}_{p+q}$

for every $p$ and $q$ (where $[\mathfrak{m}_p,\mathfrak{m}_q]$ is the subspace spanned by all commutators of elements of $\mathfrak{m}_p$ with elements of $\mathfrak{m}_q$). If (8) holds, we say that the $(\mathfrak{m}_p)_{p\ge1}$ form a stratification of the Lie algebra $\mathfrak{n}$, and that $\mathfrak{n}$ is a stratified (or homogeneous) Lie algebra. It is an exercise to check that (8) is equivalent to requiring $[\mathfrak{m}_1,\mathfrak{m}_p] = \mathfrak{m}_{p+1}$ for all $p$.

If (8) does not hold, we can however consider a new Lie algebra structure on the real vector space $\mathfrak{n}$ by defining the new Lie bracket as $[x,y]_\infty = \pi_{p+q}([x,y])$ if $x \in \mathfrak{m}_p$ and $y \in \mathfrak{m}_q$. This new Lie algebra $\mathfrak{n}_\infty$ is stratified and has the same underlying vector space as $\mathfrak{n}$. We denote by $N_\infty$ the associated simply connected Lie group. Moreover the $(\delta_t)_{t>0}$ form a one-parameter group of automorphisms of $\mathfrak{n}_\infty$.
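To make condition (8) concrete, here is a minimal numerical sketch. It assumes the 3-dimensional Heisenberg algebra with basis (X, Y, Z), bracket [X, Y] = Z, and strata m1 = span{X, Y}, m2 = span{Z}; this example algebra and the helper names `bracket`, `dilate` and `respects_bracket` are illustrative choices, not taken from the text. With weights (1, 1, 2) the dilation is an automorphism; with the non-adapted weights (1, 1, 1) it is not.

```python
def bracket(x, y):
    """Heisenberg bracket in the basis (X, Y, Z): [X, Y] = Z and Z is central."""
    return (0.0, 0.0, x[0] * y[1] - x[1] * y[0])


def dilate(t, x, weights=(1, 1, 2)):
    """Apply the dilation acting by t**w on the coordinate of weight w."""
    return tuple((t ** w) * xi for w, xi in zip(weights, x))


def respects_bracket(t, x, y, weights=(1, 1, 2), tol=1e-9):
    """Check numerically that dilate(t, [x, y]) == [dilate(t, x), dilate(t, y)]."""
    lhs = dilate(t, bracket(x, y), weights)
    rhs = bracket(dilate(t, x, weights), dilate(t, y, weights))
    return all(abs(a - b) <= tol * max(1.0, abs(a), abs(b)) for a, b in zip(lhs, rhs))


if __name__ == "__main__":
    x, y = (1.0, 2.0, 3.0), (4.0, -1.0, 0.5)
    print(respects_bracket(2.0, x, y))                     # weights adapted to the strata
    print(respects_bracket(2.0, x, y, weights=(1, 1, 1)))  # plain scaling
```

The second call fails because plain scaling multiplies the bracket by t on one side and by t² on the other, which is exactly the obstruction that (8) rules out.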
In fact the original Lie bracket $[x,y]$ on $\mathfrak{n}$ can be deformed continuously to $[x,y]_\infty$ through a continuous family of Lie algebra structures by setting

(9)  $[x,y]_t = \delta_{1/t}([\delta_t x, \delta_t y])$

and letting $t \to +\infty$. Note that conversely, if the $\delta_t$’s are automorphisms of $\mathfrak{n}$, then $[x,y] = \pi_{p+q}([x,y])$ for all $x \in \mathfrak{m}_p$ and $y \in \mathfrak{m}_q$, and $\mathfrak{n} = \mathfrak{n}_\infty$.

The graded Lie algebra associated to $\mathfrak{n}$ is by definition

$\mathrm{gr}(\mathfrak{n}) = \bigoplus_{p\ge0} C^p(\mathfrak{n})/C^{p+1}(\mathfrak{n})$

endowed with the Lie bracket induced from that of $\mathfrak{n}$. The quotient maps $\mathfrak{m}_p \to C^{p-1}(\mathfrak{n})/C^p(\mathfrak{n})$ give rise to a linear isomorphism between $\mathfrak{n}$ and $\mathrm{gr}(\mathfrak{n})$, which is a Lie algebra isomorphism between the new Lie algebra structure $\mathfrak{n}_\infty$ and $\mathrm{gr}(\mathfrak{n})$. Hence stratified Lie algebra structures induced by a choice of supplementary subspaces $(\mathfrak{m}_p)_{p\ge1}$ as in (7) are all isomorphic to $\mathrm{gr}(\mathfrak{n})$.

On $N_\infty$ the left-invariant subFinsler metrics $d_\infty$ associated to a choice of norm on $\mathfrak{m}_1$ are of special interest. The dilations $\{\delta_t\}_t$ form a one-parameter group of automorphisms of $N_\infty$, and

(10)  $d_\infty(\delta_t x, \delta_t y) = t\,d_\infty(x,y)$

for any $x, y \in N_\infty$. The metric space $(N_\infty, d_\infty)$ is called a Carnot group. If on the other hand the simply connected nilpotent Lie group $N$ is not stratified, then the group of dilations $(\delta_t)_t$ associated to a choice of supplementary vector subspaces $\mathfrak{m}_i$’s as in (7) will not consist of automorphisms of $N$ and the relation (10) will not hold.

Note also that if we are given two different choices of supplementary subspaces $\mathfrak{m}_i$’s and $\mathfrak{m}_i'$’s as in (7), then the left-invariant Carnot-Carathéodory metrics on the corresponding stratified Lie groups are isometric if and only if $(\mathfrak{m}_1, \|\cdot\|)$ and $(\mathfrak{m}_1', \|\cdot\|')$ are isometric (a linear isomorphism from $\mathfrak{m}_1$ to $\mathfrak{m}_1'$ that sends $\|\cdot\|$ to $\|\cdot\|'$ extends to an isometry of the two Carnot groups).

2.3. The Campbell-Hausdorff formula. The exponential map $\exp : \mathfrak{n} \to N$ is a diffeomorphism. In the sequel, we will often abuse notation and identify $N$ and $\mathfrak{n}$ without further notice.
In particular, for two elements $x$ and $y$ of $\mathfrak{n}$ (or $N$ equivalently) $xy$ will denote their product in $N$, while $x + y$ denotes the sum in $\mathfrak{n}$. Let $(\delta_t)_t$ be a one-parameter group of dilations associated to a choice of supplementary subspaces $\mathfrak{m}_i$’s as in (7). We denote the corresponding stratified Lie algebra by $\mathfrak{n}_\infty$ as above and the Lie group by $N_\infty$. The product on $N_\infty$ is denoted by $x * y$. On $N_\infty$ the dilations $(\delta_t)_t$ are automorphisms.

The Campbell-Hausdorff formula (see [12]) allows us to give a more precise form of the product in $N$. Let $(e_i)_{1\le i\le d}$ be a basis of $\mathfrak{n}$ adapted to the decomposition into $\mathfrak{m}_i$’s, that is $\mathfrak{m}_i = \mathrm{span}\{e_j,\ e_j \in \mathfrak{m}_i\}$. Let $x = x_1e_1 + \ldots + x_de_d$ be the corresponding decomposition of an element $x \in \mathfrak{n}$. Then define the degree $d_i = \deg(e_i)$ to be the largest $j$ such that $e_i \in C^{j-1}(\mathfrak{n})$. If $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}^d$ is a multi-index, then let $d_\alpha = \deg(e_1)\alpha_1 + \ldots + \deg(e_d)\alpha_d$. The Campbell-Hausdorff formula yields

(11)  $(xy)_i = x_i + y_i + \sum C_{\alpha,\beta}\,x^\alpha y^\beta$

where $C_{\alpha,\beta}$ are real constants and the sum is over all multi-indices $\alpha$ and $\beta$ such that $d_\alpha + d_\beta \le \deg(e_i)$, $d_\alpha \ge 1$ and $d_\beta \ge 1$.

From (9), it is easy to give the form of the associated stratified Lie group law:

(12)  $(x * y)_i = x_i + y_i + \sum C_{\alpha,\beta}\,x^\alpha y^\beta$

where the sum is restricted to those $\alpha$’s and $\beta$’s such that $d_\alpha + d_\beta = \deg(e_i)$, $d_\alpha \ge 1$ and $d_\beta \ge 1$.

2.4. Homogeneous quasi-norms and Guivarc’h’s theorem on polynomial growth. Let $\mathfrak{n}$ be a finite dimensional real nilpotent Lie algebra and consider a decomposition

$\mathfrak{n} = \mathfrak{m}_1 \oplus \ldots \oplus \mathfrak{m}_r$

by supplementary vector subspaces as in (7). Let $(\delta_t)_{t>0}$ be the one-parameter group of dilations associated to this decomposition, that is $\delta_t(x) = t^i x$ if $x \in \mathfrak{m}_i$. We now introduce the following definition.

Definition 2.2 (Homogeneous quasi-norm). A continuous function $|\cdot| : \mathfrak{n} \to \mathbb{R}_+$ is called a homogeneous quasi-norm associated to the dilations $(\delta_t)_t$, if it satisfies the following properties:
(i) $|x| = 0 \Leftrightarrow x = 0$.
(ii) $|\delta_t(x)| = t|x|$ for all $t > 0$.

Example 2.3. (1) Quasi-norms of supremum type, i.e.
$|x| = \max_p \|\pi_p(x)\|_p^{1/p}$ where $\|\cdot\|_p$ are ordinary norms on the vector space $\mathfrak{m}_p$ and $\pi_p$ is the projection on $\mathfrak{m}_p$ as above.
(2) $|x| = d_\infty(e,x)$, where $d_\infty$ is a Carnot-Carathéodory metric on a stratified nilpotent Lie group (as the relation (10) shows).

Clearly, a quasi-norm is determined by its sphere of radius 1 and two quasi-norms (which are homogeneous with respect to the same group of dilations) are always equivalent in the sense that

(13)  $\frac{1}{c}\,|\cdot|_1 \le |\cdot|_2 \le c\,|\cdot|_1$

for some constant $c > 0$ (indeed, by continuity, $|\cdot|_2$ admits a maximum on the “sphere” $\{|x|_1 = 1\}$). If the two quasi-norms are homogeneous with respect to two distinct semi-groups of dilations, then the inequalities (13) continue to hold outside a neighborhood of 0, but may fail near 0.

Homogeneous quasi-norms satisfy the following properties:

Proposition 2.4. Let $|\cdot|$ be a homogeneous quasi-norm on $\mathfrak{n}$. Then there are constants $C, C_1, C_2 > 0$ such that
(a) $|x_i| \le C\cdot|x|^{\deg(e_i)}$ if $x = x_1e_1 + \ldots + x_ne_n$ in an adapted basis $(e_i)_i$.
(b) $|x^{-1}| \le C\cdot|x|$.
(c) $|x + y| \le C\cdot(|x| + |y|)$.
(d) $|xy| \le C_1(|x| + |y|) + C_2$.

Properties (a), (b) and (c) are straightforward from the fact that $|x| = \max_p \|\pi_p(x)\|_p^{1/p}$ is a homogeneous quasi-norm and from (13). Property (d) justifies the term “quasi-norm” and follows from Lemma 2.5 below. It can be a problem that the constant $C_1$ in (d) may not be equal to 1. In fact, this is why we use the word quasi-norm instead of just norm, because we do not require the triangle inequality axiom to hold. However the following lemma of Guivarc’h is often a good enough remedy to this situation. Let $\|\cdot\|_p$ be an arbitrary norm on the vector space $\mathfrak{m}_p$.

Lemma 2.5. (Guivarc’h, [21] lemme II.1) Let $\varepsilon > 0$. Up to rescaling each $\|\cdot\|_p$ into a proportional norm $\lambda_p\|\cdot\|_p$ ($\lambda_p > 0$) if necessary, the quasi-norm $|x| = \max_p \|\pi_p(x)\|_p^{1/p}$ satisfies

(14)  $|xy| \le |x| + |y| + \varepsilon$

for all $x, y \in N$. If $N$ is stratified with respect to $(\delta_t)_t$ we can take $\varepsilon = 0$.
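The stratified case ε = 0 of Lemma 2.5 can be checked numerically on a small example. The sketch below assumes the 3-dimensional Heisenberg group in exponential coordinates (a, b, c), with the Euclidean norm on m1 = span{X, Y} and λ·|c| on m2 = span{Z}; the group, the norms and the names `mul`, `quasi_norm` and `max_defect` are illustrative assumptions, not part of the text. For this particular choice a direct estimate shows that |xy| ≤ |x| + |y| holds as soon as λ ≤ 4, whereas a large λ breaks it, which is why the lemma rescales the norms.

```python
import math
import random


def mul(x, y):
    """Heisenberg product in exponential coordinates (Campbell-Hausdorff stops at order 2)."""
    a, b, c = x
    a2, b2, c2 = y
    return (a + a2, b + b2, c + c2 + 0.5 * (a * b2 - b * a2))


def quasi_norm(x, lam=1.0):
    """Supremum-type quasi-norm max(||(a, b)||, (lam * |c|) ** (1/2))."""
    a, b, c = x
    return max(math.hypot(a, b), math.sqrt(lam * abs(c)))


def max_defect(samples=10000, lam=1.0, seed=0):
    """Largest observed value of |xy| - |x| - |y| over random pairs (x, y)."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(samples):
        x = tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
        y = tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
        defect = quasi_norm(mul(x, y), lam) - quasi_norm(x, lam) - quasi_norm(y, lam)
        worst = max(worst, defect)
    return worst


if __name__ == "__main__":
    print(max_defect(lam=1.0))  # stays non-positive up to rounding
    # With lam = 100 the inequality already fails for x = X, y = Y.
    print(quasi_norm(mul((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)), lam=100.0))
```

The random search is only a sanity check of the inequality on sampled points, not a proof; the threshold λ ≤ 4 comes from bounding the commutator term by (λ/2)·|x|·|y|.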
This lemma is also crucial for computing the coarse asymptotics of volume growth. For the reader’s convenience, we reproduce here Guivarc’h’s argument, which is based on the Campbell-Hausdorff formula (11).

Proof. We fix $\lambda_1 = 1$ and we are going to give a condition on the $\lambda_i$’s so that (14) holds. The $\lambda_i$’s will be taken to be smaller and smaller as $i$ increases. We set $|x| = \max_p \|\pi_p(x)\|_p^{1/p}$ and let $|x|_\lambda = \max_p \|\lambda_p\pi_p(x)\|_p^{1/p}$ for any $r$-tuple of $\lambda_i$’s. We want that for any index $p \le r$,

(15)  $\lambda_p\|\pi_p(xy)\|_p \le (|x|_\lambda + |y|_\lambda + \varepsilon)^p.$

By (11) we have $\pi_p(xy) = \pi_p(x) + \pi_p(y) + P_p(x,y)$ where $P_p$ is a polynomial map into $\mathfrak{m}_p$ depending only on the $\pi_i(x)$ and $\pi_i(y)$ with $i \le p-1$ such that

$\|P_p(x,y)\|_p \le C_p\cdot\sum_{l,m\ge1,\ l+m\le p} M_{p-1}(x)^l M_{p-1}(y)^m$

where $M_k(x) := \max_{i\le k}\|\pi_i(x)\|_i^{1/i}$ and $C_p > 0$ is a constant depending on $P_p$ and on the norms $\|\cdot\|_i$’s. Since $\varepsilon > 0$, when expanding the right hand side of (15) all terms of the form $|x|_\lambda^l|y|_\lambda^m$ with $l + m \le p$ appear with some positive coefficient, say $\varepsilon_{l,m}$. The terms $|x|_\lambda^p$ and $|y|_\lambda^p$ appear with coefficient 1 and cause no trouble since we always have $\lambda_p\|\pi_p(x)\|_p \le |x|_\lambda^p$ and $\lambda_p\|\pi_p(y)\|_p \le |y|_\lambda^p$. Therefore, for (15) to hold, it is sufficient that

$\lambda_p C_p M_{p-1}(x)^l M_{p-1}(y)^m \le \varepsilon_{l,m}\,|x|_\lambda^l|y|_\lambda^m$

for all remaining $l$ and $m$. However, clearly $M_k(x) \le \Lambda_k\cdot|x|_\lambda$ where $\Lambda_k := \max_{i\le k}\{1/\lambda_i^{1/i}\} \ge 1$. Hence a sufficient condition for (15) to hold is

$\lambda_p \le \frac{\varepsilon}{C_p\,\Lambda_{p-1}^p}$

where $\varepsilon = \min \varepsilon_{l,m}$. Since $\Lambda_{p-1}$ depends only on the first $p-1$ values of the $\lambda_i$’s, it is obvious that such a set of conditions can be fulfilled by a suitable $r$-tuple $\lambda$. □

Remark 2.6. The constant $C_2$ in Property (d) above can be taken to be 0 when $N$ is stratified with respect to the $\mathfrak{m}_i$’s (i.e. the $\delta_t$’s are automorphisms), as is easily seen after changing $x$ and $y$ into their image under $\delta_t$. And conversely, if $C_2 = 0$ for some $\delta_t$-homogeneous quasi-norm on $N$, then $N$ admits a stratification. Indeed, from (11) and (12), we see that if the $\delta_t$’s are not automorphisms, then one can find $x, y \in N$ such that, when $t$ is small enough, $|\delta_t(xy) - \delta_t(x)\delta_t(y)| \ge c\,t^{(r-1)/r}$ for some $c > 0$.
However, combining Property (c) and Property (d) with $C_2 = 0$ above we must have $|\delta_t(xy) - \delta_t(x)\delta_t(y)| = O(t)$ near $t = 0$. A contradiction.

Guivarc’h’s lemma enables us to show:

Theorem 2.7. (Guivarc’h ibid.) Let $\Omega$ be a compact neighborhood of the identity in a simply connected nilpotent Lie group $N$ and $\rho_\Omega(x,y) = \inf\{n \ge 1,\ x^{-1}y \in \Omega^n\}$. Then for any homogeneous quasi-norm $|\cdot|$ on $N$, there is a constant $C > 0$ such that

(16)  $\frac{1}{C}\,|x| \le \rho_\Omega(e,x) \le C|x| + C.$

Proof. Since any two homogeneous quasi-norms (w.r.t. the same one-parameter group of dilations) are equivalent, it is enough to do the proof for one of them, so we consider the quasi-norm obtained in Lemma 2.5 with the extra property (14). The lower bound in (16) is a direct consequence of (14) and one can take there $C$ to be $\max\{|x|,\ x \in \Omega\} + \varepsilon$. For the upper bound, it suffices to show that there is $C \in \mathbb{N}$ such that for all $n \in \mathbb{N}$, if $|x| \le n$ then $x \in \Omega^{Cn}$. To achieve this, we proceed by induction on the nilpotency length of $N$. The result is clear when $N$ is abelian. Otherwise, by induction we obtain $C_0 \in \mathbb{N}$ such that $x = \omega_1\cdot\ldots\cdot\omega_{C_0n}\cdot z$ where $\omega_i \in \Omega$ and $z \in C^{r-1}(N)$ whenever $|x| \le n$. Hence $|z| \le |x| + C_0n\cdot\max|\omega_i^{-1}| + \varepsilon C_0\cdot n \le C_1n$ for some other constant $C_1 \in \mathbb{N}$. So we have reduced the problem to $x = z \in \mathfrak{m}_r = C^{r-1}(N)$, which is central in $N$. We have $z = z_1^{n^r}$ where $|z_1| = |z|/n \le C_1$. Since $\Omega$ is a neighborhood of the identity in $N$, the set $U$ of all products of at most $\dim(\mathfrak{m}_r)$ simple commutators of length $r$ of elements in $\Omega$ is a neighborhood of the identity in $C^{r-1}(N)$ (e.g. see [19], p. 113). It follows that there is a constant $C_2 \in \mathbb{N}$ such that $z_1$ is in $U^{C_2}$, hence is the product of at most $C_2\dim(\mathfrak{m}_r)$ simple commutators. Then we are done because $z$ itself will be equal to the same product of commutators where each letter $x_i \in \Omega$ is replaced by $x_i^n$. This last fact follows from the following lemma:

Lemma 2.8. Let $G$ be a nilpotent group of nilpotency class $r$ and $n_1, \ldots, n_r$ be positive integers.
Then for any $x_1, \ldots, x_r \in G$,

$[x_1^{n_1}, [x_2^{n_2}, [\ldots, x_r^{n_r}]\ldots] = [x_1, [x_2, [\ldots, x_r]\ldots]^{n_1\cdot\ldots\cdot n_r}.$

To prove the lemma it suffices to use induction and the following obvious fact: if $[x,y]$ commutes with $x$ and $y$ then $[x^n, y] = [x,y]^n$. □

Finally, we obtain:

Corollary 2.9. Let $\Omega$ be a compact neighborhood of the identity in $N$. Then there are positive constants $C_1$ and $C_2$ such that for all $n \in \mathbb{N}$,

$C_1n^d \le \mathrm{vol}_N(\Omega^n) \le C_2n^d$

where $d$ is given by the Bass-Guivarc’h formula:

(17)  $d = \sum_{i\ge1} i\cdot\dim\mathfrak{m}_i.$

Proof. By Theorem 2.7, it is enough to estimate the volume of the quasi-norm balls. By homogeneity of the quasi-norm, we have $\mathrm{vol}_N\{x,\ |x| \le t\} = t^d\,\mathrm{vol}_N\{x,\ |x| \le 1\}$. □

Remark 2.10. The use of Malcev’s embedding theorem allows, as Guivarc’h observed, to deduce immediately that the analogous result holds for virtually nilpotent finitely generated groups. This fact was also proven independently by H. Bass [3] by a direct combinatorial argument. See also Tits’ appendix to Gromov’s paper [17]. In fact Guivarc’h’s Theorem 2.7 seems to have been rediscovered several times in the past 40 years, including by Pansu in his thesis [27], the latest example of that being [22].

3. The nilshadow

The goal of this section is to introduce the nilshadow of a simply connected solvable Lie group $G$. We will assume that $G$ has polynomial growth, although this last assumption is not necessary for almost everything we do in this section. The only statement which will be used afterwards in the paper (in Section 5) is Lemma 3.12 below. The reader familiar with the nilshadow can jump directly to the statement of this lemma and skip the forthcoming discussion.

3.1. Construction of the nilshadow. The nilshadow of $G$ is a simply connected nilpotent Lie group $G_N$, which is associated to $G$ in a natural way. This notion was first introduced by Auslander and Green in [2] in their study of flows on solvmanifolds.
They defined it as the unipotent radical of a semi-simple splitting of $G$. However, we are going to follow a different approach for its construction by working first at the Lie algebra level. We refer the reader to the book [13] where this approach is taken up.

Let $\mathfrak{g}$ be a solvable real Lie algebra and $\mathfrak{n}$ the nilradical of $\mathfrak{g}$. We have $[\mathfrak{g},\mathfrak{g}] \subset \mathfrak{n}$. If $x \in \mathfrak{g}$, we write $ad(x) = ad_s(x) + ad_n(x)$ for the Jordan decomposition of $ad(x)$ in $GL(\mathfrak{g})$. Since $ad(x) \in Der(\mathfrak{g})$, the space of derivations of $\mathfrak{g}$, and $Der(\mathfrak{g})$ is the Lie algebra of the algebraic group $Aut(\mathfrak{g})$, the Jordan components $ad_s(x)$ and $ad_n(x)$ also belong to $Der(\mathfrak{g})$. Moreover, for each $x \in \mathfrak{g}$, $ad_s(x)$ sends $\mathfrak{g}$ into $\mathfrak{n}$ (because so does $ad(x)$ and $ad_s(x)$ is a polynomial in $ad(x)$). Let $\mathfrak{h}$ be a Cartan subalgebra of $\mathfrak{g}$, namely a nilpotent self-normalizing subalgebra. Recall that the image of a Cartan subalgebra by a surjective Lie algebra homomorphism is again a Cartan subalgebra. Now since $\mathfrak{g}/\mathfrak{n}$ is abelian, it follows that $\mathfrak{h}$ maps onto $\mathfrak{g}/\mathfrak{n}$, i.e. $\mathfrak{h} + \mathfrak{n} = \mathfrak{g}$. Moreover $ad_s(x)|_{\mathfrak{h}} = 0$ if $x \in \mathfrak{h}$, because $\mathfrak{h}$ is nilpotent. Now pick any real vector subspace $\mathfrak{v}$ of $\mathfrak{h}$ in direct sum with $\mathfrak{n}$. Then the following two conditions hold:
(i) $\mathfrak{v} \oplus \mathfrak{n} = \mathfrak{g}$.
(ii) $ad_s(x)(y) = 0$ for all $x, y \in \mathfrak{v}$.

From (i) and (ii), it follows easily that $ad_s(x)$ commutes with $ad(y)$, $ad_s(y)$ and $ad_n(y)$, for all $x, y$ in $\mathfrak{v}$. We have:

Lemma 3.1. The map $\mathfrak{v} \to Der(\mathfrak{g})$ defined by $x \mapsto ad_s(x)$ is a Lie algebra homomorphism.

Proof. First let us check that this map is linear. Let $x, y \in \mathfrak{v}$. By the above, $ad_s(y)$ and $ad_s(x)$ commute with each other (hence their sum is semi-simple) and commute with $ad_n(x) + ad_n(y)$. By the uniqueness of the Jordan decomposition it remains to check that $ad_n(x) + ad_n(y)$ is nilpotent if $x, y$ are in $\mathfrak{v}$. To see this, apply the following obvious remark twice, to $a = ad_n(x)$ and $V = ad(\mathfrak{n})$ first and then to $a = ad_n(y)$ and $V = \mathrm{span}\{ad_n(x),\ ad((ad(y))^n x),\ n \ge 1\}$: Let $V$ be a nilpotent subspace of $GL(\mathfrak{g})$ and $a \in GL(\mathfrak{g})$ nilpotent, i.e.
$V^n = 0$ and $a^m = 0$ for some $n, m \in \mathbb{N}$, and assume $[a, V] \subset V$. Then $(a + V)^{nm} = 0$.

The fact that this map is a Lie algebra homomorphism follows easily from the fact that all $ad_s(x)$, $x \in \mathfrak{v}$, commute with one another and with $[\mathfrak{g},\mathfrak{g}] \subset \mathfrak{n}$.

We define a new Lie bracket on $\mathfrak{g}$ by setting:

(18)  $[x,y]_N = [x,y] - ad_s(x_{\mathfrak{v}})(y) + ad_s(y_{\mathfrak{v}})(x)$

where $x_{\mathfrak{v}}$ is the linear projection of $x$ on $\mathfrak{v}$ according to the direct sum $\mathfrak{v} \oplus \mathfrak{n} = \mathfrak{g}$. The Jacobi identity is checked by a straightforward computation where the following fact is needed: $ad_s(ad_s(x)(y)) = 0$ for all $x, y \in \mathfrak{g}$. This holds because, as we just saw, $ad_s(x)(\mathfrak{g}) \subset \mathfrak{n}$ for all $x \in \mathfrak{g}$, and $ad_s(a) = 0$ if $a \in \mathfrak{n}$.

Definition 3.2. Let $\mathfrak{g}_N$ be the vector space $\mathfrak{g}$ endowed with the new Lie algebra structure $[\cdot,\cdot]_N$ given by (18). The nilshadow $G_N$ of $G$ is defined to be the simply connected Lie group with Lie algebra $\mathfrak{g}_N$.

It is easy to check that $\mathfrak{g}_N$ is a nilpotent Lie algebra. To see this, note first that $[\mathfrak{g}_N,\mathfrak{g}_N]_N \subset \mathfrak{n}$, and if $x \in \mathfrak{g}_N$ and $y \in \mathfrak{n}$ then $[x,y]_N = (ad_n(x_{\mathfrak{v}}) + ad(x_{\mathfrak{n}}))(y)$. However, $ad_n(x_{\mathfrak{v}}) + ad(x_{\mathfrak{n}})$ is a nilpotent endomorphism of $\mathfrak{n}$, as follows from the same remark used in the proof of Lemma 3.1. Hence $\mathfrak{g}_N$ is nilpotent.

The nilshadow Lie product on $G_N$ will be denoted by $*$ in order to distinguish it from the original Lie product on $G$. In the sequel, we will often identify $G$ (resp. $G_N$) with its Lie algebra $\mathfrak{g}$ (resp. $\mathfrak{g}_N$) via their respective exponential maps. Since the underlying space of $\mathfrak{g}_N$ is $\mathfrak{g}$ itself, this gives an identification (although not a group isomorphism) between $G$ and $G_N$. Then the nilshadow Lie product can be expressed in terms of the original product as follows:

$g * h = g\cdot(T(g^{-1})h)$

Here $T$ is the Lie group homomorphism $G \to Aut(G)$ induced by the above choice of supplementary subspace $\mathfrak{v}$ as follows:

(19)  $T(e^a)(e^b) = \exp(e^{ad_s(a_{\mathfrak{v}})}b) \quad \forall a, b \in \mathfrak{g}.$
In other words, $T$ is the unique Lie group homomorphism whose differential at the identity is the Lie algebra homomorphism $d_eT : \mathfrak{g} \to Der(\mathfrak{g})$ given by $d_eT(a)(b) = ad_s(a_{\mathfrak{v}})b$, that is, the composition of the map $\mathfrak{v} \to Der(\mathfrak{g})$ from Lemma 3.1 with the linear projection $\mathfrak{g} \to \mathfrak{g}/\mathfrak{n} \simeq \mathfrak{v}$.

It is easy to check that this definition of the new product is compatible with the definition of the new Lie bracket.

It can also be checked that two choices of supplementary spaces $\mathfrak{v}$ as above yield isomorphic Lie structures (see [13, Chap. III]). Hence by abuse of language, we speak of the nilshadow of $\mathfrak{g}$, when we mean the Lie structure on $G$ induced by a choice of $\mathfrak{v}$ as above.

The following example shows several of the features of a typical solvable Lie group of polynomial growth.

Example 3.3 (Nilshadow of a semi-direct product). Let $G = \mathbb{R} \ltimes_\phi \mathbb{R}^n$ where $\phi_t \in GL_n(\mathbb{R})$ is some one-parameter subgroup given by $\phi_t = \exp(tA) = k_tu_t$, where $A$ is some matrix in $M_n(\mathbb{R})$ and $A = A_s + A_u$ is its Jordan decomposition, giving rise to $k_t = \exp(tA_s)$ and $u_t = \exp(tA_u)$. The group $G$ is diffeomorphic to $\mathbb{R}^{n+1}$, hence simply connected. If all eigenvalues of $A_s$ are purely imaginary, then $G$ has polynomial growth. However $G$ is not nilpotent unless $A_s = 0$. So let us assume that neither $A_s$ nor $A_u$ is zero. Then the nilshadow $G_N$ is the semi-direct product $\mathbb{R} \ltimes_u \mathbb{R}^n$ where $u_t$ is the unipotent part of $\phi_t$. It is easy to compute the homogeneous dimension of $G$ (or $G_N$) in terms of the dimensions of the Jordan blocks of $A_u$. If $n_k$ is the number of Jordan blocks of $A_u$ of size $k$, then

$d(G) = 1 + \sum_{k\ge1} n_k\,\frac{k(k+1)}{2}.$

3.2. Basic properties of the nilshadow. We now list in the form of a few lemmas some basic properties of the nilshadow.

Lemma 3.4. The image of $T : G \to Aut(G)$ is abelian and relatively compact. Moreover $T(T(g)h) = T(h)$ for any $g, h \in G$.

Proof. Since $G$ has polynomial growth it is of type (R) by Guivarc’h’s theorem. Hence all $ad_s(x)$ have purely imaginary eigenvalues.
It follows that $K$ is compact. Since $T$ factors through the nilradical, its image is abelian. The last equality follows from (19) and the fact that $\forall x, y \in \mathfrak{g}$, $ad_s(ad_s(x)(y)) = 0$. □

Lemma 3.5. $T(G)$ also belongs to $Aut(G_N)$ and $T$ is a group homomorphism $G_N \to Aut(G_N)$.

Proof. The first assertion follows from (19) and the fact that $d_eT$ is a derivation of $\mathfrak{g}_N$, as one can check from (18) and the fact that $\forall x, y \in \mathfrak{g}$, $ad_s(ad_s(x)(y)) = 0$. The second assertion then follows from Lemma 3.4. □

We denote by $K$ the closure of $T(G)$ in $Aut(G) = Aut(\mathfrak{g})$.

Lemma 3.6 ($K$-action on $\mathfrak{g}_N$). $K$ preserves $\mathfrak{v}$ and acts trivially on it. It also preserves the ideal $\mathfrak{n}$ and the central descending series $\{C^i(\mathfrak{g}_N)\}_i$ of $\mathfrak{g}_N$.

Proof. It suffices to check that $ad_s(\mathfrak{v})$ preserves $\mathfrak{n}$ and $C^i(\mathfrak{g}_N)$. It preserves $\mathfrak{n}$ because $ad(x)$ preserves $\mathfrak{n}$ for all $x \in \mathfrak{g}$. It preserves $C^i(\mathfrak{g}_N)$ because it acts as a derivation of $\mathfrak{g}_N$, as we have already checked in the proof of Lemma 3.5. □

Remark 3.7 (Well-definedness of $\pi_1$). It is also easy to check from the definition of the nilshadow bracket that the commutator subalgebra $[\mathfrak{g}_N,\mathfrak{g}_N]$, and in fact each term of the central descending series $C^i(\mathfrak{g}_N)$, is an ideal in $\mathfrak{g}$ and does not depend on the choice of supplementary subspace $\mathfrak{v}$ used to define the nilshadow bracket. In particular the projection map $\pi_1 : \mathfrak{g}_N \to \mathfrak{g}_N/[\mathfrak{g}_N,\mathfrak{g}_N]$ is a well defined linear map on $\mathfrak{g} = \mathfrak{g}_N$ (i.e. independent of the choice involved in the construction of the nilshadow Lie bracket).

Lemma 3.8 (Exponential map). The respective exponential maps $\exp : \mathfrak{g} \to G$ and $\exp_N : \mathfrak{g}_N \to G_N$ coincide on $\mathfrak{n}$ and on $\mathfrak{v}$.

Proof. Since the two Lie products coincide on $N = \exp(\mathfrak{n})$, so do their exponential maps. For the second assertion, note that $T(e^{-tv})v = v$ for every $v \in \mathfrak{v}$ because $ad_s(x)(y) = 0$ for all $x, y \in \mathfrak{v}$. It follows that $\{e^{tv}\}_t$ is a one-parameter subgroup for both Lie structures, hence it is equal to $\{\exp_N(tv)\}_t$. □

Remark 3.9 (Surjectivity of the exponential map).
The exponential map is not always a diffeomorphism, as the example of the universal cover $\widetilde{E}$ of the group $E$ of motions of the plane shows (indeed any 1-parameter subgroup of $E$ is either a translation subgroup or a rotation subgroup, but the rotation subgroup is compact hence a torus, so its lift will contain the (discrete) center of $\widetilde{E}$, hence will miss every lift of a non-trivial translation). In fact, it is easy to see that if $\mathfrak{g}$ is the Lie algebra of a solvable (non-nilpotent) Lie group of polynomial growth, then $\mathfrak{g}$ maps surjectively onto the Lie algebra of $E$. Hence, for a simply connected solvable and non-nilpotent Lie group of polynomial growth, the exponential map is never onto. Nevertheless its image is easily seen to be dense.

However, exponential coordinates of the second kind behave nicely. Note that $[\mathfrak{g}_N,\mathfrak{g}_N] \subset \mathfrak{n}$.

Lemma 3.10 (Exponential coordinates of the second kind). Let $\{C^i(\mathfrak{g}_N)\}_{i\ge0}$ be the central descending series of $\mathfrak{g}_N$ (with $C^1(\mathfrak{g}_N) = [\mathfrak{g}_N,\mathfrak{g}_N]$) and pick linear subspaces $\mathfrak{m}_i$ in $\mathfrak{g}_N$ such that $C^{i-1}(\mathfrak{g}_N) = \mathfrak{m}_i \oplus C^i(\mathfrak{g}_N)$ for $i \ge 2$. Let $\ell$ be a supplementary subspace of $C^1(\mathfrak{g}_N)$ in $\mathfrak{n}$. Define exponential coordinates of the second kind by setting

$\mathfrak{m}_r \oplus \ldots \oplus \mathfrak{m}_2 \oplus \ell \oplus \mathfrak{v} \to G$
$(\xi_r, \ldots, \xi_1, v) \mapsto \exp_N(\xi_r) * \ldots * \exp_N(\xi_1) * \exp_N(v)$

This map is a diffeomorphism. Moreover $\exp_N(\xi_r) * \ldots * \exp_N(\xi_1) * \exp_N(v) = e^{\xi_r}\cdot\ldots\cdot e^{\xi_1}\cdot e^v$ for all choices of $v \in \mathfrak{v}$ and $\xi_i \in \mathfrak{m}_i$.

Proof. By Lemma 3.8 the exponential maps of $G$ and $G_N$ coincide on $\mathfrak{n}$ and on $\mathfrak{v}$. Moreover $g * h = g\cdot h$ whenever $g$ belongs to the nilradical $\exp(\mathfrak{n})$ of $G$. Hence $\exp_N(\xi_r) * \ldots * \exp_N(\xi_1) * \exp_N(v) = \exp_N(\xi_r)\cdot\ldots\cdot\exp_N(\xi_1)\cdot\exp_N(v) = e^{\xi_r}\cdot\ldots\cdot e^{\xi_1}\cdot e^v$. The restriction of the map to $\mathfrak{n}$ is a diffeomorphism onto $\exp(\mathfrak{n})$, because this map and its inverse are explicit polynomial maps (the $\xi_i$’s are coordinates of the second kind, see the book [12]).
Now the map $\mathfrak{n} \oplus \mathfrak{v} \to G$ sending $(n, v)$ to $e^n\cdot e^v$ is a diffeomorphism, because $G$ is simply connected and hence the quotient group $G/\exp(\mathfrak{n})$ is isomorphic to a vector space and hence to $\exp(\mathfrak{v})$. □

Lemma 3.11 (“Bi-invariant” Riemannian metric). There exists a Riemannian metric on $G$ which is left invariant under both Lie structures.

Proof. Indeed it suffices to pick a scalar product on $\mathfrak{g}$ which is invariant under the compact subgroup $K = \overline{T(G)} \subset Aut(\mathfrak{g})$. □

We identify $K = \overline{\{T(g),\ g \in G\}}$ with its image in $Aut(\mathfrak{g})$ under the canonical isomorphism between $Aut(G)$ and $Aut(\mathfrak{g})$. Recall that, according to Lemma 3.6, the central descending series of $\mathfrak{g}_N$ is invariant under $ad_s(x)$ for all $x \in \mathfrak{v}$ and consists of ideals of $\mathfrak{g}$. The same holds for $\mathfrak{n}$. It follows that these linear subspaces are also invariant under $K$. However since $K$ is compact, its action on $\mathfrak{g}$ is completely reducible. Therefore we have proved:

Lemma 3.12 ($K$-invariant stratification of the nilshadow). Let $\mathfrak{g}$ be the Lie algebra of a simply connected Lie group $G$ with polynomial growth. Let $\mathfrak{g}_N$ be the nilshadow Lie algebra obtained from a splitting $\mathfrak{g} = \mathfrak{n} \oplus \mathfrak{v}$ as above (i.e. $\mathfrak{n}$ is the nilradical and $\mathfrak{v}$ satisfies $ad_s(x)(y) = 0$ for every $x, y \in \mathfrak{v}$). Let $K := \overline{\{T(g),\ g \in G\}} \subset Aut(G)$, where $T$ is defined by (19). Then there is a choice of linear subspaces $\mathfrak{m}_i$’s and $\ell$ such that

(20)  $\mathfrak{g}_N = \mathfrak{m}_r \oplus \ldots \oplus \mathfrak{m}_2 \oplus \ell \oplus \mathfrak{v},$

where each term is $K$-invariant, $\mathfrak{m}_1 := \ell \oplus \mathfrak{v}$ and the central descending series of $\mathfrak{g}_N$ satisfies $C^{i-1}(\mathfrak{g}_N) = \mathfrak{m}_i \oplus C^i(\mathfrak{g}_N)$. Moreover the action of $K$ can be read off on the exponential coordinates of the second kind in this decomposition, namely:

$k(e^{\xi_r}\cdot\ldots\cdot e^{\xi_0}) = k(e^{\xi_r})\cdot\ldots\cdot k(e^{\xi_0}) = e^{k(\xi_r)}\cdot\ldots\cdot e^{k(\xi_0)} = \exp_N(k(\xi_r)) * \ldots * \exp_N(k(\xi_0))$

4. Periodic metrics

In this section, unless otherwise stated, $G$ will denote an arbitrary locally compact group.

4.1. Definitions.
By a pseudodistance (or metric) on a topological space $X$, we mean a function $\rho : X \times X \to \mathbb{R}_+$ satisfying $\rho(x,y) = \rho(y,x)$ and $\rho(x,z) \le \rho(x,y) + \rho(y,z)$ for any triplet of points of $X$. However $\rho(x,y)$ may be equal to 0 even if $x \ne y$.

We will require our pseudodistances to be locally bounded, meaning that the image under $\rho$ of any compact subset of $G \times G$ is a bounded subset of $\mathbb{R}_+$. To avoid irrelevant cases (for instance $\rho \equiv 0$) we will also assume that $\rho$ is proper, i.e. the map $y \mapsto \rho(e,y)$ is a proper map, namely the preimage of a bounded set is bounded (we do not ask that the map be continuous). When $\rho$ is locally bounded, it is proper if and only if $y \mapsto \rho(x,y)$ is proper for any $x \in G$.

A pseudodistance $\rho$ on $G$ is said to be asymptotically geodesic if for every $\varepsilon > 0$ there exists $s > 0$ such that for any $x, y \in G$ one can find a sequence of points $x_1 = x, x_2, \ldots, x_n = y$ in $G$ such that

(21)  $\sum_{i=1}^{n-1} \rho(x_i, x_{i+1}) \le (1+\varepsilon)\rho(x,y)$

and $\rho(x_i, x_{i+1}) \le s$ for all $i = 1, \ldots, n-1$.

We will consider exclusively pseudodistances on a group $G$ that are invariant under left translations by all elements of a fixed closed and co-compact subgroup $H$ of $G$, meaning that for all $x, y \in G$ and all $h \in H$, $\rho(hx, hy) = \rho(x,y)$.

Combining all previous axioms, we set the following definition.

Definition 4.1. Let $G$ be a locally compact group. A pseudodistance $\rho$ on $G$ will be said to be a periodic metric (or $H$-periodic metric) if it satisfies the following properties:
(i) $\rho$ is invariant under left translations by a closed co-compact subgroup $H$.
(ii) $\rho$ is locally bounded and proper.
(iii) $\rho$ is asymptotically geodesic.

Remark 4.2. The assumption that $\rho$ is symmetric, i.e. $\rho(x,y) = \rho(y,x)$, is here only for the sake of simplicity, and most of what is proven in this paper can be done without this hypothesis.

4.2. Basic properties. Let $\rho$ be a periodic metric on $G$ and $H$ some co-compact subgroup of $G$. The following properties are straightforward.
(1) ρ is at a bounded distance from its restriction to H. This means that if F is a bounded fundamental domain for H in G and, for an arbitrary x ∈ G, h_x denotes the element of H such that x ∈ h_x F, then |ρ(x, y) − ρ(h_x, h_y)| ≤ C for some constant C > 0.
(2) For every t > 0 there exists a compact subset Kt of G such that, for all x, y ∈ G, ρ(x, y) ≤ t ⇒ x^{−1}y ∈ Kt. Conversely, if K is a compact subset of G, there exists t(K) > 0 such that x^{−1}y ∈ K ⇒ ρ(x, y) ≤ t(K).
(3) If ρ(x, y) ≥ s, the xi's in (21) can be chosen in such a way that s ≤ ρ(xi, xi+1) ≤ 2s (one can take a suitable subset of the original xi's).
(4) The restriction of ρ to H × H is a periodic pseudodistance on H. This means that the xi's in (21) can be chosen in H.
(5) Conversely, given a periodic pseudodistance ρH on H, it is possible to extend it to a periodic pseudodistance on G by setting ρ(x, y) = ρH(h_x, h_y), where x ∈ h_x F for some bounded fundamental domain F for H in G.

4.3. Examples. Let us give a few examples of periodic pseudodistances.
(1) Let Γ be a finitely generated torsion free nilpotent group which is embedded as a co-compact discrete subgroup of a simply connected nilpotent Lie group N. Given a finite symmetric generating set S of Γ, we can consider the corresponding word metric dS on Γ, which gives rise to a periodic metric on N given by ρ(x, y) = dS(γ_x, γ_y), where x ∈ γ_x F and y ∈ γ_y F if F is some fixed fundamental domain for Γ in N.
(2) Another example, given in [27], is as follows. Let N/Γ be a nilmanifold with universal cover N and fundamental group Γ. Let g be a Riemannian metric on N/Γ. It can be lifted to the universal cover and thus gives rise to a Riemannian metric g̃ on N. This metric is Γ-invariant, proper and locally bounded. Since Γ is co-compact in N, it is easy to check that it is also asymptotically geodesic, hence periodic.
(3) Any word metric on G.
That is, if Ω is a compact symmetric generating subset of G, let ∆Ω(x) = inf{n ≥ 1, x ∈ Ω^n}. Then define ρ(x, y) = ∆Ω(x^{−1}y). Clearly ρ is a pseudodistance (although not a distance) and it is G-invariant on the left. It is also proper, locally bounded and asymptotically geodesic, hence periodic.
(4) If G is a connected Lie group, any left invariant Riemannian metric on G. Here again H = G and we obtain a periodic distance. Similarly, any left invariant Carnot-Carathéodory metric on G will do.

Remark 4.3 (Berestovski's theorem). According to a result of Berestovski [5], every left-invariant geodesic distance on a connected Lie group is a subFinsler metric as defined in Paragraph 2.1.

4.4. Coarse equivalence between invariant pseudodistances. The following proposition is basic:

Proposition 4.4. Let ρ1 and ρ2 be two periodic pseudodistances on G. Then there is a constant C > 0 such that for all x, y ∈ G

(22) $\frac{1}{C}\,\rho_2(x,y) - C \le \rho_1(x,y) \le C\,\rho_2(x,y) + C$

Proof. Clearly it suffices to prove the upper bound, since the roles of ρ1 and ρ2 are symmetric. Let s > 0 be the number corresponding to the choice ε = 1 in (21) for ρ2. From 4.2 (2), there exists a compact subset K_{2s} in G such that ρ2(x, y) ≤ 2s ⇒ x^{−1}y ∈ K_{2s}, and there is a constant t = t(K_{2s}) > 0 such that x^{−1}y ∈ K_{2s} ⇒ ρ1(x, y) ≤ t. Let C = max{2t/s, t}, and let x, y ∈ G. If ρ2(x, y) ≤ s then ρ1(x, y) ≤ t, so the right hand side of (22) holds. If ρ2(x, y) ≥ s then, from (21) and 4.2 (3), we get a sequence of N + 1 points xi in G from x to y such that s ≤ ρ2(xi, xi+1) ≤ 2s and $\sum_{i=1}^{N} \rho_2(x_i, x_{i+1}) \le 2\rho_2(x,y)$. In particular Ns ≤ 2ρ2(x, y). It follows that ρ1(xi, xi+1) ≤ t for all i. Hence

$\rho_1(x,y) \le \sum_{i=1}^{N} \rho_1(x_i, x_{i+1}) \le Nt \le \frac{2t}{s}\,\rho_2(x,y)$

and the right hand side of (22) holds. □

In the particular case when G = N is a simply connected nilpotent Lie group, the distance to the origin x ↦ ρ(e, x) is also coarsely equivalent to any homogeneous quasi-norm on N. We have:

Proposition 4.5. Suppose N is a simply connected nilpotent Lie group.
Let ρ1 be a periodic pseudodistance on N and | · | be a homogeneous quasi-norm. Then there exists C > 0 such that for all x, y ∈ N

(23) $\frac{1}{C}\,|x^{-1}y| - C \le \rho_1(x,y) \le C\,|x^{-1}y| + C$

Moreover, if ρ2 is a periodic pseudodistance on the stratified nilpotent group N∞ associated to N, then again, there is a constant C > 0 such that

(24) $\frac{1}{C}\,\rho_2(e,x) - C \le \rho_1(e,x) \le C\,\rho_2(e,x) + C$

The proposition follows at once from Guivarc'h's theorem (see Corollary 2.7 above), the equivalence of homogeneous quasi-norms, and the fact that left-invariant Carnot-Carathéodory metrics on N∞ are homogeneous quasi-norms. However, since the group structures on N and N∞ differ, (24) cannot in general be replaced by the stronger relation (22), as simple examples show.

The next proposition is of fundamental importance for the study of metrics on Lie groups of polynomial growth:

Proposition 4.6. Let G be a simply connected solvable Lie group of polynomial growth and GN its nilshadow. Let ρ and ρN be arbitrary periodic pseudodistances on G and GN respectively. Then there is a constant C > 0 such that for all x, y ∈ G

(25) $\frac{1}{C}\,\rho_N(x,y) - C \le \rho(x,y) \le C\,\rho_N(x,y) + C$

Proof. According to Proposition 4.4, it is enough to show (25) for some choice of periodic metrics on G and GN. But in Lemma 3.11 we constructed a Riemannian metric on G which is left invariant for both G and GN. We are done. □

4.5. Right invariance under a compact subgroup. Here we verify that, given a compact subgroup of G, any periodic metric is at bounded distance from another periodic metric which is invariant on the right by this compact subgroup.

Let K be a compact subgroup of G and ρ a periodic pseudodistance on G. We average ρ with the help of the normalized Haar measure on K to get:

(26) $\rho^K(x,y) = \int_{K\times K} \rho(xk_1, yk_2)\,dk_1\,dk_2$

Then the following holds:

Lemma 4.7. There is a constant C0 > 0 depending only on ρ and K such that for all k1, k2 ∈ K and all x, y ∈ G

(27) |ρ(xk1, yk2) − ρ(x, y)| ≤ C0

Proof. From 4.2 (2), there exists t = t(K) > 0 such that
for all x ∈ G and all k ∈ K, ρ(x, xk) ≤ t. Applying the triangle inequality, we are done. □

Hence we obtain:

Proposition 4.8. The pseudodistance ρK is periodic and lies at a bounded distance from ρ. In particular, as x tends to infinity in G the following limit holds

(28) $\lim_{x\to\infty} \frac{\rho^K(e,x)}{\rho(e,x)} = 1$

Proof. From Lemma 4.7 and 4.2 (3), it is easy to check that ρK must be asymptotically geodesic, and periodic. Integrating (27) we get that ρK is at a bounded distance from ρ, and (28) is obvious. □

If K is normal in G, we thus obtain a periodic metric ρK on G/K such that ρK(p(x), p(y)) is at a bounded distance from ρ(x, y), where p is the quotient map G → G/K.

5. Reduction to the nilpotent case

In this section, G denotes a simply connected solvable Lie group of polynomial growth. We are going to reduce the proof of the theorems of the Introduction to the case of a nilpotent G. This is performed by showing that any periodic pseudodistance ρ on G is asymptotic to some associated periodic pseudodistance ρN on the nilshadow GN. We state this in Proposition 5.1 below. The key step in the proof is Proposition 5.2 below, which shows the asymptotic invariance of ρ under the “semisimple part” of G. The crucial fact there is that the displacement of a distant point under a fixed unipotent automorphism is negligible compared to the distance from the identity (see Lemmas 5.4, 5.5), so that the action of the semisimple part of large elements can be simply approximated by their action by left translation.

5.1. Asymptotic invariance under a compact group of automorphisms of G. The main result of this section is the following. Let G be a connected and simply connected solvable Lie group with polynomial growth and GN its nilshadow (see Section 3).

Proposition 5.1. Let H be a closed co-compact subgroup of G and ρ an H-periodic pseudodistance (see Definition 4.1) on G.
There exists a closed subset HK containing H which is a co-compact subgroup for both G and GN, and an HK-periodic (for both Lie structures) pseudodistance ρK such that

(29) $\lim_{x\to\infty} \frac{\rho_K(e,x)}{\rho(e,x)} = 1$

The closed subgroup HK will be taken to be the closure of the group generated by all elements of the form k(h), where h belongs to H and k belongs to the closure K in the group Aut(G) of the image of H under the homomorphism T : G → Aut(G) introduced in Section 3. It is easy to check from the definition of the nilshadow product (1) that this is indeed a subgroup in both G and its nilshadow GN.

The new pseudodistance ρK is defined as follows, using a double averaging procedure:

(30) $\rho_K(x,y) := \int_{H\backslash H_K} \int_K \rho(gk(x), gk(y))\,dk\,d\mu(g)$

Here the measure µ is the normalized Haar measure on the coset space H\HK and dk is the normalized Haar measure on the compact group K. Recall that all closed subgroups of S are unimodular (since they have polynomial growth by [21, Lemme I.3]), hence the existence of invariant measures on the coset spaces.

An essential part of the proof of Proposition 5.1 is contained in the following statement:

Proposition 5.2. Let ρ be a periodic pseudodistance on G which is invariant under a co-compact subgroup H. Then ρ is asymptotically invariant under the action of K = {T(h), h ∈ H} ⊆ Aut(G). Namely, (uniformly) for all k ∈ K,

(31) $\lim_{x\to\infty} \frac{\rho(e,k(x))}{\rho(e,x)} = 1$

The proof of Proposition 5.2 splits into two steps. First we show that it is enough to prove (31) for a dense subset of k's. This is a consequence of the following continuity statement:

Lemma 5.3. Let ε > 0. Then there is a neighborhood U of the identity in K such that, for all k ∈ U,

$\limsup_{x\to\infty} \frac{\rho(x,k(x))}{\rho(e,x)} < \varepsilon$

Then we show that the action of T(g) can be approximated by the conjugation by g, essentially because the unipotent part of this conjugation does not move x very much when x is far. This is the content of the following lemma:

Lemma 5.4.
Let ρ be a periodic pseudodistance on G which is invariant under a co-compact subgroup H. Then for any ε > 0 and any compact subset F in H there is s0 > 0 such that

|ρ(e, T(h)x) − ρ(e, hx)| ≤ ερ(e, x)

for any h ∈ F and as soon as ρ(e, x) > s0.

Proof of Proposition 5.2 modulo Lemmas 5.3 and 5.4. As ρ is assumed to be H-invariant, for every h ∈ H, we have ρ(e, h^{−1}x)/ρ(e, x) → 1. The proof of the proposition then follows immediately from the combination of the last two lemmas. □

5.2. Proof of Lemmas 5.3 and 5.4. We choose K-invariant subspaces mi and ℓ of the nilshadow gN of g as in Lemma 3.12 from Section 3. In particular gN = mr ⊕ ... ⊕ m2 ⊕ ℓ ⊕ v, where each term is K-invariant, n = [gN, gN] ⊕ ℓ and C^i(gN) = mi ⊕ C^{i+1}(gN). Moreover δt(x) = t^i x if x ∈ mi (here m1 = ℓ ⊕ v). We also set $v(x) = \max_i \|\xi_i\|^{1/d_i}$ if x = expN(ξr) ∗ ... ∗ expN(ξ0), where di = i if i > 0 and d0 = 1. And we let $|x| := \max_i \|x_i\|^{1/d_i}$ if x = xr + ... + x1 + x0 in the above direct sum decomposition. Note that | · | is a δt-homogeneous quasi-norm. Moreover, it is straightforward to verify (using the Campbell-Hausdorff formula (12) and Proposition 2.4) that v(x) ≤ C|x| + C for some constant C > 0. In particular $\xi_i/|x|^{d_i}$ remains bounded as |x| becomes large.

Proof of Lemma 5.3. Combining Propositions 4.5 and 4.6, there is a constant C > 0 such that for all x, y ∈ G, ρ(x, y) ≤ C|x^{∗−1} ∗ y| + C. Therefore we are reduced to proving the statement for | · | instead of ρ, namely it is enough to show that |x^{∗−1} ∗ k(x)| becomes negligible compared to |x| as |x| goes to infinity and k tends to 1. It follows from the Campbell-Baker-Hausdorff formula (11) and (12) that, if x, y ∈ GN and |x|, |y| are O(t), then $|\delta_{1/t}(x * y) - \delta_{1/t}(x) * \delta_{1/t}(y)| = O(t^{-1/r})$, and similarly $|\delta_{1/t}(x_1 * \cdots * x_m) - \delta_{1/t}(x_1) * \cdots * \delta_{1/t}(x_m)| = O_m(t^{-1/r})$ for m elements xi with |xi| = O(t). Hence when writing x = expN(ξr) ∗ ...
∗ expN(ξ0), and setting t = |x|, we thus obtain that the following quantity

$\Big| \delta_{1/t}\big(x^{*-1} * k(x)\big) - \prod_{0\le i\le r} \exp_N(-t^{-d_i}\xi_i) * \prod_{0\le i\le r} \exp_N\big(t^{-d_{r-i}}\,k(\xi_{r-i})\big) \Big|$

is a O(t^{−1/r}). Indeed recall from Lemma 3.12 that k(x) = expN(k(ξr)) ∗ ... ∗ expN(k(ξ0)). As x gets larger, each t^{−di}ξi remains in a compact subset of mi. Therefore, as k tends to the identity in K, each t^{−di}k(ξi) becomes uniformly close to t^{−di}ξi independently of the choice of x ∈ GN as long as t = |x| is large. The result follows. □

Proof of Lemma 5.4. Recall that hx = h ∗ T(h)x for all x, h ∈ G (see (1)). By the triangle inequality it is enough to bound ρ(y, h ∗ y), where y = T(h)x. From Propositions 4.5 and 4.6, ρ is comparable (up to multiplicative and additive constants) to the homogeneous quasi-norm | · |. Hence the lemma follows from the following:

Lemma 5.5. Let N be a simply connected nilpotent Lie group and let | · | be a homogeneous quasi-norm on N associated to some 1-parameter group of dilations (δt)t. For any ε > 0 and any compact subset F of N, there is a constant s2 > 0 such that

|x^{−1}gx| ≤ ε|x|

for all g ∈ F and as soon as |x| > s2.

Proof. Recall, as in the proof of the last lemma, that for any c1 > 0 there is a c2 > 0 such that if t > 1 and x, y ∈ N are such that |x|, |y| ≤ c1 t, then

$|\delta_{1/t}(xy) - \delta_{1/t}(x) * \delta_{1/t}(y)| \le c_2\,t^{-1/r}$.

In particular, if we set t = |x|, then

$\big|\delta_{1/t}(x^{-1}gx) - \delta_{1/t}(x)^{-1} * \delta_{1/t}(g) * \delta_{1/t}(x)\big| \le c_2\,t^{-1/r}$

On the other hand, as g remains in the compact set F, δ_{1/t}(g) tends uniformly to the identity when t = |x| goes to infinity, and δ_{1/t}(x) remains in a compact set. By continuity, we see that δ_{1/t}(x)^{−1} ∗ δ_{1/t}(g) ∗ δ_{1/t}(x) becomes arbitrarily small as t increases. We are done. □

5.3. Proof of Proposition 5.1. First we prove the following continuity statement:

Lemma 5.6. Let ρ be a periodic pseudodistance on G and ε > 0. Then there exists a neighborhood of the identity U in G and s3 > 0 such that

$1 - \varepsilon \le \frac{\rho(e,gx)}{\rho(e,x)} \le 1 + \varepsilon$

as soon as g ∈ U and ρ(e, x) > s3.
Proof. Let ρN be a left invariant Riemannian metric on the nilshadow GN. We have

|ρ(e, x) − ρ(e, gx)| ≤ ρ(x, gx) ≤ ρ(x, g ∗ x) + ρ(g ∗ x, gx)

However ρ(a, b) ≤ CρN(a, b) + C for some C > 0 by Proposition 4.6. Moreover by (1) we have gx = g ∗ T(g)x. Hence, using the left invariance of ρN under ∗,

|ρ(e, x) − ρ(e, gx)| ≤ CρN(x, g ∗ x) + CρN(x, T(g)x) + 2C

To complete the proof, we apply Lemmas 5.5 and 5.3 to the right hand side above. □

We proceed with the proof of Proposition 5.1. Let L be the set of all g ∈ G such that ρ(e, gx)/ρ(e, x) tends to 1 as x tends to infinity in G. Clearly L is a subgroup of G. Lemma 5.6 shows that L is closed. The H-invariance of ρ ensures that L contains H. Moreover, Proposition 5.2 implies that L is invariant under K. Consequently L contains HK, the closed subgroup generated by all k(h), k ∈ K, h ∈ H. This, together with Proposition 5.2, grants pointwise convergence of the integrand in (29). Convergence of the integral follows by applying Lebesgue's dominated convergence theorem.

The fact that ρK is invariant under left multiplication by H and invariant under precomposition by automorphisms from K ensures that ρK is invariant under ∗-left multiplication by any element h ∈ H, where ∗ is the multiplication in the nilshadow GN. Moreover we check that T(g) ∈ K if g ∈ HK, hence HK is a subgroup of GN. It is clearly co-compact in GN too (if F is compact and HF = G then H ∗ FK = G, where FK is the union of all k(F), k ∈ K).

Clearly ρK is proper and locally bounded, so in order to finish the proof, we need only check that ρK is asymptotically geodesic. By H-invariance of ρK and since H is co-compact in G, it is enough to exhibit a pseudogeodesic between e and a point x ∈ H. Let x = z1 · ... · zn with zi ∈ H and $\sum_i \rho(e, z_i) \le (1+\varepsilon)\,\rho(e,x)$. Fix a compact fundamental domain F for H in HK so that integration in (29) over H\HK is replaced by integration over F. Then for some constant CF > 0 we have |ρ(g, gz) − ρ(e, gz)| ≤ CF for g ∈ F and z ∈ H.
Moreover, it follows from Proposition 5.2, Lemma 5.6 and the fact that HK ⊂ L, that

(32) ρ(e, gk(z)) ≤ (1 + ε) · ρ(e, z)

for all g ∈ F, k ∈ K and as soon as z ∈ G is large enough. Fix s large enough so that CF ≤ εs and so that (32) holds when ρ(e, z) ≥ s. As already observed in the discussion following Definition 4.1 (property 4.2 (3)), we may take the zi's so that s ≤ ρ(e, zi) ≤ 2s. Then nCF ≤ nsε ≤ 3ερ(e, x). Finally we get, for ε < 1 and x large enough,

$\sum_{i=1}^{n} \rho_K(e,z_i) \le C_F\,n + (1+\varepsilon)^2\rho(e,x) \le C_F\,n + (1+\varepsilon)^3\rho_K(e,x) \le (1+10\varepsilon)\cdot\rho_K(e,x)$

where we have used the convergence ρK/ρ → 1 that we just proved. □

6. The nilpotent case

In this section, we prove Theorem 1.4 and its corollaries stated in the Introduction for a simply connected nilpotent Lie group. We essentially follow Pansu's argument from [27], although our approach differs somewhat in its presentation.

Throughout the section, the nilpotent Lie group will be denoted by N, and its Lie algebra by n. Let m1 be any vector subspace of n such that n = m1 ⊕ [n, n]. Let π1 be the associated linear projection of n onto m1. Let H be a closed co-compact subgroup of N. To every H-periodic pseudodistance ρ on N we associate a norm ‖·‖0 on m1, which is the norm whose unit ball is defined to be the closed convex hull of all elements π1(h)/ρ(e, h) for all h ∈ H\{e}. In other words,

(33) $E := \{x \in \mathfrak{m}_1,\ \|x\|_0 \le 1\} = \mathrm{CvxHull}\Big\{ \frac{\pi_1(h)}{\rho(e,h)},\ h \in H\setminus\{e\} \Big\}$

The set E is clearly a convex subset of m1 which is symmetric around 0 (since ρ is symmetric). To check that E is indeed the unit ball of a norm on m1 it remains to see that E is bounded and that 0 lies in its interior. The first fact follows immediately from (23) and Example 2.3. If 0 does not lie in the interior of E, then E must be contained in a proper subspace of m1, contradicting the fact that H is co-compact in N.
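As a sanity check of definition (33), it may help to look at the simplest abelian situation. The sketch below assumes (for this illustration only) that N = R², H = Z², so that m1 = R² and π1 is the identity, and that ρ is the word metric of a finite symmetric generating set S. In that case every π1(h)/ρ(e, h) is an average of elements of S, so it lies in the convex hull of S. The generating set S and the helper names `gauge` and `word_lengths` are hypothetical choices made for this example and do not come from the text.

```python
from collections import deque
from itertools import combinations

# Hypothetical example data: a symmetric generating set of Z^2 (not from the paper).
S = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]


def gauge(x, gens):
    """Norm of x in R^2 whose unit ball is the convex hull of the symmetric set gens.

    The norm is the minimum of sum(l_i) over all ways of writing
    x = sum(l_i * f_i) with l_i >= 0.  In dimension 2 this linear program
    attains its minimum using at most two linearly independent generators,
    so it is enough to enumerate pairs.
    """
    best = float("inf")
    for (a, b), (c, d) in combinations(gens, 2):
        det = a * d - b * c
        if det == 0:
            continue  # parallel generators do not form a basis
        # Cramer's rule for x = li * (a, b) + lj * (c, d)
        li = (x[0] * d - x[1] * c) / det
        lj = (a * x[1] - b * x[0]) / det
        if li >= 0 and lj >= 0:
            best = min(best, li + lj)
    return best


def word_lengths(gens, radius):
    """Word length of every point of Z^2 within `radius` steps of the origin (BFS)."""
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        p = queue.popleft()
        if dist[p] == radius:
            continue
        for s in gens:
            q = (p[0] + s[0], p[1] + s[1])
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


dist = word_lengths(S, 8)
# Every h / rho(e, h) has gauge at most 1, i.e. lies in the convex hull of S.
inside = all(gauge(h, S) <= n + 1e-9 for h, n in dist.items() if n > 0)
print("all rescaled points lie in CvxHull(S):", inside)
print("word length and gauge of (3, 5):", dist[(3, 5)], gauge((3, 5), S))
```

Since the generators themselves give the points s/1, the hull of the rescaled points is exactly the hexagon CvxHull(S) in this toy case. Enumerating pairs of generators only works in dimension 2; in higher dimension one would solve the same linear program with a general solver.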
Taking large powers h^n, we see that we can replace the set H\{e} in the above definition by any neighborhood of infinity in H. Similarly, it is easy to see that the following holds:

Proposition 6.1. For s > 0 let Es be the closed convex hull of all π1(x)/ρ(e, x) with x ∈ N and ρ(e, x) > s. Then $E = \bigcap_{s>0} E_s$.

Proof. Since ρ is H-periodic, we have ρ(e, h^n) ≤ nρ(e, h) for all n ∈ N and h ∈ H. This shows $E \subset \bigcap_{s>0} E_s$. The opposite inclusion follows easily from the fact that ρ is at a bounded distance from its restriction to H, i.e. from 4.2 (1). □

We now choose a set of supplementary subspaces (mi) starting with m1 as in Paragraph 2.2. This defines a new Lie product ∗ on N so that N∞ = (N, ∗) is stratified. We can then consider the ∗-left invariant Carnot-Carathéodory metric associated to the norm ‖·‖0 as defined in Paragraph 2.1 on the stratified nilpotent Lie group N∞. In this section, we will prove Theorem 1.4 for nilpotent groups in the following form:

Theorem 6.2. Let ρ be a periodic pseudodistance on N and d∞ the Carnot-Carathéodory metric defined above. Then as x tends to infinity in N

(34) $\lim_{x\to\infty} \frac{\rho(e,x)}{d_\infty(e,x)} = 1$

Note that d∞ is left-invariant for the N∞ Lie product, but not for the original Lie product on N. Before going further, let us draw some simple consequences.

(1) In Theorem 6.2 we may replace d∞(e, x) by d(e, x), where d is the left invariant Carnot-Carathéodory metric on N (rather than N∞) defined by the norm ‖·‖0 (as opposed to d∞, which is ∗-left invariant). Hence ρ, d and d∞ are asymptotic. This follows from the combination of Theorem 6.2 and Remark 2.1.
(2) Observe that the choice of m1 was arbitrary. Hence two Carnot-Carathéodory metrics corresponding to two different choices of a supplementary subspace m1 with the same induced norm on n/[n, n] are asymptotically equivalent (i.e. their ratio tends to 1), and in fact isometric (see Remark 2.1).
Conversely, if two Carnot-Carathéodory metrics are associated to the same supplementary subspace m1 and are asymptotically equivalent, they must be equal. This shows that the set of all possible norms on the quotient vector space n/[n, n] is in bijection with the set of all classes of asymptotic equivalence of Carnot-Carathéodory metrics on N∞.
(3) As another consequence we see that if a locally bounded, proper and asymptotically geodesic left-invariant pseudodistance on N is also homogeneous with respect to the 1-parameter group (δt)t (i.e. ρ(e, δt x) = tρ(e, x)), then it has to be of the form ρ(x, y) = d∞(e, x^{−1}y), where d∞ is a Carnot-Carathéodory metric on N∞.

6.1. Volume asymptotics. Theorem 6.2 also yields a formula for the asymptotic volume of ρ-balls of large radius. Let us fix a Haar measure on N (for example Lebesgue measure on n gives rise to a Haar measure on N under exp). Since d∞ is homogeneous, it is straightforward to compute the volume of a d∞-ball:

$\mathrm{vol}(\{x \in N,\ d_\infty(e,x) \le t\}) = t^{d(N)}\,\mathrm{vol}(\{x \in N,\ d_\infty(e,x) \le 1\})$

where $d(N) = \sum_{i\ge 1} \dim(C^i(\mathfrak{n}))$ is the homogeneous dimension of N. For a pseudodistance ρ as in the statement of Theorem 6.2, we can define the asymptotic volume of ρ to be the volume of the unit ball for the associated Carnot-Carathéodory metric d∞:

$\mathrm{AsVol}(\rho) = \mathrm{vol}(\{x \in N,\ d_\infty(e,x) \le 1\})$

Then we obtain as an immediate corollary of Theorem 6.2:

Corollary 6.3. Let ρ be a periodic pseudodistance on N. Then

$\lim_{t\to+\infty} \frac{1}{t^{d(N)}}\,\mathrm{vol}(\{x \in N,\ \rho(e,x) \le t\}) = \mathrm{AsVol}(\rho) > 0$

Finally, if Γ is an arbitrary finitely generated nilpotent group, we need to take care of the torsion elements. They form a normal finite subgroup T, and applying Theorem 6.2 to Γ/T, we obtain:

Corollary 6.4.
Let S be a finite symmetric generating set of Γ and S^n the ball of radius n in the word metric ρS associated to S. Then

$\lim_{n\to+\infty} \frac{1}{n^{d(N)}}\,\# S^n = \# T \cdot \frac{\mathrm{AsVol}(\rho_{\bar S})}{\mathrm{vol}(N/\bar\Gamma)}$

where N is the Malcev closure of $\bar\Gamma = \Gamma/T$, the torsion free quotient of Γ, and $\rho_{\bar S}$ is the word pseudodistance associated to $\bar S$, the projection of S in $\bar\Gamma$.

Moreover, it is possible to be a bit more precise about $\mathrm{AsVol}(\rho_{\bar S})$. In fact, the norm ‖·‖0 on m1 used to define the limit Carnot-Carathéodory distance d∞ associated to $\rho_{\bar S}$ is a simple polyhedral norm defined by

$\{\|x\|_0 \le 1\} = \mathrm{CvxHull}\,(\pi_1(s),\ s \in \bar S)$

More generally the following holds. Let H be any closed, co-compact subgroup of N. Choose a Haar measure on H so that volN(N/H) = 1. Theorem 6.2 yields:

Corollary 6.5. Let Ω be a compact symmetric (i.e. Ω = Ω^{−1}) neighborhood of the identity, which generates H. Let ‖·‖0 be the norm on m1 whose unit ball is CvxHull{π1(Ω)} and let d∞ be the corresponding Carnot-Carathéodory metric on N∞. Then we have the following limit in the Hausdorff topology

$\lim_{n\to+\infty} \delta_{1/n}(\Omega^n) = \{g \in N,\ d_\infty(e,g) \le 1\}$

and

$\lim_{n\to+\infty} \frac{\mathrm{vol}_H(\Omega^n)}{n^{d(N)}} = \mathrm{vol}_N(\{g \in N,\ d_\infty(e,g) \le 1\})$

6.2. Outline of the proof. We first devise some standard lemmas about piecewise approximations of horizontal paths (Lemmas 6.6, 6.7, 6.10). Then it is shown (Lemma 6.11) that the original product on N and the product in the associated graded Lie group are asymptotic to each other, namely, if (δt)t is a 1-parameter group of dilations of N, then after renormalization by δ_{1/t}, the product of O(t) elements lying in some bounded subset of N is very close to the renormalized product of the same elements in the graded Lie group N∞. This is why all complications due to the fact that N may not be a priori graded and the δt's may not be automorphisms disappear when looking at the large scale geometry of the group.
Finally, we observe (Lemma 6.13), as follows from the very definition of the unit ball E for the limit norm ‖·‖0, that any vector in the boundary of E can be approximated, after renormalizing by δ_{1/s}, by some element x ∈ N lying in a fixed annulus s(1 − ε) ≤ ρ(e, x) ≤ s(1 + ε). This enables us to assert that any ρ-quasi geodesic gives rise, after renormalization, to a d∞-geodesic (this gives the lower bound in Theorem 6.2). And vice-versa, that any d∞-geodesic can be approximated uniformly by some renormalized ρ-quasi geodesic (this gives the upper bound in Theorem 6.2).

6.3. Preliminary lemmas.

Lemma 6.6. Let G be a Lie group and let ‖·‖e be a Euclidean norm on the Lie algebra of G and de(·, ·) the associated left invariant Riemannian metric on G. Let K be a compact subset of G. Then there is a constant C0 = C0(de, K) > 0 such that whenever de(e, u) ≤ 1 and x, y ∈ K

|de(xu, yu) − de(x, y)| ≤ C0 de(x, y) de(e, u)

Proof. The proof reduces to the case when u and x^{−1}y are in a small neighborhood of e. Then the inequality boils down to the following: ‖[X, Y]‖e ≤ c ‖X‖e ‖Y‖e for some c > 0 and every X, Y in Lie(G). □

Lemma 6.7. Let G be a Lie group, let ‖·‖ be some norm on the Lie algebra of G and let de(·, ·) be a left invariant Riemannian metric on G. Then for every L > 0 there is a constant C = C(de, ‖·‖, L) > 0 with the following property. Assume ξ1, ξ2 : [0, 1] → G are two piecewise smooth paths in the Lie group G with ξ1(0) = ξ2(0) = e. Let ξ′i ∈ Lie(G) be the tangent vector pulled back at the identity by a left translation of G. Assume that sup_{t∈[0,1]} ‖ξ′1(t)‖ ≤ L, and that

$\int_0^1 \|\xi_1'(t) - \xi_2'(t)\|\,dt \le \varepsilon$.

Then de(ξ1(1), ξ2(1)) ≤ Cε.

Proof. The function f(t) = de(ξ1(t), ξ2(t)) is piecewise smooth. For small dt we may write, using Lemma 6.6,

f(t + dt) − f(t) ≤ de(ξ1(t)ξ′1(t)dt, ξ1(t)ξ′2(t)dt) + de(ξ1(t)ξ′2(t)dt, ξ2(t)ξ′2(t)dt) − f(t) + o(dt)
≤ ‖ξ′1(t) − ξ′2(t)‖e dt + C0 f(t) ‖ξ′2(t)dt‖e + o(dt)
≤ ε(t)dt + C0 L f(t)dt + o(dt)

where ε(t) = ‖ξ′1(t) − ξ′2(t)‖e.
In other words, f′(t) ≤ ε(t) + C0 L f(t). Since f(0) = 0, Gronwall's lemma implies that

$f(1) \le e^{C_0 L} \int_0^1 \varepsilon(s)\,e^{-C_0 L s}\,ds \le C\varepsilon$. □

From now on, we will take G to be the stratified nilpotent Lie group N∞, and de(·, ·) will denote a left invariant Riemannian metric on N∞, while d∞(·, ·) is a left invariant Carnot-Carathéodory Finsler metric on N∞ associated to some norm ‖·‖ on m1.

Remark 6.8. There is c0 > 0 such that $c_0^{-1}\,d_e(e,x) \le d_\infty(e,x) \le c_0\,d_e(e,x)^{1/r}$ in a neighborhood of e. Hence in the situation of the lemma we get $d_\infty(\xi_1(1), \xi_2(1)) \le C_1\,\varepsilon^{1/r}$ for some other constant C1 = C1(L, d∞, de).

Lemma 6.9. Let N ∈ N and dN(x, y) be the function in N∞ defined in the following way:

$d_N(x,y) = \inf\Big\{ \int_0^1 \|\xi'(u)\|\,du,\ \xi \in HPL(N),\ \xi(0) = x,\ \xi(1) = y \Big\}$

where HPL(N) is the set of horizontal paths ξ which are piecewise linear with at most N possible values for ξ′. Then we have dN → d∞ uniformly on compact subsets of N∞.

Proof. Note that it follows from Chow's theorem (e.g. see [25] or [19]) that there exists K0 ∈ N such that $A := \sup_{d_\infty(e,x)=1} d_{K_0}(e,x) < \infty$. Moreover, since piecewise linear paths are dense in L^1, it follows for example from Lemma 6.7 that for each fixed x, dn(e, x) → d∞(e, x). We need to show that dN(e, x) → d∞(e, x) uniformly in x satisfying d∞(e, x) = 1. By contradiction, suppose there is a sequence (xn)n such that d∞(e, xn) = 1 and dn(e, xn) ≥ 1 + ε0 for some ε0 > 0. We may assume that (xn)n converges to, say, x. Let yn = x^{−1} ∗ xn and tn = d∞(e, yn). Then $d_{K_0}(e, y_n) = t_n\,d_{K_0}(e, \delta_{1/t_n}(y_n)) \le A\,t_n$. Thus dn(e, xn) ≤ dn(e, x) + dn(e, yn) ≤ dn(e, x) + A tn as soon as n ≥ K0. As n tends to ∞, we get a contradiction. □

This lemma prompts the following notation. For ε > 0, we let Nε ∈ N be the first integer such that 1 ≤ dNε(e, x) ≤ 1 + ε for all x with d∞(e, x) = 1. Then we have:

Lemma 6.10. For every x ∈ N∞ with d∞(e, x) = 1, and all ε > 0 there exists a path ξ : [0, 1] → N∞ in HPL(Nε) with unit speed (i.e.
‖ξ′‖ = 1) such that ξ(0) = e and d∞(x, ξ(1)) ≤ C2 ε, and ξ′ has at most one discontinuity on any subinterval of [0, 1] of length ε^r/Nε.

Proof. We know that there is a path in HPL(Nε) connecting e to x with length ℓ ≤ 1 + ε. Reparametrizing the path so that it has unit speed, we get a path ξ0 : [0, ℓ] → N∞ in HPL(Nε) with d∞(x, ξ0(1)) = d∞(ξ0(ℓ), ξ0(1)) ≤ ε. The derivative ξ′0 is constant on at most Nε different intervals, say [ui, ui+1). Let us remove all such intervals of length ≤ ε^r/Nε by merging them to an adjacent interval, and let us change the value of ξ′0 on these intervals to the value on the adjacent interval (it doesn't matter if we choose the interval on the left or on the right). We obtain a new path ξ : [0, 1] → N∞ in HPL(Nε) with unit speed and such that ξ′ has at most one discontinuity on any subinterval of [0, 1] of length ε^r/Nε. Moreover $\int_0^1 \|\xi'(t) - \xi_0'(t)\|\,dt \le \varepsilon^r$. By Lemma 6.7 and Remark 6.8, we have d∞(ξ(1), ξ0(1)) ≤ C1 ε, hence

d∞(ξ(1), x) ≤ d∞(x, ξ0(1)) + d∞(ξ0(1), ξ(1)) ≤ (C1 + 1)ε □

Lemma 6.11 (Piecewise horizontal approximation of paths). Let x ∗ y denote the product inside the stratified Lie group N∞ and x · y the ordinary product in N. Let n ∈ N and t ≥ n. Then for any compact subset K of N, and any elements x1, ..., xn of K, we have

$d_e\big(\delta_{1/t}(x_1 \cdot \ldots \cdot x_n),\ \delta_{1/t}(x_1 * \ldots * x_n)\big) \le \frac{c_1}{t}$

$d_e\big(\delta_{1/t}(x_1 * \ldots * x_n),\ \delta_{1/t}(\pi_1(x_1) * \ldots * \pi_1(x_n))\big) \le \frac{c_2}{t}$

where c1, c2 depend on K and de only.

Proof. Let ‖·‖ be a norm on the Lie algebra of N. For k = 1, ..., n let zk = x1 · ... · xk−1 and yk = xk+1 ∗ ... ∗ xn. Since all xi's belong to K, it follows from (24) that as soon as t ≥ n, all δ_{1/t}(zk) and δ_{1/t}(yk) for k = 1, ..., n remain in a bounded set depending only on K.
Comparing (12) and (11), we see that whenever y = O(1) and δ_{1/t}(x) = O(1), we have

(35) $\big\|\delta_{1/t}(xy) - \delta_{1/t}(x * y)\big\| = O\big(\tfrac{1}{t^2}\big)$

On the other hand, from (12) it is easy to verify that right ∗-multiplication by a bounded element is Lipschitz for ‖·‖ and the Lipschitz constant is locally bounded. It follows that there is a constant C1 > 0 (depending only on K and ‖·‖) such that for all k ≤ n

$\big\|\delta_{1/t}((z_k \cdot x_k) * y_k) - \delta_{1/t}(z_k * x_k * y_k)\big\| \le C_1\,\big\|\delta_{1/t}(z_k \cdot x_k) - \delta_{1/t}(z_k * x_k)\big\|$

Applying n times the relation (35) with x = x1 · ... · xk−1 and y = xk, we finally obtain

$\big\|\delta_{1/t}(x_1 \cdot \ldots \cdot x_n) - \delta_{1/t}(x_1 * \ldots * x_n)\big\| = O\big(\tfrac{n}{t^2}\big) = O\big(\tfrac{1}{t}\big)$

where O() depends only on K. On the other hand, using (11), it is another simple verification to check that if x, y lie in a bounded set, then

$\frac{1}{c_2}\,d_e(x,y) \le \|x - y\| \le c_2\,d_e(x,y)$

for some constant c2 > 0. The first inequality follows.

For the second inequality, we apply Lemma 6.7 to the paths ξ1 and ξ2 starting at e and with derivative equal on $[\frac{k}{n}, \frac{k+1}{n})$ to $n\,\delta_{1/t}(x_k)$ for ξ1 and to $\frac{n}{t}\,\pi_1(x_k)$ for ξ2. We get

$d_e\big(\delta_{1/t}(x_1 * \ldots * x_n),\ \delta_{1/t}(\pi_1(x_1) * \ldots * \pi_1(x_n))\big) = O\big(\tfrac{1}{t}\big)$ □

Remark 6.12. From Remark 6.8 we see that if we replace de by d∞ in the above lemma, we get the same result with $\frac{1}{t}$ replaced by $t^{-1/r}$.

Lemma 6.13 (Approximation in the abelianized group). Recall that ‖·‖0 is the norm on m1 defined in (33). For any ε > 0, there exists s0 > 0 such that for every s > s0 and every v ∈ m1 such that ‖v‖0 = 1, there exists h ∈ H such that

(1 − ε)s ≤ ρ(e, h) ≤ (1 + ε)s

and

$\Big\| v - \frac{\pi_1(h)}{\rho(e,h)} \Big\|_0 \le \varepsilon$

Proof. Let ε > 0 be fixed. Considering a finite ε-net in E, we see that there exists a finite symmetric subset {g1, ..., gp} of H\{e} such that, if we consider the closed convex hull of F = {fi = π1(gi)/ρ(e, gi) | i = 1, ..., p} and ‖·‖ε the associated norm on m1, then ‖·‖0 ≤ ‖·‖ε ≤ (1 + 2ε) ‖·‖0. Up to shrinking F if necessary, we may assume that ‖fi‖ε = 1 for all i. We may also assume that the fi's generate m1 as a vector space.
The sphere {x, ‖x‖ε = 1} is a symmetric polyhedron in m1 and to each of its facets correspond d = dim(m1) vertices lying in F and forming a vector basis of m1. Let f1, ..., fd, say, be such vertices for a given facet. If x ∈ m1 is of the form $x = \sum_{i=1}^{d} \lambda_i f_i$ with λi ≥ 0 for 1 ≤ i ≤ d, then we see that $\|x\|_\varepsilon = \sum_{i=1}^{d} \lambda_i$, because the convex hull of f1, ..., fd is precisely that facet, hence lies on the sphere {x, ‖x‖ε = 1}.

Now let v ∈ m1, ‖v‖0 = 1, and let s > 0. The half line tv, t > 0, hits the sphere {x, ‖x‖ε = 1} in one point. This point belongs to some facet and there are d linearly independent elements of F, say f1, ..., fd, the vertices of that facet, such that this point belongs to the convex hull of f1, ..., fd. The point sv then lies in the convex cone generated by π1(g1), ..., π1(gd). Moreover, there is a constant Cε > 0 (Cε ≤ max_{1≤i≤p} ρ(e, gi)) such that

$\Big\| sv - \sum_{i=1}^{d} n_i\,\pi_1(g_i) \Big\|_\varepsilon \le C_\varepsilon$

for some non-negative integers n1, ..., nd depending on s > 0. Hence

$\frac{1}{s}\sum_{i=1}^{d} n_i\,\rho(e,g_i) = \frac{1}{s}\Big\| \sum_{i=1}^{d} n_i\,\pi_1(g_i) \Big\|_\varepsilon \le \frac{1}{s}\big(\|sv\|_\varepsilon + C_\varepsilon\big) \le 1 + 2\varepsilon + \frac{C_\varepsilon}{s} \le 1 + 3\varepsilon$

where the last inequality holds as soon as s > Cε/ε. Now let $h = g_1^{n_1} \cdot \ldots \cdot g_d^{n_d} \in H$. We have $\pi_1(h) = \sum_{i=1}^{d} n_i\,\pi_1(g_i)$ and

ρ(e, h) ≥ ‖π1(h)‖0 ≥ s − Cε ≥ s(1 − ε)

Moreover

$\rho(e,h) \le \sum_{i=1}^{d} n_i\,\rho(e,g_i) \le s(1 + 3\varepsilon)$

Replacing ε by a suitable constant multiple of itself, and for ε small enough, we get the desired result with s0(ε) a suitable multiple of max_{1≤i≤p} ρ(e, gi). □

6.4. Proof of Theorem 6.2. We need to show that as x → ∞ in N

$1 \le \liminf \frac{\rho(e,x)}{d_\infty(e,x)} \le \limsup \frac{\rho(e,x)}{d_\infty(e,x)} \le 1$

First note that it is enough to prove the bounds for x ∈ H. This follows from 4.2 (1).

Let us begin with the lower bound. We fix ε > 0 and s = s(ε) as in the definition of an asymptotically geodesic metric (see (21)). We know by 4.2 (3) and (4) that as soon as ρ(e, x) ≥ s we may find x1, ..., xn in H with s ≤ ρ(e, xi) ≤ 2s such that $x = \prod_i x_i$ and $\sum_i \rho(e,x_i) \le (1+\varepsilon)\,\rho(e,x)$. Let t = d∞(e, x). Then $n \le \frac{1+\varepsilon}{s}\,\rho(e,x)$, hence $n \le \frac{C}{s}\,t$, where C is a constant depending only on ρ (see (23)).
We may 38 EMMANUEL BREUILLARD then apply Lemma 6.11 (and the remark following it) to get, as t ≥ n as soon as s(ε) ≥ C, d∞(δ 1 (x), δ 1 (π1(x1) ∗ ... ∗ π1(xn))) ≤ c But for each i we have ‖π1(xi)‖0 ≤ ρ(e, xi) by definition of the norm, hence t = d∞(e, x) ≤ ‖π1(xi)‖0+ d∞(x, π1(x1) ∗ ... ∗π1(xn)) ≤ (1+ ε)ρ(e, x)+ c Since ε was arbitrary, letting t→ ∞ we obtain ρ(e, x) d∞(e, x) We now turn to the upper bound. Let t = d∞(e, x) and ε > 0. According to Lemma 6.10, there is a horizontal piecewise linear path {ξ(u)}u∈[0,1] with unit speed such that d∞(δ 1 (x), ξ(1)) ≤ C2ε and no interval of length ≥ contains more than one change of direction. Let s0(ε) be given by Lemma 6.13 and assume t > s0(ε r)Nε/ε r. We split [0, 1] into n subintervals of length u1, ..., un such that ξ′ is constant equal to yi on the i-th subinterval and s0(ε r) ≤ tui ≤ 2s0(ε r). We have ξ(1) = u1y1 ∗ ... ∗ unyn. Lemma 6.13 yields points xi ∈ H such that ∥∥∥∥yi − π1(xi) ∥∥∥∥ ≤ ε and ρ(e, xi) ∈ [(1 − ε r)tui, (1 + ε r)tui] (note that tui > s0(ε r)). Let ξ be the piecewise linear path [0, 1] → N∞ with the same discontinuities as ξ and where the value yi is replaced by π1(xi) . Then according to Lemma 6.7, d∞(ξ(1), ξ(1)) ≤ Cε. Since ρ(e, xi) ≤ 4s0(ε r) for each i, we may apply Lemma 6.11 (and the remark following it) and see that if y = x1 · ... · xn, d∞(ξ(1), δ 1 (y)) ≤ c′1(ε)t Hence d∞(δ 1 (x), δ 1 (y)) ≤ (C2+C)ε+c 1(ε)t r and ρ(e, y) ≤ ρ(e, xi) ≤ (1+ε while ρ(x, y) ≤ C ′td∞(e, δ 1 (x−1y)) + C ′ ≤ t(Cd∞(δ 1 (x), δ 1 (y)) + oε(1)). Hence ρ(e, x) ≤ t+ oε(t) Remark 6.14. In the last argument we used the fact that ∥∥∥δ 1 (xu)− δ 1 (x ∗ u) ∥∥∥ = ) if δ 1 (x) and δ 1 (u) are bounded, in order to get for y = xu, d∞(e, δ 1 (u)) ≤ d∞(δ 1 (x), δ 1 (xu)) + d∞(δ 1 (xu), δ 1 (x ∗ u)) ≤ d∞(δ 1 (x), δ 1 (y)) + o(1). ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 39 7. 
Locally compact G and proofs of the main results
In this section, we prove Theorem 1.2 and complete the proof of Theorem 1.4 and its corollaries. We begin with the latter.
Proof of Theorem 1.4. The theorem is the combination of Proposition 5.1, which reduces the problem to nilpotent Lie groups, and Theorem 6.2, which treats the nilpotent case. It only remains to justify the last assertion, that d∞ is invariant under T(H). Since K = T(H) stabilizes m1 (see Lemma 3.12 for the definition of m1) and acts by automorphisms of the nilpotent (nilshadow) structure (Lemma 3.5), given any k ∈ K, the metric d∞(k(x), k(y)) is nothing else but the left invariant subFinsler metric on the nilshadow associated to the norm ‖k(v)‖ for v ∈ m1 (if ‖·‖ denotes the norm associated to d∞). However, d∞ is asymptotically invariant under K, because of Proposition 5.1: d∞(e, k(x))/d∞(e, x) tends to 1 as x tends to infinity. Moreover d∞(e, v) = ‖v‖ and d∞(e, k(v)) = ‖k(v)‖ for all v ∈ m1. Two asymptotic norms on a vector space are always equal, since by homogeneity their ratio is constant along each ray. It follows that the norms ‖·‖ and ‖k(·)‖ on m1 coincide. Hence d∞(e, k(x)) = d∞(e, x) for all x ∈ S, as claimed. □
Proof of Corollary 1.8. We begin with a preliminary remark (see also Remark 2.1). If d is a left-invariant subFinsler metric on a simply connected nilpotent Lie group N induced by a norm ‖·‖ on a supplementary subspace m1 of the commutator subalgebra, then it follows from the very definition of subFinsler metrics (see Paragraph 2.1) that π1 is 1-Lipschitz from the Lie group to its abelianization endowed with the norm ‖·‖, namely ‖π1(x)‖ ≤ d(e, x), with equality if x ∈ m1. From this and the definition of the limit norm in (33), we conclude that ‖·‖ coincides with the limit norm of d. In particular Theorem 6.2 implies that d is asymptotic to the ∗-left invariant subFinsler metric d∞ induced by the same norm ‖·‖ on the graded Lie group (N∞, ∗).
We can now prove Corollary 1.8.
By the above remark, the limit metric d∞ on the graded nilshadow of S is asymptotic to the subFinsler metric d induced by the same norm ‖·‖ on the same (K-invariant) supplementary subspace m1 of the commutator subalgebra of the nilshadow, and d is left invariant for the nilshadow structure on S. However, it follows from Theorem 1.4 that d∞ and the norm ‖·‖ are K-invariant. This implies that d is also left-invariant with respect to the original Lie group structure of S. Indeed, by (1), we can write d(gx, gy) = d(g ∗ (T(g)x), g ∗ (T(g)y)) = d(T(g)x, T(g)y) = d(x, y), where ∗ denotes this time the nilshadow product structure; the second equality is left invariance for ∗ and the third is K-invariance. We are done. □
Proof of Corollary 1.7. This follows immediately from Theorem 1.4, when ∗ denotes the graded nilshadow product. If ∗ denotes the nilshadow group structure, then it follows from Theorem 6.2 and the remark we just made in the proof of Corollary 1.8 (see also Remark 2.1). □
7.1. Proof of Theorem 1.2. Let G be a locally compact group of polynomial growth. We will show that G has a compact normal subgroup K such that G/K contains a closed co-compact subgroup, which can be realized as a closed co-compact subgroup of a connected and simply connected solvable Lie group of type (R) (i.e. of polynomial growth). The proof proceeds in several steps.
(a) First we show that, up to modding out by a normal compact subgroup, we may assume that G is a Lie group whose connected component of the identity has no compact normal subgroup. Indeed, it follows from Losert’s refinement of Gromov’s theorem ([24] Theorem 2) that there exists a normal compact subgroup K of G such that G/K is a Lie group. So we may now assume that G is a Lie group (not necessarily connected) of polynomial growth. The connected component G0 of G is a connected Lie group of polynomial growth. Recall the following classical fact:
Lemma 7.1. Every connected Lie group has a unique maximal compact normal subgroup.
By uniqueness it must be a characteristic Lie subgroup.
Proof. Clearly if K1 and K2 are compact normal subgroups, then K1K2 is again a compact normal subgroup. Considering G/K, where K is a compact normal subgroup of maximal dimension, we may assume that G has no compact normal subgroup of positive dimension. But every finite normal subgroup of a connected group is central. Hence the closed group generated by all finite normal subgroups is contained in the center of G. The center is an abelian Lie subgroup, i.e. isomorphic to a product of a vector space R^n, a torus R^m/Z^m, a free abelian group Z^k and a finite abelian group. In such a group, there clearly is a unique maximal compact subgroup (namely the product of the finite group and the torus). It is also normal, and maximal in G. □
The maximal compact normal subgroup of G0 is a characteristic Lie subgroup of G0. It is therefore normal in G and we may mod out by it. We have therefore shown that every locally compact (compactly generated) group with polynomial growth admits a quotient by a compact normal subgroup, which is a Lie group G whose connected component of the identity G0 has polynomial growth and contains no compact normal subgroup. We will now show that a certain co-compact subgroup of G has the embedding property of Theorem 1.2.
(b) Second we show that, up to passing to a co-compact subgroup, we may assume that the connected component G0 is solvable. For this purpose, let Q be the solvable radical of G0, namely the maximal connected solvable normal Lie subgroup of G0. Note that it is a characteristic subgroup of G0 and therefore normal in G. Moreover G0/Q is a semisimple Lie group. Since G0 has polynomial growth, it follows that G0/Q must be compact. Consider the action of G by conjugation on G0/Q, namely the map φ : G → Aut(G0/Q). Since G0/Q is compact semisimple, its group of automorphisms is also a compact Lie group. In particular, the kernel ker φ is a co-compact subgroup of G.
The connected component of the identity of Aut(G0/Q) is itself semisimple and hence has finite center. However the image modulo Q of the connected component (ker φ)0 of ker φ is central in G0/Q. Being connected, it must therefore be trivial. We have shown that (ker φ)0 is contained in Q and hence is solvable. Moreover (ker φ)0 has no compact normal subgroup, because otherwise its maximal normal compact subgroup, being characteristic in (ker φ)0, would be normal in G (note that (ker φ)0 is normal in G).
Changing G into the co-compact subgroup ker φ, we can therefore assume that G0 is solvable, of polynomial growth, and has no nontrivial compact normal subgroup. The group G/G0 is discrete, finitely generated, and has polynomial growth. By Gromov’s theorem, it must be virtually nilpotent, in particular virtually polycyclic.
(c) We finally prove the following proposition.
Proposition 7.2. Let G be a Lie group such that its connected component of the identity G0 is solvable and admits no compact normal subgroup, and such that G/G0 is virtually polycyclic. Then G has a closed co-compact subgroup, which can be embedded as a closed co-compact subgroup of a connected and simply connected solvable Lie group.
The proof of this proposition is mainly an application of a theorem of H.C. Wang, which is a vast generalization of Malcev’s embedding theorem for torsion-free finitely generated nilpotent groups. Wang’s theorem [36] states that any S-group can be embedded as a closed co-compact subgroup of a simply connected real linear solvable Lie group with only finitely many connected components. Wang defines an S-group to be any real Lie group G which admits a normal subgroup A such that G/A is finitely generated abelian and A is a torsion-free nilpotent Lie group whose group of connected components is finitely generated.
In particular any S-group has a finite index (hence co-compact) subgroup which embeds as a co-compact subgroup in a connected and simply connected solvable Lie group. In order to prove Proposition 7.2, it therefore suffices to establish that G has a co-compact S-group. We first recall the following simple fact:
Lemma 7.3. Every closed subgroup F of a connected solvable Lie group S is topologically finitely generated.
Proof. We argue by induction on the dimension of S. Clearly there is an epimorphism π : S → R. By induction hypothesis F ∩ ker π is topologically finitely generated. The image of F is a subgroup of R. However every subgroup of R contains one or two elements generating a subgroup with the same closure as the original subgroup. We are done. □
Next we show the existence of a nilradical.
Lemma 7.4. Let G be as in Proposition 7.2. Then G has a unique maximal normal nilpotent subgroup GN.
Proof. The subgroup generated by any two normal nilpotent subgroups of any given group is itself nilpotent (Fitting’s lemma, see e.g. [30][5.2.8]). Let GN be the closure of the subgroup generated by all normal nilpotent subgroups of G. We need to show that GN is nilpotent. For this it is clearly enough to prove that it is topologically finitely generated (because any finitely generated subgroup of GN is nilpotent by the remark we just made). Since G/G0 is virtually polycyclic, every subgroup of it is finitely generated ([29][4.2]). Hence it is enough to prove that GN ∩ G0 is topologically finitely generated. This follows from Lemma 7.3. □
Incidentally, we observe that the connected component of the identity (GN)0 coincides with the nilradical N of G0 (it is the maximal normal nilpotent connected subgroup of G0). We now claim the following:
Lemma 7.5. The quotient group G/GN is virtually abelian.
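As an illustration of Lemma 7.5 (a sketch only, not part of the original argument), one can take the group Gα = Z ⋉ R^2 of Paragraph 8.1 and compute GN by hand. The explicit group law written below and the assumption that α is irrational are conventions adopted here for the example.

```latex
% Assumptions (ours, for illustration): the group law on G_\alpha = \mathbb{Z}\ltimes\mathbb{R}^2 is
%   (k,x)(l,y) = (k+l,\; x + R_\alpha^{k} y),
% where R_\alpha is the rotation of angle \pi\alpha and \alpha is irrational.
\[
(0,y)\,(k,x_0)\,(0,y)^{-1} = \bigl(k,\; x_0 + (1 - R_\alpha^{k})\,y\bigr),
\qquad
(k,x_0)\,(0,w)\,(k,x_0)^{-1}\,(0,w)^{-1} = \bigl(0,\;(R_\alpha^{k}-1)\,w\bigr).
\]
% For k \neq 0 the linear map R_\alpha^{k} - 1 is invertible on \mathbb{R}^2.
% Hence a normal subgroup M containing some (k,x_0) with k \neq 0 contains
% \{0\}\times\mathbb{R}^2 (first identity), and iterated commutators with (k,x_0)
% never vanish on it (second identity), so M is not nilpotent. Therefore
\[
G_N = \{0\}\times\mathbb{R}^2, \qquad G/G_N \cong \mathbb{Z},
\]
% which is abelian, in accordance with Lemma 7.5.
```

In this example G0 = {0} × R^2 is solvable with no compact normal subgroup and G/G0 ≅ Z is polycyclic, so the hypotheses of Proposition 7.2 are satisfied.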
The proof of this lemma is inspired by the proof of the fact, due to Malcev, that polycyclic groups have a finite index subgroup with nilpotent commutator subgroup (e.g. see [30][15.1.6]).
Proof. We will show that G has a finite index normal subgroup whose commutator subgroup is nilpotent. This clearly implies the lemma, for this nilpotent subgroup will be normal, hence contained in GN.
First we observe that the group G admits a finite normal series Gm ≤ Gm−1 ≤ ... ≤ G1 = G, where each Gi is a closed normal subgroup of G such that Gi/Gi+1 is either finite, or isomorphic to either Z^n, R^n or R^n/Z^n. To see this, pick one of the Gi’s to be the connected component G0 and then treat G/G0 and G0 separately. For G/G0, this follows from the definition of a polycyclic group (G/G0 has a normal polycyclic subgroup of finite index). As for G0, observe that its nilradical N is a connected and simply connected nilpotent Lie group and admits such a series of characteristic subgroups (pick the descending central series), and G0/N is an abelian connected Lie group, hence isomorphic to the direct product of a torus R^n/Z^n and a vector group R^m. The torus part is characteristic in G0/N, hence its preimage in G0 is normal in G.
The group G acts by conjugation on each partial quotient Qi := Gi/Gi+1. This yields a map G → Aut(Qi). Now note that in order to prove our lemma, it is enough to show that for each i, there is a finite index subgroup of G whose commutator subgroup maps to a nilpotent subgroup of Aut(Qi). Indeed, taking the intersection of those finite index subgroups, we get a finite index normal subgroup whose commutator subgroup acts nilpotently on each Qi, hence is itself nilpotent (high enough commutators will all vanish). Now Aut(Qi) is either finite (if Qi is finite), or isomorphic to GLn(Z) (in case Qi is either Z^n or R^n/Z^n) or to GLn(R) (when Qi ≃ R^n). The image of G in Aut(Qi) is a solvable subgroup.
However, every solvable subgroup of GLn(R) contains a finite index subgroup whose commutator subgroup is unipotent (hence nilpotent). This follows, for example, from Kolchin’s theorem that a connected solvable algebraic subgroup of GLn(C) is triangularizable. We are done. □
In the sequel we assume that G/G0 is torsion-free polycyclic. It is legitimate to do so in the proof of Proposition 7.2, because every virtually polycyclic group has a torsion-free polycyclic subgroup of finite index (see e.g. [29][Lemma 4.6]). We now claim the following:
Lemma 7.6. GN is torsion-free.
Proof. Since G/G0 is torsion-free, it is enough to prove that GN ∩ G0 is torsion-free. However the set of torsion elements in GN forms a subgroup of GN (if x, y are torsion, then xy is too because ⟨x, y⟩ is nilpotent). Clearly it is a characteristic subgroup of GN. Hence its intersection with G0 is normal in G0. Taking the closure, we obtain a nilpotent closed normal subgroup T of G0 which contains a dense set of torsion elements. Recall that G0 has no normal compact subgroup. From this it quickly follows that T is trivial: first it must be discrete (the connected component T0 is compact and normal in G0), hence finitely generated (by Lemma 7.3), hence made of torsion elements. But a finitely generated torsion nilpotent group is finite. Again since G0 has no compact normal subgroup, T must be trivial, and GN is torsion-free. □
Now observe that the group of connected components of GN, namely GN/(GN)0, is finitely generated. Indeed, since G/G0 is finitely generated (as is any polycyclic group), it is enough to prove that (G0 ∩ GN)/(GN)0 is finitely generated, but this follows from the fact that G0 ∩ GN is topologically finitely generated (Lemma 7.3).
Now we are almost done. Note that G is topologically finitely generated (Lemma 7.3), therefore so is G/GN.
By Lemma 7.5 G/GN is virtually abelian, hence has a finite index normal subgroup isomorphic to Z^n × R^m. It follows that G/GN has a co-compact subgroup isomorphic to a free abelian group Z^{n+m}. Hence after changing G by a co-compact subgroup, we get that G is an extension of GN (a torsion-free nilpotent Lie group with finitely generated group of connected components) by a finitely generated free abelian group. Hence it is an S-group in the terminology of Wang [36]. We apply Wang’s theorem and this ends the proof of Proposition 7.2.
(d) We can now conclude the proof of Theorem 1.2. By (a) and (b) G has a quotient by a compact group which admits a co-compact subgroup satisfying the assumptions of Proposition 7.2. Hence to conclude the proof it only remains to verify that the simply connected solvable Lie group in which a co-compact subgroup of G/K embeds has polynomial growth (i.e. is of type (R)). But this follows from the following lemma (see [21][Thm. I.2]).
Lemma 7.7. Let G be a locally compact group. Then G has polynomial growth if and only if some (resp. any) co-compact subgroup of it has polynomial growth.
Proof. First one checks that G is compactly generated if and only if some (resp. any) co-compact subgroup is. This is by the same argument which shows that finite index subgroups of a finitely generated group are finitely generated. In particular, if Ω is a compact symmetric generating set of G and H is a co-compact subgroup, then there is n0 ∈ N such that Ω^{n0} H = G. Then H ∩ Ω^{3n0} generates H.
If G has polynomial growth and H is any compactly generated closed subgroup, then H has polynomial growth. Indeed (see [21][Thm I.2]), if ΩH denotes a compact generating set for H, and K a compact neighborhood of the identity in G, then
volG(K) · volH((ΩH)^n) ≤ volH(KK^{−1} ∩ H) · volG((ΩH)^n K).
This inequality follows by integrating over a left Haar measure of G the function φ(x) := ∫_{(ΩH)^n} 1_K(h^{−1}x) dh, where dh is a left Haar measure on H.
This integral equals the left hand side of the above displayed inequality, while φ is pointwise bounded by volH(xK^{−1} ∩ H) inside (ΩH)^n K and by zero outside (ΩH)^n K.
In the other direction, if H has polynomial growth, then G also has, because one can write Ω^n ⊂ (ΩH)^n K for some compact generating set ΩH of H and some compact neighborhood K of the identity in G (see Proposition 4.4). Then the result follows from the inequality
volH(ΩH) · volG((ΩH)^n K) ≤ volH((ΩH)^{n+1}) · volG(ΩH K),
which itself is a direct consequence of the fact that the function ψ(x) := ∫_{(ΩH)^{n+1}} 1_{ΩH K}(h^{−1}x) dh, where dh is a left Haar measure on H, satisfies ∫_G ψ(x) dx = volH((ΩH)^{n+1}) · volG(ΩH K) on the one hand and is bounded below by volH(ΩH) for every x ∈ (ΩH)^n K on the other hand. □
Note that the above proof would be slightly easier if we already knew that both G and H were unimodular, in which case G/H has an invariant measure. But we know this only a posteriori, because the polynomial growth condition implies unimodularity ([21]). Similar considerations show that G has polynomial growth if and only if G/K has polynomial growth, given any normal compact subgroup K (e.g. see [21]).
We end this paragraph with a remark and an example, which we mentioned in the Introduction.
Remark 7.8 (Discrete subgroups are virtually nilpotent). Suppose Γ is a discrete subgroup of a connected solvable Lie group of type (R) (i.e. of polynomial growth). Then Γ is virtually nilpotent. Indeed, an argument similar to that of Lemma 7.3 shows that every subgroup of Γ is finitely generated. It follows that Γ is polycyclic. However Wolf [37] proved that polycyclic groups with polynomial growth are virtually nilpotent.
Example 7.9 (A group with no nilpotent co-compact subgroup). Let G be the connected solvable Lie group G = R ⋉ (R^2 × R^2), where R acts as a dense one-parameter subgroup of SO(2,R) × SO(2,R). Then G is of type (R). It has no compact subgroup. And it has no nilpotent co-compact subgroup. Indeed suppose H is a closed co-compact nilpotent subgroup.
Then it has a non-trivial center. Hence there is a non-identity element whose centralizer is co-compact in G. However a simple examination of the possible centralizers of elements of G shows that none of them is co-compact.
7.2. Proof of Corollary 1.6 and Theorem 1.1. Let G be an arbitrary locally compact group of polynomial growth and ρ a periodic pseudodistance on G.
Claim 1: Corollary 1.6 holds for a co-compact subgroup H of G if and only if it holds for G.
By Lemma 7.7, the groups G and H are unimodular, and hence G/H bears a G-invariant Radon measure volG/H, which is finite since H is co-compact. Now let F be a bounded Borel fundamental domain for H inside G, and let ρ̄ be the periodic pseudodistance on G induced by the restriction of ρ to H, that is ρ̄(x, y) := ρ(hx, hy), where hx is the unique element of H such that x ∈ hxF. By 4.2 (1) and (4), ρ and ρ̄ are at a bounded distance from each other. In particular, Bρ̄(r−C) ⊂ Bρ(r) ⊂ Bρ̄(r+C). Hence the limit (3) holds for ρ if and only if it holds for ρ̄, with the same limit. However, Bρ̄(r) = {x ∈ G, ρ(e, hx) ≤ r} = BρH(r)F where ρH is the restriction of ρ to H. Hence volG(Bρ̄(r)) = volH(BρH(r)) · volG/H(F). By 4.2 (4), ρH is a periodic pseudodistance on H. So the result holds for (H, ρH) if and only if it holds for (G, ρ). Conversely, if ρ0 is a periodic pseudodistance on H, then ρ̄0(x, y) := ρ0(hx, hy) is a periodic pseudodistance on G, hence again volG(Bρ̄0(r)) = volH(Bρ0(r)) · volG(F) and the result will hold for (H, ρ0) if and only if it holds for (G, ρ̄0).
Claim 2: If Corollary 1.6 holds for G/K, where K is some compact normal subgroup, then it holds for G as well.
Indeed, if ρ is a periodic pseudodistance on G, then the K-average ρK, as defined in (26), is at a bounded distance from ρ according to Lemma 4.7. Now ρK induces a periodic pseudodistance ρ̄K on G/K and BρK(r) = Bρ̄K(r)K. Hence, volG(BρK(r)) = volG/K(Bρ̄K(r)) · volK(K).
And if the limit (3) holds for the pseudodistance induced by ρK on G/K, it also holds for ρK, hence for ρ too.
Thus the discussion above combined with Theorem 1.2 reduces Corollary 1.6 to the case when G is simply connected and solvable, which was treated in Sections 5 and 6. □
7.3. Proof of Proposition 1.3 and Corollary 1.9.
Proof of Proposition 1.3. We say that two metric spaces (X, dX) and (Y, dY) are at a bounded distance if they are (1, C)-quasi-isometric for some finite C. This is an equivalence relation. Now if ρ is H-periodic with H co-compact, then (G, ρ) is at a bounded distance from (H, ρ|H). Hence we may assume that H = G, i.e. that ρ is left invariant on G.
Now Theorem 1.2 gives the existence of a normal compact subgroup K, a co-compact subgroup H containing K and a simply connected solvable Lie group S such that H/K is isomorphic to a co-compact subgroup of S. Lemma 4.7 shows that (G, ρ) is at a bounded distance from (G, ρK), where ρK is defined as in (26). Now ρK induces a left invariant periodic metric on G/K, and (G/K, ρK) is clearly at a bounded distance from (G, ρK). By 4.2, its restriction to H/K is at a bounded distance and is left invariant. We then set ρS(s1, s2) = ρK(h1, h2), where (given a bounded fundamental domain F for the left action of H/K on S) hi is the unique element of H/K such that si ∈ hiF. Clearly then (S, ρS) is at a bounded distance from (H/K, ρK). We are done. □
We note that our construction of S here depends on the stabilizer of ρ in G. Certainly not every choice of Lie shadow can be used for all periodic metrics (consider that R^3 is a Lie shadow of the universal cover of the group of motions of the plane). Perhaps a single one can be chosen for all, but we have not checked that.
Proof of Corollary 1.9. Proposition 1.3 reduces the proof to a periodic metric ρ on a simply connected solvable Lie group S. Let d∞ be the subFinsler metric on S (left invariant for the graded nilshadow group structure SN) as given by Theorem 1.4.
Let {δt}t be the group of dilations in the graded nilshadow SN of S as defined in Section 3. By definition of the pointed Gromov-Hausdorff topology (see [18]), it is enough to prove the following.
Claim. The quantity
|(1/n) ρ(s1, s2) − d∞(δ_{1/n}(s1), δ_{1/n}(s2))|
converges to zero as n tends to +∞, uniformly for all s1, s2 in a ball of radius O(n) for the metric ρ.
This follows in three steps. First ρ is at a bounded distance from its restriction to the (co-compact) stabilizer H of ρ (cf. 4.2 (1), 4.2 (4)). Then for h1, h2 ∈ H, we can write ρ(h1, h2) = ρ(e, h1^{−1} h2). However Proposition 5.1 implies the existence of another periodic distance ρK on S, which is invariant under left translations by elements of H for both the original Lie structure and the nilshadow Lie structure on S, such that ρ(e, x)/ρK(e, x) tends to 1 as x tends to ∞. Hence ρK(e, h1^{−1} h2) = ρK(h1, h2) = ρK(e, h1^{∗−1} ∗ h2), where ∗ is the nilshadow product on S. Hence (1/n) |ρ(h1, h2) − ρK(e, h1^{∗−1} ∗ h2)| tends to zero uniformly as h1 and h2 vary in a ball of radius O(n) for ρ. Finally Theorem 6.2 implies that (1/n) |ρK(e, h1^{∗−1} ∗ h2) − d∞(e, h1^{∗−1} ∗ h2)| tends to zero, and the claim follows, as one verifies from the Campbell-Hausdorff formula, by comparing (11) and (12) as we did in (35), that |d∞(δ_{1/n}(h1), δ_{1/n}(h2)) − d∞(e, δ_{1/n}(h1^{∗−1} ∗ h2))| converges to zero.
The fact that the graded nilpotent Lie group does not depend (up to isomorphism) on the periodic metric ρ but only on the locally compact group G follows from Pansu’s theorem [28] that if two Carnot groups (i.e. graded simply connected nilpotent Lie groups endowed with a left-invariant subRiemannian metric induced by a norm on a supplementary subspace to the commutator subalgebra) are bi-Lipschitz, the underlying Lie groups must be isomorphic. This deep fact relies on Pansu’s generalized Rademacher theorem, see [28].
Indeed, two different periodic metrics ρ1 and ρ2 on G are quasi-isometric (see Proposition 4.4), and hence their asymptotic cones are bi-Lipschitz (and bi-Lipschitz to any Carnot group metric on the same graded group, by (13)). □
8. Coarsely geodesic distances and speed of convergence
Under no further assumption on the periodic pseudodistance ρ, the speed of convergence in the volume asymptotics can be made arbitrarily small. This is easily seen if we consider examples of the following type: define ρ(x, y) = |x − y| + |x − y|^α on R where α ∈ (0, 1). It is periodic and vol(Bρ(t)) = t − t^α + o(t^α).
However, many natural examples of periodic metrics, such as word metrics or Riemannian metrics, are in fact coarsely geodesic. A pseudodistance on G is said to be coarsely geodesic if there is a constant C > 0 such that any two points can be connected by a C-coarse geodesic, that is, for any x, y ∈ G there is a map g : [0, t] → G with t = ρ(x, y), g(0) = x and g(t) = y, such that |ρ(g(u), g(v)) − |u − v|| ≤ C for all u, v ∈ [0, t]. This is a stronger requirement than to say that ρ is asymptotically geodesic (see (21)). This notion is invariant under coarse isometry.
In the case when G is abelian, D. Burago [6] proved the beautiful fact that any coarsely geodesic periodic metric on G is at a bounded distance from its asymptotic norm. In particular volG(Bρ(t)) = c · t^d + O(t^{d−1}) in this case. In the remarkable paper [32], M. Stoll proved that such an error term in O(t^{d−1}) holds for any finitely generated 2-step nilpotent group. Whether O(t^{d−1}) is the right error term for any finitely generated nilpotent group remains an open question. The example below shows on the contrary that in an arbitrary Lie group of polynomial growth no universal error term can be expected.
Theorem 8.1. Let εn > 0 be an arbitrary sequence of positive numbers tending to 0.
Then there exists a group G of polynomial growth of degree 3 and a compact generating set Ω in G and c > 0 such that
(36) volG(Ω^n)/(c · n^3) ≤ 1 − εn
holds for infinitely many n, although volG(Ω^n)/(c · n^3) → 1 as n → +∞.
The example we give below is a semi-direct product of Z by R^2 and the metric is a word metric. However, many similar examples can be constructed as soon as the map T : G → K defined in Paragraph 5.1 is not onto. For example, one can consider left invariant Riemannian metrics on G = R ⋉ (R^2 × R^2) where R acts via a dense one-parameter subgroup of the 2-torus S^1 × S^1. Incidentally, this group G is known as the Mautner group and is an example of a wild group in representation theory.
8.1. An example with arbitrarily small speed. In this paragraph we describe the example of Theorem 8.1. Let Gα = Z ⋉ R^2 where the action of Z is given by the rotation Rα of angle πα, α ∈ [0, 1). The group Gα is quasi-isometric to R^3 and hence of polynomial growth of order 3, and it is co-compact in the analogously defined Lie group G̃α = R ⋉ R^2. Its nilshadow is isomorphic to R^3. The point is that if α is a suitably chosen Liouville number, then the balls in Gα will not be well approximated by the limit norm balls.
Figure 2. The union of the two cones, with basis the disc of radius 2, represents the limit shape of the balls Ω^n in the group Z ⋉ R^2, where Z acts by an irrational rotation, with generating set Ω = {(±1, 0, 0)} ∪ {(0, x1, x2), x1^2/4 + x2^2 ≤ 1}.
Elements of Gα are written (k, x) where k ∈ Z and x ∈ R^2. Let ‖x‖^2 = x1^2/4 + x2^2 be a Euclidean norm on R^2, and let Ω be the symmetric compact generating set given by {(±1, 0)} ∪ {(0, x), ‖x‖ ≤ 1}. It induces a word metric ρΩ on G. It follows from Theorem 1.4 and the definition of the asymptotic norm that ρΩ(e, (k, x)) is asymptotic to the norm on R^3 given by ρ0(e, (k, x)) := |k| + ‖x‖0 where ‖x‖0 is the rotation invariant norm on R^2 defined by ‖x‖0^2 = (x1^2 + x2^2)/4.
The unit ball of ‖·‖0 is the convex hull of the union of all images of the unit ball of ‖·‖ under all rotations Rkα, k ∈ Z. We are going to choose α as a suitable Liouville number so that (36) holds. Let δn = (4εn) 1/3 and choose α so that the following holds for infinitely many n’s: (37) d(kα,Z + ) ≥ 2δn for all k ∈ Z, |k| ≤ n. This is easily seen to be possible if we choose α of the form∑ 1/3ni for some suitable lacunary increasing sequence of (ni)i. Note that, since ‖x‖0 ≥ ‖x‖ , we have ρΩ ≥ ρ0. Let Sn be the piece of R 2 defined by Sn = {|θ| ≤ δn} where θ is the angle between the point x and the vertical axis ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 49 Re2. We claim that if x ∈ Sn, ρ0(e, (k, x)) ≤ n and n satisfies (37), then ρΩ(e, (k, x)) ≥ |k|+ (1 + ) ‖x‖0 It follows easily from the claim that volG(Ω n) ≤ (1− εn) · volG(Bρ0(n)). Moreover volG(Bρ0(n)) = c · n 3 + O(n2), where c = 4π if volG is given by the Lebesgue measure. Proof of claim. Here is the idea to prove the claim. To find a short path between the identity and a point on the vertical axis, we have to rotate by a Rkα such that kα is close to 1 , hence go up from (0, 0) to (k, 0) first, thus making the vertical direction shorter. However if (37) holds, the vertical direction cannot be made as short as it could after rotation by any of the Rkα with |k| ≤ n. Note that if ρ0(e, (k, x)) ≤ n then |k| ≤ n and ρΩ(e, (k, x)) ≥ |k|+inf ‖Rkiαxi‖ where the infimum is taken over all paths x1, ..., xN such that x = xi and all rotations Rkiα with |ki| ≤ n. Note that if δn is small enough and (37) holds then for every x ∈ Sn we have ‖Rkαx‖ ≥ (1 + δ n) ‖x‖0 . On the other hand ‖x‖0 = ‖xi‖0 cos(θi) where θi is the angle between xi and the x. Hence ‖Rkiαxi‖ ≥ |θi|≤δn ‖Rkiαxi‖+ |θi|>δn ‖Rkiαxi‖ ≥ (1 + δ2n) |θi|≤δn ‖xi‖0 cos(θi) + cos(δn) |θi|>δn ‖xi‖0 cos(θi) ≥ (1 + ) · ‖x‖0 8.2. Limit shape for more general word metrics on solvable Lie groups of polynomial growth. 
The determination of the limit shape of the word metric in Paragraph 8.1 was possible due to the rather simple nature of the generating set. In general, using the identity (see (1))
(38) ω1 · ... · ωm = ω1 ∗ (T(ω1)ω2) ∗ ... ∗ (T(ωm−1 · ... · ω1)ωm)
it is easy to check that the unit ball of the limit norm ‖·‖∞ inducing the limit subFinsler metric d∞ on the nilshadow associated to a given word metric with generating set Ω is contained in the convex hull of the K-orbit of the projection of Ω to the abelianized nilshadow, namely the convex hull of K · π1(Ω).
In the example of Paragraph 8.1, we even had equality between the two. However this is not the case in general. For example, the limit shape is always K-invariant, but clearly the limit shape associated to a generating set Ω coincides with the one associated with a conjugate gΩg^{−1} of it, while the convex hulls of the respective K-orbits may not be the same.
Of course if the generating set Ω is K-invariant to begin with, then Ω^n = Ω^{∗n} and we are back in the nilpotent case, where we know that the unit ball of the limit norm is just the convex hull of the projection of the generating set to the abelianization. In general however it is a challenging problem to determine the precise asymptotic shape of a word metric on a general solvable Lie group with polynomial growth, and there seems to be no simple description analogous to what we have in the nilpotent case. Even in the above example Gα = Z ⋉α R^2, or in the universal cover of the group of motions of the plane (in which Gα embeds co-compactly), it is not that simple. In general the shape is determined by solving an optimization problem in which one has to find the path which maximizes the coordinates of the endpoint.
In order to illustrate this, we treat without proof the following simple example. Suppose Ω is a symmetric compact neighborhood of the identity in Gα = Z ⋉α R^2 of the form Ω = (0, Ω0) ∪ (1, Ω1) ∪ (1, Ω1)^{−1}, where Ω0, Ω1 ⊂ R^2.
Then the limit shape of the word metric ρΩ associated to Ω is the solid body (rotationally symmetric around the vertical axis as in Figure 2) made of two copies (upper and lower) of a truncated cone with base a disc on (0, R^2) of radius max{r0, r1} and top (resp. bottom) a disc on the plane (1, R^2) (resp. (−1, R^2)) of radius r2, where the radii are given by r0 = max{‖x‖, x ∈ Ω0}, r1 = diam(Ω1), the diameter of Ω1, and r2 is given by the integral

(39) $r_2 = \frac{1}{2\pi}\int_0^{2\pi} \max\{\pi_\theta(\Omega_1)\}\, d\theta$

where πθ(Ω1) is the orthogonal projection on the x-axis of the image of Ω1 ⊂ R^2 by a rotation of angle θ around the origin. It is indeed convex (note that r2 ≤ r1). For example, if Ω1 consists of a single point, then the limit shape is the same as in the previous paragraph and as in Figure 2, namely two copies of a cone. However, if Ω1 consists of two points {a, b}, then the upper part of the limit shape is a truncated cone with an upper disc of radius r2 = ‖a − b‖/π (which is the result of the computation of the above integral).

Let us briefly explain formula (39). A path of length n reaching the highest z-coordinate in Gα is a word of the form (1, ω1) · . . . · (1, ωn), with ωi ∈ Ω1. By (38) this word equals $(n, \sum_{i=1}^{n} R_\alpha^{i-1}\omega_i)$. Here ωi can take any value in Ω1. In order to maximize the norm of the second coordinate, or equivalently (by rotation invariance) its x-coordinate, one has to choose ωi ∈ Ω1 at each stage in such a way that the x-coordinate of $R_\alpha^{i-1}\omega_i$ is maximized. Formula (39) now follows from the fact that $\{R_\alpha^{i-1}\}_{1\le i\le n}$ becomes equidistributed in SO(2, R) as n tends to infinity. In order to show that max{r0, r1} is the radius of the base disc, and more generally that the limit shape is no bigger than this double truncated cone, one needs to argue further by considering all possible paths of the form (ε1, ω1) · . . . · (εn, ωn) where εi ∈ {0, ±1} and $\sum_i \varepsilon_i$ is prescribed.

8.3. Bounded distance versus asymptotic metrics.
In this paragraph we an- swer a question of D. Burago and G. Margulis (see [7]). Based on the abelian case and the reductive case (Abels-Margulis [1]), Burago and Margulis had conjectured that every two asymptotic word metrics should be at a bounded distance. We give below a counterexample to this. We first give an example (A) of a nilpotent Lie group endowed with two left invariant subFinsler metrics d∞ and d ∞ that are asymptotic to each other, i.e. d∞(e, x)/d ∞(e, x) → 1 as x → ∞ but such that |d∞(e, x)− d ∞(e, x)| is not uniformly bounded. Then we exhibit (B) a word met- ric that is not at a bounded distance from any homogeneous quasi-norm. Finally these examples also yield (C) two word metrics ρ1 and ρ2 on the same finitely generated nilpotent group which are asymptotic but not at a bounded distance. Note that the group Gα with ρ0 and ρΩ from the last paragraph also provides an example of asymptotic metrics which are not at a bounded distance (but this group was not discrete). (A) Let N = R × H3(R) where H3 is classical Heisenberg group and Γ = Z ×H3(Z) a lattice in N . In the Lie algebra n = RV ⊕ h3 we pick two different supplementary subspaces of [n, n] = RZ, i.e. m1 = span{V,X, Y } and m span{V + Z,X, Y }, where h3 is the Lie algebra of H3(R) spanned by X,Y and Z = [X,Y ].We consider the L1-norm on m1 (resp. m 1) corresponding to the basis (V,X, Y ) (resp. (V + Z,X, Y )). Both norms induce the same norm on n/[n, n]. They give rise to left invariant Carnot-Caratheodory Finsler metrics on N , say d∞ (resp. d ∞). We use the coordinates (v, x, y, z) = exp(vV + xX + yY + zZ). According to Remark (2) after Theorem 6.2, d∞ and d ∞ are asymptotic. Let us show that they are not at a bounded distance. First observe that, since V is central, d∞(e, (v; (x, y, z))) = |v| + dH3(e, (x, y, z)) where dH3 is the Carnot- Caratheodory Finsler metric on H3(R) defined by the standard L 1-norm on the span{X,Y }. Similarly d′∞(e, (v; (x, y, z))) = |v| + dH3(e, (x, y, z − v))). 
If d∞ and d′∞ were at a bounded distance, we would have a C > 0 such that for all t > 0 |d∞(e, (t; (0, 0, t))) − t| ≤ C Hence |dH3(e, (0, 0, t))| ≤ C, which is a contradiction. (B) Now let Ω = {(1; (0, 0, 1))±1 , (1; (0, 0,−1))±1 , (0; (1, 0, 0))±1 , (0; (0, 1, 0))±1} be a generating set for Γ and ρΩ the word metric associated to it. Let | · | be a homogeneous quasi-norm on N which is at a bounded distance from ρΩ, i.e. |ρΩ(e, g) − |g|| is bounded. Then | · | is asymptotic to ρΩ, hence is equal to the Carnot-Caratheodory Finsler metric d asymptotic to ρΩ and homogeneous with respect to the same one parameter group of dilations {δt}t>0. Let m1 = {v ∈ n, δt(v) = tv}. Then d is induced by some norm ‖·‖0 on m1, whose unit ball is given, according to Theorem 1.4 by the convex hull of the projections to m1 of the generators in Ω. There is a unique vector in m1 of the form V +z0Z. Its ‖·‖0-norm is 1 and d(e, (1; (0, 0, z0))) = 1. However d(e, (v; (x, y, z))) = |v|+ dH3(e, (x, y, z − vz0)). Since ρΩ(e, (n; (0, 0, n))) = n, we get d(e, (n; (0, 0, n))) − ρΩ(e, (n; (0, 0, n))) = dH3(e, (0, 0, n(1 − z0))) 52 EMMANUEL BREUILLARD If this is bounded, this forces z0 = 1. But we can repeat the same argument with (n; (0, 0,−n)) which would force z0 = −1. A contradiction. (C) Let now Ω2 := {(1; (0, 0, 0)) ±1 , (0; (1, 0, 0))±1 , (0; (0, 1, 0))±1} and ρΩ2 the associated word metric on Γ. Then again ρΩ and ρΩ2 are asymptotic by Theorem 6.2 because the convex hull of their projection modulo the z-coordinate coincide. However ρΩ2 is a product metric, namely we have ρΩ2(e, (v; (x, y, z))) = |v| + ρ(e, (x, y, z)), where ρ is the word metric on the discrete Heisenberg group H3(Z) with standard generators {(1, 0, 0)±1, (0, 1, 0)±1}. In particular ρΩ(e, (n; (0, 0, n))) − ρΩ2(e, (n; (0, 0, n))) = ρ(e, (0, 0, n)) which is unbounded. Remark 8.2 (An abnormal geodesic). We refer the reader to [9] for more on these examples. 
In particular we show there that ρ1 and ρ2 above are not (1, C)- quasi-isometric for any C > 0. The key phenomenon behind this example is the presence of an abnormal geodesic (see [25]), namely the one-parameter group {(t; (0, 0, 0))}t . Remark 8.3 (Speed of convergence in the nilpotent case). The slow speed phe- nomenon in Theorem 8.1 relied crucially on the presence of a non-trivial semisim- ple part in Gα ; this doesn’t occur in nilpotent groups. In [9], we show that for word metrics on finitely generated nilpotent groups, the convergence in Theorem 6.2 has a polynomial speed with an error term at least as good as O(d∞(e, x) 3r ), where r is the nilpotency class. We conjecture there that the optimal exponent is 1 This involves refining quantitatively the estimates of the above proof of Theorem 9. Appendix: the Heisenberg groups Here we show how to compute the asymptotic shape of balls in the Heisenberg groups H3(Z) and H5(Z) and their volume, thus giving another approach to the main result of Stoll [33]. The leading term for the growth of H3(Z) is rational for all generating sets (Prop. 9.1 below), whereas in H5(Z) with its standard generating set, it is transcendental. This explains how our Figure 1 was made (compare with the odd [22] Fig. 1). 9.1. 3-dim Heisenberg group. Let us first consider the Heisenberg group H3(Z) = 〈a, b|[a, [a, b]] = [b, [a, b]] = 1〉 . We see it as the lattice generated by a = exp(X) and b = exp(Y ) in the real Heisenberg group H3(R) with Lie algebra h3 generated by X,Y and spanned by X,Y,Z = [X,Y ]. Let ρΩ be the standard word metric on H3(Z) associated to the generating set Ω = {a±1, b±1}. According to Theorem 1.4, the limit shape of the n-ball Ωn in H3(Z) coincides with the unit ball C3 = {g ∈ H3(R), d∞(e, g) ≤ 1} for the Carnot-Caratheodory metric d∞ induced on H3(R) by the ℓ 1-norm ‖xX + yY ‖0 = |x|+ |y| on m1 = span{X,Y } ⊂ h3. 
Computing this unit ball is a rather simple task. Exchanging the roles of X and Y, we see that C3 is invariant under the reflection z ↦ −z. Then clearly C3 is of the form {xX + yY + zZ, with |x| + |y| ≤ 1 and |z| ≤ z(x, y)}. Changing X to −X and Y to −Y, we get the symmetries z(x, y) = z(−x, y) = z(x, −y) = z(y, x). Hence when determining z(x, y), we may assume 0 ≤ y ≤ x ≤ 1, x + y ≤ 1.

The following well-known observation is crucial for computing z(x, y). If ξ(t) is a horizontal path in H3(R) starting from id, then ξ(t) = exp(x(t)X + y(t)Y + z(t)Z), where ξ′(t) = x′(t)X + y′(t)Y and z(t) is the “balayage” area between the path {x(s)X + y(s)Y}_{0≤s≤t} and the chord joining 0 to x(t)X + y(t)Y. Therefore, z(x, y) is given by the solution to the “Dido isoperimetric problem” (see [25]): find a path in the X,Y-plane between 0 and xX + yY of ‖·‖0-length 1 that maximizes the “balayage area”. Since ‖·‖0 is the ℓ1-norm in the X,Y-plane, as is well known (see [8]), such extremal curves are given by arcs of squares with sides parallel to the X,Y-axes. There is therefore a dichotomy: the arc of square has either 3 or 4 sides (it may have 1 or 2 sides, but these are included as limiting cases of the previous ones).

If there are 3 sides, they have length ℓ, x and y + ℓ with y + ℓ ≤ x. Hence 1 = ℓ + x + y + ℓ and z(x, y) = ℓx + xy/2. Therefore this occurs when y ≤ 3x − 1 and we then have z(x, y) = x(1 − x)/2.

If there are 4 sides, they have length ℓ, x + u, y + ℓ and u, with ℓ + y = x + u. Hence 1 = 2ℓ + 2u + x + y and z(x, y) = (ℓ + y)(x + u) − xy/2. This occurs when y ≥ 3x − 1 and we then have z(x, y) = (1 + x + y)^2/16 − xy/2.

Hence if 0 ≤ y ≤ x ≤ 1 and x + y ≤ 1,

(40) $z(x,y) = \mathbf{1}_{y\le 3x-1}\,\frac{x(1-x)}{2} + \mathbf{1}_{y>3x-1}\left(\frac{(1+x+y)^2}{16} - \frac{xy}{2}\right)$

The unit ball C3 drawn in Figure 1 is the solid body C3 = {xX + yY + zZ, with |x| + |y| ≤ 1 and |z| ≤ z(x, y)}. A simple calculation shows that vol(C3) = 31/72 in the Lebesgue measure dx dy dz.
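The volume computation can be sanity-checked numerically. The sketch below is not part of the original argument: the function names are hypothetical, the two closed forms for z(x, y) are the ones obtained by solving the side-length equations of the 3-sided and 4-sided cases above, and a plain midpoint rule stands in for the exact integration. The target value 31/72 is the classical leading coefficient for H3(Z) with its standard generators.

```python
def z(x, y):
    """Maximal balayage area z(x, y) on the l1 unit ball (3-sided / 4-sided cases)."""
    # Reduce to 0 <= y <= x using the symmetries z(x,y) = z(-x,y) = z(x,-y) = z(y,x).
    x, y = abs(x), abs(y)
    if y > x:
        x, y = y, x
    if y <= 3 * x - 1:
        # 3-sided arc of square
        return x * (1 - x) / 2
    # 4-sided arc of square
    return (1 + x + y) ** 2 / 16 - x * y / 2


def volume_c3(n=400):
    """Midpoint-rule estimate of vol(C3) = 2 * integral of z over |x| + |y| <= 1."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * h
        for j in range(n):
            y = -1 + (j + 0.5) * h
            if abs(x) + abs(y) <= 1:
                total += z(x, y)
    return 2 * total * h * h


estimate = volume_c3()
print(estimate, 31 / 72)  # the two numbers should agree to about two decimals
```

The grid size n = 400 is an arbitrary choice; refining it should bring the estimate closer to 31/72 ≈ 0.4306.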
Since H3(Z) is easily seen to have co-volume 1 for this Haar measure on H3(R) (actually {xX + yY + zZ, x ∈ [0, 1), y ∈ [0, 1), z ∈ [0, 1)} is a fundamental domain), it follows that

$\lim_{n\to\infty} \#(\Omega^n)/n^4 = \mathrm{vol}(C_3) = 31/72.$

We thus recover a well-known result (see [4], [31] where even the full growth series is computed and shown to be rational).

One can also determine exactly which points of the sphere ∂C3 are joined to id by a unique geodesic horizontal path. The reader will easily check that uniqueness fails exactly at the points (x, y, ±z(x, y)) with |x| < 1 and y = 0, or |y| < 1 and x = 0, or else at the points (x, y, z) with |x| + |y| = 1 and |z| < z(x, y).

The above method also yields the following result.

Proposition 9.1. Let Ω be any symmetric generating set for H3(Z). Then the leading coefficient in #(Ω^n) is rational, i.e. $\lim_{n\to\infty} \#(\Omega^n)/n^4 = r$ is a rational number.

Proof. We only sketch the proof here. We can apply the method above and compute r as the volume of the unit CC-ball C(Ω) of the limit CC-metric d∞ defined in Theorem 1.4. Since we know the norm ‖·‖ in the (x, y)-plane m1 = span⟨X, Y⟩ that generates d∞ (it is the polygonal norm given by the convex hull of the points of Ω), we can compute C(Ω) explicitly. We need to know the solution to Dido’s isoperimetric problem for ‖·‖ in m1, and as is well known (see [8]) it is given by polygonal lines from the dual polygon rotated by 90°. Since the polygon defining ‖·‖ is made of rational lines (points in Ω have integer coordinates), any vector with rational coordinates has rational ‖·‖-length, and the dual polygon is also rational. The equations defining z(x, y) will therefore have only rational coefficients, and z(x, y) will be given piecewise by a rational quadratic form in x and y, where the pieces are rational triangles in the (x, y)-plane. The total volume of C(Ω) will therefore be rational. □

9.2. 5-dim Heisenberg group.
The Heisenberg group H5(Z) is the group generated by a1, b1, a2, b2, c with relations c = [a1, b1] = [a2, b2], a1 and b1 commute with a2 and b2, and c is central. Let $\Omega = \{a_i^{\pm 1}, b_i^{\pm 1},\ i = 1, 2\}$. Let us describe the limit shape of Ω^n. Again, we see H5(Z) as a lattice of co-volume 1 in the group H5(R) with Lie algebra h5 spanned by X1, Y1, X2, Y2 and Z = [Xi, Yi]. By Theorem 1.4, the limit shape is the unit ball C5 for the Carnot-Caratheodory metric on H5(R) induced by the ℓ1-norm ‖x1X1 + y1Y1 + x2X2 + y2Y2‖0 = |x1| + |y1| + |x2| + |y2|.

Since X1, Y1 commute with X2, Y2, in any piecewise linear horizontal path in H5(R), we can swap the pieces tangent to X1 or Y1 with those tangent to X2 or Y2 without changing the end point of the path. Therefore if ξ(t) = exp(x1(t)X1 + y1(t)Y1 + x2(t)X2 + y2(t)Y2 + z(t)Z) is a horizontal path, then z(t) = z1(t) + z2(t), where zi(t), i = 1, 2, is the “balayage area” of the plane curve {xi(s)Xi + yi(s)Yi}_{0≤s≤t}. Since, just like for H3(Z), we know the curve maximizing this area, we can compute the unit ball C5 explicitly. In exponential coordinates it will take the form

C5 = {exp(x1X1 + y1Y1 + x2X2 + y2Y2 + zZ), |x1| + |y1| + |x2| + |y2| ≤ 1 and |z| ≤ z(x1, y1, x2, y2)}.

Then $z(x_1, y_1, x_2, y_2) = \sup_{0\le t\le 1}\{z_t(x_1, y_1) + z_{1-t}(x_2, y_2)\}$, where $z_t(x, y)$ is the maximum “balayage area” of a path of length t between 0 and xX + yY. It is easy to see that $z_t(x, y) = t^2 z(x/t, y/t)$ where z is given by (40). Hence $z_t$ is a piecewise quadratic function of t. Again z(x1, y1, x2, y2) is invariant under changing the signs of the xi, yi’s, and swapping x and y, or else swapping 1 and 2.
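The supremum over t can be explored numerically before any closed form is derived. The following is only an illustrative sketch under stated assumptions: the helper names are hypothetical, z(x, y) uses the closed forms x(1 − x)/2 and (1 + x + y)^2/16 − xy/2 that solve the side-length equations of the 3-dimensional case, and a grid search over t replaces the exact maximization carried out in the text.

```python
def z(x, y):
    """Maximal balayage area for a unit-length l1 path from 0 to (x, y)."""
    x, y = abs(x), abs(y)
    if y > x:
        x, y = y, x
    if y <= 3 * x - 1:
        return x * (1 - x) / 2
    return (1 + x + y) ** 2 / 16 - x * y / 2


def z_t(t, x, y):
    """z_t(x, y) = t^2 z(x/t, y/t); None when (x, y) is out of reach in length t."""
    if abs(x) + abs(y) > t + 1e-12:
        return None
    if t == 0:
        return 0.0
    return t * t * z(x / t, y / t)


def z5(x1, y1, x2, y2, steps=2000):
    """Grid-search approximation of sup over t in [0, 1] of z_t(x1,y1) + z_{1-t}(x2,y2)."""
    best = None
    for k in range(steps + 1):
        t = k / steps
        first = z_t(t, x1, y1)
        second = z_t(1 - t, x2, y2)
        if first is None or second is None:
            continue  # this split of the total length cannot reach both endpoints
        if best is None or first + second > best:
            best = first + second
    return best


print(z5(0, 0, 0, 0))      # all the length spent in one plane: area of a square of side 1/4
print(z5(0.5, 0, 0, 0))
print(z5(0.6, 0, 0.6, 0))  # None: the point lies outside the l1 unit ball
```

A grid search only gives a lower bound on the supremum at the chosen resolution; it is meant for checking a candidate closed form at sample points, not as a replacement for it.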
We may thus assume that the xi,yi’s lie in D = {0 ≤ yi ≤ xi ≤ 1 and x1+y1+x2+y2 ≤ 1, and x2−y2 ≥ x1−y1}.We may therefore determine explicitly the supremum z(x1, y1, x2, y2), which after some straightforward calculations takes ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 55 on D the following form: z(x1, y1, x2, y2) = 1Amax{d1, d2}+ 1B max{d1, c1}+ 1C max{c1, c2} where d1 = (1−x1− y1−x2), c1 = (1+x1+ y1−x2− y2) x2y2−x1y1 and d2 and c2 are obtained from d1 and c1 by swapping the indices 1 and 2. The sets A,B and C form the following partition of D : A = D ∩ {m ≤ x1 − y1}, B = D ∩ {x1 − y1 < m < x2 − y2} and C = D ∩ {x2 − y2 ≤ m}, where m = (1 − x1 − x2 − y1 − y2)/2. Since C5 has such an explicit form, it is possible to compute its volume. The fact that z(x1, y1, x2, y2) is piecewisely given by the maximum of two quadratic forms makes the computation of the integral somewhat cumbersome but tractable. Our equations coincide (fortunately!) with those of Stoll (appendix of [33]), where he computed the main term of the asymptotics of #(Ωn) by a different method. Stoll did calculate that integral and obtained #(Ωn) = vol(C5) = 21870 log(2) 32805 which is transcendental. It is also easy to see by this method that if we change the generating set to Ω0 = {a 2 }, then we get a rational volume. Hence the rationality of the growth series of H5(Z) depends on the choice of generating set, which is Stoll’s theorem. One advantage of our method is that it can also apply to fancier generating sets. The case of Heisenberg groups of higher dimension with the standard gen- erating set is analogous: the function z({xi}, {yi}) is again piecewisely defined as the maximum of finitely many explicit quadratic forms on a linear partition of the ℓ1-unit ball |xi|+ |yi| ≤ 1. Acknowledgments. 
I would like to thank Amos Nevo for his hospitality at the Technion of Haifa in December 2005, where part of this work was conducted, and for triggering my interest in this problem by showing me the possible implications of Theorem 1.1 to Ergodic Theory. My thanks are also due to V. Losert for pointing out an inaccuracy in my first proof of Theorem 1.2 and for his other remarks on the manuscript. Finally I thank Y. de Cornulier, M. Duchin, E. Le Donne, Y. Guivarc’h, A. Mohammadi, P. Pansu and R. Tessera for several useful conversations. References [1] H. Abels and G. Margulis. Coarsely geodesic metrics on reductive groups. In Modern dy- namical systems and applications, pages 163–183. Cambridge Univ. Press, Cambridge, 2004. [2] L. Auslander and L. W. Green, G-induced flows, Amer. J. Math. 88 (1966), 43–60. [3] H. Bass, The degree of polynomial growth of finitely generated nilpotent groups, Proc. London Math. Soc. (3) 25 (1972), 603–614. [4] M. Benson, On the rational growth of virtually nilpotent groups, In: S.M. Gersten, Stallings (eds), Combinatorial Group Theory and Topology, Ann. Math. Studies, vol 111, PUP (1987). [5] V. N. Berestovskĭı. Homogeneous manifolds with an intrinsic metric I, Sibirsk. Mat. Zh., 29(6):17–29, 1988. 56 EMMANUEL BREUILLARD [6] D. Yu. Burago, Periodic metrics, in Representation Theory and Dynamical Systems, 205– 210, Adv. Soviet Math. 9 Amer. Math. Soc. (1992). [7] D. Yu. Burago, G.A. Margulis, Problem Session, in Oberwolfach Report, Geometric Group Theory, Hyperbolic Dynamics and Symplectic Geometry, 2006. [8] H. Busemann, The isoperimetric problem in the Minkowski plane, AJM 69 (1947), 863–871. [9] E. Breuillard and E. Le Donne, On the rate of convergence to the asymptotic cone for nilpotent groups and subFinsler geometry, preprint 2012. [10] A. Calderon, A general ergodic theorem, Annals of Math. 57 (1953), pp. 182-191. [11] T. H. Colding and W. P. Minicozzi, II. Liouville theorems for harmonic sections and appli- cations. Comm. 
Pure Appl. Math., 51(2):113–138, 1998. [12] L. Corwin and F. P. Greenleaf, Representations of nilpotent Lie groups and their applications, Part I, Basic theory and examples, Cambridge Univ. Press, (1990) 269pp. [13] N. Dungey, A. F. M ter Elst, and D. W. Robinson, Analysis on Lie groups with polynomial growth, Progress in Math. 214, Birkhauser, (2003) 312pp. [14] W. R. Emerson, The pointwise ergodic theorem for amenable groups, Amer. J. Math 96 (1974), 472–487. [15] J. W. Jenkins, A characterization of growth in locally compact groups, Bull. Amer. Math. Soc. 79 (1973), 103–106. [16] F. P. Greenleaf, Invariant means on topological groups and their applications, Van Nostrand Mathematical Studies, no 16 (1969) 113pp. [17] M. Gromov, Groups of polynomial growth and expanding maps, Publications Mathématiques de l’IHES, no 53 (1981), 53-73. [18] M. Gromov. Metric structures for Riemannian and non-Riemannian spaces, volume 152 of Progress in Mathematics. Birkhäuser Boston Inc., Boston, MA, 1999. Based on the 1981 French original, With appendices by M. Katz, P. Pansu and S. Semmes. [19] M. Gromov, Carnot-Carathéodory spaces seen from within, in Sub-Riemannian Geometry, edited by A. Bellaiche and J-J. Risler, 79-323, Birkauser (1996). [20] M. Gromov, Asymptotic invariants of infinite groups, in Geometric group theory, Vol. 2 (Sussex, 1991), 1–295, London Math. Soc. Lecture Note Ser., 182, CUP (1993). [21] Y. Guivarc’h, Croissance polynômiale et périodes des fonctions harmoniques, Bull. Sc. Math. France 101, (1973), p. 353-379. [22] R. Karidi, Geometry of balls in nilpotent Lie groups, Duke Math. J. 74 (1994), no. 2, 301–317. [23] S. A. Krat. Asymptotic properties of the Heisenberg group. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 261(Geom. i Topol. 4):125–154, 268, 1999. [24] V. Losert, On the structure of groups with polynomial growth, Math. Z. 195 (1987), no 1, 109–117. [25] R. Montgomery, A tour of sub-riemannian geometry, AMS book 2002. [26] A. 
Nevo, Pointwise ergodic theorems for actions of connected Lie groups, Handbook of Dy- namical Systems, Eds. B. Hasselblatt and A. Katok, to appear. [27] P. Pansu, Croissance des boules et des géodésiques fermées dans les nilvariétés, Ergodic Theory Dynam. Systems 3 (1983), no. 3, 415–445. [28] P. Pansu, Mtriques de Carnot-Carathodory et quasiisomtries des espaces symtriques de rang un, Ann. of Math. (2) 129 (1989), no. 1, 160. [29] M. S. Raghunathan, Discrete subgroups of Lie groups, Springer Verlag (1972). [30] D. Robinson, A course in the theory of groups, Springer-Verlag. [31] M. Shapiro, A geometric approach to almost convexity and growth of some nilpotent groups, Math. Ann, 285, 601-624 (1989). [32] M. Stoll, On the asymptotic of the growth of 2-step nilpotent groups, J. London Math. Soc (2) 58 (1998), no 1, 38–48. [33] M. Stoll, Rational and transcendental growth series for higher Heisenberg groups, Invent. math. 126, 85-109 (1996). [34] A. Tempelman, Ergodic theorems for group actions, Mathematics and its applications, 78, Kluwer Academic publishers (1992). ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 57 [35] R. Tessera, Volumes of spheres in doubling measures metric spaces and groups of polynomial growth, Bull. Soc. Math. France, 135(1):47–64, 2007. [36] H.C. Wang, Discrete subgroups of solvable Lie groups, Annals of Math, (1956), 64, 1-19. [37] J. Wolf, Growth of finitely generated solvable groups and curvature of Riemanniann mani- folds, J. Differential Geometry, 2 (1968) p. 421–446. E-mail address: emmanuel.breuillard@math.u-psud.fr Université Paris-Sud 11, Laboratoire de Mathématiques, 91405 Orsay, France 1. Introduction 2. Quasi-norms and the geometry of nilpotent Lie groups 3. The nilshadow 4. Periodic metrics 5. Reduction to the nilpotent case 6. The nilpotent case 7. Locally compact G and proofs of the main results 8. Coarsely geodesic distances and speed of convergence 9. 
Appendix: the Heisenberg groups References

ABSTRACT We get asymptotics for the volume of large balls in an arbitrary locally compact group G with polynomial growth. This is done via a study of the geometry of G and a generalization of P. Pansu's thesis. In particular, we show that any such G is weakly commensurable to some simply connected solvable Lie group S, the Lie shadow of G. We also show that large balls in G have an asymptotic shape, i.e. after a suitable renormalization, they converge to a limiting compact set which can be interpreted geometrically. We then discuss the speed of convergence, treat some examples and give an application to ergodic theory. We also answer a question of Burago about left invariant metrics and recover some results of Stoll on the irrationality of growth series of nilpotent groups. <|endoftext|><|startoftext|> Introduction

A system of n ordinary differential equations each of order M > 1,

$u_k^{(M)} = f_k(u_j^{(s)}, t), \quad j, k = 1, \dots, n, \quad s = 0, \dots, M-1,$ (1)

has a variable number of Lie point symmetries depending upon the structure of the functions f_k. The maximal dimension D of the algebra of admitted Lie point symmetries can be obtained by the formulæ [9]

$M = 2 \implies D = n^2 + 4n + 3$ (2)

$M > 2 \implies D = n^2 + Mn + 3.$ (3)

Some explicit numbers are given in Table 1.

Recently the elaboration of the elements of the Lie algebra, E8, of order 248 has been variously announced [3, 7, 13, 17, 16] in the serious popular media. The authoritative source is the Atlas of Lie Groups and Representations [2] which is funded by the National Science Foundation through the American Institute of Mathematics [1]. The results of the E8 computation were announced in a talk at MIT by David Vogan on Monday, March 19, 2007, and the details may be found at [15]. The Atlas of Lie Groups and Representations is a project to make available information about representations of semisimple Lie groups over real and p-adic fields.
Of particular importance is the problem of the unitary dual, i.e. the classification of all of the irreducible unitary representations of a given Lie group. The goal of the Atlas of Lie Groups and Representations is to classify the unitary dual of a real Lie group, G, by computer. A step in this direction is to compute the admissible representations of G including their Kazhdan-Lusztig-Vogan polynomials. The computation for E8 was an important test of the technology. While the computation is an impressive achievement, it is only a small step towards the unitary dual and should not be ranked as important as the original work of Kazhdan, Lusztig, Vogan, Beilinson, Bernstein et al. (See for example [4, 5, 6, 11, 12, 14, 18, 8].)

∗Permanent address: School of Mathematical Sciences, Westville Campus, University of KwaZulu-Natal, Durban 4000, Republic of South Africa
http://arxiv.org/abs/0704.0096v1

Table 1: The maximal dimension of the algebra of admitted Lie point symmetries for systems of equations of varying order M (horizontal) and number n (vertical).

n \ M     2     3     4     5     6     7     8     9    10
1         8     7     8     9    10    11    12    13    14
2        15    13    15    17    19    21    23    25    27
3        24    21    24    27    30    33    36    39    42
4        35    31    35    39    43    47    51    55    59
5        48    43    48    53    58    63    68    73    78
6        63    57    63    69    75    81    87    93    99
7        80    73    80    87    94   101   108   115   122
8        99    91    99   107   115   123   131   139   147
9       120   111   120   129   138   147   156   165   174
10      143   133   143   153   163   173   183   193   203

Nevertheless the result was regarded as being suitable for a concerted campaign of publicity to heighten awareness of Mathematics in the community at large:

“Symmetrie ist möglicherweise das erfolgreichste Prinzip der Physik überhaupt” [7].

“Un groupe de chercheurs américains et européens, parmi lesquels on trouve deux Français, est parvenu à décoder une des structures les plus vastes de l’histoire des mathématiques” [13].

“It may be that some day this calculation can help physicists to understand the universe” [17].
“Eighteen mathematicians spent four years and 77 hours of supercomputer computation to describe this structure” [16].

In this note we demonstrate three representations of a Lie algebra of dimension 248. The two of us spent four hours and 77 seconds of pocket-calculator computation to describe these three structures.

2 Three simple systems

For D = 248 formula (2) does not have integral solutions and so there is no system of second-order ordinary differential equations of maximal symmetry possessing a 248-dimensional algebra of its Lie point symmetries^1. As for formula (3), the relevant factors of 248 − 3 = 245 are 1, 5 and 7 (49 is out of the question because 49^2 > 245). Consequently possible values of n are 1, 5 and 7. The corresponding values of M are 244, 44 and 28, respectively. The systems of maximal symmetry are easily obtained as one simply puts f_k = 0 ∀ k. Thus the systems we construct are the simplest representations of the equivalence class under point transformation of systems of equations of maximal symmetry.

^1 Is this another instance of the intrinsic uniqueness of Classical Mechanics?

Firstly we consider the following system:

$u_k^{(44)} = 0, \quad k = 1, \dots, 5.$ (4)

It is easy to show that this simple system admits a 248-dimensional algebra of its Lie point symmetries since 5^2 + 5 · 44 + 3 = 248. The algebra is generated by the operators

$\Gamma_1 = t^2\partial_t + 43t\sum_{i=1}^{5} u_i\partial_{u_i}, \quad \Gamma_2 = t\partial_t, \quad \Gamma_3 = \partial_t,$
$\Gamma_{i,k} = u_k\partial_{u_i}, \quad k = 1, \dots, 5, \quad i = 1, \dots, 5,$
$\Gamma_{i+5,s} = t^s\partial_{u_i}, \quad s = 0, \dots, 43, \quad i = 1, \dots, 5.$ (5)

Secondly we consider the system

$u_r^{(28)} = 0, \quad r = 1, \dots, 7.$ (6)

This equally simple system admits a 248-dimensional algebra (7^2 + 7 · 28 + 3 = 248) of its Lie point symmetries generated by

$\Gamma_1 = t^2\partial_t + 27t\sum_{j=1}^{7} u_j\partial_{u_j}, \quad \Gamma_2 = t\partial_t, \quad \Gamma_3 = \partial_t,$
$\Gamma_{j,r} = u_r\partial_{u_j}, \quad r = 1, \dots, 7, \quad j = 1, \dots, 7,$
$\Gamma_{j+7,n} = t^n\partial_{u_j}, \quad n = 0, \dots, 27, \quad j = 1, \dots, 7.$ (7)
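The arithmetic used in this section (formulæ (2) and (3), and the search for systems with D = 248) is easy to check by machine. The sketch below is an illustration only; the function names are hypothetical and not taken from the note.

```python
def max_symmetry_dim(n, M):
    """Maximal dimension D of the Lie point symmetry algebra for n equations of order M."""
    if n < 1 or M < 2:
        raise ValueError("need n >= 1 and M >= 2")
    if M == 2:
        return n * n + 4 * n + 3   # formula (2)
    return n * n + M * n + 3       # formula (3)


def systems_with_dim(D):
    """All pairs (n, M) with M >= 2 whose maximal symmetry algebra has dimension D."""
    found = []
    for n in range(1, D):
        if max_symmetry_dim(n, 2) == D:
            found.append((n, 2))
        rest = D - 3 - n * n       # must equal M * n for some M > 2
        if rest > 0 and rest % n == 0 and rest // n > 2:
            found.append((n, rest // n))
    return found


print(systems_with_dim(248))  # [(1, 244), (5, 44), (7, 28)]

# Count the generators listed for the systems of five and seven equations:
# 3 (Gamma_1..Gamma_3) + n^2 (u_k d/du_i) + n*M (t^s d/du_i).
for n, M in [(5, 44), (7, 28)]:
    print(n, M, 3 + n * n + n * M)
```

No pair with M = 2 appears in the output, in agreement with the remark that formula (2) has no integral solution for D = 248.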
Thirdly and finally the scalar equation, u(244) = 0, (8) admits a 248-dimensional Lie algebra (12+1 ·244+3 = 248) of its point symmetries generated by the operators Γ1 = t 2∂t + 243tu∂u, Γ2 = t∂t, Γ3 = ∂t, Γ4 = u∂u, Γn+5 = t n∂u, n = 0, 243. 3 Conclusion We have demonstrated three representations of Lie algebras of dimension 248 which is the dimension of E8. Although the algebras we present are not simple, their method of construction is. The reason for this simplicity is that we used represen- tations for systems of equations of maximal symmetry. We do not deny that larger systems, be that in order or number, of less than maximal symmetry could possibly have an algebra of dimension 248, but even on the assumption that such systems be linear the complexity of the calculation becomes immense [10] and defeats the purpose of the present note. Note that we have used the simplest forms for the generators of the algebras of the three systems, (4), (6) and (8), for our primary interest is the demonstration of the existence of the algebras. Normally one would use combinations which reflect subalgebraic structures. For example in the case of (8) for which the algebra is obviously sl(250, IR) one would replace Γ2 with Γ̃2 = 2t∂t +243u∂u to underline the subalgebraic structure {sl(2, IR)⊕ A1} ⊕s 244A1, where Γ1, Γ̃2 and Γ3 constitute a representation of sl(2, IR), Γ4 reflects the homogeneity of the equation in the depen- dent variable and the 244-element abelian subalgebra is composed of the solution symmetries, so called because the coefficient functions are solutions of (8). Acknowledgements PGLL thanks the University of Kwazulu-Natal for its continued support. References [1] American Institute of Mathematics. http://aimath.org/E8/ [2] Atlas of Lie Groups and Representations. http://www.liegroups.org/ [3] BBC Monday, 19 March 2007, 12:28 GMT. 
http://news.bbc.co.uk/2/hi/science/nature/6466129.stm [4] Beilinson A (1983) Localization of representations of reductive Lie algebras Proceedings of the International Congress of Mathematicians, Warsaw 699-710 [5] Beilinson A & Bernstein J (1981) Localisation de g-modules Comptes Rendus de l’Académie des Sciences de Paris Séries I Mathématiques 292 15-18 [6] Bernstein J (1986) On the Kazhdan-Lusztig conjectures AMS Summer Research Conference (University of California, Santa Cruz, July 1986) [7] Der Spiegel, 19 März 2007. http://www.spiegel.de/wissenschaft/mensch/0,1518,472569,00.html http://aimath.org/E8/ http://www.liegroups.org/ http://news.bbc.co.uk/2/hi/science/nature/6466129.stm http://www.spiegel.de/wissenschaft/mensch/0 [8] Gelfand S & MacPherson R (1982) Verma modules and Schubert cells: a dic- tionary in Seminaire d’algebre Paul Dubriel et MP Malliavin (Lecture Notes in Mathematics 925, Springer Verlag, Berlin–New York) 150 [9] González-Gascón F & González-López A (1983) Symmetries of differential equa- tions IV Journal of Mathematical Physics 24 2006-2021 [10] Gorringe VM & Leach PGL (1988) Lie point symmetries for systems of second order linear ordinary differential equations Quæstiones Mathematicæ 11 95-117 [11] Kazhdan D & Lusztig G (1979) Representations of Coxeter groups and Hecke algebras Inventiones Mathematicæ 53 165184 [12] Kazhdan D & Lusztig G (1980) Schubert varieties and Poincaré duality in Geometry of the Laplace Operator, (Proceedings of Symposium on Pure Math- ematics 36, American Mathematical Society) 185203 [13] LEMONDE.FR avec AFP 19.03.07 http://www.lemonde.fr/web/article/0,1-0@2-3244,36-884723@51- 884724,0.html [14] Lusztig G & Vogan D (1983) Singularities of closures of K-orbits on flag mani- fold Inventiones Mathematicæ 71 365370 [15] http://www.liegroups.org/AIME8/technicaldetails.html [16] NEW YORK TIMES 2007/03/20. http://select.nytimes.com/gst/abstract.html?res=F40613FE3C540C738EDDAA0894DF404482 [17] The Times March 19, 2007. 
http://www.timesonline.co.uk/tol/news/uk/science/article1533648.ece [18] Vogan D (1983) Irreducible characters of semisimple Lie groups III: Proof of the Kazhdan-Lusztig conjecture in the integral case Inventiones Mathematicæ 71 381417 http://www.lemonde.fr/web/article/0 http://www.liegroups.org/AIM$_$E8/technicaldetails.html http://select.nytimes.com/gst/abstract.html?res=F40613FE3C540C738EDDAA0894DF404482 http://www.timesonline.co.uk/tol/news/uk/science/article1533648.ece Introduction Three simple systems Conclusion ABSTRACT In this note we present three representations of a 248-dimensional Lie algebra, namely the algebra of Lie point symmetries admitted by a system of five trivial ordinary differential equations each of order forty-four, that admitted by a system of seven trivial ordinary differential equations each of order twenty-eight and that admitted by one trivial ordinary differential equation of order two hundred and forty-four. <|endoftext|><|startoftext|> Introduction A mathematically rigorous approach to quantum field theory based on op- erator algebras is called an algebraic quantum field theory. It has a long history since pioneering works of Araki, Haag, Kastker. (See [22] for a gen- eral treatment of algebraic quantum field theory.) This theory works on Minkowski spaces on any spacetime dimension, and there have been some recent results on curved spacetimes or even noncommutative spacetimes. In the case of 1+1-dimensional Minkowski space with higher spacetime symme- try, conformal symmetry, we have conformal field theory and there we have seen many new developments in the recent years, so we survey such results here. Our emphasis is on representation theoretic aspects of the theory and we make various comparison with another mathematically rigorous and more recent approach to conformal field theory, that is, theory of vertex operator algebras. ∗Supported in part by JSPS. 
http://arxiv.org/abs/0704.0097v1

Roughly speaking, a mathematical study of quantum field theory is a study of Wightman fields, which are a certain type of operator-valued distributions on a spacetime, covariant with respect to a given spacetime symmetry group. We have mathematically rigorous axioms for such Wightman fields, but they involve distributions and unbounded operators, which cause various technical difficulties. In contrast, in algebraic quantum field theory, our fundamental object is a net of von Neumann algebras of bounded linear operators on a Hilbert space. (See [46] for the general theory of von Neumann algebras.) Technical problems with the domains of definition of unbounded operators do not arise in this approach.

A basic idea is as follows. Suppose we have a Wightman field Φ on a spacetime. Fix a bounded region O in the spacetime and consider a test function ϕ with support contained in O. Then the pairing 〈Φ, ϕ〉 produces an (unbounded) operator. We have many Φ and ϕ for a fixed O and obtain many unbounded operators from such pairings. Then we consider the von Neumann algebra of bounded linear operators on this Hilbert space generated by these unbounded operators. (For example, if we have a self-adjoint unbounded operator, we consider its spectral projections, which are obviously all bounded. In this way, we deal only with bounded operators.) This is regarded as the von Neumann algebra generated by the observables in the spacetime region O. A von Neumann algebra is an algebra of bounded linear operators which is closed under the adjoint operation and in the strong operator topology. In this way, we have a family {A(O)} of von Neumann algebras on the same Hilbert space parameterized by spacetime regions. Since the spacetime regions make a net with respect to the inclusion order, we call such a family a net of von Neumann algebras. Now we forget Wightman fields and consider only a net of von Neumann algebras.
We have some expected properties for such nets of von Neumann algebras from physical considerations, and now we use these properties as axioms. So our mathematical object is a net of von Neumann algebras subject to a certain set of axioms. Our mathematical aim is to study such nets of von Neumann algebras.

2 Conformal Quantum Field Theory

We first explain the formulation of full conformal quantum field theory on the 1 + 1-dimensional Minkowski space in algebraic quantum field theory. As a spacetime region O above, it is enough to consider only open rectangles O with edges parallel to t = ±x in the 1 + 1-dimensional Minkowski space. In this way, we get a family {A(O)} of operator algebras parameterized by spacetime regions O (rectangles). In order to realize conformal symmetry, we have to make a partial compactification of the 1 + 1-dimensional Minkowski space. If two rectangles are spacelike separated, then we have no interactions between them even at the speed of light, so our axiom requires that the corresponding two von Neumann algebras commute with each other. This is the locality axiom. Since this is not our main object in this paper, we omit details of the other axioms. See [29] for full details.

Next we briefly explain that boundary conformal field theory can be handled within the same framework. Now we consider the half-space {(x, t) | x > 0} in the 1 + 1-dimensional Minkowski space and only rectangles O contained in this half-space. In this way, we have a similar net of von Neumann algebras {A(O)} parameterized by rectangles in the half-space. See [38] for full details of the axioms.

If we have a net of von Neumann algebras over the 1 + 1-dimensional Minkowski space, we can restrict the net of von Neumann algebras to two chiral conformal field theories on the light cones {x = ±t}. In this way, we have two nets of von Neumann algebras on the compactified S^1 as a description of two chiral conformal field theories.
Since this net is our main mathematical object in this article, we give a full set of axioms. (See [29] for details of this “restriction” procedure.)

Now our “spacetime” is S^1 and a “spacetime region” is an interval I, which means a non-empty, non-dense open connected subset of S^1. We have a family {A(I)} of von Neumann algebras on a fixed Hilbert space H. These von Neumann algebras are simple and such von Neumann algebras are called factors, so the family {A(I)} satisfying the axioms below is called a net of factors (or an irreducible local conformal net of factors, strictly speaking). Actually, the set of intervals on S^1 is not directed with respect to inclusions, so the terminology net is not mathematically appropriate, but it is widely used.

1. (isotony) For intervals I1 ⊂ I2, we have A(I1) ⊂ A(I2).
2. (locality) For intervals I1, I2 with I1 ∩ I2 = ∅, we have [A(I1), A(I2)] = 0.
3. (Möbius covariance) There exists a strongly continuous unitary representation U of PSL(2,R) on H satisfying U(g)A(I)U(g)* = A(gI) for any g ∈ PSL(2,R) and any interval I.
4. (positivity of energy) The generator of the one-parameter rotation subgroup of U, called the conformal Hamiltonian, is positive.
5. (existence of the vacuum) There exists a unit U-invariant vector Ω in H, called the vacuum vector, and the von Neumann algebra $\bigvee_{I \subset S^1} A(I)$ generated by all A(I)’s is B(H).
6. (conformal covariance) There exists a projective unitary representation U of Diff(S^1) on H extending the unitary representation of PSL(2,R) such that for all intervals I, we have
   U(g)A(I)U(g)* = A(gI), g ∈ Diff(S^1),
   U(g)AU(g)* = A, A ∈ A(I), g ∈ Diff(I′),
   where Diff(S^1) is the group of orientation-preserving diffeomorphisms of S^1 and Diff(I′) is the group of diffeomorphisms g of S^1 with g(t) = t for all t ∈ I.

The isotony axiom is natural because we have more test functions (or more observables) for a larger interval. The locality axiom takes this simple form on S^1.
The choice of the spacetime symmetry is not unique, and we can use the Poincaré symmetry on the Minkowski space or the Möbius covariance on S^1, for example, but in conformal field theory we use conformal symmetry, which means diffeomorphism covariance as above. This set of axioms implies various nice conditions such as the Reeh-Schlieder property, the Bisognano-Wichmann property and Haag duality. See [28] and references therein for details.

In the usual situation, all the von Neumann algebras A(I) are isomorphic to the so-called Araki-Woods type III_1 factor for all nets A and all intervals I. So a single von Neumann algebra does not contain any information about the conformal field theory; it is the relative position of the von Neumann algebras in the family that encodes the physical information of the theory. (This is similar to the subfactor theory of Jones, where we study the relative position of one factor in another.)

At the end of this section, we compare our formulation of conformal quantum field theory with another mathematically rigorous approach, the theory of vertex operator algebras. A vertex operator algebra is an algebraic axiomatization of Wightman fields on S^1, called vertex operators. If we have an operator-valued distribution on S^1, its Fourier expansion should give countably many (possibly unbounded) operators as the Fourier coefficients. Under the so-called state-field correspondence, any vector in the space of “states” should give an operator-valued distribution, a quantum “field”, and its Fourier expansion gives countably many operators. In this way, one vector should give countably many operators on the space of these vectors. In other words, for two vectors v, w we have countably many binary operations v_(n)w, n ∈ Z, the action of the n-th operator given by v on w. An axiomatization of this idea gives the notion of a vertex operator algebra. (See [16] for a precise definition. There is a slightly weaker notion of a vertex algebra.
See [27] for its precise definition and related results.) In the theory of vertex operator algebras, one considers a vector space of states without an inner product, and even when we have a positive definite inner product, one considers this vector space without completion. Here, in comparison to nets of factors, we are interested in the case where we have positive definite inner products on the spaces of states. We say that such a vertex operator algebra is unitary.

A (unitary) vertex operator algebra and a net of factors should each describe a chiral conformal field theory. So unitary vertex operator algebras and nets of factors should be in a bijective correspondence, at least under some “nice” additional conditions, but no general theorems are known for such a correspondence, though there is recent progress due to S. Carpi and M. Weiner. However, if we have a construction or an idea on one side, we can often “translate” it to the other side, though this can be highly non-trivial from a technical viewpoint.

Fundamental sources of constructions for vertex operator algebras are affine Kac-Moody algebras and integral lattices. The corresponding constructions for nets of factors have been done by A. Wassermann [47] and his students, and Dong-Xu [12], respectively, after the initial construction of Buchholz-Mack-Todorov [5]. If we have examples with some nice properties, we can often construct new examples from them. For vertex operator algebras, such methods of construction include simple current extensions, the coset construction, and the orbifold construction. The simple current extensions for nets of factors are simply crossed products by DHR-automorphisms and are easy to realize. (See the next section for the notion of DHR-endomorphisms.) The coset and orbifold constructions for nets of factors have been studied in detail by F. Xu [50, 51, 52].
For nets of factors, we have introduced a new construction of examples in [28] based on Longo’s notion of Q-systems [36]. Further examples have been constructed by Xu [55] with this method. This can be translated to the setting of vertex operator algebras, as we will see later in this article.

3 Representation Theory

An important tool to study nets of factors is representation theory. For a net of factors {A(I)}, all the algebras A(I) act on the initial Hilbert space H from the beginning, but we also consider their representations on another Hilbert space, that is, a family {πI} of representations πI : A(I) → B(K), where K is another Hilbert space, common for all I. For I1 ⊂ I2, we must have that the restriction of πI2 to A(I1) is equal to πI1. The representation on the initial Hilbert space is called the vacuum representation and plays the role of a trivial representation. We also have to take care of the spacetime symmetry group when we consider a representation, but this part is often automatic, so we now ignore it for simplicity. See [20] for a more detailed treatment. Note that a representation of a net of factors is a counterpart of a module over a vertex operator algebra.

Notions of irreducibility and a direct sum for such representations are easy to formulate. Non-trivial notions are dimensions and tensor products. Each representation {πI} is in a bijective correspondence with a certain endomorphism λ of an infinite dimensional operator algebra, called a Doplicher-Haag-Roberts (DHR) endomorphism [13, 15], and we can restrict λ to a single factor A(I) for an arbitrary but fixed interval I. Then λ(A(I)) ⊂ A(I) is a subfactor and we have its Jones index [26]. (See [14, 41, 43] for the general theory of subfactors.) The square root of this Jones index plays the role of the dimension of the representation [35].
In algebraic quantum field theory, such a dimension was called a statistical dimension, and it is analogous to a quantum dimension in the theory of quantum groups. It takes values in the interval [1, ∞]. We can also compose endomorphisms, and this composition gives the correct notion of tensor products. We then get a braided tensor category as in [15].

In the representation theory of a vertex operator algebra (and also of a quantum group), it sometimes happens that we have only finitely many irreducible representations. Such finiteness is often called rationality, possibly with some extra assumptions on some finite dimensionality. This also plays an important role in the theory of quantum invariants in low dimensional topology. In [32], we introduced an operator algebraic condition for such rationality for nets of factors as follows, and we called it complete rationality. We split the circle into four intervals I1, I2, I3, I4 in this order, say, counterclockwise. Then complete rationality is given by the finiteness of the Jones index for the subfactor A(I1) ∨ A(I3) ⊂ (A(I2) ∨ A(I4))′, where ′ means the commutant, together with the split property. The split property is known to hold if the vacuum character, $\sum_{n=0}^{\infty} (\dim H_n) q^n$, is convergent for |q| < 1 by [9], so it usually holds and is easy to verify. (Here $H = \bigoplus_{n=0}^{\infty} H_n$ is the eigenspace decomposition of the original Hilbert space for the positive generator of the rotation group. So this convergence property can be verified simply by looking at the Hilbert space, not the von Neumann algebras.) In the original definition of complete rationality in [32], we required another condition called strong additivity, but it was proved to be redundant by Longo-Xu [39]. We have proved in [32] that complete rationality implies that the representation category of {A(I)} is a modular tensor category. A modular tensor category produces a 3-dimensional topological quantum field theory.
(See [45] for the general theory of topological quantum field theory.) The SU(N)_k-net of Wassermann has been shown to be completely rational by [49].

We now introduce the important notion of α-induction. For an inclusion of nets of factors, A(I) ⊂ B(I), we have an induction procedure analogous to the induction of group representations. So from a representation of the smaller net A, we would like to construct a representation of the larger net B, but what we actually obtain is not a genuine representation of the larger net B in general, and is something weaker called solitonic. This induction procedure is called α-induction and depends on a choice of braiding, so we write α+ and α−. This was first defined in Longo-Rehren [37] and studied in detail in Xu [48]. Then Böckenhauer-Evans [1] made a further study, and [2, 3] unified this study with Ocneanu’s graphical method [42]. The intersection of the irreducible endomorphisms appearing in the images of α+-induction and α−-induction gives the true representation category of {B(I)} if A is completely rational by [2, 32].

This α-induction opens an important and new connection with the theory of modular invariants. A modular tensor category produces a unitary representation π of SL(2,Z) through its braiding as in [44], and its dimension is the number of irreducible objects. So a completely rational net of factors produces such a unitary representation. (Note that our representation of SL(2,Z) comes from the braiding structure, not from the action of this group on the characters through the change of variables $\tau \mapsto (a\tau + b)/(c\tau + d)$, though in all the “nice” known examples, these two representations coincide. See [30] for a discussion on this matter.) It has been proved in [2] that the matrix (Z_{λ,µ}) defined by $Z_{\lambda,\mu} = \dim \mathrm{Hom}(\alpha^+_\lambda, \alpha^-_\mu)$ is in the commutant of the representation π, using Ocneanu’s graphical calculus [42]. Such a matrix Z is called a modular invariant, and we have only finitely many such Z for a given π.
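The commutant condition can be made concrete on a small example. The following is a minimal sketch, not taken from the text: we assume the standard S- and T-matrices of the SU(2) model at level k = 4 (our own choice of illustration) and check numerically that the matrix Z corresponding to |χ_0 + χ_4|^2 + 2|χ_2|^2, the D_4 entry of the A-D-E list, commutes with both. All function names are hypothetical.

```python
import cmath
import math

K = 4                # level of the illustrative SU(2)_k model (our choice, not from the text)
N = K + 1            # labels a = 0, ..., K (twice the spin)
C = 3 * K / (K + 2)  # central charge of SU(2)_k


def s_matrix():
    """S_ab = sqrt(2/(k+2)) * sin(pi (a+1)(b+1) / (k+2))."""
    norm = math.sqrt(2 / (K + 2))
    return [[norm * math.sin(math.pi * (a + 1) * (b + 1) / (K + 2))
             for b in range(N)] for a in range(N)]


def t_matrix():
    """Diagonal T with entries exp(2 pi i (h_a - c/24)), h_a = a(a+2)/(4(k+2))."""
    phases = [cmath.exp(2j * math.pi * (a * (a + 2) / (4 * (K + 2)) - C / 24))
              for a in range(N)]
    return [[phases[a] if a == b else 0 for b in range(N)] for a in range(N)]


def matmul(x, y):
    """Plain N x N matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def commutes(x, y, tol=1e-12):
    """True if xy and yx agree entrywise up to the tolerance."""
    xy, yx = matmul(x, y), matmul(y, x)
    return all(abs(xy[i][j] - yx[i][j]) < tol
               for i in range(N) for j in range(N))


def is_modular_invariant(z):
    """Z lies in the commutant of the SL(2,Z) representation generated by S and T."""
    return commutes(z, s_matrix()) and commutes(z, t_matrix())


# Candidate Z for |chi_0 + chi_4|^2 + 2 |chi_2|^2.
Z_D4 = [
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
]

print(is_modular_invariant(Z_D4))
```

The check only tests the commutation relations; the further requirements that Z has non-negative integer entries and that the vacuum entry equals 1 are visible directly from the matrix.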
For any completely rational net {A(I)}, any extension {B(I) ⊃ A(I)} produces such a Z. Matrices Z are certainly much easier to classify than extensions, and this is a source of the classification theory in the next section.

4 Classification Theory

For a net of factors, we can naturally define a central charge and it is well-known to take the discrete values 1 − 6/(m(m + 1)), m = 3, 4, 5, . . . , below 1 and all values in [1, ∞) by [17, 18]. We have the Virasoro net {Vir_c(I)} for each such c and it is the operator algebraic counterpart of the Virasoro vertex operator algebra with the same c. Any net of factors {A(I)} with central charge c is an extension of the Virasoro net with the same central charge and it is automatically completely rational if c < 1, as shown in [28]. So we can apply the above theory and we get the following complete classification list for the case c < 1 as in [28].

1. The Virasoro nets {Vir_c(I)} with c < 1.
2. The simple current extensions of the Virasoro nets with index 2.
3. Four exceptionals at c = 21/22, 25/26, 144/145, 154/155.

The unitary representations of SL(2,Z) for the Virasoro nets are the well-known ones, and all the modular invariants for these have been classified by [6]. Our result shows that each of the so-called type I modular invariants in the classification list of [6] corresponds to a net of factors uniquely. They are labeled with pairs of A-D_{2n}-E_{6,8} Dynkin diagrams with Coxeter numbers differing by 1.

Three of the exceptionals in (3) of the above list have been identified with coset models, but the remaining one does not seem to be related to any other known construction. It is constructed with “extension by Q-system”. Xu [55] recently applied this construction to many other coset models and obtained infinitely many new examples based on [54], called mirror extensions.

Classification for the case c = 1 has also been carried out under an extra assumption [7, 53].
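The four exceptional central charges can be recovered from the labeling by pairs of Dynkin diagrams. The sketch below is an illustration under stated assumptions: we use the standard Coxeter numbers (n + 1 for A_n, 12 for E_6, 30 for E_8), we assume that the pair attached to the parameter m has Coxeter numbers m and m + 1, and we assume that the exceptionals are exactly the pairs containing E_6 or E_8. The helper names are hypothetical.

```python
from fractions import Fraction

# Standard Coxeter numbers of the exceptional diagrams allowed in the type I list.
COXETER_EXCEPTIONAL = {"E6": 12, "E8": 30}


def central_charge(m):
    """Central charge c = 1 - 6/(m(m+1)) of the discrete series, m = 3, 4, 5, ..."""
    return 1 - Fraction(6, m * (m + 1))


def exceptional_pairs():
    """List (m, first diagram, second diagram) with Coxeter numbers (m, m + 1).

    A_n has Coxeter number n + 1, so the A-type partner is fixed by the
    Coxeter number of the exceptional diagram.
    """
    pairs = []
    for name, h in COXETER_EXCEPTIONAL.items():
        # Exceptional diagram carries m + 1 = h; partner A_{h-2} carries m = h - 1.
        pairs.append((h - 1, f"A{h - 2}", name))
        # Exceptional diagram carries m = h; partner A_h carries m + 1 = h + 1.
        pairs.append((h, name, f"A{h}"))
    return sorted(pairs)


for m, first, second in exceptional_pairs():
    print(m, first, second, central_charge(m))
```

Under these assumptions the loop produces m = 11, 12, 29, 30, and the corresponding central charges agree with the values 21/22, 25/26, 144/145, 154/155 in item 3 of the list above.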
This classification theorem also implies a classification of certain types of vertex operator algebras as follows. Let V be a (rational) vertex operator algebra and W_i be its irreducible modules. We would like to classify all vertex operator algebras arising from putting a vertex operator algebra structure on $\bigoplus_i n_i W_i$ and using the same Virasoro element as V, where n_i is the multiplicity and W_0 = V, n_0 = 1. From the viewpoint of tensor categories, this classification problem of extensions of a vertex operator algebra is the “same” as the classification problem of extensions of a completely rational net of factors, as shown in [24].

So the above classification theorem of local conformal nets implies a classification theorem of extensions of the Virasoro vertex operator algebras with c < 1 as above, and we obtain the same classification list. That is, besides the Virasoro vertex operator algebras themselves, we have their simple current extensions, and four exceptionals at c = 21/22, 25/26, 144/145, 154/155. With the usual notation L(c, h) for a module with central charge c and conformal weight h of the Virasoro vertex operator algebras with c < 1, the four exceptionals are listed as follows.

1. L(21/22, 0) ⊕ L(21/22, 8). It has 15 irreducible representations and has two coset realizations, from SU(9)_2 ⊂ (E_8)_2 and (E_8)_3 ⊂ (E_8)_2 ⊗ (E_8)_1.
2. L(25/26, 0) ⊕ L(25/26, 10). It has 18 irreducible representations and has a coset realization from SU(2)_{11} ⊂ SO(5)_1 ⊗ SU(2)_1.
3. L(144/145, 0) ⊕ L(144/145, 24) ⊕ L(144/145, 78) ⊕ L(144/145, 189). It has 28 irreducible representations and no coset realization is known.
4. L(154/155, 0) ⊕ L(154/155, 26) ⊕ L(154/155, 84) ⊕ L(154/155, 203). It has 30 irreducible representations and has a coset realization from SU(2)_{29} ⊂ (G_2)_1 ⊗ SU(2)_1.
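As a consistency check on the list above, one can verify that every conformal weight h appearing in a summand L(c, h) is an admissible weight at that central charge. The sketch assumes the standard Kac formula for the discrete series, which is not stated in the text: for c = 1 − 6/(m(m + 1)) the allowed weights are h_{r,s} = ((r(m + 1) − sm)^2 − 1)/(4m(m + 1)) with 1 ≤ r ≤ m − 1 and 1 ≤ s ≤ m. The names below are hypothetical.

```python
from fractions import Fraction


def kac_weights(m):
    """Set of weights h_{r,s} of the discrete series at c = 1 - 6/(m(m+1)).

    Assumes the Kac formula h_{r,s} = ((r(m+1) - s m)^2 - 1) / (4 m (m+1))
    with 1 <= r <= m - 1 and 1 <= s <= m.
    """
    return {Fraction((r * (m + 1) - s * m) ** 2 - 1, 4 * m * (m + 1))
            for r in range(1, m)
            for s in range(1, m + 1)}


# Conformal weights of the summands of the four exceptionals, keyed by m,
# where c = 1 - 6/(m(m+1)) gives 21/22, 25/26, 144/145, 154/155.
EXCEPTIONAL_WEIGHTS = {
    11: [0, 8],
    12: [0, 10],
    29: [0, 24, 78, 189],
    30: [0, 26, 84, 203],
}


def all_weights_admissible():
    """Check that each listed weight occurs in the Kac table of its central charge."""
    return all(Fraction(h) in kac_weights(m)
               for m, weights in EXCEPTIONAL_WEIGHTS.items()
               for h in weights)


print(all_weights_admissible())
```

All listed weights are integers, as they must be for the direct sum to carry a local (bosonic) extension of the vacuum module.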
Note that it is not obvious that the representation category of the Virasoro net Vir_c and the representation category of the Virasoro vertex operator algebra L(c, 0) are isomorphic, but as long as the two are braided tensor categories and have the same S- and T-matrices, the arguments in [28] work, so we obtain the above classification result for vertex operator algebras.

Using the above results and more techniques, we can also completely classify full conformal field theories within the framework of algebraic quantum field theory for the case c < 1. Full conformal field theories are given as certain nets of factors on 1 + 1-dimensional Minkowski space. Under natural symmetry and maximality conditions, those with c < 1 are completely labeled with the pairs of A-D-E Dynkin diagrams with the difference of their Coxeter numbers equal to 1, as shown in [29]. We now naturally have D_{2n+1}, E_7 as labels, unlike in the chiral case. The main difficulty in this work lies in proving uniqueness of the structure for each modular invariant in the Cappelli-Itzykson-Zuber list [6]. This is done through 2-cohomology vanishing for certain tensor categories, in the spirit of [25].

Furthermore, using the above results and more techniques, we can also completely classify boundary conformal field theories for the case c < 1. Boundary conformal field theories are given as certain nets of factors on a 1 + 1-dimensional Minkowski half-space. Under a natural maximality condition, those with c < 1 are now completely labeled with the pairs of A-D-E Dynkin diagrams with distinguished vertices having the difference of their Coxeter numbers equal to 1, as shown in [33] based on a general theory in [38]. The “chiral fields” in a boundary conformal field theory should produce a net of factors on the boundary (which is compactified to S^1) as in the operator algebraic approach.
Then a general boundary conformal field theory restricts to this boundary to produce a non-local extension of this chiral conformal field theory on the boundary.

5 Moonshine Conjecture

The Moonshine conjecture, formulated by Conway-Norton [8], is about mysterious relations between finite simple groups and modular functions, originating in an observation due to McKay. Today the classification of all finite simple groups is complete and the classification list contains 26 sporadic groups in addition to several infinite series. The largest group among the 26 sporadic groups is called the Monster group and its order is about 8 × 10^53.

On the one hand, the non-trivial irreducible representation of the Monster having the smallest dimension is 196883 dimensional. On the other hand, the following function, called the j-function, has been classically studied in algebra.

$$j(\tau) = q^{-1} + 744 + 196884q + 21493760q^2 + 864299970q^3 + \cdots$$

For q = exp(2πiτ), Im τ > 0, we have the modular invariance property

$$j(\tau) = j\left(\frac{a\tau + b}{c\tau + d}\right), \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2,\mathbb{Z}),$$

and this is the only function, up to the constant term, satisfying this property and starting with q^{-1}.

McKay noticed 196884 = 196883 + 1, and similar simple relations for other coefficients of the j-function and dimensions of irreducible representations of the Monster group turned out to be true. Then Conway-Norton [8] formulated the Moonshine conjecture roughly as follows, which was proved by Borcherds [4] in 1992.

1. We have a “natural” infinite dimensional graded vector space $V = \bigoplus_{n=0}^{\infty} V_n$ with some algebraic structure having a Monster action preserving the grading, and each V_n is finite dimensional.
2. For any element g in the Monster, the power series $\sum_{n=0}^{\infty} (\mathrm{Tr}\, g|_{V_n}) q^{n-1}$ is a special function called a Hauptmodul for some discrete subgroup of SL(2,R). When g is the identity element, the series is the j-function minus the constant term 744.
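The coefficients of the j-function quoted above can be reproduced with integer power series arithmetic. This is a minimal sketch under an assumption not made in the text: we use the classical expression j = E_4^3/Δ, with the Eisenstein series E_4 = 1 + 240 Σ σ_3(n) q^n and Δ = q ∏ (1 − q^n)^24. The function names are hypothetical.

```python
def sigma3(n):
    """Sum of the cubes of the positive divisors of n."""
    return sum(d ** 3 for d in range(1, n + 1) if n % d == 0)


def multiply(a, b, size):
    """Product of two power series given as coefficient lists, truncated to `size` terms."""
    out = [0] * size
    for i, a_i in enumerate(a):
        if a_i == 0:
            continue
        for k, b_k in enumerate(b):
            if i + k >= size:
                break
            out[i + k] += a_i * b_k
    return out


def q_times_j(size):
    """First `size` coefficients of q * j(tau) = 1 + 744 q + 196884 q^2 + ...

    Assumes j = E_4^3 / Delta with Delta = q * prod_{n >= 1} (1 - q^n)^24.
    """
    e4 = [1] + [240 * sigma3(n) for n in range(1, size)]
    e4_cubed = multiply(multiply(e4, e4, size), e4, size)

    # Delta / q = prod (1 - q^n)^24, truncated; factors with n >= size do not contribute.
    delta_over_q = [1] + [0] * (size - 1)
    for n in range(1, size):
        factor = [0] * size
        factor[0], factor[n] = 1, -1
        for _ in range(24):
            delta_over_q = multiply(delta_over_q, factor, size)

    # Power series division; the constant term of Delta / q is 1, so all steps stay integral.
    quotient = []
    for n in range(size):
        known = sum(quotient[i] * delta_over_q[n - i] for i in range(n))
        quotient.append(e4_cubed[n] - known)
    return quotient


coefficients = q_times_j(5)
print(coefficients)
print(coefficients[2] == 196883 + 1)  # McKay's observation
```

The list returned by `q_times_j` is shifted by one power of q, so its entry at index n + 1 is the coefficient of q^n in j(τ); in particular the entry at index 1 is the constant term 744 that is subtracted in item 2 above.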
For part (1) of this conjecture, Frenkel-Lepowsky-Meurman [16] gave a precise definition of “some algebraic structure” as a vertex operator algebra and constructed a particular example V, which is now called the Moonshine vertex operator algebra and denoted by V♮.

The construction roughly goes as follows. In dimension 24, we have an exceptional lattice Λ called the Leech lattice. Then there is a general construction of a vertex operator algebra from a certain lattice, and the one for the Leech lattice gives something very close to our final object V♮. Then we take a fixed point algebra under a natural action of Z/2Z arising from the lattice symmetry, and then make a simple current extension of order 2. The resulting vertex operator algebra is the Moonshine vertex operator algebra V♮. (The final step is called a twisted orbifold construction.) The series $\sum_{n=0}^{\infty} (\dim V^\natural_n) q^{n-1}$ is indeed the j-function minus the constant term 744.

Miyamoto [40] has given a new realization of V♮ as an extension of a tensor power of the Virasoro vertex operator algebra with c = 1/2, L(1/2, 0)^{⊗48} (based on Dong-Mason-Zhu [11]). This kind of extension of a Virasoro tensor power is called a framed vertex operator algebra as in [10].

We have given an operator algebraic counterpart of such a construction in [31]. We realize a Leech lattice net of factors on S^1 as an extension of Vir_{1/2} using a certain Z_4-code. Then we can perform the twisted orbifold construction in the operator algebraic sense to obtain a net of factors, the Moonshine net A♮. The theory of α-induction is used for obtaining various decompositions. We then get a Miyamoto-type description of this construction, as an operator algebraic counterpart of the framed vertex operator algebras. We then have the following properties.

1. c = 24.
2. The representation theory is trivial.
3. The automorphism group is the Monster.
4. The Hauptmodul property (as above).

An outline of the proof of these four properties is as follows.
It is immediate to get c = 24. We can show that complete rationality passes to an extension (and an orbifold) in general, with control over the size of the representation category, using the Jones index. With this, we obtain (2) very easily. Such a net is called holomorphic. Property (3) is the most difficult part. For the Virasoro VOA L(1/2, 0), the vertex operator is indeed a well-behaved Wightman field and smeared fields produce the Virasoro net Vir_{1/2}. Using this property and the fact that the subalgebras g(L(1/2, 0)^{⊗48}) for all g ∈ Aut(V♮) generate the entire Moonshine VOA V♮, we can prove that the automorphism group as a vertex operator algebra and the automorphism group as a net of factors are indeed the same. Then (4) is now a trivial corollary of the Borcherds theorem [4].

We note that the Baby Monster, the second largest among the 26 sporadic finite simple groups, can be treated similarly with Höhn’s construction of the shorter Moonshine super vertex operator algebra.

Still, these examples are treated with various tricks case by case. We expect a bijective correspondence between vertex operator algebras and nets of factors on S^1 under some nice conditions. On the side of vertex operator algebras, the most natural candidate for such a “nice” condition is the C_2-finiteness condition of Zhu [56] (with unitarity). On the operator algebraic side, our complete rationality in [32] seems to be such a “nice” condition, but the actual relations between the two notions are not clear at this moment. The essential condition for complete rationality is the finiteness of the Jones index arising from four intervals on the circle, and this finiteness has a formal similarity to the finiteness appearing in the definition of C_2-finiteness.

At the end, we list some open problems. The operator algebraic approach has an advantage in control of representation theory, but lags behind the theory of vertex operator algebras in the theory of characters.
For a net of factors, we can naturally define a notion of a character for each representation. But even convergence of these characters has not been proved in general, and the modular invariance property, the counterpart of Zhu’s result [56], is unknown, though we certainly expect it to be true. We also expect the Verlinde identity to hold, which has been proved in the context of vertex operator algebras recently by Huang [23]. We would need an S-matrix version of the spin-statistics theorem [21] for nets of factors.

References

[1] J. Böckenhauer, D. E. Evans, Modular invariants, graphs and α-induction for nets of subfactors I, Commun. Math. Phys. 197 (1998) 361–386. II 200 (1999) 57–103. III 205 (1999) 183–228.
[2] J. Böckenhauer, D. E. Evans, Y. Kawahigashi, On α-induction, chiral projectors and modular invariants for subfactors, Commun. Math. Phys. 208 (1999) 429–487.
[3] J. Böckenhauer, D. E. Evans, Y. Kawahigashi, Chiral structure of modular invariants for subfactors, Commun. Math. Phys. 210 (2000) 733–784.
[4] R. E. Borcherds, Monstrous moonshine and monstrous Lie superalgebras, Invent. Math. 109 (1992) 405–444.
[5] D. Buchholz, G. Mack, I. Todorov, The current algebra on the circle as a germ of local field theories, Nucl. Phys. B, Proc. Suppl. 5B (1988) 20–56.
[6] A. Cappelli, C. Itzykson, J.-B. Zuber, The A-D-E classification of minimal and A_1^{(1)} conformal invariant theories, Commun. Math. Phys. 113 (1987) 1–26.
[7] S. Carpi, On the representation theory of Virasoro nets, Commun. Math. Phys. 244 (2004) 261–284. math.OA/0306425.
[8] J. H. Conway, S. P. Norton, Monstrous moonshine, Bull. London Math. Soc. 11 (1979) 308–339.
[9] C. D’Antoni, R. Longo, F. Radulescu, Conformal nets, maximal temperature and models from free probability, J. Operator Theory 45 (2001) 195–208.
[10] C. Dong, R. L. Griess Jr., G. Höhn, Framed vertex operator algebras, codes and the Moonshine module, Commun. Math. Phys. 193 (1998) 407–448.
[11] C. Dong, G. Mason, Y.
Zhu, Discrete series of the Virasoro algebra and the moonshine module, Proc. Symp. Pure. Math., Amer. Math. Soc. 56 II (1994) 295–316.
[12] C. Dong, F. Xu, Conformal nets associated with lattices and their orbifolds, Adv. Math. 206 (2006) 279–306. math.OA/0411499.
[13] S. Doplicher, R. Haag, J. E. Roberts, Local observables and particle statistics, I. Commun. Math. Phys. 23, 199–230 (1971); II. 35, 49–85 (1974).
[14] D. E. Evans, Y. Kawahigashi, “Quantum symmetries on operator algebras”, Oxford University Press, 1998.
[15] K. Fredenhagen, K.-H. Rehren, B. Schroer, Superselection sectors with braid group statistics and exchange algebras, I Commun. Math. Phys. 125, 201–226 (1989), II Rev. Math. Phys. Special issue (1992) 113–
[16] I. Frenkel, J. Lepowsky, A. Meurman, “Vertex operator algebras and the Monster”, Academic Press, 1988.
[17] D. Friedan, Z. Qiu, S. Shenker, Details of the non-unitarity proof for highest weight representations of the Virasoro algebra, Commun. Math. Phys. 107 (1986) 535–542.
[18] P. Goddard, A. Kent, D. Olive, Unitary representations of the Virasoro and super-Virasoro algebras, Commun. Math. Phys. 103 (1986) 105–
[19] R. L. Griess Jr., The friendly giant, Invent. Math. 69 (1982) 1–102.
[20] D. Guido, R. Longo, Relativistic invariance and charge conjugation in quantum field theory, Commun. Math. Phys. 148 (1992) 521–551.
[21] D. Guido, R. Longo, The conformal spin and statistics theorem, Commun. Math. Phys. 181 (1996) 11–35.
[22] R. Haag, “Local Quantum Physics”, 2nd ed., Springer, Berlin, Heidelberg, New York, 1996.
[23] Y.-Z. Huang, Vertex operator algebras, the Verlinde conjecture, and modular tensor categories, Proc. Natl. Acad. Sci. USA 102 (2005) 5352–5356.
[24] Y.-Z. Huang, A. Kirillov Jr., J. Lepowsky, Braided tensor categories and extensions of vertex operator algebras, in preparation.
[25] M. Izumi, H.
Kosaki, On a subfactor analogue of the second cohomology, Rev. Math. Phys. 14 (2002) 733–757.
[26] V. F. R. Jones, Index for subfactors, Invent. Math. 72 (1983) 1–25.
[27] V. Kac, “Vertex Algebras for Beginners”, Lect. Notes Series 10, Amer. Math. Soc. Providence, RI, 1988.
[28] Y. Kawahigashi, R. Longo, Classification of local conformal nets. Case c < 1, Ann. of Math. 160 (2004), 493–522. math-ph/0201015.
[29] Y. Kawahigashi, R. Longo, Classification of two-dimensional local conformal nets with c < 1 and 2-cohomology vanishing for tensor categories, Commun. Math. Phys. 244 (2004) 63–97. math-ph/0304022.
[30] Y. Kawahigashi, R. Longo, Noncommutative spectral invariants and black hole entropy, Commun. Math. Phys. 257 (2005) 193–225. math-ph/0405037.
[31] Y. Kawahigashi, R. Longo, Local conformal nets arising from framed vertex operator algebras, Adv. Math. 206 (2006) 729–751. math.OA/0407263.
[32] Y. Kawahigashi, R. Longo, M. Müger, Multi-interval subfactors and modularity of representations in conformal field theory, Commun. Math. Phys. 219 (2001) 631–669.
[33] Y. Kawahigashi, R. Longo, U. Pennig, K.-H. Rehren, The classification of non-local chiral CFT with c < 1, Commun. Math. Phys. 271 (2007) 375–385. math.OA/0505130.
[34] A. Kirillov Jr., V. Ostrik, On q-analog of McKay correspondence and ADE classification of sl(2) conformal field theories, Adv. Math. 171 (2002) 183–227.
[35] R. Longo, Index of subfactors and statistics of quantum fields I–II, Commun. Math. Phys. 126 (1989) 217–247 & 130 (1990) 285–309.
[36] R. Longo, A duality for Hopf algebras and for subfactors, Commun. Math. Phys. 159 (1994) 133–150.
[37] R. Longo, K.-H. Rehren, Nets of subfactors, Rev. Math. Phys. 7 (1995) 567–597.
[38] R. Longo, K.-H. Rehren, Local fields in boundary CFT, Rev. Math. Phys. 16 (2004) 909–960.
[39] R. Longo, F. Xu, Topological sectors and a dichotomy in conformal field theory, Commun. Math. Phys.
251 (2004) 321–364. math.OA/0309366.
[40] M. Miyamoto, A new construction of the moonshine vertex operator algebra over the real number field, Ann. of Math. 159 (2004) 535–596.
[41] A. Ocneanu, Quantized group, string algebras and Galois theory for algebras, in Operator algebras and applications, Vol. 2 (Warwick, 1987), (ed. D. E. Evans and M. Takesaki), London Mathematical Society Lecture Note Series 36, Cambridge University Press, Cambridge, 1988, 119–172.
[42] A. Ocneanu, Paths on Coxeter diagrams: from Platonic solids and singularities to minimal models and subfactors, (Notes recorded by S. Goto), in Lectures on operator theory, (ed. B. V. Rajarama Bhat et al.), The Fields Institute Monographs, AMS Publications, 2000, 243–323.
[43] S. Popa, “Classification of subfactors and of their endomorphisms”, CBMS Regional Conference Series, Amer. Math. Soc. 86 (1995).
[44] K.-H. Rehren, Braid group statistics and their superselection rules, in “The Algebraic Theory of Superselection Sectors”, D. Kastler ed., World Scientific 1990, 333–355.
[45] V. G. Turaev, “Quantum invariants of knots and 3-manifolds”, Walter de Gruyter, Berlin-New York, 1994.
[46] M. Takesaki, “Theory of Operator Algebras”, vol. I, II, III, Springer Encyclopaedia of Mathematical Sciences 124 (2002), 125, 127 (2003).
[47] A. Wassermann, Operator algebras and conformal field theory III: Fusion of positive energy representations of SU(N) using bounded operators, Invent. Math. 133 (1998) 467–538.
[48] F. Xu, New braided endomorphisms from conformal inclusions, Commun. Math. Phys. 192 (1998) 347–403.
[49] F. Xu, Jones-Wassermann subfactors for disconnected intervals, Commun. Contemp. Math. 2 (2000) 307–347.
[50] F. Xu, Algebraic coset conformal field theories I, Commun. Math. Phys. 211 (2000) 1–44.
[51] F. Xu, Algebraic coset conformal field theories II, Publ.
RIMS, Kyoto Univ. 35 (1999) 795–824.
[52] F. Xu, Algebraic orbifold conformal field theories, Proc. Nat. Acad. Sci. U.S.A. 97 (2000) 14069–14073.
[53] F. Xu, Strong additivity and conformal nets, Pac. J. Math. 221 (2005) 167–199. math.QA/0303266.
[54] F. Xu, 3-manifolds invariants from cosets, J. Knot Theory Ramif. 14 (2005) 21–90.
[55] F. Xu, Mirror extensions of local nets, Commun. Math. Phys. 270 (2007) 835–847. math.QA/0505367.
[56] Y. Zhu, Modular invariance of characters of vertex operator algebras, J. Amer. Math. Soc. 9 (1996) 237–302.

ABSTRACT We review recent progress in the operator algebraic approach to conformal quantum field theory. Our emphasis is on the use of representation theory in classification theory. This is based on a series of joint works with R. Longo. <|endoftext|><|startoftext|> Sparsely-spread CDMA - a statistical mechanics based analysis

Jack Raymond and David Saad
Neural Computation Research Group, Aston University, Aston Triangle, Birmingham, B4 7EJ
E-mail: jack.raymond@physics.org

Abstract. Sparse Code Division Multiple Access (CDMA), a variation on the standard CDMA method in which the spreading (signature) matrix contains only a relatively small number of non-zero elements, is presented and analysed using methods of statistical physics. The analysis provides results on the performance of maximum likelihood decoding for sparse spreading codes in the large system limit. We present results for both regular and irregular spreading matrices for the binary additive white Gaussian noise channel (BIAWGN), with a comparison to the canonical (dense) random spreading code.
PACS numbers: 64.60.Cn, 75.10.Nr, 84.40.Ua, 89.70.+c
AMS classification scheme numbers: 68P30, 82B44, 94A12, 94A14
http://arxiv.org/abs/0704.0098v5

1. Background

The area of multiuser communications is one of great interest from both theoretical and engineering perspectives [1]. Code Division Multiple Access (CDMA) is a particular method for allowing multiple users to access channel resources in an efficient and robust manner, and plays an important role in the current preferred standards for allocating channel resources in wireless communications. CDMA utilises channel resources highly efficiently by allowing many users to transmit on much of the bandwidth simultaneously, each transmission being encoded with a user specific signature code. Disentangling the information in the channel is possible by using the properties of these codes, and much of the focus in CDMA research is on developing efficient codes and decoding methods.

In this paper we study a variant of the original method, sparse CDMA, where the spreading matrix contains only a relatively small number of non-zero elements, as was originally studied and motivated in [2]. While the straightforward application of sparse CDMA techniques to uplink multiple access communication is rather limited, as it is difficult to synchronise the sparse transmissions from the various users, the method can be highly useful for frequency and time hopping. In frequency-hopping code division multiple access (FH-CDMA), one repeatedly switches frequencies during radio transmission, often to minimise the effectiveness of interception or jamming of telecommunications. At any given time step, each user occupies a small (finite) number of the (infinite) M-ary frequency-shift-keying (MFSK) chip/carrier pairs (with gain G, the total number of chip-frequency pairs is MG).
Hops between available frequencies can be either random or preplanned and take place after the transmission of data on a narrow frequency band. In time-hopping (TH-)CDMA, a pseudo-noise sequence defines the transmission moment for the various users, which can be viewed as sparse CDMA when used in an ultra-wideband impulse communication system. In this case the sparse time-hopping sequences reduce collisions between transmissions.

This study follows the seminal paper of Tanaka [3], and other recent extensions [4], in utilising the replica analysis for randomly spread CDMA with discrete inputs, which established many of the properties of random densely-spread CDMA with respect to several different detectors including Maximum A Posteriori (MAP), Marginal Posterior Maximiser (MPM) and minimum mean square-error (MMSE). Sparsely-spread CDMA differs from conventional CDMA, based on dense spreading sequences, in that each user transmits on only a small number of chips (compared with transmission on all chips in the case of dense CDMA). The sparse nature of this model facilitates the use of methods from the statistical physics of dilute disordered systems [5, 6] for studying the properties of typical cases.

The feasibility of sparse CDMA for transmitting information was recently demonstrated [2] for the case of real (Gaussian distributed) input symbols by employing a Gaussian effective medium approximation; several results have been reported for the case of random transmission patterns. In a separate recent study, based on the belief propagation inference algorithm and a binary input prior distribution, sparse CDMA has also been considered as a route to proving results in the densely spread CDMA [7]. In addition, this study demonstrated the existence of a waterfall phenomenon comparable to the dense code for a subset of ensembles.
The waterfall phenomenon is observed in decoding techniques, where there is a dynamical transition between two statistically distinct solutions as the noise parameter is varied. Finally we note a number of pertinent studies concerning the effectiveness of belief propagation as an MPM decoding method [8, 9, 10, 11], and in combining sparse encoding (LDPC) methods with CDMA [12]. Many of these papers however consider the extreme dilution regime, in which the number of chip contributions is large but not O(N).

The theoretical work regarding sparsely spread CDMA has remained lacking in certain respects. As pointed out in [2], spreading codes with a Poisson distributed number of non-zero elements, per chip and across users, are systematically failing in that each user has some probability of not contributing to any chips (transmitting no information). Even in the “partly regular” code ensemble [7] (where each user transmits on the same number of chips) some chips have no contributors owing to the Poisson distribution in chip connectivity; consequently the bandwidth is not effectively utilised. We circumvent this problem by introducing constraints to prevent this, namely taking regular signature codes constrained such that both the number of users per chip and the number of chips per user take fixed integer values. Furthermore we present analytic and numerical analysis without resort to Gaussian approximations of any quantities. Using new tools from statistical mechanics we are able to cast greater light on the nature of the binary prior transmission process: notably the nature of the decoding state space and the relative performance of sparse ensembles versus dense ones across a range of noise levels; and importantly, the question of how the coexistence of solutions found by Tanaka [3] extends to sparse ensembles, especially close to the transition points determined for the dense ensemble.
In this paper we demonstrate the superiority of regular sparsely spread CDMA codes over densely spread codes in certain respects; for example, the anticipated bit error rate arising in decoding is improved in the high noise regime and the solution coexistence behaviour is less pervasive. Furthermore, belief propagation for such an ensemble is certain to be significantly faster and less computationally demanding [13]; this also has power-consumption implications which may be important in some applications. Other practical issues of implementation, the most basic being non-synchronisation and power control, require detailed study and may make fully harnessing these advantages more complex and application dependent.

The paper is organised as follows: In section 2 we will introduce the general framework and notation used, while the methodology used for the various codes will be presented in section 3. The main results for the various codes will be presented in section 4 followed by concluding remarks in section 5.

Figure 1. A bipartite graph is useful for visualising the problem. A user node i at the bottom interacts with other variables through the set of neighbouring factor nodes (∂i) to which it connects. The factor nodes are determined through a similar neighbourhood. The interaction at each factor (µ) is conditioned on the neighbouring gain factors ξµ (the non-zero components of s), and yµ (which is an implicit function of the noise ωµ, and the neighbouring input bits bµ and gain factors ξµ), assuming a uniform prior on the bits. The statistical mechanics reconstruction problem associates dynamical variables τ to the user nodes that interact through the factors. The thermodynamical equilibrium state of this system then describes the theoretical performance of optimal detectors.

2.
The model

We consider a standard model of CDMA consisting of K users transmitting in a bit interval of N chips. We assume a model with perfect power control and synchronisation, and consider only a single bit interval. In our case the received signal y is described by

$$ \mathbf{y} = \sum_{k=1}^{K} [\mathbf{s}_k b_k] + \boldsymbol{\omega} , \tag{1} $$

where the vector components describe the values for distinct chips: s_k is the spreading code for user k, b_k = ±1 is the bit sent by user k (binary input symbols) and ω is the noise vector. Appropriate normalisation of the power is through the definition of the signature matrix (s). It is possible to include a user or chip specific amplitude variation, which may be due to fading or imperfect power control. We consider a model without these effects. The spreading codes are sparse so that in expectation only C of the elements in vector s_k are non-zero.

If, with knowledge of the signature matrix in use, we assume the signal has been subject to additive white Gaussian channel noise of variance $\sigma_0^2/\beta$, where $\sigma_0^2$ is the variance of the true channel noise 〈ω²〉, we can write the posterior for the transmitted bits τ (unknowns given the particular instance) using Bayes' theorem

$$ P(\boldsymbol{\tau}|\mathbf{y}) \propto \prod_{\mu=1}^{N} \exp\left\{ -\frac{\beta}{2\sigma_0^2} \left( \sum_{k=1}^{K} [s_{\mu k}(b_k - \tau_k)] + \omega_\mu \right)^2 \right\} P(\boldsymbol{\tau}) , \tag{2} $$

and from this define the bit error rate, mutual information, and other quantities. The statistical mechanics approach from here is to define a Hamiltonian and partition function from which the various statistics relating to this probability distribution may be determined, and hence all the usual information theory measures. A suitable choice for the Hamiltonian is

$$ H(\boldsymbol{\tau}) = \frac{1}{2\sigma_0^2} \sum_{\mu=1}^{N} \left( \sum_{k=1}^{K} [s_{\mu k}(b_k - \tau_k)] + \omega_\mu \right)^2 - \sum_{k=1}^{K} h_k \tau_k . \tag{3} $$

We can here identify τ as the dynamical variables in the inference problem (dependence shown explicitly). The other quenched variables (parameters), describing the instance of the disorder, are the signature matrix (s), noise (ω) and the inputs (b).
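As a concrete illustration of the signal model and of the posterior over candidate bit vectors, the following minimal Python sketch generates a received signal for a very small system and evaluates the posterior by exhaustive enumeration. It assumes Gaussian channel noise, a uniform prior on the bits (all h_k = 0) and a Gaussian likelihood with variance σ0²/β; the function names and the use of NumPy are illustrative choices rather than part of the analysis, and enumeration is only feasible for small K.

```python
import itertools
import numpy as np

def received_signal(s, b, sigma0, rng):
    """Return y = s b + omega, with i.i.d. Gaussian noise of variance sigma0**2."""
    omega = rng.normal(0.0, sigma0, size=s.shape[0])
    return s @ b + omega

def posterior(s, y, sigma0, beta=1.0):
    """Posterior over all 2**K bit vectors tau, assuming a uniform prior."""
    K = s.shape[1]
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=K)))
    residual = y[None, :] - states @ s.T          # one row per candidate tau
    energy = (residual ** 2).sum(axis=1) / (2.0 * sigma0 ** 2)
    weights = np.exp(-beta * (energy - energy.min()))  # shift for stability
    return states, weights / weights.sum()

def mpm_estimate(states, probs):
    """Marginal posterior maximiser: sign of each posterior mean bit."""
    return np.sign(probs @ states)
```

With beta = 1 the model noise level matches the true one; the MAP estimate is the row of `states` with the largest probability, while `mpm_estimate` maximises each bit marginal separately.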
The variables h_k describe our prior beliefs about the inputs (the specific user bias), and we can assume some simple distribution for this, such as all users having the same bias h_k = H. Maximal rate transmission corresponds to unbiased bits H = 0, and this is considered throughout the paper. The properties of such a system may be reflected in a factor (Tanner) graph, a bipartite graph in which users and chips are represented by nodes (see figure 1).

The calculation we undertake is specific to the thermodynamic limit in which the number of chips N → ∞ whilst the load α = K/N is fixed. Note that α is termed β in many CDMA papers; here we reserve β to mean the “inverse temperature” in a statistical mechanics sense (which defines our prior belief for the noise level and gives rise to the corresponding MAP detector). In all ensembles we may identify the parameter L as the mean number of contributions to each chip, and C as the mean number of contributions per user. Since the total number of non-zero elements is both KC and NL, the following also holds

$$ \alpha = \frac{L}{C} . \tag{4} $$

The case in which α is greater than 1 will be called oversaturated, since more than one bit is being transmitted per chip.

The calculations presented henceforth are specific to the case of memoryless noise, drawn from a single distribution of mean zero and mean square $\sigma_0^2$

$$ \Omega(\omega) = P(\omega_\mu = \omega) . \tag{5} $$

Defining normalised spreading codes such that s_k.s_k = N, we can identify the “power spectral density” (PSD) over a chip interval as a measure of the system noise $1/(2\sigma_0^2)$, the factor two being connected with physical considerations in implementing the model.

2.1. Code Ensembles

We consider several code ensembles, which we call irregular, partly regular and regular, and which differ in the constraints placed on the factor and variable degrees of the signature matrix s.
The probability distribution is

$$ P(s) = \mathcal{N} \prod_{\mu} \left\langle \delta\left( \sum_{k} \delta(s_{\mu k} \neq 0) - \tilde{L} \right) \right\rangle_{P(\tilde{L})} \prod_{k} \left\langle \delta\left( \sum_{\mu} \delta(s_{\mu k} \neq 0) - \tilde{C} \right) \right\rangle_{P(\tilde{C})} \prod_{\mu,k} P(s_{\mu k}) , \tag{6} $$

where $\mathcal{N}$ is a normalising constant, P(L̃) is the factor degree probability distribution of mean L, P(C̃) is the variable degree probability distribution of mean C, and P(s_µk) is the marginal probability distribution, which is common to all ensembles

$$ P(s_{\mu k}) = \left(1 - \frac{L}{K}\right) \delta(s_{\mu k}) + \frac{L}{K}\, \delta(s_{\mu k} - \xi) . \tag{7} $$

The form of (6) is then sufficient for the sparse distributions we consider in the large system limit, and makes explicit the chip and user connectivity properties of the ensembles. The gain factor ξ is drawn randomly, in any instance of a code, from a single distribution with zero measure at ξ = 0 and finite moments

$$ \phi(\xi) = P(s_{\mu k} = \xi \,|\, s_{\mu k} \neq 0) . \tag{8} $$

Unlike the dense case the details of this distribution will affect results, but only in a small way for reasonable choices [2]. We here investigate the case of Binary Phase Shift Keying (BPSK), which corresponds to a uniform distribution on $\{-1/\sqrt{L}, 1/\sqrt{L}\}$, though the analytic results presented are applicable to any distribution of mean square 1/L. Note that disorder in the gain factors is not a necessity; the case $\xi = 1/\sqrt{L}$ also allows decoding in sparse ensembles.

The case where P(L̃) and P(C̃) are Poissonian distributed identifies the irregular ensemble, where the connections between chips and users are independently distributed. The second distribution, called partly regular, has $P(\tilde{C}) = \delta_{C,\tilde{C}}$: the chip connectivity is again Poisson distributed with mean L, but each user contributes to exactly C chips. This prevents the systematic failure inherent in the irregular ensemble, in which an extensive number of users fail to transmit on any chips. If in addition to the aforementioned constraint all chips receive exactly L contributions, $P(\tilde{L}) = \delta_{L,\tilde{L}}$, the ensemble is called regular.
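The three ensembles can be sampled for finite N and K as in the following Python sketch. It is an illustration under stated assumptions rather than the construction used in the analysis: the irregular ensemble is drawn with independent entries (binomial degrees, approximately Poissonian for large systems), the regular ensemble is drawn by shuffling user "stubs" and rejecting assignments that place a user twice on one chip, and the gains are BPSK values ±1/√L. The function name and ensemble labels are hypothetical.

```python
import numpy as np

def sparse_signature_matrix(N, K, C, ensemble, rng):
    """Draw an N x K sparse signature matrix with gains +-1/sqrt(L)."""
    L = K * C / N  # mean number of contributions per chip (alpha = L/C)
    if ensemble == "irregular":
        # Each entry is non-zero independently with probability C/N = L/K.
        mask = rng.random((N, K)) < C / N
    elif ensemble == "partly_regular":
        # Every user picks exactly C distinct chips; chip degrees fluctuate.
        mask = np.zeros((N, K), dtype=bool)
        for k in range(K):
            mask[rng.choice(N, size=C, replace=False), k] = True
    elif ensemble == "regular":
        if (K * C) % N != 0:
            raise ValueError("regular ensemble needs K*C divisible by N")
        L_int = (K * C) // N
        stubs = np.repeat(np.arange(K), C)  # C connection stubs per user
        while True:
            rng.shuffle(stubs)
            chips = stubs.reshape(N, L_int)  # L_int stubs land on each chip
            if all(len(set(row)) == L_int for row in chips):
                break  # no user appears twice on the same chip
        mask = np.zeros((N, K), dtype=bool)
        for mu, users in enumerate(chips):
            mask[mu, users] = True
    else:
        raise ValueError("unknown ensemble: " + ensemble)
    signs = rng.choice([-1.0, 1.0], size=(N, K))
    return mask * signs / np.sqrt(L)
```

For example, `sparse_signature_matrix(6, 9, 2, "regular", np.random.default_rng(1))` gives a matrix in which every user occupies 2 chips and every chip carries 3 users. The rejection step is adequate for small examples; other constructions may be preferable for large systems.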
Regular chip connectivity, amongst other things, prevents the systematic inefficiency due to leaving some chips unaccessed by any of the users. The case of Poissonian distributions is that in which there is no global control. In many engineering applications constraining users individually (non-Poissonian P(C̃)) is practical, whereas coordination between users (non-Poissonian P(L̃)) is difficult. The practicalities of implementing the different ensembles we consider are application specific: the advantages inherent in distributing channel resources more evenly amongst users may be lost to practical implementation problems.

3. Methodology

3.1. Spectral Efficiency Lower Bound

The inferiority of codes with Poissonian user connectivity has been pointed out previously (e.g., in [2]), based on the understanding that codes which leave a portion of the users disconnected cannot be optimal. Analogously we argue that codes with irregular chip connectivity must also be inferior in that they leave a fraction of the chips (bandwidth) unutilised, thus providing a motivation for considering fully regular codes. In this section we show a particular case in which the regular codes are expected to outperform any other ensemble, by analysing the amount of information that can be extracted on the sent bits by consideration of only one chip in isolation from the other chips. This corresponds to a detector reconstructing bits based only on the value of a single chip, and is independent of the user connectivity.

The spectral efficiency is defined as the mutual information between the received signal and reconstructed bits per chip. In considering only a single chip (µ) we have

$$ I(\boldsymbol{\tau}; y_\mu) = \left\langle \log_2 \frac{P(\boldsymbol{\tau}|y_\mu)}{P(\boldsymbol{\tau})} \right\rangle_{P_0(\boldsymbol{\tau}, y_\mu)} , \tag{9} $$

where the subscript zero indicates the true (generative), rather than model (2), probability distribution.
For brevity we consider the simplest case, in which the generative and model probability distributions are the same, with unbiased bits and a Gaussian noise distribution, in which case after some rearrangement

$$ I(\boldsymbol{\tau}; y_\mu) = \left\langle \tilde{L} + \log_2 \frac{\exp(-H_\mu(\boldsymbol{\tau}_\mu))}{\sum_{\boldsymbol{\tau}_\mu} \exp(-H_\mu(\boldsymbol{\tau}_\mu))} \right\rangle_{P_0(\boldsymbol{\tau}_\mu, y_\mu)} , \tag{10} $$

where τ_µ are the bits connected to chip µ, and the chip Hamiltonian is

$$ H_\mu(\boldsymbol{\tau}_\mu) = \frac{1}{2\sigma_0^2} \left( y_\mu - \sum_{i=1}^{\tilde{L}} \xi_i \tau_i \right)^2 , \tag{11} $$

labelling each interacting (non-zero) component on the chip by i, L̃ being the chip connectivity. Working from this description we wish to compare the performance of ensembles with different chip connectivities. To do this we consider the ensemble average mutual information by averaging the mutual information over the connectivities (L̃), load factors, and transmitted bits. This average is complicated; however, it is possible to calculate the dominant terms in the low and high PSD limits.

In the case of low noise (PSD → ∞) we find the asymptotically dominant terms come first from the numerator

$$ \left\langle \log_2 \exp(-H_\mu(\boldsymbol{\tau}_\mu)) \right\rangle = -\left\langle \frac{\omega^2}{2\sigma_0^2} \right\rangle \Big/ \log(2) = -\frac{1}{2\log(2)} , \tag{12} $$

which is an average over the ground state energy, and also from the logarithm of the denominator, which is

$$ \left\langle \log_2 \sum_{\boldsymbol{\tau}_\mu} \exp(-H_\mu(\boldsymbol{\tau}_\mu)) \right\rangle \simeq -\frac{1}{2\log(2)} + \left\langle \log_2 \sum_{\boldsymbol{\tau}_\mu} \delta\left( \sum_{i=1}^{\tilde{L}} \xi_i (b_i - \tau_i) \right) \right\rangle , \tag{13} $$

where y_µ has been decomposed into its bit ({b_i}) and noise (ω) parts, and the averages are now over the ensembles as well as y_µ. The first part of (13) gives an energy contribution cancelling (12). We call the remaining part the average over the chip entropy; by comparison with (10) this determines the amount of information lost in decoding. The chip entropy term contains an indicator function counting the ground states: the average chip entropy is zero when τ_µ = b_µ is the only solution. For the case of BPSK however there may be some degeneracy in ground states, with two terms in the sum being non-zero but cancelling one another. This degeneracy has a dependence on the distribution P(L̃) for given L.
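The ground-state degeneracy just described can be made concrete for a single noiseless chip. The Python sketch below is an illustration under explicit assumptions: gains of equal magnitude (BPSK), unbiased bits and zero noise. In that setting, if p of the L̃ products ξ_i b_i are positive, the number of bit vectors τ producing the same chip value is the binomial coefficient C(L̃, p), so the information retained is L̃ minus the average of log2 of that count. The function names and the truncation of the Poisson sum are illustrative choices.

```python
from math import comb, exp, factorial, log2

def chip_information(L_chip):
    """Bits of information about the L_chip connected bits carried by one
    noiseless chip with equal-magnitude gains and unbiased bits."""
    total = 2 ** L_chip
    # Average log2(number of degenerate ground states) over sent bit patterns.
    lost = sum(comb(L_chip, p) / total * log2(comb(L_chip, p))
               for p in range(L_chip + 1))
    return L_chip - lost

def poisson_average(mean, max_degree=60):
    """Average chip_information over a Poisson chip connectivity."""
    return sum(exp(-mean) * mean ** d / factorial(d) * chip_information(d)
               for d in range(max_degree + 1))
```

Comparing `chip_information(3)` with `poisson_average(3.0)` gives a quick numerical check of how fixing the chip connectivity compares with a Poisson connectivity of the same mean in this single-chip, zero-noise setting.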
Averaging over load factors and transmitted bits we find that in the zero noise limit

$$ I(\boldsymbol{\tau}; y_\mu) = \left\langle \tilde{L} - \left\langle \log_2 \sum_{\boldsymbol{\tau}_\mu} \delta\left( \sum_{i=1}^{\tilde{L}} \xi_i (b_i - \tau_i) \right) \right\rangle \right\rangle_{P(\tilde{L})} \tag{14} $$

$$ = \left\langle \tilde{L} - 2^{-\tilde{L}} \sum_{p=0}^{\tilde{L}} \binom{\tilde{L}}{p} \log_2 \sum_{k=0}^{\min(p,\tilde{L}-p)} \binom{p}{k} \binom{\tilde{L}-p}{k} \right\rangle_{P(\tilde{L})} . \tag{15} $$

By numerical evaluation of this function (see results section 4.2) we find that the optimal ensemble is in fact the regular ensemble. This is because chip entropy, when averaged over load factors and bits is a concave function in L̃, so that the information loss is minimised when $P(\tilde{L}) = \delta_{\tilde{L},L}$. This dependency on L̃ may be a peculiarity of the detector considered, but many other aspects of the calculation may be generalised to give a similar result.

It is possible to consider the opposite limit $\sigma_0^2 \to \infty$ perturbatively. We found that the leading four orders in $1/\sigma_0$ were identical for all code ensembles of the same mean chip connectivity. We would anticipate the behaviour at non-extreme PSD to fall somewhere between these two regimes, and thus the chip regular ensemble to be at least as good as the chip irregular ensembles.

We note here that another reason for considering the regular code optimal amongst sparse random codes is to consider the field term when the Hamiltonian (11) is written in canonical form with a set of couplings ({J〈ij〉}) and user specific external fields ({h_i}). In this representation the set of external fields are in expectation aligned with the sent bit sequence, but subject to fluctuations for each code instance. The variance of these fluctuations may be shown to be proportional to the excess chip connectivity over the true chip connectivity [14], which amongst all ensembles is minimised by the regular chip ensemble. The multi-user interference is larger in irregular codes and hence information recovery is weaker, as predicted in this section.†

3.2. Replica Method Outline

To determine the static properties of our model defined in section 2, including correlations due to the full interaction structure, we use the replica method.
From the expression of the Hamiltonian (3) we may identify a free energy and partition function as:

$$ f = -\frac{1}{\beta N} \ln Z , \qquad Z = \mathrm{Tr}_{\boldsymbol{\tau}} \exp(-\beta H(\boldsymbol{\tau})) . $$

† This argument has been added since the published version.

To progress we make use of the anticipated self-averaging properties of the system, the assumption being that in the large system limit any two randomly selected instances will, with high probability, have indistinguishable statistical properties. This assumption has firm foundation in several related problems [15], and is furthermore intuitive after some reflection. If this assumption is true then the statistics of any particular instance can be described completely by the free energy averaged over all instances of the disorder. We are thus interested in the quantity

$$ F = \langle f \rangle = -\lim_{N\to\infty} \frac{1}{\beta N} \langle \ln Z \rangle_I , \tag{16} $$

where the angled brackets represent the weighted averages over I (the instances). The entropy density may be calculated from the free energy density by use of the relation

$$ s = \beta (e - f) , \tag{17} $$

where e is the energy density.

To determine the free energy we must average over disorder in (16), which is a difficult problem except in special cases. This is why we make use of the replica identity

$$ \langle \ln Z \rangle_I = \lim_{n\to 0} \frac{\partial}{\partial n} \langle Z^n \rangle_I . \tag{18} $$

We can now model the system as one of interacting replicas, where Z^n is decomposed as a product of an integer number of partition functions with conditionally independent (given the instance of the disorder) dynamical variables. The discreteness of replicas is essential in the first part of the calculation, but a continuation to the real numbers is required in taking n → 0+. This is a notorious assumption, which rigorous mathematics cannot yet justify for the general case, in spite of the progress made in recent years [16, 17, 18]. However, we shall assume validity, and since the methodology for sparse structures is well established [19, 20, 15] we omit our particular details.
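The replica identity itself can be checked numerically on a toy example that has nothing to do with CDMA. The sketch below assumes a sample of positive "partition functions" Z and uses the finite-difference form (〈Z^n〉 − 1)/n, which is one common way of writing the n → 0 limit since 〈Z^0〉 = 1. The log-normal toy distribution and the function name are illustrative assumptions; the sketch does not address the analytic continuation from integer n used in the actual calculation.

```python
import numpy as np

def replica_estimate(Z, n):
    """Finite-n estimate of <ln Z> from the moment <Z**n>."""
    return (np.mean(Z ** n) - 1.0) / n

# Toy disorder: Z is log-normal, so <ln Z> is simply the mean of the exponent.
rng = np.random.default_rng(0)
Z = np.exp(rng.normal(size=10_000))
direct = np.mean(np.log(Z))
for n in (1.0, 0.1, 1e-4):
    print(n, replica_estimate(Z, n), direct)
```

As n decreases, the moment-based estimate approaches the directly averaged logarithm, which is the content of the identity.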
The final functional form for the free energy is determined from

$$ \langle Z^n \rangle = \int \prod_{b,\sigma} \left[ dP(b,\sigma)\, d\hat{P}(b,\sigma) \right] \exp\{ \ln \mathcal{N} + N ( G_1(n) + G_2(n) + G_3(n) ) \} ; \tag{19a} $$

G1(n) = ln λ2α/2 P (b,σ) λα(b− τα) P (L̃) ; (19b)

G2(n) = P (b,σ)P̂ (b,σ) ; (19c)

G3(n) = α ln (−L)C̃ P̂ (b, τ ) P (C̃) P0(b) ; (19d)

where $\mathcal{N}$ is a constant due to normalising the ensembles (6). This expression may be evaluated at the saddle point to give an expression for the free energy. In the term (19d) we account for the cases in which the marginalised probability distribution P0(b) and assumed marginal probability distribution (described by H) are asymmetric. In the case of maximal rate, which we will consider, the b average is trivial and H = 0. Provided that in addition the gain factor distribution is symmetric, it is possible to remove the b dependence in the order parameters, since the symmetry P(b,σ) = P(−b,−σ) and P̂(b,σ) = P̂(−b,−σ) leaves the free energy invariant.

3.3. Replica Symmetric Equations

The concise form for our equations is attained using the assumption of replica symmetry (RS). This amounts to the assumption that the correlations amongst replicas are all identical, and determined by a unique shared distribution. The validity of this assumption may be self consistently tested (section 3.5). This assumption differs from that used by Yoshida and Tanaka [2], where the correlations are described by only a handful of parameters rather than a distribution once RS is assumed; that approach may therefore miss some of the detailed structure, although it is easier to handle numerically. The order parameter in our case is given by

$$ P(b, \boldsymbol{\tau}) = \int d\pi(x) \prod_{\alpha=1}^{n} (1 + b \tau_\alpha x) ; \tag{20a} $$

$$ \hat{P}(b, \boldsymbol{\tau}) = \hat{q} \int d\hat{\pi}(\hat{x}) \prod_{\alpha=1}^{n} (1 + b \tau_\alpha \hat{x}) ; \tag{20b} $$

where q̂ is a variational normalisation constant and π, π̂ are normalised distributions on the interval [−1, 1]. From here onwards we may consider the case in which the bit variables τ_α and gain factors ξ are gauged to b (τb → τ, ξb → ξ).
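Using Laplace’s method, this gives the following expression for the (RS) free energy at the saddle point

FRS = − Extrπ,π̂ G1,RS(L̃)(n) + G2,RS(n) + G3,RS(C̃)(n)

where

G1,RS(n) = − L ln 2 [dπ(xl)] ln Tr{τl=±1}χL̃(τ ; {ξ}, ω, {x}) Ω(ω),φ(ξ) P (L̃) ; (22a)

$$ \chi_{\tilde{L}}(\boldsymbol{\tau}; \{\xi\}, \omega, \{x\}) = \exp\left( -\frac{\beta}{2\sigma_0^2} \left( \omega + \sum_{l=1}^{\tilde{L}} (1 - \tau_l)\xi_l \right)^2 \right) \prod_{l=1}^{\tilde{L}} (1 + \tau_l x_l) ; \tag{22b} $$

$$ G_{2,RS}(n) = -L \int d\pi(x)\, d\hat{\pi}(\hat{x}) \ln(1 + x\hat{x}) ; \tag{22c} $$

$$ G_{3,RS}(n) = \alpha \left\langle \int \prod_{c=1}^{\tilde{C}} [d\hat{\pi}(\hat{x}_c)] \ln\left( \prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) + \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c) \right) \right\rangle_{P(\tilde{C})} , \tag{22d} $$

and the saddle point value for ŵ (= L) has been introduced. The averages over L̃ and C̃ encapsulate the differences amongst the ensembles. Equation (22b) describes the interaction at a single chip in the factor graph (figure 1) of connectivity L̃. The parameter ξ_l and variable τ are the gain factors and reconstructed bits respectively, both gauged to the transmitted bit, while ω is the instance of the chip noise. The variational distributions {π, π̂} are chosen so as to extremise (21). The self consistent equations attained by the saddle point method are:

$$ \hat{\pi}(\hat{x}) = \left\langle \int \prod_{l=1}^{\tilde{L}} [d\pi(x_l)] \left\langle \delta\left( \hat{x} - \frac{\mathrm{Tr}_{\{\tau_l=\pm1\}}\, \tau_{\tilde{L}+1}\, \bar{\chi}_{\tilde{L}}(\boldsymbol{\tau}; \{\xi\}, \omega, \{x\})}{\mathrm{Tr}_{\{\tau_l=\pm1\}}\, \bar{\chi}_{\tilde{L}}(\boldsymbol{\tau}; \{\xi\}, \omega, \{x\})} \right) \right\rangle_{\{\xi\},\omega} \right\rangle_{P(\tilde{L})} \tag{23a} $$

$$ \bar{\chi}_{\tilde{L}}(\boldsymbol{\tau}; \{\xi\}, \omega, \{x\}) = \exp\left( -\frac{\beta}{2\sigma_0^2} \left( \omega + \sum_{l=1}^{\tilde{L}+1} (1 - \tau_l)\xi_l \right)^2 \right) \prod_{l=1}^{\tilde{L}} (1 + \tau_l x_l) \tag{23b} $$

$$ \pi(x) = \left\langle \int \prod_{c=1}^{\tilde{C}} [d\hat{\pi}(\hat{x}_c)]\, \delta\left( x - \frac{\prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) - \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c)}{\prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) + \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c)} \right) \right\rangle_{P(\tilde{C})} . \tag{23c} $$

The distributions P(L̃) and P(C̃) are here the excess degree distributions of the particular ensemble (6). For regularly constrained ensembles the chip and user excesses are L − 1 and C − 1 respectively. For Poissonian distributions the excess degree distribution is the full degree distribution.

Aside from entropy, the other quantities of interest may be determined from the probability distribution for the overlap of reconstructed and sent variables m_k = 〈τ_k〉,

$$ P(m) = \lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} \delta_{m_k, m} , \tag{24} $$

$$ = \left\langle \int \prod_{c=1}^{\tilde{C}} [d\hat{\pi}(\hat{x}_c)]\, \delta\left( m - \frac{\prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) - \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c)}{\prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) + \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c)} \right) \right\rangle_{P(\tilde{C})} . $$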
(25)

We note finally that expressions equivalent to those found with the RS assumption may be obtained by using the cavity method [6] with the assumption of a single pure state. This approach is a probabilistic one and hence more intuitive on some levels.

3.4. Population Dynamics

Analysis of these equations is primarily constrained by the nature of equations (23a-23c). No exact solutions are apparent, and perturbative regimes about the ferromagnetic solution (which is only a solution for zero noise) are difficult to handle. Consequently we use population dynamics [21]: the distributions {π(x), π̂(x̂)} are represented by finite populations (histograms), which are iterated until convergence. It is hoped, and observed, that each histogram captures sufficient detail to describe the continuous function, and that the dynamics (described below) allow convergence towards a true solution distribution with only small corrections due to finite size effects.

To solve the equations (23a,23c) with population dynamics, finite histograms constructed from M undirected cavity magnetisations are used. Histograms approximating each function are formed

$$ \pi(x) \to W = \{x_1, \ldots, x_i, \ldots, x_M\} , \tag{27a} $$

$$ \hat{\pi}(\hat{x}) \to \hat{W} = \{\hat{x}_1, \ldots, \hat{x}_a, \ldots, \hat{x}_M\} , \tag{27b} $$

with M sufficiently large to provide good resolution in the desired performance measures. The discrete minimisation dynamics of the histograms is derived from (23a-23c). Histogram updates are undertaken alternately, with all magnetisations in the histogram being updated sequentially. In the update of field x̂_a the quenched parameters {L̃, ω, ξ} are sampled, L̃ being the chip excess degree, and L̃ magnetisations are randomly chosen from W, defining through (23a) the update

$$ \hat{x}_a = \frac{\mathrm{Tr}_{\{\tau_l=\pm1\}}\, \tau_{\tilde{L}+1}\, \bar{\chi}_{\tilde{L}}(\boldsymbol{\tau}; \{\xi\}, \omega, \{x\})}{\mathrm{Tr}_{\{\tau_l=\pm1\}}\, \bar{\chi}_{\tilde{L}}(\boldsymbol{\tau}; \{\xi\}, \omega, \{x\})} . $$
(28)

The update of the other histogram follows dynamics in which C̃ is sampled, C̃ being the user excess degree, along with C̃ randomly chosen magnetisations from Ŵ, defining through (23c) the update

$$ x_i = \frac{\prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) - \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c)}{\prod_{c=1}^{\tilde{C}} (1 + \hat{x}_c) + \prod_{c=1}^{\tilde{C}} (1 - \hat{x}_c)} . \tag{29} $$

There is a strong analogy between the population dynamics algorithm and that of message passing on a particular instance of the graph. The iteration of the histograms implicit in (28-29) is analogous to the propagation of a population of cavity magnetisations between factor (a) and user (i) nodes, which may be written as the self consistent equations:

$$ \hat{x}_{a\to i} = \frac{1}{\mathcal{N}_{\hat{x}}} \mathrm{Tr}_{\{\tau_l=\pm1\}}\, \tau_i \exp\left( -\frac{\beta}{2\sigma_0^2} \left( \omega_a + \sum_{l\in\partial a} (1 - \tau_l)\xi_{al} \right)^2 \right) \prod_{l\in\partial a\setminus i} (1 + \tau_l x_{l\to a}) ; \tag{30a} $$

$$ x_{i\to a} = \frac{1}{\mathcal{N}_{x}} \left[ \prod_{c\in\partial i\setminus a} (1 + \hat{x}_{c\to i}) - \prod_{c\in\partial i\setminus a} (1 - \hat{x}_{c\to i}) \right] ; \tag{30b} $$

where $\mathcal{N}_{x,\hat{x}}$ are the relevant normalisations, and the abbreviation ∂y indicates the set of nodes connected to y. In population dynamics, however, the notion of a particular graph with labelled edges is absent, and only the distributions of the two types of magnetisations are relevant.

3.5. Stability Analysis

To test the stability of the obtained solutions we consider both the appearance of non-negative entropy, and a stability parameter defined through a consideration of the fluctuation dissipation theorem. The first criterion, that the entropy be non-negative, is based on the fact that physically viable solutions in discrete systems must have non-negative entropy, so that any solution found not meeting this criterion must be based on bad premises; replica symmetry is a likely source. The stability parameter λ is defined in connection with the cavity method for spin glasses [22] and tests the local stability of the solutions. It is equivalent to testing the local stability of the belief propagation equations as proposed in [23]. A necessary condition for the stability of the RS solution is that the corresponding susceptibility does not diverge.
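The alternating histogram updates of section 3.4 can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the implementation used for the results: a regular ensemble (excess degrees L − 1 and C − 1), BPSK gains ±1/√L, Gaussian noise of variance σ0², β = 1, variables gauged to the sent bits, and a chip weight of the gauged Gaussian form exp(−(ω + Σ_l (1 − τ_l)ξ_l)²/(2σ0²)) Π_l (1 + τ_l x_l). The function names, population size and number of sweeps are hypothetical choices.

```python
import itertools
import numpy as np

def user_update(x_hat):
    """User-side update: combine incoming chip magnetisations."""
    plus = np.prod(1.0 + x_hat)
    minus = np.prod(1.0 - x_hat)
    return (plus - minus) / (plus + minus)  # undefined if plus + minus == 0

def chip_update(x, xi, omega, sigma0):
    """Chip-side update by exhaustive trace over the bits on one chip.

    x  : magnetisations of the other users on the chip (excess degree)
    xi : gains of all users on the chip, with the target user last
    """
    num = den = 0.0
    for tau in itertools.product([-1.0, 1.0], repeat=len(xi)):
        tau = np.array(tau)
        field = omega + np.sum((1.0 - tau) * xi)
        weight = (np.exp(-field ** 2 / (2.0 * sigma0 ** 2))
                  * np.prod(1.0 + tau[:-1] * x))
        num += tau[-1] * weight
        den += weight
    return num / den

def population_dynamics(C, L, sigma0, M=200, sweeps=20, ferro=False, seed=0):
    """Iterate the two histograms W and W_hat for a regular ensemble."""
    rng = np.random.default_rng(seed)
    W = np.ones(M) if ferro else rng.uniform(-1.0, 1.0, size=M)
    W_hat = np.zeros(M)
    for _ in range(sweeps):
        for a in range(M):  # update every chip-side magnetisation
            x = rng.choice(W, size=L - 1)
            xi = rng.choice([-1.0, 1.0], size=L) / np.sqrt(L)
            omega = rng.normal(0.0, sigma0)
            W_hat[a] = chip_update(x, xi, omega, sigma0)
        for i in range(M):  # update every user-side magnetisation
            W[i] = user_update(rng.choice(W_hat, size=C - 1))
    return W, W_hat
```

Running the function once with `ferro=True` and once with `ferro=False` gives the two initialisations whose agreement can be used as a halting test. The exhaustive trace costs 2^L operations per update, which is acceptable only for small L.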
This condition ensures that fields are not strongly correlated. The spin glass susceptibility, when averaged over instances, may be defined as

\[ \chi_{SG} = \sum_{d} X_d \left\langle \langle \tau_0 \tau_d \rangle_c^2 \right\rangle , \tag{31} \]

where d is the distance between two nodes in the factor graph, the inner average denotes the connected correlation function between these nodes, X_d describes the typical number of variables at distance d, and the outer average is over instances of the disorder (self-averaging part). This quantity is not divergent provided that

\[ \lambda = \lim_{d\to\infty} \frac{1}{d} \ln \left[ X_d \left\langle \langle \tau_0 \tau_d \rangle_c^2 \right\rangle \right] \tag{32} \]

is negative, since this indicates an asymptotically exponential decrease in the terms of (31) and hence convergence of the sum. In the thermodynamic limit the connected correlation function is dominated by a single direct path, which may be decomposed as a chain of local linear susceptibilities

\[ \langle \tau_0 \tau_d \rangle_c \propto \prod_{(i,j)} \frac{\partial x_{i\to a}}{\partial \hat{x}_{b\to i}} \frac{\partial \hat{x}_{b\to i}}{\partial x_{j\to b}} , \tag{33} \]

where (i,j) indicates the set of variables on the shortest path between nodes 0 and d in a particular instance of the graph (30a). This representation allows us to construct an estimation for λ numerically, based on principles similar to population dynamics [24]: the directedness and fixed structure implicit in a particular problem are removed with the self-averaging assumption, leaving a functional description similar to (23a-23c), which may be iterated.

In order to approximate the stability parameter λ one introduces additional positive numbers in the population dynamics histograms (27a,27b), xi → {xi, vi} and x̂a → {x̂a, v̂a} respectively. These new values represent the relative sizes of perturbations in each magnetisation, and are updated in parallel to (28,29) as

\[ \hat{v}_a = \cdots , \tag{34} \]

and with similar assignments for the field update of W (35). The partial derivatives are calculated from (28-29) and evaluated at the corresponding values in the sampled population.
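As an illustration of the last point, the user update (29) and its partial derivatives have a simple closed form. The following Python sketch is illustrative only and is not the implementation used to produce the results; the function names are hypothetical, and the derivative formula assumes every sampled chip magnetisation satisfies |x̂c| < 1.

```python
import math
import random


def user_update(x_hat):
    """Update (29): combine sampled chip magnetisations into one user magnetisation."""
    plus = math.prod(1.0 + v for v in x_hat)
    minus = math.prod(1.0 - v for v in x_hat)
    return (plus - minus) / (plus + minus)


def user_update_derivatives(x_hat):
    """Partial derivatives dx/dx_hat_c of (29), assuming |x_hat_c| < 1 for all c.

    Differentiating (29) gives dx/dx_hat_c = (1 - x**2) / (1 - x_hat_c**2).
    """
    x = user_update(x_hat)
    return [(1.0 - x * x) / (1.0 - v * v) for v in x_hat]


def resample_user_field(w_hat, excess_degree, rng=random):
    """Draw `excess_degree` entries of the chip histogram at random and apply (29)."""
    return user_update([rng.choice(w_hat) for _ in range(excess_degree)])
```

For example, `user_update([0.5, 0.5])` evaluates (29) for two incoming magnetisations, and `user_update_derivatives([0.5, 0.5])` returns the local susceptibilities that would enter a perturbation update of the kind described above.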
If the final fixed point is stable against small perturbations in the initial field then these values {v, v̂} must decay exponentially on average. Renormalisation of {vi} and {v̂a} such that the mean is 1 after each update is necessary. The numerical renormalisation constant for each population yields (dependent) estimations of λ, which can be sampled at a suitable convergence time (end of the {W, Ŵ} minimisation process). As with population dynamics, we expect behaviour to be sensitive to initialisation conditions and finite size effects in some circumstances. In addition, the estimation requires good resolution in the histograms W and Ŵ.

4. Results

Results are presented here for the canonical case of Binary Phase Shift Keying (BPSK) where ξl ∈ {1,−1} with equal probability. Furthermore, we assume an AWGN model for the true noise ω (of variance σ0²). For evaluation purposes we assume the channel noise level is known precisely, so that β = 1, employing the Nishimori temperature [5]. This guarantees that the RS solution is thermodynamically dominant. Furthermore, the energy takes a constant value at the Nishimori temperature and hence the entropy is affine to the free energy. Where of interest we plot the comparable statistics for the Single User Gaussian channel (SUG) and the densely spread ensemble, each with MPM detectors, which are equivalent to maximum likelihood for individual bits.

For population dynamics two parallel populations (27a,27b) are initialised either uniformly at random, or in the ferromagnetic state. These two populations are known to converge towards the unique solution, where one exists, from opposite directions, and so we can use their convergence as a criterion for halting the algorithm and testing for the appearance of multiple solutions.
In the case where they converge to different solutions we can usually identify the solution converged to from the ferromagnetic initial state as a good solution, in the sense that it reconstructs well, and that arrived at from the random initial state as a bad solution. In the equivalent belief propagation algorithm one cannot choose initial conditions equivalent to ferromagnetic, since knowing the exact solution would of course make the decoding redundant. We therefore expect the properties of the bad solution to be those realisable by belief propagation (though clever algorithms may be able to escape to the good solution under some circumstances). The stability variables {v, v̂} were each initialised independently as the square of a value drawn from a Gaussian distribution, and tests indicated that other reasonable distributions produced similar results.

Computer resources restrict the cases studied in detail to an intermediate PSD regime, and small L. In particular, the problem at low PSD is the Gaussian noise average, which is poorly estimated, while at high PSD a majority of the histogram is concentrated at magnetisations x, x̂ ≈ 1, not allowing sufficient resolution in the rest of the histogram.

Several different measures are calculated from the converged order parameter, indicating the performance of sparsely-spread CDMA. Using the converged histograms for the fields we are able to determine the following quantities: free energy, energy and a histogram for the probability distribution, from discretisations of the previously presented equations (23a-23c). Using the probability distribution we are also able to approximate the decoding bit error rate

\[ P_b = \int dP(m)\, \frac{1 - \mathrm{sign}(m)}{2} ; \tag{36} \]

multi-user efficiency MuE = erfc−1(Pb) ; (37)

and mutual information between sent and reconstructed bits per chip, I(b; τ)/N (taking a factorised form given the RS assumption)

\[ \mathrm{MI} = \alpha \int dP(m) \sum_{\tau=\pm 1} \frac{1 + \tau m}{2} \log_2 (1 + \tau m) . \tag{38} \]

The spectral efficiency is the capacity I(τ; y) per chip, which is affine to the entropy (and the free energy at the Nishimori temperature)

\[ \nu = \alpha - s / \ln 2 . \tag{39} \]

Negative entropy can be identified when the measured spectral efficiency exceeds the load, and thermodynamic transition points correspond to points of coincident spectral efficiency.

Figure 2‡ demonstrates some general properties of the regular ensemble in which the variable and factor degree connectivities are C : L = 3 : 3, respectively. Equations (23a-23c) were iterated using population dynamics and the relevant properties were calculated using the obtained solutions; the data presented is averaged over 100 runs and error bars, which are typically small, are omitted for brevity.

Figure 2(a) shows the bit error rate in regular and Poissonian codes; the inset focuses on the range where the sparse-regular and dense cases cross over. The sparse codes demonstrate similar trends to the dense case, except the irregular code, which shows weaker performance in general, and in particular at high PSD. Detailed trends can be seen in figure 2(b), which shows the multiuser efficiency. Codes with regular user connectivity show superior performance with respect to the dense case at low PSD. Figure 2(c) shows similar trends in the spectral efficiency and mutual information (shown in the inset); the effect of the disconnected (user) component is clear in the fact that the irregular code fails to reach capacity at high noise levels. In general it appears that the chip connectivity distribution is not critical in changing the trends present, unlike the user connectivity distribution.

It was found in these cases (and all cases with unique solutions for given PSD) that the algorithm converged to non-negative entropy values and to a stability measure fluctuating about a value less than 0, as shown in figure 2(d). These points would indicate the suitability of the RS assumption.
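The bit error rate (36) and the spectral efficiency (39) are simple functionals of the converged quantities. A minimal Python sketch follows, assuming the magnetisation distribution P(m) is available as a list of samples in the gauge where the transmitted bit is +1 and that the entropy s is measured in nats per chip; the function names are hypothetical and the code is illustrative rather than the implementation used for the figures.

```python
import math


def bit_error_rate(magnetisations):
    """Estimate (36): the average of (1 - sign(m)) / 2 over samples of m."""
    total = 0.0
    for m in magnetisations:
        if m < 0:
            total += 1.0   # sign(m) = -1: the bit is decoded incorrectly
        elif m == 0:
            total += 0.5   # sign(0) = 0 contributes one half
    return total / len(magnetisations)


def spectral_efficiency(alpha, entropy_nats):
    """Equation (39): nu = alpha - s / ln 2."""
    return alpha - entropy_nats / math.log(2)


def has_negative_entropy(alpha, nu):
    """Negative entropy shows up as a spectral efficiency exceeding the load."""
    return nu > alpha
```

For instance, `has_negative_entropy(2.0, spectral_efficiency(2.0, -0.05))` is true, which is the situation described above in which the measured spectral efficiency exceeds the load.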
‡ This figure has been modified from the published version, the difference being that the Poissonian chip connectivity codes have everywhere weaker performance than the dense and sparse regular code ensemble.

The outperformance of dense codes by sparse ensembles with regular user connectivity in the low PSD regime is new to our knowledge, although Poissonian chip connectivity is everywhere inferior to both the dense and regular sparse codes. The difference between codes disappears rapidly with increasing (connection) density at fixed α (figure 3). This is in line with our prediction of the regular code being a high performance ensemble in preceding sections.

Figure 3 indicates the effect of increasing density at fixed α in the case of the regular code. As density is increased the statistics of the sparse codes approach those of the dense channel in all ensembles tested. For the irregular ensemble, performance increases monotonically with density at all PSD. The rapid convergence to the dense case performance was elsewhere observed for partly regular ensembles, and ensembles based on a Gaussian prior input [2, 7]. At all densities for which single solutions were found the RS assumption appeared validated in the stability parameter and entropy.

Figure 4 indicates the effect of channel load α on performance. We first explain results for codes in which only a single solution was found (no solution coexistence). For small values of the load a monotonic increase in the bit error rate and in the capacity is observed as α is increased with C constant, as shown in figures 4(a) and 4(b), respectively. This matches the trend in the dense case, the dense code becoming superior in performance to the sparse codes as PSD increases.
We found that for all sparse ensembles there existed regimes with α > 1.49 for which only a single stable solution existed, although the equivalent dense systems are known to have two stable solutions in some range of PSD [3]. In all single valued regimes we observed positive entropy and a negative stability parameter. However, in cases of large α many features became more pronounced close to the dense case solution coexistence regime: notably the cusp in the stability parameter, the gap between MI and ν, and the gradient in Pb.

4.1. Solution Coexistence Regimes

As in the case of dense CDMA [3], here too we observe a regime where two solutions, of quite different performance, coexist. In order to investigate this regime we investigated the states arrived at from random and ferromagnetic initial conditions (giving bad and good solutions respectively). Separate heuristic convergence criteria were found for the histograms, and these seemed to work well for the good solution. For the bad solution we simply present results after a fixed number of histogram updates (500), as all convergence criteria tested either appeared too stringent, required experimentally inaccessible timescales, or did not capture the asymptotic values for important quantities like entropy. We believe, however, that 500 updates is sufficiently conservative to capture the properties of these solutions.

Figure 2. [Four panels (a)-(d); horizontal axis: Spectral Power Density [dB]; legend entries: Irreg. P. Reg. Dense, with Reg. Type 1 and Reg. Type 2 in the last panel.] Performance of the sparse CDMA configuration of variable and factor degree connectivities C : L = 3 : 3, respectively. All data are presented on the basis of 100 runs; error bars are omitted and are typically small in subfigures (a)-(c), the smoothness of the curves being characteristic of this level (numerical accuracy was excellent only at intermediate PSDs). (a) The bit error rate is limited by the disconnected component in the case of irregular codes; otherwise trends match the dense case, lower bounded by the SUG. Inset: the range where the sparse-regular and dense cases cross over. (b) Multiuser efficiency indicates that the regular user connectivity codes outperform the dense case below some PSD. (c) The spectral efficiency (solid line) demonstrates similar trends, the entropy being positive. The gap between the mutual information (dotted line) and spectral efficiency (shown in the inset) is everywhere small, and especially so at small and large PSD, indicating little information loss in the decoding process. (d) The two markers show the mean results for the two different stability estimates in the algorithm for the regular code. There are systematic errors at small PSD, and convergence is good only at intermediate PSD. The lines represent the average of these quantities for each ensemble. All ensembles show a cusp at some PSD; for 3 : 3 codes the various ensembles show very similar trends, indicating local stability everywhere.

Figure 3. [Two panels (a)-(b); horizontal axis: Spectral Power Density [dB]; legend entry: Dense.] The effect of increasing density for the regular ensemble: (a) multiuser efficiency, (b) spectral efficiency (solid line) and mutual information (dashed line). Data are presented on the basis of 10 runs; error bars are omitted but are of a size comparable with the smoothness of the curves. The performance of sparse codes rapidly approaches that of the dense code everywhere. The PSD threshold beyond which the dense code outperforms the sparse code is fairly stable.

Figure 4(a) shows the dependence of the bit error rate on the load, which is also equivalent to L/C. There is a monotonic increase in bit error rate with the load, and the emergence and coexistence of two separate solutions above a certain point; in the case of the 6 : 3 code the point above which the two solutions coexist is PSD = 10.23dB, as indicated by the vertical dotted line.

We use the regular code 6 : 3 to demonstrate the solution coexistence found above some PSD in various codes. The onset of the bimodal distribution can be identified by the divergence in the convergence time in the single solution regime (the time for the ferromagnetic and random histograms to converge to a common distribution). The time for this to occur, in a heuristically chosen statistic and accuracy, is plotted in figure 4(b). By a naive linear regression across 3 decades we found a power law exponent of 0.59 and a transition point of PSD = 10.23dB, but cannot provide a goodness of fit measure to this data. This would represent the point at which at least two stable solutions coexist. Beyond PSD ≈ 12dB only one stable solution is found from both random and ferromagnetic initial conditions, corresponding statistically to a continuation of the good solution. A solution which statistically resembles a continuation of the bad solution is occasionally arrived at from both initial conditions; this solution had a positive stability parameter and negative entropy, so is not a viable solution. Thus we predict a second dynamical transition in the region of 12dB, as might be guessed by comparison with the dense case and observation of the trend in the stability parameter (see figure 4(c)).

The stability results are presented in figure 4(c). Only two stable solutions were found in the region beyond this critical point and up to 12dB, which we infer to be viable RS solutions (where entropy is positive). The bad solution up to 12dB has a well resolved negative value.
The good solution has a negative value in its mean but, as for other near ferromagnetic solutions investigated, results are very noisy due to numerical issues relating to histogram resolution.

Figure 4. [Four panels (a)-(d); horizontal axis: Spectral Power Density [dB]; legend entries: 6:3 (Bad), 6:3 (Good); panel (b) legend: Data, Mean, Linear fit, Bounds.] The effect of channel load α on performance for the regular ensemble. Data are presented on the basis of 10 runs; error bars are omitted but characterised by the smoothness of the curves. Dashed lines indicate the dense code analogues. The vertical dotted line indicates the point beyond which 6 : 3 random and ferromagnetic initial conditions failed to converge to the same solution; both dynamically stable solutions are shown beyond this point. (a) There is a monotonic increase in bit error rate with increasing load. (b) Investigation of the 6 : 3 code (α = 2) indicates a divergence in convergence time as PSD → 10.23dB with exponent 0.59, based on a simple linear regression of 15 points (each point is the mean of 10 independent runs). Beyond this point different initial conditions give rise to one of two solutions. (c) The stability parameter was found to be negative for all convergent solutions, indicating the suitability of RS. Where the solution is near ferromagnetic the stability measure quickly becomes very noisy (as shown for the 5 : 3 and 6 : 3 codes). (d) As load α is increased there is a monotonic increase in capacity. The spectral efficiency for the 'bad' solution exceeds 2 in a small interval (equivalent to negative entropy), similar to the behaviour reported for the dense case.

Both capacity and spectral efficiency monotonically increase with the load as shown in figure 4(d). For the 6 : 3 code we see a separation of the two solutions at PSD = 10.23dB (vertical dotted line). The dashed lines correspond to a similar behaviour observed in the dense case (the range of interest is magnified in the inset). A cross over in the entropy of the two distinct solutions, near PSD ≈ 11dB, is indicative of a second order phase transition. As in the dense case, only the solution of smallest spectral efficiency is thermodynamically relevant at a given PSD, although the other is likely to be important in decoding dynamics. The trends in the sparse case follow the dense case qualitatively, with the good solution having performance only slightly worse than the corresponding solution in the dense case (and vice versa for the bad solution).

The entropy of the bad solution becomes negative in a small interval (spectral efficiency exceeds 2) although no local instability is observed. The static and dynamic properties of the histograms appear to be well resolved in this region. However, the negative entropy indicates an instability towards either a type of solution not captured within the RS assumption, or towards some metastable configuration. We will not speculate further; the bad solution is in any case thermodynamically subdominant in its low and negative entropy form. Our hypothesis is therefore that the trends in the sparse ensembles match those in the dense ensembles within the coexistence region, and that RS continues to be valid for each of two distinct positive entropy solutions. The coexistence region for the sparse codes is, however, smaller than in the corresponding dense ensembles.

Since our histogram updates mirror the properties of a belief propagation algorithm on a random graph, we suspect that the bad solution may have implications for the performance of belief propagation decoding in the coexistence region, and that convergence problems will appear near this region.
In the user regular codes investigated the bad solution of the sparse ensemble outperforms the bad solution of the dense ensemble, and vice versa for the good solution. Thus, regardless of whether sparse decoding performance is good or bad, the dynamical transition point for the dense ensemble would correspond to a PSD beyond which dense CDMA outperforms sparse CDMA at a particular load.

4.2. Spectral Efficiency Lower Bound Numerical Results

Finally we present figure 5, which shows the mutual information between a single chip and transmitted bits for sparse ensembles of differing chip connectivity in the infinite PSD (zero noise) limit (15). This shows that in expectation a chip drawn from the regular ensemble contains more information on the transmitted bits than a chip drawn from any other ensemble (including the Poissonian ensemble). The difference between the regular and Poissonian ensembles becomes relatively smaller as L increases. This appears consistent with the replica method results found at high PSD, although regular chip connectivity underperformed by comparison with Poisson distributed chip connectivity in the low PSD regime, which was not anticipated by the single chip approximation.

Figure 5. [Horizontal axis: Mean Chip Connectivity, L; legend entry: Poissonian.] A PSD → ∞ limit to the expected mutual information between a single chip and the transmitted bits. Mutual information is highest for regular chip connectivities, with the Poissonian chip connectivity result also shown, the discrepancy becoming relatively small as L increases. The inset shows the mutual information per bit decoded (〈I(τ ; yµ)〉 /L) on a log-log plot to demonstrate an asymptotic power law behaviour and show more detail in the cases of small L.

5. Concluding Remarks

Our results demonstrate the feasibility of sparse regular codes for use in CDMA. At moderate PSD it seems the performance of sparse regular codes may be very good.
With the replica symmetric assumption apparently valid at practical PSD, it is likely that fast algorithms based on belief propagation may be very successful in achieving the theoretical results. Furthermore, for lower density sparse codes the problem of the coexistence regime, which limits the performance of practical decoding methods, seems to be less pervasive than for dense ensembles in the oversaturated regime.

A direct evaluation of the properties of belief propagation may yield results similar to those shown here. In the absence of replica symmetry breaking states it is normally true that belief propagation performs very well. However, to make best use of the channel resources it may be preferable to implement high load regimes in cases of high PSD, and so overcoming the algorithmic problems arising from the solution coexistence is a challenge of practical importance in this case.

Other practical issues in implementation are certainly significant. As in the case of dense CDMA there are considerable problems relating to multipath, fading and power control; in fact it is known that these effects are more disruptive for the sparse codes, especially regular codes. However, certain situations such as broadcasting (one to many) channels and downlink CDMA, where synchronisation can be assumed, may be practical points for future implementation. There are practical advantages of the sparse case over dense and orthogonal codes in some regimes. The sparse CDMA method is likely to be particularly useful in frequency-hopping and time-hopping code division multiple access (FH- and TH-CDMA) applications, where the effect of these practical limitations is less emphasised. Extensions based on our method to cases without power control or synchronisation have been attempted and are quite difficult.
A consideration of priors on the inputs, in particular the effects when sparse CDMA is combined with some encoding method may also be interesting. Acknowledgments Support from EVERGROW, IP No. 1935 in FP6 of the EU is gratefully acknowledged. DS would like to thank Ido Kanter for helpful discussions. Bibliography [1] S. Verdu. Multiuser Detection. Cambridge University Press, New York, NY, USA, 1998. [2] M. Yoshida and T. Tanaka. Analysis of sparsely-spread cdma via statistical mechanics. In Proceedings - IEEE International Symposium on Information Theory, 2006., pages 2378–2382, 2006. [3] T. Tanaka. A statistical-mechanics approach to large-system analysis of cdma multiuser detectors. Information Theory, IEEE Transactions on, 48(11):2888–2910, Nov 2002. [4] D. Guo and S. Verdu. Communications, Information and Network Security, chapter Multiuser Detection and Statistical Mechanics, pages 229–277. Kluwer Academic Publishers, 2002. [5] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing. Oxford Science Publications, Oxford, UK, 2001. [6] M. Mezard, G. Parisi, and M.A Virasoro. Spin Glass Theory and Beyond. World Scientific, Singapore, 1987. [7] A. Montanari and D. Tse. Analysis of belief propagation for non-linear problems: The example of cdma (or: How to prove tanaka’s formula). In Proceedings IEEE Workshop on Information Theory, 2006. [8] Y. Kabashima. A statistical-mechanical approach to cdma multiuser detection: propagating beliefs in a densely connected graph. cond-mat/0210535, 2002. [9] J.P Neirotti and D. Saad. Improved message passing for inference in densely connected systems. Europhys. Lett., 71(5):866–872, 2005. [10] A. Montanari, B. Prabhakar, and D. Tse. Belief propagation based multiuser detection. In Proceedings of the Allerton Conference on Communication, Control and Computing, Monticello, USA, 2006. [11] D. Guo and C. Wang. Multiuser detection of sparsely spread cdma. (unpublished), 2007. [12] T. Tanaka and D. Saad. 
A statistical-mechanical analysis of coded cdma with regular ldpc codes. In Proceedings - IEEE International Symposium on Information Theory, 2003., page 444, 2003. [13] D.J. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2004. [14] J. Raymond and D. Saad. Randomness and metastability in cdma paradigms. arXiv:0711.4380, 2007. [15] R. Vicente, D. Saad, and Y. Kabashima. Advances in Imaging and Electron Physics, volume 125, chapter Low Density Parity Check Codes - A statistical Physics Perspective, pages 231–353. Academic Press, 2002. [16] M. Talagrand. The generalized parisi formula. Comptes Rendus Mathematique, 337(2):111–114, 2003. [17] S. Franz, M. Leone, and F.L. Toninelli. Replica bounds for diluted non-poissonian spin systems. Journal of Physics A: Mathematical and General, 36(43):10967–10985, 2003. [18] F. Guerra. Broken Replica Symmetry Bounds in the Mean Field Spin Glass Model. Communications in Mathematical Physics, 233:1–12, 2003. [19] R. Monasson. Optimization problems and replica symmetry breaking in finite connectivity spin glasses. J. Phys. A, 31(2):513–529, 1998. [20] K.Y.M. Wong and D. Sherrington. Graph bipartitioning and spin-glasses on a random network of fixed finite valence. J. Phys. A, 20:L793–99, 1987. [21] M. Mezard and G. Parisi. The bethe lattice spin glass revisited. Euro. Phys. Jour. B, 20(2):217–233, 2001. [22] O. Rivoire, G. Biroli, O.C. Martin, and M. Mézard. Glass models on bethe lattices. Euro. Phys. J. B, 37:55–78, 2004. [23] Y. Kabashima. Propagating beliefs in spin glass models. J. Phys. Soc. Jpn., 72:1645–1649, 2003. [24] J. Raymond, A. Sportiello, and L. Zdeborová. The phase diagram of random 1-in-3 satisfiability problem. Phys. Rev. E., 76(1):011101, 2007.
ABSTRACT

Sparse Code Division Multiple Access (CDMA), a variation on the standard CDMA method in which the spreading (signature) matrix contains only a relatively small number of non-zero elements, is presented and analysed using methods of statistical physics. The analysis provides results on the performance of maximum likelihood decoding for sparse spreading codes in the large system limit. We present results for both cases of regular and irregular spreading matrices for the binary additive white Gaussian noise channel (BIAWGN) with a comparison to the canonical (dense) random spreading code.
<|endoftext|><|startoftext|>
Introduction

In [1], Ando proved the following inequalities for positive semidefinite (PSD) matrices A, B, and any unitarily invariant (UI) norm. For any non-negative operator monotone function f(t) on [0,∞):

|||f(A) − f(B)||| ≤ |||f(|A − B|)|||, (1)

and, when f(0) = 0 and f(∞) = ∞, and g is the inverse function of f,

|||g(A) − g(B)||| ≥ |||g(|A − B|)|||. (2)

In a later paper [2], Ando and Zhan proved the related inequalities (with the same conditions on f and g)

|||f(A) + f(B)||| ≥ |||f(A + B)|||, (3)
|||g(A) + g(B)||| ≤ |||g(A + B)|||. (4)

The conditions on f are satisfied by every operator concave function f with f(0) = 0. Inequality (4) was generalised by Kosem [7] to non-negative convex functions g on [0,∞), with g(0) = 0. Inequality (3) was generalised very recently to any non-negative concave function on [0,∞) by Bourin and Uchiyama [5], who also gave a simpler proof of Kosem's result.

In the same spirit, we consider the question whether inequalities (1) and (2) can also be generalised to non-negative concave f and convex g, respectively.
After introducing the necessary prerequisites in Section 2, we present our main results concerning this question in Section 3. Regrettably, most of our results are negative answers, and we give counterexamples to this generalisation. The answer is negative even for the special case A ≥ B, although the apparent hardness of finding counterexamples had led us temporarily into believing that in that case the generalisation might actually hold.

All is not bad news, however. In Section 4 we answer the question affirmatively in the special case when A ≥ ||B||.

In Section 5, we introduce the novel notion of Y-dominated majorisation between the spectra of two Hermitian matrices, where Y is itself a Hermitian matrix. We prove a certain property of this relation, namely Proposition 3, which we subsequently use, first in a rather destructive fashion. To wit, the Proposition has been instrumental in finally discovering a counterexample to the generalisation of (1) for A ≥ B; this will be reported in Section 6. On the more constructive side, the Proposition also allows us to strengthen the results of Bourin-Uchiyama and Kosem mentioned above. This is the topic of the final Section, along with a few other applications.

2 Preliminaries

In this Section, we introduce the notations and necessary prerequisites; a more detailed exposition can be found, e.g., in [4].

We will use the abbreviations LHS and RHS for left-hand side and right-hand side, respectively. We denote the vector of diagonal entries of a matrix A by Diag(A).

We denote the absolute value by | · |, both for scalars and for matrices. For matrices this is defined as |A| := (A*A)^{1/2}. Similarly, we denote the positive part of a real scalar or Hermitian matrix by (·)_+, and define it by A_+ := (A + |A|)/2.

In this paper, we are mainly concerned with monotonously increasing convex and concave functions from R to R.
Kosem noted in [7] that any such function can be approximated by a sum of angle functions x ↦ ax + b(x − x0)_+, where a ≥ 0, and b > 0 for a convex angle function (b < 0 for a concave one).

We are also concerned with the unitarily invariant (UI) matrix norms, which we denote by ||| · |||, and which are defined in terms of the singular values σj(·) of the matrix only. We adopt the customary convention that the singular values are sorted in non-increasing order: σ1 ≥ σ2 ≥ . . . ≥ σd. Special cases of these norms are the operator norm || · ||, which is just equal to the largest singular value σ1(·), and the Ky Fan norms || · ||(k), which are defined as the sum of the k largest singular values:

\[ ||A||_{(k)} := \sum_{j=1}^{k} \sigma_j(A). \]

The famous Ky Fan dominance theorem states that a matrix B dominates another matrix A in all UI norms if and only if it does so in all Ky Fan norms. The latter set of relations can be written as a weak majorisation relation between the vectors of singular values of A and B:

\[ \sigma(A) \prec_w \sigma(B) : \quad \sum_{j=1}^{k} \sigma_j(A) \le \sum_{j=1}^{k} \sigma_j(B), \quad \forall k. \]

For PSD matrices, the above domination relation translates to a weak majorisation between the vectors of eigenvalues: λ(A) ≺w λ(B).

Weyl's monotonicity Theorem ([4], Corollary III.2.3) states that

\[ \lambda^{\downarrow}_k(A) \le \lambda^{\downarrow}_k(A + B), \quad \forall k, \]

for Hermitian A and positive semi-definite B. Here, λ↓(A) denotes the (real) vector of eigenvalues of A sorted in non-increasing order.

Finally, we refer the reader to Chapter 2 of [6] for an exposition of a number of important functional analytic properties of eigenvalues and corresponding eigenspaces of a Hermitian matrix, which we will need in the proof of Proposition 2.

3 Main Results

The question we start with is about the straightforward generalisation of inequality (2) to non-negative convex functions.

Question 1 For all A, B ≥ 0, for all UI norms, and for non-negative convex functions g on [0,∞) with g(0) = 0, does the inequality |||g(A) − g(B)||| ≥ |||g(|A − B|)||| hold?
The answer to this question is negative, as shown by the following counterexample. We consider the convex angle function $g(x) = x + (x-1)_+$ and the operator norm. For the 2 × 2 PSD matrices
$$A = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.6 \end{pmatrix}, \qquad B = \begin{pmatrix} 0.8 & 0.5 \\ 0.5 & 0.4 \end{pmatrix}, \qquad (5)$$
the eigenvalues of g(|A − B|) are 0.65249 and 0.35249, while those of g(A) − g(B) are 0.65010 and −0.48862. Thus, $\|g(|A-B|)\|_\infty = 0.65249$, which is larger than $\|g(A)-g(B)\|_\infty = 0.65010$. ✷

Under the additional restriction A ≥ B, the absolute value in the argument of g in the right-hand side vanishes, leading to a simplified statement, and a second question, with better hopes for success. Introducing the matrix ∆ = A − B, we ask:

Question 2 For all B, ∆ ≥ 0, for all UI norms, and for non-negative convex functions g on [0,∞) with g(0) = 0, does the inequality |||g(B + ∆) − g(B)||| ≥ |||g(∆)||| hold?

This restricted case also turns out to have a negative answer. Counterexamples, however, were much harder to find, and required a reduction of the problem based on certain results about a novel majorisation-like relation, which we call the Y-dominated majorisation. This will be the subject of Sections 5 and 6, where a number of Propositions of independent interest are proven.

It is also very reasonable to ask:

Question 3 For all B, ∆ ≥ 0, for all UI norms, and for non-negative concave functions f on [0,∞), does the inequality |||f(B + ∆) − f(B)||| ≤ |||f(∆)||| hold?

In fact, if this were true, a positive answer to Question 2 would easily follow, using the same reasoning that was used in [5] to derive the generalisation of (3) from the generalisation of (4). Again, this statement is false, as the following counterexample shows. Consider the concave angle function $f(x) = \min(x, 1) = x - (x-1)_+$, and the 3 × 3 PSD matrices
$$B = \begin{pmatrix} 0.701816 & 0.317887 & 0.198910 \\ 0.317887 & 1.014950 & -0.093826 \\ 0.198910 & -0.093826 & 0.274236 \end{pmatrix}, \qquad \Delta = \begin{pmatrix} 0.192713 & 0 & 0 \\ 0 & 0.446505 & 0 \\ 0 & 0 & 0.455416 \end{pmatrix}.$$
One gets $\|f(\Delta)\|_\infty = 0.455416$ while $\|f(B+\Delta)-f(B)\|_\infty = 0.455776$, so the inequality of Question 3 is violated.
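The 2 × 2 counterexample (5) to Question 1 is small enough to check by direct computation. The following is a minimal sketch in plain Python, not the authors' code. The helper names `eig2`, `apply2`, `sub` and `op_norm` are hypothetical, and `apply2` assumes a real symmetric 2 × 2 matrix with two distinct eigenvalues, which is the case for every matrix used here.

```python
import math

def eig2(m):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius

def apply2(func, m):
    """Matrix function func(m) for a symmetric 2x2 matrix with distinct
    eigenvalues l1 > l2, using the spectral projector
    P1 = (m - l2*I) / (l1 - l2), so that func(m) = f2*I + (f1 - f2)*P1."""
    l1, l2 = eig2(m)
    (a, b), (_, d) = m
    gap = l1 - l2
    p11, p12, p22 = (a - l2) / gap, b / gap, (d - l2) / gap
    f1, f2 = func(l1), func(l2)
    return [[f2 + (f1 - f2) * p11, (f1 - f2) * p12],
            [(f1 - f2) * p12, f2 + (f1 - f2) * p22]]

def sub(x, y):
    """Entrywise difference x - y of two 2x2 matrices."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]

def op_norm(m):
    """Operator norm of a symmetric matrix: largest |eigenvalue|."""
    return max(abs(l) for l in eig2(m))

def g(x):
    """Convex angle function g(x) = x + (x - 1)_+ from the text."""
    return x + max(x - 1.0, 0.0)

A = [[0.9, 0.0], [0.0, 0.6]]
B = [[0.8, 0.5], [0.5, 0.4]]

# LHS of Question 1: || g(A) - g(B) ||
lhs = op_norm(sub(apply2(g, A), apply2(g, B)))
# RHS of Question 1: || g(|A - B|) ||, i.e. x -> g(|x|) applied to A - B
rhs = op_norm(apply2(lambda x: g(abs(x)), sub(A, B)))

print(f"||g(A) - g(B)|| = {lhs:.5f}")
print(f"||g(|A - B|)||  = {rhs:.5f}")
print("Question 1 inequality violated:", lhs < rhs)
```

The projector formula avoids computing eigenvectors explicitly; for larger matrices, such as the 3 × 3 counterexample to Question 3, a general symmetric eigensolver would be needed instead.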
In Section 4, we consider an even more restricted special case, in which the inequalities (1) and (2) finally do hold. We actually prove that a stronger relationship holds in this special case:

Proposition 1 For non-negative, monotonically increasing and concave functions g, and A, B ≥ 0 such that A ≥ ||B||, we have
$$\lambda^\downarrow(g(A-B)) \ge \lambda^\downarrow(g(A)-g(B)). \qquad (6)$$

An easy Corollary is the corresponding statement for monotonically increasing convex functions.

Corollary 1 Let f be a non-negative convex function on [0,∞) with f(0) = 0. Let A, B ≥ 0 such that A ≥ ||B||. Then
$$\lambda^\downarrow(f(A-B)) \le \lambda^\downarrow(f(A)-f(B)). \qquad (7)$$

Proof. Let $f = g^{-1}$, with g satisfying the conditions of Proposition 1. Replace in (6) A by f(A) and B by f(B), yielding
$$\lambda^\downarrow(g(f(A)-f(B))) \ge \lambda^\downarrow(A-B).$$
Applying the function f to both sides does not change the ordering, because of the monotonicity of f, and yields inequality (7). ✷

These two results obviously imply the corresponding majorisation relations and, by Ky Fan dominance, the corresponding relations in any UI norm.

4 Proof of Proposition 1

We want to prove inequality (6):
$$\lambda^\downarrow(g(A)-g(B)) \le \lambda^\downarrow(g(A-B)),$$
for A, B ≥ 0, A ≥ ||B||, and concave, monotonically increasing and non-negative g. W.l.o.g. we will assume ||B|| = 1, since any other value can be absorbed in the definition of g.

It is immediately clear that if (6) holds for those g that in addition satisfy g(0) = 0, then it must also hold without that constraint, i.e. for functions g(x) + c, with c ≥ 0. This is because the additional constant c drops out in the LHS, while $\lambda^\downarrow(g(A-B)+c) \ge \lambda^\downarrow(g(A-B))$. Furthermore, (6) remains valid when replacing g(x) with ag(x), for a > 0. Thus, w.l.o.g. we can assume g(0) = 0 and g(1) = 1. Together with concavity of g, this implies that, for 0 ≤ x ≤ 1, g(x) ≥ x, while for x ≥ 1, the derivative g′(x) ≤ 1.

Since $0 \le B \le \mathbb{1}$, and for 0 ≤ x ≤ 1, g(x) ≥ x holds, we have g(B) ≥ B, or −g(B) ≤ −B. By Weyl monotonicity, this implies $\lambda^\downarrow(g(A)-g(B)) \le \lambda^\downarrow(g(A)-B)$.
Thus, statement (6) would be implied by the stronger statement
$$\lambda^\downarrow(g(A)-B) \le \lambda^\downarrow(g(A-B)). \qquad (8)$$
Now note that the argument of g in the LHS, A, is never below 1. Thus, in principle, we could replace g(x) in the LHS by another function h(x) defined as
$$h(x) = \begin{cases} g(x), & \text{if } x \ge 1, \\ x, & \text{otherwise.} \end{cases} \qquad (9)$$
If we also do that in the RHS, we get a stronger statement than (8). Indeed, h(x) ≤ g(x) for x ≥ 0 and A − B ≥ 0, and therefore h(A − B) ≤ g(A − B) holds. By Weyl monotonicity again, we see that (8) is implied by
$$\lambda^\downarrow(h(A)-B) \le \lambda^\downarrow(h(A-B)). \qquad (10)$$
The importance of this move is that h(x) is still a monotonically increasing and concave function (because g′(x) ≤ 1 for x ≥ 1), but now has gradient h′(x) ≤ 1 for x ≥ 0.

Defining C = A − B, which is positive semi-definite, we now have to show the inequality
$$\lambda^\downarrow_k(h(C+B)-B) \le \lambda^\downarrow_k(h(C)) = h(\lambda^\downarrow_k(C)),$$
for every k. Fixing k, and introducing the shorthand $x_0 = \lambda^\downarrow_k(C)$, we can exploit concavity of h to upper bound it as $h(x) \le a(x-x_0) + h(x_0)$, where $a = h'(x_0) \le 1$. Again by Weyl monotonicity, we find
$$\begin{aligned} \lambda^\downarrow_k(h(C+B)-B) &\le \lambda^\downarrow_k(a(C+B-x_0)+h(x_0)-B) \\ &= \lambda^\downarrow_k(aC+(a-1)B-ax_0+h(x_0)) \\ &\le \lambda^\downarrow_k(aC)-ax_0+h(x_0) = h(x_0), \end{aligned}$$
where in the second line we could remove the term (a − 1)B because it is negative semi-definite. This being true for all k, we have proved (10) and all previous statements that follow from it, including the statement of Proposition 1. ✷

5 On Y-dominated Majorisation

To answer Question 2, we have to consider the property that a convex function f satisfies
$$\lambda(f(\Delta)) \prec_w \lambda(f(B+\Delta)-f(B)) \qquad (11)$$
for all PSD B and ∆, which is equivalent to the statement
$$\lambda(f(A-B)) \prec_w \lambda(f(A)-f(B)) \qquad (12)$$
for all A ≥ B ≥ 0. The monotone convex angle functions $x \mapsto ax + (x-1)_+$ (a ≥ 0) have already proven their worth as a testing ground for similar statements, in Section 3. Numerical experiments using angle functions for inequality (11) did not directly lead to any counterexamples, however.
This temporarily increased our belief that the inequality might actually hold, and led us to investigate, as an initial step towards a “proof”, whether the inequality
$$\sum_{j=1}^k \lambda^\downarrow_j(aY+B) \le \sum_{j=1}^k \lambda^\downarrow_j(aY+C)$$
might be true for all a ≥ 0, where B = f(Y) and C = f(X + Y) − f(X), and $f(x) = (x-1)_+$. The crucial observation is now that if this holds for all a ≥ 0, then, actually, a much stronger relationship than just majorisation must hold between aY + B and aY + C.

To describe this phenomenon, we will consider a somewhat broader setting. Let G and C be Hermitian matrices, and let f1 and f2 be monotonically increasing real functions on R. Suppose that for all a ≥ 0, the following holds:
$$\sum_{j=1}^k \lambda^\downarrow_j(aA+B) \le \sum_{j=1}^k \lambda^\downarrow_j(aA+C), \qquad (13)$$
with A = f1(G) and B = f2(G).

It is easily seen that if (13) holds for a certain value of a, it also holds for all smaller positive values. Let b be a scalar such that 0 ≤ b < a. Because both A and B exhibit their eigenvalues as diagonal elements in the eigenbasis of G, and both in non-increasing order, we get
$$\sum_{j=1}^k \lambda^\downarrow_j(aA+B) = \sum_{j=1}^k \lambda^\downarrow_j(bA+B) + (a-b)\sum_{j=1}^k \lambda^\downarrow_j(A).$$
On the other hand, for aA + C we only have the subadditivity inequality
$$\sum_{j=1}^k \lambda^\downarrow_j(aA+C) \le \sum_{j=1}^k \lambda^\downarrow_j(bA+C) + (a-b)\sum_{j=1}^k \lambda^\downarrow_j(A).$$
As a consequence, we obtain that, indeed,
$$\sum_{j=1}^k \lambda^\downarrow_j(bA+B) \le \sum_{j=1}^k \lambda^\downarrow_j(bA+C)$$
follows from (13).

We are therefore led to consider what happens when a tends to infinity, because that value dominates all others. Subtracting $\sum_{j=1}^k \lambda^\downarrow_j(aA)$ from both sides, and substituting a = 1/t, we obtain
$$\frac{1}{t}\sum_{j=1}^k \left(\lambda^\downarrow_j(A+tB)-\lambda^\downarrow_j(A)\right) \le \frac{1}{t}\sum_{j=1}^k \left(\lambda^\downarrow_j(A+tC)-\lambda^\downarrow_j(A)\right).$$
In the limit of t going to 0, this yields a comparison between derivatives:
$$\frac{\partial}{\partial t}\Big|_{t=0}\sum_{j=1}^k \lambda^\downarrow_j(A+tB) \le \frac{\partial}{\partial t}\Big|_{t=0}\sum_{j=1}^k \lambda^\downarrow_j(A+tC). \qquad (14)$$
We will show below that the derivatives $\frac{\partial}{\partial t}\big|_{t=0}\lambda^\downarrow_j(A+tC)$ are the diagonal elements of C in a certain basis depending on G and C. Let us first introduce the vector δ(C;A) whose entries satisfy the following relation:
$$\sum_{j=1}^k \delta_j(C;A) := \frac{\partial}{\partial t}\Big|_{t=0}\sum_{j=1}^k \lambda^\downarrow_j(A+tC). \qquad (15)$$
With this notation, relation (14) becomes
$$\sum_{j=1}^k \delta_j(B;G) \le \sum_{j=1}^k \delta_j(C;G).$$
That is, the entries of δ(B;G) are “majorised” by those of δ(C;G).
However, this is a much stronger relation than ordinary majorisation, since the rearrangement of the entries in decreasing order is absent.

Introducing the symbol $\prec_{dw}$ for weak majorisation with missing rearrangement:
$$a \prec_{dw} b \iff \sum_{j=1}^k a_j \le \sum_{j=1}^k b_j, \quad \forall k, \qquad (16)$$
relation (14) is expressed as
$$\delta(B;G) \prec_{dw} \delta(C;G). \qquad (17)$$
To justify these notations, we now show:

Proposition 2 Let A and C be Hermitian matrices. With δ(C;A) defined by (15), the entries of the vector δ(C;A) are the diagonal entries of C in a certain basis in which A is diagonal and its diagonal entries appear sorted in non-increasing order. When all eigenvalues of A are simple (i.e. have multiplicity 1), this basis is just the eigenbasis of A and does not depend on C.

Proof. There are three cases to consider, according to whether A is non-degenerate, A + tC has an accidental degeneracy at t = 0, or A + tC is permanently degenerate.

1. The most important case is when all eigenvalues of A are simple, i.e. when they have multiplicity 1. We then show that the derivative is given by
$$\frac{\partial}{\partial t}\Big|_{t=0}\sum_{j=1}^k \lambda^\downarrow_j(A+tC) = \mathrm{Tr}[P_k(A)\,C],$$
where $P_k(A)$ denotes the projector on the subspace spanned by the k eigenvectors of A corresponding to its k largest eigenvalues.

By the simplicity of the eigenvalues of A, the eigenvalues of A + tC are also simple for small enough values of t. This follows easily from Weyl's inequalities:
$$\lambda^\downarrow_j(A) + \lambda^\downarrow_n(tC) \le \lambda^\downarrow_j(A+tC) \le \lambda^\downarrow_j(A) + \lambda^\downarrow_1(tC);$$
thus if t||C|| is strictly less than one half the minimal difference between all pairs of eigenvalues of A, the difference between all pairs of eigenvalues of A + tC is bounded away from 0. Therefore, for small enough t, every eigenvalue of A + tC has a unique eigenvector, and as a result $P_k(A+tC)$ is well-defined as the sum of the projectors on the eigenvectors pertaining to the k largest eigenvalues.

It is well-known that the eigenvalues of A + tC as functions of the real variable t can be so ordered that they are analytic functions of t (see [6], Chapter 2), and hence continuous.
This implies that the k-th largest eigenvalue of A + tC is also a continuous function of t, for any k. If, furthermore, an eigenvalue λ(t) of A + tC is simple in an interval of t, then the projector P(t) on the eigenvector x(t) associated to it (with $P(t) = x(t)x(t)^*$) is also analytic, and therefore continuous in t on this interval. We conclude that $P_k(A+tC)$ is analytic in t, and therefore differentiable.

By the maximality of $P_k(A)$ in the variational characterisation
$$\sum_{j=1}^k \lambda^\downarrow_j(A) = \max_{Q_k} \mathrm{Tr}[A Q_k] = \mathrm{Tr}[A P_k(A)],$$
where $Q_k$ runs over all rank-k projectors, we have
$$\frac{\partial}{\partial t}\Big|_{t=0}\mathrm{Tr}[A P_k(A+tC)] = 0,$$
which implies
$$\begin{aligned} \frac{\partial}{\partial t}\Big|_{t=0}\sum_{j=1}^k \lambda^\downarrow_j(A+tC) &= \frac{\partial}{\partial t}\Big|_{t=0}\mathrm{Tr}[(A+tC)P_k(A+tC)] \\ &= \frac{\partial}{\partial t}\Big|_{t=0}\mathrm{Tr}[A P_k(A+tC)] + \frac{\partial}{\partial t}\Big|_{t=0}\mathrm{Tr}[(A+tC)P_k(A)] \\ &= \mathrm{Tr}[C\,P_k(A)]. \end{aligned}$$
Let U be the unitary that diagonalises A, i.e. $UAU^* = \Lambda^\downarrow(A)$. Then $\mathrm{Tr}[C\,P_k(A)] = \sum_{j=1}^k (UCU^*)_{jj}$, and the statement of the Proposition follows.

2. When A has degenerate eigenvalues, the situation becomes somewhat more complicated, but there are no really significant changes. There is no longer a unique eigenbasis of A, so that $P_k(A)$ is not well-defined for all k.

We will first consider the case where C is such that it removes the degeneracy of the eigenvalues of A in A + tC for small enough positive t. In that case $P_k(A+tC)$ will be uniquely defined for all positive t less than some value t0, which is the smallest positive t for which A + tC has an accidental degeneracy (which is what also happens at t = 0).

This occurs, for instance, when C has simple eigenvalues. Indeed, by analyticity of the eigenvalues of A + tC in t, degeneracy is either accidental (for isolated values of t) or permanent (for all values of t). Since all eigenvalues are simple for large enough t, they have to remain simple for all values of t except possibly for some isolated values, such as t = 0, in this case. Let t0 be the smallest positive such value; then A + tC has simple eigenvalues for 0 < t < t0.

We can therefore define $P_k(A)$ in a unique way as the limit $\lim_{t\to 0} P_k(A+tC)$.
This is an allowed choice because of the continuity of the eigenvalues:
$$\sum_{j=1}^k \lambda^\downarrow_j(A) = \mathrm{Tr}[\lim_{t\to 0} P_k(A+tC)\,A].$$
Using the same argument as in the previous case, we obtain
$$\sum_{j=1}^k \delta_j(C;A) = \mathrm{Tr}[\lim_{t\to 0} P_k(A+tC)\,C].$$
Let $\lambda_l$ be the eigenvalues of A (multiplicity not counted), and $Q_l$ the projections onto the corresponding eigenspaces of A (with $Q_l^*$ the corresponding inclusion operators); the rank of $Q_l$ equals the multiplicity of $\lambda_l$, which we denote by $m_l$. To obtain δ(C;A), we first construct the diagonal blocks $C_l := Q_l C Q_l^*$ (of size $m_l$), then take the eigenvalues $\lambda^\downarrow(C_l)$ in non-increasing order of each block, and then concatenate the obtained sequences of eigenvalues:
$$\delta(C;A) := (\lambda^\downarrow(C_1), \ldots, \lambda^\downarrow(C_m)).$$
If all eigenvalues of A are distinct, this reduces to the vector of diagonal elements of C in the eigenbasis of A that we encountered in case 1.

For example, if $\lambda^\downarrow(A) = (5, 5, 3, 1)$, then
$$\delta(C;A) = (\lambda^\downarrow_1(C_1), \lambda^\downarrow_2(C_1), C_{33}, C_{44}), \quad \text{where } C_1 = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$$
and all entries of C are taken in the eigenbasis of A.

Let U be a unitary (which, in this case, is not unique) that diagonalises A as $UAU^* = \Lambda^\downarrow$, and take the diagonal blocks $C_l$ of $UCU^*$, as above. Each block can be diagonalised using a unitary $V_l$. Together with U we obtain the total basis rotation $W := U(\bigoplus_l V_l)$. By construction, $\bigoplus_l V_l$ leaves Λ invariant, and resolves the ambiguity in U. We obtain that δ(C;A) is the vector of diagonal entries of C in the basis obtained by applying the unitary W.

3. Finally, we look at the case when A + tC is permanently degenerate, i.e. when it has degenerate eigenvalues for all values of t. W.l.o.g. we just have to look at t in an interval [0, t0), where t0 is the smallest positive value for which A + tC has an accidental degeneracy.

Let us denote by λj(t) the eigenvalues of A + tC in non-increasing order, multiplicity mj not counted, and by Pj(t) the projectors on the corresponding eigenspaces. In that case Pk(A + tC) is only well-defined if there is a j′ such that k = m1 + m2 + . . .
+ mj′; then we have $P_k(A+tC) = P_1(t) + P_2(t) + \ldots + P_{j'}(t)$. If there is no such j′, let j′ be the largest integer such that $k > m_1 + m_2 + \ldots + m_{j'} =: k'$. Thus $0 < k - k' < m_{j'+1}$. Then we have
$$\begin{aligned} \sum_{j=1}^k \lambda^\downarrow_j(A+tC) &= \sum_{i=1}^{j'} m_i\lambda_i(t) + (k-k')\lambda_{j'+1}(t) \\ &= \mathrm{Tr}\Big[(A+tC)\Big(P_1(t) + \ldots + P_{j'}(t) + \frac{k-k'}{m_{j'+1}}P_{j'+1}(t)\Big)\Big] \\ &= \mathrm{Tr}\Big[(A+tC)\Big(\frac{k-k'}{m_{j'+1}}P_{k'+m_{j'+1}}(A+tC) + \Big(1-\frac{k-k'}{m_{j'+1}}\Big)P_{k'}(A+tC)\Big)\Big]. \end{aligned}$$
Thus, if we define $\alpha := (k-k')/m_{j'+1}$, the sum $\sum_{j=1}^k \lambda^\downarrow_j(A+tC)$ interpolates linearly between $\sum_{j=1}^{k'} \lambda^\downarrow_j(A+tC)$ and $\sum_{j=1}^{k'+m_{j'+1}} \lambda^\downarrow_j(A+tC)$ with parameter α.

Proceeding in the same way as in the two previous cases, we obtain for the derivative
$$\frac{\partial}{\partial t}\Big|_{t=0}\sum_{j=1}^k \lambda^\downarrow_j(A+tC) = \mathrm{Tr}[C(\alpha P_{k'+m_{j'+1}}(A) + (1-\alpha)P_{k'}(A))],$$
where the $P_k(A)$ have to be replaced with the limits $\lim_{t\to 0} P_k(A+tC)$ if in addition there are accidental degeneracies at t = 0.

Let us consider the entries of C again as before, in an eigenbasis of A in which the eigenvalues of A appear on the diagonal, in non-increasing order. We get
$$\sum_{j=1}^k \delta_j(C;A) = (1-\alpha)\sum_{i=1}^{k'} C_{ii} + \alpha\sum_{i=1}^{k'+m_{j'+1}} C_{ii}. \qquad (18)$$
Because of the permanent degeneracy, an eigenbasis is determined up to “local” rotations within the various eigenspaces. We consider a partitioning of C in such an eigenbasis corresponding to these eigenspaces. That is, in C we can single out diagonal blocks, each of which corresponds to the eigenspace of eigenvalue $\lambda_j$. We can use our freedom to choose the local rotations to make all diagonal elements of C equal within each diagonal block. This allows us to get rid of the interpolation in (18), and we finally obtain that, again,
$$\sum_{j=1}^k \delta_j(C;A) = \sum_{i=1}^k C_{ii},$$
with the entries of C taken in the eigenbasis that we have just chosen. ✷

The upshot of this Proposition is that there exists a unitary U such that $UAU^* = \Lambda^\downarrow(A)$ and $\delta(C;A) = \mathrm{Diag}(UCU^*)$. In the generic case that all $\lambda_i(A)$ are distinct, U is unique and does not depend on C.
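Case 1 of Proposition 2 can be illustrated numerically. The sketch below uses plain Python and hypothetical helper names, and restricts itself to an illustrative 2 × 2 pair chosen here (not taken from the paper): A = diag(2, 1) is already diagonal with simple, sorted eigenvalues, so the Proposition predicts that the derivatives of the ordered eigenvalues of A + tC at t = 0 are simply the diagonal entries of C. A forward difference is used as a stand-in for the derivative.

```python
import math

def eig2(m):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius

def delta_numeric(A, C, t=1e-6):
    """Forward-difference estimate of d/dt lambda_j(A + tC) at t = 0,
    for j = 1, 2 (eigenvalues in non-increasing order)."""
    shifted = [[A[i][j] + t * C[i][j] for j in range(2)] for i in range(2)]
    return tuple((lt - l0) / t for lt, l0 in zip(eig2(shifted), eig2(A)))

# Illustrative matrices (assumptions of this sketch): A has simple
# eigenvalues 2 > 1 and is diagonal in the standard basis.
A = [[2.0, 0.0], [0.0, 1.0]]
C = [[0.3, 0.7], [0.7, -0.4]]

delta = delta_numeric(A, C)
diag_C = (C[0][0], C[1][1])

print("numerical delta(C;A):", delta)
print("Diag(C)             :", diag_C)
```

The off-diagonal entry of C only enters at second order in t, which is why it does not show up in δ(C;A) when the eigenvalues of A are simple.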
A number of easy consequences follow immediately from this Proposition:

Corollary 2 Let G and C be Hermitian matrices, f be any monotonically increasing real function on R, and g any strictly increasing real function on R, then
(i) $\delta(f(G);G) = f(\lambda^\downarrow(G))$.
(ii) δ(C;G) obeys Schur's majorisation Theorem: $\delta(C;G) \prec \lambda^\downarrow(C)$.
(iii) $\delta(C;G) + a\lambda^\downarrow(f(G)) = \delta(C + af(G);G)$, ∀a ≥ 0.
(iv) δ(C; f(A)) = δ(C;A).

Along with the previously demonstrated equivalence of (13) with (17), the Corollary immediately leads to the following Proposition:

Proposition 3 For Hermitian G, C, monotonically increasing real functions f1, f2 on R, and A = f1(G), B = f2(G), the following are equivalent:
$$\lambda(aA+B) \prec_w \lambda(aA+C), \quad \forall a \ge 0 \qquad (19)$$
$$\delta(B;G) \prec_{dw} \delta(C;G) \qquad (20)$$
$$\delta(aA+B;G) \prec_{dw} \delta(aA+C;G), \quad \forall a \ge 0. \qquad (21)$$

Proof. (19) implies (20): This is just Proposition 2. (20) implies (21): Add $a\lambda^\downarrow(A)$ to both sides and invoke statement (iii) of the Corollary. (21) implies (19): By statement (i) of the Corollary, the LHS of (21) is equal to $\lambda^\downarrow(aA+B)$, while, by statement (ii) of the Corollary, its RHS is majorised by λ(aA + C). ✷

6 Counterexample to Question 2

If the answer to Question 2 is to be affirmative, it should at least hold for all angle functions $f(x) = ax + b(x-x_0)_+$. By Proposition 3 this is equivalent to the statement
$$\delta((Y-\mathbb{1})_+; Y) \prec_{dw} \delta((X+Y-\mathbb{1})_+ - (X-\mathbb{1})_+; Y).$$
Consider the 3 × 3 matrices
$$X = \begin{pmatrix} 0.35614 & -0.053243 & 0.10116 \\ -0.053243 & 0.87456 & 0.40559 \\ 0.10116 & 0.40559 & 0.82474 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0.53642 & 0 & 0 \\ 0 & 0.42018 & 0 \\ 0 & 0 & 0.094866 \end{pmatrix}.$$
The eigenbasis of Y is therefore the standard basis. Then $\delta((Y-\mathbb{1})_+; Y) = (0, 0, 0)$ and
$$(X+Y-\mathbb{1})_+ - (X-\mathbb{1})_+ = \begin{pmatrix} -0.00018194 & 0.00052449 & -0.0016345 \\ 0.00052449 & 0.2573 & 0.12368 \\ -0.0016345 & 0.12368 & 0.04 \end{pmatrix},$$
so that $\delta((X+Y-\mathbb{1})_+ - (X-\mathbb{1})_+; Y) = (-0.00018194, 0.2573, 0.04)$. The first entry is negative, violating the $\prec_{dw}$ relation, and thereby answering Question 2 in the negative.
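The role of the missing rearrangement in the relation (16) can be made concrete with the two δ vectors quoted in the counterexample. The following is a small sketch in plain Python; the function name `partial_sums_dominated` and its `rearrange` flag are hypothetical, introduced only for this illustration.

```python
from itertools import accumulate

def partial_sums_dominated(a, b, rearrange):
    """Check sum_{j<=k} a_j <= sum_{j<=k} b_j for every k.

    rearrange=False keeps the given order of the entries, which is the
    relation (16) ("weak majorisation with missing rearrangement").
    rearrange=True first sorts both vectors in non-increasing order,
    which gives ordinary weak majorisation.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if rearrange:
        a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    return all(sa <= sb for sa, sb in zip(accumulate(a), accumulate(b)))

# delta vectors quoted in the counterexample of Section 6
delta_lhs = (0.0, 0.0, 0.0)
delta_rhs = (-0.00018194, 0.2573, 0.04)

holds_dw = partial_sums_dominated(delta_lhs, delta_rhs, rearrange=False)
holds_w = partial_sums_dominated(delta_lhs, delta_rhs, rearrange=True)

print("without rearrangement:", holds_dw)  # False: first partial sum is negative
print("with rearrangement   :", holds_w)   # True: sorting hides the negative entry
```

The same pair of vectors fails the unrearranged test already at k = 1 but would pass if sorting were allowed, which is the sense in which (16) is the stronger requirement.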
✷

7 Further Applications of Y-dominated majorisation

One issue we had to address during our attempts at giving a positive answer to Question 2 dealt with the possibility of reducing the question for convex functions to convex angle functions. One way of doing so would have been possible if the set of (monotonically increasing and convex) functions satisfying (11) were closed under addition. While we were unable to prove this particular statement (which is most likely false, anyway), Proposition 3 enables us to prove the corresponding statement for the relation
$$\delta(f(Y); Y) \prec_{dw} \delta(f(X+Y)-f(X); Y). \qquad (22)$$

Proposition 4 Let all the eigenvalues of Y be distinct. Let f and g be functions from R to R satisfying (22). Then f + g also satisfies (22).

Proof. By the assumption on the eigenvalues of Y, δ(A;Y) equals Diag(A) in a basis only depending on Y and is therefore a linear function of A. We can therefore add up the inequalities (22) for f and g and obtain the corresponding inequalities for f + g. ✷

A second application of Proposition 3 is a strengthening of the following Proposition, which we also obtained in the course of our attempts at positively answering Question 2.

Proposition 5 For X, Y ≥ 0 and $g_a(x) = ax + x^2/(x+1)$, with a ≥ 0, the following majorisation statement holds:
$$\lambda(g_a(Y)) \prec_w \lambda(g_a(X+Y)-g_a(X)).$$

Proof. From the proof of Lemma X.1.4 in [4], we have, for X, Y ≥ 0,
$$\lambda^\downarrow_j((X+\mathbb{1})^{-1} - (X+Y+\mathbb{1})^{-1}) \le \lambda^\downarrow_j(\mathbb{1} - (Y+\mathbb{1})^{-1}).$$
Defining the function $f(x) = x/(x+1) = 1-(x+1)^{-1}$, this turns into:
$$\lambda^\downarrow_j(f(X+Y)-f(X)) \le \lambda^\downarrow_j(f(Y)).$$
This implies the majorisation statement
$$\sum_{j=1}^k \lambda^\downarrow_j(f(X+Y)-f(X)) \le \sum_{j=1}^k \lambda^\downarrow_j(f(Y)). \qquad (23)$$
We want to prove a somewhat similar statement for the function $g_a(x)$. Since both f and $g_a$ are monotonically increasing over $\mathbb{R}^+$, and noting that $g_a(x) = (a+1)x - f(x)$, we have
$$\begin{aligned} \lambda^\downarrow_j(g_a(Y)) &= g_a(\lambda^\downarrow_j(Y)) = (a+1)\lambda^\downarrow_j(Y) - f(\lambda^\downarrow_j(Y)), \\ \lambda^\downarrow_j(f(Y)) &= f(\lambda^\downarrow_j(Y)), \end{aligned}$$
so that $\lambda^\downarrow_j(g_a(Y)) = (a+1)\lambda^\downarrow_j(Y) - \lambda^\downarrow_j(f(Y))$.
This implies in particular
$$\sum_{j=1}^k \lambda^\downarrow_j(g_a(Y)) = (a+1)\sum_{j=1}^k \lambda^\downarrow_j(Y) - \sum_{j=1}^k \lambda^\downarrow_j(f(Y)) \le (a+1)\sum_{j=1}^k \lambda^\downarrow_j(Y) - \sum_{j=1}^k \lambda^\downarrow_j(f(X+Y)-f(X)),$$
where we have inserted (23). Exploiting the well-known relation ([4], Th. III.4.1)
$$\sum_{j=1}^k \lambda^\downarrow_j(A+B) \le \sum_{j=1}^k \lambda^\downarrow_j(A) + \sum_{j=1}^k \lambda^\downarrow_j(B),$$
for A = (a + 1)Y − f(X + Y) + f(X) and B = f(X + Y) − f(X) then yields
$$\sum_{j=1}^k \lambda^\downarrow_j(g_a(Y)) \le \sum_{j=1}^k \lambda^\downarrow_j((a+1)Y - f(X+Y) + f(X)) = \sum_{j=1}^k \lambda^\downarrow_j(g_a(X+Y)-g_a(X)).$$
✷

Proposition 3, with A = G = Y, B = f(Y), C = f(X + Y) − f(X), where $f(x) = x^2/(x+1)$, then yields the following strengthening of Proposition 5:

Proposition 6 For X, Y ≥ 0, and $g_a(x) = ax + x^2/(x+1)$, with a ≥ 0,
$$\delta(g_a(Y); Y) \prec_{dw} \delta(g_a(X+Y)-g_a(X); Y).$$
Here we noted that $g_a(X+Y) - g_a(X) = aY + f(X+Y) - f(X)$.

To end this Section, we present a third application of Proposition 3, namely to the results of Kosem and Bourin-Uchiyama. Consider first inequality (3), which holds for all non-negative concave functions f(x). In particular, it holds for all functions $f(x) = ax + f_0(x)$, where $f_0$ is non-negative concave, and a ≥ 0. Inserting this in the eigenvalue-majorisation form of inequality (3), we get the (A + B)-dominated majorisation relation
$$\lambda(a(A+B) + f_0(A+B)) \prec_w \lambda(a(A+B) + f_0(A) + f_0(B)),$$
for A, B ≥ 0. Proposition 3 then immediately yields the stronger form
$$\delta(f(A+B); A+B) \prec_{dw} \delta(f(A)+f(B); A+B), \qquad (24)$$
for all non-negative concave functions f.

The strengthening of inequality (4) is performed in a completely identical way and yields the reversed inequality of (24) for non-negative convex functions g such that g(0) = 0.

Acknowledgements

JSA thanks Professor Moin Uddin, Director of his institute, for encouragement and for supporting his visit to attend the conference at Nova Southeastern University, Fort Lauderdale, Florida, USA, which led to his introduction to Koenraad M.R. Audenaert and the completion of this work. KA thanks the Institute for Mathematical Sciences, Imperial College London, for support. His work is part of the QIP-IRC (www.qipirc.org) supported by EPSRC (GR/S82176/0).

References

[1] T.
Ando, “Comparison of norms |||f(A) − f(B)||| and |||f(|A − B|)|||,” Math. Z. 197, 403–409 (1988).
[2] T. Ando and X. Zhan, “Norm inequalities related to operator monotone functions,” Math. Ann. 315, 771–780 (1999).
[3] J.S. Aujla and F.C. Silva, “Weak majorization inequalities and convex functions,” Lin. Alg. Appl. 369, 217–233 (2003).
[4] R. Bhatia, Matrix Analysis, Springer, Heidelberg (1997).
[5] J.-C. Bourin and M. Uchiyama, “A matrix subadditivity inequality for f(A+B) and f(A) + f(B),” Arxiv.org E-print math.FA/0702475 (2007).
[6] T. Kato, Perturbation theory for linear operators, Reprint of the 1980 edition, Classics in Mathematics, Springer-Verlag, Berlin (1995).
[7] T. Kosem, “Inequalities between ||f(A + B)|| and ||f(A) + f(B)||,” Lin. Alg. Appl. 418, 153–160 (2006).

ABSTRACT

For positive semidefinite matrices $A$ and $B$, Ando and Zhan proved the inequalities $||| f(A)+f(B) ||| \ge ||| f(A+B) |||$ and $||| g(A)+g(B) ||| \le ||| g(A+B) |||$, for any unitarily invariant norm, and for any non-negative operator monotone $f$ on $[0,\infty)$ with inverse function $g$. These inequalities have very recently been generalised to non-negative concave functions $f$ and non-negative convex functions $g$, by Bourin and Uchiyama, and Kosem, respectively. In this paper we consider the related question whether the inequalities $||| f(A)-f(B) ||| \le ||| f(|A-B|) |||$, and $||| g(A)-g(B) ||| \ge ||| g(|A-B|) |||$, obtained by Ando, for operator monotone $f$ with inverse $g$, also have a similar generalisation to non-negative concave $f$ and convex $g$. We answer exactly this question, in the negative for general matrices, and affirmatively in the special case when $A\ge ||B||$.
In the course of this work, we introduce the novel notion of $Y$-dominated majorisation between the spectra of two Hermitian matrices, where $Y$ is itself a Hermitian matrix, and prove a certain property of this relation that allows us to strengthen the results of Bourin-Uchiyama and Kosem, mentioned above.

<|endoftext|><|startoftext|>

Introduction

Black holes in space-times of five or more dimensions have a rich topological structure. According to the well-known results of Hawking concerning the topology of black holes in four-dimensional space-time, the apparent horizon or the spatial section of the stationary event horizon is necessarily diffeomorphic to a 2-sphere. [1, 2] This follows from the fact that the total curvature, which is the integral of the intrinsic scalar curvature over the horizon, is positive under the dominant energy condition and from the Gauss-Bonnet theorem. Alternative and improved proofs of Hawking's theorem have been given by several authors. [3, 4, 5, 6] However, in higher dimensional space-times, an apparent horizon or the spatial section of the stationary event horizon may not be a topological sphere, [7, 8, 9, 10] because the Gauss-Bonnet theorem does not hold in such cases. Nevertheless, the positivity of the total curvature of the horizon still holds. This puts certain topological restrictions on the black hole topology, though they are rather weak. For example, the apparent horizon in five-dimensional space-time can consist of finitely many connected sums of copies of S3/Γ and copies of S2 × S1. In fact, exact solutions representing a black hole space-time possessing a horizon of nonspherical topology have recently been found in five-dimensional general relativity.

When such black holes with nontrivial topologies are regarded as being formed in the course of gravitational collapse, questions regarding the evolution of the topology of black holes naturally arise.
Our purpose here is to understand the time evolution of the topology of event horizons in a general setting. The relation between the crease set, where the event horizon is nondifferentiable, and the topology of the event horizon is studied in Refs. [11, 12, 13] for four-dimensional space-times. In the present work, we carry out a systematic investigation and find useful rules to determine admissible processes of topological evolution for time slicing of a black hole.

Our approach is to utilize the Morse theory [14, 15] in differential topology. The Morse theory is useful for the purpose of understanding the topology of smooth manifolds. The basic tool used in this approach is a smooth function on a differentiable manifold. The event horizon, however, is not a differentiable manifold but has a wedge-like structure at the past endpoints of the null geodesic generators of the horizon. For this reason, we first smooth the wedge. Then, the smooth time function, which is assumed to exist, plays the role of the Morse function on the smoothed event horizon. According to the Morse theory, the topological evolution of the event horizon can then be decomposed into elementary processes called “handle attachments.” In such a process, starting with a spherical horizon, one adds several handles, each characterized by the index of the critical points of the Morse function, which is an integer ranging from 0 to n (the dimension of the smoothed horizon as a differentiable manifold). The purpose of the present article is to show that there are several constraints on the handle attachments for real black hole space-times.

2. The Morse theory for event horizons

Let M be an (n+1)-dimensional asymptotically flat space-time. We require the existence of a global time function t : M → R that is smooth and has an everywhere time-like and future-pointing gradient.
The event horizon H is defined as the boundary of the causal past of the future null infinity, $H = \partial J^-(\mathscr{I}^+)$. [2] We treat the event horizon defined with respect to a single asymptotic end, unless otherwise stated. In other words, the future null infinity, $\mathscr{I}^+$, is assumed to be connected. The black hole region B is defined as the interior region of H, specifically, as $B = M \setminus J^-(\mathscr{I}^+)$, and the exterior region E of the black hole region is its complement, $E = \mathrm{int}(J^-(\mathscr{I}^+))$. We refer to the intersection of the black hole region and the time slice Σ(t0) = {t = t0} as the black hole B(t0) = B ∩ Σ(t0) at time t = t0. The exterior region at time t = t0 is, accordingly, written E(t0) = E ∩ Σ(t0).

One of the most basic properties of the event horizon is that it is generated by null geodesics without future endpoints. In general, the event horizon is not smoothly imbedded into the space-time manifold M, but it has a wedge-like structure at the past endpoints of the null geodesic generators, where distinct null geodesic generators intersect. We call the set of past endpoints of null geodesic generators of H, from which two or more null geodesic generators emanate, the crease set S. [11, 12]

When no crease set S exists between the time slices t = t1 and t = t2, the null geodesic generators of H naturally define a diffeomorphism ∂B(t1) ≈ ∂B(t2). Hence, the topological evolution of a black hole can take place only when the time slice intersects the crease set S. Of course, the event horizon itself is a gauge-independent object. Nevertheless, we often understand the dynamics of space-time by scanning it along time slices. Thus, the topological evolution of a black hole depends on the time function. It is expected that Morse theory [14] provides useful techniques to analyze such a process of topological evolution. Because the Morse theory is concerned with functions on smooth manifolds, we first regularize H around the crease set S.
The event horizon is not necessarily smooth, even on H \ S, in the case that the future null infinity $\mathscr{I}^+$ has a pathological structure. [16] Here it is assumed that H is smooth on H \ S. Then, small deformations of H near the crease set S will make H a smooth hypersurface H̃ in M, while the black hole B(t0) is deformed to B̃(t0) in such a manner that ∂B̃(t0) = H̃ ∩ Σ(t0) holds and B̃(t0) remains homeomorphic to the original black hole for all t0 ∈ R. This deformation is assumed to be such that the time function $t|_{\tilde H}$, which is the restriction of t on H̃, gives a Morse function on H̃ that has only nondegenerate critical points, where the gradient of $t|_{\tilde H}$ defined on H̃ becomes zero and where also the Hessian matrix $(\partial_i\partial_j t|_{\tilde H})$ of $t|_{\tilde H}$ is nondegenerate.

Though this assumption should hold for a wide class of systems, it does not always hold. Figure 1 gives an example for which no smoothing procedure makes the induced time function $t|_{\tilde H}$ a Morse function on H̃, because the intersection of the crease set S of the event horizon and the t = t0 hypersurface has an accumulation point. It is highly nontrivial to determine whether such a smoothing procedure is generically possible. It is, however, neither easy nor the primary purpose of this article to ascertain the realm of validity of this assumption, and therefore we adopt it without inquiring further into its validity.

Figure 1. An example in which no smoothing procedure makes $t|_{\tilde H}$ a Morse function on H̃. Here, the intersection of the crease set S of the event horizon and the t = t0 hypersurface has an accumulation point.

According to the Morse Lemma, there is a local coordinate system {x1, · · · , xn} on H̃ in the neighborhood of the critical point p ∈ H̃ such that the restriction $t|_{\tilde H}$ of the time function t on H̃ takes the form
$$t|_{\tilde H}(x^1, \cdots, x^n) = t(p) - (x^1)^2 - \cdots - (x^\lambda)^2 + (x^{\lambda+1})^2 + \cdots + (x^n)^2.$$
The integer λ, ranging from 0 to n, is called the index of the critical point p.
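The Morse-Lemma normal form and the index can be written down as a short sketch in plain Python. The function names `morse_normal_form` and `morse_index` are hypothetical, and the Hessian is assumed to be already diagonalised, as it is in the coordinates of the Morse Lemma.

```python
def morse_normal_form(t_p, index, x):
    """Value of the restricted time function near a critical point p, in
    Morse-Lemma coordinates x = (x^1, ..., x^n): the first `index`
    coordinates enter with a minus sign, the remaining ones with a plus."""
    if not 0 <= index <= len(x):
        raise ValueError("index must lie between 0 and n")
    return t_p - sum(c * c for c in x[:index]) + sum(c * c for c in x[index:])

def morse_index(hessian_diagonal):
    """Index of a nondegenerate critical point, read off as the number of
    negative entries of a diagonalised Hessian."""
    if any(h == 0 for h in hessian_diagonal):
        raise ValueError("degenerate critical point: Hessian has a zero eigenvalue")
    return sum(1 for h in hessian_diagonal if h < 0)

# Example with n = 3 and index 1: the Hessian of the normal form
# is diag(-2, 2, 2), so one direction is descending.
value = morse_normal_form(0.0, 1, (1.0, 2.0, 3.0))   # 0 - 1 + 4 + 9
index = morse_index((-2.0, 2.0, 2.0))

print("t near p:", value)
print("index   :", index)
```

The nondegeneracy check mirrors the requirement in the text that the Hessian of the restricted time function be nondegenerate at every critical point.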
The topology of the black hole B̃(t) changes when Σ(t) passes through critical points, or equivalently, when the time function t takes critical values. This implies that critical points appear only near the crease set S.

The gradient-like vector field for $t|_{\tilde H}$ is defined to be the tangent vector field X on H̃ such that $X t|_{\tilde H} > 0$ holds on H̃, except for critical points, and has the form
$$X = -2x^1\frac{\partial}{\partial x^1} - \cdots - 2x^\lambda\frac{\partial}{\partial x^\lambda} + 2x^{\lambda+1}\frac{\partial}{\partial x^{\lambda+1}} + \cdots + 2x^n\frac{\partial}{\partial x^n}$$
near the critical point of index λ, in terms of the standard coordinate system appearing in the Morse Lemma. We choose a gradient-like vector field X such that it coincides with the future-directed tangent vector field of null geodesic generators of H, except in a small neighborhood of the crease set S (Fig. 2).

The effect of a critical point p of index λ is equivalent to the attachment of a λ-handle. [14, 15] The handlebody is just a topological n-disk $D^n \approx I^n$ (I = [0, 1]), but it is regarded as the product space $D^n \approx D^\lambda \times D^{n-\lambda}$ (Fig. 3). The λ-handle attachment to an n-dimensional manifold N with a boundary consists of the set $h^\lambda = (D^\lambda \times D^{n-\lambda}, f)$, where the attaching map f induces the imbedding of $\partial D^\lambda \times D^{n-\lambda} \subset \partial D^n$ into ∂N (Fig. 4). The new manifold obtained through the λ-handle attachment to N is given by
$$N \cup h^\lambda = N \cup (D^\lambda \times D^{n-\lambda})/(x \sim f(x)), \quad (x \in \partial D^\lambda \times D^{n-\lambda}).$$

Figure 2. The smoothing procedure of the event horizon H. The gradient-like vector field on H̃ can be constructed through a slight deformation of the null geodesic generators of H. Here, the effect of the crease set S has been replaced by that of the critical points p1, p2 and p3.

Figure 3. The local structure around the critical point p of index λ. It can be seen that $\tilde H_{t(p)+\epsilon}$ is homeomorphic to $\tilde H_{t(p)-\epsilon}$ with a λ-handle attached.

Let us denote by $\tilde H_{t_0}$ the t ≤ t0 part of H̃.
Then, $\tilde H_{t(p)+\epsilon}$ (ε > 0) just above the critical point p of index λ is homeomorphic (in fact diffeomorphic, taking account of the smoothing procedure) to $\tilde H_{t(p)-\epsilon}$, the part just below p, with a λ-handle attached,

$$\tilde H_{t(p)+\epsilon} \approx \tilde H_{t(p)-\epsilon} \cup h^\lambda ,$$

if there are no other critical points satisfying t(p) − ε ≤ t ≤ t(p) + ε. The handlebody itself is denoted by $h^\lambda$ as well.

Let us consider several examples. The 0-handle attachment does not need an attaching map f. It simply corresponds to the emergence of the (n − 1)-sphere $S^{n-1} \approx \partial D^n$ as a black hole horizon ∂B(t). A typical example is the creation of a black hole (Fig. 5): a black hole always emerges through a 0-handle attachment. The other possibility is the creation of a bubble that is a subset of $J^-(\mathscr{I}^+)$ in a black hole region (Fig. 6). One might think that this corresponds to wormhole creation between the internal and external regions of the event horizon. Although these two examples are indistinguishable in the framework of standard Morse theory on H̃, we see below that the latter process is in fact impossible.

Figure 4. The attachment of a 1-handle and a 2-handle to a 3-manifold N creates a new 3-manifold $N \cup h^1 \cup h^2$.

Figure 5. The emergence of a black hole through a 0-handle attachment.

Figure 6. The emergence of a bubble in the black hole region by a 0-handle attachment, which does not occur in real black hole space-times.

Figure 7. The collision of a pair of black holes, creating a single black hole, is realized through a 1-handle attachment.

Next, we consider 1-handle attachment. A typical example is the collision of two black holes. A 1-handle serves as a bridge connecting black holes, or it corresponds to taking the connected sum of the components of multiple black holes (Fig. 7).

Figure 8. The bifurcation of one black hole into two is represented by an (n − 1)-handle attachment. This, however, never occurs in real black hole space-times.

Figure 9. The structure of a λ-handle. The core $D^\lambda \times \{0\}$ corresponds to the stable submanifold with respect to the flow generated by the gradient-like vector field, and the co-core $\{0\} \times D^{n-\lambda}$ corresponds to the unstable submanifold.

The time reversal of the collision of black holes consists of the bifurcation of one black hole into two. This would be realized through an (n − 1)-handle attachment, if such a process were possible (Fig. 8). It is, however, well known that such a process is forbidden. [2] In general, the time reversal of a λ-handle attachment corresponds to an (n − λ)-handle attachment.

Before discussing general cases, let us consider the structure of a handlebody. Recall that a λ-handle consists of the product space $D^\lambda \times D^{n-\lambda}$. The subset $D^\lambda \times \{0\} \subset D^\lambda \times D^{n-\lambda}$ is called the core of the handlebody, and $\{0\} \times D^{n-\lambda} \subset D^\lambda \times D^{n-\lambda}$ is called the co-core. The core and co-core intersect transversely at a point. This point can be regarded as a critical point p. Let us refer to the subset $W_s(p)$ of H̃,

$$W_s(p) = \{ q \in \tilde H \mid \lim_{t \to \infty} \exp_q tX = p \}, \qquad (1)$$

which consists of points that converge to p along the flow generated by the gradient-like vector field X, as the stable manifold with respect to the critical point p. The stable manifold $W_s(p)$ is homeomorphic to $\mathbb{R}^\lambda$ if the index of p is λ. [17] Similarly, let us refer to the subset $W_u(p) \subset \tilde H$ consisting of points that converge to p along the flow generated by −X as the unstable manifold with respect to p. For the unstable manifold, $W_u(p) \approx \mathbb{R}^{n-\lambda}$ holds. The portions of the stable and unstable manifolds in the handlebody can be regarded as corresponding to the core and co-core, respectively.

The effect of smoothing the event horizon H to H̃ is to deform the null vector field generating H into a gradient-like vector field X. The primary difference between the null geodesic generators and the flow generated by X is that the former do not have future endpoints, but the latter can.
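A convenient bookkeeping device for the handle attachments described above is the Euler characteristic. The sketch below uses two standard facts that are not stated in the text and are brought in here as assumptions of the illustration: attaching a λ-handle changes the Euler characteristic by $(-1)^\lambda$, and a compact 3-manifold with boundary satisfies χ(∂M) = 2χ(M). The function names are hypothetical, and the genus formula assumes n = 3 with a connected, orientable horizon cross-section.

```python
def attach_handles(indices, chi=0):
    """Euler characteristic after attaching handles with the given indices.

    Each lambda-handle contributes (-1)**lambda; `chi` is the starting value
    (0 for the empty manifold).
    """
    for lam in indices:
        chi += (-1) ** lam
    return chi


def horizon_genus(num_one_handles):
    """Genus of the horizon cross-section for n = 3.

    Assumes one 0-handle (black hole creation) followed by `num_one_handles`
    1-handles attached to the same connected, orientable body.
    """
    chi_body = attach_handles([0] + [1] * num_one_handles)
    chi_boundary = 2 * chi_body          # chi(dM) = 2 chi(M) in dimension 3
    return (2 - chi_boundary) // 2       # closed orientable surface: chi = 2 - 2g


# One 0-handle gives a 3-disk (spherical cross-section); adding one 1-handle
# to the same component gives a solid torus (toroidal cross-section).
genus_after_creation = horizon_genus(0)
genus_after_one_handle = horizon_genus(1)
```

In this toy count each additional 1-handle attached to a single component raises the genus of the cross-section by one, whereas a 1-handle joining two separate components (a collision) lowers the number of components instead; the sketch does not distinguish the two cases.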
Because the flow generated by X can have future endpoints while the null geodesic generators cannot, there are both admissible and inadmissible processes for the smoothed manifold H̃. An admissible process is given by an H̃ that is obtained from an in principle realizable event horizon, while an inadmissible one is constructed from a spurious event horizon, i.e., a null hypersurface containing null geodesic generators with a future endpoint.

3. The structure of the critical points

The spatial topology of a black hole changes only when the time function takes a critical value. The time evolution of the black hole topology can be understood by considering its local structure around critical points. To determine whether a given topological change is admissible or inadmissible, it is not sufficient to consider only the intrinsic structure of the event horizon. Rather, it is necessary to take account of its imbedding structure relative to the space-time. In a time slice, any point not on H̃ belongs either to the black hole region or to the exterior of the black hole region. It is useful to consider the local behavior of the black hole region or the exterior region near the critical point p. For brevity, let us call the exterior E of the black hole region simply the exterior region. The exterior region is slightly deformed by the smoothing procedure. The deformed exterior region is denoted by Ẽ, and the deformed exterior region at the time t by

$$\tilde E(t) = \tilde E \cap \Sigma(t) = \Sigma(t) \setminus \tilde B(t). \qquad (2)$$

The 0-handle is located in the region t ≥ t(p). Such an attachment describes the emergence of the black hole region at the critical point p and its expansion with time. The emergence of a bubble, which consists of a part of $J^-(\mathscr{I}^+)$, in the background of the black hole region would also be described by a 0-handle attachment. This, however, never occurs, as we explain below in detail. Hence, a 0-handle attachment always describes the creation of a black hole homeomorphic to the n-disk.

An n-handle attachment corresponds to the time reversal of a 0-handle attachment.
This process, however, never occurs in real black hole space-times. An n-handle is defined for t ≤ t(p), which means that it terminates at the critical point p. The crease set is reduced to isolated critical points in the course of the smoothing procedure. The gradient-like vector field, which can be regarded as being tangent to the generators of the deformed event horizon H̃, may have several inward (converging) directions at the critical point due to this smoothing procedure, while the original null generators of the event horizon do not have an inward direction at the crease set. In the case of the n-handle, all the directions become inward at the critical point. This implies that the null generators of the event horizon H must have future endpoints at the critical point, which is, of course, impossible. It is thus seen that an n-handle attachment never occurs in real black hole space-times.

The remaining cases are λ-handle attachments for 1 ≤ λ ≤ n − 1. In these cases, the λ-handle lies on either side of the critical point p, both in the future [t > t(p)] and in the past [t < t(p)]. We therefore consider the handle during a sufficiently small time interval t ∈ [t(p) − δ, t(p) + δ] (δ > 0), to understand the topological change of the black hole region at the critical point p.

Figure 10. The neighborhood U of p is separated by $h^\lambda$ into the future region, $U^+$, and the past region, $U^-$.

First, we introduce a coordinate system $\{t, x^i\}$ (i = 1, ..., n) in the neighborhood U of p, where t is the given time function and $\{x^i\}$ is the extension over U of the canonical coordinates appearing in the Morse Lemma, such that each curve $(x^1, \dots, x^n)$ = const is timelike in U. We assume that U is the solid cylinder given by t ∈ [t(p) − δ, t(p) + δ], $\sum_{i=1}^n (x^i)^2 \le \delta$.
In this coordinate system, the λ-handle $h^\lambda$ is given by the saddle surface

$$t = t(p) - (x^1)^2 - \cdots - (x^\lambda)^2 + (x^{\lambda+1})^2 + \cdots + (x^n)^2$$

in U, which is an acausal set if the constant δ is taken sufficiently small, since $h^\lambda$ is tangent to the space-like hypersurface t = t(p) at p. Therefore, $h^\lambda$ separates U into two open subsets, the future and past regions $U^+$ and $U^-$ of U, which are the subsets lying in the chronological future and past, respectively, of $h^\lambda$: $U^\pm = I^\pm(h^\lambda) \cap U$. Explicitly, the future and past regions $U^\pm$ are the regions satisfying

$$t \gtrless t(p) - (x^1)^2 - \cdots - (x^\lambda)^2 + (x^{\lambda+1})^2 + \cdots + (x^n)^2$$

in U, respectively (Fig. 10).

Because the λ-handle is a subset of the black hole boundary H̃, one of $U^\pm$ is contained in the black hole region, B̃, and the other in the exterior region, Ẽ. However, the future region $U^+$ of U is always included in the black hole region, i.e. $U^+ \subset \tilde B$, and hence we have $U^- \subset \tilde E$, since the horizon is the boundary of the past set $J^-(\mathscr{I}^+)$. Therefore, the black hole region B̃(t(p) − ε) ∩ U in U at the time t = t(p) − ε just before the critical time is given by

$$(x^1)^2 + \cdots + (x^\lambda)^2 > (x^{\lambda+1})^2 + \cdots + (x^n)^2 + \epsilon,$$

which is homotopic to the (λ − 1)-sphere $S^{\lambda-1}$. (For λ = 1, $S^0$ simply consists of two points.) Similarly, B̃(t(p) + ε) ∩ U just after the critical time is given by

$$(x^1)^2 + \cdots + (x^\lambda)^2 + \epsilon > (x^{\lambda+1})^2 + \cdots + (x^n)^2,$$

which is homotopic to the n-disk. In this way, the black hole region restricted to the small neighborhood of the critical point p is initially homotopic to a sphere. Then, the internal region of the sphere is filled up at the critical time t = t(p), and the region eventually becomes homotopically trivial. The exterior region, Ẽ(t) ∩ U, in U is initially homotopic to an n-disk for t = t(p) − ε. Then, its (n − λ)-dimensional direction is penetrated by the black hole region at t = t(p), and thus it becomes homotopic to an (n − λ − 1)-sphere $S^{n-\lambda-1}$ for t = t(p) + ε.
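The change of homotopy type described above can be checked in the simplest case, n = 2 and λ = 1, by counting connected components on a grid. The following sketch is only a discretized illustration under assumed values: the grid half-width `R`, the integer offset `EPS` standing in for ε, and all function names are choices made here. Integer arithmetic is used so that the inequalities are evaluated exactly.

```python
from collections import deque


def count_components(inside, half_width):
    """Count 4-connected components of {(i, j) : inside(i, j)} on a square grid."""
    cells = {(i, j)
             for i in range(-half_width, half_width + 1)
             for j in range(-half_width, half_width + 1)
             if inside(i, j)}
    components = 0
    while cells:
        components += 1
        queue = deque([cells.pop()])
        while queue:  # breadth-first flood fill of one component
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in cells:
                    cells.remove(nb)
                    queue.append(nb)
    return components


R, EPS = 20, 10  # illustrative grid half-width and offset, in grid units


def black_before(i, j):
    return i * i > j * j + EPS      # (x^1)^2 > (x^2)^2 + eps, at t = t(p) - eps


def black_after(i, j):
    return i * i + EPS > j * j      # (x^1)^2 + eps > (x^2)^2, at t = t(p) + eps


black = (count_components(black_before, R), count_components(black_after, R))
exterior = (count_components(lambda i, j: not black_before(i, j), R),
            count_components(lambda i, j: not black_after(i, j), R))
# black == (2, 1), exterior == (1, 2)
```

The black hole region goes from two pieces ($S^0$) to one (a disk), while the exterior goes from one piece to two, which is the $S^{n-\lambda-1} = S^0$ case of the statement above. Component counts only detect $S^0$; for higher-dimensional spheres a different invariant would be needed.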
If the spurious event horizon is also taken into account, the future region $U^+$ might be a subset of Ẽ, and therefore the past region $U^-$ might be a subset of B̃. Then, the black hole region in the λ-handle might be homotopic to an n-disk initially and become homotopic to an (n − λ − 1)-sphere finally, and vice versa for the exterior region. Let us refer to a topological change of the black hole region B̃(t) ∩ U from a region homotopic to a sphere to a region homotopic to a disk as a black handle attachment, and to a change from a region homotopic to a disk to a region homotopic to a sphere as a white handle attachment. The above observation shows that only a black handle attachment occurs if a sufficiently small neighborhood of the critical point is considered. For example, a collision of black holes corresponds to a black 1-handle attachment, while the bifurcation of a black hole corresponds to a white (n − 1)-handle attachment in the sense that the homotopy type of the exterior region Ẽ(t) ∩ U changes from that of $S^{n-2}$ to that of $D^n$. This local argument also elucidates the reason that a black hole collision is admissible while a black hole bifurcation, which is its time reversal, is inadmissible. We also note that the effect of time reversal is to convert a black λ-handle attachment into a white (n − λ)-handle attachment.

It is appropriate to refer to the 0-handle attachment corresponding to the creation of a black hole as a black 0-handle attachment. Then, the proposition above also applies to a 0-handle attachment.

4. Connectedness of the exterior region

There also exist processes that are unrealizable due to global conditions. Let us, for a moment, consider the event horizon in the maximally extended Schwarzschild space-time. Though we are interested in the event horizon defined with respect to a specific asymptotic end, for the purpose of explanation we examine the event horizon defined with respect to a pair of asymptotic ends in the Schwarzschild space-time (Fig. 11).
Let $\mathscr{I}^+_1$ and $\mathscr{I}^+_2$ be the pair of future null infinities of the maximally extended Schwarzschild space-time. The event horizon here is defined by $H = \partial J^-(\mathscr{I}^+_1 \cup \mathscr{I}^+_2)$, which is nondifferentiable at the bifurcate horizon $F = \partial J^-(\mathscr{I}^+_1) \cap \partial J^-(\mathscr{I}^+_2)$. Let t be a global time function and χ be a global radial coordinate function such that each two-surface t, χ = const is invariant under the SO(3) isometry. These coordinates are chosen such that the bifurcation surface F is located at t = χ = 0 and the event horizon H is determined by t = |χ| around F. The smoothed event horizon H̃ is also taken to be invariant under the SO(3) isometry. Due to the symmetry of the configuration, the time function t has critical points of degenerate type. In fact, any point on the bifurcate horizon F is critical. Here, we are not interested in such a nongeneric situation. Instead, we consider a slightly different time slicing determined by the new time function

$$t' = t + \epsilon \sin^2 \frac{\vartheta}{2},$$

where ε is a sufficiently small positive constant and ϑ, which satisfies 0 ≤ ϑ ≤ π, is the usual polar coordinate of the 2-sphere. Then, only a pair of isolated critical points appears, at the north pole (ϑ = 0) and the south pole (ϑ = π) of the bifurcate horizon F, and the time function t′ becomes a Morse function on H̃. At the time t′ = 0, the black hole appears at the north pole. This is the 0-handle attachment. The black hole formed there grows into a geometrically thick spherical shell with a hole at the south pole, which is nevertheless a topological 3-disk. At the time t′ = ε, the puncture at the south pole is filled, and the black hole region becomes topologically $S^2 \times [0, 1]$. The deformed event horizon H̃ splits into a disjoint union of a pair of 2-spheres. This is the 2-handle attachment.
This kind of 2-handle attachment occurs because the event horizon is defined with respect to the two asymptotic ends; it is in general inadmissible if the future null infinity is connected, as we assume from this point on.

To understand the above statement, it should be noted that there is no process through which several connected components of the exterior region Ẽ(t) = Ẽ ∩ Σ(t) at time t merge at a later time, because such a handle attachment is not admissible. It is also seen that no connected component of Ẽ(t) disappears, because n-handle attachments are inadmissible. These facts imply that the number of connected components of the exterior region Ẽ(t) cannot decrease as the time function t increases. On the other hand, there is only one connected component of the exterior region Ẽ(t) for sufficiently large t, because of the connectedness of $\mathscr{I}^+$. This observation shows that the exterior region Ẽ(t) remains connected in any process.

Figure 11. The figure on the left is a conformal diagram of the maximally extended Schwarzschild space-time. The structure of the event horizon defined with respect to the two asymptotic ends is depicted on the right, with one dimension omitted. The shaded region represents the black hole region at the critical time t = t(p). This corresponds to the 2-handle attachment, where the exterior region is separated into a pair of connected components.

The only possible process through which the number of connected components of the exterior region Ẽ(t) changes is an (n − 1)-handle attachment, as constructed above in the Schwarzschild space-time. This is because the subset $D^\lambda \times \partial D^{n-\lambda}$ of the boundary of the λ-handle,

$$\partial h^\lambda \approx (\partial D^\lambda \times D^{n-\lambda}) \cup (D^\lambda \times \partial D^{n-\lambda}),$$

namely the part of $\partial h^\lambda$ that is the complement of the domain of the attaching map $f : \partial h^\lambda \supset \partial D^\lambda \times D^{n-\lambda} \to \tilde H_t$, is disconnected only when λ = n − 1.
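The connectivity claim can be spelled out in one step. The following worked case distinction uses only the standard facts that $S^k$ is connected for $k \ge 1$, that $S^0$ consists of two points, and that $\partial D^0$ is empty; it is added here as an illustration rather than taken from the original text.

```latex
\[
D^{\lambda}\times\partial D^{n-\lambda} \;=\; D^{\lambda}\times S^{n-\lambda-1},
\qquad
D^{\lambda}\times S^{n-\lambda-1}\ \text{is}\
\begin{cases}
\text{connected}, & 0\le\lambda\le n-2 \quad (n-\lambda-1\ge 1),\\[2pt]
\text{two disjoint copies of } D^{n-1}, & \lambda=n-1 \quad (S^{0}=\{\pm 1\}),\\[2pt]
\text{empty}, & \lambda=n \quad (\partial D^{0}=\emptyset).
\end{cases}
\]
```

Only the middle case, λ = n − 1, leaves a free boundary with two components, which is why it is the only handle index that can separate the exterior region.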
In this case, the homotopy type of the exterior region Ẽ(t) changes from that of an n-disk to that of $S^0$, namely two points. Note, however, that this does not imply that the exterior region Ẽ(t) is always separated into two disconnected parts through the (n − 1)-handle attachment. For example, a transition from the black ring horizon $\approx S^{n-2} \times S^1$ to the spherical black hole horizon $\approx S^{n-1}$ is realized through a black (n − 1)-handle attachment, which pinches the longitude $\{\text{a point}\} \times S^1 \subset S^{n-2} \times S^1$ into a point. The exterior region Ẽ(t) remains connected all the while. Thus, there are both admissible and inadmissible processes for (n − 1)-handle attachments. An (n − 1)-handle attachment is inadmissible if it separates the exterior region Ẽ(t).

5. Concluding remarks

The arguments given in this paper are summarized by the following rules. Assume that
(i) the (n + 1)-dimensional space-time M is asymptotically flat and the future null infinity $\mathscr{I}^+$ is connected, or the event horizon $H = \partial J^-(\mathscr{I}^+)$ is defined with respect to a single asymptotic end,
(ii) the space-time M admits a smooth global time function t,
(iii) the event horizon H can be deformed so that the correspondingly deformed black hole B̃(t) is smooth and homeomorphic to the original one B(t) at each time t, and the time function t becomes a Morse function on H̃.
Then, the topological evolution of the event horizon can be regarded as a sequence of λ-handle attachments (0 ≤ λ ≤ n) subject to the following rules:
(1) The n-handle attachment is inadmissible.
(2) Only the black λ-handle attachment (0 ≤ λ ≤ n − 1), where the black hole region in the neighborhood of the critical point varies from a region homotopic to the sphere $S^{\lambda-1}$ (regarded as the empty set for λ = 0) to the n-disk $D^n$, is admissible.
(3) The (n − 1)-handle attachment that separates the spatial section of the exterior region of the black hole is inadmissible.

The first rule simply states that no connected component of a black hole disappears.
It also implies that if a bubble of the exterior region forms within the black hole region, it does not vanish.

The second rule is concerned with the imbedding structure of the event horizon relative to the space-time manifold. The neighborhood of the critical point is separated into two regions by the event horizon. One changes homotopically from a sphere to a disk and the other from a disk to a sphere. We call it a black handle attachment when the former corresponds to the black hole region and a white handle attachment otherwise. Then, the second rule states that a white handle attachment never occurs. The reverse process, in which a black hole region homotopically changes from a disk to a sphere, is ruled out. A white 0-handle attachment, which describes the emergence of the exterior region, is also forbidden. This gives another reason for the well-known result that a black hole cannot bifurcate, because bifurcation corresponds to a white (n − 1)-handle attachment.

Figure 12. Black ring formation from a spherical black hole must be non-axisymmetric in real black hole space-times.

The second rule applies to more general situations. For example, let us consider the topological evolution of the event horizon from $S^{n-1}$ to $S^{n-2} \times S^1$ in (n + 1)-dimensional space-time (n ≥ 3). When it is realized with a single critical point, it corresponds to a 1-handle attachment. Here, one might expect two possibilities if the second rule is not considered. One possibility is that the 1-handle is attached in the exterior region of the black hole. This is locally equivalent to the merging of a pair of black holes; the fact that the two black holes are connected elsewhere is irrelevant to the local picture. The other possibility is that it is attached from the inside such that the 1-handle pierces the black hole region.
In asymptotically flat space-times, only the latter includes axisymmetric configurations, in which a spherical black hole is pinched out along the symmetry axis; here an axisymmetric configuration is one in which the space-time possesses the SO(n − 1) isometry and the time slicing respects this symmetry. However, this latter possibility corresponds to a white 1-handle attachment, which is impossible, and only the former, which corresponds to a black 1-handle attachment, is possible. In particular, a transition from a spherical event horizon ($\approx S^{n-1}$) to a black ring horizon ($\approx S^{n-2} \times S^1$) in asymptotically flat space-times is always non-axisymmetric, in the sense that such a configuration cannot possess SO(n − 1) symmetry (Fig. 12).

While the apparent horizon must be diffeomorphic to a two-sphere in four-dimensional space-times under the dominant energy condition, a torus event horizon may appear, even under the dominant energy condition, via a black 1-handle attachment to the spherical horizon. More generally, an event horizon of arbitrary genus may be formed by several black 1-handle attachments.

The third rule is not directly determined by the local structure of the critical point. It states that the exterior region E(t) = E ∩ Σ(t) at each time is always connected under the assumption that $\mathscr{I}^+$ is connected. Thus, the possibility that a bubble of the exterior region forms inside the black hole horizon is ruled out. It should, however, be noted that such a process is possible if $\mathscr{I}^+$ consists of several connected components. This may also be related to the topological censorship theorem. [19] The topological censorship theorem states that all causal curves from $\mathscr{I}^-$ to $\mathscr{I}^+$ are homotopic under the null energy condition. This also forbids the formation of a bubble of the exterior region inside the black hole, because otherwise there would be two nonhomotopic causal curves from $\mathscr{I}^-$ to $\mathscr{I}^+$, one passing inside the horizon and the other outside.
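The three rules of the concluding remarks can be collected into a small predicate. This is a minimal sketch of the classification only; the function name and the two boolean flags (`black` for the black/white type of the handle, `separates_exterior` for whether the attachment disconnects the exterior region) are hypothetical inputs that would have to be determined from the space-time by other means.

```python
def is_admissible(n, lam, black=True, separates_exterior=False):
    """Apply rules (1)-(3) to a single lambda-handle attachment.

    n   : spatial dimension, so the space-time is (n + 1)-dimensional
    lam : index of the handle, 0 <= lam <= n
    """
    if not 0 <= lam <= n:
        raise ValueError("handle index must satisfy 0 <= lam <= n")
    if lam == n:
        return False   # rule (1): the n-handle attachment is inadmissible
    if not black:
        return False   # rule (2): white handle attachments never occur
    if lam == n - 1 and separates_exterior:
        return False   # rule (3): the exterior region must stay connected
    return True


# In a four-dimensional space-time (n = 3):
creation = is_admissible(3, 0)                               # black 0-handle
collision = is_admissible(3, 1)                              # black 1-handle
bifurcation = is_admissible(3, 2, black=False)               # white (n-1)-handle
two_ended = is_admissible(3, 2, separates_exterior=True)     # Schwarzschild-type 2-handle
```

Under these rules black hole creation and collision are admissible, while bifurcation and the exterior-separating 2-handle of the two-ended Schwarzschild example are not.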
Unlike the topological censorship theorem, our argument does not depend on energy conditions.

References

[1] S. W. Hawking, Commun. Math. Phys. 25 (1972), 152.
[2] S. W. Hawking and G. F. R. Ellis, The Large Scale Structure of Space-Time (Cambridge University Press, 1973).
[3] D. Gannon, Gen. Relat. Gravit. 7 (1974), 219.
[4] P. T. Chrusciel and R. M. Wald, Class. Quantum Grav. 11 (1994), L147; gr-qc/9410004.
[5] T. Jacobson and S. Venkataramani, Class. Quantum Grav. 12 (1995), 1055; gr-qc/9410023.
[6] S. F. Browdy and G. J. Galloway, J. Math. Phys. 36 (1995), 4952.
[7] M. l. Cai and G. J. Galloway, Class. Quantum Grav. 18 (2001), 2707; hep-th/0102149.
[8] C. Helfgott, Y. Oz and Y. Yanay, JHEP 0602 (2006), 025; hep-th/0509013.
[9] G. J. Galloway and R. Schoen, Commun. Math. Phys. 266 (2006), 571; gr-qc/0509107.
[10] G. J. Galloway, gr-qc/0608118.
[11] M. Siino, Phys. Rev. D 58 (1998), 104016; gr-qc/9701003.
[12] M. Siino and T. Koike, gr-qc/0405056.
[13] S. L. Shapiro, S. A. Teukolsky and J. Winicour, Phys. Rev. D 52 (1995), 6982.
[14] J. Milnor, Lectures on the h-Cobordism Theorem (Princeton, Princeton University Press, 1965).
[15] I. Tamura, The differential topology (Tokyo, Iwanami Shoten Publishers, 1978).
[16] P. T. Chrusciel and G. J. Galloway, Commun. Math. Phys. 193 (1998), 449; gr-qc/9611032.
[17] S. Smale, Ann. of Math. 74 (1961), 199.
[18] R. Emparan and H. S. Reall, Phys. Rev. Lett. 88 (2002), 101101; hep-th/0110260.
[19] J. L. Friedman, K. Schleich and D. M. Witt, Phys. Rev. Lett. 71 (1993), 1486 [Errata: 75 (1995), 1872]; gr-qc/9305017.

ABSTRACT

The topological structure of the event horizon has been investigated in terms of Morse theory. The elementary process of topological evolution can be understood as a handle attachment. It has been found that there are certain constraints on the nature of black hole topological evolution: (i) There are n kinds of handle attachments in (n+1)-dimensional black hole space-times. (ii) Handles are further classified as either of black or white type, and only black handles appear in real black hole space-times. (iii) The spatial section of the exterior of the black hole region is always connected. As a corollary, it is shown that the formation of a black hole with an $S^{n-2} \times S^1$ horizon from one with an $S^{n-1}$ horizon must be non-axisymmetric in asymptotically flat space-times.

<|endoftext|><|startoftext|>

Introduction

The sixties were a period in which strongly interacting processes were studied in detail using the newly constructed accelerators at CERN and other places. Many new hadronic states were found that appeared as resonant peaks in various cross sections, and hadronic cross sections were measured with increasing accuracy. In general, the experimental data for strongly interacting processes were rather well understood in terms of resonance exchanges in the direct channel at low energy and the exchange of Regge poles in the transverse channel at higher energy. Field theory, which had been very successful in describing QED, seemed useless for strong interactions, given the large number of hadrons to accommodate in a Lagrangian and the strength of the pion-nucleon coupling constant, which did not allow perturbative calculations. The only domain in which field theoretical techniques were successfully used was current algebra.
Here, assuming that strong interactions were described by an almost chirally invariant Lagrangian, that chiral symmetry was spontaneously broken, and that the pion was the corresponding Goldstone boson, field theoretical methods gave rather good predictions for scattering amplitudes involving pions at very low energy. Going to higher energy was, however, not possible with these methods.

http://arxiv.org/abs/0704.0101v1 Paolo Di Vecchia

Because of this, many people started to think that field theory was useless for describing strong interactions and tried to describe strongly interacting processes with alternative and more phenomenological methods. The basic ingredients for describing the experimental data were, at low energy, the exchange of resonances in the direct channel and, at higher energy, the exchange of Regge poles in the transverse channel. Sum rules for strongly interacting processes were saturated in this way, and one found good agreement with the experimental data that came from the newly constructed accelerators. Because of these successes and of the problems that field theory encountered in describing the data, it was proposed to construct the S matrix directly, without passing through a Lagrangian. The S matrix was supposed to be constructed from the properties that it should satisfy, but there was no clear procedure for implementing this construction [footnote 1]. The word “bootstrap” was often used for the way to construct the S matrix, but it did not help very much in obtaining an S matrix for the strongly interacting processes.

One of the basic ideas that led to the construction of an S matrix was that it should include resonances at low energy and at the same time give Regge behaviour at high energy. But the two contributions of the resonances and of the Regge poles should not be added, because this would imply double counting. This was called Dolen, Horn and Schmidt duality [2].
Another idea that helped in the construction of an S matrix was planar duality [3], which was visualized by associating with a given process a duality diagram, shown in Fig. 1, where each meson was described by two lines representing the quark and the antiquark. Finally, the requirement of crossing symmetry also played a very important role.

Fig. 1. Duality diagram for the scattering of four mesons

Starting from these ideas, Veneziano [4] was able to construct an S matrix for the scattering of four mesons that had an infinite number of zero-width resonances lying on linearly rising Regge trajectories and, at the same time, Regge behaviour at high energy. Veneziano originally constructed the model for the process ππ → πω, but it was immediately extended to the scattering of four scalar particles.

Footnote 1: For a discussion of S matrix theory see Refs. [1].

The birth of string theory

In the case of four identical scalar particles, the crossing-symmetric scattering amplitude found by Veneziano consists of a sum of three terms:

$$A(s, t, u) = A(s, t) + A(s, u) + A(t, u) \qquad (1)$$

where

$$A(s, t) = \frac{\Gamma(-\alpha(s))\,\Gamma(-\alpha(t))}{\Gamma(-\alpha(s) - \alpha(t))} = \int_0^1 dx\, x^{-\alpha(s)-1} (1 - x)^{-\alpha(t)-1} \qquad (2)$$

with linearly rising Regge trajectories

$$\alpha(s) = \alpha_0 + \alpha' s . \qquad (3)$$

Linearly rising trajectories were a very important property to implement in a model because they were in agreement with the experimental data over a wide range of energies. Here s, t and u are the Mandelstam variables:

$$s = -(p_1 + p_2)^2 , \quad t = -(p_3 + p_2)^2 , \quad u = -(p_1 + p_3)^2 . \qquad (4)$$

The three terms in Eq. (1) correspond to the three orderings of the four particles that are not related by a cyclic or anticyclic [footnote 2] permutation of the external legs. They correspond, respectively, to the three permutations (1234), (1243) and (1324) of the four external legs. They have only simple pole singularities. The first one has poles only in the s and t channels, the second only in the s and u channels, and the third only in the t and u channels.
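The two forms of a single planar term, the ratio of Gamma functions and the integral over x, can be compared numerically. The sketch below is an illustration under stated assumptions: the trajectory parameters (α0 = 1, α′ = 1) are arbitrary illustrative values, the function names are hypothetical, and the midpoint-rule integral is only meaningful where the integral converges, i.e. for α(s) < 0 and α(t) < 0.

```python
import math


def alpha(s, alpha0=1.0, slope=1.0):
    """Linear Regge trajectory alpha(s) = alpha0 + alpha' s (illustrative parameters)."""
    return alpha0 + slope * s


def veneziano_term(a_s, a_t):
    """One planar term, written as a ratio of Gamma functions of a_s = alpha(s), a_t = alpha(t)."""
    return math.gamma(-a_s) * math.gamma(-a_t) / math.gamma(-a_s - a_t)


def veneziano_integral(a_s, a_t, steps=200_000):
    """Midpoint-rule estimate of the integral form; requires a_s < 0 and a_t < 0."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** (-a_s - 1) * (1.0 - x) ** (-a_t - 1)
    return h * total


a = alpha(-2.5)                      # alpha = -1.5 with the illustrative parameters
gamma_form = veneziano_term(a, a)
integral_form = veneziano_integral(a, a)
```

Evaluating `veneziano_term` with α(s) close to a non-negative integer shows the magnitude growing without bound, which is the numerical signature of the simple poles in the s channel; by the symmetry of the expression the same holds in the t channel.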
The fact that each term has poles in only two channels follows directly from the duality diagram associated with each inequivalent permutation of the external legs. In fact, at that time one used to associate with each of the three inequivalent permutations a duality diagram in which each particle was drawn as two lines that represented the quark and antiquark making up a meson. Furthermore, the diagram was supposed to have pole singularities only in the planar channels, which are those involving adjacent external lines. This means that, for instance, the duality diagram corresponding to the permutation (1234) has poles only in the s and t channels, as one can see by deforming the diagram in the plane in the two possible ways shown in Fig. 2.

Fig. 2. The duality diagram contains both s and t channel poles

This was a very important property of the duality diagram that makes it qualitatively different from a Feynman diagram in field theory, where each diagram has a pole in only one of the three s, t and u channels and not simultaneously in two of them. If we accept the idea that each term of the sum in Eq. (1) is described by a duality diagram, then it is clear that we do not need to add terms corresponding to equivalent diagrams, because the corresponding duality diagram is the same and has the same singularities. It is now clear that it was in some way implicit in this picture that the Veneziano model corresponds to the scattering of relativistic strings. But at that time the connection was not obvious at all.

Footnote 2: An anticyclic permutation corresponding, for instance, to the ordering (1234) is obtained by taking the reverse of the original ordering, (4321), and then performing a cyclic permutation.

The only S matrix property that the Veneziano model failed to satisfy was unitarity, because it contained only zero-width resonances and did not have the various cuts required by unitarity. We will see how this property will be implemented.
Immediately after the formulation of the Veneziano model, Virasoro [5] proposed another crossing-symmetric four-point amplitude for scalar particles that consisted of a unique piece given by

$$A(s, t, u) \sim \frac{\Gamma\left(-\frac{\alpha(u)}{2}\right) \Gamma\left(-\frac{\alpha(s)}{2}\right) \Gamma\left(-\frac{\alpha(t)}{2}\right)}{\Gamma\left(1 + \frac{\alpha(u)}{2}\right) \Gamma\left(1 + \frac{\alpha(s)}{2}\right) \Gamma\left(1 + \frac{\alpha(t)}{2}\right)} \qquad (5)$$

where

$$\alpha(s) = \alpha_0 + \alpha' s . \qquad (6)$$

The model had poles in all three s, t and u channels and could not be written as a sum of three terms having poles only in planar diagrams. In conclusion, the Veneziano model satisfies the principle of planar duality, being a crossing-symmetric combination of three contributions each having poles only in the planar channels. On the other hand, the Virasoro model consists of a unique crossing-symmetric term having poles in both planar and non-planar channels.

The attempts to construct consistent models that were in good agreement with the strong interaction phenomenology of the sixties boosted enormously the activity in this research field. The generalization of the Veneziano model to the scattering of N scalar particles was built, an operator formalism consisting of an infinite number of harmonic oscillators was constructed, and the complete spectrum of mesons was determined. It turned out that the degeneracy of states grew exponentially with the mass. It was also found that the N-point amplitude had states with negative norm (ghosts) unless the intercept of the Regge trajectory was α0 = 1 [6]. In this case it turned out that the model was free of ghosts but the lowest state was a tachyon. The model was called in the literature the “dual resonance model”.

The model was not unitary because all the states were zero-width resonances and the various cuts required by unitarity were absent. Unitarity was implemented in a perturbative way by adding loop diagrams obtained by sewing some of the external legs together after the insertion of a propagator. The multiloop amplitudes showed a structure of Riemann surfaces.
This became obvious only later, when the dual resonance model was recognized to correspond to the scattering of strings. But the main problem was that the model had a tachyon if $\alpha_0 = 1$, or had ghosts for other values of $\alpha_0$, and was not in agreement with the experimental data: $\alpha_0$ was not equal to about 1/2, as required by experiments for the ρ Regge trajectory, and the external scalar particles did not behave as pions satisfying the current algebra requirements. Many attempts were made to construct more realistic dual resonance models, but the main result of these attempts was the construction of the Neveu-Schwarz [7] and the Ramond [8] models, respectively, for mesons and fermions. They were constructed as two independent models and only later were recognized to be two sectors of the same model. The Neveu-Schwarz model still contained a tachyon, which was eliminated from the physical spectrum only in 1976 through the GSO projection. Furthermore, it did not properly describe the properties of the physical pions.

Actually a model describing ππ scattering in a rather satisfactory way was proposed by Lovelace and Shapiro [9] (see footnote 3). According to this model the three isospin amplitudes for pion-pion scattering are given by:

$$A^0 = \frac{3}{2}\left[A(s,t) + A(s,u)\right] - \frac{1}{2}A(t,u) \; ; \quad A^1 = A(s,t) - A(s,u) \; ; \quad A^2 = A(t,u) \tag{7}$$

where

$$A(s,t) = \beta\, \frac{\Gamma(1-\alpha(s))\,\Gamma(1-\alpha(t))}{\Gamma(1-\alpha(t)-\alpha(s))} \; ; \quad \alpha(s) = \alpha_0 + \alpha' s \tag{8}$$

The amplitudes in Eq. (7) provide a model for ππ scattering with linearly rising Regge trajectories containing three parameters: the intercept of the ρ Regge trajectory $\alpha_0$, the Regge slope $\alpha'$ and β. The first two can be determined by imposing Adler's self-consistency condition, which requires the vanishing of the amplitude when $s = t = u = m_\pi^2$ and one of the pions is massless, and the fact that the Regge trajectory must give the spin of the ρ meson, which is equal to 1, when $s$ is equal to the squared mass of the ρ meson, $m_\rho^2$.
These two conditions determine the Regge trajectory to be:

$$\alpha(s) = \frac{1}{2}\left[1 + \frac{s - m_\pi^2}{m_\rho^2 - m_\pi^2}\right] = 0.48 + 0.885\, s \tag{9}$$

(Footnote 3: See also Ref. [10].)

Having fixed the parameters of the Regge trajectory, the model predicts the masses and the couplings of the resonances that decay into ππ in terms of a unique parameter β. The values obtained are in reasonable agreement with the experiments. Moreover, one can compute the ππ scattering lengths:

$$a_0 = 0.395\,\beta \; ; \quad a_2 = -0.103\,\beta \tag{10}$$

and one finds that their ratio is within 10% of the current algebra ratio given by $a_0/a_2 = -7/2$. The amplitude in Eq. (8) has exactly the same form as that for four tachyons of the Neveu-Schwarz model, with the only apparently minor difference that $\alpha_0 = 1/2$ (for $m_\pi = 0$) instead of 1 as in the Neveu-Schwarz model. This difference, however, implies that the critical space-time dimension of this model is d = 4 (see footnote 4) and not d = 10 as in the Neveu-Schwarz model. In conclusion this model seems to be a perfectly reasonable model for describing low-energy ππ scattering. The problem is, however, that nobody has been able to generalize it to multipion scattering and therefore to get the complete meson spectrum.

As we have seen, the S matrix of the dual resonance model was constructed using ideas and tools of hadron phenomenology of the end of the sixties. Although it did not seem possible to write a realistic dual resonance model describing the pions, its beautiful internal structure and consistency were such a source of fascination for those who actively worked in this field at that time that a lot of energy was used to investigate its properties and to understand its basic structure. It turned out, with great surprise, that the underlying structure was that of a quantum relativistic string.
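The numbers quoted for the Lovelace-Shapiro model in Eqs. (9) and (10) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the Adler zero of Eq. (8) amounts to $\alpha(m_\pi^2) = 1/2$ (so that the Gamma function in the denominator has a pole at $s = t = m_\pi^2$) and that $\alpha(m_\rho^2) = 1$. The mass values `M_PI` and `M_RHO` are illustrative inputs that do not appear in the text.

```python
# Assumed inputs (not taken from the text): meson masses in GeV.
M_PI = 0.1396   # charged pion mass
M_RHO = 0.7645  # rho mass, chosen close to values in use in the late 1960s


def regge_trajectory(m_pi, m_rho):
    """Return (intercept, slope) of alpha(s) = intercept + slope * s.

    The two conditions imposed are alpha(m_pi^2) = 1/2 and alpha(m_rho^2) = 1.
    """
    slope = 0.5 / (m_rho**2 - m_pi**2)
    intercept = 0.5 - slope * m_pi**2
    return intercept, slope


intercept, slope = regge_trajectory(M_PI, M_RHO)

# Ratio of the scattering lengths of Eq. (10); beta cancels in the ratio.
ratio = 0.395 / -0.103
# Relative deviation from the current algebra value -7/2.
deviation = abs(ratio - (-3.5)) / 3.5

print(f"alpha(s) = {intercept:.3f} + {slope:.3f} s")
print(f"a0/a2 = {ratio:.3f}, deviation from -7/2 = {deviation:.1%}")
```

With these assumed masses the intercept and slope come out close to the values 0.48 and 0.885 of Eq. (9), and the ratio of scattering lengths deviates from -7/2 by a little less than 10%.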
The aim of this contribution is to explain the logic of the work that was done in the years from 1968 to 1974 (see footnote 5) in order to uncover the deep properties of this model, which appeared from the beginning to be so beautiful and consistent as to deserve an intensive study. This seems to me a very good way of celebrating the 65th anniversary of Gabriele, who is the person who started the whole thing and also contributed to its development with his deep physical intuition.

(Footnote 4: This can be checked by computing the coupling of the spinless particle at the level $\alpha(s) = 2$ and seeing that it vanishes for d = 4.)

(Footnote 5: Reviews from this period can be found in Ref. [11].)

2 Construction of the N-point amplitude

We have seen that the construction of the four-point amplitude is not sufficient to get information on the full hadronic spectrum, because it contains only those hadrons that couple to two ground state mesons and does not see those intermediate states which only couple to three or to a higher number of ground state mesons [12]. Therefore, it was very important to construct the N-point amplitude involving identical scalar particles. The construction of the N-point amplitude was done in Ref. [13] (extending the work of Ref. [14]) by requiring the same principles that had led to the construction of the Veneziano model, namely that the axioms of S-matrix theory be satisfied by an infinite number of zero width resonances lying on linearly rising Regge trajectories, and planar duality.

The fully crossing symmetric scattering amplitude of N identical scalar particles is given by a sum of terms corresponding to the inequivalent permutations of the external legs:

$$A = \sum_{n=1}^{N_p} A_n \tag{11}$$

Also in this case two permutations of the external legs are inequivalent if they are not related by a cyclic or anticyclic permutation. $N_p$ is the number of inequivalent permutations of the external legs and is equal to $N_p = \frac{(N-1)!}{2}$
and each term has only simple pole singularities in the planar channels. Each planar channel is described by two indices (i, j), to mean that it includes the legs i, i+1, i+2, ..., j−1, j, by the Mandelstam variable

$$s_{ij} = -(p_i + p_{i+1} + \ldots + p_j)^2 \tag{12}$$

and by an additional variable $u_{ij}$ whose role will become clear soon. It is clear that the channels (i, j) and (j+1, i−1) (see footnote 6) are identical and they should be counted only once. In the case of N identical scalar particles the number of planar channels is equal to $\frac{N(N-3)}{2}$. This can be obtained as follows. The independent planar diagrams involving the particle 1 are of the type (1, i) where i = 2, ..., N−2. Their number is N−3. This is also the number of planar diagrams involving the particle 2 and not the particle 1. The number of planar diagrams involving the particle 3 and not the particles 1 and 2 is equal to N−4. In general the number of planar diagrams involving the particle i and not the previous ones from 1 to i−1 is equal to N−1−i. This means that the total number of planar diagrams is equal to:

$$2(N-3) + \sum_{i=3}^{N-2}(N-1-i) = 2(N-3) + \sum_{i=1}^{N-4} i = 2(N-3) + \frac{(N-4)(N-3)}{2} = \frac{N(N-3)}{2} \tag{13}$$

(Footnote 6: This channel includes the particles (j+1, ..., N, 1, ..., i−1).)

If one writes down the duality diagram corresponding to a certain planar ordering of the external particles, it is easy to see that the diagram can have simultaneous pole singularities only in N−3 channels. The channels that allow simultaneous pole singularities are called compatible channels, the others are called incompatible. Two channels (i, j) and (h, k) are incompatible if the following inequalities are satisfied:

$$i \leq h \leq j \; ; \quad j+1 \leq k \leq i-1 \tag{14}$$

The aim is to construct, for each inequivalent permutation of the external legs, a scattering amplitude that has only pole singularities in the $\frac{N(N-3)}{2}$ planar channels. We also have to impose that the amplitude has simultaneous poles only in N−3 compatible channels.
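The two counting statements above, N(N−3)/2 planar channels and at most N−3 simultaneously compatible ones, can be checked by brute force for small N. The sketch below is an illustration under one stated assumption: two channels are treated as compatible when their sets of legs are nested or disjoint, which is my reading of the duality-diagram picture and not a literal transcription of Eq. (14). The function names are hypothetical.

```python
from itertools import combinations


def planar_channels(n):
    """Return the planar channels of n cyclically ordered legs.

    A channel is stored as the unordered pair {block, complement}, so that
    (i, j) and (j+1, i-1) are counted only once. Blocks contain between 2
    and n-2 consecutive legs.
    """
    legs = frozenset(range(1, n + 1))
    channels = set()
    for start in range(n):
        for size in range(2, n - 1):
            block = frozenset((start + k) % n + 1 for k in range(size))
            channels.add(frozenset({block, legs - block}))
    return channels


def compatible(c1, c2):
    """Assumed criterion: some side of c1 is disjoint from some side of c2,
    which is the same as saying that the two blocks are nested or disjoint."""
    return any(not (x & y) for x in c1 for y in c2)


def max_compatible(n):
    """Size of the largest set of pairwise compatible channels (brute force)."""
    chans = list(planar_channels(n))
    best = 0
    for size in range(1, len(chans) + 1):
        found = any(
            all(compatible(p, q) for p, q in combinations(subset, 2))
            for subset in combinations(chans, size)
        )
        if not found:
            break  # no compatible set of this size, hence none larger
        best = size
    return best


for n in range(4, 8):
    print(n, len(planar_channels(n)), max_compatible(n))
```

The brute-force search grows quickly with N, so it is only meant for the small values used here.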
In order to gain intuition on how to proceed we rewrite the four-point amplitude in Eq. (2) as follows:

$$A(s,t) = \int_0^1 du_{12} \int_0^1 du_{23}\; u_{12}^{-\alpha(s_{12})-1}\, u_{23}^{-\alpha(s_{23})-1}\, \delta(u_{12} + u_{23} - 1) \tag{15}$$

where $u_{12}$ and $u_{23}$ are the variables corresponding to the two planar channels (12) and (23), and the cancellation of simultaneous poles in incompatible channels is provided by the δ-function, which forbids $u_{12}$ and $u_{23}$ to vanish simultaneously.

We will now extend this procedure to the N-point amplitude. But for the sake of clarity let us start with the case of N = 5 [14]. In this case we have 5 planar channels described by $u_{12}$, $u_{13}$, $u_{23}$, $u_{24}$ and $u_{34}$. Since we have only two compatible channels, only two of the previous five variables are independent. We can choose them to be $u_{12}$ and $u_{13}$. In order to determine the dependence of the other three variables on the two independent ones, we exclude simultaneous poles in incompatible channels. This can be done by imposing relations that prevent variables corresponding to incompatible channels from vanishing simultaneously. A sufficient condition for excluding simultaneous poles in incompatible channels is to impose the conditions:

$$u_P = 1 - \prod_{\bar P} u_{\bar P} \tag{16}$$

where the product is over the variables $u_{\bar P}$ corresponding to channels $\bar P$ that are incompatible with P. In the case of the five-point amplitude we get the following relations:

$$u_{23} = 1 - u_{34}u_{12} \; ; \quad u_{24} = 1 - u_{13}u_{12} \; ; \quad u_{13} = 1 - u_{34}u_{24} \; ; \quad u_{34} = 1 - u_{23}u_{13} \; ; \quad u_{12} = 1 - u_{24}u_{23} \tag{17}$$

Solving them in terms of the two independent ones we get:

$$u_{23} = \frac{1 - u_{12}}{1 - u_{12}u_{13}} \; ; \quad u_{34} = \frac{1 - u_{13}}{1 - u_{12}u_{13}} \; ; \quad u_{24} = 1 - u_{12}u_{13} \tag{18}$$

In analogy with what we have done for the four-point amplitude in Eq.
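(15) we write the five-point amplitude as follows:

$$B_5 = \int_0^1 du_{12} \int_0^1 du_{13} \int_0^1 du_{23} \int_0^1 du_{24} \int_0^1 du_{34}\; u_{12}^{-\alpha(s_{12})-1}\, u_{13}^{-\alpha(s_{13})-1}\, u_{24}^{-\alpha(s_{24})-1}\, u_{23}^{-\alpha(s_{23})-1}\, u_{34}^{-\alpha(s_{34})-1} \times \delta(u_{23} + u_{12}u_{34} - 1)\,\delta(u_{24} + u_{12}u_{13} - 1)\,\delta(u_{34} + u_{13}u_{23} - 1) \tag{19}$$

Performing the integral over the variables $u_{23}$, $u_{24}$ and $u_{34}$ we get:

$$B_5 = \int_0^1 du_{12} \int_0^1 du_{13}\; u_{12}^{-\alpha(s_{12})-1}\, u_{13}^{-\alpha(s_{13})-1}\, (1-u_{12})^{-\alpha(s_{23})-1}\, (1-u_{13})^{-\alpha(s_{34})-1}\, (1-u_{12}u_{13})^{-\alpha(s_{24})+\alpha(s_{23})+\alpha(s_{34})} \tag{20}$$

We have implicitly assumed that the Regge trajectory is the same in all channels and that the external scalar particles have the same common mass m and are the lowest lying states on the Regge trajectory. This means that their mass is given by:

$$\alpha_0 - \alpha' p_i^2 = 0 \; ; \quad p_i^2 \equiv -m^2 \tag{21}$$

Using then the relation:

$$\alpha(s_{23}) + \alpha(s_{34}) - \alpha(s_{24}) = 2\alpha' p_2 \cdot p_4 \tag{22}$$

we can rewrite Eq. (20) as follows:

$$B_5 = \int_0^1 du_2 \int_0^1 du_3\; u_2^{-\alpha(s_2)-1}\, u_3^{-\alpha(s_3)-1}\, (1-u_2)^{-\alpha(s_{23})-1}\, (1-u_3)^{-\alpha(s_{34})-1} \prod_{i=2}^{2}\prod_{j=4}^{4} (1 - x_{ij})^{2\alpha' p_i \cdot p_j} \tag{23}$$

where

$$s_i \equiv s_{1i} \; , \quad u_i \equiv u_{1i} \; ; \quad i = 2, 3 \; ; \quad x_{ij} = u_i u_{i+1} \ldots u_{j-1} \tag{24}$$

We are now ready to construct the N-point function [13]. In analogy with what has been done for the four and five-point amplitudes we can write the N-point amplitude as follows:

$$B_N = \int_0^1 \ldots \int_0^1 \prod_P \left[du_P\, u_P^{-\alpha(s_P)-1}\right] \prod_Q \delta\Big(u_Q - 1 + \prod_{\bar Q} u_{\bar Q}\Big) \tag{25}$$

where the first product is over the $\frac{N(N-3)}{2}$ variables corresponding to all planar channels, while the second one is over the $\frac{(N-3)(N-2)}{2}$ independent δ-functions. The product in the δ-function is defined in Eq. (16). The solution of all the non-independent relations imposed by the δ-functions is given by

$$u_{ij} = \frac{(1 - x_{ij})(1 - x_{i-1,j+1})}{(1 - x_{i-1,j})(1 - x_{i,j+1})} \tag{26}$$

where the variables $x_{ij}$ are given in Eq. (24). Eliminating the δ-functions from Eq. (25) one gets:

$$B_N = \prod_{i=2}^{N-2} \left[\int_0^1 du_i\, u_i^{-\alpha(s_i)-1} (1-u_i)^{-\alpha(s_{i,i+1})-1}\right] \prod_{i=2}^{N-3}\prod_{j=i+2}^{N-1} (1 - x_{ij})^{-\gamma_{ij}} \tag{27}$$

where

$$\gamma_{ij} = \alpha(s_{ij}) + \alpha(s_{i+1;j-1}) - \alpha(s_{i;j-1}) - \alpha(s_{i+1;j}) \; ; \quad j \geq i+2 \tag{28}$$

It is easy to see that

$$\alpha(s_{i,i+1}) = -\alpha_0 - 2\alpha' p_i \cdot p_{i+1} \; ; \quad \gamma_{ij} = -2\alpha' p_i \cdot p_j \; ; \quad j \geq i+2 \tag{29}$$

Inserting them in Eq. (27) we get:

$$B_N = \prod_{i=2}^{N-2} \left[\int_0^1 du_i\, u_i^{-\alpha(s_i)-1} (1-u_i)^{\alpha_0-1}\right] \prod_{i=2}^{N-2}\prod_{j=i+1}^{N-1} (1 - x_{ij})^{2\alpha' p_i \cdot p_j} \tag{30}$$

This is the form of the N-point amplitude that was originally constructed.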
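As a sanity check on the five-point step of the construction, one can verify numerically that the solution in Eq. (18) satisfies all five relations of Eq. (17) for arbitrary values of the two independent variables. The sketch below is illustrative; the function names and the sampling range are my own choices.

```python
import random


def dependent_variables(u12, u13):
    """Eq. (18): the three dependent channel variables for N = 5."""
    u24 = 1 - u12 * u13
    u23 = (1 - u12) / u24
    u34 = (1 - u13) / u24
    return u23, u24, u34


def residuals(u12, u13):
    """Left-hand side minus right-hand side of the five relations in Eq. (17)."""
    u23, u24, u34 = dependent_variables(u12, u13)
    return [
        u23 - (1 - u34 * u12),
        u24 - (1 - u13 * u12),
        u13 - (1 - u34 * u24),
        u34 - (1 - u23 * u13),
        u12 - (1 - u24 * u23),
    ]


# Sample the independent variables inside the integration region (0, 1).
rng = random.Random(0)
worst = max(
    abs(r)
    for _ in range(1000)
    for r in residuals(rng.uniform(0.01, 0.99), rng.uniform(0.01, 0.99))
)
print("largest residual:", worst)
```

The largest residual stays at the level of floating-point rounding, as expected if Eq. (18) solves Eq. (17) identically.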
Then Koba and Nielsen [15] put it in the form that is better known nowadays. They constructed it using the following rules. They associated a real variable $z_i$ to each leg i. Then they associated to each channel (i, j) an anharmonic ratio constructed from the variables $z_i, z_{i-1}, z_j, z_{j+1}$ in the following way

$$(z_i, z_{i+1}, z_j, z_{j+1})^{-\alpha(s_{ij})-1} = \left[\frac{(z_i - z_j)(z_{i-1} - z_{j+1})}{(z_{i-1} - z_j)(z_i - z_{j+1})}\right]^{-\alpha(s_{ij})-1} \tag{31}$$

and finally they gave the following expression for the N-point amplitude:

$$B_N = \int_{-\infty}^{\infty} dV(z) \prod_{(i,j)} (z_i, z_{i+1}, z_j, z_{j+1})^{-\alpha(s_{ij})-1} \tag{32}$$

where

$$dV(z) = \frac{\prod_{i=1}^{N} \left[\theta(z_i - z_{i+1})\, dz_i\right]}{\prod_{i=1}^{N} (z_i - z_{i+2})\; dV_{abc}} \; ; \quad dV_{abc} = \frac{dz_a\, dz_b\, dz_c}{(z_b - z_a)(z_c - z_b)(z_a - z_c)} \tag{33}$$

and the variables $z_i$ are integrated along the real axis in a cyclically ordered way: $z_1 \geq z_2 \geq \ldots \geq z_N$, with a, b, c arbitrarily chosen.

The integrand of the N-point amplitude is invariant under projective transformations acting on the leg variables $z_i$:

$$z_i \to \frac{\alpha z_i + \beta}{\gamma z_i + \delta} \; ; \quad i = 1 \ldots N \; ; \quad \alpha\delta - \beta\gamma = 1 \tag{34}$$

This is because both the anharmonic ratio in Eq. (31) and the measure $dV_{abc}$ are invariant under a projective transformation. Since a projective transformation depends on three real parameters, the integrand of the N-point amplitude depends only on N−3 variables $z_i$. In order to avoid infinities, one then has to divide the integration volume by the factor $dV_{abc}$, which is also invariant under the projective transformations. The fact that the integrand depends only on N−3 variables is in agreement with the fact that N−3 is also the maximal number of simultaneous poles allowed in the amplitude.

It is convenient to write the N-point amplitude in a form that involves the scalar products of the external momenta rather than the Regge trajectories. We distinguish three kinds of channels. The first one is when the particles i and j of the channel (i, j) are separated by at least two particles.
In this case the channels that contribute to the exponent of the factor $(z_i - z_j)$ are the channels (i, j) with exponent equal to $-\alpha(s_{ij}) - 1$, (i+1, j−1) with exponent $-\alpha(s_{i+1,j-1}) - 1$, (i+1, j) with exponent $\alpha(s_{i+1,j}) + 1$ and (i, j−1) with exponent $\alpha(s_{i,j-1}) + 1$. Adding these four contributions one gets for the channels where i and j are separated by at least two particles

$$-\alpha(s_{ij}) - \alpha(s_{i+1,j-1}) + \alpha(s_{i+1,j}) + \alpha(s_{i,j-1}) = 2\alpha' p_i \cdot p_j \tag{35}$$

The second one comes from the channels that are separated by only one particle. In this case only three of the previous four channels contribute. For instance, if j = i+2 the channel (i+1, j−1) consists of only one particle and therefore should not be included. This means that we would get:

$$-\alpha(s_{i;i+2}) - 1 + \alpha(s_{i+1;i+2}) + 1 + \alpha(s_{i;i+1}) + 1 = 1 + 2\alpha' p_i \cdot p_{i+2} \tag{36}$$

Finally the third one, which comes from the channels whose particles are adjacent, gets a contribution only from:

$$-\alpha(s_{i;i+1}) - 1 = \alpha_0 - 1 + 2\alpha' p_i \cdot p_{i+1} \tag{37}$$

Putting all these three terms together in Eq. (32) and remembering the factor in the denominator in the first equation of (33) we get:

$$B_N = \int_{-\infty}^{\infty} \frac{\prod_{i=1}^{N} dz_i\, \theta(z_i - z_{i+1})}{dV_{abc}} \prod_{i=1}^{N} \left[(z_i - z_{i+1})^{\alpha_0 - 1}\right] \prod_{i<j} (z_i - z_j)^{2\alpha' p_i \cdot p_j} \tag{38}$$

A convenient choice for the three variables to keep fixed is:

$$z_a = z_1 = \infty \; ; \quad z_b = z_2 = 1 \; ; \quad z_c = z_N = 0 \tag{39}$$

With this choice the previous equation becomes:

$$B_N = \prod_{i=3}^{N-1} \left[\int_0^1 dz_i\, \theta(z_i - z_{i+1})\right] \prod_{i=2}^{N-1} (z_i - z_{i+1})^{\alpha_0 - 1} \prod_{i=2}^{N-1}\prod_{j=i+1}^{N} (z_i - z_j)^{2\alpha' p_i \cdot p_j} \tag{40}$$

We now want to show that this amplitude is identical to the one given in Eq. (30). This can be done by performing the following change of variables:

$$u_i = \frac{z_{i+1}}{z_i} \; ; \quad i = 2, 3 \ldots N-2 \tag{41}$$

that implies

$$z_i = u_2 u_3 \ldots u_{i-1} \; ; \quad i = 3, 4 \ldots N-1 \tag{42}$$

Taking into account that the Jacobian is equal to:

$$\det\left(\frac{\partial z}{\partial u}\right) = \prod_{i=2}^{N-3} u_i^{N-2-i} \tag{43}$$

using the following two relations:

$$\prod_{i=2}^{N-1} (z_i - z_{i+1})^{\alpha_0 - 1} = \prod_{i=2}^{N-2} \left[u_i^{(N-1-i)(\alpha_0 - 1)} (1 - u_i)^{\alpha_0 - 1}\right] \tag{44}$$

$$\prod_{i=2}^{N-1}\prod_{j=i+1}^{N} (z_i - z_j)^{2\alpha' p_i \cdot p_j} = \prod_{i=2}^{N-2}\prod_{j=i+1}^{N-1} (1 - x_{ij})^{2\alpha' p_i \cdot p_j} \prod_{i=2}^{N-2} u_i^{-\alpha(s_i) - (N-i-1)\alpha_0} \tag{45}$$

and the conservation of momentum

$$\sum_{i=1}^{N} p_i = 0 \tag{46}$$

together with Eq. (21), one can easily see that Eqs. (30) and (40) are equal.
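The projective invariance of the anharmonic ratio in Eq. (31) and of the measure $dV_{abc}$ in Eq. (33), on which the Koba-Nielsen form relies, is easy to check numerically. The sketch below is an illustration only: the sample points and the matrix entries are arbitrary choices of mine, subject to $\alpha\delta - \beta\gamma = 1$ as in Eq. (34).

```python
def mobius(z, a, b, c, d):
    """Projective transformation z -> (a z + b) / (c z + d) of Eq. (34)."""
    return (a * z + b) / (c * z + d)


def anharmonic_ratio(z_i, z_im1, z_j, z_jp1):
    """Ratio inside the bracket on the right-hand side of Eq. (31)."""
    return ((z_i - z_j) * (z_im1 - z_jp1)) / ((z_im1 - z_j) * (z_i - z_jp1))


def measure_density(za, zb, zc):
    """Coefficient of dz_a dz_b dz_c in dV_abc, Eq. (33)."""
    return 1.0 / ((zb - za) * (zc - zb) * (za - zc))


# Arbitrary transformation with a*d - b*c = 1 and cyclically ordered points.
a, b, c, d = 2.0, 1.0, 1.0, 1.0
z_im1, z_i, z_j, z_jp1 = 5.0, 3.0, 2.0, 0.5

before = anharmonic_ratio(z_i, z_im1, z_j, z_jp1)
after = anharmonic_ratio(*(mobius(z, a, b, c, d) for z in (z_i, z_im1, z_j, z_jp1)))

# Under the map, dz -> dz / (c z + d)^2, so the transformed density picks up
# the product of the three Jacobian factors; the total should be unchanged.
points = (5.0, 3.0, 0.5)
jacobian = 1.0
for z in points:
    jacobian *= 1.0 / (c * z + d) ** 2
density_before = measure_density(*points)
density_after = measure_density(*(mobius(z, a, b, c, d) for z in points)) * jacobian

print(before, after)
print(density_before, density_after)
```

Both pairs of numbers agree to rounding accuracy, which is the statement that the ratio and $dV_{abc}$ are separately invariant.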
The N-point amplitude that we have constructed in this section corresponds to the scattering of N spinless particles with no internal degrees of freedom. On the other hand, it was known that the mesons were classified according to multiplets of an SU(3) flavour symmetry. This was implemented by Chan and Paton [16] by multiplying the N-point amplitude with a factor, called the Chan-Paton factor, given by

$$\mathrm{Tr}(\lambda^{a_1}\lambda^{a_2} \ldots \lambda^{a_N}) \tag{47}$$

where the λ's are matrices of a unitary group in the fundamental representation. Including the Chan-Paton factors, the total scattering amplitude is given by:

$$\sum_{P} \mathrm{Tr}(\lambda^{a_1}\lambda^{a_2} \ldots \lambda^{a_N})\, B_N(p_1, p_2, \ldots p_N) \tag{48}$$

where the sum is extended to the (N−1)! permutations of the external legs that are not related by a cyclic permutation. Originally, when the dual resonance model was supposed to describe strongly interacting mesons, this factor was introduced to represent their flavour degrees of freedom. Nowadays the interpretation is different and the Chan-Paton factor represents the colour degrees of freedom of the gauge bosons and the other massive particles of the spectrum.

The N-point amplitude $B_N$ that we have constructed in this section contains only simple pole singularities in all possible planar channels. They correspond to zero width resonances located at non-negative integer values n of the Regge trajectory, $\alpha(M^2) = n$. The lowest state, located at $\alpha(m^2) = 0$, corresponds to the particles on the external legs of $B_N$. The spectrum of excited particles can be obtained by factorizing the N-point amplitude in the most general channel with any number of particles. This was done in Refs. [17] and [18], finding a spectrum of states rising exponentially with the mass M.
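The rapid growth of the number of states with the level can be illustrated by a simple counting exercise. The sketch below assumes, as the operator formalism makes explicit, that the states at level n are built from creation operators $a^\dagger_{n,\mu}$ with d space-time components each, and that the level is the total mode number. Under this assumption the number of states at level n is the coefficient of $x^n$ in $\prod_{k \geq 1} (1 - x^k)^{-d}$. The count includes the negative norm states, so it is an upper bound on the physical degeneracy and not the physical spectrum itself. The function name is hypothetical.

```python
def level_degeneracies(max_level, d):
    """Coefficients of prod_{k>=1} (1 - x^k)^(-d) up to x^max_level.

    Entry n counts the monomials in d-component creation operators whose
    mode numbers add up to n (all oscillator states, ghosts included).
    """
    coeffs = [1] + [0] * max_level
    for k in range(1, max_level + 1):
        for _ in range(d):
            # Multiply the series in place by 1 / (1 - x^k).
            for n in range(k, max_level + 1):
                coeffs[n] += coeffs[n - k]
    return coeffs


# d = 4 is the value used in the original calculations.
for n, count in enumerate(level_degeneracies(10, 4)):
    print(n, count)
```

For d = 1 the same function returns the ordinary partition numbers, which is a convenient cross-check of the recursion.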
Since the model is relativistically invariant, it was found that many states obtained by factorizing the N-point amplitude were "ghosts", namely states with negative norm, as one finds in QED when one quantizes the electromagnetic field in a covariant gauge. The consistency of the model requires the existence of relations satisfied by the scattering amplitudes that are similar to those obtained through gauge invariance in QED. If the model is consistent they must decouple the negative norm states, leaving us with a physical spectrum of positive norm states. In order to study these issues in a simple way, we discuss in the next section the operator formalism introduced already in 1969 [19, 20, 21].

Before concluding this section let us go back to the non-planar four-point amplitude in Eq. (5) and discuss its generalization to an N-point amplitude. Using the technique of the electrostatic analogue on the sphere instead of on the disk, Shapiro [22] was able to obtain an N-point amplitude that reduces to the four-point amplitude in Eq. (5) with intercept $\alpha_0 = 2$. The N-point amplitude found in Ref. [22] is:

$$\int \frac{\prod_{i=1}^{N} d^2 z_i}{dV_{abc}} \prod_{i<j} |z_i - z_j|^{\alpha' p_i \cdot p_j} \tag{49}$$

where

$$dV_{abc} = \frac{d^2 z_a\, d^2 z_b\, d^2 z_c}{|z_a - z_b|^2\, |z_a - z_c|^2\, |z_b - z_c|^2} \tag{50}$$

The integral in Eq. (49) is performed over the entire complex plane.

3 Operator formalism and factorization

The factorization properties of the dual resonance model were first studied by factorizing by brute force the N-point amplitude at the various poles [17, 18]. The number of terms that factorize the residue of the pole at $\alpha(s) = n$ increases rapidly with the value of n. In order to find their degeneracy it turned out to be convenient to first rewrite the N-point amplitude in an operator formalism.

In this section we introduce the operator formalism and we rewrite the N-point amplitude derived in the previous section in this formalism.
The key idea [19, 20, 21] is to introduce an infinite set of harmonic oscillators and position and momentum operators (see footnote 7) which satisfy the following commutation relations:

$$[a_{n\mu}, a^\dagger_{m\nu}] = \eta_{\mu\nu}\delta_{nm} \; ; \quad [\hat q_\mu, \hat p_\nu] = i\eta_{\mu\nu} \tag{51}$$

where $\eta_{\mu\nu}$ is the flat Minkowski metric that we take to be $\eta_{\mu\nu} = (-1, 1, \ldots 1)$. A state with momentum p is constructed in terms of a state with zero momentum as follows:

$$\hat p|p\rangle \equiv \hat p\, e^{ip\cdot\hat q}|0\rangle = p|p\rangle \; ; \quad \hat p|0\rangle = 0 \tag{52}$$

normalized as (see footnote 8)

$$\langle p|p'\rangle = (2\pi)^d \delta^{(d)}(p + p') \tag{53}$$

In order to avoid minus signs we use the convention that

$$\langle p| = \langle 0|e^{ip\cdot\hat q} \tag{54}$$

A complete and orthonormal basis of vectors in the harmonic oscillator space is given by

$$|\lambda_1, \lambda_2, \ldots \lambda_i; p\rangle = \prod_n \frac{(a^\dagger_{\mu_n;n})^{\lambda_{n;\mu_n}}}{\sqrt{\lambda_{n,\mu_n}!}}\; e^{ip\cdot\hat q}|0, 0\rangle \tag{55}$$

(Footnote 7: Actually the position and momentum operators were introduced in Ref. [23].)

(Footnote 8: Although we now use an arbitrary d, we want to remind you that all original calculations were done for d = 4.)

where the first 0 in $|0, 0\rangle$ corresponds to the state annihilated by all annihilation operators and the second one to the state of zero momentum:

$$a_{\mu_n;n}|0, 0\rangle = \hat p|0, 0\rangle = 0 \tag{56}$$

Notice that Lorentz invariance forces us to introduce also oscillators that create states with negative norm, due to the minus sign in the flat Minkowski metric. This implies that the space spanned by the states in Eq. (55) is not positive definite.
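This is, however, not allowed in a quantum theory and therefore, if the dual resonance model is a consistent quantum-relativistic theory, we expect the presence of relations of the kind of those provided by gauge invariance in QED.

Let us introduce the Fubini-Veneziano [23] operator:

$$Q_\mu(z) = Q^{(+)}_\mu(z) + Q^{(0)}_\mu(z) + Q^{(-)}_\mu(z) \tag{57}$$

where

$$Q^{(+)} = i\sqrt{2\alpha'} \sum_{n=1}^{\infty} \frac{a_n}{\sqrt n}\, z^{-n} \; ; \quad Q^{(-)} = -i\sqrt{2\alpha'} \sum_{n=1}^{\infty} \frac{a^\dagger_n}{\sqrt n}\, z^{n} \; ; \quad Q^{(0)} = \hat q - 2i\alpha' \hat p \log z \tag{58}$$

In terms of Q we introduce the vertex operator corresponding to the external leg with momentum p:

$$V(z; p) = \, :e^{ip\cdot Q(z)}: \, \equiv e^{ip\cdot Q^{(-)}(z)}\, e^{ip\cdot\hat q}\, e^{2\alpha' \hat p\cdot p \log z}\, e^{ip\cdot Q^{(+)}(z)} \tag{59}$$

and compute the following vacuum expectation value:

$$\langle 0, 0| \prod_{i=1}^{N} V(z_i, p_i)|0, 0\rangle \tag{60}$$

It can be easily computed using the Baker-Hausdorff relation

$$e^A e^B = e^B e^A e^{[A,B]} \tag{61}$$

that is valid if the commutator [A, B] is a c-number, as in our case. The commutation relations to be used are:

$$[Q^{(+)}(z), Q^{(-)}(w)] = -2\alpha' \log\left(1 - \frac{w}{z}\right) \tag{62}$$

and the second one in Eq. (51). Using them one gets:

$$V(z; p)V(w; k) = \, :V(z; p)V(w; k): \, (z - w)^{2\alpha' p\cdot k} \tag{63}$$

and

$$\langle 0, 0| \prod_{i=1}^{N} V(z_i, p_i)|0, 0\rangle = \prod_{i<j} (z_i - z_j)^{2\alpha' p_i\cdot p_j}\, (2\pi)^d \delta^{(d)}\Big(\sum_{i=1}^{N} p_i\Big) \tag{64}$$

where the normal ordering requires that all creation operators be put on the left of the annihilation ones and the momentum operator $\hat p$ be put on the right of the position operator $\hat q$. This means that

$$(2\pi)^d \delta^{(d)}\Big(\sum_{i=1}^{N} p_i\Big) B_N = \int_{-\infty}^{\infty} \frac{\prod_{i=1}^{N} dz_i\, \theta(z_i - z_{i+1})}{dV_{abc}} \prod_{i=1}^{N} \left[(z_i - z_{i+1})^{\alpha_0 - 1}\right] \langle 0, 0| \prod_{i=1}^{N} V(z_i, p_i)|0, 0\rangle \tag{65}$$

By choosing the three variables $z_a$, $z_b$ and $z_c$ as in Eq. (39) we can rewrite the previous equation as follows:

$$(2\pi)^d \delta^{(d)}\Big(\sum_{i=1}^{N} p_i\Big) B_N = \int_0^1 \prod_{i=3}^{N-1} dz_i \prod_{i=2}^{N-1} \theta(z_i - z_{i+1}) \prod_{i=2}^{N-1} (z_i - z_{i+1})^{\alpha_0 - 1}\, \langle 0, p_1| \prod_{i=2}^{N-1} V(z_i; p_i)|0, p_N\rangle \tag{66}$$

where we have taken $z_2 = 1$ and we have defined ($\alpha_0 \equiv \alpha' p_i^2$; i = 1 ... N):

$$\lim_{z_N \to 0} V(z_N; p_N)|0, 0\rangle \equiv |0; p_N\rangle \; ; \quad \langle 0; 0| \lim_{z_1 \to \infty} z_1^{2\alpha_0} V(z_1; p_1) = \langle 0, p_1| \tag{67}$$

Before proceeding to factorize the N-point amplitude let us study the properties under the projective group of the operators that we have introduced. We have already seen that the projective group leaves the integrand of the Koba-Nielsen representation of the N-point amplitude invariant.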
The projective group has three generators $L_0$, $L_1$ and $L_{-1}$ corresponding respectively to dilatations, inversions and translations. Assuming that the Fubini-Veneziano field Q(z) transforms as a field with weight 0 (as a scalar) we can immediately write the commutation relations that Q(z) must satisfy. This means in fact that, under a projective transformation, Q(z) transforms as follows:

$$Q(z) \to Q^T(z) = Q\left(\frac{\alpha z + \beta}{\gamma z + \delta}\right) \; ; \quad \alpha\delta - \beta\gamma = 1 \tag{68}$$

Expanding for small values of the parameters we get:

$$Q^T(z) = Q(z) + (\epsilon_1 + \epsilon_2 z + \epsilon_3 z^2)\frac{dQ(z)}{dz} + o(\epsilon^2) \tag{69}$$

This means that the three generators of the projective group must satisfy the following commutation relations with Q(z):

$$[L_0, Q(z)] = z\frac{dQ}{dz} \; ; \quad [L_{-1}, Q(z)] = \frac{dQ}{dz} \; ; \quad [L_1, Q(z)] = z^2\frac{dQ}{dz} \tag{70}$$

They are given by the following expressions in terms of the harmonic oscillators:

$$L_0 = \alpha' \hat p^2 + \sum_{n=1}^{\infty} n\, a^\dagger_n \cdot a_n \; ; \quad L_1 = \sqrt{2\alpha'}\, \hat p \cdot a_1 + \sum_{n=1}^{\infty} \sqrt{n(n+1)}\; a_{n+1} \cdot a^\dagger_n \tag{71}$$

$$L_{-1} = L_1^\dagger = \sqrt{2\alpha'}\, \hat p \cdot a^\dagger_1 + \sum_{n=1}^{\infty} \sqrt{n(n+1)}\; a^\dagger_{n+1} \cdot a_n \tag{72}$$

They annihilate the vacuum

$$L_0|0, 0\rangle = L_1|0, 0\rangle = L_{-1}|0, 0\rangle = 0 \tag{73}$$

that is therefore called the projective invariant vacuum, and satisfy the algebra that is called the Gliozzi algebra [24] (see footnote 9):

$$[L_0, L_1] = -L_1 \; ; \quad [L_0, L_{-1}] = L_{-1} \; ; \quad [L_1, L_{-1}] = 2L_0 \tag{74}$$

(Footnote 9: See also Ref. [25].)

The vertex operator with momentum p is a projective field with weight equal to $\alpha_0 = \alpha' p^2$. It transforms in fact as follows under the projective group:

$$[L_n, V(z, p)] = z^{n+1}\frac{dV(z, p)}{dz} + \alpha_0 (n+1) z^n V(z, p) \; ; \quad n = 0, \pm 1 \tag{75}$$

or in finite form as follows:

$$U V(z, p) U^{-1} = \frac{1}{(\gamma z + \delta)^{2\alpha_0}}\, V\left(\frac{\alpha z + \beta}{\gamma z + \delta}, p\right) \tag{76}$$

where U is the generator of an arbitrary finite projective transformation. Since U leaves the vacuum invariant, by using Eq. (76) it is easy to show that:

$$\langle 0, 0| \prod_{i=1}^{N} V(z'_i, p)|0, 0\rangle = \prod_{i=1}^{N} (\gamma z_i + \delta)^{2\alpha_0}\, \langle 0, 0| \prod_{i=1}^{N} V(z_i, p)|0, 0\rangle \tag{77}$$

that together with the following equation:

$$\prod_{i=1}^{N} \left[dz'_i\, (z'_i - z'_{i+1})^{\alpha_0 - 1}\right] = \prod_{i=1}^{N} \left[dz_i\, (z_i - z_{i+1})^{\alpha_0 - 1} (\gamma z_i + \delta)^{-2\alpha_0}\right] \tag{78}$$

implies that the integrand of the N-point amplitude in Eq. (65) is invariant under projective transformations.
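The algebra in Eq. (74) can be checked in a simple realization. The sketch below assumes that the generators act on functions of z as the first-order differential operators $L_n = -z^{n+1}\, d/dz$, the realization attributed to Fubini and Veneziano later in the text (Eq. (107)); it is not the oscillator representation of Eqs. (71) and (72). Laurent polynomials are stored as dictionaries mapping powers to coefficients, and the helper names are hypothetical.

```python
def apply_L(n, poly):
    """Act with L_n = -z^(n+1) d/dz on a Laurent polynomial {power: coeff}.

    On a monomial: L_n z^k = -k z^(k+n). Constant terms are annihilated.
    """
    out = {}
    for k, c in poly.items():
        if k != 0:
            out[k + n] = out.get(k + n, 0) - k * c
    return out


def commutator(n, m, poly):
    """Return [L_n, L_m] applied to poly, dropping vanishing terms."""
    a = apply_L(n, apply_L(m, poly))
    b = apply_L(m, apply_L(n, poly))
    return {
        k: a.get(k, 0) - b.get(k, 0)
        for k in set(a) | set(b)
        if a.get(k, 0) != b.get(k, 0)
    }


def scale(factor, poly):
    """Multiply a Laurent polynomial by a number, dropping vanishing terms."""
    return {k: factor * c for k, c in poly.items() if factor * c != 0}


# Arbitrary test function 2 + 3 z - z^2 + 4 z^5.
test_poly = {0: 2, 1: 3, 2: -1, 5: 4}

checks = {
    "[L0, L1] = -L1": commutator(0, 1, test_poly) == scale(-1, apply_L(1, test_poly)),
    "[L0, L-1] = L-1": commutator(0, -1, test_poly) == apply_L(-1, test_poly),
    "[L1, L-1] = 2 L0": commutator(1, -1, test_poly) == scale(2, apply_L(0, test_poly)),
}
for name, ok in checks.items():
    print(name, ok)
```

The same helpers can be used for any pair of integers n and m, in which case the commutator is found to equal $(n - m) L_{n+m}$ with no central term, as expected for a realization by differential operators.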
We are now ready to factorize the N-point amplitude and find the spectrum of mesons. From Eqs. (75) and (76) it is easy to derive the transformation of the vertex operator under a finite dilatation:

$$z^{L_0} V(1, p) z^{-L_0} = V(z, p)\, z^{\alpha_0} \tag{79}$$

Changing the integration variables as follows:

$$x_i = \frac{z_{i+1}}{z_i} \; ; \quad i = 2, 3 \ldots N-2 \; ; \quad \det\frac{\partial z}{\partial x} = z_3 z_4 \ldots z_{N-2} \tag{80}$$

where the last term is the Jacobian of the transformation from $z_i$ to $x_i$, we get from Eq. (66) the following expression:

$$A_N \equiv \langle 0, p_1|V(1, p_2)\, D\, V(1, p_3) \ldots D\, V(1, p_{N-1})|0, p_N\rangle \tag{81}$$

where the propagator D is equal to:

$$D = \int_0^1 dx\; x^{L_0 - 1 - \alpha_0} (1 - x)^{\alpha_0 - 1} = \frac{\Gamma(L_0 - \alpha_0)\,\Gamma(\alpha_0)}{\Gamma(L_0)} \tag{82}$$

and

$$A_N = (2\pi)^d \delta^{(d)}\Big(\sum_{i=1}^{N} p_i\Big) B_N \tag{83}$$

The factorization properties of the amplitude can be studied by inserting in the channel (1, M), or equivalently in the channel (M+1, N), described by the Mandelstam variable

$$s = -(p_1 + p_2 + \ldots + p_M)^2 = -(p_{M+1} + p_{M+2} + \ldots + p_N)^2 \equiv -P^2 \tag{84}$$

the complete set of states given in Eq. (55):

$$A_N = \sum_{\lambda,\mu} \langle p_{(1,M)}|\lambda, P\rangle \langle \lambda, P|D|\mu, P\rangle \langle \mu, P|p_{(M+1,N)}\rangle \tag{85}$$

where

$$\langle p_{(1,M)}| = \langle 0, p_1|V(1, p_2)\, D\, V(1, p_3) \ldots V(1, p_M) \tag{86}$$

and

$$|p_{(M+1,N)}\rangle = V(1, p_{M+1})\, D \ldots V(1, p_{N-1})|p_N, 0\rangle \tag{87}$$

Introducing the quantity:

$$R = \sum_{n=1}^{\infty} n\, a^\dagger_n \cdot a_n \tag{88}$$

it is possible to rewrite

$$\langle \lambda, P|D|\mu, P\rangle = \langle \lambda, P| \sum_{m=0}^{\infty} (-1)^m \binom{\alpha_0 - 1}{m} \frac{1}{R + m - \alpha(s)} |\mu, P\rangle \tag{89}$$

where s is the variable defined in Eq. (84). Using this equation we can rewrite Eq. (85) as follows

$$A_N = \sum_{\lambda,\mu} \langle p_{(1,M)}|\lambda, P\rangle\, \langle \lambda, P| \sum_{m=0}^{\infty} (-1)^m \binom{\alpha_0 - 1}{m} \frac{1}{R + m - \alpha(s)} |\mu, P\rangle\, \langle \mu, P|p_{(M+1,N)}\rangle \tag{90}$$

This expression shows that the amplitude $A_N$ has a pole in the channel (1, M) when $\alpha(s)$ is equal to an integer n ≥ 0, and the states $|\lambda\rangle$ that contribute to its residue are those satisfying the relation:

$$R|\lambda\rangle = (n - m)|\lambda\rangle \; ; \quad m = 0, 1 \ldots n \tag{91}$$

The number of independent states $|\lambda\rangle$ contributing to the residue gives the degeneracy of states for each level n.

Because of manifest relativistic invariance the space spanned by the complete system of states in Eq.
(55) contains states with negative norm, corresponding to those states having an odd number of oscillators with timelike directions (see Eq. (51)). This is not consistent in a quantum theory, where the states of a system must span a positive definite Hilbert space. This means that there must exist a number of relations satisfied by the external states that decouple a number of states, leaving us with a positive definite Hilbert space. In order to find these relations we rewrite the state in Eq. (87) going back to the Koba-Nielsen variables:

$$|p_{(1,M)}\rangle = \int_0^1 \prod_{i=2}^{M-1} \left[dz_i\, \theta(z_i - z_{i+1})\right] \prod_{i=1}^{M-1} (z_i - z_{i+1})^{\alpha_0 - 1} \times V(1, p_1) V(z_2, p_2) \ldots V(z_{M-1}, p_{M-1})|0, p_M\rangle \tag{92}$$

Let us consider the operator U(α) that generates the projective transformation that leaves the points z = 0, 1 invariant:

$$z \to \frac{z}{1 - \alpha(z - 1)} = z + \alpha(z^2 - z) + o(\alpha^2) \tag{93}$$

From the transformation properties of the vertex operators in Eq. (76) it is easy to see that the previous transformation leaves the state in Eq. (92) invariant:

$$U(\alpha)|p_{(1,M)}\rangle = |p_{(1,M)}\rangle \tag{94}$$

This means that the generator of the previous transformation annihilates the state in Eq. (92):

$$W_1|p_{(1,M)}\rangle = 0 \; ; \quad W_1 = L_1 - L_0 \tag{95}$$

The explicit form of $W_1$ follows from the infinitesimal form of the transformation in Eq. (93). This condition, which is of the same kind as the relations that on shell amplitudes with the emission of photons satisfy as a consequence of gauge invariance, implies that the residue at the pole in Eq. (90) can be factorized with a smaller number of states. It turns out, however, that a detailed analysis of the spectrum shows that negative norm states are still present. This can be qualitatively understood as follows. Due to the Lorentz metric we have a negative norm component for each oscillator. In order to be able to decouple all negative norm states we need to have a gauge condition of the type of Eq. (95) for each oscillator.
But the number of oscillators is infinite and, therefore, we need an infinite number of conditions of the type of Eq. (95). It was found in Ref. [6] that, if we take $\alpha_0 = 1$, then one can easily construct an infinite number of operators that leave the state in Eq. (92) invariant. In the next section we will concentrate on this case.

4 The case α0 = 1

If we take $\alpha_0 = 1$ many of the formulae given in the previous section simplify. The N-point amplitude in Eq. (38) becomes:

$$B_N = \int_{-\infty}^{\infty} \frac{\prod_{i=1}^{N} dz_i\, \theta(z_i - z_{i+1})}{dV_{abc}} \prod_{i<j} (z_i - z_j)^{2\alpha' p_i \cdot p_j} \tag{96}$$

that can be rewritten in the operator formalism as follows:

$$(2\pi)^4 \delta\Big(\sum_{i=1}^{N} p_i\Big) B_N = \int_{-\infty}^{\infty} \frac{\prod_{i=1}^{N} dz_i\, \theta(z_i - z_{i+1})}{dV_{abc}}\, \langle 0, 0| \prod_{i=1}^{N} V(z_i, p_i)|0, 0\rangle \tag{97}$$

By choosing $z_1 = \infty$, $z_2 = 1$ and $z_N = 0$ it becomes

$$(2\pi)^4 \delta\Big(\sum_{i=1}^{N} p_i\Big) B_N = \int_0^1 \prod_{i=3}^{N-1} dz_i \prod_{i=2}^{N-1} \theta(z_i - z_{i+1})\, \langle 0, p_1| \prod_{i=2}^{N-1} V(z_i; p_i)|0, p_N\rangle \tag{98}$$

where

$$\lim_{z_N \to 0} V(z_N; p_N)|0, 0\rangle \equiv |0; p_N\rangle \; ; \quad \langle 0; 0| \lim_{z_1 \to \infty} z_1^2 V(z_1; p_1) = \langle 0, p_1| \tag{99}$$

Eq. (81) is as before, but now the propagator becomes:

$$D = \int_0^1 dx\; x^{L_0 - 2} = \frac{1}{L_0 - 1} \tag{100}$$

This means that Eq. (89) becomes:

$$\langle \lambda, P|D|\mu, P\rangle = \langle \lambda, P| \frac{1}{L_0 - 1} |\mu, P\rangle \tag{101}$$

and Eq. (90) has the simpler form:

$$B_N = \sum_{\lambda} \langle p_{(1,M)}|\lambda, P\rangle \langle \lambda, P| \frac{1}{R - \alpha(s)} |\lambda, P\rangle \langle \lambda, P|p_{(M+1,N)}\rangle \tag{102}$$

$B_N$ has a pole in the channel (1, M) when $\alpha(s)$ is equal to an integer n ≥ 0, and the states $|\lambda\rangle$ that contribute to its residue are those satisfying the relation:

$$R|\lambda\rangle = n|\lambda\rangle \tag{103}$$

Their number gives the degeneracy of the states contributing to the pole at $\alpha(s) = n$. The N-point amplitude can be written as:

$$B_N = \langle p_{(1,M)}|D|p_{(M+1,N)}\rangle \tag{104}$$

where

$$|p_{(1,M)}\rangle = \int_0^1 \prod_{i=2}^{M-1} \left[dz_i\, \theta(z_i - z_{i+1})\right] \times V(1, p_1) V(z_2, p_2) \ldots V(z_{M-1}, p_{M-1})|0, p_M\rangle \tag{105}$$

Using Eq. (79) and changing variables from $z_i$, i = 2 ... M−1, to $x_i = \frac{z_{i+1}}{z_i}$, i = 1 ... M−2, with $z_1 = 1$, we can rewrite the previous equation as follows:

$$|p_{(1,M)}\rangle = V(1, p_1)\, D\, V(1, p_2) \ldots D\, V(1, p_{M-1})|0, p_M\rangle \tag{106}$$

where the propagator D is defined in Eq. (100). We now want to show that the state in Eqs. (105) and (106) is not only annihilated by the operator in Eq.
(95), but, if $\alpha_0 = 1$ [6], by an infinite set of operators whose lowest one is the one in Eq. (95). We will derive this by using the formalism developed in Ref. [26] and we will follow closely their derivation. Starting from Eqs. (70) Fubini and Veneziano realized that the generators of the projective group acting on a function of z are given by:

$$L_0 = -z\frac{d}{dz} \; ; \quad L_{-1} = -\frac{d}{dz} \; ; \quad L_1 = -z^2\frac{d}{dz} \tag{107}$$

They generalized the previous generators to an arbitrary conformal transformation by introducing the following operators, called Virasoro operators:

$$L_n = -z^{n+1}\frac{d}{dz} \tag{108}$$

that satisfy the algebra:

$$[L_n, L_m] = (n - m)L_{n+m} \tag{109}$$

that does not contain the term with the central charge! They also showed that the Virasoro operators satisfy the following commutation relations with the vertex operator:

$$[L_n, V(z, p)] = \frac{d}{dz}\left(z^{n+1} V(z, p)\right) \tag{110}$$

More generally, they actually define an operator $L_f$ corresponding to an arbitrary function f(ξ), with $L_f = L_n$ if we choose $f(\xi) = \xi^n$. In this case the commutation relation in Eq. (110) becomes:

$$[L_f, V(z, p)] = \frac{d}{dz}\left(z f(z) V(z, p)\right) \tag{111}$$

By introducing the variable:

$$y = \int_A^z \frac{d\xi}{\xi f(\xi)} \tag{112}$$

where A is an arbitrary constant, one can rewrite Eq. (111) in the following form:

$$[L_f, z f(z) V(z, p)] = \frac{d}{dy}\left(z f(z) V(z, p)\right) \tag{113}$$

This implies that, under an arbitrary conformal transformation z → f(z), generated by $U = e^{\alpha L_f}$, the vertex operator transforms as:

$$e^{\alpha L_f}\, V(z, p)\, z f(z)\, e^{-\alpha L_f} = V(z', p)\, z' f(z') \tag{114}$$

where the parameter α is given by:

$$\alpha = \int_z^{z'} \frac{d\xi}{\xi f(\xi)} \tag{115}$$

On the other hand, this equation implies:

$$\frac{dz}{z f(z)} = \frac{dz'}{z' f(z')} \tag{116}$$

that, inserted in Eq. (114), implies that the quantity V(z, p) dz is left invariant by the transformation z → f(z):

$$e^{\alpha L_f}\, V(z, p)\, dz\, e^{-\alpha L_f} = V(z', p)\, dz' \tag{117}$$

Let us now act with the previous conformal transformation on the state in Eq. (105). We get:

$$e^{\alpha L_f}|p_{(1,M)}\rangle = \int_0^1 \prod_{i=2}^{M-1} \left[dz_i\, \theta(z_i - z_{i+1})\right] e^{\alpha L_f} V(1, p_1) e^{-\alpha L_f} \times e^{\alpha L_f} V(z_2, p_2) e^{-\alpha L_f} \ldots e^{\alpha L_f} V(z_{M-1}, p_{M-1}) e^{-\alpha L_f}\, e^{\alpha L_f}|0, p_M\rangle$$

$$= \int_0^1 \prod_{i=2}^{M-1} \theta(z_i - z_{i+1}) \times e^{\alpha L_f} V(1, p_1) e^{-\alpha L_f} \times V(z'_2, p_2)\, dz'_2 \ldots V(z'_{M-1}, p_{M-1})\, dz'_{M-1}\, e^{\alpha L_f}|0, p_M\rangle \tag{118}$$

where we have used Eq. (117). The previous transformation leaves the state invariant if both z = 0 and z = 1 are fixed points of the conformal transformation. This happens if the denominator in Eq. (115) vanishes when ξ = 0, 1. This requires the following conditions:

$$f(1) = 0 \; ; \quad \lim_{\xi \to 0} \xi f(\xi) = 0 \tag{119}$$

Expanding near the point ξ = 1 we can determine the relation between z and z′ near z = z′ = 1. We get:

$$z' = \frac{z\, e^{-\alpha f'(1)}}{1 - z + z\, e^{-\alpha f'(1)}} \tag{120}$$

and from it we can determine the conformal factor:

$$\frac{dz'}{dz} = \frac{e^{-\alpha f'(1)}}{\left(1 - z + z\, e^{-\alpha f'(1)}\right)^2} \to e^{\alpha f'(1)} \tag{121}$$

in the limit z → 1. Proceeding in the same way near the point z = z′ = 0 we get:

$$z' = \frac{z f(0)\, e^{\alpha f(0)}}{f(0) + z f'(0)\left(1 - e^{\alpha f(0)}\right)} \to z\, e^{\alpha f(0)} \tag{122}$$

in the limit z → 0. This means that Eq. (118) becomes

$$e^{\alpha(L_f - f'(1) - f(0))}|p_{(1,M)}\rangle = |p_{(1,M)}\rangle \tag{123}$$

A choice of f that satisfies Eqs. (119) is the following:

$$f(\xi) = \xi^n - 1 \tag{124}$$

that gives the following gauge operator:

$$W_n = L_n - L_0 - (n - 1) \tag{125}$$

that annihilates the state in Eq. (105):

$$W_n|p_{1 \ldots M}\rangle = 0 \; ; \quad n = 1 \ldots \infty \tag{126}$$

These are the Virasoro conditions found in Ref. [6]. There is one condition for each negative norm oscillator and, therefore, in this case there is the possibility that the physical subspace is positive definite. An alternative, more direct derivation of Eq. (126) can be obtained by acting with $W_n$ on the state in Eq. (106) and using the following identities:

$$W_n V(1, p) = V(1, p)(W_n + n) \; ; \quad (W_n + n) D = [L_0 + n - 1]^{-1} W_n \tag{127}$$

The second equation is a consequence of the following equation:

$$L_n\, x^{L_0} = x^{L_0 + n} L_n \tag{128}$$

Eqs. (127) imply

$$W_n V(1, p) D = V(1, p)[L_0 + n - 1]^{-1} W_n \tag{129}$$

This shows that the operator $W_n$ goes unchanged through all the products of terms V D until it arrives in front of the term $V(1, p_{M-1})|0, p_M\rangle$. Going through the vertex operator it becomes $L_n - L_0 + 1$, which then annihilates the state:

$$(L_n - L_0 + 1)|p_M, 0\rangle = 0 \tag{130}$$

This proves Eq. (126).

Using the representation of the Virasoro operators given in Eq.
(108) Fu- bini and Veneziano showed that they satisfy the algebra given in eq. (109) without the central charge. The presence of the central charge was recognized by Joe Weis10 in 1970 and never published. Unlike Fubini and Veneziano [26] he used the expression of the Ln operators in terms of the harmonic oscillators: 2α′np̂ · an + m(n+m)an+m · am+ m(n−m)am−n · am ;n ≥ 0 Ln = L†n (131) 10 See noted added in proof in Ref. [26]. The birth of string theory 25 He got the following algebra: [Ln, Lm] = (n−m)Ln+m + n(n2 − 1)δn+m;0 (132) where d is the dimension of the Minkowski space-time. We write here d for the dimension of the Minkowski space, but we want to remind you that almost everybody working in a model for mesons at that time took for granted that the dimension of the space-time was d = 4. As far as I remember the first paper where a dimension d 6= 4 was introduced was Ref. [27] where it was shown that the unitarity violating cuts in the non-planar loop become poles that were consistent with unitarity if d = 26. In the last part of this section we will generalize the factorization procedure to the Shapiro-Virasoro model whose N -point amplitude is given in Eq. (49). In this case we must introduce two sets of harmonic oscillators commuting with each other and only one set of zero modes satisfying the algebra [28] : [anµ, a mν ] = [ãnµ, ã mν ] = ηµνδnm ; [q̂µ, p̂ν ] = iηµν (133) In terms of them we can introduce the Fubini-Veneziano operator Q(z, z̄) = q̂ − 2α′p̂ log(zz̄) + i −n − a†nzn ãnz̄ −n − ã†nz̄n (134) We can then introduce the vertex operator: V (z, z̄; p) =: eip·Q(z,z̄) : (135) and write the N -point amplitude in Eq. (95) in the following factorized form: i=1 d dVabc V (zi, z̄i, pi)) |0〉 = = (2π)4δ(4)( i=1 d dVabc |zi − zj|α ′pi·pj (136) where the radial ordered product is given by V (zi, z̄i, pi)) V (zi, z̄i, pi)) θ(|zi| − |zi+1|) + . . . (137) 26 Paolo Di Vecchia and the dots indicate a sum over all permutations of the vertex operators. 
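As a side check of the algebra without central charge in Eq. (109), the following minimal sketch applies the differential operators of Eq. (108) to Laurent monomials, using $L_n z^k = -k\, z^{n+k}$, and compares both sides of the commutation relation. The function names (`apply_L`, `witt_relation_holds`) and the dictionary representation of polynomials are illustrative assumptions and are not taken from Ref. [26].

```python
def apply_L(n, poly):
    """Apply L_n = -z^(n+1) d/dz to a Laurent polynomial.

    `poly` maps an integer exponent k to the coefficient of z^k,
    so that L_n z^k = -k z^(n+k). Zero coefficients are dropped.
    """
    out = {}
    for k, c in poly.items():
        coeff = -k * c
        if coeff != 0:
            out[n + k] = out.get(n + k, 0) + coeff
    return out


def subtract(p, q):
    """Return p - q, dropping vanishing coefficients."""
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) - c
    return {k: c for k, c in out.items() if c != 0}


def scale(a, p):
    """Return a * p, dropping vanishing coefficients."""
    return {k: a * c for k, c in p.items() if a * c != 0}


def witt_relation_holds(n, m, poly):
    """Check [L_n, L_m] poly == (n - m) L_{n+m} poly, as in Eq. (109)."""
    lhs = subtract(apply_L(n, apply_L(m, poly)), apply_L(m, apply_L(n, poly)))
    rhs = scale(n - m, apply_L(n + m, poly))
    return lhs == rhs


# Illustrative test function: 2 z^3 + 7 + 5 / z
sample = {3: 2, 0: 7, -1: 5}
all_ok = all(
    witt_relation_holds(n, m, sample)
    for n in range(-3, 4)
    for m in range(-3, 4)
)
print("Eq. (109) holds on the sample:", all_ok)
```

Since the operators act on ordinary functions of $z$, no central term can appear in this representation; the central term only shows up in the oscillator representation used by Weis.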
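By fixing $z_1 = \infty$, $z_2 = 1$, $z_N = 0$ we can rewrite the previous expression as follows:

\[ \int \prod_{i=3}^{N-1} d^2 z_i\, \langle 0,p_1|\, R\left[\prod_{i=2}^{N-1} V(z_i,\bar z_i,p_i)\right] |0,p_N\rangle \tag{138} \]

For the sake of simplicity let us consider the term corresponding to the permutation $1, 2, \dots N$. In this case the Koba-Nielsen variables are ordered in such a way that $|z_i| \geq |z_{i+1}|$ for $i = 1, \dots N-1$. We can then use the formula:

\[ V(z_i,\bar z_i,p_i) = z_i^{L_0 - 1}\,\bar z_i^{\tilde L_0 - 1}\, V(1,1,p_i)\, z_i^{-L_0}\,\bar z_i^{-\tilde L_0} \tag{139} \]

and change variables:

\[ w_i = \frac{z_{i+1}}{z_i} \; ; \quad |w_i| \leq 1 \tag{140} \]

to rewrite Eq. (138) as follows:

\[ \langle 0,p_1|\, V(1,1,p_2)\, D\, V(1,1,p_3)\, D \cdots V(1,1,p_{N-1})\, |0,p_N\rangle \tag{141} \]

where

\[ D = \int \frac{d^2 w}{|w|^2}\, w^{L_0 - 1}\,\bar w^{\tilde L_0 - 1} = \frac{2}{L_0 + \tilde L_0 - 2} \cdot \frac{\sin \pi (L_0 - \tilde L_0)}{L_0 - \tilde L_0} \tag{142} \]

We can now follow the same procedure for all permutations, arriving at the following expression:

\[ \langle 0,p_1|\, P\left[ V(1,1,p_2)\, D\, V(1,1,p_3)\, D \cdots V(1,1,p_{N-1}) \right] |0,p_N\rangle \tag{143} \]

where $P$ means a sum over all permutations of the particles. If we want to consider the factorization of the amplitude on the pole at $s = -(p_1 + \dots + p_M)^2$ we get only the following contribution:

\[ \langle p_{(1\dots M)}|\, D\, |p_{(M+1\dots N)}\rangle \tag{144} \]

where

\[ |p_{(M+1\dots N)}\rangle = P\left[ V(1,1,p_{M+1})\, D \cdots V(1,1,p_{N-1}) \right] |0,p_N\rangle \tag{145} \]

\[ \langle p_{(1\dots M)}| = \langle 0,p_1|\, P\left[ V(1,1,p_2)\, D \cdots V(1,1,p_M) \right] \tag{146} \]

The amplitude is factorized by introducing a complete set of states and rewriting Eq. (141) as follows:

\[ \sum_{\lambda,\tilde\lambda} \langle p_{(1\dots M)}|\lambda,\tilde\lambda\rangle\, \frac{2\pi\, \langle \lambda,\tilde\lambda|\, \delta_{L_0,\tilde L_0}\, |\lambda,\tilde\lambda\rangle}{L_0 + \tilde L_0 - 2}\, \langle \lambda,\tilde\lambda|p_{(M+1\dots N)}\rangle \tag{147} \]

By writing

\[ L_0 = \frac{\alpha'}{4}\hat p^2 + R \; ; \quad \tilde L_0 = \frac{\alpha'}{4}\hat p^2 + \tilde R \tag{148} \]

\[ R = \sum_{n=1}^{\infty} n\, a_n^\dagger \cdot a_n \; ; \quad \tilde R = \sum_{n=1}^{\infty} n\, \tilde a_n^\dagger \cdot \tilde a_n \tag{149} \]

we can rewrite Eq. (147) as follows:

\[ \sum_{\lambda,\tilde\lambda} \langle p_{(1\dots M)}|\lambda,\tilde\lambda\rangle\, \frac{2\pi\, \langle \lambda,\tilde\lambda|\, \delta_{R,\tilde R}\, |\lambda,\tilde\lambda\rangle}{R + \tilde R - \alpha_{SV}(s)}\, \langle \lambda,\tilde\lambda|p_{(M+1\dots N)}\rangle \tag{150} \]

We see that the amplitude for the Shapiro-Virasoro model has simple poles only for even integer values of $\alpha_{SV}(s) = 2 + \frac{\alpha'}{2}s = 2n \geq 0$, and the residue at the poles factorizes into a sum with a finite number of terms. Notice that the Regge trajectory of the Shapiro-Virasoro model has double the intercept and half the slope of that of the generalized Veneziano model.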
5 Physical states and their vertex operators

In the previous section, we have seen that the residue at the poles of the $N$-point amplitudes factorizes into a sum of a finite number of terms. We have also seen that some of these terms, due to the Lorentz metric, correspond to states with negative norm. We have also derived a number of "Ward identities", given in Eq. (126), that imply that some of the terms of the residue decouple. The question to be answered now is: Is the space spanned by the physical states a positive norm Hilbert space?

In order to answer this question we first need to find the conditions that characterize the on-shell physical states $|\lambda, P\rangle$ and then to determine which states contribute to the residue of the pole at $\alpha(s = -P^2) = n$. In other words, we have to find a way of characterizing the physical states and of eliminating the spurious states that decouple in Eq. (102) as a consequence of Eqs. (126).

A state $|\lambda, P\rangle$ contributes to the residue of the pole in Eq. (102) for $\alpha(s = -P^2) = n$ if it is on shell, namely if it satisfies the following equations:

\[ R|\lambda, P\rangle = n|\lambda, P\rangle \; ; \quad \alpha(-P^2) = 1 - \alpha' P^2 = n \tag{151} \]

that can be written as a single equation:

\[ (L_0 - 1)|\lambda, P\rangle = 0 \tag{152} \]

Because of Eq. (126) we also know that a state of the type:

\[ |s, P\rangle = W_m^\dagger|\mu, P\rangle \tag{153} \]

is not going to contribute to the residue of the pole. We call it a spurious or unphysical state. We start by constructing the subspace of spurious states that are on shell at the level $n$. Let us consider the set of orthogonal states $|\mu, P\rangle$ such that

\[ R|\mu, P\rangle = n_\mu|\mu, P\rangle \; ; \quad L_0|\mu, P\rangle = (1 - m)|\mu, P\rangle \; ; \quad 1 - \alpha' P^2 = n \tag{154} \]

where, since $L_0 = \alpha'\hat p^2 + R$ has eigenvalue $1 - n + n_\mu$ on these states,

\[ m = n - n_\mu \tag{155} \]

In terms of these states we can construct the most general spurious state that is on shell at the level $n$. It is given by

\[ |s, P\rangle = W_m^\dagger|\mu, P\rangle \; ; \quad (L_0 - 1)|s, P\rangle = 0 \tag{156} \]

for any positive integer $m$. Using Eq. (154), Eq. (156) becomes:

\[ |s, P\rangle = L_m^\dagger|\mu, P\rangle \tag{157} \]

where $|\mu, P\rangle$ is an arbitrary state satisfying Eqs. (154).
A physical state $|\lambda, P\rangle$ is defined as one that is orthogonal to all spurious states appearing at a certain level $n$. This means that it must satisfy the following equation:

\[ \langle \lambda, P|\, L_\ell^\dagger\, |\mu, P\rangle = 0 \tag{158} \]

for any state $|\mu, P\rangle$ satisfying Eqs. (154). In conclusion, the on-shell physical states at the level $n$ are characterized by the fact that they satisfy the following conditions:

\[ L_m|\lambda, P\rangle = (L_0 - 1)|\lambda, P\rangle = 0 \; ; \quad 1 - \alpha' P^2 = n \tag{159} \]

for any positive integer $m$. These conditions characterizing the physical subspace were first found by Del Giudice and Di Vecchia [28], where the analysis described here was done. In order to find the physical subspace one starts by writing the most general on-shell state contributing to the residue of the pole at level $n$ in Eq. (154). Then one imposes Eqs. (159) and determines the states that span the physical subspace. Actually, among these states one also finds a set of zero norm states that are physical and spurious at the same time. Those states are of the form given in Eq. (157), but also satisfy Eqs. (159). It is easy to see that they are not really physical because they do not contribute to the residue of the pole at the level $n$. This follows from the form of the unit operator, given in the space of the physical states by:

\[ 1 = \sum_{\text{norm} \neq 0} |\lambda, P\rangle\langle \lambda, P| + \sum_{\text{zero norm}} \left[ |\lambda_0, P\rangle\langle \mu_0, P| + |\mu_0, P\rangle\langle \lambda_0, P| \right] \tag{160} \]

where $|\lambda_0, P\rangle$ is a zero norm physical and spurious state and $|\mu_0, P\rangle$ its conjugate state. A conjugate state of a zero norm state is obtained by changing the sign of the oscillators with timelike direction. Since $|\lambda_0, P\rangle$ is a spurious state, when we insert the unit operator given in Eq. (160) in Eq. (102) we see that the zero norm states never contribute to the residue, because their contribution is annihilated either by the state $\langle p_{(1,M)}|$ or by the state $|p_{(M+1,N)}\rangle$. In conclusion, the physical subspace contains only the states in the first term on the r.h.s. of Eq. (160). Let us analyze the first two excited levels.
The first excited level corresponds to a massless gauge field. It is spanned by the states $\epsilon^\mu a^\dagger_{1\mu}|0, P\rangle$. In this case the only condition that we must impose is:

\[ L_1\, \epsilon^\mu a^\dagger_{1\mu}|0, P\rangle = 0 \implies P \cdot \epsilon = 0 \tag{161} \]

Choosing a frame of reference where the momentum of the photon is given by $P_\mu \equiv (P, 0, \dots, 0, P)$, Eq. (161) implies that the only physical states are:

\[ \epsilon^i a^\dagger_{1;i}|0, P\rangle + \epsilon\, (a^\dagger_{1;0} - a^\dagger_{1;d-1})|0, P\rangle \; ; \quad i = 1 \dots d-2 \tag{162} \]

where $\epsilon^i$ and $\epsilon$ are arbitrary parameters. The state in Eq. (162) is the most general state of the level $N = 1$ satisfying the conditions in Eq. (159). The first state in Eq. (162) has positive norm, while the second one has zero norm and is orthogonal to all other physical states, since it can be written as follows:

\[ (a^\dagger_{1;0} - a^\dagger_{1;d-1})|0, P\rangle \propto L_1^\dagger|0, P\rangle \tag{163} \]

in the frame of reference where $P_\mu \equiv (P, 0, \dots, 0, P)$. Because of the previous property it is decoupled from the physical states together with its conjugate:

\[ (a^\dagger_{1;0} + a^\dagger_{1;d-1})|0, P\rangle \tag{164} \]

In conclusion, we are left only with the $d - 2$ transverse states corresponding to the physical degrees of freedom of a massless spin 1 state.

At the next level $n = 2$ the most general state is given by:

\[ \left[ \alpha^{\mu\nu} a^\dagger_{1,\mu} a^\dagger_{1,\nu} + \beta^\mu a^\dagger_{2,\mu} \right] |0, P\rangle \tag{165} \]

If we work in the center of mass frame where $P_\mu = (M, \mathbf{0})$ we get the following most general physical state:

\[
\begin{aligned}
|\text{Phys}\rangle &= \alpha^{ij}\left[ a^\dagger_{1,i} a^\dagger_{1,j} - \frac{1}{d-1}\,\delta_{ij} \sum_{k=1}^{d-1} a^\dagger_{1,k} a^\dagger_{1,k} \right] |0, P\rangle \\
&\quad + \beta^i \left[ a^\dagger_{2,i} + a^\dagger_{1,0} a^\dagger_{1,i} \right] |0, P\rangle \\
&\quad + \left[ \sum_{i=1}^{d-1} a^\dagger_{1,i} a^\dagger_{1,i} + \frac{d-1}{5}\left( a^{\dagger\,2}_{1,0} - 2 a^\dagger_{2,0} \right) \right] |0, P\rangle
\end{aligned}
\tag{166}
\]

where the indices $i, j$ run over the $d - 1$ space components. The first term in (166) corresponds to a spin 2 state in $(d-1)$-dimensional space and has positive norm, being made with space indices only. The second term has zero norm and is orthogonal to the other physical states since it can be written as $L_1^\dagger a^\dagger_{1,i}|0, P\rangle$. Therefore it must be eliminated from the physical spectrum together with its conjugate, as explained above.
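The level-one counting above can be illustrated with a small numerical sketch. It assumes a mostly-plus Minkowski metric and takes the momentum components to be $(P, 0, \dots, 0, P)$; the function names and the labels `longitudinal` and `conjugate` are illustrative choices, not notation from Ref. [28]. The sketch tests the condition $P \cdot \epsilon = 0$ of Eq. (161) and evaluates the norm $\epsilon \cdot \epsilon$ for a basis of candidate polarizations.

```python
def minkowski_dot(u, v):
    """Mostly-plus Minkowski product: -u0*v0 + sum_i ui*vi (assumed convention)."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))


def level_one_polarizations(d, momentum=1.0):
    """Classify candidate polarizations of the level-one state in d dimensions.

    Returns a dict: name -> {"is_physical": P.eps == 0, "norm": eps.eps}.
    """
    P = [momentum] + [0.0] * (d - 2) + [momentum]  # light-like momentum
    candidates = {}
    for i in range(1, d - 1):  # the d-2 transverse directions
        eps = [0.0] * d
        eps[i] = 1.0
        candidates[f"transverse_{i}"] = eps
    # Polarization along the momentum: physical but of zero norm.
    candidates["longitudinal"] = [1.0] + [0.0] * (d - 2) + [1.0]
    # Its conjugate: zero norm, but it fails the physical-state condition.
    candidates["conjugate"] = [1.0] + [0.0] * (d - 2) + [-1.0]

    report = {}
    for name, eps in candidates.items():
        report[name] = {
            "is_physical": abs(minkowski_dot(P, eps)) < 1e-12,
            "norm": minkowski_dot(eps, eps),
        }
    return report


report = level_one_polarizations(26)
positive = [k for k, v in report.items() if v["is_physical"] and v["norm"] > 0]
print("positive-norm physical polarizations:", len(positive))
print("longitudinal:", report["longitudinal"])
print("conjugate:", report["conjugate"])
```

With these assumptions the only physical states of positive norm are the $d - 2$ transverse ones, in agreement with the counting in the text.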
Finally, the last state in (166) is spinless and has a norm given by:

\[ \frac{2(d-1)(26-d)}{25} \tag{167} \]

If $d < 26$ it corresponds to a physical spin zero particle with positive norm. If $d > 26$ it is a ghost. Finally, if $d = 26$ it has zero norm and is also orthogonal to the other physical states since it can be written in the form:

\[ \left( 2 L_2^\dagger + 3 L_1^{\dagger\,2} \right) |0, P\rangle \tag{168} \]

It does not belong, therefore, to the physical spectrum. The analysis of this level was done in Ref. [29] with $d = 4$. This did not allow the authors of Ref. [29] to see that there was a critical dimension.

The analysis of the physical states can be easily extended [28] to the Shapiro-Virasoro model. In this case the physical conditions given in Eq. (159) for the open string become [28]:

\[ L_m|\lambda, \tilde\lambda\rangle = \tilde L_m|\lambda, \tilde\lambda\rangle = (L_0 - 1)|\lambda, \tilde\lambda\rangle = (\tilde L_0 - 1)|\lambda, \tilde\lambda\rangle = 0 \tag{169} \]

for any positive integer $m$. It can be easily seen from the previous equations that the lowest state of the Shapiro-Virasoro model is the vacuum $|0_a, 0_{\tilde a}, p\rangle$, corresponding to a tachyon with mass $\alpha' p^2 = 4$, while the next level, described by the state $a^\dagger_{1\mu} \tilde a^\dagger_{1\nu}|0_a, 0_{\tilde a}, p\rangle$, contains massless states corresponding to the graviton, a dilaton and a two-index antisymmetric tensor $B_{\mu\nu}$.

Having characterized the physical subspace, one can go on and construct an $N$-point scattering amplitude involving arbitrary physical states. This was done by Campagna, Fubini, Napolitano and Sciuto [30], where the vertex operator for an arbitrary physical state was constructed in analogy with what has been done for the ground tachyonic state.
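Returning to the spinless level-two state of Eq. (166), the dependence of its norm on $d$ can be made explicit with a short sketch. It assumes the oscillator algebra $[a_{n\mu}, a^\dagger_{m\nu}] = \eta_{\mu\nu}\delta_{nm}$ with $\eta_{00} = -1$, and the normalization of the state in which the spatial term has unit coefficient and the timelike terms carry the coefficient $(d-1)/5$. With these assumptions the norm is $2(d-1)(26-d)/25$; the overall positive factor depends on the normalization but the sign does not. The function name is illustrative.

```python
from fractions import Fraction


def level_two_scalar_norm(d):
    """Norm of [sum_i a1_i^+ a1_i^+ + b (a1_0^+ a1_0^+ - 2 a2_0^+)]|0>, b = (d-1)/5.

    Assumes [a, a^+] = eta with eta_00 = -1, so a single timelike
    oscillator contributes a negative norm.
    """
    b = Fraction(d - 1, 5)
    spatial = 2 * (d - 1)            # || sum_i a1_i^+ a1_i^+ |0> ||^2
    timelike_pair = 2 * b ** 2       # two timelike oscillators: (eta_00)^2 = +1
    timelike_single = -(2 * b) ** 2  # one timelike oscillator: eta_00 = -1
    return spatial + timelike_pair + timelike_single


for d in (4, 10, 26, 27):
    print(d, level_two_scalar_norm(d))
```

The norm is positive for $d < 26$, vanishes at $d = 26$ and becomes negative for $d > 26$, which is the behaviour described in the text.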
They associated to each physical state |α, P 〉 a vertex operator Vα(z, P ) that is a conformal field with conformal dimension equal to 1: [Ln, Vα(z, p)] = zn+1Vα(z, p) (170) and reproduces the corresponding state acting on the vacuum as follows: Vα(z; p)|0, 0〉 ≡ |α; p〉 ; 〈0; 0| lim z2Vα(z; p) = 〈α, p| (171) The birth of string theory 31 It satisfies, in addition, the hermiticity relation: V †α (z, P ) = Vα( ,−P )(−1)α(−P 2) (172) An excited vertex that will play an important role in the next section is the one associated to the massless gauge field. It is given by: Vǫ(z, k) ≡ ǫ · dQ(z) eik·Q(z) ; k · ǫ = k2 = 0 (173) Because of the last two conditions in Eq. (173) the normal order is not neces- sary. It is convenient to give the expression of dQ(z) in terms of the harmonic oscillators: P (z) ≡ dQ(z) −n−1 (174) It is a conformal field with conformal dimension equal to 1. The rescaled oscillators αn are given by: nan ; α−n = na†n ; n > 0 ; α0 = 2α′p̂ (175) In terms of the vertex operators previously introduced the most general amplitude involving arbitrary physical states is given by [30]: (2π)4δ( 1 dziθ(zi − zi+1) dVabc 〈0, 0| Vαi(zi, pi)|0, 0〉(176) In the case of the Shapiro-Virasoro model the tachyon vertex operator is given in Eq. (135). By rewriting Eq. (134) as follows: Q(z, z̄) = Q(z) + Q̃(z̄) (177) where Q(z) = q̂ − 2α′p̂ log(z) + i −n − a†nzn (178) Q̃(z̄) = q̂ − 2α′p̂ log(z̄) + i ãnz̄ −n − ã†nz̄n (179) we can write the tachyon vertex operator in the following way: V (z, z̄, p) =: eip·Q(z)eip·Q̃(z̄) : (180) 32 Paolo Di Vecchia This shows that the vertex operator corresponding to the tachyon of the Shapiro-Virasoro model can be written as the product of two vertex oper- ators corresponding each to the tachyon of the generalized Veneziano model. 
Analogously the vertex operator corresponding to an arbitrary physical state of the Shapiro-Virasoro model can always be written as a product of two vertex operators of the generalized Veneziano model: Vα,β(z, z̄, p) = Vα(z, )Vβ(z̄, ) (181) The first one contains only the oscillators αn, while the second one only the oscillators α̃n. They both contain only half of the total momentum p and the same zero modes p̂ and q̂. The two vertex operators of the generalized Veneziano model are both conformal fields with conformal dimension equal to 1. If they correspond to physical states at the level 2n, they satisfy the following relation (n = ñ): + n = 1 (182) They lie on the following Regge trajectory: p2 ≡ αSV (−p2) = 2n (183) as we have already seen by factorizing the amplitude in Eq. (150). 6 The DDF states and absence of ghosts In the previous section we have derived the equations that characterize the physical states and their corresponding vertex operators. In this section we will explicitly construct an infinite number of orthonormal physical states with positive norm. The starting point is the DDF operator introduced by Del Giudice, Di Vec- chia and Fubini [31] and defined in terms of the vertex operator corresponding to the massless gauge field introduced in eq. (173): Ai,n = i Pµ(z)e ik·Q(z) (184) where the index i runs over the d−2 transverse directions, that are orthogonal to the momentum k. We have also taken = 1. Because of the log z term appearing in the zero mode part of the exponential, the integral in Eq. (184), that is performed around the origin z = 0, is well defined only if we constrain the momentum of the state, on which Ai,n acts, to satisfy the relation: 2α′p · k = n (185) The birth of string theory 33 where n is a non-vanishing integer. The operator in Eq. 
(184) will generate physical states because it com- mutes with the gauge operators Lm: [Lm, An;i] = 0 (186) since the vertex operator transforms as a primary field with conformal dimen- sion equal to 1 as it follows from Eq. (170). On the other hand it also satisfies the algebra of the harmonic oscillator as we are now going to show. From Eq. (184) we get: [An,i, Am,j] = − dzǫi · P (z)eik·Q(ζ)ǫj · P (ζ)eik ′ ·Q(ζ) (187) where 2α′p · k = n ; 2α′p · k′ = m (188) and k and k′ are supposed to be in the same direction, namely kµ = nk̂µ ; k µ = mk̂µ (189) 2α′p · k̂ = 1 (190) Finally the polarizations are normalized as: ǫi · ǫj = δij (191) Since k̂ · ǫi = k̂ · ǫj = k̂2 = 0 a singularity for z = ζ can appear only from the contraction of the two terms P (ζ) and P ((z) that is given by: 〈0, 0|ǫi · P (z)ǫj · P (ζ)|0, 0〉 = − 2α′δij (z − ζ)2 (192) Inserting it in Eq. (187) we get: [An,i, Am,j ] = δij in dζk̂ · P (ζ)e−i(n+m))k̂·Q(ζ) = = inδijδn+m;0 dζk̂ · P (ζ) (193) where we have used the fact that the integrand is a total derivative and therefore one gets a vanishing contribution unless n + m = 0. If n + m = 0 from Eq.s (174) and (190) we get: [An,i, Am,j ] = nδijδn+m;0 ; i, j = 1 . . . d− 2 (194) 34 Paolo Di Vecchia Eq. (194) shows that the DDF operators satisfy the harmonic oscillator alge- In terms of this infinite set of transverse oscillators we can construct an orthonormal set of states: |i1, N1; i2, N2; . . . im, Nm〉 = Aik,−Nk√ |0, p〉 (195) where λh is the multiplicity of the operator Aih,−Nh in the product in Eq. (195) and the momentum of the state in Eq. (195) is given by P = p+ k̂Ni (196) They were constructed in four dimensions where they were not a complete system of states 11 and it took some time to realize that in fact they were a complete system of states if d = 26 [32, 33] 12. Brower [32] and Goddard and Thorn [33] showed also that the dual resonance model was ghost free for any dimension d ≤ 26. 
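To get a feeling for the size of the space generated by the DDF operators, one can count the states of the form in Eq. (195) level by level. The sketch below is only a counting exercise under the assumption that the states are labelled by occupation numbers of $d - 2$ transverse oscillators for each mode $n \geq 1$, so that the number of states at level $N$ is the coefficient of $x^N$ in $\prod_{n \geq 1} (1 - x^n)^{-(d-2)}$. It does not prove completeness; the function name is illustrative.

```python
def ddf_level_counts(d, max_level):
    """Number of transverse-oscillator states at levels 0..max_level.

    Expands prod_{n>=1} (1 - x^n)^(-(d-2)) up to x^max_level by
    repeatedly multiplying the series by 1 / (1 - x^mode).
    """
    transverse = d - 2
    counts = [0] * (max_level + 1)
    counts[0] = 1
    for mode in range(1, max_level + 1):
        for _ in range(transverse):
            # In-place multiplication of the series by 1 / (1 - x^mode).
            for level in range(mode, max_level + 1):
                counts[level] += counts[level - mode]
    return counts


print(ddf_level_counts(26, 4))
```

For $d = 26$ the first excited level contains the 24 transverse states of the massless vector, and the second level contains $24 + 24 \cdot 25 / 2 = 324$ states.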
In $d = 26$ this follows from the fact that the DDF operators obviously span a positive definite Hilbert space (see Eq. (194)). For $d < 26$ there are extra states called Brower states [32]. The first of these states is the last state in Eq. (166), which becomes a zero norm state for $d = 26$. But also for $d < 26$ there is no negative norm state among the physical states.

The proof of the no-ghost theorem in the case $\alpha_0 = 1$ is a very important step because it shows that the dual resonance model, constructed by generalizing the four-point Veneziano formula, is a fully consistent quantum-relativistic theory! This is not quite true because, when the intercept $\alpha_0 = 1$, the lowest state of the spectrum, corresponding to the pole in the $N$-point amplitude for $\alpha(s) = 0$, is a tachyon with mass $m^2 = -\frac{1}{\alpha'}$. A lot of effort was then made to construct a model without a tachyon and with a meson spectrum consistent with the experimental data. The only reasonably consistent models that came out of these attempts were the Neveu-Schwarz model [7] for mesons and the Ramond model [8] for fermions, which only later were recognized to be part of a single model that nowadays is called the Neveu-Schwarz-Ramond model. But this model was not really more consistent than the original dual resonance model because it still had a tachyon with mass $m^2 = -\frac{1}{2\alpha'}$. The tachyon was eliminated from the spectrum only in 1976 through the GSO projection proposed by Gliozzi, Scherk and Olive [34].

11 Because of this Fubini did not want to publish our result, but then he went to a meeting in Israel in spring 1971, where he gave a talk on our work and found that the audience was very interested in our result. When he came back to MIT we decided to publish it.

12 I still remember Charles Thorn coming into my office at CERN and telling me: "Paolo, do you know that your DDF states are complete if d = 26?" I quickly redid the analysis done in Ref. [29] with an arbitrary value of the space-time dimension, obtaining Eqs. (166) and (167), which show that the spinless state at the level $\alpha(s) = 2$ is decoupled if $d = 26$. I strongly regretted not having used an arbitrary space-time dimension $d$ in the analysis of Ref. [29].

The realization that, at least for the critical value of the space-time dimension $d = 26$, the physical states are described by the DDF states, which have only $d - 2 = 24$ independent components, opened the way for Brink and Nielsen [35] to compute the value $\alpha_0 = 1$ of the intercept of the Regge trajectory with a very physical argument. They related the intercept of the Regge trajectory to the zero point energy of a system with an infinite number of oscillators having only $d - 2$ independent components:

\[ \alpha_0 = -\frac{d-2}{2} \sum_{n=1}^{\infty} n \tag{197} \]

This quantity is obviously infinite and, in order to make sense of it, they introduced a cutoff on the frequencies of the harmonic oscillators, obtaining an infinite term that they eliminated by renormalizing the speed of light and a finite universal constant term that gave the intercept of the Regge trajectory. Instead of following their original approach we discuss here an alternative approach due to Gliozzi [36] that uses the $\zeta$-function regularization. He rewrites Eq. (197) as follows:

\[ \alpha_0 = -\frac{d-2}{2} \sum_{n=1}^{\infty} n = -\frac{d-2}{2} \lim_{s \to -1} \sum_{n=1}^{\infty} n^{-s} = -\frac{d-2}{2}\, \zeta_R(-1) = 1 \tag{198} \]

where in the last equation we have used the identity $\zeta_R(-1) = -\frac{1}{12}$ and we have put $d = 26$. Since the Shapiro-Virasoro model has two sets of transverse harmonic oscillators, it is obvious that its intercept is twice that of the generalized Veneziano model.

Using the rules discussed in the previous section we can construct the vertex operator corresponding to the state in Eq. (195). It is given by:

\[ V_{(i;N_i)}(z, P) = \prod_i \oint_z dz_i\, \epsilon_i \cdot P(z_i)\, e^{i N_i \hat k \cdot Q(z_i)} : e^{i p \cdot Q(z)} : \tag{199} \]

where the integral over the variable $z_i$ is evaluated along a curve in the complex $z_i$ plane containing the point $z$. The singularity of the integrand for $z_i = z$ is a pole provided that the following condition is satisfied:

\[ 2\alpha' p \cdot \hat k = 1 \tag{200} \]

The last vertex in Eq.
(199) is the vertex operator corresponding to the ground tachyonic state given in Eq. (59) with α′p2 = 1. Using the general form of the vertex one can compute the three-point amplitude involving three arbitrary DDF vertex operators. This calculation 36 Paolo Di Vecchia has been performed in Ref. [37] and since the vertex operators are conformal fields with dimension equal to 1 one gets: 〈0, 0|V (z1, P1)V(i(2) (z2, P2)V(i(3) (z3, P3)|0, 0〉 = (z1 − z2)(z1 − z3)(z2 − z3) (201) where the explicit form of the coefficient C123 is given by: C123 = 1〈0, 0|2〈0, 0|3〈0, 0|e r.s=1 n,m=1 −n;iN −m;i+ −n;i× × eτ0 (α′Π2r−1)|N (1)k1 , i 〉1|N (2)k2 , i 〉2|N (3)k3 , i 〉3 (202) where N rsnm = −N rnNsm nmα1α2α3 nαs +mαr ; N rn = Γ (−nαr+1 αrn!Γ (1− nαr+1αr − n) (203) Π = Pr+1αr − Prαr+1 ; r = 1, 2, 3 (204) Π is independent on the value of r chosen as a consequence of the equations: Pr = 0 (205) 7 The zero slope limit In the introduction we have seen that the dual resonance model has been constructed using rules that are different from those used in field theory. For instance, we have seen that planar duality implies that the amplitude corresponding to a certain duality diagram, contains poles in both s and t channels, while the amplitude corresponding to a Feynman diagram in field theory contains only a pole in one of the two channels. Furthermore, the scattering amplitude in the dual resonance model contains an infinite number of resonant states that, at high energy, average out to give Regge behaviour. Also this property is not observed in field theory. The question that was natural to ask, was then: is there any relation between the dual resonance model and field theory? It turned out, to the surprise of many, that the dual resonance model was not in contradiction with field theory, but was instead an extension of a certain number of field theories. 
We will see that the limit in which a field theory is obtained from the dual resonance model corresponds to taking the slope $\alpha'$ of the Regge trajectory to zero.

Let us consider the scattering amplitude of four ground state particles in Eq. (1), which we rewrite here with the correct normalization factor:

\[ A(s,t,u) = C_0 N_0^4 \left( A(s,t) + A(s,u) + A(t,u) \right) \tag{206} \]

where

\[ N_0 = \sqrt{2}\, g\, (2\alpha')^{\frac{d-2}{4}} \tag{207} \]

is the correct normalization factor for each external leg, $g$ is the dimensionless open string coupling constant that we have consistently ignored in the previous sections, and $C_0$ is determined by the following relation:

\[ C_0 N_0^2 \alpha' = 1 \tag{208} \]

that is obtained by requiring the factorization of the amplitude at the pole corresponding to the ground state particle whose mass is given in Eq. (21). Using Eq. (21) in order to rewrite the intercept of the Regge trajectory in terms of the mass $m^2$ of the ground state particle, and the following relation satisfied by the $\Gamma$-function:

\[ \Gamma(1 + z) = z\,\Gamma(z) \tag{209} \]

we can easily perform the limit $\alpha' \to 0$ of $A(s,t)$, obtaining:

\[ A(s,t) = \frac{1}{\alpha'} \left[ \frac{1}{m^2 - s} + \frac{1}{m^2 - t} \right] \tag{210} \]

Performing the same limit on the other two planar amplitudes we get the following expression for the total amplitude in Eq. (206):

\[ A(s,t,u) = \frac{4 g^2 (2\alpha')^{\frac{d-2}{2}}}{(\alpha')^2} \left[ \frac{1}{m^2 - s} + \frac{1}{m^2 - t} + \frac{1}{m^2 - u} \right] \tag{211} \]

By introducing the coupling constant:

\[ g_3 = 4 g\, (2\alpha')^{\frac{d-6}{4}} \tag{212} \]

Eq. (211) becomes

\[ A(s,t,u) = g_3^2 \left[ \frac{1}{m^2 - s} + \frac{1}{m^2 - t} + \frac{1}{m^2 - u} \right] \tag{213} \]

that is equal to the sum of the tree diagrams for the scattering of four particles with mass $m$ in $\Phi^3$ theory with coupling constant equal to $g_3$. We have shown that, by keeping $g_3$ fixed in the limit $\alpha' \to 0$, the scattering amplitude of four ground state particles of the dual resonance model is equal to the tree diagrams of $\Phi^3$ theory. This proof can be extended to the scattering of $N$ ground state particles, recovering also in this case the tree diagrams of $\Phi^3$ theory. It is also valid for loop diagrams, which we will discuss in the next section.
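The zero slope limit of a single planar amplitude can be checked numerically. The sketch below assumes that the planar amplitude of Eq. (1) has the Veneziano form $A(s,t) = \Gamma(-\alpha(s))\Gamma(-\alpha(t))/\Gamma(-\alpha(s)-\alpha(t))$ with $-\alpha(x) = \alpha'(m^2 - x)$, and compares it with the pole expression $\frac{1}{\alpha'}\left[\frac{1}{m^2-s} + \frac{1}{m^2-t}\right]$ for decreasing $\alpha'$. The function names and the sample kinematics are illustrative.

```python
import math


def planar_amplitude(alpha_prime, m2, s, t):
    """Veneziano-type planar amplitude with -alpha(x) = alpha' (m^2 - x).

    Assumed form: Gamma(-alpha(s)) Gamma(-alpha(t)) / Gamma(-alpha(s) - alpha(t)).
    """
    a = alpha_prime * (m2 - s)
    b = alpha_prime * (m2 - t)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def field_theory_limit(alpha_prime, m2, s, t):
    """Sum of the s- and t-channel poles expected in the zero slope limit."""
    return (1.0 / alpha_prime) * (1.0 / (m2 - s) + 1.0 / (m2 - t))


# Sample kinematics below threshold, chosen only for illustration.
m2, s, t = 1.0, -2.0, -3.0
for alpha_prime in (1e-1, 1e-2, 1e-3, 1e-4):
    ratio = planar_amplitude(alpha_prime, m2, s, t) / field_theory_limit(alpha_prime, m2, s, t)
    print(f"alpha' = {alpha_prime:g}: amplitude / pole terms = {ratio:.8f}")
```

The ratio approaches one as $\alpha' \to 0$, with corrections that are quadratic in $\alpha'$, so that only the two field-theory poles survive in the limit.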
In conclusion, the dual resonance model reduces in the zero slope limit to $\Phi^3$ theory. The proof that we have presented here is due to J. Scherk [38] 13.

A more interesting case to study is the one with intercept $\alpha_0 = 1$. We will see that, in this case, one obtains the tree diagrams of Yang-Mills theory, as shown by Neveu and Scherk [40] 14.

13 See also Ref. [39].

14 See also Ref. [41].

Let us consider the three-point amplitude involving three massless gauge particles described by the vertex operator in Eq. (173). It is given by the sum of two planar diagrams. The first one, corresponding to the ordering (123), is given by:

\[ C_0 N_0^3\, \mathrm{Tr}\left( \lambda^{a_1} \lambda^{a_2} \lambda^{a_3} \right) \frac{\langle 0,0|\, V_{\epsilon_1}(z_1,p_1)\, V_{\epsilon_2}(z_2,p_2)\, V_{\epsilon_3}(z_3,p_3)\, |0,0\rangle}{\left[ (z_1 - z_2)(z_2 - z_3)(z_1 - z_3) \right]^{-1}} \tag{214} \]

Using momentum conservation $p_1 + p_2 + p_3 = 0$ and the mass shell conditions $p_i^2 = p_i \cdot \epsilon_i = 0$ one can rewrite the previous equation as follows:

\[ C_0 N_0^3\, \mathrm{Tr}\left( \lambda^{a_1} \lambda^{a_2} \lambda^{a_3} \right) \sqrt{2\alpha'} \left[ (\epsilon_1 \cdot \epsilon_2)(p_1 \cdot \epsilon_3) + (\epsilon_1 \cdot \epsilon_3)(p_3 \cdot \epsilon_2) + (\epsilon_2 \cdot \epsilon_3)(p_2 \cdot \epsilon_1) \right] \tag{215} \]

The second contribution comes from the ordering (132), which can be obtained from the previous one by the substitution

\[ \mathrm{Tr}\left( \lambda^{a_1} \lambda^{a_2} \lambda^{a_3} \right) \to -\mathrm{Tr}\left( \lambda^{a_1} \lambda^{a_3} \lambda^{a_2} \right) \tag{216} \]

Summing the two contributions one gets

\[ C_0 N_0^3\, \mathrm{Tr}\left( \lambda^{a_1} [\lambda^{a_2}, \lambda^{a_3}] \right) \sqrt{2\alpha'} \left[ (\epsilon_1 \cdot \epsilon_2)(p_1 \cdot \epsilon_3) + (\epsilon_1 \cdot \epsilon_3)(p_3 \cdot \epsilon_2) + (\epsilon_2 \cdot \epsilon_3)(p_2 \cdot \epsilon_1) \right] \tag{217} \]

The factor

\[ N_0 = 2 g\, (2\alpha')^{\frac{d-2}{4}} \tag{218} \]

is the correct normalization factor for each vertex operator if we normalize the generators of the Chan-Paton group as follows:

\[ \mathrm{Tr}\left( \lambda^i \lambda^j \right) = \frac{1}{2}\, \delta^{ij} \tag{219} \]

It is related to $C_0$ through the relation

\[ C_0 N_0^2 \alpha' = 2 \tag{220} \]

and $g$ is the dimensionless open string coupling constant. Notice that Eqs. (218) and (220) differ from Eqs. (207) and (208) because of the presence of the Chan-Paton factors, which we did not include in the case of $\Phi^3$ theory.
By using the commutation relations:

\[ [\lambda^a, \lambda^b] = i f^{abc} \lambda^c \tag{221} \]

and the previous normalization factors we get for the three-gluon amplitude:

\[ i g_{YM} f^{a_1 a_2 a_3} \left[ (\epsilon_1 \cdot \epsilon_2)((p_1 - p_2) \cdot \epsilon_3) + (\epsilon_1 \cdot \epsilon_3)((p_3 - p_1) \cdot \epsilon_2) + (\epsilon_2 \cdot \epsilon_3)((p_2 - p_3) \cdot \epsilon_1) \right] \tag{222} \]

that is equal to the 3-gluon vertex that one obtains from the Yang-Mills action

\[ L_{YM} = -\frac{1}{4} F^a_{\alpha\beta} F_a^{\alpha\beta} \; , \quad F^a_{\alpha\beta} = \partial_\alpha A^a_\beta - \partial_\beta A^a_\alpha + g_{YM} f^{abc} A^b_\alpha A^c_\beta \tag{223} \]

where

\[ g_{YM} = 2 g\, (2\alpha')^{\frac{d-4}{4}} \tag{224} \]

The previous procedure can be extended to the scattering of $N$ gluons, finding the same result that one gets from the tree diagrams of Yang-Mills theory. In the next section, we will discuss the loop diagrams. Also in this case one finds that the $h$-loop diagrams involving $N$ external gluons reproduce, in the zero slope limit, the sum of the $h$-loop diagrams with $N$ external gluons of Yang-Mills theory.

We conclude this section by mentioning that one can also take the zero slope limit of a scattering amplitude involving three and four gravitons, obtaining agreement with what one gets from the Einstein Lagrangian of general relativity. This has been shown by Yoneya [43].

15 The determination of the previous normalization factors can be found in the Appendix of Ref. [42].

8 Loop diagrams

The $N$-point amplitude previously constructed satisfies all the axioms of S-matrix theory except unitarity, because its only singularities are simple poles corresponding to zero width resonances lying on the real axis of the Mandelstam variables, and it does not contain the various cuts required by unitarity [1].

In order to eliminate this problem it was proposed already in the early days of dual theories to assume, in analogy with what happens for instance in perturbative field theory, that the $N$-point amplitude was only the lowest order (the tree diagram) of a perturbative expansion and that, in order to implement unitarity, it was necessary to include loop diagrams.
Then, the one-loop diagrams were constructed from the propagator and vertices that we have introduced in the previous sections [44]. The planar one-loop amplitude with M external particles was computed by starting from a (M + 2)-point tree amplitude and then by sewing two external legs together after the insertion of a propagator D given in Eq. (100). In this way one gets: (2α′)d/2(2π)d 〈P, λ|V (1, p1)DV (1, p2) . . . V (1, pN)D|P, λ〉 (225) where the sum over λ corresponds to the trace in the space of the harmonic os- cillators and the integral in ddP corresponds to integrate over the momentum circulating in the loop. The previous expression for the one-loop amplitude cannot be quite correct because all states of the space generated by the oscil- lators in Eq. (51) are circulating in the loop, while we know that we should include only the physical ones. This was achieved first by cancelling by hand the time and one of the space components of the harmonic oscillators reducing the degrees of freedom of each oscillator from d to d − 2 as suggested by the DDF operators at least for d = 26. This procedure was then shown to be cor- rect by Brink and Olive [45]. They constructed the operator that projects over the physical states and, by inserting it in the loop, showed that the reduction of the degrees of freedom of the oscillators from d to d− 2 was indeed correct. This was, at that time, the only procedure available to let only the physical states circulate in the loop because the BRST procedure was discovered a bit later also in the framework of the gauge field theories! To be more explicit let us compute the trace in Eq. (225) adding also the Chan-Paton factor. We get: (2π)dδ(d) NTr(λa1 . . . λaM ) (8π2α′)d/2 τd/2+1 [f1(k)] 12 (2π)M× dνM−1 . . . 
dν2 τ eG(νji) ]2α′pi·pj ; k ≡ e−πτ(226) where νji ≡ νj − νi, G(ν) = log ie−πν 2τ Θ1(iντ |iτ) f31 (k) ; f1(k) = k (1− k2n) (227) The birth of string theory 41 Θ1(ν|iτ) = −2k1/4 sinπν 1− e2iπνk2n 1− e−2iπνk2n (1− k2n)(228) Finally the normalization factor N0 is given in Eq. (218). We have performed the calculation for an arbitrary value of the space-time dimension d. However, in this way one gets also the extra factor of k 12 appearing in the first line of Eq. (226) that implies that our calculation is actually only consistent if d = 26. In fact, the presence of this factor does not allow one to rewrite the amplitude, originally obtained in the Reggeon sector, in the Pomeron sector as explained below. In the following we neglect this extra factor, implicitly assuming that d = 26, but, on the other hand, still keeping an arbitrary d. Using the relations: f1(k) = tf1(q) ; Θ1(iντ |iτ) = iΘ1(ν|it)t1/2eπν 2/t (229) where t = 1 and q ≡ e−πt, we can rewrite the one-loop planar diagram in the Pomeron channel. We get: (2π)dδ(d) NTr(λa1 . . . λaM ) (8π2α′)d/2 dt[f1(q)] 2−d(2π)M× dνM−1 . . . −Θ1(νji|it) f31 (q) ]2α′pi·pj (230) Notice that, by factorizing the planar loop in the Pomeron channel, one con- structed for the first time what we now call the boundary state [46] 16. This can be easily seen in the way that we are now going to describe. First of all, notice that the last quantity in Eq. (230) can be written as follows: Θ1(νji|it) f31 (q) ]2α′pi·pj −2 sin(πνji) 1− q2ne2πiνji 1− q2ne−2πiνji (1− q2n)2 ]2α′pi·pj (231) This equation can be rewritten as follows: 〈p = 0|q2R i=1 : e ipi·Q(e2iπνi ) : |p = 0〉 Tr (〈p = 0|q2N |p = 0〉) ; R = na†n · an (232) 16 See also the first paper in Ref. [47]. 42 Paolo Di Vecchia where the trace is taken only over the non-zero modes and momentum con- servation has been used. 
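The first relation in Eq. (229), which is what allows one to pass from the Reggeon to the Pomeron channel, can be checked numerically. The sketch below reads it as $f_1(k) = \sqrt{t}\, f_1(q)$ with $t = 1/\tau$, $k = e^{-\pi\tau}$ and $q = e^{-\pi t}$; the square root and the relation $t = 1/\tau$ are assumptions here, taken from the standard modular transformation of the Dedekind $\eta$-function, with $f_1(x) = x^{1/12}\prod_{n \geq 1}(1 - x^{2n})$. The truncation order and the function names are illustrative.

```python
import math


def f1(x, terms=200):
    """f1(x) = x^(1/12) * prod_{n>=1} (1 - x^(2n)), truncated after `terms` factors."""
    product = 1.0
    for n in range(1, terms + 1):
        product *= 1.0 - x ** (2 * n)
    return x ** (1.0 / 12.0) * product


def modular_check(tau):
    """Return (f1(k), sqrt(t) * f1(q)) for k = exp(-pi tau), q = exp(-pi t), t = 1/tau."""
    t = 1.0 / tau
    k = math.exp(-math.pi * tau)
    q = math.exp(-math.pi * t)
    return f1(k), math.sqrt(t) * f1(q)


for tau in (0.5, 1.0, 2.0):
    lhs, rhs = modular_check(tau)
    print(f"tau = {tau}: f1(k) = {lhs:.10f}, sqrt(t) f1(q) = {rhs:.10f}")
```

For moderate values of $\tau$ the truncated product converges quickly and the two sides agree to machine precision; for $\tau$ very close to zero more factors would be needed on the left-hand side.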
It must also be stressed that the normal ordering of the vertex operators in the previous equation is such that the zero modes are taken to be both in the same exponential instead of being ordered as in Eq. (59). By bringing all annihilation operators on the left of the creation ones, from the expression in Eq. (232) one gets (zi ≡ e2πiνi): (2π)dδ(d) (−2 sinπνji)2α ′pi·pj× n=1 Tr n·ane 2α′pj · znj e 2α′pi· an√ Tr (〈p = 0|q2N |p = 0〉) (233) The trace can be computed by using the completeness relation involving co- herent states |f〉 = efa† |0〉: e−|f | |f〉〈f | = 1 (234) Inserting the previous identity operator in Eq. (233) one gets after some cal- culation: (2π)dδ(d) (−2 sinπνji)2α ′pi·pj× i.j=1 −2α′pi·pje2πinνji q n(1−q2n) (235) Expanding the denominator in the last exponent and performing the sum over n one gets: (2π)dδ(d) (−2 sinπνji)2α ′pi·pj× 2α′pi·pj log(1−e2πiνji q2(m+1)) (236) that is equal to the last line of Eq. (231) apart from the δ-function for mo- mentum conservation. In conclusion, we have shown that Eq.s (231) and (232) are equal. Using Eq. (231) we can rewrite Eq. (230) as follows: NNM0 Tr(λ a1 . . . λaM ) (8π2α′)d/2 dt[f1(q)] 2−d(2πi)M dνM−1 . . . The birth of string theory 43 . . . λ〈p = 0, λ|q2R i=1 : e ipi·Q(e2iπνi ) : |p = 0, λ〉 λ〈p = 0, λ|q2N |p = 0, λ〉 (237) where the sum over any state |λ〉 corresponds to taking the trace over the non-zero modes. If d = 26 we can rewrite Eq. (237) in a simpler form: NNM0 Tr(λ a1 . . . λaM ) (8π2α′)d/2 dt (2πi)M dνM−1 . . . 〈p = 0, λ|q2R−2 : eipi·Q(e 2iπνi ) : |p = 0, λ〉 (238) The previous equation contains the factor dtq2R−2 that is like the propa- gator of the Shapiro-Virasoro model, but with only one set of oscillators as in the generalized Veneziano model. In the following we will rewrite it com- pletely with the formalism of the Shapiro-Virasoro model. 
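The step from Eq. (235) to Eq. (236), in which the denominator is expanded and the sum over $n$ is performed, rests on a simple series identity. The sketch below assumes that the exponent in Eq. (235) has the form $\sum_{n \geq 1} \frac{x^n q^{2n}}{n(1 - q^{2n})}$ with $x = e^{2\pi i \nu_{ji}}$, and checks numerically that it equals $-\sum_{m \geq 0} \log\left(1 - x\, q^{2(m+1)}\right)$. The function names, the truncation orders and the sample values are illustrative.

```python
import cmath


def mode_sum(x, q, n_max=400):
    """Truncated sum over n of x^n q^(2n) / (n (1 - q^(2n)))."""
    total = 0j
    for n in range(1, n_max + 1):
        total += (x ** n) * q ** (2 * n) / (n * (1.0 - q ** (2 * n)))
    return total


def log_sum(x, q, m_max=400):
    """Truncated sum over m of -log(1 - x q^(2(m+1)))."""
    total = 0j
    for m in range(m_max + 1):
        total -= cmath.log(1.0 - x * q ** (2 * (m + 1)))
    return total


x = cmath.exp(2j * cmath.pi * 0.3)  # a phase, as for x = exp(2 pi i nu)
q = 0.4
print(abs(mode_sum(x, q) - log_sum(x, q)))
```

The identity follows from writing $1/(1 - q^{2n}) = \sum_{m \geq 0} q^{2nm}$ and resumming the series of the logarithm for each $m$, which is valid for $|q| < 1$ and $|x| = 1$.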
This can be done by introducing the Pomeron propagator: dt q2N−2 = D̂ ; D̂ ≡ α zL0−1z̄L̃0−1; |z| ≡ q = e−πt(239) and rewriting the planar loop in the following compact form: 〈B0|D̂|BM 〉 ; |B0〉 ≡ n |p = 0, 0a, 0ã〉 (240) where |B0〉 is the boundary state without any Reggeon on it, Td−1 = 2(d−10)/4 α′)−d/2−1 (241) and |BM 〉 is instead the one with M Reggeons given by: |BM 〉 = NM0 Tr(λa1 . . . λaM )(2πi)M dνM−1 . . . : eipi·Q(e 2iπνi ) : |B0〉 (242) We want to stress once more that the normal ordering in the previous equa- tion is defined by taking the zero modes in the same exponential. Both the boundary states and the propagator are now states of the Shapiro-Virasoro model. This means that we have rewritten the one-loop planar diagram, where the states of the generalized Veneziano model circulate in the loop, as a tree 44 Paolo Di Vecchia diagram of the Shapiro-Virasoro model involving two boundary states and a propagator. This is what nowadays is called open/closed string duality. Besides the one-loop planar diagram in Eq. (225), that is nowadays called the annulus diagram, also the non-planar and the non-orientable diagrams were constructed and studied. In particular the non-planar one, that is ob- tained as the planar one in Eq. (225) but with two propagators multiplied with the twist operator Ω = eL−1(−1)R , (243) had unitarity violating cuts that disappeared [27] if the dimension of the space-time d = 26, leaving behind additional pole singularities. The explicit form of the non-planar loop can be obtained following the same steps done for the planar loop. One gets for the non-planar loop the following amplitude: 〈BR|D̂|BM 〉 (244) where now both boundary states contain, respectively, R and M Reggeon states. 
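The change of variables from the Reggeon channel to the Pomeron channel used in Eq. (229) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $f_1(x) = x^{1/12}\prod_{n\geq 1}(1-x^{2n})$ (the definition compatible with the expansion in Eq. (248)), $k = e^{-\pi\tau}$, $t = 1/\tau$ and $q = e^{-\pi t}$, and tests the relation $f_1(k) = \sqrt{t}\, f_1(q)$. The function names and the truncation of the product are choices made here, not notation from the text.

```python
import math


def f1(x, terms=200):
    """Truncated f1(x) = x**(1/12) * prod_{n>=1} (1 - x**(2n)) (assumed definition)."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= 1.0 - x ** (2 * n)
    return x ** (1.0 / 12.0) * prod


def channel_variables(tau):
    """Return (k, t, q) with k = exp(-pi*tau), t = 1/tau, q = exp(-pi*t)."""
    t = 1.0 / tau
    return math.exp(-math.pi * tau), t, math.exp(-math.pi * t)


def modular_mismatch(tau):
    """|f1(k) - sqrt(t) * f1(q)|, which should vanish up to rounding errors."""
    k, t, q = channel_variables(tau)
    return abs(f1(k) - math.sqrt(t) * f1(q))


def constant_term(t):
    """f1(q)**(-24) - exp(2*pi*t), which approaches 24 as t grows."""
    q = math.exp(-math.pi * t)
    return f1(q) ** (-24) - math.exp(2.0 * math.pi * t)


if __name__ == "__main__":
    for tau in (0.5, 1.0, 2.0):
        print(tau, modular_mismatch(tau))
    print(constant_term(2.0))
```

The second helper subtracts the leading tachyon term $e^{2\pi t}$ from $[f_1(q)]^{-24}$; what remains is close to 24 for moderately large $t$, in line with the first two terms of the expansion quoted later in Eq. (248).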
The additional poles found in the non-planar loop were called Pomerons because they occur in the Pomeron sector, which today is called the closed string channel, to distinguish them from the Reggeons, which instead occur in the Reggeon sector, today called the open string sector, of the planar and non-planar loop diagrams. At that time, in fact, the states of the generalized Veneziano model were called Reggeons, while the additional ones appearing in the non-planar loop were called Pomerons. The Reggeons correspond nowadays to open string states, and the Pomerons to closed string states. These things are obvious now, but at that time it took a while to show that the additional states appearing in the Pomeron sector have to be identified with those of the Shapiro-Virasoro model. The proof that the spectrum was the same came rather early. It was obtained by factorizing the non-planar diagram in the Pomeron channel [46], as we have done in Eq. (244). It was found that the states of the Pomeron channel lie on a linear Regge trajectory that has twice the intercept and half the slope of the Reggeon trajectory. This follows immediately from the propagator $\hat{D}$ in Eq. (239), which has poles for values of the momentum of the exchanged Pomeron given by:
$$2 - \frac{\alpha'}{2}\, p^2 = 2n \qquad (245)$$
These are exactly the values of the masses of the states of the Shapiro-Virasoro model [48], while the Reggeon propagator in Eq. (100) has poles for values of the momentum equal to:
$$1 - \alpha' p^2 = n \qquad (246)$$
However, it was still not clear that the Pomeron states interact among themselves as the states of the Shapiro-Virasoro model do. To show this it was first necessary to construct tree amplitudes containing states of both the generalized Veneziano model and the Shapiro-Virasoro model [49]. They reduce to the amplitudes of the generalized Veneziano (Shapiro-Virasoro) model if we have only external states of the generalized Veneziano (Shapiro-Virasoro) model.
Those amplitudes are today called disk amplitudes containing both open and closed string states. They were constructed [49] by using for the Reggeon states the vertex operators that we have discussed in Sect. (5), involving one set of harmonic oscillators, and for the Pomeron states the vertex operators given in Eq. (181), which we rewrite here:
$$V_{\alpha,\beta}(z, \bar{z}, p) = V_{\alpha}\!\left(z, \frac{p}{2}\right) V_{\beta}\!\left(\bar{z}, \frac{p}{2}\right) \qquad (247)$$
because now both component vertices contain the same set of harmonic oscillators as in the generalized Veneziano model. Furthermore, each of the two vertices is separately normal ordered, but their product is not normal ordered. The amplitude involving both kinds of states is then constructed by taking the product of all vertices between the projective invariant vacuum and integrating the Reggeons on the real axis in an ordered way and the Pomerons in the upper half plane, as one does for a disk amplitude. We have mentioned above that the two vertices are separately normal ordered, but their product is not normal ordered. When we normal order them we get, for instance for the tachyon of the Pomeron sector, a factor $(z - \bar{z})^{\alpha' p^2/2}$ that describes the Reggeon-Pomeron transition. This implies a direct coupling [51] between the U(1) part of the gauge field and the two-index antisymmetric field $B_{\mu\nu}$ of the Pomeron sector, called the Kalb-Ramond field [50], which makes the gauge field massive [51]. It was then shown that, by factorizing the non-planar loop in the Pomeron channel, one reproduces the scattering amplitude containing one state of the Shapiro-Virasoro model and a number of states of the generalized Veneziano model [52]. If we also have external states belonging to the generalized Shapiro-Virasoro model, then by factorizing the non-planar one-loop amplitude in the pure Pomeron channel, one would obtain the tree amplitudes of the Shapiro-Virasoro model [52].
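The statement that the Pomeron trajectory has twice the intercept and half the slope of the Reggeon one can be made concrete by tabulating the pole positions. The following sketch assumes that Eq. (246) reads $1 - \alpha' p^2 = n$, that Eq. (245) reads $2 - \frac{\alpha'}{2} p^2 = 2n$, and that the metric is such that $p^2 = -m^2$; the function names are hypothetical and masses are given in units of $1/\alpha'$.

```python
def trajectory(intercept, slope, s):
    """Linear Regge trajectory alpha(s) = intercept + slope * s, with s in units of 1/alpha'."""
    return intercept + slope * s


def reggeon_mass_squared(n):
    """alpha' * m^2 at the n-th open string pole, from 1 + alpha' m^2 = n."""
    return n - 1


def pomeron_mass_squared(n):
    """alpha' * m^2 at the n-th closed string pole, from 2 + (alpha'/2) m^2 = 2n."""
    return 4 * (n - 1)


if __name__ == "__main__":
    for n in range(4):
        s_open = reggeon_mass_squared(n)
        s_closed = pomeron_mass_squared(n)
        # Reggeon: intercept 1, slope 1; Pomeron: intercept 2, slope 1/2
        # (slopes measured in units of alpha').
        print(n, s_open, trajectory(1, 1, s_open),
              s_closed, trajectory(2, 0.5, s_closed))
```

Under these assumptions the lowest poles sit at $\alpha' m^2 = -1$ and $\alpha' m^2 = -4$ (the two tachyons), and the massless states appear at $n = 1$ with trajectory value 1 in the Reggeon sector and 2 in the Pomeron sector.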
All this implies that the generalized Veneziano model and the Shapiro-Virasoro model are not two independent models, but are part of one and the same model. In fact, if one started with the generalized Veneziano model and added loop diagrams to implement unitarity, one found in the non-planar loop the appearance of additional states that had the same masses and interactions as those of the Shapiro-Virasoro model. The planar diagram, written in Eq. (230) in the closed string channel, is divergent for large values of $t$. This divergence was recognized to be due to the exchange, in the Pomeron channel, of the tachyon of the Shapiro-Virasoro model and of the dilaton [47]. They correspond, respectively, to the first two terms of the expansion:
$$[f_1(q)]^{-24} = e^{2\pi t} + 24 + O\!\left(e^{-2\pi t}\right) \qquad (248)$$
The first one could be cancelled by an analytic continuation, while the second one could be eliminated through a renormalization of the slope of the Regge trajectory $\alpha'$ [47]. We conclude the discussion of the one-loop diagrams by mentioning that the one-loop diagram for the Shapiro-Virasoro model was computed by Shapiro [53], who also found that the integrand was modular invariant. The computation of multiloop diagrams requires a more advanced technology that was also developed in the early days of the dual resonance model, a few years before the discovery of its connection to string theory. In order to compute multiloop diagrams one first needs to construct an object that was called the N-Reggeon vertex. It contains N sets of harmonic oscillators, one for each external leg, and is such that, when we saturate it with N physical states, we get the corresponding N-point amplitude. In the following we will discuss how to determine the N-Reggeon vertex. The first step toward the N-Reggeon vertex is the Sciuto-Della Selva-Saito [54] vertex, which includes two sets of harmonic oscillators that we denote with the indices 1 and 2.
It is equal to: VSDS = 2〈x = 0, 0| : exp dzX ′2(z) ·X1(1− z) : (249) where X is the quantity that we have called Q in Eq. (57) and the prime denotes a derivative with respect to z. It satisfies the important property of giving the vertex operator Vα(z = 1) of an arbitrary state |α〉 when we saturate it with the corresponding state: VSDS |α〉2 = Vα(z = 1) (250) A shortcoming of this vertex is that it is not invariant under a cyclic permu- tation of the three legs. A cyclic symmetric vertex has been constructed by Caneschi, Schwimmer and Veneziano [55] by inserting the twist operator in Eq. (243). But the 3-Reggeon vertex is not enough if we want to compute an arbitrary multiloop amplitude. We must generalize it to an arbitrary number of external legs. Such a vertex, that can be obtained from the one in Eq. (249) with a very direct procedure, or that can also be obtained by sewing together three-Reggeon vertices, has been written in its final form by Lovelace [56] 17. Here we do not derive it, but we give directly its expression written in Ref. [56]: VN,0 = i=1 dzi dVabc i=1[V i (0)] [i 0√ 2α′pµ if n = 0√ na†µn if n < 0 (278) They are zero as a consequence of Eq.s (270) that in the conformal gauge become Eq.s (271). In the case of a closed string we get instead: L̃n = dzzn+1 = 0 (279) dz̄z̄n+1 = 0 (280) The birth of string theory 53 In terms of the harmonic oscillators introduced in eq. (276) we get αm · αn−m = 0 ; L̃n = α̃m · α̃n−m = 0 (281) where for the non-zero modes we have used the convention in (278), while the zero mode is given by: 0 = α̃ (282) In conclusion, the fact that we have reparametrization invariance implies that the Virasoro generators are classically identically zero. When we quantize the theory one cannot and also does not need to impose that they are vanishing at the operator level. They are imposed as conditions characterizing the physical states. 
〈Phys′|Ln|Phys〉 = 〈Phys′|(L0 − 1)|Phys〉 = 0 ; n 6= 0 (283) These equations are satisfied if we require: Ln|Phys >= (L0 − 1)|Phys >= 0 (284) The extra factor −1 in the previous equations comes from the normal ordering as explained in Eq. (198). The authors of Ref. [71] further specified the gauge by fixing it completely. They introduced the light-cone gauge specified by imposing the condition: X+ = 2α′p+τ (285) where X0 ±Xd−1√ X0 ±Xd−1√ (286) In this gauge the only physical degrees of freedom are the transverse ones. In fact the components along the directions 0 and d − 1 can be expressed in terms of the transverse ones by inserting Eq. (285) in the constraints in Eq. (271) and getting: Ẋ− = 4α′p+ (Ẋ2i +X i ) X 2α′p+ Ẋi ·X ′i (287) that up to a constant of integration determine completely X− as a function of X i. In terms of oscillators we get α+n = 0 ; 2α′α−n = αin−mα m n 6= 0 (288) 54 Paolo Di Vecchia for an open string and α+n = α̃ n = 0 n 6= 0 (289) together with 2α′α−n = αin−mα 2α′α̃−n = α̃in−mα̃ m (290) in the case of a closed string. This shows that the physical states are described only by the transverse oscillators having only d − 2 components. Those transverse oscillators corre- spond to the transverse DDF operators that we have discussed in Section 6. The authors of Ref. [71] also constructed the Lorentz generators only in terms of the transverse oscillators and they showed that they satisfy the correct Lorentz algebra only if the space-time dimension is d = 26. In this way the spectrum of the dual resonance model was completely reproduced starting from the Nambu-Goto action if d = 26! On the other hand, the choice of d = 26 is a necessity if we want to keep Lorentz invariance! Immediately after this, the interaction was also included either by adding a term describing the interaction of the string with an external gauge field [73] or by using a functional formalism [74, 75]. 
In the following we will give some detail only of the first approach for the case of an open string. A way to describe the string interaction is to add to the free string action an additional term that describes the interaction of the string with an external field:
$$S_{INT} = \int d^D y\, \Phi_L(y) J_L(y) \qquad (291)$$
where $\Phi_L(y)$ is the external field and $J_L$ is the current generated by the string. The index $L$ stands for possible Lorentz indices that are saturated in order to have a Lorentz invariant action. In the case of a point particle, such an interaction term will not give any information on the self-interaction of a particle. In the case of a string, instead, we will see that $S_{INT}$ will describe the interaction among strings because the external fields that can consistently interact with a string are only those that correspond to the various states of the string, as will become clear in the discussion below. This is a consequence of the fact that, for the sake of consistency, we must put the following restrictions on $S_{INT}$:
• It must be a well defined operator in the space spanned by the string oscillators.
• It must preserve the invariances of the free string theory. In particular, in the "conformal gauge" it must be conformal invariant.
• In the case of an open string, the interaction occurs at the end point of a string (say at $\sigma = 0$). This follows from the fact that two open strings interact by attaching to each other at the end points.
The simplest scalar current generated by the motion of a string can be written as follows:
$$J(y) = \int d\sigma\, \delta(\sigma)\, \delta^{(d)}[y^\mu - x^\mu(\tau, \sigma)] \qquad (292)$$
where $\delta(\sigma)$ has been introduced because the interaction occurs at the end of the string. For the sake of simplicity we omit a coupling constant $g$ in (292).
Inserting (292) in (291) and using a plane wave $\Phi(y) = e^{ik\cdot y}$ for the scalar external field, we get the following interaction:
$$S_{INT} = \int d\tau\, : e^{ik\cdot X(\tau,0)} : \qquad (293)$$
where the normal ordering has been introduced in order to have a well defined operator. The invariance of (293) under a conformal transformation $\tau \to w(\tau)$ requires the following identity:
$$S_{INT} = \int d\tau\, : e^{ik\cdot X(\tau,0)} : \; = \int dw\, : e^{ik\cdot X(w,0)} : \qquad (294)$$
or, in other words, that
$$: e^{ik\cdot X(\tau,0)} : \;\Longrightarrow\; w'(\tau) : e^{ik\cdot X(w,0)} : \qquad (295)$$
This means that the integrand in Eq. (294) must be a conformal field with conformal dimension equal to one, and this happens only if $\alpha' k^2 = 1$. The external field then corresponds to the tachyonic lowest state of the open string. Another simple current generated by the string is given by:
$$J_\mu(y) = \int d\sigma\, \delta(\sigma)\, \dot{X}_\mu(\tau, \sigma)\, \delta^{(d)}(y - X(\tau, \sigma)) \qquad (296)$$
Inserting (296) in (291) we get
$$S_{INT} = \int d\tau\, \dot{X}_\mu(\tau, 0)\, \epsilon^\mu e^{ik\cdot X(\tau,0)} \qquad (297)$$
if we use a plane wave $\Phi_\mu(y) = \epsilon_\mu e^{ik\cdot y}$. The vertex operator in Eq. (297) is conformal invariant only if
$$k^2 = \epsilon \cdot k = 0 \qquad (298)$$
and, therefore, the external vector must be the massless photon state of the string. We can generalize this procedure to an arbitrary external field, and the result is that we can only use external fields that correspond to on-shell physical states of the string. This procedure has been extended in Ref. [73] to the case of external gravitons by introducing in the Nambu-Goto action a target space metric and obtaining the vertex operator for the graviton, which is a massless state in the closed string theory. Remember that, at that time, this could have been done only with the Nambu-Goto action because the σ-model action was introduced only in 1976, first for the point particle [76] and then for the string [77]. As in the case of the photon, it turned out that the external field corresponding to the graviton was required to be on shell.
This condition is the precursor of the equations of motion that one obtains from the σ-model action by requiring the vanishing of the β-function [78]. One can then compute the probability amplitude for the emission of a number of string states corresponding to the various external fields, from an initial string state to a final one. This amplitude gives precisely the N-point amplitude that we discussed in the previous sections [73]. In particular, one learns that, in the case of the open string, the Fubini-Veneziano field is just the string coordinate computed at $\sigma = 0$:
$$Q^\mu(z) \equiv X^\mu(z, \sigma = 0) \; ; \quad z = e^{i\tau} \qquad (299)$$
In the case of a closed string we get instead:
$$Q^\mu(z, \bar{z}) \equiv X^\mu(z, \bar{z}) \; ; \quad z = e^{2i(\tau-\sigma)} \; , \quad \bar{z} = e^{2i(\tau+\sigma)} \qquad (300)$$
Finally, let me mention that with the functional approach Mandelstam [74] and Cremmer and Gervais [79] computed the interaction between three arbitrary physical string states and reproduced in this way the coupling of three DDF states given in Eq. (202) and obtained in Ref. [37] by using the operator formalism. At this point it was completely clear that the structure underlying the generalized Veneziano model was that of an open relativistic string, while that underlying the Shapiro-Virasoro model was that of a closed relativistic string. Furthermore, these two theories are not independent because, if one starts from an open string theory, one automatically gets closed strings from loop corrections.

10 Conclusions

In this contribution, we have gone through the developments that led from the construction of the dual resonance model to the bosonic string theory, trying as much as possible to include all the necessary technical details. This is because we believe that they are not only important from a historical point of view, but are also still part of the formalism that one uses today in many string calculations.
We have tried to be as complete and objective as possible, but it could very well be that some of those who participated in the research of these years will not agree with some or even many of the statements we made. We apologize to those we have forgotten to mention or have not mentioned as they would have liked. Finally, after having gone through the developments of these years, my thoughts go to Sergio Fubini, who shared with me and Gabriele many of the ideas described here and who is deeply missed, and to my friends from Florence, Naples and Turin for a pleasant collaboration in many papers discussed here.

Acknowledgments

I thank R. Marotta and I. Pesando for a critical reading of the manuscript.

References

1. G.F. Chew, The analytic S matrix, W.A. Benjamin, Inc. (1966). R.J. Eden, P.V. Landshoff, D.I. Olive and J.C. Polkinghorne, The analytic S matrix, Cambridge University Press (1966).
2. R. Dolen, D. Horn and C. Schmid, Phys. Rev. 166, 1768 (1968). C. Schmid, Phys. Rev. Letters 20, 689 (1968).
3. H. Harari, Phys. Rev. Letters 22, 562 (1969). J.L. Rosner, Phys. Rev. Letters 22, 689 (1969).
4. G. Veneziano, Nuovo Cimento A 57, 190 (1968).
5. M.A. Virasoro, Phys. Rev. 177, 2309 (1969).
6. M.A. Virasoro, Phys. Rev. D 1, 2933 (1970).
7. A. Neveu and J.H. Schwarz, Nucl. Phys. B 31, 86 (1971) and Phys. Rev. D 4, 1109 (1971).
8. P. Ramond, Phys. Rev. D 3, 2415 (1971).
9. C. Lovelace, Phys. Lett. B 28, 265 (1968). J. Shapiro, Phys. Rev. 179, 1345 (1969).
10. P.H. Frampton, Phys. Lett. B 41, 364 (1972).
11. V. Alessandrini, D. Amati, M. Le Bellac and D. Olive, Phys. Rep. C 1, 269 (1971). G. Veneziano, Phys. Rep. C 9, 199 (1974). S. Mandelstam, Phys. Rep. C 13, 259 (1974). C. Rebbi, Phys. Rep. C 12, 1 (1974). J. Scherk, Rev. Mod. Phys. 47, 123 (1975).
12. F. Gliozzi, Lett. Nuovo Cimento 2, 1160 (1970).
13. K. Bardakçi and H. Ruegg, Phys. Rev. 181, 1884 (1969). C.G. Goebel and B. Sakita, Phys. Rev. Letters 22, 257 (1969). Chan Hong-Mo and T.S. Tsun, Phys. Lett.
B 28, 485 (1969). Z. Koba and H.B. Nielsen, Nucl. Phys. B 10, 633 (1969).
14. K. Bardakçi and H. Ruegg, Phys. Lett. B 28, 671 (1969). M.A. Virasoro, Phys. Rev. Lett. 22, 37 (1969).
15. Z. Koba and H.B. Nielsen, Nucl. Phys. B 12, 517 (1969).
16. H.M. Chan and J.E. Paton, Nucl. Phys. B 10, 516 (1969).
17. S. Fubini and G. Veneziano, Nuovo Cimento A 64, 811 (1969).
18. K. Bardakçi and S. Mandelstam, Phys. Rev. 184, 1640 (1969).
19. S. Fubini, D. Gordon and G. Veneziano, Phys. Lett. B 29, 679 (1969).
20. Y. Nambu, Proc. Int. Conf. on Symmetries and Quark Models, Wayne State University 1969 (Gordon and Breach, 1970) p. 269.
21. L. Susskind, Nuovo Cimento A 69, 457 (1970) and Phys. Rev. Letters 23, 545 (1969).
22. J. Shapiro, Phys. Lett. B 33, 361 (1970).
23. S. Fubini and G. Veneziano, Nuovo Cimento A 67, 29 (1970).
24. F. Gliozzi, Lettere al Nuovo Cimento 2, 846 (1969).
25. C.B. Chiu, S. Matsuda and C. Rebbi, Phys. Rev. Lett. 23, 1526 (1969). C.B. Thorn, Phys. Rev. D 1, 1963 (1970).
26. S. Fubini and G. Veneziano, Annals of Physics 63, 12 (1971).
27. C. Lovelace, Phys. Lett. B 34, 500 (1971).
28. E. Del Giudice and P. Di Vecchia, Nuovo Cimento A 5, 90 (1971). M. Yoshimura, Phys. Lett. B 34, 79 (1971).
29. E. Del Giudice and P. Di Vecchia, Nuovo Cimento A 70, 579 (1970).
30. P. Campagna, S. Fubini, E. Napolitano and S. Sciuto, Nuovo Cimento A 2, 911 (1971).
31. E. Del Giudice, P. Di Vecchia and S. Fubini, Annals of Physics 70, 378 (1972).
32. R.C. Brower, Phys. Rev. D 6, 1655 (1972).
33. P. Goddard and C.B. Thorn, Phys. Lett. B 40, 235 (1972).
34. F. Gliozzi, J. Scherk and D. Olive, Phys. Lett. B 65, 282 (1976); Nucl. Phys. B 122, 253 (1977).
35. L. Brink and H.B. Nielsen, Phys. Lett. B 45, 332 (1973).
36. F. Gliozzi, unpublished. See also P. Di Vecchia in Many Degrees of Freedom in Particle Physics, edited by H. Satz, Plenum Publishing Corporation, 1978, p. 493.
37. M. Ademollo, E. Del Giudice, P. Di Vecchia and S. Fubini, Nuovo Cimento A 19, 181 (1974).
38. J. Scherk, Nucl. Phys. B 31, 222 (1971).
39. N. Nakanishi, Prog. Theor. Phys. 48, 355 (1972). P.H. Frampton and K.C. Wali, Phys. Rev. D 8, 1879 (1973).
40. A. Neveu and J. Scherk, Nucl. Phys. B 36, 155 (1973).
41. A. Neveu and J.L. Gervais, Nucl. Phys. B 46, 381 (1972).
42. P. Di Vecchia, A. Lerda, L. Magnea, R. Marotta and R. Russo, Nucl. Phys. B 469, 235 (1996).
43. T. Yoneya, Prog. Theor. Phys. 51, 1907 (1974).
44. K. Kikkawa, B. Sakita and M. Virasoro, Phys. Rev. 184, 1701 (1969). K. Bardakçi, M.B. Halpern and J. Shapiro, Phys. Rev. 185, 1910 (1969). D. Amati, C. Bouchiat and J.L. Gervais, Lett. al Nuovo Cimento 2, 399 (1969). A. Neveu and J. Scherk, Phys. Rev. D 1, 2355 (1970). G. Frye and L. Susskind, Phys. Lett. B 31, 537 (1970). D.J. Gross, A. Neveu, J. Scherk and J.H. Schwarz, Phys. Rev. D 2, 697 (1970).
45. L. Brink and D. Olive, Nucl. Phys. B 56, 253 (1973) and Nucl. Phys. B 58, 237 (1973).
46. E. Cremmer and J. Scherk, Nucl. Phys. B 50, 222 (1972). L. Clavelli and J. Shapiro, Nucl. Phys. B 57, 490 (1973). L. Brink, D.I. Olive and J. Scherk, Nucl. Phys. B 61, 173 (1973).
47. M. Ademollo, A. D’Adda, R. D’Auria, F. Gliozzi, E. Napolitano, S. Sciuto and P. Di Vecchia, Nucl. Phys. B 94, 221 (1975). J. Shapiro, Phys. Rev. D 11, 2937 (1975).
48. D.I. Olive and J. Scherk, Phys. Lett. B 44, 296 (1973).
49. M. Ademollo, A. D’Adda, R. D’Auria, E. Napolitano, P. Di Vecchia, F. Gliozzi and S. Sciuto, Nucl. Phys. B 77, 189 (1974).
50. M. Kalb and P. Ramond, Phys. Rev. D 9, 2273 (1974).
51. E. Cremmer and J. Scherk, Nucl. Phys. B 72, 117 (1974).
52. A. D’Adda, R. D’Auria, E. Napolitano, P. Di Vecchia, F. Gliozzi and S. Sciuto, Phys. Lett. B 68, 81 (1977).
53. J. Shapiro, Phys. Rev. D 5, 1945 (1975).
54. S. Sciuto, Lettere al Nuovo Cimento 2, 411 (1969). A. Della Selva and S. Saito, Lett. al Nuovo Cimento 4, 689 (1970).
55. L. Caneschi, A. Schwimmer and G. Veneziano, Phys. Lett. B 30, 356 (1969). L. Caneschi and A.
Schwimmer, Lettere al Nuovo Cimento 3, 213 (1970).
56. C. Lovelace, Phys. Lett. B 32, 490 (1970).
57. D.I. Olive, Nuovo Cimento A 3, 399 (1971).
58. I. Drummond, Nuovo Cimento A 67, 71 (1970). G. Carbone and S. Sciuto, Lett. Nuovo Cimento 3, 246 (1970). L. Kosterlitz and D. Wray, Lett. al Nuovo Cimento 3, 491 (1970). D. Collop, Nuovo Cimento A 1, 217 (1971). L.P. Yu, Phys. Rev. D 2, 1010 (1970); Phys. Rev. D 2, 2256 (1970). E. Corrigan and C. Montonen, Nucl. Phys. B 36, 58 (1972). J.L. Gervais and B. Sakita, Phys. Rev. D 4, 2291 (1971).
59. C. Lovelace, Phys. Lett. B 32, 703 (1970).
60. V. Alessandrini, Nuovo Cimento A 2, 321 (1971).
61. D. Amati and V. Alessandrini, Nuovo Cimento A 4, 793 (1971).
62. P. Di Vecchia, M. Frau, A. Lerda and S. Sciuto, Phys. Lett. B 199, 49 (1987). J.L. Petersen and J. Sidenius, Nucl. Phys. B 301, 247 (1988).
63. S. Mandelstam, in “Unified String Theories”, edited by M. Green and D. Gross, World Scientific, p. 46.
64. P. Di Vecchia, F. Pezzella, M. Frau, K. Hornfeck, A. Lerda and S. Sciuto, Nucl. Phys. B 322, 317 (1989).
65. Y. Nambu, Lectures at the Copenhagen Symposium, 1970, unpublished.
66. H.B. Nielsen, Paper submitted to the 15th Int. Conf. on High Energy Physics, Kiev, 1970 and Nordita preprint (1969).
67. T. Takabayasi, Progr. Theor. Phys. 44, 1117 (1970). O. Hara, Progr. Theor. Phys. 46, 1549 (1971). L.N. Chang and J. Mansouri, Phys. Rev. D 5, 2535 (1972). J. Mansouri and Y. Nambu, Phys. Lett. B 39, 357 (1972). M. Minami, Prog. Theor. Phys. 48, 1308 (1972).
68. T. Goto, Progr. Theor. Phys. 46, 1560 (1971).
69. D. Fairlie and H.B. Nielsen, Nucl. Phys. B 20, 637 (1970) and 22, 525 (1970).
70. C.S. Hsue, B. Sakita and M.A. Virasoro, Phys. Rev. D 2, 2857 (1970). J.L. Gervais and B. Sakita, Phys. Rev. D 4, 2291 (1971).
71. P. Goddard, J. Goldstone, C. Rebbi and C. Thorn, Nucl. Phys. B 56, 109 (1973).
72. E.F. Corrigan and D.B. Fairlie, Nucl. Phys. B 91, 527 (1975).
73. M. Ademollo, A. D’Adda, R. D’Auria, P.
Di Vecchia, F. Gliozzi, R. Musto, E. Napolitano, F. Nicodemi and S. Sciuto, Nuovo Cimento A 21, 77 (1974).
74. S. Mandelstam, Nucl. Phys. B 64, 205 (1973).
75. J.L. Gervais and B. Sakita, Phys. Rev. Lett. 30, 716 (1973).
76. L. Brink, P. Di Vecchia, P. Howe, S. Deser and B. Zumino, Phys. Lett. B 64, 435 (1976).
77. L. Brink, P. Di Vecchia and P. Howe, Phys. Lett. B 65, 471 (1976). S. Deser and B. Zumino, Phys. Lett. B 65, 369 (1976).
78. C. Lovelace, Phys. Lett. B 136, 75 (1984). C.G. Callan, E.J. Martinec, M.J. Perry and D. Friedan, Nucl. Phys. B 262, 593 (1985).
79. E. Cremmer and J.L. Gervais, Nucl. Phys. B 76, 209 (1974).

The birth of string theory
Paolo Di Vecchia

ABSTRACT In this contribution we go through the developments that in the years 1968 to 1974 led from the Veneziano model to the bosonic string.
<|endoftext|><|startoftext|>
Introduction

The purpose of this paper is to construct examples of strange behavior of local cohomology. In these constructions we follow a strategy that was already used in [CH] and which relates, via a spectral sequence introduced in [HR], the local cohomology for the two distinguished bigraded prime ideals in a standard bigraded algebra. In the first part we consider algebras with rather general gradings and deduce a similar spectral sequence in this more general situation. A typical example of such an algebra is the Rees algebra of a graded ideal. The proof for the spectral sequence given here is simpler than that of the corresponding spectral sequence in [HR]. In the second part of this paper we construct examples of standard graded rings $A$, which are algebras over a field $K$, such that the function
$$j \mapsto \dim_K(H^i_{A_+}(A)_{-j}) \qquad (1)$$
is an interesting function for $j \gg 0$. In our examples, this dimension will be finite for all $j$.

Suppose that $A_0$ is a Noetherian local ring, $A = \bigoplus_{j\geq 0} A_j$ is a standard graded ring and set $A_+ := \bigoplus_{j>0} A_j$. Let $M$ be a finitely generated graded $A$-module and $\mathcal{F} := \widetilde{M}$ be the sheafification of $M$ on $Y = \mathrm{Proj}(A)$.
We then have graded $A$-module isomorphisms
$$H^{i+1}_{A_+}(M) \cong \bigoplus_{n\in\mathbb{Z}} H^i(Y, \mathcal{F}(n))$$
for $i \geq 1$, and a similar expression for $i = 0$ and $1$. By Serre vanishing, $H^i_{A_+}(M)_j = 0$ for all $i$ and $j \gg 0$. However, the asymptotic behaviour of $H^i_{A_+}(M)_{-j}$ for $j \gg 0$ is much more mysterious. In the case when $A_0 = K$ is a field, the function (1) is in fact a polynomial for large enough $j$. The proof is a consequence of graded local duality ([BS, 13.4.6] or [BH, 3.6.19]) or follows from Serre duality on a projective variety. For more general $A_0$, the $H^i_{A_+}(M)_{-j}$ are finitely generated $A_0$-modules, but need not have finite length. The following problem was proposed by Brodmann and Hellus [BrHe].

The second author was partially supported by NSF.
http://arxiv.org/abs/0704.0102v1

Tameness problem: Are the local cohomology modules $H^i_{A_+}(M)$ tame? That is, is it true that either $\{H^i_{A_+}(M)_j \neq 0,\ \forall j \ll 0\}$ or $\{H^i_{A_+}(M)_j = 0,\ \forall j \ll 0\}$?

The problem has a positive solution for $A_0$ of small dimension (some of the references are Brodmann [Br], Brodmann and Hellus [BrHe], Lim [L], Rotthaus and Sega [RS]).

Theorem 0.1 ([BrHe]). If $\dim(A_0) \leq 2$, then $M$ is tame.

However, it has recently been shown by two of the authors that tameness can fail if $\dim(A_0) = 3$.

Theorem 0.2 ([CH]). There are examples with $\dim(A_0) = 3$ where $M$ is not tame.

The statement of this example is reproduced in Theorem 3.1 of this paper. The function (1) is periodic for large $j$. Specifically, the function (1) is 2 for large even $j$ and is 0 for large odd $j$. In Theorem 3.3 we construct an example of failure of tameness of local cohomology which is not periodic, and is not even a quasi polynomial (in $-j$) for large $j$. Specifically, we have for $j > 0$,
$$\dim_K(H^i_{A_+}(A)_{-j}) = \begin{cases} 1 & \text{if } j \equiv 0 \pmod{p+1}, \\ 1 & \text{if } j = p^t \text{ for some odd } t \geq 0, \\ 0 & \text{otherwise}, \end{cases}$$
where the characteristic of $K$ is $p$. We have $p^t \equiv -1 \pmod{p+1}$ for all odd $t \geq 0$. We also give an example (Theorem 3.5) of failure of tameness where (1) is a quasi polynomial with linear growth in even degree and is 0 in odd degree.
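The values quoted above for the example of Theorem 3.3 are easy to tabulate. The sketch below simply encodes the three cases of the displayed formula for a given characteristic $p$; the function names are hypothetical, and nothing here computes local cohomology itself. It also checks the congruence $p^t \equiv -1 \pmod{p+1}$ for odd $t$, which shows that the first two cases never overlap.

```python
def is_odd_power(j, p):
    """Return True if j == p**t for some odd exponent t."""
    t = 0
    while j > 1 and j % p == 0:
        j //= p
        t += 1
    return j == 1 and t % 2 == 1


def dimension(j, p):
    """Value of the function (1) at j > 0, as stated for the example of Theorem 3.3."""
    if j <= 0:
        raise ValueError("the formula is stated for j > 0")
    if j % (p + 1) == 0 or is_odd_power(j, p):
        return 1
    return 0


def support(p, bound):
    """All j in 1..bound where the function is non-zero."""
    return [j for j in range(1, bound + 1) if dimension(j, p) != 0]


if __name__ == "__main__":
    print(support(3, 30))
```

For $p = 3$ the support consists of the multiples of 4 together with $3, 27, 243, \ldots$; the isolated odd powers of $p$ become arbitrarily sparse, which is why the function is not periodic for large $j$.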
In Theorem 3.6, we give a tame example, but we have $\dim_K(H^i_{A_+}(A)_{-j})$ so (1) is far from being a quasi polynomial in $-j$ for large $j$. While the example of [CH] is for $M = \omega_A$, where $\omega_A$ is the canonical module of $A$, the examples of the paper are all for $M = A$. This allows us to easily reinterpret our examples as Rees algebras in Section 4, and thus we have examples of Rees algebras over local rings for which the above failure of tameness holds. In the final section, Section 5, we give an analysis of the explicit and implicit role of bigraded duality in the construction of the examples, and some comments on how it affects the geometry of the constructions.

1. Duality for polynomial rings in two sets of variables

Let $K$ be any commutative ring (with unit). In later applications $K$ will be mostly a field. Furthermore let $S = K[x_1, \ldots, x_m, y_1, \ldots, y_n]$, $P = (x_1, \ldots, x_m)$ and $Q = (y_1, \ldots, y_n)$. The homology of the Čech complex $C^\bullet_P(\ )$ (resp. $C^\bullet_Q(\ )$) will be denoted by $H^\bullet_P(\ )$ (resp. $H^\bullet_Q(\ )$). Notice that for any commutative ring $K$, this homology is the local cohomology supported in $P$ (resp. $Q$), as $P$ and $Q$ are generated by regular sequences. Assume that $S$ is Γ-graded for some abelian group Γ, and that $\deg(a) = 0$ for $a \in K$. If $x^s y^p \in S$, then $\deg(x^s y^p) = l(s) + l'(p)$ with $l(s) := \sum_i s_i \deg(x_i)$ and $l'(p) := \sum_j p_j \deg(y_j)$.

Definition 1.1. Let $I \subset S$ be a Γ-graded ideal. The Γ-grading of $S$ is $I$-sharp if $H^i_I(S)_\gamma$ is a finitely generated $K$-module, for every $i$ and $\gamma \in \Gamma$.

Lemma 1.2. The following conditions are equivalent:
(i) the Γ-grading of $S$ is $P$-sharp.
(ii) the Γ-grading of $S$ is $Q$-sharp.
(iii) for all $\gamma \in \Gamma$, $|\{(\alpha, \beta) : \alpha \geq 0,\ \beta \geq 0,\ l(\alpha) = \gamma + l'(\beta)\}| < \infty$.

Note that if $K$ is Noetherian, $M$ is a finitely generated Γ-graded $S$-module, and the Γ-grading of $S$ is $I$-sharp, then $H^i_I(M)_\gamma$ is a finite $K$-module, for every $i$ and $\gamma \in \Gamma$. This follows from the converging Γ-graded spectral sequence $H_{p-q}(H^p_I(F)) \Rightarrow H^q_I(M)$, where $F$ is a Γ-graded free $S$-resolution of $M$ with $F_i$ finite for every $i$.
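Condition (iii) of Lemma 1.2 can be tested by direct counting for a concrete grading. The sketch below uses the $\mathbb{Z}^2$-grading considered later in this section, $\deg(x_i) = (1, 0)$ and $\deg(y_j) = (d_j, 1)$ with $d_j \geq 0$, and counts the pairs $(\alpha, \beta)$ with $l(\alpha) = \gamma + l'(\beta)$ for a given $\gamma = (a, b)$. The helper names are hypothetical and the code is only meant as an illustration of why the count is finite for this grading.

```python
from math import comb


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def count_solutions(gamma, m, d):
    """Count (alpha, beta) >= 0 with l(alpha) = gamma + l'(beta).

    Grading: deg x_i = (1, 0) for i = 1..m and deg y_j = (d[j], 1).
    The second coordinate forces |beta| = -b, the first one |alpha| = a + sum d_j beta_j.
    """
    a, b = gamma
    if b > 0:
        return 0
    total = 0
    for beta in compositions(-b, len(d)):
        size_alpha = a + sum(dj * bj for dj, bj in zip(d, beta))
        if size_alpha >= 0:
            # number of alpha in N^m with |alpha| = size_alpha
            total += comb(size_alpha + m - 1, m - 1)
    return total


if __name__ == "__main__":
    print(count_solutions((0, -1), m=2, d=(1, 2)))
```

Since the second coordinate fixes $|\beta|$ and the first then fixes $|\alpha|$, the count is finite for every $\gamma$, so by the lemma this grading is $P$-sharp and $Q$-sharp.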
We will assume from now on that the Γ-grading of $S$ is $P$-sharp (equivalently $Q$-sharp). Set $\sigma = \deg(x_1 \cdots x_m y_1 \cdots y_n)$, and if $N$ is a Γ-graded module, then let $N^\vee = \mathrm{Hom}_S(N, S(-\sigma))$ and $N^* = {}^*\mathrm{Hom}_K(N, K)$, where the Γ-grading of $N^*$ is given by $(N^*)_\gamma = \mathrm{Hom}_K(N_{-\gamma}, K)$. More generally, we always denote the graded $K$-dual of a graded module $N$ (over whatever graded $K$-algebra) by $N^*$. Finally we denote by $\varphi_{\alpha\beta}$ the map $S(-a) \to S(-b)$ induced by multiplication by $x^\alpha y^\beta$, where $a = \deg x^\alpha$ and $b = -\deg y^\beta$.

Lemma 1.3. $H^m_P(\varphi_{\alpha\beta})_\gamma \cong (H^n_Q(\varphi^\vee_{\alpha\beta})^*)_\gamma$.

Proof. The free $K$-module $H^m_P(S)_\gamma$ is generated by the elements $x^{-s-1} y^p$ with $s, p \geq 0$ and $-l(s) - l(1) + l'(p) = \gamma$, and $H^n_Q(S)_{\gamma'}$ is generated by the elements $x^t y^{-q-1}$ with $t, q \geq 0$ and $l(t) - l'(q) - l'(1) = \gamma'$. Let
$$d_\gamma : H^m_P(S)_\gamma \to (H^n_Q(S^\vee)^*)_\gamma = (H^n_Q(S)_{-\gamma-\sigma})^*$$
be the $K$-linear map defined by
$$d_\gamma(x^{-s-1} y^p)(x^t y^{-q-1}) = \begin{cases} 1, & \text{if } s = t \text{ and } p = q, \\ 0, & \text{else.} \end{cases}$$
Then $d_\gamma$ is an isomorphism (because the Γ-grading of $S$ is $Q$-sharp) and there is a commutative diagram
$$\begin{array}{ccc} H^m_P(S)_{\gamma-a} & \xrightarrow{\ H^m_P(\varphi_{\alpha\beta})_\gamma\ } & H^m_P(S)_{\gamma-b} \\ \downarrow{\scriptstyle d_{\gamma-a}} & & \downarrow{\scriptstyle d_{\gamma-b}} \\ (H^n_Q(S)_{-\gamma+a-\sigma})^* & \xrightarrow{\ (H^n_Q(\varphi^\vee_{\alpha\beta})^*)_\gamma\ } & (H^n_Q(S)_{-\gamma+b-\sigma})^*. \end{array}$$
The assertion follows. □

As an immediate consequence we obtain

Corollary 1.4. (a) Let $f \in S$ be a homogeneous element of degree $a - b$, and $\varphi : S(-a) \to S(-b)$ the graded degree zero map induced by multiplication with $f$. Then $H^m_P(\varphi) \simeq H^n_Q(\varphi^\vee)^*$.
(b) Let $F$ be a Γ-graded complex of finitely generated free $S$-modules. Then
(i) $H^i_P(F) = 0$ for $i \neq m$ and $H^j_Q(F) = 0$ for $j \neq n$,
(ii) $H^m_P(F) \simeq H^n_Q(F^\vee)^*$.

As the main result of this section we have

Theorem 1.5. Assume that $K$ is Noetherian, the Γ-grading of $S$ is $P$-sharp (equivalently $Q$-sharp) and $M$ is a finitely generated Γ-graded $S$-module. Set $\omega_{S/K} := S(-\sigma)$. Let $F$ be a minimal Γ-graded $S$-resolution of $M$. Then,
(a) For all $i$, there is a functorial isomorphism $H^i_P(M) \simeq H_{m-i}(H^m_P(F))$.
(b) There is a convergent Γ-graded spectral sequence,
\[ H^i_Q(\mathrm{Ext}^j_S(M,\omega_{S/K})) \Rightarrow H^{i+j-n}(H^m_P(F)^*). \]
In particular, if K is a field, there is a convergent Γ-graded spectral sequence,
\[ H^i_Q(\mathrm{Ext}^j_S(M,\omega_S)) \Rightarrow H^{\dim S-(i+j)}_P(M)^*. \]

Proof. Claim (a) is an immediate consequence of Corollary 1.4 via the Γ-graded spectral sequence $H_{p-i}(H^p_P(F)) \Rightarrow H^i_P(M)$. For (b), the two spectral sequences arising from the double complex $C_Q F^\vee$ have as second terms respectively ${}'E_2^{ij} = H^i_Q(\mathrm{Ext}^j_S(M,\omega_{S/K}))$, ${}''E_2^{ij} = 0$ for $i \ne n$ and ${}''E_2^{nj} = H^j(H^n_Q(F^\vee)) \simeq H^j(H^m_P(F)^*)$. If further K is a field, $H^j(H^m_P(F)^*) \simeq (H_j(H^m_P(F)))^* \simeq H^{m-j}_P(M)^*$.

Corollary 1.6. Under the hypotheses of the theorem, if K is a field, then for any $\gamma \in \Gamma$, there are convergent spectral sequences of finite dimensional K-vector spaces
\[ H^i_Q(\mathrm{Ext}^j_S(M,\omega_S))_\gamma \Rightarrow (H^{\dim S-(i+j)}_P(M)_{-\gamma})^*, \]
\[ H^i_P(\mathrm{Ext}^j_S(M,\omega_S))_\gamma \Rightarrow (H^{\dim S-(i+j)}_Q(M)_{-\gamma})^*. \]

We now consider the special case that $\Gamma = \mathbb{Z}^2$, $S := K[x_1,\dots,x_m,y_1,\dots,y_n]$ with $\deg(x_i) = (1,0)$ and $\deg(y_j) = (d_j,1)$ with $d_j \ge 0$. Set $T := K[x_1,\dots,x_m]$ and let M be a Γ-graded S-module. We view M as a Z-graded module by defining $M_k = \bigoplus_j M_{(j,k)}$. Observe that each $M_k$ itself is a graded T-module with $(M_k)_j = M_{(j,k)}$ for all j. We also note that $H^i_P(M)_k \cong H^i_{P_0}(M_k)$, as can be seen from the definition of local cohomology using the Čech complex. Here $P_0 = (x_1,\dots,x_m)$ is the graded maximal ideal of T.

Corollary 1.7. With the notation introduced, let $s := \dim S = m + n$ and $d := \dim M$.
(a) $H^0_P(\mathrm{Ext}^{s-d}_S(M,\omega_S))_k \cong (H^d_Q(M)_{-k})^*$ for any k,
(b) there is an exact sequence
\[ 0 \to H^1_P(\mathrm{Ext}^{s-d}_S(M,\omega_S)) \to H^{d-1}_Q(M)^* \to H^0_P(\mathrm{Ext}^{s-d+1}_S(M,\omega_S)). \]
(c) Let $i \ge 2$. If $\mathrm{Ext}^j_S(M,\omega_S)$ is annihilated by a power of P for all $s-d < j < s-d+i$, then there is an exact sequence
\[ \mathrm{Ext}^{s-d+i-1}_S(M,\omega_S) \to H^i_P(\mathrm{Ext}^{s-d}_S(M,\omega_S)) \to H^{d-i}_Q(M)^* \to H^0_P(\mathrm{Ext}^{s-d+i}_S(M,\omega_S)). \]
In particular, if $\mathrm{Ext}^j_S(M,\omega_S)$ has finite length for all $s-d < j \le s-d+i_0$, for some integer $i_0$, then
\[ H^i_{P_0}(\mathrm{Ext}^{s-d}_S(M,\omega_S)_k) \cong (H^{d-i}_Q(M)_{-k})^* \]
for all $i \le i_0$ and $k \gg 0$.
Consequently, if M is a generalized Cohen-Macaulay module (i.e. Exts−iS (M,ωS) has finite length for all i 6= d), and if we set N = Exts−dS (M,ωS), then H iP0(Nk) ∼= (Hd−iQ (M)−k) ∗ for all i and all k ≫ 0. Proof. (a), (b) and (c) are direct consequences of Corollary 1.6. For the application, notice that if γ = (ℓ, k) ∈ Γ with k ≫ 0 one has ExtjS(M,ωS)γ = 0 for all s− d < j ≤ s− d+ i0. Therefore, for such γ, the desired conclusion follows. � A typical example to which this situation applies is the Rees algebra of a graded ideal I in the standard graded polynomial ring T = K[x1, . . . , xm]. Say, I is generated be the homogeneous polynomials f1, . . . , fn with deg fj = dj for j = 1, . . . , n. Then the Rees algebra R(I) ⊂ T [t] is generated the elements fjt. If we set deg fjt = (dj , 1) for all j and deg xi = (1, 0) for all i, then R(I) becomes a Γ-graded S-module via the K-algebra homomorphism S → R(I) with xi 7→ xi and yj 7→ fjt. According to this definition we have R(I)k = Ik for all k. Since dimR(I) = m+1, the module ωR(I) = Extn−1S (R(I), ωS) is the canonical module of R(I) (in the sense of [HK, 5. Vortrag]). Recall that if a ring R is a finite S-module of dimension m + 1, the natural finite map R→Hom(ωR, ωR) ∼= Extn−1S (ωR, ωS) is an isomorphism if and only if R is S2. Thus in combination with Corollary 1.7 we obtain Corollary 1.8. Let R := R(I). Suppose that Rp is Cohen-Macaulay for all p 6= (m, R+) where m = (x1, . . . , xm) and R+ = k>0 I ktk. Then H im(I k) ∼= (Hm+1−iR+ (ωR)−k) ∗ for all i and all k ≫ 0. Proof. Since ωR localizes, the conditions imply that (ωR)p is Cohen-Macaulay for all p 6= (m, R+). Hence the natural into map R→R′ := Extn−1S (ωR, ωS) has a cokernel of finite length. In particular, R′k = Rk = I k for k ≫ 0. Thus Corollary 1.7 applied to M = ωR gives the desired conclusion. � Remark 1.9. Let R := R(I). 
If the cokernel of R→Hom(ωR, ωR) is annihilated by a power of R+ (in other words, the blow-up is S2, as a projective scheme over Spec(T )), then R′k = I k for k ≫ 0 and therefore one has an exact sequence 0→H0m(T/Ik)→(HmR+(ωR)−k) ∗→H0m(ExtnS(ωR, ωS)k)→H1m(T/Ik)→(Hm−1R+ (ωR)−k) for such a k. 2. A method of constructing examples Suppose that R = i,j≥0Rij is a standard bigraded algebra over a ring K = R00. Define Ri = j≥0Rij and Rj = i≥0Rij . Define ideals P = i and Q = j>0Rj in R. Suppose that M = ij∈ZMij is a finitely generated, bigraded R-module. Define M i = j∈ZMij and Mj = i∈ZMij. M i is a graded R0-module and Mj is a graded R0-module. Let Q0 = R01R 0, so that Q = Q0R. Let P0 = R10R0 so that P = R10R. We have K module isomorphisms H lQ(M)m,n ∼= H lQ0(M for m,n ∈ Z. Let M̃m be the sheafification of the graded R0-module Mm on Proj(R0). We have K module isomorphisms H lQ0(M m)n ∼= H l−1(Proj(R0), M̃m(n)) for l ≥ 2 and exact sequences 0 → H0Q0(M m)n → (Rm)n = Rm,n → H0(Proj(R0), M̃m(n)) → H1Q0(M m)n → 0. We have similar formulas for the calculation of H lP (M). Now assume that X is a projective scheme over K and F1 and F2 are very ample line bundles on X. Let Rm,n = Γ(X,F⊗m1 ⊗F We require that R = m,n≥0Rm,n be a standard bigraded K-algebra. We have X ∼= Proj(R0) ∼= Proj(R0). The sheafification of the graded R0-module Rm on X is R̃m = F⊗m1 , and the sheafification of the graded R0-module Rn on X is R̃n ∼= F⊗n2 (Exercise II.5.9 [Ha]). For l ≥ 2 we have bigraded isomorphisms H lQ(R) H lQ0(R m)n ∼= m≥0,n∈Z H l−1(X,F⊗m1 ⊗F Viewing R as a graded R0 algebra, we thus have graded isomorphisms (2) H lQ(R)n H l−1(X,F⊗m1 ⊗F for l ≥ 2 and n ∈ Z. Let d = dim(R) = dim(X) + 2. We now further assume that K is an algebraically closed field and X is a nonsingular K variety. Let V = P(F1 ⊕F2), a projective space bundle over X with projection π : V → X. Since F1 ⊕ F2 is an ample bundle on X, OV (1) is ample on V . 
Since Γ(V,OV (t)) Γ(V,OV (t)) ∼= Γ(X,St(F1 ⊕F2)) ∼= i+j=t and R is generated in degree 1 with respect to this grading, OV (1) is very ample on V and R is the homogeneous coordinate ring of the nonsingular projective variety V , so that R is generalized Cohen Macaulay (all local cohomology modules H iR+(R) of R with respect to the maximal bigraded ideal R+ of R have finite length for i < d). We further have that V is projectively normal by this embedding (Exercise II.5.14 [Ha]) so that R is normal. 3. Strange behavior of local cohomology In [CH], we constructed the following example of failure of tameness of local cohomology. In the example, R0 has dimension 3, which is the lowest possible for failure of tameness [Br]. Theorem 3.1. Suppose that K is an algebraically closed field. Then there exists a normal standard graded K-algebra R0 with dim(R0) = 3, and a normal standard graded R0-algebra R with dim(R) = 4 such that for j ≫ 0, dimK(H Q(ωR)−j) = 2 if j is even, 0 if j is odd, where ωR is the canonical module of R, Q = n>0Rn. We first show that the above theorem is also true for the local cohomology of R. Theorem 3.2. Suppose that K is an algebraically closed field. Then there exists a normal standard graded K-algebra R0 with dim(R0) = 3, and a normal standard graded R0-algebra R with dim(R) = 4 such that for j > 0, dimK(H Q(R)−j) = 2 if j is even, 0 if j is odd, where Q = n>0Rn. Proof. We compute this directly for the R of Theorem 3.1 from (2) and the calculations of [CH]. Translating from the notation of this paper to the notation of [CH], we have X = S is an Abelian surface, F1 = OS(r2laH) and F2 = OS(r2(D + alH)). By (2) of this paper, for n ∈ N, we have dimK(H Q(R)n) = m≥0 h 1(X,F⊗m1 ⊗F m≥0 h 1(S,OS((m+ n)r2alH + nr2D)). Formula (1) of [CH] tells us that for m,n ∈ Z, (3) h1(S,OS(mH + nD)) = 2 if m = 0 and n is even, 0 otherwise. Thus for n < 0, we have dimK(H Q(R)n) = 2 if n is even, 0 if n is odd, giving the conclusions of the theorem. 
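□

The following example shows non-periodic failure of tameness.

Theorem 3.3. Suppose that p is a prime number such that $p \equiv 2 \pmod 3$ and $p \ge 11$. Then there exists a normal standard graded K-algebra $R_0$ over a field K of characteristic p with $\dim(R_0) = 4$, and a normal standard graded $R_0$-algebra R with $\dim(R) = 5$ such that for $j > 0$,
\[ \dim_K(H^2_Q(R)_{-j}) = \begin{cases} 1 & \text{if } j \equiv 0 \pmod{p+1}, \\ 1 & \text{if } j = p^t \text{ for some odd } t \ge 0, \\ 0 & \text{otherwise,} \end{cases} \]
where $Q = \bigoplus_{n>0} R_n$. We have $p^t \equiv -1 \pmod{p+1}$ for all odd $t \ge 0$.

To establish this, we need the following simple lemma.

Lemma 3.4. Suppose that C is a nonsingular curve of genus g over an algebraically closed field K, and $\mathcal{M}$, $\mathcal{N}$ are line bundles on C. If $\deg(\mathcal{M}) \ge 2(2g+1)$ and $\deg(\mathcal{N}) \ge 2(2g+1)$, then the natural map $\Gamma(C,\mathcal{M}) \otimes \Gamma(C,\mathcal{N}) \to \Gamma(C,\mathcal{M}\otimes\mathcal{N})$ is a surjection.

Proof. If $\mathcal{L}$ is a line bundle on C, then $H^1(C,\mathcal{L}) = 0$ if $\deg(\mathcal{L}) > 2g-2$ and $\mathcal{L}$ is very ample if $\deg(\mathcal{L}) \ge 2g+1$ (Chapter IV, Section 3 [Ha]). Suppose that $\mathcal{L}$ is very ample and $\mathcal{G}$ is another line bundle on C. If $\deg(\mathcal{G}) > 2g-2-\deg(\mathcal{L})$, then $\mathcal{G}$ is 2-regular for $\mathcal{L}$ (Lecture 14, [M1]). Thus if $\deg(\mathcal{G}) > 2g-2+\deg(\mathcal{L})$, then $\Gamma(C,\mathcal{G}) \otimes \Gamma(C,\mathcal{L}) \to \Gamma(C,\mathcal{G}\otimes\mathcal{L})$ is a surjection by Castelnuovo's Proposition, Lecture 14, page 99 [M1].

We now apply the above to prove the lemma. Write $\mathcal{M} \cong \mathcal{A}^{\otimes q} \otimes \mathcal{B}$ where $\mathcal{A}$ is a line bundle such that $\deg(\mathcal{A}) = 2g+1$, and $2g+1 \le \deg(\mathcal{B}) < 2(2g+1)$. We have $\deg(\mathcal{N}) > 2g-2+\deg(\mathcal{A})$. Thus there exists a surjection $\Gamma(C,\mathcal{N}) \otimes \Gamma(C,\mathcal{A}) \to \Gamma(C,\mathcal{A}\otimes\mathcal{N})$. We iterate to get surjections $\Gamma(C,\mathcal{A}^{\otimes i}\otimes\mathcal{N}) \otimes \Gamma(C,\mathcal{A}) \to \Gamma(C,\mathcal{A}^{\otimes(i+1)}\otimes\mathcal{N})$ for $i \le q$, and a surjection $\Gamma(C,\mathcal{A}^{\otimes q}\otimes\mathcal{N}) \otimes \Gamma(C,\mathcal{B}) \to \Gamma(C,\mathcal{M}\otimes\mathcal{N})$.

We now prove Theorem 3.3. For the construction, we start with an example from Section 6 of [CS]. There exists an algebraically closed field K of characteristic p, a curve C of genus 2 over K, a point $q \in C$ and a line bundle $\mathcal{M}$ on C of degree 0, such that for $n \ge 0$,
\[ h^1(C,\mathcal{O}_C(q)\otimes\mathcal{M}^{\otimes n}) = \begin{cases} 1 & \text{if } n = p^t \text{ for some } t \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
Further, $h^1(C,\mathcal{O}_C(2q)\otimes\mathcal{M}^{\otimes n}) = 0$ for all $n > 0$. Let $a = p+1$.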
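With a = p + 1 fixed, the arithmetic remark in Theorem 3.3 can be checked directly. The sketch below is only an illustration, and the helper name `nonzero_degrees` is hypothetical. It lists the degrees j at which the theorem predicts a nonzero dimension, confirms that every odd power of p is congruent to −1 modulo p + 1 (so the two nonzero cases never overlap), and shows that the gaps between consecutive odd powers grow, which is why the pattern is not periodic.

```python
def nonzero_degrees(p, jmax):
    """Return (multiples of p+1, odd powers of p) among 1..jmax.

    These are the two families of degrees j for which Theorem 3.3 states
    that the dimension equals 1; the helper itself is just an illustration.
    """
    a = p + 1
    multiples = {j for j in range(1, jmax + 1) if j % a == 0}
    odd_powers = set()
    t = 1
    while p ** t <= jmax:
        odd_powers.add(p ** t)
        t += 2  # only odd exponents t = 1, 3, 5, ...
    return multiples, odd_powers


for p in (11, 17, 23, 29):  # primes with p = 2 (mod 3) and p >= 11
    # p = -1 (mod p+1), hence p^t = (-1)^t = -1 (mod p+1) for odd t;
    # the residue -1 is represented by p itself.
    assert all(pow(p, t, p + 1) == p for t in range(1, 12, 2))
    multiples, odd_powers = nonzero_degrees(p, jmax=p ** 5)
    assert multiples.isdisjoint(odd_powers)
    print(p, sorted(odd_powers))
```

For p = 11 the odd powers up to 11^5 are 11, 1331 and 161051, each of which leaves remainder 11 on division by 12.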
Let E be an elliptic curve over K, and let T = E × E, with projections πi : T → E. Let b ∈ E be a point and let A = π∗1(OE(b)) ⊗ π∗2(OE(b)). Let X = T × C, with projections ϕ1 : X → T , ϕ2 : X → C. Let L = OC(q). Let F1 = ϕ∗1(A)⊗a ⊗ ϕ∗2(L)⊗a, F2 = ϕ∗1(A)⊗(1+a) ⊗ ϕ∗2(L⊗(1+a) ⊗M−1). For m,n ≥ 0, we have Γ(X,F⊗m1 ⊗F 2 ) = Γ(T,A⊗(ma+n(1+a)))⊗ Γ(C,L⊗(ma+n(1+a)) ⊗M−⊗n) = Γ(T,A⊗a)⊗m ⊗ Γ(T,A⊗(1+a))⊗n ⊗ Γ(C,La)⊗m ⊗ Γ(C,L⊗(1+a) ⊗M−1)⊗n = Γ(X,F1)⊗m ⊗ Γ(X,F2)⊗n by the Künneth formula (IV of Lecture 11 [M1]) and Lemma 3.4. Let Rm,n = Γ(X,F⊗m1 ⊗F 2 ). R = m,n≥0Rm,n is a standard bigraded K-algebra by (4). Thus (2) holds. By the Riemann Roch Theorem, we compute, (5) h0(C,L⊗r ⊗M−⊗s) = h1(C,L⊗r ⊗M−⊗s) + r − 1, and for s < 0, (6) h1(C,L⊗r ⊗M−⊗s) =   1− r r < 0, 1 r = 0, s < 0, 1 r = 1, s = −pt, for some t ∈ N, 0 r = 1, s 6= −pt for some t ∈ N, 0 r = 2, s < 0, 0 r ≥ 3. We further have (7) h1(T,A⊗r) = 0 r 6= 0, 2 r = 0, (8) h0(T,A⊗r) = 0 r < 0, 1 r = 0, r2 r > 0. By (2), for n ∈ Z, we have dimK(H Q(R)n) = h1(X,F⊗m1 ⊗F By the Künneth formula, H1(X,F⊗m1 ⊗F ∼= H0(T,A⊗(ma+n(1+a)))⊗H1(C,L⊗(ma+n(1+a)) ⊗M−⊗n) ⊕H1(T,A⊗(ma+n(1+a)))⊗H0(C,L⊗(ma+n(1+a)) ⊗M−⊗n). Thus by (5) - (8), we have for j > 0, dimK(H Q(R)−j) = 1 j ≡ 0 (mod) a, 1 j = pt for some odd t ∈ N, 0 otherwise. and we have the conclusions of Theorem 3.3. Theorem 3.5 gives an example of failure of tameness of local cohomology with larger growth. Theorem 3.5. Suppose that K is an algebraically closed field. Then there exists a normal standard graded K-algebra R0 over K with dim(R0) = 4, and a normal standard graded R0-algebra R with dim(R) = 5 such that for j > 0, dimK(H Q(R)−j) = 6j if j is even, 0 if j is odd, where Q = n>0Rn. Proof. Let E be an elliptic curve over K, and let q ∈ E be a point. Let L = OE(3q). By Proposition IV.4.6 [Ha], L is very ample on E, and (9) ⊕n≥0Γ(E,L⊗n) is generated in degree 1 as a K-algebra. For n ∈ N, (10) h0(C,L⊗n) = 0 n < 0, 1 n = 0, 3n n > 0. (11) h1(C,L⊗n) = −3n n < 0, 1 n = 0, 0 n > 0. 
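Formulas (10) and (11) can be sanity-checked, and then combined through the Künneth formula on E × E × E as in the construction that follows. This is a sketch under stated assumptions rather than the authors' computation. The function names are illustrative. The sum being evaluated is read as the sum over m ≥ 0 of h^2(X, F1^{⊗m} ⊗ F2^{⊗(−j)}), with F1 and F2 the bundles defined immediately below, so that the three factors carry L to the powers 2m − j, 2m − j and 2m − 2j. Since the exponents in the source are garbled, that reading is an assumption.

```python
def h0(n):
    """Formula (10): h^0(E, L^n) for L = O_E(3q) on an elliptic curve E."""
    return 3 * n if n > 0 else (1 if n == 0 else 0)


def h1(n):
    """Formula (11): h^1(E, L^n)."""
    return -3 * n if n < 0 else (1 if n == 0 else 0)


def h2_product(a, b, c):
    """Kunneth formula for h^2 of L^a x L^b x L^c on E x E x E.

    Degree 2 is reached by taking H^1 on two factors and H^0 on the third.
    """
    return (h1(a) * h1(b) * h0(c)
            + h1(a) * h0(b) * h1(c)
            + h0(a) * h1(b) * h1(c))


def total_h2(j):
    """Sum over m >= 0 of h^2(F1^m (x) F2^(-j)) for j > 0 (assumed reading).

    Once 2m > j the first two exponents are positive, h1 vanishes there and
    every summand is zero, so stopping at m = j loses nothing.
    """
    return sum(h2_product(2 * m - j, 2 * m - j, 2 * m - 2 * j)
               for m in range(0, j + 1))


# Riemann-Roch on a genus 1 curve: h^0 - h^1 = deg(L^n) = 3n.
assert all(h0(n) - h1(n) == 3 * n for n in range(-20, 21))
# Serre duality with trivial canonical bundle: h^1(L^n) = h^0(L^(-n)).
assert all(h1(n) == h0(-n) for n in range(-20, 21))

print([total_h2(j) for j in range(1, 7)])
```

Under this reading the only contribution comes from m = j/2, which exists only for even j, and the totals agree with the values 6j (j even) and 0 (j odd) stated in Theorem 3.5.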
Let X = E3, with the three canonical projections πi : X → E. Define F1 = π∗1(L⊗2)⊗ π∗2(L⊗2)⊗ π∗3(L⊗2) F2 = π∗1(L)⊗ π∗2(L)⊗ π∗3(L⊗2). Rm,n = Γ(X,F⊗m1 ⊗F m,n≥0 Rm,n. By (9) and the Künneth formula, R is standard bigraded. By (2), the fact that ωX ∼= OX and Serre duality, dimK(H Q(R)−j) = h2(X,F⊗m1 ⊗F 2 ) = h1(X,F⊗m1 ⊗F for j ∈ Z. Now by (10), (11) and the Künneth formula, we have that for n > 0, h1(X,F⊗m1 ⊗F 2 ) = 0 if 2m+ n 6= 0, 2h0(X,L⊗n) if 2m+ n = 0. Thus the conclusions of Theorem 3.5 hold. � The following theorem gives an example of tame, but still rather strange local cohomol- ogy. Let [x] be the greatest integer in a real number x. Theorem 3.6. Suppose that K is an algebraically closed field. Then there exists a normal standard graded K-algebra R0 with dim(R0) = 3, and a normal standard graded R0-algebra R with dim(R) = 4 such that for j > 0, dimK(H Q(R)−j) = 162 dimK(H Q(R)−j) where Q = n>0Rn. Proof. We use the method of Example 1.6 [Cu]. Let E be an elliptic curve over an algebraically closed field K, and let p ∈ E be a point. Let X = E × E with projections πi : X → E. Let C1 = π∗1(p), C2 = π∗2(p) and ∆ = {(q, q) | q ∈ E} be the diagonal of X. We compute (as in [Cu]) that (12) (C21 ) = (C 2 ) = (∆ 2) = 0 (13) (∆ · C1) = (∆ · C2) = (C1 · C2) = 1. If N is an ample line bundle on X, then (14) H i(X,N ) = 0 for i > 0 by the vanishing theorem of Section 16 [M2]. Suppose that L is a very ample line bundle on X, and M is a numerically effective (nef) line bundle. Then M is 3 regular for L, so that Γ(X,M⊗L⊗n)⊗ Γ(X,L) → Γ(X,M⊗L⊗(n+1)) is a surjection if n ≥ 3. C1 + 2C2 is an ample divisor by the Moishezon Nakai criterion (Theorem V.1.10 [Ha]), so that 3(C1+2C2) is very ample by Lefschetz’s theorem (Theorem, Section 17 [M2]). Let F1 = OX(9(C1 + 2C2)). Then OX is 3 regular for OX(3(C1 + 2C2)), so we have surjections Γ(X,F⊗n1 )⊗ Γ(X,F1) → Γ(X,F ⊗(n+1) for all n ≥ 1. ∆+C2 is ample by the Moishezon Nakai criterion. Let D = 3(∆+C2). 
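D is very ample by Lefschetz's theorem, and thus $\mathcal{O}_X(D)\otimes\mathcal{F}_1$ is very ample. Let $\mathcal{F}_2 = \mathcal{O}_X(3D)\otimes\mathcal{F}_1^{\otimes 3}$. $\mathcal{O}_X$ is 3-regular for $\mathcal{O}_X(D)\otimes\mathcal{F}_1$, so we have surjections
\[ \Gamma(X,\mathcal{F}_2^{\otimes n}) \otimes \Gamma(X,\mathcal{F}_2) \to \Gamma(X,\mathcal{F}_2^{\otimes(n+1)}) \]
for all $n \ge 1$. For $n > 0$ and $m \ge 0$, we have
\[ \mathcal{F}_1^{\otimes m}\otimes\mathcal{F}_2^{\otimes n} \cong \mathcal{O}_X(3nD)\otimes\mathcal{F}_1^{\otimes(m+3n)}. \]
Since D is nef, it is 3-regular for $\mathcal{F}_1$, and we have a surjection for all $m \ge 0$, $n > 0$,
\[ \Gamma(X,\mathcal{F}_1^{\otimes m}\otimes\mathcal{F}_2^{\otimes n}) \otimes \Gamma(X,\mathcal{F}_1) \to \Gamma(X,\mathcal{F}_1^{\otimes(m+1)}\otimes\mathcal{F}_2^{\otimes n}). \]
Let $R_{m,n} = \Gamma(X,\mathcal{F}_1^{\otimes m}\otimes\mathcal{F}_2^{\otimes n})$. We have shown that $\bigoplus_{m,n\ge 0} R_{m,n}$ is a standard bigraded K-algebra. Thus (2) holds.

For $m, n \in \mathbb{Z}$, let $\mathcal{G} = \mathcal{F}_1^{\otimes m}\otimes\mathcal{F}_2^{\otimes n}$. As in Example 1.6 [Cu], and by (14) and Serre duality ($\omega_X \cong \mathcal{O}_X$ since X is an Abelian variety), we deduce that
1. $(\mathcal{G}^2) > 0$ and $(\mathcal{G}\cdot\mathcal{F}_1) > 0$ imply $\mathcal{G}$ is ample and $h^1(X,\mathcal{G}) = h^2(X,\mathcal{G}) = 0$.
2. $(\mathcal{G}^2) < 0$ implies $h^0(X,\mathcal{G}) = h^2(X,\mathcal{G}) = 0$.
3. $(\mathcal{G}^2) > 0$ and $(\mathcal{G}\cdot\mathcal{F}_1) < 0$ imply $\mathcal{G}^{-1}$ is ample and $h^0(X,\mathcal{G}) = h^1(X,\mathcal{G}) = 0$.

Let $\tau_2 = -4 - \frac{\sqrt{2}}{2}$ and $\tau_1 = -4 + \frac{\sqrt{2}}{2}$. Using (12) and (13), we compute
\[ (\mathcal{F}_1^2) = 2\cdot 162, \quad (\mathcal{F}_2^2) = 31\cdot 162, \quad (\mathcal{F}_1\cdot\mathcal{F}_2) = 8\cdot 162. \]
We have
\[ (\mathcal{G}^2) = 324\left(m^2 + 8mn + \tfrac{31}{2}n^2\right) = 324(m-\tau_1 n)(m-\tau_2 n), \qquad (\mathcal{G}\cdot\mathcal{F}_1) = 324(m+4n). \]
Since $\tau_2 < -4 < \tau_1 < 0$, for $n < 0$ and $m \in \mathbb{Z}$, we have
1. $m > \tau_2 n$ if and only if $(\mathcal{G}^2) > 0$ and $(\mathcal{G}\cdot\mathcal{F}_1) > 0$,
2. $\tau_1 n < m < \tau_2 n$ if and only if $(\mathcal{G}^2) < 0$,
3. $m < \tau_1 n$ if and only if $(\mathcal{G}^2) > 0$ and $(\mathcal{G}\cdot\mathcal{F}_1) < 0$.

By the Riemann-Roch Theorem for an Abelian surface (Section 16 [M2]), $\chi(\mathcal{G}) = \frac{1}{2}(\mathcal{G}^2)$. Thus for $m \in \mathbb{Z}$ and $n < 0$,
\[ h^1(X,\mathcal{G}) = \begin{cases} -\frac{1}{2}(\mathcal{G}^2) = -162\left(m^2 + 8mn + \tfrac{31}{2}n^2\right) & \text{if } \tau_1 n < m < \tau_2 n, \\ 0 & \text{otherwise.} \end{cases} \]
For $n \in \mathbb{Z}$, let $\sigma(n) = \dim_K(H^2_Q(R)_n)$. By (2),
\[ \sigma(n) = \sum_{m\ge 0} h^1(X,\mathcal{F}_1^{\otimes m}\otimes\mathcal{F}_2^{\otimes n}). \]
For $n < 0$, we have
\[ \sigma(n) = -162 \sum_{\tau_1 n < m < \tau_2 n} \left(m^2 + 8mn + \tfrac{31}{2}n^2\right) \]
[...] $l > 0$ such that $\Gamma(X,\mathcal{F}_1^{\otimes l}\otimes\mathcal{F}_2^{-1}) \ne 0$. Thus we have an embedding $\mathcal{F}_2\otimes\mathcal{F}_1^{-l} \subset \mathcal{O}_X$. Let $\mathcal{A} = \mathcal{F}_2\otimes\mathcal{F}_1^{-l}$, which we have embedded as an ideal sheaf of X. For $j \ge 0$ and $i \ge jl$, let
\[ T_{ij} = \Gamma(X,\mathcal{F}_1^{\otimes i}\otimes\mathcal{A}^{\otimes j}) = R_{i-jl,\,j}. \]
For $j \ge 0$, let $T_j = \bigoplus_{i\ge jl} T_{ij}$ and $T = \bigoplus_{j\ge 0} T_j$. Let $B = \bigoplus_{j>0} T_j$. $R \cong T$ as graded rings over $R_0 \cong T_0$, although they have different bigraded structures. Thus for all i, j we have
\[ (15)\quad H^i_B(T)_j \cong H^i_Q(R)_j. \]
$T_1$ is a homogeneous ideal of $T_0$, and T is the Rees algebra of $T_1$.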
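The intersection numbers used in the proof of Theorem 3.6 follow mechanically from (12) and (13), and can be recomputed as a check. The sketch below is illustrative and its names are assumptions. Divisor classes are written as integer vectors in the basis (C1, C2, Δ), with F1 = 9(C1 + 2C2), D = 3(Δ + C2) and F2 = 3D + 3F1 as in the text. The constant term of the quadratic form is garbled in the source; it is taken here to be 31/2, the value forced by the three intersection numbers. The function `sigma` then sums h^1 = −(G^2)/2 over the m with (G^2) < 0.

```python
# Gram matrix of the intersection form in the basis (C1, C2, Delta),
# taken from (12) and (13): self-intersections 0, mixed products 1.
GRAM = [[0, 1, 1],
        [1, 0, 1],
        [1, 1, 0]]


def dot(u, v):
    """Intersection number of two divisor classes given as coefficient vectors."""
    return sum(u[i] * GRAM[i][j] * v[j] for i in range(3) for j in range(3))


F1 = (9, 18, 0)                                  # 9(C1 + 2 C2)
D = (0, 3, 3)                                    # 3(Delta + C2)
F2 = tuple(3 * d + 3 * f for d, f in zip(D, F1))  # O_X(3D) tensor F1^3


def g_squared(m, n):
    """(G^2) for G = F1^m tensor F2^n, expanded bilinearly."""
    return m * m * dot(F1, F1) + 2 * m * n * dot(F1, F2) + n * n * dot(F2, F2)


def sigma(n):
    """Sum over m >= 0 of h^1(X, F1^m tensor F2^n) for n < 0.

    h^1 = -(G^2)/2 when (G^2) < 0 and 0 otherwise; (G^2) < 0 forces
    m < (4 + sqrt(2)/2)|n| < 5|n|, so the range below covers every term.
    """
    return sum(-g_squared(m, n) // 2
               for m in range(0, 5 * abs(n) + 1)
               if g_squared(m, n) < 0)


print(dot(F1, F1), dot(F2, F2), dot(F1, F2))   # 2*162, 31*162, 8*162
print([sigma(-j) for j in range(1, 6)])
```

The three printed intersection numbers are 324, 5022 and 1296, which gives (G^2) = 324(m^2 + 8mn + (31/2)n^2). Substituting m = 4j + k with n = −j turns each summand into 162(j^2/2 − k^2), so sigma(−j) grows cubically in j.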
Thus all of the exam- ples of Section 3 can be interpreted as Rees algebras over normal rings T0 with isolated singularities. We thus obtain the following theorems from Theorems 3.2 - 3.6. Theorems 4.1, 4.2 and 4.3 give examples of Rees algebras with non tame local cohomology. Theorem 4.1. Suppose that K is an algebraically closed field. Then there exists a normal, standard graded K algebra T0 with dim(T0) = 3 and a graded ideal A ⊂ T0 such that the Rees algebra T = T0[At] of A is normal, and for j > 0, dimK(H B(T )−j) = 2 if j is even, 0 if j is odd. where B is the graded ideal AtT of T . Theorem 4.2. Suppose that p is a prime number such that p ≡ 2(mod)3 and p ≥ 11. Then there exists a normal standard graded K-algebra T0 over a field K of characteristic p with dim(T0) = 4, and a graded ideal A ⊂ T0 such that the Rees algebra T = T0[At] of A is normal, and for j > 0, dimK(H Q(T )−j) = 1 if j ≡ 0(mod)(p + 1), 1 if j = pt for some odd t ≥ 0, 0 otherwise, where B is the graded ideal AtT of T . We have pt ≡ −1(mod)(p + 1) for all odd t ≥ 0. Theorem 4.3. Suppose that K is an algebraically closed field. Then there exists a normal, standard graded K-algebra T0 with dim(T0) = 4 and a graded ideal A ⊂ T0 such that the Rees algebra T = R0[At] of A is normal, and for j > 0, dimK(H B(T )−j) = 6j if j is even, 0 if j is odd, where B is the graded ideal AtT of T . Theorem 4.4. Suppose that K is an algebraically closed field. Then there exists a normal standard graded K algebra T0 with dim(T0) = 3, and a graded ideal A ⊂ T0 such that the Rees algebra T = T0[At] of A is normal, and for j > 0, dimK(H B(T )−j) = 162 dimK(H B(T )−j) where B is the graded ideal AtT of T . By localizing at the graded maximal ideal of T0, we obtain examples of Rees algebras of local rings with strange local cohomology. In all of these examples, T0 is generalized Cohen Macaulay, but is not Cohen Macaulay. This follows since in all of these examples, H2P0(R0)0 ∼= H1(X,OX ) 6= 0. 5. 
Local duality in the examples The example of [CH], giving failure of tameness of local cohomology, is stated in The- orem 3.1 of this paper. The proof of [CH] uses the bigraded local duality theorem of [HR], which now follows from the much more general bigraded local duality theorem, The- orem 1.5 and Corollary 1.7 of this paper, to conclude that in our situation, where R is generalized Cohen Macaulay, (16) (Hd−iQ (ωR)−j) ∗ ∼= H iP (R)j for j ≫ 0. In [CH], the formula H iP (R)j ∼= H iP0(Rj) i−1(X, R̃j(m)) i−1(X,F⊗m1 ⊗F for i ≥ 2 and j ≥ 0 is then used with formula (1) of [CH] ((3) of this paper) to prove Theorem 3.1. In Section 2 we derive (2) from which we directly compute the local cohomology in the examples of this paper. We make essential use of Serre duality on X in computing the examples. In this section, we show how (16) can be obtained directly from the geometry of X and V , and how this formula can be directly interpreted as Serre duality on X. Let notation be as in Section 2, so that K is an algebraically closed field, F1 and F2 are very ample line bundles on the nonsingular variety X. Let ωR be the dualizing module of R, and let ωX be the canonical bundle of X (which is a dualizing sheaf on X). For a K module W , let W ′ = HomK(W,K). Lemma 5.1. We have that (ωR)ij = Γ(X,F⊗i1 ⊗F 2 ⊗ ωX) if i ≥ 1 and j ≥ 1 0 otherwise. Set (ωR) j∈Z(ωR)i,j , a graded R 0 module. The sheafification of (ωR) i on X is (18) (̃ωR)i = F⊗i1 ⊗ ωX if i ≥ 1 0 if i ≤ 0. Set (ωR)j = i∈Z(ωR)i,j , a graded R0 module. The sheafification of (ωR)j on X is (19) (̃ωR)j = F⊗j2 ⊗ ωX if j ≥ 1 0 if j ≤ 0. Proof. Give R the grading where the elements of degree e in R are [R]e = i+j=eRij. We have realized R (with this grading) as the coordinate ring of the projective embed- ding of V = P(F1 ⊕F2) by the very ample divisor OV (1), with projection π : V → X. Let ωV be the canonical line bundle on V . We first calculate ωV . Let f be a fiber of the map π : V → X. 
By adjunction, we have that (f · ωV ) = −2. Since Pic(V ) ∼= ZOV (1) ⊕ π∗(Pic(X)), we see that there exists a line bundle G on X such that ωV ∼= OV (−2)⊗ π∗(G). The natural split exact sequence (20) 0 → F2 → F1 ⊕F2 → F1 → 0 determines a section X0 of X, such that π∗ of the exact sequence 0 → OV (1)⊗OV (−X0) → OV (1) → OV (1) ⊗OX0 → 0 is (20) (Proposition II.7.12 [Ha]). Thus OV (1) ⊗OV (−X0) ∼= π∗(F2) OV (1)⊗OX0 ∼= F1. By adjunction, we have that the canonical line bundle of X0 is ∼= ωV ⊗OV (X0)⊗OX0 . Putting the above together, we see that G ∼= F1 ⊗F2 ⊗ ωX . ωV ∼= OV (−2)⊗ π∗(F1 ⊗F2 ⊗ ωX). We realize R as a bigraded quotient of a bigraded polynomial ring S = K[x1, . . . , xm, y1, . . . , yn], with deg(xi) = (1, 0) for all i and deg(yj) = (0, 1) for all j. Viewing S as a graded K- algebra with the grading determined by d(xi) = d(yj) = 1 for all i, j, we have a projective embedding V ⊂ P = Proj(S). Since V is nonsingular, we see from Section III.7 of [Ha] ωV ∼= ExtrP(OV ,Op(−e)) where e = m+ n is the dimension of S, and r = e− dim(R). ωR is defined as ωR = *Ext S(R,S(−e)) ∼= ExtrP(OV ,OP(m− e)). For m ≫ 0, Γ(P, ExtrP(OV ,Op(m− e))) ∼= ExtrP(OV ,OP(m− e)) (by Proposition III.6.9 [Ha]). Thus ωR and Γ∗(ωV ) = Γ(V, ωV (m)) are isomorphic in high degree. Since both modules have depth≥ 2 at the maximal bigraded ideal of R, we see that ωR ∼= Γ∗(ωV ). m∈Z Γ(V, ωV (m)) m∈Z Γ(V,OV (m− 2)⊗ π∗(F1 ⊗F2 ⊗ ωX)). Since a fiber f of π satisfies (f ·OV (m− 2)⊗π∗(F1⊗F2)) < 0 if m < 2, we see that (with this grading) [ωR]m = 0 if m < 2 and For m ≥ 2, we have [ωR]m = Γ(X,S m−2(F1 ⊕F2)⊗F1 ⊗F2) i+j=m−2 Γ(X,F ⊗(i+1) ⊗(j+1) 2 ⊗ ωX). The conclusions of the lemma now follow. � Suppose that 2 ≤ i ≤ d − 2. Since F1 and F2 are ample, and d − (i + 1) > 0, there exists a natural number n0 such that (21) Hd−(i+1)(X,F⊗m1 ⊗Fn2 ⊗ ωX) = 0 for n ≥ n0 and all m ≥ 0. By (18), we have graded isomorphisms (22) H iQ(ωR)n H i−1(X,F⊗m1 ⊗F 2 ⊗ ωX) for n ∈ Z. 
By Serre duality, (23) H iQ(ωR)n (Hd−i−1(X,F−⊗m1 ⊗F By (21), there exists n0 such that (24) H iQ(ωR)−n (Hd−i−1(X,F−⊗m1 ⊗F for n ≥ n0. Now apply the functor L∗ = HomK(L,K) on graded R0-modules, with the grading (L∗)i = HomK(L−i,K) to (24), and compare with (17), to obtain (25) Hd−iP (R)n ∼= (H iQ(ωR)−n)∗ for n ≥ n0, from which (16) immediately follows. We can now verify that Theorem 3.1 is in fact true for all j > 0, using (22) and (3). We finally comment that an alternate proof of Theorem 3.2 for j ≫ 0 is obtained from Theorem 3.1, Formulas (2) and (22), the fact that X is an Abelian variety so that ωX ∼= OX , and the observation that h1(X,F−⊗n2 ) = h 1(X,F⊗n2 ) = 0 for n > 0. References [A] Aoyama, On the depth and the projective dimension of the canonical module, Japan J. Math. 6(1980), 61–69. [BrHe] Brodmann, M. and Hellus, M., Cohomological patterns of coherent sheaves over projective schemes, J. Pure and Appl. Alg. 172 (2002), 165–182. [Br] Brodmann, M., Asymptotic behaviour of cohomology: tameness, supports and associated primes, Joint International Meeting of the American Mathematical Society and the Indian Mathematical Society on Commutative Algebra and Algebraic Geometry, Bangalore/India, December 17-20, 2003, Contemporary Mathematics 390(2005), 31-61. [BS] Brodmann, M. and Sharp, R., Local cohomology, Cambridge Univ. Press, Cambridge, (1998). [BH] Bruns, W. and Herzog, J., Cohen-Macaulay rings (Revised edition), Cambridge Studies in Ad- vanced Mathematics 39, Cambridge University Press, 1998. [Cu] Cutkosky, S.D., Zariski decomposition of divisors on algebraic varieties, Duke Math. J. 53 (1986), 149 -156. [CH] Cutkosky, S.D. and Herzog, J., Failure of tameness of Local Cohomology, to appear in Journal of Pure and Applied Algebra. [CS] Cutkosky, S.D. and Srinivas, V., On a problem of Zariski on dimensions of linear systems, Annals of Math. 137 (1993), 531 - 559. 
[E] Eisenbud, D., Commutative algebra, with a view towards algebraic geometry, Springer Verlag, New York, Heidelberg, Berlin (1995). [Ha] Hartshorne, R., Algebraic Geometry, Springer Verlag, New York, Heidelberg, Berlin, 1977. [HK] Herzog, J. and Kunz, E., Der kanonische Modul eines Cohen-Macaualy Rings, Lecture Notes in Mathematics 238, Springer, 1971. [HR] Herzog, J. and Rahimi, A., Local Duality for Bigraded Modules, math.AC/0604587. [L] Lim, C.S., Tameness of graded local cohomology modules for dimension R0 = 2, the Cohen- Macaulay case, Menumi Math 26, 11 - 21 (2004). [M1] Mumford, D., Lectures on curves on an algebraic surface, Annals of Math Studies 59, Princeton Univ. Press, princeton (1966). [M2] Mumford, D., Abelian Varieties, Oxford University Press, Bombay, 1970. [RS] Rotthaus, C. and Sega, L.M., Some properties of graded local cohomology modules, J. Algebra 283, 232 - 247 (2005). Marc Chardin, Institut Mathématique de Jussieu Université Pierre et Marie Curie, Boite 247, 4, place Jussieu, F-75252 PARIS CEDEX 05 E-mail address: chardin@math.jussieu.fr Dale Cutkosky, Mathematics Department, 202 Mathematical Sciences Bldg, University of Missouri, Columbia, MO 65211 USA E-mail address: dale@math.missouri.edu Jürgen Herzog, Fachbereich Mathematik und Informatik, Universität Duisburg-Essen, Campus Essen, 45117 Essen, Germany E-mail address: juergen.herzog@uni-essen.de Hema Srinivasan, Mathematics Department, 202 Mathematical Sciences Bldg, University of Missouri, Columbia, MO 65211 USA E-mail address: srinivasanh@math.missouri.edu ABSTRACT We prove a duality theorem for certain graded algebras and show by various examples different kinds of failure of tameness of local cohomology. <|endoftext|><|startoftext|> Introduction Is it possible to define weak solutions of the Einstein equations of class piecewise-C0, i.e. 
to generalize the compatibility conditions which replace the field equations on a singular hypersurface to the case when the metric is regularly discontinuous? Reaching this goal would probably mean defining the most general class of regularly discontinuous weak solutions of the Einstein equations. It seems that this problem has never been studied before in the literature.

But, before we proceed, we need to discuss whether we are talking of something mathematically and physically consistent or not. A fundamental concept of Riemannian geometry is that at any point of a submanifold there are coordinate choices for which the metric reduces to the Minkowski flat metric. Clearly, if this choice is made on both sides of the discontinuity surface, any "jump" in the metric disappears. Thus, the metric discontinuity appears as a coordinate-dependent concept, which is neither geometrically nor physically acceptable in the context of General Relativity.

But we also have to consider that regularity of the global coordinates plays an important role in our approach, which is that of [1] and of the literature cited therein. In particular, since here the spacetime is only C0, we are led to consider (C0, piecewise C1) coordinate transformations. If the metric is discontinuous in some globally C0 chart, it is in general impossible to obtain the vanishing of the metric jump on both sides of a hypersurface with a C0 coordinate transformation (see section 2). Moreover, in the following we are led in a natural way to consider C1 coordinate transformations; the metric discontinuity is a tensor with respect to such coordinate changes! In other words the jump of the metric has a precise mathematical meaning, if we consider it in connection with global regular coordinates.
In a well-consolidated procedure, the assumption of continuity for the metric across a gravitational interface is usually taken for granted; however it follows from the limiting process of the thin sandwich model, as a consequence of the hypothesis that the external derivatives of the metric are bounded [2]. Yet in this paper we are going to see that, even removing the assumption of continuity, it is still possible to define a generalized inner geometry of the discontinuity hypersurface; one can thus consistently find a corresponding generalized set of compatibility conditions, which obviously reduces to the usual ones when the continuity hypothesis is restored.

Yet what are the physical motivations for moving to such a generalization? Gravitational shock waves and thin shells are usually defined by the presence of singular curvature with a "delta" component concentrated on a hypersurface, a situation which is well cast within the classic C0 piecewise-C1 match of metrics [1, 3]. We were originally led to consider solutions of class piecewise-C0, as possible generalizations of shock waves and thin shells, for the sake of mathematical completeness, with the idea that a physical interpretation would follow. We actually found a richer framework than the usual one, with some interesting new features (and even some rather undesirable ones), which we display in this paper.

There are two main theories in the literature for solutions of class C0 piecewise-C1, i.e. that in terms of the second fundamental form (heuristic theory, see e.g. [4, 5]) and that in terms of the curvature tensor-distribution (axiomatic theory, see e.g. [6, 1]); these are equivalent through appropriate extensions (for a self-contained overview see e.g. [1]). The axiomatic theory appears to be inappropriate to the study of generalized solutions, since the theory of distributions is basically linear.
Even if we could in principle replace the discontinuous metric with its associated distribution gD, it would then be impossible to define, for example, replacements for the Christoffel symbols, since this would involve products of distributions, which, as is generally believed, are impossible to define. In fact it was proved by Schwartz [7] that, under reasonable hypotheses, there can be no definition of a commutative and associative operation on distributions which reduces to ordinary multiplication on integrable distributions (say on regular functions); thus, in a word, it is impossible to define the product of distributions.

Or is it? Colombeau [8, 9, 10] developed a theory which apparently contradicts Schwartz's result. He introduced a very broad space of generalized functions, which extends the usual space of distributions, a subspace of which corresponds, in a certain sense (the correspondence is not 1 to 1), to the usual distributions. Colombeau's formalism permits multiplication of generalized functions; but the contradiction with the impossibility theorem is only apparent. In fact Schwartz's hypotheses are violated, since the operation does not coincide with ordinary multiplication on regular functions nor with multiplication of a regular function times a distribution (although it does at least for C∞ functions).

Such a theory, however, does not fit in a natural way in general relativity, since it is impossible to define covariantly invariant geometrical objects; in fact Colombeau's space is not invariant under smooth coordinate transformations, unless they are linear.
This difficulty, however, seems to have been overcome in subsequent adjustments of the theory, with the introduction of a richer mathematical framework [11, 12], so that the current apparatus of generalized functions can be used in general relativity, and indeed it has been applied at least to the calculation of singular curvatures of the spacetimes of Kerr [13], Reissner-Nordström [14], and the so-called cosmic-string spacetime [15]. In such literature Colombeau's theory is adapted to the handling of curvature when the metric has a singularity in the sense of functions, i.e. the ordinary curvature would explode, at a singular event-point or at a singular worldline. There seems to be no particular reason to forbid Colombeau's method also for defining the match of piecewise-C0 regularly discontinuous metrics at a singular hypersurface; however, as far as the author is aware, no attempt has been made yet to use it in this framework.

The direct method we will introduce in the following sections, however, is so conceptually simple that we prefer not to experiment with Colombeau's generalized functions, which would instead mean introducing a far more complicated and unfamiliar mathematical apparatus.

In this paper in fact we propose a new, generalized theory for regularly discontinuous solutions, covering also the match of piecewise-C0 metrics. Our theory is heuristic, as it is constructed in a way similar to the heuristic theory of C0, piecewise-C1 solutions that originated from the studies of Israel, but we completely avoid the traditional or projectional Gauss-Codazzi framework (which either does not include the lightlike case [4, 5], or needs a special adaptation for it [16, 1]) and introduce instead what we call the "mean-value differential geometry" framework (see section 3).
This is conceptually very simple, and allows one to construct in a natural way a generalized theory in which the main role (formerly that of the jump of the second fundamental form) is played by the jump of the Christoffel symbols. The theory is an extension of the theory of gravitational discontinuity hypersurfaces studied in [1], to which it reduces when the metric is $C^0$. Even if we restricted ourselves to $C^0$ solutions, by adding the traditional assumption of continuity of the metric, our theory would still have at least two advantages: it does not require the timelike and lightlike cases to be distinguished (unlike the usual heuristic theory), and it requires only $C^0$ continuity of the coordinates (unlike the axiomatic theory). Moreover, it is cast entirely in the framework of general coordinates of the ambient (glued) spacetime, with no use of parametric equations of the hypersurface, nor of inner coordinates and a holonomic 3-basis, which could also be an advantage in some applications.

Piecewise-$C^0$ weak solutions of the Einstein equations, as far as the author is aware, have never been considered previously in the literature. They generalize the corresponding $C^0$ solutions, as the examples in this paper show; however, there is more. The theory apparently allows the propagation of free gravitational discontinuities at a speed lower than that of light (section 8); or rather, we still have no general proof that the absence of stress-energy concentrated on Σ should, in the timelike case, necessarily imply the degeneracy of a generalized solution to a boundary layer, although it does so at least for a wide class of spherical matches (see section 6). Moreover, non-symmetric stress-energy is allowed on the hypersurface (section 9), as e.g. in Einstein-Cartan dynamics.
This possible link to classical unification theories is surprising, since in our framework we have nothing similar to Einstein-Cartan torsion. We therefore see a lot of room for future investigation.

2 Discontinuous metrics

Let $V_4$ be an oriented differentiable manifold of dimension 4, of class ($C^0$, piecewise $C^2$), provided with a strictly hyperbolic metric of signature −+++ and of class piecewise-$C^0$. Let $\Omega \subset V_4$ be an open connected subset with compact closure. Let units be chosen so that the speed of light in empty space is c ≡ 1. Greek indices run from 0 to 3. Let $\Sigma \subset \Omega$ be a regular hypersurface of equation f(x) = 0; let $\Omega^+$ and $\Omega^-$ denote the subdomains distinguished by the sign of f. We suppose the metric and its first and second partial derivatives to be regularly discontinuous on Σ in all charts of class $C^0(\Omega)$. Let $f \in C^0(\Omega) \cap C^2(\Omega\setminus\Sigma)$, and let the second and third derivatives of f be regularly discontinuous on Σ. Finally, let $\ell_\alpha \equiv \partial_\alpha f$ denote the gradient of f.

Let the metric be a solution of the ordinary Einstein equations on each of the two domains $\Omega^+$ and $\Omega^-$. In this situation Σ is the interface between two general relativistic spacetimes and is called a (generalized) gravitational discontinuity hypersurface.

In the following we develop a theory that justifies the introduction of suitable generalized compatibility conditions to replace the Einstein equations on Σ (section 5); if these conditions are satisfied, the match across the generalized gravitational hypersurface Σ will be called a generalized regularly discontinuous solution of the Einstein equations.

Now let us briefly recall some basic notions on regularly discontinuous fields, which we will use as tools; for notation and terminology we refer to [1]. A field $\varphi$ is said to be regularly discontinuous on Σ if its restrictions to the two subdomains $\Omega^+$ and $\Omega^-$ both have a finite limit for $f \to 0$; we denote these limits by $\varphi^+$ and $\varphi^-$, respectively.
In this case the jump $[\varphi]$ across Σ and the arithmetic mean value $\overline{\varphi}$ are well defined on the hypersurface:

$$[\varphi] = \varphi^+ - \varphi^-, \qquad \overline{\varphi} = \tfrac{1}{2}(\varphi^+ + \varphi^-) \tag{1}$$

If $\varphi$ is continuous across Σ, we obviously have $[\varphi] = 0$ and $\overline{\varphi} = \varphi$. We also have the converse formulae:

$$\varphi^+ = \overline{\varphi} + \tfrac{1}{2}[\varphi], \qquad \varphi^- = \overline{\varphi} - \tfrac{1}{2}[\varphi] \tag{2}$$

As for the product of two functions $\varphi$ and $\psi$, we have:

$$[\varphi\psi] = [\varphi]\,\overline{\psi} + \overline{\varphi}\,[\psi], \qquad \overline{\varphi\psi} = \overline{\varphi}\,\overline{\psi} + \tfrac{1}{4}[\varphi][\psi] \tag{3}$$

If a field $\varphi$ is regularly discontinuous on Σ, its jump $[\varphi]$ is sometimes called its discontinuity of order 0. The jump of a regularly discontinuous function has support on Σ; nevertheless, the partial derivative of the jump is well defined, as the jump of the derivative of the function (see [17, 18]). In particular, the derivative of the jump of a continuous field does not vanish, unless the field is also $C^1$. Similarly, we define the partial derivative of the mean value as the mean value of the partial derivative. We can also use regular prolongations to extend, in a sense, the definition of $\overline{\varphi}$ and $[\varphi]$ to the whole domain Ω. Thus they can be regarded as regular and differentiable fields in Ω, but their values (and those of their derivatives) are well defined only on Σ, while in $\Omega\setminus\Sigma$ they depend on the choice of the prolongation. For details on the method of regular prolongations see e.g. [17, 18].

We moreover define the covariant derivative of a field with support on Σ by means of the mean value $\overline{\Gamma}_{\beta\rho}{}^{\sigma}$ of the Christoffel symbols. For the jump of a regularly discontinuous vector, for example, with this definition the jump of the covariant derivative differs from the covariant derivative of the jump. By definition, we have:

$$\nabla_\alpha[V^\beta] = \partial_\alpha[V^\beta] + \overline{\Gamma}_{\alpha\sigma}{}^{\beta}[V^\sigma] \tag{4}$$

and, in consequence of (3):

$$\nabla_\alpha[V^\beta] = [\nabla_\alpha V^\beta] - [\Gamma_{\alpha\sigma}{}^{\beta}]\,\overline{V}^\sigma, \tag{5}$$

and similarly for the jump of any regularly discontinuous tensor.
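The converse formulae and the product rules for jumps and mean values are purely algebraic, so they can be checked on arbitrary numbers. The following is a minimal sketch in Python, using exact rational arithmetic; the helper names `jump` and `mean` and the sample one-sided limits are illustrative assumptions, not part of the formalism of [1].

```python
from fractions import Fraction

def jump(plus, minus):
    """Jump [phi] = phi+ - phi- across the hypersurface."""
    return plus - minus

def mean(plus, minus):
    """Arithmetic mean value (phi+ + phi-)/2 on the hypersurface."""
    return (plus + minus) / 2

# Hypothetical one-sided limits of two regularly discontinuous functions.
phi_p, phi_m = Fraction(3), Fraction(-1, 2)
psi_p, psi_m = Fraction(5, 4), Fraction(2)

phi_j, phi_a = jump(phi_p, phi_m), mean(phi_p, phi_m)
psi_j, psi_a = jump(psi_p, psi_m), mean(psi_p, psi_m)

# Converse formulae: phi+- = mean +- (1/2) jump.
converse_ok = (phi_p == phi_a + phi_j / 2) and (phi_m == phi_a - phi_j / 2)

# Product rules: [phi psi] and the mean value of phi psi.
jump_of_product = jump(phi_p * psi_p, phi_m * psi_m)
mean_of_product = mean(phi_p * psi_p, phi_m * psi_m)
jump_rule_ok = jump_of_product == phi_j * psi_a + phi_a * psi_j
mean_rule_ok = mean_of_product == phi_a * psi_a + phi_j * psi_j / 4

print(converse_ok, jump_rule_ok, mean_rule_ok)
```

The extra term $\tfrac{1}{4}[\varphi][\psi]$ in the mean value of a product is what later prevents the mean metric from having a vanishing covariant derivative.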
Since the spacetime is only C0, we are led to considering (C0, piecewise C1) coordinate transformations, with regularly discontinuous first deriva- tives; the metric discontinuity [gαβ] is not a tensor with respect to such changes of coordinates. In fact we have: [gαβ] = [gα′β′] + qαβ′ + qα′β where: qα′β = [gα′β′ ] + ḡα′β′ We therefore can simulate all (C0, piecewise C1) coordinate changes by com- bining C1 changes with metric gauge changes: [gαβ ]←→ [gαβ] + qαβ′ + qα′β which generalize usual gravitational gauge changes of the theory of (C0, piece- wise C1) solutions [1]. Is it always possible to make [gαβ] vanish with an appropriate C 0 trans- formation? Clearly the answer is negative. In fact it suffices to consider the case when [gαβ] and ḡαβ are both definite positive in a given chart to see that the equation obtained from (6) by replacing the first hand side with 0 has no solution for [∂xα /∂xα] and ∂xα /∂xα. Thus the set of effective generalized gravitational discontinuity hypersurfaces is non empty. Moreover it will occur in many applications to have ℓα ∈ C 0. Therefore it will be often desiderable to work in the framework of (C1, piecewise C2) coordinate transformations, which preserve such condition. The metric dis- continuity is a tensor with respect to such changes of coordinates, but the jump of the Christoffel symbols, which appear to play a main role in the following, is not; we have in fact: σ] = [Γα′β′ ∂xα∂xβ If the coordinates are C0 and so is the form ℓα we can write: ∂xα∂xβ = ℓαℓβ∂ where ∂2 denotes the weak discontinuity of order 2 (see e.g. [17, 18]). Thus on Σ we can generate all (C1, piecewise C2) transformations for [Γ] combining C2 transformations (with respect to which Γ is a tensor) and Christoffel gauges transformations, i.e. of the kind: σ]↔ [Γαβ σ] + ℓαℓβQ σ (11) with some analogy with the case of C0 metrics (where the main role is played by the first order metric discontinuity ∂g, see [1] section 3). 
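In any case, neither the mean value $\overline{g}$ of the metric nor its jump $[g]$ now has a vanishing covariant derivative. Consider in fact the identity $\nabla_\alpha g_{\beta\rho} = 0$ in the domain $\Omega^+$; taking the limit $f \to 0^+$, on Σ we have:

$$\partial_\alpha g^+_{\beta\rho} - (\Gamma_{\alpha\beta}{}^{\nu})^+ g^+_{\nu\rho} - (\Gamma_{\alpha\rho}{}^{\nu})^+ g^+_{\beta\nu} = 0 \tag{12}$$

Here, with obvious meaning of the symbols, we denote $g^+_{\beta\rho} = (g_{\beta\rho})^+$, $\overline{g}_{\beta\rho} = \overline{g_{\beta\rho}}$, etc. Consequently on Σ, from (2)$_1$ we have:

$$\begin{aligned}
&\partial_\alpha\overline{g}_{\beta\rho} + \tfrac{1}{2}\partial_\alpha[g_{\beta\rho}] - \overline{\Gamma}_{\alpha\beta}{}^{\nu}\overline{g}_{\nu\rho} - \overline{\Gamma}_{\alpha\rho}{}^{\nu}\overline{g}_{\nu\beta} \\
&\quad - \tfrac{1}{2}\big([\Gamma_{\alpha\beta}{}^{\nu}]\overline{g}_{\nu\rho} + \overline{\Gamma}_{\alpha\beta}{}^{\nu}[g_{\rho\nu}] + [\Gamma_{\alpha\rho}{}^{\nu}]\overline{g}_{\nu\beta} + \overline{\Gamma}_{\alpha\rho}{}^{\nu}[g_{\beta\nu}]\big) \\
&\quad - \tfrac{1}{4}\big([\Gamma_{\alpha\beta}{}^{\nu}][g_{\nu\rho}] + [\Gamma_{\alpha\rho}{}^{\nu}][g_{\beta\nu}]\big) = 0
\end{aligned} \tag{13}$$

Similarly, from the limit $f \to 0^-$ and from (2)$_2$ we also have on Σ:

$$\begin{aligned}
&\partial_\alpha\overline{g}_{\beta\rho} - \tfrac{1}{2}\partial_\alpha[g_{\beta\rho}] - \overline{\Gamma}_{\alpha\beta}{}^{\nu}\overline{g}_{\nu\rho} - \overline{\Gamma}_{\alpha\rho}{}^{\nu}\overline{g}_{\nu\beta} \\
&\quad + \tfrac{1}{2}\big([\Gamma_{\alpha\beta}{}^{\nu}]\overline{g}_{\nu\rho} + \overline{\Gamma}_{\alpha\beta}{}^{\nu}[g_{\rho\nu}] + [\Gamma_{\alpha\rho}{}^{\nu}]\overline{g}_{\nu\beta} + \overline{\Gamma}_{\alpha\rho}{}^{\nu}[g_{\beta\nu}]\big) \\
&\quad - \tfrac{1}{4}\big([\Gamma_{\alpha\beta}{}^{\nu}][g_{\nu\rho}] + [\Gamma_{\alpha\rho}{}^{\nu}][g_{\beta\nu}]\big) = 0
\end{aligned} \tag{14}$$

From the sum of expressions (13) and (14) we thus have:

$$\nabla_\alpha\overline{g}_{\beta\rho} = \tfrac{1}{4}\big([\Gamma_{\alpha\beta}{}^{\nu}][g_{\nu\rho}] + [\Gamma_{\alpha\rho}{}^{\nu}][g_{\beta\nu}]\big) \tag{15}$$

and, from their difference:

$$\partial_\alpha[g_{\beta\rho}] = [\Gamma_{\alpha\beta\rho}] + [\Gamma_{\alpha\rho\beta}] \tag{16}$$

From (16), (3), and from the definition of covariant derivative over Σ, we then have:

$$\nabla_\alpha[g_{\beta\rho}] = [\Gamma_{\alpha\beta}{}^{\nu}]\overline{g}_{\nu\rho} + [\Gamma_{\alpha\rho}{}^{\nu}]\overline{g}_{\beta\nu} \tag{17}$$

As for the mean value and the jump of the Christoffel symbols we have, from (3):

$$\overline{\Gamma}_{\alpha\beta}{}^{\nu} = \tfrac{1}{2}\big\{\overline{g}^{\nu\sigma}(\partial_\alpha\overline{g}_{\beta\sigma} + \partial_\beta\overline{g}_{\sigma\alpha} - \partial_\sigma\overline{g}_{\alpha\beta}) + \tfrac{1}{4}[g^{\nu\sigma}](\partial_\alpha[g_{\beta\sigma}] + \partial_\beta[g_{\sigma\alpha}] - \partial_\sigma[g_{\alpha\beta}])\big\} \tag{18}$$

$$[\Gamma_{\alpha\beta}{}^{\nu}] = \tfrac{1}{2}\big\{\overline{g}^{\nu\sigma}(\partial_\alpha[g_{\beta\sigma}] + \partial_\beta[g_{\sigma\alpha}] - \partial_\sigma[g_{\alpha\beta}]) + [g^{\nu\sigma}](\partial_\alpha\overline{g}_{\beta\sigma} + \partial_\beta\overline{g}_{\sigma\alpha} - \partial_\sigma\overline{g}_{\alpha\beta})\big\} \tag{19}$$

or, from (15) and (17):

$$[\Gamma_{\alpha\beta}{}^{\nu}]\overline{g}_{\nu\rho} = \tfrac{1}{2}\big(\nabla_\alpha[g_{\beta\rho}] + \nabla_\beta[g_{\rho\alpha}] - \nabla_\rho[g_{\alpha\beta}]\big), \qquad [\Gamma_{\alpha\beta}{}^{\nu}][g_{\nu\rho}] = 2\big(\nabla_\alpha\overline{g}_{\beta\rho} + \nabla_\beta\overline{g}_{\rho\alpha} - \nabla_\rho\overline{g}_{\alpha\beta}\big) \tag{20}$$

3 Mean-value geometry on a hypersurface

Let us consider a 4-vector $V^\alpha$, regularly discontinuous on Σ, the jump and the mean value of which will serve as prototypes of vectors with Σ as support.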
We have, by definition: [∇β∇αV σ] = ∇β[∇αV σ]− [Γβα ν ]∇νV σ + [∇βν σ]∇αV ν (21) where [∇αV σ] = ∇α[V σ]+ [Γαν and where, again by definition, we have: ∇νV σ = {∂ν(V +)σ + (Γ+)νλ σ(V +)λ + ∂ν(V −)σ + (Γ−)νλ σ(V −)λ} (22) Thus, from (2) we have: ∇νV σ = ∇νV σ + (1/4)[Γνλ σ][V λ], (23) which, incidentally, is the same result we could get from the formal applica- tion of (3), wich actually can be applied to covariant derivatives, provided one interpretes ∇ = ∇. We therefore have: [∇α∇βV σ] = ∇α∇β[V σ] +∇β[Γαν + [Γαν σ]∇βV −[Γβα ν ]∇νV − (1/4)[Γβαν][Γνλ σ][V λ]+ +[Γβν σ]∇αV + (1/4)[Γβν σ][Γαλ ν ][V λ] and thus, by antisymmetrization: [∇[β∇α]V σ] = ∇[β∇α][V σ] +∇[β[Γα]νσ ]V [Γν[β σ][Γα]λ ν ][V λ] (25) Now, from the Ricci identity we have: [∇[β∇α]V σ] = [Rαβρ σV ρ] and then, by [∇[β∇α]V σ] = [Rαβρ +Rαβρ σ[V ρ], (26) and thus from a well known identity which follows from (3) as a consequence our definition (5) for the covariant derivative on Σ, i.e. (see [1]): [Rαβρ σ] = ∇β[Γαρ σ]−∇α[Γβρ σ] (27) we have that the commutator of the covariant derivatives of the jump of a generic regularly discontinuous vector obeys the following Ricci-like formula: ∇[βα][V (1/2)Rαβρ σ − (1/4)[Γν[β σ][Γα]ρ [V ρ]. (28) Not surprisingly, working in a similar way starting from ∇β∇αV σ and anti- symmetrizing, we find again: ∇[βα]V (1/2)Rαβρ σ − (1/4)[Γν[β σ][Γα]ρ ; (29) in fact any given field with support on Σ can be considered, by the help of suitable prolongations, as the jump (or as the mean value of) some regularly discontinuous field. 
Thus, for any vector V with support on Σ, we can introduce the following mean-value geometry Ricci-like formula on Σ:

$$(\nabla_{[\beta}\nabla_{\alpha]})V^\sigma = (R_\Sigma)_{\alpha\beta\rho}{}^{\sigma}V^\rho \tag{30}$$

where we have introduced the mean-value geometry curvature $R_\Sigma$, defined by the following mean-value geometry first Gauss-Codazzi identity:

$$(R_\Sigma)_{\alpha\beta\rho}{}^{\sigma} = \overline{R}_{\alpha\beta\rho}{}^{\sigma} - \tfrac{1}{4}\big([\Gamma_{\beta\nu}{}^{\sigma}][\Gamma_{\alpha\rho}{}^{\nu}] - [\Gamma_{\alpha\nu}{}^{\sigma}][\Gamma_{\beta\rho}{}^{\nu}]\big) \tag{31}$$

Notice that, for the sake of simplicity, we have introduced a slight abuse of notation, since in [1] and [16] the same symbol $R_\Sigma$ denotes instead the inner curvature defined with the help of projections. Everything proceeds as in [1], section 4, within the Gauss-Codazzi framework, with the difference that here we do not have to make projections, which would involve products with a discontinuous tangent metric. Moreover, here we do not even have to distinguish between the cases of Σ timelike or lightlike. In other words, our mean-value differential geometry on a hypersurface is a conceptually very simple analogue of the Gauss-Codazzi apparatus.

Thus, with the heuristic theory of [1], section 6 (see also [4] for the timelike case) in mind as a prototype, we expect the jump of the Christoffel symbols to play the main role, in place of the second fundamental form, in the definition of compatibility conditions for very weak solutions of the Einstein equations. Indeed this is what happens, as will be shown in the following.

4 Complex mean-value formalism

The metric being discontinuous on Σ, we are missing the fundamental tool to raise and lower indices, and to construct curvature in the traditional way. This is the reason why one is sometimes tempted to introduce some hybrid metric object on Σ to replace the metric, even in the ($C^0$, piecewise $C^1$) case (see e.g. [5]). It is reassuring to find that the framework of the preceding section can be confirmed by this kind of approach.
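It would be desirable to simply replace $g$ with $\overline{g}$ on Σ, but it is easy to check that $\overline{g}$ does not have the necessary algebraic properties; in particular we have $\overline{g}_{\alpha\beta}\,\overline{g}^{\alpha\rho} \neq \delta_\beta{}^{\rho}$. Consider instead:

$$\tilde{g}_{\alpha\beta} = \overline{g}_{\alpha\beta} + \tfrac{i}{2}[g_{\alpha\beta}], \qquad \tilde{g}^{\alpha\beta} = \overline{g}^{\alpha\beta} - \tfrac{i}{2}[g^{\alpha\beta}] \tag{32}$$

where $i$ is the imaginary unit (i.e. $i^2 = -1$). It is easy to check, with the help of (3), that we have:

$$\tilde{g}_{\alpha\beta}\,\tilde{g}^{\alpha\rho} = \delta_\beta{}^{\rho} + i[g_{\alpha\beta}]\,\overline{g}^{\alpha\rho} \tag{33}$$

i.e., in particular, $\Re(\tilde{g}_{\alpha\beta}\,\tilde{g}^{\alpha\rho}) = \delta_\beta{}^{\rho}$. For the sake of brevity, in the following we denote by $A \approx B$ the relation $\Re(A) = \Re(B)$. Thus the pair $\tilde{g}_{\alpha\beta}$ and $\tilde{g}^{\alpha\beta}$ is a good candidate replacement for the metric on Σ, for the purposes of raising and lowering indices.

Now, similarly to (32), let us introduce:

$$\tilde{\Gamma}_{\alpha\beta\nu} = \overline{\Gamma}_{\alpha\beta\nu} + \tfrac{i}{2}[\Gamma_{\alpha\beta\nu}], \qquad \tilde{\Gamma}_{\alpha\beta}{}^{\sigma} = \overline{\Gamma}_{\alpha\beta}{}^{\sigma} - \tfrac{i}{2}[\Gamma_{\alpha\beta}{}^{\sigma}] \tag{34}$$

so that we have $\tilde{\Gamma}_{\alpha\beta}{}^{\sigma} \approx \tilde{\Gamma}_{\alpha\beta\nu}\,\tilde{g}^{\sigma\nu}$ and conversely $\tilde{\Gamma}_{\alpha\beta\nu} \approx \tilde{\Gamma}_{\alpha\beta}{}^{\sigma}\,\tilde{g}_{\nu\sigma}$. Let us now introduce the differential operator $\tilde{\nabla}$ on Σ, which makes use of $\tilde{\Gamma}$ in place of $\overline{\Gamma}$. As we could expect, we have:

$$\tilde{\nabla}_\rho\tilde{g}_{\alpha\beta} \approx 0, \qquad \tilde{\nabla}_\rho\tilde{g}^{\alpha\beta} \approx 0 \tag{35}$$

which is the replacement on Σ for the covariant conservation of the metric tensor.

Now let us construct on Σ the complex curvature tensor $\tilde{R}$ in the familiar way, but with $\tilde{\Gamma}$ in place of the ordinary Christoffel symbols (which are undefined on Σ):

$$\tilde{R}_{\alpha\beta\rho}{}^{\sigma} = \partial_\beta\tilde{\Gamma}_{\alpha\rho}{}^{\sigma} - \partial_\alpha\tilde{\Gamma}_{\beta\rho}{}^{\sigma} + \tilde{\Gamma}_{\beta\mu}{}^{\sigma}\tilde{\Gamma}_{\alpha\rho}{}^{\mu} - \tilde{\Gamma}_{\alpha\mu}{}^{\sigma}\tilde{\Gamma}_{\beta\rho}{}^{\mu} \tag{36}$$

Rather unexpectedly, we find that

$$\tilde{R}_{\alpha\beta\rho}{}^{\sigma} = (R_\Sigma)_{\alpha\beta\rho}{}^{\sigma} + \tfrac{i}{2}[R_{\alpha\beta\rho}{}^{\sigma}] \tag{37}$$

i.e. in particular we have $\tilde{R}_{\alpha\beta\rho}{}^{\sigma} \approx (R_\Sigma)_{\alpha\beta\rho}{}^{\sigma}$, where $R_\Sigma$ is given by (31). This is just another reason for identifying $R_\Sigma$ as the replacement for the curvature tensor on Σ, which is the first step of our path to the generalized compatibility conditions.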
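The algebraic statement about the complex pair (32)-(33) can be illustrated numerically. The sketch below is only an assumption-laden toy: it uses two hypothetical 2×2 symmetric matrices as the one-sided limits of the metric (instead of 4×4 Lorentzian ones), keeps real and imaginary parts as separate exact rational matrices, and introduces the helper names `inv2`, `matmul` and `combine` for illustration.

```python
from fractions import Fraction as Fr

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(x, y, cx, cy):
    """Linear combination cx*x + cy*y of two 2x2 matrices."""
    return [[cx * x[i][j] + cy * y[i][j] for j in range(2)] for i in range(2)]

# Hypothetical one-sided limits of the covariant metric on the hypersurface.
g_plus = [[Fr(-2), Fr(1)], [Fr(1), Fr(3)]]
g_minus = [[Fr(-1), Fr(0)], [Fr(0), Fr(4)]]
ginv_plus, ginv_minus = inv2(g_plus), inv2(g_minus)

half = Fr(1, 2)
g_mean = combine(g_plus, g_minus, half, half)            # mean of g_ab
g_jump = combine(g_plus, g_minus, 1, -1)                 # jump of g_ab
ginv_mean = combine(ginv_plus, ginv_minus, half, half)   # mean of g^ab
ginv_jump = combine(ginv_plus, ginv_minus, 1, -1)        # jump of g^ab

identity = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]

# Mean metric contracted with mean inverse metric: not the identity.
naive = matmul(g_mean, ginv_mean)

# (mean g + i/2 [g]) (mean g^-1 - i/2 [g^-1]), split into real and imaginary parts.
real_part = combine(matmul(g_mean, ginv_mean), matmul(g_jump, ginv_jump), 1, Fr(1, 4))
imag_part = combine(matmul(g_jump, ginv_mean), matmul(g_mean, ginv_jump), half, -half)

print(naive != identity, real_part == identity,
      imag_part == matmul(g_jump, ginv_mean))
```

In this toy case the real part reproduces the Kronecker delta exactly and the imaginary part equals the jump of the metric contracted with the mean inverse metric, in agreement with the form of (33).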
5 Generalized compatibility conditions Let us now consider limit f → 0+ of the curvature tensor of the subdomain Ω+; by (2) we have: (Rαβρ σ)+ = Rαβρ σ + (1/2)[Rαβρ σ] (38) and, by (27): (Rαβρ σ)+ = Rαβρ σ +∇[β[Γα]ρ σ] (39) We also have, by (31): (Rαβρ σ)+ = (RΣ)αβρ σ +∇[β[Γα]ρ σ] + [Γν[β σ][Γα]ρ ν ] (40) We see that R and RΣ only differ by terms proportional to [Γ], and not involving derivatives of it. Thus, in view of neglecting these tems, in the following we will consider R instead ofRΣ; this simply avoids the introduction of the symbol “ ∼= ”, with the meaning of equality but for terms not involving derivatives of [Γ] (which here replaces the second fundamental form K) as in [1] section 6. Then for the Ricci tensor Rβρ = Rαβρ α we have: (Rβρ) + = Rβρ + (1/2)∇µ µ[Γνρ ν ]− [Γβρ and for the curvature scalar R = Rα R+ = R + (1/2)∇µ µν ]− [Γν Now, to construct the Einstein tensor G+ we have to remember that, since the metric is also discontinuous: (gαβ) + = gαβ + (1/2)[gαβ] (43) so that we have: (Gβρ) + = Gβρ + (1/2)∇µ µ − (1/8)[gβρ] µν ]− [Γν where we have denoted, for the sake of brevity: µ[Γνρ ν ]− [Γβρ µ]− (1/2)gβρ µν ]− [Γν Let us fix a coordinate chart and consider a generic (for the moment) regular prolongation for G, so that its mean value is defined in the whole Ω. Now consider the Riemann 4-volume integral of G+ over the domain Ω+; from the Green theorem we obtain (for the general definition of integral on a hypersurface see [6] p. 
6): Gβρ = Gβρ + (1/2) ℓ+µHβρ µ − (1/8) ℓ+µ [gβρ] µν ]− [Γν The analogous formula for Ω− involves −ℓ− as the outgoing normal vector and the metric g−αβ = gαβ − (1/2)[gαβ], so we have: Gβρ = Gβρ + (1/2) ℓ−µHβρ µ + (1/8) ℓ−µ [gβρ] µν ]− [Γν and consequently we have: Gβρ = Gβρ + ℓµHβρ µ (48) Thus reasons similar to those of the heuristic theory (see [4] and [1] section 6) lead to the reasonable hypothesis that G remain bounded in the neigh- bourhood of Σ, for any admissible prolongation, so that from the volume integral of the Einstein equations, with the presence of an eventual source term concentrated on Σ: Gαβ = −χ Tαβ − χ T̆αβ (49) where χ denotes the gravitational constant, we conclude that ℓµHβρ µ = −χ T̆βρ (50) which is our heuristic reason for considering the following set of general- ized compatibility conditions to hold on Σ as a replacement for the Einstein equations: ℓµHβρ µ = −χT̆βρ (51) Here T̆ represents the stress-energy content of the hypersurface. In the simpler case ℓα ∈ C 0, it is very easy to check that the object ℓµHβρ is gauge-invariant in the sense of (11), as we could hope. Turning now to the comparison with the C0 case, we see from eq.s (71) and (85) of [1] that our generalized conditions (51) are formally identical to ordinary compatibility conditions [eq. (110) of the same paper], if expressed in terms of [Γ] (which in the general case is a function of the jump of the metric [g] as well as of its weak discontinuity ∂g). Therefore it is clear that generalized compatibility conditions reduce to ordinary ones in case the metric is continuous, i.e. in case [gαβ] = 0. In particular, let us suppose g ∈(C0, piecewise C1) and f ∈ C0(Ω); let us moreover suppose (ℓ · ℓ) > 0, i.e. Σ timelike. 
By definition of Christoffel symbols, and from (11) of [1], we have: σ] = (ℓ · ℓ)−1/2(NβGρ σ +NρGβ σ −NσGβρ) + (ℓ · ℓ) 1/2NβNρQ σ (52) Q is a vector which can be set to zero with a suitable gauge choice; it plays no role in (51), as one would expect, in fact we have: ℓµ[Γβρ µ] = −Gβρ + (ℓ · ℓ)Nβρ(Q ·N) ℓβ[Γνρ ν ] = NβNρGν ν + (ℓ · ℓ)NβNρ(Q ·N) ℓµ[Γν µν ] = Gν ν + (ℓ · ℓ)(Q ·N) ℓµ[Γν νµ] = −Gν ν + (ℓ · ℓ)(Q ·N) and, since g = g = h(N) +N ⊗N , we have from (45): ℓµHβρ µ = Gβρ − h(N)βρGν ν (54) i.e., according to (88) of [1]: ℓµHβρ µ = Hβρ (55) as expected. Now let us instead suppose (ℓ · ℓ) = 0, i.e. Σ lightlike. Let u ∈ C0(Ω) be a given auxiliary reference frame. From eq.s (21) and (16) of [1] we have: σ] = (u ·ℓ)−1(−LβF(u)ρ σ−LρF(u)β σ+LσF(u)βρ)+(u ·ℓ) 2LβLρQ̂ σ (56) and consequently, from (18) and (19) of the same paper: ℓµ[Γβρ µ] = LβB(u, n)ρ + LρB(u, n)β − (u · ℓ) 3LβLρ(Q̂ · L) ℓβ[Γνρ ν ] = LβLρG(u, n)ν ν − (u · ℓ)3LβLρ(Q̂ · L) ℓµ[Γν µν ] = ℓµ[Γν νµ] = 0 We therefore have: ℓµHβρ µ = G(u, n)ν νLβLρ − LβB(u, n)ρ − LρB(u, n)β (58) i.e. again, according to (83) of [1], we have: ℓµHβρ µ = Hβρ, as expected. Therefore the set (51) of compatibility conditions, together with ordinary Einstein Equations to hold on each side of the discontinuity hypersurface, defines the class of generalized regularly discontinuous solutions of the Ein- stein equations. And in case [g] = 0, i.e. for continuous metric, from such conditions we recover the ordinary compatibility conditions for regularly dis- continuous weak solutions. However, in the generic case we have some differences, as we are going to show in the following. 6 A class of spherical boundary layers Let us consider the match of two piecewise-C0 regularly discontinuous spher- ical solutions of the vacuum Einstein equations, of the form ds2 = −a(r, t)dt2 + b(r, t)dr2 + r2dΩ2 (59) with dΩ2 = dθ2 + sin2 θdϕ2, across a spherical admissible gravitational dis- continuity hypersurface Σ of equation r = ρ(t), with ρ(t) ∈ C1. 
Therefore the form $\ell_\alpha = \delta_\alpha{}^{r} - \dot{\rho}\,\delta_\alpha{}^{t}$ is continuous (while $\ell^\beta = g^{\beta\alpha}\ell_\alpha$ in general is not). We suppose globally $C^0$ coordinates, the same form of the metric in both domains $\Omega^+$ and $\Omega^-$, and the identification $t^+ = t^-$, $r^+ = r^-$, $\theta^+ = \theta^-$, $\phi^+ = \phi^-$ on Σ. Let $a, b > 0$ and let $a, b \in$ piecewise-$C^0$ be regularly discontinuous on Σ, with regularly discontinuous first derivatives. Let us denote by a dot the partial derivative with respect to t, and by a prime that with respect to r. Let moreover the condition $a - b\dot{\rho}^2 > 0$, i.e. $(\ell\cdot\ell) = b^{-1} - \dot{\rho}^2a^{-1} > 0$, hold on both sides of Σ. We have:

$$[g_{\alpha\beta}] = -[a]\,\delta_\alpha{}^{t}\delta_\beta{}^{t} + [b]\,\delta_\alpha{}^{r}\delta_\beta{}^{r} \tag{60}$$

Now let us define the match as a generalized regularly discontinuous solution by (51), with $\breve{T} = 0$, i.e. in the absence of stress-energy concentrated on Σ. In this case our compatibility conditions reduce to:

$$\ell_\beta[\Gamma_{\mu\rho}{}^{\mu}] - \ell_\mu[\Gamma_{\beta\rho}{}^{\mu}] = 0 \tag{61}$$

which, for a match of metrics of the kind (59), are equivalent to the following system:

$$\begin{gathered}
\dot{\rho}[\dot{b}b^{-1}] + [a'b^{-1}] = 0 \\
\dot{\rho}[\dot{b}a^{-1}] + [a'a^{-1}] = 0 \\
\dot{\rho}[a'a^{-1}] + [\dot{a}a^{-1}] = 0 \\
\dot{\rho}[b'b^{-1}] + [\dot{b}b^{-1}] = 0 \\
[b^{-1}] = 0
\end{gathered} \tag{62}$$

i.e. we have $[b] = 0$ and consequently:

$$\begin{gathered}
\dot{\rho}[\dot{b}] + [a'] = 0 \\
\dot{\rho}[\dot{b}a^{-1}] + [a'a^{-1}] = 0 \\
\dot{\rho}[a'a^{-1}] + [\dot{a}a^{-1}] = 0 \\
\dot{\rho}[b'] + [\dot{b}] = 0
\end{gathered} \tag{63}$$

and from (3):

$$\begin{gathered}
\dot{\rho}[\dot{b}] + [a'] = 0 \\
(\dot{\rho}\,\overline{\dot{b}} + \overline{a'})[a^{-1}] = 0 \\
(\dot{\rho}\,\overline{a'} + \overline{\dot{a}})[a^{-1}] + (\dot{\rho}[a'] + [\dot{a}])\,\overline{a^{-1}} = 0 \\
\dot{\rho}[b'] + [\dot{b}] = 0
\end{gathered} \tag{64}$$

Now if we had both $\dot{\rho}[\dot{b}] + [a'] = 0$ and $\dot{\rho}\,\overline{\dot{b}} + \overline{a'} = 0$, by (2) we would have $\dot{\rho}\dot{b} + a' = 0$ on both sides of the hypersurface. We discard this singular situation for the moment, and from (64)$_2$ we conclude that $[a^{-1}] = 0$. Thus in this case our generalized compatibility conditions imply $[a] = [b] = 0$, i.e. they force the match to be $C^0$, piecewise-$C^1$.

In [1] we have already studied some examples of $C^0$, piecewise-$C^1$ matches of metrics of the kind (59) at a hypersurface of constant radius $r = r_b$, with $\ell_\alpha = \delta_\alpha{}^{r}$. Namely, we have considered: external Schwarzschild - internal Schwarzschild; external Schwarzschild - Tolman VI; external Schwarzschild - Tolman V.
Such matches obviously have $\ell_\alpha \in C^0$; moreover the condition $\dot{\rho}\,\overline{\dot{b}} + \overline{a'} \neq 0$ reduces in this case to $\overline{a'} \neq 0$, which is obviously satisfied. In each case we have verified that the condition $[a] = [b] = 0$ implies $\partial a = 0$ (where $\partial$ denotes the first order discontinuity), which then defines the match as a boundary layer [1] (it actually also implies $\partial b = 0$, as one can verify). This is a general result, since for a metric of the kind (59) the purely temporal and purely radial components of the Einstein tensor do not depend on the second derivatives of the metric:

$$G_{tt} = -a(b'r + b^2 - b)/(r^2b^2), \qquad G_{rr} = -(a'r - ab + a)/(ar^2) \tag{65}$$

so that the corresponding vacuum Einstein equations reduce to:

$$b'r + b^2 - b = 0, \qquad a'r - ab + a = 0. \tag{66}$$

Now, since in the match of vacuum solutions of the kind (59) equations (66) are satisfied on each side of the interface Σ, their jump in particular vanishes, and from (3) we have:

$$r[b'] + (2\overline{b} - 1)[b] = 0, \qquad r[a'] - \overline{a}[b] - (\overline{b} - 1)[a] = 0 \tag{67}$$

from which it clearly follows that the conditions $[a] = [b] = 0$ imply $[a'] = [b'] = 0$, i.e. $\partial a = \partial b = 0$.

Summarizing, for the match of two piecewise-$C^0$ regularly discontinuous spherical solutions, under the above hypotheses, the generalized compatibility conditions (51) imply $[a] = [b] = 0$, i.e. they force the match to be $C^0$. On the other hand, the conditions $[a] = [b] = 0$ imply that Σ is a boundary layer. Therefore, for such spherical matches the generalized compatibility conditions (51) are necessary and sufficient for the match to be a boundary layer.

7 Generalized gravitational shock waves

Let us consider the match of two plane wave metrics of the form

$$ds^2 = -2\,d\xi\,d\eta + F(\xi)^2dx^2 + G(\xi)^2dy^2 \tag{68}$$

across a hypersurface Σ of equation $\xi = 0$. Here $\xi$ and $\eta$ are the two null coordinates. We suppose continuously matching coordinates and $F$, $G$ regularly discontinuous, together with their first and second derivatives. The gradient vector of Σ is the continuous vector $\ell_\alpha = \delta_\alpha{}^{\xi}$, which is characteristic on each side of Σ. The generalized compatibility conditions (51) in the case $\breve{T} = 0$ (i.e.
no stress-energy concentrated on the hypersurface) reduce to the following single scalar equation:

$$[F^{-1}F' + G^{-1}G'] = 0 \tag{69}$$

which characterizes the generalized gravitational shock wave.

Let us now study the compatibility of (69) with the Einstein equations. The vacuum Einstein equations also reduce to a single scalar equation:

$$F^{-1}F'' + G^{-1}G'' = 0 \tag{70}$$

which we suppose to hold on each side of the hypersurface Σ; thus replacing $F^\pm$ and $G^\pm$ by their expressions in terms of $\overline{F}$, $[F]$, $\overline{G}$ and $[G]$ according to (2) gives rise to the following pair of scalar conditions:

$$(2\overline{F''} + [F''])(2\overline{G} + [G]) + (2\overline{G''} + [G''])(2\overline{F} + [F]) = 0 \tag{71}$$

$$(2\overline{F''} - [F''])(2\overline{G} - [G]) + (2\overline{G''} - [G''])(2\overline{F} - [F]) = 0 \tag{72}$$

Equations (71)-(72) are compatible with (69), i.e. the set of three equations can be solved algebraically with respect to F , [G] and to any member of the pair (F , G), and the solution is not necessarily trivial.

Finally, let us notice that if the additional condition $[F] = [G] = 0$ holds, i.e. if the solution is $C^0$, condition (69) reduces to $F^{-1}[F'] + G^{-1}[G'] = 0$, i.e.:

$$F^{-1}\partial F + G^{-1}\partial G = 0 \tag{73}$$

which is the analogous condition for the ordinary shock wave, according to [1], section 10.5.

8 Slow generalized gravitational waves

Let us start by trying to match two vacuum solutions of the kind (68) across the timelike (on both sides) hypersurface Σ of equation ξ = ζ. Again we suppose continuously matching coordinates, $F$, $G$ regularly discontinuous together with their first and second derivatives, and $\breve{T} = 0$. This time the generalized compatibility conditions include (69) and the following two additional scalar conditions:

$$[FF'] = 0, \qquad [GG'] = 0 \tag{74}$$

i.e., in terms of $\overline{F}$, $[F]$, $\overline{G}$ and $[G]$, according to (3):

$$\overline{F}[F'] + [F]\overline{F'} = 0 \tag{75}$$

$$\overline{G}[G'] + [G]\overline{G'} = 0 \tag{76}$$

It is easy to check that the system (75)-(76) is not compatible with (71)-(72), in the sense that the whole system does not admit non-trivial solutions for $\overline{F}$, $[F]$, $\overline{G}$ and $[G]$.
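Conditions (71)-(72) are simply the vacuum equation (70), multiplied through by $FG$, written on the two sides of Σ by means of the converse formulae (2). The following minimal Python sketch checks this identity for arbitrary rational values; the sample numbers and the helper names `one_sided`, `lhs_71_72` and `vacuum_residual` are illustrative assumptions, and the chosen values are not meant to solve the vacuum equation.

```python
from fractions import Fraction as Fr

# Hypothetical mean values and jumps on the hypersurface (F2, G2 stand for F'', G'').
F_mean, F_jump = Fr(3), Fr(1, 2)
G_mean, G_jump = Fr(2), Fr(-1, 3)
F2_mean, F2_jump = Fr(-1, 4), Fr(5)
G2_mean, G2_jump = Fr(7, 3), Fr(-2)

def one_sided(mean, jump, sign):
    """One-sided limit from mean and jump: mean + sign * jump / 2, sign = +1 or -1."""
    return mean + sign * jump / 2

def lhs_71_72(sign):
    """Left-hand side of (71) for sign = +1 and of (72) for sign = -1."""
    return ((2 * F2_mean + sign * F2_jump) * (2 * G_mean + sign * G_jump)
            + (2 * G2_mean + sign * G2_jump) * (2 * F_mean + sign * F_jump))

def vacuum_residual(sign):
    """F''G + G''F on one side, i.e. F*G*(F''/F + G''/G) with one-sided values."""
    F = one_sided(F_mean, F_jump, sign)
    G = one_sided(G_mean, G_jump, sign)
    F2 = one_sided(F2_mean, F2_jump, sign)
    G2 = one_sided(G2_mean, G2_jump, sign)
    return F2 * G + G2 * F

plus_ok = lhs_71_72(+1) == 4 * vacuum_residual(+1)
minus_ok = lhs_71_72(-1) == 4 * vacuum_residual(-1)
print(plus_ok, minus_ok)
```

Since the two left-hand sides differ from the one-sided residuals only by the factor 4, imposing (71) and (72) is equivalent to imposing (70) on $\Omega^+$ and $\Omega^-$ separately, as long as $F$ and $G$ do not vanish on Σ.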
On the other hand, we have proved in section 6 that a wide class of generalized spherical matches at a hypersurface of constant radius necessarily degenerates to a $C^0$ match.

Other examples of degeneracy have not been included in the paper for the sake of brevity, but it seems at least to be a hard task to construct a non-trivial generalized match across a timelike (on each side) hypersurface with no stress-energy content. Such a difficulty is certainly not a proof that the task is impossible, but it makes us wonder whether such a solution should necessarily degenerate to a boundary layer, just as happens for ordinary $C^0$ solutions (see e.g. [1]). This would forbid the existence of generalized solutions which propagate at a speed slower than light. Such a prohibition would be desirable in certain respects, since one could expect that gravitational interactions in vacuo must necessarily propagate at the speed of light also in a generalized theory.

In general terms, since for generalized solutions the metric is discontinuous, a hypersurface can in principle have different signatures on its two sides; for this reason we cannot simply distinguish between the timelike and the lightlike case, as for the usual $C^0$ solutions. We should rather distinguish between three cases: timelike-timelike, timelike-lightlike (or conversely) and lightlike-lightlike. In any case it is legitimate to expect that, at least in the timelike-timelike case, similarly to the timelike case of ($C^0$, piecewise $C^1$) solutions, the absence of stress-energy concentrated on Σ should imply that the solution degenerates to a boundary layer [1].

Unfortunately, for generalized solutions we still have no proof that the absence of stress-energy concentrated on Σ necessarily implies the degeneracy of the solution to a boundary layer.
Therefore, although the examples considered in this paper seem to suggest that this property could also hold true in the generalized case, for the moment it is still a conjecture; we thus have to admit that the theory in principle allows the propagation of generalized gravitational shock waves at a speed lower than that of light. We would call such waves "slow generalized gravitational shock waves". It would be reasonable to forbid this situation as unphysical, but for now this can only be done ad hoc, by means of a corresponding additional hypothesis.

9 Non-symmetric stress-energy

Notice that $\ell_\mu H_{\beta\rho}{}^{\mu}$ is not necessarily symmetric; from the identity:

$$\Gamma_{\alpha\nu}{}^{\nu} = \tfrac{1}{2}g^{-1}\partial_\alpha g \tag{77}$$

where $g$ denotes the determinant of the covariant metric, we have:

$$\ell_\mu H_{[\beta\rho]}{}^{\mu} = \tfrac{1}{4}\big(\ell_\beta[g^{-1}\partial_\rho g] - \ell_\rho[g^{-1}\partial_\beta g]\big) \tag{78}$$

Thus the generalized scheme allows in principle the presence of non-symmetric stress-energy on the discontinuity hypersurface. We will display non-trivial examples of non-symmetry in the following section. Notice that the right hand side of (78) vanishes identically in case $g \in C^0$ and $\ell_\alpha \in C^0$, since in this case we have $[g^{-1}\partial_\beta g] = \ell_\beta\,g^{-1}\partial g$.

A non-symmetric Einstein tensor is a feature of the Einstein-Cartan theory of gravitation (see [19], see also [3] section 7.2), where it is due to the presence of torsion in the non-symmetric connection used to construct the generalized curvature. Thus the generalized theory can be interpreted, at least to some extent, as introducing a torsion-equivalent tool on the shell only, even if there are no geometrical objects in our theory which can be directly interpreted as torsion. However, Einstein-Cartan theory also has a spin-angular momentum field equation in addition to the Einstein equations, which is missing here.
In the literature, compatibility conditions for C0 solutions of boundary layers [20], and recently of shock waves and thin shells [21], have been studied also in the framework of Einstein-Cartan theory; actually this can lead to non-symmetric stress-energy on the shell. But in that theory this feature is inherited from the ambient spacetime, which is not here: non-symmetric stress-energy arises on the shell only, in consequence of the theory. This interesting feauture probably is worth investigating. 10 Generalized thin shells Now let us consider a more general form of the spherical metric: ds2 = −a(r, t)dt2 + b(r, t)dr2 + c(r, t)dΩ2 (79) Let us consider a match of two spherical solutions of the Einstein equations across a timelike (on each side) hypersurface of equation r = ρ(t). Again we suppose ρ(t) ∈ C1 and therefore ℓα = δα r − ρ̇δα t ∈ C0. Let the coordinates be C0 globally, and let the metric have the same form (79) in both domains Ω+ and Ω−, with the identification t+ = t−, r+ = r−, θ+ = θ−, ϕ+ = ϕ− on Let moreover a, b, c > 0 and let a, b, c ∈ piecewise-C0 be regularly dis- continuous on Σ and with regularly discontinuous first derivatives. Again we denote by a dot the partial derivative with respect to t, and by a prime that with respect to r. In this case for the left hand side of the generalized compatibility condi- tions ℓµHβρ µ we obtain: ℓµHβρ µ = − [a′b−1/2] + ρ̇[ḃb−1/2 + ċc−1] +([a′a−1/2 + c′c−1] + ρ̇[ḃa−1/2])δβ +([ȧa−1/2 + ċc−1] + ρ̇[a′a−1/2])δβ −([ḃb−1/2] + ρ̇[b′b−1/2 + c′c−1])δβ +([c′b−1/2] + ρ̇[ċa−1/2])(δβ θ + sin2 θδβ [b−1(a′a−1/2 + c′c−1)] + ρ̇[a−1(ḃb−1/2 + ċc−1)] where, obviously: gβρ = −aδβ t + bδβ r + c(δβ θ + sin2 θδβ ϕ) (81) We now mean to interpret (80) as the matter-energy of a thin shell. 
Let us first get back to the particular case ρ̇ = 0 (static shell) and ȧ = ḃ = ċ = 0, to make the interpretation simpler by eliminating the non-symmetric component; rearranging some terms we in fact obtain: ℓµHβρ µ = (−[a′b−1/2] + ac−1[c′b−1/2])δβ +([a′a−1/2 + c′c−1]− bc−1[c′b−1/2])δβ c−1[c′b−1/2]− [b−1(a′a−1/2 + c′c−1)] This can be interpretated as a perfect isotropic magneto-fluid thin shell with infinite conductivity, i.e. we can solve the compatibility conditions by con- sidering the following stress-energy as the right hand side: T̆αβ = (ρ0 + p+ µh 2)UαUβ + (p+ (1/2)µh 2)gαβ − µhαhβ (83) where ρ0 is the proper density, h the magnetic field and µ the magnetic permeability [22, 23, 24, 25, 6]; here we define h2 = hαhβg αβ . In fact it suffices to define the following 4-velocity vector: a− (1/4)[a2]a−1δα t = (a−1)1/2δα t (84) which by construction is unitary on Σ, in the following sense: UαUβg αβ = −1, and the following magneto-hydrodynamical variables: ρ0 = χ −1a−1[a′b−1]/2 + χ−1c−1(b b−1/2− aa−1 + 1)[c′b−1]/2+ −χ−1[b−1](a′a−1/2 + c′c−1)− (3/2)χ−1b−1[a′a−1/2 + c′c−1] p = χ−1c−1(bb−1/2− 1)[c′b−1/2] + (1/2)χ−1b−1[a′a−1/2 + c′c−1]+ +χ−1[b−1](a′a−1/2 + c′c−1) hα = ± b−1χ−1([a′a−1/2 + c′c−1]− bc−1[c′b−1/2])δα to match (82) and (83) via ℓµHβρ µ = −χT̆βρ. If [a] = [b] = [c] = 0 then the generalized shell (85) degenerates to the C0 magnetohydrodynamical shell considered in [1] section 10.1, in the particular case ρ̇ = 0. The slightly more general case of ȧ = ḃ = ċ = 0, but ρ̇ 6= 0, displays non-symmetric terms in (80); however it is not difficult to see that the per- fect magnetofluid interpretation still holds, provided such additional non- symmetric terms are interpreted, or neglected. 
In fact in this case we have:

ℓµHβρ µ = (−[a′b−1/2] + a c−1[c′b−1/2])δβ +([a′a−1/2 + c′c−1]− bc−1[c′b−1/2])δβ +(1/2)ρ̇[a′a−1/2− b′b−1/2− c′c−1](δβ t + δβ +(1/2)ρ̇[a′a−1/2 + b′b−1/2 + c′c−1](δβ t − δβ c−1[c′b−1/2]− [b−1(a′a−1/2 + c′c−1)]

Now let us consider, for the sake of brevity, the following quantities:

ρ̇2[ a ]2b−1 − (a ]− [ a ])2a−1 ]− [ a ]− [ a

and let us suppose that the inequality α < 0 holds, which is necessary for the physical interpretation. In fact in this case the following vector:

]− [ a t + 1 ρ̇[ a ]− [ a

is a unit timelike vector on Σ, in the sense that UαUβg αβ = −1. Rearranging terms, (86) now reads:

ℓµHβρ µ = αUβUρ + βδβ r + 1 ρ̇[ a t − δβ c−1[ c ]− [b−1( a

which can be matched via ℓµHβρ µ = −χT̆βρ with a stress-energy tensor of the following kind:

T̆αβ = (ρ0 + p + µh²)UαUβ + (p + (1/2)µh²)gαβ − µhαhβ + Aαβ (91)

where A denotes the anti-symmetric term. We have:

ρ0 = −χ −1α + χ−1 1 ]− χ−1[b−1( a )]− 1 p = −χ−1 1 ] + χ−1[b−1( a )]− 1 µh2 = χ−1βb−1

while the anti-symmetric term A reads:

Aαβ = −χ t − δβ r) (93)

The interpretation of such a term is still missing; alternatively it could be neglected by adding the further hypothesis:

= 0 (94)

which is equivalent to [g−1g′] = 0, as we could expect from (78).

References

[1] G. Gemelli, Gen. Rel. Grav. 34 1491-1540 (2002).
[2] S. O'Brien, J.L. Synge, Comm. Dublin Inst. Adv. Stud. Ser. A 9 1-20 (1952).
[3] C. Barrabes, P.A. Hogan, Singular null hypersurfaces in general relativity, World Scientific, Singapore (2003).
[4] W. Israel, Nuovo Cimento B 44 1 (1966); corrections in 48, 463.
[5] C. Barrabes, W. Israel, Phys. Rev. D 43 1129 (1991).
[6] A. Lichnerowicz, Magnetohydrodynamics: waves and shock waves in curved space-time, Mathematical physics studies vol. 14, Kluwer academic publishers, Dordrecht, 1994.
[7] L. Schwartz, C. R. Acad. Sci. Paris 239 847 (1954).
[8] J.F. Colombeau, J. of Math. Anal. and Appl. 94 96 (1983).
[9] J.F. Colombeau, New generalised functions and multiplication of distributions, North-Holland (1984).
[10] J.F. Colombeau, Multiplication of distributions, Lecture notes in Mathematics 1532, Springer (1992).
[11] J.F. Colombeau, A. Meril, J. Math. Anal. Appl. 186 (1984).
[12] J.A. Vickers, J.P. Wilson, Class. Quantum Grav. 16 579-588 (1999).
[13] H. Balasin, Class. Quantum Grav. 14 (1997).
[14] R. Steinbauer, J. Math. Phys. 38 1614 (1997).
[15] C.J.S. Clarke, J.A. Vickers, J.P. Wilson, Class. Quantum Grav. 13 2485 (1986).
[16] G. Gemelli, J. Geom. Phys. 43/4 371-383 (2002).
[17] C. Cattaneo, Istit. Lombardo Accad. Sci. Lett. Rend. A 112 139 (1978).
[18] G. Gemelli, J. Geom. Phys. 20 233 (1996).
[19] E. Cartan, C. R. Acad. Sci. Paris 174 593-595 (1922).
[20] W. Arkuszewsky, W. Kopczynski, V.N. Ponomariev, Commun. Math. Phys. 45 183-190 (1985).
[21] G.F. Bressange, Class. Quantum Grav. 17 2509-2523 (2000).
[22] A. Lichnerowicz, Ann. Inst. H. Poincaré 7 271 (1967).
[23] A. Lichnerowicz, Comm. Math. Phys. 12 145 (1969).
[24] A. Lichnerowicz, in “Centr. Int. Mat. Est. 1970,” Cremonese, Roma, (1971).
[25] A. M. Anile, Relativistic fluids and magneto-fluids, Cambridge University Press, Cambridge (1989).

Introduction; Discontinuous metrics; Mean-value geometry on a hypersurface; Complex mean-value formalism; Generalized compatibility conditions; A class of spherical boundary layers; Generalized gravitational shock waves; Slow generalized gravitational waves; Non-symmetric stress-energy; Generalized thin shells

ABSTRACT

The physical consistency of the match of piecewise-$C^0$ metrics is discussed. The mathematical theory of gravitational discontinuity hypersurfaces is generalized to cover the match of regularly discontinuous metrics. The mean-value differential geometry framework on a hypersurface is introduced, and corresponding compatibility conditions are deduced. Examples of generalized boundary layers, gravitational shock waves and thin shells are studied.
<|endoftext|><|startoftext|> Introduction This paper is a step in a broader program, which aims at finding a geomet- ric counterpart to the Mirror Symmetry phaenomenon, and possibly a geometric language in which to formulate a physical theory interpolating between different σ-models. While we direct the reader to [G2],[G3] for more details, we list here only some aspects of this theory to put the present work into context. In the Strominger-Yau-Zaslow approach to Mirror Symmetry you have that two mirror dual Calabi-Yaus should posses (in some limiting sense) semi-flat special lagrangian torus fibrations f : M → B, f̂ : M̂ → B which have as fibres flat tori which are dual in the metric sense (see [SYZ], and [G2] for the terminology and the definitions). As it is widely known, the major drawback of this approach is that it is very difficult to build special lagrangian tori fibrations. Usually this construction can be carried out only when the dual Calabi-Yau manifolds are actually hyper- kahler, and the special lagrangian tori can be viewed as complex submanifolds (with respect to a rotated complex structure), so that the methods of complex algebraic geometry can be put to work. When you do have the fibrations, then the idea is to construct the mirror map as a sort of Fourier-Mukai transform (see for example [BMP]). This Fourier-Mukai transform is a correspondence induced by pull-back and push forward from the space X = M ×B M̂ . In the hyperkähler case this space is a complex manifold, while in the general case (for example for Mirror Symmetry for Calabi-Yau three- folds) it is just a real manifold of (real) dimension 3 · dimC(M). Background. The notion of (Weakly) self-dual manifold (cf. [G2]) was con- ceived in the first place to isolate the geometric aspects of the X above which are needed to obtain Mirror Symmetry betweenM and M̂ . 
We reproduce here the def- inition for the reader, while referring to [G2] and [G3] for all the remarks, examples and observations: Definition 1.1. A weakly self-dual manifold (WSD manifold for brevity) is given by a smooth manifold X, together with two smooth 2-forms ω1, ω2 a Riemannian metric and a third smooth 2-form ωD (the dualizing form) on it, which satisfy the following conditions: 1) dω1 = dω2 = dωD = 0 and the distribution ω 1 + ω 2 is integrable. 2) For all p ∈ X there exist an orthogonal basis dx1, .., dxm, dy 1 , ..., dy m, dy 1 , ..., dy Date: October 24, 2006. http://arxiv.org/abs/0704.0104v1 2 GIOVANNI GAIFFI, MICHELE GRASSI dz1, ..., dzc, dw1, ..., dwc of T pX such that the dx1, .., dxm, dy 1 , ..., dy m, dy 1 , ..., dy are orthonormal and (ω1)p = dxi ∧ dy i , (ω2)p = dxi ∧ dy i , (ωD)p = dy1i ∧ dy dzi ∧ dwi Any orthogonal basis of TpX dual to a basis of 1- forms as above is said to be adapted to the structure, or standard. The number m is the rank of the structure. For a more intrinsic definition of WSD manifolds the reader should refer to [G2]. Here we have chosen the quickest way to introduce them. When the forms ω1, ω2, ωD are covariant constant with respect to the Levi-Civita connection, we speak of 2-Kähler manifolds. An example of these comes from mirror symmetry for abelian varieties. Remark 1.2. The form ωD is symplectic once restricted to ω 1 + ω 2. We have therefore that ω dim(X)−m 6= 0. Definition 1.3. 1) A WSD manifold is nondegenerate if dim(ω01 ∩ ω 2)p = 0 at all points (equivalently if its dimension is 3 times the rank). 
2) A WSD manifold is self-dual (SD manifold for brevity) if all the leaves of the distribution $\omega_1^{\perp} + \omega_2^{\perp}$ have volume one (with respect to the volume form induced by the metric).

Using Self-dual manifolds, you can give a first naïve geometric definition of Mirror Symmetry as follows: two Calabi-Yau manifolds with B-field $(M, B_M)$ and $(\hat M, B_{\hat M})$ are mirror dual if there is a Self-dual manifold $X$ together with surjections $\pi : X \to M$ and $\hat\pi : X \to \hat M$ such that:
a) $\pi^*(\omega_M) = \omega_1$, $\hat\pi^*(\omega_{\hat M}) = \omega_2$.
b) The leaves of $\omega_1^{\perp}$ are the fibres of $\hat\pi$.
c) The leaves of $\omega_2^{\perp}$ are the fibres of $\pi$.
d) The induced B-fields on $M$ and $\hat M$ are the ones given.

Here the B-fields $B_M$ and $B_{\hat M}$ make their first appearance; they are flat unitary gerbes on $M$ and $\hat M$ respectively, and they are not relevant for the discussions of this paper. In [G2] it was shown that this picture works well in the case of elliptic curves, and for some other flat situations.

A GEOMETRIC REALIZATION OF sl(6,C)

Physical motivation. One of the reasons to introduce SD manifolds, however, was to get rid of special lagrangian fibrations, which are so difficult to construct, and to be able to attack the problem of Mirror Symmetry also when these fibrations are not expected to exist. In this more general context one expects that the Mirror Symmetry phenomenon will not be obtained directly from fibrations of a SD manifold to the dual Calabi-Yaus, but via a more sophisticated procedure, which involves a Gromov-Hausdorff type of limit. In [G3] it was shown that for the family of anticanonical divisors in complex projective space one can build a (real) two-dimensional family of WSD manifolds, which degenerate in a normalized Gromov-Hausdorff sense to the correct limits of the mirror dual Calabi-Yaus. The picture is the following:

[Diagram not recoverable from the extraction; the surviving labels are $M_B$ and $M_A$.]

where $M_A$ and $M_B$ are the large Kähler and large complex structure limits of $M$ and $\hat M$ respectively.
To be precise, the manifolds which come out of the construction of [G3] are 11-dimensional (degenerate) Weakly self-dual manifolds of rank 3. Dimension 11 is very appealing in this context from a physical point of view, and it brings us to the motivation for the present work.

The point of view of [G3] is very different from the current one in the main literature on mathematical Mirror Symmetry: instead of considering the fibre product $M \times_B \hat M$ (when it exists) as a device for proving Mirror Symmetry for Calabi-Yaus, the limiting Calabi-Yaus of Mirror Symmetry are seen as very special limits of a family of Self-dual manifolds, which are the main objects of study. This is actually more in line with what can be found in the physical literature, where the σ-models defining the string theories from which Mirror Symmetry originates are seen as just "phases" of a unique theory, which is not necessarily in the form of a σ-model but could very likely be similar to a quantized gauge theory on an 11-dimensional manifold.

To make this circle of ideas more concrete (and hence more verifiable), at the end of [G3] it is suggested that one should try to build a natural gauge theory on Self-dual manifolds: the hope is that, once quantized, this gauge theory might interpolate between the σ-models associated to the Calabi-Yaus, and as a byproduct prove Mirror Symmetry for them. Of course one can always put a gauge bundle on the Self-dual manifolds "artificially", but a natural bundle which depends only on the geometric structure would be much more appealing. We ignore here the issue of which action to put on the theory, but it too should be a natural geometric one.

Finally, in [GG] we analyzed the situation for rank three WSD manifolds, and we found that in this case the corresponding natural bundle is formed by complex Lie superalgebras. We were able to find a geometrically motivated real form, and to split it into simple factors.
The results of [GG] confirm the suspicion that on a WSD manifold of high enough rank there could be enough natural algebraic bundles of operators to build interesting gauge theories.

The construction of $\mathcal{L}_{\mathbb C}$. From a physical point of view the case of Calabi-Yau threefolds (i.e. rank three WSD manifolds) or fourfolds (i.e. rank four WSD manifolds) would be the most interesting one to start with. However, its technical difficulty convinced us to start more modestly from the case of Calabi-Yau twofolds (i.e. K3 surfaces), which correspond to rank two Self-dual manifolds. We also considered only orientable nondegenerate Self-dual manifolds of rank two, hence of dimension 6. This could be considered a proof of concept from a physicist's point of view; however, Mirror Symmetry for K3's is in itself very interesting mathematically, so we hope that our results could have some useful geometric consequences. The rank three case is treated in our subsequent [GG], as mentioned in the previous section of this introduction.

The main result of the present paper is the following (which is a geometric restatement of Theorem 5.11):

The Lie algebra sl(6,C) acts via canonical operators (depending only on the geometric structure) on the smooth differential forms of any orientable nondegenerate WSD manifold of rank 2. This action generalizes naturally the action of sl(2,C) on smooth differential forms of any almost Kähler manifold, and is induced by a bundle action on the exterior power of the cotangent bundle.

Recall that a Weakly self-dual manifold is a Riemannian manifold with three "compatible" closed differential forms. We will build a Lie algebra of pointwise operators on complex differential forms on $X$, as smooth sections of a bundle of Lie algebras of operators on the complexified cotangent bundle of $X$. To start, one can define the following operators:

Definition 1.4.
For $\phi \in \Omega^*(X)$:

$$L_0(\phi) = \omega_D \wedge \phi, \qquad L_1(\phi) = -\omega_2 \wedge \phi, \qquad L_2(\phi) = \omega_1 \wedge \phi$$

One can notice immediately the strong resemblance of the operators above to the Lefschetz operator of Kähler geometry. Indeed, one can elaborate on this similarity, and use the metric to define the adjoints $\Lambda_j = L_j^*$ (using a pointwise procedure, as in the almost Kähler case).

Simply using the $L_j$ and the $\Lambda_j$, one can show that the algebra generated is isomorphic to sl(4,C) ([G2]). However, there are other natural differential forms on a WSD manifold (which do not have a counterpart in the Kähler case), namely the volume forms of the distributions $\omega_1^{\perp}, \omega_2^{\perp}, \omega_D^{\perp}$ of vectors which contract to zero with the forms $\omega_1$, $\omega_2$ and $\omega_D$ respectively. If one calls $V_0, V_1, V_2$ the corresponding wedge operators, and $A_0, A_1, A_2$ their adjoints, the complexity of the calculations needed to describe the generated Lie algebra grows considerably. We called $\mathcal{L}$ the algebra generated by the $L_j$, $V_j$ and their adjoints, and $\mathcal{L}_{\mathbb C}$ its complexification.

To study $\mathcal{L}_{\mathbb C}$ we introduced an operator $J$, which is a complex structure on each of the two-dimensional distributions mentioned above and generates a group isomorphic to SO(2,R) (recall that we are in the "hyperkähler" case, corresponding to Mirror Symmetry for K3's, so an "extra" complex structure shouldn't be surprising; moreover the holonomy of a WSD manifold in which all of $\omega_1, \omega_2, \omega_D$ are invariant is actually always included in the group generated by $J$). One checks that all the operators introduced commute with it:

$$\forall j \qquad [L_j, J] = [\Lambda_j, J] = [V_j, J] = [A_j, J] = 0$$

and therefore one can try to decompose $\Lambda^* T^*X$ with respect to $J$ and then use Schur's Lemma to reduce to the study of the operators on the isotypical components.
One should mention that in the (very) good cases (for instance 2-Kähler manifolds) the operators above are all covariant constant with respect to the metric connection, and define an action on the cohomology of $X$ much in the same way as the operators $L$ and $\Lambda$ do in the Kähler setting (due to Hodge-type identities). We don't explore this aspect here, although it may be relevant to the (homological) mirror map construction.

Coming back to the construction, we point out the inclusion of the Lie algebra $\mathcal{L}_{\mathbb C}$ inside a copy of the Clifford algebra $Cl_{6,6}$. Using this Clifford algebra one can identify "degree two" or "quadratic" operators (in a way similar to the ones involved in the Spinor representations on standard Spin manifolds) and among these the SO(2,R)-invariant ones. A posteriori, it turns out that the operators of $\mathcal{L}_{\mathbb C} \oplus \langle J \rangle$ are all the $J$-invariant operators of "degree two", and this strengthens the rationale in our selection of natural operators.

As a last step one finds that inside $\Lambda^* T^*X$ there is an SO(2,R)-isotypical component of dimension 6, and by direct computation we prove that indeed the operators restricted to this sub-representation determine a copy of sl(6,C) (with the defining representation). Using the bound on the dimension of $\mathcal{L}_{\mathbb C}$ obtained by computing "quadratic" invariants, one then shows that the representation on this isotypical component is faithful. This provides as a byproduct a method for giving a presentation of standard Serre generators of $\mathcal{L}_{\mathbb C}$, explicitly written in terms of the natural geometrical generators.

2. Basic operators

In this section we fix a point $p$ in the WSD manifold $X$.
The WSD structure splits the cotangent space as $T^*_pX = W_0 \oplus W_1 \oplus W_2$, where the $W_j$ are three mutually orthogonal canonical distributions defined as:

$$W_0 = \{\phi \in T^*_pX \mid \phi \wedge \omega_1^2 = \phi \wedge \omega_2^2 = 0\}$$
$$W_1 = \{\phi \in T^*_pX \mid \phi \wedge \omega_1^2 = \phi \wedge \omega_D^2 = 0\}$$
$$W_2 = \{\phi \in T^*_pX \mid \phi \wedge \omega_2^2 = \phi \wedge \omega_D^2 = 0\}$$

The WSD structure also determines canonical pairwise linear identifications among $W_0$, $W_1$ and $W_2$, so that one can also write $T^*_pX = W_0 \otimes_{\mathbb R} \mathbb{R}^3$ or more simply $T^*_pX = W \otimes_{\mathbb R} \mathbb{R}^3$, where $W = W_0 \cong W_1 \cong W_2$. Let us now come back to the canonical operators $L_j$ mentioned in the introduction:

Definition 1.4. For $\phi \in \Omega^*(X)$:

$$L_0(\phi) = \omega_D \wedge \phi, \qquad L_1(\phi) = -\omega_2 \wedge \phi, \qquad L_2(\phi) = \omega_1 \wedge \phi$$

We now choose a (non-canonical) orthonormal basis $\gamma_1, \gamma_2$ for $W_0$, and this together with the standard identifications of the $W_j$ determines an orthonormal basis for $T^*_pX$, which we write as $\{v_{ij} = \gamma_i \otimes e_j \mid i = 1, 2,\ j = 0, 1, 2\}$. We remark that the $v_{ij}$ are an adapted coframe for the WSD structure, and therefore we have the explicit expressions:

$$\omega_1 = v_{10} \wedge v_{11} + v_{20} \wedge v_{21}$$
$$\omega_2 = v_{10} \wedge v_{12} + v_{20} \wedge v_{22}$$
$$\omega_D = v_{11} \wedge v_{12} + v_{21} \wedge v_{22}$$

A different choice of the $\gamma_1, \gamma_2$ would be related to the previous one by an element of O(2,R) or, taking into account the orientability of $X$ mentioned in the Introduction, an element of SO(2,R). The Lie algebra of the group SO(2,R) expressing the change from one oriented adapted basis to another is generated (point by point) by the global operator $J$:

Definition 2.1. The operator $J \in End_{\mathbb R}(\Omega^*(X))$ is induced by its pointwise action on the $\Lambda^* T^*_pX$ for varying $p \in X$, defined in terms of the standard basis $v_{ij}$ as

$$J(v_{1j}) = v_{2j}, \qquad J(v_{2j}) = -v_{1j} \qquad \text{for } j \in \{0, 1, 2\}$$

and

$$J(v \wedge w) = J(v) \wedge w + v \wedge J(w) \qquad \text{for } v, w \in \Lambda^* T^*_pX$$

Remark 2.2. Two oriented adapted bases differ by an element of SO(2,R), which is generated by $J$; as $J$ commutes with itself, it is well defined, independently of the choice of an oriented adapted basis.

Using the chosen (orthonormal) basis, one can define corresponding (non-canonical) wedge and contraction operators:

Definition 2.3. Let $i \in \{1, 2\}$ and $j \in \{0, 1, 2\}$.
The operators $E_{ij}$ and $I_{ij}$ are respectively the wedge and the contraction operator with the form $v_{ij}$ on $\Lambda^* T^*_pX$ (defined using the given basis); we use the notation $\frac{\partial}{\partial v_{ij}}$ to indicate the element of $T_pX$ dual to $v_{ij} \in T^*_pX$:

$$E_{ij}(\phi) = v_{ij} \wedge \phi, \qquad I_{ij}(\phi) = \frac{\partial}{\partial v_{ij}} \lrcorner\, \phi$$

Proposition 2.4. The operators $E_{ij}$, $I_{ij}$ satisfy the following relations:

$$\forall i, j, k, l \qquad E_{ij}E_{kl} = -E_{kl}E_{ij}, \qquad I_{ij}I_{kl} = -I_{kl}I_{ij}$$
$$\forall i, j \qquad E_{ij}I_{ij} + I_{ij}E_{ij} = Id$$
$$\forall (i, j) \neq (k, l) \qquad E_{ij}I_{kl} = -I_{kl}E_{ij}$$
$$\forall i, j \qquad E_{ij}^* = I_{ij}, \qquad I_{ij}^* = E_{ij}$$

where $*$ is adjunction with respect to the metric.

Proof. The proof is a simple direct verification, which we omit. □

It is then immediate to verify that:

Proposition 2.5. $J$ can be expressed as

$$J = \sum_{j=0}^{2} \left(E_{2j}I_{1j} - E_{1j}I_{2j}\right)$$

on the whole $\Lambda^* T^*_pX$. From this expression and the previous proposition one obtains that $J^* = -J$, i.e. for every $p$ the Lie algebra generated by $J$ is a subalgebra of $o(\Lambda^* T^*_pX)$ isomorphic to $so(2,\mathbb R) \cong \mathbb R$. Moreover, the exponential images inside $Aut_{\mathbb R}(\Omega^*(X))$ of the operators of type $tJ$ for $t \in \mathbb R$ form a group isomorphic to $SO(2,\mathbb R) \cong S^1$, as this isomorphism holds for the (faithful) restriction of the group action to $T^*_pX$.

Using the (non-canonical) operators $E_{ij}$ we can obtain simple expressions for the pointwise action of the other canonical operators, the wedge operators $V_j$ with the volume forms:

Definition 2.6. For $\phi \in \Lambda^* T^*_pX$,

$$V_0(\phi) = E_{10}E_{20}(\phi), \qquad V_1(\phi) = E_{11}E_{21}(\phi), \qquad V_2(\phi) = E_{12}E_{22}(\phi)$$

Remember however that the operators $V_j$ do not depend on the choice of a basis, as they are simply multiplication by the volume forms of the spaces $W_j$.

We use the $v_{ij}$ also as an orthonormal basis for the complexified space $T^*_pX \otimes_{\mathbb R} \mathbb C$ (with respect to the induced hermitian inner product). We indicate with the same symbols $V_j$ the complexified operators acting on the spaces $\Lambda^*_{\mathbb C} T^*_pX$. The Riemannian metric induces a Riemannian metric on $T^*_pX$ and on the space $\Lambda^* T^*_pX$.

Definition 2.7.
For $j \in \{0, 1, 2\}$:

$$\Lambda_j = L_j^*, \qquad A_j = V_j^*$$

By construction the canonical operators $L_j, V_j, \Lambda_j, A_j$ on $\Lambda^* T^*_pX$ are the pointwise restrictions of corresponding global operators on smooth differential forms, which we indicate with the same symbols: for $j \in \{0, 1, 2\}$,

$$L_j, V_j, \Lambda_j, A_j : \Omega^*(X) \to \Omega^*(X)$$

Summing up:

Definition 2.8. The ∗-Lie algebra $\mathcal{L}$ is the ∗-Lie subalgebra of $End_{\mathbb R}(\Omega^*(X))$ generated by the operators

$$\{L_j, V_j, \Lambda_j, A_j \mid j = 0, 1, 2\}$$

The ∗ operator on $\mathcal{L}$ is induced by the adjoint with respect to the Riemannian metric. The ∗-Lie algebra $\mathcal{L}_{\mathbb C}$ is $\mathcal{L} \otimes \mathbb C$, and is in a natural way a ∗-Lie subalgebra of $End_{\mathbb C}(\Omega^*_{\mathbb C}(X))$. The ∗ operator on $\mathcal{L}_{\mathbb C}$ is induced by the adjoint with respect to the induced Hermitian metric.

The canonical splitting $T^*_pX = W_0 \oplus W_1 \oplus W_2$ together with the canonical identifications $W_0 \cong W_1 \cong W_2$ induce an action of the symmetric group $S_3$, which propagates to $\Lambda^* T^*X$ and to its $C^\infty$ sections. At every point, the action can be written explicitly in terms of the basis as

$$\sigma(v_{ij}) = v_{i\sigma(j)}$$

The induced action on endomorphisms via conjugation, $\sigma(\phi) = \sigma \circ \phi \circ \sigma^{-1}$, preserves $\mathcal{L}_{\mathbb C}$. Indeed, one can check directly using the basis $v_{ij}$ at every point that for $\sigma \in S_3$

$$\sigma(V_j) = V_{\sigma(j)}, \qquad \sigma(L_j) = \epsilon(\sigma) L_{\sigma(j)}$$

Since $S_3$ acts on $\mathcal{L}_{\mathbb C}$ by conjugation with unitary operators, its action commutes with adjunction (the ∗ operator), and therefore

$$\sigma(A_j) = A_{\sigma(j)}, \qquad \sigma(\Lambda_j) = \epsilon(\sigma) \Lambda_{\sigma(j)}$$

Moreover, one also has that $\sigma(J) = J$, which means that the action of $S_3$ commutes with that of so(2,R).

3. The action of so(2,R)

When one deals with mirror symmetry for 2-Kähler manifolds (see the Introduction), the WSD manifolds which arise have the property that the forms $\omega_1$, $\omega_2$ and $\omega_D$ are covariant constant with respect to the metric. In this case, the maximal possible holonomy of the WSD manifold $X$ is included in the so(2,R) generated by the operator $J$. We will show now that $J$ commutes with $\mathcal{L}_{\mathbb C}$.
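The relations of Proposition 2.4 and the formula of Proposition 2.5 can be tested numerically. The following is a minimal sketch, not taken from the paper: it realizes the $E_{ij}$ as explicit matrices on the 64-dimensional exterior algebra of $T^*_pX$ in the nondegenerate rank 2 case. The bitmask encoding of monomials, the ordering `idx` of the coframe and all function names are assumptions made for illustration only.

```python
import numpy as np

N = 6  # dimension of T*_pX for a nondegenerate WSD manifold of rank 2


def idx(i, j):
    """Hypothetical position of the coframe element v_ij (i in {1,2}, j in {0,1,2})."""
    return 3 * (i - 1) + j


def wedge(k):
    """Matrix of phi -> v_k ∧ phi on the 2^N-dimensional exterior algebra.

    A basis monomial is encoded by a bitmask S (bit k set <=> v_k occurs),
    with its factors written in increasing index order.
    """
    E = np.zeros((2 ** N, 2 ** N))
    for S in range(2 ** N):
        if (S >> k) & 1:
            continue  # v_k ∧ v_k = 0
        # v_k has to move past every factor with a smaller index
        sign = (-1) ** bin(S & ((1 << k) - 1)).count("1")
        E[S | (1 << k), S] = sign
    return E


pairs = [(i, j) for i in (1, 2) for j in (0, 1, 2)]
E = {p: wedge(idx(*p)) for p in pairs}
I = {p: E[p].T for p in pairs}  # contraction = metric adjoint of the wedge
Id = np.eye(2 ** N)


def anti(a, b):
    """Anticommutator of two operators."""
    return a @ b + b @ a


# Proposition 2.4: anticommutation relations of the E_ij and I_ij
relations_hold = all(
    np.allclose(anti(E[p], E[q]), 0)
    and np.allclose(anti(I[p], I[q]), 0)
    and np.allclose(anti(E[p], I[q]), Id if p == q else 0)
    for p in pairs for q in pairs
)

# Proposition 2.5: J written through wedge and contraction operators
J = sum(E[2, j] @ I[1, j] - E[1, j] @ I[2, j] for j in range(3))


def one_form(i, j):
    """Coordinate vector of the one-form v_ij."""
    e = np.zeros(2 ** N)
    e[1 << idx(i, j)] = 1
    return e


# Definition 2.1 on the coframe: J(v_1j) = v_2j, J(v_2j) = -v_1j
j_on_coframe = all(
    np.allclose(J @ one_form(1, j), one_form(2, j))
    and np.allclose(J @ one_form(2, j), -one_form(1, j))
    for j in range(3)
)

print(relations_hold, j_on_coframe, np.allclose(J.T, -J))
```

Since the basis of monomials is orthonormal, the transpose plays the role of the adjoint, so the last check is the statement $J^* = -J$.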
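Our proof will be strictly algebraic, so that the commutativity between so(2,R) and $\mathcal{L}_{\mathbb C}$ will hold also on WSD manifolds for which the holonomy is more general.

Definition 3.1. Given $n \in \mathbb Z$, we indicate with $V_n$ the one-dimensional complex representation of $SO(2,\mathbb R) \cong S^1 \cong \mathbb R/\mathbb Z$ given by the character

$$\theta \to e^{2\pi \imath n \theta}$$

Proposition 3.2. Under the SO(2,R) representation induced by the operator $J$, for any $p \in X$:

1) The space $\Lambda^1_{\mathbb C}(T^*X_p)$ splits as $V_{-1}^{\oplus 3} \oplus V_1^{\oplus 3}$.

2) The whole space $\Lambda^*_{\mathbb C}(T^*X_p)$ splits according to the following picture, where each row is the direct sum of its entries and the columns are indexed by the weight:

$$\begin{array}{l|ccccccc}
 & -3 & -2 & -1 & 0 & 1 & 2 & 3 \\ \hline
\Lambda^0_{\mathbb C}(T^*X_p) & & & & V_0 & & & \\
\Lambda^1_{\mathbb C}(T^*X_p) & & & V_{-1}^{\oplus 3} & & V_1^{\oplus 3} & & \\
\Lambda^2_{\mathbb C}(T^*X_p) & & V_{-2}^{\oplus 3} & & V_0^{\oplus 9} & & V_2^{\oplus 3} & \\
\Lambda^3_{\mathbb C}(T^*X_p) & V_{-3} & & V_{-1}^{\oplus 9} & & V_1^{\oplus 9} & & V_3 \\
\Lambda^4_{\mathbb C}(T^*X_p) & & V_{-2}^{\oplus 3} & & V_0^{\oplus 9} & & V_2^{\oplus 3} & \\
\Lambda^5_{\mathbb C}(T^*X_p) & & & V_{-1}^{\oplus 3} & & V_1^{\oplus 3} & & \\
\Lambda^6_{\mathbb C}(T^*X_p) & & & & V_0 & & &
\end{array}$$

Proof. 1) The space $T^*_{\mathbb C}X_p$ is a direct sum of the three $W_j$, and each one of these is the standard two-dimensional real representation of so(2,R). We therefore diagonalize the representation by introducing a new basis for each $W_j = \langle v_{1j}, v_{2j} \rangle$:

$$w_j = v_{1j} + \imath\, v_{2j}, \qquad \bar w_j = v_{1j} - \imath\, v_{2j}$$

From the definition of $J$, one has then for every $j \in \{0, 1, 2\}$

$$J(w_j) = -\imath\, w_j, \qquad J(\bar w_j) = \imath\, \bar w_j$$

Therefore one has for every $j \in \{0, 1, 2\}$

$$\langle w_j \rangle \cong V_{-1}, \qquad \langle \bar w_j \rangle \cong V_1$$

2) To prove the general case, we use the fact that the operator $J$ determines an almost complex structure on the manifold $X$, compatible with the metric. From this, following standard arguments, the complex differential forms and also the elements of $\Lambda^n_{\mathbb C}(T^*X_y)$ for any $y \in X$ can be divided according to their type:

$$\Lambda^n_{\mathbb C}(T^*X_y) = \bigoplus_{p+q=n} \Lambda^{p,q}_{\mathbb C}(T^*X_y)$$

In the notation adopted in the proof of the first statement, one has

$$\Lambda^{p,q}_{\mathbb C}(T^*X_y) = \langle w_{i_1} \wedge \cdots \wedge w_{i_p} \wedge \bar w_{j_1} \wedge \cdots \wedge \bar w_{j_q} \mid i_1, \ldots, j_q \in \{0, 1, 2\} \rangle$$

From the definition of the action of $J$ one has therefore that for any $p, q$

$$\Lambda^{p,q}_{\mathbb C}(T^*X_y) \cong V_k^{\oplus \binom{3}{p}\binom{3}{q}} \qquad \text{with } k = q - p$$

from which the second statement of the proposition can be easily deduced. □

Theorem 3.3. The operators $L_j$, $V_j$ for $j \in \{0, 1, 2\}$ commute with the generator $J$ of so(2,R).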
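Both the weight table of Proposition 3.2 and the statement of Theorem 3.3 can be checked by machine before reading the proof. The sketch below is an illustration under stated assumptions, not the authors' computation: monomials in the $v_{ij}$ are encoded as bitmasks, `idx` is a hypothetical ordering of the coframe, and the weight of a form is read off as an eigenvalue of $-\imath J$, so that $w_j$ has weight $-1$ as in the proposition.

```python
from collections import Counter
from math import comb

import numpy as np

N = 6  # dimension of T*_pX in the nondegenerate rank 2 case


def idx(i, j):
    """Hypothetical position of v_ij (i in {1,2}, j in {0,1,2}) in the coframe."""
    return 3 * (i - 1) + j


def wedge(k):
    """Matrix of phi -> v_k ∧ phi; monomials are bitmasks in increasing order."""
    E = np.zeros((2 ** N, 2 ** N))
    for S in range(2 ** N):
        if not (S >> k) & 1:
            sign = (-1) ** bin(S & ((1 << k) - 1)).count("1")
            E[S | (1 << k), S] = sign
    return E


E = {(i, j): wedge(idx(i, j)) for i in (1, 2) for j in (0, 1, 2)}

# J as in Proposition 2.5 (contraction = transpose of the wedge operator)
J = sum(E[2, j] @ E[1, j].T - E[1, j] @ E[2, j].T for j in range(3))

# Wedge operators with the structure forms, in the adapted coframe
omega_1 = E[1, 0] @ E[1, 1] + E[2, 0] @ E[2, 1]
omega_2 = E[1, 0] @ E[1, 2] + E[2, 0] @ E[2, 2]
omega_D = E[1, 1] @ E[1, 2] + E[2, 1] @ E[2, 2]
L = [omega_D, -omega_2, omega_1]            # L_0, L_1, L_2
V = [E[1, j] @ E[2, j] for j in range(3)]   # V_0, V_1, V_2

# Theorem 3.3: every L_j and V_j commutes with J
commute_with_J = all(np.allclose(J @ X - X @ J, 0) for X in L + V)


def weights_in_degree(n):
    """Multiplicity of each weight of J on the forms of degree n."""
    rows = [S for S in range(2 ** N) if bin(S).count("1") == n]
    K = (-1j * J)[np.ix_(rows, rows)]  # Hermitian, eigenvalues are the weights
    return Counter(int(round(w)) for w in np.linalg.eigvalsh(K))


def expected_weights(n):
    """Weight q - p with multiplicity C(3,p) * C(3,q), summed over p + q = n."""
    out = Counter()
    for p in range(n + 1):
        q = n - p
        if p <= 3 and q <= 3:
            out[q - p] += comb(3, p) * comb(3, q)
    return out


print("L_j, V_j commute with J:", commute_with_J)
for n in range(N + 1):
    print(n, dict(sorted(weights_in_degree(n).items())))
```

A numerical check of this kind only covers the rank 2 nondegenerate model fibre; the algebraic proof is what gives the statement at every point of any such manifold.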
Proof. We prove the statements by a direct computation using the basis $v_{ij}$; moreover, using the action of $S_3$ (which permutes the $L_j$, $V_j$ and fixes $J$), it is enough to prove the commutativity for $L_0$ and $V_0$. It is useful to rewrite $\omega_D$ (and hence $L_0$, which is wedge with $\omega_D$) in terms of the basis generated by the $w_j$:

$$\omega_D = v_{11} \wedge v_{12} + v_{21} \wedge v_{22} = \tfrac{1}{2}\left(w_1 \wedge \bar w_2 - w_2 \wedge \bar w_1\right)$$

and then, writing $\eta = w_{i_1} \wedge \cdots \wedge w_{i_p} \wedge \bar w_{j_1} \wedge \cdots \wedge \bar w_{j_q}$ and using that $J$ is a derivation:

$$[J, L_0](\eta) = J\left(\tfrac{1}{2}(w_1 \wedge \bar w_2 - w_2 \wedge \bar w_1) \wedge \eta\right) - \tfrac{1}{2}(w_1 \wedge \bar w_2 - w_2 \wedge \bar w_1) \wedge J(\eta) = J\left(\tfrac{1}{2}(w_1 \wedge \bar w_2 - w_2 \wedge \bar w_1)\right) \wedge \eta$$

Therefore the result follows from the fact that $J(w_1 \wedge \bar w_2 - w_2 \wedge \bar w_1) = 0$, as $w_j$ and $\bar w_k$ have opposite weight with respect to $J$ for any $j, k$. Similarly, $[J, V_0] = 0$ follows from the fact that for any $\alpha$

$$V_0(\alpha) = v_{10} \wedge v_{20} \wedge \alpha = \tfrac{\imath}{2}\, w_0 \wedge \bar w_0 \wedge \alpha$$

□

From the previous theorem one obtains the following corollary, which holds on any WSD manifold (not necessarily 2-Kähler):

Corollary 3.4. The algebra $\mathcal{L}_{\mathbb C}$ commutes with the action of so(2,R) induced by $J$.

Proof. We already know that $[J, L_j] = [J, V_j] = 0$ for $j \in \{0, 1, 2\}$. The corresponding commutation relations for the adjoint generators $\Lambda_j$, $A_j$ of $\mathcal{L}_{\mathbb C}$ follow from the fact that $J^* = -J$, as noticed in Proposition 2.5. □

Remark 3.5. From Schur's lemma it follows that the columns of the diagram of Proposition 3.2 are preserved by the action of $\mathcal{L}_{\mathbb C}$.

4. An irreducible representation of $\mathcal{L}_{\mathbb C}$

Looking at the table in Proposition 3.2 we notice that the second column from the left is a representation of $\mathcal{L}_{\mathbb C}$ (by Remark 3.5) of dimension 6:

$$V \cong V_{-2}^{\oplus 6} = \langle w_0 \wedge w_1,\ w_0 \wedge w_2,\ w_1 \wedge w_2,\ w_0 \wedge w_1 \wedge w_2 \wedge \bar w_0,\ w_0 \wedge w_1 \wedge w_2 \wedge \bar w_1,\ w_0 \wedge w_1 \wedge w_2 \wedge \bar w_2 \rangle$$

In this section we will compute this representation explicitly. Using the basis described above, it is not difficult to compute the matrices by hand:

Proposition 4.1.
Indicating with $\beta$ the ordered basis for $V$ indicated above, the matrices for the (restrictions to $V$ of) the generators of $\mathcal{L}_{\mathbb C}$ are the following:

$$M_\beta(L_0) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ -\tfrac12&0&0&0&0&0\\ 0&-\tfrac12&0&0&0&0 \end{pmatrix}, \qquad M_\beta(\Lambda_0) = \begin{pmatrix} 0&0&0&0&-2&0\\ 0&0&0&0&0&-2\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$$

$$M_\beta(L_1) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ \tfrac12&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&-\tfrac12&0&0&0 \end{pmatrix}, \qquad M_\beta(\Lambda_1) = \begin{pmatrix} 0&0&0&2&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&-2\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$$

$$M_\beta(L_2) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&\tfrac12&0&0&0&0\\ 0&0&\tfrac12&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}, \qquad M_\beta(\Lambda_2) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&2&0&0\\ 0&0&0&0&2&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$$

$$M_\beta(V_0) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&\tfrac{\imath}{2}&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}, \qquad M_\beta(A_0) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&-2\imath&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$$

$$M_\beta(V_1) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&-\tfrac{\imath}{2}&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}, \qquad M_\beta(A_1) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&2\imath&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$$

$$M_\beta(V_2) = \begin{pmatrix} 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ \tfrac{\imath}{2}&0&0&0&0&0 \end{pmatrix}, \qquad M_\beta(A_2) = \begin{pmatrix} 0&0&0&0&0&-2\imath\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$$

Proof. Direct computation using the basis generated by the $w_j$. □

Corollary 4.2. The algebra generated by the restriction of $\mathcal{L}_{\mathbb C}$ to $V$ is isomorphic to sl(6,C), with $V$ its natural representation.

One can sum up the computations above in the following theorem:

Theorem 4.3. There is an exact sequence of Lie algebras

$$0 \to K \to \mathcal{L}_{\mathbb C} \to sl(6,\mathbb C) \to 0$$

given by the restriction to $V$.

In the next section we will prove that $K = \{0\}$, and therefore the representation $V$ is faithful and $\mathcal{L}_{\mathbb C} \cong sl(6,\mathbb C)$.

5. Quadratic invariants

We begin by showing that the action of the Lie algebra $\mathcal{L}_{\mathbb C}$ is induced by a (non-canonical) Clifford algebra representation. We use for simplicity the canonical identification $T^{**}X_p \cong TX_p$ without further comment, so that if $\{v_{ij}\}$ is a basis for $T^*_pX$, then $\{\frac{\partial}{\partial v_{ij}}\}$ is the corresponding dual basis for $T_pX$.

Definition 5.1.
For $p \in X$, the Clifford algebra $C_p$ is

$$C_p = Cl(T_pX \oplus T^*_pX, q)$$

with the quadratic form $q$ induced by the metric

$$\forall i, j, h, k \qquad \langle v_{ij}, v_{hk} \rangle = 0$$
$$\forall i, j, h, k \qquad \left\langle \tfrac{\partial}{\partial v_{ij}}, \tfrac{\partial}{\partial v_{hk}} \right\rangle = 0$$
$$\forall (i, j) \neq (h, k) \qquad \left\langle v_{ij}, \tfrac{\partial}{\partial v_{hk}} \right\rangle = 0$$
$$\forall i, j \qquad \left\langle v_{ij}, \tfrac{\partial}{\partial v_{ij}} \right\rangle = -\tfrac{1}{2}$$

Remark 5.2. The Clifford algebras $C_p$ for varying $p$ define a Clifford bundle $C$ on $X$, as the definition of $C_p$ is independent of the choice of a basis. Indeed, the quadratic form used to define it is simply induced by $-\tfrac{1}{2}$ times the natural bilinear pairing $T_pX \otimes T^*_pX \to \mathbb R$.

Proposition 5.3. The Clifford algebra $C_p$ has a canonical representation $\rho_p$ on $\Lambda^* T^*_pX$, induced by the operators $E_{ij}$ and $I_{ij}$ via the map

$$\rho_p(v_{ij}) = E_{ij}, \qquad \rho_p\left(\tfrac{\partial}{\partial v_{ij}}\right) = I_{ij}$$

Proof. The Clifford relations $\phi\psi + \psi\phi = -2\langle \phi, \psi \rangle$ are precisely the content of Proposition 2.4. The representation is canonical, even if the operators $E_{ij}$ and $I_{ij}$ are not, because it can be defined in a basis independent way: for $v \in T^*_pX$ and $\xi \in T_pX$,

$$\rho_p(v)(\alpha) = v \wedge \alpha, \qquad \rho_p(\xi)(\alpha) = \xi \lrcorner\, \alpha$$

□

Abusing slightly the notation, we will identify $C_p$ with its (faithful) image inside $End_{\mathbb R}(\Lambda^* T^*_pX)$, and we will omit any reference to the map $\rho_p$. Actually, as the representation above is a real analogue of the Spinor representation, it is easy to check that the map $\rho_p$ is an isomorphism of associative algebras. One then has:

Definition 5.4. The linear subspace C2p of Cp is the image of the natural map (TpX ⊕ T pX) → Cp. The linear subspace C p of Cp is the subspace generated by

Recall that C2p is a Lie subalgebra of Cp (with the commutator bracket).

Proposition 5.5. The Lie algebra Lp and the operator J sit inside C p for all p ∈ X.

Proof. The operators Lj, the Λj, the Vj and the Aj lie inside C p by Proposition 2.4 and the fact that ω1, ω2, ωD lie in Λ²T∗pX. The operator J lies inside C2p ⊕ C p by Proposition 2.5. By definition the elements C p are commutators, and therefore have trace zero in any representation, and hence also in the ρp.
Moreover, again by inspection all the generators of Lp have trace zero once represented via ρp (they are nilpotent), and therefore they must lie inside C p. The operator J is in the Lie algebra of the isometry group, and therefore it too has trace zero and hence sits inside C2p. As C p is closed under the commutator bracket of Cp, and this commutator coincides with the composition bracket of operators, we have the conclusion. □

Remark 5.6. Giving degree 1 to the operators $E_{ij}$ and degree $-1$ to the operators $I_{ij}$, we induce a $\mathbb Z$-degree on $C_p$. This degree coincides with the degree of the operators induced from the grading on the forms from $\Lambda^* T^*X$.

Remark 5.7. For any $p \in X$, the Clifford algebra $C_p$ is isomorphic to $Cl_{6,6}$, as the metric used to define it has signature (6, 6). The previous proposition therefore shows that $\mathcal{L}_p$ is a Lie subalgebra of $Cl^2_{6,6} \cong spin(6, 6) = so(6, 6)$, generated by smooth global sections of the Clifford bundle $C$.

The operator $J$ acts on all of $C_p$ by adjunction with respect to the commutator bracket, and sends its quadratic part $C^2_p$ to itself by Proposition 5.5. We will show that the space of $J$-invariants inside $C^2_p$ (the "quadratic" $J$-invariants) coincides with $\mathcal{L}_{\mathbb C} \oplus \langle J \rangle$ (see Corollary 5.12). To describe it explicitly, let us introduce the following notation:

Definition 5.8.

$$E_{w_j} = E_{1j} + \imath E_{2j}, \qquad E_{\bar w_j} = E_{1j} - \imath E_{2j}$$
$$I_{w_j} = I_{1j} - \imath I_{2j}, \qquad I_{\bar w_j} = I_{1j} + \imath I_{2j}$$

Lemma 5.9. The adjoint action of the operator $J$ on $E_{w_j}, I_{w_j}, E_{\bar w_j}, I_{\bar w_j}$ is:

$$[J, E_{w_j}] = -\imath E_{w_j}, \qquad [J, I_{w_j}] = \imath I_{w_j}$$
$$[J, E_{\bar w_j}] = \imath E_{\bar w_j}, \qquad [J, I_{\bar w_j}] = -\imath I_{\bar w_j}$$

Proof. It is enough to consider the corresponding $J$-weights of the $w_j$, $\bar w_j$. □

Proposition 5.10.
The following 36 operators provide a linear basis for the qua- dratic J-invariants: (1) [Ew0 , Ew1 ], [Ew0 , Ew2 ], [Ew1 , Ew2 ], [Ew1 , Ew0 ], [Ew2 , Ew0 ], [Ew2 , Ew1 ] (2) [Iw0 , Iw1 ], [Iw0 , Iw2 ], [Iw1 , Iw2 ], [Iw1 , Iw0 ], [Iw2 , Iw0 ], [Iw2 , Iw1 ] (3) [Ew0 , Ew0 ], [Ew1 , Ew1 ], [Ew2 , Ew2 ] (4) [Iw0 , Iw0 ], [Iw1 , Iw1 ], [Iw2 , Iw2 ] (5) [Ew0 , Iw1 ], [Ew0 , Iw2 ], [Ew1 , Iw0 ], [Ew1 , Iw2 ], [Ew2 , Iw0 ], [Ew2 , Iw1 ] (6) [Ew0 , Iw1 ], [Ew0 , Iw2 ], [Ew1 , Iw0 ], [Ew1 , Iw2 ], [Ew2 , Iw0 ], [Ew2 , Iw1 ] (7) [Ew0 , Iw0 ], [Ew1 , Iw1 ], [Ew2 , Iw2 ], [Ew0 , Iw0 ], [Ew1 , Iw1 ], [Ew2 , Iw2 ] Proof The J-weight of a bracket of J-homogeneous operators is the sum of the respective weights. The quadratic ”monomials” (with respect to the bracket) in the Ewj , Iwj , Ewj , Iwj are all J-homogeneous, and therefore to find a basis of J-invariant quadratic operators it is enough to identify the J-invariant quadratic monomials. To be J-invariant means simply to have weight zero, and the compu- tation of the J-weight of the quadratic mononials follows immediately from those of Ewj , Iwj , Ewj , Iwj , which are respectively −ı, ı, ı,−ı. � We end this section with the following: Theorem 5.11. In the exact sequence of Theorem 4.3 the kernel K is equal to {0}. The algebra LC is therefore isomorphic to sl(6,C). Proof Since LC is included in the Lie algebra of quadratic invariants, it is enough to show that J 6∈ LC, as from this and the previous proposition it follows that dimC(LC) ≤ 35. As LC maps surjectively to sl(6,C) which has dimension 35, the kernel must be zero. When restricted to the subrepresentation V , the generators of LC have all trace zero by inspection of their matrices. However, by definition of V , J restricted to it is multiplication by −2ı, and has therefore trace equal to −12ı. � Corollary 5.12. The Lie algebra LC⊕ < J > equals the Lie algebra of quadratic invariants inside C2p . A GEOMETRIC REALIZATION OF sl(6,C) 13 6. 
A geometric presentation of Serre generators

In this section, to gain a better geometric understanding of the representation $L_{\mathbb{C}}$ of sl(6,C), we explore in greater detail its relation to the geometric structure of a WSD manifold. In particular, we give a presentation of a natural choice of Cartan subalgebra and Serre generators in terms of the geometric generators $L_j, \Lambda_j, V_j, A_j$. The $L_j$ operators are similar in nature to the Lefschetz operators of a Kähler manifold. This analogy is what provided the initial interest in the algebraic structure of $L_{\mathbb{C}}$. Similarly to the corresponding standard construction of a representation of sl(2,C), we define

Definition 6.1. For $j \in \{0, 1, 2\}$
$$H_j = [L_j, \Lambda_j]$$

These operators are self-adjoint, as $L_j^* = \Lambda_j$ by definition. As in the context of Kählerian geometry, for every $j$ the algebra $\langle L_j, \Lambda_j, H_j \rangle$ turns out to be a copy of sl(2,C). Moreover, the following proposition shows that the operators $H_j$ are semisimple on the whole algebra $L_{\mathbb{C}}$, and therefore generate a toral subalgebra of $L_{\mathbb{C}}$.

Proposition 6.2. The geometric operators $H_j$ generate a toral subalgebra of $L_{\mathbb{C}}$, and the following relations hold: for $j \neq k \in \{0, 1, 2\}$
(1) $[H_j, L_j] = 2L_j$, $[H_j, \Lambda_j] = -2\Lambda_j$
(2) $[H_j, L_k] = L_k$, $[H_j, \Lambda_k] = -\Lambda_k$
(3) $[H_j, V_j] = 0$, $[H_j, A_j] = 0$
(4) $[H_j, V_k] = 2V_k$, $[H_j, A_k] = -2A_k$

Proof. In view of Theorem 5.11, at this point the quickest method of proof of this proposition is to refer to the explicit matrices of the (faithful) restriction of $L_{\mathbb{C}}$ to $V$. □

The whole algebra $L_{\mathbb{C}}$ splits into a direct sum of weight spaces with respect to $\langle H_0, H_1, H_2 \rangle$, as this subalgebra is toral.
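The relations of Proposition 6.2 can be checked mechanically in a small model. The following Python sketch is an illustration only, not the computation used in the paper: it verifies the sl(2) relations on the standard 2 × 2 matrices (an assumed stand-in for $L_j$ and $\Lambda_j$, which in the paper act on differential forms), and then reads off from relations (1)-(4) the weight of each geometric generator with respect to $H_0, H_1, H_2$. The names `matmul`, `bracket`, `scale`, `AD_EIGENVALUES` and `weight` are hypothetical helpers introduced for this sketch.

```python
# Illustrative sketch; the 2x2 matrices are a stand-in model, not the paper's operators.

def matmul(x, y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return tuple(tuple(ab[i][j] - ba[i][j] for j in range(2)) for i in range(2))

def scale(c, a):
    """Scalar multiple c * a."""
    return tuple(tuple(c * entry for entry in row) for row in a)

# Standard sl(2) model: L plays the role of L_j, LAM the role of Lambda_j.
L = ((0, 1), (0, 0))
LAM = ((0, 0), (1, 0))
H = bracket(L, LAM)  # mirrors Definition 6.1: H_j = [L_j, Lambda_j]

# Eigenvalues of ad(H_j) as (same index, different index), copied from Proposition 6.2.
AD_EIGENVALUES = {"L": (2, 1), "Lambda": (-2, -1), "V": (0, 2), "A": (0, -2)}

def weight(kind, j):
    """Weight of the generator kind_j as the triple (alpha(H_0), alpha(H_1), alpha(H_2))."""
    same, different = AD_EIGENVALUES[kind]
    return tuple(same if k == j else different for k in range(3))

if __name__ == "__main__":
    print(bracket(H, L) == scale(2, L), bracket(H, LAM) == scale(-2, LAM))
    for kind in AD_EIGENVALUES:
        print(kind, [weight(kind, j) for j in range(3)])
```

In this model `weight("L", 0)` is the triple of eigenvalues of $\mathrm{ad}(H_0), \mathrm{ad}(H_1), \mathrm{ad}(H_2)$ on $L_0$, and the weights of $\Lambda_j$ and $A_j$ are the negatives of those of $L_j$ and $V_j$, as relations (1)-(4) require.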
The weight of $L_0$ with respect to the basis dual to $H_0, H_1, H_2$ is:
$$\alpha_{L_0} = (\alpha_{L_0}(H_0), \alpha_{L_0}(H_1), \alpha_{L_0}(H_2)) = (2, 1, 1)$$
The full list is:
$\alpha_{L_0} = (2, 1, 1)$, $\alpha_{\Lambda_0} = -\alpha_{L_0}$
$\alpha_{L_1} = (1, 2, 1)$, $\alpha_{\Lambda_1} = -\alpha_{L_1}$
$\alpha_{L_2} = (1, 1, 2)$, $\alpha_{\Lambda_2} = -\alpha_{L_2}$
$\alpha_{V_0} = (0, 2, 2)$, $\alpha_{A_0} = -\alpha_{V_0}$
$\alpha_{V_1} = (2, 0, 2)$, $\alpha_{A_1} = -\alpha_{V_1}$
$\alpha_{V_2} = (2, 2, 0)$, $\alpha_{A_2} = -\alpha_{V_2}$

To find a natural geometric expression for two ad-semisimple elements which complete $\langle H_0, H_1, H_2 \rangle$ to a Cartan subalgebra we look at the generators $V_j$ and $A_j$. However, it turns out that the natural candidates $[V_j, A_j]$ already lie in the algebra $\langle H_0, H_1, H_2 \rangle$. We instead build the new operators by "subtracting" from the $V_j$ their weight $\alpha_{V_j}$:

Definition 6.3. We define
$S_0 = \imath[[[V_0, \Lambda_1], \Lambda_2], L_0]$
$S_1 = \imath[[[V_1, \Lambda_2], \Lambda_0], L_1]$
$S_2 = \imath[[[V_2, \Lambda_0], \Lambda_1], L_2]$
and denote by H the Lie algebra (over C):
$$H = \langle H_0, H_1, H_2, S_0, S_1, S_2 \rangle$$

The coefficients $\imath$ which appear in the formulas above are dictated by the fact that with this choice the (diagonal) matrices of the $S_j$ restricted to $V$ have integer entries.

Proposition 6.4. The algebra H is a Cartan subalgebra of $L_{\mathbb{C}}$. More precisely, the following are the diagonals of the operators $H_0, ..., S_2$ once restricted to $V$, H1 : , H2 : , S0 : , S1 : , S2 :

Proof. The computation of the matrices above shows that, once restricted to $V$, the algebra H spans the space of diagonal matrices of trace zero in the given basis. □

Remark 6.5. The computation above shows also that the operators $S_0, S_1, S_2$ satisfy the relation
$$S_0 + S_1 + S_2 = 0$$

Even if from the previous proposition we know that H is maximal toral inside $L_{\mathbb{C}}$, the natural geometric generators $L_j, \Lambda_j$ are not eigenvectors for the adjoint action of the $S_k$. At this point however it is possible to single out in natural geometric terms operators of $L_{\mathbb{C}}$ which have "pure" weight with respect to the algebra H and which contain in their linear span the $L_j, \Lambda_j$:

Definition 6.6.
For $j \in \{0, 1, 2\}$
$L_{1j} = -2L_j + [S_j, L_j]$, $L_{2j} = 2L_j + [S_j, L_j]$
$\Lambda_{1j} = -2\Lambda_j - [S_j, \Lambda_j]$, $\Lambda_{2j} = 2\Lambda_j - [S_j, \Lambda_j]$

Proposition 6.7. Indicating with $e^h_k$ the 6 × 6 matrix with a 1 in position $k$ (row) and $h$ (column) and zero otherwise, the matrices of the operators $L_{ij}$ and $\Lambda_{ij}$ restricted to $V$ are: L10 = 2e 6 L11 = −2e 4 L12 = −2e L20 = −2e 5 L21 = −2e 6 L22 = 2e Λ10 = 8e 2 Λ11 = −8e 1 Λ12 = −8e Λ20 = −8e 1 Λ21 = −8e 3 Λ22 = 8e

Corollary 6.8. We have the following relations for the operators of $L_{\mathbb{C}}$ restricted to $V$:
$[H_k, L_{ij}] = (1 + \delta_{kj}) L_{ij}$, $[H_k, \Lambda_{ij}] = -(1 + \delta_{kj}) \Lambda_{ij}$
$[S_k, L_{ij}] = (-1)^{i+1}(1 - 3\delta_{kj}) L_{ij}$, $[S_k, \Lambda_{ij}] = (-1)^{i}(1 - 3\delta_{kj}) \Lambda_{ij}$
$[S_k, V_j] = 0$, $[S_k, A_j] = 0$

Guided by all the explicit computations of the action on the isotypical component $V = V_{-2}^{\oplus 6}$ made up to this point, we now define in terms of the natural geometric operators a set of Serre generators for the algebra $L_{\mathbb{C}}$.

Definition 6.9. [L20, A1] f1 = [V1,Λ20] [L22, A0] f2 = [V0,Λ22] e3 = V0 f3 = A0 [L12, A0] f4 = [V0,Λ12] [L10, A1] f5 = [V1,Λ10]

Moreover, for all $i \in \{1, .., 5\}$ we define $h_i = [e_i, f_i]$. As the $e_i$ have by construction associated matrix e i+1 once restricted to $V$ and the $f_i$ are their respective adjoints, one gets:

Proposition 6.10. The operators $e_i, f_j, h_k$ satisfy the Serre relations for sl(6,C) and the $h_i$ span the Cartan subalgebra H: (H1 −H2 − S1 − S2) (H0 −H1 + S2) (−H0 +H1 +H2) (H0 −H1 − S2) (H1 −H2 + S1 + S2)

It would be interesting as a last remark to identify in the list of quadratic invariants the geometric operators $L_{ij}, \Lambda_{ij}, V_j, A_j$, the algebra H and the so(2,R) generator $J$. To do this one could of course use the explicit matrices for the quadratic invariants once restricted to $V$, which are not difficult to compute. One can however get very quickly a qualitative picture by using the notion of multidegree which we now introduce.
The decomposition $T^*X = W_0 \oplus W_1 \oplus W_2$ induces naturally a multi-degree on the forms of $X$ with values in $\mathbb{Z}^3$, which we indicate with mdeg. This follows from the equation
$$\bigwedge\nolimits^{n} (T^*X \otimes \mathbb{C}) = \bigoplus_{p+q+r=n} \bigwedge\nolimits^{p}(W_0 \otimes \mathbb{C}) \otimes \bigwedge\nolimits^{q}(W_1 \otimes \mathbb{C}) \otimes \bigwedge\nolimits^{r}(W_2 \otimes \mathbb{C})$$
We notice furthermore that the (complexified) decomposition above is preserved by the operator $J$, and therefore mdeg commutes with the action of so(2,R).

Proposition 6.11. The operators $L_j, V_j, \Lambda_j, A_j, H_j, S_j$ are mdeg-homogeneous, with multi-degrees:
mdeg($L_0$) = (0, 1, 1), mdeg($L_1$) = (1, 0, 1), mdeg($L_2$) = (1, 1, 0)
mdeg($\Lambda_0$) = (0, −1, −1), mdeg($\Lambda_1$) = (−1, 0, −1), mdeg($\Lambda_2$) = (−1, −1, 0)
mdeg($V_0$) = (2, 0, 0), mdeg($V_1$) = (0, 2, 0), mdeg($V_2$) = (0, 0, 2)
mdeg($A_0$) = (−2, 0, 0), mdeg($A_1$) = (0, −2, 0), mdeg($A_2$) = (0, 0, −2)
mdeg($H_0$) = (0, 0, 0), mdeg($H_1$) = (0, 0, 0), mdeg($H_2$) = (0, 0, 0)
mdeg($S_0$) = (0, 0, 0), mdeg($S_1$) = (0, 0, 0), mdeg($S_2$) = (0, 0, 0)

Proof. The values of mdeg for the $L_j$ and the $V_j$ follow immediately from the mdeg of the corresponding forms, and the dual (contraction) operators have the opposite value of mdeg. The remaining values can be computed using the additivity of mdeg with respect to the bracket. □

Proposition 6.12. Let $\{j, k, l\} = \{0, 1, 2\}$. Then
Span($L_{1j}, L_{2j}$) = Span($[E_{w_k}, E_{\bar w_l}], [E_{w_l}, E_{\bar w_k}]$)
Span($\Lambda_{1j}, \Lambda_{2j}$) = Span($[I_{w_k}, I_{\bar w_l}], [I_{w_l}, I_{\bar w_k}]$)
Span($V_j$) = Span $[E_{w_j}, E_{\bar w_j}]$
Span($A_j$) = Span $[I_{w_j}, I_{\bar w_j}]$
H ⊕ Span($J$) = Span($[E_{w_m}, I_{w_m}], [E_{\bar w_m}, I_{\bar w_m}]$)

Proof. The mdeg of the $L_{ij}$ is the same as that of the corresponding $L_j$, and similarly for their adjoints. The mdegs of the quadratic monomials are immediately computed, as they are the sum of those of their components. For example, mdeg($E_{w_0}$) = mdeg($E_{\bar w_0}$) = (1, 0, 0) and mdeg($E_{w_1}$) = mdeg($E_{\bar w_1}$) = (0, 1, 0), and therefore mdeg($[E_{w_0}, E_{\bar w_1}]$) = (1, 1, 0), equal to that of $L_{12}$ and $L_{22}$. □

References
[B] V. Batyrev, Dual polyhedra and mirror symmetry for Calabi-Yau hypersurfaces in toric varieties, J. Alg. Geom. 3 (1994), 493-535
[BMP] U. Bruzzo, G. Marelli, F. Pioli, A Fourier transform for sheaves on real tori Part II.
Relative theory, J. of Geometry and Phys. 41 (2002), 312-329
[CDGP] P. Candelas, X.C. De la Ossa, P.S. Green, L. Parkes, A pair of Calabi-Yau manifolds as an exactly soluble superconformal theory, Nucl. Phys. B359 (1991), 21-74
[GG] G. Gaiffi, M. Grassi, A natural Lie superalgebra bundle on rank three WSD manifolds, preprint (2007)
[G1] M. Grassi, Polysymplectic spaces, s-Kähler manifolds and lagrangian fibrations, math.DG/0006154 (2000)
[G2] M. Grassi, Mirror symmetry and self-dual manifolds, math.DG/0202016 (2002)
[G3] M. Grassi, Self-dual manifolds and mirror symmetry for the quintic threefold, Asian J. Math 9 (2005), 79-102
[GP] B.R. Greene, M.R. Plesser, Duality in Calabi-Yau moduli space, Nucl. Phys. B338 (1990), 15-37
[GVW] B.R. Greene, C. Vafa, N.P. Warner, Calabi-Yau manifolds and renormalization group flows, Nucl. Phys. B324 (1989), 371-390
[Gr] M. Gromov, Metric structures for Riemannian and non-Riemannian spaces, Birkhäuser P.M. 152, Boston 1999
[GW] M. Gross, P.M.H. Wilson, Large Complex Structure limits of K3 surfaces, math.DG/0008018 (2001)
[Gu] V. Guillemin, Moment maps and combinatorial invariants of Hamiltonian Tn-spaces, Birkhäuser P.M. 122 (1994)
[M] A. McInroy, Orbifold mirror symmetry for complex tori, preprint
[KS] M. Kontsevich, Y. Soibelman, Homological mirror symmetry and torus fibrations, math.SG/0011041 (2001)
[SYZ] A. Strominger, S.T. Yau, E. Zaslow, Mirror Symmetry is T-Duality, Nucl. Phys. B479 (1996), 243-259; hep-th/9606040
ABSTRACT Given an orientable weakly self-dual manifold X of rank two, we build a geometric realization of the Lie algebra sl(6,C) as a naturally defined algebra L of endomorphisms of the space of differential forms of X. We provide an explicit description of Serre generators in terms of natural generators of L. This construction gives a bundle on X which is related to the search for a natural gauge theory on X. We consider this paper as a first step in the study of a rich and interesting algebraic structure. <|endoftext|><|startoftext|>
1 Introduction and main results
1.1 Many facets of displaceability
1.2 Preliminaries on quantum homology
1.3 An hierarchy of rigid subsets within Floer theory
1.4 Hamiltonian torus actions
1.5 Super(heavy) monotone Lagrangian submanifolds
1.6 An effect of semi-simplicity
1.7 Discussion and open questions
1.7.1 Strong displaceability beyond Floer theory?
1.7.2 Heavy fibers of Poisson-commutative subspaces
2 Detecting stable displaceability
3 Preliminaries on Hamiltonian Floer theory
3.1 Valuation on QH∗(M)
3.2 Hamiltonian Floer theory
3.3 Conley-Zehnder and Maslov indices
3.4 Spectral numbers
3.5 Partial symplectic quasi-states
4 Basic properties of (super)heavy sets
5 Products of (super)heavy sets
5.1 Product formula for spectral invariants
5.2 Decorated Z2-graded complexes
5.3 Reduced Floer and Quantum homology
5.4 Proof of Theorem 5.1
5.5 Proof of algebraic Theorem 5.2
6 Stable non-displaceability of heavy sets
7 Analyzing stable stems
8 Monotone Lagrangian submanifolds
9 Rigidity of special fibers of Hamiltonian actions
9.1 Calabi and mixed action-Maslov

1 Introduction and main results

1.1 Many facets of displaceability

A well-studied and easy to visualize rigidity property of subsets of a symplectic manifold (M,ω) is the rigidity of intersections: a subset X ⊂ M cannot be displaced from the closure of a subset Y ⊂ M by a compactly supported Hamiltonian isotopy:
$$\phi(X) \cap \overline{Y} \neq \emptyset \quad \forall \phi \in \mathrm{Ham}(M).$$
We say in such a case that X cannot be displaced from Y. If X cannot be displaced from itself we call it non-displaceable. These properties become especially interesting and purely symplectic when X can be displaced from itself or from Y by a (compactly supported) smooth isotopy.

One of the main themes of the present paper is that "some non-displaceable sets are more rigid than others." To explain this, we need the following ramifications of the notion of a non-displaceable set:

Strong non-displaceability: A subset X ⊂ M is called strongly non-displaceable if one cannot displace it by any (not necessarily Hamiltonian) symplectomorphism of (M,ω).

Stable non-displaceability: Consider $T^*S^1 = \mathbb{R} \times S^1$ with the coordinates (r, θ) and the symplectic form dr ∧ dθ. We say that X ⊂ M is stably non-displaceable if X × {r = 0} is non-displaceable in $M \times T^*S^1$ equipped with the split symplectic form ω̄ = ω ⊕ (dr ∧ dθ). Let us mention that detecting stably non-displaceable subsets is useful for studying geometry and dynamics of Hamiltonian flows (see for instance [50] for their role in Hofer's geometry and [51] for their appearance in the context of kick stability in Hamiltonian dynamics).
Formally speaking, the properties of strong and stable non-displaceability are mutually independent and both are strictly stronger than non-displaceability.

In the present paper we refine the machinery of partial symplectic quasi-states introduced in [23] and get new examples of stably non-displaceable sets, including certain fibers of moment maps of Hamiltonian torus actions as well as monotone Lagrangian submanifolds discussed by Albers [2] and Biran-Cornea [15]. Further, we address the following question: given the class of stably non-displaceable sets, can one distinguish those of them which are also strongly non-displaceable by means of the Floer theory? Or, the other way around, what are the Floer-homological features of stably non-displaceable but strongly displaceable sets? Toy examples are given by the equator of the symplectic two-sphere and by the meridian on a symplectic two-torus. Both are stably non-displaceable since their Lagrangian Floer homologies are non-trivial. On the other hand, the equator is strongly non-displaceable, while the meridian is strongly displaceable by a non-Hamiltonian shift. Later on we shall explain the difference between these two examples from the viewpoint of Hamiltonian Floer homology and present various generalizations.

The question of a Floer-homological characterization of (strongly) non-displaceable but stably displaceable sets is totally open; see Section 1.7.1 below for an example involving Gromov's packing theorem and discussion.

Leaving Floer-theoretical considerations for the next section, let us outline (in parts, informally) the general scheme of our results: Given a symplectic manifold (M,ω), we shall define (in the language of the Floer theory) two collections of closed subsets of M, heavy subsets and superheavy subsets. Every superheavy subset is heavy, but, in general, not vice versa.
Formally speaking, the hierarchy heavy-superheavy depends in a delicate way on the choice of an idempotent in the quantum homology ring of M. This and other nuances will be ignored in this outline. The key properties of these collections are as follows (see Theorems 1.2 and 1.5 below):

Invariance: Both collections are invariant under the group of all symplectomorphisms of M.

Stable non-displaceability: Every heavy subset is stably non-displaceable.

Intersections: Every superheavy subset intersects every heavy subset. In particular, superheavy subsets are strongly non-displaceable. In contrast to this, heavy subsets can be mutually disjoint and strongly displaceable.

Products: The product of any two (super)heavy subsets is (super)heavy.

What is inside the collections? The collections of heavy and superheavy sets include the following examples:

Stable stems: Let $A \subset C^\infty(M)$ be a finite-dimensional Poisson-commutative subspace (i.e. any two functions from A commute with respect to the Poisson brackets). Let $\Phi : M \to A^*$ be the moment map: $\langle \Phi(x), F \rangle = F(x)$. A non-empty fiber $\Phi^{-1}(p)$, $p \in A^*$, is called a stem of A (see [23]) if all non-empty fibers $\Phi^{-1}(q)$ with $q \neq p$ are displaceable, and a stable stem if they are stably displaceable. If a subset of M is a (stable) stem of a finite-dimensional Poisson-commutative subspace of $C^\infty(M)$, it will be called just a (stable) stem. Clearly, any stem is a stable stem. The collection of superheavy subsets includes all stable stems (see Theorem 1.6 below).

One readily shows that a direct product of stable stems is a stable stem and that the image of a stable stem under any symplectomorphism is again a stable stem.

The following example of a stable stem is borrowed (with a minor modification) from [23]: Let X ⊂ M be a closed subset whose complement is a finite disjoint union of stably displaceable sets. Then X is a stable stem.
For instance, the codimension-1 skeleton of a sufficiently fine triangulation of any closed symplectic manifold is a stable stem. Another example is given by the equator of $S^2$: it divides the sphere into two displaceable open discs and hence is a stable stem. By taking products, one can get more sophisticated examples of stable stems. Already the product of equators of the two-spheres gives rise to a Lagrangian Clifford torus in $S^2 \times \ldots \times S^2$. To prove its rigidity properties (such as stable non-displaceability) one has to use non-trivial symplectic tools such as Lagrangian Floer homology, see e.g. [44]. Products of the 1-skeletons of fine triangulations of the two-spheres can be considered as singular Lagrangian submanifolds, an object which is currently out of reach of the Lagrangian Floer theory.

Another example of stable stems comes from Hamiltonian torus actions. Consider an effective Hamiltonian action $\varphi : T^k \to \mathrm{Ham}(M)$ with the moment map $\Phi = (\Phi_1, \ldots, \Phi_k) : M \to \mathbb{R}^k$. Assume that $\Phi_i$ is a normalized Hamiltonian, that is $\int_M \Phi_i\, \omega^n = 0$ for all $i = 1, \ldots, k$. A torus action is called compressible if the image of the homomorphism $\varphi_\sharp : \pi_1(T^k) \to \pi_1(\mathrm{Ham}(M))$, induced by the action $\varphi$, is a finite group. One can show that for compressible actions the fiber $\Phi^{-1}(0)$ is a stable stem (see Theorem 1.7 below).

Special fibers of Hamiltonian torus actions: Consider an effective Hamiltonian torus action $\varphi$ on a spherically monotone symplectic manifold. Let $I : \pi_1(\mathrm{Ham}(M)) \to \mathbb{R}$ be the mixed action-Maslov homomorphism introduced in [49]. Since the target space $\mathbb{R}^k$ of the moment map $\Phi$ is naturally identified with $\mathrm{Hom}(\pi_1(T^k), \mathbb{R})$, the pull back $p_{spec} := -\varphi_\sharp^* I$ of the mixed action-Maslov homomorphism with the reversed sign can be considered as a point of $\mathbb{R}^k$. The preimage $\Phi^{-1}(p_{spec})$ is called the special fiber of the action. We shall see below that the special fiber is always non-empty.
For monotone symplectic toric manifolds (that is, when 2k = dim M) the special fiber is a monotone Lagrangian torus. Note that when the action is compressible we have $p_{spec} = 0$ and therefore the special fiber is a stable stem according to the previous example. It is unknown whether the latter property persists for general non-compressible actions. Thus in what follows we treat stable stems and special fibers as separate examples. The collection of superheavy subsets includes all special fibers (see Theorem 1.9 below).

For instance, consider $\mathbb{C}P^2$ and the Lagrangian Clifford torus in it (i.e. the torus $\{[z_0 : z_1 : z_2] \in \mathbb{C}P^2 \mid |z_0| = |z_1| = |z_2|\}$). Take the standard Hamiltonian $T^2$-action on $\mathbb{C}P^2$ preserving the Clifford torus. It has three global fixed points away from the Clifford torus. Make an equivariant symplectic blow-up, M, of $\mathbb{C}P^2$ at k of these fixed points, 0 ≤ k ≤ 3, so that the obtained symplectic manifold is spherically monotone. The torus action lifts to a Hamiltonian action on M. One can show that its special fiber is the proper transform of the Clifford torus.

Monotone Lagrangian submanifolds: Let $(M^{2n}, \omega)$ be a spherically monotone symplectic manifold, and let L ⊂ M be a closed monotone Lagrangian submanifold with the minimal Maslov number $N_L \ge 2$. We say that L satisfies the Albers condition [2] if the image of the natural morphism $H_*(L; \mathbb{Z}_2) \to H_*(M; \mathbb{Z}_2)$ contains a non-zero element S with
$$\deg S > \dim L + 1 - N_L.$$
The collection of heavy sets includes all closed monotone Lagrangian submanifolds satisfying the Albers condition (see Theorem 1.15 below).

Specific examples include the meridian on $T^2$, $\mathbb{R}P^n \subset \mathbb{C}P^n$ and all Lagrangian spheres in complex projective hypersurfaces of degree d in $\mathbb{C}P^{n+1}$ with n > 2d − 3. In the case when the fundamental class [L] of L divides a non-trivial idempotent in the quantum homology algebra of M, L is, in fact, superheavy (see Theorem 1.18 below). For instance, this is the case for $\mathbb{R}P^n \subset \mathbb{C}P^n$.
Furthermore, a version of superheaviness holds for any Lagrangian sphere in the complex quadric of even (complex) dimension.

However, there exist examples of heavy, but not superheavy, Lagrangian submanifolds: For instance, the meridian of the 2-torus is strongly displaceable by a (non-Hamiltonian!) shift and hence is not superheavy. Another example of a heavy but not superheavy Lagrangian submanifold is the sphere arising as the real part of the Fermat hypersurface
$$M = \{-z_0^d + z_1^d + \ldots + z_{n+1}^d = 0\} \subset \mathbb{C}P^{n+1}$$
with even d ≥ 4 and n > 2d − 3. We refer to Section 1.5 for more details on (super)heavy monotone Lagrangian submanifolds.

Motivation: Our motivation for the selection of examples appearing in the list above is as follows. Stable stems provide a playground for studying symplectic rigidity of singular subsets. In particular, no visible analogue of the conventional Lagrangian Floer homology technique is applicable to them.

Detecting (stable) non-displaceability of Lagrangian submanifolds via Lagrangian Floer homology is one of the central themes of symplectic topology. In contrast to this, detecting strong non-displaceability has at the moment the status of art rather than science. That is why we were intrigued by Albers' observation that monotone Lagrangian submanifolds satisfying his condition are in some situations strongly non-displaceable. In the present work we tried to digest Albers' results [2] and look at them from the viewpoint of the theory of partial symplectic quasi-states developed in [23]. In addition, our result on superheaviness of the Lagrangian anti-diagonal in $S^2 \times S^2$ allows us to detect an "exotic" monotone Lagrangian torus in this symplectic manifold: this torus does not intersect the anti-diagonal, and hence is not heavy, in contrast to the standard Clifford torus, see Example 1.20 below.

In [23] we proved a theorem which roughly speaking states that every (singular) coisotropic foliation has at least one non-displaceable fiber.
However, our proof is non-constructive and does not tell us which specific fibers are non-displaceable. The notion of the special fiber arose as an attempt to solve this problem for Hamiltonian circle actions.

Let us mention also that the product property enables us to produce even more examples of (super)heavy subsets by taking products of the subsets appearing in the list.

A few comments on the methods involved in our study of heavy and superheavy subsets are in order. These collections are defined in terms of partial symplectic quasi-states which were introduced in [23]. These are certain real-valued functionals on $C^\infty(M)$ with rich algebraic properties which are constructed by means of the Hamiltonian Floer theory and which conveniently encode a part of the information contained in this theory. In general, the definition of a partial symplectic quasi-state involves the choice of an idempotent element in the commutative part $QH_\bullet(M)$ of the quantum homology algebra of M. Though the default choice is just the unity of the algebra, there exist some other meaningful choices, in particular in the case when $QH_\bullet(M)$ is semi-simple. This gives rise to another theme discussed in this paper: "visible" topological obstructions to semi-simplicity (see Corollary 1.24 and Theorem 1.25 below). For instance, we shall show that if a monotone symplectic manifold M contains "too many" disjoint monotone Lagrangian spheres whose minimal Maslov numbers exceed n + 1, the quantum homology $QH_\bullet(M)$ cannot be semi-simple.

Let us pass to the precise set-up. For the reader's convenience, the material presented in this brief outline will be repeated in parts in the next sections in a less compressed form.

1.2 Preliminaries on quantum homology

The Novikov Ring: Let F denote a base field which in our case will be either $\mathbb{C}$ or $\mathbb{Z}_2$, and let Γ ⊂ R be a countable subgroup (with respect to the addition). Let s, q be formal variables.
Define a field $K_\Gamma$ whose elements are generalized Laurent series in s of the following form:
$$K_\Gamma := \Big\{ \sum_{\theta \in \Gamma} z_\theta s^\theta,\ z_\theta \in F,\ \sharp\{\theta > c \mid z_\theta \neq 0\} < \infty,\ \forall c \in \mathbb{R} \Big\}.$$
Define a ring $\Lambda_\Gamma := K_\Gamma[q, q^{-1}]$ as the ring of polynomials in $q, q^{-1}$ with coefficients in $K_\Gamma$. We turn $\Lambda_\Gamma$ into a graded ring by setting the degree of s to be 0 and the degree of q to be 2. The ring $\Lambda_\Gamma$ serves as an abstract model of the Novikov ring associated to a symplectic manifold.

Let (M,ω) be a closed connected symplectic manifold. Denote by $H_2^S(M)$ the subgroup of spherical homology classes in the integral homology group $H_2(M; \mathbb{Z})$. Abusing the notation we will write ω(A), $c_1(A)$ for the results of evaluation of the cohomology classes [ω] and $c_1(M)$ on $A \in H_2(M; \mathbb{Z})$. Set
$$\bar\pi_2(M) := H_2^S(M)/\sim,$$
where by definition A ∼ B iff ω(A) = ω(B) and $c_1(A) = c_1(B)$. Denote by $\Gamma(M,\omega) := [\omega](H_2^S(M)) \subset \mathbb{R}$ the subgroup of periods of the symplectic form on M on spherical homology classes. By definition, the Novikov ring of a symplectic manifold (M,ω) is $\Lambda_{\Gamma(M,\omega)}$. In what follows, when (M,ω) is fixed, we abbreviate and write Γ, K and Λ instead of Γ(M,ω), $K_{\Gamma(M,\omega)}$ and $\Lambda_{\Gamma(M,\omega)}$ respectively.

Quantum homology: Set 2n = dim M. The quantum homology $QH_*(M)$ is defined as follows. First, it is a graded module over Λ given by
$$QH_*(M) := H_*(M; F) \otimes_F \Lambda,$$
with the grading defined by the gradings on $H_*(M; F)$ and Λ:
$$\deg(a \otimes z s^\theta q^k) := \deg(a) + 2k.$$
Second, and most important, $QH_*(M)$ is equipped with a quantum product: if $a \in H_k(M; F)$, $b \in H_l(M; F)$, their quantum product is a class $a * b \in QH_{k+l-2n}(M)$, defined by
$$a * b = \sum_{A \in \bar\pi_2(M)} (a * b)_A \otimes s^{-\omega(A)} q^{-c_1(A)},$$
where $(a * b)_A \in H_{k+l-2n+2c_1(A)}(M)$ is defined by the requirement
$$(a * b)_A \circ c = GW_A^F(a, b, c) \quad \forall c \in H_*(M; F).$$
Here ◦ stands for the intersection index and $GW_A^F(a, b, c) \in F$ denotes the Gromov-Witten invariant which, roughly speaking, counts the number of pseudo-holomorphic spheres in M in the class A that meet cycles representing $a, b, c \in H_*(M; F)$ (see [55], [56], [41] for the precise definition).
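The grading conventions above can be illustrated by a short bookkeeping sketch in Python. It is a hedged illustration and not part of the paper: the class `Term` and the function `product_term` are hypothetical names, and the only input is the convention that s has degree 0, q has degree 2, and the component $(a * b)_A$ lies in $H_{k+l-2n+2c_1(A)}(M)$ and is weighted by $s^{-\omega(A)} q^{-c_1(A)}$. The sample values model the point class P of the 2-sphere, for which the standard computation (assumed here, not derived) gives $P * P = [S^2] \otimes s^{-\omega(A)} q^{-2}$, with A the generator of $H_2(S^2; \mathbb{Z})$ and $c_1(A) = 2$.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Term:
    """A monomial a (x) s^theta q^k, recording only the data needed for degrees."""
    hom_degree: int   # degree of the homology class a
    s_exp: Fraction   # exponent theta of s (s has degree 0)
    q_exp: int        # exponent k of q (q has degree 2)

    def degree(self) -> int:
        # deg(a (x) z s^theta q^k) = deg(a) + 2k
        return self.hom_degree + 2 * self.q_exp

def product_term(a: Term, b: Term, n: int, omega_A: Fraction, c1_A: int) -> Term:
    """Degree data of the class-A contribution (a*b)_A (x) s^{-omega(A)} q^{-c1(A)}.

    Only exponents and degrees are tracked; the Gromov-Witten
    coefficient itself is not computed here.
    """
    return Term(
        hom_degree=a.hom_degree + b.hom_degree - 2 * n + 2 * c1_A,
        s_exp=a.s_exp + b.s_exp - omega_A,
        q_exp=a.q_exp + b.q_exp - c1_A,
    )

# Example: M = S^2 (n = 1), P the point class, A the generator with c1(A) = 2.
# The area omega(A) = 1 is an arbitrary normalization chosen for this example.
point = Term(hom_degree=0, s_exp=Fraction(0), q_exp=0)
pp = product_term(point, point, n=1, omega_A=Fraction(1), c1_A=2)
```

Whatever the class A, the shift by $2c_1(A)$ in the homological degree is cancelled by the factor $q^{-c_1(A)}$, so every contribution to $a * b$ has degree $\deg(a) + \deg(b) - 2n$; in the example, `pp.degree()` agrees with `2 * point.degree() - 2`.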
Extending this definition by Λ-linearity to the whole $QH_*(M)$ one gets a correctly defined graded-commutative associative product operation ∗ on $QH_*(M)$ which is a deformation of the classical ∩-product in singular homology [37], [41], [55], [56], [69]. The quantum homology algebra $QH_*(M)$ is a ring whose unity is the fundamental class [M] and which is a module of finite rank over Λ. If $a, b \in QH_*(M)$ have graded degrees deg(a), deg(b), then
$$\deg(a * b) = \deg(a) + \deg(b) - 2n. \quad (1)$$

We will be mostly interested in the commutative part of the quantum homology ring (which in the case $F = \mathbb{Z}_2$ is, of course, the whole quantum homology ring). For this purpose we introduce the following notation: We denote by $QH_\bullet(M)$ the whole quantum homology $QH_*(M)$ if $F = \mathbb{Z}_2$ and the even-degree part of $QH_*(M)$ if $F = \mathbb{C}$. In general, given a topological space X, we denote by $H_\bullet(X; F)$ the whole singular homology group $H_*(X; F)$ if $F = \mathbb{Z}_2$ and the even-degree part of $H_*(X; F)$ if $F = \mathbb{C}$. Thus, in our notation the ring $QH_\bullet(M) = H_\bullet(M; F) \otimes_F \Lambda$ is always a commutative subring with unity of $QH_*(M)$ and a module of finite rank over Λ. We will identify Λ with a subring of $QH_\bullet(M)$ by $\lambda \mapsto [M] \otimes \lambda$.

1.3 An hierarchy of rigid subsets within Floer theory

Fix a non-zero idempotent $a \in QH_{2n}(M)$ (by obvious grading considerations the degree of every idempotent equals 2n). We shall deal with spectral invariants c(a,H), where $H = H_t : M \to \mathbb{R}$, $t \in \mathbb{R}$, is a smooth time-dependent and 1-periodic in time Hamiltonian function on M, or $c(a, \phi_H)$, where $\phi_H$ is an element of the universal cover $\widetilde{\mathrm{Ham}}(M)$ of Ham(M) represented by an identity-based path given by the time-1 Hamiltonian flow generated by H. If H is normalized, meaning that $\int_M H_t\, \omega^n = 0$ for all t, then $c(a,H) = c(a, \phi_H)$. These invariants, which nowadays are standard objects of the Floer theory, were introduced in [45] (cf.
[59] in the aspherical case; also see [42], [43] for an earlier version of the construction and [22] for a summary of definitions and results in the monotone case).

Disclaimer: Throughout the paper we tacitly assume that (M,ω) (as well as $(M \times T^2, \bar\omega)$, when we speak of stable displaceability) belongs to the class S of closed symplectic manifolds for which the spectral invariants are well defined and enjoy the standard list of properties (see e.g. [41, Theorem 12.4.4]). For instance, S contains all symplectically aspherical and spherically monotone manifolds. Furthermore, S contains all symplectic manifolds $M^{2n}$ for which, on one hand, either $c_1 = 0$ or the minimal Chern number (on $H_2^S(M)$) is at least n − 1 and, on the other hand, $[\omega](H_2^S(M))$ is a discrete subgroup of R (cf. [64]). The general belief is that the class S includes all symplectic manifolds.

Define a functional $\zeta : C^\infty(M) \to \mathbb{R}$ by
$$\zeta(H) := \lim_{l \to +\infty} \frac{c(a, lH)}{l}. \quad (2)$$
It is shown in [23] that the functional ζ has some very special algebraic properties (see Theorem 3.6) which form the axioms of a partial symplectic quasi-state introduced in [23]. The next definition is motivated in part by the work of Albers [2].

Definition 1.1. A closed subset X ⊂ M is called heavy (with respect to ζ or with respect to a used to define ζ) if
$$\zeta(H) \ge \inf_X H \quad \forall H \in C^\infty(M), \quad (3)$$
and is called superheavy (with respect to ζ or a) if
$$\zeta(H) \le \sup_X H \quad \forall H \in C^\infty(M). \quad (4)$$

The default choice of an idempotent a is the unity $[M] \in QH_*(M)$. In this case, as we shall see below, the collections of heavy and superheavy sets satisfy the properties listed in Section 1.1 and include the examples therein. In view of potential applications (including geometric obstructions to semi-simplicity of the quantum homology), we shall work, whenever possible, with general idempotents.
The asymmetry between $\sup_X H$ and $\inf_X H$ is related to the fact that the spectral numbers satisfy a triangle inequality $c(a * b, \phi_F \phi_G) \le c(a, \phi_F) + c(b, \phi_G)$, while there may not be a suitable inequality "in the opposite direction". In the case when such an "opposite" inequality exists (e.g. when a = b is an idempotent and ζ defined by it is a genuine symplectic quasi-state; see Section 1.6 below) the symmetry between $\sup_X H$ and $\inf_X H$ gets restored and the classes of heavy and superheavy sets coincide.

Let us emphasize that the notion of (super)heaviness depends on the choice of a coefficient ring for the Floer theory. In this paper the coefficients for the Floer theory will be either $\mathbb{Z}_2$ or $\mathbb{C}$ depending on the situation. Unless otherwise stated, our results on (super)heavy subsets are valid for any choice of the coefficients.

The group Symp(M) of all symplectomorphisms of M acts naturally on $H_*(M; F)$ and hence on $QH_*(M) = H_*(M; F) \otimes_F \Lambda$. Clearly, the identity component $\mathrm{Symp}_0(M)$ of Symp(M) acts trivially on $QH_*(M)$ and hence for any idempotent $a \in QH_*(M)$ the corresponding ζ is $\mathrm{Symp}_0(M)$-invariant. Thus the image of a (super)heavy set under an element of $\mathrm{Symp}_0(M)$ is again a (super)heavy set with respect to the same idempotent a. If a is invariant under the action of the whole Symp(M) (for instance, if a = [M]) the classes of heavy and superheavy sets with respect to a are invariant under the action of the whole Symp(M), in agreement with the invariance property presented in Section 1.1 above.

Let us mention also that the collections of (super)heavy sets enjoy a stability property under inclusions: If X, Y, X ⊂ Y, are closed subsets of M and X is heavy (respectively, superheavy) with respect to an idempotent a, then Y is also heavy (respectively, superheavy) with respect to the same a.

We are now ready to formulate the main results of the present section.

Theorem 1.2. Assume a and ζ are fixed. Then
(i) Every superheavy set is heavy, but, in general, not vice versa.
(ii) Every heavy subset is stably non-displaceable.
(iii) Every superheavy set intersects every heavy set. In particular, a superheavy set cannot be displaced by a symplectic (not necessarily Hamiltonian) isotopy and if the idempotent a is invariant under the symplectomorphism group of (M,ω) (e.g. if a = [M]), every superheavy set is strongly non-displaceable.

The following theorem discusses the relation between heaviness/superheaviness properties with respect to different idempotents. In particular, it shows that [M] plays a special role among all the other non-zero idempotents in QH∗(M).

Theorem 1.3. Assume a is a non-zero idempotent in the quantum homology.
(i) Every set that is superheavy with respect to [M] is also superheavy with respect to a.
(ii) Every set that is heavy with respect to a is also heavy with respect to [M].
(iii) Assume that the idempotent a is a sum of non-zero idempotents e1, . . . , el and assume that a closed subset X ⊂ M is heavy with respect to a. Then X is heavy with respect to ei for at least one i.

The next proposition shows that, in general, the heaviness of a set does depend on the choice of an idempotent in the quantum homology.

Proposition 1.4. Consider the torus T2n equipped with the standard symplectic structure ω = dp ∧ dq. Let M2n = T2n ♯ CPn be a symplectic blow-up of T2n at one point (the blow-up is performed in a small ball around the point). Assume that the Lagrangian torus L ⊂ T2n given by q = 0 does not intersect the ball in T2n where the blow-up was performed. Then the proper transform of L (identified with L) is a Lagrangian submanifold of M, which is not heavy with respect to some non-zero idempotent a ∈ QH∗(M) but heavy with respect to [M]. (Here we work with F = Z2.)

Next, consider direct products of (super)heavy sets. We start with the following convention on tensor products. Let Γi, i = 1, 2, be two countable subgroups of R. Let Ei be a module over KΓi.
We put
$$E_1 \hat{\otimes}_K E_2 = \left(E_1 \otimes_{K_{\Gamma_1}} K_{\Gamma_1+\Gamma_2}\right) \otimes_{K_{\Gamma_1+\Gamma_2}} \left(E_2 \otimes_{K_{\Gamma_2}} K_{\Gamma_1+\Gamma_2}\right). \tag{5}$$
If E1, E2 are also rings we automatically assume that the middle tensor product is the tensor product of rings. In simple words, we extend both modules to KΓ1+Γ2-modules and consider the usual tensor product over KΓ1+Γ2.

Given two symplectic manifolds, (M1, ω1) and (M2, ω2), note that the subgroups of periods of the symplectic forms satisfy
$$\Gamma(M_1 \times M_2, \omega_1 \oplus \omega_2) = \Gamma(M_1, \omega_1) + \Gamma(M_2, \omega_2).$$
Furthermore, due to the Künneth formula for quantum homology (see e.g. [41, Exercise 11.1.15] for the statement in the monotone case; the general case in our algebraic setup can be treated similarly) there exists a natural ring monomorphism, linear over KΓ1+Γ2,
$$QH_{2n_1}(M_1) \hat{\otimes}_K QH_{2n_2}(M_2) \hookrightarrow QH_{2n_1+2n_2}(M_1 \times M_2).$$
We shall fix a pair of idempotents ai ∈ QH∗(Mi), i = 1, 2. The notions of (super)heaviness in M1, M2 and M1 × M2 are understood in the sense of idempotents a1, a2 and a1 ⊗ a2 respectively.

Theorem 1.5. Assume that Xi is a heavy (resp. superheavy) subset of Mi with respect to some idempotent ai, i = 1, 2. Then the product X1 × X2 is a heavy (resp. superheavy) subset of M1 × M2 with respect to the idempotent a1 ⊗ a2 ∈ QH•(M1 × M2).

An important class of superheavy sets is given by stable stems introduced and illustrated in Section 1.1.

Theorem 1.6. Every stable stem is a superheavy subset with respect to any non-zero idempotent a ∈ QH∗(M). In particular, it is strongly and stably non-displaceable.

In the next section we present an example of stable stems coming from Hamiltonian torus actions.

1.4 Hamiltonian torus actions

Fibers of the moment maps of Hamiltonian torus actions form an interesting playground for testing the various notions of displaceability and heaviness introduced above. Throughout the paper we deal with effective actions only, that is, we assume that the map ϕ : Tk → Ham(M) defining the action is a monomorphism. Furthermore, we assume that the moment map Φ = (Φ1, . . .
,Φk) : M → Rk of the action is normalized: Φi is a normalized Hamiltonian for all i = 1, . . . , k. By the Atiyah-Guillemin-Sternberg theorem [6], [30], the image ∆ = Φ(M) of Φ is a k-dimensional convex polytope, called the moment polytope. The subsets Φ−1(p), p ∈ ∆, are called fibers of the moment map.

A torus action is called compressible if the image of the homomorphism ϕ♯ : π1(Tk) → π1(Ham(M)), induced by the action ϕ, is a finite group.

Theorem 1.7. Assume that (M,ω) is equipped with a compressible Hamiltonian Tk-action with moment map Φ and moment polytope ∆. Let Y ⊂ ∆ be any closed convex subset which does not contain 0. Then the subset Φ−1(Y) is stably displaceable. In particular, the fiber Φ−1(0) is a stable stem.

Note that for symplectic toric manifolds, that is when 2k = dim M, the point 0 is the barycenter of the moment polytope with respect to the Lebesgue measure. This follows from our assumption on the normalization of the moment map.

Theorems 1.6 and 1.7 imply that the fiber Φ−1(0) of a compressible torus action is stably non-displaceable, and thus we get the complete description of stably displaceable fibers for such actions.

In the case when the action is not compressible, the question of the complete description of stably non-displaceable fibers remains open. We make partial progress in this direction by presenting at least one such fiber, called the special fiber, explicitly in the case when (M,ω) is spherically monotone:
$$[\omega]|_{H_2^S(M)} = \kappa\, c_1(TM)|_{H_2^S(M)}, \quad \kappa > 0.$$
The special fiber can be described via the mixed action-Maslov homomorphism introduced in [49]: Let (M2n, ω) be a spherically monotone symplectic manifold, and let {ft}, t ∈ [0, 1], be any loop of Hamiltonian diffeomorphisms, with f0 = f1 = 1, generated by a 1-periodic normalized Hamiltonian function F(x, t). The orbits of any Hamiltonian loop are contractible due to the standard Floer theory¹. Pick any point x ∈ M and any disc u : D2 → M spanning the orbit γ = {ftx}.
Define the action² of the orbit by
$$A_F(\gamma, u) := \int_0^1 F(\gamma(t), t)\,dt - \int_{D^2} u^*\omega.$$
Trivialize the symplectic vector bundle u∗(TM) over D2 and denote by mF(γ, u) the Maslov index of the loop of symplectic matrices corresponding to {ft∗} with respect to the chosen trivialization. One readily checks that, in view of the spherical monotonicity, the quantity
$$I(F) := -A_F(\gamma, u) - \frac{\kappa}{2}\, m_F(\gamma, u)$$
does not depend on the choice of the point x and the disc u, and is invariant under homotopies of the Hamiltonian loop {ft}. In fact, I is a well defined homomorphism from π1(Ham(M)) to R (see [49], [68]).

Assume again that ϕ : Tk → Ham(M,ω) is a Hamiltonian torus action. Write ϕ♯ for the induced homomorphism of the fundamental groups. Since the target space Rk of the moment map Φ is naturally identified with Hom(π1(Tk),R), the pull-back $-\varphi_\sharp^* I$ of the mixed action-Maslov homomorphism with the reversed sign can be considered as a point of Rk. We call it a special point and denote it by pspec. The preimage Φ−1(pspec) is called the special fiber of the moment map. In the case k = 1, when Φ is a real-valued function on M, we will call pspec the special value of Φ.

¹ The Floer theory guarantees the existence of at least one contractible periodic orbit – this is not obvious a priori if {ft} is not an autonomous flow. Since all the orbits of {ft} are homotopic, all of them are contractible.

² Note that our action functional and the one in [49] are of opposite signs.

If k = n and M is a symplectic toric manifold, then pspec can be defined in purely combinatorial terms involving only the polytope ∆. Namely, pick a vertex $\mathbf{x}$ of ∆. Since ∆ in this case is a Delzant polytope [20], there is a unique (up to a permutation) choice of vectors v1, . . . , vn which
• originate at $\mathbf{x}$;
• span the n rays containing the edges of ∆ adjacent to $\mathbf{x}$;
• form a basis of Zn over Z.

Proposition 1.8.
$$p_{spec} = \mathbf{x} + \kappa \sum_{i=1}^{n} v_i. \tag{6}$$

Proof. The vertices of the moment polytope are in one-to-one correspondence with the fixed points of the action.
Let x ∈ M be the fixed point corresponding to the vertex $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$. Then the vectors $v_j = (v_j^1, \ldots, v_j^n)$, j = 1, . . . , n, are simply the weights of the isotropy Tn-action on TxM. Since the definition of the mixed action-Maslov invariant of a Hamiltonian circle action does not depend on the choice of a 1-periodic orbit and a disc spanning it, let us compute all $I_i$, i = 1, . . . , n, using the constant periodic orbit concentrated at the fixed point x and the constant disc u spanning it. Clearly,
$$A_{\Phi_i}(x, u) = \Phi_i(x) = \mathbf{x}_i \quad\text{and}\quad m_{\Phi_i}(x, u) = 2 \sum_{j=1}^{n} v_j^i \quad \forall i = 1, \ldots, n,$$
which readily yields formula (6).

E. Shelukhin pointed out to us that by summing up equations (6) over all the vertices $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(m)} \in \mathbb{R}^n$ of the moment polytope, one readily gets that
$$p_{spec} = \frac{1}{m} \sum_{j=1}^{m} \mathbf{x}^{(j)}.$$
Indeed, each edge of ∆ contributes a vector v at one of its endpoints and −v at the other, so the edge vectors cancel in the sum.

Theorem 1.9. Assume M2n is a spherically monotone symplectic manifold equipped with a Hamiltonian Tk-action. Then the special fiber of the moment map is superheavy with respect to any (non-zero) idempotent a ∈ QH2n(M). In particular, it is stably and strongly non-displaceable.

Let us mention that, in particular, the special fiber is non-empty and so pspec ∈ ∆. Moreover pspec is an interior point of ∆ – otherwise Φ−1(pspec) is isotropic of dimension < n and hence displaceable (see e.g. [9]).

Remark 1.10. If dim M = 2 dim Tk (that is, we deal with a symplectic toric manifold), the special fiber, say L, is a Lagrangian torus. In fact, this torus is monotone: for every D ∈ π2(M,L) we have
$$\int_D \omega = \kappa \cdot m_L(D),$$
where mL stands for the Maslov class of L. This is an immediate consequence of the definitions.

Remark 1.11. Note that when M is spherically monotone and the action is compressible, Theorems 1.7 and 1.9 match each other: in this case pspec = 0 and therefore the special fiber is a stable stem by Theorem 1.7. It is unknown whether this property persists for the special fibers of non-compressible actions.

Example 1.12.
Let M be the monotone symplectic blow-up of CP2 at k points (0 ≤ k ≤ 3) which is equivariant with respect to the standard T2-action and which is performed away from the Clifford torus in CP2. Since the blow-up is equivariant, M comes equipped with a Hamiltonian T2-action extending the T2-action on CP2. The Clifford torus is a fiber of the moment map of the T2-action on CP2. Let L ⊂ M be the Lagrangian torus which is the proper transform of the Clifford torus under the blow-up – it is a fiber of the moment map of the T2-action on M. Using Proposition 1.8 it is easy to see that L is the special fiber of M. According to Theorem 1.9, it is stably and strongly non-displaceable. In fact, it is a stem: the displaceability of all the other fibers was checked for k = 0 in [10], for k = 1 in [23] and for k = 2, 3 in [40].

We refer to Section 1.7.2 for further discussion of related problems and very recent advances.

Digression: Calabi vs. action-Maslov. The method used to prove Theorem 1.9 also allows us to prove the following result involving the mixed action-Maslov homomorphism. Denote by vol(M) the symplectic volume of M. Consider the function $\mu : \widetilde{\mathrm{Ham}}(M) \to \mathbb{R}$ defined by
$$\mu(\phi_H) := -\mathrm{vol}(M) \lim_{l \to +\infty} \frac{c(a, \phi_H^l)}{l}.$$
In the case when a is the unity in a field that is a direct summand in the decomposition of the K-algebra QH2n(M,ω), as an algebra, into a direct sum of subalgebras, µ is a homogeneous quasi-morphism on $\widetilde{\mathrm{Ham}}(M)$ called the Calabi quasi-morphism [22],[24],[46]; in the general case it has weaker properties [23]. With this language the functional ζ (on normalized functions) is induced (up to a constant factor) by the pull-back of µ to the Lie algebra of $\widetilde{\mathrm{Ham}}(M)$.

Following P. Seidel we described in [22] the restriction of µ (in fact, for any spherically monotone M) to $\pi_1(\mathrm{Ham}(M)) \subset \widetilde{\mathrm{Ham}}(M)$ in terms of the Seidel homomorphism $\pi_1(\mathrm{Ham}(M)) \to QH_*^{\times}(M)$, where $QH_*^{\times}(M)$ denotes the group of invertible elements in the ring QH∗(M).
Here we give an alternative description of µ|π1(Ham(M)) in terms of the mixed action-Maslov homomorphism I which, in turn, also provides certain information about the Seidel homomorphism.

Theorem 1.13. Assume M is spherically monotone and let µ be defined as above for some non-zero idempotent a ∈ QH∗(M). Then
$$\mu|_{\pi_1(\mathrm{Ham}(M))} = \mathrm{vol}(M) \cdot I.$$

Note that, in particular, µ|π1(Ham(M)) does not depend on the idempotent a used to define µ. The theorem also implies that µ descends to a quasi-morphism on Ham(M) if and only if I : π1(Ham(M)) → R vanishes identically (since µ descends to a quasi-morphism on Ham(M) if and only if µ|π1(Ham(M)) ≡ 0 – see e.g. [22], Prop. 3.4). The proof of the theorem is given in Section 9.1.

Let us mention also that, interestingly enough, the homomorphism I coincides with the restriction to π1(Ham(M)) of yet another quasi-morphism on $\widetilde{\mathrm{Ham}}(M)$ constructed by P. Py (see [52, 53]).

Digression: Action-Maslov homomorphism and Futaki invariant. This remark grew from an observation pointed out to us by Chris Woodward – we are grateful to him for that. Assume that our symplectic manifold M is complex Kähler (i.e. the symplectic structure on M is induced by the Kähler one) and Fano (by this we mean here that [ω] = c1). Assume also that a Hamiltonian S1-action {ft} preserves the Kähler metric and the complex structure. For instance, if M2n is a symplectic toric manifold it can be equipped canonically with a complex structure and a Kähler metric invariant under the Tn-action on M, hence under the action of any S1-subgroup {ft} of Tn.

Let V be the Hamiltonian vector field generating the Hamiltonian flow {ft}. Since {ft} preserves the complex structure, one can associate to V its Futaki invariant F(V) ∈ C [29]. It has been checked by E. Shelukhin [63] that, up to a universal constant factor, this Futaki invariant is equal to the value of the mixed action-Maslov homomorphism on the loop {ft}: F(V) = const · I({ft}).
Note that if such an M admits a Kähler-Einstein metric then the Futaki invariant has to vanish [29] – thus if I({ft}) ≠ 0 the manifold does not admit a Kähler-Einstein metric. Moreover, if M2n is toric the converse is also true: if the Futaki invariant vanishes for any V generating a subgroup of the torus Tn acting on M then M admits a Kähler-Einstein metric – this follows from a theorem by Wang and Zhu [67], combined with a previous result of Mabuchi [38]. In terms of the moment polytope, the vanishing of the Futaki invariant, and accordingly the existence of a Kähler-Einstein metric, on a Kähler Fano toric manifold means precisely that the special point of the polytope coincides with the barycenter.

1.5 Super(heavy) monotone Lagrangian submanifolds

Let (M2n, ω) be a closed spherically monotone symplectic manifold with [ω] = κ · c1(TM) on π2(M), κ > 0. Let L ⊂ M be a closed monotone Lagrangian submanifold with the minimal Maslov number NL ≥ 2. As usual, we put NL = +∞ if π2(M,L) = 0.

As before, we work with the basic field F which is either Z2 or C. In the case F = C, we assume that L is relatively spin, that is, L is orientable and the 2nd Stiefel-Whitney class of L is the restriction of some integral cohomology class of M.

Disclaimer: In the case F = C the results of this section are conditional: We take for granted that Proposition 8.1 below, which was proved by Biran and Cornea [15] for homologies with Z2-coefficients, extends to homologies with C-coefficients. In each of the specific examples below we will explicitly state which F we are using and whenever we use F = C we assume that L is relatively spin.

Denote by j the natural morphism j : H•(L;F) → H•(M;F). We say that L satisfies the Albers condition [2] if there exists an element S ∈ H•(L;F) so that j(S) ≠ 0 and
$$\deg S > \dim L + 1 - N_L.$$
We shall refer to such S as an Albers element of L.

Example 1.14. Assume [L] ∈ H•(L;F) and j([L]) ∈ H•(M;F) is non-zero.
This means precisely that [L] is an Albers element of L. A closed monotone Lagrangian submanifold L which satisfies this condition (and whose minimal Maslov number is greater than 1) will be called homologically non-trivial in M.

Theorem 1.15. Let L be a closed monotone Lagrangian submanifold satisfying the Albers condition. Then L is heavy with respect to [M]. In particular, any homologically non-trivial Lagrangian submanifold is heavy with respect to [M].

Example 1.16. Assume that π2(M,L) = 0. Then the homology class of a point is an Albers element of L, and hence L is heavy. Note that in this case heaviness cannot be improved to superheaviness: the meridian on the two-torus is heavy but not superheavy. Here we took F = Z2.

Example 1.17 (Lagrangian spheres in Fermat hypersurfaces). More examples of heavy (but not necessarily superheavy) monotone Lagrangian submanifolds can be constructed as follows³.

³ We thank P. Biran for his indispensable help with these examples.

Let $M \subset \mathbb{C}P^{n+1}$ be a smooth complex hypersurface of degree d. The pull-back of the standard symplectic structure from $\mathbb{C}P^{n+1}$ turns M into a symplectic manifold (of real dimension 2n). If d ≥ 2, then, as explained, for instance, in [12], M contains a Lagrangian sphere: M can be included into a family of algebraic hypersurfaces of $\mathbb{C}P^{n+1}$ with quadratic degenerations at isolated points, and the vanishing cycle of such a degeneration can be realized by a Lagrangian sphere following [5], [21], [60], [61], [62].

Let $M \subset \mathbb{C}P^{n+1}$ be a projective hypersurface of degree d, 2 ≤ d < n + 2. The minimal Chern number of M equals N := n + 2 − d > 0. Let Ln ⊂ M2n be a simply connected Lagrangian submanifold (for instance, a Lagrangian sphere).

First, consider the case when n is even, L is relatively spin and the Euler characteristic of L does not vanish (this is the case for a sphere). Then the
homology class j([L]) ∈ Hn(M;Z) is non-zero: its self-intersection number in M up to the sign equals the Euler characteristic. Thus [L] is an Albers element. (Here we use F = C.) In view of Theorem 1.15, L is heavy with respect to [M].

Second, suppose that n is of arbitrary parity but n > 2d − 3, and no restriction on the Euler characteristic of L is assumed anymore. This yields NL = 2N > n + 1 and thus L satisfies the Albers condition with the class of a point P as an Albers element. Thus L is heavy with respect to [M] – here we use F = Z2.

Finally, fix n ≥ 3 and an even number d such that 4 ≤ d < n + 2. Consider a Fermat hypersurface of degree d
$$M = \{-z_0^d + z_1^d + \ldots + z_{n+1}^d = 0\} \subset \mathbb{C}P^{n+1}.$$
Its real part $L := M \cap \mathbb{R}P^{n+1}$ lies in the affine chart z0 ≠ 0 and is given by the equation
$$x_1^d + \ldots + x_{n+1}^d = 1, \quad\text{where } x_j := \mathrm{Re}(z_j/z_0).$$
Since d is even, L is an n-dimensional sphere. As was explained above, L is heavy with respect to [M] if either n is even (and F = C) or n > 2d − 3 (and F = Z2). However, in either case L is not superheavy with respect to [M]. Indeed, let $\Sigma_d \approx \mathbb{Z}_d$ be the group of complex d-th roots of unity. Given a vector $\alpha = (\alpha_1, \ldots, \alpha_{n+1}) \in (\Sigma_d)^{n+1}$, denote by fα the symplectomorphism of M given by
$$f_\alpha(z_0 : z_1 : \ldots : z_{n+1}) = (z_0 : \alpha_1 z_1 : \ldots : \alpha_{n+1} z_{n+1}). \tag{7}$$
If all αj ∈ C \ R, then αjx ∉ R whenever x ∈ R \ {0}, and thus fα(L) ∩ L = ∅. Therefore L is strongly displaceable and the claim follows from part (iii) of Theorem 1.2.

The next result gives a user-friendly sufficient condition for superheaviness.

Theorem 1.18. Assume L is homologically non-trivial in M and assume a ∈ QH2n(M) is a non-zero idempotent divisible by j([L]) in QH•(M), that is, a ∈ j([L]) ∗ QH•(M). Then L is superheavy with respect to a.

The homological non-triviality of L in the hypothesis of the theorem means just that [L] is an Albers element of L (see Example 1.14). In fact, the theorem can be generalized to the cases when L has other Albers elements – see Remark 8.3 (ii).
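The inequality NL = 2N > n + 1 used in the second case of Example 1.17 can be checked directly. The following short computation is only a sanity check, assuming (as stated in the example) that N = n + 2 − d and that NL = 2N for a simply connected L:

```latex
\[
  N_L = 2N = 2(n+2-d) > n+1
  \iff 2n + 4 - 2d > n + 1
  \iff n > 2d - 3 .
\]
% With N_L > n+1, the right-hand side of the Albers condition
%   \deg S > \dim L + 1 - N_L = n + 1 - N_L
% is negative, so the class of a point (degree 0) satisfies the degree bound.
```

This is why the hypothesis n > 2d − 3 is exactly what makes the class of a point an Albers element in that case.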
Example 1.19 (Lagrangian spheres in quadrics). Here we work with F = C. Let M be the real part of the Fermat quadric
$$M = \Big\{-z_0^2 + \sum_{j=1}^{n+1} z_j^2 = 0\Big\}.$$
Assume that n is even and L is a simply connected Lagrangian submanifold with non-vanishing Euler characteristic (e.g. a Lagrangian sphere). Under this assumption, [L] ∈ H•(L) and j([L]) ≠ 0, since L has non-vanishing self-intersection. Denote by p ∈ H∗(M;F) the class of a point. The quantum homology ring of M was described by Beauville in [8]. In particular, $p * p = w^{-2}[M]$, where $w = s^{\kappa n} q^{n}$. Thus
$$a_\pm := \frac{[M] \pm pw}{2}$$
are idempotents. One can show that j([L]) divides a− and hence L is a−-superheavy. Since a− is invariant under the action of Symp(M), the manifold L is strongly non-displaceable.

For simplicity, we present the calculation in the case n = 2 – the general case is absolutely analogous. The 2-dimensional quadric is symplectomorphic to (S2 × S2, ω ⊕ ω). Denote by A and B the classes of [S2] × [point] and [point] × [S2] respectively. Since the symplectic form vanishes on j([L]) we get that j([L]) = l(B − A) with l ≠ 0. It is known that $A * B = p$ and $B * B = w^{-1}[M]$. Thus
$$j([L]) * \frac{1}{2l}\, wB = a_-,$$
that is, j([L]) divides a−. In particular, the Lagrangian anti-diagonal
$$\Delta := \{(x, y) \in S^2 \times S^2 : x = -y\},$$
which is diffeomorphic to the 2-sphere, is superheavy with respect to a−. It is unknown whether ∆ is superheavy with respect to a+.

Further information on superheavy Lagrangian submanifolds in the quadrics can be extracted from [15].

Example 1.20 (A non-heavy monotone Lagrangian torus in S2 × S2). Consider the quadric M = S2 × S2 from the previous example. We will think of S2 as the unit sphere in R3 whose symplectic form is the area form divided by 4π. We will work again with F = C. Interestingly enough, such an M contains a monotone Lagrangian torus that is not heavy with respect to a−. Namely, consider a submanifold K given by the equations⁴
$$K = \Big\{(x, y) \in S^2 \times S^2 : x_1y_1 + x_2y_2 + x_3y_3 = -\tfrac{1}{2},\ x_3 + y_3 = 0\Big\}.$$
⁴ We thank Frol Zapolsky for his help with calculations in this example.

One readily checks that K is a monotone Lagrangian torus with NK = 2 which represents a zero element in H2(M;F) (both with F = C and F = Z2). Thus H•(K;F) does not contain any Albers element. Furthermore, K is disjoint from the Lagrangian anti-diagonal ∆ and hence is not heavy with respect to a− since, as was shown above, ∆ is superheavy with respect to a−. In particular, K is an exotic monotone torus: it is not symplectomorphic to the Clifford torus, which is a stem and hence a−-superheavy. A further study of exotic tori in products of spheres is currently being carried out by Y. Chekanov and F. Schlenk.

It is an interesting problem to understand whether K is superheavy with respect to a+, or at least non-displaceable. Identify M \ {the diagonal} with the unit co-ball bundle of the 2-sphere. After such an identification ∆ corresponds to the zero section, while K corresponds to a monotone Lagrangian torus, say K′. Interestingly enough, the Lagrangian Floer homology of K′ in T∗S2 (with F = Z2) does not vanish, as was shown by Albers and Frauenfelder in [3], and thus K is not displaceable in M \ {the diagonal}. Thus the question on (non)-displaceability of K is related to understanding the effect of the compactification of the unit co-ball bundle to S2 × S2.

The proofs of the theorems above are based on spectral estimates due to Albers [2] and Biran-Cornea [15]. Furthermore, the results above admit various generalizations in the framework of the Biran-Cornea theory of quantum invariants for monotone Lagrangian submanifolds, see [15] and the discussion in Section 8 below.

1.6 An effect of semi-simplicity

Recall that a commutative (finite-dimensional) algebra Q over a field A is called semi-simple if it splits into a direct sum of fields as follows: Q = Q1 ⊕ . .
. ⊕ Qd, where
• each Qi ⊂ Q is a finite-dimensional linear subspace over A;
• each Qi is a field with respect to the induced ring structure;
• the multiplication in Q respects the splitting: (a1, . . . , ad) · (b1, . . . , bd) = (a1b1, . . . , adbd).

A classical theorem of Wedderburn (see e.g. [66], §96) implies that semi-simplicity is equivalent to the absence of nilpotents in the algebra.

Remark 1.21. Assume that the K-algebra QH2n(M,ω) splits, as an algebra, into a direct sum of two algebras, at least one of which is a field, and let e be the unity in that field. In particular, this is the case when QH2n(M,ω) = Q1 ⊕ . . . ⊕ Qd is semi-simple and e is the unity in one of the fields Qi. A slight generalization of the argument in [23, 46] (see [24], the remark on pp. 56-57) shows that the partial quasi-state ζ(e, ·) associated to e is R-homogeneous (and not just R+-homogeneous as in the general case). This immediately yields that every set which is heavy with respect to e is automatically superheavy with respect to e.

In fact, in this situation ζ is a genuine symplectic quasi-state in the sense of [23] and, in particular, a topological quasi-state in the sense of Aarnes [1] (see [23] for details). In [1] Aarnes proved an analogue of the Riesz representation theorem for topological quasi-states which generalizes the correspondence between genuine states (that is, positive linear functionals on C(M)) and measures. The object τζ corresponding to a quasi-state ζ is called a quasi-measure (or a topological measure). With this language in place, the sets that are (super)heavy with respect to ζ are nothing else but the closed sets of the full quasi-measure τζ.
Any two such sets have to intersect for the following basic reason: any quasi-measure is finitely additive on disjoint closed subsets and therefore if two closed subsets of M of the full quasi-measure do not intersect, the quasi-measure of their union must be greater than the total quasi-measure of M, which is impossible.

Example 1.22. In this example we again assume that F = Z2. Let $M = \mathbb{C}P^n$ be equipped with the Fubini-Study symplectic structure ω, normalized so that [ω] = c1, and let A ∈ H2n−2(M) be the homology class of the hyperplane. One readily verifies the following K-algebra isomorphism
$$QH_{2n}(M) \cong K[X]/\langle X^{n+1} - u^{-1} \rangle,$$
where
$$K = \mathbb{Z}_2[[u] = \{ z_k u^k + z_{k-1} u^{k-1} + \ldots,\ z_i \in \mathbb{Z}_2\ \forall i \}$$
is the field of Laurent-type series in $u := s^{n+1}$ with coefficients in Z2 and X = qA. Since no root of degree 2 or more of $u^{-1}$ is contained in K, the polynomial $P(X) = X^{n+1} - u^{-1}$ is irreducible over K for any n (see e.g. [34], Theorem 9.1) and therefore QH2n(M) is a field. Hence the collections of heavy and superheavy sets with respect to the fundamental class coincide.

We claim that $L := \mathbb{R}P^n \subset \mathbb{C}P^n$ is superheavy. The case n = 1 corresponds to the equator of the sphere, which is known to be a stable stem. For n ≥ 2, note that NL = n + 1 and $S = [\mathbb{R}P^2]$ is an Albers element of L. Therefore, L is [M]-heavy by Theorem 1.15, and hence superheavy.

The next result follows directly from Theorem 1.3 (iii) and Remark 1.21:

Theorem 1.23. Assume that QH2n(M) is semi-simple and splits into a direct sum of d fields whose unities will be denoted by e1, . . . , ed. Assume that a closed subset X ⊂ M is heavy with respect to a non-zero idempotent a – as one can easily see, such an idempotent has to be of the form a = ej1 + . . . + ejl for some 1 ≤ j1 < . . . < jl ≤ d. Then X is superheavy with respect to some eji, 1 ≤ i ≤ l.

The theorem yields the following geometric characterization of non-semi-simplicity of QH2n(M).
Namely, define the symplectic Torelli group as the group of all symplectomorphisms of M which induce the identity map on H•(M;F). For instance, this group contains Symp0(M). Note that any element of the symplectic Torelli group acts trivially on the quantum homology of M and hence maps sets (super)heavy with respect to an idempotent a to sets (super)heavy with respect to a. Now Theorem 1.23 readily implies the following

Corollary 1.24. Assume that (M,ω) contains a closed subset X which is heavy with respect to a non-zero idempotent and displaceable by a symplectomorphism from the symplectic Torelli group. Then QH2n(M) is not semi-simple.

The simplest examples are provided by sets of the form X × {a meridian} in M × T2 with a heavy X.

Another result in the same vein is as follows⁵. Given a set Y of positive integers, put
$$\beta_Y(M) = \sum_{i \in Y} \beta_i(M),$$
where βi(M) stands for the i-th Betti number of M over F.

⁵ In the case F = C, Theorem 1.25 is conditional, see the disclaimer in the previous section.

Theorem 1.25. Assume that either of the following (not mutually exclusive) conditions holds:
(a) M contains m > βY(M) + 1 pair-wise disjoint closed monotone Lagrangian submanifolds whose minimal Maslov numbers are greater than n + 1 and belong to a set Y of positive integers.
(b) M contains pair-wise disjoint homologically non-trivial Lagrangian submanifolds⁶ whose fundamental classes, viewed as (non-zero) elements of H•(M;F), are linearly dependent over F.
(In the case F = C assume that all the Lagrangian submanifolds above are also relatively spin.)
Then QH2n(M) is not semi-simple.

The proof is given in Section 8.

Example 1.26. For instance, if all the Lagrangian submanifolds from part (a) of the theorem are simply connected, their minimal Maslov numbers are equal to 2N, so that the set Y consists of one element: Y = {2N}.
Thus if 2N > n + 1 and QH2n(M) is semi-simple, M cannot contain more than β2N(M) + 1 pair-wise disjoint simply-connected Lagrangians (provided all of them are relatively spin if we work with F = C).

Example 1.27. Set F = C. Fix n ≥ 11 and an even number d such that 6 ≤ d < (n + 3)/2. Consider a Fermat hypersurface of degree d
$$M = \{-z_0^d + z_1^d + \ldots + z_{n+1}^d = 0\} \subset \mathbb{C}P^{n+1}.$$
As we already saw in Example 1.17, the manifold $L := M \cap \mathbb{R}P^{n+1}$ is an n-dimensional Lagrangian sphere. Consider the images fα(L), where the symplectomorphisms fα are defined by (7). Note that, as long as αj/βj ≠ ±1 for all j, the Lagrangian spheres fα(L) and fβ(L) are disjoint. Using this observation, it is easy to find d/2 disjoint Lagrangian spheres in M. The minimal Chern number N of M equals n + 2 − d, and so 2N lies in the interval [n + 2, 2n − 4]. In this case β2N(M) = 1 (see e.g. [31]). Since d/2 > 2, we conclude from the previous example that QH2n(M) is not semi-simple. This conclusion agrees with the computation of QH∗(M) by Beauville [8].

⁶ See Example 1.14 for the definition. As in that example we again assume that all our Lagrangian submanifolds are closed, monotone and have minimal Maslov number greater than 1.

It would be interesting to find examples of symplectic manifolds where the quantum homology is not known a priori and where the above theorems are applicable.

Let us mention that different obstructions to the semi-simplicity of QH•(M) coming from Lagrangian submanifolds were recently found by Biran and Cornea [14].

1.7 Discussion and open questions

1.7.1 Strong displaceability beyond Floer theory?

Clearly, displaceability implies stable displaceability. The converse is not true, as the next example shows:

Example 1.28. Consider the complex projective space $\mathbb{C}P^n$ equipped with the Fubini-Study symplectic form (in our normalization the area of a line equals 1).
Identify $\mathbb{C}P^n$ with the symplectic cut of the Euclidean ball $B(1) \subset \mathbb{C}^n$ (that is, the boundary of B(1) is collapsed to $\mathbb{C}P^{n-1}$ along the fibers of the Hopf fibration, see [36]), where $B(r) := \{\pi|z|^2 \le r\}$. Then $B(r) \subset \mathbb{C}P^n$ is
(i) displaceable for r < 1/2;
(ii) strongly non-displaceable but stably displaceable for r ∈ [1/2, n/(n + 1));
(iii) strongly and stably non-displaceable for r ≥ n/(n + 1).

It is instructive to analyze the techniques involved in the proofs: The strong non-displaceability result in (ii) is an immediate consequence of Gromov’s packing-by-two-balls theorem, which is proved via the J-holomorphic variant of the theorem which states that there exists a J-holomorphic line in $\mathbb{C}P^n$ passing through any two points. In the case (iii) the ball B(r) contains the Clifford torus, which is stably non-displaceable. This follows either from the fact that the Clifford torus is a stem (see [10]), or from non-vanishing of its Lagrangian Floer homology [16].

The displaceability of B(r) in (i) follows from the explicit construction of the two balls packing (see [33]). The stable displaceability in (ii) is a direct consequence of Theorem 1.7 above: Indeed, consider the standard Tn-action on $\mathbb{C}P^n$. The normalized moment polytope ∆ ⊂ Rn has the form ∆ = ∆stand + w, where ∆stand is the standard simplex $\{\rho_i \ge 0,\ \sum_i \rho_i \le 1\}$ in Rn, where (ρ1, . . . , ρn) denote coordinates in Rn, and $w = -\frac{1}{n+1}(1, \ldots, 1)$. Note that the ball B(r) equals Φ−1(∆r), where ∆r := r · ∆stand + w. Note that ∆r does not contain the origin exactly when
$$r < \frac{n}{n+1},$$
which yields the stable displaceability in (ii) above.

A mysterious feature of Example 1.28 is as follows. On the one hand, we believe in the following general empirical principle: whenever one can establish the non-displaceability of a subset by means of the Floer homology theory, one gets for free the stable non-displaceability.
On the other hand, we believe, following a philosophical explanation provided by Biran, that Gromov’s packing-by-two-balls theorem may be extracted from some “operations” in Floer homology. Example 1.28 shows that at least one of these beliefs is wrong. It would be interesting to clarify this issue.

1.7.2 Heavy fibers of Poisson-commutative subspaces

It was shown in [23] that for any finite-dimensional Poisson-commutative subspace A ⊂ C∞(M) at least one of the fibers of its moment map Φ has to be non-displaceable.

Question. Is it true that at least one fiber of Φ has to be heavy (with respect to some non-zero idempotent a ∈ QH∗(M))?

It is easy to construct an example of A whose moment map Φ has no superheavy fibers: take $\mathbb{T}^2$ with the coordinates p, q mod 1 on it and take A to be the set of all smooth functions depending only on p – the corresponding Φ defines the fibration of $\mathbb{T}^2$ by meridians none of which is superheavy.

Here is another question which concerns fibers of symplectic toric manifolds, i.e. fibers of a moment map Φ of an effective Hamiltonian $\mathbb{T}^n$-action on $(M^{2n}, \omega)$. Assume M is (spherically) monotone. Theorem 1.9 shows that in such a case the special fiber of M is superheavy, hence stably and strongly non-displaceable. In all the examples where it has been checked this turns out to be the only non-displaceable fiber of M.

Question. Is the special fiber for a monotone symplectic toric M always a stem? In particular, is it the only non-displaceable fiber of the moment map?

In the monotone case the special fiber is clearly the only heavy fiber of the moment map, because it is superheavy and any other heavy fiber would have had to intersect it. On the other hand, if we consider a Hamiltonian $\mathbb{T}^k$-action on $M^{2n}$ with k < n there can be more than one non-displaceable fiber of the moment map – for instance, because of purely topological obstructions: the simplest Hamiltonian $\mathbb{T}^1$-action on $\mathbb{C}P^2$ provides such an example.
In the case of monotone symplectic toric manifolds of dimension bigger than 4 the question above is absolutely open. After the first draft of this paper appeared, remarkable progress in this direction was achieved in the works by Cho [17] and Fukaya, Oh, Ohta and Ono [28]: in particular, it turns out that a non-monotone symplectic toric manifold can have more than one non-displaceable fiber – this happens already for certain equivariant blowups of $\mathbb{C}P^2$.

Organization of the paper:

In Section 2 we prove Theorem 1.7 which in particular states that the special fiber of a compressible torus action is a stable stem.

In Section 3 we sum up various preliminaries from Floer theory including basic properties of spectral invariants and partial symplectic quasi-states. In addition we spell out a useful property of the Conley-Zehnder index: it is a quasi-morphism on the universal cover of the symplectic group (see Proposition 3.5). For completeness we extract a proof of this property from [54]; alternatively, one can use the results of [19].

In Section 4 we prove parts (i) and (iii) of Theorem 1.2 and Theorem 1.3 on basic properties of (super)heavy sets.

In Section 5 we prove Theorem 1.5 on products of (super)heavy sets. Our approach is based on a quite general product formula for spectral invariants (Theorem 5.1), which is proved by a fairly lengthy algebraic argument.

In Section 6 we prove Theorem 1.2 (ii) on stable non-displaceability of heavy subsets. The argument involves a “baby version” of the above-mentioned product formula.

In Section 7 we prove superheaviness of stable stems.

In Section 8 we bring together the proofs of various results related to (super)heaviness of monotone Lagrangian submanifolds satisfying the Albers condition, including Theorems 1.15, 1.18, 1.25 and Proposition 1.4.

In Section 9 we prove Theorem 1.9 on superheaviness of special fibers of Hamiltonian torus actions on monotone symplectic manifolds. The proof is quite involved.
In fact, two tricks enabled us to shorten our original argument: First, we use the Fourier transform on the space of rapidly decaying functions on the Lie coalgebra of the torus in order to reduce the problem to the case of Hamiltonian circle actions. Second, we systematically use the quasi-morphism property of the Conley-Zehnder index for asymptotic calculations with Hamiltonian spectral invariants. Finally, in Section 9.1 we prove Theorem 1.13.

Figure 1 sums up the hierarchy of the non-displaceability properties discussed above.

[Diagram not reproduced. It relates, by numbered arrows, the following properties of a closed subset of M: heavy and superheavy with respect to a non-zero idempotent a; heavy and superheavy with respect to [M]; stable stem; stably non-displaceable; strongly non-displaceable; non-displaceable; monotone Lagrangian submanifold L; product of codimension-1 skeletons of fine triangulations; special and zero fibers of Hamiltonian torus actions. The legend distinguishes arrows that are always true, true under certain conditions (see below), and questions.]

Figure 1: Hierarchy of non-displaceability properties

(1), (2), (6), (19) Trivial.

(3) True if a is invariant under the action of the whole group Symp (M) – Theorem 1.2, part (iii).

(4), (9) Theorem 1.2, part (iii).

(5) True if the algebra $QH_{2n}(M)$ is semi-simple – see Corollary 1.24.

(7a) True if the algebra $QH_{2n}(M)$ splits, as an algebra, into a direct sum of two algebras, at least one of which is a field, and a is the unity element in that field – see Remark 1.21.

(7b), (16b) Theorem 1.2, part (i).

(8) Theorem 1.2, part (ii).

(10) Theorem 1.18 (see the assumptions on L there).

(11) True if the algebra $QH_{2n}(M)$ is semi-simple – see Corollary 1.24.

(12) Theorem 1.3, part (i).

(13) Theorem 1.3, part (ii).
(14) Theorem 1.18 (see the assumptions on L there) with a = [M] – i.e. j(L) is invertible in QH•(M).

(15) L satisfies the Albers condition – see Theorem 1.15.

(16a) True if $QH_{2n}(M)$ is a field – see Remark 1.21.

(17) Theorem 1.6.

(18) Theorem 1.9.

(20) Theorem 1.7.

(21) Is the special fiber for a monotone symplectic toric M always a stem? See Section 1.7.2.

(22) True if M is spherically monotone and the torus action is compressible – see Remark 1.11.

(23) See [23].

2 Detecting stable displaceability

For detecting stable displaceability of a subset of a symplectic manifold we shall use the following result (cf. [48, Chapter 6]).

Theorem 2.1. Let X be a closed subset of a closed symplectic manifold (M, ω). Assume that there exists a contractible loop of Hamiltonian diffeomorphisms of (M, ω) generated by a normalized time-periodic Hamiltonian $H_t(x)$ so that $H_t(x) \neq 0$ for all t ∈ [0, 1] and x ∈ X. Then X is stably displaceable.

Proof. Denote by $h_t$ the Hamiltonian loop generated by H. Let $h^{(s)}_t$ be its homotopy to the constant loop: $h^{(1)}_t = h_t$ and $h^{(0)}_t = 1$. Write $H^{(s)}(x, t)$ for the corresponding normalized Hamiltonians. Consider the family of diffeomorphisms $\Psi_s$ of $M \times T^*S^1$ given by

$\Psi_s(x, r, \theta) = (h^{(s)}_\theta x, \ r - H^{(s)}(h^{(s)}_\theta x, \theta), \ \theta).$

One readily checks that $\Psi_s$, s ∈ [0, 1], is a Hamiltonian isotopy (not compactly supported). We claim that $\Psi_1$ displaces Y := X × {r = 0}. Indeed, if $\Psi_1(x, 0, \theta) \in Y$ we have $h_\theta x \in X$ and $H_\theta(h_\theta x) = 0$, which contradicts the assumption of the theorem. This completes the proof.

Proof of Theorem 1.7: Choose a linear functional $F : \mathbb{R}^k \to \mathbb{R}$ with rational coefficients which is strictly positive on Y. Then for some sufficiently large positive integer N the Hamiltonian $H := N\Phi^*F$ generates a contractible Hamiltonian circle action on M and H is strictly positive on $X := \Phi^{-1}(Y)$. Thus X is stably displaceable in view of the previous theorem.

3 Preliminaries on Hamiltonian Floer theory

3.1 Valuation on QH∗(M)

Define a function ν : K → Γ by

$\nu\bigl(\sum_\theta z_\theta s^\theta\bigr) = \max\{\theta \mid z_\theta \neq 0\}.$
The convention is that ν(0) = −∞. In algebraic terms, exp ν is a non-Archimedean absolute value on K. The function ν admits a natural extension to Λ and then to QH∗(M) – abusing the notation we will denote all of them by ν. Namely, any element λ ∈ Λ can be uniquely represented as $\lambda = \sum_\theta u_\theta s^\theta$, where each $u_\theta$ belongs to $F[q, q^{-1}]$, and any non-zero a ∈ QH∗(M) can be uniquely represented as $a = \sum_i \lambda_i b_i$, $0 \neq \lambda_i \in \Lambda$, $0 \neq b_i \in H_*(M; F)$. Define

$\nu(\lambda) := \max\{\theta \mid u_\theta \neq 0\}, \qquad \nu(a) := \max_i \nu(\lambda_i).$

3.2 Hamiltonian Floer theory

We briefly recall the notation and conventions for the setup of the Hamiltonian Floer theory that will be used in the proofs.

Let L be the space of all smooth contractible loops $\gamma : S^1 = \mathbb{R}/\mathbb{Z} \to M$. We will view such a γ as a 1-periodic map $\gamma : \mathbb{R} \to M$. Let $D^2$ be the standard unit disk in $\mathbb{R}^2$. Consider a covering L̃ of L whose elements are equivalence classes of pairs (γ, u), where γ ∈ L, $u : D^2 \to M$, $u|_{\partial D^2} = \gamma$ (i.e. $u(e^{2\pi\sqrt{-1}t}) = \gamma(t)$), is a (piecewise smooth) disk spanning γ in M and the equivalence relation is defined as follows: $(\gamma_1, u_1) \sim (\gamma_2, u_2)$ if and only if $\gamma_1 = \gamma_2$ and the 2-sphere $u_1 \# (-u_2)$ vanishes in $H_2^S(M)$. The equivalence class of a pair (γ, u) will be denoted by [γ, u]. The group of deck transformations of the covering L̃ → L can be naturally identified with $H_2^S(M)$. An element $A \in H_2^S(M)$ acts by the transformation

$A([\gamma, u]) = [\gamma, u \# (-A)].$ (8)

Let F : M × [0, 1] → R be a Hamiltonian function (which is time-periodic as we always assume). Set $F_t := F(\cdot, t)$. We will denote by $f_t$ the Hamiltonian flow generated by F, meaning the flow of the time-dependent Hamiltonian vector field $X_t$ defined by the formula

$\omega(\cdot, X_t) = dF_t(\cdot) \quad \forall t.$

(Note our sign convention!)

Let $P_F \subset L$ be the set of all contractible 1-periodic orbits of the Hamiltonian flow generated by F, i.e. the set of all γ ∈ L such that $\gamma(t) = f_t(\gamma(0))$. Denote by $\tilde{P}_F$ the full lift of $P_F$ to L̃.
Denote by Fix (F) the set of those fixed points of f that are endpoints of contractible periodic orbits of the flow:

$\mathrm{Fix}(F) := \{x \in M \mid \exists \gamma \in P_F, \ x = \gamma(0)\}.$

We say that F is regular if for any x ∈ Fix (F) the map $d_x f : T_xM \to T_xM$ does not have eigenvalue 1.

Recall that the action functional is defined on L̃ by the formula:

$A_F([\gamma, u]) = \int_0^1 F(\gamma(t), t)\,dt - \int_{D^2} u^*\omega.$

Note that

$A_F(Ay) = A_F(y) + \omega(A)$ (9)

for all y ∈ L̃ and $A \in H_2^S(M)$.

For a regular Hamiltonian F define a vector space C(F) over F as the set of all formal sums $\sum_i \lambda_i y_i$, $\lambda_i \in \Lambda$, $y_i \in \tilde{P}_F$, modulo the relations

$Ay = s^{-\omega(A)} q^{-c_1(A)} y$

for all $y \in \tilde{P}_F$, $A \in H_2^S(M)$. The grading on Λ together with the Conley-Zehnder index on elements of $\tilde{P}_F$ (see Section 3.3) defines a Z-grading on C(F). We will denote the i-th graded component by $C_i(F)$.

Given a loop $\{J_t\}$, $t \in S^1$, of ω-compatible almost complex structures, define a Riemannian metric on L by

$(\xi_1, \xi_2) = \int_0^1 \omega(\xi_1(t), J_t\xi_2(t))\,dt,$

where $\xi_1, \xi_2 \in T_\gamma L$. Lift this metric to L̃ and consider the negative gradient flow of the action functional $A_F$. For a generic choice of the Hamiltonian F and the loop $\{J_t\}$ (such a pair (F, J) is called regular) the count of isolated gradient trajectories connecting critical points of $A_F$ gives rise in the standard way [26], [32], [58] to a Morse-type differential

$d : C(F) \to C(F), \quad d^2 = 0.$ (10)

The differential d is Λ-linear and has the graded degree −1. It strictly decreases the action. The homology, defined by d, is called the Floer homology and will be denoted by $HF_*(F, J)$. It is a Λ-module. Different choices of a regular pair (F, J) lead to natural isomorphisms between the Floer homology groups.

The following proposition summarizes a few basic algebraic properties of Floer complexes and Floer homology that will be important for us below. The proof is straightforward and we omit it.

Proposition 3.1.

1) Each $C_i(F)$ and each $HF_i(F, J)$, i ∈ Z, is a finite-dimensional vector space over K.
2) Multiplication by q defines isomorphisms $C_i(F) \to C_{i+2}(F)$ and $HF_i(F, J) \to HF_{i+2}(F, J)$ of K-vector spaces.

3) For each i ∈ Z there exists a basis of $C_i(F)$ over K consisting of the elements of the form $q^l[\gamma, u]$, with $[\gamma, u] \in \tilde{P}_F$.

4) A finite collection of elements of the form $q^l[\gamma, u]$, $[\gamma, u] \in \tilde{P}_F$, lying in $C_0(F) \cup C_1(F)$ is a basis of the vector space $C_0(F) \oplus C_1(F)$ over the field K if and only if it is a basis of the module C(F) over the ring Λ.

3.3 Conley-Zehnder and Maslov indices

In this section we briefly outline the definition and recall the relevant properties of the Conley-Zehnder index referring to [54, 58, 57] for details. In particular, we show that the Conley-Zehnder index is a quasi-morphism on the universal cover $\widetilde{Sp}(2k)$ of the symplectic group Sp (2k) (see Proposition 3.5 below), a fact which will be useful for asymptotic calculations with Floer homology in the next sections. There are several routes leading to this fact, which is quite natural since all homogeneous quasi-morphisms on $\widetilde{Sp}(2k)$ are proportional, and hence the same quasi-morphism admits quite dissimilar definitions [7]. We extract the quasi-morphism property from the paper of Robbin and Salamon [54] by bringing together several statements contained therein (see footnote 7).

The Conley-Zehnder index assigns to each $[\gamma, u] \in \tilde{P}_F$ a number. Originally the Conley-Zehnder index was defined only for regular Hamiltonians [18] – in this case it is integer-valued and gives rise to a grading of the homology groups in Floer theory. Later the definition was extended in different ways by different authors to arbitrary Hamiltonians. We will use such an extension introduced in [54] (also see [57, 58]). In this case the Conley-Zehnder index may take also half-integer values.

Let k be a natural number. Consider the symplectic vector space $\mathbb{R}^{2k}$ with a symplectic form $\omega_{2k}$ on it. Denote by $p = (p_1, \ldots, p_k)$, $q = (q_1, \ldots, q_k)$ the corresponding Darboux coordinates on the vector space $\mathbb{R}^{2k}$.
[Footnote 7] We thank V.L. Ginzburg for stimulating discussions on the material of this section.

Robbin-Salamon index of Lagrangian paths: Let $V \subset \mathbb{R}^{2k}$ be a Lagrangian subspace. Consider the Grassmannian Lagr (k) of all Lagrangian subspaces in $\mathbb{R}^{2k}$ and consider the hypersurface $\Sigma_V \subset$ Lagr (k) formed by all the Lagrangian subspaces that are not transversal to V. To such a V and to any smooth path $\{L_t\}$, 0 ≤ t ≤ 1, in Lagr (k) Robbin and Salamon [54] associate an index, which may take integer or half-integer values and which we will denote by $RS(\{L_t\}, V)$. The definition of the index can be outlined as follows.

A number t ∈ [0, 1] is called a crossing if $L_t \in \Sigma_V$. To each crossing t one associates a certain quadratic form $Q_t$ on the space $L_t \cap V$ – see [54] for the precise definition. The crossing t is called regular if the quadratic form $Q_t$ is non-degenerate. The index of such a regular crossing t is defined as the signature of $Q_t$ if 0 < t < 1 and as half of the signature of $Q_t$ if t = 0, 1. One can show that regular crossings are isolated. For a path $\{L_t\}$ with only regular crossings the index $RS(\{L_t\}, V)$ is defined as the sum of the indices of its crossings. An arbitrary path can be perturbed, keeping the endpoints fixed, into a path with only regular crossings and the index of the perturbed path does not depend on the perturbation – in fact, it depends only on the fixed endpoints homotopy class of the path. Moreover, it is additive with respect to the concatenation of paths and satisfies the naturality property: $RS(\{AL_t\}, AV) = RS(\{L_t\}, V)$ for any symplectic matrix A.

Indices of paths in Sp (2k): Consider the group Sp (2k) of symplectic 2k × 2k-matrices. Denote by $\widetilde{Sp}(2k)$ its universal cover. One can use the index RS in order to define two indices on the space of smooth paths in Sp (2k).

The first index, denoted by $Ind_{2k}$, is defined as follows. Fix a Lagrangian subspace $V \subset \mathbb{R}^{2k}$.
For each smooth path $\{A_t\}$, 0 ≤ t ≤ 1, in Sp (2k) define $Ind_{2k}(\{A_t\}, V)$ as

$Ind_{2k}(\{A_t\}, V) := RS(\{A_tV\}, V).$

The naturality of the RS index implies that

$RS(\{BA_tB^{-1}(BV)\}, BV) = RS(\{BA_tV\}, BV) = RS(\{A_tV\}, V)$

for any B ∈ Sp (2k) and thus we get the following naturality condition for $Ind_{2k}$:

$Ind_{2k}(\{BA_tB^{-1}\}, BV) = Ind_{2k}(\{A_t\}, V)$ for any B ∈ Sp (2k). (11)

The second index, which we will call the Conley-Zehnder index of a matrix path and which will be denoted by $CZ_{matr}$, is defined as follows. For each A ∈ Sp (2k) denote by $Gr_A$ the graph of A, which is a Lagrangian subspace of the symplectic vector space $\mathbb{R}^{4k} = \mathbb{R}^{2k} \times \mathbb{R}^{2k}$ equipped with the symplectic structure $\omega_{4k} = -\omega_{2k} \oplus \omega_{2k}$. Denote by ∆ the diagonal in $\mathbb{R}^{4k} = \mathbb{R}^{2k} \times \mathbb{R}^{2k}$ – it is a Lagrangian subspace with respect to $\omega_{4k}$. Now for any smooth path $\{A_t\}$, 0 ≤ t ≤ 1, in Sp (2k) define $CZ_{matr}$ as

$CZ_{matr}(\{A_t\}) := RS(\{Gr_{A_t}\}, \Delta).$

Equivalently, one can define $CZ_{matr}(\{A_t\})$ similarly to the index RS by looking at the intersections of $\{A_t\}$ with the hypersurface Σ ⊂ Sp (2k) formed by all the symplectic 2k × 2k-matrices with eigenvalue 1 and translating the notions of a regular crossing and the corresponding quadratic form to this setup.

Both indices $Ind_{2k}(\{A_t\}, V)$ and $CZ_{matr}(\{A_t\})$ depend only on the fixed endpoints homotopy class of the path $\{A_t\}$ and are additive with respect to the concatenation of paths in Sp (2k).

The relation between the two indices is as follows. Denote by $I_{2k}$ the 2k × 2k identity matrix. Given a smooth path $\{A_t\}$, 0 ≤ t ≤ 1, in Sp (2k), set $\hat{A}_t := I_{2k} \oplus A_t \in Sp(4k)$. Then

$CZ_{matr}(\{A_t\}) = Ind_{4k}(\{\hat{A}_t\}, \Delta).$ (12)

Remark 3.2. Note that near each $W \in \Sigma_V$ there exists a local coordinate chart (on Lagr (k)) in which $\Sigma_V$ can be defined by an algebraic equation of degree bounded from above by a constant C depending only on k and W. Moreover, since for any two V, V′ ∈ Lagr (k) there exists a diffeomorphism of Lagr (k) mapping $\Sigma_V$ into $\Sigma_{V'}$ we can assume that C = C(k) is independent of W and depends only on k.
Therefore for any V, for any point $W \in \Sigma_V$ and for any sufficiently small open neighborhood $U_W$ of W in Lagr (k) the number of connected components of $U_W \setminus (U_W \cap \Sigma_V)$ is bounded by a constant depending only on k.

Using these observations and the fact that regular crossings are isolated it is easy to show that there exists a constant C(k), depending only on k, such that for any Lagrangian subspace $V \subset \mathbb{R}^{2k}$ and any path $\{A_t\} \subset Sp(2k)$, 0 ≤ t ≤ 1, there exists a δ > 0 such that for any smooth path $\{A'_t\} \subset Sp(2k)$, 0 ≤ t ≤ 1, which is δ-close to $\{A_t\}$ in the $C^0$-metric, one has

$|Ind_{2k}(\{A_t\}, V) - Ind_{2k}(\{A'_t\}, V)| < C(k),$

$|CZ_{matr}(\{A_t\}) - CZ_{matr}(\{A'_t\})| < C(k).$

Leray theorem on the index $Ind_{2k}$: The following result follows from Theorem 5.1 in [54] which Robbin and Salamon credit to Leray [35], p.52. Denote by L the Lagrangian $(q_1, \ldots, q_k)$-coordinate plane in $\mathbb{R}^{2k}$. Any symplectic matrix S ∈ Sp (2k) can be decomposed into k × k blocks as

$S = \begin{pmatrix} E & F \\ G & H \end{pmatrix},$

where the blocks satisfy, in particular, the condition that

$EF^T - FE^T = 0.$ (13)

If SL ∩ L = 0 then the k × k-matrix F is invertible and multiplying (13) by $F^{-1}$ on the left and $(F^T)^{-1} = (F^{-1})^T$ on the right, we get that

$F^{-1}E - E^T(F^{-1})^T = 0.$

Therefore the matrix $Q_S := F^{-1}E$ is symmetric.

Theorem 3.3 ([54], Theorem 5.1; [35], p.52). Assume $\{A_t\}$, $\{B_t\}$, 0 ≤ t ≤ 1, are two smooth paths in Sp (2k), such that $A_0 = B_0 = I_{2k}$ and

$A_1L \cap L = 0, \quad B_1L \cap L = 0, \quad A_1B_1L \cap L = 0.$

Then

$Ind_{2k}(\{A_tB_t\}, L) = Ind_{2k}(\{A_t\}, L) + Ind_{2k}(\{B_t\}, L) + \mathrm{sign}(Q_{A_1} + Q_{B_1}),$

where $\mathrm{sign}(Q_{A_1} + Q_{B_1})$ is the signature of the quadratic form defined by the symmetric k × k-matrix $Q_{A_1} + Q_{B_1}$.

Corollary 3.4. Let V be any Lagrangian subspace of $\mathbb{R}^{2k}$. Then there exists a positive constant C, depending only on k, such that for any smooth paths $\{X_t\}$, $\{Y_t\}$, 0 ≤ t ≤ 1, in Sp (2k), such that $X_0 = Y_0 = I_{2k}$ (there are no assumptions on $X_1$, $Y_1$!),

$|Ind_{2k}(\{X_tY_t\}, V) - Ind_{2k}(\{X_t\}, V) - Ind_{2k}(\{Y_t\}, V)| < C.$

Proof. We will write $C_1, C_2, \ldots$
for (possibly different) positive constants depending only on k. Pick a map Ψ ∈ Sp (2k) such that ΨV = L. Denote

$A_t = \Psi X_t \Psi^{-1}, \quad B_t = \Psi Y_t \Psi^{-1}.$

Note that the paths $\{A_t\}$, $\{B_t\}$ are based at the identity. Using the naturality property (11) of $Ind_{2k}$ we get

$|Ind_{2k}(\{X_tY_t\}, V) - Ind_{2k}(\{X_t\}, V) - Ind_{2k}(\{Y_t\}, V)|$
$= |Ind_{2k}(\{\Psi X_tY_t\Psi^{-1}\}, \Psi V) - Ind_{2k}(\{\Psi X_t\Psi^{-1}\}, \Psi V) - Ind_{2k}(\{\Psi Y_t\Psi^{-1}\}, \Psi V)|$
$= |Ind_{2k}(\{(\Psi X_t\Psi^{-1})(\Psi Y_t\Psi^{-1})\}, L) - Ind_{2k}(\{\Psi X_t\Psi^{-1}\}, L) - Ind_{2k}(\{\Psi Y_t\Psi^{-1}\}, L)|$
$= |Ind_{2k}(\{A_tB_t\}, L) - Ind_{2k}(\{A_t\}, L) - Ind_{2k}(\{B_t\}, L)|.$

Thus

$|Ind_{2k}(\{X_tY_t\}, V) - Ind_{2k}(\{X_t\}, V) - Ind_{2k}(\{Y_t\}, V)| = |Ind_{2k}(\{A_tB_t\}, L) - Ind_{2k}(\{A_t\}, L) - Ind_{2k}(\{B_t\}, L)|.$ (14)

Further on, Remark 3.2 implies that we can find sufficiently $C^0$-close identity-based perturbations $\{A'_t\}$, $\{B'_t\}$ of $\{A_t\}$, $\{B_t\}$ such that

$A'_1L \cap L = 0, \quad B'_1L \cap L = 0, \quad A'_1B'_1L \cap L = 0,$ (15)

and

$|Ind_{2k}(\{A_tB_t\}, L) - Ind_{2k}(\{A_t\}, L) - Ind_{2k}(\{B_t\}, L)| - |Ind_{2k}(\{A'_tB'_t\}, L) - Ind_{2k}(\{A'_t\}, L) - Ind_{2k}(\{B'_t\}, L)| < C_1,$ (16)

for some $C_1$. On the other hand, since the three identity-based paths $\{A'_t\}$, $\{B'_t\}$, $\{A'_tB'_t\}$ satisfy the conditions (15), we can apply to them Theorem 3.3. Hence there exists $C_2$ such that

$|Ind_{2k}(\{A'_tB'_t\}, L) - Ind_{2k}(\{A'_t\}, L) - Ind_{2k}(\{B'_t\}, L)| < C_2.$

Combining it with (14) and (16) we get that there exists $C_3$ such that

$|Ind_{2k}(\{X_tY_t\}, V) - Ind_{2k}(\{X_t\}, V) - Ind_{2k}(\{Y_t\}, V)| < C_3,$

which finishes the proof.

Conley-Zehnder index as a quasi-morphism: Recall that 2n = dim M. Restricting $CZ_{matr}$ to the identity-based paths in Sp (2n) one gets a function on $\widetilde{Sp}(2n)$ that will be still denoted by $CZ_{matr}$.

Proposition 3.5 (cf. [19]). The function $CZ_{matr} : \widetilde{Sp}(2n) \to \mathbb{R}$ is a quasi-morphism. It means that there exists a constant C > 0 such that

$|CZ_{matr}(ab) - CZ_{matr}(a) - CZ_{matr}(b)| \le C \quad \forall a, b \in \widetilde{Sp}(2n).$

Proof. Represent a and b by identity-based paths $\{A_t\}$, $\{B_t\}$, 0 ≤ t ≤ 1, in Sp (2n). Then use (12) and apply Corollary 3.4 for k = 2n, V = ∆ to $\{\hat{A}_t\}$, $\{\hat{B}_t\}$ in Sp (4n).
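The proof of Proposition 3.5 can be unwound in one display. The following is only a sketch, under two assumptions that the text leaves implicit: the product $ab$ in the universal cover is represented by the pointwise product path $\{A_tB_t\}$, and the operation $\hat{A}_t = I_{2n} \oplus A_t$ is multiplicative. The constant $C$ is the one supplied by Corollary 3.4 for $k = 2n$.

```latex
\begin{align*}
\hat{A}_t \hat{B}_t
  &= (I_{2n} \oplus A_t)(I_{2n} \oplus B_t)
   = I_{2n} \oplus A_t B_t
   = \widehat{A_t B_t}, \\
\bigl| CZ_{matr}(ab) - CZ_{matr}(a) - CZ_{matr}(b) \bigr|
  &= \bigl| Ind_{4n}(\{\hat{A}_t \hat{B}_t\}, \Delta)
          - Ind_{4n}(\{\hat{A}_t\}, \Delta)
          - Ind_{4n}(\{\hat{B}_t\}, \Delta) \bigr|
  && \text{by (12)} \\
  &< C
  && \text{by Corollary 3.4.}
\end{align*}
```

Since the paths $\{\hat{A}_t\}$, $\{\hat{B}_t\}$ are identity-based and Corollary 3.4 imposes no condition on their endpoints, no transversality assumption on $a$ and $b$ is needed at this step.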
Maslov index of symplectic loops: The Conley-Zehnder index for identity-based loops in Sp (2n) is called the Maslov index of a loop. Its original definition, going back to [4], is the following: it is the intersection number of an identity-based loop with the stratified hypersurface Σ whose principal stratum is equipped with a certain co-orientation. Note that we do not divide the intersection number by 2 and thus in our case the Maslov index takes only even values; for instance, the Maslov index of a counterclockwise 2π-twist of the standard symplectic $\mathbb{R}^2$ is 2. We denote the Maslov index of a loop {B(t)} by Maslov ({B(t)}).

Conley-Zehnder and Maslov indices of periodic orbits: The Conley-Zehnder index for periodic orbits is defined by means of the Conley-Zehnder index for matrix paths as follows. Given $[\gamma, u] \in \tilde{P}_F$, build an identity-based path {A(t)} in Sp (2n) as follows: take a symplectic trivialization of the bundle $u^*(TM)$ over $D^2$ and use the trivialization to identify the linearized flow $d_{\gamma(0)}f_t$, 0 ≤ t ≤ 1, along γ with a path of symplectic matrices {A(t)}. Then the Conley-Zehnder index $CZ_F([\gamma, u])$ is defined as

$CZ_F([\gamma, u]) := n - CZ_{matr}(\{A(t)\}).$ (17)

With such a normalization of $CZ_F$, for any sufficiently $C^2$-small autonomous Morse Hamiltonian F, the Conley-Zehnder index of an element of $\tilde{P}_F$, represented by a pair [x, u] consisting of a critical point x of F (viewed as a constant path in M) and the trivial disk u, is equal to the Morse index of x. Note that with such a normalization

$CZ_F(Sy) = CZ_F(y) + 2\int_S c_1(M)$

for every $y \in \tilde{P}_F$ and $S \in H_2^S(M)$.

Similarly, if the time-1 flow generated by F defines a loop in Ham (M) then to each $[\gamma, u] \in \tilde{P}_F$ one can associate its Maslov index. Namely, trivialize the bundle $u^*(TM)$ over $D^2$ and identify the linearized flow $\{d_xf_t\}$ along γ with an identity-based loop of symplectic 2n × 2n-matrices. Define the Maslov index $m_F([\gamma, u])$ as the Maslov index for the loop of symplectic matrices.
Under the action of $H_2^S(M)$ on $\tilde{P}_F$ the Maslov index changes as follows:

$m_F(S \cdot [\gamma, u]) = m_F([\gamma, u]) - 2\int_S c_1(M), \quad S \in H_2^S(M).$

Let us make the following remark. Assume $\gamma \in P_F$ and assume that a symplectic trivialization of the bundle $\gamma^*(TM)$ over $S^1$ identifies $\{d_{\gamma(0)}f_t\}$ with an identity-based path {A(t)} of symplectic matrices. Assume there is another symplectic trivialization of the same bundle, coinciding with the first one at γ(0), and denote by {B(t)} the identity-based loop of transition matrices from the first symplectic trivialization to the second one. Use the second trivialization to identify $\{d_{\gamma(0)}f_t\}$ with an identity-based path {A′(t)}. Then

$CZ_{matr}(\{A'(t)\}) = CZ_{matr}(\{A(t)\}) + \mathrm{Maslov}(\{B(t)\}),$ (18)

and if {A(t)} is a loop then so is {A′(t)} and

$\mathrm{Maslov}(\{A'(t)\}) = \mathrm{Maslov}(\{A(t)\}) + \mathrm{Maslov}(\{B(t)\}).$ (19)

3.4 Spectral numbers

Given the algebraic setup as above, the construction of the Piunikhin-Salamon-Schwarz (PSS) isomorphism [47] yields a Λ-linear isomorphism (PSS-isomorphism) $\phi_M : QH_*(M) \to HF_*(F, J)$ which preserves the grading and which is actually a ring isomorphism (the pair-of-pants product defines a ring structure on $HF_*(F, J)$).

Using the PSS-isomorphism one defines the spectral numbers c(a, F), where 0 ≠ a ∈ QH∗(M), in the usual way [45]. Namely, the action functional $A_F$ defines a filtration on C(F) which induces a filtration $HF^\alpha_*(F, J)$, α ∈ R, on $HF_*(F, J)$, with $HF^\alpha_*(F, J) \subset HF^\beta_*(F, J)$ as long as α < β. Then

$c(a, F) := \inf\{\alpha \mid \phi_M(a) \in HF^\alpha_*(F, J)\}.$

Such a spectral number is finite and well-defined (does not depend on J). Here is a brief account of the relevant properties of spectral numbers – for details see [45] (see also [65, 42, 59, 43] for earlier versions of this theory).

(Spectrality) c(a, H) ∈ spec (H), where the spectrum spec (H) of H is defined as the set of critical values of the action functional $A_H$, i.e. spec (H) := $A_H(\tilde{P}_H) \subset \mathbb{R}$.
(Quantum homology shift property) c(λa, H) = c(a, H) + ν(λ) for all λ ∈ Λ, where ν is the valuation defined in Section 3.1.

(Hamiltonian shift property) $c(a, H + \lambda(t)) = c(a, H) + \int_0^1 \lambda(t)\,dt$ for any Hamiltonian H and function $\lambda : S^1 \to \mathbb{R}$.

(Monotonicity) If $H_1 \le H_2$, then $c(a, H_1) \le c(a, H_2)$.

(Lipschitz property) The map H ↦ c(a, H) is Lipschitz on the space of (time-dependent) Hamiltonians $H : M \times S^1 \to \mathbb{R}$ with respect to the $C^0$-norm.

(Symplectic invariance) $c(a, \phi^*H) = c(a, H)$ for every $\phi \in Symp_0(M)$, H ∈ C∞(M); more generally, Symp (M) acts on $H_*(M; F)$, and hence on QH∗(M), and $c(a, \phi^*H) = c(\phi_*a, H)$ for any φ ∈ Symp (M).

(Normalization) c(a, 0) = ν(a) for every a ∈ QH∗(M).

(Homotopy invariance) $c(a, H_1) = c(a, H_2)$ for any normalized $H_1$, $H_2$ generating the same $\phi \in \widetilde{Ham}(M)$. Thus one can define c(a, φ) for any $\phi \in \widetilde{Ham}(M)$ as c(a, H) for any normalized H generating φ.

(Triangle inequality) c(a ∗ b, φψ) ≤ c(a, φ) + c(b, ψ).

The commutative ring QH•(M) admits a K-bilinear and K-valued form Ω on QH•(M) which associates to a pair of quantum homology classes a, b ∈ QH•(M) the coefficient (belonging to K) at the class $[point] = [point] \cdot q^0$ of a point in their quantum product a ∗ b ∈ QH•(M) (the Frobenius structure). Let τ : K → F be the map sending each series $\sum_{\theta \in \Gamma} z_\theta s^\theta$, $z_\theta \in F$, to its free term $z_0$. Define a non-degenerate F-valued F-linear pairing on QH•(M) by

$\Pi(a, b) := \tau\Omega(a, b) = \tau\Omega(a * b, [M]).$ (20)

Note that Π is symmetric and

$\Pi(a * b, c) = \Pi(a, b * c) \quad \forall a, b, c \in QH_\bullet(M).$ (21)

With this notion at hand, we can present another important property of spectral numbers:

(Poincaré duality) $c(b, \phi) = -\inf_{a \in \Upsilon(b)} c(a, \phi^{-1})$ for all b ∈ QH•(M) \ {0} and φ. Here Υ(b) denotes the set of all a ∈ QH•(M) with Π(a, b) ≠ 0.

The Poincaré duality can be extracted from [47] (cf. [22]) – for a proof see [46].
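As a quick consistency check of the properties listed above, one can compute the spectral number of a constant Hamiltonian. The sketch below assumes the Hamiltonian shift property in the form $c(a, H + \lambda) = c(a, H) + \int_0^1 \lambda(t)\,dt$ and reads off $\nu([M]) = 0$ from the definition of ν in Section 3.1, since $[M] = 1 \cdot [M]$ with $1 = s^0 \in \Lambda$.

```latex
\begin{align*}
c(a, \alpha)
  &= c(a, 0 + \alpha)
   = c(a, 0) + \int_0^1 \alpha \, dt
   = \nu(a) + \alpha,
  \qquad \alpha \in \mathbb{R}, \\
c([M], \alpha)
  &= \nu([M]) + \alpha
   = \alpha .
\end{align*}
```

The first line uses the Hamiltonian shift property with $H = 0$, $\lambda \equiv \alpha$, followed by the normalization property; the second line is the special case $a = [M]$.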
The next property is an immediate consequence of the definitions (see [22] for a discussion in the monotone case):

(Characteristic exponent property) Given 0 ≠ λ ∈ F, a, b ∈ QH∗(M), a, b, a + b ≠ 0, and a (time-dependent) Hamiltonian H, one has c(λ · a, H) = c(a, H) and c(a + b, H) ≤ max(c(a, H), c(b, H)).

3.5 Partial symplectic quasi-states

Given a non-zero idempotent $a \in QH_{2n}(M)$ and a time-independent Hamiltonian H : M → R, define

$\zeta(a, H) := \lim_{l \to +\infty} \frac{c(a, lH)}{l}.$ (22)

When a is fixed, we shall often write ζ(H) instead of ζ(a, H). The limit in the formula (22) always exists and thus the functional ζ : C∞(M) → R is well-defined. The functional ζ on C∞(M) is Lipschitz with respect to the $C^0$-norm $\|H\| = \max_M |H|$ and therefore extends to a functional ζ : C(M) → R, where C(M) is the space of all continuous functions on M. These facts were proved in [23] in the case a = [M] but the proofs actually go through for any non-zero idempotent $a \in QH_{2n}(M)$.

Here we will list the properties of ζ for such an M. Again, these properties were proved in [23] in the case a = [M] but the proof goes through for any non-zero idempotent $a \in QH_{2n}(M)$. The additivity with respect to constants property was not explicitly listed in [23] but follows immediately from the definition of ζ and the Hamiltonian shift property of spectral numbers. The triangle inequality follows readily from the definition of ζ and from the triangle inequality for the spectral numbers.

Theorem 3.6. The functional ζ : C(M) → R satisfies the following properties:

Semi-homogeneity: ζ(αF) = αζ(F) for any F and any $\alpha \in \mathbb{R}_{\ge 0}$.

Triangle inequality: If $F_1, F_2 \in C^\infty(M)$, $\{F_1, F_2\} = 0$ then $\zeta(F_1 + F_2) \le \zeta(F_1) + \zeta(F_2)$.

Partial additivity and vanishing: If $F_1, F_2 \in C^\infty(M)$, $\{F_1, F_2\} = 0$ and the support of $F_2$ is displaceable, then $\zeta(F_1 + F_2) = \zeta(F_1)$; in particular, if the support of F ∈ C(M) is displaceable, ζ(F) = 0.

Additivity with respect to constants and normalization: ζ(F + α) = ζ(F) + α for any F and any α ∈ R.
In particular, ζ(1) = 1.

Monotonicity: ζ(F) ≤ ζ(G) for F ≤ G.

Symplectic invariance: ζ(F) = ζ(F ◦ f) for every symplectic diffeomorphism $f \in Symp_0(M)$.

Characteristic exponent property: $\zeta(a_1 + a_2, F) \le \max(\zeta(a_1, F), \zeta(a_2, F))$ for each pair of non-zero idempotents $a_1$, $a_2$ with $a_1 * a_2 = 0$, $a_1 + a_2 \neq 0$ (in this case $a_1 + a_2$ is also a non-zero idempotent), and for all F ∈ C(M).

We will call the functional ζ : C(M) → R satisfying all the properties listed in Theorem 3.6 a partial symplectic quasi-state.

4 Basic properties of (super)heavy sets

In this section we prove parts (i) and (iii) of Theorem 1.2, as well as Theorem 1.3.

We shall use that a partial symplectic quasi-state ζ extends by continuity in the uniform norm to a monotone functional on the space of continuous functions C(M), see Section 3.5 above. In particular, one can use continuous functions instead of the smooth ones in the definition of (super)heaviness in formulae (3) and (4).

Assume a partial quasi-state ζ defined by a non-zero idempotent is fixed and we consider heaviness and superheaviness with respect to ζ. We start with the following elementary

Proposition 4.1. A closed subset X ⊂ M is heavy if and only if for every H ∈ C∞(M) with $H|_X = 0$, H ≤ 0 one has ζ(H) = 0. A closed subset X ⊂ M is superheavy if and only if for every H ∈ C∞(M) with $H|_X = 0$, H ≥ 0 one has ζ(H) = 0.

Proof. The “only if” parts follow readily from the monotonicity property of ζ. Let us prove the “if” part in the “heavy” case – the “superheavy” case is similar. Take a function H on M and put

$F = \min(H - \inf_X H, \ 0).$

Note that $F|_X = 0$ and F ≤ 0. Thus ζ(F) = 0 by the assumption of the proposition. Thus

$0 = \zeta(F) \le \zeta(H - \inf_X H) = \zeta(H) - \inf_X H,$

which yields heaviness of X.

The following proposition proves part (i) of Theorem 1.2.

Proposition 4.2. Every superheavy set is heavy.

Proof. Let X ⊂ M be a superheavy subset. Assume that $H|_X = 0$, H ≤ 0. By the triangle inequality for ζ we have ζ(H) + ζ(−H) ≥ 0.
Note that $-H|_X = 0$, −H ≥ 0. Superheaviness yields ζ(−H) = 0, so ζ(H) ≥ 0. But by monotonicity ζ(H) ≤ 0. Thus ζ(H) = 0 and the claim follows from Proposition 4.1.

Superheavy sets have the following user-friendly property.

Proposition 4.3. Let X ⊂ M be a superheavy set. Then for every α ∈ R and H ∈ C∞(M) with $H|_X \equiv \alpha$ one has ζ(H) = α.

Proof. Since ζ(H + α) = ζ(H) + α it suffices to prove the proposition for α = 0. Take any function H with $H|_X = 0$. Since X is superheavy and, by Proposition 4.2, also heavy, we have

$0 = \zeta(-|H|) \le \zeta(H) \le \zeta(|H|) = 0,$

which yields ζ(H) = 0.

As an immediate consequence we get part (iii) of Theorem 1.2.

Proposition 4.4. Every superheavy set intersects with every heavy set.

Proof. Let X be a superheavy set and Y be a heavy set. Assume on the contrary that X ∩ Y = ∅. Take a function H ≤ 0 with $H|_Y \equiv 0$ and $H|_X \equiv -1$. Then ζ(H) = −1 by Proposition 4.3. On the other hand, ζ(H) = 0 since Y is heavy, and we get a contradiction.

Note that two heavy sets do not necessarily intersect each other: a meridian of $\mathbb{T}^2$ is heavy (see Corollary 6.4 below), while two meridians can be disjoint.

Proof of Theorem 1.3 (i) and (ii): The triangle inequality yields

$c(a, H) = c(a * [M], 0 + H) \le c(a, 0) + c([M], H) = \nu(a) + c([M], H).$

Passing to the partial quasi-states ζ(a, H) and ζ([M], H) we get:

$\zeta(a, H) = \lim_{k \to +\infty} c(a, kH)/k \le \lim_{k \to +\infty} (\nu(a) + c([M], kH))/k = \lim_{k \to +\infty} c([M], kH)/k = \zeta([M], H).$

The result now follows from the definition of heavy and superheavy sets (see Definition 1.1).

Proof of Theorem 1.3 (iii): By the characteristic exponent property of spectral invariants,

$\zeta(a, F) \le \max_{i=1,\ldots,l} \zeta(e_i, F) \quad \forall F \in C^\infty(M).$ (23)

Choose a sequence of functions $G_j \in C^\infty(M)$, j → +∞, with the following properties: $G_k \le G_j$ for k > j, $G_j = 0$ on X, $G_j \le 0$ and for every function F ≤ 0 which vanishes on an open neighborhood of X there exists j so that $G_j \le F$ (existence of such a sequence can be checked easily).
In view of inequality (23), for every j there exists i so that ζ(a, Gj) ≤ ζ(ei, Gj). Passing, if necessary, to a subsequence Gjk, jk → +∞, we can assume without loss of generality that i is the same for all j. In view of heaviness of X with respect to a, we have that ζ(a, Gj) = 0. Therefore ζ(ei, Gj) ≥ 0. Choose any function F ≤ 0 on M which vanishes on an open neighborhood of X. Then there exists j large enough so that F ≥ Gj. By monotonicity combined with the previous estimate we have 0 ≥ ζ(ei, F) ≥ ζ(ei, Gj) ≥ 0, which yields ζ(ei, F) = 0. Now let F be any continuous function on M that vanishes on X. Take a sequence of continuous functions Fj, converging to F in the C⁰-norm, so that each Fj vanishes on an open neighborhood of X. Then $\zeta(e_i, F) = \lim_{j\to+\infty} \zeta(e_i, F_j) = 0$, because ζ(ei, ·) is Lipschitz with respect to the C⁰-norm. The heaviness of X with respect to ei now follows from Proposition 4.1. This finishes the proof of the theorem.

5 Products of (super)heavy sets

In this section we prove Theorem 1.5 on products of (super)heavy subsets.

5.1 Product formula for spectral invariants

The proof of Theorem 1.5 is based on the following general result.

Theorem 5.1. For every pair of time-dependent Hamiltonians G1, G2 on M1 and M2, and all non-zero a1 ∈ QH_{i1}(M1), a2 ∈ QH_{i2}(M2) we have
$$c(a_1 \otimes a_2,\ G_1(z_1,t) + G_2(z_2,t)) = c(a_1, G_1) + c(a_2, G_2).$$
Here G1(z1, t) + G2(z2, t) is a time-dependent Hamiltonian on M1 × M2.

Let us deduce Theorem 1.5 from Theorem 5.1.

Proof of Theorem 1.5: We show that the product of superheavy sets is superheavy (the proof for heavy sets goes without any changes). We denote by ζ1, ζ2 and ζ the partial quasi-states on M1, M2 and M := M1 × M2 associated to the idempotents a1, a2 and a1 ⊗ a2 respectively. Let Xi ⊂ Mi, i = 1, 2, be a superheavy set. By Proposition 4.1 it suffices to show that if a non-negative function G ∈ C∞(M) vanishes on some neighborhood, say U, of X := X1 × X2 then ζ(G) = 0.
(Since ζ is Lipschitz with respect to the C⁰-norm, this would imply that ζ(G) = 0 for any non-negative G ∈ C(M) that vanishes on X.) Put K := max_M G. Choose neighborhoods Ui of Xi so that U1 × U2 ⊂ U. Choose non-negative functions Gi on Mi which vanish on Xi and such that Gi(z) > K for all z ∈ Mi \ Ui. Observe that G ≤ G1 + G2. But, in view of Theorem 5.1 and superheaviness of Xi, we have ζ(G1 + G2) = ζ1(G1) + ζ2(G2) = 0. By monotonicity 0 ≤ ζ(G) ≤ ζ(G1 + G2) = 0, and thus ζ(G) = 0.

It remains to prove Theorem 5.1. Note that the left-hand side of the equality stated in the theorem does not exceed the right-hand side: this is an immediate consequence of the triangle inequality for spectral invariants. However, we were unable to use this observation for proving the theorem. Our approach is based on a rather lengthy algebraic analysis which enables us to calculate separately the left and the right-hand sides “on the chain level”. A simple inspection of the results of this calculation yields the desired equality.

5.2 Decorated Z2-graded complexes

A Z2-complex is a Z2-graded finite-dimensional vector space V over a field K equipped with a K-linear differential ∂ : V → V satisfying ∂² = 0 and shifting the grading. A decorated complex over K = KΓ includes the following data:
• a countable subgroup Γ ⊂ R;
• a Z2-graded complex (V, d) over KΓ;
• a preferred basis x1, . . . , xn of V;
• a function F : {x1, . . . , xn} → R (called the filter) which extends to V by
$$F\Big(\sum_j \lambda_j x_j\Big) = \max\{\nu(\lambda_j) + F(x_j) \mid \lambda_j \neq 0\},$$
and satisfies F(dv) < F(v) for all v ∈ V \ {0}.
The convention is that F(0) = −∞. Here ν is the valuation defined in Section 3.1 above. We shall use the notation V := (V, {xi}i=1,...,n, F, d, Γ) for a decorated complex.

The ⊗̂K-tensor product V = V1 ⊗̂K V2 of decorated complexes $V_i = (V_i, \{x^{(i)}_j\}_{j=1,\dots,n_i}, F_i, d_i, \Gamma_i)$, i = 1, 2, is defined as follows. Consider the space V = V1 ⊗̂K V2 (see formula (5) above) with the natural Z2-grading.
Define the differential d on V by
$$d(x \otimes y) = d_1 x \otimes y + (-1)^{\deg x}\, x \otimes d_2 y.$$
The preferred basis in V is given by $\{x_{pq} := x^{(1)}_p \otimes x^{(2)}_q\}$ and the filter F is defined by
$$F(x_{pq}) = F_1(x^{(1)}_p) + F_2(x^{(2)}_q).$$
Finally, we put V := (V, {xpq}, F, d, Γ1 + Γ2). The (Z2-graded) homology of a decorated complex V is denoted by H∗(V); it is a K-vector space. By the Künneth formula, H(V1 ⊗̂K V2) = H(V1) ⊗̂K H(V2). Next we define spectral invariants associated to a decorated complex V := (V, {xpq}, F, d). Namely, for a ∈ H(V) put
$$c(a) := \inf\{F(v) \mid a = [v],\ v \in \operatorname{Ker} d\}.$$
We shall see below that c(a) > −∞ for each a ≠ 0. The purpose of this algebraic digression is to state the following result:

Theorem 5.2. For any two decorated complexes V1, V2 we have c(a1 ⊗ a2) = c(a1) + c(a2) for all a1 ∈ H(V1), a2 ∈ H(V2).

5.3 Reduced Floer and Quantum homology

The 2-periodicity of the Floer complex and Floer homology defined by the multiplication by q (see Proposition 3.1 above) allows us to encode their algebraic structure in a decorated Z2-complex. Consider a regular pair (G, J) consisting of a Hamiltonian function and a compatible almost-complex structure on M (both, in general, are time-dependent). Let (C∗(G), dG,J) be the corresponding Floer complex. Let us associate to it a Z2-complex: a Z2-graded vector space VG over KΓ, defined as VG := C0(G) ⊕ C1(G), with the obvious Z2-grading, and a differential ∂G,J : VG → VG, defined as the direct sum of dG,J : C1(G) → C0(G) and q dG,J : C0(G) → C1(G). One readily checks that this is indeed a Z2-complex because dG,J : C(G) → C(G) is ΛΓ-linear. We will call (VG, ∂G,J) the Z2-complex associated to (G, J). Note that the cycles and the boundaries of (VG, ∂G,J) having Z2-degree i ∈ {0, 1} in VG coincide, respectively, with the cycles and the boundaries having Z-degree i of (C(G), dG,J). Therefore the Floer homology HFi(G, J) is isomorphic, as a vector space over KΓ, to the i-th degree component of the homology of the complex (VG, ∂G,J).
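The check that ∂G,J squares to zero can be written out explicitly. The lines below are a sketch under the conventions stated above (dG,J lowers the Z-degree by 1, multiplication by q raises it by 2, and dG,J commutes with q by ΛΓ-linearity); the symbols x0 ∈ C0(G) and x1 ∈ C1(G) are generic elements introduced here for illustration.

```latex
% x_0 \in C_0(G), x_1 \in C_1(G) are arbitrary elements (notation introduced for this sketch).
\begin{align*}
\partial_{G,J}(x_0 + x_1) &= d_{G,J}\,x_1 \;+\; q\,d_{G,J}\,x_0 , \\
\partial_{G,J}^{2}\,x_1 &= \partial_{G,J}\big(d_{G,J}x_1\big)
   = q\,d_{G,J}\big(d_{G,J}x_1\big) = 0 , \\
\partial_{G,J}^{2}\,x_0 &= \partial_{G,J}\big(q\,d_{G,J}x_0\big)
   = d_{G,J}\big(q\,d_{G,J}x_0\big)
   = q\,d_{G,J}^{2}\,x_0 = 0 .
\end{align*}
```

In the second line d x1 lies in C0(G), so the component q d of the differential applies; in the third line q d x0 lies in C1(G), so the component d applies, and linearity over ΛΓ lets q pass through d.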
The Z2-complex (VG, ∂G,J) carries the structure of a decorated complex VG,J as follows. Let γi(t), i = 1, . . . , m, be the collection of all contractible 1-periodic orbits of the Hamiltonian flow generated by G. Choose a disc ui in M spanning γi. For each i there exists a unique integer, say ri, so that the Conley-Zehnder index of the element $x_i := q^{r_i} \cdot [\gamma_i, u_i]$ lies in the set {0, 1}. Clearly, the collection {xi} forms a basis of VG over KΓ. We shall consider it as a preferred basis. Note that the preferred basis is unique up to multiplication of the xi’s by elements of the form $s^{\alpha_i}$, αi ∈ Γ. Finally, the action functional associated to G defines a filtration on VG.

The homology of (VG, ∂G,J) can be canonically identified via the PSS-isomorphism with the object which we call reduced quantum homology: QHred(M) := QH0(M) ⊕ QH1(M). We call this isomorphism the reduced PSS-isomorphism and denote it by ψG,J. Note that we have a natural projection p : QH∗(M) → QHred(M) which sends any degree-homogeneous element a to $a q^r$ with deg a + 2r ∈ {0, 1}. With this notation, the usual Floer-homological spectral invariant c(a, G) coincides with the spectral invariant c(p(a)) of the decorated complex VG,J.

5.4 Proof of Theorem 5.1

By the Lipschitz property of spectral numbers it is enough to consider the case when G1 and G2 belong to regular pairs (Gi, Ji), i = 1, 2. Set G(z1, z2, t) := G1(z1, t) + G2(z2, t) and J := J1 × J2. Then (G, J) is also a regular pair. Put Γi = Γ(Mi, ωi). It is straightforward to see that the decorated complex VG,J is the ⊗̂K-tensor product of the decorated complexes VGi,Ji for i = 1, 2. Put (M, ω) = (M1 × M2, ω1 ⊕ ω2). An obvious modification of the Künneth formula for quantum homology (see e.g. [41, Exercise 11.1.15] for the statement in the monotone case) yields a natural monomorphism
$$\imath : QH_{i_1}(M_1,\omega_1)\, \widehat{\otimes}_K\, QH_{i_2}(M_2,\omega_2) \to QH_{i_1+i_2}(M,\omega).$$
Since in our setting quantum homologies are 2-periodic, the collection of these monomorphisms for all pairs (i1, i2) from the set {0, 1} induces an isomorphism
$$j : QH_{red}(M_1)\, \widehat{\otimes}_K\, QH_{red}(M_2) \to QH_{red}(M).$$
It has the following properties. First, given two elements a1 ∈ QH_{i1}(M1, ω1) and a2 ∈ QH_{i2}(M2, ω2) we have that p(a1) ⊗ p(a2) = p(a1 ⊗ a2). Second, the following diagram commutes:
$$\begin{array}{ccc}
H(V_{G_1},\partial_{G_1,J_1})\, \widehat{\otimes}_K\, H(V_{G_2},\partial_{G_2,J_2}) & \xrightarrow{\ k\ } & H(V_G,\partial_{G,J}) \\
\Big\downarrow {\scriptstyle \psi_{G_1,J_1}\otimes\psi_{G_2,J_2}} & & \Big\downarrow {\scriptstyle \psi_{G,J}} \\
QH_{red}(M_1)\, \widehat{\otimes}_K\, QH_{red}(M_2) & \xrightarrow{\ j\ } & QH_{red}(M)
\end{array}$$
Here k is the isomorphism coming from the Künneth formula for Z2-complexes, and ψGi,Ji, ψG,J stand for the reduced PSS-isomorphisms. It follows that the definition of c(ai, Gi), c(a1 ⊗ a2, G) matches the definition of c(p(ai)) and c(p(a1) ⊗ p(a2)). By Theorem 5.2 we get that
$$c(a_1 \otimes a_2, G) = c(p(a_1) \otimes p(a_2)) = c(p(a_1)) + c(p(a_2)) = c(a_1, G_1) + c(a_2, G_2).$$
This proves Theorem 5.1 modulo Theorem 5.2.

5.5 Proof of algebraic Theorem 5.2

A decorated complex is called generic if F(xi) − F(xj) ∉ Γ for all i ≠ j (recall that under our assumptions Γ, the group of periods of the symplectic form ω over π2(M), is a countable subgroup of R). We start with some auxiliary facts from linear algebra. Let V := (V, {xi}i=1,...,n, F, d, Γ) be a generic decorated complex. We recall once again that for brevity we write K instead of KΓ wherever it is clear which Γ is meant. An element x ∈ V is called normalized if
$$x = x_p + \sum_{i \neq p} \lambda_i x_i, \quad \lambda_i \in K, \quad F(x_p) > \max_{i \neq p} F(\lambda_i x_i).$$
We shall use the notation x = xp + o(xp). In generic complexes, every element x ≠ 0 can be uniquely written as x = λ(xp + o(xp)) for some p = 1, . . . , n and λ ∈ K. A system of vectors e1, . . . , em in V is called normal if every ei has the form $e_i = x_{j_i} + o(x_{j_i})$ for ji ∈ {1, . . . , n} and the numbers ji are pairwise distinct.

Lemma 5.3. Let e1, . . . , em be a normal system. Then
$$F\Big(\sum_{i=1}^{m} \lambda_i e_i\Big) = \max_i F(\lambda_i e_i).$$

Proof. We prove the result using induction on m. For m = 1 the statement is obvious. Let us check the induction step m − 1 → m.
Observe that it suffices to check that
$$F\Big(e_1 + \sum_{i=2}^{m} \lambda_i e_i\Big) \ge F(e_1). \qquad (24)$$
Then obviously
$$F\Big(\sum_{i=1}^{m} \lambda_i e_i\Big) \ge \max_i F(\lambda_i e_i),$$
while the reversed inequality is an immediate consequence of the definitions. By the induction hypothesis,
$$F\Big(\sum_{i=2}^{m} \lambda_i e_i\Big) = \max_{i=2,\dots,m} F(\lambda_i e_i).$$
In view of the genericity, the maximum on the right-hand side is attained at a unique index i0 and can be written as F(λi0 xi0). Without loss of generality we shall assume that ei = xi + o(xi) and i0 = 2. Put
$$v := \lambda_2^{-1} \sum_{i=2}^{m} \lambda_i e_i = x_2 + o(x_2).$$
Write e1 = x1 + αx2 + X, v = x2 + βx1 + Y, where α, β ∈ K and X, Y ∈ SpanK(x3, . . . , xn). Note that F(x1) > F(αx2), F(x2) > F(βx1), which yields
$$\nu(\alpha) < F(x_1) - F(x_2) < -\nu(\beta) = \nu(\beta^{-1}). \qquad (25)$$
In particular, ν(α) < ν(β⁻¹). Note that
$$e_1 + \lambda_2 v = (1 + \lambda_2\beta)x_1 + (\alpha + \lambda_2)x_2 + Z, \quad Z \in \mathrm{Span}_K(x_3, \dots, x_n).$$
Hence
$$F(e_1 + \lambda_2 v) \ge \max\big(\nu(1 + \lambda_2\beta) + F(x_1),\ \nu(\alpha + \lambda_2) + F(x_2)\big).$$
If ν(1 + λ2β) ≥ 0 we have F(e1 + λ2v) ≥ F(x1) = F(e1) and inequality (24) follows. Assume that ν(1 + λ2β) < 0 = ν(1). Then ν(λ2β) = 0 = ν(λ2) + ν(β), and hence ν(λ2) = ν(β⁻¹) ≠ ν(α). Thus ν(α + λ2) ≥ ν(λ2) = −ν(β). Combining this inequality with (25) we get that
$$F(e_1 + \lambda_2 v) \ge \nu(\alpha + \lambda_2) + F(x_1) + (F(x_2) - F(x_1)) \ge F(x_1) + (\nu(\alpha + \lambda_2) + \nu(\beta)) \ge F(x_1) = F(e_1).$$
This completes the proof of inequality (24), and hence of the lemma.

It readily follows from the lemma that every normal system is linearly independent.

Lemma 5.4. Every subspace L ⊂ V has a normal basis.

Proof. We use induction on m = dimK L. The case m = 1 is obvious, so let us handle the induction step m − 1 → m. It suffices to show the following: Let e1, . . . , em−1 be a normal basis in a subspace L′, and let v ∉ L′ be any vector. Put L = SpanK(L′ ∪ {v}). Then there exists em ∈ L so that e1, . . . , em is a normal basis. Indeed, assume without loss of generality that for all i = 1, . . . , m − 1 one has ei = xi + o(xi). Put W = SpanK(xm, . . . , xn). We claim that L′ ∩ W = {0}. Indeed, otherwise
$$\lambda_1 e_1 + \dots + \lambda_{m-1} e_{m-1} = \lambda_m x_m + \dots + \lambda_n x_n,$$
where the linear combinations on the right-hand and the left-hand sides are non-trivial. Apply F to both sides of this equality. By Lemma 5.3,
$$F(\lambda_1 e_1 + \dots + \lambda_{m-1} e_{m-1}) = F(x_p) \bmod \Gamma, \quad \text{where } 1 \le p \le m-1,$$
while
$$F(\lambda_m x_m + \dots + \lambda_n x_n) = F(x_q) \bmod \Gamma, \quad \text{where } q \ge m.$$
This contradicts the genericity of our decorated complex, and the claim follows. Since dim L′ + dim W = dim V, we have that V = L′ ⊕ W. Decompose v as u + w with u ∈ L′, w ∈ W, and note that w ∈ L. Note that e1, . . . , em−1, w are linearly independent. Furthermore, w = λ(xp + o(xp)) for some p ≥ m. Put em = λ⁻¹w. The vectors e1, . . . , em form a normal basis in L.

The same proof shows that if L1 ⊂ L2 are subspaces of V, every normal basis in L1 extends to a normal basis in L2.

Now we turn to the analysis of the differential d. Choose a normal basis g1, . . . , gq in Im d, and extend it to a normal basis g1, . . . , gq, h1, . . . , hp in Ker d. Note that each of these p + q vectors has the form xj + o(xj) with distinct j. Let us assume without loss of generality that the remaining n − p − q elements of the preferred basis in V are x1, . . . , xq, and
$$g_i = x_{i+q} + o(x_{i+q}), \quad h_j = x_{j+2q} + o(x_{j+2q}).$$
Here we use that, by the dimension theorem, n = p + 2q. Note that x1, . . . , xq, g1, . . . , gq, h1, . . . , hp is a normal system, and hence a basis in V. We call such a basis a spectral basis of the decorated complex V. Note that [h1], . . . , [hp] is a basis in the homology H(V). Consider any homology class $a = \sum_i \lambda_i [h_i]$. Every element v ∈ V with a = [v] can be written as $v = \sum_i \lambda_i h_i + \sum_j \alpha_j g_j$. Thus, by Lemma 5.3, $F(v) \ge \max_i F(\lambda_i h_i)$ and hence
$$c(a) = \max_i F(\lambda_i h_i). \qquad (26)$$
This proves in particular that the spectral invariants are finite provided a ≠ 0.

For finite sets A = {v1, . . . , vs} and B = {w1, . . . , ws} we write A ⊗ B for the finite set {vi ⊗ wj}. Assume now that V1, V2 are generic decorated complexes. We say that they are in general position if their tensor product V = V1 ⊗̂K V2 is generic. Let
$$B_i = \{x^{(i)}_1, \dots, x^{(i)}_{q_i},\ g^{(i)}_1, \dots, g^{(i)}_{q_i},\ h^{(i)}_1, \dots, h^{(i)}_{p_i}\}, \quad i = 1, 2,$$
be a spectral basis in Vi. Obviously, B1 ⊗ B2 is a normal basis in V1 ⊗̂K V2. We shall denote by d1, d2, d the differentials and by F1, F2, F the filters in V1, V2 and V respectively. Put
$$G_i = \{g^{(i)}_1, \dots, g^{(i)}_{q_i}\}, \quad H_i = \{h^{(i)}_1, \dots, h^{(i)}_{p_i}\}$$
and K = G1 ⊗ B2 ∪ B1 ⊗ G2. Observe that Im d ⊂ W := Span(K). Take any two classes
$$a_i = \sum_j \lambda^{(i)}_j [h^{(i)}_j] \in H(V_i), \quad i = 1, 2.$$
Suppose that a1 ⊗ a2 = [v]. Then v is of the form
$$v = \sum_{m,l} \lambda^{(1)}_m \lambda^{(2)}_l\, h^{(1)}_m \otimes h^{(2)}_l + w,$$
where w must lie in W. Observe that (H1 ⊗ H2) ∩ K = ∅. By Lemma 5.3,
$$F(v) \ge \max_{m,l} F\big(\lambda^{(1)}_m \lambda^{(2)}_l\, h^{(1)}_m \otimes h^{(2)}_l\big),$$
and hence
$$c(a_1 \otimes a_2) = \max_{m,l} F\big(\lambda^{(1)}_m \lambda^{(2)}_l\, h^{(1)}_m \otimes h^{(2)}_l\big) = \max_{m,l}\Big(F_1\big(\lambda^{(1)}_m h^{(1)}_m\big) + F_2\big(\lambda^{(2)}_l h^{(2)}_l\big)\Big) = \max_m F_1\big(\lambda^{(1)}_m h^{(1)}_m\big) + \max_l F_2\big(\lambda^{(2)}_l h^{(2)}_l\big) = c(a_1) + c(a_2).$$
In the last equality we used (26). This completes the proof of Theorem 5.2 for decorated complexes in general position.

It remains to remove the general position assumption. This will be done with the help of the following lemma. We shall work with a family of decorated complexes V := (V, {xi}i=1,...,n, F, d, Γ) which have exactly the same data (preferred basis, grading, differential and Γ) with the exception of the filter F, which will be allowed to vary in the class of filters. The corresponding spectral invariants will be denoted by c(a, F).

Lemma 5.5.
(i) If filters F, F′ satisfy F(xi) ≤ F′(xi) for all i = 1, . . . , n, then c(a, F) ≤ c(a, F′) for all non-zero classes a ∈ H(V).
(ii) If F is a filter and θ ∈ R, then F + θ is again a filter and c(a, F + θ) = c(a, F) + θ for all non-zero classes a ∈ H(V).

The proof is obvious and we omit it. It follows that for any two filters F, F′
$$|c(a, F) - c(a, F')| \le \|F - F'\|_{C^0} \quad \forall a \in H(V) \setminus \{0\}.$$
Assume now that V1, V2 are decorated complexes. Denote by F1, F2 their filters. Fix ε > 0. By a small perturbation of the filters we get new filters, F′1 and F′2, on our complexes so that the complexes become generic and in general position, and furthermore
$$\|F_1 - F'_1\|_{C^0} \le \epsilon, \quad \|F_2 - F'_2\|_{C^0} \le \epsilon.$$
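For the reader's convenience, here is how the Lipschitz estimate for spectral invariants follows from Lemma 5.5. This is a short sketch; the symbol θ and the reading of the C⁰-norm as the maximum of |F(xi) − F′(xi)| over the preferred basis are assumptions made for the illustration.

```latex
% Notation introduced for this sketch (an assumption about the norm):
%   \theta := \|F - F'\|_{C^0} = \max_i |F(x_i) - F'(x_i)| .
\begin{align*}
F(x_i) \le F'(x_i) + \theta \ \ \forall i
  &\;\Longrightarrow\; c(a,F) \le c(a,F'+\theta) = c(a,F') + \theta
  && \text{(Lemma 5.5 (i), then (ii))}, \\
F'(x_i) \le F(x_i) + \theta \ \ \forall i
  &\;\Longrightarrow\; c(a,F') \le c(a,F) + \theta , \\
\text{hence}\quad |c(a,F) - c(a,F')| &\le \theta .
\end{align*}
% On the tensor product the filters add on basis elements, so
\begin{equation*}
\|(F_1+F_2) - (F'_1+F'_2)\|_{C^0} \le \|F_1 - F'_1\|_{C^0} + \|F_2 - F'_2\|_{C^0} \le 2\epsilon .
\end{equation*}
```

With these bounds the three spectral invariants c(a1, F1), c(a2, F2) and c(a1 ⊗ a2, F1 + F2) move by at most ε, ε and 2ε respectively under the perturbation.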
Given homology classes ai ∈ H(Vi) we have
$$|c(a_1, F_1) + c(a_2, F_2) - c(a_1 \otimes a_2, F_1 + F_2)| \le |c(a_1, F'_1) + c(a_2, F'_2) - c(a_1 \otimes a_2, F'_1 + F'_2)| + 4\epsilon = 4\epsilon.$$
Here we used that Theorem 5.2 is already proved for generic complexes in general position. Since ε > 0 is arbitrary, we get that c(a1, F1) + c(a2, F2) − c(a1 ⊗ a2, F1 + F2) = 0, which completes the proof of Theorem 5.2 in full generality.

6 Stable non-displaceability of heavy sets

In this section we prove part (ii) of Theorem 1.2.

Proposition 6.1. Every heavy subset is stably non-displaceable.

For the proof we shall need the following auxiliary statement. Given R > 0, consider the torus $T^2_R$ obtained as the quotient of the cylinder T∗S¹ = R(r) × S¹(θ mod 1) by the shift (r, θ) ↦ (r + R, θ). For α > 0 define the function Fα(r, θ) := αf(r) on $T^2_R$, where f(r) is any R-periodic function having only two non-degenerate critical points on [0, R]: a maximum point at r = 0 with f(0) = 1, and a minimum point at r = R/2, f(R/2) =: −β < 0. We denote by [T] the fundamental class of $T^2_R$. We work with the symplectic form dr ∧ dθ on $T^2_R$.

Lemma 6.2. c([T], Fα) = α.

Proof. Note that the contractible closed orbits of period 1 of the Hamiltonian flow generated by Fα are fixed points forming circles S+ = {r = 0} and S− = {r = R/2}. The actions of the fixed points on S± are equal to α and −αβ respectively, and thus the spectral invariants of Fα lie in the set {α, −αβ}. Recall from [59] that c([T], Fα) > c([point], Fα). Thus c([T], Fα) = α.

Lemma 6.3. Let H ∈ C∞(M) be such that H⁻¹(max H) is displaceable. Then ζ(H) < max H.

Proof. Choose ε > 0 so that the set H⁻¹((max H − ε, max H]) is displaceable. Choose a real-valued cut-off function ρ : R → [0, 1] which equals 1 near max H and which is supported in (max H − ε, max H + ε). Thus ρ(H) is supported in H⁻¹((max H − ε, max H]) and ζ(ρ(H)) = 0. Since H and ρ(H) Poisson-commute, the vanishing and the monotonicity axioms yield
$$\zeta(H) = \zeta(\rho(H)) + \zeta(H - \rho(H)) \le \max(H - \rho(H)) < \max H.$$
Proof of Proposition 6.1: It suffices to show that for every R > 0 the set Y := X × {r = 0} ⊂ M′ := M × $T^2_R$ is non-displaceable. Assume on the contrary that Y is displaceable. Choose a function H on M with H ≤ 0, H⁻¹(0) = X. Put H′ = H + F1 = H + f(r) : M′ → R. Assume that the partial quasi-state ζ on M is associated to some non-zero idempotent a ∈ QH∗(M) by means of (2). Denote by ζ′ the partial quasi-state on M′ associated to a ⊗ [T]. Note that Y = (H′)⁻¹(max H′), where max H′ = 1, while Theorem 5.1 and Lemma 6.2 imply that ζ′(H′) = ζ(H) + 1. By Lemma 6.3, ζ′(H′) < 1 and so ζ(H) < 0. In view of Proposition 4.1, we get a contradiction with the heaviness of X.

Lemma 6.2 also yields a simple proof of the following result, which also follows from Corollary 1.15:

Corollary 6.4. Any meridian of T² is heavy (with respect to the fundamental class [T]).

Proof. In the notation as above identify T² with $T^2_1$ (that is, R = 1). Since any two meridians of T² can be mapped into each other by a symplectic isotopy and since such an isotopy preserves heaviness, it suffices to prove that the meridian S := S+ = {r = 0} (see the proof of Lemma 6.2) is heavy. Let H : T² → R be a Hamiltonian and let us show that $\zeta(H) \ge \inf_S H$, where ζ is defined using [T]. Shifting H, if necessary, by a constant, we may assume without loss of generality that $\inf_S H = 1$. Pick f = f(r) : T² → R as in the definition of Fα so that F1 = f ≤ H on T² (note that f equals 1 on S). Then Lemma 6.2 yields
$$\zeta(H) \ge \zeta(F_1) = 1 = \inf_S H.$$

7 Analyzing stable stems

Proof of Theorem 1.6: Assume that A is a Poisson-commutative subspace of C∞(M), Φ : M → A∗ its moment map with the image ∆, and let X = Φ⁻¹(p) be a stable stem of A. Take any function H ∈ C∞(A∗) with H ≥ 0 and H(p) = 0. We claim that ζ(Φ∗H) = 0. By an arbitrarily small C⁰-perturbation of H we can assume that H = 0 in a small neighborhood, say U, of p. Choose an open covering U0, U1, . . . , UN of ∆ so that U0 = U, and all Φ⁻¹(Ui) are stably displaceable for i ≥ 1 (it exists by the definition of a stem). Let ρi : ∆ → R, i = 0, . . . , N, be a partition of unity subordinated to the covering {Ui}. Take the two-torus $T^2_R$ as in Section 6. Choose R > 0 large enough so that Φ⁻¹(Ui) × {r = const} is displaceable in M × $T^2_R$ for all i ≥ 1. Choose now a sufficiently fine covering Vj, j = 1, . . . , K, of the torus $T^2_R$ by sufficiently thin annuli {|r − rj| < δ} so that the sets Φ⁻¹(Ui) × Vj are displaceable in M × $T^2_R$ for all i ≥ 1 and all j. Let ϱj = ϱj(r), j = 1, . . . , K, be a partition of unity subordinated to the covering {Vj}. Denote by ζ′ the partial quasi-state corresponding to a ⊗ [T]. Put F(r, θ) = cos(2πr/R). Write
$$\Phi^*H + F = \sum_{i=0}^{N}\sum_{j=1}^{K} (\Phi^*H + F)\cdot \Phi^*\rho_i \cdot \varrho_j = \Phi^*(H\rho_0) + F\cdot\Phi^*\rho_0 + \sum_{i=1}^{N}\sum_{j=1}^{K} (\Phi^*H + F)\cdot \Phi^*\rho_i \cdot \varrho_j.$$
Note that Hρ0 = 0 and F · Φ∗ρ0 ≤ 1. Applying partial quasi-additivity and monotonicity we get that ζ′(Φ∗H + F) = ζ′(F · Φ∗ρ0) ≤ 1. By Lemma 6.2 and the product formula (Theorem 5.1 above) we have ζ′(Φ∗H + F) = ζ(Φ∗H) + 1 ≤ 1 and hence ζ(Φ∗H) ≤ 0. On the other hand, ζ(Φ∗H) ≥ 0 since H ≥ 0. Thus ζ(Φ∗H) = 0 and the claim follows.

Further, given any function G on M with G ≥ 0 and G|X = 0, one can find a function H on A∗ with H ≥ 0 and H(p) = 0 so that G ≤ Φ∗H. By monotonicity and the claim above, 0 ≤ ζ(G) ≤ ζ(Φ∗H) = 0, and hence ζ(G) = 0. Thus X is superheavy.

8 Monotone Lagrangian submanifolds

The main tool for proving (super)heaviness of monotone Lagrangian submanifolds satisfying the Albers condition is the spectral estimate in Proposition 8.1(iii) below, which originated in the work of Albers [2]. Later on Biran and Cornea pointed out a mistake in [2], and suggested a correction together with a far-reaching generalization in [15]. Let us mention that the original Albers estimate was used in the first version of the present paper.
We thank Biran and Cornea for informing us about the mistake, explaining to us their approach and helping us to correct a number of our results affected by this mistake.

The main ingredient of the Biran-Cornea techniques which is needed for our purposes is the following result. Let (M, ω) be a closed monotone symplectic manifold with [ω] = κ · c1(M), κ > 0. Write N for the minimal Chern number of (M, ω). Let $L^n \subset M^{2n}$ be a closed monotone Lagrangian submanifold with the minimal Maslov number NL ≥ 2. We shall treat slightly differently the cases when NL is even and odd. Let us mention that for orientable L, NL is automatically even. Thus, due to our convention, when NL is odd we work with the basic field F = Z2. Let Γ = κN · Z be the group of periods of M. Recall that the quantum ring has the form QH∗(M) = H∗(M; F) ⊗F Λ, where the Novikov ring Λ is defined as Λ = KΓ[q, q⁻¹]. Put Γ′ = (κN/2) · Z. Consider an extended Novikov ring $\Lambda' := K_{\Gamma'}[q^{1/2}, q^{-1/2}]$. Define now QH′∗(M) as QH∗(M) if NL is even, and as H∗(M, Z2) ⊗Z2 Λ′ if NL is odd. In the latter case QH′∗(M) is an extension of QH∗(M), and we shall consider without special mention QH∗(M), Λ, KΓ as subrings of QH′∗(M), Λ′, KΓ′. The grading of QH′∗(M) is determined by the condition $\deg q^{1/2} = 1$. As before, we shall use the notation QH′•(M), where • = “even” when F = C and • = ∗ when F = Z2.

Note that the spectral invariants (and hence partial symplectic quasi-states) are well-defined over the extended ring, and furthermore, their values and properties, for tautological reasons, do not alter under such an extension (cf. the discussion in [15], Section 5.4).

Put $w := s^{\kappa N_L/2} q^{N_L/2}$. Recall that j stands for the natural morphism H•(L; F) → H•(M; F).

Proposition 8.1. Assume that k > n + 1 − NL. If F = C assume in addition that k is even. Then there exists a canonical homomorphism $j^q : H_k(L; F) \to QH'_k(M)$ (the letter “q” in $j^q$ stands for “quantum”) with the following properties:
(i) $j^q(x) = j(x) + w^{-1}y$, where y is a polynomial in $w^{-1}$ with coefficients in H•(M; F);
(ii) $j^q([L]) = j([L])$;
(iii) If $j^q(x) \neq 0$ then $c(j^q(x), H) \le \sup_L H$ for every H ∈ C∞(M).
In particular, if S is an Albers element of L, we have $j^q(S) = j(S) + O(w^{-1}) \neq 0$.

This proposition was proved by Biran and Cornea in [15] in the case F = Z2: the map $j^q$ is essentially the map iL appearing in Theorem A(iii) in [15]. Proposition 8.1(i) above is a combination of Theorem A(iii) and Proposition 4.5.1(i) in [15]. Our variable w corresponds to the variable t⁻¹ in [15], while our $s^{N\kappa}q^{N}$ corresponds to the variable s⁻¹ in Section 2.1.2 of [15]. After such an adjustment of the notation, the formula $w := s^{\kappa N_L/2} q^{N_L/2}$ above can be extracted from Section 2.1.2 of [15]. For Proposition 8.1(ii) above see Remark 5.3.2.a in [15]. Proposition 8.1(iii) above follows from Lemma 5.3.1(ii) in [15]. Finally, let us repeat the disclaimer made in Section 1.5: we take for granted that Proposition 8.1 remains valid for the case F = C.

Let us pass to the proofs of our results on (super)heaviness of monotone Lagrangian submanifolds. We start with the following remark. Let S be an Albers element of L. The Poincaré duality property of spectral invariants (see Section 3.4 above) extends verbatim to the case of the ring QH′(M) with the following modification: when NL is odd, the pairing Π introduced in Section 3.4 extends in the obvious way to a non-degenerate F-valued pairing on QH′•(M), which we still denote by Π. Applying Poincaré duality and substituting H := −F into Proposition 8.1(iii) above we get that for every function F ∈ C∞(M)
$$c(T, F) \ge \inf_L F \quad \forall T \in QH'_\bullet(M) \text{ with } \Pi(T, j^q(S)) \neq 0.$$
In particular, given a non-zero idempotent a ∈ QH′•(M) and a class b ∈ QH′•(M) so that $\Pi(a * b, j^q(S)) \neq 0$, we get, using the normalization property of spectral invariants, that
$$c(a, F) + \nu(b) \ge c(a * b, F) \ge \inf_L F \quad \forall F \in C^\infty(M). \qquad (27)$$
Therefore, applying (27) to kF for k ∈ N, dividing by k and passing to the limit as k → +∞, we get that for the partial quasi-state ζ defined by a,
$$\zeta(F) \ge \inf_L F \quad \forall F \in C^\infty(M),$$
meaning that L is heavy with respect to a.

Proof of Theorem 1.15: Let S be an Albers element of L. Let T ∈ H•(M; F) be any singular homology class such that T ◦ j(S) ≠ 0. Applying Proposition 8.1(i) we see that $\Pi([M] * T, j^q(S)) = \Pi(T, j^q(S)) \neq 0$, and hence inequality (27), applied to a = [M], b = T, yields that L is heavy with respect to [M].

Let us pass to the proof of Theorem 1.25 on the effect of semi-simplicity of the quantum homology. It readily follows from the next more general statement. Let L1, . . . , Lm be Lagrangian submanifolds satisfying the Albers condition. Let Si be any Albers element of Li. Denote by $Z_i = j^q(S_i) \in QH'_\bullet(M)$ its image under the inclusion morphism from Proposition 8.1 above.

Theorem 8.2. Given such L1, . . . , Lm and Z1, . . . , Zm, assume, in addition, that QH2n(M) is semi-simple and the Lagrangian submanifolds L1, . . . , Lm are pairwise disjoint. Then the classes Z1, . . . , Zm are linearly independent over KΓ′.

Proof. Arguing by contradiction, assume that
$$Z_1 = \alpha_2 Z_2 + \dots + \alpha_m Z_m \qquad (28)$$
for some α2, . . . , αm ∈ KΓ′. Since QH2n(M) is semi-simple, it decomposes into a direct sum of fields with unities e1, . . . , ed. Since the pairing Π (on QH′•(M; F)) is non-degenerate, there exists T ∈ QH′•(M; F) such that
$$\Pi(T, Z_1) \neq 0. \qquad (29)$$
Let us write T as
$$T = [M] * T = \sum_{i=1}^{d} e_i * T. \qquad (30)$$
Equations (29), (30) imply that there exists l ∈ [1, d] such that
$$\Pi(e_l * T, Z_1) \neq 0. \qquad (31)$$
Then (28) implies that there exists r ∈ [2, m] such that Π(el ∗ T, αrZr) ≠ 0. Using (21) (for Π on QH′•(M; F)) we can rewrite the last equation as
$$\Pi(e_l * \alpha_r T, Z_r) \neq 0. \qquad (32)$$
Applying now formula (27) for S = S1 ∈ H•(L1; F), a = el, b = T, and also for S = Sr ∈ H•(Lr; F), a = el, b = αrT, we conclude that both L1 and Lr are heavy with respect to el.
Thus they are superheavy with respect to el, because el is the unity in a field factor of QH2n(M) (see Section 1.6). Hence they must intersect, in contradiction to the assumption of the theorem. This finishes the proof of the first part of the theorem.

Proof of Theorem 1.25(a): Assume that L1, . . . , Lm are pairwise disjoint Lagrangian submanifolds satisfying the condition (a) from the formulation of the theorem. Denote by Ni the minimal Maslov number of Li. Since Ni > n + 1, the class of a point from H0(Li; F) is an Albers element for Li. Let Zi ∈ QH′0(M) be its image under the Biran-Cornea inclusion morphism associated to Li. Note that by Proposition 8.1(i)
$$Z_i = p + a_i w_i^{-1},$$
where $w_i = s^{\kappa N_i/2} q^{N_i/2}$, $a_i \in H_{N_i}(M; F)$ and p ∈ H0(M; F) is the homology class of a point. Observe that deg wi = Ni > n + 1, and hence the expression for Zi cannot contain terms in $w_i^{-1}$ of order two and higher, since $H_{kN_i}(M; F) = 0$ for k ≥ 2. Recall now that all Ni’s lie in some set Y of positive integers. Let W ⊂ QH′0(M) be the span over KΓ′ of
$$H_0(M; F) \oplus \bigoplus_{E \in Y} s^{-\kappa E/2} q^{-E/2} \cdot H_E(M; F).$$
Note that $\dim_{K_{\Gamma'}} W = \beta_Y(M) + 1 < m$. Thus the elements Zi, i = 1, . . . , m, are linearly dependent over KΓ′. By Theorem 8.2, QH2n(M) is not semi-simple.

Proof of Theorem 1.25(b): Assume that L1, . . . , Lm are pairwise disjoint homologically non-trivial Lagrangian submanifolds. By Proposition 8.1(ii), $j^q([L_i]) = j([L_i])$ for every i = 1, . . . , m. Since the classes j([Li]) are linearly dependent, Theorem 8.2 implies that QH2n(M) is not semi-simple.

Proof of Theorem 1.18: Combining Proposition 8.1 (ii) and (iii) we get that
$$c(j([L]), H) \le \sup_L H \quad \forall H \in C^\infty(M).$$
By the hypothesis of the theorem, we can write j([L]) ∗ b = a for some b. Thus
$$c(a, H) = c(j([L]) * b, H) \le c(j([L]), H) + c(b, 0),$$
and therefore
$$c(a, H) \le \sup_L H + c(b, 0).$$
Applying this inequality to E · H with E > 0, dividing by E and passing to the limit as E → +∞ we get that $\zeta(H) \le \sup_L H$ for all H. Thus L is superheavy.

Remark 8.3.
The results above admit the following generalizations in the framework of the Biran-Cornea theory. The main object of this theory is the quantum homology ring QH∗(L) of a monotone Lagrangian submanifold, which is isomorphic to the Lagrangian Floer homology HF∗(L, L) of L up to a shift of the grading.
(i) If QH∗(L) does not vanish then L is heavy (see Remark 1.2.9b in [15]). In fact, it follows from [15] that if L satisfies the Albers condition, QH∗(L) does not vanish.
(ii) The map $j^q$ of Proposition 8.1 above is a footprint of the quantum inclusion map iL : QH∗(L) → QH′∗(M) constructed in [15]. The analogue of the action estimate in item (iii) of the proposition is obtained in [15] for the classes iL(x) for elements x ∈ QH∗(L) of a certain special form, yielding the following generalization of Theorem 1.18: for these special classes x ∈ QH∗(L), the condition that the class iL(x) does not vanish and divides a non-trivial idempotent a implies that L is superheavy with respect to a. This makes it possible, for instance, to generalize Example 1.19 on Lagrangian spheres in quadrics above to the case when dim L is odd.
(iii) In [15] one can find another action estimate which comes from the QH∗(M)-module structure on QH∗(L), which yields more results on (super)heaviness of monotone Lagrangian submanifolds.

Proof of Proposition 1.4: The quantum homology QH2n(M) splits as an algebra over K into a direct sum of two algebras, one of which is a field. This was proved by McDuff for the field F = C (see [39] and [24, Section 7]), but the proof goes through for the case F = Z2 as well. Denote the unity of the field by a. It is a non-zero idempotent in QH2n(M). As we already pointed out in Remark 1.21, such an idempotent a defines a genuine symplectic quasi-state and therefore the classes of heavy and superheavy sets with respect to a coincide.
By Theorem 1.2, the Lagrangian torus L ⊂ M cannot be superheavy with respect to a, since it can be displaced from itself by a symplectic (non-Hamiltonian) isotopy. Indeed, take an obvious symplectic isotopy φt of $T^{2n}$ that displaces L (a parallel shift) and compose it with a Hamiltonian isotopy ψt so that for every t we have that ψt is constant on φt(L) and ψtφt is the identity on the ball where the blow-up of $T^{2n}$ was performed. Clearly, the resulting symplectic isotopy ψtφt extends to a symplectic isotopy of M that displaces L.

On the other hand, NL ≥ 2 because in this case NL = 2N, where N ≥ 1 is the minimal Chern number of M. Finally, note that L represents a non-trivial homology class in Hn(M; Z2). Therefore we can apply Theorem 1.15 and get that L is heavy with respect to [M].

9 Rigidity of special fibers of Hamiltonian actions

In this section we prove Theorem 1.9. Denote the special fiber of Φ by L := Φ⁻¹(pspec).

Reduction to the case of T¹-actions: First, we claim that it is enough to prove the theorem for Hamiltonian T¹-actions and the general case will follow from it. Indeed, assume this is proved. The superheaviness of the special fiber immediately yields that for any function H̄ : R → R
$$\zeta(\Phi^*\bar H) = \bar H(p_{spec}), \qquad (33)$$
where Φ : M → R is the moment map of the T¹-action. Let us turn to the multi-dimensional situation and let Φ : M → R^k be the normalized moment map of a Hamiltonian T^k-action on M. For v ∈ R^k put Φv(x) = 〈v, Φ(x)〉, where 〈·, ·〉 is the standard Euclidean inner product on R^k. Note that if v ∈ Z^k the function Φv is the normalized moment map of a Hamiltonian circle action and its special value is 〈v, pspec〉. Thus by (33)
$$\zeta(\Phi_v^* K) = K(\langle v, p_{spec}\rangle) \quad \forall K \in C^\infty(\mathbb{R}). \qquad (34)$$
By homogeneity of ζ, equality (34) holds for all v ∈ Q^k, and by continuity for all v ∈ R^k. Observe that for each pair of smooth functions P, Q ∈ C∞(R) and for each pair of vectors v, w ∈ R^k the functions $\Phi_v^* P$ and $\Phi_w^* Q$ Poisson-commute on M.
Thus the triangle inequality for the spectral numbers (see Section 3.4) yields

$$\zeta(\Phi_v^*P + \Phi_w^*Q) \le \zeta(\Phi_v^*P) + \zeta(\Phi_w^*Q). \tag{35}$$

Since M is compact, it suffices to assume that the function H̄ ∈ C∞(Rk) on Rk is compactly supported. By the inverse Fourier transform we can write

$$\bar H(p) = \int_{\mathbb{R}^k} \big( \sin\langle v, p\rangle \cdot F(v) + \cos\langle v, p\rangle \cdot G(v) \big)\, dv$$

for some rapidly (say, faster than (|p| + 1)−N for any N ∈ N) decaying functions F and G on Rk. For every v ∈ Rk define a function Kv ∈ C∞(R) by

$$K_v(s) := \sin s \cdot F(v) + \cos s \cdot G(v).$$

Observe that

$$\Phi^*\bar H = \int_{\mathbb{R}^k} \Phi_v^* K_v \, dv.$$

Denote by B(R) the Euclidean ball of radius R in Rk with the center at the origin. Put

$$\bar H_R(p) = \int_{B(R)} K_v(\langle v, p\rangle)\, dv, \quad p \in \mathbb{R}^k.$$

Since the functions F and G are rapidly decaying, we get that

$$\|\bar H_R - \bar H\|_{C^0(\mathbb{R}^k)} \to 0 \ \text{ as } R \to \infty. \tag{36}$$

We claim that for every R

$$\zeta(\Phi^*\bar H_R) \le \bar H_R(p_{spec}). \tag{37}$$

Indeed, for ε > 0 introduce the integral sum

$$\bar H_{R,\varepsilon}(p) = \sum_{v \in \varepsilon\cdot\mathbb{Z}^k \cap B(R)} \varepsilon^k \cdot K_v(\langle v, p\rangle).$$

Then

$$\Phi^*\bar H_{R,\varepsilon} = \sum_{v \in \varepsilon\cdot\mathbb{Z}^k \cap B(R)} \varepsilon^k \cdot \Phi_v^* K_v.$$

Applying repeatedly (35) and (34) we get that

$$\zeta(\Phi^*\bar H_{R,\varepsilon}) \le \bar H_{R,\varepsilon}(p_{spec}).$$

Note now that for fixed R the family H̄R,ε converges to H̄R as ε → 0 in the uniform norm on C0(Rk). Using that ζ is Lipschitz with respect to the uniform norm on C0(M) we readily get the inequality (37). Combining the fact that ζ is Lipschitz with (36) and (37) we get that

$$\zeta(\Phi^*\bar H) = \lim_{R\to\infty} \zeta(\Phi^*\bar H_R) \le \lim_{R\to\infty} \bar H_R(p_{spec}) = \bar H(p_{spec}).$$

Now, assume that H̄ ≥ 0 and H̄(pspec) = 0. We have just proved that ζ(Φ∗H̄) ≤ 0, and hence ζ(Φ∗H̄) = 0, which immediately yields the desired superheaviness of the special fiber. This completes the reduction of the general case to the 1-dimensional case.

From now on we will consider only the case of an effective Hamiltonian T1-action on M with a moment map Φ : M → R. Its moment polytope ∆ is a closed interval in R and pspec = −I(Φ) ∈ R.

Reduction to the case of a strictly convex function: We claim that it is enough to show the following proposition:

Proposition 9.1. Assume H̄ : R → R is a strictly convex smooth function reaching its minimum at pspec. Set H := Φ∗H̄.
Then ζ(H) = H̄(pspec).

Postponing the proof of the proposition for a moment let us show that it implies the theorem. Indeed, let F : M → R be a Hamiltonian on M. In order to show the superheaviness of L = Φ−1(pspec) we need to show that ζ(F) ≤ supL F. Pick a very steep strictly convex function H̄ : R → R with the minimum value supL F reached at pspec and such that Φ∗H̄ =: H ≥ F everywhere on M. Then using Proposition 9.1 and the monotonicity of ζ we get

$$\zeta(F) \le \zeta(H) = \bar H(p_{spec}) = \sup_L F,$$

yielding the claim.

Preparations for the proof of Proposition 9.1: Given a (time-dependent, not necessarily regular) Hamiltonian G, we associate to every pair [γ, u] ∈ P̃G a number

$$D_G([\gamma, u]) := A_G([\gamma, u]) - \frac{\kappa}{2}\cdot CZ_G([\gamma, u]).$$

(Recall that we defined the Conley-Zehnder index for all Hamiltonians and not only the regular ones; see Section 3.3.) The number DG([γ, u]) is invariant under a change of the spanning disc u: an addition of a sphere jS ∈ HS2(M) to the disc u changes both AG([γ, u]) and κ/2 · CZG([γ, u]) by the same number. Thus we can write DG([γ, u]) = DG(γ).

Given [γ, u] ∈ P̃G and l ∈ N define γ(l) and u(l) as the compositions of γ and u with the map z → zl on the unit disc D2 ⊂ C (here z is a complex coordinate on C). Denote by t ↦ gt the time-t flow of G and by G(l) : M × R → R the Hamiltonian whose time-t flow is t ↦ (gt)l and which is defined by G(l) := G♯ . . . ♯G (l times), where

$$G\sharp K(x, t) := G(x, t) + K(g_t^{-1}x, t)$$

for any K : M × R → R.

Proposition 9.2. There exists a constant C > 0, depending only on n, with the following property. Given a 1-periodic orbit γ ∈ PG of the flow t ↦ gt generated by G, assume that γ(l) is a 1-periodic orbit of the flow t ↦ (gt)l generated by G(l), and therefore for any u such that [γ, u] ∈ P̃G we have [γ(l), u(l)] ∈ P̃G(l). Then

$$|D_{G^{(l)}}([\gamma^{(l)}, u^{(l)}]) - l\,D_G([\gamma, u])| \le l \cdot C.$$

Proof.
The action term in DG gets multiplied by l as we pass from G to G(l). As for the Conley-Zehnder term, the quasi-morphism property of the Conley-Zehnder index (see Proposition 3.5) implies that there exists a constant C > 0 (depending only on n) such that

$$|l\,CZ_G([\gamma, u]) - CZ_{G^{(l)}}([\gamma^{(l)}, u^{(l)}])| \le C.$$

This immediately proves the proposition.

Proposition 9.3. Let G : M × [0, 1] → R be a Hamiltonian as above. Then one can choose ε > 0, depending on G, and a constant Cn > 0, depending only on n = dimM/2, so that any function F : M × [0, 1] → R which is ε-close to G in a C∞-metric on C∞(M × [0, 1]) satisfies the following condition: for every γ0 ∈ PF there exists γ ∈ PG such that the difference between DF(γ0) and DG(γ) is bounded by Cn.

Proof. Denote the flow of G by gt (as before) and the flow of F by ft. We will view time-1 periodic trajectories of these flows both as maps of [0, 1] to M having the same value at 0 and 1 and as maps from S1 to M.

First, consider the fibration D2 × M → M and, slightly abusing notation, denote the natural pullback of ω again by ω. Second, look at the fibration pr : D2 × M → D2. Denote by Vert the vertical bundle over D2 × M formed by the tangent spaces to the fibers of pr. For each loop σ : S1 → M denote by σ̂ : S1 → D2 × M the map σ̂(t) := (t, σ(t)). The bundles σ∗TM and σ̂∗Vert over S1 coincide. Similarly for each w : D2 → M denote by ŵ : D2 → D2 × M the map ŵ(z) := (z, w(z)).

There exists δ > 0, depending on G, such that for each γ ∈ PG a tubular δ-neighborhood of the image of γ̂ in S1 × M ⊂ D2 × M, denoted by $U_{\hat\gamma}$, has the following properties:
• there exists a 1-form λ on $U_{\hat\gamma}$ satisfying dλ = ω;
• Vert admits a trivialization over $U_{\hat\gamma}$.
Given an ε > 0, we can choose F sufficiently C∞-close to G so that the paths t ↦ ft and t ↦ gt in Ham(M) are arbitrarily C∞-close and therefore
• for every x ∈ Fix(F) there exists y ∈ Fix(G) which is ε-close to x (think of the fixed points as points of intersection of the graph of a diffeomorphism with the diagonal);
• the C∞-distance between the maps γ0 : t ↦ ft(x) and γ : t ↦ gt(y) from [0, 1] to M is bounded by ε and the image of γ̂0 lies in $U_{\hat\gamma}$.

Pick a map u0 : D2 → M, u0|∂D2 = γ0. Since γ0 and γ are C∞-close one can enlarge D2 to a bigger disc $D_1^2 \supset D^2$ and find a smooth map $u : D_1^2 \to M$ so that
• $u|_{\partial D_1^2} = u_0$;
• $u(D_1^2 \setminus D^2) \subset U_{\hat\gamma}$.

Rescaling $D_1^2$ we may assume without loss of generality that [γ, u] ∈ P̃G.

Trivialize the vector bundles $\gamma_0^*TM$ and $\gamma^*TM$ so that the trivializations extend to a trivialization of $u^*TM$ over $D_1^2$ (and hence of $u_0^*TM$ over D2). Using the trivializations we can identify the paths $t \mapsto d_{\gamma_0(0)}f_t$ and $t \mapsto d_{\gamma(0)}g_t$ with some identity-based paths of symplectic matrices A(t), B(t). Fixing a small ε as above, we can also assume that F is chosen so C∞-close to G that, in addition to all of the above, the C∞-distance between the paths t ↦ A(t) and t ↦ B(t) in Sp(2n) is bounded by ε (for instance, make sure first that the matrix paths obtained by writing the paths $t \mapsto d_{\gamma_0(0)}f_t$ and $t \mapsto d_{\gamma(0)}g_t$ using some trivialization of Vert over $U_{\hat\gamma}$ are close enough; then the matrix paths t ↦ A(t) and t ↦ B(t) will also be close enough).

We claim that by choosing ε sufficiently small in the construction above we can bound the difference between DF([γ0, u0]) and DG([γ, u]) by a quantity depending only on dimM.

Indeed, the difference

$$\Big| \int_0^1 F(\gamma_0(t), t)\,dt - \int_0^1 G(\gamma(t), t)\,dt \Big|$$

is bounded by a quantity depending only on some universal constants and ε, because γ0 is ε-close to γ and F is ε-close to G with respect to the C∞-metrics. It can be made arbitrarily small by choosing a sufficiently small ε.
The difference

$$\Big| \int_{D^2} u_0^*\omega - \int_{D^2} u^*\omega \Big| = \Big| \int_{D^2} \hat u_0^*\omega - \int_{D^2} \hat u^*\omega \Big|$$

is bounded by the difference

$$\Big| \int_{S^1} \hat\gamma_0^*\lambda - \int_{S^1} \hat\gamma^*\lambda \Big|.$$

Since γ0 and γ are ε-close in the C∞-metric, the latter difference can be made less than 1 if we choose a sufficiently small ε. Thus we have shown that by choosing a sufficiently small ε we can bound |AF([γ0, u0]) − AG([γ, u])| by 1.

Now, as far as the Conley-Zehnder indices are concerned, our choice of the trivializations means that the difference between CZF([γ0, u0]) and CZG([γ, u]) is just the difference between the Conley-Zehnder indices for the matrix paths t ↦ A(t) and t ↦ B(t). But the latter paths in Sp(2n) are ε-close in the C∞-sense, hence represent close elements of S̃p(2n), and if ε was chosen sufficiently small, then, as we mentioned in Section 3.3, their Conley-Zehnder indices differ at most by a constant depending only on n. This finishes the proof of the claim and the proposition.

Plan of the proof of Proposition 9.1: We assume now that H̄ is a fixed strictly convex function on R. Our calculations will feature E as a large parameter. For quantities α, β depending on E we will write α ≼ β if α ≤ β + const holds for large enough E, where const depends only on (M,ω), Φ and H̄, and in particular does not depend on E. We will write α ≈ β if α ≼ β and β ≼ α. Using this language the proposition can be restated as

$$c(a, EH) \approx E\bar H(p_{spec}). \tag{38}$$

In general, 1-periodic orbits of the flow of EH are not isolated and therefore the Hamiltonian is not regular. Let F be a regular (time-periodic) perturbation of EH. By the spectrality axiom, the spectral number c(a, F) for a ∈ QH2n(M) equals AF([γ0, u0]) for some pair [γ0, u0] ∈ P̃F with CZF([γ0, u0]) = 2n. Thus c(a, F) ≈ DF(γ0). Combining this with Proposition 9.3 we get that for some γ ∈ PEH

$$E\bar H(p_{spec}) \preccurlyeq c(a, EH) \approx c(a, F) \approx D_F(\gamma_0) \approx D_{EH}(\gamma). \tag{39}$$

Thus it would be enough to show that

$$D_{EH}(\gamma) \preccurlyeq E\bar H(p_{spec}) \ \text{ for all } \gamma \in P_{EH}, \tag{40}$$

which together with (39) would imply (38).
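The step c(a, F) ≈ DF(γ0) used in (39) can be spelled out. The short sketch below uses only the definition of D as the action minus κ/2 times the Conley-Zehnder index, together with the two facts quoted from the spectrality axiom; it assumes that κ is a constant attached to (M, ω), so that κn is a constant of the allowed type (independent of E).

```latex
\begin{align*}
D_F([\gamma_0,u_0])
  &= A_F([\gamma_0,u_0]) - \frac{\kappa}{2}\, CZ_F([\gamma_0,u_0]) \\
  &= c(a,F) - \frac{\kappa}{2}\cdot 2n
   \;=\; c(a,F) - \kappa n .
\end{align*}
```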
Inequality (40) will be proved in the following way. Note that each γ ∈ PEH lies in Φ−1(p) for some p ∈ ∆. We will show that

$$D_{EH}(\gamma) \approx E\bar H(p) + E\bar H'(p)(p_{spec} - p). \tag{41}$$

Note that (41) implies (40). Indeed, since H̄ is strictly convex and reaches its minimum at pspec, the tangent line to its graph at p lies below the graph, and it follows from (41) that

$$D_{EH}(\gamma) \approx E\bar H(p) + E\bar H'(p)(p_{spec} - p) \le E\bar H(p_{spec}),$$

which is true for any γ ∈ PEH, thus yielding (40).

Proof of (41): Let the T1-action on M be given by a loop of symplectomorphisms {φt}, t ∈ R, φt = φt+1. The flow of EH has the form

$$h_t x = \varphi_{E\bar H'(\Phi(x))t}\, x.$$

We view γ as a map γ : [0, 1] → M satisfying γ(0) = γ(1). Denote x := γ(0). The curve γ lies in Φ−1(p). Denote N := γ([0, 1]). This is the T1-orbit of x and it is either a point or a circle.

In the first case γ is a constant trajectory concentrated at a fixed point N ∈ M of the action. Using this constant curve γ together with the constant disc u spanning it for the definitions of I(Φ) and DEH(γ) one gets

$$p_{spec} - p = m_\Phi(\gamma, u)\cdot \kappa/2, \qquad D_{EH}(\gamma) = E\bar H(p) - \kappa/2 \cdot CZ_{EH}([\gamma, u]).$$

Thus proving (41) reduces in this case to proving

$$-CZ_{EH}([\gamma, u]) \approx E\bar H'(p)\cdot m_\Phi(\gamma, u).$$

Let us fix a symplectic basis of TNM and view each differential dNφt as a symplectic matrix A(t), so that {A(t)} is an identity-based loop in Sp(2n). Then

$$-CZ_{EH}([\gamma, u]) \approx CZ_{matr}(\{A(E\bar H'(p)t)\}),$$

while

$$E\bar H'(p)\cdot m_\Phi(\gamma, u) \approx E\bar H'(p)\,\mathrm{Maslov}(\{A(t)\}).$$

Thus we need to prove

$$CZ_{matr}(\{A(E\bar H'(p)t)\}) \approx E\bar H'(p)\,\mathrm{Maslov}(\{A(t)\}),$$

which follows easily from the definitions of the Conley-Zehnder index and the Maslov class.

Thus from now on we will assume that N is a circle. Take any point x ∈ N. The stabilizer of x under the T1-action is a finite cyclic group of order k ∈ N. Thus the orbit of the T1-action turns k times along N. Since γ is a non-constant closed orbit of the Hamiltonian flow generated by EΦ∗H̄, it turns r times along N with r ∈ Z \ {0}. This implies that EH̄′(p) = r/k. We claim that without loss of generality we may assume that l := r/k is an integer.
Indeed, we can always pass to γ(k) ∈ PkEH, so that (kEH̄)′(p) ∈ Z, and if we can prove the proposition for γ(k), then

$$D_{kEH}(\gamma^{(k)}) \approx kE\bar H(p) + kE\bar H'(p)(p_{spec} - p).$$

Applying Proposition 9.2 we get

$$k D_{EH}(\gamma) \approx kE\bar H(p) + kE\bar H'(p)(p_{spec} - p) + k\cdot const,$$

and hence

$$D_{EH}(\gamma) \approx E\bar H(p) + E\bar H'(p)(p_{spec} - p),$$

proving the claim for the original γ.

From now on we assume that l := EH̄′(p) ∈ Z \ {0} and that [γ, u] ∈ P̃lΦ.

Consider the Hamiltonian vector field X := sgradΦ at a point x ∈ N. Since N is a non-constant orbit we get X ≠ 0. Then V = Tx(Φ−1(p)) is the skew-orthogonal complement to X. Choose a T1-invariant ω-compatible almost complex structure J in a neighborhood of N. Together ω and J define a T1-invariant Riemannian metric g. Decompose the tangent bundle TM along N as follows. Put Z = Span(JX,X) and set W to be the g-orthogonal complement to X in V. Thus we have a T1-invariant decomposition

$$T_xM = W \oplus Z, \quad x \in N. \tag{42}$$

Furthermore, W and Z carry canonical symplectic forms. Thus W and Z define symplectic (and hence trivial) subbundles of TM over N. They induce trivial subbundles of the bundle γ∗TM over S1.

We calculate

$$dh_t(x)\xi = d\varphi_{E\bar H'(\Phi(x))t}(x)\xi + E\bar H''(\Phi(x))\cdot d\Phi(\xi)\cdot X. \tag{43}$$

We consider two trivializations of the bundle γ∗TM over S1. The first trivialization is defined by means of sections invariant under the T1-action. The second one is chosen in such a way that it extends to a trivialization of u∗TM over D2. Using these trivializations we can identify dht(x), respectively, with two identity-based paths $\{C_t\}$, $\{C'_t\}$ of symplectic matrices. The decomposition (42) induces a split $C_t = 1 \oplus B_t$.

We claim that $|CZ_{matr}(\{B_t\})|$ is bounded by a constant independent of E. Indeed, observe that in the basis (X, JX) of Z

$$B_t = \begin{pmatrix} 1 & b_{12}(t) \\ 0 & 1 \end{pmatrix}.$$

Denote by L the line spanned by X = (1, 0). Perturb $\{B_t\}$ to a path $\{B'_t := R_{\delta t}B_t\}$, where $R_t$ is the rotation by angle t, and δ > 0 is small enough. Observe that $B'_t L \cap L = \{0\}$ for t > 0.
It follows readily from the definitions that $|CZ_{matr}(\{B'_t\})|$ and $|CZ_{matr}(\{R_{\delta t}\})|$ do not exceed 2. Thus by the quasi-morphism property of the Conley-Zehnder index (see Proposition 3.5) we have that $|CZ_{matr}(\{B_t\})|$ is bounded by a constant independent of E, which yields the claim.

Therefore

$$CZ_{matr}(\{C_t\}) \approx 0.$$

On the other hand, by formula (18)

$$CZ_{matr}(\{C'_t\}) = CZ_{matr}(\{C_t\}) + m_{l\Phi}([\gamma, u]).$$

Hence

$$CZ_{EH}([\gamma, u]) := n - CZ_{matr}(\{C'_t\}) \approx -m_{l\Phi}([\gamma, u]). \tag{44}$$

Since the periodic trajectory γ lies inside Φ−1(p), we get

$$A_{EH}([\gamma, u]) = \int_0^1 EH(\gamma(t))\,dt - \int_{D^2} u^*\omega = E\bar H(p) - \int_{D^2} u^*\omega. \tag{45}$$

Using (45) and (44) the precise equality

$$D_{EH}([\gamma, u]) = A_{EH}([\gamma, u]) - \frac{\kappa}{2}\cdot CZ_{EH}([\gamma, u])$$

can be turned into an asymptotic equality

$$D_{EH}([\gamma, u]) \approx E\bar H(p) - \int_{D^2} u^*\omega + \frac{\kappa}{2}\, m_{l\Phi}([\gamma, u]). \tag{46}$$

Since the periodic trajectory γ lies inside Φ−1(p), we have

$$A_{l\Phi}([\gamma, u]) = \int_0^1 l\Phi(\gamma(t))\,dt - \int_{D^2} u^*\omega = lp - \int_{D^2} u^*\omega. \tag{47}$$

Adding and subtracting lp from the right-hand side of (46) and using (47) we get

$$
\begin{aligned}
D_{EH}(\gamma) = D_{EH}([\gamma, u]) &\approx \big(E\bar H(p) - lp\big) + \Big(lp - \int_{D^2} u^*\omega\Big) + \frac{\kappa}{2}\, m_{l\Phi}([\gamma, u]) \\
&= \big(E\bar H(p) - lp\big) + A_{l\Phi}([\gamma, u]) + \frac{\kappa}{2}\, m_{l\Phi}([\gamma, u]) \\
&= E\bar H(p) - lp - I(l\Phi) \\
&= E\bar H(p) + l(-I(\Phi) - p) = E\bar H(p) + l(p_{spec} - p).
\end{aligned}
$$

Recalling that l = EH̄′(p), we finally obtain that

$$D_{EH}(\gamma) \approx E\bar H(p) + E\bar H'(p)(p_{spec} - p),$$

which is precisely the relation (41) that we wanted to get. This finishes the proof of Proposition 9.1 and Theorem 1.9.

9.1 Calabi and mixed action-Maslov

Proof of Theorem 1.13. Assume H : M × [0, 1] → R is a normalized Hamiltonian which generates a loop in Ham(M) representing a class α ∈ π1(Ham(M)) ⊂ H̃am(M). Then H(l) is also normalized and generates a loop representing αl. Let us compute

$$\mu(\alpha) = -\mathrm{vol}(M)\cdot \lim_{l\to+\infty} c(a, H^{(l)})/l.$$

Arguing as in the proof of (39) we get that there exists a constant C > 0 such that for each l ∈ N there exists γ ∈ PH(l) for which

$$|c(a, H^{(l)}) - D_{H^{(l)}}(\gamma)| \le C.$$

But, as follows from the definitions and from the fact that I is a homomorphism, $D_{H^{(l)}}(\gamma)$ does not depend on γ and equals $-I(\alpha^l) = -l\,I(\alpha)$. This immediately implies that µ(α) = vol(M) · I(α).

Acknowledgements.
The origins of this paper lie in our joint work with P. Biran on the paper [10]; we thank him for fruitful collaboration at an early stage of this project, as well as for his crucial help with Example 1.17 on Lagrangian spheres in projective hypersurfaces. We also thank him and O. Cornea for pointing out to us a mistake in the original version of this paper and helping us with the correction (see Section 8). We thank F. Zapolsky for his help with the “exotic” monotone Lagrangian torus in S2 × S2 discussed in Example 1.20. We thank C. Woodward for pointing out to us the link between the special point in the moment polytope of a symplectic toric manifold and the Futaki invariant, and E. Shelukhin for useful discussions on this issue. We are also grateful to V.L. Ginzburg, Y. Karshon, Y. Long, D. McDuff, M. Pinsonnault, D. Salamon and M. Sodin for useful discussions and communications. We thank K. Fukaya, H. Ohta and K. Ono, the organizers of the Conference on Symplectic Topology in Kyoto (February 2006), M. Harada, Y. Karshon, M. Masuda and T. Panov, the organizers of the Conference on Toric Topology in Osaka (May 2006), O. Cornea, V.L. Ginzburg, E. Kerman and F. Lalonde, the organizers of the Workshop on Floer theory (Banff, 2007), and A. Fathi, Y.-G. Oh and C. Viterbo, the organizers of the AMS Summer Conference on Symplectic Topology and Measure-Preserving Dynamical Systems (Snowbird, July 2007), for giving us an opportunity to present a preliminary version of this work and for the superb job they did in organizing these conferences. Finally, we thank an anonymous referee for helpful comments and corrections.

References

[1] Aarnes, J.F., Quasi-states and quasi-measures, Adv. Math. 86:1 (1991), 41-67.
[2] Albers, P., On the extrinsic topology of Lagrangian submanifolds, Int. Math. Res. Not. 38, (2005), 2341-2371.
[3] Albers, P., Frauenfelder, U., A non-displaceable Lagrangian torus in T∗S2, Comm. Pure Appl. Math. 61:8 (2008), 1046-1051.
[4] Arnold, V.I., On a characteristic class entering into conditions of quantization, (Russian) Funkcional. Anal. i Priložen. 1 1967, 1-14.
[5] Arnold, V. I., Some remarks on symplectic monodromy of Milnor fibrations, in The Floer memorial volume, 99-103, Progr. Math., 133, Birkhäuser, 1995.
[6] Atiyah, M.F., Convexity and commuting Hamiltonians, Bull. London Math. Soc. 14:1 (1981), 1-15.
[7] Barge, J., Ghys, E., Cocycles d’Euler et de Maslov, Math. Ann. 294:2 (1992), 235-265.
[8] Beauville, A., Quantum cohomology of complete intersections, Mat. Fiz. Anal. Geom. 2:3-4 (1995), 384-398.
[9] Biran, P., Cieliebak, K., Symplectic topology on subcritical manifolds, Comm. Math. Helv. 76:4 (2001), 712-753.
[10] Biran, P., Entov, M., Polterovich, L., Calabi quasimorphisms for the symplectic ball, Commun. Contemp. Math. 6:5 (2004), 793-802.
[11] Biran, P., Geometry of Symplectic Intersections, in Proceedings of the International Congress of Mathematicians (Beijing 2002), Vol. II, 241-
[12] Biran, P., Symplectic topology and algebraic families, in 4-th European Congress of Mathematics (Stockholm 2004), pp. 827-836, Eur. Math. Soc., Zürich, 2005.
[13] Biran, P., Lagrangian Non-Intersections, Geom. and Funct. Anal. (GAFA), 16 (2006), 279-326.
[14] Biran, P., Cornea, O., Quantum structures for Lagrangian submanifolds, preprint, arXiv:0708.4221, 2007.
[15] Biran, P., Cornea, O., Rigidity and uniruling for Lagrangian submanifolds, arXiv:0808.2440, 2008.
[16] Cho, C.-H., Holomorphic disc, spin structures and Floer cohomology of the Clifford torus, Int. Math. Res. Not. 35 (2004), 1803-1843.
[17] Cho, C.-H., Non-displaceable Lagrangian submanifolds and Floer cohomology with non-unitary line bundle, preprint, arXiv:0710.5454, 2007.
[18] Conley, C., Zehnder, E., Morse-type index theory for flows and periodic solutions for Hamiltonian equations, Comm. Pure Appl. Math. 37:2 (1984), 207-253.
[19] de Gosson, M., de Gosson, S., Piccione, P., On a product formula for the Conley-Zehnder index of symplectic paths and its applications, preprint, math.SG/0607024, 2006.
[20] Delzant, T., Hamiltoniens périodiques et images convexes de l’application moment, Bull. Soc. Math. France 116:3 (1988), 315-339.
[21] Donaldson, S. K., Polynomials, vanishing cycles and Floer homology, in Mathematics: frontiers and perspectives, 55-64, AMS, 2000.
[22] Entov, M., Polterovich, L., Calabi quasimorphism and quantum homology, Intern. Math. Res. Notices 30 (2003), 1635-1676.
[23] Entov, M., Polterovich, L., Quasi-states and symplectic intersections, Comm. Math. Helv. 81:1 (2006), 75-99.
[24] Entov, M., Polterovich, L., Symplectic quasi-states and semi-simplicity of quantum homology, in Toric Topology, pp. 47-70, Contemporary Mathematics 460, AMS, 2008.
[25] Entov, M., Polterovich, L., Zapolsky, F., Quasi-morphisms and the Poisson bracket, Pure Appl. Math. Quarterly 3:4, part 1 (2007), 1037-1055.
[26] Floer, A., Symplectic fixed points and holomorphic spheres, Comm. Math. Phys. 120:4 (1989), 575-611.
[27] Fukaya, K., Oh., Y.-G., Ohta, H., Ono, K., Lagrangian intersection Floer theory - anomaly and obstruction, preprint.
[28] Fukaya, K., Oh., Y.-G., Ohta, H., Ono, K., Lagrangian Floer theory on compact toric manifolds I, preprint, arXiv:0802.1703, 2008.
[29] Futaki, A., An obstruction to the existence of Einstein Kähler metrics, Invent. Math. 73:3 (1983), 437-443.
[30] Guillemin, V., Sternberg, S., Convexity properties of the moment mapping, Invent. Math. 67:3 (1982), 491-513.
[31] Hirzebruch, F., Topological methods in algebraic geometry, Springer-Verlag, Berlin, 1966.
[32] Hofer, H., Salamon, D., Floer homology and Novikov rings, in: The Floer Memorial Volume, 483-524, Progr. Math., 133, Birkhäuser, 1995.
[33] Karshon, Y., Appendix to the paper “Symplectic packings and algebraic geometry” by D. McDuff and L. Polterovich, Invent. Math. 115:3 (1994), 431-434.
[34] Lang, S., Algebra, 3rd edition, Springer-Verlag, 2002.
[35] Leray, J., Lagrangian Analysis and Quantum Mechanics, The MIT Press, Cambridge, Massachusetts, 1981.
[36] Lerman, E., Symplectic cuts, Math. Res. Lett. 2:3 (1995), 247-258.
[37] Liu, G., Associativity of quantum multiplication, Comm. Math. Phys. 191:2 (1998), 265-282.
[38] Mabuchi, T., Einstein-Kähler forms, Futaki invariants and convex geometry on toric Fano varieties, Osaka J. Math. 24:4 (1987), 705-737.
[39] McDuff, D., Hamiltonian S1 manifolds are uniruled, preprint, arXiv:0706.0675, 2007.
[40] McDuff, D., Private communication, 2007.
[41] McDuff, D., Salamon, D., J-holomorphic curves and symplectic topology, AMS, 2004.
[42] Oh, Y.-G., Symplectic topology as the geometry of action functional I, J. Differ. Geom. 46 (1997), 499-577.
[43] Oh, Y.-G., Symplectic topology as the geometry of action functional II, Commun. Anal. Geom. 7 (1999), 1-55.
[44] Oh, Y.-G., Addendum to: “Floer cohomology of Lagrangian intersections and pseudo-holomorphic disks. I”, Comm. Pure Appl. Math. 48 (1995), no. 11, 1299-1302.
[45] Oh, Y.-G., Construction of spectral invariants of Hamiltonian diffeomorphisms on general symplectic manifolds, in The breadth of symplectic and Poisson geometry, 525-570, Birkhäuser, 2005.
[46] Ostrover, Y., Calabi quasi-morphisms for some non-monotone symplectic manifolds, Algebr. Geom. Topol. 6 (2006), 405-434.
[47] Piunikhin, S., Salamon, D., Schwarz, M., Symplectic Floer-Donaldson theory and quantum cohomology, in: Contact and Symplectic Geometry, 171-200, Publ. Newton Inst., 8, Cambridge Univ. Press, 1996.
[48] Polterovich, L., The geometry of the group of symplectic diffeomorphisms, Birkhäuser, 2001.
[49] Polterovich, L., Hamiltonian loops and Arnold’s principle, Amer. Math. Soc. Transl.
(2) 180 (1997), 181-187.
[50] Polterovich, L., Hofer’s diameter and Lagrangian intersections, Internat. Math. Res. Notices 4 1998, 217-223.
[51] Polterovich, L., Rudnick, Z., Kick stability in groups and dynamical systems, Nonlinearity 14:5 (2001), 1331-1363.
[52] Py, P., Quasi-morphismes et invariant de Calabi, Ann. Sci. Ecole Norm. Sup. 39 (2006), 177-195.
[53] Py, P., Quasi-morphismes et diffeomorphismes Hamiltoniens, PhD thesis, ENS-Lyon, 2008.
[54] Robbin, J., Salamon, D., The Maslov index for paths, Topology 32:4 (1993), 827-844.
[55] Ruan, Y., Tian, G., A mathematical theory of quantum cohomology, Math. Res. Lett. 1:2 (1994), 269-278.
[56] Ruan, Y., Tian, G., A mathematical theory of quantum cohomology, J. Diff. Geom. 42:2 (1995), 259-367.
[57] Salamon, D., Zehnder, E., Morse theory for periodic solutions of Hamiltonian systems and the Maslov index, Comm. Pure Appl. Math. 45:10 (1992), 1303-1360.
[58] Salamon, D., Lectures on Floer homology, in: Symplectic geometry and topology (Park City, UT, 1997), 143-229, IAS/Park City Math. Ser., 7, AMS, 1999.
[59] Schwarz, M., On the action spectrum for closed symplectically aspherical manifolds, Pacific J. Math. 193:2 (2000), 419-461.
[60] Seidel, P., Floer homology and the symplectic isotopy problem, PhD thesis, Oxford University, 1997.
[61] Seidel, P., Graded Lagrangian submanifolds, Bull. Soc. Math. France 128:1 (2000), 103-149.
[62] Seidel, P., Vanishing cycles and mutation, in European Congress of Mathematics, Vol. II (Barcelona, 2000), 65-85, Progr. Math., 202, Birkhäuser, 2001.
[63] Shelukhin, E., PhD thesis (in preparation), Tel-Aviv University.
[64] Usher, M., Spectral numbers in Floer theories, preprint, arXiv:0709.1127, 2007.
[65] Viterbo, C., Symplectic topology as the geometry of generating functions, Math. Ann. 292:4 (1992), 685-710.
[66] van der Waerden, B., Algebra. Vol. 2, Springer-Verlag, 1991.
[67] Wang, X.-J., Zhu, X., Kähler-Ricci solitons on toric manifolds with positive first Chern class, Adv. Math. 188:1 (2004), 87-103.
[68] Weinstein, A., Cohomology of symplectomorphism groups and critical values of Hamiltonians, Math. Z. 201:1 (1989), 75-82.
[69] Witten, E., Two-dimensional gravity and intersection theory on moduli space, Surveys in Diff. Geom. 1 (1991), 243-310.

Michael Entov
Department of Mathematics
Technion
Haifa 32000, Israel
entov@math.technion.ac.il

Leonid Polterovich
School of Mathematical Sciences
Tel Aviv University
Tel Aviv 69978, Israel
polterov@post.tau.ac.il

Introduction and main results; Many facets of displaceability; Preliminaries on quantum homology; An hierarchy of rigid subsets within Floer theory; Hamiltonian torus actions; Super(heavy) monotone Lagrangian submanifolds; An effect of semi-simplicity; Discussion and open questions; Strong displaceability beyond Floer theory?; Heavy fibers of Poisson-commutative subspaces; Detecting stable displaceability; Preliminaries on Hamiltonian Floer theory; Valuation on QH*(M); Hamiltonian Floer theory; Conley-Zehnder and Maslov indices; Spectral numbers; Partial symplectic quasi-states; Basic properties of (super)heavy sets; Products of (super)heavy sets; Product formula for spectral invariants; Decorated Z2-graded complexes; Reduced Floer and Quantum homology; Proof of Theorem 5.1; Proof of algebraic Theorem 5.2; Stable non-displaceability of heavy sets; Analyzing stable stems; Monotone Lagrangian submanifolds; Rigidity of special fibers of Hamiltonian actions; Calabi and mixed action-Maslov

ABSTRACT

We show that there is a hierarchy of intersection rigidity properties of sets in a closed symplectic manifold: some sets cannot be displaced by symplectomorphisms from more sets than the others.
We also find new examples of rigidity of intersections involving, in particular, specific fibers of moment maps of Hamiltonian torus actions, monotone Lagrangian submanifolds (following the works of P.Albers and P.Biran-O.Cornea), as well as certain, possibly singular, sets defined in terms of Poisson-commutative subalgebras of smooth functions. In addition, we get some geometric obstructions to semi-simplicity of the quantum homology of symplectic manifolds. The proofs are based on the Floer-theoretical machinery of partial symplectic quasi-states. <|endoftext|><|startoftext|> Introduction Multiple parton scattering in a dense medium can be used as a useful tool to study properties of both hot and cold nuclear matter. The success of such an approach has been demonstrated by the discovery of strong jet quenching phenomena in central Au + Au collisions at the Relativistic Heavy-ion Collider (RHIC) [1,2,3] and their implications on the formation of a strongly coupled quark-gluon plasma at RHIC [4,5]. However, for a convincing phenomenological study of the existing and future experimental data, a unified description of all medium effects in hard processes involving nuclei, such as electron-nucleus (e + A), hadron-nucleus (h + A) and nucleus-nucleus collisions (A + A) has to be developed [6,7]. This must include the physics of transverse momentum broadening [8], strong nuclear enhancement in DIS [9] and Drell-Yan production [10,11], nuclear shadowing [12], and parton energy loss due to gluon radiation induced by multiple scattering [13,14,15,16,17,18,19]. There exist many different frameworks in the literature to describe multiple scattering in a nuclear medium [20,21,22]. Among them the twist expansion approach is based on the generalized factorization in perturbative QCD as initially developed by Luo, Qiu and Sterman (LQS) [23]. 
In the LQS formalism, multiple scattering processes generally involve high-twist multiple-parton correlations in analogy to the parton distribution operators in leading twist processes. Though the corresponding higher twist corrections are suppressed by powers of 1/Q2, they are enhanced at least by a factor of A1/3 due to multiple scattering in a large nucleus. This framework has been applied recently to study medium modification of the fragmentation functions as the leading parton propagates through the medium [18,19]. Because of the non-Abelian Landau-Pomeranchuk-Migdal interference in the gluon bremsstrahlung induced by multiple parton scattering in nuclei, the higher-twist nuclear modifications to the fragmentation functions are in fact enhanced by A2/3, quadratic in the nuclear size [18,19]. Phenomenological study of parton energy loss and nuclear modification of the fragmentation functions in cold nuclear matter [24] gives a good description of the nuclear modification of the leading hadron spectra in semi-inclusive deeply inelastic lepton-nucleus scattering observed by the HERMES experiment [25,26]. The same framework also gives a compelling explanation for the suppression of large transverse momentum hadrons discovered at RHIC [27].

The emphasis of recent studies of medium modification of fragmentation functions has been on radiative parton energy loss induced by multiple scattering with gluons. Such processes indeed are dominant relative to multiple scattering with quarks because of the abundance of soft gluons in either cold nuclei or hot dense matter produced in heavy-ion collisions. Since gluon bremsstrahlung induced by scattering with medium gluons is the same for quarks and anti-quarks, one also expects the energy loss and fragmentation modification to be identical for quarks and anti-quarks.
However, in a medium with finite baryon density such as cold nuclei and the forward region of heavy-ion collisions, the difference between quark and anti-quark distributions in the medium should lead to different energy loss and modified fragmentation functions for quarks and antiquarks through quark-antiquark annihilation processes. To study such an asymmetry, one must consider systematically all possible quark-quark and quark-antiquark scattering processes, which will be the focus of this paper.

In this study we will calculate the modifications of quark and antiquark fragmentation functions (FF) due to quark-quark (antiquark) double scattering in a nuclear medium, working within the LQS framework for generalized factorization in perturbative QCD. For a complete description of nuclear modification of the single inclusive hadron spectra, one still has to consider medium modification of gluon fragmentation functions in addition to modified quark fragmentation function due to quark-gluon scattering [18]. The theoretical results presented in this paper will be a second step toward a complete description of medium modified fragmentation functions. However, one can already find that quark-quark (antiquark) double scattering will give different corrections to quark and antiquark FF, depending on antiquark and quark density of the medium, respectively. This difference between modified quark and antiquark FF may shed light on the interesting observation by the HERMES experiment [25,26] of a large difference between nuclear suppression of the leading proton and antiproton spectra in semi-inclusive DIS off large nuclei. Such a picture of quark-quark (antiquark) scattering can provide a competing mechanism for the experimentally observed phenomenon in addition to possible absorption of final state hadrons inside nuclear matter [28,29].

The paper is organized as follows.
In the next section we will present the general formalism of our calculation, including the generalized factorization of twist-4 processes. In Section 3 we will illustrate the procedure of calculating the hard partonic parts of quark-quark double scattering in nuclei. In Section 4 we will discuss the modifications to quark and antiquark fragmentation functions due to quark-quark (antiquark) double scattering in nuclei. In Section 5, we will focus on the flavor-dependent part of the medium modification to the quark FFs due to quark-antiquark annihilation, and we will discuss the implications for the flavor dependence of the leading hadron spectra in both DIS off a nucleus and heavy-ion collisions. We will summarize our work in Section 6. In Appendix A-1, we collect the complete results for the hard partonic parts for different cut diagrams of quark-quark (antiquark) double rescattering in nuclei. We also provide an alternative calculation of the hard parts of the central-cut diagrams in Appendix A-3 through elastic quark-quark scattering or quark-antiquark annihilation as a cross check.

2 General formalism

In order to study quark and antiquark FFs in semi-inclusive deeply inelastic lepton-nucleus scattering, we consider the following process,

$$e(L_1) + A(p) \longrightarrow e(L_2) + h(\ell_h) + X .$$

Fig. 1. Lowest order and leading-twist contribution to semi-inclusive DIS.

Here $L_1$ and $L_2$ are the four-momenta of the incoming and outgoing leptons, and $\ell_h$ is the observed hadron momentum. The differential cross section for the semi-inclusive process can be expressed as

$$E_{L_2}E_{\ell_h}\frac{d\sigma^h_{\rm DIS}}{d^3L_2\,d^3\ell_h} = \frac{\alpha_{\rm EM}^2}{2\pi s}\frac{1}{Q^4}\,L_{\mu\nu}\,E_{\ell_h}\frac{dW^{\mu\nu}}{d^3\ell_h}\,, \tag{1}$$

where $p = [p^+, 0, \mathbf{0}_\perp]$ is the momentum per nucleon in the nucleus, $q = L_2 - L_1 = [-Q^2/2q^-, q^-, \mathbf{0}_\perp]$ is the momentum transfer carried by the virtual photon, $s = (p + L_1)^2$ is the lepton-nucleon center-of-mass energy squared and $\alpha_{\rm EM}$ is the electromagnetic (EM) coupling constant.
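The light-cone components given above can be checked numerically. The sketch below assumes the scalar product $a\cdot b = a^+b^- + a^-b^+ - \mathbf{a}_\perp\cdot\mathbf{b}_\perp$, which is the normalization implied by requiring $q^2 = -Q^2$ for the quoted components of $q$. The numerical values are arbitrary illustrations, and the helper names are hypothetical.

```python
def lc_dot(a, b):
    """Light-cone scalar product a.b = a+ b- + a- b+ - a_T . b_T.

    Each vector is (plus, minus, (x, y)). This normalization is an
    assumption chosen so that q^2 = -Q^2 for the components in the text.
    """
    (ap, am, at), (bp, bm, bt) = a, b
    return ap * bm + am * bp - (at[0] * bt[0] + at[1] * bt[1])


def dis_vectors(Q2, q_minus, p_plus):
    """Return (p, q) with p = [p+, 0, 0_T] and q = [-Q^2/(2 q-), q-, 0_T]."""
    p = (p_plus, 0.0, (0.0, 0.0))
    q = (-Q2 / (2.0 * q_minus), q_minus, (0.0, 0.0))
    return p, q


def bjorken_x(Q2, p, q):
    """Standard Bjorken variable x_B = Q^2 / (2 p.q)."""
    return Q2 / (2.0 * lc_dot(p, q))


if __name__ == "__main__":
    Q2, q_minus, p_plus = 2.5, 20.0, 0.5  # illustrative values only
    p, q = dis_vectors(Q2, q_minus, p_plus)
    print("q^2 =", lc_dot(q, q))   # equals -Q^2
    print("p^2 =", lc_dot(p, p))   # massless nucleon momentum
    print("x_B =", bjorken_x(Q2, p, q))
```

With this convention $p\cdot q = p^+q^-$, so the Bjorken variable reduces to $Q^2/2p^+q^-$.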
The leptonic tensor is given by $L_{\mu\nu} = \frac{1}{2}\mathrm{Tr}(\gamma\cdot L_1\gamma_\mu\gamma\cdot L_2\gamma_\nu)$, while the semi-inclusive hadronic tensor is defined as

$$E_{\ell_h}\frac{dW_{\mu\nu}}{d^3\ell_h} = \frac{1}{2}\sum_X \langle A|J_\mu(0)|X,h\rangle\langle X,h|J_\nu(0)|A\rangle\, 2\pi\delta^4(q + p - p_X - \ell_h)\,, \tag{2}$$

where $\sum_X$ runs over all possible final states and $J_\mu = \sum_q e_q\bar\psi_q\gamma_\mu\psi_q$ is the hadronic EM current.

Assuming collinear factorization in the parton model, the leading-twist contribution to the semi-inclusive cross section can be factorized into a product of parton distributions, parton fragmentation functions and the partonic cross section. Including all leading-log radiative corrections, the lowest order contribution [$O(\alpha_s^0)$] from a single hard $\gamma^* + q$ scattering, as illustrated in Fig. 1, can be written as

$$\frac{dW^S_{\mu\nu}}{dz_h} = \sum_q e_q^2\int dx\, f_q^A(x,\mu_I^2)\,H^{(0)}_{\mu\nu}(x,p,q)\,D_{q\to h}(z_h,\mu^2)\,; \tag{3}$$

$$H^{(0)}_{\mu\nu}(x,p,q) = \frac{1}{2}\mathrm{Tr}\big(\gamma\cdot p\,\gamma_\mu\,\gamma\cdot(q+xp)\,\gamma_\nu\big)\frac{2\pi}{2p\cdot q}\,\delta(x-x_B)\,, \tag{4}$$

where the momentum fraction carried by the hadron is defined as $z_h = \ell_h^-/q^-$, $x_B = Q^2/2p^+q^-$ is the Bjorken scaling variable, and $\mu_I^2$ and $\mu^2$ are the factorization scales for the initial quark distributions $f_q^A(x,\mu_I^2)$ in a nucleus and the fragmentation functions in vacuum $D_{q\to h}(z_h,\mu^2)$, respectively. The renormalized quark fragmentation function $D_{q\to h}(z_h,\mu^2)$ satisfies the DGLAP QCD evolution equations [30]:

$$\frac{\partial D_{q\to h}(z_h,\mu^2)}{\partial\ln\mu^2} = \frac{\alpha_s(\mu^2)}{2\pi}\int_{z_h}^1\frac{dz}{z}\Big[\gamma_{q\to qg}(z)\,D_{q\to h}(z_h/z,\mu^2) + \gamma_{q\to gq}(z)\,D_{g\to h}(z_h/z,\mu^2)\Big]\,; \tag{5}$$

$$\frac{\partial D_{g\to h}(z_h,\mu^2)}{\partial\ln\mu^2} = \frac{\alpha_s(\mu^2)}{2\pi}\int_{z_h}^1\frac{dz}{z}\Big[\sum_{q=1}^{2n_f}\gamma_{g\to q\bar q}(z)\,D_{q\to h}(z_h/z,\mu^2) + \gamma_{g\to gg}(z)\,D_{g\to h}(z_h/z,\mu^2)\Big]\,, \tag{6}$$

where $\gamma_{a\to bc}(z)$ denotes the splitting functions of the corresponding radiative processes [31,32].

In DIS off a nuclear target, the propagating quark will experience additional scatterings with other partons from the nucleus. The rescatterings may induce additional parton (quark or gluon) radiation and cause the leading quark to lose energy. Such induced radiation will effectively give rise to additional terms in the DGLAP evolution equations, leading to a modification of the fragmentation functions in a medium. These are power-suppressed higher-twist corrections and they involve higher-twist parton matrix elements.
We will only consider those contributions that involve two-parton correlations from two different nucleons inside the nucleus. They are proportional to the thickness of the nucleus [18,23,33] and thus are enhanced by a nuclear factor A1/3 as compared to two-parton correlations in a nucleon. As in previous studies [18,19], we will limit our study to such double scattering processes in a nuclear medium. These are twist-four processes and give leading contributions to the nuclear effects. The contributions of higher twist processes or contributions not enhanced by the nuclear medium will be neglected for the time being. When considering double scattering with nuclear enhancement, a very important process is quark-gluon double scattering as illustrated in Fig. 2. Such processes give the dominant contribution to the leading quark energy loss and have been studied in detailed in Refs. [18,19]. The modification to the vacuum quark fragmentation function from quark-gluon scattering is, qg→qg q→h (zh)= α2sCA Dq→h(zh/z) 1 + z2 (1− z)+ TAqg(x, xL) fAq (x) + δ(z − 1) ∆TAqg(x, ℓ fAq (x) +Dg→h(zh/z) 1 + (1− z)2 TAqg(x, xL) fAq (x) , (7) Fig. 2. A typical diagram for quark-gluon double scattering with three possible cuts [central(C), left(L) and right(R)]. where the +function is defined as F (z) (1 − z)+ F (z)− F (1) for any F (z) that is sufficiently smooth at z = 1 and the twist-four quark-gluon correla- tion function, TAqg(x, xL) = dy−1 dy i(x+xL)p +y−(1− e−ixLp+y 2 )(1− e−ixLp+(y−−y ×〈A|ψ̄q(0) F +σ (y +σ(y−1 )ψq(y −)|A〉θ(−y−2 )θ(y− − y−1 ), (9) has explicit interference included. The matrix element in the virtual correction [the term with δ(z − 1)] is defined as ∆TAqg(x, ℓ T ) ≡ 2TAqg(x, xL)|z=1 − (1 + z2)TAqg(x, xL) . 
(10) Since TAqg(x, xL)/f q (x) is proportional to gluon distribution and independent of the flavor of the leading quark, the suppression of the hadron spectrum caused by quark-gluon or antiquark-gluon scattering should be proportional to the gluon density of the medium and is identical for quark and antiquark fragmentation. It was shown in Ref. [24] that such modification of parton fragmentation functions by quark-gluon double scattering and gluon bremsstrahlung in a nuclear medium describes very well the recent HERMES data [25] on semi-inclusive DIS off nuclear targets. Fig. 3. Diagram for leading order quark-antiquark annihilation with three possible cuts [cen- tral(C), left(L) and right(R)]. Fig. 4. A typical diagram for next-to-leading order correction to quark-antiquark annihilation with three possible cuts [central(C), left(L) and right(R)]. In this paper, we will consider quark-quark (antiquark) double scattering such as the process shown in Fig. 3 and its radiative corrections at order O(α2s) in Fig. 4. The contributions of quark-quark double scattering is proportional to the quark density in a nucleon, while the contribution of quark-gluon double scattering is proportional to the gluon density in a nucleon; and the gluon density is generally larger than the quark density in a nucleon at small momentum fraction. However, as pointed out in earlier works [18], quark-quark double scattering mixes quark and gluon fragmentation functions and therefore gives rise to new nuclear effects. The annihilation processes as shown in Figs. 3 and 4 will lead to different modifications of quark and antiquark fragmentation functions in a medium with finite baryon density (or valence quarks). Such differences will in turn lead to flavor dependence of the nuclear modification of leading hadron spectra as observed in HERMES experiment [25,26]. Quark-quark double scattering as well as quark-gluon double scattering are twist-4 pro- cesses. 
We will apply the same generalized factorization procedure for twist-4 processes as developed by LQS [23] for semi-inclusive processes in DIS. In general, the twist-four contributions can be expressed as the convolution of partonic hard parts and two-parton correlation matrix elements. In this framework, contributions from double quark-quark scattering in any order of αs, e.g., the quark-antiquark annihilation process as illustrated in Fig. 4, can be written in the following form, dWDµν p+dy− dy−1 dy −, y−1 , y 2 , p, q, zh) ×〈A|ψ̄q(0) −)ψ̄q(y 2 )|A〉. (11) Here we have neglected transverse momenta of all quarks in the hard partonic part. Transverse momentum dependent contributions are higher twist and are suppressed by 〈k2⊥〉/Q2, Therefore, all quarks’ momenta are assumed collinear, k2 = x2p and k3 = x3p. −, y−1 , y 2 , p, q, zh) is the Fourier transform of the partonic hard part H̃µν(x, x1, x2, p, q, zh) in momentum space, −, y−1 , y 2 , p, q, zh) = eix1p +y−+ix2p +i(x−x1−x2)p ×H̃Dµν(x, x1, x2, p, q, zh) dxH(0)µν (x, p, q)H (y−, y−1 , y 2 , x, p, q, zh) , (12) where, in collinear approximation, the hard partonic part H(0)µν (x, p, q) [Eq. (4)] in the leading twist without multiple parton scattering can be factorized out of the high-twist hard part H̃Dµν(x, x1, x2, p, q, zh). The momentum fractions x, x1, and x2 are fixed by δ-functions of the on-shell conditions of the final state partons and poles of parton prop- agators in the partonic hard part. The phase factors in H −, y−1 , y 2 , p, q, zh) can then be factored out, which in turn will be combined with the partonic fields in Eq. (11) to give twist-four partonic matrix elements or two-parton correlations. The quark-quark double scattering corrections in Eq. (11) can then be factorized as the convolution of fragmentation functions, twist-four partonic matrix elements and the partonic hard scat- tering cross sections. 
For scatterings (versus the annihilation) with quarks (antiquarks), a summation over the flavor of the secondary quarks (antiquarks) should be included in two-quark correlation matrix elements and both t, u channels and their interferences should be considered for scattering of identical quarks in the hard partonic parts. After factorization, we then define the twist-four correction to the leading twist quark fragmentation function in the same form [Eq. (3)], dWDµν dxfAq (x)H µν (x, p, q)∆Dq→h(zh) . (13) 3 Quark-quark double scattering processes In this section we will discuss the calculation of the hard part of quark-quark double scattering in detail. The lowest order process of quark-quark (antiquark) double scattering in nuclei is quark-antiquark annihilation (or quark-gluon conversion) as shown in Fig. 3. The hard partonic parts from the three cut diagrams in this figure are [18]: 0,C(y −, y−1 , y 2 , x, p, q, zh)=Dg→h(zh) ×θ(−y−2 )θ(y− − y−1 ) , (14) 0,L(y −, y−1 , y 2 , x, p, q, zh)=Dq→h(zh) ×θ(y−1 − y−2 )θ(y− − y−1 ) , (15) 0,R(y −, y−1 , y 2 , x, p, q, zh)=Dq→h(zh) ×θ(−y−2 )θ(y−2 − y−1 ) . (16) The main focus of this paper is about contributions from the next-leading order correc- tions to the above lowest order process. There is a total of 12 diagrams for real corrections at one-loop level as illustrated in Fig. 5 to Fig. 16 in the Appendix A-1, each having up to three different cuts. In this section, we demonstrate the calculation of the hard parts from the quark-antiquark annihilation in Fig. 4 in detail as an example. We will list the complete results of all diagrams in Appendix A. One can write down the hard partonic part of the central-cut diagram of Fig. 4 (Fig. 
5 in Appendix A-1) according to the conventional Feynman rule, C µν(y −, y−1 , y 2 , p, q, zh)= Dg→h( eix1p +y−+ix2p ×ei(x−x1−x2)p+y (2π)4 γµĤγν 2πδ+(ℓ 2)2πδ+(ℓ g) δ(1− z − γ · (q + x1p) (q + x1p)2 − iǫ γ · (q + x1p− ℓ) (q + x1p− ℓ)2 − iǫ γ · (q + xp− ℓ) (q + xp− ℓ)2 + iǫ γ · (q + xp) (q + xp)2 + iǫ εαρ(ℓ) εβσ(ℓg) , (17) where δ+ is a Dirac delta-function with only the positive solution in its functional variable, εαρ(ℓ) = −gαρ + (nαℓρ + nρℓα)/n · ℓ is the polarization tensor of a gluon propagator in an axial gauge (n · A = 0) with n = [1, 0−,~0⊥], ℓ and ℓg = q + (x1 + x2)p − ℓ are the 4-momenta carried by the two final gluons respectively. The fragmenting gluon carries a fraction, z = ℓ−g /q −, of the initial quark’s longitudinal momentum (the large minus component). To simplify the calculation in the case of small transverse momentum ℓT ≪ q−, p+, we can apply the collinear approximation to complete the trace of the product of γ-matrices, Ĥ ≈ γ · ℓq γ · ℓqĤ . (18) According to the convention in Eqs. (11) and (12), contributions from quark-quark double scattering in the nuclear medium to the semi-inclusive hadronic tensor in DIS off a nucleus can be expressed in the general factorized form: dWDqq̄,µν dxH(0)µν (x, p, q) p+dy− dy−1 dy (y−, y−1 , y 2 , x, p, q, zh) × 〈A|ψ̄q(0) ψq̄(y −)ψ̄q(y ψq̄(y 2 )|A〉 . (19) After carrying out the momentum integration in x, x1, x2 and ℓ ± in Eq. (17) with the help of contour integration and δ-functions, one obtains the hard partonic part, H the rescattering for the central-cut diagram in Fig. 4 (Fig. 5) as 5,C(y −, y−1 , y 2 , x, p, q, zh) = α2sxB Dg→h(zh/z) × 2(1 + z z(1− z) I5,C(y −, y−1 , y 2 , x, xL, p) , (20) I5,C(y −, y−1 , y 2 , x, xL, p) = e i(x+xL)p +y−θ(−y−2 )θ(y− − y−1 ) × (1− e−ixLp+y 2 )(1− e−ixLp+(y−−y )) , (21) where the momentum fractions xL is defined as 2p+q−z(1− z) . (22) Note that the function I5,C(y −, y−1 , y 2 , x, xL, p) contains only phase factors. 
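As a small numerical illustration of the momentum fraction $x_L$ of Eq. (22), the sketch below assumes the form $x_L = \ell_T^2/[2p^+q^-z(1-z)]$. The numerator $\ell_T^2$ did not survive in the extracted equation and is supplied here as an assumption; the function names and input values are hypothetical.

```python
def x_L(lT2, p_plus, q_minus, z):
    """Momentum fraction x_L = l_T^2 / (2 p+ q- z (1 - z)).

    Assumed form of Eq. (22); lT2 is the squared transverse momentum
    of the radiated parton and z the momentum fraction of the
    fragmenting parton, with 0 < z < 1.
    """
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    return lT2 / (2.0 * p_plus * q_minus * z * (1.0 - z))


def x_L_over_x_B(lT2, Q2, z):
    """Ratio x_L / x_B = l_T^2 / (Q^2 z (1 - z)), using x_B = Q^2/(2 p+ q-)."""
    return lT2 / (Q2 * z * (1.0 - z))


if __name__ == "__main__":
    # Illustrative kinematics only.
    print(x_L(lT2=1.0, p_plus=0.5, q_minus=20.0, z=0.5))
    print(x_L_over_x_B(lT2=1.0, Q2=2.5, z=0.5))
```

Under this assumption $x_L$ grows without bound as $z \to 0$ or $z \to 1$, and it vanishes in the collinear limit $\ell_T \to 0$.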
One can combine these phase factors with the matrix elements of the quark fields to define a special two-quark correlation function A(5,C) qq̄ (x, xL) = p+dy− dy−1 dy 2 〈A|ψ̄q(0) ψq̄(y −)ψ̄q(y ψq̄(y 2 )|A〉 × I5,C(y−, y−1 , y−2 , x, xL, p) . (23) The contribution from quark-antiquark annihilation in the central-cut diagram in Fig. 4 to the hadronic tensor can then be expressed as dWDqq̄,µν dxH(0)µν (x, p, q) α2sxB Dg→h( 2(1 + z2) z(1 − z) A(5,C) qq̄ (x, xL). (24) Contributions from all quark-quark (antiquark) double scattering processes can be cast in the above factorized form. The structure of the phase factors in I5,C(y −, y−1 , y 2 , x, xL, p) is exactly the same as for gluon bremsstrahlung induced by quark-gluon scattering as studied in Ref. [18,19]. It resembles the cross section of dipole scattering and represents contributions from two different processes and their interferences. It contains essentially four terms, I5,C(y −, y−1 , y 2 , x, xL, p) = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p ×[1 + e−ixLp+(y−+y ) − e−ixLp+y 2 − e−ixLp+(y−−y )] . (25) The first term corresponds to the so-called hard-soft processes where the gluon emission is induced by the hard scattering between the virtual photon γ∗ and the initial quark with momentum (x + xL)p. The quark then becomes on-shell before it annihilates with a soft antiquark from the nucleus that carries zero momentum and converts into a real gluon in the final state. The second term corresponds to a process in which the initial quark with momentum xp is on-shell after the first hard γ∗-quark scattering. It then annihilates with another antiquark and produces two final gluons in the final state. In this process, the antiquark carries finite (hard) momentum xLp. Therefore one often refers to this process as double-hard scattering as compared to the first process in which the antiquark carries zero momentum. 
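The four-term decomposition of Eq. (25) can be checked against the product form of Eq. (21) numerically. The sketch below drops the common θ-functions, treats all light-cone coordinates as plain real numbers, and uses arbitrary illustrative inputs; the function names are hypothetical.

```python
import cmath


def phase_product(x, xL, p_plus, y, y1, y2):
    """Product form of I_{5,C} (Eq. 21) without the theta functions."""
    a = xL * p_plus
    pre = cmath.exp(1j * (x + xL) * p_plus * y)
    return pre * (1 - cmath.exp(-1j * a * y2)) * (1 - cmath.exp(-1j * a * (y - y1)))


def phase_terms(x, xL, p_plus, y, y1, y2):
    """Four terms of Eq. (25): hard-soft, double-hard, two interferences."""
    a = xL * p_plus
    pre = cmath.exp(1j * (x + xL) * p_plus * y)
    hard_soft = pre
    double_hard = pre * cmath.exp(-1j * a * (y + y2 - y1))
    interference_1 = -pre * cmath.exp(-1j * a * y2)
    interference_2 = -pre * cmath.exp(-1j * a * (y - y1))
    return hard_soft, double_hard, interference_1, interference_2


if __name__ == "__main__":
    args = (0.1, 0.05, 10.0, 0.3, -0.2, -0.7)  # x, x_L, p+, y-, y1-, y2-
    print(abs(sum(phase_terms(*args)) - phase_product(*args)))
    # Setting x_L = 0 makes both brackets in the product vanish.
    print(abs(phase_product(0.1, 0.0, 10.0, 0.3, -0.2, -0.7)))
```

The double-hard term carries the overall phase $e^{ixp^+y^- - ix_Lp^+(y_2^- - y_1^-)}$, which corresponds to a leading quark with momentum fraction $x$ and an antiquark with momentum fraction $x_L$.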
Setting aside the change of flavors in the initial and final states, the double-hard scattering corresponds essentially to two-parton elastic scattering with finite momentum and energy transfer. This is in contrast to the hard-soft scattering, which is essentially the final-state radiation of the $\gamma^*$-quark scattering, in which the total energy and momentum of the two final-state gluons all come from the initial quark. The corresponding matrix elements of the two-quark correlation functions from these first two terms are called 'diagonal' elements.

The third and fourth terms with negative signs in $I_{5,C}(y^-, y_1^-, y_2^-, x, x_L, p)$ are interferences between hard-soft and double-hard processes. The corresponding matrix elements are called 'off-diagonal'. The cancellation between the diagonal and off-diagonal terms essentially gives rise to a destructive interference that is very similar to the Landau-Pomeranchuk-Migdal (LPM) interference in gluon bremsstrahlung induced by quark-gluon double scattering [18,19]. One can similarly define the formation time of the parton (quark or gluon) emission as

$$\tau_f \equiv \frac{1}{x_L p^+}\,. \tag{26}$$

In the limit of collinear emission ($x_L \to 0$), or when the formation time of the parton emission, $\tau_f$, is much larger than the nuclear size, the effective matrix element vanishes because

$$I_{5,C}(y^-, y_1^-, y_2^-, x, x_L, p)\big|_{x_L=0} \to 0\,, \tag{27}$$

that is, the hard-soft and double-hard processes interfere completely destructively.

We should note that in the central-cut diagram of Fig. 4, the final-state partons are two gluons. Therefore, the gluon fragmentation function in vacuum $D_{g\to h}(z_h/z)$ enters Eq. (20). If the other gluon (close to the $\gamma^*$-quark interaction) fragments, the contribution to the semi-inclusive hadronic tensor is similar except that the corresponding effective "splitting function" should be replaced by

$$\frac{1+z^2}{z(1-z)} \to \frac{1+(1-z)^2}{z(1-z)}\,.$$
(28) As we will show in Appendix A-1, the two gluons in the quark-antiquark annihilation processes (central-cut diagrams) are symmetric when contributions from all possible an- nihilation processes and their interferences are summed. Therefore, one can simply mul- tiply the final results by a factor of 2 to take into account the hadronization of the second final-state gluon. In addition to the central-cut diagram, one should also take into account the asymmetrical- cut diagrams in Fig. 4, which represent interference between gluon emission from single and triple scattering. The hard partonic parts are mainly the same as for the central-cut diagram. The only differences are in the phase factors and the fragmentation functions since the fragmenting parton can be the final-state quark or gluon. These hard parts can be calculated following a similar procedure and one gets, 5,L(R)(y −, y−1 , y 2 , x, p, q, zh) = α2sxB Dq→h( 2(1 + z2) z(1 − z) × I5,L(R)(y−, y−1 , y−2 , x, xL, p) , (29) I5,L(y −, y−1 , y 2 , x, xL, p) =−ei(x+xL)p +y−(1− e−ixLp+(y−−y ×θ(y−1 − y−2 )θ(y− − y−1 ) , (30) I5,R(y −, y−1 , y 2 , x, xL, p) =−ei(x+xL)p +y−(1− e−ixLp+y ×θ(−y−2 )θ(y−2 − y−1 ) . (31) In the asymmetrical cut diagrams, the above contributions come from the fragmentation of the final-state quark. Therefore, quark fragmentation function Dq→h(zh/z) enters this contribution. For gluon fragmentation into the observed hadron in this asymmetrical-cut diagrams, the contribution can be obtained by simply replacing the quark fragmentation function by the gluon fragmentation function Dg→h(zh/z) and replacing z by 1 − z. Summing the contributions from three different cut diagrams of Fig. 4, we can observe further examples of mixing (or conversion) of quark and gluon fragmentation functions. This medium-induced mixing was first observed by Wang and Guo [18] and is a unique feature of quark-quark (antiquark) double scattering among all multiple parton scattering processes. 
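The formation-time criterion introduced with Eq. (26) can be made concrete with a rough estimate. The sketch below assumes $\tau_f = 1/(x_Lp^+)$ in natural units, a nuclear radius $R_A = r_0A^{1/3}$ with $r_0 = 1.12$ fm, and an illustrative value of $p^+$; none of these numbers come from the text, and the function names are hypothetical.

```python
HBARC_GEV_FM = 0.1973269804  # hbar*c in GeV*fm, converts 1/GeV to fm


def formation_time_fm(x_L, p_plus_gev):
    """Formation time tau_f = 1/(x_L p+) converted to fm (assumed form)."""
    return HBARC_GEV_FM / (x_L * p_plus_gev)


def nuclear_radius_fm(mass_number, r0_fm=1.12):
    """R_A = r0 * A^(1/3); r0 = 1.12 fm is an assumed typical value."""
    return r0_fm * mass_number ** (1.0 / 3.0)


def crossover_x_L(mass_number, p_plus_gev, r0_fm=1.12):
    """Value of x_L at which tau_f equals R_A."""
    return HBARC_GEV_FM / (p_plus_gev * nuclear_radius_fm(mass_number, r0_fm))


if __name__ == "__main__":
    A, p_plus = 131, 0.938  # illustrative: heavy target, p+ of order the nucleon mass
    xc = crossover_x_L(A, p_plus)
    print(f"R_A = {nuclear_radius_fm(A):.2f} fm, crossover x_L = {xc:.4f}")
    print(f"tau_f at x_L = xc/10: {formation_time_fm(xc / 10, p_plus):.1f} fm")
```

For $x_L$ well below the crossover value the formation time exceeds the nuclear size, which is the regime where the interference suppresses the induced emission.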
With the same procedure we can calculate contributions from all other cut diagrams of quark-quark (antiquark) double scattering at order $O(\alpha_s^2)$, which are listed in Appendix A-1. There are three types of processes: two annihilation processes, $q\bar q \to gg$ (central-cut diagrams in Figs. 5, 6, 7, 8 and 9) and $q\bar q \to q_i\bar q_i$ (central-cut diagram in Fig. 10), and quark-quark (antiquark) scattering, $qq_i(\bar q_i) \to qq_i(\bar q_i)$ (central-cut diagram in Fig. 11). One also has to consider the interference of the $s$- and $t$-channel amplitudes for annihilation into an identical quark pair, $q\bar q \to q\bar q$ (central-cut diagrams in Figs. 12 and 13), and the interference between the $t$ and $u$ channels of identical quark scattering, $qq \to qq$ (central-cut diagram in Fig. 14).

Contributions from left and right-cut diagrams correspond to interference between the amplitude of gluon radiation from single $\gamma^*$-quark scattering and triple quark scattering. The amplitudes of gluon radiation via triple quark scattering essentially come from radiative corrections to the left and right-cut diagrams of the lowest-order quark-antiquark annihilation in Fig. 3 (as shown in the left and right-cut diagrams in Figs. 5, 6, 8, 9, 12, 13, 15 and 16). Two other triple quark scatterings with gluon radiation, shown as the left and right-cut diagrams in Figs. 11 and 14, correspond to the case where one of the final-state quarks, after quark-quark scattering, annihilates with another antiquark and converts into a final-state gluon.

4 Modified Fragmentation Functions

In order to simplify the contributions from quark-quark (antiquark) scattering (annihilation), one can first organize the results of the hard parts in terms of contributions from central, left or right-cut diagrams, which are associated by contour integrals with specific products of θ-functions,

$$\overline{H}^D = H^D_C\,\theta(-y_2^-)\theta(y^- - y_1^-) - H^D_L\,\theta(y_1^- - y_2^-)\theta(y^- - y_1^-) - H^D_R\,\theta(-y_2^-)\theta(y_2^- - y_1^-)\,.$$
(32) These θ-functions provide a space-time ordering of the parton correlation and will restrict the integration range along the light-cone. Contributions from central, left and right-cut diagrams that have identical hard partonic parts, $H^D_C = H^D_L = H^D_R$, share a common combination of θ-functions that produces a path-ordered integral,

$$\int_0^{y^-} dy_1^- \int_0^{y_1^-} dy_2^- = -\int dy_1^-\, dy_2^-\,\Big[\theta(-y_2^-)\theta(y^- - y_1^-) - \theta(-y_2^-)\theta(y_2^- - y_1^-) - \theta(y^- - y_1^-)\theta(y_1^- - y_2^-)\Big]\,, \tag{33}$$

which is limited only by the spatial spread $y^-$ of the first parton along the light-cone coordinate. For a high-energy parton that carries momentum fraction $x$, the spread $y^- \sim 1/xp^+$ should be very small. Those contributions that are proportional to the above path-ordered integral are referred to as contact contributions (or contact interactions).

Similarly, $y_1^- - y_2^-$ is the spatial spread of the second parton and can only be limited by the spatial size of its host nucleon, even for small values of the momentum fraction. The spatial position of its host nucleon, $y_1^- + y_2^-$, however, can be anywhere within the nucleus. Therefore, any contributions from double parton scattering that have unrestricted integration over $y_1^-$ and $y_2^-$ should be proportional to the nuclear size of the target A and therefore are nuclear enhanced. In this paper, we will only keep the nuclear-enhanced contributions and neglect the contact contributions. This will greatly simplify the final results for double parton scattering.

4.1 $q\bar q \to g$ annihilation

For the lowest order of quark-antiquark annihilation in Eqs. (14)-(16), the hard parts from the three cut diagrams are almost the same except for the parton fragmentation functions. The central-cut diagram is proportional to the gluon fragmentation function, while the left and right-cut diagrams are proportional to quark fragmentation functions. Rearranging the contributions from the three cut diagrams and neglecting the contact term that is proportional to the path-ordered integral as in Eq.
(33), the total contribution can be written as dWD(0)µν qq̄ (x, 0) H(0)µν (x, p, q) × [Dg→h(zh)−Dq→h(zh)] . (34) According to our definition in Eq. (13) of the twist-four correction to the quark fragmen- tation functions, the modification to the quark fragmentation function from the lowest order quark-antiquark annihilation is then, (qq̄→g) q→h (zh) = [Dg→h(zh)−Dq→h(zh)] qq̄ (x, 0) fAq (x) . (35) Here the effective quark-antiquark correlation function T qq̄ (x, 0) is defined as, qq̄i (x, xL)≡ p+dy− dy−1 dy ixp+y−−ixLp )θ(−y−2 )θ(y− − y−1 ) ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 , (36) with the antiquark q̄i carrying momentum fraction xL. This two-parton correlation func- tion is always associated with double-hard rescattering processes. Similarly, we define three other quark-antiquark correlation matrix elements qq̄i (x, xL)≡ p+dy− dy−1 dy i(x+xL)p +y−θ(y− − y−1 ) × θ(−y−2 )〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 , (37) A(I−L) qq̄i (x, xL)≡ p+dy− dy−1 dy i(x+xL)p +y−−ixLp +(y−−y )θ(y− − y−1 ) × θ(−y−2 )〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 , (38) A(I−R) qq̄i (x, xL)≡ p+dy− dy−1 dy i(x+xL)p +y−−ixLp 2 θ(y− − y−1 ) × θ(−y−2 )〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 , (39) that are associated with hard-soft rescattering and interference between double hard and hard-soft rescattering. In the first parton correlation T qq̄i (x, xL), the antiquark q̄i carries momentum fraction xL while the initial quark has the momentum fraction x. The two-parton correlation T qq̄i (x, xL) corresponds to the case when the leading quark has x+ xL but the antiquark carries zero momentum. The two interference matrix elements are approximately the same for small value of xL and will be denoted as T qq̄i (x, xL). 4.2 qq̄ → qiq̄i annihilation Contributions from the next-to-leading order quark-antiquark annihilation or quark- quark (antiquark) scattering are more complicated since they involve many real and vir- tual corrections. The simplest real correction comes from qq̄ → qiq̄i annihilation (qi 6= q) [Fig. 10 and Eqs. 
(A-25) and (A-26)] which has only a central-cut diagram, (qq̄→qiq̄i) q→h (zh) = α2sxB [z2 + (1− z)2] qi 6=q [Dqi→h(zh/z) +Dq̄i→h(zh/z)] qq̄ (x, xL) fAq (x) . (40) This kind of qq̄ annihilation is truly a hard processes and thus requires the second an- tiquark to carry finite initial momentum fraction xL. Furthermore, there are no other interfering processes. 4.3 qqi(q̄i) → qqi(q̄i) scattering Contributions from non-identical quark-quark scattering qq̄i → qq̄i (qi 6= q) are a little complicated because they involve all three cut diagrams (central, left and right) [Eqs. (A- 28)-(A-32)]. One can factor out the θ-functions in the hard parts according to Eq. (32) and re-organize the phase factors in each cut diagram, I11,C = e i(x+xL)p +y−(1− e−ixLp+y 2 )(1− e−ixLp+(y−−y = ei(x+xL)p +y−[1− e−ixLp+y 2 − e−ixLp+(y−−y ) + e−ixLp +(y−+y I11,L = e i(x+xL)p +y−(1− e−ixLp+(y−−y = ei(x+xL)p +y−[1− e−ixLp+y 2 − e−ixLp+(y−−y ) + e−ixLp 2 ] ; I11,R = e i(x+xL)p +y−(1− e−ixLp+y = ei(x+xL)p +y−[1− e−ixLp+y 2 − e−ixLp+(y−−y ) + e−ixLp +(y−−y )] , (41) such that the first three terms in each amplitude are the same. These three common phase factors will give rise to a contact contribution for all similar hard parts from the three cut diagrams, which we will neglect since they are not nuclear enhanced. The remaining part will have the following phase factors, I11= e i(x+xL)p +y−[θ(−y−2 )θ(y− − y−1 )e−ixLp +(y−+y −θ(y−1 − y−2 )θ(y− − y−1 )e−ixLp 2 − θ(−y−2 )θ(y−2 − y−1 )e−ixLp +(y−−y )] . (42) Note that the phase factors of the last two terms in the above equation give identi- cal contributions to the matrix elements when intergated over y−1 and y 2 as they dif- fer only by the substitution y−2 ↔ y−1 − y−. One therefore can combine them with θ(−y−2 )θ(y− − y−1 )e−ixLp +(y−−y ) to form another contact contribution (path-ordered) which can be neglected. The final effective phase factor is then I11 = e ixp+y−−ixLp )(1− eixLp+y 2 ) . 
(43) Using the above effective phase factor, one can obtain the effective modification to the quark fragmentation function due to quark-antiquark scattering, qq̄i → qq̄i, (qq̄i→qq̄i) q→h (zh) = α2sxB q̄i 6=q̄ Dq→h(zh/z) 1 + z2 (1− z)2 + Dg→h(zh/z) 1 + (1− z)2 A(HI) qq̄i (x, xL) fAq (x) + [Dq̄i→h(zh/z))−Dg→h(zh/z)] 1 + (1− z)2 A(HS) qq̄i (x, xL) fAq (x) α2sxB q̄i 6=q̄ Dq→h(zh/z) 1 + z2 (1− z)2 + Dq̄i→h(zh/z) 1 + (1− z)2 A(HI) qq̄i (x, xL) fAq (x) + [Dq̄i→h(zh/z)−Dg→h(zh/z)] 1 + (1− z)2 A(SI) qq̄i (x, xL) fAq (x)  , (44) where three types of two-parton correlations are defined: A(HI) qq̄i (x, xL)≡T qq̄i (x, xL)− T qq̄i (x, xL) p+dy− dy−1 dy ixp+y−−ixLp )(1− eixLp+y ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉θ(−y−2 )θ(y− − y−1 ) , (45) A(SI) qq̄i (x, xL)≡T qq̄i (x, xL)− T qq̄i (x, xL) p+dy− dy−1 dy i(x+xL)p +y−(1− e−ixLp+y ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉θ(−y−2 )θ(y− − y−1 ) , (46) A(HS) qq̄i (x, xL)≡T A(HI) qq̄i (x, xL) + T A(SI) qq̄i (x, xL) p+dy− dy−1 dy i(x+xL)p +y−(1− e−ixLp+y × (1− e−ixLp+(y−−y ))θ(−y−2 )θ(y− − y−1 ) ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 . (47) One can similarly obtain the modification of quark fragmentation from non-identical quark-quark scattering by replacing q̄i → qi in Eq. (44), (qqi→qqi) q→h (zh) = α2sxB qi 6=q Dq→h(zh/z) 1 + z2 (1− z)2 + Dqi→h(zh/z) 1 + (1− z)2 TA(HI)qqi (x, xL) fAq (x) + [Dqi→h(zh/z)−Dg→h(zh/z)] 1 + (1− z)2 TA(SI)qqi (x, xL) fAq (x) . (48) The two-quark correlations, TA(HI)qqi (x, xL) and T A(SI) (x, xL) can be obtained from T A(HI) qq̄i (x, xL) and T A(SI) qq̄i (x, xL), respectively, by making the replacements ψqi(y2) → ψ̄qi(y2) and ψ̄qi(y1) → ψqi(y1) in Eqs. (45) and (46), TA(HI)qqi (x, xL)≡ p+dy− dy−1 dy ixp+y−−ixLp )(1− eixLp+y ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 1 )|A〉θ(−y−2 )θ(y− − y−1 ) , (49) TA(SI)qqi (x, xL) = p+dy− dy−1 dy i(x+xL)p +y−(1− e−ixLp+y ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 1 )|A〉θ(−y−2 )θ(y− − y−1 ) , (50) and TA(HS)qqi (x, xL) = T A(HI) (x, xL) + T A(SI) (x, xL). 
Note that the contribution from fragmentation of quark $q_i$ or antiquark $\bar q_i$ only comes from the central-cut diagram. This contribution is positive and is proportional to $T^{A(HI)}_{q\bar q_i}(x,x_L) + T^{A(SI)}_{q\bar q_i}(x,x_L)$, containing all four terms: hard-soft, double-hard and both interference terms. The gluon fragmentation comes only from the single-triple interferences (left and right-cut diagrams). Its contribution is therefore negative and partially cancels the production of $q_i(\bar q_i)$ from the hard-soft rescattering. The cancellation is not complete since the gluon and quark fragmentation functions are different. The structure of this hard-soft rescattering (quark plus gluon) is very similar to the lowest order result of $q\bar q \to g$ in Eq. (35). It contributes to the modification of the effective fragmentation function but does not contribute to the energy loss.

The energy loss of the leading quark comes only from double-hard rescattering: the leading quark fragmentation receives contributions from both the central-cut diagram and the single-triple interferences, and the single-triple interference terms cancel the effect of hard-soft scattering for the leading fragmentation. Its net contribution is therefore proportional to $T^{A(HI)}_{qq_i(\bar q_i)}$. Since the double-hard rescattering amounts to elastic $qq_i(\bar q_i)$ scattering, the effective energy loss is essentially elastic energy loss, as shown in Ref. [34]. There is, however, LPM suppression due to partial cancellation by single-triple interference contributions. For long formation time, $1/(x_Lp^+) \gg R_A$, the cancellation is complete. Therefore, LPM interference effectively imposes the lower limit $x_L \geq 1/(p^+R_A)$ on the fractional momentum carried by the second quark (antiquark).

4.4 $qq \to qq$ scattering

For identical quark-quark scattering, $qq \to qq$, one has to include both $t$- and $u$-channels, their interferences, and the related single-triple interference contributions.
Using the same technique to identify and neglect the contact contributions, one can find the correspond- ing modification to the quark fragmentation function from Eqs. (48) and (A-45)-(A-49), (qq→qq) q→h (zh) = α2sxB TA(HS)qq (x, xL) fAq (x) × [Dq→h(zh/z))−Dg→h(zh/z)] 1 + (1− z)2 z(1 − z) Dq→h(zh/z) 1 + z2 (1− z)2 z(1 − z) +Dg→h(zh/z) 1 + (1− z)2 z(1 − z) TA(HI)qq (x, xL) fAq (x) α2sxB TA(SI)qq (x, xL) fAq (x) P (s)qq→qq(z)[Dq→h(zh/z) −Dg→h(zh/z)] +Dq→h(zh/z)Pqq→qq(z) TA(HI)qq (x, xL) fAq (x) , (51) where the effective splitting functions are defined as P (s)qq→qq(z) = 1 + (1− z)2 z(1 − z) , (52) Pqq→qq(z) = 1 + (1− z)2 1 + z2 (1− z)2 z(1− z) . (53) 4.5 qq̄ → qq̄, gg annihilation The most complicated twist-four processes involving four quark field operators are quark- antiquark annihilation into two gluons or an identical quark-antiquark pair. We have to consider them together since they have similar single-triple interference processes and they involve the same kind of quark-antiquark correlation matrix elements, T qq̄ (x, xL), (i = HI, SI,HS). For notation purpose, we first factor out the common factor (CF/Nc)α sxB/Q 2/fAq (x) and the integration over ℓT and z and define (qq̄→gg,qq̄) q→h (zh) ≡ α2sxB Q2fAq (x) (qq̄→gg,qq̄) q→h (zh, z, x, xL). (54) After rearranging the phase factors and identifying (by combining central, left and right cut diagrams) and neglecting contact contributions we can list in the following the twist- four corrections to the quark fragmentation from the hard partonic parts of each cut diagram (see Appendix A): Fig. 5 (t-channel qq̄ → gg), (qq̄→gg,qq̄) q→h(5) =Dg→h(zh/z)2CF 1 + (1− z)2 z(1 − z) 1 + z2 z(1− z) A(HI) qq̄ (x, xL) + [Dg→h(zh/z)−Dq→h(zh/z)] 2CF 1 + z2 z(1− z) A(SI) qq̄ (x, xL) ; (55) Fig. 6 (interference between u and t-channel of qq̄ → gg), (qq̄→gg,qq̄) q→h(6) =Dg→h(zh/z) −4(CF − CA/2) z(1 − z) A(HI) qq̄ (x, xL) + [Dg→h(zh/z)−Dq→h(zh/z)] −2(CF − CA/2) z(1 − z) A(SI) qq̄ (x, xL) ; (56) Fig. 
7 (s-channel of qq̄ → gg), (qq̄→gg,qq̄) q→h(7) = Dg→h(zh/z)4CA (1− z + z2)2 z(1 − z) qq̄ (x, xL) ; (57) Figs. 8 and 9 (interference of s and t-channel qq̄ → gg), (qq̄→gg,qq̄) q→h(8+9) =Dg→h(zh/z)(−2CA) 1 + z3 z(1− z) 1 + (1− z)3 z(1 − z) ×TA(HI)qq̄ (x, xL) + CA Dq→h(zh/z) 1 + z3 z(1 − z) +Dg→h(zh/z) 1 + (1− z)3 z(1 − z) × [TA(I2)qq̄ (x, xL)− T qq̄ (x, xL)] ; (58) Fig. 10 (s-channel of qq̄ → qq̄), (qq̄→gg,qq̄) q→h(10) = [Dq→h(zh/z) +Dq̄→h(zh/z)] [z 2 + (1− z)2] ×TA(H)qq̄ (x, xL) , (59) Fig. 11 (t-channel of qq̄ → qq̄), similar to Eq. (44), (qq̄→gg,qq̄) q→h(11) =Dq→h(zh/z) 1 + z2 (1 − z)2 A(HI) qq̄ (x, xL) +Dq̄→h(zh/z) 1 + (1− z)2 A(HS) qq̄ (x, xL) −Dg→h(zh/z) 1 + (1− z)2 A(SI) qq̄ (x, xL) ; (60) Figs. 12 and 13 (interference between s and t-channel qq̄ → qq̄), (qq̄→gg,qq̄) q→h(12+13) =−4(CF − CA/2) Dq→h(zh/z) 1 − z + Dq̄→h(zh/z) (1− z)2 A(HI) qq̄ (x, xL) +2(CF − CA/2) Dq→h(zh/z) +Dg→h(zh/z) (1− z)2 × [TA(I2)qq̄ (x, xL)− T qq̄ (x, xL)] ; (61) Figs. 15 and 16 (two additional single-triple interference diagrams), (qq̄→gg,qq̄) q→h(15+16) =−2CF Dq→h(zh/z) 1 + z2 + Dg→h(zh/z) 1 + (1− z)2 A(I2) qq̄ (x, xL) . (62) Most processes involve both TA(HI)(x, xL) for double-hard rescattering with interference and TA(SI)(x, xL) for hard-soft rescattering with interference. All the s-channel (Figs. 7 and 10) processes involve double-hard scattering only. Therefore, they depend only on the qq̄ (x, xL) = T A(HI) qq̄ (x, xL) + T qq̄ (x, xL). For interference between single and triple scattering (left and right-cut diagrams in Figs. 8, 9, 12 13, 15 and 16), where a hard rescattering with the second quark (antiquark) follows a soft rescattering with the third antiquark (quark), only interference matrix elements, T qq̄ (x, xL) and T A(I2) qq̄ (x, xL), are involved. 
Here, A(I2) qq̄ (x, xL)≡ p+dy− dy−1 dy ixp+y−+ixLp ×〈A|ψ̄q(0) −)ψ̄q(y 2 )|A〉θ(−y−2 )θ(y− − y−1 ) p+dy− dy−1 dy ixp+y−+ixLp +(y−−y ×〈A|ψ̄q(0) −)ψ̄q(y 2 )|A〉θ(−y−2 )θ(y− − y−1 ) , (63) is a new type of interference matrix elements that is only associated with this type of single-triple interference processes. One can categorize the above contributions according to the associated two-quark correlation matrix elements and rewrite the above contribu- tions as, qq̄→qq̄,gg q→h(HI) = T A(HI) qq̄ (x, xL)[Dg→h(zh/z)Pqq̄→gg(z) +Dq→h(zh/z)Pqq̄→qq̄(z) +Dq̄→h(zh/z)Pqq̄→qq̄(1− z)] (64) qq̄→qq̄,gg q→h(SI) = T A(SI) qq̄ (x, xL) z(1 − z) + 2CF 1 + (1− z)2 ×Dg→h(zh/z)−Dq→h(zh/z) z(1− z) + 2CF + Dq̄→h(zh/z) 1 + (1− z)2 A(SI) qq̄ (x, xL) [Dq→h(zh/z)−Dg→h(zh/z)] P (s)qq→qq(z) − 2CF 1 + z2 z(1− z) + [Dq̄→h(zh/z)−Dq→h(zh/z)] 1 + (1− z)2 qq̄→qq̄,gg q→h(I) = T qq̄ (x, xL) 4(1− z + z2)2 − 1 z(1 − z) − 2CF (1− z)2 ×Dg→h(zh/z) + [z2 + (1− z)2]Dq̄→h(zh/z)] + Dq→h(zh/z) z2 + (1− z)2 − z(1 − z) − 2CF A(I2) qq̄ (x, xL) Dq→h(zh/z) z(1 − z) − 2CF + Dg→h(zh/z) z(1 − z) − 2CF , (66) where P (s)qq→qq(z) is given in Eq. (52) and the effective splitting functions for qq̄ → gg and qq̄ → qq̄ are defined as Pqq̄→gg(z) = 2CF z2 + (1− z)2 z(1− z) − 2CA[z2 + (1− z)2] ; (67) Pqq̄→qq̄(z) = z 2 + (1− z)2 + 1 + z2 (1− z)2 , (68) which come from the complete matrix elements of qq̄ → gg and qq̄ → qq̄ scattering (see Appendix A-3). Again, double-hard rescattering corresponds to the elastic scattering of the leading quark with another antiquark in the medium and the interference contribu- tions. The structure of the hard-soft rescattering contribution we identify above shows the same kind of gluon-quark (or quark-antiquark) mixing in the fragmentation functions and does not contribute to the energy loss of the leading quark. The unique contributions in the qq̄ → qq̄, gg processes are the interference-only contributions. They mainly come from single-triple interference processes in the multiple parton scattering. 
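As a cross-check of Eq. (67), the following minimal sketch evaluates the effective splitting function for qq̄ → gg with exact rational arithmetic and compares it with two other expressions: the channel-by-channel combination inside the curly bracket of Eq. (A-24), and the textbook spin- and color-averaged matrix element squared for qq̄ → gg. The function names are illustrative only. The kinematic mapping t = −(1−z)s, u = −zs is an assumption made here, and the overall normalizations (C_F/2 and C_F/N_c) are the ones that come out of the comparison, not values quoted from the paper.

```python
from fractions import Fraction as F

N_C = 3
C_A = F(N_C)                      # adjoint Casimir of SU(3)
C_F = F(N_C**2 - 1, 2 * N_C)      # fundamental Casimir, 4/3


def p_qqbar_gg(z):
    """Effective splitting function for q qbar -> g g as written in Eq. (67)."""
    s = z**2 + (1 - z)**2
    return 2 * C_F * s / (z * (1 - z)) - 2 * C_A * s


def channel_sum(z):
    """Curly bracket of Eq. (A-24) divided by z(1-z).

    Terms in order: t and u channel, t-u interference, s channel,
    s-t interference. The overall prefactor is left out on purpose.
    """
    bracket = (
        C_F**2 * (1 + z**2 + 1 + (1 - z)**2)
        - 2 * C_F * (C_F - C_A / 2)
        + 2 * C_F * C_A * (1 - z + z**2)**2
        - C_F * C_A * (1 + z**3 + 1 + (1 - z)**3)
    )
    return bracket / (z * (1 - z))


def msq_qqbar_gg(z):
    """Textbook |M|^2/g^4 for q qbar -> g g, in units where s = 1.

    Assumed mapping (not taken from the paper): t = -(1-z), u = -z.
    """
    t, u = -(1 - z), -z
    return F(32, 27) * (t**2 + u**2) / (t * u) - F(8, 3) * (t**2 + u**2)


if __name__ == "__main__":
    for z in (F(1, 10), F(1, 3), F(1, 2), F(7, 9)):
        print(z, p_qqbar_gg(z), channel_sum(z), msq_qqbar_gg(z))
```

Under these assumptions the three expressions agree up to constant factors, and Eq. (67) is symmetric under z → 1 − z, as expected for two identical final-state gluons.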
5 Modification due to quark-gluon mixing

We have so far cast the modification of the quark fragmentation function due to quark-quark (antiquark) scattering (or annihilation) in a form similar to the DGLAP evolution equation in vacuum. In fact, one can also view the evolution of fragmentation functions in vacuum as modification due to final-state gluon radiation. In both cases, the modification at large zh is mainly determined by the singular behavior of the splitting functions for z → 1, whereas the modification at small zh is dominated by the singular behavior of the splitting functions for z → 0.

Let us first focus on the modification at large zh. A careful examination of the contributions from all possible processes shows that the dominant modification to the effective quark fragmentation function comes from the t-channel of double-hard quark-quark scattering processes,

$$
\begin{aligned}
\Delta D_{q\to h}(z_h) &\sim \frac{C_F}{N_c}\,\frac{\alpha_s^2 x_B}{Q^2}\int\frac{d\ell_T^2}{\ell_T^2}\int_{z_h}^{1}\frac{dz}{z}\sum_{q_i} D_{q\to h}(z_h/z)\,\frac{T^{A(HI)}_{qq_i}(x,x_L)}{f_q^A(x)}\left[\frac{1+z^2}{(1-z)^2_+}+\delta(1-z)\,\Delta_{q_i}(\ell_T^2)\right] \\
&= \frac{C_F}{N_c}\,\alpha_s^2\int\frac{d\ell_T^2}{\ell_T^4}\int_{z_h}^{1}\frac{dz}{z}\sum_{q_i} D_{q\to h}(z_h/z)\,\frac{x_L T^{A(HI)}_{qq_i}(x,x_L)}{f_q^A(x)}\left[\frac{z(1+z^2)}{(1-z)_+}+\delta(1-z)\,\Delta_{q_i}(\ell_T^2)\right],
\end{aligned}
\tag{69}
$$

where the summation is over all possible quark and antiquark flavors including $q_i = q, \bar q$, and $\Delta_{q_i}(\ell_T^2)$ represents the contribution from virtual corrections. We have expressed the modification in a form in which it is proportional to the matrix elements $x_L T^{A(HI)}_{qq_i}(x,x_L)/f_q^A(x) \sim A^{1/3} x_L f^N_{q_i}(x_L)$, as compared to the modification from quark-gluon scattering, where the corresponding matrix element [Eq. (9)] is $T^{A(HI)}_{qg}(x,x_L)/f_q^A(x) \sim A^{1/3} x_L G^N(x_L)$. Here, $f^N_{q_i}(x)$ and $G^N(x)$ are the quark and gluon distributions, respectively, in a nucleon. This leading contribution to the modification from quark-quark scattering is very similar in form to that from quark-gluon scattering [see Eq. (7)]. However, it is smaller due to the different color factors, $C_F/C_A = 4/9$, and the different quark and gluon distributions, $f^N_{q_i}(x_L)$ and $G^N(x_L)$, in a nucleon.
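The step between the two forms of Eq. (69), from a double pole $(1+z^2)/(1-z)^2$ to a single pole multiplied by $x_L$, is pure kinematics. The sketch below checks it with exact fractions, together with the color-factor ratio quoted above. It assumes the standard definitions $x_L = \ell_T^2/[2p^+q^-z(1-z)]$ and $x_B/Q^2 = 1/(2p^+q^-)$, and it assumes that the two forms carry the integration measures $d\ell_T^2/\ell_T^2$ and $d\ell_T^2/\ell_T^4$, respectively. These are not spelled out in this section, and the function names are hypothetical.

```python
from fractions import Fraction as F


def casimirs(n_c):
    """Return (C_F, C_A) for SU(n_c)."""
    return F(n_c**2 - 1, 2 * n_c), F(n_c)


def x_l(lt2, two_pq, z):
    """Fractional momentum x_L = l_T^2 / (2 p+ q- z (1 - z)) (assumed definition)."""
    return lt2 / (two_pq * z * (1 - z))


def kernel_double_pole(lt2, two_pq, z):
    """(x_B/Q^2) (1/l_T^2) (1+z^2)/(1-z)^2 with x_B/Q^2 = 1/(2 p+ q-)."""
    return (1 + z**2) / ((1 - z)**2 * two_pq * lt2)


def kernel_single_pole(lt2, two_pq, z):
    """(x_L/l_T^4) z (1+z^2)/(1-z): the same kernel rewritten with x_L."""
    return x_l(lt2, two_pq, z) * z * (1 + z**2) / ((1 - z) * lt2**2)


if __name__ == "__main__":
    c_f, c_a = casimirs(3)
    print("C_F/C_A =", c_f / c_a)
    lt2, two_pq, z = F(3, 2), F(40), F(7, 10)   # arbitrary test values
    print(kernel_double_pole(lt2, two_pq, z), kernel_single_pole(lt2, two_pq, z))
```

Written this way, the factor $x_L$ that multiplies the quark-quark matrix element plays the same role as the momentum fraction in $x_L G^N(x_L)$ for quark-gluon scattering, which is what makes the two cases directly comparable.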
Because of LPM interference, small angle scattering with long formation time $\tau_f = 1/(x_L p^+)$ is suppressed, leading to a minimum value of $x_L \ge x_A = 1/(m_N R_A) = 0.043$ for a Kr target. For this value of xL, the ratio

$$
\frac{\sum_{q_i} f^N_{q_i}(x_L,Q^2)}{G^N(x_L,Q^2)} \ge 1.40/1.85 \sim 0.75 \tag{70}
$$

at $Q^2 = 2$ GeV$^2$ according to the CTEQ4HJ parameterization [35]. Therefore, one has to include the effect of quark-quark scattering for a complete calculation of the total quark energy loss and medium modification of quark fragmentation functions.

In a weakly coupled and fully equilibrated quark-gluon plasma, the quark to gluon number density ratio is $\rho_q/\rho_g = n_f (3/2) N_c/(N_c^2-1) = 9n_f/16$. An asymptotically energetic jet in an infinitely large medium actually probes the small $x = \langle q_T^2\rangle/2ET$ regime, where quark-antiquark pairs and gluons are predominantly generated by thermal gluons through pQCD evolution. In this ideal scenario one expects $N_q/N_g \sim 1/(4C_A) = 1/12$ and can therefore neglect quark-quark scattering. The modification of the quark fragmentation function will then be dominated by quark-gluon rescattering. However, for a moderate jet energy E ≈ 20 GeV and a finite medium L ∼ 5 fm, parton distributions in a quark-gluon plasma are close to the thermal distribution. In particular, if quark and gluon production is dominated by non-perturbative pair production from strong color fields in the initial stage of heavy-ion collisions [36], the quark to gluon ratio is comparable to the equilibrium value. In this case, we should take into account the medium modification of the quark fragmentation functions by quark-quark scattering.

An important double-hard process in quark-quark (antiquark) scattering is qq̄ → gg [Eq. (64)]. In this process, the annihilation converts the initial quark into two final gluons that subsequently fragment into hadrons.
This will lead to suppression of the leading hadrons not only because of energy loss (energy carried away by the other gluon) but also due to the softer behavior of gluon fragmentation functions at large zh. Even though the leading behavior of the effective splitting function [Eq. (67)],

$$
P_{q\bar q\to gg}(z) \approx 2C_F\,\frac{z^2+(1-z)^2}{z(1-z)},
$$

is not as dominant as that of t-channel quark-quark scattering, it is enhanced by a color factor $2C_F = 8/3$. One expects this to make a significant contribution to the medium modification at intermediate zh.

In high-energy heavy-ion collisions, the ratios of initial production rates for valence quarks, gluons and antiquarks vary with the transverse momentum pT. The gluon production rate dominates at low pT, while the fraction of valence quark jets increases at large pT. Quarks are more likely to fragment into protons than antiprotons, while gluons fragment into protons and antiprotons with equal probabilities. Therefore, the ratio of large-pT antiproton and proton yields in p + p collisions is smaller than 1 and decreases with pT as the fraction of valence quark jets increases. Since gluons are expected to lose more energy than quark jets, one would naively expect the antiproton to proton ratio p̄/p to become smaller due to jet quenching. However, if the quark-gluon conversion due to qq̄ → gg becomes important, one would expect the fractions of quark and gluon jets to be modified toward their equilibrium values. The final p̄/p ratio could then be larger than or comparable to that in p + p collisions. Such a scenario of quark-gluon conversion was recently considered in Ref. [37] via a master rate equation.

The mixing between quark and gluon jets also happens at the lowest order of quark-antiquark annihilation, as shown in Fig. 3. At NLO, all hard-soft quark-quark (antiquark) scattering processes have this kind of mixing between quark and gluon fragmentation functions.
Their contributions generally have the form, α2sxB Dqi→h( )−Dg→h( ×Pqqi→qqi(z) TA(SI)qqi (x, xL) fAq (x) , (72) where again the summation over the quark flavor includes qi = q, q̄. This mixing does not occur on the probability but rather on on the amplitude level since it involves in- terferences between single and triple scattering. Therefore, this contribution depends on the difference between gluon and quark fragmentation functions [Eq. (35)] and can be positive or negative in different region of zh. Nevertheless, they contribute to the mod- ification of the effective quark fragmentation function and the flavor dependence of the final hadron spectra. 6 Flavor dependence of the medium modified fragmentation Summing all contributions to quark-quark (antiquark) double scattering as listed in Sec- tion 4, we can express the total twist-four correction up to O(α2s) to the quark fragmen- tation function as ∆Dq→h(zh)= 2 [Dg→h(zh)−Dq→h(zh)] qq̄ (x, 0) fAq (x) a,b,i Db→h(zh/z)P qa→b(z) TA(i)qa (x, xL) fAq (x)  , (73) where the summation is over all possible q+a→ b+X processes and all different matrix elements TA(i)qa (x, xL) (i = HI, SI, I, I2), which will be four basic matrix elements we will use. The effective splitting functions P qa→b(z) are listed in Appendix A-2. One should also include virtual corrections which can be constructed from the real corrections through unitarity constraints [18]. Similarly, we can also write down the twist-four corrections to antiquark fragmentation in a nuclear medium, ∆Dq̄→h(zh)= 2 [Dg→h(zh)−Dq̄→h(zh)] q̄q (x, 0) fAq̄ (x) a,b,i Db→h(zh/z)P q̄a→b(z) q̄a (x, xL) fAq̄ (x)  , (74) where the matrix elements T q̄a (x, xL) and the effective splitting functions P q̄a→b(z) can be obtained from the corresponding ones for quarks. Given a model for the two-quark correlation functions, one will be able to use the above expressions to numerically evaluate twist-four corrections to the quark (antiquark) fragmentation functions. 
In this paper, we will instead give a qualitative estimate of the flavor dependence of the correction in DIS off a large nucleus. For the purpose of a qualitative estimate, one can assume that all the twist-four two-quark correlation functions can be factorized, as has been done in Refs. [18,19,23,33], p+dy− dy−1 dy +y−+ix2p )θ(−y−2 )θ(y− − y−1 ) ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 fAq (x1) f (x2) , (75) p+dy− dy−1 dy +y−+ix2p )±ixLp 2 θ(−y−2 )θ(y− − y−1 ) ×〈A|ψ̄q(0) ψq̄(y −)ψ̄qi(y ψqi(y 2 )|A〉 fAq (x1) f (x2)e A , (76) where xA = 1/mNRA, mN is the nucleon mass, RA the nucleus size, f (x2) is the antiquark distribution in a nucleon and C is assumed to be a constant, parameterizing the strength of two-parton correlations inside a nucleus. The integration over the position of the antiquark (y−1 +y 2 )/2 in the twist-four two-quark correlation matrix elements gives rise to the nuclear enhancement factor 1/xA = mNRA = 0.21A We should note that we set kT = 0 for the collinear expansion. As a consequence, the secondary quark field in the twist-four parton matrix elements will carry zero momentum in the soft-hard process. Finite intrinsic transverse momentum leads to higher-twist cor- rections. If a subset of the higher-twist terms in the collinear expansion can be resummed to restore the phase factors such as eixT p +y−, where xT ≡ 〈k2T 〉/2p+q−z(1 − z), the soft quark fields in the parton matrix elements will carry a finite fractional momentum xT . Under such an assumption of factorization, one can obtain all the two-quark correlation matrix elements: A(HI) qq̄i (x, xL)≈ fAq (x) f (xL + xT )[1− e−x A], (77) A(SI) qq̄i (x, xL)≈ fAq (x+ xL) f (xT )[1− e−x A], (78) qq̄i (x, xL)≈T A(I2) qq̄i (x, xL) ≈ fAq (x+ xL) f (xT )e fAq (x) f (xL + xT )e A. (79) In the last approximation, we have assumed xL ∼ xT ≪ x. Similarly, one can obtain TA(i)qqi (x, xL), T q̄qi (x, xL) and T q̄q̄i (x, xL). 
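To get a feeling for the scales in this factorized model, the sketch below evaluates $x_A = 1/(m_N R_A)$ and the LPM weights that multiply the matrix elements in Eqs. (77)-(79). Several inputs are assumptions made only for illustration: the radius parameterization $R_A = 1.12\,A^{1/3}$ fm, the mass number A = 84 for Kr, and the Gaussian form $1 - e^{-x_L^2/x_A^2}$ (respectively $e^{-x_L^2/x_A^2}$) of the weights, which is how the exponentials in Eqs. (77)-(79) are read here. The function names are hypothetical.

```python
import math

HBARC_GEV_FM = 0.1973269804   # hbar * c in GeV fm
M_N_GEV = 0.938               # nucleon mass in GeV
R0_FM = 1.12                  # assumed radius parameter, R_A = R0 * A^(1/3)


def x_a(mass_number, r0_fm=R0_FM):
    """x_A = 1/(m_N R_A), with R_A converted from fm to GeV^-1."""
    r_a_fm = r0_fm * mass_number ** (1.0 / 3.0)
    return HBARC_GEV_FM / (M_N_GEV * r_a_fm)


def hard_weight(x_l, x_a_value):
    """Assumed weight 1 - exp(-x_L^2/x_A^2) of the HI and SI matrix elements."""
    return 1.0 - math.exp(-(x_l / x_a_value) ** 2)


def interference_weight(x_l, x_a_value):
    """Assumed weight exp(-x_L^2/x_A^2) of the I and I2 matrix elements."""
    return math.exp(-(x_l / x_a_value) ** 2)


if __name__ == "__main__":
    xa_kr = x_a(84)   # Kr target, A = 84 assumed
    print(f"x_A(Kr) = {xa_kr:.4f}")
    for x_l in (0.25 * xa_kr, xa_kr, 2.0 * xa_kr):
        print(f"x_L/x_A = {x_l / xa_kr:.2f}  "
              f"hard = {hard_weight(x_l, xa_kr):.3f}  "
              f"interference = {interference_weight(x_l, xa_kr):.3f}")
```

With these inputs $x_A$ for Kr comes out close to the value 0.043 quoted in Section 5. The hard and hard-soft weights vanish for $x_L \ll x_A$, which is the LPM suppression of long formation times, while the pure interference weights are only sizable in that same region.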
With these forms of two-quark correlation matrix elements, we can estimate the flavor dependence of the nuclear modification to the quark (antiquark) fragmentation functions. The lowest order corrections [O(αs)] are very simple,

$$
\Delta D^{(LO)}_{q\to h}(z_h) \propto C A^{1/3}\,[D_{g\to h}(z_h)-D_{q\to h}(z_h)]\,f^N_{\bar q}(x_T), \tag{80}
$$

$$
\Delta D^{(LO)}_{\bar q\to \bar h}(z_h) \propto C A^{1/3}\,[D_{g\to \bar h}(z_h)-D_{\bar q\to \bar h}(z_h)]\,f^N_{q}(x_T). \tag{81}
$$

We consider the dominant contribution from the fragmentation of a quark (antiquark) which is one of the valence quarks (antiquarks) of the final particle h (antiparticle h̄). The gluon fragmentation functions into h and h̄ are the same. For large zh, the gluon fragmentation function is always softer than the valence quark (antiquark) fragmentation function [38]. Therefore, the lowest order twist-four corrections are always negative for large zh, leading to a suppression of the valence quark (antiquark) fragmentation function, $D_{q_v\to h}(z_h)$ [$D_{\bar q_v\to \bar h}(z_h)$]. Consider those quarks that are also valence quarks of a nucleon:

$$
n = udd, \quad p = uud, \quad \bar p = \bar u\bar u\bar d, \tag{82}
$$

$$
K^+ = u\bar s, \quad K^- = \bar u s, \tag{83}
$$

$$
\pi^+,\ \pi^0,\ \pi^- = u\bar d,\ (u\bar u - d\bar d)/\sqrt{2},\ d\bar u. \tag{84}
$$

One can find the following flavor dependence of the lowest order twist-four corrections to the quark (antiquark) fragmentation functions,

$$
\frac{\Delta D^{(LO)}_{\bar q_v\to \bar h}(z_h)}{\Delta D^{(LO)}_{q_v\to h}(z_h)} = \frac{-|\Delta D^{(LO)}_{\bar q_v\to \bar h}(z_h)|}{-|\Delta D^{(LO)}_{q_v\to h}(z_h)|} = \frac{f^N_{q_v}(x_T)}{f^N_{\bar q_v}(x_T)} > 1, \tag{85}
$$

$$
\frac{R_{\bar q_v\to \bar h}}{R_{q_v\to h}} = \frac{1+\Delta D^{(LO)}_{\bar q_v\to \bar h}(z_h)/D_{\bar q\to \bar h}(z_h)}{1+\Delta D^{(LO)}_{q_v\to h}(z_h)/D_{q\to h}(z_h)} < 1, \tag{86}
$$

where R is the corresponding leading order suppression of the fragmentation function at large zh for the proton (antiproton) and K+ (K−). Since pions contain both a valence quark and a valence antiquark, the suppression factors should be similar for all pions. For xT ≥ 0.043, u(x)/ū(x) ≥ 3 and d(x)/d̄(x) ≥ 2 [35]. Therefore, the modification of antiquark fragmentation functions due to quark-antiquark annihilation is significantly larger than that of quark fragmentation functions.

The flavor dependence of the NLO results is more complicated since they involve scattering with both quarks and antiquarks in the medium.
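The leading-order estimate of Eqs. (85) and (86) can be made concrete with a few lines of arithmetic. The sketch below assumes, as in the text, that the valence quark and antiquark fragmentation functions are equal by charge conjugation, so that the relative corrections scale with the ratio of parton densities in the medium. The input numbers (a 5% suppression for quarks, a density ratio of 3) are purely hypothetical and are not results from the paper, and the function name is illustrative.

```python
def lo_suppression_ratio(delta_q, pdf_ratio):
    """Ratio R_qbar / R_q of Eq. (86) for given leading-order inputs.

    delta_q   : relative LO correction Delta D / D for the valence quark
                (negative at large z_h).
    pdf_ratio : f_q(x_T) / f_qbar(x_T), the enhancement of Eq. (85).
    """
    if not -1.0 < delta_q <= 0.0:
        raise ValueError("delta_q is expected to lie in (-1, 0]")
    delta_qbar = delta_q * pdf_ratio   # Eq. (85): antiquark correction is larger
    return (1.0 + delta_qbar) / (1.0 + delta_q)


if __name__ == "__main__":
    for ratio in (1.0, 2.0, 3.0):
        print(ratio, lo_suppression_ratio(-0.05, ratio))
```

Any density ratio above one gives a suppression ratio below one, which is the statement of Eq. (86): antiquark fragmentation into valence hadrons is suppressed more strongly than quark fragmentation.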
One can observe first that the effective splitting functions (or quark-quark scattering cross sections) are the same for the t-channel qq′ → qq′ and qq̄′ → qq̄′ (q′ ≠ q) scatterings,

$$
P^{(i)}_{qq'\to b}(z) = P^{(i)}_{\bar q q'\to b}(z) = P^{(i)}_{q\bar q'\to b}(z) = P^{(i)}_{\bar q\bar q'\to b}(z). \tag{87}
$$

For identical quark-quark scattering or quark-antiquark annihilation, one can separate the qq̄ annihilation splitting functions (or cross sections) into singlet and non-singlet contributions by singling out the t-channel contributions,

$$
P^{(i)}_{q\bar q\to b}(z) \equiv P^{(i)}_{qq\to b}(z) + \Delta P^{N(i)}_{q\bar q\to b}(z), \tag{88}
$$

$$
P^{(i)}_{\bar q q\to b}(z) \equiv P^{(i)}_{\bar q\bar q\to b}(z) + \Delta P^{N(i)}_{\bar q q\to b}(z). \tag{89}
$$

The singlet contributions to the modified fragmentation functions are

$$
\Delta D^{S(NLO)}_{q\to h}(z_h) \propto \sum_{b,q',i} D_{b\to h}\otimes P^{(i)}_{qq'\to b}(z_h)\,[f^N_{q'}(x_T)+f^N_{\bar q'}(x_T)]\,C^{(i)}, \tag{90}
$$

$$
\Delta D^{S(NLO)}_{\bar q\to \bar h}(z_h) \propto \sum_{b,q',i} D_{\bar b\to \bar h}\otimes P^{(i)}_{\bar q\bar q'\to \bar b}(z_h)\,[f^N_{q'}(x_T)+f^N_{\bar q'}(x_T)]\,C^{(i)}, \tag{91}
$$

where the summation over q′ now includes q′ = q and $C^{(i)}(x_L)$ are flavor-independent functions determined from Eqs. (77)-(79),

$$
C^{(HI)} = C^{(SI)} = C(x_L)\,(1-e^{-x_L^2/x_A^2}), \qquad C^{(I)} = C^{(I2)} = C(x_L)\,e^{-x_L^2/x_A^2}, \tag{92}
$$

and $C(x_L)$ is a common coefficient that is a function of xL. Using $P^{(i)}_{\bar q\bar q\to \bar b}(z) = P^{(i)}_{qq\to b}(z)$, one can conclude that the singlet contributions to the modified quark and antiquark fragmentation functions are the same, $\Delta D^{S(NLO)}_{q\to h}(z_h) = \Delta D^{S(NLO)}_{\bar q\to \bar h}(z_h)$.

The non-singlet contributions, mainly from s-channel and s-t interferences, are

$$
\Delta D^{N(NLO)}_{q\to h}(z_h) \propto \sum_{b,i} D_{b\to h}\otimes \Delta P^{N(i)}_{q\bar q\to b}(z_h)\,f^N_{\bar q}(x_T)\,C^{(i)}, \tag{93}
$$

$$
\Delta D^{N(NLO)}_{\bar q\to \bar h}(z_h) \propto \sum_{b,i} D_{\bar b\to \bar h}\otimes \Delta P^{N(i)}_{\bar q q\to \bar b}(z_h)\,f^N_{q}(x_T)\,C^{(i)}, \tag{94}
$$

where again $\Delta P^{N(i)}_{q\bar q\to b}(z) = \Delta P^{N(i)}_{\bar q q\to \bar b}(z)$ due to crossing symmetry. We have listed all non-vanishing non-singlet splitting functions $\Delta P^{N(i)}_{q\bar q\to b}(z)$ in Appendix A-2.

We again consider the limit zh → 1. In this region the convolution in the modified fragmentation function is dominated by the large z → 1 behavior of the effective splitting functions.
From the listed ∆P qq̄→b(z) in Appendix A-2, we can obtain the leading contributions, C(i)∆P qq̄→q(z)≈−4CF C(xL) C(i)∆P qq̄→g(z)≈ 2 2CF + CF (1− e−x A) + CAe ] C(xL) , (95) where we have also neglected terms proportional to 1/Nc. All ∆P qq̄→q̄(z) are non-leading in the limit z → 1 and therefore can be neglected. With these leading contributions, the non-singlet modification to the quark and antiquark fragmentation functions can be estimated as N(NLO) q→h (zh)∝ C(xL) (1− z)+ CF (1− e−x A) + CAe + δ(1− z)∆1(ℓT ) −Dq→h C(xL) (1− z)+ + δ(1− z)∆2(ℓT ) fNq̄ (xT ) , (96) N(NLO) q̄→h̄ (zh)∝ Dg→h̄ C(xL) (1− z)+ CF (1− e−x A) + CAe + δ(1− z)∆1(ℓT ) Dg→h̄ −Dq̄→h̄ C(xL) (1− z)+ + δ(1− z)∆2(ℓT ) fNq (xT ) , (97) where ∆1(ℓT ) and ∆2(ℓT ) are from virtual corrections, ∆1(ℓT )= CFC(xL)|z=1 − [CF (1− e−x + CAe A ]C(xL) , (98) ∆2(ℓT )= 2CF [C(xL)|z=1 − C(xL)] . (99) Because of momentum conservation, C(xL) = 0 when xL → ∞ for z = 1. Therefore, the above virtual corrections are always negative. At large zh, these virtual corrections dominate over the real ones. There are two kinds of non-singlet contributions in the expressions given above. One that is proportional to gluon fragmentation functions is due to quark-antiquark annihilation into gluons which then fragment. The fragmenting gluon not only carries less energy than the initial quark but also has a softer fragmentation function, leading to suppression of the final leading hadrons. The second type of contributions is proportional to Dg→h(zh)− Dq→h(zh) and therefore mixes quark and gluon fragmentation functions, similarly as the lowest order quark-antiquark annihilation processes [see Eqs. (80) and (81)]. Since a gluon fragmentation function is softer than a quark one, the real corrections from this type of processes are positive for small zh and negative for large zh. The virtual corrections have just the opposite behavior. Therefore, the second type of contributions will reduce the total net modification. 
For intermediate values of zh where 2Dg→h(zh) > Dq→h(zh), the net effect is still the suppression of the effective fragmentation functions for leading hadrons. Since $f^N_q(x_T) > f^N_{\bar q}(x_T)$, we can conclude that the LO and NLO combined non-singlet suppression for antiquark fragmentation into valence hadrons is larger than that for quark fragmentation into valence hadrons.

This qualitatively explains the flavor dependence of nuclear suppression of leading hadrons in DIS off heavy nuclear targets as measured by the HERMES experiment [25,26]. The ratio of differential semi-inclusive cross sections for nucleus and deuteron targets was used to study the nuclear suppression of the fragmentation functions. It was observed that the suppression of leading antiprotons is stronger than that of leading protons, and that K− suppression is stronger than K+ suppression. In the valence quark fragmentation picture, the leading proton (K+) is produced mainly from u, d (u) quark fragmentation, while antiprotons (K−) come primarily from ū, d̄ (ū) fragmentation. Therefore, the HERMES data are consistent with stronger suppression of antiquark fragmentation.

Since gluon bremsstrahlung and the singlet $qq_i(\bar q_i)$ scattering also suppress quark and antiquark fragmentation, but independently of quark flavor, one has to include all the processes in order to have a complete and quantitative numerical evaluation of the flavor dependence of the nuclear modification of the quark fragmentation functions. Furthermore, the NLO contributions are proportional to $\alpha_s \ln(Q^2)/2\pi$. They are as important as the lowest order correction for large values of $Q^2$. In principle, one should resum these higher order corrections by solving a set of coupled DGLAP evolution equations, including medium modification for gluon fragmentation functions. The contributions from quark-quark (antiquark) scattering derived in this paper will be an important part of the complete description.
A detailed numerical study of the effect of quark-quark (antiquark) scattering will be possible only once this complete description is available.

7 Summary

Utilizing the generalized factorization framework for twist-four processes, we have studied the nuclear modification of quark and antiquark fragmentation functions (FF) due to quark-quark (antiquark) double scattering in dense nuclear matter up to order $O(\alpha_s^2)$. We calculated and analyzed the complete set of all possible cut diagrams. The results can be categorized into contributions from double-hard processes, hard-soft processes and their interferences. The double-hard rescatterings correspond to elastic scattering of the leading quark with another medium quark. This requires the second quark to carry a finite fractional momentum xL. Therefore, the energy loss of the leading quark through such processes can be identified as elastic energy loss at order $O(\alpha_s^2)$. The quark energy loss and modification of quark fragmentation functions are dominated by the t-channel of quark-quark (antiquark) scattering and are shown to be similar to those caused by quark-gluon scattering. The contribution from quark-quark scattering is smaller than that from quark-gluon scattering by a factor of CF/CA times the ratio of quark and gluon distribution functions in the medium. We have shown that such contributions are not negligible for realistic kinematics and finite medium size. The soft-hard rescatterings mix gluon and quark scattering, in the same way as the lowest order qq̄ → g processes. Such processes modify the final hadron spectra or effective fragmentation functions but do not contribute to the energy loss of the leading quark. For qq̄ → qq̄, gg processes, there also exist pure interference contributions, mainly coming from single-triple-scattering interference.
With a simple model of a factorized two-quark correlation functions, we further investi- gated the flavor dependence of the medium modified quark fragmentation functions in a large nucleus. We identified the flavor dependent part of the modification and find that the nuclear modification for an antiquark fragmentation into a valence hadron is larger than that of a quark. This offers an qualitative explanation for the flavor dependence of the leading hadron suppression in semi-inclusive DIS off nuclear targets as observed by the HERMES experiment [25,26]. Acknowledgements The authors thank Jian-Wei Qiu and Enke Wang for helpful discussion. This work was supported by NSFC under project No. 10405011, by MOE of China under project IRT0624, by Alexander von Humboldt Foundation, by BMBF, by the Director, Office of Energy Research, Office of High Energy and Nuclear Physics, Divisions of Nuclear Physics, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and by the US NSF under Grant No. PHY-0457265, the Welch Foundation under Grant No. A-1358. A-1 Hard partonic parts for quark-quark double scattering In Section 3 we have discussed the calculation of the hard part of one example cut-diagram (Fig. 5) in detail. In this appendix we list the results for all possible real corrections to quark-quark (antiquark) double scattering in the next-to-leading order O(α2s). There are a total of 12 diagrams as illustrated in Figs. 5-16. For the purpose of abbreviation, we will suppress the variables in the notations of partonic hard parts D ≡ HD(y−, y−1 , y−2 , x, p, q, zh) , (A-1) and phase factor functions I ≡ I(y−, y−1 , y−2 , x, , xL, p) . (A-2) We first consider all qq̄ → gg annihilation diagrams with different possible cuts. The contributions of Fig. 5 are: 5,C = α2sxB I5,CDg→h(zh/z) 1 + z2 z(1 − z) 1 + (1− z)2 z(1 − z) , (A-3) Fig. 5. The t-channel of qq̄ → gg annihilation diagram with three possible cuts, central(C), left(L) and right(R). Fig. 6. 
The interference between t and u-channel of qq̄ → gg annihilation. I5,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p ×(1− e−ixLp+y 2 )(1− e−ixLp+(y−−y )) , (A-4) 5,L(R) = α2sxB I5,L(R) Dq→h(zh/z)2 1 + z2 z(1 − z) +Dg→h(zh/z)2 1 + (1− z)2 z(1 − z) , (A-5) I5,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p +y−(1− e−ixLp+(y−−y )) , (A-6) I5,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p +y−(1− e−ixLp+y 2 ) . (A-7) Here we have included the fragmentation of both final-state partons. The contributions from Fig. 6 are: Fig. 7. The s-channel of qq̄ → gg annihilation diagram with only a central-cut. 6,C = α2sxB I6,C 2Dg→h(zh/z) (1− z)z CF (CF − CA/2) , (A-8) 6,L(R) = α2sxB I6,L(R) [Dg→h(zh/z) +Dq→h(zh/z)] (1− z)z CF (CF − CA/2) , (A-9) I6,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p × (1− e−ixLp+y 2 )(1− e−ixLp+(y−−y )) , (A-10) I6,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p +y−(1− e−ixLp+(y−−y )) . (A-11) I6,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p +y−(1− e−ixLp+y 2 ) , (A-12) Note that the central-cut diagram in Fig. 6 corresponds to the interference between t and u-channel of the qq̄ → gg annihilation processes in Fig. 5. Since the splitting function is symmetric in z and 1 − z, a factor of 2 comes from the fragmentation of both gluons in the central-cut diagram. The s-channel of qq̄ → gg is shown in Fig. 7 which has only one central-cut. Its contri- bution to the partonic hard part is, 7,C = α2sxB I7,C 2Dg→h(zh/z) 2(z2 − z + 1)2 z(1 − z) , (A-13) I7,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p +y−e−ixLp +(y−−y− )e−ixLp 2 . (A-14) Note that the splitting function 2(z2 − z + 1)2/z(1 − z) = 2[1 − z(1 − z)]2/z(1 − z) is symmetric in z and 1− z. Therefore, fragmentation of the two final gluons gives rise to the factor of 2 in front of the gluon fragmentation function. Fig. 8. The interference between t and s-channel of qq̄ → gg annihilation. Fig. 9. The complex conjugate of Fig. 8. The interferences between t and s-channel of qq̄ → gg processes are shown in Figs. 8 and 9. There are only two possible cuts in these diagrams. 
The contributions from Fig. 8 are: 8,C = α2sxB I8,C Dg→h(zh/z) 1 + z3 z(1 − z) 1 + (1− z)3 z(1 − z) , (A-15) α2sxB Dq→h(zh/z)2 1 + z3 z(1− z) + Dg→h(zh/z)2 1 + (1− z)3 z(1 − z) , (A-16) I8,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p ×(1− e−ixLp+y 2 )e−ixLp +(y−−y− ) , (A-17) I8,L = θ(y 1 − y−2 )θ(y− − y−1 )ei(x+xL)p ×(e−ixLp+(y−−y ) − e−ixLp+(y−−y )) . (A-18) Contributions from Fig. 9, which are just the complex conjugate of Fig. 8, are: 9,C = α2sxB I9,C Dg→h(zh/z) 1 + z3 z(1 − z) 1 + (1− z)3 z(1 − z) , (A-19) 9,R = α2sxB Dq→h(zh/z)2 1 + z3 z(1 − z) + Dg→h(zh/z)2 1 + (1− z)3 z(1 − z) , (A-20) I9,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p ×(1− e−ixLp+(y−−y ))e−ixLp 2 , (A-21) I9,R = θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p ×(1− e−ixLp+(y ))e−ixLp 1 . (A-22) One can collect all contributions of the double hard qq̄ → gg processes from the central- cut diagrams, which should have the common phase factor ĪC = θ(−y−2 )θ(y− − y−1 )eixp +y−e−ixLp ) , (A-23) and obtain the total effective splitting function in the hard partonic part, Pqq̄→gg(z) = z(1 − z) {C2F [1 + z2 + 1 + (1− z)2]− 2CF (CF − CA/2) +2CFCA(1− z + z2)2 − CFCA[1 + z3 + 1 + (1− z)3]} z2 + (1− z)2 z(1− z) − 2CA[z2 + (1− z)2] . (A-24) We will find later in Appendix A-3 that the above result can also be obtained from the total matrix elements squared for qq̄ → gg annihilation. We now consider the annihilation processes qq̄ → qiq̄i with qi 6= q. There is only the s-channel process with one central-cut diagram as shown in Fig. 10. Its contribution to the hard part is 10,C = α2sxB I10,C qi 6=q [Dqi→h(zh/z) +Dq̄i→h(zh/z)] Fig. 10. s-channel qq̄ → qiq̄i annihilation. Fig. 11. t-channel qqi(q̄i) → qqi(q̄i) scattering. ×[z2 + (1− z)2] , (A-25) I10,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p +y−e−ixLp +(y−−y )e−ixLp 2 . (A-26) Here we define the effective splitting function for qq̄ → qiq̄i annihilation as, Pqq̄→qiq̄i(z) = [z2 + (1− z)2] . (A-27) Similarly, for qq̄i → qq̄i scattering with qi 6= q, there is only the t-channel as shown in Fig. 11. 
There are, however, three cut diagrams. Their contributions to the partonic hard part are: 11,C = α2sxB I11,C Dq→h(zh/z) 1 + z2 (1− z)2 +Dq̄i→h(zh/z) 1 + (1− z)2 , (A-28) 11,L(R) = α2sxB I11,L(R) Dq→h(zh/z) 1 + z2 (1− z)2 +Dg→h(zh/z) 1 + (1− z)2 , (A-29) I11,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p × (1− e−ixLp+y 2 )(1− e−ixLp+(y−−y )) , (A-30) I11,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p +y−(1− e−ixLp+(y−−y )) , (A-31) I11,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p +y−(1− e−ixLp+y 2 ) . (A-32) The twist-four two-parton correlation matrix element associated with the above quark- antiquark scattering is the quark-antiquark correlator, TAqq̄i(x, xL)∝ e ixp+y−−ixLp ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 2 )|A〉 , (A-33) and one should sum over all possible qi 6= q flavors. Note that in the above matrix element, the momentum flow for the antiquark (q̄i) is opposite to that of the quark (q) fields. For quark-quark scattering, qqi → qqi, the hard part is essentially the same. The only difference is the associated matrix element for the quark-quark correlator which is ob- tained from that of the quark-antiquark correlator via the exchange ψqi(y2) → ψ̄qi(y2) and ψ̄qi(y1) → ψqi(y1), TAqqi(x, xL)∝ e ixp+y−+ixLp ×〈A|ψ̄q(0) −)ψ̄qi(y ψqi(y 1 )|A〉 . (A-34) Note that the momentum flows of the two quarks (q and qi) point in the same direction. The effective splitting function of this scattering process is defined through the fragmen- tation of the quark in the central-cut diagram, Pqqi(q̄i)→qqi(q̄i)(z) = 1 + z2 (1− z)2 . (A-35) For annihilation qq̄ → qq̄ into identical quark and antiquark pairs, in addition to the s-channel (Fig. 10 for qi = q) and t-channel (Fig. 11 for qi = q̄), one has also to consider the interference between s and t-channel amplitudes as shown in Figs. 12 and 13, each having two cuts. Their contributions to the hard partonic parts are, respectively: Fig. 12. Interference between s and t-channel of qq̄ → qq̄ scattering Fig. 13. The complex conjugate of Fig. 12. 
12,C = α2sxB I12,C Dq→h(zh/z) (1− z) +Dq̄→h(zh/z) 2(1− z)2 CF (CF − CA/2) , (A-36) 12,L = α2sxB I12,L Dq→h(zh/z) (1− z) +Dg→h(zh/z) 2(1− z)2 CF (CF − CA/2) , (A-37) I12,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p ×(1− e−ixLp+y 2 )e−ixLp +(y−−y− ) , (A-38) I12,L = θ(y 1 − y−2 )θ(y− − y−1 )ei(x+xL)p ×(e−ixLp+(y−−y ) − e−ixLp+(y−−y )) ; (A-39) Fig. 14. The interference between t and u-channel of identical quark-quark scattering qq → qq. 13,C = α2sxB I13,C Dq→h(zh/z) (1− z) +Dq̄→h(zh/z) 2(1− z)2 CF (CF − CA/2) , (A-40) 13,R = α2sxB I13,R Dq→h(zh/z) (1− z) +Dg→h(zh/z) 2(1− z)2 CF (CF − CA/2) , (A-41) I13,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p ×(1− e−ixLp+(y−−y ))e−ixLp 2 , (A-42) I13,R = θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p ×(e−ixLp+y 1 − e−ixLp+y 2 ) . (A-43) One can again collect contributions from the central-cut diagrams of the double scattering processes in Figs. 10, 11 12 and 13 and obtain the total effective splitting function for qq̄ → qq̄, Pqq̄→qq̄(z) = [z2 + (1− z)2] + 1 + z2 (1− z)2 CF (CF − CA/2) z2 + (1− z)2 + 1 + z2 (1− z)2 . (A-44) Here we have used CF − CA/2 = −1/2Nc. For antiquark fragmentation, Pqq̄→q̄q(z) = Pqq̄→qq̄(1 − z). One can also obtain the above result from qq̄ → qq̄ scattering matrix squared as shown in Appendix A-3. Similarly, for scattering of identical quarks qq → qq, one should set qi = q in Fig. 11[in Eq. (A-28)]. In addition, one should also also include interference between t and u-channel of the scattering as shown in Fig. 14. The contributions from such interference diagram 14,C = α2sxB I14,C × 2Dq→h(zh/z) z(1 − z) CF (CF − CA/2) , (A-45) 14,L(R) = α2sxB I14,L(R) × [Dq→h(zh/z) +Dg→h(zh/z)] z(1− z) CF (CF − CA/2) , (A-46) I14,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p × (1− e−ixLp+y 2 )(1− e−ixLp+(y−−y )) , (A-47) I14,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p +y−(1− e−ixLp+(y−−y )) , (A-48) I14,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p +y−(1− e−ixLp+y 2 ) . (A-49) Note again that the fragmentation of both quarks contributes to the factor 2 in Eq. 
(A- 45) since the splitting function is symmetric in z and 1 − z. The twist-four two-quark correlation matrix element associated with qq → qq scattering is TAqq(x, xL) as compared to TAqq̄(x, xL) for quark-antiquark annihilation processes. We can sum contributions from the double hard scattering in all the central-cut diagrams in Figs. 11 and 14 and obtain the total effective splitting function for qq → qq processes, Pqq→qq(z) = 1 + z2 (1− z)2 1 + (1− z)2 CF (CF − CA/2) z(1 − z) 1 + z2 (1− z)2 1 + (1− z)2 z(1 − z) . (A-50) There are two remaining cut diagrams that contribute to the quark-antiquark annihilation at the order of O(α2s) as shown in Figs. 15 and 16. Their contributions are: 15,L = α2sxB I15,L Dq→h(zh/z)2 1 + z2 +Dg→h(zh/z)2 1 + (1− z)2 , (A-51) I15,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p +y−e−ixLp +(y−−y ) , (A-52) Fig. 15. Interference between final-state gluon radiation from single and triple-quark scattering. Fig. 16. The complex conjugate of Fig. 15. 16,R = α2sxB I16,R Dq→h(zh/z)2 1 + z2 +Dg→h(zh/z)2 1 + (1− z)2 , (A-53) I16,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p +y−e−ixLp 1 . (A-54) A-2 Effective splitting functions In this Appendix, we list the effective splitting functions associated with each process qa→ b and the double-hard (HI), hard-soft (SI) or their interferences (I, I2) according to Eq. (73). 
qqi(q̄i)→qi(q̄i) (z) = 1 + (1− z)2 qqi(q̄i)→q (z) = 1 + z2 (1− z)2 qqi(q̄i)→qi(q̄i) (z) = 1 + (1− z)2 qqi(q̄i)→g (z) = −1 + (1− z) (A-55) qq̄→qi(z) =P qq̄→q̄i(z) = z 2 + (1− z)2, qq̄→qi(z) =P qq̄→q̄i(z) = z 2 + (1− z)2, (A-56) P (HI)qq→q(z) = 1 + (1− z)2 1 + z2 (1− z)2 z(1 − z) P (SI)qq→g(z) =−P (SI)qq→q(z) , (A-57) P (SI)qq→q(z) = 1 + (1− z)2 z(1 − z) qq̄→q(z) = z 2 + (1− z)2 + 1 + z (1− z)2 qq̄→q̄(z) =P qq̄→q(1− z) , qq̄→g(z) = 2CF z2 + (1− z)2 z(1 − z) − 2CA[z2 + (1− z)2], (A-58) qq̄→q(z) =− z(1 − z) + 2CF qq̄→q̄(z) = 1 + (1− z)2 qq̄→g(z) = z(1 − z) + 2CF − 1 + (1− z) (A-59) qq̄→q(z) = z 2 + (1− z)2 − z(1 − z) − 2CF qq̄→q̄(z) = z 2 + (1− z)2 , qq̄→g(z) =CA 4(1− z + z2)2 − 1 z(1 − z) − 2CF (1− z)2 , (A-60) qq̄→q(z) = z(1 − z) − 2CF qq̄→g(z) = z(1 − z) − 2CF . (A-61) The non-singlet splitting functions for qq̄ → b, defined as qq̄→b(z) ≡ P qq̄→b(z)− P qq→b(z), (A-62) are listed as below: N(HI) qq̄→qi(q̄i) (z) =P qq̄→qi(q̄i) (z), ∆P qq̄→qi(q̄i) (z) = P qq̄→qi(q̄i) (z), (A-63) N(HI) qq̄→q (z) =− (1− z2)(1 + z2 + (1− z)2) 1 + z3 z(1− z) N(HI) qq̄→q̄ (z) =P qq̄→q̄(z), ∆P N(HI) qq̄→g (z) = P qq̄→g(z), (A-64) N(SI) qq̄→q (z) =− 1 + z2 1 + (1− z)2 N(SI) qq̄→q̄ (z) =P qq̄→q̄(z) N(SI) qq̄→g (z) = 2CF 1 + z2 z(1 − z) 1 + (1− z)2 (A-65) qq̄→b(z) =P qq̄→b(z), ∆P N(I2) qq̄→b (z) = P qq̄→b(z) (b = q, q̄, g) (A-66) A-3 Alternative calculations of central-cut diagrams As a cross-check of the hard partonic parts calculated from different cut diagrams in Appendix A-1, we provide an alternative calculation of all the central-cut diagrams, which correspond to quark-quark (antiquark) scattering. 
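The kinematic substitution behind this cross-check can be verified numerically. The sketch below assumes the relations t̂/ŝ = −(1 − z) and û/ŝ = −z of Eq. (A-70). It also assumes the standard tree-level forms (t̂² + û²)/ŝ² for qq̄ → q_i q̄_i and (û² + ŝ²)/t̂² for qq_i → qq_i, with the common color factor C0 divided out; these denominators are an assumption, not taken from the text. The function names are illustrative only.

```python
def mandelstam_ratios(z):
    """Return (t/s, u/s) for massless 2 -> 2 kinematics at x = x_L, Eq. (A-70)."""
    return -(1.0 - z), -z


def p_annihilation_other_flavor(z):
    # Assumed form: |M|^2 / C0 = (t^2 + u^2) / s^2 for q qbar -> q_i qbar_i.
    t, u = mandelstam_ratios(z)
    return t**2 + u**2


def p_t_channel(z):
    # Assumed form: |M|^2 / C0 = (u^2 + s^2) / t^2 for q q_i -> q q_i.
    t, u = mandelstam_ratios(z)
    return (u**2 + 1.0) / t**2


for z in (0.1, 0.5, 0.9):
    t, u = mandelstam_ratios(z)
    # Massless kinematics require s + t + u = 0, i.e. 1 + t/s + u/s = 0.
    assert abs(1.0 + t + u) < 1e-12
    # Compare with the closed forms quoted in Eqs. (A-73) and (A-76).
    assert abs(p_annihilation_other_flavor(z) - (z**2 + (1 - z)**2)) < 1e-12
    assert abs(p_t_channel(z) - (1 + z**2) / (1 - z)**2) < 1e-9
```

Only the two simplest channels are checked here; the interference terms for identical flavors carry additional color factors that this sketch does not attempt to reproduce.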
Considering a parton (a) with momentum q scattering with another parton (b) that carries a fractional momentum xp, a(q) + b(xp) → c(ℓ) + d(p′), the cross section can be written as dσab = |M |2ab→cd(t̂/ŝ, û/ŝ) (2π)32ℓ0 2πδ[(p+ q − ℓ)2] (4π)2 |M |2ab→cd(t̂/ŝ, û/ŝ) z(1− z) dℓ2T δ , (A-67) where q = [0, q−, 0] and p = [xp+, 0, 0] are momenta of the initial partons and , zq−, ~ℓT (A-68) is the momentum of one of the final partons. With the given kinematics, the on-shell condition in the cross section can be recast as (xp + q − ℓ)2 = 2(1− z)xp+q− 1− xL , xL = 2z(1− z)p+q− . (A-69) The Mandelstam variables of the collision are, ŝ=(q + xp)2 = 2xp+q− = z(1 − z) , û = (ℓ− xp)2 = −zŝ, t̂=(ℓ− q)2 = −(1 − z) ŝ = −(1− z)ŝ, (A-70) where we have used the on-shell condition x = xL. With Eq. (A-67) and parton distribution functions fNb (x), one can obtain the parton- nucleon cross section, dσaN = dσabf b (x)dx fNb (xL)xL|M |2ab→cd(t̂/ŝ, û/ŝ) z(1− z) fNb (xL) C0Pab→cd(z)dz , (A-71) where s = 2p+q− is the center-of-mass energy for aN collision, C0 is some common color factor in the scattering matrix elements and Pab→cd(z) = (1/C0)|M |2ab→cd(t̂/ŝ, û/ŝ) (A-72) is what we have defined as the effective splitting function for the corresponding processes. One can therefore easily obtain these effective splitting functions from the corresponding matrix elements for elementary parton-parton scattering [39]. We will list them in the fol- lowing. A common color factor for all quark-quark(antiquark) scattering is C0 = CF/Nc. qq̄ → qiq̄i annihilation: |M |2qq̄→qiq̄i = t̂2 + û2 Pqq̄→qiq̄i(z) = z 2 + (1− z)2 . (A-73) qq̄ → qq̄ annihilation: |M |2qq̄→qq̄ = û2 + ŝ2 û2 + t̂2 Pqq̄→qq̄(z) = 1 + z2 (1− z)2 + z2 + (1− z)2 + . (A-74) qq̄ → gg annihilation: |M |2qq̄→gg = − 2CA û2 + t̂2 Pqq̄→gg(z) = 2CF z2 + (1− z)2 z(1− z) − 2CA(z2 + (1− z)2) . (A-75) qqi(q̄i) → qqi(q̄i) scattering: |M |2qqi(q̄i)→qqi(q̄i)= û2 + ŝ2 Pqqi(q̄i)→qqi(q̄i)(z) = 1 + z2 (1− z)2 . 
(A-76) qq → qq scattering: |M |2qq→qq = û2 + ŝ2 ŝ2 + t̂2 Pqq→qq(z) = 1 + z2 (1− z)2 1 + (1− z)2 z(1− z) . (A-77) For quark-gluon Compton scattering, the relevant gluon distribution function is xLGN (xL). One can therefore rewrite contribution from qg → qg to Eq. (A-71) as, dσqN = xLGN(xL)πα sz(1 − z)|M |2qg→qg(t̂/ŝ, û/ŝ)dz ≡xLGN(xL)πα2s Pqg→qg(z)dz . (A-78) We have then for qg → qg scattering, |M |2qg→qg = ŝ2 + û2 û2 + ŝ2 Pqg→qg(z) = z(1 − z) 1 + z2 (1− z)2 1 + z2 . (A-79) Comparing this result with that in Ref. [18] for the quark-gluon rescattering, we can see that they agree in the limit 1 − z → 0. This is a consequence of the collinear approxi- mation employed in Ref. [18] in the calculation of the hard partonic part in quark-gluon rescattering. We can also extend this calculation to the case of gluon-nucleon scattering. One can use Eq. (A-71) to define the splitting function for gq → gq scattering, |M |2gq→gq = ŝ2 + t̂2 t̂2 + ŝ2 Pgq→gq(z) = z(1 − z) 1 + (1− z)2 1 + (1− z)2 (1− z) . (A-80) Here for gluon-parton scattering, there is no common color factor. gg → qq̄ annihilation, |M |2gg→qq̄ = t̂2 + û2 t̂2 + û2 Pgg→qq̄(z) = z(1 − z) z2 + (1− z)2 z(1− z) [z2 + (1− z)2] . (A-81) gg → gg scattering |M |2gg→gg =2 3− t̂û − ûŝ − t̂ŝ Pgg→gg(z) = 2 (1− z + z2)3 z(1− z) . (A-82) One can use this technique to extend the study of modified fragmentation functions to propagating gluons. Since the modification is dominated by quark-gluon and gluon-gluon scattering, comparing the effective splitting functions, Pqg→qg(z)≈ , (A-83) Pgg→gg(z)≈ , (A-84) in the limit z → 1, one can conclude that a gluon’s radiative energy loss is larger than a quark by a factor of Nc/CF = CA/CF = 9/4. We will leave the complete derivation of medium modification of gluon fragmentations to a future publication. References [1] K. Adcox et al., [PHENIX Collaboration], Phys. Rev. Lett. 88, 022301 (2002). [2] C. Adler et al., [STAR Collaboration], Phys. Rev. Lett. 89 202301 (2002). [3] C. 
Adler et al., [STAR Collaboration], Phys. Rev. Lett. 90, 082302 (2003). [4] M. Gyulassy and L. McLerran, Nucl. Phys. A 750, 30 (2005). [5] P. Jacobs and X. N. Wang, Prog. Part. Nucl. Phys. 54, 443 (2005). [6] J. W. Qiu, [arXiv:hep-ph/0507268]. [7] J. W. Qiu and G. Sterman, Int. J. Mod. Phys. E 12 (2003) 149. [8] X. F. Guo, Phys. Rev. D58 (1998) 114033. [9] X. F. Guo, J. W. Qiu and W. Zhu, Phys. Lett. B 523 (2001) 88. [10] R. J. Fries, A. Schäfer, E. Stein and B. Muller, Nucl. Phys. B 582, 537 (2000). [11] J. W. Qiu and X. Zhang, Phys. Lett. B 525 (2002) 265. [12] J. W. Qiu and I. Vitev, Phys. Rev. Lett. 93, 262301 (2004), J. W. Qiu and I. Vitev, Phys. Lett. B 587, 52 (2004). [13] M. Gyulassy and X.-N. Wang, Nucl. Phys. B 420, 583 (1994); X.-N. Wang, M. Gyulassy and M. Plümer, Phys. Rev. D 51, 3436 (1995). [14] R. Baier et al., Nucl. Phys. B 483, 291 (1997). Nucl. Phys. B 484, 265 (1997); Phys. Rev. C 58, 1706 (1998). [15] B. G. Zakharov, JETP Lett. 63, 952 (1996). [16] M. Gyulassy, P. Lévai and I. Vitev, Nucl. Phys. B594, 371 (2001); Phys. Rev. Lett. 85, 5535 (2000). [17] U. Wiedemann, Nucl. Phys. B588, 303 (2000); C. A. Salgado and U. A. Wiedemann, Phys. Rev. Lett. 89, 092303 (2002). [18] X. F. Guo and X.-N. Wang, Phys. Rev. Lett. 85, 3591 (2000); X.-N. Wang and X. F. Guo, Nucl. Phys. A 696, 788 (2001). [19] B. W. Zhang and X.-N. Wang, Nucl. Phys. A 720, 429 (2003); B. W. Zhang, E. Wang and X.-N. Wang, Phys. Rev. Lett. 93, 072301 (2004); B. W. Zhang, E. K. Wang and X.-N. Wang, Nucl. Phys. A 757, 493 (2005). [20] B. Z. Kopeliovich, A. Schäfer and A. V. Tarasov , Phys. Rev. C 59 (1999) 1609 [arXiv:hep-ph/9808378]. [21] M. Gyulassy, I. Vitev, X. N. Wang and B. W. Zhang,Quark-Gluon Plasma 3, R. C. Hwa and X.-N Wang, Eds. (World Scientific, Singapore, 2003), p123-191 [arXiv:nucl-th/0302077]. [22] A. Kovner and U. A. Wiedemann, arXiv:hep-ph/0304151. [23] M. Luo, J. W. Qiu and G. Sterman, Phys. Lett. B 279 (1992) 377; Phys. Rev. D 50 (1994) 1951; Phys. Rev. 
D 49, 4493 (1994). [24] E. Wang and X.-N. Wang, Phys. Rev. Lett. 89, 162301 (2002) [arXiv:hep-ph/0202105]. [25] A. Airapetian et al. [HERMES Collaboration], Eur. Phys. J. C 20, 479 (2001) [arXiv:hep-ex/0012049]. [26] A. Airapetian et al. [HERMES Collaboration], Phys. Lett. B 577, 37 (2003) [arXiv:hep-ex/0307023]. [27] X. N. Wang, Phys. Lett. B 595, 165 (2004) [arXiv:nucl-th/0305010]. [28] T. Falter, W. Cassing, K. Gallmeister and U. Mosel, Phys. Rev. C 70, 054609 (2004) [arXiv:nucl-th/0406023]. [29] B. Z. Kopeliovich, J. Nemchik, E. Predazzi and A. Hayashigaki, Nucl. Phys. A 740, 211 (2004) [arXiv:hep-ph/0311220]. [30] V. N. Gribov and L. N. Lipatov, Sov. J. Nucl. Phys. 15, 438 (1972); Yu. L. Dokshitzer, Sov. Phys. JETP 46, 641 (1977); G. Altarelli and G. Parisi, Nucl. Phys. B126, 298 (1977). [31] R. D. Field, Applications of Perturbative QCD, Frontiers in Physics Lecture, Vol. 77, Ch. 5.6 (Addison Wesley, 1989). [32] M. E. Peskin and D. V. Schroeder, An Introduction to Quantum Field Theory (Addison-Wesley Advanced Book Program, 1995). [33] J. Osborne and X.-N. Wang, Nucl. Phys. A 710, 281 (2002) [arXiv:hep-ph/0204046]. [34] X. N. Wang, arXiv:nucl-th/0604040. [35] H. L. Lai et al. [CTEQ Collaboration], Eur. Phys. J. C 12, 375 (2000) [arXiv:hep-ph/9903282]; one can use the online parton distribution calculator at http://durpdg.dur.ac.uk/HEPDATA/PDF. [36] F. Gelis, K. Kajantie and T. Lappi, Phys. Rev. Lett. 96, 032304 (2006) [arXiv:hep-ph/0508229]. [37] W. Liu, C. M. Ko and B. W. Zhang, arXiv:nucl-th/0607047. [38] J. Binnewies, B. A. Kniehl and G. Kramer, Phys. Rev. D 52, 4947 (1995) [arXiv:hep-ph/9503464]. [39] R. Cutler and D. W. Sivers, Phys. Rev. D 17, 196 (1978).
ABSTRACT Modifications to quark and antiquark fragmentation functions due to quark-quark (antiquark) double scattering in nuclear medium are studied systematically up to order $\cal{O}(\alpha_{s}^2)$ in deeply inelastic scattering (DIS) off nuclear targets. At the order $\cal{O}(\alpha_s^2)$, twist-four contributions from quark-quark (antiquark) rescattering also exhibit the Landau-Pomeranchuk-Migdal (LPM) interference feature similar to gluon bremsstrahlung induced by multiple parton scattering. Compared to quark-gluon scattering, the modification, which is dominated by $t$-channel quark-quark (antiquark) scattering, is only smaller by a factor of $C_F/C_A=4/9$ times the ratio of quark and gluon distributions in the medium. Such a modification is not negligible for realistic kinematics and finite medium size. The modifications to quark (antiquark) fragmentation functions from quark-antiquark annihilation processes are shown to be determined by the antiquark (quark) distribution density in the medium.
The asymmetry in quark and antiquark distributions in nuclei will lead to different modifications of quark and antiquark fragmentation functions inside a nucleus, which qualitatively explains the experimentally observed flavor dependence of the leading hadron suppression in semi-inclusive DIS off nuclear targets. The quark-antiquark annihilation processes also mix quark and gluon fragmentation functions in the large fractional momentum region, leading to a flavor dependence of jet quenching in heavy-ion collisions. <|endoftext|><|startoftext|> Introduction Among all dimensions, 2-SAT possesses many special properties unique in the sense of computational complexity [1, 2, 3, 4, 5]. But in light of works [6, 8, 7, 9] a problem arose: either those properties are accidental or there are polynomial time reductions of SAT to 2-SAT of polynomial size. This article describes one such reduction. 2 Presenting SAT with XOR One way to present SAT with a conjunction of XOR was described in [6]. Let us summarize it. Let Boolean formula f define a given SAT instance: f = c_1 ∧ c_2 ∧ . . . ∧ c_m. (1) Clauses c_i are disjunctions of literals: c_i = L_{i1} ∨ L_{i2} ∨ . . . ∨ L_{i n_i}, i = 1, 2, . . . , m, where n_i is the number of literals in clause c_i and L_{ij} are the literals. Using distributive laws, formula (1) can be rewritten in disjunctive form: f = d_1 ∨ d_2 ∨ . . . ∨ d_p, p = n_1 n_2 . . . n_m. Clauses d_k in this presentation are conjunctions of m literals, one literal from each clause c_i, i = 1, 2, . . . , m: d_k = L_{1k_1} ∧ L_{2k_2} ∧ . . . ∧ L_{mk_m}, k = 1, 2, . . . , p. (2) ∗Author’s email: sgubin@genesyslab.com It is obvious that formula (1) is satisfiable iff there are clauses without complementary literals amongst conjunctive clauses (2). Disjunction of all those clauses is the disjunctive normal form of formula (1). Thus, formula (1) is satisfiable iff there are members in its disjunctive normal form. There is a generator for conjunctive clauses (2): (ξ_{i1} ⊕ ξ_{i2} ⊕ . . . ⊕ ξ_{i n_i}) = true, (3) where Boolean variable ξ_{µν} indicates whether literal L_{µν} participates in conjunction (2). Solutions of equation (3) generate conjunctive clauses (2). Let’s call the variables ξ the indicators. To select from all solutions of equation (3) those without complementary literals, let’s use another Boolean equation. For each combination of clauses (c_i, c_j), 1 ≤ i < j ≤ m, let’s build the set of all couples of literals participating in the clauses: A_{ij} = { (L_{iµ}, L_{jν}) | c_i = L_{iµ} ∨ . . . ; c_j = L_{jν} ∨ . . . }. Let B_{ij} be the set of those couples of indicators (ξ_{iµ}, ξ_{jν}) for which the literals they present are complementary: B_{ij} = { (ξ_{iµ}, ξ_{jν}) | (L_{iµ}, L_{jν}) ∈ A_{ij}, L_{iµ} = L̄_{jν} }. There are C_m^2 sets B_{ij}, 1 ≤ i < j ≤ m, and |B_{ij}| ≤ min{n_i, n_j}. Note that some of the sets can be empty. Then, the following equation will select from all solutions of equation (3) those without complementary literals: 1≤i<|startoftext|> Half-metallic silicon nanowires E. Durgun,1, 2 D. Çakır,1, 2 N. Akman,2, 3 and S. Ciraci1, 2, ∗ Department of Physics, Bilkent University, Ankara 06800, Turkey National Nanotechnology Research Center, Bilkent University, Ankara 06800, Turkey Department of Physics, Mersin University, Mersin, Turkey (Dated: November 19, 2021) From first-principles calculations, we predict that transition metal (TM) atom doped silicon nanowires have a half-metallic ground state. They are insulators for one spin-direction, but show metallic properties for the opposite spin direction. At high coverage of TM atoms, ferromagnetic silicon nanowires become metallic for both spin-directions with high magnetic moment and may have also significant spin-polarization at the Fermi level. The spin-dependent electronic properties can be engineered by changing the type of dopant TM atoms, as well as the diameter of the nanowire. Present results are not only of scientific interest, but can also initiate new research on spintronic applications of silicon nanowires.
PACS numbers: 73.22.-f, 68.43.Bc, 73.20.Hb, 68.43.Fg Rod-like, oxidation resistant Si nanowires (SiNW) can now be fabricated at small diameters [1] (1-7 nm) and display a diversity of interesting electronic properties. In particular, the band gap of semiconductor SiNWs varies with their diameters. They can serve as a building material in many electronic and optical applications like field effect transistors [2] (FETs), light emitting diodes [3], lasers [4] and interconnects. Unlike carbon nanotubes, the conductance of a semiconductor nanowire can be tuned easily by doping during the fabrication process or by applying a gate voltage in a SiNW FET. In this letter, we report a novel spin-dependent electronic property of hydrogen terminated silicon nanowires (H-SiNW): When doped by specific transition metal (TM) atoms they show a half-metallic [5, 6] (HM) ground state. Namely, due to broken spin-degeneracy, energy bands En(k, ↑) and En(k, ↓) split and the nanowire remains an insulator for one spin-direction of electrons, but becomes a conductor for the opposite spin-direction, achieving 100% spin polarization at the Fermi level. Under certain circumstances, depending on the dopant and diameter, semiconductor H-SiNWs can also be either a ferromagnetic semiconductor or a metal for both spin directions. High spin polarization at the Fermi level can be achieved also for high TM coverage of specific SiNWs. Present results on the asymmetry of electronic states of TM doped SiNWs are remarkable and of technological interest since room temperature ferromagnetism is already discovered in Mn-doped SiNW [8]. Once combined with advanced silicon technology, these properties can be realizable and hence can make "known silicon" again a potential material with promising nanoscale technological applications in spintronics and magnetism.
Even though 3D ferromagnetic Heusler alloys and transition-metal oxides exhibit half-metallic properties [7], they are not yet appropriate for spintronics because of difficulties in controlling stoichiometry and the defect levels destroying the coherent spin-transport. Qian et al. have proposed HM heterostructures composed of δ-doped Mn layers in bulk Si [9]. Recently, Son et al. [10] predicted HM properties of graphene nanoribbons. Stable 1D half-metals have been also predicted for TM atom doped arm-chair single-wall carbon nanotubes [11] and linear carbon chains [12, 13]; but synthesis of these nanostructures appears to be difficult. Our results are obtained from first-principles plane wave calculations [14] (using a plane-wave basis set up to kinetic energy of 350 eV) within generalized gradient approximation expressed by PW91 functional [15]. All calculations for paramagnetic, ferromagnetic and antiferromagnetic states are carried out using ultra-soft pseudopotentials [16] and confirmed by using PAW potential [17]. All atomic positions and lattice constants are optimized by using the conjugate gradient method where total energy and atomic forces are minimized. The convergence for energy is chosen as 10^-5 eV between two steps, and the maximum force allowed on each atom is 0.05 eV/Å [18]. Bare SiNW(N)s (which are oriented along [001] direction and have N Si atoms in their primitive unit cell) are initially cut from the ideal bulk Si crystal in rod-like forms and subsequently their atomic structures and lattice parameter are relaxed [19]. The optimized atomic structures are shown for N=21, 25, and 57 in Fig. 2. While bare SiNW(21) is a semiconductor, bare SiNW(25) and SiNW(57) are metallic. The average cohesive energy relative to a free Si atom (Ec) is comparable with the calculated cohesive energy of bulk crystal (4.64 eV per Si atom) and it increases with increasing N.
The average cohesive energy relative to the bulk Si crystal, E′c, is small but negative as expected. Upon passivation of dangling bonds with hydrogen atoms all of these SiNWs (specified as H-SiNW) become semiconductors with a band gap EG. The binding energy of adsorbed hydrogen relative to the free H atom (Eb), as well as relative to the free H2 (E′b), are both positive and increase with increasing N. Extensive ab initio molecular dynamics calculations have been carried out at 500 K using supercells, which comprise either two or four primitive unit cells of nanowires to lift artificial limitations imposed by periodic boundary condition. After several iterations lasting 1 ps, the structure of all SiNW(N) and H-SiNW(N) remained stable. Even though SiNWs are cut from ideal crystal, their optimized structures deviate substantially from crystalline coordination, especially for small diameters as seen in Fig. 1. Upon hydrogen termination the structure is healed substantially, and approaches the ideal case with increasing N (or increasing diameter), as expected. The calculated response of the wire to a uniaxial tensile force, κ = ∂ET /∂c, ranging from 172 to 394 eV/cell indicates that the strength of H-SiNW(N)s (N=21-57) is rather high.

FIG. 1: (Color online) Upper curve in each panel, with numerals, indicates the distribution of first, second, third, fourth etc. nearest neighbor distances of SiNW(N) as cut from the ideal Si crystal; the same is shown for structure-optimized bare SiNW(N) (middle curve) and structure-optimized H-SiNW(N) (bottom curve) for N=21, 57 and 81. Vertical dashed line corresponds to the distance of the Si-H bond.

The adsorption of a single TM (TM=Fe, Ti, Co, Cr, and Mn) atom per primitive cell, denoted by n = 1, has been examined for different sites (hollow, top, bridge etc.) on the surface of H-SiNW(N) for N=21, 25 and 57. In Fig.
2(c) we present only the most energetic adsorption geometry for a specific TM atom for each N, which results in a HM state. These are Co-doped H-SiNW(21), Cr-doped SiNW(25) and Cr-doped SiNW(57). These nanowires have a ferromagnetic ground state, since the energy difference between the calculated spin-unpolarized and spin-polarized total energies, i.e. ∆Em = E^su_T − E^sp_T, is positive. We calculated ∆Em = 0.04, 0.92 and 0.94 eV for H-SiNW(21)+Co, H-SiNW(25)+Cr and H-SiNW(57)+Cr, respectively [20]. Moreover, these wires have an integer number of unpaired spins in their primitive unit cell. In contrast to the usually weak binding of TM atoms on single-wall carbon nanotubes, which can lead to clustering [21], the binding energy of TM atoms (EB) on H-SiNWs is high and involves significant charge transfer from the TM atom to the wire [22]. Mulliken analysis indicates that the charge transfer from Co to H-SiNW(21) is 0.5 electrons. The charge transfer from Cr to H-SiNW(25) and H-SiNW(57) is even higher (0.8 and 0.9 electrons, respectively).

FIG. 2: (Color online) Top and side views of optimized atomic structures of various SiNW(N)’s. (a) Bare SiNWs; (b) H-SiNWs; (c) single TM atom doped per primitive cell of H-SiNW (n = 1); (d) H-SiNWs covered by n TM atoms corresponding to n > 1. Ec, E′c, Eb, E′b, EG, and µ, respectively, denote the average cohesive energy relative to free Si atom, same relative to the bulk Si, binding energy of hydrogen atom relative to free H atom, same relative to H2 molecule, energy band gap and the net magnetic moment per primitive unit cell. Binding energies in regard to the adsorption of TM atoms, i.e. EB, E′B for n = 1 and average values EB, E′B for n > 1, are defined in the text and in Ref. [22]. The [001] direction is along the axis of SiNWs. Small, large-light and large-dark balls represent H, Si and TM atoms, respectively. Side views of atomic structure comprise two primitive unit cells of the SiNWs. Binding and cohesive energies are given in eV/atom.
Binding energies of adsorbed TM atoms relative to their bulk crystals (E′B) are negative and hence indicate endothermic reaction. Due to very low vapor pressure of many metals, it is probably better to use some metal-precursor to synthesize the structures predicted here. The band structures of HM nanowires are presented in Fig. 3. Once a Co atom is adsorbed above the center of a hexagon of Si atoms on the surface of H-SiNW(21) the spin degeneracy is split and the whole system becomes magnetic with a magnetic moment of µ=1 µB (Bohr magneton per primitive unit cell). Electronic energy bands become asymmetric for different spins: Bands of majority spins continue to be semiconducting with relatively smaller direct gap of EG=0.4 eV. In contrast, two bands of minority spins, which cross the Fermi level, become metallic. These metallic bands are composed of Co-3d and Si-3p hybridized states with higher Co contribution. The densities of majority and minority spin states, namely D(E, ↑) and D(E, ↓), display a 100% spin-polarization P = [D(EF, ↑) − D(EF, ↓)]/[D(EF, ↑) + D(EF, ↓)] at EF. Cr-doped H-SiNW(25) is also HM. The indirect gap of majority spin bands is reduced to 0.5 eV. On the other hand, two bands constructed from Cr-3d and Si-3p hybridized states cross the Fermi level and hence attribute metallicity to the minority spin bands. Similarly, Cr-doped H-SiNW(57) is also HM. The large direct band gap of undoped H-SiNW(57) is modified to be indirect and is reduced to 0.9 eV for majority spin bands. The minimum of the unoccupied conduction band occurs above but close to the Fermi level. Two bands formed by Cr-3d and Si-3p hybridized states cross the Fermi level. The net magnetic moment is 4 µB. Using PAW potential results, we estimated Curie temperature of half-metallic H-SiNW+TMs as 8, 287, and 709 K for N=21, 25, and 57, respectively.
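The spin polarization P defined above depends only on the two spin-resolved densities of states at EF. As a minimal sketch (the function name and the numerical inputs are illustrative assumptions, not values computed in this work), it can be evaluated as:

```python
def spin_polarization(dos_up: float, dos_down: float) -> float:
    """P = [D(E_F, up) - D(E_F, down)] / [D(E_F, up) + D(E_F, down)]."""
    total = dos_up + dos_down
    if total == 0.0:
        # No states at E_F in either channel: the system is an insulator
        # and P is undefined.
        raise ValueError("no states at the Fermi level for either spin")
    return (dos_up - dos_down) / total


# Half-metal: the majority channel is gapped at E_F, the minority channel is
# metallic, so the magnitude of P is 1 (100% spin polarization).
p_half_metal = spin_polarization(0.0, 2.5)
# Ordinary non-magnetic metal: equal densities in both channels give P = 0.
p_normal_metal = spin_polarization(1.0, 1.0)
```

With this sign convention a half-metal whose minority bands cross EF gives P = −1; the 100% figure quoted in the text refers to the magnitude.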
The well-known fact that density functional theory underestimates the band gap, EG, does not concern the present HM states, since H-SiNWs are already verified to be semiconductors experimentally [1] and upon TM-doping they are predicted to remain semiconductors for one spin direction. In fact, band gaps predicted here are in fair agreement with experiment and theory. As for partially filled metallic bands of the opposite spin, they are properly represented. Under uniaxial compressive strain the minimum of the conduction band of majority spin states rises above the Fermi level. Conversely, it becomes semi-metallic under uniaxial tensile strain. Since conduction and valence bands of both H-SiNW(21)+Co and H-SiNW(25)+Cr are away from EF, their HM behavior is robust under uniaxial strain. Also the effect of spin-orbit coupling is very small and cannot destroy HM properties [12]. The form of two metallic bands crossing the Fermi level eliminates the possibility of Peierls distortion. On the other hand, the HM ground state of SiNWs is not common to all TM doping. For example H-SiNW(N)+Fe is consistently a ferromagnetic semiconductor with different EG,↑ and EG,↓. H-SiNW(N)+Mn(Cr) can be either a ferromagnetic metal or HM depending on N.

FIG. 3: (Color online) Band structure and spin-dependent total density of states (TDOS) for N=21, 25 and 57. Left panels: Semiconducting H-SiNW(N). Middle panels: Half-metallic H-SiNW(N)+TM. Right panels: Density of majority and minority spin states of H-SiNW(N)+TM. Bands described by continuous and dotted lines are majority and minority bands. Zero of energy is set to EF.

FIG. 4: (Color online) D(E, ↓), density of minority (light) and D(E, ↑), majority (dark) spin states. (a) H-SiNW(25)+Cr, n = 8; (b) H-SiNW(25)+Cr, n = 16. P and µ indicate spin-polarization and net magnetic moment (in Bohr magnetons per primitive unit cell), respectively.
To see whether spin-dependent GGA properly represents localized d-electrons and hence whether a possible on-site repulsive Coulomb interaction destroys the HM, we also carried out LDA+U calculations [23]. We found that insulating and metallic bands of opposite spins coexist up to high values of repulsive energy (U = 4) for N=25. For N=57, HM persists until U∼1. Clearly, the HM character of TM doped H-SiNW revealed in Fig. 3 is a robust and unique behavior. Finally, we note that the HM state predicted in TM-doped H-SiNWs occurs in perfect structures; complete spin-polarization may deviate slightly from P=100% due to the finite extent of devices. Even if the exact HM character corresponding to n = 1 is disturbed for n > 1, the possibility that some H-SiNWs have high spin-polarization at EF at high TM coverage can be relevant for spintronic applications. We therefore investigated electronic and magnetic structure of the above TM-doped H-SiNWs at n > 1 as described in Fig. 2(d). Figure 4 presents the calculated density of minority and majority spin states of Cr covered H-SiNWs. It is found that H-SiNW(21) covered by Co is nonmagnetic for both coverages of n = 4 and 12. H-SiNW(25) is, however, ferromagnetic for different levels of Cr coverage and has a high net magnetic moment. For example, n = 8 can be achieved by two different geometries; both geometries are ferromagnetic with µ=19.6 and 32.3 µB and are metallic for both spin directions. Interestingly, while P is negligible for the former geometry, the latter one has P = 0.84 and hence is suitable for spintronic applications (see Fig. 4). Similarly, Cr covered H-SiNW(57) with n = 8 and 16 are both ferromagnetic with µ= 34.3 (P =56) and µ=54.5 µB (P =0.33), respectively. The latter nanostructure, having a magnetic moment as high as 54.5 µB, can be a potential nanomagnet.
Clearly, not only the total magnetic moment, but also the spin polarization at EF of TM covered H-SiNWs exhibits interesting variations depending on n, N and the type of TM atom. In conclusion, hydrogen passivated SiNWs can exhibit a half-metallic state when doped with certain TM atoms. Resulting electronic and magnetic properties depend on the type of dopant TM atom, as well as on the diameter of the nanowire. As a result of TM-3d and Si-3p hybridization two new bands of one type of spin direction are located in the band gap, while the bands of the other spin-direction remain semiconducting. Electronic properties of these nanowires depend on the type of dopant TM atoms, as well as on the diameter of the H-SiNW. When covered with more TM atoms, the perfect half-metallic state of H-SiNW is disturbed, but for certain cases, the spin polarization at EF continues to be high. The high magnetic moment obtained at high TM coverage is another remarkable result which may lead to the fabrication of nanomagnets for various applications. Briefly, functionalizing silicon nanowires with TM atoms presents us a wide range of interesting properties, such as half-metals, 1D ferromagnetic semiconductors or metals and nanomagnets. We believe that our findings hold promise for the use of silicon, a unique material of microelectronics, in nanospintronics including magnetoresistance, spin-valve and non-volatile memories. ∗ Electronic address: ciraci@fen.bilkent.edu.tr [1] D. D. D. Ma et al., Science 299, 1874 (2003). [2] Y. Cui, Z. Zhong, D. Wang, W. U. Wang and C. M. Lieber, Nano Lett. 3, 149 (2003). [3] Y. Huang, X. F. Duan and C. M. Lieber, Small 1, 142 (2005). [4] X. F. Duan, Y. Huang, R. Agarwal and C. M. Lieber, Nature (London) 421, 241 (2003). [5] R.A. de Groot et al., Phys. Rev. Lett. 50, 2024 (1983). [6] W.E. Pickett and J. S. Moodera, Phys. Today 54, 39 (2001). [7] J.-H. Park et al., Nature (London) 392, 794 (1998). [8] W.H. Wu, J.C. Tsai and J.L. Chen, Appl. Phys. Lett. 90, 043121 (2007).
[9] M.C. Qian et al., Phys. Rev. Lett. 96, 027211 (2006). [10] Y-W Son, M.L. Cohen and S.G. Louie, Nature 444, (2006); Phys. Rev. Lett. 97, 216803 (2006). [11] C. Yang, J. Zhao and J.P. Lu, Nano. Lett. 4, 561 (2004); Y. Yagi et al., Phys. Rev. B 69, 075414 (2004). [12] S. Dag et al., Phys. Rev. B 72, 155444 (2005). [13] E. Durgun et al., Europhys. Lett. 73, 642 (2006). [14] Numerical computations have been carried out by using VASP software: G. Kresse, J. Hafner, Phys Rev. B 47, R558 (1993). Calculations of charge transfer, orbital contribution and local magnetic moments have been repeated by SIESTA code using local basis set, P. Ordejon, E. Artacho and J.M. Soler, Phys. Rev. B 53, R10441 (1996). [15] J. P. Perdew et al., Phys. Rev. B 46, 6671 (1992). [16] D. Vanderbilt, Phys. Rev. B 41, R7892 (1990). [17] P.E. Bloechl, Phys. Rev. B 50, 17953 (1994). [18] All structures have been treated within supercell geometry using the periodic boundary conditions with lattice constants of a and b ranging from 20 Å to 25 Å depending on the diameter of the SiNW and c = co (co being the optimized lattice constant of SiNW along the wire axis). Some of the calculations have been carried out in double and quadruple primitive unit cells of SiNW by taking c = 2co and c = 4co, respectively. In the self-consistent potential and total energy calculations the Brillouin zone is sampled in the k-space within Monkhorst-Pack scheme [H.J. Monkhorst and J.D. Pack, Phys. Rev. B 13, 5188 (1976)] by (1x1x15) mesh points. [19] Numerous theoretical studies on SiNW have been published in recent years. See for example: A. K. Singh et al., Nano. Lett. 6, 920 (2006); Q. Wang et al., Phys. Rev. Lett. 95, 167200 (2005); Nano Lett. 5, 1587 (2005). [20] Spin-polarized calculations have been carried out by relaxing the magnetic moment and by starting with different initial µ values.
Whether an antiferromagnetic ground state exists in H-SiNW(N)+TM has been explored by using a supercell including double primitive cells.
[21] E. Durgun et al., Phys. Rev. B 67, 201401(R) (2003); J. Phys. Chem. B 108, 575 (2004).
[22] The binding energy corresponding to n = 1 is calculated by the expression EB = ET[H-SiNW(N)] + ET[TM] − ET[H-SiNW(N)+TM], in terms of the total energy of the optimized H-SiNW(N) and of the TM-doped H-SiNW(N) (i.e. H-SiNW(N)+TM), and the total energy of the string of TM atoms having the same lattice parameter co as H-SiNW(N)+TM, all calculated in the same superlattice. Hence EB can be taken as the binding energy of a single isolated TM atom, since the coupling among adsorbed TM atoms has been excluded. To calculate E′B, ET(TM) is taken as the total energy of the bulk TM crystal per atom. For n > 1, ET(TM) is taken as the free TM atom energy, and hence EB includes the coupling between TM atoms. For this reason E B > 0 for H-SiNW(21)+Co at
[23] S. L. Dudarev et al., Phys. Rev. B 57, 1505 (1998).

ABSTRACT
From first-principles calculations, we predict that transition metal (TM) atom doped silicon nanowires have a half-metallic ground state. They are insulators for one spin direction, but show metallic properties for the opposite spin direction. At high coverage of TM atoms, ferromagnetic silicon nanowires become metallic for both spin directions with a high magnetic moment, and may also have significant spin polarization at the Fermi level. The spin-dependent electronic properties can be engineered by changing the type of dopant TM atoms, as well as the diameter of the nanowire. The present results are not only of scientific interest, but can also initiate new research on spintronic applications of silicon nanowires.
<|endoftext|><|startoftext|> Introduction

Noncommutative geometry (NCG) (à la Connes, see [2]) and the $C^*$-algebraic theory of quantum groups (see, for example, [11], [10]) are two well-developed mathematical areas which share the basic idea of ‘noncommutative mathematics’, namely, to view a general (noncommutative) $C^*$-algebra as a noncommutative analogue of a topological space, equipped with additional structures resembling and generalizing those in the classical (commutative) situation, e.g. a manifold or Lie group structure. A lot of fruitful interaction between these two areas is thus to be expected. However, such interaction was not very common until recently, when a systematic effort by a number of mathematicians to understand $C^*$-algebraic quantum groups as noncommutative manifolds in the sense of Connes triggered a rapid and interesting development in this direction. Quite surprisingly, this effort met with a number of obstacles even in the case of the simplest non-classical quantum group, namely $SU_q(2)$, and it was not clear for some time whether this (and other standard examples of quantum groups) could be fitted nicely into the framework of Connes’ NCG (see [6] and the discussion and references therein). The problem of finding a nontrivial equivariant spectral triple for $SU_q(2)$ was finally settled in the affirmative in the paper by Chakraborty and Pal ([4]; see also [3] and [5] for subsequent developments), which increased the hope for a happy marriage between NCG and quantum group theory. However, even in the case of $SU_q(2)$, a few puzzling questions remain to be answered. One of them is the issue of invariance of the Chern character, which we addressed in [7], where we attempted to suggest a solution through a twisted version of the entire cyclic cohomology theory, building on the ideas of [8].
In that paper, we also made an attempt to study the connection between twisted and conventional NCG, following a comment in [3]. The present article is a follow-up of [7], and we mainly concentrate on $SU_q(2)$, considering it as a test case for comparing the twisted and conventional formulations of NCG.

2 Notation and background

Let $A = SU_q(2)$ (with $0 < q < 1$) denote the $C^*$-algebra generated by two elements $\alpha, \beta$ satisfying
$$\alpha^*\alpha + \beta^*\beta = I, \quad \alpha\alpha^* + q^2\beta\beta^* = I, \quad \alpha\beta - q\beta\alpha = 0, \quad \alpha\beta^* - q\beta^*\alpha = 0, \quad \beta^*\beta = \beta\beta^*.$$
We also denote the $*$-algebra generated by $\alpha$ and $\beta$ (without taking the norm completion) by $A^\infty$. There is a Hopf $*$-algebra structure on $A^\infty$, as can be seen from, for example, [10]. We denote the canonical coproduct on $A^\infty$ by $\Delta$. We shall also use the so-called Sweedler convention, which we briefly explain now. For $a \in A^\infty$, there are finitely many elements $a^{(1)}_i, a^{(2)}_i$, $i = 1, 2, \dots, p$ (say), such that $\Delta(a) = \sum_i a^{(1)}_i \otimes a^{(2)}_i$. For notational convenience, we abbreviate this as $\Delta(a) = a^{(1)} \otimes a^{(2)}$. For any positive integer $m$, let $A^\infty_m$ be the $m$-fold algebraic tensor product of $A^\infty$. There is a natural coaction of $A^\infty$ on $A^\infty_m$ given by
$$\Delta^m_A(a_1 \otimes a_2 \otimes \dots \otimes a_m) := (a_1^{(1)} \otimes \dots \otimes a_m^{(1)}) \otimes (a_1^{(2)} \cdots a_m^{(2)}),$$
using the Sweedler notation, with summation being implied.

Let us recall the convolution $*$ defined in [7]. If $\phi : A^\infty_m \to \mathbb{C}$ is an $m$-linear functional, and $\psi : A^\infty \to \mathbb{C}$ is a linear functional, we define their convolution $\phi * \psi : A^\infty_m \to \mathbb{C}$ by
$$(\phi * \psi)(a_1 \otimes \dots \otimes a_m) := \phi(a_1^{(1)} \otimes \dots \otimes a_m^{(1)})\, \psi(a_1^{(2)} \cdots a_m^{(2)}),$$
using the Sweedler convention. We say that an $m$-linear functional $\phi$ is invariant if $\phi * \psi = \psi(1)\phi$ for every functional $\psi$ on $A^\infty$.

In [9], the K-homology $K^*(A^\infty)$ has been explicitly computed. It has been shown there that $K^0(A^\infty) = K^1(A^\infty) = \mathbb{Z}$, and the Chern characters (in cyclic cohomology) of the generators of these K-homology groups, denoted by $[\tau_{ev}]$ and $[\tau_{odd}]$ respectively, are also explicitly written down.
3 Main results

3.1 Chern characters are not invariant

In this subsection, we give detailed arguments for a remark made in [7] about the impossibility of having an invariant Chern character for $A^\infty$ under the conventional (non-twisted) framework of NCG. To make the notion of invariance precise, we give the following definition (motivated by a comment by G. Landi, which is gratefully acknowledged).

Definition 3.1 We say that a class $[\phi] \in HC^n(A^\infty)$ is invariant if there is an invariant $(n+1)$-linear functional $\phi'$ such that $\phi'$ is a cyclic cocycle and $\phi' \sim \phi$ (i.e. $[\phi] = [\phi']$).

It is easy to see that the Chern character $[\tau_{ev}]$ cannot be invariant. Had it been so, it would follow from the uniqueness of the Haar state (say $h$) on $SU_q(2)$ that $\tau_{ev}$ must be a scalar multiple of $h$. Since $\tau_{ev}$ is a nonzero trace, this would imply that $h$ is a trace too. But it is known (see [10]) that $h$ is not a trace. However, proving that $[\tau_{odd}]$ is not invariant requires a somewhat more detailed argument. We begin with the following observation.

Lemma 3.2 If $\tau$ is a trace on $A^\infty$, i.e. $\tau \in HC^0(A^\infty)$, then we have that $(\partial\xi) * \tau = \partial(\xi * \tau)$ for every functional $\xi$ on $A^\infty$, where the Hochschild coboundary operator $\partial$ is defined by $(\partial\xi)(a, b) = \xi(ab) - \xi(ba)$.

Proof: We shall use the Sweedler notation, together with the fact that $\Delta$ is an algebra homomorphism. For $a_0, a_1 \in A^\infty$ we have
$$(\partial\xi * \tau)(a_0, a_1) = (\partial\xi)(a_0^{(1)} \otimes a_1^{(1)})\, \tau(a_0^{(2)} a_1^{(2)})$$
$$= \xi(a_0^{(1)} a_1^{(1)})\, \tau(a_0^{(2)} a_1^{(2)}) - \xi(a_1^{(1)} a_0^{(1)})\, \tau(a_0^{(2)} a_1^{(2)})$$
$$= \xi(a_0^{(1)} a_1^{(1)})\, \tau(a_0^{(2)} a_1^{(2)}) - \xi(a_1^{(1)} a_0^{(1)})\, \tau(a_1^{(2)} a_0^{(2)}) \quad \text{(since } \tau \text{ is a trace)}$$
$$= (\xi * \tau)(a_0 a_1) - (\xi * \tau)(a_1 a_0) = \partial(\xi * \tau)(a_0, a_1).$$

The above lemma allows us to define the multiplication $*$ at the level of cohomology classes. More precisely, for $[\phi] \in HC^1(A^\infty)$ and $[\eta] \in HC^0(A^\infty)$, we set $[\phi] * [\eta] := [\phi * \eta] \in HC^1(A^\infty)$, which is well defined by Lemma 3.2. Similarly, $[\eta] * [\phi]$ and $[\eta] * [\eta']$ (where $[\eta'] \in HC^0(A^\infty)$) can be defined. We now recall from [9] that $[\tau_{ev}] * [\tau_{ev}] = [\tau_{ev}]$ and $[\tau_{ev}] * [\tau_{odd}] = [\tau_{odd}] * [\tau_{ev}] = 0$. We also note that $\tau_{ev}(1) = 1$ and that $\tau_{ev}$ is a trace, i.e. $\tau_{ev}(ab) = \tau_{ev}(ba)$.
Using this observation, we are now in a position to prove that the Chern character of the generator of $K^1(A^\infty)$ is not an invariant class.

Theorem 3.3 $[\tau_{odd}]$ is not invariant.

Proof: Suppose that there is $\phi \sim \tau_{odd}$ such that $\phi$ is invariant. Then we have $[\phi * \tau_{ev}] = [\phi] * [\tau_{ev}] = [\tau_{odd}] * [\tau_{ev}] = 0$. However, since $\phi * \tau_{ev} = \tau_{ev}(1)\phi = \phi$ by the invariance of $\phi$, it follows that $[\phi] = [\phi * \tau_{ev}] = 0$, that is, $[\tau_{odd}] = 0$, which is a contradiction.

3.2 Nontrivial pairing with the twisted Chern character

As already mentioned in the introduction, in [7] we made an attempt to recover the desirable property of invariance by departing from conventional NCG and using twisted entire cyclic cohomology. We briefly recall here some of the basic concepts from that paper and refer the reader to [7] and the references therein for more details of this approach. We shall use the results derived in that paper without always giving a specific reference.

Let us give the definition of twisted entire cyclic cohomology for Banach algebras for simplicity, but note that the theory extends to locally convex algebras, which we actually need. The extension to the locally convex algebra case follows exactly as remarked in [1, page 370]. So, let $A$ be a unital Banach algebra, with $\|\cdot\|_*$ denoting its Banach norm, and let $\sigma$ be a continuous automorphism of $A$, $\sigma(1) = 1$. For $n \geq 0$, let $C^n$ be the space of continuous $(n+1)$-linear functionals $\phi$ on $A$ which are $\sigma$-invariant, i.e. $\phi(\sigma(a_0), \dots, \sigma(a_n)) = \phi(a_0, \dots, a_n)$ for all $a_0, \dots, a_n \in A$; and $C^n = \{0\}$ for $n < 0$. We define linear maps $T_n, N_n : C^n \to C^n$, $U_n : C^n \to C^{n-1}$ and $V_n : C^n \to C^{n+1}$ by
$$(T_n f)(a_0, \dots, a_n) = (-1)^n f(\sigma(a_n), a_0, \dots, a_{n-1}), \qquad N_n = \sum_{j=0}^{n} T_n^j,$$
$$(U_n f)(a_0, \dots, a_{n-1}) = (-1)^n f(a_0, \dots, a_{n-1}, 1),$$
$$(V_n f)(a_0, \dots, a_{n+1}) = (-1)^{n+1} f(\sigma(a_{n+1})a_0, a_1, \dots, a_n).$$
Let
$$B_n = N_{n-1} U_n (T_n - I), \qquad b_n = \sum_{j=0}^{n+1} T_{n+1}^{-j-1} V_n T_n^{j}.$$
Let $B, b$ be maps on the complex $C \equiv (C^n)_n$ given by $B|_{C^n} = B_n$, $b|_{C^n} = b_n$.
It is easy to verify (similarly to what is done for the untwisted case, e.g. in [2]) that $B^2 = 0$, $b^2 = 0$ and $Bb = -bB$, so that we get a bicomplex $(C^{n,m} \equiv C^{n-m})$ with differentials $d_1, d_2$ given by
$$d_1 = (n - m + 1)\, b : C^{n,m} \to C^{n+1,m}, \qquad d_2 = \frac{1}{n-m}\, B : C^{n,m} \to C^{n,m+1}.$$
Furthermore, let $C^e = \{(\phi_{2n})_{n \in \mathbb{N}} : \phi_{2n} \in C^{2n}\ \forall n \in \mathbb{N}\}$ and $C^o = \{(\phi_{2n+1})_{n \in \mathbb{N}} : \phi_{2n+1} \in C^{2n+1}\ \forall n \in \mathbb{N}\}$. We say that an element $\phi = (\phi_{2n})$ of $C^e$ is a $\sigma$-twisted even entire cochain if the radius of convergence of the complex power series $\sum_n \|\phi_{2n}\| \frac{z^n}{n!}$ is infinity, where $\|\phi_{2n}\| := \sup_{\|a_j\|_* \leq 1} |\phi_{2n}(a_0, \dots, a_{2n})|$. Similarly we define $\sigma$-twisted odd entire cochains, and let $C^e_\epsilon(A, \sigma)$ ($C^o_\epsilon(A, \sigma)$ respectively) denote the set of $\sigma$-twisted even (respectively odd) entire cochains. Let $\tilde\partial = d_1 + d_2$, and we have the short complex
$$C^e_\epsilon(A, \sigma) \underset{\tilde\partial}{\overset{\tilde\partial}{\rightleftarrows}} C^o_\epsilon(A, \sigma).$$
We call the cohomology of this complex the $\sigma$-twisted entire cyclic cohomology of $A$ and denote it by $H^*_\epsilon(A, \sigma)$.

Let $A_\sigma = \{a \in A : \sigma(a) = a\}$ be the fixed point subalgebra for the automorphism $\sigma$. There is a canonical pairing $\langle \cdot, \cdot \rangle_{\sigma,\epsilon} : K_*(A_\sigma) \times H^*_\epsilon(A, \sigma) \to \mathbb{C}$. We shall need the pairing for the odd case, which we write down:
$$\langle [u], [\psi] \rangle \equiv \langle [u], [\psi] \rangle_{\sigma,\epsilon} = \sum_{n=0}^{\infty} (-1)^n \frac{n!}{(2n+1)!}\, \psi_{2n+1}(u^{-1}, u, \dots, u^{-1}, u),$$
where $[u] \in K_1(A_\sigma)$ and $[\psi] \in H^1_\epsilon(A, \sigma)$.

Definition 3.4 Let $H$ be a separable Hilbert space, $A^\infty$ be a $*$-subalgebra (not necessarily complete) of $B(H)$, $R$ be a positive (possibly unbounded) operator on $H$, and $D$ be a self-adjoint operator in $H$ with compact resolvents, such that the following hold:
(i) $[D, a] \in B(H)$ for all $a \in A^\infty$,
(ii) $R$ commutes with $D$,
(iii) for any real number $s$ and $a \in A^\infty$, $\sigma_s(a) := R^{-s} a R^{s}$ is bounded and belongs to $A^\infty$. Furthermore, for any positive integer $n$, $\sup_{s \in [-n,n]} \|\sigma_s(a)\| < \infty$.
Then we call the quadruple $(A^\infty, H, D, R)$ an odd $R$-twisted spectral data. We say that the odd twisted spectral data is $\Theta$-summable if $R e^{-tD^2}$ is trace-class for all $t > 0$.

Let us now recall the construction of the twisted Chern character from a given odd twisted spectral data $(A^\infty, H, D, R)$.
Let $\mathcal{B}$ denote the set of all $A \in B(H)$ for which $\sigma_s(A) := R^{-s} A R^{s} \in B(H)$ for all real numbers $s$, $[D, A] \in B(H)$, and $s \mapsto \|\sigma_s(A)\|$ is bounded over compact subsets of the real line. In particular, $A^\infty \subseteq \mathcal{B}$. We define for $n \in \mathbb{N}$ an $(n+1)$-linear functional $F_n$ on $\mathcal{B}$ by the formula
$$F_n(A_0, \dots, A_n) = \int_{\Sigma_n} \mathrm{Tr}\left(A_0 e^{-t_0 D^2} A_1 e^{-t_1 D^2} \cdots A_n e^{-t_n D^2} R\right) dt_0 \cdots dt_n,$$
where $\Sigma_n = \{(t_0, \dots, t_n) : t_i \geq 0,\ \sum_{i=0}^{n} t_i = 1\}$.

Let us now equip $A^\infty$ with the locally convex topology given by the family of Banach norms $\|\cdot\|_{*,n}$, $n = 1, 2, \dots$, where $\|a\|_{*,n} := \sup_{s \in [-n,n]} (\|\sigma_s(a)\| + \|[D, \sigma_s(a)]\|)$. Let $A$ denote the completion of $A^\infty$ under this topology; thus $A$ is a Fréchet space. We can construct the (twisted) Chern character in $H^o_\epsilon(A, \sigma)$, where $\sigma = \sigma_1$, which extends to the whole of $A$ by continuity.

Theorem 3.5 Let $\phi^o \equiv (\phi_{2n+1})_n$ be defined by
$$\phi_{2n+1}(a_0, \dots, a_{2n+1}) = 2i\, F_{2n+1}(a_0, [D, a_1], \dots, [D, a_{2n+1}]), \quad a_i \in A.$$
Then we have $(b + B)\phi^o = 0$, hence $\psi^o \equiv ((2n+1)!\, \phi_{2n+1})_n \in H^o_\epsilon(A, \sigma)$.

We shall also need some results from the theory of semifinite spectral triples and the corresponding JLO cocycles and index formula, as discussed in, for example, [1]. An odd semifinite spectral triple is given by $(\mathcal{C}, \mathcal{N}, \mathcal{K}, D)$, where $\mathcal{K}$ is a separable Hilbert space, $\mathcal{N} \subseteq B(\mathcal{K})$ is a von Neumann algebra with a faithful semifinite normal trace (say $\tau$), $D$ is a self-adjoint operator affiliated to $\mathcal{N}$, and $\mathcal{C}$ is a $*$-subalgebra of $B(\mathcal{K})$ such that $[D, c] \in B(\mathcal{K})$ for all $c \in \mathcal{C}$. In the terminology of [1], $(\mathcal{N}, D)$ is also called an odd, unbounded Breuer-Fredholm module for the norm-closure of $\mathcal{C}$. It is called $\Theta$-summable if $\tau(e^{-tD^2}) < \infty$ for all $t > 0$. For a $\Theta$-summable semifinite spectral triple, there is a canonical construction of the JLO cocycle and index theorem (see [1]), which are very similar to their counterparts in the conventional framework of NCG.

Let us now settle in the affirmative the conjecture made in [7] about the nontriviality of the twisted Chern character of a natural twisted spectral data obtained from the equivariant spectral triple of [4].
For the reader’s convenience, we briefly recall the construction of this equivariant spectral triple. Let us index the space of irreducible (co-)representations of $SU_q(2)$ by half-integers, i.e. $n = 0, \tfrac12, 1, \dots$; and index the orthonormal basis of the corresponding $(2n+1)^2$-dimensional subspace of $L^2(SU_q(2), h)$ by $i, j = -n, \dots, n$, instead of $1, 2, \dots, (2n+1)$. Thus, let us consider the orthonormal basis $e^n_{i,j}$, $n = 0, \tfrac12, 1, \dots$; $i, j = -n, -n+1, \dots, n$, in the notation of [4]. We consider any of the equivariant spectral triples constructed by the authors of [4], and in the associated Hilbert space $H = L^2(SU_q(2), h)$ define the following positive unbounded operator $R$:
$$R(e^n_{i,j}) = q^{-2i-2j} e^n_{i,j}, \quad n = 0, \tfrac12, 1, \dots;\ i, j = -n, -n+1, \dots, n.$$
Let us choose a spectral triple given by the Dirac operator $D$ on $H$, defined by $D(e^n_{i,j}) = d(n, i)\, e^n_{i,j}$, where the $d(n, i)$ are as in (3.12) of [4], i.e. $d(n, i) = 2n + 1$ if $n = i$, and $d(n, i) = -(2n+1)$ otherwise. It can easily be seen that $(A^\infty, H, D, R)$ is an odd $R$-twisted spectral data and, furthermore, that the fixed point subalgebra $SU_q(2)_\sigma$ for $\sigma(\cdot) = R^{-1} \cdot R$ is the unital $*$-algebra generated by $\beta$, so it contains $u = \chi_{\{1\}}(\beta^*\beta)(\beta - I) + I$, which can be chosen to be a generator of $K_1(SU_q(2)) = \mathbb{Z}$ (see [4]). It is easily seen that the map from $K_1(C^*(u))$ to $K_1(SU_q(2))$ induced by the inclusion map is an isomorphism of the $K_1$-groups (where $C^*(u)$ denotes the unital $C^*$-algebra generated by $u$). Thus, we can consider the pairing of the twisted Chern character with $K_1(C^*(u))$, and in turn with $K_1(SU_q(2))$, using the isomorphism noted before. The important question raised in [7] is whether we recover the nontrivial pairing obtained in [4] in our twisted framework, and in what follows we shall give an affirmative answer to this question.
Theorem 3.6 The pairing between $K_1(SU_q(2)_\sigma) \cong K_1(SU_q(2))$ and the (twisted) Chern character of the above twisted spectral data coincides with the pairing between $K_1(SU_q(2))$ and the Chern character of the (non-twisted) spectral triple $(A^\infty, H, D)$. In particular, this pairing is nontrivial.

Proof: Let $\mathcal{N}$ be the von Neumann algebra in $B(H)$ generated by $\beta$ and $f(D)$ for all bounded measurable functions $f : \mathbb{R} \to \mathbb{C}$. Since $R$ commutes with both $\beta$ and $D$, it is easy to see that the functional $\mathcal{N} \ni X \mapsto \tau(X) := \mathrm{Tr}(XR)$ defines a faithful, normal, semifinite trace on the von Neumann algebra $\mathcal{N}$. Moreover, $(\mathcal{N}, D)$ is an unbounded $\Theta$-summable Breuer-Fredholm module for the norm-closure of the unital $*$-algebra (say $\mathcal{C}$) generated by $\beta$. It also follows from the fact that $R$ commutes with $D$ and $u$ that the pairing of $[u]$ with the twisted Chern character (say $\psi^o \equiv (\psi_{2n+1})$) coming from the twisted spectral data $(A^\infty, H, D, R)$ is given by
$$\langle [u], [\psi^o] \rangle = \sum_{n} (-1)^n \frac{n!}{(2n+1)!}\, \psi_{2n+1}(u^{-1}, u, \dots, u^{-1}, u)$$
$$= 2i \sum_{n} (-1)^n n! \int_{\Sigma_{2n+1}} \mathrm{Tr}\left(u^{-1} e^{-t_0 D^2} [D, u] e^{-t_1 D^2} \cdots [D, u] e^{-t_{2n+1} D^2} R\right) dt_0 \cdots dt_{2n+1}$$
$$= 2i \sum_{n} (-1)^n n! \int_{\Sigma_{2n+1}} \tau\left(u^{-1} e^{-t_0 D^2} [D, u] e^{-t_1 D^2} \cdots [D, u] e^{-t_{2n+1} D^2}\right) dt_0 \cdots dt_{2n+1},$$
which is nothing but the pairing between $[u] \in K_1(\mathcal{C})$ and the Breuer-Fredholm module $(\mathcal{N}, D)$ mentioned before. By Theorem 10.8 of [1] and a straightforward but somewhat lengthy calculation along the lines of the index computation in [4], we can show that the value of this pairing is equal to $-\mathrm{ind}_\tau(A) \equiv -(\tau(P_A) - \tau(Q_A))$ for the following operator $A : H_0 \to H_0$, where $H_0$ is the closed subspace spanned by $\{e^n_{n,j} : n = 0, \tfrac12, \dots;\ j = -n, -n+1, \dots, n\}$, $P_A, Q_A$ are the orthogonal projections onto the kernel of $A$ and the kernel of $A^*$ respectively, and $r$ is a positive integer such that q2r < 1 < q2r−2 : Aenn,j = −q(n+j)(2r+1)(1−q2(n−j))r(1−q2(n−j−1)) ,j− 1 +(1−q2r(n+j)(1−q2(n−j))r)enn,j.
It can be verified by computations as in [4] that $\mathrm{Ker}(A) = \{0\}$ and $\mathrm{Ker}(A^*)$ is the one-dimensional subspace spanned by the vector ξ = n,−n, where p 1 = 1 and for n ≥ 3 1− (1− q4n−2)r (1− q4n) 12 (1− q4n−2)r 1− (1− q2)r (1 − q4) 12 (1− q2)r Clearly, since $R\, e^n_{n,-n} = e^n_{n,-n}$, we have $R\xi = \xi$ and thus
$$-\mathrm{ind}_\tau(A) = \frac{1}{\|\xi\|^2}\, \tau(|\xi\rangle\langle\xi|) = \frac{1}{\|\xi\|^2}\, \mathrm{Tr}(R\, |\xi\rangle\langle\xi|) = 1,$$
which is the same as the value of the pairing between $[u] \in K_1(SU_q(2))$ and the conventional Chern character corresponding to the spectral triple constructed in [4]. ✷

Thus we see that both the conventional and twisted frameworks of NCG give essentially the same results for the example we considered, namely $SU_q(2)$. The twisted cyclic cohomology can be paired naturally only with the K-theory of the invariant subalgebra, and not with that of the whole algebra. This apparent weakness of twisted NCG does not seem to pose any essential difficulty for studying the noncommutative geometric aspects of $SU_q(2)$: by a suitable choice of the twisting operator $R$, as we made here, one can ensure that the K-theory of the corresponding invariant subalgebra is isomorphic to the K-theory of the whole algebra, and also that the pairing between the Chern character and the generator of the K-theory in the twisted framework is equal to the corresponding pairing in the ordinary (non-twisted) framework of NCG. It will be important and interesting to investigate whether a similar fact remains true for a larger class of quantum groups, and we hope to pursue this in the future.

References
[1] A. Carey and J. Phillips, Spectral flow in Fredholm modules, eta invariants and the JLO cocycle, K-Theory 31 (2004), no. 2, 135–194.
[2] A. Connes, Noncommutative Geometry, Academic Press (1994).
[3] A. Connes, Cyclic Cohomology, Quantum group Symmetries and the Local Index Formula for SUq(2), J. Inst. Math. Jussieu 3 (2004), no. 1, 17–68.
[4] P. S. Chakraborty and A. Pal, Equivariant spectral triples on the quantum SU(2) group, K-Theory 28 (2003), no.
2, 107–126.
[5] L. Dabrowski, G. Landi, A. Sitarz, W. van Suijlekom and J. C. Varilly, The Dirac operator on SUq(2), Commun. Math. Phys. 259 (2005), 729–759.
[6] D. Goswami, Some Noncommutative Geometric Aspects of SUq(2), preprint (math-ph/0108003).
[7] D. Goswami, Twisted entire cyclic cohomology, J-L-O cocycles and equivariant spectral triples, Rev. Math. Phys. 16 (2004), no. 5, 583–602.
[8] J. Kustermans, G. J. Murphy and L. Tuset, Differential Calculi over Quantum Groups and Twisted Cyclic Cocycles, J. Geom. Phys. 44 (2003), no. 4, 570–594.
[9] T. Masuda, Y. Nakagami and J. Watanabe, Noncommutative Differential Geometry on the Quantum SU(2), I: An Algebraic Viewpoint, K-Theory 4 (1990), 157–180.
[10] S. L. Woronowicz, Twisted SU(2)-group: an example of a non-commutative differential calculus, Publ. R. I. M. S. (Kyoto Univ.) 23 (1987), 117–181.
[11] S. L. Woronowicz, Compact matrix pseudogroups, Commun. Math. Phys. 111 (1987), no. 4, 613–665.

ABSTRACT
We give details of the proof of the remark made in [7] that the Chern characters of the canonical generators of the K-homology of the quantum group $SU_q(2)$ are not invariant under the natural $SU_q(2)$ coaction. Furthermore, the conjecture made in [7] about the nontriviality of the twisted Chern character coming from an odd equivariant spectral triple on $SU_q(2)$ is settled in the affirmative.
<|endoftext|><|startoftext|> Placeholder Substructures III: A Bit-String-Driven “Recipe Theory” for Infinite-Dimensional Zero-Divisor Spaces

Robert P. C. de Marrais ∗
Thothic Technology Partners, P.O.Box 3083, Plymouth MA 02361
October 29, 2018

Abstract
Zero-divisors (ZDs) derived by the Cayley-Dickson Process (CDP) from N-dimensional hypercomplex numbers (N a power of 2, and at least 4) can represent singularities and, as N → ∞, fractals – and thereby, scale-free networks.
Any integer > 8 and not a power of 2 generates a meta-fractal or Sky when it is interpreted as the strut constant (S) of an ensemble of octahedral vertex figures called Box-Kites (the fundamental ZD building blocks). Remarkably simple bit-manipulation rules or recipes provide tools for transforming one fractal genus into others within the context of Wolfram’s Class 4 complexity.

1 The Argument So Far

In Parts I[1] and II[2], the basic facts concerning zero-divisors (ZDs) as they arise in the hypercomplex context were presented and proved. “Basic,” in the context of this monograph, means seven things. First, they emerged as a side-effect of applying CDP a minimum of 4 times to the Real Number Line, doubling dimension to the Complex Plane, Quaternion 4-Space, Octonion 8-Space, and 16-D Sedenions. With each such doubling, new properties were found: as the price of sacrificing counting order, the Imaginaries made a general theory of equations and solution-spaces possible; the non-commutative nature of Quaternions mapped onto the realities of the manner in which forces deploy in the real world, and led to vector calculus; the non-associative nature of Octonions, meanwhile, has only come into its own with the need for necessarily unobservable quantities (because of conformal field-theoretical constraints) in String Theory. In the Sedenions, however, the most basic assumptions of all – well-defined notions of field and algebraic norm (and, therefore, measurement) – break down, as the phenomena correlated with their absence, zero-divisors, appear onstage (never to leave it for all higher CDP dimension-doublings).

∗Email address: rdemarrais@alum.mit.edu

Second thing: ZDs require at least two differently-indexed imaginary units to be defined, the index being an integer larger than 0 (the CDP index of the Real Unit) and less than 2^N for a given CDP-generated collection of 2^N-ions.
In “pure CDP,” the enormous number of alternative labeling schemes possible in any given 2^N-ion level is drastically reduced by assuming that units with such indices interact by XOR-ing: the index of the product of any two is the XOR of their indices. Signing is more tricky; but, when CDP is reduced to a 2-rule construction kit, it becomes easy: for index u < G, G the Generator of the 2^N-ions (i.e., the power of 2 immediately larger than the highest index of the predecessor 2^(N−1)-ions), Rule 1 says i_u · i_G = +i_(u+G). Rule 2 says take an associative triplet (a,b,c), assumed written in CPO (short for “cyclically positive order”: to wit, a · b = +c, b · c = +a, and c · a = +b). Consider, for instance, any (u,G,G+u) index set. Then three more such associative triplets (henceforth, trips) can be generated by adding G to two of the three, then switching their resultants’ places in the CPO scheme. Hence, starting with the Quaternions’ (1,2,3) (which we’ll call a Rule 0 trip, as it’s inherited from a prior level of CDP induction), Rule 1 gives us the trips (1,4,5), (2,4,6), and (3,4,7), while Rule 2 yields up the other 3 trips defining the Octonions: (1,7,6), (2,5,7), and (3,6,5). Any ZD in a given level of 2^N-ions will then have units with one index < G, written in lowercase, and the other index > G, written in uppercase. Such pairs, alternately called “dyads” or “Assessors,” saturate the diagonal lines of their planes, which diagonals never mutually zero-divide each other (or make DMZs, for “divisors (or dyads) making zero”), but only make DMZs with other such diagonals, in other such Assessors. (This is, of course, the opposite situation from the projection operators of quantum mechanics, which are diagonals in the planes formed by Reals and dimensions spanned by Pauli spin operators contained within the 4-space created by the Cartesian product of two standard imaginaries.)
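The two-rule construction kit just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the monograph: the function names `cpo_trips` and `canonical` are hypothetical, and it assumes that Rule 2 is applied to each inherited (Rule 0) trip, which is the reading that reproduces the Octonion trips listed above.

```python
def cpo_trips(rule0_trips, G):
    """Trips (in cyclically positive order) for the CDP level with generator G.

    rule0_trips: trips inherited from the previous level, all indices < G.
    Assumption: Rule 2 is applied to each inherited trip.
    """
    trips = list(rule0_trips)                      # Rule 0: inherited as-is
    trips += [(u, G, G + u) for u in range(1, G)]  # Rule 1: i_u * i_G = +i_(u+G)
    for a, b, c in rule0_trips:
        # Rule 2: add G to two of the three indices, then swap those two.
        trips += [(a, c + G, b + G), (c + G, b, a + G), (b + G, a + G, c)]
    return trips


def canonical(trip):
    """Rotate a cyclic triple so that its smallest index comes first."""
    k = trip.index(min(trip))
    return trip[k:] + trip[:k]


# Octonions: generator G = 4, starting from the Quaternion trip (1, 2, 3).
octonion_trips = [canonical(t) for t in cpo_trips([(1, 2, 3)], 4)]

# Sedenions: generator G = 8, starting from the seven Octonion trips.
sedenion_trips = [canonical(t) for t in cpo_trips(octonion_trips, 8)]
```

Under these assumptions every generated trip (a, b, c) satisfies a XOR b = c, in line with the XOR rule for indices, and iterating once more from the Octonions gives 7 + 7 + 21 = 35 trips for the Sedenions.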
Third thing: Such ZDs are not the only possible in CDP spaces; but they define the “primitive” variety from which ZD spaces saturating more than 1-D regions can be articulated. A not quite complete catalog of these can be found in our first monograph on the theme [3]; a critical kind which was overlooked there, involving the Reals (and hence, providing the backdrop from which to see the projection-operator kind as a degenerate type), were first discussed more recently [4]. (Ironically, these latter are the easiest sorts of composites to derive of any: place the two diagonals of a DMZ pairing with differing internal signing on axes of the same plane, and consider the diagonals they make with each other!) All the primitive ZDs in the Sedenions can be collected on the vertices of one of 7 copies of an Octahedron in the Box-Kite representation, each of whose 12 edges indicates a two-way “DMZ pathway,” evenly divided between 2 varieties. For any vertex V, and k any real scalar, indicate the diagonals this way: (V,/) = k · (iv+ iV ), while (V, \)= k · (iv− iV ). 6 edges on a Box-Kite will always have negative edge-sign (with unmarked ET cell entries: see the “sixth thing”). For vertices M and N, exactly two DMZs run along the edge joining them, written thus: (M,/) ·(N, \) = (M, \) · (N,/) = 0 The other 6 all have positive edge-sign, the diagonals of their two DMZs hav- ing same slope (and marked – with leading dashes – ET cell entries): (Z,/) ·(V, /)= (Z, \) ·(V,\)= 0 Fourth thing: The edges always cluster similarly, with two opposite faces among the 8 triangles on the Box-Kite being spanned by 3 negative edges (con- ventionally painted red in color renderings), with all other edges being positive (painted blue). 
One of the red triangles has its vertices’ 3 low-index units forming a trip; writing their vertex labels conventionally as A, B, C, we find there are in fact always 4 such trips cycling among them: (a,b,c), the L-trip; and the three U-trips obtained by replacing all but one of the lowercase labels in the L-trip with uppercase: (a,B,C); (A,b,C); (A,B,c). Such a 4-trip structure is called a Sail, and a Box-Kite has 4 of them: the Zigzag, with all negative edges, and the 3 Trefoils, each containing two positive edges extending from one of the Zigzag vertices to the two vertices opposite its Sailing partners. These opposite vertices are always joined by one of the 3 negative edges comprising the Vent which is the Zigzag’s opposite face. Again by convention, the vertices opposite A, B, C are written F, E, D in that order; hence, the Trefoil Sails are written (A,D,E); (F,D,B), and (F,C,E), ordered so that their lowercase renderings are equivalent to their CPO L-trips. The graphical convention is to show the Sails as filled in, while the other 4 faces, like the Vent, are left empty: they show “where the wind blows” that keeps the Box-Kite aloft. A real-world Box-Kite, meanwhile, would be held together by 3 dowels (of wood or plastic, say) spanning the joins between the only vertices left unconnected in our Octahedral rendering: the Struts linking the strut-opposite vertices (A, F); (B, E); (C, D). Fifth thing: In the Sedenions, the 7 isomorphic Box-Kites are differentiated by which Octonion index is missing from the vertices, and this index is designated by the letter S, for “signature,” “suppressed index,” or strut constant. This last designation derives from the invariant relationship obtaining in a given Box-Kite between S and the indices in the Vent and Zigzag termini (V and Z respectively) of any of the 3 Struts, which we call the “First Vizier” or VZ1. 
This is one of 3 rules, involving the three Sedenion indices always missing from a Box-Kite’s vertices: G, S, and their simple sum X (which is also their XOR product, since G is always to the left of the left-most bit in S). The Second Vizier tells us that the L-index of either terminus with the U-index of the other always form a trip with G, and it true as written for all 2N-ions. The Third shows the relationship between the L- and U- indices of a given Assessor, which always form a trip with X. Like the First, it is true as written only in the Sedenions, but as an unsigned statement about indices only, it is true universally. (For that reason, references to VZ1 and VZ3 hereinout will be assumed to refer to the unsigned versions.) First derived in the last section of Part I, reprised in the intro of Part II, we write them out now for the third and final time in this monograph: VZ1: v · z =V ·Z = S VZ2: Z · v =V · z = G VZ3: V · v = z ·Z = X. Rules 1 and 2, the Three Viziers, plus the standard Octonion labeling scheme derived from the simplest finite projective group, usually written as PSL(2,7), pro- vide the basis of our toolkit. This last becomes powerful due to its capacity for recursive re-use at all levels of CDP generation, not just the Octonions. The sim- plest way to see this comes from placing the unique Rule 0 trip provided by the Quaternions on the circle joining the 3 sides’ midpoints, with the Octonion Gen- erator’s index, 4, being placed in the center. Then the 3 lines leading from the Rule 0 trip’s (1, 2, 3) midpoints to their opposite angles – placed conventionally in clockwise order in the midpoints of the left, right, and bottom sides of a triangle whose apex is at 12 o’clock – are CPO trips forming the Struts, while the 3 sides themselves are the Rule 2 trips. These 3 form the L-index sets of the Trefoil Sails, while the Rule 0 trip provides the same service for the Zigzag. 
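Read as unsigned statements about indices, the Three Viziers become XOR identities, which makes them easy to check mechanically. The sketch below is a minimal illustration for the Sedenions (G = 8); the function name `sedenion_assessors` and the dictionary layout are hypothetical, and signs are ignored throughout. It assumes only what the unsigned Viziers state: strut-opposite L-indices XOR to S, and the L- and U-indices of one Assessor XOR to X = G + S.

```python
def sedenion_assessors(S, G=8):
    """Unsigned index bookkeeping for the Box-Kite with strut constant S.

    Returns {L-index: {"upper": U-index, "strut_opposite": L-index}}.
    Hypothetical helper; signs of products are not modelled.
    """
    X = G + S  # equals G XOR S, since S < G
    table = {}
    for low in range(1, G):
        if low == S:
            continue  # S itself never labels a vertex
        table[low] = {
            "upper": low ^ X,           # unsigned VZ3: v xor V = X
            "strut_opposite": low ^ S,  # unsigned VZ1: v xor z = S
        }
    return table


box_kite = sedenion_assessors(S=3)
for v, info in sorted(box_kite.items()):
    z = info["strut_opposite"]
    V, Z = info["upper"], box_kite[z]["upper"]
    print(f"strut ({v},{V}) -- ({z},{Z}):",
          "VZ1", v ^ z, V ^ Z, "| VZ2", Z ^ v, V ^ z, "| VZ3", V ^ v, z ^ Z)
```

For each of the six vertices the printed values are S, S, G, G, X, X, so the U-indices never take the values G or X, matching the statement that G, S and X are the three indices missing from a Box-Kite's vertices.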
By a process analogized to tugging on a slipcover (Part I) and pushing things into the central zone of hot oil while wok-cooking (Part II), all 7 possible values of S in the Sedenions, not just the 4, can be moved into the center while keeping orientations along all 7 lines of the Triangle unchanged. Part II’s critical Roundabout Theorem tells us, moreover, that all 2^N-ion ZDs, for all N > 3, are contained in Box-Kites as their minimal ensemble size. Hence, by placing the appropriate G, S, or X in the center of a PSL(2,7) triangle, with a suitable Rule 0 trip’s indices populating the circle, any and all candidate primitive ZDs can be discovered and situated.

Sixth thing: The word “candidate” in the above is critical; its exploration was the focus of Part II. For, starting with N = 5 and hence G = 16 (which is to say, in the 32-D Pathions), whole Box-Kites can be suppressed (meaning, all 12 edges, and not just the Struts, no longer serve as DMZ pathways). But for all N, the full set of candidate Box-Kites is viable when S ≤ 8 or equal to some higher power of 2. For all other S values, though, the phenomenon of carrybit overflow intervenes – leading, ultimately, to the “meta-fractal” behavior claimed in our abstract. To see this, we need another mode of representation, less tied to 3-D visualizing than the Box-Kite can provide. The answer is a matrix-like method of tabulating the products of candidate ZDs with each other, called Emanation Tables or ETs. The L-indices only of all candidate ZDs are all we need indicate (the U-indices being forced once G is specified); these will saturate the list of allowed indices < G, save for the value of S whose choice, along with that of G, fixes an ET. Hence, the unique ET for given G and S will fill a square spreadsheet whose edge has length 2^(N−1) − 2.
Moreover, a cell entry (r,c) is only filled when row and column labels R and C form a DMZ, which can never be the case along an ET’s long diagonals: for the diagonal starting in the upper left corner, R xor R = 0, and the two diagonals within the same Assessor can never zero-divide each other; for the righthand diagonal, the convention for ordering the labels (ascending counting order from the left and top, with any such label’s strut-opposite index immediately being entered in the mirror-opposite positions on the right and bottom) makes R and C strut-opposites, hence also unable to form DMZs. For the Sedenions, we get a 6 x 6 table, 12 of whose cells (those on long diagonals) are empty: the 24 filled cells, then, correspond to the two-way traffic of “edge-currents” one imagines flowing between vertices on a Box-Kite’s 12 edges. A computational corollary to the Roundabout Theorem, dubbed the Trip-Count Two-Step, is of seminal importance. It connects this most basic theorem of ETs to the most basic fact of associative triplets, indicated in the opening pages of Part I, namely: for any N, the number Trip_N of associative triplets is found, by simple combinatorics, to be (2^N − 1)(2^N − 2)/3! – 35 for the Sedenions, 155 for the Pathions, and so on. But, by Trip-Count Two-Step, we also know that the maximum number of Box-Kites that can fill a 2^N-ion ET = Trip_(N−2). For S a power of 2, beginning in the Pathions (for S = 2^(5−2) = 8), the Number Hub Theorem says the upper left quadrant of the ET is an unsigned multiplication table of the 2^(N−2)-ions in question, with the 0’s of the long diagonal (indicating Real negative units) replaced by blanks – a result effectively synonymous with the Trip-Count Two-Step.
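The counting claims above are simple enough to tabulate. The following sketch uses hypothetical helper names (`trip_count`, `et_edge`, `max_box_kites`, `offdiagonal_cells`) and evaluates the trip-count formula, the ET edge length 2^(N−1) − 2, and the Trip-Count Two-Step bound. As a consistency check it also compares the number of off-diagonal ET cells with 24 cells per Box-Kite, the figure stated above for the Sedenions; extending that per-Box-Kite figure to higher N is an assumption of the sketch.

```python
def trip_count(N):
    """Number of associative triplets among the 2^N-ions: (2^N - 1)(2^N - 2)/3!."""
    return (2**N - 1) * (2**N - 2) // 6


def et_edge(N):
    """Edge length of a 2^N-ion Emanation Table (label lines excluded)."""
    return 2**(N - 1) - 2


def max_box_kites(N):
    """Trip-Count Two-Step: at most Trip_(N-2) Box-Kites fill a 2^N-ion ET."""
    return trip_count(N - 2)


def offdiagonal_cells(N):
    """ET cells not on either long diagonal (the only cells that can be filled)."""
    e = et_edge(N)
    return e * e - 2 * e


for N, name in [(4, "Sedenions"), (5, "Pathions"), (6, "Chingons")]:
    print(name, trip_count(N), et_edge(N), max_box_kites(N),
          offdiagonal_cells(N), 24 * max_box_kites(N))
```

For N = 4, 5, 6 the last two columns agree (24, 168, 840), so a "full" ET with Trip_(N−2) Box-Kites at 24 cells each would exactly exhaust the off-diagonal cells.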
Seventh thing: We found, as Part II’s argument wound down, that the 2 classes of ETs found in the Pathions – the “normal” for S ≤ 8, filled with indices for all 7 possible Box-Kites, and the “sparse” so-called Sand Mandalas, showing only 3 Box-Kites when 8 < S < 16 – were just the beginning of the story. A simple formula involving just the bit-string of s and g, where the lowercase indicates the values of S and G modulo G/2, gave the prototype of our first recipe: all and only cells with labels R or C, or content P ( = R xor C ), are filled in the ET. The 4 “missing Box-Kites” were those whose L-index trip would have been that of a Sail in the $2^{N-1}$ realm with S = s and G = g. The sequence of 7 ETs, viewed in S-increasing succession, had an obvious visual logic leading to their being dubbed a flip-book. These 7 were obviously indistinguishable from many vantages, hence formed a spectrographic band. There were 3 distinct such bands, though, each typified by a Box-Kite count common to all band-members, demonstrable in the ETs for the 64-D Chingons. Each band contained S values bracketed by multiples of 8 (either less than or equal to the higher, depending upon whether the latter was or wasn’t a power of 2). These were claimed to underwrite behaviors in all higher $2^N$-ion ETs, according to 3 rough patterns in need of algorithmic refining in this Part III.

Corresponding to the first unfilled band, with ETs always missing $4^{N-4}$ of their candidate Box-Kites for N > 4, we spoke of recursivity, meaning the ETs for constant S and increasing N would all obey the same recipe, properly abstracted from that just cited above, empirically found among the Pathions for S > 8. The second and third behaviors, dubbed, for S ascending, (s,g)-modularity and hide/fill involution respectively, make their first showings in the Chingons, in the bands where 16 < S ≤ 24, and then where 24 < S < 32.
In all such cases, we are concerned with seeing the “period-doubling” inherent in CDP and Chaotic attractors both become manifest in a repeated doubling of ET edge-size, leading to the fixed-S, N increasing analog of the fixed-N, S increasing flip-books first observed in the Pathions, which we call balloon-rides. Specifying and proving their workings, and combining all 3 of the above-designated behaviors into the “fundamental theorem of zero-division algebra,” will be our goals in this final Part III. Anyone who has read this far is encouraged to bring up the graphical complement to this monograph, the 78-slide Powerpoint show presented at NKS 2006 [5], in another window. (Slides will be referenced by number in what follows.)

2 8 < S < 16, N → ∞: Recursive Balloon Rides in the Whorfian Sky

We know that any ET for the $2^N$-ions is a square whose edge is $2^{N-1} - 2$ cells. How, then, can any simply recursive rule govern exporting the structure of one such box to analogous boxes for progressively higher N? The answer: include the label lines – not just the column and row headers running across the top and left margins, but their strut-opposite values, placed along the bottom and right margins, which are mirror-reversed copies of the label-lines (LLs) proper to which they are parallel. This increases the edge-size of the ET box to $2^{N-1}$.

Theorem 11. For any fixed S > 8 and not a power of 2, the row and column indices comprising the Label Lines (LLs) run along the left and top borders of the $2^N$-ion ET “spreadsheet” for that S. Treat them as included in the spreadsheet, as labels, by adding a row and column to the given square of cells, of edge $2^{N-1} - 2$, which comprises the ET proper. Then add another row and column to include the strut-opposite values of these labels’ indices in “mirror LLs,” running along the opposite edges of a now $2^{N-1}$-edge-length box, whose four corner cells, like the long diagonals they extend, are empty.
When, for such a fixed S, the ET for the $2^{N+1}$-ions is produced, the values of the 4 sets of LL indices, bounding the contained $2^N$-ion ET, correspond, as cell values, to actual DMZ P-values in the bigger ET, residing in the rows and columns labeled by the contained ET’s G and X (the containing ET’s g and g+S). Moreover, all cells contained in the box they bound in the containing ET have P-values (else blanks) exactly corresponding to – and including edge-sign markings of – the positionally identical cells in the $2^N$-ion ET: those, that is, for which the LLs act as labels.

Proof. For all strut constants of interest, S < g (= G/2); hence, all labels up to and including that immediately adjoining its own strut constant (that is, the first half of them) will have indices monotonically increasing, up to and at least including the midline bound, from 1 to g − 1. When N is incremented by 1, the row and column midlines separating adjoining strut-opposites will be cut and pulled apart, making room for the labels for the $2^{N+1}$-ion ET for the same S, which middle range of label indices will also monotonically increase, this time from the current $2^N$-ion generation’s g (and prior generation’s G), up to and at least including its own midline bound, which will be g plus the number of cells in the LL inherited from the prior generation, or g/2 − 1. The LLs are therefore contained in the rows and columns headed by g and its strut opposite, g+S. To say that the immediately prior CDP generation’s ET labels are converted to the current generation’s P-values in the just-specified rows and columns is equivalent to asserting the truth of the following calculation:

\[
\begin{array}{cc}
\multicolumn{2}{c}{(g+u) + (sg)\cdot(G+g+u_{opp})} \\
\multicolumn{2}{c}{g + (G+g+S)} \\ \hline
-(vz)\cdot(G+u_{opp}) & +(vz)\cdot(sg)\cdot u \\
+u & -(sg)\cdot(G+u_{opp}) \\ \hline
\multicolumn{2}{c}{0 \ \text{only if}\ vz = (-sg)}
\end{array}
\]

Here, we use two binary variables, the inner-sign-setting sg, and the Vent-or-Zigzag test vz, based on the First Vizier. Using the two in tandem lets us handle the normal and “Type II” box-kites in the same proof.
Recall (and see Appendix B of Part II for a quick refresher) that while the “Type I” is the only type we find in the Sedenions, we find that a second variety emerges in the Pathions, indistinguishable from Type I in most contexts of interest to us here: the orientation of 2 of the 3 struts will be reversed (which is why VZ1 and VZ3 are only true generally when unsigned). For a Type I, since S < g, we know by Rule 1 that we have the trip $(S, g, g+S)$; hence, g – for all $2^N$-ions beyond the Pathions, where the Sand Mandalas’ g = 8 is the L-index of the Zigzag B Assessor – must be a Vent (and its strut-opposite, g+S, a Zigzag). For a Type II, however, this is necessarily so only for 1 of the 3 struts – which means, per the equation above, that sg must be reversed to obtain the same result. Said another way, we are free to assume either signing of vz means +1, so the “only if” qualifying the zero result is informative. It is u and its relationship to g+u that is of interest here, and this formulation makes it easier to see that the products hold for arbitrary LL indices u or their strut-opposites.

But for this, the term-by-term computations should seem routine: the left bottom is the Rule 1 outcome of $(u, g, g+u)$: obviously, any u index must be less than g. To its right, we use the trip $(u_{opp}, g, g+u_{opp}) \to (G+g+u_{opp}, g, G+u_{opp})$, whose CPO order is opposite that of the multiplication. For the top left, we use $(u, S, u_{opp})$ as limned above, then augment by g, then G, leaving $u_{opp}$ unaffected in the first augmenting, and g+u in the second. Finally, the top right (ignoring sg and vz momentarily) is obtained this way: $(u, S, u_{opp}) \to (u, g+u_{opp}, g+S) \to (u, G+g+S, G+g+u_{opp})$; ergo, +u.

Note that we cannot eke out any information about edge-sign marks from this setup: since labels, as such, have no marks, we have nothing to go on – unlike all other cells which our recursive operations will work on.
Indeed, the exact algorithmic determination of edge-sign marks for labels is not so trivial: as one iterates through higher N values, some segments of LL indexing will display reversals of marks found in the ascending or descending left midline column, while other segments will show them unchanged – with key values at the beginnings and ends of such octaves (multiples of 8, and sums of such multiples with S mod 8) sometimes being reversed or kept the same irrespective of the behavior of the terms they bound. Fortunately, such behaviors are of no real concern here – but they are, nevertheless, worth pointing out, given the easy predictability of other edge-sign marks in our recursion operations.

Now for the ET box within the labels: if all values (including edge-sign marks) remain unchanged as we move from the $2^N$-ion ET to that for the $2^{N+1}$-ions, then one of 3 situations must obtain: the inner-box cells have labels u,v which belong to some Zigzag L-trip (u,v,w); or, on the contrary, they correspond to Vent L-indices – the first two terms in the CPO triplet $(w_{opp}, v_{opp}, u)$, for instance; else, finally, one term is a Vent, the other a Zigzag (so that inner-signs of their multiplied dyads are both positive): we will write them, in CPO order, $v_{opp}$ and u, with third trip member $w_{opp}$. Clearly, we want all the products in the containing ET to indicate DMZs only if the inner ET’s cells do similarly. This is easily arranged: for the containing ET’s cells have indices identical to those of the contained ET’s, save for the appending of g to both (and ditto for the U-indices).

Case 1: If (u,v,w) form a Zigzag L-index set, then so do (g+v, g+u, w), so markings remain unchanged; and if the (u,v) cell entry is blank in the contained, so will be that for (g+u, g+v) in its container.
In other words, the following holds:

\[
\begin{array}{cc}
\multicolumn{2}{c}{(g+v) + (sg)\cdot(G+g+v_{opp})} \\
\multicolumn{2}{c}{(g+u) + (G+g+u_{opp})} \\ \hline
-(G+w_{opp}) & -(sg)\cdot w \\
-w & -(sg)\cdot(G+w_{opp}) \\ \hline
\multicolumn{2}{c}{0 \ \text{only if}\ sg = (-1)}
\end{array}
\]

$(g+u)\cdot(g+v) = P$: $(u,v,w) \to (g+v, g+u, w)$; hence, $(-w)$.
$(g+u)\cdot(sg)\cdot(G+g+v_{opp}) = P$: $(u, w_{opp}, v_{opp}) \to (g+v_{opp}, w_{opp}, g+u) \to (G+w_{opp}, G+g+v_{opp}, g+u)$; hence, $(sg)\cdot(-(G+w_{opp}))$.
$(G+g+u_{opp})\cdot(g+v) = P$: $(u_{opp}, w_{opp}, v) \to (g+v, w_{opp}, g+u_{opp}) \to (g+v, G+g+u_{opp}, G+w_{opp})$; hence, $(-(G+w_{opp}))$.
$(G+g+u_{opp})\cdot(G+g+v_{opp}) = P$: Rule 2 twice to the same two terms yields the same result as the terms in the raw, hence $(-w)$.

Clearly, cycling through (u,v,w) to consider $(g+v)\cdot(g+w)$ will give the exactly analogous result, forcing two (hence three) negative inner-signs in the candidate Sail; hence, if we have DMZs at all, we have a Zigzag Sail.

Case 2: The product of two Vents must have negative edge-sign, and there’s no cycling through same-inner-signed products as with the Zigzag, so we’ll just write our setup as a one-off, with upper inner-sign explicitly negative, and claim its outcome true.

\[
\begin{array}{cc}
\multicolumn{2}{c}{(g+v_{opp}) - (G+g+v)} \\
\multicolumn{2}{c}{(g+w_{opp}) + (G+g+w)} \\ \hline
+(G+u_{opp}) & +u \\
-u & -(G+u_{opp})
\end{array}
\]

$(g+w_{opp})\cdot(g+v_{opp}) = P$: $(w_{opp}, v_{opp}, u) \to (g+v_{opp}, g+w_{opp}, u)$; hence, $(-u)$.
$(g+w_{opp})\cdot(G+g+v) = P$: $(w_{opp}, v, u_{opp}) \to (g+v, g+w_{opp}, u_{opp}) \to (G+u_{opp}, g+w_{opp}, G+g+v)$; but inner sign of upper dyad is negative, so $(-(G+u_{opp}))$.
$(G+g+w)\cdot(g+v_{opp}) = P$: $(v_{opp}, u_{opp}, w) \to (g+w, u_{opp}, g+v_{opp}) \to (G+u_{opp}, G+g+w, g+v_{opp})$; hence, $(+(G+u_{opp}))$.
$(G+g+w)\cdot(G+g+v) = P$: Rule 2 twice to the same two terms yields the same result as the terms in the raw; but inner sign of upper dyad is negative, so $(+u)$.

Case 3: The product of Vent and Zigzag displays same inner sign in both dyads; hence the following arithmetic holds:

\[
\begin{array}{cc}
\multicolumn{2}{c}{(g+u) + (G+g+u_{opp})} \\
\multicolumn{2}{c}{(g+v_{opp}) + (G+g+v)} \\ \hline
-(G+w) & +w_{opp} \\
-w_{opp} & +(G+w)
\end{array}
\]

The calculations are sufficiently similar to the two prior cases as to make their writing out tedious.
It is clear that, in each of our three cases, content and marking of each cell in the contained ET and the overlapping portion of the container ET are identical. ∎

To highlight the rather magical label/content involution that occurs when N is in- or decremented, graphical realizations of such nested patterns, as in Slides 60-61, paint LLs (and labels proper) a sky-blue color. The bottom-most ET being overlaid in the central box has g = the maximum high-bit in S, and is dubbed the inner skybox. The degree of nesting is strictly measured by counting the number of bits B that a given skybox’s g is to the left of this strut-constant high-bit. If we partition the inner skybox into quadrants defined by the midlines, and count the number Q of quadrant-sized boxes along one or the other long diagonal, it is obvious that the inner skybox itself has B = 0 and Q = 1; the nested skyboxes containing it have $Q = 2^B$. If recursion of skybox nesting be continued indefinitely – to the fractal limit, which terminology we will clarify shortly – the indices contained in filled cells of any skybox can be interpreted in B distinct ways, B → ∞, as representations of distinct ZDs with differing G and, therefore, differing U-indices. By obvious analogy to the theory of Riemann surfaces in complex analysis, each such skybox is a separate “sheet”; as with even such simple functions as the logarithmic, the number of such sheets is infinite. We could then think of the infinite sequence of skyboxes as so many cross-sections, at constant distances, of a flashlight beam whose intensity (one over the ET’s cell count) follows Kepler’s inverse square law.
Alternatively, we could ignore the sheeting and see things another way.

Where we called fixed-N, S varying sequences of ETs flip-books, we refer to fixed-S, N varying sequences as balloon rides: the image is suggested by David Niven’s role as Phileas Fogg in the movie made of Jules Verne’s Around the World in 80 Days: to ascend higher, David would drop a sandbag over the side of his hot-air balloon’s basket; if coming down, he would pull a cord that released some of the balloon’s steam. Each such navigational tactic is easy to envision as a bit-shift, pushing G further to the left to cross LLs into a higher skybox, else moving it rightward to descend. Using S = 15 as the basis of a 3-stage balloon-ride, we see how increasing N from 5 to 6 to 7 approaches the white-space complement of one of the simplest (and least efficient) plane-filling fractals, the Cesàro double sweep [6, p. 65].

Figure 1: ETs for S=15, N=5,6,7 (nested skyboxes in blue) · · · and “fractal limit.”

The graphics were programmatically generated prior to the proving of the theorems we’re elaborating: their empirical evidence was what informed (indeed, demanded) the theoretical apparatus. And we are not quite finished with the current task the apparatus requires of us. We need two more theorems to finish the discussion of skybox recursion. For both, suppose some skybox with B = k, k any non-negative integer, is nested in one with B = k + 1. Divide the former along midlines to frame its four quadrants, then block out the latter skybox into a 4×4 grid of same-sized window panes, partitioned by the one-cell-thick borders of its own midlines into quadrants, each of which is further subdivided by the outside edges of the 4 one-cell-thick label lines and their extensions to the window’s frame. These extended LLs are themselves NSLs, and have R,C values of g and g+S; for S = 15, they also adjoin NSLs along their outer edges whose R,C values are multiples of 8 plus S mod 8.
These pane-framing pairs of NSLs we will henceforth refer to (as a windowmaker would) as muntins. It is easy to calculate that while the inner skybox has but one muntin each among its rows and columns, each further nesting has $2^{B+1} - 1$. But we are getting ahead of ourselves, as we still have two proofs to finish. Let’s begin with Four Corners, or

Theorem 12. The 4 panes in the corners of the 16-paned B = k + 1 window are identical in contents and marks to the analogously placed quadrants of the B = k skybox.

Proof. Invoke the Zero-Padding Lemma with regard to the U-indices, as the labels of the boxes in the corners of the B = k+1 ET are identical to those of the same-sized quadrants in the B = k ET, all labels ≥ the latter’s g only occurring in the newly inserted region. ∎

Remarks. For N = 6, all filled Four Corners cells indicate edges belonging to 3 Box-Kites, whose edges they in fact exhaust. These 3, not surprisingly, are the zero-padded versions of the identically L-indexed trio which span the entirety of the N = 5 ET. By calculations we’ll see shortly, however, the inner skybox, when considered as part of the N = 6 ET, has filled cells belonging to all the other 16 Box-Kites, even though the contents of these cells are identical to those in the N = 5 ET. As B increases, then, the “sheets” covering this same central region must draw upon progressively more extensive networks of interconnected Box-Kites. As we approach the fractal limit – and “the Sky is the limit” – these networks hence become scale-free. (Corollarily, for N = 7, the Four Corners’ cells exhaust all the edges of the N = 6 ET’s 19 Box-Kites, and so on.)
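The Box-Kite tallies quoted in these Remarks (3 for N = 5, 19 for N = 6, hence 16 newcomers) can be reproduced from two counts given earlier. This is only a sketch under a stated assumption: it reads the earlier statement about the first unfilled band as "missing $4^{N-4}$ of the $Trip_{N-2}$ candidate Box-Kites"; the function names are hypothetical.

```python
def trip_count(n: int) -> int:
    """Associative triplets for the 2**n-ions: (2^n - 1)(2^n - 2)/3!."""
    return (2**n - 1) * (2**n - 2) // 6


def recursive_band_box_kites(n: int) -> int:
    """Box-Kites present in a 2**n-ion ET with 8 < S < 16 (requires n >= 5).

    Assumption: Trip_{n-2} candidates, of which 4^(n-4) are missing.
    """
    if n < 5:
        raise ValueError("the 8 < S < 16 band first appears in the Pathions (n = 5)")
    return trip_count(n - 2) - 4**(n - 4)
```

With these definitions, `recursive_band_box_kites(5)` and `recursive_band_box_kites(6)` give the counts for the N = 5 and N = 6 ETs, and their difference is the number of Box-Kites that appear only once the inner skybox is viewed as part of the N = 6 ET.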
Unlike a standard fractal, however, such a Sky merits the prefix “meta”: for each empty ET cell corresponds to a point in the usual fractal variety; and each pair of filled ET cells, having (r,c) of one = (c,r) of the other, correspond to diagonal-pairs in Assessor planes, orthogonal to all other such diagonal-pairs belonging to the other cells. Each empty ET cell, in other words, not only corresponds to a point in the usual plane-confined fractal, but belongs to the complement of the filled cells’ infinite number of dimensions framing the Sky’s meta-fractal.

We’ve one last thing to prove here. The French Windows Theorem shows us the way the cell contents of the pairs of panes contained between the B = k+1 skybox’s corners are generated from those of the analogous pairings of quadrants in the B = k skybox, by adding g to L-indices.

Theorem 13. For each half-square array of cells created by one or the other midline (the French windows), each cell in the half-square parallel to that adjoining the midline (one of the two shutters), but itself adjacent to the label-line delimiting the former’s bounds, has content equal to g plus that of the cell on the same line orthogonal to the midline, and at the same distance from it, as it is from the label-line. All the empty long-diagonal cells then map to g (and are marked), or g+S (and are unmarked). Filled cells in extensions of the label-lines bounding each shutter are calculated similarly, but with reversed markings; all other cells in a shutter have the same marks as their French-window counterparts.

Preamble. Note that there can be (as we shall see when we speak of hide/fill involution) cells left empty for rule-based reasons other than P = R ⊻ C = 0 | S. The shutter-based counterparts of such French-window cells, unlike those of long-diagonal cells, remain empty.

Proof. The top and left (bottom and right) shutters are equivalent: one merely switches row for column labels.
Top/left and bottom/right shutter-sets are likewise equivalent by the symmetry of strut-opposites. We hence make the case for the left shutter only. But for the novelties posed by the initially blank cells and the label lines (with the only real subtleties involving markings), the proof proceeds in a manner very similar to Theorem 11: split into 3 cases, based on whether (1) the L-index trip implied by the R,C,P values is a Zigzag; (2) u,v are both Vents; or, (3) the edge signified by the cell content is the emanation of same-inner-signed dyads (that is, one is a Vent, the other a Zigzag).

Case 1: Assume (u,v,w) a Zigzag L-trip in the French window’s contained skybox; the general product in its shutter is

\[
\begin{array}{cc}
\multicolumn{2}{c}{v - (G+v_{opp})} \\
\multicolumn{2}{c}{(g+u) + (G+g+u_{opp})} \\ \hline
-(G+g+w_{opp}) & +(g+w) \\
-(g+w) & +(G+g+w_{opp})
\end{array}
\]

$(g+u)\cdot v = P$: $(u,v,w) \to (g+w, v, g+u)$; hence, $(-(g+w))$.
$(g+u)\cdot(G+v_{opp}) = P$: $(u, w_{opp}, v_{opp}) \to (g+w_{opp}, g+u, v_{opp}) \to (G+v_{opp}, g+u, G+g+w_{opp})$; dyads’ opposite inner signs make $(G+g+w_{opp})$ positive.
$(G+g+u_{opp})\cdot v = P$: $(u_{opp}, w_{opp}, v) \to (g+w_{opp}, g+u_{opp}, v) \to (G+g+u_{opp}, G+g+w_{opp}, v)$; hence, $(-(G+g+w_{opp}))$.
$(G+g+u_{opp})\cdot(G+v_{opp}) = P$: $(v_{opp}, u_{opp}, w) \to (v_{opp}, g+w, g+u_{opp}) \to (G+g+u_{opp}, g+w, G+v_{opp})$; dyads’ opposite inner signs make $(g+w)$ positive.

Case 2: The product of two Vents must have negative edge-sign, hence negative inner sign in top dyad to lower dyad’s positive. The shutter product thus looks like this:

\[
\begin{array}{cc}
\multicolumn{2}{c}{(u_{opp}) - (G+u)} \\
\multicolumn{2}{c}{(g+v_{opp}) + (G+g+v)} \\ \hline
+(G+g+w_{opp}) & +(g+w) \\
-(g+w) & -(G+g+w_{opp})
\end{array}
\]

$(g+v_{opp})\cdot u_{opp} = P$: $(v_{opp}, u_{opp}, w) \to (g+w, u_{opp}, g+v_{opp})$; hence, $(-(g+w))$.
$(g+v_{opp})\cdot(G+u) = P$: $(v_{opp}, u, w_{opp}) \to (g+w_{opp}, u, g+v_{opp}) \to (G+u, G+g+w_{opp}, g+v_{opp})$; but dyads’ inner signs are opposite, so $(-(G+g+w_{opp}))$.
$(G+g+v)\cdot u_{opp} = P$: $(u_{opp}, w_{opp}, v) \to (u_{opp}, g+v, g+w_{opp}) \to (u_{opp}, G+g+w_{opp}, G+g+v)$; hence, $(+(G+g+w_{opp}))$.
$(G+g+v)\cdot(G+u) = P$: $(u,v,w) \to (u, g+w, g+v) \to (G+g+v, g+w, G+u)$; but dyads’ inner signs are opposite, so $(+(g+w))$.
Case 3: The product of Vent and Zigzag displays same inner sign in both dyads; hence the following arithmetic holds:

\[
\begin{array}{cc}
\multicolumn{2}{c}{(u_{opp}) + (G+u)} \\
\multicolumn{2}{c}{(g+v) + (G+g+v_{opp})} \\ \hline
+(G+g+w) & +(g+w_{opp}) \\
-(g+w_{opp}) & -(G+g+w)
\end{array}
\]

As with the last case in Theorem 11, we omit the term-by-term calculations for this last case, as they should seem “much of a muchness” by this point. What is clear in all three cases is that index values of shutter cells have same markings as their French-window counterparts, at least for all cells which have markings in the contained skybox; but, in all cases, indices are augmented by g.

The assignment of marks to the shutter-cells linked to blank cells in French windows is straightforward for Type I box-kites: since any containing skybox must have g > S, and since g+s has g as its strut opposite, then the First Vizier tells us that any g must be a Vent. But then the R,C indices of the cell containing g must belong to a Trefoil in such a box-kite; hence, one is a Vent, the other a Zigzag, and g must be marked. Only if the R,C,P entry in the ET is necessarily confined to a Type II box-kite will this not necessarily be so. But Part II’s Appendix B made clear that Type II’s are generated by excluding g from their L-indices: recall that, in the Pathions, for all S < 8, all and only Type II box-kites are created by placing one of the Sedenion Zigzag L-trips on the “Rule 0” circle of the PSL(2,7) triangle with 8 in the middle (and hence excluded). This is a box-kite in its own right (one of the 7 “Atlas” box-kites with S = 8); its 3 sides are “Rule 2” triplets, and generate Type II box-kites when made into zigzag L-index sets. Conversely, all Pathion box-kites containing an ‘8’ in an L-index (dubbed “strongboxes” in Appendix B) are Type I. Whether something peculiar might occur for large N (where there might be multiple powers of 2 playing roles in the same box-kite) is a matter of marginal interest to present concerns, and will be left as an open question for the present.
We merely note that, by a similar argument, and with the same restrictions assumed, g+S must be a Zigzag L-index, and R,C either both be likewise (hence, g+S is unmarked); or, both are Vents in a Trefoil (so g+S must be unmarked here too).

The last detail – reversal of label-line markings in their g-augmented shutter-cell extensions – is demonstrated as follows, with the same caveat concerning Type II box-kites assumed to apply. Such cells house DMZs (just swap u for g+u in Theorem 11’s first setup – they form a Rule 1 trip – and compute). The LL extension on top has row-label g; that along the bottom, the strut-opposite g+S. Given trip (u,v,w), the shutter-cell index for R,C = (g,u) corresponds to French-window index for R,C = (g,g+u). But (u,g,g+u) is a Trefoil, since g is a Vent. So if u is one too, g+u isn’t; hence marks are reversed as claimed. ∎

3 Maximal High-Bit Singletons: (s,g)-Modularity for 16 < S ≤ 24

The Whorfian Sky, having but one high bit in its strut constant, is the simplest possible meta-fractal – the first of an infinite number of such infinite-dimensional zero-divisor-spanned spaces. We can consider the general case of such singleton high-bit recursiveness in two different, complementary ways. First, we can supplement the just-concluded series of theorems and proofs with a calculational interlude, where we consider the iterative embeddings of the Pathion Sand Mandalas in the infinite cascade of boxes-within-boxes that a Sky oversees. Then, we can generalize what we saw in the Pathions to consider the phenomenology of strut constants with singleton high-bits, which we take to be any bits representing a power of 2 ≥ 3 if S contains low bits (is not a multiple of 8), else a power of 2 strictly greater than 3 otherwise. Per our earlier notation, g = G/2 is the highest such singleton bit possible.
We can think of its exponential increments – equivalent to left-shifts in bit-string terms – as the side-effects of conjoint zero-padding of N and S. This will be our second topic in this section.

Maintaining our use of S = 15 as exemplary, we have already seen that NSLs come in quartets: a row and column are each headed by S mod g (henceforth, s) and g, hence 7 and 8 in the Sand Mandalas. But each recursive embedding of the current skybox in the next creates further quartets. Division down the midlines to insert the indices new to the next CDP generation induces the Sand Mandala’s adjoining strut-opposite sets of s and g lines (the pane-framing muntins) to be displaced to the borders of the four corners and shutters, with the new skybox’s g and g+s now adjoining the old s and g to form new muntins, on the right and left respectively, while g+g/2 (the old G+g) and its strut opposite form a third muntin along the new midlines. Continuing this recursive nesting of skyboxes generates 1, 3, 7, · · ·, $2^{B+1} - 1$ row-and-column muntin pairs involving multiples of 8 and their supplementings by s, where (recalling earlier notation) B = 0 for the inner skybox, and increments by 1 with each further nesting. Put another way, we then have a muntin number $\mu = (2^{N-4} - 1)$, or 4µ NSL’s in all.

The ET for given N has $(2^{N-1} - 2)$ cells in each row and column. But NSLs divvy them up into boxes, so that each line is crossed by 2µ others, with the 0, 2 or 4 cells in their overlap also belonging to diagonals. The number of cells in the overlap-free segments of the lines, or ω, is then just $4\mu\cdot(2^{N-1} - 2 - 2\mu) = 24\mu(\mu+1)$: an integer number of Box-Kites. For our S = 15 case, the minimized line shuffling makes this obvious: all boxes are 6 x 6, with 2-cell-thick boundaries (the muntins separating the panes), with µ boundaries, and (µ + 1) overlap-free cells per each row or column, per each quartet of lines.
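The identity for ω can be confirmed by direct arithmetic. The snippet below is an illustrative sketch only (the function names are hypothetical); it takes the muntin number to be $2^{N-4} - 1$ and the ET edge to be $2^{N-1} - 2$, as stated above, and compares the two sides of the equation.

```python
def muntin_number(n: int) -> int:
    """Muntin number mu for the 2**n-ion ET with 8 < S < 16: 2^(n-4) - 1."""
    return 2**(n - 4) - 1


def overlap_free_cells(n: int) -> int:
    """omega: 4*mu NSLs, each with (edge - 2*mu) cells not crossed by another NSL."""
    mu = muntin_number(n)
    edge = 2**(n - 1) - 2
    return 4 * mu * (edge - 2 * mu)


def omega_closed_form(n: int) -> int:
    """The closed form 24 * mu * (mu + 1), i.e. mu * (mu + 1) Box-Kites' worth of cells."""
    mu = muntin_number(n)
    return 24 * mu * (mu + 1)
```

The two functions agree because $2^{N-1} = 8(\mu + 1)$, so each NSL keeps $6(\mu + 1)$ overlap-free cells. The muntin numbers for N = 5, 6, 7 come out as 1, 3, 7, matching the $2^{B+1} - 1$ sequence with B = N − 5.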
The contribution from diagonals, or δ, is a little more difficult, but straightforward in our case of interest: 4 sets of 1, 2, 3, · · ·, µ boxes are spanned by moving along one empty long diagonal before encountering the other, with each box contributing 6, and each overlap zone between adjacent boxes adding 2. Hence, $\delta = 24\cdot(2^{N-3} - 1)(2^{N-3} - 2)/6$ – a formula familiar from associative-triplet counting: it also contributes an integer number of Box-Kites. The one-liner we want, then, is this: BKN, 8 8 where the maximum high-bit (that is, g) is included in its bitstring: for, with g at B and S mod g at E, whichever R,C label is not one of these suffices to completely determine the remaining Assessor L-indices, so that no other bits in S play a role in determining any of them. Meanwhile, cell contents P containing either g or S mod g, but created by XORing of row and column labels equal to neither, are arrayed in off-diagonal pairs, forming disjoint sets parallel or perpendicular to the two empty ones. If we write S mod g with a lower-case s, then we could call the rule in play here (s,g)-modularity. Using the vertical pipe for logical or, and recalling the special handling required by the 8-bit when S is a multiple of 8 (which we signify with the asterisk suffixed to “mod”), we can shorthand its workings this way:

Theorem 14. For a $2^N$-ion inner skybox whose strut constant S has a singleton high-bit which is maximal (that is, equal to $g = G/2 = 2^{N-2}$), the recipe for its filled cells can be condensed thus:

R | C | P = g | S mod∗ g

Under recursion, the recipe needs to be modified so as to include not just the inner-skybox g and S mod∗ g (henceforth, simply lowercase s), but all integer multiples k of g less than the G of the outermost skybox, plus their strut opposites k·g+s.

Proof. The theorem merely boils down the computational arguments of prior paragraphs in this section, then applies the last section’s recursive procedures to them.
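As a numerical aside, the recipe of Theorem 14 lends itself to a brute-force check. The sketch below is not the procedure used to generate the slides; it is one reading of the recipe, with hypothetical names throughout. It assumes that labels run over 1, …, G−1 excluding S, that a cell's content is P = R xor C, that the long diagonals are the cells with P = 0 or P = S, and that the strut-opposite of an index is its xor with S. It then counts cells whose R, C or P is a multiple of the inner skybox's g or the strut-opposite of one, and compares the total with ω + δ from the preceding paragraphs (for the S = 15 case).

```python
def filled_cells(n: int, strut: int, g_inner: int) -> int:
    """Count ET cells with R, C or P on the marked list (Theorem 14 recipe, as read here).

    n        -- the ET is for the 2**n-ions, so G = 2**(n-1)
    strut    -- the strut constant S
    g_inner  -- g of the inner skybox (the high bit of S)
    """
    big_g = 2**(n - 1)
    marked = set()
    for multiple in range(g_inner, big_g, g_inner):
        marked.add(multiple)          # k * g
        marked.add(multiple ^ strut)  # its strut-opposite, a multiple of g plus s
    labels = [i for i in range(1, big_g) if i != strut]
    count = 0
    for r in labels:
        for c in labels:
            p = r ^ c
            if p == 0 or p == strut:  # long diagonals stay empty
                continue
            if r in marked or c in marked or p in marked:
                count += 1
    return count


def omega_plus_delta(n: int) -> int:
    """omega + delta for 8 < S < 16, using the closed forms quoted in the text."""
    mu = 2**(n - 4) - 1
    omega = 24 * mu * (mu + 1)
    delta = 24 * (2**(n - 3) - 1) * (2**(n - 3) - 2) // 6
    return omega + delta
```

For S = 15 and N = 5, 6, 7, `filled_cells(n, 15, 8)` agrees with `omega_plus_delta(n)`, and dividing by 24 gives 3, 19 and 91 Box-Kites. For an inner skybox in the Chingons, `filled_cells(6, 17, 16)` gives 168 cells, i.e. 7 Box-Kites.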
The first claim of the proof is identical to what we’ve already seen for Sand Mandalas, with zero-padding injected into the argument. The second claim merely assumes the area quadrupling based on midline splitting, with the side-effects already discussed. No formal proof, then, is called for beyond these points. ∎

Remarks. Using the computations from two paragraphs prior to the theorem’s statement, we can readily calculate the box-kite count for any skybox, no matter how deeply nested: recall the formula $6\cdot(2^{N-1} - 4)$ for $BK_{N,S} = 2^{N-3} - 1$. It then becomes a straightforward matter to calculate, as well, the limiting ratio of this count to the maximal full count possible for the ET as N → ∞, with each cell approaching a point in a standard 2-D fractal. Hence, for any S with a singleton high-bit in evidence, there exists a Sky containing all recursive redoublings of its inner skybox, and computations like those just considered can further be used to specify fractal dimensions and the like. (Such computations, however, will not concern us.) Finally, recall that, by spectrographic equivalence, all such computations will lead to the same results for each S value in the same spectral band or octave.

4 Hide/Fill Involution: Further-Right High-Bits with 24 < S < 32

Recall that, in the Sand Mandala flip-book, each increment of S moved the two sets of orthogonal parallel lines one cell closer toward their opposite numbers: while S = 9 had two filled-in rows and columns forming a square missing its corners, the progression culminating in S = 15 showed a cross-hairs configuration: the parallel lines of cells now abutted each other in 2-ply horizontal and vertical arrays. The same basic progression is on display in the Chingons, starting with S = 17. But now the number of strut-opposite cell pairs in each row and column is 15, not 7, so the cross-hairs pattern can’t arise until S = 31.
Yet it never arises in quite the manner expected, as something quite singular transpires just after flipping past the ET in the middle, for S = 24. Here, rows and columns labeled 8 and 16 constrain a square of empty cells in the center · · · quickly followed by an ET which seems to continue the expected trajectory – except that almost all the non-long-diagonal cells left empty in its predecessor ETs are now inexplicably filled. More, there is a method to the “almost all” as well: for we now see not 2, but 4 rows and columns, all being blanked out while those labeled with g and S mod g are being filled in.

This is an inevitable side effect of a second high-bit in S: we call this phenomenon, first appearing in the Chingons, hide/fill involution. There are 4, not 2, line-pairs, because S and G, modulo a lower power of 2 (because devolving upon a prior CDP generation’s g), offer twice the possibilities: for S = 25, S mod 16 is now 9, but S mod 8 can result in either 1 or 17 as well – with correlated multiples of 8 (8 proper, and 24) defining the other two pairings. All cells with R | C | P equal to one of these 4 values, but for the handful already set to “on” by the first high-bit, will now be set to “off,” while all other non-long-diagonal cells set to “off” in the Pathion Sand Mandalas are suddenly “on.” What results for each Chingon ET with 24 < S < 32 is an ensemble comprised of 23 Box-Kites. (For the flip-book, see Slides 40 – 54.) Why does this happen? The logic is as straightforward as the effect can seem mysterious, and is akin, for good reason, to the involutory effect on trip orientation induced by Rule 2 addings of G to 2 of the trip’s 3 indices.

In order to grasp it, we need only to consider another pair of abstract calculation setups, of the sort we’ve seen already many times. The first is the core of the Two-Bit Theorem, which we state and prove as follows:

Theorem 15.
$2^N$-ion dyads making DMZs before augmenting S with a new high-bit no longer do so after the fact.

Proof. Suppose the high-bit in the bitstring representation of S is $2^K$, K < (N − 1). Suppose further that, for some L-index trip (u, v, w), the Assessors U and V are DMZs, with their dyads having the same inner signs. (This last assumption is strictly to ease calculations, and not substantive: we could, as earlier, use one or more binary variables of the sg type to cover all cases explicitly, including Type I vs. Type II box-kites. To keep things simple, we assume Type I in what follows.) We then have (u + u · X)(v + v · X) = (u + U)(v + V) = 0. But now suppose, without changing N, we add a bit somewhere further to the left to S, so that S < ($2^K$ = L) < G. The augmented strut constant now equals $S_L$ = S + L. One of our L-indices, say v, belongs to a Vent Assessor thanks to the assumed inner signing; hence, by Rule 2 and the Third Vizier, (V, v, X) → (X + L, v, V + L). Its DMZ partner u, meanwhile, must thereby be a Zigzag L-index, which means (u, U, X) → (u, X + L, U + L). We claim the truth of the following arithmetic:

          v + (V + L)
          u + (U + L)
    ------------------------
    + (W + L)    + w
    + w          − (W + L)
    ------------------------
    NOT ZERO (+w’s don’t cancel)

The left bottom product is given. The product to its right is derived as follows: since u is a Zigzag L-index, the Trefoil U-trip (u, V, W) has the same orientation as (u, v, w), so that Rule 2 → (u, W + L, V + L), implying the negative result shown. The left product on the top line, though, has terms derived from a Trefoil U-trip lacking a Zigzag L-index, so that only after Rule 2 reversal are the letters arrayed in Zigzag L-trip order: (U + L, v, W + L). Ergo, +(W + L). Similarly for the top right: Rule 2 reversal “straightens out” the Trefoil U-trip, to give (U + L, V + L, w); therefore, (+w) results. If we explicitly covered further cases by using an sg variable, we would be faced with a Theorem 2 situation: one or the other product pair cancels, but not both. ∎

Remark.
The prototype for the phenomenon this theorem covers is the “explo- sion” of a Sedenion box-kite into a trio of interconnected ones in a Pathion sand mandala, with the S of the latter = the X of the former. As part of this process, 4 of the expected 7 are “hidden” box-kites (HBKs), with no DMZs along their edges. These have zigzag L-trips which are precisely the L-trips of the 4 Sedenion Sails. Here, an empirical observation which will spur more formal investigations in a sequel study: for the 3 HBKs based on trefoil L-trips, exactly 1 strut has reversed orientation (a different one in each of them), with the orientation of the triangular side whose midpoint it ends in also being reversed. For the HBK based on the zigzag L-trip, all 3 struts are reversed, so that the flow along the sides is exactly the reverse of that shown in the “Rule 0” circle. (Hence, all possible flow patterns along struts are covered, with only those entailing 0 or 2 reversals corresponding to functional box-kites: our Type I and Type II designations.) It is not hard to show that this zigzag-based HBK has another surprising property: the 8 units defined by its own zigzag’s Assessors plus X and the real unit form a ZD-free copy of the Octonions. This is also true when the analogous Type II situation is explored, al- beit for a slightly different reason: in the former case, all 3 Catamaran “twistings” take the zigzag edges to other HBKs; in the latter, though, the pair of Assessors in some other Type II box-kite reached by “twisting” – (a,B) and (A,b), say, if the edge be that joining Assessors A and B, with strut-constant copp = d – are strut opposites, and hence also bereft of ZDs. The general picture seems to mirror this concrete case, and will be studied in “Voyage by Catamaran” with this expecta- tion: the bit-twiddling logic that generates meta-fractal “Skies” also underwrites a means for jumping between ZD-free Octonion clones in an infinite number of HBKs housed in a Sky. 
Given recent interest in pure “E8” models giving a privileged place to the basis of zero-divisor theory, namely “G2” projections (viz., A. Garrett Lisi’s “An Exceptionally Simple Theory of Everything”); a parallel vogue for many-worlds approaches; and, the well-known correspondence between 8-D closest-packing patterns, the loop of the 240 unit Octonions which Coxeter discovered, and E8 algebras – given all this, tracking the logic of the links across such Octonionic “brambles” might prove of great interest to many researchers.

Now, we still haven’t explained the flipside of this off-switch effect, to which prior CDP generation Box-Kites – appropriately zero-padded to become Box-Kites in the current generation until the new high-bit is added to the strut-constant – are subjected. How is it that previously empty cells not associated with the second high-bit’s blanked-out R, C, P values are now full? The answer is simple, and is framed in the Hat-Trick Theorem this way.

Theorem 16. Cells in an ET which represent DMZ edges of some $2^N$-ion Box-Kites for some fixed S, and which are offed in turn upon augmenting S by a new leftmost bit, are turned on once more if S is augmented by yet another new leftmost bit.

Proof. We begin an induction based upon the simplest case (which the Chingons are the first $2^N$-ions to provide): consider Box-Kites with S ≤ 8. If a high-bit be appended to S, then the associated Box-Kites are offed. However, if another high-bit be affixed, these dormant Box-Kites are re-awakened – the second half of hide/fill involution. We simply assume an L-index set (u, v, w) underwriting a Sail in the ET for the pre-augmented S, with Assessors (u, U) and (v, V). Then, we introduce a more leftified bit $2^Q$ = M, where pre-augmented S < L < M < G, then compute the term-by-term products of (u + (U + L + M)) and (v + sg · (V + L + M)), using the usual methods.
And as these methods tell us that two applications of Rule 2 have the same effect as none in such a setup, we have no more to prove. Corollary. The induction just invoked makes it clear that strut constants equal to multiples of 8 not powers of 2 are included in the same spectral band as all other integers larger than the prior multiple. The promissory note issued in the second paragraph of Part II’s concluding section, on 64-D Spectrography, can now be deemed redeemed. In the Chingons, high-bits L and M are necessarily adjacent in the bitstring for S < G = 32; but in the general 2N-ion case, N large, zero-padding guarantees that things will work in just the same manner, with only one difference: the recursive creation of “harmonics” of relatively small-g (s,g)-modular R,C,P values will propagate to further levels, thereby effecting overall Box-Kite counts. In general terms, we have echoes of the formula given for (s,g)-modular cal- culations, but with this signal difference: there will be one such rule for each high-bit 2H in S, where residues of S modulo 2H will generate their own near- solid lines of rows and columns, be they hidden or filled. Likewise for multiples of 2H 0, x, y ∈ X, Ψ0 ≡ 1. Note that, for arbitrary r0 > 0, the function (r, x, y) 7→ Ψr(x, y) is uniformly continuous on [r0,+∞)× X2. For fixed s ≤ t ≤ T,A ∈ R+ we decompose φs,tn as φs,tn = η n,A + ζ n,A, where n,A = A( kn−s) Xn(s), Xn We have that, on the set D n,A, for k such that s ≤ kn < t, Xn(0), Xn A( kn−s) Xn(s), Xn hence {φs,tn = η n,A} ⊃ D n,A and (6.3) {ζs,tn,A 6= 0} ⊂ Ω\D n,A ⊂ {H δ,n ≥ A}. Let p be the same as in statement 3 of Lemma 2. 
Then it follows from (6.3) and inequality 0 ≤ ζs,tn,A ≤ φs,tn (6.4) E n,A|Xn(s) = x (φs,tn ) p|Xn(s) = x δ,n ≥ A|Xn(s) = x) ] p−1 p ≤ B6(T )A−δ Similarly, one can write φs,t = η A + ζ A , where η ΨA(r−s)δ(X(s), X(r))dφ s,r , (6.5) E A |X(s) = x ≤ B6(T )A−δ We have ∣∣∣E n,A|Xn(s) = x A |X(s) = x ]∣∣∣ = ∣∣∣∣∣∣ gn(x) ]n(t−s)[−1∑ pk,n(x, y)ΨA( kn ) δ (x, y)µn(dy)− ∫ t−s pr(x, y)ΨArδ(x, y)µ(dy) dr ∣∣∣∣∣∣ ≤ δn +∆1n(x,A, s, t) + ∆2n(x,A, s, t) + ∆3n(x,A, s, t), 16 YURI N.KARTASHOV, ALEXEY M.KULIK where ]z[≡ min{N ∈ Z, N ≥ z}, ∆1n(x,A, s, t) = ∣∣∣∣∣∣ ]n(t−s)[−1∑ [pk,n(x, y)− p k (x, y)]Ψ A( kn ) δ (x, y)µn(dy) ∣∣∣∣∣∣ ∆2n(x,A, s, t) = ∣∣∣∣∣∣ ]n(t−s)[−1∑ (x, y)Ψ A( kn) δ(x, y)µn(dy)− ∫ t−s pr(x, y)ΨArδ(x, y)µn(dy) dr ∣∣∣∣∣∣ ∆3n(x,A, s, t) = ∫ t−s pr(x, y)ΨArδ (x, y)[µn(dy)− µ(dy)] dr ∣∣∣∣ . Denote ∆in(A, T ) = supx∈X,s≤t≤T ∆ n(x,A, s, t), i = 1, 2, 3. Since Ψr(x, y) ∈ [0, 1] and {Ψr(x, y) 6= 0} ⊂ {y ∈ B(x, 2r)}, ∆1n(A, T ) ≤ ]nT [−1∑ [αn + βk] x, 2A (6.6) ≤ Cθ(2A)θ · ]nT [−1∑ [αn + βk] )δθ−γ → 0, n→ +∞ by Toeplitz theorem. The function (r, x, y) 7→ pr(x, y)Ψr(x, y) is uniformly continuous over [r0,+∞)×X2 for any r0 > 0, therefore an estimate analogous to (6.6) provides that x∈X,s≤t≤T ∣∣∣∣∣∣ ]n(t−s)[−1∑ k=[r0n]+1 (x, y)Ψ A( kn) δ (x, y)µn(dy)− ∫ t−s pr(x, y)ΨArδ(x, y)µn(dy) dr ∣∣∣∣∣∣ (note that maxn µn(X) < +∞ since µn weakly converge to µ). The same arguments provide that lim sup ∆2n(A, T ) ≤ ≤ lim sup [r0n]∑  = B7(A, T )(r0)δθ−γ+1. Since r0 > 0 is arbitrary, this implies that (6.7) ∆2n(A, T ) → 0, n→ +∞. At last, the weak convergence of µn to µ and the first part of condition 3 provide that, for every t, In(A, t) ≡ sup pt(x, y)ΨArδ(x, y)[µn(dy)− µ(dy)] ∣∣∣∣→ 0, n→ +∞. Since In(A, t) ≤ Cγt−γ · Cθ(2Atδ)θ, the Lebesgue theorem of dominated convergence provides that (6.8) ∆3n(A, T ) → 0, n→ +∞. It follows from the estimates (6.4) – (6.8) that lim sup x∈X,s≤t≤T ∣∣f s,tn (x)− f s,t(x) ∣∣ ≤ 2B6(T )A−δ p , A > cθ. Taking A→ +∞ we obtain (6.2), that completes the proof. 
The theorem is proved.

In order to make our exposition complete, let us formulate a version of Theorem 2 for the recurrent case.

Theorem 3. Let conditions 1 – 4 of Theorem 2 hold true and γ < 1. Also let $\mu_n$ converge weakly to µ, and $X_n$ converge to X in distribution in $C(\mathbb{R}^+, X)$. Then $(X_n, \psi_n(X_n)) \Rightarrow (X, \phi(X))$ in the sense of convergence in distribution in $C(\mathbb{R}^+, X) \times C(\mathbb{T}, \mathbb{R}^+)$.

The proof, with slight changes, repeats the proof of Theorem 2, and is omitted. Note that, under the conditions of Theorem 3, the convergence of finite-dimensional distributions of $\phi_n$ can be obtained by the technique, mentioned in the Introduction, that was proposed by I.I. Gikhman and is based on studying the limit behavior of difference equations for the characteristic functions of $\phi_n^{s,t}$ (see, for instance, the proof of Theorem 3 in [9]). In the transient case, treated in Theorem 2, this technique cannot be applied, since uniform estimates analogous to (4.4) – (4.6) are not available in this case.

Finally, let us give an example of an application of Theorem 2. To shorten the exposition we omit the proofs of some technical details.

Example 4. Let the state space be $\mathbb{R}^d$, d ≥ 2, and let $X_n$, X be as in Example 1. Let $K \subset \mathbb{R}^d$ be a compact set for which the surface measure $\lambda_K$ is well defined by the equality
$$\lambda_K(\cdot) \equiv \text{w-}\lim_{\varepsilon \to 0+} \frac{\lambda^d(\cdot \cap K_\varepsilon)}{\lambda^d(K_\varepsilon)},$$
where w-lim means the limit in the sense of weak convergence of measures, $\lambda^d$ is Lebesgue measure on $\mathbb{R}^d$, and $K_\varepsilon \equiv \{x \mid \mathrm{dist}(x, K) \le \varepsilon\}$. Assume that the condition
$$\lambda^d(K_\varepsilon) \ge \mathrm{const} \cdot \varepsilon^{\beta}, \quad \varepsilon > 0 \qquad (6.9)$$
holds with some β < 2. In particular, the set K can be a smooth (or, more generally, Lipschitz) surface of codimension 1, or a fractal with Hausdorff-Besicovitch dimension greater than d − 2. It is not hard to verify that $\mu \equiv \lambda_K$ is a W-measure (see [1], Chapter 8.1 for the terminology), and therefore corresponds to some W-functional φ of the Wiener process X.
This functional is naturally interpreted as the local time of Wiener process at the set K, and can be written as φs,t = λK(Xr) dr. We consider the functionals φn(Xn) of the form k∈[sn,tn) 1I{Xn( k )∈K 1√ and apply Theorem 2 in order to prove convergence of the distributions in C(R+,Rd)× C(T,R+) (6.10) (Xn, ψn(Xn)) ⇒ (X,φ(X)) (ψn are the broken lines corresponding to φn). Condition 1 holds true due to Example 1, condition 2 is provided by condition (6.9) (by this condition, supx gn(x) ≤ const · n 2 ). Condition 3 holds with pt(x, y) = (2πt) 2 exp{− 1 ‖y− x‖2 } and γ = d . Condition (6.9) implies condition 6 with θ = d− β. We assume that the random walk Sn is either aperiodic on some lattice hZ d or is strongly non-lattice (i.e., Sn0 has bounded distribution density for some n0). Under this assumption, condition 4 holds with αn ≡ 0, ν = λd and νn equal to counting measures on d in lattice case or λd in strongly non-lattice case. It remains to provide conditions 5, 7. We have γ−1 = d−2 2(d−β) < . Choose some δ ∈ and consider α > 0 such that > δ and α > 2θ + 2. Suppose that (6.11) E‖ξk‖αRd < +∞. Then applying Burkholder inequality we obtain that (6.12) E‖Xn(t)−Xn(s)‖αRd ≤ const · |t− s| 2 , |t− s| ≥ 1√ , x ∈ Rd. Repeating the standard proof of the Kolmogorov’s theorem on existence of continuous modification (see, for instance [20], p. 44,45), one can deduce from (6.12) that, for ς < α, ϑ < t,s∈[0,T ],|t−s|≥ 1 ‖Xn(t)−Xn(s)‖Rd |t− s|ϑ < +∞. 18 YURI N.KARTASHOV, ALEXEY M.KULIK Finally, choosing ϑ = δ, ς > 2θ + 2 we obtain that conditions 5,7 hold with Cθ = ς . Applying Theorem 2, we obtain weak convergence (6.10) under additional moment condition (6.11). One can remove this condition using the ”cutting” procedure, described in the proof of the Proposition 3. Let us remark that for the lattice random walks the result, exposed in Example 4, was obtained in [5] by a technique, essentially different from the one proposed here. 
Convergence (6.10) in the continuous case is, as far as is known to the authors, a new result.

References
[1] Dynkin E.B. Markov processes, M.: Fizmatgiz, 1963 (in Russian).
[2] Skorokhod A.V., Slobodeniuk M.P. Limit theorems for random walks, Kiev: Naukova dumka, 1970 (in Russian).
[3] Borodin A.N., Ibragimov I.A. Limit theorems for the functionals of random walks, Proc. of the Mathematical Institute of R. Acad. Sci, vol. 195. St.-P.: Nauka, 1994 (in Russian).
[4] Revesz P. Random walk in random and nonrandom environments, World Sci. Publ. Co., Inc., Teaneck, NJ, 1990.
[5] Bass R.F., Khoshnevisan D. Local times on curves and uniform invariance principles, Prob. Theory Rel. Fields 92, 1992, p. 465 – 492.
[6] Cherny A.S., Shiryaev A.N., Yor M. Limit behavior of the ”horizontal-vertical” random walk and some extensions of the Donsker-Prokhorov invariance principle, Probability theory and its applications, vol. 47, 3, 2002, p. 498 – 517.
[7] Gikhman I.I. Some limit theorems for the number of intersections of a boundary of a given domain by a random function, Sci. notes of Kiev Un-ty, 1957, vol. 16, 10, p. 149 – 164 (in Ukrainian).
[8] Gikhman I.I. Asymptotic distributions for the number of intersections of a boundary of a domain by a random function, Visnyk of Kiev Un-ty, serie astron., mathem. and mech., 1958, v. 1, 1, p. 25 – 46 (in Ukrainian).
[9] Portenko N.I. Integral equations and limit theorems for additive functionals of Markov processes, Probability theory and its applications, 1967, v. 12, 3, p. 551 – 558 (in Russian).
[10] Portenko N.I. The development of I.I.Gikhman’s idea concerning the methods for investigating local behavior of diffusion processes and their weakly convergent sequences, Probab. Theory and Math. Stat., 1994, 50, p. 7 – 22.
[11] Kulik A.M. Markov Approximation of stable processes by random walks, vol.12(28) 2006, .1-2, p. 87 – 93.
[12] Feller W. An introduction to probability theory and its applications, Vol II, M.: Mir, 1984 (Russian, translated from W.Feller, An introduction to probability theory and its applications, John Wiley & Sons, New York, 1971).
[13] Skorokhod A.V. Studies in theory of stochastic processes, Kiev, Kiev Univ-ty publishing house, 1961 (in Russian).
[14] Jacod J., Shiryaev A. Limit theorems for stochastic processes, Springer, Berlin, 1987.
[15] Kurtz T.G., Protter Ph. Weak limit theorems for stochastic integrals and SDE’s, Annals of Probability, 1991, vol. 19, 3, p. 1035 – 1070.
[16] Yamada T., Watanabe S. On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto Univ., 1971, vol. 11, p. 156 – 167.
[17] Androshchuk T.O., Kulik A.M. Limit theorems for oscillatory functionals of a Markov process, Theory of stochastic processes, vol. 11(27), p. 3 – 13.
[18] Ibragimov I.A., Linnik Yu.V. Independent and stationary related variables, M.: Nauka, 1965 (in Russian).
[19] Kulik A.M. Difference approximation for local times of multidimensional diffusions, arXiv:math/0702175, http://arxiv.org/abs/math/0702175
[20] Skorokhod A.V. Lections on theory of stochastic processes, Kyiv: Lybid, 1990 (in Ukrainian).

E-mail address: kulik@imath.kiev.ua

1. Introduction 2. Markov approximation. 3. Main theorem 4. The local time of a random walk at a point. 5. Difference approximations of diffusion processes. 6. Invariance principle for additive functionals of Markov chains References

ABSTRACT We consider a sequence of additive functionals {\phi_n}, set on a sequence of Markov chains {X_n} that weakly converges to a Markov process X. We give a sufficient condition for such a sequence to converge in distribution, formulated in terms of the characteristics of the additive functionals, and related to Dynkin's theorem on the convergence of W-functionals.
As an application of the main theorem, a general sufficient condition for the convergence of additive functionals, in terms of the transition probabilities of the chains X_n, is proved. <|endoftext|><|startoftext|>

Introduction

Let H, K be real separable Hilbert spaces with norms $|\cdot|_H$ and $|\cdot|_K$. Let W be a cylindrical Wiener process in K defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $\{\mathcal{F}_t\}_{t\in[0,T]}$ denote its natural augmented filtration. Let $L_2(K,H)$ be the Hilbert space of Hilbert-Schmidt operators from K to H. http://arxiv.org/abs/0704.0509v1

We are interested in solving the following backward stochastic differential equation
$$dY_t = -AY_t\,dt - f(t, Y_t, Z_t)\,dt + Z_t\,dW_t, \quad 0 \le t \le T, \qquad Y_T = \xi, \qquad (1)$$
where ξ is a random variable with values in H, $f(t, Y_t, Z_t) = f_0(t, Y_t) + f_1(t, Y_t, Z_t)$ with $f_0$, $f_1$ given functions, and A is an unbounded operator with domain D(A) contained in H. The unknowns are the processes $\{Y_t\}_{t\in[0,T]}$ and $\{Z_t\}_{t\in[0,T]}$, which are required to be adapted with respect to the filtration of the Wiener process and take values in H and $L_2(K,H)$ respectively.

In the finite-dimensional framework, equations of this type were solved by Pardoux and Peng [12] in the nonlinear case. They proved an existence and uniqueness result for the solution of equation (1) when A = 0, the coefficient f(t, y, z) is Lipschitz continuous in both variables y and z, and the data ξ and the process $\{f(t, 0, 0)\}_{t\in[0,T]}$ are square integrable. Since this first result, many papers have been devoted to existence and uniqueness results under weaker assumptions. In finite dimension, when A = 0, the Lipschitz condition on the coefficient f with respect to the variable y is replaced by a monotonicity assumption; moreover, more general growth conditions in the variable y are formulated. Let us mention the contribution of Briand and Carmona [1], for a study of polynomial growth in $L^p$ with p > 2, and the work of Pardoux [11] for an arbitrary growth.
In [13] Pardoux and Rascanu deal with a BSDE involving the subdifferential of a convex function; in particular, one coefficient is not everywhere defined for y in $\mathbb{R}^k$. In other works the existence of the solution is proved when the data, ξ and the process $\{f(t, 0, 0)\}_{t\in[0,T]}$, are in $L^p$ for p ∈ (1, 2). El Karoui, Peng and Quenez [4] treat the case when f is Lipschitz continuous; in [2] this result is generalized to the case of a monotone coefficient f (both for equations on a fixed and on a random time interval), and even the case p = 1 is studied.

In the infinite-dimensional framework, Hu and Peng [6], and Oksendal and Zhang [10], give an existence and uniqueness result for the equation with an operator A that is the infinitesimal generator of a strongly continuous semigroup and a coefficient f that is Lipschitz in y and z. Pardoux and Rascanu [14] replace the operator A with the subdifferential of a convex function and assume that f is dissipative, everywhere defined and continuous with respect to y, Lipschitz with respect to z, and with linear growth in y and z.

Special results deal with backward stochastic partial differential equations (BSPDEs): we recall in particular the works of Ma and Yong [8] and [9]. Earlier, Peng [16] studied a backward stochastic partial differential equation and regarded the classical Hamilton-Jacobi-Bellman equation of optimal stochastic control as a special case of this problem.

Our work extends these results in a special direction. We consider an operator A which is the generator of an analytic contraction semigroup on H and a coefficient f(t, y, z) of the form $f_0(t, y) + f_1(t, y, z)$. The coefficient $f_1(t, y, z)$ is assumed to be bounded and Lipschitz with respect to y and z. The term $f_0(t, y)$ is defined only for y taking values in a suitable subspace $H_\alpha$ of H, and it satisfies the following growth condition for some 1 < γ < 1/α, S ≥ 0, P-a.s.:
$$|f_0(t, y)|_H \le S\bigl(1 + \|y\|_{H_\alpha}^{\gamma}\bigr) \quad \forall t \in [0, T],\ \forall y \in H_\alpha.$$
Following [6], we understand equation (1) in the following integral form:
$$Y_t - \int_t^T e^{(s-t)A}\bigl[f_0(s, Y_s) + f_1(s, Y_s, Z_s)\bigr]\,ds + \int_t^T e^{(s-t)A} Z_s\,dW_s = e^{(T-t)A}\xi, \qquad (2)$$
requiring, in particular, that Y takes values in $H_\alpha$. This generally requires that the final condition also takes values in the smaller space $H_\alpha$. We take as $H_\alpha$ a real interpolation space which belongs to the class $J_\alpha$ between H and the domain of an operator A (see Section 2). Moreover, $f_0(t, \cdot)$ is assumed to be locally Lipschitz from $H_\alpha$ into H and dissipative in H. We prove (Theorem 5) that if ξ takes its values in the closure of D(A) in $H_\alpha$ and is such that $\|\xi\|_{H_\alpha}$ is essentially bounded, then equation (2) has a unique mild solution, i.e. there exists a unique pair of progressively measurable processes $Y : \Omega \times [0, T] \to H_\alpha$, $Z : \Omega \times [0, T] \to L_2(K;H)$, satisfying P-a.s. equality (2) for every t in [0, T] and such that
$$E \sup_{t\in[0,T]} \|Y_t\|_{H_\alpha}^2 + E \int_0^T \|Z_t\|_{L_2(K,H)}^2\,dt < \infty.$$
This result extends former results concerning the deterministic case to the stochastic framework: see [7], where previous works of Fujita - Kato [5], Pazy [15] and others are collected. In these papers similar assumptions are made on the coefficients $f_0$, $f_1$ and on the operator A.

The plan of the paper is as follows. In Section 2 some notations and definitions are fixed. In Section 3 existence and uniqueness of the solution of a simplified equation are proved, where $f_1$ is a bounded progressively measurable process which does not depend on y and z. In Section 4, applying the previous result, a fixed point argument is used in order to prove our main result on existence and uniqueness of a mild solution of (2). Section 5 is devoted to applications.

2 Notations and setting

The letters K and H will always denote two real separable Hilbert spaces. The scalar product is denoted by ⟨·, ·⟩; $L_2(K;H)$ is the separable Hilbert space of Hilbert-Schmidt operators from K to H endowed with the Hilbert-Schmidt norm.
$W = \{W_t\}_{t\in[0,T]}$ is a cylindrical Wiener process with values in K, defined on a complete probability space $(\Omega, \mathcal{F}, P)$. $\{\mathcal{F}_t\}_{t\in[0,T]}$ is the natural filtration of W, augmented with the family of P-null sets of $\mathcal{F}$. Next we define several classes of stochastic processes with values in a Banach space X.
• $L^2(\Omega \times [0, T]; X)$ denotes the space of measurable X-valued processes Y such that $E \int_0^T |Y_\tau|^2\,d\tau$ is finite, identified up to modification.
• $L^2(\Omega; C([0, T]; X))$ denotes the space of continuous X-valued processes Y such that $E \sup_{\tau\in[0,T]} |Y_\tau|^2$ is finite, identified up to indistinguishability.
• $C^\alpha([0, T]; X)$ denotes the space of α-Hölderian functions on [0, T] with values in X such that
$$[f]_\alpha = \sup_{0 \le x < y \le T} \frac{|f(x) - f(y)|_X}{(y - x)^\alpha} < \infty.$$

A linear operator $A : D(A) \subset X \to X$ is called sectorial if there exist constants $\omega \in \mathbb{R}$, $\theta \in (\pi/2, \pi)$, $M > 0$ such that
$$\text{(i)}\ \rho(A) \supseteq S_{\theta,\omega} = \{\lambda \in \mathbb{C} : \lambda \ne \omega,\ |\arg(\lambda - \omega)| < \theta\}, \qquad \text{(ii)}\ \|(\lambda I - A)^{-1}\|_{L(X)} \le \frac{M}{|\lambda - \omega|} \quad \forall \lambda \in S_{\theta,\omega}, \qquad (3)$$
where ρ(A) is the resolvent set of A. For every t > 0, (3) allows us to define a linear bounded operator $e^{tA}$ in X, by means of the Dunford integral
$$e^{tA} = \frac{1}{2\pi i} \int_{\omega + \gamma_{r,\eta}} e^{t\lambda} (\lambda I - A)^{-1}\,d\lambda, \quad t > 0, \qquad (4)$$
where r > 0, η ∈ (π/2, π) and $\gamma_{r,\eta}$ is the curve $\{\lambda \in \mathbb{C} : |\arg \lambda| = \eta, |\lambda| \ge r\} \cup \{\lambda \in \mathbb{C} : |\arg \lambda| \le \eta, |\lambda| = r\}$, oriented counterclockwise. We also set $e^{0A}x = x$, ∀x ∈ X. Since the function $\lambda \mapsto e^{t\lambda} R(\lambda, A)$ is holomorphic in $S_{\theta,\omega}$, the definition of $e^{tA}$ is independent of the choice of r and η.

If A is sectorial, the function $[0, +\infty) \to L(X)$, $t \mapsto e^{tA}$, with $e^{tA}$ defined by (4), is called the analytic semigroup generated by A in X. We note that for every x ∈ X the function $t \mapsto e^{tA}x$ is analytic (and hence continuous) for t > 0. $e^{tA}$ is a strongly continuous semigroup if and only if D(A) is dense in X; in particular this holds if X is a reflexive space.

We need to introduce suitable classes of subspaces of X.

Definition 1. Let (α, p) be two numbers such that 0 < α < 1, 1 ≤ p ≤ ∞ or (α, p) = (1, ∞). Then we denote with $D_A(\alpha, p)$ the space
$$D_A(\alpha, p) = \{x \in X : t \mapsto v(t) = \|t^{1-\alpha-1/p} A e^{tA} x\| \in L^p(0, 1)\},$$
where $\|x\|_{D_A(\alpha,p)} = \|x\|_X + [x]_\alpha = \|x\|_X + \|v\|_{L^p(0,1)}$. (We set as usual 1/∞ = 0.)
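Definition 1 can be explored numerically in a toy setting. The sketch below is not from the paper: it assumes X = ℓ², a diagonal operator A with eigenvalues −k² (k = 1, ..., K), and the test vector x with coefficients k^(−3/2); the function names and the grid are hypothetical choices. It approximates the D_A(α, ∞) seminorm sup over 0 < t ≤ 1 of t^(1−α)‖A e^{tA} x‖. Every truncation gives a finite value, so the only signal is whether that value stabilises or keeps growing as K increases; a continuum approximation of the sum suggests the borderline for this x is near α = 1/2.

```python
import math

def da_seminorm(coeffs, eigs, alpha, n_grid=200, t_min_exp=-6.0):
    """Approximate sup over a log-spaced grid of t in [1e-6, 1] of
    t^(1-alpha) * ||A e^{tA} x||, for the diagonal operator
    A e_k = -eigs[k] e_k (assumed toy model) and x = coeffs."""
    best = 0.0
    for j in range(n_grid + 1):
        t = 10.0 ** (t_min_exp * (1.0 - j / n_grid))  # runs from 1e-6 up to 1
        norm_sq = 0.0
        for lam, c in zip(eigs, coeffs):
            term = lam * math.exp(-t * lam) * c  # k-th coefficient of A e^{tA} x, up to sign
            norm_sq += term * term
        best = max(best, t ** (1.0 - alpha) * math.sqrt(norm_sq))
    return best

def truncated_example(K, s=1.5):
    """Hypothetical test data: eigenvalues k^2 and coefficients k^(-s), k = 1..K."""
    eigs = [float(k * k) for k in range(1, K + 1)]
    coeffs = [k ** (-s) for k in range(1, K + 1)]
    return coeffs, eigs

if __name__ == "__main__":
    for alpha in (0.25, 0.75):
        values = [da_seminorm(*truncated_example(K), alpha) for K in (100, 1000)]
        print(alpha, values)
```

For α = 0.25 the two truncations should agree closely, while for α = 0.75 the value should grow markedly with K, which is the finite-dimensional shadow of x failing to belong to D_A(0.75, ∞).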
We recall here some estimates for the function $t \mapsto e^{tA}$ when t → 0, which we will use in the sequel. For convenience, in the next proposition we set $D_A(0, p) = X$, p ∈ [1, ∞].

Proposition 1. Let (α, p), (β, p) ∈ (0, 1) × [1, +∞] ∪ {(1, ∞)}, α ≤ β. Then there exists C = C(p; α, β) such that
$$\|t^{-\alpha+\beta} e^{tA}\|_{L(D_A(\alpha,p), D_A(\beta,p))} \le C, \quad 0 < t \le 1.$$

Definition 2. Let 0 ≤ α ≤ 1 and let D, X be Banach spaces, D ⊂ X. A Banach space Y such that D ⊂ Y ⊂ X is said to belong to the class $J_\alpha$ between X and D if there is a constant C such that
$$\|x\|_Y \le C \|x\|_X^{1-\alpha} \|x\|_D^{\alpha}, \quad \forall x \in D.$$
In this case we write $Y \in J_\alpha(X, D)$.

Now we give the definition of solution to the BSDE
$$Y_t - \int_t^T e^{(s-t)A}\bigl[f_0(s, Y_s) + f_1(s, Y_s, Z_s)\bigr]\,ds + \int_t^T e^{(s-t)A} Z_s\,dW_s = e^{(T-t)A}\xi. \qquad (5)$$

Definition 3. A pair of progressively measurable processes (Y, Z) is called a mild solution of (5) if it belongs to the space $L^2(\Omega; C([0, T]; H_\alpha)) \times L^2(\Omega \times [0, T]; L_2(K,H))$ and P-a.s. solves the integral equation (5) on the interval [0, T].

We finally state a lemma needed in the sequel. It is a generalization of the well-known Gronwall lemma. Its proof is given in the Appendix.

Lemma 1. Assume a, b, α, β are nonnegative constants, with α < 1, β > 0 and 0 < T < ∞. For any nonnegative process $U \in L^1(\Omega \times [0, T])$ satisfying P-a.s.
$$U_t \le a(T - t)^{-\alpha} + b \int_t^T (s - t)^{\beta-1} E^{\mathcal{F}_t} U_s\,ds$$
for almost every t ∈ [0, T], it holds P-a.s. that $U_t \le aM(T - t)^{-\alpha}$ for almost every t ∈ [0, T], where M is a constant depending only on b, α, β, T.

3 A simplified equation

As a preparation for the study of (2), in this section we consider the following simplified version of that equation:
$$Y_t - \int_t^T e^{(s-t)A}\bigl[f_0(s, Y_s) + f_1(s)\bigr]\,ds + \int_t^T e^{(s-t)A} Z_s\,dW_s = e^{(T-t)A}\xi, \qquad (6)$$
for all t ∈ [0, T]. We suppose that the following assumptions hold.

Hypothesis 2.
1. $A : D(A) \subset H \to H$ is a sectorial operator. We also assume that A is dissipative, i.e. it satisfies ⟨Ay, y⟩ ≤ 0, ∀y ∈ D(A);
2. for some 0 < α < 1 there exists a Banach space $H_\alpha$ continuously embedded in H and such that (i) $D_A(\alpha, 1) \subset H_\alpha \subset D_A(\alpha, \infty)$; (ii) the part of A in $H_\alpha$ is sectorial in $H_\alpha$.
3.
the final condition ξ is an $\mathcal{F}_T$-measurable random variable defined on Ω with values in the closure of D(A) with respect to the $H_\alpha$-norm. We denote this set by $\overline{D(A)}$ (closure taken in $H_\alpha$). Moreover ξ belongs to $L^\infty(\Omega, \mathcal{F}_T, P; H_\alpha)$;
4. $f_0 : \Omega \times [0, T] \times H_\alpha \to H$ satisfies:
i) $\{f_0(t, y)\}_{t\in[0,T]}$ is progressively measurable ∀y ∈ $H_\alpha$;
ii) there exist constants S > 0, 1 < γ < 1/α such that P-a.s. $|f_0(t, y)|_H \le S(1 + \|y\|_{H_\alpha}^{\gamma})$, t ∈ [0, T], y ∈ $H_\alpha$;
iii) for every R > 0 there is $L_R > 0$ such that P-a.s. $|f_0(t, y_1) - f_0(t, y_2)|_H \le L_R \|y_1 - y_2\|_{H_\alpha}$ for t ∈ [0, T] and $y_i \in H_\alpha$ with $\|y_i\|_{H_\alpha} \le R$;
iv) there exists a number µ ∈ R such that P-a.s., ∀t ∈ [0, T], $y_1, y_2 \in H_\alpha$,
$$\langle f_0(t, y_1) - f_0(t, y_2), y_1 - y_2 \rangle_H \le \mu |y_1 - y_2|_H^2; \qquad (7)$$
5. $f_1 : \Omega \times [0, T] \to H$ is progressively measurable and for some constant C > 0 it satisfies P-a.s. $|f_1(t)|_H \le C$, for t ∈ [0, T].

Remark 1. We note that the pair (Y, Z) solves the BSDE (6) with final condition ξ and drift $f = f_0 + f_1$ if and only if the pair $(\bar{Y}_t, \bar{Z}_t) := (e^{\lambda t} Y_t, e^{\lambda t} Z_t)$ is a solution of the same equation with final condition $e^{\lambda T}\xi$ and drift $f'(t, y) := f_0'(t, y) + f_1'(t)$, where $f_0'(t, y) = e^{\lambda t}(f_0(t, e^{-\lambda t} y) - \lambda y)$ and $f_1'(t) = e^{\lambda t} f_1(t)$. If we choose µ = λ, then $f_0'$ satisfies the same assumptions as $f_0$, but with (7) replaced by $\langle f_0'(t, y_1) - f_0'(t, y_2), y_1 - y_2 \rangle_H \le 0$. If this last condition holds, then $f_0$ is called dissipative. Hence, without loss of generality, we shall assume until the end that $f_0$ is dissipative, or equivalently that µ = 0 in (7).

3.1 A priori estimates

We prove a basic estimate for the solution in the norm of H.

Proposition 2. Suppose that Hypothesis 2 holds; if (Y, Z) is a mild solution of (6) on the interval [a, T], 0 ≤ a ≤ T, then there exists a constant $C_1$, which depends only on $\|\xi\|_{L^\infty(\Omega;H)}$ and on the constants S of 4.ii) and C of 5., such that P-a.s. $\sup_{a \le t \le T} \|Y_t\|_H \le C_1$. In particular the constant $C_1$ is independent of a.

Proof. Let the pair $(Y, Z) \in L^2(\Omega; C([a, T]; H_\alpha)) \times L^2(\Omega \times [a, T]; L_2(K;H))$ satisfy (6). Let us introduce the operators $J_n = n(nI - A)^{-1}$, n > 0.
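The behaviour of the resolvent operators J_n just introduced can be checked in a finite-dimensional toy model. The sketch below is an illustration under assumptions that are not in the paper: A is taken diagonal on a truncated ℓ² with eigenvalues −k², and the helper names are hypothetical. In that setting J_n has eigenvalues n/(n + k²) and the Yosida approximation A J_n = n(J_n − I) has eigenvalues −n k²/(n + k²), so one can observe directly that J_n x approaches x, that ‖A J_n x‖ ≤ n‖x‖, and that A J_n stays dissipative.

```python
import math

def resolvent_Jn(n, eigs, x):
    """Apply J_n = n (nI - A)^{-1} for the diagonal toy operator
    A e_k = -eigs[k] e_k with eigs[k] >= 0 (an assumed model, not the paper's A)."""
    return [n / (n + lam) * c for lam, c in zip(eigs, x)]

def yosida_AJn(n, eigs, x):
    """Apply the Yosida approximation A J_n = n (J_n - I);
    in this model its eigenvalues are -n * lam / (n + lam)."""
    return [-n * lam / (n + lam) * c for lam, c in zip(eigs, x)]

def l2_norm(v):
    """Euclidean norm of a coefficient vector."""
    return math.sqrt(sum(c * c for c in v))

if __name__ == "__main__":
    eigs = [float(k * k) for k in range(1, 501)]
    x = [1.0 / k for k in range(1, 501)]
    for n in (1.0, 1e2, 1e4, 1e6):
        err = l2_norm([a - b for a, b in zip(resolvent_Jn(n, eigs, x), x)])
        print(n, err, l2_norm(yosida_AJn(n, eigs, x)))
```

The printed error ‖J_n x − x‖ should shrink as n grows, while the norm of A J_n x stays below n‖x‖; both facts are what the limiting argument in the proof relies on.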
We note that the operators AJn are the Yosida approximations of A and they are bounded. Moreover |Jnx − x| → 0 as n → ∞, for every x ∈ H. We set Y nt = JnYt, Z t = JnZt. It is readily verified that Y n admits the Itô differential dY nt = −AY t dt− Jnf(t, Yt)dt− Jnf1(t)dt+ Z t dWt, and Y T = Jnξ. Applying the Ito formula to |Y nt | H , using the dissipativity of A, we obtain |Y nt | ||Zns || L2(K;H)ds ≤ |Jnξ| H + 2 < Jnf0(s, Ys), Y s >H ds+ < Jnf1(s), Y s >H ds− 2 < Y ns , Z s dWs >H . We note that < Jnf0(s, Ys) + Jnf1(s), Y s >H ds → < f0(s, Ys) + f1(s), Ys >H ds by dominated convergence, as n → ∞. Moreover by the dominated convergence theorem we have ||(Zns ) ∗Y ns − Z sYs|| Kds → 0 P-a.s. and it follows that < Y ns , Z s dWs >H→ < Ys, ZsdWs >H in probability. If we let n → ∞ in (8) we obtain ||Zs|| L2(K;H)ds ≤ |ξ| H + 2 < f0(s, Ys) + f1(s), Ys >H ds < Ys, ZsdWs >H . Recalling (7), that we assume to hold with µ = 0, it follows that ||Zs|| L2(K,H) ≤ ≤ |ξ|2H + 2 < f0(s, 0), Ys >H +2 < f1(s), Ys >H ds+ < Ys, ZsdWs >H ≤ |ξ|2H + |f(s, 0)|2Hds+ |f1(s)| Hds + 2 < Ys, ZsdWs >H . Now, since sup0≤t≤T |f(t, 0)| H ≤ S 2 and since the stochastic integral < Ys, ZsdWs >H , t ∈ [a, T ] is a martingale, if we take the conditional expectation given Ft we have H ≤ E Ft |ξ|2H + 2E |f(s, 0)|2Hds+ E |f1(s)| ≤ |ξ|2L∞(Ω,H) + (S 2 + C2)T + 2 Ft |Ys| Since Y belongs to L2(Ω;C([a, T ];Hα)) and, consequently, ||Y || L1(Ω× [0, T ]), we can apply Lemma 1 to |Y |2H and conclude that H ≤ (|ξ| L∞(Ω,H) + [S2 + C2]T )(1 + 2Te2T ). Now we will show that the result of Proposition 2, together with the growth condition satisfied by f0, yields an a priori estimate on the solution in the Hα-norm. Let 0 < α < 1 and let γ > 1 be given by 4.ii). We fix θ = αγ and consider the Banach space DA(θ,∞) introduced in Definition 1. 
It is easy to check (see [7]) that, if we take θ ∈ (0, 1), θ > α, then Hα contains DA(θ,∞) and belongs to the class Jα/θ between DA(θ,∞) and H, hence the following inequality is satisfied: |x|Hα ≤ c|x| DA(θ,∞) H , x ∈ DA(θ,∞). (9) Proposition 3. Suppose that Hypothesis 2 is satisfied. Let (Y,Z) be a mild solution of (6) in [a, T ], a ≥ 0 and assume that there exists two constants R > 0 and K > 0, possibly depending on a, such that, P-a.s., t∈[a,T ] ||Yt||Hα ≤ R, sup t∈[a,T ] |Yt|H ≤ K. (10) Then the following inequality holds P-a.s.: |Yt|L∞(Ω,DA(θ,∞)) ≤ C2 (T − t)θ−α , a ≤ t < T (11) with C2 depending on the operator A, ||ξ||L∞(Ω,Hα), θ, α, K, C of 5. and S of 4.ii) of Hypothesis 2. Proof. Taking the conditional expectation given Ft in equation (6) we find Yt = E e(T−t)Aξ + e(s−t)A[f0(s, Ys) + f1(s)]ds , a ≤ t ≤ T. Consequently, we have ||Yt||DA(θ,∞) ≤ E Ft ||e(T−t)Aξ||DA(θ,∞) ||e(s−t)A[f0(s, Ys) + f1(s)]||DA(θ,∞)ds, a ≤ t ≤ T . Since Hα ⊂ DA(α,∞), we have Ft ||e(T−t)Aξ||DA(θ,∞) ≤ ≤ EFt ||e(T−t)A||L(DA(α,∞),DA(θ,∞))||ξ||L∞(Ω,DA(α,∞)) (T − t)θ−α ||ξ||L∞(Ω,Hα), with C0 = C0(α, θ,∞), where in the last inequality we use Proposition 1. Moreover ||e(s−t)A[f0(s, Ys) + f1(s)]||DA(θ,∞)ds ≤ ≤ EFt ||e(s−t)A||L(H,DA(θ,∞))|f0(s, Ys) + f1(s)|Hds ≤ ≤ EFt (s− t)θ [|f0(s, Ys)|H + |f1(s)|H ]ds ≤ EFt (s− t)θ [S(1 + ||Ys|| ) + C]ds. In the inequality we used Hypotheses 4.ii) and 5. and Proposition 1. Re- calling (9), we conclude that the last term is dominated by (s − t)θ S(1 + c|Ys| γ(1−α)/θ H ||Ys|| DA(θ,∞) ) + C = EFt (s − t)θ S(1 + c|Ys| γ(1−α)/θ H ||Ys||DA(θ,∞)) + C by choosing θ = αγ. By the second inequality in (10) this can be estimated (s− t)θ S(1 + cKγ(1−α)/θEFt ||Ys||DA(θ,∞) + C)ds (s− t)θ (C + S)ds + (s− t)θ ScKγ(1−α)/θEFt ||Ys||DA(θ,∞)ds. Hence by (13) and (14) it follows ||Yt||DA(θ,∞) ≤ (T − t)θ−α ||ξ||L∞(Ω,Hα) + (s− t)θ (C + S)ds (s− t)θ ScKγ(1−α)/θEFt ||Ys||DA(θ,∞)ds, and (11) follows from Lemma 1. 
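The role of the choice θ = αγ in this proof can be spelled out in a short worked computation. It assumes that the interpolation inequality (9) has the form dictated by Definition 2 for the class $J_{\alpha/\theta}$ between H and $D_A(\theta,\infty)$; the exponents below are our reading of that definition, and the resulting constant may be written differently from the one displayed in the text. Only the linearity in $\|Y_s\|_{D_A(\theta,\infty)}$ matters for the argument.

```latex
% Interpolation inequality (9), assumed form:
|x|_{H_\alpha} \le c\,\|x\|_{D_A(\theta,\infty)}^{\alpha/\theta}\,|x|_{H}^{1-\alpha/\theta}.
% Raising to the power \gamma and using |Y_s|_H \le K from (10):
\|Y_s\|_{H_\alpha}^{\gamma}
  \le c^{\gamma}\,\|Y_s\|_{D_A(\theta,\infty)}^{\gamma\alpha/\theta}\,K^{\gamma(1-\alpha/\theta)}.
% With \theta = \alpha\gamma:
\frac{\gamma\alpha}{\theta} = 1, \qquad
\gamma\Bigl(1-\frac{\alpha}{\theta}\Bigr) = \gamma - 1,
\qquad\text{so}\qquad
\|Y_s\|_{H_\alpha}^{\gamma} \le c^{\gamma} K^{\gamma-1}\,\|Y_s\|_{D_A(\theta,\infty)}.
% The bound for U_t := \|Y_t\|_{D_A(\theta,\infty)} is then linear in U:
U_t \le \frac{a}{(T-t)^{\theta-\alpha}} + b\int_t^T (s-t)^{-\theta}\,E^{\mathcal F_t}U_s\,ds ,
% which matches Lemma 1 with exponent \theta-\alpha < 1 in place of \alpha
% and \beta = 1-\theta > 0 (the bounded term \int_t^T (C+S)(s-t)^{-\theta}ds
% is absorbed into a, since (T-t)^{-(\theta-\alpha)} \ge T^{-(\theta-\alpha)}).
```

Seen this way, the assumption 1 < γ < 1/α is exactly what makes α < θ < 1, so that both the smoothing estimate of Proposition 1 and the integrable kernel $(s-t)^{-\theta}$ required by Lemma 1 are available.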
In order to justify the application of Lemma 1, we need to prove that ||Y ||DA(θ,∞) belongs to L 1(Ω × [a, T ]). This also follows from(13) and (14) since, for some constant K1, ||Yt||DA(θ,∞) ≤ (T − t)θ−α ||ξ||L∞(Ω,Hα) + E Ft [ sup s∈[a,T ] (1 + ||Ys|| (s − t)θ (T − t)θ−α ||ξ||L∞(Ω,Hα) + (1 +R (s − t)θ 3.2 Local existence and uniqueness We prove that, under Hypothesis 2, there exists a unique solution of (6) on an interval [T − δ, T ] with δ sufficiently small. To treat the ordinary integral in the left hand side of (6), we need the following result, whose proof can be found in [7], Proposition 4.2.1 and Lemma 7.1.1. Lemma 3. Let φ ∈ L∞((a, T );H), 0 < a < T and set v(t) = e(s−t)Aφ(s)ds, a ≤ t ≤ T. If 0 < α < 1, then v ∈ C1−α([a, T ];DA(α, 1)) and there is G0 > 0, not depending on a, such that ||v||C1−α([a,T ];DA(α,1)) ≤ G0||φ||L∞((a,T );H). Since DA(α, 1) ⊂ Hα, we also have v ∈ C 1−α([a, T ];Hα) and there is G > 0, not depending on a, such that ||v||C1−α([a,T ];Hα) ≤ G||φ||L∞((a,T );H). Theorem 4. Let us assume that Hypothesis 2 holds, except possibly 4.iv). Then there exists δ > 0 such that the equation (6) has a unique local mild solution (Y,Z) ∈ L2(Ω;C([T − δ, T ];Hα))× L 2(Ω× [T − δ, T ];L2(K;H)). Remark 2. The dissipativity condition 4.iv) only plays a role in obtaining the a priori estimate in H (Proposition 2) and consequently global existence, as we will see later. Proof. Let Mα := sup0≤t≤T ||e tA||L(Hα). We fix a positive number R such that R ≥ 2Mα||ξ||L∞(Ω;Hα). This implies that sup0≤t≤T ||e tAξ||Hα ≤ R/2 P-a.s. Moreover, let LR be such that |f0(t, y1)− f0(t, y2)|H ≤ LR||y1 − y2||Hα 0 ≤ t ≤ T, ||yi||Hα ≤ R We recall that the space L2(Ω;C([T − δ, T ];Hα)) is a Banach space en- dowed with the norm Y → E supt∈[T−δ,T ] ||Yt|| . We define K = {Y ∈ L2(Ω;C([T − δ, T ],Hα)) : sup t∈[T−δ,T ] ||Yt||Hα ≤ R, a.s.}. It easy to check that K is a closed subset of L2(Ω;C([T − δ, T ],Hα)), hence a complete metric space (with the inherited metrics). 
We look for a local mild solution (Y,Z) in the space K. We define a nonlinear operator Γ : K → K as follows: given U ∈ K, Y = Γ(U) is the first component of the mild solution (Y,Z) of the equation e(s−t)A[f0(s, Us)ds+ f1(s)]ds+ e(s−t)AZsdWs = e (T−t)Aξ (15) for t ∈ [T − δ, T ]. Since U ∈ K we have P-a.s. |f0(t, Ut) + f1(t)|H ≤ S(1 + ||Ut|| ) + C ≤ S(1 +Rγ) + C, (16) for all t in [T − δ, T ]. Hence f0(·, U·)+ f1(·) belongs to L 2(Ω× [T − δ, T ];H) and, by a result of Hu and Peng [6], there exists a unique pair (Y,Z) ∈ L2(Ω× [T − δ, T ];H)× L2(Ω× [T − δ, T ];L2(K;H)) satisfying (15). More- over, by taking the conditional expectation given Ft, Y has the following representation Yt = E e(T−t)Aξ + e(s−t)A[f0(s, Us) + f1(s)]ds We will show that Γ is a contraction for the norm of L2(Ω, C([T − δ, T ];Hα) and maps K into itself, if δ is sufficiently small; clearly, its unique fixed point is the required solution of the BSDE. We first check the contraction property. Let U1, U2 ∈ K. Then Γ(U1)t − Γ(U 2)t = Y t − Y t = E e(s−t)A(f0(s, U s )− f0(s, U Let v(t) = e(s−t)A f0(s, U s )− f0(s, U ds. Then, noting that v(T ) = 0 and recalling Lemma 3, for t ∈ [T − δ, T ] ||Y 1t − Y t ||Hα = = ||EFtv(t)||Hα ≤ E Ft ||v(t)||Hα ≤ δ1−αEFt ||v||C(1−α)([T−δ,T ],Hα) ≤ Gδ(1−α)EFt ||f0(·, U · )− f0(·, U · )||L∞([T−δ,T ],H) ≤ Gδ(1−α)LRE Ft sup t∈[T−δ,T ] ||U1t − U t ||Hα =: Mt, where {Mt, t ∈ [T − δ, T ]} is a martingale. Hence, by Doob’s inequality E sup t∈[T−δ,T ] ||Y 1t − Y ≤ E sup t∈[T−δ,T ] 2 ≤ 2E|MT | = 2G2L2Rδ 2(1−α) E sup t∈[T−δ,T ] ||U1t − U If δ ≤ δ0 = 2GLR (1−α) , then Γ is a contraction with constant 1/2. Next we check that Γ mapsK into itself. 
For each U ∈ K and t ∈ [T−δ, T ] with δ ≤ δ0 we have t∈[T−δ,T ] ||Γ(U)t||Hα = sup t∈[T−δ,T ] ||Yt||Hα ≤ sup t∈[T−δ,T ] Ft ||e(T−t)Aξ||Hα+ + sup t∈[T−δ,T ] Ft || e(s−t)A[f0(s, Us) + f1(s)]ds||Hα ≤ R/2 + sup t∈[T−δ,T ] ||e(s−t)A[f0(s, Us) + f1(s)]||Hαds ≤ R/2 + sup t∈[T−δ,T ] ||e(s−t)A[f0(s, Us) + f1(s)]||DA(α,1)ds, where in the last inequality we have used the fact that DA(α, 1) ⊂ Hα. Now, by Proposition 1, and from 4.ii) and 5., it follows that ||e(s−t)A[f0(s, Us) + f1(s)]||DA(α,1) ≤ ≤ ||e(s−t)A||L(H,DA(α,1))|f0(s, Us) + f1(s)|H (s− t)α [S(1 + ||Us|| ) + C]. Then, since U ∈ K, we arrive at t∈[T−δ,T ] ||Γ(U)t||Hα ≤ ≤ R/2 + sup t∈[T−δ,T ] (s− t)α [S(1 + ||Us|| ) + C]ds ≤ R/2 + sup t∈[T−δ,T ] (s− t)α [S(1 +Rγ) + C]ds ≤ R/2 + CαS [(1 +Rγ) + C] δ1−α, where Cα depends on A, α. Hence, if δ ≤ δ0 is such that CαS [(1+Rγ )+C] is less or equal to R/2, then sup t∈[T−δ,T ] ||Γ(U)t||Hα ≤ R. Due to Lemma 3, P- a.s. the function t 7→ Yt−E Fte(T−t)Aξ belongs to C[T−δ, T ];Hα); moreover, the map t 7→ EFte(T−t)Aξ belongs to C[T − δ, T ];Hα), since ξ is a random variable taking values in D(A) . Therefore, P-a.s. Y· ∈ C([T − δ, T ];Hα) and Γ maps K into itself and has a unique fixed point in K. Remark 3. By Lemma 3, using properties of analytic semigroups, it can be proved that for every fixed ω the range of the map Γ is contained in C1−β([T − δ, T − ǫ];DA(β, 1)) for every ǫ ∈ (0, δ), β ∈ [0, 1]. 3.3 Global existence Now we are able to prove a global existence theorem for the solution of the equation (6), using all the results presented above. Theorem 5. If Hypothesis 2 is satisfied, the equation (6) has a unique mild solution (Y,Z) ∈ L2(Ω;C([0, T ],Hα))× L 2(Ω× [0, T ]);L2(K;H)). Proof. By Theorem 4 equation (6) has a unique mild solution (Y 1, Z1) ∈ L2(Ω;C([T − δ1, T ],Hα)) × L 2(Ω × [T − δ1, T ]);L 2(K;H)) on the interval [T − δ1, T ], for some δ1 > 0. By Proposition 2 we know that there exists a constant C1 such that P-a.s. |YT−δ1 |H ≤ C1. 
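(17) We recall that the constant $C_1$ depends only on $|\xi|_{L^\infty(\Omega;H)}$ and on the constants $S$ of 4.ii) and $C$ of 5., and is independent of $\delta_1$. Moreover, by Proposition 3, there exists a constant $C_2$ such that P-a.s.
\[ \|Y_{T-\delta_1}\|_{L^\infty(\Omega,D_A(\theta,\infty))} \le \frac{C_2}{\delta_1^{\theta-\alpha}}, \tag{18} \]
with $C_2$ depending on the operator $A$, $\|\xi\|_{L^\infty(\Omega,H_\alpha)}$, $\theta$, $\alpha$, $C_1$. This implies that $Y_{T-\delta_1}$ belongs to $L^\infty(\Omega;H_\alpha)$ and it can be taken as final value for the problem
\[ Y_t - \int_t^{T-\delta_1} e^{(s-t)A}[f_0(s,Y_s)+f_1(s)]\,ds + \int_t^{T-\delta_1} e^{(s-t)A}Z_s\,dW_s = e^{(T-\delta_1-t)A}Y_{T-\delta_1} \tag{19} \]
on an interval $[T-\delta_1-\delta_2, T-\delta_1]$, for some $\delta_2 > 0$. As in the proof of Theorem 4, we fix a positive number $R_2$ such that
\[ R_2 = 2M_\alpha \frac{C_2}{\delta_1^{\theta-\alpha}} \ge 2M_\alpha \|Y_{T-\delta_1}\|_{L^\infty(\Omega,D_A(\theta,\infty))}. \]
By Theorem 4 there exists a pair of progressively measurable processes $(Y^2, Z^2)$ in $L^2(\Omega;C([T-\delta_1-\delta_2, T-\delta_1];H_\alpha)) \times L^2(\Omega\times[T-\delta_1-\delta_2, T-\delta_1];L_2(K,H))$ which solves (19) on the interval $[T-\delta_1-\delta_2, T-\delta_1]$, where $\delta_2$ depends on the operator $A$, $\alpha$, $R_2$. We note that the continuity of $Y^2$ at $T-\delta_1$ follows from the fact that $Y_{T-\delta_1}$ takes values in $D_A(\alpha,1)$ (see Remark 3), so that $Y_{T-\delta_1}$ takes values in $\overline{D(A)}$ (closure in $H_\alpha$). Now, the process $Y_t$ defined by $Y^1_t$ on the interval $[T-\delta_1, T]$ and by $Y^2_t$ on $[T-\delta_1-\delta_2, T-\delta_1]$ belongs to $L^2(\Omega;C([T-\delta_1-\delta_2, T];H_\alpha))$ and it is easy to see that it satisfies (6) in the whole interval $[T-\delta_1-\delta_2, T]$. Consequently, by Proposition 2, P-a.s., $|Y_{T-\delta_1-\delta_2}|_H \le C_1$ with $C_1$ the constant in (17), and by (18)
\[ \|Y_{T-\delta_1-\delta_2}\|_{L^\infty(\Omega,D_A(\theta,\infty))} \le \frac{C_2}{(\delta_1+\delta_2)^{\theta-\alpha}}, \tag{20} \]
where $C_2$ is the same constant as in (18). Again, $Y_{T-\delta_1-\delta_2}$ can be taken as final value for the problem
\[ Y_t - \int_t^{T-\delta_1-\delta_2} e^{(s-t)A}[f_0(s,Y_s)+f_1(s)]\,ds + \int_t^{T-\delta_1-\delta_2} e^{(s-t)A}Z_s\,dW_s = e^{(T-\delta_1-\delta_2-t)A}Y_{T-\delta_1-\delta_2} \tag{21} \]
on the interval $[T-\delta_1-\delta_2-\delta_3, T-\delta_1-\delta_2]$, where $\delta_3$ will be fixed later. In this case, by (20), we can choose
\[ R_3 = R_2 = 2M_\alpha \frac{C_2}{\delta_1^{\theta-\alpha}} \ge 2M_\alpha \|Y_{T-\delta_1-\delta_2}\|_{L^\infty(\Omega,D_A(\theta,\infty))} \]
and prove that there exists a unique mild solution $(Y^3, Z^3)$ of (21) on the interval $[T-\delta_1-\delta_2-\delta_3, T-\delta_1-\delta_2]$, with $\delta_3 = \delta_2$. So we extend the solution to $[T-\delta_1-2\delta_2, T]$.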
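The stepping argument above can be illustrated on a toy problem. The sketch below is not the infinite-dimensional BSDE: it assumes a deterministic scalar analogue with $Z \equiv 0$, $A = -a$ (with $a \ge 0$), $f_0(y) = -y^3$ and $f_1 = 0$, so that the mild equation becomes $y(t) = e^{-a(T-t)}\xi - \int_t^T e^{-a(s-t)} y(s)^3\,ds$. The names `picard_step` and `solve_backward` are hypothetical. Each local solve is a Picard (contraction) iteration on a short interval; because the a priori bound $|y(t)| \le |\xi|$ holds on the whole interval, the same radius $R$ and the same step length can be reused at every restart, which is the mechanism the proof relies on.

```python
import math

def picard_step(y_end, t_end, delta, a, n=20, tol=1e-12, max_iter=200):
    """One local solve on [t_end - delta, t_end] by Picard iteration.

    Toy mild equation (assumed: Z = 0, A = -a, f0(y) = -y**3, f1 = 0):
        y(t) = exp(-a (t_end - t)) y_end - int_t^{t_end} exp(-a (s - t)) y(s)**3 ds
    Returns the value of the fixed point at the left endpoint.
    """
    h = delta / n
    ts = [t_end - delta + i * h for i in range(n + 1)]
    u = [y_end] * (n + 1)                      # initial guess: constant path
    for _ in range(max_iter):
        new = []
        for i, t in enumerate(ts):
            g = [math.exp(-a * (ts[k] - t)) * u[k] ** 3 for k in range(i, n + 1)]
            integral = h * (sum(g) - 0.5 * (g[0] + g[-1]))   # trapezoid rule
            new.append(math.exp(-a * (t_end - t)) * y_end - integral)
        change = max(abs(p - q) for p, q in zip(new, u))
        u = new
        if change < tol:
            break
    return u[0]

def solve_backward(xi, T, a):
    """Glue local solutions from T down to 0 with one fixed step length."""
    R = 2.0 * abs(xi)                # radius of the ball, valid at every restart
    if R == 0.0:
        return 0.0
    # y -> y**3 is Lipschitz with constant 3 R**2 on |y| <= R, so a step of
    # length 1 / (6 R**2) gives a contraction constant of at most 1/2.
    delta0 = 1.0 / (6.0 * R * R)
    steps = math.ceil(T / delta0)
    delta = T / steps
    y, t = xi, T
    for _ in range(steps):
        y = picard_step(y, t, delta, a)
        t -= delta
    return y
```

For $a = 0$ the toy equation has the closed-form solution $y(t) = \xi / \sqrt{1 + 2\xi^2 (T - t)}$, which gives a simple way to sanity-check the glued solution at $t = 0$.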
Proceeding this way we prove the global existence of a solution to (6) on $[0, T]$.

4 The general case

We can now study the equation
\[ Y_t - \int_t^T e^{(s-t)A}[f_0(s,Y_s) + f_1(s,Y_s,Z_s)]\,ds + \int_t^T e^{(s-t)A}Z_s\,dW_s = e^{(T-t)A}\xi. \tag{22} \]
We require that the function $f_1$ satisfy the following assumptions:

Hypothesis 6.
1. there exists $K \ge 0$ such that P-a.s. $|f_1(t,y,z) - f_1(t,y',z')|_H \le K|y-y'|_H + K\|z-z'\|_{L_2(K;H)}$, for every $t \in [0,T]$, $y, y' \in H$, $z, z' \in L_2(K;H)$;
2. there exists $C \ge 0$ such that P-a.s. $|f_1(t,y,z)|_H \le C$, for every $t \in [0,T]$, $y \in H$, $z \in L_2(K;H)$.

Theorem 7. If Hypotheses 2 and 6 hold, then equation (22) has a unique solution in $L^2(\Omega;C([0,T];H_\alpha)) \times L^2(\Omega\times[0,T];L_2(K;H))$.

Proof. Let $\mathcal{M}$ be the space of progressive processes $(Y,Z)$ in the space $L^2(\Omega\times[0,T];H) \times L^2(\Omega\times[0,T];L_2(K;H))$ endowed with the norm
\[ |||(Y,Z)|||^2_\beta = E\int_0^T e^{\beta s}\left(|Y_s|^2_H + \|Z_s\|^2_{L_2(K;H)}\right)ds, \]
where $\beta$ will be fixed later. We define $\Phi : \mathcal{M} \to \mathcal{M}$ as follows: given $(U,V) \in \mathcal{M}$, $(Y,Z) = \Phi(U,V)$ is the unique solution on the interval $[0,T]$ of the equation
\[ Y_t - \int_t^T e^{(s-t)A}[f_0(s,Y_s) + f_1(s,U_s,V_s)]\,ds + \int_t^T e^{(s-t)A}Z_s\,dW_s = e^{(T-t)A}\xi. \]
By Theorem 5 the above equation has a unique mild solution $(Y,Z)$ which belongs to $L^2(\Omega;C([0,T];H_\alpha)) \times L^2(\Omega\times[0,T];L_2(K;H))$. Therefore $\Phi(\mathcal{M}) \subset \mathcal{M}$. We will show that $\Phi$ is a contraction for a suitable choice of $\beta$; clearly, its unique fixed point is the required solution of (22). We take two pairs $(U^1,V^1)$, $(U^2,V^2) \in \mathcal{M}$, write $(Y^i,Z^i) = \Phi(U^i,V^i)$, and apply Proposition 3.1 in [3] to the difference of the two equations. We obtain
\[ E\int_0^T e^{\beta s}\left[\beta|Y^1_s - Y^2_s|^2_H + \|Z^1_s - Z^2_s\|^2_{L_2(K;H)}\right]ds \]
\[ \le 2E\int_0^T e^{\beta s}\left\langle f_0(s,Y^1_s) + f_1(s,U^1_s,V^1_s) - f_0(s,Y^2_s) - f_1(s,U^2_s,V^2_s),\ Y^1_s - Y^2_s\right\rangle_H ds \]
\[ \le 2E\int_0^T e^{\beta s}K\left(|U^1_s - U^2_s|_H + \|V^1_s - V^2_s\|_{L_2(K;H)}\right)|Y^1_s - Y^2_s|_H\,ds \]
\[ \le E\int_0^T e^{\beta s}\left[\left(|U^1_s - U^2_s|^2_H + \|V^1_s - V^2_s\|^2_{L_2(K;H)}\right)/2 + 4K^2|Y^1_s - Y^2_s|^2_H\right]ds, \]
where we have used 4.iv) of Hypothesis 2 and 1. of Hypothesis 6. Choosing $\beta = 4K^2 + 1$, we obtain the required contraction property.

5 Applications

In this section we present some backward stochastic partial differential problems which can be solved with our techniques.
5.1 The reaction-diffusion equation Let D be an open and bounded subset of Rn with a smooth boundary ∂D. We choose K = L2(D). This choice implies that dWt/dt is the so-called ”space-time white noise”. Moreover, since Hilbert-Schmidt op- erators on L2(D) are represented by square integrable kernels, the space L2(L2(D), L2(D)) can be identified with L2(D ×D). We are given a com- plete probability space (Ω,F ,P) with a filtration (Ft)t∈[0,T ] generated by W and augmented in the usual way. Let us consider a non symmetric bilinear, coercive continuous form a : H10 (D) × H 0 (D) → R defined by a(u, v) := − i,j aij(x)Diu(x)Djv(x)dx, where the coefficients aij are Lipschitz continuous and there exists α > 0 such that i,j=1 aij(x)ξiξj ≥ α|ξ|2 for every x ∈ D, ξ ∈ Rn. Let A be the operator associated with the bi- linear form a such that < Au, v >L2(D)= a(u, v), v ∈ H 0 (D) and u ∈ D(A). It is known that, in this case, D(A) = H2(D) ∩H10 (D), where H 2(D) and H10 (D) are the usual Sobolev spaces. We consider for t ∈ [0, T ] and x ∈ D the backward stochastic problem written formally ∂Y (t, x) = AY (t, x) + r(Y (t, x)) + g(t, Y (t, x), Z(t, x), x)+ + Z(t, x) ∂W (t, x) on Ω× [0, T ]× D̄ Y (T, x) = ξ(x) on Ω× D̄ Y (t, x) = 0 on Ω× [0, T ]× ∂D We suppose the following. Hypothesis 8. 1. r : R → R is a continuous, increasing and locally Lipschitz function; 2. r satisfies the following growth condition: |r(x)| ≤ S(1 + |x|γ) ∀x ∈ R for some γ > 1; 3. g is a measurable real function defined on [0, T ] × R × L2(D × D) ×D and there exists a constant K > 0 such that |g(t, y1, z1, x)− g(t, y2, z2, x)| ≤ K(|y1 − y2|+ ||z1 − z2||L2(D×D)) for all t ∈ [0, T ], y1, y2 ∈ R, z1, z2 ∈ L 2(D), x ∈ D; 4. there exists a real function h in L2(D×D) such that P-a.s. |g(t, y, z, x)| ≤ K1h(x) for all t ∈ [0, T ], y ∈ R, z ∈ L 2(D), x ∈ D; 5. ξ belongs to L∞(Ω;H2(D) ∩H10 (D)). We define the operator A by (Ay)(x) = Ay(x) with domain D(A) = H2(D) ∩ H10 (D). 
We set f0(t, y)(x) = −r(y(t, x)) for t ∈ [0, T ], x ∈ D and y in a suitable subspace of H which will be determined below. For t ∈ [0, T ], x ∈ D, y ∈ L2(D), z ∈ L2(D × D) we define f1 as the operator f1(t, y, z)(x) = −g(t, y(t, x), z(t, x), x). Then problem (23) can be written in abstract way as dYt = −AYtdt− f0(t, Yt)dt− f1(t, Yt, Zt)dt+ ZtdWt, YT = ξ. Under the conditions in Hypothesis 8, the assumptions in Hypotheses 2, 6 are satisfied. The operator A is a closed operator in L2(D) and it is the infinitesimal generator of an analytic semigroup in L2(D) satisfying ‖etA‖L(H) ≤ 1 (see [17], Chapter 3). In particular, by Lumer-Philips theo- rem, A is dissipative. The non linear function f0(t, ·) : L 2γ(D) → L2(D), y 7→ −r(y) is locally Lipschitz. We look for a space of class Jα between H and D(A) where f0 is well defined and locally Lipschitz. It is well known (see [18]) that the fractional order Sobolev space W β,2(D) is of class Jβ/2 between L2(D) and H2(D) for every β ∈ (0, 2). Hence the space Hα defined by Hα = W β,2(D) if β < 1, by W β,2(D) ∩H10 (D) if β ≥ 1 is of class Jβ/2 between H and D(A). Moreover the restriction of A on Hα is a sectorial operator ([18]). By the Sobolev embedding theorem, W β,2 is contained in Lq(D) for all q if β ≥ n , and in L2n/(n−2β)(D) if β < n . If we choose β ∈ (0, 2) we have W β,2(D) ⊂ L2γ(D) for n < 4 . It is clear that f0 is locally Lipschitz with respect to y from Hα into H. It is easy to verify that f0 satisfies 4.ii) of Hypothesis 2 with γ = 2n + 1 and that it is dis- sipative with constant µ = 0. The function f1 is Lipschitz uniformly with respect to y and z and it is bounded. The final condition ξ takes values in and belongs to L∞(Ω;Hα). Hence we can apply the global exis- tence theorem and state that the above problem has a unique mild solution (Y,Z) ∈ L2(Ω;C([0, T ];Hα))× L 2(Ω× [0, T ];L2(K,H)). 5.2 A spin system Let Z be the one-dimensional lattice of integers. Its elements will be inter- preted as atoms. 
A configuration is a real function $y$ defined on $\mathbb{Z}$. The value $y(n)$ of the configuration $y$ at the point $n$ can be viewed as the state of the atom $n$. We consider an infinite system of equations
\[ dY^n_t = -a_n Y^n_t\,dt + \sum_{|n-j|\le 1} V(Y^n_t - Y^j_t)\,dt + Z^n_t\,dW^n_t, \quad n \in \mathbb{Z},\ 0 \le t \le T; \qquad Y^n_T = \xi_n, \quad n \in \mathbb{Z}, \tag{24} \]
where $Y^n$ and $Z^n$ are real processes, and $V : \mathbb{R} \to \mathbb{R}$. Let $l^2(\mathbb{Z})$ be the usual Hilbert space of square summable sequences. To study system (24) we apply the results of the previous sections. To fit the assumptions in Hypotheses 2 and 6, we suppose the following.

Hypothesis 9.
1. $W^n$, $n \in \mathbb{Z}$, are independent standard real Wiener processes;
2. $a = \{a_n\}_{n\in\mathbb{Z}}$ is a sequence of nonnegative real numbers;
3. $\xi = \{\xi_n\}_{n\in\mathbb{Z}}$ is a random variable belonging to $L^\infty(\Omega, l^2(\mathbb{Z}))$;
4. the function $V : \mathbb{R} \to \mathbb{R}$ is defined by $V(x) = x^{2k+1}$, $k \in \mathbb{N}$.

We will study system (24) regarded as a backward stochastic evolution equation for $t \in [0,T]$
\[ dY_t = (AY_t + f_0(t,Y_t))\,dt + Z_t\,dW_t, \qquad Y_T = \xi \tag{25} \]
on a properly chosen Hilbert space $H$ of functions on $\mathbb{Z}$. To reformulate problem (24) in the abstract form (25), we set $K = H = l^2(\mathbb{Z})$. We set $W_t = \{W^n_t\}_{n\in\mathbb{Z}}$, $t \in [0,T]$. By 1. of Hypothesis 9, $W$ is a cylindrical Wiener process in $H$ defined on $(\Omega, \mathcal{F}, P)$. We define the operator
\[ A(y) = (a_n y_n)_n, \qquad D(A) = \Big\{y \in l^2(\mathbb{Z}) \ \text{such that}\ \sum_{n\in\mathbb{Z}} a_n^2 y_n^2 < \infty\Big\}. \]
It is easy to prove that $A$ is a self-adjoint operator in $l^2(\mathbb{Z})$, hence the infinitesimal generator of a sectorial semigroup. The coefficient $f_0$ is given by
\[ (f_0(t,y))_n = V(y_{n+1} - y_n) + V(y_{n-1} - y_n), \quad t \in [0,T],\ y \in D(f_0), \]
where $D(f_0) = \{y \in l^2(\mathbb{Z}) \ \text{such that}\ \sum_{n\in\mathbb{Z}} |y_{n+1} - y_n|^{2(2k+1)} < +\infty\}$. Under Hypothesis 9, $A$, $f_0$, $\xi$ satisfy Hypotheses 2 and 6. We observe that in this case the domain of $f_0$ is the whole space $H$: if $y \in l^2(\mathbb{Z})$ then
\[ \Big\{\sum_{n\in\mathbb{Z}} |y_{n+1} - y_n|^{2(2k+1)}\Big\}^{\frac{1}{2(2k+1)}} \le \Big\{\sum_{n\in\mathbb{Z}} |y_{n+1} - y_n|^2\Big\}^{\frac{1}{2}} \le 2\|y\|_{l^2(\mathbb{Z})}. \]
Consequently, we can take $H_\alpha$ with $\alpha = 0$, i.e. $H_0 = H$. The function $f_0$ is dissipative.
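A quick numerical spot check of this dissipativity claim may be helpful before the analytic argument. The sketch below is only an illustration under stated assumptions: the infinite lattice is replaced by a periodic chain with a finite number of sites (on which summation by parts still applies), configurations are drawn at random, and the helper names `f0`, `dissipativity_pairing` and `worst_pairing` are hypothetical. It evaluates the pairing $\langle f_0(y) - f_0(y'), y - y'\rangle$ and records the largest value found, which should never be positive.

```python
import random

def f0(y, k):
    """(f0(y))_n = V(y_{n+1} - y_n) + V(y_{n-1} - y_n) with V(x) = x**(2k+1).

    Indices are taken modulo len(y): a periodic truncation of the lattice Z
    (an assumption made only for this finite illustration).
    """
    n, p = len(y), 2 * k + 1
    return [(y[(i + 1) % n] - y[i]) ** p + (y[(i - 1) % n] - y[i]) ** p
            for i in range(n)]

def dissipativity_pairing(y, y_prime, k):
    """Inner product <f0(y) - f0(y'), y - y'> on the truncated lattice."""
    fy, fyp = f0(y, k), f0(y_prime, k)
    return sum((a - b) * (u - v) for a, b, u, v in zip(fy, fyp, y, y_prime))

def worst_pairing(sites=12, k=1, trials=200, seed=0):
    """Largest pairing over random pairs of configurations."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        y = [rng.uniform(-1.0, 1.0) for _ in range(sites)]
        yp = [rng.uniform(-1.0, 1.0) for _ in range(sites)]
        worst = max(worst, dissipativity_pairing(y, yp, k))
    return worst
```

A random search of this kind cannot prove the inequality; it only fails to find a counterexample. The analytic verification follows.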
Namely < f0(t, y)− f0(t, y ′), y − y′ >l2(Z) = {[(yn+1 − yn) (2k+1) + (yn−1 − yn) (2k+1)]+ +[(y′n+1 − y (2k+1) + (y′n−1 − y (2k+1)])[yn − y [(yn+1 − yn) (2k+1) − (y′n+1 − y (2k+1)][(yn+1 − yn)− (y n+1 − y and the last term is negative. Moreover, f0 satisfies 4.ii) of Hypothesis 2 with γ = 2k + 1. The map f0 is also locally Lipschitz from H in to H. Then by Theorem 7, problem (25) has a unique mild solution (Y,Z) which belongs to L2(Ω, C([0, T ];H)) × L2(Ω× [0, T ];L2(K,H)). 6 Appendix This section is devoted to the proof of Lemma 1. Assume first that β = 1. Using recursively the inequality Ut ≤ a(T − t) −α + b FtUsds we can easily prove that ≤ a(T − t)−α + (r − t)k−1 (k − 1)! (T − r)α +bEFt (b(r − t))n−1 (n− 1)! Urdr. The last term in the above inequality tends to zero as n tends to infinity for each t in the interval [0, T ]. Thus Ut ≤ a(T − t) −α + a bk(T − t)k−1 (k − 1)! (T − r)α ≤ a(T − t)−α + abeb(T−t) (T − r)α ≤ a(T − t)−α + abeb(T−t) (T − t)1−α ≤ a(T − t)−αM where M = 1 + bebT 1 In the case β 6= 1 a similar proof can be given, based on recursive use of the inequality Ut ≤ a(T − t) −α + b (s − t)β−1EFtUsds. Acknowledgments: I wish to thank Giuseppe Da Prato for hospitality at the Scuola Normale Superiore in Pisa, suggestions and helpful discussions. I would like to express my gratitude to Marco Fuhrman: I am indebted to him for his precious help and encouragement. Special thanks are due to Alessandra Lunardi, who gave me valuable advice and support. References [1] Ph. Briand and R. Carmona. BSDEs with polynomial growth generators. J. Appl. Math. Stochastic Anal., 13(3):207–238, 2000. [2] Ph. Briand and B. Deylon and Y. Hu and E. Pardoux and L. Stoica. Lp solutions of backward stochastic differential equations. Stochastic Process. Appl., 108(1):109–129, 2003. [3] F. Confortola. Dissipative backward stochastic differential equations in infinite dimensions. Infinite Dimensional Analysis, Quantum Probability and Related Topics, 9 (1):155–168, 2006. [4] N. 
El Karoui, S. G. Peng and M. C. Quenez. Backward stochastic differential equations in finance. Math. Finance, 7(1):1–71, 1997.
[5] H. Fujita and T. Kato. On the Navier-Stokes initial value problem I. Arch. Rational Mech. Anal., 16:269–315, 1964.
[6] Y. Hu and S. G. Peng. Adapted solution of a backward semilinear stochastic evolution equation. Stochastic Anal. Appl., 9(4):445–459, 1991.
[7] A. Lunardi. Analytic semigroups and optimal regularity in parabolic problems, volume 16 of Progress in Nonlinear Differential Equations and their Applications. Birkhäuser Verlag, Basel, 1995.
[8] J. Ma and J. Yong. Adapted solution of a degenerate backward SPDE, with applications. Stochastic Process. Appl., 70:59–84, 1997.
[9] J. Ma and J. Yong. On linear, degenerate backward stochastic partial differential equations. Probab. Theory Related Fields, 113:135–170, 1999.
[10] B. Oksendal and T. Zhang. On backward stochastic partial differential equations, 2001. Preprint.
[11] E. Pardoux. BSDEs, weak convergence and homogenization of semilinear PDEs. Nonlinear analysis, differential equations and control (Montreal, QC, 1998), 503–549, NATO Sci. Ser. C Math. Phys. Sci., 528, Kluwer Acad. Publ., Dordrecht, 1999.
[12] É. Pardoux and S. Peng. Adapted solution of a backward stochastic differential equation. Systems and Control Lett., 14:55–61, 1990.
[13] E. Pardoux and A. Răşcanu. Backward stochastic differential equations with subdifferential operator and related variational inequalities. Stochastic Process. Appl., 76(2):191–215, 1998.
[14] E. Pardoux and A. Răşcanu. Backward stochastic variational inequalities. Stochastics Stochastics Rep., 67(3-4):159–167, 1999.
[15] A. Pazy. Semigroups of linear operators and applications to partial differential equations. Springer-Verlag, 1983.
[16] S. Peng. Stochastic Hamilton-Jacobi-Bellman equations. SIAM J. Control Optim., 30:284–304, 1992.
[17] H. Tanabe. Equations of evolution. Monographs and Studies in Mathematics, 6.
Pitman (Advanced Publishing Program), Boston, Mass.-London, 1979. [18] H. Triebel. Interpolation Theory, Function Spaces, Differential Op- erators vol. 18 of North-Holland Mathematical Library North-Holland Publishing Co., Amsterdam-New York, 1978 ABSTRACT In this paper we study a class of backward stochastic differential equations (BSDEs) of the form dY(t)= -AY(t)dt -f_0(t,Y(t))dt -f_1(t,Y(t),Z(t))dt + Z(t)dW(t) on the interval [0,T], with given final condition at time T, in an infinite dimensional Hilbert space H. The unbounded operator A is sectorial and dissipative and the nonlinearity f_0(t,y) is dissipative and defined for y only taking values in a subspace of H. A typical example is provided by the so-called polynomial nonlinearities. Applications are given to stochastic partial differential equations and spin systems. <|endoftext|><|startoftext|> Introduction b - DM coincidence from Affleck-Dine leptogenesis Lepton asymmetry Baryon asymmetry and LSP production from Q-balls A solution to the missing satellite problem and the cusp problem Concluding remarks Acknowledgments References ABSTRACT We show that axinos, which are dominantly generated by the decay of the next-to-lightest supersymmetric particles produced from the leptonic $Q$-ball ($L$-ball), become warm dark matter suitable for the solution of the missing satellite problem and the cusp problem. In addition, $\Omega_b - \Omega_{DM}$ coincidence is naturally explained in this scenario. <|endoftext|><|startoftext|> A UNIFIED APPROACH TO SIC-POVMs AND MUBs Olivier Albouy and Maurice R. Kibler Université de Lyon, Institut de Physique Nucléaire, Université Lyon 1 and CNRS/IN2P3, 43 bd du 11 novembre 1918, F–69622 Villeurbanne, France Electronic mail: o.albouy@ipnl.in2p3.fr, m.kibler@ipnl.in2p3.fr Abstract A unified approach to (symmetric informationally complete) positive op- erator valued measures and mutually unbiased bases is developed in this arti- cle. 
The approach is based on the use of Racah unit tensors for the Wigner-Racah algebra of SU(2) ⊃ U(1). Emphasis is put on similarities and differences between SIC-POVMs and MUBs.

Keywords: finite-dimensional Hilbert spaces; mutually unbiased bases; positive operator valued measures; SU(2) ⊃ U(1) Wigner-Racah algebra

1 INTRODUCTION

The importance of finite-dimensional spaces for quantum mechanics is well recognized (see for instance [1]-[3]). In particular, such spaces play a major role in quantum information theory, especially for quantum cryptography and quantum state tomography [4]-[27]. Along this vein, a symmetric informationally complete (SIC) positive operator valued measure (POVM) is a set of operators acting on a finite Hilbert space [4]-[14] (see also [3] for an infinite Hilbert space) and mutually unbiased bases (MUBs) are specific bases for such a space [15]-[27].

The introduction of POVMs goes back to the seventies [4]-[7]. The most general quantum measurement is represented by a POVM. In the present work, we will be interested in SIC-POVMs, for which the statistics of the measurement allows the reconstruction of the quantum state. Moreover, those POVMs are endowed with an extra symmetry condition (see definition in Sec. 2). The notion of MUBs (see definition in Sec. 3), implicit or explicit in the seminal works of [15]-[18], has been the object of numerous mathematical and physical investigations during the last two decades in connection with the so-called complementary observables. Unfortunately, the question of whether SIC-POVMs exist, and of how many MUBs exist, for a given Hilbert space of finite dimension $d$ has remained open.

The aim of this note is to develop a unified approach to SIC-POVMs and MUBs based on a complex vector space of higher dimension, viz. $d^2$ instead of $d$.
We then give a specific example of this approach grounded on the Wigner-Racah algebra of the chain SU(2) ⊃ U(1) recently used for a study of entanglement of rotationally invariant spin systems [28] and for an angular momentum study of MUBs [26, 27].

Most of the notations in this work are standard. Let us simply mention that $I$ is the identity operator, the bar indicates complex conjugation, $A^\dagger$ denotes the adjoint of the operator $A$, $\delta_{a,b}$ stands for the Kronecker symbol for $a$ and $b$, and $\Delta(a,b,c)$ is 1 or 0 according as $a$, $b$ and $c$ satisfy or not the triangular inequality.

2 SIC-POVMs

Let $\mathbb{C}^d$ be the standard Hilbert space of dimension $d$ endowed with its usual inner product denoted by $\langle\,|\,\rangle$. As is usual, we will identify a POVM with a nonorthogonal decomposition of the identity. Thus, a discrete SIC-POVM is a set $\{P_x : x = 1, 2, \dots, d^2\}$ of $d^2$ nonnegative operators $P_x$ acting on $\mathbb{C}^d$, such that:

• they satisfy the trace or symmetry condition
\[ \mathrm{Tr}(P_x P_y) = \frac{1}{d+1}, \quad x \neq y; \tag{1} \]
moreover, we will assume the operators $P_x$ are normalized, thus completing this condition with
\[ \mathrm{Tr}(P_x^2) = 1; \tag{2} \]

• they form a decomposition of the identity
\[ \frac{1}{d}\sum_{x=1}^{d^2} P_x = I; \tag{3} \]

• they satisfy a completeness condition: the knowledge of the probabilities $p_x$ defined by $p_x = \mathrm{Tr}(P_x\rho)$ is sufficient to reconstruct the density matrix $\rho$.

Now, let us develop each of the operators $P_x$ on an orthonormal (with respect to the Hilbert–Schmidt product) basis $\{u_i : i = 1, 2, \dots, d^2\}$ of the space of linear operators on $\mathbb{C}^d$:
\[ P_x = \sum_{i=1}^{d^2} v_i(x)\,u_i, \tag{4} \]
where the operators $u_i$ satisfy $\mathrm{Tr}(u_i^\dagger u_j) = \delta_{i,j}$. The operators $P_x$ are thus considered as vectors
\[ v(x) = (v_1(x), v_2(x), \dots, v_{d^2}(x)) \tag{5} \]
in the Hilbert space $\mathbb{C}^{d^2}$ of dimension $d^2$ and the determination of the operators $P_x$ is equivalent to the determination of the components $v_i(x)$ of $v(x)$. In this language, the trace property (1) together with the normalization condition (2) give
\[ v(x)\cdot v(y) = \frac{1}{d+1}\left(d\,\delta_{x,y} + 1\right), \tag{6} \]
where $v(x)\cdot v(y) = \sum_{i=1}^{d^2} \overline{v_i(x)}\,v_i(y)$ is the usual Hermitian product in $\mathbb{C}^{d^2}$.

In order to compare Eq.
(6) with what usually happens in the search for SIC-POVMs, we suppose from now on that the operators $P_x$ are rank-one operators. Therefore, by putting
\[ P_x = |\Phi_x\rangle\langle\Phi_x| \tag{7} \]
with $|\Phi_x\rangle \in \mathbb{C}^d$, the trace property (1, 2) reads
\[ |\langle\Phi_x|\Phi_y\rangle|^2 = \frac{1}{d+1}\left(d\,\delta_{x,y} + 1\right). \tag{8} \]
From this point of view, finding $d^2$ operators $P_x$ is equivalent to finding $d^2$ vectors $|\Phi_x\rangle$ in $\mathbb{C}^d$ satisfying Eq. (8).

At the price of an increase in the number of components from $d^3$ (for $d^2$ vectors in $\mathbb{C}^d$) to $d^4$ (for $d^2$ vectors in $\mathbb{C}^{d^2}$), we have got rid of the square modulus and are left with a single scalar product (compare Eqs. (6) and (8)), which may prove suitable for another way to search for SIC-POVMs. Moreover, our relation (6) is independent of any hypothesis on the rank of the operators $P_x$. In fact, there exist many relations among these $d^4$ coefficients that decrease the effective number of coefficients to be found and give structural constraints on them. Those relations are highly sensitive to the choice of the basis $\{u_i : i = 1, 2, \dots, d^2\}$ and we are going to exhibit an example of such a set of relations by choosing the basis to consist of Racah unit tensors.

The cornerstone of this approach is to identify $\mathbb{C}^d$ with a subspace $\varepsilon(j)$ of constant angular momentum $j = (d-1)/2$. Such a subspace is spanned by the set $\{|j,m\rangle : m = -j, -j+1, \dots, j\}$, where $|j,m\rangle$ is an eigenvector of the square and the z-component of a generalized angular momentum operator. Let $u^{(k)}$ be the Racah unit tensor [29] of order $k$ (with $k = 0, 1, \dots, 2j$) defined by its $2k+1$ components $u^{(k)}_q$ (where $q = -k, -k+1, \dots, k$) through
\[ u^{(k)}_q = \sum_{m=-j}^{j}\sum_{m'=-j}^{j} (-1)^{j-m} \begin{pmatrix} j & k & j \\ -m & q & m' \end{pmatrix} |j,m\rangle\langle j,m'|, \tag{9} \]
where $(\cdots)$ denotes a 3–jm Wigner symbol. For fixed $j$, the $(2j+1)^2$ operators $u^{(k)}_q$ (with $k = 0, 1, \dots, 2j$ and $q = -k, -k+1, \dots, k$) act on $\varepsilon(j) \sim \mathbb{C}^d$ and form a basis of the Hilbert space $\mathbb{C}^N$ of dimension $N = (2j+1)^2$, the inner product in $\mathbb{C}^N$ being the Hilbert–Schmidt product.
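The relation between Eqs. (6) and (8) can be checked numerically in the smallest case $d = 2$. The sketch below is an illustration only and rests on assumptions that are not part of the text: it uses the well-known tetrahedral qubit SIC-POVM (four projectors whose Bloch vectors form a regular tetrahedron), and it takes the normalized identity and Pauli matrices as the Hilbert–Schmidt orthonormal operator basis $\{u_i\}$ instead of Racah unit tensors. The function names are hypothetical. It computes the vectors $v(x)$ of Eq. (5) and compares both $v(x)\cdot v(y)$ and $\mathrm{Tr}(P_xP_y)$ with $(d\,\delta_{x,y}+1)/(d+1)$.

```python
import math

SQ2 = math.sqrt(2.0)

# Bloch vectors of a regular tetrahedron (assumed example of a d = 2 SIC-POVM).
BLOCH = [
    (0.0, 0.0, 1.0),
    (2 * SQ2 / 3, 0.0, -1 / 3),
    (-SQ2 / 3, math.sqrt(2 / 3), -1 / 3),
    (-SQ2 / 3, -math.sqrt(2 / 3), -1 / 3),
]

def projector(n):
    """Rank-one projector (I + n . sigma) / 2 for a unit Bloch vector n."""
    x, y, z = n
    return [[complex(1 + z) / 2, complex(x, -y) / 2],
            [complex(x, y) / 2, complex(1 - z) / 2]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

# Orthonormal operator basis for the Hilbert-Schmidt product: (I, X, Y, Z) / sqrt(2).
PAULIS = ([[1, 0], [0, 1]], [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])
BASIS = [[[complex(c) / SQ2 for c in row] for row in m] for m in PAULIS]

def coefficients(P):
    """Components v_i = Tr(u_i^dagger P) of an operator P, as in Eq. (4)."""
    return [trace(matmul(dagger(u), P)) for u in BASIS]

def hermitian_dot(v, w):
    return sum(a.conjugate() * b for a, b in zip(v, w))

def sic_deviations(d=2):
    """Largest deviations from Eq. (6) and from the trace form of Eq. (8)."""
    projectors = [projector(n) for n in BLOCH]
    vectors = [coefficients(P) for P in projectors]
    dev6 = dev8 = 0.0
    for x in range(d * d):
        for y in range(d * d):
            target = (d * (x == y) + 1) / (d + 1)
            dev6 = max(dev6, abs(hermitian_dot(vectors[x], vectors[y]) - target))
            dev8 = max(dev8, abs(trace(matmul(projectors[x], projectors[y])) - target))
    return dev6, dev8
```

For rank-one projectors $\mathrm{Tr}(P_xP_y) = |\langle\Phi_x|\Phi_y\rangle|^2$, so the second deviation tests Eq. (8) without constructing the state vectors explicitly.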
The formulas (involving unit tensors, 3–jm and 6–j symbols) relevant for this work are given in Appendix (see also [29] to [31]). We must remember that those Racah operators are not normalized to unity (see relation (46)). So this will generate an extra factor when defining $v_i(x)$.

Each operator $P_x$ can be developed as a linear combination of the operators $u^{(k)}_q$. Hence, we have
\[ P_x = \sum_{k=0}^{2j}\sum_{q=-k}^{k} c_{kq}(x)\,u^{(k)}_q, \tag{10} \]
where the unknown expansion coefficients $c_{kq}(x)$ are a priori complex numbers. The determination of the operators $P_x$ is thus equivalent to the determination of the coefficients $c_{kq}(x)$, which are formally given by
\[ c_{kq}(x) = (2k+1)\,\overline{\langle\Phi_x|u^{(k)}_q|\Phi_x\rangle}, \tag{11} \]
as can be seen by multiplying each member of Eq. (10) by the adjoint of $u^{(\ell)}_p$ and then using Eq. (46) of Appendix. By defining the vector
\[ v(x) = (v_1(x), v_2(x), \dots, v_N(x)), \quad N = (2j+1)^2 \tag{12} \]
via
\[ v_i(x) = \frac{1}{\sqrt{2k+1}}\,c_{kq}(x), \quad i = k^2 + k + q + 1, \tag{13} \]
the following properties and relations are obtained.

• The first component $v_1(x)$ of $v(x)$ does not depend on $x$ since
\[ c_{00}(x) = \frac{1}{\sqrt{2j+1}} \tag{14} \]
for all $x \in \{1, 2, \dots, (2j+1)^2\}$. Proof: Take the trace of Eq. (10) and use Eq. (48) of Appendix.

• The components $v_i(x)$ of $v(x)$ satisfy the complex conjugation property described by
\[ \overline{c_{kq}(x)} = (-1)^q\,c_{k-q}(x) \tag{15} \]
for all $x \in \{1, 2, \dots, (2j+1)^2\}$, $k \in \{0, 1, \dots, 2j\}$ and $q \in \{-k, -k+1, \dots, k\}$. Proof: Use the Hermitian property of $P_x$ and Eq. (43) of Appendix.

• In terms of $c_{kq}$, Eq. (6) reads
\[ \sum_{k=0}^{2j}\frac{1}{2k+1}\sum_{q=-k}^{k} \overline{c_{kq}(x)}\,c_{kq}(y) = \frac{1}{2(j+1)}\left[(2j+1)\,\delta_{x,y} + 1\right] \tag{16} \]
for all $x, y \in \{1, 2, \dots, (2j+1)^2\}$, where the sum over $q$ is SO(3) rotationally invariant. Proof: The proof is trivial.

• The coefficients $c_{kq}(x)$ are solutions of the nonlinear system given by
\[ \frac{1}{2K+1}\,c_{KQ}(x) = (-1)^{2j-Q}\sum_{k=0}^{2j}\sum_{\ell=0}^{2j}\sum_{q=-k}^{k}\sum_{p=-\ell}^{\ell} \begin{pmatrix} k & \ell & K \\ -q & -p & Q \end{pmatrix} \begin{Bmatrix} k & \ell & K \\ j & j & j \end{Bmatrix} c_{kq}(x)\,c_{\ell p}(x) \tag{17} \]
for all $x \in \{1, 2, \dots, (2j+1)^2\}$, $K \in \{0, 1, \dots, 2j\}$ and $Q \in \{-K, -K+1, \dots, K\}$. Proof: Consider $P_x^2 = P_x$ and use the coupling relation (51) of Appendix involving a 3–jm and a 6–j Wigner symbols.
As a corollary of the latter property, by taking $K = 0$ and using Eqs. (47) and (50) of Appendix, we get again the normalization relation $\|v(x)\|^2 = v(x)\cdot v(x) = 1$.

• All coefficients $c_{kq}(x)$ are connected through the sum rule
\[ \sum_{x=1}^{(2j+1)^2}\sum_{k=0}^{2j}\sum_{q=-k}^{k} c_{kq}(x) \begin{pmatrix} j & k & j \\ -m & q & m' \end{pmatrix} = (-1)^{j-m}(2j+1)\,\delta_{m,m'}, \tag{18} \]
which turns out to be useful for global checking purposes. Proof: Take the $jm$–$jm'$ matrix element of the resolution of the identity in terms of the operators $P_x/(2j+1)$.

3 MUBs

A complete set of MUBs in the Hilbert space $\mathbb{C}^d$ is a set of $d(d+1)$ vectors $|a\alpha\rangle \in \mathbb{C}^d$ such that
\[ |\langle a\alpha|b\beta\rangle|^2 = \delta_{\alpha,\beta}\,\delta_{a,b} + \frac{1}{d}\,(1 - \delta_{a,b}), \tag{19} \]
where $a = 0, 1, \dots, d$ and $\alpha = 0, 1, \dots, d-1$. The indices of type $a$ refer to the bases and, for fixed $a$, the index $\alpha$ refers to one of the $d$ vectors of the basis corresponding to $a$. We know that such a complete set exists if $d$ is a prime or the power of a prime (e.g., see [16]-[24]).

The approach developed in Sec. 2 for SIC-POVMs can be applied to MUBs too. Let us suppose that it is possible to find $d+1$ sets $S_a$ (with $a = 0, 1, \dots, d$) of vectors in $\mathbb{C}^d$, each set $S_a = \{|a\alpha\rangle : \alpha = 0, 1, \dots, d-1\}$ containing $d$ vectors $|a\alpha\rangle$ such that Eq. (19) be satisfied. This amounts to finding $d(d+1)$ projection operators
\[ \Pi_{a\alpha} = |a\alpha\rangle\langle a\alpha| \tag{20} \]
satisfying the trace condition
\[ \mathrm{Tr}(\Pi_{a\alpha}\Pi_{b\beta}) = \delta_{\alpha,\beta}\,\delta_{a,b} + \frac{1}{d}\,(1 - \delta_{a,b}), \tag{21} \]
where the trace is taken on $\mathbb{C}^d$. Therefore, they also form a nonorthogonal decomposition of the identity
\[ \frac{1}{d+1}\sum_{a=0}^{d}\sum_{\alpha=0}^{d-1} \Pi_{a\alpha} = I. \tag{22} \]
As in Sec. 2, we develop each operator $\Pi_{a\alpha}$ on an orthonormal basis with expansion coefficients $w_i(a\alpha)$. Thus we get vectors $w(a\alpha)$ in $\mathbb{C}^{d^2}$
\[ w(a\alpha) = (w_1(a\alpha), w_2(a\alpha), \dots, w_{d^2}(a\alpha)) \tag{23} \]
such that
\[ w(a\alpha)\cdot w(b\beta) = \delta_{\alpha,\beta}\,\delta_{a,b} + \frac{1}{d}\,(1 - \delta_{a,b}) \tag{24} \]
for all $a, b \in \{0, 1, \dots, d\}$ and $\alpha, \beta \in \{0, 1, \dots, d-1\}$.

Now we draw the same relations as for POVMs by choosing the Racah operators to be our basis in $\mathbb{C}^{d^2}$. We assume once again that the Hilbert space $\mathbb{C}^d$ is realized by $\varepsilon(j)$ with $j = (d-1)/2$.
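As a numerical illustration of Eqs. (19) and (22), the sketch below builds $d+1$ candidate bases for a prime $d$ and tests the overlaps. It borrows the explicit vectors quoted later in the paper (Eqs. (36)-(38) of Sec. 4), rewritten with the index $n = j + m$ so that $(j+m)(j-m+1) = n(d-n)$; the function names are hypothetical, and the check is a floating-point spot check rather than a proof.

```python
import cmath
import math

def mub_vectors(d):
    """Return bases[a][alpha] as lists of d amplitudes indexed by n = j + m.

    For a = 0..d-1 the amplitudes follow Eq. (36) with omega = exp(2 pi i / d);
    the last basis (a = d) is the computational basis of Eq. (38).
    """
    bases = []
    for a in range(d):
        basis = []
        for alpha in range(d):
            basis.append([
                cmath.exp(2j * math.pi * (n * (d - n) * a / 2 + n * alpha) / d)
                / math.sqrt(d)
                for n in range(d)
            ])
        bases.append(basis)
    bases.append([[1.0 + 0j if n == alpha else 0j for n in range(d)]
                  for alpha in range(d)])
    return bases

def overlap_sq(u, v):
    """Squared modulus |<u|v>|^2 of the Hermitian inner product."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2

def check_eq19(d, tol=1e-9):
    """True if every pair of vectors satisfies Eq. (19) within tol."""
    bases = mub_vectors(d)
    for a, basis_a in enumerate(bases):
        for b, basis_b in enumerate(bases):
            for alpha, u in enumerate(basis_a):
                for beta, v in enumerate(basis_b):
                    if a == b:
                        expected = 1.0 if alpha == beta else 0.0
                    else:
                        expected = 1.0 / d
                    if abs(overlap_sq(u, v) - expected) > tol:
                        return False
    return True

def resolution_deviation(d):
    """Largest entrywise deviation of (1/(d+1)) sum of projectors from I, Eq. (22)."""
    bases = mub_vectors(d)
    dev = 0.0
    for r in range(d):
        for c in range(d):
            s = sum(v[r] * v[c].conjugate() for basis in bases for v in basis) / (d + 1)
            dev = max(dev, abs(s - (1.0 if r == c else 0.0)))
    return dev
```

Calling `check_eq19(d)` for $d = 2, 3, 5$ exercises the prime case covered by the formula. Since $\mathrm{Tr}(\Pi_{a\alpha}\Pi_{b\beta}) = |\langle a\alpha|b\beta\rangle|^2$ for rank-one projectors, the same check also covers Eqs. (21) and (24).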
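Each operator $\Pi_{a\alpha}$ can then be developed on the basis of the $(2j+1)^2$ operators $u^{(k)}_q$:
\[ \Pi_{a\alpha} = \sum_{k=0}^{2j}\sum_{q=-k}^{k} d_{kq}(a\alpha)\,u^{(k)}_q, \tag{25} \]
to be compared with Eq. (10). The expansion coefficients are
\[ d_{kq}(a\alpha) = (2k+1)\,\overline{\langle a\alpha|u^{(k)}_q|a\alpha\rangle} \tag{26} \]
for all $a \in \{0, 1, \dots, 2j+1\}$, $\alpha \in \{0, 1, \dots, 2j\}$, $k \in \{0, 1, \dots, 2j\}$ and $q \in \{-k, -k+1, \dots, k\}$. For $a$ and $\alpha$ fixed, the complex coefficients $d_{kq}(a\alpha)$ define a vector
\[ w(a\alpha) = (w_1(a\alpha), w_2(a\alpha), \dots, w_N(a\alpha)), \quad N = (2j+1)^2 \tag{27} \]
in the Hilbert space $\mathbb{C}^N$, the components of which are given by
\[ w_i(a\alpha) = \frac{1}{\sqrt{2k+1}}\,d_{kq}(a\alpha), \quad i = k^2 + k + q + 1. \tag{28} \]
We are thus led to the following properties and relations. The proofs are similar to those in Sec. 2.

• First component $w_1(a\alpha)$ of $w(a\alpha)$:
\[ d_{00}(a\alpha) = \frac{1}{\sqrt{2j+1}} \tag{29} \]
for all $a \in \{0, 1, \dots, 2j+1\}$ and $\alpha \in \{0, 1, \dots, 2j\}$.

• Complex conjugation property:
\[ \overline{d_{kq}(a\alpha)} = (-1)^q\,d_{k-q}(a\alpha) \tag{30} \]
for all $a \in \{0, 1, \dots, 2j+1\}$, $\alpha \in \{0, 1, \dots, 2j\}$, $k \in \{0, 1, \dots, 2j\}$ and $q \in \{-k, -k+1, \dots, k\}$.

• Rotational invariance:
\[ \sum_{k=0}^{2j}\frac{1}{2k+1}\sum_{q=-k}^{k} \overline{d_{kq}(a\alpha)}\,d_{kq}(b\beta) = \delta_{\alpha,\beta}\,\delta_{a,b} + \frac{1}{2j+1}\,(1 - \delta_{a,b}) \tag{31} \]
for all $a, b \in \{0, 1, \dots, 2j+1\}$ and $\alpha, \beta \in \{0, 1, \dots, 2j\}$.

• Tensor product formula:
\[ \frac{1}{2K+1}\,d_{KQ}(a\alpha) = (-1)^{2j-Q}\sum_{k=0}^{2j}\sum_{\ell=0}^{2j}\sum_{q=-k}^{k}\sum_{p=-\ell}^{\ell} \begin{pmatrix} k & \ell & K \\ -q & -p & Q \end{pmatrix} \begin{Bmatrix} k & \ell & K \\ j & j & j \end{Bmatrix} d_{kq}(a\alpha)\,d_{\ell p}(a\alpha) \tag{32} \]
for all $a \in \{0, 1, \dots, 2j+1\}$, $\alpha \in \{0, 1, \dots, 2j\}$, $K \in \{0, 1, \dots, 2j\}$ and $Q \in \{-K, -K+1, \dots, K\}$.

• Sum rule:
\[ \sum_{a=0}^{2j+1}\sum_{\alpha=0}^{2j}\sum_{k=0}^{2j}\sum_{q=-k}^{k} d_{kq}(a\alpha) \begin{pmatrix} j & k & j \\ -m & q & m' \end{pmatrix} = (-1)^{j-m}\,2(j+1)\,\delta_{m,m'}, \tag{33} \]
which involves all coefficients $d_{kq}(a\alpha)$; the factor $2(j+1) = d+1$ is the one implied by taking the $jm$–$jm'$ matrix element of the decomposition of the identity (22).

4 CONCLUSIONS

Although the structure of the relations in Sec. 2 on the one hand and Sec. 3 on the other hand is very similar, there are deep differences between the two sets of results. The similarities are reminiscent of the fact that both MUBs and SIC-POVMs can be linked to finite affine planes [12, 13, 22, 23, 25] and to complex projective 2–designs [8, 10, 19, 24]. On the other side, there are two arguments in favor of the differences between relations (6) and (24).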
First, the problem of constructing SIC-POVMs in dimension d is not equivalent to the existence of an affine plane of order d [12, 13]. Second, there is a consensus around the conjecture according to which there exists a complete set of MUBs in dimension d if and only if there exists an affine plane of order d [22].

In dimension d, finding the $d^2$ operators $P_x$ of a SIC-POVM acting on the Hilbert space $\mathbb{C}^d$ amounts to finding $d^2$ vectors v(x) in the Hilbert space $\mathbb{C}^N$ with $N = d^2$ satisfying
$$\|v(x)\| = 1, \qquad v(x) \cdot v(y) = \frac{1}{d+1} \quad \text{for } x \neq y \tag{34}$$
(the norm $\|v(x)\|$ of each vector v(x) is 1 and the angle $\omega_{xy}$ of any pair of vectors v(x) and v(y) is $\omega_{xy} = \cos^{-1}[1/(d+1)]$ for x ≠ y). In a similar way, finding d + 1 MUBs of $\mathbb{C}^d$ is equivalent to finding d + 1 sets $S_a$ (with a = 0, 1, · · · , d) of d vectors, i.e., d(d + 1) vectors in all, $w(a\alpha)$ in $\mathbb{C}^N$ with $N = d^2$ satisfying
$$w(a\alpha) \cdot w(a\beta) = \delta_{\alpha,\beta}, \qquad w(a\alpha) \cdot w(b\beta) = \frac{1}{d} \quad \text{for } a \neq b \tag{35}$$
(each set $S_a$ consists of d orthonormalized vectors and the angle $\omega_{a\alpha b\beta}$ of any vector $w(a\alpha)$ of a set $S_a$ with any vector $w(b\beta)$ of a set $S_b$ is $\omega_{a\alpha b\beta} = \cos^{-1}(1/d)$ for a ≠ b).

According to a well accepted conjecture [8, 10], SIC-POVMs should exist in any dimension. The present study shows that in order to prove this conjecture it is sufficient to prove that Eq. (34) admits solutions for any value of d.

The situation is different for MUBs. In dimension d, it is known that there exist d + 1 sets of d vectors of type $|a\alpha\rangle$ in $\mathbb{C}^d$ satisfying Eq. (19) when d is a prime or the power of a prime. This shows that Eq. (35) can be solved for d prime or power of a prime. For d prime, it is possible to find an explicit solution of Eq. (19). In fact, we have [26, 27]
$$|a\alpha\rangle = \frac{1}{\sqrt{2j+1}} \sum_{m=-j}^{j} \omega^{(j+m)(j-m+1)a/2 + (j+m)\alpha}\, |j,m\rangle, \tag{36}$$
$$\omega = \exp\left( i\, \frac{2\pi}{2j+1} \right), \qquad j = \frac{1}{2}(d-1) \tag{37}$$
for a, α ∈ {0, 1, · · · , 2j} while
$$|a\alpha\rangle = |j,m\rangle \tag{38}$$
for a = 2j + 1 and α = j + m = 0, 1, · · · , 2j. Then, Eq. (26) yields
$$d_{kq}(a\alpha) = \frac{2k+1}{2j+1} \sum_{m=-j}^{j} \sum_{m'=-j}^{j} \omega^{\theta(m,m')}\, (-1)^{j-m} \begin{pmatrix} j & k & j \\ -m & q & m' \end{pmatrix}, \tag{39}$$
$$\theta(m,m') = (m-m') \left[ \frac{1}{2}(1-m-m')\,a + \alpha \right] \tag{40}$$
for a, α ∈ {0, 1, · · · , 2j} while
$$d_{kq}(a\alpha) = \delta_{q,0}\,(2k+1)\,(-1)^{j-m} \begin{pmatrix} j & k & j \\ -m & 0 & m \end{pmatrix} \tag{41}$$
for a = 2j + 1 and α = j + m = 0, 1, · · · , 2j. It can be shown that Eqs. (40) and (41) are in agreement with the results of Sec. 3. We thus have a solution of the equations for the results of Sec. 3 when d is prime. As an open problem, it would be worthwhile to find an explicit solution for the coefficients $d_{kq}(a\alpha)$ when d = 2j + 1 is any positive power of a prime.

Finally, note that proving (or disproving) the conjecture according to which a complete set of MUBs in dimension d exists only if d is a prime or the power of a prime is equivalent to proving (or disproving) that Eq. (35) has a solution only if d is a prime or the power of a prime.

APPENDIX: WIGNER-RACAH ALGEBRA OF SU(2) ⊃ U(1)

We limit ourselves to those basic formulas for the Wigner-Racah algebra of the chain SU(2) ⊃ U(1) which are necessary to derive the results of this paper. The summations in this appendix have to be extended to the allowed values for the involved magnetic and angular momentum quantum numbers.

The definition (9) of the components $u^{(k)}_q$ of the Racah unit tensor $u^{(k)}$ yields
$$\langle j,m | u^{(k)}_q | j,m' \rangle = (-1)^{j-m} \begin{pmatrix} j & k & j \\ -m & q & m' \end{pmatrix}, \tag{42}$$
from which we easily obtain the Hermitian conjugation property
$$\left( u^{(k)}_q \right)^{\dagger} = (-1)^q\, u^{(k)}_{-q}. \tag{43}$$
The 3–jm Wigner symbol in Eq. (42) satisfies the orthogonality relations
$$\sum_{m,m'} \begin{pmatrix} j & j' & k \\ m & m' & q \end{pmatrix} \begin{pmatrix} j & j' & \ell \\ m & m' & p \end{pmatrix} = \frac{1}{2k+1}\, \delta_{k,\ell}\, \delta_{q,p}\, \Delta(j,j',k), \tag{44}$$
$$\sum_{k,q} (2k+1) \begin{pmatrix} j & j' & k \\ m & m' & q \end{pmatrix} \begin{pmatrix} j & j' & k \\ M & M' & q \end{pmatrix} = \delta_{m,M}\, \delta_{m',M'}. \tag{45}$$
The trace relation on the space ε(j)
$$\mathrm{tr}_{\varepsilon(j)} \left[ \left( u^{(k)}_q \right)^{\dagger} u^{(\ell)}_p \right] = \frac{1}{2k+1}\, \delta_{k,\ell}\, \delta_{q,p}\, \Delta(j,j,k) \tag{46}$$
easily follows by combining Eqs. (42) and (44). Furthermore, by introducing
$$\begin{pmatrix} j & j' & 0 \\ m & -m' & 0 \end{pmatrix} = \delta_{j,j'}\, \delta_{m,m'}\, (-1)^{j-m}\, \frac{1}{\sqrt{2j+1}} \tag{47}$$
in Eq. (44), we obtain the sum rule
$$\sum_{m} (-1)^{j-m} \begin{pmatrix} j & k & j \\ -m & q & m \end{pmatrix} = \sqrt{2j+1}\; \delta_{k,0}\, \delta_{q,0}\, \Delta(j,k,j), \tag{48}$$
known in spectroscopy as the barycenter theorem.

There are several relations involving 3–jm and 6–j symbols.
In particular, we have
$$\sum_{m,M,m'} (-1)^{j-M} \begin{pmatrix} j & k & j \\ -m & q & M \end{pmatrix} \begin{pmatrix} j & \ell & j \\ -M & p & m' \end{pmatrix} \begin{pmatrix} j & K & j \\ -m & Q & m' \end{pmatrix} = (-1)^{2j-Q} \begin{pmatrix} k & \ell & K \\ -q & -p & Q \end{pmatrix} \begin{Bmatrix} k & \ell & K \\ j & j & j \end{Bmatrix}, \tag{49}$$
where {· · ·} denotes a 6–j Wigner symbol (or W Racah coefficient). Note that the introduction of
$$\begin{Bmatrix} k & \ell & 0 \\ j & j & J \end{Bmatrix} = \delta_{k,\ell}\, (-1)^{j+k+J}\, \frac{1}{\sqrt{(2k+1)(2j+1)}} \tag{50}$$
in Eq. (49) gives back Eq. (44). Equation (49) is central in the derivation of the coupling relation
$$u^{(k)}_q\, u^{(\ell)}_p = \sum_{K,Q} (-1)^{2j-Q}\, (2K+1) \begin{pmatrix} k & \ell & K \\ -q & -p & Q \end{pmatrix} \begin{Bmatrix} k & \ell & K \\ j & j & j \end{Bmatrix} u^{(K)}_Q. \tag{51}$$
Equation (51) makes it possible to calculate the commutator $[u^{(k)}_q, u^{(\ell)}_p]$, which shows that the set $\{u^{(k)}_q : k = 0, 1, \cdots, 2j;\ q = -k, -k+1, \cdots, k\}$ can be used to span the Lie algebra of the unitary group U(2j + 1). The latter result is at the root of the expansions (17) and (32).

Note added in version 3

After the submission of the present paper for publication in Journal of Russian Laser Research, a pre-print dealing with the existence of SIC-POVMs was posted on arXiv [32]. The main result in [32] is that SIC-POVMs exist in all dimensions. As a corollary of this result, Eq. (34) admits solutions in any dimension.

Acknowledgements

This work was presented at the International Conference on Squeezed States and Uncertainty Relations, University of Bradford, England (ICSSUR’07). The authors wish to thank the organizer A. Vourdas and are grateful to D. M. Appleby, V. I. Man’ko and M. Planat for interesting comments.

References

[1] A. Peres, “Quantum Theory: Concepts and Methods”, Dordrecht: Kluwer (1995)
[2] A. Vourdas, J. Phys. A: Math. Gen. 38, 8453 (2005)
[3] W. M. de Muynck, “Foundations of Quantum Mechanics, an Empiricist Approach”, Dordrecht: Kluwer (2002)
[4] J. M. Jauch and C. Piron, Helv. Phys. Acta 40, 559 (1967)
[5] E. B. Davies and J. T. Lewis, Comm. Math. Phys. 17, 239 (1970)
[6] E. B. Davies, IEEE Trans. Inform. Theory IT-24, 596 (1978)
[7] K. Kraus, “States, Effects, and Operations”, Lect. Notes Phys. 190 (1983)
[8] G. Zauner, Diploma Thesis, University of Wien (1999)
[9] C. M. Caves, C. A. Fuchs and R. Schack, J. Math. Phys. 43, 4537 (2002)
[10] J. M. Renes, R. Blume-Kohout, A. J. Scott and C. M. Caves, J. Math. Phys. 45, 2171 (2004)
[11] D. M. Appleby, J. Math. Phys. 46, 052107 (2005)
[12] M. Grassl, Proc. ERATO Conf. Quant. Inf. Science (EQIS 2004) ed. J. Gruska, Tokyo (2005)
[13] M. Grassl, Elec. Notes Discrete Math. 20, 151 (2005)
[14] S. Weigert, Int. J. Mod. Phys. B 20, 1942 (2006)
[15] J. Schwinger, Proc. Nat. Acad. Sci. USA 46, 570 (1960)
[16] P. Delsarte, J. M. Goethals and J. J. Seidel, Philips Res. Repts. 30, 91 (1975)
[17] I. D. Ivanović, J. Phys. A: Math. Gen. 14, 3241 (1981)
[18] W. K. Wootters, Ann. Phys. (N.Y.) 176, 1 (1987)
[19] H. Barnum, Preprint quant-ph/0205155 (2002)
[20] S. Bandyopadhyay, P. O. Boykin, V. Roychowdhury and F. Vatan, Algorithmica 34, 512 (2002)
[21] A. O. Pittenger and M. H. Rubin, Linear Alg. Appl. 390, 255 (2004)
[22] M. Saniga, M. Planat and H. Rosu, J. Opt. B: Quantum Semiclassical Opt. 6, L19 (2004)
[23] I. Bengtsson and Å. Ericsson, Open Syst. Inf. Dyn. 12, 107 (2005)
[24] A. Klappenecker and M. Rötteler, Preprint quant-ph/0502031 (2005)
[25] W. K. Wootters, Found. Phys. 36, 112 (2006)
[26] M. R. Kibler and M. Planat, Int. J. Mod. Phys. B 20, 1802 (2006)
[27] O. Albouy and M. R. Kibler, SIGMA 3, article 076 (2007)
[28] H.-P. Breuer, J. Phys. A: Math. Gen. 38, 9019 (2005)
[29] G. Racah, Phys. Rev. 62, 438 (1942)
[30] U. Fano and G. Racah, “Irreducible Tensorial Sets”, New York: Academic (1959)
[31] M. Kibler and G. Grenet, J. Math. Phys. 21, 422 (1980)
[32] J.L. Hall and A. Rao, Preprint quant-ph/0707.3002v1 (20 July 2007)

ABSTRACT

A unified approach to (symmetric informationally complete) positive operator valued measures and mutually unbiased bases is developed in this article. The approach is based on the use of operator equivalents expanded in the enveloping algebra of SU(2).
Emphasis is put on similarities and differences between SIC-POVMs and MUBs. <|endoftext|><|startoftext|> Introduction

The simplest model exhibiting time oscillations in a two-component system is the model proposed independently by Lotka [1, 2, 3] and by Volterra [4]. In this model the individuals of two species are dispersed over a space assumed to be homogeneous. It is implicitly assumed in this approach that any individual can interact with any other one with equal intensity, which implies that their positions are not taken into account. The time evolution of the densities of the two species in the Lotka-Volterra model is given by a set of two ordinary differential equations [5, 6, 7, 8] and is set up in analogy with the laws of mass action.

Depending on the level of description wanted, the approach based on mass-action laws, as contained in the Lotka-Volterra model, suffices. However, there are situations in which the coexistence takes place in a spatially heterogeneous habitat such that the population densities can be very low in some regions. In this case we need to go beyond the mass-action equations and consider the spatial structure of the habitat. In other words, it becomes necessary to analyze the coexistence by explicitly taking into account spatially structured models. In fact, the role of space in the description of population biology problems has been recognized by several authors in recent years [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]. Durrett and Levin [11] have pointed out very clearly that modelling spatially distributed population dynamics by interacting particle systems [11, 27, 28, 29] is the theoretical approach able to give the most complete description of the problem. We include in this approach probabilistic cellular automata (PCA) [29, 30, 31], which will concern us here. We refer to interacting particle systems and PCA as stochastic lattice models.
They are both Markovian processes defined by discrete stochastic variables residing on the sites of a lattice; the former is a continuous-time process and the latter a discrete-time process.

Stable oscillations of a predator-prey probabilistic cellular automaton

In the present work we study the coexistence and the emergence of stable self-sustained oscillations in a predator-prey system by considering a PCA previously studied by numerical simulations [24, 26]. This PCA is defined by local rules, similar to the ones of the contact process [27], that are capable of describing the interaction between prey and predator. Here, we focus on the analysis of the PCA by means of dynamic mean-field approximations [10, 28, 29, 30, 32, 33]. In this approach the equations for the time evolution of correlations of various orders are truncated at a certain level, and high-order correlations of sites are written in terms of lower-order correlations. The simplest approximation is the one in which all correlations are written in terms of one-site correlations, called the simple approximation. In a more sophisticated approximation, called the pair approximation [10, 29], any correlation is written in terms of one-site and two-site correlations.

The simple mean-field approximation is capable of predicting coexistence of individuals in a stationary state where the densities of each species, and of empty sites, are constant. However, it is not capable of predicting possible time-oscillating behavior of the population densities, and we have to proceed to the next order of mean-field approximation. The simple approximation, on the other hand, can be placed in an explicit correspondence with a patch model [7, 12, 34] where unoccupied patches can be colonized by prey and patches occupied by prey can be colonized by predators that in turn may become extinct.
In this approximation the PCA can be seen as an extended version of the Lotka-Volterra model which includes an extra logistic term related to the empty sites.

The pair mean-field approximation is able to predict time-oscillating behavior of the population densities that is self-sustained and is attained through Hopf bifurcations. This is in contrast with the Lotka-Volterra model, which presents no stable oscillations but exhibits instead an infinite number of cycles, each associated with a different initial condition. However, from the biological point of view, one does not expect a small variation in the initial densities of prey and predator to result in different amplitudes of oscillation. Within our approach, a PCA treated in the pair approximation, the oscillations are associated with limit cycles, which means that they are stable against changes in the initial conditions. In our view, the pair approximation, in which the correlations between neighboring sites are treated exactly, provides a basic description of the predator-prey spatial interactions. For this reason, we will refer to the PCA in this approximation as a quasi-spatial-structured model.

2. Model

2.1. Probabilistic cellular automaton

We consider interacting particles living on the sites of a lattice and evolving in time according to Markovian local rules. The lattice is the geometrical object that plays the role of the spatial region occupied by particles, in the general case, or by individuals of each species in the present case. The lattice sites are the possible locations for the individuals. Each site can be either empty or occupied by one individual of one of the species, and a stochastic variable $\eta_i$ is introduced to describe the state of each site at a given instant of time. The state of the entire system is denoted by $\eta = (\eta_1, \ldots, \eta_i, \ldots, \eta_N)$ where N is the total number of sites.
The transition between the states is governed by the interactions between neighbor sites in the lattice and by a synchronous dynamics. The probability $P^{(\ell)}(\eta)$ of configuration η at time step ℓ evolves according to the Markov chain equation
$$P^{(\ell+1)}(\eta) = \sum_{\eta'} W(\eta|\eta')\, P^{(\ell)}(\eta'), \tag{1}$$
where the summation is over all the microscopic configurations of the system, and $W(\eta|\eta')$ is the conditional transition probability from state η′ at time ℓ to state η at time ℓ + 1. This transition probability does not depend on time and contains all the information about the dynamics of the system. Taking into account that all the sites are simultaneously updated, which is the fundamental property of a PCA, the transition probability can be factorized and written in the form [29, 30]
$$W(\eta|\eta') = \prod_{i=1}^{N} w_i(\eta_i|\eta'), \tag{2}$$
where $w_i(\eta_i|\eta')$ is the conditional transition probability that site i takes the state $\eta_i$ given that the whole system is in state η′. Being a probability distribution, the quantity $w_i(\eta_i|\eta')$ must satisfy the following properties:
$$w_i(\eta_i|\eta') \geq 0 \quad \text{and} \quad \sum_{\eta_i} w_i(\eta_i|\eta') = 1. \tag{3}$$

Figure 1. Transitions of the predator-prey model. The three states are: prey or herbivore (H or 1), predator (P or 2) and empty or vegetation (V or 0). The allowed transitions obey the cyclic order shown.

The average of any state function F(η) is evaluated by
$$\langle F(\eta) \rangle_\ell = \sum_{\eta} F(\eta)\, P^{(\ell)}(\eta). \tag{4}$$
The time evolution equation for $\langle F(\eta) \rangle$ is obtained from definition (4) and equation (1). For example, we can derive the equations for the time evolution of densities and two-site correlations.

2.2. Predator-prey probabilistic cellular automaton

To model a predator-prey system by a PCA, the stochastic variable $\eta_i$ associated to site i will represent the occupancy of the site by one prey, the occupancy by one predator, or the vacancy (a site devoid of any individual).
The variable $\eta_i$ is assumed to take the value 0, 1, or 2, according to whether the site is empty (V), occupied by a prey individual (H) or by a predator (P), respectively. That is,
$$\eta_i = \begin{cases} 0, & \text{empty (V)}, \\ 1, & \text{prey (H)}, \\ 2, & \text{predator (P)}, \end{cases} \tag{5}$$
which defines a PCA with three states per site.

The stochastic rules, embodied in the transition probability $w_i(\eta_i|\eta')$, are set up according to the following assumptions. (a) The space is homogeneous, which means that no region of the space is privileged over the others, that is, in principle the individuals have the same conditions of survival in any region of space. (b) The space is isotropic, which means that there is no preferential direction in this space for any interaction. (c) The allowed transitions between states are only the ones that obey the cyclic order shown in figure 1. Prey can only be born in empty sites; prey can give place to a predator, in a process where a prey individual dies and a predator is instantaneously born; finally a predator can die leaving an empty site. The empty sites are places where prey can proliferate and can be seen as the resource for prey survival. The death of predators completes this cycle, returning to the system the resources for prey.

The predator-prey PCA has three parameters: a, the probability of birth of prey, b, the probability of birth of predator and death of prey, and c, the probability of predator death. Two of the processes are catalytic: the occupancy of a site by prey or by a predator is conditioned, respectively, on the existence of prey or predator in the neighborhood of the site. The third reaction, where a predator dies, is spontaneous, that is, it occurs with probability c, independently of the neighbors of the site. We assume that
$$a + b + c = 1, \tag{6}$$
with 0 ≤ a, b, c ≤ 1.
The transition probabilities of the predator-prey PCA are described in what follows:

(a) If a site i is empty, $\eta_i = 0$, and there is at least one prey in its first neighborhood, there is a favorable condition for the birth of a new prey. The probability of site i being occupied in the next time step by a prey is proportional to the parameter a and to the number of prey in the first neighborhood of the empty site.

(b) If a site is occupied by a prey, $\eta_i = 1$, and there is at least one predator in its first neighborhood, then the site has a probability of being occupied by a new predator in the next time step. In this process the prey dies instantaneously. The transition probability is proportional to the parameter b and to the number of predators in the first neighborhood of the site.

(c) If site i is occupied by a predator, $\eta_i = 2$, it dies with probability c.

The transition probabilities associated with the three processes mentioned above can be summarized as follows:
$$w_i(0|\eta) = c\,\delta(\eta_i, 2) + [1 - f_i(\eta)]\,\delta(\eta_i, 0), \tag{7}$$
$$w_i(1|\eta) = f_i(\eta)\,\delta(\eta_i, 0) + [1 - g_i(\eta)]\,\delta(\eta_i, 1), \tag{8}$$
$$w_i(2|\eta) = g_i(\eta)\,\delta(\eta_i, 1) + (1 - c)\,\delta(\eta_i, 2), \tag{9}$$
where
$$f_i(\eta) = \frac{a}{4} \sum_{k} \delta(\eta_k, 1), \qquad g_i(\eta) = \frac{b}{4} \sum_{k} \delta(\eta_k, 2), \tag{10}$$
and the summation is over the four nearest neighbors k of site i in a regular square lattice. The notation δ(x, y) stands for the Kronecker delta function. These stochastic local rules, when inserted in equation (2), define the dynamics of the PCA for a predator-prey system.

The present stochastic dynamics predicts the existence of states, called absorbing states, in which the system becomes trapped. Once the system has entered such a state it cannot escape from it and remains there forever. There are two absorbing states. One of them is the empty lattice. Since the predator death is spontaneous, a configuration where just predators are present is not stationary. This situation happens whenever the prey have become extinct.
In this case the predators cannot reproduce anymore and also become extinct, leaving the entire lattice with empty sites. The other absorbing state is the lattice full of prey. This situation occurs if there are few predators and they become extinct. The remaining prey will then reproduce without predation, filling up the whole lattice. The existence of absorbing stationary states is evidence of the irreversible character of the model or, in other words, of the lack of detailed balance [29]. However, the most interesting states, the ones that we are concerned with in the present study, are the active states characterized by the coexistence of prey and predators.

2.3. Time evolution equations for state functions

We start by defining the densities, which are the one-site correlations, and the two-site correlations. These quantities will be useful in our mean-field analysis to be developed below. The densities of prey, predators, and empty sites at time step ℓ are defined through the expressions
$$P^{(\ell)}_i(1) = \langle \delta(\eta_i, 1) \rangle_\ell, \tag{11}$$
$$P^{(\ell)}_i(2) = \langle \delta(\eta_i, 2) \rangle_\ell, \tag{12}$$
$$P^{(\ell)}_i(0) = \langle \delta(\eta_i, 0) \rangle_\ell. \tag{13}$$
The evolution equations for the above densities are obtained from their definitions as state functions, as given by equation (4), and by using the evolution equation for $P^{(\ell)}(\eta)$, given by equation (1). The resulting equations can be formally written as
$$P^{(\ell+1)}_i(1) = \langle w_i(1|\eta) \rangle_\ell, \tag{14}$$
$$P^{(\ell+1)}_i(2) = \langle w_i(2|\eta) \rangle_\ell, \tag{15}$$
$$P^{(\ell+1)}_i(0) = \langle w_i(0|\eta) \rangle_\ell, \tag{16}$$
where the transition probabilities for this model are given in equations (7), (8) and (9).

The correlation between a prey localized at site i and a predator localized at site j at time step ℓ is defined by
$$P^{(\ell)}_{ij}(1, 2) = \langle \delta(\eta_i, 1)\,\delta(\eta_j, 2) \rangle_\ell. \tag{17}$$
The other two-site correlations are defined similarly. The time evolution equation for the correlation of two neighbor sites i and j, one being occupied by a prey and the other by a predator, is given by
$$P^{(\ell+1)}_{ij}(1, 2) = \langle w_i(1|\eta)\, w_j(2|\eta) \rangle_\ell. \tag{18}$$
The other two-site evolution equations are given by similar formal expressions.
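The local rules (7)-(10) and the density definitions (11)-(13) can be illustrated by a direct simulation. The sketch below is not the authors' code: it assumes a square lattice with periodic boundaries, takes the prefactors in Eq. (10) to be a/4 and b/4, and uses illustrative names (`pca_step`, `densities`). All sites are updated synchronously from the old configuration, as required for a PCA.

```python
import random

EMPTY, PREY, PREDATOR = 0, 1, 2


def pca_step(lattice, a, b, c, rng):
    """One synchronous update of every site, following rules (7)-(10)."""
    size = len(lattice)
    new = [row[:] for row in lattice]  # new states are built from the old lattice
    for i in range(size):
        for k in range(size):
            state = lattice[i][k]
            # Four nearest neighbours with periodic boundaries (an assumption).
            neighbours = (
                lattice[(i - 1) % size][k],
                lattice[(i + 1) % size][k],
                lattice[i][(k - 1) % size],
                lattice[i][(k + 1) % size],
            )
            if state == EMPTY:
                # Prey birth: probability f_i = (a/4) * number of prey neighbours.
                if rng.random() < a * neighbours.count(PREY) / 4:
                    new[i][k] = PREY
            elif state == PREY:
                # Predation: probability g_i = (b/4) * number of predator neighbours.
                if rng.random() < b * neighbours.count(PREDATOR) / 4:
                    new[i][k] = PREDATOR
            elif rng.random() < c:
                # Spontaneous predator death.
                new[i][k] = EMPTY
    return new


def densities(lattice):
    """Fractions of prey, predator and empty sites, in that order."""
    cells = [s for row in lattice for s in row]
    return tuple(cells.count(s) / len(cells) for s in (PREY, PREDATOR, EMPTY))


if __name__ == "__main__":
    rng = random.Random(1)
    size, c, p = 32, 0.05, 0.0
    a, b = (1 - c) / 2 - p, (1 - c) / 2 + p  # so that a + b + c = 1
    lattice = [[rng.choice((EMPTY, PREY, PREDATOR)) for _ in range(size)]
               for _ in range(size)]
    for _ in range(200):
        lattice = pca_step(lattice, a, b, c, rng)
    print(densities(lattice))
```

Averaging `densities` over many independent runs gives a Monte Carlo estimate of the one-site quantities in Eqs. (11)-(13); the two absorbing configurations (all empty, all prey) are left unchanged by `pca_step`.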
We can also derive equations for three-site correlations. Since we are interested here in approximations in which only the one-site and two-site correlations should be treated exactly, the above equations suffice.

We call attention to the fact that equation (18) includes the product of two transition probabilities. This is a consequence of the synchronous update of the PCA, which allows both neighboring sites i and j to have their states changed at the same time step. This situation does not occur when we consider a continuous-time one-site dynamics. Therefore, although the local interactions in the present PCA and in the continuous-time model considered in reference [10] are the same, the predator-prey system evolves according to different global dynamics, which leads to different time evolution equations for the densities and the correlations.

The exact evolution equations for the one-site correlations are
$$P'_j(1) = \frac{a}{\zeta} \sum_{i} P_{ji}(01) - \frac{b}{\zeta} \sum_{i} P_{ji}(12) + P_j(1), \tag{19}$$
$$P'_j(2) = \frac{b}{\zeta} \sum_{i} P_{ji}(12) + (1 - c)\, P_j(2), \tag{20}$$
where the summation in i is over the ζ nearest neighbors of site j. To simplify notation we are using unprimed and primed quantities to refer to quantities taken at time ℓ and ℓ + 1, respectively.
The exact evolution equations for the correlations of two nearest neighbor sites j and k are
$$\begin{aligned} P'_{jk}(01) ={}& \frac{a}{\zeta} \sum_{n(\neq j)} P_{jkn}(001) - \frac{a^2}{\zeta^2} \sum_{i(\neq k)} \sum_{n(\neq j)} P_{ijkn}(1001) \\ &+ \Big(1 - \frac{a}{\zeta}\Big) \Big[ P_{jk}(01) - \frac{b}{\zeta} \sum_{n(\neq j)} P_{jkn}(012) \Big] - \frac{a}{\zeta} \Big[ \sum_{i(\neq k)} P_{ijk}(101) - \frac{b}{\zeta} \sum_{i(\neq k)} \sum_{n(\neq j)} P_{ijkn}(1012) \Big] \\ &+ c \Big[ \Big(1 - \frac{b}{\zeta}\Big) P_{jk}(21) - \frac{b}{\zeta} \sum_{n(\neq j)} P_{jkn}(212) \Big] + \frac{ac}{\zeta} \sum_{n(\neq j)} P_{jkn}(201), \end{aligned} \tag{21}$$
$$\begin{aligned} P'_{jk}(12) ={}& \frac{ab}{\zeta^2} \Big[ \sum_{n(\neq j)} P_{jkn}(012) + \sum_{i(\neq k)} \sum_{n(\neq j)} P_{ijkn}(1012) \Big] + \frac{b}{\zeta} \Big[ \sum_{n(\neq j)} P_{jkn}(112) - \frac{b}{\zeta} \sum_{i(\neq k)} \sum_{n(\neq j)} P_{ijkn}(2112) \Big] \\ &+ (1 - c) \Big[ \Big(1 - \frac{b}{\zeta}\Big) P_{jk}(12) - \frac{b}{\zeta} \sum_{i(\neq k)} P_{ijk}(212) \Big] + (1 - c)\, \frac{a}{\zeta} \sum_{i(\neq k)} P_{ijk}(102), \end{aligned} \tag{22}$$
$$\begin{aligned} P'_{jk}(02) ={}& \frac{b}{\zeta} \Big[ \Big(1 - \frac{a}{\zeta}\Big) \sum_{n(\neq j)} P_{jkn}(012) - \frac{a}{\zeta} \sum_{i(\neq k)} \sum_{n(\neq j)} P_{ijkn}(1012) \Big] \\ &+ (1 - c) \Big[ c\, P_{jk}(22) + P_{jk}(02) - \frac{a}{\zeta} \sum_{i(\neq k)} P_{ijk}(102) \Big] + \frac{bc}{\zeta} \Big[ P_{jk}(21) + \sum_{n(\neq j)} P_{jkn}(212) \Big], \end{aligned} \tag{23}$$
where the summation in i is over the nearest neighbors of j and the summation in n is over the nearest neighbors of k.

3. Mean-field approximation

3.1. One and two site approximations

The evolution equation for a density in any interacting particle system which evolves in time according to local interaction rules always contains terms related to the correlations between neighbor sites in a lattice. The evolution equations for the correlations of two neighbor sites include the correlations of clusters of three or more sites in the lattice, and so on. In this way we obtain an infinite set of coupled equations for the correlations, which is equivalent to the evolution equation for the probability $P^{(\ell)}(\eta)$, described in equation (1) for the automaton. The dynamic mean-field approximation consists in the truncation of this infinite set of coupled equations [30, 31, 32, 33].

The lowest order dynamic mean-field approximation is the one where the probability of a given cluster is written as the product of the probabilities of each site. That is, all the correlations between sites in the cluster are neglected. For example, let us consider the cluster constituted by a center (C) site and its first neighboring sites to the north (N), south (S), east (E) and west (W) as shown in figure 2.
Within the one-site approximation the probability P(N, E, W, S, C) corresponding to the cluster shown in figure 2 is approximated by
$$P(N, E, W, S, C) = P(N)\, P(E)\, P(W)\, P(S)\, P(C), \tag{24}$$
where P(X), X = N, E, W, S, C, are the one-site probabilities corresponding to each site. For some stochastic dynamics models this approximation is able to give qualitative results that are in agreement with the expected results.

Figure 2. A site (C) of the square lattice and its four nearest neighbor sites (N, E, W, S).

In order to get a better approximation we must include fluctuations. The simplest mean-field approximation that includes correlations is the pair mean-field approximation. This approximation is better explained by taking again, as an example, the cluster constituted by a center site and its four nearest neighbors, shown above. Within the pair approximation the conditional probability P(N, E, W, S | C) is approximated by
$$P(N, E, W, S \,|\, C) = P(N|C)\, P(E|C)\, P(W|C)\, P(S|C), \tag{25}$$
that is, the conditional probability P(N, E, W, S | C) is written in terms of the product of the conditional probabilities P(X|C), X = N, E, W, S. Now using the definition of conditional probability we have
$$\frac{P(N, E, W, S, C)}{P(C)} = \frac{P(N, C)}{P(C)}\, \frac{P(E, C)}{P(C)}\, \frac{P(W, C)}{P(C)}\, \frac{P(S, C)}{P(C)}, \tag{26}$$
or
$$P(N, E, W, S, C) = \frac{P(N, C)\, P(E, C)\, P(W, C)\, P(S, C)}{[P(C)]^3}. \tag{27}$$
We see that the resulting probability is written as a function of the two-site correlations P(X, C) and the one-site correlation P(C).

3.2. Patch model

The simple mean-field approximation of the predator-prey PCA describes exactly the same properties as an extended Levins patch model [7, 34]. That is, the PCA with local rules similar to the contact process becomes, in the simple mean-field approximation, analogous to the Levins model for a metapopulation with empty patches, patches colonized by prey and patches colonized by predators.
In the one-site mean-field approximation we consider that the probability of any cluster of sites can be written as the product of the probabilities of each site, as in equation (24). Using this approach, and writing $x = P_i(1)$, $y = P_i(2)$, and $z = P_i(0)$, it can be seen that the set of equations can be reduced to the following two-dimensional map [26]
$$x' = x + axz - bxy, \tag{28}$$
which is an evolution equation for the prey density x, and
$$y' = y + bxy - cy, \tag{29}$$
which is an evolution equation for the predator density y. Notice that
$$z = 1 - x - y. \tag{30}$$
The fixed points of this map are the stationary solutions x′ = x and y′ = y, and they correspond to the three following solutions: $x_1 = 0$, $y_1 = 0$; $x_2 = 1$, $y_2 = 0$; and $x_3 = c/b$, $y_3 = (1 - c/b)/(1 + b/a)$, where $x_3$ follows from y′ = y with y ≠ 0 and $y_3$ from x′ = x. The first solution corresponds to an absorbing state where both species have become extinct. The second corresponds to an absorbing state where the predators have become extinct. The third solution corresponds to an active state where prey and predator coexist.

Figure 3. Phase diagram of the patch model. The continuous line represents the transition, $c_1(p)$, between the prey absorbing (A) state and the active species coexistence (C) state. The dashed line separates the two asymptotic time behaviors of the active state.

Due to the constraint (6), the parameters a, b and c are not all independent and only two can be chosen as independent. For this reason it is convenient to introduce the following parametrization [10]
$$a = \frac{1 - c}{2} - p, \qquad b = \frac{1 - c}{2} + p, \tag{31}$$
and consider p and c as the independent variables. The parameter p is such that −1/2 ≤ p ≤ 1/2, and 0 ≤ c ≤ 1 as before. This parametrization will be useful in the determination of the different phases displayed by the model.

A linear stability analysis reveals that the solution $(x_1, y_1)$ is a hyperbolic saddle point for any set of the parameters a, b and c and so it is always unstable.
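The one-site map (28)-(30) and its fixed points can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the parametrization a = (1 − c)/2 − p, b = (1 − c)/2 + p, takes the coexistence point as x = c/b (from y′ = y), and evaluates the eigenvalues of the Jacobian of the map written out by hand. The function names are hypothetical.

```python
import cmath


def rates(p, c):
    """Return (a, b) from the independent parameters p and c, with a + b + c = 1."""
    half = (1.0 - c) / 2.0
    return half - p, half + p


def mean_field_step(x, y, a, b, c):
    """One iteration of the map (28)-(29), with z = 1 - x - y."""
    z = 1.0 - x - y
    return x + a * x * z - b * x * y, y + b * x * y - c * y


def coexistence_point(a, b, c):
    """Active fixed point: x from y' = y, then y from x' = x."""
    x = c / b
    return x, (1.0 - x) / (1.0 + b / a)


def jacobian_eigenvalues(x, y, a, b, c):
    """Eigenvalues of the 2x2 Jacobian of the map at the point (x, y)."""
    z = 1.0 - x - y
    j11 = 1.0 + a * z - a * x - b * y   # d x' / d x
    j12 = -(a + b) * x                  # d x' / d y
    j21 = b * y                         # d y' / d x
    j22 = 1.0 + b * x - c               # d y' / d y
    trace, det = j11 + j22, j11 * j22 - j12 * j21
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


if __name__ == "__main__":
    p, c = 0.2, 0.3
    a, b = rates(p, c)
    x3, y3 = coexistence_point(a, b, c)
    print("coexistence point:", x3, y3)
    print("eigenvalues at (0, 0):", jacobian_eigenvalues(0.0, 0.0, a, b, c))
    print("moduli at coexistence:",
          [abs(ev) for ev in jacobian_eigenvalues(x3, y3, a, b, c)])
```

At the empty state the eigenvalues reduce to 1 + a and 1 − c, one larger and one smaller than unity whenever a > 0 and c > 0, which is the saddle behavior described in the text.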
The empty absorbing state will never be reached. A linear stability analysis also shows that the solution $(x_2, y_2)$ is a stable node in the region of the phase diagram $c > c_1$, where
$$c_1(p) = \frac{1}{3}(1 + 2p), \tag{32}$$
which is the condition b < c written in terms of p. The active solution is stable in the region $c < c_1$ and is attained in two ways: as an asymptotically stable focus, where the successive iterations of the map show damped oscillations; or as an asymptotically stable node. In the phase diagram of figure 3 we show the transition line between the prey absorbing state and the active state, given by $c = c_1$. Figure 4 shows the behavior of the densities against the parameter c, the probability of predator death, for the special case p = 0.2. In terms of phase transitions, in the phase diagram there is a transition line separating the absorbing prey phase and the active phase, which is characterized by constant and nonzero densities of prey and predator.

Figure 4. Densities of predator and prey as functions of the parameter c for p = 0.2, for the patch model.

We may conclude that the mean-field approximation for the predator-prey probabilistic cellular automaton with rules (7), (8), and (9) is capable of showing, over a robust set of control parameters, that prey and predators can coexist without extinction. However, the map defined by equations (28) and (29) is not able to describe self-sustained oscillations of the species population densities.

3.3. Quasi-spatial model

In order to find out whether oscillations in the species populations can be described within a mean-field approach, we consider a more sophisticated approximation, the pair approximation, where correlations of two neighbor sites are included in the time evolution equations for the densities. This is the lowest order mean-field approximation which takes into account the spatial localization of neighboring individuals.
In this analysis we will maintain the correlations of one site and the correlations of two sites in the equations. Correlations of three and four neighbor sites will be approximated by means of equation (27). With this approximation the model is described by the following set of five coupled equations
$$x' = au - bv + x, \tag{33}$$
$$y' = bv + (1 - c)\,y, \tag{34}$$
$$u' = \alpha a \Big[ \frac{qu}{z} - \alpha a\, \frac{q u^2}{z^2} \Big] + \Big[ (1 - \beta a) - \alpha a\, \frac{u}{z} \Big] \Big[ u - \alpha b\, \frac{uv}{x} \Big] + \alpha a c\, \frac{wu}{z} + c \Big[ (1 - \beta b)\,v - \alpha b\, \frac{v^2}{x} \Big], \tag{35}$$
$$v' = \alpha b \Big[ \beta a\, \frac{uv}{x} + \alpha a\, \frac{u^2 v}{xz} \Big] + \alpha a (1 - c)\, \frac{uw}{z} + \alpha b \Big[ \frac{rv}{x} - \alpha b\, \frac{r v^2}{x^2} \Big] + (1 - c) \Big[ (1 - \beta b)\,v - \alpha b\, \frac{v^2}{x} \Big], \tag{36}$$
$$w' = \alpha b \Big[ (1 - \beta a)\, \frac{uv}{x} - \alpha a\, \frac{u^2 v}{xz} \Big] + (1 - c) \Big[ w - \alpha a\, \frac{uw}{z} \Big] + c \Big[ \beta b\, v + \alpha b\, \frac{v^2}{x} \Big] + c(1 - c)\,s, \tag{37}$$
where α and β are numerical fractions defined by α = (ζ − 1)/ζ and β = 1/ζ, where ζ is the coordination number of the lattice. For the present case of a square lattice, ζ = 4 so that α = 3/4 and β = 1/4. We are using the following notation: u = P(0, 1), v = P(1, 2), and w = P(0, 2), and also r = P(1, 1), q = P(0, 0) and s = P(2, 2). The last three correlations are not independent but are related to the others by
$$r = x - u - v, \tag{38}$$
$$q = z - u - w, \tag{39}$$
$$s = y - v - w. \tag{40}$$
We used the properties P(1, 0) = P(0, 1), P(1, 2) = P(2, 1) and P(2, 0) = P(0, 2), which follow from the assumption that space is isotropic and homogeneous.

Figure 5. Phase diagram of the quasi-spatial model. The upper continuous line represents the transition, $c_1(p)$, between the prey absorbing (A) state and the nonoscillating coexistence (CNO) state. The lower continuous line represents the transition, $c_2(p)$, between the nonoscillating coexistence and the oscillating coexistence (COS) state. The dashed line separates the two asymptotic time behaviors of the nonoscillating coexistence state.

We have analyzed numerically the five-dimensional map, described by the set of equations (33), (34), (35), (36) and (37), and we have obtained four types of solutions. Two solutions are trivial and are given by x = y = u = v = w = 0 and x = 1, y = u = v = w = 0.
They correspond to the empty and prey absorbing states, respectively. The empty absorbing state, in which both species are extinct, is an unstable solution and never occurs. However, the prey absorbing state is one of the possible stable stationary solutions and is stable above the critical transition line c = c1(p) shown in figure 5. Below this line it becomes unstable, giving rise to the active state.
Figure 6. Densities of predator and prey as functions of c for the quasi-spatial model, for p = 0.2.
The other solutions correspond to the active states, in which prey and predator coexist. These solutions are of two kinds: a stationary solution in which the two species coexist with densities constant in time, which we call the nonoscillating (NO) active state; and another solution in which both population densities oscillate in time. This solution corresponds to a self-sustained oscillation of the predator-prey system and will be called the oscillating (O) active state. In the phase diagram of figure 5 there is a line c = c2(p) that separates the NO and O active phases. Figure 6 shows the behavior of the densities as a function of c for p = 0.
3.4. Oscillatory behavior
In figure 7 we show an example of self-sustained oscillations of the densities of prey and predators as functions of time. The oscillating solutions arise from the nonoscillating solutions through a Hopf bifurcation. The fixed point associated with this solution is an unstable center, around which trajectories in the phase space of predator density versus prey density settle onto a stable limit cycle, as can be seen in figure 7. Notice that the oscillations are not damped and have a well-defined period, which is the same for the prey density and for the predator density; this implies that the oscillations are coupled. A maximum of predators always follows a maximum of prey.
This means that the abundance of prey is a condition that favors the increase in the number of predators. As the predator number increases the prey population decays. The decline of prey is followed by a decrease in the predator number, giving conditions for the increase of the prey population until the cycle starts again. A well-defined oscillatory behavior is found for many biological populations, the most famous being the oscillations in time of the populations of lynx and snowshoe hare in Canada, for which data were collected over a long period of time [7, 8]. If the hare population cycles are mainly governed by the lynx cycle, then the oscillations shown by the present model reproduce qualitatively some of the features of this predator-prey dynamics.
Figure 7. (a) Densities of predator and prey as functions of time and (b) density of predator versus density of prey, for the quasi-spatial model, for p = 0 and c = 0.016.
Next we analyze the behavior of the frequency and amplitude of the oscillations. Fixing the parameter p and varying the parameter c, we verify that throughout the oscillating region the frequency of oscillation is proportional to the parameter c,
ω ∼ c, (41)
as can be seen in figure 8. Low frequencies are associated with low values of c; this means that, for small values of c, the greater the predator lifetime, the greater the period of the oscillation. As to the amplitude A of the oscillations, we have verified that, fixing the value of p and varying the parameter c, it increases as c decreases. Since the oscillating phase lies below the line c = c2, our results show that
A ∼ (c2 − c)^{1/2}, (42)
as expected for a Hopf bifurcation and shown in figure 8.
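The square-root law for the amplitude implies that A^2 varies linearly with c near the bifurcation, so a straight-line fit of A^2 against c gives an estimate of c2 as the point where the line crosses zero. The following is a minimal sketch of that procedure, not the authors' code. The amplitude data are synthetic, generated from the scaling law itself with a hypothetical prefactor K, under the assumption that oscillations occur for c < c2 (as in figure 7, where c = 0.016 lies below c2 = 0.019).

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_c2(cs, amplitudes):
    """Estimate the bifurcation point as the zero of the line fitted to A^2(c)."""
    slope, intercept = fit_line(cs, [a * a for a in amplitudes])
    return -intercept / slope


# Synthetic illustration only: K is a hypothetical prefactor and the
# amplitudes are generated from A = K * sqrt(c2 - c), not measured.
C2_ASSUMED = 0.019
K = 2.0
cs = [0.010, 0.012, 0.014, 0.016, 0.018]
amplitudes = [K * math.sqrt(C2_ASSUMED - c) for c in cs]
c2_estimate = estimate_c2(cs, amplitudes)
```

With amplitudes measured from iterating the map (33)-(37), the same two functions could be applied unchanged; only points close to the bifurcation should be used, since the square-root law is an asymptotic statement.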
The transition line c = c2 from the oscillating phase to the nonoscillating phase can be obtained either by using the criterion given by equation (42) or by analyzing the eigenvalues associated with the map given by the set of equations (33), (34), (35), (36) and (37). This last criterion amounts to finding the points of the phase diagram at which the real part of the dominant complex eigenvalue equals 1.
4. Discussion and conclusion
The main result coming from the pair mean-field approximation applied to the predator-prey PCA is that it is possible to describe coexistence and self-sustained time oscillations. Moreover, these are stable oscillations. Given a set of parameters, just one limit cycle is reached, no matter what the initial conditions are. This property is essential in describing a biological system, since a small variation in the initial condition cannot modify the amplitude, frequency and mean value of the oscillating densities of a predator-prey system.
Figure 8. (a) Frequency of oscillations ω versus the parameter c. The frequency vanishes linearly as one approaches c = 0. (b) Amplitude A of oscillations versus c near the Hopf bifurcation point c2 = 0.019. The quantity A^2 vanishes linearly when c → c2, in accordance with a Hopf bifurcation.
Similar results were obtained from a continuous time version of the present model [10]. Although the simple mean-field equations are essentially the same in both versions, this is not the case for the pair mean-field approximation. The time evolution of the pair correlations for the PCA presented here depends on higher-order correlations (up to fourth order), whereas that of the continuous version depends on correlations only up to third order. The model studied here is a spatially structured model with individuals residing on the sites of a lattice and described by discrete dynamic variables.
When we perform the simple mean-field approximation we neglect all the correlations between sites of the lattice. But we take into account that there are limited resources for the survival of each species. For example, in the time evolution equation for the density of prey we have an explicit term for the birth of prey, which is the product of the density of prey x and the density of empty sites z = 1 − x − y. This coincides with an extended patch model approach for predator-prey systems. The presence of this term is what distinguishes the simple mean-field equations from the Lotka-Volterra equations. However, even taking into account the limitation of space and resources, the simple mean-field equations are not sufficient to produce self-sustained oscillations, although they are able to describe damped oscillations of the population densities in time. To get self-sustained oscillations we had to proceed to the next level of approximation, in which a pair of nearest-neighbor sites is treated exactly. This approximation can be seen as representing a pair of nearest-neighbor sites immersed in a mean field produced by the rest of the lattice, the most important feature being that the two sites of this pair can be seen as localized in space. The set of five equations that results from the pair approximation for the PCA is indeed able to produce self-sustained oscillations of the population densities. It presents an important property that the Lotka-Volterra model lacks, namely, that the oscillating solutions are stable and unique for a given set of control parameters.
Acknowledgements
The authors have been supported by the Brazilian agency CNPq.
References
[1] Lotka A 1920 J. Am. Chem. Soc. 42 1595
[2] Lotka A 1920 Proc. Nat. Acad.
of Sciences USA 6 410
[3] Lotka A 1924 Elements of Mathematical Biology (New York: Dover)
[4] Volterra V 1931 Leçons sur la Théorie Mathématique de la Lutte pour la Vie (Paris: Gauthier-Villars)
[5] Haken H 1976 Synergetics, An Introduction (Berlin: Springer)
[6] Renshaw E 1991 Modelling Biological Populations in Space and Time (Cambridge: Cambridge University Press)
[7] Hastings A 1997 Population Biology: Concepts and Models (New York: Springer)
[8] Ricklefs R E and Miller G L 2000 Ecology (New York: Freeman)
[9] Tainaka K 1989 Phys. Rev. Lett. 63 2688
[10] Satulovsky J and Tomé T 1994 Phys. Rev. E 49 5073
[11] Durrett R and Levin S 1994 Theor. Popul. Biol. 46 363
[12] Hanski I and Gilpin M E (eds.) 1997 Metapopulation Biology: Ecology, Genetics, and Evolution (San Diego: Academic Press)
[13] Satulovsky J and Tomé T 1997 J. Math. Biol. 35 344
[14] Tilman D and Kareiva P 1997 Spatial Ecology: the Role of Space in Population Dynamics and Interactions (Princeton: Princeton University Press)
[15] Frachebourg L and Krapivsky P 1998 J. Phys. A 31 L287
[16] Liu Y C, Durrett R and Milgroom M 2000 Ecol. Model. 127 291
[17] Antal T, Droz M, Lipowski A and Odor G 2001 Phys. Rev. E 64 036118
[18] Ovaskainen O, Sato K, Bascompte J and Hanski I 2002 J. Theor. Biol. 215 95
[19] Aguiar M A M, Sayama H, Baranger M and Bar-Yam Y 2003 Braz. J. Phys. 33 514
[20] de Carvalho K C and Tomé T 2004 Mod. Phys. Lett. B 18 873
[21] Nakagiri N and Tainaka K 2004 Ecol. Model. 174 103
[22] Szabó G 2005 J. Phys. A 38 6689
[23] Stauffer D, Kunwar A and Chowdhury D 2005 Physica A 352 202
[24] de Carvalho K C and Tomé T 2006 Int. J. Mod. Phys. C 17 1647
[25] Mobilia M, Georgiev I T and Tauber U C 2006 Phys. Rev. E 73 040903
[26] Arashiro E and Tomé T 2007 J. Phys.
A 40 887
[27] Liggett T M 1985 Interacting Particle Systems (New York: Springer)
[28] Marro J and Dickman R 1999 Nonequilibrium Phase Transitions (Cambridge: Cambridge University Press)
[29] Tomé T and de Oliveira M J 2001 Dinâmica Estocástica e Irreversibilidade (São Paulo: Editora da Universidade de São Paulo)
[30] Tomé T 1994 Physica A 212 99
[31] Tomé T, Arashiro E, Drugowich de Felício J R and de Oliveira M J 2003 Braz. J. Phys. 33 458
[32] Dickman R 1986 Phys. Rev. A 34 4246
[33] Tomé T and Drugowich de Felício J R 1996 Phys. Rev. E 53 3976
[34] Levins R 1969 Bull. Entomol. Soc. Am. 15 237
Introduction Model Probabilistic cellular automaton Predator-prey probabilistic cellular automaton Time evolution equations for state functions Mean-field approximation One and two site approximations Patch model Quasi-spatial model Oscillatory behavior Discussion and conclusion
ABSTRACT We analyze a probabilistic cellular automaton describing the dynamics of coexistence of a predator-prey system. The individuals of each species are localized over the sites of a lattice and the local stochastic updating rules are inspired by the processes of the Lotka-Volterra model. Two levels of mean-field approximations are set up. The simple approximation is equivalent to an extended patch model, a simple metapopulation model with patches colonized by prey, patches colonized by predators and empty patches. This approximation is capable of describing the limited available space for species occupancy. The pair approximation is moreover able to describe two types of coexistence of prey and predators: one where population densities are constant in time and another displaying self-sustained time-oscillations of the population densities. The oscillations are associated with limit cycles and arise through a Hopf bifurcation. They are stable against changes in the initial conditions and, in this sense, they differ from the Lotka-Volterra cycles which depend on initial conditions.
In this respect, the present model is biologically more realistic than the Lotka-Volterra model. <|endoftext|><|startoftext|> Introduction Observations and data reduction WHT spectroscopy Calar Alto photometry Data analysis Radial velocity analysis Light curve analysis Discussion Conclusions ABSTRACT Intermediate polars (IPs) are cataclysmic variables which contain magnetic white dwarfs with a rotational period shorter than the binary orbital period. Evolutionary theory predicts that IPs with long orbital periods evolve through the 2-3 hour period gap, but it is very uncertain what the properties of the resulting objects are. Whilst a relatively large number of long-period IPs are known, very few of these have short orbital periods. We present phase-resolved spectroscopy and photometry of SDSS J233325.92+152222.1 and classify it as the IP with the shortest known orbital period (83.12 +/- 0.09 min), which contains a white dwarf with a relatively long spin period (41.66 +/- 0.13 min). We estimate the white dwarf's magnetic moment to be mu(WD) \approx 2 x 10^33 G cm^3, which is not only similar to three of the other four confirmed short-period IPs but also to those of many of the long-period IPs. We suggest that long-period IPs conserve their magnetic moment as they evolve towards shorter orbital periods. Therefore the dominant population of long-period IPs, which have white dwarf spin periods roughly ten times shorter than their orbital periods, will likely end up as short-period IPs like SDSS J2333, with spin periods a large fraction of their orbital periods. <|endoftext|><|startoftext|> Radosław Hofman, cSAT problem lower bound, 2007 Abstract: This article deals with the lower bound, understood as the minimal amount of time required in the worst case to compute the answer, for cSAT (the counted Boolean satisfiability problem).
It uses the observation that Boolean algebra is a complete first-order theory in which every sentence is decidable. The lower bound of this decidability is defined and shown. The article shows that a deterministic calculation model made up of a finite number of machines (algorithms), oracles, axioms, or predicates is incapable of solving the considered NP-complete problem when its instance grows to infinity. This is a direct proof of the fact that the P and NP complexity classes differ and that an oracle capable of solving NP-complete problems in polynomial time must consist of an infinite number of objects (i.e., must be nondeterministic). A corollary of this article clarifies the complexity hierarchy: P < NP.
Index terms: complexity class, P vs NP, Boolean algebra, first-order theory, first-order predicate calculus.
I. INTRODUCTION
The unknown relation between the P and NP [5] complexity classes remains one of the significant unsolved problems in complexity theory. The P complexity class consists of problems solvable by a deterministic Turing machine (DTM) in polynomially bounded time, while the NP complexity class consists of problems solvable by a nondeterministic Turing machine (NDTM) in polynomially bounded time. This means that a DTM can verify a solution of every NP problem in polynomially bounded time, even if a polynomial algorithm for finding this solution is unknown [13]. None of the known attempts to prove that these classes are or are not equal has convinced the community that its arguments are final. Attempts to show that P=NP mainly suffer from counterexamples provided for the methods described by the solvers (see for example: [6], [9]), especially for large instances. Attempts to prove that P≠NP mainly struggle with the difference between a problem and an algorithm.
Proving the inequality of these classes is equivalent to proving that “there is no algorithm that solves a particular NP problem in polynomially bounded time.” An algorithm is an immaterial object, so proving that it does not exist is rather difficult. Can the inequality of the complexity classes then be proved? One possible way is to use the properties of first-order theories. A useful property is that a sentence ϕ in theory T is provable if there exists a set of axioms a, b, c, … such that ϕ can be obtained using these axioms and the inference rules “modus ponens” and “universal generalization” (a ∧ b ∧ c… → ϕ) [2].
Manuscript created December 29, 2006. The author is a Ph.D. student at the Department of Information Systems at The Poznan University of Economics, http://www.kie.ae.poznan.pl, email: radekh@teycom.pl.
II. BACKGROUND
This section presents some background on first-order theories and the other rules used in the article.
A. First-Order Theories
A first-order theory is a given set of axioms in some language. A language consists of logical symbols together with a set of constant, function, and relation symbols (predicates). Terms and formulas are built from the language and give rise to sentences, which are formulas with no free variables in the body. A theory is then a set of sentences, which is called closed if it contains all consequences of its elements. A theory can also be complete (every sentence can be proved or disproved), consistent (not every sentence is provable), or decidable (every sentence can be proved or disproved and there exists a computational path (algorithm) showing which sentences are provable). An example of a first-order theory that is complete and decidable is Boolean algebra [16]. Zermelo-Fraenkel set theory is another well-known first-order theory but, if it is consistent, it is neither complete nor decidable.
B. First-Order Logic
First-order logic, also called first-order predicate calculus (FOPC), is a system of deductions extending propositional logic.
Atomic sentences of first-order logic are called predicates and are usually written in the form P(t1, t2, …, tn). An important ingredient of first-order logic not found in propositional logic is quantification. In 1929, Gödel [8] proved that every valid first-order formula is provable in first-order logic. In other words, the inference rules of FOPC are sufficient to prove any valid formula. The language of first-order predicate calculus consists of predicates, constants, functions, variables, logical operators (NOT, OR, AND), quantifiers, parentheses, and some type of equality symbol. There is also a set of rules for the recognition of terms and well-formed formulas (wffs). There are four axioms for quantification:
1) PRED-1: (∀ x Z(x)) → Z(t)
2) PRED-2: Z(t) → (∃ x Z(x))
3) PRED-3: (∀ x (W → Z(x))) → (W → ∀ x Z(x))
4) PRED-4: (∀ x (Z(x) → W)) → (∃ x Z(x) → W)
An important theorem for first-order logic is the outcome of Herbrand’s work (known as Herbrand’s theorem). It states that in predicate logic without equality, a formula A in prenex form (all quantifiers at the front) is provable if and only if a sequent S comprising substitution instances of the quantifier-free subformula of A is propositionally derivable, and A can be obtained from S by structural rules and quantifier rules only. In other words, it states that the formula is provable if, and only if, we can rewrite it without quantifiers by substituting values and obtain a provable formula. For example, over the Boolean domain:
∀ x Z(x) = Z(0) ∧ Z(1)
∃ x Z(x) = Z(0) ∨ Z(1)
C. Boolean Algebra
Boolean algebra (also called Boolean lattice) is an algebraic structure containing objects, operations upon them, and a set of axioms (see Section D). It consists of one unary operation ¬ (not) and two binary operations ∧ (and), ∨ (or), together with two distinct elements 0 (constant representing false) and 1 (constant representing true).
The language of Boolean algebra, considered as a language for first-order logic, also contains the symbols = (equality), ⇒ (implication), parentheses, and the quantifiers ∀ (universal) and ∃ (existential). Boolean algebra has the essential properties of logic as well as all set operations (union, intersection, complement).
D. Axioms of Boolean Algebra
Given below is a complete list of Boolean algebra axioms. This set is not a minimal set of axioms (especially starting from Ax13): some axioms can be derived from others, but this does not change the reasoning used in this article (the list is larger only for clarity and to ensure that it is complete):
Ax1) a = b can be written as (a ∧ b) ∨ (¬a ∧ ¬b)
Ax2) a ⇒ b = ¬a ∨ (a ∧ b)
Ax3) a ∨ (b ∨ c) = a ∨ b ∨ c = (a ∨ b) ∨ c
Ax4) a ∧ (b ∧ c) = a ∧ b ∧ c = (a ∧ b) ∧ c
Ax5) a ∨ b = b ∨ a
Ax6) a ∧ b = b ∧ a
Ax7) a ∨ (a ∧ b) = a
Ax8) a ∧ (a ∨ b) = a
Ax9) a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)
Ax10) a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c)
Ax11) a ∨ ¬a = 1
Ax12) a ∧ ¬a = 0
Ax13) a ∨ a = a
Ax14) a ∧ a = a
Ax15) a ∨ 0 = a
Ax16) a ∧ 1 = a
Ax17) a ∨ 1 = 1
Ax18) a ∧ 0 = 0
Ax19) ¬0 = 1
Ax20) ¬1 = 0
Ax21) ¬(a ∨ b) = ¬a ∧ ¬b
Ax22) ¬(a ∧ b) = ¬a ∨ ¬b
Ax23) ¬¬a = a
Using universal generalization, one may add to every axiom definition the universal quantifier stating “for all x axiom body”.
E. Computational Tree
A corollary of the considerations stated earlier is that every formula in Boolean algebra is decidable. A sentence is said to be proved (or is called a “tautology”) if there exists a transformation path from a set of axioms to the sentence that we are trying to prove. As long as we are discussing FOPC, one may say that a sentence is provable if, and only if, we can start with axioms, repeatedly apply “modus ponens” or “universal generalization”, and obtain this sentence [2]. One may then consider every possible (provable) sentence to be deducible from axioms, which may be presented as the graph shown in Fig. 1.
Figure 1. Example of deducible tree (nodes: Axiom 1, Axiom 2, Axiom 3, ..., Axiom n and Sentence 1.1, Sentence 1.2, Sentence 1.3, ..., Sentence 1.m; edges: modus ponens, universal generalization).
Axioms may also be the result of computations, especially when they are not independent (e.g., ZF axioms) or when computations fall into a cycle. Usually, during computation one would skip deductions leading to already proven sentences because they do not introduce any new information, so deductions leading to axioms would be omitted.
F. Inference Rules and Deduction
Modus ponens is an inference rule using the reasoning: if a and a → b are both proved, then b is also proved. Universal generalization is an inference rule using the reasoning: if P(a) is proved, and a is a free variable, then ∀ a P(a) is also proved. The deduction theorem (in fact a deduction meta-theorem) states that if formula Q can be deduced from P, then the implication P → Q can be directly shown to be deducible from the empty set. Using the symbol “├” for deducible, one may write: if P ├ Q then ├ P → Q. One may generalize it to a finite sequence of assumption formulas: if P1, P2, P3, …, Pn ├ Q then P1, P2, P3, …, Pn-1 ├ Pn → Q, and repeat it until we obtain the empty set on the left-hand side: ├ (P1 → (…(Pn-1 → (Pn → Q))…)). A deduction uses three kinds of steps: hypothesis (setting up a set of assumptions), reiteration (recalling a hypothesis made previously to make it recent), and deduction (removing the most recent hypothesis). If one wants to convert a proof done using the deduction meta-theorem into an axiomatic proof, then usually the following axioms are involved:
1) P → (Q → P)
2) (P → (Q → R)) → ((P → Q) → (P → R))
3) Modus ponens: (P ∧ (P → Q)) → Q
G. Corollaries
Theorem 1: If a formula expressible in the FOPC language is deducible, then every possible transformation of this formula obtained by using inference rules and axioms is also deducible, and can be expressed in the same language.
Theorem 2: If every transformation of a formula is expressible in FOPC, then the transformation that is optimal with respect to a certain resource for the chosen computational model is also expressible in the same language.
Proofs of these theorems are provided in the appendix (VI.A and VI.B). One needs to focus on Theorem 1 and have a good understanding of the significance of Gödel's work. Consider some formula ϕ which is to be proved or disproved. Assuming that there exists some deterministic transformation T1 which transforms formula ϕ into ϕ1: ϕ1=T1(ϕ), there can also exist another transformation T2 taking ϕ1 as input and returning ϕ2 as output: ϕ2=T2(ϕ1)=T2(T1(ϕ)). Continuing this chain of transformations, one reaches the formula ϕTRUE or ϕFALSE, which allows ϕ to be proved or disproved. Theorem 1 is in fact a summary of Gödel's Theorem [8], stating that ϕ, ϕ1, ϕ2, …, ϕx can all be expressed in the FOPC language. This in turn leads to the statement of Theorem 2: if every possible transformation or reformulation of a formula is expressible in the FOPC language, then (by quantifying over all of them) the optimal transformation is also expressible in the FOPC language. The nature of these transformations does not matter, as long as they are deterministic (for a given input they always return the same output in a finite number of steps). The optimal way to solve the problem (decide the formula) can then be written as:
ϕTRUE/FALSE=Tx(Tx-1(Tx-2(…(T2(T1(ϕ)))…)))
III. CSAT LOWER BOUND
A. Problem Definition
In this work, the authors consider a problem called “Count of Satisfaction of Boolean Expression for formula ϕ.” This problem is almost the same as the classical SAT problem, but instead of the question “Is there an assignment to variables such that formula ϕ is satisfied?”, it asks the question “Are there at least L assignments such that formula ϕ is satisfied?”.
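The problem question can be made concrete with a direct, exhaustive decision procedure. The sketch below is an illustration only, not an algorithm from the article: it assumes a CNF formula encoded as a list of clauses, each clause a list of non-zero integers in the DIMACS style (k for variable k, -k for its negation), and the function names are hypothetical.

```python
from itertools import product


def count_satisfying(clauses, n):
    """Count assignments of n Boolean variables that satisfy a CNF formula.

    clauses: list of clauses; each clause is a list of non-zero ints,
             where k stands for variable k and -k for its negation
             (an encoding assumed here for illustration).
    """
    count = 0
    for bits in product((False, True), repeat=n):
        # A clause holds if at least one of its literals is true.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


def csat(clauses, n, L):
    """Answer the cSAT question: are there at least L satisfying assignments?"""
    return count_satisfying(clauses, n) >= L


# (x1 OR x2) over two variables is satisfied by 3 of the 4 assignments.
example = [[1, 2]]
answer_three = csat(example, 2, 3)
answer_four = csat(example, 2, 4)
```

This procedure visits all 2^n assignments, so it only illustrates what is being asked; with L = 1 it reduces to the ordinary SAT question.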
L in the problem instance is written in unary, and the remaining part of the instance is exactly the same as in the SAT problem (the authors assume that it is in conjunctive normal form (CNF)). It is easy to show that the problem is in NP: the Guess & Check algorithm for an NDTM requires O(L*n) steps to check (the certificate size is L*v, where v is the number of Boolean variables used). It is also easy to show that the problem is NP-complete. One can show it using a reduction from the SAT problem, asking the question “Is there at least L=1 assignment such that formula ϕ is satisfied?”
B. Measurable Predicate
The problem question is easy for a human to understand, but it goes beyond the FOPC language defined in Section II. To express it in the defined language, one needs to define the predicate “µ” (measure). This predicate is a representation of the sigma-additive (countably additive) measurable function known as “set cardinality.” The definition of this predicate requires one constant n, the number of different Boolean variables used. The predicate “µ” measures the number of assignments satisfying formula ϕ.
1: µ(∅) :- 0
2: µ(TRUE) :- 2^n
3: µ(FALSE) :- 0
4: µ(¬ϕ1) :- 2^n − µ(ϕ1)
5: µ(a1) :- 2^(n−1)
6: µ(a1∧a2…∧ak)
   ∃ ai, aj: i≠j ∧ ai=¬aj :- µ(FALSE)
   ∃ ai, aj: i≠j ∧ ai=aj :- µ(a1∧a2…aj-1∧aj+1...∧ak)
   otherwise :- 2^(n−k)
7: µ(ϕ1∨ϕ2) :- µ(ϕ1)+µ(ϕ2)−µ(ϕ1∧ϕ2)
One may think of adding some more conditions to this predicate, but the list given above is sufficient to calculate the measure of every formula of the defined language (the growth of the number of axioms and definitions is discussed in Section H). It is also compliant with the definition of a sigma-additive measurable function. One may also observe that the use of the measure leads to an exponential number of calculations for CNF. This is a consequence of the sigma-additivity property: for any sets a and b: µ(a∪b)=µ(a)+µ(b)−µ(a∩b).
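The rules for µ can be exercised on a small CNF instance. The sketch below is one possible reading of them, under stated assumptions: clauses use the same hypothetical integer encoding (k for variable k, -k for its negation), the measure of a CNF formula is obtained from rule 4 as 2^n minus the measure of the disjunction of the negated clauses, and the additivity rule for ∨ is unrolled over all non-empty groups of negated clauses. The function names are illustrative.

```python
from itertools import combinations


def mu_conjunction(literals, n):
    """Measure of a conjunction of literals over n variables.

    A literal together with its negation gives µ(FALSE) = 0, repeated
    literals are dropped, and k distinct literals leave 2^(n-k) assignments.
    """
    distinct = set(literals)
    if any(-lit in distinct for lit in distinct):
        return 0
    return 2 ** (n - len(distinct))


def mu_cnf(clauses, n):
    """Measure of a CNF formula via µ(ϕ) = 2^n − µ(¬C1 ∨ ... ∨ ¬Cm).

    Each negated clause ¬Ci is a conjunction of literals, and the measure
    of their disjunction is expanded by repeated use of
    µ(a ∨ b) = µ(a) + µ(b) − µ(a ∧ b), i.e. inclusion-exclusion.
    """
    negated = [[-lit for lit in clause] for clause in clauses]
    union = 0
    for size in range(1, len(negated) + 1):
        sign = 1 if size % 2 == 1 else -1
        for group in combinations(negated, size):
            merged = [lit for conjunction in group for lit in conjunction]
            union += sign * mu_conjunction(merged, n)
    return 2 ** n - union


# (x1 OR x2) AND (NOT x1 OR x2) is equivalent to x2: 2 of 4 assignments.
measure = mu_cnf([[1, 2], [-1, 2]], 2)
```

The double loop touches every non-empty group of clauses, 2^m − 1 groups for m clauses, which is the exponential growth the text attributes to the additivity property.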
If one considers m sets, then this function transforms to:
$\mu\left(\bigcup_{i=1}^{m} a_i\right) = \sum_{S \in P(\{a_1..a_m\}),\ S \neq \emptyset} (-1)^{|S|+1} \cdot \mu\left(\bigcap_{a \in S} a\right)$,
where P({a1..am}) is the power set over m sets, which means that it has 2^m objects in it. If one is able to calculate the measure of a set or of an intersection of sets, then the calculation of the union of m sets requires Ω(2^m) intersections to be measured. The problem question using the predicate “µ” is then: “µ(ϕ)≥L?”. Direct calculation may not be the only possible way of solving the problem, and the authors now analyze the definition of the lower bound and the deterministic and nondeterministic computation models.
C. Lower Bound Definition
A lower bound in Big-O notation is denoted as Ω(g(n)), and in this article it is used to express the lower bound of a problem. The interpretation of the lower bound is “minimum value of the function in the worst case,” and it is defined as
$f(n) \in \Omega(g(n)) \Leftrightarrow \liminf_{n \to \infty} \frac{f(n)}{g(n)} > 0$
In most complexity considerations, two types of resources are used in the expressions of problem lower bounds or algorithm upper bounds. These resources are time (number of steps required) and space (number of symbols/tape cells required).
Theorem 3: The time complexity of a problem/algorithm is always greater than or equal to its space complexity.
This theorem is proved in Section VI.C.
Theorem 4: The minimal number of symbols required for an unambiguous description of an object is Ω(log(N)), where N represents the number of possible objects to be stored.
In other words, this means that if one has N different objects that may occur in computations at a certain step and wants to store the information on which one occurred, then Ω(log(N)) symbols are required. This theorem is proved in Section VI.D. In this work, the authors mainly consider time complexity, using the observations from Theorems 3 and 4.
Theorem 5: The lower bound calculated to express the usage of a specific resource (time or space) for deciding a formula expressed in FOPC for a chosen computational model is equal to the minimal usage of this resource for the best possible transformation of the formula in this language.
This theorem is proved in Section VI.E. Theorem 5 is a consequence of Theorem 2. If one had a set of deterministic transformations expressing the optimal way to solve the problem:
ϕTRUE/FALSE=Tx(Tx-1(Tx-2(…(T2(T1(ϕ)))…)))
then by the definitions it can be shown that the lower bound for the problem solution is exactly equal to the time required by this optimal solution. This is a consequence of the lower bound definition: it is the asymptotically minimal amount of resource required to solve the problem. Repeating the most important observations up to this point:
a) formula ϕ can be expressed in the FOPC language (from Gödel's Theorem [8])
b) any possible transformation of the formula can be expressed in the FOPC language, ϕ1=T1(ϕ) (Theorem 1)
c) if every deterministic transformation can be expressed in the FOPC language, then the optimal deterministic transformation can also be expressed in the FOPC language (Theorem 2): ϕTRUE/FALSE=Tx(Tx-1(Tx-2(…(T2(T1(ϕ)))…)))
d) the resource cost of the optimal transformation of the formula is equal to the deterministic lower bound of the problem
Roughly speaking, the lower bound should be considered as the minimal amount of resource used for computation in the worst case. In the case of time, it is the minimal number of operations to perform. It is even intuitive to see that if one can express a calculation in some “steps,” then the lower bound is equivalent to the minimal number of “steps” required in the worst case.
D. Nondeterministic Calculation Model
The nondeterministic calculation model may be considered as the “luckiest possible guesser.” Such an approach expresses that the role of an NDTM in answering a problem question is to guess the certificate and check it.
If the check can be performed in O(n^c) steps for some constant c, then one considers the problem as part of the NP complexity class. One has to remember that a DTM is a “special case” of an NDTM in which, from every machine state, only one possibility to choose the next state exists, whatever symbol is in the cell where the tape read/write head is positioned. This means that every problem solvable by a DTM in O(n^c) steps is also solvable on an NDTM in at most the same number of steps (or possibly fewer). A good example expressing the differences between DTM and NDTM is the 2SAT problem (the classic problem of satisfaction of a Boolean expression in CNF, but where each clause contains at most two literals). This problem is solvable by a DTM in O(n^3) steps, but an NDTM may guess the correct assignment and verify it in O(n). In terms of first-order logic and Herbrand’s theorem, one can see that an NDTM is a verifier of Herbrand’s subformulas. When the formula is expressed using an existential quantifier: ∃ a F(a), then, according to Herbrand’s theorem, it is equivalent to: F(a1) ∨ F(a2) … ∨ F(ak). An NDTM is able to check each of the F(ai) simultaneously, even if the number of possible assignments is exponential, accepting when at least one of the computation paths leads to an accepting state. One can see that for the nondeterministic calculation model, the lower bound on the number of steps is equal to the minimal number of steps required to check the certificate. For example, for the 2SAT problem one can have different approaches.
# | Number of possible Herbrand's subformulas | Minimal number of steps to check each subformula | Total calculation cost
1 | 2^n | n | n
2 | 2^p (guessing only p variables) | n*p^3 | n*p^3
3 | 1 (without splitting) | n^3 | n^3
Table 1. Different approaches for nondeterministic calculation.
Table 1 presents different approaches, differing mainly in the number of “guesses.” The calculation of the problem lower bound for the nondeterministic model of calculation returns the minimal number of steps required to check a subformula.
The last row presents the approach where the problem is not split, so it is calculated as in the deterministic model of calculations.

E. Deterministic Calculation Model

As mentioned in Section D, the deterministic model of calculations follows a single computation path. It is obvious that, besides direct calculations, a DTM can also perform a Guess & Check algorithm (simulating an NDTM). This time, the authors do not assume that the DTM is the “luckiest possible guesser”; for the lower bound complexity calculation of this approach, they have to assume that the DTM is the “worst possible guesser.” This is also a consequence of the slight change in the computation goal: an NDTM has to “accept” when there is a computational path leading to an accepting state, while a DTM has to “decide” on the input, which means that the answer “NO” can be produced only when there is no possible way of reaching the accepting state (an NDTM can be defined without a rejecting state). Additionally, a DTM requires an iterator (space on the tape where the number of the current “guess” can be stored), which according to Theorem 4 requires Ω(log(H)) symbols, where H represents the possible number of “guesses.” Table 2 shows what the time complexity would look like.

Radosław Hofman, cSAT problem lower bound, 2007

Approach | Number of possible Herbrand’s subformulas | Minimal number of steps to check each subformula | Total calculation cost
1 | $2^n$ | n | $2^n \cdot n + \log(2^n)$
2 | $2^p$ | $n \cdot p^3$ | $2^n \cdot n \cdot p^3 + \log(2^p)$
K | 1 | $n^3$ | $n^3$

Table 2 Different approaches for deterministic calculation

In this table, the last row represents the minimal possible number of steps to calculate the result. It is easy to show that for a DTM this row also presents the deterministic problem lower bound, because if any of the “guessing” approaches had been better, then it would have been used to present the minimal deterministic calculation cost (we assume that the values in the table are “best possible,” not “best known”; see Theorem 5).

F. cSAT Nondeterministic Algorithm Upper Bound

The upper bound for a nondeterministic algorithm solving the cSAT problem is polynomial.
It is a consequence of Herbrand’s theorem and the ability of an NDTM to:
• generate all possible subformulas in $O(n^c)$
• verify each of them in $O(n^c)$

The NDTM algorithm can be described using the following steps:
1) Guess a set of measure L consisting of assignments of variables (time O(L*v))
2) Verify the guessed set (time O(L*v))

This procedure leads to the accepting state (if at least one computation path is accepting) in at most O(L*v) steps, and because the instance size is n∈Ω(L+v), the solution is provided in $O(n^2)$.

G. cSAT Deterministic Lower Bound

Now, using the observations described in the earlier sections, the authors calculate the deterministic lower bound of the cSAT problem (it is known that its nondeterministic upper bound is O(L*v)). First, one needs to write the problem in the FOPC language. One uses the predicate µ: µ(ϕ)≥L. This problem may be considered to be harder than the classic SAT problem. If one tries to guess all possible subsets, then one would have $\Omega(2^{2^v})$ subsets, so according to Theorem 4 it would require $\Omega(2^v)$ symbols to store information about the considered subset, which, according to Theorem 3, leads to the conclusion that such a calculation requires at least $\Omega(2^v)$ steps. “Guessing” only subsets of size L leads to $\Omega(2^v)$ different subsets, so the problem can be calculated by an NDTM in polynomial time (Ω(v*L) steps), but requires $\Omega(2^v \cdot v \cdot L)$ steps to calculate on a DTM.

In fact, following the assumption that a DTM is the worst possible guesser, one may see that the number of hypotheses (“guesses”) used during computation can lead to an exponential usage of time if the “depth” of the hypothesis path is longer than O(log(n)) or if any of the hypotheses has more than a polynomial number of possible values.
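The two-step guess-and-verify procedure of Section F, and the deterministic enumeration it is contrasted with, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's algorithm: it assumes that µ(ϕ) is the number of satisfying assignments of ϕ, and the clause encoding (signed integers) and the function names are hypothetical. `verify_certificate` plays the role of step 2 and costs one formula check per guessed assignment, while `decide_by_counting` walks through the assignments in order and can only answer “NO” after visiting all 2^v of them.

```python
from itertools import product

def holds(clauses, assignment):
    """True if the tuple of booleans `assignment` satisfies every clause."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def verify_certificate(clauses, guessed, threshold):
    """Step 2: accept if `guessed` contains at least `threshold` distinct
    assignments and every one of them satisfies the formula."""
    distinct = set(guessed)
    return len(distinct) >= threshold and all(holds(clauses, a) for a in distinct)

def decide_by_counting(clauses, num_vars, threshold):
    """Deterministic fallback: enumerate assignments until enough are found."""
    found = 0
    for assignment in product([False, True], repeat=num_vars):
        if holds(clauses, assignment):
            found += 1
            if found >= threshold:
                return True
    return False  # reached only after all 2**num_vars assignments were tried

# Hypothetical instance: (x1 or x2) and (not x1 or x3)
phi = [[1, 2], [-1, 3]]
certificate = [(False, True, False), (True, False, True)]

print(verify_certificate(phi, certificate, 2))  # checks 2 guessed assignments
print(decide_by_counting(phi, 3, 5))            # must exhaust all 8 assignments
```

The asymmetry is visible in the control flow: the verifier never looks beyond the guessed set, whereas the enumerating decider has no way to stop early on a negative instance.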
For example, suppose one states hypothesis A with two possibilities, true or false (a constant number of possibilities), followed by hypothesis B (true or false), and so on. If O(n) hypotheses are needed before one can decide on the formula, then in the worst case $\Omega(2^n)$ steps are required to give the answer “NO.”

Leaving aside all Guess & Check approaches, the authors try to determine the minimal possible number of steps for a DTM to decide on the problem input. According to Theorem 5 and the considerations from the earlier sections, the authors conclude that the shortest possible path consists of steps transforming the input formula to the axioms of the theory. If one can show that every transformation requires an exponential number of steps, or the use of an object that needs an exponential number of symbols to store, then it will be a direct proof that the lower bound of the cSAT problem is over-polynomial.

When will one be able to observe exponential growth of the minimum number of required steps? When, after using an axiom or predicate, one obtains a formula whose length is multiplied by a factor greater than 1. For example, if for a formula of size $n_1$ (considered to be in CNF) the authors use Ax9) for one parenthesis, they obtain a new formula in the format $v_1\land(n_{2,1})\lor v_2\land(n_{2,2})\lor\dots\lor v_m\land(n_{2,m})$. In each of the m parts, one can use the variable at the beginning to remove all its negations from the body, so $|n_{2,*}|<|n_1|-m$, but for very large $n_1$ these parts of the formula will still require further transformations, which, if done only with Ax9), would lead to exponential growth. To conclude this paragraph, one may say that if a transformation reduces the size of a formula substring by $O(n^c)$ and multiplies this shorter string in the formula, making the string grow to $n_2$, where $n_2\in\Omega(n_1 \cdot c)$, then this path leads to exponential growth of the formula, and thus its lower bound is $\Omega(2^n)$.

In Table 3, the authors present the effect obtained by the use of every possible transformation, but before this the authors define a polynomial purifying function for the formula.
This function will use axioms Ax7), Ax8), Ax11), Ax12), Ax13), Ax14), Ax15), Ax16), Ax17), Ax18), Ax19), Ax20), Ax23), and two observations:
• µ(ϕ1)=µ(ϕ1∧(TRUE))=µ(ϕ1∧(v1∨v2∨…∨TRUE));
• µ(v1∧ϕ1)=µ(ϕ2), where ϕ2 is obtained by replacing every occurrence of v1 in ϕ1 with the constant TRUE.

Roughly speaking, this function looks for variables that can be cleared out of the formula and prepares it for the next step of the calculation. The authors assume that at every step of the calculation, the formula is in a form that does not allow the use of any of the above axioms or rules. It is also important to remember that the number of transformation rules does not matter; refer to Section H.

Transformation used | Length of string used | Result string length | Lower bound for path | Remarks for “worst case”
Ax1), Ax2) | | | | These axioms cannot be used since the input never contains these symbols
Ax3), Ax4), Ax5), Ax6) | n1 | n1 | Ω(cSAT) | These axioms do not change formula length
Ax7), Ax8) | | | | These axioms cannot be used because the formula is transformed by the purifying function
Ax9), Ax10) | m1+m2+ | 2*m1*m2+nr- | | Used on two parentheses, replaces them with a string of size 2*m1*m2; after using these axioms the purifying function will reduce the size, but in the worst case only by 2 symbols
Ax11), Ax12), Ax13), Ax14), Ax15), Ax16), Ax17), Ax18), Ax19), Ax20) | | | | These axioms cannot be used because the formula is transformed by the purifying function
Ax21), Ax22) | n1 | n1 | Ω(cSAT) | These axioms do not change formula length
Ax23) | | | | As Ax20) above
µ1, µ2, µ3 | | | | These rules cannot be used because the formula is transformed by the purifying function
µ4 | n1 | n1 | Ω(cSAT) | Can be used with Ax21), Ax22) or Ax23), but does not change the length of the formula
µ5 | | | | As µ3 above
µ6 | m1+m2 | m1*m2 | n | Treating the formula as consisting of two parts

Table 3 Lower bounds for every possible transformation
of cSAT formula

The variables used in Table 3 are the following:
n1 - length of the formula
nr - length of the remaining part of the formula
m1 - length of the first part/parenthesis of the formula
m2 - length of the second part/parenthesis of the formula
p - number of parentheses in the formula
m - since the authors consider the asymptotic behavior of the function, one may use a kind of “mean” of the lengths m1, m2, …, representing Ω(m1)

It is clear that in the worst case Ax9), Ax10), or µ6 have to be used several times before the purifying function makes a significant reduction in length. The lower bound is considered as the minimal worst case, so from this table it is clear that in the worst case it is $\Omega(m^p)=\Omega(2^{p \cdot \log(m)})$, and because p and m are both O(n), the whole lower bound is $\Omega(2^n)$.

H. More Conditions and More Axioms

The above considerations show that the deterministic lower bound for the cSAT problem, considered with the FOPC language as defined, is exponential. But one needs to answer one additional question: is this the result of a too poorly defined set of FOPC axioms? Or are too few predicates defined?

In the Baker-Gill-Solovay theorem [1], the authors showed that the problem “Is P equal to NP?” can be relativized using oracles. An oracle is a machine (black box) that gives answers to a certain type of problem in one step. One can then imagine that there is a very large number of oracles which can solve certain types of instances. The task of the DTM is to pick one of them (or to use them sequentially, because if the number of oracles is an attribute of the machine, then even if millions of them have been used, the complexity in relation to the instance size is O(1)).

The authors presume, then, that for the cSAT problem there exists some deterministic algorithm calculating the answer in $O(n^c)$ steps. Following the lower bound calculation, one knows that this algorithm calculates a result requiring $\Omega(2^n)$ transformations. Recall the optimal transformation as described above: $\varphi_{\mathrm{TRUE/FALSE}} = T_x(T_{x-1}(\dots(T_2(T_1(\varphi)))\dots))$ and $x\in\Omega(2^n)$.
The presumption made here can be presented as the existence of some transformation $T_A \equiv T_{x-k}(T_{x-k-1}(\dots(T_{x-k-m}(\cdot))\dots))$, where m is exponential ($T_A$ is equivalent to an exponential number of transformations in the FOPC language on the optimal transformation path). The authors also assume that $T_A$ is deterministic, as the deterministic lower bound is discussed in this section, and computable in a polynomial number of steps.

Now, one needs to look at the transformation path as a decision process, where at each step there is a decision to be made (the decision about which transformation is to be used). Each decision takes Ω(1) space to be stored. If m were dynamic and asymptotically equal to $2^n$, and also computable in a polynomial number of steps, then this would amount to $O(2^n)$ decisions in $O(n^c)$ time, which contradicts Theorem 3.

Considering a constant m (invented by the algorithm designer), one may ask what is common to a large number of Turing machines (in the sense of defined algorithms), a large number of axioms, a large number of predicates, a large number of oracles, or a large m in the above transformation $T_A$. Their number is always a constant, even if very large. If anyone defines multiple TMs, adds multiple axioms or predicates, defines a large number of oracles, or finds one transformation $T_A$ equivalent to an exponential number of other transformations, then in fact, after defining them, one has a constant number of machines, axioms, predicates, transformations, and oracles.

The authors then assume that there exists a machine, denoted by LDTM, which implements the equivalent of a large number of TMs, a large number of axioms and predicates, implements $T_A$, and is connected to a multitude of oracles. A machine defined in this way is (by assumption) capable of answering cSAT questions for a finite number of differing input types (the number of types is a consequence of the maximal input size).
In other words, the authors assume that there exists a machine LDTM able to answer cSAT questions for instance sizes less than or equal to $n_l$. They may consider having $O(c^{n_l})$ different input types, each of which is covered by at least one combination of axioms, predicates, or oracles allowing LDTM to give an answer in O(n) steps. One may assume that there are $g_l$ such combinations. Denoting by $|g_i(n_l)|$ the number of instances solved by the i-th combination of axioms and oracles for an instance $n_l$ symbols long, the authors have assumed that:

$\sum_i |g_i(n_l)| \ge |c^{n_l}|$, where $\forall i: |g_i(n_l)| < |c^{n_l}|$

Now the authors determine the ability of this machine to answer a cSAT question where $n = n_l \cdot y$. The number of combinations of axioms and oracles remains constant ($g_l$), but they assume that each combination covers more instances (considered to be of the “same type”): $g_l(n) = g_l(n_l \cdot y) \le g_l(n_l)^y$. The number of possible types grows from $O(c^{n_l})$ to $O(c^{n_l \cdot y}) = O((c^{n_l})^y)$. Calculating the instances covered by the $g_l$ definitions, we have:

$\sum_i |g_i(n_l \cdot y)| \le \sum_i |g_i(n_l)|^y$

If one proves that, for y growing to infinity, $\sum_i |g_i(n_l)|^y < |c^{n_l}|^y$, it will be a proof that not all instances of size $O(n_l \cdot y)$ are solvable using the LDTM definitions, so these large instances will require calculations using the deterministic lower bound discussed in Section G. The proof is presented in Section VI.F, so the corollary about the impossibility of answering cSAT problems in polynomial time by LDTM holds.

IV.
COROLLARIES

To summarize this article, the authors repeat the deduction path:
a) a formula ϕ can be expressed in the FOPC language (from Gödel's Theorem [8])
b) any possible transformation of a formula can be expressed in the FOPC language, ϕ1=T1(ϕ) (Theorem 1)
c) if every deterministic transformation can be expressed in the FOPC language, then the optimal deterministic transformation can also be expressed in the FOPC language (Theorem 2): $\varphi_{\mathrm{TRUE/FALSE}} = T_x(T_{x-1}(\dots(T_2(T_1(\varphi)))\dots))$
d) the resource cost of the optimal transformation of a formula is equal to the deterministic lower bound of the problem
e) a transformation $T_A$ that is equivalent to an exponential number of transformations and computable in polynomial time contradicts Theorem 3
f) a large but constant set of defined transformations, oracles, algorithms, machines, etc. cannot cover all possible inputs for growing instance size (Theorem 6)
g) the optimal solution of the problem requires $\Omega(2^n)$ transformations
h) the deterministic lower bound for the cSAT problem is then $\Omega(2^n)$, so cSAT∉P (Theorem 5)
i) an NDTM solves cSAT in polynomial time, so cSAT∈NP
j) this means that P≠NP

If the above considerations are correct, then checking a problem known to be in P has to show that it is in P using the same reasoning. Such a check for the 2SAT problem is presented in Section VI.G: the lower bound for this problem is $\Omega(n^c)$.

In [1], an oracle A was presented for which $P^A=NP^A$. The proof presented in Section VI.F and the problem lower bound lead to the corollary that if A is able to solve cSAT in polynomial time, then A has to be nondeterministic: it has to consist of an infinite number of objects, such as deterministic oracles, algorithms, DTMs, axioms, rules, etc. (the authors also consider an NDTM to be an infinite set of DTM duplicates, one for each computational branch).

This work discusses problem P=NP, as described in [5].
It may be said to relativize (see [1]) to the deterministic model of computation, showing that a deterministic calculation model made up of a finite number of machines (algorithms), oracles, axioms, or predicates is incapable of solving the considered problem when its instance size grows to infinity. On the other hand, one may conclude that if restrictions on the maximum input length of the problem are set, then the problem can be proved to be in P using a large number of machines, axioms, algorithms, predicates, or oracles.

For the deterministic model of computation, one knows then that P≠NP. Using Theorem 13 from [12], the authors also know that NP-complete≠(NP-P). That theorem states that if P≠NP and U is some NP-complete language, then U=A∪B, where neither language A nor language B is NP-complete (at least one of them is also not equal to P: A≠P ∨ B≠P). The complexity classes can be put in a picture (Fig. 2):

Figure 2 Relation between P, NP, and NP-complete classes

V. REFERENCES

[1] Baker T. P., Gill J., Solovay R., “Relativizations of the P =? NP question”, SIAM Journal on Computing, vol. 4, no. 4, 1975, pp. 431-442.
[2] Barwise J., Etchemendy J., “Language, Proof and Logic”, Seven Bridges Press, CSLI (University of Chicago Press) and New York, 2000.
[3] Chandra A. K., Kozen D. C., Stockmeyer L. J., “Alternation”, Journal of the ACM, vol. 28, no. 1, 1981.
[4] Cook S. A., “The complexity of theorem-proving procedures”, Proceedings of the Third Annual ACM Symposium on Theory of Computing, 1971, pp. 151-158.
[5] Cook S. A., “P versus NP problem”, unpublished. Available at: http://www.claymath.org/millennium/P_vs_NP/Official_Problem_Description.pdf
[6] Diaby M., “P = NP: Linear programming formulation of the traveling salesman problem”, 2006, unpublished. Available at: http://arxiv.org/abs/cs.CC/0609005
[7] Gallier J. H., “Logic for Computer Science: Foundations of Automatic Theorem Proving”, Harper & Row Publishers, 1986.
[8] Gödel K., “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I”, Monatshefte für Mathematik und Physik, vol. 38, 1931, pp. 173-198.
[9] Hofman R., “Report on article: P=NP linear programming formulation of the traveling salesman problem”, 2006, unpublished. Available at: http://arxiv.org/abs/cs.CC/0610125
[10] Jech T., “Set Theory: The Third Millennium Edition, Revised and Expanded”, ISBN 3-540-44085-2, 2003.
[11] Karp R. M., “Reducibility among combinatorial problems”, in Complexity of Computer Computations, Proceedings of the Symposium of IBM Thomas J. Watson Research Center, Yorktown Heights, NY. Plenum, New York, 1972, pp. 85-103.
[12] Landweber, Lipton, Robertson, “On the structure of sets in NP and other complexity classes”, Theoretical Computer Science, vol. 15, 1981, pp. 181-200.
[13] Papadimitriou C. H., Steiglitz K., “Combinatorial Optimization: Algorithms and Complexity”, Prentice-Hall, Englewood Cliffs, 1982.
[14] Razborov A., Rudich S., “Natural proofs”, Journal of Computer and System Sciences, vol. 55, no. 1, 1997, pp. 24-35.
[15] Savitch W. J., “Relationships between nondeterministic and deterministic tape complexities”, Journal of Computer and System Sciences, vol. 4, 1970, pp. 177-192.
[16] Tarski A., Givant S., “A Formalization of Set Theory Without Variables”, American Mathematical Society, Providence, RI, 1987.

VI. APPENDIX

A. Proof 1 - Proof of Theorem 1

Theorem 1 - If a formula expressible in the FOPC language is deducible, then every possible transformation of this formula obtained by using inference rules and axioms is also deducible and can be expressed in the same language.

This theorem is a direct consequence of the FOPC definitions. If ϕ is deducible, then:
• ϕ ∧ axiom
• ϕ → axiom
• ∀ x ϕ
are also deducible.

B.
Proof 2 - Proof of Theorem 2

Theorem 2 - If every transformation of a formula is expressible in FOPC, then the transformation that is optimal with respect to a certain resource for the chosen computational model is also expressible in the same language.

This theorem is a consequence of Theorem 1 and the FOPC definitions. If the goal of the calculation is to decide on a formula based on the axioms of the theory, then it is required to obtain the formula as a consequence of the axioms (with an empty left-hand side): $\vdash (P_1 \to (\dots(P_{n-1} \to (P_n \to Q))\dots))$. The authors said that every possible transformation of a formula is expressible in FOPC, and this directly means that the path that is optimal with respect to a certain resource (time or space) is also expressible in FOPC.

C. Proof 3 - Proof of Theorem 3

Theorem 3 - The time complexity of a problem/algorithm is always greater than or equal to its space complexity.

This theorem is a consequence of the definition of the Turing machine, which states that in one step a machine can read or write one symbol (or, in general, a constant number of symbols). If f(n) symbols were written, then the machine used at least f(n) steps to write them.

D. Proof 4 - Proof of Theorem 4

Theorem 4 - The minimal number of symbols required for an unambiguous description of an object is Ω(log(N)), where N represents the number of possible objects to be stored.

In this section, the function log is taken to have base ∑, where ∑ represents the number of symbols in the alphabet: log(∑)=1. The authors prove the theorem by contradiction. Suppose that one knows a “compression” algorithm that allows each of N objects to be written using log(N)−f(N) symbols, where f(N) is a function such that: ∀ N: 0 […] 1 (it is the smallest value making a difference in the number of symbols used), it is negative, which means that fewer than N objects can be represented using a string of this length.

E.
Proof 5 - Proof of Theorem 5

Theorem 5 - The lower bound calculated for the usage of a specific resource (time or space) in deciding a formula expressed in FOPC, for a chosen computational model, is equal to the minimal usage of this resource by the best possible transformation of the formula in this language.

The proof of this theorem is in fact a direct corollary of Theorems 1 and 2. If any transformation of a formula is expressible in the FOPC language, then the transformation that is optimal in terms of the chosen resource is also expressible in the FOPC language, and when the authors calculate the lower bound for this resource for the whole transformation path (from the input string to the decidable form, that is, to the axioms), they obtain the value of the lower bound for the considered problem.

F. Proof 6 - Proof of Not Covering by Constant Set of Definitions All Possible Large Instances by LDTM

Assumptions: $\forall i: |g_i(n_l)| < |c^{n_l}|$ and $\sum_i |g_i(n_l)| \ge |c^{n_l}|$. Also, the function $g_i(n)$ operates on natural numbers and returns natural numbers, so $|g_i(n_l)| \le |c^{n_l} - 1|$. One wants to show that

$\sum_i |g_i(n_l)|^y < |c^{n_l}|^y$

for y growing to infinity. First one may observe that

$\sum_i |g_i(n_l)|^y \le \sum_i |c^{n_l} - 1|^y$,

so if one can solve the inequality $\sum_i |c^{n_l} - 1|^y < |c^{n_l}|^y$, it will be equivalent to proving the claim. The new inequality is free of the variable i, so it can be rewritten as:

$l \cdot |c^{n_l} - 1|^y < |c^{n_l}|^y$.

Now the authors take the logarithm of both sides to the base l:

$\log_l\left(l \cdot |c^{n_l} - 1|^y\right) < \log_l\left(|c^{n_l}|^y\right)$
$\log_l(l) + \log_l\left(|c^{n_l} - 1|^y\right) < \log_l\left(|c^{n_l}|^y\right)$
$1 + y \cdot \log_l\left(|c^{n_l} - 1|\right) < y \cdot \log_l\left(|c^{n_l}|\right)$
$\frac{1}{y} + \log_l\left(|c^{n_l} - 1|\right) < \log_l\left(|c^{n_l}|\right)$

At this point, it is obvious that this inequality holds: 1/y may be omitted when y grows to infinity, and one has an inequality between two logarithms where the one on the left-hand side is the logarithm of the lower value. More formally, one may calculate the limit:

$\lim_{y\to\infty} \frac{\frac{1}{y} + \log_l\left(|c^{n_l} - 1|\right)}{\log_l\left(|c^{n_l}|\right)} = \frac{\log_l\left(|c^{n_l} - 1|\right)}{\log_l\left(|c^{n_l}|\right)} < 1$

The proof is then correct.

G. Proof 7 - Lower Bound for 2SAT Problem

The 2SAT problem is a special case of the cSAT problem.
Its special factors are:
• L = 1 (the problem question is “µ(ϕ)>=1” or “µ(ϕ)>0”)
• m1=m2=…=mp=2

The authors assume that the input string is in CNF and that the purifying function (defined in Section III.G) cannot be applied. They use Ax9) on a parenthesis and select the next parenthesis such that:
• the parenthesis has not been used yet
• the parenthesis contains the negation of a variable used in a previous step

Every use of Ax9) will be followed by a use of the purifying function. For example, for

$(a\lor b)_1\land(a\lor\lnot c)_2\land(c\lor d)_3\land(\lnot b\lor\lnot c)_4\land(\lnot b\lor\lnot a)_5\land(\lnot d\lor\lnot a)_6\land(e\lor f)_7$

the authors would have used Ax9) for parentheses 1 and 6 (last variable is: ¬d), then 3 (lv: c), then 4 (lv: ¬b), then 1 (second time, lv: a), then 5 (lv: ¬b), then 1 (third time, lv: a), then 6 (second time, lv: ¬d), then 3 (second time, lv: c), and finally 2.

Every parenthesis will be used at most once on every path from any pair of parentheses. At every stage of the calculation, the formula will contain at most p+1 conjunctions (where p is the number of parentheses processed), and each conjunction will contain every variable at most once. Every parenthesis will be used at most $p^2$ times, which means that at every stage of the computation the formula length is $O(p^3)$.

This may not be a time-optimal solution. According to Theorem 2, the optimal transformation path is expressible using the axioms and predicates defined for FOPC, but to show that the problem is in P one does not need to look for the optimal transformation path: the authors have shown that there exists at least one transformation path polynomially bounded in the instance size, and even if it is not the optimal one, it shows that the 2SAT problem is in P.

ABSTRACT

This article discusses the completeness of Boolean Algebra as a First Order Theory in Goedel's meaning. If a Theory is complete, then any possible transformation is equivalent to some transformation using the axioms, predicates, etc. defined for this theory. If a formula is to be proved (or disproved), then it has to be reduced to axioms.
If every transformation is deducible, then the optimal transformation is also deducible. If every transformation is exponential, then the optimal one is too, which allows the lower bound for the discussed problem to be defined as exponential (outside P). Then we show an algorithm for an NDTM solving the same problem in O(n^c) (so the problem is in NP), which proves that P \neq NP. The article also proves that the result of the relativisation of the P=NP question and the oracle shown by Baker-Gill-Solovay distinguish between deterministic and non-deterministic calculation models. If there exists an oracle A for which P^A=NP^A, then A consists of an infinite number of algorithms, DTMs, axioms and predicates, or, like an NDTM, of an infinite number of simultaneous states. <|endoftext|><|startoftext|> arXiv:0704.0515v2 [cond-mat.mes-hall] 16 Jul 2007

Temperature dependence of Coulomb drag between finite-length quantum wires

J. Peguiron,¹ C. Bruder,¹ and B. Trauzettel¹
¹Department of Physics and Astronomy, University of Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
(Dated: July 2007)

We evaluate the Coulomb drag current in two finite-length Tomonaga-Luttinger-liquid wires coupled by an electrostatic backscattering interaction. The drag current in one wire shows oscillations as a function of the bias voltage applied to the other wire, reflecting interferences of the plasmon standing waves in the interacting wires. In agreement with this picture, the amplitude of the current oscillations is reduced with increasing temperature. This is a clear signature of non-Fermi-liquid physics because for coupled Fermi liquids the drag resistance is always expected to increase as the temperature is raised.

PACS numbers: 71.10.Pm,72.10.-d,72.15.Nj

Coulomb drag phenomena in coupled one-dimensional (1D) electron systems have been investigated quite extensively in the past [1–10]. The interest has mainly been driven by the fact that Coulomb drag, i.e.
the electrical response of one wire as a finite bias is applied to the other wire, seems to be an ideal testing ground for Tomonaga-Luttinger-liquid (TLL) physics in nature. This is because both inter-wire and intra-wire Coulomb interactions substantially modify transport properties such as the average current and the current noise. On the experimental side, there have been a few works, some of which have claimed to have observed TLL behavior in the drag data [11–13]. Recently, Yamamoto and coworkers have measured Coulomb drag in coupled quantum wires of different lengths and found peculiar transport properties that depend, for instance, on the asymmetry of the two wires [14]. This experiment is the major motivation for our work.

We analyze theoretically the Coulomb drag current of two electrostatically coupled quantum wires using the concept of the inhomogeneous TLL model [15–17]. This model is known to capture the essential physics of an interacting 1D wire of finite length coupled to non-interacting (Fermi liquid) electron reservoirs. Within this framework, we are able to study finite-length and finite-temperature effects and therefore to make qualitative contact with the experimental setup of Ref. [14]. Since the Coulomb interaction varies between the wire regions and the lead regions, charge excitations feel the interaction difference at the boundaries and are known to exhibit Andreev-type reflections [17]. We show that these reflections play a crucial role in the Coulomb drag setup illustrated in Fig. 1. Furthermore, we show that the quantum interference phenomena associated with the Andreev-type reflections considerably modify the drag current. This is particularly interesting as far as the temperature dependence of the drag current is concerned. For Fermi-liquid systems, it is well known that the drag resistance should always increase as the temperature is raised [6].
In our setup instead, the drag current at a fixed drive bias can either increase or decrease as a function of temperature. It crucially depends on the interference pattern due to finite-length effects. This is a clear signature of non-Fermi-liquid physics which could be observed in the double-wire setup of Ref. [14]. The system considered consists of two in- teracting parallel wires (j = <,>) of fi- nite length L< (shorter wire) and L> (longer wire) connected to non-interacting semi-infinite 1D leads (Fig. 1) and is described by the Hamiltonian j=<,> H0j +H +HC. The intra-wire inter- action is modelled through a TLL description [15–17] H0j = Π2j + g2j (x) (∂xΦj) with the piecewise constant interaction parame- ter gj(x) = gj < 1 in the wire region |x| < Lj/2 and gj(x) = 1 in the non-interacting lead re- gions |x| > Lj/2. The Fermi velocity vFj , the in- teraction strength gj, and the wire length Lj set the frequency ωLj = vFj/gjLj of the collective plasmonic excitations hosted in each wire. A voltage eVj = interwire coupling wire < wire > FIG. 1: (color online). The system under consideration. Each interacting wire of length Lj [gray area, interaction parameter gj(|x| < Lj/2) = gj < 1] is connected to a pair of non-interacting leads [gj(|x| > Lj/2) = 1]. The region of backscattering inter-wire interaction (red dashed box) extends over the length of the shorter wire. A voltage V is applied between the leads connected to the drive wire (here the longer wire j = >) and the backscattering-induced current Idr in the drag wire (here the shorter wire j = <) is investigated. http://arxiv.org/abs/0704.0515v2 0 1 2 3 V / 2πV a) g=0.1 b) g=0.25 c) g=0.5 d) g=0.75 e) g=1 FIG. 2: Drag current as a function of drive voltage for identical wires at zero temperature (solid curves). The inter-wire interaction strength ranges from strongly inter- acting (g = 0.1) to non-interacting (g = 1) for the dif- ferent curves (with I dr = eλ 2α4g/~2ωL and VL = ~ωL/e). 
The dashed curves show the dominant contri- bution ∝ V 4g−2 [given in Eq. (10)] for g = 0.1, 0.25, 0.5. L − µ R applied to the leads is described by HVj = − dx µj(x)∂xΦj(x), (2) with the piecewise constant electro-chemical potential µj(x) = L for x < −Lj/2, 0 for |x| < Lj/2, R for x > Lj/2 with µ L = −µ R . This model is expected to cap- ture the essential physics of a quantum wire coupled smoothly to electron reservoirs (with typical smooth- ing length Ls) as long as Lj=<,> ≫ Ls ≫ λF , where λF is the electron Fermi wavelength [18, 19]. Fi- nally, we include an inter-wire backscattering inter- action over the length L< of the shorter wire, HC = λBS ∫ L(x)]}. (4) This term includes the contribution of the density- density interaction which is most relevant to Coulomb drag [1, 2] when the Fermi wave-vectors of both wires are similar in magnitude, i.e. kF< ≈ kF> [22]. In the following, we choose to apply a voltage V> = V to the longer wire (µ>L = −µ R = eV/2, drive wire) and none V< = 0 to the shorter wire (µ L = µ R = 0, drag wire). The average current in the wires may then be written as I< = Idr and I> = V − Idr. In our model, the two currents I< and I> always flow in the same direction, which is due to momentum conserva- tion. This is known as positive Coulomb drag. In order to get an expression for the drag cur- rent Idr, we follow the formalism used in [18] in the case of a single wire with an impurity. We consider the situation of weak inter-wire coupling. To second order in λBS, we obtain Idr = I ∫ 1−R drjdr(r, R), (5) with the normalization I dr = eλ 2ωL<, where α< = ωL , (6) involves the parameters l = L>/L<, p = g>/g<, and q = vF>/vF<. 
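The scales that organize the results can be illustrated with a short numerical sketch. The Python below is not part of the original paper: it only evaluates the plasmon frequency ω_L = v_F/(gL) stated in the text, the voltage unit V_L = ħω_L/e from the caption of Fig. 2, and the dimensionless ratios l, p and q defined above. The wire parameters are hypothetical values chosen purely for illustration, and the function names are assumptions.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C

def plasmon_frequency(v_fermi, g, length):
    """omega_L = v_F / (g * L), the plasmon frequency scale of one wire."""
    return v_fermi / (g * length)

def voltage_scale(omega_l):
    """V_L = hbar * omega_L / e, the voltage unit used on the axis of Fig. 2."""
    return HBAR * omega_l / E_CHARGE

# Hypothetical wires (illustrative numbers, not taken from the paper)
shorter = {"v_fermi": 2.0e5, "g": 0.5, "length": 1.0e-6}  # m/s, -, m
longer = {"v_fermi": 2.0e5, "g": 0.5, "length": 2.0e-6}

omega_shorter = plasmon_frequency(**shorter)
v_l = voltage_scale(omega_shorter)

# Dimensionless asymmetry parameters l = L>/L<, p = g>/g<, q = vF>/vF<
l_ratio = longer["length"] / shorter["length"]
p_ratio = longer["g"] / shorter["g"]
q_ratio = longer["v_fermi"] / shorter["v_fermi"]

print(f"omega_L (shorter wire) = {omega_shorter:.3e} rad/s")
print(f"V_L = {v_l:.3e} V, 2*pi*V_L = {2 * math.pi * v_l:.3e} V")
print(f"l = {l_ratio:.2f}, p = {p_ratio:.2f}, q = {q_ratio:.2f}")
```

With identical interaction strengths and Fermi velocities, as in this example, only the length ratio l differs from 1.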
The correlation function Cj = CGSj + C j of each wire can be decomposed in a zero-temperature and a finite-temperature contribu- tion given by [18] CGSj (r, R; τ) = − αj + i(τ − sr − 2k) αj + i(−2k) |2k+1| αj + i(τ − sR− 2k − 1) [α2j + (r − sR− 2k − 1)2]1/2 , (7) CTFj (r, R; τ) = − sinch[πθ(τ − sr − 2k)] sinch[πθ(−2k)] |2k+1| sinch[πθ(τ − sR− 2k − 1)] sinch[πθ(r − sR− 2k − 1)] , (8) with γj = (1− gj)/(1 + gj) and sinchx = (sinhx)/x. It is to be noted that the expression for the drag cur- 3rent Idr does not depend on whether the drive wire is the longer wire or the shorter one due to the symme- try of our model. In the following, we present results obtained by numerical evaluation of the triple integral involved in (5) and (6) and discuss several analytical approximations. First we set the temperature to zero and consider identical wires (l = p = q = 1, thus we drop the wire index j). The drag current shows non-monotonous behavior and oscillations with period ∼ 2π~ωL/e as a function of the bias voltage (Fig. 2). It decays at large voltages for g < 1/2 whereas it increases for g > 1/2. Thus, we obtain qualitatively the same behavior as in a dual Coulomb drag setup where a drive current is applied and a drag voltage is measured [2]. Similar os- cillations as a function of voltage have been predicted in the context of two coupled fractional quantum Hall line junctions [20]. An analytic approximation can be derived in the limit u = eV/~ωL ≫ 1, that is for high voltages or long wires. In Eq. (7), the terms proportional to γ|m| account for contributions from plasmon exci- tations reflected |m| times inside the wire. When the wire length is much longer than other relevant length scales, the contribution without any reflection m = 0 becomes dominant and yields the integrand jdr(r, R) ∼ Γ(2g) )2g−1/2 J2g−1/2(ur), (9) where Γ(z) denotes the Gamma function and Jν(z) the Bessel function of order ν. 
The resulting expres- sion for the drag current, which involves hypergeo- metric functions, underestimates the amplitude of the oscillations with respect to the exact numerical re- sult. However, the behavior at large u, governed by the dominant contribution Idr ∼ 2Γ2(2g) )4g−2 , (10) shows good agreement in the appropriate parameter regime (dashed curves in Fig. 2). Since we do per- turbation theory in λBS, the relation Idr ≪ (e2/h)V has to hold. In the large u regime, this means (λBSL/~ωc) 2(ωL/ωc)(eV/~ωc) 4g−3 ≪ 1. Now we consider the situation where the two wires have different lengths. The qualitative behavior of the drag current does not change for increasing length ra- tio l = L>/L<, but the peak positions get shifted to lower voltages (Fig. 3). Here, neglecting plasmon re- flections in the correlation function of the wires leads again to the expression (9), which is independent of l. This fact explains why the drag current does not change appreciably as a function of l and indicates that the peak shifts observed result from plasmon re- flections inside the wires. Our studies are the first to analyze the effect of an asymmetry in the length on Coulomb drag phenomena in coupled quantum wires which is of recent experimental relevance [14]. 0 0.5 1 1.5 2 Idr/Idr L>/L<|startoftext|> Journal of the Chinese Chemical Society, 2001, 48: 449-454 Effects of Imperfect Gate Operations in Shor’s Prime Factorization Algorithm Hao Guo1,2, Gui-Lu Long1,2,3,4,5 and Yang Sun1,2,6,7 Department of Physics, Tsinghua University, Beijing 100084 Key Laboratory for Quantum Information and Measurements, MOE Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, P.R. China Centre for Nuclear Theory, Lanzhou National Laboratory of Heavy Ions Chinese Academy of Sciences, Lanzhou 740000, P.R. 
China
Center of Atomic, Molecular and Nanosciences, Tsinghua University, Beijing 100084
Department of Physics, Xuzhou Normal University, Xuzhou, Jiangsu 221009
Department of Physics and Astronomy, University of Tennessee, Knoxville, TN 37996, U.S.A.
(Dated: 2001)

The effects of imperfect gate operations in the implementation of Shor’s prime factorization algorithm are investigated. The gate imperfections may be classified into three categories: systematic errors, random errors, and combined errors. It is found that Shor’s algorithm is robust against systematic errors but vulnerable to random errors. An error threshold is given for the algorithm for a given number N to be factorized.

PACS numbers: 03.67.Lx, 89.70.+c, 89.80.+h

I. INTRODUCTION

Shor’s factorization algorithm [1] is a very important quantum algorithm, through which the power of quantum computers has been demonstrated. It has greatly promoted worldwide research in quantum computing over the past few years. In practice, however, quantum systems are subject to the influence of the environment, and in addition, quantum gate operations are often imperfect [2, 3]. The influence of the environment on the system can cause decoherence of quantum states, and gate imperfection leads to errors in quantum computing. Fortunately, another important work of Shor showed that quantum errors can be corrected [4]. With a quantum error correction scheme, errors arising from both decoherence and imperfection can be corrected.

There have been several works on the effects of decoherence on Shor’s algorithm. Sun et al. discussed the effect of decoherence on the algorithm by modeling the environment [5]. Palma studied the effects of both decoherence and gate imperfection in ion trap quantum computers [6]. There have also been many other studies on the quantum algorithm [7, 8, 9, 10]. An error correction scheme consumes available resources.
Thus it is important to study the robustness of the algorithm itself, so that one can strike a balance between the amount of quantum error correction and the number of qubits available. In this paper, we investigate the effects of gate imperfection on the efficiency of Shor’s factorization algorithm. The results may guide us in practice to deliberately suppress those errors that influence the algorithm most sensitively. Errors that do not affect the algorithm very much may be ignored as a good approximation. In addition, studying the robustness of the algorithm to errors is important where quantum error correction cannot be applied at all, for instance when there are not enough qubits available.

The paper is organized as follows. Section II is devoted to an outline of Shor’s algorithm and the different error modes. In Section III, we present the results. Finally, a short summary is given in Section IV.

II. SHOR’S ALGORITHM AND ERROR MODES

Shor’s algorithm consists of the following steps:

1) preparing a superposition of evenly distributed states
$|\psi\rangle = \frac{1}{\sqrt{q}} \sum_{a=0}^{q-1} |a\rangle|0\rangle$,
where $q = 2^L$ and $N^2 \le q \le 2N^2$, with N being the number to be factorized;

2) implementing $y^a \bmod N$ and putting the results into the 2nd register,
$|\psi_1\rangle = \frac{1}{\sqrt{q}} \sum_{a=0}^{q-1} |a\rangle|y^a \bmod N\rangle$;

3) making a measurement on the 2nd register; the state of the registers is then
$|\phi_2\rangle = \sqrt{\frac{r}{q}} \sum_{j} |jr + l\rangle\,|z = y^l = y^{jr+l} \bmod N\rangle$,
where the sum runs over the values $j \ge 0$ with $jr + l < q$;

4) performing the discrete Fourier transformation (DFT) on the first register,
$|\phi_3\rangle = \sum_{c} \tilde{f}(c)\,|c\rangle|z\rangle$,
where
$\tilde{f}(c) = \frac{\sqrt{r}}{q} \sum_{j} e^{2\pi i (jr+l)c/q} = \frac{\sqrt{r}}{q}\, e^{2\pi i l c/q} \sum_{j} e^{2\pi i j r c/q}$.
This term is nonzero only when $c = kq/r$, with $k = 0, 1, 2, \dots, r-1$, which correspond to the peaks of the distribution in the measured results, and thus this term becomes $\tilde{f}(c) = \frac{1}{\sqrt{r}}\, e^{2\pi i l c/q}$.

The Fourier transformation is important because it makes the state in the first register the same for all possible values in the 2nd register.
The DFT is constructed by two basic gate operations: the single bit gate operation Aj = , which is also called the Walsh-Hadmard transformation, and the 2-bits controlled rotation Bjk = 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 eiθjk with θjk = . The gate sequence for implementing DFT is (Aq−1)(Bq−2q−1Aq−2) . . . (B0q−1B0q−2 . . . B01A0). Errors can occur in both Aj and Bjk. Aj is actually a rotation about y-axis through π Aj(θ) = e Syθ = I cos( )−i sin(θ )σy = cos( θ ) − sin( θ sin( θ ) cos( θ If the gate operation is not perfect, the rotation is not exactly π . In this case, Aj is a rotation of Aj(δ) = cos(δ)− sin(δ) −(sin(δ) + cos(δ)) sin(δ) + cos(δ) cos(δ)− sin(δ) If δ is very small, we have: Aj(θ) = 1− δ −(1 + δ) 1 + δ+ 1− δ Similarly, errors in Bjk can be written as Bjk = 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 ei(θjk+δ) With these errors, the DFT becomes |a〉 → i( 2π +δc)a(1 + δ′c)|c̃〉 = i( 2πc +δc)a(1 + δ′c)|c̃〉,(1) where δc and δ c denote the error of Aj and Bjk, respec- tively. Let us assume the following error modes: 1) system- atic errors, where δc or δ c in (1) can only have system- atic errors (EM1); 2) random errors (EM2), for which we assume that δc or δ c can only be random errors of the Gaussian or the uniform type; 3) coexistence of both systematic and random errors (EM3). In the next sec- tion, we shall present the results of numerical simulations and discuss the effects of imperfect gate operation on the DFT algorithm, and thus on the Shor’s algorithm. III. INFLUENCE OF IMPERFECT GATE OPERATIONS We first discuss the influence of imperfect gate opera- tions in the initial preparation Al−1Al−2...A0|0...0〉 = 1√ (|0〉+ |1〉+ δ1(|0〉 − |1〉))⊗ (|0〉+ |1〉+ δ2(|0〉 − |1〉))⊗ . . .⊗ (|0〉+ |1〉+ δn(|0〉 − |1〉)) i1i2...in=0 |i1i2...in〉+ 1√ R=1 δn i1i2...in=0 (|i1..iR−10iR+1..in〉 − |i1..iR−11iR+1..in〉 If the errors are systematic, for instance, caused by the inaccurate calibration of the rotations, then δ1 = δ2 = . . . = δn = δ. 
In this case, we can write the 2nd term as |ψ〉 = 1√ i1i2...in=0 (2s− n)|i1i2...in〉, where s stands for the number of 1’s, and 2s− n = s − (n − s) is the difference in the number of 1’s and 0’s. Thus the results after the first procedure is (|a〉+ δ(2s− n)|a〉) = (1 + δa)|a〉. (2) This implies that after the procedure, the amplitude of each state is no longer equal, but have slight difference. Combining the effect in the initialization and in the DFT, we have (1 + δa)(1 + δc)e i( 2πc +δ′c)a = (1 + δ′′)ei( +δ′c)a, where δ′′c = δc + δa. In the DFT, we have |ψ〉 ⇒ (1 + δj)e i( 2πc +δ′j)(jr+l)|c̃〉, where we have rewrite δ′′ as δj here. Let Pc denote the probability of getting the state |c̃〉 after we perform a measurement, we have (1 + δm)(1 + δk)e i( 2πc +δ′m)(mr+l)×e−i( +δ′k)(kr+l) (1 + δm)(1 + δk) cos[ r(m − k) + (mr + l)δ′m − (kr + l)δ′k](3) From Eq. (3), we find that after the last measurement, each state can be extracted with a probability which is nonzero, and the offset l can’t be eliminated. Eq. (3) is very complicated, so we will make some predigestions to discuss different error modes for conve- nience. Generally speaking, the influence of exponential error δj is more remarkable than δj , so we can omit the error δj , thus DFTq |φ〉 = j=0 e i( 2πc +δ′j)(jr+l)|c〉 . A. Case 1 If only systematic errors (EM1) are considered, namely, all the δj ’s are equal, then f̃(c) can be given analytically f̃(c) = i( 2πc +δ)(jr+l) il( 2πc +δ) 1− ei( 1− ei( The relative probability of finding c is f̃(c) sin2( sin2(πcr and if c = k q , then r sin2( q2 sin2( δr It can be easily seen that limδ→0 Pc = , which is just the case that no error is considered. When δ takes certain values, say, δ = 2 (k− r )π where k is an integer, then the summation in Eq. (4) is on longer valid. In our simulation, δ does not take these values. Here we consider the case where q = 27 = 128 and r = 4. For comparisons, we have drawn the relative probability for obtaining state c in Fig.1. for this given example. 
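The peak shift produced by a purely systematic phase error can be checked numerically for the example q = 128, r = 4. The following is a minimal sketch, not the authors' simulation code: it assumes the offset l = 0, an equal-weight superposition over the terms jr + l < q, and the normalization 1/(q × number of terms); the function name `relative_probabilities` is hypothetical.

```python
import cmath
import math


def relative_probabilities(q, r, l=0, delta=0.0):
    """Return P(c) for c = 0..q-1 when each term j carries the phase
    (2*pi*c/q + delta) * (j*r + l), i.e. a systematic phase error delta."""
    n_terms = (q - l + r - 1) // r  # number of j >= 0 with j*r + l < q
    probs = []
    for c in range(q):
        phase = 2 * math.pi * c / q + delta
        amplitude = sum(cmath.exp(1j * phase * (j * r + l)) for j in range(n_terms))
        # Assumed normalization: 1/sqrt(n_terms) from the measured state,
        # 1/sqrt(q) from the DFT.
        probs.append(abs(amplitude) ** 2 / (q * n_terms))
    return probs


ideal = relative_probabilities(128, 4)                 # no error
shifted = relative_probabilities(128, 4, delta=0.05)   # systematic error only

ideal_peaks = [c for c, p in enumerate(ideal) if p > 0.1]
shifted_peaks = [c for c, p in enumerate(shifted) if p > 0.1]
print(ideal_peaks, shifted_peaks)
```

In this sketch the peaks move by about δq/(2π) ≈ 1 position for δ = 0.05, so the peak at c = 0 reappears at c = 127, while the heights stay close to 1/r.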
We have found the following results:

(i) When δ is small, the errors hardly influence the final result. For instance, when $c = kq/r$,
$P_c = \lim_{\delta \to 0} \frac{r \sin^2(\delta q/2)}{q^2 \sin^2(\delta r/2)} = \frac{1}{r}$.
The probability distribution is almost identical to that without errors.

(ii) As δ is increased gradually, Fig. 2 shows a gradual change in the probability distribution (here, we again consider the relative probabilities). When δ is increased to certain values, the positions of the peaks change greatly. For instance, at δ = 0.05 a peak appears at c = 127, whereas $P_c = 0$ there when no systematic errors are present. In general, the influence of systematic errors on the algorithm is a shift of the peak positions. This influences the final results directly.

B. Case 2

When both random errors and systematic errors are present, we add random errors to the simulation. To see the effect of different modes of random errors, we use two random number generators: one in the Gaussian mode and the other in the uniform mode. In this case, the error has the form δ = δ0 + s, where δ0 is the systematic error and s has a probability distribution with respect to c, either uniform or Gaussian. When δ0 = 0, we have only random errors, which is our error mode 2. When δ0 ≠ 0, we have error mode 3. For the uniform distribution, s ∼ ±smax × u(0, 1), where u(0, 1) is evenly distributed in [0, 1] and smax indicates the maximum deviation from δ0. For the Gaussian distribution, s ∼ N(0, σ0). From the figures, we see the following:

(1) When only random errors are present (δ0 = 0), the peak positions are not affected by these random errors, and the different random error modes cause similar results. The results for the uniform random error mode are shown in Fig. 3. For the uniform distribution error mode, with increasing smax, the probability distribution of the final results becomes irregular.
In particular, when smax is very large, all the patterns are destroyed and are hardly recognizable. Many unexpected small peaks appear. For the Gaussian distribution error mode, as shown in Fig. 4, the influence of the error is more serious. This is because the Gaussian distribution has no cut-off: large errors can occur, although their probability is small. The final results are also sensitive to σ0, because it determines the shape of the distribution. When σ0 increases, the final probability distribution becomes very messy. A small change in σ0 can cause a big change in the final results.

(2) When δ0 ≠ 0, which corresponds to error mode 3, the effect is to shift the positions of the peaks, in addition to the influence of the random errors.

IV. SUMMARY

To summarize, we have analyzed the errors in Shor’s factorization algorithm. The effect of systematic errors is to shift the positions of the peaks, whereas random errors change the shape of the probability distribution. For systematic errors, the shape of the distribution of the final results is hardly destroyed, though displaced. We can still use the result with several trial guesses to obtain the right results, because the peak positions are shifted only slightly. However, random errors are detrimental to the algorithm and should be reduced as much as possible. This differs from the case of Grover’s algorithm, where systematic errors are disastrous while random errors are less harmful [10].

FIG. 1: Relative probability for finding state c in the absence of errors.

[1] P.W. Shor, Proceedings of the 35th Annual Symposium on the Foundations of Computer Science, edited by S. Goldwasser (IEEE Computer Society Press, Los Alamitos, CA, 1994) p. 124.
[2] A. Ekert and R. Jozsa, Rev. Mod. Phys. 68 (1996) 733.
[3] W.G. Unruh, Phys. Rev. A 51 (1995) 992.
[4] I. Chuang and R.
Laflamme, “Quantum error correction by coding” (1995) quant-ph/9511003, http://arxiv.org/abs/quant-ph/9511003.
[5] C.P. Sun, H. Zhan and X.F. Liu, Phys. Rev. A 58 (1998) 1810.
[6] G.M. Palma, K.A. Suominen and A.K. Ekert, Proc. R. Soc. London A 452 (1996) 567.
[7] R.P. Feynman, Int. J. Theor. Phys. 21 (1982) 467.
[8] D. Deutsch, Proc. R. Soc. London A 400 (1985) 97.
[9] L.K. Grover, Phys. Rev. Lett. 79 (1997) 325.
[10] G.L. Long, Y.S. Li, W.L. Zhang, C.C. Tu, Phys. Rev. A 61 (2000) 042305.
[11] L.K. Grover, Phys. Rev. Lett. 80 (1998) 4329.

FIG. 2: The same as Fig. 1, with systematic errors. In sub-figures (1), (2), (3), δ is 0.02, 0.03, 0.05 respectively. In sub-figure (4), the curve with solid circles (with higher peaks) is the result with δ = 0.1, and the one without solid circles (with lower peaks) denotes the result with δ = 0.33. (Horizontal axis: c.)

FIG. 3: The same as Fig. 1, with uniform random errors. In sub-figures (1), (2), (3), (4), smax is set to 0.01, 0.03, 0.05, 0.1 respectively.

FIG. 4: The same as Fig. 1, with Gaussian random errors and systematic errors. In sub-figures (1), (2), and (3), τ is set to 0.01, 0.03 and 0.05 respectively, and δ0 = 0 (without systematic errors). In sub-figure (4), both systematic and random Gaussian errors exist, where δ0 = 0.33, τ = 0.02.

ABSTRACT The effects of imperfect gate operations in implementation of Shor's prime factorization algorithm are investigated.
The gate imperfections may be classified into three categories: the systematic error, the random error, and the one with combined errors. It is found that Shor's algorithm is robust against the systematic errors but is vulnerable to the random errors. Error threshold is given to the algorithm for a given number $N$ to be factorized. <|endoftext|><|startoftext|> Introduction The quantitative assessment of dietary exposure to certain contaminants is of high priority to the Food and Agricultural Organization and the World Health Organization (FAO/WHO). For exam- ple, excessive exposure to methylmercury, a contaminant mainly found in fish and other seafood (mollusks and shellfish) may have neurotoxic effects such as neuronal loss, ataxia, visual disturbance, impaired hearing, and paralysis (WHO, 1990). Quantitative risk assessments for such chronic risk require the comparison between a tolerable dose of the contaminant called Provisional Tolerable Weekly Intake (PTWI) and the population’s usual intake. The usual intake distribution is gener- ally estimated from independent individual food consumption surveys (generally not exceeding 7 days) and food contamination data. Several models have been developed to estimate the distribu- tion of usual dietary intake from short-term measurements (see for example, Nusser et al., 1996; Hoffmann et al., 2002). The proportion of consumers whose usual weekly intake exceeds the PTWI can then be viewed as a risk indicator (see for example, Tressou et al., 2004). This kind of risk assessment does not account for the underlying dynamic process, i.e. for the fact that the contami- nant is ingested over time and naturally eliminated at a certain rate by the human body. Moreover, longer term measurements of consumption are available through household budget surveys (HBS). In this paper, we propose to use HBS data to quantify individual long term exposure to a contaminant. 
This data provides long time series of household food acquisitions which are first used in a decomposition model, similar to the one proposed by Chesher (1997, 1998) in the nutrition field, in order to obtain time series of individual intakes. Then, the pharmacokinetic properties of the contaminant are integrated into an autoregressive model in which the current body burden is defined as a fraction of the previous one plus the current intake. From a toxicological point of view, this approach is, to our knowledge, novel and hence requires the definition of an ad-hoc long term safe dose as proposed in the next section. We refer to this autoregressive model as Kinetic Dietary Exposure Model (KDEM). From a statistical point of view, such autoregressive models are well known in general time series analysis (see for example, Hamilton, 1994) and most of the paper is devoted to the description of the decomposition model. This statistical model aims at estimating individual quantities from total household quantities and structures. This problem is similar to that studied by Engle et al. (1986), Chesher (1997, 1998), and Vasdekis and Trichopoulou (2000), and is addressed in a slightly different way. In the present article, the individual contaminant intake is firstly viewed as a nonlinear function of age within each gender, with time and socioeconomic characteristics being secondly introduced in a linear way. The nonlinear function is represented by a truncated polynomial spline of order 1 that admits a mixed model spline representation (section 4.9 in Ruppert et al., 2003). These choices yield a simple linear mixed model which is estimated by REstricted Maximum Likelihood (REML, Patterson and Thompson, 1971). One major extension of the proposed model compared to Chesher (1997) is the introduction of dependence between the individual intakes of a given household. 
In the next section, which focuses on the methylmercury example (although the method is much more general and could be applied to any chronic food risk), the SECODIP data are described along with the construction of the household intake series and the concepts of individual cumulative and long term exposure that yield the KDEM. Section 2 is devoted to the statistical methodology used to decompose the household intake series into individual intake series, namely the presentation of the model, its estimation and the associated tests. Section 3 displays the results for the quantification of long term exposure to methylmercury of the French population using the 2001 SECODIP panel. Finally, the use of household acquisition data, with a focus on the French SECODIP panel, is discussed in section 4 with respect to the proposed long term risk analysis.

1 Motivating example: risk related to methylmercury in seafoods in the French population

In this section, the Kinetic Dietary Exposure Model (KDEM) and the concept of long term risk are defined. Then a brief panorama of consumption data in France is given, and the way the SECODIP HBS data are used as an input of the KDEM is described.

1.1 Cumulative exposure and long term risk: the Kinetic Dietary Exposure Model (KDEM)

The main objective of the analysis is to assess individuals’ long term exposure to a contaminant in order to deduce whether these individuals are at risk or not. As mentioned in the introduction, the only "safe dose" reference is the PTWI, which is expressed relative to body weight (relative intake). Unfortunately, TNS SECODIP did not record the body weight of the individuals until 2001. The body weights are thus estimated from independent data sets, namely the French national survey on individual consumption (INCA, CREDOC-AFSSA-DGAL, 1999) for people older than 18, and the weekly body weight distribution available from French health records (Sempé et al., 1979) for individuals under 18.
In both cases, gender differentiation is introduced.

Assume that estimates of the individual weekly intakes are available, that is, $y_{i,h,t}$ denotes the intake of individual i belonging to household h for the t-th week (with $i = 1, \dots, n_{h,t}$; $h = 1, \dots, H$ and $t = 1, \dots, T$), and $D_{i,h,t}$ denotes the same quantity expressed on a body weight basis. The cumulative exposure of this individual up to the t-th week is then given by

$S_{i,h,t} = \exp(-\eta) \cdot S_{i,h,t-1} + D_{i,h,t}$, (1.1)

where η > 0 is the natural dissipation rate of the contaminant in the organism. This dissipation parameter is defined from the so called half life of the contaminant, which is the time required for the body burden to decrease by half in the absence of any new intake. For methylmercury, the half life, denoted by $l_{1/2}$, is estimated at 6 weeks, so that $\eta = \ln(2)/l_{1/2} = \ln(2)/6$ (Smith and Farris, 1996). The autoregressive model defined by (1.1) and a given initial state $S_{i,h,0} = D_{i,h,0}$ has a stationary solution since exp(−η) < 1. As a convention, $S_{i,h,0}$ is set to the mean of all positive exposures $(D_{i,h,t})_{t=1,\dots,T}$. However, this convention has little impact on the level of an individual’s long term exposure, since the contribution of the initial state $S_{i,h,0}$ tends to zero as t increases. We call this autoregressive model "KDEM", for Kinetic Dietary Exposure Model.

The individual cumulative exposure $S_{i,h,t}$ can be considered to be the long term exposure of an individual for sufficiently large values of t. For methylmercury, the long term steady state of the individual exposure is reached after 5 or 6 half lives according to Dr P. Grandjean, a methylmercury expert. Thus, the long term individual exposure to methylmercury is defined as the cumulative exposure reached after, say, $6\,l_{1/2} = 36$ weeks. The risk assessment usually consists of comparing the exposure with the so called Provisional Tolerable Weekly Intake (PTWI).
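The recursion (1.1) and the choice of a 36 week horizon can be illustrated with a short simulation. This is only a sketch under stated assumptions: the constant weekly intake of 1.0 µg/kg bw is a hypothetical value, the initial state is set to zero rather than to the mean of positive exposures, and the function name `cumulative_exposure` is not from the paper.

```python
import math

HALF_LIFE_WEEKS = 6.0                   # methylmercury half life quoted in the text
ETA = math.log(2) / HALF_LIFE_WEEKS     # dissipation rate eta = ln(2) / l_half


def cumulative_exposure(intakes, s0=0.0, eta=ETA):
    """Iterate S_t = exp(-eta) * S_{t-1} + D_t and return [S_1, ..., S_T]."""
    decay = math.exp(-eta)
    series = []
    s = s0
    for d in intakes:
        s = decay * s + d
        series.append(s)
    return series


weekly_intake = 1.0                     # hypothetical constant intake (µg/kg bw per week)
series = cumulative_exposure([weekly_intake] * 36)

# Steady state of the recursion under a constant intake: d / (1 - exp(-eta)).
steady_state = weekly_intake / (1.0 - math.exp(-ETA))
ratio = series[-1] / steady_state       # fraction of the steady state reached after 36 weeks
print(round(ratio, 4))
```

Starting from zero, the body burden after six half lives is a fraction 1 − 2⁻⁶ of its steady state value, which is consistent with treating the exposure reached after 36 weeks as the long term exposure.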
This tolerable dose, determined from animal experiments and extrapolated to humans, refers to the dose an individual can ingest throughout his entire life without appreciable risk. For methylmercury, the PTWI is set to 1.6 micrograms per kilogram of body weight per week (1.6 µg/kg bw, see FAO/WHO, 2003). In our dynamic approach, the long term exposure is compared to a reference long term exposure, denoted by $S^{ref}$ and defined as the cumulative exposure of an individual whose weekly intake is equal to the PTWI, d, that is

$S^{ref} = \lim_{t \to \infty} S^{ref}_t = \frac{d}{1 - \exp(-\eta)}$, (1.2)

where

$S^{ref}_t = \sum_{s=0}^{t} d \exp(-\eta (t - s)) = d\,\frac{\exp(-\eta (t+1)) - 1}{\exp(-\eta) - 1}$. (1.3)

For methylmercury, the reference for long term exposure $S^{ref}$ is 14.6 µg/kg bw. An individual is then assumed to be at risk if his cumulative exposure $S_{i,h,t}$ exceeds the reference $S^{ref}_t$ for any $t > 6\,l_{1/2}$. The KDEM requires long surveys of individual intakes, which are not monitored and can only be approximated from the available consumption and contamination data.

1.2 From household acquisition data to household intake series

Two current major consumption data sources in France are the national survey on individual consumption (INCA, CREDOC-AFSSA-DGAL, 1999) and the SECODIP panel managed by the company TNS SECODIP. Most quantitative risk assessments conducted by the French agency for food safety (AFSSA) use the 7 day individual consumption data of the INCA survey jointly with contamination data collected by several French institutions. Regarding methylmercury, seafood contamination data have been collected through different analytical surveys (MAAPAR, 1998-2002; IFREMER, 1994-1998) and were used in Tressou et al. (2004) and Crépet et al. (2005) combined with the INCA survey. In this paper, a methodology using the SECODIP data is developed (see Boizot, 2005, for a full description of this database). The company TNS SECODIP has been collecting the weekly food acquisition data of about five thousand households since 1989.
All participating households register grocery purchases through the use of EAN bar codes, but other grocery purchases are registered differently: fresh fruit and vegetable purchases are recorded by the FL sub-panel, while fresh meat, fresh fish and wine purchases are recorded by the VP sub-panel. The households are selected by stratification according to several socioeconomic variables and stay in the survey for about 4 years. TNS SECODIP provides weights for each sub-panel and each period of 4 weeks to ensure the representativeness of the results in terms of several socioeconomic variables. TNS SECODIP also defines the notion of household activity, which refers to the correct and regular reporting of household purchases over a year. For each household, the age and gender of each member are retained in our decomposition model, together with some socioeconomic variables: the region, the social class (from modest to well-to-do), and the occupation category and level of education of the principal household earner.

For methylmercury risk assessment, the households of the VP panel are considered; in the 2001 data set, there are H = 3229 active households (corresponding to 9288 individuals) and T = 53 weeks during which the households may or may not acquire seafood. The weekly purchases of seafood are clustered into two categories ("Fish" and "Mollusks and Shellfish"), for which the mean contamination levels are calculated from the MAAPAR-IFREMER data and are given in table 1.

[Table 1 around here]

Household intake series $(y_{h,t})_{h=1,\dots,H;\, t=1,\dots,T}$ are computed as the cross product between the weekly purchases of seafoods, which are treated as weekly consumptions, and the mean contamination levels. They are expressed in micrograms per week (µg/w). Treating purchases as consumption is of course arguable and will be the main subject of the final discussion (see section 4).
An additional assumption concerns the household size, denoted by $n_{h,t}$ for household h and week t. This can indeed vary over time in the case of a birth or death of a household member. Since a newborn baby will not consume fish in his first few months, we assume that food diversification (and hence consumption of seafoods) starts at one year of age, yielding a total sample of 8913 individuals for the 2001 panel. These household intake series are then decomposed into individual intake series using the model described in the next section. These individual intake series are then used as inputs of the KDEM.

2 Statistical methodology

In this section, the decomposition model is described and compared to similar models described in the literature, namely Chesher (1997, 1998) and Vasdekis and Trichopoulou (2000). Its estimation and some structure tests are then presented.

2.1 The decomposition model

2.1.1 General principle

Consider a household composed of $n_{h,t}$ members, each member having unobserved weekly intakes $y_{i,h,t}$, with $i = 1, \dots, n_{h,t}$, $h = 1, \dots, H$, and $t = 1, \dots, T$. The week t intake of a household h is simply the sum across household members of the individual weekly intakes, that is

$y_{h,t} = \sum_{i=1}^{n_{h,t}} y_{i,h,t}$. (2.1)

As detailed below, the individual weekly intake $y_{i,h,t}$ is assumed to depend on
• the age and gender of the individual via a function f,
• some socioeconomic characteristics of the household,
• time (seasonal variations).

There are obviously several ways to model the individual intake under these assumptions, and this choice leads to more or less simple estimation procedures. In Chesher (1997, 1998) and Vasdekis and Trichopoulou (2000), a discretization argument on age is used, leading to a penalized least squares estimation of a great number of parameters, that is, one parameter for each year of age and gender. We propose to use a truncated polynomial spline of order 1 for each gender, which admits a mixed model spline representation for f.
As far as socioeconomic characteristics are concerned, Chesher (1997) retained a multiplicative specification whereas Vasdekis and Trichopoulou (2000) chose the additive one. In the multiplicative model, a change in income for example would proportionally affect all the individual intakes, whereas in the additive setting they would all be shifted by the same value. Following Vasdekis and Trichopoulou (2000), we retained the additive specification, since the difference between the two specifications may not be notable and the additive setting leads to a much simpler estimation procedure (linear model). Finally, time dependency is only introduced in Chesher (1998), to track changes with age within cohorts: this time dependency is directly introduced into the function f, which is bivariately smoothed according to age and time (cf. Green and Silverman, 1994). Again, we adopt a simpler specification in which time is introduced as a dummy variable.

All these assumptions yield an individual model of the form

$y_{i,h,t} = x_{i,h,t}\beta + z_{i,h,t}u + w_{h,t}\gamma + \delta_t\alpha + \varepsilon_{i,h,t}$, (2.2)

where the terms $x_{i,h,t}\beta + z_{i,h,t}u$ stand for the mixed model spline representation of the function f, the term $w_{h,t}\gamma$ denotes the socioeconomic effects, the term $\delta_t\alpha$ the time effect, and $\varepsilon_{i,h,t}$ is the individual error term. Combining (2.1) and (2.2), we obtain the final rescaled household model given by

$Y_{h,t} = X_{h,t}\beta + Z_{h,t}u + \sqrt{n_{h,t}}\,w_{h,t}\gamma + \sqrt{n_{h,t}}\,\delta_t\alpha + \varepsilon_{h,t}$, (2.3)

where $Y_{h,t} \equiv \sum_{i=1}^{n_{h,t}} y_{i,h,t}/\sqrt{n_{h,t}}$, $X_{h,t} \equiv \sum_{i=1}^{n_{h,t}} x_{i,h,t}/\sqrt{n_{h,t}}$, $Z_{h,t} \equiv \sum_{i=1}^{n_{h,t}} z_{i,h,t}/\sqrt{n_{h,t}}$, and $\varepsilon_{h,t} \equiv \sum_{i=1}^{n_{h,t}} \varepsilon_{i,h,t}/\sqrt{n_{h,t}}$.

2.1.2 Specification details

Age-gender function specification. Let $a_{i,h,t}$ and $s_{i,h}$ denote the age and sex of individual i of household h for the t-th week. Individual dietary intake generally differs according to gender, so the function f takes the following form

$f(a_{i,h,t}, s_{i,h}) = f_M(a_{i,h,t})\,\mathbb{1}\{s_{i,h} = M\} + f_F(a_{i,h,t})\,\mathbb{1}\{s_{i,h} = F\}$,

where $f_M(\cdot)$ and $f_F(\cdot)$ are the age-intake relationships for males (M) and females (F) respectively, and $\mathbb{1}\{A\}$ is the indicator function of the event A. For either sex S, the function $f_S(\cdot)$ is approximated by a spline of order one with a truncated polynomial basis, that is

$f_S(a_{i,h,t}) = \beta^S_0 + \beta^S_1 a_{i,h,t} + \sum_{k=1}^{K_S} u^S_k (a_{i,h,t} - \kappa_{S,k})_+$, (2.4)

where the $(\kappa_{S,k})_{k=1,\dots,K_S}$ are nodes chosen from an age list, $(a_{i,h,t} - \kappa_{S,k})_+ \equiv (a_{i,h,t} - \kappa_{S,k})\,\mathbb{1}\{a_{i,h,t} - \kappa_{S,k} > 0\}$ denotes the positive part of the difference between the age of the individual $a_{i,h,t}$ and the node $\kappa_{S,k}$, and the $u^S_k$ are random effects assumed to be i.i.d. Gaussian with distribution $N(0, \sigma^2_{uS})$. This last assumption allows us to introduce some penalties into the model and to smooth the function $f_S$, yielding a mixed model representation for the spline as shown in Speed (1991); Verbyla (1999); Brumback et al. (1999); Ruppert et al. (2003). As in Ruppert et al. (2003), page 125, the total number of nodes $K_S$ is set to $\min\{|a_{S,d}|/4,\ 35\}$, where $a_{S,d}$ is the list of distinct ages for individuals of sex S, and the nodes $\kappa_{S,k}$, $k = 1, \dots, K_S$, are defined as percentiles of the vector $a_{S,d}$. Defining $x_{i,h,t}$ as the line vector

$x_{i,h,t} = \left(\mathbb{1}\{s_{i,h} = M\},\ a_{i,h,t}\mathbb{1}\{s_{i,h} = M\},\ \mathbb{1}\{s_{i,h} = F\},\ a_{i,h,t}\mathbb{1}\{s_{i,h} = F\}\right)$,

and $z_{i,h,t}$ as the line vector

$z_{i,h,t} = \left((a_{i,h,t} - \kappa_{S,k})_+\,\mathbb{1}\{s_{i,h} = S\}\right)_{k=1,\dots,K_S;\ S=M,F}$,

we finally obtain the first terms of (2.2), that is $f(a_{i,h,t}, s_{i,h}) = x_{i,h,t}\beta + z_{i,h,t}u$.

Socioeconomic characteristics and time dependency. In the application, all the socioeconomic characteristics are categorical variables. Consider the Q categorical variables $W^q_{h,t}$, $q = 1, \dots, Q$, with $m_q$ modalities, and fix the $m_q$-th modality as the reference modality; then the socioeconomic effect term in (2.2) and (2.3) is

$w_{h,t}\gamma = \sum_{q=1}^{Q} \sum_{m=1}^{m_q - 1} \gamma_{q,m}\,\mathbb{1}\{W^q_{h,t} = m\}$,

where $\gamma_{q,m}$ is the effect of the m-th modality of the socioeconomic variable q.
Similarly, time is only measured by weekly counts throughout the year, so that the time effect in (2.2) and (2.3) is simply

$$\delta_t\alpha = \sum_{\tau \neq \tau_R} \alpha_\tau\,\mathbb{1}\{\tau = t\},$$

where $\alpha_\tau$ is the effect of week $\tau$ and $\tau_R$ is the reference week.

Error specification. The error at the individual level $\varepsilon_{i,h,t}$ is assumed to be Gaussian with zero mean, and the variance-covariance structure is such that
• households are independent, i.e. $\forall i, i', t, t'$ and $\forall h \neq h'$, $\mathrm{cov}(\varepsilon_{i,h,t}, \varepsilon_{i',h',t'}) = 0$,
• members of the same household are dependent, that is, $\forall h, t$ and $i \neq i'$, $\mathrm{cov}(\varepsilon_{i,h,t}, \varepsilon_{i',h,t}) = \rho\sigma^2_\varepsilon$, where $\rho$ measures the dependence between individuals within the same household,
• there is no time dependence, that is, $\forall i, i'$ and $\forall t \neq t'$, $\mathrm{cov}(\varepsilon_{i,h,t}, \varepsilon_{i',h,t'}) = 0$.

In the rescaled household model (2.3), the error $\varepsilon_{h,t} \equiv \sum_{i=1}^{n_{h,t}} \varepsilon_{i,h,t}/\sqrt{n_{h,t}}$ is independent Gaussian with zero mean and a variance R such that, $\forall t, t'$ and $\forall h \neq h'$,

$$V(\varepsilon_{h,t}) = \rho\sigma^2_\varepsilon\, n_{h,t} + (1-\rho)\sigma^2_\varepsilon \quad \text{and} \quad \mathrm{cov}(\varepsilon_{h,t}, \varepsilon_{h',t'}) = 0. \qquad (2.5)$$

2.2 Estimation and tests

The model (2.3) is a linear mixed model that can be estimated using restricted maximum likelihood (REML) techniques; see Ruppert et al. (2003) for details. An attractive consequence of the mixed model representation of a penalized spline in (2.4) is that mixed model methodology and software can be used to estimate the parameters and predict the random effect in the resulting household model. The amount of smoothing of the underlying functions $f_S$ is estimated with the REML technique via the estimation of $\sigma^2_{u_S}$. The estimation was conducted using the SAS® MIXED procedure. To get estimators for $\sigma^2_\varepsilon$ and $\rho$, asymptotic least squares techniques were combined with the linear relationship, given in (2.5), between the variance and the household size. More precisely, a residual variance $\sigma^2_n$ is first estimated for each household size $n = 1, \dots, N = \max n_{h,t}$ using an option of the MIXED procedure (see the program for the detailed syntax).
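The linear relationship in (2.5) can be checked by hand. Assuming the household error is the sum of the $n$ individual errors divided by $\sqrt{n}$, its variance is $\frac{1}{n}\left[n\sigma^2_\varepsilon + n(n-1)\rho\sigma^2_\varepsilon\right] = \rho\sigma^2_\varepsilon\, n + (1-\rho)\sigma^2_\varepsilon$, a straight line in $n$ with slope $\rho\sigma^2_\varepsilon$ and intercept $(1-\rho)\sigma^2_\varepsilon$. The sketch below recovers point estimates of $\rho$ and $\sigma^2_\varepsilon$ from per-size residual variances by plain ordinary least squares. The function name is hypothetical, and the sketch deliberately leaves out the asymptotic least squares weighting and the delta-method standard errors used in the paper.

```python
import numpy as np


def rho_sigma2_from_size_variances(sizes, variances):
    """Fit variances = intercept + slope * size and back out (rho, sigma2).

    From (2.5): slope = rho * sigma2 and intercept = (1 - rho) * sigma2,
    hence sigma2 = intercept + slope and rho = slope / sigma2.
    """
    sizes = np.asarray(sizes, dtype=float)
    variances = np.asarray(variances, dtype=float)
    slope, intercept = np.polyfit(sizes, variances, 1)  # highest degree first
    sigma2 = intercept + slope
    rho = slope / sigma2
    return rho, sigma2
```

For example, per-size variances that follow the line n + 3 exactly correspond to $\sigma^2_\varepsilon = 4$ and $\rho = 0.25$.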
Then, ordinary least squares regression and the delta method give estimators for $\sigma^2_\varepsilon$ and $\rho$ and their standard deviations. The individual intake is then predicted by

$$\hat{y}_{i,h,t} = x_{i,h,t}\hat{\beta} + z_{i,h,t}\hat{u} + w_{h,t}\hat{\gamma} + \delta_t\hat{\alpha}, \qquad (2.6)$$

where $\hat{\beta}$, $\hat{\gamma}$, and $\hat{\alpha}$ are the estimators of $\beta$, $\gamma$, and $\alpha$ respectively and $\hat{u}$ is the best prediction of the random effect u in the model (2.3). Confidence and prediction intervals can be built for the prediction $\hat{y}_{i,h,t}$ as proposed in Ruppert et al. (2003), and several tests can be conducted in this model:
1. Are the random effects different according to sex? In other words, is the assertion $\sigma^2_{u_M} = \sigma^2_{u_F} = \sigma^2_u$ true?
2. Another question is the necessity for such random effects. Is the assertion $\sigma^2_u = 0$ (resp. $\sigma^2_{u_M} = 0$ or $\sigma^2_{u_F} = 0$) true?
3. More globally, is the function f the same for both sexes? Is the assertion $f_M = f_F$ true?
These tests can be conducted using classical likelihood (or restricted likelihood) ratio techniques. The likelihood ratio statistic is asymptotically distributed as a chi-square whose number of degrees of freedom is the number of tested equalities, except for point 2, where the limiting distribution is known to be a mixture of chi-square distributions (Self and Liang, 1987; Crainiceanu et al., 2003) because the test concerns the boundary of the parameter space ($\sigma^2_u \in [0, +\infty)$).

3 Applying our methodology to the methylmercury risk assessment

In this section, we illustrate our approach on our motivating example. Firstly, several tests are conducted on the decomposition model, and secondly, individual long term exposure is compared to the reference long term exposure described in section 1.

3.1 Estimation and tests on the structure of the model

Table 2 shows the REML estimates for all socioeconomic variables (parameter $\gamma$) and the p-values of Student tests in the model (2.3). The socioeconomic variables used are household income, region of residence, occupation category and level of education of the principal household earner.
For each socioeconomic variable, the reference modality is given in Table 2. We assume here that
• the function f differs according to gender but the random effect variance does not ($f_M \neq f_F$ and $\sigma^2_{u_M} = \sigma^2_{u_F}$),
• the maximum household size N is set to 6 for variance-covariance estimation. Indeed, the dependence between individuals within the same household depends on the household size $n_h$ in (2.5). For each household size, a variance is estimated, and estimates of $\rho$ and $\sigma^2_\varepsilon$ are obtained using asymptotic least squares techniques as mentioned in section 2.2. Since large households are not numerous in the database, the estimations are implemented with a maximum household size, N, set to 6; it is assumed that there is a common variance for all households with size greater than N.

In this sub-section, we show the results of several tests we carried out to simplify the interpretation of our study. These tests have been implemented in a hierarchical way, starting with the highest-order interaction terms and merging into the reference modality any modality that does not differ significantly from it. All tests are performed at the 5% level of significance and each new hypothesis is tested conditionally on the results of the previous tests. Each null hypothesis and the p-value resulting from the appropriate F-test are shown in Table 3.

First of all, concerning the occupation category variable, the self-employed modality does not significantly differ from the reference modality, blue collar workers (H1, Pval = 0.771). After refitting the model with the reference modality "Blue collar workers and self employed", all the remaining occupation modalities are significantly different from the reference. Then, F-tests allow us to conclude that the resulting three groups are significantly different from each other (H2, H3, H4). Let us now consider the region of residence variable. First, there are some very substantial differences among the 4 regions of residence (H5, Pval < 0.001).
However, the modality "North, Brittany, and Vendee coast" and the modality "Paris and its suburbs" should be grouped (H6c, Pval = 0.881). Then, the other tests implemented for the level of education and income variables suggest that no further simplification is possible (see p-values of null hypotheses H7, H8, H9 in Table 3). Finally, the overall F-test comparing our resulting final model to the original model (2.3) shows that no important variable has been left out of the model (Pval = 0.59).

Table 4 shows the parameter estimates and p-values of the Student's t-tests for all socioeconomic variables of the reduced final model. The income effects on individual exposure are those expected: the richer the household, the higher its exposure, because seafoods are expensive. Furthermore, living in a coastal region or in Paris and its suburbs brings about larger individual exposure relative to living in a non coastal region because of the more ready supply of seafoods in these regions. Moreover, the more educated the principal household earner is, the larger the individual exposure is.

The occupation category of the principal household earner has an unexpected effect on the individual exposure. Indeed, a higher exposure is expected for white collar workers and retirees when compared to blue collar workers, but the opposite effect is observed. This may be explained by the fact that the reference modality for this variable is a very heterogeneous modality also comprising managers and self-employed persons (farmers and craftsmen). Another explanation could be that white collar workers have a higher propensity to eat out in restaurants, whereas consumption outside the home is not included in the model.

[Tables 2, 3 and 4 around here]

Likelihood ratio tests are implemented to test the structure of the final model. First, the dependence of individual exposures to methylmercury within a household is tested.
The null hypothesis $\rho = 0$ (cf. equation (2.5)) is rejected (Pval numerically equal to zero), which confirms that individuals within the same household have correlated exposures. Then, we test whether the function f is the same for both genders. The null hypothesis $f_M = f_F$ is rejected (Pval numerically equal to zero) but the null hypothesis $\sigma^2_{u_M} = \sigma^2_{u_F}$ is not rejected. This means that individual exposure differs with gender but both functions need the same amount of smoothing.

3.2 The cumulative and the long term individual exposure

The cumulative individual exposure $S_{i,h,t}$ is calculated from the estimated individual weekly intakes according to equation (1.1), and the resulting values for $t > 35$ are compared to the reference cumulative exposure defined by (1.3). Figure 1 shows the cumulative individual exposure over the 53 weeks of the year 2001 for different individuals. Only certain percentiles of the distribution of the individual cumulative exposures of the last week are displayed. For example, the curve Pmax represents the cumulative exposure of the individual whose last week's cumulative exposure is the highest. This is the cumulative exposure of a girl who turned one year old during the 30th week of 2001 and who lives in Paris or its suburbs, in a well-to-do household.

Very few individuals have a cumulative individual exposure above the reference long term exposure. We estimate that only 0.186% of individuals are deemed at risk. This risk index should be compared to the more common one, denoted $R_{1.6}$, defined as the percentage of weekly intakes $D_{i,h,t}$ exceeding the PTWI:

$$R_{1.6} = \frac{100}{\#\{(i,h,t)\}} \sum_{i,h,t} \mathbb{1}\{D_{i,h,t} > 1.6\}.$$

$R_{1.6}$ is equal to 0.45%, and is slightly higher since each occasional deviation above the PTWI increases this risk index, whereas only long term deviations above the PTWI should be taken into account to assess the risk. A deeper analysis of at-risk individuals shows that all these vulnerable individuals are children less than three years old. They represent 5.29% of the children aged between 1 and 3 in 2001.
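The exceedance index $R_{1.6}$ described above is simply the share of weekly intakes above the PTWI. A minimal sketch, assuming the weekly intakes are available as a flat array expressed in the same unit as the PTWI (the function name and the handling of missing values are illustrative assumptions):

```python
import numpy as np


def exceedance_percentage(weekly_intakes, threshold=1.6):
    """Percentage of weekly intakes strictly above the threshold (PTWI)."""
    d = np.asarray(weekly_intakes, dtype=float)
    d = d[~np.isnan(d)]  # assumption: missing weeks are simply ignored
    return 100.0 * np.mean(d > threshold)
```

Because every single week above the threshold counts, this index reacts to occasional peaks, which is why it exceeds the long term risk based on cumulative exposure.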
Further, no child from a modest household is found to be at risk.

[Figure 1 around here]

4 Discussion

As mentioned in section 1, the use of household acquisition data in a food safety context, and in our case the use of the SECODIP database for assessing methylmercury dietary intakes, gives rise to some approximations:

1. Consumption outside of the home is out of the scope of household acquisition data. TNS SECODIP does not provide any information on the quantities of seafoods consumed out of the home or bought for outside consumption. Nevertheless, Serra-Majem et al. (2003) assert that these data are good estimates for the consumption of the whole household. Vasdekis and Trichopoulou (2000) avoid this question by using the term "availability" instead of intake or consumption. However, as in Chesher (1997), auxiliary information about outside consumption could be introduced in the model as a correction factor accounting for the propensity to eat outside of the home according to age, sex or socioeconomic variables. The French INCA survey on individual consumption gives details about inside and outside the home consumption for 3003 individuals aged 3 and older. The mean proportion of consumption outside the home is 20% for seafoods. Applying such a factor to all household intakes yields a long term risk of 0.226%, and $R_{1.6}$ = 0.791%. Furthermore, in this case, a small proportion of consumers older than 3 years old are vulnerable. Nevertheless, children aged between 1 and 3 in 2001 still represent the most vulnerable consumer group, at 10% of the corresponding population.

2. The amount of food bought by a household can be different from the amount actually consumed. Indeed, particularly for seafoods, a non-negligible part is not edible: Favier et al. (1995) show that on average only 61% of fresh or frozen fish is edible.
Besides, Maresca and Poquet (1994) also demonstrate that some part of the purchased food is thrown away, which also reduces the actual amount of food consumed by a household. However, SECODIP does not specify whether the quantity of fresh or frozen fish bought is ready to be consumed or is a whole fish that needs some preparation. Applying the 61% edible proportion factor to all household intakes yields a long term risk of 0.00%, and $R_{1.6}$ = 0.043%. If both the 20% outside of the home consumption correction factor and the 61% edible proportion factor are applied to our series, the long term risk is equal to 0.021%, $R_{1.6}$ = 0.13%, and 1.06% of the population of children aged between 1 and 3 are vulnerable. These results stress that applying such a correction factor to assess the actual quantity consumed is probably too strong and is certainly a crude approximation of the quantity of seafoods ingested. Thus, a more detailed database on fish and seafood is needed to carry out an accurate assessment of exposure to methylmercury, taking into account only the edible part of fish and other seafood.

Body weight information is crucial in a food safety context and will be included in the future SECODIP data since it has now been added to the list of required individual characteristics. The measurement error associated with this quantity will remain, however, especially for children, whose body weight changes a lot throughout a year. Nevertheless, approximating the weekly body weight of young children by the median of the weekly body weight distribution available in French health records is the best approximation possible.

3. The food nomenclature of the SECODIP database is not as detailed as the contamination database. Unfortunately, fish and seafood species are not well documented, so it is not possible to consider more than two food categories when computing household intakes.
This problem of nomenclature matching is ubiquitous in food risk assessments since contamination analyses are generally conducted independently from the food nomenclature of consumption data.

These arguments mainly show the disadvantages of the use of household food acquisition data such as the SECODIP database. Nevertheless, such data also present many advantages compared to the individual food record surveys mainly used in France in the food safety context:
• As mentioned before, households respond for a long period of time (the average is 4 years in the SECODIP panel), which allows us to observe long term behaviors and avoid some well known biases of individual food record surveys. For example, respondents might over- (under-) declare certain foods with a good (bad) nutritional value, either deliberately or just because they increased (reduced) their consumption for the short (7 days) period of the survey.
• Individual surveys are expensive and very difficult to conduct. Highly trained interviewers are required and extraordinary cooperation is required from respondents. Household food acquisition data can serve many other applications (economics or marketing) and, at least for the SECODIP data, acquisition recording is simplified by optical scanning of food barcodes.

Conclusion

In this paper, we proposed a methodology to assess chronic risks related to food contamination using the example of methylmercury exposure through seafood consumption. This methodology includes the definition of a Kinetic Dietary Exposure Model (KDEM) that integrates the fact that contaminants are eliminated from the body at different rates, the rate being measured by the half life of the contaminant. In this paper, the estimation is based on the use of household food acquisition data, which are first decomposed into individual intake data through a disaggregation model accounting for the dependence among household members. Several extensions of this methodology are currently being studied.
First, the disaggregation model could be improved by considering a preliminary step in which we determine which members are actual consumers, in the spirit of the Tobit model. The KDEM idea is also currently being developed by studying the stability and ergodic properties of the underlying continuous time piecewise deterministic Markov process (Bertail et al., 2006). The parameters of this new model are the intake distribution, the inter-intake time distribution and the dissipation rate distribution. In this framework, the dissipation parameter η of the KDEM model is random and the intake and inter-intake distributions can be estimated either from individual (INCA-type) data or household (SECODIP-type) data.

References

Bertail, P., S. Clémençon and J. Tressou (2006). A storage model with random release rate for modeling exposure to food contaminants. Submitted for publication.
Boizot, C. (2005). Présentation du panel de données SECODIP. Technical report. INRA-CORELA.
Brumback, B., D. Ruppert and M. P. Wand (1999). Comment on "Variable selection and function estimation in additive nonparametric regression using a data-based prior" by Shively, Kohn, and Wood. Journal of the American Statistical Association 94, 794–797.
Chesher, A. (1997). Diet revealed?: Semiparametric estimation of nutrient intake-age relationships. Journal of the Royal Statistical Society A 160(3), 389–428.
Chesher, A. (1998). Individual demands from household aggregates: Time and age variation in the quality of diet. Journal of Applied Econometrics 13(5), 505–524.
Crainiceanu, C. M., D. Ruppert and T. J. Vogelsang (2003). Some properties of likelihood ratio tests in linear mixed models. (Working Paper).
CREDOC-AFSSA-DGAL (1999). Enquête INCA (individuelle et nationale sur les consommations alimentaires). TEC&DOC ed. Lavoisier, Paris. (Coordinateur : J.L. Volatier).
Crépet, A., J. Tressou, P. Verger and J. Ch. Leblanc (2005).
Management options to reduce exposure to methyl mercury through the consumption of fish and fishery products by the French population. Regulatory Toxicology and Pharmacology 42(2), 179–189.
Engle, R. F., C. W. J. Granger, J. Rice and A. Weiss (1986). Non-parametric estimation of the relationship between weather and electricity demand. Journal of the American Statistical Association 81, 310–320.
FAO/WHO (2003). Evaluation of certain food additives and contaminants for methylmercury. Sixty first report of the Joint FAO/WHO Expert Committee on Food Additives, Technical Report Series. WHO. Geneva, Switzerland.
Favier, C., J. Ireland-Ripert, C. Toque and M. Feinberg (1995). Répertoire Général des Aliments, Table de composition, tome 1. TEC&DOC ed. Lavoisier, Paris.
Green, P.J. and B.W. Silverman (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Hoffmann, K., H. Boeing, A. Dufour, J. L. Volatier, J. Telman, M. Virtanen, W. Becker and S. De Henauw (2002). Estimating the distribution of usual dietary intake by short-term measurements. European Journal of Clinical Nutrition 56, 53–62.
IFREMER (1994-1998). Résultat du réseau national d'observation de la qualité du milieu marin pour les mollusques (RNO).
MAAPAR (1998-2002). Résultats des plans de surveillance pour les produits de la mer. Ministère de l'Agriculture, de l'Alimentation, de la Pêche et des Affaires Rurales.
Maresca, B. and G. Poquet (1994). Collectes sélectives des déchets et comportements des ménages. Technical Report R146. CREDOC.
Nusser, S.M., A.L. Carriquiry, K.W. Dodd and W.A. Fuller (1996). A semiparametric transformation approach to estimating usual intake distributions. Journal of the American Statistical Association 91, 1440–1449.
Patterson, H. D. and R. Thompson (1971). Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.
Ruppert, D., M. P. Wand and R.
J. Carroll (2003). Semiparametric regression. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Self, S. G. and K.Y. Liang (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association 82(398), 605–610.
Sempé, M., G. Pédron and M. P. Roy-Pernot (1979). Auxologie, méthode et séquences. Théraplix. Paris.
Serra-Majem, L., D. MacLean, L. Ribas, D. Brule, W. Sekula, R. Prattala, R. Garcia-Closas, A. Yngve, M. Lalonde and A. Petrasovits (2003). Comparative analysis of nutrition data from national, household, and individual levels: results from a WHO-CINDI collaborative project in Canada, Finland, Poland, and Spain. Journal of Epidemiology and Community Health 57, 74–80.
Smith, J. C. and F. F. Farris (1996). Methyl mercury pharmacokinetics in man: A reevaluation. Toxicology and Applied Pharmacology 137, 245–252.
Speed, T. (1991). Discussion of "That BLUP is a good thing: the estimation of random effects" by G. Robinson. Statistical Science 6, 42–44.
Tressou, J., A. Crépet, P. Bertail, M. H. Feinberg and J. C. Leblanc (2004). Probabilistic exposure assessment to food chemicals based on extreme value theory. Application to heavy metals from fish and sea products. Food and Chemical Toxicology 42(8), 1349–1358.
Vasdekis, V.G.S. and A. Trichopoulou (2000). Non parametric estimation of individual food availability along with bootstrap confidence intervals in household budget surveys. Statistics and Probability Letters 46, 337–345.
Verbyla, A. (1999). Mixed Models for Practitioners. Biometrics SA, Adelaide.
WHO (1990). Methylmercury, environmental health criteria 101. Technical report. Geneva, Switzerland.
Figures and Tables

Table 1: Description of the contamination database (Unit: microgram per kilogram)

                        Mean    Min     Max     Standard Deviation   Number of analyses
Fish                    0.147   0.003   3.520   0.235                1350
Mollusk and Shellfish   0.014   0.001   0.172   0.011                1293

Table 2: Restricted maximum likelihood estimates (REML) for age and all socioeconomic variables and the p-value of the Student's tests (Pval)

Effect                                       Parameter   REML     Pval
Income (ref: Mean sup)
  Well to do                                 γ1           6.027   <0.001
  Mean inf                                   γ2           2.686   <0.001
  Modest                                     γ3          -1.928   <0.001
Region of residence (ref: Noncoastal regions)
  North, Brittany, Vendee coast              γ4           0.962   0.003
  South West coast                           γ5           5.232   <0.001
  Mediterranean coast                        γ6           2.303   <0.001
  Paris and its suburbs                      γ7           1.023   0.009
Occupation category of the principal household earner (ref: Blue collar workers)
  self-employed persons                      γ8          -0.122   0.771
  white collar workers                       γ9          -3.733   <0.001
  retirees                                   γ10         -5.261   <0.001
  no activity                                γ11         -1.910   0.004
Level of education of the principal household earner (ref: BAC and higher degree)
  student                                    γ12          5.901   <0.001
  no or weak diploma                         γ13         -1.281   <0.001

Table 3: The different steps performed in testing the socioeconomic part of our model. For each step, the null hypothesis tested and the p-value resulting from the appropriate F-test are shown.
All tests are performed conditionally on the results of the previous tests (Pval)

Null hypothesis                Pval
H1:  γ8 = 0                    0.771
H2:  γ9 = γ10                  0.030
H3:  γ9 = γ11                  0.018
H4:  γ10 = γ11                 <0.001
H5:  γ4 = γ5 = γ6 = γ7         <0.001
H6a: γ4 = γ5                   <0.001
H6b: γ4 = γ6                   <0.001
H6c: γ4 = γ7                   0.881
H6d: γ5 = γ6                   <0.001
H6e: γ5 = γ7                   <0.001
H6f: γ6 = γ7                   0.0103
H7:  γ12 = γ13                 <0.001
H8:  γ1 = γ2 = γ3              <0.001
H9a: γ1 = γ2                   <0.001
H9b: γ1 = γ3                   <0.001
H9c: γ2 = γ3                   <0.001

Table 4: Restricted maximum likelihood estimates (REML) for all age and socioeconomic variables of the reduced final model with all variance components and their standard errors (s.e)

Effect                                       Parameter   REML     Pval
Income (ref: Mean sup)
  Well to do                                 γ1           6.108   <0.001
  Mean inf                                   γ2           2.760   <0.001
  Modest                                     γ3          -1.915   <0.001
Region of residence (ref: Non coastal regions)
  Paris and North, Brittany, Vendee coast    γ4 = γ7      0.995   <0.001
  South west coast                           γ5           5.156   <0.001
  Mediterranean coast                        γ6           2.250   <0.001
Occupation category of the principal household earner (ref: Blue collar workers and self employed persons)
  white collar workers                       γ9          -3.745   <0.001
  retirees                                   γ10         -5.243   <0.001
  no activity                                γ11         -1.871   0.005
Level of education of the principal household earner (ref: BAC and higher degree)
  student                                    γ12          5.879   <0.001
  no or weak diploma                         γ13         -1.279   <0.001

Variance components                          Parameter   REML      s.e
Variance of the random effect                σ²u         24.832    6.7316
Variance-covariance structure
  variance                                   σ²ε         1260705   282309
  correlation                                ρ           -0.22     0.0434

Figure 1: Cumulative exposure to MeHg (unit: µg per kg of body weight)
ABSTRACT

Foods naturally contain a number of contaminants that may have different and long term toxic effects. This paper introduces a novel approach for the assessment of such chronic food risk that integrates the pharmacokinetic properties of a given contaminant. The estimation of such a Kinetic Dietary Exposure Model (KDEM) should be based on long term consumption data which, for the moment, can only be provided by Household Budget Surveys such as the SECODIP panel in France. A semi-parametric model is proposed to decompose a series of household quantities into individual quantities which are then used as inputs of the KDEM. As an illustration, the risk assessment related to the presence of methylmercury in seafood is revisited using this novel approach. <|endoftext|><|startoftext|> Introduction

Hot molecular cores represent an early evolutionary stage in massive star formation prior to the formation of an ultracompact H II region (UCH II). Single-dish line surveys toward hot cores have revealed high abundances of many molecular species and temperatures usually exceeding 100 K (e.g., Schilke et al. 1997; Hatchell et al. 1998; McCutcheon et al. 2000). Unfortunately, most hot cores are relatively far away (a few kpc, Orion-KL being an important exception), and high-spatial-resolution studies are important to disentangle the various components in the region, to resolve potential multiple heating sources, and to search for chemical variations throughout the regions. Here we present sub-arcsecond resolution submm spectral line and dust continuum observations of the hot core G29.96−0.02, characterizing the physical and chemical properties of this prototypical region.

Send offprint requests to: H. Beuther

The hot core/UCH II region G29.96−0.02 is a well studied source comprising a cometary UCH II region and, approximately 2.6″ to the west, a hot molecular core (Wood & Churchwell 1989; Cesaroni et al. 1994, 1998).
G29.96−0.02 is at a distance of ∼6 kpc (Pratap et al. 1999), and the bolometric luminosity measured with IRAS is very high, with L ∼ 1.4 × 10^6 L⊙ (Cesaroni et al. 1994). Since the region harbors at least two massive (proto)stars (within the UCH II region and the hot core), this luminosity must be distributed over various sources. Based on cm continuum free-free emission, Cesaroni et al. (1994) calculate a luminosity for the UCH II region of Lcm ∼ 4.4 × 10^5 L⊙. Furthermore, they try to estimate the luminosity of the hot core via a first order black-body approximation and get a value of Lbb ∼ 1.2 × 10^5 L⊙. Later, Olmi et al. (2003) derive a similar estimate (∼ 9 × 10^4 L⊙) by integrating a much better determined SED. The exciting source of the UCH II region has been identified in the near-infrared as an O5-O8 star (Watson & Hanson 1997). Furthermore, Pratap et al. (1999) identified two additional sources toward the rim of the UCH II region and an enhanced density of reddened sources indicative of an embedded cluster.

Beuther et al.: SMA observations of G29.96−0.02 (http://arxiv.org/abs/0704.0518v1)

A line survey toward a number of UCH II regions reveals that G29.96−0.02 is a strong molecular line emitter in nearly all observed species (Hatchell et al. 1998). High-angular-resolution studies show that many species (e.g., NH3, CH3CN, HNCO, HCOOCH3) peak toward the main H2O maser cluster ∼ 2.6″ west of the UCH II region (e.g., Hofner & Churchwell 1996; Cesaroni et al. 1998; Olmi et al. 2003), whereas CH3OH peaks ∼ 4″ further south-west, associated with another isolated H2O maser feature (Pratap et al. 1999). Hoffman et al. (2003) detected one of the relatively rare H2CO masers toward the hot core position. These masers are proposed to trace the warm molecular gas in the vicinity of young forming massive stars (Araya et al. 2006). The signature of a CH3OH peak offset from the other molecular lines is reminiscent of Orion-KL (e.g., Wright et al. 1996; Beuther et al.
2005b). Temperature estimates toward the hot core based on high-density tracers vary between 80 and 150 K (e.g., Cesaroni et al. 1994; Hatchell et al. 1998; Pratap et al. 1999; Olmi et al. 2003).

While Gibb et al. (2004) detect a molecular outflow in H2S emanating from the hot core in approximately the south-east north-west direction, Cesaroni et al. (1998) and Olmi et al. (2003) detect a velocity gradient in the east-west direction in the high-density tracers NH3(4,4) and CH3CN, consistent with a rotating disk around an embedded protostar. However, Maxia et al. (2001) also report that their rather low-resolution 5.9″ × 3.7″ (≈ 0.15 pc) SiO(2–1) data are consistent with the disk scenario as well. This is a bit puzzling since SiO is usually found to trace shocked gas in outflows and not more quiescent gas in disks. Inspecting their SiO image again (Fig. 6 in Maxia et al. 2001), this interpretation is not unambiguous; the data also appear to be consistent with the outflow observed in H2S (Gibb et al. 2004). It is possible that the spatial resolution of their SiO(2–1) observations is not sufficient to really disentangle the outflow in this distant region.

Olmi et al. (2003) compiled the SED from cm to mid-infrared wavelengths. While the 3 mm data are still strongly dominated by the free-free emission (Olmi et al. 2003), at 1 mm the hot core becomes clearly distinguished from the adjacent UCH II region (Wyrowski et al. 2002). G29.96−0.02 is one of the few hot cores which is detected at mid-infrared wavelengths (De Buizer et al. 2002). Interestingly, the mid-infrared peak is ∼ 0.5″ (∼3000 AU) offset from the NH3(4,4) hot core position. While Gibb et al. (2004) speculate that the mid-infrared peak might arise from scattered light only, De Buizer et al. (2002) suggest that it could trace a second massive source within the same core. This hypothesis can be tested via very-high-angular-resolution submm continuum studies.

2.
Observations

We have observed the hot core G29.96−0.02 with the Submillimeter Array (SMA; Ho et al. 2004; see footnote 1) during four nights between May and November 2005. We used all available array configurations (compact, extended, very extended; for details see Table 1) with unprojected baselines between 16 and 500 m, resulting at 862 µm in a projected baseline range from 16.5 to 591 kλ. The chosen phase center was the peak position of the associated UCH II region, R.A. [J2000.0] 18h46m03.99s and Decl. [J2000.0] −02°39′21.47″. The velocity of rest is vlsr ∼ +98 km s^-1 (Churchwell et al. 1990).

Table 1. Observing parameters

Date       Config.     # ant.   Source loop [hours]   τ(225GHz)
28.May05   very ext.   6        7.0                   0.13-0.16
18.Jul.05  comp.       7        7.5                   0.06-0.09
4.Sep.05   ext.        6        4.5                   0.06-0.08
5.Nov.05   very ext.   7        3.0                   0.06

For bandpass calibration we used Ganymede in the compact configuration and 3C279 and 3C454.3 in the extended and very extended configurations. The flux scale was derived in the compact configuration again from observations of Ganymede. For two datasets of the more extended configurations, we used 3C454.3 for the relative scaling between the various baselines and then scaled that absolutely via observations of Uranus. For the fourth dataset we did the flux calibration using 3C279 only. The flux calibration is estimated to be accurate to within 20%. Phase and amplitude calibration was done via frequent observations of the quasars 1743-038 and 1751+096, about 15.5° and 18.3° from the phase center of G29.96−0.02. The zenith opacity τ(348GHz), measured with the NRAO tipping radiometer located at the Caltech Submillimeter Observatory, varied during the different observation nights between ∼0.15 and ∼0.4 (scaled from the 225 GHz measurement). The receiver operated in a double-sideband mode with an IF band of 4-6 GHz so that the upper and lower sidebands were separated by 10 GHz. The central frequencies of the upper and lower sidebands were 348.2 and 338.2 GHz, respectively.
The correlator had a bandwidth of 2 GHz and the channel spacing was 0.8125 MHz. Measured double-sideband system temperatures corrected to the top of the atmosphere were between 110 and 800 K, depending on the zenith opacity and the elevation of the source. Our sensitivity was dynamic-range limited by the side-lobes of the strongest 1 The Submillimeter Array is a joint project between the Smithsonian Astrophysical Observatory and the Academia Sinica Institute of Astronomy and Astrophysics, and is funded by the Smithsonian Institution and the Academia Sinica. Beuther et al.: SMA observations of G29.96−0.02 3 emission peaks and thus varied between the line maps of dif- ferent molecules and molecular transitions. This limitation was mainly due to the incomplete sampling of short uv-spacings and the presence of extended structures. The 1σ rms for the velocity-integrated molecular line maps (the velocity ranges for the integrations were chosen for each line separately depend- ing on the line-widths and intensities) ranged between 36 and 76 mJy. The average synthesized beam of the spectral line maps was 0.65′′×0.48′′ (P.A. −83◦). The 862 µm submm continuum image was created by averaging the apparently line-free parts of the upper sideband. The 1σ rms of the submm continuum image was ∼ 21 mJy/beam, and the achieved synthesized beam was 0.36′′×0.25′′ (P.A. 18◦), the smallest beam obtained so far with the SMA. The different synthesized beams between line and continuum maps are due to different applied weightings in the imaging process (“robust” parameters set in MIRIAD to 0 and -2, respectively) because there was insufficient signal-to- noise in the line data obtained in the very extended configura- tion. The initial flagging and calibration was done with the IDL superset MIR originally developed for the Owens Valley Radio Observatory (Scoville et al. 1993) and adapted for the SMA2. The imaging and data analysis were conducted in MIRIAD (Sault et al. 1995). 3. Results 3.1. 
Submillimeter continuum emission
Figure 1 presents the 862 µm continuum emission extracted from the line-free parts of the upper sideband spectrum (∼1.8 GHz in total used) shown in Figure 4. The very high spatial resolution of 0.36″ × 0.25″ corresponds to a linear resolution of ∼ 1800 AU at the given distance of ∼6 kpc. The submm continuum emission peaks approximately 2″ west of the UCHii region and is associated with the molecular line emission known from previous observations. We do not detect any submm continuum emission toward the UCHii region itself. At the given spatial resolution, multiplicity within the G29.96−0.02 hot core is resolved for the first time, and we identify 6 submm continuum emission peaks (submm1 to submm6) above the 3σ level of 63 mJy beam−1 (Fig. 1). We consider submm1 and submm2 to be separate sources instead of a dust ridge because we count compact spherical or elliptical sources and their emission peaks are separated by about one synthesized beam. The four strongest submm peaks, which are all > 6σ detections, are located within a region of (1.3″)^2, i.e., 7800 AU across. The submm peak submm1 is associated with H2O and H2CO maser emission (Hofner & Churchwell 1996; Hoffman et al. 2003), and we consider this to be probably the most luminous sub-source. The other H2O maser peaks are offset from the submm continuum emission. The mid-infrared source detected by De Buizer et al. (2002) is offset > 1″ from the submm emission. This may either be due to uncertainties in the MIR astrometry, or the MIR emission may trace another young source in the region. It should be noted that the class ii CH3OH masers detected by Minier et al. (2001) peak close to the MIR source as well, which indicates that the MIR offset from the hot core may well be real.

2 The MIR cookbook by Charlie Qi can be found at http://cfa-www.harvard.edu/~cqi/mircook.html.
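The conversion between angular and linear scales quoted above can be checked with a short sketch. It relies on the small-angle relation (1″ at 1 pc subtends 1 AU). Taking the geometric mean of the two beam axes as the representative beam size is an assumption made here for illustration, and the function name `linear_size_au` is hypothetical.

```python
import math


def linear_size_au(theta_arcsec, distance_pc):
    """Projected size in AU of an angle (arcsec) seen at a distance (pc).

    Small-angle approximation: by definition of the parsec,
    1 arcsec at 1 pc corresponds to 1 AU.
    """
    return theta_arcsec * distance_pc


DISTANCE_PC = 6000.0             # ~6 kpc, the distance adopted in the text
BEAM_MAJ, BEAM_MIN = 0.36, 0.25  # synthesized beam axes [arcsec]

# Assumption: represent the elliptical beam by the geometric mean of its axes.
beam_mean = math.sqrt(BEAM_MAJ * BEAM_MIN)
beam_au = linear_size_au(beam_mean, DISTANCE_PC)

# Extent of the region containing the four strongest peaks (1.3 arcsec).
region_au = linear_size_au(1.3, DISTANCE_PC)

print(f"beam:   {beam_mean:.2f} arcsec -> {beam_au:.0f} AU")
print(f"region: 1.30 arcsec -> {region_au:.0f} AU")
```

Under these assumptions the sketch gives about 1800 AU for the beam and 7800 AU for the region, matching the values quoted in the text.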
Table 2 lists the absolute source positions, their 862µm peak intensities and the integrated flux densities approximately associated with each of the sub-sources. Calculating the brightness temperature T_b of the corresponding Planck function for, e.g., submm1, we get T_b ∼ 27 K. Assuming hot core dust temperatures of ∼ 100 K, the usual assumption of optically thin dust emission is not really valid anymore, and one gets an approximate beam-averaged optical depth τ of the dust emission of ∼0.3. To calculate the dust and gas masses, we can follow the mass determination outlined in Hildebrand (1983) and Beuther et al. (2002, 2005a), which assumes optically thin emission, and correct that for the increased dust opacity. Assuming constant emission along the line of sight, the opacity correction factor C is

$$C = \frac{\tau}{1 - e^{-\tau}}.$$

With τ ∼ 0.3, we get a correction factor C ∼ 1.16, which is still comparatively small. Assuming a dust opacity index β = 1.5, the dust opacity per unit dust mass is κ(862µm) ∼ 1.5 cm2 g−1 (with the reference value κ(250µm) ∼ 9.4 cm2 g−1, see Hildebrand 1983), and we assume a gas-to-dust ratio of 100. Given the uncertainties in β and T, we estimate the masses to be accurate within a factor of 4. Table 2 gives the derived masses and beam-averaged column densities. Each sub-peak has a mass of a few M⊙, and the main peak submm1 exhibits approximately 10 M⊙ of compact, warm gas and dust emission. The integrated 862 µm continuum flux density of the central region comprising the four main submm continuum sources amounts to 1.16 Jy. At an average dust temperature of 100 K, this corresponds to a central core mass of 39.9 M⊙. In comparison to these flux density measurements, Thompson et al. (2006) observed with SCUBA 850µm peak and integrated flux densities of ∼ 11.5 ± 1.2 Jy/(14″ beam) and ∼19.2 Jy, respectively. The ratio between peak and integrated JCMT fluxes already indicates non-compact emission even on those scales.
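The numbers in this paragraph can be reproduced approximately with the following sketch. It is not the reduction actually used for Table 2: it assumes a Gaussian beam solid angle, the standard optically thin relation M = S d² R / (κ B_ν(T)) with gas-to-dust ratio R, and the values τ = 0.3, T = 100 K, β = 1.5 and d = 6 kpc quoted in the text. All function names are illustrative.

```python
import math

# Physical constants (cgs)
H = 6.62607015e-27       # Planck constant [erg s]
K_B = 1.380649e-16       # Boltzmann constant [erg / K]
C_LIGHT = 2.99792458e10  # speed of light [cm / s]
PC_CM = 3.0857e18        # parsec [cm]
M_SUN_G = 1.989e33       # solar mass [g]
JY = 1.0e-23             # jansky [erg s^-1 cm^-2 Hz^-1]
ARCSEC = math.pi / (180.0 * 3600.0)  # arcsecond [rad]

NU = C_LIGHT / 862e-4    # frequency of the 862 micron continuum [Hz]


def planck(nu, temp):
    """Planck function B_nu(T) in cgs units."""
    return 2.0 * H * nu**3 / C_LIGHT**2 / math.expm1(H * nu / (K_B * temp))


def brightness_temperature(peak_jy_beam, bmaj_as, bmin_as, nu):
    """Planck brightness temperature for a peak intensity in Jy/beam.

    Assumes a Gaussian beam with FWHM axes bmaj_as x bmin_as (arcsec).
    """
    omega = math.pi * bmaj_as * bmin_as * ARCSEC**2 / (4.0 * math.log(2.0))
    intensity = peak_jy_beam * JY / omega
    return H * nu / K_B / math.log1p(2.0 * H * nu**3 / (C_LIGHT**2 * intensity))


def opacity_correction(tau):
    """Correction factor C = tau / (1 - exp(-tau))."""
    return tau / (1.0 - math.exp(-tau))


def gas_mass_msun(flux_jy, dist_pc, temp, kappa_dust, tau, gas_to_dust=100.0):
    """Gas mass in solar masses: optically thin estimate times C(tau)."""
    d_cm = dist_pc * PC_CM
    m_thin = flux_jy * JY * d_cm**2 * gas_to_dust / (kappa_dust * planck(NU, temp))
    return opacity_correction(tau) * m_thin / M_SUN_G


# Dust opacity scaled from kappa(250 micron) = 9.4 cm^2/g with beta = 1.5
kappa_862 = 9.4 * (250.0 / 862.0) ** 1.5

# submm1: peak 173 mJy/beam, integrated flux 288 mJy (Table 2)
t_b = brightness_temperature(0.173, 0.36, 0.25, NU)
corr = opacity_correction(0.3)
mass = gas_mass_msun(0.288, 6000.0, 100.0, kappa_862, 0.3)

print(f"T_b ~ {t_b:.0f} K, C ~ {corr:.2f}, M ~ {mass:.1f} Msun")
```

With these inputs the sketch returns a brightness temperature close to 27 K, a correction factor close to 1.16, and a mass for submm1 close to the 11.5 M⊙ listed in Table 2.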
Furthermore, subtracting a typical line contamination of the continuum emission in hot cores of the order 25% (e.g., NGC6334I, Hunter et al. in prep.), the total 850µm single-dish continuum flux density should amount to ∼8.6 Jy. Compared with the integrated flux density in the SMA data of ∼1.74 Jy, this indicates that approximately 80% of the single-dish emission is filtered out by the missing short spacings in the interferometer data. The dust and gas in the central region have higher temperatures than the components filtered out on larger spatial scales, and since the derived dust and gas mass decreases with increasing temperature as $M_{\rm H_2} \propto (e^{h\nu/kT} - 1)$ (e.g., Beuther et al. 2002), an even greater proportion of the mass (> 80%) is filtered out in the SMA data. However, the SMA image reveals the most compact hot gas and dust cores at the center of the evolving massive star-forming region. The shortest baseline of the SMA observations of ∼16.5 kλ corresponds to a scale of ∼12″, so larger scales are filtered out entirely. However, even smaller scales are missing because the uv-spacings corresponding to scales ≥ 5″ are still relatively poorly sampled, and the image presented in Figure 1 is only sensitive to spatial scales of the order of a few arcseconds. The submm peaks detected by the SMA are much stronger than what would have been expected if the single-dish flux (∼8.6 Jy) were uniformly distributed over the SCUBA primary beam of 14″, even ignoring any spatial filtering and missing flux effects (this would result in ∼ 4 mJy per synthesized SMA beam). This shows that the emission measured on the small spatial scales sampled by the SMA represents the compact core emission much better than expected. However, it does not imply that the gas masses measured by the SMA are the only gas reservoir the embedded protostars have for their ongoing accretion; they may also gain mass from the large-scale gas envelope that is filtered out by our observations (see also the competitive accretion scenario, e.g., Bonnell et al. 2004). The derived beam-averaged H2 column densities are of the order of a few times 10^24 cm−2, corresponding to visual extinctions A_v of a few 1000 (A_v = N_H/0.94 × 10^21, Frerking et al. 1982).

Fig. 1. The hot core UCHii region G29.96−0.02. The grey-scale with contours shows the submm continuum emission with a spatial resolution of 0.36″×0.25″. The contour levels start at the 1σ level of 21 mJy beam−1 and continue at 63, 105 mJy beam−1 (black contours) to 147, 168 mJy beam−1 (white contours). The dashed contours outline the cm continuum emission from the UCHii region and the thick contours show the NH3 emission (Cesaroni et al. 1994). The contouring is done from 15 to 95% (step 10%) of the peak emission of each image, respectively (S_peak(1.2cm) = 109 mJy/beam, S_peak(NH3) = 15 mJy/beam). Triangles, circles and pentagons show the H2O (Hofner & Churchwell 1996), H2CO (Hoffman et al. 2003) and class ii CH3OH (Minier et al. 2001) maser positions. The star marks the peak of the MIR emission (De Buizer et al. 2002), which is not a point source but has a similar size as the NH3 emission. The squares mark the infrared sources by Pratap et al. (1999).

3.2. Spectral line emission
Figure 4 presents spectra extracted toward the main submm peak submm1 at an angular resolution of 0.64″×0.47″, lower than that of the submm continuum map (see §2). More than 80 spectral lines from 18 molecular species, isotopologues or vibrationally excited species have been identified, with a minor fraction of ∼5% of unidentified lines (UL) (Tables 6 & 4). The upper level excitation temperatures of the many lines range between approximately 40 and 750 K (Table 6).
Therefore, with one set of observations we are able to trace various gaseous temperature components from the relatively colder gas surrounding the hot core region to the densest and warmest gas best observed in some of the vibrationally excited lines.

Table 2. Submm continuum source parameters
Source   R.A. [J2000]   Dec. [J2000]   S_peak [mJy]   S_int [mJy]   M [M⊙]   N [10^24 cm−2]
submm1   18:46:03.786   -02:39:22.19   173            288           11.5     5.7
submm2   18:46:03.789   -02:39:22.48   168            237           9.5      5.5
submm3   18:46:03.758   -02:39:22.16   138            178           7.1      4.5
submm4   18:46:03.736   -02:39:22.65   151            249           9.9      5.0
submm5   18:46:03.710   -02:39:23.33   68             106           4.2      2.2
submm6   18:46:03.665   -02:39:23.80   84             85            3.4      2.8
The Table shows the peak intensities S_peak, the integrated intensities S_int, the derived gas masses M as well as the H2 column densities N.

Table 3. Peak intensities, rms and velocity ranges for images in Figs. 2 & 3.
Line                          S_peak [mJy/beam]   rms [mJy/beam]   ∆v [km/s]
862µm cont., low res.         422                 17
CH3OH(73,5 − 62,4)            878                 64               [90,104]
13CH3OH(137,7 − 127,6)        752                 51               [95,101]
CH3OH(74,3 − 64,3), vt = 1    1419                69               [91,105]
CH3OCH3(74,3 − 63,4)          669                 46               [94,104]
C2H5OH(157,9 − 156,10)        586                 51               [95,100]
SiO(8 − 7)                    391                 36               [75,105]
C34S(7 − 6)                   592                 62               [92,104]
H2CS(101,0 − 91,9)            933                 69               [92,100]
34SO(88 − 77)                 827                 57               [95,103]
SO2(144,14 − 183,15)          544                 53               [94,100]
HCOOCH3(275,22 − 265,21)      491                 70               [96,100]
CH3CN(198 − 188)              788                 71               [94,100]
CH3CH2CN(383,36 − 373,35)     791                 56               [94,102]
CH3CHCN(362,34 − 352,32)      655                 68               [96,100]
HC3N(37 − 36), v7 = 1         622                 55               [94,102]
HC3N(37 − 36), v7 = 2         416                 57               [94,100]
HN13C(4 − 3)                  1149                76               [94,100]

Figures 2 and 3 present integrated images of the various detected species, isotopologues and vibrationally excited lines. For comparison, Figure 2 also shows the submm continuum emission reduced with the same degraded spatial resolution as the line images.
All images show emission in the vicinity of the hot molecular core and no emission toward the associated UCHii region. However, the morphology varies sig- nificantly between many of the observed molecular line maps. The molecular emission is largely confined to the central region of the main four submm continuum peaks, and we do not detect appreciable molecular emission toward the continuum peaks 5 and 6. Reducing the submm continuum data with the same spatial resolution as the line images, the four submm peaks are smoothed to a single elongated structure peaking close to the submm peak submm1 (Fig. 2, top-left panel). The ground state CH3OH emission is relatively broadly distributed with two peaks in east-west direction, and one may associate one with the submm peaks 1 and 2 and the other with the submm peak submm3, but most other maps show on average one spec- tral line peak somewhere in the middle of the 4 main submm continuum peaks, similar to the lower-resolution submm con- tinuum map. However, there are also a few species which significantly deviate from this picture and show a different spatial morphol- ogy. For example SiO is more extended in north-east south- west direction likely due to a molecular outflow (§3.3). Also interesting is the emission from C34S which lacks emission around the central four submm peaks but is stronger in the in- terface region between the hot molecular core and the UCHii region (§4.3). Furthermore, there are a few spectral line maps – mainly those from likely optically thin lines (HCOOCH3, HN13C), highly excited lines (CH3CHCN) and vibrationally excited lines (CH3OH vt = 1, 2, HC3N v7 = 1, 2) – which show their emission peaks concentrated toward the main submm peak submm1 (§4.5). Previous lower-resolution (∼ 10′′) molecular line observa- tions revealed strong CH3OH emission toward the H2O maser feature approximately 4′′ south-west of the hot core peak (Fig. 1, Hofner & Churchwell 1996; Pratap et al. 1999). 
A lit- tle bit surprising, we do not detect any CH3OH emission (nor any other species) toward that south-western position, even when imaged at low angular resolution using only the com- pact configuration data (therefore, we do not cover that posi- tion in Figures 2 and 3). Pratap et al. (1999) discuss mainly two possibilities to explain this discrepancy: Either their ob- served specific CH3OH(80 − 71) line is a weak maser and we do not cover any comparable CH3OH line, or the emis- sion covered by the lower-resolution data is relatively extended and filtered out by our observations. As discussed in the pre- vious section, the shortest baseline of our observations was ∼16 m, implying that we are not sensitive to any scales > 12′′. Since the CH3OH emission in Pratap et al. (1999) is slightly resolved by their synthesized beam of 12.6′′ × 9.8′′, it is un- likely that we would have filtered out all emission. However, among the many observed CH3OH lines (Table 6), some have similar excitation temperatures of the order 80 K as the line observed by Pratap et al. (1999), and we would expect to de- tect thermal emission from these lines as well. Therefore, our non-detection of CH3OH emission toward the south-western H2O maser position supports rather their suggested scenario of weak CH3OH maser emission in the previously reported obser- vations (Pratap et al. 1999). 6 Beuther et al.: SMA observations of G29.96−0.02 Fig. 4. Lower and upper sideband spectra extracted toward the submm1. The spatial resolution of these data is 0.64′′ × 0.47′′. The main line identifications are shown in both panels. Table 4. Detected molecular species Species Isotopologues Vibrational states CH3OH 13CH3OH CH3OH, vt = 1, 2 CH3OCH3 C2H5OH HCOOCH3 CH3CN CH3CH2CN CH3CHCN HC3N, v7 = 1, 2 HN13C a The detection of this CH3OH vt = 2 line is doubtful since other close vt = 2 lines with similar excitation temperatures were not detected. 3.3. 
Molecular outflow emission
The SiO(8–7) spectrum spans a large range of velocities from ∼75 to ∼111 km s−1. Integrating the blue- and red-shifted emission, one gets the outflow map presented in Fig. 5. The elongated north-west south-east structure is consistent with the outflow previously proposed by Gibb et al. (2004). The additional red feature north-east of the central hot core region makes the interpretation ambiguous: if the north-west south-east outflow is a relatively highly collimated jet, then the north-eastern red feature could be attributed to an additional outflow leaving the core in north-east south-west direction. The blue wing of that potential second outflow would not be detected in our data. However, since we are filtering out any larger-scale emission, it is also possible that the red SiO features south-east and north-east of the main core are part of the same wide-angle outflow, potentially tracing the limb-brightened cavity walls. In this scenario, our observations would miss part of the blue-shifted wide-angle outflow lobe. With the current data, it is difficult to clearly distinguish between the two scenarios. However, comparing the elongated blue-shifted SiO(8–7) data with the north-west south-east outflow previously observed in H2S by Gibb et al. (2004), it appears that this is the most likely direction of the main outflow of the region. Therefore, the multiple outflow scenario appears more likely for the hot core in G29.96−0.02. The lower-resolution SiO(2–1) observations by Maxia et al. (2001) are also consistent with this scenario. Based on these data, we cannot conclusively say which of the submm continuum sources submm1 to submm4 contribute to driving the outflows.

Fig. 5. SiO(8–7) outflow map. The full and dashed contours are integrated over the blue- and red-shifted SiO emission as shown in the figure. The contouring starts at ±2σ and continues in ±1σ steps (thick contours positive, thin contours negative).
The 1σ values for the blue- and red-shifted images are 48 and 46 mJy beam−1, respectively. The markers are the same as in the previous images, the synthesized beam of 0.68″ × 0.49″ is shown at the bottom right, and the arrows guide the eye for the potential directions of the two discussed outflows. The offsets on the axes are relative to the phase center.

4. Discussion
4.1. The formation of a proto-Trapezium system?
The four main submm continuum peaks are located within a projected area of 7800 × 7800 AU^2 on the sky. The projected separation ∆θ between individual sub-sources varies between 1800 AU (peaks 1 and 2) and 5400 AU (peaks 1 and 4, see Table 5). Could the four central submm peaks be the predecessors of a future Trapezium system? Trapezia are defined as non-hierarchical multiple systems of three or more stars where the largest projected separation between Trapezia members should not exceed the smallest projected separation by more than a factor of 3 (Sharpless 1954; Ambartsumian 1955; Abt & Corbally 2000). This criterion is satisfied by the four submm peaks at the center of the G29.96−0.02 hot core. The 14 optically identified Trapezia discussed by Abt & Corbally (2000) have mean radii to the furthest outlying member of ∼ 4 × 10^4 AU, with the largest radius of ∼ 5.4 × 10^5 AU (∼2.6 pc), the approximate dimension of an open cluster. Therefore, the protostellar projected separations of the tentative proto-Trapezium candidate in G29.96−0.02 are significantly smaller than in typical optically visible Trapezia systems. A similarly small size for a candidate Trapezium system has recently been reported for the multiple system in W3IRS5 (Megeath et al. 2005).

Table 5. Spatial separation
Pair   ∆θ [″]   ∆x [AU]
1-2    0.3      1800
1-3    0.5      3000
1-4    0.9      5400
2-3    0.6      3600
2-4    0.8      4800
3-4    0.6      3600
The numbers in column 1 correspond to the numbers of the submm peaks.
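The separation criterion for Trapezia can be checked directly against the values in Table 5. The sketch below is an illustration only: the function name is hypothetical, and reading "a factor of 3" as an inclusive upper bound on the ratio is an assumption.

```python
# Projected separations between the four main submm peaks [AU], from Table 5.
SEPARATIONS_AU = {
    (1, 2): 1800, (1, 3): 3000, (1, 4): 5400,
    (2, 3): 3600, (2, 4): 4800, (3, 4): 3600,
}


def satisfies_trapezium_criterion(separations, max_ratio=3):
    """True if the largest separation is at most max_ratio times the smallest.

    Assumption: the factor-of-3 limit is treated as inclusive.
    """
    values = list(separations.values())
    return max(values) <= max_ratio * min(values)


ratio = max(SEPARATIONS_AU.values()) / min(SEPARATIONS_AU.values())
print(f"largest/smallest separation = {ratio:.1f}")
print("criterion satisfied:", satisfies_trapezium_criterion(SEPARATIONS_AU))
```

The ratio of the largest (peaks 1 and 4) to the smallest (peaks 1 and 2) separation is exactly 3, so the four peaks sit at the limit of the criterion.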
The small sizes of the proto-Trapezia in G29.96−0.02 and W3IRS5 may be attributed to their youth. During their upcoming evolution, these young systems will expel most of the surrounding gas and dust envelope via protostellar outflows and strong uv-radiation. Therefore, the gravitational potential of the whole system will decrease and the kinetic energy may dominate. Systems with positive total energy will globally expand and will eventually be observable as larger-scale optical Trapezia systems (Ambartsumian 1955). With the given data it is hard to estimate how massive the expected Trapezia stars are now and will be at the end of their formation processes. The integrated hot core luminosity is estimated to be ∼ 10^5 L⊙ (Cesaroni et al. 1994; Olmi et al. 2003), in contrast to the integrated luminosity of the whole region measured by the large IRAS beam of ∼ 10^6 L⊙. Producing 10^5 L⊙ requires either an O7 star or a few stars of comparable but lower masses. Nevertheless, the numbers imply that this Trapezium system should form at least one massive star. Although the gas masses we derived from our dust continuum data (Table 2) are relatively low, this does not necessarily imply that the mass reservoir of the sources is restricted to these gas masses, because they may accrete additional gas from the larger-scale envelope that is filtered out by our observations. This scenario is predicted by the competitive accretion model for massive star formation (e.g., Bonnell et al. 2004). The fact that the gas masses we find for the four strongest submm sources are all similar allows us to speculate that they may form stars of similar mass in the end; however, this cannot be tested in more detail with these data.
Assuming that the projected size of the potential proto-Trapezium system in G29.96−0.02 of approximately 7800 (AU)2 resembles a 3-dimensional sphere of radius ∼3900 AU, we can estimate the current protostellar volume density of the region to approximately 1.4 × 105 protostars per cubic pc. This number is larger than typical stellar den- sities in young clusters of the order 104 stars per cubic pc (Lada & Lada 2003), but it is still below the extremely high (proto)stellar densities required for protostellar merger models of the order 106 to 108 stars per cubic pc (Bonnell et al. 1998, 2004; Stahler et al. 2000; Bally & Zinnecker 2005). Although we have not yet observed the extremely high (proto)stellar densities predicted by the coalescence scenario, as soon as we observe massive star-forming regions with a spa- tial resolution ≤ 4000 AU, we begin to resolve multiplicity and potential proto-Trapezia (see also the recent observations of NGC6334I and I(N) by Hunter et al. 2006). Furthermore, this (proto)stellar density may even be a lower limit, since we ob- serve only a two-dimensional projection and are additionally sensitivity limited to masses ≥ 2.1 M⊙ (corresponding to the 3σ flux limit of 63 mJy beam−1 at the assumed temperature of 100 K). Higher spatial resolution has so far always increased the observed (proto)stellar densities, and it is possible that in the future we may reach the 106 requirement for merging to play a role. However, it is also important to get better theoret- ical predictions of potential merger signatures that observers could look for. 4.2. Various episodes of massive star formation? It is interesting to note that the previously identified mid- infrared source (De Buizer et al. 2002) is offset from the submm continuum peaks. Although the mid-infrared astrome- try is usually relatively uncertain, the association of the mid- infrared peak with class ii CH3OH maser emission with an absolute positional uncertainty of only 30 mas (Minier et al. 
2001) indicates that the offset may be real. The fact that we find at least three different regions of massive star formation (the UCHii region, the mid-infrared source, and the submm continuum sources) within a small region of only ∼20000 AU (∼0.1 pc) indicates that not all massive stars within the same evolving cluster are coeval but that sequences of massive star formation may take place even on such small spatial scales.

4.3. Carbon mono-sulfide C34S
One of the most striking spectral line maps is that of the rare carbon mono-sulfide isotopologue C34S(7–6). Its emission peak is not toward the hot core nor any of the submm continuum peaks, but largely east of it in the interface region between the submm continuum peaks and the UCHii region. Hence, one would like to understand why the C34S emission is so weak toward the hot core region and so strong at the hot core/UCHii region interface. CS usually desorbs from dust grains at moderate temperatures of a few tens of K, hence it should be observable relatively early in the evolution of a growing hot molecular core (e.g., Viti et al. 2004). From 100 K upwards H2O is released from grains, then it forms OH molecules, and the OH can react with S to form SO and SO2 (e.g., Charnley 1997). Therefore, the initially high CS abundances should decrease with time while the SO and SO2 abundances are expected to increase with time (e.g., Wakelam et al. 2005). As shown in Figure 2, 34SO peaks toward the hot core where the derived CH3OH temperatures exceed the H2O evaporation temperature (see §4.4 and Fig. 7b), potentially validating this theoretical prediction. According to such chemical models, the hot core G29.96−0.02 should have a chemical age of at least a few times 10^4 years. The strong C34S emission in the hot core/UCHii interface region may be explained in the same framework.
In the molec- ular evolution scheme outlined above, one would expect low C34S emission toward the hot core with a maybe symmetrical increase further-out. In the case of the G29.96−0.02 hot core, we have the decrease toward the center, but the emission rises only toward the east, north and west with the strongest increase in the eastern hot core/UCHii region interface. If one compares the C34S morphology in Figure 2 with the temperature distri- bution in Figure 7b, one finds the lowest CH3OH temperatures right in the vicinity of the C34S emission peaks, adding further support to the proposed chemical picture. Extrapolating this scenario to other molecules, it indicates that species which are destroyed by H2O, e.g., molecular ions such as HCO+ or N2H + (e.g., Bergin et al. 1998), are no good probes of the inner regions of hot molecular cores. 4.4. Temperature structure Leurini et al. (2004, 2007) investigated the diagnostic proper- ties of methanol over a range of physical parameters typical of high-mass star-forming regions. They found that the ground state lines of CH3OH are mainly tracers of the spatial density of the gas, although at submillimeter wavelengths high k tran- sitions are also sensitive to the kinetic temperature. However, in hot, dense regions such as hot cores, the effects of infrared pumping on the level populations due to the thermal heating of the dust is not negligible, but mimic the effect of collisional ex- citation. For the ground state line, Leurini et al. (2007) found that it is virtually impossible to distinguish between IR pump- ing and pumping by collisions, as both mechanisms equally populate the vt = 0 levels. On the other hand, the vibrationally or torsionally excited lines have very high critical densities (1010–1011 cm−3) and high level energies (T ≥ 200 K). They are difficult to be populated by collisions and trace the IR field instead. 
To study the physical conditions of the gas around the main continuum peaks in G29.96–0.02, we analyzed only the emis- sion coming from the vt = 1 lines, as their optical depth is lower than for the ground state, and their emission is confined to the gas around the dust condensations, while the vt = 0 transitions are more extended and can be affected by problems of missing flux. We first fitted the methanol emission of the vt = 1 lines (see Fig. 6) towards the peak position, using the method de- scribed by Leurini et al. (2004, 2007) that is based on an LVG analysis and includes radiative pumping (Leurini et al. 2007). The continuum emission derived in §3.1 was used in the calcu- lations to solve the equations for the level populations. The two main dust condensations submm1 and submm2 fall in the beam of the line data; however, we assumed that the emission is com- ing from only one component, which is more extended than our beam, and derived a CH3OH column density averaged over the beam of 4×1017 cm−2. The corresponding methanol abundance, relative to H2 is of the order of 10 −7, typical of hot core sources. Since the emission from the vt = 1 lines is optically thin for this column density, and also at higher values, we consider this approach valid. The temperature derived toward the line peak is 340 K. This corresponds to our best fit model, but from a χ2 analysis we can only infer a low limit of ∼220 K for the temper- ature of the gas. Since lines are optically thin, the degeneracy between kinetic temperature and column density is not solved, and the model delivers good fit to the vt = 1 lines for lower or higher temperatures by adjusting the methanol column den- sity. However, the low temperature solutions (Tkin=100–200 K) need high methanol abundances relative to H2(∼ 10 −6), which can be hardly found at these temperatures. Moreover, lines are optically thick for these column densities, and the assumption of our analysis is not valid anymore. 
We also investigated the line ratio between several vt = 1 lines at the column density derived for the main position, to find the best temperature diagnostic tool among the methanol lines and derive a temperature map of the region. We found that the line ratios with the blend of lines at ∼ 337.64 GHz increase with the temperature of the gas (Fig. 7a). However, the blending of several transitions complicates the use of such a diagnostic. In Fig. 7b, we show the map of the line ratio between the 71,6 → 61,5-E vt = 1 line at 337.708 GHz and the blend between the 71,7 → 61,6-E vt = 1 line at 337.642 GHz and the 70,7 → 60,6-E vt = 1 line at 337.644 GHz. Since line intensities do not simply add up, we did not correct for the overlap between the two transitions. Two other lines, the 74,3 → 64,2-E vt = 1 and the 75,3 → 64,2-E vt = 1, are also very close in frequency. This is seen in the linewidth of the blend, which is wider than for the other lines. Therefore, we considered only half of the channels of the blend at 337.64 GHz in our line ratio analysis. In the ratio map in Fig. 7b, submm1, submm2 and submm3 of Table 2 show high temperatures (T ≥ 300 K), while relatively low temperature gas (T ∼ 100 K) is found at R.A. [J2000] = 18h46m03.818s, Dec. [J2000] = −02°39′22.14″, close to a secondary peak of many ground state lines of methanol (Fig. 2). The temperature then decreases towards submm4. The increase in the line ratio towards the south-east and north is probably not real but due to the poor signal-to-noise ratio in these areas. Changes in the column densities across the area may affect our results.

Fig. 6. Spectrum of the 7ka ,kb → 6ka ,kb−1 vt = 1 methanol band toward the main dust condensation. Overlaid in black is the synthetic spectrum resulting from the fit.

Fig. 7. a: Modeled line ratio between the 71,6 → 61,5-E vt = 1 line and the 71,7 → 61,6-E vt = 1 transitions, as function of the temperature.
b: Map of the line ratio between the same transitions in the inner region around the peaks. The white stars mark the positions of the dust peaks; the white dashed contours show the values of the line ratio from ∼ 150 to ∼ 350 K, which correspond to levels from 0.3 to 0.7 in step of 0.1 in the map. The solid black contours show the continuum emission smoothed to the resolution of the line data (from 0.2 to 0.4 Jy/beam in step of 0.05). The offsets on the axes are relative to the phase center. 4.5. Tracing rotation toward the massive cores At the given lower spatial resolution of the spectral line data compared to the submm continuum, we cannot resolve the four submm peaks well. However, one of the aims of such multi-line studies is to identify spectral lines that trace the massive proto- stars and that are potentially associated with massive disk-like structures. Such lines may then be used for kinematic gas stud- ies of rotating gas envelopes, tori or accretion disks. Therefore, we analyzed the data-cubes searching for velocity structures indicative of any kind of rotation. In the large majority of spec- tral lines, this was not successful and we could mostly not identify coherent velocity structure. While chemical and tem- perature effects (§4.3 & 4.4) may be responsible for parts of that, the large column densities derived in §3.1 imply also large molecular line column densities and hence large optical depths. Therefore, many of the observed lines are likely optically thick tracing only outer gas layers of the hot molecular core not pen- etrating down to the deeply embedded protostars. Furthermore, many molecules would not only be excited in the central ro- tating disk-like structures but also in the surrounding envelope and maybe the outflow. Hence, disentangling the different com- ponents observationally remains a challenging task. Fig. 8. Moment 1 maps of HN13C(4–3) (top) and HC3N(37– 36)v7 = 1 (bottom). 
The markers are the same as in the previous images, and the synthesized beam of 0.68″ × 0.49″ is shown at the bottom left. The offsets on the axes are relative to the phase center.

The major exceptions are the molecular lines of the rare isotopologue of hydrogen isocyanide HN13C(4–3), with a low excitation temperature of only 42 K, and the vibrationally excited line of cyanoacetylene HC3N(37–36)v7 = 1, with a higher excitation temperature of 629 K (Fig. 8). In both cases we find a velocity gradient across the main submm peak submm1 with a position angle of ∼45° from north. This is approximately perpendicular to the molecular outflow discussed in §3.3 and by Gibb et al. (2004). Interestingly, Gibb et al. (2004) also find a similar velocity gradient in their central velocity channels of H2S. The previously reported NH3 and CH3CN velocity gradients in approximately east-west direction (Cesaroni et al. 1998; Olmi et al. 2003) were observed with slightly lower spatial resolution and are consistent with our data as well.

Our observations as well as previous work in the literature suggest that the G29.96−0.02 hot core exhibits a velocity gradient in the dense gas in approximately north-east south-west direction, perpendicular to the molecular outflow observed at larger scales. Based on the HN13C(4–3) map, the diameter of this structure is ∼1.6″, corresponding to a radius of ∼4800 AU. Since this emission encompasses not only the submm peak submm1 but also submm2 and submm3, it is not a genuine protostellar disk of the kind often observed in low-mass star-forming regions. The velocity structure does not resemble Keplerian rotation and may hence be due to some larger-scale rotating envelope or torus that could transform into a genuine accretion disk at smaller, still unresolved spatial scales (Cesaroni et al. 2007).
Additional options to explain such a velocity gradient may be (a) interaction with the second outflow in the north-east south-west direction, (b) interaction with the expanding UCHII region, and (c) global collapse as recently proposed for NGC2264 (Peretto et al. 2006). While we cannot exclude (a) and (b), option (c) of a globally collapsing core appears particularly interesting, because combining rotation and collapse would result in an inward-spiraling kinematic structure, potentially similar to the models originally proposed for rotating low-mass cores (e.g., Ulrich 1976; Terebey et al. 1984). Recent hydrodynamic simulations by Dobbs et al. (2005) and Krumholz et al. (2006) as well as analytic studies by Kratter & Matzner (2006) find fragmentation and star formation within the massive disks forming early in the collapse process of high-mass cores. This would be consistent with the three sub-sources (submm1 to submm3) found within the HN13C/HC3N structure. However, on a cautionary note, it needs to be stressed that the collapse/rotation scenario is far from conclusive, and that the outflow and/or UCHII region can potentially influence the observed velocity pattern as well. It remains puzzling that only these two lines exhibit the discussed signatures whereas all the other spectral lines in our setup do not.

5. Conclusions and Summary

The new 862 µm submm continuum and spectral line data obtained with the SMA toward G29.96−0.02 at sub-arcsecond spatial resolution resolve the hot molecular core into several sub-sources. At an angular resolution of 0.36″ × 0.25″, corresponding to linear scales of ∼1800 AU, the central core contains four submm continuum peaks which resemble a Trapezium-like multiple system at a very early evolutionary stage. Assuming spherical symmetry for the hot core region, the protostellar densities are high, of the order of 1.4 × 10^5 protostars per pc^3.
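The linear scales and the protostellar density quoted above can be cross-checked with the small-angle relation (1″ at 1 pc subtends 1 AU). The sketch below is only a rough consistency check under two assumptions that are not stated in this section: a distance of 6 kpc to G29.96−0.02 (chosen because it reproduces the quoted ∼1800 AU and ∼4800 AU), and four protostars inside a sphere of 7800 AU diameter (the extent quoted in the abstract). The function names are illustrative.

```python
import math

AU_PER_PC = 206264.806  # astronomical units per parsec

def arcsec_to_au(theta_arcsec, distance_pc):
    """Small-angle relation: 1 arcsec at 1 pc subtends 1 AU."""
    return theta_arcsec * distance_pc

def number_density_per_pc3(n_objects, diameter_au):
    """Number density of n_objects inside a sphere of the given diameter."""
    radius_pc = 0.5 * diameter_au / AU_PER_PC
    volume_pc3 = 4.0 / 3.0 * math.pi * radius_pc ** 3
    return n_objects / volume_pc3

DISTANCE_PC = 6000.0  # ASSUMPTION: not given in this section

# Geometric mean of the 0.36" x 0.25" beam, as a linear scale
beam_au = arcsec_to_au(math.sqrt(0.36 * 0.25), DISTANCE_PC)

# Radius of the 1.6" diameter HN13C structure
radius_au = arcsec_to_au(1.6 / 2.0, DISTANCE_PC)

# ASSUMPTION: four protostars within a sphere of 7800 AU diameter
density = number_density_per_pc3(4, 7800.0)

print(f"beam scale ~ {beam_au:.0f} AU, structure radius ~ {radius_au:.0f} AU")
print(f"protostellar density ~ {density:.2e} per pc^3")
```

Under these assumptions the three numbers come out close to the values quoted in the text; a different adopted distance would rescale the linear sizes proportionally.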
However, these protostellar densities are still below the values of 10^6 to 10^8 protostars/pc^3 required to make coalescence of protostars a feasible process. Derived H2 column densities of the order of a few 10^24 cm^−2 imply visual extinctions of a few 1000. The existence of three sites of massive star formation in different evolutionary stages within a small region (the UCHII region, the mid-infrared source, and the submm continuum sources) indicates that sequences of massive star formation may take place within the same evolving massive protocluster.

The 4 GHz of observed bandpass reveal a plethora of approximately 80 spectral lines from 18 molecular species, isotopologues or vibrationally excited lines. Only about 5% of the spectral lines remain unidentified. Most spectral lines peak toward the hot molecular core, while a few species also show more extended emission, likely due to molecular outflows and chemical differentiation. The CH3OH line forest allows us to investigate the temperature structure in more detail. We find hot core temperatures ≥ 300 K and temperatures decreasing toward the core edges. The SiO(8–7) observations confirm a previously reported outflow (Gibb et al. 2004) in north-west south-east direction, with a potential identification of a second outflow emanating approximately in the perpendicular direction. Furthermore, C34S exhibits a peculiar morphology, being weak toward the hot molecular core and strong in its surroundings, particularly in the UCHII/hot core interface region.
The C34S deficiency toward the hot molecular core may be explained by time-dependent chemical desorption from grains, where the C34S desorbs early, and later on, after H2O desorbs from grains forming OH, the sulphur reacts with the OH to form SO and

Furthermore, we were interested in identifying the best molecular line tracers to investigate the kinematics and potential disk-like structures in such dense and young massive star-forming regions. Most spectral lines do not exhibit any coherent velocity structure. A likely explanation for this lack of correlation between molecular line peaks and submm continuum peaks is that many spectral lines may be optically thick in such high-column-density regions, and that additional chemical evolution and temperature effects complicate the picture. Furthermore, many molecules are excited in various gas components (e.g., disk, envelope, outflow), and it is often observationally difficult to disentangle the different contributions properly. There are a few exceptions, optically thin and vibrationally excited lines that apparently probe deeper into the core and trace submm1 better than other transitions. Investigating the velocity pattern of these spectral lines, we find for some of them a velocity gradient in the north-east south-west direction, perpendicular to the molecular outflow. Since the spatial scale of this structure is relatively large (∼4800 AU), comprising three of the central protostellar sources, and since the velocity structure is not Keplerian, this is not a genuine Keplerian accretion disk. While these data are consistent with a larger-scale toroid or envelope that may rotate and/or globally collapse, we cannot exclude other explanations, such as the outflow(s) and/or the expanding UCHII region producing the observed velocity pattern.
In addition to this, these data confirm previous findings that the high column densities, the large optical depths of the spectral lines, the chemical evolution, and the different spectral line contributions from various gas components make it very difficult to identify suitable massive accretion disk tracers, and hence to study this phenomenon in a more statistical fashion (e.g., Beuther et al. 2006).

Acknowledgements. We would like to thank Peter Schilke and Sebastian Wolf for many interesting discussions about related subjects. We also thank the anonymous referee whose comments helped to improve the paper. H.B. acknowledges financial support by the Emmy-Noether-Program of the Deutsche Forschungsgemeinschaft (DFG, grant BE2578).

References

Abt, H. A. & Corbally, C. J. 2000, ApJ, 541, 841
Ambartsumian, V. A. 1955, The Observatory, 75, 72
Araya, E., Hofner, P., Goss, W. M., et al. 2006, ApJ, 643, L33
Bally, J. & Zinnecker, H. 2005, AJ, 129, 2281
Bergin, E. A., Neufeld, D. A., & Melnick, G. J. 1998, ApJ, 499,
Beuther, H., Schilke, P., Menten, K. M., et al. 2002, ApJ, 566,
Beuther, H., Schilke, P., Menten, K. M., et al. 2005a, ApJ, 633,
Beuther, H., Zhang, Q., Greenhill, L. J., et al. 2005b, ApJ, 632,
Beuther, H., Zhang, Q., Sridharan, T. K., Lee, C.-F., & Zapata, L. A. 2006, A&A, 454, 221
Bonnell, I. A., Bate, M. R., & Zinnecker, H. 1998, MNRAS, 298, 93
Bonnell, I. A., Vine, S. G., & Bate, M. R. 2004, MNRAS, 349,
Cesaroni, R., Churchwell, E., Hofner, P., Walmsley, C. M., & Kurtz, S. 1994, A&A, 288, 903
Cesaroni, R., Galli, D., Lodato, G., Walmsley, C. M., & Zhang, Q. 2007, in Protostars and Planets V, ed. B. Reipurth, D. Jewitt, & K. Keil, 197–212
Cesaroni, R., Hofner, P., Walmsley, C. M., & Churchwell, E. 1998, A&A, 331, 709
Charnley, S. B. 1997, ApJ, 481, 396
Churchwell, E., Walmsley, C. M., & Cesaroni, R. 1990, A&AS, 83, 119
De Buizer, J. M., Radomski, J. T., Piña, R. K., & Telesco, C. M. 2002, ApJ, 580, 305
Dobbs, C. L., Bonnell, I. A., & Clark, P. C.
2005, MNRAS, 360, 2
Frerking, M. A., Langer, W. D., & Wilson, R. W. 1982, ApJ, 262, 590
Gibb, A. G., Wyrowski, F., & Mundy, L. G. 2004, ApJ, 616,
Hatchell, J., Thompson, M. A., Millar, T. J., & MacDonald, G. H. 1998, A&AS, 133, 29
Hildebrand, R. H. 1983, QJRAS, 24, 267
Ho, P. T. P., Moran, J. M., & Lo, K. Y. 2004, ApJ, 616, L1
Hoffman, I. M., Goss, W. M., Palmer, P., & Richards, A. M. S. 2003, ApJ, 598, 1061
Hofner, P. & Churchwell, E. 1996, A&AS, 120, 283
Hunter, T. R., Brogan, C. L., Megeath, S. T., et al. 2006, ApJ, 649, 888
Kratter, K. & Matzner, C. 2006, ArXiv Astrophysics e-prints: astro-ph/0609692
Krumholz, M., Klein, R., & McKee, C. 2006, ArXiv Astrophysics e-prints: astro-ph/0609798
Lada, C. J. & Lada, E. A. 2003, ARA&A, 41, 57
Leurini, S., Schilke, P., Menten, K. M., et al. 2004, A&A, 422,
Leurini, S., Schilke, P., Wyrowski, F., & Menten, K. 2007, submitted
Maxia, C., Testi, L., Cesaroni, R., & Walmsley, C. M. 2001, A&A, 371, 287
McCutcheon, W. H., Sandell, G., Matthews, H. E., et al. 2000, MNRAS, 316, 152
Megeath, S. T., Wilson, T. L., & Corbin, M. R. 2005, ApJ, 622,
Minier, V., Conway, J. E., & Booth, R. S. 2001, A&A, 369, 278
Olmi, L., Cesaroni, R., Hofner, P., et al. 2003, A&A, 407, 225
Peretto, N., Hennebelle, P., & Andre, P. 2006, ArXiv Astrophysics e-prints, astro-ph/0611277
Pratap, P., Megeath, S. T., & Bergin, E. A. 1999, ApJ, 517, 799
Sault, R. J., Teuben, P. J., & Wright, M. C. H. 1995, in ASP Conf. Ser. 77: Astronomical Data Analysis Software and Systems IV, 433
Schilke, P., Walmsley, C. M., Pineau des Forets, G., & Flower, D. R. 1997, A&A, 321, 293
Scoville, N. Z., Carlstrom, J. E., Chandler, C. J., et al. 1993, PASP, 105, 1482
Sharpless, S. 1954, ApJ, 119, 334
Stahler, S. W., Palla, F., & Ho, P. T. P. 2000, Protostars and Planets IV, 327
Terebey, S., Shu, F. H., & Cassen, P. 1984, ApJ, 286, 529
Thompson, M. A., Hatchell, J., Walsh, A. J., MacDonald, G. H., & Millar, T. J.
2006, A&A, 453, 1003
Ulrich, R. K. 1976, ApJ, 210, 377
Viti, S., Collings, M. P., Dever, J. W., McCoustra, M. R. S., & Williams, D. A. 2004, MNRAS, 354, 1141
Wakelam, V., Selsis, F., Herbst, E., & Caselli, P. 2005, A&A, 444, 883
Watson, A. M. & Hanson, M. M. 1997, ApJ, 490, L165+
Wood, D. O. S. & Churchwell, E. 1989, ApJS, 69, 831
Wright, M. C. H., Plambeck, R. L., & Wilner, D. J. 1996, ApJ, 469, 216
Wyrowski, F., Gibb, A. G., & Mundy, L. G. 2002, in Astronomical Society of the Pacific Conference Series, 43

Fig. 2. Compilation of integrated line images (and submm continuum at the same spatial resolution), always shown in grey-scale with contours and labeled at the bottom of each panel. The dashed contours show negative features due to missing short spacings. The contouring is done from ±15 to ±95% (step ±10%) of the peak emission of each image, respectively. Peak fluxes S_peak, rms and integrated velocity ranges for each image are given in Table 3. The dotted contours again show the UCHII region and the stars mark the submm continuum peaks from Figure 1. The offsets on the axes are relative to the phase center.

Fig. 3. Continued Figure 2.

Table 6. Line parameters. Rows without a frequency are blended with the preceding line; UL marks an unidentified line.

Freq. (GHz)  Line  Eu (K)
337.279  CH3OH(7_{2,5}–6_{2,4})E (vt=2)^a  727
337.297  CH3OH(7_{1,7}–6_{1,6})A (vt=1)  390
337.348  CH3CH2CN(38_{3,36}–37_{3,35})  328
337.397  C34S(7–6)  65
337.421  CH3OCH3(21_{2,19}–20_{3,18})  220
337.446  CH3CH2CN(37_{4,33}–36_{4,32})  322
337.464  CH3OH(7_{6,1}–6_{0,0})A (vt=1)  533
337.474  UL
337.490  HCOOCH3(27_{8,20}–26_{8,19})E  267
337.519  CH3OH(7_{5,2}–6_{5,2})E (vt=1)  482
337.546  CH3OH(7_{5,3}–6_{5,2})A (vt=1)  485
         CH3OH(7_{5,2}–6_{5,1})A− (vt=1)  485
337.582  34SO(8_8–7_7)  86
337.605  CH3OH(7_{2,5}–6_{2,4})E (vt=1)  429
337.611  CH3OH(7_{6,1}–6_{6,0})E (vt=1)  657
         CH3OH(7_{3,4}–6_{3,3})E (vt=1)  388
337.626  CH3OH(7_{2,5}–6_{2,4})A (vt=1)  364
337.636  CH3OH(7_{2,6}–6_{2,5})A− (vt=1)  364
337.642  CH3OH(7_{1,7}–6_{1,6})E (vt=1)  356
337.644  CH3OH(7_{0,7}–6_{0,6})E (vt=1)  365
337.646  CH3OH(7_{4,3}–6_{4,2})E (vt=1)  470
337.648  CH3OH(7_{5,3}–6_{5,2})E (vt=1)  611
337.655  CH3OH(7_{3,5}–6_{3,4})A (vt=1)  461
         CH3OH(7_{3,4}–6_{3,3})A− (vt=1)  461
337.671  CH3OH(7_{2,6}–6_{2,5})E (vt=1)  465
337.686  CH3OH(7_{4,3}–6_{4,2})A (vt=1)  546
         CH3OH(7_{4,4}–6_{4,3})A− (vt=1)  546
         CH3OH(7_{5,2}–6_{5,1})E (vt=1)  494
337.708  CH3OH(7_{1,6}–6_{1,5})E (vt=1)  489
337.722  CH3OCH3(7_{4,4}–6_{3,3})EE  48
337.732  CH3OCH3(7_{4,3}–6_{3,3})EE  48
337.749  CH3OH(7_{0,7}–6_{0,6})A (vt=1)  489
337.778  CH3OCH3(7_{4,4}–6_{3,4})EE  48
337.787  CH3OCH3(7_{4,3}–6_{3,4})AA  48
337.825  HC3N(37–36) v7=1  629
337.838  CH3OH(20_{6,14}–21_{5,16})E  676
337.878  CH3OH(7_{1,6}–6_{1,5})A (vt=2)  748
337.969  CH3OH(7_{1,6}–6_{1,5})A (vt=1)  390
338.081  H2CS(10_{1,10}–9_{1,9})  102
338.125  CH3OH(7_{0,7}–6_{0,6})E  78
338.143  CH3CH2CN(37_{3,34}–36_{3,33})  317
338.306  SO2(14_{4,14}–18_{3,15})  197
338.345  CH3OH(7_{1,7}–6_{1,6})E  71
338.405  CH3OH(7_{6,2}–6_{6,1})E  244
338.409  CH3OH(7_{0,7}–6_{0,6})A  65
338.431  CH3OH(7_{6,1}–6_{6,0})E  254
338.442  CH3OH(7_{6,1}–6_{6,0})A  259
         CH3OH(7_{6,2}–6_{6,1})A−  259
338.457  CH3OH(7_{5,2}–6_{5,1})E  189
338.475  CH3OH(7_{5,3}–6_{5,2})E  201
338.486  CH3OH(7_{5,3}–6_{5,2})A  203
         CH3OH(7_{5,2}–6_{5,1})A−  203
338.504  CH3OH(7_{4,4}–6_{4,3})E  153
338.513  CH3OH(7_{4,4}–6_{4,3})A−  145
         CH3OH(7_{4,3}–6_{4,2})A  145
         CH3OH(7_{2,6}–6_{2,5})A−  103
338.530  CH3OH(7_{4,3}–6_{4,2})E  161
338.541  CH3OH(7_{3,5}–6_{3,4})A+  115
338.543  CH3OH(7_{3,4}–6_{3,3})A−  115
338.560  CH3OH(7_{3,5}–6_{3,4})E  128
338.583  CH3OH(7_{3,4}–6_{3,3})E  113
338.612  SO2(20_{1,19}–19_{2,18})  199
338.615  CH3OH(7_{1,6}–6_{1,5})E  86
338.640  CH3OH(7_{2,5}–6_{2,4})A  103
338.722  CH3OH(7_{2,5}–6_{2,4})E  87
338.723  CH3OH(7_{2,6}–6_{2,5})E  91
338.760  13CH3OH(13_{7,7}–12_{7,6})A  206
338.769  HC3N(37–36) v7=2  525
338.886  C2H5OH(15_{7,8}–15_{6,19})  162
339.058  C2H5OH(14_{7,7}–14_{6,8})  150
347.232  CH2CHCN(38_{1,38}–37_{1,37})  329
347.331  28SiO(8–7)  75
347.446  UL
347.494  HCOOCH3(27_{5,22}–26_{5,21})A  247
347.759  CH2CHCN(36_{2,34}–35_{2,32})  317
347.792  UL
347.842  UL
347.916  C2H5OH(20_{4,17}–19_{4,16})  251
347.983  UL
348.261  CH3CH2CN(39_{2,37}–38_{2,36})  344
348.340  HN13C(4–3)  42
348.345  CH3CH2CN(40_{2,39}–39_{2,38})  351
348.532  H2CS(10_{1,9}–9_{1,8})  105
348.910  HCOOCH3(28_{9,20}–27_{9,19})E  295
348.911  CH3CN(19_9–18_9)  745
349.025  CH3CN(19_8–18_8)  624
349.107  CH3OH(14_{1,13}–14_{0,14})  43

a The detection of this CH3OH vt = 2 line is doubtful since other close vt = 2 lines with similar excitation temperatures were not detected.

The figures "ch3oh1.jpg", "hc3n_v1_mom1.jpg", "hn13c_mom1.jpg", "ch3oh2.jpg" and "ch3oh3.jpg" are available in "jpg" format from: http://arxiv.org/ps/0704.0518v1
ABSTRACT Aiming at a better understanding of the physical and chemical processes in the hot molecular core stage of high-mass star formation, we observed the prototypical hot core G29.96-0.02 in the 862 µm band with the Submillimeter Array (SMA) at sub-arcsecond spatial resolution. The observations resolved the hot molecular core into six submm continuum sources with the finest spatial resolution of 0.36''x0.25'' (~1800AU) achieved so far. Four of them, located within 7800(AU)^2, comprise a proto-Trapezium system with estimated protostellar densities of 1.4x10^5 protostars/pc^3. The plethora of ~80 spectral lines allows us to study the molecular outflow(s), the core kinematics, the temperature structure of the region as well as chemical effects. The derived hot core temperatures are of the order of 300K. We find interesting chemical spatial differentiations, e.g., C34S is deficient toward the hot core and is enhanced at the UCHII/hot core interface, which may be explained by temperature-sensitive desorption from grains and subsequent gas-phase chemistry. The SiO(8-7) emission likely outlines two molecular outflows emanating from this hot core region. Emission from most other molecules peaks centrally on the hot core and is not dominated by any individual submm peak. Potential reasons for that are discussed. A few spectral lines that are associated with the main submm continuum source show a velocity gradient perpendicular to the large-scale outflow. Since this velocity structure comprises three of the central protostellar sources, this is not a Keplerian disk. While the data are consistent with a gas core that may rotate and/or collapse, we cannot exclude the outflow(s) and/or the nearby expanding UCHII region as possible alternative causes of this velocity pattern.
<|endoftext|><|startoftext|> Introduction The Hamiltonian formulation of non-conservative systems has been developed by Riewe[1,2].He used the fractional derivative [3,4,5] to construct the Lagrangian and Hamiltonian for non-conservative systems. As a sequel to Riewe's work, Rabei et al. [6] used Laplace transforms of fractional integrals and fractional derivatives to develop a general formula for the potential of any arbitrary forces, conservative or non- conservative. This led directly to the consideration of the dissipative effects in Lagrangian and Hamiltonian formulations. Besides, the canonical quantization of non- conservative systems carried out by Rabei et al. [7]. Other investigations and further developments are given by Agrawal [8] .He presented the fractional variational problems and the resulting equations are found to be similar to those for variation problems containing integral order derivatives. This approach is extended to classical fields with fractional derivatives [9]. Besides, Kilmek [10] showed that the fractional Hamiltonian is usually not a constant of motion, even in the case when the Hamiltonian is not an explicit function of time. In addition, as a continuation of Agrawal’s work [8], Rabei et al. [11] achieved the passage from the Lagrangian containing fractional derivatives to the Hamiltonian. The Hamilton's equations of motion are obtained in a similar manner to the usual mechanics. In the present work, the Hamilton – Jacobi partial differential equation (HJPDE) is generalized to be applicable for systems containing fractional derivatives. The paper is organized as follows: In Sec. 2 Lagrangian and Hamiltonian formalisms with fractional derivatives are reviewed briefly. In Sec.3, Hamilton-Jacobi Partial differential equations with fractional derivatives is constructed, and two illustrative examples are given in Sec. 4. 
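2- Hamiltonian Formalism with Fractional Derivative

Several definitions of a fractional derivative have been proposed. These definitions include the Riemann–Liouville, Grünwald–Letnikov, Weyl, Caputo, Marchaud, and Riesz fractional derivatives. Here, the problem is formulated in terms of the left and the right Riemann–Liouville fractional derivatives. The left Riemann–Liouville fractional derivative is defined as

$$ {}_aD_x^{\alpha} f(x) = \frac{1}{\Gamma(n-\alpha)} \left(\frac{d}{dx}\right)^{n} \int_a^x (x-\tau)^{n-\alpha-1} f(\tau)\, d\tau , \tag{1} $$

which is denoted as the LRLFD, and the right Riemann–Liouville fractional derivative reads

$$ {}_xD_b^{\alpha} f(x) = \frac{1}{\Gamma(n-\alpha)} \left(-\frac{d}{dx}\right)^{n} \int_x^b (\tau-x)^{n-\alpha-1} f(\tau)\, d\tau , \tag{2} $$

which is denoted as the RRLFD. Here $\alpha$ is the order of the derivative such that $n-1 \le \alpha < n$, and $\Gamma$ represents the Euler gamma function. If $\alpha$ is an integer, these derivatives are defined in the usual sense, i.e.

$$ {}_aD_x^{\alpha} f(x) = \left(\frac{d}{dx}\right)^{\alpha} f(x), \qquad {}_xD_b^{\alpha} f(x) = \left(-\frac{d}{dx}\right)^{\alpha} f(x), \qquad \alpha = 1, 2, 3, \ldots \tag{3} $$

The fractional operator ${}_aD_x^{\alpha}$ can be written as [13]

$$ {}_aD_x^{\alpha} = \frac{d^{n}}{dx^{n}}\, {}_aD_x^{\alpha-n} , \tag{4} $$

where the number of additional differentiations $n$ is equal to $[\alpha]+1$, and $[\alpha]$ is the whole part of $\alpha$. The operator ${}_aD_x^{\alpha}$ is a generalization of differential and integral operators and can be introduced as follows:

$$ {}_aD_x^{\alpha} = \begin{cases} \dfrac{d^{\alpha}}{dx^{\alpha}}, & \mathrm{Re}(\alpha) > 0, \\ 1, & \mathrm{Re}(\alpha) = 0, \\ \displaystyle\int_a^x (d\tau)^{-\alpha}, & \mathrm{Re}(\alpha) < 0. \end{cases} \tag{5} $$

Following Agrawal [8], the Euler–Lagrange equation for the fractional calculus of variations problem is obtained as

$$ \frac{\partial L}{\partial q} + {}_tD_b^{\alpha} \frac{\partial L}{\partial\, {}_aD_t^{\alpha} q} + {}_aD_t^{\beta} \frac{\partial L}{\partial\, {}_tD_b^{\beta} q} = 0 , \tag{6} $$

where $L$ is the generalized Lagrangian function of the form $L(q, {}_aD_t^{\alpha} q, {}_tD_b^{\beta} q, t)$. The generalized momenta are introduced as

$$ p_\alpha = \frac{\partial L}{\partial\, {}_aD_t^{\alpha} q}, \qquad p_\beta = \frac{\partial L}{\partial\, {}_tD_b^{\beta} q}, \tag{7} $$

and the Hamiltonian depending on the fractional time derivatives reads

$$ H = p_\alpha\, {}_aD_t^{\alpha} q + p_\beta\, {}_tD_b^{\beta} q - L . \tag{8} $$

In Ref. [11], Hamilton's equations of motion are obtained in a similar manner to the usual mechanics. These equations read

$$ \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}; \qquad \frac{\partial H}{\partial p_\alpha} = {}_aD_t^{\alpha} q, \qquad \frac{\partial H}{\partial p_\beta} = {}_tD_b^{\beta} q; \qquad \frac{\partial H}{\partial q} = {}_tD_b^{\alpha} p_\alpha + {}_aD_t^{\beta} p_\beta . $$

It is observed that the fractional Hamiltonian is not a constant of motion even though the Lagrangian does not depend on the time explicitly.

3.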
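Hamilton-Jacobi Partial Differential Equation with Fractional Derivatives

In this section, the determination of the Hamilton–Jacobi partial differential equation for systems with fractional derivatives is discussed. According to Rabei et al. [11], the fractional Hamiltonian is written as

$$ H(q, p_\alpha, p_\beta, t) = p_\alpha\, {}_aD_t^{\alpha} q + p_\beta\, {}_tD_b^{\beta} q - L(q, {}_aD_t^{\alpha} q, {}_tD_b^{\beta} q, t) . \tag{9} $$

Consider the canonical transformation with a generating function $F_2({}_aD_t^{\alpha-1} q, {}_tD_b^{\beta-1} q, P_\alpha, P_\beta, t)$. Then the new Hamiltonian takes the form

$$ K(Q, P_\alpha, P_\beta, t) = P_\alpha\, {}_aD_t^{\alpha} Q + P_\beta\, {}_tD_b^{\beta} Q - L'(Q, {}_aD_t^{\alpha} Q, {}_tD_b^{\beta} Q, t) . \tag{10} $$

The old canonical coordinates $q, p_\alpha, p_\beta$ satisfy the fractional Hamilton's principle, which can be put in the form

$$ \delta \int \left( p_\alpha\, {}_aD_t^{\alpha} q + p_\beta\, {}_tD_b^{\beta} q - H \right) dt = 0 . \tag{11} $$

At the same time the new canonical coordinates $Q, P_\alpha, P_\beta$ satisfy a similar principle,

$$ \delta \int \left( P_\alpha\, {}_aD_t^{\alpha} Q + P_\beta\, {}_tD_b^{\beta} Q - K \right) dt = 0 . \tag{12} $$

The simultaneous validity of Eq. (11) and Eq. (12) does not mean that the integrands in both expressions are equal. Since the general form of Hamilton's principle has zero variation at the end points, both statements will be satisfied if the integrands are connected by a relation of the form [12]

$$ p_\alpha\, {}_aD_t^{\alpha} q + p_\beta\, {}_tD_b^{\beta} q - H = P_\alpha\, {}_aD_t^{\alpha} Q + P_\beta\, {}_tD_b^{\beta} Q - K + \frac{dF}{dt} , \tag{13} $$

where the function $F$ is given as

$$ F = F_2({}_aD_t^{\alpha-1} q, {}_tD_b^{\beta-1} q, P_\alpha, P_\beta, t) - P_\alpha\, {}_aD_t^{\alpha-1} Q - P_\beta\, {}_tD_b^{\beta-1} Q . \tag{14} $$

The function $F_2$ is called Hamilton's principal function $S$ for a contact transformation,

$$ F_2 = S({}_aD_t^{\alpha-1} q, {}_tD_b^{\beta-1} q, P_\alpha, P_\beta, t) . \tag{15} $$

Thus,

$$ \frac{dF}{dt} = \frac{dS}{dt} - \dot{P}_\alpha\, {}_aD_t^{\alpha-1} Q - P_\alpha \frac{d}{dt}\, {}_aD_t^{\alpha-1} Q - \dot{P}_\beta\, {}_tD_b^{\beta-1} Q - P_\beta \frac{d}{dt}\, {}_tD_b^{\beta-1} Q . $$

By using the definitions of fractional calculus given in Eq. (4) we have

$$ \frac{dF}{dt} = \frac{dS}{dt} - \dot{P}_\alpha\, {}_aD_t^{\alpha-1} Q - P_\alpha\, {}_aD_t^{\alpha} Q - \dot{P}_\beta\, {}_tD_b^{\beta-1} Q - P_\beta\, {}_tD_b^{\beta} Q . \tag{16} $$

Substituting $dF/dt$ from Eq. (16) into Eq. (13) we have

$$ p_\alpha\, {}_aD_t^{\alpha} q + p_\beta\, {}_tD_b^{\beta} q - H = -K + \frac{dS}{dt} - \dot{P}_\alpha\, {}_aD_t^{\alpha-1} Q - \dot{P}_\beta\, {}_tD_b^{\beta-1} Q . \tag{17} $$

Again using the definitions of fractional calculus given in Eq. (4), we have

$$ \frac{dS}{dt} = \frac{\partial S}{\partial\, {}_aD_t^{\alpha-1} q}\, {}_aD_t^{\alpha} q + \frac{\partial S}{\partial\, {}_tD_b^{\beta-1} q}\, {}_tD_b^{\beta} q + \frac{\partial S}{\partial P_\alpha} \dot{P}_\alpha + \frac{\partial S}{\partial P_\beta} \dot{P}_\beta + \frac{\partial S}{\partial t} . \tag{18} $$

Substituting $dS/dt$ from Eq. (18) into Eq.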
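(17) we get

$$ p_\alpha = \frac{\partial S}{\partial\, {}_aD_t^{\alpha-1} q}, \qquad p_\beta = \frac{\partial S}{\partial\, {}_tD_b^{\beta-1} q}, \tag{19} $$

$$ {}_aD_t^{\alpha-1} Q = \frac{\partial S}{\partial P_\alpha}, \qquad {}_tD_b^{\beta-1} Q = \frac{\partial S}{\partial P_\beta}, \tag{20} $$

$$ K = H + \frac{\partial S}{\partial t} . \tag{21} $$

(On contact transformations, see http://scienceworld.wolfram.com/physics/ContactTransformation.html)

We can automatically ensure that the new variables are constant in time by requiring that the transformed Hamiltonian $K$ be identically zero; in other words, $Q, P_\alpha, P_\beta$ are constants. Putting $K = 0$, we see that this generating function must satisfy the partial differential equation

$$ H + \frac{\partial S}{\partial t} = 0 . \tag{22} $$

This equation is called the Hamilton–Jacobi equation. Let us assume that $P_\alpha = E_1$, $P_\beta = E_2$, where $E_1$, $E_2$ are constants. Then the action function (15) can be expressed as

$$ S = S({}_aD_t^{\alpha-1} q, {}_tD_b^{\beta-1} q, E_1, E_2, t) . \tag{23} $$

Further insight into the physical significance of Hamilton's principal function $S$ is furnished by an examination of its total time derivative, which can be computed from the formula

$$ \frac{dS}{dt} = \frac{\partial S}{\partial\, {}_aD_t^{\alpha-1} q}\, {}_aD_t^{\alpha} q + \frac{\partial S}{\partial\, {}_tD_b^{\beta-1} q}\, {}_tD_b^{\beta} q + \frac{\partial S}{\partial t} . \tag{24} $$

By using Eqs. (19) and (22) we have $dS/dt = p_\alpha\, {}_aD_t^{\alpha} q + p_\beta\, {}_tD_b^{\beta} q - H$, and using Eq. (9) we have $dS/dt = L$. Thus

$$ S = \int L\, dt . \tag{25} $$

If we restrict our considerations to time-independent Hamiltonians, then the Hamilton–Jacobi function can be written in the form

$$ S = W_1({}_aD_t^{\alpha-1} q, E_1) + W_2({}_tD_b^{\beta-1} q, E_2) + f(E_1, E_2, t) , \tag{26} $$

where $W$ is called Hamilton's characteristic function and the function $f$ takes the form $f(E_1, E_2, t) = -Et$. Making use of Eqs. (19) and (20) we obtain

$$ p_\alpha = \frac{\partial W_1}{\partial\, {}_aD_t^{\alpha-1} q}, \qquad p_\beta = \frac{\partial W_2}{\partial\, {}_tD_b^{\beta-1} q}, \tag{27} $$

$$ {}_aD_t^{\alpha-1} Q = \frac{\partial S}{\partial E_1} = \lambda_1, \qquad {}_tD_b^{\beta-1} Q = \frac{\partial S}{\partial E_2} = \lambda_2 . \tag{28} $$

Here $\lambda_1$, $\lambda_2$ are constants. The physical significance of $W$ can be understood by writing its total time derivative

$$ \frac{dW_1}{dt} = \frac{\partial W_1}{\partial\, {}_aD_t^{\alpha-1} q}\, {}_aD_t^{\alpha} q . \tag{29} $$

Comparing this expression to the result of substituting Eq. (27) into Eq. (29), we see that

$$ \frac{dW_1}{dt} = p_\alpha\, {}_aD_t^{\alpha} q \;\Rightarrow\; W_1 = \int p_\alpha\, {}_aD_t^{\alpha} q\, dt \;\Rightarrow\; W_1 = \int p_\alpha\, d\!\left({}_aD_t^{\alpha-1} q\right) . \tag{30} $$

Again one may show that

$$ W_2 = \int p_\beta\, d\!\left({}_tD_b^{\beta-1} q\right) . \tag{31} $$

4. Illustrative Examples

To demonstrate the application of our formalism, let us discuss the following models. As a first model, consider the Lagrangian given by Agrawal [8],

$$ L = \tfrac{1}{2} \left( {}_0D_t^{\alpha} q \right)^2 . $$

The HJPDE for this Lagrangian is calculated as

$$ \frac{1}{2} \left( \frac{\partial S}{\partial\, {}_0D_t^{\alpha-1} q} \right)^2 + \frac{\partial S}{\partial t} = 0 . $$

Using Eq. (27) we obtain

$$ \frac{1}{2} \left( \frac{\partial W}{\partial\, {}_0D_t^{\alpha-1} q} \right)^2 - E = 0 . $$

Solving this equation we have $W = \sqrt{2E}\; {}_0D_t^{\alpha-1} q$. Thus $p_\alpha = \sqrt{2E}$. Making use of Eq.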
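(26) we obtain the function $S$ as

$$ S = \sqrt{2E}\; {}_0D_t^{\alpha-1} q - Et . $$

Eq. (28) leads to

$$ {}_0D_t^{\alpha-1} Q = \frac{\partial S}{\partial E} = \frac{1}{\sqrt{2E}}\; {}_0D_t^{\alpha-1} q - t = \lambda_1 . $$

Thus

$$ {}_0D_t^{\alpha-1} q = \sqrt{2E}\,(t + \lambda_1), \qquad {}_0D_t^{\alpha} q = \sqrt{2E} = p_\alpha . $$

This is the same result obtained by Rabei et al. [11], which is equivalent to the Agrawal formalism [8].

As a second model, consider the Lagrangian given by Rabei et al. [11],

$$ L = \tfrac{1}{2} \left( {}_0D_t^{\alpha} q \right)^2 + \tfrac{1}{2} \left( {}_tD_1^{\beta} q \right)^2 + {}_0D_t^{\alpha} q\; {}_tD_1^{\beta} q . $$

The Hamiltonian is calculated as

$$ H = \tfrac{1}{2} (p_\alpha)^2 = \tfrac{1}{2} (p_\beta)^2 . $$

Thus, the Hamilton–Jacobi partial differential equation reads

$$ \frac{1}{2} \left( \frac{\partial S}{\partial\, {}_0D_t^{\alpha-1} q} \right)^2 + \frac{\partial S}{\partial t} = 0 . $$

Making use of Eq. (26) we have

$$ \frac{1}{2} \left( \frac{\partial W_1}{\partial\, {}_0D_t^{\alpha-1} q} \right)^2 - E = 0 . $$

Thus $W_1 = \sqrt{2E}\; {}_0D_t^{\alpha-1} q$. Again, the HJPDE can be written as

$$ \frac{1}{2} \left( \frac{\partial S}{\partial\, {}_tD_1^{\beta-1} q} \right)^2 + \frac{\partial S}{\partial t} = 0 . $$

Then

$$ \frac{1}{2} \left( \frac{\partial W_2}{\partial\, {}_tD_1^{\beta-1} q} \right)^2 - E = 0 , $$

which leads to $W_2 = \sqrt{2E}\; {}_tD_1^{\beta-1} q$. Thus the Hamilton–Jacobi action function can be written as

$$ S = \sqrt{2E}\; {}_0D_t^{\alpha-1} q + \sqrt{2E}\; {}_tD_1^{\beta-1} q - Et , $$

where the same constant $E$ appears in both terms. Using Eq. (28) we get

$$ {}_0D_t^{\alpha-1} Q = \frac{\partial S}{\partial E} = \frac{1}{\sqrt{2E}} \left( {}_0D_t^{\alpha-1} q + {}_tD_1^{\beta-1} q \right) - t = \lambda_1 . $$

Thus

$$ {}_0D_t^{\alpha-1} q + {}_tD_1^{\beta-1} q = \sqrt{2E}\,(t + \lambda_1), \qquad {}_0D_t^{\alpha} q + {}_tD_1^{\beta} q = \sqrt{2E} . $$

Then

$$ p_\alpha = {}_0D_t^{\alpha} q + {}_tD_1^{\beta} q, \qquad p_\beta = {}_0D_t^{\alpha} q + {}_tD_1^{\beta} q . $$

These lead to

$$ \left( {}_tD_1^{\alpha} + {}_0D_t^{\beta} \right) \left( {}_0D_t^{\alpha} q + {}_tD_1^{\beta} q \right) = 0 . $$

This result is in exact agreement with Rabei et al. [11].

5- Conclusion

In this work we have studied the Hamilton–Jacobi partial differential equation for systems containing fractional derivatives. A general theory to solve the Hamilton–Jacobi partial differential equation is proposed for systems containing fractional derivatives, under the condition that this equation is separable. The Hamilton–Jacobi function is determined in the same manner as for usual systems. Finding this function enables us to get the solutions of the equations of motion. In order to test our formalism, and to get a somewhat deeper understanding, we have examined two examples of systems with fractional derivatives. The results are found to be in exact agreement with the Lagrangian formulation given by Agrawal [8] and with the Hamiltonian formulation given by Rabei et al. [11].

6- References

[1] F. Riewe, Non Conservative Lagrangian and Hamiltonian mechanics. Physical Review E. 53: 1890-1899, (1996).
[2] F. Riewe, Mechanics with Fractional Derivatives. Physical Review E. 55: 3581-3592, (1997).
[3] R.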
Hilfer, Applications of Fractional Calculus in Physics, World Scientific Publishing Company, Singapore, New Jersey, London and Hong Kong, (2000). [4] S. G. Samko, A. A. Kilbas, O. I. Marichev, Fractional Integrals and Derivatives: theory and applications, Gordon and Breach, Amsterdam, (1993). [5] I. Podlubny, Fractional Differential Equations, Academic Press, San Diego (1999); Fractional Calculus and Applied Analysis 5: 367, (2002). [6] Eqab M. Rabei, T. Al-halholy, A. Rousan, Potentials of Arbitrary Forces with Fractional Derivatives, International Journal of Modern Physics A, 19: 3083-3092, (2004). [7] Eqab M. Rabei, Abdul-Wali Ajlouni, Humman B. Ghassib, Quantization of Non- Conservative Systems Using Fractional Calculus, WSEAS Transactions on Mathematics, 5: 853-863, (2006); Quantization of Brownian Motion, International Journal of Theoretical physics, (2006) in press. [8]Om P. Agrawal, Formulation of Euler–Lagrange equations for fractional variational problems, J. Math. Anal. Appl.272: 368-379, (2002). [9] Dumitru Baleanu, Sami I. Muslih, Lagrangian formulation of classical fields within Riemann-Liouville fractional derivatives, Phys. Scripta (in press). [10] M. Klimek, Lagrangian and Hamiltonian fractional sequential mechanics; Czech J. Phys., 52: 1247-1253, (2002); Fractional sequential mechanics – models with symmetric fractional derivative, Czech J. Phys. 51: 1348-1354, (2001). [11] Eqab M. Rabei, Khaled I. Nawafleh, Raed S. Hijjawi , Sami I. Muslih, Dumitru Baleanu, The Hamilton Formalism With Fractional Derivatives, J. Math. Anal. Appl. (in press) [12] H. Goldstein, Classical Mechanics, Addison-Wesley Publishing Company, (1980). [13] Igor M. Sokolov,Joseph Klafter, Alexander Blumen, Fractional Kinetics, Physics Today(2002) American Institute of physics,S-0031-9228-0211-030-1. [14] B.N.N. Achar, J.W. Hanneken, T. Enck, T.Clarke, Dynamics of the fractional oscillator, Physica A, 297: 361-367, (2001). ABSTRACT As a continuation of Rabei et al. 
work [11], the Hamilton- Jacobi partial differential equation is generalized to be applicable for systems containing fractional derivatives. The Hamilton- Jacobi function in configuration space is obtained in a similar manner to the usual mechanics. Two problems are considered to demonstrate the application of the formalism. The result found to be in exact agreement with Agrawal's formalism. <|endoftext|><|startoftext|> Introduction and the model Entanglement is a physical observable measured by the von Neumann entropy or, alternatively, by the Concurrence of the system under consideration. The concept of entanglement gives a physical meaning to the electron cor- relation energy in structures of interacting electrons. The electron correlation is not directly observable, since it is defined as the difference between the ex- act ground state energy of the many electrons Schrödinger equation and the Hartree–Fock energy. In this paper we discuss the Hamiltonian which describes the Hydrogen molecule regarded as a two interacting spin 1 (qubit) model. In [1] it was argued that the entanglement (a quantum observable) can be used in analyzing the so–called correlation energy which is not directly observ- able. From our point of view, the Hydrogen molecule is dealt with a bipartite system governed by the Hamiltonian HH2 = − (1 + g)σ1 ⊗ σ1 − (1− g)σ2 ⊗ σ2 − B(σ3 ⊗ σ3 + σ0 ⊗ σ3), (1) ∗Dipartimento di Fisica dell’Università del Salento and Sezione INFN di Lecce, 73100 Lecce, Italy; e–mail: tina.maiolo@le.infn.it †Dipartimento di Fisica dell’Università del Salento and Sezione INFN di Lecce, 73100 Lecce, Italy; e–mail: luigi.martina@le.infn.it ‡Dipartimento di Fisica dell’Università del Salento and Sezione INFN di Lecce, 73100 Lecce, Italy; e–mail: giulio.soliani@le.infn.it http://arxiv.org/abs/0704.0520v2 where σi stand for the Pauli matrices (σ0 = I). Actually, this model was con- sidered in [1] in order to illustrate their method. 
However, here we will make some interpretative changes. Indeed, from our point of view, the states of an isolated atom are reduced to a system with two energy levels related to the intensity of the magnetic field B. Relative to this scale, the exchange interaction constant J is usually smaller than B, in order to represent the residual interatomic interactions. From the point of view of quantum chemistry, one may interpret the discrete spectrum as provided by the Hartree–Fock calculations, while the interaction coupling J models the residual multielectronic effects not taken into account by the mean field approximation. For simplicity we limit ourselves to the ferromagnetic phase with J > 0. The parameter g, such that 0 ≤ g ≤ 1, describes the degree of anisotropy: g = 0 corresponds to the completely isotropic XY spin model, while g = 1 provides the anisotropic XY spin model, the so-called Ising model.

We notice that when the atoms are far apart, their interaction is quite weak. This corresponds to a vanishing value of J. In this situation the state of the system is completely factorized into the product of the ground states of the independent spins. The corresponding total energy, in units of B, is just the sum of the two fundamental levels, $E_0 = -2$, which we may consider as the Hartree–Fock approximated fundamental level in molecular structure calculations.

When $J \neq 0$, the fundamental energy eigenvalue is $E = -\sqrt{4 + g^2\lambda^2}$ in Region I, defined by $0 < \lambda \le 2/\sqrt{1-g^2}$; otherwise $E = -\lambda$ ($\lambda$ denotes the coupling constant) in Region II, which is the complement of I with respect to the positive real axis. The corresponding (non-normalized) eigenstates are

$$ |\Psi_I\rangle = \left( \frac{\sqrt{g^2\lambda^2+4}+2}{g\lambda},\, 0,\, 0,\, 1 \right) \quad \text{and} \quad |\Psi_{II}\rangle = (0,\, 1,\, 1,\, 0), $$

respectively. In both cases the state is entangled. Since we are dealing with pure states, the von Neumann entropy [2] $S_{vN} = -\mathrm{Tr}\, \rho_1 \log_2 \rho_1$ is chosen as a measure of the entanglement, where $\rho_1$ is the 1-particle reduced density matrix.
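The eigenvalues and eigenstates quoted above can be checked directly. The following is a minimal sketch, not the authors' code: it assumes that Eq. (1) is read, in units of B and with λ = J/B, as H = −(λ/2)(1+g) σ1⊗σ1 − (λ/2)(1−g) σ2⊗σ2 − (σ3⊗σ0 + σ0⊗σ3). That reading is an assumption, adopted here because it reproduces the quoted spectrum. All function names are illustrative, and the Region I eigenvector requires gλ > 0.

```python
import math

# Pauli matrices (sigma_0 is the identity); sigma_2 needs complex entries.
S0 = [[1, 0], [0, 1]]
S1 = [[0, 1], [1, 0]]
S2 = [[0, -1j], [1j, 0]]
S3 = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker product of two 2x2 matrices as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def hamiltonian(lam, g):
    """ASSUMED reading of Eq. (1), in units of B with lam = J/B."""
    xx, yy = kron(S1, S1), kron(S2, S2)
    zi, iz = kron(S3, S0), kron(S0, S3)
    return [[-0.5 * lam * ((1 + g) * xx[r][c] + (1 - g) * yy[r][c])
             - (zi[r][c] + iz[r][c]) for c in range(4)] for r in range(4)]

def ground_state(lam, g):
    """Energy, region label and non-normalized eigenvector quoted in the text."""
    s = math.sqrt(4.0 + (g * lam) ** 2)
    if -s <= -lam:  # Region I: lam <= 2 / sqrt(1 - g^2); needs g * lam > 0
        return -s, "I", [(s + 2.0) / (g * lam), 0.0, 0.0, 1.0]
    return -lam, "II", [0.0, 1.0, 1.0, 0.0]

def residual(lam, g):
    """Largest component of |H v - E v|; vanishes if v is an eigenvector."""
    energy, _, v = ground_state(lam, g)
    h = hamiltonian(lam, g)
    return max(abs(sum(h[r][c] * v[c] for c in range(4)) - energy * v[r])
               for r in range(4))

def von_neumann_entropy(lam, g):
    """Entropy in bits of one spin's reduced density matrix."""
    _, region, v = ground_state(lam, g)
    if region == "II":
        return 1.0
    p = v[0] ** 2 / (v[0] ** 2 + v[3] ** 2)  # weight of |00> after normalization
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)

print(residual(1.0, 1.0), residual(3.0, 0.5))
print(von_neumann_entropy(1.0, 1.0))
```

For g = 0.5 the boundary between the two regions lies at λ = 2/√(1 − g²) ≈ 2.31, so λ = 3 falls in Region II, where the state (0, 1, 1, 0) is maximally entangled.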
However, for general mixed states other entanglement estimators (for instance, the Concurrence [4]) have to be used. In the considered case, one has, written in compact form,

$$S_{vN,I} = -\frac{\sqrt{g^2\lambda^2+4}+2}{2\sqrt{g^2\lambda^2+4}}\,\log_2\frac{\sqrt{g^2\lambda^2+4}+2}{2\sqrt{g^2\lambda^2+4}} - \frac{\sqrt{g^2\lambda^2+4}-2}{2\sqrt{g^2\lambda^2+4}}\,\log_2\frac{\sqrt{g^2\lambda^2+4}-2}{2\sqrt{g^2\lambda^2+4}}, \tag{3}$$

$$S_{vN,II} = 1. \tag{4}$$

Inspecting Eq. (3) and Eq. (4), one finds that the entropy is an increasing function of the coupling constant λ in Region I, while the state is maximally entangled in Region II independently of the anisotropy parameter g. One sees, as is apparent graphically, that for g = 1 the entanglement is a monotonically increasing function of the interaction coupling λ. Moreover, for weak coupling values (λ < 1) it is always less than 30%. Of course, for large coupling constants the entropy approaches 1, meaning that all levels are visited with equal probability by the considered spin. Limiting all further considerations to the case of weak interaction, we observe that at the boundary point $\lambda_b = 2/\sqrt{1-g^2}$ a discontinuity occurs, signaling a crossing of the lowest eigenvalues and, in a more general context, a quantum phase transition [5].

As was pointed out in [6], for quantifying the entanglement we can resort to the reduced density matrix. Furthermore, in [7], Wootters has shown that for a pair of qubits one can use the concept of Concurrence C to measure the entanglement. The Concurrence reads

$$C(\rho) = \max(0,\, \nu_1 - \nu_2 - \nu_3 - \nu_4), \tag{5}$$

where the $\nu_i$'s are the eigenvalues, in decreasing order, of the Hermitian matrix $R = \sqrt{\sqrt{\rho}\,\tilde{\rho}\,\sqrt{\rho}}$, where $\tilde{\rho} = (\sigma_y\otimes\sigma_y)\,\rho^{*}\,(\sigma_y\otimes\sigma_y)$, $\rho^{*}$ being the complex conjugate of ρ taken in the standard basis [7].

Some interesting results on the simple model (1) of the Hydrogen molecule can be achieved by a comparative study of the von Neumann entropy and the Concurrence. To this aim, we compute the Concurrences $C_I$ and $C_{II}$, i.e.

$$C_I = \frac{g\lambda}{\sqrt{g^2\lambda^2+4}}, \qquad C_{II} = 1, \tag{6}$$

where I and II refer to Region I, where $0 \le \lambda \le 2/\sqrt{1-g^2}$, and Region II, where $E = -\lambda$, respectively.
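The closed forms for the ground-state energy, the entropy and the Concurrence can be cross-checked numerically. The sketch below is only an illustration, not the authors' code: it assumes the Hamiltonian (1) carries prefactors J/2 on the exchange terms and the field term $\sigma_3\otimes\sigma_0 + \sigma_0\otimes\sigma_3$, works in units of B with λ = J/B, and uses hypothetical function names.

```python
import numpy as np

# Pauli matrices; s0 is the identity, as in the text.
s0 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)


def hamiltonian(lam, g):
    """Two-spin Hamiltonian in units of B (assumed form, lam = J/B)."""
    return (-0.5 * lam * (1 + g) * np.kron(s1, s1)
            - 0.5 * lam * (1 - g) * np.kron(s2, s2)
            - (np.kron(s3, s0) + np.kron(s0, s3)))


def ground_state(lam, g):
    """Lowest eigenvalue and its eigenvector."""
    energies, vectors = np.linalg.eigh(hamiltonian(lam, g))
    return float(energies[0]), vectors[:, 0]


def entropy(psi):
    """Von Neumann entropy (base 2) of the one-spin reduced density matrix."""
    m = psi.reshape(2, 2)            # m[a, b] = amplitude of |a b>
    rho1 = m @ m.conj().T            # trace over the second spin
    p = np.linalg.eigvalsh(rho1)
    p = p[p > 1e-12]                 # drop zero eigenvalues before taking logs
    return float(-np.sum(p * np.log2(p)))


def concurrence(psi):
    """Wootters Concurrence from the eigenvalues of rho * rho_tilde."""
    rho = np.outer(psi, psi.conj())
    yy = np.kron(s2, s2)
    rho_tilde = yy @ rho.conj() @ yy
    nu = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_tilde))))[::-1]
    return float(max(0.0, nu[0] - nu[1] - nu[2] - nu[3]))


for lam, g in [(1.0, 1.0), (3.0, 0.5)]:   # Region I and Region II examples
    energy, psi = ground_state(lam, g)
    print(lam, g, energy, entropy(psi), concurrence(psi))
```

For λ = 1, g = 1 (Region I) the output can be compared with $-\sqrt{5}$ and $1/\sqrt{5}$ from the formulas above; for λ = 3, g = 0.5 (Region II) with E = −λ and maximal entanglement.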
In Figure 1 a comparison between the Concurrence and the von Neumann entropy for the two-spin system as a function of the coupling λ for g = 1 is presented.

Sec. 2 contains a comparison between the entanglement and the correlation energy. In Sec. 3 the Configuration Interaction method is introduced to compare entanglement and correlation energy. In Sec. 4 some differences between the Configuration Interaction approach and the two-spin Ising model are presented. Finally, our main results are summarized in Sec. 5.

Figure 1: Comparison between the Concurrence and the von Neumann entropy for the two-spin system as a function of the coupling constant λ for g = 1.

2 A comparison between the entanglement and the correlation energy

We now compare the entanglement with the correlation energy which, as already recalled, is understood as the difference between the fundamental energy level and its value at vanishing coupling constant λ. For g = 1 and in units of B it is given by

$$E_{corr} = |E_0| - 2 = \sqrt{4+\lambda^2} - 2. \tag{7}$$

We observe that the entanglement measure is always bounded, while $E_{corr}$ is a divergent function of λ. So it does not make much sense to look for simple relations valid on the entire λ-axis. Consequently, limiting ourselves to weak couplings, 0 ≤ λ ≤ 1, we minimize the mean squared deviation

$$\int_0^1 \Delta S_\alpha^2\, d\lambda, \quad\text{with}\quad \Delta S_\alpha = E_{corr} - \alpha\, S_{vN}. \tag{8}$$

Thus the minimizing parameter $\alpha_{min}$ will be given by

$$\alpha_{min} = \frac{\int_0^1 E_{corr}\, S_{vN}\, d\lambda}{\int_0^1 S_{vN}^2\, d\lambda} \approx -0.691217. \tag{9}$$

A formula analogous to (9) can be obtained by using the Concurrence as a measure of entanglement. In this case, we minimize the mean squared deviation

$$\int_0^1 \Delta C_{\alpha'}^2\, d\lambda, \quad\text{with}\quad \Delta C_{\alpha'} = E_{corr} - \alpha'\, C. \tag{10}$$

Now, in order to estimate the relative deviation of $S_{vN}$ with respect to $E_{corr}$, let us report $|\Delta S_{\alpha_{min}}|/S_{vN}$ and $|\Delta S_{\alpha_{min}}/E_{corr}|$ as functions of λ at the optimal value $\alpha_{min}$. The graphs of these functions are shown in Figure 2.
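The least-squares ratios of Eqs. (8)-(10) can be evaluated with a few lines of numerical integration. This is a hedged sketch rather than the authors' procedure: it takes g = 1, uses the closed forms for $S_{vN}$ and C in Region I, takes $E_{corr}$ positive as written in Eq. (7), and the helper names are invented for the illustration. With this sign choice the ratio comes out positive, so it is the magnitude that should be compared with the values quoted in the text.

```python
import numpy as np


def e_corr(lam):
    """Correlation energy of Eq. (7), units of B, g = 1."""
    return np.sqrt(4.0 + lam**2) - 2.0


def s_vn(lam):
    """Region I von Neumann entropy for g = 1."""
    s = np.sqrt(4.0 + lam**2)
    p = np.array([(s + 2.0) / (2.0 * s), (s - 2.0) / (2.0 * s)])
    p = np.clip(p, 1e-300, 1.0)      # avoids log2(0) at lam = 0
    return -np.sum(p * np.log2(p), axis=0)


def conc(lam):
    """Region I Concurrence for g = 1, Eq. (6)."""
    return lam / np.sqrt(4.0 + lam**2)


def integrate(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def best_scale(measure, lam):
    """Scale factor minimizing the integral of (E_corr - a * measure)**2."""
    e, m = e_corr(lam), measure(lam)
    return integrate(e * m, lam) / integrate(m * m, lam)


lam = np.linspace(0.0, 1.0, 20001)
alpha_s = best_scale(s_vn, lam)      # entropy, compare with |alpha_min|
alpha_c = best_scale(conc, lam)      # Concurrence, compare with |alpha'_min|
print(alpha_s, alpha_c)
```

The closed-form ratio follows from setting the derivative of the integral in Eq. (8) with respect to α to zero, which is why no iterative optimizer is needed.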
Figure 2: The relative quadratic deviation between the von Neumann entropy and the correlation energy with respect to the former ($|\Delta S_{min}|/S_{vN}$) and the latter ($|\Delta S_{min}/E_{corr}|$), respectively, at the optimal value $\alpha_{min}$ as a function of the coupling constant λ for g = 1.

In Figure 3, the relative quadratic deviation between the Concurrence and the correlation energy with respect to the former and the latter, at the optimal value $\alpha'_{min}$, is represented.

Figure 3: The relative quadratic deviation between the Concurrence and the correlation energy with respect to the former ($|\Delta C_{min}|/C$) and the latter ($|\Delta C_{min}/E_{corr}|$), respectively, at the optimal value $\alpha'_{min}$ as a function of the coupling constant λ for g = 1.

Remark 1

From these graphs, one can argue that the agreement between the two functions $S_{vN}$ and $E_{corr}$ is only qualitatively good; in fact, for very small λ, it is not good at all. However, in an intermediate range of values, i.e., 0.6 ≤ λ ≤ 1, the two functions are proportional to within 10%. The same is true for the energy and the Concurrence. The agreement even becomes worse when one considers the relative deviation of the Concurrence with respect to the correlation energy, since the range in which the relative deviation is smaller than 10% is narrower.

Then, the questions are i) whether the above results are sufficient to justify the conjecture advanced in [1], i.e., that entanglement can be considered as an estimate of the correlation energy; and ii) whether such a relation has a more concrete physical meaning, in particular whether the minimizing parameter $\alpha_{min}$ and the vanishing point of $\Delta S_{\alpha_{min}}$ possess any physical meaning (or, likewise, $\alpha'_{min}$ and the vanishing point of $\Delta C_{\alpha'_{min}}$). Notice that in the case of the comparison for the Concurrence simpler analytical expressions appear.
For instance, from Eqs. (6), (7) and (10) one finds $\Delta C_{\alpha'} = \sqrt{\lambda^2+4} - 2 - \alpha'\,\lambda/\sqrt{\lambda^2+4}$, where the coefficient of the Concurrence term has magnitude 0.383249 at the optimum.

Remark 2

We note that, for α in an interval of values around $\alpha_{min}$, the deviation function (8) possesses a minimum in the interval of interest 0 ≤ λ ≤ 1; otherwise the minimum is reached at a larger value of λ, or the function is monotonically increasing (see Figure 4).

Figure 4: The deviation $\Delta S_\alpha$ and its derivative $d\Delta S_\alpha/d\lambda$ with respect to λ, computed for values −1.29 (red) ≤ α ≤ −0.091 (violet), in steps of 0.06. The thicker curve corresponds to $\alpha_{min}$.

This behavior suggests considering the function $\Delta S_{\alpha_{min}}$ as a sort of "free energy", where $\alpha_{min}$ mimics the "temperature" specific to the system. If, for some reason, we allow λ to change, then we expect that the interaction coupling spontaneously adjusts itself to the minimum of $\Delta S_{\alpha_{min}}$. Similar considerations can be made looking at the graphs drawn for the function $\Delta C_{\alpha'}$ and its derivative with respect to λ (see Figure 5).

Figure 5: The deviation $\Delta C_{\alpha'}$ and its derivative $d\Delta C_{\alpha'}/d\lambda$ with respect to λ, computed for values −0.98 (red) ≤ α′ ≤ 0.22 (violet), in steps of 0.06. The thicker curve corresponds to $\alpha'_{min}$.

The minimum of $\Delta S_{\alpha_{min}}$ or, alternatively, of $\Delta C_{\alpha'}$ can be obtained algebraically. Such a minimum is at the value of the coupling constant $\lambda^{S_{vN}}_{min} \approx 0.485$ and $\lambda^{C}_{min} \approx 0.371$, respectively.

The authors of [1] studied numerically the von Neumann entropy and the correlation function for a Hydrogen molecule, using an old result by Herring and Flicker [8], going back to an older idea by Heitler and London [9], which consists in replacing the molecular binding with a position-dependent exchange coupling:

$$J(r) \approx 1.641\, r^{5/2}\, e^{-2r}\ \mathrm{Ry}, \tag{11}$$

where r is given in Bohr radii, see Figure 6.

Figure 6: The effective Hydrogen-Hydrogen atom interaction J(r) (Ry).

The maximum value taken by this function is at the point $r_{max} = 1.25$. Assuming B = 0.5 Ry, i.e. 1/2 of the fundamental level of the Hydrogen atom, the maximum value is $\lambda'_{max} = J(r_{max})/B \approx 0.470628 < \lambda^{S_{vN}}_{min}$, i.e. the value of the effective interaction is less than the location of the minimum of the deviation function $\Delta S_{\alpha_{min}}$. Then, the equilibrium balance between entanglement (as von Neumann entropy) and correlation energy predicts a length of the molecule equal to $r_{max}$ (see the first panel of Figure 7). On the other hand, if we consider the energy gap 2B = 3/4 Ry, i.e. the energy step to the first excited state, one obtains the new value $\lambda''_{max} \approx 0.628$, which goes beyond $\lambda_{min}$, even if it is still less than 1. Now, the deviation function $\Delta S_{\alpha_{min}}$ has two minima, as seen in the second panel of Figure 7, one of which is at $r''_- \approx 0.76$, the other one being at $r''_+ \approx 1.91$. These results should be compared with the experimental equilibrium length of the Hydrogen molecule, which is $r_{exp} \approx 2.0$.

We point out that, although the spin model described by the Hamiltonian (1) is essentially rough, we are led to answer positively the question about a physical meaning of the deviation function $\Delta S_{\alpha_{min}}$. Indeed, the results shown in Figure 7 encourage, on the one hand, an improvement of the computation of r in order to make the comparison with the experimental value $r_{exp}$ more accurate.

Figure 7: The von Neumann entropy deviation $\Delta S_{min}(r)$ for the 2-spin model for B = 0.5 Ry (left panel) and for B = 0.375 Ry (right panel, with the minima $r''_-$ and $r''_+$ marked), with the position-dependent interaction given by (11).

The first question to answer is whether this scheme also works for the Concurrence. A statement about it is not obvious, since the von Neumann entropy is a nonlinear function of the Concurrence in the 2-qubit case.
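For reference, for a pure two-qubit state this nonlinear relation has a standard closed form, due to Wootters [7]. The symbol $h$ (the binary entropy) is introduced here only for this note and does not appear in the original text. As a consistency check, inserting the Region I Concurrence gives back the Region I entropy.

```latex
S_{vN} = h\!\left(\frac{1+\sqrt{1-C^{2}}}{2}\right),
\qquad
h(x) = -x\log_2 x - (1-x)\log_2(1-x).
% Check in Region I: C_I = g\lambda/\sqrt{g^2\lambda^2+4}
% gives \sqrt{1-C_I^2} = 2/\sqrt{g^2\lambda^2+4}, so the argument of h is
% (\sqrt{g^2\lambda^2+4}+2)/(2\sqrt{g^2\lambda^2+4}).
```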
However, from Figure 8 one can see that the minimized deviation of the Concurrence has one minimum for relatively large intensity of the magnetic field (say B ≥ 0.6 Ry), while for weak fields two minima appear, corresponding to the situation depicted nearby.

Figure 8: Two contour plots of the minimized deviation of the Concurrence as a function of the magnetic field B (Ry) and of the internuclear distance r (Bohr), as given by (11). The range of values divided by the contour lines is [−0.038, 0.04] for the left panel and [−0.03705, −0.03000] for the right one, which approximately corresponds to the black area in the left panel.

For the same values considered above, the function $\Delta C_{\alpha'}(r)$ has two minima at r = 0.79 and r = 1.88 for B = 0.5 Ry, while for B = 0.375 Ry they are located at r = 0.60 and r = 2.25. So one sees that the resulting equilibrium configurations are not very close to the experimental one. The equilibrium configuration closest to the experimental one is the minimum occurring at r = 1.88 (B = 1 Ry) for the function $\Delta C_{\alpha'}$. One sees that only one of the resulting equilibrium configurations is roughly close to the experimental one.
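The distances quoted in this section can be reproduced approximately from the exchange coupling (11) alone, by asking where J(r)/B equals the coupling at which the deviation function has its minimum. The sketch below is illustrative and rests on assumptions: the exponent 5/2 in J(r), the rounded values 0.485 and 0.371 taken from the text, and hypothetical helper names.

```python
import math

R_MAX = 1.25  # stationary point of r**2.5 * exp(-2 r): 2.5 / r = 2


def exchange(r):
    """Exchange coupling of Eq. (11) in Ry, r in Bohr radii (assumed exponent 5/2)."""
    return 1.641 * r**2.5 * math.exp(-2.0 * r)


def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def radii(lam_star, b):
    """Distances where J(r)/B equals lam_star, one on each side of R_MAX."""
    def f(r):
        return exchange(r) / b - lam_star
    if f(R_MAX) < 0:
        return ()  # the coupling never reaches lam_star for this B
    return bisect(f, 0.05, R_MAX), bisect(f, R_MAX, 6.0)


print(exchange(R_MAX) / 0.5)     # lambda'_max for B = 0.5 Ry
print(exchange(R_MAX) / 0.375)   # lambda''_max for B = 0.375 Ry
print(radii(0.485, 0.5))         # entropy, B = 0.5 Ry: no crossing
print(radii(0.485, 0.375))       # entropy, B = 0.375 Ry
print(radii(0.371, 0.5))         # Concurrence, B = 0.5 Ry
print(radii(0.371, 0.375))       # Concurrence, B = 0.375 Ry
```

Because the inputs 0.485 and 0.371 are rounded, the roots agree with the quoted distances only to about two digits.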
In other words, to conclude, monitoring B numerically, the equilibrium configuration closest to the experimental one is the minimum occurring at r = 1.88 for B = 1 Ry and at r = 2.25 for B = 0.375 Ry for the function $\Delta C_{\alpha'}$.

3 A quantum chemical framework to compare entanglement and correlation energy

In this Section we report the results produced in [1], where the electron entanglement in the Hydrogen molecule, calculated by the von Neumann entropy of the reduced density matrix $\rho_1$, is obtained starting from the excitation coefficients of the wave function expanded by a configuration interaction method:

$$S(\rho_1^{CISD}) = -\mathrm{Tr}\left(\rho_1^{CISD}\,\log_2 \rho_1^{CISD}\right), \tag{12}$$

which is expressed through the combinations $|c_1^{2i+1}|^2 + |c_{1,2}^{2i+1,2i+2}|^2$ and $|c_0|^2 + |c_2^{2i+2}|^2$ of the expansion coefficients, where $c_1$ is the coefficient for a single excitation and $c_{1,2}$ is the coefficient for a double excitation (more details are given in Appendix A of [10]).

In this framework, the entanglement (S) and the correlation energy ($E_{corr}$), as functions of the nucleus-nucleus separation, are those shown in Figure 9.

Figure 9: Comparison between the entanglement $S(\rho_1)$, calculated by the von Neumann entropy of the reduced density matrix, and the electron correlation energy in the Hydrogen molecule, as functions of R (Å).

Using the results given by this model, we want to discuss and suggest some answers to the questions i) and ii) presented in Remark 1. Even though two different scales are used to represent the correlation energy and the entanglement, in Figure 9 we can see that the entanglement has a small value in the united atom limit, then grows at small distances until it reaches a maximum value, and then decreases until it vanishes in the separated atom limit; this is exactly the behavior of the correlation curve.

In order to compare the entropy S with the electron correlation energy $E_{corr}$, we rescale S with the parameter $\alpha_{min}$ calculated with the same procedure illustrated in Eq. (8) and Eq.
(9) replacing the integration variable λ with R; in this way we extract

$$\alpha_{min} = \frac{\int E_{corr}\, S_{vN}\, dR}{\int S_{vN}^2\, dR} \approx 0.009. \tag{13}$$

The corresponding $\Delta S_{\alpha_{min}} = E_{corr} - \alpha S$ allows us to answer question ii); in fact, as shown in Figure 10, the vanishing point of $\Delta S_{\alpha_{min}}$ is, in accordance with the two-spin Ising model, near R ≈ 2 Å, which corresponds to the equilibrium configuration of the Hydrogen molecule.

Figure 10: $\Delta S_{\alpha_{min}}$ for the $H_2$ molecule as a function of the nucleus-nucleus distance.

4 Differences between the Configuration Interaction approach and the two-spin Ising model

The model proposed in Sec. 1 provides us with a measure of entanglement: indeed, Eq. (3) describes the von Neumann entropy as a function of the coupling constant λ, for small λ. By using Eq. (7), we can express λ in terms of the correlation energy and, substituting it in Eq. (3), we obtain $S_{vN}$ in terms of $E_{corr}$:

$$S_{vN} = -\frac{E_{corr}\,\mathrm{Log}\dfrac{E_{corr}}{2(E_{corr}+2)} + (E_{corr}+4)\,\mathrm{Log}\dfrac{E_{corr}+4}{2(E_{corr}+2)}}{(E_{corr}+2)\,\mathrm{Log}\,4}. \tag{14}$$

In order to calculate the coefficient of proportionality between $S_{vN}$ and $E_{corr}$ we make an expansion of $S_{vN}$ for $E_{corr} \to 0$ (or equivalently for λ → 0) at the first order, obtaining a straight line characterized by an angular coefficient given by mSvN (Ecorr) = ( )(1 + 1 ). Since this behavior fails to represent the logarithmic singularity of $S_{vN}$ at the origin, we make an expansion of Eq. (14) preserving the logarithmic deviation, and we obtain an expression of the form

$$S_{vN} = A\,E_{corr} + B\,E_{corr}\,\mathrm{Log}(E_{corr}), \tag{15}$$

where A = 1/2 and B = −1/(4 Log 2).

Figure 11: A comparison between the behavior of Eq. (14), its linear approximation and the logarithmic one ($AE + BE\,\mathrm{Log}E$), as functions of $E_{corr}$, for the Ising model.

In order to compare the behavior of $S_{vN}$ in Eq.
(14), we have organized the numerical data, calculated with the method proposed in [1], by associating each value of $E_{corr}$ with its respective value of $S_{vN}$, obtaining the plot in Figure 12.

Figure 12: The correspondence between $E_{corr}$ and $S_{vN}$ calculated by the numerical procedure suggested by [1].

Of particular significance is the fact that, in the range where S is monotonically increasing, the correlation energy has its maximum; consequently S does not appear to be a single-valued function of $E_{corr}$. Moreover, it is important to note that $E_{corr}$ begins to decrease for R > 1 Å, the region where the states become mixed, i.e., $\mathrm{Tr}\,\rho \neq \mathrm{Tr}\,\rho^2$, as depicted in Figure 13.

Figure 13: The increase of the degree of mixing in the two-electron state as a function of R (Å): in black we depict the trace of ρ, in red the trace of $\rho^2$.

Probably for this reason the procedure adopted in [1] seems not to be correct: the density matrix, in fact, is calculated starting from the excitation coefficients of a wave function obtained by expanding a pure two-electron state with the Configuration Interaction Single Double method.

However, even if we consider only the first branch of the plot in Figure 12, i.e., the numerical values of $S_{vN}$ corresponding to increasing values of $E_{corr}$, and we fit the values around $E_{corr} \to 0$ with a function $F = A\,E_{corr} + B\,E_{corr}\,\mathrm{Log}(E_{corr})$, we obtain numerical values of the coefficients different from the ones used in Eq. (15). This result is shown in Figure 14.

Figure 14: A fit of $S_{vN}$ as a function of $E_{corr}$, around the origin, with a function of the form $F = A\,E_{corr} + B\,E_{corr}\,\mathrm{Log}(E_{corr})$, whose coefficients take the numerical values reported in the figure (A = 17.1, B = 3.3).

In particular, the signs of the coefficient B in the two models are opposite, and this implies the opposite concavity of the curves. This fact clearly demonstrates an unsatisfactory agreement between the Ising model and the one proposed in [1].
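Eq. (14) of this section can be cross-checked against the entropy written directly in terms of λ. The sketch below assumes g = 1, natural logarithms for Log in Eq. (14), and the relation $E_{corr} = \sqrt{4+\lambda^2} - 2$ of Eq. (7); the function names are chosen here for illustration only.

```python
import numpy as np


def s_from_ecorr(e):
    """Eq. (14): von Neumann entropy as a function of E_corr (g = 1)."""
    num = (e * np.log(e / (2.0 * (e + 2.0)))
           + (e + 4.0) * np.log((e + 4.0) / (2.0 * (e + 2.0))))
    return -num / ((e + 2.0) * np.log(4.0))


def s_from_lambda(lam):
    """Binary entropy of the Region I state, written in terms of lambda."""
    s = np.sqrt(4.0 + lam**2)
    q = (s - 2.0) / (2.0 * s)        # smaller eigenvalue of rho_1
    return -(q * np.log2(q) + (1.0 - q) * np.log2(1.0 - q))


for lam in (0.1, 0.5, 1.0):
    e = np.sqrt(4.0 + lam**2) - 2.0
    print(lam, e, s_from_ecorr(e), s_from_lambda(lam))
```

The two columns of entropies should coincide, since Eq. (14) is only a change of variable from λ to $E_{corr}$.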
5 Concluding remarks

We have explored the role of entanglement in the model of two qubits describing the Hydrogen molecule (1), considered as a bipartite system. In our discussion we have limited ourselves to the ferromagnetic case governed by the interaction coupling parameter J > 0.

The concept of entanglement gives a physical meaning to the electron correlation energy in structures of interacting electrons. The entanglement can be measured by using the von Neumann entropy or, alternatively, the notion of Concurrence [7]. To compute the entanglement it is convenient to consider two Regions, say I and II, which provide two different reduced density matrices. The entropy turns out to be an increasing function of the coupling constant λ in Region I, while the state under consideration is maximally entangled in Region II independently of the anisotropy parameter g.

An interesting result is that for large coupling constants the entropy approaches 1, meaning that all levels are visited with equal probability by the considered spin. For weak interactions, at the boundary point $\lambda_b = 2/\sqrt{1-g^2}$ the von Neumann entropy has a discontinuity, indicating a crossing of the lowest eigenvalues and, in a more general context, a quantum phase transition [5].

In Sec. 2 a comparison between the entanglement and the correlation energy is performed. To quantify the entanglement we resort to the reduced density matrix. The entanglement can also be measured by exploiting the concept of Concurrence.

The entanglement measure is always bounded, while the correlation energy, $E_{corr} = |E_0| - 2 = \sqrt{4+\lambda^2} - 2$, is a divergent function of λ. This fact tells us that it makes no sense to look for simple relations valid on the whole λ-axis. Thus, by limiting ourselves to weak couplings, we have minimized the mean square deviation given by Eq. (8). This procedure leads to the value $\alpha_{min} \approx -0.691217$ for the minimizing parameter (see Eq. (9)).

Sec. 1 contains a comparison between the von Neumann entropy and the Concurrence.
Such a comparison is illustrated in Figure 1, for the two-spin system as a function of the coupling λ for g = 1. Some important points are commented on in Remark 1 and Remark 2. In Figure 4 the deviation $\Delta S_\alpha$ and its derivative with respect to λ are computed, and $\alpha_{min}$ is evaluated, for α ranging in the interval −1.29 ≤ α ≤ −0.091.

In Figure 5 the minimized Concurrence deviation ∆C for the four eigenstates of the 2-spin model is shown. We point out the existence of a perfect symmetry among the Concurrence deviations for pairs of eigenstates of opposite eigenvalues. Formula (11), due to Heitler and London [9], is reported, where the position-dependent exchange coupling J(r) is expressed in terms of the length r of the nucleus-nucleus separation in the Hydrogen molecule.

To conclude, the magnetic field B has been monitored, and the equilibrium configuration closest to the experimental one, r ≈ 2.00, is the minimum occurring at r = 1.88 for B = 1 Ry and at r = 2.25 for B = 0.375 Ry for the function $\Delta C_{\alpha'}$.

We observe also that in the intermediate range of values, i.e., for 0.6 ≤ λ ≤ 1, the function $S_{vN}$ and the correlation energy are proportional to within 10%.

However, when we organized the pairs of points ($E_{corr}$, $S_{vN}$) calculated by following the procedure described in [1], it became clear that the von Neumann entropy cannot be considered a function of the correlation energy. The principal cause is that the function $E_{corr}$ presents a maximum in the region where $S_{vN}$ is monotonically increasing.

The reversal in the behavior of the correlation energy coincides with an increase of the degree of mixing of the two-electron state. The function $E_{corr}$, in terms of the nucleus-nucleus distance R, increases as long as the state is pure; on the contrary, when $\mathrm{Tr}(\rho^2)$ starts to differ from $\mathrm{Tr}(\rho)$, the function $E_{corr}$ decreases.
This fact suggests that the numerical model based on the calculation of $S_{vN}$ starting from the excitation coefficients $c_i$ is not completely correct, because the density matrix is obtained as a product of two-electron pure states. However, even if we consider only one branch of the plot in Figure 12, the function obtained from the two-spin Ising model, i.e., Eq. (14), is unsuitable for fitting these numerical data.

On the basis of our results, essentially grounded on numerical considerations, in the near future we would like to explore more complicated molecular systems, such as ethylene or other hydrocarbons, and compare these studies with the results obtained for the Hydrogen molecule.

Acknowledgments

The authors acknowledge the Italian Ministry of Scientific Researches (MIUR) for partial support of the present work under the project SINTESI 2004/06 and the INFN for partial support under the project Iniziativa Specifica LE41.

References

[1] Z. Huang, S. Kais, Chem. Phys. Lett. 413, 1 (2005).
[2] M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information, Cambridge Univ. Press, Cambridge, 2000.
[3] D. M. Collin, Z. Naturforsch. A 48, 68 (1993).
[4] P. Rungta, C. M. Caves, Phys. Rev. A 67, 012307 (2003).
[5] S. Sachdev, Quantum Phase Transition, Cambridge University Press, 2001.
[6] O. Osenda, Z. Huang, S. Kais, Phys. Rev. A 67, 062321 (2003).
[7] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[8] C. Herring, M. Flicker, Phys. Rev. A 134, 362 (1964).
[9] W. Heitler, F. London, Z. Physik 44, 455 (1927).
[10] T. Maiolo, F. Della Sala, L. Martina, G. Soliani, arXiv: quant-ph/0610238 (2006).
http://arxiv.org/abs/quant--ph/0610238 Introduction and the model A comparison between the entanglement and the correlation energy A quantum chemical framework to compare entanglement and correlation energy Differences between the Configuration Interaction approach and the two–spin Ising model Concluding remarks ABSTRACT In this paper we investigate some entanglement properties of the Hydrogen molecule, considered as a model of two interacting spin-1/2 particles (qubits). The entanglement related to the $H_{2}$ molecule is evaluated using both the von Neumann entropy and the Concurrence, and it is compared with the corresponding quantities for the system of two interacting spins. Many aspects of these functions are examined, employing in part analytical and, essentially, numerical techniques. We have compared them with analogous results obtained by Huang and Kais a few years ago. In this respect, some possibly controversial situations are presented and discussed. <|endoftext|><|startoftext|> Introduction Fractional charges on frustrated lattices Effective Hamiltonians Height representation, conserved quantities, and gauge symmetries Ground states and lowest excitations in the undoped case Static and dynamical properties of the doped system References ABSTRACT Systems of strongly correlated fermions on certain geometrically frustrated lattices at particular filling factors support excitations with fractional charges $\pm e/2$. We calculate quantum mechanical ground states, low--lying excitations and spectral functions of finite lattices by means of numerical diagonalization. The ground state of the most thoroughly studied case, the criss-crossed checkerboard lattice, is degenerate and shows long--range order. Static fractional charges are confined by a weak linear force, most probably leading to bound states of large spatial extent. Consequently, the quasi-particle weight is reduced, which reflects the internal dynamics of the fractionally charged excitations.
By using an additional parameter, we fine--tune the system to a special point at which fractional charges are manifestly deconfined--the so--called Rokhsar--Kivelson point. For a deeper understanding of the low--energy physics of these models and for numerical advantages, several conserved quantum numbers are identified. <|endoftext|><|startoftext|> BABAR-PUB-07/009 SLAC-PUB-12430 hep-ex/0704.0522 Measurement of Decay Amplitudes of B → (cc)K∗ with an Angular Analysis, for (cc) = J/ψ , ψ(2S) and χ B. Aubert,1 M. Bona,1 D. Boutigny,1 Y. Karyotakis,1 J. P. Lees,1 V. Poireau,1 X. Prudent,1 V. Tisserand,1 A. Zghiche,1 J. Garra Tico,2 E. Grauges,2 L. Lopez,3 A. Palano,3 G. Eigen,4 I. Ofte,4 B. Stugu,4 L. Sun,4 G. S. Abrams,5 M. Battaglia,5 D. N. Brown,5 J. Button-Shafer,5 R. N. Cahn,5 Y. Groysman,5 R. G. Jacobsen,5 J. A. Kadyk,5 L. T. Kerth,5 Yu. G. Kolomensky,5 G. Kukartsev,5 D. Lopes Pegna,5 G. Lynch,5 L. M. Mir,5 T. J. Orimoto,5 M. Pripstein,5 N. A. Roe,5 M. T. Ronan,5, ∗ K. Tackmann,5 W. A. Wenzel,5 P. del Amo Sanchez,6 C. M. Hawkes,6 A. T. Watson,6 T. Held,7 H. Koch,7 B. Lewandowski,7 M. Pelizaeus,7 T. Schroeder,7 M. Steinke,7 W. N. Cottingham,8 D. Walker,8 D. J. Asgeirsson,9 T. Cuhadar-Donszelmann,9 B. G. Fulsom,9 C. Hearty,9 N. S. Knecht,9 T. S. Mattison,9 J. A. McKenna,9 A. Khan,10 M. Saleem,10 L. Teodorescu,10 V. E. Blinov,11 A. D. Bukin,11 V. P. Druzhinin,11 V. B. Golubev,11 A. P. Onuchin,11 S. I. Serednyakov,11 Yu. I. Skovpen,11 E. P. Solodov,11 K. Yu Todyshev,11 M. Bondioli,12 S. Curry,12 I. Eschrich,12 D. Kirkby,12 A. J. Lankford,12 P. Lund,12 M. Mandelkern,12 E. C. Martin,12 D. P. Stoker,12 S. Abachi,13 C. Buchanan,13 S. D. Foulkes,14 J. W. Gary,14 F. Liu,14 O. Long,14 B. C. Shen,14 L. Zhang,14 H. P. Paar,15 S. Rahatlou,15 V. Sharma,15 J. W. Berryhill,16 C. Campagnari,16 A. Cunha,16 B. Dahmes,16 T. M. Hong,16 D. Kovalskyi,16 J. D. Richman,16 T. W. Beck,17 A. M. Eisner,17 C. J. Flacco,17 C. A. Heusch,17 J. Kroseberg,17 W. S. Lockman,17 T. Schalk,17 B. A. 
Schumm,17 A. Seiden,17 D. C. Williams,17 M. G. Wilson,17 L. O. Winstrom,17 E. Chen,18 C. H. Cheng,18 A. Dvoretskii,18 F. Fang,18 D. G. Hitlin,18 I. Narsky,18 T. Piatenko,18 F. C. Porter,18 G. Mancinelli,19 B. T. Meadows,19 K. Mishra,19 M. D. Sokoloff,19 F. Blanc,20 P. C. Bloom,20 S. Chen,20 W. T. Ford,20 J. F. Hirschauer,20 A. Kreisel,20 M. Nagel,20 U. Nauenberg,20 A. Olivas,20 J. G. Smith,20 K. A. Ulmer,20 S. R. Wagner,20 J. Zhang,20 A. M. Gabareen,21 A. Soffer,21 W. H. Toki,21 R. J. Wilson,21 F. Winklmeier,21 Q. Zeng,21 D. D. Altenburg,22 E. Feltresi,22 A. Hauke,22 H. Jasper,22 J. Merkel,22 A. Petzold,22 B. Spaan,22 K. Wacker,22 T. Brandt,23 V. Klose,23 H. M. Lacker,23 W. F. Mader,23 R. Nogowski,23 J. Schubert,23 K. R. Schubert,23 R. Schwierz,23 J. E. Sundermann,23 A. Volk,23 D. Bernard,24 G. R. Bonneaud,24 E. Latour,24 V. Lombardo,24 Ch. Thiebaux,24 M. Verderi,24 P. J. Clark,25 W. Gradl,25 F. Muheim,25 S. Playfer,25 A. I. Robertson,25 Y. Xie,25 M. Andreotti,26 D. Bettoni,26 C. Bozzi,26 R. Calabrese,26 A. Cecchi,26 G. Cibinetto,26 P. Franchini,26 E. Luppi,26 M. Negrini,26 A. Petrella,26 L. Piemontese,26 E. Prencipe,26 V. Santoro,26 F. Anulli,27 R. Baldini-Ferroli,27 A. Calcaterra,27 R. de Sangro,27 G. Finocchiaro,27 S. Pacetti,27 P. Patteri,27 I. M. Peruzzi,27, † M. Piccolo,27 M. Rama,27 A. Zallo,27 A. Buzzo,28 R. Contri,28 M. Lo Vetere,28 M. M. Macri,28 M. R. Monge,28 S. Passaggio,28 C. Patrignani,28 E. Robutti,28 A. Santroni,28 S. Tosi,28 K. S. Chaisanguanthum,29 M. Morii,29 J. Wu,29 R. S. Dubitzky,30 J. Marks,30 S. Schenk,30 U. Uwer,30 D. J. Bard,31 P. D. Dauncey,31 R. L. Flack,31 J. A. Nash,31 M. B. Nikolich,31 W. Panduro Vazquez,31 P. K. Behera,32 X. Chai,32 M. J. Charles,32 U. Mallik,32 N. T. Meyer,32 V. Ziegler,32 J. Cochran,33 H. B. Crawley,33 L. Dong,33 V. Eyges,33 W. T. Meyer,33 S. Prell,33 E. I. Rosenberg,33 A. E. Rubin,33 A. V. Gritsan,34 Z. J. Guo,34 C. K. Lae,34 A. G. Denig,35 M. Fritsch,35 G. Schott,35 N. Arnaud,36 J. Béquilleux,36 M. Davier,36 G. 
Grosdidier,36 A. Höcker,36 V. Lepeltier,36 F. Le Diberder,36 A. M. Lutz,36 S. Pruvot,36 S. Rodier,36 P. Roudeau,36 M. H. Schune,36 J. Serrano,36 V. Sordini,36 A. Stocchi,36 W. F. Wang,36 G. Wormser,36 D. J. Lange,37 D. M. Wright,37 C. A. Chavez,38 I. J. Forster,38 J. R. Fry,38 E. Gabathuler,38 R. Gamet,38 D. E. Hutchcroft,38 D. J. Payne,38 K. C. Schofield,38 C. Touramanis,38 A. J. Bevan,39 K. A. George,39 F. Di Lodovico,39 W. Menges,39 R. Sacco,39 G. Cowan,40 H. U. Flaecher,40 D. A. Hopkins,40 P. S. Jackson,40 T. R. McMahon,40 F. Salvatore,40 A. C. Wren,40 D. N. Brown,41 C. L. Davis,41 J. Allison,42 N. R. Barlow,42 R. J. Barlow,42 Y. M. Chia,42 C. L. Edgar,42 G. D. Lafferty,42 T. J. West,42 J. I. Yi,42 J. Anderson,43 C. Chen,43 A. Jawahery,43 D. A. Roberts,43 G. Simi,43 J. M. Tuggle,43 G. Blaylock,44 C. Dallapiccola,44 S. S. Hertzbach,44 X. Li,44 T. B. Moore,44 E. Salvati,44 S. Saremi,44 R. Cowan,45 P. H. Fisher,45 G. Sciolla,45 S. J. Sekula,45 M. Spitznagel,45 F. Taylor,45 R. K. Yamamoto,45 S. E. Mclachlin,46 P. M. Patel,46 S. H. Robertson,46 A. Lazzaro,47 F. Palombo,47 J. M. Bauer,48 L. Cremaldi,48 V. Eschenburg,48 R. Godang,48 R. Kroeger,48 D. A. Sanders,48 D. J. Summers,48 H. W. Zhao,48 S. Brunet,49 D. Côté,49 M. Simard,49 P. Taras,49 http://arxiv.org/abs/0704.0522v2 F. B. Viaud,49 H. Nicholson,50 G. De Nardo,51 F. Fabozzi,51, ‡ L. Lista,51 D. Monorchio,51 C. Sciacca,51 M. A. Baak,52 G. Raven,52 H. L. Snoek,52 C. P. Jessop,53 J. M. LoSecco,53 G. Benelli,54 L. A. Corwin,54 K. K. Gan,54 K. Honscheid,54 D. Hufnagel,54 H. Kagan,54 R. Kass,54 J. P. Morris,54 A. M. Rahimi,54 J. J. Regensburger,54 R. Ter-Antonyan,54 Q. K. Wong,54 N. L. Blount,55 J. Brau,55 R. Frey,55 O. Igonkina,55 J. A. Kolb,55 M. Lu,55 R. Rahmat,55 N. B. Sinev,55 D. Strom,55 J. Strube,55 E. Torrence,55 N. Gagliardi,56 A. Gaz,56 M. Margoni,56 M. Morandin,56 A. Pompili,56 M. Posocco,56 M. Rotondo,56 F. Simonetto,56 R. Stroili,56 C. Voci,56 E. Ben-Haim,57 H. Briand,57 J. Chauveau,57 P. David,57 L. 
Del Buono,57 Ch. de la Vaissière,57 O. Hamon,57 B. L. Hartfiel,57 Ph. Leruste,57 J. Malclès,57 J. Ocariz,57 A. Perez,57 L. Gladney,58 M. Biasini,59 R. Covarelli,59 E. Manoni,59 C. Angelini,60 G. Batignani,60 S. Bettarini,60 G. Calderini,60 M. Carpinelli,60 R. Cenci,60 A. Cervelli,60 F. Forti,60 M. A. Giorgi,60 A. Lusiani,60 G. Marchiori,60 M. A. Mazur,60 M. Morganti,60 N. Neri,60 E. Paoloni,60 G. Rizzo,60 J. J. Walsh,60 M. Haire,61 J. Biesiada,62 P. Elmer,62 Y. P. Lau,62 C. Lu,62 J. Olsen,62 A. J. S. Smith,62 A. V. Telnov,62 E. Baracchini,63 F. Bellini,63 G. Cavoto,63 A. D’Orazio,63 D. del Re,63 E. Di Marco,63 R. Faccini,63 F. Ferrarotto,63 F. Ferroni,63 M. Gaspero,63 P. D. Jackson,63 L. Li Gioi,63 M. A. Mazzoni,63 S. Morganti,63 G. Piredda,63 F. Polci,63 F. Renga,63 C. Voena,63 M. Ebert,64 H. Schröder,64 R. Waldi,64 T. Adye,65 G. Castelli,65 B. Franek,65 E. O. Olaiya,65 S. Ricciardi,65 W. Roethel,65 F. F. Wilson,65 R. Aleksan,66 S. Emery,66 M. Escalier,66 A. Gaidot,66 S. F. Ganzhur,66 G. Hamel de Monchenault,66 W. Kozanecki,66 M. Legendre,66 G. Vasseur,66 Ch. Yèche,66 M. Zito,66 X. R. Chen,67 H. Liu,67 W. Park,67 M. V. Purohit,67 J. R. Wilson,67 M. T. Allen,68 D. Aston,68 R. Bartoldus,68 P. Bechtle,68 N. Berger,68 R. Claus,68 J. P. Coleman,68 M. R. Convery,68 J. C. Dingfelder,68 J. Dorfan,68 G. P. Dubois-Felsmann,68 D. Dujmic,68 W. Dunwoodie,68 R. C. Field,68 T. Glanzman,68 S. J. Gowdy,68 M. T. Graham,68 P. Grenier,68 C. Hast,68 T. Hryn’ova,68 W. R. Innes,68 M. H. Kelsey,68 H. Kim,68 P. Kim,68 D. W. G. S. Leith,68 S. Li,68 S. Luitz,68 V. Luth,68 H. L. Lynch,68 D. B. MacFarlane,68 H. Marsiske,68 R. Messner,68 D. R. Muller,68 C. P. O’Grady,68 A. Perazzo,68 M. Perl,68 T. Pulliam,68 B. N. Ratcliff,68 A. Roodman,68 A. A. Salnikov,68 R. H. Schindler,68 J. Schwiening,68 A. Snyder,68 J. Stelzer,68 D. Su,68 M. K. Sullivan,68 K. Suzuki,68 S. K. Swain,68 J. M. Thompson,68 J. Va’vra,68 N. van Bakel,68 A. P. Wagner,68 M. Weaver,68 W. J. Wisniewski,68 M. Wittgen,68 D. H. 
Wright,68 A. K. Yarritu,68 K. Yi,68 C. C. Young,68 P. R. Burchat,69 A. J. Edwards,69 S. A. Majewski,69 B. A. Petersen,69 L. Wilden,69 S. Ahmed,70 M. S. Alam,70 R. Bula,70 J. A. Ernst,70 V. Jain,70 B. Pan,70 M. A. Saeed,70 F. R. Wappler,70 S. B. Zain,70 W. Bugg,71 M. Krishnamurthy,71 S. M. Spanier,71 R. Eckmann,72 J. L. Ritchie,72 A. M. Ruland,72 C. J. Schilling,72 R. F. Schwitters,72 J. M. Izen,73 X. C. Lou,73 S. Ye,73 F. Bianchi,74 F. Gallo,74 D. Gamba,74 M. Pelliccioni,74 M. Bomben,75 L. Bosisio,75 C. Cartaro,75 F. Cossutti,75 G. Della Ricca,75 L. Lanceri,75 L. Vitale,75 V. Azzolini,76 N. Lopez-March,76 F. Martinez-Vidal,76 D. A. Milanes,76 A. Oyanguren,76 J. Albert,77 Sw. Banerjee,77 B. Bhuyan,77 K. Hamano,77 R. Kowalewski,77 I. M. Nugent,77 J. M. Roney,77 R. J. Sobie,77 J. J. Back,78 P. F. Harrison,78 T. E. Latham,78 G. B. Mohanty,78 M. Pappagallo,78, § H. R. Band,79 X. Chen,79 S. Dasu,79 K. T. Flood,79 J. J. Hollar,79 P. E. Kutter,79 Y. Pan,79 M. Pierini,79 R. Prepost,79 S. L. Wu,79 Z. Yu,79 and H. 
Neal80 (The BABAR Collaboration) 1Laboratoire de Physique des Particules, IN2P3/CNRS et Université de Savoie, F-74941 Annecy-Le-Vieux, France 2Universitat de Barcelona, Facultat de Fisica, Departament ECM, E-08028 Barcelona, Spain 3Università di Bari, Dipartimento di Fisica and INFN, I-70126 Bari, Italy 4University of Bergen, Institute of Physics, N-5007 Bergen, Norway 5Lawrence Berkeley National Laboratory and University of California, Berkeley, California 94720, USA 6University of Birmingham, Birmingham, B15 2TT, United Kingdom 7Ruhr Universität Bochum, Institut für Experimentalphysik 1, D-44780 Bochum, Germany 8University of Bristol, Bristol BS8 1TL, United Kingdom 9University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z1 10Brunel University, Uxbridge, Middlesex UB8 3PH, United Kingdom 11Budker Institute of Nuclear Physics, Novosibirsk 630090, Russia 12University of California at Irvine, Irvine, California 92697, USA 13University of California at Los Angeles, Los Angeles, California 90024, USA 14University of California at Riverside, Riverside, California 92521, USA 15University of California at San Diego, La Jolla, California 92093, USA 16University of California at Santa Barbara, Santa Barbara, California 93106, USA 17University of California at Santa Cruz, Institute for Particle Physics, Santa Cruz, California 95064, USA 18California Institute of Technology, Pasadena, California 91125, USA 19University of Cincinnati, Cincinnati, Ohio 45221, USA 20University of Colorado, Boulder, Colorado 80309, USA 21Colorado State University, Fort Collins, Colorado 80523, USA 22Universität Dortmund, Institut für Physik, D-44221 Dortmund, Germany 23Technische Universität Dresden, Institut für Kern- und Teilchenphysik, D-01062 Dresden, Germany 24Laboratoire Leprince-Ringuet, CNRS/IN2P3, Ecole Polytechnique, F-91128 Palaiseau, France 25University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom 26Università di Ferrara, Dipartimento di Fisica and INFN, I-44100 
Ferrara, Italy 27Laboratori Nazionali di Frascati dell’INFN, I-00044 Frascati, Italy 28Università di Genova, Dipartimento di Fisica and INFN, I-16146 Genova, Italy 29Harvard University, Cambridge, Massachusetts 02138, USA 30Universität Heidelberg, Physikalisches Institut, Philosophenweg 12, D-69120 Heidelberg, Germany 31Imperial College London, London, SW7 2AZ, United Kingdom 32University of Iowa, Iowa City, Iowa 52242, USA 33Iowa State University, Ames, Iowa 50011-3160, USA 34Johns Hopkins University, Baltimore, Maryland 21218, USA 35Universität Karlsruhe, Institut für Experimentelle Kernphysik, D-76021 Karlsruhe, Germany 36Laboratoire de l’Accélérateur Linéaire, IN2P3/CNRS et Université Paris-Sud 11, Centre Scientifique d’Orsay, B. P. 34, F-91898 ORSAY Cedex, France 37Lawrence Livermore National Laboratory, Livermore, California 94550, USA 38University of Liverpool, Liverpool L69 7ZE, United Kingdom 39Queen Mary, University of London, E1 4NS, United Kingdom 40University of London, Royal Holloway and Bedford New College, Egham, Surrey TW20 0EX, United Kingdom 41University of Louisville, Louisville, Kentucky 40292, USA 42University of Manchester, Manchester M13 9PL, United Kingdom 43University of Maryland, College Park, Maryland 20742, USA 44University of Massachusetts, Amherst, Massachusetts 01003, USA 45Massachusetts Institute of Technology, Laboratory for Nuclear Science, Cambridge, Massachusetts 02139, USA 46McGill University, Montréal, Québec, Canada H3A 2T8 47Università di Milano, Dipartimento di Fisica and INFN, I-20133 Milano, Italy 48University of Mississippi, University, Mississippi 38677, USA 49Université de Montréal, Physique des Particules, Montréal, Québec, Canada H3C 3J7 50Mount Holyoke College, South Hadley, Massachusetts 01075, USA 51Università di Napoli Federico II, Dipartimento di Scienze Fisiche and INFN, I-80126, Napoli, Italy 52NIKHEF, National Institute for Nuclear Physics and High Energy Physics, NL-1009 DB Amsterdam, The Netherlands 
53University of Notre Dame, Notre Dame, Indiana 46556, USA 54Ohio State University, Columbus, Ohio 43210, USA 55University of Oregon, Eugene, Oregon 97403, USA 56Università di Padova, Dipartimento di Fisica and INFN, I-35131 Padova, Italy 57Laboratoire de Physique Nucléaire et de Hautes Energies, IN2P3/CNRS, Université Pierre et Marie Curie-Paris6, Université Denis Diderot-Paris7, F-75252 Paris, France 58University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA 59Università di Perugia, Dipartimento di Fisica and INFN, I-06100 Perugia, Italy 60Università di Pisa, Dipartimento di Fisica, Scuola Normale Superiore and INFN, I-56127 Pisa, Italy 61Prairie View A&M University, Prairie View, Texas 77446, USA 62Princeton University, Princeton, New Jersey 08544, USA 63Università di Roma La Sapienza, Dipartimento di Fisica and INFN, I-00185 Roma, Italy 64Universität Rostock, D-18051 Rostock, Germany 65Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, OX11 0QX, United Kingdom 66DSM/Dapnia, CEA/Saclay, F-91191 Gif-sur-Yvette, France 67University of South Carolina, Columbia, South Carolina 29208, USA 68Stanford Linear Accelerator Center, Stanford, California 94309, USA 69Stanford University, Stanford, California 94305-4060, USA 70State University of New York, Albany, New York 12222, USA 71University of Tennessee, Knoxville, Tennessee 37996, USA 72University of Texas at Austin, Austin, Texas 78712, USA 73University of Texas at Dallas, Richardson, Texas 75083, USA 74Università di Torino, Dipartimento di Fisica Sperimentale and INFN, I-10125 Torino, Italy 75Università di Trieste, Dipartimento di Fisica and INFN, I-34127 Trieste, Italy 76IFIC, Universitat de Valencia-CSIC, E-46071 Valencia, Spain 77University of Victoria, Victoria, British Columbia, Canada V8W 3P6 78Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom 79University of Wisconsin, Madison, Wisconsin 53706, USA 80Yale University, New Haven, Connecticut 06511, USA (Dated: November 
4, 2018)

We perform the first three-dimensional measurement of the amplitudes of B → ψ(2S)K* and B → χc1K* decays and update our previous measurement for B → J/ψK*. We use a data sample collected with the BABAR detector at the PEP-II storage ring, corresponding to 232 million BB̄ pairs. The longitudinal polarization of decays involving a J^PC = 1^++ χc1 meson is found to be larger than that with a 1^−− J/ψ or ψ(2S) meson. No direct CP-violating charge asymmetry is observed.

PACS numbers: 13.25.Hw, 12.15.Hh, 11.30.Er

In the context of measuring the parameters of the Unitarity Triangle of the CKM matrix, B0 decays to the charmonium-containing final states (J/ψ, ψ(2S), χc1)K*, defined collectively here as B0 → (cc̄)K*, are of interest for the precise measurement of sin 2β, where β ≡ arg[−Vcd V*cb / Vtd V*tb], in a similar way as for B0 → J/ψK0. Furthermore, the J/ψK* channel allows the measurement of cos 2β [1]. For the modes considered in this paper, the final state consists of two spin-1 mesons, leading to three possible values of the orbital angular momentum L with different CP eigenvalues (L = 1 is odd, while L = 0, 2 are even). The different contributions must be taken into account in the measurement of sin 2β. The amplitude for longitudinal polarization of the two spin-1 mesons is A0. There are two amplitudes for polarizations of the mesons transverse to the decay axis, here expressed in the transversity basis [2]: A‖ for parallel polarization and A⊥ for perpendicular polarization. Only the relative amplitudes are measured, so that |A0|² + |A‖|² + |A⊥|² = 1. Previous measurements by the CLEO [3], CDF [4], BABAR [1] and Belle [5] collaborations for the B → J/ψK* channels are all compatible with each other, and with a CP-odd intensity fraction |A⊥|² close to 0.2.

Factorization predicts that the phases of the transversity decay amplitudes are the same. BABAR has observed [1, 6] a significant departure from this prediction.
Precise measurements of the branching fractions of B → (cc̄)K* decays are now available [7] to test the theoretical description of the non-factorizable contributions [8], but polarization measurements are also needed. In particular, measurements for ψ(2S) and χc1, compared to that of J/ψ, would discriminate the mass dependence from the quantum number dependence. CLEO has measured the longitudinal polarization of B → ψ(2S)K* decays to be |A0|² = 0.45 ± 0.11 ± 0.04 [9]. Belle has studied B → χc1K* decays and obtained |A0|² = 0.87 ± 0.09 ± 0.07 [10].

B → (cc̄)K(*) decays provide a clean environment for the measurement of the CKM angle β because one tree amplitude dominates the decay. Very small direct CP-violating charge asymmetries are expected in these decays, and no such signal has been found [7]. While more than one amplitude with different strong and weak phases is needed to create a charge asymmetry in a simple branching fraction measurement, London et al. have suggested [11] that an angular analysis of vector-vector decays can detect charge asymmetries even in the case of vanishing strong phase difference. Belle has looked for, and not found, such a signal [5].

In this paper we present the amplitude measurement of charged and neutral B → (cc̄)K* decays using a selection similar to that of Ref. [7], and a fitting method similar to that of Ref. [1]. We use the notation ψ for the 1^−− states J/ψ and ψ(2S). ψ (χc1) candidates are reconstructed in their decays to ℓ+ℓ− (J/ψγ), where ℓ represents an electron or a muon. Decays to the flavor eigenstates K*0 → K±π∓, K*± → K0S π± and K*± → K±π0 are used. The relative strong phases are known to have a two-fold ambiguity when measured in an angular analysis alone. In contrast to earlier publications [3, 4, 6], we use here the set of phases predicted in Ref. [12], with arguments based on the conservation of the s-quark helicity in the decay of the b quark.
We have confirmed this prediction experimentally through the study of the variation with Kπ invariant mass of the phase difference between the K*(892) amplitude and a non-resonant Kπ S-wave amplitude [1].

The data were collected with the BABAR detector at the PEP-II asymmetric e+e− storage ring, and correspond to an integrated luminosity of about 209 fb⁻¹ at a center-of-mass energy near the Υ(4S) mass. The BABAR detector is described in detail elsewhere [13]. Charged-particle tracking is provided by a five-layer silicon vertex tracker (SVT) and a 40-layer drift chamber (DCH). For charged-particle identification (PID), ionization energy loss in the DCH and SVT, and Cherenkov radiation detected in a ring-imaging device (DIRC) are used. Photons are identified by the electromagnetic calorimeter (EMC), which comprises 6580 thallium-doped CsI crystals. These systems are mounted inside a 1.5-T solenoidal superconducting magnet. Muons are identified in the instrumented flux return (IFR), composed of resistive plate chambers and layers of iron that return the magnetic flux of the solenoid. We use the GEANT4 [14] software to simulate interactions of particles traversing the detector, taking into account the varying accelerator and detector conditions.

J/ψ → e+e− (µ+µ−) candidates must have a mass between 2.95 and 3.14 (3.06 and 3.14) GeV/c². ψ(2S) candidates are required to have invariant masses 3.44 < m(e+e−) < 3.74 GeV/c² or 3.64 < m(µ+µ−) < 3.74 GeV/c². Electron candidates are combined with photon candidates in order to recover some of the energy lost through bremsstrahlung. J/ψ candidates and γ candidates with an energy larger than 150 MeV are combined to form χc1 candidates, which must satisfy 350 < m(ℓ+ℓ−γ) − m(ℓ+ℓ−) < 450 MeV/c². π0 → γγ candidates must satisfy 113 < m(γγ) < 153 MeV/c². The energy of each photon has to be greater than 50 MeV. K0S → π+π− candidates are required to satisfy 489 < m(π+π−) < 507 MeV/c².
In addition, the K0S flight distance from the ψ vertex must be larger than three times its uncertainty. K*0 and K*+ candidates are required to satisfy 796 < m(Kπ) < 996 MeV/c² and 792 < m(Kπ) < 992 MeV/c², respectively. In addition, due to the presence of a large background of low-energy non-genuine π0's, the cosine of the angle θK* between the K momentum and the B momentum in the K* rest frame has to be less than 0.8 for K* → K±π0. In events where two B's reconstruct to modes with the same cc̄ and K candidate, one with a π± and the other with a π0, the B candidate with a π0 is discarded due to the high background induced by fake π0's.

B candidates, reconstructed by combining cc̄ and K* candidates, are characterized by two kinematic variables: the difference between the reconstructed energy of the B candidate and the beam energy in the center-of-mass frame, ∆E = E*B − √s/2, and the beam-energy substituted mass, mES ≡ √[(s/2 + p0 · pB)²/E0² − pB²], where the subscripts 0 and B correspond to the Υ(4S) and the B candidate in the laboratory frame. For a correctly reconstructed B meson, ∆E is expected to peak near zero and mES near the B-meson mass, 5.279 GeV/c². The analysis is performed in a region of the mES vs ∆E plane defined by 5.2 < mES < 5.3 GeV/c² and −120 < ∆E < 120 MeV. The signal region is defined as mES > 5.27 GeV/c² and |∆E| smaller than 40 (30) MeV for channels with (without) a π0. For events that have multiple candidates, the candidate having the smallest |∆E| is chosen. mES distributions are available in Ref. [18].

FIG. 1: Definition of the transversity angles. Details are given in the text.

The B decay amplitudes are measured from the differential decay distribution, expressed in the transversity basis [1, 6] (Fig. 1), with conventions detailed in Ref. [15]. θK* is the helicity angle of the K* decay. It is defined in the rest frame of the K* meson, and is the angle between the kaon and the opposite direction of the B meson in this frame.
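As an illustration of the helicity-angle definition just given, the following minimal Python sketch boosts the kaon and B four-momenta into the K* rest frame and takes the cosine of the angle between the kaon and the direction opposite to the B. It is not the collaboration's reconstruction code: the function names (`boost`, `cos_theta_kstar`, `example_event`), the toy event with the B at rest, the chosen K* momentum and the rounded particle masses are all assumptions made for the example.

```python
import numpy as np


def boost(p4, beta):
    """Boost the four-vector p4 = (E, px, py, pz) into a frame moving with velocity beta (c = 1)."""
    p4 = np.asarray(p4, dtype=float)
    beta = np.asarray(beta, dtype=float)
    beta_sq = float(beta @ beta)
    if beta_sq == 0.0:
        return p4.copy()
    gamma = 1.0 / np.sqrt(1.0 - beta_sq)
    beta_dot_p = float(beta @ p4[1:])
    energy = gamma * (p4[0] - beta_dot_p)
    momentum = p4[1:] + ((gamma - 1.0) * beta_dot_p / beta_sq - gamma * p4[0]) * beta
    return np.concatenate(([energy], momentum))


def cos_theta_kstar(p4_kaon, p4_kstar, p4_b):
    """Cosine of the angle between the kaon and the direction opposite to the B,
    both evaluated in the K* rest frame."""
    p4_kstar = np.asarray(p4_kstar, dtype=float)
    beta_kstar = p4_kstar[1:] / p4_kstar[0]          # velocity of the K* in the input frame
    kaon = boost(p4_kaon, beta_kstar)[1:]            # kaon three-momentum in the K* frame
    b_meson = boost(p4_b, beta_kstar)[1:]            # B three-momentum in the K* frame
    return float(kaon @ (-b_meson) / (np.linalg.norm(kaon) * np.linalg.norm(b_meson)))


def example_event(cos_theta, p_kstar=1.5):
    """Hypothetical toy event: B at rest, K* flying along +z, kaon emitted at the
    requested helicity angle. Masses in GeV/c^2 are rounded, illustrative values."""
    m_b, m_kstar, m_k, m_pi = 5.279, 0.892, 0.494, 0.140
    # Kaon momentum in the K* rest frame for the two-body decay K* -> K pi.
    q = np.sqrt((m_kstar**2 - (m_k + m_pi)**2) * (m_kstar**2 - (m_k - m_pi)**2)) / (2.0 * m_kstar)
    sin_theta = np.sqrt(1.0 - cos_theta**2)
    kaon_rest = np.array([np.hypot(q, m_k), q * sin_theta, 0.0, q * cos_theta])
    p4_kstar = np.array([np.hypot(p_kstar, m_kstar), 0.0, 0.0, p_kstar])
    # Inverse boost: take the kaon from the K* rest frame to the frame where the B is at rest.
    kaon_lab = boost(kaon_rest, -p4_kstar[1:] / p4_kstar[0])
    p4_b = np.array([m_b, 0.0, 0.0, 0.0])
    return kaon_lab, p4_kstar, p4_b


if __name__ == "__main__":
    print(cos_theta_kstar(*example_event(0.3)))
```

In the toy event the direction opposite to the B in the K* frame coincides with the K* flight direction, so the function returns the cosine that was used to generate the kaon, up to floating-point rounding.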
θtr and φtr are defined in the ψ (χc1) rest frame and are the polar and azimuthal angles of the positive lepton (J/ψ daughter of χc1), with respect to the axes defined by:

• xtr: opposite direction of the B meson;
• ytr: perpendicular to xtr, in the (xtr, pK*) plane, with a direction such that pK* · ytr > 0;
• ztr: to complete the frame, i.e., ztr = xtr × ytr.

In terms of the transversity angular variables ω ≡ (cos θK*, cos θtr, φtr), the time-integrated differential decay rate for the decay of the B meson is

    g(\omega; A) \equiv \frac{1}{\Gamma} \frac{d^3\Gamma}{d\cos\theta_{K^*}\, d\cos\theta_{tr}\, d\phi_{tr}} = \sum_{k=1}^{6} \mathcal{A}_k f_k(\omega),    (1)

where the amplitude coefficients Ak (\mathcal{A}_k) and the angular functions fk(ω), k = 1 · · · 6, are listed in Table I. The ψ decays to two spin-1/2 particles, while the χc1 decays to two vector particles. The angular dependencies are therefore different [15]. The symbol A ≡ (A0, A‖, A⊥) denotes the transversity amplitudes for the decay of the B meson, and Ā those for the B̄ meson decay. In the absence of direct CP violation, we can choose a phase convention in which these amplitudes are related by Ā0 = +A0, Ā‖ = +A‖, Ā⊥ = −A⊥, so that A⊥ is CP-odd and A0 and A‖ are CP-even. The phases δj of the amplitudes, where j = 0, ‖, ⊥, are defined by Aj = |Aj| e^{iδj}. Phases are defined relative to δ0 = 0.

We perform an unbinned likelihood fit of the three-dimensional angle probability density function (PDF). The acceptance of the detector and the efficiency of the event reconstruction may vary as a function of the transversity angles, in particular as the angle θK* is strongly correlated with the momentum of the final kaon and pion. We use the acceptance correction method developed in Ref. [1]. The PDF of the observed events, gobs, is

    g_{obs}(\omega; A) = g(\omega; A)\, \frac{\varepsilon(\omega)}{\langle\varepsilon\rangle(A)},    (2)

where ε(ω) is the angle-dependent acceptance and

    \langle\varepsilon\rangle(A) \equiv \int g(\omega; A)\, \varepsilon(\omega)\, d\omega    (3)

is the average acceptance.
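To make the structure of Eq. (1) concrete, here is a minimal Python sketch of the normalized angular PDF for the ψ modes. It assumes the standard transversity angular functions fk(ω) for a vector meson decaying to two leptons (the ψ column of Table I, including the 1/√2 factors of the two interference terms with A0) and the overall factor 9/32π; the function names and the quadrature settings are choices made for this example, not part of the analysis. The sketch checks numerically that the PDF integrates to one when |A0|² + |A‖|² + |A⊥|² = 1.

```python
import numpy as np

NORM_PSI = 9.0 / (32.0 * np.pi)  # overall normalization for the psi modes


def angular_functions_psi(cos_k, cos_tr, phi_tr):
    """Angular functions f_1..f_6 for psi -> l+ l- (assumed standard transversity form)."""
    cos_k, cos_tr, phi_tr = np.broadcast_arrays(cos_k, cos_tr, phi_tr)
    sin2_k = 1.0 - cos_k**2
    sin2_tr = 1.0 - cos_tr**2
    sin_2k = 2.0 * cos_k * np.sqrt(sin2_k)      # sin(2 theta_K*), with sin(theta) >= 0
    sin_2tr = 2.0 * cos_tr * np.sqrt(sin2_tr)   # sin(2 theta_tr)
    return np.array([
        2.0 * cos_k**2 * (1.0 - sin2_tr * np.cos(phi_tr)**2),
        sin2_k * (1.0 - sin2_tr * np.sin(phi_tr)**2),
        sin2_k * sin2_tr,
        sin2_k * sin_2tr * np.sin(phi_tr),
        -sin_2k * sin2_tr * np.sin(2.0 * phi_tr) / np.sqrt(2.0),
        sin_2k * sin_2tr * np.cos(phi_tr) / np.sqrt(2.0),
    ])


def amplitude_coefficients(a0_sq, apar_sq, delta_par, delta_perp):
    """Coefficients A_1..A_6 built from |A0|^2, |A_par|^2 and the two phases (delta_0 = 0)."""
    aperp_sq = 1.0 - a0_sq - apar_sq
    a0 = np.sqrt(a0_sq)
    apar = np.sqrt(apar_sq) * np.exp(1j * delta_par)
    aperp = np.sqrt(aperp_sq) * np.exp(1j * delta_perp)
    return np.array([
        a0_sq,
        apar_sq,
        aperp_sq,
        (np.conj(apar) * aperp).imag,   # Im(A_par^* A_perp)
        (apar * np.conj(a0)).real,      # Re(A_par A_0^*)
        (aperp * np.conj(a0)).imag,     # Im(A_perp A_0^*)
    ])


def g_psi(cos_k, cos_tr, phi_tr, coeffs):
    """Normalized differential decay rate g(omega; A) of Eq. (1) for the psi modes."""
    return NORM_PSI * np.tensordot(coeffs, angular_functions_psi(cos_k, cos_tr, phi_tr), axes=1)


def integrate_over_angles(coeffs, n_gauss=8, n_phi=16):
    """Integrate g over cos(theta_K*), cos(theta_tr) in [-1, 1] and phi_tr in [0, 2 pi)."""
    x, w = np.polynomial.legendre.leggauss(n_gauss)   # Gauss-Legendre nodes and weights
    phi = 2.0 * np.pi * np.arange(n_phi) / n_phi      # uniform grid for the periodic variable
    values = g_psi(x[:, None, None], x[None, :, None], phi[None, None, :], coeffs)
    weights = w[:, None, None] * w[None, :, None] * (2.0 * np.pi / n_phi)
    return float(np.sum(values * weights))


if __name__ == "__main__":
    # Central values measured for J/psi K* in this paper (Table II).
    coeffs = amplitude_coefficients(0.556, 0.211, -2.93, 2.91)
    print(integrate_over_angles(coeffs))
```

Only the three squared-amplitude terms contribute to the integral; the interference terms integrate to zero, which is why the total rate alone carries no information on the phases and a full angular fit is needed.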
We take into account the presence of cross-feed from channels with the same cc̄ candidate and a different K* candidate that has (due to isospin symmetry) the same A dependence as the signal. The observed PDF for channel b (b = K±π∓, K0Sπ±, K±π0) is then

    g^b_{obs}(\omega; A) = g(\omega; A)\, \frac{\varepsilon^b(\omega)}{\sum_{k=1}^{6} \mathcal{A}_k(A)\, \Phi^b_k},    (4)

where εb(ω) is the efficiency, defined as the ratio between the reconstructed and generated yield for the process (B → (cc̄)K*, K* → b), and we do not distinguish between correctly reconstructed signal and cross-feed in the numerator:

    \varepsilon^b(\omega) \equiv \sum_a F_a\, \varepsilon^{a\to b}(\omega).    (5)

εa→b(ω) is the probability for an event generated in channel a and with angle ω to be detected as an event in channel b. Fa, a = K0Sπ0, K±π∓, K±π0, K0Sπ±, denotes the fraction of each channel in the total branching fraction B → (cc̄)K*, with Σa Fa = 1. The Φbk are the fk(ω) moments of the total efficiency εb, including cross-feed:

    \Phi^b_k \equiv \sum_a F_a \int f_k(\omega)\, \varepsilon^{a\to b}(\omega)\, d\omega.    (6)

TABLE I: Amplitude coefficients Ak and angular functions fk(ω) that contribute to the differential decay rate. An overall normalization factor 9/32π (for ψ) and 9/64π (for χc1) has been omitted. In the case of a B̄ decay, the ℑm terms change sign.

    k   Ak            fk(ω) for ψ [1, 6]                       fk(ω) for χc1 [15]
    1   |A0|²         2 cos²θK* [1 − sin²θtr cos²φtr]          2 cos²θK* [1 + sin²θtr cos²φtr]
    2   |A‖|²         sin²θK* [1 − sin²θtr sin²φtr]            sin²θK* [1 + sin²θtr sin²φtr]
    3   |A⊥|²         sin²θK* sin²θtr                          sin²θK* [2 cos²θtr + sin²θtr]
    4   ℑm(A‖* A⊥)    sin²θK* sin 2θtr sin φtr                 −sin²θK* sin 2θtr sin φtr
    5   ℜe(A‖ A0*)    −(1/√2) sin 2θK* sin²θtr sin 2φtr        (1/√2) sin 2θK* sin²θtr sin 2φtr
    6   ℑm(A⊥ A0*)    (1/√2) sin 2θK* sin 2θtr cos φtr         −(1/√2) sin 2θK* sin 2θtr cos φtr

Under the approximations of neglecting the angular resolution for signal and cross-feed events, and the possible mis-measurement of the B flavor such as in events where both daughters in K*0 → K±π∓ are mis-identified (K-π swap), the PDF gobs can be expressed as in Eq. (2), and only the coefficients Φbk are needed.
The biases induced by these approximations have been estimated with Monte Carlo (MC) based studies and found to be negligible.

The coefficients Φbk are computed with exclusive signal MC samples obtained using a full simulation of the experiment [14, 16]. PID efficiencies measured with data control samples are used to adjust the MC simulation to the observed performance of the detector. Separate coefficients are used for different charges of the final state mesons, in particular to take into account the charge dependence of the interaction of charged kaons with matter, and a possible charge asymmetry of the detector. Writing the expression for the log-likelihood Lb(A) for the PDF gbobs(ωi; A) for a pure signal sample of NS events, the relevant contribution is

    L^b(A) = \sum_{i=1}^{N_S} \ln g(\omega_i; A) - N_S \ln\Big( \sum_{k=1}^{6} \mathcal{A}_k(A)\, \Phi^b_k \Big),    (7)

since the remaining term \sum_{i=1}^{N_S} \ln \varepsilon^b(\omega_i) does not depend on the amplitudes.

We use a background correction method [1] in which background events from a pure background sample of NB events are added with a negative weight to the log-likelihood that is maximized:

    L'^b(A) \equiv \sum_{i=1}^{n_B + N_S} L(\omega_i; A) - \frac{\tilde{n}_B}{N_B} \sum_{j=1}^{N_B} L(\omega_j; A),    (8)

where L(ω; A) = ln(gbobs(ω; A)). The fit is performed within the mES signal region. Background events used here for subtraction are from generic (BB̄, qq̄) MC samples. ñB is an estimate of the unknown number nB of background events that are present in the signal region in the data sample.

As L′b is not a log-likelihood, the uncertainties yielded by the minimization program Minuit [17] are biased estimates of the actual uncertainties. An unbiased estimation of the uncertainties is described and validated in Appendix A of Ref. [1]. With this pseudo-log-likelihood technique, we avoid parametrizing the acceptance as well as the background angular distributions.

The measurement is affected by several systematic uncertainties. The branching fractions used in the cross-feed part of the acceptance cross section are varied by ±1σ, and the largest variation is retained.
The uncertainty induced by the finite size of the MC sample used to compute the coefficients Φbk is estimated by the statistical uncertainty of the angular fit on that MC sample [6]. The uncertainty due to our limited understanding of the PID efficiency is estimated by using two different methods to correct for the MC-vs-data differences. The background uncertainty is obtained by comparing MC and data shapes of the mES distributions for the combinatorial component and by using the corresponding branching fraction uncertainties for the peaking component. The uncertainty due to the presence of a Kπ S wave under the K*(892) peak is estimated by a fit including it. The differential decay rate is described by Eqs. (6-9) of Ref. [1].

The results are summarized in Table II. The values of |A0|², |A‖|², |A⊥|² are negatively correlated due to the constraint |A0|² + |A‖|² + |A⊥|² = 1. In particular, |A‖|², which would be the least precisely measured parameter in separate one-dimensional fits, is strongly anti-correlated with |A0|², which would be the best measured. The one-dimensional (1D) distributions, acceptance-corrected with a 1D Ansatz and background-subtracted, are overlaid with the fit results and shown in Figure 2. In contrast with the dedicated method used in the fit, for the plots we simply computed the 1D efficiency maps from the distributions of the accepted events divided by the 1D PDF. As in lower statistics studies, the cos θK* forward-backward asymmetry due to the interference with the S wave is clearly visible.

TABLE II: Summary of the measured amplitudes. For decays to χc1, as A⊥ is compatible with zero, its phase is not defined.

    Channel    |A0|²                    |A‖|²                    |A⊥|²                    δ‖                     δ⊥
    J/ψK*      0.556 ± 0.009 ± 0.010    0.211 ± 0.010 ± 0.006    0.233 ± 0.010 ± 0.005    −2.93 ± 0.08 ± 0.04    2.91 ± 0.05 ± 0.03
    ψ(2S)K*    0.48 ± 0.05 ± 0.02       0.22 ± 0.06 ± 0.02       0.30 ± 0.06 ± 0.02       −2.8 ± 0.4 ± 0.1       2.8 ± 0.3 ± 0.1
    χc1K*      0.77 ± 0.07 ± 0.04       0.20 ± 0.07 ± 0.04       0.03 ± 0.04 ± 0.02       0.0 ± 0.3 ± 0.1        –

FIG. 2: Angular distributions with PDF from fit overlaid. The asymmetry of the cos θK* distributions induced by the S-wave interference is clearly visible. (Panels: cos θK* and cos θtr, among others, for the K+π−, KSπ+ and K+π0 channels.)

Our measurements of the amplitudes of B decays to J/ψ are compatible with, and of better precision than, previous measurements. A comparison of neutral and charged B decays (not shown) yields results consistent with isospin symmetry.

The strong phase difference δ‖ − δ⊥ is obtained from a fit in which the phase origin is δ⊥ ≡ 0. We confirm our previous observation that the strong phase differences are significantly different from zero, in contrast with what is predicted by factorization. For B → J/ψK*, it amounts to δ‖ − δ⊥ = 0.45 ± 0.05 ± 0.02.

The presence of direct CP-violating triple-products in the amplitude would produce a B to B̄ difference in the interference terms A4 and A6: δA4 and δA6. Our results (see Table III), with improved precision relative to Ref. [19], are consistent with no CP violation.

TABLE III: Difference between the interference terms measured in B and B̄ decays to J/ψ.

    Channel    δA4                         δA6
    (K+π−)     0.002 ± 0.025 ± 0.005       −0.011 ± 0.043 ± 0.016
    (K+π0)     −0.017 ± 0.047 ± 0.023      −0.051 ± 0.098 ± 0.064
    (K0Sπ+)    −0.008 ± 0.049 ± 0.011      0.075 ± 0.089 ± 0.009

In summary, we have performed the first three-dimensional analysis of the decays to ψ(2S) and χc1. The longitudinal polarization of the decay to ψ(2S) is lower than that to J/ψ, while the CP-odd intensity fraction is higher (by 1.4 and 1.0 standard deviations, respectively).
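The quoted significances can be cross-checked from the values in Table II. The short Python sketch below does this under two assumptions that the text does not spell out: statistical and systematic uncertainties are added in quadrature, and the J/ψ and ψ(2S) measurements are treated as uncorrelated. The helper names are illustrative only.

```python
import math


def total_uncertainty(stat, syst):
    """Combine statistical and systematic uncertainties in quadrature (assumption)."""
    return math.hypot(stat, syst)


def difference_in_sigma(a, b):
    """Absolute difference between two measurements (value, stat, syst),
    in units of their combined uncertainty, assuming no correlation."""
    sigma = math.hypot(total_uncertainty(a[1], a[2]), total_uncertainty(b[1], b[2]))
    return abs(a[0] - b[0]) / sigma


# Values from Table II: (central value, statistical, systematic).
A0_SQ = {"J/psi": (0.556, 0.009, 0.010), "psi(2S)": (0.48, 0.05, 0.02)}
APERP_SQ = {"J/psi": (0.233, 0.010, 0.005), "psi(2S)": (0.30, 0.06, 0.02)}

if __name__ == "__main__":
    print(f"|A0|^2:    {difference_in_sigma(A0_SQ['J/psi'], A0_SQ['psi(2S)']):.1f} sigma")
    print(f"|Aperp|^2: {difference_in_sigma(APERP_SQ['J/psi'], APERP_SQ['psi(2S)']):.1f} sigma")
```

With these assumptions the two differences come out at about 1.4 and 1.0 standard deviations, matching the numbers quoted in the summary.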
Both trends are compatible with the predictions of models of meson decays in the framework of factorization. The longitudinal polarization of the decay to χc1 is found to be larger than that to J/ψ, in contrast with the predictions of Ref. [8], which include non-factorizable contributions. The CP-odd intensity fraction of this decay is compatible with zero. The parallel and longitudinal amplitudes for χc1 seem to be aligned (|δ‖ − δ0| ∼ 0) while for ψ they are anti-aligned (|δ‖ − δ0| ∼ π).

We are grateful for the extraordinary contributions of our PEP-II colleagues in achieving the excellent luminosity and machine conditions that have made this work possible. The success of this project also relies critically on the expertise and dedication of the computing organizations that support BABAR. The collaborating institutions wish to thank SLAC for its support and the kind hospitality extended to them. This work is supported by the US Department of Energy and National Science Foundation, the Natural Sciences and Engineering Research Council (Canada), the Commissariat à l’Energie Atomique and Institut National de Physique Nucléaire et de Physique des Particules (France), the Bundesministerium für Bildung und Forschung and Deutsche Forschungsgemeinschaft (Germany), the Istituto Nazionale di Fisica Nucleare (Italy), the Foundation for Fundamental Research on Matter (The Netherlands), the Research Council of Norway, the Ministry of Science and Technology of the Russian Federation, Ministerio de Educación y Ciencia (Spain), and the Science and Technology Facilities Council (United Kingdom). Individuals have received support from the Marie-Curie IEF program (European Union) and the A. P. Sloan Foundation.

∗ Deceased
† Also with Università di Perugia, Dipartimento di Fisica, Perugia, Italy
‡ Also with Università della Basilicata, Potenza, Italy
§ Also with IPPP, Physics Department, Durham University, Durham DH1 3LE, United Kingdom

[1] B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 71, 032005 (2005).
[2] I. Dunietz, H. R. Quinn, A. Snyder, W. Toki and H. J. Lipkin, Phys. Rev. D 43, 2193 (1991).
[3] C. P. Jessop et al. [CLEO Collaboration], Phys. Rev. Lett. 79, 4533 (1997).
[4] T. Affolder et al. [CDF Collaboration], Phys. Rev. Lett. 85, 4668 (2000).
[5] R. Itoh et al. [Belle Collaboration], Phys. Rev. Lett. 95, 091601 (2005).
[6] B. Aubert et al. [BABAR Collaboration], Phys. Rev. Lett. 87, 241801 (2001).
[7] B. Aubert et al. [BABAR Collaboration], Phys. Rev. Lett. 94, 141801 (2005).
[8] C. H. Chen and H. N. Li, Phys. Rev. D 71, 114008 (2005).
[9] S. J. Richichi et al. [CLEO Collaboration], Phys. Rev. D 63, 031103 (2001).
[10] N. Soni et al. [Belle Collaboration], Phys. Lett. B 634, 155 (2006).
[11] D. London, N. Sinha and R. Sinha, Phys. Rev. Lett. 85, 1807 (2000).
[12] M. Suzuki, Phys. Rev. D 64, 117503 (2001).
[13] B. Aubert et al. [BABAR Collaboration], Nucl. Instrum. Meth. A 479, 1 (2002).
[14] S. Agostinelli et al. [GEANT4 Collaboration], Nucl. Instrum. Meth. A 506, 250 (2003).
[15] Ph. D. Thesis, S. T’Jampens, BaBar THESIS-03/016, Paris XI Univ., 18 Dec 2002.
[16] D. J. Lange, Nucl. Instrum. Meth. A 462, 152 (2001).
[17] F. James and M. Roos, Comput. Phys. Commun. 10, 343 (1975).
[18] B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607081.
[19] R. Itoh et al. [Belle Collaboration], Phys. Rev. Lett. 95, 091601 (2005).
<|endoftext|><|startoftext|> Quantum superpositions and entanglement of thermal states at high temperatures and their applications to quantum information processing

Hyunseok Jeong and Timothy C. Ralph
Centre for Quantum Computer Technology, Department of Physics, University of Queensland, St Lucia, Qld 4072, Australia
(Dated: October 26, 2018)

We study characteristics of superpositions and entanglement of thermal states at high temperatures and discuss their applications to quantum information processing. We introduce thermal-state qubits and thermal-Bell states, which are a generalization of pure-state qubits and Bell states to thermal mixtures. A scheme is then presented to discriminate between the four thermal-Bell states without photon number resolving detection but with Kerr nonlinear interactions and two single-photon detectors. This enables one to perform quantum teleportation and gate operations for quantum computation with thermal-state qubits.

I. INTRODUCTION

In many problems considered within the framework of quantum physics, physical systems are treated as pure states that can be represented by state vectors, or equivalently, by wave functions. Even though such an approach is simple and useful to address certain problems, it can often be quite different from the real conditions of physical systems. This may be particularly true when one deals with macroscopic physical systems in terms of quantum physics. A macroscopic object is a complex open system which cannot avoid continuous interactions with the environment. Such a physical system is generally in a significantly mixed state and cannot be represented by a state vector. In general, mixed states are subtle objects whose properties are significantly more difficult to characterize than those of pure states.

Schrödinger’s famous cat paradox is a typical example where a massive classical object was assumed to be in a pure state.
It describes a counter-intuitive feature of quantum physics which dramatically appears when the principle of quantum superposition is applied to macroscopic objects. In the original paradox and its various explanations, the initial cat isolated in the steel chamber is considered a pure state that can be represented by a state vector such as |alive⟩ (or a wave function such as ψalive). The cat isolated from the environment is then assumed to interact with a microscopic superposition state, (|g⟩ + |e⟩)/√2, where |g⟩ and |e⟩ are the ground and excited states of a two-level atom. The cat will be dead if the atom is found in the excited state, |e⟩, while it will remain alive otherwise. Thus in Schrödinger’s gedanken experiment the cat is entangled with the atom as (|g⟩|alive⟩ + |e⟩|dead⟩)/√2, where the alive and dead statuses of the cat are described by the state vectors |alive⟩ and |dead⟩. If one measures out the atomic system on the superposed basis, (|g⟩ ± |e⟩)/√2, the cat will be in a superposition of alive and dead states such as (|alive⟩ ± |dead⟩)/√2. It is often argued that such superposed states and entangled states can theoretically exist but are virtually impossible to observe because one cannot perfectly isolate a macroscopic object such as the cat from its environment [4].

However, this explanation is not fully satisfactory because the cat, a macroscopic object, is a complex open system which cannot be represented by a state vector. One may argue that the cat could be assumed to be in an unknown pure state such that the cat was certainly alive but the exact state of the cat was unknown. However, the interactions between the cat and its environment can cause the cat to become entangled with the environment [5]. In such a case, even though one can perfectly isolate the cat in the steel chamber from the environment, the cat will remain entangled with the environment due to its pre-interactions with the environment.
Therefore, strictly speaking, it is not legitimate even to assume that the cat is in an unknown pure state in the steel chamber. Thus a key point here is that it is unsatisfactory to describe the cat by a pure state such as |alive〉 or |dead〉. We may need a more realistic assumption that the “cat” in Schrödinger’s paradox was in a significantly mixed classical state. An intriguing question is then whether the quantum properties of the resulting state would still remain or diminish under such an assumption.

Recently, an analogue of Schrödinger’s cat paradox, in which the state corresponding to the virtual cat is a significantly mixed thermal state, was investigated [6]. A thermal state with a high temperature is considered a classical state in quantum optics. As the temperature of the thermal state increases, the degree of mixedness, which can be quantified by linear entropy, rapidly approaches the maximum value. When the temperature approaches infinity, the thermal state does not show any quantum properties. As a comparison, coherent states with large amplitudes are known as the most classical pure states [7], and their superposition is often regarded as a superposition of classical states [8]. However, coherent states are still pure states which may not well represent truly classical systems, and they display some nonclassical features [9]. In Ref. [6], it was shown that prominent quantum properties can actually be transferred from a microscopic superposition to a significantly mixed thermal state (i.e. a thermal state of which the degree of mixedness is close to the maximum value) at a high temperature through an experimentally feasible process. This result clarifies that unavoidable initial mixedness of the cat does not preclude strong quantum phenomena.

One of the results in Ref.
[6] is that quantum entanglement can be produced between thermal states with nearly the maximum Bell-inequality violation when the temperatures of both modes go to infinity. In previous related results, Bose et al. showed that entanglement can arise when two systems interact if one of the systems is pure, even when the other system is extremely mixed [10]. Filip et al. gave an interesting earlier example of the maximum violation of Bell’s inequality when one of the modes is an extremely mixed thermal state [11]. Very recently, Ferreira et al. showed that entanglement can be generated at any finite temperature between a high-Q cavity mode field and the thermal state of a movable mirror [12]. However, in these examples [10, 11, 12] only one of the modes is considered a large thermal state, and entanglement vanishes in the infinite-temperature limit [10, 12], which is obviously in contrast to the result presented in Ref. [6]. Entanglement for both of the modes at the thermal limit of infinitely high temperature has not been found before. Remarkably, the violation of Bell’s inequality in our examples reaches up to Cirel’son’s bound [13] even in this infinite-temperature limit for both modes. As Vedral [14] and Ferreira et al. [12] pointed out, it is believed that high temperatures reduce entanglement and that all entanglement vanishes if the temperature is high enough, which is obviously not the case in Ref. [6].

The purpose of this paper is twofold. Firstly, we review and further investigate various properties of superpositions and entanglement of thermal states at high temperatures [6]. In particular, we investigate two classes of highly mixed symmetric states in the phase space. Neither class of states shows typical interference patterns in the phase space, while both manifest strong singular behaviors.
Interestingly, the first class of states has neither squeezing properties nor negative values in their Wigner functions; however, they are found to be highly nonclassical states. The second class of states has the maximum negativity in the Wigner function. Further, we discuss the possibility of quantum information processing with thermal-state qubits. We introduce thermal-state qubits and thermal-Bell states, which are a generalization of pure Bell states. We show that the four thermal-Bell states can be well discriminated by nonlinear interactions without photon number resolving measurements. Quantum teleportation and gate operations for thermal-state qubits can be realized using the Bell measurement scheme.

This paper is organized as follows. In Sec. II, we review the generation process of superpositions of thermal states and study their characteristics. In Sec. III, we study entanglement of thermal states, i.e., Bell inequality violations. In Sec. IV, we discuss the possibility of quantum information processing using thermal states. We first define the thermal-state qubit and the Bell-basis states using thermal-state entanglement. We then show that the four Bell states can be well discriminated by homodyne detection and two Kerr nonlinearities. It follows that quantum teleportation and quantum gate operations can be realized with thermal-state qubits. We conclude with final remarks in Sec. V.

II. SUPERPOSITIONS OF THERMAL STATES

A. Generation of thermal-state superpositions

Let us first consider a two-mode harmonic oscillator system. A displaced thermal state can be defined as

$$\rho^{\mathrm{th}}(V,d)=\int d^2\alpha\,P^{\mathrm{th}}_{\alpha}(V,d)\,|\alpha\rangle\langle\alpha|, \tag{1}$$

where |α〉 is a coherent state of amplitude α and

$$P^{\mathrm{th}}_{\alpha}(V,d)=\frac{2}{\pi(V-1)}\exp\left[-\frac{2|\alpha-d|^2}{V-1}\right] \tag{2}$$

with variance V and displacement d in the phase space. The thermal temperature τ increases with V as $e^{\hbar\nu/\tau}=(V+1)/(V-1)$, where ħ is the reduced Planck constant and ν is the frequency [15].
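The relation between the variance V and the temperature can be made concrete with a short numerical sketch. It is an illustration, not part of the original analysis: the function name `thermal_parameters` is hypothetical, the mean photon number is assumed to follow from Eq. (2) as n̄ = (V − 1)/2, and the linear entropy is assumed to be defined as 1 − Tr ρ², which for a thermal state equals 1 − 1/V.

```python
import math

def thermal_parameters(V):
    """Return (n_bar, tau/(hbar*nu), linear entropy) for a thermal state of variance V > 1.

    Assumptions (not spelled out in the text): V = 2*n_bar + 1 and
    linear entropy = 1 - Tr(rho^2) = 1 - 1/V.
    """
    if V <= 1:
        raise ValueError("V must exceed 1; V = 1 is the zero-temperature (coherent) limit")
    n_bar = (V - 1) / 2                        # mean thermal photon number
    tau = 1 / math.log((V + 1) / (V - 1))      # from exp(hbar*nu/tau) = (V+1)/(V-1)
    linear_entropy = 1 - 1 / V                 # approaches 1 as V grows
    return n_bar, tau, linear_entropy

for V in (3, 100, 1000):
    n_bar, tau, s_lin = thermal_parameters(V)
    print(f"V={V}: n_bar={n_bar:.1f}, tau/(hbar*nu)={tau:.2f}, S_lin={s_lin:.4f}")
```

Under these assumptions the temperature grows roughly as V/2 for large V, while the linear entropy approaches its maximum value of 1, in line with the statement that a high-temperature thermal state is close to maximally mixed.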
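Suppose that a microscopic superposition state

$$|\psi\rangle_a=\frac{1}{\sqrt{2}}\left(|0\rangle_a+|1\rangle_a\right), \tag{3}$$

where |0〉 and |1〉 are the ground and first excited states of the harmonic oscillator, interacts with a thermal state ρ^th_b(V, d), and that the interaction Hamiltonian is

$$H_K=\lambda\,\hat{a}^{\dagger}\hat{a}\,\hat{b}^{\dagger}\hat{b}, \tag{4}$$

which corresponds to the cross-Kerr nonlinear interaction. The resulting state is then

$$\rho^{\mathrm{ent}}_{ab}=\frac{1}{2}\int d^2\alpha\,P^{\mathrm{th}}_{\alpha}(V,d)\Big\{|0\rangle\langle 0|\otimes|\alpha\rangle\langle\alpha|+|1\rangle\langle 0|\otimes|\alpha e^{i\varphi}\rangle\langle\alpha|+|0\rangle\langle 1|\otimes|\alpha\rangle\langle\alpha e^{i\varphi}|+|1\rangle\langle 1|\otimes|\alpha e^{i\varphi}\rangle\langle\alpha e^{i\varphi}|\Big\}, \tag{5}$$

where ϕ is determined by the strength of the nonlinearity λ and the interaction time. The Wigner representation of ρ^ent_ab is

$$W^{\mathrm{ent}}_{ab}(\alpha,\beta)=\frac{e^{-2|\alpha|^2}}{\pi}\Big\{W^{\mathrm{th}}(\beta;d)+2\alpha V^{c}(\beta;d)+2[\alpha V^{c}(\beta;d)]^{*}+(4|\alpha|^2-1)\,W^{\mathrm{th}}(\beta;de^{i\varphi})\Big\}, \tag{6}$$

where α and β are complex numbers parametrizing the phase spaces of the microscopic and macroscopic systems respectively, and

$$W^{\mathrm{th}}(\alpha;d)=\frac{2}{\pi V}\exp\left[-\frac{2|\alpha-d|^2}{V}\right], \tag{7}$$

$$V^{c}(\alpha;d)=\frac{2}{\pi KJ}\exp\left[-\frac{2}{K}(1-e^{i\varphi})d^2-\frac{1}{J}\left(\alpha-\frac{2e^{i\varphi}d}{K}\right)\left(\alpha^{*}-\frac{2d}{K}\right)\right], \tag{8}$$

with K = 2 + (V − 1)(1 − e^{iϕ}) and J = (sin(ϕ/2) + iV cos(ϕ/2))/(2V sin(ϕ/2) + 2i cos(ϕ/2)); d has been assumed real without loss of generality.

If one traces ρ^ent_ab over mode a, the remaining state will simply be a classical mixture of two thermal states and its Wigner function will be positive everywhere. However, if one measures out the “microscopic part” on the superposed basis, i.e., (|0〉_a ± |1〉_a)/√2, the “macroscopic part” for mode b may not lose its nonclassical characteristics. Such a measurement on the superposed basis will reduce the remaining state to

$$\rho^{\mathrm{sup}(\pm)}=N_s^{\pm}\int d^2\alpha\,P^{\mathrm{th}}_{\alpha}(V,d)\Big\{|\alpha\rangle\langle\alpha|\pm|\alpha e^{i\varphi}\rangle\langle\alpha|\pm|\alpha\rangle\langle\alpha e^{i\varphi}|+|\alpha e^{i\varphi}\rangle\langle\alpha e^{i\varphi}|\Big\}, \tag{9}$$

where N_s^± are the normalization factors, and its Wigner function is

$$W^{\mathrm{sup}(\pm)}(\alpha)=N_s^{\pm}\Big\{W^{\mathrm{th}}(\alpha;d)\pm V^{c}(\alpha;d)\pm\{V^{c}(\alpha;d)\}^{*}+W^{\mathrm{th}}(\alpha;de^{i\varphi})\Big\}. \tag{10}$$

The ± signs in Eqs. (9) and (10) correspond to the two possible results from the measurement of the microscopic system. The state in Eq. (9) is a superposition of two thermal states.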
A feasible experimental setup to generate superpositions of thermal states is atom-field interactions in cavities, where a π/2 pulse can be used to prepare the atom in a superposed state. This type of experiment has already been performed to produce a superposition of coherent states [16]. In our case, thermal states can simply be used instead of coherent states. Another possible setup is an all-optical scheme with free-traveling fields and a cross-Kerr medium, where a standard single-photon qubit could be used as the microscopic superposition. Recently, there have been theoretical and experimental efforts to produce and observe giant Kerr nonlinearities using electromagnetically induced transparency [17]. Furthermore, it was shown that a weak Kerr nonlinearity can still be useful if an initially strong field is employed in this type of experiment [18]. We shall further explain this with examples in Sec. III.

B. Negativity of the Wigner function

The negativity of the Wigner function is known as an indicator of non-classicality of quantum states. In order to observe negativity of the Wigner function in a real experiment, the magnitude of its minimum should be large enough. The minimum negativity of the Wigner function in Eq. (6) for V = 1 is −0.144 for d = 0 and −0.246 for d → ∞. Now suppose the initial state can be considered a classical thermal state by letting V ≫ 1. One might expect that the negativity would be washed out as the initial state becomes mixed, but this is not the case. The magnitude of the minimum negativity actually increases as V gets larger. If V → ∞, the minimum negativity of the Wigner function (6) is −0.246 regardless of d: no matter how mixed the initial thermal state was, the minimum of the Wigner function remains strongly negative. The point in the phase space which gives the minimum negativity when V ≫ 1 or d ≫ 0 is (α, β) = (−1/2, 0), where the Wigner function takes the value

$$W_{\mathrm{neg}}\equiv W^{\mathrm{ent}}_{ab}\left(-\tfrac{1}{2},0\right)=\frac{2}{\pi^2\sqrt{e}}\left(-2+\frac{1}{V}\exp\left[-\frac{2d^2}{V}\right]\right). \tag{11}$$

It can be shown that W_neg approaches −4/(π²√e) ≈ −0.246 when either d → ∞ or V → ∞.

This effect is obviously due to the interaction between the microscopic superposition and the macroscopic thermal state. If the initial microscopic state is not superposed, e.g., |ψ〉_a = |1〉_a, the resulting state will be a simple direct product, (|1〉〈1|)_a ⊗ ρ^th_b(V, −d). Whilst for V = 1 this state will exhibit negativity, this is washed out and tends to zero as V → ∞. Needless to say, if it was |0〉_a instead of |1〉_a, the resulting Wigner function will be a direct product of two Gaussian states whose Wigner function can never be negative. The superposition state (3) plays the crucial role in making the minimum negativity of the resulting Wigner function always saturate to a certain negative value no matter how mixed and classical the initial state of the other mode becomes.

FIG. 1: The probability distributions of x (left) and p (right) for a “superposition” of two distant thermal states. A thermal state with a large mixedness is converted to such a “thermal-state superposition” by interacting with a microscopic superposition (see text). The variance V and displacement d for the thermal state are chosen as (a) V = 100 and d = 100, and (b) V = 1000 and d = 300. The fringe visibility is 1 regardless of V and the fringe spacing (the distance between the fringes) does not depend on the variance (i.e. mixedness) but only on the distance d between the two component thermal states.

The Wigner functions of the single-mode states, W^{sup(±)}(α), in Eq. (10) show large negative values. The minimum negativity of the Wigner function W^{sup(−)}(α) is W^{sup(−)}(0) = −2/π regardless of the values of V and d. On the other hand, the minimum negativity of the Wigner function W^{sup(+)}(α) approaches −2/π for d → ∞ and disappears when d = 0.

C.
Quantum interference in the phase space

When ϕ = π, the state (9) becomes

$$\rho_{\pm}=N\big(\rho^{\mathrm{th}}(V,d)\pm\sigma(V,d)\pm\sigma(V,-d)+\rho^{\mathrm{th}}(V,-d)\big), \tag{12}$$

where

$$\sigma(V,d)=\int d^2\alpha\,P^{\mathrm{th}}_{\alpha}(V,d)\,|-\alpha\rangle\langle\alpha|,\qquad N=\left\{2\pm\frac{2}{V}\exp\left[-\frac{2d^2}{V}\right]\right\}^{-1}. \tag{13}$$

If the initial state for mode b is a pure coherent state, i.e., V = 1, the measurement on the superposed basis for mode a will produce a superposition of two pure coherent states,

$$|\tilde{\Psi}_{\pm}\rangle=\frac{1}{\sqrt{2\left(1\pm e^{-2|\alpha|^2}\right)}}\big(|\alpha\rangle\pm|-\alpha\rangle\big), \tag{14}$$

where α = d. The probability P± to obtain the state ρ± is obtained as [19]

$$P_{\pm}=\langle\psi_{\pm}|\mathrm{Tr}_b[\rho^{\mathrm{ent}}_{ab}]|\psi_{\pm}\rangle=\frac{1}{2}\left(1\pm\frac{1}{V}\exp\left[-\frac{2d^2}{V}\right]\right), \tag{15}$$

where |ψ±〉 = (|0〉 ± |1〉)/√2. The probability approaches P± = 1/2 when either d or V becomes large.

FIG. 2: The probability distributions P for a “superposition” of thermal states where V = 5, d = 2000, ϕ = π/1000. The x′ (p′) axis in this figure has been rotated by π/2000 from the x (p) axis for clarity.

As an analogy of Schrödinger’s cat paradox, the variance V corresponds to the size of the initial “cat”, and the distance d between the two thermal component states corresponds to distinguishability between the “alive cat” and the “dead cat”. Suppose that both V and d are very large for the initial thermal state. The two thermal states ρ^th(V, ±d) become macroscopically distinguishable when d ≫ V, and our example may become a more realistic analogy of the cat paradox in this limit. Both the states ρ± in this case show probability distributions with two Gaussian peaks and interference fringes [6]. Figure 1 presents the probability distributions of x (≡ Re[α]) and p (≡ Im[α]) for ρ− (a) when V = 100 and d = 100 and (b) when V = 1000 and d = 300. The probability distribution of x (p) for ρ± can be obtained by integrating the Wigner function of ρ± over p (x). The two Gaussian peaks along the x axis and interference fringes along the p axis shown in Fig. 1 are a typical signature of a quantum superposition between macroscopically distinguishable states.
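The fringe pattern of ρ− at ϕ = π can be examined numerically. The sketch below is an illustration under stated assumptions rather than the procedure used for Fig. 1: the closed forms for the Wigner function and its p marginal were obtained here by Gaussian averaging of the coherent-state expressions over the distribution in Eq. (2), with the normalization taken as N = 1/(2 − (2/V) exp[−2d²/V]); the function names are hypothetical.

```python
import math

def norm_minus(V, d):
    """Normalization of rho_- at phi = pi (assumed form; requires V > 1 or d > 0)."""
    return 1.0 / (2.0 - (2.0 / V) * math.exp(-2.0 * d * d / V))

def wigner_minus(x, p, V, d):
    """Wigner function of rho_- at alpha = x + i p, for phi = pi and real d."""
    thermal = (2.0 / (math.pi * V)) * (
        math.exp(-2.0 * ((x - d) ** 2 + p * p) / V)
        + math.exp(-2.0 * ((x + d) ** 2 + p * p) / V)
    )
    # Interference term from sigma(V, d) + sigma(V, -d), derived under the assumptions above
    cross = (4.0 / math.pi) * math.exp(-2.0 * V * (x * x + p * p)) * math.cos(4.0 * d * p)
    return norm_minus(V, d) * (thermal - cross)

def p_marginal_minus(p, V, d):
    """Distribution of p: the Wigner function above integrated over x."""
    pref = 2.0 * math.sqrt(2.0 / (math.pi * V))
    envelope = math.exp(-2.0 * p * p / V)
    fringes = math.exp(-2.0 * V * p * p) * math.cos(4.0 * d * p)
    return norm_minus(V, d) * pref * (envelope - fringes)

def visibility(V, d, p_max=0.05, steps=2000):
    """(I_max - I_min) / (I_max + I_min) over a symmetric p grid that contains p = 0."""
    values = [p_marginal_minus(k * p_max / steps, V, d) for k in range(-steps, steps + 1)]
    i_max, i_min = max(values), min(values)
    return (i_max - i_min) / (i_max + i_min)

for V, d in ((100, 100), (1000, 300)):
    print(V, d, wigner_minus(0.0, 0.0, V, d), visibility(V, d))
```

In this form the p distribution vanishes at p = 0 for any V, which gives unit visibility, and the oscillating factor cos(4dp) has period π/(2d), independent of V. The Wigner function at the origin evaluates to −2/π for both parameter sets.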
The visibility v of the interference fringes is defined as [15]

$$v=\frac{I_{\max}-I_{\min}}{I_{\max}+I_{\min}}, \tag{16}$$

where $I=\int dx\,W^{\mathrm{sup}(-)}(\alpha)$ and the maximum and minimum are taken over p. It can be simply shown that the visibility v is always 1 regardless of the value of V. Note that d should increase proportionally to V to maintain the condition of classical distinguishability between the two component thermal states ρ^th(V, ±d). The interference fringes with high visibility are incompatible with classical physics and are evidence of quantum coherence. The fringe spacing (the distance between the fringes) does not depend on V but only on d, i.e., a pure superposition of coherent states shows the same fringe spacing for a given d. We emphasize that the states shown in Fig. 1 are “superpositions” of severely mixed thermal states.

An experimental realization of a nonlinear effect corresponding to ϕ = π is very demanding, particularly in the presence of decoherence. Here we point out that the method using a weak nonlinear effect (ϕ ≪ π) combined with a strong field (d ≫ 1) [18] can be useful to generate a thermal-state superposition with prominent interference patterns. In Fig. 2, we have used experimentally accessible values, V = 5, d = 2000 and ϕ = π/1000, but the fringe visibility is still 1. In this case, decoherence during the nonlinear interaction would be significantly reduced because of the decrease of the interaction time [18].

FIG. 3: (Color online) The time dependent Wigner functions of the thermal state of V = 100 at the origin (d = 0) after an interaction with a microscopic superposition and a conditional measurement. The measurement result on the microscopic part was supposed to be (|0〉 + |1〉)/√2. The interaction times are (a) θ = λt = 0, (b) θ = λt = π/32, (c) θ = π/16, (d) θ ≈ 3.102, (e) θ ≈ 3.122 and (f) θ = π.

Note also that, if required, the state in Fig.
2 can be moved to the center of the phase space, for example, using a biased beam splitter (BS) and a strong coherent field [18].

D. Symmetric macroscopic quantum states

Let us assume that d = 0, i.e., the initial state is the thermal state, ρ^th(V, 0), at the origin of the phase space. In this case, the thermal-state superpositions, ρ±, are produced with probabilities P± = (1/2){1 ± (1/V)}, respectively. Figure 3 shows the Wigner functions of ρ+ dependent on the interaction time between the macroscopic thermal state and the microscopic superposition in a cross-Kerr medium. The state is always symmetric in the phase space regardless of the interaction time, as shown in Fig. 3. In this figure, the initial state is a thermal state of V = 100 (Fig. 3(a)). In a relatively short time (θ = π/32 and θ = π/16), the state shows some interference patterns. When θ = π, the evolved state looks very localized around the origin, as shown in Fig. 3(f). The generated state at θ = π shows neither negativity of the Wigner function nor squeezing properties. On the other hand, a well defined P function does not exist for this state. In the case of ρ−, with the same assumption d = 0, the Wigner function at ϕ = π has the minimum negativity (−2/π) at the origin regardless of V. As a result of the interaction with the microscopic superposition, a deep hole below zero has been formed around the origin for ρ−, as shown in Fig. 4.

III. ENTANGLEMENT BETWEEN THERMAL STATES

Entanglement between macroscopic objects and its Bell-type inequality tests are an important issue. In this section, we shall show that entanglement can be generated between high-temperature thermal states even when the temperature of each mode goes to infinity.

FIG. 4: (Color online) The Wigner function of the thermal state of V = 100 at the origin (d = 0) after an interaction with a microscopic superposition and a conditional measurement.
The measurement result on the microscopic part was supposed to be (|0〉 − |1〉)/√2 with the interaction time θ = λt = π.

FIG. 5: (a) The optimized violation, B ≡ |B₊|max, of the Bell-CHSH inequality for the “thermal-state entanglement”, ρ+, of V = 1000 (solid curve) and V = 100 (dashed curve). The Bell violation of a pure entangled coherent state, i.e., V = 1, has been plotted for comparison (dotted curve). The Bell violation B approaches its maximum bound, 2√2, when d ≫ V regardless of the level of the mixedness V. (b) The optimized Bell violation B against d for the different type of thermal-state entanglement generated using a 50:50 beam splitter from ρ+: V = 1000 (solid curve), V = 100 (dashed curve) and V = 1 (dotted curve).

A. Entanglement using two initial thermal states

If the microscopic superposition interacts with two thermal states, ρ^th_b(V, d) and ρ^th_c(V, d), and the microscopic particle is measured out on the superposed basis, the resulting state will be

$$\rho^{\mathrm{tm}(\pm)}=N_t\Big\{\rho^{\mathrm{th}}(V,d)\otimes\rho^{\mathrm{th}}(V,d)\pm\sigma(V,d)\otimes\sigma(V,d)\pm\sigma(V,-d)\otimes\sigma(V,-d)+\rho^{\mathrm{th}}(V,-d)\otimes\rho^{\mathrm{th}}(V,-d)\Big\}, \tag{17}$$

where

$$N_t=\left\{2\pm\frac{2}{V^2}\exp\left[-\frac{4d^2}{V}\right]\right\}^{-1}. \tag{18}$$

Such two-mode thermal-state entanglement can be generated using two cavities and an atomic state detector [20]. Extending the two cavities to N cavities, entanglement of N-mode thermal states can also be generated. Such a state is an analogy of the N-mode pure GHZ state [21], but each mode is extremely mixed. Here we shall consider the Bell-CHSH inequality [22, 23] with photon number parity measurements [20, 24]. The parity measurements can be performed in a high-Q cavity using a far-off-resonant interaction between a two-level atom and the field [25]. The Bell-CHSH inequality can be represented in terms of the Wigner function as [24]

$$|B^{(\pm)}|=\frac{\pi^2}{4}\left|W^{\mathrm{tm}(\pm)}(\alpha,\beta)+W^{\mathrm{tm}(\pm)}(\alpha,\beta')+W^{\mathrm{tm}(\pm)}(\alpha',\beta)-W^{\mathrm{tm}(\pm)}(\alpha',\beta')\right|\le 2, \tag{19}$$

where W^{tm(±)}(α, β) is the Wigner function of ρ^{tm(±)} in Eq. (17). As shown in Fig.
5, the Bell violation approaches the maximum bound for a bipartite measurement, 2√2 [13], when d ≫ V regardless of the level of the mixedness V, i.e., the temperatures of the thermal states. Note that this is true for both ρ+ and ρ− even though only the case of ρ+ has been plotted in Fig. 5(a). This implies that entanglement of nearly 1 ebit has been produced between the two significantly mixed thermal states for d ≫ V, and such “thermal-state entanglement” cannot be described by a local theory.

B. Entanglement using a beam splitter

A different type of macroscopic entanglement can be generated by applying the beam splitter operation

$$\exp\left[\frac{\theta}{2}\left(e^{i\phi}\hat{a}_s^{\dagger}\hat{a}_d-e^{-i\phi}\hat{a}_d^{\dagger}\hat{a}_s\right)\right] \tag{20}$$

on the “thermal-state superpositions” in Eq. (9). The state after passing through a 50:50 beam splitter can be represented as

$$N\int d^2\alpha\,P^{\mathrm{th}}_{\alpha}(V,d)\left(\left|\tfrac{\alpha}{\sqrt{2}},-\tfrac{\alpha}{\sqrt{2}}\right\rangle\pm\left|-\tfrac{\alpha}{\sqrt{2}},\tfrac{\alpha}{\sqrt{2}}\right\rangle\right)\left(\left\langle\tfrac{\alpha}{\sqrt{2}},-\tfrac{\alpha}{\sqrt{2}}\right|\pm\left\langle-\tfrac{\alpha}{\sqrt{2}},\tfrac{\alpha}{\sqrt{2}}\right|\right), \tag{21}$$

where N is defined in Eq. (13). When d is large, this state violates the Bell-CHSH inequality up to the maximum bound 2√2 regardless of the level of mixedness V, as shown in Fig. 5(b). Again, this is true for both ρ+ and ρ− even though only the case of ρ+ has been plotted in Fig. 5(b).

FIG. 6: The optimized Bell violation B against V for the slightly different type of thermal-state entanglement generated using a 50:50 beam splitter from ρ+ when d = 0.

FIG. 7: (a) The Bell-CHSH function B against θ (= λt) for V = 1 (solid curve), V = 10 (dashed curve) and V = 20 (dotted curve) for d = 30. (b) The Bell-CHSH function for d = 10 (solid curve), d = 20 (dashed curve) and d = 30 (dotted curve) for V = 10. The Bell violations are more sensitive to the interaction time as either V or d increases.

Furthermore, these states severely violate Bell’s inequality even when d = 0 as V increases, as shown in Fig. 6. We have found that the optimized Bell violation of these states approaches 2.32449 for V → ∞.
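The large-d behaviour of the Bell-CHSH combination in Eq. (19) for the two-cavity state of Eq. (17) can be checked with a short calculation. The sketch below is illustrative and rests on assumptions: the two-mode Wigner function is assembled from thermal Wigner functions (2/(πV)) exp[−2|z ∓ d|²/V] and an interference term derived here for ϕ = π, the normalization is taken as N_t = 1/(2 ± (2/V²) exp[−4d²/V]), and the four displacement settings are a hand-picked choice along the imaginary axis rather than an optimized set. Function names are hypothetical.

```python
import math

def parity_correlation(alpha, beta, V, d, sign=+1):
    """(pi^2 / 4) * W(alpha, beta) for the two-mode state with the given sign (assumed form)."""
    n_t = 1.0 / (2.0 + sign * (2.0 / V ** 2) * math.exp(-4.0 * d * d / V))

    def w_th(z, centre):
        # Wigner function of a thermal state of variance V displaced to `centre`
        return (2.0 / (math.pi * V)) * math.exp(-2.0 * abs(z - centre) ** 2 / V)

    thermal = w_th(alpha, d) * w_th(beta, d) + w_th(alpha, -d) * w_th(beta, -d)
    # Interference term from sigma (x) sigma plus its conjugate, for phi = pi
    cross = (
        2.0 * (2.0 / math.pi) ** 2
        * math.exp(-2.0 * V * (abs(alpha) ** 2 + abs(beta) ** 2))
        * math.cos(4.0 * d * (alpha.imag + beta.imag))
    )
    return (math.pi ** 2 / 4.0) * n_t * (thermal + sign * cross)

def chsh(a, a2, b, b2, V, d, sign=+1):
    """|E(a,b) + E(a,b') + E(a',b) - E(a',b')| with E the displaced-parity correlation."""
    def E(x, y):
        return parity_correlation(x, y, V, d, sign)
    return abs(E(a, b) + E(a, b2) + E(a2, b) - E(a2, b2))

V, d = 100, 1000                      # strongly mixed modes with d >> V
step = math.pi / (16 * d)             # settings chosen so that 4*d*(p_a + p_b) = +-pi/4, 3*pi/4
B = chsh(0j, 2j * step, -1j * step, 1j * step, V, d)
print(B, 2 * math.sqrt(2))
```

With these settings the combination exceeds the local bound of 2 and comes close to 2√2, even though each mode is highly mixed. This probes only the regime d ≫ V; the d = 0 limit quoted above for the beam-splitter states, 2.32449 for V → ∞, is a separate result that these settings do not address.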
Interestingly, this value is exactly the same as the optimized Bell-CHSH violation for a pure two-mode squeezed state in the infinite squeezing limit [26]. Note that multimode entangled states can be generated using multiple beam splitters.

It should be noted that the Bell violations are more sensitive to the interaction time when either V or d is larger. Figure 7 clearly shows this tendency. Therefore, in order to observe the Bell violations using a mixed state with large V (and d), the interaction time in the Kerr medium should be controlled more accurately.

IV. QUANTUM INFORMATION PROCESSING WITH THERMAL-STATE QUBITS

In this section, we discuss the possibility of quantum information processing with thermal-state qubits and thermal-state entanglement.

A. Qubits and Bell-state measurements

We introduce a thermal-state qubit

$$\rho_{\psi}=|a|^2\rho^{\mathrm{th}}(V,d)\pm ab^{*}\sigma(V,d)\pm a^{*}b\,\sigma(V,-d)+|b|^2\rho^{\mathrm{th}}(V,-d), \tag{22}$$

where a and b are arbitrary complex numbers. The basis states, ρ^th(V, d) and ρ^th(V, −d), can be well discriminated by a homodyne measurement when d is larger than V. The thermal-state qubit (22) can be rewritten as

$$\rho_{\psi}=\int d^2\alpha\,P^{\mathrm{th}}_{\alpha}(V,d)\,\big(a|\alpha\rangle+b|-\alpha\rangle\big)\big(a^{*}\langle\alpha|+b^{*}\langle-\alpha|\big), \tag{23}$$

which can be understood as a generalization of the coherent-state qubit, a|d〉 + b|−d〉, where |d〉 is a coherent state of amplitude d. The thermal-state qubit (23) becomes identical to the coherent-state qubit when V = 1. We also define four thermal-Bell states as

$$\rho_{\Phi(\pm)}=N_t\Big\{\rho^{\mathrm{th}}(V,d)\otimes\rho^{\mathrm{th}}(V,d)\pm\sigma(V,d)\otimes\sigma(V,d)\pm\sigma(V,-d)\otimes\sigma(V,-d)+\rho^{\mathrm{th}}(V,-d)\otimes\rho^{\mathrm{th}}(V,-d)\Big\}, \tag{24}$$

$$\rho_{\Psi(\pm)}=N_t\Big\{\rho^{\mathrm{th}}(V,d)\otimes\rho^{\mathrm{th}}(V,-d)\pm\sigma(V,d)\otimes\sigma(V,-d)\pm\sigma(V,-d)\otimes\sigma(V,d)+\rho^{\mathrm{th}}(V,-d)\otimes\rho^{\mathrm{th}}(V,d)\Big\}, \tag{25}$$

where N_t was defined in Eq. (18). The thermal-Bell states can be written as

$$\rho_{\Phi(\pm)}=N_t\int d^2\alpha\,d^2\beta\,P^{\mathrm{th}}_{\alpha}(V,d)P^{\mathrm{th}}_{\beta}(V,d)\,\big(|\alpha,\beta\rangle\pm|-\alpha,-\beta\rangle\big)\big(\langle\alpha,\beta|\pm\langle-\alpha,-\beta|\big), \tag{26}$$

$$\rho_{\Psi(\pm)}=N_t\int d^2\alpha\,d^2\beta\,P^{\mathrm{th}}_{\alpha}(V,d)P^{\mathrm{th}}_{\beta}(V,d)\,\big(|\alpha,-\beta\rangle\pm|-\alpha,\beta\rangle\big)\big(\langle\alpha,-\beta|\pm\langle-\alpha,\beta|\big). \tag{27}$$

FIG.
8: A schematic of the thermal-Bell state measurement (a) using photon number resolving detection and (b) using homodyne measurements with cross-Kerr nonlinear interactions (NL). See text for details.

For quantum information processing applications, it is an important task to discriminate between the four Bell states. Here we discuss two possible ways to discriminate between the thermal-Bell states (25). We shall only briefly describe the first scheme using photon number resolving measurements and focus on the second scheme using nonlinear interactions.

The first method is to simply use a 50:50 beam splitter and two photon number resolving detectors, as shown in Fig. 8(a). This scheme is basically the same as the Bell-state measurement scheme with pure entangled coherent states [27, 28]. Let us suppose that the amplitude, d, is large enough, i.e., d ≫ V. If the incident state was ρΦ(+) or ρΦ(−), most of the photons are detected on detector A in Fig. 8(a). Meanwhile, most of the photons are detected on detector B when the incident state was ρΨ(+) or ρΨ(−). The average photon numbers for the “many-photon case” and the “few-photon case” are compared in Fig. 9. Furthermore, the states ρΨ(+) and ρΦ(+) contain only even numbers of photons while ρΨ(−) and ρΦ(−) contain only odd numbers of photons. Therefore, all four Bell states can be well discriminated by analyzing the numbers of photons detected at detectors A and B. For example, if detector A detects many photons while detector B detects few, and the total photon number detected by the two detectors is even, this means that state ρΦ(+) was measured by the thermal-Bell measurement. The nonzero failure probability can be made arbitrarily small by increasing d.

However, the average photon numbers of the thermal-Bell states are high when V ≫ 1 and d ≫ 1. In this case, it would be unrealistic to use photon number resolving detectors. It would be an interesting question whether
It would be an interesting question whether 0.5 1 1.5 2 2.5 3 0.5 1 1.5 2 2.5 3 FIG. 9: The average photon number N for the “many-photon case” (solid line) and the “few-photon case” (dashed line) for V = 10 against d (a) when the input state is either ρΦ(+) or ρΨ(+) and (b) when the input state is either ρΦ(−) or ρΨ(−). these four thermal-Bell states can be distinguished by classical measurements, such as homodyne detection, in- stead of photon number resolving detection. Our alterna- tive scheme employs cross-Kerr nonlinearities and single photon detectors as shown in Fig. 8(b). Let us first sup- pose that the input field was ρΦ(+). The incident two- mode state passes through a 50-50 beam splitter, BS1. The state after passing through the 50:50 beam splitter, BS1, is ρB = Nt d2αd2βP thα (V, d)P β (V, d) |η,−ξ〉〈η,−ξ| + |η,−ξ〉〈−η, ξ|+ | − η, ξ〉〈η,−ξ| + | − η, ξ〉〈−η, ξ| where η = (α+β)/ 2 and ξ = (α−β)/ 2. Two dual-rail single photon qubits, |ψ+〉ee′ and |ψ+〉ff ′ , where |ψ+〉 = (|0〉|1〉+ |1〉|0〉), (29) are prepared using two single photons and 50:50 beam splitters, BS2 and BS3, as shown in Fig. 8(b). Then, traveling fields at modes c and d interacts with those of modes e and f , respectively, in cross-Kerr nonlinear media. We suppose that the interaction time is t = π/λ, and the resulting state is then = UceUdfρ ff ′U df (30) where Uce = exp[iπHKce/λ~] and ρq = |ψq〉〈ψq |. An ex- plicit form of Eq. (30) can then be simply obtained using the identity Uce|α〉c|0〉e = |α〉c|0〉e, Uce|α〉c|1〉e = | − α〉c|1〉e where |α〉 is a coherent state. However, we omit such an explicit expression in this paper for it is too lengthy. After the nonlinear interactions, the qubit parts, modes e, e′, f and f ′, should be measured with the mea- surement basis {|++〉, |+−〉, | −+〉, | − −〉} (32) where |+ +〉 = |ψ+〉ee′ |ψ+〉ff ′ , | + −〉 = |ψ+〉ee′ |ψ−〉ff ′ , | − +〉 = |ψ−〉ee′ |ψ+〉ff ′ , | − −〉 = |ψ−〉ee′ |ψ−〉ff ′ , and |ψ−〉 = (|0〉|1〉 − |1〉|0〉)/ 2. 
The qubit measurement in the basis (32) can be performed using two 50:50 beam splitters, BS4 and BS5, and four detectors, A1, A2, B1 and B2, as shown in Fig. 8(b). If detectors A1 and B1 click, i.e., the measurement result is |++〉, the resulting state at modes c and d is

$$\rho_{++}=\frac{N_t}{4}\int d^2\alpha\,d^2\beta\,P^{\mathrm{th}}_{\alpha}(V,d)P^{\mathrm{th}}_{\beta}(V,d)\,\big\{(|\eta\rangle+|-\eta\rangle)(\langle\eta|+\langle-\eta|)\big\}\otimes\big\{(|\xi\rangle+|-\xi\rangle)(\langle\xi|+\langle-\xi|)\big\}. \tag{33}$$

Note that state ρ++ is not normalized, which implies that the probability of obtaining the corresponding measurement result is not unity. The probability of obtaining this result is

$$P_{++}=\frac{(V+1)\left(V+e^{-4d^2/V}\right)}{2\left(V^2+e^{-4d^2/V}\right)}. \tag{34}$$

When the result is either |+−〉 or |−+〉, the result is

$$\langle\psi_2|\rho_B|\psi_2\rangle=\langle\psi_3|\rho_B|\psi_3\rangle=0, \tag{35}$$

which obviously means that the probability of obtaining this result is zero. When the result is |−−〉, i.e., detectors A2 and B2 click, the resulting state is

$$\rho_{--}=\frac{N_t}{4}\int d^2\alpha\,d^2\beta\,P^{\mathrm{th}}_{\alpha}(V,d)P^{\mathrm{th}}_{\beta}(V,d)\,\big\{(|\eta\rangle-|-\eta\rangle)(\langle\eta|-\langle-\eta|)\big\}\otimes\big\{(|\xi\rangle-|-\xi\rangle)(\langle\xi|-\langle-\xi|)\big\}, \tag{36}$$

which is not normalized. The probability of obtaining this result is

$$P_{--}=\frac{(V-1)\left(V-e^{-4d^2/V}\right)}{2\left(V^2+e^{-4d^2/V}\right)}, \tag{37}$$

and it can be simply verified that P++ + P−− = 1. Therefore, only the measurement results |++〉 and |−−〉 can be obtained in the case of the input state ρΦ(+). This is exactly the same for the case of ρΨ(+). In the same way, it can be shown that if the input state was either ρΦ(−) or ρΨ(−), only the measurement results |+−〉 and |−+〉 can be obtained. In other words, the parity of the total incoming state is perfectly discriminated by the measurements on the single-photon qubits.

FIG. 10: (a) The probability distributions, P^{++}_{Φ(+)} (solid curve) and P^{++}_{Ψ(+)} (dashed curve), for homodyne measurements at detector C. (b) The probability distributions, P++ (solid curve) and P++ (dashed curve), for homodyne measurements at detector C.

Subsequently, a homodyne measurement is performed for mode c by homodyne detector C, as shown in Fig. 8(b).
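The outcome probabilities of the single-photon qubit measurement for the input ρΦ(+) can be tabulated directly. The sketch below assumes the readings P₊₊ = (V + 1)(V + e^{−4d²/V}) / [2(V² + e^{−4d²/V})] and P₋₋ = (V − 1)(V − e^{−4d²/V}) / [2(V² + e^{−4d²/V})] for Eqs. (34) and (37); the function name is hypothetical.

```python
import math

def qubit_outcome_probabilities(V, d):
    """Return (P_++, P_--) for the input rho_Phi(+), using the assumed closed forms."""
    e = math.exp(-4.0 * d * d / V)
    denom = 2.0 * (V * V + e)
    p_pp = (V + 1.0) * (V + e) / denom   # detectors A1 and B1 click
    p_mm = (V - 1.0) * (V - e) / denom   # detectors A2 and B2 click
    return p_pp, p_mm

for V, d in ((1, 2.0), (10, 0.0), (10, 10.0), (1000, 300.0)):
    p_pp, p_mm = qubit_outcome_probabilities(V, d)
    print(f"V={V}, d={d}: P++={p_pp:.6f}, P--={p_mm:.6f}, sum={p_pp + p_mm:.6f}")
```

The two probabilities sum to one for every V and d, so no weight is left for the |+−〉 and |−+〉 outcomes. For V = 1 the |−−〉 outcome disappears, and for large V the two outcomes become nearly equally likely.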
We assume that ideal homodyne measurements are performed, i.e., when a homodyne measurement is performed the state is projected onto the eigenstate |x〉 of the operator X with eigenvalue x, where

$$X=\frac{1}{\sqrt{2}}\left(a+a^{\dagger}\right). \tag{38}$$

Let us first consider the case when the measurement result for the single photon qubits is |++〉. In this case, the remaining state is ρ++ in Eq. (33). The probability distribution P^{++}_{Φ(+)} for the homodyne measurement at detector C, normalized by the outcome probability P++, is

$$P^{++}_{\Phi(+)}(x)=\frac{\langle x|\mathrm{Tr}_d[\rho_{++}]|x\rangle}{P_{++}}=\frac{\sqrt{V}\left(e^{-Vx^2}+e^{-x^2/V}\right)}{\sqrt{\pi}\,(V+1)}. \tag{39}$$

Note that the superscript, ++, denotes that the qubit measurement result was |++〉, and the subscript, Φ(+), denotes that the input state was ρΦ(+). These notations will also be used for the other cases in this section. The same analysis can be performed for the other possible measurement outcome |−−〉:

$$P^{--}_{\Phi(+)}(x)=\frac{\langle x|\mathrm{Tr}_d[\rho_{--}]|x\rangle}{P_{--}}=\frac{\sqrt{V}\left(e^{-x^2/V}-e^{-Vx^2}\right)}{\sqrt{\pi}\,(V-1)}. \tag{40}$$

In the same way, for another input state, ρΦ(−), it is straightforward to show that

$$P^{+-}_{\Phi(-)}=P^{++}_{\Phi(+)},\qquad P^{-+}_{\Phi(-)}=P^{--}_{\Phi(+)}, \tag{41}$$

and P^{++}_{Φ(−)} = P^{−−}_{Φ(−)} = 0. On the other hand, if the input state was ρΨ(+), the probability distributions at detector C are

$$P^{++}_{\Psi(+)}(x)=\frac{\langle x|\mathrm{Tr}_c[\rho_{++}]|x\rangle}{P_{++}}=\frac{\sqrt{V}\,e^{-x\{4d+(2+V^2)x\}/V}\left(e^{(1+V^2)x^2/V}+2e^{2x(2d+x)/V}+e^{x(8d+x+V^2x)/V}\right)}{2\sqrt{\pi}\left(Ve^{4d^2/V}+1\right)}, \tag{42}$$

$$P^{--}_{\Psi(+)}(x)=\frac{\langle x|\mathrm{Tr}_c[\rho_{--}]|x\rangle}{P_{--}}=\frac{\sqrt{V}\,e^{-x\{4d+(2+V^2)x\}/V}\left(e^{(1+V^2)x^2/V}-2e^{2x(2d+x)/V}+e^{x(8d+x+V^2x)/V}\right)}{2\sqrt{\pi}\left(Ve^{4d^2/V}-1\right)}. \tag{43}$$

It is straightforward to show for the other input state ρΨ(−):

$$P^{+-}_{\Psi(-)}=P^{--}_{\Psi(+)},\qquad P^{-+}_{\Psi(-)}=P^{++}_{\Psi(+)}. \tag{44}$$

FIG. 11: The distinguishability P_s between states ρΨ(+) and ρΦ(+) by a homodyne measurement against the distance d for V = 10 (solid curve) and V = 20 (dashed curve). See text for details.

The probability distributions P^{++}_{Φ(+)} and P^{++}_{Ψ(+)} are plotted in Fig. 10. Figure 10 shows that when the input state was ρΦ(+) or ρΦ(−), the homodyne measurement outcome by detector C, characterized by P^{++}_{Φ(+)} and P^{−−}_{Φ(+)}, is located around the origin. However, when the input state was ρΨ(+) or ρΨ(−), the homodyne measurement outcome by detector C, characterized by P^{++}_{Ψ(+)} and P^{−−}_{Ψ(+)}, is located far from the origin.
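The separation between the two kinds of homodyne distributions can be quantified with a small numerical sketch. Everything here is an illustration under assumptions: the |++〉 distribution for the input ρΦ(+) is taken as √V (e^{−Vx²} + e^{−x²/V}) / [√π (V + 1)], the one for ρΨ(+) is written as a normalized sum of two Gaussians centred at ±2d plus a small interference term (an algebraic rearrangement that avoids overflow of large exponentials), and the decision threshold |x| = d is a choice made here, not one stated in the text. Function names are hypothetical.

```python
import math

def p_phi(x, V):
    """Assumed |++> homodyne distribution for the input rho_Phi(+), centred at the origin."""
    return math.sqrt(V / math.pi) * (math.exp(-V * x * x) + math.exp(-x * x / V)) / (V + 1)

def p_psi(x, V, d):
    """Assumed |++> homodyne distribution for the input rho_Psi(+), peaked near x = +-2d."""
    num = (
        math.exp(-(x - 2 * d) ** 2 / V)
        + math.exp(-(x + 2 * d) ** 2 / V)
        + 2 * math.exp(-4 * d * d / V - V * x * x)
    )
    den = 2 * math.sqrt(math.pi * V) + 2 * math.sqrt(math.pi / V) * math.exp(-4 * d * d / V)
    return num / den

def integrate(f, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

V, d = 10, 10
inside_phi = integrate(lambda x: p_phi(x, V), -d, d)            # Phi outcome lands in |x| < d
outside_psi = 1.0 - integrate(lambda x: p_psi(x, V, d), -d, d)  # Psi outcome lands in |x| > d
norm_psi = integrate(lambda x: p_psi(x, V, d), -6 * d, 6 * d, n=120000)
print(inside_phi, outside_psi, norm_psi)
```

For V = 10 and d = 10 both probabilities differ from unity by only a few parts in a million, so a simple threshold on the homodyne outcome already separates the two families of Bell states almost perfectly under these assumptions.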
Therefore, two of the Bell states, ρΦ(+) and ρΦ(−), can be well distinguished from the other two by the homodyne detector C for the case of the measurement outcome |++〉. Finally, by combining the homodyne measurement result and the qubit measurement result, all four Bell states can be effectively distinguished. For example, let us assume that the measurement outcome of the single photon detectors was |++〉 and the homodyne detection outcome was around the origin, i.e., x ≈ 0. Then, one can say that state ρΦ(+) has been measured as the result of the thermal-Bell measurement.

As implied in Fig. 10, the overlaps between the probability distributions around the origin, P^{++}_{Φ(+)} and P^{−−}_{Φ(+)}, and the other distributions, P^{++}_{Ψ(+)} and P^{−−}_{Ψ(+)}, are extremely small for a sufficiently large d. In other words, the distinguishability by the homodyne detection rapidly approaches 1 as d increases. As an example, we can calculate the distinguishability between the states ρΨ(+) and ρΦ(+) by the homodyne measurement by detector C. The distinguishability by homodyne detection is P_s ≈ 0.99999 for d = 10 (d = 15) when V = 10 (V = 20). If necessary, another homodyne measurement can be performed for mode d to enhance the distinguishability of the Bell measurement. When the probability distribution at detector C is around the origin, that of detector D is far from the origin, and vice versa.

Note also that the second scheme using homodyne detection is robust to detection inefficiency compared with the first scheme using photon number resolving measurements. In the first scheme, even if a detector misses only one photon, it will result in a completely wrong measurement outcome. In the second scheme, however, the measurement outcome will not be affected in that way. If a single photon detector misses a photon, it will be immediately recognized. Such a case can simply be discarded so that it will only degrade the success probability of the Bell measurement.
The homodyne detection inefficiency will not significantly affect the result when the distribu- tions around the origin and the distributions far from the origin are well separated, i.e., when d ≫ V , as shown in Fig. 10. On the other hand, loss in the Kerr medium will have a detrimental affect. B. Quantum teleportation and computation Quantum teleportation of a thermal-state qubit can be performed using one of the Bell states as the quantum channel. Let us assume that Alice needs to teleport a thermal-state qubit, ρψ, to Bob using a thermal-state entanglement, ρΨ(−), shared by the two parties. The total state can be represented as 1 ⊗ ρ 23 = Nt dα2dβ2dγ2P thα (V, d)P β (V, d)P γ (V, d) (a|α〉+ b| − α〉)1(|β,−γ〉 − | − β, γ〉)23 . (46) Alice first needs to perform the thermal-Bell measure- ment described in the previous subsection. To complete the teleportation process, Bob should perform an appro- priate unitary transformation on his part of the quantum channel according to the measurement result sent from Alice via a classical channel. It is straightforward to show that the required transformations are exactly the same to those for the coherent-state qubit [27]. When the mea- surement outcome is ρΨ(−), Bob obtains a perfect replica of the original unknown qubit without any operation. When the measurement outcome is ρΦ(−), Bob should perform |α〉 ↔ | − α〉 on his qubit in Eq. (23). Such a phase shift by π can be done using a phase shifter whose action is described by P (ϕ) = eiϕa †a, where a and a† are the annihilation and creation operators. When the out- come is ρΨ(+), the transformation should be performed as |α〉 → |α〉 and | − α〉 → −| − α〉. It is known that the displacement operator is a good approximation of this transformation for d ≫ 1 [29]. This transformation can also be achieved by teleporting the state again locally and repeating until the required phase shift is obtained [30]. When the outcome is ρΦ(+), σx and σz should be successively applied. V. 
CONCLUSION

In this paper, we have studied characteristics of superpositions and entanglement of thermal states at high temperatures and discussed their applications to quantum information processing. The superpositions and entanglement of thermal states show various nonclassical properties such as interference patterns, negativity of the Wigner functions, and violations of the Bell-CHSH inequality. The Bell violations are more sensitive to the interaction time during the generation process when the thermal temperature (i.e., mixedness) of the thermal-state entanglement is larger. Therefore, in order to observe the Bell violations using the mixed state at a high temperature, the interaction time in the Kerr medium should be accurate. We have pointed out that certain superpositions of high-temperature thermal states, symmetric in the phase space, can also be generated. Some of these states have neither squeezing properties nor negative values in their Wigner functions but they are found to be highly nonclassical.

We have introduced the thermal-state qubit and thermal-Bell states for applications to quantum information processing. We have presented two possible methods for the Bell-state measurement. The Bell-state measurement enables one to perform quantum teleportation and gate operations for quantum computation with thermal-state qubits. The first scheme uses two photon number resolving detectors and a 50:50 beam splitter to discriminate the thermal-Bell states. Using the second scheme, it is possible to effectively discriminate the thermal-Bell states without photon number resolving detection. The required resources for the second scheme are two Kerr nonlinear interactions, two single photon detectors, two 50:50 beam splitters and one homodyne detector. The second scheme is more robust to inefficiency of the detectors: the inefficiency of the single photon detectors only degrades the success probability of the Bell measurement.
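The correction rule of the teleportation protocol (no operation for outcome Ψ(−), the swap |α〉 ↔ |−α〉 for Φ(−), the sign flip on |−α〉 for Ψ(+), and both for Φ(+)) can be checked in an idealized two-level analogue. The sketch below is not the thermal-state calculation: it assumes that |α〉 and |−α〉 are replaced by exactly orthogonal basis vectors, which is only a reasonable stand-in for d ≫ 1, and the function name `teleport_fidelities` is hypothetical.

```python
import numpy as np

# Idealized analogue (assumption): |0> stands for |alpha>, |1> for |-alpha>,
# and the two are treated as exactly orthogonal.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)    # |0> <-> |1>
Z = np.array([[1, 0], [0, -1]], dtype=complex)   # |1> -> -|1>

s = 1.0 / np.sqrt(2.0)
bell = {
    "Psi-": s * (np.kron(ket0, ket1) - np.kron(ket1, ket0)),
    "Psi+": s * (np.kron(ket0, ket1) + np.kron(ket1, ket0)),
    "Phi-": s * (np.kron(ket0, ket0) - np.kron(ket1, ket1)),
    "Phi+": s * (np.kron(ket0, ket0) + np.kron(ket1, ket1)),
}
# Bob's correction for each Bell-measurement outcome on modes 1 and 2.
correction = {"Psi-": I2, "Phi-": X, "Psi+": Z, "Phi+": Z @ X}


def teleport_fidelities(a, b):
    """Return {outcome: (probability, fidelity after correction)}."""
    psi = a * ket0 + b * ket1
    psi = psi / np.linalg.norm(psi)
    # Modes 1, 2, 3: unknown qubit on mode 1, Psi- channel on modes 2 and 3.
    total = np.kron(psi, bell["Psi-"]).reshape(4, 2)
    result = {}
    for name, vec in bell.items():
        bob = vec.conj() @ total              # project modes 1, 2 onto the Bell state
        prob = np.vdot(bob, bob).real
        bob = correction[name] @ (bob / np.sqrt(prob))
        result[name] = (prob, abs(np.vdot(psi, bob)) ** 2)
    return result


if __name__ == "__main__":
    for outcome, (p, f) in teleport_fidelities(0.6, 0.8j).items():
        print(outcome, round(p, 6), round(f, 6))
```

In this analogue each outcome occurs with equal probability and the corrected state coincides with the input up to a global phase; the nonorthogonality of |α〉 and |−α〉 and the thermal averaging are deliberately left out.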
Acknowledgments

This work was supported by the DTO-funded U.S. Army Research Office Contract No. W911NF-05-0397, the Australian Research Council and Queensland State Government.

[1] E. Schrödinger, Naturwissenschaften 23, 807-812; 823-828; 844-849 (1935).
[2] A. J. Leggett and A. Garg, Phys. Rev. Lett. 54, 857 (1985).
[3] M. D. Reid, preprint quant-ph/0101052 and references therein.
[4] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge, 2000).
[5] H. M. Wiseman and J. A. Vaccaro, Phys. Rev. Lett. 87, 240402 (2001); see discussions in the introduction and references therein.
[6] H. Jeong and T. C. Ralph, Phys. Rev. Lett. 97, 100401 (2006).
[7] E. Schrödinger, Naturwissenschaften 14, 664 (1926).
[8] W. Schleich, M. Pernigo, and F. L. Kien, Phys. Rev. A 44, 2172 (1991).
[9] L. M. Johansen, Phys. Lett. A 329, 184.
[10] S. Bose, I. Fuentes-Guridi, P. L. Knight, and V. Vedral, Phys. Rev. Lett. 87, 050401 (2001).
[11] R. Filip, M. Dusek, J. Fiurasek, and L. Mista, Phys. Rev. A 65, 043802 (2002).
[12] A. Ferreira, A. Guerreiro, and V. Vedral, Phys. Rev. Lett. 96, 060407 (2006); we note that this work appeared on the Los Alamos archive (quant-ph/0504186) after we uploaded the main results of our work (quant-ph/0410210).
[13] B. S. Cirel'son, Lett. Math. Phys. 4, 93 (1980).
[14] V. Vedral, New J. Phys. 6, 102 (2004).
[15] D. F. Walls and G. J. Milburn, Quantum Optics, Springer-Verlag (1994).
[16] M. Brune et al., Phys. Rev. Lett. 77, 4887 (1996); A. Auffeves et al., Phys. Rev. Lett. 91, 230405 (2003).
[17] H. Schmidt and A. Imamoglu, Opt. Lett. 21, 1936 (1996); L. V. Hau et al., Nature 397, 594 (1999).
[18] H. Jeong, Phys. Rev. A 72, 034305 (2005) and references therein.
[19] We note that denominator V was missing in the generation probability P± in [6].
[20] M. S. Kim and J. Lee, Phys. Rev. A 61, 042102 (2000).
[21] D. M. Greenberger, M. Horne, and A. Zeilinger, Bell's Theorem, Quantum Theory, and Conceptions of the Universe, ed. M. Kafatos, Kluwer, Dordrecht, 69 (1989).
[22] J. S. Bell, Physics 1, 195 (1964).
[23] J. F. Clauser et al., Phys. Rev. Lett. 23, 880 (1969).
[24] K. Banaszek and K. Wódkiewicz, Phys. Rev. A 58, 4345 (1998); Phys. Rev. Lett. 82, 2009 (1999).
[25] B.-G. Englert, N. Sterpi, and H. Walther, Opt. Commun. 100, 526 (1993).
[26] H. Jeong, W. Son, M. S. Kim, D. Ahn, and C. Brukner, Phys. Rev. A 67, 012106 (2003).
[27] H. Jeong, M. S. Kim, and J. Lee, Phys. Rev. A 64, 052308 (2001).
[28] S. J. van Enk and O. Hirota, Phys. Rev. A 64, 022313 (2001).
[29] H. Jeong and M. S. Kim, Phys. Rev. A 65, 042305 (2002).
[30] T. C. Ralph, A. Gilchrist, G. J. Milburn, W. J. Munro, and S. Glancy, Phys. Rev. A 68, 042319 (2003).

ABSTRACT

We study characteristics of superpositions and entanglement of thermal states at high temperatures and discuss their applications to quantum information processing. We introduce thermal-state qubits and thermal-Bell states, which are a generalization of pure-state qubits and Bell states to thermal mixtures. A scheme is then presented to discriminate between the four thermal-Bell states without photon number resolving detection but with Kerr nonlinear interactions and two single-photon detectors. This enables one to perform quantum teleportation and gate operations for quantum computation with thermal-state qubits.
<|endoftext|><|startoftext|>
Optimal control of stochastic differential equations with dynamical boundary conditions

Stefano BONACCORSI∗, Fulvia CONFORTOLA†, Elisa MASTROGIACOMO

Dipartimento di Matematica, Università di Trento, via Sommarive 14, 38050 Povo (Trento), Italia

In this paper we investigate the optimal control problem for a class of stochastic Cauchy evolution problems with nonstandard boundary dynamics and control.
The model consists of an infinite-dimensional dynamical system coupled with a finite-dimensional dynamics, which describes the boundary conditions of the internal system. In other words, we are concerned with nonstandard boundary conditions, as the value at the boundary is governed by a different stochastic differential equation.

Keywords: Stochastic differential equations in infinite dimensions, dynamical boundary conditions, optimal control

1991 MSC :

∗stefano.bonaccorsi@unitn.it
†Current address: fulvia.confortola@unimib.it
http://arxiv.org/abs/0704.0524v1

1 Setting of the problem

Our model is a one-dimensional semilinear diffusion equation in a confined system, where interactions with extremal points cannot be disregarded. The extremal points have a mass and the boundary potential evolves with a specific dynamic. Stochasticity enters through fluctuations and random perturbations both in the interior and on the boundary; in particular, in our model we assume that the control process is perturbed by a noisy term.

There is a growing literature concerning such problems; we mention the paper [2], which considers a problem in a domain $O \subset \mathbb{R}^n$; the authors cite as an example an SPDE with stochastic perturbations which appears in connection with random fluctuations of the atmospheric pressure field. In contrast to ours, however, that paper is not concerned with control problems. Quite recently, the authors became aware of the paper [1], where a different application to some generalized Lamb model is proposed.

The internal dynamic is described by a stochastic evolution problem in the unit interval $D = [0,1]$

\[ \partial_t u(t,x) = \partial_x^2 u(t,x) + f(t,x,u(t,x)) + g(t,x,u(t,x))\,\dot W(t,x), \tag{1} \]

which we write as an abstract evolution problem on the space $L^2(0,1)$

\[ du(t) = A_m u(t)\,dt + F(t,u(t))\,dt + G(t,u(t))\,dW(t), \tag{2} \]

where the leading operator is $A_m = \partial_x^2$ with domain $D(A_m) = H^2(0,1)$.
We assume that $f$ and $g$ are real-valued mappings, defined on $[0,T]\times[0,1]\times\mathbb{R}$, which satisfy some boundedness and Lipschitz continuity assumptions.

The boundary dynamic is governed by a finite-dimensional system which follows an (ordinary, two-dimensional) stochastic differential equation

\[ \partial_t v_i(t) = -b_i v_i(t) + \partial_\nu u(t,i) + h_i(t)\,\dot V_i(t), \qquad i = 0, 1, \]

where $b_i$ are positive numbers and $h_i(t)$ are bounded, measurable functions; $\partial_\nu$ is the normal derivative on the boundary, and coincides with $(-1)^i\partial_x$ for $i = 0, 1$. For notational simplicity, we introduce the $2\times 2$ diagonal matrices $B = \mathrm{diag}(-b_0, -b_1)$ and $h(t) = \mathrm{diag}(h_0(t), h_1(t))$.

There is a constraint $Lu = v$, where we interpret $L$ as the operator evaluating boundary conditions; the system is coupled by the presence, in the second equation, of a feedback term $C$ that is an unbounded operator

\[ Cu = \begin{pmatrix} \partial_x u(0) \\ -\partial_x u(1) \end{pmatrix}. \]

The idea is to write the problem in abstract form for the vector $\mathbf{u} = (u, v)^T$ in the space $\mathbf{X} = L^2(0,1)\times\mathbb{R}^2$, that is

\[ d\mathbf{u}(t) = \mathbf{A}\mathbf{u}(t)\,dt + \mathbf{F}(t,\mathbf{u}(t))\,dt + \mathbf{G}(t,\mathbf{u}(t))\,d\mathbf{W}(t), \qquad \mathbf{u}(0) = (u_0, v_0)^T. \tag{3} \]

Our main concern is to study spectral properties of the matrix operator $\mathbf{A}$ on the domain $D(\mathbf{A}) = \{\mathbf{u} \in D(A_m)\times\mathbb{R}^2 : Lu = v\}$.

Theorem 1. $\mathbf{A}$ is the infinitesimal generator of a strongly continuous, analytic semigroup of contractions $e^{t\mathbf{A}}$, self-adjoint and compact.

We shall prove the above theorem in Section 2. Further, we shall prove that $\mathbf{A}$ is a self-adjoint operator with compact resolvent, which implies that the generated semigroup is Hilbert-Schmidt. Moreover, we can characterize the complete, orthonormal system of eigenfunctions associated to $\mathbf{A}$.

Let us fix a complete probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$; on this space we define $W(t)$, a space-time Wiener process taking values in $X = L^2(0,1)$, and $V(t) = (V_0(t), V_1(t))$, an $\mathbb{R}^2$-valued Wiener process, such that $W(t,x)$ and $V(t)$ are independent.
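To make the coupling between the interior equation (1) and the boundary dynamics concrete, the following is a minimal explicit finite-difference and Euler-Maruyama sketch. It is an illustration only, not a scheme from the paper: the grid, the time step, the initial datum $u_0 \equiv 1$, $v_0 = (1,1)$ and the function name `simulate` are assumptions, and the space-time white noise is approximated by independent Gaussians scaled by $\sqrt{dt/dx}$. The constraint $Lu = v$ is enforced by storing $v_0$, $v_1$ in the two end nodes.

```python
import numpy as np


def simulate(n=50, T=0.5, b=(1.0, 1.0),
             f=lambda t, x, u: 0.0 * u,
             g=lambda t, x, u: 0.0 * u,
             h=(0.0, 0.0), seed=0):
    """Explicit scheme for (1) with dynamical boundary values u[0]=v_0, u[-1]=v_1."""
    rng = np.random.default_rng(seed)
    dx = 1.0 / n
    dt = 0.25 * dx ** 2                 # explicit scheme: requires dt <= dx^2 / 2
    steps = int(round(T / dt))
    x = np.linspace(0.0, 1.0, n + 1)
    u = np.ones(n + 1)                  # hypothetical initial datum
    t = 0.0
    for _ in range(steps):
        new = u.copy()
        xi = rng.standard_normal(n - 1)     # interior noise increments
        eta = rng.standard_normal(2)        # boundary noise increments
        lap = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx ** 2
        new[1:-1] += (dt * (lap + f(t, x[1:-1], u[1:-1]))
                      + g(t, x[1:-1], u[1:-1]) * np.sqrt(dt / dx) * xi)
        du0 = (u[1] - u[0]) / dx            # one-sided approximation of d_x u(t, 0)
        du1 = (u[-1] - u[-2]) / dx          # one-sided approximation of d_x u(t, 1)
        # d v_i = (-b_i v_i + d_nu u(t, i)) dt + h_i dV_i, with d_nu = (-1)^i d_x
        new[0] += dt * (-b[0] * u[0] + du0) + h[0] * np.sqrt(dt) * eta[0]
        new[-1] += dt * (-b[1] * u[-1] - du1) + h[1] * np.sqrt(dt) * eta[1]
        u = new
        t += dt
    return x, u


if __name__ == "__main__":
    x, u = simulate()                       # deterministic run: f = g = h = 0
    print(u.min(), u.max())
```

With $f = g = h = 0$ the update is a combination of neighbouring values with nonnegative weights summing to at most one, so the discrete solution stays positive and decays, in line with the dissipativity of $\mathbf{A}$ stated in Theorem 1.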
As a corollary to Theorem 1, using standard results for infinite-dimensional stochastic differential equations (compare [3, Theorem 7.4]), we obtain the following existence result.

Theorem 2. For any initial condition $\mathbf{u}_0 = (u_0, v_0)^T \in X\times\mathbb{R}^2$ there exists a unique process $\mathbf{u} \in L^2_{\mathcal{F}}(0,T; X\times\mathbb{R}^2)$ such that

\[ \mathbf{u}(t) = e^{t\mathbf{A}}\mathbf{u}_0 + \int_0^t e^{(t-s)\mathbf{A}}\mathbf{F}(\mathbf{u}(s))\,ds + \int_0^t e^{(t-s)\mathbf{A}}\mathbf{G}(\mathbf{u}(s))\,d\mathbf{W}(s), \]

that is, by definition, a mild solution of (3).

The abstract semigroup setting we propose in this paper allows us to obtain an optimal control synthesis for the above evolution problem with boundary control and noise. This means that we assume a boundary dynamics of the form

\[ \partial_t v(t) = bv(t) - \partial_\nu u(t,\cdot) + h(t)[z(t) + \dot V(t)], \tag{4} \]

where $z(t)$ is the control process and takes values in a given subset of $\mathbb{R}^2$. As before, we can write the system, defined by the internal evolution problem (1) and the dynamical boundary conditions described by (4), in the following abstract form

\[ d\mathbf{u}^z_t = \mathbf{A}\mathbf{u}^z_t\,dt + \mathbf{F}(t,\mathbf{u}^z_t)\,dt + \mathbf{G}(t,\mathbf{u}^z_t)[Pz_t\,dt + d\mathbf{W}_t], \qquad \mathbf{u}_{t_0} = \mathbf{u}_0, \tag{5} \]

where $P : \mathbb{R}^2 \to \mathbf{X}$ denotes the immersion of the boundary space in the product space $\mathbf{X} = L^2(0,1)\times\mathbb{R}^2$.

The aim is to choose a control process $z$, within a set of admissible controls, in such a way as to minimize a cost functional of the form

\[ J(t_0, \mathbf{u}_0, z) = \mathbb{E}\int_{t_0}^T \lambda(s, \mathbf{u}^z_s, z_s)\,ds + \mathbb{E}\,\phi(\mathbf{u}^z_T), \tag{6} \]

where $\lambda$ and $\phi$ are given real functions.

In our setting, although the control lives in a finite-dimensional space, we obtain an abstract optimal control problem in infinite dimensions. Problems of this type have been studied exhaustively by Fuhrman and Tessitore in [8]. The control problem is understood in the usual weak sense (see [7]). We prove that if $f$ and $g$ are sufficiently regular then the abstract control problem, under suitable assumptions on $\lambda$ and $\phi$, can be solved and we can characterize optimal controls by a feedback law (see Theorem 17 and compare Theorem 7.2 in [8]).

Theorem 3.
In our assumptions, there exists an admissible control {z̄t, t ∈ [0, T ]} taking values in a bounded subset of R2, such that the closed loop equation: duτ = Auτ dτ +G(τ,uτ )PΓ(τ,uτ ,G(τ,uτ ) ∗∇xv(τ,uτ )) dτ + F(τ,uτ ) dτ +G(τ,uτ ) dWτ , τ ∈ [t0, T ], ut0 = u0 ∈ X. admits a solution and the couple (z,u) is optimal for the control problem. Stochastic boundary value problems are already present in the literature, see the paper [11] and the references therein; in those papers, the approach to the solution of the system is more similar to that in [2]. We also need to mention the paper [5] for a one dimensional case where the boundary values are set equal to a white noise mapping. 2 Generation properties Let X = L2(0, 1) be the Hilbert space of square integrable real valued functions defined on D = [0, 1] and X = X × R2. In this section we consider the following initial-boundary value problem on the space X u(t) = Amu(t) v(t) = Lu(t) v(t) = Bv(t)− Cu(t) u(0) = u0 ∈ X, v(0) = v0 ∈ R2. In the above equation, Am is an unbounded operator with maximal domain Am = ∂ x, D(Am) = H 2(0, 1); B is a diagonal matrix with negative entries (−b0,−b1). Let C : D(C) ⊂ X → ∂X the feedback operator, defined on D(C) = H1(0, 1) ∂xu(0) −∂xu(1) The boundary evaluation operator L is the mapping L : X → R2 given by Its inverse is the Dirichlet mapping D λ : R 2 → D(Am) λ φ = u(x) ∈ D(Am) : (λI −Am)u(x) = 0, Lu = φ. As proposed in [10], we define a mild solution of (8) a function u ∈ C([0, T ];X) such that u(t) = u0 +Am u(s) ds, t ∈ [0, T ] v(t) = v0 +B v(s) ds+ C u(s) ds. Control of stochastic differential equations with dynamical boundary conditions 5 In order to use semigroup theory to study equation (8), we consider a matrix operator describing the evolution with feedback on the boundary on the domain D(A) = {u ∈ D(Am)× R2 : Lu = v}. Then a mild solution for equation (8) exists if and only if A is the generator of a strongly continuous semigroup. 
The above definition of the domain D(A) puts in evidence the relation between the first and the second component of the vector u. There is a different characteri- zation that is sometimes useful in the applications. Let us define the operator A0 as A0 = Am on D(A0) = {u ∈ D(Am) : Lu = 0}. We can then write the domain of A as D(A) = {u ∈ D(Am)× ∂X : u−DA,L0 v ∈ D(A0)}. The operator A can be decomposed as the product I −DA,L0 Then, according to Engel [6], A is called a one-sided K-coupled matrix-valued operator. Proof of Theorem 1 In this section we apply form theory in order to prove generation property of the operator A, compare the monograph [13]. Proposition 4. A is the infinitesimal generator of a strongly continuous, analytic semigroup of contractions, self-adjoint and compact. We will give the proof in two steps. First of all we will consider the following form: a(u,v) = u′(x)v′(x) dx + b0 u(0) v(0) + b1 u(1) v(1) on the domain u = (u, α) ∈ H1(0, 1)× R2 | u(0) = α0, u(1) = α1 and we will show that it is densely defined, closed, positive, symmetric and continue. Moreover, the operator associated with the form a is (A, D(A)) defined above. According to [13], this implies that the operator A is self-adjoint and generates a contraction semigroup etA on X that is analytic of angle π . Then we will show the self-adjointness and the compactness of the semigroup etA. To see this, we will refer to [9]. Let us begin with the properties of the form a. 6 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo Lemma 5. The form a is densely defined, closed, positive, symmetric and continue. Proof. By assumption, since b0 and b1 are positive real numbers, it follows that in particular a is symmetric and positive. It is clear that V is a linear subspace of X. Observe that V is dense in X if any u ∈ X can be approximated with elements of V . Consider (u, α) ∈ L2[0, 1] × R2. 
Since C∞c [0, 1] is dense in L 2(0, 1) it follows that for all ε > 0 there exists v ∈ C∞c [0, 1] such that |u− v|L2[0,1] ≤ Now let ρ0(x) be a symmetric function in C c (R) with support in Bε(0), ρ0(0) = 1 ρ0(x) dx = ε/3. Finally, let ρ1(x) = ρ0(x−1). Then, if we define the function ρ = v + α0 ρ0 [0,1] + α1 ρ1 [0,1] , we have: |u− ρ|L2[0,1] ≤ |u− v|L2[0,1] + |α0ρ0|L2[0,1] + |α1ρ1|L2[0,1] ≤ ≤ max {1, α0, α1} ε. Morever, ρ(0) = α0 and ρ(1) = α1. Thus |(u, α)− (ρ, ρ(0), ρ(1))| for a suitable M . This shows that V is dense in X. In order to check closedness and continuity of a, observe first that the norm induced by a on the space V is equivalent to the norm given by the inner product (u,v)V = [u′(x)v′(x) + u(x)v(x)] dx+ u(1)v(1) + u(0)v(0). In fact, if we set b = b0 + b1, we have ‖u‖a = a(u,u) + ‖u‖2V so that ‖u‖2a ≤ 2 ‖u‖ H1(0,1) + 2b u(0)2 + u(1)2 ≤ max {2, 2b} ‖u‖2V . Now observe that V becomes a Hilbert space when equipped with the inner product defined above since V is a closed subspace of H1(0, 1)× R2. Then a is closed. Finally, a is continuous. To see this, take u,v ∈ V ; then |a(u,v)| ≤ |u′(x)v′(x)| dx+ b [|u(0)| |v(0)|+ |u(1)| |v(1)|] ≤ ‖u‖H1(0,1) ‖v‖H1(0,1) + b [|u(0)| |v(0)|+ |u(1)| |v(1)|] ≤ ‖u‖V ‖v‖V ≤M ‖u‖a ‖v‖a by the Cauchy-Schwartz inequality. Control of stochastic differential equations with dynamical boundary conditions 7 Lemma 6. The operator associated with a is (A, D(A)) defined above. Proof. Denote by (C, D(C)) the operator associated with a. By definition, C is given D(C) = {f ∈ V | ∃g ∈ X s.t. a(f ,g) = (g,h)X∀h ∈ V } Cf = −g. Let us first show that A ⊂ C. Take f ∈ D(A). Then for all h ∈ V a(f ,h) = f ′(x)h′(x) dx + b0f(0)h(0) + b1f(1)h(1) = f ′(x)h(x)|10 − f ′′(x)h(x) dx + b0f(0)h(0) + b1f(1)h(1) = f ′(1)h(1)− f ′(0)h(0)− f ′′(x)h(x) dx + b0f(0)h(0) + b1f(1)h(1). 
At the same time, if we set $\alpha = (f(0), f(1))$, $\beta = (h(0), h(1))$, we have

\[ (\mathbf{A}\mathbf{f}, \mathbf{h}) = (A_m f, h)_{L^2(0,1)} + (Cf + B\alpha, \beta)_{\mathbb{R}^2} = \int_0^1 f''(x)h(x)\,dx + f'(0)h(0) - f'(1)h(1) - b_0 f(0)h(0) - b_1 f(1)h(1) = -a(\mathbf{f}, \mathbf{h}). \]

The last equality shows that $\mathbf{A} \subset \mathbf{C}$. To check the converse inclusion $\mathbf{C} \subset \mathbf{A}$, take $\mathbf{f} \in D(\mathbf{C})$. By definition, there exists $\mathbf{g} \in \mathbf{X}$ such that $a(\mathbf{f}, \mathbf{h}) = (\mathbf{g}, \mathbf{h})_{\mathbf{X}}$ for all $\mathbf{h} \in V$. Now choose $\mathbf{h} = (h, \alpha) \in V$ such that the function $h$ belongs to $H^1_0(0,1)$ (the existence of such a function is ensured by the continuous embedding of $H^1_0(0,1)$ in $H^1(0,1)$). For such $\mathbf{h}$ the boundary terms vanish and the identity reads

\[ \int_0^1 f'(x)h'(x)\,dx = \int_0^1 g(x)h(x)\,dx. \]

From this equality we can derive that $f'$ has weak derivative $-g \in L^2(0,1)$: it follows that $f' \in H^1(0,1)$ and we conclude that $f \in H^2(0,1)$. Integrating by parts as in the proof of the first inclusion we see that

\[ a(\mathbf{f}, \mathbf{h}) = \int_0^1 f'(x)h'(x)\,dx + b_0 f(0)h(0) + b_1 f(1)h(1) = f'(x)h(x)\Big|_0^1 - \int_0^1 f''(x)h(x)\,dx + b_0 f(0)h(0) + b_1 f(1)h(1) = (-\mathbf{A}\mathbf{f}, \mathbf{h}) = (\mathbf{g}, \mathbf{h}), \qquad \forall\, \mathbf{h} \in V. \]

This implies that $\mathbf{A}\mathbf{f} = -\mathbf{g}$, and the proof is complete.

Corollary 7. The operator $(\mathbf{A}, D(\mathbf{A}))$ is self-adjoint and dissipative. Moreover it has compact resolvent.

Proof. The self-adjointness of $\mathbf{A}$ follows by [13] (Proposition 1.24) and the dissipativity is obvious. Since $D(\mathbf{A}) \subset H^2(0,1)\times\mathbb{R}^2$, the operator $\mathbf{A}$ has compact resolvent and the claim follows.

Taking into account the above corollary, it follows that $\mathbf{A}$ generates a contraction semigroup $(e^{t\mathbf{A}})_{t\ge 0}$ on $\mathbf{X}$ that is analytic of angle $\pi/2$ and self-adjoint. Finally, by [9, Corollary XIX.6.3] we obtain that $e^{t\mathbf{A}}$ is compact for all $t > 0$. Thus we have proved Proposition 4.

Remark 1. By the Spectral Theorem [9, Chapter XIX, Corollary 6.3] it follows that there exists an orthonormal basis $\{e_n\}_{n\in\mathbb{N}}$ of $\mathbf{X}$ and a sequence $\{\lambda_n\}_{n\in\mathbb{N}}$ of real numbers $\lambda_n \le 0$, such that $e_n \in D(\mathbf{A})$, $\mathbf{A}e_n = \lambda_n e_n$ and $\lim_{n\to\infty}\lambda_n = -\infty$. Moreover, $\mathbf{A}$ is given by

\[ \mathbf{A}\mathbf{u} = \sum_{n} \lambda_n (\mathbf{u}, e_n)\,e_n, \quad \mathbf{u} \in D(\mathbf{A}), \qquad e^{t\mathbf{A}}\mathbf{u} = \sum_{n} e^{\lambda_n t}(\mathbf{u}, e_n)\,e_n, \quad \mathbf{u} \in \mathbf{X}. \]
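The spectral picture of Remark 1 can be explored numerically through the form $a$. The sketch below is a Galerkin approximation with piecewise-linear finite elements, which is an assumption of this illustration and not a method used in the paper; the function names are hypothetical. The form matrix `K` is the usual stiffness matrix plus the boundary terms $b_0 u(0)v(0) + b_1 u(1)v(1)$, and the Gram matrix `M` represents the inner product of $L^2(0,1)\times\mathbb{R}^2$ restricted to vectors $(u, u(0), u(1))$. The generalized eigenvalues $\mu$ of $Kx = \mu Mx$ approximate $-\lambda_n$.

```python
import numpy as np


def form_matrices(n, b0, b1):
    """P1 finite elements on a uniform grid of [0, 1] with n cells."""
    hgt = 1.0 / n
    K = np.zeros((n + 1, n + 1))   # matrix of the form a
    M = np.zeros((n + 1, n + 1))   # Gram matrix of L^2(0,1) x R^2 on V
    for i in range(n):
        K[i:i + 2, i:i + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / hgt
        M[i:i + 2, i:i + 2] += np.array([[2.0, 1.0], [1.0, 2.0]]) * hgt / 6.0
    K[0, 0] += b0                  # boundary term b_0 u(0) v(0)
    K[n, n] += b1                  # boundary term b_1 u(1) v(1)
    M[0, 0] += 1.0                 # R^2 part of the inner product
    M[n, n] += 1.0
    return K, M


def spectrum(n=200, b0=1.0, b1=2.0):
    """Approximate eigenvalues of A, ordered from closest to zero downwards."""
    K, M = form_matrices(n, b0, b1)
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    S = Linv @ K @ Linv.T          # symmetric matrix with the same eigenvalues mu
    mu = np.linalg.eigvalsh((S + S.T) / 2.0)
    return -mu


def hs_norm_sq(lam, t):
    """Truncated sum of exp(2 t lambda_n), the squared Hilbert-Schmidt norm of e^{tA}."""
    return float(np.sum(np.exp(2.0 * t * lam)))


if __name__ == "__main__":
    lam = spectrum()
    print(lam[:5])
    print(hs_norm_sq(lam, 0.01), hs_norm_sq(lam, 0.1))
```

Since $b_0, b_1 > 0$ the matrix `K` is positive definite, so all approximate eigenvalues are strictly negative, and the truncated sum decreases as $t$ grows.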
2.1 Spectral properties of the matrix operator We shall now apply Theorem 2.5 in Engel[6] in order to describe the spectrum of A. According to that result σ(A) ⊆ σ(A0) ∪ σ(B) ∪ S (9) where S = {λ ∈ ρ(A0) ∩ ρ(B) : Det(F (λ)) = 0}. (10) The matrix F (λ) is defined as F (λ) = I − (λ−B)LλKλR(λ,B) where the operators Lλ and Kλ are given by Lλ = −BR(λ,B)R(0, B)C, Kλ = −A0R(λ,A0)DA,L0 . Notice that the matrix F (λ) can also be written as F (λ) = I + CA0R(λ,A0)D 0 R(λ,B). Remark 2. In case when the feedback operator matrix C is identically zero, the above construction implies that S = ∅. Control of stochastic differential equations with dynamical boundary conditions 9 Determining the set S In the following, we construct explicitly the set S. The idea is to construct the matrix F (λ) and compute its determinant. We have to distinguish two cases. If λ < 0 we have Det(F (λ)) = 1 + −λcos( λ+ b0 λ+ b1 (λ+ b0)(λ + b1) We note that the equation Det(F (λ)) = 0 has infinite solutions {λj}j∈N and every λj belongs to the interval (−π2(j + 1)2,−π2j2). Each λj is eigenvalue of the operator A corresponding to the eigenfunction φj = (ej(x), ej(0), ej(1)) where ej(x) = −λjBj b0 + λj −λjx+Bj sin −λjx. for a normalizing constant 0 < Bj < If λ > 0 then Det(F (λ)) = 1 + 1 + e2 − 1 + e2 b0 + λ b1 + λ (b0 + λ) (b1 + λ) We note that Det(F (λ)) > 0 for every λ > 0. This means that there are not elements λ strictly positive in S. Moreover the eigenvalues of A in S are all negative. Remark 3. It is possible to verify directly with some computation that the eigen- values of A are not eigenvalues of A. Further, the same happens in general with the eigenvalues of B, except in case b0 and b1 satisfy an explicit relation. In any case, also if b0 and b1 happen to belong to σ(A), they are in a finite number and do not affect its behaviour. Therefore, with no loss of generality, in the following we may and do assume that all the eigenvalues of A are contained in S. Theorem 8. 
In the above assumptions the semigroup etA is Hilbert-Schmidt, that |etAφi|2L2(0,1)×R2 <∞ (11) for any orthonormal basis {φi} of L2(0, 1)× R2. Proof. In order to prove that the semigroup etA is Hilbert-Schmidt, it is enough verify the (11) for an orthonormal basis. Let {φi} the orthonormal sequence of eigenfunctions of the operator A described in Remark 1. Then |etAφi|2L2(0,1)×R2 = e2tλi 10 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo where λi are the eigenvalues of the operator A. By (9) it follows that e2tλi ≤ i:λi∈σ(A) e2tλi + i: λi∈σ(B) e2tλi + i:λi∈S e2tλi . But, by Remark 3 we have that e2tλi ≤ i: λi∈σ(B) e2tλi + i:λi∈S e2tλi and the first of the last two series is a finite sum and the second one converges since the eigenvalues λi in S are asymptotic to −π2i2. 3 The abstract problem In this section we are concerned with problem (3): we introduce the relevant assump- tions and we formulate the main existence and uniqueness result for its solution. Let W = (W,V ) be the Wiener process taking values in = L2(0, 1) × R2. We denote {Ft, t ∈ [0, T ]} the natural filtration of W, augmented with the family N of P-null sets of FT : Ft = σ(W(s) : s ∈ [0, t]) ∨N. The filtration {Ft} satisfies the usual conditions. Define F : [0, T ]× X → X for every u = F(t,u) = F F (t, u) where F (t, u)(ξ) = f(t, ξ, u(ξ)). Let G be the mapping [0, T ]×X → L(X,X) such that, for u = and y = in X, G1(t, u) y G2(t, v) η where (G1(t, u) y)(ξ) = g(t, ξ, u(ξ))y(ξ) and (G2(t, v) · η) = h(t) η; we stress that h is a diagonal matrix. Therefore, we are concerned with the following abstract problem dut = Aut dt+ F(t,ut) dt+G(t,ut)dWt ut0 = u0 on which we formulate the following assumptions. Control of stochastic differential equations with dynamical boundary conditions 11 Assumption 9. (i) f : [0, T ] × [0, 1] × R → R, is a measurable mapping, bounded and Lipschitz continuous in the last component |f(t, x, u)| ≤ K, |f(t, x, u)− f(t, x, v)| ≤ L|u− v|. for every t ∈ [0, T ], x ∈ [0, 1], u, v ∈ R. 
(ii) g : [0, T ]× [0, 1]× R → R, is a measurable mapping such that |g(t, x, u)| ≤ K, |g(t, x, u)− g(t, x, v)| ≤ L|u− v| for every t ∈ [0, T ], x ∈ [0, 1], u, v ∈ R. (iii) h : [0, T ] →M(2, 2) is a bounded measurable mapping verifying |h(t)| ≤ K for every t ∈ [0, T ]. The existence and uniqueness of the solution to (12) is a standard result in the literature, see for instance the monograph [3]. In order to apply the known results, we shall verify that the nonlinear coefficients F and G satisfy suitable Lipschitz continuous conditions. That will be enough to prove the existence of a mild solution which is a process ut adapted to the filtration Ft satisfying the following integral equation ut = e e(t−s)AF(s,us) ds+ e(t−s)AG(s,us) dWs. (13) Proposition 10. Under Assumptions 9(i)–(iii), the following hold: 1. the mapping F : X → X is measurable and satisfies, for some constant L > 0, |F(t,u)− F(t,v)|X ≤ L|u− v|X u,v ∈ X. 2. G is a mapping [0, T ]× X → L(X) such that a. for every v ∈ X the map G(·, ·)v : [0, T ]× X → X is measurable, b. esAG(t,u) ∈ L2(X) for every s > 0, t ∈ [0, T ] and u ∈ X, and c. for every s > 0, t ∈ [0, T ] and u.v ∈ X we have |esAG(t,u)|L2(X) ≤ L s −1/4 (1 + |u|X), (14) |esAG(t,u)− esAG(t,v)|L2(X) ≤ L s −1/4|u− v|X, (15) |G(t,u)|L(X) ≤ L (1 + |u|X), (16) for a constant L > 0. Proof. 1. We have, for u = and v = |F(t,u)− F(t,v)|X = |F (t, u)− F (t, v)|X ≤ L|u− v|X ≤ L|u− v|X. 12 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo 2. Condition (16) follows from the definition of G and the Assumptions 9 (ii)-(iii) on g and h. Now we prove condition (14). Let {φk}k∈N be an orthonormal basis in X. |esAG(t,u)|2L2(X) = | < esAG(t,u)φj , φk > |2X | < G(t,u)φj , esAφk > |2X ≤ |G(t,u)|2L(X) |esA|2L2(X) ≤ L 2(1 + |u|2 )|esA|2L2(X). Using Theorem 8, |esA|2L2(X) ≈ e−2sn where f(t) ≈ g(t) means that f(s)/g(s) = O(1) as s→ 0; this verifies (14). 
In order to prove the last statement (15), we take the orthonormal basis {φk}k∈N consisting of eigenvectors of A (see Remark 1). We recall that φk = (ek(x), ek(0), ek(1)) where ek(x) = Bk b0 + λk −λkx+Bk sin −λkx. We have |esAG(t,u)− esAG(t,v)|2L2(X) = | < esA[G(t,u)−G(t,v)]φj , φk > |2X | < G(t,u)−G(t,v)φj , esAφk > |2X = e2sλk |G(t,u)−G(t,v)φk|2. But, for u = and v = , by the definition of the operator G, we have |G(t,u)−G(t,v)φk|2X = |g(t, x, u(x))− g(t, x, v(x))|2|ek(x)|2dx K2|u(x)− v(x)|2dx ≤ K2|u− v|2 since the function g is Lipschitz and |ek(x)| ≤ Bk is uniformly bounded in k. Consequently |esAG(t,u)− esAG(t,v)|L2(X) ≤ { e2tλk}1/2K|u− v|X ≤ |esA|L2(X)K|u− v|X which concludes the proof. Control of stochastic differential equations with dynamical boundary conditions 13 Proposition 11. Under the assumptions 9 for every p ∈ [2,∞) there exists a unique process u ∈ Lp(Ω;C([0, T ];X)) solution of (12). Proof. We can apply Theorem 5.3.1 in [4]. In fact by Proposition 4 the operator A generates a strongly continuous semigroup {etA} of bounded linear operators in the Hilbert space X. Moreover, for this theorem to apply we need to verify that coefficients F and G satisfy conditions (14)—(16), which follows from Proposition 4 Stochastic control problem After some preliminaries, in this section we are concerned with an abstract control problem in infinite dimensions. We settle the problem in the framework of weak control problems (see [7]). We aim to control the evolution of the system by the boundary. This means that we assume a boundary dynamic of the form: ∂tv(t) = bv(t)− ∂νu(t, ·) + h(t)[z(t) + V̇ (t)] (17) where z(t) is the control process. We require that z ∈ L2(Ω× [0, T ];R2). 
As in the previous section we can write the system

\[ \begin{cases} \partial_t u(t,x) = \partial_x^2 u(t,x) + f(t,x,u(t,x)) + g(t,x,u(t,x))\,\dot W(t,x), \\ \partial_t v(t) = bv(t) - \partial_\nu u(t,\cdot) + h(t)[z(t) + \dot V(t)] \end{cases} \tag{18} \]

in the following abstract form

\[ d\mathbf{u}^z_t = \mathbf{A}\mathbf{u}^z_t\,dt + \mathbf{F}(t,\mathbf{u}^z_t)\,dt + \mathbf{G}(t,\mathbf{u}^z_t)[Pz_t\,dt + d\mathbf{W}_t], \qquad \mathbf{u}_{t_0} = \mathbf{u}_0, \tag{19} \]

where $P : \mathbb{R}^2 \to \mathbf{X}$ is the immersion of the boundary space in the product space $\mathbf{X} = X\times\mathbb{R}^2$. Equation (19), in the framework of stochastic optimal control, is called the controlled state equation associated to an admissible control system. We recall that, in general, for fixed $t_0 \ge 0$ and $\mathbf{u}_0 \in \mathbf{X}$, an admissible control system (a.c.s.) is given by $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P}, \{\mathbf{W}_t\}_{t\ge 0}, z)$ where

• $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space,
• $\{\mathcal{F}_t\}_{t\ge 0}$ is a filtration in it, satisfying the usual conditions,
• $\{\mathbf{W}_t\}_{t\ge 0}$ is a Wiener process with values in $\mathbf{X}$ and adapted to the filtration $\{\mathcal{F}_t\}_{t\ge 0}$,
• $z$ is a process with values in a space $K$, predictable with respect to the filtration $\{\mathcal{F}_t\}_{t\ge 0}$, which satisfies the constraint $z(t) \in Z$, $\mathbb{P}$-a.s., for almost every $t \in [t_0, T]$, where $Z$ is a suitable domain of $K$.

In our case the space $K$ coincides with $\mathbb{R}^2$.

To each a.c.s. we associate the mild solution $\mathbf{u}^z \in C([t_0, T]; L^2(\Omega; \mathbf{X}))$ of the state equation. We introduce the functional

\[ J(t_0, \mathbf{u}_0, z) = \mathbb{E}\int_{t_0}^T \lambda(s, \mathbf{u}^z_s, z_s)\,ds + \mathbb{E}\,\phi(\mathbf{u}^z_T). \tag{20} \]

We consider the problem of minimizing the functional $J$ over all admissible control systems (which is known in the literature as the weak formulation of the control problem); any a.c.s. that minimizes $J$, if it exists, is called optimal for the control problem.
We define in the classical way the Hamiltonian function relative to the above problem, $\psi : [0,T]\times\mathbf{X}\times\mathbf{X} \to \mathbb{R}$, setting

\[ \psi(t, \mathbf{u}, \mathbf{w}) = \inf_{z \in Z}\{\lambda(t, \mathbf{u}, z) + \langle \mathbf{w}, Pz\rangle\} \tag{21} \]

and we define the following set

\[ \Gamma(t, \mathbf{u}, \mathbf{w}) = \{z \in Z : \lambda(t, \mathbf{u}, z) + \langle \mathbf{w}, Pz\rangle = \psi(t, \mathbf{u}, \mathbf{w})\}. \]

We consider the Hamilton-Jacobi-Bellman equation associated to the control problem

\[ \begin{cases} \dfrac{\partial v(t,x)}{\partial t} + L_t[v(t,\cdot)](x) = \psi(t, x, \mathbf{G}(t,x)^*\nabla_x v(t,x)), & t \in [0,T],\ x \in \mathbf{X}, \\ v(T, x) = \Phi(x), \end{cases} \tag{22} \]

where the operator $L_t$ is defined by

\[ L_t[\phi](x) = \tfrac{1}{2}\,\mathrm{Trace}\big(\mathbf{G}(t,x)\mathbf{G}(t,x)^*\nabla^2\phi(x)\big) + \langle \mathbf{A}x, \nabla\phi(x)\rangle. \]

Under suitable assumptions, if we let $v$ denote the unique solution of (22) then we have $J(t, x, z) \ge v(t, x)$, and the equality holds if and only if the following feedback law is verified by $z$ and $\mathbf{u}^z_\sigma$:

\[ z(\sigma) = \Gamma\big(\sigma, \mathbf{u}^z_\sigma, \mathbf{G}(\sigma, \mathbf{u}^z_\sigma)^*\nabla_x v(\sigma, \mathbf{u}^z_\sigma)\big). \]

Thus, we can characterize optimal controls by a feedback law.

This class of stochastic control problems, in the infinite-dimensional setting, has been studied by Fuhrman and Tessitore [8] (we refer to Theorem 7.2 in that paper for precise statements and additional results). In order to characterize optimal controls by a feedback law we have to require that the abstract operators $\mathbf{F}$ and $\mathbf{G}$ satisfy further regularity conditions. We will prove that, under suitable assumptions on the functions $f$ and $g$ in the problem (18), the abstract operators fit the required conditions.

We impose that the operators $\mathbf{F}$ and $\mathbf{G}$ are Gâteaux differentiable. This notion of differentiability is weaker than differentiability in the Fréchet sense. We recall that for a mapping $F : X \to V$, where $X$ and $V$ denote Banach spaces, the directional derivative at the point $x \in X$ in the direction $h \in X$ is defined as

\[ \nabla F(x; h) = \lim_{s\to 0}\frac{F(x + sh) - F(x)}{s} \]

whenever the limit exists in the topology of $V$.
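As an illustration of the directional derivative just recalled, consider a superposition operator $F(u)(\xi) = f(u(\xi))$ on $L^2(0,1)$ with a $C^1$ nonlinearity. The sketch below checks numerically that the difference quotient approaches the candidate derivative $f'(u(\xi))h(\xi)$ in a discrete $L^2$ norm. The choices $f = \sin$, $u(\xi) = \xi^2$, $h(\xi) = \cos 3\xi$ and the midpoint grid are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np


def nemytskii_quotient_error(s, n=2000):
    """Discrete L^2(0,1) distance between the difference quotient and f'(u) h."""
    xi = (np.arange(n) + 0.5) / n        # midpoint grid on [0, 1]
    u = xi ** 2                           # hypothetical base point
    h = np.cos(3.0 * xi)                  # hypothetical direction
    quotient = (np.sin(u + s * h) - np.sin(u)) / s
    derivative = np.cos(u) * h            # candidate directional derivative
    return float(np.sqrt(np.mean((quotient - derivative) ** 2)))


if __name__ == "__main__":
    for s in (1e-1, 1e-2, 1e-3):
        print(s, nemytskii_quotient_error(s))
```

The error shrinks roughly in proportion to $s$, as a second-order Taylor expansion of $\sin$ suggests.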
F is called Gâteaux differentiable at the point x if it has a directional derivative in every direction at x and there exists an element of L(X, V), denoted ∇F(x) and called the Gâteaux derivative, such that ∇F(x; h) = ∇F(x)h for every h ∈ X.

Definition 12. We say that a mapping F : X → V belongs to the class G1(X; V) if it is continuous, Gâteaux differentiable on X, and ∇F : X → L(X, V) is strongly continuous.

The last requirement of the definition means that for every h ∈ X the map ∇F(·)h : X → V is continuous. Note that ∇F : X → L(X, V) is not continuous in general if L(X, V) is endowed with the operator norm topology; clearly, if this happens then F is Fréchet differentiable on X. Membership of a map in G1(X, V) may be conveniently checked as shown in the following lemma.

Lemma 13. A map F : X → V belongs to G1(X, V) provided the following conditions hold:
i) the directional derivatives ∇F(x; h) exist at every point x ∈ X and in every direction h ∈ X;
ii) for every h, the mapping ∇F(·; h) : X → V is continuous;
iii) for every x, the mapping h ↦ ∇F(x; h) is continuous from X to V.

When F depends on additional arguments, the previous definitions and properties have obvious generalizations. The following assumptions are necessary in order to provide Gâteaux differentiability for the coefficients of the abstract formulation.

Assumption 14. For a.a. t ∈ [0, T], ξ ∈ [0, 1] the functions f(t, ξ, ·) and g(t, ξ, ·) belong to the class C1(R).

Proposition 15. Under Assumptions 9 and 14, for every s > 0, t ∈ [0, T],

$$F(t,\cdot) \in G^1(X, X), \qquad e^{sA}G(t,\cdot) \in G^1(X, L_2(X)).$$

Proof. The first statement is an immediate consequence of the fact that f(t, ξ, ·) ∈ C1(R, R). In order to prove that $e^{sA}G(t,\cdot)$ belongs to the class G1(X, L2(X)) we use the continuous differentiability of g and an argument similar to that used in the proof of Proposition 10.
We note that, for u = and v = , the gradient operator∇ esAG(t,u) is an Hilbert Schmidt operator that maps 7→ esA gu(t, ·, u(·))w(·)v(·) = esA (∇u(G(t,u)v)(w)) 16 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo In fact, we have esAG(t,u+ rv) − esAG(t,u) −∇esAG(t,u)v L2(X) = lim esAG(t,u+ rv) − esAG(t,u) φj − esA (∇u(G(t,u)v)φj) , φk > = lim G(t,u+ rv) −G(t,u) −∇uG(t,u)v φj , e sAφk > = lim e2sλk G(t,u+ rv) −G(t,u) −∇uG(t,u)v = lim e2sλk g(t, u(ξ) + rv(ξ)) − g(t, u(ξ)) ek(ξ)− gu(t, u(ξ))v(ξ)ek(ξ) ≤ c lim e2sλk g(t, u(ξ) + rv(ξ)) − g(t, u(ξ)) − gu(t, u(ξ))v(ξ) = c lim e2sλk gu(t, u(ξ) + αrv(ξ)) − gu(t, u(ξ)) dα v(ξ) and, by dominated convergence, this limit is equal to zero. In similar way we can prove the points (ii)− (iii) of Lemma 13 to obtain the thesis. In order to prove the main result of this section we require the following hypoth- esis. Assumption 16. (i) λ is measurable and for a.e. t ∈ [0, T ], for all u,u′ ∈ X, z ∈ Z |λ(t,u, z)− λ(t,u′, z)| ≤ C|1 + u+ u′|m|u− u′| |λ(t, 0, z)| ≤ C for suitable C ∈ R+, m ∈ N; (ii) Z is a Borel and bounded subset of R2; (iii) Φ ∈ G1(X,R) and, for every σ ∈ [0, T ], ψ(σ, ·, ·) ∈ G1,1(X × X,R); (iv) for every t ∈ [0, T ], u,w,h ∈ X ψ(t,u,w)h| + |∇ φ(u)h| ≤ L|h|(1 + |u|)m; (v) for all t ∈ [0, T ], for all u ∈ X and w ∈ X there exists a unique Γ(t,u,w) ∈ Z that realizes the minimum in (21). Namely λ(t,u,Γ(t,u,w))+ < w, PΓ(t,u,w) >= ψ(t,u,w) Control of stochastic differential equations with dynamical boundary conditions 17 Theorem 17. Suppose that assumptions 9, 14 and 16 hold. For all a.c.s. we have J(t0, u0, z) ≥ v(t0, u0) and the equality holds if and only if the following feedback law is verified by z and uz: z(σ) = Γ(σ,uzσ, G(σ,u ∗∇xv(σ,uzσ)), P− a.s. for a.a. σ ∈ [t0, T ]. (23) Finally there exists at least an a.c.s. for which (23) holds. In such a system the closed loop equation: duτ = Auτ dτ +G(τ,uτ )PΓ(τ,uτ ,G(τ,uτ ) ∗∇xv(τ,uτ )) dτ + F(τ,uτ ) dτ +G(τ,uτ ) dWτ , τ ∈ [t0, T ], ut0 = u0 ∈ X. 
admits a solution and if $z(\sigma) = \Gamma(\sigma, u_\sigma, G(\sigma,u_\sigma)^*\nabla_x v(\sigma,u_\sigma))$ then the couple (z, u) is optimal for the control problem.

Proof. By Proposition 4 we know that A generates a strongly continuous semigroup of linear operators $e^{tA}$ on X. Assumption 9 ensures that the statements in Proposition 10 hold. Moreover, Assumption 14 guarantees that the results in Proposition 15 are true. Finally, these conditions together with Assumption 16 allow us to apply Theorem 7.2 in [8] and to perform the synthesis of the optimal control.

References
1. M. Bertini, D. Noja, A. Posilicano, Dynamics and Lax-Phillips scattering for generalized Lamb models, J. Phys. A: Math. Gen. 39 (2006), 15173–15195.
2. Igor Chueshov, Björn Schmalfuss, Parabolic stochastic partial differential equations with dynamical boundary conditions, Differential Integral Equations 17 (2004), no. 7-8, 751–780.
3. Giuseppe Da Prato, Jerzy Zabczyk, Stochastic equations in infinite dimensions, Encyclopedia of Mathematics and its Applications, 44. Cambridge University Press, Cambridge, 1992.
4. G. Da Prato, J. Zabczyk, Ergodicity for infinite-dimensional systems, London Mathematical Society Lecture Notes Series, 229, Cambridge University Press, 1996.
5. A. Debussche, M. Fuhrman, G. Tessitore, Optimal Control of a Stochastic Heat Equation with Boundary-noise and Boundary-control, to appear in ESAIM Control, Optimisation and Calculus of Variations.
6. K.-J. Engel, Spectral theory and generator property for one-sided coupled operator matrices, Semigroup Forum 58 (1999), 267–295.
7. W. H. Fleming, H. M. Soner, Controlled Markov processes and viscosity solutions, Springer-Verlag, 1993.
8. M. Fuhrman, G. Tessitore, Non linear Kolmogorov equations in infinite dimensional spaces: the backward stochastic differential equations approach and applications to optimal control, Ann. Probab. 30 (2002), no. 3, 1397–1465.
9. Israel Gohberg, Seymour Goldberg, Marinus A. Kaashoek, Classes of linear operators, Vol. I, Birkhäuser Verlag, Basel, 1990. Operator Theory: Advances and Applications, 49.
10. Marjeta Kramar, Delio Mugnolo, Rainer Nagel, Semigroups for initial-boundary value problems. In: Evolution equations: applications to physics, industry, life sciences and economics (Levico Terme, 2000), 275–292, Progr. Nonlinear Differential Equations Appl., 55, Birkhäuser, Basel, 2003.
11. Bohdan Maslowski, Stability of semilinear equations with boundary and pointwise noise, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 22 (1995), no. 1, 55–93.
12. D. Mugnolo, Asymptotics of semigroups generated by operator matrices, Ulmer Seminare 10 (2005), 299–311.
13. El Maati Ouhabaz, Analysis of heat equations on domains, London Mathematical Society Monographs Series, 31. Princeton University Press, Princeton, NJ, 2005.

ABSTRACT In this paper we investigate the optimal control problem for a class of stochastic Cauchy evolution problems with non-standard boundary dynamics and control. The model is composed of an infinite dimensional dynamical system coupled with a finite dimensional dynamics, which describes the boundary conditions of the internal system. In other terms, we are concerned with non-standard boundary conditions, as the value at the boundary is governed by a different stochastic differential equation. <|endoftext|><|startoftext|> Introduction Fractional calculus is a branch of mathematics that deals with a generalization of the well-known operations of differentiation and integration to arbitrary non-integer order, which can be a real non-integer or even an imaginary number. Nowadays physicists use this powerful tool to deal with some problems which were not solvable in the classical sense.
Therefore, fractional calculus has become one of the most powerful and widely useful tools for describing and explaining some complex physical systems. Recently, the Euler-Lagrange equations have been presented for unconstrained and constrained fractional variational problems [1 and other references]. This technique enables us to solve some problems, including describing the behavior of non-conservative systems as developed by Riewe [2], who used the fractional derivative to construct the Lagrangian and Hamiltonian for non-conservative systems. For these reasons, a general formula for the potential of an arbitrary force, conservative or non-conservative, was developed in [3]; it leads directly to the consideration of dissipative effects in the Lagrangian and Hamiltonian formulations. Also, the canonical quantization of non-conservative systems has been carried out in [4]. Starting from a Lagrangian containing a fractional derivative, the fractional Hamiltonian is obtained in [5]. In addition, the passage from a Hamiltonian containing fractional derivatives to the fractional Hamilton-Jacobi equation is achieved by Rabei et al. [6]. The equations of motion are obtained in a manner similar to that of usual mechanics. All these outstanding results using the fractional derivative lead us to concentrate on another branch of quantum physics, the WKB approximation [7, 8, 9, 10, 14]. In this paper we are mainly interested in constructing the solution of the Schrödinger equation in an exponential form (Griffith 1995) starting from the fractional Hamilton-Jacobi equation, and in how it leads naturally to this semi-classical approximation, namely the fractional WKB. The purpose of this paper is to find the solution of the Schrödinger equation for some systems that have a fractional behavior in their Lagrangians and obey the WKB approximation assumptions. The plan of this paper is as follows: In section II the derivation of the generalized Hamilton-Jacobi partial differential equation given in [6] is briefly reviewed.
In section III the fractional WKB approximation is derived. In section IV some examples of the fractional WKB technique are reported. Section V is dedicated to conclusions.

II. Basic Tools

The left and right Riemann-Liouville fractional derivatives are defined as follows [3]. The left Riemann-Liouville fractional derivative is given by

$${}_aD_x^{\alpha} f(x) = \frac{1}{\Gamma(n-\alpha)}\left(\frac{d}{dx}\right)^{n}\int_a^x (x-\tau)^{n-\alpha-1} f(\tau)\,d\tau \qquad (1)$$

The right Riemann-Liouville fractional derivative has the form

$${}_xD_b^{\beta} f(x) = \frac{1}{\Gamma(n-\beta)}\left(-\frac{d}{dx}\right)^{n}\int_x^b (\tau-x)^{n-\beta-1} f(\tau)\,d\tau \qquad (2)$$

Here α, β are the orders of derivation such that n-1≤α <|startoftext|> Introduction The Skyrme model, in its initial form, was proposed and developed by T.H.R. Skyrme in a series of papers as a non-linear field theory of pions [2], [3]. Skyrme's initial idea was to think of baryons (in particular the nucleons) as secondary structures arising from a more fundamental mesonic fluid. The key property of the model was that the baryons arose as solitons in a topological manner and thus possessed a conserved topological charge identified with the baryon number. The lowest energy stable solutions of the model are termed Skyrmions and can be thought of as baryonic solitons. The Skyrme model has been very successful in modelling the structures of various nuclei and has been shown by Witten et al. [4] to possess the general features of a low energy effective field theory for QCD. Some studies of the Skyrme model coupled to gravity have previously been undertaken [1], [5], [6], mainly with the motivation of comparing its features with those of other non-linear field theories coupled to gravity. Of particular note is the Einstein-Yang-Mills theory, in which gravitationally bound configurations of non-abelian gauge fields are produced. Other reasons for studying the Einstein-Skyrme model are cosmological and astrophysical. Various authors have studied black hole formation in the model, with the conclusion that the so-called no-hair conjecture may not hold [7], [8].
The purpose of this paper is to study large baryon number Skyrmions or configurations of Skyrmions in the Einstein-Skyrme model. In particular, we wish to investigate whether stable solitonic stars could exist within the model and to compare their properties to those of neutron stars. Preliminary studies of Skyrmion stars have predicted instability to single particle decay [1]. However, this was done using the hedgehog ansatz for baryon number larger than 1, which is known to lead to unstable solutions even for the usual Skyrme model. Since then, it has been shown that the Skyrme model has stable shell-like solutions [9] which can be well approximated by the so-called rational map ansatz [10]. In this paper we use the rational map ansatz and its extension to multiple shells to construct configurations in the gravitating Skyrme model that have a very large baryon number. We show that those configurations, contrary to those of the hedgehog ansatz, are bound even for very large baryon numbers. To construct configurations that have a baryon number comparable to that of a neutron star, we have to introduce a further approximation, which we call the ramp ansatz. We show that this ansatz introduces further errors of only a few percent and we use it to compute very large Skyrmion configurations. The paper is organised as follows: first we outline the Einstein-Skyrme model and discuss the main features of the results on static gravitating SU(2) hedgehogs obtained by Bizon and Chmaj [1]. We then use the rational map ansatz to construct shell-like gravitating multibaryon configurations and show that, for a fixed value of the coupling constant, the configurations exist only when the baryon number is below a certain critical value. Finally we introduce a ramp profile approximation to construct solutions with extremely high baryon numbers. We show how accurate it is and use it to construct Skyrmion star configurations.
2 The Einstein-Skyrme Model

The action for gravitating Skyrmions is formed from the standard Skyrme action for the matter field and the Einstein-Hilbert action for the gravitational field,

$$S = \int_M \sqrt{-g}\left(L_{Sk} - \frac{R}{16\pi G}\right) d^4x. \qquad (1)$$

Here $L_{Sk}$ is the Lagrangian density for the Skyrme model defined on the manifold M:

$$L_{Sk} = \frac{F_\pi^2}{16}\,\mathrm{Tr}(\nabla_\mu U\nabla^\mu U^{-1}) + \frac{1}{32e^2}\,\mathrm{Tr}[(\nabla_\mu U)U^{-1}, (\nabla_\nu U)U^{-1}]^2, \qquad (2)$$

where U belongs to SU(2). As we eventually wish to study baryon stars, we take a spherically symmetric metric, such as that associated with the line element

$$ds^2 = -A^2(r)\left(1 - \frac{2m(r)}{r}\right)dt^2 + \left(1 - \frac{2m(r)}{r}\right)^{-1}dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2), \qquad (3)$$

where A(r) and m(r) are two profile functions that must be determined by solving the Einstein equations for the model. Our choice of ansatz is motivated by the fact that although in some cases we will be studying non-spherical Skyrmion configurations, the regime we are primarily interested in (i.e. Skyrmions of extremely high baryon number) will be shown to admit quasi-spherical solutions. Also, for realistic values of the couplings, the gravitational interaction is small compared to the Skyrme interaction and thus the use of a spherical metric, even with non-spherical configurations, is not a great problem. From (3), it can be shown that the Ricci scalar is

$$R = \frac{2}{A r^2}\left(-A'' r^2 - 2A' r + 2A'' r m + A' m + 3A' r m' + A r m'' + 2A m'\right), \qquad (4)$$

which, after integrating various terms by parts and noting that asymptotic flatness requires both A(r) and m(r) to take a constant value at spatial infinity, reduces the gravitational part of the action to

$$S_{gr} = -\frac{1}{G}\int A(r)\,m'(r)\,dr\,dt. \qquad (5)$$

For what follows, it will be convenient to scale to dimensionless variables by defining $x = eF_\pi r$ and $\mu(x) = eF_\pi m(r)/2$, resulting in one dimensionless coupling parameter for the model, $\alpha = \pi F_\pi^2 G$. We note that taking $F_\pi = 186$ MeV and $G = 6.72\times10^{-45}$ MeV$^{-2}$, the physical value of the coupling is $\alpha = 7.3\times10^{-40}$. As the Skyrme field is an SU(2) valued scalar field, at any given time one can think of it as a map from R3 to the SU(2) manifold.
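As a quick check of the coupling value quoted above, the arithmetic can be reproduced directly. This is a minimal sketch, assuming the dimensionless coupling is α = π Fπ² G (the form consistent with the quoted numbers); the variable names are illustrative.

```python
import math

# Values quoted in the text (natural units, energies in MeV).
F_PI_MEV = 186.0            # pion decay constant F_pi
G_PER_MEV2 = 6.72e-45       # Newton's constant G in MeV^-2

# Assumed form of the dimensionless coupling: alpha = pi * F_pi^2 * G.
alpha = math.pi * F_PI_MEV**2 * G_PER_MEV2
print(f"alpha = {alpha:.2e}")
```

Under this assumption the product reproduces the quoted physical coupling to the two significant figures given in the text.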
Finite energy considerations impose that the field at spatial infinity should map to the same point on SU(2), say the identity. Thus, one can simply think of the Skyrme field as a map between three-spheres. All such maps fall into disjoint homotopy classes characterised by their winding number. This winding number is a conserved topological charge because no continuous deformation of the field, and thus no time evolution, can allow transitions between homotopy classes. It is this topological charge that is interpreted as the baryon number.

3 Gravitating Hedgehog Skyrmions

Gravitating Skyrmions were first studied by Bizon and Chmaj [1], who analysed the properties of static spherically symmetric gravitating SU(2) Skyrmions. Taking the hedgehog ansatz for the Skyrme field

$$U = \exp(i\,\vec\sigma\cdot\hat r\,F(r)) \qquad (6)$$

subject to the boundary conditions

$$F(r=0) = B\pi \qquad (7)$$
$$F(r=\infty) = 0 \qquad (8)$$

where B is the baryon number associated with the Skyrmion configuration, they derived the Euler-Lagrange equations for the profiles F(r), A(r) and m(r) and found that the model admits two branches of global solitonic solutions at each given baryon number, which annihilate at a critical value of the coupling parameter. Above $\alpha_{crit}$ no further solutions were found. In particular, the value of the critical coupling decreased quite considerably with increasing baryon number, as $\alpha_{crit} \approx 0.040378/B^2$. It appears that the existence of a critical coupling does not signal the collapse of a Skyrmion to form a black hole. In fact the metric factor $S(x) = 1 - 2\mu(x)/x$ is non-zero at $\alpha_{crit}$; there simply cease to be any stationary points of the action above the critical coupling. The major problem with the ansatz (7) is that it leads to unstable solutions, i.e. for any given value of α, $M_{ADM}(B=N) > N\,M_{ADM}(B=1)$. This is actually the case for the pure Skyrme model as well, where the hedgehog ansatz (7) with B > 1 does not correspond to the lowest energy solution for the model.
The solutions of the pure Skyrme model when B > 1 are known not to be spherically symmetric[11] but are stable i.e. E(B = N) < N ∗E(B = 1). It was actually shown by Houghton et al [10], [12] that the multi-baryon solutions of the pure Skyrme model can be well approximated by the so called rational maps ansatz which is a generalisation of the hedgehog ansatz. While not radially symmetric, the ansatz separates its radial and angular dependence through a profile function and a rational map respectively. In the following sections we will generalise the construction of Houghton et al to approx- imate the solution of the Einstein-Skyrme model. 4 The Rational Map Ansatz The rational map ansatz introduced by Houghton et al.[10] works by decomposing the field into angular and radial parts. Using the polar coordinates in R3 and defining the stereographic coordinates z = tan(θ/2) expiφ the ansatz reads [10] U = exp (i~σ · n̂RF (r, t)) (9) where n̂R = 1 + |R|2 2ℜ(R), 2ℑ(R), 1− |R|2 is a unit vector where R is a rational function of z. It can be shown that the baryon number for Skyrmions constructed in this way, is equal to the degree of the rational map providing we take the boundary conditions F (r = 0) = π F (r = ∞) = 0. (11) Substituting the ansatz (9) into the action for the model and scaling to dimensionless variables as earlier, we obtain the following reduced Hamiltonian 16πFπ S(x)F (x) + Bsin2F (x)(1 + S(x)F (x)′2) Isin4F (x) where S(x) = 1− 2µ(x) From which one obtains the following field equations S(x)x F (x) + B sin2 F (x) + S(x)BF (x)′2 sin2 F (x) + I sin 4 F (x) F (x) S(x)V (x) sin 2F (x) B + S(x)BF (x)′2 + I sin 2 F (x) − αS(x)F (x) ′3V (x)2 − S(x)′F (x)′V (x)− S(x)F (x)′V (x)′ = αA(x)F (x) x+ 2B sin 2 F (x) where, for convenience, we have defined V (x) as V (x) = x + 2B sin2 F (x). (17) B is the baryon number and I = 1 1 + |z|2 1 + |R|2 2idzdz (1 + |z|2)2 Its value depends on the chosen rational map R. 
To compute low energy configurations for a given baryon charge B one must find the rational map R of degree B that minimizes I. This has been done in [10] and [11] for several values of B. Moreover, when B is large, one can use the approximation [11] $I \approx 1.28B^2$. The value of I so obtained is then used as a parameter and one can solve equations (14) - (16) for the radial profiles F(x), A(x) and µ(x). We should point out here that for the pure Skyrme model the rational map ansatz produces very good approximations to the multi-Skyrmion solutions [10]: the energies are only 3 or 4 percent higher, and the energy densities exhibit the same symmetries and differ by very little. All the solutions computed by Battye and Sutcliffe [11], when B is not too small, have roughly the shape of a hollow shell. The baryon density is very small everywhere outside the shell, while on the shell itself it forms a lattice of hexagons and pentagons.

5 Hollow Skyrmion Shells

Using the rational map ansatz, we now solve the field equations (14) - (16) to compute some low action configurations. These solutions will correspond, initially, to a hollow shell of Skyrmions similar to the configuration obtained with the rational map ansatz for the pure Skyrme model. In the following sections we will show how our ansatz can be generalised to allow for more realistic configurations made out of embedded shells. The first thing to note about our solutions is that we again obtain two branches of solutions at each baryon number (Fig. 1). Obtaining this same qualitative behaviour is not surprising when one considers that the B = 1 rational map Skyrmion reproduces the usual B = 1 hedgehog. However, the behaviour of the critical coupling itself is drastically altered for the rational map generated configurations. Namely, we observe that it decreases as approximately $0.040378/B^{1/2}$ (Fig. 2).
In particular this means that for a given value of the coupling, the rational map generated Skyrmions can possess a much higher topological charge than their hedgehog counterparts before there cease to be any solutions. Quantitatively, if $B_{hedgehog}$ is the maximum baryon number for which hedgehog solutions can be found at a given value of the coupling, then the highest baryon number rational map solution found at the same value of α will be approximately $B_{hedgehog}^4$. Again we observe that the metric function S(x) is non-zero at the critical coupling for all the solutions we have found, and as such a horizon has not formed. In Table 1 we present the radius, ADM mass per baryon and minimum value of the metric function, S(x), for configurations up to the maximum baryon number allowed at $\alpha = 1\times10^{-6}$. These values were obtained by direct numerical solution of equations (14) - (16), where we have used the boundary data as specified in (11). We did not use the physical value of α ($7.3\times10^{-40}$) because for this value the ratio between the width of the shell and its radius is so small when we reach the maximum value of B that it becomes very difficult to solve the equations reliably. The value $\alpha = 1\times10^{-6}$ is small enough to allow for a shell with a large baryon number to exist but large enough to make it possible to compute these solutions near the critical value of B for a single shell configuration.

Figure 1: Plot of the two branches of solutions found for B=2 configurations generated with the rational map ansatz.

The major difference between these configurations and the solutions of Bizon and Chmaj is that the rational map ansatz configurations become more bound when the baryon number increases. This suggests the possibility that giant gravitating Skyrmions can be bound and, consequently, that the Skyrme model can be used to study baryon stars.
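The relation between the two maximum baryon numbers follows from inverting the two fitted scalings of the critical coupling. The sketch below is illustrative only: it assumes the fits $\alpha_{crit} \approx 0.040378/B^2$ (hedgehog) and $\alpha_{crit} \approx 0.040378/B^{1/2}$ (rational map) hold exactly, and the function names are hypothetical.

```python
import math

FIT_CONSTANT = 0.040378  # prefactor quoted for both fitted scalings

def max_b_hedgehog(alpha):
    """Invert alpha_crit = c / B^2 to get the largest B with a hedgehog solution."""
    return math.sqrt(FIT_CONSTANT / alpha)

def max_b_rational_map(alpha):
    """Invert alpha_crit = c / B^(1/2) to get the largest B with a rational map solution."""
    return (FIT_CONSTANT / alpha) ** 2

alpha = 1e-6
b_hedgehog = max_b_hedgehog(alpha)
b_rational = max_b_rational_map(alpha)
print(f"hedgehog:     B_max ~ {b_hedgehog:.1f}")
print(f"rational map: B_max ~ {b_rational:.3e}")
print(f"B_hedgehog^4       = {b_hedgehog**4:.3e}")
```

With these assumed fits the rational map bound equals the fourth power of the hedgehog bound identically, and at $\alpha = 1\times10^{-6}$ it is of the same order as the largest baryon number listed in Table 1.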
Another interesting feature of the data is the observed change in the radius of the solutions with increasing baryon number. We note that the radius grows as approximately $B^{1/2}$. However, there are two main deviations from this. Firstly, the constant of proportionality relating the radius to the square root of the baryon number decreases slightly but persistently as we increase the baryon number, indicating that the gravitational interaction becomes more important as the number of baryons increases. Secondly, as we approach the maximum baryon charge that can exist at $\alpha = 1\times10^{-6}$, we notice that the radius of the Skyrmion actually decreases as we add more baryons. This shows that the gravitational pull plays a crucial role near the critical value of the Skyrmion. This is a tantalising property when one considers that generally a neutron star's radius must decrease for an increase in mass in order to achieve sufficient neutron degeneracy pressure to support the star.

Figure 2: Plot of the decrease in $\alpha_{crit}$ with increasing baryon number, for configurations generated with the rational map ansatz. +: $\alpha_{cr}$ for the minimum value of I; curve: $\alpha_{cr} = 0.0404B^{-1/2}$.

To motivate the further approximation that we will introduce in the next section, we now look at the profiles of the configurations that we have computed. First of all, we observe that the profile function F(x) stays approximately at its boundary value, π, for a finite radial distance before decreasing monotonically over some small region and finally attaining its second boundary value, 0. A similar behaviour is seen for both the mass field µ(x) and the metric field A(x) (see Fig. 3). Furthermore, as we increase the baryon number the structure becomes more pronounced, with the distance before the fields change (shell radius) increasing significantly, whilst the distance over which the fields change (shell width) settles to a constant size.
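The radius scaling discussed in this section can be checked against the single shell data. The sketch below takes a few (B, R) pairs from Table 1 and computes the ratio R/B^{1/2}; the selection of rows and the helper name are choices made here for illustration.

```python
import math

# (B, R) pairs copied from Table 1 (single shell, alpha = 1e-6).
TABLE_1_ROWS = [
    (1e2, 8.6829),
    (1e4, 86.7192),
    (1e6, 864.6968),
    (1e8, 8377.4601),
    (1e9, 23585.5315),
    (1.8e9, 27470.2449),
    (1.952e9, 25937.4210),
]

def radius_over_sqrt_b(rows):
    """Return R / sqrt(B) for each row; a constant value would mean exact B^(1/2) scaling."""
    return [radius / math.sqrt(b) for b, radius in rows]

ratios = radius_over_sqrt_b(TABLE_1_ROWS)
for (b, radius), ratio in zip(TABLE_1_ROWS, ratios):
    print(f"B = {b:.3e}   R = {radius:11.4f}   R/sqrt(B) = {ratio:.4f}")
```

The ratio stays close to 0.87 up to about B = 10^6 and then falls steadily, which is the slow decrease of the proportionality constant described above; between the last two rows the radius itself decreases.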
We conclude that at large baryon numbers, those configurations correspond to hollow shells where the baryons are distributed on a tight lattice over the shell. As such, the structures are nearly spherical, validating our choice of radial metric.

B              R            M_ADM     S_min
1              0.8763       1.2315    1.0000
4              1.7728       1.1365    1.0000
8              2.5065       1.1180    1.0000
100            8.6829       1.0845    0.9999
500            19.3994      1.0827    0.9998
1 × 10^3       27.4314      1.0825    0.9997
1 × 10^4       86.7192      1.0821    0.9989
1 × 10^5       274.0397     1.0814    0.9963
1 × 10^6       864.6968     1.0792    0.9883
1 × 10^7       2715.0729    1.0722    0.9628
1 × 10^8       8377.4601    1.0500    0.88192
1 × 10^9       23585.5315   0.9743    0.6107
1.5 × 10^9     26860.2040   0.9463    0.5020
1.8 × 10^9     27470.2449   0.9302    0.4256
1.81 × 10^9    27456.5804   0.9296    0.4225
1.85 × 10^9    27357.9201   0.9274    0.4090
1.9 × 10^9     27078.6014   0.9246    0.3886
1.95 × 10^9    26126.5508   0.9217    0.3517
1.951 × 10^9   26050.7695   0.9217    0.3495
1.952 × 10^9   25937.4210   0.9216    0.3463

Table 1: Properties of the one shell low energy configuration for $\alpha = 1\times10^{-6}$.

Figure 3: Numerical solutions for the profiles F(x), µ(x) and A(x) when $B = 2\times10^{6}$ and $\alpha = 1\times10^{-6}$.

Such structures immediately pose an interesting question. Can the gravitating Skyrmions exist as shells with more than one layer? To investigate this we note that it is possible to modify the boundary condition (11) to read

$$F(r=0) = N\pi \qquad (19)$$
$$F(r=\infty) = 0 \qquad (20)$$

whilst still ensuring that the Skyrme field is well defined at the origin. This idea was first used in [12] to construct two shell configurations for the pure Skyrme model. The baryon charge is now N times the degree of the rational map. Fig. 4 shows the structure of the solutions we find in this case when N = 2. They suggest that the Skyrmion now exists as an N-layered structure. This is exhibited in the form of the profile, mass and metric functions which interpolate between the boundary values in N distinct steps of equal
size stacked next to each other. We can therefore think of this as a naive way of constructing a gravitating Skyrmion.

Figure 4: Numerical solutions for the profiles F(x), µ(x) and A(x) for 2 layer configurations (F(0) = 2π) when $B = 2\times10^{6}$ and $\alpha = 1\times10^{-6}$.

Instead of using the boundary conditions as in (11) and a rational map of degree B, we consider constructing the B-Skyrmion using a rational map of degree B/N (with the associated value of I) and the boundary condition (20). This is a crude construction as we are effectively considering N adjacent shells of baryons, all with the same baryon number. We might realistically expect that the baryon number per shell and the distribution of shells may vary significantly for the minimum energy configuration. Nevertheless we shall study the properties of such structures. In fact, in the case where the baryon number is large and the number of shells is small, we expect this crude construction to be quite valid. That is, we do not expect the baryon number to change significantly over the few shells at large radius.
B                  R            M_ADM     S_min
4                  1.2898       1.6179    1.0000
8                  1.7858       1.4072    1.0000
100                6.1754       1.1363    0.9999
1 × 10^3           19.4157      1.0913    0.9996
1 × 10^4           61.3207      1.0833    0.9985
1 × 10^5           193.7006     1.0812    0.9949
1 × 10^6           610.6271     1.0779    0.9835
1 × 10^7           1911.3704    1.0680    0.9475
1 × 10^8           5825.2626    1.0362    0.8325
9.0 × 10^8         13736.9982   0.9302    0.4258
9.7 × 10^8         13263.0853   0.9224    0.3644
9.76 × 10^8        12998.2817   0.9217    0.3480
9.764 × 10^8       12931.5189   0.9216    0.3444
9.7647 × 10^8      12895.6984   0.9216    0.3425
9.76472 × 10^8     12891.4247   0.9216    0.3423
9.764724 × 10^8    12889.8645   0.9216    0.3422

Table 2: Table of properties of double layer solutions obtained numerically at $\alpha = 1\times10^{-6}$.

For the remainder of this section we will restrict ourselves to the case where N = 2. Table 2 summarises the properties of double layered gravitating Skyrmions up to the maximum baryon charge allowed at $\alpha = 1\times10^{-6}$. Briefly, we note the main features. Firstly, for all baryon numbers, the radius of the double layered solutions is significantly less than that of their single layered counterparts. This is not surprising as the baryon charge exists over a thicker region and so the mean radius can decrease with the baryon density remaining the same. Secondly, when B is large enough, i.e. when the double layer starts to make sense, the double layer solutions are energetically favourable when one compares the ADM mass with the single layer solutions. Finally we note that the maximum baryon number allowed (at the given coupling) is almost twice as much in the case of the single layer Skyrmions. Of course the results of this section are not really the main regime of interest. We clearly need to study configurations of extremely high baryon number (of order $10^{58}$) relevant for baryon stars. We will now discuss this high baryon number regime.

6 The Ramp-profile Approximation

Unfortunately, at very high baryon numbers, eqns. (14) - (16) become difficult to handle numerically.
This is largely because the radius of the solutions becomes much larger than the distance over which the fields change. That is, we need to integrate over a region whose width is much less than $10^{-16}$ times the radius, and so even double precision data types have insufficient precision. Moreover, single shell configurations are not physically relevant, and multiple shells will only yield a configuration that looks like a star if the number of layers is very large, typically well over $10^{17}$. With such a large number of layers we would not be able to solve the equations numerically, as we would need at least 10 times as many sampling points for the profile functions. We must thus resort to another level of approximation: we approximate the profile functions by profiles that are piecewise linear. This is inspired by the work of Kopeliovich [13] [14], except that our ansatz has to be piecewise linear to be able to generate configurations with a huge number of layers. After defining the ansatz for an arbitrary number of layers, we will show that for a single layer configuration the ansatz produces configurations that are in good agreement with the rational map ansatz configuration. Then we will use the new ansatz to construct configurations that are made out of a very large number of layers. We have shown, in the previous section, that one can construct shell-like structures with very large baryon numbers. At large baryon numbers, the fields are constant nearly everywhere except in a small region corresponding to the shell. In that region, the profiles look like linear functions smoothly linked to the constant parts at the edges (cf. Fig. 3).
Motivated by this we approximate the fields by the ramp-functions

$$F(x) = \frac{N\pi}{2} - (x-x_0)\frac{\pi}{W}, \qquad (x_0 - NW/2) \le x \le (x_0 + NW/2) \qquad (21)$$
$$\mu(x) = \frac{M}{2} + (x-x_0)\frac{M}{NW}, \qquad (x_0 - NW/2) \le x \le (x_0 + NW/2) \qquad (22)$$
$$A(x) = \frac{(1+A_0)}{2} + (x-x_0)\frac{(1-A_0)}{NW}, \qquad (x_0 - NW/2) \le x \le (x_0 + NW/2) \qquad (23)$$

In the above there are four free parameters, namely the central radius x0 of the shell over which the fields change, the width of the shell W, the mass field at spatial infinity M and the value of the metric field at the origin A0 such that limx→∞ = 0. N is the number of layers we wish to study and, as such, is treated as an input parameter. The picture is of a gravitating Skyrmion with very high baryon number existing as N thin layers or shells of small thickness. The above ansatz allows us to find an approximation to the integrated energy. To do this we use the fact that the shell width is much smaller than the radius at large baryon numbers. In particular, to evaluate the action integral we can approximate expressions of the type $G(x)\sin^p F(x)$, for any function G(x) that varies very little over the width of the shell, by $G(x_0)\sin^p F(x)$. We then use the fact that

$$\int_{x_0-NW/2}^{x_0+NW/2} \sin^p F(x)\,dx = \frac{W}{\pi}\int_0^{N\pi} \sin^p y\,dy. \qquad (24)$$

This leads to the following expression for the energy: E = −16πFπ 1 +A0 1 + A0 0 −Mx0 + 1− A0 W 2x0 − MWx0 1 + A0 −W − π − 3IW 16x20 1 + A0 To find the configurations which minimize this energy we first minimised it with respect to A0 and M algebraically in order to find an expression for the energy as a function of the width and radius only. Then we minimised this numerically using Mathematica. We will now discuss the features of these configurations. First of all, we must compare the results obtained with the ramp-profile when N = 1 to those obtained with the full profile. Tables 3 and 4 show the properties of solutions we obtained using the ramp-profile approximation, again at $\alpha = 1\times10^{-6}$. All the general features of the full numerical solutions are reproduced.
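The ramp ansatz can be sketched as a small piecewise-linear function. This is a minimal illustration under stated assumptions: the linear pieces use the forms F = Nπ/2 − (x − x0)π/W, µ = M/2 + (x − x0)M/(NW) and A = (1 + A0)/2 + (x − x0)(1 − A0)/(NW) on the shell, and the constant values outside the shell (F = Nπ, µ = 0, A = A0 inside; F = 0, µ = M, A = 1 outside) are assumed here from the boundary values described in the text. The function name and the sample parameter values are hypothetical.

```python
import math

def ramp_profiles(x, x0, width, layers, mass_inf, a0):
    """Return (F, mu, A) at radius x for the assumed piecewise-linear ramp ansatz."""
    half_span = layers * width / 2.0
    if x <= x0 - half_span:           # interior of the hollow shell (assumed constants)
        return layers * math.pi, 0.0, a0
    if x >= x0 + half_span:           # exterior region (assumed constants)
        return 0.0, mass_inf, 1.0
    t = x - x0                        # offset from the central radius of the shell
    f = layers * math.pi / 2.0 - t * math.pi / width
    mu = mass_inf / 2.0 + t * mass_inf / (layers * width)
    a = (1.0 + a0) / 2.0 + t * (1.0 - a0) / (layers * width)
    return f, mu, a

# Hypothetical parameters, chosen only to exercise the function.
x0, width, layers, mass_inf, a0 = 100.0, 3.0, 2, 5.0, 0.4
centre = ramp_profiles(x0, x0, width, layers, mass_inf, a0)
inner = ramp_profiles(x0 - layers * width / 2 + 1e-9, x0, width, layers, mass_inf, a0)
outer = ramp_profiles(x0 + layers * width / 2 - 1e-9, x0, width, layers, mass_inf, a0)
print("centre:", centre)
print("just inside inner edge:", inner)
print("just inside outer edge:", outer)
```

Evaluating just inside each edge shows that the linear pieces join the assumed constant values continuously, with F dropping by π across each layer of width W.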
In particular, the approximate $B^{1/2}$ scaling and then decrease of the radius, the decreasing ADM mass and the differences between the double and single layer solutions are all exhibited by the data obtained using the ramp-profile approximation.

Quantitatively though, there are some differences. The approximation allows a significant increase in the maximum allowed baryon charge. Also, the radii of configurations obtained using the approximation tend to be smaller than those obtained numerically. If we concentrate on baryon numbers greater than $10^5$, so as to ensure that our approximation, that the width is much smaller than the radius, is valid, then at worst we find a discrepancy in the ADM mass of 11% and in the radius of 7%.

B                  R          W        M_ADM     S_min
100                8.3063     3.1286   1.1023    0.9999
500                18.6031    3.1386   1.1160    0.9997
1 × 10^3           26.313     3.1397   1.1195    0.9996
1 × 10^4           83.206     3.1396   1.1254    0.9987
1 × 10^5           262.94     3.1357   1.1266    0.9960
1 × 10^6           829.60     3.1230   1.1251    0.9872
1 × 10^7           2604.2     3.0825   1.1186    0.9595
1 × 10^8           8032.8     2.9512   1.0972    0.8713
1 × 10^9           22899      2.4837   1.0272    0.5772
2 × 10^9           29121      2.1092   0.9818    0.3645
2.8 × 10^9         29098      1.6623   0.9505    0.1380
2.83 × 10^9        28514      1.6066   0.9495    0.1119
2.839 × 10^9       28024      1.5671   0.94922   0.09373
2.8397 × 10^9      27869.3    1.5556   0.94924   0.08845
2.83975 × 10^9     27869.8    1.5524   0.94925   0.08699
2.839752 × 10^9    27822      1.5521   0.94925   0.08687

Table 3: Properties of the single layer step ansatz configurations for varying baryon number at fixed α = 1 × 10^-6.

B                    R             W        M_ADM    S_min
100                  5.7924        3.0428   1.0692   0.9999
1 × 10^3             18.5788       3.1305   1.1047   0.9995
1 × 10^4             58.8202       3.1380   1.1201   0.9983
1 × 10^5             185.8420      3.1332   1.1246   0.9944
1 × 10^6             585.7950      3.1153   1.1233   0.9820
1 × 10^7             1833.0500     3.0578   1.1143   0.9428
1 × 10^8             5587.3600     2.8688   1.0840   0.8172
9 × 10^8             14147.1782    2.1859   0.9900   0.4065
9.764724 × 10^8      14472.3851    2.1276   0.9837   0.3746
1 × 10^9             14560.5000    2.1092   0.9818   0.3646
1.4 × 10^9           14549.0000    1.6623   0.9505   0.1381
1.41963 × 10^9       13994.0523    1.5644   0.9492   0.0926
1.419635134 × 10^9   13993.2000    1.5643   0.9492   0.0925

Table 4: Properties of the double layer step ansatz configurations for varying baryon number at fixed α = 1 × 10^-6.

In general then, the data seem to confirm the reliability of the ramp-profile approximation. In fact the approach will be even more reliable at the extremely high values of the baryon number that we are interested in. This is because the radius of the solutions is orders of magnitude greater than the width in such a regime, consistent with the approximations we have made. Moreover, whilst searching for minima of the energy does not allow us to probe both branches of solutions, it does allow us to locate the value of $\alpha_{crit}$. We again obtain the approximate trend $\alpha_{crit} \propto B^{-1/2}$ for large $B$.

Now in order to say anything about the possibility of baryon stars in the Skyrme model we need to be able to verify that the decrease in the ADM mass per baryon that we observed at low and moderate baryon numbers extends to baryon numbers of order $10^{58}$ for $\alpha = 7.3 \times 10^{-40}$. Table 5 summarizes our solutions in such a regime. Firstly we consider constructing a single layer self-gravitating Skyrmion with these values. We do indeed see that the configuration is bound. This is verified by checking that the ADM mass is lower (even at this significantly lower value of $\alpha$) than for the $B = 1$ hedgehog. So the possibility of baryon stars in the Einstein-Skyrme model cannot be ruled out on the grounds of energy. The Skyrmion exists as a giant thin shell, and the large baryon charge is distributed as a tight lattice over this. However, a hollow shell is clearly not a realistic construction for a neutron star. This fact manifests itself in the extremely large radius of the configuration. Transferring to standard units, the single layer $B = 10^{58}$ gravitating Skyrmion has a radius of $2.42 \times 10^{10}$ km! To address this issue, we can use a large number of layered Skyrmions as discussed earlier. This has several benefits.
Firstly, as we are distributing the baryon number through a larger volume, at a given baryon density the necessary radius can decrease, similar to what we see in the double layer results. On top of this, we expect the radius to decrease further due to extra gravitational compression, as the outer layers of the Skyrmions feel the attraction of the inner layers. Finally, the many layer approach is also a more realistic construction of a solid baryon star. The results for using more and more layers in the construction (for fixed $B$ and $\alpha$) are also presented in Fig. 5. We note that not only does the radius decrease significantly, but the added gravitational binding further improves the energies of the configurations, reflected in the low ADM masses obtained. There appears to be a critical number of layers beyond which solutions cease to exist and, although the value of $S_{min}$ is close to zero at this point, the star still has not collapsed to form a black hole. Finally, we note that the radius of the Skyrmion at the critical number of layers is approximately 20.91 km. This is comparable to a real neutron star, with a typical radius of 10 km.

We reemphasise here that our approach to embedding shells of baryons is quite crude. For few shells and large baryon number, we might reasonably believe that the baryon number does not change significantly from one shell to the next. However, when we embed many shells we should really consider that the baryon number of the innermost shells would likely be significantly less than that of the outer shells. Nevertheless, our naive embedding has produced some interesting properties. In a future work we hope to improve our multi-layer construction to obtain a more realistic description of a baryon star.
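As a quick consistency check on the tabulated data, the following sketch fits a log-log slope to a few (B, R) pairs copied from Table 3 and a few (N_Shell, R) pairs copied from Table 5. The function name `loglog_slope` and the choice of rows are illustrative assumptions, not part of the original analysis; the fit only uses rows well below the critical values, where the radii appear to follow simple power laws.

```python
import math

# (B, R) pairs copied from Table 3 (single layer, alpha = 1e-6).
table3 = [(1e2, 8.3063), (1e3, 26.313), (1e4, 83.206), (1e5, 262.94)]

# (N_Shell, R) pairs copied from Table 5 (B = 1e58, alpha = 7.3e-40).
table5 = [(1e2, 8.3236e27), (1e4, 8.3236e26), (1e6, 8.3236e25), (1e8, 8.3236e24)]


def loglog_slope(points):
    """Least-squares slope of log10(y) against log10(x)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


slope_B = loglog_slope(table3)  # close to +1/2: R grows roughly like B^(1/2)
slope_N = loglog_slope(table5)  # close to -1/2: R falls roughly like N_Shell^(-1/2)
print(f"d log R / d log B       = {slope_B:.4f}")
print(f"d log R / d log N_Shell = {slope_N:.4f}")
```

Under these assumptions the first slope reproduces the approximate $B^{1/2}$ scaling quoted in the text, and the second quantifies how quickly the radius shrinks as layers are added, before the behaviour changes near the critical number of layers.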
N_Shell             R               W        M_ADM/(6π²B)   S_min
1 × 10^2            8.3236 × 10^27  3.1416   1.1285         1.0000
1 × 10^3            2.6321 × 10^27  3.1416   1.1285         1.0000
1 × 10^4            8.3236 × 10^26  3.1416   1.1285         1.0000
1 × 10^5            2.6321 × 10^26  3.1416   1.1285         1.0000
1 × 10^6            8.3236 × 10^25  3.1416   1.1285         1.0000
1 × 10^7            2.6321 × 10^25  3.1416   1.1285         1.0000
1 × 10^8            8.3236 × 10^24  3.1416   1.1285         1.0000
1 × 10^9            2.6321 × 10^24  3.1415   1.1285         1.0000
1 × 10^10           8.3234 × 10^23  3.1415   1.1285         0.9999
1 × 10^11           2.6319 × 10^23  3.1412   1.1285         0.9997
1 × 10^12           8.3216 × 10^22  3.1402   1.1283         0.9991
1 × 10^13           2.6301 × 10^22  3.1373   1.1278         0.9971
1 × 10^14           8.3034 × 10^21  3.1280   1.1263         0.9907
1 × 10^15           2.6118 × 10^21  3.0986   1.1213         0.9705
1 × 10^16           8.1147 × 10^20  3.0037   1.1057         0.9063
1 × 10^17           2.4001 × 10^20  2.6810   1.0552         0.6977
5 × 10^17           8.2066 × 10^19  1.7888   0.9552         0.2036
5.3 × 10^17         7.4172 × 10^19  1.6227   0.9491         0.1247
5.33 × 10^17        7.1871 × 10^19  1.5625   0.94866        0.0971
5.3306 × 10^17      7.1597 × 10^19  1.5549   0.94868        0.0936
5.33065 × 10^17     7.1525 × 10^19  1.5528   0.948694       0.0927
5.330657 × 10^17    7.1506 × 10^19  1.5523   0.948692       0.0924

Table 5: Properties of the step ansatz configurations for varying number of embedded shells at fixed B = 10^58 and α = 7.3 × 10^-40.

7 Conclusions

Previous work on the Einstein-Skyrme model highlighted a considerable problem with using Skyrmions as a model for baryon stars. Namely, multibaryon hedgehog Skyrmions were simply not energetically favourable states. We have shown that this is simply a consequence of a poor ansatz for the true Skyrmion and, having used the more appropriate rational map ansatz, we have generated energetically favourable configurations of multibaryons. We also observe the interesting property that, near the critical coupling, the Skyrmions can decrease in radius as we add more baryons. This hints at the similar behaviour exhibited by real neutron stars. Although the rational map ansatz does not have an exact radial symmetry, at large scale it does. The anisotropy only appears at the nucleon scale.
Finally, since we started with the motivation of studying baryon stars within the Skyrme model, it is interesting to compare the features of our configurations with those of neutron stars. For realistic values, $B = 10^{58}$ and $\alpha = 7.3 \times 10^{-40}$, we find a minimal energy single layer configuration with radius $2.42 \times 10^{10}$ km. This is clearly too large for a neutron star (which is of order 10 km in radius). This is to be expected, however, due to the shell model we have taken. Firstly, as we are distributing the baryons over the surface area rather than throughout the volume of the star, we naturally must require a much larger star for a given baryon number. This effect is two-fold in that, if we were distributing the baryons throughout the volume, outer layers would feel the attraction of inner layers and enhanced radial compression would occur. The loss of such an effect is pronounced when we are considering realistically small values of the coupling.

It seems therefore that the way to construct baryon stars in the Skyrme model is to consider embedding shells of baryons within shells. This gives rise to more appropriate specifications for the star and is also more realistic. We do indeed observe such improvements for a many layered configuration. In fact the radius of the $B = 10^{58}$ gravitating Skyrmion (at realistic $\alpha$) can be decreased in this manner to approximately 20.91 km. We note however that this approach to shell embedding has only been done naively thus far. We have only considered the case where the baryon number is equal for each shell. We really should allow the baryon number (and hence the rational map quantities) to vary over the shells. One approach towards this would be to assume that the baryon density is constant over the shells. An even better approach would be to allow this to be a smoothly varying function that must be determined by minimising the energy.
This will give a more realistic description of baryon stars within the Einstein-Skyrme model, as traditional descriptions of neutron stars also involve many strata of differing neutron density. We are currently investigating such configurations.

8 Acknowledgements

GIP is supported by a PPARC studentship.

References

[1] P. Bizon & T. Chmaj, “Gravitating Skyrmions”, Phys. Lett. B 297 (1992), 55-62
[2] T. H. R. Skyrme, “A Non-Linear Field Theory”, Proc. Roy. Soc. A 260 (1961), 127-138
[3] T. H. R. Skyrme, “A Unified Theory of Mesons and Baryons”, Nucl. Phys. 31 (1962)
[4] E. Witten, “Global Aspects of Current Algebra”, Nucl. Phys. B 223 (1983), 422-433
[5] N. K. Glendenning, T. Kodama & F. R. Klinkhamer, “Skyrme Topological Soliton Coupled to Gravity”, Phys. Rev. D 38, Number 10 (1988), 3226-3230
[6] M. S. Volkov & D. V. Gal’tsov, “Gravitating Non-Abelian Solitons and Black Holes with Yang-Mills Fields”, Physics Reports 319, Numbers 1-2 (1999), 1-83
[7] H. Luckock & I. Moss, “Black Holes Have Skyrmion Hair”, Phys. Lett. B 176 (1986), 341-
[8] S. Droz, M. Heusler & N. Straumann, “New Black Hole Solutions with Hair”, Phys. Lett. B 268 (1991), 371-376
[9] R. A. Battye & P. M. Sutcliffe, “Multi-Soliton Dynamics in the Skyrme Model”, Phys. Lett. B 391 (1997), 150-156
[10] C. Houghton, N. Manton & P. Sutcliffe, “Rational Maps, Monopoles and Skyrmions”, Nucl. Phys. B 510 (1998), 507-537
[11] R. A. Battye & P. M. Sutcliffe, “Skyrmions, Fullerenes and Rational Maps”, Rev. Math. Phys. 14 (2002), 29-86
[12] N. S. Manton & B. M. A. G. Piette, “Understanding Skyrmions using Rational Maps”, Proceedings of the European Congress of Mathematics, Barcelona 2000, eds. C. Casacuberta et al., Progress in Mathematics Vol. 201, Birkhauser, Basel (2001), 469-479, hep-th/0008110, http://arxiv.org/abs/hep-th/0008110
[13] V. B. Kopeliovich, “The Bubbles of Matter from MultiSkyrmions”, JETP Lett. 73 (2001), 587-591; Pisma Zh. Eksp. Teor. Fiz. 73 (2001), 667-671
[14] V. B. Kopeliovich, “MultiSkyrmions and Baryonic Bags”, J. Phys. G 28 (2002), 103-120

ABSTRACT

We investigate the large baryon number sector of the Einstein-Skyrme model as a possible model for baryon stars. Gravitating hedgehog skyrmions have been investigated previously and the existence of stable solitonic stars excluded due to energy considerations. However, in this paper we demonstrate that by generating gravitating skyrmions using rational maps, we can achieve multi-baryon bound states whilst recovering spherical symmetry in the limit where B becomes large. <|endoftext|><|startoftext|> IEEE TRANSACTIONS ON MOBILE COMPUTING, MANUSCRIPT ID

Many-to-One Throughput Capacity of IEEE 802.11 Multi-hop Wireless Networks

Chi Pan Chan, Student Member, IEEE, Soung Chang Liew, Senior Member, IEEE, and An Chan, Student Member, IEEE

Abstract—This paper investigates the many-to-one throughput capacity (and by symmetry, one-to-many throughput capacity) of IEEE 802.11 multi-hop networks. It has generally been assumed in prior studies that the many-to-one throughput capacity is upper-bounded by the link capacity L. Throughput capacity L is not achievable under 802.11. This paper introduces the notion of “canonical networks”, which is a class of regularly-structured networks whose capacities can be analyzed more easily than unstructured networks. We show that the throughput capacity of canonical networks under 802.11 has an analytical upper bound of 3L/4 when the source nodes are two or more hops away from the sink; and simulated throughputs of 0.690L (0.740L) when the source nodes are many hops away. We conjecture that 3L/4 is also the upper bound for general networks. When all links have equal length, 2L/3 can be shown to be the upper bound for general networks.
Our simulations show that 802.11 networks with random topologies operated with AODV routing can only achieve throughputs far below the upper bounds. Fortunately, by properly selecting routes near the gateway (or by properly positioning the relay nodes leading to the gateway) so that they are fashioned after the structure of canonical networks, the throughput can be improved significantly, by more than 150%. Indeed, in a dense network, it is worthwhile to deactivate some of the relay nodes near the sink judiciously.

Index Terms—wireless mesh networks, many-to-one, one-to-many, data-gathering networks, 802.11, sensor networks, throughput capacity, wireless multi-hop networks.

1 INTRODUCTION

Many-to-one communication is a common communication mode in many multi-hop wireless networks. Two relevant applications are sensor networks and multi-hop wireless mesh networks. In sensor networks, there is often a “data processing center” to which data collected at distributed sensors are to be forwarded. In multi-hop wireless mesh networks, there is an Internet gateway connecting the mesh network to the core wired Internet; the client stations and the Internet gateway form a many-to-one relationship.

This paper investigates the many-to-one throughput capacity of IEEE 802.11 multi-hop networks. In this setting, there are multiple source nodes generating traffic streams to be forwarded to a common sink node via relay nodes. The relay nodes could be sources themselves. By symmetry, the throughput capacity thus found is also the same as that in a one-to-many scenario in which a source node generates multiple distinct data streams to be forwarded to their respective sinks (note: this is not to be confused with the multicast scenario in which the same data is to be forwarded to multiple sinks). For convenience, we shall refer to the sink in the many-to-one scenario as the “center” of the network.
All authors are with the Department of Information Engineering, The Chinese University of Hong Kong, New Territories, Hong Kong. E-mail: C. P. Chan: cpchan4@ie.cuhk.edu.hk, S. C. Liew: soung@ie.cuhk.edu.hk, A. Chan: achan5@ie.cuhk.edu.hk.

There have been many related studies on the capacity of general wireless networks. Gupta and Kumar [1] analyzed the capacity in the many-to-many situation. Their work provides the basic model that can be adapted for use in the analysis of many-to-one communication. As a loose bound, it is obvious that the many-to-one throughput capacity is upper-bounded by L [1]-[3], where L is the single-link throughput capacity, since this is the rate at which the sink can receive data. There is a high probability, however, that the throughput capacity is lower than L for a random network [3].

This paper follows the approach used in [1]-[3] in characterizing which nodes can transmit together without packet collisions. The main difference is that here we are interested in the throughput capacity obtained under the IEEE 802.11 distributed MAC protocol [4]. Specifically, we integrate into our analysis the effects of carrier sensing, the existence of an ACK frame for each DATA frame transmission, and the distributed nature of the CSMA protocol, while [1]-[3] do not, and their bounds are obtained with the implicit assumption of perfectly scheduled transmissions.

There are three main contributions of this paper:

1. We introduce the notion of “canonical networks”, which is a class of regularly-structured networks whose capacities can be analyzed more easily than general unstructured networks. We find that the throughput capacity of canonical networks under 802.11 is upper-bounded by 3L/4 when the source nodes are at least two hops away from the sink. We conjecture that this is also the upper bound for general networks. Indeed, when all the links in the network are of equal length, canonical networks and general networks have the same upper bound of 2L/3.

2. We find that canonical networks give much insight into how a many-to-one network should be designed in general. Our simulations show that 802.11 networks with random topologies operated with AODV routing can only achieve throughputs far below the upper bound of canonical networks. However, if we route the traffic in accordance with the optimized routes obtained from an optimization algorithm, the routes near the center have a structure similar to that of the optimal canonical network structure. In other words, as a principle, routing or network design near the center should be fashioned after the canonical network. Our further investigation shows that a “manifold” canonical network structure near the center may yield a throughput improvement of more than 150% relative to that obtained by using AODV routing in a general network structure. Indeed, in a dense network, it is worthwhile to deactivate some of the relay nodes near the sink judiciously.

3. We find that ensuring the many-to-one network is hidden-node free (HNF) in our design leads to higher throughputs as compared to not doing so. This is in contrast to the many-to-many case, in which the large carrier-sensing range required to ensure the HNF property may lower the network throughput due to the increased exposed-node problem [5]. This observation is used as a design principle in much of the study in 1 and 2 above.

The rest of this paper is organized as follows. Section II provides the definitions and assumptions used in our analysis. Section III derives the throughput capacities of canonical networks, and presents simulation results to support our findings.
In addition, we demonstrate the desirability of ensuring the HNF property in many-to-one networks. Section IV investigates general networks not restricted to the canonical network structure. We show that the optimal routing in general networks results in a subset of selected routes that form a structure near the center that resembles the optimal canonical network. We then apply this insight to demonstrate the desirability of designing the network according to a “manifold” canonical-network structure near the center. Section V concludes this paper.

2 DEFINITIONS AND ASSUMPTIONS

Let us first provide some definitions used in our analysis.

Definition 1: The source nodes are nodes that generate data traffic.

Definition 2: The sink node is the center to which the data collected at the source nodes are to be forwarded.

Definition 3: The relay nodes relay data traffic from the source nodes to the sink node.

Note that a node can be classified as one of the following: 1) a source node; 2) a sink node; 3) a relay node; or 4) both a source node and a relay node.

Definition 4: Given a network topology, the uniform throughput capacity $C_u$ with respect to a set of source nodes and a sink node is the maximum total rate at which the data can be forwarded to the sink node, with an equal amount of traffic from each source node to the sink node. The throughput capacity, $C_m$, on the other hand, does not require equal amounts of traffic from sources to sink. Thus, in general, $C_m \ge C_u$.

Fig. 1 shows a simple example of a network consisting of three nodes. Suppose that node 2 is the sole source node and node 1 is the relay node that forwards packets from node 2 to node 0. Node 1 does not generate traffic by itself. Then, $C_m = L/2$, where L is the capacity of one link. This is because node 1 cannot receive and transmit at the same time (the typical assumption that wireless links are half-duplex). Also, since there is only one source node, $C_m = C_u$.
If node 1 is also a source node in addition to being a relay node, then $C_m = L$ (obtained when only node 1 is allowed to transmit), and $C_u = 2L/3$, with nodes 1 and 2 having a throughput of L/3 each. Since node 1 needs to serve as the relay node for node 2, node 1 will need to transmit twice as often as node 2. So, proper scheduling is required.

Now, let us generalize the above linear network [7] to one consisting of (n+1) nodes, in which there are n source nodes with (n-1) of them also being relay nodes. Then, $C_u$ can be obtained as follows. Node 1 will transmit to node 0, the sink node, at rate $C_u$. Node 2 will transmit to node 1 at rate $C_u (n-1)/n$, and so on. In general, node (i+1) transmits to node i at rate $C_u (n-i)/n$. We note that when node i transmits, nodes (i+1) and (i+2) cannot: node (i+1) cannot transmit because its receiver, node i, is itself transmitting, and node (i+2) cannot transmit because the reception at node (i+1) will be corrupted by the transmission by node i. So, considering the transmissions of nodes 3, 2, and 1 (which form the bottleneck), we have

$\sum_{i=1}^{3} C_u (n - i + 1)/n = L.$

That is, $C_u = Ln/(3n-3) \approx L/3$ for large n. We note that L/3 is also the $C_m$ if node n were the only source node. As a matter of fact, $C_u = C_m = L/3$ if the source nodes in the linear network were nodes i for $i \ge 3$ only. Thus, for reasonably large n, if the traffic from nodes 1 and 2 is only a small fraction of the total traffic to the sink, we could treat nodes 1 and 2 as pure relay, non-source, nodes. Once we do that, we no longer have to distinguish between $C_u$ and $C_m$.

We next consider a general many-to-one network, such as that in Fig. 2. For the study of many-to-one networks in this paper, we focus on the case where the source nodes are two or more hops away from the sink. This is a good approximation when the nodes within one hop of the sink only generate a small fraction of the total traffic.
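The bottleneck argument for the linear chain can be sketched in a few lines of Python. The function name `uniform_capacity_linear_chain` and the use of exact fractions are illustrative choices rather than anything prescribed by the paper; the sketch simply sums the airtime shares of nodes 1, 2 and 3 (or fewer, if the chain is shorter) and requires them to fill the link capacity L.

```python
from fractions import Fraction


def uniform_capacity_linear_chain(n, L=Fraction(1)):
    """Uniform throughput capacity C_u of a linear chain with n source nodes.

    Nodes are numbered 1..n, node 0 is the sink, and every node i also relays
    the traffic of nodes i+1..n. Node i therefore transmits at rate
    C_u * (n - i + 1) / n. Nodes 1, 2 and 3 cannot transmit at the same time,
    so their rates must add up to at most L.
    """
    if n < 1:
        raise ValueError("need at least one source node")
    bottleneck_nodes = range(1, min(n, 3) + 1)
    load_per_unit_cu = sum(Fraction(n - i + 1, n) for i in bottleneck_nodes)
    return L / load_per_unit_cu


for n in (2, 3, 10, 1000):
    c_u = uniform_capacity_linear_chain(n)
    print(f"n = {n:4d}: C_u = {c_u} L (about {float(c_u):.4f} L)")
```

For n = 2 this reproduces the value 2L/3 of the three-node example, and for n of at least 3 it agrees with Ln/(3n-3), approaching L/3 as n grows.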
Definition 5: The throughput capacity with respect to a multi-access protocol p (e.g., IEEE 802.11), $C_p$, is the total rate at which the data can be forwarded to the sink node using that protocol, assuming the source nodes are two or more hops away from the sink. The transmission schedule of the links is dictated by the protocol.

This paper focuses on the throughput capacity under the 802.11 CSMA protocol, $C_{802.11}$. Henceforth, by throughput capacity, we mean $C_{802.11}$. For illustration, let us consider the two-chain linear topology shown in Fig. 3. Suppose that only nodes 2 and 2’ are the source nodes. Under “perfect scheduling”, nodes 1 and 2’ will transmit together, and nodes 1’ and 2 will transmit together. This results in a throughput capacity of L. Under 802.11, however, the transmissions are usually not perfectly aligned in time. In addition, a DATA frame is followed by an ACK frame in the reverse direction. Suppose nodes 1 and 2’ transmit together. Say the transmission of the DATA frame of node 1 completes first, while the transmission of node 2’ is ongoing. When node 0 returns an ACK to node 1, this ACK also reaches node 1’, the receiver of the transmission from node 2’, causing a collision there. Thus, under 802.11, simultaneous transmissions by nodes 1 and 2’ will usually result in a collision unless the completion times of their DATA transmissions are perfectly aligned, which is rare. In this case, $C_{802.11}$ is at best 2L/3, since at best nodes 2 and 2’ can transmit together, and nodes 1 and 1’ will need to transmit at separate times.

For many-to-one networks, the capacity bottleneck is likely to be near the sink node because all traffic travels toward the sink node. Specifically, nodes near the sink node are responsible for forwarding more traffic, and these nodes contend for access to the wireless medium because they are close to each other.
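The two schedules discussed for the two-chain topology can be compared with a small sketch. It assumes an idealised slotted model in which every transmission occupies exactly one slot; this is a simplification introduced here for illustration (802.11 itself is not slotted this way), and the names `NEXT_HOP` and `sink_throughput` are hypothetical. The function does not decide which nodes may transmit together; it takes a periodic schedule as given and only counts deliveries to the sink.

```python
from fractions import Fraction

# Next hop of each node in the two-chain topology of Fig. 3 (the sink is "0").
NEXT_HOP = {"2": "1", "1": "0", "2'": "1'", "1'": "0"}


def sink_throughput(schedule, L=Fraction(1)):
    """Throughput at the sink for a periodic slot schedule.

    `schedule` is a list of slots; each slot is the set of nodes that each
    transmit one packet during that slot. Feasibility of the slots (which
    nodes can really transmit together) is an input assumption, not checked.
    """
    sent = {node: 0 for node in NEXT_HOP}
    for slot in schedule:
        for node in slot:
            sent[node] += 1
    # Flow conservation: over one period a relay forwards what it receives.
    for node, nxt in NEXT_HOP.items():
        if nxt != "0" and sent[node] != sent[nxt]:
            raise ValueError(f"relay {nxt} does not forward what it receives")
    delivered = sum(count for node, count in sent.items() if NEXT_HOP[node] == "0")
    return L * Fraction(delivered, len(schedule))


# Perfect scheduling: nodes 1 and 2' together, then nodes 1' and 2 together.
perfect = [{"1", "2'"}, {"1'", "2"}]
# 802.11 with ACKs: nodes 2 and 2' together, nodes 1 and 1' in separate slots.
dot11 = [{"2", "2'"}, {"1"}, {"1'"}]

print("perfect scheduling:", sink_throughput(perfect), "L")
print("802.11 best case:  ", sink_throughput(dot11), "L")
```

Under this simplified model the first schedule delivers one packet to the sink per slot, while the second delivers two packets every three slots, matching the values L and 2L/3 quoted above.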
To obtain an idea of the upper limit of the throughput capacity under 802.11, we consider a class of networks referred to as the canonical networks. An example of a canonical network is shown in Fig. 4. We show that 3L/4 is the upper bound of the throughput capacity of canonical networks, and conjecture that this is also the upper bound for networks with general structures. We will motivate the study of the canonical networks shortly. In the special case in which all links have equal length, the throughput capacities of the canonical network as well as general networks are upper-bounded by 2L/3. We now define the canonical networks.

Definition 6: A chain is formed by a sequence of at least three nodes leading to the center sink node. Traffic is forwarded from one node to the next node in the sequence on its way to the sink node. A linear chain is a chain which is a straight line. In Fig. 4, for example, there are eight linear chains.

Definition 7: An i-hop node is a node that is i hops away from the sink node in a chain (see Fig. 4).

Definition 8: A canonical network is formed by a number of linear chains leading to a common center sink node; the nodes in different chains are distinct except the sink node. In addition, the distance between an i-hop node and an (i-1)-hop node, $d_i$, is the same for all the linear chains (see Fig. 4).

Definition 9: A ring is a circle centered on the sink node. An i-hop ring consists of all the i-hop nodes of the different linear chains in a canonical network (see Fig. 4).

Motivation for the Study of Canonical Networks

Canonical networks have regular structures and can be analyzed more easily than general networks. We conjecture that the upper bound of throughput capacity obtained for canonical networks is also the upper bound for general networks, because intuitively canonical networks model a rich class of networks, the optimal of which may yield very good throughput performance.
Consider the following intuitive argument. (i) In a densely populated network (say, infinitely dense), we may choose to form linear chains from the source nodes to the center sink node for routing purposes. Since the direction of traffic flow is pointed exactly to the center, there is no “wastage” with respect to the case in which the routing direction is at an angle to the center. (ii) We have defined the class of canonical networks to be quite general in that we do not restrict the number of linear chains in it. Neither do we limit the distance $d_i$. In deriving the capacity of the canonical network later, we allow for the possibility of an infinite number of linear chains and arbitrarily small $d_i$. This provides us with a high degree of freedom in identifying the best-structured canonical networks. The above intuitive reasoning will be validated by simulation results later. In addition, we will show later that in a random network with many nodes (so that there is a high degree of freedom in forming routes), establishing a canonical-network-like structure near the center for routing purposes will generally lead to superior throughput performance.

Fig. 1. Simple network example.
Fig. 2. A random many-to-one network.
Fig. 3. A two-chain many-to-one network with equal link length.
Fig. 4. A Canonical Network.

In this paper, unless otherwise stated, we further assume the following:

Assumptions:

(1) The nodes and links are homogeneous. They are configured similarly, i.e., same transmission power, carrier-sensing range (CSRange), transmission rate, etc.

(2) ACK is sent by the receiver when a packet is received successfully, as per the 802.11 DCF operation.

(3) The following constraints apply to simultaneous transmissions [1][6].
Consider two links $(T_1, R_1)$ and $(T_2, R_2)$. For simultaneous transmissions without collisions, they must satisfy all the eight inequalities below:

$|X_{T_2} - X_{R_1}| > (1 + \Delta)\,|X_{T_1} - X_{R_1}|$
$|X_{R_2} - X_{R_1}| > (1 + \Delta)\,|X_{T_1} - X_{R_1}|$
$|X_{T_2} - X_{T_1}| > (1 + \Delta)\,|X_{T_1} - X_{R_1}|$
$|X_{R_2} - X_{T_1}| > (1 + \Delta)\,|X_{T_1} - X_{R_1}|$
$|X_{T_1} - X_{R_2}| > (1 + \Delta)\,|X_{T_2} - X_{R_2}|$
$|X_{R_1} - X_{R_2}| > (1 + \Delta)\,|X_{T_2} - X_{R_2}|$
$|X_{T_1} - X_{T_2}| > (1 + \Delta)\,|X_{T_2} - X_{R_2}|$
$|X_{R_1} - X_{T_2}| > (1 + \Delta)\,|X_{T_2} - X_{R_2}|$ (1)

where $X_i$ is the location of node i, $|X_i - X_j|$ is the distance between $X_i$ and $X_j$, and $\Delta > 0$ is the distance margin (see next paragraph). These are the physical constraints that prevent DATA-DATA, DATA-ACK and ACK-ACK collisions.

The received power function can be expressed in the form of

$P(d) \propto P_t / d^{\alpha}$, (2)

where $P_t$ is the transmission power, d is the distance and $\alpha$ is the path-loss exponent, which typically ranges from 2 to 6 according to different environments [8]. Assume that all the nodes have the same transmission power, that $\alpha = 4$, and that the Signal-to-Interference Ratio (SIR) requirement is 10 dB. Then at $R_1$, we require

$\frac{P(|X_{T_1} - X_{R_1}|)}{P(|X_{T_2} - X_{R_1}|)} > 10$ (3)

giving

$\frac{|X_{T_2} - X_{R_1}|}{|X_{T_1} - X_{R_1}|} > 10^{1/4} = 1.78$

In other words, $\Delta = 0.78$. Unless otherwise stated, we assume $\Delta = 0.78$ throughout this work.

(4) In 802.11 networks, there are two types of packet collisions: collisions due to hidden nodes (HN) (see explanation of assumption (5) below or [6]), and collisions due to simultaneous countdown to zero in the backoff period of the MAC of different transmitters. In much of our throughput-capacity analysis, we will neglect the latter collisions and assume that they have only small effects on throughput capacity, a fact which has been borne out by simulations and which can be understood through intuitive reasoning, particularly for a network in which a node is surrounded by only a few other active nodes that may collide with it.
As will be shown later in this paper, this is generally a characteristic of a network with good throughput performance (see results of Fig. 14 and Fig. 18, for example). Also, an upper bound on throughput capacity obtained by neglecting the countdown collisions is still a valid upper bound. It is a good upper bound so long as it is tight. We will see later that the upper bounds we obtain are reasonably tight when verified against simulation results in which countdown collisions are taken into account. In the remainder of this paper, unless otherwise stated, the term “collisions” refers to collisions due to HN (i.e., caused by the failure of carrier-sensing) rather than simultaneous countdown to zero.

(5) In this paper, unless otherwise specified, we assume the so-called Hidden-Node Free Design (HFD) [6] in the network. That is, we design the network such that simultaneous transmissions that will cause collisions can be carrier-sensed by transmitters and be avoided. A reason for this assumption is that for many-to-one communication, eliminating hidden nodes is worthwhile (see simulation results in Section III-C). According to [6], HFD requires
(i) Use of Receiver Restart (RS) Mode, and
(ii) Sufficiently large CSRange.

This paper assumes the 802.11 basic access mode, i.e., RTS/CTS is not used. We briefly describe the HFD requirements for understanding of the analysis later. More details can be found in [6].

Fig. 5 is an example showing that no matter how large CSRange is, the hidden node (HN) phenomenon can still occur in the absence of RS. In the figure, $T_1$ and $T_2$ are more than CSRange apart, and so simultaneous transmissions can occur. Furthermore, the SIR is sufficient at $R_1$ and $R_2$ so that no “physical collisions” occur. But HN can still happen, as described below. Assume $T_1$ starts first to transmit a DATA packet to $R_1$.
After the physical-layer preamble of the packet is received by R2, R2 will “capture” the packet and will not attempt to receive another new packet while T1’s DATA is ongoing. If at this time T2 starts to transmit a DATA to R2, R2 will not receive it and will not reply with an ACK to T2, causing a transmission failure on link (T2, R2). This is the default receiver mode assumed in the NS-2 simulator [10] and most 802.11 commercial products. Note that the example in Fig. 5 is independent of the size of CSRange.

This HN problem can be solved with the Receiver Restart Mode (RS), which can be enabled in some 802.11 products (e.g., Atheros Wi-Fi chips; however, the default is that this mode is not enabled). With RS, a receiver will switch to receive the stronger packet if its power is Ct times greater than that of the current packet (say, 10 dB higher). The example in Fig. 5 will not give rise to HN with RS if CSRange is sufficiently large.

RS Mode alone, however, cannot prevent HN without sufficiently large CSRange. To see this, consider the example in Fig. 6. Assume T1 transmits a DATA to R1 first. During the DATA’s period, T2 starts to send a shorter DATA packet to R2. With RS Mode, R2 switches to receive T2’s DATA and sends an ACK after the reception. If T1’s DATA is still in progress, R2’s ACK will corrupt the DATA at R1, since the distance between R1 and R2 is within the interference range (1+∆)dmax. To prevent T2 from transmitting (and hence the collision), the following must be satisfied:

$$|X_{T_1} - X_{T_2}| \le \text{CSRange}. \tag{4}$$

Reference [6] proved that in general if CSRange > (3+∆)dmax, where dmax is the maximum link length, then HN can be prevented in any network. However, for a specific network topology, e.g., the canonical network, the required CSRange can be smaller.

Throughout this work, we primarily focus on the pair-wise-interference model [1][6].
The concept of CSRange and the constraints in (1) rely on this assumption. An analysis that takes into account from the outset the simultaneous interference from more than one source would complicate things significantly. So, given a network topology, our approach is to first identify the capacity based on pair-wise interference analysis only, and then verify that the capacity is still largely valid under multiple interferences (this verification is done in Section III-D).

3 CANONICAL NETWORKS

In this section, we derive the throughput capacities of canonical networks. Section A analyzes two kinds of canonical networks: equal link-length and variable link-length networks. Simulation results are presented and discussed in Section B. Section C compares the performance of HFD and non-HFD networks, and Section D verifies the results under multiple interferences.

3.1 Theoretical Analysis

(1) Equal Link-Length Networks

We first consider the case where all links have the same length d, i.e., d0 = d1 = … = d. Theorem 1, which follows from Lemma 1 and Corollary 1 below, shows that the throughput capacity in this network is upper-bounded by 2L/3, where L is the single-link throughput.

Lemma 1: Given three nodes on the periphery of a circle of radius d, we can identify two nodes with distance smaller than (1+∆)d between them.

Proof: The three nodes form the vertices of a triangle. Consider the equilateral triangle inscribed in the circle of radius d, and let t be the length of one side (see Fig. 7). Then

$$t = 2d\cos(\pi/6) = \sqrt{3}\,d \approx 1.732d < (1+\Delta)d.$$

That is, it is not possible to inscribe in the circle a triangle with all sides no less than (1+∆)d.

Corollary 1: At most two 2-hop nodes can transmit at the same time.

Proof: With reference to Fig. 8, suppose that three 2-hop nodes can transmit together.
For the ACK of any 1-hop node not to interfere with the reception of the DATA packet of another transmission, the distances between the three 1-hop nodes must all be larger than (1+∆)d. By Lemma 1, this is not possible.

Fig. 5. Lack of RS Mode leads to HN no matter how large CSRange and SIR are.
Fig. 6. With RS Mode, CSRange not sufficiently large still leads to HN due to insufficient SIR.
Fig. 7. Equilateral triangle inscribed in a circle.

Theorem 1: For equal-link-length canonical networks, $C_{802.11} \le 2L/3$, where L is the link capacity.

Proof: Define the “airtime” usage of a node to include the transmission time of DATA packets as well as the ACK from the receiver [7]. Let Sij be the airtime occupied by the transmission of the i-hop node on the j-th chain over a long time interval [0, Time]. Let S1 be the union of the airtimes S1j occupied by all 1-hop nodes. Similarly, let S2 be the union of the airtimes S2j occupied by all 2-hop nodes. That is,

$$S_1 = S_{11} \cup S_{12} \cup \dots \cup S_{1N} \quad\text{and}\quad S_2 = S_{21} \cup S_{22} \cup \dots \cup S_{2N}.$$

We further define xij = |Sij|/Time. By definition,

$$|S_1 \cup S_2| \le \text{Time}. \tag{5}$$

According to assumption (3), when any 1-hop node transmits, none of the other 1-hop nodes or 2-hop nodes can transmit at the same time if collisions are not to happen. Thus, if carrier-sensing works perfectly and collisions due to simultaneous countdown to zero in the 802.11 backoff algorithm are negligible (see assumptions (4) and (5) in Section II), then

$$S_1 \cap S_2 = \emptyset \tag{6}$$

and

$$S_{1i} \cap S_{1j} = \emptyset \quad\text{for } i \ne j. \tag{7}$$

This implies

$$|S_1| + |S_2| = |S_1 \cup S_2| \le \text{Time}, \tag{8}$$

$$|S_1| = |S_{11}| + |S_{12}| + \dots + |S_{1N}|. \tag{9}$$

By Corollary 1,

$$|S_2| \ge \frac{|S_{21}| + |S_{22}| + \dots + |S_{2N}|}{2}. \tag{10}$$

Recall that we assume that the 1-hop nodes are relay nodes that do not generate data (see Definition 5 and the justification before that in Section II).
All traffic transmitted by 1-hop nodes must therefore come from 2-hop nodes. By the “no collision” assumption, the sum of the airtimes of 1-hop nodes must not be greater than the sum of the airtimes of 2-hop nodes. We have

$$|S_{11}| + |S_{12}| + \dots + |S_{1N}| \le |S_{21}| + |S_{22}| + \dots + |S_{2N}|. \tag{11}$$

From (8)-(10), we have

$$|S_{11}| + |S_{12}| + \dots + |S_{1N}| + \frac{|S_{21}| + |S_{22}| + \dots + |S_{2N}|}{2} \le \text{Time}.$$

Applying (11) and dividing by Time, we get

$$(x_{11} + x_{12} + \dots + x_{1N}) + \frac{x_{11} + x_{12} + \dots + x_{1N}}{2} \le 1,$$

giving

$$x_{11} + x_{12} + \dots + x_{1N} \le \frac{2}{3},$$

where $(x_{11} + x_{12} + \dots + x_{1N})L$ is the throughput.

We now show a specific schedule on a 2-chain network which achieves the capacity of 2L/3. Consider the topology shown in Fig. 9. There are two chains, with link distance d and CSRange = 2.9d, which removes HN. Recall that the general HFD has two requirements, (i) RS mode and (ii) CSRange > (3+∆)dmax [6]. For the topology in Fig. 9, it turns out that CSRange = 2.9d is enough. The numbers shown on the links in Fig. 9 represent a possible transmission schedule. Links with the same number transmit at the same time. Following this pattern, the throughput capacity of 2L/3 is “potentially” achievable. Our simulation results in Subsection B below show that the 802.11 protocol throughput capacity is below but close to this upper bound.

The reader may be curious as to why we did not use a “symmetric” 2-chain network (where the angle between the chains is π) as the illustrating example above. It turns out that the symmetric structure cannot achieve the throughput of 2L/3 if there are source nodes four or more hops away. To see this, first we note that for a symmetric 2-chain network, CSRange must be at least 3d to ensure HFD in the areas around the sink node (see the discussion of the example in Fig. 3 in Section II).
Given CSRange = 3d, each of the chains (assuming a long chain with more than four hops, or five nodes) cannot have a throughput of L/3, as can be verified by analysis of one linear chain [7], [9].

Before going to the next subsection, we note that Theorem 1 actually applies not just to canonical networks (the proof does not require the canonical structure), but to general networks in which (i) all links are of the same length; and (ii) source nodes are two or more hops away from the center. In other words, the chains leading to the data center need not be straight-line linear chains. Thus, Theorem 1 can be stated more generally as Theorem 1’ below:

Theorem 1’: For equal-link-length general networks, $C_{802.11} \le 2L/3$, where L is the link capacity.

Proof: Same as Theorem 1, since Lemma 1 and Corollary 1 also apply to general networks with equal link length.

(2) Variable Link-Length Networks

In this subsection, we consider canonical networks in which the distance between adjacent rings can be varied (i.e., d0, d1, … may be distinct). With this assumption, the capacity is upper-bounded by 3L/4. This is proved in Theorem 2, after Lemma 2 below.

Fig. 8. At most two simultaneous transmissions from 2-hop nodes.
Fig. 9. Example of equal-link-length topology, CSRange=2.9d.

Lemma 2: At most three 2-hop nodes can transmit at the same time.

Proof: Assume the contrary, that four 2-hop nodes belonging to four different chains can transmit at the same time. With respect to Fig. 10, consider the four straight lines formed by the four nodes to the center (note: the network could have more chains; we are focusing only on the four chains of the four 2-hop nodes in question). Four angles are formed between adjacent lines. Let θ ≤ π/2 be the minimum of the four angles. Four angles are also formed between non-adjacent lines.
Let β ≤ π be the angle encompassing θ (see Fig. 10). For simultaneous transmissions of 2-hop nodes, the transmitters should not be able to carrier-sense each other. This implies an upper bound for CSRange as follows:

$$\text{CSRange} < 2(d_0 + d_1)\sin\frac{\theta}{2}. \tag{12}$$

In addition, by assumption (5), to prevent collisions of 1-hop nodes and 2-hop nodes, they should be able to carrier-sense each other. This implies a lower bound for CSRange. By (4),

$$\text{CSRange} \ge \sqrt{(d_0 + d_1)^2 + d_0^2 - 2d_0(d_0 + d_1)\cos\beta}. \tag{13}$$

By assumption (3), the receivers of simultaneous transmissions should not violate the physical constraints. By (1),

$$(1+\Delta)\,d_1 < 2d_0\sin\frac{\theta}{2}. \tag{14}$$

Since there are four chains, θ ≤ π/2 and β ≤ π. From the definitions of θ and β, we have

$$2\theta \le \beta \le \pi. \tag{15}$$

From (13) and (15),

$$\text{CSRange} \ge \sqrt{(d_0 + d_1)^2 + d_0^2 - 2d_0(d_0 + d_1)\cos(2\theta)}. \tag{16}$$

Let d1 = α d0. We can form two inequalities from (12), (14) and (16):

$$\alpha < \frac{\sqrt{2(1-\cos\theta)}}{1+\Delta}, \tag{17}$$

$$
\begin{cases}
\alpha > \dfrac{1-2\cos^2\theta}{1-2\cos\theta} + \dfrac{\sqrt{(2\cos^2\theta-1)^2 + 1 - 2\cos\theta}}{1-2\cos\theta} - 1 \\[2ex]
\alpha < \dfrac{1-2\cos^2\theta}{1-2\cos\theta} - \dfrac{\sqrt{(2\cos^2\theta-1)^2 + 1 - 2\cos\theta}}{1-2\cos\theta} - 1
\end{cases}
\tag{18}
$$

Fig. 11 shows the plot of (17) and (18) when ∆ = 0.78. The shadowed region is the area of solution. From the plot, θ > 1.73 > π/2. This contradicts θ ≤ π/2. Thus, there can be at most three simultaneous 2-hop transmissions.

Theorem 2: For variable-link-length canonical networks, $C_{802.11} \le 3L/4$, where L is the link capacity.

Proof: Similar to the proof of Theorem 1. From Lemma 2,

$$|S_2| \ge \frac{|S_{21}| + |S_{22}| + \dots + |S_{2N}|}{3}. \tag{19}$$

Hence,

$$(x_{11} + x_{12} + \dots + x_{1N}) + \frac{x_{11} + x_{12} + \dots + x_{1N}}{3} \le 1,$$

or

$$x_{11} + x_{12} + \dots + x_{1N} \le \frac{3}{4},$$

where $(x_{11} + x_{12} + \dots + x_{1N})L$ is the throughput.

Fig. 12 shows an example of a canonical network. The CSRange has to be set larger than 2.62d0 and smaller than 3.417d0. The numbers on the links show a possible transmission schedule that achieves the capacity of 3L/4.¹ Our simulation results in Subsection B below show that the 802.11 throughput capacity is below but close to this upper bound.
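The contradiction in Lemma 2 can also be checked numerically. The sketch below is not part of the original analysis. It normalizes d0 = 1, writes d1 = α, and scans a grid of (θ, α) for points where some CSRange can lie below the 2-hop transmitter separation 2(1+α)sin(θ/2), at or above the distance between a 1-hop and a 2-hop node separated by angle 2θ, while the 1-hop receivers are more than (1+∆)α apart. The grid resolution and the function names are assumptions made for illustration.

```python
import math

DELTA = 0.78  # distance margin used in the paper (alpha = 4, SIR = 10 dB)

def feasible(theta: float, alpha: float, delta: float = DELTA) -> bool:
    """True if some CSRange satisfies (12), (14) and (16) with d0 = 1, d1 = alpha."""
    # (12): CSRange must be below the distance between two 2-hop transmitters.
    upper = 2.0 * (1.0 + alpha) * math.sin(theta / 2.0)
    # (16): CSRange must reach from a 1-hop node to a 2-hop node at angle 2*theta.
    lower = math.sqrt((1.0 + alpha) ** 2 + 1.0
                      - 2.0 * (1.0 + alpha) * math.cos(2.0 * theta))
    # (14): 1-hop receivers must be farther apart than (1 + delta) * d1.
    receivers_ok = (1.0 + delta) * alpha < 2.0 * math.sin(theta / 2.0)
    return receivers_ok and lower < upper

def smallest_feasible_theta(theta_steps: int = 1000, alpha_steps: int = 1000):
    """Scan theta in (0, pi] and alpha in (0, 2]; return the first feasible theta."""
    for i in range(1, theta_steps + 1):
        theta = math.pi * i / theta_steps
        for j in range(1, alpha_steps + 1):
            if feasible(theta, 2.0 * j / alpha_steps):
                return theta
    return None

theta_min = smallest_feasible_theta()
print(f"smallest feasible theta on the grid: {theta_min:.3f} rad")
print(f"pi/2 = {math.pi / 2:.3f} rad")
```

With ∆ = 0.78 the scan finds no feasible point for θ ≤ π/2, and the first feasible θ appears near 1.73, in line with the region read off Fig. 11. Four chains force θ ≤ π/2, so four simultaneous 2-hop transmissions are ruled out.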
In the analysis of canonical networks, we have assumed that the loss exponent is 4, corresponding to ∆ = 0.78. In outdoor environments, the typical value of the loss exponent is in the range 2 to 4. A similar analytical technique can be used to find the throughput capacities for those values. Since a smaller loss exponent implies a larger ∆ (larger interference), the throughput capacity under the assumption of loss exponent 4 serves as an upper bound for the throughput capacity in outdoor environments.

¹ For the one-to-many network (i.e., the sink becomes the source, and the sources become the sinks with respect to the many-to-one case here), some parameters should be changed to attain the capacity of 3L/4. Specifically, CSRange = 1.7d0, and di = 0.7d0 for i = 1, 2, … The derivation method for the capacity of the one-to-many case is similar to that in the many-to-one case here.

Fig. 10. Example of 4-chain canonical network (labels: d0, d1, and the distances constrained by (12), (13) and (14)).
Fig. 11. Plot of Inequalities (17) and (18).

3.2 Simulation

We use the network simulator NS2 [10] to simulate the canonical network shown in Fig. 12. As shown in Subsection A, for the 3-chain canonical network, $C_{802.11} \le 3L/4$. In the simulation, the RS Mode is enabled. Table I shows the details of the simulated configuration. Only the n-hop nodes at the boundary are source nodes that generate data. Offered load control is applied to prevent them from injecting too much traffic into the network. For the interested reader, it has been shown in [7] that offered-load control can yield higher throughput in multi-hop networks.

Fig. 13 shows the simulation result assuming the set-up of Table I. The x-axis is the number of nodes per chain, including the sink. Given a number of nodes per chain, we vary the offered load in the simulation to identify an offered load that achieves the highest average throughput.
When the number of nodes per chain is 3, i.e., the 2-hop nodes are the source nodes, the throughput is 4.62 Mbps (0.740L), which is very close to the theoretical capacity 3L/4, where the link capacity L is around 6.24 Mbps as obtained by simulating a single link. But when the number of nodes per chain increases, the throughput drops to 4.30 Mbps (0.690L).

An explanation for this phenomenon is that the scheduling scheme of IEEE 802.11 does not produce the optimal transmission schedule presented in Fig. 12, which is needed to achieve the 3L/4 upper bound. That is, the random backoff countdown in 802.11 causes imperfect scheduling. In Fig. 12, it is possible for 2-hop and 3-hop nodes of different chains to transmit at the same time in 802.11, since they are out of each other’s carrier-sensing range. To achieve capacity 3L/4, however, all the 2-hop nodes must transmit together. A 3-hop transmission may prevent this, resulting in only some of the 2-hop nodes transmitting together. In other words, there are times when not all 2-hop nodes transmit together, meaning |S2| cannot reach the lower bound in (19). Meeting the lower bound, however, is essential to achieving the optimal throughput 3L/4.

Fig. 14 shows the simulation results of canonical networks with different numbers of chains but with equal link length. The simulated configuration is shown in Tables II and III. For the 2-chain canonical network, we use the network structure in Fig. 9. The angle between the two chains is slightly less than π. The reason for not using a symmetric structure has been given in Subsection A above. For other cases, the chains are evenly placed on the network. The CSRange for each topology is determined by minimizing its value while preventing HN. The throughput is obtained by varying the offered load and choosing the highest one.
From the graph, the highest throughput is 3.86 Mbps (0.619L), which is slightly smaller than the theoretical capacity of 2L/3. This is due to the imperfect scheduling by 802.11, which has been discussed in the previous paragraph.

In Fig. 14, the throughput converges to around 2.0 Mbps (0.321L) when the number of chains increases. The convergence can be explained as follows. From the analysis in Subsection A, we see that the bottleneck is around the center. When the number of chains is large, the area near the center will become dense. The possible transmission patterns are similar in this area, and thus the throughput converges. In addition, note that the converged value, 0.321L, is considerably smaller than the value achieved when the number of chains is three, 0.619L. This is again due to imperfect scheduling of the 802.11 MAC protocol. An interesting insight is that when the number of chains is small, the possible transmission patterns arising from “random” 802.11 MAC scheduling are more limited. By limiting this degree of freedom, higher throughput can actually be achieved because random transmission patterns that degrade throughput are eliminated.

The above observation has two implications: (i) For network design, we may want to design the network in such a way that the number of routes leading to the center is limited. (ii) Even for a general, non-canonical network densely populated with nodes and with many routes leading to the center, it is better to selectively turn on only a subset of the nodes to limit the routes to the center. This principle will be further discussed in Section IV.

TABLE I
SIMULATION CONFIGURATION FOR VARIABLE-LINK-LENGTH CANONICAL NETWORKS

Number of chains         3
d0                       250 m
d1                       242 m
di for i>1               250 m
Transmission Range       250 m
Carrier Sensing Range    675 m
Routing Protocol         AODV
Propagation Model        Two Ray Ground
Packet Data Size         1460 bytes

Fig. 12. Example of 3-chain canonical network, CSRange=2.7d.
(Fig. 12 distance labels: 1.732d0, 0.973d0, 2.62d0, 3.417d0.)

Fig. 13. Simulated throughput of a 3-chain canonical network with offered load control (x-axis: number of nodes per chain, 3 to 10).

TABLE II
SIMULATION CONFIGURATION FOR EQUAL-LINK-LENGTH CANONICAL NETWORKS

Number of nodes per chain    8
di for all i                 250 m
Transmission Range           250 m
Carrier Sensing Range        Refer to Table III
Routing Protocol             AODV
Propagation Model            Two Ray Ground
Packet Data Size             1460 bytes

TABLE III
CARRIER SENSING RANGE FOR EQUAL-LINK-LENGTH CANONICAL NETWORKS

Number of chains    Carrier Sensing Range
2                   725 m
3                   875 m
4                   750 m
5                   725 m
6                   875 m
7                   800 m
8                   750 m
9                   875 m
10                  825 m
>10                 900 m

Fig. 14. Simulated throughput of equal-link-length canonical networks with offered load control (x-axis: number of chains, 2 to 18).

3.3 HFD versus Non-HFD Performance

In the preceding sections, we have assumed HFD networks to simplify the analysis by eliminating the effect of collisions. We now investigate the performance of HFD versus that of non-HFD networks. As a reminder, HFD requires

(i) Use of Receiver Restart (RS) Mode, and
(ii) Sufficiently large CSRange.

From [11], we know that increasing CSRange increases the number of exposed nodes (EN) and decreases the number of hidden nodes (HN), and vice versa. When HN is removed, say with HFD, the EN phenomenon will be more severe, which lowers the throughput. However, that is the case for many-to-many data delivery only. For this paper, we are interested in many-to-one data delivery. Table IV shows the simulation results with the same configuration as in Table II but with varying CSRange. The shaded entries correspond to HFD. From the table, when the number of chains is between 2 and 10, the highest throughput is achieved if we choose the smallest CSRange within HFD. This shows that the best HFD configuration generally works better than non-HFD.

TABLE IV
SIMULATION RESULT FOR EQUAL-LINK-LENGTH CANONICAL NETWORKS
Throughput (Mbps) by CSRange (rows) and number of chains (columns)

CSRange (m)   2       3       4       5       6       7       8       9       10
975           2.388   2.981   3.355   2.833   2.863   3.022   2.891   3.054   3.114
925           2.793   2.993   3.329   3.518   2.837   2.805   2.943   3.270   3.108
875           2.797   2.999*  3.508   3.535   3.393*  3.272   3.163   3.384*  2.883
825           2.795   2.490   3.513   3.483   2.615   3.681*  3.575   3.053   3.366*
775           2.808   2.473   3.724*  3.540   2.760   2.754   3.709*  3.367   3.269
725           3.589*  2.226   3.210   3.854*  2.095   2.264   3.147   3.199   2.686
675           3.170   2.288   2.398   2.799   2.142   2.261   2.176   2.367   2.633
625           3.166   1.806   2.219   2.657   1.735   2.020   2.670   1.906   2.156
575           3.183   1.788   2.168   2.202   1.657   1.609   2.280   1.929   2.041

* highest throughput for each number of chains. Shading is not reproduced here; the HFD entries are those with CSRange at or above the value given in Table III.

The better performance of HFD could be explained as follows. When CSRange is decreased, the number of HN increases and the number of EN decreases. More links could be active when there are fewer EN, and thus the throughput in a multiple-source-multiple-destination network could be higher in the non-HN-free situation. In a many-to-one network, however, all the traffic is directed toward the same destination. With a non-HN-free design, although the total throughput on a link basis (point-to-point throughput) may be increased, the many-to-one throughput (or the end-to-end throughput) could not benefit from the increase, because all the traffic in the end will flow toward the bottleneck and be dropped there due to HNs. We will see later that this observation suggests a design in which the area near the center should be made HN-free, while areas far away from the data center need not be HN-free.

3.4 Multiple Interference

Thus far, we have considered pair-wise interferences only. The analysis of pair-wise interferences is appealing for its simplicity. However, it does not take into account the fact that the interferences from several other simultaneously transmitting sources may add up to yield an unacceptable SIR even though each of the interferences alone may not be detrimental.
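To make this concern concrete, the following sketch (an illustration, not taken from the paper's simulator) computes the SIR under the power law P(d) ∝ Pt/d^α with α = 4 and equal transmit powers. The geometry is borrowed from the 3-chain example discussed in this section: a link of length 0.9d0 with interferers at distance 1.7321d0 from the receiver. The function name `sir_db` and the constant names are hypothetical.

```python
import math

ALPHA = 4           # path-loss exponent assumed in the analysis
SIR_MIN_DB = 10.0   # SIR requirement assumed in the analysis

def sir_db(link_len, interferer_dists, alpha=ALPHA):
    """SIR in dB at a receiver, for equal transmit powers and power ~ d**-alpha."""
    signal = link_len ** -alpha
    interference = sum(d ** -alpha for d in interferer_dists)
    return 10.0 * math.log10(signal / interference)

d0 = 1.0  # distances are expressed in units of d0
one = sir_db(0.9 * d0, [1.7321 * d0])
two = sir_db(0.9 * d0, [1.7321 * d0, 1.7321 * d0])
print(f"one interferer:  {one:.2f} dB (requirement {SIR_MIN_DB} dB)")
print(f"two interferers: {two:.2f} dB (requirement {SIR_MIN_DB} dB)")
```

Each interferer alone leaves the SIR above 10 dB, so the pair-wise constraints are met, but the two together bring the power ratio down to about 6.86, which is below the 10 dB requirement.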
In this section, we extend our analysis to take into account the effect of multiple interferences. For brevity, we refer to the throughput capacity obtained by assuming pair-wise interferences as the pair-wise-interference throughput capacity, and the throughput capacity with multiple interferences taken into account as the multiple-interference throughput capacity.

The multiple-interference throughput capacity is in general less than or equal to the pair-wise-interference throughput capacity. The question then is whether the pair-wise-interference capacity is a tight bound for the multiple-interference capacity. We show in the following that this is indeed the case. In the following, we focus on the 3-chain network. The analytical argument and the qualitative results for the 2-chain network are similar.

Consider the canonical network in Fig. 15, where d0 = d2 = d3 = d4, and d1 = 0.9d0. In some cases, the SIR may not satisfy the 10 dB constraint. For example, when N11 is receiving DATA from N12, and at the same time N21 and N31 are replying with ACKs to N22 and N32, the SIR is at most

$$\frac{P_{N_{11}}(N_{12})}{P_{N_{11}}(N_{21}) + P_{N_{11}}(N_{31})} = \frac{P_t/(0.9d_0)^4}{P_t/(1.7321d_0)^4 + P_t/(1.7321d_0)^4} = 6.859,$$

where PX(Y) is the received power from node Y at node X, and Pt is the transmission power. This situation, however, occurs only if multiple ACKs are transmitted simultaneously on nearby links near the center. The probability of this occurring is low, since the transmission time of an ACK is much shorter than that of DATA. If we ignore the simultaneous transmissions of ACKs on these nearby links, we can show that the SIR due to multiple interferences is still more than 10 dB, given that the SIR due to pair-wise interferences is more than 10 dB, as follows.

1. 1-hop node to sink node

When the sink node is receiving DATA from N11, the nearest three active links that cause the largest interference are: N23 to N22, N33 to N32 and N14 to N13.
If no two ACKs are transmitted simultaneously by these three links, the “worst-case” interference power at N0 (which includes the ACK from N22, DATA from N33 and N14, and transmissions by other nodes) is at most

$$P_{N_0}(N_{22}) + P_{N_0}(N_{33}) + P_{N_0}(N_{14}) + P_{N_0}(N_{25}) + P_{N_0}(N_{35}) + P_{N_0}(N_{16}) + \dots = \frac{P_t}{d_0^4}\left(\frac{1}{1.9^4} + \frac{1}{2.9^4} + \frac{1}{3.9^4} + \frac{1}{4.9^4} + \frac{1}{4.9^4} + \frac{1}{5.9^4} + \dots\right) \approx 0.0995\,\frac{P_t}{d_0^4}.$$

Since the received signal power from N11 is Pt/d0^4, the SIR is at least 1/0.09949 ≈ 10.051.

2. 2-hop node to 1-hop node

Consider the link N12 to N11. The nearest three active links are: N22 to N21, N32 to N31, and N15 to N14. Similar to the above, the SIR is at least

$$\frac{P_{N_{11}}(N_{12})}{P_{N_{11}}(N_{21}) + P_{N_{11}}(N_{32}) + P_{N_{11}}(N_{15}) + P_{N_{11}}(N_{24}) + P_{N_{11}}(N_{34}) + P_{N_{11}}(N_{17}) + \dots} = \frac{\dfrac{1}{0.9^4}}{\dfrac{1}{1.7321^4} + \dfrac{1}{2.5515^4} + \dfrac{1}{3.9^4} + \dfrac{1}{4.4844^4} + \dfrac{1}{4.4844^4} + \dfrac{1}{5.9^4} + \dots} \approx 10.5259.$$

3. 3-hop node to 2-hop node and others

The interference is less than in the above cases. This part is skipped because the analytical approach is similar.

In the above, we have argued analytically that taking multiple interferences into account does not lead to substantially different performance from that under pair-wise interference. We have focused on the 3-chain network with variable link distance because this structure provides the highest capacity bound among the canonical networks.

We now present simulation results for general canonical networks with an arbitrary number of chains. We have modified the NS2 simulator to take into account the effects of multiple interferences (the modified NS2 code can be downloaded from the website in [12]). The throughput results are shown in Fig. 16. The multiple-interference throughput is lower than the pair-wise-interference throughput by only a small margin, and therefore the pair-wise-interference throughput serves as a good bound for the multiple-interference throughput.

Fig. 15. Example of 3-chain canonical network, CSRange=2.7d (distance labels: 1.732d0, 0.9d0, 2.55d0, 3.29d0).
Fig. 16. Simulated throughput of 3-chain canonical network with offered load control (x-axis: number of nodes per chain, 3 to 10; curves: multiple interference, pairwise interference).

4 GENERAL NETWORKS

In this section, we consider the throughput of general networks. Since general networks may not have the regular structure of canonical networks, the throughput capacity could be lower than 3L/4. We propose a method to find the capacity by selecting Hidden-node Free Paths (HFP).

4.1 Discussion of HFP

In Section III-C, we found that HN-free networks outperform networks with HN in terms of throughput capacity. We could have three schemes which satisfy the HN-free condition for general network analysis. As one of the requirements of HFD, we assume RS Mode is used in all the analyses and experiments in the remainder of the paper. We assume that all nodes use a common fixed CSRange in each of the following schemes (assumption (1) in Section II); however, the schemes set the fixed CSRange differently.

Scheme 1: CSRange is set to 3.78·TxRange, where TxRange is the transmission range. This is a sufficient condition for HN-free operation in any network [6].

Scheme 2: CSRange is minimized according to the network topology so that no hidden node exists with respect to any two links in the network. This scheme, for example, was used in the analysis of canonical networks.

Scheme 3: HFP - We select a subset of links to form paths to the center which are hidden-node free and achieve the highest possible throughput. Since some links are not used, the CSRange can be smaller than in schemes 1 and 2 (i.e., only the links in the paths are considered when fixing CSRange).

Based on Table IV, the highest throughput is achieved when we choose the smallest CSRange within HFD.
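This reading of Table IV can be checked directly against the data. The sketch below transcribes Table IV and Table III, then compares, for each number of chains, the CSRange that gives the highest throughput with the smallest simulated CSRange that is HN-free. Two assumptions are made here: the row labels of Table IV are CSRange values in metres, and an entry counts as HFD when its CSRange is at or above the Table III value. The variable and function names are illustrative.

```python
CS = [975, 925, 875, 825, 775, 725, 675, 625, 575]  # CSRange in metres (Table IV rows)
CHAINS = [2, 3, 4, 5, 6, 7, 8, 9, 10]               # number of chains (Table IV columns)
THROUGHPUT = [  # Mbps, one row per CSRange value above
    [2.388, 2.981, 3.355, 2.833, 2.863, 3.022, 2.891, 3.054, 3.114],
    [2.793, 2.993, 3.329, 3.518, 2.837, 2.805, 2.943, 3.270, 3.108],
    [2.797, 2.999, 3.508, 3.535, 3.393, 3.272, 3.163, 3.384, 2.883],
    [2.795, 2.490, 3.513, 3.483, 2.615, 3.681, 3.575, 3.053, 3.366],
    [2.808, 2.473, 3.724, 3.540, 2.760, 2.754, 3.709, 3.367, 3.269],
    [3.589, 2.226, 3.210, 3.854, 2.095, 2.264, 3.147, 3.199, 2.686],
    [3.170, 2.288, 2.398, 2.799, 2.142, 2.261, 2.176, 2.367, 2.633],
    [3.166, 1.806, 2.219, 2.657, 1.735, 2.020, 2.670, 1.906, 2.156],
    [3.183, 1.788, 2.168, 2.202, 1.657, 1.609, 2.280, 1.929, 2.041],
]
# Minimum HN-free CSRange per number of chains (Table III), in metres.
MIN_HFD_CS = {2: 725, 3: 875, 4: 750, 5: 725, 6: 875, 7: 800, 8: 750, 9: 875, 10: 825}

def best_cs(col):
    """CSRange with the highest throughput in the given column of Table IV."""
    return max(zip((row[col] for row in THROUGHPUT), CS))[1]

def smallest_hfd_cs(chains):
    """Smallest simulated CSRange that is at least the Table III value."""
    return min(cs for cs in CS if cs >= MIN_HFD_CS[chains])

for col, n in enumerate(CHAINS):
    print(f"{n:2d} chains: best CSRange {best_cs(col)} m, "
          f"smallest HFD CSRange {smallest_hfd_cs(n)} m")
```

Under these assumptions the two values coincide for every number of chains from 2 to 10.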
So we have the following predictions for the throughputs of the different schemes above. The throughput of scheme 1 cannot be higher than that of scheme 2 (because in scheme 1 the CSRange of some links is forced to a higher value than necessary). Also, the throughput of scheme 2 cannot be higher than that of scheme 3 (because scheme 3 requires the HN-free property to be maintained only for links along the paths, and the paths that will be used are optimally chosen with regard to the throughput, whereas scheme 2 requires all links to be HN-free, even links that are not used). For an example where HFP can achieve a higher throughput than scheme 2, we add two nodes to the 3-chain canonical network in Fig. 12 to yield the network in Fig. 17. In the network, link BB’ interferes with link AA’. If we set CSRange to be less than 3.417d0, node B will become a hidden node of link AA’. If we set CSRange larger than 3.417d0, the capacity upper bound 3L/4 cannot be achieved. On the other hand, if we use HFP, we could select the links in the canonical network only. So node A could be “switched off” and there will not be a hidden-node problem if we set CSRange to 2.7d0.

4.2 Experiments and Discussions

To conserve space, this paper will not go into the details of the formulation of the HFP problem and the HFP experimental methodology. For interested readers, such details can be found in the Appendix of our technical report [12]. In a nutshell, our approach extends that of [13] by additionally taking into consideration the effects of carrier sensing and HFD requirements. We also provide a branch-and-bound heuristic algorithm for the resulting integer linear program (ILP). Here we only present the performance results of experiments on schemes 1, 2, and 3 and their implications. Solving the ILP of scheme 3 is computationally intensive. The experimental results of scheme 3 in this subsection are therefore obtained using our branch-and-bound heuristic.
Schemes 1 and 2 are still solved in an optimal manner. As will be seen, even with a suboptimal heuristic, scheme 3 still yields better results.

In our experiments, we put the nodes inside a disk of radius one. A sink node is placed at the center of the disk, and six source nodes are spaced evenly along the boundary of the disk. For each source node, a node is randomly generated within the transmission range 0.4. More nodes are generated similarly with reference to the newly created node until a node is within the transmission range of the sink node. In this way, we ensure that there is a path from any source node to the sink node. With the transmission range set to 0.4, the data from the source nodes need at least three hops to reach the sink node.

Table V shows the experimental results for five randomly generated networks, Net1, Net2, …, Net5. T1, T2 and T3 are the throughputs of the three schemes. In obtaining Ti, we vary the offered load at the source nodes until the highest throughput is obtained [7]. From Table V, scheme 3 has improvements of 4.8% to 43.8% over scheme 1, and 4.8% to 23.2% over scheme 2. As related earlier, we did not solve scheme 3 optimally, but rather used a heuristic. Therefore, the CSRange (CS) found for HFP in the experiments may not be the shortest possible CSRange. Nevertheless, the result shows that the solutions of scheme 3 exhibit some properties similar to the canonical network, as shown in Fig. 12. We discuss the similarities in the following paragraph.

Fig. 17. Example of HFP (distance labels: 1.732d0, 0.973d0, 2.62d0, 3.417d0).

First, for scheme 3, CSRange/TxRange (CS3/TX) for Net1 to Net5 is in the range of 2.62 to 3.417, which is the CSRange region we mentioned near the end of Section III-A for achieving the capacity of 3L/4 in a canonical network.
Second, exactly three paths leading to the sink node are used, which is the same as in the 3-chain canonical network (Fig. 18). This gives us an intuition that the canonical network is in a sense optimal: that is, we may want to form a structure similar to the canonical network by turning on only some of the relay nodes.

TABLE V
RESULT FOR THROUGHPUT OF RANDOM NETWORKS

        T1      T2      T3      T3/T1   T3/T2   CS3     CS3/TX
Net1    0.4     0.5     0.575   1.438   1.15    1.253   3.133
Net2    0.412   0.439   0.541   1.313   1.232   1.265   3.162
Net3    0.429   0.451   0.536   1.25    1.189   1.265   3.163
Net4    0.429   0.5     0.6     1.4     1.2     1.205   3.012
Net5    0.5     0.5     0.524   1.048   1.048   1.287   3.216

Neti: Network i
TX: Transmission range, set to 0.4 in experiments
T1: Throughput when CSRange = 3.78 TX (Scheme 1)
T2: Throughput when CSRange is minimized with respect to links in the network (Scheme 2)
T3: Throughput when only some links in the network are activated (HFP) (Scheme 3)
CS3: CSRange for Scheme 3

4.3 Applying Canonical Network to General Networks

The preceding subsection shows that HFP outperforms other HN-free schemes in terms of throughput. We also observe from the results that (i) HFP solutions for a random network exhibit structures similar to that of the 3-chain canonical network near the center. Furthermore, from simulation results in Section III-B (see Fig. 13), we observe that (ii) IEEE 802.11 scheduling in the canonical network achieves throughput close to that of perfect scheduling. Observations (i) and (ii) lead to the following general engineering principle:

Centric Canonical-Network Design Principle

In a general multi-hop network densely populated with relay nodes, instead of solving the complex HFP optimization problem, as a heuristic, we may select routes near the center so that the structure looks like that of a 3-chain canonical network. If we have the freedom for node placement near the center during the network design process, then the nodes around the center should be structured like a 3-chain canonical network.
Note that there is no restriction on nodes far away from the center, and that they can be randomly distributed (see Fig. 19 for illustration).

This subsection investigates the application of the Centric Canonical-Network Design Principle. For our simulations, we assume a disk of radius 2000 m. Within the disk, there is an inner circle of radius 980 m. As illustrated in Fig. 19, the inner circle is structured as a canonical network. The nodes outside the inner circle are placed randomly with the constraint that the distance between any two of them is not shorter than 125 m. The nodes outside the inner circle act as source nodes and relay nodes at the same time, while the nodes inside the inner circle act merely as relay nodes. We refer to the network structure in Fig. 19 as the centric canonical network, alluding to the fact that only the vicinity of the center looks like a canonical network. Henceforth, we shall refer to the vicinity of the center as the canonical network and the randomly structured part beyond that as the random network. The number of nodes beyond the inner circle is 284.

We use the default settings in NS2, CSRange of 550 m and TXRange of 250 m, for performing the simulations. AODV routing is assumed. For the canonical network, with respect to Fig. 12, we set d0 = 200 m. Since 550 m / 200 m = 2.75, which is within the range 2.62 to 3.417 (see Fig. 12), the canonical network is HN-free. The random network, however, is not necessarily HN-free in our experiments. The assumption is reasonable, and corresponds to the real situation in which we only try to design the network architecture near the center judiciously by careful node placement.

As a benchmark, we have also conducted simulation experiments for a random network in which the inner circle is populated by 146 randomly placed nodes with no constraint on the node-to-node distance.
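The hidden-node-free check used above (a CSRange-to-link-length ratio between 2.62 and 3.417) can be written as a short sketch. The function name `is_hn_free` and the treatment of the window as inclusive are assumptions made for illustration; only the numerical values come from the text and Table V.

```python
# Window for the ratio CSRange / reference length quoted in the text.
# Treating both ends as inclusive is an assumption of this sketch.
HN_FREE_MIN = 2.62
HN_FREE_MAX = 3.417


def is_hn_free(cs_range, ref_length):
    """Return True if cs_range / ref_length lies inside the quoted window."""
    ratio = cs_range / ref_length
    return HN_FREE_MIN <= ratio <= HN_FREE_MAX


# NS2 default CSRange (550 m) against the canonical link length d0 = 200 m.
print(550.0 / 200.0, is_hn_free(550.0, 200.0))

# CS3 values from Table V, measured against the transmission range TX = 0.4.
table_v_cs3 = {"Net1": 1.253, "Net2": 1.265, "Net3": 1.265,
               "Net4": 1.205, "Net5": 1.287}
for name, cs3 in table_v_cs3.items():
    print(name, round(cs3 / 0.4, 3), is_hn_free(cs3, 0.4))
```

With the values quoted in the text, both the canonical network (ratio 2.75) and all five HFP solutions of Table V fall inside the window.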
In all our simulations below, the offered load to the source nodes is varied until we find the largest throughput for each network structure [7]. Simulation of 802.11 with AODV yields a throughput of 1.16 Mbps for the benchmark random network, and a throughput of 2.79 Mbps for the centric canonical network. That is, the throughput of the centric canonical network is more than 100% higher. This demonstrates that a carefully designed structured network around the data center yields superior performance.

[Fig. 18. Random networks and HFP.]

Although the improvement is significant, 2.79 Mbps is still a bit lower than the 4.30 Mbps simulated throughput of the 3-chain canonical network in Section III. It turns out that the centric canonical network fails to take another bottleneck into account. That is, in addition to the bottleneck around the center, there is also a bottleneck at the “confluence” of the random network and the canonical network, where the canonical network may branch off to many paths in the random network, and the nodes on these branches may interfere with each other and bring down the throughput.

To mitigate the bottleneck at the confluence, we modify the canonical network as in Fig. 20. As shown, each chain in the canonical network only branches out further into two chains before meeting the random network. We refer to this design as the manifold canonical network, in reference to the fact that there are actually two “layers” of canonical networks. The first one is at the center, with three more before meeting the random network. We refer to this design principle as the Manifold Canonical-Network Design Principle.

In our simulations, the manifold canonical network is placed inside an inner circle of radius 1026 m.
The nodes beyond the manifold canonical network are randomly generated with the same constraints as the nodes generated beyond the inner circle of the centric canonical network. As the inner circle is larger than in the previous networks and the number of nodes (which are relay nodes) in the manifold canonical network is 31, to keep the total number of nodes in the network constant, the number of randomly generated nodes (which are also the source nodes) outside the inner circle is decreased from 284 to 269. We set CSRange = 550 m and d0 = 200 m in the manifold canonical network in our simulation (see Fig. 12). Simulation of 802.11 with AODV routing yields a throughput of 3.34 Mbps, which is 20% higher than that of the centric canonical network. For fair benchmarking, we again perform the simulation with the inner circle replaced by random node placements, but this time with the inner circle having a radius of 1026 m, as in the manifold canonical network. The simulation of the benchmark network yields a throughput of 1.31 Mbps. The throughput of the manifold canonical network is thus more than 150% higher than that of the pure random benchmark network.

We have also investigated the robustness of the manifold canonical network with respect to node positioning. Simulations show that a 5% position error of the nodes in the two “layers” of the canonical network only decreases the throughput by about 10% on average, as summarized in Table VI.

TABLE VI
COMPARISON OF THROUGHPUTS OF MANIFOLD CANONICAL NETWORKS WITH AND WITHOUT NODE POSITION ERROR

Throughput without          Throughput with             Ratio
position error (Mbps)       position error (Mbps)
3.44                        3.45                        1.003
3.35                        3.11                        0.928
3.32                        3.18                        0.958
3.29                        2.94                        0.894
3.37                        2.96                        0.878
3.36                        2.82                        0.839

5 CONCLUSION

In this paper, we have studied the throughput capacity of many-to-one multi-hop wireless networks based on the IEEE 802.11 MAC protocol.
We have defined a class of canonical networks whose throughput capacity serves as a benchmark for general networks. Specifically, the throughput capacity of canonical networks under 802.11 is upper bounded by 3L/4, where L is the single-link capacity, when the source nodes are at least two hops away from the sink. If we restrict our attention to networks in which all links have the same length, the upper bound is further reduced to 2L/3.

While the 3L/4 result in the previous paragraph has been established for canonical networks only, the 2L/3 result applies to general networks so long as (i) source nodes are at least two hops away from the data center; (ii) all links have the same length.

[Fig. 19. Example of a centric-canonical network.]

[Fig. 20. Example of a manifold canonical network.]

Our 802.11 simulation results yield throughputs of around 0.690L (for variable-link-length canonical networks) and 0.619L (for equal-link-length canonical networks) under the worst-case scenario when the source nodes are very far away and their traffic needs to go through many hops before reaching the sink node. That is, the simulated throughputs are reasonably close to the theoretical upper bounds of 3L/4 and 2L/3, respectively. This is quite a positive result considering that 802.11 schedules transmissions in a rather random manner, while the examples we gave in Section III-A to achieve throughputs of 3L/4 and 2L/3 require very specific transmission orders.

The above results also imply that using variable link length is more desirable than using fixed link length.
When the network is very dense (say, infinitely dense), if each node chooses a routing path with maximum hop distance in each hop, an equivalent network with fixed link length dmax may result, where dmax is the maximum hop distance governed by the transmit power and receiver sensitivity. This max-hop-distance routing is not optimal for the many-to-one traffic pattern.

This paper has considered canonical networks both with and without hidden nodes. Our results indicate that hidden-node free designs (HFD) yield higher throughput capacity. This is in contrast to the many-to-many case, where HFD may not yield better throughputs [5] [6] and may actually decrease the overall system throughput.

For general networks, we have used the concept of HFP (Hidden-node Free Path) to set up routes that yield optimal throughput. HFP routing, however, requires solving a complicated integer linear program, which may not be practical. Fortunately, our experimental results indicate that the routes selected by the HFP algorithm resemble the structure of the canonical network near the center. This gives rise to simple network design principles that attempt to approximate the canonical network structure in the center.

Specifically, we have shown that a manifold canonical network structure near the sink can yield superior throughput that is as much as 150% higher than that of a dense random network. A key insight is that in a network densely populated with nodes, deliberately turning off some nodes in the area near the sink node so as to approximate the canonical network structure can actually give rise to better throughput performance.

REFERENCES

[1] P. Gupta and P. R. Kumar, “The Capacity of Wireless Networks,” IEEE Transactions on Information Theory, vol. IT-46, March 2000.
[2] D. Marco, E.J. Duarte-Melo, M. Liu, and D.L. Neuhoff, “On the Many-to-One Transport Capacity of a Dense Wireless Sensor Network and the Compressibility of Its Data,” IPSN 2003, pp.
1-16, April 2003.
[3] E.J. Duarte-Melo, M. Liu, “Data-Gathering Wireless Sensor Networks: Organization and Capacity,” Computer Networks, vol. 43, pp. 519-537, Nov. 2003.
[4] IEEE Computer Society LAN MAN Standards Committee, “Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications,” IEEE Std. 802.11, 1997.
[5] L. Jiang, “Improving Capacity and Fairness by Elimination of Exposed and Hidden Nodes in 802.11 Networks,” M.Phil Thesis, The Chinese University of Hong Kong, Jun. 2005.
[6] L. Jiang and S. C. Liew, “Removing Hidden Nodes in IEEE 802.11 Wireless Networks,” IEEE VTC, Sept. 2005. More comprehensive version to appear as “Hidden-node Removal and Its Application in Cellular WiFi Networks,” IEEE Trans. on Vehicular Technology, Nov. 2007.
[7] P.C. Ng and S.C. Liew, “Offered Load Control in IEEE802.11 Multi-hop Ad-hoc Networks,” IEEE MASS, Oct. 2004. More comprehensive version to appear as “Throughput Analysis of IEEE 802.11 Multi-hop Ad hoc Networks,” IEEE/ACM Transactions on Networking, June 2007.
[8] The Institute of Electrical and Electronics Engineers Inc. Press, “Wireless Communications Principles and Practice.”
[9] J. Li, C. Blake et al., “Capacity of Ad Hoc Wireless Networks,” ACM MobiCom, July 2001.
[10] “The Network Simulator NS-2,” http://www.isi.edu/nsnam/ns
[11] P. C. Ng, S. C. Liew, and L. Jiang, “Achieving Scalable Performance in Large-Scale IEEE 802.11 Wireless Networks,” IEEE WCNC, March 2005.
[12] http://www.ie.cuhk.edu.hk/soung/many_to_one, Technical Report with Appendix on HFP Algorithm and NS-2 code modeling multiple interferences.
[13] K. Jain, J. Padhye et al., “Impact of Interference on Multi-hop Wireless Network Performance,” MobiCom ’03, Sept. 2003.

Chi Pan Chan received his B.Eng and M.Phil. degrees in Information Engineering from The Chinese University of Hong Kong in 2004 and 2006. His research was mainly related to capacity analysis in multi-hop wireless networks.
He is now involved in the software industry in the field of multimedia and networking.

Soung Chang Liew received his S.B., S.M., E.E., and Ph.D. degrees from the Massachusetts Institute of Technology. From March 1988 to July 1993, Soung was at Bellcore (now Telcordia), New Jersey, where he engaged in Broadband Network Research. Soung is currently Professor and Chairman of the Department of Information Engineering, the Chinese University of Hong Kong. Soung’s current research interests focus on wireless networking. Recently, Soung and his student won the best paper awards in the 1st IEEE International Conference on Mobile Ad-hoc and Sensor Systems (IEEE MASS 2004) and the 4th IEEE International Workshop on Wireless Local Network (IEEE WLN 2004). Separately, TCP Veno, a version of TCP to improve its performance over wireless networks proposed by Soung and his student, has been incorporated into a recent release of Linux OS. Publications of Soung can be found at www.ie.cuhk.edu.hk/soung. Besides academic activities, Soung is also active in the industry. He co-founded two technology start-ups in Internet Software and has been serving as consultant to many companies and industrial organizations. He is currently consultant for the Hong Kong Applied Science and Technology Research Institute (ASTRI), providing technical advice as well as helping to formulate R&D directions and strategies in the areas of Wireless Internetworking, Applications, and Services.

An Chan received the B.Eng degree in Information Engineering from The Chinese University of Hong Kong, Hong Kong in 2005. He is currently working toward an M.Phil degree in the same field at The Chinese University of Hong Kong. His research interests are in QoS over wireless networks and advanced IEEE 802.11-like multi-access protocols. He is a graduate student member of IEEE.
ABSTRACT This paper investigates the many-to-one throughput capacity (and by symmetry, one-to-many throughput capacity) of IEEE 802.11 multi-hop networks. It has generally been assumed in prior studies that the many-to-one throughput capacity is upper-bounded by the link capacity L. Throughput capacity L is not achievable under 802.11. This paper introduces the notion of "canonical networks", which is a class of regularly-structured networks whose capacities can be analyzed more easily than unstructured networks. We show that the throughput capacity of canonical networks under 802.11 has an analytical upper bound of 3L/4 when the source nodes are two or more hops away from the sink; and simulated throughputs of 0.690L (0.740L) when the source nodes are many hops away. We conjecture that 3L/4 is also the upper bound for general networks. When all links have equal length, 2L/3 can be shown to be the upper bound for general networks. Our simulations show that 802.11 networks with random topologies operated with AODV routing can only achieve throughputs far below the upper bounds. Fortunately, by properly selecting routes near the gateway (or by properly positioning the relay nodes leading to the gateway) to fashion after the structure of canonical networks, the throughput can be improved significantly by more than 150%. Indeed, in a dense network, it is worthwhile to deactivate some of the relay nodes near the sink judiciously. <|endoftext|><|startoftext|> Scanning Tunneling Spectroscopy in the Superconducting State and Vortex Cores of the β-pyrochlore KOs2O6 C. Dubois,∗ G. Santi, I. Cuttat, C. Berthod, N. Jenkins, A. P. Petrović, A. A. Manuel, and Ø. Fischer DPMC-MaNEP, Université de Genève, Quai Ernest-Ansermet 24, 1211 Genève 4, Switzerland S. M. Kazakov, Z. Bukowski, and J. 
Karpinski Laboratory for Solid State Physics ETHZ, CH-8093 Zürich, Switzerland (Dated: October 24, 2018)

We performed the first scanning tunneling spectroscopy measurements on the pyrochlore superconductor KOs2O6 (Tc = 9.6 K) in both zero magnetic field and the vortex state at several temperatures above 1.95 K. This material presents atomically flat surfaces, yielding spatially homogeneous spectra which reveal fully-gapped superconductivity with a gap anisotropy of 30%. Measurements performed at fields of 2 and 6 T display a hexagonal Abrikosov flux line lattice. From the shape of the vortex cores, we extract a coherence length of 31–40 Å, in agreement with the value derived from the upper critical field Hc2. We observe a reduction in size of the vortex cores (and hence the coherence length) with increasing field which is consistent with the unexpectedly high and unsaturated upper critical field reported.

PACS numbers: 74.70.Dd, 74.50.+r, 74.25.Qt

The discovery of superconductivity in the β-pyrochlore osmate compounds AOs2O6 (A = K, Rb, Cs) [1] has highlighted the question of the origin of superconductivity in classes of materials which possess geometrical frustration [2, 3]. Interest has been predominantly focused on the highest-Tc compound KOs2O6, which presents many striking characteristics. In particular, the absence of inversion symmetry in its crystal structure [4] raises the question of its Cooper pair symmetry and the possibility of spin singlet-triplet mixing [5, 6].

The pyrochlore osmate compound KOs2O6 displays a critical temperature Tc = 9.6 K, the largest in its class of materials (CsOs2O6 and RbOs2O6, which differ only by the nature of the alkali ion, have Tc values of 3.3 and 6.3 K respectively). Although band structure calculations show that the K ion does not influence the density of states (DOS) at the Fermi level [7, 8], it seems to affect several key properties [9].
In particular, the first-order phase transition revealed by specific heat measurements in magnetic fields at the temperature Tp ≈ 7.5 K has been ascribed to a “freezing” of its rattling motion [10]. The negative curvature of the resistivity as a function of temperature also indicates a large electron-phonon scattering [11]. Specific heat measurements [12] suggest the coexistence of strong electron correlations and strong electron-phonon coupling, two generally antagonistic phenomena with respect to the superconducting pairing symmetry. The nature of the symmetry remains a controversial subject in the literature. NMR [13] and µSR [14] data suggest anisotropic gap functions with nodes whereas thermal conductivity experiments [15] favor a fully-gapped state.

The peculiar behavior of KOs2O6 is demonstrated by its upper critical magnetic field Hc2, whose temperature dependence is linear down to sub-Kelvin temperatures and whose amplitude is above the Clogston limit [16]. One possible interpretation is the occurrence of spin-triplet superconductivity driven by spin-orbit coupling [5, 6]. Alternatively, it has also been suggested that this behavior can be explained by the peculiar topology of the Fermi surface (FS) sheets of KOs2O6, assuming that superconductivity occurs mainly on the closed sheet [16].

The understanding of the physics of this compound would greatly benefit from a detailed knowledge of the local density of states (LDOS). Scanning Tunneling Spectroscopy (STS) is an ideal tool for this, particularly since it allows one to map the vortices in real space and also access the normal state below Tc by probing their cores [17, 18, 19, 20]. In this Letter we present a detailed STS study of KOs2O6 single crystals, including the first vortex imaging in this material.

The KOs2O6 single crystals were grown from Os and KO2 in oxygen-filled quartz ampoules. Their dimensions are around 0.3 × 0.3 × 0.3 mm³.
The details of their chemical properties as well as their growth conditions can be found in Ref. 4. AC susceptibility measurements show a very sharp superconducting transition (∆Tc = 0.35 K).

Our measurements are carried out using a home-built low temperature scanning tunneling microscope featuring a compact nanopositioning stage [21] to target the small-sized crystals. Electrochemically etched iridium tips are used for STS measurements on as-grown single crystal surfaces and the differential conductivity was measured using a standard AC lock-in technique.

The surface topography of as-grown samples (Fig. 1a) reveals atomically flat regions speckled with small corrugated islands a few Ångströms high whose spectroscopic characteristics are noisy and not superconducting (thus restraining our field of view for spectroscopic imaging).

http://arxiv.org/abs/0704.0529v1

FIG. 1: (a) Large-scale topography of KOs2O6 (T = 2 K, Rt = 60 MΩ); the box shows the measurement area for the vortex maps. (b) Spectroscopic trace along a 100 Å path taken on an atomically flat region with one spectrum every 1 Å. The spectra show raw data offset vertically for clarity (T = 2 K, Rt = 20 MΩ).

The large flat regions display highly homogeneous superconducting spectra (Fig. 1b), which were perfectly reproducible over the timescale of our experiments (4 months). We have checked that the spectra obtained by varying the tunnel resistance Rt all collapse onto a single curve, thus confirming true vacuum tunneling conditions. We have also verified that the numerical derivative of the tunnel current with respect to the voltage gives the same spectroscopic signature as the dI/dV lock-in signal.
We stress that all measurements presented in this paper are raw data.

The lack of inversion symmetry in this compound together with several experimental findings raises the question of the symmetry of the gap function. In order to clarify this point, we have fitted our data to several symmetry models, focusing on the question of the presence or absence of nodes and the amplitude of any possible gap anisotropy. We therefore considered three scenarios with an approximate angular dependence of the gap, i.e. an isotropic s-wave (∆0), a d-wave (∆ cos 2φ) with nodes and an “anisotropic” s-wave (∆0 + ∆ sin φ) which has the same angular dependence as the s-p-wave singlet-triplet mixed state [6]. We do not take the real topology of the FS [7] into account, since it comprises two 3D Fermi sheets and is hence unlikely to have any significant effect on the gap structure. For an anisotropic gap, ∆(φ), the quasiparticle DOS is given by

$$N(\omega) \propto \left| \mathrm{Re} \left\langle \frac{\omega + i\Gamma}{\sqrt{(\omega + i\Gamma)^2 - |\Delta(\phi)|^2}} \right\rangle_{\phi} \right|$$

where Γ is a phenomenological scattering rate and ⟨…⟩φ denotes the average over the angle φ. In addition, we included broadenings due to the experimental temperature and the lock-in in our fits.

The results are presented in Fig. 2. The d-wave model can be rejected at this stage since its zero bias conductance (ZBC) is larger than in experiment (increasing Γ in the model can only increase the ZBC). The differences between symmetries appear much more clearly in the second derivative spectrum (d²I/dV², Fig. 2d), which is not surprising as it emphasizes the variations of the DOS on a small energy scale and is very sensitive to the model parameters (in contrast with the dI/dV curve).

FIG. 2: Experimental and theoretical tunneling spectra. (a) Normalized dI/dV spectra at different temperatures from 1.95 to 10 K (1.95, 3.10, 4.00, 5.10, 6.00, 9.00 and 10.00 K; spectra are offset vertically for clarity). (b) Parameters for the different theoretical models. (c) Comparison of the experimental spectrum at low temperature (T = 1.95 K) and low energy with the different theoretical models; the color codes are explained in (d). (d) Same as (c) for the second derivative d²I/dV².

Panel (b) of FIG. 2, model parameters:

(meV)          s-wave   d-wave   anisotropic s-wave
∆0             1.22     -        1.09
∆              -        1.52     0.40
Γ              0.12     0        0.05
2∆max/kBTc     2.93     3.66     3.58      (dimensionless)

The best fit is clearly given by the “anisotropic” s-wave model with an anisotropy of around 30%. With respect to the singlet-triplet mixed state, we note that we do not see any evidence in our data for a second coherence peak arising from spin-orbit splitting. Since the 3D nature of both sheets implies that tunneling takes place in both of them, the absence of a second peak also rules out the possibility of two different isotropic gaps on separate FS sheets. Our results would however be compatible with multiband superconductivity with two (overlapping) anisotropic gaps. Finally, we see no signature of a normal-normal tunneling channel in our junction, suggesting that all electrons involved in the tunneling process come from the superconducting condensate.

To investigate the temperature evolution of the quasiparticle DOS, we acquired tunneling conductance spectra at different temperatures between 1.95 K and 10 K (Fig. 2a). The closure of the gap at the bulk Tc shows that we are probing the bulk properties of KOs2O6. This is further confirmed by the fact that similar spectra were also obtained on freshly cleaved surfaces. The totally flat conductance spectra at higher temperature show no support for a pseudogap in the DOS above Tc, implying that the steep decrease in the 1/(T1T) curve around 16 K in NMR data [13] must have a different origin. The spectra taken between 6 and 9 K (not shown) were very noisy. This could be explained by the proximity to the first-order transition at Tp ≃ 7.5 K [10].

FIG. 3: Spectroscopic traces at T = 2 K across vortices for a field of 2 T (a) and 6 T (b). The spectra at the vortex centers are highlighted in red. The spatial variation of the conductance is shown in the corresponding insets.

The BCS coupling ratio 2∆max/kBTc inferred from our measured gaps and critical temperature is about 3.6 for the anisotropic s-wave case, a value slightly smaller than that reported from specific heat measurements [12]. However, we stress that STS is a direct probe of the superconducting gap. Our findings therefore lead us to the conclusion that KOs2O6 is fully gapped with a significant anisotropy of around 30%.

We now focus on measurements performed in an applied magnetic field. In the vortex cores, whose radial size is roughly given by the coherence length ξ, superconductivity is suppressed, leading to a drastic change in the LDOS which can be measured by STM. Our measurements were performed for two fields, 2 and 6 T, over the particularly flat region of about 60 × 60 nm² (Fig. 1a). Each measurement was taken at 2 K with a typical acquisition time of 40 hours.

The results are presented in Figs 3 and 4. The vortex maps (insets of Fig. 3 and Figs 4a and 4b) show the ZBC normalized to the conductance at 6 meV. Fig. 3 displays the spectra taken along traces passing through vortex cores for each of the two fields considered. The suppression of superconductivity and its effect on the conductance in a vortex core can clearly be seen.
The vortex maps show a roughly hexagonal vortex lattice with vortex spacings d = 352 ± 17 Å and 216 ± 21 Å at 2 and 6 T respectively, in agreement with the spacings $(2\Phi_0/\sqrt{3}H)^{1/2}$ expected for an Abrikosov hexagonal lattice [22], i.e. 345 Å and 199 Å. We ascribe the variations in the core shapes and the deviation from a perfectly hexagonal lattice to vortex pinning. In particular, the vortex identified by the arrow in Fig. 4 appears to be split. We attribute this to the vortex oscillating between two pinning centers during the measurement, a situation which has been seen in other compounds [23]. One should also note that the islands (surface defects) at the border of the measurement area (Fig. 1) could influence the vortex core shapes and positions.

In order to estimate the coherence length ξ from our measurements, we now consider the spatial dependence of the ZBC. Due to the proximity of the vortices, we model the LDOS as a superposition of isolated vortex LDOS which can be expressed as

$$N(\omega, \mathbf{r}) = \sum_n |u_n(\mathbf{r})|^2 \, \delta(\omega - E_n) + |v_n(\mathbf{r})|^2 \, \delta(\omega + E_n),$$

where ψn(r) = (un(r), vn(r)) is the wave function of the nth vortex core state and En its energy. An approximate solution for the isolated vortex was given long ago [24] in which the radial dependence of each ψn(r) consists of a rapidly oscillating n-dependent Bessel function multiplied by a $\cosh^{-1/\pi}(r/\xi)$ envelope common to all states. We therefore construct a phenomenological model for our 2D ZBC maps, σ(ω = 0, r) ∝ N(ω = 0, r), by retaining the slowly varying parts of the wave functions alone; since the LDOS involves the squared modulus of the wave functions, the envelope enters squared, i.e.

$$\sigma(\omega = 0, \mathbf{r}) = \sigma_0 + \Lambda \sum_i \cosh^{-2/\pi}\!\left(\frac{|\mathbf{r} - \mathbf{r}_i|}{\xi}\right) \qquad (1)$$

where σ0 = 0.13 is the residual normalized conductance at zero bias in the absence of field (Fig. 2c), Λ a scaling factor, ξ the coherence length and the sum runs over all the vortices with positions ri in the map. Using (1), we fitted ri and ξ over the entire map for each field, thus considering all imaged vortices to determine ξ.

The results from the 2D fits are presented in Fig.
4c and d in map format and along traces selected to pass through vortex cores in Fig. 4e and f. The traces help to visualize the spatial extent of the vortices and assess the (extremely high) quality of the 2D fits. We first observe that the normalized ZBC between the vortices is slightly enhanced at H = 2 T but increases strongly at H = 6 T with respect to the value at zero field (Fig. 2c), indicating a significant core overlap. From our data taken at T = 2 K, we obtain ξ = 35 ± 3 Å and 45 ± 7 Å at H = 6 and 2 T respectively (the uncertainties are estimated from the spread of the results obtained on several maps: two for 6 T and three for 2 T). Using Ginzburg-Landau theory, we extrapolate the corresponding T = 0 values as ξ = 31 ± 3 and 40 ± 6 Å respectively, consistent with the value derived from Hc2. Furthermore, our results indicate that the vortex size decreases with increasing field and, although at the limit of the error bars, we believe this trend to be genuine. In addition, this finding is consistent with the abnormally large Hc2: if the vortices become smaller as the field increases, the material can accommodate more vortices before the breakdown of superconductivity, leading to a higher upper critical field. This correlates with the observed temperature dependence of the upper critical field.

FIG. 4: (a), (b) Experimental ZBC maps (T = 2 K) normalized to the background conductance at 2 and 6 T respectively with corresponding fits (c), (d); large values (red) correspond to normal regions (i.e. vortex cores) and low values (blue) to superconducting (gapped) regions. (e), (f) Experimental ZBC profiles across vortex centers together with the corresponding profiles from the 2D fits (red lines).

We find that the spectra at the vortex centers are flat for both fields (Fig.
3), showing the presence of localized quasiparticle states in the vortex cores. However, our spectra show no excess spectral weight at or close to zero bias and thus no zero bias conductance peak (ZBCP), which is the generally expected signature of vortex core states. The absence of a ZBCP is at first glance striking considering the large mean free path ℓ ≈ 200 nm ≫ ξ in KOs2O6 [15]. In fact, this absence is common to many non-cuprate superconductors, the only known exceptions being 2H-NbSe2 [17, 25, 26] and YNi2B2C [27]. Although no definitive theory currently exists to explain such an absence, a possible explanation assumes that the scattering rate is strongly enhanced in the vortex cores. This interpretation is supported by our numerical solutions of the Bogoliubov-de Gennes equations for a single vortex with an r-dependent scattering rate Γ. Furthermore, these simulations show a radial dependence of the LDOS which is fully consistent with (1).

In conclusion, we have presented the first scanning tunneling spectroscopic measurements on superconducting KOs2O6. The fitted spectra demonstrate that KOs2O6 is a fully-gapped superconductor with an anisotropy of around 30%, possibly resulting from an s-p singlet-triplet mixed state allowed by the lack of inversion symmetry. We have imaged hexagonal vortex lattices matching Abrikosov’s prediction for 2 and 6 T fields. Using Caroli-de Gennes-Matricon theory we extract a field-dependent coherence length of 31–40 Å, in good agreement with the thermodynamic estimate from Hc2. The absence of a zero bias conductance peak, the apparent field dependence of ξ and the precise radial dependence of the LDOS all call for deeper exploration.

We acknowledge T. Jarlborg, M. Decroux, I. Maggio-Aprile and P. Legendre for valuable discussions and thank P.E. Bisson, L. Stark and M. Lancon for technical support. This work was supported by the Swiss National Science Foundation through the NCCR MaNEP.

∗ Electronic address: duboisc@mit.edu
[1] S.
Yonezawa, Y. Muraoka, Y. Matsushita and Z. Hiroi, J. Phys.: Condens. Matter 16, L9 (2004); ibid, J. Phys. Soc. Jpn. 73, 819 (2004); S. Yonezawa, Y. Muraoka and Z. Hiroi, J. Phys. Soc. Jpn. 73, 1655 (2004). [2] P. W. Anderson, Mater. Res. Bull. 8, 153 (1973). [3] H. Aoki, J. Phys.: Condens. Matter 16, V1 (2004). [4] G. Schuck, S. Kazakov, K. Rogacki, N. Zhigadlo, and J. Karpinski, Phys. Rev. B 73, 144506 (2006). [5] P. A. Frigeri, D. F. Agterberg, A. Koga, and M. Sigrist, Phys. Rev. Lett. 92, 097001 (2004); ibid, Phys. Rev. Lett. 93, 099903 (2004). [6] N. Hayashi, Y. Kato, P. A. Frigeri, K. Wakabayashi, and M. Sigrist, Physica C 437-38, 96 (2006). [7] J. Kuneš, T. Jeong, and W. E. Pickett, Phys. Rev. B 70, 174510 (2004). [8] R. Saniz, J. Medvedeva, L.-H. Ye, T. Shishidou, and A. Freeman, Phys. Rev. B 70, 100505(R) (2004). [9] J. Kuneš and W. E. Pickett, Phys. Rev. B 74, 094302 (2006). [10] Z. Hiroi, S. Yonezawa, and J. Yamaura, cond-mat/0607064, to be published in the Proceedings of HFM2006 (J. Phys.: Condens. Matter) (2006). [11] Z. Hiroi, S. Yonezawa, J. Yamaura, T. Muramatsu, and Y. Muraoka, J. Phys. Soc. Jpn. 74, 1682 (2005). [12] M. Brühwiler, S. Kazakov, J. Karpinski, and B. Batlogg, Phys. Rev. B 73, 094518 (2006). [13] K. Arai, J. Kikuchi, K. Kodama, M. Takigawa, S. Yonezawa, Y. Muraoka, and Z. Hiroi, Physica B 359-361, 488 (2005). [14] A. Koda, W. Higemoto, K. Ohishi, S. R. Saha, R. Kadono, S. Yonezawa, Y. Muraoka, and Z. Hiroi, J. Phys. Soc. Jpn. 74, 1678 (2005). [15] Y. Kasahara, Y. Shimono, T. Shibauchi, Y. Matsuda, S. Yonezawa, Y. Muraoka, and Z. Hiroi, Phys. Rev. Lett. 96, 247004 (2006). [16] T. Shibauchi, L. Krusin-Elbaum, Y. Kasahara, Y. Shimono, Y. Matsuda, R. D. McDonald, C. H. Mielke, S. Yonezawa, Z. Hiroi, M. Arai, et al., Phys. Rev. B 74, 220506 (2006). [17] H. Hess, R. Robinson, R. Dynes, J. J. Valles, and J. Waszczak, Phys. Rev. Lett. 62, 214 (1989). [18] Y. DeWilde, M. Iavarone, U. Welp, V. Metlushko, A. Koshelev, I.
Aranson, G. Crabtree, and P. Canfield, Phys. Rev. Lett. 78, 4273 (1997). [19] M. Eskildsen, M. Kugler, S. Tanaka, J. Jun, S. Kazakov, J. Karpinski, and Ø. Fischer, Phys. Rev. Lett. 89, 187003 (2002). [20] N. Bergeal, V. Dubost, Y. Noat, W. Sacks, D. Roditchev, N. Emery, C. Hérold, J.-F. Marêché, P. Lagrange, and G. Loupias, Phys. Rev. Lett. 97, 077003 (2006). [21] C. Dubois, P. E. Bisson, S. Reymond, A. A. Manuel, and Ø. Fischer, Rev. Sci. Instrum. 77, 043712 (2006). [22] A. A. Abrikosov, Sov. Phys.-JETP 5, 1174 (1957). [23] B. Hoogenboom, M. Kugler, B. Revaz, I. Maggio-Aprile, Ø. Fischer, and C. Renner, Phys. Rev. B 62, 9179 (2000). [24] C. Caroli, P. de Gennes, and J. Matricon, Physics Letters 9, 307 (1964). [25] F. Gygi and M. Schluter, Phys. Rev. B 41, 822 (1990). [26] C. Renner, A. D. Kent, P. Niedermann, Ø. Fischer, and F. Lévy, Phys. Rev. Lett. 67, 1650 (1991). [27] H. Nishimori, K. Uchiyama, S. Kaneko, A. Tokura, H. Takeya, K. Hirata, and N. Nishida, J. Phys. Soc. Jpn. 73, 3247 (2004). ABSTRACT We performed the first scanning tunneling spectroscopy measurements on the pyrochlore superconductor KOs2O6 (Tc = 9.6 K) in both zero magnetic field and the vortex state at several temperatures above 1.95 K. This material presents atomically flat surfaces, yielding spatially homogeneous spectra which reveal fully-gapped superconductivity with a gap anisotropy of 30%. Measurements performed at fields of 2 and 6 T display a hexagonal Abrikosov flux line lattice. From the shape of the vortex cores, we extract a coherence length of 31-40 {\AA}, in agreement with the value derived from the upper critical field Hc2. We observe a reduction in size of the vortex cores (and hence the coherence length) with increasing field which is consistent with the unexpectedly high and unsaturated upper critical field reported. 
<|endoftext|><|startoftext|> Introduction In the low-energy limit string theory with D-branes gives rise to noncommutative field theory on the branes when the string propagates in a nontrivial NS-NS two-form (B-field) background [1, 2, 3, 4]. In particular, if the open string has N=2 worldsheet supersymmetry, the tree-level target space dynamics is described by a noncommutative self-dual Yang-Mills (SDYM) theory in 2+2 dimensions [5]. Furthermore, open N=2 strings in a B-field background induce on the worldvolume of n coincident D2-branes a noncommutative Yang-Mills-Higgs Bogomolny-type system in 2+1 dimensions which is equivalent to a noncommutative generalization [6] of the modified U(n) chiral model known as the Ward model [7]. The topological nature of N=2 strings and the integrability of their tree-level dynamics [8] render this noncommutative sigma model integrable.1 Being integrable, the commutative U(n≥2) Ward model features a plethora of exact scattering and no-scattering multi-soliton and wave solutions, i.e. time-dependent stable configurations on R2. These are not only a rich testing ground for physical properties such as adiabatic dynamics or quantization, but also descend to more standard multi-solitons of various integrable systems in 2+0 and 1+1 dimensions, such as sine-Gordon, upon dimensional and algebraic reduction. There is a price to pay however: Nonlinear sigma models in 2+1 dimensions may be Lorentz-invariant or integrable but not both [7, 11]. In fact, Derrick’s theorem prohibits the existence of stable solitons in Lorentz-invariant scalar field theories above 1+1 dimensions. A Moyal deformation, however, overcomes this hurdle, but of course replaces Lorentz invariance by a Drinfeld-twisted version. 
There is another gain: The deformed Ward model possesses not only deformed versions of the just-mentioned multi-solitons, but in addition allows for a whole new class of genuinely noncommutative (multi-)solitons, in particular for the U(1) group [12, 13]! Moreover, this class is related to the generic but perturbatively constructed noncommutative scalar-field solitons [14, 15] by an infinite-stiffness limit of the potential [16]. In [12, 13] and [17]–[20] families of multi-solitons as well as their reduction to solitons of the noncommutative sine-Gordon equations were described and studied. In the nonabelian case both scattering and nonscattering configurations were obtained. For static configurations the issue of their stability was analyzed [21]. The full moduli space metric for the abelian model was computed and its adiabatic two-soliton dynamics was discussed [16]. Recall that the critical N=2 string theory has a four-dimensional target space, and its open string effective field theory is self-dual Yang-Mills [8], which gets deformed noncommutatively in the presence of a B-field [5]. Conversely, the noncommutative SDYM equations are contained [19] in the equations of motion of N=2 string field theory (SFT) [22] in a B-field background. This SFT formulation is based on the N=4 topological string description [23]. It is well known that the SDYM model can be described in terms of holomorphic bundles over (an open subset of) the twistor space2 [26] CP 3 and the topological N=4 string theory contains twistors from the outset. The Lax pair, integrability and the solutions to the equations of motion by twistor and dressing methods were incorporated into the N=2 open SFT in [27, 28]. However, this theory reproduces only bosonic SDYM theory, its symmetries (see e.g. [29, 30, 31]) and integrability properties. It is natural to ask: What string theory can describe supersymmetric SDYM theory [32, 33] in four dimensions? 
1 For a discussion of some other noncommutative integrable models see e.g. [9, 10] and references therein.
2 For reviews of twistor theory see, e.g., the books [24, 25].

There are some proposals [33, 34, 35, 36] for extending N=2 open string theory (and its SFT) to be space-time supersymmetric. Moreover, it was shown by Witten [37] that N=4 supersymmetric SDYM theory appears in twistor string theory, which is a B-type open topological string with the supertwistor space CP^{3|4} as a target space.³ Note that N<4 SDYM theory forms a BPS subsector of N-extended super Yang-Mills theory, and N=4 SDYM can be considered as a truncation of the full N=4 super Yang-Mills theory [37]. It is believed [43, 39] that twistor string theory is related to the previous proposals [33, 34, 35, 36] for a Lorentz-invariant supersymmetric extension of N=2 (and topological N=4) string theory which also leads to the N=4 SDYM model.

A dimensional reduction of the above relations between twistor strings and N=4 super Yang-Mills and SDYM models was considered in [44, 45, 46, 47]. The corresponding twistor string theory after this reduction is the topological B-model on the mini-supertwistor space P^{2|4}. In [47] it was shown that the 2N=8 supersymmetric extension of the Bogomolny-type model in 2+1 dimensions is equivalent to a 2N=8 supersymmetric modified U(n) chiral model on R^{2,1}. The subject of the current paper is a 2N≤8 version of the above supersymmetric Bogomolny-type Yang-Mills-Higgs model in signature (− + +), its relation to an N-extended supersymmetric modified integrable U(n) chiral model (to be defined) in 2+1 dimensions, and the Moyal-type noncommutative deformation of this chiral model. We go on to explicitly construct multi-soliton configurations on noncommutative R^{2,1} for the corresponding supersymmetric sigma model field equations. By studying the scattering properties of the constructed configurations, we prove their asymptotic factorization without scattering for large times.
We also briefly discuss a D-brane interpretation of these soliton configurations from the viewpoint of twistor string theory. 2 Supersymmetric Bogomolny model in 2+1 dimensions 2.1 N -extended SDYM equations in 2+2 dimensions Space R2,2. Let us consider the four-dimensional space R2,2 = (R4, g) with the metric ds2 = gµνdx µdxν = det(dxαα̇) = dx11̇dx22̇ − dx21̇dx12̇ (2.1) with (gµν) = diag(−1,+1,+1,−1), where µ, ν, . . . = 1, . . . , 4 are space-time indices and α = 1, 2, α̇ = 1̇, 2̇ are spinor indices. We choose the coordinates4 (xµ) = (xa, t̃) = (t, x, y, t̃) with a, b, . . . = 1, 2, 3 , (2.2) and the signature (− ++−) allows us to introduce real isotropic coordinates (cf. [19, 6]) x11̇ = 1 (t− y) , x12̇ = 1 (x+ t̃) , x21̇ = 1 (x− t̃) , x22̇ = 1 (t+ y) . (2.3) SDYM. Recall that the SDYM equations for a field strength tensor Fµν on R 2,2 read εµνρσF ρσ = Fµν , (2.4) 3For other variants of twistor string models see [38, 39, 40]. For recent reviews providing a twistor description of super Yang-Mills theory, see [41, 42] and references therein. 4Our conventions are chosen to match those of [12] after reduction to the space R2,1 with coordinates (t, x, y). where εµνρσ is a completely antisymmetric tensor on R 2,2 and ε1234 = 1. In the coordinates (2.3) we have the decomposition αα̇,ββ̇ = ∂αα̇Aββ̇ − ∂ββ̇Aαα̇ + [Aαα̇, Aββ̇ ] = εαβ Fα̇β̇ + εα̇β̇ Fαβ (2.5) := −1 αα̇,ββ̇ and Fαβ := −12ε α̇β̇F αα̇,ββ̇ , (2.6) where εαβ is antisymmetric, εαβε βγ = δ α, and similar for ε α̇β̇, with ε12 = ε1̇2̇ = 1. The gauge potential (Aαα̇) will appear in the covariant derivative , · ] . (2.7) In spinor notation, (2.4) is equivalently written as = 0 ⇔ F αα̇,ββ̇ Fαβ . (2.8) Solutions {Aαα̇} to these equations form a subset (a BPS sector) of the solution space of Yang-Mills theory on R2,2. N -extended SDYM in component fields. 
The field content of N -extended super SDYM is5 N = 0 Aαα̇ (2.9a) N = 1 Aαα̇, χiα with i = 1 (2.9b) N = 2 Aαα̇, χiα, φ[ij] with i, j = 1, 2 (2.9c) N = 3 Aαα̇, χiα, φ[ij], χ̃ [ijk] with i, j, k = 1, 2, 3 (2.9d) N = 4 Aαα̇, χiα, φ[ij], χ̃ [ijk] [ijkl] with i, j, k, l = 1, 2, 3, 4 . (2.9e) Here (Aαα̇, χ [ij], χ̃ [ijk] [ijkl] ) are fields of helicities (+1,+1 , 0,−1 ,−1). These fields obey the field equations of the N = 4 SDYM model, namely [33, 37] = 0 , (2.10a) Dαα̇χ iα = 0 , (2.10b) Dαα̇D αα̇φij + 2{χiα, χjα} = 0 , (2.10c) Dαα̇χ̃ α̇[ijk] − 6[χ[iα, φjk]] = 0 , (2.10d) D γ̇α G [ijkl] + 12{χ[iα, χ̃ } − 18[φ[ij ,D φkl]] = 0 . (2.10e) Note that the N < 4 SDYM field equations are governed by the first N+1 equations of (2.10), where F = 0 is counted as one equation and so on. 5We use symmetrization (·) and antisymmetrization [·] of k indices with weight 1 , e.g. [ij] = 1 (ij − ji). 2.2 Superfield formulation of N -extended SDYM Superspace R4|4N . Recall that in the space R2,2 = (R4, g) with the metric g given in (2.1) one may introduce purely real Majorana-Weyl spinors6 θα and ηα̇ of helicities +1 and −1 as anti- commuting (Grassmann-algebra) objects. Using 2N such spinors with components θiα and ηα̇i for i = 1, . . . ,N , one can define the N -extended superspace R4|4N and the N -extended supersymmetry algebra generated by the supertranslation operators Pαα̇ = ∂αα̇ , Qiα = ∂iα − ηα̇i ∂αα̇ and Qiα̇ = ∂iα̇ − θiα∂αα̇ , (2.11) where ∂αα̇ := ∂xαα̇ , ∂iα := and ∂iα̇ := ∂ηα̇i . (2.12) The commutation relations for the generators (2.11) read {Qiα, Qjα̇} = −2δ iPαα̇ , [Pαα̇, Qiβ ] = 0 and [Pαα̇, Q ] = 0 . (2.13) To rewrite equations of motion in terms of R4|4N superfields one uses the additional operators Diα = ∂iα + η i ∂αα̇ and D α̇ = ∂ α̇ + θ iα∂αα̇ , (2.14) which (anti)commute with the operators (2.11) and satisfy {Diα,Dj } = 2δjiPαβ̇ , [Pαα̇,Diβ ] = 0 and [Pαα̇,D ] = 0 . (2.15) Antichiral superspace R4|2N . 
On the superspace R4|4N one may introduce tensor fields de- pending on bosonic and fermionic coordinates (superfields), differential forms, Lie derivatives LX etc.. Furthermore, on any such superfield A one can impose the constraint equations LDiαA = 0, which for a scalar superfield f reduce to the so-called antichirality conditions Diαf = 0 . (2.16) These are easily solved by using a coordinate transformation on R4|4N , (xαα̇, ηα̇i , θ iα) → (x̃αα̇ = xαα̇−θiαηα̇i , ηα̇i , θiα) , (2.17) under which ∂αα̇,Diα and D α̇ transform to the operators ∂̃αα̇ = ∂αα̇ , D̃iα = ∂iα and D̃ α̇ = ∂ α̇ + 2θ iα∂αα̇ . (2.18) Then (2.16) simply means that f is defined on a sub-superspace R4|2N ⊂ R4|4N with coordinates x̃αα̇ and ηα̇i . (2.19) This space is called antichiral superspace. In the following we will usually omit the tildes when working on the antichiral superspace. 6Note that in Minkowski signature the Weyl spinor θα is complex and ηα̇ = εα̇β̇η β̇ = θα is complex conjugate to θα. For the Kleinian (split) signature 2 + 2, however, these spinors are real and independent of one another. N -extended SDYM in superfields. The N -extended SDYM equations can be rewritten in terms of superfields on the antichiral superspace R4|2N [33, 48]. Namely, for any given 0 ≤ N ≤ 4, fields of a proper multiplet from (2.9) can be combined into superfields Aαα̇ and Aiα̇ depending on xαα̇, ηα̇i ∈ R4|2N and giving rise to covariant derivatives ∇αα̇ := ∂αα̇ +Aαα̇ and ∇iα̇ := ∂iα̇ +Aiα̇ . (2.20) In such terms the N -extended SDYM equations (2.10) read [∇αα̇,∇ββ̇] + [∇αβ̇ ,∇βα̇] = 0 , [∇ α̇,∇ββ̇ ] + [∇ ,∇βα̇] = 0 , {∇iα̇,∇ }+ {∇i } = 0 , (2.21) which is equivalent to [∇αα̇,∇ββ̇] = εα̇β̇ Fαβ , [∇ α̇,∇ββ̇ ] = εα̇β̇ F β and {∇iα̇,∇ } = ε F ij , (2.22) where F ij is antisymmetric and Fαβ is symmetric in their indices. The above gauge potential superfields (Aαα̇, Aiα̇) as well as the gauge strength superfields (Fαβ , F iα, F ij) contain all physical component fields of theN -extended SDYMmodel. 
For instance, the lowest component of the triple (Fαβ , F iα, F ij) in an η-expansion is (Fαβ , χiα, φij), with zeros in case N is too small. By employing Bianchi identities for the gauge strength superfields, one successively obtains [48] the superfield expansions and the field equations (2.10) for all component fields. It is instructive to extend the antichiral combination in (2.18) to potentials and covariant derivatives, D̃iα̇ = ∂ α̇ + 2 θ iα ∂αα̇ + + + Ãiα̇ := Aiα̇ + 2 θiαAαα̇ ‖ ‖ ‖ ∇̃iα̇ := ∇iα̇ + 2 θiα∇αα̇ (2.23) where ∇αα̇, ∇iα̇ and D̃iα̇ are given by (2.20) and (2.18), while Aiα̇ and Aαα̇ depend on xαα̇ and ηα̇i only. With the antichiral covariant derivatives, one may condense (2.21) or (2.22) into the single {∇̃iα̇, ∇̃ } + {∇̃i , ∇̃j } = 0 ⇔ {∇̃iα̇, ∇̃ } = ε F̃ ij , (2.24) with F̃ ij = F ij + 4 θ[iαF j]α + 4 θiαθjβFαβ . The concise form (2.24) of the N -extended SDYM equations is quite convenient, and we will use it interchangeable with (2.21). Linear system for N -extended SDYM. It is well known that the superfield SDYM equations (2.21) can be seen as the compatibility conditions for the linear system of differential equations ζ α̇(∂αα̇ +Aαα̇)ψ = 0 and ζ α̇(∂iα̇ +Aiα̇)ψ = 0 , (2.25) where (ζ and ζ α̇ = εα̇β̇ζ . The extra (spectral) parameter7 ζ lies in the extended complex plane C∪∞ = CP 1. Here ψ is a matrix-valued function depending not only on xαα̇ and ηα̇i but also (meromorphically) on ζ ∈ CP 1. We subject the n×n matrix ψ to the following reality condition: ψ(xαα̇, ηα̇i , ζ) ψ(xαα̇, ηα̇i , ζ̄) = 1l , (2.26) 7The parameter ζ is related with λ used in [45] by the formula ζ = i 1−λ (cf. e.g. [31]). where “†” denotes hermitian conjugation and ζ̄ is complex conjugate to ζ. This condition guarantees that all physical fields of the N -extended SDYM model will take values in the adjoint representation of the algebra u(n). In the concise form the linear system (2.25) is written as ζ α̇(∇iα̇ + 2θiα∇αα̇)ψ = 0 ⇔ ζ α̇(D̃iα̇ + Ãiα̇)ψ = 0 ⇔ ζ α̇ ∇̃iα̇ ψ = 0 . 
(2.27) 2.3 Reduction of N -extended SDYM to 2+1 dimensions The supersymmetric Bogomolny-type Yang-Mills-Higgs equations in 2+1 dimensions are obtained from the described N -extended super SDYM equations by a dimensional reduction R2,2 → R2,1. In particular, for the N=0 sector we demand the components Aµ of a gauge potential to be independent of x4 and put A4 =: ϕ. Here, ϕ is a Lie-algebra valued scalar field in three dimensions (the Higgs field) which enters into the Bogomolny-type equations. Similarly, for N ≥ 1 one can reduce the N -extended SDYM equations on R2,2 by imposing the ∂4-invariance condition on all the fields (Aαα̇, χ [ij], χ̃ [ijk] [ijkl] ) from the N=4 supermultiplet or its truncation to N<4 and obtain supersymmetric Bogomolny-type equations on R2,1. Spinors in R2,1. Recall that on R2,2 both N=4 SDYM theory and full N=4 super Yang- Mills theory have an SL(4, R) ∼= Spin(3,3) R-symmetry group [33]. A dimensional reduction to 2,1 enlarges the supersymmetry and R-symmetry to 2N=8 and Spin(4,4), respectively, for both theories (cf. [49] for Minkowski signature). More generally, any number N of supersymmetries gets doubled to 2N in the reduction. Since dimensional reduction collapses the rotation group Spin(2,2) ∼= Spin(2,1)L×Spin(2,1)R of R2,2 to its diagonal subgroup Spin(2,1)D as the local rotation group of R2,1, the distinction between undotted and dotted indices disappears. We shall use undotted indices henceforth. Coordinates and derivatives in R2,1. The above discussion implies that one can relabel the bosonic coordinates xαβ̇ from (2.3) by xαβ and split them as xαβ = 1 (xαβ + xβα) + 1 (xαβ − xβα) = x(αβ) + x[αβ] (2.28) into antisymmetric and symmetric parts, x[αβ] = 1 εαβx4 = 1 εαβ t̃ and x(αβ) =: yαβ , (2.29) respectively, with y11 = x11 = 1 (t− y) , y12 = 1 (x12 + x21) = 1 x , y22 = x22 = 1 (t+ y) . (2.30) We also have θiα 7→ θiα and ηα̇i 7→ ηαi for the fermionic coordinates on R4|4N reduced to R3|4N . 
Bosonic coordinate derivatives reduce in 2+1 dimensions to the operators ∂(αβ) = (∂αβ + ∂βα) (2.31) which read explicitly as ∂(11) = = ∂t−∂y , ∂(12) = ∂(21) = 12 = ∂x , ∂(22) = = ∂t+∂y . (2.32) We thus have = ∂(αβ) − εαβ∂4 = ∂(αβ) − εαβ∂t̃ , (2.33) where ε12 = −ε21 = −1, ∂4 = ∂/∂x4 and ∂t̃ = ∂/∂t̃. The operators Diα and D α̇ acting on t̃-independent superfields reduce to Diα = ∂iα + η i ∂(αβ) and D α = ∂ α + θ iβ∂(αβ) , (2.34) where ∂iα = ∂/∂θ iα and ∂iα = ∂/∂η i . Similarly, the antichiral operators D̃iα and D̃ α̇ in (2.18) become D̂iα = ∂iα and D̂ α = ∂ α + 2θ iβ∂(αβ) . (2.35) Supersymmetric Bogomolny-type equations in component fields. According to (2.33), the components A of a gauge potential in four dimensions split into the components A(αβ) of a gauge potential in three dimensions and a Higgs field A[αβ] = −εαβ ϕ, i.e. Aαβ = A(αβ) +A[αβ] = A(αβ) − εαβ ϕ . (2.36) Then the covariant derivatives D reduced to three dimensions become the differential operators Dαβ − εαβ ϕ = ∂(αβ) + [A(αβ), · ]− εαβ [ϕ, · ] , (2.37) and the Yang-Mills field strength on R2,1 decomposes as Fαβ, γδ = [Dαβ , Dγδ] = εαγ fβδ + εβδ fαγ with fαβ = fβα . (2.38) Substituting (2.36) and (2.37) into (2.10), i.e. demanding that all fields in (2.10) are independent of x4 = t̃, we obtain the following supersymmetric Bogomolny-type equations on R2,1: fαβ +Dαβϕ = 0 , (2.39a) Dαβ χ iβ + εαβ [ϕ, χ iβ] = 0 , (2.39b) Dαβ D αβφij + 2[ϕ, [ϕ, φij ]] + 2{χiα, χjα} = 0 , (2.39c) Dαβ χ̃ β[ijk] − εαβ [ϕ, χ̃β[ijk]]− 6[χ[iα, φjk]] = 0 , (2.39d) [ijkl] + [ϕ,G [ijkl] ] + 12{χ[iα, χ̃jkl]β } − 18[φ[ij ,Dαβφkl]]− 18εαβ [φ[ij , [φkl], ϕ]] = 0 .(2.39e) Supersymmetric Bogomolny-type equations in terms of superfields. Translations gen- erated by the vector field ∂4 = ∂t̃ are isometries of superspaces R 4|4N and R4|2N . By taking the quotient with respect to the action of the abelian group G generated by ∂4, we obtain the reduced full superspace R3|4N ∼= R4|4N /G and the reduced antichiral superspace R3|2N ∼= R4|2N/G. 
In the following, we shall work on R3|2N and R3|2N × CP 1, since the reduced ψ-function from (2.25) and (2.27) is defined on the latter space. The linear system stays in the center of the superfield approach to the N -extended SDYM equations. After imposing t̃-independence on all fields in the linear system (2.27), we arrive at the linear equations ζα ∇̂iα ψ ≡ ζα(D̂iα + Âiα)ψ = 0 (2.40) of the same form but with D̂iα = ∂ α + 2θ iβ∂(αβ) and Âiα = Aiα + 2θiβ(A(αβ) − εαβΞ) , (2.41) where Aiα, A(αβ) and Ξ are superfields depending on yαβ and ηαi only. These linear equations expand again to the pair (cf. (2.25)) ζβ(∂(αβ) +A(αβ) − εαβΞ)ψ = 0 and ζα(∂iα +Aiα)ψ = 0 . (2.42) The compatibility conditions for the linear system (2.40) read {∇̂iα, ∇̂ } + {∇̂iβ, ∇̂jα} = 0 ⇔ {∇̂iα, ∇̂ } = εαβ F̂ ij (2.43) and present a condensed form of (2.39) rewritten in terms of R3|2N superfields. Similarly, these equations can also be written in more expanded forms analogously to (2.21) or using the superfield analog of (2.37). However, we will not do this since all these sets of equations are equivalent. 3 Noncommutative N -extended U(n) chiral model in 2+1 dimensions As has been known for some time, nonlinear sigma models in 2 + 1 dimensions may be Lorentz- invariant or integrable but not both [7, 11]. We will show that the super Bogomolny-type model discussed in Section 2 after a gauge fixing is equivalent to a super extension of the modified U(n) chiral model (so as to be integrable) first formulated by Ward [7]. Since integrability is compatible with noncommutative deformation (if introduced properly, see e.g. [9]–[20]) we choose from the beginning to formulate our super extension of this chiral model on Moyal-deformed R2,1 with noncommutativity parameter θ ≥ 0. Ordinary space-time R2,1 can always be restored by taking the commutative limit θ → 0. Star-product formulation. 
Classical field theory on noncommutative spaces may be realized in a star-product formulation or in an operator formalism⁸. The first approach is closer to the commutative field theory: it is obtained by simply deforming the ordinary product of classical fields (or their components) to the noncommutative star product

$(f \star g)(x) = f(x)\,\exp\{\tfrac{i}{2}\,\overleftarrow{\partial}_a\,\theta^{ab}\,\overrightarrow{\partial}_b\}\,g(x) \quad\Rightarrow\quad x^a \star x^b - x^b \star x^a = i\,\theta^{ab}$ (3.1)

with a constant antisymmetric tensor θ^{ab}. Specializing to R^{2,1}, we use real coordinates (x^a) = (t, x, y) in which the Minkowski metric g on R^3 reads (g_{ab}) = diag(−1,+1,+1) with a, b, . . . = 1, 2, 3 (cf. Section 2). It is straightforward to generalize the Moyal deformation (3.1) to the superspaces introduced in the previous section, allowing in particular for non-anticommuting Grassmann-odd coordinates. Deferring general superspace deformations and their consequences to future work, we here content ourselves with the simple embedding of the “bosonic” Moyal deformation into superspace, meaning that (3.1) is also valid for superfields f and g depending on Grassmann variables θ^{iα} and η^α_i. For later use we consider not only the isotropic coordinates and vector fields

$u := \tfrac{1}{2}(t+y) = y^{22}, \quad v := \tfrac{1}{2}(t-y) = y^{11}, \quad \partial_u = \partial_t + \partial_y = \partial_{(22)}, \quad \partial_v = \partial_t - \partial_y = \partial_{(11)}$ (3.2)

introduced in Section 2, but also the complex combinations

$z := x + iy, \quad \bar{z} := x - iy, \quad \partial_z = \tfrac{1}{2}(\partial_x - i\partial_y), \quad \partial_{\bar z} = \tfrac{1}{2}(\partial_x + i\partial_y)$. (3.3)

Since the time coordinate t remains commutative, the only nonvanishing component of the noncommutativity tensor θ^{ab} is

$\theta^{xy} = -\theta^{yx} =: \theta > 0 \quad\Rightarrow\quad \theta^{z\bar z} = -\theta^{\bar z z} = -2i\,\theta$. (3.4)

Hence, we have

$z \star \bar{z} = z\bar{z} + \theta \quad\text{and}\quad \bar{z} \star z = z\bar{z} - \theta$ (3.5)

as examples of the general formula (3.1).

8 See [50] for reviews on noncommutative field theories.

Operator formalism. The nonlocality of the star products renders explicit computation cumbersome.
We therefore pass to the operator formalism, which trades the star product for operator-valued spatial coordinates (x̂, ŷ) or their complex combinations (ẑ, ẑ̄), subject to

$[t, \hat{x}] = [t, \hat{y}] = 0 \quad\text{but}\quad [\hat{x}, \hat{y}] = i\theta \quad\Rightarrow\quad [\hat{z}, \hat{\bar z}] = 2\theta$. (3.6)

The latter equation suggests the introduction of annihilation and creation operators,

$a = \frac{1}{\sqrt{2\theta}}\,\hat{z} \quad\text{and}\quad a^\dagger = \frac{1}{\sqrt{2\theta}}\,\hat{\bar z} \quad\text{with}\quad [a, a^\dagger] = 1$, (3.7)

which act on a harmonic-oscillator Fock space H with an orthonormal basis { |ℓ⟩, ℓ = 0, 1, 2, . . .} such that

$a\,|\ell\rangle = \sqrt{\ell}\,|\ell-1\rangle \quad\text{and}\quad a^\dagger\,|\ell\rangle = \sqrt{\ell+1}\,|\ell+1\rangle$. (3.8)

Any superfield f(t, z, z̄, η^α_i) on R^{3|2N} can be related to an operator-valued superfield f̂(t, η^α_i) ≡ F(t, a, a†, η^α_i) on R^{1|2N} acting in H, with the help of the Moyal-Weyl map

$f(t, z, \bar z, \eta^\alpha_i) \mapsto \hat{f}(t, \eta^\alpha_i) = \text{Weyl-ordered } f\big(t, \sqrt{2\theta}\,a, \sqrt{2\theta}\,a^\dagger, \eta^\alpha_i\big)$. (3.9)

The inverse transformation recovers the ordinary superfield,

$\hat{f}(t, \eta^\alpha_i) \equiv F(t, a, a^\dagger, \eta^\alpha_i) \mapsto f(t, z, \bar z, \eta^\alpha_i) = F_\star\big(t, \tfrac{z}{\sqrt{2\theta}}, \tfrac{\bar z}{\sqrt{2\theta}}, \eta^\alpha_i\big)$, (3.10)

where F⋆ is obtained from F by replacing ordinary with star products. Under the Moyal-Weyl map, we have

$f \star g \mapsto \hat{f}\,\hat{g} \quad\text{and}\quad \int dx\,dy\, f = 2\pi\theta\,\mathrm{Tr}\,\hat{f} = 2\pi\theta \sum_{\ell \ge 0} \langle \ell|\hat{f}|\ell\rangle$, (3.11)

and the spatial derivatives are mapped into commutators,

$\partial_z f \mapsto \hat{\partial}_z \hat{f} = -\frac{1}{\sqrt{2\theta}}\,[a^\dagger, \hat{f}] \quad\text{and}\quad \partial_{\bar z} f \mapsto \hat{\partial}_{\bar z} \hat{f} = \frac{1}{\sqrt{2\theta}}\,[a, \hat{f}]$. (3.12)

For notational simplicity we will from now on omit the hats over the operators except when confusion may arise.

Gauge fixing for ψ. Note that the linear system (2.40) and the compatibility conditions (2.43) are invariant under a gauge transformation

$\psi \mapsto \psi' = g^{-1}\psi$, (3.13a)
$A \mapsto A' = g^{-1} A\, g + g^{-1}\partial g$ (with appropriate indices), (3.13b)
$\Xi \mapsto \Xi' = g^{-1}\,\Xi\, g$, (3.13c)

where g = g(x^a, η^α_i) is a U(n)-valued superfield globally defined on the deformed superspace R^{3|2N}_θ × CP^1. Using a gauge transformation of the form (3.13), we can choose ψ such that it will satisfy the standard asymptotic conditions (see e.g. [51])

$\psi = \Phi^{-1} + O(\zeta)$ for ζ → 0, (3.14a)
$\psi = \mathbb{1} + \zeta^{-1}\Upsilon + O(\zeta^{-2})$ for ζ → ∞, (3.14b)

where the U(n)-valued function Φ and u(n)-valued function Υ depend on x^a and η^α_i.
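The oscillator relations (3.6)–(3.8) can be checked on a truncated Fock space. The following is a minimal sketch only: the truncation size `dim` and the value of `theta` are hypothetical choices, and a finite truncation necessarily violates [a, a†] = 1 in the last basis state, which the code makes explicit.

```python
import math


def annihilation(dim):
    """Matrix of a on span{|0>, ..., |dim-1>}, using a|l> = sqrt(l)|l-1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for level in range(1, dim):
        a[level - 1][level] = math.sqrt(level)
    return a


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(p, q):
    pq, qp = matmul(p, q), matmul(q, p)
    n = len(p)
    return [[pq[i][j] - qp[i][j] for j in range(n)] for i in range(n)]


dim = 6        # truncation size (hypothetical choice)
theta = 0.25   # sample noncommutativity parameter (hypothetical value)

a = annihilation(dim)
a_dag = transpose(a)  # entries are real, so the transpose is the adjoint

# z_hat = sqrt(2 theta) a and zbar_hat = sqrt(2 theta) a^dagger, inverting (3.7).
scale = math.sqrt(2.0 * theta)
z_hat = [[scale * entry for entry in row] for row in a]
zbar_hat = [[scale * entry for entry in row] for row in a_dag]

comm_a = commutator(a, a_dag)        # identity except for the last diagonal entry
comm_z = commutator(z_hat, zbar_hat)  # 2*theta times the same matrix
```

Away from the truncation edge, `comm_z` reproduces the value 2θ of (3.6), which is also the difference of the two star products in (3.5).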
This “unitary” gauge is compatible with the reality condition for ψ, ψ(xa, ηαi , ζ) ψ(xa, ηαi , ζ̄) = 1l , (3.15) obtained by reduction from (2.26). Gauge fixing for Âiα. After fixing the unitary gauge (3.14) for ψ and inserting (ζα) = the linear system (2.40), one can easily reconstruct the superfield given in (2.41) from Φ or Υ via Âi1 = 0 and Âi2 = Φ−1D̂i2Φ = D̂i1Υ (3.16) and thus fix a gauge for the superfields Âiα. The operators D̂iα were defined in (2.35). One can express (3.16) in terms of Aiα and A(αβ) − εαβΞ as Ai1 = 0 and Ai2 = Φ−1∂i2Φ = ∂i1Υ , (3.17) A(11) = 0 and A(12) + Ξ = Φ−1∂(12)Φ = ∂(11)Υ , (3.18) A(21) − Ξ = 0 and A(22) = Φ−1∂(22)Φ = ∂(12)Υ . (3.19) Using (2.32), we can rewrite the nonzero components as A := Φ−1∂uΦ = ∂xΥ , B := Φ−1∂xΦ = ∂vΥ , Ci := Φ−1∂i2Φ = ∂i1Υ . (3.20) Recall that the superfields Φ and Υ depend on xa and ηαi . Linear system. In the above-introduced unitary gauge the linear system (2.42) reads (ζ∂x − ∂u −A)ψ = 0 , (ζ∂v − ∂x − B)ψ = 0 , (ζ∂i1 − ∂i2 − Ci)ψ = 0 , (3.21) which adds the last equation to the linear system of the Ward model [7] and generalizes it to superfields A(xa, ηαj ), B(xa, ηαj ) and Ci(xa, ηαj ). The concise form of (3.21) reads ζ D̂i1 − D̂i2 − Âi2 ψ = 0 (3.22) or, in more explicit form, ∂i1 + 2θ i1∂v + 2θ ∂i2 + Ci + 2θi1(∂x + B) + 2θi2(∂u +A) ψ = 0 . (3.23) N -extended sigma model. The compatibility conditions of this linear system are the N - extended noncommutative sigma model equations D̂i1(Φ −1D̂j2 Φ) + D̂ −1D̂i2 Φ) = 0 (3.24) which in expanded form reads (gab + vcε cab) ∂a(Φ −1∂bΦ) = 0 ⇔ ∂x(Φ−1∂xΦ) − ∂v(Φ−1∂uΦ) = 0 , (3.25a) ∂i1(Φ −1∂xΦ) − ∂v(Φ−1∂i2Φ) = 0 , ∂i1(Φ−1∂uΦ) − ∂x(Φ−1∂i2Φ) = 0 , (3.25b) ∂i1(Φ −1∂j2Φ) + ∂ −1∂i2Φ) = 0 . (3.25c) Here, the first line contains the Wess-Zumino-Witten term with a constant vector (vc) = (0, 1, 0) which spoils the standard Lorentz invariance but yields an integrable chiral model in 2+1 dimen- sions. 
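As a sanity check of the bosonic equation (3.25a), consider the simplest special case, which is an illustration and not a case treated separately in the text: the commutative limit θ → 0 with gauge group U(1) and Φ = exp(iφ). Then Φ^{-1}∂_bΦ = i∂_bφ, the antisymmetric ε-term drops out because partial derivatives commute, and (3.25a) reduces to (∂_x² − ∂_v∂_u)φ = 0 with ∂_u = ∂_t + ∂_y and ∂_v = ∂_t − ∂_y, i.e. the wave equation. The sketch below verifies this by finite differences for a plane wave; the wave numbers, step size and sample point are hypothetical.

```python
import math

H = 1e-3  # finite-difference step (hypothetical choice)


def d(g, axis):
    """Central finite-difference derivative of g(t, x, y) along one axis."""
    def dg(t, x, y):
        plus = [t, x, y]
        minus = [t, x, y]
        plus[axis] += H
        minus[axis] -= H
        return (g(*plus) - g(*minus)) / (2.0 * H)
    return dg


def d_u(g):
    """Vector field d_u = d_t + d_y."""
    g_t, g_y = d(g, 0), d(g, 2)
    return lambda t, x, y: g_t(t, x, y) + g_y(t, x, y)


def d_v(g):
    """Vector field d_v = d_t - d_y."""
    g_t, g_y = d(g, 0), d(g, 2)
    return lambda t, x, y: g_t(t, x, y) - g_y(t, x, y)


def plane_wave(kx, ky, omega):
    return lambda t, x, y: math.cos(kx * x + ky * y - omega * t)


def residual(phi, t, x, y):
    """Left-hand side of the abelian, commutative form of (3.25a), divided by i."""
    return d(d(phi, 1), 1)(t, x, y) - d_v(d_u(phi))(t, x, y)


kx, ky = 0.7, -1.1
on_shell = plane_wave(kx, ky, math.hypot(kx, ky))         # omega^2 = kx^2 + ky^2
off_shell = plane_wave(kx, ky, 2.0 * math.hypot(kx, ky))  # wrong dispersion

point = (0.3, -0.2, 0.5)
res_on = residual(on_shell, *point)
res_off = residual(off_shell, *point)
```

Here `res_on` vanishes up to discretization error while `res_off` does not. The Lorentz-violating term only becomes visible for nonabelian or noncommutative Φ, where Φ^{-1}∂_bΦ is no longer a pure gradient.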
Recall that Φ is a U(n)-valued matrix whose elements act as operators in the Fock space H and depend on xa and 2N Grassmann variables ηαi . As discussed in Section 2, the compatibility conditions of the linear equations (3.22) (or (3.21)) are equivalent to the N -extended Bogomolny- type equations (2.39) for the component (physical) fields. Thus, chiral model field equations (3.25) are equivalent to a gauge fixed form of equations (2.39). Υ-formulation. Instead of Φ-parametrization of (A,B, Ci) given in (3.17)–(3.20) we may use the equivalent Υ-parametrization also given there. In this case, the compatibility conditions for the linear system (3.21) reduce to (∂2x − ∂u∂v)Υ + [∂vΥ , ∂xΥ] = 0 , (3.26a) (∂i2∂v − ∂i1∂x)Υ + [∂i1Υ , ∂vΥ] = 0 , (∂i2∂x − ∂i1∂u)Υ + [∂i1Υ , ∂xΥ] = 0 , (3.26b) (∂i2∂ 1 + ∂ 1)Υ + {∂i1Υ , ∂ 1Υ} = 0 , (3.26c) which in concise form read (D̂i2 D̂ 1 + D̂ 1)Υ + {D̂i1Υ , D̂ 1Υ} = 0 . (3.27) Recall that Υ is a u(n)-valued matrix whose elements act as operators in the Fock space H and depend on xa and 2N Grassmann variables ηαi . For N=4, the commutative limit of (3.27) can be considered as Siegel’s equation [33] reduced to 2+1 dimensions. According to Siegel, one can extract the multiplet of physical fields appearing in (2.39) from the prepotential Υ via ∂i1Υ = A 2 , ∂ 1Υ = φ ij , ∂i1∂ 1Υ = χ̃ [ijk] 2 , ∂ 1Υ = G [ijkl] 22 , (3.28a) ∂(α1)Υ = A(α2) − εα2ϕ , ∂(α1)∂i1Υ = χiα , ∂(α1)∂(β1)Υ = fαβ , (3.28b) where one takes Υ and its derivatives at η2i = 0. The other components of the physical fields, i.e. χ̃ [ijk] 1 , G [ijkl] 11 , G [ijkl] 21 , A(11) and A(21)−ϕ, vanish in this light-cone gauge. Supersymmetry transformations. The 4N supercharges given in (2.11) reduce in 2+1 dimen- sions to the form Qiα = ∂iα − ηβi ∂(αβ) and Q α = ∂ α − θiβ∂(αβ) . (3.29) Their antichiral version, matching to D̂iα and D̂ of (2.35), reads Q̂iα = ∂iα − 2ηβi ∂(αβ) and Q̂ , (3.30) so that {Q̂iα , Q̂jβ} = −2 δ i ∂(αβ) . 
(3.31) On a (scalar) R3|2N superfield Σ these supersymmetry transformations act as δ̂Σ := εiαQ̂iαΣ + ε αΣ (3.32) and are induced by the coordinate shifts δ̂ yαβ = −2εi(αηβ)i and δ̂ η i = ε i , (3.33) where εiα and εαi are 4N real Grassmann parameters. It is easy to see that our equations (3.24) and (3.27) are invariant under the supersymmetry transformations (3.32) (applied to Φ or Υ). This is simply because the operators D̂iα and D̂ anticommute with the supersymmetry generators Q̂iα and Q̂ . Therefore, the equations of motion (3.25) of the modified N -extended chiral model in 2+1 dimensions as well as their reductions to 2+0 and 1+1 dimensions carry 2N supersymmetries and are genuine supersymmetric extensions of the corresponding bosonic equations. Note that this type of extension is not the standard one since the R-symmetry groups are Spin(N ,N ) in 2+1 and Spin(N ,N )× Spin(N ,N ) in 1+1 dimensions, which differ from the compact unitary R-symmetry groups of standard sigma models. Contrary to the standard case of two-dimensional sigma models the above “noncompact” 2N supersymmetries do not impose any constraints on the geometry of the target space, e.g. they do not demand it to be Kähler [52] or hyper-Kähler [53]. This may be of interest and deserves further study. Action functionals. In either formulation of the N -extended supersymmetric SDYM model on 2,2 there are difficulties with finding a proper action functional generalizing the one [54, 55] for the purely bosonic case. These difficulties persist after the reduction to 2+1 dimensions, i.e. for the equations (3.25) and (3.26) describing our supersymmetric modified U(n) chiral model. It is the price to be paid for overcoming the no-go barrier N ≤ 4 and the absence of geometric target-space constraints. On a more formal level, the problem is related to the chiral character of (3.24) as well as (3.27), where only the operators D̂iα but not D̂iα appear. 
Note however, that for N = 4 one can write an action functional in component fields producing the equations (2.39), which are equivalent to the superspace equations (3.24) when i, j = 1, . . . , 4 (see e.g. [47]). One proposal for an action functional stems from Siegel’s idea [33] for the Υ-formulation of the N -extended SDYM equations. Namely, one sees that ∂i2Υ enters only linearly into the last two lines in (3.26). Therefore, if we introduce Υ(1) := Υ|η2 =0 (3.34) then it must satisfy the first equation from (3.26), and the remaining equations iteratively define the dependence of Υ on η2i starting from Υ(1). Hence, all information is contained in Υ(1), as can also be seen from (3.28). In other words, the dependence of Υ on η2i is not ‘dynamical’. For an action one can then take (cf. [33]) d3x dN η1 Υ(1)∂(αβ)∂ (αβ)Υ(1) + Υ(1) ε αβ∂(α1)Υ(1) ∂(β1)Υ(1) . (3.35) Extremizing this functional yields the first line of (3.26) at η2i = 0. Except for the Grassmann integration, this action has the same form as the purely bosonic one [55]. One may apply the same logic to the Φ-formulation where the action for the purely bosonic case is also known [54, 56]. 4 N -extended multi-soliton configurations via dressing The existence of the linear system (3.22) (equivalent to (3.21)) encoding solutions of theN -extended U(n) chiral model in an auxiliary matrix ψ allows for powerful methods to systematically construct explicit solutions for ψ and hence for Φ† = ψ|ζ=0 and Υ = lim ζ (ψ−1l). For our purposes the so-called dressing method [57, 51] proves to be the most practical [12]–[20], and so we shall use it here for our linear system, i.e. already in the N -extended noncommutative case. Multi-pole ansatz for ψ. The dressing method is a recursive procedure for generating a new solution from an old one. More concretely, we rewrite the linear system (3.21) in the form ψ(∂u − ζ∂x)ψ† = A , ψ(∂x − ζ∂v)ψ† = B , ψ(∂i2 − ζ∂i1)ψ† = Ci . 
(4.1) Recall that ψ† := (ψ(xa, ηαi , ζ̄)) † and (A,B, Ci) depend only on xa and ηαi . The central idea is to demand analyticity in the spectral parameter ζ, which strongly restricts the possible form of ψ. One way to exploit this constraint starts from the observation that the left hand sides of (4.1) as well as of the reality condition (3.15) do not depend on ζ while ψ is expected to be a nontrivial function of ζ globally defined on CP 1. Therefore, it must be a meromorphic function on CP 1 possessing some poles which we choose to lie at finite points with constant coordinates µk ∈ CP 1. Here we will build a (multi-soliton) solution ψm featuringm simple poles at positions µ1, . . . , µm with9 Imµk < 0 by left-multiplying an (m−1)-pole solution ψm−1 with a single-pole factor of the µm − µ̄m ζ − µm a, ηαi ) , (4.2) where the n×n matrix function Pm is yet to be determined. Starting from the trivial (vacuum) solution ψ0 = 1l, the iteration ψ0 7→ ψ1 7→ . . . 7→ ψm yields a multiplicative ansatz for ψm, µm−ℓ − µ̄m−ℓ ζ − µm−ℓ , (4.3) which, via partial fraction decomposition, may be rewritten in the additive form ψm = 1l + ζ − µk , (4.4) 9This condition singles out solitons over anti-solitons, which appear for Imµk > 0. where Λmk and Sk are some n×rk matrices depending on xa and ηαi , with rk ≤ n. Equations for Sk. Let us first consider the additive parametrization (4.4) of ψm. This ansatz must satisfy the reality condition (3.15) as well as our linear equations in the form (4.1). In particular, the poles at ζ = µ̄k on the left hand sides of these equations have to be removable since the right hand sides are independent of ζ. Inserting the ansatz (4.4) and putting to zero the corresponding residues, we learn from (3.15) that µ̄k − µℓ Sk = 0 , (4.5) while from (4.1) we obtain the differential equations µ̄k − µℓ A,B,i Sk = 0 , (4.6) where L̄ A,B,i stands for either L̄Ak = ∂u − µ̄k∂x , L̄Bk = µk(∂x − µ̄k∂v) or L̄ik = ∂i2 − µ̄k∂i1 . 
(4.7) Note that we consider a recursive procedure starting from m=1, and operators (4.7) will appear with k = 1, . . . ,m if we consider poles at ζ = µ̄k. Because the L̄ A,B,i for k = 1, . . . ,m are linear differential operators, it is easy to write down the general solution for (4.6) at any given k, by passing from the coordinates (u, v, x; η1i , η i ) to “co-moving coordinates” (wk, w̄k, sk; η , η̄i ). The precise relation for k = 1, . . . ,m is [12, 58] wk := x+ µ̄ku+ µ̄ v = x+ 1 (µ̄k−µ̄−1k )y + (µ̄k+µ̄ )t and ηik := η i + µ̄kη i , (4.8) with w̄k and η̄ obtained by complex conjugation and the co-moving time sk being inessential because by definition nothing will depend on it. The kth moving frame travels with a constant velocity (vx , vy)k = − ( µk + µ̄k µkµ̄k + 1 µkµ̄k − 1 µkµ̄k + 1 , (4.9) so that the static case wk=z is recovered for µk = −i. On functions of (wk, ηik, w̄k, η̄ik) alone the operators (4.7) act as L̄Ak = L̄ k = (µk−µ̄k) =: L̄k and L̄ k = (µk−µ̄k) . (4.10) By induction in k = 1, . . . ,m we learn that, due to (4.5), a necessary and sufficient condition for a solution of (4.6) is L̄kSk = SkZ̃k and L̄ kSk = SkZ̃ k (4.11) with some rk×rk matrices Z̃k and Z̃ik depending on (wk, w̄k, η Passing to the noncommutative bosonic coordinates we obtain ŵk , ˆ̄wk = 2θ νkν̄k with νkν̄k = µk−µ̄k−µ−1k +µ̄ . (4.12) Thus, we can introduce annihilation and creation operators and c so that [ck , c ] = 1 (4.13) for k = 1, . . . ,m. Naturally, this Heisenberg algebra is realized on a “co-moving” Fock space Hk, with basis states |ℓ〉k and a “co-moving” vacuum |0〉k subject to ck|0〉k = 0. Each co-moving vacuum |0〉k (annihilated by ck) is related to the static vacuum |0〉 (annihilated by a) through an ISU(1,1) squeezing transformation (cf. [12]) which is time-dependent. The fermionic coordinates ηik and η̄ k remain spectators in the deformation. Coordinate derivatives are represented in the standard fashion as 7→ −[c† , · ] and ν̄k 7→ [ck , · ] . 
(4.14) After the Moyal deformation, the n×rk matrices Sk have become operator-valued, but are still functions of the Grassmann coordinates ηi and η̄i . The noncommutative version of the BPS conditions (4.11) naturally reads ck Sk = Sk Zk and Sk = Sk Z k (4.15) where Zk and Z k are some operator-valued rk×rk matrix functions of η and η̄ Nonabelian solutions for Sk. For general data Zk and Z it is difficult to solve (4.15), but it is also unnecessary because the final expression ψm turns out not to depend on them. Therefore, we conveniently choose Zk = ck ⊗ 1lrk×rk and Zik = 0 ⇒ Sk = Rk(ck, ηik) , (4.16) where Rk is an arbitrary n×rk matrix function independent of c†k and η̄ik.10 It is known that nonabelian (multi-) solitons arise for algebraic functions Rk (cf. e.g. [7] for the commutative and [12] for the noncommutative N=0 case). Their common feature is a smooth commutative limit. The only novelty of the supersymmetric extension is the ηi dependence, i.e. Rk = Rk,0 + η kRk,i + η Rk,ij + η Rk,ijp + η Rk,ijpq . (4.17) Abelian solutions for Sk. It is useful to view Sk as a map from C rk⊗Hk to Cn⊗Hk (momentarily suppressing the η dependence). The noncommutative setup now allows us to generalize the domain of this map to any subspace of Cn ⊗Hk. In particular, we may choose it to be finite-dimensional, say Cqk , and represent the map by an n×qk array |Sk〉 of kets in H. In this situation, Zk and Zik in (4.15) are just number -valued qk×qk matrix functions of ηjk and η̄ . In case they do not depend on η̄ , we can write down the most general solution as |Sk〉 = Rk(ck, ηjk) |Zk〉 exp ) η̄ik with |Zk〉 := exp |0〉k . (4.18) 10Changing Zk or Z k multiplies Rk by an invertible factor from the right, which drops out later, except for the degenerate case Zk=0 which yields Sk = Rk |0〉k〈0|k. As before, we may put Zi = 0 without loss of generality, but now the choice of Zk does matter. 
For any given k generically there exists a qk-dimensional basis change which diagonalizes the ket-valued matrix |Zk〉 7→ diag c† , eα c†, . . . , eα |0〉k = diag |α1k〉 , |α2k〉 , . . . , |α , (4.19) where we defined coherent states |αlk〉 := eα c† |0〉k so that ck |αlk〉 = αlk |αlk〉 for l = 1, . . . , qk and αlk ∈ C . (4.20) Note that not only the entries of Rk but also the α k are holomorphic functions of the co-moving Grassmann parameters η and thus can be expanded like in (4.17). In the U(1) model, we must use ket-valued 1×qk matrices |Sk〉 for all k, yielding rows |Sk〉 = R1k |α1k〉 , R2k |α2k〉 , . . . , R for k = 1, . . . ,m , (4.21) with functions αl ). Here, the Rl only affect the states’ normalization and can be collected in a diagonal matrix to the right, hence will drop out later and thus may all be put to one. Formally, we have recovered the known abelian (multi-) soliton solutions, but the supersymmetric extension has generalized |Sk〉 → |Sk(ηjk)〉. Explicit form of Pk. Let us now consider the multiplicative parametrization (4.3) of ψm which also allows us to solve (4.5). First of all, note that the reality condition (3.15) is satisfied if Pk = P = P 2k ⇔ Pk = Tk (T −1T † for k = 1, . . . ,m , (4.22) meaning that Pk is an operator-valued hermitian projector (of group-space rank rk ≤ n) built from an n×rk matrix function Tk (the abelian case of n=1 is included). The reality condition follows just because µk − µ̄k ζ − µk µ̄k − µk ζ − µ̄k = 1l for any ζ and k = 1, . . . ,m . (4.23) The rk columns of Tk span the image of Pk and obey Pk Tk = Tk ⇔ (1l−Pk)Tk = 0 . (4.24) Furthermore, the equation (4.5) with m = k (induction) rewritten in the form (1l−Pk) µk−ℓ − µ̄k−ℓ µ̄k − µk−ℓ Sk = 0 (4.25) reveals that (cf. (4.24)) T1 = S1 and Tk = 1l − µk−ℓ − µ̄k−ℓ µk−ℓ − µ̄k Sk for k ≥ 2 , (4.26) where the explicit form of Sk for k = 1, . . . ,m is given in (4.16) or (4.18). 
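The coherent-state property (4.20) can be illustrated numerically. The following is a minimal sketch, assuming a Fock space truncated to a finite number of levels and a single ordinary complex number `alpha` (the Grassmann dependence is suppressed); the helper names `coherent_ket`, `annihilation` and `eigen_residual` are hypothetical and not taken from the paper. Truncation spoils the eigenvalue equation only in the highest level, which is therefore excluded from the comparison.

```python
import math
import numpy as np


def coherent_ket(alpha: complex, dim: int) -> np.ndarray:
    """Components of the unnormalized state exp(alpha c^dagger)|0> on |0>, ..., |dim-1>."""
    return np.array(
        [alpha**n / math.sqrt(math.factorial(n)) for n in range(dim)],
        dtype=complex,
    )


def annihilation(dim: int) -> np.ndarray:
    """Truncated annihilation operator with c|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)


def eigen_residual(alpha: complex, dim: int = 40) -> float:
    """Largest deviation of c|alpha> from alpha|alpha>, ignoring the top (truncated) level."""
    ket = coherent_ket(alpha, dim)
    c = annihilation(dim)
    return float(np.max(np.abs((c @ ket - alpha * ket)[:-1])))


# Assumed sample value of the coherent-state parameter.
residual = eigen_residual(0.7 + 0.3j)
```

In this truncated setting `residual` vanishes up to rounding, which is the finite-dimensional shadow of $c_k|\alpha^l_k\rangle = \alpha^l_k|\alpha^l_k\rangle$.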
The final result reads µm−ℓ − µ̄m−ℓ ζ − µm−ℓ = 1l + ζ − µk (4.27) with hermitian projectors Pk given by (4.22), Tk given by (4.26) and Sk given by (4.16) or (4.18). The explicit form of Λmk (which we do not need) can be found in [12]. The corresponding superfields Φ and Υ are Φm = ψ m|ζ=0 = (1l− ρkPk) with ρk = 1− , (4.28a) Υm = lim ζ (ψm − 1l) = (µk−µ̄k)Pk . (4.28b) From (4.22) it is obvious that Pk is invariant under a similarity transformation Tk 7→ Tk Λk ⇔ Sk 7→ Sk Λk (4.29) for an invertible operator-valued rk×rk matrix Λk. This justifies putting Zik = 0 from the beginning and also the restriction to Zk = ck ⊗1lrk×rk in the nonabelian case, both without loss of generality. Hence, the nonabelian solution space constructed here is parametrized by the set {Rk}m1 of matrix- valued functions of ck and η k and the pole positions µk. The abelian moduli space, however, is larger by the set {Zk}m1 of matrix-values functions of ηik which generically contain the coherent- state parameter functions {αl )}. Restricting to ηi =0 reproduces the soliton configurations of the bosonic model [12]. Static solutions. Let us consider the reduction to 2+0 dimensions, i.e. the static case. Recall that static solutions correspond to the choice m = 1 and µ1 ≡ µ = −i implying w1 = z, so we drop the index k. Specializing (4.27), we have ψ = 1l − 2 i ζ + i P so that Φ = Φ† = 1l− 2P , (4.30) where a hermitian projector P of group-space rank r satisfies the BPS equations (1l−P ) aP = 0 ⇒ (1l−P ) aT = 0 , (4.31a) (1l−P ) ∂ P = 0 ⇒ (1l−P ) ∂ T = 0 , (4.31b) with P = T (T †T )−1T † and ηi = η1i + iη i . In this case T = S, and for a nonabelian r=1 projector P we get T = T (a, ηi) as an n×1 column. For the simplest case of N=1 we just have (cf. [59]) T = Te(a) + η To(a) with η = η 1 + iη2 , (4.32) where Te(a) and To(a) are rational functions of a (e.g. polynomials) taking values in the even and odd parts of the Grassmann algebra. 
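The algebra behind (4.22), (4.23) and (4.28a) can be checked in a finite-dimensional toy setting. The sketch below assumes an ordinary constant complex matrix `T` in place of the operator-valued one, and it takes the coefficient to be $\rho = 1 - \mu/\bar\mu$, which is what one obtains by evaluating the single-pole factor at ζ = 0 and taking the hermitian conjugate; the function names are hypothetical.

```python
import numpy as np


def projector(T: np.ndarray) -> np.ndarray:
    """Hermitian projector P = T (T^dagger T)^{-1} T^dagger onto the column space of T."""
    Th = T.conj().T
    return T @ np.linalg.solve(Th @ T, Th)


def pole_factor(P: np.ndarray, mu: complex, zeta: complex) -> np.ndarray:
    """Single-pole dressing factor 1 + (mu - conj(mu)) / (zeta - mu) * P."""
    return np.eye(P.shape[0]) + (mu - np.conj(mu)) / (zeta - mu) * P


def rho(mu: complex) -> complex:
    """Coefficient in Phi = 1 - rho P, assumed here to be 1 - mu / conj(mu)."""
    return 1 - mu / np.conj(mu)


# Assumed sample data: a random 4x2 matrix T and generic mu (Im mu < 0) and zeta.
rng = np.random.default_rng(0)
T = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))
P = projector(T)
mu, zeta = 0.4 - 1.3j, 0.8 + 0.25j
identity = np.eye(4)

psi = pole_factor(P, mu, zeta)
# Hermitian conjugate of psi evaluated at conj(zeta), as in the reality condition.
psi_dagger_at_conj = pole_factor(P, mu, np.conj(zeta)).conj().T
# Phi^dagger = psi at zeta = 0, hence Phi is the conjugate transpose of that factor.
Phi = pole_factor(P, mu, 0.0).conj().T
```

With these definitions the product `psi @ psi_dagger_at_conj` is the identity for any ζ, `Phi` coincides with `identity - rho(mu) * P` and is unitary, and `rho(-1j)` equals 2, the static value appearing in (4.30).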
Similarly, an abelian N=1 projector (for n=1) is built from |T 〉 = |α1〉 , |α2〉 , . . . , |αq〉 . (4.33) At θ=0, the static solution (4.32) of our supersymmetric U(n) sigma model is also a solution of the standard N=1 supersymmetric CPn−1 sigma model in two dimensions (see e.g. [59]).11 For 11In fact, Φ in (4.30) takes values in the Grassmannian Gr(r, n), and Gr(1, n) = CPn−1. this reason, one can overcome the previously mentioned difficulty with constructing an action (or energy from the viewpoint of 2+1 dimensions) for static configurations. Moreover, on solutions obeying the BPS conditions (4.31) the topological charge Q = 2πθ dη1dη2 Tr tr Φ D+Φ ,D−Φ (4.34) is proportional to the action (BPS bound) S = 2πθ dη1dη2 Tr tr D+Φ ,D−Φ (4.35) and is finite for algebraic functions Te and To. Here, the standard superderivatives D± are defined + iη ∂z and D− = + iη̄ ∂z̄ . (4.36) One-soliton configuration. For one moving soliton, from (4.27) and (4.28) we obtain ψ1 = 1l + µ− µ̄ ζ − µ P with P = T (T †T )−1T † (4.37) Φ = 1l − ρP with ρ = 1− µ . (4.38) Now our n×r matrix T must satisfy (putting Zi = 0 and Z = c⊗ 1lr×r) [c , T ] = 0 and T = 0 with ηi = η1i + µ̄ η i , (4.39) where c is the moving-frame annihilation operator given by (4.13) for k=1. Recall that the operators c and c† and therefore the matrix T and the projector P can be expressed in terms of the corresponding static objects by a unitary squeezing transformation (see e.g. (4.8) and (4.13)). For simplicity we again consider the case N=1 and a nonabelian projector with r=1. Then (4.39) tells us that T is a holomorphic function of c and η, i.e. T = Te(c) + η To(c) = T 1e (c) + η T o (c)... Tne (c) + η T o (c) (4.40) with polynomials T ae and T o of order q, say, analogously to the static case (4.32). Note that, for T ao to be Grassmann-odd and nonzero, some extraneous Grassmann parameter must appear. Similarly, abelian projectors for a moving one-soliton obtain by subjecting (4.33) to a squeezing transformation. 
For N=1 the moving frame was defined in (4.8) (dropping the index k) via
$w = x + \tfrac12(\bar\mu-\bar\mu^{-1})\,y + \tfrac12(\bar\mu+\bar\mu^{-1})\,t$ and $\eta = \eta^1 + \bar\mu\,\eta^2$ , hence $\partial_t\eta = 0$ . (4.41)
Consider the moving frame with the coordinates $(w, \bar w, s; \eta, \bar\eta)$ with the choice s = t and the related change of the derivatives (see [12, 58])
$\partial_x = \partial_w + \partial_{\bar w}$ , (4.42a)
$\partial_y = \tfrac12(\bar\mu-\bar\mu^{-1})\,\partial_w + \tfrac12(\mu-\mu^{-1})\,\partial_{\bar w}$ , (4.42b)
$\partial_t = \tfrac12(\bar\mu+\bar\mu^{-1})\,\partial_w + \tfrac12(\mu+\mu^{-1})\,\partial_{\bar w} + \partial_s$ , (4.42c)
$\partial_{\eta^1} = \partial_\eta + \partial_{\bar\eta}$ , (4.42d)
$\partial_{\eta^2} = \bar\mu\,\partial_\eta + \mu\,\partial_{\bar\eta}$ . (4.42e)
In the moving frame our solution (4.38) is static, i.e. $\partial_s\Phi = 0$, and the projector P has the same form as in the static case. The only difference is the coefficient ρ instead of 2 in (4.38). Therefore, by computing the action (4.35) in $(w, \bar w; \eta^1, \eta^2)$ coordinates, we obtain for algebraic functions T in (4.40) a finite answer, which differs from the static one by a kinematical prefactor depending on µ (cf. [12] for the bosonic case).
Large-time asymptotics. Note that in the distinguished $(z, \bar z, t)$ coordinate frame (4.41) implies that at large times $w \to \kappa\,t$ with $\kappa = \tfrac12(\bar\mu+\bar\mu^{-1})$. As a consequence, the $t^q$ term in each polynomial in (4.40) will dominate, i.e.
$T \to t^q\,(a_1 + \eta\,b_1, \ldots, a_n + \eta\,b_n)^{T} =: t^q\,\Gamma$ , (4.43)
where Γ is a fixed vector in $\mathbb{C}^n$. It is easy to see that in the distinguished frame the large-time limit of Φ given by (4.38) is
$\Phi = \mathbb{1} - \rho\,\Pi$ with $\Pi = \Gamma\,(\Gamma^\dagger\Gamma)^{-1}\Gamma^\dagger$ (4.44)
being the projector on the constant vector Γ. Consider now the m-soliton configuration (4.28). By iterating the above argument one easily arrives at the m-soliton generalization of (4.44). Namely, in the frame moving with the ℓth lump we have
$\Phi_m = (\mathbb{1}-\rho_1\Pi_1)\cdots(\mathbb{1}-\rho_{\ell-1}\Pi_{\ell-1})\,(\mathbb{1}-\rho_\ell P_\ell)\,(\mathbb{1}-\rho_{\ell+1}\Pi_{\ell+1})\cdots(\mathbb{1}-\rho_m\Pi_m)$ , (4.45)
where the $\Pi_k$ are constant projectors. This large-time factorization of multi-soliton solutions provides a proof of the no-scattering property because the asymptotic configurations are identical for large negative and large positive times.
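The kinematics of the moving frame can be cross-checked numerically. The sketch below assumes that the velocity (4.9) reads $(v_x, v_y) = -\big(\tfrac{\mu+\bar\mu}{\mu\bar\mu+1}, \tfrac{\mu\bar\mu-1}{\mu\bar\mu+1}\big)$ and tests this reading against the co-moving coordinate w of (4.41): along a straight path with that velocity, w should stay constant. The function names and the sample values are hypothetical.

```python
import numpy as np


def comoving_w(x, y, t, mu):
    """Co-moving coordinate w = x + (mu_bar - 1/mu_bar) y / 2 + (mu_bar + 1/mu_bar) t / 2."""
    mu_bar = np.conj(mu)
    return x + 0.5 * (mu_bar - 1 / mu_bar) * y + 0.5 * (mu_bar + 1 / mu_bar) * t


def velocity(mu):
    """Assumed reading of (4.9): velocity of the frame attached to the pole at mu."""
    norm2 = abs(mu) ** 2
    vx = -2 * np.real(mu) / (norm2 + 1)
    vy = -(norm2 - 1) / (norm2 + 1)
    return vx, vy


# Assumed sample pole position with Im(mu) < 0 and an arbitrary starting point.
mu = 0.5 - 1.5j
vx, vy = velocity(mu)
x0, y0 = 0.3, -1.1
times = np.linspace(-5.0, 5.0, 11)

# w evaluated along the trajectory (x0 + vx t, y0 + vy t).
w_along_path = comoving_w(x0 + vx * times, y0 + vy * times, times, mu)
```

For µ = −i the same functions give zero velocity and w = x + iy, in line with the static case w = z.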
5 Conclusions
In this paper we introduced a generalization of the modified integrable U(n) chiral model with 2N ≤ 8 supersymmetries in 2+1 dimensions and considered a Moyal deformation of this model. It was shown that this N-extended chiral model is equivalent to a gauge-fixed BPS subsector of an N-extended super Yang-Mills model in 2+1 dimensions originating from twistor string theory. The dressing method was applied to generate a wide class of multi-soliton configurations, which are time-dependent finite-energy solutions to the equations of motion. Compared to the N=0 model, the supersymmetric extension was seen to promote the configurations’ building blocks to holomorphic functions of suitable Grassmann coordinates. By considering the large-time asymptotic factorization into a product of single-soliton solutions we have shown that no scattering occurs within the dressing ansatz chosen here.
The considered model does not stand alone but is motivated by twistor string theory [37] with a target space reduced to the mini-supertwistor space [44, 45, 47]. In this context, the obtained multi-soliton solutions are to be regarded as D(0|2N)-branes moving inside D(2|2N)-branes [60]. Here 2N appears due to the fermionic worldvolume directions of our branes in the superspace description [60]. Switching on a constant B-field simply deforms the sigma model and the D-brane worldvolumes noncommutatively, thereby also admitting regular supersymmetric noncommutative abelian solutions.
Restricting to static configurations, the models can be specialized to Grassmannian supersymmetric sigma models, where the superfield Φ takes values in Gr(r, n), and the field equations are invariant under 2N supersymmetry transformations with 0 ≤ N ≤ 4. This differs from the results for standard 2D sigma models [52, 53], where the target spaces have to be Kähler or hyper-Kähler to admit two or four supersymmetries, respectively. This difference will be discussed in more detail elsewhere.
We derived the supersymmetric chiral model in 2+1 dimensions through dimensional reduction and gauge fixing of the N -extended supersymmetric SDYM equations in 2+2 dimensions. Recall that for the purely bosonic case most (if not all) integrable equations in three and fewer dimensions can be obtained from the SDYM equations (or their hierarchy [25]) by suitable dimensional reduc- tions (see e.g. [61]–[65] and references therein). Moreover, this Ward conjecture [61] was extended to the noncommutative case (see e.g. [66, 67]). It will be interesting to consider similar reductions of the N -extended supersymmetric SDYM equations (and their hierarchy [68]) to supersymmetric integrable equations in three and two dimensions generalizing earlier results [69]. Acknowledgements We acknowledge fruitful discussions with C. Gutschwager. This work was supported in part by the Deutsche Forschungsgemeinschaft (DFG). References [1] M.R. Douglas and C.M. Hull, J. High Energy Phys. 02 (1998) 008 [hep-th/9711165]. [2] C.S. Chu and P.M. Ho, Nucl. Phys. B 550 (1999) 151 [hep-th/9812219]. [3] V. Schomerus, J. High Energy Phys. 06 (1999) 030 [hep-th/9903205]. [4] N. Seiberg and E. Witten, J. High Energy Phys. 09 (1999) 032 [hep-th/9908142]. [5] O. Lechtenfeld, A.D. Popov and B. Spendig, Phys. Lett. B 507 (2001) 317 [hep-th/0012200]. [6] O. Lechtenfeld, A.D. Popov and B. Spendig, J. High Energy Phys. 06 (2001) 011 [hep-th/0103196]. [7] R.S. Ward, J. Math. Phys. 29 (1988) 386; Commun. Math. Phys. 128 (1990) 319. [8] H. Ooguri and C. Vafa, Mod. Phys. Lett. A 5 (1990) 1389; Nucl. Phys. B 361 (1991) 469. [9] M. Hamanaka, “Noncommutative solitons and D-branes,” PhD thesis, Tokyo University, 2003 [hep-th/0303256]; F.A. Schaposnik, Braz. J. Phys. 34 (2004) 1349 [hep-th/0310202]; L. Tamassia, “Noncommutative supersymmetric/integrable models and string theory,” PhD thesis, Pavia University, 2005 [hep-th/0506064]; [10] M. 
Hamanaka, “Noncommutative solitons and integrable systems,”in Noncommutative Geome- try and Physics, Eds. Y. Maeda, N. Tose, N. Miyazaki, S. Watamura and D. Steinheimer (World Scientific, 2005), p.175 [hep-th/0504001]; Nucl. Phys. B 741 (2006) 368 [hep-th/0601209]. [11] W.J. Zakrzewski, “Low dimensional sigma models”, IOP Publishing, Bristol, 1989. [12] O. Lechtenfeld and A.D. Popov, J. High Energy Phys. 11 (2001) 040 [hep-th/0106213]. [13] O. Lechtenfeld and A.D. Popov, Phys. Lett. B 523 (2001) 178 [hep-th/0108118]. [14] R. Gopakumar, S. Minwalla and A. Strominger, J. High Energy Phys. 05 (2000) 020 [hep-th/0003160]. [15] R. Gopakumar, M. Headrick and M. Spradlin, Commun. Math. Phys. 233 (2003) 355 [hep-th/0103256]. [16] M. Klawunn, O. Lechtenfeld and S. Petersen, J. High Energy Phys. 06 (2006) 028 [hep-th/0604219]. [17] S. Bieling, J. Phys. A 35 (2002) 6281 [hep-th/0203269]. [18] M. Wolf, J. High Energy Phys. 06 (2002) 055 [hep-th/0204185]. [19] M. Ihl and S. Uhlmann, Int. J. Mod. Phys. A 18 (2003) 4889 [hep-th/0211263]. [20] O. Lechtenfeld, L. Mazzanti, S. Penati, A.D. Popov and L. Tamassia, Nucl. Phys. B 705 (2005) 477 [hep-th/0406065]. [21] A.V. Domrin, O. Lechtenfeld and S. Petersen, J. High Energy Phys. 03 (2005) 045 [hep-th/0412001]. [22] N. Berkovits, Nucl. Phys. B 450 (1995) 90 [Erratum-ibid. B 459 (1996) 439] [hep-th/9503099]. [23] N. Berkovits and C. Vafa, Nucl. Phys. B 433 (1995) 123 [hep-th/9407190]; H. Ooguri and C. Vafa, Nucl. Phys. B 451 (1995) 121 [hep-th/9505183]. [24] R.S. Ward and R.O. Wells, “Twistor geometry and field theory,” Cambridge University Press, Cambridge, 1990. [25] L.J. Mason and N.M. J. Woodhouse, “Integrability, self-duality, and twistor theory,” Oxford University Press, Oxford, 1996. [26] R.S. Ward, Phys. Lett. A 61 (1977) 81. [27] O. Lechtenfeld and A.D. Popov, Phys. Lett. B 494 (2000) 148 [hep-th/0009144]; O. Lechtenfeld, A.D. Popov and S. Uhlmann, Nucl. Phys. B 637 (2002) 119 [hep-th/0204155]. [28] A. Kling, O. 
Lechtenfeld, A.D. Popov and S. Uhlmann, Phys. Lett. B 551 (2003) 193 [hep-th/0209186]; Fortsch. Phys. 51 (2003) 775 [hep-th/0212335]; A. Kling and S. Uhlmann, J. High Energy Phys. 07 (2003) 061 [hep-th/0306254]; M. Ihl, A. Kling and S. Uhlmann, J. High Energy Phys. 03 (2004) 002 [hep-th/0312314]; S. Uhlmann, J. High Energy Phys. 11 (2004) 003 [hep-th/0408245]. [29] L.L. Chau, M.L. Ge and Y.S. Wu, Phys. Rev. D 25 (1982) 1080; L. Dolan, Phys. Lett. B 113 (1982) 387; L.L. Chau, M.L. Ge, A. Sinha and Y.S. Wu, Phys. Lett. B 121 (1983) 391; L. Crane, Commun. Math. Phys. 110 (1987) 391. [30] A.D. Popov and C.R. Preitschopf, Phys. Lett. B 374 (1996) 71 [hep-th/9512130]; T.A. Ivanova, J. Math. Phys. 39 (1998) 79 [hep-th/9702144]; A.D. Popov, Rev. Math. Phys. 11 (1999) 1091 [hep-th/9803183]; Nucl. Phys. B 550 (1999) 585 [hep-th/9806239]. [31] T.A. Ivanova and O. Lechtenfeld, Int. J. Mod. Phys. A 16 (2001) 303 [hep-th/0007049]. [32] A.M. Semikhatov, Phys. Lett. B 120 (1983) 171; I.V. Volovich, Phys. Lett. B 123 (1983) 329. [33] W. Siegel, Phys. Rev. D 46 (1992) 3235 [hep-th/9205075]. [34] W. Siegel, Phys. Rev. Lett. 69 (1992) 1493 [hep-th/9204005]; Phys. Rev. D 47 (1993) 2512 [hep-th/9210008]. [35] N. Berkovits and W. Siegel, Nucl. Phys. B 505 (1997) 139 [hep-th/9703154]. [36] S. Bellucci, A. Galajinsky and O. Lechtenfeld, Nucl. Phys. B 609 (2001) 410 [hep-th/0103049]. [37] E. Witten, Commun. Math. Phys. 252 (2004) 189 [hep-th/0312171]. [38] N. Berkovits, Phys. Rev. Lett. 93 (2004) 011601 [hep-th/0402045]; N. Berkovits and L. Motl, J. High Energy Phys. 04 (2004) 056 [hep-th/0403187]. [39] W. Siegel,“Untwisting the twistor superstring,” hep-th/0404255; O. Lechtenfeld and A.D. Popov, Phys. Lett. B 598 (2004) 113 [hep-th/0406179]. [40] I.A. Bandos, J.A. de Azcarraga and C. Miquel-Espanya, J. High Energy Phys. 07 (2006) 005 [hep-th/0604037]; M. Abou-Zeid, C.M. Hull and L.J. Mason, “Einstein supergravity and new twistor string theories,” hep-th/0606272; L. Dolan and P. 
Goddard, “Tree and loop amplitudes in open twistor string theory,” hep-th/0703054. [41] A.D. Popov and C. Saemann, Adv. Theor. Math. Phys. 9 (2005) 931 [hep-th/0405123]; A.D. Popov and M. Wolf, “Hidden symmetries and integrable hierarchy of the N=4 super- symmetric Yang-Mills equations,” hep-th/0608225. [42] C. Saemann, “Aspects of twistor geometry and supersymmetric field theories within super- string theory,” PhD thesis, Leibniz University of Hannover, 2006 [hep-th/0603098]; M. Wolf, “On supertwistor geometry and integrability in super gauge theory,” PhD thesis, Leibniz University of Hannover, 2006 [hep-th/0611013]. [43] A. Neitzke and C. Vafa, “N = 2 strings and the twistorial Calabi-Yau,” hep-th/0402128. [44] D.W. Chiou, O.J. Ganor, Y.P. Hong, B.S. Kim and I. Mitra, Phys. Rev. D 71 (2005) 125016 [hep-th/0502076]; D.W. Chiou, O.J. Ganor and B.S. Kim, J. High Energy Phys. 03 (2006) 027 [hep-th/0512242]. [45] A.D. Popov, C. Saemann and M. Wolf, J. High Energy Phys. 10 (2005) 058 [hep-th/0505161]. [46] C. Saemann, “On the mini-superambitwistor space and N = 8 super Yang-Mills theory,” hep-th/0508137; “The mini-superambitwistor space,” In: Proc. of the Intern. Workshop on Supersymmetries and Quantum Symmetries (SQS’05), Eds. E. Ivanov and B. Zupnik, Dubna, 2005 [hep-th/0511251]. [47] A. D. Popov, Phys. Lett. B 647 (2007) 509 [hep-th/0702106]. [48] C. Devchand and V. Ogievetsky, Nucl. Phys. B 481 (1996) 188 [hep-th/9606027]. [49] N. Seiberg, Nucl. Phys. Proc. Suppl. 67 (1998) 158 [hep-th/9705117]. [50] A. Konechny and A.S. Schwarz, Phys. Rept. 360 (2002) 353 [hep-th/0012145, hep-th/0107251]; M.R. Douglas and N.A. Nekrasov, Rev. Mod. Phys. 73 (2001) 977 [hep-th/0106048]; R.J. Szabo, Phys. Rept. 378 (2003) 207 [hep-th/0109162]. [51] L.D. Faddeev and L.A. Takhtajan, “Hamiltonian methods in the theory of solitons”, Springer, Berlin, 1987. [52] B. Zumino, Phys. Lett. B 87 (1979) 203. [53] L. Alvarez-Gaumé and D.Z. Freedman, Commun. Math. Phys. 80 (1981) 443. [54] V.P. 
Nair and J. Schiff, Nucl. Phys. B 371 (1992) 329; A. Losev, G.W. Moore, N. Nekrasov and S. Shatashvili, Nucl. Phys. Proc. Suppl. 46 (1996) 130 [hep-th/9509151]. [55] E.T. Newman, Phys. Rev. D 18 (1978) 2901; A.N. Leznov, Theor. Math. Phys. 73 (1988) 1233. [56] T.A. Ioannidou and W. Zakrzewski, Phys. Lett. A 249 (1998) 303 [hep-th/9802177]. [57] V.E. Zakharov and A.V. Mikhailov, Sov. Phys. JETP 47 (1978) 1017; V.E. Zakharov and A.B. Shabat, Funct. Anal. Appl. 13 (1979) 166; P. Forgács, Z. Horváth and L. Palla, Nucl. Phys. B 229 (1983) 77; K. Uhlenbeck, J. Diff. Geom. 30 (1989) 1; O. Babelon and D. Bernard, Commun. Math. Phys. 149 (1992) 279 [hep-th/9111036]. [58] C.S. Chu and O. Lechtenfeld, Phys. Lett. B 625 (2005) 145 [hep-th/0507062]. [59] A.M. Perelomov, Phys. Rept. 174 (1989) 229. [60] O. Lechtenfeld and C. Saemann, J. High Energy Phys. 03 (2006) 002 [hep-th/0511130]. [61] R.S. Ward, Phil. Trans. Roy. Soc. Lond. A 315 (1985) 451; “Multidimensional integrable systems,” In: Field Theory, Quantum Gravity and Strings, Eds. H.J. De Vega, N. Sanchez, Vol. 2, p.106, 1986. [62] L.J. Mason and G.A.J. Sparling, Phys. Lett. A 137 (1989) 29; J. Geom. Phys. 8 (1992) 243; L.J. Mason and M.A. Singer, Commun. Math. Phys. 166 (1994) 191. [63] S. Chakravarty, M.J. Ablowitz and P.A. Clarkson, Phys. Rev. Lett. 65 (1990) 1085; M.J. Ablowitz, S. Chakravarty and L.A. Takhtajan, Commun. Math. Phys. 158 (1993) 289; S. Chakravarty, S.L. Kent and E.T. Newman, J. Math. Phys. 36 (1995) 763; M.J. Ablowitz, S. Chakravarty and R.G. Halburd, J. Math. Phys. 44 (2003) 3147. [64] G.V. Dunne, R. Jackiw, S.Y. Pi and C.A. Trugenberger, Phys. Rev. D 43 (1991) 1332 [Erratum-ibid. D 45 (1992) 3012]; I. Bakas and D.A. Depireux, Int. J. Mod. Phys. A 7 (1992) 1767. [65] T.A. Ivanova and A.D. Popov, Phys. Lett. A 170 (1992) 293; Phys. Lett. A 205 (1995) 158 [hep-th/9508129]; Theor. Math. Phys. 102 (1995) 280; M. Legaré and A. D. Popov, JETP Lett. 59 (1994) 883; Phys. Lett. A 198 (1995) 195. [66] M. 
Hamanaka and K. Toda, Phys. Lett. A 316 (2003) 77 [hep-th/0211148]; J. Phys. A 36 (2003) 11981 [hep-th/0301213]; M. Hamanaka, Phys. Lett. B 625 (2005) 324 [hep-th/0507112].
[67] A. Dimakis and F. Mueller-Hoissen, J. Phys. A 33 (2000) 6579 [nlin.si/0006029]; M.T. Grisaru, L. Mazzanti, S. Penati and L. Tamassia, J. High Energy Phys. 04 (2004) 057 [hep-th/0310214]; I. Cabrera-Carnero, J. High Energy Phys. 10 (2005) 071 [hep-th/0503147].
[68] M. Wolf, J. High Energy Phys. 02 (2005) 018 [hep-th/0412163]; “Twistors and aspects of integrability of self-dual SYM theory,” In: Proc. of the Intern. Workshop on Supersymmetries and Quantum Symmetries (SQS’05), Eds. E. Ivanov and B. Zupnik, Dubna, 2005 [hep-th/0511230].
[69] S.J. Gates and H. Nishino, Phys. Lett. B 299 (1993) 255 [hep-th/9210163]; A.K. Das and C.A.P. Galvao, Mod. Phys. Lett. A 8 (1993) 1399 [hep-th/9211014]; H. Nishino and S. Rajpoot, Phys. Lett. B 572 (2003) 91 [hep-th/0306290].
ABSTRACT
We consider a supersymmetric Bogomolny-type model in 2+1 dimensions originating from twistor string theory. By a gauge fixing this model is reduced to a modified U(n) chiral model with N ≤ 8 supersymmetries in 2+1 dimensions. After a Moyal-type deformation of the model, we employ the dressing method to explicitly construct multi-soliton configurations on noncommutative R^{2,1} and analyze some of their properties.
<|endoftext|><|startoftext|>
1. Introduction and Summary
2. Action and Hamiltonian
2.1 The 3+1 split
2.2 Shifted Variables
2.3 Linearization
3. Linearized Gravitational Duality and Holography
3.1 Duality and Holography
3.2 Linearized gravitational duality
3.3 Linearized Constraints and Bianchi Identities
3.4 Connection with other known dualities
4. The Effect on the Boundary Theory
5. Conclusions and Outlook
6. Appendix: other duality mappings
1.
Introduction and Summary
Duality has played an important role in our understanding of Yang-Mills theories, and it is believed that it will also play an important role in gravity and in higher-spin gauge theories. Indeed, although it is less clear what the implications of duality could be for theories whose quantum versions are still unknown, gravity and higher-spin gauge theories (for reviews of higher-spin theories see e.g. [1]) are intimately connected to a quantum string theory in which duality certainly plays a crucial role.
The recent advent of holography raises some intriguing questions for duality. For example, one may wonder what is the holographic image of a duality-invariant spectrum, of a duality transformation, or of a possible quantization condition that duality usually implies for charges. Some of these issues were raised by Witten in [2], where it was argued that the standard electric-magnetic duality of a U(1) gauge theory on AdS4 is responsible for a “natural” SL(2,Z) action on current two-point functions in three-dimensional CFTs (see [4] and [5] for more recent works). Shortly afterwards it was shown in [3] that such an SL(2,Z) action is intimately related to certain “double-trace” deformations in the boundary, assuming suitable large-N limits and the existence of non-trivial fixed points. The latter assumptions are strengthened by the fact that there exist models (e.g., see [6] and references therein) which exhibit the required behavior. In particular, it was shown in [3] that certain “double-trace” deformations induce an SL(2,Z) action on two-point functions of higher-spin (i.e. spin s ≥ 2) currents. This has led to the Duality Conjecture of [3]: linearized higher-spin theories on AdS4 spaces possess a generalization of electric-magnetic duality whose holographic image is the natural SL(2,Z) action on boundary two-point functions.
Surprisingly, even the duality for linearized spin-2 gauge fields (linearized gravity) was not widely known by the time of this conjecture.³ Second order linearized gravitational duality was discussed, among others, in [8, 9, 10, 11, 12]. More recently, the duality properties of linearized gravity around flat space were studied in [13] and were further discussed in [14]. The duality of linearized gravity around dS4 was later studied in [15].
In this note we present our calculations regarding the duality properties of gravity in the presence of a cosmological constant. Having in mind applications to higher-spin gauge theories, we use forms and work in the first order formalism, where duality is also manifested at the level of the action [16]. Moreover, the first order formalism is relevant for applications of duality to holography, since the correlation functions of the boundary theory are essentially determined by the bulk canonical momenta (see e.g. [17]).
Our aim in this work is to formulate linearized first order gravity using suitable “electric” and “magnetic” variables, in close analogy with electromagnetism. We find that this is possible only when the background geometry is Minkowski or (A)dS4. Then we implement the standard electric-magnetic duality rotations. We find that, up to “boundary” terms, the linearized Hamiltonian changes by terms that do not alter the bulk dynamics, i.e. do not alter the second order bulk equations of motion. Moreover, the duality rotation interchanges the (linearized) constraints with the (linearized) Bianchi identities. The “boundary” terms have important holographic consequences since they correspond to marginal “double-trace” deformations [3] that induce the boundary SL(2,Z) action. In the Appendix we exhibit a modified duality rotation that leaves the bulk Hamiltonian invariant and induces “boundary” terms that correspond to relevant deformations as in [3].
2.
Action and Hamiltonian

Having in mind the extension of our results to higher-spin gauge theories, we start from the MacDowell-Mansouri form [18] of the gravitational action (see footnote 4)

\[ I_{MM} = \epsilon_{abcd}\left( R^{ab}\wedge R^{cd} + 2\Lambda\, e^a\wedge e^b\wedge R^{cd} + \Lambda^2\, e^a\wedge e^b\wedge e^c\wedge e^d \right) , \tag{2.1} \]

where $a, b, \ldots$ are Lorentz indices. In this formalism, the vierbein $e^a$ and the spin connection $\omega^a{}_b$ are initially thought of as independent variables. The curvature 2-form is $R^a{}_b = d\omega^a{}_b + \omega^a{}_c\wedge\omega^c{}_b = \tfrac12 R^a{}_{bcd}\, e^c\wedge e^d$. Varying the action with respect to $e^a$ and $\omega^a{}_b$, we find

\[ R^{ab} + \Lambda\, e^a\wedge e^b = 0 , \tag{2.2} \]
\[ T^a = de^a + \omega^a{}_b\wedge e^b = 0 . \tag{2.3} \]

The relation to gravity is established via the vanishing torsion equation (2.3), which relates $e$ and $\omega$ in the familiar way. The above equations are equivalent to the Einstein equation in metric variables

\[ R_{\mu\nu} - \tfrac12 R\, g_{\mu\nu} = +3\Lambda\, g_{\mu\nu} , \tag{2.4} \]

and the scalar curvature is $R = -d(d-1)\Lambda = -12\Lambda$. Note that our $\Lambda$ is related to the cosmological constant in its usual definition via $\Lambda_{cosm} = -6\Lambda$; $\Lambda > 0$ corresponds to AdS. Note that this is actually SO(3,2) covariant, as we can combine $\omega$ and $e$ into a super-connection. Note also that $\Lambda$ has units (Length)$^{-2}$.

In the SO(3,2)-invariant formalism, $I_{MM}$ arises from

\[ I_{MM} = \epsilon_{ABCDE}\, V^E R^{AB}\wedge R^{CD} , \tag{2.5} \]

where $V^E$ is a non-dynamical 0-form field (that we take to have value $V^{-1} = 1$ to gauge back to the SO(3,1) formalism) and $R^{AB}$ is the curvature of $\Omega^A{}_B \equiv \{e^a, \omega^a{}_b\}$. There are also quasi-topological terms of the form

\[ I_{top} = \theta\, R_{AB}\wedge R^{AB} + \alpha\, R_{AB}\wedge R^{A}{}_{C}\, V^B V^C \tag{2.6} \]

that we could add to the action. In the stated gauge, this reduces to

\[ I_{top} = P_2 + (\theta+\alpha)\, C_{NY} + \alpha\, R_{ab}\wedge e^a\wedge e^b \tag{2.7} \]

where $P_2 = R^a{}_b\wedge R^b{}_a$ is the Pontryagin class, $C_{NY} = T^a\wedge T_a - R_{ab}\wedge e^a\wedge e^b$ is the Nieh-Yan class, and we also note the Euler class $E_2 = \epsilon_{abcd} R^{ab}\wedge R^{cd}$.

Footnote 3: An interesting formulation of first order duality for linearized gravity around flat space was presented in [7].
Footnote 4: We note $I = -16\pi G_N S$, where $S$ is the usually normalized gravitational action.
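Note that in the presence of torsion, the action (2.7) contains the non-topological term $R_{ab}\wedge e^a\wedge e^b$ with “Immirzi parameter” $\gamma = -2/\alpha$. In the absence of torsion, this term is a total derivative. The Hilbert-Palatini action is

\[ I_{HP} = I_{MM} - E_2 . \tag{2.8} \]

It differs from $I_{MM}$ by a boundary term and is smooth as $\Lambda \to 0$, but it is not manifestly SO(3,2)-invariant.

2.1 The 3 + 1 split

Next, we carefully consider the 3 + 1 split. Although much of the discussion here is familiar from the ADM formalism, we feel it is important to set notation carefully, as we will introduce some new ingredients. To accommodate both AdS and dS signatures simultaneously, we introduce a ‘time’ function $t$ and a foliation of space-time $\Sigma_t \hookrightarrow M$. In dS, $t$ is time-like, and this corresponds to the usual Hamiltonian foliation; in AdS, on the other hand, we take $t$ to be the (space-like) radial coordinate. We keep track of the resulting signs by a parameter $\sigma_\perp$, the norm of the unit normal to $\Sigma_t$: $\sigma_\perp = -1$ in dS and $\sigma_\perp = +1$ in AdS, consistent with $\Lambda = \sigma_\perp/L^2$ used below.

Proceeding as usual, we get a vector field $\underline{t}$ that satisfies $\nabla_{\underline{t}}\, t = 1 \equiv \underline{t}(t)$ (so $\underline{t} = \partial/\partial t$) and a 1-form $dt$. Given a 4-metric, we can introduce the normal 1-form $n$ as

\[ n = \sigma_\perp N\, dt , \tag{2.9} \]

which is normalized as $(n, n) = \sigma_\perp$. The dual vector field $\underline{n}$ can be expanded as

\[ \underline{n} = \frac{1}{N}\left( \underline{t} - \underline{N} \right) , \tag{2.10} \]

where the shift $\underline{N}$ satisfies $(\underline{N}, \underline{n}) = 0$, and thus $(\underline{t}, \underline{n}) = \sigma_\perp N$. Next, we locally choose a basis of 1-forms

\[ e^0 = \sigma_\perp n = N\, dt , \tag{2.11} \]
\[ e^\alpha = \tilde e^\alpha + N^\alpha dt . \tag{2.12} \]

The $\tilde e^\alpha$ span $T^*\Sigma_t$ and correspond to a 3-metric $h_{ij} = \tilde e^\alpha_i\, \tilde e^\beta_j\, \eta_{\alpha\beta}$. The quantities $N^\alpha$ are the components of $\underline{N}$: $N^\alpha = e^\alpha_i N^i$. These basis 1-forms are dual to the vectors $\{\underline{e}_0 = \underline{n},\ \underline{e}_\alpha = \underline{\tilde e}_\alpha\}$, with $e^a(\underline{e}_b) = \delta^a_b$. We expand the spin connection in the same basis (see footnote 5)

\[ \omega^a{}_b = q^a{}_b\, dt + \tilde\omega^a{}_b , \tag{2.13} \]

which leads to

\[ R^a{}_b = \tilde R^a{}_b + dt\wedge r^a{}_b , \tag{2.14} \]

where $\tilde R$ is formed from $\tilde\omega$ and $\tilde d$ only, and

\[ r^a{}_b = \dot{\tilde\omega}{}^a{}_b - \tilde d q^a{}_b - \tilde\omega^a{}_c\, q^c{}_b + q^a{}_c\, \tilde\omega^c{}_b . \tag{2.15} \]

Note that these quantities are merely decompositions along $T^*\Sigma_t$ in the 4-geometry; we will introduce the intrinsically defined objects shortly.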
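As a cross-check of (2.14) and (2.15), the split of the curvature can be worked out directly from (2.13). The short computation below is only a sketch: it assumes that the exterior derivative splits as $d = dt\wedge\partial_t + \tilde d$, that $q^a{}_b$ is a 0-form on $\Sigma_t$, and it uses the index placement adopted in our reading of the formulas above.

```latex
\begin{align*}
R^a{}_b &= d\omega^a{}_b + \omega^a{}_c\wedge\omega^c{}_b ,
\qquad \omega^a{}_b = q^a{}_b\,dt + \tilde\omega^a{}_b , \\
d\omega^a{}_b &= \tilde d\tilde\omega^a{}_b
  + dt\wedge\big(\dot{\tilde\omega}{}^a{}_b - \tilde d q^a{}_b\big) , \\
\omega^a{}_c\wedge\omega^c{}_b &= \tilde\omega^a{}_c\wedge\tilde\omega^c{}_b
  + dt\wedge\big(q^a{}_c\,\tilde\omega^c{}_b - \tilde\omega^a{}_c\,q^c{}_b\big) , \\
\Rightarrow\quad
\tilde R^a{}_b &= \tilde d\tilde\omega^a{}_b + \tilde\omega^a{}_c\wedge\tilde\omega^c{}_b ,
\qquad
r^a{}_b = \dot{\tilde\omega}{}^a{}_b - \tilde d q^a{}_b
  - \tilde\omega^a{}_c\,q^c{}_b + q^a{}_c\,\tilde\omega^c{}_b .
\end{align*}
```

The two signs in the $q$-dependent terms come from moving $dt$ to the left past the 1-form $\tilde\omega$, and $d(q\,dt) = \tilde d q\wedge dt = -dt\wedge\tilde d q$ accounts for the sign of the $\tilde d q$ term.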
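We then find

\[ I_{HP} = \int dt\wedge 2\epsilon_{\alpha\beta\gamma}\left[ N\left( \tilde R^{\alpha\beta} + \Lambda\, \tilde e^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma - 2N^\alpha\, \tilde R^{0\beta}\wedge\tilde e^\gamma + r^{0\alpha}\wedge\tilde e^\beta\wedge\tilde e^\gamma \right] . \tag{2.16} \]

As is familiar, the lapse and shift appear as Lagrange multipliers. The constraints that they multiply are of course zero in any background (i.e. vacuum solution), such as (A)dS4. The final term in the action contains the real dynamics: $r^{0\alpha}$ depends on the components $R^{0\alpha}{}_{0\beta}$ of the Riemann tensor. Note though that the tensors used here are 4-dimensional.

Let us define the “electric field”

\[ K^\alpha = \sigma_\perp\, \tilde\omega^{0\alpha} = K_\beta{}^\alpha\, \tilde e^\beta . \tag{2.17} \]

In the case that $\omega$ is the torsion-free Levi-Civita connection, this agrees with the standard definition of the extrinsic curvature, regarded as a vector-valued one-form. We then find

\[ \tilde R^{\alpha\beta} = {}^{(3)}R^{\alpha\beta} - \sigma_\perp K^\alpha\wedge K^\beta , \tag{2.18} \]
\[ \tilde R^{0\alpha} = \sigma_\perp\left( \tilde d K^\alpha + K^\beta\wedge\tilde\omega_\beta{}^\alpha \right) \equiv \sigma_\perp (\tilde D K)^\alpha . \tag{2.19} \]

These equations amount to the Gauss-Codazzi relations. Furthermore, $r^{0\alpha}$ contains time derivatives of $\tilde\omega^{0\alpha}$ as well as terms linear in components of $q$. We find

\[ 2\epsilon_{\alpha\beta\gamma}\, r^{0\alpha}\wedge\tilde e^\beta\wedge\tilde e^\gamma = 2\epsilon_{\alpha\beta\gamma}\left( \dot{\tilde\omega}{}^{0\alpha} - (\tilde D q)^{0\alpha} \right)\wedge\tilde e^\beta\wedge\tilde e^\gamma \tag{2.20} \]
\[ \qquad = 2\sigma_\perp \epsilon_{\alpha\beta\gamma}\left( \dot K^\alpha + q^{\alpha\delta} K_\delta \right)\wedge\tilde e^\beta\wedge\tilde e^\gamma + 4 q^{0\alpha}\, \epsilon_{\alpha\beta\gamma}\, \tilde T^\beta\wedge\tilde e^\gamma \]

up to a total 3-derivative. We have defined the intrinsic 3-torsion $\tilde T^\alpha = \tilde d\tilde e^\alpha + \tilde\omega^\alpha{}_\beta\wedge\tilde e^\beta$. Since we wish to regard the $\tilde e$ as coordinate variables (see footnote 6), we integrate the first term by parts to obtain, up to the total time derivative $\partial_t\left( 2\sigma_\perp \epsilon_{\alpha\beta\gamma} K^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma \right)$,

\[ 2\epsilon_{\alpha\beta\gamma}\, r^{0\alpha}\wedge\tilde e^\beta\wedge\tilde e^\gamma = \Pi_\alpha\wedge\dot{\tilde e}{}^\alpha + 4 q^{0\alpha} \epsilon_{\alpha\beta\gamma}\, \tilde T^\beta\wedge\tilde e^\gamma + 2\sigma_\perp q^{\alpha\delta} \epsilon_{\alpha\beta\gamma}\, K_\delta\wedge\tilde e^\beta\wedge\tilde e^\gamma , \tag{2.21} \]

where we have defined the momentum 2-form

\[ \Pi_\alpha = -4\sigma_\perp \epsilon_{\alpha\beta\gamma}\, K^\beta\wedge\tilde e^\gamma . \tag{2.22} \]

The $q^a{}_b$ appear as Lagrange multipliers. In particular, the $q^{\alpha\beta}$ constraint precisely sets the antisymmetric (torsional) part $K_{[\alpha\beta]}$ of the extrinsic curvature tensor to zero. Next, we define the “magnetic field”

\[ B_\alpha = \tfrac12 \sigma_\perp \epsilon_{\alpha\beta\gamma}\, \tilde\omega^{\beta\gamma} , \qquad \tilde\omega^{\alpha\beta} = -\epsilon^{\alpha\beta\gamma} B_\gamma , \tag{2.23} \]

and we find that the $q^{0\alpha}$ constraint

\[ \epsilon_{\alpha\beta\gamma}\, \tilde T^\beta\wedge\tilde e^\gamma = \epsilon_{\alpha\beta\gamma}\, \tilde d\tilde e^\beta\wedge\tilde e^\gamma - \sigma_\perp B_\beta\wedge\tilde e^\beta\wedge\tilde e_\alpha = 0 \tag{2.24} \]

involves only the antisymmetric part $B_{[\alpha\beta]}$ of the magnetic field $B_\alpha = B_{\alpha\beta}\, \tilde e^\beta$. The antisymmetric part of $B_\alpha$ spoils the gauge covariance of the constraint (2.24) under an SO(3) rotation of the dreibein $\tilde e^\alpha$; hence it represents degrees of freedom that can be gauge fixed to zero by an SO(3) rotation. On the other hand, an algebraic equation of motion connects the symmetric part of $B_{\alpha\beta}$ to derivatives of $\tilde e$:

\[ \tilde d\tilde e^\alpha + \epsilon^{\alpha\beta\gamma} B_\beta\wedge\tilde e_\gamma = 0 . \tag{2.25} \]

In the end, one is left with the canonically conjugate variables $\tilde e^\alpha$ and $\Pi_\alpha$. These results are familiar from the metric formalism. Dropping the torsional terms, we then arrive at the action

\[ I_{HP} = \int dt\wedge\left\{ \dot{\tilde e}{}^\alpha\wedge\Pi_\alpha + 2N \epsilon_{\alpha\beta\gamma}\left( {}^{(3)}R^{\alpha\beta} - \sigma_\perp K^\alpha\wedge K^\beta + \Lambda\, \tilde e^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma - 4\sigma_\perp N^\alpha \epsilon_{\alpha\beta\gamma}\, (\tilde D K)^\beta\wedge\tilde e^\gamma \right\} . \tag{2.26} \]

Furthermore, using $*_3(\tilde e^\alpha\wedge\tilde e^\beta) = \tfrac12 \epsilon^{\alpha\beta\gamma}\, \tilde e_\gamma$, we have

\[ \hat\Pi_\alpha = *_3 \Pi_\alpha = -2\left( K_{\alpha\beta} - \eta_{\alpha\beta}\, \mathrm{tr}K \right)\tilde e^\beta , \tag{2.27} \]

where $\mathrm{tr}K = \eta^{\alpha\beta} K_{\alpha\beta}$. We can solve the above equation to get

\[ K_\alpha = -\tfrac12\left( \hat\Pi_{\alpha\beta} - \tfrac12 \eta_{\alpha\beta}\, \mathrm{tr}\hat\Pi \right)\tilde e^\beta . \tag{2.28} \]

As stated above, $K_{\alpha\beta}$ (and $\hat\Pi_{\alpha\beta}$) is symmetric when the torsion vanishes. Finally, with the definition (2.23) we find (see footnote 7)

\[ \epsilon_{\alpha\beta\gamma}\, {}^{(3)}R^{\alpha\beta}\wedge\tilde e^\gamma = \epsilon_{\alpha\beta\gamma}\left( \tilde d\tilde\omega^{\alpha\beta} + \tilde\omega^\alpha{}_\delta\wedge\tilde\omega^{\delta\beta} \right)\wedge\tilde e^\gamma = \sigma_\perp\left( 2\tilde d B_\gamma + \epsilon_{\alpha\beta\gamma} B^\alpha\wedge B^\beta \right)\wedge\tilde e^\gamma . \tag{2.29} \]

Introducing $B^\alpha$ is an unusual thing to do, but it will play a role in duality: in this form, the Hamiltonian contains terms which are reminiscent of those of Maxwell theory. The full HP action is of the form

\[ I_{HP} = \int dt\wedge\left\{ \dot{\tilde e}{}^\alpha\wedge\Pi_\alpha - 4\sigma_\perp N^\alpha \epsilon_{\alpha\beta\gamma}\, (\tilde D K)^\beta\wedge\tilde e^\gamma + 2\sigma_\perp N\left( 2\tilde d B_\gamma + \epsilon_{\alpha\beta\gamma} B^\alpha\wedge B^\beta - \epsilon_{\alpha\beta\gamma} K^\alpha\wedge K^\beta + \sigma_\perp \Lambda\, \epsilon_{\alpha\beta\gamma}\, \tilde e^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma \right\} . \tag{2.30} \]

Note that the entire contribution of the cosmological constant appears in the last term of the Hamiltonian constraint.

Footnote 5: We have $q^a{}_b = N\omega_0{}^a{}_b$ and $\tilde\omega^a{}_b \equiv \omega_\alpha{}^a{}_b\, \tilde e^\alpha$.
Footnote 6: Without this integration by parts, we would be in the Ashtekar formalism. Here, our choice gives a formalism closely related to the metric variable formalism. Note that the induced boundary term may be written $-\tfrac12 \Pi_\alpha\wedge\tilde e^\alpha$.
Footnote 7: The spatial signature $\sigma_3$ appears in $\epsilon_{\alpha\beta\gamma}\epsilon_{\phi\delta\rho}\, \eta^{\alpha\phi} = \sigma_3\left( \eta_{\beta\delta}\eta_{\gamma\rho} - \eta_{\beta\rho}\eta_{\gamma\delta} \right)$. We will always consider Lorentzian spacetime signature, so $\sigma_3 = -\sigma_\perp$.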
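2.2 Shifted Variables

It is possible to make a transformation of the canonical variables in order to absorb the cosmological constant term in (2.30). This can be achieved by introducing the new variables

\[ \hat K^\alpha = K^\alpha - \rho\, \tilde e^\alpha , \tag{2.31} \]

and requiring that

\[ \rho^2 = \sigma_\perp \Lambda . \tag{2.32} \]

This is positive only when $\sigma_\perp$ and $\Lambda$ are simultaneously positive or negative, as is the case for both AdS4 ($\Lambda > 0$) and dS4 ($\Lambda < 0$). We will often write $\Lambda = \sigma_\perp/L^2$, where $L$ is a length scale. Under (2.31) the momentum 2-form becomes

\[ \Pi_\alpha \to P_\alpha - 4\sigma_\perp \rho\, \epsilon_{\alpha\beta\gamma}\, \tilde e^\beta\wedge\tilde e^\gamma . \tag{2.33} \]

The last term in (2.33) contributes a total time derivative to the action (of the form of a boundary cosmological term). We have introduced a new momentum variable $P_\alpha = -4\sigma_\perp \epsilon_{\alpha\beta\gamma}\, \hat K^\beta\wedge\tilde e^\gamma$. Then we get the action

\[ I_{HP} = \int dt\wedge\Big\{ \dot{\tilde e}{}^\alpha\wedge P_\alpha - 4\sigma_\perp N^\alpha \epsilon_{\alpha\beta\gamma}\, (\tilde D\hat K + \rho\tilde T)^\beta\wedge\tilde e^\gamma - \tfrac43 \sigma_\perp \rho\, \epsilon_{\alpha\beta\gamma}\, \partial_t\left( \tilde e^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma \right) \]
\[ \qquad + 2\sigma_\perp N\left[ 2\tilde d(B_\alpha\wedge\tilde e^\alpha) + 2B_\gamma\wedge\tilde T^\gamma - \epsilon_{\alpha\beta\gamma}\left( B^\alpha\wedge B^\beta + \hat K^\alpha\wedge\hat K^\beta + 2\rho\, \hat K^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma \right] \Big\} . \tag{2.34} \]

Note that the shift constraint is still written in terms of the ordinary covariant derivative, and thus involves a non-linear term coupling $B$ to $\hat K$. Consistent with our previous discussion, we drop the terms involving the torsion $\tilde T$ and disregard the boundary term to obtain

\[ I_{HP} = \int dt\wedge\Big\{ \dot{\tilde e}{}^\alpha\wedge P_\alpha - 4\sigma_\perp N^\alpha \epsilon_{\alpha\beta\gamma}\, (\tilde D\hat K)^\beta\wedge\tilde e^\gamma + 2\sigma_\perp N\left[ 2\tilde d(B_\alpha\wedge\tilde e^\alpha) - \epsilon_{\alpha\beta\gamma}\left( B^\alpha\wedge B^\beta + \hat K^\alpha\wedge\hat K^\beta + 2\rho\, \hat K^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma \right] \Big\} . \tag{2.35} \]

We note that the parameter $\rho$ can be of either sign (although this sign does not appear in the second order equations of motion).

2.3 Linearization

Next, we linearize the above action around an appropriate fixed background. We expand as

\[ \tilde e^\alpha \to \tilde e^\alpha + E^\alpha , \quad N = 1 + n , \quad N^\alpha = n^\alpha , \quad B^\alpha = \bar B^\alpha + b^\alpha , \quad \hat K^\alpha = \bar{\hat K}{}^\alpha + k^\alpha , \tag{2.36} \]

where barred quantities, and from now on $\tilde e^\alpha$ itself, denote background values. The background values should satisfy the constraints. The simplest choice is the background where

\[ \bar{\hat K}{}^\alpha = 0 = \bar B^\alpha . \tag{2.37} \]

In fact, reaching this simple form was a motivation for the shift (2.31).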
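The way the shift (2.31) absorbs the cosmological term can be made explicit with a short computation. The following is a sketch that uses only the definition (2.31) and the fact that 1-forms anticommute; it isolates the $K$-dependent and cosmological terms of the Hamiltonian constraint in (2.30).

```latex
\begin{align*}
\epsilon_{\alpha\beta\gamma}\, K^\alpha\wedge K^\beta\wedge\tilde e^\gamma
 &= \epsilon_{\alpha\beta\gamma}\,\big(\hat K^\alpha + \rho\,\tilde e^\alpha\big)\wedge
    \big(\hat K^\beta + \rho\,\tilde e^\beta\big)\wedge\tilde e^\gamma \\
 &= \epsilon_{\alpha\beta\gamma}\,\hat K^\alpha\wedge\hat K^\beta\wedge\tilde e^\gamma
  + 2\rho\,\epsilon_{\alpha\beta\gamma}\,\hat K^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma
  + \rho^2\,\epsilon_{\alpha\beta\gamma}\,\tilde e^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma , \\[4pt]
-\epsilon_{\alpha\beta\gamma}\, K^\alpha\wedge K^\beta\wedge\tilde e^\gamma
 + \sigma_\perp\Lambda\,\epsilon_{\alpha\beta\gamma}\,\tilde e^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma
 &= -\epsilon_{\alpha\beta\gamma}\,\big(\hat K^\alpha\wedge\hat K^\beta
    + 2\rho\,\hat K^\alpha\wedge\tilde e^\beta\big)\wedge\tilde e^\gamma
  + \big(\sigma_\perp\Lambda - \rho^2\big)\,
    \epsilon_{\alpha\beta\gamma}\,\tilde e^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma .
\end{align*}
```

The two cross terms combine because $\epsilon_{\alpha\beta\gamma}\,\tilde e^\alpha\wedge\hat K^\beta = \epsilon_{\alpha\beta\gamma}\,\hat K^\alpha\wedge\tilde e^\beta$. The last term drops out precisely when $\rho^2 = \sigma_\perp\Lambda$, which is the condition (2.32), leaving a term linear in $\rho$ whose sign is that of $\rho$.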
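Then, to quadratic order in the fluctuating fields the Hamiltonian gives

\[ I_{HP} = \int dt\wedge\Big\{ \dot E^\alpha\wedge p_\alpha - 4\sigma_\perp n^\alpha \epsilon_{\alpha\beta\gamma}\, \tilde d k^\beta\wedge\tilde e^\gamma + 4\sigma_\perp n\left[ \tilde d(b_\alpha\wedge\tilde e^\alpha) - \rho\, \epsilon_{\alpha\beta\gamma}\, k^\alpha\wedge\tilde e^\beta\wedge\tilde e^\gamma \right] \]
\[ \qquad - 2\sigma_\perp \epsilon_{\alpha\beta\gamma}\left( b^\alpha\wedge b^\beta + k^\alpha\wedge k^\beta + 2\rho\, k^\alpha\wedge E^\beta \right)\wedge\tilde e^\gamma \Big\} , \tag{2.38} \]

where

\[ p_\alpha = -4\sigma_\perp \epsilon_{\alpha\beta\gamma}\, k^\beta\wedge\tilde e^\gamma \tag{2.39} \]

are the linearized momentum variables conjugate to $E^\alpha$. In order to reach the form (2.38), the terms linear in the fluctuations must vanish. For this to happen we find the relationship

\[ \dot{\tilde e}{}^\alpha + \rho\, \tilde e^\alpha = 0 . \tag{2.40} \]

Notice that we can also write the linearized action in the form

\[ I_{HP} = \int dt\wedge\Big\{ (\dot E^\alpha + \rho E^\alpha)\wedge p_\alpha - 2\sigma_\perp \epsilon_{\alpha\beta\gamma}\left( b^\alpha\wedge b^\beta + k^\alpha\wedge k^\beta \right)\wedge\tilde e^\gamma - 4\sigma_\perp n^\alpha \epsilon_{\alpha\beta\gamma}\, \tilde d k^\beta\wedge\tilde e^\gamma + n\left( 4\sigma_\perp \tilde d b_\gamma + \rho\, p_\gamma \right)\wedge\tilde e^\gamma \Big\} . \tag{2.41} \]

The form of the first term, involving the momentum, makes clear that longitudinal fluctuations are non-dynamical. The natural time dependence of $E^\alpha$ is of the form $e^{-\rho t}$ (correspondingly, the natural time dependence of $p_\alpha$ is $e^{+\rho t}$). Other than that, we see that in comparison with the flat space action, in these variables, the only change is that the Hamiltonian constraint is modified.

The solutions of (2.40) and (2.37) are components of (A)dS4 spacetimes. We can solve (2.40) to obtain

\[ e^0 = dt , \qquad e^\alpha = e^{-\rho t}\, dx^\alpha . \tag{2.42} \]

With these we construct the usual Poincaré metric on (A)dS which, however, covers only half of the space even though the parameter $t$ runs from $-\infty$ to $+\infty$. The conformal boundary in these coordinates is at $t = +\infty$. Then we derive

\[ \omega^\alpha{}_0 = -\rho\, e^{-\rho t}\, dx^\alpha = -\rho\, e^\alpha , \tag{2.43} \]

and so

\[ R^{\alpha\beta} = -\frac{\sigma_\perp}{L^2}\, e^\alpha\wedge e^\beta , \qquad R^{\alpha 0} = -\frac{\sigma_\perp}{L^2}\, e^\alpha\wedge e^0 \quad\Rightarrow\quad R^{ab} = -\frac{\sigma_\perp}{L^2}\, e^a\wedge e^b . \tag{2.44} \]

Hence $\mathrm{Ric}_{ab} = -\frac{3\sigma_\perp}{L^2}\, \eta_{ab}$ and $R = -12\sigma_\perp/L^2 = -12\Lambda$. We also evaluate

\[ \Pi_\alpha = -4\sigma_\perp \rho\, \epsilon_{\alpha\beta\gamma}\, \tilde e^\beta\wedge\tilde e^\gamma , \qquad \hat\Pi^\alpha = 4\rho\, \tilde e^\alpha , \qquad \mathrm{tr}\hat\Pi = 12\rho , \tag{2.45} \]
\[ B^\alpha = 0 , \qquad K^\alpha = \rho\, \tilde e^\alpha \;\Rightarrow\; \hat K^\alpha = 0 . \tag{2.46} \]

Note that in this gauge $(\tilde D K)^\alpha = \rho\, \tilde T^\alpha = 0$, which solves the shift constraint, while the Hamiltonian constraint is satisfied through a cancellation between the $K^2$ term and the cosmological term.

3. Linearized Gravitational Duality and Holography

Let us summarize what we have obtained so far.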
In the presence of a cosmological constant we have defined variables such that the action resembles most closely the action without the cosmological constant. This was done in order to look for a suitable background around which linear fluctuations are as simple as possible. Requiring that $\hat K$ (the “electric field”) and $B$ (the “magnetic field”) vanish in such a background, as they do around flat space, we found that the background should be (A)dS4. Quite satisfactorily, both sign choices for $\rho$ in the change of variables (2.31) lead to (A)dS4 spacetimes.

3.1 Duality and Holography

This is the appropriate point to recall some salient features of duality rotations. In simple Hamiltonian systems the effect of the canonical transformation $p \mapsto q$ and $q \mapsto -p$ on the action is (see e.g. [19])

\[ I = \int dt\, [\, p\dot q - H(p, q)\, ] \;\mapsto\; I_D = \int dt\, [\, -q\dot p - H(q, -p)\, ] . \tag{3.1} \]

Notice that $I_D$ involves the dual variables, for which we have however kept the same notation for simplicity. The transformed Hamiltonian $H(q, -p)$ is in general not related to $H(p, q)$. However, if $H(q, -p) = H(p, q)$ we call the above transformation a duality. It then holds that

\[ I_D = I - \int dt\, \frac{d}{dt}(qp) . \tag{3.2} \]

The dual action describes exactly the same dynamics as the initial one, up to a modification of the boundary conditions. For example, if $I$ is stationary on the equations of motion for fixed $q$ on the boundary, $I_D$ is stationary on the same equations of motion for fixed $p$ on the boundary. This simple example illustrates the role of duality in holography: a bulk duality transformation corresponds to a particular modification of the boundary conditions. This property of duality transformations is behind the remarkable holographic properties of electromagnetism in (A)dS4 [2, 3].

Clearly, the crucial properties of a duality transformation are to be canonical and to leave the Hamiltonian unchanged. However, consider a slight generalization

\[ I = \int dt\, [\, p\dot q - (p^2 + q^2 + 2\lambda pq)\, ] \tag{3.3} \]

where $\lambda$ is an arbitrary parameter.
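A quick numerical check of the model (3.3) may help fix ideas. The sketch below is not part of the original analysis. It takes the Hamiltonian exactly as printed in (3.3), H = p² + q² + 2λpq (the overall normalization is an assumption and only rescales time), integrates Hamilton's equations with a hand-written RK4 step, and compares runs with λ and −λ that start from the same second order data (q, dq/dt). It also evaluates H under the plain rotation p ↦ q, q ↦ −p and under a shifted map p ↦ q + 2λp, q ↦ −p. All function names are illustrative.

```python
import math


def hamiltonian(p, q, lam):
    """Toy Hamiltonian of (3.3), with the normalization as printed (assumed)."""
    return p * p + q * q + 2.0 * lam * p * q


def flow(p, q, lam):
    """Hamilton's equations: returns (dp/dt, dq/dt) = (-dH/dq, +dH/dp)."""
    return -(2.0 * q + 2.0 * lam * p), 2.0 * p + 2.0 * lam * q


def evolve_q(q0, qdot0, lam, t_end, steps=4000):
    """Integrate with classical RK4 and return q(t_end).

    Initial data are given as (q, dq/dt); the momentum follows from
    dq/dt = 2p + 2*lam*q, so runs with different lam share the same
    second order initial data.
    """
    p, q = 0.5 * qdot0 - lam * q0, q0
    h = t_end / steps
    for _ in range(steps):
        k1p, k1q = flow(p, q, lam)
        k2p, k2q = flow(p + 0.5 * h * k1p, q + 0.5 * h * k1q, lam)
        k3p, k3q = flow(p + 0.5 * h * k2p, q + 0.5 * h * k2q, lam)
        k4p, k4q = flow(p + h * k3p, q + h * k3q, lam)
        p += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        q += h * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    return q


def exact_q(q0, qdot0, lam, t):
    """Solution of the second order equation q'' = -4 (1 - lam^2) q."""
    omega = 2.0 * math.sqrt(1.0 - lam * lam)
    return q0 * math.cos(omega * t) + qdot0 / omega * math.sin(omega * t)


if __name__ == "__main__":
    for lam in (0.4, -0.4):
        print(lam, evolve_q(1.0, 0.3, lam, 2.0), exact_q(1.0, 0.3, lam, 2.0))
```

With this normalization Hamilton's equations combine into q'' = −4(1 − λ²)q, so only λ² enters the second order dynamics, while the first order equations do depend on the sign of λ. The plain rotation maps H(p, q; λ) to H(p, q; −λ), whereas the shifted map leaves H unchanged.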
The Hamiltonian now is not invariant under the canonical transformation $p \mapsto q$ and $q \mapsto -p$: the $pq$ term changes sign. Consequently, the first order form of the equations of motion is also not duality invariant. Nevertheless, the second order equation of motion is invariant. We will find that gravity in the presence of a cosmological constant follows precisely this model. Of course, gravity is a much more complicated constrained system, but as we will show, the constraints and Bianchi identities transform appropriately.

We also note that the canonical transformation (implemented by a generating functional of the first kind)

\[ p \mapsto q + 2\lambda p , \qquad q \mapsto -p \tag{3.4} \]

is of interest here. The above does not change the Hamiltonian, and the transformed action differs from the initial one by total time derivative terms (see footnote 8)

\[ S \mapsto S_D = S - \int dt\, \frac{d}{dt}\left( pq + \lambda p^2 \right) . \tag{3.5} \]

3.2 Linearized gravitational duality

As a preamble to gravity we recall the duality properties of Maxwell theory

\[ I_{Max} = \int dt\wedge\left[ \dot{\tilde A}\wedge *_3 E - \left( E\wedge *_3 E + B\wedge *_3 B \right) - A_0\, \tilde d *_3 E \right] . \tag{3.6} \]

Under the duality $E \mapsto -*_3 B$, $B \mapsto *_3 E$, $\tilde A \mapsto \tilde A_D$, we find

\[ I_{Max} \mapsto I_{Max,D} = \int dt\wedge\left[ -\dot{\tilde A}_D\wedge B - \left( E\wedge *_3 E + B\wedge *_3 B \right) + A_0\, \tilde d B \right] . \tag{3.7} \]

$E$ and $B$ in (3.7) should be expressed through $\tilde A_D$. We observe that the kinetic term has changed sign, while the Hamiltonian remains invariant. In addition, the (Gauss) constraint is dualized to the trivial ‘Bianchi’ identity $\tilde d B = 0$ for the dual magnetic field.

Next we try to apply a Maxwell-type duality map in gravity. We consider the following transformation around the fixed background (2.40):

\[ k_\alpha \mapsto -b_\alpha , \qquad b_\alpha \mapsto k_\alpha . \tag{3.8} \]

To implement the map (3.8) we need to specify the mapping of $E^\alpha$ to a ‘dual 3-bein’ $\mathcal{E}^\alpha$.
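We do that using the linearized form of (2.25), writing $\mathcal{E}^\alpha$ for the dual 3-bein, as

\[ \epsilon^{\alpha\beta\gamma}\, b_\beta\wedge\tilde e_\gamma + \tilde d E^\alpha = 0 \;\mapsto\; \epsilon^{\alpha\beta\gamma}\, k_\beta\wedge\tilde e_\gamma + \tilde d\mathcal{E}^\alpha = 0 = \tilde d\mathcal{E}^\alpha - \tfrac{\sigma_\perp}{4}\, p^\alpha . \tag{3.9} \]

Since $p_\alpha = 4\sigma_\perp\, \tilde d\mathcal{E}_\alpha$, it is natural to define

\[ p_{D,\alpha} = 4\sigma_\perp\, \tilde d E_\alpha = -4\sigma_\perp \epsilon_{\alpha\beta\gamma}\, b^\beta\wedge\tilde e^\gamma , \tag{3.10} \]

and thus the mapping (3.8) is supplemented by

\[ E \mapsto \mathcal{E} , \qquad \mathcal{E} \mapsto -E , \qquad p \mapsto -p_D , \qquad p_D \mapsto p . \tag{3.11} \]

Now, let us see the effects of the above duality mapping. The action transforms to

\[ I_{HP} \mapsto I_{HP,D} = \int dt\wedge\Big\{ -\dot{\mathcal{E}}^\alpha\wedge p_{D,\alpha} - \rho\, \mathcal{E}^\alpha\wedge p_{D,\alpha} - 2\sigma_\perp \epsilon_{\alpha\beta\gamma}\left( b^\alpha\wedge b^\beta + k^\alpha\wedge k^\beta \right)\wedge\tilde e^\gamma \]
\[ \qquad + 4\sigma_\perp n^\alpha \epsilon_{\alpha\beta\gamma}\, \tilde d b^\beta\wedge\tilde e^\gamma + n\left( 4\sigma_\perp \tilde d k_\gamma - \rho\, p_{D,\gamma} \right)\wedge\tilde e^\gamma \Big\} , \tag{3.12} \]

where now $k^\alpha$ and $b^\alpha$ should be expressed in terms of the dual variables $\mathcal{E}^\alpha$ and $p_{D,\alpha}$ via (3.9) and (3.10). We notice that the ‘kinetic’ part $\dot E\wedge p$ of the action changes sign under the duality map, in direct analogy with the Maxwell case. However, the Hamiltonian is not invariant, due to the change of sign of the second term in the first line of (3.12). We will discuss this further in a later section. For now, we note that this sign change would not show up in the equations of motion written in second order form.

It is important to also note that the constraints are transformed into quantities which in the next subsection we will recognize as the linearized Bianchi identities. This is to be expected since the duality transformations are canonical. We also note that it may be possible to choose an alternative canonical transformation, designed to leave the Hamiltonian invariant. The latter is presumably related to the work of Julia et al. [15] and is considered in the Appendix.

Footnote 8: In holography, the latter terms correspond to the relevant “multi-trace” boundary deformations discussed in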
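3.3 Linearized Constraints and Bianchi Identities

By virtue of the discussion above we may now demonstrate that under the duality mapping (3.8) the linearized constraints transform to the linearized Bianchi identities as

\[ C_\alpha \equiv \epsilon_{\alpha\beta\gamma}\, \tilde d k^\beta\wedge\tilde e^\gamma \;\mapsto\; -\epsilon_{\alpha\beta\gamma}\, \tilde d b^\beta\wedge\tilde e^\gamma , \tag{3.13} \]
\[ C_0 \equiv -\sigma_\perp\left( \tilde d b_\gamma - \rho\, \epsilon_{\alpha\beta\gamma}\, k^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma \;\mapsto\; -\sigma_\perp\left( \tilde d k_\gamma + \rho\, \epsilon_{\alpha\beta\gamma}\, b^\alpha\wedge\tilde e^\beta \right)\wedge\tilde e^\gamma . \tag{3.14} \]

To identify the right hand sides, we first note that the Bianchi identities are

\[ B_R{}^a{}_b = dR^a{}_b - R^a{}_c\wedge\omega^c{}_b + \omega^a{}_c\wedge R^c{}_b = 0 , \tag{3.15} \]
\[ B_T^a = dT^a - R^a{}_b\wedge e^b + \omega^a{}_b\wedge T^b = 0 , \tag{3.16} \]

which are obtained from the definitions of $R^a{}_b$ and $T^a$ by exterior differentiation. The first equation is satisfied identically. Since the torsion vanishes, the second equation tells us only that $R^a{}_b\wedge e^b = 0$. If we do the 3+1 split, we find two equations. The first is

\[ B_T^\alpha = -\left( {}^{(3)}R^\alpha{}_\beta - \sigma_\perp K^\alpha\wedge K_\beta \right)\wedge\tilde e^\beta = 0 , \tag{3.17} \]

which upon using the symmetry of $K_\alpha$ linearizes to

\[ B_{T,\alpha} = -\epsilon_{\alpha\beta\gamma}\, \tilde d b^\beta\wedge\tilde e^\gamma + \ldots \tag{3.18} \]

Note that this is the image under duality of the shift constraint as in (3.13). The second identity is

\[ B_T^0 = -\tilde R^0{}_\alpha\wedge\tilde e^\alpha = -\sigma_\perp (\tilde D K)_\alpha\wedge\tilde e^\alpha = -\sigma_\perp\left( \tilde d k_\alpha + \rho\, \epsilon_{\alpha\beta\gamma}\, b^\beta\wedge\tilde e^\gamma \right)\wedge\tilde e^\alpha = 0 , \tag{3.19} \]

where to arrive at the last expression we used (2.46). This is the image of the Hamiltonian constraint as in (3.14).

Summarizing, the duality transformations between linearized constraints and Bianchi identities are

\[ C_\alpha \mapsto B_{T,\alpha} , \qquad C_0 \mapsto B_T^0 , \tag{3.20} \]
\[ B_{T,\alpha} \mapsto -C_\alpha , \qquad B_T^0 \mapsto -C_0 . \tag{3.21} \]

3.4 Connection with other known dualities

The Maxwell-type duality operation (3.8) is closely related to the dualization of the first two indices of the Riemann tensor as (see footnote 9)

\[ R_{ab} \to S_{ab} = \tfrac12 \epsilon_{ab}{}^{cd} R_{cd} \tag{3.22} \]

at least at the linearized level. Let us investigate (3.22) by rewriting expressions in the 3+1 split.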
We have

\[ R^a{}_b = \tilde R^a{}_b + dt\wedge r^a{}_b , \qquad S^a{}_b = \tilde S^a{}_b + dt\wedge s^a{}_b . \]

We begin with the spatial 2-forms, for which we have

\[ \tilde R^{\alpha\beta} = -\epsilon^{\alpha\beta\gamma}\, \tilde d B_\gamma + \sigma_\perp\left( B^\alpha\wedge B^\beta - K^\alpha\wedge K^\beta \right) , \tag{3.23} \]
\[ \tilde R^{0\alpha} = \sigma_\perp\left( \tilde d K^\alpha + K^\beta\wedge\tilde\omega_\beta{}^\alpha \right) \equiv \sigma_\perp (\tilde D K)^\alpha , \tag{3.24} \]
\[ \tilde S_{0\gamma} = \tfrac12 \sigma_\perp \epsilon_{\alpha\beta\gamma}\, \tilde R^{\alpha\beta} , \tag{3.25} \]
\[ \tilde S_{\alpha\beta} = \epsilon_{\alpha\beta\gamma}\, \tilde R^{0\gamma} . \tag{3.26} \]

If we linearize these expressions, we find under the duality transformation (3.8)

\[ \tilde R^{ab} \mapsto -\sigma_\perp \tilde S^{ab} . \tag{3.27} \]

Because the expressions (3.23) and (3.24) involve derivatives of $B$ and $K$, the duality (3.8) is an ‘integrated form’ of the usual Riemann tensor duality, but implies it. Similarly, if we investigate the spatial 1-forms, we find

\[ r^{ab} \mapsto -\sigma_\perp s^{ab} . \tag{3.28} \]

To arrive at this result we have set the Lagrange multiplier field $q$ to zero.

Footnote 9: For a discussion of the duality properties of gravity in terms of the Riemann tensor see [10].

4. The Effect on the Boundary Theory

It is well known that AdS is holographic. We may well ask, in the context of AdS/CFT, how the duality transformation that we have defined here acts on the boundary. We are instructed to consider the on-shell bulk action as a function of bulk fields. So, we evaluate the action on a solution to the equations of motion, resulting in a pure boundary term which is of the form

\[ S_{bdy} = \int p_\alpha\wedge E^\alpha . \tag{4.1} \]

When the duality transformation is applied to the bulk theory, the bulk action is not invariant, as we have discussed above. Nevertheless, it may easily be shown that the transformation acts simply on the (linearized) boundary term: it changes its sign,

\[ S^{dual}_{bdy} = -\int p_{D,\alpha}\wedge\mathcal{E}^\alpha , \tag{4.2} \]

where $\mathcal{E}^\alpha$ is the dual 3-bein. This transformation is exactly analogous to what happens in the Maxwell case: it amounts to the result [21] (see footnote 10)

2 = −1 . (4.3)

5. Conclusions and Outlook

Motivated by possible applications in holography and in higher-spin gauge theory, we have studied the duality properties of gravity in the Hamiltonian formulation. We have presented the gravity action in terms of suitable variables that closely resemble the electric and magnetic fields in Maxwell theory.
We have found suitable “electric” and “magnetic” field variables, such that at the linearized level first order gravity most closely resembles electromagnetism. This can be done only around Minkowski and (A)dS4 backgrounds.

We have implemented duality transformations on the linearized gravity fluctuations around these backgrounds. In the presence of a cosmological constant the Hamiltonian changes; nevertheless the bulk dynamics remains unaltered, while the linearized lapse and shift constraints are mapped into the linearized Bianchi identities. Moreover, the duality transformations induce boundary terms whose relevance in holography we have briefly discussed. Finally, we have exhibited a modified duality rotation that leaves the bulk Hamiltonian invariant, while it induces boundary terms corresponding to relevant deformations.

The main implication of our results is that certain properties of correlation functions in three-dimensional CFTs mimic the duality of gravity. It would be interesting to extend our results to black-hole backgrounds and also to the case where topological terms are present in the bulk. We also expect that one can analyze the duality of higher-spin gauge theories based on our first-order approach.

Footnote 10: See also [22] for an interesting recent application of this formula.

Acknowledgments

The work of A. C. P. was partially supported by the research program “PYTHAGORAS II” of the Greek Ministry of Education. RGL was supported in part by the U.S. Department of Energy under contract DE-FG02-91ER40709.

6. Appendix: other duality mappings

It is possible to find a transformation that leaves the Hamiltonian unchanged. Consider the following transformation in the fixed background (2.40):

\[ k_\alpha \mapsto -b_\alpha - 2\rho E_\alpha , \qquad b_\alpha \mapsto k_\alpha . \tag{6.1} \]

The mapping to the dual dreibein is still specified by (3.9).
A straightforward calculation reveals that the action transforms as IHP 7→ IHP,D = IHP + 4σ⊥ ǫαβγE α ∧ bβ + ρǫαβγE α ∧ Eβ ∧ ẽγ αǫαβγ d̃b β ∧ ẽγ − 8ρnαkβ ∧ ẽα ∧ ẽ +n(4σ⊥d̃kα + 4σ⊥ρǫαβγb β ∧ ẽγ + 8ΛǫαβγE β ∧ ẽγ) ∧ ẽα (6.2) The transformations (6.1) leaves unchanged the Hamiltonian and changes the action by the total ”time” derivative terms shown in the first line of (6.2). Moreover, the linearized constraints transform into the linearized Bianchi identities. Let us see that in some detail. The second term in the shift constraint is zero since kα is a symmetric one form kα = kαβ ẽ β with kαβ = kβα; see (2.21). The term proportional to Λ in the lapse constraint is also zero. This is slightly more involved to see and it is based on the possibility of solving (3.9) for Eα after gauge fixing.11 One way to see this is in components. Write Eα = Eαβ ẽ β and (3.9) becomes γ − ∂γE α = ǫ γ − ǫ α (6.3) In the ”Lorentz gauge” where ∂αEβα = 0 = ∂ αkβα the above can be inverted as Eαβ = ǫαδγ∂ γkδβ (6.4) Using (6.4) one verifies that the last term in the lapse constraint vanishes. This modified duality transformation is probably related to the one considered by Julia et. al. in [15]. References [1] D. Francia and A. Sagnotti, arXiv:hep-th/0601199. X. Bekaert, S. Cnockaert, C. Iazeolla and M. A. Vasiliev, arXiv:hep-th/0503128. 11This is the equivalent of inverting Ē = ∇× Ā in the discussion of duality in electromagnetism [16]. – 14 – [2] E. Witten, arXiv:hep-th/0307041. [3] R. G. Leigh and A. C. Petkou, JHEP 0312 (2003) 020 [arXiv:hep-th/0309177]. [4] R. Zucchini, Adv. Theor. Math. Phys. 8 (2005) 895 [arXiv:hep-th/0311143]. H. U. Yee, Phys. Lett. B 598 (2004) 139 [arXiv:hep-th/0402115]. [5] S. de Haro and P. Gao, arXiv:hep-th/0701144. [6] S. Hands, Phys. Rev. D 51 (1995) 5816 [arXiv:hep-th/9411016]. [7] P. C. West, Class. Quant. Grav. 18, 4443 (2001) [arXiv:hep-th/0104081]. [8] T. Curtright, Phys. Lett. B 165 (1985) 304. [9] J. A. Nieto, Phys. Lett. 
A 262 (1999) 274 [arXiv:hep-th/9910049]. [10] C. M. Hull, JHEP 0109 (2001) 027 [arXiv:hep-th/0107149]. [11] X. Bekaert, N. Boulanger and M. Henneaux, Phys. Rev. D 67 (2003) 044010 [arXiv:hep-th/0210278]. and [12] N. Boulanger, S. Cnockaert and M. Henneaux, JHEP 0306 (2003) 060 [arXiv:hep-th/0306023]. [13] M. Henneaux and C. Teitelboim, Phys. Rev. D 71, 024018 (2005) [arXiv:gr-qc/0408101]. [14] S. Deser and D. Seminara, Phys. Rev. D 71, 081502 (2005) [arXiv:hep-th/0503030]. S. Deser and D. Seminara, Phys. Lett. B 607, 317 (2005) [arXiv:hep-th/0411169]. [15] B. L. Julia, arXiv:hep-th/0512320. B. Julia, J. Levie and S. Ray, JHEP 0511, 025 (2005) [arXiv:hep-th/0507262]. [16] S. Deser and C. Teitelboim, Phys. Rev. D 13, 1592 (1976). [17] I. Papadimitriou and K. Skenderis, arXiv:hep-th/0404176. [18] S. W. MacDowell and F. Mansouri, Phys. Rev. Lett. 38 (1977) 739 [Erratum-ibid. 38 (1977) 1376]. [19] H. Goldstein, ”Classical Mechanics”, Addison-Wesley Publishing Company Inc. (1980) [20] E. Witten, arXiv:hep-th/0112258. [21] A. C. Petkou, Fortsch. Phys. 53, 962 (2005). [22] C. P. Herzog, P. Kovtun, S. Sachdev and D. T. Son, arXiv:hep-th/0701036. – 15 – ABSTRACT We discuss the implementation of electric-magnetic duality transformations in four-dimensional gravity linearized around Minkowski or (A)dS4 backgrounds. In the presence of a cosmological constant duality generically modifies the Hamiltonian, nevertheless the bulk dynamics is unchanged. We pay particular attention to the boundary terms generated by the duality transformations and discuss their implications for holography. <|endoftext|><|startoftext|> Introduction It is with great pleasure that we contribute to this book in honor of Prof. Takeo Fujiwara. GTL enjoyed eighteen months of Prof. Fujiwara’s hospi- tality at the University of Tokyo during the early 1990’s. At that time the work of Prof. 
Fujiwara in the field of electronic structure of quasicrystals had already made a major contribution to the literature (see for instance [1]). Since that time our research has owed much to his work.

Prof. Fujiwara was the first to perform realistic calculations of the electronic structure of quasicrystalline materials without adjustable parameters (ab-initio calculations) [2]. Indeed these complex alloys [3] have very exotic physical properties (see Refs. [4, 5] and references therein), and it rapidly appeared that realistic calculations on the actual quasicrystalline materials are necessary to understand the physical mechanisms that govern these properties. In particular, these calculations make it possible to analyze numerically the role of transition-metal elements, which is essential in those materials.

In this paper, we briefly present our work on the role of transition-metal elements in the electronic structure and transport properties of quasicrystals and related complex phases. Several parts of this work have been done or initiated in collaboration with Prof. T. Fujiwara.

http://arxiv.org/abs/0704.0532v1

2 Electronic structure

2.1 Ab-initio determination of the density of states

A way to study the electronic structure of quasicrystals is to consider the case of approximants. Approximants are crystalline phases with very large unit cells, which reproduce the atomic order of quasicrystals locally. Experiments indicate that approximant phases, like α-AlMnSi, α-AlCuFeSi, R-AlCuFe, etc., have transport properties similar to those of quasicrystals [4, 6]. In 1989 and 1991, Prof. Fujiwara performed the first numerical calculations of the electronic structure in realistic approximants of quasicrystals [2, 7, 8]. He showed that their density of states (DOS, see figure 1) is characterized by a depletion near the Fermi energy EF, called “pseudo-gap”, in agreement with experimental results (for a review see Refs. [4, 9, 18]) and a Hume-Rothery stabilization [10, 11].
The electronic structure of simpler crystals, such as orthorhombic Al6Mn and cubic Al12Mn, also presents a pseudo-gap near EF, which is less pronounced than in complex approximant phases (figure 1) [11].

2.2 Models to analyze the role of transition-metal element

sp–d hybridization model

The role of the transition-metal (TM = Ti, Cr, Mn, Fe, Co, Ni) elements in the pseudo-gap formation has been shown from experiments, ab-initio calculations and model analysis [4,13–19,11]. Indeed the formation of the pseudo-gap results from a strong sp–d coupling associated with an ordered sub-lattice of TM atoms [19, 11]. Consequently, the electronic structure, the magnetic properties and the stability depend strongly on the TM positions, as was shown from ab-initio calculations [28–33,20,21].

Figure 1 (horizontal axis: Energy (eV)): Ab-initio total DOS of Al6Mn (simple crystal) and α-Al69.6Si13.0Mn17.4 (approximant of icosahedral quasicrystals) [11, 12].

How does an effective TM–TM interaction induce stability?

Just as for Hume-Rothery phases, a description of the band energy can be made in terms of pair interactions (figure 2) [17, 19]. Indeed, it has been shown that an effective medium-range Mn–Mn interaction mediated by the sp(Al)–d(Mn) hybridization plays a determinant role in the occurrence of the pseudo-gap [19]. We have shown that this interaction, up to distances of 10–20 Å, is essential in stabilizing these phases, since it can create a Hume-Rothery pseudo-gap close to EF. The band energy is then minimized, as shown in figure 3 [20, 11].

Figure 2 (horizontal axis: Mn–Mn distance r (Å); curves with and without a repulsive term of the form b e^(−a r); marked distances r = 4.8 Å and r = 6.7 Å): Effective medium-range Mn–Mn interaction between two non-magnetic manganese atoms in a free electron matrix which models aluminum atoms.
[11]

Figure 3 (horizontal axis: L (Å)): Variation of the band energy due to the effective Mn–Mn interaction in o-Al6Mn, α-AlMnSi and β-Al9Mn3Si. [20]

The effect of these effective Mn–Mn interactions has also been studied by several groups [17, 20, 21] (see also references in [11]). It has also explained the origin of large vacancies in the hexagonal β-Al9Mn3Si and ϕ-Al10Mn3 phases on some sites, whereas equivalent sites are occupied by Mn in µ-Al4.12Mn and λ-Al4Mn, and by Co in Al5Co2 [20]. On the other hand, a spin-polarized effective Mn–Mn interaction is also determinant for the existence (or not) of magnetic moments in AlMn quasicrystals and approximants [21, 22, 32].

The analysis can be applied to any Al(rich)-Mn phase, where a small number of Mn atoms are embedded in the free-electron-like Al matrix. The studied effects are not specific to quasicrystals and their approximants, but they are more important for those alloys. Such a Hume-Rothery stabilization, governed by the effective medium-range Mn–Mn interaction, might therefore be intrinsically linked to the emergence of quasi-periodicity in the Al(rich)-Mn system.

Cluster Virtual Bound states

One of the main results of the ab-initio calculations performed by Prof. Fujiwara for realistic approximant phases is the small energy dispersion of electrons in reciprocal space. Consequently, the density of states of approximants is characterized by “spiky” peaks [2, 7, 8, 28]. In order to analyze the origin of this spiky structure of the DOS, we developed a model that shows a new kind of localization by atomic clusters [23].

As for the local atomic order, one of the characteristics of quasicrystals and approximants is the occurrence of atomic clusters on a scale of 10–30 Å [25]. The role of clusters has been much debated, in particular by C. Janot [24] and G. Trambly de Laissardière [23]. Our model is based on a standard description of inter-metallic alloys.
Considering the cluster embedded in a metallic medium, the variation ∆n(E) of the DOS due to the cluster is calculated. For electrons with energy in the vicinity of the Fermi level, transition atoms (such as Mn and Fe) are strong scatterers whereas Al atoms are weak scatterers.

Figure 4 shows the variation ∆n(E) of the density of states due to different clusters. The Mn icosahedron is the actual Mn icosahedron of the α-AlMnSi approximant. As an example of a larger cluster, we consider one icosahedron of Mn icosahedra. ∆n(E) of clusters exhibits strong deviations from the Virtual Bound States (1 Mn atom) [26]. Indeed several peaks and shoulders appear. The widths of the narrowest peaks (50–100 meV) are comparable to those of the fine peaks of the calculated DOS in the approximants (figure 1). Each peak indicates a resonance due to the scattering by the cluster. These peaks correspond to states “localized” by the icosahedron or the icosahedron of icosahedra. They are not eigenstates; they have a finite lifetime of the order of ħ/δE, where δE is the width of the peak. Therefore, the stronger the effect of the localization by the cluster, the narrower the peak. A long lifetime is the proof of a localization, but in real space these states have a quite large extension, on the length scale of the cluster.

Figure 4 (horizontal axis: Energy E (eV); curves: 1 Mn atom, 1 Mn icosahedron, 1 icosahedron of 12 Mn icosahedra): Variation ∆n(E) of the DOS due to Mn atoms. Mn atoms are embedded in a metallic medium (Al matrix). From [23].

The physical origin of these states can be understood as follows. Electrons are scattered by the Mn atoms of a cluster. By an effect similar to that of a Faraday cage, electrons can be confined by the cluster provided that their wavelength λ satisfies λ ≳ l, where l is the distance between two Mn spheres. Consequently, we expect to observe such a confinement by the
This effect is a multiple scattering effect, and it is not due to an overlap between d-orbitals because the Mn atoms are not first neighbors.

3 Transport properties

Quasicrystals have many fascinating electronic properties. In particular, quasicrystals with high structural quality, such as the icosahedral AlCuFe and AlPdMn alloys, have unconventional conduction properties when compared with standard inter-metallic alloys. Their conductivities can be as low as 150–200 (Ω cm)^-1 (see Refs. [4, 5, 27] and references therein). Furthermore, the conductivity increases with disorder and with temperature, a behavior just the opposite of that of standard metals. In a sense, the most striking property is the so-called “inverse Matthiessen rule”, according to which the increases of conductivity due to different sources of disorder seem to be additive. This is just the opposite of what happens in normal metals, where the increases of resistivity due to several sources of scattering are additive. Another important result is that many approximants of these quasicrystalline phases have similar conduction properties. For example, the crystalline α-AlMnSi phase, with a unit cell size of about 12 Å and 138 atoms in the unit cell, has a conductivity of about 300 (Ω cm)^-1 at low temperature [4].

3.1 Small Boltzmann velocity

Prof. Fujiwara et al. were the first to show that the electronic structure of AlTM approximants and related phases is characterized by two energy scales [2, 7, 8, 28, 29] (see previous section). The largest energy scale, of about 0.5–1 eV, is the width of the pseudogap near the Fermi energy E_F. It is related to the Hume–Rothery stabilization via the scattering of electrons by the TM sub-lattice because of a strong sp–d hybridization. The smallest energy scale, less than 0.1 eV, is characteristic of the small dispersion of the band energy E(k).
This energy scale seems more specific to phases related to the quasi-periodicity. The first consequence for transport is a small velocity at the Fermi energy, the Boltzmann velocity V_B = (∂E/∂k)_{E=E_F}. From numerical calculations, Prof. Fujiwara et al. evaluated the Bloch–Boltzmann dc conductivity σ_B in the relaxation time approximation. With a realistic value of the scattering time, τ ∼ 10^-14 s [27], one obtains σ_B ∼ 10–150 (Ω cm)^-1 for an α-AlMn model [8] and a 1/1-AlFeCu model [28]. This corresponds to the measured values [4, 6], which are anomalously low for metallic alloys. For decagonal approximants, the anisotropy found experimentally in the conductivity is also reproduced correctly [29].

Figure 5: Schematic temperature dependencies of the experimental resistivity of quasicrystals, amorphous and metallic crystals. (Axes: resistivity ρ versus temperature, 4 K to 300 K, with the Mott resistivity marked. Curves: metallic alloys; amorphous alloys; metastable quasicrystals (i-AlMn); “imperfect” stable quasicrystals (i-AlLiCu); “perfect” stable quasicrystals (i-AlCuFe and i-AlPdMn); doped semiconductors.)

Figure 6: Ab-initio electrical resistivity versus inverse scattering time 1/τ (s^-1), in cubic approximant α-Al69.6Si13.0Mn17.4, pure Al (f.c.c.), and cubic Al12Mn.

3.2 Quantum transport in Quasicrystals and approximants

The semi-classical Bloch–Boltzmann description of transport gives interesting results for the intra-band conductivity in crystalline approximants, but it is insufficient to take into account many aspects due to the special localization of electrons by the quasi-periodicity (see Refs. [34–43] and references therein). Some specific transport features, such as the temperature dependence of the conductivity (inverse Matthiessen rule, the influence of defects, the proximity of a metal/insulator transition), require going beyond a Bloch–Boltzmann analysis.
Thus, it appears that in quasicrystals and related complex metallic alloys a new type of breakdown of the semi-classical Bloch–Boltzmann theory operates. In the literature, two different unconventional transport mechanisms have been proposed for these materials. Transport could be dominated, for short relaxation time τ, by hopping between “critical localized states”, whereas for long relaxation time τ the regime could be dominated by non-ballistic propagation of wave packets between two scattering events.

We develop a theory of quantum transport that applies to a normal ballistic law but also to these specific diffusion laws. As we show, phenomenological models based on this theory correctly describe the experimental transport properties [41, 42, 43] (compare figures 5 and 6).

3.3 Ab-initio calculations of quantum transport

According to the Einstein relation, the conductivity σ depends on the diffusivity D(E) of electrons of energy E and on the density of states n(E) (summing the spin-up and spin-down contributions). We assume that n(E) and D(E) vary weakly on the thermal energy scale kT, which is justified here. In that case, the Einstein formula reads

\[ \sigma = e^2\, n(E_F)\, D(E_F) \qquad (1) \]

where E_F is the chemical potential and e is the electronic charge. The temperature dependence of σ is due to the variation of the diffusivity D(E_F) with temperature. The central quantity is thus the diffusivity, which is related to quantum diffusion. Within the relaxation time approximation, the diffusivity is written [41]

\[ D(E) = \frac{1}{2} \int_{0}^{+\infty} C_0(E,t)\, e^{-|t|/\tau}\, dt \qquad (2) \]

where \( C_0(E,t) = \langle V_x(t) V_x(0) + V_x(0) V_x(t) \rangle_E \) is the velocity correlation function without disorder, and τ is the relaxation time. Here, the effect of defects and temperature (scattering by phonons ...) is taken into account through the relaxation time τ; τ decreases as disorder increases.
In the case of crystalline phases (such as approximants of quasicrystals), one obtains [42, 43]:

\[ \sigma = \sigma_B + \sigma_{NB} \qquad (3) \]

\[ \sigma_B = e^2\, n(E_F)\, V_B^2\, \tau \quad \text{and} \quad \sigma_{NB} = e^2\, n(E_F)\, \frac{L^2(\tau)}{\tau} \qquad (4) \]

where σ_B is the Boltzmann contribution to the conductivity and σ_NB a non-Boltzmann contribution. L^2(τ) is smaller than the square of the unit cell size L0. L^2(τ) can be calculated numerically for the ab-initio electronic structure [42]. From (3) and (4), it is clear that the Boltzmann term dominates when L0 ≪ V_B τ: the diffusion of electrons is then ballistic, which is the case in normal metallic crystals. But when L0 ≃ V_B τ, i.e. when the Boltzmann velocity V_B is very low, the non-Boltzmann term is essential. In the case of the α-Al69.6Si13.0Mn17.4 approximant (figure 7) [42], with a realistic value of τ (a few 10^-14 s [27]), σ_NB dominates and σ increases when 1/τ increases, i.e. when defects or temperature increase, in agreement with experimental measurements (compare figures 5 and 6).

Figure 7: Ab-initio dc-conductivity σ in cubic approximant α-Al69.6Si13.0Mn17.4 versus inverse scattering time 1/τ (s^-1). [42]

Figure 8: Ab-initio dc-conductivity σ in a hypothetical cubic approximant α-Al69.6Si13.0Cu17.4 versus inverse scattering time 1/τ (s^-1). [43]

To evaluate the effect of TM elements on the conductivity, we have considered a hypothetical α-Al69.6Si13.0Cu17.4 constructed by putting Cu atoms in place of Mn atoms in the actual α-Al69.6Si13.0Mn17.4 structure. Cu atoms have almost the same number of sp electrons as Mn atoms, but their d DOS is very small at E_F. Therefore, in α-Al69.6Si13.0Cu17.4, the effect of sp(Al)–d(TM) hybridization on electronic states with energy near E_F is very small. As a result, the pseudogap disappears in the total DOS, and the conductivity is now ballistic (metallic), σ ≃ σ_B, as shown in figure 8.
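The competition between the Boltzmann and non-Boltzmann terms can be sketched numerically. The following Python sketch is not the ab-initio calculation of Refs. [42, 43]. It assumes the forms σ_B = e² n(E_F) V_B² τ and σ_NB = e² n(E_F) L²(τ)/τ, and it replaces L²(τ) by a constant, which is a crude assumption made only for illustration (in the text L²(τ) is computed numerically and is merely bounded by L0²). All function names and parameter values are hypothetical.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs (exact SI value)


def diffusivity(tau: float, v_b: float, l_nb: float) -> float:
    """D = V_B^2 * tau + L^2 / tau in m^2/s, with L^2(tau) frozen at l_nb**2 (assumption)."""
    return v_b**2 * tau + l_nb**2 / tau


def conductivity(tau: float, v_b: float, l_nb: float, dos: float) -> float:
    """Einstein relation sigma = e^2 * n(E_F) * D in S/m, with dos = n(E_F) in 1/(J m^3)."""
    return E_CHARGE**2 * dos * diffusivity(tau, v_b, l_nb)


def crossover_time(v_b: float, l_nb: float) -> float:
    """Relaxation time at which both terms are equal, i.e. V_B * tau = L."""
    return l_nb / v_b


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend, not taken from the paper.
    v_b = 1.0e4    # Boltzmann velocity in m/s
    l_nb = 1.0e-9  # non-Boltzmann length in m
    dos = 1.0e47   # density of states at E_F in 1/(J m^3)
    print(f"crossover tau = {crossover_time(v_b, l_nb):.1e} s")
    for tau in (1e-12, 1e-13, 5e-14, 1e-14):
        sigma = conductivity(tau, v_b, l_nb, dos)
        print(f"tau = {tau:.0e} s -> sigma = {sigma:.3e} S/m")
```

In this toy model the conductivity decreases with increasing 1/τ as long as V_B τ exceeds L (metallic behavior), and increases with 1/τ once V_B τ falls below L, which is the qualitative trend described above for the Mn-based approximant.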
4 Conclusion

In this article we have presented the effect of transition-metal atoms on the physical properties of quasicrystals and related complex phases. These studies lead us to consider these aluminides as spd electron phases [11], in which a specific electronic structure governs stability, magnetism and quantum transport properties. The principal aspects of this new physics are now understood, thanks in particular to the seminal work of Prof. T. Fujiwara and subsequent developments of his ideas.

References

[1] Fujiwara T, Tsunetsugu H. In: DiVincenzo DP, Steinhardt PJ, editors. Quasicrystals: The State of the Art. Singapore: World Scientific, 1991.
[2] Fujiwara T. Phys Rev 1989;B40:942.
[3] Shechtman D, Blech I, Gratias D, Cahn JW. Phys Rev Lett 1984;53:1951.
[4] Berger C. In: Hippert F, Gratias D, editors. Lecture on Quasicrystals. Les Ulis: Les Editions de Physique, 1994; p. 463.
[5] Grenet T. In: Belin-Ferré E, Berger C, Quiquandon M, Sadoc A, editors. Quasicrystals: Current Topics. Singapore: World Scientific, 2000; p. 455.
[6] Quivy A, Quiquandon M, Calvayrac Y, Faudot F, Gratias D, Berger C, Brand RA, Simonet V, Hippert F. J Phys Condens Matter 1996;8:4223.
[7] Fujiwara T, Yokokawa T. Phys Rev Lett 1991;66:333.
[8] Fujiwara T, Yamamoto S, Trambly de Laissardière G. Phys Rev Lett 1993;71:4166. Mat Sci Forum 1994;150-151:387.
[9] Mizutani U, Takeuchi T, Sato H. J Phys: Condens Matter 2002;14:R767.
[10] Massalski TB, Mizutani U. Prog Mater Sci 1978;22:151.
[11] Trambly de Laissardière G, Nguyen Manh D, Mayou D. Prog Mater Sci 2005;50:679.
[12] Zijlstra ES, Bose SK. Phys Rev 2003;B67:224204.
[13] Dankházi Z, Trambly de Laissardière G, Nguyen–Manh D, Belin E, Mayou D. J Phys: Condens Matter 1993;5:3339.
[14] Trambly de Laissardière G, Mayou D, Nguyen Manh D. Europhys Lett 1993;21:25. J Non-Cryst Solids 1993;153-154:430. Trambly de Laissardière G, et al. Phys Rev 1995;B52:7920.
[15] Berger C, Belin E, Mayou D.
Annales de Chimie-Science des Matériaux 1993;18:485.
[16] Mayou D, Cyrot–Lackmann F, Trambly de Laissardière G, Klein T. J Non-Cryst Solids 1993;153-154:412.
[17] Zou J, Carlsson AE. Phys Rev Lett 1993;70:3748.
[18] Belin-Ferré E. J Non-Cryst Solids 2004;334-335:323.
[19] Trambly de Laissardière G, Nguyen Manh D, Mayou D. J Non-Cryst Solids 2004;334-335:347.
[20] Trambly de Laissardière G. Phys Rev 2003;B68:045117.
[21] Trambly de Laissardière G, Mayou D. Phys Rev Lett 2000;85:3273.
[22] Simonet V, Hippert F, Audier M, Trambly de Laissardière G. Phys Rev 1998;B58:R8865.
[23] Trambly de Laissardière G, Mayou D. Phys Rev 1997;B55:2890. Trambly de Laissardière G, Roche S, Mayou D. Mat Sci Eng 1997;A226-228:986.
[24] Janot C, de Boissieu M. Phys Rev Lett 1994;72:1674.
[25] Gratias D, Puyraimond F, Quiquandon M, Katz A. Phys Rev 2000;B63:24202.
[26] Friedel J. Can J Phys 1956;34:1190. Anderson PW. Phys Rev 1961;124:41.
[27] Mayou D, Berger C, Cyrot–Lackmann F, Klein T, Lanco P. Phys Rev Lett 1993;70:3915.
[28] Trambly de Laissardière G, Fujiwara T. Phys Rev 1994;B50:5999.
[29] Trambly de Laissardière G, Fujiwara T. Phys Rev 1994;B50:9843. Mat Sci Eng 1994;A181-182:722.
[30] Hafner J, Krajčí M. Phys Rev 1998;B57:2849.
[31] Krajčí M, Hafner J. Phys Rev 1998;B58:14110.
[32] Nguyen–Manh D, Trambly de Laissardière G. J Mag Mag Mater 2003;262:496.
[33] Zijlstra ES, Bose SK, Klanjšek M, Jeglič P, Dolinšek J. Phys Rev 2005;B72:174206.
[34] Tokihiro T, Fujiwara T, Arai M. Phys Rev 1988;B38:5981.
[35] Fujiwara T, Mitsui T, Yamamoto S. Phys Rev 1996;B53:R2910.
[36] Roche S, Trambly de Laissardière G, Mayou D. J Math Phys 1996;38:1794.
[37] Roche S, Mayou D. Phys Rev Lett 1997;79:2518.
[38] Mayou D. In: Belin-Ferré E, Berger C, Quiquandon M, Sadoc A, editors. Quasicrystals: Current Topics. Singapore: World Scientific, 2000; p. 412.
[39] Triozon F, Vidal J, Mosseri R, Mayou D. Phys Rev 2002;B65:220202.
[40] Bellissard J.
In: Garbaczewski P, Olkiewicz R, editors. Dynamics of Dissipation, Lecture Notes in Physics. Berlin: Springer, 2003; p. 413.
[41] Mayou D. Phys Rev Lett 2000;85:1290.
[42] Trambly de Laissardière G, Julien JP, Mayou D. Phys Rev Lett 2006;97:026601.
[43] Mayou D, Trambly de Laissardière G. In: Fujiwara T, Ishii Y, editors. Quasicrystals. Series “Handbook of Metal Physics”. Elsevier Science, 2007. To appear.

ABSTRACT In this paper, we briefly present our work on the role of transition-metal elements in the electronic structure and transport properties of quasicrystals and related complex phases. Several parts of this work have been done or initiated in collaboration with Prof. T. Fujiwara. <|endoftext|><|startoftext|> Non-resonant and Resonant X-ray Scattering Studies on Multiferroic TbMn2O5

J. Koo1, C. Song1, S. Ji1, J.-S. Lee1, J. Park1, T.-H. Jang1, C.-H. Yang1, J.-H. Park1,2, Y. H. Jeong1, K.-B. Lee1,2,∗, T.Y. Koo2, Y.J. Park2, J.-Y. Kim2, D. Wermeille3,†, A.I. Goldman3, G. Srajer4, S. Park5, and S.-W.
Cheong5,6

1 eSSC and Department of Physics, POSTECH, Pohang 790-784, Korea
2 Pohang Accelerator Laboratory, Pohang University of Science and Technology, Pohang 790-784, Korea
3 Ames Laboratory, Department of Physics and Astronomy, Iowa State University, Ames, IA 50011, USA
4 Advanced Photon Source, Argonne National Laboratory, Argonne, IL 60439, USA
5 Rutgers Center for Emergent Materials and Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
6 Laboratory of Pohang Emergent Materials and Department of Physics, POSTECH, Pohang 790-784, Korea

(Dated: November 1, 2018)

Comprehensive x-ray scattering studies, including resonant scattering at Mn L-edge, Tb L- and M-edges, were performed on single crystals of TbMn2O5. X-ray intensities were observed at a forbidden Bragg position in the ferroelectric phases, in addition to the lattice and the magnetic modulation peaks. Temperature dependences of their intensities and the relation between the modulation wave vectors provide direct evidence of exchange-striction-induced ferroelectricity. Resonant x-ray scattering results demonstrate the presence of multiple magnetic orders by exhibiting their different temperature dependences. The commensurate-to-incommensurate phase transition around 24 K is attributed to discommensuration through phase slipping of the magnetic orders in spin frustrated geometries. We propose that the low temperature incommensurate phase consists of commensurate magnetic domains separated by anti-phase domain walls, which reduce spontaneous polarizations abruptly at the transition.

PACS numbers: 77.80.-e, 75.25.+z, 64.70.Rh, 61.10.-i

In recent years, much attention has been paid to multiferroic materials, in which magnetic and ferroelectric orders coexist and are cross-correlated [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], due to theoretical interest and potential application to magnetoelectric (ME) devices.
Manipulation of electric polarizations by external magnetic fields has been demonstrated in some of these materials [4, 5]. Orthorhombic TbMn2O5, one of the multiferroic materials, displays a rich phase diagram. Upon cooling through TN ∼ 41 K, TbMn2O5 becomes antiferromagnetic with an incommensurate magnetic (ICM) order which transits to a commensurate magnetic (CM) phase with spontaneous electric polarization at Tc1 ∼ 36 K, and reenters a low temperature incommensurate magnetic (LT-ICM) phase at Tc2 ∼ 24 K. Anomalies of ferroelectricity and dielectric properties were observed concurrently with these magnetic phase transitions [4, 9]. In particular, the reentrant LT-ICM phase is a phenomenon peculiar to RMn2O5 multiferroics, while commensurate phases are more common as the low temperature ground states. Since the CM to LT-ICM phase transition is also accompanied by an abrupt loss of spontaneous polarization, it is critical to elucidate the nature of the incommensurability of the material, including the mechanism of the CM to LT-ICM phase transition.

The origin of the complex phases of the material is attributed to the coupling between magnetic moments of Mn ions and the lattice [8, 9]. It is suggested that, when a magnetic order is modulated with a wave vector qm, the exchange striction affects inter-atomic bondings, resulting in a periodic lattice modulation with a wave vector qc = 2qm [5, 6, 7, 8, 9]. Recently, Chapon et al. proposed for RMn2O5 systems that ferroelectricity results from the exchange striction of acentric spin density waves for the CM phases [9]. Indeed, Kimura et al. insisted that CM modulations are indispensable to the ferroelectricity in the LT-ICM phase, from their neutron scattering results on HoMn2O5 under high magnetic fields [11].
However, lattice distortions derived from ICM spin structures turned out to describe well the spontaneous polarizations of YMn2O5 even in the ICM phase [12], implying that commensurability is not a necessary condition for the ferroelectricity. In order to understand the intriguing magnetoelectricity well, detailed information on the lattice and spin structure changes is necessary. However, only limited crystallographic data are available, and no direct evidence of the symmetry lowering has been reported yet [9, 10, 11, 12, 13, 14].

In this letter, we present synchrotron x-ray scattering results on single crystals of TbMn2O5. Since x-ray scattering is sensitive to both lattice and magnetic modulations, x-ray scattering with intense undulator x-rays allowed simultaneous measurements of qm and qc. Non-resonant x-ray scattering results show the relationship qc = 2qm, confirming that lattice modulations are generated by the magnetic orders. A (3 0 0) forbidden Bragg peak, which is direct evidence of the symmetry lowering to a non-centrosymmetric space group, was observed in the ferroelectric (FE) phases. Furthermore, the temperature dependence of the peak intensity, I(300), was found to coincide with those of the lattice modulation peak intensities, Ic, and the square of the spontaneous polarization, P^2, in the CM phase. This indicates that the ferroelectricity is generated by the lattice modulations. In the LT-ICM phase, temperature dependences of Ic cannot be described by a single order parameter, implying the presence of different magnetic orders. Resonant x-ray magnetic scattering results at Mn L-, Tb L3- and M5-edges show that each magnetic order has its own temperature dependence. It is proposed that the CM to LT-ICM phase transition is induced by discommensuration through phase slipping due to competing magnetic orders under the frustrated geometry.
Moreover, the CM modulations with anti-phase domain walls are consistent with the temperature dependences of qm and I(300) in the LT-ICM phase, and explain well the abrupt loss of P at the transition.

Single crystals of TbMn2O5 were grown by a flux method [4]. The specimen used for the hard x-ray scattering measurements has a plate-like shape with (1 1 0) as a surface normal direction. Its mosaicity was measured to be about 0.01° at the (3 3 0) Bragg reflection. For soft x-ray scattering, a different sample was cut and polished to have (2 0 1) as a surface normal direction. Soft x-ray scattering measurements were performed at the 2A beamline in the Pohang Light Source (PLS). Details of the soft x-ray scattering chamber were described elsewhere [15]. X-ray diffraction experiments were conducted at the 3C2 bending magnet beamline in the PLS and at the 6-ID undulator beamline in the Midwest Universities Collaborative Access Team (MUCAT) Sector in the Advanced Photon Source. For non-resonant x-ray scattering experiments, 6.45 keV was selected as an incident x-ray energy below the Mn K-edge (∼ 6.55 keV). All the incident x-rays were σ-polarized and PG(006) was used to have a σ-to-π channel at the Tb L3-edge.

Nonresonant x-ray scattering measurements were performed to investigate the temperature dependence of qm and qc simultaneously. The measured lattice modulation peak position of (2 5 -0.5) for the CM phase and those of its 4 split peaks for the ICM phases are presented as solid and open circles, respectively, in Fig. 1 (a). For magnetic satellites, the (2.5 5 -0.25) peak and its 2 split ones were measured for the CM and ICM phases. Their positions are presented as solid and open squares, respectively. The magnetic and lattice modulation satellites for ICM phases are linked with broken and solid lines to their corresponding main Bragg peaks. Temperature dependences of qm and qc are shown in Fig. 1 (b) and (c).
From the results, it is obvious that the relation qc = 2qm holds within experimental errors in the whole temperature range below TN. It is consistent with the magnetic order induced lattice modulations. The temperature dependence of qm shown here is qualitatively similar to the neutron scattering results by others [16]. Below TN, ICM magnetic peaks develop, and qm locks into a CM ordering at (1/2 0 1/4) via a first order transition at Tc1. On further cooling the sample below Tc2, the CM to LT-ICM phase transition takes place. With further decreasing temperature, qm of the LT-ICM modulations evolves and is eventually pinned around (0.486 0 0.308), which can be approximated to a CM value of (17/35 0 4/13), at Tc3 ∼ 13 K. Such a long-period CM modulation can be interpreted as the CM modulations (qm = (1/2 0 1/4)) with domain walls, as is the case for ErNi2B2C [17].

FIG. 1: (Color online) Positions of the measured magnetic satellites (square) and lattice modulation peaks (circle) in the (h 5 l) reciprocal lattice plane are shown in (a). The temperature dependences of q^x_m (square) and q^x_c (circle), and those of q^z_m (square) and q^z_c (circle) are shown in (b) and (c), respectively. For direct comparisons with those of qm, the components of qc are divided by two. Vertical broken lines indicate TN ∼ 41 K, Tc1 ∼ 36 K, Tc2 ∼ 24 K and Tc3 ∼ 13 K, respectively.

FIG. 2: (Color online) (a) Rocking curves of a (3 0 0) forbidden Bragg peak measured below (open) and above Tc1 (solid). (b) Temperature dependences of the integrated intensities of a (3 0 0) Bragg peak (circle), CM lattice modulation peak (square) and squared spontaneous polarization (broken line) taken from Ref. 4. All the data are properly scaled.

As shown in Fig. 2 (a), measurable x-ray intensities were observed, in the ferroelectric phase, at the (3 0 0) Bragg position, which is forbidden under the space group of the room temperature paraelectric phase, Pbam. Residual intensities above Tc1 are due to higher harmonic
contaminations. Values for full-width-at-half-maximum (FWHM) of the peak are about 0.01°, close to those of the (4 0 0) main Bragg peak in the LT-ICM phase. The results explicitly evidenced that inversion symmetry is broken concomitantly with the FE phase as speculated before. According to the models suggested by others [9, 10], displacements of Mn3+ are in the ab-plane. While b-axis components of the atomic displacements mainly contribute to P, a-axis components enable the emergence of I(300). If the atomic displacements correspond to the periodic lattice modulations, it is expected that both P^2 and I(300) are proportional to Ic, as shown in Fig. 2 (b). (The spontaneous polarization data are taken from Ref. 4 and are shifted in order to get the same values for Tc1.) It confirms that spontaneous polarization is due to the atomic displacements driven by magnetic orders: a direct crystallographic evidence of exchange striction as the origin of ferroelectricity in the material [8, 9, 10, 12]. Also it is noted that I(300) drops abruptly at Tc2 and has a broad minimum around Tc3.

Though many interesting ME phenomena have been reported in the LT-ICM phases below Tc2 [4, 11, 18, 19], their basic mechanisms still remain to be understood. Since the lattice modulations reflect basic ME natures, temperature dependences below Tc2 of integrated intensities were measured at the four split ICM peak positions illustrated in Fig. 1 (a). From the results displayed in Fig. 3, it is clear that temperature dependences of all four peaks cannot be described by a single order parameter, implying the presence of various magnetic orders having the same qm’s but different temperature dependences.

FIG. 3: (Color online) Temperature dependences of the ICM lattice modulation peak intensities.

To investigate different magnetic orders, we performed resonant x-ray magnetic scattering measurements at Mn L-, Tb L3- and M5-edges. Figure 4 (a) shows energy profiles around the Mn L-edge of magnetic satellites at 10 K and x-ray absorption spectroscopy (XAS) at room temperature. Magnetic peaks and XAS data clearly show resonances at both Mn L2- and L3-edges. XAS results show broad peaks containing contributions from the multiplet states of 3d electrons of Mn3+ and Mn4+ ions. Magnetic satellites show relatively sharp double peaks at both Mn L-edges. The sharp resonances represent different multiplet states of Mn 3d electrons including charge transfer excitations, while Mn ions are expected to be in the high-spin configurations with all the 3d electron spins aligned parallel. Therefore, although the resonances do not have one-to-one correspondences with the magnetic orders of Mn ions, changes in the resonances at magnetic satellites reflect the changes in spin ordering which are periodically modulated with the wave vector qm. Temperature dependences of x-ray intensities at the ICM peak of Qm = (q^x_m 0 q^z_m) were measured at the two resonances, 640.8 and 644.2 eV. The results are presented in Fig. 4 (b). Data for a CM peak of Qm = (0.5 0 0.25) at the resonance of 644.2 eV are presented together. It is clear that, above 15 K, intensities of each resonance have different temperature dependences from each other. Though the origin of the anomalous temperature dependences is not understood in detail, it reflects complicated natures of magnetic moments of Mn ions under the frustrated configuration.

FIG. 4: (a) Energy profiles of the ICM magnetic peaks (circle) and XAS (solid line) around Mn L2,3-edges. Vertical broken lines correspond to 640.8 eV and 644.2 eV, respectively. (b) Temperature dependences of the ICM (circle) and the CM (square) magnetic peaks. Open (solid) symbols denote the data taken at E = 640.8 eV (644.2 eV), respectively. (c) Temperature dependences of the ICM (open circle) and the CM (solid square) magnetic peak at Tb L3-edge.
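As a side illustration of the long-period commensurate description of the pinned LT-ICM wave vector (0.486 0 0.308) quoted earlier, the following Python sketch looks for the closest fractions with small denominators. It is not part of the original analysis: the helper name `commensurate_approximation` and the denominator bound of 40 are assumptions made only for this example.

```python
from fractions import Fraction


def commensurate_approximation(component: float, max_denominator: int = 40) -> Fraction:
    """Closest fraction to a wave-vector component with denominator <= max_denominator."""
    # Going through str() keeps the decimal value as quoted (e.g. 0.486 -> 243/500).
    return Fraction(str(component)).limit_denominator(max_denominator)


if __name__ == "__main__":
    # Components of the pinned wave vector qm quoted in the text (reciprocal lattice units).
    for label, value in (("q_x", 0.486), ("q_z", 0.308)):
        approx = commensurate_approximation(value)
        print(f"{label} = {value} ~ {approx} = {float(approx):.4f}")
```

Under these assumptions the two components are approximated by 17/35 and 4/13. A different denominator bound could select different fractions, so the result should be read as an illustration rather than a unique assignment.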
Magnetic ordering of Tb3+ ions was investigated with resonant x-ray scattering measurements at the Tb L3-edge. Figure 4 (c) shows that the ordering temperature of Tb magnetic moments is the same as that of Mn, TN, which is consistent with neutron scattering results [9]. The modulation wave vector of the Tb magnetic order is the same as the values of qm measured in nonresonant x-ray scattering. Soft x-ray magnetic scattering measurements were also performed at the Tb M5-edge, and the result (not shown here) confirms that the observed x-ray intensities in Fig. 4 (c) reflect the magnetic order of Tb 4f electrons, which grows monotonically below TN.

From the results shown in Fig. 4 (b) and (c), it is clear that there exist multiple magnetic order parameters having the same qm’s but different temperature dependences. The contributing portions of each magnetic order to scattering factors of magnetic satellites are different depending on Qm (= QBragg + qm), and this results in a different temperature dependence for each magnetic peak and its corresponding lattice modulation peak intensities, which explains the temperature dependences presented in Fig. 3.

Since the magnetic orders are located under the spin frustrated geometry, it is reasonable to suppose that phase slips take place due to competition between the magnetic orders, as their order parameters grow with different temperature dependences. The discommensuration results in the transition to the LT-ICM phase. Anti-phase domain walls for the phase slips are consistent with the aforementioned long-period CM modulations below Tc3. Assuming the model suggested by others [9], atomic displacements are of the canted antiferroelectric type. Across an anti-phase domain wall, directions of the atomic displacements and the spontaneous polarizations are reversed.
Therefore, not only do the polarizations from domains separated by the domain wall cancel each other, but also the x-ray scattering amplitudes for the (3 0 0) Bragg peak are canceled due to the crystal symmetry. Then, only remnants resulting from unequal populations of the domains contribute to P and I(300). Since the density of the domain walls determines qm, temperature dependences of P, I(300) and qm down to Tc3 can be explained consistently in terms of CM modulations with the anti-phase domain walls. This indicates that CM modulations are preferred as the low temperature ground state. Then, the low temperature phase seems to have a higher entropy due to the domain walls than the high temperature CM phase, violating the entropy rule. However, due to the geometrical frustration and the presence of multiple magnetic orders, many different energy scales can exist. The complicated temperature dependences of the magnetic orders in Fig. 4(c) reflect the presence of the different energy scales. Smaller energy scales become important at low temperatures and induce discommensuration. Upturns of the electrical polarization and I(300) below Tc3 are attributed to lattice modulations enhanced by increasing Tb magnetic moments, which is consistent with results of others demonstrating couplings between Tb moments and lattices [18, 19, 20].

In summary, we have shown that exchange striction is the driving mechanism for the magnetoelectricity in the material. The same temperature dependences of x-ray intensities at a (3 0 0) forbidden Bragg peak and a lattice modulation peak in the CM FE phase, together with observation of the relation qc = 2qm, demonstrate that spontaneous electric polarization is due to atomic displacements driven by the exchange striction of magnetic orders. Resonant x-ray magnetic scattering results confirm the presence of multiple magnetic orders having different temperature dependences.
The CM to LT-ICM phase transition is attributed to discommensuration through phase slipping in the competing magnetic orders in the frustrated configurations. Temperature dependences of qm, P and I(300) in the LT-ICM phase are explained in terms of the CM modulations with anti-phase domain walls.

We thank D.J. Huang for the useful discussions. This work was supported by the KOSEF through the eSSC at POSTECH, and by MOHRE through BK-21 program. The experiments at the PLS were supported by the POSTECH Foundation and MOST. Use of the Advanced Photon Source (APS) was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract No. W-31-109-Eng-38. The Midwest Universities Collaborative Access Team (MUCAT) sector at the APS is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, through the Ames Laboratory under contract No. DE-AC02-07CH11358. Work at Rutgers was supported by NSF-DMR-0520471.

∗ Electronic address: kibong@postech.ac.kr
† Present address: European Synchrotron Radiation Facility, BP 220, F-38043 Grenoble Cedex 9, France

[1] M. Kenzelmann et al., Phys. Rev. Lett. 95, 087206 (2005).
[2] G. Lawes et al., Phys. Rev. Lett. 95, 087205 (2005).
[3] S. Kobayashi et al., J. Korean Phys. Soc. 46, 289 (2005).
[4] N. Hur et al., Nature 429, 392 (2004).
[5] T. Kimura et al., Nature 426, 55 (2003).
[6] T. Goto et al., Phys. Rev. Lett. 92, 257201 (2004).
[7] T. Kimura et al., Phys. Rev. B 68, 060403(R) (2003).
[8] S.-W. Cheong et al., Nature Mater. 6, 13 (2007).
[9] L. C. Chapon et al., Phys. Rev. Lett. 93, 177402 (2004). G.R. Blake et al., Phys. Rev. B 71, 214402 (2005).
[10] I. Kagomiya et al., Ferroelectrics 286, 167 (2003).
[11] H. Kimura et al., J. Phys. Soc. Jpn. 75, 113701 (2006).
[12] L. C. Chapon et al., Phys. Rev. Lett. 96, 097601 (2006).
[13] V. Polyakov et al., Physica B 297, 208 (2001).
[14] D. Higashiyama et al., Phys. Rev.
B 70, 174405 (2004). [15] J.-S. Lee, Ph. D. Thesis, POSTECH (2006). [16] S. Kobayashi et al., J. Phys. Soc. Jpn. 73, 3439 (2004). [17] H. Kawano-Furukawa et al., Phys. Rev. B 65, 180508 (2002). [18] S. Y. Haam et al., Ferroelectrics 336, 153 (2006). [19] S.-H. Baek et al., Phys. Rev. B 74, 140410(R) (2006). [20] R. Valdés Aguilar et al., Phys. Rev. B 74, 184404 (2006). ABSTRACT Comprehensive x-ray scattering studies, including resonant scattering at Mn L-edge, Tb L- and M-edges, were performed on single crystals of TbMn2O5. X-ray intensities were observed at a forbidden Bragg position in the ferroelectric phases, in addition to the lattice and the magnetic modulation peaks. Temperature dependences of their intensities and the relation between the modulation wave vectors provide direct evidence of exchange-striction-induced ferroelectricity. Resonant x-ray scattering results demonstrate the presence of multiple magnetic orders by exhibiting their different temperature dependences. The commensurate-to-incommensurate phase transition around 24 K is attributed to discommensuration through phase slipping of the magnetic orders in spin-frustrated geometries. We propose that the low temperature incommensurate phase consists of the commensurate magnetic domains separated by anti-phase domain walls which reduce spontaneous polarizations abruptly at the transition. <|endoftext|><|startoftext|> Introduction to Their Statistical Mechanics (World Scientific, London, 2005). [7] R. Vogelsang and C. Hoheisel, Phys. Rev. A 38, 6296 (1988); H.P. van den Berg and C. Hoheisel, Phys. Rev. A 42, 2090 (1990). [8] K.W. Kehr, K. Binder, and S.M. Reulein, Phys. Rev. B 39, 4891 (1989). [9] W. Hess, G. Nägele, and A.Z. Akcasu, J. Polymer Sc. B 28, 2233 (1990). [10] J. Trullàs and J.A. Padrò, Phys. Rev. E 50, 1162 (1994). [11] A. Baumketner and Ya. Chushak, J. Phys.: Condens. Matter 11, 1397 (1999). [12] J.–F. Wax and N. Jakse, Phys. Rev. B 75, 024204 (2007). [13] L.S. Darken, Trans. 
AIME 180, 430 (1949). [14] M. Fuchs and A. Latz, Physica A 201, 1 (1993). [15] M. Asta, D. Morgan, J.J. Hoyt, B. Sadigh, J.D. Althoff, D. de Fontaine, and S.M. Foiles, Phys. Rev. B 59, 14271 (1999). [16] F. Faupel, W. Frank, M.–P. Macht, H. Mehrer, V. Naundorf, K. Rätzke, H.R. Schober, S.K. Sharma, and H. Teichler, Rev. Mod. Phys. 75, 237 (2003). [17] Y. Mishin, M. J. Mehl, and D. A. Papaconstantopoulos, Phys. Rev. B 65, 224114 (2002). [18] S. K. Das, J. Horbach, M. M. Koza, S. Mavila Chatoth, and A. Meyer, Appl. Phys. Lett. 86, 011918 (2005). [19] G.L. Batalin, E.A. Beloborodova, and V.G. Kazimirov, Thermodynamics and the Constitution of Liquid Al Based Alloys (Metallurgy, Moscow, 1983). [20] G.D. Ayushina, E.S. Levin, and P.V. Geld, Russ. J. Phys. Chem. 43, 2756 (1969). [21] M. Maret, T. Pomme, A. Pasturel, and P. Chieux, Phys. Rev. B 42, 1598 (1990). [22] S. Sadeddine, J.F. Wax, B. Grosdidier, J.G. Gasser, C. Regnaut, and J.M. Dubois, Phys. Chem. Liq. 28, 221 (1994). [23] M. Asta, V. Ozolins, J.J. Hoyt, and M. van Schilfgaarde, Phys. Rev. B 64, 020201(R) (2001). [24] V.S. Sudovtseva, A.V. Shuvalov, N.O. Sharchina, Rasplavy No. 4, 97 (1990); U.K. Stolz, I. Arpshoven, F. Sommer, and B. Predel, J. Phase Equilib. 14, 473 (1993); K.V. Grigorovitch and A.S. Krylov, Thermochim. Acta 314, 255 (1998). [25] S.K. Das, J. Horbach, and K. Binder, J. Chem. Phys. 119, 1547 (2003); S.K. Das, J. Horbach, K. Binder, M.E. Fisher, J.V. Sengers, J. Chem. Phys. 125, 024506 (2006); S.K. Das, M. E. Fisher, J.V. Sengers, J. Horbach, K. Binder, Phys. Rev. Lett. 97, 025702 (2006). [26] F.O. Raineri and H.L. Friedman, J. Chem. Phys. 91, 5633 (1989); F.O. Raineri and H.L. Friedman, J. Chem. Phys. 91, 5642 (1989). [27] A.B. Bhatia and D.E. Thornton, Phys. Rev. B 52, 3004 (1970). [28] D. Holland–Moritz, O. Heinen, R. Bellissent, T. Schenk, and D.M. Herlach, Int. J. Mat. Res. 97, 948 (2006). [29] J.R. Manning, Phys. Rev. 124, 470 (1961). [30] M. P. Allen, D. Brown, and A. J. 
Masters, Phys. Rev. E 49, 2488 (1994); M. P. Allen, Phys. Rev. E 50, 3277 (1994). [31] A. Griesche, F. Garcia Moreno, M.P. Macht, and G. Frohberg, Mat. Sc. Forum 508, 567 (2006). [32] A. Griesche, M.P. Macht, J.P. Garandet, and G. Frohberg, J. Non–Cryst. Solids 336, 173 (2004). [33] A. Griesche, M.P. Macht, and G. Frohberg, unpublished. [34] J.P. Garandet, C. Barrat, and T. Duffar, Int. J. Heat Mass Transfer 38, 2169 (1995). [35] C. Barrat and J.P. Garandet, Int. J. Heat Mass Transfer 39, 2177 (1996). [36] A. Meyer, Phys. Rev. B 66, 134205 (2002). [37] S. Mavila Chathoth, A. Meyer, M. M. Koza, and F. Yuranji, Appl. Phys. Lett. 85, 4881 (2004). [38] D. P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge University Press, Cambridge, 2000). [39] A.F. Voter and S.P. Chen, in Characterization of Defects in Materials, edited by R.W. Siegel et al., MRS Symposia Proceedings No. 82 (Materials Research Society, Pittsburgh, 1978), p. 175. [40] S.M. Foiles and M.S. Daw, J. Mat. Res. 2, 5 (1987). [41] S.K. Das, J. Horbach, and K. Binder, unpublished. Introduction Self–diffusion and interdiffusion: Basic theory Experimental Methods Long–capillary technique Neutron scattering experiments Details of the simulation Results Conclusion Acknowledgments References ABSTRACT A combination of experimental techniques and molecular dynamics (MD) computer simulation is used to investigate the diffusion dynamics in Al80Ni20 melts. Experimentally, the self-diffusion coefficient of Ni is measured by the long-capillary (LC) method and by quasielastic neutron scattering. The LC method yields also the interdiffusion coefficient. Whereas the experiments were done in the normal liquid state, the simulations provided the determination of both self-diffusion and interdiffusion constants in the undercooled regime as well. The simulation results show good agreement with the experimental data. 
In the temperature range 3000 K ≥ T ≥ 715 K, the interdiffusion coefficient is larger than the self-diffusion constants. Furthermore, the simulation shows that this difference becomes larger in the undercooled regime. This result can be attributed to a relatively strong temperature dependence of the thermodynamic factor \Phi, which describes the thermodynamic driving force for interdiffusion. The simulations also indicate that the Darken equation is a good approximation, even in the undercooled regime. This implies that dynamic cross correlations play a minor role for the temperature range under consideration. <|endoftext|><|startoftext|> Introduction 1.1. The Multimodality of Globular Cluster Systems One of the most significant developments in the study of extragalactic globular cluster systems (GCSs) was the discovery of bimodality in their color distributions (see Ashman & Zepf 1998; Harris 2001; West et al. 2004 and references therein). Today, we generally refer to globular clusters (GCs) belonging to the blue peak of the color distribution as metal-poor GCs and to the red-peak members as the metal-rich sub-population. It is generally considered that the presence of multiple modes implies multiple distinct GC formation epochs and/or mechanisms and ties those directly into formation scenarios that have to describe the parallel assembly histories of GCSs and the diffuse stellar populations in their host galaxies. In massive early-type galaxies the current GCS assembly paradigms view the origin of the two color peaks from the perspective of either episodic star-cluster formation bursts triggered by gas-rich galaxy mergers (e.g. Ashman & Zepf 1992), temporarily interrupted cluster formation (so-called in-situ formation, e.g. Forbes et al. 1997; Harris et al. 1998), or star-cluster accretion as a result of the hierarchical assembly of galaxies (e.g. Côté et al. 1998). 
While the majority of GCSs in early-type galaxies show clearly bimodal color distributions, the general picture is much more complex, ranging from purely blue to purely red color distributions (e.g. Gebhardt & Kissler-Patig 1999; Kundu & Whitmore 2001a,b; Larsen et al. 2001; Peng et al. 2006). This complexity is exacerbated by the fact that color bimodality is a function of galaxy mass and morphology, as less massive and later-type galaxies tend to have single-mode blue (i.e. metal-poor) GC populations (e.g. Lotz et al. 2004; Sharina et al. 2005; Peng et al. 2006). Furthermore, color bimodality is also a function of galactocentric distance and is mainly due to the more extended spatial distribution of the metal-poor sub-population relative to metal-rich clusters (e.g. Harris & Harris 2002; Rhode & Zepf 2004; Dirsch et al. 2003, 2005). 1.2. Numerical Models of Globular Cluster System Formation GCS formation and assembly have recently entered the domain of numerical simulations of galaxy formation, owing to the increasing spatial resolution of these computations. For instance, Li et al. (2004) model GC formation by identifying absorbing sink particles in their smoothed particle hydrodynamics (SPH) high-resolution simulation of isolated gaseous disks and their mergers. They find a bimodal globular-cluster metallicity distribution in their merger remnant under the assumption of a particular age-metallicity relation. A key finding of their merger simulation is a more concentrated spatial distribution of metal-rich GCs with respect to the metal-poor sub-population, in good agreement with observations. Since their models of isolated galaxies produce a smooth age distribution (implying a smooth metallicity and color distribution), Li et al. conclude that mergers are required to produce a bimodal metallicity (i.e. color) distribution. 
In a more detailed adaptive-grid cosmological simulation, Kravtsov & Gnedin (2005) followed the formation of a star-cluster system during the early evolution of a Milky Way-size disk galaxy to redshift z=3. Their model could reproduce the extended spatial distribution of metal-poor halo globular clusters as observed in M31 and the Milky Way. However, because their simulation does not follow the later evolution at z < 3 it is unclear whether it would produce a metallicity bimodality and any significant age-metallicity relation. An alternative, more statistical approach to modeling GCS assembly is to directly link the mode of GC formation to the star-formation rate in semi-analytic models. Beasley et al. (2002) were the first to explore this path by assuming that metal-poor GCs form in gaseous proto-galactic disks while metal-rich GCs are created during gaseous merger events. Their study showed that the observed globular-cluster color bimodality can only be reproduced by artificially stopping the formation of metal-poor GCs at redshifts z ≳ 5. By construction, no spatial information on metal-rich and/or metal-poor GCs is provided in these models. 1.3. A Spatially Resolved Chemical Evolution Model for Spheroid Galaxies Recently, Pipino & Matteucci (2004, hereafter PM04) presented a spatially resolved chemical evolution model for the formation of spheroids, which successfully reproduces a large number of photo-chemical properties that could be inferred from either the optical or from the X-ray spectra of the light coming from ellipticals. The model includes an initial gas infall and a subsequent galactic wind; it takes into account detailed nucleosynthesis prescriptions of both type-II and Ia supernovae as well as low and intermediate-mass stars. It has been extensively tested against the main photo-chemical properties of nearby ellipticals, including the observed increase of the α-enhancement in their stellar populations with galaxy mass (e.g. Worthey et al. 
1992; Weiss et al. 1995). This is at variance with standard models based on the hierarchical merging paradigm, which do not reproduce this trend (Thomas et al. 2002). Since the PM04 model provides full radial information on the composite nature of stellar populations that make up elliptical galaxies, the observation of different GC sub-populations is a new sanity check for the validity of this model. Moreover, we recall that PM04 and, more recently, Pipino et al. (2006, hereafter PMC06) suggested that elliptical galaxies should form outside-in, namely the outermost regions form faster as well as develop an earlier galactic wind with respect to the central parts (see also Martinelli et al. 1998). This mechanism implies that the stars in massive spheroids form a Composite Stellar Population (CSP), whose chemical properties, in particular their metallicity distribution, change with galactocentric distance. Starting with the assumption that GC sub-populations trace the components of CSPs, we will show how the observed multi-modality in GCSs can be ascribed to the radial variation in the underlying stellar populations. In particular, the observed GCSs are a linear combination of GC sub-populations inhabiting a given projected galactocentric radius. The paper is organized as follows: in Section 2 we briefly describe the adopted theoretical model; in Section 3 we compare the predictions with observations and discuss the implications, while Section 4 presents the final conclusions. 2. The model 2.1. The Chemical Evolution Code The chemical evolution code for elliptical galaxies adopted here is described in PM04, to which we refer the reader for more details. In this work, we present the results for a galaxy with Mlum ∼ 10^{11} M⊙, taken from PM04’s Model IIb. This model is characterized by a Salpeter (1955) IMF, Thielemann et al. (1996) yields for massive stars, Nomoto et al. 
(1997) yields for Type-Ia SNe, and van den Hoek & Groenewegen (1997) yields for low- and intermediate-mass stars. An important feature of the PM04 model is its multi-zone nature, namely the model galaxy is divided into several non-interacting spherical shells of radius ri, which facilitate a detailed study of the radial variation of the photo-chemical properties of the GCS and its host galaxy. In each zone i, the equations for the chemical evolution of 21 chemical elements are solved (see PM04, Matteucci 2001). The model assumes that the galaxy assembles by merging of gaseous lumps on short timescales. The chemical composition of the lumps is assumed to be primordial. In fact, our model assumes that the accretion of primordial gas from the surroundings1 is more efficient in more massive systems, given their higher cross section per unit mass (see PM04). The model galaxy suffers a strong starburst which injects a large amount of energy into the interstellar medium, able to trigger a galactic wind, occurring at different times at different radii, mainly due to the radial variation of the potential well, which is shallower in the galactic outskirts. After the onset of wind activity the star formation is assumed to stop and the galaxy evolves passively with continuous mass loss. In order to correctly evaluate the amount of energy driving the wind, a detailed treatment of stellar feedback is included in the code (that takes into account the stellar lifetimes). In particular, the energy restored to the interstellar medium by both Type-Ia and Type-II supernovae has been calculated in a self-consistent manner according to the time of explosion of each supernova and the characteristics of the ambient medium (see PM04 for details). The potential well that keeps the gas bound to the galaxy is assumed to be dominated by a diffuse and massive halo of Dark Matter surrounding the galaxy. 
In the following we adopt the standard star formation rate ψ∗(t, ri) = ν · ρgas(ri, t) before the onset of the galactic wind (tgw), where ρgas is the gas density and ν the star-formation efficiency; otherwise we assume that ψ∗(t > tgw, ri) = 0. We recall here that the adopted star-formation efficiency is ν = 10 Gyr^{-1}, while the infall timescale is τ = 0.4 Gyr in the galactic core and τ = 0.01 Gyr at one effective radius (of the diffuse light, Reff), respectively. These values were chosen by PM04 in order to reproduce the majority of the chemical and photometric properties of ellipticals such as: the 〈[Mg/Fe]〉−σ (e.g. Faber et al. 1992), the Color-Magnitude (e.g. Bower et al. 1992), the Mass-Metallicity relation (e.g. Gallazzi et al. 2005) as well as the observed gradients in metallicity (e.g. Carollo et al. 1993), 〈[Mg/Fe]〉 (e.g. Mendez et al. 2005), and color (e.g. Peletier et al. 1990). Pipino et al. (2005) recently extended this model to explain also the properties of hot X-ray emitting halos surrounding more massive spheroids. (Footnote 1: Since we lack a cosmological framework we cannot further specify the properties of the infalling primordial …) 2.2. Globular Cluster Formation The formation rate of GCs, ψGC , in the i-th shell is assumed to be directly linked to its star formation rate ψ∗(t, ri, Zi) via a suitable function of time t, radius ri, and metallicity Z, which represents some scaling law between star formation rate ψ∗ and the star cluster formation ψGC and can be regarded as a GC formation efficiency. A similar relation between the average star formation rate per surface area and the star cluster formation was recently found by Larsen & Richtler (2000) to hold in nearby spiral galaxies. In addition, the efficiency of cluster formation in massive ellipticals appears to be constant, where the mass ratio between the mass in star clusters and the baryons locked in field stars+gas is εGC ≈ 0.25% (McLaughlin 1999). 
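As a minimal sketch of the star-formation law adopted in Section 2.1, the piecewise rule ψ∗ = ν · ρgas before the wind and ψ∗ = 0 afterwards can be written as a short Python function. The function name, the gas density, and the wind time t_gw used below are hypothetical illustration values, not quantities from PM04; only ν = 10 Gyr^{-1} is taken from the text.

```python
NU_PER_GYR = 10.0  # star-formation efficiency quoted in the text (Gyr^-1)


def star_formation_rate(t_gyr, rho_gas, t_gw_gyr, nu=NU_PER_GYR):
    """Return psi_*(t, r_i) for one shell.

    psi_* = nu * rho_gas up to the onset of the galactic wind (t <= t_gw)
    and psi_* = 0 once the wind has started (t > t_gw).
    """
    if t_gyr > t_gw_gyr:
        return 0.0
    return nu * rho_gas


if __name__ == "__main__":
    # Hypothetical shell: gas density 2.0 (arbitrary units), wind at 0.5 Gyr.
    for t in (0.1, 0.3, 0.5, 0.8):
        print(t, star_formation_rate(t, rho_gas=2.0, t_gw_gyr=0.5))
```

In the actual model the gas density itself evolves with time and radius, and t_gw is computed self-consistently in each shell; the sketch only isolates the truncation at the wind onset.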
Here we extend this surface density relation to 3-D space. Moreover, PMC06 showed that at a given galactocentric radius model galaxies are made of a CSP, namely a mixture of several simple stellar populations (SSPs), each with a single age and chemical composition. The CSP reflects the chemical enrichment history of the entire system, weighted by the star formation rate. We define the stellar metallicity distribution Υ∗ as the distribution of stars belonging to a given CSP as a function of [Z/H]. We can then write the globular-cluster metallicity distribution ΥGC at a given radius ri and time t as: ΥGC(t, ri, Z) = f(t, ri, Z) · Υ∗(t, ri, Z) , (1) where f includes all the information pertaining to the connection between ψGC and ψ∗. It is not trivial, and beyond the scope of the paper, to find an explicit definition for f(t, ri, Zi), which basically carries the information on the internal physics of gas clouds where GCs are expected to form. In the following we will show that, for a few sensible choices of f, the observed multi-modality in the color distribution of globular clusters may be driven by the radial variations in the stellar population mix of ellipticals. We will first adopt a constant function f (see Sec. 3.1) and then allow f to mildly vary with Z (see Sec. 3.2). No absolute values for f will be given, since our formalism deals with normalized distributions. In particular, the total ΥGC summed over all radial shells can be written as: ΥGC,tot(t, Z) = Σ_i f(t, ri, Z) · Υ∗(t, ri, Z) . (2) Similar equations hold for other GC distributions as a function of either [Mg/Fe] or [Fe/H]. At this stage it is useful to recall that Υ∗(t, ri, Z) can be represented in the following two ways: i) as the fraction of mass of a CSP which is locked in stars at any given metallicity (Pagel & Patchett 1975; Matteucci 2001). 
In the following we refer to this stellar metallicity distribution as Υ∗,m (MSMD: mass-weighted stellar metallicity distribution); ii) as a fraction of luminosity of the CSP in each metallicity bin. This definition is closer to the measurement as it can be directly compared to the luminosity-weighted mean Υ∗,l (LSMD: luminosity-weighted stellar metallicity distribution) at a given radius (see Arimoto & Yoshii 1987; Gibson 1996). This classification is important since PMC06 showed that the Υ∗,m and Υ∗,l might differ, especially at large radii, even for old stellar populations. The advantage of GCSs, for which accurate ages are known, is that they directly probe the mass-weighted distributions. At this point it is useful to recall that the adopted chemical evolution model divides the galaxy into several non-interacting shells. In each shell the time at which the galactic wind occurs is self-consistently evaluated from the local condition. In particular, we follow the suggestion of Martinelli et al. (1998) that gradients can arise as a consequence of a more prolonged SF, and thus stronger chemical enrichment, in the inner zones. In the galactic core, in fact, the potential well is deeper and the supernovae (SNe) driven wind develops later relative to the most external regions. This particular formation scenario leaves a characteristic imprint on the shape of both Υ∗,m and Υ∗,l and here we give some general considerations. In particular, we can explain the slow rise in the low metallicity tail of the distributions as the effect of the initially infalling gas, whereas the onset of the galactic wind sets the maximum metallicity of the Υ∗,m and Υ∗,l. In general, the suggested outside-in formation process is reflected in a more asymmetric stellar metallicity distribution at larger radii, where the galactic wind occurs earlier (i.e. closer to the peak of the star formation rate), with respect to the galactic center. 
The qualitative agreement between these model predictions and the observed stellar metallicity distributions derived at different radii by Harris & Harris (2002, see their Fig. 18) for the stars in the elliptical galaxy NGC 5128 is remarkable. If confirmed from observations in other ellipticals, the expected sharp truncation of Υ∗,m at large radii might be the first direct evidence of a sudden and strong wind which stopped the star formation earlier in the galactic outskirts (see PMC06 and Pipino, D’Ercole, & Matteucci, in preparation). 3. Results and discussion 3.1. The Multi-Modality of Globular Cluster Systems in Elliptical Galaxies The general presence of multi-modal GCSs implies that their host galaxies did not form in a single, isolated monolithic event, but experienced spatially and/or temporally separated star-formation bursts. In the recent past, both semi-analytic and hydrodynamic simulations of galaxy formation attempted to follow the process of GC formation (Beasley et al. 2002; Kravtsov & Gnedin 2005), but neither could produce a clearly bimodal MDF in their simulated GCSs. In this section we will show how to obtain a bimodal metallicity distribution function for GCs ΥGC,tot starting from single-mode stellar metallicity distribution functions Υ∗(t, ri, Z) (commonly known as G-dwarf-like diagrams) for the CSP inhabiting different radii of a prototypical elliptical galaxy according to Equation 2. 3.1.1. The Comparison Sample As stressed in the introduction, the multi-modality in GCSs varies as a function of host galaxy properties (e.g. mass, morphological type, etc.). Here we try to match the distributions resulting from the recent compilation of spectroscopic data by Puzia et al. (2006, hereafter P06), which samples the typical bimodal color distribution of GCs in nearby galaxies (see also Puzia et al. 2004, 2005). 
This is illustrated in Figure 1, where we plot the (V−I)0 color distribution of the P06 sample of GCs in elliptical galaxies (top panel) together with those of GCs in NGC 4472 and the Milky Way (middle and bottom panel). NGC 4472 is the most luminous elliptical in the Virgo galaxy cluster and hosts a GC system with a prototypical color bimodality (e.g. Puzia et al. 1999). To allow direct comparison with the P06 sample, we use GCs in NGC 4472 that are brighter than V ≃ 22.5 since the P06 sample includes only the brightest GCs in nearby early-type galaxies in order to maximize the S/N of their spectra. The resulting color distribution is remarkably similar to the one of the P06 sample, which assures that the P06 sample includes a representative sampling of the GC color bimodality in massive elliptical galaxies. However, the comparison with the Milky Way GCs shows that the P06 sample covers few of the most metal-poor GCs. Therefore, the bimodality which we refer to in the following may not be the same as the one observed in spirals or in some elliptical galaxies, where a substantial population of metal-poor clusters with [Z/H] ≲ −1.5 is present (e.g. Gebhardt & Kissler-Patig 1999). We do not include the dynamical evolution of GCs in our model, since we are considering only the brightest (most massive, i.e. > 10^{5.5} M⊙, see also Puzia et al. 2004, A&A 415, 123) clusters in nearby galaxies as comparison sample. The comparison sample includes GCs much brighter than the typical turnover magnitude of the globular cluster luminosity function and we, therefore, do not expect significant differential dynamical evolution for these massive systems (see Gnedin & Ostriker 1997, for details). In fact, it has been shown (e.g. Fall & Zhang, 2001) that the timescales for both evaporation by two-body relaxation and tidal stripping of star clusters are longer than a Hubble time for GCs more massive than ∼ 10^{5.5} M⊙. 
In our model, the number of clusters formed in a star-formation burst of a given strength is adjusted to match the observations. Hence, the absolute scaling of GC numbers is arbitrary, i.e., within physical limitations of the star formation rate any number of GCs can be reproduced by adjusting the function f in Equation 2. If, however, metal-poor and metal-rich GCs are on systematically different orbits and experience significantly different dynamical evolutions, the effect of tidal disruption might be slowly changing relative GC numbers with time. Another complication is the variation of the initial star-cluster mass function, in particular as a function of metallicity. Modeling these effects requires detailed knowledge of the orbital characteristics and chemo-dynamical processes that lead to star cluster formation, and goes far beyond the scope of this work. We keep these potential systematics in mind, but expect negligible impact on our analysis. 3.1.2. A Simple Model In order to make a first-order comparison between our model predictions and the observed ΥGC,tot at t=13 Gyr, we first focus on the simple case in which: ΥGC,tot(Z) = fred · Υ∗(t = 13 Gyr, r1, Z) + fblue · Υ∗(t = 13 Gyr, r2, Z) , (3) with fred, fblue = const and r1 = 0.1 Reff, r2 ≥ 1 Reff. The first term (red) corresponds to a population typical of the galaxy core (well inside r < 1 Reff). The second term (blue) represents Υ∗ in the outer regions. In order to take into account the different amounts of stars formed in each galactic region, we point out that the stellar metallicity distributions entering Equation 2 were not normalized. As a first step, we take several values for the weights fred, fblue in order to mimic different mixtures of the two GC populations. 
In particular, we used the relative numbers of the red (here identified as the metal-rich core population) and the blue globular clusters (the halo metal-poor population) as a function of galactocentric radius for the elliptical galaxy NGC 1399 (Dirsch et al. 2003). Our particular choice is driven by observationally motivated values for the weights, although we realize that NGC 1399 is a quite peculiar, massive cD elliptical and might not be representative of less massive systems. In the context of this first step, the weights might reflect the effects of the projection on the sky of a three-dimensional structure. However, we show below that the results do not strongly depend on the weights. Therefore they might be interpreted as mean values and could be changed if one decides to model a particular galaxy, with a different ellipticity, inclination, and luminosity profile. In Figure 2 we show the globular-cluster metallicity distribution ΥGC by mass (computation based on Υ∗,m) in two radial bins for three particular choices of weights. In particular, in the following we will use the ratios fred = 0.77, fblue = 0.23; fred = 0.60, fblue = 0.40; and fred = fblue = 0.50 in order to define the theoretical innermost, intermediate and outermost sub-sample of the GCS, respectively. These ΥGC will be compared against subsamples of the P06 data, obtained by selecting GCs with either r < 1 Reff (in the case of the innermost population) or r ≥ 1 Reff (for the intermediate and the outermost cases, respectively), unless otherwise stated. 3.1.3. Globular Cluster Metallicity Distribution In order to plot the different cases on the same scale we normalize each ΥGC by its maximum value. In the left panel of Figure 2 the shaded histogram represents the innermost population. Our predictions match the data very well, especially in the metal-rich slope and the mean of the distribution. 
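The two-component mixture of Equation 3, followed by the normalization to the maximum value described above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for Figure 2: the function name and the two input histograms are hypothetical, and only the three weight pairs (0.77/0.23, 0.60/0.40, 0.50/0.50) come from the text.

```python
def mix_distributions(core, outer, f_red, f_blue):
    """Weighted sum of two binned metallicity distributions (Equation 3),
    rescaled so that the highest bin equals 1."""
    if len(core) != len(outer):
        raise ValueError("both histograms must share the same [Z/H] bins")
    total = [f_red * c + f_blue * o for c, o in zip(core, outer)]
    peak = max(total)
    if peak <= 0.0:
        raise ValueError("mixture is empty; nothing to normalize")
    return [value / peak for value in total]


if __name__ == "__main__":
    # Hypothetical, un-normalized star counts per [Z/H] bin (low to high Z).
    core = [0.0, 1.0, 3.0, 8.0, 4.0]   # stands in for Upsilon_* at r1
    outer = [2.0, 6.0, 3.0, 1.0, 0.0]  # stands in for Upsilon_* at r2
    for f_red, f_blue in [(0.77, 0.23), (0.60, 0.40), (0.50, 0.50)]:
        mixed = mix_distributions(core, outer, f_red, f_blue)
        print(f_red, f_blue, [round(v, 2) for v in mixed])
```

Keeping the inputs un-normalized mirrors the remark in Section 3.1.2 that the stellar distributions carry the different amounts of stars formed in each region; only the final mixture is rescaled for plotting.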
The same happens for the pure core population, which shows how the GCS might be used to probe the CSP in ellipticals. It should be remarked that a second peak centered at super-solar metallicity appears in the distribution predicted by our models, although it is not evident in the data of the particular radial sub-sample. The right panel of Figure 2 illustrates model predictions which are more representative of the galaxy as a whole (either at 1 Reff, i.e. the intermediate population, or at several effective radii, the outermost population), and we consider them as the fiducial case. These two cases look quite similar to each other and have clear signs of bimodality in remarkable agreement with the spectroscopic data (solid empty histogram, sub-sample of the P06 data with r ≥ Reff). A Kolmogorov-Smirnov test returns > 99% probability that both model predictions and observations are drawn from the same parent distribution in the left panel of Figure 2. The right panel statistic gives a lower likelihood of 98.4% that both distributions have the same origin, which is mainly due to the observed excess of metal-poor GCs at large galactocentric radii compared to the model predictions. The prediction of a super-solar metallicity globular cluster sub-population is entirely new and a result of the radially varying and violent formation of the parent galaxy. Moving to the low-metallicity tail, we predict slightly fewer low metallicity objects than expected from observations. But we recall the systematics mentioned in Section 3.1.1. In Figure 3 (left panel) we show the results for a pure core GC population, namely one in which we adopt fred : fblue = 1 : 0. In this quite extreme case the observed GCs have been selected with radius r < 0.5 Reff. The histogram reflects the shape of a G-dwarf-like diagram expected for a typical CSP inhabiting the galactic core. 
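For readers unfamiliar with the Kolmogorov-Smirnov comparison quoted above, the two-sample statistic D (the largest vertical distance between the two empirical cumulative distributions) can be computed as in the following Python sketch. The function name and the [Z/H] values are hypothetical, and the paper does not state which implementation was used; turning D into a probability requires the K-S distribution, available for instance through `scipy.stats.ks_2samp`, which is not reproduced here.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of
    the two samples; it is enough to evaluate it at the data points.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)  # fraction of a <= x
        cdf_b = bisect.bisect_right(b, x) / len(b)  # fraction of b <= x
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


if __name__ == "__main__":
    # Hypothetical [Z/H] values for model-drawn and observed clusters.
    model_zh = [-1.2, -0.8, -0.5, -0.3, 0.0, 0.2]
    observed_zh = [-1.4, -0.9, -0.6, -0.2, 0.1]
    print(ks_statistic(model_zh, observed_zh))
```

A small D means the two cumulative distributions track each other closely, which is the sense in which the text reports high probabilities of a common parent distribution.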
This finding is particularly important, because it might offer the opportunity to resolve the SSPs in ellipticals, at variance with data coming from the integrated spectra which deal with luminosity-weighted quantities. In Figure 3 (right panel), instead, the intermediate population is compared to a sub-sample of P06 GCs with 0.5 < r < 0.5Reff . This is to show that the multimodality is not an artifact due to the particular radial binning adopted in this paper. Figure 4 shows the V-band luminosity-weighted ΥGC for which the computation is based on Υ∗,l. This metallicity distribution has been obtained by converting the mass in each [Z/H] bin of the previous figure into LV, by means of the M/LV ratio computed by Maraston (2005) as a function of [Z/H] for 13 Gyr old SSPs. Due to the well-known increase of the M/LV in the high metallicity tail of the distribution (footnote 2), we notice in Figure 4 that now the second peak has a smaller intensity in all the cases. The corresponding diffuse-light population goes undetected in integrated-light studies. In any case, the conclusions reached by analyzing Figure 2 are not significantly altered. We conclude that our analysis is not significantly biased by some metallicity effect which may alter the shape of the observed ΥGC,tot by luminosity. We stress the power of GCSs in disentangling stellar sub-populations in massive ellipticals, due to their nature as simple stellar populations that can be directly compared to SSP model predictions, unlike diffuse light measurements. Even with this simple parametrization, where f in Equation 3 does not depend on metallicity, we suggest that the bimodality for the metal-rich GCs is the result of different shapes of Υ∗,m (and the Υ∗,l) at different galactocentric radii. (Footnote 2: See PMC06 for a comparison between G-dwarf-like diagrams for Υ∗,m and Υ∗,l predicted for the same …) 3.1.4. 
Globular Cluster [Mg/Fe] Distributions Finally, in Figure 5 we show the [Mg/Fe] distributions for GCs divided in radial bins as in Figure 2. According to PMC06, these [Mg/Fe] distributions are narrower, more symmetric, and exhibit a smaller radial variation with respect to the [Z/H] distributions. In any case, a small degree of bimodality is still present. We point out the impressive agreement with the spectroscopic observations by P06. There is a rather large discrepancy between data and models at the high-[Mg/Fe] end. Hence, the corresponding Kolmogorov-Smirnov likelihood tests return a probaility of 1% (for inner sample) and 95% (for the outer ones). If we limit the model predictions to [Mg/Fe] < 0.8 dex, the agreement slightly improves, reaching a 10% probablity in the inner region. However, the inner field data still does not reach the extreme [Mg/Fe] values of the models. In fact, due to the monotonic decrease of the [Mg/Fe] as a function of either metallicity or time (see PM04), the lack of low-metallicity GCs, evident from Figure 2, translates into a lack of α-enhanced clusters. The [Mg/Fe] bimodality of our model predictions and the match with the spectroscopic measurements strongly imply that globular clusters in massive elliptical galaxies form on two different timescales. Their chemical compositions are consistent with an early mode with a duration of ∆t . 100 Myr and a normal formation that lasted for ∆t . 500 Myr. In fact, according to the time-delay model (see Matteucci, 2001) and given the typical star formation history of our model ellipticals, the [Mg/Fe] ratio in the gas - out of which the GCs form - is quickly and continuously decreasing with time. We predict that the [Mg/Fe] ratio can be higher than 0.65 dex (i.e. in the bins in which our predictions exhibit a deficit of GCs with respect to the observed distribution) only in the first ∼ 100 Myr (see also Fig.3 in P06 and related discussion). 
In fact, such a high value for the [Mg/Fe] can be attained only if very massive type II SNe contribute to the chemical evolution, without any contribution from either lower-mass type II or type Ia SNe. The normal formation, instead, is the one already plotted in Fig. 5 and forms on a typical timescale of 0.5−0.7 Gyr. More quantitatively, our initial theoretical GCMD predicts that only ∼4% of the GCS forms at [Mg/Fe] larger than 0.65 dex. In order to improve the agreement with observations we require that the above fraction should be increased to ∼12−15%. Since star and globular cluster formation are expected to be closely linked (e.g. Chandar et al. 2006) the same must be true for the diffuse stellar population of such galaxies. We therefore foresee the presence of a similar [Mg/Fe] bimodality in the diffuse light of massive elliptical galaxies. Unfortunately, there are no direct observations that can confirm our suggestions, and there will not be until the metallicity distributions for the diffuse stellar component in ellipticals become available for a number of galaxies. Indeed, the detection of bimodality in the [Mg/Fe] distribution might be a benchmark test for our predictions.

We will still refer to multiple GC sub-populations. However, their differences ought to be ascribed only to the fact that they are created during an extended (and intense) star formation event during which the variation in chemical evolution is not negligible. The radial differences originate from the fact that the galactic wind epoch is tightly linked to the potential, occurring later in the innermost regions (e.g. Carollo et al. 1993; Martinelli et al. 1998). Moreover, PM04 and PMC06 found that the infall timescale is also linked to the galactocentric radius. In particular, it lasts longer in the more internal regions, owing to the continuous gas flows in the center of the galactic potential well.
In particular, we recall that in our model the core experiences a longer (∼0.7 Gyr) star-formation history with respect to the outskirts, where the typical star-formation timescale is ∼0.2 Gyr. According to our models the metal-rich population of GCs in massive elliptical galaxies may consist of multiple sub-populations which basically play the same role as the CSPs populating each galactocentric shell in our framework of the global galaxy evolution. At the same time, we point out the failure of our models to produce a significant fraction of metal-poor GCs similar to the halo GC population in the Milky Way, with the caveat that the star formation histories are very different. This, in turn, suggests that GCSs in giant elliptical galaxies were assembled by accretion of a significant number of metal-poor GCs.

3.2. Metallicity Dependent Globular Cluster Formation

In this section we explore the approach outlined in Equation 2, by introducing the effect of metallicity in the function f. In particular, we start by assuming that

$$\frac{f(t, r_i, [Z/H] < -1)}{f(t, r_i, [Z/H] > -1)} = 2 , \tag{4}$$

roughly following what was found for the GCS of the most nearby giant elliptical galaxy NGC 5128 (Centaurus A) by Harris & Harris (2002, hereafter HH02) for their “inner field”, regardless of the radius of the i-th shell. Since the final distributions are normalized, the actual zeropoint of the function f is not relevant. Our model predictions are plotted in Figure 6. We notice a modest increase of the low-metallicity tail of the distribution with respect to the simple picture sketched in Section 3.1 without metallicity dependence, as well as a lower fraction of globular clusters populating the high-metallicity peak. Including the metallicity dependence leads to an ambiguous change in agreement with the general observed trend. Hence, no firm conclusions on the real need for a metallicity dependence can be drawn.
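The effect of Equation 4 on a binned distribution, and the statement that the zeropoint of f drops out after normalization, can be illustrated with a short sketch. The function name, its parameters, and the three-bin toy distribution are hypothetical and chosen only for illustration; they are not the model output, and the treatment of a bin lying exactly at [Z/H] = −1 (here left unboosted) is an assumption.

```python
def reweight_by_metallicity(zh_centers, mass_fractions, boost=2.0, zeropoint=1.0):
    """Apply a step-like f([Z/H]) to a binned distribution and renormalize."""
    weighted = []
    for zh, mass in zip(zh_centers, mass_fractions):
        # Bins with [Z/H] < -1 receive `boost` times the weight of the others,
        # which is the ratio assumed in Equation 4 when boost = 2.
        f = zeropoint * boost if zh < -1.0 else zeropoint
        weighted.append(f * mass)
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical three-bin distribution: bin centers in dex, fractions by mass.
zh_centers = [-1.5, -0.5, 0.5]
mass_fractions = [0.1, 0.5, 0.4]
print(reweight_by_metallicity(zh_centers, mass_fractions))
```

Because the result is divided by its own sum, any value of `zeropoint` returns the same normalized distribution; only the ratio between the metal-poor and metal-rich weights matters.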
Similar results are obtained in the more realistic case in which f is a linearly decreasing function of [Z/H].

At variance with HH02, we chose to adopt the same scaling irrespective of galactocentric radius, for the following reason. Despite the fact that the HH02 stellar metallicity distributions Υ∗ as functions of [Z/H] confirm both the shape and the radial behavior of our model predictions for the mass-weighted stellar metallicity distribution Υ∗,m (compare their Fig. 7 with PMC06 Fig. 4), care should be taken when comparing their results for Υ∗ as a function of [Fe/H]. The latter, in fact, had been obtained by assuming a particular trend in the [α/Fe] as a function of galactocentric radius which disagrees with the results of our detailed chemical evolution model. In particular, we find an offset of at least 0.2 dex, in the sense that [Fe/H]_HH02 ∼ 0.2 + [Fe/H]_PM04 at a given metallicity ([Z/H]). This disagreement becomes larger either at very low metallicity or at larger galactocentric radii, where we expect a stronger α-enhancement. Once the PM04 value for [Fe/H] is adopted in Fig. 18 of HH02, we find that: i) for the inner halo, the stellar Υ∗,m should be shifted by ∼0.2 dex toward lower metallicities, removing any metallicity effect, and ii) for the outer halo the discrepancy between the stellar Υ∗,m and the ΥGC,tot should be reduced.

Nevertheless, we believe that some decrease with time of the function f could be motivated by theoretical arguments. In fact, recent work (e.g. Elmegreen & Efremov 1997; Elmegreen 2004) shows that GCs of all ages preferentially form in turbulent high-pressure regions. If we interpret the decrease in the efficiency of star formation (inside the gas clouds that form GCs) as a function of the ambient pressure (Elmegreen & Efremov 1997) as a proxy for the temporal behaviour of our function f, we find again a reduction of a factor ∼2−3 from the early high-pressure epochs to a late, more quiescent evolutionary phase.
3.2.1. The Ratio of Metal-poor to Metal-rich Globular Clusters

For our fiducial model we predict a ratio of metal-poor (namely with [Z/H] ≤ −1) to metal-rich GCs of ∼0.2. Previous photometric surveys found that the typical value for GC systems in elliptical galaxies is close to unity (Gebhardt & Kissler-Patig 1999; Kundu & Whitmore 2001a). Provided a linear color-metallicity transformation (see also Section 3.5), a possible explanation for the discrepancy between our models and the observations might be obtained by boosting the metal-poor population by a factor of

f(t, ri, [Z/H]<−1)/f(t, ri, [Z/H]>−1) ≥

Another way to solve the discrepancy is to assume that all the missing globular clusters have been accreted from the surroundings, e.g. from dwarf satellites (e.g. Côté et al. 1998). We estimate the amount of the accreted metal-poor GCs, needed to achieve a ratio close to 1, as a factor of ∼4 of the number of globular clusters initially formed inside the galaxy.

3.3. The Role of the Host Galaxy Mass

A natural consequence of the scenario depicted in Sections 3.1 and 3.2 is that, at a given galactocentric radius, the mean metallicity and [α/Fe] ratios of a GCS coincide with the mass-weighted [〈Z/H〉∗] and [〈α/Fe〉∗] of the underlying stellar population, because the GC quantities are calculated either from Υ∗,m or from Υ∗,l (see Eqs. 1 and 2 of PMC06), unless the scaling function f is allowed to vary strongly with time. We expect this to happen at least in the innermost GC sub-populations, in which the effects of the accretion of GCs from the environment can be reasonably neglected. In particular, PM04 predict that more massive galaxies should show higher [〈α/Fe〉∗] and [〈Z/H〉∗]. If accretion plays a negligible role, we expect the same correlations for the total GC population with host galaxy mass for the most massive systems, in agreement with current observations (e.g. van den Bergh 1975; Brodie & Huchra 1991; Peng et al. 2006).
In fact, if we perform the same study of the above sections for a 10^12 M⊙ galaxy (see Table 2 of PM04 for its properties), both peaks in ΥGC,tot shift their positions by about 0.2 dex to higher [Z/H]. This is in good agreement with the results of Peng et al. (2006, see their Figures 13 and 14). This trend holds for smaller objects as well. If we apply the procedure to a 10^10 M⊙ galaxy (Model IIb of PM04), we find that the metal-rich peak shifts towards a lower metallicity by 0.3 dex (with respect to our fiducial model with Mlum = 10^11 M⊙), while the other peak is now centered around [Z/H] = −0.8 dex. In particular, we find a faster decrease in the mean metallicity of the metal-poor GCs than for the metal-rich ones, again in agreement with the Peng et al. results. Interestingly, the ratio of metal-poor to metal-rich clusters increases up to ∼0.5 for less massive halos. We recall that in the PM04 scenario, the low-mass galaxies are those forming on longer timescales and with a slower infall rate. Therefore, we suggest that the combination of these factors is likely to at least partly explain the change of the GC distributions in different galaxy morphologies. This is especially the case in dwarf galaxies, where star formation is slow and still on-going, together with the fact that the probability for a substantial change in the pressure of the interstellar medium relative to its initial values is higher than in ellipticals, thus implying a much stronger variation of f with time.

3.4. Merger-Induced Globular Cluster Formation

It has been suggested that GC populations are produced during major merger events which would lead to present-day ellipticals and their rich GCSs (e.g. Schweizer 1987; Ashman & Zepf 1992). Subsequent studies (e.g. Forbes et al. 1997; Kissler-Patig et al.
1998b) challenged this view by pointing out the much higher specific frequency S_N and the more metal-rich GCSs in early-type galaxies compared to those of spiral and irregular galaxies, which are thought to represent the early building blocks of massive ellipticals.

In the following, we study the impact of the merger hypothesis on the predictions of our simulations. In order to do that, we extended the procedure sketched in the previous sections to the merger models presented by Pipino & Matteucci (2006, hereafter PM06). In that paper, the effects of late gas accretion episodes and subsequent merger-induced starbursts on the photo-chemical evolution of elliptical galaxies were studied and compared to the picture of galaxy formation emerging from PM04; in particular the PM04 best model is taken here as a reference. By means of the comparison with the colour-magnitude relations and the [〈Mg/Fe〉V]-σ relation observed in ellipticals (e.g. Renzini 2006), PM06 showed that either bursts involving a gas mass comparable to the mass already transformed into stars during the first episode of star formation and occurring at any redshift (major mergers), or bursts occurring at low redshift (i.e. z ≤ 0.2) and with a large range of accreted mass (minor mergers), are ruled out. The reason lies in the fact that the chemical abundances in the ISM after the galactic wind (and before the occurrence of the merger) are dominated by Type Ia SN explosions, which continuously enrich the gas with their ejecta (mainly Fe). When the merger-induced starburst occurs, most stars form out of this enriched gas (thus, e.g., lowering the total [〈Mg/Fe〉]); at the same time, we expect the metallicities of GCs formed out of this gas to be on average higher and their [Mg/Fe] ratios to be lower than those of the bulk of stars and GCs formed in the initial starburst.

In this work we present the case in which the galaxy accretes a gas mass Macc = Mlum at tacc = 2 Gyr (i.e.
∼1 Gyr after the onset of the galactic wind). We make this choice for several reasons: i) This model is quite similar to the PM06 models b and g, which were among those in good agreement with observations of the diffuse galaxy light properties. ii) The formation epoch of the bulk of these second generation GCs cannot occur ≳2 Gyr later than tgw, because the majority of GCs in the most massive elliptical galaxies studied today appear old within the age resolution of current photometric (∆t/t ≈ 0.4−0.5) and spectroscopic studies (∆t/t ≈ 0.2−0.3). Finally, the composition of the newly accreted gas is assumed to be primordial (see PM06 for a detailed discussion), but we remark that we reach roughly the same conclusion in the case of solar composition, adopted in order to mimic some pre-enrichment for the newly accreted gas.

We point out that, lacking dynamics, PM06 presented their results for one-zone models. Therefore, in this section we are considering Equation 2 limited to only one shell. In this way we can check whether the single merger hypothesis alone is enough to produce some bimodality in the total globular-cluster metallicity distribution ΥGC,tot, and if it is consistent with the predictions based on our fiducial model described in Section 3.2.

We show our results in Figures 7 and 8. We notice a clear change in the overall shape of the metallicity distribution ΥGC,tot with respect to the cases shown in the previous sections, in the sense that now ΥGC,tot is narrower and dominated by objects with super-solar metallicity (and sub-solar [Mg/Fe] ratios) with a dominant population at [Z/H] ≈ 0.1, which is not prominent in the observations of P06. The high-metallicity globular cluster populations dominate the metallicity distribution, which is at variance with both the results from previous sections and the observations. Our merger model does not include the accretion of globular clusters that were already formed within the accreted satellite galaxies.
The inclusion of this effect could improve the match between models and observations at low metallicities, as the typical GC in a dwarf galaxy is metal-poor (e.g. Lotz et al. 2004; Sharina et al. 2005) and their addition to the initial GC population would enhance the total number of metal-poor GCs and improve the fit to the data. However, these clusters need to be α-enhanced to match the observations. The impact of GC accretion on our post-merger model predictions will be studied in detail in a future paper. Here, we remark that the time at which a subsequent purely gaseous merger event (which does not import already formed GCs) can occur is limited by the onset of the galactic wind, after which type Ia SNe dominate the nucleosynthesis; the event needs to be completed within tmrg ≲ 1−2 Gyr after the first starburst. However, this time constraint implies that such merger events would overlap with the initial starburst and be mostly indistinguishable from each other. Such a scenario closely resembles the Searle-Zinn scenario (Searle & Zinn 1978), in which galaxy halos are formed from the agglomeration of gaseous protogalactic fragments. Later merger events are excluded in our models, as they would produce GCs with sub-solar [Mg/Fe] ratios, which is at variance with the observations.

Note also that the fraction of GCs at [Z/H] < −1 can be recovered in our models only if the cluster formation at low metallicity is enhanced (e.g. using a value of 10 instead of 2 in Eq. 4). However, even in the case in which we adopt some f(Z) strongly declining with total metallicity, which may alter the shape of ΥGC,tot by enhancing the low-metallicity tail and thus improving the agreement with observations, the position of both super-solar metallicity peaks will not change, remaining at variance with the data.

3.5. Other Mechanisms Responsible for Multimodality

The picture emerging from our analysis is far from being the general solution to explaining the complexity of GC color distributions, and it suggests only a scheme in which multiple mechanisms could be at work together, either broadening or adding features to the observed distributions.

For instance, Yoon et al. (2006) suggested that the color bimodality could arise from the presence of hot horizontal-branch stars (so far not accounted for in SSP model predictions) that results in a non-linear color-metallicity transformation producing two color peaks from an originally single-peak metallicity distribution. We tested this scenario on our fiducial model GC metallicity distribution, by applying to each SSP the following transformation from [Fe/H] to the (g − z) color:

$$(g - z) = \alpha + \beta\,[\mathrm{Fe/H}] + \gamma\,[\mathrm{Fe/H}]^2 + \delta\,[\mathrm{Fe/H}]^3 + \epsilon\,[\mathrm{Fe/H}]^4 \tag{6}$$

The numerical values of the coefficients are given in Table 1; the relation was adopted from Yoon et al. (2006) and is consistent with the best-fit relation presented in their Figure 1b. We show our results in Figure 9. Since we start from symmetric metallicity distributions, the non-linear transformation seems to work and produce a color bimodality for the CSP inhabiting the < 10Reff shell (Figure 9, solid line), although the bimodality is slightly exaggerated compared to real data (see Figure 1). In fact, a look at the color distribution which we obtained for the sole 0.1Reff shell reveals that it still has one peak and is roughly symmetric (Figure 9, dashed line). Obviously, since the (g − z)−[Fe/H] relationship is meant to explain the GC color bimodality without invoking any other effect, we did not combine the two histograms, either according to Eq. 2 or to Eq. 3 in our models, as we are comparing metallicity distributions to spectroscopic measurements.
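As an illustration of the color-metallicity transformation discussed above, the following sketch evaluates (g − z) from [Fe/H] with the Table 1 coefficients. It assumes that the relation is a quartic polynomial in [Fe/H], with α as the constant term and ε multiplying the fourth power; the function and variable names are hypothetical and not part of the original analysis.

```python
# Coefficients of Table 1, ordered from the constant term (alpha)
# to the assumed quartic term (epsilon).
COEFFS = [1.5033, 0.172774, -0.623522, -0.453331, -0.089038]


def g_minus_z(fe_h, coeffs=COEFFS):
    """Evaluate the polynomial color-metallicity relation with Horner's scheme."""
    color = 0.0
    # Start from the highest-order coefficient and fold in one power at a time.
    for c in reversed(coeffs):
        color = color * fe_h + c
    return color


for fe_h in (-2.0, -1.0, 0.0):
    print(fe_h, round(g_minus_z(fe_h), 3))
```

Under this assumption [Fe/H] = 0 maps onto the constant term, (g − z) = 1.5033. Applying the function to every bin of a metallicity histogram gives the corresponding color distribution.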
It is of great importance to investigate this transformation with large and homogeneous data sets that cover a wide enough metallicity range to allow a robust analysis of the non-linear inflection point in the color-metallicity transformation. However, as a result of Figure 9, we point out that the color bimodality typically found for GCSs in massive early-type galaxies might be only partly due to a non-linear color-metallicity transformation.

Another effect put forward by, e.g., Recchi & Danziger (2005) is the claim that GCs might have undergone a self-enrichment phase at the early stages of their formation, and therefore some GCs could have experienced a boost in metallicity which would not be representative of the metallicity of their parent gas cloud. Finally, as already mentioned above in Section 3.4, some GCs residing in the outermost regions of the galaxies (e.g. Lee et al. 2006) could have experienced entirely different chemical enrichment histories at the time of their formation and later been added to a more massive system through accretion (e.g. Côté et al. 1998). The inclusion of these effects goes far beyond the scope of this work, but we remind the reader that all the aforementioned effects might influence the interpretation of any globular cluster color and metallicity distribution.

Table 1. Numerical values of coefficients used in Equation 6.

coefficient   numerical value
α             1.5033
β             0.172774
γ             −0.623522
δ             −0.453331
ε             −0.089038

4. Conclusions

By means of the comparison between PM04’s best model predictions for the radial changes in the CSP chemical properties and the recent spectroscopic data on the metallicity distributions of extragalactic GCSs from Puzia et al. (2006), we are able to derive some conclusions on the GC metallicity distributions in massive elliptical galaxies. In particular, we focused on the main drivers of the multi-modality that is observed in the majority of GCSs in massive elliptical galaxies.
Our main conclusions are:

• We show that the observed multi-modality in the GC metallicity distributions can be ascribed to the radial variation in the underlying stellar populations in giant elliptical galaxies. In particular, the observed GCSs are consistent with a linear combination of the GC sub-populations inhabiting different galactocentric radii projected on the sky.

• A new prediction of our models, which is in astonishing agreement with the spectroscopic observations, is the presence of a super-solar metallicity mode that seems to emerge in the most massive elliptical galaxies. In smaller objects, instead, this mode disappears quickly with decreasing stellar mass of the host galaxy.

• Our models successfully reproduce the observed [Mg/Fe] bimodality in GCSs of massive elliptical galaxies. This, in turn, suggests a bimodality in formation timescales during the early formation epochs of GCs in massive galaxy halos. The two modes are consistent with an early (initial) and later (triggered) formation mode.

• Since the GC populations trace the properties of galactic CSPs in our scenario, we predict an increase of the mean metallicity of the cluster system with the host galaxy mass, which closely follows the mass-metallicity relation for ellipticals. Moreover, we expect that a major fraction of the GCs (i.e. those born inside the galaxy) follows an age-metallicity relationship, in the sense that the older ones are also more α-enhanced and more metal-poor.

• The role of host galaxy metallicity in shaping the observed GC metallicity distribution is non-negligible, although its effects have been estimated to change the function f ≃ ψGC/ψ∗ by a factor of ∼2−5, in order to match the sample of Puzia et al. (2006).
Either a non-linear color-metallicity transformation, or a stronger metallicity effect, and/or accretion of GCs from the surrounding environment is needed to explain a ratio of metal-poor to metal-rich GCs close to unity, as reported for ellipticals based on results from photometric surveys.

• Merger models which include the later accretion of primordial and/or solar-metallicity gas predict a shape for the GC metallicity distribution which is at variance with the spectroscopic observations.

We thank the referee for a careful reading of the paper. A.P. warmly thanks S. Recchi for useful discussions. A.P. acknowledges support by the Italian Ministry for University under the COFIN03 prot. 2003028039 scheme. T.H.P. acknowledges support by NASA through grants GO-10129 and GO-10515 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA Contract NAS5-26555, and the support in form of a Plaskett Research Fellowship at the Herzberg Institute of Astrophysics.

REFERENCES

Arimoto, N., & Yoshii, Y. 1987, A&A, 173, 23
Ashman, K. M., & Zepf, S. E. 1992, ApJ, 384, 50
Ashman, K. M., & Zepf, S. E. 1998, Globular Cluster Systems, Cambridge University Press
Beasley, M. A., Baugh, C. M., Forbes, D. A., Sharples, R. M., & Frenk, C. S. 2002, MNRAS, 333,
Bower, R. G., Lucey, J. R., & Ellis, R. S. 1992, MNRAS, 254, 601
Bressan, A., Chiosi, C., & Fagotto, F. 1994, ApJS, 94, 63
Brodie, J. P., & Huchra, J. P. 1991, ApJ, 379, 157
Brodie, J. P., & Strader, J. 2006, astro-ph/0602601 (http://arxiv.org/abs/astro-ph/0602601)
Carollo, C. M., Danziger, I. J., & Buson, L. 1993, MNRAS, 265, 553
Chandar, R., Fall, S. M., & Whitmore, B. C. 2006, ApJ, 650, L111
Côté, P., Marzke, R. O., & West, M. J. 1998, ApJ, 501, 554
Dirsch, B., Richtler, T., Geisler, D., Forte, J. C., Bassino, L. P., & Gieren, W. P. 2003, ApJ, 125,
Dirsch, B., Schuberth, Y., & Richtler, T. 2005, A&A, 433, 43
Elmegreen, B. G. 2004, ASP Conf. Ser. 322: The Formation and Evolution of Massive Young Star Clusters, 322, 277
Elmegreen, B. G., & Efremov, Y. N. 1997, ApJ, 480, 235
Elmegreen, B. G., & Scalo, J. 2004, ARA&A, 42, 211
Faber, S. M., Worthey, G., & Gonzalez, J. J. 1992, in IAU Symp. n.149, eds. B. Barbuy & A. Renzini, p. 255
Fall, S. M., & Zhang, Q. 2001, ApJ, 561, 751
Forbes, D. A., Brodie, J. P., & Grillmair, C. J. 1997, AJ, 113, 1652
Gallazzi, A., Charlot, S., Brinchmann, J., White, S. D. M., & Tremonti, C. A. 2005, MNRAS, 362,
Gebhardt, K., & Kissler-Patig, M. 1999, AJ, 118, 1526
Gibson, B. K. 1996, MNRAS, 278, 829
Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
Greggio, L. 1997, MNRAS, 285, 151
Harris, W. E. 1991, ARA&A, 29, 543
Harris, W. E. 1996, AJ, 112, 1487
Harris, W. E. 2001, in Saas-Fee Advanced School on Star Clusters, ed. L. Labhardt & B. Binggeli (course 28), Springer, New York
Harris, W. E., Harris, G. L. H., & McLaughlin, D. E. 1998, AJ, 115, 1801
Harris, W. E., & Harris, G. L. H. 2002, AJ, 123, 3108
Holtzman, J. A., et al. 1992, AJ, 103, 691
Kissler-Patig, M., Brodie, J. P., Schroder, L. L., Forbes, D. A., Grillmair, C. J., & Huchra, J. P. 1998a, AJ, 115, 105
Kissler-Patig, M., Forbes, D. A., & Minniti, D. 1998b, MNRAS, 298, 1123
Kravtsov, A. V., & Gnedin, O. Y. 2005, ApJ, 623, 650
Kundu, A., & Whitmore, B. C. 2001a, AJ, 121, 2950
Kundu, A., & Whitmore, B. C. 2001b, AJ, 122, 1251
Larsen, S. S., Brodie, J. P., Huchra, J. P., Forbes, D. A., & Grillmair, C. J. 2001, AJ, 121, 2974
Larsen, S. S., & Richtler, T. 2000, A&A, 354, 836
Lee, J.-W., López-Morales, M., & Carney, B. W. 2006, ApJ, 646, L119
Li, Y., Mac Low, M.-M., & Klessen, R. S. 2004, ApJ, 614, L29
Lotz, J. M., Miller, B. W., & Ferguson, H. C. 2004, ApJ, 613, 262
Lutz, D. 1991, A&A, 245, 31
Maraston, C. 2005, MNRAS, 362, 799
Martinelli, A., Matteucci, F., & Colafrancesco, S. 1998, MNRAS, 298, 42
Matteucci, F. 2001, The chemical evolution of the Galaxy, Kluwer Academic Publishers, Dordrecht
McLaughlin, D. E. 1999, AJ, 117, 2398
Mendez, R. H., Thomas, D., Saglia, R. P., Maraston, C., Kudritzki, R. P., & Bender, R. 2005, ApJ, 627, 767
Nomoto, K., Hashimoto, M., Tsujimoto, T., Thielemann, F. K., Kishimoto, N., Kubo, Y., & Nakasato, N. 1997, Nuclear Physics A, A621, 467
Osterbrock, D. E., & Ferland, G. J. 2006, Astrophysics of gaseous nebulae and active galactic nuclei, 2nd. ed. by D.E. Osterbrock and G.J. Ferland. Sausalito, CA: University Science Books, 2006,
Pagel, B. E. J., & Patchett, B. E. 1975, MNRAS, 172, 13
Peletier, R. F., Davies, R. L., Illingworth, G. D., Davis, L. E., & Cawson, M. 1990, AJ, 100, 1091
Peng, E. W., et al. 2006, ApJ, 639, 95
Pipino, A., & Matteucci, F. 2004, MNRAS, 347, 968 (PM04)
Pipino, A., Kawata, D., Gibson, B. K., & Matteucci, F. 2005, A&A, 434, 553
Pipino, A., Matteucci, F., & Chiappini, C. 2006, ApJ, 638, 739 (PMC06)
Pipino, A., & Matteucci, F. 2006, MNRAS, 365, 1114 (PM06)
Puzia, T. H., Kissler-Patig, M., Brodie, J. P., & Huchra, J. P. 1999, AJ, 118, 2734
Puzia, T. H., et al. 2004, A&A, 415, 123
Puzia, T. H., Kissler-Patig, M., Thomas, D., Maraston, C., Saglia, R. P., Bender, R., Goudfrooij, P., & Hempel, M. 2005, A&A, 439, 997
Puzia, T. H., Kissler-Patig, M., & Goudfrooij, P. 2006, ApJ, 648, 383 (P06)
Recchi, S., & Danziger, I. J. 2005, A&A, 436, 145
Renzini, A. 2006, ARA&A, 44, 141
Rhode, K. L., & Zepf, S. E. 2004, AJ, 127, 302
Salpeter, E. E. 1955, ApJ, 121, 161
Schweizer, F. 1987, Nearly Normal Galaxies. From the Planck Time to the Present, 18
Searle, L., & Zinn, R. 1978, ApJ, 225, 357
Sharina, M. E., Puzia, T. H., & Makarov, D. I. 2005, A&A, 442, 85
Thielemann, F. K., Nomoto, K., & Hashimoto, M. 1996, ApJ, 460, 408
Thomas, D., Maraston, C., & Bender, R. 2002, Ap&SS, 281, 371
Tinsley, B. M. 1980, ApJ, 241, 41
van den Hoek, L. B., & Groenewegen, M. A. T. 1997, A&AS, 123, 305
van den Bergh, S. 1975, ARA&A, 13, 217
Weiss, A., Peletier, R. F., & Matteucci, F. 1995, A&A, 296, 73
West, M. J., Côté, P., Marzke, R. O., & Jordán, A.
2004, Nature, 427, 31
Worthey, G., Faber, S. M., & Gonzalez, J. J. 1992, ApJ, 398, 69
Yoon, S.-J., Yi, S. K., & Lee, Y.-W. 2006, Science, 311, 1129
Zepf, S. E., & Ashman, K. M. 1993, MNRAS, 264, 611

This preprint was prepared with the AAS LaTeX macros v5.2.

Fig. 1. Color distributions of GCs in nearby, massive elliptical galaxies (Puzia et al. 2006, top panel), in NGC 4472 (Puzia et al. 1999, middle panel), and the Milky Way, taken from the February 2003 update of the McMaster catalog (Harris 1996, bottom panel). In order to allow a robust comparison between the P06 and NGC 4472 samples, only GCs in NGC 4472 with luminosities brighter than V ≃ 22.5 mag are shown. The solid lines are Epanechnikov-kernel probability density estimates with their bootstrapped 90% confidence limits.

Fig. 2. Predicted globular-cluster metallicity distribution ΥGC,tot by mass as a function of [Z/H] for three different radial compositions (i.e. fred/fblue). The left panel shows both model predictions and observations related to the central part of an elliptical galaxy. The right panel shows the same quantities for cluster populations residing at r ≥ Reff. Solid empty histograms: observational data taken as sub-samples of the P06 compilation, according to the galactic regions presented in each panel.

Fig. 3. Predicted globular-cluster metallicity distribution ΥGC,tot by mass as a function of [Z/H] for two different projected galactocentric radii. The left panel shows both model predictions and observations related to the pure core of an elliptical galaxy (namely fred : fblue = 1 : 0). The right panel shows the same quantities for cluster populations residing at 0.5Reff < r < 1.5Reff. Solid empty histograms: observational data taken as sub-samples of the P06 compilation, according to the galactic regions presented in each panel.

Fig. 4. Predicted globular-cluster metallicity distribution ΥGC,tot by luminosity at three different projected galactocentric radii. Solid: innermost region; dotted: average galactic (intermediate population); dashed: outermost part.

Fig. 5. Shaded histogram: predicted distribution of globular-cluster [Mg/Fe] values at two different projected galactocentric radii (innermost and outermost regions). Solid empty histogram: observational data taken as sub-samples of the P06 compilation, according to the galactic regions presented in each panel (see text).

Fig. 6. Predicted globular-cluster metallicity distribution ΥGC,tot by mass as a function of [Z/H] for three different radial compositions (i.e. different fred/fblue). In this case, the function f has an explicit dependence on [Z/H] (see text). The left panel shows both model predictions and observations related to the central part of an elliptical galaxy. The right panel shows the same quantities for cluster populations residing at r ≥ Reff. Solid empty histograms: observational data taken as sub-samples of the P06 compilation, according to the galactic regions presented in each panel.

Fig. 7. Shaded histogram: predicted total GC metallicity distribution ΥGC,tot by mass for the < 1Reff shell, for a case in which a second episode of star formation, induced by a gaseous merger, is taken into account (see text). Solid histogram: observations from Puzia et al. (2006), their entire sample.

Fig. 8. Shaded histogram: predicted total GC [Mg/Fe] distribution by mass for the < 1Reff shell, for a case in which a second episode of star formation, induced by a gaseous merger, is taken into account (see text). Solid histogram: observations by Puzia et al. (2006), their entire sample.

Fig. 9. Predicted globular-cluster metallicity distribution by mass as a function of the (g−z) colour at two different projected galactocentric radii. Dashed: galactic core. Solid: galactic halo out to 10 Reff.
Introduction The Multimodality of Globular Cluster Systems Numerical Models of Globular Cluster System Formation A Spatially Resolved Chemical Evolution Model for Spheroid Galaxies The model The Chemical Evolution Code Globular Cluster Formation Results and discussion The Multi-Modality of Globular Cluster Systems in Elliptical Galaxies The Comparison Sample A Simple Model Globular Cluster Metallicity Distribution Globular Cluster [Mg/Fe] Distributions Metallicity Dependent Globular Cluster Formation The Ratio of Metal-poor to Metal-rich Globular Clusters The Role of the Host Galaxy Mass Merger-Induced Globular Cluster Formation Other Mechanisms responsible for Multimodality Conclusions ABSTRACT The most massive elliptical galaxies show a prominent multi-modality in their globular cluster system color distributions. Understanding the mechanisms which lead to multiple globular cluster sub-populations is essential for a complete picture of massive galaxy formation. By assuming that globular cluster formation traces the total star formation and taking into account the radial variations in the composite stellar populations predicted by the Pipino & Matteucci (2004) multi-zone photo-chemical evolution code, we compute the distribution of globular cluster properties as a function of galactocentric radius. We compare our results to the spectroscopic measurements of globular clusters in nearby early-type galaxies by Puzia et al. (2006) and show that the observed multi-modality in globular cluster systems of massive ellipticals can be, at least partly, ascribed to the radial variation in the mix of stellar populations. Our model predicts the presence of a super-metal-rich population of globular clusters in the most massive elliptical galaxies, which is in very good agreement with the spectroscopic observations. 
Furthermore, we investigate the impact of other non-linear mechanisms that shape the metallicity distribution of globular cluster systems, in particular the role of merger-induced globular cluster formation and a non-linear color-metallicity transformation, and discuss their influence in the context of our model (abridged) <|endoftext|><|startoftext|> Introduction GLAST studies of accreting binaries X-ray jets Large scale jet-ISM interaction Gamma-ray spectral states and major ejections The observational status: gamma-ray binaries The view from space The view from the ground Gamma-ray binaries Gamma-ray binaries as compact pulsar wind nebulae GLAST studies of rotation-powered binaries Gamma-ray orbital modulation Probing pulsar winds Population studies of gamma-ray binaries ABSTRACT Radio and X-ray observations of the relativistic jets of microquasars show evidence for the acceleration of particles to very high energies. Signatures of non-thermal processes occurring closer in to the compact object can also be found. In addition, three binaries are now established emitters of high (> 100 MeV) and/or very high (> 100GeV) energy gamma-rays. High-energy emission can originate from a microquasar jet (accretion-powered) or from a shocked pulsar wind (rotation-powered). I discuss the impact GLAST will have in the very near future on studies of such binaries. GLAST is expected to shed new light on the link between accretion and ejection in microquasars and to enable to probe pulsar winds on small scales in rotation-powered binaries. <|endoftext|><|startoftext|> Introduction 1.1 The main questions and results In this paper, every surface will be complex, rational, algebraic and smooth, and except for C2, will also be projective. By an automorphism of a surface we mean a biregular algebraic morphism from the surface to itself. The group of automor- phisms (respectively of birational transformations) of a surface S will be denoted by Aut(S) (respectively by Bir(S)). 
The group Bir(P2) is classically called the Cremona group. Taking some surface S, any birational map S ⇢ P2 conjugates Bir(S) to Bir(P2); any subgroup of Bir(S) may therefore be viewed as a subgroup of the Cremona group, up to conjugacy.

The minimal surfaces are P2, P1 × P1 and the Hirzebruch surfaces Fn for n ≥ 2; their groups of automorphisms are a classical object of study, and their structures are well known (see for example [Bea1]). These groups are in fact the maximal connected algebraic subgroups of the Cremona group (see [Mu-Um], [Um]).

Given some group acting birationally on a surface, we would like to determine some geometric properties that allow us to decide whether the group is conjugate to a group of automorphisms of a minimal surface, or equivalently to decide whether it belongs to a maximal connected algebraic subgroup of the Cremona group. This conjugation looks like a linearisation, as we will see below, and explains our title.

http://arxiv.org/abs/0704.0537v2

We observe that the set of points of a minimal surface which are fixed by a non-trivial automorphism is the union of a finite number of points and rational curves. Given a group G of birational transformations of a surface, the following properties are thus related (note that for us the genus is the geometric genus, so that a curve has positive genus if and only if it is not rational); property (F) is our candidate for the geometric property we are looking for:

(F) No non-trivial element of G fixes (pointwise) a curve of positive genus.
(M) The group G is birationally conjugate to a group of automorphisms of a minimal surface.

The fact that a curve of positive genus is not collapsed by a birational transformation of surfaces implies that property (F) is a conjugacy invariant; it is clear that the same is true of property (M). The above discussion implies that (M) ⇒ (F); we would like to prove the converse.
The implication (F) ⇒ (M) is true for finite cyclic groups of prime order (see [Be-Bl]). The present article describes precisely the case of finite Abelian groups. We prove that (F) ⇒ (M) is true for finite cyclic groups of any order, and that we may restrict the minimal surfaces to P2 or P1 × P1. In the case of finite Abelian groups, there exists, up to conjugation, only one counterexample to the implication, which is represented by a group isomorphic to Z/2Z × Z/4Z acting biregularly on a special conic bundle. Precisely, we will prove the following results, announced without proof as Theorems 4.4 and 4.5 in [Bla3]:

Theorem 1. Let G be a finite cyclic subgroup of order n of the Cremona group. The following conditions are equivalent:
• If g ∈ G, g ≠ 1, then g does not fix a curve of positive genus.
• G is birationally conjugate to a subgroup of Aut(P2).
• G is birationally conjugate to a subgroup of Aut(P1 × P1).
• G is birationally conjugate to the group of automorphisms of P2 generated by (x : y : z) ↦ (x : y : e^{2iπ/n} z).

Theorem 2. Let G be a finite Abelian subgroup of the Cremona group. The following conditions are equivalent:
• If g ∈ G, g ≠ 1, then g does not fix a curve of positive genus.
• G is birationally conjugate to a subgroup of Aut(P2), or to a subgroup of Aut(P1 × P1), or to the group Cs24 isomorphic to Z/2Z × Z/4Z, generated by the two elements (x : y : z) ⇢ (yz : xy : −xz) and (x : y : z) ⇢ (yz(y − z) : xz(y + z) : xy(y + z)).
Moreover, this last group is conjugate neither to a subgroup of Aut(P2), nor to a subgroup of Aut(P1 × P1).

Then, we discuss the case in which the group is infinite, respectively non-Abelian (Section 11), and provide many examples of groups satisfying (F) but not (M).

Note that many finite groups which contain elements that fix a non-rational curve are known, see for example [Wim] or more recently [Bla2] and [Do-Iz]. This can also occur if the group is infinite, see [BPV] and [Bla5].
In fact, the set of non-rational curves fixed by the elements of a group is a conjugacy invariant very useful in describing conjugacy classes (see [Ba-Be], [dFe], [Bla4]).

1.2 How to decide

Given a finite Abelian group of birational transformations of a (rational) surface, we thus have a good way to determine whether the group is birationally conjugate to a group of automorphisms of a minimal surface (in fact to P2 or P1 × P1). If some non-trivial element fixes a curve of positive genus (i.e. if condition (F) is not satisfied), this is false. Otherwise, if the group is not isomorphic to Z/2Z × Z/4Z, it is birationally conjugate to a subgroup of Aut(P2) or of Aut(P1 × P1). There are exactly four conjugacy classes of groups isomorphic to Z/2Z × Z/4Z satisfying condition (F) (see Theorem 5); three are conjugate to a subgroup of Aut(P2) or Aut(P1 × P1), and the fourth (the group Cs24 of Theorem 2, described in detail in Section 7) is not.

1.3 Linearisation of birational actions

Our question is related to that of linearisation of birational actions on C2. This latter question has been studied intensively for holomorphic or polynomial actions, see for example [De-Ku], [Kra] and [vdE]. Taking some group acting birationally on C2, we would like to know if we may birationally conjugate this action to have a linear action. Note that working on P2 or C2 is the same for this question. Theorem 1 implies that for finite cyclic groups, being linearisable is equivalent to fulfilling condition (F). This is not true for finite Abelian groups in general, since some groups acting biregularly on P1 × P1 are not birationally conjugate to groups of automorphisms of P2. Note that Theorem 1 implies the following result on linearisation, also announced in [Bla3] (as Theorem 4.2):

Theorem 3. Any birational map which is a root of a non-trivial linear automorphism of finite order of the plane is conjugate to a linear automorphism of the plane.
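
As a small illustration of what linearisation means in the simplest case, the following sketch (not taken from the paper) spot-checks that the birational involution (x, y) ↦ (1/x, y), whose fixed curves x = ±1 are rational, is conjugate to the linear involution (x, y) ↦ (−x, y). The conjugating map `phi`, the function names and the sample points are our own choices for illustration; the check uses exact rational arithmetic at a few points and is a sanity check rather than a proof.

```python
from fractions import Fraction


def sigma(x, y):
    """The birational involution (x, y) -> (1/x, y)."""
    return (1 / x, y)


def linear(x, y):
    """The linear involution (x, y) -> (-x, y)."""
    return (-x, y)


def phi(x, y):
    """Illustrative conjugating map (x, y) -> ((x - 1)/(x + 1), y).

    This Moebius change of the first coordinate is an assumption of the
    example, chosen so that phi(1/x) = -phi(x).
    """
    return ((x - 1) / (x + 1), y)


# Sample points avoiding x = 0 and x = -1, where sigma or phi is undefined.
samples = [
    (Fraction(2, 3), Fraction(5)),
    (Fraction(7, 2), Fraction(-1)),
    (Fraction(-5, 4), Fraction(3)),
]

# phi o sigma == linear o phi at every sample point.
conjugates = all(phi(*sigma(x, y)) == linear(*phi(x, y)) for x, y in samples)
print(conjugates)
```

The identity behind the check is (1/x − 1)/(1/x + 1) = −(x − 1)/(x + 1), so the same map works for every y.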
1.4 The approach and other results

Our approach, followed in all the modern articles on the subject, is to view the finite subgroups of the Cremona group as groups of (biregular) automorphisms of smooth projective rational surfaces and then to assume that the action is minimal (i.e. that it is not possible to blow down some curves and obtain once again a biregular action on a smooth surface). Manin and Iskovskikh ([Man] and [Isk2]) proved that the only possible cases are actions on del Pezzo surfaces or conic bundles. We will clarify this classification, for finite Abelian groups fulfilling (F), by proving the following result:

Theorem 4. Let S be some smooth projective rational surface and let G ⊂ Aut(S) be a finite Abelian group of automorphisms of S such that
• the pair (G, S) is minimal;
• if g ∈ G, g ≠ 1, then g does not fix a curve of positive genus.
Then, one of the following occurs:
1. The surface S is minimal, i.e. S ≅ P2, or S ≅ Fn for some integer n ≠ 1.
2. The surface S is a del Pezzo surface of degree 5 and G ≅ Z/5Z.
3. The surface S is a del Pezzo surface of degree 6 and G ≅ Z/6Z.
4. The pair (G, S) is isomorphic to the pair (Cs24, Ŝ4) defined in Section 7.

We will then prove that all the pairs in cases 1, 2 and 3 are birationally equivalent to a group of automorphisms of P1 × P1 or P2, and that this is not true for case 4. In fact, we are able to provide the precise description of all conjugacy classes of finite Abelian subgroups of Bir(P2) satisfying (F):

Theorem 5. Let G be a finite Abelian subgroup of the Cremona group such that no non-trivial element of G fixes a curve of positive genus. Then, G is birationally conjugate to one and only one of the following:
[1] G ≅ Z/nZ × Z/mZ g.b. (x, y) ↦ (ζ_n x, y) and (x, y) ↦ (x, ζ_m y)
[2] G ≅ Z/2Z × Z/2nZ g.b. (x, y) ↦ (x^{-1}, y) and (x, y) ↦ (−x, ζ_{2n} y)
[3] G ≅ (Z/2Z)^2 × Z/2nZ g.b. (x, y) ↦ (±x^{±1}, y) and (x, y) ↦ (x, ζ_{2n} y)
[4] G ≅ (Z/2Z)^3 g.b.
(x, y) ↦ (±x, ±y) and (x, y) ↦ (x^{-1}, y^{-1})
[5] G ≅ (Z/2Z)^4 g.b. (x, y) ↦ (±x^{±1}, ±y^{±1})
[6] G ≅ Z/2Z × Z/4Z g.b. (x, y) ↦ (x^{-1}, y^{-1}) and (x, y) ↦ (−y, x)
[7] G ≅ (Z/2Z)^3 g.b. (x, y) ↦ (−x, −y), (x, y) ↦ (x^{-1}, y^{-1}), and (x, y) ↦ (y, x)
[8] G ≅ (Z/2Z) × (Z/4Z) g.b. (x : y : z) ⇢ (yz(y − z) : xz(y + z) : xy(y + z)) and (x : y : z) ⇢ (yz : xy : −xz)
[9] G ≅ (Z/3Z)^2 g.b. (x : y : z) ↦ (x : ζ_3 y : (ζ_3)^2 z) and (x : y : z) ↦ (y : z : x)
(where n, m are positive integers, n divides m and ζ_n = e^{2iπ/n}).
Furthermore, the groups in cases [1] through [7] are birationally conjugate to subgroups of Aut(P1 × P1), but the others are not. The groups in cases [1] and [9] are birationally conjugate to subgroups of Aut(P2), but the others are not.

To prove these results, we will need a number of geometric results on automorphisms of rational surfaces, and in particular on automorphisms of conic bundles and del Pezzo surfaces (Sections 3 to 9). We give for example the classification of all the twisting elements (that exchange the two components of a singular fibre) acting on conic bundles in Proposition 6.5 (for the elements of finite order) and Proposition 6.8 (for those of infinite order); these are the most important elements in this context (see Lemma 3.8). We also prove that actions of (possibly infinite) Abelian groups on del Pezzo surfaces satisfying (F) are minimal only if the degree is at least 5 (Section 9) and describe these cases precisely (Sections 4, 5 and 9). We also show that a finite Abelian group acting on a projective smooth surface S such that (K_S)^2 ≥ 5 is birationally conjugate to a group of automorphisms of P1 × P1 or P2 (Corollary 9.10) and in particular satisfies (F).

1.5 Comparison with other work

Many authors have considered the finite subgroups of Bir(P2). Among them, S. Kantor [Kan] gave a classification of the finite subgroups, which was incomplete and included some mistakes; A. Wiman [Wim] and then I.V. Dolgachev and V.A.
Iskovskikh [Do-Iz] successively improved Kantor's results. The long paper [Do-Iz] expounds the general theory of finite subgroups of Bir(P2) according to the modern techniques of algebraic geometry, and will be for years to come the reference on the subject. Our viewpoint and aim differ from those of [Do-Iz]: we are only interested in Abelian groups in relation with the above conditions (F) and (M); this gives a restricted setting in which the theoretical approach is simplified and the results obtained are more accurate. In the study of del Pezzo surfaces, using the classification [Do-Iz] of subgroups of automorphisms would require the examination of many cases; for the sake of readability we preferred a direct proof. The two main theorems of [Do-Iz] on automorphisms of conic bundles (Proposition 5.3 and Theorem 5.7(2)) do not exclude groups satisfying property (F) and do not give explicit forms for the generators of the groups or the surfaces.

1.6 Acknowledgements

This article is part of my PhD thesis [Bla2]; I am grateful to my advisor T. Vust for his invaluable help during these years, to I. Dolgachev for helpful discussions, and thank J.-P. Serre and the referees for their useful remarks on this paper.

2 Automorphisms of P2 or P1 × P1

Note that a linear automorphism of C2 may be extended to an automorphism of either P2 or P1 × P1. Moreover, the automorphisms of finite order of these three surfaces are birationally conjugate. For finite Abelian groups, the situation is quite different. We give here the birational equivalence of these groups.

Notation 2.1. The element [a : b : c] denotes the diagonal automorphism (x : y : z) ↦ (ax : by : cz) of P2, and ζ_m = e^{2iπ/m}.

Proposition 2.2 (Finite Abelian subgroups of Aut(P2)). Every finite Abelian subgroup of Aut(P2) = PGL(3,C) is conjugate, in the Cremona group Bir(P2), to one and only one of the following:
1. A diagonal group, isomorphic to Z/nZ × Z/mZ, where n divides m, generated by [1 : ζ_n : 1] and [ζ_m : 1 : 1].
(The case n = 1 gives the cyclic groups).
2. The special group V9, isomorphic to Z/3Z × Z/3Z, generated by [1 : ζ_3 : (ζ_3)^2] and (x : y : z) ↦ (y : z : x).
Thus, except for the group V9, two isomorphic finite Abelian subgroups of PGL(3,C) are conjugate in Bir(P2).

Proof. First of all, a simple calculation shows that every finite Abelian subgroup of PGL(3,C) is either diagonalisable or conjugate to the group V9. Furthermore, since this last group does not fix any point, it is not diagonalisable, even in Bir(P2) [Ko-Sz, Proposition A.2].
Let T denote the torus of PGL(3,C) constituted by diagonal automorphisms of P2. Let G be a finite subgroup of T; as an abstract group it is isomorphic to Z/nZ × Z/mZ, where n divides m. Now we can conjugate G by a birational map of the form h : (x, y) ⇢ (x^a y^b, x^c y^d) so that it contains [ζ_m : 1 : 1] (see [Be-Bl] and [Bla1]). Since h normalizes the torus T, the group G remains diagonal and contains the n-torsion of T, hence it contains [1 : ζ_n : 1].

Corollary 2.3. Every finite Abelian group of linear automorphisms of C2 is birationally conjugate to a diagonal group, isomorphic to Z/nZ × Z/mZ, where n divides m, generated by (x, y) ↦ (ζ_n x, y) and (x, y) ↦ (x, ζ_m y).

Proof. This follows from the fact that the group GL(2,C) of linear automorphisms of C2 extends to a group of automorphisms of P2 that leaves the line at infinity invariant and fixes one point.

Example 2.4. Note that Aut(P1 × P1) contains the group (C∗)^2 ⋊ Z/2Z, where (C∗)^2 is the group of automorphisms of the form (x, y) ↦ (αx, βy), α, β ∈ C∗, and Z/2Z is generated by the automorphism (x, y) ↦ (y, x).
The birational map (x, y) ⇢ (x : y : 1) from P1 × P1 to P2 conjugates (C∗)^2 ⋊ Z/2Z to the group of automorphisms of P2 generated by (x : y : z) ↦ (αx : βy : z), α, β ∈ C∗ and (x : y : z) ↦ (y : x : z).

Proposition 2.5 (Finite Abelian subgroups of Aut(P1 × P1)).
Up to birational conjugation, every finite Abelian subgroup of Aut(P1 × P1) is conjugate to one and only one of the following:
[1] G ≅ Z/nZ × Z/mZ g.b. (x, y) ↦ (ζ_n x, y) and (x, y) ↦ (x, ζ_m y)
[2] G ≅ Z/2Z × Z/2nZ g.b. (x, y) ↦ (x^{-1}, y) and (x, y) ↦ (−x, ζ_{2n} y)
[3] G ≅ (Z/2Z)^2 × Z/2nZ g.b. (x, y) ↦ (±x^{±1}, y) and (x, y) ↦ (x, ζ_{2n} y)
[4] G ≅ (Z/2Z)^3 g.b. (x, y) ↦ (±x, ±y) and (x, y) ↦ (x^{-1}, y^{-1})
[5] G ≅ (Z/2Z)^4 g.b. (x, y) ↦ (±x^{±1}, ±y^{±1})
[6] G ≅ Z/2Z × Z/4Z g.b. (x, y) ↦ (x^{-1}, y^{-1}) and (x, y) ↦ (−y, x)
[7] G ≅ (Z/2Z)^3 g.b. (x, y) ↦ (−x, −y), (x, y) ↦ (x^{-1}, y^{-1}), and (x, y) ↦ (y, x)
(where n, m are positive integers, n divides m and ζ_n = e^{2iπ/n}).
Furthermore, the groups in [1] are conjugate to subgroups of Aut(P2), but the others are not.

Proof. Recall that Aut(P1 × P1) = (PGL(2,C) × PGL(2,C)) ⋊ Z/2Z. Let G be some finite Abelian subgroup of Aut(P1 × P1); we now prove that G is conjugate to one of the groups in cases [1] through [7].
First of all, if G is a subgroup of the group (C∗)^2 ⋊ Z/2Z given in Example 2.4, then it is conjugate to a subgroup of Aut(P2) and hence to a group in case [1].
Assume that G ⊂ PGL(2,C) × PGL(2,C) and denote by π1 and π2 the projections πi : PGL(2,C) × PGL(2,C) → PGL(2,C) on the i-th factor. Since π1(G) and π2(G) are finite Abelian subgroups of PGL(2,C), each is conjugate to a diagonal cyclic group or to the group x ⇢ ±x^{±1}, isomorphic to (Z/2Z)^2. We enumerate the possible cases.
If both groups π1(G) and π2(G) are cyclic, the group G is conjugate to a subgroup of the diagonal torus (C∗)^2 of automorphisms of the form (x, y) ↦ (αx, βy), α, β ∈ C∗.
If exactly one of the two groups π1(G) and π2(G) is cyclic we may assume, up to conjugation in Aut(P1 × P1), that π2(G) is cyclic, generated by y ↦ ζ_m y, for some integer m ≥ 1, and that π1(G) is the group x ⇢ ±x^{±1}.
We use the exact sequence 1 → G ∩ ker π2 → G → π2(G) → 1 and find, up to conjugation, two possibilities for G:
(a) G is generated by (x, y) ↦ (x^{-1}, y) and (x, y) ↦ (−x, ζ_m y).
(b) G is generated by (x, y) ↦ (±x^{±1}, y) and (x, y) ↦ (x, ζ_m y).
If m is even, we obtain respectively [2] and [3] for n = m/2. If m is odd, the two groups are equal; conjugating by ϕ : (x, y) ⇢ (x, y(x + x^{-1})) (which conjugates (x, y) ↦ (−x, y) to (x, y) ↦ (−x, −y)) we obtain the group [2] for n = m.
If both groups π1(G) and π2(G) are isomorphic to (Z/2Z)^2, then up to conjugation, we obtain three groups, namely
(a) G is generated by (x, y) ↦ (−x, −y) and (x, y) ↦ (x^{-1}, y^{-1}).
(b) G is generated by (x, y) ↦ (±x, ±y) and (x, y) ↦ (x^{-1}, y^{-1}).
(c) G is given by (x, y) ↦ (±x^{±1}, ±y^{±1}).
The group [2] with n = 1 is conjugate to (a) by (x, y) ⇢ (x, x(y + x)/(y + x^{-1})). The groups (b) and (c) are respectively equal to [4] and [5].

We now suppose that the group G is not contained in PGL(2,C) × PGL(2,C). Any element ϕ ∈ Aut(P1 × P1) not contained in PGL(2,C) × PGL(2,C) is conjugate to ϕ : (x, y) ↦ (α(y), x), where α ∈ Aut(P1), and if ϕ is of finite order, α may be chosen to be y ↦ λy with λ ∈ C∗ a root of unity. Thus, up to conjugation, G is generated by the group H = G ∩ (PGL(2,C) × PGL(2,C)) and one element (x, y) ↦ (λy, x), for some λ ∈ C∗ of finite order. Since the group G is Abelian, every element of H is of the form (x, y) ↦ (β(x), β(y)), for some β ∈ PGL(2,C) satisfying β(λx) = λβ(x). Three possibilities occur, depending on the value of λ which may be 1, −1 or something else.
If λ = 1, we conjugate the group by some element (x, y) ↦ (γ(x), γ(y)) so that H is either diagonal or equal to the group generated by (x, y) ↦ (−x, −y) and (x, y) ↦ (x^{-1}, y^{-1}). In the first situation, the group is contained in (C∗)^2 ⋊ Z/2Z (which gives [1]); the second situation gives [7].
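
The special groups [4] to [7] of the list in Proposition 2.5 consist of signed monomial maps (x, y) ↦ (±x^a y^b, ±x^c y^d), so their orders and commutativity can be checked mechanically. The sketch below is our own illustration, not part of the paper: the encoding of a map as a pair (signs, exponent matrix) and all function names are assumptions of the example. It composes such maps exactly, generates each group from the generators given in the list, and reads off the group order and the orders of the elements.

```python
# A signed monomial map (x, y) -> (e1 * x^a * y^b, e2 * x^c * y^d), with
# e1, e2 in {1, -1}, is encoded as ((e1, e2), ((a, b), (c, d))).
# This encoding is an assumption of the example, not notation from the paper.

IDENTITY = ((1, 1), ((1, 0), (0, 1)))


def sign_pow(e, p):
    """Return e**p for e in {1, -1} and any integer p (also negative)."""
    return e if p % 2 else 1


def compose(g, f):
    """Return the composition g o f (apply f first, then g)."""
    (g1, g2), ((p, q), (r, s)) = g
    (e1, e2), ((a, b), (c, d)) = f
    signs = (g1 * sign_pow(e1, p) * sign_pow(e2, q),
             g2 * sign_pow(e1, r) * sign_pow(e2, s))
    exps = ((p * a + q * c, p * b + q * d),
            (r * a + s * c, r * b + s * d))
    return (signs, exps)


def generate(gens):
    """Generate the (finite) group spanned by the maps in gens."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        h = frontier.pop()
        for g in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def order(g):
    """Order of an element of finite order."""
    n, h = 1, g
    while h != IDENTITY:
        h, n = compose(g, h), n + 1
    return n


def is_abelian(group):
    return all(compose(g, h) == compose(h, g) for g in group for h in group)


ID = ((1, 0), (0, 1))
neg_x, neg_y = ((-1, 1), ID), ((1, -1), ID)      # (-x, y), (x, -y)
inv_x = ((1, 1), ((-1, 0), (0, 1)))              # (1/x, y)
inv_y = ((1, 1), ((1, 0), (0, -1)))              # (x, 1/y)
inv = ((1, 1), ((-1, 0), (0, -1)))               # (1/x, 1/y)
neg = ((-1, -1), ID)                             # (-x, -y)
swap = ((1, 1), ((0, 1), (1, 0)))                # (y, x)
rot = ((-1, 1), ((0, 1), (1, 0)))                # (-y, x)

groups = {
    "[4]": generate([neg_x, neg_y, inv]),
    "[5]": generate([neg_x, neg_y, inv_x, inv_y]),
    "[6]": generate([inv, rot]),
    "[7]": generate([neg, inv, swap]),
}
summary = {name: (len(G), is_abelian(G), sorted(order(g) for g in G))
           for name, G in groups.items()}
for name, info in summary.items():
    print(name, info)
```

Under this encoding the element orders separate [6], which contains elements of order 4 and so is of type Z/2Z × Z/4Z, from [4] and [7], where every non-trivial element is an involution. Distinguishing [4] from [7] up to birational conjugacy requires the geometric argument given in the proof, which this sketch does not attempt.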
If λ = −1, the group H contains the square of (x, y) ↦ (−y, x), which is (x, y) ↦ (−x, −y), and is either cyclic or generated by (x, y) ↦ (−x, −y) and (x, y) ↦ (x^{-1}, y^{-1}). If H is cyclic, it is diagonal, since it contains (x, y) ↦ (−x, −y), so G is contained in (C∗)^2 ⋊ Z/2Z. The second possibility gives [6].
If λ ≠ ±1, the group H is diagonal and then G is contained in (C∗)^2 ⋊ Z/2Z.

We now prove that distinct groups of the list are not birationally conjugate. First of all, each group of case [1] fixes at least one point of P1 × P1. Since the other groups of the list do not fix any point, they are not conjugate to [1] [Ko-Sz, Proposition A.2].
Consider the other groups. The isomorphic groups are those of cases [3] (with n = 1), [4] and [7] (isomorphic to (Z/2Z)^3), and of cases [2] (with n = 2) and [6] (isomorphic to Z/2Z × Z/4Z). The groups of cases [2] to [5] leave two pencils of rational curves invariant (the fibres of the two projections P1 × P1 → P1) which intersect freely in exactly one point. We prove that this is not the case for [6] and [7]; this shows that these two groups are not birationally conjugate to any of the previous groups. Take G ⊂ Aut(P1 × P1) to be either [6] or [7]. We have then Pic(P1 × P1)^G = Zd, where d = −(1/2) K_{P1×P1} is the diagonal of P1 × P1. Suppose that there exist two G-invariant pencils Λ1 = n1 d and Λ2 = n2 d of rational curves, for some positive integers n1, n2 (we identify here a pencil with the class of its elements in Pic(P1 × P1)^G). The intersection Λ1 · Λ2 = 2 n1 n2 is an even integer. Note that the fixed part of the intersection is also even, since G is of order 8 and acts without fixed points on P1 × P1. The free part of the intersection is then also an even integer and hence is not 1.
Let us now prove that [4] is not birationally conjugate to [3] (with n = 1).
This follows from the fact that [4] contains three subgroups that are fixed-point free (the groups generated by (x, y) ↦ (x^{-1}, y^{-1}) and one of the three involutions of the group (x, y) ↦ (±x, ±y)), whereas [3] (with n = 1) contains only one such subgroup, which is (x, y) ↦ (±x^{±1}, y).
We now prove the last assertion. The finite Abelian groups of automorphisms of P2 are conjugate either to [1] or to the group V9, isomorphic to (Z/3Z)^2 (see Proposition 2.2). As no group of the list [2] through [7] is isomorphic to (Z/3Z)^2, we are done.

Summary of this section. We have found that the groups common to the three surfaces C2, P2 and P1 × P1 are the "diagonal" ones (generated by (x, y) ↦ (ζ_n x, y) and (x, y) ↦ (x, ζ_m y)). On P2 there is only one more group, which is the special group V9, and on P1 × P1 there are 2 families ([2] and [3]) and 4 special groups ([4], [5], [6] and [7]).

3 Some facts about automorphisms of conic bundles

We first consider conic bundles without mentioning any group action on them. We recall some classical definitions:

Definition 3.1. Let S be a rational surface and π : S → P1 be a morphism. We say that the pair (S, π) is a conic bundle if a general fibre of π is isomorphic to P1, with a finite number of exceptions: these singular fibres are the union of smooth rational curves F1 and F2 such that (F1)^2 = (F2)^2 = −1 and F1 · F2 = 1.
Let (S, π) and (S̃, π̃) be two conic bundles. We say that ϕ : S ⇢ S̃ is a birational map of conic bundles if ϕ is a birational map which sends a general fibre of π on a general fibre of π̃.
We say that a conic bundle (S, π) is minimal if any birational morphism of conic bundles (S, π) → (S̃, π̃) is an isomorphism.

We remind the reader of the following well-known result:

Lemma 3.2. Let (S, π) be a conic bundle. The following conditions are equivalent:
• (S, π) is minimal.
• The fibration π is smooth, i.e. no fibre of π is singular.
• S is a Hirzebruch surface Fm, for some integer m ≥ 0.
□

Blowing down one irreducible component in any singular fibre of a conic bundle (S, π), we obtain a birational morphism of conic bundles S → Fm for some integer m ≥ 0. Note that m depends on the choice of the blown-down components. The following lemma gives some information on the possibilities. Note first that since the sections of Fm have self-intersection ≥ −m, the self-intersections of the sections of π are also bounded from below.

Lemma 3.3. Let (S, π) be a conic bundle on a surface S ≇ P1 × P1. Let −n be the minimal self-intersection of sections of π and let r be the number of singular fibres of π. Then n ≥ 1 and:
1. There exists a birational morphism of conic bundles p− : S → Fn such that:
(a) p− is the blow-up of r points of Fn, none of which lies on the exceptional section En.
(b) The strict pull-back Ẽn of En by p− is a section of π with self-intersection −n.
2. If there exist two different sections of π with self-intersection −n, then r ≥ 2n. In this case, there exist birational morphisms of conic bundles p0 : S → F0 = P1 × P1 and p1 : S → F1.

Proof. We denote by s a section of π of minimal self-intersection −n, for some integer n (this integer is in fact positive, as will appear in the proof). Note that this curve intersects exactly one irreducible component of each singular fibre. If r = 0, the lemma is trivially true: take p− to be the identity map. We now suppose that r ≥ 1, and denote by F1, ..., Fr the irreducible components of the singular fibres which do not intersect s. Blowing these down, we get a birational morphism of conic bundles p− : S → Fm, for some integer m ≥ 0. The image of the section s by p− is a section of the conic bundle of Fm of minimal self-intersection, so we get m = n, and n ≥ 0. If we had n = 0, then taking some section s̃ of P1 × P1 of self-intersection 0 passing through at least one blown-up point, its strict pull-back by p− would be a section of negative self-intersection, which contradicts the minimality of s^2 = −n = 0.
We find finally that m = n > 0, and that p−(s) is the unique section of Fn of self-intersection −n. This proves the first assertion.

We now prove the second assertion. Suppose that some section t ≠ s has self-intersection −n. The Picard group of S is generated by s = p−^*(En), the divisor f of a fibre of π and F_1, ..., F_r. Write t as

t = s + bf − Σ_{i=1}^{r} a_i F_i,

for some integers b, a_1, ..., a_r, with a_1, ..., a_r ≥ 0. We have t^2 = −n and t · (t + K_S) = −2 (adjunction formula), where

K_S = p−^*(K_{Fn}) + Σ_{i=1}^{r} F_i = −(n + 2)f − 2s + Σ_{i=1}^{r} F_i.

These relations give:

s^2 = t^2 = s^2 − Σ_{i=1}^{r} a_i^2 + 2b,
n − 2 = t · K_S = −(n + 2) + 2n − 2b + Σ_{i=1}^{r} a_i,

whence Σ_{i=1}^{r} a_i = Σ_{i=1}^{r} a_i^2 = 2b, so each a_i is equal to 0 or 1 and consequently 2b ≤ r. Since s · t = b − n ≥ 0, we find that r ≥ 2n, as announced.
Finally, by contracting f − F_1, f − F_2, ..., f − F_n, F_{n+1}, F_{n+2}, ..., F_r, we obtain a birational morphism p0 of conic bundles which sends s on a section of self-intersection 0 and whose image is thus the Hirzebruch surface F0. Similarly, the morphism p1 : S → F1 is given by the contraction of f − F_1, f − F_2, ..., f − F_{n−1}, F_n, F_{n+1}, ..., F_r.

We now add some group actions on the conic bundles, and give natural definitions (note that we will restrict ourselves to finite or Abelian groups only when this is needed and will then say so):

Definition 3.4. Let (S, π) be some conic bundle.
• We denote by Aut(S, π) ⊂ Aut(S) the group of automorphisms of the conic bundle, i.e. automorphisms of S that send a general fibre of π on another general fibre.
Let G ⊂ Aut(S, π) be some group of automorphisms of the conic bundle (S, π).
• We say that a birational map of conic bundles ϕ : S ⇢ S̃ is G-equivariant if the G-action on S̃ induced by ϕ is biregular (it is clear that it preserves the conic bundle structure).
• We say that the triple (G, S, π) is minimal if any G-equivariant birational morphism of conic bundles ϕ : S → S̃ is an isomorphism.

Remark 3.5.
We insist on the fact that since a conic bundle is for us a pair (S, π), an automorphism of S is not necessarily an automorphism of the conic bundle (i.e. Aut(S) ≠ Aut(S, π) in general). One should be aware that in the literature, conic bundle sometimes means "a variety admitting a conic bundle structure".

Remark 3.6. If G ⊂ Aut(S, π) is such that the pair (G, S) is minimal, so is the triple (G, S, π). The converse is not true in general (see Remark 4.7).

Note that any automorphism of the conic bundle acts on the set of singular fibres and on its irreducible components. The permutation of the two components of a singular fibre is very important (Lemma 3.8). For this reason, we introduce some terminology:

Definition 3.7. Let g ∈ Aut(S, π) be an automorphism of the conic bundle (S, π). Let F = {F1, F2} be a singular fibre. We say that g twists the singular fibre F if g(F1) = F2 (and consequently g(F2) = F1).
If g twists at least one singular fibre of π, we will say that g twists the conic bundle (S, π), or simply (if the conic bundle is implicit) that g is a twisting element.

Here is a simple but very important observation:

Lemma 3.8. Let G ⊂ Aut(S, π) be a group of automorphisms of a conic bundle. The following conditions are equivalent:
1. The triple (G, S, π) is minimal.
2. Any singular fibre of π is twisted by some element of G.
□

Remark 3.9. An automorphism of a conic bundle with a non-trivial action on the basis of the fibration may twist at most two singular fibres. However, an automorphism with a trivial action on the basis of the fibration may twist a large number of fibres. We will give in Propositions 6.5 and 6.8 a precise description of all twisting elements.

The following lemma is a direct consequence of Lemma 3.3; it provides information on the structure of the underlying variety of a conic bundle admitting a twisting automorphism.

Lemma 3.10. Suppose that some automorphism g of the conic bundle (S, π) twists at least one singular fibre.
Then, the following occur.
1. There exist two birational morphisms of conic bundles p0 : S → F0 and p1 : S → F1 (which are not g-equivariant).
2. Let −n be the minimal self-intersection of sections of π and let r be the number of singular fibres of π. Then, r ≥ 2n ≥ 2.

Proof. Note that any section of π touches exactly one component of each singular fibre. Since g twists some singular fibre, its action on the set of sections of S is fixed-point-free. The number of sections of minimal self-intersection is then greater than 1 and we apply Lemma 3.3 to get the result.

Remark 3.11. A result of the same kind can be found in [Isk1], Theorem 1.1.

Lemma 3.12. Let G ⊂ Aut(S, π) be a group of automorphisms of the conic bundle (S, π), such that:
• π has at most 3 singular fibres (or equivalently (K_S)^2 ≥ 5);
• the triple (G, S, π) is minimal.
Then, S is either a Hirzebruch surface or a del Pezzo surface of degree 5 or 6, depending on whether the number of singular fibres is 0, 3 or 2 respectively.

Proof. Let −n be the minimal self-intersection of sections of π and let r ≤ 3 be the number of singular fibres of π. If r = 0, we are done, so we may suppose that r > 0. Since (G, S, π) is minimal, every singular fibre is twisted by some element of G (Lemma 3.8). From Lemma 3.10, we get r ≥ 2n ≥ 2, whence r = 2 or 3 and n = 1, and we obtain the existence of some birational morphism of conic bundles (not G-equivariant) p1 : S → F1. So the surface S is obtained by the blow-up of 2 or 3 points of F1, not on the exceptional section (Lemma 3.3), and thus by blowing up 3 or 4 points of P2, no 3 of which are collinear (otherwise we would have a section of self-intersection ≤ −2). The surface is then a del Pezzo surface of degree 6 or 5.

Remark 3.13. We conclude this section by mentioning an important exact sequence. Let G ⊂ Aut(S, π) be some group of automorphisms of a conic bundle (S, π). We have a natural homomorphism π : G → Aut(P1) = PGL(2,C) that satisfies π(g)π = πg, for every g ∈ G.
We observe that the group G′ = ker π of automorphisms that leave every fibre invariant embeds in the group PGL(2,C(x)) of automorphisms of the generic fibre P^1(C(x)). Then we get the exact sequence

1 → G′ → G → π(G) → 1. (1)

This restricts the structure of G; for example if G is Abelian and finite, so are G′ and π(G), and we know that the finite Abelian subgroups of PGL(2,C) and PGL(2,C(x)) are either cyclic or isomorphic to (Z/2Z)^2.

We also see that the group G is birationally conjugate to a subgroup of the group of birational transformations of P^1 × P^1 of the form (written in affine coordinates):

\[ (x, y) \dashrightarrow \left( \frac{ax+b}{cx+d},\ \frac{\alpha(x)\,y+\beta(x)}{\gamma(x)\,y+\delta(x)} \right), \]

where a, b, c, d ∈ C, α, β, γ, δ ∈ C(x), and (ad − bc)(αδ − βγ) ≠ 0. This group, called the de Jonquières group, is the group of birational transformations of P^1 × P^1 that preserve the fibration induced by the first projection, and is isomorphic to PGL(2,C(x)) ⋊ PGL(2,C). The subgroups of this group can be studied algebraically (as in [Bea2] and [Bla4]) but we will not adopt this point of view here.

4 The del Pezzo surface of degree 6

There is a single isomorphism class of del Pezzo surfaces of degree 6, since all sets of three non-collinear points of P^2 are equivalent under the action of linear automorphisms. Consider the surface S6 of degree 6 defined by the blow-up of the points A1 = (1 : 0 : 0), A2 = (0 : 1 : 0) and A3 = (0 : 0 : 1). We may view it in P^2 × P^2, defined as {((x : y : z), (u : v : w)) | ux = vy = wz}, where the blow-down p : S6 → P^2 is the restriction of the projection on one copy of P^2, explicitly p : ((x : y : z), (u : v : w)) ↦ (x : y : z). There are exactly 6 exceptional divisors, which are the pull-backs of the Ai's by the two projection morphisms. We write Ei = p^{-1}(Ai) and denote by Dij the strict pull-back by p of the line of P^2 passing through Ai and Aj. The group of automorphisms of S6 is well known (see for example [Wim], [Do-Iz]).
It is isomorphic to (C∗)^2 ⋊ (Sym3 × Z/2Z), where (C∗)^2 ⋊ Sym3 is the lift on S6 of the group of automorphisms of P^2 that leave the set {A1, A2, A3} invariant, and Z/2Z is generated by the permutation of the two factors (it is the lift of the standard quadratic transformation (x : y : z) ⇢ (yz : xz : xy) of P^2); the action of Z/2Z on (C∗)^2 sends an element to its inverse.

There are three conic bundle structures on the surface S6. Let π1 : S6 → P^1 be the morphism defined by

\[ \pi_1\big((x:y:z),(u:v:w)\big) = \begin{cases} (y:z) & \text{if } (x:y:z) \neq (1:0:0),\\ (w:v) & \text{if } (u:v:w) \neq (1:0:0). \end{cases} \]

Note that p sends the fibres of π1 to lines of P^2 passing through A1. There are exactly two singular fibres of this fibration, namely π1^{-1}(1 : 0) = {E2, D12} and π1^{-1}(0 : 1) = {E3, D13}; and E1, D23 are sections of π1.

Lemma 4.1. The group Aut(S6, π1) of automorphisms of the conic bundle (S6, π1) acts on the hexagon {E1, E2, E3, D12, D13, D23} and leaves the set {E1, D23} invariant.
1. The action on the hexagon gives rise to the exact sequence 1 → (C∗)^2 → Aut(S6, π1) → (Z/2Z)^2 → 1.
2. This exact sequence is split and Aut(S6, π1) = (C∗)^2 ⋊ (Z/2Z)^2, where
(a) (C∗)^2 is the group of automorphisms of the form ((x : y : z), (u : v : w)) ↦ ((x : αy : βz), (αβu : βv : αw)), with α, β ∈ C∗.
(b) The group (Z/2Z)^2 is generated by the automorphisms
((x : y : z), (u : v : w)) ↦ ((x : z : y), (u : w : v)), whose action on the set of exceptional divisors is (E2 E3)(D12 D13);
((x : y : z), (u : v : w)) ↦ ((u : v : w), (x : y : z)), whose action is (E1 D23)(E2 D13)(E3 D12).
(c) The action of (Z/2Z)^2 on (C∗)^2 is generated by permutation of the coordinates and inversion.

Proof. Since Aut(S6) acts on the hexagon, so does Aut(S6, π1) ⊂ Aut(S6). Since the group Aut(S6, π1) sends a section to a section, the set {E1, D23} is invariant. The group (C∗)^2 leaves the conic bundle invariant, and is the kernel of the action of Aut(S6, π1) on the hexagon.
As the set {E1, D23} is invariant, the image is contained in the group (Z/2Z)^2 generated by (E2 E3)(D12 D13) and (E1 D23)(E2 D13)(E3 D12). The rest of the lemma follows directly.

By permuting coordinates, we have two other conic bundle structures on the surface S6, given by the following morphisms π2, π3 : S6 → P^1:

\[ \pi_2\big((x:y:z),(u:v:w)\big) = \begin{cases} (x:z) & \text{if } (x:y:z) \neq (0:1:0),\\ (w:u) & \text{if } (u:v:w) \neq (0:1:0), \end{cases} \]

\[ \pi_3\big((x:y:z),(u:v:w)\big) = \begin{cases} (x:y) & \text{if } (x:y:z) \neq (0:0:1),\\ (v:u) & \text{if } (u:v:w) \neq (0:0:1). \end{cases} \]

The description of the exceptional divisors on S6 shows that π1, π2 and π3 are the only conic bundle structures on S6.

Lemma 4.2. For i = 1, 2, 3, the pair (Aut(S6, πi), S6) is not minimal. More precisely the morphism πj × πk : S6 → P^1 × P^1 conjugates Aut(S6, πi) to a subgroup of Aut(P^1 × P^1), where {i, j, k} = {1, 2, 3}.

Proof. The union of the sections E1 and D23 is invariant by the action of the whole group Aut(S6, π1). Since these two exceptional divisors do not intersect, we can contract both and get a birational Aut(S6, π1)-equivariant morphism from S6 to P^1 × P^1: the pair (Aut(S6, π1), S6) is thus not minimal; explicitly, the birational morphism is given by q ↦ (π2(q), π3(q)), as stated in the lemma. We obtain the other cases by permuting coordinates.

Remark 4.3. The subgroup of Aut(P^1 × P^1) obtained in this manner does not leave either of the two fibrations of P^1 × P^1 invariant.

Corollary 4.4. If (G, S6) is a minimal pair (where G ⊂ Aut(S6)), then G does not preserve any conic bundle structure. □

We conclude this section with a fundamental example; we will use the following automorphism κα,β of (S6, π1) several times:

Example 4.5.
For any α, β ∈ C∗, we define κα,β to be the following automorphism of (S6, π1):

κα,β : ((x : y : z), (u : v : w)) ↦ ((u : αw : βv), (x : α^{-1}z : β^{-1}y)).

Note that κα,β twists the two singular fibres of π1 (see Lemma 4.6 below); its action on the basis of the fibration is (x1 : x2) ↦ (αx1 : βx2) and

(κα,β)^2 : ((x : y : z), (u : v : w)) ↦ ((x : αβ^{-1}y : α^{-1}βz), (u : α^{-1}βv : αβ^{-1}w)).

So κα,β is an involution if and only if its action on the basis of the fibration is trivial.

Lemma 4.6. Let g ∈ Aut(S6, π1) be an automorphism of the conic bundle (S6, π1). The following conditions are equivalent:
• the triple (⟨g⟩, S6, π1) is minimal;
• g twists the two singular fibres of π1;
• the action of g on the exceptional divisors of S6 is (E1 D23)(E2 D12)(E3 D13);
• g = κα,β for some α, β ∈ C∗.

Proof. According to Lemma 4.1 the action of Aut(S6, π1) on the exceptional curves is isomorphic to (Z/2Z)^2 and hence the possible actions of g ≠ 1 are these:
1. id,
2. (E2 E3)(D12 D13),
3. (E1 D23)(E2 D13)(E3 D12),
4. (E1 D23)(E2 D12)(E3 D13).
In the first three cases, the triple (⟨g⟩, S6, π1) is not minimal. Indeed, the blow-down of {E2, E3} or {E2, D13} gives a g-equivariant birational morphism of conic bundles. Hence, if (⟨g⟩, S6, π1) is minimal, its action on the exceptional curves is the fourth one above, as stated in the lemma, and it then twists the two singular fibres of π1. Conversely if g twists the two singular fibres of π1, the triple (⟨g⟩, S6, π1) is minimal (by Lemma 3.8). It remains to see that the last assertion is equivalent to the others. This follows from Lemma 4.1; indeed this lemma implies that (C∗)^2 κ1,1 is the set of elements of Aut(S6, π1) inducing the permutation (E1 D23)(E2 D12)(E3 D13).

Remark 4.7. The pair (Aut(S6, π1), S6) is not minimal (Lemma 4.2). Consequently ⟨κα,β⟩ is an example of a group whose action on the surface is not minimal, but whose action on a conic bundle is minimal.
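The formula for the square of κα,β in Example 4.5 can be checked by direct substitution. The following is only a sketch of that check; the point p and the primed coordinates are shorthand introduced here and are not notation from the text.

```latex
% Shorthand (assumption): p = ((x:y:z),(u:v:w)) and
% \kappa_{\alpha,\beta}(p) = ((x':y':z'),(u':v':w')), so that
\begin{align*}
(x':y':z') &= (u : \alpha w : \beta v), &
(u':v':w') &= (x : \alpha^{-1} z : \beta^{-1} y).
\end{align*}
% Applying \kappa_{\alpha,\beta} a second time to the primed point:
\begin{align*}
\kappa_{\alpha,\beta}^{2}(p)
  &= \big( (u' : \alpha w' : \beta v'),\ (x' : \alpha^{-1} z' : \beta^{-1} y') \big) \\
  &= \big( (x : \alpha\beta^{-1} y : \alpha^{-1}\beta z),\
           (u : \alpha^{-1}\beta v : \alpha\beta^{-1} w) \big).
\end{align*}
% This is the identity exactly when \alpha\beta^{-1} = 1, i.e. \alpha = \beta,
% which is also the condition for (x_1 : x_2) \mapsto (\alpha x_1 : \beta x_2)
% to be the identity of P^1.
```

In particular, the condition α = β makes both the square of κα,β and its action on the basis trivial, which is the equivalence stated at the end of Example 4.5.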
5 The del Pezzo surface of degree 5

As for the del Pezzo surface of degree 6, there is a single isomorphism class of del Pezzo surfaces of degree 5. Consider the del Pezzo surface S5 of degree 5 defined by the blow-up p : S5 → P^2 of the points A1 = (1 : 0 : 0), A2 = (0 : 1 : 0), A3 = (0 : 0 : 1) and A4 = (1 : 1 : 1). There are 10 exceptional divisors on S5, namely the divisor Ei = p^{-1}(Ai), for i = 1, ..., 4, and the strict pull-back Dij of the line of P^2 passing through Ai and Aj, for 1 ≤ i < j ≤ 4. There are 5 sets of 4 skew exceptional divisors on S5, namely
F1 = {E1, D23, D24, D34},
F2 = {E2, D13, D14, D34},
F3 = {E3, D12, D14, D24},
F4 = {E4, D12, D13, D23},
F5 = {E1, E2, E3, E4}.

Proposition 5.1. The action of Aut(S5) on the five sets F1, ..., F5 of four skew exceptional divisors of S5 gives rise to an isomorphism ρ : Aut(S5) → Sym5. Furthermore, the actions of Symn, Altm ⊂ Aut(S5) on S5 given by the canonical embedding of these groups into Sym5 are fixed-point free if and only if n = 3, 4, 5, respectively m = 4, 5.

Proof. Since any automorphism in the kernel of ρ leaves E1, E2, E3 and E4 invariant and hence is the lift of an automorphism of P^2 that fixes the 4 points, the homomorphism ρ is injective.

We now prove that ρ is also surjective. Firstly, the lift of the group of automorphisms of P^2 that leave the set {A1, A2, A3, A4} invariant is sent by ρ to Sym4 = Sym{F1,F2,F3,F4}. Secondly, the lift of the standard quadratic transformation (x : y : z) ⇢ (yz : xz : xy) is an automorphism of S5, as its lift on S6 is an automorphism, and as it fixes the point A4; its image by ρ is (F4 F5).

It remains to prove the last assertion. First of all, it is clear that the actions of the cyclic groups Alt3 and Sym2 fix some points. The group Sym3 ⊂ Aut(P^2) of permutations of A1, A2 and A3 fixes exactly one point, namely (1 : 1 : 1). The blow-up of this point gives a fixed-point free action on F1, and thus its lift on S5 is also fixed-point free.
The group Alt4 ⊂ Aut(P^2) contains the element (x : y : z) ↦ (z : x : y) (which corresponds to (1 2 3)) that fixes exactly three points, i.e. (1 : a : a^2) for a^3 = 1. It also contains the element (x : y : z) ↦ (z − y : z − x : z) (which corresponds to (1 2)(3 4)) that does not fix (1 : a : a^2) for a^3 = 1. Thus, the action of Alt4 on P^2 is fixed-point free and the same is true on S5.

Remark 5.2. The structure of Aut(S5) is classical and can be found for example in [Wim] and [Do-Iz].

Lemma 5.3. Let π : S5 → P^1 be some morphism inducing a conic bundle (S5, π). There are exactly four exceptional curves of S5 which are sections of π; the blow-down of these curves gives rise to a birational morphism p : S5 → P^2 which conjugates the group Aut(S5, π) ≅ Sym4 to the subgroup of Aut(P^2) that leaves invariant the four points blown-up by p. In particular, the pair (Aut(S5, π), S5) is not minimal.

Proof. Blowing-down one component in any singular fibre, we obtain a birational morphism of conic bundles (not Aut(S5, π)-equivariant) from S5 to some Hirzebruch surface Fn. Since S5 does not contain any curves of self-intersection ≤ −2, n is equal to 0 or 1. Changing the component blown-down in a singular fibre performs an elementary link Fn ⇢ Fn±1; we may then assume that n = 1, and that F1 is the blow-up of A1 ∈ P^2. Consequently, the fibres of the conic bundles correspond to the lines passing through A1. Denoting by A2, A3, A4 the other points blown-up by the constructed birational morphism S5 → P^2 and using the same notation as before, the three singular fibres are {Ei, D1i} for i = 2, ..., 4, and the other exceptional curves are four skew sections of the conic bundle, namely the elements of F1 = {E1, D23, D24, D34}.
The blow-down of F1 gives an Aut(S5, π)-equivariant birational morphism (that is not a morphism of conic bundles) p : S5 → P^2 and conjugates Aut(S5, π) to a subgroup of the group Sym4 ⊂ Aut(P^2) of automorphisms that leaves the four points blown-up by p invariant. The fibres of π are sent to the conics passing through the four points, so the lift of the whole group Sym4 belongs to Aut(S5, π).

Corollary 5.4. Let G be some group of automorphisms of a conic bundle (S, π) such that the pair (G, S) is minimal and (K_S)^2 ≥ 5 (or equivalently such that the number of singular fibres of π is at most 3). Then, the fibration is smooth, i.e. S is a Hirzebruch surface.

Proof. Since (G, S) is minimal, so is the triple (G, S, π). By Lemma 3.12, the surface S is either a Hirzebruch surface, or a del Pezzo surface of degree 5 or 6. Corollary 4.4 shows that the del Pezzo surface of degree 6 is not possible and Lemma 5.3 eliminates the possibility of the del Pezzo surface of degree 5.

6 Description of twisting elements

In this section, we describe the twisting automorphisms of conic bundles, which are the most important automorphisms (see Lemma 3.8).

Lemma 6.1 (Involutions twisting a conic bundle). Let g ∈ Aut(S, π) be a twisting automorphism of the conic bundle (S, π). Then, the following properties are equivalent:
1. g is an involution;
2. π(g) = 1, i.e. g has a trivial action on the basis of the fibration;
3. the set of points of S fixed by g is an irreducible hyperelliptic curve of genus k − 1 (a double covering of P^1 by means of π, ramified over 2k points), plus perhaps a finite number of isolated points, which are the singular points of the singular fibres not twisted by g.
Furthermore, if the three conditions above are satisfied, the number of singular fibres of π twisted by g is 2k ≥ 2.

Proof. 1 ⇒ 2: By contracting some exceptional curves, we may assume that the triple (⟨g⟩, S, π) is minimal. Suppose that g is an involution and π(g) ≠ 1.
Then g may twist only two singular fibres, which are the fibres of the two points of P^1 fixed by π(g). Hence, the number of singular fibres is ≤ 2. Lemma 3.12 tells us that S is a del Pezzo surface of degree 6 and then Lemma 4.6 shows that g = κα,β (Example 4.5) for some α, β ∈ C∗. But such an element is an involution if and only if it acts trivially on the basis of the fibration.

(1 and 2) ⇒ 3: Suppose first that (⟨g⟩, S, π) is minimal. This implies that g twists every singular fibre of π. Therefore, since π(g) = 1 and g^2 = 1, on a singular fibre there is one point fixed by g (the singular point of the fibre) and on a general fibre there are two fixed points. The set of points of S fixed by g is thus a smooth irreducible curve. The projection π gives it as a double covering of P^1 ramified over the points whose fibres are singular and twisted by g. By the Riemann-Hurwitz formula, this number is even, equal to 2k and the genus of the curve is k − 1. The situation when (⟨g⟩, S, π) is not minimal is obtained from this one, by blowing-up some fixed points. This adds in each new singular fibre (not twisted by the involution) an isolated point, which is the singular point of the singular fibre. We then get the third assertion and the final remark.

3 ⇒ 2: This implication is clear.

2 ⇒ 1: If π(g) = 1, then g^2 leaves every component of every singular fibre of π invariant. Let p1 : S → F1 be the birational morphism of conic bundles given by Lemma 3.10; it is a g^2-equivariant birational morphism which conjugates g^2 to an automorphism of F1 that necessarily fixes the exceptional section. The pull-back by p1 of this section is a section C of π, fixed by g^2. Since C touches exactly one component of each singular fibre (in particular those that are twisted by g), g sends C to another section D also fixed by g^2. The union of the sections D and C intersects a general fibre in two points, which are exchanged by the action of g. This implies that g has order 2.
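The genus count used in the step (1 and 2) ⇒ 3 of this proof is the standard Riemann-Hurwitz computation for a double covering of P^1. As a brief sketch, with C denoting the fixed curve and b the number of branch points (both symbols introduced here only for this illustration):

```latex
% C -> P^1 is a covering of degree 2, simply ramified over b points,
% and P^1 has genus 0.
\[
  2g(C) - 2 \;=\; 2\,(2\cdot 0 - 2) + b \;=\; b - 4
  \quad\Longrightarrow\quad
  g(C) \;=\; \frac{b}{2} - 1 .
\]
% Since g(C) is an integer, b is even; writing b = 2k gives g(C) = k - 1.
```

For instance, k = 1 gives a rational fixed curve and k = 2 an elliptic one, in line with the later statement that an element fixing a curve of positive genus twists at least 4 singular fibres.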
We now give some further simple results on twisting involutions. Corollary 6.2. Let (S, π) be some conic bundle. No involution twisting (S, π) has a root in Aut(S, π) which acts trivially on the basis of the fibration. Proof. Such a root must twist a singular fibre and so (Lemma 6.1) is an involution. Remark 6.3. There may exist some roots in Aut(S, π) of twisting involutions which act non trivially on the basis of the fibration. Take for example four general points A1, ..., A4 of the plane and denote by g ∈ Aut(P2) the element of order 4 that permutes these points cyclically. The blow-up of these points conjugates g to an automorphism of the del Pezzo surface S5 of degree 5 (see Section 5). The pencil of conics of P2 passing through the four points induces a conic bundle structure on S5, with three singular fibres which are the lift of the pairs of two lines passing through the points. The lift on S5 of g is an automorphism of the conic bundle whose square is a twisting involution. Corollary 6.4. Let (S, π) be some conic bundle and let g ∈ Aut(S, π). The following conditions are equivalent. 1. g twists more than 2 singular fibres of π. 2. g fixes a curve of positive genus. And these conditions imply that g is an involution which acts trivially on the basis of the fibration and twists at least 4 singular fibres. Proof. The first condition implies that g acts trivially on the basis of the fibration, and thus (by Lemma 6.1) that g is an involution which fixes a curve of positive genus. Suppose that g fixes a curve of positive genus. Then, g acts trivially on the basis of the fibration, and fixes 2 points on a general fibre. Consequently, the curve fixed by g is a smooth hyperelliptic curve; we get the remaining assertions from Lemma 6.1. As we mentioned above, the automorphisms that twist some singular fibre are fundamental (Lemma 3.8). 
We now describe these elements and prove that the only possibilities are twisting involutions, roots of twisting involutions (of even or odd order) and elements of the form κα,β (see Example 4.5):

Proposition 6.5 (Classification of twisting elements of finite order). Let g ∈ Aut(S, π) be a twisting automorphism of finite order of a conic bundle (S, π). Let n be the order of its action on the basis. Then g^n is an involution that acts trivially on the basis of the fibration and twists an even number 2k of singular fibres; furthermore, exactly one of the following situations occurs:
1. n = 1.
2. n > 1 and k = 0; in this case n is even and there exists a g-equivariant birational morphism of conic bundles η : S → S6 (where S6 is the del Pezzo surface of degree 6) such that ηgη^{-1} = κα,β for some α, β ∈ C∗ (see Example 4.5).
3. n > 1 is odd and k > 0; here g twists 1 or 2 fibres, which are the fibres twisted by g^n that are invariant by g.
4. n is even and k > 0; here g twists r = 1 or 2 singular fibres; none of them are twisted by g^n; moreover the action of g on the set of 2k fibres twisted by g^n is fixed-point free; furthermore, n divides 2k, and 2k/n ≡ r (mod 2).

Proof. Lemma 6.1 describes the situation when n = 1. We now assume that n > 1; by blowing-down some components of singular fibres we may also suppose that the triple (⟨g⟩, S, π) is minimal. Denote by a1, a2 ∈ P^1 the two points fixed by π(g) ∈ Aut(P^1). For i ≢ 0 (mod n) the element π(g^i) fixes only two points of P^1, namely a1 and a2 (since π(g) has order n); the only possible fibres twisted by g^i are thus π^{-1}(a1), π^{-1}(a2).

Suppose that g^n does not twist any singular fibre. By minimality there are at most 2 singular fibres (π^{-1}(a1) and/or π^{-1}(a2)) of π and g twists each one. Lemma 3.12 tells us that S is a del Pezzo surface of degree 6 and Lemma 4.6 shows that

g = κα,β : ((x : y : z), (u : v : w)) ↦ ((u : αw : βv), (x : α^{-1}z : β^{-1}y))

for some α, β ∈ C∗.
We compute the square of g and find

g^2 : ((x : y : z), (u : v : w)) ↦ ((x : αβ^{-1}y : α^{-1}βz), (u : α^{-1}βv : αβ^{-1}w)).

Consequently, the order of g is 2n. The fact that g^i twists π^{-1}(a1) and π^{-1}(a2) when i is odd implies that n is even. Case 2 is complete.

If g^n twists at least one singular fibre, it twists an even number of singular fibres (Lemma 6.1) which we denote by 2k, and g^n is an involution. If n is odd, each fibre twisted by g^n is twisted by g, and conversely; this yields case 3.

It remains to consider the more difficult case when n is even. Firstly we observe that there are r + 2k singular fibres with r ∈ {1, 2}, corresponding to the points a1 and/or a2, c1, ..., c2k of P^1, the first r of them being twisted by g and the 2k others by g^n. Under the permutation π(g), the set {c1, ..., c2k} decomposes into disjoint cycles of length n (this action is fixed-point-free); this shows that n divides 2k. We write t = 2k/n ∈ N and set $\{c_1, \dots, c_{2k}\} = \bigcup_{i=1}^{t} C_i$, where each Ci ⊂ P^1 is an orbit of π(g) of size n.

To deduce the congruence r ≡ t (mod 2), we study the action of g on Pic(S). For i ∈ {1, ..., t}, choose Fi to be a component of the singular fibre over some point of Ci, and for i ∈ {1, ..., r} choose Li to be a component of the fibre over ai. Let us write

\[ R = \sum_{i=1}^{t} \big( F_i + g(F_i) + \dots + g^{n-1}(F_i) \big) + \sum_{i=1}^{r} L_i \in \mathrm{Pic}(S). \]

Denoting by f ⊂ S a general fibre of π, we find the equalities g(Li) = f − Li and g^n(Fi) = f − Fi in Pic(S), which yield (once again in Pic(S)):

\[ g(R) = R + (r+t)f - 2\Big( \sum_{i=1}^{r} L_i + \sum_{i=1}^{t} F_i \Big). \]

The contraction of the divisor R gives rise to a birational morphism of conic bundles (not g-equivariant) ν : S → Fm for some integer m ≥ 0. Denote by s ⊂ S the pull-back by ν of a general section of Fm of self-intersection m (which does not pass through any of the base-points of ν^{-1}). The canonical divisor K_S of S is then equal in Pic(S) to the divisor −2s + (m − 2)f + R.
We compute g(2s) and 2(g(s) − s) = g(2s) − 2s in Pic(S):

\[ \begin{aligned} g(2s) &= g\big(-K_S + (m-2)f + R\big) = -K_S + (m-2)f + g(R); \\ g(2s) - 2s &= g(R) - R = (r+t)f - 2\Big( \sum_{i=1}^{r} L_i + \sum_{i=1}^{t} F_i \Big). \end{aligned} \]

This shows that (r + t)f ∈ 2 Pic(S), which implies that r ≡ t (mod 2). Case 4 is complete.

Corollary 6.6. If g ∈ Aut(S, π) is a root of a twisting involution h that fixes a rational curve (i.e. that twists 2 singular fibres) and if g twists at least one fibre not twisted by h, then g^2 = h, g twists exactly one singular fibre, and it exchanges the two fibres twisted by h.

Proof. We apply Proposition 6.5 and obtain case 4 with k = 1.

Corollary 6.6 and the following result will be useful in the sequel.

Lemma 6.7. Let g ∈ Aut(S, π) be a non-trivial automorphism of finite order that leaves every component of every singular fibre of π invariant (i.e. that acts trivially on Pic(S)) and let h ∈ Aut(S, π) be an element that commutes with g. Then, either no singular fibre of π is twisted by h or each singular fibre of π which is invariant by h is twisted by h.

Proof. If no twisting element belongs to Aut(S, π), we are done. Otherwise, the birational morphism of conic bundles p0 : S → P^1 × P^1 given by Lemma 3.10 conjugates g to an element of finite order of Aut(P^1 × P^1, π1) whose set of fixed points is the union of two rational curves. The set of points of S fixed by g is thus the union of two sections and a finite number of points (which are the singular points of the singular fibres of π). Any element h ∈ Aut(S, π) that commutes with g leaves the set of these two sections invariant. More precisely, the action on one invariant singular fibre F implies the action on the two sections: h exchanges the two sections if and only if it twists F. Since the situation is the same at any other singular fibre, we obtain the result.

We conclude this section with some results on automorphisms of infinite order of conic bundles, which will not help us directly here but seem interesting to observe.
Proposition 6.8 (Classification of twisting elements of infinite order). Let (S, π) be a conic bundle and g ∈ Aut(S, π) be a twisting automorphism of infinite order. Then g twists exactly two fibres of π and there exists some g-equivariant birational morphism of conic bundles η : S → S6, where S6 is the del Pezzo surface of degree 6 and ηgη^{-1} = κα,β for some α, β ∈ C∗.

Proof. Assume that the triple (⟨g⟩, S, π) is minimal. Lemma 6.1 shows that no twisting element of infinite order acts trivially on the basis of the fibration. Consequently, g^k acts trivially on the basis if and only if k = 0, whence g^k twists a fibre F if and only if k is odd and g twists F. There thus exist at most 2 singular fibres of π, and Lemma 3.12 tells us that S is a del Pezzo surface of degree 6. Lemma 4.6 shows that g = κα,β for some α, β ∈ C∗.

Corollary 6.9. Let g ∈ Aut(S, π) be an element of infinite order; then a birational morphism conjugates g to an automorphism of a Hirzebruch surface.

Proof. Assume that the triple (⟨g⟩, S, π) is minimal. If the fibration is smooth, we are done. Otherwise, a birational morphism conjugates g to an automorphism κα,β ∈ Aut(S6) of a conic bundle on the del Pezzo surface of degree 6 (Proposition 6.8). We conclude by using Lemma 4.2.

7 The example Cs24

We now give the most important example of this paper. This is the only finite Abelian subgroup of the Cremona group which is not conjugate to a group of automorphisms of P^2 or P^1 × P^1 but whose non-trivial elements do not fix any curve of positive genus (Theorem 2).

Let S6 ⊂ P^2 × P^2 be the del Pezzo surface of degree 6 (see Section 4) defined by S6 = {((x : y : z), (u : v : w)) | ux = yv = zw}; we keep the notation of Section 4. We denote by η : Ŝ4 → S6 the blow-up of the points A4, A5 ∈ S6 defined by
A4 = ((0 : 1 : 1), (1 : 0 : 0)) ∈ D23,
A5 = ((1 : 0 : 0), (0 : 1 : −1)) ∈ E1.
We again denote by E1, E2, E3, D12, D13, D23 the total pull-backs by η of these divisors of S6. We denote by Ẽ1 and D̃23 the strict pull-backs of E1 and D23 by η.
(Note that for the other exceptional divisors, the strict and total pull-backs are the same.) Let us illustrate the situations on the surfaces S6 and Ŝ4 respectively:

[Figure: configuration of the curves of negative self-intersection on S6 and on Ŝ4; surviving labels: E2, E3, E4, E5, D12, D13, D14, D15.]

Let π1 denote the morphism S6 → P^1 defined in Section 4. The morphism π = π1 ◦ η gives the surface Ŝ4 a conic bundle structure (Ŝ4, π). It has 4 singular fibres, which are the fibres of (−1 : 1), (0 : 1), (1 : 1) and (1 : 0). We denote by f the divisor of Ŝ4 corresponding to a fibre of π and set E4 = η^{-1}(A4), E5 = η^{-1}(A5). Note that E4 is one of the components of the singular fibre of (1 : 1); we denote by D14 = f − E4 the other component, which is the strict pull-back by η of π1^{-1}(1 : 1). Similarly, we denote by D15 the divisor f − E5, so that the singular fibre of (−1 : 1) is {E5, D15}.

Lemma 7.1. On the surface Ŝ4 there are exactly 10 irreducible rational smooth curves of negative self-intersection. Explicitly, the 8 curves E2, E3, E4, E5, D12, D13, D14, D15 have self-intersection −1 and the two curves Ẽ1 = E1 − E5 and D̃23 = D23 − E4 have self-intersection −2.

Proof. The difficult part is to show that every rational irreducible smooth curve of negative self-intersection is one of the ten given above. Let C be such a curve. Denote by L the pull-back of a general line of P^2 by the blow-up pr1 ◦ η : Ŝ4 → P^2 of the five points. If C is collapsed by pr1 ◦ η, then C is one of the curves Ẽ1, E2, E3, E4, E5. Otherwise, $C = mL - \sum_{i=1}^{5} a_i E_i$, where m, a1, ..., a5 are non-negative integers, and m > 0. Since C is rational we have $C \cdot (C + K_{\hat S_4}) = -2$, and by hypothesis C^2 = −r for some positive integer r. The relations $C^2 = -r$ and $C \cdot K_{\hat S_4} = r - 2$ imply (since $K_{\hat S_4} = -3L + \sum_{i=1}^{5} E_i$) the equations

\[ \sum_{i=1}^{5} a_i^2 = m^2 + r, \qquad \sum_{i=1}^{5} a_i = 3m + r - 2. \tag{2} \]

If m = r = 1, we find that C is the pull-back of a line passing through two of the points, so C = D1i for some i ∈ {2, ..., 5}. If m = 2 and r = 1, C is the pull-back of a conic passing through each blown-up point.
The configuration of the points eliminates this possibility. If m = 1 and r = 2, we obtain a line passing through three blown-up points, so C = D̃23. We now prove that there is no integral solution to (2) for m, r ≥ 2. Since $\big(\sum_{i=1}^{5} a_i\big)^2 \le 5 \sum_{i=1}^{5} a_i^2$ (by the Cauchy-Schwarz inequality with the vectors (1, ..., 1) and (a1, ..., a5)), we obtain $(3m + (r-2))^2 \le 5(m^2 + r)$, and this gives

\[ 4m^2 - 10 + (r-2)(6m + r - 7) \le 0. \]

But this is not possible if m, r ≥ 2, since in this case 4m^2 > 10 and 6m + r > 7.

Note that $(K_{\hat S_4})^2 = 4$, which is why we denote this surface by Ŝ4; the hat is here because the surface is not a del Pezzo surface, since it contains irreducible divisors of self-intersection −2.

Corollary 7.2. There is only one conic bundle structure on Ŝ4, which is the one induced by π = π1 ◦ η.

Proof. Since $(K_{\hat S_4})^2 = 4$, the number of singular fibres of any conic bundle is 4, and thus it consists of eight (−1)-curves C1, ..., C8. The divisor of a fibre of the conic bundle is thus equal to $\frac{1}{4}\sum_{i=1}^{8} C_i$. Since there are exactly eight (−1)-curves on Ŝ4, there is no choice.

The group of automorphisms of Ŝ4 that leave every curve of negative self-intersection invariant is isomorphic to C∗ and corresponds to automorphisms of P^2 of the form (x : y : z) ↦ (αx : y : z), for α ∈ C∗. Indeed, such automorphisms are the lifts of automorphisms of S6 leaving invariant every exceptional curve (which are of the form ((x : y : z), (u : v : w)) ↦ ((x : αy : βz), (u : α^{-1}v : β^{-1}w)), for α, β ∈ C∗) and which fix both points A4 and A5.

Definition 7.3. Let h1 and h2 be the following birational transformations of P^2:
h1 : (x : y : z) ⇢ (yz : xy : −xz),
h2 : (x : y : z) ⇢ (yz(y − z) : xz(y + z) : xy(y + z)),
and denote respectively by g1, g2 the lift of these elements on Ŝ4 and by Cs24 the group generated by g1 and g2.

The following lemma shows that Cs24 ⊂ Aut(Ŝ4, π) and describes some of the properties of this group.

Lemma 7.4. Let h1, h2, g1, g2, Cs24 be as in Definition 7.3. Then:
1.
The group Cs24 is a group of automorphisms of Ŝ4 that preserve the conic bundle (Ŝ4, π), i.e. Cs24 ⊂ Aut(Ŝ4, π).
2. The action of g1 and g2 on the set of irreducible rational curves of negative self-intersection is respectively:
(Ẽ1 D̃23)(E2 D12)(E3 D13)(E4 E5)(D14 D15),
(Ẽ1 D̃23)(E2 D13)(E3 D12)(E4 D14)(E5 D15).
In particular, both g1 and g2 twist the conic bundle (Ŝ4, π).
3. Both g1 and g2 are elements of order 4 and (h1)^2 = (h2)^2 = (x : y : z) ↦ (−x : y : z). Thus (g1)^2 = (g2)^2 ∈ ker π is an automorphism of Ŝ4 which leaves every divisor of negative self-intersection invariant.
4. The group Cs24 is isomorphic to Z/2Z × Z/4Z and the action on the basis of the fibration π yields the exact sequence 1 → ⟨(h1)^2⟩ ≅ Z/2Z → Cs24 → ⟨π(h1), π(h2)⟩ ≅ (Z/2Z)^2 → 1.
5. The group Cs24 contains no involution that twists the conic bundle (Ŝ4, π). In particular, no element of Cs24 fixes a curve of positive genus.
6. The pair (Cs24, Ŝ4) and the triple (Cs24, Ŝ4, π) are both minimal.

Proof. Observe first that h1 and h2 preserve the pencil of lines of P^2 passing through the point A1 = (1 : 0 : 0), so g1, g2 are birational transformations of Ŝ4 that send a general fibre of π to another fibre. Then, we compute (h1)^2 = (h2)^2 = (x : y : z) ↦ (−x : y : z). This implies that both h1 and h2 are birational maps of order 4.

Note that the lift of h1 on the surface S6 is the automorphism κ1,−1 : ((x : y : z), (u : v : w)) ↦ ((u : w : −v), (x : z : −y)) (see Example 4.5). Since this automorphism permutes A4 and A5, its lift on Ŝ4 is biregular. The action on the divisors with negative self-intersection is deduced from that of κ1,−1 (see Lemma 4.6).

Compute the involution h3 = h1h2 = (x : y : z) ⇢ (x(y + z) : z(y − z) : −y(y − z)). Its linear system is {ax(y + z) + (by + cz)(y − z) = 0 | (a : b : c) ∈ P^2}, which is the linear system of conics passing through (0 : 1 : 1) and A1 = (1 : 0 : 0), with tangent y + z = 0 at this point (i.e. passing through A5).
Blowing-up these three points (two on P^2 and one in the blow-up of A1), we get an automorphism g3 of some rational surface. As the points A2 = (0 : 1 : 0) and A3 = (0 : 0 : 1) are permuted by h3, we can also blow them up and again get an automorphism. The isomorphism class of the surface obtained is independent of the order of the blown-up points. We may first blow-up A1, A2, A3 and get S6. Then, we blow-up the two other base-points of h3, which are in fact A4 (the point (0 : 1 : 1)) and A5 (the point infinitely near to A1 corresponding to the tangent y + z = 0). This shows that g3, and therefore g2, belong to Aut(Ŝ4, π).

Since h3 permutes the points A2 and A3, g3 = g1g2 permutes the divisors E2 and E3. It also permutes D12 and D13, since h3 leaves the pencil of lines passing through A1 invariant. It therefore leaves Ẽ1 and D̃23 invariant, since E2 and E3 touch D̃23 but not E1. The remaining exceptional divisors are E4, E5, D14, D15. Either g1g2 leaves all four invariant, or it acts as (E4 D15)(E5 D14) (using the intersection with Ẽ1 and D̃23). Since A4 and A5 are base-points of h1h2, E4 and E5 are not invariant. Thus, g1g2 acts on the irreducible rational curves of negative self-intersection as (E2 E3)(D12 D13)(E4 D15)(E5 D14). We obtain the action of g2 by composing that of g1g2 with that of g1 and thus have proved assertions 1 through 3.

Assertion 4 follows from assertion 3 and the fact that g1 and g2 commute.

Let us prove that Cs24 contains no involution that twists the conic bundle (Ŝ4, π). Recall that such elements are involutions acting trivially on the basis of the fibration (see Lemma 6.1). Note that the 2-torsion of Cs24 is equal to {1, (g1)^2, g1g2, g1(g2)^{-1}}. The elements g1g2 and g1(g2)^{-1} do not act trivially on the basis of the fibration, and the element (g1)^2 does not twist any singular fibre since it leaves every curve of negative self-intersection invariant. This proves assertion 5.

It remains to prove the last assertion.
Observe that the orbits of the action of Cs24 on the exceptional divisors of Ŝ4 are {E2, E3, D12, D13} and {E4, E5, D14, D15}. Since these orbits cannot be contracted, the pair (Cs24, Ŝ4) is minimal, and so is the triple (Cs24, Ŝ4, π).
Remark 7.5. The pair (Cs24, Ŝ4) was introduced in [Bla2] and was called Cs.24 because it is a group acting on a conic bundle, which is special, and isomorphic to Z/2Z × Z/4Z.
8 Finite Abelian groups of automorphisms of conic bundles - birational representative elements
In this section we use the tools prepared in the previous sections to describe the finite Abelian groups of automorphisms of conic bundles such that no non-trivial element fixes a curve of positive genus. We first treat the case in which no involution twisting the conic bundle belongs to the group:
Proposition 8.1. Let G ⊂ Aut(S, π) be a finite Abelian group of automorphisms of the conic bundle (S, π) such that:
• no involution that twists the conic bundle (S, π) belongs to G;
• the triple (G, S, π) is minimal.
Then, one of the following occurs:
• The fibration is smooth, i.e. S is a Hirzebruch surface.
• S is the del Pezzo surface of degree 6.
• The triple (G, S, π) is isomorphic to the triple (Cs24, Ŝ4, π) of Section 7.
Proof. We assume that the fibration is not smooth. Recall that since the triple (G, S, π) is minimal, any singular fibre of π is twisted by an element of G (by Lemma 3.8). Since no twisting involution belongs to G, any element g ∈ G that twists a fibre corresponds to case 2 of Proposition 6.5. In particular, g is the lift to S of an automorphism of the form κα,β of the del Pezzo surface of degree 6 and it twists 2 singular fibres, which correspond to the fibres of the two fixed points of π(g) ∈ PGL(2,C). Furthermore, g is the root of an involution that leaves every component of every singular fibre of π invariant. If the number of singular fibres is exactly two, then S is the del Pezzo surface of degree 6, and we are done.
Now suppose that the number of singular fibres is larger than two. This implies that π(G) is not a cyclic group (otherwise the non-trivial elements of π(G) would have the same two fixed points: there would then be at most two singular fibres); therefore, π(G) is isomorphic to (Z/2Z)². By a judicious choice of coordinates we may suppose that π(G) = {(x1 : x2) ↦ (±x1 : x2), (x1 : x2) ↦ (±x2 : x1)}. Since a singular fibre corresponds to a fixed point of one of the three elements of order 2 of π(G), only the fibres of (0 : 1), (1 : 0), (1 : 1), (−1 : 1), (i : 1), (−i : 1) can be singular. Since the group π(G) acts transitively on the sets {(1 : 0), (0 : 1)}, {(1 : ±1)} and {(1 : ±i)}, there are 4 or 6 singular fibres. We denote by g1 an element of G which twists the two singular fibres of (1 : 0) and (0 : 1). Let η : S → S6 denote the birational g1-equivariant morphism given by Proposition 6.5, which conjugates g1 to the automorphism ηg1η⁻¹ = κα,β : ((x : y : z), (u : v : w)) ↦ ((u : αw : βv), (x : α⁻¹z : β⁻¹y)) of the del Pezzo surface S6 of degree 6, for some α, β ∈ C∗. In fact, since π(g1) has order 2, we have β = −α, so ηg1η⁻¹ = κα,−α. The points blown up by η are fixed by η(g1)²η⁻¹ = (κα,−α)² : ((x : y : z), (u : v : w)) ↦ ((x : −y : −z), (u : −v : −w)) and therefore belong to the curves E1 = {((1 : 0 : 0), (0 : a : b)) | (a : b) ∈ P¹} and D23 = {((0 : a : b), (1 : 0 : 0)) | (a : b) ∈ P¹}. Since these points consist of orbits of ηg1η⁻¹, half of them lie in E1 and the other half in D23. In fact, up to a change of coordinates ((x, y, z), (u, v, w)) ↦ ((u, v, w), (x, y, z)), the points that may be blown up by η are
A4 = ((0 : 1 : 1), (1 : 0 : 0)) ∈ D23, κα,−α(A4) = A5 = ((1 : 0 : 0), (0 : 1 : −1)) ∈ E1,
A6 = ((0 : 1 : i), (1 : 0 : 0)) ∈ D23, κα,−α(A6) = A7 = ((1 : 0 : 0), (0 : 1 : i)) ∈ E1.
The strict pull-backs Ẽ1 and D̃23 by η of E1 and D23 respectively thus have self-intersection −2 or −3 in S, depending on the number of points blown up. By convention we again denote by E1, E2, E3, D12, D13, D23 the total pull-backs by η of these divisors.
(Note that for E2, E3, D12, D13, the strict and the total pull-backs are the same.) We set E4 = η⁻¹(A4), ..., E7 = η⁻¹(A7) and denote by f the divisor class of the fibre of the conic bundle.
(a) Suppose that η is the blow-up of A4 and A5, which implies that S is the surface Ŝ4 of Section 7. The Picard group of S is then generated by E1, E2, ..., E5 and f. Since we assumed that (G, S, π) is minimal, the singular fibres of (1 : 1) and (−1 : 1) must be twisted. One element g2 twists these two singular fibres and acts with order 2 on the basis of the fibration, with action (x1 : x2) ↦ (x2 : x1). Since g1 and g2 twist some singular fibre, both must invert the two curves of self-intersection −2, namely Ẽ1 and D̃23. The action of g1 and g2 on the irreducible rational curves of negative self-intersection is then respectively (Ẽ1 D̃23)(E2 D12)(E3 D13)(E4 E5)(D14 D15), (Ẽ1 D̃23)(E2 D13)(E3 D12)(E4 D14)(E5 D15). The elements g1 and g2 thus have the same action on Pic(S) = Pic(Ŝ4) as the two automorphisms with the same name in Definition 7.3 and Lemma 7.4, which generate Cs24. Note that the group H of automorphisms of S that leave every curve of negative self-intersection invariant is isomorphic to C∗ and corresponds to automorphisms of P² of the form (x : y : z) ↦ (αx : y : z), for any α ∈ C∗. Then, g1 and g2 are equal to the lifts of the following birational maps of P²:
h1 : (x : y : z) ⇢ (µyz : xy : −xz),
h2 : (x : y : z) ⇢ (νyz(y − z) : xz(y + z) : xy(y + z)),
for some µ, ν ∈ C∗. As h1h2(x : y : z) = (µx(y + z) : νz(y − z) : −νy(y − z)) and h2h1(x : y : z) = (νx(y + z) : µz(y − z) : −µy(y − z)) must be the same by hypothesis, we get µ² = ν². We observe that π(g1) and π(g2) generate π(G) ≅ (Z/2Z)²; on the other hand, by hypothesis an element of G′ does not twist a singular fibre and hence belongs to H.
As the only elements of H which commute with g1 are id and (g1)² (which is the lift of (h1)² : (x : y : z) ↦ (−x : y : z)), we see that g1 and g2 generate the whole group G. Conjugating h1 and h2 by (x : y : z) ↦ (αx : y : z), where α ∈ C∗, α² = µ, we may suppose that µ = 1. So ν = ±1 and we get in both cases the same group, because (h1)²(x : y : z) = (−x : y : z). The triple (G, S, π) is hence isomorphic to the triple (Cs24, Ŝ4, π) of Section 7.
(b) Suppose that η is the blow-up of A6 and A7. We get a case isomorphic to the previous one, using the automorphism ((x : y : z), (u : v : w)) ↦ ((x : y : iz), (u : v : −iw)) of S6.
(c) Suppose that η is the blow-up of A4, A5, A6 and A7. The Picard group of S is then generated by E1, E2, ..., E6, E7 and f. Since (G, S, π) is minimal, there must be two elements g2, g3 ∈ G that twist respectively the fibres of (±1 : 1) and those of (±i : 1). As in the previous case, the three actions of these elements on the basis are of order 2, and the three elements transpose Ẽ1 and D̃23. The actions of g1, g2 and g3 on the set of irreducible components of the singular fibres of π are then respectively (E2 D12)(E3 D13)(E4 E5)(D14 D15)(E6 E7)(D16 D17), (E2 D13)(E3 D12)(E4 D14)(E5 D15)(E6 E7)(D16 D17), (E2 D13)(E3 D12)(E4 E5)(D14 D15)(E6 D16)(E7 D17). This implies that the action of the element g1g2g3 is (E2 D12)(E3 D13)(E4 D14)(E5 D15)(E6 D16)(E7 D17), and thus it twists six singular fibres of the conic bundle and fixes a curve of genus 2 (Lemma 6.1), which contradicts the hypothesis. (In fact, one can also show that the group generated by g1, g2 and g3 is not Abelian, see [Bla2], page 66.)
After studying the groups that do not contain a twisting involution, we now study those which contain such elements. Since these twisting involutions cannot fix a curve of positive genus, they twist exactly two fibres (Lemma 6.1).
Proposition 8.2. Let G ⊂ Aut(S, π) be a finite Abelian group of automorphisms of a conic bundle (S, π) such that:
1.
If g ∈ G, g ≠ 1, then g does not fix a curve of positive genus.
2. The group G contains at least one involution that twists the conic bundle (S, π).
3. The triple (G, S, π) is minimal.
Then, S is a del Pezzo surface of degree 5 or 6.
Proof. If the number of singular fibres is at most 3, then the surface is a del Pezzo surface of degree 5 or 6 (Lemma 3.12). We now assume that the number of singular fibres is at least 4 and show that this situation is not compatible with the hypotheses. We recall once again the exact sequence (1) of Remark 3.13, 1 → G′ → G → π(G) → 1, and prove the following important assertions:
(a) No element of G twists more than two singular fibres.
(b) Any twisting involution that belongs to G belongs to G′ and twists exactly two singular fibres.
(c) Any singular fibre is twisted by an element of G.
(d) No non-trivial element preserves every component of every singular fibre.
(e) Any twisting element of G is a root of (or equal to) a twisting involution that belongs to G′.
Corollary 6.4 shows that an element that twists more than two fibres fixes a curve of positive genus; since this possibility is excluded by hypothesis, we obtain assertion (a). Lemma 6.1 shows that any twisting involution contained in G belongs to G′ and twists an even number of fibres; using assertion (a), we thus obtain assertion (b). Assertion (c) follows from the minimality of the triple (G, S, π) (see Lemma 3.8). Let us prove assertion (d). Suppose that there exists a non-trivial element g ∈ G that leaves every component of every singular fibre invariant, and denote by h ∈ G′ a twisting involution (which exists by hypothesis). Since g and h commute, Lemma 6.7 shows that each singular fibre invariant by h (there are at least 4) is twisted by h, which contradicts assertion (a). Therefore, such an element g does not exist and assertion (d) is proved.
Finally, Proposition 6.5 shows that any twisting element that does not act trivially on the basis of the fibration is a root of an involution that belongs to G′, and assertion (d) shows that this involution is twisting, and we obtain assertion (e).
Now that assertions (a) through (e) are proved, we deduce the proposition from them. Let us denote by σ ∈ G′ a twisting involution, which twists two singular fibres that we denote by F1 and F2. There are at least two other singular fibres F3 and F4 that are twisted by other elements of G.
If G′ = ⟨σ⟩, the fibres F3 and F4 are twisted by roots of σ belonging to G (assertions (c) and (e)). The description of these elements (Proposition 6.5, and in particular Corollary 6.6) shows that the roots must be square roots that twist exactly one singular fibre and permute the two fibres F1 and F2 twisted by σ. There thus exist two elements h3, h4 ∈ G that twist respectively the fibres F3 and F4. Since h3 commutes with h4, it must leave invariant the unique fibre twisted by h4, i.e. F4. Similarly, h4 must leave F3 invariant. Therefore, h3h4 leaves the four fibres F1, ..., F4 invariant and twists the two fibres F3 and F4; it is thus an involution that belongs to G′, which contradicts the fact that G′ = ⟨σ⟩.
If G′ ≠ ⟨σ⟩, since σ has no root in G′ (Corollary 6.2), the Abelian group G′ ⊂ PGL(2,C(x)) is isomorphic to (Z/2Z)² and contains (using (d)) three twisting involutions σ, ρ and σρ. Note that two of these three involutions do not twist singular fibres which are all distinct, otherwise the product of the two involutions would give an involution that twists 4 singular fibres, contradicting (a). We may thus suppose that ρ twists F1 and F3, which implies that σρ twists F2 and F3. The fibre F4 is then twisted by an element which is a square root of one of the three twisting involutions (assertion (e) and Corollary 6.6). Denote this square root by h and suppose that h² ≠ σ. Note that h exchanges the two singular fibres twisted by h².
One of these is twisted by σ and the other is not, so h and σ do not commute.
The only remaining possible finite Abelian groups of automorphisms of conic bundles satisfying property (F) thus act on del Pezzo surfaces of degree 6 or 5 (studied in Sections 4 and 5), on the triple (Cs24, Ŝ4, π) studied in Section 7, or on Hirzebruch surfaces. We now describe this last case and prove that it is birationally reduced to the case of P¹ × P¹.
Proposition 8.3. Let G ⊂ Aut(Fn) be a finite Abelian subgroup of automorphisms of Fn, for some integer n ≥ 1. Then, a birational map of conic bundles conjugates G to a finite group of automorphisms of F0 = P¹ × P¹ that leaves one ruling invariant.
Proof. Let G ⊂ Aut(Fn) be a finite Abelian group, with n ≥ 1. Note that G preserves the unique ruling of Fn. We denote by E ⊂ Fn the unique section of self-intersection −n, which is necessarily invariant by G. We have the exact sequence (1) (see Remark 3.13): 1 → G′ → G → π(G) → 1. Since the group π(G) ⊂ PGL(2,C) is Abelian, it is isomorphic to a cyclic group or to (Z/2Z)².
If π(G) is a cyclic group, at least two fibres are invariant by G. The group G fixes two points in one such fibre. We can blow up the point that does not lie on E and blow down the corresponding fibre to get a group of automorphisms of Fn−1. We do this n times and finally obtain a birational map of conic bundles that conjugates G to a group of automorphisms of F0 = P¹ × P¹.
If π(G) is isomorphic to (Z/2Z)², there exist two fibres F, F′ of π whose union is invariant by G. Let GF ⊂ G be the subgroup of G of elements that leave F invariant. This group is of index 2 in G and hence is normal. Since GF fixes the point F ∩ E in F, it acts cyclically on F. There exists another point P ∈ F, P ∉ E, which is fixed by GF. The orbit of P by G consists of two points, P and P′, such that P′ ∈ F′, P′ ∉ E. We blow up these two points and blow down the strict transforms of F and F′ to get a group of automorphisms of Fn−2.
We do this ⌊n/2⌋ times to obtain G as a group of automorphisms of F0 or F1. If n is even, we get in this manner a group of automorphisms of F0 = P¹ × P¹. Note that n cannot be odd if the group π(G) is not cyclic. Otherwise, we could conjugate G to a group of automorphisms of F1 and then to a group of automorphisms of P² by blowing down the exceptional section to a point Q ∈ P². We would get an Abelian subgroup of PGL(3,C) that fixes Q, and thus a group with at least three fixed points. In this case, the action on the set of lines passing through Q would be cyclic (see Proposition 2.2), which contradicts our hypothesis.
We can now prove the main result of this section:
Proposition 8.4. Let G ⊂ Aut(S, π) be some finite Abelian group of automorphisms of the conic bundle (S, π) such that the triple (G, S, π) is minimal and no non-trivial element of G fixes a curve of positive genus. Then, one of the following situations occurs:
1. S is a Hirzebruch surface Fn;
2. S is a del Pezzo surface of degree 5 or 6;
3. The triple (G, S, π) is isomorphic to the triple (Cs24, Ŝ4, π) of Section 7.
If we suppose that the pair (G, S) is minimal, then we are in case 1 with n ≠ 1 or in case 3. Moreover, cases 1 and 2 are birationally conjugate to automorphisms of P¹ × P¹ whereas the third is not.
Proof. The fact that one of the three cases occurs follows directly from Propositions 8.1 and 8.2. Case 1 is clearly minimal if and only if n ≠ 1 and Proposition 8.3 shows that it is conjugate to automorphisms of P¹ × P¹. In the case of del Pezzo surfaces of degree 5 and 6, the pair (G, S) is not minimal and the group is respectively birationally conjugate to a subgroup of Sym4 ⊂ Aut(P²) (Lemma 5.3) or Aut(P¹ × P¹) (Lemma 4.2). If the first situation occurs, since the group is Abelian and not isomorphic to (Z/3Z)² it is diagonalisable and conjugate to a subgroup of Aut(P¹ × P¹) (Proposition 2.2). Thus, we are done with case 2.
It remains to show that the pair (Cs24, Ŝ4) is not birationally conjugate to a group of automorphisms of P¹ × P¹. Let us suppose the contrary, i.e. that there exists some Cs24-equivariant birational map ϕ : Ŝ4 ⇢ P¹ × P¹ (that conjugates Cs24 to a group of automorphisms). Then, ϕ is the composition of Cs24-equivariant elementary links (see for example [Isk3, Theorem 2.5], or [Do-Iz, Theorem 7.7]). Since our group preserves the conic bundle, the first link is of type II, III or IV (in the classical notation of Mori theory). We now study these possibilities and show that it is not possible to go to P¹ × P¹.
Link of type II - In our case, this link is a birational map of conic bundles, which is the composition of the blow-up of an orbit of Cs24, no two points on the same fibre, with the blow-down of the strict transforms of the fibres of the points blown up. The points must be fixed by the elements of Cs24 that act trivially on the basis of the fibration, and thus an orbit has 4 points, two on Ẽ1 and two on D̃23. This link conjugates the triple (Cs24, Ŝ4, π) to a triple isomorphic to it, by Proposition 8.1.
Link of type III - It is the contraction of some set of skew exceptional curves, invariant by Cs24. This is impossible since the pair (Cs24, Ŝ4) is minimal (Lemma 7.4).
Link of type IV - It is a change of the fibration. This is not possible since the surface Ŝ4 admits only one conic bundle fibration (Corollary 7.2).
9 Actions on del Pezzo surfaces with fixed part of the Picard group of rank one
In this section we prove the following result (note that finiteness is not required and that minimality of the action is implied by the condition on Pic(S)^G).
Proposition 9.1. Let S be a del Pezzo surface, and let G ⊂ Aut(S) be an Abelian group such that rk Pic(S)^G = 1 and no non-trivial element of G fixes a curve of positive genus. Then, one of the following occurs:
1. S ≅ P² or S ≅ P¹ × P¹;
2. S is a del Pezzo surface of degree 5 and G ≅ Z/5Z;
3.
S is a del Pezzo surface of degree 6 and G ≅ Z/6Z.
Furthermore, in cases 2 and 3, the group G is birationally conjugate to a diagonal cyclic subgroup of Aut(P²).
This will be proved separately for each degree, in Lemmas 9.7, 9.8, 9.13, 9.15, 9.16 and 9.17.
Remark 9.2. A del Pezzo surface S is either P¹ × P¹ or the blow-up of 0 ≤ r ≤ 8 points in general position on P² (i.e. such that no irreducible curve of self-intersection ≤ −2 appears on S). The group Pic(S) has rank r + 1, and its intersection form gives a decomposition $\mathrm{Pic}(S) \otimes \mathbb{Q} = \mathbb{Q}K_S \oplus K_S^{\perp}$; the signature is (1, −1, ..., −1). The group Aut(S) of automorphisms of a del Pezzo surface S acts on Pic(S) and preserves the intersection form. This gives a homomorphism Aut(S) → Aut(Pic(S)) which is injective if and only if r > 3, since the kernel is the lift of automorphisms of P² that fix the r blown-up points. Furthermore, the image is contained in the Weyl group and is finite (see [Dol]). In particular, the group Aut(S) is finite if and only if r > 3.
When we have some group action on a del Pezzo surface, we would like to determine the rank of the fixed part of the Picard group. Here are some tools to this end.
Lemma 9.3 (Size of the orbits). Let S be a del Pezzo surface, which is the blow-up of 1 ≤ r ≤ 8 points of P² in general position, and let G ⊂ Aut(S) be a subgroup of automorphisms with rk Pic(S)^G = 1. Then:
• G ≠ {1};
• the size of any orbit of the action of G on the set of exceptional divisors is divisible by the degree of S, which is 9 − r;
• in particular, if the order of G is finite, it is divisible by the degree of S.
Proof. It is clear that G ≠ {1}, since rk Pic(S) > 1. Let D1, D2, ..., Dk be k exceptional divisors of S, forming an orbit of G (the orbit is finite, see Remark 9.2). The divisor $\sum_{i=1}^{k} D_i$ is fixed by G and thus is a multiple of K_S. We can write $\sum_{i=1}^{k} D_i = aK_S$, for some a ∈ Q. In fact, since aK_S is effective, we have a < 0 and a ∈ Z.
Since the Di's are irreducible and rational, we deduce from the adjunction formula Di(K_S + Di) = −2 that Di · K_S = −1. Hence $K_S \cdot \sum_{i=1}^{k} D_i = \sum_{i=1}^{k} K_S \cdot D_i = -k = K_S \cdot aK_S = a(9-r)$. Consequently, the degree 9 − r divides the size k of the orbit.
Remark 9.4. This lemma shows in particular that rk Pic(S)^G > 1 if S is the blow-up of r = 1, 2 points of P², a result which is obvious when r = 1, and is clear when r = 2, since the line joining the two blown-up points is invariant by any automorphism.
Lemma 9.5. Let S be some (smooth projective rational) surface, and let g ∈ Aut(S) be some automorphism of finite order. Then, the trace of g acting on Pic(S) is equal to χ(Fix(g)) − 2, where Fix(g) ⊂ S is the set of fixed points of g and χ is the Euler characteristic.
Proof. This follows from the topological Lefschetz fixed-point formula, which asserts that the trace of g acting on H∗(S,Z) is equal to χ(Fix(g)) (this uses the fact that g is a homeomorphism of finite order). Since S is a complex rational surface, H0(S,Z) and H4(S,Z) have dimension 1, H2(S,Z) ≅ Pic(S), and Hi(S,Z) = 0 for i ≠ 0, 2, 4. Since the trace on H0 and H4 is 1, we obtain the result.
Remark 9.6. This lemma is false if the order of g is infinite. Take for example the automorphism (x : y : z) ↦ (λx : y : z + y) of P², for any λ ∈ C∗, λ ≠ 1. It fixes exactly two points, namely (1 : 0 : 0) and (0 : 0 : 1), but its trace on Pic(P²) = Z is 1.
We now start the proof of Proposition 9.1 by studying the cases of del Pezzo surfaces of degree 6 or 5.
Lemma 9.7 (Actions on the del Pezzo surface of degree 6). Let S6 = {((x : y : z), (u : v : w)) | ux = vy = wz} ⊂ P² × P² be the del Pezzo surface of degree 6 and let G ⊂ Aut(S6) be an Abelian group such that rk Pic(S6)^G = 1. Then, G is conjugate in Aut(S6) to the cyclic group of order 6 generated by ((x : y : z), (u : v : w)) ↦ ((v : w : u), (y : z : x)). Furthermore, G is birationally conjugate to a diagonal subgroup of Aut(P²).
Proof.
Lemma 9.3 implies that the sizes of the orbits of the action of G on the exceptional divisors are divisible by 6. The action of G on the hexagon of exceptional divisors is thus transitive, so G contains an element g of the form ((x : y : z), (u : v : w)) ↦ ((αv : βw : u), (βy : αz : αβx)), where α, β ∈ C∗. As the only element of (C∗)² that commutes with g is the identity (see the description of Aut(S6) = (C∗)² ⋊ (Sym3 × Z/2Z) in Section 4), G must be cyclic, generated by g. Conjugating it by ((x : y : z), (u : v : w)) ↦ ((βx : y : αz), (αu : αβv : βw)), we may assume that α = β = 1, as stated in the lemma (this shows in particular that G is of finite order).
It remains to prove that this automorphism is birationally conjugate to a linear automorphism of the plane. Denote by p : S6 → P² the restriction of the projection on the first factor. This is a birational morphism which is the blow-up of the three diagonal points A1, A2, A3 of P². Consider the birational map ĝ = pgp⁻¹ of P², which is explicitly ĝ : (x : y : z) ⇢ (xz : xy : yz). Since g is an automorphism of the surface, it fixes the canonical divisor K_S, so the birational map ĝ leaves the linear system of cubics of P² passing through A1, A2 and A3 invariant (this can also be verified directly). Note that ĝ fixes exactly one point of P², namely P = (1 : 1 : 1), and that its action on the projective tangent space P(T_P(P²)) of P² at P is of order 3, with two fixed points, corresponding to the lines (x − y) + ω^k(z − y) = 0, where ω = e^{2iπ/3}, k = 1, 2. Hence, the birational map ĝ preserves the linear system of cubics of P² passing through A1, A2 and A3, which have a double point at P and are tangent to the line (x − y) + ω(z − y) = 0 at this point. This linear system thus induces a birational transformation of P² that conjugates ĝ to a linear automorphism.
Lemma 9.8 (Actions on the del Pezzo surface of degree 5).
Let S5 be the del Pezzo surface of degree 5 and let G ⊂ Aut(S5) = Sym5 be an Abelian group such that rk Pic(S5)^G = 1. Then, G is cyclic of order 5. Furthermore, G is birationally conjugate to a diagonal subgroup of Aut(P²).
Proof. We use the description of the surface S5 and its automorphism group Aut(S5) = Sym5 given in Section 5. Lemma 9.3 implies that the order of G is divisible by 5, and thus that G is a cyclic subgroup of Sym5 of order 5. Since all such subgroups are conjugate in Aut(S5) = Sym5, we may suppose that G is generated by the lift of the birational transformation h : (x : y : z) ⇢ (xy : y(x − z) : x(y − z)) of P², which fixes two points of P², namely (ζ + 1 : ζ : 1), where ζ² − ζ − 1 = 0. Denoting one of them by P, the linear system of cubics passing through the four blown-up points and having a double point at P is invariant by h. The birational transformation associated to this system thus conjugates h to a linear automorphism of P².
Remark 9.9. The fact that (x : y : z) ⇢ (xy : y(x − z) : x(y − z)) is linearisable was proved in [Be-Bl], using the same argument as above.
Corollary 9.10. Let S be a rational surface with (K_S)² ≥ 5 and let G ⊂ Aut(S) be a finite Abelian group. Then G is birationally conjugate to a subgroup of Aut(P²) or Aut(P¹ × P¹).
Proof. We may assume that the pair (G, S) is minimal; consequently there are two possibilities (see [Man], [Isk2] or [Do-Iz]):
1. S is a del Pezzo surface and rk Pic(S)^G = 1. Then S is either P², P¹ × P¹ or a del Pezzo surface of degree 6 or 5 (Remark 9.4); we apply Lemmas 9.7 and 9.8 to conclude.
2. G preserves a conic bundle structure on S. Here the number of singular fibres is at most 3, hence no element of G fixes a curve of positive genus (Corollary 6.4); we apply Proposition 8.4 to conclude.
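Two facts about the map h used in the proof of Lemma 9.8 lend themselves to a quick exact check: that h has order 5, and that the points (ζ + 1 : ζ : 1) with ζ² − ζ − 1 = 0 are fixed. The following sketch is an illustration only. The class `GoldenInt` (arithmetic in Z[ζ] using nothing but the relation ζ² = ζ + 1, so both roots are covered at once), the helper names and the two sample points are hypothetical choices. The order is tested at sample points, which is evidence rather than a proof.

```python
from math import gcd

def h(p):
    """The map (x : y : z) -> (xy : y(x - z) : x(y - z)); works for any ring elements."""
    x, y, z = p
    return (x * y, y * (x - z), x * (y - z))

def normalize(p):
    """Divide an integer triple by the gcd of its entries (same point of P^2)."""
    g = gcd(gcd(p[0], p[1]), p[2])
    return tuple(c // g for c in p)

def same_point(p, q):
    """Triples define the same projective point iff all 2x2 minors vanish."""
    return all(p[i] * q[j] == p[j] * q[i] for i in range(3) for j in range(3))

def iterate_h(p, times):
    """Apply h repeatedly to an integer point, rescaling to keep numbers small."""
    for _ in range(times):
        p = normalize(h(p))
    return p

class GoldenInt:
    """Element a + b*zeta of Z[zeta], where zeta**2 = zeta + 1."""
    def __init__(self, a, b=0):
        self.a, self.b = a, b
    def __sub__(self, other):
        return GoldenInt(self.a - other.a, self.b - other.b)
    def __mul__(self, other):
        # (a + b z)(c + d z) = (ac + bd) + (ad + bc + bd) z, using z^2 = z + 1
        return GoldenInt(self.a * other.a + self.b * other.b,
                         self.a * other.b + self.b * other.a + self.b * other.b)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

# Order 5, tested on two generic integer points.
samples = [(2, 3, 5), (1, 2, 4)]
order_five = all(
    same_point(iterate_h(p, 5), p) and not same_point(h(p), p) for p in samples
)

# Fixed point (zeta + 1 : zeta : 1), valid for either root of zeta^2 - zeta - 1.
zeta, one = GoldenInt(0, 1), GoldenInt(1, 0)
P = (GoldenInt(1, 1), zeta, one)
is_fixed = same_point(h(P), P)

print(order_five, is_fixed)
```

Because only the relation ζ² = ζ + 1 is used, the same computation covers both fixed points mentioned in the proof.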
To study del Pezzo surfaces of degree 4, let us describe their group of automorphisms (note that we do not use the notation Sd for the del Pezzo surfaces of degree d ≤ 4, because there are many different surfaces of the same degree):
Lemma 9.11 (Automorphism group of del Pezzo surfaces of degree 4). Let S be a del Pezzo surface of degree 4 given by the blow-up η : S → P² of five points A1, ..., A5 ∈ P² such that no three are collinear. Setting Ei = η⁻¹(Ai) and denoting by L the pull-back by η of a general line of P², we have:
1. There are exactly 10 conic bundle structures on S, whose fibres are respectively L − Ei, −K_S − (L − Ei), for i = 1, ..., 5.
2. The action of Aut(S) on the five pairs of divisors {L − Ei, −K_S − (L − Ei)}, i = 1, ..., 5 gives rise to a split exact sequence 0 → F → Aut(S) → Sym5, where F = {(a1, ..., a5) ∈ (F2)⁵ | ∑ ai = 0} ≅ (F2)⁴, and the automorphism (a1, ..., a5) permutes the pair {L − Ei, −K_S − (L − Ei)} if and only if ai = 1.
3. We have Aut(S) = F ⋊ Aut(S, η), where Aut(S, η) is the lift of the group of automorphisms of P² that leave the set {A1, ..., A5} invariant, and Aut(S, η) acts on F = {(a1, ..., a5) ∈ (F2)⁵ | ∑ ai = 0} by permutation of the ai's, as it acts on {A1, ..., A5}, and as ρ(Aut(S)) = ρ(Aut(S, η)) ⊂ Sym5 acts on the exceptional pairs.
4. The elements of F with two "ones" correspond to quadratic involutions of P² and fix exactly 4 points of S.
5. The elements of F with four "ones" correspond to cubic involutions of P² and the points of S fixed by these elements form a smooth elliptic curve.
Remark 9.12. The group F ⊂ Aut(S) has been studied intensively since 1895 (see [Kan], Theorem XXXIII). A modern description of the group as the 2-torsion of PGL(5,C) can be found in [Bea2, (4.1)], together with a study of the conjugacy classes of such groups in the Cremona group. For further descriptions of the automorphism groups of these surfaces, see [Do-Iz, section 6.4] and [Bla2, section 8.1].
Proof.
Let $A = mL - \sum_{i=1}^{5} a_i E_i$ be the divisor of the fibre of some conic bundle structure on S, for some m, a1, ..., a5 ∈ Z. From the relations A² = 0 (the fibres are disjoint) and A · K_S = −2 (adjunction formula) we get:
$\sum_{i=1}^{5} a_i^2 = m^2, \qquad \sum_{i=1}^{5} a_i = 3m - 2.$ (3)
As in Lemma 7.1, we have $\left(\sum_{i=1}^{5} a_i\right)^2 \le 5 \sum_{i=1}^{5} a_i^2$, which implies here that (3m − 2)² ≤ 5m², that is 4(m² − 3m + 1) ≤ 0. As m is an integer, we must have 1 ≤ m ≤ 2. If m = 1, we replace it in (3) and see that there exists i ∈ {1, ..., 5} such that A = L − Ei. Otherwise, taking m = 2 and replacing it in (3), we see that four of the aj's are equal to 1, and one is equal to 0. This gives the ten conic bundles of assertion 1, which are the lifts to S of the lines of P² passing through one of the Ai's or of the conics passing through four of the Ai's.
The group Aut(S) acts on the set ∪_{i=1}^{5} {L − Ei, −K_S − (L − Ei)}; since K_S is fixed, this induces an action on the set of five pairs {L − Ei, −K_S − (L − Ei)}. We denote by ρ : Aut(S) → Sym5 the corresponding homomorphism. The action of the kernel of ρ on the pairs of conic bundles gives a natural embedding of Ker(ρ) into (F2)⁵. We now prove that Ker(ρ) = {(a1, ..., a5) | ∑ ai = 0} = F. Acting by a linear automorphism of P², we may assume that the points blown up by η are A1 = (1 : 0 : 0), A2 = (0 : 1 : 0), A3 = (0 : 0 : 1), A4 = (1 : 1 : 1), A5 = (a : b : c), for some a, b, c ∈ C∗. Then, the birational involution τ : (x0 : x1 : x2) ⇢ (ax1x2 : bx0x2 : cx0x1) of P² lifts as an automorphism η⁻¹τη ∈ Aut(S) that acts on Pic(S) as
$\begin{pmatrix} 0 & -1 & -1 & 0 & 0 & -1 \\ -1 & 0 & -1 & 0 & 0 & -1 \\ -1 & -1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 2 \end{pmatrix}$
with respect to the basis (E1, E2, E3, E4, E5, L). It follows from this observation that η⁻¹τη belongs to the kernel of ρ, and acts on the pairs of conic bundles as (0, 0, 0, 1, 1) ∈ (F2)⁵. Permuting the roles of the points A1, ..., A5, we get 10 involutions whose representations in (F2)⁵ have two "ones" and three "zeros".
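Both the count of conic bundle classes and the action of the lift of τ on Pic(S) are small integer computations, which the following sketch reproduces as a sanity check. Classes are stored as coordinate vectors in the basis (E1, ..., E5, L), with intersection form diag(−1, ..., −1, 1) and K_S = −3L + E1 + ... + E5. The search bound `max_m = 4` and all function names are hypothetical choices made for this illustration; the Cauchy-Schwarz argument in the text already shows that only m = 1, 2 can occur.

```python
from itertools import product

# Coordinates are taken in the basis (E1, E2, E3, E4, E5, L).
K = (1, 1, 1, 1, 1, -3)  # canonical class K_S = -3L + E1 + ... + E5

def dot(A, B):
    """Intersection form: L.L = 1, Ei.Ei = -1, all other products 0."""
    return A[5] * B[5] - sum(a * b for a, b in zip(A[:5], B[:5]))

def conic_bundle_classes(max_m=4):
    """All A = mL - sum(a_i E_i) with A.A = 0 and A.K_S = -2, for 0 <= m <= max_m."""
    found = []
    for m in range(max_m + 1):
        # sum(a_i^2) = m^2 forces |a_i| <= m, so this range is exhaustive for each m
        for a in product(range(-m, m + 1), repeat=5):
            A = tuple(-ai for ai in a) + (m,)
            if dot(A, A) == 0 and dot(A, K) == -2:
                found.append(A)
    return found

classes = conic_bundle_classes()
print(len(classes), sorted({A[5] for A in classes}))

# Matrix of the lift of tau; column j is the image of the j-th basis vector.
M = [
    [0, -1, -1, 0, 0, -1],
    [-1, 0, -1, 0, 0, -1],
    [-1, -1, 0, 0, 0, -1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 2],
]

def apply(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(6)) for i in range(6))

def fibre_class(i):
    """The class L - E_i, for i = 1, ..., 5."""
    v = [0] * 6
    v[i - 1], v[5] = -1, 1
    return tuple(v)

def complementary(A):
    """The class -K_S - A."""
    return tuple(-k - a for k, a in zip(K, A))

basis = [tuple(int(i == j) for j in range(6)) for i in range(6)]
preserves_form = all(
    dot(apply(M, u), apply(M, v)) == dot(u, v) for u in basis for v in basis
)
fixes_K = apply(M, K) == K
swap_pattern = tuple(
    int(apply(M, fibre_class(i)) == complementary(fibre_class(i)))
    for i in range(1, 6)
)
print(preserves_form, fixes_K, swap_pattern)
```

The entry `swap_pattern[i-1]` is 1 exactly when the pair {L − Ei, −K_S − (L − Ei)} is exchanged, which is how the element of (F2)⁵ attached to the involution is read off.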
These involutions generate the group {(a1, ..., a5) | ∑ ai = 0} = F. To prove that this group is equal to Ker(ρ), it suffices to show that (1, 1, 1, 1, 1) does not belong to Ker(ρ). This follows from the fact that (1, 1, 1, 1, 1) would send $L = \frac{1}{2}\left(K_S + \sum_{i=1}^{5}(L - E_i)\right)$ to the divisor $\frac{1}{2}\left(K_S + \sum_{i=1}^{5}(-K_S - L + E_i)\right) = \frac{1}{2}(-2L - 3K_S)$, which does not belong to Pic(S). This concludes the proof of assertion 2 (except the fact that the exact sequence is split, which will be proved by assertion 3).
We now prove assertion 3. Let σ ∈ Sym5 be a permutation of the set {1, ..., 5} in the image of ρ and g be an automorphism of S such that ρ(g) = σ. Let α be the element of Aut(Pic(S)) that sends Ei to Eσ(i) and fixes L. Viewing Aut(S) as a subgroup of Aut(Pic(S)), the element gα⁻¹ ∈ Aut(Pic(S)) fixes the five pairs of conic bundles. There exists some element h ∈ F ⊂ Aut(S) such that hgα⁻¹ either fixes the divisor of every conic bundle or permutes the divisors of conic bundles in each pair. The same argument as in the above paragraph shows that this latter possibility cannot occur. Hence hgα⁻¹ fixes L − E1, ..., L − E5 and K_S. It follows that hgα⁻¹ acts trivially on Pic(S), so α = hg ∈ Aut(S), and α is by construction the lift of an automorphism of P² that acts on the set {A1, ..., A5} as σ does on {1, ..., 5}. Conversely, it is clear that every automorphism r of P² which leaves the set {A1, A2, A3, A4, A5} invariant lifts to the automorphism η⁻¹rη of S whose action on the pairs of conic bundles is the same as that of r on the set {A1, A2, A3, A4, A5}. This gives assertion 3.
Assertion 4 follows from the above description of some element of F ⊂ Aut(S) with two "ones" as the lift of a birational map of the form τ : (x0 : x1 : x2) ⇢ (ax1x2 : bx0x2 : cx0x1). As the automorphism η⁻¹τη ∈ Aut(S) does not leave any exceptional divisor invariant, its fixed points are the same as those of τ, which are the four points (α : β : γ), where α² = a, β² = b, γ² = c.
It remains to prove the last assertion.
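The two computations just used, the non-integrality of the image of L under a hypothetical element (1, 1, 1, 1, 1) and the four fixed points of τ, can be sketched with exact arithmetic. Classes are again written in the basis (E1, ..., E5, L). The values a = 4, b = 9, c = 25 are an assumed numerical example, chosen only so that the square roots are integers; they are not taken from the paper, and the helper names are hypothetical.

```python
from fractions import Fraction

# Coordinates in the basis (E1, ..., E5, L); K_S = -3L + E1 + ... + E5.
K = (1, 1, 1, 1, 1, -3)

def fibre_class(i):
    """The class L - E_i, for i = 1, ..., 5."""
    v = [0] * 6
    v[i - 1], v[5] = -1, 1
    return tuple(v)

def complementary(A):
    """The class -K_S - A."""
    return tuple(-k - a for k, a in zip(K, A))

half = Fraction(1, 2)

# L = 1/2 (K_S + sum_i (L - E_i))
L_rebuilt = tuple(
    half * (K[j] + sum(fibre_class(i)[j] for i in range(1, 6))) for j in range(6)
)

# Image of L if K_S were fixed and every L - E_i were sent to -K_S - (L - E_i).
image = tuple(
    half * (K[j] + sum(complementary(fibre_class(i))[j] for i in range(1, 6)))
    for j in range(6)
)
image_is_integral = all(c.denominator == 1 for c in image)
print(L_rebuilt, image, image_is_integral)

# Fixed points of tau for the assumed values a, b, c = 4, 9, 25.
a, b, c = 4, 9, 25

def tau(p):
    x0, x1, x2 = p
    return (a * x1 * x2, b * x0 * x2, c * x0 * x1)

def same_point(p, q):
    """Triples define the same projective point iff all 2x2 minors vanish."""
    return all(p[i] * q[j] == p[j] * q[i] for i in range(3) for j in range(3))

# (alpha : beta : gamma) with alpha^2 = a, beta^2 = b, gamma^2 = c, up to global sign.
candidates = [(2, 3, 5), (-2, 3, 5), (2, -3, 5), (2, 3, -5)]
all_fixed = all(same_point(tau(p), p) for p in candidates)
print(all_fixed)
```

The image class has half-integer coordinates, which is the obstruction invoked in the text.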
Note that the element h = (0, 1, 1, 1, 1) ∈ Aut(S) fixes the divisor L − E1, hence acts on the associated conic bundle structure. Furthermore, the four singular fibres of this conic bundle, {L − E1 − Ei, Ei}, for i = 2, ..., 5, are invariant by h and this element switches the two components of each fibre. This shows that the action of h on the basis of the fibration is trivial, so the restriction of h to each fibre is an involution of P¹ which fixes two points. On each singular fibre, exactly one point is fixed, which is the singular point of the fibre. The situation is similar for the other elements with four "ones" (in fact, the involutions described here are twisting involutions, see Lemma 6.1).
Lemma 9.13 (Actions on the del Pezzo surfaces of degree 4). Let S be a del Pezzo surface of degree 4, and let G ⊂ Aut(S) be an Abelian group such that rk Pic(S)^G = 1. Then, G contains an involution that fixes an elliptic curve.
Proof. We keep the notation of Lemma 9.11 for η : S → P², Aut(S, η), ρ, F, ... and denote by H the group G ∩ F = G ∩ Ker ρ. We will prove that H contains an element of F with four "ones", which is an involution that fixes an elliptic curve (Lemma 9.11). The group ρ(G) ⊂ ρ(Aut(S)) ≅ Aut(S, η) is isomorphic to a subgroup of Aut(S, η). The group Aut(S, η) is the lift of the group of automorphisms of P² that leave the set {A1, ..., A5} invariant (Lemma 9.11). The restriction of this group to the conic of P² passing through the five points is a subgroup of PGL(2,C) that leaves five points invariant. Since ρ(G) is finite and Abelian, it is cyclic, of order at most 5. We consider the different possibilities.
The order of ρ(G) is 1. This implies that G ⊂ F. If G contains an element with four "ones", we are done. Otherwise, up to conjugation G is a subgroup of the group generated by (1, 1, 0, 0, 0) and (1, 0, 1, 0, 0), and fixes L − E4 and L − E5 (thus rk Pic(S)^G > 1).
The order of ρ(G) is 2.
Up to a change of numbering, ρ(G) is generated by (1 2)(3 4); since G is Abelian, we find that H ⊂ V = {(a, a, b, b, 0) | a, b ∈ F2}. Let g = ((a, b, c, d, e), (1 2)(3 4)) ∈ G be such that ρ(g) = (1 2)(3 4). We may suppose that e = 1 (otherwise, the group G would fix L − E5 and we would have rk Pic(S)^G ≥ 2). Conjugating by ((0, b, 0, d, b + c), id), we may assume that g = ((a + b, 0, c + d, 0, 1), (1 2)(3 4)). In fact, since a + b + c + d + e = 0, we have g = ((α, 0, 1 + α, 0, 1), (1 2)(3 4)), where α = a + b = c + d + 1 ∈ F2. If α = 1, then g has order 4 and fixes the divisor 2L − E3 − E4, thus G cannot be equal to ⟨g⟩ and it follows that V ⊂ G; in particular the element (1, 1, 1, 1, 0) is contained in G. If α = 0, then ⟨g⟩ fixes 2L − E1 − E2, so once again G contains V.

The order of ρ(G) is 3. In this case, ρ(G) is generated by a 3-cycle, namely (1 2 3); then H must be a subgroup of V = {(a, a, a, b, a + b) | a, b ∈ F2}. The order of G must be a multiple of 4, by Lemma 9.3, hence H = V, and thus G contains the element (1, 1, 1, 1, 0).

The order of ρ(G) is 4. Then ρ(G) is generated by (1 2 3 4), so H must be a subgroup of V = ⟨(1, 1, 1, 1, 0)⟩. Let g = ((a, b, c, d, e), (1 2 3 4)) ∈ G be such that ρ(g) = (1 2 3 4). Conjugating the group by ((a, a + b, a + b + c, 0, a + c), id), we may suppose that g = ((0, 0, 0, e, e), (1 2 3 4)). If e = 1, then g⁴ = (1, 1, 1, 1, 0) ∈ G. If e = 0, the element g belongs to H_S, so it fixes the divisors L and E5. As the group V fixes L − E5, the rank of Pic(S)^G cannot be 1.

The order of ρ(G) is 5. Then, ρ(G) is generated by a 5-cycle and H = {1}. The rank of Pic(S)^H cannot be 1, by Lemma 9.3.

Before studying the case of del Pezzo surfaces of degree ≤ 3, we remind the reader of some classical embeddings of these surfaces.

Remark 9.14.
Recall ([Kol], Theorem III.3.5) that a del Pezzo surface of degree 3 (respectively 2, 1) is isomorphic to a smooth hypersurface of degree 3 (respectively 4, 6) in the projective space P³ (respectively in P(1, 1, 1, 2), P(1, 1, 2, 3)). Furthermore, in each of the 3 cases, any automorphism of the surface is the restriction of an automorphism of the ambient space. We will use these classical embeddings, take w, x, y, z as the variables on the projective spaces, and denote by [α : β : γ : δ] the automorphism (w : x : y : z) ↦ (αw : βx : γy : δz). Note that a del Pezzo surface of degree 4 is isomorphic to the intersection of two quadrics in P⁴, but we will not use this here.

Lemma 9.15 (Actions on the del Pezzo surfaces of degree 3). Let S be a del Pezzo surface of degree 3, and let G ⊂ Aut(S) be an Abelian group such that rk Pic(S)^G = 1. Then, G contains an element of order 2 or 3 that fixes an elliptic curve of S.

Proof. Lemma 9.3 implies that the order of G is divisible by 3, so G contains an element of order 3. We view S as a cubic surface in P³, and Aut(S) as a subgroup of PGL(4,C) (see Remark 9.14). There are three kinds of elements of order 3 in PGL(4,C), depending on the nature of their eigenvalues. Setting ω = e^{2iπ/3}, there are elements with one eigenvalue of multiplicity 3 (conjugate to [1 : 1 : 1 : ω], or its inverse), elements with two eigenvalues of multiplicity 2 (conjugate to [1 : 1 : ω : ω]) and elements with three distinct eigenvalues (conjugate to [1 : 1 : ω : ω²]). We consider the three possibilities.

Case a: G contains an element of order 3 with one eigenvalue of multiplicity 3. The element [1 : 1 : 1 : ω] fixes the hyperplane z = 0, whose intersection with the surface S is an elliptic curve (because Fix(g) ⊂ S is smooth). Thus, we are done.

Case b: G contains an element of order 3 with two eigenvalues of multiplicity 2. With a suitable choice of coordinates, we may assume that this element is g = [1 : 1 : ω : ω].
Since S is smooth, its equation F is of degree at least 2 in each variable, which implies that F(w, x, ωy, ωz) = F(w, x, y, z) (the eigenvalue is 1); up to a change of coordinates F = w³ + x³ + y³ + z³, which means that S is the Fermat cubic surface. The group of automorphisms of S is (Z/3Z)³ ⋊ Sym4 and the centraliser of g in it is (Z/3Z)³ ⋊ V, where V ≅ (Z/2Z)² is the subgroup of Sym4 generated by the two transpositions (w, x) and (y, z). The structure of the centraliser gives rise to an exact sequence
\[
\begin{array}{ccccccccc}
1 & \to & (\mathbb{Z}/3\mathbb{Z})^3 & \to & (\mathbb{Z}/3\mathbb{Z})^3 \rtimes V & \stackrel{\gamma}{\to} & V & \to & 1 \\
 & & \cup & & \cup & & \cup & & \\
1 & \to & G \cap (\mathbb{Z}/3\mathbb{Z})^3 & \to & G & \to & \gamma(G) & \to & 1.
\end{array}
\]
We may suppose that G contains no element of order 3 with an eigenvalue of multiplicity 3, since this case has been studied above (case a). There are then three possibilities for G ∩ (Z/3Z)³, namely ⟨g⟩, ⟨g, [1 : ω : 1 : ω]⟩ and ⟨g, [1 : ω : ω : 1]⟩. The last is conjugate to the second by the automorphism (y, z). Note that g preserves exactly 9 of the 27 lines on the surface; these are {w + ω^i x = y + ω^j z = 0}, for 0 ≤ i, j ≤ 2. If G ∩ (Z/3Z)³ is equal to ⟨g⟩, then G/⟨g⟩ ≅ γ(G) has order 1, 2 or 4 and thus G leaves at least one of the 9 lines invariant, whence rk Pic(S)^G > 1. If G ∩ (Z/3Z)³ is the group H = ⟨g, [1 : ω : 1 : ω]⟩, we have G = H, since the centraliser of H in (Z/3Z)³ ⋊ V is the group (Z/3Z)³. As the set of three skew lines {w + ω^i x = y + ω^i z = 0} for 0 ≤ i ≤ 2 is an orbit of H, the rank of Pic(S)^G is strictly larger than 1.

Case c: G contains an element g of order 3 with three distinct eigenvalues. We may suppose that g = [1 : 1 : ω : ω²]. Note that the action of g on P³ fixes the line Lyz of equation y = z = 0 and thus the whole group G leaves this line invariant. If Lyz ⊂ S, the rank of Pic(S)^G is at least 2. Otherwise, the equation of S is of the form L3(w, x) + L1(w, x)yz + y³ + z³ = 0, where L3 and L1 are homogeneous forms of degree respectively 3 and 1, and L3 has three distinct roots, so Fix(g) = S ∩ Lyz.
Since g fixes exactly three points, the trace of its action on Pic(S) ≅ Z⁷ is 1 (Lemma 9.5) and thus rk Pic(S)^g > 1, which implies that G ≠ ⟨g⟩. Note that every subgroup of PGL(4,C) isomorphic to (Z/3Z)² contains an element with only two distinct eigenvalues, so we may assume that G contains only two elements of order 3, which are g and g². This implies that the action of G on the three points of Lyz ∩ S gives an exact sequence 1 → ⟨g⟩ → G → Sym3, where the image on the right is a transposition. The group G thus contains an element of order 2, which we may assume to be diagonal of the form (w : x : y : z) ↦ (−w : x : y : z) and which fixes the elliptic curve that is the trace on S of the plane w = 0.

Lemma 9.16 (Actions on the del Pezzo surfaces of degree 2). Let S be a del Pezzo surface of degree 2, and let G ⊂ Aut(S) be an Abelian group such that rk Pic(S)^G = 1. Then, G contains either the Geiser involution (that fixes a curve isomorphic to a smooth quartic curve) or an element of order 2 or 3 that fixes an elliptic curve.

Proof. We view S as a surface of degree four in the weighted projective space P(2, 1, 1, 1) (see Remark 9.14). Note that the projection on the last three coordinates gives S as a double covering of P² ramified over a smooth quartic curve Q. Lemma 9.3 implies that the order of G is divisible by 2, so G contains an element g of order 2. If the element g is the involution induced by the double covering (classically called the Geiser involution), we are done; otherwise we may assume that g acts on P(2, 1, 1, 1) as g : (w : x : y : z) ↦ (εw : x : y : −z), where ε = ±1, and the equation of S is w² = z⁴ + L2(x, y)z² + L4(x, y), where Li is a form of degree i, and L4 has four distinct roots. The trace on S of the equation z = 0 defines an elliptic curve Lz ⊂ S. If ε = 1, then g fixes the curve Lz and we are done; we therefore assume that ε = −1.
If G contains another involution, we diagonalise the group generated by these two involutions and see that one element of the group fixes either an elliptic curve or the smooth quartic curve, so we may assume that g is the only involution of G. Note that g fixes exactly four points of S, which are the points of intersection of Lz with the quartic Q (of equation w = 0). The trace of g on Pic(S) ≅ Z⁸ is thus equal to 2 (Lemma 9.5), whence rk Pic(S)^g = 5 and G ≠ ⟨g⟩. The group G acts on the line z = 0 of P² and on the four points of Lz ∩ Q. Since g is the only element of order 2 of G, the action of G on these four aligned points has order 3 and thus we may assume that L4(x, y) = x(x³ + λy³) and that there exists an element h of G that acts as (w : x : y : z) ↦ (αw : x : e^{2iπ/3}y : βz), with α² = β⁴ = 1. We find that h⁴ is an element of order 3 that fixes the elliptic curve which is the trace on S of the equation y = 0.

Lemma 9.17 (Actions on the del Pezzo surfaces of degree 1). Let S be a del Pezzo surface of degree 1, and let G ⊂ Aut(S) be an Abelian group such that rk Pic(S)^G = 1. Then, some non-trivial element of G fixes a curve of S of positive genus.

Proof. We view S as a surface of degree six in the weighted projective space P(3, 1, 1, 2) (see Remark 9.14). Up to a change of coordinates, we may assume that the equation is w² = z³ + zL4(x, y) + L6(x, y), where L4 and L6 are homogeneous forms of degree 4 and 6 respectively. The embedding of S into P(3, 1, 1, 2) is given by |−3KS| × |−KS| × |−2KS|, which implies that G is a subgroup of P(GL(1,C) × GL(2,C) × GL(1,C)). The projection (w : x : y : z) ⇢ (x : y) is an elliptic fibration generated by |−KS|, and has one base-point, namely (1 : 0 : 0 : 1), which is fixed by Aut(S). This projection induces a homomorphism ρ : Aut(S) → Aut(P¹) = PGL(2,C). Note that the kernel of ρ is generated by the Bertini involution w ↦ −w (and the element z ↦ ωz, with ω = e^{2iπ/3}, if L4 = 0) and is hence cyclic of order 2 (or 6).
Furthermore, any element of this kernel fixes a curve of positive genus. We assume that no non-trivial element of G fixes a curve of positive genus. This implies that G is isomorphic to ρ(G) ⊂ Aut(P¹), and thus is either cyclic or isomorphic to (Z/2Z)². Since the lift of this latter group in Aut(S) is not Abelian, G is cyclic. We use the Lefschetz fixed-point formula (Lemma 9.5) to deduce the eigenvalues of the action of elements of G on Pic(S) ≅ Z⁹. For any element g ∈ G, g ≠ 1, Fix(g) contains the point (1 : 0 : 0 : 1) and is the disjoint union of points and lines. Thus χ(Fix(g)) ≥ 1 and so the trace of g on Pic(S) is at least −1 (Lemma 9.5).

Elements of order 2: The eigenvalues are ⟨1^a, (−1)^b⟩ with a ≥ 4, b ≤ 5.

Elements of order 3: The eigenvalues are ⟨1^a, (ω)^b, (ω²)^b⟩ with a ≥ 3, b ≤ 3.

Elements of order 4: The eigenvalues are ⟨1^a, (−1)^b, (i)^c, (−i)^c⟩ with a ≥ b − 1. Furthermore, the information on the square implies that a + b ≥ 4, so a ≥ 2.

Elements of order 5: The eigenvalues are ⟨1^5, l1, l2, l3, l4⟩, where l1, ..., l4 are the four primitive 5-th roots of unity.

Elements of order 6: The eigenvalues are ⟨1^a, (−1)^b, (ω)^c, (ω²)^c, (−ω)^d, (−ω²)^d⟩, where a − b − c + d ≥ −1. Computing the square and the third power, we find respectively a + b ≥ 3, c + d ≤ 3 and a + 2c ≥ 4, b + 2d ≤ 5. This implies that a ≥ 2. Indeed, if a = 1, we get b, c ≥ 2 and thus d ≤ 1, which contradicts the fact that the trace a − b − c + d is at least −1.

Since rk Pic(S)^G = 1, the order of the cyclic group G is at least 7. As the action of G leaves L4 and L6 invariant, both L6 and L4 are monomials. If some double root of L6 is a root of L4, the surface is singular, so up to an exchange of coordinates we may suppose that L4 = x⁴ and either L6 = xy⁵ or L6 = y⁶. In the first case, the equation of the surface is w² = z³ + x⁴z + xy⁵, whose group of automorphisms Aut(S) is isomorphic to Z/20Z, generated by [i : 1 : ζ10 : −1], and contains the Bertini involution.
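The claims about the generator [i : 1 : ζ10 : −1] in the first case can be checked numerically. The sketch below is illustrative only: the names `F`, `act` and `order`, and the sample point `p`, are assumptions introduced here, with ζ10 taken to be e^{2iπ/10}. It checks that the generator multiplies the polynomial w² − (z³ + x⁴z + xy⁵) by −1 (so it preserves the surface), that it has order 20 on P(3, 1, 1, 2), and that its tenth power is the Bertini involution w ↦ −w.

```python
import cmath

WEIGHTS = (3, 1, 1, 2)                    # weights of (w, x, y, z) in P(3, 1, 1, 2)
ZETA10 = cmath.exp(2j * cmath.pi / 10)    # assumed choice of primitive 10th root of unity
G = (1j, 1, ZETA10, -1)                   # the diagonal map [i : 1 : zeta10 : -1]

def F(w, x, y, z):
    """Polynomial w^2 - (z^3 + x^4 z + x y^5) whose zero set is the surface."""
    return w**2 - z**3 - x**4 * z - x * y**5

def act(g, p, k=1):
    """Apply the k-th power of the diagonal map g to the point p."""
    return tuple(gi**k * pi for gi, pi in zip(g, p))

def order(g, max_k=200, tol=1e-9):
    """Smallest k >= 1 such that g^k is trivial on the weighted projective space.

    g^k is trivial iff g^k = (t^3, t, t, t^2) for some scalar t, and the
    weight-1 coordinate x forces t = g[1]^k.
    """
    for k in range(1, max_k + 1):
        t = g[1] ** k
        if all(abs(gi**k - t**wt) < tol for gi, wt in zip(g, WEIGHTS)):
            return k
    return None

p = (0.3 + 0.2j, 1.1, -0.7 + 0.4j, 0.5 - 0.9j)       # arbitrary sample point
scales_by_minus_one = abs(F(*act(G, p)) + F(*p)) < 1e-9
g_order = order(G)
g10 = tuple(gi**10 for gi in G)                       # expected to be w -> -w
print(scales_by_minus_one, g_order)
```

The check uses a single sample point for the invariance, which suffices here because every monomial of F is an eigenvector of the diagonal action; it is not a proof that the full automorphism group is Z/20Z.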
No subgroup of Aut(S) fulfils our hypotheses. In the second case, the equation of the surface is w² = z³ + x⁴z + y⁶, whose group of automorphisms is isomorphic to Z/2Z × Z/12Z, generated by the Bertini involution and g = [i : 1 : ζ12 : −1]. The only possibility for G is to be equal to ⟨g⟩. Since g⁴ = [1 : 1 : ω : 1] fixes an elliptic curve, we are done.

Proposition 9.1 now follows, using all the lemmas proved above.

10 The results

We now prove the five theorems stated in the introduction.

Proof of Theorem 4. Since the pair (G, S) is minimal, either rk Pic(S)^G = 1 and S is a del Pezzo surface, or G preserves a conic bundle structure (see [Man], [Isk2] or [Do-Iz]). In the first case, either S ≅ P², or S ≅ P¹ × P¹, or S is a del Pezzo surface of degree d = 5 or 6 and G ≅ Z/dZ (Proposition 9.1). In the second case, either S is a Hirzebruch surface or the pair (G, S) is the pair (Cs24, Ŝ4) of Section 7 (Proposition 8.4).

Proof of Theorem 2. No non-trivial element of Aut(P²), Aut(P¹ × P¹) or Cs24 fixes a non-rational curve (the first two cases are clear, the last one follows from Lemma 7.4). Conversely, suppose that G is a finite Abelian subgroup of the Cremona group such that no non-trivial element fixes a curve of positive genus. Since G is finite, it is birationally conjugate to a group of automorphisms of a rational surface S (see for example [dF-Ei, Theorem 1.4] or [Do-Iz]). Then, we assume that the pair (G, S) is minimal and use the classification of Theorem 4. If S is a Hirzebruch surface, the group is birationally conjugate to a subgroup of Aut(P¹ × P¹) (Proposition 8.3). If S is a del Pezzo surface, the group G is birationally conjugate to a subgroup of Aut(P¹ × P¹) or Aut(P²), by Proposition 9.1. Otherwise, the pair (G, S) is isomorphic to the pair (Cs24, Ŝ4). It remains to show that the group Cs24 is not birationally conjugate to a subgroup of Aut(P¹ × P¹) or Aut(P²).
Since the group is isomorphic to Z/2Z × Z/4Z, only the case of Aut(P¹ × P¹) need be considered (see Section 2). This was proved in Proposition 8.4.

Proof of Theorem 5. By Theorem 2, G is birationally conjugate either to a subgroup of Aut(P²), or of Aut(P¹ × P¹), or to the group Cs24. The group Cs24 is case [8]. The finite Abelian subgroups of Aut(P²) are conjugate to the groups of case [1] or [9] (Proposition 2.2). The finite Abelian subgroups of Aut(P¹ × P¹) are conjugate to the groups of cases [1] through [7] (Proposition 2.5). It was proved in Proposition 2.5 that cases [1] through [7] are distinct. In Proposition 8.4 we showed that [8] (Cs24) is not birationally conjugate to any of the groups of cases [1] through [7]. Finally, the group [9] is isomorphic only to [1], but is not birationally conjugate to it (Proposition 2.2). This completes the proof that the distinct cases given above are not birationally conjugate.

The proof of Theorem 1 follows directly from Theorem 5, and Theorem 3 is a corollary of Theorem 1.

11 Other kinds of groups

Our main interest up to now was in finite Abelian subgroups of the Cremona group. In this section, we give some examples in the other cases, in order to show why the hypothesis "finite", respectively "Abelian", is necessary to ensure that condition (F) (no curve of positive genus is fixed by a non-trivial element) implies condition (M) (the group is birationally conjugate to a group of automorphisms of a minimal surface). We refer to the introduction for more details.

Finiteness is important since it imposes that the group is conjugate to a group of automorphisms of a projective rational surface. This is not the case if the group is not finite (see for example [Bla2], Proposition 2.2.4).

Lemma 11.1. Let ϕ : P² ⇢ P² be a quadratic birational transformation with three proper base-points, and such that deg(ϕ^n) = 2^n for each integer n ≥ 1. Then, the following occur:
1. no pencil of curves is invariant by ϕ;
2.
ϕ is not birationally conjugate to an automorphism of P² or of P¹ × P¹.

Proof. Denote by A1, A2, A3 the three base-points of ϕ and by B1, B2, B3 those of ϕ^{-1}. Up to a change of coordinates, we may suppose that A1 = (1 : 0 : 0), A2 = (0 : 1 : 0) and A3 = (0 : 0 : 1). The birational transformation ϕ is thus the composition of the standard quadratic transformation σ : (x : y : z) ⇢ (yz : xz : xy) with a linear automorphism τ ∈ Aut(P²) that sends Ai to Bi for i = 1, 2, 3.

Let Λ be some pencil of curves, and assume that ϕ(Λ) = Λ. We will prove that some base-point of Λ is sent by ϕ to an orbit of infinite order. The condition deg(ϕ^n) = 2^n is equivalent to saying that for i = 1, 2, 3, the sequence Bi, ϕ(Bi), ..., ϕ^n(Bi), ... is well-defined, i.e. that ϕ^m(Bi) is not equal to Aj for any i, j ∈ {1, 2, 3}, m ∈ N. Denote by α1, α2, α3, β1, β2, β3 the multiplicity of Λ at respectively A1, A2, A3, B1, B2, B3 and by n the degree of the curves of Λ. The curves of the pencil ϕ(Λ) thus have degree 2n − α1 − α2 − α3. Since Λ is invariant, n = α1 + α2 + α3, so at least one of the αi's is not equal to zero. The equality n = α1 + α2 + α3 implies that the curves of σ(Λ) have multiplicity αi at Ai, so the curves of ϕ(Λ) have multiplicity αi at Bi, whence αi = βi for i = 1, 2, 3. Since Λ passes through Bi with multiplicity αi, the pencil ϕ(Λ) = Λ passes through ϕ(Bi) with multiplicity αi for i = 1, 2, 3. Continuing in this way, we see that Λ passes through ϕ^n(Bi) with multiplicity αi for each n ∈ N. Consequently, Λ has infinitely many base-points, which is not possible. This establishes the first assertion.

The second assertion follows directly, as each automorphism of P² or P¹ × P¹ leaves a pencil of rational curves invariant.

Corollary 11.2. The group generated by a very general quadratic transformation is an infinite cyclic group satisfying (F) but not (M).

Proof.
The condition deg(ϕ^n) = 2^n, n ∈ N, is satisfied for all quadratic transformations, except for a countable set of proper subvarieties. Consequently condition (M) is not satisfied (Lemma 11.1) for a very general quadratic transformation. Let n be some positive integer and write ϕ^n : (x : y : z) ⇢ (f1(x, y, z) : f2(x, y, z) : f3(x, y, z)), for some homogeneous polynomials fi of degree 2^n. The set of points fixed by ϕ^n belongs to the intersection of the curves with equations xf2 − yf1, xf3 − zf1 and yf3 − zf2. In general, there is only a finite number of points; this yields condition (F).

In fact, the argument of Lemma 11.1 works for any very general birational transformation of P², since this is a composition of quadratic transformations. We thus find infinitely many cyclic subgroups of the Cremona group that are not birationally conjugate to a group of automorphisms of a minimal surface although none of their non-trivial elements fixes a non-rational curve. The implication (F) ⇒ (M) is therefore false for general cyclic groups.

We now study the finite non-Abelian subgroups and provide, in this case, many examples satisfying (F) but not (M):

Lemma 11.3. Let S6 = {((x : y : z), (u : v : w)) | ux = vy = wz} ⊂ P² × P² be the del Pezzo surface of degree 6. Let G ≅ Sym3 × Z/2Z be the subgroup of automorphisms of S6 generated by
\[
\begin{array}{rcl}
\bigl((x : y : z), (u : v : w)\bigr) & \mapsto & \bigl((u : v : w), (x : y : z)\bigr), \\
\bigl((x : y : z), (u : v : w)\bigr) & \mapsto & \bigl((y : x : z), (v : u : w)\bigr), \\
\bigl((x : y : z), (u : v : w)\bigr) & \mapsto & \bigl((z : y : x), (w : v : u)\bigr).
\end{array}
\]
Then no non-trivial element of G fixes a curve of positive genus, and G is not birationally conjugate to a group of automorphisms of a minimal surface.

Proof. Since every non-trivial element of finite order of Aut(S6) is birationally conjugate to a linear automorphism of P² (Corollary 9.10), no such element fixes a curve of positive genus. The description of every G-equivariant elementary link starting from S6 was given by Iskovskikh in [Isk4].
This shows that this group is not birationally conjugate to a group of automorphisms of a minimal surface.

Lemma 11.4. Let S5 be the del Pezzo surface of degree 5. Let G ≅ Sym5 be the whole group Aut(S5). Then no non-trivial element of G fixes a curve of positive genus, and G is not birationally conjugate to a group of automorphisms of a minimal surface.

Proof. Since every non-trivial element of Aut(S5) is birationally conjugate to a linear automorphism of P² (Corollary 9.10), such an element does not fix a curve of positive genus. Suppose that there exists some G-equivariant birational transformation ϕ : S5 ⇢ S̃ where S̃ is equal to P² or P¹ × P¹. We decompose ϕ into G-equivariant elementary links (see for example [Isk3], Theorem 2.5). The classification of elementary links ([Isk3], Theorem 2.6) shows that a link S5 ⇢ S′ is either a Bertini or a Geiser involution (and in this case S′ = S5, and thus this link conjugates G to itself), or the composition of the blow-up of one or two points, and the contraction of 5 curves to respectively P¹ × P¹ or P². It remains to show that no orbit of G has size 2 or 1, to conclude that these links are not possible. This follows from the fact that the actions of Sym5, Alt5 ⊂ G on S5 are fixed-point free (Proposition 5.1).

Finally, the way to find more counterexamples is to look at groups acting on conic bundles. The generalisation of the example Cs24 gives many examples of non-Abelian finite groups. Here is the simplest family:

Lemma 11.5. Let n be some positive integer, and let G be the group of birational transformations of P² generated by
g1 : (x : y : z) ⇢ (yz : xy : −xz),
g2 : (x : y : z) ⇢ (yz(y − z) : xz(y + z) : xy(y + z)),
h : (x : y : z) ⇢ (e^{2iπ/2n}x : y : z).
Then, G preserves the pencil Λ of lines passing through (1 : 0 : 0) and the corresponding action gives rise to a non-split exact sequence 1 → ⟨h⟩ ≅ Z/2nZ → G → (Z/2Z)² → 1. In particular, the group G has order 8n.
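The relations among the generators g1, g2 and h of Lemma 11.5 that are used in its proof can be checked numerically. The following is only a sketch under stated assumptions: the sample points are arbitrary points away from the base-points, and the helper names `make_h`, `proj_equal` and `relations_hold` are illustrative. It tests that g1² and g2² both equal (x : y : z) ↦ (−x : y : z), that this map is also the n-th power of h, that g1 and g2 commute, and that conjugation by g1 or g2 inverts h.

```python
import cmath

def g1(p):
    x, y, z = p
    return (y * z, x * y, -x * z)

def g2(p):
    x, y, z = p
    return (y * z * (y - z), x * z * (y + z), x * y * (y + z))

def make_h(n, power=1):
    """The map h^power, where h scales x by exp(2*pi*i / (2n))."""
    zeta = cmath.exp(2j * cmath.pi * power / (2 * n))
    return lambda p: (zeta * p[0], p[1], p[2])

def proj_equal(p, q, tol=1e-9):
    """Same point of P^2 iff all 2x2 minors vanish (relative tolerance)."""
    scale = max(abs(c) for c in p) * max(abs(c) for c in q)
    return all(abs(p[i] * q[j] - p[j] * q[i]) <= tol * scale
               for i in range(3) for j in range(i + 1, 3))

# Arbitrary sample points, chosen away from the base-points of g1 and g2.
SAMPLE_POINTS = [(1 + 2j, 3 - 1j, 2 + 0.5j), (0.7, -1.3 + 0.2j, 2.1j), (-2, 5, 3)]

def relations_hold(n, points=SAMPLE_POINTS):
    h, h_inv, h_n = make_h(n), make_h(n, power=-1), make_h(n, power=n)
    minus_x = lambda p: (-p[0], p[1], p[2])
    for p in points:
        checks = [
            (g1(g1(p)), minus_x(p)),      # g1^2 = (x : y : z) -> (-x : y : z)
            (g2(g2(p)), minus_x(p)),      # g2^2 is the same map
            (h_n(p), minus_x(p)),         # and so is h^n
            (g1(g2(p)), g2(g1(p))),       # g1 and g2 commute
            (g1(h(p)), h_inv(g1(p))),     # g1 h = h^{-1} g1
            (g2(h(p)), h_inv(g2(p))),     # g2 h = h^{-1} g2
        ]
        if not all(proj_equal(a, b) for a, b in checks):
            return False
    return True

print(relations_hold(3))
```

A numerical check at a few points is evidence for these identities of rational maps rather than a proof; each identity can also be confirmed by expanding the compositions by hand and cancelling the common factor.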
Furthermore, no non-trivial element of G fixes a curve of positive genus, and G is not birationally conjugate to a group of automorphisms of a minimal surface.

Proof. Firstly, since g1 and g2 generate the group Cs24, which is not birationally conjugate to a group of automorphisms of a minimal surface, this is also the case for G.

Secondly, we compute that (g1)² = (g2)² = h^n is the birational transformation (x : y : z) ↦ (−x : y : z). The maps g1 and g2 thus have order 4 and h has order 2n.

Thirdly, every generator of G preserves the pencil Λ of lines passing through (1 : 0 : 0). The action of g1, g2 and h on this pencil is respectively (y : z) ↦ (−y : z), (y : z) ↦ (z : y) and (y : z) ↦ (y : z). The action of G on the pencil thus gives an exact sequence 1 → G′ → G → (Z/2Z)² → 1, where G′ is the subgroup of elements of G that act trivially on the pencil Λ. It is clear that ⟨h⟩ ≅ Z/2nZ is a subgroup of G′. Since g1h(g1)^{-1} = g2h(g2)^{-1} = h^{-1} and g1 and g2 commute, the group ⟨h⟩ is equal to G′.

Finally, any element of G that fixes a curve of positive genus must act trivially on the pencil Λ and thus belongs to ⟨h⟩. Hence, only the identity is possible.

References

[Ba-Be] L. Bayle, A. Beauville, Birational involutions of P². Asian J. Math. 4 (2000), no. 1, 11–17.
[Be-Bl] A. Beauville, J. Blanc, On Cremona transformations of prime order. C.R. Acad. Sci. Paris, Sér. I 339 (2004), 257-259.
[Bea1] A. Beauville, Complex Algebraic Surfaces. London Mathematical Society Student Texts, 34, 1996.
[Bea2] A. Beauville, p-elementary subgroups of the Cremona group. J. Algebra 314 (2007), no. 2, 553-564.
[Bla1] J. Blanc, Conjugacy classes of affine automorphisms of K^n and linear automorphisms of P^n in the Cremona groups. Manuscripta Math. 119 (2006), no. 2, 225-241.
[Bla2] J. Blanc, Finite Abelian subgroups of the Cremona group of the plane, PhD Thesis, University of Geneva, 2006.
Available online at http://www.unige.ch/cyberdocuments/theses2006/BlancJ/meta.html
[Bla3] J. Blanc, Finite Abelian subgroups of the Cremona group of the plane. C.R. Acad. Sci. Paris, Sér. I 344 (2007), 21-26.
[Bla4] J. Blanc, The number of conjugacy classes of elements of the Cremona group of some given finite order. Bull. Soc. Math. France 135 (2007), no. 3, 419-434.
[Bla5] J. Blanc, On the inertia group of elliptic curves in the Cremona group of the plane. Michigan Math. J. (to appear) math.AG/0703804
[BPV] J. Blanc, I. Pan, T. Vust, Sur un théorème de Castelnuovo. Bull. Braz. Math. Soc. 39 (2008), no. 1, 61-80.
[De-Ku] H. Derksen, F. Kutzschebauch, Nonlinearizable holomorphic group actions. Math. Ann. 311 (1998), no. 1, 41–53.
[dFe] T. de Fernex, On planar Cremona maps of prime order. Nagoya Math. J. 174 (2004), 1–28.
[dF-Ei] T. de Fernex, L. Ein, Resolution of indeterminacy of pairs. Algebraic geometry, 165-177, de Gruyter, Berlin (2002).
[Dol] I.V. Dolgachev, Weyl groups and Cremona transformations. Singularities I, 283–294, Proc. Sympos. Pure Math. 40, AMS, Providence (1983).
[Do-Iz] I.V. Dolgachev, V.A. Iskovskikh, Finite subgroups of the plane Cremona group. To appear in "Arithmetic and Geometry - Manin Festschrift" math.AG/0610595
[vdE] A. van den Essen, Polynomial Automorphisms and the Jacobian Conjecture. Progress in Mathematics, 190. Birkhäuser Verlag, Basel, 2000.
[Isk1] V.A. Iskovskikh, Rational surfaces with a pencil of rational curves. Math. USSR Sbornik 3 (1967), no 4.
[Isk2] V.A. Iskovskikh, Minimal models of rational surfaces over arbitrary fields. Izv. Akad. Nauk SSSR Ser. Mat. 43 (1979), no 1, 19-43, 237.
[Isk3] V.A. Iskovskikh, Factorization of birational mappings of rational surfaces from the point of view of Mori theory. Uspekhi Mat. Nauk 51 (1996) no 4 (310), 3-72.
[Isk4] V.A.
Iskovskikh, Two nonconjugate embeddings of the group S3×Z2 into the Cremona group. Tr. Mat. Inst. Steklova 241 (2003), Teor. Chisel, Algebra i Algebr. Geom., 105–109.
[Kan] S. Kantor, Theorie der endlichen Gruppen von eindeutigen Transformationen in der Ebene. Mayer & Müller, Berlin (1895).
[Kol] J. Kollár, Rational Curves on Algebraic Varieties. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. Band 32, Springer-Verlag, Berlin (1996).
[Ko-Sz] J. Kollár, E. Szabó, Fixed points of group actions and rational maps. Canadian J. Math. 52 (2000), 1054-1056.
[Kra] H. Kraft, Challenging problems on affine n-space. Séminaire Bourbaki, Vol. 1994/95. Astérisque No. 237 (1996), Exp. No. 802, 5, 295–317.
[Man] Yu. Manin, Rational surfaces over perfect fields, II. Math. USSR - Sbornik 1 (1967), 141-168.
[Mu-Um] S. Mukai, H. Umemura, Minimal rational threefolds. Algebraic geometry, Tokyo/Kyoto, (1982), 490–518, Lecture Notes in Math., 1016, Springer, Berlin, 1983.
[Um] H. Umemura, On the maximal connected algebraic subgroups of the Cremona group. I. Nagoya Math. J. 88 (1982), 213–246.
[Wim] A. Wiman, Zur Theorie der endlichen Gruppen von birationalen Transformationen in der Ebene. Math. Ann., vol. 48, (1896), 497-498, 195-

ABSTRACT

This article gives the proof of results announced in [J. Blanc, Finite Abelian subgroups of the Cremona group of the plane, C.R. Acad. Sci.
Paris, Sér. I 344 (2007), 21-26.] and some description of automorphisms of rational surfaces. Given a finite Abelian subgroup of the Cremona group of the plane, we provide a way to decide whether it is birationally conjugate to a group of automorphisms of a minimal surface. In particular, we prove that a finite cyclic group of birational transformations of the plane is linearisable if and only if none of its non-trivial elements fixes a curve of positive genus. For finite Abelian groups, there exists only one surprising exception, a group isomorphic to Z/2Z × Z/4Z, whose non-trivial elements do not fix a curve of positive genus but which is not conjugate to a group of automorphisms of a minimal rational surface. We also give some descriptions of automorphisms (not necessarily of finite order) of del Pezzo surfaces and conic bundles.
<|endoftext|><|startoftext|>
Introduction

Organic electronics, in particular organic field-effect transistors (OFETs), is a fast-developing field of research and technological development [1-3]. Pentacene (PnC) is one of the most extensively studied organic semiconductors for OFETs due to its relatively high carrier mobility [2]. Ordered molecular materials are used in electronic and photonic organic devices to obtain anisotropic properties. Therefore, techniques for the formation of high-quality films play an important role in the development of organic thin-film devices. For such applications, uniform films with thicknesses ranging from nanometers to submicrons are required. For electronic applications, film purity and interface characteristics influence the charge transport and energy transfer processes. For optical applications, control of the dipole orientation is required, as well as uniform thickness and low scattering loss. It is not easy to fulfil all these requirements by wet processing. Moreover, stable polymers like polytetrafluoroethylene (PTFE) do not dissolve in any solvent.
Therefore, vacuum-based dry processing is the only possible method for deposition of such polymers. Some polymers can be evaporated by heating in vacuum, but for complex polymers low-temperature plasma polymerisation should be used. Primary polymer degradation products are generated by the scission of the molecular chain at various sites and/or the cleavage of side groups or atoms. Depending on the nature of the polymer structure, the scission of polymer chains can occur either randomly or by an ordered depolymerisation mechanism. PTFE films were deposited in vacuum, but with a modified technique, which includes electron cloud activation of the decomposition products [4]. Since the discovery of the friction transfer method for PTFE, hot friction transfer has been used extensively to prepare substrate materials on top of which deposited chromophores form oriented layers by self-organization [5-7]. Recently it was found that vacuum-deposited and rubbed PTFE films also support the growth of oriented dye layers [8-10]. Using a series of measuring techniques (e.g. ellipsometry, optical and infrared spectroscopy and atomic force microscopy) we investigated the physical and optical properties of vacuum-deposited PTFE films and of PnC thin films formed on top of these PTFE layers, in order to find optimal conditions for the deposition of highly oriented PnC films.

2. Experimental

2.1. Description of PTFE and PnC

PTFE is a linear polymer having the chemical structure shown in Fig. 1a.

Fig.1. Chemical structure of: (a) polytetrafluoroethylene (PTFE), (b) pentacene (PnC).

PTFE can be considered a suitable organic material for the gate dielectric in organic field-effect transistor (OFET) devices because of its physical and chemical properties: very good chemical, photochemical and thermal stability, low dielectric constant, very low conductivity and high breakdown field strength.
PTFE is one of the most thermally stable plastic materials, showing no appreciable decomposition below 260°C. The chemical structure of pentacene, which consists of five annulated benzene rings, is shown in Fig. 1b. Due to its flat conformation it can easily form crystals, which show highly anisotropic transport properties. Pentacene has a molar mass of 278.35 g/mol. Its melting point is about 300°C and its heat of vaporization is 74.4 kJ/mol.

2.2. Deposition technique

The PTFE films were prepared by a special vacuum deposition technique. The films were obtained by evaporation of bulk PTFE pellets in the temperature range between 300°C and 450°C with electron cloud-assisted activation, at a typical process pressure of 10^-2 Pa, an accelerating voltage of 1-3 kV and an electron activation current of 0-5 mA, as proposed before [4,11]. The electron cloud was produced by an electron gun with a ring cathode. A computer equipped with a Sigma SQM-242 quartz oscillator card monitored the film thickness and deposition rate. The temperature of the crucible was monitored by a chromel-alumel thermocouple. The deposition rate depends both on the electron current used for activation and on the temperature of the crucible. At the start of a deposition run, the increase of the PTFE temperature in the evaporator causes an increase of both pressure and deposition rate. As the pressure rises, fragments collide with each other before reaching the substrate and lose their chemical reactivity by forming stable gaseous species, which are not incorporated into the deposited layer on the substrate. The evaporation rate is also limited by the fact that the pressure can rise only to a certain value, at which a breakdown of the electron gun occurs. Hence, there exists an operating heating temperature [4, 11], which strongly depends on the pumping speed of the vacuum system and should be determined for each installation.
This method allows fast control of the evaporation rate. In general the rate is limited by the thermal inertia of the crucible, but here it is controlled instantly through the electrical power that produces the activating electron cloud and thereby changes the quantity of active species. All PTFE films were deposited at a substrate temperature of 20°C. Fig. 2 shows schematically the installation used for the PTFE film deposition. Fig.2. Deposition set-up used for PTFE and pentacene deposition. PnC for fluorescence was purchased from Sigma-Aldrich and used as received. PnC films were deposited onto rubbed PTFE layers using a conventional tantalum boat heated by electric current. An important parameter governing film formation is the temperature of the substrate. Through this temperature, kinetic limitations such as molecular mobility and crystallization speed, together with thermodynamic factors, control the structure and morphology of the film. The substrate temperature was kept constant at room or elevated temperature and monitored by a chromel-alumel thermocouple. The deposition rate was chosen in the range between 0.05 and 0.2 nm/s. The distance between evaporators and substrate was 0.15 m. 2.3. Mechanical rubbing method The PTFE layers deposited onto different substrates were rubbed unidirectionally on a cotton cloth of the kind used to clean optical systems. The cotton for friction was placed in a fixed position on an optical table. The samples were rubbed 3-6 times on the cotton surface with a constant force and speed. The scheme for cold friction is shown in Fig. 3. Fig.3. Schematic representation of the cold friction technique applied to a PTFE layer. 2.4. Studies of the deposited films The surface morphology of the films was obtained using an atomic force microscope (Autoprobe VP 2, Park Scientific Instruments), operating in non-contact mode in air at room temperature. 
The mean thickness and refractive index (n) of the PTFE films were determined by ellipsometry using a Plasmos SD2000 Automatic Ellipsometer operating at a wavelength of 632.8 nm. The thickness of the investigated thin films was also measured using a Dektak profilometer (DEKTAK 3 from Veeco Instruments), which can measure step heights down to a few nm. Polarized absorption spectra of pentacene films were obtained with a UV/VIS spectrometer (Lambda 16, Perkin Elmer). Infrared spectra of PTFE films were measured with a Perkin-Elmer Spectrum 2000 Fourier transform infrared (FT-IR) spectrometer. 3. Results and discussions The analysis of the results obtained with electron cloud-assisted evaporation revealed that only a few important processes determine the film properties. Fig. 4 shows the influence of the evaporator temperature on the layer thickness at a constant deposition time of 10 minutes. The presented curve stops well below the limiting pressure above which the deposition rate decreases for the reason described above [4]. With increasing electron activation current, the deposition rate and the resulting film thickness increase. The maximum deposition rate, obtained at a limiting pressure of (5-6) × 10⁻² Pa, was 0.18 nm/s at an activation current of 10 mA and a voltage of 3 kV. The surface relief of PTFE films deposited under different conditions onto silicon substrates is shown in Fig. 5. Fig.4. Dependence of layer thickness on crucible temperature after 10 minutes of deposition, I = 2 mA, V = 1.5 kV. Fig.5. Surface morphology and profile of PTFE films: (a) 2 mA and (b) 1 mA electron activation current, respectively. The surface of all PTFE films is smooth. For the smaller electron activation current, a coarser granular structure is detected on the surface. The root mean square (RMS) roughness is 1 nm and 3 nm, respectively. 
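The relation between deposition rate, deposition time and film thickness used throughout this section can be sketched as follows. This is a minimal illustration that assumes a constant rate and no re-evaporation; the function names are hypothetical, and combining the quoted maximum rate with a 10-minute run is an illustrative choice, not a condition reported for Fig. 4.

```python
def film_thickness_nm(rate_nm_per_s: float, time_s: float) -> float:
    """Thickness reached at a constant deposition rate (idealised, no re-evaporation)."""
    return rate_nm_per_s * time_s


def deposition_time_s(thickness_nm: float, rate_nm_per_s: float) -> float:
    """Time needed to reach a target thickness at a constant deposition rate."""
    if rate_nm_per_s <= 0:
        raise ValueError("deposition rate must be positive")
    return thickness_nm / rate_nm_per_s


# Hypothetical run: the maximum rate quoted in the text held for 10 minutes.
thickness = film_thickness_nm(0.18, 10 * 60)
print(f"thickness after 10 min at 0.18 nm/s: {thickness:.0f} nm")

# Time window for a 75 nm PnC film over the quoted rate range 0.05 to 0.2 nm/s.
slow = deposition_time_s(75.0, 0.05)
fast = deposition_time_s(75.0, 0.2)
print(f"75 nm takes between {fast:.0f} s and {slow:.0f} s")
```

Under these assumptions the maximum rate corresponds to roughly 108 nm in 10 minutes, and a 75 nm pentacene film takes between about 6 and 25 minutes.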
The obtained RMS values indicate that a smoother surface is obtained at higher electron activation energy. Ellipsometry results confirm the AFM investigations: the thicknesses determined by both methods are comparable. In addition, ellipsometry revealed a change of the refractive index with electron activation energy and current density. Thus, the electron activation parameters affect the surface morphology and refractive index of the PTFE films. Table 1. Refractive index of PTFE films versus activation conditions. The IR spectra of films deposited under different activation conditions are depicted in Fig. 6. The bands at 1161 and 1258 cm⁻¹ were assigned to the -CF2- groups, and the band at 1350 cm⁻¹ to groups with a double bond. The intensity of the bands at 524 and 556 cm⁻¹ is lower than that of the band at 736 cm⁻¹, indicating that the film material is almost amorphous [11-13]. Normally, PTFE layers deposited at low electron activation are crystalline [4, 13]. An increase of the electron activation current makes the films amorphous and increases the content of double bonds and side branches. Here, at low activation power, almost amorphous films with some double bonds but almost without branches were deposited. Fig.6. IR spectra of 500 nm thick films deposited by PTFE evaporation under the following conditions: (1) activation current 1.5 mA; (2) activation current 2 mA. Inset: magnification of the IR spectra in the range from 500 to 1000 cm⁻¹. After rubbing with a cotton cloth, the film surfaces were investigated by AFM and profilometry. The film surface acquired an ordered relief oriented in the direction of friction. Fig. 7 shows the relief and the profile of the PTFE layer after friction. The PTFE grooves are 10-100 µm long and about 300 nm high. Also, we can see that the spectral lines show exactly the linear structure of PTFE. Fig.7. 
The 1 µm × 1 µm AFM scan of rubbed PTFE films: (a) 3D image of the film relief; (b) profile of a series of grooves. The groove length is about 100 nm, and the height is 300 nm. Fig.8. Polarized absorption spectra of pentacene films for the parallel (dotted curves) and perpendicular (solid curves) orientation of the electric vector of light with respect to the PTFE layer alignment. Films were deposited under the following substrate conditions: (a) onto 36 nm PTFE at 20°C, (b) onto 50 nm PTFE at 20°C, (c) onto 90 nm PTFE at 75°C, (d) onto 50 nm PTFE at 75°C. Film thicknesses by quartz monitor: (a) and (b) 75 nm, (c) and (d) 80 nm. Band splitting at the main absorption is 30, 34, 40 and 40 nm for (a), (b), (c) and (d), respectively. Fig.9. Surface relief of PnC film on a rubbed PTFE sublayer. Measurements of electronic absorption spectra of the pentacene films deposited onto rubbed PTFE layers of different thickness have shown that the orientation of the PnC films depends on both the PTFE film thickness and the substrate temperature. Optical spectra of some PnC films are presented in Fig. 8. They are in good agreement with spectra of the α- and β-phase of PnC films deposited onto both inorganic and polymer substrates, including PTFE [7, 14]. Rubbed PTFE films of about 50 nm thickness lead to the best oriented PnC films. Absorption measurements with polarized light have shown that the deposited PnC films show a pronounced dichroism. A dichroic ratio of about 2 was measured even at a deposition temperature of 20°C. This dichroic ratio is larger than that obtained for PnC films deposited onto friction-transferred PTFE layers, for which no dichroism at all was observed at a deposition temperature of 20°C. The temperature elevation from 20°C to 75°C slightly enhances the PnC film orientation and changes the spectral shape. The latter two effects can be explained by the enhancement of PnC molecular mobility. The former one is a subject for further detailed study. 
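The dichroic ratio quoted above is the ratio of the absorbances measured with light polarized along the two orthogonal directions. The sketch below shows that arithmetic with hypothetical absorbance values, not measured data. The conversion to an order parameter, S = (D − 1)/(D + 2), is a commonly used relation for uniaxial samples; it is an added assumption here and is not used in the text.

```python
def dichroic_ratio(a_strong: float, a_weak: float) -> float:
    """Ratio of the absorbances for the two orthogonal polarizations."""
    if a_weak <= 0:
        raise ValueError("absorbance in the weak direction must be positive")
    return a_strong / a_weak


def order_parameter(d: float) -> float:
    """Uniaxial order parameter S = (D - 1) / (D + 2) (assumption, not from the text)."""
    return (d - 1.0) / (d + 2.0)


# Hypothetical absorbances at the main absorption band.
d = dichroic_ratio(0.40, 0.20)
s = order_parameter(d)
print(f"dichroic ratio D = {d:.2f}, order parameter S = {s:.2f}")
```

With these illustrative numbers a dichroic ratio of 2 corresponds to S = 0.25, while an isotropic film (D = 1) gives S = 0.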
A small difference in the spectral shape indicates different molecular interactions inside the PnC crystals, depending on the deposition conditions. The crystal size and structure are also sensitive to the deposition conditions, which results in a modification of the absorption spectra. The optical spectra of the PnC films deposited at 75°C show a small shift of all bands towards the red region and an increase of band splitting in comparison with the bands of the films deposited at 20°C, thus evidencing better intermolecular interactions in films deposited at elevated temperature. Both the PTFE film thickness and the substrate temperature allow this parameter to be controlled in order to deposit PnC films with predetermined properties. The absorption of films deposited at 75°C is smaller than the absorption of films deposited at 20°C, although the quartz monitor thickness was the same for both samples. Obviously, re-evaporation took place already at 75°C, as mentioned before. By AFM no preferred crystal orientation was found in any of the PnC films. The typical relief of a PnC film on an aligned PTFE layer is shown in Fig. 9a. The crystal size is in the range of 80 to 200 nm, depending on the deposition conditions. Sometimes freely distributed needle-like PnC crystals with a long axis of up to 500 nm appeared (Fig. 9b). Such films have low optical anisotropy. Therefore, the optical anisotropy of PnC films is due to the unidirectional arrangement of PnC molecules inside all crystals. Comparison of the obtained PnC crystals with those grown on friction-transferred PTFE layers shows that the crystals grown on the vacuum deposited, rubbed PTFE layers are smaller and rounder. A similar effect was found for the growth of squarylium dyes on such vacuum deposited PTFE films [10]. This effect is caused by the smaller surface relief of the vacuum-deposited and rubbed PTFE film in comparison to the friction-transferred films. 
In addition, differences in the structure of friction-transferred and vacuum-deposited PTFE also play a role. PnC nucleation directed by PTFE edges is the main mechanism of growth of oriented PnC films, as proposed by Brinkmann et al. [7]. They observed that the domains of the top material are forced to grow parallel to the ledge direction due to confinement by the PTFE nanofibrils. Lateral growth of the domains is possible only when their height exceeds that of the ledge. The opinion that PTFE aligned on the molecular level has the prevailing influence on oriented dye growth was expressed previously by Tanaka et al. [8] and Wittmann et al. [5, 6, 7]. Our results seem to support the latter opinion, but the amorphous structure of our PTFE films should be taken into account. Perhaps both mechanisms operate, with contributions that depend on both the sublayer properties and the deposition conditions. But even this suggestion does not explain all peculiarities, so further research should be carried out. 4. Conclusions Amorphous PTFE films with an RMS roughness of 1-3 nm were deposited by electron cloud-assisted deposition in vacuum. Aligned grooves and ridges on the PTFE film surface were obtained by rubbing with a cotton cloth. The PTFE film thickness and an elevated growth temperature influence the anisotropy of the pentacene film. A dichroic ratio of about 2 was obtained even when the substrate was held at room temperature. The pentacene film is oriented on the molecular level. The strength of this technique is that the vacuum deposited and rubbed PTFE layers have a higher orienting power than friction-transferred PTFE layers, so that they may favorably be used in OFETs as a bottom gate dielectric which induces enhanced order in the channel material deposited on top of them. In addition, the vacuum deposited PTFE layers can also be used in top gate geometry, i.e. 
by deposition on top of OFET channel materials on plastic substrates. 5. Acknowledgements The authors would like to thank Dagmar Stabenow (University of Potsdam) and Ramakrishna Velagapudi (University of Applied Sciences Wildau) for the AFM and optical measurements, and Dr. Oleg Dimitiriev (Institute of Semiconductor Physics, Kyiv) for fruitful discussions. Financial support of the European Commission under contract number HPRN-CT-2002-00327-RTN EUROFET and of the Federal Ministry of Education and Research (BMBF) under project no. Ukr 04/004 is gratefully acknowledged. 6. References 1. Daraktchlev M, von Muchlenen A, Nuesch F. New J. of Physics 2005; 7:113. 2. Mattis BA, Pei Y, Subramanian V. Appl. Phys. Lett. 2005; 86:033113. 3. Misaki M, Ueda Y. Appl. Phys. Lett. 2005; 87:243503. 4. Gritsenko KP, Krasovsky AM. Chem. Rev. 2004; 103(9):3607. 5. Wittmann JC, Smith P. Nature 1991; 352:414. 6. Moulin JF, Brinkmann M, Thierry A, Wittmann JC. Adv. Mater. 2002; 14(6):436. 7. Brinkmann M, Graff S, Straupe C, Wittmann JC. J. Phys. Chem. 2003; B107:10531. 8. Tanaka T, Honda Y, Ishitobi M. Langmuir 2002; 17:2192. 9. Gritsenko KP, Slominski Yu L, Tolmachev AI, Tanaka T, Schrader S. Proc. SPIE 2002; 4833:482. 10. Gritsenko KP, Grinko DO, Dimitrev OP, Schrader S, Thierry A, Wittmann JC. Optical Memory and Neural Networks 2004; N3:135. 11. Roeges NPG. A Guide to the Complete Interpretation of Infrared Spectra of Organic Structures, Wiley: New York (1994). 12. Liang CY, Krimm S. J. Chem. Phys. 1956; 25:563. 13. Gritsenko KP, Lantoukh GV. J. Applied Spectroscopy 1990; 52:677. 14. Brinkmann M, Videva VS, Bieber A. J. Phys. Chem. 2004; A108:8170. 15. Ruiz R, Chouldhary D, Nickel B. Chem. Mater. 2004; 16:4497. 16. Pratontep S, Nüesch F, Zuppiroli L, Brinkmann M. Phys. Rev. 2005; B 72:085211. 
ABSTRACT We investigated the structure and morphology of PTFE layers deposited by a vacuum process as a function of the deposition parameters: deposition rate, deposition temperature, electron activation energy and activation current. Pentacene (PnC) layers deposited on top of those PTFE films are used as a tool to demonstrate the orienting ability of the PTFE layers. The molecular structure of the PTFE films was investigated by infrared spectroscopy. By means of ellipsometry, refractive index values between 1.33 and 1.36 were obtained for PTFE films, depending on the deposition conditions. Using the cold friction technique, orienting PTFE layers with unidirectional grooves are obtained. On top of these PTFE films oriented PnC layers were grown. The obtained order depends both on the PTFE layer thickness and on the PnC growth temperature. <|endoftext|><|startoftext|> Introduction The following notations are used: $\sum_{(n)}$ stands for $\sum_{n_1+\dots+n_p=n}$ with $n_1,\dots,n_p \in \mathbb{N}_0$, and $\sum$ without any indices means $\sum_{n=0}^{\infty}\sum_{(n)}$. The notation D ≥ 0 is also used for non–symmetrical matrices $D_{p\times p}$ with only non–negative eigenvalues. The spectral norm of a p × p–matrix B is denoted by ‖B‖, I or $I_p$ is always an identity matrix and $C_p$ is the p–cube $(-\pi,\pi]^p$. The Laplace transform (L.t.) of a p–variate non–central Γp(α,Σ,∆)–density with α > 0, Σ > 0 and a non–centrality matrix ∆ ≥ 0 was originally obtained from the L.t. of a non-central Wp(2α,Σ,∆)–Wishart distribution (with an additional scale factor 2) and is given by $\hat f(t_1,\dots,t_p;\alpha,\Sigma,\Delta) = |I_p+\Sigma T|^{-\alpha}\,\mathrm{etr}\bigl(-\Sigma T(I+\Sigma T)^{-1}\Delta\bigr)$, (1) with $T = \mathrm{diag}(t_1,\dots,t_p)$, $t_1,\dots,t_p \ge 0$. This function $\hat f$ is generally the L.t. of the density of a real measure on $(0,\infty)^p$ which is not always a probability measure. The term "Γp(α,Σ,∆)–distribution" is used here in this general sense. The exact set of values α leading to a probability density (pdf) $f(x_1,\dots,x_p;\alpha,\Sigma,\Delta)$ depends on Σ and presumably on ∆. 
To obtain a pdf, all positive integers 2α (degrees of freedom) are admissible and all 2α > p − 1. Moreover, in the central case all non–integer values 2α > p− 2 ≥ 0 are allowed. For p− 2 < 2α < p− 1 http://arxiv.org/abs/0704.0539v1 see Royen (1997). Furthermore all α > 0 are admissible if |I + ΣT |−1 is infinitely divisible. Two characterizations of infinite divisibility of a Γp(α,Σ)–distribution are found in Griffiths (1984) and Bapat (1989). Further conditions for admissible non– integer 2α < p− 2 are given in Royen (1997), (2006). Three integral representations by integration over Cp are provided by theorem 2 in section 4 for the functions F (x1, . . . , xp;α1, . . . , αn,Σ1, . . . ,Σn,∆1, . . . ,∆n) (2) . . . f(ξ1, . . . , ξp;α1, . . . , αn,Σ1, . . . ,Σn,∆1, . . . ,∆n)dξ1 . . . dξp, where f has the L.t. |Ip +ΣkT |−αketr(−ΣkT (I +ΣkT )−1∆k), (3) α1, . . . , αn > 0,Σ1, . . . ,Σn > 0,∆1, . . . ,∆n ≥ 0. Thus, F is not always the cumulative distribution function (cdf) of a probability measure. In particular letXp×n be aNp×n(Mp×n,Σp×p⊗In)–random matrix and An×n ≥ 0 of rank q with T ′AT = Λ = diag(λ1, . . . , λn), λ1 ≥ . . . ≥ λn. Then the joint distribution of the diagonal elements of the generalized quadratic form 1 XAX ′ equals the distribution of the diagonal of 1 Y ΛY ′ with a Np×n(MT,Σp×p⊗In)–distributed Y = XT . This is the distribution of a sum of q independent Γp( , λkΣ,∆k = −1)–random vectors, where µ∗k is the k–th column of M ∗ = MT . This joint distribution of p quadratic forms of normal random vectors is comprised within theorem 2 as a special case with , Σk = λkΣ, k = 1, . . . , q. For methods under more general assumptions see also Blacher (2003). For a survey of univariate quadratic forms of normal random variables see chapter 4 in Mathai and Provost (1992). For several quadratic forms of skew elliptical distributions see B.Q. Fang (2005). 
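The reduction of the generalized quadratic form to a diagonal Λ can be illustrated numerically. The sketch below uses a small hypothetical example (p = 3, n = 2, a rotation matrix as T, arbitrary entries for X) and plain Python lists; it only checks the algebraic identity diag(XAX′) = diag(YΛY′) with Y = XT and T′AT = Λ, and any scalar factor in front of the quadratic form is left out because it does not affect the identity.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def diag(a):
    return [a[i][i] for i in range(len(a))]


# Hypothetical orthogonal T (a rotation) and eigenvalues lambda_1 >= lambda_2.
angle = 0.7
T = [[math.cos(angle), -math.sin(angle)],
     [math.sin(angle), math.cos(angle)]]
lam = [3.0, 0.5]
Lam = [[lam[0], 0.0], [0.0, lam[1]]]

# A = T Lam T' so that T' A T = Lam.
A = matmul(matmul(T, Lam), transpose(T))

# X is p x n with p = 3 and n = 2 (arbitrary illustrative entries).
X = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0]]
Y = matmul(X, T)

lhs = diag(matmul(matmul(X, A), transpose(X)))
rhs = diag(matmul(matmul(Y, Lam), transpose(Y)))
# Each diagonal element is a weighted sum of squares over the columns of Y.
sums = [sum(lam[k] * Y[j][k] ** 2 for k in range(len(lam))) for j in range(len(Y))]
print(lhs)
print(rhs)
print(sums)
```

The three printed vectors agree up to rounding, which shows how the diagonal of the quadratic form splits into one term per non-zero eigenvalue λk.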
In Royen (1991), (1992) three different types of series expansions for the $\chi^2_p(2\alpha,\Sigma)$–cdf were derived from three different representations of the $\chi^2_p(2\alpha,\Sigma)$–L.t., which are extended to the general Γp(α,Σ,∆)–L.t. in section 3 in a similar way as in Royen (1995). Some series expansions, closely related to the first two types, are already found in Khatri, Krishnaiah and Sen (1977). The third type was introduced because of its superior convergence properties. The simple method to transform many series expansions into integrals over $C_p$ is explained in more detail in section 2 and summarized in theorem 1. The idea is as follows: If $A(z_1,\dots,z_p)$ and $B(z_1,\dots,z_p)$ are analytic functions whose power series have the coefficients $a(m_1,\dots,m_p)$ and $b(n_1,\dots,n_p)$ and which are absolutely convergent for $\max|z_j| < r_A$ and $\max|z_j| < r_B$ respectively, where $r_B^{-1} < r_A$, then $(2\pi)^{-p}\int_{C_p} A(y_1,\dots,y_p)\,B(y_1^{-1},\dots,y_p^{-1})\,d\varphi_1\cdots d\varphi_p = \sum a(n_1,\dots,n_p)\,b(n_1,\dots,n_p)$ (4) holds with $y_j = re^{i\varphi_j}$, $-\pi < \varphi_j \le \pi$, $j = 1,\dots,p$, and $r_B^{-1} < r < r_A$. The integrals in (4) might be more economical than the series if the generating functions A and B are simple available functions and if the series are slowly convergent with very intricate coefficients. For non–central multivariate gamma distributions series expansions are practically not feasible. The integral representations in theorem 2 of section 4 are of the type in (4). As long as no elementary density formulas are available, it should be a reasonable way to obtain the joint cdf by integration of elementary terms only over $C_p$ and not over $\mathbb{R}^p$ as by the Fourier or Laplace inversion formula. A single Γp(α,Σ,∆)–cdf is represented by a (p − 1)–variate integral over $C_{p-1}$ in section 5. 
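Identity (4) can be checked numerically in the simplest case p = 1. The sketch below is an illustration under assumed inputs, not part of the paper: it takes A(z) = exp(z) with coefficients 1/n! and B(z) = 1/(1 − z/2) with coefficients 2^(−n), so that r_A = ∞ and r_B = 2, and evaluates the integral over (−π, π] with an equally spaced rule at r = 1. The function name `contour_mean` is hypothetical.

```python
import cmath
import math


def contour_mean(A, B, r, n_points=64):
    """Approximate (2*pi)^(-1) * integral of A(y) * B(1/y) over phi in (-pi, pi],
    with y = r * exp(i*phi), using n_points equally spaced nodes."""
    total = 0j
    for k in range(1, n_points + 1):
        phi = -math.pi + 2.0 * math.pi * k / n_points
        y = r * cmath.exp(1j * phi)
        total += A(y) * B(1.0 / y)
    return total / n_points


def A(z):
    return cmath.exp(z)            # a(n) = 1 / n!


def B(z):
    return 1.0 / (1.0 - z / 2.0)   # b(n) = 2**(-n), convergent for |z| < 2


# r must satisfy 1/r_B < r < r_A, i.e. r > 1/2 here.
integral = contour_mean(A, B, r=1.0)

# Right-hand side of (4): sum of a(n) * b(n), truncated after 40 terms.
series = sum((1.0 / math.factorial(n)) * 0.5 ** n for n in range(40))

print(integral.real, series, math.exp(0.5))
```

Both sides agree with exp(1/2) to rounding accuracy. For periodic analytic integrands the equally spaced rule converges very quickly, which is one reason integrals of type (4) can be attractive when the series coefficients are intricate.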
A totally different integral representation of the Γp(α,Σ)–cdf has been given recently by Royen (2006), which is based on m–factorial decompositions $\Sigma^{-1}_{p\times p} = D - BB'$, where D is a real or complex diagonal matrix minimizing the rank m of $\Sigma^{-1} - D$. Approximations to a Γp(α,Σ)–cdf are obtained by m–factorial approximations to Σ with a low value of m. These approximations are improved further by successive correction terms. 2. The method Theorem 1 in this section can be generalized in many ways, e.g. for Fourier transforms, but the version below is sufficient for the purposes of this paper. Let $\hat f(t_1,\dots,t_p)$, $t_1,\dots,t_p \ge 0$, be a given L.t. of an unknown function $f(x_1,\dots,x_p)$ with $f = 0$ for $\min x_j < 0$. It is assumed that there are univariate L.t. $\hat g_{j0}(t)$ of some probability densities $g_{j0}(x)$ on $(0,\infty)$ and further functions $h_j(t)$ with $|h_j(t)| \le 1$, uniformly for $t \ge 0$, which enable a representation $\hat f(t_1,\dots,t_p) = \Bigl(\prod_{j=1}^{p}\hat g_{j0}(t_j)\Bigr) B\bigl(h_1(t_1),\dots,h_p(t_p)\bigr)$ (5) with an analytic function $B(z_1,\dots,z_p)$ whose power series expansion $\sum b(n_1,\dots,n_p)\prod_{j=1}^{p} z_j^{n_j}$ (6) is absolutely convergent for $|z_1|,\dots,|z_p| < r_B$ with a certain value $r_B > 1$. Furthermore, the products $\hat g_{j0}(t)(h_j(t))^n$ are supposed to be the L.t. of continuous functions $g_{jn}(x)$, $x > 0$, which satisfy the conditions $|g_{jn}(x)| \le n^{c}k(x)$ with a constant c and $\int_0^{\infty} k(x)e^{-tx}\,dx < \infty$ for all $t > 0$. (7) Hence, the generating functions (generators) $g_j(x,y) = \sum_{n=0}^{\infty} g_{jn}(x)y^n$, $j = 1,\dots,p$, (8) are defined for all $x > 0$ and $|y| < 1$, and they have the L.t. $\hat g_j(t,y) = \dfrac{\hat g_{j0}(t)}{1 - y\,h_j(t)}$, $t \ge 0$. (9) Theorem 1. Under the assumptions from (6) and (7), $\hat f$ in (5) is the L.t. of $f(x_1,\dots,x_p) = (2\pi)^{-p}\int_{C_p} B(y_1^{-1},\dots,y_p^{-1})\prod_{j=1}^{p} g_j(x_j,y_j)\,d\varphi_j$ (10) with $y_j = re^{i\varphi_j}$, $-\pi < \varphi_j \le \pi$, $r_B^{-1} < r < 1$, $g_j$ from (8). Proof. The integral in (10) is evaluated by $(2\pi)^{-p}\int_{C_p}\sum b(m_1,\dots,m_p)\prod_{j=1}^{p} y_j^{-m_j}\prod_{j=1}^{p}\Bigl(\sum_{n_j=0}^{\infty} g_{jn_j}(x_j)y_j^{n_j}\Bigr)d\varphi_1\cdots d\varphi_p = \sum b(n_1,\dots,n_p)\prod_{j=1}^{p} g_{jn_j}(x_j)$ and this series has the L.t. from (5). 
Some further remarks: With $G_j(x_j,y_j) = \int_0^{x_j} g_j(\xi,y_j)\,d\xi$ (11) instead of the $g_j$ in (10), the corresponding representation arises for $F(x_1,\dots,x_p) = \int_0^{x_1}\!\cdots\!\int_0^{x_p} f(\xi_1,\dots,\xi_p)\,d\xi_1\cdots d\xi_p$. (12) If the series in (8) are absolutely convergent for all $y \in \mathbb{C}$ then additionally $\lim_{t\to\infty} h_j(t) = 0$ (13) is supposed to hold. Then the rhs of (9) is the L.t. of $g_j(x,y)$ for any fixed y and all sufficiently large t. In some cases the functions $g_{j0}$ and their L.t. $\hat g_{j0}$ are known from univariate marginal distributions apart from some scale factors. If the functions $u_j = h_j(t)$ are explicitly invertible then $B(u_1,\dots,u_p) = \hat f\bigl(h_1^{-1}(u_1),\dots,h_p^{-1}(u_p)\bigr)\Big/\prod_{j=1}^{p}\hat g_{j0}\bigl(h_j^{-1}(u_j)\bigr)$ (14) can sometimes be found easily from the given $\hat f$. 3. Three representations for the Γp(α,Σ,∆)–Laplace transform and the related generators With any $v > 0$ we define $z_j = (1 + v^{-1}t_j)^{-1}$, $t_j \ge 0$, $u_j = 1 - z_j = v^{-1}t_jz_j$, $\omega_j = z_j - u_j$, $Z = \mathrm{diag}(z_1,\dots,z_p)$, $U = \mathrm{diag}(u_1,\dots,u_p)$, $\Omega = \mathrm{diag}(\omega_1,\dots,\omega_p)$. (15) The scale factor v is introduced to obtain ‖B‖ < 1 for the matrices B defined in (20) below and to effect the convergence of some series expansions. For a more general scaling see the remarks following theorem 2 in section 4. From the relations $v^{-1}T = UZ^{-1}$, $I_p = Z + U$, $\Omega = Z - U$, (16) it follows for the matrices $I + \Sigma T$ in the L.t. (1): $I + \Sigma T = I + v\Sigma UZ^{-1} = (Z + v\Sigma U)Z^{-1}$ (17) and $Z + v\Sigma U = I + (v\Sigma - I)U$ (18a) $= v\Sigma\bigl(I + (v^{-1}\Sigma^{-1} - I)Z\bigr)$ (18b) $= \tfrac12(I + v\Sigma)\bigl(I + (2(I + v\Sigma)^{-1} - I)\Omega\bigr)$, (18c) and therefore $|I + \Sigma T|^{-\alpha} = c^{\alpha}|Z|^{\alpha}|I + BY|^{-\alpha}$ (19) with $Y = U$, $B = v\Sigma - I$, $c = 1$, (20a) $Y = Z$, $B = (v\Sigma)^{-1} - I$, $c = |I + B|$, (20b) $Y = \Omega$, $B = 2(I + v\Sigma)^{-1} - I$, $c = |I + B|$. (20c) It should be noticed that ‖B‖ < 1 in (20c) for every v > 0 and Σ > 0. Now, using (16), by a straightforward calculation the L.t. in (1) can be represented f̂(t1, . . . 
, tp;α,Σ,∆) = |Z|α|I +BU |−αetr(−(I +B)U(I +BU)−1∆), (21a) |I +B|αetr(−∆)|Z|α|I +BZ|−αetr(Z(I +BZ)−1(I +B)∆), (21b) |I +B|αetr(− 1 ∆(I −B)) (21c) ·|Z|α|I +BΩ|−αetr(1 Ω(I +BΩ)−1(I +B)∆(I −B)), with the corresponding matrices B from (20) and Z,U,Ω from (15). For the former series expansions the following relations were used: Laplace transform f̂(t): f(x): F (x) = f(ξ)dξ: zαun vg α+n(vx) G α+n(vx) (22a) zα+n vgα+n(vx) Gα+n(vx) (22b) zαωn vhα,n(vx) Hα,n(vx) (22c) where z = (1 + v−1t)−1, gα+n(x) = e −xxα−1+n/Γ(α+ n), α+n(x) = gα+n(x) = α− 1 + n L(α−1)n (x)gα(x) with the generalized Laguerre polynomials L (α−1) n and hα,n(x) = (−1)n α− 1 + n L(α−1)n (2x)gα(x). The last identity is verified by L.t. The following bounds are derived from (22.14.13) in Abramowitz and Stegun (1965): ∣∣∣g(n)α+n(x) ∣∣∣ ≤ ex/2gα(x), α ≥ 1 2nα−1ex/2gα(x), 0 < α < 1 |hα,n(x)| ≤ xα−1/Γ(α), α ≥ 1 2nxα−1/Γ(α+ 1), 0 < α < 1 , (24) matching with the conditions in (7). The following generators (generating functions) with the Γ(α+n)–cdf Gα+n(x) are required for the formulas in theorem 2: Fα(x, y) = n=0 G α+n(x)y n = 1 , |y| < 1, (25a) n=0 Gα+n(x)y n = Gα(x, y), y ∈ C, (25b) n=0 Hα,n(x)y n = 1 x, 2y , |y| < 1 (25c) The identities (a) and (c) are verified by the L.t. of fα(x, y) = Fα(x, y). A short calculation shows Gα(x, y) = Gα(x) − y1−αe(y−1)x Gα(xy) , y 6= 1, α > 0 Gα−1(x) − y1−αe(y−1)x Gα−1(xy) , α ≥ 1, G0 := 1 xgα(x) + (1 + x− α) Gα(x), y = 1 gα(x, y) = Gα(x, y) = gα(x) + y 1−αe(y−1)x Gα(xy), α > 0 y1−αe(y−1)x Gα−1(xy), α ≥ 1 The functions Fα(x, y) are especially simple for α ∈ N since Gα(z) = 1−e−z j=0 z j/j!, α ∈ N. Besides, Gk+1/2(z) = erf(z 1/2)− e−z zj−1/2 Γ(j + 1/2) , k ∈ N0. The following simple lemma is used for the proof of theorem 2. Lemma 1. If B is a symmetrical p× p–matrix with ‖B‖ < 1 and Y = diag(y1, . . . , yp) then the power series expansion |I +BY |−α = b(n1, . . . , np) is absolutely convergent for max |yj | < rB = ‖B‖−1. This follows from (n) |b(n1, . . . 
, np)| = O(ϑn) with any ϑ > ‖B‖, which has been already shown in (2.1.16) . . . (2.1.18) in Royen (1991) (with the notation −C instead of 4. The integral representations In theorem 2 below the functions F (x1, . . . , xp;α1, . . . , αn,Σ1, . . . ,Σn,∆1, . . . ,∆n) from (2) are represented by three different integrals over Cp = (−π, π]p. Together with the generators Fα from (25), α = k=1 αk, the following matrices are used with a scale factor v to enforce ‖Bk‖ < 1: Bk = vΣk − I, Dk = ∆k(I +Bk), Fα from (25a), (27a) Bk = (vΣk) −1 − I, Dk = (I +Bk)∆k, Fα from (25b), (27b) Bk = 2(I + vΣk) −1 − I, Dk = 12 (I +Bk)∆k(I −Bk), Fα from (25c). (27c) Furthermore, we define λmax = max ‖Σk‖ , λ−1min = max ‖Σ k ‖, yj = reiϕj , −π < ϕj ≤ π, Y = diag(y1, . . . , yp), K = K(y1, . . . , yp) = etr(±(Y +Bk)−1Dk)|I +BkY −1|−αk , where the negative sign occurs only with Bk, Dk from (27a), and Fαdϕ = j=1 Fα(vxj , yj)dϕj . Theorem 2. With the above notations the functions F from (2) are respresentable by each of the following three integrals: (2π)−p KFαdϕ, (28) Fα from (25a), Bk, Dk from (27a), ‖Bk‖ < 1 if v < 2λ−1max, max ‖Bk‖ < r < 1, etr(−∆k)|I +Bk|αk (2π)−p KFαdϕ, (29) Fα from (25b), Bk, Dk from (27b), ‖Bk‖ < 1 if v > 12λ min, max ‖Bk‖ < r, ∆k(I −Bk) |I +Bk|αk (2π)−p KFαdϕ, (30) Fα from (25c), Bk, Dk from (27c), v > 0, max ‖Bk‖ < r < 1. Proof. Because of lemma 1 the assumptions of theorem 1 are satisfied with ĝj0(t) = z j = (1 + v −1tj) −α and hj(tj) corresponding to zj or uj = v −1tjzj = 1 − zj or ωj = zj − uj respectively. The functions ĝj0(t)(hj(t))n are the L.t. of the functions in the second column of (22) from which type (a) and (c) have the bounds in (23), (24), satisfying the condition (7) for theorem 1. The series n=0 Gα+n(x)y n = Gα(x, y) in (25b) is absolutely convergent for every y ∈ C. Thus, all r > max ‖Bk‖ are admissible in (29). In (30) we have max ‖Bk‖ < 1 for every v > 0. Hence, theorem 1 together with the respresentations of the L.t. 
in (21) implies (28), (29) and (30). The univariate case of (29) provides F (x;α1, . . . , αn, σ 1 , . . . , σ 1 , . . . , δ n) = (31)( σ−2αkk e Gα(vx, e δ2k/(1 + vσ iϕ − 1)) 1 + (v−1σ−2k − 1)e−iϕ with 2v > maxσ−2k , r = 1, Gα from (26). With p = 1 similar formulas arise from (28) or (30). The cdf of a quadratic form 1 x′Ax with T ′AT = diag(λ1, . . . , λn) ≥ 0 of rank q and a N (µ, σ2In)–random vector x is a special case of (31) with αk = 12 , σ k = λkσ 2 and non–centrality parameters δ2k = µ∗2k /σ 2, k = 1, . . . , q, µ∗ = T ′µ. Some further remarks: In (29) also ‖Bk‖ > 1 is allowed since every r = ‖Y ‖ > max ‖Bk‖ is admissible, which entails max ‖BkY −1‖ < 1. With ϑ = λmax/λmin it follows with special values of v: max ‖Bk‖ ≤ in (28) with v = 2(λmin + λmax) max ‖Bk‖ ≤ in (29) with v = (λ−1min + λ max), max ‖Bk‖ ≤ ϑ− 1√ in (30) with v = (λminλmax) −1/2. More generally, the scale factor v = w2 can be replaced by a scale matrix W 2 = diag(w21 , . . . , w p) > 0. Then with Tw = W −1TW−1, Σw = WΣW , ∆w = W∆W the L.t. (1) equals |I +ΣwTw|−α etr(−ΣwTw(I +ΣwTw)−1∆w). (32) Consequently, besides the substitutions vΣk → WΣkW , ∆k → W∆kW−1, the matrices I + Bk in theorem 2 must be replaced by WΣkW , (WΣkW ) −1 and 2(I + WΣkW ) respectively, and the generators Fα(vxj , yj) by Fα(w jxj , yj). In particular for a single Γp(α,Σ,∆)–distribution this more general scaling can be used to minimize ‖B‖ or for a ”natural scaling” i.e. to standardize I+B to a correlation matrix. However, ‖B‖ < 1 must be taken into account in (28), whereas this condition is satisfied in (30) for every scaling. It was shown in Royen (1991) that natural scaling can always be accomplished also in I +B = 2(I +WΣW )−1 by a unique W 2. 5. Representations of the Γp(α,Σ,∆) distribution function by (p− 1)–variate integrals For a single Γp(α,Σ,∆)–cdf it is always possible to perform the integration over a single variable ϕj within the integrals from theorem 2. 
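The effect of the scale factor v on ‖B‖ in the remarks following theorem 2 can be illustrated for a single symmetric Σ. Because each B in (27a) to (27c) is a function of Σ, its spectral norm is the largest absolute value of that function over the eigenvalues of Σ. The sketch below makes three assumptions that should be read as such: the eigenvalues are hypothetical, the special scale factors are taken to be v = 2/(λmin + λmax), v = (1/λmin + 1/λmax)/2 and v = (λmin λmax)^(−1/2), and the corresponding bounds are taken to be (ϑ − 1)/(ϑ + 1) for the first two cases and (√ϑ − 1)/(√ϑ + 1) for the third, with ϑ = λmax/λmin.

```python
import math


def norm_a(eigs, v):
    """Spectral norm of B = v*Sigma - I, computed from the eigenvalues of Sigma."""
    return max(abs(v * lam - 1.0) for lam in eigs)


def norm_b(eigs, v):
    """Spectral norm of B = (v*Sigma)^(-1) - I."""
    return max(abs(1.0 / (v * lam) - 1.0) for lam in eigs)


def norm_c(eigs, v):
    """Spectral norm of B = 2*(I + v*Sigma)^(-1) - I."""
    return max(abs(2.0 / (1.0 + v * lam) - 1.0) for lam in eigs)


eigs = [0.5, 1.0, 2.0]  # hypothetical eigenvalues of a positive definite Sigma
lam_min, lam_max = min(eigs), max(eigs)
theta = lam_max / lam_min

# Assumed special scale factors for the three cases.
v_a = 2.0 / (lam_min + lam_max)
v_b = 0.5 * (1.0 / lam_min + 1.0 / lam_max)
v_c = 1.0 / math.sqrt(lam_min * lam_max)

bound_ab = (theta - 1.0) / (theta + 1.0)
bound_c = (math.sqrt(theta) - 1.0) / (math.sqrt(theta) + 1.0)

print(norm_a(eigs, v_a), norm_b(eigs, v_b), bound_ab)
print(norm_c(eigs, v_c), bound_c)

# In case (c) the norm stays below 1 for every positive scale factor tried.
always_below_one = all(norm_c(eigs, v) < 1.0 for v in (1e-3, 0.1, 1.0, 10.0, 1e3))
print(always_below_one)
```

For these eigenvalues ϑ = 4, so the first two cases give a norm of 0.6 and the third gives 1/3, which illustrates why the third representation is the least sensitive to the choice of scaling.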
We use the following functions Gα(x, y) = e−y Gα+n(x) α+n(x) (−y)n x, y ∈ C, Gα+n, G(n)α+n from (22), and G∗α(x, y) = eyGα(x, y). For positive half integers α = 1/2 + k these functions can also be computed by the erf–function and a sum of k terms which are essentially given by the modified Bessel functions Ij−1/2(2(xy) 1/2), j = 1, . . . , k, (see e.g. Royen (1995) or (2006)). Now let be W 2 = diag(w21 , . . . , w p) a general scale matrix, Y = diag(y1, . . . , yp), yj = re iϕj , −π < ϕj ≤ π, Bpp bp b′p bpp WΣW − I, (34a) (WΣW )−1 − I, (34b) 2(I +WΣW )−1 − I, (34c) Dpp dp dp dpp W∆ΣW, (35a) W−1Σ−1∆W−1, (35b) 2(I +WΣW )−1W∆W−1(I − (I +WΣW )−1), (35c) y0 = y0(y1, . . . , yp−1) = b p(Ypp +Bpp) −1bp − bpp (36) q = q(y1, . . . , yp−1) = (b p(Ypp +Bpp) −1,−1)D (Ypp +Bpp) Kα = Kα(y1, . . . , yp−1) = etr(±(Ypp +Bpp)−1Dpp)|I +BppY −1pp |−α, where the negative sign is only taken for Bpp from (34a). Theorem 3. With the above notations the Γp(α,Σ,∆)–cdf F (x1, . . . , xp;α,Σ,∆) is given by each of the following three integrals: (2π)p−1 w2pxp 1− y0 1− y0 1− yj w2jxj , yj − 1 dϕj , (38) B from (34a), D from (35a), ‖B‖ < r < 1, etr(−W∆W−1) |WΣW |α · (2π)p−1 (1− y0)−α G∗α (1− y0)w2pxp, 1− y0 jxj , yj)dϕj , B from (34b), D from (35b), ‖B‖ < r, 2αpetr(− 1 W∆W−1(I −B)) |I +WΣW |α · (2π)p−1 1− y0 (1− y0)−αGα 1− y0 1 + y0 w2pxp, 1− y20 1 + yj w2jxj , yj + 1 dϕj , B from (34c), D from (35c), ‖B‖ < r < 1. For the proof of theorem 3 the following two lemmas are required. Lemma 2. With Y = diag(y1, . . . , yp), yj = re iϕj , ‖B‖ < r, B,D, y0, q from (34), (35), (36), (37) the following decomposition is obtained etr((Y +B)−1D)|Y +B|−α = etr (Ypp +Bpp) −1Dpp |Ypp +Bpp|−α exp yp − y0 (yp − y0)−α. Proof. From frequently used formulas for p× p–matrices, (see e.g. 
complements and problems 2.4, 2.7 in chapter 1b of Rao (1973)) it follows for A = Y +B = App bp b′p yp + bpp |A| = |App|(yp + bpp − b′pA−1pp bp) = |Ypp +Bpp|(yp − y0), A−1 = A−1pp + yp−y0 A−1pp bpb pp − 1yp−y0A pp bp yp−y0 yp−y0 trace(A−1D) = trace A−1pp Dpp + yp − y0 A−1pp bpb pp Dpp −A−1pp bpdp yp − y0 (dpp − b′pA−1pp dp) = trace(A−1pp Dpp) + yp − y0 , which implies (41). Lemma 3. Let be q any number, Sr = {y ∈ C ∣∣|y| = r}, y0 any number with |y0| < r, then with Fα from (25), Gα,G∗α from (33), and the negative sign in ±q only for (42a) y − y0 Fα(x, y)(y − y0)−αyα−1dy Fα from (25a), r < 1, (42a) (1− y0)−α G∗α (1 − y0)x, q1−y0 , Fα from (25b), (42b) (1− y0)−α Gα x, 2q , Fα from (25c), r < 1, (42c) Proof. It is sufficient to verify (42) for the corresponding derivatives fα = At first, (42a) is shown: With Fα from (25a) and the binomial series for (1− y0/y)−(α+n) we obtain fα(x, y)(y − y0)−(α+n)yα−1dy α+m(x)y α+ n+ k − 1 y−n−1dy. With z = (1 + t)−1, u = tz, the last integral has the L.t. (uy)m α+ n+ k − 1 y−n−1dy m=n+k α+ n+ k − 1 yk0 = z αun(1− uy0)−(α+n). Multiplication by (−q)n/n! and summation over n leads to the L.t. (1− uy0)α 1− uy0 (1 + (1− y0)t)α (1− y0)t 1 + (1− y0)t and this is the L.t. of ∂ To verify (42b) we obtain with Fα from (25b): fα(x, y)(y − y0)−(α+n)yα−1dy gα+m(x)y Γ(α+ n+ k) Γ(α+ n)k! y−n−1dy m=n+k gα+m(x) Γ(α + n+ k) Γ(α+ n)k! yk0 = gα+n(x)e = (1− y0)−(α+n)(1− y0)gα+n((1 − y0)x). Multiplication by qn/n! and summation provides (1− y0)−α ∂∂x G (1− y0)x, q1−y0 (42c) can be shown by L.t. in a similar way as (42a). Proof of theorem 3. Without loss of generality yp is selected from the variables yj = re iϕj in Y = diag(y1, . . . , yp) with any fixed r > ‖B‖. If yp is replaced by a variable y with any |y| then the equation |Y +B| = |Ypp +Bpp|(yp + bpp − b′p(Ypp +Bpp)−1bp) = 0 has always a unique solution y = y0 = b p(Ypp +Bpp) −1bp − bpp with |y0| < r since ‖Bpp‖ ≤ ‖B‖. 
Hence, with lemma 2 and lemma 3, theorem 3 is obtained by integration over ϕp in the integrals of theorem 2 with n = 1.

References

Abramowitz, M. and Stegun, I.A. (1968). Handbook of Mathematical Functions, Dover, New York.
Bapat, R.B. (1989). Infinite divisibility of multivariate gamma distributions and M–matrices, Sankhyā, Series A 51, 73–78.
Blacher, R. (2003). Multivariate quadratic forms of random vectors, Journal of Multivariate Analysis 87, 2–23.
Fang, B.Q. (2005). Noncentral quadratic forms of the skew elliptical variables, Journal of Multivariate Analysis 95, 410–430.
Griffiths, R.C. (1984). Characterization of infinitely divisible multivariate gamma distributions, Journal of Multivariate Analysis 15, 13–20.
Khatri, C.G., Krishnaiah, P.R. and Sen, P.K. (1977). A note on the joint distribution of correlated quadratic forms, Journal of Statistical Planning and Inference 1, 299–307.
Krishnamoorthy, A.S. and Parthasarathy, M. (1951). A multivariate gamma type distribution, Annals of Mathematical Statistics 22, 549–557 (correction: ibid. (1960), 31, p. 229).
Mathai, A.M. and Provost, S.B. (1992). Quadratic forms in random variables: Theory and applications, Marcel Dekker, New York.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd edition, Wiley, New York.
Royen, T. (1991). Expansions for the multivariate chi–square distribution, Journal of Multivariate Analysis 38, 213–232.
Royen, T. (1992). On representation and computation of multivariate gamma distributions, in: Data Analysis and Statistical Inference - Festschrift in Honour of Friedhelm Eicker, 201–216, Verlag Josef Eul, Bergisch Gladbach, Köln.
Royen, T. (1995). On some central and non–central multivariate chi–square distributions, Statistica Sinica 5, 373–397.
Royen, T. (1997). Multivariate gamma distributions (Update), Encyclopedia of Statistical Sciences, Update Volume 1, 419–425, Wiley, New York.
Royen, T. (2006).
Integral representations and approximations for multivariate gamma distributions, Annals of the Institute of Statistical Mathematics, DOI 10.1007/s10463-006-0057-5.

Introduction
The method
Three representations for the Γp(α,Σ,∆)–Laplace transform and the related generators
The integral representations
Representations of the Γp(α,Σ,∆) distribution function by (p-1)–variate integrals

ABSTRACT Three types of integral representations for the cumulative distribution functions of convolutions of non-central p-variate gamma distributions are given by integration of elementary complex functions over the p-cube Cp = (-pi,pi]x...x(-pi,pi]. In particular, the joint distribution of the diagonal elements of a generalized quadratic form XAX' with n independent normally distributed column vectors in X is obtained. For a single p-variate gamma distribution function (p-1)-variate integrals over Cp-1 are derived. The integrals are numerically more favourable than integrals obtained from the Fourier or Laplace inversion formula.
<|endoftext|><|startoftext|>
Introduction
The Channel Model
An Achievable Rate Region for the Discrete Memoryless IC-DMS
Random Codebook Generation
Encoding and Transmission
Decoding
Evaluation of Probability of Error
Relating with Existing Rate Regions
A Subregion of R
A Subregion of Rsim
The Gaussian IC-DMS
The Channel Model of the GIC-DMS
Achievable Rate Regions for the GIC-DMS
Gaussian Extension of R
Gaussian Extension of Rsuc
Numerical Examples
Comparing with Rate Regions in Tarokh06:icdmscog
Comparing with Rate Regions in jovicic06:cogICDMS,wuwei06icdms
Conclusions
References

ABSTRACT The interference channel with degraded message sets (IC-DMS) refers to a communication model in which two senders attempt to communicate with their respective receivers simultaneously through a common medium, and one of the senders has complete and a priori (non-causal) knowledge about the message being transmitted by the other.
A coding scheme that combines the advantages of cooperative coding, collaborative coding, and dirty paper coding is developed for such a channel. Using this coding scheme, achievable rate regions of the IC-DMS in both the discrete memoryless and Gaussian cases are derived, which, in general, include several previously known rate regions. Numerical examples for the Gaussian case demonstrate that in the high-interference-gain regime, the derived achievable rate regions offer considerable improvements over these existing results.
<|endoftext|><|startoftext|>
Introduction

The additive group of integers modulo n will be denoted by Zn. Let G be a finite Abelian group and let X ⊂ G. The subgroup generated by a subset X of G will be denoted 〈X〉. For a positive integer k, we shall write

$$k \wedge X = \Big\{ \sum_{x \in A} x \;:\; A \subset X \text{ and } |A| = k \Big\}.$$

Following the terminology of [12] we write

$$S_X = \bigcup_{k \ge 1} k \wedge X.$$

The set X is said to be complete if $S_X = \langle X \rangle$. The reader may find the connection between this notion and the corresponding notion for integers in [12]. We shall also write $S^0_X = S_X \cup \{0\}$. Note that $S^0_X = \sum_{x \in X} \{0, x\}$.

Université Pierre et Marie Curie, E. Combinatoire, Case 189, 4 Place Jussieu, 75005 Paris, France. yha@ccr.jussieu.fr
Universitat Politècnica de Catalunya, Dept. Matemàtica Apl. IV; Jordi Girona, 1, E-08034 Barcelona, Spain. allado@ma4.upc.edu
Universitat Politècnica de Catalunya, Dept. Matemàtica Apl. IV; Jordi Girona, 1, E-08034 Barcelona, Spain. oserra@ma4.upc.edu
http://arxiv.org/abs/0704.0541v1

Let p denote a prime number and let A ⊂ Zp \ {0}. Erdős and Heilbronn [4] showed that A is complete if $|A| \ge \sqrt{18p}$, and conjectured that $\sqrt{18}$ can be replaced by 2. This conjecture was proved by Olson [8]. More precisely, Olson's Theorem states that A is complete if $|A| \ge \sqrt{4p-4}$. This result was sharpened by Dias da Silva and one of the authors [1] by showing that $|k \wedge A| = p$ if $|A| \ge \sqrt{4p-4}$, where $k = \lceil \sqrt{p-1}\, \rceil$. They also showed that $|(j \wedge A) \cup ((j+1) \wedge A)| = p$ if $|A| \ge \sqrt{4p-8}$, where $j = \lceil \sqrt{p-2}\, \rceil$.
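The notions of subset sums and completeness introduced above can be explored by brute force for small moduli. The following is only an illustrative sketch: the function names (`subset_sums`, `generated_subgroup`, `is_complete`, `all_large_sets_complete`) are hypothetical and not taken from the paper, and the size condition tested is |A| > 2√p, the form in which the abstract states Olson's theorem for prime moduli.

```python
from itertools import combinations
from math import gcd


def subset_sums(X, n):
    """S_X in Z_n: all sums of nonempty sets of distinct elements of X."""
    sums_with_empty = {0}   # S^0 of the elements seen so far
    nonempty = set()        # sums whose index set is nonempty
    for x in X:
        shifted = {(s + x) % n for s in sums_with_empty}
        nonempty |= shifted
        sums_with_empty |= shifted
    return nonempty


def generated_subgroup(X, n):
    """The subgroup <X> of Z_n, i.e. the multiples of gcd(n, x1, x2, ...)."""
    g = n
    for x in X:
        g = gcd(g, x)
    return {(g * i) % n for i in range(n // g)}


def is_complete(X, n):
    """X is complete when S_X equals the subgroup generated by X."""
    return subset_sums(X, n) == generated_subgroup(X, n)


def all_large_sets_complete(p):
    """Check every A in Z_p minus {0} with |A|^2 > 4p, i.e. |A| > 2*sqrt(p)."""
    for size in range(1, p):
        if size * size <= 4 * p:
            continue
        for A in combinations(range(1, p), size):
            if not is_complete(A, p):
                return False
    return True


print(subset_sums([1, 2], 7))        # the sums 1, 2 and 1 + 2
print(is_complete([1, 2], 7))        # False: <{1, 2}> is all of Z_7
print(is_complete([2, 4], 6))        # True: the sums 2, 4, 0 fill <{2, 4}>
print(all_large_sets_complete(11))   # exhaustive check for p = 11
```

The exhaustive check is exponential in p and is meant only to make the definitions concrete; it says nothing about how the results are proved.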
Let G be a finite abelian group and let A ⊂ G \ {0}. Complete sets for general abelian groups were investigated by Diderrich and Mann [3]. Diderrich [2] proved that, if |G| = pq is the product of two primes, then A is complete if |A| ≥ p + q − 1. Let p be the smallest prime dividing |G|. Diderrich conjectured [2] that A is complete if |G|/p is composite and |A| = p + |G|/p − 2. This conjecture was finally proved by Gao and one of the authors [5]. More precise results were later proved by Gao and the present authors [6]. Note that the bound of Diderrich is best possible, since one may construct non-complete sets of size p + |G|/p − 3.

However, the result of Olson was extended recently by Vu [13] to general cyclic groups. Let A ⊂ Zn be such that all the elements of A are coprime with n. Vu proved that there is an absolute constant c such that, for arbitrarily large n, A is complete if $|A| \ge c\sqrt{n}$. The proof of Vu is rather short and depends on a recent result of Szemerédi and Vu [11]. In the same paper Vu conjectures that the constant is essentially 2.

Our main result is the following:

Theorem 1.1 Let A be a subset of Zn such that all the elements of A are coprime with n. If $|A| > 1 + 2\sqrt{n-4}$ then A is complete.

This result implies the validity of the last conjecture of Vu. We conjecture the following:

Conjecture 1.2 Let A ⊂ Zn be such that all the elements of A are coprime with n and $|A| \ge \sqrt{4n-4}$. Then $|k \wedge A| = n$, where $k = \lceil \sqrt{n-1}\, \rceil$.

2 Some tools

In this section we present known material and some easy applications of it. We give short proofs in order to make the paper self-contained. Recall the following well-known and easy lemma.

Lemma 2.1 Let G be a finite group. Let X and Y be subsets of G such that X + Y ≠ G. Then |X| + |Y| ≤ |G|.

Proof. Take a ∈ G \ (X + Y). We have (a − Y) ∩ X = ∅. ✷

We also use Chowla's Theorem [7, 10]:

Theorem 2.2 (Chowla [7, 10]) Let n be a positive integer and let X and Y be non-empty subsets of Zn.
Assume that 0 ∈ Y and that the elements of Y \ {0} are coprime with n. Then |X + Y | ≥ min(n, |X| + |Y | − 1). Proof. The proof is by induction on |Y |, the result being obvious for |Y | = 1. Assume first that Y ⊂ X − x, for all x ∈ X. Then X + Y ⊂ X, and hence X + Y = X. It follows that X + Y = X + nY = Zn. Assume now that Y 6⊂ X − x, for some x ∈ X. Then 0 ∈ Y ∩ (X − x) and |Y ∩ (X − x)| < |Y |. By the induction hypothesis, |X|+ |Y | − 1 ≤ |((X − x) ∪ Y ) + ((X − x) ∩ Y )| ≤ |(X − x) + Y |. Let B ⊂ G and x ∈ G. Following Olson, we write λB(x) = |(B + x) \B|. The following result is implicit in [8]: Lemma 2.3 (Olson, [8]) Let Y be a nonempty subset of G \ {0}, z /∈ Y and y ∈ Y . Put B = S0Y . Then |B| ≥ |S0Y \{y}|+ λB(y), (1) |S0Y ∪{z}| = |S Y |+ λB(z). (2) Proof. Clearly we have B∩(B−y) ⊂ B\S0 Y \{y} and hence λB(y) = |B∩(B−y)| ≤ |B|−|S0Y \{y}|. ¿From S0 Y ∪{z} = B + {0, z} we have |S0 Y ∪{z} | = |B|+ |(B + z) \B|} = |B|+ λB(z). ✷ We need the following helpful result also due to Olson: Lemma 2.4 (Olson [8]) Let B and C be nonempty subsets of an abelian group G such that 0 6∈ C. Then, λB(x) = λB(−x). (3) λB(x+ y) ≤ λB(x) + λB(y). (4) λB(x) ≥ |B|(|C| − |B|+ 1). (5) Proof. For each x ∈ G we have |(B + x) ∩B| = |B + x| − |(B + x) ∩B| = |B − x| − |B ∩ (B − x)| = |B ∩ (B − x)| = λB(−x), proving (3). Let x, y ∈ G. Then, λB(x+ y) = |(B + x+ y) ∩B| = |(B + x) ∩ (B − y)| = |(B + x) ∩B ∩ (B − y)|+ |(B + x) ∩B ∩ (B − y)| ≤ |(B + x) ∩B|+ |B ∩ (B − y)| = λB(x) + λB(y), proving (4). Finally, λB(x) ≥ (|B + x| − |B ∩ (B + x)|) ≥ |C||B| − |B ∩ (B + x)| ≥ |C||B| − x∈G\0 |B ∩ (B + x)| = |B|(|C| − |B|+ 1), proving (5). ✷ 3 The main result The next Lemma is the key tool for our main result. Lemma 3.1 Let A and B be nonempty subsets of Zn. Assume that A ∩ (−A) = ∅ and that each element in A is coprime with n. Put a = |A| and b = |B|. Assume also that a ≥ 3 and 2b ≤ n+ 2. Then λB(x) > a− a(a− 3) . (6) In particular, if 2b ≥ a(a− 3), then λB(x) ≥ a− 1. (7) Proof. Put A∗ = A ∪ (−A) ∪ {0}. 
Let t < n be a positive integer and set t = 2ma+ r, m ≥ 0, 0 ≤ r ≤ 2a− 1. Let Cj = jA ∗. By Chowla’s theorem, |Cj| ≥ min{n, 2ja + 1} = 2ja + 1, for j ≤ m. Therefore we can choose a set C ⊃ A∗ of cardinality t + 1 which intersects Cj in exactly 2ja elements j = 2, . . . ,m, and intersects Cm+1 in exactly r elements. Let E = C \{0}. Let α = max{λB(x) : x ∈ A}. By (3) we have λB(x) ≤ α, for all x ∈ A∗. For an element x in Cj there are elements x1, · · · , xj ∈ A∗ such that x = x1+ · · ·+xj. In view of (4) we have λB(x) ≤ λ(x1)+ · · ·+λ(xj) ≤ jα. Therefore, λB(x) ≤ α2a+ 2α2a + · · ·+mα2a+ r(m+ 1)α = α(m+ 1)(ma+ r) = α(t− r + 2a)(t+ r) ≤ α(t+ a) By using (5) we have x∈E λB(x) (t+ a)2 ≥ 4ab(t− b+ 1) (t+ a)2 In particular, since 2b ≤ n+ 2, we can set t = 2b− 3 to get, α ≥ 4ab(b− 2) (2b+ a− 3)2 ≥ a(b− 2) (1− a− 3 > a− a(a− 3) where we have used a ≥ 3. In particular, if 2b ≥ a(a − 3), then α > a − 2 so that α ≥ a − 1. This completes the proof. ✷ Lemma 3.1 gives the following estimation for the cardinality of the set of subset sums. Lemma 3.2 Let A ⊂ Zn such that A ∩ (−A) = ∅ and every element of A is coprime with n. Also assume |A| ≥ 2. Then |S0A| ≥ min{ n + 2 , 3 + |A|(|A| − 1) Proof. We shall prove the result by induction on a = |A|, the result being obvious for a = 2. Suppose a > 2. Put B = S0A. We may assume b = |B| ≤ n2 + 1 so that 2b ≤ n + 2. By the induction hypothesis, 2b ≥ 6 + (a− 1)(a− 2) > a(a− 3). By (7) there is an x ∈ A with λB(x) ≥ a− 1. Then, by Lemma 2.3, |B| ≥ |S0A\{x}|+ λB(x) ≥ 3 + (a− 2)(a − 1)/2 + a− 1 = 3 + a(a− 1) as claimed. ✷ We are now ready for the proof of Theorem 1.1. Proof of Theorem 1.1. Suppose A non complete and put |A| = k. Let X,Y be disjoint subsets of A. We clearly have SX + S Y ⊂ SA 6= Zn. Since |SX | ≥ |S0X | − 1, we have |S0X |+ |S0Y | ≤ n+ 1, (8) by Lemma 2.1. Partition A = A1 ∪ A2 into two almost equal parts, i.e. |A1| = ⌈k/2⌉ and |A2| = ⌊k/2⌋, such that Ai ∩ (−Ai) = ∅, i = 1, 2. 
We must have 3 + ⌊k ⌋ − 1)/2 < (n+ 2)/2, (9) since otherwise, by Lemma 3.2, we have |S0A1 |+ |S | ≥ n+ 2, contradicting (8). Case 1. k even. Then we have by (9) n/2 > 2 + − 1)/2 = 2 + k(k − 2)/8, and hence (k − 1)2 + 16 ≤ 4n, a contradiction. Case 2. k odd. Put a = k−1 . In view of (9), Lemma 3.2 implies |S0A2 | ≥ 3 + a(a− 1)/2. By (7) with B = S0A2 , there is a y ∈ A1 such that λB(y) ≥ a− 1. Put C1 = A1 \ {y} and C2 = A2 ∪ {y}. Then we have, by Lemma 2.3, |S0C2 | ≥ |S |+ λB(y) ≥ 3 + a(a− 1)/2 + a− 1 = 2 + a(a+ 1) On the other hand, from (9) and Lemma 3.2 we get |S0C1 | ≥ 3 + a(a− 1) By (8), n+ 1 ≥ |S0C1 |+ |S | ≥ 3 + a(a− 1)/2 + 2 + a(a+ 1)/2 = 5 + a2. Therefore 4n ≥ 16 + 4a2 = 16 + (k − 1)2, a contradiction. This completes the proof. ✷ References [1] J.A. Dias da Silva and Y. O. Hamidoune, Cyclic spaces for Grassmann derivatives and additive theory, Bull. London Math. Soc., 26 (1994), 140-146. [2] G.T. Diderrich, An addition theorem for abelian groups of order pq, J. Number Theory 7 (1975), 33-48. [3] G. T. Diderrich and H. B. Mann, Combinatorial problems in finite abelian groups, In: ”A survey of Combinatorial Theory” (J.L. Srivasta et al. Eds.), pp. 95- 100, North- Holland, Amsterdam (1973). [4] P. Erdős and H. Heilbronn, On the Addition of residue classes mod p, Acta Arith. 9 (1964), 149-159. [5] W. Gao and Y.O. Hamidoune, On additive bases, Acta Arith. 88 (1999), 3, 233-237. [6] W. Gao, Y.O. Hamidoune A. S. Lladó and O. Serra, Covering a finite abelian group by subset sums. Combinatorica 23 (2003), no. 4, 599–611. [7] H.B. Mann, Addition Theorems, R.E. Krieger, New York, 1976. [8] J. E. Olson, An addition theorem mod p, J. Comb. Theory 5 (1968), 45-52. [9] S. Chowla, A theorem on the addition of residue classes: applications to the number Γ(k) in Waring’s problem, Proc.Indian Acad. Sc. 2 (1935) 242–243. [10] M. B. Nathanson, Additive Number Theory. Inverse problems and the geometry of sumsets, Grad. Texts in Math. 165, Springer, 1996. [11] E. 
Szemerédi and V.H. Vu, Long arithmetic progressions in finite and infinite sets, Annals of Math., to appear.
[12] T. Tao and V.H. Vu, Additive Combinatorics, Cambridge Studies in Advanced Mathematics 105 (2006), Cambridge University Press.
[13] V.H. Vu, Olson Theorem for cyclic groups, Preprint, arXiv:math.NT/0506483 v1, 23 June 2005. http://arxiv.org/abs/math/0506483

Introduction
Some tools
The main result

ABSTRACT A subset $X$ of an abelian group $G$ is said to be {\em complete} if every element of the subgroup generated by $X$ can be expressed as a nonempty sum of distinct elements from $X$. Let $A\subset \Z_n$ be such that all the elements of $A$ are coprime with $n$. Solving a conjecture of Erd\H{o}s and Heilbronn, Olson proved that $A$ is complete if $n$ is a prime and if $|A|>2\sqrt{n}.$ Recently Vu proved that there is an absolute constant $c$, such that for arbitrarily large $n$, $A$ is complete if $|A|\ge c\sqrt{n},$ and conjectured that 2 is essentially the right value of $c$. We show that $A$ is complete if $|A|> 1+2\sqrt{n-4}$, thus proving the last conjecture.
<|endoftext|><|startoftext|>
Introduction
Organization of the paper
Important note added
I The theorem
1 The set up
1.1 The statement of the problem
1.2 Some convenient choices
1.3 Reduction to the case n even
2 The theorem
2.1 Basic notation
2.2 Two fundamental definitions
2.2.1 Definition of v-chain
2.2.2 Definition of O-domination
2.3 The main theorem and its corollary
II From geometry to combinatorics
3 Reduction to combinatorics
3.1 Homogeneous co-ordinate ring of the Schubert variety X(w)
3.1.1 The line bundle L on Md(V)
3.1.2 The section qθ of L
3.1.3 Standard monomial theory for Md(V)
3.2 Co-ordinate rings of affine patches and tangent cones of X(w)
3.2.1 Standard monomial theory for affine patches
3.2.2 Standard monomial theory for tangent cones
4 Further reductions
4.1 The main propositions
4.2 From the main propositions to the main theorem
III The proof
5 Terminology and notation
5.1 Distinguished subsets
5.1.1 Distinguished subsets of N
5.1.2 Attaching elements of I(d, 2d) to distinguished subsets of N
5.2 The involution #
5.2.1 The involution # on I(d, 2d)
5.2.2 The involution # on N and R
5.3 The subset SC attached to a v-chain C
5.3.1 Vertical and horizontal projections of an element of ON
5.3.2 The “connection” relation on elements of a v-chain
5.3.3 The definition of SC
5.3.4 The type of an element α of a v-chain C, and the set SC,α
6 O-depth
6.1 Definition of O-depth
6.2 O-depth and depth
6.3 O-depth and type
7 The map Oπ
7.1 Description of Oπ
7.2 Illustration by an example
7.3 A proposition about Sj,j+1
7.4 Proof of Proposition 7.1.1
7.5 More observations
8 The map Oφ
8.1 Description of Oφ
8.2 Basic facts about Tw,j,j+1 and T w,j,j+1
9 Some Lemmas
9.1 Lemmas from the Grassmannian case
9.2 Orthogonal analogues of Lemmas of 9.1
9.3 Orthogonal analogues of some lemmas in [7]
9.4 More lemmas
10 The Proof
10.1 Proof of Proposition 4.1.1
10.2 Proof that OφOπ = identity
10.3 Proof that OπOφ = identity
10.4 Proof of Proposition 4.1.3
IV An Application
11 Multiplicity counts certain paths
11.1 Description and illustration
11.2 Justification for the interpretation
References
Index of definitions and notation

Introduction

In this paper the following problem is solved: given a Schubert variety in an orthogonal Grassmannian (by which is meant the variety of isotropic subspaces of maximum possible dimension of a finite dimensional vector space with a symmetric non-degenerate form; see §1 for precise definitions) and an arbitrary point on the Schubert variety, how to compute the multiplicity, or more generally the Hilbert function, of the local ring of germs of functions at that point. In a sense, our solution is but a translation of the problem: we do not give closed form formulas but alternative combinatorial descriptions. The meaning of “alternative” will presently become clear.

The same problem for the Grassmannian was treated in [11, 8, 7, 9, 12] and for the symplectic Grassmannian in [4]. The present paper is a sequel to [11, 7, 9, 12, 4] and toes the same line as them. In particular, its strategy is borrowed from them and runs as follows: first translate the problem from geometry to combinatorics, or, more precisely, apply standard monomial theory to obtain an initial combinatorial description of the Hilbert function (the earliest version of the theory capable of handling Schubert varieties in an orthogonal Grassmannian is to be found in [17]); then transform the initial combinatorial description to obtain the desired alternative description. But that is easier said than done.

While the problem makes sense for Schubert varieties of any kind and standard monomial theory itself is available in great generality [13, 15], the translation of the problem from geometry to combinatorics has been made, in [14], only for “minuscule¹ generalized Grassmannians.” Orthogonal Grassmannians being minuscule, this translation is available to us and we have an initial combinatorial description of the Hilbert function. As to the passage from the initial to the alternative description (and this is where the content of the present paper lies), neither the end nor the means is clear at the outset.

The first problem then is to find a good alternative description. But how to measure the worth of an alternative description? The interpretation of multiplicity as the number of certain non-intersecting lattice paths (deduced in §11 from our alternative description) seems to testify to the correctness of our alternative description, but we are not sure if there are others that are equally or more correct.

The proof of the equivalence of the initial and alternative combinatorial descriptions is, unfortunately, a little technically involved.
It builds on the details of the proofs of the corresponding equivalences in the cases of the Grassmannian and the symplectic Grassmannian. In [10] it is shown that the equivalence in the case of the Grassmannian is a kind of KRS correspondence, called “bounded KRS.” The proof there is short and elegant and it would be nice to realise the main result of the present paper too in a similar spirit as a kind of KRS correspondence.

¹ Symplectic Grassmannians are not minuscule but can be treated as if they were.

The initial description is in terms of “standard monomials” and the alternative description in terms of “monomials in roots.” The equivalence of the two descriptions thus gives a bijective correspondence between standard monomials and monomials in roots. Roughly, but not actually, the correspondence maps each standard monomial to its initial term (with respect to a certain monomial order). Thus it is natural to wonder whether we can compute the initial ideal of the ideal of the tangent cone to the Schubert variety at the given point. We believe that this can be done but that it is far more involved and difficult than the corresponding computation for Grassmannians and symplectic Grassmannians (the natural set of generators of the ideal of the tangent cone do not form a Gröbner basis unlike in those cases). If all goes well, the computation will soon appear [16].

Taking the Schubert variety to be of a special kind and the point to be the “identity coset,” our problem specializes to a problem about Pfaffian ideals considered in [5, 2]. On the other side of the spectrum from the identity coset, so to speak, lie the “generic singularities,” points that are generic in the complement of the open orbit of the stabiliser of the Schubert variety. For these, a geometric solution to the problem appears in [1].

Given that our solution of the problem is but a translation, it makes sense to ask if one can extract more tangible information (closed form formulas, for example) from our alternative description. See the papers quoted in the previous paragraph and also [3] for some answers in the special cases they consider.

Organization of the paper

The table of contents indicates how the paper is organized. There is a brief description at the beginning of every subdivision of the contents therein. An index of definitions and notation is included, for it would otherwise be difficult to find the meanings of certain words and symbols.

Important note added

The recent article [6] treats some of the questions addressed here and some that could be addressed by using the main result proved here. It includes:
• an interpretation of the multiplicity similar to ours.
• a closed formula for the multiplicity (as a specialization of a factorial Schur function), thereby answering the question we raised above.
• a formula for the restriction to the torus fixed point of the equivariant cohomology class of a Schubert variety.

The approach in [6] is quite different from ours. In fact, it is the opposite of ours in that it circumvents the lack of results about initial ideals of tangent cones, while our prime motivation is to remedy the lack. The starting points in the two approaches are also different: [6] takes off from certain results of Kostant-Kumar and Arabia on equivariant cohomology, while our launchpad is standard monomial theory.

The appearance of [6] notwithstanding, our approach is worthwhile, for, quite apart from the difference in starting points, there is no way, as far as we can tell, to the Hilbert function via the approach of [6], nor to the initial ideal, both of which are interesting in their own right.

Part I The theorem

Definitions are recalled, the problem formulated, and the theorem stated.

1 The set up

In this section, we state the problem to be addressed after recalling the necessary basic definitions, make some choices that are convenient for studying the problem, and see why it is enough to focus on a particular case of the problem.

1.1 The statement of the problem

Fix an algebraically closed field of characteristic not equal to 2. Fix a vector space V of finite dimension n over this field and a non-degenerate symmetric bilinear form 〈 , 〉 on V. Let d be the integer such that either n = 2d or n = 2d + 1. A linear subspace of V is said to be isotropic if the form 〈 , 〉 vanishes identically on it. It is elementary to see that an isotropic subspace of V has dimension at most d and that every isotropic subspace is contained in one of dimension d. Denote by Md(V)′ the closed sub-variety of the Grassmannian of d-dimensional subspaces consisting of the points corresponding to isotropic subspaces.

The orthogonal group O(V) of linear automorphisms of V preserving 〈 , 〉 acts transitively on Md(V)′, for by Witt's theorem an isometry between subspaces can be lifted to one of the whole vector space. If n is odd the special orthogonal group SO(V) (consisting of form preserving linear automorphisms with trivial determinant) itself acts transitively on Md(V)′. If n is even the special orthogonal group SO(V) does not act transitively on Md(V)′, and Md(V)′ has two connected components. We define the orthogonal Grassmannian Md(V) to be Md(V)′ if n is odd and to be one of the two components of Md(V)′ if n is even.

The Schubert varieties of Md(V) are defined to be the B-orbit closures in Md(V) (with canonical reduced scheme structure), where B is a Borel subgroup of SO(V). The choice of B is immaterial, for any two of them are conjugate. The question that is tackled in this paper is this: given a point on a Schubert variety in Md(V), how to compute the multiplicity (and more generally, the Hilbert function) of the Schubert variety at the given point? The answers are contained in Theorem 2.3.1 and Corollary 2.3.2. But in order to make sense of those statements, we need some preparation.

1.2 Some convenient choices

We now make some choices that are convenient for the study of Schubert varieties. For k an integer such that 1 ≤ k ≤ n, set k∗ := n + 1 − k. Fix a basis e1, . . . , en of V such that

$$\langle e_i, e_k \rangle = \begin{cases} 1 & \text{if } i = k^* \\ 0 & \text{otherwise.} \end{cases}$$

The advantage of this choice is: the elements of SO(V) for which each ek is an eigenvector form a maximal torus, and the elements that are upper triangular with respect to this basis form a Borel subgroup (a linear transformation is upper triangular if for each k, 1 ≤ k ≤ n, the image of ek under the transformation is a linear combination of e1, . . . , ek). We denote this maximal torus and this Borel subgroup by T and B respectively. Our Schubert varieties will be orbit closures of this particular Borel subgroup B.

The B-orbits of Md(V)′ are naturally indexed by its T-fixed points: each orbit contains one and only one such point. The T-fixed points are evidently of the form 〈ei1, . . . , eid〉, where 1 ≤ i1 < . . . < id ≤ n and for each k, 1 ≤ k ≤ d, there does not exist j, 1 ≤ j ≤ d, such that ik∗ = ij; in other words, for each ℓ, 1 ≤ ℓ ≤ n, such that ℓ ≠ ℓ∗, exactly one of ℓ and ℓ∗ appears in {i1, . . . , id}; in addition, if n is odd, then d + 1 does not appear in {i1, . . . , id}. Denote the set of such d-element subsets {i1 < . . . < id} by I′n. We thus have a bijective correspondence between I′n and the B-orbits of Md(V)′. Each B-orbit being irreducible and open in its closure, it follows that B-orbit closures are indexed by the B-orbits.
Thus I′n is an indexing set for B-orbit closures in Md(V)′.

Suppose that n is even (it will be shown presently that it is enough to consider this case). As already observed, Md(V)′ has two connected components on each of which SO(V) acts transitively. The B-orbits belong to one or the other component according to the parity of the number of entries bigger than d in the corresponding element of I′n. We take Md(V) to be the component in which this number is even. We let In denote the subset of I′n consisting of elements for which this number is even. Schubert varieties in Md(V) are thus indexed by elements of In.

1.3 Reduction to the case n even

We now argue that it is enough to consider the case n even. Suppose that n is odd. Let ñ := n + 1 and Ṽ be a vector space of dimension ñ with a non-degenerate symmetric form. Let ẽ1, . . . , ẽñ be a basis of Ṽ as in §1.2. Put e := ẽd+1 and f := ẽd+2. Take λ to be an element of the field such that λ² = 1/2. We can take V to be the subspace of Ṽ spanned by the vectors ẽ1, . . . , ẽd, λe + λf, ẽd+3, . . . , ẽñ, and a basis of V to be these vectors in that order.

There is a natural map from Md+1(Ṽ)′ to Md(V): intersecting with V an isotropic subspace of Ṽ of dimension d + 1 gives an isotropic subspace of V of dimension d. This map is onto, for every isotropic subspace of Ṽ (and hence of V) is contained in an isotropic subspace of Ṽ of dimension d + 1. It is also elementary to see that the map is two-to-one (essentially because in a two-dimensional space with a non-degenerate symmetric form there are two isotropic lines), and that the two points in any fiber lie one in each component (there is clearly an element in O(Ṽ) \ SO(Ṽ) that moves one element of the fiber to the other, and so if there was an element of SO(Ṽ) that also moved one point to the other, the isotropy at the point would not be contained in SO(Ṽ), a contradiction).
We therefore get a natural isomorphism between Md+1(Ṽ) and Md(V).

We will now show that the B̃-orbits in Md+1(Ṽ) correspond under the isomorphism to B-orbits of Md(V) (we denote by T̃ and B̃ the maximal torus and Borel subgroups of SO(Ṽ) as in §1.2). It will then follow that Schubert varieties in Md+1(Ṽ) are isomorphic to those in Md(V) and the purpose of this subsection will be achieved.

The group SO(V) can be realized as the subgroup of SO(Ṽ) consisting of the elements that fix e − f. The isomorphism Md+1(Ṽ) ≅ Md(V) above is equivariant for SO(V), and we have T̃ ∩ SO(V) = T and B̃ ∩ SO(V) = B. It should now be clear that the preimages in Md+1(Ṽ) of two elements in the same B-orbit of Md(V) are in the same B̃-orbit: an element of B that moves one to the other considered as an element of B̃ moves also the preimage of the one to that of the other. On the other hand, the preimages of distinct T-fixed points are distinct T̃-fixed points, the corresponding map from I′n to Iñ being given as follows:

$$i = \{i_1 < \dots < i_d\} \;\mapsto\; \begin{cases} \{\tilde{i}_1, \dots, \tilde{i}_d, d+1\} & \text{if } i \in I_n \\ \{\tilde{i}_1, \dots, \tilde{i}_d, d+2\} & \text{if } i \in I'_n \setminus I_n \end{cases} \qquad \text{where} \qquad \tilde{i}_k = \begin{cases} i_k & \text{if } i_k \le d \\ i_k + 1 & \text{if } i_k \ge d+2 \end{cases}$$

(Note that d + 1 never occurs as an entry in any element of I′n and that the elements ĩ1, . . . , ĩd, d + 1 (respectively ĩ1, . . . , ĩd, d + 2) are not in increasing order except in the trivial case i = {1 < . . . < d}.) Given that each B-orbit has a T-fixed point and that distinct T̃-fixed points belong to distinct B̃-orbits, this implies that the preimages of two elements in distinct B-orbits belong to distinct B̃-orbits, and the proof is over. □

2 The theorem

The purpose of this section is to state the main theorem and its corollary. We first set down some basic notation and two fundamental definitions needed in order to state the theorem.

2.1 Basic notation

We keep the terminology and notations of §1.1, 1.2. As observed in §1.3, it is enough to consider the case n even. So from now on let n = 2d.
Recall that, for an integer k, 1 ≤ k ≤ 2d, k∗ := 2d + 1 − k. As observed in §1.2, Schubert varieties in Md(V ) are indexed by In. Since d now determines n, we will henceforth write I(d) instead of In. In other words, I(d) is the set of d-element subsets of {1, . . . , 2d} such that

• for each k, 1 ≤ k ≤ 2d, the subset contains exactly one of k, k∗, and
• the number of elements in the subset that exceed d is even.

We write I(d, 2d) for the set of all d-element subsets of {1, . . . , 2d}. There is a natural partial order on I(d, 2d) and so also on I(d): v = (v1 < . . . < vd) ≤ w = (w1 < . . . < wd) if and only if v1 ≤ w1, . . . , vd ≤ wd.

Given v ∈ I(d), the corresponding T-fixed point in Md(V ) (namely, the span of ev1, . . . , evd) is denoted ev. Given w ∈ I(d), the corresponding Schubert variety in Md(V ) (which, by definition, is the closure of the B-orbit of the T-fixed point ew with canonical reduced scheme structure) is denoted X(w). The point ev belongs to X(w) if and only if v ≤ w in the partial order just defined. Since, under the natural action of B on X(w), each point of X(w) is in the B-orbit of a T-fixed point ev for some v such that v ≤ w, it is enough to focus attention on such T-fixed points.

For the rest of this section an element v of I(d) will remain fixed. We will be dealing extensively with ordered pairs (r, c), 1 ≤ r, c ≤ 2d, such that r is not an entry of v and c is an entry of v. Let R denote the set of all such ordered pairs, and set

\[
\begin{aligned}
N &:= \{(r,c) \in R \mid r > c\} \\
\mathrm{OR} &:= \{(r,c) \in R \mid r < c^*\} \\
\mathrm{ON} &:= \{(r,c) \in R \mid r > c,\ r < c^*\} = \mathrm{OR} \cap N \\
d &:= \{(r,c) \in R \mid r = c^*\}
\end{aligned}
\]

[Figure: a drawing of R, showing the diagonal, the boundary of N, and a point (r, c) together with the diagonal points (c∗, c) and (r, r∗).]

The picture shows a drawing of R. We think of r and c in (r, c) as row index and column index respectively. The columns are indexed from left to right by the entries of v in ascending order, the rows from top to bottom by the entries of {1, . . . , 2d} \ v in ascending order.
The points of d are those on the diagonal, the points of OR are those that are (strictly) above the diagonal, and the points of N are those that are to the South-West of the polyline captioned “boundary of N” (we draw the boundary so that points on the boundary belong to N). The reader can readily verify that d = 13 and v = (1, 2, 3, 4, 6, 7, 10, 11, 13, 15, 18, 19, 22) for the particular picture drawn. The points of ON indicated by solid circles form a v-chain (see §2.2.1 below).

We will be considering monomials, also called multisets, in some of these sets. A monomial, as usual, is a subset with each member being allowed a multiplicity (taking values in the non-negative integers). The degree of a monomial has also the usual sense: it is the sum of the multiplicities in the monomial over all elements of the set. The intersection of a monomial in a set with a subset of the set has also the natural meaning: it is a monomial in the subset, the multiplicities being those in the original monomial. We will refer to d as the diagonal.

2.2 Two fundamental definitions

2.2.1 Definition of v-chain

Given two elements (R,C) and (r, c) in ON, we write (R,C) > (r, c) if R > r and C < c (note that these are strict inequalities). An ordered sequence α, β, . . . of elements of ON is called a v-chain if α > β > . . . . A v-chain α1 > . . . > αℓ has head α1, tail αℓ, and length ℓ.

2.2.2 Definition of O-domination

To a v-chain C : α1 > α2 > . . . in ON there corresponds, as described in §5.3.3, a subset SC of N which, as observed in Corollary 5.3.5, is “distinguished” in the sense of §5.1.1. To a distinguished subset of N there corresponds, as described below in §5.1.2, an element of I(d, 2d). Following these correspondences through, we get an element of I(d, 2d) attached to the v-chain C. Let w(C) denote this element (sometimes we write wC). (All this makes sense even when C is empty: w(C) will turn out to be v itself in that case.)
Furthermore, as will be obvious from its definition, the monomial SC is “symmetric” in the sense of §5.2.2 and contains evenly many elements of the diagonal d. Thus, by Proposition 5.2.1, the element w(C) of I(d, 2d) belongs to I(d).

An element w of I(d) is said to O-dominate C if w ≥ w(C), or, equivalently (and this is important for the proofs), if w dominates in the sense of [7] the monomial SC (for the proof of the equivalence, see [7, Lemma 5.5]). An element w of I(d) O-dominates a monomial S of ON (respectively of OR) if it O-dominates every v-chain in S (respectively in S ∩ ON).

2.3 The main theorem and its corollary

Theorem 2.3.1 Fix a positive integer d and elements v ≤ w of I(d). Let V be a vector space of dimension 2d with a symmetric non-degenerate bilinear form (over a field of characteristic not 2). Let X(w) be the Schubert variety corresponding to w in the orthogonal Grassmannian Md(V ), and ev the torus fixed point of X(w) corresponding to v. Let Rwv denote the associated graded ring with respect to the unique maximal ideal of the local ring of germs at ev of functions on X(w). Then, for any non-negative integer m, the dimension as a vector space of the homogeneous piece of Rwv of degree m equals the cardinality of the set Sw(v)(m) of monomials of degree m of OR that are O-dominated by w.

The proof of this theorem occupies us for most of this paper. It is reduced in §3, by an application of standard monomial theory, to combinatorics. The resulting combinatorial problem is solved in §4–10. For now, let us note the following immediate consequence:

Corollary 2.3.2 The multiplicity at the point ev of the Schubert variety X(w) equals the number of monomials in ON of maximal cardinality that are square-free and O-dominated by w.

Proof: The proof of Corollary 2.2 of [7] holds verbatim here too. □

Part II From geometry to combinatorics

The problem is translated from geometry to combinatorics. The main combinatorial results are formulated.
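The index set I(d), the partial order, and the regions R, N, OR, ON and the diagonal of §2.1 are all finite and explicitly computable. The following Python sketch simply transcribes those definitions. The function names (`star`, `in_index_set`, `index_set`, `leq`, `regions`, `is_v_chain`) are our own illustrative choices and do not come from the paper, and elements of I(d, 2d) are assumed to be represented as increasing tuples.

```python
from itertools import combinations


def star(k, d):
    """The involution k -> k* = 2d + 1 - k on {1, ..., 2d}."""
    return 2 * d + 1 - k


def in_index_set(w, d):
    """Membership test for I(d): exactly one of k, k* for each k, and
    evenly many entries exceeding d."""
    chosen = set(w)
    one_of_each_pair = all((k in chosen) != (star(k, d) in chosen)
                           for k in range(1, d + 1))
    return len(chosen) == d and one_of_each_pair and sum(k > d for k in chosen) % 2 == 0


def index_set(d):
    """All elements of I(d), as increasing tuples (brute force, small d only)."""
    return [w for w in combinations(range(1, 2 * d + 1), d) if in_index_set(w, d)]


def leq(v, w):
    """Partial order on I(d, 2d): entrywise comparison of increasing tuples."""
    return all(a <= b for a, b in zip(v, w))


def regions(v, d):
    """The sets R, N, OR, ON and the diagonal attached to a fixed v in I(d)."""
    cols = set(v)
    R = {(r, c) for r in range(1, 2 * d + 1) if r not in cols for c in v}
    N = {(r, c) for (r, c) in R if r > c}
    OR = {(r, c) for (r, c) in R if r < star(c, d)}
    ON = OR & N
    diagonal = {(r, c) for (r, c) in R if r == star(c, d)}
    return R, N, OR, ON, diagonal


def is_v_chain(seq):
    """alpha_1 > alpha_2 > ..., where (R, C) > (r, c) means R > r and C < c."""
    return all(a[0] > b[0] and a[1] < b[1] for a, b in zip(seq, seq[1:]))
```

For instance, `in_index_set` can be applied to the element v = (1, 2, 3, 4, 6, 7, 10, 11, 13, 15, 18, 19, 22) with d = 13 quoted in §2.1, and `regions(v, 13)` then lists the points of the corresponding picture.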
3 Reduction to combinatorics

In this section we translate the problem from geometry to combinatorics. In §3.1 we recall from [17] the theorem that enables the translation. The translation itself is done in §3.2 and follows [14].

3.1 Homogeneous co-ordinate ring of the Schubert variety X(w)

3.1.1 The line bundle L on Md(V )

Let Md(V ) ⊆ Gd(V ) ↪ P(∧^d V ) be the Plücker embedding (where Gd(V ) denotes the Grassmannian of all d-dimensional subspaces of V ). The pull-back to Md(V ) of the line bundle O(1) on P(∧^d V ) is the square of the ample generator of the Picard group of Md(V ). Letting L denote the ample generator, we observe that it is very ample and want to describe the homogeneous coordinate rings of Md(V ) and its Schubert subvarieties in the embedding defined by L.

3.1.2 The section qθ of L

For θ in I(d, 2d), let pθ denote the corresponding Plücker coordinate. Consider the affine patch A of P(∧^d V ) given by pε = 1, where ε := (1, . . . , d). The intersection A ∩ Gd(V ) of this patch with the Grassmannian is an affine space. Indeed the d-plane corresponding to an arbitrary point z of A ∩ Gd(V ) has a basis consisting of column vectors of a matrix of the form

\[
M = \begin{pmatrix} I \\ A \end{pmatrix}
\]

where I is the identity matrix and A an arbitrary matrix, both of size d × d. The association z ↦ A is bijective. The restriction of a Plücker coordinate pθ to A ∩ Gd(V ) is given by the determinant of a submatrix of size d × d of M, the entries of θ determining the rows to be chosen from M to form the submatrix.

As can be readily verified, a point z of A ∩ Gd(V ) represents an isotropic subspace if and only if the corresponding matrix A = (aij) is skew-symmetric with respect to the anti-diagonal: aij + aj∗i∗ = 0, where the columns and rows of A are numbered 1, . . . , d and d + 1, . . . , 2d respectively.
For example, if d = 4, then a matrix that is skew-symmetric with respect to the anti-diagonal looks like this:

\[
\begin{pmatrix}
-d & -c & -b & 0 \\
-g & -f & 0 & b \\
-i & 0 & f & c \\
0 & i & g & d
\end{pmatrix}
\]

Since the set of these matrices is connected and contains the point that is spanned by e1, . . . , ed, it follows that A ∩ Gd(V ) does not intersect the other component of Md(V )′. In other words, pε vanishes everywhere on Md(V )′ \ Md(V ).

Now suppose that θ belongs to I(d). Computing pθ/pε as a function on the affine patch pε ≠ 0, we see that it is the determinant of a skew-symmetric matrix of even size, and therefore a square. The square root, which is determined up to sign, is called the Pfaffian. This suggests that pθ itself is a square: more precisely, that there exists a section qθ of the line bundle L on Md(V ) such that qθ² = pθ. A weight calculation confirms this to be the case. The qθ are also called Pfaffians.

3.1.3 Standard monomial theory for Md(V )

A standard monomial in I(d) is a totally ordered sequence θ1 ≥ . . . ≥ θt (with repetitions allowed) of elements of I(d). Such a standard monomial is said to be w-dominated for w ∈ I(d) if w ≥ θ1. To a standard monomial θ1 ≥ . . . ≥ θt in I(d) we associate the product qθ1 · · · qθt, where the qθ are the sections defined above of the line bundle L. Such a product is also called a standard monomial and it is said to be dominated by w for w ∈ I(d) if the underlying monomial in I(d) is dominated by w. Standard monomial theory for Md(V ) says:

Theorem 3.1.1 (Seshadri [17]) Standard monomials qθ1 · · · qθr of degree r form a basis for the space of forms of degree r in the homogeneous coordinate ring of Md(V ) in the embedding defined by the ample generator L of the Picard group. More generally, for w ∈ I(d), the w-dominated standard monomials of degree r form a basis for the space of forms of degree r in the homogeneous coordinate ring of the Schubert subvariety X(w) of Md(V ).
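The fact used in §3.1.2, that the determinant of a skew-symmetric matrix of even size is the square of its Pfaffian, can be checked numerically on the d = 4 example. The sketch below is only an illustration of that classical identity under our own conventions: the helper names and the numerical values substituted for b, c, d, f, g, i are hypothetical, and the Pfaffian is computed by the standard expansion along the first row. Reversing the order of the rows of a matrix that is skew-symmetric about the anti-diagonal gives an ordinary skew-symmetric matrix, which is what makes the Pfaffian applicable here.

```python
def minor(m, rows, cols):
    """Matrix m with the given row and column indices deleted."""
    return [[x for j, x in enumerate(row) if j not in cols]
            for i, row in enumerate(m) if i not in rows]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det(minor(m, {0}, {j}))
               for j in range(len(m)))


def pfaffian(m):
    """Pfaffian of a skew-symmetric matrix of even size, expanded along row 0."""
    if not m:
        return 1
    return sum((-1) ** (j - 1) * m[0][j] * pfaffian(minor(m, {0, j}, {0, j}))
               for j in range(1, len(m)))


def anti_skew(b, c, d, f, g, i):
    """The 4 x 4 matrix of the d = 4 example (skew-symmetric about the anti-diagonal)."""
    return [[-d, -c, -b, 0],
            [-g, -f, 0, b],
            [-i, 0, f, c],
            [0, i, g, d]]


# Arbitrary sample values, chosen only for illustration.
A = anti_skew(b=2, c=3, d=5, f=7, g=11, i=13)
B = A[::-1]  # rows reversed: an ordinary skew-symmetric matrix

print(pfaffian(B), det(B), det(A))
```

With these sample values the Pfaffian of B is ib − gc + df = 26 − 33 + 35 = 28, and both determinants equal 28² = 784 (reversing four rows is an even permutation, so det A = det B).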
3.2 Co-ordinate rings of affine patches and tangent cones of X(w)

From Theorem 3.1.1 one can deduce rather easily, as we now show, bases for co-ordinate rings of affine patches of the form qv ≠ 0 and of tangent cones of Schubert varieties. An element v of I(d) will remain fixed for the rest of this section. To simplify notation we will suppress explicit reference to v.

3.2.1 Standard monomial theory for affine patches

Let A denote the affine patch of P(H^0(Md(V ), L)∗) given by qv ≠ 0. The origin of the affine space A is identified as the T-fixed point ev. The functions fθ := qθ/qv, v ≠ θ ∈ I(d), provide a set of coordinate functions on A. Monomials in these fθ form a k-basis for the polynomial ring k[A] of functions on A, where k denotes the underlying field.

Fix w ≥ v in I(d), so that the point ev belongs to the Schubert variety X(w), and let Y (w) be the affine patch of X(w) defined thus: Y (w) := X(w) ∩ A. The coordinate ring k[Y (w)] of Y (w) is a quotient of the polynomial ring k[A], and the proposition that follows identifies a subset of the monomials in fθ which forms a k-basis for k[Y (w)].

We say that a standard monomial θ1 ≥ . . . ≥ θt in I(d) is v-compatible if for each k, 1 ≤ k ≤ t, either θk > v or v > θk (strict inequalities, so that no θk equals v). Given w in I(d), we denote by SMw the set of w-dominated v-compatible standard monomials.

Proposition 3.2.1 As θ1 ≥ . . . ≥ θt runs over the set SMw of w-dominated v-compatible standard monomials, the elements fθ1 · · · fθt form a basis for the coordinate ring k[Y (w)] of the affine patch Y (w) = X(w) ∩ A of the Schubert variety X(w).

Proof: The proof is similar to the proof of Proposition 3.1 of [7]. First consider a linear dependence relation among the fθ1 · · · fθt .
Replacing fθ by qθ and “homogenizing” by qv yields a linear dependence relation among the w-dominated standard monomials qθ1 · · · qθs restricted to X(w), and so the original relation must only have been the trivial one, for by Theorem 3.1.1 the qθ1 · · · qθs are linearly independent on X(w).

To prove that the fθ1 · · · fθt generate k[Y (w)] as a vector space, we make the following claim: if qµ1 · · · qµr is any monomial in the Pfaffians qθ, and qτ1 · · · qτs a standard monomial that occurs with non-zero coefficient in the expression for (the restriction to X(w) of) qµ1 · · · qµr as a linear combination of w-dominated standard monomials, then τ1 ∪ · · · ∪ τs = µ1 ∪ · · · ∪ µr as multisets of {1, . . . , 2d}. To prove the claim, consider the maximal torus T of SO(V ) as in §1.2. The affine patch A is T-stable and there is an action of T on k[Y (w)]. The sections qθ are eigenvectors for T with corresponding characters εθ1 + · · · + εθd, where εk denotes the character of T given by the projection to the diagonal entry on row k. The claim now follows since eigenvectors corresponding to different characters are linearly independent.

Let fµ1 · · · fµr be an arbitrary monomial in the fθ. Fix an integer h such that h > r(d − 1) and consider the expression for (the restriction to X(w) of) qµ1 · · · qµr · (qv)^h as a linear combination of w-dominated standard monomials. We claim that qv occurs in every standard monomial qτ1 · · · qτr+h in this expression (from which it will follow that the τj are all comparable to v). Suppose that none of τ1, . . . , τr+h equals v. For each τj there is at least one entry of v that does not occur in it. The number of occurrences of entries of v in τ1 ∪ · · · ∪ τr+h is thus at most (r + h)(d − 1). But these entries occur at least hd times in µ1 ∪ · · · ∪ µr ∪ v ∪ · · · ∪ v (where v is repeated h times), and hd > (r + h)(d − 1) by the choice of h, a contradiction to the claim proved in the previous paragraph. Hence our claim is proved.
Dividing by (qv)^(r+h) the expression for qµ1 · · · qµr · (qv)^h as a linear combination of w-dominated standard monomials provides an expression for fµ1 · · · fµr as a linear combination of fθ1 · · · fθt, as θ1 ≥ . . . ≥ θt varies over SMw. □

3.2.2 Standard monomial theory for tangent cones

The affine patch Md(V ) ∩ A of the orthogonal Grassmannian Md(V ) is an affine space whose coordinate ring can be taken to be the polynomial ring in variables of the form X(r,c) with (r, c) ∈ OR, where (as in §2.1)

\[
\mathrm{OR} = \{(r,c) \mid 1 \le r, c \le 2d,\ r \notin v,\ c \in v,\ r < c^*\}
\]

Taking d = 5 and v = (1, 3, 4, 6, 9) for example, a general element of Md(V ) ∩ A has a basis consisting of column vectors of a matrix of the following form:

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
X_{21} & X_{23} & X_{24} & X_{26} & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
X_{51} & X_{53} & X_{54} & 0 & -X_{26} \\
0 & 0 & 0 & 1 & 0 \\
X_{71} & X_{73} & 0 & -X_{54} & -X_{24} \\
X_{81} & 0 & -X_{73} & -X_{53} & -X_{23} \\
0 & 0 & 0 & 0 & 1 \\
0 & -X_{81} & -X_{71} & -X_{51} & -X_{21}
\end{pmatrix}
\]

The expression for fθ = qθ/qv in terms of the X(r,c) is a square root of the determinant of the submatrix of a matrix like the one above obtained by choosing the rows given by the entries of θ. Thus fθ is a homogeneous polynomial of degree the v-degree of θ, where the v-degree of θ is defined as one half of the cardinality of v \ θ. Since the ideal of the Schubert variety X(w) in the homogeneous coordinate ring of Md(V ) is generated (see footnote 2) by the qτ, τ ∈ I(d) such that τ ≰ w, it follows that the ideal of Y (w) := X(w) ∩ A in Md(V ) ∩ A is generated by the fτ, τ ∈ I(d) such that τ ≰ w. We are interested in the tangent cone to X(w) at ev (or, what is the same, the tangent cone to Y (w) at the origin), and since k[Y (w)] is graded, its associated graded ring with respect to the maximal ideal corresponding to the origin is k[Y (w)] itself.

Proposition 3.2.1 says that the graded piece of k[Y (w)] of degree m is generated as a k-vector space by elements of degree m of the set SMw of w-dominated v-compatible standard monomials, where the degree of a standard monomial θ1 ≥ . . .
≥ θt is defined to be the sum of the v-degrees of θ1, . . . , θt. To prove Theorem 2.3.1 it therefore suffices to prove the following:

Theorem 3.2.2 The set SMw(m) of standard monomials in I(d) of degree m that are w-dominated and v-compatible is in bijection with the set Sw(v)(m) of monomials in OR of degree m that are O-dominated by w.

Footnote 2: This is a consequence of Theorem 3.1.1. It is easy to see that the qτ such that τ ≰ w vanish on X(w). Since all standard monomials form a basis for the homogeneous coordinate ring of Md(V ) in P(H^0(Md(V ), L)∗), it follows that w-dominated standard monomials span the quotient ring by the ideal generated by such qτ. Since such monomials are linearly independent in the homogeneous coordinate ring of X(w), the desired result follows.

4 Further reductions

In the last section, we reduced the proof of our main theorem (Theorem 2.3.1) to that of Theorem 3.2.2. We now reduce the proof of Theorem 3.2.2 to that of Propositions 4.1.1, 4.1.2 and 4.1.3 below. These propositions will eventually be proved in §10.

4.1 The main propositions

Fix once and for all an element v of I(d). The bijection stated in Theorem 3.2.2 will be described by means of two maps Oπ and Oφ whose definitions will be given in §7 and §8 below. We will now state some properties of these maps. In §4.2 we will see how Theorem 3.2.2 follows once these properties are established.

The map Oπ associates to a monomial S in ON a pair (w,S′) consisting of an element w of I(d) and a “smaller” monomial S′ in ON. This map enjoys the following good properties:

Proposition 4.1.1
1. w ≥ v.
2. v-degree(w) + degree(S′) = degree(S).
3. w O-dominates S′.
4. w is the least element of I(d) that O-dominates S.

The map Oφ, on the other hand, associates a monomial in ON to a pair (w,T) consisting of an element w of I(d) with w ≥ v and a monomial T in ON that is O-dominated by w.

Proposition 4.1.2 The maps Oπ and Oφ are inverses of each other.
For an integer f, 1 ≤ f ≤ 2d, consider the following conditions, the first on a monomial S in ON, the second on an element w of I(d):

(‡) f is not the row index of any element of S and f∗ is not the column index of any element of S.
(‡) f is not an entry of w.

(It is convenient to use the same notation (‡) for both conditions.)

Proposition 4.1.3 Assume that v satisfies (‡); all references to (‡) in this proposition are with respect to a fixed f, 1 ≤ f ≤ 2d.
1. Let w be an element of I(d) with w ≥ v and T a monomial in ON that is O-dominated by w. If w and T both satisfy (‡), then so does Oφ(w,T).
2. If a monomial S in ON satisfies (‡), then so do the “components” w and S′ of its image under Oπ.

4.2 From the main propositions to the main theorem

Let us now see how Theorem 3.2.2 follows from the propositions of §4.1. Most of the following argument runs parallel to its counterparts in the case of the Grassmannian and symplectic Grassmannian (Propositions 4.1.1 and 4.1.2 have their counterparts in [7, 4]), but, in the case that d is odd, the part involving the “mirror image” requires additional work. This is where Proposition 4.1.3 comes in.

Let S, T, and U denote respectively the sets of monomials in OR, ON, and OR \ ON. Let SM_v denote the set of v-compatible standard monomials that are “anti-dominated” by v: a standard monomial θ1 ≥ . . . ≥ θt is anti-dominated by v if θt ≥ v (we can also write θt > v since θt ≠ v by v-compatibility).

Define the domination map from T to I(d) by sending a monomial in ON to the least element that O-dominates it. Define the domination map from SM_v to I(d) by sending θ1 ≥ . . . ≥ θt to θ1. Both these maps take, by definition, the value v on the empty monomial.

Notation 4.2.1 In the following, we use subscripts, superscripts, suffixes, and combinations thereof to modify the meanings of S, T, U, SM, and SM_v.
• superscript: this will be an element w of I(d); when used on T it denotes O-domination (more precisely, T^w denotes the subset of T consisting of those elements that are O-dominated by w); when used on SM or SM_v it denotes domination by w.
• subscript: denotes anti-domination (applied only to standard monomials).
• suffix “(m)”: indicates degree (for example, SM^w_v(m) denotes the set of v-compatible standard monomials that are anti-dominated by v, dominated by w, and of degree m).

Repeated application of Oπ gives a map from T to SM_v that commutes with domination (as just defined) and preserves degree. Repeated application of Oφ gives a map from SM_v to T. These two maps are inverses of each other (Proposition 4.1.2), and so we have a bijection between SM_v and T. In fact, since domination and degree are respected (Proposition 4.1.1), we get a bijection SM^w_v(m) ≅ T^w(m). As explained below, the “mirror image” of the bijection SM_v(m) ≅ T(m) gives a bijection SM^v(m) ≅ U(m). Putting these bijections together, we get the desired result:

\[
SM^w(m) \;=\; \bigcup_{k=0}^{m} SM^w_v(k) \times SM^v(m-k) \;\cong\; \bigcup_{k=0}^{m} T^w(k) \times U(m-k) \;=\; S^w(m).
\]

We now explain how to realize the bijection SM^v(m) ≅ U(m) as the “mirror image” of the bijection SM_v(m) ≅ T(m). For an element u of I(d), define u∗ := (u∗d, . . . , u∗1). In the case d is even, the association u ↦ u∗ is an order reversing involution, and the argument in [4] for the symplectic Grassmannian holds here too. In the case d is odd, u∗ is not an element of I(d), and so some additional work is required.

Recall that a “base element” v of I(d) has been fixed and that our notation does not explicitly indicate this dependence upon v: for example, OR is dependent upon v. For a brief while now (until the end of this section) we need to simultaneously handle several base elements of I(d). We will use the following convention: when the base element of I(d) is not v, we will explicitly indicate it by means of a suffix.
For instance, SM(v∗) denotes the set of v∗-compatible standard monomials in I(d).

Let us first do the case when d is even. We get a bijection SM^v ≅ SM_{v∗}(v∗) by associating to θ1 ≥ . . . ≥ θt the element θ∗t ≥ . . . ≥ θ∗1. The sum of the v-degrees of θ1, . . . , θt equals the sum of the v∗-degrees of θ∗t, . . . , θ∗1, so that we get a bijection SM^v(m) ≅ SM_{v∗}(v∗)(m).

For an element (r, c) of ON(v∗), consider its flip (c, r). Since v belongs to I(d), the complement of v∗ in {1, . . . , 2d} is v, and it follows that (c, r) belongs to OR \ ON. This induces a degree preserving bijection T(v∗) ≅ U. Putting this together with the bijection of the previous paragraph and the one deduced earlier in this section (using Oπ and Oφ), we get what we want:

\[
SM^v(m) \;\cong\; SM_{v^*}(v^*)(m) \;\cong\; T(v^*)(m) \;\cong\; U(m).
\]

Now suppose that d is odd. Then the map x ↦ x∗ does not map I(d) to I(d) but to I(d)∗ (defined as the set consisting of those elements u of I(d, 2d) such that, for each k, 1 ≤ k ≤ 2d, exactly one of k, k∗ belongs to u, and the number of entries of u greater than d is odd). We define a map u ↦ ũ from I(d)∗ to I(d + 1) as follows: ũ := {ũ1, . . . , ũd, d + 2} (the elements are not in increasing order except in the trivial case u = (1, . . . , d)), where, for an integer e, 1 ≤ e ≤ 2d, we set

\[
\tilde{e} :=
\begin{cases}
e & \text{if } 1 \le e \le d \\
e + 2 & \text{if } d+1 \le e \le 2d
\end{cases}
\]

This map u ↦ ũ is an order preserving injection. Consider the composition x ↦ x∗ ↦ ṽ∗-style image of x∗ under u ↦ ũ, from I(d) to I(d + 1); we write ṽ∗ for the image of v under this composition. This is an order reversing injection. The induced map on standard monomials is an injection from SM^v to SM_{ṽ∗}(ṽ∗). It is readily seen that the image under this map is the subset SM_{ṽ∗}(ṽ∗)(‡) consisting of those standard monomials all of whose elements satisfy (‡) with f = d + 1. We have already established (using the maps Oπ and Oφ) a bijection SM_{ṽ∗}(ṽ∗) ≅ T(ṽ∗).
It follows from Proposition 4.1.3 that under this bijection the subset SM_{ṽ∗}(ṽ∗)(‡) maps to T(ṽ∗)(‡) (defined as the set of those monomials in ON(ṽ∗) satisfying (‡) with f = d + 1). Now T(ṽ∗)(‡) is in degree preserving bijection with U: every element of degree 1 of T(ṽ∗)(‡) is uniquely of the form (c̃, r̃) for (r, c) in OR \ ON, and the desired bijection is induced from this. Putting all of these together, we finally get

\[
SM^v \;\cong\; SM_{\tilde{v}^*}(\tilde{v}^*)(\ddagger) \;\cong\; T(\tilde{v}^*)(\ddagger) \;\cong\; U.
\]

Thus, in order to prove our main theorem (Theorem 2.3.1), it suffices to describe the maps Oπ and Oφ and to prove Propositions 4.1.1–4.1.3.

Part III The proof

The main combinatorial results formulated in §4.1 are proved. An attempt is made to maintain parallelism with the proofs in [7].

5 Terminology and notation

5.1 Distinguished subsets

5.1.1 Distinguished subsets of N

Following [7, §4], we define a multiset S of N to be distinguished if, first of all, it is a subset in the usual sense (in other words, it is “multiplicity free”), and if, for any two distinct elements (R,C) and (r, c) of S, the following conditions are satisfied:

A. R ≠ r and C ≠ c.
B. If R > r, then either r < C or C < c.

In terms of pictures, condition A says that (r, c) cannot lie exactly due North or East of (R,C) (or the other way around); so we can assume, interchanging the two points if necessary, that (r, c) lies strictly to the Northeast or Northwest of (R,C); condition B now says that, if (r, c) lies to the Northwest of (R,C), then the point that is simultaneously due North of (R,C) and due East of (r, c) (namely (r, C)) does not belong to N.

5.1.2 Attaching elements of I(d, 2d) to distinguished subsets of N

To a distinguished subset S of N there is naturally associated an element w of I(d, 2d) as follows: start with v, remove all members of v which appear as column indices of elements of S, and add row indices of all elements of S.
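Conditions A and B of §5.1.1 and the association of §5.1.2 translate directly into code. The following Python sketch is a brute-force illustration for small d only; the names `region_N`, `is_distinguished` and `attached_element` are hypothetical helpers of our own, and points of N are assumed to be given as pairs (r, c).

```python
from itertools import combinations


def region_N(v, d):
    """The set N of pairs (r, c) with r not in v, c in v and r > c."""
    cols = set(v)
    return sorted((r, c) for r in range(1, 2 * d + 1) if r not in cols
                  for c in v if r > c)


def is_distinguished(S):
    """Conditions A and B of 5.1.1 for a list of points (r, c) of N."""
    if len(set(S)) != len(S):              # must be multiplicity free
        return False
    for p, q in combinations(S, 2):
        (R, C), (r, c) = max(p, q), min(p, q)   # order the pair so that R >= r
        if R == r or C == c:               # condition A
            return False
        if not (r < C or C < c):           # condition B (here R > r)
            return False
    return True


def attached_element(S, v):
    """Start with v, remove the column indices of S, add the row indices of S."""
    cols = {c for _, c in S}
    rows = {r for r, _ in S}
    return tuple(sorted((set(v) - cols) | rows))


# Small case d = 2, v = (1, 2): list every distinguished subset of N together
# with the element of I(2, 4) attached to it.
v, d = (1, 2), 2
N = region_N(v, d)
distinguished = [S for k in range(len(N) + 1) for S in combinations(N, k)
                 if is_distinguished(list(S))]
for S in distinguished:
    print(S, "->", attached_element(S, v))
```

In this small case there are six distinguished subsets (the empty set, the four singletons, and {(3, 2), (4, 1)}), and the attached elements are exactly the six 2-element subsets of {1, 2, 3, 4}, all of which are ≥ v = (1, 2).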
As observed in [7, Proposition 4.3], this association gives a bijection between distinguished subsets of N and elements w ≥ v of I(d, 2d). The unique distinguished subset of N corresponding to an element w ≥ v of I(d, 2d) is denoted Sw.

5.2 The involution #

5.2.1 The involution # on I(d, 2d)

There are two natural order reversing involutions on I(d, 2d). First there is w ↦ w∗ induced by the natural order reversing involution j ↦ j∗ on {1, . . . , 2d}: here w∗ has the obvious meaning, namely, it consists of all j∗ such that j belongs to w. Then there is the map taking w to its complement {1, . . . , 2d} \ w. These two involutions commute. Composing the two we get an order preserving involution on I(d, 2d) which we denote by w ↦ w#. The elements of the subset I(d) are fixed points under this involution (there are points not in I(d) that are also fixed).

5.2.2 The involution # on N and R

For α = (r, c) in N, or more generally in R, define α# = (c∗, r∗). The involution α ↦ α# is just the reflection with respect to the diagonal d. For a subset or even multiset S of N (or R), the symbol S# has the obvious meaning. We call S symmetric if S = S#.

Proposition 5.2.1 An element w ≥ v of I(d, 2d) belongs to I(d) if and only if the distinguished subset Sw of N corresponding to it as described in §5.1.2 is symmetric and has evenly many diagonal elements.

Proof: That the symmetry of Sw is equivalent to the condition that w = w# is proved in [4, Proposition 5.7]. Now suppose that Sw is symmetric. We claim that for an element (r, c) of Sw that is not on the diagonal, either both r and c are bigger than d or both are less than d + 1. It is enough to prove the claim, for w is obtained from v by removing the column indices and adding the row indices of elements of Sw, and it would follow that the number of entries in w that are bigger than d equals the number of such entries in v plus the number of diagonal elements in Sw. We now prove the claim.
Since Sw is symmetric, it follows that (c∗, r∗) also belongs to Sw. Since Sw is distinguished, it follows that in case r < c∗ (that is, if (r, c) lies above the diagonal), we have r < r∗, and so c < r < r∗; and in case r > c∗, we have c∗ < c, and so c∗ < c < r. Thus the claim is proved. □

5.3 The subset SC attached to a v-chain C

5.3.1 Vertical and horizontal projections of an element of ON

For α = (r, c) in ON (or more generally in OR), the elements pv(α) := (c∗, c) and ph(α) := (r, r∗) of the diagonal d are called respectively the vertical and horizontal projections of α. In terms of pictures, the vertical projection is the element of the diagonal due South of α; the horizontal projection is the element of the diagonal due East of α. The vertical line joining α to its vertical projection pv(α) and the horizontal line joining α to its horizontal projection ph(α) are called the legs of α.

5.3.2 The “connection” relation on elements of a v-chain

Let C : α1 = (r1, c1) > α2 = (r2, c2) > · · · be a v-chain in ON. Two consecutive elements αj and αj+1 of C are said to be connected if the following conditions are both satisfied:

• their legs are “intertwined”; equivalently and more precisely, this means that r∗j ≥ cj+1, or, what amounts to the same, rj ≤ c∗j+1;
• the point (rj+1, r∗j) belongs to N; this just means that rj+1 > r∗j.

Consider the coarsest equivalence relation on the elements of C generated by the above relation. The equivalence classes of C with respect to this equivalence relation are called the connected components of the v-chain C.

This definition has its quirks: the v-chain C : α > β > γ in the picture has {α, β} and {γ} as its connected components; but the “sub” v-chain α > γ of C is connected (as a v-chain in its own right).

[Figure: a v-chain α > β > γ drawn relative to the diagonal and the boundary of N.]

5.3.3 The definition of SC

We will define SC as a multiset of N. It is easy to see, and in any case stated explicitly as part of Corollary 5.3.5, that it is multiplicity free and so is actually a subset of N.
First suppose that C : α1 = (r1, c1) > · · · > αℓ = (rℓ, cℓ) is a connected v-chain in ON. Observe that, if there is at all an integer j, 1 ≤ j ≤ ℓ, such that the horizontal projection ph(αj) does not belong to N, then j = ℓ. Define

\[
S_C :=
\begin{cases}
\{p_v(\alpha_1), \dots, p_v(\alpha_\ell)\} & \text{if } \ell \text{ is even} \\
\{p_v(\alpha_1), \dots, p_v(\alpha_\ell)\} \cup \{p_h(\alpha_\ell)\} & \text{if } \ell \text{ is odd and } p_h(\alpha_\ell) \in N \\
\{p_v(\alpha_1), \dots, p_v(\alpha_{\ell-1})\} \cup \{\alpha_\ell, \alpha_\ell^{\#}\} & \text{if } \ell \text{ is odd and } p_h(\alpha_\ell) \notin N
\end{cases}
\]

For a v-chain C that is not necessarily connected, let C = C1 ∪ C2 ∪ · · · be the partition of C into its connected components, and set SC := SC1 ∪ SC2 ∪ · · ·

5.3.4 The type of an element α of a v-chain C, and the set SC,α

We introduce some terminology and notation. Their usefulness may not be immediately apparent.

Suppose that C : α1 > · · · > αℓ is a connected v-chain. We define the type in C of an element αj, 1 ≤ j ≤ ℓ, of C to be V, H, or S, according as:

V: j ≠ ℓ, or j = ℓ and ℓ is even.
H: j = ℓ, ℓ is odd, and ph(αℓ) ∈ N.
S: j = ℓ, ℓ is odd, and ph(αℓ) ∉ N.

The type of an element in a v-chain that is not necessarily connected is defined to be its type in its connected component. The set SC,α of elements of N generated by an element α of C is defined to be

\[
S_{C,\alpha} :=
\begin{cases}
\{p_v(\alpha)\} & \text{if } \alpha \text{ is of type V in } C; \\
\{p_v(\alpha), p_h(\alpha)\} & \text{if } \alpha \text{ is of type H in } C; \\
\{\alpha, \alpha^{\#}\} & \text{if } \alpha \text{ is of type S in } C.
\end{cases}
\]

Observe that, for a v-chain C, the monomial SC defined in §5.3.3 is the union, over all elements α of C, of SC,α. For an element α of a v-chain C, we define qC,α to be pv(α) if α is of type V or H and to be α if it is of type S.

If the horizontal projection of an element in a v-chain does not belong to N, then clearly the same is true for every succeeding element. The first such element of a v-chain is called the critical element.

Proposition 5.3.1
1. The cardinality of a connected component that has an element of type H or S is odd. Conversely, if the cardinality of a component is odd, then it has an element of type H or S.
2.
An element of type H or S can only be the last element in its connected component.
3. The critical element has type either V or S. No element before it can be of type S and every element after it is of type S. In particular, any element that succeeds an element of type S is of type S.

Proof: Clear from definitions. □

Proposition 5.3.2 Let α > γ be elements of a v-chain C (we are not assuming that they are consecutive).
1. If α > γ is connected as a v-chain in its own right, then α is connected to its next member in C; that is, α cannot be the last element in its connected component in C.
2. If α > γ is not connected as a v-chain in its own right and the legs of α and γ intertwine, then the connected component of γ in C is the singleton {γ}, and γ has type S in C.

Proof: Clear from definitions. □

Proposition 5.3.3 Let E : α > . . . > ζ be a v-chain, D and D′ two v-chains with tail α, and C, C′ the concatenations of D, D′ respectively with E. Then
1. The last element in the connected component containing α is the same in C and C′ (and this is the same as in E). Let λ denote this element.
2. The only element among α, . . . , ζ that possibly has different types in C and C′ is λ.

Proof: (1): Whether or not two successive elements in a v-chain are connected is independent of other elements in the v-chain.
(2): The type of an element in a v-chain is V unless it is the last element in its connected component. And the type of the last element in a component depends on the cardinality of the component. The components of E not containing α are still components in C and C′. In contrast, the component containing α could possibly be larger in C (respectively C′) and hence its cardinality could be different. □

For an element α = (r, c) of N, we define α(up) to be α itself if α is either on or above the diagonal d (more precisely, if r ≤ c∗), and to be its “reflection” in the diagonal (more precisely, (c∗, r∗)) if α is below the diagonal (more precisely, if r > c∗).
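The construction of SC in §5.3.2–5.3.4, and of the element w(C) of §2.2.2 obtained from it through §5.1.2, can be sketched in Python as follows. This is only an illustrative transcription of the definitions under our own conventions: the function names are hypothetical, a v-chain is assumed to be given as a list of pairs (r, c) with the head first, and connected components are taken to be maximal runs of consecutive connected elements.

```python
def star(k, d):
    """The involution k -> k* = 2d + 1 - k."""
    return 2 * d + 1 - k


def connected(a, b, d):
    """Whether consecutive chain elements a > b are connected (5.3.2)."""
    (r1, _), (r2, c2) = a, b
    legs_intertwined = r1 <= star(c2, d)
    corner_in_N = r2 > star(r1, d)
    return legs_intertwined and corner_in_N


def components(chain, d):
    """Split a v-chain (head first) into its connected components."""
    parts = []
    for alpha in chain:
        if parts and connected(parts[-1][-1], alpha, d):
            parts[-1].append(alpha)
        else:
            parts.append([alpha])
    return parts


def pv(alpha, d):
    """Vertical projection (c*, c)."""
    return (star(alpha[1], d), alpha[1])


def ph(alpha, d):
    """Horizontal projection (r, r*)."""
    return (alpha[0], star(alpha[0], d))


def sharp(alpha, d):
    """Reflection in the diagonal: (r, c) -> (c*, r*)."""
    return (star(alpha[1], d), star(alpha[0], d))


def S_of_chain(chain, d):
    """The set S_C of 5.3.3, built component by component."""
    S = set()
    for part in components(chain, d):
        last = part[-1]
        S.update(pv(a, d) for a in part[:-1])       # non-final elements: type V
        if len(part) % 2 == 0:
            S.add(pv(last, d))                      # type V
        elif last[0] > star(last[0], d):            # ph(last) lies in N
            S.update({pv(last, d), ph(last, d)})    # type H
        else:
            S.update({last, sharp(last, d)})        # type S
    return S


def w_of_chain(chain, v, d):
    """The element w(C): drop from v the column indices of S_C, add its row indices."""
    S = S_of_chain(chain, d)
    return tuple(sorted((set(v) - {c for _, c in S}) | {r for r, _ in S}))
```

For example, with d = 4 and v = (1, 2, 5, 6), the v-chain (4, 1) > (3, 2) is not connected, both of its elements are of type S, and `S_of_chain` returns {(4, 1), (8, 5), (3, 2), (7, 6)}, so that w(C) = (3, 4, 7, 8). For the empty chain, `w_of_chain` returns v itself, as noted in §2.2.2.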
For a monomial S of N, S(up) is defined to be the intersection of S (as a multiset) with the subset ON ∪ d of N. The notations α(down) and S(down) have similar meanings. Caution: It is not true that S(up) = {α(up)|α ∈ S} (in the obvious sense one would make of the right hand side). In particular, for a singleton monomial {α}, it is not always true that {α}(up) = {α(up)}. Proposition 5.3.4 Let α and β be elements of a v-chain C. Let us use α′ and β′ respectively to denote elements of SC,α(up) and SC,β(up). 1. If α > β (these elements are not necessarily consecutive in C), then, given β′, there exists α′ such that α′ > β′. In fact, this is true for every choice of α′ except when (*) α is of type H, and ph(α) 6> β ′ for some β′ ∈ SC,β. In particular, qC,α > β ′ and qC,α > qC,β. 2. Conversely, suppose that α′ > β′ for some choice of α′ and β′. Then α ≥ β; if equality occurs, then α is of type H, α′ = pv(α) and β ′ = ph(α). In particular, if α′ > qC,β (or more specially qC,α > qC,β), then α > β. 3. If (*) holds for α > β in C, then (a) the critical element of C is the one just after α; in particular, α is uniquely determined. (b) all elements of C succeeding α are of type S; in particular, β is of type S and β′ = β. (c) (*) holds for γ in place of β for every γ in C that succeeds α. Proof: (1) If α is of type V or H, we need only take α′ = pv(α), for pv(α) > pv(β), pv(α) > ph(β), and pv(α) > β. Now suppose that α is of type S. Then β too is of type S (Proposition 5.3.1 (3)), so β′ can only be β, and the first part of (1) is proved. It follows from the above that if α′ = pv(α) or if α has type S, then α ′ > β′ independent of the choice of α′. So if α′ 6> β′, then (*) holds and α′ = ph(α). (3) Let λ be the immediate successor of α in C. Then α is not connected to λ (Proposition 5.3.1 (2)). Since ph(α) 6> β ′, it follows that α and β have intertwining legs. Therefore so do α and λ. By Proposition 5.3.2 (2), λ has type S in C. 
Since α has type H and λ type S, it follows immediately from the definition of the critical element that λ is the critical element. This proves (a). Assertion (b) now follows from Proposition 5.3.1 (3). For (c), write ph(α) = (a, a ∗), λ = (R,C), and γ = (r, c). Then R < a∗, for α and λ have intertwining legs but are not connected. So c < r ≤ R < a∗. This means ph(α) 6> γ. And γ being of type S (by (b)), we can take γ′ = γ. (2) Suppose that α 6≥ β. Then β > α. By the second part of (1) above, β is of type H and β′ = ph(β); by item (b) of (3), α is of type S, so α ′ = α. This leads to the contradiction β > α > ph(β). � Corollary 5.3.5 The multiset SC attached to a v-chain C is a distinguished subset of N in the sense of 5.1.1. Proof: If α in C is of type V or S, then SC,α is a singleton; if it is of type H, then SC,α = {pv(α), ph(β)}. So there can be no violation of conditions A and B of §5.1.1 by elements of SC,α. Suppose α > β. By Proposition 5.3.4 (1), we have α′ > β′ for any choice of α′ ∈ SC,α and β ′ ∈ SC,β except when the condition (*) holds. By (3) of the same proposition, if (*) holds, then β′ = β, and writing β = (r, c), ph(α) = (a, a ∗), we have r < a (since α > β) and c < r < a∗ (see proof of item 3(c) of the proposition). Thus there can be no violation of conditions A and B of §5.1.1. � Corollary 5.3.6 Let S be a v-chain in ON and w an element of I(d). If w O-dominates S, then w dominates in the sense of [7] the monomial S ∪ S# of N. Proof: By [4, Proposition 5.15], it is enough to show that w dominates S. Let C : α1 > . . . > αt be a v-chain in S. Writing αj = (rj , cj) and qC,αj = (Rj , Cj) we have rj ≤ Rj and Cj ≤ cj . By Proposition 5.3.4 (1), we have qC,α1 > . . . > qC,αt . Since w O-dominates S, it in particular dominates qC,α1 > . . . > qC,αt and so also C. � 6 O-depth The concept of O-depth defined in §6.1 below plays a key role in this paper. As the name suggests, it is the orthogonal analogue of the concept of depth of [7]. 
In §6.2 below, it is observed that the O-depth is no smaller than depth in the sense of [7]. In §6.3, some observations about the relation between O-depths and types of elements in v-chains are recorded. 6.1 Definition of O-depth The O-depth of an element α in a v-chain C in ON is the depth in SC in the sense of [7] of qC,α: in other words, it is the depth in SC of pv(α) in case α is of type V or H, and of α (equivalently of α#) in case α is of type S. It is denoted O-depthC(α). The O-depth of an element α in a monomial S of ON is the maximum, over all v-chains C in S containing α, of the O-depth of α in C. It is denoted O-depth (α). Finally, the O-depth of a monomial S in ON is the maximum of the O-depths in S of all the elements of S. There is a conflict in the above definitions: Is the O-depth of an element of a v-chain C the same as its depth as an element of the monomial C? In other words, could the O-depth of an element in a v-chain be exceeded by its O-depth in a sub-chain? The conflict is resolved by the first item of the following proposition. Proposition 6.1.1 1. For v-chains C ⊆ D, the O-depth in C of an element of C is no more than its O-depth in D. 2. If a v-chain C is an initial segment of a v-chain D, then the O-depths in C and D of an element of C are the same. Proof: (1): By an induction on the difference in the cardinalities of D and C, we may assume that D has one more element than C. Call this extra element δ. Suppose that δ lies between successive elements α and β of C (the modifications needed to cover the extreme cases when it goes at the beginning or the end are being left to the reader). The only elements of C that could possibly undergo changes of type on addition of δ are α and the last element in the connected component of β, which let us call β′. If there are no type changes, then SC ⊆ SD and the assertion is immediate. The only type change that α can undergo is from H to V. 
The type changes that β′ can undergo are: H to V; V to H; S to V; V to S. An easy enumeration of cases shows that only one of α and β′ can undergo a type change. We need not worry about changes from V to H for in this case SC ⊆ SD. First let us suppose that α undergoes a change of type (from H to V). Then δ is connected to α. It follows from Proposition 5.3.1 (1) that δ has type V in D: the connected component of α in C has odd number of elements, so if δ happens to be the last element in its connected component in D, the number of elements in that component will be even. Replacing an occurrence of ph(α) in a v-chain of SC by pv(δ) would result in a v-chain in SD (by Proposition 5.3.4 (1)), and this case is settled. Now suppose that β′ undergoes a type change. Then δ is connected to β and δ is of type V in D (Proposition 5.3.1 (2)). Replacing by pv(δ) any occurrence in a v-chain in SC of pv(β ′), ph(β ′), β′ accordingly as the type of β′ in C is V, H, or S, (not necessarily in the same place but at an appropriate place) would result in a v-chain in SD (by Proposition 5.3.4 (1)), and we see that the O-depth cannot decrease. (2): It follows from Proposition 5.3.4 (2) that, for an element α of C, con- tributions to SD from elements beyond α (in particular from those not in C) do not affect the depth in SD of qD,α. Looking for the possibility of differences in types in C and D of elements of C, we see that the only element of C that has possibly a different type in D is its last element. And this too can change type only from H to V. The above two observations imply that the calculations of O-depths in C and D of an element α of C are no different: we would be considering the depth in SC and SD respectively of the same element (either pv(α) or α), and the differences in SD and SC have no effect on this consideration. � Corollary 6.1.2 If C ⊆ D are v-chains in ON, then wC ≤ wD (although it is not always true that SC ⊆ SD). 
Proof: By [7, Lemma 5.5], it is enough to show that every v-chain in SC is dominated by wD. Let β1 = (r1, c1) > · · · > βt = (rt, ct) be an arbitrary v-chain in SC. To show that it is dominated by wD, it is enough, by [7, Lemma 4.5], to show the existence of a v-chain (R1, C1) > · · · > (Rt, Ct) in SD with rj ≤ Rj and Cj ≤ cj for 1 ≤ j ≤ t. Such a v-chain exists by the proof of (1) of Proposition 6.1.1. □

Corollary 6.1.3
1. Let S be a monomial in ON and α ∈ S. Then there exists a v-chain C in S with tail α such that O-depthS(α) = O-depthC(α).
2. For elements α > γ in a v-chain C (these need not be consecutive), we have O-depthC(α) < O-depthC(γ).
3. For elements α > γ of a monomial S in ON, we have O-depthS(α) < O-depthS(γ).
4. No two elements of the same O-depth in a monomial in ON are comparable.

Proof: (1) This follows from (2) of the Proposition above and the definition of O-depth. (2) This follows from Proposition 5.3.4 (1) and the definition of O-depth. (3) By (1), there exists a v-chain C with tail α such that O-depthS(α) = O-depthC(α). Concatenate C with α > γ and let D denote the resulting v-chain. By (2) of the Proposition above, O-depthC(α) = O-depthD(α). By (2) above, O-depthD(α) < O-depthD(γ). And finally, O-depthD(γ) ≤ O-depthS(γ) by the definition of O-depth. (4) Immediate from (3). □

Corollary 6.1.4 Let β > γ be elements of a v-chain C of elements of ON. Let E be a v-chain in SC with tail qC,γ and length O-depthC(γ). Then qC,β occurs in E.

Proof: It is enough to show that for α′ ≠ qC,β in E, either α′ > qC,β or qC,β > α′. Let α be in C such that qC,β ≠ α′ ∈ SC,α. If β ≥ α, then qC,β > α by Proposition 5.3.4 (1). If α > β and α′ ≯ qC,β, then, by (1) and (3) of the same proposition, α′ ≯ qC,γ, a contradiction. □

6.2 O-depth and depth

Lemma 6.2.1 The O-depth of an element α in a monomial S of ON is no less than its depth (in the sense of [7]) in S ∪ S#.

Proof: Let C : α1 > . . .
> αt be a v-chain in S ∪ S# with tail αt = α, where t is the depth of α in S ∪ S#. We then have α1(up) > . . . > αt(up), so we may assume C to be in S. By Proposition 5.3.4 (1), qC,α1 > . . . > qC,αt in SC. So $\operatorname{depth}_{S \cup S^{\#}}(\alpha) = t \le \operatorname{depth}_{S_C}(q_{C,\alpha_t}) \le \text{O-depth}_S(\alpha)$. □

6.3 O-depth and type

We begin by defining some useful terminology. Let (r, c) and (R, C) be two elements of R. To say that (R, C) dominates (r, c) means that r ≤ R and C ≤ c (in terms of pictures, (r, c) lies, not necessarily strictly, to the Northeast of (R, C)). To say that they are comparable means that either (R, C) > (r, c) or (r, c) > (R, C). While this terminology is admittedly strange, there will arise no occasion for confusion.

For an integer i, we let i(odd) be the largest odd integer not bigger than i and i(even) the smallest even integer not smaller than i.

Lemma 6.3.1
1. For consecutive elements α > β of a v-chain C,
\[
\text{O-depth}_C(\beta) = \begin{cases}
\text{O-depth}_C(\alpha) + 2 & \text{if and only if $\alpha$ is of type H and $p_h(\alpha) > \beta$,}\\
\text{O-depth}_C(\alpha) + 1 & \text{otherwise.}
\end{cases}
\]
2. For an element of a v-chain C such that either its horizontal projection belongs to N or it is connected to its predecessor, the parity of its O-depth in C is the same as that of its ordinality in its connected component in C.
3. The O-depth in a v-chain of an element of type H is odd.
4. If in a v-chain an element of type V is the last in its connected component, then its O-depth is even.
5. If in a v-chain C there is an element of O-depth d, then
(a) for every odd integer d′ not exceeding d, there is in C an element of O-depth d′;
(b) if, for an even integer d′ not exceeding d, there is no element in C of O-depth d′, then the element α in C of O-depth d′ − 1 is of type H, and ph(α) > β, where β denotes the immediate successor of α in C.
6. Let C be a v-chain and α an element of type H in C. Then the depth in SC of ph(α) equals O-depthC(α) + 1. In particular, this depth is even.
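The terminology introduced at the start of §6.3 can be made concrete with a small sketch. The definitions of "dominates", i(odd), and i(even) are taken from the text above; the strict order used in `greater` (larger row index, smaller column index) is an assumption, since the relation > is defined in an earlier section and not restated here. All function names are hypothetical.

```python
from typing import Tuple

Pos = Tuple[int, int]  # an element (row, column)


def dominates(big: Pos, small: Pos) -> bool:
    """(R, C) dominates (r, c) when r <= R and C <= c, as defined in the text."""
    (R, C), (r, c) = big, small
    return r <= R and C <= c


def greater(big: Pos, small: Pos) -> bool:
    """ASSUMED strict order: (R, C) > (r, c) when R > r and C < c."""
    (R, C), (r, c) = big, small
    return R > r and C < c


def comparable(a: Pos, b: Pos) -> bool:
    """Two elements are comparable when one is strictly greater than the other."""
    return greater(a, b) or greater(b, a)


def odd_part(i: int) -> int:
    """i(odd): the largest odd integer not bigger than i."""
    return i if i % 2 != 0 else i - 1


def even_part(i: int) -> int:
    """i(even): the smallest even integer not smaller than i."""
    return i if i % 2 == 0 else i + 1
```

Under these assumptions an element dominates itself but is not comparable to itself, which may be the oddity the text alludes to: `dominates((21, 9), (21, 9))` holds while `comparable((21, 9), (21, 9))` does not.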
Proof: (1): From items 1 and 3(a) of Proposition 5.3.4, it follows that, for γ in C with γ > α, if γ′ 6> qC,α for some γ ′ in SC,γ , then γ ′ 6> qC,β. Thus O-depthC(β) exceeds O-depthC(α) by the number of elements in SC,α that dominate qC,β . This number is 1 if α is of type V, or of type S, or of type H and ph(α) 6> β; it is 2 if α is of type H and ph(α) > β (note that ph(α) > β if and only if ph(α) > qC,β). (2): Let λ be such an element. Everything preceding λ in C is of type H or V (Proposition 5.3.1 (3)). Let λ belong to the kth connected component, and n1, . . . , nk be respectively the cardinalities of the first, . . . , k th connected components. By (1) above and item 3(b) of Proposition 5.3.4, O-depthC(λ) is n1(even) + · · · + nk−1(even) plus the ordinality of λ in the k th connected component. (3) and (4): These are special cases of 2. (5): This follows easily from (1) and (3). (6): It follows from Proposition 5.3.4 (2) that there is no element γ in SC that lies between pv(α) and ph(α) (meaning pv(α) > γ > ph(α)), so the asser- tion holds. � Corollary 6.3.2 For a v-chain C in ON, if the O-depths of elements in C are bounded by k, then the depths of elements in SC are bounded by k(even). Proof: The depth of qC,α in SC for any α in C is at most k by hypothesis. An element of SC that is not qC,α for any α in C can only be of the form ph(α) for some α. By Proposition 5.3.4, depthSCpv(α) = depthSCph(α) − 1, which implies depth ph(α) ≤ k + 1. If, moreover, k is even, then by (3) of Lemma 6.3.1 depth ph(α) = depthSCpv(α) + 1 ≤ (k − 1) + 1 = k. � Proposition 6.3.3 Given a monomial S in ON and an element α in it, there exists a v-chain C in S with tail α such that O-depthC(β) = O-depthS(β) for every β in C. Proof: Proceed by induction on d := O-depth (α). Choose a v-chain D in S with tail α such that O-depthD(α) = O-depthS(α) (such a v-chain exists by Corollary 6.1.3 (1)). Let α′ be the element in D just before α. 
It follows from item (3) of Corollary 6.1.3 and item (1) of Lemma 6.3.1 that O-depthS(α′) (as also O-depthD(α′)) is either d − 1 or d − 2. By induction, there exists a v-chain C′ with tail α′ that has the desired property. Let C be the concatenation of C′ with α′ > α. We claim that C has the desired property. The only thing to be proved is that O-depthC(α) = d. By item (1) of Lemma 6.3.1, we have O-depthC(α) ≥ O-depthC′(α′) + 1. In particular, the claim is proved in case O-depthC′(α′) is d − 1, so let us assume that O-depthC′(α′) is d − 2. It now follows from the same item that α′ has type H in D and ph(α′) > α; it further follows that it is enough to show that α′ has type H in C. Since α′ has type H in D, it follows (from item (2) of Proposition 5.3.1) that α′ > α is not connected and (from item (3) of Lemma 6.3.1) that d − 2 is odd. Now, by item (4) of Lemma 6.3.1, the type in C′ of α′ cannot be V, so it is H, and the claim is proved. □

Corollary 6.3.4 Let S be a monomial in ON, β an element of S, and i an integer such that i < O-depthS(β). Then
(a) If i is odd, there exists an element α in S of O-depth i such that α > β.
(b) If i is even and there is no element α in S of O-depth i such that α > β, then there is an element α in S of O-depth i − 1 such that ph(α) > β.

Proof: Choose a v-chain C in S having tail β and the good property of Proposition 6.3.3. Apply Lemma 6.3.1 (5). □

Corollary 6.3.5 Let C be a v-chain in ON with tail α such that O-depthC(α) is odd. Let A be a v-chain in ON with head α, and D the concatenation of C with A. Let C′ denote the v-chain C \ {α}. Then
1. The type of an element of A is the same in both A and D. In particular, SA ⊆ SD and qA,β = qD,β for β in A.
2. The type of an element of C′ is the same in both C′ and D. In particular, SC′ ⊆ SD.
3. SD = SC′ ∪ SA (disjoint union); letting j0 := O-depthC(α), we have $(S_D)^{j_0} = S_A$ and $(S_D)_1 \cup \cdots \cup (S_D)_{j_0 - 1} = S_{C'}$.
(For a monomial S, the subset of elements of depth at least i is denoted $S^i$, and the subset of elements of depth exactly i is denoted $S_i$.)

Proof: (1) Generally (meaning without the assumption that O-depthC(α) is odd), the only element of A that could possibly have a different type in D is the last one in the first connected component of A; whether or not it changes type depends exactly upon whether or not the parity of the cardinality of its connected component in D is different from that in A. Under our hypothesis, this parity does not change, for, by (4) of Lemma 6.3.1, the type of α in C is H or S, and so the cardinality of the connected component of α in C is odd.

(2) Generally (meaning without the assumption that O-depthC(α) is odd), the only element of C′ that could possibly have a different type in D is the last one of C′; it changes type if and only if it is connected to α and the cardinality of its connected component in C′ is odd. Under our hypothesis, this cardinality is even, for the same reason as in (1).

(3) That SD = SC′ ∪ SA (disjoint union) is an immediate consequence of (1) and (2). By Lemma 6.3.1 (1), qA,α = qD,α dominates every element of SA, so $S_A \subseteq (S_D)^{j_0}$ (note that $\operatorname{depth}_{S_D} q_{D,\alpha}$ = O-depthD(α) = O-depthC(α) = j0). It is enough to prove the following claim: every element of SC′ has depth less than j0 in SD. Let γ′ be an element of SC′. If γ′ > qD,α, then the claim is clear. If not, then, by Proposition 5.3.4 (1), γ′ = ph(γ). By Lemma 6.3.1 (3), O-depthD(γ) is odd. Since the claim is already true for qD,γ = pv(γ), we have O-depthD(γ) = $\operatorname{depth}_{S_D} p_v(\gamma)$ ≤ j0 − 2. By (6) of the same lemma, $\operatorname{depth}_{S_D} \gamma'$ = O-depthD(γ) + 1, so $\operatorname{depth}_{S_D} \gamma'$ ≤ j0 − 1, and the claim is proved. □

Proposition 6.3.6 Let S be a monomial in ON and j an odd integer. For β in $S^{j,j+1}$ (:= {α ∈ S | O-depthS(α) ≥ j}), we have

\[
\text{O-depth}_{S^{j,j+1}}(\beta) = \text{O-depth}_S(\beta) - j + 1.
\]

Proof: Proceed by induction on j. For j = 1, the assertion reduces to a tautology. Suppose that the assertion has been proved up to j.
By the induction hypothesis, we have Sj+2,j+3 = (Sj,j+1)3,4, and we are reduced to proving the assertion for j = 3. Let A be a v-chain in S3,4 with tail β and O-depthA(β) = O-depthS3,4(β). Let α be the head of A. We may assume that O-depth (α) = 3 for, if O-depth (α) > 3, we can find, by Lemma 6.3.1 (5), α′ of O-depth 3 in S with α′ > α, and extending A by α′ will not decrease the O-depth in A of β (Proposition 6.1.1 (1)). Let E be a v-chain in SA with tail qA,β and length O-depthA(β). The head of E is then qA,α (see Proposition 5.3.4 (1)). Choose C in S with tail α such that O-depthC(α) = 3. Let D be the concatenation of C with A. By Corollary 6.3.5, E is contained in SD, qD,α = qA,α, and qD,β = qA,β. By Proposition 6.1.1 (2), the O-depth of α is the same in D as in C. Choose a v-chain F in SD with tail qD,α = qA,α. Concatenating F with E we get a v-chain inSD with tail qD,β = qA,β of lengthO-depthS3,4(β)+2. This proves that O-depth (β) ≥ O-depth (β) + 2. To prove the reverse inequality, we need only turn the above proof on its head. Let D be a v-chain in S with tail β such that O-depth (β) = O-depthD(β). Let G be a v-chain in SD with tail qD,β and length O-depthS(β). There exists an element α in D of O-depth 3 in D (by Lemma 6.3.1 (5)). Let C be the part of D upto and including α, and A the part α > . . . > β. By Proposition 6.1.1 (2), O-depthC(α) = 3 and, as above, Corollary 6.3.5 applies. By Corollary 6.1.4, qA,α = qD,α occurs in G. The part F of G upto and in- cluding qA,α is of length at most 3, and the part E : qD,α > . . . > qD,β belongs also to SA (Proposition 5.3.4 (2)). Thus the length of G is at most 2 more than the the length of E which is at most O-depth (β). � Corollary 6.3.7 For odd integers i, j, we have (Si,i+1)j,j+1 = Si+j−1,i+j . � Corollary 6.3.8 Let E : α > . . . > ζ be a v-chain, D and D′ two v-chains with tail α, and C, C′ the concatenations of D, D′ respectively with E. Then 1. O-depthC(ζ)−O-depthC(α) ≤ O-depthC′(ζ) −O-depthC′(α) + 1; 2. 
equality holds if and only if the type of λ is H in C and V in C′, and ph(λ) > µ, where λ is the last element in the connected component con- taining α of E and µ is the immediate successor in E of λ. Proof: These assertions follow from combining (2) of Proposition 5.3.3 with (1) of Lemma 6.3.1. � Corollary 6.3.9 Let ζ be an element of a monomial S in ON. Let C be a v-chain in S with tail ζ such that O-depthC(ζ) = O-depthS(ζ). Then 1. O-depthC(α) ≥ O-depthS(α) − 1 for any α in C. 2. If O-depthC(α) = O-depthS(α) − 1 for some α in C, then (a) letting λ be the last element in the connected component containing α and µ the element next to λ, the type of λ in C is H and ph(λ) > µ. (b) O-depthC(γ) = O-depthS(γ) − 1 for all γ in C between α and λ (both inclusive). Proof: (1) Let α be in C. Let E denote the part of C beyond (and including) α. Let D′ be a v-chain in S with tail α such that O-depthD′(α) = O-depthS(α). Let C′ be the concatenation of D′ and E. Applying Proposition 6.3.8 (1), we O-depthC(α) ≥ O-depthC(ζ) −O-depthC′(ζ) +O-depthC′(α)− 1. But O-depthC(ζ)−O-depthC′(ζ) = O-depthS(ζ)−O-depthC′(ζ) ≥ 0, and, by the choice of D′ and Proposition 6.1.1 (2), O-depthC′(α) = O-depthD′(α) = O-depth (2) Assertions (a) and (b) follow respectively from the “only if ” and “if” parts of item (2) of Proposition 6.3.8. � 7 The map Oπ The purpose of this section is to describe the map Oπ. The description is given in §7.1. It relies on certain claims which are proved in §§7.3, 7.4. Those proofs in turn refer to results from §9, but there is no circularity—to postpone the definition of Oπ until all the results needed for it have been proved would hurt rather than help readability. The observations in §7.5 are required only in §10. The symbol j will be reserved for an odd positive integer throughout this section. 
7.1 Description of Oπ

The map Oπ takes as input a monomial S in ON and produces as output a pair (w, S′), where w is an element of I(d) such that w ≥ v and S′ is a “smaller” monomial, possibly empty, in ON. If the input S is empty, no output is produced (by definition). So now suppose that S is non-empty.

We first partition S into subsets according to the O-depths of its elements. Let $S^{\mathrm{pr}}_k$ be the sub-monomial of S consisting of those elements of S that have O-depth k (the superscript “pr” is short for “preliminary”). It follows from Corollary 6.1.3 (4) that there are no comparable elements in $S^{\mathrm{pr}}_k$, and so we can arrange the elements of $S^{\mathrm{pr}}_k$ in ascending order of both row and column indices. Let σk be the last element of $S^{\mathrm{pr}}_k$ in this arrangement.

Let now j be an odd integer. We set

\[
S^{\mathrm{pr}}_{j,j+1} := S^{\mathrm{pr}}_j \cup S^{\mathrm{pr}}_{j+1}.
\]

We say that S is truly orthogonal at j if ph(σj) belongs to N (that is, if r > r∗, where σj = (r, c)). Let Sj,j+1 denote the monomial in N defined by

\[
S_{j,j+1} := \begin{cases}
\bigl(S^{\mathrm{pr}}_{j,j+1} \setminus \{\sigma_j\}\bigr) \cup \bigl(S^{\mathrm{pr}}_{j,j+1} \setminus \{\sigma_j\}\bigr)^{\#} \cup \{p_v(\sigma_j), p_h(\sigma_j)\} & \text{if $S$ is truly orthogonal at $j$,}\\
S^{\mathrm{pr}}_{j,j+1} \cup \bigl(S^{\mathrm{pr}}_{j,j+1}\bigr)^{\#} & \text{otherwise.}
\end{cases}
\]

Here $S^{\mathrm{pr}}_{j,j+1} \setminus \{\sigma_j\}$ and the other terms on the right are to be understood as multisets. As proved in Corollary 7.3.4 (1) below, Sj,j+1 has depth at most 2. Let Sj (respectively Sj+1) be the subset (as a multiset) of elements of depth 1 (respectively 2) of Sj,j+1.

Now, for every integer k, we apply the map π of [7, §4] to Sk to obtain a pair (w(k), S′k), where w(k) is an element of I(d, 2d) and S′k is a monomial in N. Let Sw(k) be the distinguished monomial in N associated to w(k); see §5.1.2.

Proposition 7.1.1
1. Sw(k) and S′k are symmetric. And therefore so are ∪k Sw(k) and ∪k S′k.
2. ∪k Sw(k) is a distinguished subset of N (in particular, the Sw(k) are disjoint).
3. For j an odd integer, either
• both Sw(j) and Sw(j+1) meet the diagonal, or
• neither of them meets the diagonal,
precisely as whether or not S is truly orthogonal at j. And therefore ∪k Sw(k) has evenly many diagonal elements.
4. No S′k intersects the diagonal.
And therefore neither does ∪k S′k.

The proposition will be proved below in §7.4. Finally we are ready to define the image (w, S′) of S under Oπ. We let w be the element of I(d, 2d) associated to the distinguished subset ∪k Sw(k) of N; since ∪k Sw(k) is symmetric and has evenly many diagonal elements, it follows from Proposition 5.2.1 that w is in fact an element of I(d). And we take

\[
S' := \Bigl(\bigcup_k S'_k\Bigr) \cap ON.
\]

Remark 7.1.2 Setting $\pi(S_{j,j+1}) := (w_{j,j+1}, S'_{j,j+1})$, $S' := \bigl(\bigcup_{j\ \mathrm{odd}} S'_{j,j+1}\bigr) \cap ON$, and defining w to be the element of I(d, 2d) associated to $\bigcup_{j\ \mathrm{odd}} S_{w_{j,j+1}}$ would give an equivalent definition of Oπ.

7.2 Illustration by an example

We illustrate the map Oπ by means of an example. Let d = 15, and v = (1, 2, 3, 4, 9, 10, 14, 16, 18, 19, 20, 23, 24, 25, 26). A monomial S in ON is shown in Figure 7.2.1. Solid black dots indicate the elements that occur in S with non-zero multiplicity. Integers written near the solid dots indicate multiplicities. The O-depth of S is 5. The element (21, 9) has O-depth 3 although it has depth 2 in S.

Figure 7.2.2 shows the monomials $S^{\mathrm{pr}}_{1,2}$, $S^{\mathrm{pr}}_{3,4}$, and $S^{\mathrm{pr}}_{5,6}$. Solid dots, open dots, and crosses indicate elements of these monomials respectively. The monomial S is truly orthogonal at 1 and 3 but not at 5: σ1 = (28, 2), σ3 = (21, 9), and σ5 = (15, 14). Figure 7.2.3 shows the monomials S1,2, S3,4, and S5,6 of N and also their decomposition into blocks, and Figure 7.2.4 the monomials S′1,2, S′3,4, and S′5,6. We have

Sw = {(15, 14), (17, 16), (21, 10), (7, 4), (27, 24), (28, 3), (30, 1), (29, 2)}

hence w = (7, 9, 15, 17, 18, 19, 20, 21, 23, 25, 26, 27, 28, 29, 30). It is easy to check that w ∈ I(d). The monomial S′ is the intersection with ON of the union of S′1,2, S′3,4, and S′5,6; in other words, it is just the monomial lying above d in Figure 7.2.4.
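The bookkeeping in this example can be checked mechanically. The sketch below is only illustrative and rests on two conventions that are not restated in this section and are therefore assumptions: that k∗ = 2d + 1 − k, and that the element w associated with Sw is obtained from v by deleting the column indices of Sw and adding its row indices. The reflection (r, c) ↦ (c∗, r∗) and the criterion r > r∗ for true orthogonality are taken from the text. All function names are hypothetical.

```python
d = 15
v = (1, 2, 3, 4, 9, 10, 14, 16, 18, 19, 20, 23, 24, 25, 26)
S_w = {(15, 14), (17, 16), (21, 10), (7, 4), (27, 24), (28, 3), (30, 1), (29, 2)}
w_stated = (7, 9, 15, 17, 18, 19, 20, 21, 23, 25, 26, 27, 28, 29, 30)


def star(k):
    """ASSUMED involution k -> k* = 2d + 1 - k."""
    return 2 * d + 1 - k


def sharp(pos):
    """Reflection of (r, c) in the diagonal: (c*, r*), as in the text."""
    r, c = pos
    return (star(c), star(r))


def element_from(v, S):
    """ASSUMED recipe: drop the column indices of S from v, add its row indices."""
    rows = {r for r, _ in S}
    cols = {c for _, c in S}
    return tuple(sorted((set(v) - cols) | rows))


def truly_orthogonal(sigma):
    """sigma = (r, c) gives true orthogonality when r > r*, as in the text."""
    r, _ = sigma
    return r > star(r)


w = element_from(v, S_w)
is_symmetric = all(sharp(p) in S_w for p in S_w)
diagonal = sorted(p for p in S_w if sharp(p) == p)
# Exactly one of k and k* lies in w, for every k in 1..2d.
one_of_each_pair = all((k in w) != (star(k) in w) for k in range(1, 2 * d + 1))

print(w == w_stated, is_symmetric, len(diagonal), one_of_each_pair)
print([truly_orthogonal(s) for s in [(28, 2), (21, 9), (15, 14)]])
```

Under these assumptions the recipe reproduces the stated w, the set Sw is symmetric with four diagonal elements (an even number, as Proposition 7.1.1 (3) requires), and the orthogonality test agrees with the statement that S is truly orthogonal at 1 and 3 but not at 5.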
[Figure 7.2.1: The monomial S]

[Figure 7.2.2: $S^{\mathrm{pr}}_{1,2}$, $S^{\mathrm{pr}}_{3,4}$, and $S^{\mathrm{pr}}_{5,6}$]

[Figure 7.2.3: S1,2, S3,4, and S5,6]

[Figure 7.2.4: S′1,2, S′3,4, and S′5,6]

7.3 A proposition about Sj,j+1

The aim of this subsection is to show that Sj,j+1 has depth no more than 2; see item (1b) of Proposition 7.3.3. This basic fact was mentioned above in the description of Oπ and is necessary (psychologically although not logically) to make sense of the definitions of Sj and Sj+1. We prepare the way for Proposition 7.3.3 by way of two preliminary propositions. The first of these is about elements of O-depth j and j + 1 in S, the second about the relation of these elements with σj.

Proposition 7.3.1
1. $S^{\mathrm{pr}}_k$ has no comparable elements.
2. For j an odd integer and β an element of $S^{\mathrm{pr}}_{j+1}$, there exists α in $S^{\mathrm{pr}}_j$ such that α > β. In particular, the row index of σj+1 (if σj+1 exists) is less than the row index of σj.

Proof: (1) follows from Corollary 6.1.3 (4); (2) follows from Proposition 6.3.3 and Lemma 6.3.1 (5). □

Proposition 7.3.2 Let j be an odd integer and let S be truly orthogonal at j.
1. pv(σj) > ph(σj); if α > pv(σj), then α > σj; if α > σj, then α > ph(σj).
2. No element of $S^{\mathrm{pr}}_j$ is comparable to pv(σj) or ph(σj).
3. No element of $S^{\mathrm{pr}}_{j+1}$ is comparable to ph(σj).
4. The following is not possible: α ∈ $S^{\mathrm{pr}}_j$, β ∈ $S^{\mathrm{pr}}_{j+1}$, and ph(α) > β.

Proof: (1) is trivial. (2) follows immediately from the definition of σj. We now prove (3). First suppose β > ph(σj) for some β in $S^{\mathrm{pr}}_{j+1}$. By (2) of Proposition 7.3.1, there exists α in $S^{\mathrm{pr}}_j$ such that α > β. But then the row index of α exceeds that of σj, a contradiction to the choice of σj.
We claim that it is not possible for β ∈ $S^{\mathrm{pr}}_{j+1}$ to satisfy ph(σj) > β. This being a special case of (4), we need only prove that statement. So suppose that α belongs to $S^{\mathrm{pr}}_j$ and that ph(α) > β. Let C be a v-chain in S with tail α such that O-depthC(α) = j (see Corollary 6.1.3 (1)). Concatenate C with α > β and call the resulting v-chain D. Then, by Lemma 6.3.1 (4), α is of type H in D, so that, by Lemma 6.3.1 (1), we have O-depthD(β) = O-depthD(α) + 2. But, by Proposition 6.1.1 (2), O-depthD(α) = O-depthC(α) = j, so that O-depthS(β) ≥ j + 2, a contradiction. □

Let Sj,j+1(ext) denote the set (not multiset) defined by:

\[
S_{j,j+1}(\mathrm{ext}) := \begin{cases}
S_{j,j+1} \cup \{\sigma_j, \sigma_j^{\#}\} & \text{if $S$ is truly orthogonal at $j$,}\\
S_{j,j+1} & \text{otherwise.}
\end{cases}
\]

Here Sj,j+1 on the right stands for the underlying set of the multiset Sj,j+1 defined above. The set Sj,j+1(ext) is the disjoint union of the sets Sj(ext) and Sj+1(ext) defined as follows (here again the terms on the right hand side denote the underlying sets of the corresponding multisets):

\[
S_j(\mathrm{ext}) := \begin{cases}
S^{\mathrm{pr}}_j \cup \bigl(S^{\mathrm{pr}}_j\bigr)^{\#} \cup \{p_v(\sigma_j)\} & \text{if $S$ is truly orthogonal at $j$,}\\
S^{\mathrm{pr}}_j \cup \bigl(S^{\mathrm{pr}}_j\bigr)^{\#} & \text{otherwise,}
\end{cases}
\]

\[
S_{j+1}(\mathrm{ext}) := \begin{cases}
S^{\mathrm{pr}}_{j+1} \cup \bigl(S^{\mathrm{pr}}_{j+1}\bigr)^{\#} \cup \{p_h(\sigma_j)\} & \text{if $S$ is truly orthogonal at $j$,}\\
S^{\mathrm{pr}}_{j+1} \cup \bigl(S^{\mathrm{pr}}_{j+1}\bigr)^{\#} & \text{otherwise.}
\end{cases}
\]

Proposition 7.3.3
1. Sj(ext) (respectively Sj+1(ext)) is precisely the set of elements of depth 1 (respectively 2) in Sj,j+1(ext). In particular,
(a) Neither Sj(ext) nor Sj+1(ext) contains comparable elements.
(b) The length of a v-chain in Sj,j+1(ext) is at most 2.
(c) There is a v-chain of length 2 in Sj,j+1 unless Sj+1(ext) is empty.
2. Let k be a positive integer, not necessarily odd. If there is in S an element of O-depth at least k, then Sk(ext) is non-empty. The converse also holds except possibly if k is even and S is truly orthogonal at k − 1. In particular, if Sk(ext) is non-empty, then there is an element of O-depth at least k − 1.

Proof: (1): It is enough to show that every element of Sj(ext)(up) (respectively Sj+1(ext)(up)) is of depth 1 (respectively 2) in Sj,j+1(ext)(up), for • α > β implies α(up) > β(up) for elements α, β of N.
• Sj,j+1(ext) = Sj(ext) ∪Sj+1(ext). • Sj,j+1(ext), Sj(ext), and Sj+1(ext) are symmetric. In turn, it is enough to show the following: (i) Every element of Sj(ext)(up) has depth 1. (ii) Sj+1(ext)(up) has no comparable elements. (iii) Every element of Sj+1(ext)(up) has depth at least 2. Item (i) follows from Proposition 7.3.1 and Proposition 7.3.2 (2); item (ii) from Proposition 7.3.1 (1) and Proposition 7.3.2 (3); item (iii) from Propo- sition 7.3.1 (2) and Proposition 7.3.2 (1). (2): The first assertion follows from Lemma 6.3.1 (5): if k is odd there is an element ofO-depth k inS; if k is even and there is no element ofO-depth k inS, then there is in S an element of O-depth k− 1 and of type H, so S is truly or- thogonal at k−1. The second assertion is clear from the definition of Sk(ext). � Corollary 7.3.4 1. No element of Sj,j+1 has depth more than 2. 2. Sj+1(ext) = Sj+1 and Sj(ext) ∩ Sj,j+1 = Sj (as sets). In particular, Sj+1 = Sj,j+1∩Sj+1(ext) and Sj = Sj,j+1∩Sj(ext) as multisets defined by the intersection of a multiset with a subset. Proof: (1): Since Sj,j+1 ⊆ Sj,j+1(ext) (as sets), this follows immediately from (1b) of the proposition above. (2): Since the union of Sj+1(ext) (which always is contained in Sj,j+1) and Sj(ext) ∩Sj,j+1 is all of Sj,j+1, and since Sj , Sj+1 are disjoint, it it enough to show that Sj+1(ext) ⊆ Sj+1 and Sj(ext) ∩Sj,j+1 ⊆ Sj . Now, since elements of Sj(ext) have depth 1 even in Sj,j+1(ext) (by item (1) of the proposition above), it is immediate that Sj(ext) ∩ Sj,j+1 ⊆ Sj . And it follows from the proof of item (iii) in the proof of item (1) of the proposi- tion above that an element of Sj+1(ext) has depth 2 even in Sj,j+1 (not just in Sj,j+1(ext)), so that Sj+1(ext) ⊆ Sj+1. � 7.4 Proof of Proposition 7.1.1 (1) The monomials Sj,j+1 are clearly symmetric. Observe that α in Sj,j+1 has the same depth as α#, for α1 > α2 implies α(up) > α2(up) and α(down) > α2(down) for α1, α2 in N. Thus the monomials Sk are symmetric. 
Since the map π of [7] respects #—see Proposition 5.7 of [4]—it follows that Sw(k) and S′k are symmetric. Therefore so are ∪kSw(k) and ∪kS (2) This follows from Corollary 9.3.6. (3) IfS is truly orthogonal at j, then pv(σj) and ph(σj) are diagonal elements respectively in Sj and Sj+1—see Corollary 7.3.4 (2). Thus both Sj and Sj+1 have diagonal blocks in the sense of Proposition 5.10 (A) of [4]. It follows from the result just quoted that both Sw(j) and Sw(j+1) meet the diagonal. It is of course clear that each Sw(k) meets the diagonal at most once since diagonal elements are clearly comparable but elements of Sw(k) are not by Lemma 4.9 of [7]. Suppose that S is not truly orthogonal at j. Then σj and σ j belong to different blocks—this is equivalent to the definition of S being not truly orthog- onal at j. By Proposition 7.3.1 (2), it follows that σj+1 and σ j+1 also belong to different blocks. So neither Sj nor Sj+1 has a diagonal block. (4) If S is not truly orthogonal at j, then neither Sj nor Sj+1 has a diagonal block (as has just been said above), and it follows from Proposition 5.10 (A) of [4] that neither S′j nor S j+1 meets the diagonal. So suppose that S is truly orthogonal at j. Then both Sj and Sj+1 have a diagonal entry each of multiplicity 1, namely pv(σj) and ph(σj) respectively. It is clear from the definition of σj that no element of Sj(up) shares its row index with pv(σj). And it follows from Proposition 7.3.1 (2) that no element of Sj+1(up) shares its row index with ph(σj). It now follows from the proof of Proposition 5.10 (B) of [4]—see the last line of that proof—that neither S′j nor S′j+1 meets the diagonal. � 7.5 More observations Proposition 7.5.1 The length of any v-chain in Sj,j+1∪S j+1 is at most 2. Proof: By Corollary 7.3.4 (1), the length of any v-chain in Sj,j+1 is at most 2. Applying Lemma 9.1.1 to Sj,j+1, we get the desired result. � Proposition 7.5.2 1. 
For an element α′ = (r, c) of S′k(up), there exists an element α = (r, C) of S with C ≤ c.
2. For an element α′ = (r, c) of S′j+1(up), there exists an element α = (R, c) of Sj+1(up) with r ≤ R.
3. For an element α′ of S′j+1(up), there exists an element α of S j with α > α′.

Proof: (1) That there exists α in Sk(up) with C ≤ c follows from the definition of S′k(up). Clearly such an α cannot be on the diagonal, so α belongs to S.

(2) As in the proof of (1), it follows from the definition of S′j+1 that there exists α = (R, c) in Sj+1 with r ≤ R. If α lies strictly below the diagonal, then c > R∗, so that α∗ = (c∗, R∗) > α′ = (r, c), a contradiction to Lemma 9.1.1 (α∗ belongs to Sj+1 by the symmetry of Sj+1). Thus α belongs to Sj+1(up).

(3) Writing α′ = (r, c), by (1), we can find a β = (r, C) in S j+1 with C ≤ c. By Proposition 7.3.1 (2), there exists α in S j,j+1 such that α > β. □

Corollary 7.5.3 If in S′j(up) ∪ S′j+1(up) there exists an element with horizontal projection in N, then S is truly orthogonal at j.

Proof: Follows directly from Proposition 7.5.2 (1) and (3). □

Proposition 7.5.4 The O-depth of an element in S j ∪ S j+1 is at most 2. More strongly, the O-depth of an element in S j,j+1 ∪ S j(up) ∪ S j+1(up) is at most 2.

Proof: It is enough to show that no element in S′j(up) ∪ S′j+1(up) has O-depth more than 2, for we may assume by increasing multiplicities that S j ⊆ S j(up) and S j+1 ⊆ S j+1(up) (as sets). It follows from Proposition 7.5.1 that a v-chain in S′j(up) ∪ S′j+1(up) has length at most 2. Let α′1 = (r1, c1) > α′2 = (r2, c2) be such a v-chain. It follows from the proof of Corollary 4.14 (2) of [7] that α′1 ∈ S′j(up) and α′2 ∈ S′j+1(up). By item (1) of Lemma 6.3.1, it is enough to rule out the following possibility: α′1 is of type H in α′1 > α′2 and ph(α′1) > α′2. Suppose that this is the case. By Proposition 7.5.2 (1) and (2), it follows that there exist elements α1 = (r1, C1) ∈ S j,j+1 and α2 = (R2, c2) ∈ Sj+1(up) with C1 ≤ c1 and r2 ≤ R2.
Since ph(α′1) > α′2, it follows that α1 > α2. Now, if α2 = ph(σj), then Proposition 7.3.2 (2) is contradicted; if α2 belongs to S, Proposition 7.3.2 (4) is contradicted (because ph(α1) > α2). □

8 The map Oφ

The purpose of this section is to describe the map Oφ and prove some basic facts about it. Certain proofs here refer to results from §9, but there is no circularity: postponing the definition of Oφ until all the results needed for it have been proved would hurt rather than help readability. As in §7, the symbol j is reserved for an odd integer throughout this section.

8.1 Description of Oφ

The map Oφ takes as input a pair (w, T), where T is a monomial, possibly empty, in ON and w ≥ v is an element of I(d) that O-dominates T, and produces as output a monomial T∗ of ON. To describe Oφ, we first partition T into subsets Tw,j,j+1. As the subscript w in Tw,j,j+1 suggests, this partition depends on w.

For an odd integer j, let Sjw (respectively Sw,j,j+1) denote the subset of Sw consisting of those elements that are j-deep (respectively that are j-deep but not (j+2)-deep, or equivalently of depth j or j+1) in Sw in the sense of [7, §4]. Since Sw is distinguished, symmetric, and has evenly many elements on the diagonal d, it follows that Sjw and Sw,j,j+1 too have these properties, and that, in fact, the number of diagonal elements of Sw,j,j+1 is either 0 or 2 (in the latter case, the elements have to be distinct since Sw is distinguished and so is multiplicity free). Let us denote by wj and wj,j+1 the elements of I(d) corresponding to Sjw and Sw,j,j+1 by Proposition 5.2.1.

Let Tw,j,j+1 denote the subset of T consisting of those elements α such that
• every v-chain in T with head α is O-dominated by wj, and
• there exists a v-chain in T with head α that is not O-dominated by wj+2.
It is evident that the subsets Tw,j,j+1 are disjoint (as j varies over the odd integers) and that their union is all of T (for w = w1 O-dominates all v-chains in T by hypothesis, and Sjw is empty for large j, so that wj = v). In other words, the Tw,j,j+1 form a partition of T.

Lemma 8.1.1
1. The length of a v-chain in Tw,j,j+1 ∪ T#w,j,j+1 is at most 2. In fact, the O-depth of any element in Tw,j,j+1 is at most 2.
2. wj,j+1 O-dominates Tw,j,j+1.

Proof: The lemma follows rather easily from Corollary 9.2.3, as we now show. Let C be a v-chain in Tw,j,j+1. Let τ be the tail of C. Choose a v-chain D in T with head τ that is not O-dominated by wj+2. Let E be the concatenation of C with D. Since the head of E belongs to Tw,j,j+1, it follows that E is O-dominated by wj. It follows from (the only if part of) Corollary 9.2.3 (applied with S = E and x = wj) that wj,j+1 O-dominates E 1 ∪ E 2 and wj+2 O-dominates E3,pr. This means τ ∉ E3,pr, so τ ∈ E 1 ∪ E 2, and so C ⊆ E 1 ∪ E 2. This proves (2). By Proposition 6.1.1 (2), the O-depths of elements of C are the same in C and E, so C ⊆ C 1 ∪ C 2, which proves the second assertion of (1). The first assertion of (1) follows from the second (see Lemma 6.2.1). □

Corollary 8.1.2 wj,j+1 dominates Tw,j,j+1 ∪ T#w,j,j+1 in the sense of [7].

Proof: This follows from (2) of Lemma 8.1.1 and Corollary 5.3.6 (the latter applied with S = Tw,j,j+1 and w = wj,j+1). □

We may therefore apply the map φ of [7, §4] to the pair (wj,j+1, Tw,j,j+1 ∪ T#w,j,j+1) to obtain a monomial (Tw,j,j+1 ∪ T#w,j,j+1)⋆ in N. In applying φ, there is the partitioning of Tw,j,j+1 ∪ T#w,j,j+1 into “pieces”, these being indexed by elements of Swj,j+1 = Sw,j,j+1; observe that the elements of depth 1 (respectively 2) of Sw,j,j+1 are precisely those of Sw of depth j (respectively j + 1). We denote by Pβ the piece of Tw,j,j+1 ∪ T#w,j,j+1 corresponding to β in Swj,j+1. We also use the notation P∗β as in [7].
Moreover, we will use the phrase piece of T (with respect to w being implicitly understood) to refer to a piece of Tw,j,j+1 ∪ T#w,j,j+1 for some odd integer j.

Caution: Thinking of T as a monomial in N and w as an element of I(d, 2d) that dominates it, there is, as in [7], the notion of “piece of T” (with respect to w). The two notions of “piece” are different.

Lemma 8.1.3
1. The monomial (Tw,j,j+1 ∪ T#w,j,j+1)⋆ is symmetric and has either none or two distinct diagonal elements depending exactly on whether Swj,j+1 = Sw,j,j+1 has 0 or 2 elements on the diagonal.
2. The depth of (Tw,j,j+1 ∪ T#w,j,j+1)⋆ is 2; and ∪β∈(Sw)j P∗β, ∪β∈(Sw)j+1 P∗β are respectively the elements of depth 1 and 2 in (Tw,j,j+1 ∪ T#w,j,j+1)⋆.

Proof: (1) The symmetry follows by combining Proposition 5.6 of [4], which says that the map π respects the involution #, with Proposition 4.2 of [7], which says that π and φ are inverses of each other. The assertion about diagonal elements follows by combining item (B) of [4, Proposition 5.10], which is an assertion about the existence and relative multiplicities of diagonal elements in B and B′ where B is a diagonal block of a monomial in N, and Proposition 4.2 of [7].

(2) It follows from Proposition 4.2 of [7] that the map π (described in §4 of that paper) applied to (Tw,j,j+1 ∪ T#w,j,j+1)⋆ results in the pair (wj,j+1, Tw,j,j+1 ∪ T#w,j,j+1). It now follows from Lemma 4.16 of [7] that the depth of (Tw,j,j+1 ∪ T#w,j,j+1)⋆ is exactly 2. The latter assertions again follow from the results of [7]: in fact, the proof that π ◦ φ is identity on pages 47–49 of [7] shows that the P∗β are the blocks in the sense of [7] of the monomial (Tw,j,j+1 ∪ T#w,j,j+1)⋆. □

Suppose that (Tw,j,j+1 ∪ T#w,j,j+1)⋆ contains the pair (a, a∗), (b, b∗) of diagonal elements with a > b. We call the pair (b, a∗), (a, b∗) the “twists,” and set δj := (b, a∗).
In other words, δj is the element of the twisted pair that lies above the diagonal; observe that the twisted elements are reflections of each other. We allow ourselves the following ways of expressing the condition that (Tw,j,j+1 ∪ T#w,j,j+1)⋆ has diagonal elements: δj exists; w is diagonal at j (the latter expression is justified by the lemma above).

With notation as above, consider the new monomial defined as

$$
\begin{cases}
(T_{w,j,j+1} \cup T^{\#}_{w,j,j+1})^{\star} & \text{if } w \text{ is not diagonal at } j,\\[2pt]
\bigl((T_{w,j,j+1} \cup T^{\#}_{w,j,j+1})^{\star} \setminus d\bigr) \cup \{\delta_j, \delta_j^{\#}\} & \text{if } w \text{ is diagonal at } j.
\end{cases}
$$

This new monomial is symmetric and contains no diagonal elements. Its intersection with ON is denoted T⋆w,j,j+1. In other words, T⋆w,j,j+1 is the intersection of the new monomial with the subset of N of those elements that lie strictly above the diagonal. The union of T⋆w,j,j+1 over all odd integers j is defined to be T⋆w, the result of Oφ applied to (w, T). This finishes the description of the map Oφ.

For β in Sw,j,j+1(up), we define the “orthogonal piece-star” OP∗β corresponding to β as

$$
OP^{*}_{\beta} :=
\begin{cases}
P^{*}_{\beta} = P^{*}_{\beta}(\mathrm{up}) & \text{if } \beta \text{ is not on the diagonal},\\[2pt]
P^{*}_{\beta} \cap ON & \text{if } \beta \in (S_w)_{j+1} \text{ is on the diagonal},\\[2pt]
(P^{*}_{\beta} \cap ON) \cup \{\delta_j\} & \text{if } \beta \in (S_w)_{j} \text{ is on the diagonal}.
\end{cases}
\tag{8.1.1}
$$

With this, we can say that T⋆w is the union of OP∗β as β varies over Sw(up).

Lemma 8.1.4 Suppose that (Tw,j,j+1 ∪ T#w,j,j+1)⋆ contains the pair (a, a∗), (b, b∗) of diagonal elements with a > b. Let
. . . , (r1, c1), (a, a∗), (c∗1, r∗1), . . . ;
. . . , (r2, c2), (b, b∗), (c∗2, r∗2), . . .
be respectively the elements of depth 1 and 2 of (Tw,j,j+1 ∪ T#w,j,j+1)⋆ arranged in increasing order of row and column indices. Then
1. c1 ≤ a∗ and r1 ≤ b (assuming (r1, c1) exists); and
2. r2 < b and c2 ≤ b∗ (assuming (r2, c2) exists).

Proof: (1) Suppose that (r1, c1) exists. It is clear that c1 ≤ a∗. From the way the map φ of [7] is defined, it follows that (r1, a∗) is an element of Tw,j,j+1. Suppose that r1 > b. Then ph(r1, a∗) = (r1, r∗1) belongs to N. We consider two cases.
If (r2, c2) exists, then, again from the definition of the map φ, it follows that (r2, b∗) is an element of Tw,j,j+1. But then ph(r1, a∗) = (r1, r∗1) > (b, b∗) and (b, b∗) dominates (r2, b∗), which means that the v-chain (r1, a∗) > (r2, b∗) (note that a∗ < b∗ because a > b by hypothesis) in Tw,j,j+1 has O-depth more than 2, a contradiction to Lemma 8.1.1 (1).

Now suppose that (r2, c2) does not exist. (Then (b, b∗) is the diagonal element in (Sw)j+1.) Consider the singleton v-chain C := {(r1, a∗)} in Tw,j,j+1. Then SC = {(a, a∗), (r1, r∗1)}, which is not dominated by wj,j+1, a contradiction to Lemma 8.1.1 (2).

(2) Suppose that (r2, c2) exists. Then there exists, by the definition of the map φ, an element (r2, b∗) in Tw,j,j+1. Since (r2, b∗) lies above the diagonal, it follows that r2 < b. That c2 ≤ b∗ is clear. □

8.2 Basic facts about Tw,j,j+1 and T⋆w,j,j+1

Lemma 8.2.1
1. Let α′ > α be elements of T. Let j and j′ be the odd integers such that α′ ∈ Tw,j′,j′+1 and α ∈ Tw,j,j+1. Then j′ ≤ j.
2. If, further, either (a) there exists µ in T such that α′ > µ > α, or (b) α′ ∈ Pβ′ for β′ in (Sw)j′+1, then j′ < j.

Proof: (1) By hypothesis, every v-chain with head α′ is O-dominated by wj′. This implies, by Corollary 6.1.2, that every v-chain with head α is O-dominated by wj′. This shows j′ ≤ j.

(2a) Suppose that j′ = j. It follows from (1) that α′, µ, and α all belong to Tw,j,j+1. But then α′ > µ > α is a v-chain of length 3 in Tw,j,j+1, a contradiction to Lemma 8.1.1 (1).

(2b) Suppose that j′ = j. Then α′ > α is a v-chain in Tw,j,j+1. Being of length 2, it cannot be dominated by (Sw)j+1, which means, by the definition of Pβ′, that α′ cannot belong to Pβ′, a contradiction. □

Proposition 8.2.2
1. The length of a v-chain in T⋆w,j,j+1 is at most 2.
2. The O-depth of T⋆w,j,j+1 is at most 2.
3.
∪β∈(Sw)j(up) OP∗β is precisely the set of depth 1 elements of T⋆w,j,j+1 (in particular, no two elements there are comparable); if δj exists, then it is the last element of ∪β∈(Sw)j(up) OP∗β when the elements are arranged in increasing order of row and column indices.
4. ∪β∈(Sw)j+1(up) OP∗β is precisely the set of depth 2 elements of T⋆w,j,j+1 (in particular, no two elements there are comparable); if δj exists, then its row index exceeds the row index of any element in ∪β∈(Sw)j+1(up) OP∗β.

Proof: For (1), it is enough, given Lemma 8.1.3 (2), to show that δj is not comparable to any element of depth 1 of (Tw,j,j+1 ∪ T#w,j,j+1)⋆, and this follows from Lemma 8.1.4 (1). In fact, the above argument also proves (3).

For (4), it is enough, given Lemma 8.1.3 (2), the symmetry of the monomials involved in that lemma, and the observation that α > β implies α(up) > β(up) for elements α, β of N, to show the following: if (a, a∗) > γ = (e, f) for γ an element of (Tw,j,j+1 ∪ T#w,j,j+1)⋆ lying (strictly) above the diagonal, then δj > γ. But this follows from Lemma 8.1.4 (2): γ is a depth 2 element in (Tw,j,j+1 ∪ T#w,j,j+1)⋆, and we have e ≤ r2 < b (and a∗ < f since (a, a∗) > γ). In fact, the above argument also proves (2): observe that f ≤ b∗ (Lemma 8.1.4 (2)). □

9 Some Lemmas

The main combinatorial results of this paper are Propositions 4.1.1 and 4.1.2. They are analogues respectively of Propositions 4.1 and 4.2 of [7]. We have tried to preserve the structure of the proofs in [7] of those propositions. The proofs in [7] rely on certain lemmas, and it is natural therefore to first establish the orthogonal analogues of those. The purpose of this section is precisely that. Needless to say, the lemmas (especially those in §9.4) may be unintelligible until one tries to read §10.

The division of this section into four subsections is also suggested by the structure of the proofs in [7]. Each subsection has at its beginning a brief description of its contents.
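As a computational aid for the lemmas that follow, the notions of v-chain, depth, and domination of one element by another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: elements of N are modelled as hypothetical integer pairs (row, column); the comparison α = (r, c) > β = (R, C) is taken to mean r > R and c < C, which is how the order is used in §8.1 (for instance, the chain (r1, a∗) > (r2, b∗) there relies on a∗ < b∗); and (R, C) is taken to dominate (r, c) when C ≤ c and r ≤ R, as in the statement of Lemma 9.1.4. The function names are illustrative and do not come from [7].

```python
def greater(alpha, beta):
    """alpha > beta: strictly larger row and strictly smaller column (assumed convention)."""
    return alpha[0] > beta[0] and alpha[1] < beta[1]


def dominates(beta, alpha):
    """beta = (R, C) dominates alpha = (r, c) when C <= c and r <= R (assumed convention)."""
    return beta[1] <= alpha[1] and alpha[0] <= beta[0]


def depths(monomial):
    """Return {element: depth}, where depth is the largest length of a v-chain with that tail.

    Multiplicities play no role in the depth, so the monomial is treated as a set.
    """
    # Every element that can precede e in a v-chain has a strictly larger row,
    # so processing by decreasing row guarantees that predecessors come first.
    elements = sorted(set(monomial), key=lambda e: -e[0])
    depth = {}
    for e in elements:
        depth[e] = 1 + max((d for p, d in depth.items() if greater(p, e)), default=0)
    return depth


# Hypothetical monomial, chosen only to exercise the definitions.
S = [(6, 1), (5, 2), (5, 4), (4, 3), (4, 3)]
d = depths(S)
print(d)                # (6, 1) has depth 1; (5, 2) and (5, 4) depth 2; (4, 3) depth 3
print(max(d.values()))  # depth of the monomial: 3
```

In this toy monomial the longest v-chain is (6, 1) > (5, 2) > (4, 3), while (5, 4) and (4, 3) are not comparable because the column of (5, 4) is not smaller. Grouping the keys of the returned dictionary by value gives the sets of elements of depth exactly k and of depth at least k that §9.1 singles out.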
9.1 Lemmas from the Grassmannian case

In this subsection, the terminology and notation of [7, §4] are in force. The statements here could have been made in [7, §4] and would perhaps have improved the efficiency of the proofs there, but do not appear there explicitly.

Let S be a monomial in N. Recall from [7] the notion of depth of an element α in S: it is the largest possible length of a v-chain in S with tail α and is denoted depth α. The depth of S is the maximum of the depths in it of all its elements. We denote by Sk the set of elements of depth k of S (as in [7]) and by S^k (with superscript k) the set of elements of depth at least k of S.

Caution: For a monomial S of ON, we have introduced in §7.1 the notation Sk. That is different from the Sk we have just defined.

Lemma 9.1.1 Let S be a monomial in N, and let π(S) = (w, S′), where π is the map defined in [7, §4]. Then the maximum length of a v-chain in S ∪ S′ is the same as the maximum length of a v-chain in S.

Proof: We use the notation of [7, §4] freely. Let d be the maximum length of a v-chain in S. Suppose α1 > . . . > αℓ is a v-chain in S ∪ S′. Let i1, . . . , iℓ be such that αj belongs to Sij ∪ S′ij (the integers ij are uniquely determined; see Corollary 5.4 of [4]). We claim that i1 < . . . < iℓ. This suffices to prove the lemma, for Sk ∪ S′k is empty for k > d. To prove the claim, it is enough to show i1 < i2. It follows from Lemma 4.10 of [7] that i1 ≠ i2. We now assume that i1 > i2 and arrive at a contradiction.

First suppose that α1 ∈ Si1. Then, by the definition of Si1, there exists β in Si2 with β > α1. Now β > α2 and both β, α2 belong to Si2 ∪ S′i2, a contradiction to [7, Lemma 4.10].

If α1 = (r, c) belongs to S′i1, then, by the definition of S′i1, there exists (r, a) in Si1 with a ≤ c, and there exists β in Si2 with β > (r, a). This leads to the same contradiction as before. □

Lemma 9.1.2 Let B and U be monomials in N. Assume that
• the elements of B form a single block (in the sense of [7, Page 38]).
• U has depth 1 (equivalently, there are no comparable elements in U).
• for every β = (r, c) in B, there exist $\gamma^1(\beta) = (R^1, C^1)$ and $\gamma^2(\beta) = (R^2, C^2)$ in U such that $C^1 < c$, $C^2 < R^1$, $r < R^2$ (this holds, for example, when there exists γ(β) in U such that γ(β) > β: take $\gamma^1(\beta) = \gamma^2(\beta) = \gamma(\beta)$).

Then there exists a unique block C of U such that w(C) > w(B).

Proof: It is useful to isolate the following observation:

Lemma 9.1.3 Let (r1, c1) and (r2, c2) be elements of N with c2 < r1 ≤ r2. Let $\gamma^1_1 = (R^1_1, C^1_1)$, $\gamma^2_1 = (R^2_1, C^2_1)$ and $\gamma^1_2 = (R^1_2, C^1_2)$, $\gamma^2_2 = (R^2_2, C^2_2)$ be elements of N such that
1. $C^1_1 \le c_1$, $C^2_1 < R^1_1$, $r_1 \le R^2_1$.
2. $C^1_2 \le c_2$, $C^2_2 < R^1_2$, $r_2 \le R^2_2$.
3. No two of $\gamma^1_1, \gamma^2_1, \gamma^1_2, \gamma^2_2$ are comparable (they could well be equal and this is important for us; see our definition of comparability).
Then the monomial $\{\gamma^1_1, \gamma^2_1, \gamma^1_2, \gamma^2_2\}$ consists of a single block.

Proof: It follows from assumption (1) that $\gamma^1_1$ and $\gamma^2_1$ belong to a single block:
• if $R^1_1 < R^2_1$, then $C^2_1 < R^1_1$ becomes relevant;
• if $R^2_1 < R^1_1$, then the other two inequalities in (1) become relevant: $C^1_1 \le c_1 < r_1 \le R^2_1$.
Similarly it follows from assumption (2) that $\gamma^1_2$ and $\gamma^2_2$ belong to a single block. We therefore need only consider the cases when, in the arrangement of the elements $\{\gamma^1_1, \gamma^2_1, \gamma^1_2, \gamma^2_2\}$ in increasing order of row indices, both $\gamma^1_1, \gamma^2_1$ come before or after $\gamma^1_2, \gamma^2_2$. In the former case, the first sequence of inequalities below shows that $\gamma^2_1$ and $\gamma^1_2$ belong to the same block, and we are done; in the latter case, the second sequence of inequalities below shows that $\gamma^2_2$ and $\gamma^1_1$ belong to the same block, and we are done:
• $C^1_2 \le c_2 < r_1 \le R^2_1$
• $C^1_1 \le c_1 < r_1 \le r_2 \le R^2_2$

Continuing with the proof of Lemma 9.1.2, we first prove the existence part. Arrange the elements of B in non-decreasing order of row numbers as well as column numbers (this is possible since there are no comparable elements in B). If β1 = (r1, c1) and β2 = (r2, c2) are successive elements, then c2 < r1 ≤ r2 (since B is a single block).
Apply Lemma 9.1.3 with $\gamma^1_1 = \gamma^1(\beta_1)$, $\gamma^2_1 = \gamma^2(\beta_1)$, and $\gamma^1_2 = \gamma^1(\beta_2)$, $\gamma^2_2 = \gamma^2(\beta_2)$. We conclude that $\{\gamma^1_1, \gamma^2_1, \gamma^1_2, \gamma^2_2\}$ belongs to a single block, say C, of U. Continuing thus, we conclude that all $\gamma^1(\beta)$ and $\gamma^2(\beta)$, as β varies over B, belong to C. Since the row (respectively column) index of w(C) is the maximum (respectively minimum) of all row (respectively column) indices of elements of C (and similarly for B), it follows that w(C) > w(B).

To prove uniqueness, let C1 and C2 be two blocks of U with w(C1) > w(B) and w(C2) > w(B). Apply the lemma with (r1, c1) = (r2, c2) = w(B) and $\gamma^1_1 = \gamma^2_1 = w(C_1)$ and $\gamma^1_2 = \gamma^2_2 = w(C_2)$; it follows from [7, Lemma 4.9] that w(C1) and w(C2) are not comparable. But, unless C1 = C2, neither is the monomial {w(C1), w(C2)} a single block, again by [7, Lemma 4.9]. □

Lemma 9.1.4 Let S be a monomial in N and x an element of I(d, n). For x to dominate S it is necessary and sufficient that for every α = (r, c) in S there exist β = (R, C) in Sx with C ≤ c, r ≤ R, and depthSx β ≥ depthS α. (Here Sx denotes the distinguished monomial in N associated to x as in [7, Proposition 4.3].)

Proof: The lemma is a corollary of [7, Lemma 4.5], as we now show. First suppose that x dominates S. Let α = (r, c) be an element of S, and C a v-chain in S with tail α and length depth α. Since x dominates C, there exists, by [7, Lemma 4.5], a chain D in Sx of length depthS α and tail β = (R, C) with C ≤ c and r ≤ R, and we are done with the proof of the necessity.

To prove the sufficiency, let C : α1 = (r1, c1) > . . . > αk = (rk, ck) be a v-chain in S. By hypothesis, there exist β1 = (R1, C1), . . . , βk = (Rk, Ck) in Sx with Ci ≤ ci, ri ≤ Ri, and depthSx βi = i for 1 ≤ i ≤ k (observe that replacing the ≥ in the latter condition of the statement by an equality yields an equivalent statement). We claim that β1 > . . . > βk. By [7, Lemma 4.5], it suffices to prove the claim.
Since βk has depth k in Sx, there exists a β′k−1 = (R′k−1, C′k−1) of depth k − 1 in Sx such that β′k−1 > βk. It follows from the distinguishedness of Sx that β′k−1 = βk−1: if not, then we have two distinct elements of the same depth (namely k−1) in Sx both dominating αk, a contradiction. So βk−1 > βk, and the claim is proved by continuing in a similar fashion. □

Let x be an element of I(d, n). Let Sx denote the distinguished monomial in N associated to x as in [7, Proposition 4.3]. For k a positive integer, let xk denote the element of I(d, n) corresponding to the distinguished subset (Sx)k. For a monomial S of N, let Sk,k+1 := Sk ∪ Sk+1. Let xk,k+1 denote the element of I(d, n) corresponding to the distinguished monomial (Sx)k,k+1; let x^k denote the element of I(d, n) corresponding to the distinguished subset (Sx)^k.

Caution: For a monomial S of ON and an odd integer j, we have introduced in §7.1 the notation Sj,j+1. That is different from the Sk,k+1 just defined.

Corollary 9.1.5 x dominates S ⇔ xk dominates Sk ∀ k ⇔ x1,2 dominates S1,2 and x^3 dominates S^3.

Proof: The first equivalence is a restatement of the lemma: in the statement of the lemma we could equally well have written depth β = depth α. The second follows from the first and the following observations: (S1,2)1 = S1, (S1,2)2 = S2, (S^3)k = Sk+2; and (x1,2)1 = x1, (x1,2)2 = x2, (x^3)k = xk+2. □

9.2 Orthogonal analogues of Lemmas of 9.1

Lemma 9.2.2 below is the orthogonal analogue of Lemma 9.1.4 (more precisely, that of the first assertion of Corollary 9.1.5). The following proposition will be used in its proof.

Proposition 9.2.1 Let x be an element of I(d) and S a monomial in ON. Then x O-dominates S 1 ∪ S 2 if and only if it O-dominates every v-chain in S of O-depth at most 2.

Proof: The “if” part is immediate from definitions (in any case, see also Proposition 7.5.4). For the “only if” part, let C be a v-chain in S of O-depth at most 2. Our goal is to show that x dominates SC.
For this, it is enough, by Corollary 9.1.5, to show that x1 dominates (SC)1 and x2 dominates (SC)2 (by choice of C, (SC)k is empty for k ≥ 3).

Let α′ ∈ (SC)1. Choose α in C such that α′ ∈ SC,α. Choose α0 in S 1 such that α0 dominates α. Since x O-dominates the singleton v-chain {α0}, it follows that x1 dominates q{α0},α0. We claim that q{α0},α0 dominates α′. To prove the claim, we need only rule out the possibility that α0 is of type S in {α0} and α of type V in C. Since α′ ∈ (SC)1, it follows from Proposition 5.3.4 (1) that α is the first element of C. In particular, if α is of type V in C, then ph(α) ∈ N, so ph(α0) ∈ N, and α0 is of type H in {α0}. The claim is thus proved.

Now consider an element of (SC)2. Observe that the length of C is at most 2 (Lemma 6.2.1). So our element is either the horizontal projection ph(α) of the head α of C, or it is qC,β where β is the tail of C. In the first case, let α0 be as in the previous paragraph, and proceed similarly. It is clear that ph(α0) ∈ N (because ph(α) ∈ N); x2 dominates ph(α0) and so also ph(α).

Now we handle the second case. If β ∈ S 2, then C is contained in S and there is nothing to prove. So assume that O-depth (β) ≥ 3. Choose a v-chain D in S with tail β, O-depthD(β) ≥ 3, and with the good property as in Proposition 6.3.3. There occurs in D an element of O-depth 3, say δ (Lemma 6.3.1 (5)). Let A denote the part δ > . . . of D and C′ the part up to but not including δ. There clearly is an element (call it µ) of depth 2 in SD that dominates qD,β. This element µ belongs to SC′ (Corollary 6.3.5 (3)). Since D has the good property of Proposition 6.3.3, C′ ⊆ S 2, so µ is dominated by an element in (Sx)2. In particular, qD,β is dominated by the same element of (Sx)2.

We are still not done, for it is possible that qD,β be β and qC,β be pv(β). Suppose that this is the case. Then α > β is connected. So ph(α) ∈ N and the legs of α and β intertwine.
As seen above in the third paragraph of the present proof, there is an element of (Sx)2 that dominates ph(α). By the distinguishedness of Sx, it follows that the element in (Sx)2 dominating β is the same as the one dominating ph(α). By the symmetry of Sx, this element lies on the diagonal and so dominates pv(β), and, finally, we are done with the proof in the second case. □

Lemma 9.2.2 Let S be a monomial in ON and x an element of I(d). For x to O-dominate S it is necessary and sufficient that, for every odd integer j, every v-chain in S j+1 is O-dominated by xj,j+1.

Proof: First suppose that x dominates S. Let j be an odd integer and let A be a v-chain in S j+1. We need to show that xj,j+1 dominates SA. For this, we may assume that A is maximal (by Corollary 6.1.2). By Corollary 6.1.3 (3), the length of A is at most 2. By Lemma 6.3.1 (5) (b), for every β in S j+1 there exists α in S j with α > β. Thus we may assume that the head α of A belongs to S j.

It is enough to show (see [7, Lemma 4.5]) that for any v-chain E in SA
• the length of E is at most 2;
• there exists an x-dominated monomial in N containing E and the head of E is an element of depth at least j in that monomial.

The first of these conditions holds by Proposition 7.5.4. We now show that the second holds. We may assume that E is maximal in SA. By Proposition 5.3.4 (1), the head of E is qA,α. Let C be a v-chain in S with tail α such that O-depthC(α) = j. Let D be the concatenation of C with A. We claim that the monomial SD has the desired properties. That SD is x-dominated is clear (since x O-dominates S). By Corollary 6.3.5, it follows that qD,α = qA,α and SA ⊆ SD (in particular that E ⊆ SD). By Proposition 6.1.1 (2), O-depthD(α) = O-depthC(α) = j, that is, depth qD,α = j. The proof of the necessity is thus complete.

To prove the sufficiency, proceed by induction on the largest odd integer J such that S J+1 is non-empty. When J = 1, there is nothing to prove, for S 1 ∪ S 2 = S and x1,2 O-dominates S 1 ∪ S 2.
So suppose that J ≥ 3. We implicitly use Corollary 6.3.7 in what follows. By induction, x3 O-dominates S3,4. Let D be a v-chain in S. Our goal is to show that x dominates SD. Let α be the element of D with O-depthD(α) = 3; such an element exists, by Lemma 6.3.1 (5), if there exists in D an element of O-depth in D exceeding 2, and the following proof works also in the case when α does not exist. Let A be the part α > . . . of D, and C′ the part up to but not including α. By Proposition 6.1.1 (2), the O-depth (in C′) of elements of C′ is at most 2. By Proposition 9.2.1, x1,2 dominates SC′. By Corollary 6.3.5 (3), (SD)1,2 = SC′ and (SD) 3 = SA. Since A ⊆ S 3,4, it follows that x3 dominates SA (induction hypothesis). Finally, by an application of Corollary 9.1.5, we conclude that x dominates SD. □

Corollary 9.2.3 Let S be a monomial in ON and x an element of I(d). For x to O-dominate S it is necessary and sufficient that x1,2 O-dominate S and x3 O-dominate S3,4.

Proof: It is easy to see that (x3)j,j+1 = xj+2,j+3; it follows from Proposition 6.3.6 that (S3,4) j ∪ (S j+1 = S j+2 ∪ S j+3. The assertion follows from the lemma. □

9.3 Orthogonal analogues of some lemmas in [7]

The proofs of Propositions 4.1 and 4.2 of [7] are based on assertions 4.9–4.16 (of that paper). Assertion 4.9 being a statement about a single Sk, it is applicable in the present situation. Since references to it are frequent, we recall it below as Lemma 9.3.1. As to assertions 4.10–4.16 of [7], assertions 9.3.2, 9.3.4–9.3.9 below are their respective analogues.

A block of a monomial S in ON means a block of Sj,j+1 in the sense of [7] for some odd integer j.

Caution: Considering S as a monomial in N, there is the notion of a “block” of S as in [7], which has in fact been used in §9.1, and which is different from the notion just defined. Both notions are used and it will be clear from the context which is meant.

Throughout this section S denotes a monomial in ON and j an integer (not necessarily odd).
Lemma 9.3.1 If B1, . . . , Bl are the blocks in order from left to right of some Sk, and w(B1) = (R1, C1), w(B2) = (R2, C2), . . ., w(Bl) = (Rl, Cl), then
C1 < R1 < C2 < R2 < . . . < Rl−1 < Cl < Rl.

Proof: This is merely a recall of Lemma 4.9 of [7]. In any case it follows easily from the definitions. □

Lemma 9.3.2 No two elements of Sk(ext) ∪ S′k are comparable. More precisely, it is not possible to have elements α > β both belonging to Sk(ext) ∪ S′k.

Proof: It follows from Lemma 9.3.1 that Sk ∪ S′k contains no comparable elements. If k is even, then Sk(ext) = Sk (Corollary 7.3.4 (2)); if k is odd, we may assume Sk(ext) = Sk (as sets) by increasing the multiplicity of σk in S. □

Lemma 9.3.3 For integers i ≤ k, there cannot exist γ ∈ S′i(up) and β ∈ S such that β > γ. For integers i < k, there cannot exist γ ∈ S′i(up) and β ∈ S such that β dominates γ.

Proof: Let γ ∈ S′i(up) and β ∈ S . If i = k and β > γ, then we get a contradiction immediately to Lemma 9.3.2. Now suppose that i < k and that β dominates γ. Apply Corollary 6.3.4 (the notation of the corollary being suggestive of how exactly to apply it). Let α be as in its conclusion. The chain α > γ contradicts Lemma 9.3.2 in case i is odd and either Lemma 9.3.2 or Proposition 7.5.4 in case i is even. □

Lemma 9.3.4 For (r, c) in S′, there exists a unique block B of S with (r, c) in B′.

Proof: The existence is clear from the definition of S′. For the uniqueness, suppose that B and C are two distinct blocks of S with (r, c) in both B′ and C′. We will show that this leads to a contradiction.

Let i and k be such that B ⊆ Si and C ⊆ Sk. From Lemma 4.11 of [7] (of which the present lemma is the orthogonal analogue) it follows that i ≠ k, so we can assume without loss of generality that i < k. By applying the involution # if necessary, we may assume that (r, c) ∈ S′i(up). Now there exists an element (r, a) in C with a ≤ c (this follows from the definition of C′). Clearly (r, a) ∈ S .
Taking β = (r, a) and γ = (r, c), we get a contradiction to Lemma 9.3.3. □

Lemma 9.3.5 Let i < j be positive integers.
1. Given a block B of Sj, there exists a unique block C of Si such that w(C) > w(B).
2. Given an element β in Sj(ext) ∪ S j, there exists α in Si such that α > β.

Proof: (1): The assertion follows by applying Lemma 9.1.2 with B = B and U = Si. We need to make sure however that the lemma can be applied. More precisely, we need to check that for every β = (r, c) in B there exist γ1(β) = (R1, C1) and γ2(β) = (R2, C2) in Si such that C1 < c, C2 < R1, and r < R2.

We may assume β = β(up), for, if β = β(down), then β(up) also belongs to Sj because Sj is symmetric, and we can set γ1(β) = γ2(β(up))(down), and γ2(β) = γ1(β(up))(down); note that these two belong to Si since Si is symmetric.

We consider three cases:
1. β belongs to S.
2. β = ph(σj−1) (in particular, j is even and S is truly orthogonal at j − 1).
3. β = pv(σj) (in particular, j is odd and S is truly orthogonal at j).

Define β′ to be β in case 1, σj−1 in case 2, and σj in case 3. Let C be a v-chain in S with tail β′ and having the good property as in Proposition 6.3.3.

First suppose that there exists in C an element of O-depth i and denote it by γ. If ph(γ) ∉ N (this can happen only in case 1), then set γ1(β) = γ2(β) = γ. Now suppose ph(γ) ∈ N. Then γ ∈ Si except when γ = σi with i odd and σi has multiplicity 1 in S. If γ ∈ Si, take γ1(β) = γ and γ2(β) = γ# = γ(down); if γ ∉ Si, then take γ1(β) = γ2(β) = pv(γ).

Now suppose that C has no element of O-depth i. Then, by Lemma 6.3.1 (5), i is even and there exists in C an element of O-depth i − 1. This element of C is of type H by Lemma 6.3.1 (1), so S is truly orthogonal at i − 1. Set γ1(β) = γ2(β) = ph(σi−1).

(2): This proof parallels the proof of (1) above. As in the above proof, we may assume that β = β(up). Suppose β = (r, c) belongs to S′j. Then there exists (r, a) ∈ Sj with a ≤ c.
Since S′j does not meet the diagonal, it is clear that (r, a) ∈ ON, and thus it is enough to prove the assertion for β ∈ Sj(ext).

So now take β ∈ Sj(ext). Let β′ and C be as in the proof of (1). First suppose that there exists in C an element of O-depth i. Denote it by γ. If γ ∈ Si, then take α = γ. If γ ∉ Si, then pv(γ) ∈ Si, and we take α = pv(γ). In case there is no element in C of O-depth i, we take α = ph(σi−1) (see the above proof). □

Corollary 9.3.6 If B and B1 are blocks of S with w(B) = (r, c) and w(B1) = (r1, c1), then exactly one of the following holds: c < r < c1 < r1, c1 < r1 < c < r, c < c1 < r1 < r, or c1 < c < r < r1.

Proof: This is a formal consequence of Lemmas 9.3.1 and 9.3.5, just as Corollary 4.13 of [7] is of Lemmas 4.9 and 4.12 of that paper. □

Corollary 9.3.7 If w(B) > w(C) for blocks B ⊆ Si and C ⊆ Sj of S, then i < j.

Proof: This is a formal consequence of Lemmas 9.3.1 and 9.3.5. It follows from the first lemma that i ≠ j. Suppose i > j. Then there exists by the second lemma a block C′ ⊆ Sj such that w(C′) > w(B). But then w(C′) > w(C), a contradiction of the first lemma. □

Corollary 9.3.8 Let (s, t) > (s1, t1) be elements of S′, and B, B1 be blocks of S such that (s, t) ∈ B′, and (s1, t1) ∈ B′1. Then w(B) > w(B1).

Proof: Let w(B) = (r, c) and w(B1) = (r1, c1). By Corollary 9.3.6, we have four possibilities. Since (r, c) dominates (s, t) and (r1, c1) dominates (s1, t1), the possibilities c < r < c1 < r1 and c1 < r1 < c < r are eliminated. It is thus enough to eliminate the possibility c1 < c < r < r1. Suppose that this is the case. Then, by Corollary 9.3.7, j1 < j, where j1 and j are such that B ⊆ Sj and B1 ⊆ Sj1. Now, by Lemma 9.3.5 (2), there exists α in Sj1 such that α > (s, t) > (s1, t1). But then this contradicts Lemma 9.3.2. □

Corollary 9.3.9 For a B ⊆ Si of S, the depth of w(B) in Sw is exactly i.

Proof: That the depth is at least i follows from Lemma 9.3.5. That the depth cannot exceed i follows from Corollary 9.3.7.
□

Corollary 9.3.10 Let α ∈ S′k(up), β ∈ S′m(up), and α > β. Then k < m.

Proof: Corollary 9.3.8 and Corollary 9.3.9. □

9.4 More lemmas

This subsection is a collection of lemmas to be invoked in the later subsections. More specifically, Lemma 9.4.1 and Corollary 9.4.2 are invoked in the proof of Proposition 4.1.1 in §10.1, Lemma 9.4.3 in the proof of the first half of Proposition 4.1.2 in §10.2, and Lemma 9.4.4 in the proof of the second half of Proposition 4.1.2 in §10.3. Throughout this subsection, S denotes a monomial in ON.

Lemma 9.4.1 Let C be a v-chain in S′, α an element of C, and α′ ∈ SC,α. Then depth α′ ≤ k(even), for k the integer such that α ∈ S′k(up).

Proof: Proceed by induction on k. If k = 1, the assertion follows from Corollary 9.3.10, so assume k > 1. Choose a v-chain C′ in SC with tail α′ and depthC′α′ = depth (α′). The length of a v-chain in SC,α is clearly at most 2. So, if γ′ is the element two steps before α′ in C′ (if γ′ does not exist then there is clearly nothing to prove), then γ′ ∈ SC,γ with γ > α (see Proposition 5.3.4 (2)). We claim that depth (γ′) ≤ k(odd) − 1. It is enough to prove the claim, for then depth (α′) = depthC′α′ = depthC′γ′ + 2 ≤ k(odd) − 1 + 2 = k(even).

The claim follows by induction from Corollary 9.3.10 if k is odd or more generally if γ ∈ S′l(up) with l ≤ k(odd) − 1. So assume that k is even and γ ∈ S′k−1(up). By Proposition 7.5.4, it is not possible that γ is of type H and ph(γ) > α. So the only possibility is that α′ = ph(α) and γ > α is connected. In particular, γ is of type V and α of type H in C and γ′ = pv(γ). Now let µ be the first element in the connected component of α in C. The cardinality of the part µ > . . . > γ of C is even (by Proposition 5.3.1 (1), it follows that the cardinality of µ > . . . > α is odd), say e. Letting m be such that µ ∈ S′m(up), we have, by Corollary 9.3.10, m ≤ k − 1 − (e − 1) = k − e.
If m(even) < k − e, then, since depthC′γ′ = depthC′pv(µ) + e − 1 (by Proposition 5.3.4 (1), since, by Proposition 5.3.1 (2), µ, . . . , γ all have type V in C) and depthC′pv(µ) ≤ m(even) by induction, it follows that depthC′γ′ < k − e + e − 1 = k − 1, and we are done. So suppose that m(even) = k − e. Let ν be the element just before µ in C (if such an element does not exist, then depthC′γ′ = e ≤ k − 2, since m(even) ≥ 2, and we are done). Then ν > µ is not connected (by choice of µ). So ph(ν) > µ. By Proposition 7.5.4, this means that j ≤ m(even) − 2 where j is the odd integer defined by ν ∈ S′j(up) ∪ S′j+1(up). So, again by induction, depthC′γ′ = depthC′ph(ν) + e ≤ m(even) − 2 + e = k − 2, and the claim is proved. □

Corollary 9.4.2 The O-depth of an element α in S′ is at most k where k is such that α ∈ S′k(up).

Proof: Let C′ be a v-chain in SC with tail qC,α. If k is even, then, by the lemma, depthC′qC,α ≤ k. So suppose that k is odd. Let γ′ be the immediate predecessor of qC,α in C′. By Proposition 5.3.4 (2), γ > α, and so γ ∈ S′l(up) with l ≤ k − 1 (see the observation in the first paragraph of the proof of the lemma). So depthC′γ′ ≤ k − 1 (by the lemma) and depthC′α′ = depthC′γ′ + 1 ≤ k. □

Lemma 9.4.3 Let S be a monomial in ON and Oπ(S) = (w,S′). Let i < k be integers, α an element of S′i(up), and δ an element of (Sw)k(up) that dominates α.
1. If k is even, then there exists β ∈ S′k(up) with α > β.
2. If k is odd and wk,k+1 O-dominates the singleton v-chain α, then either there exists β ∈ S′k(up) with α > β or there exists γ ∈ S′k+1(up) with ph(α) > γ.

Proof: Write α = (r, c) and δ = (A,B). By Corollary 9.3.9, there exists a block B of Sk such that δ = w(B). Let (D,B) be the first element of B (arranged in increasing order of row and column indices). We have the following possibilities:
(i) D ≤ A and (D,B) ∈ S
(ii) k is odd, S is truly orthogonal at k, (D,B) = (A,B) = pv(σk), and B consists of the single diagonal element (D,B) = (B∗, B).
(iii) k is even, S is truly orthogonal at k − 1, (D,B) = (A,B) = ph(σk−1), and B consists of the single diagonal element (D,B) = (B∗, B).

We claim the following: in case (i), D < r (in particular, D < A); in case (ii), the row index of σk is less than r; and case (iii) is not possible. The first two assertions and also the third in the case i < k − 1 follow readily from Lemma 9.3.3; if case (iii) holds and i = k − 1, then σk−1 > α, a contradiction to Lemma 9.3.2.

First suppose that possibility (ii) holds. Write σk = (s,B). Since s < r and ph(σk) ∈ N, it is clear that ph(α) = (r, r∗) also belongs to N. From the hypothesis that wk,k+1 O-dominates {α}, it follows that there is an element of (Sw)k+1 that dominates ph(α) = (r, r∗). Such an element must be diagonal (because of the distinguishedness of Sw), and so must be the w(C) for the unique diagonal block C of Sk+1. In particular, this means that there are elements other than (s, s∗) in Sk+1, and so S′k+1 is non-empty. In the arrangement of elements of S′k+1(up) in increasing order of row and column numbers, let γ = (e, s∗) be the last element. Then e < s < r and r∗ < s∗, so ph(α) > γ, and we are done.

Now suppose that possibility (i) holds. Let (p, q) be the element of Sk such that p is the largest row index that is less than r, and, among those elements with row index p, the maximum possible column index is q. The arrangement of elements of Sk (in increasing order of row and column indices) looks like this: . . . , (p, q), (s, t), . . . Since p < r ≤ A and w(B) = (A,B), we can be sure that (p, q) is not the last element of B. We first consider the case c < t. Then α = (r, c) > β := (p, t) ∈ S′k. If β ∈ S′k(up), then we are done. It is possible that (p, q) lies on or below the diagonal so that β lies below the diagonal, in which case, α > β(up) and β(up) ∈ S′k(up), and again we are done. Now suppose that t ≤ c.
We claim that:
• (s, t) belongs to the diagonal;
• k is odd and S is truly orthogonal at k; and
• σk = (u, t) with u < r.

Suppose that (s, t) does not belong to the diagonal. Since r ≤ s (by choice of (p, q)), it follows that (s, t) dominates (r, c). This leads to a contradiction to Lemma 9.3.3, for either (s, t) or its reflection (t∗, s∗) (whichever is above the diagonal) belongs to S and dominates α = (r, c) in S′i(up). This shows that (s, t) belongs to the diagonal. If k is even, then (s, t) = ph(σk−1), which means σk−1 > α, again contradicting Lemma 9.3.3, so k must be odd. It also follows that S is truly orthogonal at k and that (s, t) = pv(σk). Writing σk = (u, t), if r ≤ u, then σk would dominate α, again contradicting Lemma 9.3.3. So u < r, and the claim is proved.

To finish the proof of the lemma, now proceed as in the proof when possibility (ii) holds. □

Lemma 9.4.4 Let T be a monomial in ON and w an element of I(d) that O-dominates T. Let β′ > β be elements of Sw(up). Let d − 1 and d be their respective depths in Sw. Let α be an element of OP∗β or more generally an element of ON such that (a) it is dominated by β, (b) it is not comparable to any element of Pβ, and (c) in case d is odd, then {α} ∪ Tw,d,d+1 has O-depth at most 2. Then:
1. there exists α′ ∈ P∗β′(up) with α′ > α;
2. for α′ as in (1), if α′ is diagonal, then ph(δd−2) > α if d is odd and δd−1 > α if d is even.

Proof: Assertion (2) is rather easy to prove. If d is odd, then, in fact, ph(δd−2) = α′; if d is even, then δd−1 has the same column index as α′ and, by Proposition 8.2.2 (4), has row index more than that of α, so δd−1 > α.

Let us prove (1). Write α = (r, c), β = (R,C), and β′ = (R′, C′). There exists, by the definition of P∗β′, an element in P∗β′ with column index C′. We have C′ < c (for C′ < C ≤ c). Let (r′, c′) be the element of P∗β′ such that c′ is maximum possible subject to c′ < c and among those elements with column index c′ the maximum possible row index is r′.
If r < r′, then we are done (if (r′, c′) is below the diagonal, its mirror image would have the desired properties). It suffices therefore to suppose that r′ ≤ r and arrive at a contradiction. In the arrangement of elements of P∗β′ in non-decreasing order of row and column indices, there is a portion that looks like this: . . . , (r′, c′), (a, b), . . . Since there is in P∗β′ an element with row index R′ (and clearly r′ ≤ r < R < R′), it follows that (a, b) exists (that is, (r′, c′) is not the last element in the above arrangement). It follows from the construction of P∗β′ from Pβ′ that (r′, b) is an element in Pβ′. By the choice of (r′, c′), we have c ≤ b. Thus (r, c) dominates (r′, b).

The proof now splits into two cases according as d is even or odd. First suppose that d is even. Then, since β dominates (r′, b) and yet (r′, b) does not belong to Pβ, there exists a v-chain in Tw,d−1,d of length 2 and head (r′, b). The tail of this v-chain then belongs to Pβ and is dominated by (r, c), a contradiction to our assumption that α is not comparable to any element of Pβ.

Now suppose that d is odd. Choose a v-chain C in T with head (r′, b) that is not O-dominated by wd. Let D be the part of C consisting of elements of O-depth (in C) at most 2. We claim that D is O-dominated by wd,d+1. In fact, we claim the following: Any v-chain F with head (r′, b) and O-depth at most 2 is O-dominated by wd,d+1.

To prove the claim, we first prove the following subclaim: (†) If the horizontal projection of (r′, b) belongs to N, then β is on the diagonal and dominates the vertical projection of (r′, b), and the diagonal element β1 of (Sw)d+1 dominates the horizontal projection of (r′, b). Let ph(r′, b) ∈ N. Then β belongs to the diagonal because Sw is distinguished and symmetric. Once β is on the diagonal, it is clear that it dominates pv(r′, b) (from our assumptions, β dominates (r, c) and (r, c) dominates (r′, b)).
It follows from Proposition 8.2.2 (3) that the row index of β1 exceeds the row index r of (r, c), so β1 dominates ph(r′, b). This finishes the proof of the subclaim (†).

To begin the proof of the claim, observe that F has length at most 2. Suppose first that F consists only of the single element (r′, b). The type of (r′, b) in F is either H or S. If it is S, then since β dominates (r′, b), the claim follows immediately. If it is H, then the claim follows immediately from the subclaim (†).

Continuing with the proof of the claim, let now F consist of two elements: (r′, b) > µ. Let γ be the element of Sw such that µ ∈ Pγ, and let e be the depth of γ in Sw. From Lemma 8.2.1 (2b) it follows that e ≥ d. If e = d, then γ = β (by the distinguishedness of Sw), and the comparability of (r, c) and µ contradicts our hypothesis (b). So e ≥ d + 1, and there exists δ of depth d + 1 in Sw that dominates µ. We have β > δ (again by the distinguishedness of Sw). The possibilities for the types of (r′, b) and µ in F are: S and S, V and V, H and S (in the last case ph(r′, b) ≯ µ by Lemma 6.3.1 (1)). Noting the existence in (Sw)d,d+1 of the v-chain β > δ in the first case and also of β > β1 (where β1 is as in the subclaim) in the last case, the proof of the claim in these cases is over. So suppose that the second possibility holds. The distinguishedness of Sw implies that δ = β1. Since δ is diagonal, it dominates the vertical projection of µ. Noting the existence of the v-chain β > δ in (Sw)d,d+1, the proof of the claim in this case too is over.

We continue with the proof of the lemma. It follows from the claim that D is O-dominated by wd,d+1.
From Corollary 9.2.3 it follows that the complement E of D in C is not O-dominated by wd+2,d+3 (in particular, that E is non-empty) and that every v-chain in T with head ε (where ε denotes the head of E) is O-dominated by wd (given such a v-chain, the concatenation of D with it is O-dominated by wd−2, and ε continues to have O-depth 3 in the concatenated v-chain). Thus ε belongs to Tw,d,d+1. From (1) and (2b) of Lemma 8.2.1 it follows that the element µ of C in between (r′, b) and ε (if it exists at all) also belongs to Tw,d,d+1. Now consider the v-chain obtained as follows: take the part of C up to (and including) ε and replace its head (r′, b) by (r, c). This chain has O-depth 3 and lives in {α} ∪ Tw,d,d+1, a contradiction to hypothesis (c). □

Corollary 9.4.5 Let T be a monomial in ON and w an element of I(d) that O-dominates T. Let β′ > β be elements of Sw(up), α an element of OP∗β, and d′ := depth β′.
1. If d′ is odd, there exists α′ ∈ OP∗β′ such that α′ > α.
2. If there does not exist α′ ∈ OP∗β′ such that α′ > α then (d′ is even by (1) above and) there exists α′′ ∈ OP∗β′′ such that ph(α′′) > α, where β′′ is the unique element of (Sw)d′−1 such that β′′ > β′.

Proof: Immediate from the lemma. □

Corollary 9.4.6 Let T be a monomial in ON and w an element of I(d) that O-dominates T. Let β, β′ be elements of Sw(up), and α, α′ elements of OP∗β and OP∗β′ respectively.
1. If α′ > α then β′ > β (in particular, depth β′ < depth β).
2. If ph(α′) > α and depth β is even, depth β′ ≤ depth β − 2.

Proof: (1) Writing β = (r, c) and β′ = (r1, c1), there are, since both β and β′ dominate α and Sw is distinguished, the following four possibilities: c < r < c1 < r1, c1 < r1 < c < r, c < c1 < r1 < r, c1 < c < r < r1. Since α′ > α, and α, α′ are dominated respectively by β, β′ (this is because α, α′ belong to OP∗β, OP∗β′ respectively), the possibilities c < r < c1 < r1 and c1 < r1 < c < r are eliminated (by the distinguishedness of Sw).
It is thus enough to eliminate the possibility β > β′. Suppose, by way of contradiction, that β > β′. By Corollary 9.4.5, either there exists γ ∈ OP∗β such that γ > α in which case the v-chain γ > α in OP∗β contradicts Proposition 8.2.2 (3) or (4), or d := depth β is even and there exists (with β′′ being the unique element in Sw such that β′′ > β and depth β′′ = d − 1) an element α′′ ∈ OP∗β′′ with ph(α′′) > α′, in which case the v-chain α′′ > α in T⋆w,d−1,d has O-depth 3 and so contradicts Proposition 8.2.2 (2).

(2) Set d := depth β. If depth β′ were d − 1, then the v-chain α′ > α in T⋆w,d−1,d would be of O-depth 3 and so would contradict Proposition 8.2.2 (2). □

10 The Proof

The aim of this section is to prove Propositions 4.1.1 and 4.1.2. The proof of the first proposition appears in §10.1 and that of the second in §§10.2, 10.3. In §9.4 some lemmas are established that are used in the proofs. Needless to say, the lemmas may be unintelligible until one tries to read the proofs in the later subsections.

10.1 Proof of Proposition 4.1.1

(1) By definition, w is the element of I(d) associated to the distinguished monomial ∪kSw(k). By the very definition of this association, we have w ≥ v.

(2) This follows from the corresponding property of the map π of [7]. More precisely, that property justifies the third equality below. The other equalities are clear from the definitions.

v-degree(w) + degree(S′) = degree(Sw) + degree(S′k) degree(Sw(k)) + degree(S degree(Sk) j odd degree(Sj,j+1) j odd degree(S j ) + degree(S = degree(S)

(3) We have:

w O-dominates S′
⇔ w ≥ wC ∀ v-chain C in S′
⇔ w dominates SC ∀ v-chain C in S′
⇔ ∀ v-chain C in S′, ∀ α′ = (r, c) ∈ SC, ∃ β = (R,C) ∈ Sw with C ≤ c, r ≤ R, and depth β ≥ depth α′.

The first equivalence above follows from the definition of O-domination, the second from [7, Lemma 4.5], the third from Lemma 9.1.4. Now let C be a v-chain in S′ and α′ = (r, c) in SC.
We will show that there exists β in Sw that dominates α′ and satisfies depthSwβ ≥ depthSCα′. Let α be the element in C such that α′ ∈ SC,α, let k be such that α ∈ S′k(up), and let B be the block of Sk such that α ∈ B′. Writing α = (r1, c1) and w(B) = (R1, C1), we have C1 ≤ c1 and r1 ≤ R1 straight from the definition of w(B). By Corollary 9.3.9, depth w(B) = k.

First suppose that w(B) dominates α′ (meaning C1 ≤ c and r ≤ R1). If k ≥ depth α′, we are clearly done; by Corollary 9.4.2, this is the case when α′ = qC,α. So suppose that α is of type H, α′ = ph(α), and that k < depth α′. By Lemma 9.4.1, depth α′ ≤ k(even). It follows that k is odd and depth α′ = k + 1. By Corollary 7.5.3, S is truly orthogonal at k, which means that Sk+1 has a diagonal block, say C. Note that w(C) dominates ph(σk) which in turn dominates ph(α). Since depthSww(C) = k + 1 by Corollary 9.3.9, we are done.

Now suppose that w(B) does not dominate α′. Then B is non-diagonal and α′ = pv(α). Since B is non-diagonal, ph(α) ∉ N, and α cannot be of type H. So α is of type V in C. It follows easily (see Proposition 5.3.1 (3)) that α is the critical element in C, and the last element in its connected component in C; by Lemma 6.3.1 (4), O-depthC(α) = depthSC qC,α =: d is even. By Proposition 5.3.1 (1), (2), the cardinality of the connected component of α in C is even. The immediate predecessor γ of α in C is connected to α (this follows from what has been said above). It is of type V in C, ph(γ) belongs to N, and depth pv(γ) = d − 1 (see Lemma 6.3.1 (1)). Let ℓ be such that γ ∈ S′ℓ(up). Let C be the block of Sℓ such that γ ∈ C′. Since ph(γ) ∈ N, C is diagonal. Note that w(C) dominates pv(γ) and that pv(γ) > pv(α). By Corollary 9.3.9, depth w(C) = ℓ. Thus if d ≤ ℓ we are done. On the other hand, d − 1 ≤ ℓ by Corollary 9.4.2. So we may assume that ℓ = d − 1. By Corollary 7.5.3, S is truly orthogonal at d − 1. This implies that Sd has a diagonal block, say D.
Note that w(D) dominates ph(σd−1) which in turn dominates ph(γ). Writing γ = (r2, c2), since γ > α is connected, it follows that (r1, r2∗) belongs to ON. Now both w(B) and w(D) dominate (r1, r2∗). Since Sw is distinguished and symmetric and w(B) is not on the diagonal, it follows that w(D) > w(B). This implies, since w(D) is on the diagonal, w(D) > pv(α). Since depthSww(D) = d by Corollary 9.3.9, we are done.

(4) Let x be an element of I(d) that O-dominates S. We will show that x ≥ w. By [7, Lemma 5.5], it is enough to show that x dominates Sw. By Lemma 9.1.4, it is enough to show the following: for every block B of S, there exists β in Sx such that β dominates w(B) and depthSxβ ≥ depthSww(B).

Let B be a block of S. By Corollary 9.3.9, depth w(B) = k where B ⊆ Sk. Let Skx denote the set of elements of Sx of depth at least k. Our goal is to show that there exists β in Skx that dominates w(B). It follows easily from the distinguishedness of Sx and the fact that B is a block, that it suffices to show the following: given α ∈ B, there exists β in Skx (depending upon α) that dominates α. Moreover, since B and Skx are symmetric, we may assume that α = α(up).

So now let α = α(up) belong to B. Then either
1. α belongs to S,
2. k is odd, S is truly orthogonal at k, and α = pv(σk), or
3. k is even, S is truly orthogonal at k − 1, and α = ph(σk−1).

The proofs in the three cases are similar. In the first case, choose a v-chain C in S with tail α such that O-depthC(α) = k (see Corollary 6.1.3 (1)). Then depth qC,α = k and, clearly, qC,α dominates α. Since x dominates SC, there exists, by Lemma 4.5 of [7], β in Skx that dominates qC,α (and so also α).

In the second case, choose a v-chain C in S with tail σk with the property that O-depthC(σk) = k. Then depthSC qC,σk = k. Since ph(σk) belongs to N, σk is of type V or H in C, so qC,σk = α. Since x dominates SC, there exists, by [7, Lemma 4.5], β in Skx that dominates qC,σk = α.
In the third case, choose a v-chain C in S with tail σk−1 such that the O-depth in C of σk−1 is k − 1. Then depthSCqC,σk−1 = k − 1. Since ph(σk−1) belongs to N, σk−1 is of type V or H in C, so qC,σk−1 = pv(σk−1). From Lemma 6.3.1 (4), it follows, since k − 1 is odd, that σk−1 is of type H. Since pv(σk−1) > ph(σk−1) = α, it follows that depthSCph(σk−1) ≥ k (in fact equality holds as is easily seen). Since x dominates SC, there exists, by [7, Lemma 4.5], β in Skx that dominates ph(σk−1) = α. □

10.2 Proof that OφOπ = identity

Let S be a monomial in ON and let Oπ(S) = (w,S′). We need to show that Oφ applied to the pair (w,S′) gets us back to S. We know from (3) of Proposition 4.1.1 that w O-dominates S′, so Oφ can indeed be applied to the pair (w,S′).

The main ingredients of the proof are the corresponding assertion in the case of the Grassmannian [7, Proposition 4.2] and the following claim which we will presently prove:

(S′)w,j,j+1 = S′j(up) ∪ S′j+1(up) for every odd integer j

Let us first see how the assertion follows assuming the truth of the claim, by tracing the steps involved in applying Oφ to (w,S′). From the claim it follows that when we partition S′ into pieces (see §8), we get S′j(up) ∪ S′j+1(up) (for odd integers j). Adding the mirror images will get us to S′j ∪ S′j+1. From Corollary 9.3.9 it follows that wj,j+1 is exactly the element of I(d, 2d) obtained by acting π on Sj ∪ Sj+1. Now, since φ ◦ π = identity, it follows that on application of φ to (wj,j+1, S′j ∪ S′j+1) we obtain Sj ∪ Sj+1. By twisting the two diagonal elements in Sj ∪ Sj+1 (if they exist at all) and removing the elements below the diagonal d, we get back S j,j+1. Taking the union of S j,j+1 (over odd integers j), we get back S.

Thus we need only prove the claim. Since S′ is the union over all odd integers of the right hand sides (this follows from the definition of S′), and the left hand sides as j varies are mutually disjoint, it is enough to show that the right hand side is contained in the left hand side.
Thus we need only prove: for j an odd integer and α an element in S′j(up) ∪ S′j+1(up),
• every v-chain in S′ with head α is O-dominated by wj;
• there exists a v-chain in S′ with head α that is not O-dominated by wj+2.

To prove the first item, write T = Sj,j+1 := {α ∈ S | O-depth (α) ≥ j} and set Oπ(T) = (x,T′). By Proposition 6.3.6, we have T i ∪ T i+1 = S i+j−1 ∪ S i+j for any odd integer i. Thus, by the description of Oπ, we have T′ = ∪k≥jS′k(up). By Corollary 9.3.9 and the description of Oπ, we have x = wj. By Corollary 9.3.10, any v-chain in S′ with head belonging to S′j(up) ∪ S′j+1(up) is contained entirely in ∪k≥jS′k(up). Finally, by Proposition 4.1.1 (3) applied to T, the desired conclusion follows.

To prove the second item we use Lemma 9.4.3. Proceed by decreasing induction on j. For j sufficiently large the assertion is vacuous, for S′j(up) ∪ S′j+1(up) is empty. To prove the induction step, assume that the assertion holds for j + 2. If the v-chain consisting of the single element α is not O-dominated by wj+2, then we are done. So let us assume the contrary. Since the O-depth of the singleton v-chain α is at most 2, it follows from Lemma 9.2.2 that wj+2,j+3 O-dominates the v-chain α. Apply Lemma 9.4.3 with k = j + 2. By its conclusion, either there exists β ∈ S′j+2(up) such that α > β or there exists γ ∈ S′j+3(up) such that ph(α) > γ.

First suppose that a γ as above exists. By induction, there exists a v-chain in S′, call it D, with head γ that is not O-dominated by wj+4. Let C be the concatenation of α > γ and D. Since elements of D have O-depth at least 3 in C (Lemma 6.3.1 (1)), it follows from Corollary 9.2.3 that C is not O-dominated by wj+2, and we are done.

Now suppose that such a γ does not exist. Then a β as above exists. If α > β is not O-dominated by wj+2 we are again done. So assume the contrary. Since the O-depth of β in α > β is at least 2, it follows that there exists an element of (Sw)j+3 that dominates β.
Applying Lemma 9.4.3 again, this time with k = j + 3, we find γ′ ∈ S′j+3(up) such that β > γ′. Arguing as in the previous paragraph with γ′ in place of γ, we are done. □

10.3 Proof that OπOφ = identity

Let T be a monomial in ON and w an element of I(d) that O-dominates T. We can apply Oφ to the pair (w,T) to obtain a monomial T⋆w in ON. We need to show that Oπ applied to T⋆w results in (w,T). The main step of the proof is to establish the following:

T⋆w,j,j+1 = (T⋆w) j,j+1    (10.3.1)

(for the meaning of the left and right sides of the above equation, see §8 and §7 respectively). Assuming this for the moment let us show that Oπ ◦ Oφ = identity. We trace the steps involved in applying Oπ to T⋆w. From Eq. (10.3.1) it follows that when we break up T⋆w according to the O-depths of its elements as in §7, we get T⋆w,j,j+1 (as j varies over odd integers). The next step in the application of Oπ is the passage from (T⋆w) j,j+1 to (T⋆w)j,j+1. This involves replacing σj by its projections and adding the mirror image of the remaining elements of (T⋆w) j,j+1. It follows from Proposition 8.2.2 (3) that σj = δj and so (T⋆w)j,j+1 = (Tw,j,j+1 ∪ T w,j,j+1)⋆. The next step is to apply π to (Tw,j,j+1 ∪ T w,j,j+1)⋆. Since π is the inverse of φ (as proved in [7]), we have π((Tw,j,j+1 ∪ T w,j,j+1)⋆) = (wj,j+1,Tw,j,j+1). Since Sw and T are respectively the unions, as j varies over odd integers, of (Sw)j,j+1 and Tw,j,j+1, we see that Oπ applied to T⋆w results in (w,T).

Thus it remains only to establish Eq. (10.3.1). It is enough to show that the left hand side is contained in the right hand side, for the union over all odd j of either side is T⋆w and the right hand side is moreover a disjoint union. In other words, we need only show that the O-depth in T⋆w of an element of T⋆w,j,j+1 is either j or j + 1. We will show, more precisely, that, for any element β of Sw, the O-depth in T⋆w of any element of OP∗β equals the depth in Sw of β. Lemma 9.4.4 will be used for this purpose.
Let α be an element of OP∗β and set e := O-depthT⋆w (α). We first show, by induction on d := depth β, that e ≥ d. There is nothing to prove in case d = 1, so we proceed to the induction step. Let β′ be the element of Sw of depth d − 1 such that β′ > β. If there exists α′ in OP∗β′ with α′ > α, the desired conclusion follows from Corollary 6.1.3 (3) and induction. Lemma 9.4.4 says that such an α′ exists in case d is even. So suppose that d is odd and such an α′ does not exist. The same lemma now says that ph(δd−2) > α, so the desired conclusion follows from Lemma 6.3.1 (1).

We now show, by induction on e, that d ≥ e. There is nothing to prove in case e = 1, so we proceed to the induction step. Let C be a v-chain in T⋆w with tail α and having the good property of Proposition 6.3.3. Let α′ be the immediate predecessor in C of α. Let β′ in Sw be such that α′ ∈ OP∗β′ (we are not claiming at the moment that β′ is unique although that is true and follows from the assertion that we are proving, the distinguishedness of Sw, and the fact that β′ dominates α′). It follows from Corollary 9.4.6 that β′ > β. Let d′ := depth β′. It follows from Corollary 6.1.3 (3) that e′ < e where e′ := O-depth (α′). We have d ≥ d′ + 1 ≥ e′ + 1 ≥ (e − 2) + 1 = e − 1, the first inequality being justified because β′ > β, the second by the induction hypothesis, and the last by Lemma 6.3.1 (1).

It suffices to rule out the possibility that d = e − 1. So assume d = e − 1. Then d = d′ + 1 and d′ = e′ = e − 2. It follows from (1) of Lemma 6.3.1 that the v-chain α′ > α has O-depth 3 and from (3) of the same lemma that e′ is odd. But then we get a contradiction to Proposition 8.2.2 (2) (α′ and α belong to T⋆w,d′,d′+1). The proof of Eq. (10.3.1) is thus over. □

10.4 Proof of Proposition 4.1.3

Observe that the condition (‡) makes sense also for a monomial of N. By virtue of belonging to I(d), v has f∗ as an entry.
It follows from the description of the bijection w ↔ Sw of §5.1.2 that for an element w of I(d) to satisfy (‡) it is necessary and sufficient that Sw (equivalently all its parts Sw,j,j+1) satisfy (‡).

(1) Since T satisfies (‡), so do its parts Tw,j,j+1 and Tw,j,j+1 ∪ T w,j,j+1 (adding the mirror image preserves (‡)). Since Sw,j,j+1 also satisfies (‡), it follows from the description of the map φ of [7] (observe the passage from a piece P to its “star” P∗) that the (Tw,j,j+1 ∪ T w,j,j+1)⋆ satisfy (‡). Since the “twisting” involved in the passage from (Tw,j,j+1 ∪ T w,j,j+1)⋆ to T⋆w,j,j+1 involves only a rearrangement of row and column indices, it follows that the T⋆w,j,j+1 satisfy (‡). Finally so also does their union T⋆w.

(2) The parts S j,j+1 of S clearly satisfy (‡). Therefore so do the Sj,j+1, for, first of all, adding the mirror image preserves (‡), and then the removal of σj and addition of its projections involves only a rearrangement of row and column indices. It follows from the description of the map π of [7] (observe the passage from a block B to the pair (w(B),B′)) that both Sw,j,j+1 and S′j,j+1 satisfy (‡). Finally, Sw and S′ being the union (respectively) of Sw,j,j+1 and S′j,j+1(up), they satisfy (‡). □

Part IV An Application

As an application of the main theorem (Theorem 2.3.1), an interpretation of the multiplicity is presented.

11 Multiplicity counts certain paths

Fix elements v, w in I(d) with v ≤ w. It follows from Corollary 2.3.2 that the multiplicity of the Schubert variety X(w) in Md(V ) at the point ev can be interpreted as the cardinality of a certain set of non-intersecting lattice paths. We first illustrate this by means of two examples and then justify the interpretation.

11.1 Description and illustration

The points of N can be represented, in a natural way, as the lattice points of a grid. The column indices of the points of the grid are the entries of v and the row indices are the entries of {1, . . . , 2d} \ v.
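The grid just described can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the helper name `grid` is hypothetical, and the conventions x∗ = 2d + 1 − x, "(r, c) lies in N when r > c", "(r, c) lies in ON when moreover r < c∗", and "(r, c) is on the diagonal when r = c∗" are taken as assumptions here, since this section uses them without restating them. The values of d and v are those of Example 11.1.2.

```python
def grid(d, v):
    """Return (rows, cols, on_points, diagonal_points) for the grid of N.

    Assumed conventions (hypothetical helper, not defined in this section):
    x* = 2d + 1 - x; (r, c) is in N when r > c, in ON when additionally
    r < c*, and on the diagonal when r = c*.
    """
    entries = set(v)
    cols = sorted(entries)  # column indices: the entries of v
    rows = [r for r in range(1, 2 * d + 1) if r not in entries]  # the rest

    def star(x):
        return 2 * d + 1 - x

    # Points of ON: strictly between the line r = c and the diagonal r = c*.
    on_points = [(r, c) for c in cols for r in rows if c < r < star(c)]
    # Points of the diagonal that lie in N.
    diagonal_points = [(star(c), c) for c in cols
                       if star(c) > c and star(c) in rows]
    return rows, cols, on_points, diagonal_points


rows, cols, on_points, diagonal_points = grid(7, (1, 2, 3, 4, 7, 9, 10))
print(rows)
print(diagonal_points)
print(len(on_points))
```

With these assumed conventions the row indices are simply the complement of v in {1, . . . , 2d}, so every grid point is a pair (row index, column index) drawn from two disjoint index sets.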
In Figure 11.1.1 the points of ON and those of the diagonal in N are shown (for the specific choice of v in Example 11.1.1). The open circles represent the points of Sw(up), where Sw is the distinguished monomial in N that is associated to w as in §5.1.2. From each point β of Sw(up) we draw a vertical line upwards from β and let β(start) denote the top most point of ON on this line. In case β is not on the diagonal, draw also a horizontal line rightwards from β and let β(finish) denote the right most point of ON on this line. In case β is on the diagonal, then β(finish) is not a fixed point but varies subject to the following constraints:
• β(finish) is one step away from the diagonal (that is, it is of the form (r, c), for some entry c of v, where r is the largest integer less than c∗ that is not an entry of v);
• the column index of β(finish) is not less than that of β;
• if depth β is odd, then the horizontal projection of β(finish) is the same as the vertical projection of γ(finish) where γ is the diagonal element of Sw of depth 1 more than that of β.

With v and w as in Example 11.1.1, we have β(start) = (6, 3) and β(finish) = (9, 5) for β = (9, 3); β(start) = β(finish) = (21, 20) for β = (21, 20); β(start) = (15, 11) for the diagonal element β = (36, 11); β(start) = (6, 1) for the diagonal element β = (46, 1). In the particular case (of non-intersecting lattice paths) drawn in Figure 11.1.1, β(finish) = (27, 19) for β = (36, 11) and β(finish) = (28, 14) for β = (46, 1).

A lattice path between a pair of such points β(start) and β(finish) is a sequence α1, . . . , αq of elements of ON with α1 = β(start) and αq = β(finish) such that, for 1 ≤ j ≤ q − 1, if we let αj = (r, c), then αj+1 is either (R, c) or (r, C) where R is the least element of {1, . . . , 2d} \ v that is bigger than r and C the least element of v that is bigger than c. Note that if β(start) = (r, c) and β(finish) = (R,C), then q equals |({1, . . . , 2d} \ v) ∩ {r, r + 1, . . .
, R}| + |v ∩ {c, c + 1, . . . , C}| − 1, where | · | is used to denote cardinality.

Consider the set Pathsw of all tuples (Λβ)β∈Sw(up) of paths where
• Λβ is a lattice path between β(start) and β(finish) (if β is on the diagonal, then β(finish) is allowed to vary in the manner described above);
• Λβ and Λγ do not intersect for β ≠ γ.

The number of such p-tuples, where p := |Sw(up)|, is the multiplicity of X(w) at the point ev.

Example 11.1.1 Let d = 23, v = (1, 2, 3, 4, 5, 11, 12, 13, 14, 19, 20, 22, 23, 26, 29, 30, 31, 32, 37, 38, 39, 40, 41), w = (4, 5, 9, 10, 14, 17, 18, 21, 23, 25, 27, 28, 31, 32, 34, 35, 36, 39, 40, 41, 44, 45, 46), so that Sw = {(9, 3), (10, 2), (17, 13), (18, 12), (21, 20), (25, 22), (27, 26), (28, 19), (34, 30), (35, 29), (36, 11), (44, 38), (45, 37), (46, 1)} and Sw(up) = {(9, 3), (10, 2), (17, 13), (18, 12), (21, 20), (25, 22), (28, 19), (36, 11), (46, 1)}. A particular element of Pathsw is depicted in Figure 11.1.1. □

[Figure 11.1.1: An element of Pathsw with v and w as in Example 11.1.1]

Example 11.1.2 Figure 11.1.2 shows all the elements of Pathsw in the following simple case: d = 7, v = (1, 2, 3, 4, 7, 9, 10), and w = (4, 6, 7, 10, 12, 13, 14). We have Sw = {(6, 3), (12, 9), (13, 2), (14, 1)}, Sw(up) = {(6, 3), (13, 2), (14, 1)}. There are 15 elements in Pathsw and thus the multiplicity in this case is 15. □

Example 11.1.3 Let d = 10, v = (1, 2, 3, 4, 6, 8, 11, 12, 14, 16), and w = (8, 9, 11, 14, 15, 16, 17, 18, 19, 20), so that Sw = {(20, 1), (19, 2), (18, 3), (17, 4), (9, 6), (15, 12)}. Figure 11.1.3 shows a tuple of paths that is disallowed (meaning one that is not in Pathsw). The elements of ON are represented as usual by a grid. The slanted line represents the diagonal d. The solid dot represents the point of Sw(up) that is not on d, and the crosses on d represent the points of Sw(up) that lie on d.
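The definitions of β(start), β(finish), lattice paths, and the length q given in this subsection can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: all function names are hypothetical, the membership test for ON (c < r < 2d + 1 − c) is an assumption about notation fixed earlier in the paper, and diagonal elements β, whose β(finish) varies, are deliberately not handled by `finish_point`. The data are those of Example 11.1.1.

```python
def index_sets(d, v):
    """Row indices ({1..2d} minus v) and column indices (entries of v)."""
    entries = set(v)
    rows = [r for r in range(1, 2 * d + 1) if r not in entries]
    return rows, sorted(entries)


def in_on(d, r, c):
    """Assumed membership test for ON: below r = c, above the diagonal."""
    return c < r < 2 * d + 1 - c


def start_point(d, v, beta):
    """Top most point of ON on the vertical line upwards from beta."""
    rows, _ = index_sets(d, v)
    r, c = beta
    return (min(x for x in rows if x <= r and in_on(d, x, c)), c)


def finish_point(d, v, beta):
    """Right most point of ON on the horizontal line rightwards from beta."""
    _, cols = index_sets(d, v)
    r, c = beta
    if r == 2 * d + 1 - c:
        raise ValueError("beta is on the diagonal; beta(finish) is not fixed")
    return (r, max(x for x in cols if x >= c and in_on(d, r, x)))


def path_length(d, v, start, finish):
    """The number q of points on any lattice path from start to finish."""
    rows, cols = index_sets(d, v)
    (r, c), (R, C) = start, finish
    return (sum(r <= x <= R for x in rows)
            + sum(c <= x <= C for x in cols) - 1)


def lattice_paths(d, v, start, finish):
    """All lattice paths in ON from start to finish (each step moves to the
    next row index or to the next column index)."""
    rows, cols = index_sets(d, v)
    (r, c), (R, C) = start, finish
    rs = [x for x in rows if r <= x <= R]
    cs = [x for x in cols if c <= x <= C]
    paths = []

    def walk(i, j, prefix):
        if not in_on(d, rs[i], cs[j]):
            return  # a path must stay inside ON
        prefix = prefix + [(rs[i], cs[j])]
        if i == len(rs) - 1 and j == len(cs) - 1:
            paths.append(prefix)
            return
        if i + 1 < len(rs):
            walk(i + 1, j, prefix)
        if j + 1 < len(cs):
            walk(i, j + 1, prefix)

    walk(0, 0, [])
    return paths


v = (1, 2, 3, 4, 5, 11, 12, 13, 14, 19, 20, 22, 23, 26,
     29, 30, 31, 32, 37, 38, 39, 40, 41)
s = start_point(23, v, (9, 3))
f = finish_point(23, v, (9, 3))
paths = lattice_paths(23, v, s, f)
print(s, f, path_length(23, v, s, f), len(paths))
```

For β = (9, 3) the sketch recovers β(start) = (6, 3) and β(finish) = (9, 5), the values stated in the text, and every enumerated path has the number of points given by the formula for q. Counting the elements of Pathsw would additionally require choosing one path per element of Sw(up) with no two paths intersecting, which this sketch does not attempt.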
The tuple is disallowed because the horizontal projection of the last point of the path Λβ1 is not the vertical projection of the last point of the path Λβ2, where β1 = (20, 1) and β2 = (19, 2) are the diagonal elements of Sw of depths 1 and 2 respectively.

Figure 11.1.2: All the 15 non-intersecting lattice paths of Example 11.1.2

Figure 11.1.3: A disallowed tuple of lattice paths (see Example 11.1.3). The horizontal projection of the last point of the path associated to (20,1) is not the same as the vertical projection of the last point of the path associated to (19,2). This is the reason that this tuple is disallowed.

11.2 Justification for the interpretation

We now justify the interpretation in the previous subsection of the multiplicity. Corollary 2.3.2 says that the multiplicity is the number of monomials in OR of maximal cardinality that are square-free and O-dominated by w. Any such monomial contains OR \ ON, for, by the definition of O-domination, adding or removing elements of OR \ ON to or from a monomial does not alter the status of its O-domination. One could therefore equally well consider the number of monomials in ON of maximal cardinality that are square-free and O-dominated by w. We now establish a bijection between the set Monw of such monomials and the set Pathsw of non-intersecting lattice paths as in §11.1.

Each element Λ of Pathsw can be thought of, in the obvious way, as a monomial in ON. We will continue to denote the corresponding monomial by Λ. It is clear that the monomial Λ is square-free and that all such monomials Λ have the same cardinality (in particular, that if Λ1 ⊆ Λ2 for two such monomials then Λ1 = Λ2). In order to establish the bijection it therefore suffices to prove the following proposition.

Proposition 11.2.1 1.
w is the element of I(d) obtained on application of Oπ to the monomial Λ (in particular (see Proposition 4.1.1), the monomial Λ is O-dominated by w).
2. Given a monomial T of ON that is square-free and O-dominated by w, there exists Λ such that T ⊆ Λ.

Proof: (1) Write Λ = (Λβ)β∈Sw(up). From the description of the map Oπ in §7, it follows that it suffices to show that Λk (in the notation of §7) is the union ∪Λβ where β runs over all elements of depth k in Sw(up). In other words, it suffices to show that the O-depth in Λ of any element of Λβ equals the depth in Sw of β. To prove this, we observe the following (these assertions are easily seen to be true by thinking in terms of pictures): for fixed β ∈ Sw(up) and α ∈ Λβ,
(A) For β′ in Sw(up) such that β′ > β, there exists α′ ∈ Λβ′ such that α′ > α.
(B) If α′ > α for some α′ in Λβ′ for some β′ in Sw(up), then β′ > β. If, furthermore, β and β′ are diagonal, their depths in Sw are 1 apart, and the depth in Sw of β is even, then the following is not possible: ph(α) belongs to N and ph(α′) > α.

From (A) it is immediate that the O-depth e in Λ of an element α of Λβ is not less than the depth d in Sw of β. We now show, by induction on e, that e ≤ d. For e = 1 there is nothing to show. Suppose that e ≥ 2. Let C be a v-chain in Λ having tail α and the good property of Proposition 6.3.3, α′ the immediate predecessor in C of α, e′ the O-depth of α′ in Λ, β′ the element of Sw(up) such that α′ ∈ Λβ′, and d′ the depth in Sw of β′. From Corollary 6.1.3 (3) it follows that e′ ≤ e − 1, so we may apply induction. From (B) it follows that d′ ≤ d − 1, so that, by induction, e′ ≤ d − 1. If e′ ≤ d − 2, then we are done by Lemma 6.3.1 (1). So suppose that e′ = d′ = d − 1. If d is odd, then the conclusion e ≤ d follows from (1) and (3) of the same lemma. In case d is even, then it follows from condition (B) and (1) of the same lemma.

(2) Let T be a square-free monomial in ON that is O-dominated by w.
To construct Λ such that T ⊆ Λ, we construct the “components” Λβ. As in §8, let Pβ denote the piece of T corresponding to β ∈ Sw. From every point belonging to Pβ(up) and also from β(start) carve out the South-West quadrant; if β is not diagonal, then do this also from β(finish). The boundary of the carved out portion (intersected with ON) gives a lattice path starting from β(start). In case β is not diagonal, the path ends in β(finish). In this case as well as in the case when β is diagonal and of even depth in Sw, we take Λβ to be this lattice path. In case β is diagonal and of odd depth in Sw we do the carving out from one more point before taking Λβ to be the boundary of the carved out region, namely from the point that is one step away from the diagonal and whose horizontal projection is the vertical projection of the end point of Λγ where γ is the diagonal element of Sw of depth 1 more than β.

We need to justify why carving out from the extra point is still valid, and we do this now by applying Lemma 8.1.4. Let us first choose notation that is consistent with that of the lemma. Let β and γ be diagonal elements in Sw of depths d and d + 1. Assume that d is odd. Let the pieces of T corresponding to β and γ, when their elements are arranged in increasing order of row and column indices, look like this:
. . . , (r1, a∗), (a, r1∗), . . . ; . . . , (r2, b∗), (b, r2∗), . . .
It is easy to see that the conditions on the numbers in the above display that provide the requisite justification are: r1 ≤ b and a∗ < b∗ (if Pβ is empty then the justification is easy). To prove that a∗ < b∗, observe that the diagonal elements in P∗β and P∗γ are respectively (a, a∗) and (b, b∗), and apply Lemma 8.1.3 (2). That r1 ≤ b now follows from Lemma 8.1.4 (1). This finishes the justification.

It suffices to prove the following claim: the lattice paths Λβ as β varies are non-intersecting. Suppose that Λβ and Λβ′ intersect for β ≠ β′. Let α be a point of intersection.
Clearly β dominates all elements of Λβ and in particular α; for the same reason β′ also dominates α. By the distinguishedness of Sw, we may assume without loss of generality that β′ > β. It is easy to see graphically that if γ in Sw is such that β′ > γ > β then Λγ intersects either Λβ′ or Λβ: consider the open portion of ON “caught between” the segment of Λβ′ from β′(start) to α and the segment of Λβ from β(start) to α; the starting point γ(start) of Λγ lives in this region but its ending point does not (points strictly to the Northwest of α can neither be of the form γ(finish) for γ not on the diagonal nor can they be one step away from the diagonal); so Λγ must intersect one of the two lattice path segments. We may therefore assume that the depths of β′ and β differ by 1.

We now apply Lemma 9.4.4. From the construction of Λβ it readily follows that α satisfies the hypotheses (a), (b), and (c) of that lemma. By the conclusion of Lemma 9.4.4, there exists α′ ∈ P∗β′(up) such that α′ > α. On the other hand, it follows from the construction of P∗β′ from Pβ′, and from the construction of Λβ′, that two elements, one from P∗β′ and another from Λβ′, are not comparable. This is a contradiction to the comparability of α′ and α.

References
[1] M. Brion and P. Polo, Generic singularities of certain Schubert varieties, Math. Z., 231, no. 2, 1999, pp. 301–324.
[2] E. De Negri, Some results on Hilbert series and a-invariant of Pfaffian ideals, Math. J. Toyama Univ., 24, 2001, pp. 93–106.
[3] S. R. Ghorpade and C. Krattenthaler, The Hilbert series of Pfaffian rings, in: Algebra, arithmetic and geometry with applications (West Lafayette, IN, 2000), Springer, Berlin, 2004, pp. 337–356.
[4] S. R. Ghorpade and K. N. Raghavan, Hilbert functions of points on Schubert varieties in the Symplectic Grassmannian, Trans. Amer. Math. Soc., 358, 2006, pp. 5401–5423.
[5] J. Herzog and N. V. Trung, Gröbner bases and multiplicity of determinantal and Pfaffian ideals, Adv. Math., 96, no.
1, 1992, pp. 1–37.
[6] T. Ikeda and H. Naruse, Excited Young diagrams and equivariant Schubert calculus, arXiv:math/0703637.
[7] V. Kodiyalam and K. N. Raghavan, Hilbert functions of points on Schubert varieties in Grassmannians, J. Algebra, 270, no. 1, 2003, pp. 28–54.
[8] C. Krattenthaler, On multiplicities of points on Schubert varieties in Grassmannians. II, J. Algebraic Combin., 22, no. 3, 2005, pp. 273–288.
[9] V. Kreiman, Monomial bases and applications for Schubert and Richardson varieties in ordinary and affine Grassmannians, Ph. D. Thesis, Northeastern University, 2003.
[10] V. Kreiman, Local Properties of Richardson Varieties in the Grassmannian via a Bounded Robinson-Schensted-Knuth Correspondence, preprint, 2005, URL arXiv:math.AG/0511695.
[11] V. Kreiman and V. Lakshmibai, Multiplicities of singular points in Schubert varieties of Grassmannians, in: Algebra, arithmetic and geometry with applications (West Lafayette, IN, 2000), Springer, Berlin, 2004, pp. 553–
[12] V. Kreiman and V. Lakshmibai, Richardson varieties in the Grassmannian, in: Contributions to automorphic forms, geometry, and number theory, Johns Hopkins Univ. Press, Baltimore, MD, 2004, pp. 573–597.
[13] V. Lakshmibai and C. S. Seshadri, Geometry of G/P. V, J. Algebra, 100, no. 2, 1986, pp. 462–557.
[14] V. Lakshmibai and J. Weyman, Multiplicities of points on a Schubert variety in a minuscule G/P, Adv. Math., 84, no. 2, 1990, pp. 179–208.
[15] P. Littelmann, Contracting modules and standard monomial theory for symmetrizable Kac-Moody algebras, J. Amer. Math. Soc., 11, no. 3, 1998, pp. 551–567.
[16] K. N. Raghavan and S. Upadhyay, Initial ideals of tangent cones to Schubert varieties in orthogonal Grassmannians. In preparation.
[17] C. S. Seshadri, Geometry of G/P. I. Theory of standard monomials for minuscule representations, in: C. P. Ramanujam—a tribute, vol. 8 of Tata Inst. Fund. Res. Studies in Math., Springer, Berlin, 1978, pp. 207–239.
ABSTRACT A solution is given to the following problem: how to compute the multiplicity, or more generally the Hilbert function, at a point on a Schubert variety in an orthogonal Grassmannian. Standard monomial theory is applied to translate the problem from geometry to combinatorics. The solution of the resulting combinatorial problem forms the bulk of the paper.
This approach has been followed earlier to solve the same problem for the Grassmannian and the symplectic Grassmannian. As an application, we present an interpretation of the multiplicity as the number of non-intersecting lattice paths of a certain kind. Taking the Schubert variety to be of a special kind and the point to be the "identity coset," our problem specializes to a problem about Pfaffian ideals, treatments of which by different methods exist in the literature. Also available in the literature is a geometric solution when the point is a "generic singularity." <|endoftext|><|startoftext|> 1. Introduction

The hard X–ray transient IGR J11215–5952 was discovered with the INTEGRAL satellite during an outburst in April 2005 (Lubinski et al., 2005) and was associated with HD 306414 (Negueruela et al., 2005), a B1Ia supergiant located at a distance of 6.2 kpc (Masetti et al., 2006). The short duration of the outburst together with the likely optical counterpart suggested that IGR J11215–5952 could be a new member of the class of Supergiant Fast X-ray Transients (SFXTs; Negueruela et al. 2006). Analysing archival INTEGRAL observations of the source field, Sidoli, Paizis, & Mereghetti (2006, hereafter Paper I) discovered two previously unnoticed outbursts (in July 2003 and in May 2004) which demonstrate the recurrent nature of this transient and suggest a possible periodicity of ∼330 days. This periodicity was confirmed by the detection of the fourth outburst from IGR J11215–5952 with RossiXTE/PCA on 2006 March 16–17, 329 days after the third outburst (Smith et al., 2006b). The RXTE/PCA observations showed strong flux variability and a hard spectrum (power-law photon index of 1.7 ± 0.2 in the range 2.5–15 keV) as well as a possible pulse period of ∼195 s (Smith et al., 2006a). The pulse periodicity was confirmed with RXTE observations of the latest outburst, yielding P = 186.78 ± 0.3 s (Swank et al., 2007). Follow-up observations with Swift/XRT refined the source position and confirmed the association with HD 306414 (Steeghs et al., 2006). A hard power-law with a high energy cut-off around 15 keV is a good fit to the spectra observed with INTEGRAL (Paper I). For the distance of 6.2 kpc, the peak fluxes of the outbursts correspond to a luminosity of $\sim 3\times10^{36}$ erg s$^{-1}$ (5–100 keV). All these findings confirmed IGR J11215–5952 as a member of the class of the SFXTs, and the first object of this class of High Mass X–ray Binaries displaying periodic outbursts.

Send offprint requests to: P. Romano, patrizia.romano@brera.inaf.it

Predicting a fifth outburst for 2007 Feb 9, we obtained a Target of Opportunity (ToO) observing campaign with Swift, which commenced on Feb 4. The source started showing renewed activity on Feb 8 (Romano et al., 2007) and underwent a powerful outburst on Feb 9 (Mangano et al., 2007a,b; Sidoli et al., 2007; Swank et al., 2007). This paper presents our observations of IGR J11215–5952 and is organized as follows. In Sect. 2 we describe our observations and data reduction; in Sect. 3 we describe our spatial, timing and spectral data analysis. Finally, in Sect. 4 we discuss our findings and draw our conclusions.

2. Observations and Data Reduction

Table 1 reports the log of the Swift/XRT observations used for this work. Thanks to Swift’s fast-slewing and flexible observing

http://arxiv.org/abs/0704.0543v1
P. Romano et al.: The fifth outburst of IGR J11215–5952 observed by Swift

[Figure 1 panels: (a) 1–10 keV light curve versus MJD (−54,000); (b) 1–10 keV light curve versus time (s since 2007-02-09 00:03:05 UT), with flares labelled 1, 2, 3; (c) Flare 1 and (d) Flare 2 in the 1–4 keV and 4–10 keV bands, with the 4–10/1–4 ratio, versus time (s).]

Fig. 1. XRT light curves, corrected for pile-up, PSF losses, vignetting and background-subtracted. a) 1–10 keV light curve for the whole campaign.
Different colours denote different observations (Table 1), and points before Feb 6 (MJD 54,137) and after Feb 15 (MJD 54,146) are drawn from the sum of several observations. Filled circles are full detections (S/N>3), triangles marginal detections (2 <|startoftext|> Introduction Background Model Functional representation of the grand partition function of an ionic model Effective Hamiltonian in the vicinity of the critical point Ginzburg temperature Summary Appendices Recurrence formulas for the cumulants Fourier space. The nth-particle structure factors of a one-component hard-sphere system in the Percus-Yevick approximation Explicit expression for SR2 Explicit expressions for the integrals used in equations (??)-(??) References

ABSTRACT According to extensive experimental findings, the Ginzburg temperature $t_{G}$ for ionic fluids differs substantially from that of nonionic fluids [Schr\"oer W., Weing\"{a}rtner H. 2004 {\it Pure Appl. Chem.} {\bf 76} 19]. A theoretical investigation of this outcome is proposed here by a mean field analysis of the interplay of short and long range interactions on the value of $t_{G}$. We consider a quite general continuous charge-asymmetric model made of charged hard spheres with additional short-range interactions (without electrostatic interactions the model belongs to the same universality class as the 3D Ising model). The effective Landau-Ginzburg Hamiltonian of the full system near its gas-liquid critical point is derived, from which the Ginzburg temperature is calculated as a function of the ionicity. The results obtained in this way for $t_{G}$ are in good qualitative and sufficient quantitative agreement with available experimental data. <|endoftext|><|startoftext|> Mostovoy Reply: In their Comment [1] Kenzelmann and Harris argue against the conclusion made in [2] that spiral magnets are in general ferroelectric. First of all, I believe, this conclusion was proved experimentally.
The systematic search for ferroelectricity in magnets with spiral ordering recently led to the discovery of new multiferroic materials, such as CoCr2O4 [3], MnWO4 [4, 5] and LiCu2O2 [6].

Furthermore, Kenzelmann and Harris argue that the continuum theory outlined in [2] leads to misleading predictions about the magnetically-induced electric polarization. To prove their point, they consider two hypothetical spin configurations shown in Fig. 1 (c) and (d) of their Comment, and argue that the results of the continuum theory are incompatible with crystal symmetries. While one cannot deny the importance of symmetry considerations, the arguments of Kenzelmann and Harris are themselves very misleading. They incorrectly assert that for the spin configurations shown in Fig. 1 (c) and (d) ‘the spiral theory’ would predict electric polarization along, respectively, the c and a axes.

The continuum model of multiferroics [2] is based on the assumption that the spin state can be described by a single magnetization vector. For TbMnO3 (see Fig. 1b), where the wave vector of the magnetic spiral is along the b axis and spins are rotating in the bc plane, it predicts electric polarization P along the c axis, in agreement with experiment. The magnetic structures (c) and (d) are of a different kind, as they are made of spirals rotating in opposite directions. Thus in the configuration (c) there are two counter-rotating bc spirals in each ab plane, which is why the net polarization along the c axis is zero. Similarly, in the configuration (d) the ab spirals in neighboring bc planes rotate in opposite directions, resulting in zero net Pa.

It is not difficult to modify the continuum model considered in [2] to describe these more general magnetic orders. For more than one magnetic ion per unit cell one can introduce several independent magnetic order parameters, which increases the number of possible magnetoelectric coupling terms.
For instance, all three spin configurations shown in Fig. 1 of the Comment can be described by three antiferromagnetic order parameters L1 = S1 + S2 − S3 − S4, L2 = S1 − S2 + S3 − S4, L3 = S1 − S2 − S3 + S4 (the labels of the 4 Mn ions in the unit cell of TbMnO3 are the same as in [7]). The spiral configuration (b) can be described by a single order parameter L1 with nonzero Lb and Lc . As discussed in [2], the magnetoelectric coupling linear in the gradient of the magnetic order parameter (Lifshitz invariant) allowed by symmetries has the form P c , which gives rise to magnetically-induced P c. The configuration (c) is described by two different order parameters, Lb . The term Lc does not transform like any of the components of P, so that the induced polarization is zero. Finally, for the configuration (d) with nonzero Lb and La , the only possible coupling term is , allowing for nonzero P c.

The point is, however, that the spin configurations (c) and (d) considered by Kenzelmann and Harris are very artificial, as it is difficult to find a system where interactions between spins would favor the simultaneous presence of counter-rotating spirals. The average interaction between counter-rotating spirals is zero, while for spirals with spins rotating in the same direction some interaction energy can always be gained by properly adjusting their relative phases. This is the reason why the simple model of Ref. [2] with a single vector order parameter successfully describes thermodynamics and magnetoelectric properties of many spiral multiferroics.

Maxim Mostovoy
Materials Science Center, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands

[1] M. Kenzelmann and A. B. Harris, Comment arXiv/cond-mat0610471.
[2] M. Mostovoy, Phys. Rev. Lett. 96, 067601 (2006).
[3] Y. Yamasaki, S. Miyasaka, Y. Kaneko, J.-P. He, T. Arima, and Y. Tokura, Phys. Rev. Lett. 96, 207204 (2006).
[4] K. Taniguchi, N. Abe, T. Takenobu, Y. Iwasa, and T. Arima, Phys.
Rev. Lett. 97, 097203 (2006).
[5] O. Heyer et al., J. Phys. Condens. Matter 18, L471 (2006).
[6] S. Park, Y. J. Choi, C. L. Zhang and S.-W. Cheong, to be published.
[7] A. B. Harris and G. Lawes, arXiv/cond-mat0508617.
http://arxiv.org/abs/0704.0545v1
References
ABSTRACT In response to the comment of Kenzelmann and Harris I show how the continuum theory of spiral multiferroics can be modified to describe general magnetic orders and discuss why the microscopic mechanism of magnetically-induced ferroelectricity usually makes such modifications unnecessary. This explains why the simple model with a single vector order parameter successfully describes thermodynamics and magnetoelectric properties of many spiral multiferroics. <|endoftext|><|startoftext|> PERSISTENT CURRENTS IN SUPERCONDUCTING QUANTUM INTERFERENCE DEVICES

F. Romeo
Dipartimento di Fisica “E. R. Caianiello”, Università degli Studi di Salerno I-84081 Baronissi (SA), Italy

R. De Luca
CNR-INFM and DIIMA, Università degli Studi di Salerno I-84084 Fisciano (SA), Italy

ABSTRACT Starting from the reduced dynamical model of a two-junction quantum interference device, a quantum analog of the system has been exhibited, in order to extend the well known properties of this device to the quantum regime. By finding eigenvalues of the corresponding Hamiltonian operator, the persistent currents flowing in the ring have been obtained. The resulting quantum analog of the overdamped two-junction quantum interference device can be seen as a supercurrent qubit operating in the limit of negligible capacitance and finite inductance.

PACS: 74.50.+r, 85.25.Dq
Keywords: Josephson junctions, d. c. SQUID, Qubit

I INTRODUCTION

The d. c. SQUID (Superconducting QUantum Interference Device) is a well known system, widely investigated in the literature [1-3].
This system, though not confined to atomic scale in its dimensions, has been proposed as the basic unit for quantum computing (qubit) by resorting to a characteristic feature of superconductivity: macroscopic quantum coherence [4]. In general, a qubit can be realized by means of a two-level quantum mechanical system [5]. Therefore, the quantum states of a qubit can be linear combinations of the orthogonal basis states $|0\rangle$ and $|1\rangle$, so that the Hilbert space generated by this basis is two-dimensional. Alternatively, a qubit state can be represented by elements of an infinite-dimensional Hilbert space. In this case, however, the effective potential of the system must be a double well, in such a way that one of the two stationary states can be defined as state $|0\rangle$ and the other as state $|1\rangle$.

The electrodynamic properties of d. c. SQUIDs can be analyzed by means of two-junction quantum interferometer models, where each Josephson junction is assumed to be in the overdamped regime. The simplest possible analysis of these systems is done by assuming negligible values of the inductance $L$ of a single branch of the device, so that $\beta = L I_J/\Phi_0 \approx 0$, where $\Phi_0$ is the elementary flux quantum, and $I_J = (I_{J1} + I_{J2})/2$ is the mean value of the maximum Josephson currents of the junctions. In this case, the dynamical equations for the superconducting phase differences $\phi_1$ and $\phi_2$ across the two junctions can be written as a single equation for the average phase variable $\varphi = (\phi_1 + \phi_2)/2$. This equation is similar to the nonlinear differential equation governing the time evolution of a single overdamped junction, so that it can be defined as an equivalent single junction model, and is written as follows:

$\frac{d\varphi}{d\tau} + \cos(\pi\psi_{ex})\sin\varphi = \frac{i_B}{2}$, (1)

where $\tau$ is the normalized time, defined through a resistance $R$ built from the resistive junction parameters $R_1$ and $R_2$, $\psi_{ex}$ is the externally applied flux normalized to $\Phi_0$, and $i_B$ is the bias current normalized to $I_J$.
Following the same type of approach, by means of a perturbation analysis, taking $\beta$ as the perturbation parameter, it can be shown that, to first order in $\beta$, the equivalent single junction model can be written as follows for a symmetric SQUID with identical junctions [6]:

$\frac{d\varphi}{d\tau} + (-1)^n \cos(\pi\psi_{ex})\sin\varphi + \pi\beta\,\sin^2(\pi\psi_{ex})\sin 2\varphi = \frac{i_B}{2}$, (2)

where $n$ is an integer. This model makes it possible, at least for small values of the parameter $\beta$, to calculate in closed form some electrodynamic quantities, such as, for example, the amplitude of the half-integer Shapiro steps appearing in these systems [7]. It has also been shown that, by extending this model to SQUIDs with non-identical junctions, one can obtain an effective classical double-well potential in which the transition from one state to the other can be enhanced by applying a suitable external magnetic flux [8]. However, this classical analysis by itself does not allow one to define the quantum states of the system. Nonetheless, the equation of motion (2) could be assumed to be a classical version of the time evolution of a quantum phase state. Therefore, the aim of the present work is to obtain, starting from the time evolution of the superconducting phase difference $\varphi$, the quantum mechanical Hamiltonian and to compute, by means of this quantum mechanical system, which in the classical limit reduces to Eq. (2), the persistent currents in the SQUID. It is interesting to notice that the resulting “Hamiltonian” quantum model derives from the overdamped limit of a classical dissipative system in the presence of a double well potential. The present analysis can be seen as an alternative approach to the study of the quantum properties of supercurrent qubits: it allows one to study the response of the quantum system in the limit of negligible capacitance and finite values of the inductance, as opposed to the case usually considered in the literature, where negligible inductance and finite capacitance are assumed [9 - 10].
II FROM CLASSICAL TO QUANTUM MECHANICS
Let us consider the classical dynamical equation $\dot{x} = f(x)$ for the state variable $x$, where the dot notation indicates the derivative with respect to the normalized time $\tau$. Taking the time derivative of both sides of this equation, we obtain the equation of motion of the quantity $\dot{x}$ as follows:
$\ddot{x} = f_x(x)\,\dot{x} = f(x)\,f_x(x)$, (3)
where the notation $f_x(x)$ stands for the derivative of $f(x)$ with respect to $x$. Given the above equation and following the procedure described by Huang and Lin [11], the Lagrangian associated with this problem is obtained in the form:
$L = \frac{1}{2}\left[\dot{x}^2 + f(x)^2\right]$. (4)
Starting from the Lagrangian $L$, the Legendre transformation gives the following classical Hamiltonian:
$H = \frac{1}{2}\left[\pi^2 - f(x)^2\right]$, (5)
where $\pi = \dot{x}$ is the canonical momentum conjugate to the variable $x$, while $U(x) = -\frac{1}{2}f(x)^2$ can be considered as an effective potential.
We are now interested in the quantization of the classical model described so far. According to the standard procedure, the classical Hamiltonian is transformed into a quantum operator by making the substitution $\pi \to \hat{\pi} = -i\partial_x$, $x \to \hat{x} = x$ (in dimensionless units). From these definitions, the commutation rule $[\hat{x}, \hat{\pi}] = i$ for the conjugate variables follows directly. Furthermore, the Hamiltonian operator can be written as $\hat{H} = -\frac{1}{2}\left[\partial_x^2 + f(x)^2\right]$.
The general procedure described above can be adopted to obtain the quantum model of an overdamped d.c. SQUID in the limit in which the reduced two-junction interferometer model [6] can be applied. In the framework of this model, the phase dynamics can be written (in the homogeneous case) as in Eq. (2), so that the function $f = f(\varphi)$ takes the following form:
$f(\varphi) = \gamma - a\sin\varphi - b\sin 2\varphi$, (6)
where $a = \cos(\pi\psi_{ex})$, $b = \pi\beta\sin^2(\pi\psi_{ex})$, and $\gamma = i_B/2$, having chosen $n = 0$ for simplicity.
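The effective potential $U(\varphi) = -f(\varphi)^2/2$ built from Eq. (6) can be explored numerically. The following is only a minimal sketch, assuming the forms $a = \cos(\pi\psi_{ex})$, $b = \pi\beta\sin^2(\pi\psi_{ex})$ and $\gamma = i_B/2$; the function names and the grid size are illustrative choices, not part of the original analysis.

```python
import math


def f(phi, beta, psi_ex, gamma=0.0):
    """Right-hand side of d(phi)/d(tau) = f(phi), as in Eq. (6) with n = 0."""
    a = math.cos(math.pi * psi_ex)
    b = math.pi * beta * math.sin(math.pi * psi_ex) ** 2
    return gamma - a * math.sin(phi) - b * math.sin(2.0 * phi)


def effective_potential(phi, beta, psi_ex, gamma=0.0):
    """Effective potential U(phi) = -f(phi)^2 / 2."""
    return -0.5 * f(phi, beta, psi_ex, gamma) ** 2


def local_minima(beta, psi_ex, gamma=0.0, samples=2000):
    """Local minima of U on a periodic grid over [0, 2*pi), deepest first.

    The grid resolution `samples` is an arbitrary (hypothetical) choice.
    """
    grid = [2.0 * math.pi * k / samples for k in range(samples)]
    u = [effective_potential(p, beta, psi_ex, gamma) for p in grid]
    minima = []
    for k in range(samples):
        # u[k - 1] wraps around for k = 0, which enforces periodicity.
        if u[k] < u[k - 1] and u[k] < u[(k + 1) % samples]:
            minima.append((grid[k], u[k]))
    return sorted(minima, key=lambda pair: pair[1])


if __name__ == "__main__":
    for gamma in (0.0, 0.05):
        wells = local_minima(beta=0.2, psi_ex=0.7, gamma=gamma)
        print(gamma, [(round(p, 3), round(u, 4)) for p, u in wells[:2]])
```

Comparing the two deepest wells for zero and non-zero `gamma` gives a quick check of whether a small bias current lifts the degeneracy of the double well.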
We notice that this analysis cannot be extended to the similar case, considered by Grønbech-Jensen et al. [12], of junctions with finite capacitance. By setting $x = \varphi$ and $\pi = \dot{\varphi}$ in the above general analysis, we notice that the phase and the voltage across the two-junction quantum interferometer are conjugate variables of the system. In the present case, therefore, by squaring $f(\varphi)$ and expressing the final result in terms of the higher harmonics of the phase variable instead of powers of trigonometric functions, the following Hamiltonian operator is obtained:
$\hat{H} = -\frac{1}{2}\partial_\varphi^2 - \frac{a^2 + b^2 + 2\gamma^2}{4} + \frac{a^2}{4}\cos 2\varphi + \frac{b^2}{4}\cos 4\varphi - \frac{ab}{2}\left(\cos\varphi - \cos 3\varphi\right) + \gamma\left(a\sin\varphi + b\sin 2\varphi\right)$, (7)
where the constant term $-(a^2 + b^2 + 2\gamma^2)/4$ is a flux dependent energy shift which will be important in the following discussion.
In order to calculate the relevant physical quantities of the system, we introduce the orthonormal complete basis $\langle\varphi|n\rangle = e^{in\varphi}/\sqrt{2\pi}$ of the infinite-dimensional Hilbert space with the inner product $\langle m|n\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-i(m-n)\varphi}\,d\varphi = \delta_{m,n}$, where $\delta_{m,n}$ is the Kronecker delta. In this representation the matrix elements of the Hamiltonian operator can be written as follows:
$\langle m|\hat{H}|n\rangle = \left[\frac{n^2}{2} - \frac{a^2 + b^2 + 2\gamma^2}{4}\right]\delta_{m,n} + \frac{a^2}{8}\left(\delta_{m,n+2} + \delta_{m,n-2}\right) + \frac{b^2}{8}\left(\delta_{m,n+4} + \delta_{m,n-4}\right) - \frac{ab}{4}\left(\delta_{m,n+1} + \delta_{m,n-1}\right) + \frac{ab}{4}\left(\delta_{m,n+3} + \delta_{m,n-3}\right) - \frac{i\gamma}{2}\left[a\left(\delta_{m,n+1} - \delta_{m,n-1}\right) + b\left(\delta_{m,n+2} - \delta_{m,n-2}\right)\right]$, (8)
where the following useful relations have been used:
$\langle m|\sin(l\varphi)|n\rangle = -\frac{i}{2}\left(\delta_{m,n+l} - \delta_{m,n-l}\right)$, (9a)
$\langle m|\cos(l\varphi)|n\rangle = \frac{1}{2}\left(\delta_{m,n+l} + \delta_{m,n-l}\right)$. (9b)
Once the matrix elements of the Hamiltonian operator are known, we can diagonalize a reduced version of the complete infinite-dimensional matrix by introducing an energy cut-off. Such a procedure can be safely carried out when we need to characterize low energy states which are located very far from the cut-off and when the number of vectors in the basis of the reduced Hilbert space is large enough to capture the essential features of the low energy states.
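The truncation and diagonalization described above can be sketched numerically. This is a minimal illustration rather than the authors' code: it assumes the Hamiltonian $\hat{H} = -\frac{1}{2}[\partial_\varphi^2 + f(\varphi)^2]$ with $f(\varphi) = \gamma - a\sin\varphi - b\sin 2\varphi$ expanded in harmonics, a symmetric plane-wave basis $n = -n_{max}, \dots, n_{max}$ (an assumed cut-off convention), and hypothetical function names.

```python
import numpy as np


def hamiltonian_matrix(beta, psi_ex, gamma=0.0, n_max=10):
    """Truncated matrix <m|H|n> in the basis exp(i n phi)/sqrt(2 pi), |n| <= n_max."""
    a = np.cos(np.pi * psi_ex)
    b = np.pi * beta * np.sin(np.pi * psi_ex) ** 2
    n = np.arange(-n_max, n_max + 1)
    dim = n.size

    def delta(l):
        # Entries delta_{m, n+l}: one where the row index m equals column n plus l.
        return np.eye(dim, k=-l)

    def cos_l(l):
        return 0.5 * (delta(l) + delta(-l))

    def sin_l(l):
        return (delta(l) - delta(-l)) / 2j

    shift = -(a**2 + b**2 + 2.0 * gamma**2) / 4.0  # flux dependent energy shift
    H = np.diag(0.5 * n.astype(float) ** 2 + shift).astype(complex)
    H += 0.25 * a**2 * cos_l(2) + 0.25 * b**2 * cos_l(4)
    H -= 0.5 * a * b * (cos_l(1) - cos_l(3))
    H += gamma * (a * sin_l(1) + b * sin_l(2))
    return H


def lowest_levels(beta, psi_ex, gamma=0.0, n_max=10, count=4):
    """Lowest `count` eigenvalues of the truncated (Hermitian) Hamiltonian."""
    return np.linalg.eigvalsh(hamiltonian_matrix(beta, psi_ex, gamma, n_max))[:count]


if __name__ == "__main__":
    # Convergence check in the spirit of the text: roughly halve the basis size.
    print(lowest_levels(0.2, 0.7, n_max=5))
    print(lowest_levels(0.2, 0.7, n_max=10))
```

Comparing the two printed lists shows how sensitive the lowest levels are to the cut-off; `numpy.linalg.eigvalsh` is used because the truncated matrix is Hermitian.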
For instance, the Hilbert space spanned by the first 20 basis functions can be a very effective choice if we need to study only the lowest energy states close to the ground state of our system. In fact, in our case we have noted that halving the number of basis elements produces no evident difference in the lowest energy eigenvalues.
III PERSISTENT CURRENTS
Following the procedure described above, in the present section we derive the behavior of the persistent currents associated with each eigenstate of the Hamiltonian as a consequence of the time reversal symmetry breaking provided by the magnetic flux. Such a current, in units of the Josephson current divided by $2\pi$ (i.e., in units of $E_J/\Phi_0$, where $E_J$ is the Josephson energy), can be defined as follows:
$I_n = -\frac{\partial\varepsilon_n}{\partial\psi_{ex}}$, (10)
where $\varepsilon_n$ and $\psi_{ex}$ are the eigenvalues of the Hamiltonian and the normalized external magnetic flux, respectively. According to the above relation, the persistent current $I_n$ can be computed once the pertinent eigenvalue $\varepsilon_n$ is known. Furthermore, it should be noticed that, in the absence of the off-diagonal terms in the Hamiltonian given in Eq. (8), the state independent persistent current computed by means of Eq. (10) would be given by:
$I = -\frac{\pi}{4}\sin(2\pi\psi_{ex})\left[1 - 2\pi^2\beta^2\sin^2(\pi\psi_{ex})\right]$. (11)
The solution of the full problem can thus be seen as the state dependent correction to the above relation induced by the off-diagonal terms. This last point can be well understood by analyzing Figs. 1a-b. In these figures, even though finite off-diagonal terms are present, the relation given in Eq. (11) describes quite accurately the behavior of the persistent currents, which appear insensitive to the state index due to the small value of $\beta$. When the value of $\beta$ is raised (see Fig. 2a), the persistent currents start to become weakly state sensitive and some deformation of the original shape occurs. Furthermore, the states of higher energy (see Fig. 2b) induce a behavior of the persistent current which is quite insensitive to the state index. A further raising of $\beta$ (see Fig. 3a) induces a suppression of the persistent current carried by the first excited state in the vicinity of half integer values of the normalized applied magnetic flux. This implies that, in the low energy regime (i.e., when the quantum state can be written as $|S\rangle = S_0|0\rangle + S_1|1\rangle$, where $|0\rangle$ and $|1\rangle$ represent the ground state and the first excited state, respectively), the average persistent current $\langle I\rangle = |S_0|^2 I_0 + |S_1|^2 I_1$ close to a half integer flux is mainly related to the ground state properties of the system, since $|I_0| \gg |I_1|$ in the vicinity of $\psi_{ex} = 1/2$ (for $\psi_{ex} \neq 1/2$). In Fig. 4a, raising $\beta$ once again, it can be noticed that, in the vicinity of half integer values of the normalized flux, the ground state and the first excited state carry currents of opposite sign, inducing a competing magnetic behavior. Therefore, the average persistent current, and its magnetic behavior, depend on both coefficients of the decomposition (i.e., on $S_0$ and $S_1$). This last point implies that, by measuring the magnetic moment of the system in a particular magnetic field configuration, we can obtain constraints on the nature of the quantum superposition. For instance, under these conditions, we could prepare the quantum state in such a way that the average persistent current is negligibly small in the vicinity of half integer values of the normalized applied magnetic flux. Furthermore, we point out that a double well potential can be obtained by setting the model parameters as done in Fig. 5 ($\beta = 0.2$ and $\psi_{ex} = 0.7$), where the potential $U(\varphi)$ is shown. Indeed, we notice that for $\gamma = 0$ two low-energy degenerate states are present, the degeneracy being removed by means of a small current bias. Such a bias can drive the response of the system toward one of the two minima of the potential, allowing a complete control of the quantum state which can be exploited for technological applications.
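The definition of the persistent current as minus the flux derivative of an energy level can be checked numerically against the state independent expression. The sketch below is illustrative only: it assumes the diagonal energy $n^2/2 - (a^2 + b^2 + 2\gamma^2)/4$ with $a = \cos(\pi\psi_{ex})$ and $b = \pi\beta\sin^2(\pi\psi_{ex})$, takes Eq. (11) in the form $I = -\frac{\pi}{4}\sin(2\pi\psi_{ex})[1 - 2\pi^2\beta^2\sin^2(\pi\psi_{ex})]$, and uses hypothetical function names and step size.

```python
import math


def diagonal_energy(n, beta, psi_ex, gamma=0.0):
    """Energy of state n when the off-diagonal terms of the Hamiltonian are neglected."""
    a = math.cos(math.pi * psi_ex)
    b = math.pi * beta * math.sin(math.pi * psi_ex) ** 2
    return 0.5 * n**2 - (a**2 + b**2 + 2.0 * gamma**2) / 4.0


def persistent_current(energy, psi_ex, step=1e-5):
    """I = -d(energy)/d(psi_ex), estimated with a central finite difference.

    `energy` is any callable of psi_ex, e.g. a numerically computed eigenvalue.
    """
    return -(energy(psi_ex + step) - energy(psi_ex - step)) / (2.0 * step)


def current_state_independent(beta, psi_ex):
    """Closed-form current obtained from the diagonal energy shift alone."""
    s = math.sin(math.pi * psi_ex)
    return (-0.25 * math.pi * math.sin(2.0 * math.pi * psi_ex)
            * (1.0 - 2.0 * (math.pi * beta * s) ** 2))


if __name__ == "__main__":
    beta = 0.075
    for psi in (0.1, 0.3, 0.7):
        numeric = persistent_current(lambda p: diagonal_energy(0, beta, p), psi)
        print(psi, numeric, current_state_independent(beta, psi))
```

The same `persistent_current` helper can be applied to eigenvalues of the full truncated Hamiltonian to see the state dependent corrections discussed in this section.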
Finally, we notice that, even though the chosen $\beta$ values in Figs. 2-5 are close to the validity limits of the first order approximation of the reduced model in ref. [6], the above characteristic response of the system remains qualitatively valid, since we are here considering the leading order in the value of $\beta$.
IV CONCLUSION
Starting from the reduced dynamical model of the two-junction quantum interference device, the applied flux dependence of persistent currents in this system has been studied in the quantum regime. The extension of the dissipative overdamped classical system from classical to quantum mechanics allows one to consider the electrodynamical response of a supercurrent quantum bit in the limit of negligible capacitance and finite inductance. For null bias current and for suitable values of the externally applied magnetic flux, the quantum analog of the two-junction interferometer shows an effective potential with a degenerate ground state; the degeneracy can be removed by applying a non-null control bias current. In the literature, the quantum behavior of the two-junction quantum interference device is studied by considering the charging energy of the junctions as preponderant with respect to the energy of the circulating currents [5, 9, 10]. In the present work it is shown that it is possible to obtain a Hamiltonian quantum analog of d.c. SQUIDs containing overdamped junctions in the limit of null capacitance and finite inductance values. In this framework, a flux qubit can be realized under conditions quite different from those with high junction capacitance [5]. Finally, the present analysis can also be considered as a link between classical dissipative systems and their corresponding quantum mechanical models.
REFERENCES
1. A. Barone and G. Paternò, Physics and Applications of the Josephson Effect (Wiley, New York, 1982).
2. K. K. Likharev, Dynamics of Josephson Junctions and Circuits (Gordon and Breach, Amsterdam, 1986).
3. J. Clarke and A. I. Braginski, Eds., The SQUID Handbook, Vol. I (Wiley-VCH, Weinheim, 2004).
4. M. F. Bocko, A. M. Herr, and M. J. Feldman, IEEE Trans. Appl. Supercond. 7, 3638 (1997).
5. J. B. Majer, F. G. Paauw, A. C. J. ter Haar, C. J. P. M. Harmans, and J. E. Mooij, Phys. Rev. Lett. 94, 090501 (2005).
6. F. Romeo and R. De Luca, Phys. Lett. A 328, 330 (2004).
7. C. Vanneste, C. C. Chi, W. J. Gallagher, A. W. Kleinsasser, S. I. Raider, and R. L. Sandstrom, J. Appl. Phys. 64, 242 (1988).
8. R. De Luca and F. Romeo, Phys. Rev. B 73, 214518 (2006).
9. G. Burkard, Phys. Rev. B 71, 144511 (2005).
10. T. P. Orlando, J. E. Mooij, Lin Tian, Caspar H. van der Wal, L. S. Levitov, Seth Lloyd, and J. J. Mazo, Phys. Rev. B 60, 15398 (1999).
11. Y.-S. Huang and C.-L. Lin, Am. J. Phys. 70, 741 (2002).
12. N. Grønbech-Jensen, D. B. Thompson, M. Cirillo, and C. Cosmelli, Phys. Rev. B 67, 224505 (2003).
FIGURE CAPTIONS
Fig. 1 (a) Persistent currents $I_1$ (triangle) and $I_2$ (box) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.075$ and $\gamma = 0$. (b) Persistent currents $I_3$ (star) and $I_4$ (diamond) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.075$ and $\gamma = 0$.
Fig. 2 (a) Persistent currents $I_1$ (triangle) and $I_2$ (box) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.15$ and $\gamma = 0$. (b) Persistent currents $I_3$ (star) and $I_4$ (diamond) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.15$ and $\gamma = 0$.
Fig. 3 (a) Persistent currents $I_1$ (triangle) and $I_2$ (box) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.2$ and $\gamma = 0$. (b) Persistent currents $I_3$ (star) and $I_4$ (diamond) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.2$ and $\gamma = 0$.
Fig. 4 (a) Persistent currents $I_1$ (triangle) and $I_2$ (box) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.25$ and $\gamma = 0$. (b) Persistent currents $I_3$ (star) and $I_4$ (diamond) plotted as a function of the applied external flux $\psi_{ex}$ for $\beta = 0.25$ and $\gamma = 0$.
Fig. 5 Density plot of the potential $U(\varphi)$ as a function of the phase $\varphi$ and of the normalized bias current $\gamma$, with the remaining parameters set to $\beta = 0.2$ and $\psi_{ex} = 0.7$. Lower energy states are represented by darker regions in the plot.
ABSTRACT
Starting from the reduced dynamical model of a two-junction quantum interference device, a quantum analog of the system has been exhibited, in order to extend the well known properties of this device to the quantum regime. By finding eigenvalues of the corresponding Hamiltonian operator, the persistent currents flowing in the ring have been obtained. The resulting quantum analog of the overdamped two-junction quantum interference device can be seen as a supercurrent qubit operating in the limit of negligible capacitance and finite inductance.
<|endoftext|><|startoftext|> Draft version August 6, 2018. Preprint typeset using LATEX style emulateapj v. 4/12/04
MID-INFRARED FINE STRUCTURE LINE RATIOS IN ACTIVE GALACTIC NUCLEI OBSERVED WITH SPITZER IRS: EVIDENCE FOR EXTINCTION BY THE TORUS
R. P. Dudik, J. C. Weingartner, S. Satyapal, J. Fischer, C. C. Dudley, & B. O'Halloran
ABSTRACT
We present the first systematic investigation of the [NeV] (14µm/24µm) and [SIII] (18µm/33µm) infrared line flux ratios, traditionally used to estimate the density of the ionized gas, in a sample of 41 Type 1 and Type 2 active galactic nuclei (AGNs) observed with the Infrared Spectrograph on board Spitzer.
The majority of galaxies with both [NeV] lines detected have observed [NeV] line flux ratios consistent with or below the theoretical low density limit, based on calculations using currently available collision strengths and ignoring absorption and stimulated emission. We find that Type 2 AGNs have lower line flux ratios than Type 1 AGNs and that all of the galaxies with line flux ratios below the low density limit are Type 2 AGNs. We argue that differential infrared extinction to the [NeV] emitting region due to dust in the obscuring torus is responsible for the ratios below the low density limit and we suggest that the ratio may be a tracer of the inclination angle of the torus to our line of sight. Because the temperature of the gas, the amount of extinction, and the effect of absorption and stimulated emission on the line ratios are all unknown, we are not able to determine the electron densities associated with the [NeV] line flux ratios for the objects in our sample. We also find that the [SIII] emission from the galaxies in our sample is extended and originates primarily in star forming regions. Since the emission from low-ionization species is extended, any analysis using line flux ratios from such species obtained from slits of different sizes is invalid for most nearby galaxies.
Subject headings: Galaxies: Active; Galaxies: Starbursts; X-rays: Galaxies; Infrared: Galaxies
1. INTRODUCTION
Mid-infrared (mid-IR) emission-line spectroscopy of active galactic nuclei (AGNs) is used to investigate the physical conditions of the dust-enshrouded gas that is in close proximity to the active nucleus. In particular, many spectral lines are emitted in the so-called narrow-line region (NLR) of these objects, which typically extends from tens to at most a thousand parsecs from the nucleus (Capetti et al. 1995, 1997; Schmitt & Kinney 1996; Falcke et al. 1998; Ferruit et al. 1999; Schmitt et al. 2003).
1 George Mason University, Department of Physics & Astronomy, MS 3F3, 4400 University Drive, Fairfax, VA 22030
2 Naval Research Laboratory, Remote Sensing Division, 4555 Overlook Ave SW, Washington DC, 20375
The NLRs of AGNs have been studied extensively using optical spectroscopic observations. However, there have been very few systematic studies of the NLR using infrared spectroscopic observations. Infrared (IR) fine-structure emission lines have a number of special characteristics that have been regarded as distinct advantages, particularly in determining the electron density of the ionized gas very close to the central AGN. Infrared spectroscopic observations allow access to fine-structure lines from ions with higher ionization potentials than the most widely used optical diagnostic lines. This is important in many AGNs, where a significant fraction of the line emission from lower ionization species can originate in gas ionized by star forming regions. In addition, it is generally assumed that the density-sensitive infrared line ratios originate in gas with temperatures around $10^4$ K and are less dependent on electron temperature variations, enabling a more straightforward determination of the electron density in the ionized gas. Finally, it has long been assumed that the IR diagnostic line ratios are insensitive to reddening corrections, which are a serious impediment to optical and ultraviolet observations, particularly in the NLRs of AGNs, where the dust composition and spatial distribution are highly uncertain. For these reasons, IR spectroscopic observations, especially since the era of the Infrared Space Observatory (ISO), have provided us with some of the most reliable tools for studying the NLRs in AGNs.
However, while there are clear advantages of mid-IR fine-structure diagnostics in studying the physical state of the ionized gas, very little work has been done to investigate their robustness in determining the gas densities of the NLRs in a large sample of AGNs. The Spitzer Space Telescope Infrared Spectrograph (IRS), with its extraordinary sensitivity and spectral resolution, offers the opportunity to examine for the first time the physical state of NLR gas in a large sample of AGNs.
The focus of most previous comparative studies of the infrared fine-structure lines in AGNs has been on the excitation state of the ionized gas, in an effort to determine the existence and energetic importance of potentially buried AGNs and to constrain their ionizing radiation fields (Genzel et al. 1998, Lutz et al. 1999, Alexander & Sternberg 1999, Sturm et al. 2002, Satyapal, Sambruna, & Dudik 2004, Spinoglio et al. 2005). Remarkably, very little work has been done in the infrared on studying the line flux ratios traditionally used to probe the NLR gas densities in a significant number of AGNs. We present in this paper the first systematic infrared spectroscopic study of the line flux ratios of [NeV] and [SIII] in order to 1) test the robustness of these line ratios as density diagnostics and 2) if possible, to probe the densities of the NLR gas in a large sample of AGNs.
http://arxiv.org/abs/0704.0547v2
2. THE SAMPLE
We searched the Spitzer archive for galaxies with an active nucleus and both high- and low-resolution Infrared Spectrograph (IRS; Houck et al. 2004) observations currently available. Only those galaxies with indisputable optical, X-ray, or radio signatures of active nuclei (such as broad Hα or X-ray or radio point sources) were included in our sample. The sample includes three AGN subclasses: Seyferts, LINERs, and Quasars.
The galaxies in this sample span a wide range of distances (4 to 400 Mpc; median = 21 Mpc), Hubble types, bolometric luminosities (log($L_{BOL}$) ∼ 40 to 46; median = 43), and Eddington ratios (log($L/L_{Edd}$) ∼ -6.5 to 0.3; median = -2.5). The entire sample consists of 41 galaxies. The basic properties of the sample are given in Table 1. The black hole masses listed in Table 1 were derived using resolved stellar kinematics, if available, reverberation mapping, or by applying the correlation between optical bulge luminosity and central black hole mass determined in nearby galaxies only when the host galaxy was clearly resolved. Bolometric luminosities listed in Table 1 were calculated from the X-ray luminosities for most objects. For Seyferts, the relationship $L_{BOL} = 10 \times L_X$ was adopted (Elvis 1994). For LINERs (we include all galaxies that are classified as LINERs using either the Heckman (1980) or Veilleux & Osterbrock (1987) diagnostic diagrams) we assumed $L_{BOL} = 34 \times L_X$, as derived from the spectral energy distribution of a sample of nearby LINERs from Ho (1999) (see also Dudik et al. 2005 and Satyapal et al. 2005). The bolometric luminosities and black hole masses for quasars and radio galaxies were taken from Woo & Urry (2002) and Marchesini, Celotti, & Ferrarese (2004), respectively. A detailed discussion of our methodology and justification of assumptions for determining black hole masses and bolometric luminosities for the various AGN classes represented in Table 1 can be found in Satyapal et al. (2005) and Dudik et al. (2005). Table 1 also lists the AGN type (1 or 2) for the galaxies in our sample based on the presence or absence of broad (full width at half maximum (FWHM) exceeding 1000 km s$^{-1}$) Balmer emission lines in the optical spectrum. We emphasize that the objects in our sample were selected on the basis of the availability of high resolution IRS Spitzer observations. The sample should therefore not be viewed as complete in any sense.

Table 1: Properties of the Sample
Galaxy Name      Distance  Hubble   log(M_BH)  log(L_X)  log(L/L_Edd)  AGN Type
                 (Mpc)     Type
(1)              (2)       (3)      (4)        (5)       (6)           (7)
Seyferts
NGC4151          13        SABab    7.13^a     42.7^b    -1.53         1^r
NGC1365          19        SBb      7.64^b     41.3^d    -3.42         2^s
NGC1097          15        SBb      ···        40.7^e    ···           ···
NGC7469          65        SABa     6.84^a     44.3^a    0.34          1^t
NGC4945          4         SBcd     7.35^b     42.5^f    -1.97         2^u
Circinus         4         SAb      7.72^b     42.1^g    -2.74         2^v
Mrk 231          169       SAc      7.24^c     42.2^h    -2.16         ···
Mrk3             54        S0       8.65^a     43.5^a    -2.21         2^w
Cen A            3         S0       7.24^b     41.8^i    -2.54         2^x
Mrk463           201       Merger   ···        43.0^j    ···           2^y
NGC 4826         8         SAab     6.76^b     ···       ···           ···
NGC 4725         16        SABab    7.40^b     ···       ···           2^r
1 ZW 1           245       Sa       ···        43.9^k    ···           ···
NGC 5033         19        SAc      7.39^b     41.4^l    -3.13         1^r
NGC1566          20        SABbc    6.92^a     43.5^a    -0.57         1^t
NGC 2841         9         SAb      8.21^a     42.7^a    -2.64         ···
NGC 7213         24        SA0      7.99^a     43.30^a   -1.79         ···
LINERs
NGC4579          17        SABb     7.85^b     41.0^b    -3.47         ···
NGC3031          4         SAab     7.79^b     40.2^b    -4.16         ···
NGC6240          98        Merger   9.15^b     44.2^b    -1.52         2^z
NGC5194          8         SAbc     6.90^b     41.0^b    -2.43         2^r
MRK266NE         112       Merger   ···        40.9^b    ···           2^t
NGC7552          21        SBab     6.99^b     ···       ···           ···
NGC 4552         17        ···      8.57^b     39.6^b    -5.52         ···
NGC 3079         15        SBc      7.58^b     40.1^m    -4.05         ···
NGC 1614         64        SBc      6.94^b     ···       ···           ···
NGC 3628         10        SAb      7.86^b     39.9^n    -4.58         ···
NGC 2623         74        Pec      6.83^b     ···       ···           2^aa
IRAS23128-5919   178       Merger   ···        41.0^b    ···           2^bb
MRK273           151       Merger   7.74^b     44.0^o    -0.31         2^t
IRAS20551-4250   171       Merger   7.52^c     40.9^b    -3.23         ···
NGC3627          10        SABb     7.16^b     39.4^p    -4.33         2^r
UGC05101         158       S        ···        40.9^b    ···           1^cc
NGC4125          18        E6       8.50^b     38.6^b    -6.47         ···
NGC 4594         10        SAa      9.04^b     40.1^q    -5.47         ···
Quasars
PG 1351+640      353       ···      8.48^a     44.5^a    -1.08         ···
PG 1211+143      324       ···      7.49^a     44.8^a    0.22          1^t
PG 1119+120      201       ···      ···        ···       ···           1^y
PG 2130+099      252       Sa       7.74^a     44.47^a   -0.37         1^y
PG 0804+761      400       ···      8.24^a     44.93^a   -0.41         1^dd
PG 1501+106      146       E        ···        ···       ···           1^y
Columns Explanation: Col(1): Common source names; Col(2): Distance (for $H_0$ = 75 km s$^{-1}$ Mpc$^{-1}$); Col(3): Morphological class; Col(4): Log of the mass of the central black hole in solar masses; Col(5): Log of the hard X-ray luminosity (2-10 keV) in erg s$^{-1}$; Col(6): Log of the Eddington ratio; Col(7): AGN type based on the presence or absence of broad Balmer emission lines. We include all galaxies that are classified as LINERs using either the Heckman (1980) or Veilleux & Osterbrock (1987) diagnostic diagrams.
References: a Woo & Urry 2002, b Satyapal et al. 2005, c Tacconi et al. 2002, d Risaliti et al. 2005, e Terashima et al. 2002, f Guainazzi et al. 2000, g Smith & Wilson 2001, h Gallagher et al. 2002, i Evans et al. 2004, j Imanishi & Terashima et al. 2004, k Gallo et al. 2004, l Terashima et al. 1999, m Cappi et al. 2006, n Roberts, Schurch, & Warwick 2001, o Balestra et al. 2005, p Georgantopoulos et al. 2002, q Dudik et al. 2005, r Ho et al. 1997, s Storchi-Bergmann, Mulchaey, & Wilson 1992, t Veron-Cetty & Veron 2003, u Marconi et al. 2000, v Oliva et al. 1994, w Khachikian & Weedman 1974, x Veron-Cetty & Veron 1986, y Dahari & De Robertis 1988, z Andreasian, Khachikian, & Ye 1987, aa Laine et al. 2003, bb Duc, Mirabel, & Maza 1997, cc Sanders et al. 1988, dd Thompson 1992.

3. DATA ANALYSIS AND RESULTS
We extracted archival spectral data obtained using the short-wavelength, low-resolution module (SL2, 3.6"×57", λ = 5.2-7.7µm) and both the short-wavelength, high-resolution (SH, 4.7"×11.3", λ = 9.9-19.6µm) and long-wavelength, high-resolution (LH, 11.1"×22.3", λ = 18.7-37.2µm) modules of IRS. The data presented here were preprocessed by the IRS pipeline (version 13.0) at the Spitzer Science Center (SSC) prior to download. Preprocessing includes ramp fitting, dark-sky subtraction, droop correction, linearity correction, flat-fielding, and flux calibration (see Spitzer Observers Manual, Chapter 7, http://ssc.spitzer.caltech.edu/documents/som/irs60.pdf). The Spitzer data were further processed using the SMART v. 5.5.7 analysis package (Higdon et al. 2004). The slit for the SH and LH modules is too small for background subtraction to take place, and separate SH or LH background observations do not exist for any of the galaxies in this sample. For the SL2 module, background subtraction was done using either a designated background file when available or the interactive source extraction option. In the case of the latter, the exact position of the slit on the host galaxy was first checked using Leopard, the data archive access tool available from the SSC. The source was then carefully defined according to the boundary of the slit and the edge of the host galaxy. The background was defined at the edge of the slit, where no other obvious source was present. In some cases, the slit was enveloped in the host galaxy and background subtraction could not take place. For both high and low resolution spectra, the ends of each order were manually cut from the rest of the spectrum.
The 41 observations presented in this work are archived from various programs, including the SINGS Legacy Program, and therefore contain both mapping and staring observations. All of the staring observations were centered on the nucleus of the galaxy. The SH, LH, and SL2 staring observations include data from two slit positions overlapping by one third of a slit. In order to isolate the nuclear region in the mapping observations so that we might compare them to the staring observations, we extracted only those 3 overlapping slit positions coinciding with either radio or 2MASS nuclear coordinates. Because the slits in both the mapping and staring observations occupy distinctly different regions of the sky, the slits cannot be averaged unless the emission originates from a compact source that is contained entirely in each slit.
Therefore the procedure for flux extraction was the following: 1) If the fluxes measured from the two slits differed by no more than the calibration error of the instrument, then the fluxes were averaged; otherwise, the slit with the highest measured line flux was chosen. 2) If an emission line was detected in one slit, but not in the other, then the detection was selected. This is true for all of the high and low resolution staring and mapping observations.
In Tables 2 and 3 we list the line fluxes and statistical errors from the SH and LH observations for the [NeV] 14.3µm and 24.3µm lines, the [SIII] 18.1µm and 33.5µm lines, as well as the 6.2µm PAH emission feature. For all galaxies with previously published fluxes, we list in Tables 2 and 3 the published flux values. Our values differ by no more than a factor of 1.9, much less in most cases, from the Weedman et al. (2005) or Armus et al. (2004, 2006) published values. These differences can be attributed to differences in the pipeline used for preprocessing. In all cases detections were defined when the line flux was at least 3σ. For the absolute photometric flux uncertainty we conservatively adopt 15%, based on the assessed values given by the Spitzer Science Center (SSC) over the lifetime of the mission (see Spitzer Observers Manual, Chapter 7, http://ssc.spitzer.caltech.edu/documents/som/som7.1.irs.pdf, and IRS Data Handbook, Chapter 7.2, http://ssc.spitzer.caltech.edu/irs/dh/dh20v2.pdf). This error is calculated from multiple observations of various standard stars throughout the Spitzer mission by the SSC. The dominant component of the total error arises from the uncertainty at mid-IR wavelengths in the stellar models used in calibration and is systematic rather than Gaussian in nature. We note that the spectral resolution of the SH and LH modules of IRS (λ/∆λ ∼ 600) is insufficient to resolve the velocity structure for most of the lines.
There are a few galaxies which do show slightly broadened [NeV] line profiles (FWHM ∼ 200 - 1200 km s$^{-1}$). These results will be discussed in a future paper.
Abundance-independent density estimates can readily be obtained using infrared fine-structure transitions from like ions in the same ionization state with different critical densities. The density diagnostics available in the IRS spectra of our objects are: [NeV] 14.32µm, 24.32µm ($n_{crit}$ ∼ 4.9 × $10^4$ cm$^{-3}$ and 2.7 × $10^4$ cm$^{-3}$, where $n_{crit} = A_{ul}/\gamma_{ul}$, with $A_{ul}$ the Einstein A coefficient and $\gamma_{ul}$ the rate coefficient for collisional de-excitation from the upper to the lower level), [NeIII] 15.55µm, 36.04µm ($n_{crit}$ ∼ 3 × $10^5$ cm$^{-3}$ and 5 × $10^4$ cm$^{-3}$, Giveon et al. 2002), and [SIII] 18.71µm, 33.48µm ($n_{crit}$ ∼ 1.5 × $10^4$ cm$^{-3}$ and 4.1 × $10^3$ cm$^{-3}$). The results are very insensitive to the shape of the ionizing continuum. Since the [NeIII] 36µm line was either not detected or was outside the wavelength range of the LH module in virtually all galaxies, we omit any analysis of the [NeIII] line ratio from this work.
4. THE [NEV] LINE FLUX RATIOS
In Figure 1 we plot the calculated 14µm/24µm line luminosity ratio as a function of electron density $n_e$ for gas temperatures T = $10^4$ K, $10^5$ K, and $10^6$ K. We include only the five levels of the ground $2s^2 2p^2$ configuration and neglect absorption and stimulated emission. The results are nearly identical if only the lowest three levels of the ground term are included. We adopt collision strengths from Griffin & Badnell (2000) and radiative transition probabilities from Galavis, Mendoza, & Zeippen (1997).
Fig. 1. [NeV] 14µm/24µm line flux ratio versus electron density, $n_e$, for gas temperatures T = $10^4$ K, $10^5$ K, and $10^6$ K.
In Table 2, we list the observed [NeV] line flux ratios and their associated calibration uncertainties.
In calculating the upper and lower limits on the ratios, $R_{MAX}$ and $R_{MIN}$, shown in Table 2, we did not propagate the errors in quadrature as would be appropriate for statistical uncertainties, but propagated them as follows:
$R_{MAX} = \frac{F_{[NeV]14} + 0.15\,F_{[NeV]14}}{F_{[NeV]24} - 0.15\,F_{[NeV]24}}$
$R_{MIN} = \frac{F_{[NeV]14} - 0.15\,F_{[NeV]14}}{F_{[NeV]24} + 0.15\,F_{[NeV]24}}$
We note that this is conservative, since some components of the calibration errors should cancel in the ratio. Both line fluxes were measured for 19 galaxies. In what follows we compare the line flux ratios measured in all but one, Mrk 266, for reasons that are discussed in detail in Section 5.2. Of these 18 AGNs, 13 have ratios that are consistent with the low density limit to within the uncertainties, while only 2, both Type 1, have ratios significantly above it. The remaining 3, all Type 2, have ratios significantly below the low-density limit. Interestingly, we note that a similar range of ratios was also measured with the ISO SWS (Sturm et al. 2002, Alexander et al. 1999). There are several possible explanations for this finding. The observed, unphysically low ratios could result from artifacts introduced by variations in the slit sizes from which the line fluxes are obtained, from calibration uncertainties, or from substantial mid-IR extinction. Alternatively, perhaps important physical processes were neglected in calculating the theoretical ratios. In addition, errors in the collisional rate coefficients for the [NeV] transitions associated with the mid-infrared lines may be important. We explore these scenarios in the following sections.
Observational Effects: Because the IRS LH slit is larger than the SH slit, if the [NeV] emission is extended, or multiple AGNs are present, the 14/24 µm line ratio will be artificially reduced.
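The conservative limits $R_{MAX}$ and $R_{MIN}$ defined earlier in this section can be sketched in a few lines. This is only an illustration of the arithmetic under the stated 15% calibration uncertainty; the function names and the example fluxes are hypothetical and are not taken from Table 2.

```python
import math


def ratio_limits(flux_14, flux_24, calibration=0.15):
    """Return (R, R_MIN, R_MAX) for the [NeV] 14/24 micron line flux ratio.

    The limits shift the two fluxes in opposite directions by the fractional
    calibration uncertainty, instead of adding the errors in quadrature.
    """
    if flux_14 <= 0 or flux_24 <= 0:
        raise ValueError("line fluxes must be positive")
    ratio = flux_14 / flux_24
    r_max = (flux_14 + calibration * flux_14) / (flux_24 - calibration * flux_24)
    r_min = (flux_14 - calibration * flux_14) / (flux_24 + calibration * flux_24)
    return ratio, r_min, r_max


def quadrature_limits(flux_14, flux_24, calibration=0.15):
    """Same ratio with the two fractional errors combined in quadrature, for comparison."""
    ratio = flux_14 / flux_24
    sigma = ratio * math.sqrt(2.0) * calibration
    return ratio, ratio - sigma, ratio + sigma


if __name__ == "__main__":
    # Hypothetical fluxes in arbitrary but identical units.
    print(ratio_limits(3.0, 4.0))
    print(quadrature_limits(3.0, 4.0))
```

Printing both results side by side shows that the adopted limits bracket the ratio more widely than a quadrature combination would, which is the sense in which they are conservative.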
However, since the ionization potential of [NeV] is ∼ 97 eV, we expect that the [NeV]-emitting gas is ionized by the AGN radiation field only and is concentrated very close to the central source. Virtually all of the [NeV] fluxes presented in this work were obtained from IRS staring observations. Thus it is impossible to determine whether the emission is extended using Spitzer observations alone. However, a number of galaxies have been observed at 14 and 24 µm by ISO. In Table 2 we list, in addition to our Spitzer [NeV] fluxes, all available [NeV] fluxes from ISO. The ISO aperture at 14 and 24 µm (14″ × 27″) is much larger than either the SH or LH slits. In Figure 2 we plot the ratio of the [NeV] flux measured by ISO to that measured by Spitzer for both the 14 and 24 µm lines. The ranges of the [NeV] line flux ratios are consistent with the instrument uncertainties and are similar for all galaxies in the sample. Only the 14 µm ratio for Mrk 266 falls outside of the expected range. This strongly suggests that the [NeV] emission is indeed compact and originates in the NLR and that the ratios are not affected by aperture variations, except for Mrk 266, which is discussed in detail in Section 5.2. If the data were affected by aperture variations, we would expect to see an overall systematic increase of the 14µm/24µm line ratio with distance (see Figure 3). The Spearman rank correlation coefficient ($r_S$, Kendall & Stuart 1976) corresponding to this plot is -0.069 (with a probability of chance correlation of 0.78), where a coefficient of 1 or -1 indicates a strong correlation and a coefficient of 0 indicates no correlation. Thus we find that there is no correlation between the [NeV] ratio and distance in our sample. However, this does not completely rule out aperture effects if the size of the [NeV]-emitting region increases with the bolometric luminosity of the AGN and the sample displays a significant trend in bolometric luminosity with distance. In this case, a correlation between the [NeV] ratio and distance would not be apparent, since aperture variations would affect all galaxies in the same way, regardless of distance. However, this scenario is unlikely, since the size of the [NeV]-emitting region would have to increase proportionately with distance in order to remain extended beyond the slit for all galaxies. Nevertheless, we checked for this possibility, both by examining the [NeV] ratio vs. bolometric luminosity and by plotting the ratio vs. distance, binning the galaxies according to their bolometric luminosity. We find neither to be correlated over 5 orders of magnitude in $L_{\rm BOL}$. Thus, in the case of the [NeV] line flux ratio, we find no indication that ratios below the low-density limit are artifacts of aperture effects.

[Figure 2. Panels: "[NeV] 14 micron Ratio" and "[NeV] 24 micron Ratio"; horizontal axis: F[NeV] (ISO) / F[NeV] (Spitzer); Mrk 266 is labeled.]

Fig. 2.— Ratio of the ISO to Spitzer [NeV] 14µm and 24µm fluxes for those galaxies with overlapping observations. The range indicated with arrows is that corresponding to the absolute flux calibration for ISO (20%) and Spitzer (15%). Within the calibration uncertainties of the instruments, the [NeV] fluxes are virtually the same for all of the galaxies except Mrk 266 (see Section 6.2). This strongly suggests that the [NeV] emission is compact and originates in the NLR. We note that Sturm et al. 2002 find that the [NeV] 24µm detection for NGC 7469 is questionable. The ISO to Spitzer ratio for this galaxy (0.43) is the lowest shown here.

We point out that the [NeV] 24µm IRS line fluxes in the small overlapping sample plotted in Figure 2 are systematically higher than the corresponding ISO-SWS fluxes, despite the smaller IRS slit. This indicates that one or both of the instruments is affected by systematic errors more severe than are indicated by the calibration uncertainty estimates.
The SWS band 3D that includes the [NeV] 24µm line was characterized by strong fringing effects that, when combined with the narrow range of the line scan mode, sometimes introduced large uncertainties in the baseline fitting, and therefore in the line flux measurement accuracy. In contrast, the baseline fitting over the entire Spitzer IRS SH and LH full spectra can be much more accurate. Moreover, pointing accuracy and stability are an order of magnitude improved over those obtained by ISO. We therefore assume in the sections that follow that the adopted conservative Spitzer IRS calibration uncertainties are accurate characterizations of the IRS measurements. Importantly, regardless of which instrument is used, [NeV] ratios consistent with the low-density limit have been observed in a number of sources with both ISO (e.g., Sturm et al. 2002: NGC 1365, NGC 7582, NGC 4151, NGC 5506; Alexander et al. 1999: NGC 4151) and Spitzer (Weedman et al. 2005, Haas et al. 2005, and this work).

[Figure 3. Horizontal axis: log(Distance, Mpc); legend: Seyferts, Liners, Quasars; Mrk 266 is labeled.]

Fig. 3.— The [NeV] 14µm/24µm ratio as a function of distance. Open symbols signify Type 1 AGNs; filled symbols signify Type 2 AGNs. The error bars shown here mark the calibration uncertainties on the line ratio. If the ratio were indeed affected by aperture variations, we would expect a systematic increase of the ratio with distance. As can be seen here, this is not the case, and we find no indication that the low ratio is attributable to aperture effects.

Extinction: We consider the possibility that mid-IR differential extinction toward the [NeV]-emitting regions is responsible for the low [NeV] line ratios. Adopting the low-density limit (LDL) for the intrinsic value of the ratio ([NeV] 14µm/24µm ∼ 0.83 for $n_e \le 200$ cm$^{-3}$) for galaxies with ratios below the LDL, the observed line ratio gives a lower limit to the extinction, for a given MIR extinction curve.
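The step from an observed ratio to a lower limit on the extinction can be sketched as follows. This is a minimal illustration that assumes simple foreground-screen attenuation, so that the observed ratio equals the intrinsic ratio times $10^{-0.4(A_{14} - A_{24})}$. The function names and the curve coefficients `k14` and `k24` (the values of $A_{14}/A_V$ and $A_{24}/A_V$ for whichever extinction curve is adopted) are hypothetical inputs to be supplied by the reader, not values taken from this paper.

```python
import math

LDL_RATIO = 0.83  # low-density-limit [NeV] 14/24 ratio quoted in the text


def differential_extinction_mag(r_obs, r_intrinsic=LDL_RATIO):
    """A(14um) - A(24um), in magnitudes, needed to lower r_intrinsic to r_obs.

    Assumes a foreground screen: r_obs = r_intrinsic * 10**(-0.4 * (A14 - A24)).
    """
    return 2.5 * math.log10(r_intrinsic / r_obs)


def visual_extinction(r_obs, k14, k24, r_intrinsic=LDL_RATIO):
    """Lower limit on A_V for an extinction curve with A14/A_V = k14, A24/A_V = k24.

    k14 and k24 are user-supplied (hypothetical) curve coefficients. A curve
    with k14 <= k24 cannot lower the 14/24 ratio, so it is rejected here.
    """
    dk = k14 - k24
    if dk <= 0:
        raise ValueError("curve extinguishes 24um at least as much as 14um")
    return differential_extinction_mag(r_obs, r_intrinsic) / dk


# A ratio at half the low-density limit needs 2.5*log10(2) mag of
# differential extinction between 14 and 24 microns.
print(differential_extinction_mag(0.415))
```

Rejecting curves with `k14 <= k24` mirrors the point made in this section that a curve with greater extinction at 24 µm than at 14 µm cannot explain ratios below the LDL.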
We examined the visual extinctions corresponding to the mid-IR differential extinction derived using three separate extinction curves: (1) the Draine (1989) extinction curve amended by the more recent ISO SWS extinction curve toward the Galactic center for 2.5-10 µm (Lutz et al. 1996); (2) the Chiar & Tielens (2006) extinction curve for the Galactic Center, based on 2.38-40 µm ISO SWS observations of a bright IR source in the Quintuplet cluster (GCS3-I); and (3) the Chiar & Tielens (2006) extinction curve for the local ISM, based on 2.38-40 µm ISO SWS observations of a WC-type Wolf-Rayet (WR) star (WR98a). The Draine (1989) and Lutz et al. (1996) extinction curve yields $A_V \sim 3$ to 99 mag (see Table 2). However, these values result from an extinction law that is unexplored beyond 10 µm. The Chiar & Tielens (2006) Galactic center extinction curve cannot explain the observed [NeV] ratios, since its extinction at 24 µm is greater than its extinction at 14 µm, so we do not discuss it further. The visual extinction resulting from their local ISM extinction curve is unrealistically high (median $A_V$ = 500 mag). The calculated extinctions obtained using the Draine (1989), Lutz et al. (1996), and Chiar & Tielens (2006) local ISM extinction curves are given in Table 2.

The $A_V$ values derived from the two extinction curves described above are extremely high in many cases. Even if the extinction is calculated from the upper limit on the ratio to the LDL for the three galaxies whose upper limits are below the LDL, the corresponding visual extinction is still very high (for the Draine 1989 and Lutz et al. 1996 extinction curve, $A_V$ = 21, 26, and 30 mag for these three galaxies; for the Chiar & Tielens extinction curve, $A_V$ = 260, 330, and 370 mag). However, we caution the reader that the actual value of the extinction is highly uncertain. Indeed, very little is known about the 8-40 µm extinction curve in AGNs. Specifically, the 10 and 18 µm silicate features in this band are the source of inconsistency.
Even within the AGN class, extinction may vary dramatically from 8-40 µm because of variations in the silicate features due to differences in grain size, porosity, shape, composition, abundance, and location in each galaxy. Hao et al. (2005) show that in five AGNs (4 of which are in our sample), both silicate features vary considerably in strength and width. Sturm et al. (2005) also show that the standard ISM silicate models do not accurately fit NGC 3998, a LINER with silicate emission. Sturm et al. (2005) suggest that increased grain size and possibly the presence of crystalline silicates such as clino-pyroxenes may improve the fit, but that circumnuclear dust in AGNs clearly has very different properties than dust in the Galactic ISM (see also Maiolino et al. 2001a, 2001b, but see Weingartner & Murray 2002 for an alternative view). Chiar & Tielens (2006) even show that the GC observations and the local ISM observations within the Galaxy deviate from each other most dramatically in the wavelength region between the two silicate absorption features. In their observations, this is the region between ∼ 12-15 µm, directly overlapping with the 14 µm values in which we are interested. Because of the irregularity of the silicate features in the mid-IR, it is very difficult to interpret the true extinction there. Moreover, in addition to the uncertainty in the extinction law, the geometry of the obscuring material is unknown and can vary substantially from galaxy to galaxy. The most that can be said here for the galaxies with ratios below the LDL is that if extinction is responsible for the low ratios, then the extinction must be less at 24 µm than at 14 µm.

Physical Processes: It is possible that important physical processes have been neglected in calculating the [NeV] line luminosity ratio as a function of electron density shown in Figure 1.
We consider three physical processes that may affect the line ratios:

(1) A source of gas heating in addition to photoionization (e.g., shocks, turbulence) that may yield gas temperatures substantially higher than $10^4$ K. As can be seen in Figure 1, higher gas temperatures do not yield significantly lower line ratios in the low-density limit, but could explain the generally low values of the ratios that lie above the LDL.

(2) Pumping from the ground term to the first excited term, e.g., by O III resonance lines. The specific energy density required for this to significantly affect the line ratio exceeds $10^{-14}$ erg cm$^{-3}$ Hz$^{-1}$, which is implausibly large by orders of magnitude.

(3) Absorption and stimulated emission within the ground term, which could be important if, e.g., a large quantity of warm dust yielding copious 24 µm continuum emission is located close to the [NeV]-emitting region.

Figures 4a through 4e show the line ratio as a function of the specific energy density at 24 µm, $u_\nu(24\,\mu{\rm m})$. We display results for electron density $n_e = 10^2$, $10^3$, $10^4$, $10^5$, and $10^6$ cm$^{-3}$; gas temperature $T = 10^4$, $10^5$, and $10^6$ K; and ratio of the specific energy density at 14 and 24 µm, $u_\nu(14\,\mu{\rm m})/u_\nu(24\,\mu{\rm m})$ = 0.4, 1.0, and 1.8 (values were chosen to reproduce the observed range of the 14µm/24µm continuum flux ratios; see Section 5).

For the moment, assume that the NeV is located sufficiently far from the source of the 14 and 24 µm continuum emission to treat the source as a point. If hot dust within or near the inner edge of the torus is responsible for this emission, then this assumption requires that the distance to the NeV, $r_{\rm Ne}$, be large compared with the dust sublimation radius, $r_{\rm sub} \sim 1\,{\rm pc}\;L_{\rm bol,46}^{1/2}$ (Ferland et al. 2002), where $L_{\rm bol,46}$ is the bolometric luminosity in units of $10^{46}$ erg s$^{-1}$.
In this case, we can obtain a simple estimate of $u_\nu(24\,\mu{\rm m})$ at the location of the [NeV]-emitting region from the observed specific flux $F_\nu(24\,\mu{\rm m})$, the distance to the galaxy $D$, and $r_{\rm Ne}$:

$$u_\nu(24\,\mu{\rm m}) \sim \frac{F_\nu(24\,\mu{\rm m})}{c}\left(\frac{D}{r_{\rm Ne}}\right)^{2}. \qquad (3)$$

With $r_{\rm Ne}$ = 100 pc, $u_\nu(24\,\mu{\rm m})$ estimated in this way ranges from ≈ $10^{-24}$ erg cm$^{-3}$ Hz$^{-1}$ to somewhat less than $10^{-20}$ erg cm$^{-3}$ Hz$^{-1}$ for the galaxies in our sample. These can be compared to the results of Hönig et al. (2006), who modeled the infrared emission from clumpy tori. They presented plots of $F_\nu$ at a distance of 10 Mpc from an AGN with bolometric luminosity $L_{\rm bol} = 4 \times 10^{45}$ erg s$^{-1}$. Extrapolating to a distance of 100 pc, we find $u_\nu(24\,\mu{\rm m})$ as large as a few times $10^{-21}$ erg cm$^{-3}$ Hz$^{-1}$, close to the estimate for the most luminous AGN in our sample.

From Figures 4a through 4e, we see that the infrared continuum can only reduce the line ratio significantly at $r_{\rm Ne} \approx 100$ pc if $T \gtrsim 10^5$ K when $n_e \sim 10^2$ cm$^{-3}$ and $T \gtrsim 10^6$ K when $n_e \sim 10^3$ cm$^{-3}$. However, the NeV, as a high-ionization species, may lie closer to the central source than does the bulk of the narrow line region. If $r_{\rm Ne} \approx 10$ pc, then $u_\nu(24\,\mu{\rm m})$ increases by a factor ∼ 100. In this case, the observed low line ratios can be explained by this mechanism with $T \sim 10^4$ K, if $n_e \sim 10^2$ cm$^{-3}$. Higher values of electron density would require higher gas temperatures.

In Section 5.1, we suggest that the [NeV]-emitting region may lie within the torus. In this case, absorption and stimulated emission within the ground term are probably important. For the high-luminosity objects, these may even dominate over collisional excitation and de-excitation. At these central locations, gas temperatures $T \sim 10^6$ K may be natural (Ferland et al. 2002). Relatively high densities may also be expected, in which case the infrared continuum may not appreciably depress the line ratio (see Figure 4d).
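The point-source estimate of the specific energy density described above can be illustrated numerically. The sketch below assumes the dilution form $u_\nu \approx (F_\nu/c)(D/r_{\rm Ne})^2$, which is our reading of the estimate built from $F_\nu$, $D$, and $r_{\rm Ne}$. The function name, argument units, and the 1 Jy example flux are illustrative assumptions, not measurements from this paper.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s
PC_PER_MPC = 1.0e6          # parsecs per megaparsec


def energy_density(f_nu, distance_mpc, r_ne_pc):
    """Specific energy density (erg cm^-3 Hz^-1) at distance r_ne_pc from a point source.

    f_nu         : observed specific flux in erg s^-1 cm^-2 Hz^-1
    distance_mpc : distance from the observer to the galaxy, in Mpc
    r_ne_pc      : assumed distance of the [NeV] gas from the source, in pc
    Assumes u_nu = (f_nu / c) * (D / r_Ne)**2, i.e. inverse-square dilution.
    """
    dilution = (distance_mpc * PC_PER_MPC / r_ne_pc) ** 2
    return (f_nu / C_CM_PER_S) * dilution


# Hypothetical example: a 1 Jy (1e-23 erg s^-1 cm^-2 Hz^-1) source at 10 Mpc.
u_100pc = energy_density(1e-23, 10.0, 100.0)
u_10pc = energy_density(1e-23, 10.0, 10.0)
print(u_100pc, u_10pc / u_100pc)
```

Moving the emitting gas from 100 pc to 10 pc raises the estimate by a factor of 100, which is the scaling quoted in the text.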
Adopting the Mathews & Ferland (1987) spectrum and $T \approx 10^6$ K, the ionization parameter must be $U \equiv n_\gamma/n_e \sim 10^{-3}$ in order for a substantial fraction of the Ne to be NeV; $n_\gamma$ is the number density of H-ionizing photons. For this spectrum, $n_\gamma \approx 1.7 \times 10^{3}\, L_{\rm bol,46}\, r_{\rm Ne,100}^{-2}$ cm$^{-3}$, where $r_{\rm Ne,100} = r_{\rm Ne}/100$ pc. If $r_{\rm Ne}$ = 1 pc, then either (1) $n_e \sim 10^{10}\, L_{\rm bol,46}$ cm$^{-3}$ or (2) the nuclear continuum is filtered through a far-UV/X-ray-absorbing medium before reaching the [NeV]-emitting region.

If absorption and stimulated emission are indeed relevant processes in [NeV] line production, we might expect a relationship between the [NeV] line flux ratio and the 24 µm continuum luminosity that is consistent with one of the curves shown in Figures 4a through 4e. In Figure 5 we plot this relationship for the [NeV]-emitting galaxies in our sample. As can be seen in Figure 5, we find no relationship between the [NeV] line flux ratio and the 24 µm continuum luminosity for our sample of galaxies. The Spearman rank correlation coefficient for this plot is -0.01 (probability of chance correlation = 0.95), indicating no correlation. As a result, as can be seen from Figures 4a and 4b, stimulated emission and absorption at low densities can be ruled out as possible scenarios, because the scatter plot shown in Figure 5 does not follow the model predictions. We note that this conclusion assumes that the location of the [NeV]-emitting region relative to the source of the 24 µm continuum emission is uniform among the galaxies in the sample; variations in the location might obscure any correlation in these plots. Figures 4c and 4d reveal that, for some values of $n_e$, $T$, and $u_\nu(14\,\mu{\rm m})/u_\nu(24\,\mu{\rm m})$, the line ratio is very insensitive to the value of $u_\nu(24\,\mu{\rm m})$. In these cases, the line ratio remains above ∼ 0.8. Thus, although absorption and stimulated emission may be contributing processes to [NeV] production, another mechanism is required to explain the low (<0.8) [NeV] line flux ratios in our sample.
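For readers who want to reproduce the kind of rank-correlation test quoted in this section, a minimal pure-Python sketch of the Spearman coefficient is given below. It computes the Pearson correlation of the ranks, with tied values sharing their average rank; it does not compute the probability of chance correlation. The helper names and the input lists are made-up illustrations, not data from this paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Illustrative (invented) values: a mostly increasing relation.
print(spearman_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A coefficient near 0, such as the -0.01 reported above, indicates that the ranks of the two quantities are essentially unrelated.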
Computed Quantities: Finally, it is possible that there is significant error in the adopted collisional rate coefficients. The accuracy of collision strengths of infrared atomic transitions has been a longstanding question. We adopt the collisional rate coefficients from the state-of-the-art IRON project (Hummer et al. 1993), which produced the most up-to-date and accurate collision strengths for a large database of atomic transitions. While these calculations have been questioned based on recent ISO observations of nebulae (Clegg et al. 1987, Oliva et al. 1996, Rubin et al. 2002, Rubin 2004), it is likely that the discrepancies between the observational and theoretical values can be explained by inaccuracies in the fluxes employed (van Hoof et al. 2000). Uncertainties in the collisional rate coefficients for the [NeV] transitions are unlikely to exceed 30% (van Hoof et al. 2000). It is therefore unlikely that the low critical densities implied by our data can be attributed to uncertainties in the theoretical values of the [NeV] collision strengths.

Fig. 4.— The [NeV] line ratio as a function of the specific energy density at 24µm, $u_\nu(24\,\mu{\rm m})$, for temperatures $T = 10^4$, $10^5$, $10^6$ K, for 14µm/24µm continuum ratios of 0.4, 1.0, and 1.8, and for electron densities $n_e = 10^2$, $10^3$, $10^4$, $10^5$, $10^6$ cm$^{-3}$.

Table 2: NeV Line Fluxes and Derived Extinction

Galaxy (1) | [NeV] 14.32 µm, SH (2) | [NeV] 14.32 µm, ISO (3) | [NeV] 24.32 µm, LH (4) | [NeV] 24.32 µm, ISO (5) | [NeV] Ratio (6) | A_V, ratio to LDL, D&L (7) | A_V, ratio to HDL, D&L (8) | A_V, ratio to LDL, C&T (9) | A_V, ratio to HDL, C&T (10)
Seyferts
NGC4151 | 7.77^a | 5.50^c | 6.77^a | 5.60^c | 1.15 (+0.41, −0.30) | ··· | 126 | ··· | 1686
NGC1365 | 2.20±0.06 | 2.50^c | 5.36±0.06 | 3.90^c | 0.41 (+0.14, −0.11) | 45 | 189 | 570 | 2519
NGC1097 | <0.05 | ··· | <0.18 | ··· | ··· | ··· | ··· | ··· | ···
NGC7469 | 1.16^a | <1.50^c | 1.47^a | 0.63^c,∗ | 0.79 (+0.28, −0.21) | 3 | 149 | 41 | 1990
NGC4945 | 0.28±0.03 | <0.50^d | <0.75 | ··· | >0.38 | ··· | ··· | ··· | ···
Circinus | 23.94±0.61 | 31.70^c | 24.00±3.90 | 21.80^c | 1.00 (+0.35, −0.26) | ··· | 135 | ··· | 1799
Mrk 231 | <0.44^a | <1.50^e | <0.69^a | ··· | ··· | ··· | ··· | ··· | ···
Mrk3 | 6.45^a | 4.60^c | 6.75^a | 3.40^c | 0.96 (+0.34, −0.25) | ··· | 138 | ··· | 1835
Cen A | 2.32^a | 2.70^c | 2.99^a | 2.00^c | 0.77 (+0.27, −0.20) | 4 | 150 | 56 | 2005
Mrk463 | 1.83^b | 1.40^c | 2.04^b | ··· | 0.90 (+0.32, −0.23) | ··· | 141 | ··· | 1886
NGC 4826 | ··· | ··· | ··· | ··· | ··· | ··· | ··· | ··· | ···
NGC 4725 | <0.09 | ··· | 0.09±0.03 | ··· | <1.04 | ··· | ··· | ··· | ···
1 ZW 1 | <0.11^a | 0.27^c | <0.10^a | ··· | ··· | ··· | ··· | ··· | ···
NGC 5033 | 0.07±0.02 | ··· | 0.11±0.02 | ··· | 0.65 (+0.23, −0.17) | 16 | 161 | 198 | 2146
NGC1566 | 0.16±0.05 | ··· | 0.22±0.04 | ··· | 0.74 (+0.26, −0.19) | 7 | 153 | 92 | 2041
NGC 2841 | <0.04 | ··· | <0.03 | ··· | ··· | ··· | ··· | ··· | ···
NGC 7213 | <0.04 | ··· | <0.09 | ··· | ··· | ··· | ··· | ··· | ···
LINERs
NGC4579 | <0.06 | ··· | <0.03 | ··· | ··· | ··· | ··· | ··· | ···
NGC3031 | <0.06 | ··· | <0.04 | ··· | ··· | ··· | ··· | ··· | ···
NGC6240 | 0.51^b | <1.00^e | <0.39^b | ··· | <1.31 | ··· | ··· | ··· | ···
NGC5194 | 0.41±0.04 | <0.20^c | 0.39±0.09 | ··· | 1.06 (+0.37, −0.28) | ··· | 131 | ··· | 1751
MRK266∗∗ | 0.21±0.02 | 0.50^f | 1.19±0.06 | ··· | 0.18 (+0.06, −0.05) | 100 | 240 | 1254 | 3203
NGC7552 | <0.11 | ··· | <0.83 | ··· | ··· | ··· | ··· | ··· | ···
NGC 4552 | <0.06 | ··· | <0.07 | ··· | ··· | ··· | ··· | ··· | ···
NGC 3079 | <0.07^a | ··· | <0.14^a | ··· | ··· | ··· | ··· | ··· | ···
NGC 1614 | <0.28 | ··· | <1.49 | ··· | ··· | ··· | ··· | ··· | ···
NGC 3628 | <0.06 | ··· | <0.34 | ··· | ··· | ··· | ··· | ··· | ···
NGC 2623 | 0.30±0.04 | ··· | 0.47±0.07 | ··· | 0.63 (+0.22, −0.14) | 17 | 163 | 218 | 2167
IRAS23128··· | 0.22±0.02 | <0.40^e | 0.34±0.10 | ··· | 0.65 (+0.23, −0.22) | 16 | 161 | 203 | 2152
MRK273 | 1.06±0.05 | 0.82^e | 2.74±0.19 | ··· | 0.39 (+0.14, −0.10) | 49 | 192 | 617 | 2565
IRAS20551··· | <0.06 | <0.25^e | <0.25 | ··· | ··· | ··· | ··· | ··· | ···
NGC3627 | 0.08±0.01 | ··· | 0.19±0.05 | ··· | 0.45 (+0.16, −0.12) | 40 | 184 | 504 | 2453
UGC05101 | 0.52^b | <1.50^e | 0.49^b | ··· | 1.06 (+0.37, −0.28) | ··· | 131 | ··· | 1750
NGC4125 | <0.03 | ··· | <0.07 | ··· | ··· | ··· | ··· | ··· | ···
NGC 4594 | <0.03 | ··· | <0.04 | ··· | ··· | ··· | ··· | ··· | ···
Quasars
PG1351··· | <0.04 | ··· | <0.07 | ··· | ··· | ··· | ··· | ··· | ···
PG1211··· | 0.04±0.007 | ··· | <0.04 | ··· | >1.01 | ··· | ··· | ··· | ···
PG1119··· | 0.30±0.06 | ··· | 0.22±0.02 | ··· | 1.39 (+0.49, −0.36) | ··· | 115 | ··· | 1531
PG2130··· | 0.42±0.03 | ··· | 0.42±0.05 | ··· | 1.00 (+0.35, −0.26) | ··· | 135 | ··· | 1798
PG0804··· | <0.06 | ··· | <0.07 | ··· | ··· | ··· | ··· | ··· | ···
PG1501··· | 0.78±0.02 | ··· | 0.83±0.02 | ··· | 0.94 (+0.33, −0.25) | ··· | 138 | ··· | 1846

Columns Explanation: Col(1): Common source names; Col(2): 14.32 µm [NeV] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from Spitzer; Col(3): 14.32 µm [NeV] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from ISO; Col(4): 24.32 µm [NeV] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from Spitzer; Col(5): 24.32 µm [NeV] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from ISO; Col(6): [NeV] line ratio used in plots and calculations; Col(7): Extinction required to bring ratios below the low-density limit (LDL) up to the LDL, calculated using the Draine (1989) extinction curve amended by the more recent ISO SWS extinction curve toward the Galactic center for 2.5-10 µm (Lutz et al. 1996); Col(8): Extinction required to bring ratios below the LDL up to the high-density limit (HDL), calculated using the same Draine (1989) and Lutz et al. (1996) curve; Col(9): Extinction required to bring ratios below the LDL up to the LDL, calculated using the Chiar & Tielens (2006) extinction curve for the local ISM; Col(10): Extinction required to bring ratios below the LDL up to the HDL, calculated using the Chiar & Tielens (2006) extinction curve for the local ISM.
∗ Sturm et al. 2002 find that the [NeV] 24µm detection for NGC 7469 is questionable.
∗∗ As discussed in detail in Section 6.2, Mrk 266 is the only galaxy in our sample where we find that aperture variation may affect the observed [NeV] line flux ratio. For this reason it has been excluded from relevant plots and calculations.
References for Table 2: a Weedman et al. 2005, b Armus et al. 2004 & 2006, c Sturm et al. 2002, d Verma et al. 2003, e Genzel et al. 1998, f Prieto & Viegas.

Table 3: SIII Line Fluxes and Derived Extinction

Galaxy (1) | [SIII] 18.71 µm, SH (2) | [SIII] 18.71 µm, ISO (3) | [SIII] 33.48 µm, LH (4) | [SIII] 33.48 µm, ISO (5) | [SIII] Ratio (6) | A_V, ratio to LDL, D&L (7) | A_V, ratio to HDL, D&L (8) | A_V, ratio to LDL, C&T (9) | A_V, ratio to HDL, C&T (10) | PAH 6.2 µm, SL2 (11)
Seyferts
NGC4151 | 7.50^a | 5.40^c | 6.57^a | 8.10^c | 1.14 (+0.40, −0.30) | ··· | ··· | ··· | ··· | 168.1
NGC1365 | 5.73±0.05 | 13.50^c | 27.20±0.38 | 36.10^c | 0.21 (+0.07, −0.05) | ··· | ··· | ··· | ··· | 132.3
NGC1097 | 2.18±0.02 | ··· | 11.40±0.23 | ··· | 0.19 (+0.07, −0.05) | ··· | ··· | ··· | ··· | 151.7
NGC7469 | 7.70^a | 9.20^c | 9.80^a | 10.40^c | 0.79 (+0.28, −0.20) | ··· | ··· | ··· | ··· | 415.0
NGC4945 | 3.18±0.03 | 6.30^d | 38.70±1.80 | 51.40^d | 0.08 (+0.03, −0.02) | ··· | ··· | ··· | ··· | 671.7
Circinus | 19.10±0.70 | 35.20^c | 56.30±3.31 | 93.20^c | 0.37 (+0.13, −0.10) | ··· | ··· | ··· | ··· | 1018.4
Mrk 231 | <0.47^a | <3.00^e | <2.30^a | <3.00^e | ··· | ··· | ··· | ··· | ··· | 175.6
Mrk3 | 5.55^a | ··· | 5.25^a | ··· | 1.06 (+0.37, −0.28) | ··· | 83 | ··· | 72 | 18.4
Cen A | 4.54^a | 6.40^c | 14.80^a | 22.30^c | 0.31 (+0.11, −0.08) | ··· | ··· | ··· | ··· | 220.8
Mrk463 | 1.50^b | <0.80^c | 1.35^b | 1.20^c | 1.11 (+0.39, −0.29) | ··· | 81 | ··· | 70 | 55.4
NGC 4826 | 3.39±0.03^f | ··· | 4.61±0.08^f | ··· | 0.74 (+0.26, −0.19) | ··· | 96 | ··· | 82 | 66.2
NGC 4725 | 0.02±0.02^f | ··· | 0.11±0.02^f | ··· | 0.23 (+0.08, −0.06) | 23 | 135 | 20 | 116 | 3.7
1 ZW 1 | <0.11^a | <0.50^c | <0.18^a | <1.00^c | ··· | ··· | ··· | ··· | ··· | 22.2
NGC 5033 | 0.85±0.11 | ··· | 2.42±0.10 | ··· | 0.35 (+0.12, −0.09) | ··· | ··· | ··· | ··· | 33.9
NGC1566 | 0.55±0.05^f | ··· | 0.55±0.06^f | ··· | 1.00 (+0.35, −0.26) | ··· | 85 | ··· | 73 | 61.0
NGC 2841 | 0.22±0.04^f | ··· | 0.29±0.03^f | ··· | 0.75 (+0.26, −0.20) | ··· | 95 | ··· | 82 | 10.9
NGC 7213 | 0.47±0.05 | ··· | 0.59±0.06 | ··· | 0.80 (+0.28, −0.21) | ··· | ··· | ··· | ··· | 28.7
LINERs
NGC4579 | 0.32±0.06^f | <0.78^g | 0.24±0.03^f | <1.20^g | 1.33 (+0.47, −0.35) | ··· | 75 | ··· | 65 | 17.3
NGC3031 | 0.61±0.03 | ··· | 0.09±0.09 | ··· | 0.67 (+0.24, −0.18) | ··· | ··· | ··· | ··· | 23.4
NGC6240 | 1.99^b | <4.00^e | 2.63^b | 4.50^e | 0.76 (+0.27, −0.20) | ··· | 95 | ··· | 81 | 399^b
NGC5194 | 1.06±0.05^f | 1.00^d | 1.48±0.03^f | 4.60^d | 0.72 (+0.25, −0.19) | ··· | 96 | ··· | 83 | 26.4
MRK266∗∗ | 1.00±0.13 | ··· | 4.65±0.09 | ··· | 0.21 (+0.08, −0.06) | 25 | 138 | 22 | 118 | 23.1
NGC7552 | 17.11±0.08^f | 24.60^d | 13.38±0.41^f | 41.10^d | 1.28 (+0.45, −0.33) | ··· | 77 | ··· | 66 | 872.0
NGC 4552 | 0.07±0.03^f | ··· | 0.06±0.02^f | ··· | 1.29 (+0.45, −0.34) | ··· | 76 | ··· | 66 | 21.5
NGC 3079 | 1.25^a | 6.80^g | 6.08^a | 6.60^g | 0.21 (+0.07, −0.05) | ··· | ··· | ··· | ··· | 620.3
NGC 1614 | 9.63±0.27 | ··· | 11.60±0.43 | ··· | 0.83 (+0.29, −0.15) | ··· | 91 | ··· | 79 | 508.5
NGC 3628 | 2.14±0.03 | ··· | 15.80±0.33 | ··· | 0.14 (+0.05, −0.04) | ··· | ··· | ··· | ··· | 430.4
NGC 2623 | 0.88±0.05 | ··· | 3.16±0.20 | ··· | 0.28 (+0.10, −0.07) | 16 | 129 | 14 | 111 | 128.6
IRAS23128··· | 2.62±0.12 | 0.89^e | 2.11±0.18 | 2.80^e | 1.24 (+0.44, −0.32) | ··· | 78 | ··· | 67 | 90.1
MRK273 | 1.24±0.07 | <0.82^e | 3.88±0.40 | 2.30^e | 0.32 (+0.11, −0.08) | 12 | 124 | 10 | 107 | 69.2
IRAS20551··· | 0.66±0.06 | 0.30^e | 1.18±0.13 | 1.40^e | 0.56 (+0.20, −0.15) | ··· | 105 | ··· | 90 | 38.3
NGC3627 | 0.38±0.03^f | ··· | 0.57±0.09^f | ··· | 0.67 (+0.24, −0.18) | ··· | 99 | ··· | 85 | 153.8
UGC05101 | 0.98^b | <1.40^e | 1.30^b | 2.50^e | 0.75 (+0.27, −0.20) | ··· | 95 | ··· | 82 | 190^b
NGC4125 | ···^f | ··· | 0.06±0.05^f | ··· | ··· | ··· | ··· | ··· | ··· | 14.9
NGC 4594 | 0.39±0.03 | ··· | 1.24±0.13 | ··· | 0.32 (+0.11, −0.08) | ··· | ··· | ··· | ··· | 14.2
Quasars
PG1351··· | 0.34±0.06 | ··· | <0.13 | ··· | >2.70 | ··· | ··· | ··· | ··· | 29.6
PG1211··· | <0.06 | ··· | <0.08 | ··· | ··· | ··· | ··· | ··· | ··· | 25.8
PG1119··· | <0.13 | ··· | 0.19±0.06 | ··· | <0.71 | ··· | ··· | ··· | ··· | 8.7
PG2130··· | <0.19 | ··· | 0.34±0.06 | ··· | <0.55 | ··· | ··· | ··· | ··· | 26.9
PG0804··· | <0.06 | ··· | <0.21 | ··· | ··· | ··· | ··· | ··· | ··· | 27.4
PG1501··· | 0.67±0.15 | ··· | 0.41±0.05 | ··· | 1.64 (+0.58, −0.43) | ··· | 68 | ··· | 59 | 19.2

Columns Explanation: Col(1): Common source names; Col(2): 18.71 µm [SIII] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from Spitzer; Col(3): 18.71 µm [SIII] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from ISO; Col(4): 33.48 µm [SIII] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from Spitzer; Col(5): 33.48 µm [SIII] line flux and statistical error in units of $10^{-20}$ W cm$^{-2}$ from ISO; Col(6): [SIII] line flux ratio used for plots and calculations; Col(7): Extinction required to bring ratios below the low-density limit (LDL) up to the LDL, calculated using the Draine (1989) extinction curve amended by the more recent ISO SWS extinction curve toward the Galactic center for 2.5-10 µm (Lutz et al. 1996), for those galaxies with distances greater than 55 Mpc that are not affected by aperture variations; Col(8): Extinction required to bring ratios below the LDL up to the high-density limit (HDL), calculated using the same Draine (1989) and Lutz et al. (1996) curve, for those galaxies with distances greater than 55 Mpc that are not affected by aperture variations; Col(9): Extinction required to bring ratios below the LDL up to the LDL, calculated using the Chiar & Tielens (2006) extinction curve for the Galactic Center, for those galaxies with distances greater than 55 Mpc that are not affected by aperture variations; Col(10): Extinction required to bring ratios below the LDL up to the HDL, calculated using the Chiar & Tielens (2006) extinction curve for the Galactic Center, for those galaxies with distances greater than 55 Mpc that are not affected by aperture variations; Col(11): 6.2 µm PAH line flux in units of $10^{-21}$ W cm$^{-2}$.
∗∗ The [SIII] ratio for Mrk 266 is known to be affected by aperture variations (see Section 6.2). For this reason it has been excluded from relevant plots and calculations.
References for Table 3: a Weedman et al. 2005, b Armus et al. 2004 & 2006, c Sturm et al. 2002, d Verma et al. 2003, e Genzel et al. 1998, f Dale et al. 2006, g Satyapal et al. 2004.

[Figure 5. Horizontal axis: log(24 µm Specific Luminosity) (W m-1); symbol legend: 14/24 < 0.50, 0.50 < 14/24 < 1.0, 14/24 > 1.0.]

Fig. 5.— The observed [NeV] line ratio as a function of the 24µm specific luminosity for our sample of galaxies. The error bars shown here represent the calibration uncertainties on the [NeV] line flux ratio as in Figure 3. The symbol type indicates the 14µm/24µm continuum ratio.

5. EXTINCTION EFFECTS OF THE TORUS AND AGN UNIFICATION

Although low electron densities, high gas temperatures, and/or high infrared radiation densities may play a role in lowering the [NeV] line flux ratio, we argue that differential infrared extinction to the [NeV]-emitting region due to dust in the obscuring torus is responsible for the low line ratios in at least some AGN. Clearly, this requires that there is significant extinction at mid-IR wavelengths, and specifically toward the [NeV]-emitting regions. Is this reasonable? If there is significant extinction, it is possible that: 1) the [NeV]-emitting region originates much closer to the central source than previously recognized, close enough to be extinguished by the central torus in some galaxies, 2) the [NeV]-emitting portion of the NLR is obscured by dust in the host galaxy or in the NLR itself, or 3) some combination of these scenarios. We explore these possibilities in the following analysis.

5.1. The [NeV] originates in gas interior to the central torus.

In the conventional picture of an AGN, the broad line region (BLR) is thought to exist within a small region interior to a dusty molecular torus, while the NLR originates further out. This, of course, is the paradigm invoked to explain the Type 1/Type 2 dichotomy. However, there have been multiple optical spectroscopic studies that contradict the assumption that the observational properties of the NLR are not dependent on the viewing angle and the inclination of the system, suggesting that some of the narrow emission lines originate in gas interior to the torus.
For instance, Shuder and Osterbrock (1981) and Cohen (1983) showed that narrow high-ionization forbidden lines such as [Fe VII] λ6087 (requiring photons with energies ≥ 100 eV to ionize) are stronger relative to the low-ionization lines in Seyfert 1 galaxies (including intermediate Seyferts, 1.2, 1.5, etc.) than in Seyfert 2 galaxies, suggesting that some of the emission is obscured by the torus. In addition, [FeX] λ6374 and [NeV] λ3426 have also been shown to be less luminous in Type 2 objects than in Type 1 objects (Murayama & Taniguchi 1998a; Schmitt 1998; Nagao et al. 2000, 2001a, 2001b, 2003; Tran et al. 2000; see also Jackson and Browne 1990 for narrow line radio galaxies and quasars). These findings may imply that the emission lines of species with the highest ionization potentials originate closer to the AGN than those of lower ionization species such as [OII] λ3727, [SII] λλ6716, 6731, [OI] λ6300, etc., and therefore may be partially obscured by the central torus.

If there is considerable extinction to the line-emitting regions due to the torus, one may expect the mid-infrared continuum to be similarly obscured. To test this scenario we divided our sample into Type 1 or Type 2 objects based on the presence or absence of broad (full width at half maximum (FWHM) exceeding 1000 km s$^{-1}$) Balmer emission lines in the optical spectrum. The spectral classification for the [NeV]-emitting galaxies is given in Table 1. In Figure 6a, we plot the [NeV] 14µm/24µm line flux ratio versus the 14µm/24µm continuum ratio of the [NeV]-emitting galaxies in our sample. Assuming there is no correlation between the electron density and the continuum shape, a correlation between the line flux and continuum ratios would suggest that the mid-IR extinction associated with the torus (such as that found by Clavel et al. 2000) affects the observed line flux ratios. As can be seen, there is a correlation between the line and continuum ratios for galaxies with [NeV] emission.
Moreover, we note that the 3 nuclei with ratios significantly below the LDL are all Type 2 AGNs, while the 2 that lie significantly above this limit are Type 1 AGNs, suggesting that the extinction of the [NeV]-emitting region in Type 2 AGNs may be due to the torus. We note that the error bars displayed in Figure 6 are based on a conservative estimate (15%) of the absolute calibration error on the flux (see Section 3). Moreover, we have adopted the most conservative approach in propagating the error (see Section 4) for each line ratio. We further note that two of the three nuclei with ratios below the LDL were also observed in high-accuracy peak-up mode, resulting in a pointing accuracy on the continuum for these galaxies of 0.4″. The third galaxy, NGC 3627, was observed in high-resolution mapping mode over 15″ × 22″. We extracted the spectra and found that the full map and the single slit fluxes agree to within 10%. Thus pointing errors do not appear to be responsible for the low ratios in these galaxies. Finally, we find that the ratios for all the galaxies except Mrk 266 are not sensitive to the line-fitting or flux extraction methods that we have employed.

The Spearman rank correlation coefficient for Figure 6 is 0.60 (with a probability of chance correlation of 0.008), indicating a significant correlation between the [NeV] line flux ratios and the mid-IR continuum ratio. We note that some AGNs are known to contain prominent silicate emission features (Hao et al. 2005, Sturm et al. 2006) which have not been disentangled from the underlying continuum in this study. Because of this, the 14 µm or 24 µm flux may be overestimated in some cases, making the intrinsic value of the continuum at 14 µm and 24 µm somewhat uncertain. However, only one galaxy plotted in Figure 6 is currently known to contain such features (PG1211+143, Hao et al. 2005).
Variations in $n_e$ and the underlying continuum shape will also add scatter to the correlation, as will differences in extinction to the line- and continuum-producing regions. We should note that the correlation seen in Figure 6 is not an artifact of aperture variations between the LH and SH slits. The correlation holds when only the most distant galaxies (closed symbols in Figure 6b) are considered.

Fig. 6. The [NeV] line ratio vs. the Fν(14µm)/Fν(24µm) continuum ratio for our sample. Panel (a) distinguishes Type 1 from Type 2 objects; panel (b) distinguishes galaxies with D < 55 Mpc from those with D > 55 Mpc. In both plots, the error bars mark the calibration uncertainties on the line ratio. There is a correlation between the line and continuum ratios which suggests that extinction affects the observed line flux ratios. 6a) The majority of galaxies with ratios below the LDL are Type 2 objects, implying that the extinction toward the [NeV]-emitting region may be due to the torus. 6b) The correlation shown here is not an artifact of aperture variations between the SH and LH slits. The correlation holds when only the most distant galaxies are considered.

Independent of this correlation, our most important finding is that the [NeV] line flux ratio is significantly lower for Type 2 AGNs than it is for Type 1 AGNs. Figure 7 shows the relative [NeV] flux ratios for the Type 1 and Type 2 objects in our sample. The mean ratios are 0.97 and 0.72 for the eight Type 1 and ten Type 2 AGNs, respectively, with uncertainties in the mean of about 0.08 for each. Interestingly, although the sample size is limited, precluding us from drawing firm statistically significant conclusions, there is a similar suggestive trend seen in the sample of AGNs observed by Sturm et al. (2002) with ISO-SWS.
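As a rough check on the Type 1 versus Type 2 comparison in our own sample, the quoted means and uncertainties can be combined as follows. This is only a sketch: it assumes the two uncertainties in the mean are independent and approximately Gaussian, which is not stated in the text, and the function name is illustrative.

```python
import math


def difference_significance(mean_a, err_a, mean_b, err_b):
    """Difference of two means, its uncertainty, and the ratio of the two.

    Assumes independent errors, so they add in quadrature.
    """
    diff = mean_a - mean_b
    err = math.hypot(err_a, err_b)
    return diff, err, diff / err


# Mean [NeV] 14um/24um ratios and uncertainties quoted for Type 1 and Type 2.
diff, err, n_sigma = difference_significance(0.97, 0.08, 0.72, 0.08)
print(f"difference = {diff:.2f} +/- {err:.2f} ({n_sigma:.1f} sigma)")
```

Under these assumptions the two means differ by roughly twice the combined uncertainty.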
That is, in their work, the two galaxies with the lowest [NeV] flux ratios are NGC 1365 and NGC 7582, both Type 2 AGNs. The galaxy with the highest ratio in their work is TOL 0109-383, a Type 1 AGN.

Fig. 7. Histogram of the [NeV] 14µm/24µm line flux ratio as a function of AGN type (separate panels for Type 1 and Type 2 AGN). The [NeV] line ratios for Type 2 AGNs are consistently lower than those from Type 1 AGNs.

If indeed the torus obscures the IR [NeV] emission in Type 2 objects, one would expect the optical/UV [NeV] emission in these objects to be obscured as well. We searched the literature for optical/UV detections of [NeV] λ3426 for all of the galaxies in our sample and found five galaxies with observations at this wavelength. Four of these galaxies (Mrk 463, Mrk 3, NGC 1566, and NGC 4151) were detected at [NeV] λ3426; the other (NGC 3031) was not detected (see Kuraszkiewicz et al. 2002, 2004 and Forster et al. 2001 for optical/UV fluxes). Of the four galaxies with optical/UV [NeV] detections, two are Type 1 galaxies (NGC 1566 and NGC 4151) and, surprisingly, two are Type 2 galaxies (Mrk 3 and Mrk 463). If the Type 2 galaxies Mrk 3 and Mrk 463 had [NeV]-emitting regions interior to the torus, then the optical/UV lines in these objects should not be detected due to severe obscuration. We note that Mrk 3 and Mrk 463 have some of the highest X-ray luminosities (both ∼ $10^{43}$ erg s$^{-1}$) in the sample and mid-IR [NeV] ratios that are comparable to similarly luminous Type 1 objects, consistent with little or no obscuration in the mid-IR for these Type 2 galaxies. This finding may imply that, in the most powerful AGNs, the [NeV]-emitting region is pushed beyond the torus because the radiation field is so intense, while lines with higher ionization potentials than [NeV] (such as [NeVI], [FeX] etc.) are still concealed by the torus.
More data, both from the mid-IR and from the optical/UV, are needed to further test this hypothesis.

5.2. The [NeV]-emitting region is obscured by the host galaxy or dust in the NLR.

While the correlations in Figures 6a and 6b offer a very promising explanation for the observed [NeV] line ratios, it is not completely clear why some Type 2 galaxies appear obscured and others may not. Perhaps the [NeV] emission is attenuated by dust in the NLR itself or elsewhere in the host galaxy. Indeed, it is well-known that dust does exist in the NLR (e.g., Radomski et al. 2003; Tran et al. 2000) and that it can be extended and patchy (e.g., Alloin et al. 2000; Galliano et al. 2005; Mason et al. 2006). In addition, dust in the host galaxy could be responsible for the extinction seen here. For completeness, we have conducted a detailed archival analysis of all of the galaxies in our sample with [NeV] ratios close to or below the LDL in order to see if there is additional evidence for high extinction either in the host galaxy or within the NLR. We find that the majority of galaxies with low ratios do indeed have well-known dust lanes, large X-ray inferred column densities, or other properties indicative of extinction.

Cen A: This nearby (D = 3.4 Mpc) early type (S0) galaxy at one time devoured a smaller gas-rich spiral galaxy (Israel 1998; Quillen 2006). There is clear evidence for substantial obscuration toward the nucleus of Cen A. For example, the central region is veiled by a well known dense dust lane thought to be a warped thin disk (Ebneter & Balick 1983; Bland et al. 1986, 1987; Nicholson et al. 1992; Sparke 1996; Israel 1998; Quillen et al. 2006). Schreier et al. (1996) find V-band extinction averaging 4-5 mag and infrared observations by Alonso & Minniti yield $A_V$ values exceeding 30 mag in some regions. Thus, it is plausible that there are regions toward the nucleus of Cen A that are obscured even at infrared wavelengths.
NGC 1566: The optical nuclear spectrum of this nearby galaxy is known to vary dramatically over a period of months, changing its optical classification from a Type 2 object to a Type 1 object and back again (Pastoriza & Gerola 1970; de Vaucouleurs 1973; Penfold 1979; Alloin et al. 1985). The narrow optical lines in this object also show prominent blue wings and the radio properties of this galaxy are more consistent with a Type 2 object than a Type 1 object (Alloin et al. 1985). HST continuum imagery reveals spiral dust lanes within 1” of the nucleus (Griffiths et al. 1997) which might be responsible for the Type 1/Type 2 variability. Baribaud et al. (1992) find hot dust which lies just outside the broad line region in this galaxy and a large covering factor that might explain the steep continuum of the AGN. Ehle et al. (1996) find $N_H \sim 2.5 \times 10^{20}$ cm$^{-2}$ from ROSAT X-ray observations of this galaxy.

NGC 2623: This galaxy’s tidal tails are evidence of a merger event; however, infrared observations reveal a single symmetric nucleus, implying that the merging galaxies have coalesced. Multi-color, near infrared observations reveal strong concentrations of obscuring material in the central 500 pc (Joy & Harvey 1987; Lipari et al. 2004). Lipari et al. (2004) also find an optically obscured nucleus with V-band extinction ≥ 5 mag.

IRAS 23128-5919: This galaxy is also in the late stages of a merger. The nuclei of the two galaxies are 4 kpc apart and have not yet coalesced. The northern nucleus is a starburst. The southern nucleus is a known AGN, though its optical classification, Seyfert or LINER, is unclear (Duc, Mirabel, & Maza 1997; Charmandaris et al. 2002; Satyapal et al. 2004). IRAS 23128-5919 is an ultraluminous infrared galaxy (ULIRG), clearly consistent with the presence of substantial dust towards the nucleus. Optical spectroscopy of the southern nucleus indicates very large (1500 km s$^{-1}$) blue asymmetries in the Hβ and [OIII] lines.
This blue wing could be a signature of extinction toward the far side of an expanding region, where the red wing is preferentially obscured (Johansson & Bergvall 1988).

Mrk 273: This galaxy is also a ULIRG, so significant dust obscuration toward the nucleus is expected. Near-IR imaging and high resolution radio observations show evidence for a double nucleus in this galaxy separated by less than 1 kpc (Ulvestad & Wilson 1984; Mazzarella et al. 1991; Majewski et al. 1993). However, high resolution Chandra observations reveal only the northern of the two nuclei, suggesting that this galaxy is hosting only one AGN and that perhaps the other "nucleus" is in fact a portion of the southern radio jet. The soft X-ray emission from the northern nucleus is obscured by column densities of at least $10^{23}$ cm$^{-2}$ (Xia et al. 2002). Although the X-ray-emitting regions are physically distinct from the NLR and some of the obscuration at X-ray wavelengths likely arises in dust-free gas within the sublimation radius, the high column density derived may be consistent with high extinction toward the central regions of this galaxy. Though Xia et al. (2002) find that the X-ray morphology of the AGN in Mrk 273 is consistent with a Seyfert, Colina et al. (1999) find that it has a LINER optical spectrum, thus implying that some LINER galaxies are in fact heavily absorbed powerful AGN. The soft diffuse X-ray halo in combination with the radio morphology found by Carilli & Taylor (2000) may suggest a circumnuclear starburst surrounding the northern AGN nucleus, again consistent with substantial obscuration toward the AGN.

NGC 3627: This nearby galaxy (D ∼ 10 Mpc) is thought to have had tidal interactions with NGC 3628, a neighboring galaxy in the Leo Triplet, some $8 \times 10^{8}$ years ago, which caused an intense burst of star formation in the nuclear regions around the same time (Rots 1978; Zhang et al. 1993; Afanasiev & Sil’chenko 2005). Zhang et al.
(1993) also discovered an extremely dense molecular bar (mass ≥ $4 \times 10^{8}$ M⊙) and Chemin et al. (2003) uncovered a warped disk using Hα observations, both evidence of the tidal interaction. In their spectral fitting to the BeppoSAX observation of NGC 3627, Georgantopoulos et al. (2002) find intrinsic column densities of ∼ $1.5 \times 10^{22}$ cm$^{-2}$ which, like Mrk 273, may suggest substantial extinction to other regions near the nucleus.

NGC 7469: This is a well-known, extensively-studied galaxy with strong, active star formation surrounding a Seyfert 1 nucleus. Meixner et al. (1990) find dense molecular gas ($2 \times 10^{10}$ M⊙), two orders of magnitude above the Galactic value, within the central 2.5 kpc of the nucleus. Imaging of the galaxy at 3.3µm reveals that 80% of the PAH emission comes from an annulus ∼ 1”-3” in radius around the central nucleus, indicating that there is an elongated region of material that shelters the PAH from the harsh radiation field of the AGN (Cutri et al. 1984; Mazzarella et al. 1994). [OIII] line asymmetries may corroborate the presence of a dense obscuring medium, revealing a blue wing resulting when the redshifted gas is obscured by the star forming ring (Wilson et al. 1986). In addition, Genzel et al. (1995) find variation in the NIR emission attributable to extinction and estimate the extinction from the CO observations of Meixner et al. (1990) to be $A_V \sim 10$ mag.

NGC 1365: This nearby (D = 18.6 Mpc) AGN is known to be circumscribed by embedded young star clusters. The galaxy also contains a prominent bar with a dust lane that penetrates the nuclear region (Phillips et al. 1983; Lindblad et al. 1996, 1999; Galliano et al. 2005). Like NGC 7469, NGC 1365 shows a peak at 3.5µm implying PAH emission in spite of the harsh AGN radiation field (Galliano et al. 2005). The large Hα/Hβ ratio found by Alloin et al. (1981) implies substantial extinction toward the emission line regions, ranging from 3-4 mag.
Observations with ASCA and ROSAT imply high intrinsic column densities toward the X-ray emitting regions, suggesting possibly high obscuration towards other regions near the nucleus (Iyomoto et al. 1997; Komossa & Schulz 1998; see also Schulz et al. 1999). Komossa & Schulz show that the ratio of Hα to both the mid-IR and X-ray radiation is substantially different in NGC 1365 compared with typical Seyfert 1 galaxies, possibly suggesting inhomogeneous obscuration (Schulz et al. 1999). In an XMM X-ray study of NGC 1365, Risaliti et al. (2005) also find a heavily absorbed Seyfert nucleus. The blueshifted X-ray spectral lines imply high column densities of $10^{23}$ cm$^{-2}$ or more.

Mrk 266 (NGC 5256): This luminous infrared galaxy is the only galaxy for which aperture effects most likely account for the low 14µm/24µm ratio. Mrk 266 contains a very complicated structure which includes at least two bright nuclei, a Seyfert and a LINER, that are 10” apart, a signature of a merger in progress. The morphology of the northeast LINER nucleus is extremely controversial (Wang et al. 1997; Kollatschny & Kowatsch 1998; Satyapal 2004, 2005; Ishigaki et al. 2000; Davies, Ward, & Sugai 2000). Mazzarella et al. (1988) find three non-thermal radio structures, two that coincide with the nuclei and one between the two nuclei. Mazzarella et al. (1988) suggest that the two nuclear structures are associated with classical AGN and are in the stage of a violent interaction in which the center of gravity of the collision produces a massive burst of star formation with supernovae or shocks which are responsible for the third nonthermal radio source. As can be seen in Figure 8, the SH slit, which provides the 14µm flux, overlaps with this third radio source, while the LH slit, responsible for the 24µm flux, encompasses the southwestern nucleus, the third radio source, and part of the northeastern nucleus.
In this case the two lines observed originate in physically distinct regions that do not each encompass all potential sources of [NeV] emission, resulting in an unphysical 14µm/24µm ratio. This is not to say that Mrk 266 does not suffer from extinction at all. Indeed the possible presence of a circumnuclear starburst implies that there may be substantial extinction (Ishigaki et al. 2000; Davies, Ward, & Sugai 2000). We have verified that this is the only distant galaxy in our sample with a complicated nuclear structure that will result in aperture effects.

Fig. 8. 20 cm image of Mrk 266 taken from NED (http://nedwww.ipac.caltech.edu/). As can be seen here, the SH slit (from which the 14µm line is extracted) overlaps with a third radio source, while the LH slit (from which the 24µm line is extracted) encompasses the southwestern nucleus and part of the northeastern nucleus.

5.3. Can the [NeV] line flux ratio be used as a density diagnostic?

Our analysis reveals that extinction towards parts of the NLR in some objects is significant and cannot be ignored at mid-IR wavelengths. In fact, it is quite possible that extinction also affects the [NeV] line flux ratios of those galaxies with ratios above the low density limit (LDL), and the amount of extinction in all cases is highly uncertain. In addition to extinction, the temperature of the [NeV]-emitting gas is unknown. If the [NeV] emission originates within the walls of the obscuring central torus, which may be the source of extinction in many of our galaxies, we might expect the temperature of the gas to reach $10^{6}$ K (Ferland et al. 2002). If, on the other hand, the [NeV] emission comes from further out in the NLR and is instead attenuated by the intervening material, we might expect the temperature of the gas to be closer to $10^{4}$ K. As shown in Figure 1, the electron densities inferred from the [NeV] line flux ratios are sensitive to temperature when such large temperature variations are considered.
Based on the calculations shown in Figure 1, the low ratios could indicate that the densities in the [NeV] line emitting gas are typically ≤ 3000 cm$^{-3}$ for T = $10^{4}$ K. However, if the [NeV] gas is characterized by temperatures as high as T = $10^{5}$ K to $10^{6}$ K, densities as high as $10^{5}$ cm$^{-3}$ would be consistent with our measurements. We note that the [NeV] line flux ratios for the galaxies in our sample (especially the Type 1 AGNs) all cluster around a ratio of ≈ 1.0. Two separate conclusions may be drawn from this finding: (1) that the temperatures of the gas are low (∼ $10^{4}$ K) and that the electron density is relatively constant over many orders of magnitude in X-ray luminosity and Eddington ratio for these AGNs, or (2) that the temperature of the gas is high ($10^{5}$ K to $10^{6}$ K) and that the AGNs here sample a wide range of electron densities (from $10^{2}$ cm$^{-3}$ to $10^{5}$ cm$^{-3}$). Since gas temperature, electron density, mid-IR continuum, and extinction are all unknown for these objects, the electron density cannot be determined here.

6. THE SIII LINE FLUX RATIOS

In Figure 9 we plot the 18µm/33µm line ratio as a function of electron density $n_e$. As with [NeV], we only consider the five levels of the ground configuration when computing the line ratio and we plot the relationship for gas temperatures of T = $10^{4}$ K and $10^{5}$ K. We adopt collision strengths from Tayal & Gupta (1999) and radiative transition probabilities from Mendoza & Zeippen (1982).

Fig. 9. 18µm/33µm line flux ratio in S III versus electron density $n_e$, for gas temperatures T = $10^{4}$ K and $10^{5}$ K.

In Table 3, we list the observed [SIII] line flux ratios for the galaxies in our sample. As with the [NeV] ratios, the [SIII] ratios in many galaxies listed in Table 3 are well below the theoretically allowed value of 0.45 for a gas temperature of T = $10^{4}$ K (13/33 detections). Again we explore the observational effects and the theoretical uncertainties that could artificially lower these ratios.
Aperture Effects: The ionization potential of [SIII] is ∼ 35 eV and therefore the [SIII] emission may arise from gas ionized by either the AGN or young stars. In Table 3 we list, in addition to our Spitzer [SIII] fluxes, all available [SIII] fluxes from ISO. Unlike [NeV], the [SIII] fluxes from ISO are significantly larger than the Spitzer fluxes for most galaxies. In Figure 10 we plot the ISO to Spitzer flux ratios for the [SIII] 18µm and 33µm lines. As can be seen here, the [SIII] emission extends beyond the Spitzer slit for many galaxies (6 out of 9 for [SIII] 18µm and 11 out of 13 for [SIII] 33µm). Similarly, when we compare the [SIII] flux arising from a single slit centered on the nucleus to the flux arising from a more extended region obtained using mapping observations (Dale et al. 2006), we find that in most cases the fluxes from the extended region are much larger than the nuclear single-slit fluxes. Galaxies with fluxes from Dale et al. (2006) are not included in Figure 10 since the extraction aperture for these galaxies is comparable to the 18µm ISO slit. We point out that the value for this ratio is dependent on the orientation of the Spitzer slit relative to the ISO slit and on the distance of each object. We also note that IRAS20551 and IRAS23128 are point sources with Spitzer 18µm fluxes greater than the ISO fluxes from Genzel et al. (1998); however, they fall within the Genzel et al. (1998) quoted errors of 30% and the Spitzer calibration error of 15%. Figure 10 suggests that the [SIII] emission may be produced in the extended, circumnuclear star forming regions associated with many AGNs and that aperture effects need to be considered in our analysis of the [SIII] ratio for nearby objects.

Fig. 10. The ratios of the [SIII] flux from ISO and Spitzer, F[SIII](ISO)/F[SIII](Spitzer), for the 18µm and 33µm lines.
The range indicated with arrows is that corresponding to the absolute flux calibration for ISO (20%) and Spitzer (15%). The [SIII] emission is indeed extended beyond the Spitzer slit for many galaxies, suggesting that the [SIII] emission may be produced in star forming regions. We note that IRAS20551 and IRAS23128 are point sources with Spitzer 18µm fluxes greater than the ISO fluxes from Genzel et al. (1998); however, they fall within the Genzel et al. (1998) quoted errors of 30% and the Spitzer calibration error of 15%. Galaxies with fluxes from Dale et al. (2006) are not included in this plot since the extraction aperture for these galaxies is comparable to the 18µm ISO slit.

The contribution from star formation to the [SIII] lines can be estimated using the strength of the PAH emission, one of the most widely used indicators of the star formation activity in galaxies (e.g., Luhman et al. 2003; Genzel et al. 1998; Roche et al. 1991; Rigopoulou et al. 1999; Clavel et al. 2000; Peeters, Spoon, & Tielens 2004). We examined the [SIII] 18.71 µm/PAH 6.2 µm and [SIII] 33.48 µm/PAH 6.2 µm line flux ratios in 7 starburst galaxies observed by Spitzer and found them to be comparable to the analogous ratios in our entire sample of AGNs as shown in Figure 11. This suggests that the bulk of the [SIII] emission originates in gas ionized by young stars. We note that the apertures of the SH and LH IRS modules are smaller than that of the SL2 module, which may artificially raise the line ratios plotted in Figure 11 for nearby galaxies compared with the more distant ones. However, the fact that the line ratios plotted in Figure 11 span a very narrow range suggests that the [SIII] line emission has a similar origin in starbursts and in AGNs. Thus, we assume that the bulk of the [SIII] emission originates in gas ionized by young stars and that the electron densities derived using these lines taken from slits of the same size (such as those galaxies coming from Dale et al.
2006 mapping observations) or from the most distant galaxies are representative of the gas density in star forming regions.

Extinction: We have shown that aperture effects are the likely explanation for why many of the [SIII] ratios for the galaxies in our sample fall below the LDL. However, there are three galaxies in the sample with ratios below the LDL that are distant enough (D > 55 Mpc, corresponding to projected slit sizes greater than 1.2 by 3 kpc and 3 by 6 kpc for the SH and LH slits, respectively) that aperture effects may not be as important (NGC 2623 and Mrk 273; Mrk 266 has been excluded since it is known to be affected by aperture variations, see Section 5.2). Extinction may be the explanation for the low ratios in these galaxies. However, even though the SH and LH slits likely cover the entirety of the NLR at these distances, we note that these three galaxies contain well-known, large circumnuclear starbursts (see Section 5.2 for the individual galaxy summaries) which may produce extremely extended [SIII] emission. It is therefore still possible that the line ratios in these galaxies are artificially lowered by aperture variations between the SH and LH slits. However, in addition to these three distant galaxies, NGC 4725 from Dale et al. (2006) has a [SIII] ratio below the low density limit. The low [SIII] ratio (< 0.45) in this case cannot be attributed to aperture variations since the extraction region is the same for both the 18 and 33µm lines. Thus, for completeness, the extinction values derived from the observed [SIII] line ratio for these four sources, using the extinction curves given in Section 4, are given in Table 3. The Draine (1989) and Lutz et al. (1996) extinction curve calculations yield extinction values that range from ∼ 12 to 25 mag.
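The way an extinction value follows from a line ratio below the LDL can be sketched as below. This is a minimal illustration under stated assumptions, not the calculation behind Table 3: the relative extinction coefficients A(18µm)/A_V and A(33µm)/A_V are hypothetical placeholders rather than values from the curves in Section 4, the observed ratio of 0.30 is invented, and the intrinsic ratio is taken to be the low density limit of 0.45, so the result is a lower limit on A_V. The function names are illustrative.

```python
import math

LDL_RATIO = 0.45  # low density limit of the [SIII] 18um/33um ratio at T = 1e4 K


def extinguished_ratio(intrinsic_ratio, a_v, a_short, a_long):
    """Observed short/long line ratio after differential extinction.

    a_short and a_long are the relative extinctions A_lambda / A_V at the
    short (18um) and long (33um) wavelength lines.
    """
    return intrinsic_ratio * 10.0 ** (-0.4 * a_v * (a_short - a_long))


def visual_extinction(observed_ratio, intrinsic_ratio, a_short, a_long):
    """Invert extinguished_ratio for A_V (in magnitudes)."""
    delta = a_short - a_long
    if delta <= 0:
        # The ratio can only be lowered if the short-wavelength line
        # suffers the larger extinction.
        raise ValueError("curve cannot lower the ratio: need a_short > a_long")
    if observed_ratio >= intrinsic_ratio:
        return 0.0
    return 2.5 * math.log10(intrinsic_ratio / observed_ratio) / delta


# Hypothetical coefficients and observed ratio, for illustration only.
A18_OVER_AV = 0.040
A33_OVER_AV = 0.020
a_v = visual_extinction(0.30, LDL_RATIO, A18_OVER_AV, A33_OVER_AV)
print(f"A_V >= {a_v:.1f} mag for these placeholder inputs")
```

Because the inputs are placeholders, the printed number has no physical meaning. The sketch is only meant to show that the derived A_V scales inversely with the difference between the two relative extinctions, which is why different extinction curves give different values for the same observed ratio.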
The Chiar and Tielens (2006) extinction curve for the Galactic Center may also be used since, unlike the case for [NeV], the extinction at the shorter wavelength line (18µm) is greater than that at the longer wavelength line (33µm), as is required to lower the observed ratio. The values derived from this method are quite similar, ranging from ∼ 10 to 22 mag. The Chiar and Tielens (2006) extinction curve from the local ISM cannot be used here since it only extends to 27.0µm.

Computed Quantities: As with [NeV], there may be uncertainties in the computed [SIII] infrared collisional rate coefficients. However, there is generally less controversy surrounding the [SIII] coefficients and these values are widely accepted.

Our analysis suggests that aperture effects severely influence the [SIII] line flux ratios in most cases and that the observed flux is likely dominated by star forming regions. Figure 12, a plot of the [SIII] line ratio as a function of distance, illustrates the influence of aperture effects on the [SIII] line ratio. Most of the galaxies at distances < 55 Mpc with [SIII] fluxes extracted from apertures of different sizes (i.e., not the Dale et al. (2006) galaxies) are below the LDL. On the other hand, galaxies at larger distances and galaxies with fluxes from Dale et al. (2006) are generally above the LDL. Thus, for the most distant galaxies in our sample and the galaxies with fluxes from Dale et al. (2006), where the apertures for the 18 and 33 µm lines are equal, aperture effects are not problematic, but extinction, as can be seen from Mrk 273, NGC 2623, and NGC 4725 in Figure 12, needs to be considered. As with the [NeV] line ratio, the [SIII] line ratio is not a tracer of the electron density in our sample. In conclusion, the ambiguity of the intrinsic [SIII] line ratio is primarily the result of aperture variations.
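The physical scale behind the 55 Mpc distance cut can be checked with a short small-angle calculation. This sketch assumes nominal Spitzer IRS slit dimensions of 4.7” × 11.3” for SH and 11.1” × 22.3” for LH; these angular sizes are not given in this section and should be checked against the instrument handbook. The function name is illustrative.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

# Assumed nominal slit dimensions in arcseconds (not stated in this section).
SLITS_ARCSEC = {"SH": (4.7, 11.3), "LH": (11.1, 22.3)}


def projected_kpc(angle_arcsec, distance_mpc):
    """Projected linear size in kpc, using the small-angle approximation."""
    return angle_arcsec * ARCSEC_TO_RAD * distance_mpc * 1.0e3


distance_mpc = 55.0
projected = {
    name: tuple(projected_kpc(side, distance_mpc) for side in dims)
    for name, dims in SLITS_ARCSEC.items()
}
for name, (width, length) in projected.items():
    print(f"{name}: {width:.2f} kpc by {length:.2f} kpc at {distance_mpc:.0f} Mpc")
```

With these assumed slit sizes the SH slit projects to roughly 1.25 by 3.0 kpc and the LH slit to roughly 3.0 by 5.9 kpc at 55 Mpc, in line with the 1.2 by 3 kpc and 3 by 6 kpc quoted in the extinction discussion above.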
However, there is at least one case (NGC 4725) where aperture effects cannot explain the low ratio, implying that, in addition to aperture variations, extinction likely plays a role in lowering the [SIII] line flux ratios.

7. SUMMARY

We report in this paper the [NeV] 14µm/24µm and [SIII] 18µm/33µm line flux ratios, traditionally used to measure electron densities in ionized gas, in an archival sample of 41 AGNs observed by the Spitzer Space Telescope.

1. We find that the [NeV] 14µm/24µm line flux ratios are low: approximately 70% of those measured are consistent with the low density limit to within the calibration uncertainties of the IRS.

2. We find that Type 2 AGNs have lower [NeV] 14µm/24µm line flux ratios than Type 1 AGNs. The mean ratios are 0.97 and 0.72 for the eight Type 1 and ten Type 2 AGNs, respectively, with uncertainties in the mean of about 0.08 for each.

3. For several galaxies, the observed [NeV] line ratios are below the theoretical low density limit. All of these galaxies are Type 2 AGNs.

4. We discuss the physical mechanisms that may play a role in lowering the line ratios: differential mid-IR extinction, low density, high temperature, and high mid-IR radiation density.

5. We argue that the [NeV]-emitting region likely originates interior to the torus in many of these AGNs and that differential infrared extinction due to dust in the obscuring torus may be responsible for the ratios below the low density limit. We suggest that the ratio may be a tracer of the torus inclination angle to our line of sight.

6. Our results imply that the extinction curve in these galaxies must be characterized by higher extinction at 14µm than at 24µm, contrary to recent studies of the extinction curve toward the Galactic Center.

7. A comparison between the [NeV] line fluxes obtained with Spitzer and ISO reveals that there are systematic discrepancies in calibration between the two instruments.
However, our results are independent of which instrument is used: [NeV] line flux ratios are consistently lower in Type 2 AGNs than in Type 1, and [NeV] line flux ratios below the LDL are observed with both ISO and Spitzer.

8. Our work provides strong motivation for investigating the mid-IR spectra of a larger sample of galaxies with Spitzer in order to test our conclusions.

9. Finally, an analysis of the [SIII] emission reveals that it is extended in many or all of the galaxies and likely originates in star forming gas and not the NLR. Since there is a variation in the apertures between the SH and LH modules of the IRS, we cannot use the [SIII] line flux ratios to derive densities for the majority of galaxies in our sample.

Fig. 11. Distribution of the [SIII] 33µm/PAH 6.2µm and the [SIII] 18µm/PAH 6.2µm line flux ratios for our sample of AGNs and a small sample of starburst galaxies observed by Spitzer (separate panels for starbursts and AGN). It is apparent that the line ratios of the AGNs are comparable to the corresponding ratios in starbursts, suggesting that the bulk of the [SIII] emission originates in star forming regions and not the NLRs in our sample of AGNs.

We are extremely thankful for all of the invaluable data analysis assistance from Dan Watson and Joel Green, without which this work would not have been possible. We are also very grateful to Davide Donato, Eli Dwek, Frederic Galliano, Paul Martini, Kartik Sheth, Eckhard Sturm, Peter van Hoof, and Dan Watson for their enlightening and thoughtful comments and expertise that significantly improved this paper. Carissa Khanna was also very helpful in providing assistance in the preliminary data analysis. We are also grateful for the helpful and constructive comments from the referee.
This research has made use of the NASA/IPAC Extragalactic Database (NED) which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. SS gratefully acknowledges financial support from NASA grants NAG5-11432 and NAG03-4134X. JCW gratefully acknowledges support from the Spitzer Space Telescope Theoretical Research Program. JCW is a Cottrell Scholar of Research Corporation. Research in infrared astronomy at NRL is supported by 6.1 base funding. RPD gratefully acknowledges financial support from the NASA Graduate Student Research Program.

Fig. 12. The [SIII] 18µm/33µm line ratio as a function of distance (horizontal axis: log(Distance), with distance in Mpc). The error bars shown here mark the calibration uncertainties on the line ratio. Dale et al. (2006) quote a 30% calibration error, which is shown here for those galaxies. For the rest of the sample the calibration error is 15% as per the Spitzer handbook. Most of the galaxies at distances < 55 Mpc with [SIII] fluxes extracted from apertures of different sizes (i.e., not the Dale et al. (2006) galaxies) are below the LDL. However, for the most distant galaxies in our sample and the galaxies with fluxes from Dale et al. (2006), where the apertures for the 18 and 33 µm lines are equal, aperture effects are not problematic, but extinction needs to be considered (see Mrk 273, NGC 2623, and NGC 4725, which are labeled in the plot).

REFERENCES

Alexander, T. & Sternberg, A., 1999, ApJ, 520, 137 Alexander, T., Sturm, E., Lutz, D., et al. 1999, ApJ, 512, 204 Alloin, D., Edmunds, M. G., Lindblad, P. O., & Pagel, B. E. J., 1981, A&A, 101, 377 Alloin, D., Pelat, D., Phillips, M., & Whittle, M., 1985, ApJ, 288, Alloin, D., Pantin, E, Lagage, P.O., & Granato, G. L., 2000, A&A, 363, 929 Afanasiev, V. L. & Sil’chenko, 2005, A&A, 429, 825 Aoki, Kawaguchi, & Ohta, 2005, ApJ, 618, 601 Andreasian, N. K. & Khachikian, E. Y., 1987, IAUS, 121, 541
Armus, L., Charmandaris, V., Spoon, H. W. W., et al. 2004, ApJS, 154, 178 Armus, L., Bernard-Salas, J., Spoon,et al., H. W. W., et al. 2006, ApJ, 640, 204 Balestra, I., Boller, T., Gallo, L., et al., 2005, A&A, 442, 469 Baribaud, T., Alloin, D., Glass, I., & Pelat, D., 1992, A&A, 256, Barth, A. J., Ho, L. C., Filippenko, A. V., Rix, H., & Sargent, W. L. W., 2001, ApJ, 546, 205 Bland, J., Taylor, K, & Atherton, P. D., 1987, MNRAS, 228, 595 Bland, J., Taylor, K, & Atherton, P. D., 1986, IAUS, 127, 417 Capetti, A., Macchetto, F., Axon, D. J., Sparks, W. B., & Boksenberg, A., 1995, ApJ, 448, 600. Capetti, A., Axon, D. J., & Macchetto, F., 1997, ApJ, 487, 560 Cappi, M, Panessa, F., Bassani, L., et al. 2006, 446, 459. Carilli, C. L., & Taylor, G. B., 2000, ApJ, 532, 95 Charmandaris, V., Laurent, O., Le Floch, E., 2002, A&A, 391, 429 Chemin, L., Cayatte, V., Balkowski, C., et al., 2003, A&A, 405, 89 Chiar, J. E. & Tielens, A. G. G. M., 2006, ApJ, 637, 774 Clavel, J., Schulz, B., Altieri, B., et al., 2000, A&A, 357, 839 Clegg, R.E.S, Harrington, J. P., Barlow, M. J., & Walsh, J. R.,1987, ApJ, 314, 551 Cohen, R. D., 1983, ApJ, 273, 489 Colina, L. Arribas, S., & Borne, K. D., 1999, ApJ, 527, 13 Cutri, R. M., Rieke, G. H., Tokunaga, A. T., Willner, S. P. & Rudy, R. J., 1984, ApJ, 280, 521 Dahari, O. & De Robertis, M. M., 1988, ApJS, 67, 249 Dale, D. A., Smith, J. D. T., Armus, L, et al., 2006, astroph/0604007 Davies, R., Ward, M., & Sugai, H., 2000, ApJ, 535, 735 de Vaucouleurs, G., 1973, ApJ, 181, 31 Draine, B. T., 1989, Interstellar Extinction in the Infrared, Edited by B.H. Kaldeich, European Space Agency. Duc, P. A., Mirabel, I. F., & Maza, J., 1997, A&AS, 124, 533 Dudik, R. P., Satyapal, S. Gliozzi, M. & Sambruna, R. M.2005, ApJ, 620, 113 Ebneter, K., & Balick, B., 1983, ASP, 95, 675 Ehle, M., Beck, R., Haynes, et al., 1996, A&A, 306, 73 Elvis, M.; Wilkes, B. J., McDowell, J. C., et al. 1994, ApJS, 95,1 Evans, D. A., Kraft, R. P., Worrall, D. 
M., et al., 2004, ApJ, 612, Falcke, H., Wilson, A. S., & Simpson, C., 1998, ApJ, 502, 199 Ferland, G. J., Martin, P. G., van Hoof, P. A. M., & Weingartner, J. C. 2002, in Workshop on X-ray Spectroscopy of AGN with Chandra and XMM-Newton, MPE Report 279, 103 Ferruit, P., Wilson, A. S., Falcke, H. et al., 1999, MNRAS, 309, 1 Filippenko, A. V. & Halpern, J. P., 1984, ApJ, 285, 458 Forster, K., Green, P. J., Aldcroft, T. L. et al., 2001, ApJS, 134, Galavis, M. E., Mendoza, C., Zeippen, C. J., 1997, A&AS, 123, 159 Gallagher, S. C., Brandt, W. N., Chartas, G., Garmire, G. P., & Sambruna, R. M., 2002, ApJ, 569, 655 Galliano, E., Alloin, D., Pantin, E., Lagage, P.O., & Marco, O., 2005, A&A, 438, 803 Gallo, L. C., Boller, Th., Brandt, W. N., et al., 2004, A&A, 417, Genzel, R., Weitzel, L., Tacconi-Garman, L. E., et al., 1995, ApJ, 444, 129 Genzel, R., Lutz, D., Sturm, E., Egami, E., et al., 1998, ApJ, 498, Georgantopoulos, I. Panessa, F., Akylas, A., et al., 2002, A&A, 386, 60 Giveon, U., Sternberg, A., Lutz, D., Feuchtgruber, H., & Pauldrach, A. W. A. 2002, ApJ, 566, 880 Griffin, D. C. & Badnell, N. R., 2000, JPhB, 33, 4389 Griffiths, R. E., Homeier, N., Gallagher, J., & HST/WFPC2 Investigation Definition Teamn, 2005, AAS, 191, 7607 Guainazzi, M., Matt, G., Brandt, W. N., et al., 2000, A&A, 356, Hao, L., Spoon, H. W. W., Sloan, G. C., et al. 2005, ApJ, 625, 75 Haas, M., Siebenmorgen, R., Schulz, B., Krugel, E., & Chini, R., 2005, A&A, 442, 39 Heckman, T. M., 1980, A&A, 87, 152 Higdon, S. J. U., Devost, D., Higdon, J. L., et al., 2004, PASP, 116, Ho, L. C., Fillipenko, A. V., & Sargent, W. L. W. 1997, ApJS, 112, Ho, L. C. 1999, ApJ, 516, 672 Hönig, S. F., Bekcert, T., Ohnaka, K. & Weigelt, G., 2006, A&A, 452, 459 Houck, J. R., Roellig, T. L., Van Cleve, J. et al., 2004, ApJS, 154, Hummer, D. G., Berrington, K. A., Eissner, W. et al., 1993, A&A, 279, 298 Imanishi, M. 
& Terashima, Y., 2004, AJ, 127, 758 Ishigaki, T., Yoshida, M., Aoki, K., et al., 2000, PASJ, 52, 185 Israel, F. P, 1998, A&ARv, 8, 237 Iyomoto, N., Makishima, K., Fukazawa, Y., Tashiro, M. & Ishisaki, Y., 1997, PASJ, 49, 425 Jackson & Browne, 1990, Nature, 343, 43 Johansson, L. & Bergvall, N., 1988, A&A, 192, 81 Joy, M. & Harvey, P. M., 1987, ApJ, 315, 480 Kendall, M., & Stuart, A.,1976, In: The Advanced Theory of Statistics, Vol. 2, (New York: Macmillian) Khachikian, E. Y. & Weedman, D. W., 1973, ApJ, 192, 581 Kollatschny, W. & Kowatsch, P., 1998, A&A, 336, 21 Komossa, S. & Schulz, H., 1998, A&A, 339, 345 Kuraszkiewicz, J. K., Green, P. J., Forster, K., et al., 2002, ApJS, 143, 257. Kuraszkiewicz, J. K., Green, P. J., Crenshaw, D. M., et al., 2004, Laine, S., van der Marel, R. P., & Rossa, J.; et al., 2003, AJ,126, Leech, K., Kester, D., Shipman, R., Beintema, D., et al., 2003, ESASP, The ISO Handbook, Volume V-SWS-Short Wavelength Spectrometer, 1262 Lindblad, P. O., Hjelm, M., Hoegbom, J., et al., 1996, A&AS, 120, Lipari, S., Mediavilla, E., Diaz, R. J., et al., 2004, MNRAS, 348, Lindblad, P. 1999, A&ARv, 9, 22 Luhman, M. L., Satyapal, S., Fischer, J., Wolfire, M. G., Sturm, E et al., 2003, ApJ, 594, 758 Lutz, D.; Genzel, R., Sternberg, A., et al., 1996, A&A, 315, 137 Lutz, D., Veilleux, S. & Genzel, R. 1999, ApJ, 517L, 13L Maiolino, R., Marconi, A., Salvati, M. et al., 2001a, A&A, 365, 37 Maiolino, R., Marconi, A., & Oliva, E., 2001b, A&A, 365, 37 Majewski, S. R., Hereld, M., Koo, D.C. Illingworth, G. D., & Heckman, T. M., 1993, ApJ, 402, 125. Marchesini, D., Celotti, A., & Ferrarese, L. 2004, MNRAS, 351, Marconi, A.; Oliva, E.; van der Werf, P. P.; et al., 2000, A&A, 357, Mason, R. E., Geballe, T. R., Packham, C., et al., 2006, ApJ, 640, Mathews, W. G. & Ferland, G. J. 1987, ApJ, 323, 456 Mazzarella, J. M., Bothun, G. D., & Boroson, T. A., 1991, AJ, 101, Mazzarella, J. M., Voit, G. M., Soifer, B. T., et al., 1994, AJ, 107, Mazzarella, J. M., Gaume, R. 
A., Aller, H. D., & Hughes, P. A., 1988, ApJ, 333, 168 Meixner, M., Puchalsky, R., Blitz, L., Wright, M. & Heckman, T., 1990, ApJ, 354, 158 Mendoza, C., & Zeippen, C. J., 1982, MNRAS, 199, 1025 Miller, P., Rawlings, S., & Saunders, R., 1993, MNRAS, 263, 425 Murayama, T. & Taniguchi, Y., 1998, ApJ, 497, 9 Nagao, T., Taniguchi, Y, & Murayama, T. 2000, AJ, 119, 2605 Nagao, T., Murayama, T , & Taniguchi, Y, 2001a, ApJ, 549, 155 Nagao, T., Murayama, T , & Taniguchi, Y, 2001b, PASJ, 53, 629 Nagao, T., Murayama, T., Shioya, Y, & Taniguchi, Y, 2003, 125, Nelson, C. H. & Whittle, M., 1996, ApJ, 465, 96 Nicholson, R. A., Bland-Hawthorn, J, & Taylor, K, 1992, ApJ, 387, Oliva, E.; Salvati, M.; Moorwood, A. F. M.; & Marconi, A., 1994, A&A, 288, 457 Oliva, E., Pasquali, A., & Reconditi, M., 1996, A&A, 305, L21 Pastoriza, M. & Gerola, H., 1970, ApL, 6, 155 Peeters, E., Spoon, H. W. W., & Tielens, A. G. G. M., 2004, ApJ, 613, 986 Pelat, D., Alloin, D. & Fosbury, R. A. E., 1981, MNRAS, 195, 787 Penfold, J. E., 1979, MNRAS, 186, 297 Phillips, M. M., Edmunds, M. G., Pagel, B. E. J, & Turtle, A. J., 1983, MNRAS, 203, 759 Prieto, M. A., & Viegas, S., 2000, RMxAC, 9, 324 Quillen, A. C., Bland-Hawthorn, J., Brookes, M. H. et al. 2006, ApJ, 641, 29 Radomski, J. T., Pina, R. K., Packham, C., et al., 2003, ApJ, 587, Rigopoulou, D., Spoon, H. W. W., Genzel, R., et al., 1999, AJ, 118, 2625 Risaliti, G., Bianchi, S., Matt, G., et al., 2005, ApJ, 630, 129 Roberts, T.P., Schurch, N. J., & Warwick, R. S., MNRAS, 2001, 324, 737 Roche, P. F.; Aitken, Smith, & Ward, 1991, MNRAS, 248, 606 Rots, A. H, 1978, AJ, 83, 219 Rubin, R. H., Colgan, S. W. J., Daane, A. R. & Dufour, R. J. 2002, AAS, 34, 1252 Rubin, R. H., 2004, IAUS, 217, 190 Sanders,D.B., Soifer,B.T., Elias,J.H., et al., 1988, ApJ., 325, 74. Satyapal, S., Sambruna, R. M., & Dudik, R. P. 2004, A&A, 414, Satyapal, S. Dudik, R. P., O’Halloran, B. & Gliozzi, M, 2005, ApJ, 633, 86 Schmitt, H. R. & Kinney, A. L., 1996, ApJ, 463, 498 Schmitt, H. 
R., 1998, ApJ, 506, 647 Schmitt, H. R., Donley, J. L, Antonucci, R. R., et al., 2003, ApJ, 597, 768 Schulz, H., Komossa, S., Schmitz, C., & Mücke, A., 1999, A&A, 346, 764 Schreier, E. J., Capetti, A., Macchetto, F., Sparks, W. B., & Ford, H. J., 1996, ApJ, 459, 535 Shields, J. C.; Boeker, T., Ho, L. C., et al., 2004, AAS, 205, 6411 Shuder & Osterbrock, 1981, ApJ, 250, 55. Sparke, L., 1996, ApJ, 473, 810 Spinoglio, L, Malkan, M. A., Smith, Howard, A., Gonzalez-Alfonso, E., & Fischer, J., 2005, ApJ, 623, 123 Smith, D. A. & Wilson, A. S., 2001, ApJ 557, 180 Storchi-Bergmann, T.; Mulchaey, J. S.; & Wilson, A. S. 1992, ApJ, 395, 73 Sturm, E., Lutz, D, Verma, A, et al., 2002, A&A, 393, 821 Sturm, E., Schweitzer, M., Lutz, D., et al., 2005, ApJ, 629, 21 Tacconi, L. J., Genzel, R., Lutz, D, et al, 2002, ApJ, 580, 73 Tayal, S. S., & Gupta, G. P., 1999, ApJ, 526, 544 Terashima, Y., Kunieda, H. & Misaki, K., 1999, PASJ, 51, 277 Terashima, Y, Iyomoto, N., Ho L. C., & Ptak, A. F., 2002, ApJ, 139, 1 Thompson, K. L., 1992, ApJ, 395, 403 Tran, H.D., Cohen, M. H., & Villar-Martin, M., 2000, AJ, 120, 562 Ulvestad, J. S., & Wilson, A. S., 1984, 285, 439 Van Hoof, P. A. M., Beintema, D. A., Verner D. A. & Ferland, G. J., 2000, A&A, 354, 41 Veron-Cetty, M. P. & Veron, P., 1986, A&AS, 66, 335 Veron-Cetty, M. P. & Veron, P., 2003, A&A, 412, 399 Veilleux, S. & Osterbrock, D. E., 1987, ApJS, 63, 295 Verma, A., Lutz, D., Sturm, E., et al., 2003, A&A, 403, 829 Wang, J., Heckman, T. M., Weaver, K. A., & Armus, L. 1997, ApJ, 474, 659 Weedman, D. W., Hao, L., Higdon, S. J. U., et al. 2005, ApJ, 605, Weingartner, J. C., & Murray, N., 2002, ApJ, 580, 88 Wilson, A. S., Baldwin, J. A., Sun, S., & Wright, A. E., 1986, ApJ, 310, 121 Wilson, A. S. & Heckman, T. M., 1985, Astrophysics of Active Galaxies and Quasi-Stellar Objects, Mill Valley, CA Woo, J.H., & Urry, C. M., 2002, ApJ, 579, 530 Xia, X. Y., Xue, S. J., Mao, S., Boller, Th., Deng, Z. 
G., & Wu, H., 2002, ApJ, 564, 196 Zhang, X., Wright, M., & Alexander, P., 1993, ApJ, 418, 100 ABSTRACT We present the first systematic investigation of the [NeV] (14um/24um) and [SIII] (18um/33um) infrared line flux ratios, traditionally used to estimate the density of the ionized gas, in a sample of 41 Type 1 and Type 2 active galactic nuclei (AGNs) observed with the Infrared Spectrograph on board Spitzer. The majority of galaxies with both [NeV] lines detected have observed [NeV] line flux ratios consistent with or below the theoretical low density limit, based on calculations using currently available collision strengths and ignoring absorption and stimulated emission. We find that Type 2 AGNs have lower line flux ratios than Type 1 AGNs and that all of the galaxies with line flux ratios below the low density limit are Type 2 AGNs. We argue that differential infrared extinction to the [NeV] emitting region due to dust in the obscuring torus is responsible for the ratios below the low density limit and we suggest that the ratio may be a tracer of the inclination angle of the torus to our line of sight. Because the temperature of the gas, the amount of extinction, and the effect of absorption and stimulated emission on the line ratios are all unknown, we are not able to determine the electron densities associated with the [NeV] line flux ratios for the objects in our sample. We also find that the [SIII] emission from the galaxies in our sample is extended and originates primarily in star forming regions. Since the emission from low-ionization species is extended, any analysis using line flux ratios from such species obtained from slits of different sizes is invalid for most nearby galaxies. <|endoftext|><|startoftext|> Hawaii Neutrinos & Non-proliferation in Europe Michel Cribier* APC, Paris CEA/Saclay, DAPNIA/SPP The International Atomic Energy Agency (IAEA) is the United Nations agency in charge of the development of peaceful use of atomic energy. 
In particular, the IAEA is the verification authority of the Treaty on the Non-Proliferation of Nuclear Weapons (NPT). To do that job, inspections of civil nuclear installations and related facilities under safeguards agreements are made in more than 140 states. The IAEA uses many different tools for these verifications, such as neutron monitors and gamma spectroscopy, but also bookkeeping of the isotopic composition at the fuel element level before and after its use in the nuclear power station. In particular, it verifies that weapon-origin and other fissile materials that Russia and the USA have released from their defense programmes are used for civil applications. The existence of an antineutrino signal sensitive to the power and to the isotopic composition of a reactor core, as first proposed by Mikaelian et al. [Mik77] and as demonstrated by the Bugey [Dec95] and Rovno [Kli94] experiments, could provide a means to address certain safeguards applications. Thus the IAEA recently asked member states to make a feasibility study to determine whether antineutrino detection methods might provide practical safeguards tools for selected applications. If this method proves to be useful, the IAEA has the power to decide that any new nuclear power plant built has to include an antineutrino monitor. Within the Double Chooz collaboration, an experiment [Las06] mainly devoted to studying the fundamental properties of neutrinos, we thought that we were in a good position to evaluate the interest of using antineutrino detection to remotely monitor nuclear power stations. This effort in Europe, supplemented by the US effort [Ber06], will constitute the basic answer of the neutrino community to the IAEA. * On behalf of a collective work by S. Cormon, M. Fallot, H. Faust, T. Lasserre, A. Letourneau, D. Lhuillier, V. Sinev from DAPNIA, Subatech and ILL.
Figure 1 : The statistical distribution of the fission products resulting from the fission of the most important fissile nuclei, 235U and 239Pu, shows two humps, one centered around mass 100 and the other centered around mass 135. The low mass hump is at higher mass in 239Pu fission than in 235U fission, resulting in different nuclei and decays.
The high penetration power of antineutrinos and the detection capability might provide a means to make remote, non-intrusive measurements of the plutonium content in reactors [Ber02]. The antineutrino flux and energy spectrum depend upon the thermal power and on the fissile isotopic composition of the reactor fuel. Indeed, when a heavy nucleus (uranium, plutonium) undergoes fission, it produces two unequal fission fragments (and a few free neutrons); the statistical distribution of the atomic masses is depicted in figure 1. All the nuclei immediately produced are extremely unstable, being too rich in neutrons, and thus β-decay toward stable nuclei, with an average of 6 β decays. All these processes, involving several hundred unstable nuclei with their excited states, make it very difficult to understand the details of the physics; moreover, the most energetic antineutrinos, which are detected more easily, are produced in the very first decays, involving nuclei with typical lifetimes shorter than a second.
Quantity | 235U | 239Pu
Released energy per fission | 201.7 MeV | 210.0 MeV
Mean energy of ν | 2.94 MeV | 2.84 MeV
ν per fission above 1.8 MeV | 1.92 | 1.45
Average interaction cross section | ≈ 3.2 × 10^-43 | ≈ 2.76 × 10^-43
Based on predicted and observed β spectra, the number of antineutrinos per fission from 239Pu is known to be less than the number from 235U, and the energy released per fission is larger by about 4%. Hence a hypothetical reactor able to use only 235U would induce in a detector an antineutrino signal 60% higher than the same reactor producing the same amount of energy but burning only 239Pu (see table).
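The 60% figure can be checked directly from the table. The short sketch below is not part of the original study; it assumes that the detected signal per unit of thermal energy scales as (antineutrinos per fission above threshold) × (average interaction cross section) / (energy released per fission), and it reads the tabulated cross section as an average per detectable antineutrino. The function and dictionary names are hypothetical.

```python
# Values copied from the table in the text (per fission).
# The cross-section unit is left as in the source; only ratios are used here.
TABLE = {
    "U235": {"energy_mev": 201.7, "nu_above_threshold": 1.92, "xsec": 3.2e-43},
    "Pu239": {"energy_mev": 210.0, "nu_above_threshold": 1.45, "xsec": 2.76e-43},
}


def signal_per_unit_energy(isotope):
    """Detected interactions per MeV of released energy, up to a common factor."""
    row = TABLE[isotope]
    return row["nu_above_threshold"] * row["xsec"] / row["energy_mev"]


def signal_ratio():
    """Signal of a pure 235U core relative to a pure 239Pu core at equal power."""
    return signal_per_unit_energy("U235") / signal_per_unit_energy("Pu239")


if __name__ == "__main__":
    print(f"pure 235U / pure 239Pu signal at equal power: {signal_ratio():.2f}")
```

Under these assumptions the ratio comes out close to 1.6, in line with the 60% quoted in the text.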
This offers a means to monitor changes in the relative amounts of 235U and 239Pu in the core. If made in conjunction with accurate independent measurements of the thermal power (with the temperature and the flow rate of the cooling water), antineutrino measurements might provide an estimate of the isotopic composition of the core, in particular its plutonium inventory. The shape of the antineutrino spectrum can provide additional information about the fissile isotopic composition of the core. Because the antineutrino signal from the reactor decreases as the square of the distance from the reactor to the detector, a precise "remote" measurement is really only practical at distances of a few tens of meters if one is constrained to "small" detectors of the order of a few cubic meters in size.
Simulations
MAGNITUDES OF SOME EFFECTS
In our group, the development of detailed simulations using professional reactor codes has started (see below), but it seems wise to use less sophisticated methods in order to evaluate already, with some flexibility, the magnitude of some effects. To do that we started from the set of Bateman equations, depicted graphically in figure 2, which describe the evolution of the fuel elements in a reactor. The gross simplification in such a treatment is the use of average cross sections, depending only on 3 groups (thermal neutrons, resonance region, fast neutrons), and moreover the fact that the neutron flux is imposed and not calculated.
Figure 2 : The Bateman equations are the set of differential equations which describe all transformations of the nuclei submitted to a given neutron flux: neutron captures increase the mass number at constant Z (green arrow), β-decays increase the atomic number Z by one unit (dark blue arrow), and fissions destroy the heavy nuclei and produce energy (orange arrows).
Given this, we use, for each isotope under consideration, the cross sections for capture and fission, and also plug in the parameters of the decays.
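The simplified treatment described above can be sketched in a few lines. This is a toy illustration only, not the code used by the authors: it keeps three isotopes, uses a single set of approximate thermal cross sections instead of the three-group averages mentioned in the text, treats the 239U and 239Np β decays as instantaneous, does not track 240Pu, and imposes an assumed flux of 3 × 10^13 n/cm²/s. All names and numerical choices are assumptions made for the example.

```python
BARN_CM2 = 1.0e-24  # one barn in cm^2
SECONDS_PER_DAY = 86400.0

# Approximate thermal cross sections in barns, for illustration only;
# effective in-reactor values differ.
CROSS_SECTIONS_BARN = {
    "U235": {"capture": 99.0, "fission": 585.0},
    "U238": {"capture": 2.7, "fission": 0.0},
    "Pu239": {"capture": 270.0, "fission": 748.0},
}


def evolve(initial, flux, days, step_s=3600.0):
    """Integrate the toy Bateman system with explicit Euler steps.

    initial: atom fractions for "U235", "U238" and "Pu239".
    flux: imposed neutron flux in n/cm^2/s (not computed self-consistently).
    Returns (final atom fractions, cumulative fissions per isotope).
    """
    n = dict(initial)
    fissions = {iso: 0.0 for iso in n}
    for _ in range(int(days * SECONDS_PER_DAY / step_s)):
        # Reaction probabilities per atom during one time step.
        cap = {iso: flux * BARN_CM2 * CROSS_SECTIONS_BARN[iso]["capture"] * step_s
               for iso in n}
        fis = {iso: flux * BARN_CM2 * CROSS_SECTIONS_BARN[iso]["fission"] * step_s
               for iso in n}
        # 238U + n -> 239U -> 239Np -> 239Pu, with both decays taken as instantaneous.
        bred = cap["U238"] * n["U238"]
        for iso in n:
            fissions[iso] += fis[iso] * n[iso]
        # Every capture or fission removes the atom from its species.
        updated = {iso: n[iso] * (1.0 - cap[iso] - fis[iso]) for iso in n}
        updated["Pu239"] += bred
        n = updated
    return n, fissions


if __name__ == "__main__":
    fresh_core = {"U235": 0.035, "U238": 0.965, "Pu239": 0.0}
    final, fissions = evolve(fresh_core, flux=3.0e13, days=300)
    for iso in final:
        print(iso, f"fraction={final[iso]:.5f}", f"fissions={fissions[iso]:.5f}")
```

Even this crude model shows the qualitative behaviour discussed in the text: 235U is depleted while 239Pu builds up from captures on 238U and takes a growing share of the fissions.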
Then it is rather easy (and fast) to simulate the evolution of a given initial core composition; in the same way, it is possible to "make a diversion" by manipulating the fuel composition at a chosen moment. As an example, figure 3 shows the evolution of a fresh core composed of uranium enriched to 3.5 % in 235U: the build-up of 239Pu and 241Pu is rather well reproduced.
Figure 3 : In a new reactor the initial fuel consists of enriched uranium rods, with a 235U content typically of 3.5 %; the rest is 238U. As soon as the reactor is operating, the reactions described by the Bateman equations produce 239Pu (and 241Pu), which then participate in the energy production, at the expense of 238U.
Knowing the number of fissions at a given time, it is straightforward to translate it into an antineutrino flux using the parametrisation of [Hub04] and finally, using the interaction cross section for the inverse β-decay reaction, to produce the recorded signal in a given detector placed at a suitable distance from the reactor under examination.
Figure 4 : Positron spectrum recorded in a typical antineutrino detector (10 tons of target) placed at 150 m from a nuclear reactor (1000 MWel). Positrons result from the inverse β-decay reaction used in the detection of antineutrinos. The signal is the superposition of several components whose spectra exhibit small but sizeable differences, especially at high energy.
As an example of this type of computation, we show in figure 5 the effect of a modification of the fuel composition after 100 days: here the operator, clever enough, knows that he cannot merely remove plutonium from the core without changing the thermal power, which would be immediately noticed.
Hence he takes the precaution of adding 28 kg of 235U at the same time as he removes 20 kg of 239Pu: although the thermal power is kept constant, the imprint on the antineutrino signal, although modest, is such that, after 10 days, there is an increase of more than 1 σ in the number of interactions recorded. Such a diversion is clearly impossible in a PWR or BWR, but easier in a Candu-type reactor, and even more so in a molten salt reactor.
Figure 5 : A hypothetical diversion scenario where an exchange of 239Pu with 235U is made such that the power does not change, but the antineutrino signal recorded by the monitor is slightly increased, giving some evidence of an abnormal operation.
SIMULATIONS OF DIVERSION SCENARIOS
The IAEA recommends the study of specific safeguards scenarios. Among its concerns are the confirmation of the absence of unrecorded production of fissile material in declared reactors and the monitoring of the burn-up of a reactor core. The time required to manufacture an actual weapon, as estimated by the IAEA (conversion time), for plutonium in partially irradiated or spent fuel lies between 1 and 3 months. The significant quantity of Pu is 8 kg, to be compared with the 3 tons of 235U contained in a Pressurized Water Reactor (PWR) of power 900 MWe enriched to 3%. The small magnitude of the sought signal requires a careful feasibility study. The proliferation scenarios of interest involve different kinds of nuclear power plants such as light water or heavy water reactors (PWR, BWR, Candu...); they also have to include isotope production reactors of a few tens of MWth, and future reactors (e.g., PBMRs, Gen IV reactors, accelerator-driven sub-critical assemblies for transmutation, molten salt reactors). To perform these studies, core simulations with dedicated Monte-Carlo codes should be provided, coupled to the simulation of the evolution of the antineutrino flux and spectrum over time.
We started a simulation work using the widely used particle transport code MCNPX [Mcn05], coupled with an evolution code solving the Bateman equations for the fission products within a package called MURE (MCNP Utility for Reactor Evolution) [Mur05]. This package offers a set of tools, interfaced with MCNP or MCNPX, that make it easy to define the geometry of a reactor core. In the evolution part, it accesses the set of evaluated nuclear data and cross sections. MURE is well suited to simulating the evolution with time of the composition of the fuel, taking into account the neutronics of a reactor core. We are adapting the evolution code to simulate the antineutrino spectrum and flux, using simple Fermi decay as a starting point. The extended MURE simulation will allow sensitivity studies to be performed by varying the Pu content of the core in the scenarios relevant for the IAEA. By varying the reactor power, the possibility of using antineutrinos for power monitoring can be evaluated. Preliminary results show that nuclei with half-lives shorter than 1 s emit about 70% (50%) of the 235U (239Pu) antineutrino spectrum above 6 MeV. The high energy part of the spectrum is the energy region where the Pu and U spectra differ most. The influence of the β decay of these nuclei on the antineutrino spectrum might also be preponderant in scenarios where rapid changes of the core composition are performed, e.g. in reactors such as Candu, which are refueled on line. The appropriate starting point for this scenario is a representative PWR, like the Chooz reactors. For this reactor type, simulations of the evolution of the antineutrino flux and spectrum over time will be provided and compared to the accurate measurement provided by the near detector of Double Chooz. This should indicate the precision achievable on the fuel composition and on an independent thermal power measurement.
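To give a feeling for what such a precision means in purely statistical terms, the sketch below assumes Poisson counting and uses the Double Chooz near-detector rate of about 5 × 10^5 detected antineutrinos per year quoted in this section. It ignores backgrounds, detector efficiency variations and all systematic uncertainties, so it is only a lower bound on the real uncertainty; the function names are hypothetical.

```python
import math

DAYS_PER_YEAR = 365.25


def expected_events(events_per_year, days):
    """Expected number of detected antineutrinos in a counting window."""
    return events_per_year * days / DAYS_PER_YEAR


def relative_statistical_precision(events_per_year, days):
    """Relative Poisson uncertainty 1/sqrt(N) on the counted rate."""
    return 1.0 / math.sqrt(expected_events(events_per_year, days))


def significance(events_per_year, days, fractional_change):
    """Number of Poisson standard deviations produced by a small rate change."""
    return fractional_change * math.sqrt(expected_events(events_per_year, days))


if __name__ == "__main__":
    rate = 5.0e5  # detected antineutrinos per year, as quoted in the text
    for window in (1, 10, 100):
        precision = relative_statistical_precision(rate, window)
        print(f"{window:4d} days: statistical precision {100 * precision:.2f} %")
    # Significance of an assumed 1 % change in rate after 10 days of counting.
    print(f"{significance(rate, 10, 0.01):.2f} sigma")
```

With this rate, a change in the counted signal of roughly 1% corresponds to about one standard deviation after ten days, which is the order of magnitude at stake in the diversion example discussed earlier.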
An interesting point to study is the time of the partial refuelling of the core, thanks to the fact that reactors like Chooz (N4-type) do not use MOX fuel. Without any extra experimental effort, the near detector of the Double Chooz experiment will provide the largest dataset of detected antineutrinos (5 × 10^5 ν per year) from a PWR. The precise neutrino energy spectrum recorded at a given time will be correlated to the fuel composition and to the thermal power provided by EDF. This valuable dataset will constitute an excellent experimental basis for the above feasibility studies of potential monitoring and for benchmarking fuel management codes; it is expected that the individual components due to the fissile elements (235U, 239Pu) could be extracted with some modest precision and serve as a benchmark of this technique. To fulfil the goal of non-proliferation, additional lab tests and theoretical calculations should also be performed to estimate more precisely the underlying neutrino spectra of plutonium and uranium fission products, especially at high energies. Contributions of decays to excited states of daughter nuclei are mandatory to reconstruct the shape of each spectrum. Following the conclusion of P. Huber and Th. Schwetz [Hub04], to achieve this goal a reduction of the present errors on the antineutrino fluxes by about a factor of three is necessary. We will see that such an improvement requires a substantial effort.
Experimental effort
The precise measurement of β-decay spectra from fission products produced by the irradiation of a fissile target can be performed at the high flux reactor at the Institut Laue-Langevin (ILL) in Grenoble, where similar studies performed in the past [Sch85] are the basis of the antineutrino fluxes currently used in reactor neutrino experiments. The ILL reactor produces the highest neutron flux in the world: the fission rate of a fissile material target placed close to the reactor core is about 10^12 per second.
It is possible to choose different fissile elements as targets in order to maximize the yield of the nucleus of interest. Using the LOHENGRIN recoil mass spectrometer [Loh04], measurements of individual β spectra from short-lived fission products are possible; in the same irradiation channel, measurements of the integral β spectrum with the Mini-INCA detectors [Mar06] could be envisaged, to study the evolution with time of the antineutrino energy spectrum of a nuclear power plant.
EXPERIMENTS WITH LOHENGRIN
The LOHENGRIN recoil mass spectrometer offers the possibility to measure the β decays of individual fission products. The fissile target (235U, 239Pu, 241Pu, …) is placed in a thermal neutron flux of 6 × 10^14 n/cm²/s, 50 cm from the fuel element. Recoil fission products are selected with a dipolar magnetic field followed by an electrostatic condenser. At the end, the fragments can be implanted in a moving tape, and the subsequent β and γ rays are recorded by a β spectrometer (Si detector) and Ge-clover detectors, respectively. Coincidences between these two quantities can also be used to reconstruct the decay scheme of the observed fission products or to select one fission product. Fragments with half-lives down to 2 µs can be measured, so that nuclei with large Qβ (above 4 MeV) can be measured. The LOHENGRIN experimental objectives are to complete existing β spectra of individual fission products [Ten89] with new measurements for the main contributors to the detected ν spectra and to clarify experimental disagreements between previous measurements. This ambitious experimental programme is motivated by the fact, noted by C. Bemporad [Bem02], that unknown decays contribute as much as 25% of the antineutrinos at energies > 4 MeV.
Folding the antineutrino energy spectrum over the detection cross-section for inverse beta decay enhances the contribution of the high energy antineutrinos to the total detected flux by a factor of about 10 for Eν > 6 MeV. The focus of these experiments will be on neutron rich nuclei with very different yields in 239Pu and 235U fission. In the list, 86Ge, 90-92Se, 94Br, 96-98Kr, 100Rb, 100-102Sr, 108-112Mo, 106-113Tc, 113-115Ru… contribute to the high energy part of the spectrum and have never been measured.
IRRADIATION TESTS IN SUMMER 2005
A test experiment was performed over two weeks in summer 2005. The isobaric chains A=90 and A=94 were studied, in which some isotopes possess a high Qβ energy, contribute significantly to the high energy part of the antineutrino spectra following 235U and 239Pu fission, and moreover are produced with very different fission yields in 235U and 239Pu fission [Eng94]. The well-known nuclei, such as 90Br, will serve as a test of the experimental set-up, while the beta decay of more exotic nuclei such as 94Kr and 94Br will constitute a test case for how far one can reach in the very neutron rich region with this experimental device. The recorded data (figure 6) will validate the simulation described in the previous section, in particular the evolution over time of the beta decay spectra of the isobaric chains.
Figure 6 : Beta energy spectrum (6a, silicon detector) corresponding to the β decay of fission products with mass A=94. The fission products arising from the LOHENGRIN spectrometer were implanted on a mylar tape of adjustable velocity in front of the silicon detector. The highest velocity was selected in order to enhance shorter-lived nuclei such as 94Kr and 94Br. The gamma energy spectrum (6b, germanium detector) corresponding to the same runs is also displayed.
INTEGRAL β SPECTRA MEASUREMENTS
As a complement to the individual studies on LOHENGRIN, more integral studies can be envisaged using the so-called "Mini-INCA chamber" at ILL [Mar06], provided that a β spectrometer (to be developed) is added. The existing α- and γ-spectroscopy station is connected to the LOHENGRIN channel and offers the possibility to perform irradiations in a quasi-thermal neutron flux up to 20 times the nominal value in a PWR. Moreover, the irradiation can be repeated as many times as needed. It thus offers the unique possibility to characterize the evolution of the β spectrum as a function of the irradiation time and of the cooling time after irradiation. The expected modification of the β spectrum as a function of the irradiation time is connected to the transmutation, induced by neutron capture, of the fissile and fission fragment elements. It is thus related to the "natural" evolution of the spent fuel in the reactor. The modification of the β spectrum as a function of the cooling time is connected to the decay chains of the fission products and is thus a means to select the emitted fragments by their lifetime. This information is important because long-lived fission fragments accumulate in the core and after a few days mainly contribute to the low energy part of the antineutrino spectra. Due to the mechanical transfer of the sample from the irradiation location to the measurement station, an irreducible delay time of 30 min is imposed, leading to the loss of short-lived fragments.
PROSPECT TO STUDY FISSION OF 238U
The integral beta decay spectrum arising from 238U fission has never been measured. All information relies on theoretical computations [Vog89]. Some experiments could be envisaged using neutron sources of a few MeV in Europe (Van de Graaff in Geel, SINQ at PSI, ALVARES or SAMES accelerators at Valduc, …). Here the total absence of experimental data on the β spectrum emitted in the fission of 238U changes the context of this measurement compared to the other isotopes.
Indeed, any integral measurement performed could be used to constrain the present theoretical estimates of the antineutrino flux produced in the fission of 238U. In any case, it seems rather difficult to fulfil the goal of determining the isotopic content from antineutrino measurements as long as an important part of the energy spectrum is so poorly known.
Conclusions
After these preliminary studies, some observations can already be made. A realistic diversion (≈ 10 kg Pu) has an imprint on the antineutrino signal which is very small. The present knowledge of the antineutrino spectrum emitted in fissions is not precise enough to allow a determination of the isotopic content of the core sensitive to such a diversion. On the other hand, the thermal power measurement is a less difficult job. Neutrinos sample the whole core, without attenuation, and would bring valuable information on the power with totally different systematics than present methods. Even if this measurement is not dissuasive by itself, the operator cannot hide any stops or changes of power, and in most cases such a record, made with an external and independent device that is virtually impossible to fake, will act as a strong constraint. In spite of the uncertainty mentioned previously, we see that the most energetic part of the spectrum offers the best possibility to disentangle fissions of 235U and 239Pu. The comparison between the cumulative numbers of antineutrinos detected at low and at high energy is an efficient observable to distinguish pure 235U from pure 239Pu. The IAEA is also interested in monitoring large spent-fuel elements. For this application, the likelihood is that antineutrino detectors could only make measurements on large quantities of beta emitters, e.g., several cores of spent fuel. During the lifetime of the experiment, parts of the core will be discharged, and the Double Chooz experiment will quantify the sensitivity of such monitoring.
More generally, the techniques developed for the detection of antineutrinos could be applied to the monitoring of nuclear activities at the level of a country. Hence a KamLAND-type detector deeply submerged off the coast of the country would offer the sensitivity to detect a new underground reactor located several hundred kilometers away. All these common efforts toward more reliable techniques and remotely operated detectors, not to mention undersea techniques, will automatically benefit both fields, safeguards and geo-neutrinos.
References
[Bem02] Bemporad et al., Rev. of Mod. Phys., Vol. 74, (2002).
[Ber02] A. Bernstein, Y. Wang, G. Gratta, and T. West, J. Appl. Phys. 91, 4672 (2002)
[Ber06] A. Bernstein, these proceedings
[Dec95] Y. Declais et al., Nucl. Phys. B434, 503 (1995)
[Eng94] T.R. England and B.F. Rider, ENDF-349, LA-UR-94-3106.
[Hub04] P. Huber, Th. Schwetz, Precision spectroscopy with reactor anti-neutrinos, Phys. Rev. D70 (2004) 053011
[Kli94] Klimov et al., Atomic Energy, v.76-2, 123, (1994)
[Las06] T. Lasserre, these proceedings
[Loh04] ILL Instrument Review, 2004/2005.
[Mik77] Mikaelian L.A., Neutrino laboratory in the atomic plant, Proc. Int. Conference Neutrino-77, v. 2, p. 383-387
[Mar06] F. Marie, A. Letourneau et al., Nucl. Instr. and Meth. A556 (2006) 547.
[Mcn05] Monte Carlo N-Particle eXtended, LA-UR-05-2675, J.S. Hendricks et al.
[Mur05] MURE : MCNP Utility for Reactor Evolution - Description of the methods, first applications and results. Méplan O., Nuttin A., Laulan O., David S., Michel-Sendis F. et al. In Proceedings of the ENC 2005 (CD-Rom) (2005) 1-7.
[Sch85] K. Schreckenbach, G. Colvin, W. Gelletly, F.v. Feilitzch, Phys. Lett. B160 (1985) 325
[Ten89] O. Tengblad et al., Nuclear Physics A 503 (1989) 136-160.
[Vog89] P. Vogel and J. Engel, Phys. Rev.
D39, 3378 (1989) ABSTRACT Triggered by the demand of the IAEA, neutrino physicists in Europe involved with the Double Chooz experiment are studying the potential of neutrino detection to monitor nuclear reactors. In particular a new set of experiments at the ILL is planned to improve the knowledge of the neutrino spectrum emitted in the fission of 235U and 239Pu. <|endoftext|><|startoftext|> Introduction Two–dimensional massive Integrable Quantum Field Theories (IQFTs) have proven to be one of the most successful topics of relativistic field theory, with a large variety of applications to statistical mechanical models. The main reason for this success consists of their simplified on–shell dynamics which is encoded into a set of elastic and factorized scattering amplitudes of their massive particles [1, 2]. The two- particle S-matrix has a very simple analytic structure, with only poles in the physical strip, and it can be computed combining the standard requirements of unitarity, crossing and factorization together with specific symmetry properties of the theory. The complete mass spectrum is obtained looking at the pole singularities of the S–matrix elements. Off–mass shell quantities, such as the correlation functions, can be also determined once the elastic S–matrix and the mass spectrum are known. In fact, one can compute the exact matrix elements of the (semi)local fields on the asymptotic states with the Form Factor (FF) approach [3], and use them to write down the spectral representation of the correlators. By following this approach, it has been possible, for instance, to tackle successfully the long-standing problem of spelling out the mass spectrum and the correlation functions of the two dimensional Ising model in a magnetic field [2, 4], as well as many other interesting problems of statistical physics (for a partial list of them see, for instance, [5]). 
The S-matrix approach can also be constructed for massless IQFTs [6, 7, 8, 9], despite the subtleties in defining a scattering theory between massless particles in (1 + 1) dimensions, and turns out to be useful mainly when conformal symmetry is not present. In this case, massless IQFTs generically describe the Renormalization Group trajectories connecting two different Conformal Field Theories, which respectively rule the ultraviolet and infrared limits of all physical quantities along the flows. Given the large number of remarkable results obtained by the study of IQFTs, one of the most interesting challenges is to extend the analysis to non-integrable field theories, at least to those obtained as deformations of integrable ones, and to develop the corresponding perturbation theory. The breaking of integrability is expected to considerably increase the difficulty of the mathematical analysis, since scattering processes are no longer elastic. Non-integrable field theories are in fact generally characterized by particle production amplitudes, resonance states and, correspondingly, decay events. All these features strongly affect the analytic structure of the scattering amplitudes, introducing a rich pattern of branch cut singularities, in addition to the pole structure associated with bound and resonance states. For massive non-integrable field theories, a convenient perturbative scheme was originally proposed in [10] and called Form Factor Perturbation Theory (FFPT), since it is based on the knowledge of the exact Form Factors (FFs) of the original integrable theory. It was shown that, even using just the first order correction of the FFPT, a great deal of information can be obtained, such as the evolution of the particle content, the variation of the masses and the change of the ground state energy. Whenever possible, universal ratios were computed and successfully compared with their values obtained by other means.
Recently, for instance, the universal ratios relative to the decay of the particles with higher masses in the Ising model in a magnetic field have been obtained, once the temperature is displaced away from the critical value [11] (see also the contribution by G. Delfino in these proceedings [12]). For other important aspects of the Ising model along non-integrable lines see the references [13, 14, 15, 16]. Applied to the double Sine-Gordon model [17], the FFPT has been useful in clarifying the rich dynamics of this non-integrable model, in particular in relating the confinement of the kinks in the deformed theory to the non-locality properties of the perturbed operator and in predicting the existence of an Ising-like phase transition for particular ratios of the two frequencies, results which were later confirmed by a numerical study [18]. The FFPT has also been used to study the spectrum of the O(3) non-linear sigma model with a topological θ term, by varying θ [19, 20]. In this talk I would like to focus on a different approach to tackle some interesting non-integrable models, i.e. those two-dimensional field theories with kink topological excitations. Such theories are described by a real scalar field $\varphi(x)$, with a Lagrangian density $\mathcal{L} = \frac{1}{2}(\partial_\mu\varphi)^2 - U(\varphi)$, (1.1) where the potential $U(\varphi)$ possesses several degenerate minima at $\varphi^{(0)}_a$ (a = 1, 2, . . . , n), as the one shown in Figure 1. These minima correspond to the different vacua | a 〉 of the associated quantum field theory. The basic excitations of this kind of model are kinks and anti-kinks, i.e. topological configurations which interpolate between two neighbouring vacua. Semiclassically they correspond to the static solutions of the equation of motion, i.e. $\partial_x^2\varphi(x) = U'[\varphi(x)]$, (1.2) with boundary conditions $\varphi(-\infty) = \varphi^{(0)}_a$ and $\varphi(+\infty) = \varphi^{(0)}_b$, where b = a ± 1.
Denoting by $\varphi_{ab}(x)$ the solutions of this equation, their classical energy density is given by $\epsilon_{ab}(x) = \frac{1}{2}\left(\frac{d\varphi_{ab}}{dx}\right)^2 + U(\varphi_{ab}(x))$, (1.3) and its integral provides the classical expression of the kink masses $M_{ab} = \int_{-\infty}^{\infty}\epsilon_{ab}(x)\,dx$. (1.4)

Figure 1: Potential $U(\varphi)$ of a quantum field theory with kink excitations (A) and histogram of the masses of the kinks (B).

It is easy to show that the classical masses of the kinks $\varphi_{ab}(x)$ are simply proportional to the heights of the potential between the two minima $\varphi^{(0)}_a$ and $\varphi^{(0)}_b$: their histogram provides a caricature of the original potential (see Figure 1). The classical solutions can be set in motion by a Lorentz transformation, i.e. $\varphi_{ab}(x) \to \varphi_{ab}\left[(x \pm vt)/\sqrt{1-v^2}\right]$. In the quantum theory, these configurations describe the kink states $|K_{ab}(\theta)\rangle$, where a and b are the indices of the initial and final vacuum, respectively. The quantity θ is the rapidity variable which parameterises the relativistic dispersion relation of these excitations, i.e. $E = M_{ab}\cosh\theta$, $P = M_{ab}\sinh\theta$. (1.5) Conventionally $|K_{a,a+1}(\theta)\rangle$ denotes the kink between the pair of vacua {| a 〉, | a+1 〉} while $|K_{a+1,a}\rangle$ is the corresponding anti-kink. For the kink configurations it may be useful to adopt the simplified graphical form shown in Figure 2. The multi-particle states are given by a string of these excitations, with the adjacency condition of the consecutive indices for the continuity of the field configuration $|K_{a_1,a_2}(\theta_1)K_{a_2,a_3}(\theta_2)K_{a_3,a_4}(\theta_3)\ldots\rangle$, $(a_{i+1} = a_i \pm 1)$. (1.6) In addition to the kinks, in the quantum theory there may exist other excitations in the guise of ordinary scalar particles (breathers). These are the neutral excitations $|B_c(\theta)\rangle_a$ (c = 1, 2, . . .) around each of the vacua | a 〉.

Figure 2: Kink and antikink configurations.
For a theory based on a Lagrangian of a single real field, these states are all non-degenerate: in fact, there are no extra quantities which commute with the Hamiltonian and that can give rise to a multiplicity of them. The only exact (alias, unbroken) symmetries for a Lagrangian such as (1.1) may be the discrete ones, like the parity transformation P, for instance, or the charge conjugation C. However, since they are neutral excitations, they will be either even or odd eigenvectors of C. The neutral particles must be identified as the bound states of the kink-antikink configurations that start and end at the same vacuum | a 〉, i.e. $|K_{ab}(\theta_1)K_{ba}(\theta_2)\rangle$, with the "tooth" shapes shown in Figure 3.

Figure 3: Kink-antikink configurations which may give rise to a bound state near the vacuum | 0 〉.

If such two-kink states have a pole at an imaginary value $i u^c_{ab}$ within the physical strip 0 < Im θ < π of their rapidity difference θ = θ1 − θ2, then their bound states are defined through the factorization formula which holds in the vicinity of this singularity $|K_{ab}(\theta_1)K_{ba}(\theta_2)\rangle \simeq i\,\frac{g^c_{ab}}{\theta - i u^c_{ab}}\,|B_c\rangle_a$. (1.7) In this expression $g^c_{ab}$ is the on-shell 3-particle coupling between the kinks and the neutral particle. Moreover, the mass of the bound states is simply obtained by substituting the resonance value $i u^c_{ab}$ within the expression of the Mandelstam variable s of the two-kink channel $s = 4M^2_{ab}\cosh^2\frac{\theta}{2} \longrightarrow m_c = 2M_{ab}\cos\frac{u^c_{ab}}{2}$. (1.8) Concerning the vacua themselves, as is well known, in the infinite volume their classical degeneracy is removed by selecting one of them, say | k 〉, out of the n available. This happens through the usual spontaneous symmetry breaking mechanism, even though, strictly speaking, there may be no internal symmetry to break at all. This is the case, for instance, of the potential shown in Figure 1, which does not have any particular invariance.
In the absence of a symmetry which connects the various vacua, the world, as seen by each of them, may appear very different: they can have, indeed, different particle contents. The problem we would like to examine in this talk concerns the neutral excitations around each vacuum, in particular the question of the existence of such particles and of the value of their masses. To this aim, let us make use of a semiclassical approach.

2 A semiclassical formula

The starting point of our analysis is a remarkably simple formula due to Goldstone and Jackiw [23], which is valid in the semiclassical approximation, i.e. when the coupling constant goes to zero and the mass of the kinks becomes correspondingly very large with respect to any other mass scale. In its refined version, given in [24] and rediscovered in [25], it reads as follows (see footnote 1 and Figure 4): $f^{\varphi}_{ab}(\theta) = \langle K_{ab}(\theta_1)\,|\,\varphi(0)\,|\,K_{ab}(\theta_2)\rangle \simeq \int_{-\infty}^{\infty} dx\, e^{i M_{ab}\theta x}\,\varphi_{ab}(x)$, (2.9) where θ = θ1 − θ2.

Footnote 1: The matrix element of the field $\varphi(y)$ is easily obtained by using $\varphi(y) = e^{-iP_\mu y^\mu}\varphi(0)\,e^{iP_\mu y^\mu}$ and by acting with the conserved energy-momentum operator $P_\mu$ on the kink state. Moreover, for the semiclassical matrix element $F^{G}(\theta)$ of the operator $G[\varphi(0)]$, one should employ $G[\varphi_{ab}(x)]$. For instance, the matrix elements of $\varphi^2(0)$ are given by the Fourier transform of $\varphi^2_{ab}(x)$.

Figure 4: Matrix element between kink states.

Notice that, if we substitute in the above formula θ → iπ − θ, the corresponding expression may be interpreted as the following Form Factor: $F^{\varphi}_{ab}(\theta) = f^{\varphi}_{ab}(i\pi - \theta) = \langle a\,|\,\varphi(0)\,|\,K_{ab}(\theta_1)K_{ba}(\theta_2)\rangle$. (2.10) This matrix element involves the neutral kink-antikink states around the vacuum | a〉 we are interested in. Eq. (2.9) deserves several comments.

1. The appealing aspect of the formula (2.9) lies in the relation it establishes between the Fourier transform of the classical configuration of the kink, i.e.
the solution $\varphi_{ab}(x)$ of the differential equation (1.2), and the quantum matrix element of the field $\varphi(0)$ between the vacuum | a 〉 and the 2-particle kink state $|K_{ab}(\theta_1)K_{ba}(\theta_2)\rangle$. Once the solution of eq. (1.2) has been found and its Fourier transform has been taken, the poles of $F_{ab}(\theta)$ within the physical strip of θ identify the neutral bound states which couple to $\varphi$. The mass of the neutral particles can be extracted by using eq. (1.8), while the on-shell 3-particle coupling $g^c_{ab}$ can be obtained from the residue at these poles (Figure 5) $\lim_{\theta\to i u^c_{ab}} (\theta - i u^c_{ab})\,F_{ab}(\theta) = i\,g^c_{ab}\,\langle a\,|\,\varphi(0)\,|\,B_c\rangle$. (2.11)

2. It is important to stress that, for a generic theory, the classical kink configuration $\varphi_{ab}(x)$ is not related in a simple way to the anti-kink configuration $\varphi_{ba}(x)$. It is precisely for this reason that neighbouring vacua may have a different spectrum of neutral excitations, as shown in the examples discussed in the following sections.

Figure 5: Residue equation for the matrix element on the kink states.

3. It is also worth noting that this procedure for extracting the bound state masses makes it possible, in many cases, to avoid the semiclassical quantization of the breather solutions [22], making their derivation much simpler. The reason is that the classical breather configurations depend also on time and have, in general, a more complicated structure than the kink ones. Moreover, it can be shown that in non-integrable theories these configurations do not exist as exact solutions of the partial differential equations of the field theory. On the contrary, in order to apply eq. (2.9), one simply needs the solution of the ordinary differential equation (1.2). It is worth noting that, to locate the poles of $F_{ab}(\theta)$, one only needs to look at the exponential behavior of the classical solutions at x → ±∞, as discussed below.

In the next two sections we will analyse a class of theories with only two vacua, which can be either symmetric or asymmetric.
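The remark in point 3, that the poles are fixed by the exponential tails of the kink, can be made concrete with a short worked step. This is an illustrative sketch rather than part of the original derivation: it assumes that the kink approaches the vacuum as a single exponential with rate ω (the square root of the curvature of the potential at the minimum), and the amplitude A is a hypothetical constant.

```latex
% Assumption: near x -> -infinity the kink behaves as
%   \varphi_{ab}(x) \simeq \varphi^{(0)}_a + A\, e^{\omega x},
% with A an undetermined amplitude and \omega > 0.
\int_{-\infty}^{0} dx\; e^{i M_{ab}\theta x}\, A\, e^{\omega x}
   \;=\; \frac{A}{\omega + i M_{ab}\,\theta}
\quad\Longrightarrow\quad
f_{ab}(\theta)\ \text{has a pole at}\ \theta = \frac{i\,\omega}{M_{ab}} .

% Crossing \theta \to i\pi - \theta as in eq. (2.10):
F_{ab}(\theta)\ \text{has a pole at}\
\theta = i\pi\left(1 - \frac{\omega}{\pi M_{ab}}\right) \equiv i\,u ,
\qquad
m = 2 M_{ab}\cos\frac{u}{2} = 2 M_{ab}\,\sin\frac{\omega}{2 M_{ab}} .
```

Under these assumptions the pole lies inside the physical strip only if ω < π M_ab, and for ω much smaller than M_ab the mass obtained from eq. (1.8) reduces to m ≃ ω, the value one would expect from small oscillations around the vacuum.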
A complete analysis of other potentials can be found in the original paper [27].

3 Symmetric wells

A prototype example of a potential with two symmetric wells is the $\varphi^4$ theory in its broken phase. The potential is given in this case by $U(\varphi) = \frac{\lambda}{4}\left(\varphi^2 - \frac{m^2}{\lambda}\right)^2$. (3.12) Let us denote with | ±1 〉 the vacua corresponding to the classical minima $\varphi^{(0)}_\pm = \pm\frac{m}{\sqrt{\lambda}}$. By expanding around them, $\varphi = \varphi^{(0)}_\pm + \eta$, we have $U(\varphi^{(0)}_\pm + \eta) = m^2\eta^2 \pm m\sqrt{\lambda}\,\eta^3 + \frac{\lambda}{4}\eta^4$. (3.13) Hence, perturbation theory predicts the existence of a neutral particle for each of the two vacua, with a bare mass given by $m_b = \sqrt{2}\,m$, irrespective of the value of the coupling λ. Let us see, instead, what is the result of the semiclassical analysis. The kink solutions are given in this case by $\varphi_{-a,a}(x) = a\,\frac{m}{\sqrt{\lambda}}\tanh\left[\frac{m x}{\sqrt{2}}\right]$, a = ±1 (3.14) and their classical mass is $M_0 = \int_{-\infty}^{\infty}\epsilon(x)\,dx = \frac{2\sqrt{2}}{3}\,\frac{m^3}{\lambda}$. (3.15) The value of the potential at the origin, which gives the height of the barrier between the two vacua, can be expressed as $U(0) = \frac{3m}{8\sqrt{2}}\,M_0$, (3.16) and, as noticed in the introduction, is proportional to the classical mass of the kink. If we take into account the contribution of the small oscillations around the classical static configurations, the kink mass gets corrected as [22] $M = \frac{2\sqrt{2}}{3}\,\frac{m^3}{\lambda} - m\left(\frac{3}{\pi\sqrt{2}} - \frac{1}{2\sqrt{6}}\right) + O(\lambda)$. (3.17) It is convenient to define $c = \left(\frac{3}{2\pi} - \frac{1}{4\sqrt{3}}\right) > 0$, and also the dimensionless quantities $g = \frac{3\lambda}{2\pi m^2}$; $\xi = \frac{g}{1 - \pi c g}$. (3.18) In terms of them, the mass of the kink can be expressed as $M = \frac{\sqrt{2}\,m}{\pi\xi} = \frac{m_b}{\pi\xi}$. (3.19) Since the kink and the anti-kink solutions are equal functions (up to a sign), their Fourier transforms have the same poles. Hence, the spectrum of the neutral particles will be the same on both vacua, in agreement with the Z2 symmetry of the model. We have $f_{-a,a}(\theta) = \int_{-\infty}^{\infty} dx\, e^{iM\theta x}\,\varphi_{-a,a}(x) = i\,a\,\pi\sqrt{\frac{2}{\lambda}}\;\frac{1}{\sinh\left(\frac{\pi M}{\sqrt{2}\,m}\,\theta\right)}$. By making now the analytic continuation θ → iπ − θ and using the above definitions (3.18), we arrive at $F_{-a,a}(\theta) = \langle a\,|\,\varphi(0)\,|\,K_{-a,a}(\theta_1)K_{a,-a}(\theta_2)\rangle \propto \frac{1}{\sinh\left(\frac{i\pi-\theta}{\xi}\right)}$. (3.20) The poles of the above expression are located at $\theta_n = i\pi\,(1 - \xi n)$, n = 0, ±1, ±2, . . . (3.21) and, if ξ ≥ 1, (3.22) none of them is in the physical strip 0 < Im θ < π.
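The pole counting above is easy to check numerically. The following is a minimal sketch, not code from the original talk: it assumes the definitions g = 3λ/(2πm²), ξ = g/(1 − πcg) and c = 3/(2π) − 1/(4√3) read off from eq. (3.18), the kink mass M = √2 m/(πξ) of eq. (3.19), the pole positions of eq. (3.21) and the bound-state mass formula (1.8). The function names are hypothetical, and raising an error when 1 − πcg is not positive is a choice of this sketch.

```python
import math

# Constant c of eq. (3.18): c = 3/(2*pi) - 1/(4*sqrt(3)).
C = 3.0 / (2.0 * math.pi) - 1.0 / (4.0 * math.sqrt(3.0))


def xi_from_coupling(lam_over_m2):
    """Return xi = g / (1 - pi*c*g), with g = 3*lambda / (2*pi*m^2)."""
    g = 3.0 * lam_over_m2 / (2.0 * math.pi)
    denom = 1.0 - math.pi * C * g
    if denom <= 0.0:
        # Outside the range where the one-loop kink mass stays positive.
        raise ValueError("1 - pi*c*g must be positive")
    return g / denom


def critical_coupling():
    """Value of lambda/m^2 at which xi = 1 (no poles left in the strip)."""
    return (2.0 * math.pi / 3.0) / (1.0 + math.pi * C)


def poles_in_strip(xi):
    """Integers n >= 1 whose pole theta_n = i*pi*(1 - xi*n) has 0 < Im(theta) < pi."""
    ns = []
    n = 1
    while 1.0 - xi * n > 0.0:
        ns.append(n)
        n += 1
    return ns


def neutral_masses(xi, m=1.0):
    """Masses m_n = 2 M cos(u_n / 2), with u_n = pi*(1 - xi*n) and M = sqrt(2) m / (pi xi)."""
    kink_mass = math.sqrt(2.0) * m / (math.pi * xi)
    return [2.0 * kink_mass * math.cos(0.5 * math.pi * (1.0 - xi * n))
            for n in poles_in_strip(xi)]
```

For instance, `poles_in_strip(0.4)` lists two poles, and the corresponding second mass returned by `neutral_masses(0.4)` lies below twice the first one.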
Consequently, in the range of the coupling constant $\frac{\lambda}{m^2} \geq \frac{2\pi}{3}\,\frac{1}{1+\pi c} = 1.02338\ldots$ (3.23) the theory does not have any neutral bound states, neither on the vacuum to the right nor on the one to the left. Conversely, if ξ < 1, there are $n = \left[\frac{1}{\xi}\right]$ neutral bound states, where [x] denotes the integer part of the number x. Their semiclassical masses are given by $m_n = 2M\sin\left[n\,\frac{\pi\xi}{2}\right] = n\,m_b\left[1 - \frac{3}{32}\,\frac{\lambda^2}{m^4}\,n^2 + \ldots\right]$. (3.24) Note that the leading term is given by multiples of the mass of the elementary boson | B1〉. Therefore the n-th breather may be considered as a loosely bound state of n of them, with the binding energy provided by the remaining terms of the above expansion. But, because of the non-integrability of the theory, all particles with mass $m_n > 2m_1$ will eventually decay. It is easy to see that, if there are at most two particles in the spectrum, the inequality $m_2 < 2m_1$ always holds. However, if $\xi < \frac{1}{3}$, for the higher particles one always has $m_k > 2m_1$, for k = 3, 4, . . . , n. (3.25) According to the semiclassical analysis, the spectrum of neutral particles of $\varphi^4$ theory is then as follows: (i) if ξ > 1, there are no neutral particles; (ii) if $\frac{1}{2} < \xi < 1$, there is one particle; (iii) if $\frac{1}{3} < \xi < \frac{1}{2}$ there are two particles; (iv) if $\xi < \frac{1}{3}$ there are $\left[\frac{1}{\xi}\right]$ particles, although only the first two are stable, because the others are resonances.

Figure 6: Neutral bound states of $\varphi^4$ theory for g < 1. The lowest two lines are the stable particles whereas the higher lines are the resonances.

Let us now briefly mention some general features of the semiclassical methods, starting from an equivalent way to derive the Fourier transform of the kink solution. To simplify the notation, let us get rid of all possible constants and consider the Fourier transform of the derivative of the kink solution, expressed as $G(k) = \int_{-\infty}^{\infty} dx\, e^{ikx}\,\frac{1}{\cosh^2 x}$ .
(3.26) We split the integral in two terms $G(k) = \int_{-\infty}^{0} dx\, e^{ikx}\,\frac{1}{\cosh^2 x} + \int_{0}^{\infty} dx\, e^{ikx}\,\frac{1}{\cosh^2 x}$, (3.27) and we use the following series expansion of the integrand, valid on the entire real axis (except the origin) $\frac{1}{\cosh^2 x} = 4\sum_{n=1}^{\infty}(-1)^{n+1}\,n\,e^{-2n|x|}$. (3.28) Substituting this expression into (3.27) and computing each integral, we have $G(k) = 4\sum_{n=1}^{\infty}(-1)^{n+1}\,n\left[\frac{1}{ik+2n} + \frac{1}{-ik+2n}\right]$. (3.29) Obviously it coincides with the exact result, $G(k) = \pi k/\sinh\left(\frac{\pi k}{2}\right)$, but this derivation makes it easy to interpret the physical origin of each pole. In fact, changing k to the original variable in the crossed channel, k → (iπ − θ)/ξ, we see that the poles which determine the bound states at the vacuum | a〉 are only those relative to the exponential behaviour of the kink solution at x → −∞. This is precisely the point where the classical kink solution takes values on the vacuum | a〉. In the case of $\varphi^4$, the kink and the antikink are the same function (up to a minus sign) and therefore they have the same exponential approach at x = −∞ at both vacua | ±1〉. Mathematically speaking, this is the reason for the coincidence of the bound state spectrum on each of them: this does not necessarily happen in other cases, as we will see in the next section, for instance. The second comment concerns the behavior of the kink solution near the minima of the potential. In the case of $\varphi^4$, expressing the kink solution as $\varphi(x) = \frac{m}{\sqrt{\lambda}}\tanh\left[\frac{m x}{\sqrt{2}}\right] = \frac{m}{\sqrt{\lambda}}\;\frac{e^{\sqrt{2}\,m x} - 1}{e^{\sqrt{2}\,m x} + 1}$, (3.30) and expanding around x = −∞, we have $\varphi(t) = -\frac{m}{\sqrt{\lambda}}\left(1 - 2t + 2t^2 - 2t^3 + \cdots + 2\,(-1)^n t^n + \cdots\right)$, (3.31) where $t = \exp[\sqrt{2}\,m x]$. Hence, all the sub-leading terms are exponential factors, with exponents which are multiples of the first one. Is this a general feature of the kink solutions of any theory? It can be proved that the answer is indeed positive [27].
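The agreement between the pole expansion (3.29) and the closed form can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original text: since the expansion (3.28) is not valid at the origin, each term is integrated only over |x| > ε, which multiplies it by a damping factor exp(−2nε) and makes the alternating sum converge. The result should then differ from the closed form π k / sinh(π k / 2) only by the contribution of the excluded interval, of order 2ε. The function names and the default values of `eps` and `n_max` are hypothetical choices.

```python
import cmath
import math


def g_exact(k):
    """Closed form of the Fourier transform of 1/cosh^2(x); the k -> 0 limit is 2."""
    if abs(k) < 1e-12:
        return 2.0
    return math.pi * k / math.sinh(0.5 * math.pi * k)


def g_series(k, eps=1e-3, n_max=20000):
    """Pole expansion of eq. (3.29), with every term integrated over |x| > eps only."""
    total = 0.0
    for n in range(1, n_max + 1):
        # Integral of exp(-2 n x) exp(i k x) over x > eps ...
        right = cmath.exp(-(2 * n - 1j * k) * eps) / (2 * n - 1j * k)
        # ... and of exp(+2 n x) exp(i k x) over x < -eps.
        left = cmath.exp(-(2 * n + 1j * k) * eps) / (2 * n + 1j * k)
        total += (-1) ** (n + 1) * n * (left + right).real
    return 4.0 * total
```

Each denominator 2n ± ik vanishes at k = ∓2in, which is how the individual poles of G(k) on the imaginary axis can be traced back to single exponentials in the tails of the kink.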
The fact that the approach to the minimum of the kink solutions is always through multiples of the same exponential (when the curvature ω at the minimum is different from zero) implies that the Fourier transform of the kink solution has poles regularly spaced by $\xi_a \equiv \frac{\omega}{\pi M_{ab}}$ in the variable θ. If the first of them is within the physical strip, the semiclassical mass spectrum derived from the formula (2.9) near the vacuum | a 〉 has therefore the universal form $m_n = 2M_{ab}\sin\left[n\,\frac{\pi\xi_a}{2}\right]$. (3.32) As we have previously discussed, this means that, according to the value of $\xi_a$, we can have only the following situations at the vacuum | a 〉: (a) no bound state if $\xi_a > 1$; (b) one particle if $\frac{1}{2} < \xi_a < 1$; (c) two particles if $\frac{1}{3} < \xi_a < \frac{1}{2}$; (d) $\left[\frac{1}{\xi_a}\right]$ particles if $\xi_a < \frac{1}{3}$, although only the first two are stable, the others being resonances. So, semiclassically, each vacuum of the theory cannot have more than two stable particles above it. Conversely, if ω = 0, there are no poles in the Fourier transform of the kink and therefore there are no neutral particles near the vacuum | a 〉.

4 Asymmetric wells

In order to have a polynomial potential with two asymmetric wells, one must necessarily employ higher powers than $\varphi^4$. The simplest example of such a potential is obtained with a polynomial of maximum power $\varphi^6$, and this is the example discussed here. Apart from its simplicity, the $\varphi^6$ theory is relevant for the universality class of the Tricritical Ising Model [28]. As we will see, the information available on this model turns out to be a nice confirmation of the semiclassical scenario. A class of potentials which may present two asymmetric wells is given by $U(\varphi) \propto \left(\varphi + a\,\frac{m}{\sqrt{\lambda}}\right)^2\left(\varphi - b\,\frac{m}{\sqrt{\lambda}}\right)^2\left(\varphi^2 + c\,\frac{m^2}{\lambda}\right)$, (4.33) with a, b, c all positive numbers. To simplify the notation, it is convenient to use the dimensionless quantities obtained by rescaling the coordinate as $x^\mu \to m\,x^\mu$ and the field as $\varphi(x) \to \frac{\sqrt{\lambda}}{m}\,\varphi(x)$. In this way the Lagrangian of the model becomes, up to an overall constant built from m and λ, $\mathcal{L} \propto \frac{1}{2}(\partial\varphi)^2 - \frac{1}{2}(\varphi + a)^2(\varphi - b)^2(\varphi^2 + c)$ .
(4.34) The minima of this potential are localised at $\varphi^{(0)}_0 = -a$ and $\varphi^{(0)}_1 = b$ and the corresponding ground states will be denoted by | 0 〉 and | 1 〉. The curvature of the potential at these points is given by $U''(-a) \equiv \omega_0^2 = (a+b)^2(a^2+c)$; $U''(b) \equiv \omega_1^2 = (a+b)^2(b^2+c)$. (4.35) For a ≠ b, we have two asymmetric wells, as shown in Figure 7. To be definite, let us assume that the curvature at the vacuum | 0 〉 is higher than the one at the vacuum | 1 〉, i.e. a > b. The problem we would like to examine is whether the spectrum of the neutral particles $|B\rangle_s$ (s = 0, 1) may be different at the two vacua, in particular, whether it would be possible that one of them (say | 0〉) has no neutral excitations, whereas the other has just one neutral particle. Ordinary perturbation theory shows that both vacua have neutral excitations, although with different values of their masses: $m^{(0)} = (a+b)\sqrt{2\,(a^2+c)}$, $m^{(1)} = (a+b)\sqrt{2\,(b^2+c)}$. (4.36) Let us see, instead, what is the semiclassical scenario. The kink equation is given in this case by $\frac{d\varphi}{dx} = \pm(\varphi + a)(\varphi - b)\sqrt{\varphi^2 + c}$. (4.37)

Figure 7: Example of $\varphi^6$ potential with two asymmetric wells and a bound state only on one of them.

We will not attempt to solve this equation exactly, but we can nevertheless present its main features. The kink solution interpolates between the values −a (at x = −∞) and b (at x = +∞). The anti-kink solution does the opposite, but with an important difference: its behaviour at x = −∞ is different from the one of the kink. As a matter of fact, the behaviour at x = −∞ of the kink is always equal to the behaviour at x = +∞ of the anti-kink (and vice versa), but the two vacua are approached, in this theory, differently. This is explicitly shown in Figure 8 and proved in the following.

Figure 8: Typical shape of , obtained by a numerical solution of eq. (4.37).

Let us consider the limit x → −∞ of the kink solution. For these large values of x, we can approximate eq.
(4.37) by substituting, in the second and in the third term of the right-hand side, $\varphi \simeq -a$, with the result $\frac{d\varphi}{dx} \simeq (\varphi + a)(a+b)\sqrt{a^2+c}$, x → −∞ (4.38) This gives rise to the following exponential approach to the vacuum | 0〉: $\varphi_{0,1}(x) \simeq -a + A\exp(\omega_0 x)$, x → −∞ (4.39) where A > 0 is an arbitrary constant (its actual value can be fixed by properly solving the non-linear differential equation). To extract the behavior at x → −∞ of the anti-kink, we substitute this time $\varphi \simeq b$ into the first and third term of the right-hand side of (4.37), so that $\frac{d\varphi}{dx} \simeq (\varphi - b)(a+b)\sqrt{b^2+c}$, x → −∞ (4.40) This results in the following exponential approach to the vacuum | 1〉: $\varphi_{1,0}(x) \simeq b - B\exp(\omega_1 x)$, x → −∞ (4.41) where B > 0 is another constant. Since $\omega_0 \neq \omega_1$, the asymptotic behaviour of the two solutions gives rise to the following poles in their Fourier transforms: $\mathcal{F}(\varphi_{0,1}) \to \frac{A}{\omega_0 + ik}$, $\mathcal{F}(\varphi_{1,0}) \to \frac{-B}{\omega_1 + ik}$. (4.42) In order to locate the pole in θ, we shall reintroduce the correct units. Assuming we have solved the differential equation (4.37), the integral of its energy density gives the common mass of the kink and the anti-kink. Its value is $M \propto \alpha$, (4.43) where α is a number (typically of order 1), coming from the integral of the dimensionless energy density (1.4), and the proportionality factor is fixed by the constants in front of the Lagrangian (4.34). Hence, the first poles (see footnote 2) of the Fourier transform of the kink and the antikink solution are localised at $\theta^{(0)} \simeq i\pi\left(1 - \omega_0\,\frac{m}{\pi M}\right)$, (4.44) $\theta^{(1)} \simeq i\pi\left(1 - \omega_1\,\frac{m}{\pi M}\right)$.

Footnote 2: In order to determine the others, one should look for the subleading exponential terms of the solutions.

If we now choose the coupling constant in the range $\frac{1}{\omega_0} < \frac{m}{\pi M} < \frac{1}{\omega_1}$, (4.45) the first pole will be out of the physical sheet whereas the second will still remain inside it! Hence, the theory will have only one neutral bound state, localised at the vacuum | 1 〉.
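The different approach to the two vacua can also be seen by integrating the kink equation numerically. The following is a minimal sketch under stated assumptions, not the procedure used for the figures of the talk: it takes the dimensionless kink equation (4.37) with the sign for which the field grows from −a to b, integrates it with a fixed-step fourth-order Runge-Kutta scheme starting from the midpoint between the two vacua, and estimates the exponential rates of the tails from the slope of the logarithm. The function names, the starting point, the step count and the fitting window are all hypothetical choices.

```python
import math


def kink_rhs(phi, a, b, c):
    """Right-hand side of eq. (4.37), positive for -a < phi < b."""
    return (phi + a) * (b - phi) * math.sqrt(phi * phi + c)


def integrate(phi0, x_end, a, b, c, steps):
    """Fixed-step RK4 from x = 0 to x_end (x_end may be negative)."""
    h = x_end / steps
    xs, phis = [0.0], [phi0]
    x, phi = 0.0, phi0
    for _ in range(steps):
        k1 = kink_rhs(phi, a, b, c)
        k2 = kink_rhs(phi + 0.5 * h * k1, a, b, c)
        k3 = kink_rhs(phi + 0.5 * h * k2, a, b, c)
        k4 = kink_rhs(phi + h * k3, a, b, c)
        phi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
        xs.append(x)
        phis.append(phi)
    return xs, phis


def log_slope(xs, values, i, j):
    """Slope of log(values) between the samples with indices i and j."""
    return (math.log(values[j]) - math.log(values[i])) / (xs[j] - xs[i])


def asymptotic_rates(a, b, c, x_max=8.0, x_fit=5.0, steps=16000):
    """Exponential rates with which the kink leaves -a and reaches b."""
    phi0 = 0.5 * (b - a)  # midpoint between the two vacua
    i_fit = round(steps * x_fit / x_max)
    xs, phis = integrate(phi0, -x_max, a, b, c, steps)
    rate_minus = log_slope(xs, [p + a for p in phis], i_fit, steps)
    xs, phis = integrate(phi0, x_max, a, b, c, steps)
    rate_plus = -log_slope(xs, [b - p for p in phis], i_fit, steps)
    return rate_minus, rate_plus
```

With the illustrative values a = 1, b = 0.5, c = 1, the two rates returned by `asymptotic_rates` can be compared with ω0 and ω1 of eq. (4.35). Since the anti-kink is the mirror image of the kink, the rate at which the kink reaches b at x → +∞ is the rate at which the anti-kink leaves b at x → −∞.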
This result may be expressed by saying that the appearance of a bound state depends on the order in which the topological excitations are arranged: an antikink-kink configuration gives rise to a bound state whereas a kink-antikink does not. Finally, notice that the value of the dimensionless coupling constant can be chosen so that the mass of the bound state around the vacuum | 1 〉 becomes equal to the mass of the kink. This happens when $\frac{m}{\pi M} = \frac{1}{3\,\omega_1}$. (4.46) Strange as it may appear, the semiclassical scenario is well confirmed by an explicit example. This is provided by the exact scattering theory of the Tricritical Ising Model perturbed by its sub-leading magnetization. First discovered through a numerical analysis of the spectrum of this model [29], its exact scattering theory was discussed later in [30].

5 Conclusions

In this paper we have used simple arguments of the semiclassical analysis to investigate the spectrum of neutral particles in quantum field theories with kink excitations. We have concentrated our analysis on two cases: the first relative to a potential with symmetric wells, the second concerning a potential with asymmetric wells. Leaving aside the exact values of the quantities extracted by the semiclassical methods, it is perhaps more important to underline some general features which have emerged through this analysis. One of them concerns, for instance, the existence of a critical value of the coupling constant, beyond which there are no neutral bound states. Another result is about the maximum number n ≤ 2 of stable neutral particles living on a generic vacuum of a non-integrable theory. An additional aspect is the role played by the asymmetric vacua, which may have a different number of neutral excitations above them.

Acknowledgements

I would like to thank G. Delfino and V. Riva for interesting discussions. I am particularly grateful to M. Peyrard for very useful and enjoyable discussions on solitons.
This work was done under partial support of the ESF grant INSTANS.

References
[1] A.B. Zamolodchikov and Al.B. Zamolodchikov, Ann. Phys. 120 (1979) 253.
[2] A.B. Zamolodchikov, Adv. Stud. Pure Math. 19 (1989), 641.
[3] F.A. Smirnov, Form Factors in Completely Integrable Models of Quantum Field Theory, (World Scientific, Singapore, 1992); M. Karowski and P. Weisz, Nucl. Phys. B 139, (1978), 455.
[4] G. Delfino and G. Mussardo, Nucl. Phys. B 455, (1995), 724; G. Delfino and P. Simonetti, Phys. Lett. B 383, (1996), 450.
[5] G. Mussardo, Phys. Rept. 218 (1992), 215.
[6] Al.B. Zamolodchikov, Nucl. Phys. B 358, (1991), 524.
[7] A.B. Zamolodchikov and Al.B. Zamolodchikov, Nucl. Phys. B 379, (1992), 602.
[8] P. Fendley, H. Saleur and N.P. Werner, Nucl. Phys. B 430, (1994), 577.
[9] G. Delfino, G. Mussardo and P. Simonetti, Phys. Rev. D 51, (1995), 6622.
[10] G. Delfino, G. Mussardo and P. Simonetti, Nucl. Phys. B 473, (1996), 469.
[11] P. Grinza, G. Delfino and G. Mussardo, hep-th/0507133, Nucl. Phys. B in press.
[12] G. Delfino, Particle decay in Ising field theory with magnetic field, Proceedings ICMP 2006.
[13] B.M. McCoy and T.T. Wu, Phys. Rev. D 18 (1978), 1259.
[14] P. Fonseca and A.B. Zamolodchikov, J. Stat. Phys. 110 (2003), 527.
[15] S.B. Rutkevich, Phys. Rev. Lett. 95 (2005), 250601.
[16] P. Fonseca and A.B. Zamolodchikov, Ising Spectroscopy I: Mesons at T < Tc, hep-th/0612304.
[17] G. Delfino and G. Mussardo, Nucl. Phys. B 516, (1998), 675.
[18] Z. Bajnok, L. Palla, G. Takacs, F. Wagner, Nucl. Phys. B 601, (2001), 503.
[19] D. Controzzi and G. Mussardo, Phys. Rev. Lett. 92, (2004), 021601.
[20] D. Controzzi and G. Mussardo, Phys. Lett. B 617, (2005), 133.
[21] G. Delfino, P. Grinza and G. Mussardo, Nucl. Phys. B 737 (2006), 291.
[22] R.F. Dashen, B. Hasslacher and A. Neveu, Phys. Rev. D 10 (1974) 4130; R.F. Dashen, B. Hasslacher and A. Neveu, Phys. Rev. D 11 (1975) 3424.
[23] J. Goldstone and R. Jackiw, Phys. Rev. D 11 (1975) 1486.
[24] R.
Jackiw and G. Woo, Phys. Rev. D 12 (1975), 1643. [25] G. Mussardo, V. Riva and G. Sotkov, Nucl. Phys. B 670 (2003), 464. [26] G. Mussardo, V. Riva and G. Sotkov, Nucl. Phys. B 699 (2004), 545. G. Mussardo, V. Riva and G. Sotkov, Nucl. Phys. B 705 (2005), 548 [27] G. Mussardo, Neutral bound states in kink-like theories, hep-th/0607025, to appear on Nucl. Phys. B. [28] A.B. Zamolodchikov, Sov.J.Nucl.Phys. 44 (1986), 529. [29] M. Lassig, G. Mussardo and J.L. Cardy, Nucl. Phys. B 348 (1991), 591. [30] F. Colomo, A. Koubek and G. Mussardo, Int. Journ. Mod. Phys. A 7 (1992), 5281. http://arxiv.org/abs/hep-th/0607025 Introduction A semiclassical formula Symmetric wells Asymmetric wells Conclusions ABSTRACT In this talk we discuss an elementary derivation of the semi-classical spectrum of neutral particles in two field theories with kink excitations. We also show that, in the non-integrable cases, each vacuum state cannot generically support more than two stable particles, since all other neutral exitations are resonances, which will eventually decay. <|endoftext|><|startoftext|> Introduction Observations and data reduction The colour - magnitude diagrams Cluster parameters King 11 Berkeley 32 Summary and discussion ABSTRACT We have obtained CCD BVI imaging of the old open clusters Berkeley 32 and King 11. Using the synthetic colour-magnitude diagram method with three different sets of stellar evolution models of various metallicities, with and without overshooting, we have determined their age, distance, reddening, and indicative metallicity, as well as distance from the Galactic centre and height from the Galactic plane. 
The best parameters derived for Berkeley 32 are: subsolar metallicity (Z=0.008 represents the best choice, Z=0.006 or 0.01 are more marginally acceptable), age = 5.0-5.5 Gyr (models with overshooting; without overshooting the age is 4.2-4.4 Gyr with poorer agreement), (m-M)_0=12.4-12.6, E(B-V)=0.12-0.18 (with the lower value being more probable because it corresponds to the best metallicity), Rgc ~ 10.7-11 kpc, and |Z| ~ 231-254 pc. The best parameters for King 11 are: Z=0.01, age=3.5-4.75 Gyr, (m-M)_0=11.67-11.75, E(B-V)=1.03-1.06, Rgc ~ 9.2-10 kpc, and |Z| ~ 253-387 pc. <|endoftext|><|startoftext|> Introduction The genetic programming (GP) bibliography1, created and maintained by one of us (WBL) and by S. Gustafson contains most of the GP papers. As such, it is a rich source of data that implicitly describes many aspects of the structure of the GP community. Searching the bibliography and looking at the images2 provides a lot of useful information about the field and the people working on GP. However, a deeper analysis of the data, that goes beyond the mere pictorial aspect, provides a much more complete view. The coauthorship data is a social network since collaborating in a research ∗Information Systems Department, University of Lausanne, Switzerland †Information Systems Department, University of Lausanne, Switzerland ‡Dpt. of Animal Production Epidemiology and Ecology, University of Torino, Italy §Department of Computer Science, University of Essex, UK 1http://www.cs.bham.ac.uk/∼wbl/biblio/ 2http://www.cs.bham.ac.uk/∼wbl/biblio/gp-coauthors/ http://arxiv.org/abs/0704.0551v1 study usually requires that the coauthors become personally acquainted. Thus, studying those ties, their structure, and their evolution allows a better understanding of the factors that shape scientific collaboration. We present a systematic study of the GP coauthorship data base us- ing methods and tools pertaining to complex networks and social network analysis. 
Social network analysis (see [?] for a survey), although it is an old discipline, has recently received new impetus and tools from the field of complex networks (see [?] for an excellent review). This is mainly due to the relatively recent availability of large machine-readable databases such as the GP bibliography. Social acquaintances involve psychological and other human aspects that are difficult to quantify. However, as it has been done in other fields [?, ?, ?, ?], we use objective data such as joint published work to stand for social bonds. Since this must ignore subtler aspects of a col- laboration relationship, it is obviously far from perfect as a social indicator, yet it is still a good “proxy” for the network of social relationships and can reveal several interesting facts and trends. A preliminary investigation of the GP coauthorship network appears in [?]. In the first part of this article we update this initial study using the most recent data and adding the study of the influence of excluding co- edited proceedings and books. In the second part we offer a new analysis of the finer community structure of the collaboration network. Similar studies have been performed in the last few years on several other collaboration networks in disciplines such as physics, mathematics, medicine, biology, and computer science [?, ?, ?, ?]. A related investigation concerning the EC collaboration network [?] has appeared recently in popular form, but it does not take into account, for example the community structure of the network. [?] deals with some of the same statistical features for the EC community at large as we describe in detail here for GP. The values reported by [?] are in line with those found here for the GP field. Given that the intersection between the GP researchers and general EC is likely to be rather large, it would be interesting to study how they are related to each other. 
2 The GP Collaboration Network We treat the genetic programming social network as a graph where each node is a GP researcher, i.e. someone who has at least one entry in the bib- liography. There is a connection between two people if they have coauthored at least one paper, or if they have coedited one or more book or proceedings. As of the start of 2007, there is a total of N = 2809 connected nodes, i.e. authors that have at least one GP collaborator, and a total of 5853 edges (collaborations) in the GP coauthorship network. There are 367 isolated vertices, which represent authors who have not collaborated with others to the extent of coauthoring a paper. Isolated vertices are ignored in our graph statistics. We have also excluded a single paper with 108 coauthors in a nuclear physics journal. This is because we consider it to be an anomalous entry that is not representative of typical collaborations in our discipline. Due to the youth of GP, the graph is relatively small compared to some studied collaboration networks [?, ?, ?]. (Although some published studies have covered much smaller and more specialised networks, e.g. of only 50 people [?].) The main disadvantage of studying a relatively small database is that, like any statistical study, more data allows deeper and more mean- ingful inferences to be drawn. In particular, studies of the form of the distributions (such as whether they follow exponential or power laws) re- quire a large amount of data. The advantages include that the graph almost fully represents the state of the whole GP community. This allows reliable characterisation of collaboration in the community. Also, the problems of multiple authors with the same name (e.g. A. Smith), outliers and different name spelling that plague the larger data sets, are unlikely and easy to spot in our data. 
Although in many cases in our field co-editing a book or proceedings volume does reflect personal acquaintance, there are some large coeditorships which are not representative and so may give a slanted view. Therefore in the following figures we present two kinds of statistics: those that include all joint publications and those in which co-edited conference proceedings and co-edited books are excluded (but not their contents, of course). Next we present and discuss some basic measures that characterise the GP collaboration network.

2.1 Number of Papers per Author

The average number of papers per author is 3.16 with co-edited books and proceedings and 3.14 without. The five most prolific authors are, in decreasing order: J. Koza, R. Poli, W. B. Langdon, W. Banzhaf and C. Ryan. If we exclude proceedings' co-editors the ranking remains unchanged. Naturally the distribution of the number of papers per author, $P(k)$, has some scatter, particularly in the tail of the distribution. Thus, we present in Figure 1 the graph of the cumulative distribution $P(k \ge n)$, which is smoother and allows the same inferences to be made. The curves are rather well fitted by a straight line, and thus the distributions follow a power-law $P(k) \propto k^{-\gamma}$ with a calculated exponent $\gamma$ of 2.5 for both of them. A power-law distribution with similar exponents has been observed for analogous collaboration networks, e.g. 2.86 for a biological publication database (Medline), 3.41 for a computer science database (NCSTRL), 2.4 for mathematics, and 2.1 for a neuroscience papers database [?, ?]. A smaller exponent (in absolute value) means that the tail of the distribution is more stretched towards high values of $k$.

Figure 1: Cumulative distribution of the number of entries per author, with and without coeditors, together with the power-law fit. Log-log scale. The straight line is the best mean-square fit and shows the number of authors is $\propto k^{-2.5}$.
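The straight-line fit on log-log axes described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used for Figure 1: the function names `cumulative_counts` and `loglog_fit` are hypothetical, and an ordinary least-squares fit with NumPy is assumed as the "best mean-square fit".

```python
import numpy as np

def cumulative_counts(papers_per_author):
    """Return the distinct values n and the number of authors with at least n papers."""
    counts = np.asarray(papers_per_author)
    values = np.unique(counts)  # sorted distinct values of n
    at_least = np.array([(counts >= n).sum() for n in values])
    return values, at_least

def loglog_fit(values, at_least):
    """Least-squares straight line through the points on log-log axes.

    Returns (slope, intercept) of log10(at_least) against log10(values).
    """
    slope, intercept = np.polyfit(np.log10(values), np.log10(at_least), 1)
    return slope, intercept

# Toy data (hypothetical, not taken from the GP bibliography).
values, at_least = cumulative_counts([1, 1, 2, 3, 3, 3])
slope, intercept = loglog_fit(values, at_least)
```

One caveat when reading such a fit: for a density $P(k) \propto k^{-\gamma}$ the cumulative curve $P(k \ge n)$ falls off approximately as $n^{-(\gamma - 1)}$, so the slope obtained from the cumulative plot has to be converted before it is compared with exponents quoted for the density.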
2.2 Number of Collaborators per Author

The average number of collaborators per author, i.e. the mean degree 〈k〉 of the coauthorship graph, is 4.17 with proceedings and 3.62 without. This is close to the values reported by studies of computer science, physics (excluding high energy physics) and mathematics, suggesting GP follows similar collaboration patterns to those disciplines. However, it is much less than found in high energy physics and medicine (see Table 1). In order and including co-edited volumes, the five authors that have the largest number of collaborators are: W. Banzhaf, J.A. Foster, P. Nordin, W.B. Langdon, U.-M. O’Reilly. Without co-edited books the ranking is: P. Nordin, W. Banzhaf, J. Daida, C. Ryan and R. Goodacre. The five “pairs” that have the highest number of coauthored papers are, in decreasing order both with and without co-edited proceedings: J. Koza–M. A. Keane, R. Poli–W.B. Langdon, J. Koza–D. Andre, J. Koza–F. Bennet and F. Bennet–M.A. Keane. This shows that J. Koza’s group has been tightly collaborating for a long time, a conclusion that is confirmed in the community study of section 4. It is also evident that the W.B. Langdon–R. Poli association has been an extremely productive one.

Figure 2: Cumulative distribution of the number of authors with a given number of collaborators, with and without coeditors. Logarithmic scale on both axes.

Figure 2 shows the cumulative distributions of the number of collaborators. One sees that the distributions are not pure power-laws, otherwise the points would approximately lie on a straight line. Rather, the distributions show a power-law regime in the first part followed by an exponential decay in the tail. That is, the distribution as a whole cannot be fitted by a power-law. This is quite common. In fact, several measured social networks do not follow a power-law degree distribution [?, ?]
and are best fitted either by an exponential degree distribution $P(k) \approx e^{-k/\langle k \rangle}$ or by an exponentially truncated power-law of the type $P(k) \approx k^{-\gamma} e^{-k/k_c}$, where $k_c$ represents a critical connectivity and $\langle k \rangle$ is the average degree.

2.3 Number of Authors per Paper

Figure 3 shows the cumulative distribution of the number of papers written by a given number of coauthors. Here the distribution also has a tail that is longer than that of a Gaussian or exponential distribution; however, it does not follow a power-law. The average number of authors per paper is 2.25 (2.22 without co-editors). From Table 1 we can see that these figures are close to the equivalent ones for computer science (NCSTRL) and physics, while mathematics has a lower number of co-authors per paper. On the other hand, nuclear physics stands out with an unusually high number of coauthors per paper.

Figure 3: Cumulative distribution of the number of papers with a given number of coauthors, with and without coeditors, on log-log scales.

From Figures 2 and 3 one can see that the tails of the distributions with co-editors are longer than without them. Thus, taking co-editorship into account seems to inflate rather artificially the number of publications with many co-authors and, as a consequence, the number of collaborators that a person has.

2.4 Connected Components

In the theory of Poisson random graphs there is a critical value of the average degree, 〈k〉 = 1, above which there is a sudden appearance of a giant component. It is so called because most vertices belong to it. The other components are smaller and have an exponentially decreasing size distribution [?]. Although collaboration graphs are not random, a similar phenomenon appears. Including coeditors there are 1025 GP authors in the giant component. This is 36.5% of the total graph. If we exclude co-edited proceedings etc., the size is 743, representing 26.9% of the total.
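The giant-component fractions quoted above (1025 of 2809 connected authors, about 36.5%, and 743 of 2765, about 26.9%) are the kind of figure that a breadth-first search over the coauthorship edge list produces. The following is a minimal sketch under the assumption that the graph is given as a list of author pairs; the function names and the toy edge list are hypothetical and are not taken from the GP bibliography tools. As in the text, isolated authors never appear because only authors with at least one edge are counted.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components

def giant_component_fraction(edges):
    """Fraction of connected authors that belong to the largest component."""
    components = connected_components(edges)
    if not components:
        return 0.0
    total = sum(len(c) for c in components)
    return len(components[0]) / total

# Toy coauthorship list: one group of three authors and one pair.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
fraction = giant_component_fraction(toy_edges)
```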
In the giant component the average number of collaborators per author is 5.83 with co-editors and 4.39 without them. The cumulative size distribution of the connected components with and without co-editors are depicted in Figure 4. Figure 4 shows that the proba- bility density functions are well approximated by a power law with exponent of 2.9 (excluding co-editors) and 2.6 (total). Since the other authors did not provide the analogous data for their databases, we do not know how our figures would compare with those for other coauthorship databases. The existence of a big connected component has a social meaning. It suggests 36.5% of GP researchers are members of a single community, since those researchers are either directly connected via a collaboration or they are close to each other in a way that will be made clear in section 3. The size of the giant component is notably smaller in the GP graph with respect to other measured coauthorship networks (see Table 1). This may be due to the comprehensive nature of the GP bibliography. It captures work done by smaller groups which does not get into major journals, whereas, perhaps, the other databases concentrate upon higher impact outlets where work is heavily cited but at the expense of ignoring less regarded authors. This may artificially inflate the fraction of authors within their giant component. Alternatively it may be due to the youth of the GP field, with many semi- isolated individuals and groups starting research independently. One should also consider that all collaboration networks are in a non- equilibrium state as they are continuously evolving [?]. Accordingly, as time goes by, one should observe small components progressively connecting themselves to the large one. For example, in less than one year the size of the giant component including co-editors has grown from 942 to 1025 nodes. This is due in part to a number of newcomers collaborating with people already belonging to the giant component. 
The other part comes from the absorption of a few disconnected small components into the giant one thanks to one or more new collaborations. This suggests that the size of the giant component has not yet reached its “steady-state” value and that it will continue to grow in relative size. Since we possess all the time-stamped data, it is possible to study the evolution of this component, as well as several other indicators, from the beginning up to the present day. This investigation is currently under way.

Figure 4: Cumulative distributions of the number of connected components in the collaboration graph by number of people, with and without coeditors, each with its power-law fit. Log-log scale.

2.5 Social GP Clusters

The clustering coefficient of a node in a graph is the proportion of pairs of its neighbouring nodes which are also neighbours of each other. The average clustering coefficient 〈C〉 is calculated across all nodes in the graph [?]. In other words, 〈C〉 is a simple statistical measure of the amount of local structure that is present in a graph. Most real-world networks, e.g. the world wide web, roads and electrical power transmission, and including the social networks that have been studied to date, have a much larger clustering coefficient than would be expected of a random graph with the same number of vertices and edges. Social networks are particularly clustered. For example, the average clustering coefficient is 〈C〉 = 0.665 for the GP collaboration graph including book co-editors, and it is 0.660 without. (We would expect 0.0015 and 0.0013 for the corresponding random graphs.) In terms of scientific collaborations, a high clustering coefficient means that people tend to collaborate in groups of three or more. This agrees with what we know of the GP field. It may mean that two researchers that collaborate independently with a third one may, in time, become acquainted and so collaborate themselves.
Alternatively it might be due to collaborators coming from the same institution. In all cases, a high value of 〈C〉 for a social network is an indication that collaborations are not made at random at all, and that social forces and processes are at work in the network structure formation.

Table 1 summarises the results of this section and compares them with those for some other collaboration networks. Some of the entries in the table will be discussed in the following section. Most GP statistics are similar to those of the larger databases. However one notable difference, as we have already remarked, is the relative smallness of the largest component. The clustering is rather high, which shows that GP researchers know each other quite well within the large component, and the community is rather homogeneous.

Table 1: Basic statistics for some scientific collaboration networks. GP1 is the GP bibliography at the start of 2007, including coedited books and proceedings. GP2 is the same but without coeditors. SPIRES is a data set of papers in high-energy physics. Medline is a database of articles on biomedical research. Mathematics comprises articles from Mathematical Reviews. NCSTRL is a database of preprints in computer science. Physics has been assembled from papers posted on the Physics E-print Archive. Details about these databases can be found in [?, ?, ?].

                                   GP1     GP2     SPIRES   Medline   Mathematics   NCSTRL   Physics
Total number of papers             4564    4504    66652    2163923   1600000       13169    98502
Total number of authors            2809    2765    56627    1520251   253339        11994    52909
Average papers per author          3.16    3.14    11.6     6.4       7             2.55     5.1
Average authors per paper          2.25    2.22    8.96     3.754     1.5           2.22     2.53
Average collaborators per author   4.17    3.62    173      18.1      2.94          3.59     9.7
Size of the giant component (%)    36.5    26.9    88.7     92.6      82.0          57.2     85.0
Clustering coefficient             0.665   0.660   0.726    0.066     0.15          0.496    0.43
Average path length                4.74    5.2     4.0      4.6       7.73          9.7      5.9
In contrast, in biology and medicine or mathematics, where scientists from different sub-disciplines seldom collaborate, the clustering coefficient is lower. Note also the high number of authors per paper, and especially the strikingly high number of collaborators per author, in the nuclear physics community (SPIRES). Clearly, nobody can maintain an average of 173 scientific partners on a first-hand acquaintance basis and thus this figure does not seem to be socially meaningful.

3 Distances and Centrality

A social network can be characterised by a number of measures that give an idea of “how far” people are from each other, or how “central” they are with respect to the whole community. These measures are well known in social network analysis. Here we shall concentrate on average path length and on betweenness centrality.

3.1 Average Path Length

The average path length $L$ of a graph is the average length of the shortest paths between all of its pairs of vertices. In random graphs and many real networks, such as the Internet, the World Wide Web and social networks, the average path length scales as a logarithmic function $O(\log N)$ of the number of vertices $N$. Such networks, if they also have a high clustering coefficient, are known as small-world networks [?], since, even for very large graphs, any two nodes in a small-world network are only a few steps apart. In contrast, in regular lattices two nodes are $O(N^{1/D})$ apart, where $D$ is the lattice’s dimensionality. (For example, for a square lattice $L \le 2N^{1/2}$.) The average path length of the giant component of the GP collaboration graph including coeditors is 4.74 (it is 5.2 without coeditors). The longest among all the shortest paths (known as the diameter) is 12 (14 without coeditors). Thus, unsurprisingly, the GP community, as far as its “core” component is concerned, is indeed a small world and is characterised by values that are typical of these kinds of network (see Table 1).
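The two quantities that together define a small world, the average clustering coefficient 〈C〉 of section 2.5 and the average path length $L$ defined above, can both be computed directly from an edge list. The sketch below is illustrative only: the function names are hypothetical, nodes with fewer than two neighbours are assumed to contribute a clustering coefficient of zero (conventions differ on this point), and the path length is averaged over reachable pairs, so it should be applied to a single connected component such as the giant one.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (author, author) pairs."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def average_clustering(adjacency):
    """Mean over nodes of (linked neighbour pairs) / (all neighbour pairs)."""
    total = 0.0
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            continue  # assumed convention: such nodes count as 0
        nb = list(neighbours)
        linked = sum(1 for i in range(k) for j in range(i + 1, k)
                     if nb[j] in adjacency[nb[i]])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adjacency)

def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0

# Toy graph: a triangle A-B-C with one extra collaborator D attached to C.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
C_avg = average_clustering(toy)
L_avg = average_path_length(toy)
```

In the toy graph, A and B each have clustering 1, C has 1/3 and D has 0, and the six pairs of nodes are separated by a total of eight steps.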
Being a small world means that information may circulate quickly and collaborations are easier to set up. These are clearly advantageous for a research community. The connected components following the largest one are themselves small worlds. We expect that over time some of them will merge with the largest component. (For this to happen, only a single new collaboration between two scientists, each belonging to one of the components, is needed.)

3.2 Betweenness

The betweenness b(v) of a vertex v is the total number of shortest paths between all possible pairs of vertices that pass through this vertex. Nodes that have a high betweenness potentially have more influence, i.e. they are more central in the network, in that there is more “traffic” that goes through them. The first five authors in terms of betweenness in the network (including co-editors and in decreasing order) are: W. Banzhaf, H. Iba, U.-M. O’Reilly, H. de Garis and W. B. Langdon. W. Banzhaf is also the researcher that has the highest number of different collaborators. Without co-editors the ranking is: W. B. Langdon, U.-M. O’Reilly, W. Banzhaf, M. Tomassini and P. Nordin. People who have a large value of betweenness play the role of intermediaries or “brokers” in a social sense.

3.3 Non-random collaborations between directly connected authors

Most technological and biological networks are disassortative in that they have a negative degree correlation, meaning that high-degree vertices are preferentially connected to low-degree vertices. However most measured social networks are assortative, meaning highly connected nodes tend to be connected with other highly connected nodes [?]. The GP collaboration network confirms this general observation with a correlation coefficient of +0.15 for the giant component, and +0.30 for the whole graph (including coeditors and excluding the single physicist’s paper). These are close to the coefficients observed for other social networks (specifically 0.127 for Medline and 0.120 for Mathematics [?]).

Figure 5: One of the communities belonging to the main network component. The thickness of the links gives an indication of the number of co-authored papers. The largest thickness indicates more than 16 coauthored works. The thinnest link (light gray) stands for a single collaboration. The different symbols and colours represent sub-communities of the illustrated community.

4 Communities in the Giant Component

All the researchers belonging to the largest component of the network can be said to form part of the GP community at large, in the sense that they are all only a few steps away from any other member of the community. However, we know from direct experience that some groups of GPers are more closely connected between themselves than with other people. In other words, they belong to what one might call a group or a tighter community within the global one. It is not easy to give a rigorous quantitative definition of a community within a network. For our purposes a community can be seen as a set of highly connected vertices having few connections with vertices belonging to other communities. In the analysis of social networks, several algorithms that attempt to split a network into communities have been proposed. We used Newman’s method [?], which is based on a measure of the fraction of edges that fall within communities minus the expected value of the same quantity if edges fall at random without regard for the community structure. Since the GP bibliography contains the number of papers that any two collaborators have published together, it is possible to go a step further than just saying that two people have coauthored at least one paper, and give a measure of the intensity of the collaboration. We use the number of papers that two given authors have in common as a measure of the strength of their collaboration. Newman [?]
has proposed a more refined measure which takes into account the actual number of coauthors of each paper. However, this is more complicated than we need; instead we ignore the total number of coauthors for each paper. Our measure of collaboration strength is used in our communities algorithm to highlight groups of researchers that have collaborated strongly, with the aim of uncovering the stability of the scientific relationship. We have also excluded coedited proceedings, books, etc., as we have already seen that these might sometimes represent spurious collaboration relationships. The results of running the algorithm on the subgraph represented by the largest connected component are qualitatively surprisingly close to what one would expect, given our knowledge of the GP field. The advantage is that the analysis makes them explicit and uncovers a number of other relationships that would be difficult to infer without an explicit study of the raw data.

Figure 6: Another community belonging to the main network component. The thickness of the links gives an indication of the number of co-authored papers. The largest thickness indicates more than 16 coauthored works. The thinnest link (light gray) stands for a single collaboration. The different symbols and colours represent sub-communities.

As an example of the roughly 25 communities that the algorithm discovers, Figure 5 shows the structure of the groups around one of us (“Toma”). If we now consider this community as an isolated subgraph and apply Newman’s algorithm to it again, we obtain the groups highlighted by different symbols and colours in the figure. Thus, the groups correspond to sub-communities within the main community. The thickness of the links represents the intensity of the relationship. It is easy to recognise a “hard core” of collaborating researchers strongly connected to “Toma” forming triads and higher polygons of order four and five.
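The quantity described in this section, the weight of links falling inside communities minus the value expected if links were placed at random, is commonly called modularity. A minimal weighted version is sketched below, assuming the standard form $Q = \sum_c \left[ w_c / m - (s_c / 2m)^2 \right]$, where $m$ is the total link weight, $w_c$ the weight of links inside community $c$, and $s_c$ the summed strength of its nodes. The function name, the input format and the toy data are hypothetical; the sketch only scores a given partition and does not reproduce the search procedure that finds the communities.

```python
def modularity(weighted_edges, community):
    """Weighted modularity of a partition.

    weighted_edges: list of (author_a, author_b, number_of_joint_papers)
    community:      dict mapping each author to a community label
    """
    m = 0.0          # total weight of all links
    strength = {}    # summed link weight per author
    inside = {}      # link weight falling inside each community
    for a, b, w in weighted_edges:
        m += w
        strength[a] = strength.get(a, 0.0) + w
        strength[b] = strength.get(b, 0.0) + w
        if community[a] == community[b]:
            label = community[a]
            inside[label] = inside.get(label, 0.0) + w
    total = {}       # summed strength per community
    for author, s in strength.items():
        label = community[author]
        total[label] = total.get(label, 0.0) + s
    return sum(inside.get(label, 0.0) / m - (total[label] / (2.0 * m)) ** 2
               for label in total)

# Toy example: two triangles of single-paper collaborations joined by one link.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_split = modularity(toy_edges, split)
q_single = modularity(toy_edges, {name: 0 for name in split})
```

Placing every author in a single community gives a score of zero by construction, while the two-triangle split scores higher, which is the sense in which a partition with many internal and few external links is preferred.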
The strong triangle (“Foli”, “Pizz”, “Spez”) is relatively loosely connected to the rest, showing that these researchers be- long to the community but often collaborate between themselves. It is also possible to discern institutional and geographical components in the com- munity. For example, most of the upper right part of the figure through the node “Chop” comprises researchers essentially belonging to the University of Geneva, which is close to the University of Lausanne, to which “Toma” belongs. However, geographical closeness is not the key factor in the other groups which belong to Universities in France, Italy, Spain, and the US. We might conjecture that many collaborations start locally at the same or at close institutions and then they spread through people being introduced to others via a common acquaintance, or through people physically moving or visiting other institutions. This is the case in the figure, where “Vann”, ”Chop”, and “Vega” among others have played the role of “bridges” between different institutions and across countries. As a second illustration, let us look at Figure 6 which is the community that revolves around one of us (“Lang”) and “Poli”. In contrast to the pre- vious case, one can see that the graph structure is more “star-like”, with two large directly connected big hubs (“Lang” and “Poli”) who have about 70 co- authored papers, and three other highly connected nodes (“Buxt”,“McPh”, “Rowe”) which are strongly connected to one of the main hubs but not to both. It is interesting to observe the role of “McPh” who, like “Vega” in the previous community (cf. Figure 5), plays a bridging role, this time between some UK and some American researchers. We can also recognise a strong ”theory-oriented” group, which is almost a clique in the graph, formed by (“McPh”,“Poli”,“Rowe”, “Steph”, “Wrig”). There is also another bridge formed by “Cagn” from UK to Italy, again due to a long-standing collabo- ration and friendship. 
The small cliques or almost cliques at the periphery of the figure essentially represent people that have worked at the same in- stitution in either Italy or the US. The discussion above, motivated by our belonging to the mentioned com- munities, and thus by our direct human knowledge about them, should be enough to get an impression of the many useful observations that one can make on the communities that interlock in the main network component. There are of course several other large well known and interesting commu- nities in the network but unfortunately we cannot describe them here for reasons of space. 5 Conclusions In sections 2 and 3 we characterised the genetic programming (GP) coau- thorship graph using a number of local and global statistics. We extended and updated the findings presented in [?] by studying the influence of coedited volumes and by using the latest data available. Section 3 showed the GP field to be highly clustering and that the GP coauthorship network has a small mean path length. Together these suggest that, at least for the core, GP is indeed a “small world”. We also found, compared with other published collaboration networks, that the fraction of GP authors connected by coauthorship is a relatively small fraction of all GP authors. Section 4 is a study of the community structure of GP. It uses a more pre- cise definition of collaboration, which takes into account the intensity of the relationship. This uncovers many groups of tightly interacting researchers. From the detailed study of two of the communities we have drawn inferences about the pivotal role of some researchers or groups of researchers in pro- moting collaborations within and between academic institutions. Adding our human knowledge about geographical location and personal acquain- tance, allows some conjectures to be drawn about the way in which different continents and countries collaborate on research projects. 
It should be obvious that the present data-driven approach to social network analysis can only provide some answers but not all of them. Algorithms and data cannot take into account human aspects such as friendship in scientific collaboration. While these may be buried in the sea of numbers, they will never appear explicitly from such analyses. Nevertheless, we feel that our results are interesting and useful in the way that they characterise our community. There is another aspect of the collaboration graph that would be revealing: the analysis of its development over the years. Indeed, since each paper has a date of publication, we possess all the data that are needed for such an investigation. This would allow the detailed study of how the network has grown to its present size and structure from the beginning and might give hints as to its future progress. This extension is currently under investigation. Introduction The GP Collaboration Network Number of Papers per Author Number of Collaborators per Author Number of Authors per Paper Connected Components Social GP Clusters Distances and Centrality Average Path Length Betweenness Non-random collaborations between directly connected authors Communities in the Giant Component Conclusions ABSTRACT Useful information about scientific collaboration structures and patterns can be inferred from computer databases of published papers. The genetic programming bibliography is the most complete reference of papers on GP. In addition to locating publications, it contains coauthor and coeditor relationships from which a more complete picture of the field emerges. We treat these relationships as undirected small world graphs whose study reveals the community structure of the GP collaborative social network. Automatic analysis discovers new communities and highlights new facets of them.
The investigation reveals many similarities between GP and coauthorship networks in other scientific fields but also some subtle differences such as a smaller central network component and a high clustering. <|endoftext|><|startoftext|> arXiv:0704.0552v1 [astro-ph] 4 Apr 2007

The Expanding Photosphere Method: Progress and Problems

József Vinkó and Katalin Takáts
Department of Optics & Quantum Electronics, University of Szeged, Hungary

Abstract. Distances to well-observed Type II-P SNe are determined from an updated version of the Expanding Photosphere Method (EPM), based on recent theoretical models. The new EPM distances show good agreement with other independent distances to the host galaxies without any significant systematic bias, contrary to earlier results in the literature. The accuracy of the method is comparable with that of the distance measurements for Type Ia SNe.

Keywords: supernovae; core-collapse; distances
PACS: 97.10.Vm, 97.60.Bw

INTRODUCTION

Distance is one of the most fundamental quantities in astrophysics, and this is especially true for supernovae. Type Ia SNe are thought to be the most reliable distance indicators, even up to z ∼ 1.5 redshift, and they play a major role in determining the expansion of the Universe as well as the cosmic equation of state. On the other hand, accurate distances to SNe are crucial not only for understanding their physical properties, but also for revealing their progenitor objects and the possible explosion mechanisms.

The Expanding Photosphere Method (EPM) is a tool for measuring distances to SNe that have a large amount of ejected material [1]. The concept of EPM is based on a few assumptions about the general physics of the expanding ejecta. These are the following:

1. The expansion of the ejected material is spherically symmetric.
2. The ejecta is expanding homologously, i.e. $R(t) = v(R) \cdot (t - t_e)$, where $R(t)$ is the time-dependent radius of a particular layer in the ejecta, $v(R)$ is the (constant) expansion velocity of this layer and $t - t_e$ is the time elapsed since the moment of explosion ($t_e$).
3. The ejecta is optically thick, i.e. there exists a layer where the optical depth $\tau_\lambda \sim 1$. This layer is the “photosphere” ($R_{phot}$). Because of the expansion, the location of the photosphere moves inward through the ejecta, so its velocity ($v_{phot}$) decreases with time.
4. The photosphere radiates as a blackbody, so the shape of the emergent flux spectrum is Planckian with a well-defined effective temperature $T_{eff}$. However, the absolute flux value differs from that of the blackbody due to the dominance of scattering opacity over true absorption in the ejecta. The deviation from the blackbody can be described with a simple scaling, i.e. $F_\lambda = \zeta^2 \pi B_\lambda(T)$, where $F_\lambda$ is the surface flux, $B_\lambda(T)$ is the Planck function and $\zeta$ is the correction (or “dilution”) factor.

These assumptions are most likely to be valid in Type II-P SNe. These eject a massive, hydrogen-rich envelope that remains optically thick for ∼ 100 days after explosion, and the emergent spectrum is indeed close to Planckian. Thus, EPM is expected to work best for such SNe.

Based on the assumptions, the instantaneous radius of the photosphere can be expressed as $R_{phot} = v_{phot}(t) \cdot (t - t_e)$ (the radius of the progenitor is usually neglected). Meanwhile, the observed flux is $f_\lambda = \theta^2 \cdot \zeta^2 \pi B_\lambda(T)$, where $\theta = R_{phot}/D$ is the angular radius of the photosphere seen from distance $D$. Combining these two equations, one gets the basic equation of EPM [2, 3]:

$$ t = t_e + D \cdot \frac{\theta}{v_{phot}} . \qquad (1) $$

Since $\theta$ and $v_{phot}$ can be determined from observations, $t_e$ and $D$ are the only unknowns in Eq. 1. These can be derived via least-squares fitting to the observed quantities. If the SNe under study are at high redshifts, the equations should be slightly modified [4].
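Because Eq. 1 is linear in the ratio $\theta / v_{phot}$, the least-squares step mentioned above reduces to fitting a straight line whose slope is $D$ and whose intercept is $t_e$. The sketch below illustrates this under stated assumptions: the function name `fit_epm`, the choice of units (epochs in days, $\theta$ in radians, $v_{phot}$ in km/s) and the synthetic numbers are all hypothetical, an unweighted fit is used, and it is not the code described in this paper.

```python
import numpy as np

KM_PER_MPC = 3.0857e19       # kilometres in one megaparsec
SECONDS_PER_DAY = 86400.0

def fit_epm(t_days, theta_rad, v_kms):
    """Unweighted straight-line fit of t = t_e + D * (theta / v_phot).

    Returns (t_e in days, D in Mpc).
    """
    t = np.asarray(t_days, dtype=float)
    # theta / v has units of s/km; rescale to days per Mpc so that the
    # slope of the fitted line is the distance in Mpc.
    x = (np.asarray(theta_rad, dtype=float) / np.asarray(v_kms, dtype=float)
         * KM_PER_MPC / SECONDS_PER_DAY)
    distance_mpc, t_explosion = np.polyfit(x, t, 1)
    return t_explosion, distance_mpc

# Synthetic, noise-free example (values chosen for illustration only).
D_true, te_true = 11.5, 2.0                       # Mpc, days
t_obs = np.array([10.0, 20.0, 30.0, 40.0])        # days
v_obs = np.array([9000.0, 7000.0, 5000.0, 4000.0])  # km/s
theta_obs = v_obs * (t_obs - te_true) * SECONDS_PER_DAY / (D_true * KM_PER_MPC)
t_e, D = fit_epm(t_obs, theta_obs, v_obs)
```

With real data one would normally weight the points by their uncertainties and propagate the errors on $\theta$ and $v_{phot}$; both are omitted here for brevity.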
The definition of the angular radius is connected with the angular distance $D_A$, while the luminosity distance $D_L$ enters the expression of the observed flux. At high $z$, $D_L = (1+z)^2 D_A$, so the angular radius of the photosphere can be inferred from

$\theta = \sqrt{\frac{f_\lambda\,(1+z)}{\zeta^2 \pi B_{\lambda'}(T)}}$, (2)

where $\lambda' = \lambda/(1+z)$.

One particular advantage of EPM is that it does not require initial calibration, i.e. a sample of objects with a priori known distances. However, the computation of the $\zeta$ correction factors needs detailed model atmospheres, which makes the method essentially model-dependent. Currently, there are two independent sets of model atmospheres of Type II-P SNe in the literature, which were used to compute correction factors as a function of $T_{\mathrm{eff}}$ [5, 6]. The former set was used in detailed studies of SN 1999em (the most extensively studied SN II-P so far) that resulted in $D_{\mathrm{EPM}} \approx 8 \pm 1$ Mpc [2, 3, 7]. This is in significant disagreement with the Cepheid distance to the host galaxy NGC 1637, $D_{\mathrm{Cep}} = 11.7 \pm 1$ Mpc [8]. This problem has been solved in [9] by using a new set of correction factors based on the NLTE radiative transfer code CMFGEN, which gave $D_{\mathrm{EPM}} = 11.5 \pm 1.0$ Mpc for SN 1999em.

NEW EPM DISTANCES TO SNE II-P

The method outlined above has been implemented in a new code that needs observed BVRI light curves, radial velocities (determined from the absorption minima of certain spectral features, see below) and reddening information (typically $E(B-V)$) as input. As in any method based on photometry, the magnitudes must be dereddened, but fortunately the results of EPM are quite insensitive to reddening errors, compared with other methods [5].

[Figure 1 panels: left, SN 1999em (+8 d), $T_{bb}$ = 16862 K, horizontal axis Wavelength (Å); right, horizontal axis $T$ ($10^3$ K).]
FIGURE 1. Left panel: Result of fitting a blackbody (solid line) to broadband BVI fluxes (filled symbols) of SN 1999em [3]. The R-band flux is also in good agreement with the fitted blackbody.
The flux-calibrated spectrum obtained simultaneously (dotted line) is shown for comparison. Right panel: The correction factor as a function of $T_{\mathrm{eff}}$ from [6] (filled circles) and [5] (open circles).

At each epoch, the angular radius has been computed by a simultaneous fit to the dereddened B, V and I fluxes, as described in [2]. The corresponding effective temperature has been derived by fitting a blackbody curve to the broadband fluxes converted from the dereddened magnitudes. Our experience shows that the best results can be achieved by considering all optical+NIR (i.e. BVRI) fluxes simultaneously. Earlier studies were sometimes limited to the use of the B and V bands only, which may result in increased systematic errors due to the large deviation of the B-band fluxes from the blackbody curve at later phases. The left panel of Fig.1 illustrates the optimum fitting of a blackbody to either photometric or precisely calibrated spectroscopic fluxes.

From the resulting $T_{\mathrm{eff}}$, the correction factor $\zeta$ has been computed from the $\zeta_{BVI}(T)$ function of Dessart & Hillier [6] for data obtained less than 40 days after explosion. For data measured between 40 and 60 days after explosion, the function given by Eastman et al. [5] was applied. As noted above, the function of Dessart & Hillier produces better distances, but their models are valid only during the first month after explosion, before the hydrogen starts to recombine. The $\zeta_{BVI}(T)$ functions are plotted in the right panel of Fig.1.

Besides the correction factors, the other important quantity is the photospheric velocity $v_{\mathrm{phot}}$, because the resulting distance is very sensitive to the velocities that appear in the denominator in Eq.1. Thus, the problem of finding an optimum method to infer $v_{\mathrm{phot}}$ from Type II-P SNe spectra has been addressed in several studies (see [2, 3, 6]). We have studied this problem by computing model spectra with the parametrized spectral synthesis code SYNOW [10].
SYNOW computes the emergent spectrum in a homologously expanding atmosphere assuming LTE and pure scattering line formation. The input parameters are the velocity and the blackbody temperature at the photosphere ($v_{\mathrm{phot}}$ and $T_{\mathrm{eff}}$), the exponent of the atmospheric structure, the list of ions contributing to the spectral features, and the optical depth of one strong line for each ion. Four sets of spectra have been defined corresponding to phases +10, +15, +50 and +95 days after explosion, respectively. The list of ions contained H, He I, Na I, Fe II, Sc II, Ti II and Ba II, because these ions are thought to be responsible for the strongest lines in the optical [3]. The input parameters except $v_{\mathrm{phot}}$ were tuned to match real Type II-P SNe spectra. Then, several model spectra were synthesized with different input $v_{\mathrm{phot}}$ for each phase. The left panel of Fig.2 shows representative spectra of all phases.

[Figure 2 panels: left, horizontal axis Wavelength (Å), spectra labelled +10 d, +15 d, +50 d and +95 d; right, horizontal axis $v_{\mathrm{obs}}$ ($10^3$ km/s), lines He I 5876, Fe II 4924, Fe II 5018, Fe II 5169 and Sc II 5526.]
FIGURE 2. Left panel: SYNOW model spectra of Type II-P SNe. The phase of each spectrum (expressed in days after explosion) is indicated. Right panel: The ratio of the true photospheric velocity (an input parameter of a SYNOW model) to the "observed" velocity (derived from the absorption minimum of P Cygni lines) as a function of the "observed" velocity. Different symbols denote the different photospheric lines indicated on the right-hand side.

The synthesized spectra were used to compute "observed" radial velocities by measuring the Doppler shift of the absorption minima of selected lines. For P Cygni line profiles, this should give exactly $v_{\mathrm{phot}}$ if the line is isolated and optically thin. However, in reality the features in a SN spectrum are all blends and may not be optically thin. Therefore, $v_{\mathrm{obs}}$ will differ from $v_{\mathrm{phot}}$.
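The velocity measurement itself can be sketched in a few lines. The example below assumes the non-relativistic Doppler formula and a toy, isolated Gaussian absorption dip; the function name `absorption_velocity`, the search window and the 8000 km/s input are illustrative assumptions and do not reproduce the SYNOW models.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def absorption_velocity(wave, flux, rest_wave, max_kms=30000.0):
    """Velocity (km/s) from the blueshifted absorption minimum of one line.

    Uses v = c * (rest - min) / rest and searches only blueward of the
    rest wavelength, out to a blueshift of max_kms.
    """
    wave = np.asarray(wave)
    flux = np.asarray(flux)
    blue_edge = rest_wave * (1.0 - max_kms / C_KMS)
    window = (wave >= blue_edge) & (wave < rest_wave)
    lam_min = wave[window][np.argmin(flux[window])]
    return C_KMS * (rest_wave - lam_min) / rest_wave


# Toy profile: a Gaussian dip placed at the wavelength for 8000 km/s (assumed)
rest = 5876.0                                # He I rest wavelength in Angstrom
wave = np.linspace(5400.0, 6000.0, 6001)     # 0.1 Angstrom sampling
lam_true = rest * (1.0 - 8000.0 / C_KMS)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - lam_true) / 15.0) ** 2)

v_meas = absorption_velocity(wave, flux, rest)
```

For an isolated dip the recovered velocity matches the input to within the wavelength sampling; in blended or optically thick features the minimum shifts, which is the effect quantified by the ratio of true to "observed" velocity.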
In the right panel of Fig.2 the ratio $v_{\mathrm{phot}}/v_{\mathrm{obs}}$ is plotted as a function of $v_{\mathrm{obs}}$ for the features shown. It is seen that in almost all cases $v_{\mathrm{phot}}$ is slightly underestimated. The explanation of this phenomenon is discussed in [6] for the Hα line. However, the relative difference is below 5 %, so these lines are expected to represent $v_{\mathrm{phot}}$ with 2-4 % accuracy. Motivated by these results, we have selected the He I λ5876 and the Fe II λ5169 features to infer $v_{\mathrm{phot}}$ from early-phase (< +20 days) and late-phase spectra of real SNe, respectively.

In order to apply the method to real SNe, we have collected the available data of Type II-P SNe from the literature (details and references will be published in a forthcoming paper). Eq.1 was fitted to the observed data via least squares using either $t$ or $\theta/v_{\mathrm{phot}}$ as the independent variable. The two results for $D$ and $t_e$ were averaged to obtain their final values. Whenever possible, the fit was restricted to data obtained between +5 and +40 days after explosion, and the angular radii were calculated using the Dessart & Hillier correction factors (see above). In a few cases only late-phase ($t \sim 40-60$ days) data were available. The Eastman et al. correction factors were applied for those SNe.

[Figure 3 panels: left, horizontal axis $D_{\mathrm{ref}}$ (Mpc).]
FIGURE 3. Left panel: the comparison of EPM (filled circles) and SCM (open triangles) distances with the reference distances of the host galaxies. Right panel: residuals of the distance moduli of Type II-P SNe from EPM (filled symbols) and Type Ia SNe (see text).

The EPM distances are plotted against the "reference" distances to their host galaxies (mostly Tully-Fisher or SBF distances for the nearby ones and Hubble-flow distances for the more distant ones) in the left panel of Fig.3. As a comparison, the distances coming from the "Standard Candle Method" (SCM) [11] for nearly the same observational sample are also shown. The scatter is very similar for both EPM and SCM.
It is concluded that these two methods provide distances to Type II-P SNe with ∼ 15-20 % accuracy.

The accuracy of the new EPM distances is also similar to that of individual SNe Ia distances. This is illustrated in the right panel of Fig.3, where the residuals of the distance moduli of Type II-P SNe (from this paper) and of the low-redshift subsample of Type Ia SNe (from [12]) are plotted against the reference distance moduli (adopting $D_{\mathrm{ref}} = cz/H_0$ for Type Ia SNe). Again, the scatter of the data is similar for the two samples. Thus, the concept of EPM combined with the present knowledge of Type II-P SNe atmospheres may provide consistent and reliable distances, which may be extended toward higher redshifts in the future. This could be a very important, independent test of the Type Ia SNe distance scale.

This work was supported by Hungarian OTKA Grants No. T 042509 and TS 049872.

REFERENCES

1. Kirshner, R.P., & Kwan, J., ApJ 193, 27 (1974)
2. Hamuy, M. et al., ApJ 558, 615 (2001)
3. Leonard, D.C. et al., PASP 114, 35 (2002)
4. Schmidt, B.P. et al., AJ 107, 1444 (1994)
5. Eastman, R.G., Schmidt, B.P., & Kirshner, R., ApJ 466, 911 (1996)
6. Dessart, L., & Hillier, D.J., A&A 439, 671 (2005)
7. Elmhamdi, A. et al., MNRAS 338, 939 (2003)
8. Leonard, D.C., Kanbur, S.M., Ngeow, C.C., & Tanvir, N.R., ApJ 594, 247 (2003)
9. Dessart, L., & Hillier, D.J., A&A 447, 691 (2006)
10. Baron, E. et al., ApJ 545, 444 (2000)
11. Hamuy, M., in Cosmic Explosions - IAU Colloquium 192, edited by J. M. Marcaide and K. W. Weiler, Springer Proceedings in Physics 99, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 535–541.
12. http://braeburn.pha.jhu.edu/~ariess/R06/Davis07_R07_WV07.dat
<|endoftext|><|startoftext|> Introduction

The old idea[1] that spontaneous Lorentz invariance violation (SLIV) may lead to an alternative theory of quantum electrodynamics still remains extremely attractive in numerous theoretical contexts[2] (for some later developments, see the papers[3]). The SLIV could generally cause the appearance of massless vector Nambu-Goldstone modes which are identified with photons and other gauge fields underlying the modern particle physics framework, such as the Standard Model and Grand Unified Theories. At the same time, Lorentz violation by itself has attracted considerable attention in recent years as an interesting phenomenological possibility appearing in various quantum field and string theories[4-9].

Early models realizing the SLIV conjecture were based on the four-fermion (current-current) interaction, where the proposed gauge field may appear as a fermion-antifermion pair composite state[1], in complete analogy with the massless composite scalar field in the original Nambu-Jona-Lasinio model[10]. Unfortunately, owing to the lack of a starting gauge invariance in such models and the composite nature of the Goldstone modes that appear, it is hard to demonstrate explicitly that these modes really form together a massless vector boson that could serve as a gauge field candidate. Actually, one must make a precise tuning of parameters, including a cancellation between terms of different orders in the 1/N expansion (where N is the number of fermion species involved), in order to achieve the massless photon case[11].
Rather, there are in general three separate massless Goldstone modes, two of which may mimic the transverse photon polarizations, while the third one must properly be suppressed.

In this connection, a more instructive laboratory for SLIV consideration proves to be a simple class of QED-type models having from the outset a gauge invariant form, whereas the Lorentz violation is realized through the nonlinear dynamical constraint imposed on the starting vector field $A_\mu$

$A_\mu^2 = n_\mu^2 M^2$ (1)

where $n_\mu$ is a properly oriented unit Lorentz vector, while $M$ is the proposed SLIV scale. This constraint means in essence that the vector field $A_\mu$ develops the vacuum expectation value $\langle A_\mu(x)\rangle = n_\mu M$ and Lorentz symmetry SO(1,3) breaks down to SO(3) or SO(1,2) depending on the time-like ($n_\mu^2 = +1$) or space-like ($n_\mu^2 = -1$) SLIV. Such a QED model was first studied by Nambu a long time ago[12], but only for the time-like SLIV case and in the tree approximation. For this purpose he applied the technique of nonlinear symmetry realizations, which had proved successful in handling the spontaneous breakdown of chiral symmetry in the nonlinear σ model[13] and beyond¹.

In the present paper, we mainly address ourselves to the Yang-Mills gauge fields as the possible vector Goldstone modes (Sec.3) once some basic ingredients of the Goldstonic QED model are established in a general SLIV case (Sec.2). This problem has been discussed many times in the literature within quite different contexts, such as the Yang-Mills gauge fields as the Goldstone modes for the spontaneous breaking of general covariance in a higher-dimensional space[17] or for the nonlinear realization of some special infinite parameter gauge group[18]. However, all these considerations look rather speculative and optional. Specifically, they do not give a correlation between the SLIV induced photon case, on the one hand, and the Yang-Mills gauge field case, on the other. In contrast, our approach is solely based on the spontaneous Lorentz violation, thus properly generalizing Nambu's QED model[12] to the non-Abelian internal symmetry case. It is precisely in this approach that the interrelation between the two cases appears most transparent.

We will see that in the Yang-Mills theory case with an internal symmetry group G having D generators, not only the pure Lorentz symmetry part SO(1,3) in the symmetry SO(1,3) × G of the Lagrangian, but also the larger accidental symmetry SO(D,3D) of the SLIV constraint $\mathrm{Tr}(A_\mu A^\mu) = \pm M^2$ in itself is spontaneously broken. Because the starting non-Abelian theory is expanded about a vacuum which violates this much higher accidental symmetry, many extra massless modes, the pseudo-Goldstone vector bosons (PGB), have to arise. Actually, while the spontaneous Lorentz violation on its own still generates only one genuine Goldstone vector boson, the accompanying vector PGBs related to the SO(D,3D) breaking also come into play in the final arrangement of the entire Goldstone vector field multiplet. Remarkably, in contrast to the familiar scalar PGB case[13], the vector PGBs remain strictly massless, being protected by the non-Abelian gauge invariance of the Yang-Mills theory involved. Then in Sec.4 we show by some examples of the lowest order SLIV processes that, while the resulting Goldstonic non-Abelian theory contains a rich variety of Lorentz and CPT violating couplings, it proves to be physically indistinguishable from a conventional Yang-Mills theory. Actually, one of the goals of the present work is to explicitly demonstrate that a conventional Yang-Mills theory (as well as QED) is in fact a spontaneously broken theory. The Lorentz violation, due to the quadratic field constraint of the type (1), renders this theory highly nonlinear in the Goldstone vector modes, while physically equivalent to the usual one. So, just as in the pure QED case, the SLIV only means a noncovariant gauge choice in the otherwise gauge invariant and Lorentz invariant Yang-Mills theory. However, even a tiny breaking of the starting gauge invariance at very small distances influenced by gravity would render the SLIV physically significant. For an SLIV scale comparable with the Planck scale, the spontaneous Lorentz violation could become directly observable at low energies. We summarize the results obtained in the final Sec.5.

¹ Actually, the simplest possible way to obtain the above supplementary condition (1) could be the inclusion of the "standard" quartic vector field potential $V(A) = -\frac{m_A^2}{2}A_\mu^2 + \frac{\lambda_A}{4}(A_\mu^2)^2$ into the QED type Lagrangian, as can be motivated to some extent[14] from superstring theory. This potential inevitably causes the spontaneous violation of Lorentz symmetry in a standard way, much as an internal symmetry violation is caused in a linear σ model for pions[13]. As a result, one has a massive Higgs mode (with mass $\sqrt{2}\,m_A$) together with a massless Goldstone mode associated with the photon. Furthermore, just as in the pion model, one can go from the linear model for the SLIV to the non-linear one by taking the limit $\lambda_A \to \infty$, $m_A^2 \to \infty$ (while keeping the ratio $m_A^2/\lambda_A$ finite). This immediately leads to the constraint (1) for the vector potential $A_\mu$ with $n_\mu^2 M^2 = m_A^2/\lambda_A$, as follows from its equation of motion. Another motivation for the nonlinear vector field constraint (1) might be an attempt to avoid the infinite self-energy of the electron in classical electrodynamics, as was originally indicated by Dirac[15] and extended later to various vector field theory cases[16].
2 Goldstonic quantum electrodynamics

The simplest SLIV model is given by a conventional QED Lagrangian for the charged fermion field $\psi$

$L(A,\psi) = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \bar\psi(i\gamma\cdot\partial - m)\psi - eA_\mu\bar\psi\gamma^\mu\psi$ (2)

where the nonlinear vector field constraint (1) is imposed[12]. For the resulting Lorentz violation, one can rewrite the Lagrangian $L(A,\psi)$ in terms of the standard parametrization for the vector potential $A_\mu$

$A_\mu = a_\mu + \frac{n_\mu}{n^2}(n\cdot A)$ $(n^2 \equiv n_\mu^2)$ (3)

where $a_\mu$ is the pure Goldstonic mode

$n\cdot a = 0$ (4)

while the effective Higgs mode (or the $A_\mu$ component in the vacuum direction) is given according to the above nonlinear constraint (1) by

$n\cdot A = (M^2 - n^2a_\nu^2)^{1/2} = M - \frac{n^2a_\nu^2}{2M} + O(1/M^2)$ (5)

where, for definiteness, the positive sign for the above square root was taken when expanding it in powers of $a_\nu^2/M^2$. Putting the parametrization (3) with the SLIV constraint (1, 5) into our basic gauge invariant Lagrangian (2) one comes to the truly Goldstonic model for QED. This model might look unacceptable due to the inappropriately large Lorentz violating fermion bilinear $eM\bar\psi(\gamma\cdot n)\psi$ stemming from the vector-fermion current interaction $eA_\mu\bar\psi\gamma^\mu\psi$ in the Lagrangian (2) when the expansion (5) is taken. However, thanks to the local invariance of the Lagrangian, this term can be gauged away by a suitable redefinition of the fermion field

$\psi \to e^{ieM(n\cdot x)}\psi$ (6)

after which the above fermion bilinear is exactly cancelled by an analogous term stemming from the fermion kinetic term. So, one eventually comes to the essentially nonlinear SLIV Lagrangian for the Goldstonic $a_\mu$ field of the type (taken in the first approximation in $a_\nu^2/M^2$)

$L(a,\psi) = -\frac{1}{4}f_{\mu\nu}f^{\mu\nu} - \frac{1}{2}\delta(n\cdot a)^2 - \frac{1}{4}f_{\mu\nu}h^{\mu\nu}\frac{n^2a_\rho^2}{M} + \bar\psi(i\gamma\cdot\partial - m)\psi - ea_\mu\bar\psi\gamma^\mu\psi + \frac{en^2a_\rho^2}{2M}\bar\psi(\gamma\cdot n)\psi$ (7)

We denoted its strength tensor by $f_{\mu\nu} = \partial_\mu a_\nu - \partial_\nu a_\mu$, while $h_{\mu\nu} = n_\mu\partial_\nu - n_\nu\partial_\mu$ is a new SLIV oriented differential tensor.
This tensor $h_{\mu\nu}$ acts on the infinite series in $a_\rho^2$ coming from the expansion of the effective Higgs mode (5), from which only the first order term $-n^2a_\nu^2/2M$ was retained throughout the Lagrangian $L(a,\psi)$. Also, we explicitly included the orthogonality condition $n\cdot a = 0$ into the Lagrangian through the term which can be treated as the gauge fixing term (taking the limit $\delta \to \infty$) and retained the former notation for the fermion $\psi$.

The Lagrangian (7) completes the Goldstonic QED construction for the charged fermion field $\psi$. The model, as one can see, contains the massless Goldstone modes given by the three broken generators of the Lorentz group, while keeping the massive Higgs mode frozen. These modes, lumped together, constitute a single Goldstone vector boson associated with the photon². In the limit $M \to \infty$ the model is indistinguishable from conventional QED taken in the general axial (temporal or pure axial) gauge. So, for this part of the Lagrangian $L(a,\psi)$ given by the zero-order terms in $1/M$, the spontaneous Lorentz violation only means a noncovariant gauge choice in an otherwise gauge invariant (and Lorentz invariant) theory. Remarkably, furthermore, all the other (first and higher order in $1/M$) terms in $L(a,\psi)$ (7), though being by themselves Lorentz and CPT violating, do not lead to physical SLIV effects, which turn out to be strictly cancelled in all the physical processes involved. So, the nonlinear constraint (1) imposed on the standard QED Lagrangian (2) appears, in fact, as a possible gauge choice, while the S-matrix remains unaltered under such a gauge convention. This conclusion was first reached at tree level[12] and recently extended to the one-loop approximation[19]. All the one-loop contributions to the photon-photon, photon-fermion and fermion-fermion interactions violating physical Lorentz invariance were shown to be exactly cancelled as well.
This means that the vector field constraint $A_\mu^2 = n_\mu^2 M^2$, which has been treated as the nonlinear gauge choice at tree (classical) level, remains just a pure gauge condition when quantum effects are also taken into account. Remarkably, this conclusion appears to hold also for a general Abelian theory[20], in particular when the internal U(1) charge symmetry is spontaneously broken hand in hand with the Lorentz one. As a result, the massless photon first generated by the Lorentz violation then becomes massive due to the standard Higgs mechanism, while the SLIV condition in itself remains a gauge choice³.

² Strictly speaking, one can no longer use the standard definition of the photon as a state being the spin-1 representation of the (now spontaneously broken) Poincaré group. However, due to the gauge symmetry of the starting QED Lagrangian (2), the separate SLIV Goldstone modes appear combined in such a way that a standard photon (taken in an axial gauge (4)) emerges.

³ Note in this connection that the possibility of an explicit construction of the gauge function corresponding to the nonlinear gauge constraint (1) was discussed[12]; it would eliminate the need for all the kinds of checks of gauge invariance mentioned above. Remarkably, the equation for this gauge function appears to be mathematically equivalent to the classical Hamilton-Jacobi equation of motion for a charged particle. Thus, this gauge function should in principle exist because there is a solution to the classical problem. However, this formal analogy only works for the time-like SLIV ($n_\mu^2 = +1$) in pure QED, leaving aside a general Abelian theory in which the gauge invariance can be spontaneously broken. Apart from that, it does not generally extend to the non-Abelian case (see next Section).

3 Goldstonic Yang-Mills theory

In this section, we extend our discussion to the non-Abelian internal symmetry case given by a general group G with generators $t^i$ ($[t^i,t^j] = ic^{ijk}t^k$ and $\mathrm{Tr}(t^it^j) = \delta^{ij}$, where $c^{ijk}$ are structure constants and $i,j,k = 0,1,\dots,D-1$). The corresponding vector fields, which transform according to its adjoint representation, are given in the proper matrix form $A_\mu = A_\mu^i t^i$, while the matter fields (fermions, for definiteness) are presented in the fundamental representation column $\psi^r$ ($r = 0,1,\dots,d-1$) of G. By analogy with the above Goldstonic QED case we take for them a conventional Yang-Mills type Lagrangian

$L(A,\psi) = -\frac{1}{4}\mathrm{Tr}(F_{\mu\nu}F^{\mu\nu}) + \bar\psi(i\gamma\cdot\partial - m)\psi + g\bar\psi A_\mu\gamma^\mu\psi$ (8)

(where $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu - ig[A_\mu,A_\nu]$ and $g$ stands for the universal coupling constant in the theory) with the nonlinear SLIV constraint

$\mathrm{Tr}(A_\mu A^\mu) = n_\mu^2 M^2$, $n_\mu^2 = \pm 1$ (9)

imposed⁴. One can easily see that, although we propose only the SO(1,3) × G invariance in the theory, the SLIV constraint (9) possesses, in fact, the much higher accidental symmetry SO(D,3D) determined by the dimensionality D of the adjoint representation of G to which the vector fields $A_\mu^i$ belong. This symmetry is indeed spontaneously broken at a scale $M$

$\langle A_\mu^i(x)\rangle = n_\mu^i M$ (10)

with the vacuum direction given now by the 'unit' rectangular matrix $n_\mu^i$ which describes both of the generalized SLIV cases at once, time-like (SO(D,3D) → SO(D−1,3D)) or space-like (SO(D,3D) → SO(D,3D−1)), respectively, depending on the sign of $n_\mu^2 \equiv n_\mu^i n^{\mu,i} = \pm 1$. This matrix has only one non-zero element for both cases, determined by the proper SO(D,3D) rotation. They are, in particular, $n_0^0$ or $n_3^0$, provided that the vacuum expectation value (10) is developed along the $i = 0$ direction in the internal space and along the $\mu = 0$ or $\mu = 3$ direction, respectively, in Minkowski space-time. In response to each of these two breakings, side by side with one true vector Goldstone boson and the D − 1 scalar Goldstone bosons corresponding to the spontaneous violation of the actual SO(1,3) × G symmetry of the total Lagrangian L, the D − 1 vector pseudo-Goldstone bosons related to the breaking of the accidental SO(D,3D) symmetry of the SLIV constraint (9) are also produced. Remarkably, in contrast to the familiar scalar PGB case[13], the vector PGBs remain strictly massless, being protected by the non-Abelian gauge invariance of the starting Lagrangian (8). Together with the aforementioned true vector Goldstone boson they complete the entire Goldstonic vector field multiplet of the internal symmetry group G.

⁴ As in the Abelian case, the existence of such a constraint could be related to some nonlinear σ type SLIV model proposed for the vector field multiplet $A_\mu^i$ in the Yang-Mills theory (8). Note in this connection that, due to its generic antisymmetry, the familiar quadrilinear terms $g^2\mathrm{Tr}([A_\mu,A_\nu])^2$ in the Lagrangian (8) do not contribute to the SLIV, since they identically vanish for any single-valued vacuum configuration.

As in the Abelian case, upon an explicit use of the corresponding SLIV constraint (9), being so far the only supplementary condition for the vector field multiplet $A_\mu^i$, one comes to the pure Goldstone field modes $a_\mu^i$ identified in a similar way

$A_\mu^i = a_\mu^i + \frac{n_\mu^i}{n^2}(n\cdot A)$, $n\cdot a \equiv n_\mu^i a^{\mu,i} = 0$ $(n^2 \equiv n_\mu^2)$. (11)

At the same time, an effective Higgs mode (i.e., the $A_\mu^i$ component in the vacuum direction $n_\mu^i$) is given by the product $n\cdot A \equiv n_\mu^i A^{\mu,i}$ determined by the SLIV constraint

$n\cdot A = \left[M^2 - n^2(a_\nu^i)^2\right]^{1/2} = M - \frac{n^2(a_\nu^i)^2}{2M} + O(1/M^2)$ (12)

where, as earlier in the Abelian case, we took the positive sign for the square root when expanding it in powers of $(a_\nu^i)^2/M^2$.
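For orientation, the step from the constraint (9) to the effective Higgs mode (12) can be spelled out. The following is a sketch that uses only the normalization $\mathrm{Tr}(t^it^j) = \delta^{ij}$, the orthogonality $n\cdot a = 0$ and $n^4 = 1$ from the text; the explicit $1/M^3$ term is added here for illustration and is not written out in the original.

```latex
\begin{align*}
\mathrm{Tr}(A_\mu A^\mu) &= A^i_\mu A^{\mu,i}
  = \Big(a^i_\mu + \frac{n^i_\mu}{n^2}(n\cdot A)\Big)
    \Big(a^{\mu,i} + \frac{n^{\mu,i}}{n^2}(n\cdot A)\Big) \\
 &= (a^i_\nu)^2 + \frac{2}{n^2}(n\cdot a)(n\cdot A) + \frac{(n\cdot A)^2}{n^2}
  = (a^i_\nu)^2 + \frac{(n\cdot A)^2}{n^2}
  && (n\cdot a = 0), \\
(n\cdot A)^2 &= n^2\big[n^2 M^2 - (a^i_\nu)^2\big] = M^2 - n^2 (a^i_\nu)^2
  && (n^4 = 1), \\
n\cdot A &= M\sqrt{1 - \frac{n^2 (a^i_\nu)^2}{M^2}}
  = M - \frac{n^2 (a^i_\nu)^2}{2M} - \frac{\big[(a^i_\nu)^2\big]^2}{8M^3} + \dots
\end{align*}
```

The Abelian expansion (5) follows from the same steps with the internal index dropped.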
Note that the general Goldstonic modes $a_\mu^i$, apart from pure vector fields, contain the D − 1 scalar ones, $a_0^{i'}$ and $a_3^{i'}$ ($i' = 1 \dots D-1$), for the time-like ($n_\mu^i = n_0^0 g_{\mu 0}\delta^{i0}$) and space-like ($n_\mu^i = n_3^0 g_{\mu 3}\delta^{i0}$) SLIV, respectively. They can be eliminated from the theory if one puts the proper supplementary conditions on the $a_\mu^i$ fields, which were so far constraint free. Using their overall orthogonality (11) to the physical vacuum direction $n_\mu^i$, one can formulate these supplementary conditions in terms of a general axial gauge for the entire $a_\mu^i$ multiplet

$n\cdot a^i \equiv n_\mu a^{\mu,i} = 0$, $i = 0 \dots D-1$ (13)

where $n_\mu$ is the unit Lorentz vector introduced in the Abelian case, which is now oriented in Minkowski space-time so as to be parallel to the vacuum matrix $n_\mu^i$. For such a choice the simple equation holds

$n_\mu^i = s^i n_\mu$ $\left(s^i \equiv \frac{n\cdot n^i}{n^2}\right)$ (14)

which shows that the rectangular vacuum matrix $n_\mu^i$ has the factorized "two-vector" form. As a result, apart from the Higgs mode excluded earlier by the orthogonality condition (11), all the scalar fields are also eliminated, and only pure vector fields, $a_{\mu'}^i$ ($\mu' = 1,2,3$) or $a_{\mu''}^i$ ($\mu'' = 0,1,2$) for time-like or space-like SLIV, respectively, are left in the theory.

We now show that the Goldstone vector fields $a_\mu^i$ so constrained (with the supplementary conditions (13) taken) appear truly massless when the starting non-Abelian Lagrangian (8) is rewritten in the form determined by the SLIV. Actually, putting the parametrization (11) with the SLIV constraint (12) into the Lagrangian (8), one is led to a highly nonlinear Yang-Mills theory in terms of the pure Goldstonic gauge field modes $a_\mu^i$. However, as in the above Abelian case, one should first gauge away (using the local invariance of the Lagrangian) the enormously large, while false, Lorentz violating terms appearing in the theory in the form of fermion and vector field bilinears.
As one can readily see, they stem from the couplings $g\bar\psi A_\mu\gamma^\mu\psi$ and $-\frac{1}{4}g^2\mathrm{Tr}([A_\mu,A_\nu])^2$, respectively, when the effective Higgs mode expansion (12) is taken in the Lagrangian (8). Making the appropriate redefinitions of the fermion ($\psi$) and vector ($a_\mu \equiv a_\mu^i t^i$) field multiplets

$\psi \to U(\omega)\psi$, $a_\mu \to U(\omega)a_\mu U(\omega)^\dagger$, $U(\omega) = e^{igM(n^i\cdot x)t^i}$ (15)

and using the evident equalities for the linear (in coordinate) transformations $U(\omega)$ with the single-valued vacuum matrix $n_\mu^i$ ($n_0^0$ or $n_3^0$ for the particular SLIV cases)

$\partial_\mu U(\omega) = igM n_\mu^i t^i U(\omega) = igM U(\omega) n_\mu^i t^i$ (16)

one can confirm that the abovementioned Lorentz violating terms are exactly cancelled by the analogous bilinears stemming from their kinetic terms. So, the final Lagrangian for the Goldstonic Yang-Mills theory takes the form (in the first approximation in $(a_\nu^i)^2/M^2$)

$L(a,\psi) = -\frac{1}{4}\mathrm{Tr}(f_{\mu\nu}f^{\mu\nu}) - \frac{1}{2}\delta(n\cdot a^i)^2 + \frac{1}{4}\mathrm{Tr}(f_{\mu\nu}h^{\mu\nu})\frac{n^2(a_\nu^i)^2}{M} + \bar\psi(i\gamma\cdot\partial - m)\psi + g\bar\psi a_\mu\gamma^\mu\psi - \frac{gn^2(a_\nu^i)^2}{2M}\bar\psi(\gamma\cdot n^k)t^k\psi$ (17)

where the tensor $f_{\mu\nu}$ is, as usual, $f_{\mu\nu} = \partial_\mu a_\nu - \partial_\nu a_\mu - ig[a_\mu,a_\nu]$, while $h_{\mu\nu}$ is a new SLIV oriented tensor of the type

$h_{\mu\nu} = n_\mu\partial_\nu - n_\nu\partial_\mu + ig([n_\mu,a_\nu] - [n_\nu,a_\mu])$, $n_\mu \equiv n_\mu^k t^k$ (18)

This tensor $h_{\mu\nu}$ acts on the infinite series in $(a_\nu^i)^2$ coming from the expansion of the effective Higgs mode (12), from which only the first order term $-n^2(a_\nu^i)^2/2M$ was retained throughout the Lagrangian $L(a,\psi)$. We also retained the former notations for the fermion and vector field multiplets after the transformations (15), and explicitly included the (axial) gauge fixing term into the Lagrangian according to the supplementary conditions (13).

The theory derived gives a proper generalization of the nonlinear QED model[12] for the non-Abelian case. It contains the massless vector boson multiplet $a_\mu^i$ (consisting of one Goldstone and D − 1 pseudo-Goldstone vector states) which gauges the starting internal symmetry G. In the limit $M \to \infty$ it is indistinguishable from a conventional Yang-Mills theory taken in the general axial gauge.
So, for this part of the Lagrangian $L(a,\psi)$ given by the zero-order terms in $1/M$, the spontaneous Lorentz violation only means a noncovariant gauge choice in the otherwise gauge invariant (and Lorentz invariant) theory. However, one may expect that, just as in the nonlinear QED model, all the first and higher order terms in $1/M$ in the Lagrangian (17), though being by themselves Lorentz and CPT violating, do not lead to physical SLIV effects, due to the mutual cancellation of their contributions to all the physical processes involved.

4 The lowest order SLIV processes

Let us now show that simple tree level calculations related to the Lagrangian $L(a,\psi)$ confirm in essence this proposition. As an illustration, we consider SLIV processes in the lowest order in $g$ and $1/M$, the fundamental parameters of the Lagrangian (17). They are, as one can readily see, the vector-fermion and vector-vector elastic scattering going in the order $g/M$, which we turn to once the Feynman rules in the Goldstonic Yang-Mills theory are established.
4.1 Feynman rules

The corresponding Feynman rules, apart from the ordinary Yang-Mills theory rules for

(i) the vector-fermion vertex

$-ig\gamma_\mu t^i$ (19)

(ii) the vector field propagator (taken in a general axial gauge $n^\mu a_\mu^i = 0$)

$D^{ij}_{\mu\nu}(k) = -\frac{i\delta^{ij}}{k^2}\left(g_{\mu\nu} - \frac{n_\mu k_\nu + k_\mu n_\nu}{n\cdot k} + \frac{n^2k_\mu k_\nu}{(n\cdot k)^2}\right)$ (20)

which automatically satisfies the orthogonality condition $n^\mu D^{ij}_{\mu\nu}(k) = 0$ and on-shell transversality $k^\mu D^{ij}_{\mu\nu}(k) = 0$ ($k^2 = 0$); the latter means that free vector fields with polarization vector $\epsilon_\mu^i(k, k^2 = 0)$ are always transverse, $k^\mu\epsilon_\mu^i(k) = 0$;

(iii) the 3-vector vertex (with vector field 4-momenta $k_1$, $k_2$ and $k_3$; all 4-momenta in vertices are taken ingoing throughout)

$gc^{ijk}\left[(k_1 - k_2)_\gamma g_{\alpha\beta} + (k_2 - k_3)_\alpha g_{\beta\gamma} + (k_3 - k_1)_\beta g_{\alpha\gamma}\right]$ (21)

include the new ones, violating Lorentz and CPT invariance, for

(iv) the contact 2-vector-fermion vertex, with the structure

$(\gamma\cdot n^k)\tau^k g_{\mu\nu}\delta^{ij}$ (22)

(v) another 3-vector vertex, with the structure

$(k_1\cdot n^i)k_{1,\alpha}g_{\beta\gamma}\delta^{jk} + (k_2\cdot n^j)k_{2,\beta}g_{\alpha\gamma}\delta^{ki} + (k_3\cdot n^k)k_{3,\gamma}g_{\alpha\beta}\delta^{ij}$ (23)

where the second index in the vector field 4-momenta $k_1$, $k_2$ and $k_3$ denotes their Lorentz components;

(vi) the extra 4-vector vertex (with the vector field 4-momenta $k_{1,2,3,4}$ and their proper differences $k_{12} \equiv k_1 - k_2$ etc.), with the structure

$\left[c^{ijp}\delta^{kl}g_{\alpha\beta}g_{\gamma\delta}(n^p\cdot k_{12}) + c^{klp}\delta^{ij}g_{\alpha\beta}g_{\gamma\delta}(n^p\cdot k_{34}) + c^{ikp}\delta^{jl}g_{\alpha\gamma}g_{\beta\delta}(n^p\cdot k_{13}) + c^{jlp}\delta^{ik}g_{\alpha\gamma}g_{\beta\delta}(n^p\cdot k_{24}) + c^{ilp}\delta^{jk}g_{\alpha\delta}g_{\beta\gamma}(n^p\cdot k_{14}) + c^{jkp}\delta^{il}g_{\alpha\delta}g_{\beta\gamma}(n^p\cdot k_{23})\right]$ (24)

where only the terms which cannot lead to contractions of the rectangular vacuum matrix $n_\mu^i$ with vector field polarization vectors $\epsilon_\mu^i(k)$ are presented. These contractions in fact vanish due to the gauge taken (13), $n^p\cdot\epsilon^i = s^p(n\cdot\epsilon^i) = 0$ (with the factorized two-vector form (14) for the matrix $n_\mu^i$ used).

Just the rules (i-vi) are needed to calculate the lowest order amplitudes of the processes mentioned above.
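The two properties quoted for the propagator in rule (ii) can be checked numerically. The sketch below assumes the standard general-axial-gauge tensor structure $g_{\mu\nu} - (n_\mu k_\nu + k_\mu n_\nu)/(n\cdot k) + n^2 k_\mu k_\nu/(n\cdot k)^2$ and the metric signature (+,−,−,−); the overall factor of the propagator is left out since it does not affect either property, and the function names and sample momenta are illustrative choices.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # metric g_{mu nu}, signature (+,-,-,-)


def mdot(a, b):
    """Minkowski product of two 4-vectors given by contravariant components."""
    return a @ G @ b


def axial_tensor(n, k):
    """Lower-index tensor structure of the axial-gauge propagator:
    g_{mu nu} - (n_mu k_nu + k_mu n_nu)/(n.k) + n^2 k_mu k_nu/(n.k)^2.
    The overall factor (including 1/k^2) is omitted."""
    n_lo, k_lo = G @ n, G @ k
    nk = mdot(n, k)
    return (G
            - (np.outer(n_lo, k_lo) + np.outer(k_lo, n_lo)) / nk
            + mdot(n, n) * np.outer(k_lo, k_lo) / nk**2)


k_null = np.array([3.0, 1.0, 2.0, 2.0])   # on shell: k^2 = 9 - 1 - 4 - 4 = 0
k_off = np.array([3.0, 1.0, 2.0, 1.0])    # off shell: k^2 = 3
n_time = np.array([1.0, 0.0, 0.0, 0.0])   # time-like direction, n^2 = +1
n_space = np.array([0.0, 0.0, 0.0, 1.0])  # space-like direction, n^2 = -1

for n in (n_time, n_space):
    P = axial_tensor(n, k_null)
    # n^mu P_{mu nu} vanishes for any k; k^mu P_{mu nu} vanishes only for k^2 = 0
    print(np.allclose(n @ P, 0.0), np.allclose(k_null @ P, 0.0))
```

Replacing `k_null` by `k_off` leaves the contraction with n equal to zero but not the contraction with k, which illustrates that transversality holds only on shell.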
4.2 Vector boson scattering on fermion

This process is directly related to two SLIV diagrams, one of which is given by the contact $a^{2}$-fermion vertex (22), while the other corresponds to the pole diagram with the longitudinal a-boson exchange between the Lorentz violating $a^{3}$ vertex (23) and the ordinary a-boson-fermion one (19). Since ingoing and outgoing a-bosons appear transverse ($k_1\cdot\epsilon^{i}(k_1) = 0$, $k_2\cdot\epsilon^{j}(k_2) = 0$), only the third term in this $a^{3}$ coupling (23) contributes to the pole diagram, so that one comes to a simple matrix element $i\mathcal{M}$ for both diagrams

$$i\mathcal{M} = i\,\bar{u}(p_2)\,\tau^{l}\left[(\gamma\cdot n^{l}) + i(k\cdot n^{l})\,\gamma^{\mu}k^{\nu}D_{\mu\nu}(k)\right]u(p_1)\,[\epsilon(k_1)\cdot\epsilon(k_2)] \qquad (25)$$

where the spinors $u(p_{1,2})$ and polarization vectors $\epsilon^{\mu}(k_1)$ and $\epsilon^{\mu}(k_2)$ stand for the ingoing and outgoing fermions and a-bosons, respectively, while $k$ is the 4-momentum transfer, $k = p_2 - p_1 = k_1 - k_2$. Upon further simplifications in the square bracket, related to the explicit form of the a-boson propagator $D_{\mu\nu}(k)$ (20) and the matrix $n^{i}_{\mu}$ (14), and using the fermion current conservation $\bar{u}(p_2)(\hat{p}_2 - \hat{p}_1)u(p_1) = 0$, one is finally led to the total cancellation of the Lorentz violating contributions to the a-boson-fermion scattering in the g/M approximation.

Note, however, that such a result may be in some sense expected, since from the SLIV point of view the lowest order a-boson-fermion scattering discussed here is hardly distinct from the photon-fermion scattering considered in the nonlinear QED case [12]. Actually, the fermion current conservation, which happens to be crucial for the above cancellation, works in both cases, whereas the couplings peculiar to the Yang-Mills theory have not yet come into play. In this connection the next example seems to be more instructive.

4.3 Vector-vector scattering

The matrix element for this process in the lowest order g/M is given by the contact SLIV $a^{4}$ vertex (24) and the pole diagrams with the longitudinal a-boson exchange between the ordinary $a^{3}$ vertex (21) and the Lorentz violating $a^{3}$ one (23), and vice versa.
There are six pole diagrams in total describing the elastic a-a scattering in the s- and t-channels, respectively, including also those with an interchange of identical a-bosons. Remarkably, the contribution of each of them is exactly cancelled by one of the six terms appearing in the contact vertex (24). Actually, writing down the matrix element for one of the pole diagrams, with ingoing a-bosons (with momenta $k_1$ and $k_2$) interacting through the vertex (21) and outgoing a-bosons (with momenta $k_3$ and $k_4$) interacting through the vertex (23), one has

$$c^{ijp}\delta^{kl}\left[(k_1 - k_2)_{\mu}\,g_{\alpha\beta} + (k_2 - k)_{\alpha}\,g_{\beta\mu} + (k - k_1)_{\beta}\,g_{\alpha\mu}\right]\cdot D^{pq}_{\mu\nu}(k)\,g_{\gamma\delta}\,k_{\nu}\,(n^{q}\cdot k)\,\left[\epsilon^{i,\alpha}(k_1)\,\epsilon^{j,\beta}(k_2)\,\epsilon^{k,\gamma}(k_3)\,\epsilon^{l,\delta}(k_4)\right] \qquad (26)$$

where the polarization vectors $\epsilon^{i,\alpha}(k_1)$, $\epsilon^{j,\beta}(k_2)$, $\epsilon^{k,\gamma}(k_3)$ and $\epsilon^{l,\delta}(k_4)$ belong, respectively, to ingoing and outgoing a-bosons, while $k = -(k_1 + k_2) = k_3 + k_4$ according to the momentum running in the diagrams taken above. Again, as in the previous case of vector-fermion scattering, due to the fact that outgoing a-bosons appear transverse ($k_3\cdot\epsilon^{k}(k_3) = 0$ and $k_4\cdot\epsilon^{l}(k_4) = 0$), only the third term in the Lorentz violating $a^{3}$ coupling (23) contributes to this pole diagram. Upon evident simplifications related to the a-boson propagator $D_{\mu\nu}(k)$ (20) and the matrix $n^{i}_{\mu}$ (14), one comes to an expression which is exactly cancelled by the first term in the contact SLIV vertex (24) when it is properly contracted with a-boson polarization vectors. Likewise, the other terms in this vertex provide the further one-to-one cancellation with the remaining pole matrix elements $i\mathcal{M}^{(2-6)}$. So, again, the Lorentz violating contribution to the vector-vector scattering is absent in Goldstonic Yang-Mills theory in the lowest g/M approximation.

4.4 Other processes

Other tree level Lorentz violating processes, related to a-bosons and fermions, appear in higher orders in the basic SLIV parameter 1/M. They come from the subsequent expansion of the effective Higgs mode (12) in the Lagrangian (17).
Again, their amplitudes are essentially determined by an interrelation between the longitudinal a-boson exchange diagrams and the corresponding contact a-boson interaction diagrams, which appear to cancel each other, thus eliminating physical Lorentz violation in the theory.

Most likely, the same conclusion can be derived for SLIV loop contributions as well. Actually, as in the massless QED case considered earlier [19], the corresponding one-loop matrix elements in Goldstonic Yang-Mills theory either vanish by themselves or amount to differences between pairs of similar integrals whose integration variables are shifted relative to each other by some constants (in general arbitrary functions of the external four-momenta of the particles involved), which in the framework of dimensional regularization leads to their total cancellation.

So, the Goldstonic vector field theory (17) for non-Abelian charge-carrying matter is likely to be physically indistinguishable from a conventional Yang-Mills theory.

5 Conclusion

The spontaneous Lorentz violation in 4-dimensional flat Minkowskian space-time was shown to generate vector Goldstone bosons both in Abelian and non-Abelian theories with the corresponding nonlinear vector field constraint (1) or (9) imposed. In the Abelian case such a massless vector boson is naturally associated with the photon. In the non-Abelian case, although the pure Lorentz violation still generates only one genuine Goldstone vector boson, the accompanying vector PGBs, related to a violation of the larger accidental symmetry SO(D, 3D) of the SLIV constraint (9) in itself, also come into play in the final arrangement of the entire Goldstone vector field multiplet of the internal symmetry group G. Remarkably, they remain strictly massless, being protected by the gauge invariance of the Yang-Mills theory involved.
These theories, both Abelian and non-Abelian, while being essentially nonlinear in the Goldstone vector modes, are physically indistinguishable from conventional QED and Yang-Mills theory. One can actually see that it is the gauge invariance which not only keeps these theories free from the unreasonably large Lorentz violation stemming from the fermion and vector field bilinears (see Sections 2 and 3), but also renders all the other physical SLIV effects (including those which are suppressed by the Lorentz violation scale M) non-observable (Section 4). As a result, Abelian and non-Abelian SLIV theories appear, respectively, as standard QED and Yang-Mills theory taken in the nonlinear gauge (to which the vector field constraints (1) and (9) are virtually reduced), while the S-matrix remains unaltered under such a gauge convention.

So, while at present the Goldstonic nature of gauge fields, both Abelian and non-Abelian, seems to be highly plausible, the most fundamental question of physical Lorentz violation in itself, which alone could uniquely point toward such a possibility, is still open. Note that here we are not dealing with direct (and quite arbitrary in essence) Lorentz non-invariant extensions of QED or the Standard Model, which have been intensively discussed on their own in recent years [6-8]. Rather, the case in point is a construction of genuine SLIV models which would generate gauge fields as the proper vector Goldstone bosons, on the one hand, and could lead to observed Lorentz violating effects, on the other. In this connection, a somewhat natural framework for physical Lorentz violation to occur would be a model where the internal gauge invariance is slightly broken at very small distances through some high-order operators stemming from the gravity-influenced area. Such physical SLIV effects would be seen in terms of powers of the ratio $M/M_{Pl}$ (where $M_{Pl}$ is the Planck mass).
So, for an SLIV scale comparable with the Planck scale they would become directly observable. Remarkably enough, if one has such internal gauge symmetry breaking in an ordinary Lorentz invariant theory, this breaking appears vanishingly small in the laboratory, being properly suppressed by the Planck scale. However, the spontaneous Lorentz violation would render it physically significant: the higher the Lorentz scale, the greater the SLIV effects observed. If true, it would be of particular interest to have a better understanding of the internal gauge symmetry breaking mechanism that brings out the spontaneous Lorentz violation at low energies. We return to this basic question elsewhere.

Acknowledgments

We would like to thank Colin Froggatt, Rabi Mohapatra and Holger Nielsen for useful discussions and comments. One of us (J.L.C.) is grateful for the warm hospitality shown to him during a visit to the Center for Particle and String Theory at the University of Maryland, where part of this work was carried out.

References

[1] W. Heisenberg, Rev. Mod. Phys. 29 (1957) 269; J.D. Bjorken, Ann. Phys. (N.Y.) 24 (1963) 174; I. Bialynicki-Birula, Phys. Rev. 130 (1963) 465; G. Guralnik, Phys. Rev. 136 (1964) B1404; T. Eguchi, Phys. Rev. D 14 (1976) 2755; H. Terazava, Y. Chikashige and K. Akama, Phys. Rev. D 15 (1977) 480.
[2] C.D. Froggatt and H.B. Nielsen, Origin of Symmetries (World Scientific, Singapore, 1991).
[3] J.L. Chkareuli, C.D. Froggatt and H.B. Nielsen, Phys. Rev. Lett. 87 (2001) 091601; J.L. Chkareuli, C.D. Froggatt and H.B. Nielsen, Nucl. Phys. B 609 (2001) 46; J.D. Bjorken, hep-th/0111196; Per Kraus and E.T. Tomboulis, Phys. Rev. D 66 (2002) 045015; A. Jenkins, Phys. Rev. D 69 (2004) 105007; J.L. Chkareuli, C.D. Froggatt, R.N. Mohapatra and H.B. Nielsen, hep-th/0412225; J.L. Chkareuli, C.D. Froggatt and H.B. Nielsen, hep-th/0610186.
[4] D. Colladay and V.A. Kostelecky, Phys. Rev. D 58 (1998) 116002; V.A. Kostelecky, Phys. Rev. D 69 (2004) 105009; R. Bluhm and V.A. Kostelecky, Phys.
Rev. D 71 (2005) 065008; CPT and Lorentz Symmetry, ed. A. Kostelecky (World Scientific, Singapore, 1999, 2002, 2005).
[5] S.M. Carroll, G.B. Field and R. Jackiw, Phys. Rev. D 41 (1990) 1231; R. Jackiw and V.A. Kostelecky, Phys. Rev. Lett. 82 (1999) 3572.
[6] S. Coleman and S.L. Glashow, Phys. Rev. D 59 (1999) 116008.
[7] J.W. Moffat, Int. J. Mod. Phys. D 2 (1993) 351; J.W. Moffat, Int. J. Mod. Phys. D 12 (2003) 1279.
[8] O. Bertolami and D.F. Mota, Phys. Lett. B 455 (1999) 96.
[9] T. Jacobson, S. Liberati and D. Mattingly, Ann. Phys. (N.Y.) 321 (2006) 150.
[10] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122 (1961) 345.
[11] M. Suzuki, Phys. Rev. D 37 (1988) 210.
[12] Y. Nambu, Progr. Theor. Phys. Suppl. Extra 190 (1968).
[13] S. Weinberg, The Quantum Theory of Fields, v.2, Cambridge University Press, 2000.
[14] V.A. Kostelecky and S. Samuel, Phys. Rev. D 39 (1989) 683; V.A. Kostelecky and R. Potting, Nucl. Phys. B 359 (1991) 545.
[15] P.A.M. Dirac, Proc. Roy. Soc. 209A (1951) 292; P.A.M. Dirac, Proc. Roy. Soc. 212A (1952) 330.
[16] R. Righi and G. Venturi, Lett. Nuovo Cim. 19 (1977) 633; R. Righi, G. Venturi and V. Zamiralov, Nuovo Cim. A 47 (1978) 518.
[17] Y.M. Cho and P.G.O. Freund, Phys. Rev. D 12 (1975) 1711.
[18] E.A. Ivanov and V.I. Ogievetsky, Lett. Math. Phys. 1 (1976) 309.
[19] A.T. Azatov and J.L. Chkareuli, Phys. Rev. D 73 (2006) 065026.
[20] J.L. Chkareuli and Z.R. Kepuladze, Phys. Lett. B 644 (2007) 212; J.L. Chkareuli and Z.R. Kepuladze, Proc. of XIV Int. Seminar “Quarks-2006”, eds. S.V. Demidov et al. (Moscow, INR, 2006); hep-th/0610227.
ABSTRACT We argue that non-Abelian gauge fields can be treated as the pseudo-Goldstone vector bosons caused by spontaneous Lorentz invariance violation (SLIV). To this end, the SLIV which evolves in a general Yang-Mills type theory with the nonlinear vector field constraint $\mathrm{Tr}(\boldsymbol{A}_{\mu}\boldsymbol{A}^{\mu})=\pm M^{2}$ ($M$ is a proposed SLIV scale) imposed is considered in detail. With an internal symmetry group $G$ having $D$ generators not only the pure Lorentz symmetry SO(1,3), but the larger accidental symmetry $SO(D,3D)$ of the SLIV constraint in itself appears to be spontaneously broken as well. As a result, while the pure Lorentz violation still generates only one genuine Goldstone vector boson, the accompanying pseudo-Goldstone vector bosons related to the $SO(D,3D)$ breaking also come into play in the final arrangement of the entire Goldstone vector field multiplet. Remarkably, they remain strictly massless, being protected by gauge invariance of the Yang-Mills theory involved. We show that, although this theory contains a plethora of Lorentz and $CPT$ violating couplings, they do not lead to physical SLIV effects which turn out to be strictly cancelled in all the lowest order processes considered. However, the physical Lorentz violation could appear if the internal gauge invariance were slightly broken at very small distances influenced by gravity. For the SLIV scale comparable with the Planck one the Lorentz violation could become directly observable at low energies.
<|endoftext|><|startoftext|> Introduction The knowledge of the properties of highly compressed and heated hadronic matter is an important issue for the understanding of astrophysical processes, such as the mechanism of supernovae explosions and the physics of neutron stars [1,2]. Heavy ion collisions provide the unique opportunity to explore highly excited hadronic matter, i.e. the high density behavior of the nuclear EoS, under controlled conditions (high baryon energy densities and tempera- tures) in the laboratory [3]. Of particular recent interest is also the still poorly known density dependence of the isovector channel of the EoS. Suggested observables have been the nucleon collective flows [3,4] and the distributions of produced particles such as pions and, in particular, particles with strangeness (kaons) [5,6]. Because of the rather high energy threshold (Elab = 1.56 GeV for Nucleon-Nucleon collisions), kaon production in HICs at energies in the range 0.8− 1.8 AGeV is mainly due to secondary processes involving ∆ resonances and pions (π). On the other hand, secondary processes require high baryon density. This explains why the kaon production around threshold is intimately connected to the high density stage of the nucleus- nucleus collision. Furthermore, the relatively large mean free path of positive charged (K+) and neutral (K0) kaons inside the hadronic environment causes hadronic matter to be transparent for kaons [7]. Therefore kaon yields and generally strangeness ratios have been proposed as important signals for the investigation of the high density behavior of the nuclear EoS. This idea, as firstly suggested by Aichelin and Ko [8], has been recently applied in HIC at intermediate energies in terms of strangeness ratios, e.g. the ratio of the kaon yields in Au+Au and C+C collisions [5,9]. In these studies it was found that this ratio is very sensitive to the stiffness of the nuclear EoS. 
Indeed, comparisons with KaoS data [10] favored a soft behavior of the high density nuclear EoS, a statement which is consistent with elliptic flow data of the FOPI collaboration [11].

The idea of studying particle ratios in HICs around the kinematical threshold has recently been applied to the determination of the isovector channel of the nuclear EoS, i.e. the high density dependence of the symmetry energy $E_{sym}$. It has turned out that particle ratios, such as ($\pi^{-}/\pi^{+}$) [12] or ($K^{0}/K^{+}$) [13–15], are sensitive to the stiffness of the symmetry energy and, in particular, to the strength of the vector isovector field. However, in-medium effects on the kaon propagation have been neglected so far. Here we will test the robustness of the yield ratio against the inclusion and the variation of the corresponding kaon potentials. At the same time, in Ref. [16] the role of the in-medium modifications of NN cross sections has been studied in terms of baryon and strangeness dynamics. It was found that the pion and kaon yields are sensitively influenced by the reduced effective NN cross sections for inelastic processes. Here we will see that the kaon yield ratio appears robust even with respect to the density dependence of the in-medium inelastic NN cross sections, while the pion ratio, in contrast, seems to be more sensitive.

The collision dynamics is rather complex and involves the nuclear mean field (EoS) and binary 2-body collisions. In the presence of a nuclear medium the treatment of binary collisions represents a non-trivial problem. The NN cross sections for elastic and inelastic processes, which are the crucial physical parameters here, are experimentally accessible only in free space and not for 2-body scattering at finite baryon density. Recent microscopic studies, based on the G-matrix approach, have shown a strong decrease of the elastic NN cross section [17,18] in a hadronic medium.
These in-medium effects of the elastic NN cross section considerably influence the hadronic reaction dynamics [19]. Obviously the question arises whether similar in-medium effects of the inelastic NN cross sections may affect the reaction dynamics and, in particular, the production of particles (pions and kaons).

Furthermore, the strangeness propagation inside the nuclear medium is even more complex and involves the additional consideration of kaon mean field potentials in the dynamical description. This is an important issue when comparing with experimental kaon data [10]. In a Chiral Perturbation approach at the lowest order (ChPT potentials), the kaon (antikaon) potential has an attractive scalar and a repulsive (attractive) vector part [20]. This leads to weakly repulsive (strongly attractive) potentials for kaons (antikaons), with corresponding scalar and vector kaon-nucleon coupling constants depending on the parametrization adopted [20,21]. Similar results can be obtained in an effective meson-coupling model (OBE potentials, in the RMF spirit), where the K-meson couplings are simply related to the nucleon-meson ones, in the spirit of Ref. [22]. The latter approach has the advantage of being fully consistent with the covariant transport equations used to simulate the reaction dynamics [14,15]. We recall that the high density dependence of the kaon self energies is still a matter of current debate, e.g. see Refs. [23,7], in which the role of the kaon potential has been investigated in terms of kaon in-plane and out-of-plane flows. Moreover, for studies aimed at the determination of the symmetry energy from strangeness production one has to consider with particular care the isospin dependence of the kaon mean field potential.
The main focus of the present work is on a detailed study of the robustness of the pionic ($\pi^{-}/\pi^{+}$) and, in particular, the strangeness ratio ($K^{0}/K^{+}$) with respect to the in-medium modifications of the imaginary part of the nucleon self energy, i.e. the NN cross sections, and to the in-medium variations of the kaon self energy, i.e. the density dependence of the kaon potential. This analysis, which goes beyond our previous investigations [14,15], is also motivated by new measurements of strangeness ratios by the FOPI collaboration [24].

The paper is organized as follows: The next Section describes the theoretical treatment of the reaction dynamics within the Relativistic Boltzmann-Uehling-Uhlenbeck (RBUU) transport equation. A detailed discussion of the in-medium modifications of the inelastic NN cross sections is presented. In Section 3 we discuss the kaon mean field potentials (in both ChPT and OBE/RMF schemes) and their expected isospin dependence. Section 4 is devoted to a short introduction to the dynamical calculations. Results are then shown in Section 5, mostly for central $^{197}$Au+$^{197}$Au collisions at 1 AGeV, in terms of pion and kaon yields. The initial presentation of the absolute yields is relevant for a detailed discussion as well as for a comparison with theoretical results of other groups and with experimental data of the KaoS and FOPI collaborations. Altogether this intermediate step is important for testing the reliability of the calculations, since ratios alone cannot provide such a test. Finally we present the pion and strangeness ratios and discuss their dependence on the in-medium modifications of the NN cross sections and of the kaon potentials, including the isospin effects. In Section 6 we conclude with a summary and some general comments and perspectives.
2 Theoretical description of the collision dynamics

In this section we briefly discuss the transport equation, focusing on the treatment of two features important for kaon dynamics: (a) the collision integral by means of the cross sections; (b) the kaon mean field potential and its isospin dependence.

2.1 The RBUU equation

The theoretical description of HICs is based on the semiclassical kinetic theory of statistical mechanics, i.e. the Boltzmann equation with the Uehling-Uhlenbeck modification of the collision integral [25]. The relativistic analog of this equation is the Relativistic Boltzmann-Uehling-Uhlenbeck (RBUU) equation [26]

$$\left[k^{*\mu}\partial^{x}_{\mu} + \left(k^{*}_{\nu}F^{\mu\nu} + M^{*}\partial^{\mu}_{x}M^{*}\right)\partial^{k^{*}}_{\mu}\right] f(x,k^{*}) = \frac{1}{2(2\pi)^{9}}\int \frac{d^{3}k_{2}}{E^{*}_{\mathbf{k}_{2}}}\frac{d^{3}k_{3}}{E^{*}_{\mathbf{k}_{3}}}\frac{d^{3}k_{4}}{E^{*}_{\mathbf{k}_{4}}}\, W(kk_{2}|k_{3}k_{4})\left[f_{3}f_{4}\tilde{f}\tilde{f}_{2} - ff_{2}\tilde{f}_{3}\tilde{f}_{4}\right], \qquad (1)$$

where $f(x,k^{*})$ is the single particle distribution function. In the collision term the short-hand notations $f_{i} \equiv f(x,k^{*}_{i})$ for the particle and $\tilde{f}_{i} \equiv (1 - f(x,k^{*}_{i}))$ for the hole distributions are used, with $E^{*}_{\mathbf{k}} \equiv \sqrt{M^{*2} + \mathbf{k}^{2}}$. The collision integral explicitly exhibits the final state Pauli-blocking, while the in-medium scattering amplitude includes the Pauli-blocking of intermediate states.

Fig. 1. Elastic in-medium neutron-proton cross section $\sigma_{el}$ at various Fermi momenta $k_{F}$ as a function of the laboratory energy $E_{lab}$. The free cross section ($k_{F} = 0$) is compared to the experimental total np cross section [17].

The dynamics of the drift term, i.e. the lhs of Eq. (1), is determined by the mean field. Here the attractive scalar field $\Sigma_{s}$ enters via the effective mass $M^{*} = M - \Sigma_{s}$ and the repulsive vector field $\Sigma_{\mu}$ via the kinetic momenta $k^{*}_{\mu} = k_{\mu} - \Sigma_{\mu}$ and via the field tensor $F^{\mu\nu} = \partial^{\mu}\Sigma^{\nu} - \partial^{\nu}\Sigma^{\mu}$. The dynamical description according to Eq. (1) involves the strangeness propagation in the nuclear medium. This topic will be discussed in more detail at the end of this section.
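As a small illustration of how the mean fields enter the drift term, the effective quantities defined above can be evaluated directly. This is only a sketch: the field values are hypothetical placeholders chosen for illustration, the function names are not from the paper, and the effective energy is taken from the usual mass-shell relation $k^{*}_{\mu}k^{*\mu} = M^{*2}$ using the kinetic 3-momentum.

```python
import math


def effective_mass(mass, sigma_s):
    """Effective (Dirac) mass M* = M - Sigma_s."""
    return mass - sigma_s


def kinetic_momentum(k, sigma_v):
    """Kinetic 4-momentum k*_mu = k_mu - Sigma_mu, component by component."""
    return tuple(ki - si for ki, si in zip(k, sigma_v))


def effective_energy(m_star, k_star):
    """E* = sqrt(M*^2 + |k*|^2), using the spatial part of k* = (k0, kx, ky, kz)."""
    return math.sqrt(m_star ** 2 + sum(c * c for c in k_star[1:]))


# Hypothetical numbers in GeV, for illustration only.
M_NUCLEON = 0.939
SIGMA_S = 0.3                        # scalar field (assumed value)
SIGMA_V = (0.2, 0.0, 0.0, 0.0)       # vector field, time component only (assumed)

m_star = effective_mass(M_NUCLEON, SIGMA_S)
k_star = kinetic_momentum((1.0, 0.0, 0.0, 0.5), SIGMA_V)
e_star = effective_energy(m_star, k_star)
print(m_star, k_star, e_star)
```

With a purely time-like vector field, as assumed here, only the energy component of the momentum is shifted, so the 3-momentum entering $E^{*}$ is unchanged.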
2.2 In-medium effects on NN cross sections

The in-medium cross sections for 2-body processes (see below) enter the collision integral via the transition amplitude

$$W = (2\pi)^{4}\,\delta^{4}(k + k_{2} - k_{3} - k_{4})\,(M^{*})^{4}\,|T|^{2} \qquad (2)$$

with $T$ the in-medium scattering matrix element. In the kinetic equation (1) both physical input quantities, the mean field (EoS) and the collision integral (cross sections), should be derived from the same underlying effective two-body interaction in the medium, i.e. the in-medium T-matrix: $\Sigma \sim \Re T\rho_{B}$, $\sigma \sim \Im T$, respectively. However, in most practical applications phenomenological mean fields and cross sections have been used. In such an approach the strategy is to adjust to the known bulk properties of nuclear matter around the saturation point, and to try to constrain the models at supra-normal densities with the help of heavy ion reactions [27,28]. Medium modifications of the NN cross sections are usually not taken into account. In spite of that, for several observables the comparison with experimental data appears to work astonishingly well [27–30]. However, in particular kinematical regimes a sensitivity of dynamical observables, such as collective flows and stopping [19,31] or transverse energy transfer [32], to the elastic NN cross sections has been observed.

Microscopic Dirac-Brueckner-Hartree-Fock (DBHF) studies for nuclear matter above the Fermi energy regime show a strong density dependence of the elastic [17] and inelastic [18,33] NN cross sections. In such studies one starts from the bare NN interaction in the spirit of the One-Boson-Exchange (OBE) model by fitting the parameters to empirical nucleon-nucleus scattering, and then solves the equations of the nuclear matter many body problem in the T-matrix or ladder approximation. It is not the aim of the present work to go into further details on this topic.
An important feature of such microscopic calculations is the inclusion of the Pauli-blocking effect in the intermediate scattering states of the T-matrix elements and their in-medium modifications, i.e. the density dependence of the nucleon mass and momenta. Of particular interest here are the in-medium modifications of the inelastic NN cross sections, since they directly influence the production mechanism of resonances and thus the creation of pions and kaons according to the channels listed later (see Sect. 3). DBHF studies on inelastic NN cross sections are rare and cover limited regions of density and momentum [18]. For this reason we will first discuss in the following the in-medium dependence of the elastic NN cross sections, which will then be used as a starting basis for a detailed analysis of the density dependence of the inelastic NN cross sections.

The microscopic in-medium dependence of the elastic cross sections can be seen in Fig. 1, where the energy dependence of the in-medium neutron-proton (np) cross section at Fermi momenta $k_{F} = 0.0, 1.1, 1.34, 1.7$ fm$^{-1}$, corresponding to $\rho_{B} \sim 0, 0.5, 1, 2\rho_{0}$ ($\rho_{0} = 0.16$ fm$^{-3}$ is the nuclear matter saturation density), is shown. These results are obtained from relativistic Dirac-Brueckner calculations [17]. The presence of the medium leads to a substantial suppression of the cross section, which is most pronounced at low laboratory energy $E_{lab}$ and high densities, where the Pauli-blocking of intermediate states is most efficient. At larger $E_{lab}$ asymptotic values of 15-20 mb are reached. The angular distributions are also affected by the presence of the medium. For example, the initially strongly forward-backward peaked np cross sections become much more isotropic at finite densities, mainly due to the Pauli suppression of intermediate soft modes (π-exchange) [17]. As a consequence a larger transverse energy transfer can be expected.

The case of the inelastic NN cross sections is similar, but more complicated.
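The correspondence between the quoted Fermi momenta and baryon densities can be checked with the standard free Fermi gas relation for symmetric nuclear matter, $\rho_{B} = 2k_{F}^{3}/(3\pi^{2})$ (spin-isospin degeneracy 4). The relation and the function name below are not taken from the text; this is only a quick consistency sketch.

```python
import math

RHO_0 = 0.16  # fm^-3, saturation density as quoted in the text


def baryon_density(k_fermi):
    """Density (fm^-3) of symmetric nuclear matter for a Fermi momentum in fm^-1.

    Assumes a free Fermi gas with degeneracy 4: rho = 2 k_F^3 / (3 pi^2).
    """
    return 2.0 * k_fermi ** 3 / (3.0 * math.pi ** 2)


ratios = {k_f: baryon_density(k_f) / RHO_0 for k_f in (0.0, 1.1, 1.34, 1.7)}
for k_f, ratio in ratios.items():
    print(f"k_F = {k_f:4.2f} fm^-1  ->  rho_B / rho_0 = {ratio:.2f}")
```

The resulting ratios come out close to 0, 0.5, 1 and 2, in line with the approximate values given for Fig. 1.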
The presence of the medium influences not only the matrix elements, but also the threshold energy $E_{tr}$, which is an important quantity at beam energies below or near the threshold of particle production. In free space it is calculated from the invariant quantity $s = (p_{1}^{\mu} + p_{2}^{\mu})(p_{1\mu} + p_{2\mu})$, with $p_{i}^{\mu}$ ($i = 1, 2$) the 4-momenta of the two particles in the ingoing collision channel, e.g. $NN \rightarrow N\Delta$. This quantity is conserved in binary collisions in free space, from which one determines the modulus of the momenta of the particles in the outgoing channel. The threshold condition reads $E_{tr} \equiv \sqrt{s} \geq M_{1} + M_{2}$. Cross sections in free space are usually parametrized in terms of $s$ or the corresponding momentum in the laboratory system $p_{lab}$ within the One-Boson-Exchange (OBE) model, see e.g. [34] for details.

At finite density, however, particles carry kinetic momenta and effective masses and obey a dispersion relation $p^{*}_{\mu}p^{*\mu} = m^{*2}$ modified with respect to the free case. These in-medium effects shift the free-space threshold energy according to $s^{*} = (p_{1}^{*\mu} + p_{2}^{*\mu})(p^{*}_{1\mu} + p^{*}_{2\mu})$, and the threshold condition for inelastic processes inside the medium now reads $E^{*}_{tr} \equiv \sqrt{s^{*}} \geq m^{*}_{1} + m^{*}_{2}$. The requirement of energy-momentum conservation can be carried out in terms of the quantity $s^{*}$ or $s$ only as long as the in-medium mean fields or the corresponding self energies do not change between ingoing and outgoing channels.

The application of free parametrizations of cross sections for inelastic processes in dynamical situations of HICs at finite density thus leads to an inconsistency, since the threshold condition is evaluated in terms of effective quantities, but the matrix elements are determined in free space, e.g. by fitting their parameters to free empirical NN scattering. This effect can be seen in Fig. 2 (left panel), where the free inelastic $NN \rightarrow N\Delta$ cross section $\sigma_{inel}$ as a function of the laboratory energy $E_{lab}$ is displayed at various baryon densities $\rho_{B}$.
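The threshold conditions above can be sketched numerically. The example below evaluates $\sqrt{s}$ from two 4-momenta and compares it with the sum of the outgoing masses, first with free masses and then with masses lowered by a common scalar shift. The shift of 0.2 GeV, the beam energy and the helper names are hypothetical choices made only to illustrate the mechanism; they are not the in-medium fields used in the paper.

```python
import math


def invariant_s(p1, p2):
    """s = (p1 + p2)^2 for 4-momenta given as (E, px, py, pz) in GeV."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    return e * e - px * px - py * py - pz * pz


def on_shell(mass, p3):
    """4-momentum of a particle with the given mass and 3-momentum."""
    energy = math.sqrt(mass * mass + sum(c * c for c in p3))
    return (energy,) + tuple(p3)


def channel_open(p1, p2, m_out1, m_out2):
    """Threshold condition sqrt(s) >= m_out1 + m_out2."""
    return math.sqrt(invariant_s(p1, p2)) >= m_out1 + m_out2


M_N, M_DELTA_MIN = 0.939, 1.076   # GeV, nucleon and minimum Delta mass
T_LAB = 0.26                      # GeV, illustrative beam kinetic energy
p_beam = math.sqrt((M_N + T_LAB) ** 2 - M_N ** 2)

# Free space: projectile nucleon on a target nucleon at rest.
free_open = channel_open(on_shell(M_N, (0.0, 0.0, p_beam)),
                         on_shell(M_N, (0.0, 0.0, 0.0)),
                         M_N, M_DELTA_MIN)

# In medium: same 3-momenta, all masses lowered by a hypothetical scalar shift.
SHIFT = 0.2                       # GeV, purely illustrative
medium_open = channel_open(on_shell(M_N - SHIFT, (0.0, 0.0, p_beam)),
                           on_shell(M_N - SHIFT, (0.0, 0.0, 0.0)),
                           M_N - SHIFT, M_DELTA_MIN - SHIFT)
print(free_open, medium_open)
```

With these illustrative numbers the channel is closed in free space but open once the masses are reduced, which is the sense in which the threshold moves to lower beam energies at finite density.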
The threshold energy in free space is $E_{tr} = \sqrt{s} = 2.014$ GeV (for $M = 0.939$ GeV and $M_{min} = 1.076$ GeV, the nucleon mass and the lower limit of the mass of the ∆ resonance, respectively). The corresponding threshold value of the laboratory energy, $E_{lab} = (E_{tr}^{2} - 4M^{2})/2M$, is 0.32 GeV. However, at finite density the threshold is shifted towards lower energies, i.e. the free cross section increases, due to the reduction of the free masses of the outgoing particles in the threshold condition for $E^{*}_{tr}$. Obviously, at higher energies far from threshold the free cross section does not depend on the density.

A more consistent approach is the determination of the inelastic cross section under the consideration of in-medium effects, i.e. the Pauli-blocking of intermediate scattering states and in-medium modified spinors in the determination of the matrix elements within the OBE model. A simultaneous treatment of the transport equation and the structure equations of DBHF for actual anisotropic momentum configurations is not possible, due to its high complexity. For this reason we have applied the same method as for the case of elastic binary processes, i.e. in-medium parametrizations of the inelastic cross sections of the type $NN \rightarrow N\Delta$ within the same underlying DBHF approach as already used for the elastic processes.

Fig. 2. Inelastic $NN \rightarrow N\Delta$ cross section $\sigma_{inel}$ at various baryon densities $\rho_{B}$ (in units of the saturation density $\rho_{0} = 0.16$ fm$^{-3}$) as a function of the laboratory energy $E_{lab}$ using the free parametrizations (left) and the in-medium modified ones from DBHF [18] (right).

Haar and Malfliet [18] investigated this topic for infinite nuclear matter, with the result of a strong in-medium modification of the inelastic cross sections due to the reasons given above. However, these studies were performed at various densities but only in a limited region of momenta.
For a practical application in HICs we have thus extended these DBHF calculations using an extrapolation technique. We have imposed an exponential decay law of the form $a\,e^{-b\,p_{lab}}$ on the values of the in-medium cross sections of the channel $NN \rightarrow N\Delta$ given in Ref. [18]. The parameter $a$ normalizes to the last value of the extrapolated cross section and $b$ is defined by fitting the slope of the free cross section, since it does not change with density. For the density dependence we have enforced a correction of the form $f(\rho_{B}) = 1 + a_{0}(\rho_{B}/\rho_{0}) + a_{1}(\rho_{B}/\rho_{0})^{2} + a_{2}(\rho_{B}/\rho_{0})^{3}$, where $a_{0} = -0.601$, $a_{1} = 0.223$, $a_{2} = -0.0035$, with $\rho_{0}$ the saturation density, are extracted from the results of Ref. [18]. The same modification is imposed on the cross sections of all the inelastic channels, in a form of the type $\sigma_{eff} = \sigma_{free}(E_{lab})\,f(\rho_{B})$, with $\sigma_{free}$ taken from the standard free parametrizations of Ref. [34]. Such a procedure is appropriate at low energies but can be less accurate at higher momenta. This, however, should not be a problem at the reaction energies below the kaon production threshold considered in this work.

Fig. 2 (right panel) shows the energy dependence of the inelastic NN cross section at various densities as obtained from DBHF calculations [17] for symmetric nuclear matter. As in the case of elastic processes (see Fig. 1), the inelastic cross section drops with increasing baryon density $\rho_{B}$, mainly due to the Pauli blocking of intermediate scattering states and the in-medium modification of the effective Dirac mass [17]. There are also phenomenological studies [16,33] which give similar medium effects on the inelastic cross sections, within the limitation to isospin symmetric nuclear matter. More suitable results would come from a DBHF approach to isospin asymmetric nuclear matter. Such studies have been started only recently [35] and are so far limited to low momenta, below the threshold energy of inelastic channels.
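The parametrization just described can be written down compactly. The sketch below assumes the coefficient values $a_{0} = -0.601$, $a_{1} = 0.223$, $a_{2} = -0.0035$ with a cubic last term in the density polynomial; the free cross section is replaced by a hypothetical constant placeholder, since the actual parametrization of Ref. [34] is not reproduced here, and all function names are illustrative.

```python
import math

# Density-correction coefficients (a1 read as 0.223 and a cubic last term: assumptions).
A0, A1, A2 = -0.601, 0.223, -0.0035


def density_factor(rho_ratio):
    """f(rho_B) = 1 + a0 x + a1 x^2 + a2 x^3 with x = rho_B / rho_0."""
    x = rho_ratio
    return 1.0 + A0 * x + A1 * x ** 2 + A2 * x ** 3


def sigma_eff(sigma_free, e_lab, rho_ratio):
    """Effective inelastic cross section sigma_free(E_lab) * f(rho_B)."""
    return sigma_free(e_lab) * density_factor(rho_ratio)


def exponential_tail(p_last, sigma_last, b):
    """Return sigma(p) = a exp(-b p), with a fixed so that sigma(p_last) = sigma_last."""
    a = sigma_last * math.exp(b * p_last)
    return lambda p: a * math.exp(-b * p)


def placeholder_sigma_free(e_lab):
    """Hypothetical stand-in for a free parametrization: a constant 20 mb."""
    return 20.0


reduced = sigma_eff(placeholder_sigma_free, 1.0, 1.0)   # at saturation density
tail = exponential_tail(p_last=1.0, sigma_last=5.0, b=2.0)
print(density_factor(0.0), density_factor(1.0), reduced, tail(1.0), tail(1.5))
```

Under these assumptions the correction factor equals 1 in free space and about 0.62 at saturation density, and the tail function matches the last tabulated value at `p_last` by construction.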
3 Kaon Potentials

Before presenting the results, it is important to analyse the in-medium kaon potential, since it could be relevant when theoretical results are compared with experiments. In fact it has been widely discussed whether the kaon potential plays a crucial role in describing kaon production and dynamics [23,7,9]. Kaplan and Nelson [20] found that the explicit chiral symmetry breaking is not so small for K mesons, and this leads to significant corrections to the free kaon mass at finite baryon density. There are different models for the description of kaon properties in the nuclear medium. Here we will briefly discuss two main approaches, one based on Chiral Perturbation Theory (ChPT) and a second on effective meson couplings (OBE/RMF), more consistent with the general frame of our covariant reaction dynamics. The results are in good agreement, which is not surprising on the basis of a simple physics argument. It is well established [7] that kaons (K0,+) feel a weak repulsive potential in nuclear matter, of the order of 20−30 MeV at normal density. This can be described as the net result of the cancellation between an attractive scalar and a repulsive vector interaction term. Such a mechanism can be reproduced in the ChPT approach through the competition between an attractive scalar Kaplan-Nelson term [20] and a repulsive vector Weinberg-Tomozawa term [36]. The same effect can be obtained in an effective meson field scheme just via a coupling to the attractive σ-scalar and to the repulsive ω-vector fields.

In this paper antikaons K− and their strong attractive potential will not be discussed: because of their higher threshold they have not been considered in the energy range of interest here.

Finally, for studies aimed at the determination of the symmetry energy from strangeness production one has to treat with particular care the isospin dependence of the kaon mean field potential.
3.1 Chiral Perturbative Results

Starting from an effective chiral Lagrangian for the K mesons one obtains a density and isospin dependence for the effective kaon (K0,+) masses [7]. In isospin asymmetric matter we finally get

$m^*_K = \sqrt{ m_K^2 - \frac{\Sigma_{KN}}{f_\pi^2}\,\rho_s \mp \frac{C}{f_\pi^2}\,\rho_{s3} + V_\mu V^\mu }$ (upper sign, K+), (3)

where ρs, ρs3 are the total and isospin scalar densities, mK = 494 MeV is the free kaon mass, fπ = 93 MeV the pion decay constant, and ΣKN the kaon-nucleon sigma term (attractive scalar), here chosen as 450 MeV. The vector potential is given by

$V_\mu = \frac{3}{8 f_\pi^{*2}}\, j_\mu \pm \frac{1}{8 f_\pi^{*2}}\, j_{\mu 3}$ (upper sign, K+), (4)

with jµ, jµ3 the baryon and isospin currents. The $f^*_\pi$ is an in-medium reduced pion decay constant. It is expected to scale with density in a way similar to the chiral condensate [37]. This leads to a reduction around normal density, $f_\pi^{*2} \simeq 0.6 f_\pi^2$. Such a reduction is compensated in one-loop ChPT by other contributions in the scalar attractive term, so we will use $f^*_\pi$ only for the vector potential, with an enhanced repulsive effect [7]. The constant C has been fixed from the Gell-Mann-Okubo mass formula (i.e. in free space) to a value of 33.5 MeV [22].

In Eqs. (3-4) upper signs hold for K+ and lower signs for K0. As can be seen, the vector term, which dominates over the scalar one at high density, is more repulsive for K0 than for K+. This leads to a higher (lower) K0 (K+) in-medium energy, given by the dispersion relation

$E_K(k) = k_0 = \sqrt{k^2 + m_K^{*2}} + V_0$. (5)

The density dependence, evaluated in the chiral approach, of the quantity $E_K(k=0) = m^*_K + V_0$ for K0,+, which directly influences the in-medium production thresholds, is shown by the upper curves in Fig. 3 (left panel). In particular, it can be noted that the K0 and K+ in-medium energies differ by ≈ 5% at ρB = 2ρ0 (with EK0 > EK+), at a fixed isospin asymmetry around 0.2. Therefore, the inclusion of isovector terms favors K+ over K0 production, with a consequent reduction of the K0/K+ strangeness ratio.

Fig. 3.
Density dependence (ρ0 is the saturation density) of the in-medium kaon energy (left panel) in units of the free kaon mass (mK = 0.494 GeV). Upper curves refer to ChPT model calculations: the central line corresponds to symmetric matter, the other two give the isospin effect (up K0, down K+). Bottom curves are obtained in the OBE/RMF approach; the solid one is for symmetric matter. The isospin splitting is given by the dashed (NLρ) and dotted (NLρδ) lines, again up K0, down K+. Right panel: relative weight of the isospin splitting, see text. All the curves are obtained considering an asymmetry parameter α = 0.2.

3.2 Relativistic Mean Field Results

Kaon potentials can also be derived within an effective meson field OBE approach, fully consistent with the RMF transport scheme used to simulate the reaction dynamics, see Eq. (1). We will use a simple constituent quark-counting prescription to relate the kaon-meson couplings to the nucleon-meson couplings, i.e. just a factor 3 reduction. Following the chiral argument discussed before, only for the isoscalar-vector (ω) case have we further increased the kaon coupling, to gωK ≃ (1.4/3) gωN. This will ensure the required repulsion around normal densities for K+. Consistently, the isospin dependence will be directly derived from the coupling between the kaon fields and the ρ and δ isovector mesons [22]. The in-medium energy carried by kaons will have the same form as in Eq. (5), but with effective masses and vector potentials given by

$m^*_K = \sqrt{ m_K^2 - m_K \left( g_{\sigma K}\sigma \pm \frac{g_{\delta K} g_{\delta N}}{m_\delta^2}\,\rho_{s3} \right) } = \sqrt{ m_K^2 - m_K \left( g_{\sigma K}\sigma \pm \frac{f_\delta}{3}\,\rho_{s3} \right) }$, (6)

$V_0 = \frac{g_{\omega K} g_{\omega N}}{m_\omega^2}\,\rho_B \pm \frac{g_{\rho K} g_{\rho N}}{m_\rho^2}\,\rho_{B3} = \frac{1}{3}\left( f^*_\omega \rho_B \pm f_\rho \rho_{B3} \right)$, (7)

where upper signs are for K+. The $f_i \equiv g_{iN}^2/m_i^2$, i = σ, ω, ρ, δ, are the nucleon-meson coupling constants used in our RMF Lagrangians, and $f^*_\omega = 1.4 f_\omega$ due to the enhanced kaon coupling to the isoscalar-vector field.
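As a rough numerical illustration of Eqs. (5)-(7), the sketch below evaluates the kaon energy at k = 0 in symmetric matter, where the isovector terms vanish. It rests on several assumptions that should be read as such: the σ contribution is taken as gσK σ = (M − M∗)/3, i.e. the factor 3 quark-counting reduction applied to the nucleon relation, the vector term is taken as (1/3) f∗ω ρB with f∗ω = 1.4 fω, and the effective mass M∗ is an input chosen by the user. The function name and the sample value M∗ = 0.82 M are illustrative only.

```python
import math

HBARC = 197.327   # MeV fm, converts fm^-1 to MeV
M_K = 494.0       # free kaon mass in MeV (value used in the text)
M_N = 939.0       # bare nucleon mass in MeV (value used in the text)
F_OMEGA = 3.6     # f_omega in fm^2 (Table 1)
RHO_0 = 0.16      # saturation density in fm^-3


def kaon_energy_at_rest(rho_b: float, m_star: float,
                        f_omega: float = F_OMEGA,
                        enhancement: float = 1.4) -> float:
    """E_K(k=0) = m*_K + V_0 in symmetric matter, in MeV.

    rho_b  : baryon density in fm^-3
    m_star : nucleon effective mass in MeV (user input, not computed here)
    """
    # Assumed quark-counting relation: g_sigmaK * sigma = (M - M*) / 3
    scalar = (M_N - m_star) / 3.0
    m_k_star = math.sqrt(M_K**2 - M_K * scalar)
    # Vector term (1/3) f*_omega rho_B, converted from fm^-1 to MeV
    v0 = enhancement * f_omega * rho_b / 3.0 * HBARC
    return m_k_star + v0


if __name__ == "__main__":
    # Illustrative input: M* = 0.82 M at saturation density.
    e_k = kaon_energy_at_rest(RHO_0, 0.82 * M_N)
    print(f"E_K(k=0) = {e_k:.1f} MeV, shift = {e_k - M_K:.1f} MeV")
```

With these illustrative inputs the energy comes out at about 518 MeV, roughly 24 MeV above the free kaon mass, which falls in the 20−30 MeV range quoted for the weak repulsion at normal density.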
σ represents the solution of the non-linear equation for the scalar-isoscalar field, which gives the reduction of the nucleon mass in symmetric matter; therefore we can directly evaluate the kaon-σ coupling using $g_{\sigma K}\sigma = \frac{1}{3}(M - M^*)$, where M∗ is the nucleon effective mass at the fixed baryon density.

In this RMF approach we can derive an almost analytical expression for the isospin effects on the kaon in-medium energy, Eq. (5), at k = 0. Using the approximate form $\rho_s \simeq (M^*/E^*_F)\,\rho_B$ for the scalar density, we get a relative weight of the isospin splitting of the kaon potentials, $\Delta E_K(k=0) \equiv E_{K^0}(k=0) - E_{K^+}(k=0)$, given by 2α(fρ − M f ∗ω + (mK − 16(M −M∗)) with α ≡ ρB3/ρB the asymmetry parameter. We can now easily estimate the isospin splitting of K0 vs. K+ for the two isovector mean field Lagrangians used here, NLρ and NLρδ. The effect is clearly larger when the δ coupling is included, since the ρ-coupling fρ then has to be increased, see [14,15]; still, the expected weight is relatively small, going from about 1.5% (NLρ) to about 3.0% (NLρδ) at ρB = 2ρ0, for a fixed isospin asymmetry around 0.2. The complete results are also shown in Fig. 3 (right panel). The agreement with the ChPT estimations is rather good, but in the RMF scheme we see an overall reduced repulsion and a smaller isospin splitting. Both effects are of interest for our discussion, the first affecting the K0,+ absolute yields, the second important for the K0/K+ yield ratios.

4 Numerical realization and notations

The Vlasov term of the RBUU equation (1) is treated within the Relativistic Landau-Vlasov method, in which the phase space distribution function f(x, p∗) is represented by covariant Gaussians in coordinate and momentum space [38].
For the nuclear mean field or the corresponding EoS in symmetric matter the NL2 parametrization [26] of the non-linear Walecka model [39] is adopted, with a compression modulus of 200 MeV and a Dirac effective mass of m∗ = 0.82 M (M is the bare nucleon mass) at saturation. The momentum dependence enters via the relativistic treatment in terms of the vector component of the baryon self energy. The isovector components in the mean fields are introduced in the NLρ, NLρδ Lagrangians as in the recent Refs. [14,15]. In Table 1 we report all the coupling constants and the coefficients of the non-linear σ-terms.

         fσ (fm^2)   fω (fm^2)   fρ (fm^2)   fδ (fm^2)   A (fm^−1)   B
NLρ      9.3         3.6         1.22        0.0         0.015       −0.004
NLρδ     9.3         3.6         3.4         2.4         0.015       −0.004

Table 1 Coupling parameters in terms of fi ≡ (gi/mi)^2 for i = σ, ω, ρ, δ, A ≡ a and B ≡ b for the non-linear NL models [14] using the ρ (NLρ) and both the ρ and δ mesons (NLρδ) for the description of the isovector mean field.

The collision integral is treated within the standard parallel ensemble algorithm imposing energy-momentum conservation. For the elastic NN cross sections the DBHF calculations of Ref. [17] have been used throughout this work. At intermediate relativistic energies up to the threshold of kaon (K0,+) production, i.e. Elab = 1.56 GeV, the major inelastic channels are (B, Y, K stand for a baryon (a nucleon N or a ∆-resonance), a hyperon and a kaon, respectively)

– NN ←→ N∆ (∆-production and absorption)
– ∆ ←→ πN (π-production and absorption)
– BB −→ BYK, Bπ −→ YK (K-production from BB and Bπ-channels)

The produced resonances propagate in the same mean field as the nucleons, and their decay is characterized by the energy dependent width Γ, which is taken from Ref. [34]. The produced pions propagate under the influence of the Coulomb interaction with the charged hadrons. Kaon production is treated perturbatively due to the low cross sections, taken from Refs. [40].
Kaons undergo elastic scattering and their phase space trajectories are determined by relativistic equations of motion, if the kaon potential is accounted for.

In the next section the results of transport calculations in terms of pion and kaon yields and their rapidity distributions will be presented. The following cases for the inelastic NN cross sections σinel and the kaon potential ΣK (scalar and vector) will be discussed in particular:

– free σinel, without ΣK (w/o K-pot σfree)
– free σinel, with ΣK (w K-pot σfree)
– free σinel, with isospin dependent ΣK (w ID K-pot σfree)
– effective σinel, without ΣK (w/o K-pot σeff)
– effective σinel, with ΣK (w K-pot σeff)
– effective σinel, with isospin dependent ΣK (w ID K-pot σeff)

For pions only the different cases of σinel will be labelled, since they do not experience any potential apart from the Coulomb one. One should note that in all calculations only inelastic processes including the lowest mass resonance ∆(1232 MeV) have been considered, without accounting for the N∗(1440) resonance. This has no appreciable consequences for the pion yields, but it slightly reduces the kaon multiplicities.

5 Results

As mentioned in the introduction, the main topic of the present work is to study the sensitivity of particle ratios to physical parameters such as in-medium effects on cross sections and the isospin dependence of the kaon potential. This is an important issue to clarify, since there is some evidence suggesting the yield ratios as good observables for determining the high density behavior of the symmetry energy. In the near future these data will be experimentally accessible with the help of reactions with radioactive ion beams. However, a comparison of absolute values with experimental data, although it is not the aim of this work, is essential and has to be included in order to show the consistency of our approach.
Thus we will start the presentation of the results in terms of absolute yields and their comparison with data, before passing to the main section on the particle ratios. Most calculations refer to central 197Au+197Au collisions at 1 AGeV.

5.1 Effects of in-medium inelastic NN cross sections on particle yields

5.1.1 Resonance and Pion Production

Here we study the role of the density dependence of the effective inelastic NN cross sections on particle yields (pions and kaons). We start with the temporal evolution of the ∆ resonances and the produced pions, as shown in Fig. 4. The maximum of the multiplicity of produced ∆-resonances occurs around 15 fm/c, which corresponds to the time of maximum compression. Due to their finite lifetimes these resonances decay into pions (and nucleons) as ∆ −→ πN. Some of these pions are re-absorbed in the inverse process, i.e. πN −→ ∆, but chemical equilibrium is never reached, as pointed out in [15]. This mechanism continues until all resonances have decayed, leading to a saturation of the pion yield for times t ≥ 50 fm/c (the so-called freeze-out time).

Fig. 4. Time evolution (time in fm/c) of the ∆-resonances (left panel) and total pion yield (right panel) for a central (b = 0 fm) Au+Au reaction at 1 AGeV incident energy. Calculations with free (solid lines) and effective (DBHF, dashed lines) σinel are shown.

The resonance production takes place during the high density phase, where the in-medium effects of the effective cross sections are expected to dominate. In fact, the transport results with the in-medium modified σinel show a reduced multiplicity of inelastic processes, and thus reduced yields of ∆ resonances and pions. However, the in-medium effect is less pronounced here than in the similar phenomenological studies of Refs. [16,33], which is likely due to the moderate density dependence of the effective cross sections, see again Fig. 2. Fig.
5 shows the centrality dependence of the charged pion yields for Au+Au collisions at 1.0 AGeV incident energy. The degree of centrality is characterized by the observable $A_{part}$, which gives the number of participant nucleons and can be calculated within a geometrical picture using smooth density profiles for the nucleus [41]. $A_{part}$ increases with decreasing impact parameter b and its value approaches the total mass number of the two colliding nuclei in the limiting case of b = 0 fm. As can be seen in Fig. 5, the charged pion yields increase with $A_{part}$, with a non-linear $A_{part}$ dependence. As pointed out in [41], the charged pion multiplicities show a similar non-linear increase also in the data. However, by directly comparing the theoretical charged pion yields with the experiments [41] we observe that our calculations overpredict the data, even when the in-medium reductions in σinel are accounted for. This discrepancy is a general feature of the transport models and may be related to the rescattering processes that take place in the spectator region, where nuclear surface effects can play a crucial role.

Fig. 5. Centrality dependence (in terms of $A_{part}$) of the negative (π−) and positive (π+) charged pions for Au+Au collisions at 1 AGeV incident energy. Calculations with free (solid lines, filled circles) and effective (dashed lines, filled squares) cross sections are shown as indicated. Experimental data, taken from the FOPI collaboration [41], are also displayed for comparison.

In order to check this point we have performed a selection on pions produced at central rapidity, where data are also available [41]. In Fig. 6 we present the inclusive (all centralities) pion rapidity distributions vs. the FOPI data for charged pions.
We see that the agreement is rather good at mid-rapidity, while we see a definite overcounting in the spectator sources. Such a good evaluation of the pion production at mid-rapidity is confirmed by the results shown in Fig. 7, where we present the inclusive (all centralities) pion transverse spectrum at mid-rapidity (−0.2 < y0 < 0.2). We first note that this is also not much affected by the inclusion of the in-medium inelastic cross sections. Moreover we see again that our results are in good agreement with the experimental values from the FOPI collaboration [41], in the same rapidity selection. The overestimation of the pion yields shown in Fig. 5 probably results from other rapidity regions, where the role of the spectator sources is more evident. We also have to say that we are not imposing any experimental filter on our results. The point is rather delicate since the main discrepancies appear in high rapidity regions. In any case such a fine agreement at mid-rapidity is very important for the reliability of our results on kaon production, since kaons are mostly produced in that rapidity range via secondary πN, ∆N channels, see [15].

Fig. 6. Inclusive (all centralities) pion rapidity distributions for a Au+Au reaction at Ebeam = 1 AGeV incident energy. Comparison with the experimental values given by the FOPI collaboration [41]; as in the data we have used a transverse momentum cut pt > 0.1 GeV/c.

Fig. 7. Inclusive transverse spectrum at mid-rapidity of π−, π+ for a Au+Au reaction at Ebeam = 1 AGeV incident energy. Comparison with the experimental values given by the FOPI collaboration [41]. The cross sections are normalized to a rapidity interval dy = 1.

The pion reaction dynamics is furthermore not sensitively affected by the in-medium inelastic cross sections. We restrict here the analysis to central Au+Au collisions at 1 AGeV. In Fig. 8 we show cross section effects on the rapidity distributions (normalized to the projectile rapidity in the cm system) for π±, an observable which characterizes the degree of stopping or the transparency of the colliding system. The insensitivity is due to the fact that the global dynamics is mainly governed by the total NN cross section, whose elastic contribution is the same in all cases. In previous studies [19,31] the in-medium effects of the elastic NN cross sections gave important contributions to the degree of transparency or stopping. It was found that a reduction of the effective NN cross section, particularly at high densities, is essential in describing the experimental data [19], as confirmed by various other analyses [31]. The density effects on the inelastic NN cross section influence only those nucleons associated with resonance production, and therefore they do not affect the global baryon dynamics significantly.

Fig. 8. Rapidity distributions of negative and positive charged pions (left and right panels, respectively) for a central (b = 0 fm) Au+Au reaction at Ebeam = 1 AGeV incident energy.

5.1.2 Kaon Production

The situation is different for kaon production, see Fig. 9. The influence of the in-medium dependence of σinel is important, and reduces the kaon abundances by ≈ 30%. This is due to the fact that the leading channels for kaon production are N∆ −→ BYK and Nπ −→ ΛK. Thus kaon production is essentially a two-step process and the medium-modified inelastic cross sections enter twice, leading to an increased sensitivity. Fig. 10 shows the rapidity distributions of kaons, where the in-medium effect is more visible than in the corresponding pion rapidity distributions (see Fig. 8). These results seem to show that kaon production could be used to determine the in-medium dependence of the NN cross section for inelastic processes. Similar phenomenological studies based on the BUU approach [16,33] strongly support in-medium modifications of the free cross sections.

Fig. 9. Time evolution (time in fm/c) of the K0 (left panel) and K+ (right panel) multiplicities, for the same reaction and models as in Fig. 4, with free and in-medium inelastic cross sections, without the inclusion of the kaon potentials.

Fig. 10. Same as in Fig. 9, but for the normalized rapidity distributions.

It is of great interest to perform an extensive comparison with experimental data on kaon production, in order to have a clearer picture of the effect of the in-medium cross sections on their production. The point is that kaon absolute yields are also largely affected by the kaon potentials, as shown in the following and as expected from the general discussion of the previous section. However, since kaons are mainly produced in more uniform high density regions, the effects of the medium on cross sections tend to disappear in the yield ratios. In the next section we will show that the same holds true for the K0, K+ potentials. Our conclusion is that the kaon yield ratios might finally be a rather robust observable to probe the nuclear EoS at high baryon densities.

Fig. 11. Time evolution (time in fm/c) of the K0 (left panel) and K+ (right panel) multiplicities for the same reaction as in Fig. 4. Calculations without (w/o K-pot, solid), with (w K-pot, dashed) and with the isospin dependent (w ID K-pot, dotted-dashed) kaon potential are shown. In all the cases the free choice for σinel is adopted.

5.2 The role of the kaon potential

As discussed in the previous sections, the important quantity which influences the kaon production threshold is the in-medium energy at zero momentum [7].
This quantity rises with increasing baryon density and in the general case of isospin asymmetric matter shows a splitting between K0 and K+, see Fig. 3. We present here several K-production results from ab initio collision simulations using the chiral determination of the K-potentials, ChPT.

Fig. 11 shows the time dependence of the two isospin states of the kaon with respect to the role of the kaon potential and its isospin dependence. First of all, the repulsive kaon potential considerably reduces the kaon yields, at least in this ChPT evaluation. The inclusion of the isospin dependent part of the kaon potential slightly modifies the kaon yields, towards a larger K+ production in neutron-excess matter. However, compared to the corresponding isospin dependence of the in-medium kaon energy, see Fig. 3, the effect is less pronounced in the dynamical situation. This is due to the fact that in heavy ion collisions the local asymmetry in the interacting region varies with time, see [15]. In particular, it decreases with respect to the initial asymmetry because of partial isospin equilibration due to stopping and inelastic processes with associated isospin exchange. This is reflected also in the kaon rapidity distributions, see Fig. 12, where the role of the kaon potential is crucial, but not its isospin dependence.

Fig. 12. Same as in Fig. 11, but for the rapidity distributions.

Fig. 13. K+ rapidity distributions (as a function of the normalized rapidity y) for semi-central (b < 4 fm) Ni+Ni reactions at 1.93 AGeV. Theoretical calculations (as indicated) are compared with the experimental data of the FOPI (open triangles) and KaoS (open diamonds) collaborations [42,43].
As we have already seen, the in-medium modifications of the inelastic cross sections also affect the kaon absolute yields, so it is of interest to look at the combined effects. For that purpose we have performed calculations for a semi-central (b < 4 fm) Ni+Ni system at 1.93 AGeV, where data exist from the FOPI [42] and KaoS [43] collaborations. The results for K+ rapidity distributions, compared to experimental data, are shown in Fig. 13.

We observe that although the kaon yields are reduced when using the in-medium inelastic cross section, we are still rather far away from the data, see the left panel of Fig. 13. We note that the reduction due to the density dependence of the effective inelastic cross sections is rather moderate here compared to that of the heavier Au-system (see Fig. 10 for kaons). This is due to the lower compression achieved for the lighter Ni-systems. The inclusion of the kaon potential, without (central panel) and with (right panel) isospin dependence, further suppresses the K+ yield, towards a better agreement with data, as expected for the repulsive behavior at high density.

In fact the results obtained with kaon potentials and effective cross sections seem to underestimate the data. This could be an indication that the ChPT K-potentials are too repulsive at densities around 2ρ0, where kaons are produced, see [15]. We recall that the parameters of the ChPT potentials are essentially derived from free space considerations. When we follow a more consistent RMF approach, directly linked to the effective Lagrangians used to describe bulk properties of nuclear matter as well as the relativistic transport dynamics, we see less repulsion, see the bottom curves in Fig. 3 (left panel). This is valid also for the isospin dependent part of the K-potentials, which will more directly affect the K0/K+ yield ratio. We see from the same Fig.
3 (right panel) that in the RMF frame this splitting is reduced to a few percent for all the different isovector interactions. The conclusion is that when kaon potentials are evaluated within a consistent effective field approach we have a better agreement with data for absolute yields, with a very similar reduction of the K0 and K+ rates. This is important for the yield ratio, which should then not be very sensitive to the in-medium effects on kaon propagation.

A similar conclusion on K-potential effects, obtained within the ChPT approach, can be drawn from the centrality dependence of the K+ yields shown in Fig. 14 for Au+Au collisions at 1 AGeV beam energy, compared with KaoS data [44]. The trend in centrality can be reproduced by all theoretical calculations (with different cross sections); however, all of them seem to underestimate the experimental yields.

In fact we have to mention that another possible source of the discrepancy with data can be that in all our simulations only the lowest mass resonance ∆(1232 MeV) has been dynamically included. Transport calculations from other groups, which also include the N∗(1440 MeV) resonance, obtain an enhancement of the K+ yield for Au+Au collisions at 1 AGeV incident energy [7]. This significant dependence of the kaon yields on the N∗ resonance comes from the two-pion N∗-decay channel, i.e. N∗ −→ ππN. Therefore, since the most important channels of kaon production are the pionic ones, we can expect some underestimation of the absolute yields in our calculations. Just to confirm this point, in Fig. 14 we also report transport results from the Tübingen group, in which all resonances are accounted for [7]. We finally remark that the inclusion of other nucleon resonances in neutron-rich matter will further contribute to increase the K0 yield through a larger intermediate π− production. This can contribute to compensate the opposite effect of the isospin dependent part of the K-potentials on the K0/K+ yield ratios.

Fig. 14. K+ centrality dependence in Au+Au reactions at 1 AGeV incident energy. Our theoretical calculations (as indicated) are compared with KaoS data from [44] (open diamonds) and with results of the Tübingen group (open squares).

5.3 Pionic and Strangeness Ratios

A crucial question is whether particle yield ratios are influenced by in-medium effects on both the inelastic cross sections and the kaon potentials. This point is of major importance particularly for kaons, since ratios of particles with strangeness have been widely used in determining the nuclear EoS at supra-normal density. Relative ratios of kaons between different colliding systems have been utilized in determining the isoscalar sector of the nuclear EoS [5]. More recently, the (π−/π+)- and (K0/K+)-ratios have been proposed in order to explore the high density behavior of the symmetry energy, i.e. the isovector part of the nuclear mean field [14,15,13,12].

Fig. 15 shows the incident energy dependence of the pionic (π−/π+, left panel) and strangeness (K0/K+, right panel) ratios for the different choices of inelastic cross sections and kaon potentials, as widely discussed in the previous sections.

Fig. 15. Energy dependence (beam energy in AGeV) of the π−/π+ (left panel) and K0/K+ (right panel) ratios for central (b = 0 fm) Au+Au reactions.

First of all, a rapid decrease of the pionic ratio with increasing beam energy is observed, related to the opening of secondary rescattering channels (reabsorption/recreation of pions with associated isospin exchange).
The corresponding strangeness ratio depends only moderately on beam energy due to the absence of secondary interactions with the hadronic environment.

The pionic ratio is partially affected by the in-medium effects of σinel, as can be seen in the left panel of Fig. 15. Its slope with respect to beam energy changes slightly. The situation is similar for the strangeness ratio, which actually appears even more robust against in-medium modifications, even with the kaon potentials. This can be seen in the right panel of Fig. 15, where for all the considered beam energies the ratio remains almost unchanged. Such a result is consistent with those of the previous sections, where it was found that the absolute kaon yields decrease in the same way when the effective σinel are applied and when the K-potentials are included.

The different sensitivity to variations in the inelastic cross sections of pionic vs. strangeness ratios can be easily understood. Because of their strong rescattering and lower mass, pions can be produced at different times during the collision, in different density regions. In contrast, kaons are mainly produced at early times in a rather well defined compression stage, i.e. in a source with a more uniform high density, and so the density dependence of the inelastic cross sections will affect neutral and charged kaon yields in the same way, leaving the ratio unchanged.

At this level of investigation one could argue that the strangeness ratio is a very promising observable in determining the nuclear EoS and particularly its isospin dependent part. This has also been the main conclusion in Ref. [15].
However, a strong isospin dependence of the kaon potentials could directly affect the ratio, since the K0 and K+ rates will be modified in opposite ways. This is shown by the two triangle points at 1 AGeV in the right panel of Fig. 15. As already discussed, this large isospin dependence of the kaon potential, clearly present in the ChPT evaluation, is greatly reduced in a consistent mean field approach, see Fig. 3 and the arguments presented in Section 3. In any case this point deserves more detailed studies. We plan to perform ab initio kaon-production simulations within the OBE/RMF evaluation of kaon potentials, with an isospin part fully consistent with the isovector fields of the hadronic Lagrangians used for the reaction dynamics.

                        NL               NLρ              NLρδ
K0/K+ (w/o K-pot)       1.24 (± 0.02)    1.35 (± 0.01)    1.43 (± 0.02)
K0/K+ (w ID K-pot)      1.02 (± 0.03)    1.22 (± 0.04)    1.34 (± 0.05)

Table 2 Sensitivity of the strangeness ratio K0/K+ to the isospin dependent kaon potential and to the isovector mean field (NL, no isovector fields, NLρ and NLρδ). The considered reaction is a central (b = 0 fm) Au+Au collision at 1 AGeV incident energy.

An interesting final comment is that the sensitivity of the strangeness ratio to the isovector part of the nuclear EoS remains even when a strong isospin dependence of the kaon potentials is inserted, as in the ChPT case. In order to check this, we have repeated the calculations for Au+Au at 1 AGeV incident energy, varying the isovector part of the nuclear mean field. As in Refs. [14,15], three options for the isovector mean field have been applied: the NL (no isovector fields), NLρ and NLρδ parametrizations, but now including the isospin effect in the kaon potential in the ChPT evaluation. The options for the symmetry energy differ from each other in the high density stiffness. NL gives a relatively soft Esym, NLρδ a relatively stiff one, and NLρ lies in the middle between the other limiting cases [14].
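The entries of Table 2 can be compared with a few lines of arithmetic. The sketch below uses only the central values of the table and ignores the quoted uncertainties; the dictionary keys and function names are illustrative choices, not notation from the paper.

```python
# Central values of the K0/K+ ratio from Table 2
# (central Au+Au at 1 AGeV); quoted uncertainties are ignored here.
RATIOS = {
    "w/o K-pot":  {"NL": 1.24, "NLrho": 1.35, "NLrhodelta": 1.43},
    "w ID K-pot": {"NL": 1.02, "NLrho": 1.22, "NLrhodelta": 1.34},
}


def spread(case: str) -> float:
    """Difference between the stiffest (NLrhodelta) and softest (NL) choice."""
    row = RATIOS[case]
    return row["NLrhodelta"] - row["NL"]


def suppression(model: str) -> float:
    """Factor by which the isospin dependent potential lowers the ratio."""
    return RATIOS["w ID K-pot"][model] / RATIOS["w/o K-pot"][model]


if __name__ == "__main__":
    for case in RATIOS:
        print(f"{case:11s} NLrhodelta - NL = {spread(case):.2f}")
    for model in ("NL", "NLrho", "NLrhodelta"):
        print(f"{model:11s} suppression = {suppression(model):.3f}")
```

With these central values the ordering NL < NLρ < NLρδ is the same in both rows, and the NLρδ − NL difference is 0.19 without and 0.32 with the isospin dependent potential.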
Table 2 shows the strangeness ratio as a function of these different cases for the isovector mean field, keeping now constant the other parameters (free σinel, isospin dependent kaon potential). The ratio, indeed, strongly decreases when the isospin part in the kaon potential is accounted for. The more interesting result is, however, that the relative difference between the different choices of the symmetry energy remains stable. This can be understood from the fact that in the kaon self energies the isospin sector contains only the isospin densities and currents, without additional parameters such as meson-nucleon coupling constants. Since the local asymmetry does not strongly vary from one case to the other (NL, NLρ, NLρδ), one would expect a robustness of the EoS dependence. Thus we conclude that the strangeness ratio appears to be well suited for determining the isovector EoS; however, a fully consistent mean field approach is still missing.

6 Conclusions

We have investigated the role of the in-medium modifications of the inelastic cross section and of the kaon mean field potentials on particle production in intermediate energy heavy ion collisions within a covariant transport equation of a Boltzmann type. We have used for both the elastic and inelastic NN cross sections the same DBHF approach, which provides in a parameter free manner the in-medium modifications of the imaginary part of the self energy in nuclear matter. The kaon potential has been evaluated in two ways, following a Chiral Perturbative approach and an Effective Field scheme, considering valence quark-meson couplings. We have applied these modifications of the cross sections and kaon potentials to the collision integral of the transport equation and analyzed Au+Au and Ni+Ni collisions at intermediate relativistic energies around the kaon threshold energy. Our studies have shown a good sensitivity of the particle multiplicities and rapidity distributions of pions and kaons.
In particular, a moderate reduction for pions has been seen when the in-medium effects in the inelastic cross section are accounted for. The pion yields are still overestimating the inclusive data while we have a very nice agreement with the pion spectra and multiplicities at mid-rapidity. The latter point is important for trusting the kaon production, mainly due to secondary pion collisions at mid-rapidity. At variance the kaon (K0,+) yields show a larger sensitivity to the reduction of the inelastic cross sections, with a decrease of about 30 %. However we see that the introduction of a repulsive kaon potential is essential in order to reproduce even inclusive data. We have then focused our attention on π−/π+ and K0/K+ yield ratios, re- cently suggested as good probes of the isovector part of the EoS at high den- sities. The pionic ratios, due to their strong secondary interaction processes with the hadronic environment, show a dependence on the density behavior of the inelastic cross sections. A further selection of the production source, i.e. a transverse momentum discrimination, could be required in order to have a more reliable probe of the nuclear EoS. The situation appears more favorable for the kaon ratios. In fact we find that the multiplicities of K0 and K+ are influenced in such a way that their ratio is not affected by the density dependence of the inelastic cross sections. This is due to the long mean free path of the K0,+ that are produced only in the compression stage of the collision [15]. The effects of the in medium kaon potentials are also largely compensating in the K0/K+ yield ratio, due to the similar repulsive field seen by K0 and K+ mesons. Such a result can be modified by the isospin dependence of the kaon potentials which is expected to act in opposite directions for neutral and charged kaons rates. Actually this is a rather stimulating open problem. In our analysis with two completely different approaches, ChPT vs. 
RMF , we get a good agreement for the isoscalar kaon potential but a rather different prediction for the isovector part. However, a study in terms of the different choices of the isovector kaon field has shown that the relative dependence of the strangeness ratios on the stiff- ness of the isovector nuclear EoS remains a well robust observable. This is an important issue in determining the high density behavior of the symmetry energy in more systematic analyses in the future, when more experimental data will be available. Acknowledgments. This work is supported by BMBF, grant 06LM189 and the State Scholarships Foundation (I.K.Y.). It is also co-funded by European Union Social Fund and National funded Pythagoras II - EPEAEK II, under project 80861. One of the authors (V.P.) would like to thank H.H. Wolter and M. Di Toro for the warm hospitality during her short stays at their institutes. References [1] J. M. Lattimer, M.Prakash, Nucl. Phys A777, (2006) 479 and refs. therein; B. Liu, H. Guo, V. Greco, U. Lombardo, M. Di Toro and Cai-Dian Lue, Eur. Phys. J. A22, (2004) 337. [2] T.Klähn et al., Phys. Rev. C74, (2006) 035802. [3] P. Danielewicz, R. Lacey, W.G. Lynch, Science 298, (2002) 1592 [4] N. Hermann, J.P. Wessels, T. Wienold, Annu. Rev. Nucl. Part. Sci. 49, (1999) [5] C. Fuchs et al., Phys. Rev. Lett. 86, (2001) 1974. [6] A.B. Larionov, U. Mosel, Phys. Rev. C72, (2005) 014901. [7] C. Fuchs, Prog. Part. Nucl. Phys. 56, (2006) 1. [8] J. Aichelin, C.M. Ko, Phys. Rev. Lett. 55, (1985) 2661. [9] C. Hartnack, H. Oeschler, J. Aichelin, Phys. Rev. Lett. 90, (2003) 102302. [10] C. Sturm et al. (KaoS Collaboration), Phys. Rev. Lett. 86, (2001) 39. [11] A. Andronic et al. (FOPI Collaboration), Phys. Lett. B612, (2005) 173. [12] Bao-An Li, Phys.Rev.C71, (2005) 014608. [13] Qing-feng Li, Zhu-xia Li, En-guang Zhao, Raj K. Gupta, Phys. Rev. C71, (2005) 054907. [14] G. Ferini, M. Colonna, T. Gaitanos, M. Di Toro, Nucl. Phys. A762, (2005) 147-166. [15] G. Ferini, T. Gaitanos, M. 
Colonna, M. Di Toro, H.H. Wolter, Phys. Rev. Lett. 97, (2006) 202301.. [16] A.B. Larionov, W. Cassing, S. Leupold, U. Mosel, Nucl.Phys. A696, (2001) [17] C. Fuchs et al., Phys. Rev. C 64, (2001) 024003. [18] B. Ter Haar, R. Malfliet, Phys. Rev. C36, (1987) 1611. [19] T. Gaitanos, C. Fuchs, H.H. Wolter, Phys. Lett. B609, (2005) 241; E. Santini, T. Gaitanos, M. Colonna, M. Di Toro, Nucl. Phys. A756, (2005) T. Gaitanos, C. Fuchs, H.H. Wolter, Prog. Part. Nucl. Phys. 53, (2004) 45. [20] D.B. Kaplan, A.E. Nelson, Phys. Lett. B175, (1986) 57; A.E. Nelson, D.B. Kaplan, Phys. Lett. B192, (1987) 193. [21] G.Q. Li, C.H. Lee, G.E. Brown, Nucl. Phys. A625, (1997) 372. [22] J. Schaffner-Bielich, I.N. Mishustin, J. Bondorf, Nucl. Phys. A625, (1997) 325. [23] E.L. Bratkovskaya, W. Cassing, U. Mosel, Nucl. Phys. A622, (1997) 593. [24] X. Lopez et al. (FOPI Collaboration), submitted for publication. [25] L.P. Kadanoff, G. Baym, Quantum Statistical Mechanics (Benjamin, New York, 1962). [26] B. Blättel, V. Koch, U. Mosel, Rep. Prog. Phys. 56, (1993) 1. [27] P. Danielewicz, Nucl. Phys. A 673, 375 (2000). [28] A.B. Larionov et al., Phys. Rev. C62, 064611 (2000). [29] T. Gaitanos, C. Fuchs, H.H. Wolter, Amand Faessler, Eur.Phys.J. A12, (2001) [30] T. Gaitanos, C. Fuchs, H.H. Wolter, Nucl.Phys. A741, (2004) 287. [31] D. Persam, C. Gale, Phys. Rev. C65, (2002) 064611. [32] P. Danielewicz, Acta Phys. Polon. B33, (2002) 45. [33] A.B. Larionov, U. Mosel, Nucl.Phys. A728, (2003) 135. [34] H. Huber, J. Aichelin, Nucl. Phys. A573, (1994) 587. [35] E.N.E. van Dalen, C. Fuchs, Amand Faessler, Phys, Rev. C72, (2005) 065803. [36] S. Weinberg, Phys. Rev. 166, (1968) 1568. [37] G.E.Brown, M.Rho, Nucl. Phys. A596, (1996) 503. [38] C. Fuchs, H.H. Wolter, Nucl. Phys. A589, (1995) 732. [39] J.D. Walecka, Ann. Phys. (N.Y.) 83, (1974) 491. [40] K. Tsushima, A. Sibirtsev, A.W. Thomas, G.Q. Li, Phys. Rev. C59, (1999) K. Tsushima, S.W. Huang, Amand Faessler, Phys. Lett. B337, (1994) 245; Austral. J. Phys. 
50, (1997) 35 (nucl-th/9602005). [41] D. Pelte et al., Z. Phys. A357, (1997) 215; More precise pion data have been recently published in W. Reisdorf et. al., FOPI Collaboration, Nucl. Phys. A781 (2007) 459. [42] D. Best et al. (FOPI collaboration), Nucl. Phys. A625, (1997) 307 [43] M. Menzel et al. (KaoS collaboration), Phys. Lett. B495, (2000) 26. [44] R. Barth et al. (KaoS collaboration), Phys. Rev. Lett. 78, (1997) 4007; P. Senger, H. Ströbele, J. Phys. G25, (1999) R59. ABSTRACT The effect of possible in-medium modifications of nucleon-nucleon ($NN$) cross sections on particle production is investigated in heavy ion collisions ($HIC$) at intermediate energies. In particular, using a fully covariant relativistic transport approach, we see that the density dependence of the {\it inelastic} cross sections appreciably affects the pion and kaon yields and their rapidity distributions. However, the $(\pi^{-}/\pi^{+})$- and $(K^{0}/K^{+})$-ratios depend only moderately on the in-medium behavior of the inelastic cross sections. This is particularly true for kaon yield ratios, since kaons are more uniformly produced in high density regions. Kaon potentials are also suitably evaluated in two schemes, a chiral perturbative approach and an effective meson-quark coupling method, with consistent results showing a similar repulsive contribution for $K^{+}$ and $K^{0}$. As a consequence we expect rather reduced effects on the yield ratios. We conclude that particle ratios appear to be robust observables for probing the nuclear equation of state ($EoS$) at high baryon density and, particularly, its isovector sector. <|endoftext|><|startoftext|> Introduction One famous conjecture of Erdös and Turán [2] asserts that any set A ⊂ N with∑ a∈A 1/a = ∞ should contain infinitely many progressions of arbitrary length k ≥ 3. 
There are two important advances in this direction, due to Szemerédi [7] and Green and Tao [5] respectively, which assert that if A has positive upper density or A is the set of the prime numbers, then A contains infinitely many progressions of arbitrary length. If one considers the similar question in the two-dimensional plane, Graham [4] conjectured that if B ⊂ N × N satisfies

$$\sum_{(x,y)\in B} \frac{1}{x^2+y^2} = \infty,$$

then B contains the four vertices of an axes-parallel square. More generally, for any s ≥ 2 it should be true that B contains an s × s axes-parallel grid. Furstenberg and Katznelson [3] proved the two-dimensional Szemerédi theorem, that is, any set B ⊂ N × N with positive upper density contains an s × s axes-parallel grid. In other words, such a set B contains any finite pattern. The purpose of this paper is to show that if the Graham conjecture is true, then the Erdös-Turán conjecture is also true.

LIANGPAN LI. Date: April 4, 2007. 2000 Mathematics Subject Classification. 11B25. http://arxiv.org/abs/0704.0555v1

2. The Graham conjecture implies the Erdös-Turán conjecture

Suppose that the Erdös-Turán conjecture is false for k = 3. Then there exists a set A = {a1 < a2 < a3 < · · · } ⊂ N with $\sum_{n\in\mathbb{N}} 1/a_n = \infty$ such that A contains no arithmetic progression of length 3. Define a set B ⊂ N × N by

$$B = \{(a_n + m, m) : n \in \mathbb{N},\ m \in \mathbb{N}\}.$$

Then

$$\sum_{(x,y)\in B} \frac{1}{x^2+y^2} = \sum_{n\in\mathbb{N}}\sum_{m\in\mathbb{N}} \frac{1}{(a_n+m)^2+m^2} \ge \sum_{n\in\mathbb{N}}\sum_{m=1}^{a_n} \frac{1}{(a_n+m)^2+m^2} \ge \sum_{n\in\mathbb{N}}\sum_{m=1}^{a_n} \frac{1}{(a_n+a_n)^2+a_n^2} = \sum_{n\in\mathbb{N}} \frac{1}{5a_n} = \infty.$$

Next we show, arguing by contradiction, that B contains no square. This would mean that the Graham conjecture is false for s = 2. Suppose that for some n, m, l ∈ N, B contains a square of the following form: (an + m, m + l), (an + m + l, m + l), (an + m, m), (an + m + l, m). It follows easily from the construction of B that an − l, an, an + l ∈ A, which yields a contradiction since A contains no arithmetic progression of length 3 according to the initial assumption. Similarly, if the Graham conjecture is true for some s ≥ 2, then the Erdös-Turán conjecture is true for k = 2s − 1.
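The square argument can be sanity-checked on a finite truncation. The sketch below is not part of the paper: the function names (`has_3ap`, `theta`, `find_square`) and the cut-off N are illustrative assumptions. It builds the finite set {(a + m, m) : a ∈ A, 1 ≤ m ≤ N} and searches it by brute force for an axes-parallel square.

```python
def has_3ap(A):
    """True if A contains a 3-term arithmetic progression a, b, 2b - a with b > a."""
    S = set(A)
    return any((2 * b - a) in S for a in S for b in S if b > a)


def theta(A, N):
    """Finite analogue of B: all points (a + m, m) with a in A and 1 <= m <= N."""
    return {(a + m, m) for a in A for m in range(1, N + 1)}


def find_square(B):
    """Return (x, y, l) for some axes-parallel square with all four corners in B, else None."""
    for (x, y) in B:
        for (x2, y2) in B:
            l = x2 - x
            # (x, y) and (x + l, y) form the bottom side; check the two top corners.
            if y2 == y and l > 0 and (x, y + l) in B and (x + l, y + l) in B:
                return (x, y, l)
    return None


# Illustrative input (an assumption): a small 3-AP-free set of integers.
A_free = [1, 2, 4, 5, 10, 11, 13, 14]
print(has_3ap(A_free), find_square(theta(A_free, 14)))
```

For a set that does contain a progression, such as {1, 2, 3}, the same search finds a square, which mirrors the correspondence between squares in B and progressions an − l, an, an + l in A.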
The interested reader can easily provide a proof.

3. Concluding Remarks

Let r(k, N) be the maximal cardinality of a subset A of {1, 2, . . . , N} which is free of k-term arithmetic progressions. Behrend [1] and Rankin [6] showed that

$$r(k,N) \ge N \cdot \exp\left(-c(\log N)^{1/(k-1)}\right).$$

Similarly, let r̃(s, N) be the maximal cardinality of a subset B of {1, 2, . . . , N}^2 which is free of s × s axes-parallel grids. For any set A ⊂ {1, 2, . . . , N}, define

$$\Theta(A) = \{(a+m, m) : a \in A,\ m = 1, 2, \ldots, N\} \subset \{1, 2, \ldots, 2N\}^2.$$

Following the discussion in Section 2, one can easily deduce that if A is free of (2s − 1)-term arithmetic progressions, then Θ(A) is free of s × s axes-parallel grids. Hence

$$\tilde{r}(s, 2N) \ge r(2s-1, N) \cdot N \ge N^2 \exp\left(-c(\log N)^{1/(2s-2)}\right).$$

We end this paper with a question. Does the Erdös-Turán conjecture imply the Graham conjecture?

THE GRAHAM CONJECTURE IMPLIES THE ERDÖS-TURÁN CONJECTURE

References
[1] F.A. Behrend, On sets of integers which contain no three terms in arithmetic progression, Proc. Nat. Acad. Sci. 32 (1946), 331–332.
[2] P. Erdös and P. Turán, On some sequences of integers, J. London Math. Soc. 11 (1936), 261–264.
[3] H. Furstenberg and Y. Katznelson, An ergodic Szemerédi theorem for commuting transformations, J. d’Analyse Math. 34 (1979), 275–291.
[4] R. Graham, Conjecture 8.4.6 in Discrete and Computational Geometry (J.E. Goodman and J. O’Rourke, eds), CRC Press, Boca Raton, NY, p.11.
[5] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, to appear in Ann. of Math.
[6] R.A. Rankin, Sets of integers containing not more than a given number of terms in arithmetic progression, Proc. Roy. Soc. Edinburgh Sect A. 65 (1960/61), 332–344.
[7] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 299–345.
Department of Mathematics, Shanghai Jiaotong University, Shanghai 200240, Peo- ple’s Republic of China E-mail address: liliangpan@sjtu.edu.cn ABSTRACT Erd\"{o}s and Tur\'{a}n once conjectured that any set $A\subset\mathbb{N}$ with $\sum_{a\in A}{1}/{a}=\infty$ should contain infinitely many progressions of arbitrary length $k\geq3$. For the two-dimensional case Graham conjectured that if $B\subset \mathbb{N}\times\mathbb{N}$ satisfies $$\sum\limits_{(x,y)\in B}\frac{1}{x^2+y^2}=\infty,$$ then for any $s\geq2$, $B$ contains an $s\times s$ axes-parallel grid. In this paper it is shown that if the Graham conjecture is true for some $s\geq2$, then the Erd\"{o}s-Tur\'{a}n conjecture is true for $k=2s-1$. <|endoftext|><|startoftext|> Effective conservation of energy and momentum algorithm using switching potentials suitable for molecular dynamics simulation of thermodynamical systems Christopher G. Jesudason ∗ Laboratory of Physics and Helsinki Institute of Physics, P.O.Box 1100, FIN-02015 HUT, Finland. Email: jesu@um.edu.my, chrysostomg@gmail.com 4 April, 2007 Abstract During a crossover via a switching mechanism from one 2-body poten- tial to another as might be applied in modeling (chemical) reactions in the vicinity of bond formation, energy violations would occur due to finite step size which determines the trajectory of the particles relative to the potential interactions of the unbonded state by numerical (e.g. Verlet) integration. This problem is overcome by an algorithm which preserves the coordinates of the system for each move, but corrects for energy dis- crepancies by ensuring both energy and momentum conservation in the dynamics. The algorithm is tested for a hysteresis loop reaction model with an without the implementation of the algorithm. The tests involve checking the rate of energy flow out of the MD simulation box; in the equilibrium state, no net rate of flows within experimental error should be observed. 
The temperature and pressure of the box should also be invariant within the range of fluctuation of these quantities. It is demonstrated that the algorithm satisfies these criteria.

AMS (MSC2000) Subject Classification. 00A71-2, 70H05, 80A20

∗on leave from Chemistry Department, University of Malaya, 50603 Kuala Lumpur, Malaysia. http://arxiv.org/abs/0704.0556v1

1 PRELIMINARIES

The dimeric particle reaction simulated may be written

$$2A \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} A_2 \qquad (1.1)$$

where k1 is the forward rate constant and k−1 is the backward rate constant. The reaction simulation was conducted at extremely high temperatures, which are off-scale and not used in ordinary simulations of LJ (Lennard-Jones) fluids: normally [1] the reduced temperature T∗ ranges over ∼ 0.3 − 1.2, whereas here T∗ ∼ 8.0 − 16.0, well above the supercritical regime of the LJ fluid. At these temperatures, the normal choices for time step increments are not adequate without also taking into account energy-momentum conservation algorithms in regions where there are abrupt changes of gradient [1, 2, 3]. The global literature does not seem to cover such extreme conditions of simulation with discrete time steps using the Verlet velocity algorithm. The units used here are reduced LJ ones [1]. The simulation was at density ρ = 0.70 with 4096 atomic particles which could react. The potentials used are as given in Fig. (1), where rb = 1.20 marks the vicinity where the bond of the dimer is broken and 2 free particles emerge, and rf = 0.85 is the point along the hysteresis potential curve where the dimer is defined to exist for two previously free particles which collide.
The reaction proceeds as follows: all particles interact with the splined LJ pair potential uLJ, except for the dimeric pair (i, j) formed from particles i and j, which interact with a harmonic-like intermolecular potential modified by a switch, u(r), given by

$$u(r) = u_{vib}(r)\,s(r) + u_{LJ}(r)\,[1 - s(r)] \qquad (1.2)$$

where uvib(r) is the vibrational potential given by eq. (1.3) below:

$$u_{vib}(r) = u_0 + \frac{1}{2}k(r - r_0)^2 \qquad (1.3)$$

The switching function s(r) is defined as

$$s(r) = \frac{1}{1 + (r/r_{sw})^n} \qquad (1.4)$$

where s(r) → 1 if r < rsw and s(r) → 0 for r > rsw. The switching function becomes effective when the distance between the atoms approaches the value rsw (see Fig. (1)). The parameters used in the equations that follow include: u0 = −10, r0 = 1.0, k ∼ 2446 (the exact value is determined by the other input parameters), n = 100, rf = 0.85, rb = 1.20, and rsw = 1.11. Particles i and j above also interact via uLJ with all other particles not bonded to them. Full simulation details are given elsewhere [2]; suffice it to say that the activation energy at rf is extremely high, at approximately 17.5. At rf the molecular potential is turned on; at this point there is actually a crossing of the potential curves, although the gradients of the molecular and free uLJ potentials are "very close". On the other hand, at rb the switch forces the two curves to coalesce, but detailed examination shows that there is an energy gap of about the same magnitude as the cut-off point in a normal non-splined LJ potential (∼ 0.04 energy units), meaning there is no crossing of the potentials. The current algorithm is applied for both these cross-over regions with their different mechanisms of cross-over.
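As a rough illustration of how the switched dimer potential behaves with the quoted parameters, the following sketch evaluates it numerically. It rests on several assumptions that are not confirmed by the text: a plain 12-6 LJ form stands in for the splined uLJ, the vibrational potential is taken as u0 + (1/2)k(r − r0)^2, the switch is taken as 1/(1 + (r/rsw)^n), and k is fixed by requiring the two curves to cross at rf. All function and constant names are hypothetical.

```python
# Parameters quoted in the text (reduced LJ units).
U0, R0, N_EXP = -10.0, 1.0, 100
R_F, R_B, R_SW = 0.85, 1.20, 1.11


def u_lj(r):
    """Plain 12-6 LJ potential (assumption: the paper uses a splined variant)."""
    inv6 = r ** -6
    return 4.0 * (inv6 * inv6 - inv6)


# Assumption: k follows from demanding u_vib(r_f) = u_LJ(r_f), i.e. a crossing at r_f.
K = 2.0 * (u_lj(R_F) - U0) / (R_F - R0) ** 2


def u_vib(r):
    """Harmonic vibrational potential (assumed form with the factor 1/2)."""
    return U0 + 0.5 * K * (r - R0) ** 2


def s(r):
    """Switching function (assumed form): close to 1 below r_sw, close to 0 above."""
    return 1.0 / (1.0 + (r / R_SW) ** N_EXP)


def u_dimer(r):
    """Switched pair potential of eq. (1.2) for a bonded pair."""
    sw = s(r)
    return u_vib(r) * sw + u_lj(r) * (1.0 - sw)


print("k =", K)
print("u_vib(r_f) =", u_vib(R_F))
print("gap at r_b =", u_dimer(R_B) - u_lj(R_B))
```

Under these assumptions the derived k lands close to the quoted value of about 2446 and the barrier at rf is close to 17.5, while a small positive gap remains between the switched curve and the LJ curve at rb. The size of that gap depends on the spline, which is not modelled here.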
The MD cell is rectangular, with unit distance along the axis (x direction) of the cell length, whereas the breadth and height were both 1/16, implying a thin pencil-like system. The thermostats were placed at the ends of the MD cell (orthogonal to the x axis), in the vicinity of x = 0 and x = 1, and maintained at temperatures Th and Tl. The energy supplied per unit time step δt at each end could be monitored; it is denoted εh and εl respectively. At equilibrium (when Th = Tl), the net energy supplied is zero within statistical error (meaning 1-3 units of the standard error of the ε distributions), i.e. εl ≈ εh ≈ 0. The cell is divided uniformly into 64 rectangular regions along the x axis, and its thermodynamical properties of temperature and pressure are probed. The resulting values of the ε’s and the relative invariance of the pressure and temperature profiles would be a measure of the accuracy of the algorithm from a thermodynamical point of view at the steady state. For systems with a large number of particles such thermodynamical criteria are appropriate. The synthetic thermostats now frequently used in conjunction with "non-Hamiltonian" MD [3] cannot be employed for this type of study, where actual energy increments are sampled. The runs were for 4 million time steps, with averages taken over 100 dumps, where each variable is sampled every 20 time steps. The final averages were over the 20-100 dump values of averaged quantities.

Figure 1: Potentials used for this work (intermolecular potential, switching function s(r) and atomic LJ potential, plotted against r in LJ distance units).
The temperature T and pressure P are computed by the equipartition and Virial expressions, given respectively by

$$\sum_i \mathbf{p}_i\cdot\mathbf{p}_i/m_i = 3Nk_BT \quad\text{and}\quad P = \rho k_B T + W/V,$$

where

$$W = -\frac{1}{3}\sum_i\sum_{j>i} w(r_{ij})$$

and the intermolecular pair Virial w(r) is given by

$$w(r) = r\,\frac{dv(r)}{dr}$$

with v being the potential.

Figure 2: Temperature profile across the cell (plotted against x layer number) for different set conditions a−e for temperature T∗ and step time δt pairs (T∗, δt), where a = (8.0, 2.0 ep−3), b = (8.0, 5.0 ep−4), c = (8.0, 5.0 ep−5), d = (12.0, 5.0 ep−5), e = (16.0, 5.0 ep−5). The curves {l1, l3, t1, t2, t3} result from the application of the algorithm at rb and rf with associated conditions l1 ⇔ a, l3 ⇔ b, t1 ⇔ c, t2 ⇔ d, t3 ⇔ e, whilst the curves {l2, l4, l5, l6, l7} are for the cases without implementing the algorithm, with the associated conditions l2 ⇔ a, l4 ⇔ b, l5 ⇔ c, l6 ⇔ d, l7 ⇔ e, where ep x ≡ 10^x.

Figure 3: Pressure profile across the cell (plotted against x layer number) for different runs. The conditions of the runs and the labeling of the curves are exactly as in Fig. (2).

Curve   εh                      εl                      Mean Temperature
l1      -.2274E+00 ±0.19E-02    -.2295E+00 ±0.21E-02    0.9063E+01 ±0.62E-02
l2      -.5602E+00 ±0.22E-02    -.5596E+00 ±0.22E-02    0.1032E+02 ±0.63E-02
l3      -.4161E-01 ±0.14E-02    -.4089E-01 ±0.14E-02    0.8774E+01 ±0.79E-02
l4      -.5201E-01 ±0.16E-02    -.5103E-01 ±0.17E-02    0.8980E+01 ±0.98E-02
t1      -.5312E-03 ±0.92E-03    -.3334E-03 ±0.76E-03    0.8082E+01 ±0.49E-02
l5      0.1311E-02 ±0.82E-03    0.1147E-02 ±0.84E-03    0.7731E+01 ±0.97E-02
t2      -.6823E-03 ±0.12E-02    -.1507E-02 ±0.13E-02    0.1216E+02 ±0.17E-01
l6      0.7291E-02 ±0.12E-02    0.6343E-02 ±0.14E-02    0.1088E+02 ±0.15E-01
t3      -.9348E-03 ±0.18E-02    -.3379E-02 ±0.17E-02    0.1622E+02 ±0.18E-01
l7      0.1918E-01 ±0.14E-02    0.1938E-01 ±0.16E-02    0.1329E+02 ±0.20E-01

Table 1: Values for the mean heat supply per unit step and the mean temperature. The error is one unit of standard error for the quantities.

2 ALGORITHM AND ANALYSIS OF NUMERICAL RESULTS

The velocity Verlet algorithm [4, p.
81] used here [1] and allied types generate a trajectory at time nδt from that at (n − 1)δt with step increment δt through a mapping Tm, where

$$(\mathbf{v}(n\delta t), \mathbf{r}(n\delta t)) = T_m\big(\mathbf{v}((n-1)\delta t), \mathbf{r}((n-1)\delta t)\big),$$

which does not scale linearly with δt. For a Hamiltonian H whose potential V is dependent only on position r, having momentum components pi, the system without external perturbation has constant energy E, and the normal assumption in MD (NAMD) is that for the nth step, $\Delta E_n = |H(n\delta t) - E| \le \epsilon$ and also $\sum_{i=1} \Delta E_i \le \epsilon_s$ for the specified ε’s. In the simulation under NAMD, the force fields are constant and do not change for any one time step. In these cases, the energy is a constant of the motion for any time interval δtT when no external perturbations (e.g. due to thermostat interference) are impressed. When there is a crossing of potentials in such a time interval from φb to φa at an inter particle distance (icd) rc (such as points rf and rb of Fig. (1)) of general particles 1 and 2 (the (1, 2) particle pair) due to a reactive process (such as occurs in either direction of (1.1)), a bifurcation occurs where the MD program computes the next step coordinates as for the unreacted system (potential φb), which needs to be corrected. Let the icd at time step i be r_i (with φb potential) and at step i + 1 after interval δt be r_f = r_{i+1}, where r_f < r_c < r_i. Due to this crossover, a different Hamiltonian H′ is operative after point rc is crossed, where under NAMD, the other coordinates not undergoing crossover are not affected. For what follows, subscripts refer to the particle concerned. Let the interparticle potential at rf be Ea = Ef = φa(rf) and at rf be Eb = φb(rf), where ∆ = Eb − Ea.
Then if rf be the final coordinate due to the φb potential and force field, two questions may be asked: (i) Can the velocities of (1,2) be scaled, so that there is no energy or momentum violation during the crossover based on the φb trajectory calculation, and (ii) Can a pseudo stochastic potential be imposed from coordinates rc (at virtual time tc) to rf such that (i) above is true? For (ii) we have

Theorem 2.1 A virtual potential which scales velocities to preserve momentum and energy can be constructed about region rc.

Proof. The external work done δW on particles 1 and 2 over the time step is proportional to the distance traveled since these forces are constant, and so for each of these particles i, $\mathbf{F}_{ext,i}\cdot\Delta\mathbf{r}_i = \delta W_i$, where ∆ri is the distance increment during at least part of the time step from rc to rf. For the non-reacting trajectory over time λδt (λ ≤ 1) (virtual because it is not the correct path due to the crossover at rc),

$$\delta W_2 + \delta W_1 - (\phi_b(r_f) - \phi_b(r_c)) = \Delta(K.E.) \qquad (2.1)$$

where ∆(K.E.) is the change of kinetic energy for the (1, 2) pair from the First Law between the end points rf, rc. Now over time interval tc to tf, for the reactive trajectory, we introduce a "virtual potential" V^vir that will lead to the same positional coordinates for the pair at the end of the time step with different velocities than for the non-reactive transition, leading to the transition

$$\delta W_2 + \delta W_1 - (V^{vir}(r_f) - V^{vir}(r_c)) = \Delta'(K.E.) \qquad (2.2)$$

where ∆′(K.E.) is the change of kinetic energy for the pair with V^vir turned on. Along this trajectory, the change of potential for V^vir is equated to the change in the K.E. of the pair as given in the results of theorem (2.2) for all three orthogonal coordinates, i.e. δV^vir(r) − δφb(r) = δ(K.E.x,y,z) − ∆′(K.E.)x,y,z with momentum conservation, that is δV^vir(ri) = δφa(ri) for the variation along the ri coordinate, but δφa(ri) = −δK.E. along internuclear coordinate ri whereas δV^vir = −K.E. (scaled about all three axes).
Continuity of potential implies

$$\phi_a(r_f) = V^{vir}(r_f);\quad \phi_a(r_c) = V^{vir}(r_c);\quad \phi_b(r_c) = V^{vir}(r_c) \qquad (2.3)$$

Subtracting (2.1) from (2.2) and applying the b.c.’s (2.3) leads to

$$\Delta = \phi_b(r_f) - V^{vir}(r_f) = \phi_b(r_f) - \phi_a(r_f) = E_b - E_a \qquad (2.4)$$
$$\phantom{\Delta} = \Delta'(K.E.) - \Delta(K.E.) \qquad (2.5)$$

The above shows that a conservative virtual potential could be said to be operating in the vicinity of the transition (from tc to ta). •

Question (i) above leads to:

Theorem 2.2 Relative to the velocities at any rf due to the φb potential, the rescaled velocities v′ due to the potential difference ∆, leading to these final velocities due to the virtual potential, can have a form given by

$$\mathbf{v}'_i = (1 + \alpha)\mathbf{v}_i + \boldsymbol{\beta} \qquad (2.6)$$

(where i = 1, 2) for a vector β.

Proof. From the v velocities at rf due to φb we compute the v′ velocities at rf due to the virtual potential. Since the net change of momentum is due to the external forces only, which is invariant for the (1, 2) pair, conservation of total momentum relating v′ and v in (2.6) yields a definition of β (summation from 1 to 2 for what follows, where the mass of particle i is mi):

$$\boldsymbol{\beta} = -\alpha \sum_i m_i\mathbf{v}_i \Big/ \sum_i m_i \qquad (2.7)$$

Defining for any vector $s^2 = \mathbf{s}\cdot\mathbf{s}$, we have $\beta^2 = \alpha^2 Q$, where

$$Q = \Big(\sum_i m_i\mathbf{v}_i\Big)^2 \Big/ \Big(\sum_i m_i\Big)^2 \qquad (2.8)$$

then the rescaled velocities become from (2.6)

$$v_i'^2 = (1 + \alpha)^2 v_i^2 + 2(1 + \alpha)\mathbf{v}_i\cdot\boldsymbol{\beta} + \beta^2. \qquad (2.9)$$

With ∆ = Eb − Ea, energy conservation implies

$$\frac{1}{2}\sum_i m_i\left(v_i'^2 - v_i^2\right) = \Delta \qquad (2.10)$$

The coupling of (2.9-2.10) leads, after several steps of algebra, to

$$\frac{\alpha^2 m_1 m_2}{2(m_1 + m_2)}\left(v_1^2 + v_2^2 - 2\mathbf{v}_1\cdot\mathbf{v}_2\right) + \frac{2\alpha m_1 m_2}{2(m_1 + m_2)}\left(v_1^2 + v_2^2 - 2\mathbf{v}_1\cdot\mathbf{v}_2\right) = \Delta \qquad (2.11)$$

Defining $a = (\mathbf{v}_1 - \mathbf{v}_2)^2$, $q = m_2 m_1/[2(m_1 + m_2)]$, (q > 0, a ≥ 0), the above is equivalent to the quadratic equation

$$\alpha^2 qa + 2qa\alpha - \Delta = 0 \qquad (2.12)$$

and in simulations, only α is unknown and can be determined from (2.12), where real solutions exist for ∆/qa ≥ −1. •

The above inequality leads to a certain asymmetry concerning forward and backward reactions, even for reversible reactions where the regions of formation and breakdown of molecules are located in the same region, with the reversal of the sign of approximate ∆.
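The rescaling of Theorem 2.2 can be sketched in a few lines. This is an illustrative reading of eqs. (2.6), (2.7) and (2.12), not the author's code: the function name `rescale_pair` is hypothetical, and choosing the root of the quadratic that tends to α = 0 as ∆ → 0 is an assumption, since the text does not say which root is used.

```python
import math


def rescale_pair(m1, v1, m2, v2, delta):
    """Rescale the velocities of a (1, 2) pair so that the kinetic energy changes
    by delta = E_b - E_a while the total momentum is unchanged.

    Returns (v1', v2') as lists, or None when no real alpha exists
    (delta / (q a) < -1, or zero relative velocity), i.e. no crossover.
    """
    rel = [x - y for x, y in zip(v1, v2)]
    a = sum(c * c for c in rel)                # a = (v1 - v2)^2
    q = m1 * m2 / (2.0 * (m1 + m2))            # q = m1 m2 / [2 (m1 + m2)]
    if a == 0.0 or delta / (q * a) < -1.0:
        return None
    # Root of alpha^2 q a + 2 q a alpha - delta = 0 with alpha -> 0 as delta -> 0
    # (assumption: the source does not state which root is taken).
    alpha = -1.0 + math.sqrt(1.0 + delta / (q * a))
    total_mass = m1 + m2
    # beta = -alpha * sum(m_i v_i) / sum(m_i), eq. (2.7)
    beta = [-alpha * (m1 * x + m2 * y) / total_mass for x, y in zip(v1, v2)]
    v1_new = [(1.0 + alpha) * x + b for x, b in zip(v1, beta)]
    v2_new = [(1.0 + alpha) * y + b for y, b in zip(v2, beta)]
    return v1_new, v2_new


# Illustrative values only (not taken from the paper).
print(rescale_pair(1.0, [1.0, 0.0, 0.5], 2.0, [-0.3, 0.2, 0.0], 0.4))
```

Written this way, the transformation leaves the centre-of-mass velocity untouched and scales the relative velocity by (1 + α), which is why the kinetic energy changes by exactly (α² + 2α)qa = ∆.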
For this simulation, a reaction in either direction (formation or breakdown of the dimer) proceeds if the inequality ∆/qa ≥ −1 above is satisfied; if not, the trajectory follows the initial one without any reaction (i.e. no potential crossover).

Interpretation of results. Fig. (1) shows a rapidly changing potential curve with several inflexion points, used in the simulation at very high temperature (as far as I know such ranges have not been reported in the literature for non-synthetic methods). This warrants smaller time steps; larger ones would introduce errors due to the rapidly changing potential and high K.E. Thus, even with the application of the algorithm between coordinates rf and rb, curves l1 and l2 have too large a δt value to achieve equilibrium (meaning flat or invariant) profiles of temperature (see Fig. (2)), pressure (see Fig. (3)) or unit step thermostat heat supply εh and εl (see Table 1); for these curves, the (εh, εl) values show net heat absorption. The curve t1 (with δt = 5.0 ep−5) shows flat profiles (within statistical fluctuations and 2 standard errors of variation) for temperature and pressure, and net zero heat supply. This choice of time step interval was also found adequate for runs at much higher temperatures (T = 12 and T = 16), which were used to determine thermodynamical properties [2]. For this δt value and all others, no reasonable stationary equilibrium conditions could be obtained without the application of the algorithm (curves l2, l4, l5, l6 and l7). The algorithm is seen to be effective over a wide temperature range for this complex dimer reaction simulated under extreme values of thermodynamical variables, and the results here do not vary for longer runs and greater sampling statistics (e.g. 6 or 10 million time steps).
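The heat-supply criterion used above (net thermostat heat supply equal to zero within a few standard errors) can be written as a short check. The sketch below is only an illustration: the helper name `zero_within` and the default threshold of 2 standard errors are assumptions, and the numbers are copied from three rows of Table 1. It tests only the heat-supply criterion; the flatness of the temperature and pressure profiles is a separate test.

```python
def zero_within(value, stderr, n_se=2.0):
    """True if value is compatible with zero within n_se standard errors."""
    return abs(value) <= n_se * stderr


# (eps_h, se_h, eps_l, se_l) for three curves of Table 1.
table1 = {
    "l1": (-0.2274, 0.19e-2, -0.2295, 0.21e-2),       # with algorithm, dt = 2.0e-3
    "t1": (-0.5312e-3, 0.92e-3, -0.3334e-3, 0.76e-3),  # with algorithm, dt = 5.0e-5
    "l7": (0.1918e-1, 0.14e-2, 0.1938e-1, 0.16e-2),    # without algorithm, T* = 16
}

for curve, (eps_h, se_h, eps_l, se_l) in table1.items():
    ok = zero_within(eps_h, se_h) and zero_within(eps_l, se_l)
    print(curve, "zero net heat supply within 2 s.e.:", ok)
```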
The thin, pencil-like geometry of the rectangular cell with thermostats located at the ends would highlight the energy non-conservation leading to a non-flat temperature distribution, as observed and which was used to determine the regime of validity of the algorithm. References [1] J. M. Haile,Molecular Dynamics Simulation,JohnWiley & Sons,Inc.,New York, 1992. [2] C. G. Jesudason, Model hysteresis dimer molecule. I. Equilibrium prop- erties. J. Math. Chem. JOMC, Accepted 2006. [3] D. Frenkel and B. Smit, Understanding Molecular Simulations: From Algorithms to Applications, Vol(1) of Computational Science Series, Aca- demic Press, San Diego, Second Ed., 2002. [4] M.P. Allen and D. J. Tildesley, Computer Simulation of Liquids,Oxford Univ. Press, Oxford, 1992 PRELIMINARIES ALGORITHM AND AND ANALYSIS OF NUMERICAL RESULTS ABSTRACT During a crossover via a switching mechanism from one 2-body potential to another as might be applied in modeling (chemical) reactions in the vicinity of bond formation, energy violations would occur due to finite step size which determines the trajectory of the particles relative to the potential interactions of the unbonded state by numerical (e.g. Verlet) integration. This problem is overcome by an algorithm which preserves the coordinates of the system for each move, but corrects for energy discrepancies by ensuring both energy and momentum conservation in the dynamics. The algorithm is tested for a hysteresis loop reaction model with an without the implementation of the algorithm. The tests involve checking the rate of energy flow out of the MD simulation box; in the equilibrium state, no net rate of flows within experimental error should be observed. The temperature and pressure of the box should also be invariant within the range of fluctuation of these quantities. It is demonstrated that the algorithm satisfies these criteria. <|endoftext|><|startoftext|> Baltic Astronomy, vol. 16, xxx–xxx, 2007. 
MIXED CHEMISTRY PHENOMENON DURING LATE STAGES OF STELLAR EVOLUTION R. Szczerba, M.R. Schmidt, M. Pulecka1 1 Nicolaus Copernicus Astronomical Center, ul. Rabiańska 8, 87-100 Toruń, Poland Received 2006 October 15; revised — Abstract. We discuss phenomenon of simultaneous presence of O- and C- based material in surroundings of evolutionary advanced stars. We concentrate on silicate carbon stars and present observations that directly confirm the binary model scenario for them. We discuss also class of C-stars with OH emission detected, to which some [WR] planetary nebulae do belong. Key words: stars: Asymptotic Giant Branch, carbon stars, chemical com- position, planetary nebulae, stars: individual (V778 Cyg, IRAS 04496−6859, IRAS 06238+0904, M 2−43) 1. INTRODUCTION During Asymptotic Giant Branch (AGB) phase of evolution stars with ini- tial masses between 0.8 and 8M⊙ lose a significant amount of their initial mass by ejecting the matter into interstellar space with rates between 10−7 and 10−4 M⊙ yr −1. The chemistry in the formed circumstellar envelopes is determined by the photospheric C/O ratio and is O-based for n(O)>n(C) (usually less evolution- ary advanced stars) and C-based when carbon abundance exceeds that of oxygen (evolved stars which experienced thermal pulses and dredged-up carbon to the surface). This dichotomy is a consequence of CO molecule (very stable) formation which is so efficient that less abundant element (C or O) is mostly used. There- fore, the detection of co-existence of O-rich and C-rich material in surroundings of evolved stars was (and still is) surprising and attracts a significant attention. Hereafter, we call this phenomenon a mixed chemistry phenomenon. Already, due to the IRAS observations it was realized that there is a group of carbon stars which show typical for O-rich environment the 9.7 and 18µm amorphous silicate features (Little-Marenin 1986, Willems & de Jong 1986). 
The Infrared Space Observatory (ISO) observations (Yamamura et al. 2000) showed that the 9.7 µm feature in one such object (V778 Cyg) is very stable and did not change during the 15 years between the IRAS and ISO observations. This put a very strong constraint on the model and evolutionary status of this class of objects, the most likely explanation being a long-lived reservoir of O-rich material located inside or around a binary system. In this review we discuss MERLIN interferometer observations of V778 Cyg which proved the existence of such a reservoir (disk) around the companion of the C-rich star. We note that recent Spitzer Space Telescope (SST) data showed that the first extra-galactic silicate carbon star (IRAS 04496−6859, Trams et al. 1999) is in fact a normal carbon star and does not show the 9.7 µm dust emission (see Speck et al. 2006). http://arxiv.org/abs/0704.0557v1 There is another group of carbon stars suspected to have mixed chemistry, namely carbon stars with OH maser emission. Lewis (1992) listed a group of stars with SiC emission seen in the IRAS Low Resolution Spectra (LRS) and with detected OH maser emission. While most of these sources turned out to have a wrong LRS classification, 3 C-stars with OH maser emission remained, and Chen et al. (2001) added 6 more sources to this class. However, this class of mixed chemistry sources did not attract significant attention (except for [WR] planetary nebulae, see below), since the OH emission is not well resolved spatially and this group of sources may result from a spatial coincidence between OH maser emission from the interstellar medium and the location of a C-star. For example, Szczerba et al. (2002) presented observational evidence that IRAS 05373−0810 (a C-star with OH maser emission) is a genuine carbon star and that the OH maser and SiO thermal emission detected toward this star comes not from its envelope but from molecular clouds.
Here we discuss the case of another C-star with OH maser emission (IRAS 06238+0904), toward which we have detected, using the IRAM radiotelescope, SiO thermal emission coming from its envelope. We present arguments that shock and Photon Dominated Region (PDR) chemistry allow a significant amount of SiO to form in a C-rich environment. One of the most important achievements of the ISO mission was the detection of crystalline silicates. Surprisingly, crystalline silicates were also detected in [WR] planetary nebulae, which have H-poor and C-rich central stars of WR type¹. [WR] planetary nebulae show the simultaneous presence of Polycyclic Aromatic Hydrocarbons (PAHs) and crystalline silicates (Waters et al. 1998, Cohen et al. 1999). Scenarios proposed to explain the simultaneous presence of PAHs and crystalline silicates include: destruction of fossil comets orbiting the star, ejection of matter before the star becomes C-rich, and formation of a stable O-rich disk or torus around a companion or around the system at some point of binary evolution. Hajduk, Szczerba & Gesicki (these Proceedings) present an attempt to determine the spatial location of PAHs and crystalline silicates inside the [WR] planetary nebula M 2-43 by means of radiative transfer modelling of its ISO spectrum. They concluded that the crystalline silicates have to be located at a significant distance from the central star to avoid emission at about 10 µm. We also note an attempt to find precursors of [WR] planetary nebulae (C-rich stars with C- and O-rich material in their circumstellar shells) among proto-planetary nebulae (Szczerba et al. 2003). The authors argued that crystalline silicates must form before the proto-planetary nebula phase, while the post-AGB star may still be O-rich and change into a C-rich one during the fatal thermal pulse.
They indicated five proto-planetary nebulae as possible precursors of [WR] planetary nebulae, including the famous Red Rectangle, another C-rich source with crystalline silicates (IRAS 16279-4757), as well as three O-rich sources which show the presence of crystalline silicates in their ISO spectra: AC Her, IRAS 18095+2704 and IRAS 19244+1115. In this review we will not cover such cases as: NGC 6302, an O-rich planetary nebula which shows the presence of crystalline silicates as well as PAHs (e.g. Kemper et al. 2002); HD 233517, an evolved O-rich red giant with orbiting polycyclic aromatic hydrocarbons (Jura et al. 2006); IRAS 09425−6040, a carbon-rich AGB star with the highest abundance of crystalline silicates detected up to now (Molster et al. 2001); IRC +10216, a well known C-rich AGB star with detected water and OH maser lines (Melnick et al. 2001, Ford et al. 2003); and possibly some other spectacular sources which we have unintentionally overlooked.
¹ Note that Zijlstra et al. (1991) detected OH maser emission from the [WR] planetary nebula IRAS 07027−7934. Therefore, at least this [WR] planetary nebula also belongs to the class of C-stars with OH maser emission discussed above.
2. V778 CYG: A SILICATE CARBON STAR To test the hypotheses related to the mixed chemistry phenomenon observed in silicate carbon stars, we observed water masers towards V778 Cyg at high angular resolution. Details of our observations and data analysis are presented by Szczerba et al. (2006), so here we repeat only some of the most important points and findings. The observations were taken on 2001 October 12/13 under good weather conditions, using five telescopes of MERLIN. The longest MERLIN baseline of 217 km gave a fringe spacing of 12 mas at 22.235080 GHz. The bandwidth was 2 MHz with 256 spectral channels per baseline, providing a channel separation of 0.105 km s⁻¹. The continuum calibrator sources were observed in a 16 MHz band with 16 channels.
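As a rough cross-check of the observing parameters quoted above, the fringe spacing can be estimated as λ/B and the channel separation as cΔν/ν. The sketch below is illustrative only: the function names are ours and are not part of the MERLIN or AIPS software, and λ/B is just one common convention for the fringe spacing, so it gives a value close to, but not identical with, the quoted 12 mas.

```python
import math

C_M_PER_S = 299_792_458.0                            # speed of light in m/s
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0     # radians -> milliarcseconds


def fringe_spacing_mas(baseline_m, freq_hz):
    """Fringe spacing lambda/B in milliarcseconds (assumed convention)."""
    wavelength_m = C_M_PER_S / freq_hz
    return wavelength_m / baseline_m * RAD_TO_MAS


def channel_separation_kms(bandwidth_hz, n_channels, freq_hz):
    """Velocity width of one spectral channel, c * dnu / nu, in km/s."""
    dnu_hz = bandwidth_hz / n_channels
    return C_M_PER_S * dnu_hz / freq_hz / 1000.0


if __name__ == "__main__":
    nu = 22.235080e9  # H2O maser rest frequency quoted in the text, in Hz
    print(f"fringe spacing   ~ {fringe_spacing_mas(217e3, nu):.1f} mas")
    print(f"channel spacing  ~ {channel_separation_kms(2e6, 256, nu):.3f} km/s")
```

With the 217 km baseline and the 2 MHz, 256-channel setup, the second function recovers the quoted channel separation of about 0.105 km s⁻¹.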
The data were obtained in left and right circular polarisation and the velocities were measured with respect to the local standard of rest. We used the phase referencing method; 4 min scans on V778 Cyg were interleaved with 2 min scans on the source 2021+614 (3.8° from the target) over 11.5 h. The K-band flux density of 2021+614, 1.48 Jy, was derived from observations of 4C39.25. At the epoch of observation the flux density of 4C39.25 was 7.5±0.3 Jy (Terasranta 2002, private communication). This source was also used for bandpass calibration. After initial calibration with the MERLIN software the data were processed using the AIPS package. In order to derive phase and amplitude corrections for atmospheric and instrumental effects, the phase reference source was mapped and self-calibrated. These corrections were applied to the target visibility data. The absolute position of the brightest feature at −15.1 km s⁻¹ was determined. The phase solutions for this feature were obtained by self-calibration and were then applied to all channels. The target was mapped and cleaned using a 12 mas circular restoring beam. The map noise of ∼27 mJy beam⁻¹ for the Stokes I parameter in a line-free channel was close to the predicted thermal noise level. In order to determine the positions and brightnesses of the maser components, two-dimensional Gaussian components were fitted to the emission in the channel maps. The position uncertainty of this fitting depends on the signal-to-noise ratio in the channel map and is lower than 1 mas for about 80% of the maser components towards V778 Cyg. The absolute position of the phase reference source is known with an accuracy of ∼3 mas. The largest uncertainties in the absolute positions of the maser spots are due to tropospheric effects and errors in the telescope positions.
The former, estimated by observing the phase rate on the point source 4C39.25, introduce a position error of ∼9 mas, whereas uncertainties in the telescope positions of 1−2 cm cause an error in the spot positions of ∼10 mas. In order to check the position accuracy of the maser spots we applied a reverse phase referencing scheme. The emission of 15 channels around the reference feature at −15.1 km s⁻¹ was averaged and mapped. The map obtained was used as a model to self-calibrate the raw target data; these target solutions were then applied to the raw data of 2021+614. The position of the reference source was shifted by ∼2 mas with respect to the catalogue position. This indicates excellent phase connection when referencing 2021+614 to the set of the brightest maser spots. The factors discussed above imply an absolute position accuracy of the maser source of order ∼25 mas.
Fig. 1. The absolute positions of the H2O 22 GHz maser components towards V778 Cyg relative to the reference feature at −15.1 km s⁻¹ (RA(J2000) = 20h36m07.3833s, DE(J2000) = 60°05′26.024″). The symbols indicate the ranges of component velocities in km s⁻¹. The size of each symbol is proportional to the logarithm of the peak brightness of the corresponding component.
Maser emission brighter than 150 mJy beam⁻¹ (∼5σ) was found in 51 spectral channels. In each of these channels only a single, unresolved component was detected. The overall structure of the H2O maser is shown in Fig. 1. The maser components form a distorted S-shaped structure along a direction of position angle of about −10°. There is a clear velocity gradient along this structure, with the weak southern components blueshifted with respect to the brightest northern components. The angular extent of the maser emission is 18.5 mas. The axis of elongation of the maser structure is roughly perpendicular to the line towards the optical position of V778 Cyg measured by Tycho-2 (see Fig. 2).
The angular separation between the optical star and the maser reference component is 0.192″±0.048″.
Fig. 2. Comparison of the optical position of V778 Cyg as determined in the Tycho-2 catalogue with the radio position of the H2O 22 GHz maser components as obtained from the MERLIN measurements (horizontal axis: relative RA in mas). The epochs of the optical and radio observations differ by about 10 yr.
Szczerba et al. (2006) argued that such a separation cannot be explained by proper motion and instead provides direct observational evidence for the binary system model of Yamamura et al. (2000). They suggested that the observed water maser components can be interpreted as an almost edge-on warped Keplerian disk located around a companion object and tilted by no more than 20° relative to the orbital plane. A more detailed model of the disk around the companion in the V778 Cyg system is presented by Babkovskaia et al. (2006). Finally, note that Ohnaka et al. (2006) recently reported the indirect detection of a disk around another silicate carbon star (IRAS 08002−3803). They argued that oxygen-rich material is stored in a circumbinary disk surrounding the carbon-rich primary star and its putative low-luminosity companion. These two findings may suggest that there are two different kinds of silicate carbon stars: those with a circumbinary disk and those with a disk around the companion only.
3. IRAS 06238+0904: AN OH MASER C-STAR OR A GENUINE CARBON STAR? Genuine carbon stars are formed during evolution on the AGB. A star at this stage possesses an extended circumstellar envelope (CSE). In its inner part (near the photosphere) the physical conditions (T ∼ 2500 K, ρ ∼ 10¹⁴ cm⁻³) make the material mainly molecular, with a composition determined by local thermodynamic equilibrium (LTE). In a carbon-rich CSE (C/O>1) almost no oxygen is left after CO formation. However, silicon monoxide (SiO) is observed in carbon stars. Recent observations (Schöier et al.
2006) show relatively high SiO fractional abundances (1×10⁻⁷ − 5×10⁻⁵), while LTE models give ∼5×10⁻⁸ (Millar 2004). Therefore non-equilibrium processes should be considered in modelling circumstellar chemistry. In this review we focus on IRAS 06238+0904, an OH maser C-star (see Chen et al. 2001). We first built a model of the carbon circumstellar envelope (CSE) and then computed the radiative transfer in the molecular rotational lines HCN J=1-0, CS J=3-2, CS J=5-4 and SiO J=3-2, which we detected with the IRAM radiotelescope. The spectral energy distribution (SED) of IRAS 06238+0904 was modelled by means of the code and method described in Szczerba et al. (1997). The best fit (see Fig. 3) is obtained for a stellar effective temperature T∗ = 2500 K, luminosity to squared-distance ratio L/d² = 6500 L⊙ kpc⁻², mass loss rate Ṁ = 2×10⁻⁵ M⊙ yr⁻¹, dust temperature at the inner boundary Tdust(Rin) = 900 K, and amorphous carbon (AC) and silicon carbide (SiC) to gas ratios ρ(AC)/ρgas = 0.001 and ρ(SiC)/ρgas = 0.00019.
Fig. 3. Spectral energy distribution for IRAS 06238+0904. See text for details concerning assumed and estimated parameters.
The chemical model is computed with a network based on the RATE99 database (Le Teuff et al. 2000), composed of 343 species made of 10 elements. The gas temperature profile is approximated by the power law r^−1.8, established by iteration from the best fits to the CS lines. We assume solar gas composition with modified carbon (C/O=1.5) and sulfur (ε(S)=6.71) abundances. As initial concentrations we take the LTE values of 23 important species, with the SiO number density equal to 1×10⁻⁸ [cm⁻³]. The effect of dust formation is included by reducing Si and C by the amounts locked up in SiC and amorphous carbon grains according to the dust model. This results in a decrease of the silicon abundance to ε(Si)=7.39 and a decrease of the C to O ratio to 1.3.
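The abundances ε(S) and ε(Si) quoted above appear to be given on the usual logarithmic scale, ε(X) = log₁₀[n(X)/n(H)] + 12. This is our assumption, since the text does not define ε. Under that assumption, a minimal sketch of the conversion to number ratios relative to hydrogen could look as follows (the function name is purely illustrative):

```python
def epsilon_to_ratio(epsilon, epsilon_h=12.0):
    """Convert a logarithmic abundance eps(X) to the number ratio n(X)/n(H).

    Assumes the conventional scale on which hydrogen has eps(H) = 12;
    the source text does not state this explicitly.
    """
    return 10.0 ** (epsilon - epsilon_h)


if __name__ == "__main__":
    # Values quoted in the text: sulfur, and silicon after depletion onto SiC dust.
    for name, eps in (("S", 6.71), ("Si", 7.39)):
        print(f"n({name})/n(H) = {epsilon_to_ratio(eps):.2e}")
```

On this scale ε(Si)=7.39 corresponds to a few times 10⁻⁵ silicon atoms per hydrogen atom, which sets the upper limit on how much SiO the network can form.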
The radiative transfer is computed in the Sobolev approximation with molecular data taken from the Leiden database (Schöier et al. 2005). Only interstellar radiation is taken into account as an important source of UV photons. Level populations of the investigated molecules were computed for the assumed temperature and the molecular densities resulting from the chemical model. For the IRAM telescope observations, the half-power beam width (HPBW) is 18.9″ for the SiO rotational transition J=3-2 (ν=130 GHz), 16.7″ for CS(3-2), 10.0″ for CS(5-4) and 28.9″ for HCN(1-0). The synthetic profiles were computed for an assumed distance to IRAS 06238+0904 of 2.3 kpc. The observed and modelled molecular line profiles of SiO(3-2), CS(3-2) and CS(5-4) are shown in the 3 panels of Fig. 4. In the line profile modelling we included a simple treatment of CO self-shielding based on Mamon et al. (1988). This process has considerable influence on all molecules, and is especially important for SiO. As one can see in Fig. 4, when self-shielding is not included (solid line) we can explain the observed spectrum solely by the PDR chemistry: around 1×10¹⁶ cm we find considerable re-formation of SiO. On the other hand, inclusion of CO self-shielding prevents the formation of SiO (dashed line), although partial re-formation in the PDR is still present. In both cases the exchange reaction OH + Si → SiO + H is the main process responsible for the formation of silicon monoxide. Exchanges between atomic oxygen and the SiH, SiC, and HCSi molecules (O + SiH → SiO + H, O + HCSi → SiO + CH, and O + SiC → SiO + C) are also important. Simulation of the shock passage (see Willacy & Cherchneff 1998) raises the initial abundance of SiO by about one order of magnitude in comparison with the LTE value. The profile obtained with the abundance of this molecule increased by a factor of 10 is shown as the dashed-dotted line in Fig. 4. Fig. 4.
Observed and modelled molecular rotational lines without (solid line) and with (dashed line) CO self-shielding. The dashed-dotted line in the left panel shows the results when the initial LTE abundance of SiO is increased ten times due to the shock passage.
Therefore, we can conclude that IRAS 06238+0904 is a genuine C-star and no assumption of mixed chemistry is necessary. The chemical reactions considered in the network can reproduce the O-bearing SiO molecule in a C-rich environment if no CO self-shielding is considered. In the presence of CO self-shielding the computed SiO emission is too low. This may be improved, however, if we consider the effect of shock passage, which can increase the initial SiO abundance by an order of magnitude, as predicted by Willacy & Cherchneff (1998).
ACKNOWLEDGMENTS. This work has been supported by grants 2.P03D.017.25 and 1.P03D.010.29 of the Polish State Committee for Scientific Research.
REFERENCES
Babkovskaia N., Poutanen J., Richards A. M. S., Szczerba R. 2006, MNRAS, 370,
Chen P. S., Szczerba R., Kwok S., Volk K. 2001, A&A, 368, 1006
Cohen M., Barlow M. J., Sylvester R. J. et al. 1999, ApJ, 513, L135
Ford K. E. S., Neufeld D. A., Goldsmith P. F., Melnick G. J. 2003, ApJ, 589, 430
Jura M., Bohac C. J., Sargent B. et al. 2006, ApJ, 637, L45
Lewis B. M. 1992, ApJ, 396, 251
Le Teuff Y. H., Millar T. J., Markwick A. J. 2000, A&AS, 146, 157
Little-Marenin I. 1986, ApJ, 307, L15
Mamon G. A., Glassgold A. E., Huggins P. J. 1988, ApJ, 328
Melnick G. J., Neufeld D. A., Ford K. E. S. et al. 2001, Nature, 412, 160
Millar T. J. 2004, in AGB stars, eds. H. J. Habing, H. Olofsson, 247
Molster F. J., Yamamura I., Waters L. B. F. M. et al. 2001, A&A, 366, 923
Ohnaka K., Driebe T., Hoffman K.-H. et al. 2006, A&A, 445, 1015
Schöier F., Olofsson H., Lundgren A. 2006, A&A, 454, 247
Schöier F., van der Tak F., van Dishoeck E., Black J. H. 2005, A&A, 432, 369
Speck A., Cami J., Markwick-Kemper C. et al. 2006, ApJ, 650, 892
Szczerba R., Chen P.
S., Szymczak M., Omont A. 2002, A&A, 381, 491
Szczerba R., Omont A., Volk K. et al. 1997, A&A, 317, 859
Szczerba R., Stasińska G., Siódmiak N., Górny S. K. 2003, in Exploiting the ISO Data Archive. Infrared Astronomy in the Internet Age, eds. C. Gry, S. Peschke, J. Matagne, P. Garcia-Lario, R. Lorente, A. Salama, ESA SP-511, 149
Szczerba R., Szymczak M., Babkovskaia N. et al. 2006, A&A, 452, 561
Trams N. R., van Loon J. Th., Zijlstra A. A. et al. 1999, A&A, 344, L17
Waters L. B. F. M., Beintema D. A., Zijlstra A. A. 1998, A&A, 331, L61
Willacy K., Cherchneff I. 1998, A&A, 330, 676
Willems F. J., de Jong T. 1986, ApJ, 309, L39
Yamamura I., Dominik C., de Jong T. 2000, A&A, 363, 629
Zijlstra A. A., Gaylard M. J., Te Lintel Hekkert P. et al. 1991, A&A, 243, L9
<|endoftext|><|startoftext|> Introduction Let $X^3 \subset \mathbb{P}^4$ be a smooth cubic threefold. Then its intermediate Jacobian $J(X) := H^{2,1}(X,\mathbb{C})^*/H_3(X,\mathbb{Z})$ is a principally polarised abelian variety $(J(X),\Theta)$ of dimension five that is not the Jacobian of a curve [4, Thm.0.12]. The Fano scheme $F$ parametrising lines contained in $X$ is a smooth surface, and the Abel-Jacobi map $F \to J(X)$ is an embedding that induces an isomorphism $\mathrm{Alb}(F) \simeq J(X)$ [4, Thm.0.6,0.9]. Furthermore the cohomology class of $F \subset J(X)$ is minimal, that is $[F] = \frac{\Theta^3}{3!}$. There is only one other known family of examples of principally polarised abelian varieties $(A,\Theta)$ of dimension $n$ such that for $1 \le d \le n-2$, a minimal cohomology class $\frac{\Theta^{n-d}}{(n-d)!}$ can be represented by an effective cycle of dimension $d$: the Jacobians of curves $J(C)$, where the subvarieties $W_d(C) \subset J(C)$ have minimal cohomology class. O.
Debarre has shown that on a Jacobian these are the only subvarieties having minimal class [5, Thm.5.1]; furthermore, by a theorem of Z. Ran [11, Thm.5], the only principally polarised abelian fourfolds with a subvariety of minimal class are (products of) Jacobians of curves. In higher dimension little is known about subvarieties having minimal class. In [9], G. Pareschi and M. Popa introduce a new approach to the characterisation of these subvarieties: they consider the (probably more tractable) cohomological properties of the twisted structure sheaf of the subvariety. More precisely we have the following conjecture.
1.1. Conjecture. [5],[9] Let $(A,\Theta)$ be an irreducible principally polarised abelian variety of dimension $n$, and let $Y$ be a nondegenerate subvariety (cf. [11, p.464]) of $A$ of dimension $d \le n-2$. The following statements are equivalent.
1.) The variety $Y$ has minimal cohomology class, i.e. $[Y] = \frac{\Theta^{n-d}}{(n-d)!}$.
2.) The twisted structure sheaf $\mathcal{O}_Y(\Theta)$ is $M$-regular (cf. definition 1.4 below), and $h^0(Y,\mathcal{O}_Y(\Theta)\otimes P_\xi) = 1$ for $P_\xi \in \mathrm{Pic}^0(A)$ general.
3.) Either $(A,\Theta)$ is the Jacobian of a curve of genus $n$ and $Y$ is a translate of $W_d(C)$ or $-W_d(C)$, or $n = 5$, $d = 2$ and $(A,\Theta)$ is the intermediate Jacobian of a smooth cubic threefold and $Y$ is a translate of $F$ or $-F$.
Date: 4th April, 2007. http://arxiv.org/abs/0704.0558v2
The implication 2) ⇒ 1) is the object of [9, Thm.B]. The implication 3) ⇒ 2) has been shown for Jacobians of curves in [8, Prop.4.4]. We complete the proof of this implication by treating the case of the intermediate Jacobian.
1.2. Theorem. Let $X^3 \subset \mathbb{P}^4$ be a smooth cubic threefold, and let $(J(X),\Theta)$ be its intermediate Jacobian. Let $F \subset J(X)$ be an Abel-Jacobi embedded copy of the Fano variety of lines in $X$. Then $\mathcal{O}_F(\Theta)$ is $M$-regular and $h^0(F,\mathcal{O}_F(\Theta)\otimes P_\xi) = 1$ for $P_\xi \in \mathrm{Pic}^0 J(X)$ general.
Since the properties considered are invariant under isomorphisms, the theorem implies the same statement for $-F$.
The study of the remaining open implications of conjecture 1.1 is a much harder task than the proof of theorem 1.2. In an upcoming paper we will start to investigate this problem under the additional hypothesis that $(A,\Theta)$ is the intermediate Jacobian of a generic smooth cubic threefold. In this case we can show the following statement.
1.3. Theorem. [6] Let $X^3 \subset \mathbb{P}^4$ be a general smooth cubic threefold. Let $(J(X),\Theta)$ be its intermediate Jacobian, and let $F \subset J(X)$ be an Abel-Jacobi embedded copy of the Fano variety of lines in $X$. Let $S \subset J(X)$ be a surface that has minimal cohomology class, i.e. $[S] = \frac{\Theta^3}{3!}$. Then $S$ is a translate of $F$ or $-F$.
Notation and basic facts. We work over an algebraically closed field of characteristic different from 2. We will denote by $D \equiv D'$ the linear equivalence of divisors, and by $D \equiv_{num} D'$ the numerical equivalence. For $(A,\Theta)$ a principally polarised abelian variety (ppav), we identify $A$ with $\hat{A} = \mathrm{Pic}^0(A)$ via the morphism induced by $\Theta$. If $\xi \in A$ is a point, we denote by $P_\xi$ the corresponding point in $\hat{A} = \mathrm{Pic}^0(A)$, which we consider as a numerically trivial line bundle on $A$.
1.4. Definition. [10] Let $(A,\Theta)$ be a ppav of dimension $n$, and let $\mathcal{F}$ be a coherent sheaf on $A$. For all $n \ge i > 0$, we denote by $V^i_{\mathcal{F}} := \{\xi \in A \mid h^i(A,\mathcal{F}\otimes P_\xi) > 0\}$ the $i$-th cohomological support locus of $\mathcal{F}$. We say that $\mathcal{F}$ is $M$-regular if $\operatorname{codim} V^i_{\mathcal{F}} > i$ for all $i \in \{1,\dots,n\}$.
If $l \subset X$ is a line, we will denote by $[l]$ the corresponding point of the Fano surface $F$ and by $D_l \subset F$ the incidence curve of $l$, that is, $D_l$ parametrises lines in $X$ that meet $l$. Furthermore we have by [4, §10], [12, §6] and Riemann-Roch that
(1.5) $\mathcal{O}_F(\Theta) \equiv_{num} 2D_l$,
(1.6) $K_F \equiv_{num} 3D_l$,
(1.7) $D_l \cdot D_l = 5$,
(1.8) $\chi(F,\mathcal{O}_F(\Theta)) = 1$.
2. Prym construction of the Fano surface We recall the construction of the Fano surface as a special subvariety of a Prym variety [3, 2]: let $\tilde{C} := D_{l_0} \subset F$ be the incidence curve of a general line $l_0 \subset X$. Let $X'$ be the blow-up of $X$ in $l_0$.
Then the projection from $l_0$ induces a conic bundle structure $X' \to \mathbb{P}^2$ with branch locus $C \subset \mathbb{P}^2$ a smooth quintic. This conic bundle induces a natural connected étale covering of degree two $\pi : \tilde{C} \to C$ (cf. [1, Ch.I] for details), and we denote by $\sigma : \tilde{C} \to \tilde{C}$ the involution induced by $\pi$. The kernel of the norm morphism $\mathrm{Nm} : J\tilde{C} \to JC$ has two connected components which we will denote by $P$ and $P_1$. The zero component $P$ is called the Prym variety associated to $\pi$, and it is isomorphic as a ppav to $J(X)$ [1, Thm.2.1]. Let $H \subset C$ be an effective divisor given by a hyperplane section in $\mathbb{P}^2$. Then $H$ has degree five and $h^0(C,\mathcal{O}_C(H)) = 3$, so the complete linear system $g^2_5$ corresponds to a $\mathbb{P}^2 \subset C^{(5)}$. We choose a divisor $\tilde{H} \in \tilde{C}^{(5)}$ such that $\pi^{(5)}([\tilde{H}]) = [H]$, where $\pi^{(5)} : \tilde{C}^{(5)} \to C^{(5)}$ is the morphism induced by $\pi$ on the symmetric products. Let $\phi_H : C^{(5)} \to JC$ and $\phi_{\tilde{H}} : \tilde{C}^{(5)} \to J\tilde{C}$ be the Abel-Jacobi maps given by $H$ and $\tilde{H}$. We have a commutative diagram
$$\begin{array}{ccc} \tilde{C}^{(5)} & \xrightarrow{\ \phi_{\tilde{H}}\ } & J\tilde{C} \\ \downarrow{\scriptstyle \pi^{(5)}} & & \downarrow{\scriptstyle \mathrm{Nm}} \\ C^{(5)} & \xrightarrow{\ \phi_H\ } & JC \end{array}$$
The fibre of $\phi_{\tilde{H}}(\tilde{C}^{(5)}) \to \phi_H(C^{(5)})$ over the point 0 (and thus the intersection of $\phi_{\tilde{H}}(\tilde{C}^{(5)})$ with $\ker \mathrm{Nm}$) has two connected components $F_0 \subset P$ and $F_1 \subset P_1$. If we identify $P$ and $P_1$ via $\tilde{H} - \sigma(\tilde{H})$, we obtain an identification $F_1 = -F_0$ [3, p.360]. The (non-canonical) isomorphism of ppavs $P \simeq J(X)$ transforms $F_0$ into a translate of the Fano surface $F$ [3, Thm.4]. From now on we will identify $P$ (resp. $F_0$) and $J(X)$ (resp. some Abel-Jacobi embedded copy of the Fano surface $F$). We will now prove two technical lemmata on certain linear systems on $\tilde{C}$. The first is merely a reformulation of [2, §2,ii)].
2.9. Lemma. The line bundle $\mathcal{O}_{\tilde{C}}(\tilde{C})$ is a base-point free pencil of degree five such that any divisor $D \in |\mathcal{O}_{\tilde{C}}(\tilde{C})|$ satisfies $\pi_* D \equiv H$.
Proof. We define a morphism $\mu : \tilde{C} = D_{l_0} \to l_0 \simeq \mathbb{P}^1$ by sending $[l] \in \tilde{C}$ to $l \cap l_0$. Since $l_0$ is general and through a general point of $l_0$ there are five lines distinct from $l_0$, the morphism $\mu$ has degree 5. If $[l] \in F$, then $D_l \cdot D_{l_0} = 5$ by formula (1.7), so for $[l] \neq [l_0]$ the divisor $D_{l_0} \cap D_l \in |\mathcal{O}_{\tilde{C}}(D_l)|$ is effective.
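Furthermore $\pi_* D_l \equiv H$, since $\pi_* D_l$ is the intersection of $C \subset \mathbb{P}^2$ with the image of $l$ under the projection $X' \to \mathbb{P}^2$. By specialisation the linear system $|\mathcal{O}_{\tilde{C}}(\tilde{C})|$ is not empty and a general divisor $D$ in it corresponds to the five lines distinct from $l_0$ passing through a general point of $l_0$. Hence $\mathcal{O}_{\tilde{C}}(\tilde{C}) \simeq \mu^*\mathcal{O}_{\mathbb{P}^1}(1)$ and $\pi_* D \equiv H$. □
2.10. Lemma. The sets
$V'_0 := \{\xi \in P \mid h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(\tilde{C})\otimes P_\xi) > 0\}$ and
$V'_1 := \{\xi \in P \mid h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi) > 1\}$
are contained in translates of $F \cup (-F)$.
Proof. 1) Let $D \in |\mathcal{O}_{\tilde{C}}(\tilde{C})\otimes P_\xi|$ be an effective divisor. Then $\pi_*\tilde{C} \equiv \pi_* D \equiv H$. It follows that $D \in (\phi_{\tilde{H}}(\tilde{C}^{(5)}) \cap \ker \mathrm{Nm})$, so $D$ is in $F$ or $-F$.
2) We follow the argument in [2, §3]. By [2, §2,iv)] we have $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(\tilde{C}+\sigma(\tilde{C}))) = 4$, so $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(2\tilde{C}))$ is odd. It follows from the deformation invariance of the parity [7, p.186f] that $V'_1 = \{\xi \in P \mid h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi) \ge 3\}$. Fix $\xi \in P$ such that $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi) \ge 3$ and $D \in |\mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi|$. Let $s$ and $t$ be two sections of $\mathcal{O}_{\tilde{C}}(\tilde{C})$ such that the associated divisors have disjoint supports; then we have an exact sequence
$$0 \to \mathcal{O}_{\tilde{C}}(D-\tilde{C}) \xrightarrow{(t,-s)} \mathcal{O}_{\tilde{C}}(D)^{\oplus 2} \xrightarrow{(s,t)} \mathcal{O}_{\tilde{C}}(D+\tilde{C}) \to 0.$$
This implies $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D-\tilde{C})) + h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D+\tilde{C})) \ge 2h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D)) = 6$; furthermore by Riemann-Roch $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D+\tilde{C})) = h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(K_{\tilde{C}}-D-\tilde{C})) + 5$. Now $K_{\tilde{C}} - D \equiv \sigma(D)$ and $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(\sigma(D)-\tilde{C})) = h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D-\sigma(\tilde{C})))$ imply $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D-\tilde{C})) + h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(D-\sigma(\tilde{C}))) \ge 1$. Hence $D \equiv \tilde{C} + D'$ or $D \equiv \sigma(\tilde{C}) + D'$, where $D'$ is an effective divisor such that $\pi_* D' \equiv H$. We see as in the first part of the proof that the effective divisors $D'$ such that $\pi_* D' \equiv H$ are parametrised by a set that is contained in a translate of $F \cup (-F)$. □
3. Proof of theorem 1.2. Since $\mathcal{O}_F(\Theta) \equiv_{num} \mathcal{O}_F(2\tilde{C})$ by formula (1.5), it is equivalent to verify the stated properties for the sheaf $\mathcal{O}_F(2\tilde{C})$.
Step 1. The second cohomological support locus is contained in a translate of $F \cup (-F)$. By formula (1.6), we have $K_F \equiv \mathcal{O}_F(3\tilde{C})\otimes P_{\xi_0}$ for some $\xi_0 \in P$.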
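Hence by Serre duality $h^2(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) = h^0(F,\mathcal{O}_F(\tilde{C})\otimes P_\xi^*\otimes P_{\xi_0})$, so it is equivalent to consider the non-vanishing locus $V_0 := \{\xi \in P \mid h^0(F,\mathcal{O}_F(\tilde{C})\otimes P_\xi) > 0\}$. If $l \in F$ is a line on $X$, the corresponding incidence curve $D_l \subset F$ is an effective divisor numerically equivalent to $\tilde{C}$, so it is clear that $\pm F$ is (up to translation) a subset of $V_0$. In order to show that we have an equality, consider the exact sequence
$$0 \to \mathcal{O}_F\otimes P_\xi \to \mathcal{O}_F(\tilde{C})\otimes P_\xi \to \mathcal{O}_{\tilde{C}}(\tilde{C})\otimes P_\xi \to 0.$$
Clearly $h^0(F,\mathcal{O}_F\otimes P_\xi) = 0$ for $\xi \neq 0$, so $h^0(F,\mathcal{O}_F(\tilde{C})\otimes P_\xi) \le h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(\tilde{C})\otimes P_\xi)$ for $\xi \neq 0$. Since a divisor $D \in |\mathcal{O}_{\tilde{C}}(\tilde{C})|$ satisfies $\pi_* D \equiv H$, we conclude with Lemma 2.10.
Step 2. The first cohomological support locus is contained in a union of translates of $F \cup (-F)$. Since $\chi(F,\mathcal{O}_F(2\tilde{C})) = \chi(F,\mathcal{O}_F(\Theta)) = 1$ (formula (1.8)), we have
$$h^1(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) = h^0(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) + h^0(F,\mathcal{O}_F(\tilde{C})\otimes P_\xi^*\otimes P_{\xi_0}) - 1.$$
Since $h^0(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) = h^0(F,\mathcal{O}_F(\Theta)\otimes P_\xi) \ge 1$ for all $\xi \in P$, the first cohomological support locus is contained in the locus where $h^0(F,\mathcal{O}_F(\tilde{C})\otimes P_\xi^*\otimes P_{\xi_0}) > 0$ or $h^0(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) > 1$. By step 1 the statement follows if we show the following claim: the set $V_1 := \{\xi \in P \mid h^0(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) > 1\}$ is contained in a union of translates of $F \cup (-F)$.
Step 3. Proof of the claim and conclusion. Consider the exact sequence
$$0 \to \mathcal{O}_F(\tilde{C})\otimes P_\xi \to \mathcal{O}_F(2\tilde{C})\otimes P_\xi \to \mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi \to 0.$$
By the first step we know that $h^0(F,\mathcal{O}_F(\tilde{C})\otimes P_\xi) = 0$ for $\xi$ in the complement of a translate of $F \cup (-F)$, so $h^0(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) \le h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi)$ for $\xi$ in the complement of a translate of $F \cup (-F)$. The claim is then immediate from Lemma 2.10. By the same lemma $h^0(\tilde{C},\mathcal{O}_{\tilde{C}}(2\tilde{C})\otimes P_\xi) = 1$ for $\xi \in P$ general, so $h^0(F,\mathcal{O}_F(2\tilde{C})\otimes P_\xi) = h^0(F,\mathcal{O}_F(\Theta)\otimes P_\xi) = 1$ for $\xi \in P$ general. □
Remark.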
It is possible to strengthen a posteriori the statements in the proof: since Theorem 1.2 holds, we can use the Fourier-Mukai techniques from [9] to see that the cohomological support loci are supported exactly on the theta-dual of $F$ (ibid, Definition 4.2), which in our case is just $F$.
Acknowledgements. I would like to thank Mihnea Popa for suggesting that I work on this question. Olivier Debarre has shown much patience in explaining to me the geometry of abelian varieties. For this and many discussions on minimal cohomology classes I would like to express my deep gratitude.
References
[1] A. Beauville. Variétés de Prym et jacobiennes intermédiaires. Ann. Sci. École Norm. Sup. (4), 10(3):309–391, 1977.
[2] A. Beauville. Les singularités du diviseur Θ de la jacobienne intermédiaire de l’hypersurface cubique dans P4. In Lect. Notes Math. 947, pages 190–208. Springer, Berlin, 1982.
[3] A. Beauville. Sous-variétés spéciales des variétés de Prym. Comp. Math., 45(3):357–383, 1982.
[4] C. H. Clemens and P. A. Griffiths. The intermediate Jacobian of the cubic threefold. Ann. of Math. (2), 95:281–356, 1972.
[5] O. Debarre. Minimal cohomology classes and Jacobians. J. Alg. Geom., 4(2):321–335, 1995.
[6] A. Höring. Paper in preparation. Soon on this server, 2007.
[7] D. Mumford. Theta characteristics of an algebraic curve. Ann. Sci. École Norm. Sup. (4), 4:181–192, 1971.
[8] G. Pareschi and M. Popa. Regularity on abelian varieties. I. J. Amer. Math. Soc., 16(2):285–302, 2003.
[9] G. Pareschi and M. Popa. Generic vanishing and minimal cohomology classes on abelian varieties. arXiv:math.AG/0610166, 2006.
[10] G. Pareschi and M. Popa. GV-sheaves, Fourier-Mukai transform, and Generic Vanishing. arXiv:math.AG/0608127, 2006.
[11] Z. Ran. On subvarieties of abelian varieties. Inventiones Math., 62:459–479, 1981.
[12] G. E. Welters. Abel-Jacobi isogenies for certain types of Fano threefolds, volume 141 of Mathematical Centre Tracts. Mathematisch Centrum, Amsterdam, 1981.
Andreas Höring, IRMA, Université Louis Pasteur, 7 rue René Descartes, 67084 Strasbourg, France E-mail address: andreas.hoering@ujf-grenoble.fr
ABSTRACT Let $(A,\Theta)$ be a principally polarised abelian variety, and let Y be a subvariety. Pareschi and Popa conjectured that Y has minimal cohomology class if and only if the structure sheaf of Y satisfies a property that they call M-regularity. Let now X be a smooth cubic threefold. By a classical result due to Clemens and Griffiths, its intermediate Jacobian J(X) is a principally polarised abelian variety; furthermore the Fano surface of lines on X can be embedded in J(X) and has minimal cohomology class. In this short note we show that its structure sheaf is M-regular. <|endoftext|><|startoftext|> arXiv:0704.0559v1 [hep-ph] 4 Apr 2007 Signal for space-time noncommutativity: the Z → γγ decay in the renormalizable gauge sector of the θ-expanded NCSM ∗ Josip Trampetić† Rudjer Bošković Institute, Zagreb, Croatia Abstract We propose the Z → γγ decay, a process strictly forbidden in the standard model, as a signal suitable for the search for noncommutativity of coordinates at very short distances. We compute the Z → γγ partial width in the framework of the recently proposed renormalizable gauge sector of the noncommutative standard model. One-loop renormalizability is obtained for the model containing the usual six representations of matter fields of the first generation. Moreover, the noncommutative part is finite, i.e. free of divergences, showing that perhaps a new interaction symmetry exists in the noncommutative gauge sector of the model. Discovery of such a symmetry would be of tremendous importance in the further search for violation of Lorentz invariance at very high energies.
Experimental possibilities of Z → γγ decay are analyzed and a firm bound to the scale of the noncommutativity parameter is set around 1 TeV. ∗ Based on presentation given at the IV Summer School in Modern Mathematical Physics, Belgrad, Serbia, September 3-14, 2006 and LHC Days in Split, Croatia, October 2-7, 2006. Work supported by the Croatian Ministry of Science, Education and Sport project 098-0982930-2900. † e-mail address: josipt@rex.irb.hr http://arxiv.org/abs/0704.0559v1 The title 2 Gauge theories can be extended to a noncommutative (NC) setting in different ways. In our model, the classical action is obtained via a two-step procedure. First, the noncommutative Yang-Mills (NCYM) is equipped with a star product carrying information about the underlying noncommu- tative manifold, and, second, the ⋆-product and noncommutative fields are expanded in the noncommutative parameter θ using the Seiberg-Witten (SW) map [1]. In this approach, noncommutativity is treated perturba- tively. The major advantage is that models with any gauge group and any particle content can be constructed [2, 3, 4, 5, 6, 7], so we can construct the standard model (SM). Commutative gauge symmetry is the underlying symmetry of the theory and is present in each order of the θ-expansion. Noncommutative (NC) symmetry, on the other hand, exists only in the full theory, i.e. after summation. There are a number of versions of the noncommutative standard model (NCSM) in the θ-expanded approach, [3, 4, 5, 6]. The action is gauge in- variant; furthermore, it has been proved that the action is anomaly free whenever its commutative counterpart is also anomaly free [8]. The ar- gument of renormalizability was previously included in the construction of field theories on noncommutative Minkowski space producing not only the one-loop renormalizable model [9], but the model containing one-loop quantum corrections free of divergences [10], contrary to previous results [11, 12]. 
In [10] we analyzed the gauge theory based on the U(1)Y × SU(2)L × SU(3)C group: we succeeded in constructing a model which had the renormalizable gauge sector to θ-linear order. The condition of the gauge sector renormalizability determines the additional θ-linear interactions between gauge bosons.

Experimental evidence for noncommutativity coming from the gauge sector should be searched for in the process of the Z → γγ decay, kinematically allowed for on-shell particles [10, 7]. As it is forbidden in the SM by angular momentum conservation and Bose statistics (Landau-Pomeranchuk-Yang theorem), it would serve as a clear signal for the existence of space-time noncommutativity. Signatures of noncommutativity were discussed previously within particle physics in [7, 13, 14].

The noncommutative space which we consider is the flat Minkowski space, generated by four hermitian coordinates x̂^µ which satisfy the commutation rule

$[\hat x^\mu, \hat x^\nu] = i\theta^{\mu\nu} = \mathrm{const}.$ (1)

The algebra of the functions φ̂(x̂), χ̂(x̂) on this space can be represented by the algebra of the functions φ̂(x), χ̂(x) on the commutative R^4 with the Moyal-Weyl multiplication:

$\hat\phi(x) \star \hat\chi(x) = e^{\frac{i}{2}\theta^{\mu\nu}\frac{\partial}{\partial x^\mu}\frac{\partial}{\partial y^\nu}}\, \hat\phi(x)\hat\chi(y)\big|_{y\to x}.$ (2)

It is possible to represent the action of an arbitrary Lie group G (with the generators denoted by T^a) on noncommutative space. In analogy to the ordinary case, one introduces the gauge parameter Λ̂(x) and the vector potential V̂µ(x). The main difference is that the noncommutative Λ̂ and V̂µ cannot take values in the Lie algebra G of the group G: they are enveloping algebra-valued. The noncommutative gauge field strength F̂µν is

$\hat F_{\mu\nu} = \partial_\mu \hat V_\nu - \partial_\nu \hat V_\mu - i(\hat V_\mu \star \hat V_\nu - \hat V_\nu \star \hat V_\mu).$ (3)

There is, however, a relation between the noncommutative gauge symmetry and the commutative one: it is given by the Seiberg-Witten (SW) mapping [1].
Namely, the matter fields φ̂, the gauge fields V̂µ, F̂µν and the gauge parameter Λ̂ can be expanded in the noncommutative θ^{µν} and in the commutative Vµ and Fµν. This expansion coincides with the expansion in the generators of the enveloping algebra of G, {T^a, :T^aT^b:, :T^aT^bT^c:}; here : : denotes the symmetrized product. The SW map is obtained as a solution to the gauge-closing condition of infinitesimal (noncommutative) transformations. The expansions of the NC vector potential and of the field strength, up to first order in θ, read

$\hat V_\rho(x) = V_\rho(x) - \frac{1}{4}\theta^{\mu\nu}\{V_\mu(x), \partial_\nu V_\rho(x) + F_{\nu\rho}(x)\} + \dots,$ (4)

$\hat F_{\rho\sigma} = F_{\rho\sigma} + \frac{1}{4}\theta^{\mu\nu}\left(2\{F_{\mu\rho}, F_{\nu\sigma}\} - \{V_\mu, (\partial_\nu + D_\nu)F_{\rho\sigma}\}\right) + \dots,$ (5)

where $D_\nu = \partial_\nu - i[V_\nu, \cdot\,]$ is the commutative covariant derivative.

The solution for the SW map given above is not unique and along with (5) all expressions V̂′µ, F̂′µν of the form

$\hat V'_\mu = \hat V_\mu + X_\mu, \qquad \hat F'_{\mu\nu} = \hat F_{\mu\nu} + D_\mu X_\nu - D_\nu X_\mu$ (6)

are solutions to the closing condition to linear order, if Xµ is a gauge covariant expression linear in θ, otherwise arbitrary. One can think of this transformation as a redefinition of the fields Vµ and Fµν.

Taking the action of the noncommutative gauge theory, analogous to that of the ordinary Yang-Mills theory with the commutative field strengths replaced by the noncommutative ones,

$S = -\frac{1}{2}\,\mathrm{Tr}\int d^4x\, \hat F_{\mu\nu} \star \hat F^{\mu\nu},$ (7)

and expanding the fields as in (4-5) and the ⋆-product in θ, we obtain the expression

$S = -\frac{1}{2}\,\mathrm{Tr}\int d^4x\, F_{\mu\nu}F^{\mu\nu} + \theta^{\mu\nu}\,\mathrm{Tr}\int d^4x \left(\frac{1}{4}F_{\mu\nu}F_{\rho\sigma} - F_{\mu\rho}F_{\nu\sigma}\right)F^{\rho\sigma},$ (8)

which is the starting point for the analysis of θ-expanded noncommutative gauge models. Due to the renormalizability condition, we add a term, including the NC freedom parameter $\frac{1}{4}(a-1)$, to the original Lagrangian, producing the following general form of the noncommutative gauge field action:

$S = -\frac{1}{2}\,\mathrm{Tr}\int d^4x\, F_{\mu\nu}F^{\mu\nu} + \theta^{\mu\nu}\,\mathrm{Tr}\int d^4x \left(\frac{a}{4}F_{\mu\nu}F_{\rho\sigma} - F_{\mu\rho}F_{\nu\sigma}\right)F^{\rho\sigma}.$ (9)

The most general form of the NC action, invariant under the NC gauge transformation, is given in [3, 5, 6, 4],

$S_{\mathrm{gauge}} = -\frac{1}{2}\int d^4x \sum_R C_R\, \mathrm{Tr}\left(R(\hat F_{\mu\nu}) \star R(\hat F^{\mu\nu})\right).$
(10) The sum in (10) is, in principle, taken over all irreducible representations R of GSM with arbitrary weights CR. Obviously, gauge models are rep- resentation dependent in the NC case: the choice of representations has a strong influence on the theory, on both the form of interactions and the renormalizability properties. Expanding the NC gauge action (10) to first order in the noncommuta- tivity parameter θ, we obtain Sgauge = − d4xR(Fµν)R(F µν) (11) R(Fµν)R(Fρσ)−R(Fµρ)R(Fνσ) R(F ρσ). The arbitrariness in the gauge action, introduced through the coefficient a, reflects in part also the nonuniqueness of the SW map. As we have already mentioned, renormalizability points out the value a = 3 as physical; however, we keep the value of a arbitrary in calculations and use a = 3 at the end. Note that by generalizing the expression (5) to equivalent form F̂µν(a) = Fµν + 2{Fµρ, Fντ} − a{Vρ, (∂τ +Dτ )Fµν} , (12) one could also obtain the actions (9,11) directly from (7,10).1 The im- portant question, if the freedom parameter a is eventually comming from different class of SW maps and/or some other new interaction symmetry extends the purpose of this presentation and, consequentlly, shall be dis- cussed elsewhere. The noncommutative correction, that is the θ-linear part of the La- grangian, reads Lθi = g ′3κ1θ fµνfρσf ρσ − fµρfνσf + g3κ BiµνB ρσk −BiµρB + g3Sκ GaµνG ρσc −GaµρG 1This is in part due to the properties of the integral over the two-function ⋆-product, i.e. the Stokes theorem. The title 5 + g′g2κ2θ ρσi − fµρB ρσi + c.p. + g′g2Sκ3θ ρσa − fµρG ρσa + c.p. , (13) where the c.p. in (13) denotes the addition of the terms obtained by a cyclic permutation of fields without changing the positions of indices. Here, fµν , B µν , and G µν are the physical field strengths which correspond to U(1)Y, SU(2)L, and SU(3)C, respectively. The couplings κi, (i = 1, ..., 5), as functions of the weights CR, that is of the Ci(= 1/g i ), i = 1, ..., 6, are parameters of the model. 
The couplings in (13) are defined as follows: CRd(R2)d(R3)R1(Y ) 3, (14) CRd(R3)R1(Y )Tr (R2(T L)R2(T L)), (15) CRd(R2)R1(Y )Tr (R3(T S )R3(T S)), (16) CRd(R3)Tr ({R2(T L),R2(T L)}R2(T L)), (17) κabc5 = CRd(R2)Tr ({R3(T S ),R3(T S)}R3(T S)). (18) The κ1, . . . , κ5 depend on the representations of matter fields through the dependence on the coefficients CR. For the first generation of the standard model there are six such representations, summarized in Table 1 of [4]; they produce six independent constants CR 2. However, one can immediately verify that κ 4 = 0. This follows from the fact that the symmetric coeffi- cients dijk of SU(2) vanish for all irreducible representations. In addition, we take that κabc5 = 0. The argument for this assumption is related to the invariance of the color sector of the SM under charge conjugation. Although apparently in Table 1 from [4] one has only the fundamental representa- tion 3 of SU(3)C, there are in fact both 3 and 3̄ representations with the same weights, C3 = C3̄. In the Lagrangian this corresponds to writing each minimally-coupled quark term as a half of the sum of the original and the charge-conjugated terms. Since the symmetric coefficients for the 3 and 3̄ representations satisfy dabc = −dabc , we obtain κabc5 = C3d = 0. (19) 2We assume that CR > 0; therefore the six CR’s were denoted by , i = 1, ..., 6, in [3, 6]. The title 6 -0.3 -0.2 -0.1 0 0.1 ΓΓΓ -0.2 -0.3 -0.2 -0.1 0 0.1 Figure 1: (a) The three-dimensional simplex that bounds possi- ble values for the coupling constants Kγγγ , KZγγ and KZgg at the MZ scale. The vertices of the simplex are: (−0.184,−0.333, 0.054), (−0.027,−0.340,−0.108), (0.129,−0.254, 0.217), (−0.576, 0.010,−0.108), (−0.497,−0.133, 0.054), and (−0.419, 0.095, 0.217). (b) The allowed region for KZγγ and Kγγγ at theMZ scale, projected from the simplex given in Fig (a). 
The vertices of the polygon are: (−0.333, −0.184), (−0.340, −0.027), (−0.254, 0.129), (0.095, −0.419), (0.0095, −0.576), and (−0.133, −0.497). We are left only with three nonvanishing couplings, κ1, κ2, and κ3, depend- ing on six constants C1, . . . , C6: κ1 = −C1 − κ2 = − C6 ; κ3 = C5 . (20) There are three relations among Ci’s: = 2C1 + C2 + C5 + C6 , = C2 + 3C5 + C6 ; = C3 + C4 + 2C5 , (21) in effect representing three consistency conditions imposed on (8) in a way to match the SM action at zeroth order in θ. See detailes in [6]. Fig.(1) shows the three-dimensional simplex that bounds allowed values for the dimensionless coupling constants Kγγγ , KZγγ and KZgg. For any choosen point within the simplex in Fig.(1) the remaining coupling con- stants KZZγ, KZZZ, KWWγ, KWWZ and Kγgg are uniquely fixed by the NCSM [6, 4]. This is true for any combination of three coupling constants. The title 7 Our total classical action reads Scl = SSM + Sθi = g ′3κ1θ fµνfρσf ρσ − fµρfνσf + g′g2κ2θ ρσi − fµρB ρσi + c.p. + g′g2Sκ3θ ρσa − fµρG ρσa + c.p. . (22) The term Sθ1 in (22) is one-loop renormalizable to linear order in θ [9] since the one-loop correction to the Sθ1 is of the second order in θ. We need to investigate only the renormalizability of the remaining Sθ2 and S 3 parts of the action (22). To realize the one-loop renormalization of the gauge part action (22), we apply, as before [9, 10], the background field method [15, 16]. As we have already explained the details of the method in [12], here we only discuss the points needed for this computation. The main contribution to the func- tional integral is given by the Gaussian integral. However, technically, this is achieved by splitting the vector potential into the classical-background plus the quantum-fluctuation parts, that is, φV → φV +ΦV , and by comput- ing the terms quadratic in the quantum fields. 
In this way we determine the second functional derivative of the classical action, which is possible since our interactions (22) are of the polynomial type. The quantization is performed by the functional integration over the quantum vector field ΦV in the saddle-point approximation around the classical (background) configuration φV . First, an advantage of the background field method is the guarantee of covariance, because by doing the path integral the local symmetry of the quantum field ΦV is fixed, while the gauge symmetry of the background field φV is manifestly preserved. Since we are dealing with gauge symmetry, our Lagrangian (22) is sin- gular owing to its invariance under the gauge group. Therefore, a proper quantization of (22) requires the presence of the gauge fixing term Sgf [φ], i.e. the Feynman-Fadeev-Popov ghost appears in the effective action Γ[φ] = Scl[φ] + Sgf [φ] + Γ (1)[φ], Sgf [φ] = − d4x(DµΦ )2 . (23) The one-loop effective part Γ(1)[φ] is given by Γ(1)[φ] = log detS(2)[φ] = Tr logS(2)[φ]. (24) In (24), the S(2)[φ] is the 2nd-functional derivative of the classical action, with the following structure: S2 = ✷+N1 +N2 + T2 + T3 + T4 . (25) The title 8 Here N1, N2 are commutative vertices, while T2, T3, T4 are noncommutative ones. The indices denote the number of classical fields. The one-loop effective action computed by using the background field method is θ,2 = Tr log I +✷−1(N1 +N2 + T2 + T3 + T4) (−1)n+1 −1N1 +✷ −1N2 +✷ −1T2 +✷ −1T3 +✷ As the conventions and the notation are the same as in [10], we only en- counter and discuss the final results. The divergent one-loop vertex correction to (22) as a function of the SW freedom parameter a is [10] Γdiv = 3(4π)2ǫ BiµνB µνi + GaµνG 3(4π)2ǫ g′g2κ2(3− a)θ ρσ − fµρB 3(4π)2ǫ g′g2Sκ3(3− a)θ ρσ − fµρG ρσa . 
From (27) it is clear that the expanded gauge action (22) is renormalizable only for the value a = 3 and, its noncommutative part is finite or free of di- vergencies, so the noncommutativity parameter θ need not be renormalized. The results for the bare fields and couplings, are given in [10]. Note that we have also analized the renormalizability properties of the pure NC SU(N) gauge sector, for vector fields in the adjoint representation [17]. We have found that this model is also renormalizable for a = 3. However, to obtain renormalizability, we had to pay a price by necessity for the renormalization of the noncommutative deformation parameter h. In this way the parameter h and/or the scale of noncommutativity ΛNC become running quantities, dependent on energy [17]. In addition, it was shown that the one-loop contributions to the U(1) gauge-field part of the noncommutative gauge theories in the enveloping- algebra formalism are renormalizable at first order in θ even if the scalar matter, with and without spontaneous symmetry breaking, contributions are taken into account [18]. There is reasonable hope that the same con- clusion should hold for SU(N), but the computations are expected to be extremely involving. Nevertheless, the results [18] further strengthen the philosophy which is embraced in our latest papers [10, 17]. From the action (22) we extract the triple-gauge boson terms which are not present in the commutative SM Lagrangian. In terms of the physical fields A, W±, Z, and G they are Lθγγγ = sin 2θW Kγγγθ ρτAµν (aAµνAρτ − 4AµρAντ ) , Kγγγ = gg′(κ1 + 3κ2); (28) The title 9 LθZγγ = sin 2θW KZγγ θ × [2Zµν (2AµρAντ − aAµνAρτ ) + 8ZµρA µνAντ − aZρτAµνA µν ] , KZγγ = − 2g2 ; (29) where Aµν ≡ ∂µAν − ∂νAµ, etc. The structure of the other interactions such as ZZγ, WWZ, ZZZ, Zgg, and γgg is given in [4, 6]. Next we focus on the branching ratio of the Z → γγ decay in the renor- malizable model. 
Note that each term from the θ-expanded action (22), (28) and (29) is manifestly invariant under the ordinary gauge transforma- tions. The gauge-invariant amplitude AθZ→γγ for the Z(k1) → γ(k2) γ(k3) decay in the momentum space reads AθZ→γγ = −2e sin 2θWKZγγΘ 3 (a; k1,−k2,−k3)ǫµ(k1)ǫν(k2)ǫρ(k3). (30) The tensor Θ 3 (a; k1, k2, k3) is given by 3 (a; k1, k2, k3) = − (k1θk2) (31) × [(k1 − k2) ρgµν + (k2 − k3) µgνρ + (k3 − k1) νgρµ] − θµν [k 1 (k2k3)− k 2 (k1k3)] − θνρ [k 2 (k3k1)− k 3 (k2k1)] − θρµ [kν3 (k1k2)− k 1 (k3k2)] + (θk2) gνρ k23 − k + (θk3) gνρ k22 − k + (θk3) gµρ k21 − k + (θk1) gµρ k23 − k + (θk1) gµν k22 − k + (θk2) gµν k21 − k + θµα(ak1 + k2 + k3)α [g νρ (k3k2)− k + θνα(k1 + ak2 + k3)α [g µρ (k3k1)− k + θρα(k1 + k2 + ak3)α [g µν (k2k1)− k 1 ] , where the 4-momenta k1, k2, k3 are taken to be incoming, satisfying the momentum conservation (k1+ k2+ k3 = 0). In (31) the freedom parameter a appears symmetric in physical gauge bosons which enter the interaction point, as one would expect. The amplitude (30), for a = 3, with the Z boson at rest gives the total rate for the Z → γγ decay: ΓZ→γγ = sin2 2θWK ~E2θ + ~B2θ ), (32) where ~Eθ = {θ 01, θ02, θ03} and ~Bθ = {θ 23, θ31, θ12} are dimensionless coef- ficients of order one, representing the time-space and space-space noncom- mutativity, respectively. For the Z boson at rest, polarized in the direction The title 10 of the third axis, we obtain the following polarized partial width: Γ3Z→γγ = sin2 2θWK ~E2θ + ~B2θ + 42 (θ03)2 + (θ12)2 . (33) In order to estimate the scale of noncommutativity ΛNC from ΓZ→γγ,we consider new experimental possibilities at LHC. According to the CMS Physics Technical Design Report [19], around 107 Z → e+e− events are expected to be recorded with 10 fb−1 of the data. From this one can estimate the expected number of Z → γγ events per 10 fb−1. Assuming that BR(Z → γγ) ∼ 10−8 and using BR(Z → e+e−) = 3 × 10−2, we may expect to have ∼ 3 events of Z → γγ with 10 fb−1. 
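The event estimate above is plain arithmetic and can be checked with a short sketch. The function and argument names are illustrative assumptions; the inputs are the figures quoted in the text (about 10^7 recorded Z → e+e− events per 10 fb−1, BR(Z → e+e−) = 3 × 10−2, and the assumed BR(Z → γγ) ∼ 10−8). Detector acceptance and photon selection efficiency are ignored.

```python
def expected_diphoton_events(n_ee_events, br_ee, br_gamma_gamma):
    """Estimate the number of Z -> gamma gamma events.

    The number of recorded Z -> e+e- events is divided by BR(Z -> e+e-)
    to get the equivalent number of Z bosons, which is then multiplied
    by the assumed BR(Z -> gamma gamma).
    """
    n_z_equivalent = n_ee_events / br_ee
    return n_z_equivalent * br_gamma_gamma


# Figures quoted in the text for 10 fb^-1 of data.
n_events = expected_diphoton_events(1e7, 3e-2, 1e-8)
print(f"{n_events:.1f} expected Z -> gamma gamma events")  # about 3
```

The result, roughly 3.3, is the "∼ 3 events" quoted above.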
Now the question is: what would be the background from Z → e+e− when the electron radiates a very high-energy bremsstrahlung photon in the beam pipe or in the first layer(s) of the Pixel Detector and is thus lost for the tracker reconstruction? In that case, the electron would not be reconstructed and would be misidentified as a photon. The probability of such an event should be evaluated from the full detector simulation. According to the CMS note [20], which studies the Z → e+e− background for Higgs → γγ, the probability to misidentify the electron as a photon is large (see Fig. 3 in [20]), but the situation can be improved by applying more stringent selections to the photon candidate when searching for Z → γγ events [21]. However, the irreducible di-photon background (Fig. 3 in [20]) might also swamp the signal. In that case, one can only set the upper limits to the scale of noncommutativity from the Z → γγ rate.

In accord with the analysis of the LHC experimental expectations [19, 20, 21], it is reasonable to assume that the lower bound for the branching ratio is BR(Z → γγ) ∼ 10^−8. Next, choosing the lower central value of |K_Zγγ| = 0.05 from the figures and the Table in [6], we find that the upper bound to the scale of noncommutativity is Λ_NC ∼ 1.0 TeV for E⃗²_θ + B⃗²_θ ≃ 1. The obtained bound is strongly supported in [18].

Clearly, the measurement of the Z → γγ decay branching ratio would fix the quantity |K_Zγγ/Λ²_NC|, while the inclusion of other triple gauge boson interactions through 2 → 2 scattering experiments [14] would sufficiently reduce the available parameter space of our model by more precisely determining the relations among the couplings K_γγγ, K_Zγγ, K_ZZγ, K_ZZZ, K_WWγ, and K_WWZ.

Next, we summarize our results and compare with those obtained previously.
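As a rough cross-check of the Λ_NC ∼ 1 TeV figure, the sketch below inverts an assumed form of the total rate. Everything not quoted in the text is an assumption: the prefactor of the rate is not legible in the text above, so the form Γ = (α/4)(M_Z^5/Λ_NC^4) sin²2θ_W K²_Zγγ (E⃗²_θ + B⃗²_θ) is taken here as a working hypothesis, and the electroweak inputs (α, M_Z, the total Z width, sin²θ_W) are standard values supplied for illustration. The function name is hypothetical.

```python
# Standard electroweak inputs (assumed here, not quoted in the text).
ALPHA = 1.0 / 128.0      # fine-structure constant near the Z scale
M_Z_GEV = 91.19          # Z boson mass in GeV
GAMMA_Z_GEV = 2.495      # total Z width in GeV
SIN2_THETA_W = 0.231     # sin^2 of the weak mixing angle


def lambda_nc_gev(branching_ratio, k_zgg, e2_plus_b2=1.0):
    """Scale Lambda_NC (in GeV) that yields a given BR(Z -> gamma gamma).

    Assumed rate (a hypothesis, see the lead-in):
        Gamma = (alpha / 4) * M_Z^5 / Lambda^4 * sin^2(2 theta_W)
                * K_Zgg^2 * (E_theta^2 + B_theta^2)
    """
    sin2_2theta = 4.0 * SIN2_THETA_W * (1.0 - SIN2_THETA_W)
    partial_width = branching_ratio * GAMMA_Z_GEV
    lambda_to_4 = ((ALPHA / 4.0) * M_Z_GEV**5 * sin2_2theta
                   * k_zgg**2 * e2_plus_b2 / partial_width)
    return lambda_to_4 ** 0.25


# Inputs from the text: BR ~ 1e-8, |K_Zgg| = 0.05, E^2 + B^2 ~ 1.
print(f"Lambda_NC ~ {lambda_nc_gev(1e-8, 0.05):.0f} GeV")
```

Under these assumptions the result comes out close to 1 TeV, in line with the bound quoted above. Because Λ_NC enters as a fourth root, the estimate is only weakly sensitive to the exact inputs.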
The first Z → γγ calculation [22] was performed within a different model, which has different symmetries from ours and, because of the absence of the SW map, does not possess the commutative gauge invariance. Also, the Z → γγ rate obtained in [22] vanishes when the unitarity of the theory is imposed in the usual manner, θ^{0i} = 0 [23, 24] (see footnote 3). The partial width for the same process was obtained in [6] in the framework of similar theories, which, however, were not renormalizable. The present results for the partial widths Γ_{Z→γγ} and Γ³_{Z→γγ} are about three times larger than those in [6] and consistently symmetric with respect to time-space and space-space noncommutativity. In the polarized rate (33) the third components ((θ^{03})² + (θ^{12})²) are enhanced relative to the other two components by a large factor, as expected. Also, the rate (33) is enhanced by a factor of ∼ 3 with respect to the total rate (32).

3 The condition of unitarity can be covariantly generalized to θ_{µν}θ^{µν} = 2(B⃗²_θ − E⃗²_θ) > 0 [25].

The upper limit to the scale of noncommutativity Λ_NC ∼ 1 TeV is significantly higher than in [6]. This bound is now firmer owing to the regular behavior of the triple gauge boson interactions (28-29) with respect to the one-loop renormalizability of the NCSM gauge sector.

After 10 years of LHC running, the integrated luminosity is expected to reach ∼ 1000 fb^−1 [20]. This means that for the assumed BR(Z → γγ) ∼ 10^−8 we should have ∼ 300 events of Z → γγ, that is, we should be well above the background. On the other hand, this result can also be understood as ∼ 3 events with BR(Z → γγ) ∼ 10^−10, which lifts the scale of noncommutativity up by a factor of ∼ 3. Therefore, with a more stringent selection of photon candidates, and if the irreducible di-photon contamination becomes controllable, the Z → γγ decay will become a clean signature of space-time noncommutativity in LHC experiments.
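The luminosity argument can be sketched numerically. The helper names below are illustrative. The sketch assumes that event counts scale linearly with integrated luminosity and that the rate scales as Λ_NC^−4, which is the scaling expected if the amplitude is linear in θ and θ is of order 1/Λ²_NC; that scaling is an assumption here, chosen because it reproduces the factor of ∼ 3 quoted in the text.

```python
def events_at_luminosity(events_ref, lumi_ref_fb, lumi_fb):
    """Scale an event count linearly with integrated luminosity (fb^-1)."""
    return events_ref * lumi_fb / lumi_ref_fb


def lambda_reach_gain(br_ref, br_new):
    """Factor by which the probed scale grows when the reachable
    branching ratio drops from br_ref to br_new, assuming BR ~ Lambda^-4."""
    return (br_ref / br_new) ** 0.25


# About 3.3 events per 10 fb^-1 at BR ~ 1e-8 (figures from the text).
events_10fb = 1e7 / 3e-2 * 1e-8
events_1000fb = events_at_luminosity(events_10fb, 10.0, 1000.0)
gain = lambda_reach_gain(1e-8, 1e-10)

print(f"{events_1000fb:.0f} events at 1000 fb^-1")   # a few hundred
print(f"scale lifted by a factor of {gain:.2f}")     # about 3
```

A hundredfold gain in branching-ratio sensitivity therefore moves the scale by only the fourth root of 100, which is why the improvement in Λ_NC is modest.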
Finally, the results of [17, 18], while strongly supporting these computations, might also hint at the existence of a new interaction symmetry of the noncommutative gauge sector. Such a new symmetry could be responsible for the renormalizability of the noncommutative matter sector including fermions and, beyond that, for the main goal: the physical realization of Lorentz invariance breaking at very high energies.

References

[1] N. Seiberg and E. Witten, String theory and non-commutative geometry, JHEP 09 (1999) 032 [arXiv:hep-th/9908142].
[2] J. Madore, S. Schraml, P. Schupp and J. Wess, Gauge theory on non-commutative spaces, Eur. Phys. J. C16 (2000) 161 [arXiv:hep-th/0001203]; B. Jurčo, S. Schraml, P. Schupp and J. Wess, Enveloping algebra valued gauge transformations for non-Abelian gauge groups on non-commutative spaces, Eur. Phys. J. C17 (2000) 521 [arXiv:hep-th/0006246]; B. Jurčo, L. Möller, S. Schraml, P. Schupp and J. Wess, Construction of non-Abelian gauge theories on non-commutative spaces, Eur. Phys. J. C21 (2001) 383 [arXiv:hep-th/0104153].
[3] X. Calmet, B. Jurčo, P. Schupp, J. Wess and M. Wohlgenannt, The standard model on non-commutative space-time, Eur. Phys. J. C23 (2002) 363 [arXiv:hep-ph/0111115].
[4] B. Melic, K. Passek-Kumericki, J. Trampetic, P. Schupp and M. Wohlgenannt, The standard model on non-commutative space-time: Electroweak currents and Higgs sector, Eur. Phys. J. C 42 (2005) 483 [arXiv:hep-ph/0502249]; B. Melic, K. Passek-Kumericki, J. Trampetic, P. Schupp and M. Wohlgenannt, The standard model on non-commutative space-time: Strong interactions included, Eur. Phys. J. C 42 (2005) 499 [arXiv:hep-ph/0503064].
[5] P. Aschieri, B. Jurčo, P. Schupp and J. Wess, Non-Commutative GUTs, Standard Model and C,P,T, Nucl. Phys. B651 (2003) 45 [arXiv:hep-th/0205214].
[6] W. Behr, N.G. Deshpande, G. Duplančić, P. Schupp, J. Trampetić and J.
Wess, The Z → γγ, gg Decays in the Noncommutative Standard Model, Eur. Phys. J. C29 (2003) 441 [arXiv:hep-ph/0202121]; G. Duplančić, P. Schupp and J. Tram- petić, Comment on triple gauge boson interactions in the non-commutative elec- troweak sector, Eur. Phys. J. C32 (2003) 141 [arXiv:hep-ph/0309138]. [7] M. Buric, D. Latas, V. Radovanovic and J. Trampetic, Improved Z → γγ de- cay in the renormalizable gauge sector of the non-commutative standard model, [arXiv:hep-ph/0611299]. [8] F. Brandt, C.P. Martin and F. Ruiz Ruiz, Anomaly freedom in Seiberg-Witten non-commutative gauge theories, JHEP 07 (2003) 068 [arXiv:hep-th/0307292]. [9] M. Buric, D. Latas and V. Radovanovic, Renormalizability of non-commutative SU(N) gauge theory, JHEP 0602 (2006) 046 [arXiv:hep-th/0510133]. [10] M. Buric, V. Radovanovic and J. Trampetic, The one-loop renormalization of the gauge sector in the θ-expanded non-commutative standard model, JHEP 03 (2007) 030 [arXiv:hep-th/0609073]. [11] R. Wulkenhaar, Non-Renormalizability Of Theta-Expanded Noncommutative QED, JHEP 0203 (2002) 024 [arXiv:hep-th/0112248]. [12] M. Buric and V. Radovanovic, Non-renormalizability of non-commutative SU(2) gauge theory, JHEP 0402 (2004) 040 [arXiv:hep-th/0401103]; [13] J. Trampetić, Rare and forbidden decays, Acta Phys. Polon. B33 (2002) 4317 [hep-ph/0212309]; B. Melic, K. Passek-Kumericki and J. Trampetic, Quarkonia decays into two photons induced by the space-time non-commutativity, Phys. Rev. D 72 (2005) 054004 [arXiv:hep-ph/0503133]; K → pi gamma decay and space-time non-commutativity, Phys. Rev. D 72 (2005) 057502 [arXiv:hep-ph/0507231]; [14] A. Alboteanu, T. Ohl and R. Ruckl, Collider tests of the non-commutative stan- dard model, PoS HEP2005 (2006) 322 [arXiv:hep-ph/0511188]; Probing the non- commutative standard model at hadron collider, Phys. Rev. D 74, 096004 (2006) [arXiv:hep-ph/0608155]. [15] G. ’t Hooft, An algorithm for the poles at dimension four in the dimensional regularization procedure, Nucl. 
Phys. B 62 (1973) 444. [16] M. E. Peskin and D. V. Schroeder, An introduction to Field Theory, Perseus Books, Reading 1995. [17] D. Latas, V. Radovanovic and J. Trampetic, Non-commutative SU(N) gauge the- ories and asymptotic freedom, arXiv:hep-th/0703018. [18] C. P. Martin, D. Sanchez-Ruiz and C. Tamarit, The noncommutative U(1) Higgs-Kibble model in the enveloping-algebra formalism and its renormalizabil- ity, arXiv:hep-th/0612188. [19] CMS Physics Technical Design Report, Vol.1. CERN/LHCC 2006-001. [20] M. Pieri et al., CMS Note 2006/112. [21] A. Nikitenko, private communications. [22] I. Mocioiu, M. Pospelov and R. Roiban, Low-energy limits on the antisymmetric tensor field background on the brane and on the non-commutative scale, Phys. Lett. B 489, 390 (2000) [arXiv:hep-ph/0005191]; [23] N. Seiberg, L. Susskind and N. Toumbas, Space/time non-commutativity and causality, JHEP 0006, 044 (2000) [arXiv:hep-th/0005015]. [24] J. Gomis and T. Mehen, Space-time noncommutative field theories and unitarity, Nucl. Phys. B 591, 265 (2000) [arXiv:hep-th/0005129]. [25] S. M. Carroll, J. A. Harvey, V. A. Kostelecky, C. D. Lane and T. Okamoto, Noncommutative field theory and Lorentz violation, Phys. Rev. Lett. 87, 141601 (2001) [arXiv:hep-th/0105082]. ABSTRACT We propose the Z -> gamma gamma decay, a process strictly forbidden in the standard model, as a signal suitable for the search of noncommutativity of coordinates at very short distances. We compute the Z -> gamma gamma partial widthin the framework of the recently proposed renormalizable gauge sector of the noncommutative standard model. The one-loop renormalizability is obtained for the model containing the usual six representations of matter fields of the first generation. Even more, the noncommutative part is finite or free of divergences, showing that perhaps new interaction symmetry exists in the noncommutative gauge sector of the model. 
Discovery of such symmetry would be of tremendous importance in further search for the violation of the Lorentz invariance at very high energies. Experimental possibilities of Z -> gamma gamma decay are analyzed and a firm bound to the scale of the noncommutativity parameter is set around 1 TeV. <|endoftext|><|startoftext|> Introduction

Quantum electrodynamics (QED) was the first quantum field theory to be formulated and has successfully passed every experimental test at low and intermediate fields. A well-known example of QED effects at low fields (∼ 10^9 V/cm) is the Lamb shift in hydrogen [1]. At low fields, the QED effects (self-energy and vacuum polarisation) can still be treated as a perturbation, only taking into account lower order terms [2]. However, up to now QED calculations have never been tested at high fields (∼ 10^15 V/cm) because such fields cannot be produced in a laboratory, nor by the strongest lasers available. At high fields, perturbative QED is no longer valid and higher order terms become important as well [2]. Experiments carried out at high fields therefore test different aspects of QED calculations and are complementary to high-precision tests of the lower order terms.

∗ Corresponding author. Email address: d.winters@gsi.de (D.F.A. Winters).

Preprint submitted to Canadian Journal of Physics 4 November 2018 http://arxiv.org/abs/0704.0560v2

Heavy atoms that have been stripped of almost all their electrons, the so-called highly-charged ions (HCI), are ideal 'laboratories' for tests of QED at high fields. These ions have, for example, electric field strengths of the order of 10^15 V/cm close to the nucleus [2] and can be produced at high velocities at the Gesellschaft für Schwerionenforschung (GSI) in Darmstadt, Germany.
At the HITRAP facility, which is currently being built at GSI, ions coming from the experimental storage ring (ESR) with MeV energies will be slowed down by linear and radiofrequency stages to keV kinetic energies, trapped and cooled down to sub-eV energies, and finally made available for experiments. Within the HITRAP project, instrumentation is being developed for high-precision measurements of atomic and nuclear properties, mass and g-factor measurements and ion-atom and ion-surface interaction studies [3,4,5].

2. Hydrogen- and lithium-like ions

Hydrogen- and lithium-like ions are the best candidates for our studies, since they have s-electrons which are very close to the nucleus. The (higher order) QED effects are most pronounced at the high fields close to the nucleus, therefore the best measurable quantity is the ground state hyperfine splitting (HFS). Due to the simple electronic structure of H- and Li-like species, accurate (higher order) calculations of ground state HFS can be done, which will then be compared with accurate experimental results.

As a first approximation, good within about 4%, the energy of the (1s) ^2S_{1/2} ground state HFS of hydrogen-like ions is given by [2,6]:

$E_{HFS} = \frac{4}{3}\,\alpha (Z\alpha)^3\, g_I\, \frac{m_e}{m_p}\, \frac{(2I+1)}{2}\, m_e c^2\, A_s (1-\delta)$ (1)

where α is the fine structure constant, g_I = µ/(µ_N I) is the nuclear g-factor (with µ the nuclear magnetic moment and µ_N the nuclear magneton), I the nuclear spin, m_e and m_p are the electron and proton mass, respectively, and c is the speed of light. Equation (1) represents the normal ground state HFS multiplied by a correction A_s for the relativistic energy of the s-electron, and by a factor (1 − δ), which takes the 'Breit-Schawlow' (BS) effect into account. The BS effect is due to the spatial distribution of the nuclear charge. It corrects for the fact that we cannot assume a homogeneous charge distribution over a spherical nucleus. The values for δ were taken from [6], those for g_I and I from [7].
In principle eq.(1) should also contain a correction for the finite nuclear mass, but since this correction is very small it can be neglected [2]. The energy of the ($1s^2 2s$) $^2S_{1/2}$ ground state HFS of lithium-like ions only differs from eq.(1) by a factor $1/n^3 = 1/8$ and by the $A_s$-value [2].

However, eq.(1) requires two further important corrections, the one of most interest to us being that which corrects for the QED effects. The other correction takes the ‘Bohr-Weisskopf’ (BW) effect into account [8]. The BW effect is due to the spatial distribution of the nuclear magnetisation and is only known with an accuracy of 20-30 %, which is mainly due to the single-particle model used for its calculation [9]. Unfortunately, the QED effects are of the same order of magnitude as the uncertainty in the BW effect [10]. Thus, from a HFS measurement of a single species (i.e. H- or Li-like) the QED effects cannot be determined accurately.

Equation (1) can also be written as $E^{1s}_{HFS} = C^{1s} + E^{1s}_{QED}$, where the constant $C^{1s}$ includes everything except the QED effects. Since the equations for the (1s) and (2s) states are so similar, it is possible to write the difference between the two HFS as $\Delta E_{HFS} = E^{2s}_{HFS} - \xi E^{1s}_{HFS} = E_{non-QED} + E_{QED}$ [10]. The factor ξ only contains non-QED terms and can be calculated to a high precision [10]. From the difference between the HFS measurements of H- and Li-like ions of the same isotope, the QED effects can thus be determined within a few percent. However, this requires measurements of transition wavelengths with an experimental resolution of the order of $10^{-6}$.

The transition lifetime t is defined as $t = A^{-1}$ (see e.g. [11]). The transition probability A, for an M1 transition from the excited to the lower hyperfine state, is given by [2]

$$A = \frac{4\alpha (2\pi\nu)^3 \hbar^2 I (2\kappa+1)^2}{27\, m_e^2 c^4 (2I+1)} \qquad (2)$$

where ħ is Planck’s constant divided by 2π and κ is related to the electron’s angular momentum [2]. From eq.(2) it can be seen that A scales with the transition frequency as $\nu^3$, whereas ν is proportional to $Z^3$, see eq.(1). Therefore, the transition lifetime scales with Z as $t \propto Z^{-9}$ and is roughly of the order of milliseconds for Z > 70.

Table 1
Calculated HFS transition wavelengths (λ) and lifetimes (t) of the most interesting ion species for systematic studies. Also shown are the nuclear spin (I) and magnetic moment (µ), taken from [7]. The half-lives of these species are longer than 10 minutes. (The values listed are truncated and the QED and BW effects are not included.)

element          ion         type           λ (nm)   t (ms)   I     µ (µN)
lead             207Pb81+    H-like          973      45      1/2   0.59
bismuth          209Bi82+    H-like          239      0.38    9/2   4.11
                 209Bi80+    Li-like        1469      87
protactinium     231Pa90+    H-like          262      0.64    3/2   2.01
                 231Pa88+    Li-like        1511     123
lead [12]        207Pb+      P3/2 - P1/2     710      41      1/2   0.59
chlorine [13]    35Cl+       3P2 - 1D2       858      -       3/2   0.82
                             3P1 - 1D2       913      -
argon [14]       37Ar2+      3P2 - 1D2       714      -       7/2   1.3
                             3P1 - 1D2       775      -

In table 1 the calculated transition wavelengths (λ) and lifetimes (t), together with their corresponding I and µ values, of the most interesting species for our laser spectroscopy studies are listed. (The QED and BW effects are not taken into account.) The half-lives of these species exceed 10 minutes, which corresponds to the minimum time required for a measurement. Although the wavelengths span a broad range, roughly from 200 to 1600 nm, these transitions are still accessible with standard laser systems.

The three species (Pb [12], Cl [13] and Ar [14]) at the bottom of the table are considered as candidates for pilot experiments. They are singly charged ions, which are easily produced, have M1 transitions at convenient wavelengths, and can be used to test the laser spectroscopy part of the experiment. A measurement of the HFS in 207Pb+ is of special interest, because it will be possible to extract the value of µ. Currently two different values exist, which unfortunately leads to a 2% difference in the HFS calculations [15].
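The lifetimes of the H-like species in table 1 can be cross-checked numerically from eq.(2). The following is only a minimal sketch: the function name is hypothetical, the constants are standard CODATA values, and the identification κ = sqrt(1 − (Zα)²) for the 1s state is an assumption that is not stated in the text. Under that assumption the sketch returns values close to the tabulated lifetimes for 207Pb81+, 209Bi82+ and 231Pa90+.

```python
import math

# Physical constants (SI units, CODATA values)
ALPHA = 7.2973525693e-3   # fine structure constant
HBAR = 1.054571817e-34    # reduced Planck constant, J s
M_E = 9.1093837015e-31    # electron mass, kg
C = 2.99792458e8          # speed of light, m/s


def m1_lifetime_ms(Z, I, wavelength_nm):
    """Lifetime t = 1/A in milliseconds from eq.(2), for H-like (1s) ions.

    ASSUMPTION (not from the text): kappa = sqrt(1 - (Z*alpha)^2) for 1s.
    """
    kappa = math.sqrt(1.0 - (Z * ALPHA) ** 2)
    omega = 2.0 * math.pi * C / (wavelength_nm * 1e-9)  # equals 2*pi*nu
    # A = 4 alpha (2 pi nu)^3 hbar^2 I (2 kappa + 1)^2 / (27 m_e^2 c^4 (2I + 1))
    A = (4.0 * ALPHA * omega ** 3 * HBAR ** 2 * I * (2.0 * kappa + 1.0) ** 2
         / (27.0 * M_E ** 2 * C ** 4 * (2.0 * I + 1.0)))
    return 1e3 / A


if __name__ == "__main__":
    # (Z, I, wavelength in nm) taken from the H-like rows of table 1
    for name, Z, I, lam in [("207Pb81+", 82, 0.5, 973.0),
                            ("209Bi82+", 83, 4.5, 239.0),
                            ("231Pa90+", 91, 1.5, 262.0)]:
        print(name, round(m1_lifetime_ms(Z, I, lam), 3), "ms")
```

The Li-like entries are not covered here, since they would need the corresponding 2s factor, which the text does not give.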
In principle, similar experiments could be carried out with metastable hafnium (180Hf, level energy 1141 keV, half-life 5.5 h [7]). For H-like hafnium, the transition values are λ = 217 nm and t = 0.25 ms. For the Li-like ion, λ = 1434 nm and t = 72 ms are obtained. The difficulty with this isotope is that its nucleus is in an excited state, which is difficult to produce.

Figure 1 shows the calculated transition wavelengths of all H-like lead, and all H- and Li-like bismuth isotopes with half-lives exceeding 10 minutes. (The QED and BW effects are not included.) The isotopes are labelled by their masses (in u) and the stable isotopes (207Pb and 209Bi) are indicated by the small arrows. For Pb, only the H-like isotopes are accessible with standard lasers, because the transition wavelengths of Li-like isotopes are much longer than 1600 nm. For Bi, many isotopes of both ion species are accessible, although their transition wavelengths differ considerably.

Fig. 1. Calculated transition wavelengths for H-like Pb and Bi isotopes (full circles), and Li-like Bi isotopes (open circles). Only isotopes with half-lives exceeding 10 minutes are shown. The small arrows indicate the stable isotopes, the numbers are the masses in u. (The QED and BW effects are not included.)

From Fig.1 it is clear that both elements offer many candidates for laser spectroscopy measurements of ground state HFS and that bismuth, in particular, allows for a systematic study of the (higher order) QED effects at high fields. Furthermore, a systematic study of different isotopes of the same species, for example a study of the H-like Pb isotopes, will make it possible to study trends in nuclear properties across a range of isotopes.

There already exist two previous measurements of the 2s ground state HFS in 209Bi. A direct measurement [16] was carried out using the ESR at GSI (Darmstadt), but unfortunately no resonance was found at the predicted value of ≈ 1554 nm [10]. An indirect measurement [17] was performed in an electron beam ion trap (EBIT) and yielded a value of ≈ 1512 nm, but the error in the measurement was rather large (≈ 50 nm). In the ESR the ions have relativistic velocities (≈ 200 MeV/u), which are used to shift the transition wavelength to a lower value (≈ 532 nm), and the transitions are Doppler-broadened (≈ 40 GHz). In the EBIT the ions have temperatures of several hundreds of eV (∼ $10^6$ K), which lead to considerable Doppler broadening (≈ 10 GHz). The resolution obtained in previous measurements at the ESR is of the order of $10^{-4}$, whereas that of the EBIT measurement is of the order of $10^{-2}$.

3. Experiment overview

A detailed description of the proposed experiments, as well as a treatment of the techniques used, can be found elsewhere [18,19]. Briefly, an externally produced bunch of roughly $10^5$ HCI at an energy of a few keV is loaded into a cylindrical open-endcap Penning trap [20] on axis, i.e. along the magnetic field lines. Electron capture (neutralisation) by collisions is strongly reduced by operating the trap at cryogenic temperatures under UHV conditions. The HCI are captured in flight, confined, cooled by ‘resistive cooling’ [21] and radially compressed by a ‘rotating wall’ [22] technique. After these steps a cold and dense ion cloud is obtained. The spectroscopy laser enters the trap axially through an open endcap and will fully irradiate the ion cloud. The fluorescence from the excited HCI is detected perpendicular to the cooled axial motion (trap axis) through segmented ring electrodes, which are covered by a highly transparent copper mesh. (The ring is segmented for the rotating wall technique.)
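The Doppler widths quoted for the EBIT measurement, and for the cold ions in the trap discussed in this section, can be estimated with the standard thermal Doppler formula, Δν = ν sqrt(8 ln2 kT / (m c²)). The sketch below is illustrative only: the function name is hypothetical, and the ion mass is approximated by the mass number times the atomic mass unit, which is an assumption.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
U = 1.66053906660e-27    # atomic mass unit, kg
C = 2.99792458e8         # speed of light, m/s


def doppler_fwhm_hz(nu_hz, temperature_k, mass_u):
    """Thermal Doppler FWHM (Hz) of a line at frequency nu_hz.

    ASSUMPTION: Maxwell-Boltzmann velocity distribution, ion mass = mass_u * u.
    """
    return nu_hz * math.sqrt(8.0 * math.log(2.0) * K_B * temperature_k
                             / (mass_u * U * C ** 2))


if __name__ == "__main__":
    # Li-like 209Bi in an EBIT: lambda ~ 1512 nm, T ~ 1e6 K
    nu_ebit = C / 1512e-9
    print("EBIT:", doppler_fwhm_hz(nu_ebit, 1e6, 209) / 1e9, "GHz")
    # H-like 207Pb in the Penning trap: nu ~ 3e14 Hz, T = 4 K
    print("trap:", doppler_fwhm_hz(3e14, 4.0, 207) / 1e6, "MHz")
```

With these inputs the estimate comes out near 10 GHz for the EBIT conditions and near 30 MHz for 207Pb81+ at 4 K, in line with the values quoted in the text.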
The above-mentioned transition lifetimes imply that, for a detection efficiency of ∼ $10^{-3}$, acceptable fluorescence rates, up to a few thousand counts per second, from M1 transitions can be expected from a cloud of $10^5$ ions (∼ 3 mm diameter) [18,19]. Confining the HCI in a trap, and cooling and compressing the cloud, will thus enable fluorescence detection and ensure long interrogation times by the laser.

However, due to the high density of HCI in the cloud, space charge effects will play a role and will lead to shifts of the motional frequencies of the trapped ions. We have studied this effect in detail and understand the corresponding frequency shifts well [23]. Since these shifts are fairly small, the (frequency dependent) cooling and compression techniques can still be applied.

The HCI also need to be strongly cooled to reduce Doppler broadening of the transitions. This will be achieved by resistive cooling of the (axial) ion motion in the trap. For example, for the F = 1 → F = 0 transition in 207Pb81+ at ν ≈ 3 × $10^{14}$ Hz, the Doppler broadened linewidth at a temperature of 4 K is $\Delta\nu_D$ ≈ 3 × $10^7$ Hz. The anticipated resolution is therefore of the order of $10^7/10^{14} = 10^{-7}$. This is three orders of magnitude better than any previous measurement, see e.g. [15,24], and good enough to measure the QED effects within a few percent.

4. Acknowledgments

This work is supported by the European Commission within the framework of the HITRAP project (HPRI-CT-2001-50036). W.N. acknowledges funding by the Helmholtz Association (VH-NG-148).

References

[1] W.E. Lamb Jr. and R.C. Retherford, Phys. Rev. 72, (1947) 241.
[2] T. Beier, Phys. Rep. 339, (2000) 79.
[3] W. Quint, J. Dilling, S. Djekic, H. Häffner, N. Hermanspahn, H.-J. Kluge, G. Marx, R. Moore, D. Rodriguez, J. Schönfelder, G. Sikler, T. Valenzuela, J. Verdú, C. Weber and G. Werth, Hyp. Int. 132, (2001)
[4] T. Beier, L. Dahl, H.-J. Kluge, C. Kozhuharov, W. Quint and the HITRAP collaboration, Nucl. Instr. Meth. Phys. Res. B 235, (2005) 473.
[5] H.-J. Kluge, T. Beier, K. Blaum, M. Block, L. Dahl, S. Eliseev, F. Herfurth, S. Heinz, O. Kester, C. Kozhuharov, T. Kühl, G. Maero, W. Nörtershäuser, T. Stöhlker, W. Quint, G. Vorobjev, G. Werth, and the HITRAP Collaboration, Proceedings of the Memorial Symposium for Gerhard Soff, Topics in Heavy Ion Physics (Eds Walter Greiner and Joachim Reinhardt), pages 89-101 (2005), EP Systema (Budapest).
[6] V.M. Shabaev, J. Phys. B 27, (1994) 5825.
[7] R.B. Firestone and V.S. Shirley, Table of Isotopes (Appendix E), Wiley (1998).
[8] A. Bohr and V.F. Weisskopf, Phys. Rev. 77, (1950) 94.
[9] V.M. Shabaev, M. Tomaselli, T. Kühl, A.N. Artemyev and V.A. Yerokhin, Phys. Rev. A 56, (1997) 252.
[10] V.M. Shabaev, A.N. Artemyev, V.A. Yerokhin, O.M. Zherebtsov and G. Soff, Phys. Rev. Lett. 86, (2001) 3959.
[11] W. Demtröder, Laser Spectroscopy, Springer, New York (1996).
[12] A. Roth, Ch. Gerz, D. Wilsdorf and G. Werth, Z. Phys. D 11, (1989) 283.
[13] I.S. Bowen, Astrophys. J. 132, (1960) 1.
[14] M.H. Prior, Phys. Rev. A 30, (1984) 3051.
[15] P. Seelig, S. Borneis, A. Dax, T. Engel, S. Faber, M. Gerlach, C. Holbrow, G. Huber, T. Kühl, D. Marx, K. Meier, P. Merz, W. Quint, F. Schmitt, M. Tomaselli, L. Völker, H. Winter, M. Würtz, K. Beckert, B. Franzke, F. Nolden, H. Reich, M. Steck and T. Winkler, Phys. Rev. Lett. 81, (1998) 4824.
[16] S. Borneis, A. Dax, T. Engel, C. Holbrow, G. Huber, T. Kühl, D. Marx, P. Merz, W. Quint, F. Schmitt, P. Seelig, M. Tomaselli, H. Winter, K. Beckert, B. Franzke, F. Nolden, H. Reich and M. Steck, Hyp. Int. 127, (2000) 305.
[17] P. Beiersdorfer, A.L. Osterheld, J.H. Scofield, J.R. Crespo López-Urrutia and K. Widmann, Phys. Rev. Lett. 80, (1998) 3022.
[18] D.F.A. Winters, A.M. Abdulla, J.R. Castrejón Pita, A. de Lange, D.M. Segal and R.C. Thompson, Nucl. Instr. Meth. Phys. Res. B 235, (2005) 201.
[19] M. Vogel, D.F.A. Winters, D.M. Segal and R.C. Thompson, Rev. Sci. Instrum. 76, (2005) 103102.
[20] G. Gabrielse, L.
Haarsma and S.L. Rolston, Int. J. Mass Spectr. Ion Proc. 88, (1989) 319.
[21] D.J. Wineland and H.G. Dehmelt, J. Appl. Phys. 46, (1975) 919.
[22] W.M. Itano, J.J. Bollinger, J.N. Tan, B. Jelenković, X.-P. Huang and D.J. Wineland, Science 279, (1998)
[23] D.F.A. Winters, M. Vogel, D.M. Segal and R.C. Thompson, J. Phys. B: At. Mol. Opt. Phys. 39, (2006) 3131.
[24] I. Klaft, S. Borneis, T. Engel, B. Fricke, R. Grieser, G. Huber, T. Kühl, D. Marx, R. Neumann, S. Schröder, P. Seelig and L. Völker, Phys. Rev. Lett. 73, (1994) 2425.

ABSTRACT An overview is presented of laser spectroscopy experiments with cold, trapped, highly-charged ions, which will be performed at the HITRAP facility at GSI in Darmstadt (Germany). These high-resolution measurements of ground state hyperfine splittings will be three orders of magnitude more precise than previous measurements. Moreover, from a comparison of measurements of the hyperfine splittings in hydrogen- and lithium-like ions of the same isotope, QED effects at high electromagnetic fields can be determined within a few percent. Several candidate ions suited for these laser spectroscopy studies are presented. <|endoftext|><|startoftext|>

ABSTRACT We study a loop of Josephson junctions that is quenched through its critical temperature.
For three or more junctions, symmetry breaking states can be achieved without thermal activation, in spite of the fact that the relaxation time is practically constant when the critical temperature is approached from above. The probability for these states decreases with quenching time, but the dependence is not allometric. For a large number of junctions, cooling does not have to be fast. For this case, we evaluate the standard deviation of the induced flux. Our results are consistent with the available experimental data. <|endoftext|><|startoftext|>

Title Frequency modulation Fourier transform spectroscopy Mandon, Guelachvili, Picqué, 2007

Frequency modulation Fourier transform spectroscopy

Julien Mandon, Guy Guelachvili, Nathalie Picqué
Laboratoire de Photophysique Moléculaire, CNRS; Univ. Paris-Sud, Bâtiment 350, 91405 Orsay, France

Corresponding author: Dr. Nathalie Picqué, Laboratoire de Photophysique Moléculaire, Unité Propre du CNRS, Université Paris Sud, Bâtiment 350, 91405 Orsay Cedex, France
Phone number: +33 1 69156649
Fax number: +33 1 69157530
Email: nathalie.picque@ppm.u-psud.fr
Web: http://www.laser-fts.org

Abstract: A new method, FM-FTS, combining Frequency Modulation heterodyne laser spectroscopy and Fourier Transform Spectroscopy is presented. It provides simultaneous sensitive measurement of absorption and dispersion profiles with broadband spectral coverage capabilities. Experimental demonstration is made on the overtone spectrum of C2H2 in the 1.5 µm region.
OCIS codes: 120.6200 Spectrometers and spectroscopic instrumentation; 300.6300 Spectroscopy, Fourier transforms; 300.6380 Spectroscopy, modulation; 300.6360 Spectroscopy, laser; 300.6310 Spectroscopy, heterodyne; 300.6390 Spectroscopy, molecular; 120.5060 Phase modulation

Improving sensitivity is presently one of the major concerns of spectroscopists. It may be achieved both by enhancing the intrinsic signal and by reducing the background noise. In the latter case, modulation has been one of the most effective approaches. In particular, Frequency Modulation (FM) absorption spectroscopy [1] has reached detection sensitivity close to the fundamental quantum noise limit, by shifting the modulation frequency of the measurements to a range where the 1/f noise becomes negligible. Moreover, FM spectroscopy benefits from high-speed detection and simultaneous measurement of absorption and dispersion signals. Since Bjorklund’s first demonstrations [1,2] of the efficiency of FM spectroscopy with a single-mode continuous-wave dye laser, the technique has been widely used as a tunable laser spectroscopic method in fields such as laser stabilization [3], two-photon spectroscopy [4], optical heterodyne saturation spectroscopy [5] and trace gas detection [6]. In most schemes, the laser wavelength is scanned across the atomic/molecular resonance to retrieve the line shape. More rarely, the modulation frequency is tuned. In both cases, however, the measurements are limited to narrow spectral ranges. This letter reports the first results in FM broadband spectroscopy.
This work is motivated by our ongoing effort to implement a new spectroscopic approach that simultaneously delivers sensitivity, resolution, accuracy, broad spectral coverage and rapid acquisition. The basic idea, named FM-FTS, is to combine the advantages of FM spectroscopy and high-resolution Fourier transform spectroscopy (FTS). FTS is able to record extended ranges at once, with no spectral restriction. In particular it gives easy access to the infrared domain. In this letter, a new way of modulating the interferogram is implemented. The key concept is that a radio frequency (RF) modulation is performed. The beat signal at the output port of the Fourier transform spectrometer is modulated at a constant RF, which is about $10^4$ times greater than the audio frequency generally delivered by the interferometer optical conversion. Together with the advantage, over classical FTS, of measurements performed at much higher frequency, our approach benefits from synchronous detection and from the simultaneous acquisition of both the absorption and the dispersion of the recorded profiles.

The experimental principle is presented in Fig. 1. The light emitted by the broadband source first passes through the interferometer. The output beam is then phase-modulated by an electro-optic modulator (EOM) before entering the absorption cell and falling on the fast detector. The synchronous detection of the detector signal is performed by the lock-in amplifier at the EOM driver reference frequency $f_m$. Recorded data are finally stored on the computer disk with their corresponding path difference position ∆. Their Fourier transform is the spectrum.

In more detail, the electric field E at the output of the interferometer may be written as:

$$E(\Delta,t) = \int E_0(\omega_c)\left[1+\exp\left(-i\frac{\omega_c\Delta}{c}\right)\right]\exp(i\omega_c t)\,d\omega_c + \mathrm{c.c.} \qquad (1)$$

where $E_0$ is the electric field amplitude of the source at the optical angular frequency $\omega_c$, c is the velocity of light and c.c. is the complex conjugate of the preceding expression in Eq. 1. The EOM is assumed to act on the beam with a low modulation index M. As a consequence, each carrier wave of angular frequency $\omega_c$ has two weak sidebands located at $\pm\omega_m = \pm 2\pi f_m$. Equation (1) becomes:

$$E(\Delta,t) = \int E_0(\omega_c)\left[1+\exp\left(-i\frac{\omega_c\Delta}{c}\right)\right]\Big\{\exp(i\omega_c t) + M\exp\left[i(\omega_c+\omega_m)t\right] - M\exp\left[i(\omega_c-\omega_m)t\right]\Big\}\,d\omega_c + \mathrm{c.c.} \qquad (2)$$

When interacting with the gas, the carrier and the sidebands experience attenuation and phase shift due to absorption and dispersion. Following the notation introduced in [1], this interaction may be written as exp(−δ(ω) − iφ(ω)), where δ is the amplitude attenuation and φ is the phase shift. The following convention is adopted: $\delta_n$ and $\phi_n$ denote, for n = 0, ±1, the respective components at $\omega_c$ and $\omega_c \pm \omega_m$. Then Eq. 2 may be written:

$$E(\Delta,t) = \int E_0(\omega_c)\left[1+\exp\left(-i\frac{\omega_c\Delta}{c}\right)\right]\Big\{\exp(-\delta_0-i\phi_0)\exp(i\omega_c t) + M\exp(-\delta_{+1}-i\phi_{+1})\exp\left[i(\omega_c+\omega_m)t\right] - M\exp(-\delta_{-1}-i\phi_{-1})\exp\left[i(\omega_c-\omega_m)t\right]\Big\}\,d\omega_c + \mathrm{c.c.} \qquad (3)$$

The intensity I detected by the fast photodetector is proportional to:

$$I(\Delta,t) \propto E(\Delta,t)\,E^*(\Delta,t). \qquad (4)$$

$$
\begin{aligned}
I(\Delta,t) \propto \int \Big(
& \left[\exp(-2\delta_0) + M^2\exp(-2\delta_{+1}) + M^2\exp(-2\delta_{-1})\right]\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right] \\
& + 2M\cos(\omega_m t)\left[\exp(-\delta_0-\delta_{+1})\cos(\phi_{+1}-\phi_0) - \exp(-\delta_0-\delta_{-1})\cos(\phi_0-\phi_{-1})\right]\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right] \\
& + 2M\sin(\omega_m t)\left[\exp(-\delta_0-\delta_{+1})\sin(\phi_{+1}-\phi_0) - \exp(-\delta_0-\delta_{-1})\sin(\phi_0-\phi_{-1})\right]\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right] \\
& - 2M^2\cos(2\omega_m t)\exp(-\delta_{+1}-\delta_{-1})\cos(\phi_{+1}-\phi_{-1})\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right] \\
& - 2M^2\sin(2\omega_m t)\exp(-\delta_{+1}-\delta_{-1})\sin(\phi_{+1}-\phi_{-1})\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right]
\Big)\,d\omega_c. \qquad (5)
\end{aligned}
$$

After synchronous detection at the frequency $f_m$, and with the assumption that $|\delta_0-\delta_j| \ll 1$ and $|\phi_0-\phi_j| \ll 1$ (with j = ±1), the in-phase $I_{\cos}(\Delta)$ and the in-quadrature $I_{\sin}(\Delta)$ parts of the electric signal are given by

$$I_{\cos}(\Delta) \propto M\int\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right]\exp(-2\delta_0)\left(\delta_{-1}-\delta_{+1}\right)d\omega_c. \qquad (6)$$

$$I_{\sin}(\Delta) \propto M\int\left[1+\cos\left(\frac{\omega_c\Delta}{c}\right)\right]\exp(-2\delta_0)\left(\phi_{+1}+\phi_{-1}-2\phi_0\right)d\omega_c. \qquad (7)$$

In summary, two interferograms are measured simultaneously, which yields broadband FM spectra. The in-phase interferogram provides spectrally resolved information on the difference of absorption experienced by each group of two sidebands. The in-quadrature interferogram gives the difference between the average of the dispersions experienced by the sidebands and the dispersion undergone by each carrier.

For this first experimental demonstration, a narrow-band emission source covering 0.25 cm$^{-1}$ (7.5 GHz) has been implemented as a test source. It is made of a fiber-coupled distributed feedback laser diode emitting around 1530 nm with an output power of a few mW. The current of the laser diode is modulated at about 20 Hz by a ramp generator. At each path difference step, while the interferometer is recording one interferogram sample, the laser frequency excursion is equal to 7.5 GHz, corresponding to one period of the triangular ramp. Consequently, for the interferometer, the laser diode behaves as a continuous emission source emitting over 0.25 cm$^{-1}$. The interferometer output light is phase-modulated at $f_m$ = 150 MHz by the EOM and passes through an 80-cm cell filled at 10 hPa with acetylene in natural abundance. The light is next focused on an InGaAs nanosecond infrared photodetector, which according to Eq. 5 delivers a signal proportional to the intensity of the beam containing a beat signal at the RF modulation frequency.
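The integrands of Eqs. (6) and (7) can be illustrated for a single carrier probing one isolated line. The sketch below is only indicative: the function names are hypothetical, and the Lorentzian forms δ = δ_peak/(1+x²) and φ = δ_peak·x/(1+x²), with x the detuning in units of the half width, are an assumed line model in the spirit of the single-line FM analysis of [1,2], not something taken from this letter. The line width used in the example is likewise an arbitrary illustrative value.

```python
def attenuation(nu, nu0, hwhm, peak):
    """Amplitude attenuation delta(nu) of an assumed Lorentzian line."""
    x = (nu - nu0) / hwhm
    return peak / (1.0 + x * x)


def phase_shift(nu, nu0, hwhm, peak):
    """Phase shift phi(nu) of the same assumed Lorentzian line."""
    x = (nu - nu0) / hwhm
    return peak * x / (1.0 + x * x)


def fm_signals(nu_c, f_m, nu0, hwhm, peak):
    """Per-carrier integrands of Eqs. (6) and (7), up to common factors.

    Returns (delta_-1 - delta_+1, phi_+1 + phi_-1 - 2*phi_0).
    """
    in_phase = (attenuation(nu_c - f_m, nu0, hwhm, peak)
                - attenuation(nu_c + f_m, nu0, hwhm, peak))
    quadrature = (phase_shift(nu_c + f_m, nu0, hwhm, peak)
                  + phase_shift(nu_c - f_m, nu0, hwhm, peak)
                  - 2.0 * phase_shift(nu_c, nu0, hwhm, peak))
    return in_phase, quadrature


if __name__ == "__main__":
    # Frequencies in MHz relative to line centre; f_m = 150 MHz as in the text,
    # the 250 MHz half width and 0.01 peak attenuation are illustrative only.
    for detuning in (-400.0, -100.0, 0.0, 100.0, 400.0):
        print(detuning, fm_signals(detuning, 150.0, 0.0, 250.0, 0.01))
```

Both outputs vanish at line centre and change sign with the detuning, which is consistent with the derivative-like line shapes expected from FM detection.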
The amplified detector signal is mixed with the reference signal at $f_m$, down to d.c., using a commercial high-frequency dual-phase lock-in amplifier. The reference may be phase-shifted with respect to the signal used to drive the EOM. The two channels detected in-phase and in-quadrature are measured simultaneously.

Figure 2 shows a typical in-phase interferogram of C2H2. Its shape is characteristic of an interferogram of first-derivative type line-shapes. The 3 cm period amplitude modulation is due to the beat between the two strongest acetylene lines in the explored spectral domain. Figure 3 shows the two narrow-band spectra, the Fourier transforms of the in-phase (absorption) and in-quadrature (dispersion) interferograms. The spectral domain extension is limited by the tuning capabilities of the diode laser, which was used as a test source. This does not restrict the generality of the present demonstration. The lines belong to the $\nu_1+\nu_3$ and $\nu_1+\nu_3+\nu_5^1-\nu_5^1$ overtone bands of 12C2H2. The unapodized instrumental resolution, 12.5 × $10^{-3}$ cm$^{-1}$ (0.375 GHz), is narrower than the Doppler width of the lines. The signal-to-noise ratio is of the order of 1200. The total recording time, of the order of 15 minutes, is due to the need to adapt the interferometer recording mode to the rather slow frequency excursion of the laser diode.

The present validation of FM-FTS with a narrow-band light source made the experiment much simpler. Indeed, in wideband FTS, processing the signal of the interferogram needs special dynamic range solutions. Since the spectrum analysed in this experiment is only 0.25 cm$^{-1}$ wide, a sophisticated RF detection chain, presently under development, was not necessary. The design of our Connes-type interferometer allows a balanced detection of the signals recorded at the two output ports. This will be helpful to remove the part of the interferogram which is not modulated by path difference and consequently to improve the dynamic range of the measurements. Similar solutions have already been successfully practiced for time-resolved FTS [7]. In FM-FTS, they are formally even easier to implement since the signal may be band-pass filtered around the modulation radio frequency.

In the present experimental set-up, the light should sequentially reach the equipment parts as shown in Fig. 1. Briefly, to have a broadband equivalent of FM tunable laser spectroscopy, the sidebands generated by the EOM must not be resolved by the spectrometer. Also, since each carrier and its sidebands have to experience different attenuation and phase shift, the EOM must be placed before the cell containing the gas of interest. This matter will be discussed in more detail elsewhere.

This first FM-FTS experiment demonstrates the feasibility of coupling broadband laser sources, Fourier spectrometers and RF detection. This opens new perspectives in high-sensitivity multiplex spectroscopy. FM-FTS may be coupled to a large variety of high-brightness sources. These include broadband cw lasers, supercontinuum sources, mode-locked lasers as demonstrated recently [8], and Amplified Spontaneous Emission sources. Nonlinear frequency conversion may also be used when no laser source is available in the spectral range of interest. FM-FTS may be practiced with any kind of Fourier transform spectrometer, including commercially available instruments, at the expense of reasonable modifications in the signal detection scheme. The approach is also suitable at low spectral resolution. In such a case, modulation frequencies lying in the GHz domain may be used.

Moreover, FM-FTS induces new practices in Fourier transform spectroscopy. The modulation frequency is very high. The optical fringes generated by the interferometer can then be scanned at a much higher frequency than is usually practiced nowadays. A path-difference variation rate of the order of 1 m/s is easily achievable. It corresponds to acquisition times of the order of seconds, whereas at present the most efficient high-resolution interferometers need 1 to 10 hours to record interferograms. Additionally, due to the low étendue of the analysed laser beams in our method, miniaturized instruments may be implemented. In addition to the radio-frequency detection scheme, sensitivity may be further enhanced by using an external optical resonator, thus increasing the effective absorption length.

With FM-FTS, both the absorption and the dispersion associated with each spectral feature are measured simultaneously. Despite its recognized interest for the retrieval of lineshape parameters, traditional dispersion spectroscopy has been little developed, and only at low spectral resolution, mostly due to its experimental complexity. FM-FTS should represent an easy way of getting this information over extended spectral domains, which may stimulate new interest in the experimental investigation of dispersion profiles.

References

[1] G.C. Bjorklund, Frequency-modulation spectroscopy: a new method for measuring weak absorptions and dispersions, Optics Letters 5, 15-17 (1980).
[2] G.C. Bjorklund, M.D. Levenson, W. Lenth, C. Ortiz, Frequency-modulation (FM) spectroscopy. Theory of lineshapes and signal-to-noise analysis, Applied Physics B 32, 145-152 (1983).
[3] R.W.P. Drever, J.L. Hall, F.V. Kowalski, J. Hough, G.M. Ford, A.J. Munley and H. Ward, Laser phase and frequency stabilization using an optical resonator, Applied Physics B 31, 97-105 (1983).
[4] W. Zapka, M.D. Levenson, F.M. Schellenberg, A.C. Tam, G.C. Bjorklund, Continuous-wave Doppler-free two-photon frequency-modulation spectroscopy in Rb vapor, Optics Letters 8, 27-29 (1983).
[5] J.L. Hall, L. Hollberg, T. Baer, H.G. Robinson, Optical heterodyne saturation spectroscopy, Applied Physics Letters 39, 680-682 (1981).
[6] P. Maddaloni, P. Malara, G. Gagliardi, P. De Natale, Two-tone frequency modulation spectroscopy for ambient-air trace gas detection using a portable difference-frequency source around 3 µm, Applied Physics B 85, 219-22 (2006).
[7] N. Picqué, G. Guelachvili, High-information time-resolved Fourier transform spectroscopy at work, Applied Optics 39, 3984-3990 (2000).
[8] J. Mandon, G. Guelachvili, N. Picqué, Frequency Comb Spectrometry with Frequency Modulation, in preparation, 2007.

Figure captions

Fig. 1. Schematic of the experimental setup. (Blocks shown: broadband source, FTS, EOM, sample cell, detector, EOM driver, lock-in.)

Fig. 2. Absorption interferogram using in-phase RF detection with FM-FTS. Maximum path difference is 40 cm, corresponding to 12.5 × $10^{-3}$ cm$^{-1}$ unapodized resolution. (Horizontal axis: path difference, 0 to 40 cm; panel title: FM-FTS interferogram, absorption channel.)

Fig. 3. FM-FTS dispersion and absorption spectra of the acetylene molecule at 1528.6 nm. The middle plot represents the line relative intensities taken from the HITRAN database. (Horizontal axis: wavenumber, 6541.50 to 6542.00 cm$^{-1}$; panels: dispersion, absorption; band labels: $\nu_1+\nu_3+\nu_5^1-\nu_5^1$ and $\nu_1+\nu_3$.)
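The resolution and bandwidth figures quoted in the text and in the caption of Fig. 2 can be cross-checked with a few lines of arithmetic. This is a minimal sketch with hypothetical function names; it assumes the convention that the unapodized resolution equals 1/(2L) for a maximum path difference L, which is consistent with the 40 cm and 12.5 × 10^-3 cm^-1 values given above but is not stated explicitly in the letter.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def unapodized_resolution_cm1(max_path_difference_cm):
    """Resolution in cm^-1, ASSUMING the 1/(2L) convention."""
    return 1.0 / (2.0 * max_path_difference_cm)


def wavenumber_to_ghz(wavenumber_cm1):
    """Convert a wavenumber interval (cm^-1) to a frequency interval (GHz)."""
    return wavenumber_cm1 * C_CM_PER_S / 1e9


if __name__ == "__main__":
    res = unapodized_resolution_cm1(40.0)
    print("resolution:", res, "cm^-1 =", wavenumber_to_ghz(res), "GHz")
    print("source span: 0.25 cm^-1 =", wavenumber_to_ghz(0.25), "GHz")
```

Under that assumption the 40 cm path difference gives 0.0125 cm^-1, i.e. about 0.375 GHz, and the 0.25 cm^-1 source span corresponds to about 7.5 GHz.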
<|endoftext|><|startoftext|>

Introduction

There are strong reasons of theoretical coherence for critically reconsidering the approach to the cosmological problem as a whole. The main problem of Quantum Cosmology is to identify the proper boundary conditions for the wave function of the Universe in the Wheeler-DeWitt equation. These conditions have to allow a comparison between a probability distribution of states and the observed Universe. In particular, they are expected to select a path in configuration space able to solve the still open problems of the traditional Big-Bang scenario: flat space, global homogeneity (horizon problem) and the “ruggedness” necessary to explain the tiny initial inhomogeneities which led to the formation of the galactic structures. The ideas of inflationary cosmology have partly supplied a solution to the shortcomings of the standard model by introducing the notions of symmetry breaking and phase transition, which are at the core of Quantum Cosmology. The latter is also motivated by the need to give a satisfactory physical meaning to the initial singularity problem, unavoidable in GR under the conditions of the Hawking-Penrose theorem (Hawking & Ellis, 1973). The Hartle-Hawking “no-boundary” condition seems to provide a very powerful constraint for the main requirements of Quantum Cosmology, but it appears as an “ad hoc” solution, which one would rather deduce from a fundamental approach.
In particular, the mix of topologies used to reconcile the symmetry of a Universe without boundary with the evolutionary Big-Bang scenario is unsatisfactory. We realize that most of the problems of Quantum Cosmology inherit the uncertainties of the Friedmann model in GR, so that they derive from the heuristic use of local laws on the cosmic scale. A possible way out is the Fantappié-Arcidiacono group approach, which makes it possible to identify a model of the Universe without recourse to arbitrary extrapolations of the symmetry groups valid in physics. The group extension theory naturally recovers the Hartle-Hawking condition on the wave function of the Universe and provides a firm theoretical foundation for Quantum Cosmology. The price to pay is a subtle methodological question about the use of GR in cosmology. In fact, in 1952 Fantappié pointed out that the problem of using local laws to define the cosmological boundary conditions is due to the fact that GR describes matter in terms of local curvature, but leaves the question of the global structure of space-time undetermined. This happens because, unlike RR, GR has not been built on a group basis, which should be central in building any theory, especially one that aims to make universally valid statements about the physical world, the class of “superb” theories, as Roger Penrose called them.

Here we examine the foundations of the group extension method (par. 2) and relativity in the De Sitter Universe (par. 3, 4), and we introduce the conditions that define matter fields (par. 5). In par. 6 we analyze the physical significance of the observers in an instantonic Universe at imaginary time, and in par. 7 we investigate the physical meaning of a Hartle-Hawking condition in a hyper-spherical universe.

2. An Erlangen Program for Cosmology

In 1872 Felix Klein (1849-1925) presented the so-called Erlangen program for geometry, centred upon the group of symmetry transformations.
From 1952 onwards Fantappié, starting from a similar idea and in perfect consonance with the spirit of Relativity, proposed an Erlangen program for physics, in which a Universe is uniquely identified by the symmetry group which leaves its physical laws invariant (Fantappié, 1954, 1959). It has to be underlined that in this theory “Universe” means any physical system characterized by a symmetry group. The principle of isotropy and homogeneity of space-time with respect to physical laws tells us that the very concept of physical law is based upon symmetry. So the essential idea is to identify physical laws starting from the transformation group which leaves them invariant. We observe here that there are infinitely many possible transformation groups which identify an isotropic and homogeneous space-time. In order to build the next improvements in physics using the group extension method, we can follow the path indicated by the two groups we know to be valid levels of description of the physical world: the Galilei group and the Lorentz-Poincaré one. It is useful to remember that the Galilei group is a particular case of the Lorentz one when $c \to \infty$, i.e. when no use is made of the field notion and the velocity of the interactions is considered to be infinite. Staying within a four-dimensional space-time, and consequently considering only 10-parameter groups of continuous transformations, Fantappié showed that the Poincaré group can be considered a limit case of a broader group depending continuously on c and on another parameter r: the Fantappié group; moreover, this group cannot be further extended while remaining a 10-parameter group. So we have the sequence: $G^{10}_{3+1} \to L^{10}_{3+1} \to F^{10}_{3+1}$, where G is the Galilei group, L the Lorentz one and F the final Fantappié one, from which, for $r \to \infty$, we recover the L group. It can be shown that this sequence of universes is unique. 
The Lorentz group can be interpreted mathematically as the group of roto-translations which leave a particular object, the Minkowski space-time, invariant. Similarly, the Fantappié group is the group of five-dimensional rotations of a new space-time: the hyper-spherical, constant-curvature (maximally symmetric) De Sitter universe. We point out that we have obtained the De Sitter model without referring to the gravitational interaction, unlike in GR, where the De Sitter universe is one of the possible solutions of the Einstein equations with cosmological constant. From a formal viewpoint we resort to five-dimensional rotations because in the De Sitter universe there appears a new constant r, which can be interpreted as the radius of the Universe. The group extension mechanism identifies a unique sequence of symmetry groups; to each symmetry group there corresponds a level of description of the physical world and a new universal constant, thus providing the most general boundary conditions and constraining the form of the possible physical laws. The Fantappié group fixes the constants c and r and defines a new relativity for inertial observers in the De Sitter Universe. In this sense, the Theory of Universes, based on the group extension method, is actually a version of what is sought in the Holographic Principle: the possibility of describing laws and boundaries in a compact and unitary way. In 1956 G. Arcidiacono proposed to study the absolute De Sitter universe $S_4$ by means of the tangent relative spaces in which observers localize and describe physical events, using the Beltrami-Castelnuovo projective representation $P_4$ in Projective Special Relativity, PSR (Arcidiacono, 1956; 1976; 1984). We note that we pass from the hyper-sphere $S_4$ to its real representation as a hyperboloid by means of an inverse Wick rotation, $\tau \to it$, associating the great circles on the hyper-sphere with a family of geodesics on the hyperboloid. 
In this way we obtain a realization of the Weyl principle for defining a model of the Universe, because it fixes a set of privileged observers (Ellis & Williams, 1988). So the choice of the Beltrami-Castelnuovo $P_4$ is equivalent to studying a relativity in $S_4$. 3. The Fantappié Group Transformations To study the De Sitter universe $S_4$ in the Beltrami-Castelnuovo representation we have to find the projectivities which leave the Cayley-Klein interval invariant: (1.3) $x^2 + y^2 + z^2 - c^2 t^2 + r^2 = 0$. Eq. (1.3) meets the time axis in the two “singularities” $t = \pm t_0$, where $t_0 = r/c$ is the time light takes to travel the radius r of the Universe. In this case the meaning of the singularities is purely geometrical, not physical: they represent the rims of the hyperboloid (1.3), since the De Sitter universe has no “structural” singularities. The transformations leaving $S_4$ invariant are the rotations of five-dimensional space, which induce on the observer’s space $P_4$ the projectivities that leave (1.3) unchanged. Let us introduce the five homogeneous projective coordinates (Weierstrass condition): (2.3) $\bar{x}_a \bar{x}_a = r^2$, with $a = 0, 1, 2, 3, 4$. The space-time coordinates $x_i$, with $i = 1, 2, 3, 4$, are: (3.3) $x_1 = x$, $x_2 = y$, $x_3 = z$, $x_4 = ict$. The connection between (2.3) and (3.3) is given by the relation: (4.3) $x_i = r\,\bar{x}_i / \bar{x}_0$, from which, owing to (2.3), we get the inverse relation: (5.3) $\bar{x}_0 = r/a$, $\bar{x}_i = x_i/a$, where $a^2 = 1 + x_i x_i / r^2 = 1 + \alpha^2 - \gamma^2$, with $\alpha = x/r$ and $\gamma = t/t_0$. The transformation sought between the two observers O and O' consequently has the form: (6.3) $\bar{x}'_a = \alpha_{ab}\,\bar{x}_b$, with $\alpha_{ab}$ an orthogonal matrix. Limiting ourselves, just for simplicity, to the variables $\bar{x}_0, \bar{x}_1, \bar{x}_4$ and following the standard method, also used in RR, we get three families of transformations: A) the space translations along the x axis, given by the rotation in the $(\bar{x}_0, \bar{x}_1)$ plane: (7.3) $\bar{x}'_1 = \bar{x}_1 \cos\vartheta + \bar{x}_0 \sin\vartheta$, $\bar{x}'_0 = -\bar{x}_1 \sin\vartheta + \bar{x}_0 \cos\vartheta$, $\bar{x}'_4 = \bar{x}_4$. Using (4.3) and putting $\tan\vartheta = T/r = \alpha$, we get the space-time transformations with parameter T: (8.3) $x' = \dfrac{x + T}{1 - \alpha x / r}$, $t' = \dfrac{t\sqrt{1 + \alpha^2}}{1 - \alpha x / r}$. Eqs. (8.3), for r indeterminate, i.e. 
$r \to \infty$, reduce to the well-known space translations of the classical and relativistic cases, connected by the parameter T. B) the time translations with parameter $T_0$, given by the rotation in the $(\bar{x}_0, \bar{x}_4)$ plane: (9.3) $\bar{x}'_4 = \bar{x}_4 \cos\vartheta_0 + \bar{x}_0 \sin\vartheta_0$, $\bar{x}'_0 = -\bar{x}_4 \sin\vartheta_0 + \bar{x}_0 \cos\vartheta_0$, $\bar{x}'_1 = \bar{x}_1$. Putting $\tan\vartheta_0 = iT_0/t_0 = i\gamma$ we obtain: (10.3) $x' = \dfrac{x\sqrt{1 - \gamma^2}}{1 + \gamma t / t_0}$, $t' = \dfrac{t + T_0}{1 + \gamma t / t_0}$. Eqs. (10.3) also reduce, for $r \to \infty$, to the known cases of classical and relativistic physics. C) the inertial transformations with parameter V, given by the rotation in the $(\bar{x}_1, \bar{x}_4)$ plane: (11.3) $\bar{x}'_1 = \bar{x}_1 \cos\varphi + \bar{x}_4 \sin\varphi$, $\bar{x}'_4 = -\bar{x}_1 \sin\varphi + \bar{x}_4 \cos\varphi$, $\bar{x}'_0 = \bar{x}_0$. Putting $\tan\varphi = iV/c = i\beta$, we recover the Lorentz transformations: (12.3) $x' = \dfrac{x - Vt}{\sqrt{1 - \beta^2}}$, $t' = \dfrac{t - Vx/c^2}{\sqrt{1 - \beta^2}}$. The transformations (A), (B) and (C) form the Fantappié projective group which, for two variables (x, t) and three parameters (T, T0, V), with T and T0 the translations and V the velocity along x, can be written: (13.3) ( )[ ] ( ) ( ) 0 ttrxab bTctax αβγβγα γβγαβ ( )[ ] ( ) ( ) 0 ttrxab bTtcxa αβγβγα αβγαβ where we have put $a^2 = 1 + \alpha^2 - \gamma^2$ and $b^2 = 1 - \beta^2 + (\alpha - \beta\gamma)^2$, with $\alpha = x/r$, $\beta = V/c$ and $\gamma = t/t_0$. For $r \to \infty$ we get $a = 1$ and $b^2 = 1 - \beta^2$, and from (13.3) we obtain the Poincaré group with three parameters (T, T0, V). The Fantappié group can be summarized from a very clear geometrical viewpoint by saying that the De Sitter universe, with constant curvature $1/r^2$, shows an elliptic geometry in its global hyper-spatial aspect (Gauss-Riemann) and a hyperbolic geometry in its space-time sections (Lobachevsky). Letting the “natural” unit r of these two geometries tend towards infinity, we obtain the parabolic geometry of flat Minkowski space. 4. The Projective Relativity in De Sitter Universe Projective Special Relativity (PSR) widens and contextualizes the relativistic results in De Sitter geometry. As in any physics, there exists a well-defined connection between mechanics and geometry. Therefore PSR makes use of the notion of the observer’s private space, redefining it on the basis of a constant curvature. 
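Returning briefly to the space translations (A) of Section 3, the construction can be checked numerically. The following is a minimal Python sketch, not the authors' procedure: it rotates the homogeneous coordinates as in (7.3) with $\tan\vartheta = T/r$, projects back with (4.3), and compares the result with the closed form obtained by dividing the two rotated coordinates. The function names are hypothetical, and the closed form $(x+T)/(1 - Tx/r^2)$ is an assumption derived here from (4.3) and (7.3).

```python
import math


def translate_x(x, T, r):
    """Space translation by T, built as a rotation in the (xbar_1, xbar_0) plane."""
    theta = math.atan2(T, r)  # tan(theta) = T / r, as in the text
    # Homogeneous coordinates (5.3) up to the common factor 1/a,
    # which cancels in the ratio (4.3).
    xb1, xb0 = x, r
    xb1_new = xb1 * math.cos(theta) + xb0 * math.sin(theta)
    xb0_new = -xb1 * math.sin(theta) + xb0 * math.cos(theta)
    return r * xb1_new / xb0_new  # back to x' through (4.3)


def translate_x_closed_form(x, T, r):
    """Closed form assumed here: ratio of the two rotated coordinates."""
    return (x + T) / (1.0 - T * x / r**2)


if __name__ == "__main__":
    # Finite radius: the rotation and the closed form should agree.
    print(translate_x(1.0, 2.0, 10.0), translate_x_closed_form(1.0, 2.0, 10.0))
    # Very large radius: the translation approaches the flat-space x + T.
    print(translate_x(1.0, 2.0, 1e9))
```

For a very large r the output approaches the ordinary translation x + T, which is the flat limit stated for the transformations with parameter T.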
In PSR a double space-time scale is introduced, which connects a point $(\chi, \tau)$ of $S_4$ with a point (x, t) of $P_4$ by means of the projective invariant (1.3). Given a straight line AB, and calling R and S its intersections with (1.3), the projective distance is given by the logarithm of the cross-ratio (ABRS): (1.4) $AB = \dfrac{t_0}{2}\log(ABRS) = \dfrac{t_0}{2}\log\dfrac{AR \cdot BS}{BR \cdot AS}$. From (1.4) we obtain: (2.4) $\chi = r \arctan(x/r)$ and $\tau = \dfrac{t_0}{2}\log\dfrac{t_0 + t}{t_0 - t}$. From the second of (2.4), similar to Milne’s formula, we can see that the “formal” singularities are related to the projective description, which depicts a universe with infinite space and finite time, whereas the De Sitter one has finite space and infinite time. It is important to underline that this equivalence between an “evolutionary” model and a “stationary” one, contrary to what is often stated, is purely geometrical and has nothing to do with physical processes; it concerns the definition of the cosmological observer. We will return to this fundamental point later. The new law of addition of durations: (3.4) $d = \dfrac{d_1 + d_2}{1 + d_1 d_2 / t_0^2}$ is obtained from (10.3) and finds its physical meaning in the appearance of the new constant $t_0 = r/c$, interpretable as the “age of the universe” for any family of observers in $P_4$. Let us consider a uniform motion with velocity U, given by $x' = U t'$; by means of the Fantappié transformations we have a uniform motion with velocity W given by: (4.4) ( ) ( ) ( ) cUVcVU −+−++ For the visible universe of the observer O, inside the light-cone, the conditions $\alpha = \pm\gamma$ and $a = 1$ hold, and (4.4) can be simplified as: (5.4) ( )( ) cVcUcVU For V = c we have W = c, in agreement with RR, while for U = c we have: (6.4) ( ) ( ) ccVcVccW ≠+−±= 112 2α . Eq. (6.4) expresses the possibility of observing hyper-c velocities in PSR. The outcome is less strange than it may seem at first sight, because now the space-time of an observer is defined not only by the constant c but also by r, and the light-cone has a variable aperture. 
In more direct physical terms, this means that when we observe a far region of the universe, of the order of $t_0 = r/c$, the velocity of cosmic objects appears to exceed c, even if the region belongs to the past light-cone of the observer. For b = 0 we obtain the angular coefficients of the tangents to the Cayley-Klein invariant (1.3) drawn from a point P of the Beltrami-Castelnuovo projection, which represent the two straight lines of the light-cone. Unlike in RR, here the angle of the light-cone is not constant and depends on the point P according to the formula: (7.4) ( )222 γαϑ += atg . From (7.4) derives the variation C of the light velocity with time: (8.4) 21 γ− C , with from which it follows that $C \to \infty$ at the two singularities $\pm t_0$, which fix the limit duration according to the new law of addition of durations (3.4). Another remarkable consequence of the projective group is the expansion-collapse law, that is, the connection between the two singularities. Differentiating the two equations (10.3) and dividing them, we obtain the law of variation of velocities for a translation in time: (9.4) $V'\sqrt{1 - \gamma^2} = V\,(1 + \gamma t / t_0) - \gamma x / t_0$. For $\gamma = 1$ and $T_0 = t_0$ we have the law of projective expansion, valid for $-t_0 < t < 0$: (10.4) $V = \dfrac{x}{t + t_0}$, or also, writing $\gamma = t/t_0$, $\beta = \dfrac{\alpha}{1 + \gamma}$. If $t = 0$ ($\gamma = 0$), we can write (11.4) $V = x/t_0 = Hx$ ($\beta = \alpha$), where $H = c/r = 1/t_0$ is the well-known Hubble constant. The analogous procedure gives the law of projective collapse, valid for $0 < t < t_0$, with $\gamma = -1$ and $T_0 = -t_0$: (12.4) $V = \dfrac{x}{t - t_0}$, or $\beta = -\dfrac{\alpha}{1 - \gamma}$. We note that at the singularities the expansion-collapse velocity becomes infinite. In PSR this process, unlike in GR, is not connected to gravitation, but derives from the Beltrami-Castelnuovo geometry. From the Fantappié group there also follows a new formula for the Doppler effect: (13.4) ( ) ( ) 2' 11 αββωω ++−= , where ω is the frequency. For $\beta = 1$, i.e. V = c, we get nothing but the traditional proportionality between distance and frequency, αωω =' . 
For V = 0 there follows a Doppler effect depending on distance: (14.4) 2' 1 αωω += . The red-shift z is defined by $1 + z = \omega/\omega'$ and (13.4) becomes: (15.4) ( ) ( ) ( ) 21111 αββ ++−=+ z , which was historically introduced by Castelnuovo, in a famous 1930 memoir of the Accademia dei Lincei, to explain the then “new” Hubble observations on the galactic red-shift. If we are placed on the observer’s light-cone, where (12.4) becomes $\beta = \alpha/(1 - \alpha)$, (15.4) becomes: (16.4) $1 + z = \dfrac{1}{1 - \alpha}$. The red-shift tends towards infinity for x = r, and hyper-c velocities are possible if z > 1. As one would naturally expect, modifying geometry implies, just as in RR, a deep redefinition of mechanics. In PSR, the mass m of a body varies with velocity and distance according to: (17.4) $m = m_0\,a^2 / b$. From (17.4) it follows that for a = 0, at the singularities, the mass is null, while on the light-cone, for b = 0, $m \to \infty$. The mass of a body at rest varies with t according to: (18.4) $m = m_0\,(1 - \gamma^2)$, from which we deduce that at the initial and final instants, $\gamma = \pm 1$, the mass vanishes. Another very important outcome (Arcidiacono, 1977) is the relation between the mass m and the polar moment of inertia J of a body: (19.4) $J = m r^2$. A remarkable consequence is that the mass M of the universe varies with t: (20.4) ( ) ( ) MtM +−= γ , where M0 is the mass at t = 0, and J the polar moment with respect to the observer. So the overall picture for an inertial observer in a De Sitter Universe is that of a universe coming into existence in a singularity at time $-t_0$, expanding and then collapsing at time $t_0$, in which the light velocity c is only locally constant. At the initial and final instants the light velocity is infinite and the global mass is zero, while during the expansion-collapse time it varies according to (20.4). In the projective scenario the flatness of space is linked to the observer geometry in a universe at constant curvature. All this is linked to the fact that in PSR translations and rotations are indivisible. 
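A few of the kinematic relations of this section can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' computation: the function names are hypothetical, the duration law is assumed to have the form obtained by composing two time-translation rotations (9.3), i.e. $d = (d_1 + d_2)/(1 + d_1 d_2/t_0^2)$, and the expansion law is assumed to be $V = x/(t + t_0)$, which follows from setting $\gamma = 1$ in the velocity law for time translations. The light-cone relations $\beta = \alpha/(1-\alpha)$ and $1 + z = 1/(1-\alpha)$ are taken from the text.

```python
import math


def add_durations(d1, d2, t0):
    """Assumed projective addition of durations; t0 acts as a limit duration."""
    return (d1 + d2) / (1.0 + d1 * d2 / t0**2)


def expansion_velocity(x, t, t0):
    """Assumed projective expansion law V = x / (t + t0); V = H x at t = 0."""
    return x / (t + t0)


def light_cone_beta(alpha):
    """Velocity ratio beta = V/c on the observer's light-cone, alpha = x/r."""
    return alpha / (1.0 - alpha)


def light_cone_redshift(alpha):
    """Red-shift z on the observer's light-cone, from 1 + z = 1 / (1 - alpha)."""
    return 1.0 / (1.0 - alpha) - 1.0


if __name__ == "__main__":
    t0 = 1.0
    # Durations compose like hyperbolic tangents, so they never exceed t0.
    print(add_durations(0.5, 0.5, t0), t0 * math.tanh(2 * math.atanh(0.5)))
    # At t = 0 the expansion law reduces to the Hubble law with H = 1/t0.
    print(expansion_velocity(2.0, 0.0, 4.0))
    # On the light-cone beta and z coincide, so beta > 1 exactly when z > 1.
    print(light_cone_beta(0.75), light_cone_redshift(0.75))
```

Under these assumptions the red-shift and the velocity ratio coincide on the light-cone, which is consistent with the statement that hyper-c velocities correspond to z > 1.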
At the singularities there is no “breakdown” of the physical laws, because the global space-time structure is uniquely identified by the group, which is independent of the matter-energy distribution. In this case, the singularities in $P_4$ are, more properly, an event horizon with a natural “cosmic censorship” fixed by the observers’ geometry. 5. The Projective Gravitation The connection between the metric approach to Einstein gravitation and the Fantappié-Arcidiacono group approach is the aim of Projective General Relativity (PGR), which describes a universe globally at constant curvature and locally at variable curvature. This can be done by following Cartan’s idea, in which any Riemann manifold $V_4$ is associated with an infinite family of Euclidean, pseudo-Euclidean or non-Euclidean spaces tangent to it at each of its points P. The geometry of those spaces is identified by a holonomy group. The Cartan connection law links the tangent spaces so as to obtain both the local characteristics of $V_4$ (curvature and torsion) and the global ones (holonomy group). The holonomy group of GR is the group of four-dimensional rotations, i.e. the Lorentz group. We thus get a general method which builds a bridge between differential geometry and group theory (Pessa, 1973; Arcidiacono, 1986). To construct a PGR, the Riemann manifold $V_5$ is introduced, which admits as holonomy group the De Sitter-Fantappié one, isomorphic to the $S_5$ group of five-dimensional rotations. The geometry of $V_5$ is then written in terms of the Beltrami projective induced metric for an anholonomic manifold $V_4$ at variable curvature. 
The Veblen projective connection: (1.5) $\Pi^A_{BC} = \{{}^{\,A}_{BC}\} = \tfrac{1}{2}\,g^{AS}\left(\partial_B g_{SC} + \partial_C g_{SB} - \partial_S g_{BC}\right)$ defines a projective translation law which leaves invariant the field of quadrics Q in the tangent spaces at each point of $V_4$, $Q = g_{AB}\,\bar{x}^A \bar{x}^B = 0$, where $g_{AB}$ are the coefficients of the five-dimensional metric, the $\bar{x}^K$ are the homogeneous projective coordinates, and A, B, C = 0, 1, ..., 4. From (1.5) we build the projective torsion-curvature tensor: (2.5) $R^A_{\ BCD} = \partial_C \Pi^A_{BD} - \partial_D \Pi^A_{BC} + \Pi^A_{SC}\,\Pi^S_{BD} - \Pi^A_{SD}\,\Pi^S_{BC}$. So the gravitation equations of Projective General Relativity are: (3.5) $R_{AB} - \tfrac{1}{2} R\, g_{AB} = \chi\, T_{AB}$, with $T_{AB}$ the energy-momentum tensor and χ the Einstein gravitational constant. The tensor (2.5) is projectively flat, i.e. when it vanishes we get the De Sitter space at constant curvature. The deep link between rotations and translations in $S_4$ naturally leads (3.5) to include torsion, showing an interesting formal analogy with the Einstein-Cartan-Sciama-Kibble theory of spin-fluids. The construction is analogous to that of GR but, in lieu of the relation between Riemann curvature and Minkowski space-time, we get here a curvature-torsion connected to the De Sitter-Fantappié holonomy group. It has to be noted that, in agreement with the equivalence principle, PGR gives a metric description of local gravity, valid for single (i.e. non-cosmological) systems. This raises again the problem of the relation between local physics and its extension to the cosmic scale. In fact, if we take the starting point of standard cosmology based upon GR, i.e. we consider the whole matter of the Universe, and transfer it to the ambit of PGR, we can ask whether torsion, associated with rotation, could feed back on the background metric, modifying it deeply. In general, the syntax of a purely group-based theory does not have the tools to give an answer, because it is independent of gravity and of the hypotheses on $T_{AB}$. 
For example, Snyder (1947) showed that in a De Sitter space an uncertainty relation linked to the curvature is introduced, of the kind $\Delta x_i\,\Delta x_k \approx 1/r^2$. Only a third quantization formalism, able to take into account the dynamical two-way inter-relations between local and global, will succeed in giving an answer. The essential point we have to underline here is that the introduction of a cosmological constant, either as an additional hypothesis on the Einstein equations or via the group, is a radical alternative to the “Machian philosophy” of GR. So, for a Universe without matter-fields we assume the constant curvature as a sort of “pre-matter” which describes in topological terms the most general conditions for the quantum vacuum. Therefore the Einstein equations are valid in the following form: (4.5) ABAB gG Λ= and ( ) ABAB gRR Λ−= 2 , with their essential physical content, i.e. the deep connection among curvature, radius and the vacuum matter-energy density $\rho_{vac}$ by means of the cosmological constant: (5.5) vac π 6. De Sitter Observers, Singularities and Wick Rotations From a quantum viewpoint the interesting aspect of $S_4$ is that it has imaginary cyclic time and no singularities. This means that it is impossible to define a global temporal coordinate on De Sitter. It therefore has an instanton character, identified by its Euler topological number, which is 2 (Rajaraman, 1982). This leads to a series of formal analogies both with the quantum physics of black holes and with the theoretical proposals for the “cure” of singularities. Let us consider the De Sitter-Castelnuovo metric in real time: (1.6) 222 2 11 Ω+�� drdrr ds , where $d\Omega^2 = d\vartheta^2 + \sin^2\vartheta\, d\varphi^2$ in polar coordinates. As we have seen in PSR, the singularity at $r = c/H$ becomes an event horizon for any observer when we pass to the Euclidean metric with $\tau \to -it$: (2.6) ( )222 22 sincos Ω++= rddrH dds ττ , in close analogy with the case of the Schwarzschild solution. 
The period of τ is $\beta = 2\pi/H$; for the observers in De Sitter this implies the possibility of defining a temperature, an entropy and an area of the horizon, respectively given (in units $G = \hbar = c = k_B = 1$) by: (3.6) $T_b = \beta^{-1} = \dfrac{H}{2\pi}$; $S = \dfrac{\pi}{H^2}$; $A = \dfrac{4\pi}{H^2}$. From (3.6) we get the following fundamental outcome: (4.6) $S = A/4$, which is the well-known expression of the ’t Hooft-Susskind-Bekenstein Holographic Principle (Susskind, 1995). Eq. (4.6) connects the non-existence of a global temporal coordinate with the information accessible to any observer in the De Sitter model. In this way we obtain a deep physical justification for applying the Weyl Principle in the De Sitter Universe, and conclude that in cosmology, as in QM, a physical system cannot be fully specified without defining an observer. G. Arcidiacono stated that the hyper-spherical Universe is like a book written with seven seals (Apocalypse, 6-11), and consequently two operations are necessary to investigate its physics: 1) the inverse Wick rotation and 2) the Beltrami-Castelnuovo representation. This is how we can completely define a relativity in De Sitter. The association of imaginary time with temperature acquires a remarkable physical significance, which implies some considerations on the statistical partition function (Hawking, 1975). For our aims it will be sufficient to say that this temperature is linked to the relation (4.6), i.e. to the information available to an observer within his area of events. This has clear implications from the dynamical viewpoint, because it amounts to stating that, as in the case of the Schwarzschild black hole, the De Sitter space and the quantum field defined on it behave as if they were immersed in background fluctuations. The transition amplitude from a configuration of a generic field φ in the time $dt = t_2 - t_1$ will be given by the matrix element $e^{-iH\,dt}$, which acts as a U(1) group transformation of the kind $U(1)_{space} \Leftrightarrow U(1)_{time}$. 
It means that a transition amplitude on $S_4$ will appear to an observer as the variation of the scale factor R(t) with rate of variation H. This makes it possible to link the hyper-spherical description with the evolutionary Big-Bang scenario and to get rid of the thermodynamic ambiguities which characterize its notions of “beginning” and “ending”. The latter have to be re-interpreted as purely quantum dynamics of the matter-fields on the singularity-free hyper-sphere. 7. Physical Considerations for Further Developments These considerations suggest a research program which we briefly outline here; it further develops the analogy between black holes, instantons and De Sitter Universes (see, for example, Frolov, Markov, Mukhanov, 1989; Strominger, 1992). It is known that the Hartle-Hawking proposal of a “no-boundary” condition removes the initial singularity and makes it possible to calculate the wave function of the Universe (Hartle & Hawking, 1983). In fact, it is possible, as in the usual QFT, to calculate the path integrals by using a Wick rotation as a “Euclideanization” procedure. In this way the essential characteristics of the inflationary hypotheses are also included (A. Borde, A. Guth and A. Vilenkin, 2003). The resulting formalism is similar to that used in ordinary QM for the tunnel effect, an analogy which should explain the underlying physics (Vilenkin, 1982; S.W. Hawking and I.G. Moss, 1982). The group extension method provides this procedure with a solid foundation, because the De Sitter space, maximally symmetric and simply connected, is uniquely identified by the group structure, and is consequently directly linked to the principle of homogeneity and isotropy of space-time with respect to physical laws. The original Hartle-Hawking formulation operates a mix of topologies that is hard to justify on both the formal and the conceptual level. 
The “no-boundary” condition is only valid if we work with imaginary time, and the theory does not contain a rigorous logical procedure to explain the passage to real time. This corresponds to a rather vague attempt to reconcile a hyper-spherical description at imaginary time with an evolutionary one at real time, according to the traditional Big-Bang scenario. In fact, it has been observed that the Hartle-Hawking condition amounts to replacing a singularity with a “nebulosity”. The natural proposal, at this point, is to consider the Hartle-Hawking conditions on primordial space-time as a consequence of a global characterization of the hyper-sphere, and to develop quantum physics directly on $S_4$. This does not contradict the formulation of quantum mechanics and its fundamental spirit, namely the Feynman path integrals. In other words, quantum mechanics has to be applied to cosmology not because of the smallness of the Universe at its beginning, but because each physical system, without exception, has quantum histories with interfering amplitudes. We point out that this view is in perfect consonance with the so-called Many Worlds Interpretation of quantum mechanics (Halliwell, 1994). “Creation from nothing” means that we cannot “look inside” an instanton (hyper-spherical space), but have to resort to an “evolutionary” description which separates space from time. The projective methods tell us how to do it. An analogous problem, to some extent, is that of the Weyl Tensor Hypothesis. Recently, Roger Penrose has suggested a condition on the initial singularity that, within GR, ties entropy and gravity together and makes an arrow of time emerge (Penrose, 1989). It is known that the Weyl conformal tensor $W_{ABCD}$ describes the degrees of freedom of the gravitational field. The Penrose Hypothesis is that $W_{ABCD} \to 0$ in the Big-Bang, while $W_{ABCD} \to \infty$ in the Big-Crunch. 
The physical reason is that in the initial state of the Universe we have a highly uniform matter distribution at low entropy (enthalpic order), while in the Big-Crunch, just as in a black hole, we have a high-entropy situation. This differentiates the two singularities and provides an arrow of time. In a hyper-spherical Universe there is no “beginning” and no “ending”, but only quantum transitions. Consequently, the Penrose Hypothesis can only be implemented in terms of the projective representation, within the ambit of PGR. Finally, we can consider the possibility of building a Quantum Field Theory on $S_4$. A QFT, for T tending towards zero, is a limit case of a theory describing physical fields interacting with an external environment at temperature T. Without this external environment we could not speak of decoherence, could not introduce concepts such as dissipation, chaos and noise and, obviously, the possibility of describing phase transitions would vanish too. Therefore, it is of paramount importance to write a QFT on the De Sitter background metric and then to study it in the projective representation. If we admit decoherence processes on $S_4$, it is possible to interpret the Weyl Principle as a form of Anthropic Principle: the “classical” and observable Universes are the ones in which a description at real time can be carried out. In conclusion, it is possible to outline a scenario which is alternative to, but not incompatible with, traditional cosmology. The Universe is the quantum configuration of the quantum fields on $S_4$. Thus developing a Quantum Cosmology coincides with developing a Quantum Field Theory on a space free of singularities. The Big-Bang is a nucleation from the vacuum in a hyper-spherical background at imaginary time, and so the concepts of “beginning”, “expansion” and “ending” belong to the space-time foreground and gain their meaning only by means of a suitable representation which defines a family of cosmological observers. 
Acknowledgements: I owe my knowledge of the group extension method to the late Prof. G. Arcidiacono (1927-1998), through our intense discussions while strolling through Rome. Special thanks to my friends E. Pessa and L. Chiatti for the rich exchange of viewpoints and e-mails.
References
Arcidiacono, G. (1956), Rend. Accad. Lincei, 20, 4
Arcidiacono, G. (1976), Gen. Rel. and Grav., 7, 885
Arcidiacono, G. (1977), Gen. Rel. and Grav., 7, 865
Arcidiacono, G. (1984), in De Sabbata, V. & Karade, T.M. (eds), Relativistic Astrophysics and Cosmology, World Scientific, Singapore
Arcidiacono, G. (1986), Projective Relativity, Cosmology and Gravitation, Hadr. Press, Cambridge, USA
Borde, A., Guth, A., Vilenkin, A. (2003), Phys. Rev. Lett., 90
Ellis, G.F.R. & Williams, R. (1988), Flat and Curved Space-Times, Clarendon Press
Fantappié, L. (1954), Rend. Accad. Lincei, 17, 5
Fantappié, L. (1959), Collectanea Mathematica, XI, 77
Frolov, V.P., Markov, M.A., Mukhanov, V.F. (1989), Phys. Lett., B216, 272
Halliwell, J.J. (1994), in Greenberger, D. & Zeilinger, A. (eds), Fundamental Problems in Quantum Theory, New York Academy of Sciences, NY
Hartle, J.B. & Hawking, S.W. (1983), Phys. Rev. D, 28, 12
Hawking, S.W. & Ellis, G.F.R. (1973), The Large Scale Structure of Space-Time, Cambridge Univ. Press
Hawking, S.W. (1975), Commun. Math. Phys., 43
Hawking, S.W. & Moss, I.G. (1982), Phys. Lett., B110, 35
Pessa, E. (1973), Collectanea Mathematica, XXIV, 2
Rajaraman, R. (1982), Solitons and Instantons, North-Holland Publ., NY
Snyder, H.S. (1947), Phys. Rev., 71, 38
Strominger, A. (1992), Phys. Rev. D, 46, 10
Susskind, L. (1995), Jour. Math. Phys., 36
Vilenkin, A. (1982), Phys. Lett., 117B, 1
ABSTRACT In recent years the traditional Big Bang scenario has been deeply modified by the study of the quantum features of the evolution of the Universe, raising again the problem of using local physical laws on the cosmic scale, with particular regard to the role of the cosmological constant. 
The group extension method shows that the De Sitter group uniquely generalizes the Poincaré group, formally justifies the use of the cosmological constant and suggests a new interpretation of the Hartle-Hawking boundary conditions in Quantum Cosmology. <|endoftext|><|startoftext|> Introduction The spectral action introduced by Chamseddine–Connes plays an important role [3] in noncommutative geometry. More precisely, given a spectral triple $(\mathcal{A}, \mathcal{H}, D)$, where $\mathcal{A}$ is an algebra acting on the Hilbert space $\mathcal{H}$ and D is a Dirac-like operator (see [8, 23]), they proposed a physical action depending only on the spectrum of the covariant Dirac operator $D_A := D + A + \epsilon\, J A J^{-1}$ (1), where A is a one-form represented on $\mathcal{H}$, so that it has the decomposition $A = \sum_i a_i [D, b_i]$ (2), with $a_i, b_i \in \mathcal{A}$; J is a real structure on the triple corresponding to charge conjugation, and $\epsilon \in \{1, -1\}$ depends on the dimension of the triple and comes from the commutation relation $JD = \epsilon DJ$ (3). This action is defined by $S(D_A, \Phi, \Lambda) := \mathrm{Tr}\,\Phi(D_A/\Lambda)$, where Φ is any even positive cut-off function, which could be replaced by a step function up to some mathematical difficulties investigated in [16]. This means that S counts the spectral values of $|D_A|$ less than the mass scale Λ (note that the resolvent of $D_A$ is compact since, by assumption, the same is true for D; see Lemma 3.1 below). In [18], the spectral action on NC-tori was computed only for operators of the form D + A, and it was computed for $D_A$ in [20]. It appears that the implementation of the real structure via J does change the spectral action, up to a coefficient, when the torus has dimension 4. Here we prove that this can also be obtained directly from the Chamseddine–Connes analysis of [4], which we follow quite closely. Actually, S(DA,Φ,Λ) = 0 0. 2.1 Residues of series and integral In order to be able to compute later the residues of certain series, we prove here the following Theorem 2.1. 
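Let $P(X) = \sum_{j=0}^{d} P_j(X) \in \mathbb{C}[X_1, \dots, X_n]$ be a polynomial function, where $P_j$ is the homogeneous part of P of degree j. The function $\zeta_P(s) := \sum_{k \in \mathbb{Z}^n \setminus \{0\}} \dfrac{P(k)}{|k|^s}$, $s \in \mathbb{C}$, has a meromorphic continuation to the whole complex plane $\mathbb{C}$. Moreover, $\zeta_P(s)$ is not entire if and only if $\mathcal{P}_P := \{\, j : \int_{u \in S^{n-1}} P_j(u)\, dS(u) \neq 0 \,\} \neq \emptyset$. In that case, $\zeta_P$ has only simple poles at the points $j + n$, $j \in \mathcal{P}_P$, with $\operatorname{Res}_{s = j+n} \zeta_P(s) = \int_{u \in S^{n-1}} P_j(u)\, dS(u)$. The proof of this theorem is based on the following lemmas. Lemma 2.2. For any polynomial $P \in \mathbb{C}[X_1, \dots, X_n]$ of total degree $\delta(P) := \sum_{i=1}^{n} \deg_{X_i} P$ and any $\alpha \in \mathbb{N}_0^n$, we have $\partial^\alpha_x \big( P(x)\,|x|^{-s} \big) \ll_{P,\alpha,n} (1 + |s|)^{|\alpha|_1}\, |x|^{-\sigma - |\alpha|_1 + \delta(P)}$ uniformly in $x \in \mathbb{R}^n$ verifying $|x| \geq 1$, where $\sigma = \Re(s)$. Proof. By linearity, we may assume without loss of generality that $P(X) = X^\gamma$ is a monomial. It is easy to prove (for example by induction on $|\alpha|_1$) that for all $\alpha \in \mathbb{N}_0^n$ and $x \in \mathbb{R}^n \setminus \{0\}$: |x|−s β,µ∈Nn0 β+2µ=α |β|1+|µ|1 ) (|β|1+|µ|1)! β! µ! |x|σ+2(|β|1+|µ|1) It follows that for all $\alpha \in \mathbb{N}_0^n$, we have uniformly in $x \in \mathbb{R}^n$ verifying $|x| \geq 1$: $\partial^\alpha |x|^{-s} \ll_{\alpha,n} (1 + |s|)^{|\alpha|_1}\, |x|^{-\sigma - |\alpha|_1}$. (7) By the Leibniz formula and (7), we have uniformly in $x \in \mathbb{R}^n$ verifying $|x| \geq 1$: $\partial^\alpha \big( x^\gamma |x|^{-s} \big) = \sum_{\beta \leq \alpha} \binom{\alpha}{\beta}\, \partial^\beta(x^\gamma)\, \partial^{\alpha - \beta}\big(|x|^{-s}\big) \ll_{\gamma,\alpha,n} \sum_{\beta \leq \alpha;\ \beta \leq \gamma} |x^{\gamma - \beta}|\, (1 + |s|)^{|\alpha|_1 - |\beta|_1}\, |x|^{-\sigma - |\alpha|_1 + |\beta|_1} \ll_{\gamma,\alpha,n} (1 + |s|)^{|\alpha|_1}\, |x|^{-\sigma - |\alpha|_1 + |\gamma|_1}$. Lemma 2.3. Let $P \in \mathbb{C}[X_1, \dots, X_n]$ be a polynomial of degree d. Then the difference $\Delta_P(s) := \sum_{k \in \mathbb{Z}^n \setminus \{0\}} \dfrac{P(k)}{|k|^s} - \int_{\mathbb{R}^n \setminus B^n} \dfrac{P(x)}{|x|^s}\, dx$, which is defined for $\Re(s) > d + n$, extends holomorphically to the whole complex plane $\mathbb{C}$. Proof. We fix in the sequel a function $\psi \in C^\infty(\mathbb{R}^n, \mathbb{R})$ verifying, for all $x \in \mathbb{R}^n$, $0 \leq \psi(x) \leq 1$, $\psi(x) = 1$ if $|x| \geq 1$ and $\psi(x) = 0$ if $|x| \leq 1/2$. The function $f(x, s) := \psi(x)\, P(x)\, |x|^{-s}$, $x \in \mathbb{R}^n$ and $s \in \mathbb{C}$, is in $C^\infty(\mathbb{R}^n \times \mathbb{C})$ and depends holomorphically on s. Lemma 2.2 above shows that f is a “gauged symbol” in the terminology of [24, p. 4]. Thus [24, Theorem 2.1] implies that $\Delta_P(s)$ extends holomorphically to the whole complex plane $\mathbb{C}$. 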
However, to be complete, we will give here a short proof of Lemma 2.3: It follows from the classical Euler–Maclaurin formula that for any function h : R → C of class CN+1 verifying lim|t|→+∞ h(k)(t) = 0 and |h(k)(t)| dt < +∞ for any k = 0 . . . , N + 1, that we have ∑ h(k) = h(t) + (−1)N (N+1)! BN+1(t) h (N+1)(t) dt where BN+1 is the Bernoulli function of order N + 1 (it is a bounded periodic function.) Fix m′ ∈ Zn−1 and s ∈ C. Applying this to the function h(t) := ψ(m′, t) P (m′, t) |(m′, t)|−s (we use Lemma 2.2 to verify hypothesis), we obtain that for any N ∈ N0: ψ(m′,mn) P (m ′,mn) |(m′,mn)|−s = ψ(m′, t) P (m′, t) |(m′, t)|−s dt+RN (m′; s) (8) where RN (m′; s) := (−1) (N+1)! BN+1(t) N+1 (ψ(m ′, t) P (m′, t) |(m′, t)|−s) dt. By Lemma 2.2, ∣∣∣BN+1(t) ∂ ψ(m′, t) P (m′, t) |(m′, t)|−s ) ∣∣∣ dt ≪P,n,N (1 + |s|)N+1 (|m′|+ 1)−σ−N+δ(P ). m′∈Zn−1 RN (m′; s) converges absolutely and define a holomorphic function in the half plane {σ = ℜ(s) > δ(P ) + n−N}. Since N is an arbitrary integer, by letting N → ∞ and using (8) above, we conclude that: (m′,mn)∈Zn−1×Z ψ(m′,mn) P (m ′,mn) |(m′,mn)|−s− m′∈Zn−1 ψ(m′, t) P (m′, t) |(m′, t)|−s dt has a holomorphic continuation to the whole complex plane C. After n iterations, we obtain that ψ(m) P (m) |m|−s − ψ(x) P (x) |x|−s dx has a holomorphic continuation to the whole C. To finish the proof of Lemma 2.3, it is enough to notice that: • ψ(0) = 0 and ψ(m) = 1, ∀m ∈ Zn \ {0}; • s 7→ ψ(x) P (x) |x|−s dx = {x∈Rn:1/2≤|x|≤1} ψ(x) P (x) |x|−s dx is a holomorphic function on C. Proof of Theorem 2.1. Using the polar decomposition of the volume form dx = ρn−1 dρ dS in Rn, we get for ℜ(s) > d+ n, Rn\Bn Pj(x) ρj+n−1 Pj(u) dS(u) = j+n−s Pj(u) dS(u). Lemma 2.3 now gives the result. 2.2 Holomorphy of certain series Before stating the main result of this section, we give first in the following some preliminaries from Diophantine approximation theory: Definition 2.4. (i) Let δ > 0. 
A vector $a \in \mathbb{R}^n$ is said to be $\delta$-diophantine if there exists $c > 0$ such that $|q.a - m| \ge c\,|q|^{-\delta}$, $\forall q \in \mathbb{Z}^n \setminus \{0\}$ and $\forall m \in \mathbb{Z}$. We denote by $BV(\delta)$ the set of $\delta$-diophantine vectors and by $BV := \cup_{\delta>0} BV(\delta)$ the set of diophantine vectors.
(ii) A matrix $\Theta \in M_n(\mathbb{R})$ (real $n \times n$ matrices) will be said to be diophantine if there exists $u \in \mathbb{Z}^n$ such that ${}^t\Theta(u)$ is a diophantine vector of $\mathbb{R}^n$.

Remark. A classical result from Diophantine approximation asserts that for all $\delta > n$, the Lebesgue measure of $\mathbb{R}^n \setminus BV(\delta)$ is zero (i.e. almost any element of $\mathbb{R}^n$ is $\delta$-diophantine). Let $\Theta \in M_n(\mathbb{R})$. If its row of index $i$, denoted $L_i$, is a diophantine vector of $\mathbb{R}^n$ (i.e. if $L_i \in BV$), then ${}^t\Theta(e_i) \in BV$ and thus $\Theta$ is a diophantine matrix. It follows that almost any matrix of $M_n(\mathbb{R}) \approx \mathbb{R}^{n^2}$ is diophantine.

The goal of this section is to show the following

Theorem 2.5. Let $P \in \mathbb{C}[X_1,\dots,X_n]$ be a homogeneous polynomial of degree $d$ and let $b$ be in $\mathcal{S}(\mathbb{Z}^n \times \dots \times \mathbb{Z}^n)$ ($q$ times, $q \in \mathbb{N}$). Then,

(i) Let $a \in \mathbb{R}^n$. We define
\[ f_a(s) := \sum'_{k\in\mathbb{Z}^n} \frac{P(k)}{|k|^s}\, e^{2\pi i\, k.a}. \]
1. If $a \in \mathbb{Z}^n$, then $f_a$ has a meromorphic continuation to the whole complex plane $\mathbb{C}$. Moreover, if $S$ is the unit sphere and $dS$ its Lebesgue measure, then $f_a$ is not entire if and only if $\int_{u\in S^{n-1}} P(u)\, dS(u) \neq 0$. In that case, $f_a$ has only a simple pole at the point $d+n$, with
\[ \operatorname*{Res}_{s=d+n} f_a(s) = \int_{u\in S^{n-1}} P(u)\, dS(u). \]
2. If $a \in \mathbb{R}^n \setminus \mathbb{Z}^n$, then $f_a(s)$ extends holomorphically to the whole complex plane $\mathbb{C}$.

(ii) Suppose that $\Theta \in M_n(\mathbb{R})$ is diophantine. For any $(\varepsilon_i)_i \in \{-1,0,1\}^q$, the function
\[ g(s) := \sum_{l\in(\mathbb{Z}^n)^q} b(l)\, f_{\Theta \sum_i \varepsilon_i l_i}(s) \]
extends meromorphically to the whole complex plane $\mathbb{C}$ with only one possible pole at $s = d+n$. Moreover, if we set $Z := \{ l \in (\mathbb{Z}^n)^q : \sum_{i=1}^{q} \varepsilon_i l_i = 0 \}$ and $V := \sum_{l\in Z} b(l)$, then
1. If $V \int_{S^{n-1}} P(u)\, dS(u) \neq 0$, then $s = d+n$ is a simple pole of $g(s)$ and
\[ \operatorname*{Res}_{s=d+n} g(s) = V \int_{u\in S^{n-1}} P(u)\, dS(u). \]
2. If $V \int_{S^{n-1}} P(u)\, dS(u) = 0$, then $g(s)$ extends holomorphically to the whole complex plane $\mathbb{C}$.

(iii) Suppose that $\Theta \in M_n(\mathbb{R})$ is diophantine.
For any $(\varepsilon_i)_i \in \{-1,0,1\}^q$, the function
\[ g_0(s) := \sum_{l\in(\mathbb{Z}^n)^q\setminus Z} b(l)\, f_{\Theta \sum_{i=1}^{q} \varepsilon_i l_i}(s), \]
where $Z := \{ l \in (\mathbb{Z}^n)^q : \sum_{i=1}^{q} \varepsilon_i l_i = 0 \}$, extends holomorphically to the whole complex plane $\mathbb{C}$.

Proof of Theorem 2.5: First we remark that
- if $a \in \mathbb{Z}^n$ then $f_a(s) = \sum'_{k\in\mathbb{Z}^n} \frac{P(k)}{|k|^s}$, so the point (i.1) follows from Theorem 2.1;
- $g(s) = \sum_{l\in(\mathbb{Z}^n)^q\setminus Z} b(l)\, f_{\Theta \sum_i \varepsilon_i l_i}(s) + \big( \sum_{l\in Z} b(l) \big) \sum'_{k\in\mathbb{Z}^n} \frac{P(k)}{|k|^s}$, so the point (ii) follows easily from (iii) and Theorem 2.1.

So, to complete the proof, it remains to prove the items (i.2) and (iii). The direct proof of (i.2) is easy but is not sufficient to deduce (iii), whose proof is more delicate and requires a more precise (i.e. more effective) version of (i.2). The next lemma gives such a version, but first let us fix some notation:
\[ \mathcal{F} := \Big\{ \frac{P(X)}{(X_1^2+\dots+X_n^2+1)^{r/2}} : P(X) \in \mathbb{C}[X_1,\dots,X_n] \text{ and } r \in \mathbb{N}_0 \Big\}. \]
We set $g = \deg(G) = \deg(P) - r \in \mathbb{Z}$, the degree of $G = \frac{P(X)}{(X_1^2+\dots+X_n^2+1)^{r/2}} \in \mathcal{F}$. By convention we set $\deg(0) = -\infty$.

Lemma 2.6. Let $a \in \mathbb{R}^n$. We assume that $d(a.u,\mathbb{Z}) := \inf_{m\in\mathbb{Z}} |a.u - m| > 0$ for some $u \in \mathbb{Z}^n$. For all $G \in \mathcal{F}$, we define formally
\[ F_0(G;a;s) := \sum'_{k\in\mathbb{Z}^n} \frac{G(k)}{|k|^s}\, e^{2\pi i\, k.a} \quad\text{and}\quad F_1(G;a;s) := \sum_{k\in\mathbb{Z}^n} \frac{G(k)}{(|k|^2+1)^{s/2}}\, e^{2\pi i\, k.a}. \]
Then for all $N \in \mathbb{N}$, all $G \in \mathcal{F}$ and all $i \in \{0,1\}$, there exist positive constants $C_i := C_i(G,N,u)$, $B_i := B_i(G,N,u)$ and $A_i := A_i(G,N,u)$ such that $s \mapsto F_i(G;a;s)$ extends holomorphically to the half-plane $\{\Re(s) > -N\}$ and verifies in it:
\[ F_i(G;a;s) \le C_i\, (1+|s|)^{B_i}\, \big( d(a.u,\mathbb{Z}) \big)^{-A_i}. \]

Remark 2.7. The important point here is that we obtain an explicit bound of $F_i(G;a;s)$ in $\{\Re(s) > -N\}$ which depends on the vector $a$ only through $d(a.u,\mathbb{Z})$, so depends on $u$ and indirectly on $a$ (in the sequel, $a$ will vary). In particular the constants $C_i := C_i(G,N,u)$, $B_i = B_i(G,N)$ and $A_i := A_i(G,N)$ do not depend on the vector $a$ but only on $u$. This is crucial for the proof of items (ii) and (iii) of Theorem 2.5!

2.2.1 Proof of Lemma 2.6 for i = 1:

Let $N \in \mathbb{N}_0$ be a fixed integer, and set $g_0 := n+N+1$. We will prove Lemma 2.6 by induction on $g = \deg(G) \in \mathbb{Z}$.
More precisely, in order to prove case i = 1, it suffices to prove that: Lemma 2.6 is true for all G ∈ F verifying deg(G) ≤ −g0. Let g ∈ Z with g ≥ −g0+1. If Lemma 2.6 is true for all G ∈ F such that deg(G) ≤ g−1, then it is also true for all G ∈ F satisfying deg(G) = g. • Step 1: Checking Lemma 2.6 for deg(G) ≤ −g0 := −(n+N + 1). Let G(X) = P (X) (X21+···+X r/2 ∈ F verifying deg(G) ≤ −g0. It is easy to see that we have uniformly in s = σ + iτ ∈ C and in k ∈ Zn: |G(k) e2πi k.a| (|k|2+1)σ/2 |P (k)| (|k|2+1)(r+σ)/2 ≪G 1(|k|2+1)(r+σ−deg(P ))/2 ≪G (|k|2+1)(σ−deg(G))/2 ≪G 1(|k|2+1)(σ+g0)/2 . It follows that F1(G; a; s) = (|k|2+1)s/2 e2πi k.a converges absolutely and defines a holo- morphic function in the half plane {σ > −N}. Therefore, we have for any s ∈ {ℜ(s) > −N}: |F1(G; a; s)| ≪G (|k|2+1)(−N+g0)/2 (|k|2+1)(n+1)/2 ≪G 1. Thus, Lemma 2.6 is true when deg(G) ≤ −g0. • Step 2: Induction. Now let g ∈ Z satisfying g ≥ −g0+1 and suppose that Lemma 2.6 is valid for all G ∈ F verifying deg(G) ≤ g − 1. Let G ∈ F with deg(G) = g. We will prove that G also verifies conclusions of Lemma 2.6: There exist P ∈ C[X1, . . . ,Xn] of degree d ≥ 0 and r ∈ N0 such that G(X) = P (X)(X21+···+X2n+1)r/2 and g =deg(G) = d− r. Since G(k) ≪ (|k|2 +1)g/2 uniformly in k ∈ Zn, we deduce that F1(G; a; s) converges absolutely in {σ = ℜ(s) > n+ g}. Since k 7→ k + u is a bijection from Zn into Zn, it follows that we also have for ℜ(s) > n+ g F1(G; a; s) = P (k) (|k|2+1)(s+r)/2 e2πi k.a = P (k+u) (|k+u|2+1)(s+r)/2 e2πi (k+u).a = e2πi u.a P (k+u) (|k|2+2k.u+|u|2+1)(s+r)/2 e2πi k.a = e2πi u.a α∈Nn0 ;|α|1=α1+···+αn≤d ∂αP (k) (|k|2+2k.u+|u|2+1)(s+r)/2 e2πi k.a = e2πi u.a |α|1≤d ∂αP (k) (|k|2+1)(s+r)/2 2k.u+|u|2 (|k|2+1) )−(s+r)/2 e2πi k.a. Let M := sup(N + n+ g, 0) ∈ N0. 
We have uniformly in k ∈ Zn 2k.u+|u|2 (|k|2+1) )−(s+r)/2 −(s+r)/2 )(2k.u+|u|2)j (|k|2+1)j +OM,u ( (1+|s|)M+1 (|k|2+1)(M+1)/2 Thus, for σ = ℜ(s) > n+ d, F1(G; a; s) = e 2πi u.a |α|1≤d ∂αP (k) (|k|2+1)(s+r)/2 2k.u+|u|2 (|k|2+1) )−(s+r)/2 e2πi k.a = e2πi u.a |α|1≤d −(s+r)/2 ∂αP (k)(2k.u+|u|2) (|k|2+1)(s+r+2j)/2 e2πi k.a +OG,M,u (1 + |s|)M+1 (|k|2+1)(σ+M+1−g)/2 . (9) Set I := {(α, j) ∈ Nn0 × {0, . . . ,M} | |α|1 ≤ d} and I∗ := I \ { (0, 0) }. Set also G(α,j);u(X) := ∂αP (X)(2X.u+|u|2) (|X|2+1)(r+2j)/2 ∈ F for all (α, j) ∈ I∗. Since M ≥ N + n+ g, it follows from (9) that (1 − e2πi u.a) F1(G; a; s) = e2πi u.a (α,j)∈I∗ −(s+r)/2 G(α,j);u;α; s +RN (G; a;u; s) (10) where s 7→ RN (G; a;u; s) is a holomorphic function in the half plane {σ = ℜ(s) > −N}, in which it satisfies the bound RN (G; a;u; s) ≪G,N,u 1. Moreover it is easy to see that, for any (α, j) ∈ I∗, G(α,j);u = deg(∂αP ) + j − (r + 2j) ≤ d− |α|1 + j − (r + 2j) = g − |α|1 − j ≤ g − 1. Relation (10) and the induction hypothesis imply then that (1− e2πi u.a) F1(G; a; s) verifies the conclusions of Lemma 2.6. (11) Since |1− e2πi u.a| = 2| sin(πu.a)| ≥ d (u.a,Z), then (11) implies that F1(G; a; s) satisfies conclu- sions of Lemma 2.6. This completes the induction and the proof for i = 1. 2.2.2 Proof of Lemma 2.6 for i = 0: Let N ∈ N be a fixed integer. Let G(X) = P (X) (X21+···+X r/2 ∈ F and g = deg(G) = d− r where d ≥ 0 is the degree of the polynomial P . Set also M := sup(N + g + n, 0) ∈ N0. Since P (k) ≪ |k|d for k ∈ Zn\{ 0 }, it follows that F0(G; a; s) and F1(G; a; s) converge absolutely in the half plane {σ = ℜ(s) > n+ g}. Moreover, we have for s = σ + iτ ∈ C verifying σ > n+ g: F0(G; a; s) = k∈Zn\{ 0 } (|k|2+1−1)s/2 e2πi k.a = ′ G(k) (|k|2+1)s/2 |k|2+1 )−s/2 e2πi k.a (−1)j G(k) (|k|2+1)(s+2j)/2 e2πi k.a (1 + |s|)M+1 ′ |G(k)| (|k|2+1)(σ+2M+2)/2 (−1)jF1(G; a; s + 2j) (1 + |s|)M+1 ′ |G(k)| (|k|2+1)(σ+2M+2)/2 . 
(12) In addition we have uniformly in s = σ + iτ ∈ C verifying σ > −N , ′ |G(k)| (|k|2+1)(σ+2M+2)/2 ′ |k|g (|k|2+1)(−N+2M+2)/2 |k|n+1 < +∞. So (12) and Lemma 2.6 for i = 1 imply that Lemma 2.6 is also true for i = 0. This completes the proof of Lemma 2.6. 2.2.3 Proof of item (i.2) of Theorem 2.5: Since a ∈ Rn \ Zn, there exists i0 ∈ {1, . . . , n} such that ai0 6∈ Z. In particular d(a.ei0 ,Z) = d(ai0 ,Z) > 0. Therefore, a satisfies the assumption of Lemma 2.6 with u = ei0 . Thus, for all N ∈ N, s 7→ fa(s) = F0(P ; a; s) has a holomorphic continuation to the half-plane {ℜ(s) > −N}. It follows, by letting N → ∞, that s 7→ fa(s) has a holomorphic continuation to the whole complex plane C. 2.2.4 Proof of item (iii) of Theorem 2.5: Let Θ ∈ Mn(R), (εi)i ∈ {−1, 0, 1}q and b ∈ S(Zn × Zn). We assume that Θ is a diophantine matrix. Set Z := { l = (l1, . . . , lq) ∈ (Zn)q : i εili = 0 } and P ∈ C[X1, . . . ,Xn] of degree d ≥ 0. It is easy to see that for σ > n+ d: l∈(Zn)q\Z |b(l)| ′ |P (k)| |e2πi k.Θ i εili | ≪P l∈(Zn)q\Z |b(l)| |k|σ−d l∈(Zn)q\Z |b(l)| < +∞. g0(s) := l∈(Zn)q\Z b(l) fΘ i εili (s) = l∈(Zn)q\Z ′ P (k) e2πi k.Θ i εili converges absolutely in the half plane {ℜ(s) > n+ d}. Moreover with the notations of Lemma 2.6, we have for all s = σ + iτ ∈ C verifying σ > n+ d: g0(s) = l∈(Zn)q\Z b(l)fΘ i εili (s) = l∈(Zn)q\Z b(l)F0(P ; Θ εili; s) (13) But Θ is diophantine, so there exists u ∈ Zn and δ, c > 0 such |q. tΘu−m| ≥ c (1 + |q|)−δ , ∀q ∈ Zn \ { 0 }, ∀m ∈ Z. We deduce that ∀l ∈ (Zn)q \ Z, .u−m| = | .tΘu−m| ≥ c 1 + | εili| )−δ ≥ c (1 + |l|)−δ. It follows that there exists u ∈ Zn, δ > 0 and c > 0 such that ∀l ∈ (Zn)q \ Z, d εili).u;Z ≥ c (1 + |l|)−δ . (14) Therefore, for any l ∈ (Zn)q \Z, the vector a = Θ i εili verifies the assumption of Lemma 2.6 with the same u. Moreover δ and c in (14) are also independent on l. We fix now N ∈ N. 
Lemma 2.6 implies that there exist positive constants C0 := C0(P,N, u), B0 := Bi(P,N, u) and A0 := A0(P,N, u) such that for all l ∈ (Zn)q \ Z, s 7→ F0(P ; Θ i εili; s) extends holomorphically to the half plane {ℜ(s) > −N} and verifies in it the bound F0(P ; Θ εili; s) ≤ C0 (1 + |s|)B0 d εili).u;Z This and (14) imply that for any compact set K included in the half plane {ℜ(s) > −N}, there exist two constants C := C(P,N, c, δ, u,K) and D := D(P,N, c, δ, u) (independent on l ∈ (Zn)q \ Z) such that ∀s ∈ K and ∀l ∈ (Zn)q \ Z, F0(P ; Θ εili; s) ≤ C (1 + |l|)D . (15) It follows that s 7→ l∈(Zn)q\Z b(l)F0(P ; Θ iεili; s) has a holomorphic continuation to the half plane {ℜ(s) > −N}. This and ( 13) imply that s 7→ g0(s) = l∈(Zn)q\Z b(l)fΘ i εili (s) has a holomorphic contin- uation to {ℜ(s) > −N}. Since N is an arbitrary integer, by letting N → ∞, it follows that s 7→ g0(s) has a holomorphic continuation to the whole complex plane C which completes the proof of the theorem. Remark 2.8. By equation (11), we see that a Diophantine condition is sufficient to get Lemma 2.6. Our Diophantine condition appears also (in equivalent form) in Connes [7, Prop. 49] (see Remark 4.2 below). The following heuristic argument shows that our condition seems to be necessary in order to get the result of Theorem 2.5: For simplicity we assume n = 1 (but the argument extends easily to any n). Let θ ∈ R \Q. We know (see this reflection formula in [15, p. 6]) that for any l ∈ Z \ {0}, gθl(s) := e2πiθlk s−1/2 ) hθl(1− s) where hθl(s) := |θl+k|s So, for any (al) ∈ S(Z), the existence of meromorphic continuation of g0(s) := l∈Z al gθl(s) is equivalent to the existence of meromorphic continuation of h0(s) := al hθl(s) = |θl+k|s So, for at least one σ0 ∈ R, we must have |al||θl+k|σ0 = O(1) uniformly in k, l ∈ Z It follows that for any (al) ∈ S(Z), |θl + k| ≫ |al|1/σ0 uniformly in k, l ∈ Z∗. Therefore, our Diophantine condition seems to be necessary. 2.2.5 Commutation between sum and residue Let p ∈ N. 
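Recall that $\mathcal{S}((\mathbb{Z}^n)^p)$ is the set of the Schwartz sequences on $(\mathbb{Z}^n)^p$. In other words, $b \in \mathcal{S}((\mathbb{Z}^n)^p)$ if and only if for all $r \in \mathbb{N}_0$, $(1+|l_1|^2+\dots+|l_p|^2)^r\, |b(l_1,\dots,l_p)|^2$ is bounded on $(\mathbb{Z}^n)^p$. We note that if $Q \in \mathbb{R}[X_1,\dots,X_{np}]$ is a polynomial, $(a_j) \in \mathcal{S}(\mathbb{Z}^n)^p$, $b \in \mathcal{S}(\mathbb{Z}^n)$ and $\phi$ is a real-valued function, then $l := (l_1,\dots,l_p) \mapsto \tilde a(l)\, b(-\hat l_p)\, Q(l)\, e^{i\phi(l)}$ is a Schwartz sequence on $(\mathbb{Z}^n)^p$, where
\[ \tilde a(l) := a_1(l_1)\cdots a_p(l_p), \qquad \hat l_i := l_1 + \dots + l_i. \]
In the following, we will use several times the fact that for any $(k,l) \in (\mathbb{Z}^n)^2$ such that $k \neq 0$ and $k \neq -l$, we have
\[ \frac{1}{|k+l|^2} = \frac{1}{|k|^2} - \frac{2k.l+|l|^2}{|k|^2\,|k+l|^2}. \tag{16} \]

Lemma 2.9. There exists a polynomial $P \in \mathbb{R}[X_1,\dots,X_p]$ of degree $4p$ and with positive coefficients such that for any $k \in \mathbb{Z}^n$ and $l := (l_1,\dots,l_p) \in (\mathbb{Z}^n)^p$ such that $k \neq 0$ and $k \neq -\hat l_i$ for all $1 \le i \le p$, the following holds:
\[ \frac{1}{|k+\hat l_1|^2 \cdots |k+\hat l_p|^2} \le \frac{1}{|k|^{2p}}\, P(|l_1|,\dots,|l_p|). \]

Proof. Let us fix $i$ such that $1 \le i \le p$. Using (16) twice, the Cauchy–Schwarz inequality and the fact that $|k+\hat l_i|^2 \ge 1$, we get
\[ \frac{1}{|k+\hat l_i|^2} \le \frac{1}{|k|^2} + \frac{2|k||\hat l_i|+|\hat l_i|^2}{|k|^4} + \frac{(2|k||\hat l_i|+|\hat l_i|^2)^2}{|k|^4\,|k+\hat l_i|^2} \le \frac{1}{|k|^2} + \frac{2}{|k|^3}\,|\hat l_i| + \Big( \frac{1}{|k|^4} + \frac{4}{|k|^2} \Big) |\hat l_i|^2 + \frac{4}{|k|^3}\,|\hat l_i|^3 + \frac{1}{|k|^4}\,|\hat l_i|^4. \]
Since $|k| \ge 1$, and $|\hat l_i|^j \le |\hat l_i|^4$ if $1 \le j \le 4$, we find
\[ \frac{1}{|k+\hat l_i|^2} \le \frac{5}{|k|^2} \sum_{j=0}^{4} |\hat l_i|^j \le \frac{5}{|k|^2} \big( 1 + 4|\hat l_i|^4 \big) \le \frac{5}{|k|^2} \Big( 1 + 4\Big( \sum_{j=1}^{p} |l_j| \Big)^4 \Big), \]
hence
\[ \frac{1}{|k+\hat l_1|^2 \cdots |k+\hat l_p|^2} \le \frac{5^p}{|k|^{2p}} \Big( 1 + 4\Big( \sum_{j=1}^{p} |l_j| \Big)^4 \Big)^p. \]
Taking $P(X_1,\dots,X_p) := 5^p \big( 1 + 4( \sum_{j=1}^{p} X_j )^4 \big)^p$ now gives the result.

Lemma 2.10. Let $b \in \mathcal{S}((\mathbb{Z}^n)^p)$, $p \in \mathbb{N}$, $P_j \in \mathbb{R}[X_1,\dots,X_n]$ be a homogeneous polynomial function of degree $j$, $k \in \mathbb{Z}^n$, $l := (l_1,\dots,l_p) \in (\mathbb{Z}^n)^p$, $r \in \mathbb{N}_0$, $\phi$ be a real-valued function on $\mathbb{Z}^n \times (\mathbb{Z}^n)^p$ and
\[ h(s,k,l) := \frac{b(l)\, P_j(k)\, e^{i\phi(k,l)}}{|k|^{s+r}\, |k+\hat l_1|^2 \cdots |k+\hat l_p|^2}, \]
with $h(s,k,l) := 0$ if, for $k \neq 0$, one of the denominators is zero. For all $s \in \mathbb{C}$ such that $\Re(s) > n+j-r-2p$, the series
\[ H(s) := \sum'_{(k,l)\in(\mathbb{Z}^n)^{p+1}} h(s,k,l) \]
is absolutely summable. In particular,
\[ \sum'_{k\in\mathbb{Z}^n} \sum_{l\in(\mathbb{Z}^n)^p} h(s,k,l) = \sum_{l\in(\mathbb{Z}^n)^p} \sum'_{k\in\mathbb{Z}^n} h(s,k,l). \]

Proof. Let $s = \sigma + i\tau \in \mathbb{C}$ such that $\sigma > n+j-r-2p$.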
By Lemma 2.9 we get, for $k \neq 0$,
\[ |h(s,k,l)| \le |b(l)\, P_j(k)|\, |k|^{-r-\sigma-2p}\, P(l), \]
where $P(l) := P(|l_1|,\dots,|l_p|)$ and $P$ is a polynomial of degree $4p$ with positive coefficients. Thus, $|h(s,k,l)| \le F(l)\, G(k)$ where $F(l) := |b(l)|\, P(l)$ and $G(k) := |P_j(k)|\, |k|^{-r-\sigma-2p}$. The summability of $\sum_{l\in(\mathbb{Z}^n)^p} F(l)$ is implied by the fact that $b \in \mathcal{S}((\mathbb{Z}^n)^p)$. The summability of $\sum'_{k\in\mathbb{Z}^n} G(k)$ is a consequence of the fact that $\sigma > n+j-r-2p$. Finally, as a product of two summable series, $\sum_{k,l} F(l)\, G(k)$ is a summable series, which proves that $\sum_{k,l} h(s,k,l)$ is also absolutely summable.

Definition 2.11. Let $f$ be a function on $D \times (\mathbb{Z}^n)^p$ where $D$ is an open neighborhood of 0 in $\mathbb{C}$.
We say that $f$ satisfies (H1) if and only if there exists $\rho > 0$ such that
(i) for any $l$, $s \mapsto f(s,l)$ extends as a holomorphic function on $U_\rho$, where $U_\rho$ is the open disk of center 0 and radius $\rho$,
(ii) the series $\sum_{l\in(\mathbb{Z}^n)^p} \|f(\cdot,l)\|_{\infty,\rho}$ is summable, where $\|f(\cdot,l)\|_{\infty,\rho} := \sup_{s\in U_\rho} |f(s,l)|$.
We say that $f$ satisfies (H2) if and only if there exists $\rho > 0$ such that
(i) for any $l$, $s \mapsto f(s,l)$ extends as a holomorphic function on $U_\rho \setminus \{0\}$,
(ii) for any $\delta$ such that $0 < \delta < \rho$, the series $\sum_{l\in(\mathbb{Z}^n)^p} \|f(\cdot,l)\|_{\infty,\delta,\rho}$ is summable, where $\|f(\cdot,l)\|_{\infty,\delta,\rho} := \sup_{\delta<|s|<\rho} |f(s,l)|$.

Remark 2.12. Note that (H1) implies (H2). Moreover, if $f$ satisfies (H1) (resp. (H2)) for $\rho > 0$, then it is straightforward to check that the function $s \mapsto \sum_{l\in(\mathbb{Z}^n)^p} f(s,l)$ extends as a holomorphic function on $U_\rho$ (resp. on $U_\rho \setminus \{0\}$).

Corollary 2.13. With the same notations as in Lemma 2.10, suppose that $r+2p-j > n$. Then the function $H(s,l) := \sum'_{k\in\mathbb{Z}^n} h(s,k,l)$ satisfies (H1).

Proof. (i) Let us fix $\rho > 0$ such that $\rho < r+2p-j-n$. Since $r+2p-j > n$, $U_\rho$ is inside the half-plane of absolute convergence of the series defined by $H(s,l)$. Thus, $s \mapsto H(s,l)$ is holomorphic on $U_\rho$.
(ii) Since $\big| |k|^{-s} \big| \le |k|^{\rho}$ for all $s \in U_\rho$ and $k \in \mathbb{Z}^n \setminus \{0\}$, we get as in the above proof
\[ |h(s,k,l)| \le |b(l)\, P_j(k)|\, |k|^{-r+\rho-2p}\, P(|l_1|,\dots,|l_p|). \]
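Since $\rho < r+2p-j-n$, the series $\sum'_{k\in\mathbb{Z}^n} |P_j(k)|\, |k|^{-r+\rho-2p}$ is summable. Thus, $\|H(\cdot,l)\|_{\infty,\rho} \le K\, F(l)$ where $K := \sum'_{k} |P_j(k)|\, |k|^{-r+\rho-2p} < \infty$. We have already seen that the series $\sum_l F(l)$ is summable, so we get the result.

We note that if $f$ and $g$ both satisfy (H1) (or (H2)), then so does $f+g$. In the following, we will use the equivalence relation
\[ f \sim g \iff f - g \text{ satisfies (H1)}. \]

Lemma 2.14. Let $f$ and $g$ be two functions on $D \times (\mathbb{Z}^n)^p$ where $D$ is an open neighborhood of 0 in $\mathbb{C}$, such that $f \sim g$ and such that $g$ satisfies (H2). Then
\[ \operatorname*{Res}_{s=0} \sum_{l\in(\mathbb{Z}^n)^p} f(s,l) = \sum_{l\in(\mathbb{Z}^n)^p} \operatorname*{Res}_{s=0}\, g(s,l). \]

Proof. Since $f \sim g$, $f$ satisfies (H2) for a certain $\rho > 0$. Let us fix $\eta$ such that $0 < \eta < \rho$ and define $C_\eta$ as the circle of center 0 and radius $\eta$. We have
\[ \operatorname*{Res}_{s=0}\, g(s,l) = \operatorname*{Res}_{s=0}\, f(s,l) = \frac{1}{2\pi i} \oint_{C_\eta} f(s,l)\, ds = \int_I u(t,l)\, dt, \]
where $I = [0,2\pi]$ and $u(t,l) := \frac{1}{2\pi}\, \eta\, e^{it} f(\eta\, e^{it}, l)$. The fact that $f$ satisfies (H2) entails that the series $\sum_{l\in(\mathbb{Z}^n)^p} \|f(\cdot,l)\|_{\infty,C_\eta}$ is summable. Thus, since $\|u(\cdot,l)\|_\infty = \frac{\eta}{2\pi}\, \|f(\cdot,l)\|_{\infty,C_\eta}$, the series $\sum_{l\in(\mathbb{Z}^n)^p} \|u(\cdot,l)\|_\infty$ is summable, so, as a consequence,
\[ \int_I \sum_{l\in(\mathbb{Z}^n)^p} u(t,l)\, dt = \sum_{l\in(\mathbb{Z}^n)^p} \int_I u(t,l)\, dt, \]
which gives the result.

2.3 Computation of residues of zeta functions

Since we will have to compute residues of series, let us introduce the following

Definition 2.15.
\[ \zeta(s) := \sum_{k=1}^{\infty} k^{-s}, \qquad Z_n(s) := \sum'_{k\in\mathbb{Z}^n} |k|^{-s}, \qquad \zeta_{p_1,\dots,p_n}(s) := \sum'_{k\in\mathbb{Z}^n} \frac{k_1^{p_1}\cdots k_n^{p_n}}{|k|^s}, \text{ for } p_i \in \mathbb{N}, \]
where $\zeta(s)$ is the Riemann zeta function (see [25] or [14]).

By the symmetry $k \to -k$, it is clear that these functions $\zeta_{p_1,\dots,p_n}$ all vanish for odd values of $p_i$. Let us now compute $\zeta_{0,\dots,0,1_i,0,\dots,0,1_j,0,\dots,0}(s)$ in terms of $Z_n(s)$: since $\zeta_{0,\dots,0,1_i,0,\dots,0,1_j,0,\dots,0}(s) = A_i(s)\, \delta_{ij}$, exchanging the components $k_i$ and $k_j$, we get
\[ \zeta_{0,\dots,0,1_i,0,\dots,0,1_j,0,\dots,0}(s) = \frac{\delta_{ij}}{n}\, Z_n(s-2). \]
Similarly,
\[ \sum'_{k\in\mathbb{Z}^n} \frac{k_1^2 k_2^2}{|k|^{s+8}} = \frac{1}{n(n-1)}\, Z_n(s+4) - \frac{1}{n-1} \sum'_{k\in\mathbb{Z}^n} \frac{k_1^4}{|k|^{s+8}}, \]
but it is difficult to write explicitly $\zeta_{p_1,\dots,p_n}(s)$ in terms of $Z_n(s-4)$ and other $Z_n(s-m)$ when at least four indices $p_i$ are non zero.

When all $p_i$ are even, $\zeta_{p_1,\dots,p_n}(s)$ is a nonzero series of fractions $\frac{P(k)}{|k|^s}$ where $P$ is a homogeneous polynomial of degree $p_1+\dots+p_n$.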
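By Theorem 2.1, the residue of such a series at its pole is the integral of the homogeneous polynomial over the unit sphere. The following is a minimal numerical sketch of these sphere integrals for monomials; it assumes the standard Gamma-function closed form for monomial integrals over $S^{n-1}$ (an assumption imported here, not derived above), and cross-checks it in dimension 2 by direct quadrature on the circle. The function names `sphere_moment` and `circle_moment_quadrature` are hypothetical.

```python
import math


def sphere_moment(p):
    """Integral of u_1^{p_1} ... u_n^{p_n} over the unit sphere S^{n-1}.

    Assumed closed form: 2 * prod Gamma((p_i + 1)/2) / Gamma((n + |p|)/2)
    when every p_i is even, and 0 as soon as one p_i is odd (by symmetry).
    """
    if any(pi % 2 for pi in p):
        return 0.0
    n = len(p)
    numerator = 2.0 * math.prod(math.gamma((pi + 1) / 2) for pi in p)
    return numerator / math.gamma((n + sum(p)) / 2)


def circle_moment_quadrature(p1, p2, steps=20000):
    """Same integral for n = 2, computed by the periodic trapezoid rule."""
    h = 2.0 * math.pi / steps
    return h * sum(
        math.cos(i * h) ** p1 * math.sin(i * h) ** p2 for i in range(steps)
    )


if __name__ == "__main__":
    # n = 2: second moment and total length of the circle.
    print(sphere_moment((2, 0)), sphere_moment((0, 0)))
    # n = 4: second and fourth moments.
    print(sphere_moment((2, 0, 0, 0)), sphere_moment((2, 2, 0, 0)))
```

For $n = 2$ the second moment comes out as $\pi$, and for $n = 4$ the moments of $u_1^2$ and $u_1^2 u_2^2$ come out as $\pi^2/2$ and $\pi^2/12$; mixed moments with an odd exponent vanish, in line with the symmetry $k \to -k$ noted above.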
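Theorem 2.1 now gives us the following

Proposition 2.16. $\zeta_{p_1,\dots,p_n}$ has a meromorphic extension to the whole plane with a unique pole at $n+p_1+\dots+p_n$. This pole is simple and the residue at this pole is
\[ \operatorname*{Res}_{s=n+p_1+\dots+p_n} \zeta_{p_1,\dots,p_n}(s) = 2\, \frac{\Gamma\big(\frac{p_1+1}{2}\big) \cdots \Gamma\big(\frac{p_n+1}{2}\big)}{\Gamma\big(\frac{n+p_1+\dots+p_n}{2}\big)} \tag{17} \]
when all $p_i$ are even, and this residue is zero otherwise. In particular, for $n = 2$,
\[ \operatorname*{Res}_{s=0} \sum'_{k\in\mathbb{Z}^2} \frac{k_i k_j}{|k|^{s+4}} = \delta_{ij}\, \pi, \tag{18} \]
and for $n = 4$,
\[ \operatorname*{Res}_{s=0} \sum'_{k\in\mathbb{Z}^4} \frac{k_i k_j}{|k|^{s+6}} = \delta_{ij}\, \frac{\pi^2}{2}, \qquad \operatorname*{Res}_{s=0} \sum'_{k\in\mathbb{Z}^4} \frac{k_i k_j k_l k_m}{|k|^{s+8}} = (\delta_{ij}\delta_{lm} + \delta_{il}\delta_{jm} + \delta_{im}\delta_{jl})\, \frac{\pi^2}{12}. \tag{19} \]

Proof. Equation (17) follows from Theorem 2.1,
\[ \operatorname*{Res}_{s=n+p_1+\dots+p_n} \zeta_{p_1,\dots,p_n}(s) = \int_{k\in S^{n-1}} k_1^{p_1} \cdots k_n^{p_n}\, dS(k), \]
and standard formulae (see for instance [32, VIII,1;22]). Equation (18) is a straightforward consequence of Equation (17). Equation (19) can be checked for the cases $i = j \neq l = m$ and $i = j = l = m$.

Note that $Z_n(s)$ is an Epstein zeta function associated to the quadratic form $q(x) := x_1^2 + \dots + x_n^2$, so $Z_n$ satisfies the following functional equation
\[ Z_n(s) = \pi^{s-n/2}\, \Gamma(n/2 - s/2)\, \Gamma(s/2)^{-1}\, Z_n(n-s). \]
Since $\pi^{s-n/2}\, \Gamma(n/2-s/2)\, \Gamma(s/2)^{-1}$ vanishes at $s = 0$ and at every negative even integer $s$, and $Z_n(s)$ is meromorphic on $\mathbb{C}$ with only one pole at $s = n$ with residue $2\pi^{n/2}\Gamma(n/2)^{-1}$ according to the previous proposition, we get $Z_n(0) = -1$. We have proved that
\[ \operatorname*{Res}_{s=0} Z_n(s+n) = 2\pi^{n/2}\, \Gamma(n/2)^{-1}, \tag{20} \]
\[ Z_n(0) = -1. \tag{21} \]

2.4 Meromorphic continuation of a class of zeta functions

Let $n, q \in \mathbb{N}$, $q \ge 2$, and $p = (p_1,\dots,p_{q-1}) \in \mathbb{N}_0^{q-1}$. Set $I := \{ i \mid p_i \neq 0 \}$ and assume that $I \neq \emptyset$ and
\[ \mathcal{I} := \{ \alpha = (\alpha_i)_{i\in I} \mid \forall i \in I,\ \alpha_i = (\alpha_{i,1},\dots,\alpha_{i,p_i}) \in \mathbb{N}_0^{p_i} \} = \prod_{i\in I} \mathbb{N}_0^{p_i}. \]
We will use in the sequel also the following notations:
- for $x = (x_1,\dots,x_t) \in \mathbb{R}^t$ recall that $|x|_1 = |x_1| + \dots + |x_t|$ and $|x| = \sqrt{x_1^2 + \dots + x_t^2}$;
- for all $\alpha = (\alpha_i)_{i\in I} \in \mathcal{I} = \prod_{i\in I} \mathbb{N}_0^{p_i}$,
\[ |\alpha|_1 = \sum_{i\in I} |\alpha_i|_1 = \sum_{i\in I} \sum_{j=1}^{p_i} |\alpha_{i,j}| \quad\text{and}\quad \binom{1/2}{\alpha} = \prod_{i\in I} \binom{1/2}{\alpha_i} = \prod_{i\in I} \prod_{j=1}^{p_i} \binom{1/2}{\alpha_{i,j}}. \]

2.4.1 A family of polynomials

In this paragraph we define a family of polynomials which plays an important role later. Consider first the variables:
- for $X_1,\dots,X_n$ we set $X = (X_1,\dots,X_n)$;
- for any $i = 1,\dots,2q$, we consider the variables $Y_{i,1},\dots,Y_{i,n}$ and set $Y_i := (Y_{i,1},\dots,Y_{i,n})$ and Y := (Y1, .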
. . , Y2q); - for Y = (Y1, . . . , Y2q), we set for any 1 ≤ j ≤ q, Ỹj := Y1 + · · ·+ Yj + Yq+1 + · · ·+ Yq+j. We define for all α = (αi)i∈I ∈ I = i∈I N 0 the polynomial Pα(X,Y ) := (2〈X, Ỹi〉+ |Ỹi|2)αi,j . (22) It is clear that Pα(X,Y ) ∈ Z[X,Y ], degXPα ≤ |α|1 and degY Pα ≤ 2|α|1. Let us fix a polynomial Q ∈ R[X1, · · · ,Xn] and note d := degQ. For α ∈ I, we want to expand Pα(X,Y )Q(X) in homogeneous polynomials in X and Y so defining L(α) := {β ∈ N(2q+1)n0 : |β|1 − dβ ≤ 2|α|1 and dβ ≤ |α|1 + d } where dβ := 1 βi, we set Pα(X,Y )Q(X) =: β∈L(α) cα,βX where cα,β ∈ R, Xβ := Xβ11 · · ·X n and Y β := Y 1,1 · · ·Y β(2q+1)n 2q,n . By definition, X β is a homogeneous polynomial of degree in X equals to dβ . We note Mα,β(Y ) := cα,β Y 2.4.2 Residues of a class of zeta functions In this section we will prove the following result, used in Proposition 5.4 for the computation of the spectrum dimension of the noncommutative torus: Theorem 2.17. (i) Let 1 Θ be a diophantine matrix, and ã ∈ S (Zn)2q . Then s 7→ f(s) := l∈[(Zn)q]2 |k + l̃i|pi |k|−sQ(k) eik.Θ has a meromorphic continuation to the whole complex plane C with at most simple possible poles at the points s = n+ d+ |p|1 −m where m ∈ N0. (ii) Let m ∈ N0 and set I(m) := { (α, β) ∈ I × N(2q+1)n0 : β ∈ L(α) and m = 2|α|1 − dβ + d }. Then I(m) is a finite set and s = n+ d+ |p|1 −m is a pole of f if and only if C(f,m) := (α,β)∈I(m) Mα,β(l) u∈Sn−1 uβ dS(u) 6= 0, with Z := {l : 1 lj = 0} and the convention ∅ = 0. In that case s = n + d + |p|1 −m is a simple pole of residue Res s=n+d+|p|1−m f(s) = C(f,m). In order to prove the theorem above we need the following Lemma 2.18. For all N ∈ N we have |k + l̃i|pi = α=(αi)i∈I∈ i∈I{0,...,N} ) Pα(k,l) |k|2|α|1−|p|1 +ON (|k||p|1−(N+1)/2) uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l) := 36 ( ∑2q−1 i=1, i 6=q |li|)4. Proof. For i = 1, . . . , q − 1, we have uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l), ∣∣2〈k,eli〉+|eli|2 . 
(23) In that case, |k + l̃i| = |k|2 + 2〈k, l̃i〉+ |l̃i|2 = |k| 2〈k,eli〉+|eli| |k|2u−1 P iu(k, l) where for all i = 1, . . . , q − 1 and for all u ∈ N0, P iu(k, l) := 2〈k, l̃i〉+ |l̃i|2 with the convention P i0(k, l) := 1. In particular P iu(k, l) ∈ Z[k, l], degk P iu ≤ u and degl P iu ≤ 2u. Inequality (23) implies that for all i = 1, . . . , q − 1 and for all u ∈ N, |k|2u |P iu(k, l)| ≤ uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l). Let N ∈ N. We deduce from the previous that for any k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l) and for all i = 1, . . . , q − 1, we have |k + l̃i| = |k|2u−1 P iu(k, l) +O |k| | |k|)−u |k|2u−1 P iu(k, l) +ON |k|(N−1)/2 It follows that for any N ∈ N, we have uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l) and for all i ∈ I, |k + l̃i|pi = αi∈{0,...,N} |k|2|αi|1−pi P iαi(k, l) +ON |k|(N+1)/2−pi where P iαi(k, l) = j=1 P (k, l) for all αi = (αi,1, . . . , αi,pi) ∈ {0, . . . , N}pi and |k + l̃i|pi = α=(αi)∈ i∈I{0,...,N} |k|2|α|1−|p|1 Pα(k, l) +ON |k|(N+1)/2−|p|1 where Pα(k, l) = i∈I P (k, l) = j=1 P (k, l). Proof of Theorem 2.17. (i) All n, q, p = (p1, . . . , pq−1) and ã ∈ S (Zn)2q are fixed as above and we define formally for any l ∈ (Zn)2q F (l, s) := |k + l̃i|pi Q(k) eik.Θ 1 lj |k|−s. (24) Thus, still formally, f(s) := l∈(Zn)2q ãl F (l, s). (25) It is clear that F (l, s) converges absolutely in the half plane {σ = ℜ(s) > n + d + |p|1} where d = degQ. Let N ∈ N. Lemma 2.18 implies that for any l ∈ (Zn)2q and for s ∈ C such that σ > n+ |p|1+d, F (l, s) = |k|≤U(l) |k + l̃i|pi Q(k) eik.Θ 1 lj |k|−s α=(αi)i∈I∈ i∈I{0,...,N} |k|>U(l) |k|s+2|α|1−|p|1 Pα(k, l)Q(k) e 1 lj +GN (l, s). where s 7→ GN (l, s) is a holomorphic function in the half-plane DN := {σ > n+ d+ |p|1 − N+12 } and verifies in it the bound GN (l, s) ≪N,σ 1 uniformly in l. 
It follows that F (l, s) = α=(αi)i∈I∈ i∈I{0,...,N} Hα(l, s) +RN (l, s), (26) where Hα(l, s) := ′ (1/2 |k|s+2|α|1−|p|1 Pα(k, l)Q(k) e 1 lj , RN (l, s) := |k|≤U(l) |k + l̃i|pi Q(k) eik.Θ 1 lj |k|−s |k|≤U(l) α=(αi)i∈I∈ i∈I{0,...,N} ) Pα(k,l) |k|s+2|α|1−|p|1 Q(k) eik.Θ 1 lj +GN (l, s). In particular there exists A(N) > 0 such that s 7→ RN (l, s) extends holomorphically to the half-plane DN and verifies in it the bound RN (l, s) ≪N,σ 1 + |l|A(N) uniformly in l. Let us note formally hα(s) := ãlHα(l, s). Equation (26) and RN (l, s) ≪N,σ 1 + |l|A(N) imply that f(s) ∼N α=(αi)i∈I∈ i∈I{0,...,N} hα(s), (27) where ∼N means modulo a holomorphic function in DN . Recall the decomposition Pα(k, l)Q(k) = β∈L(α)Mα,β(l) k β and we decompose similarly hα(s) = β∈L(α) hα,β(s). Theorem 2.5 now implies that for all α = (αi)i∈I ∈ i∈I{0, . . . , N}pi and β ∈ L(α), - the map s 7→ hα,β(s) has a meromorphic continuation to the whole complex plane C with only one simple possible pole at s = n+ |p|1 − 2|α|1 + dβ , - the residue at this point is equal to s=n+|p|1−2|α|1+dβ hα,β(s) = ãlMα,β(l) u∈Sn−1 uβdS(u) (28) where Z := {l ∈ (Z)n)2q : 1 lj = 0}. If the right hand side is zero, hα,β(s) is holomorphic on By (27), we deduce therefore that f(s) has a meromorphic continuation on the halfplane DN , with only simple possible poles in the set {n+ |p|1+k : −2N |p|1 ≤ k ≤ d }. Taking now N → ∞ yields the result. (ii) Let m ∈ N0 and set I(m) := { (α, β) ∈ I × N(2q+1)n0 : β ∈ L(α) and m = 2|α|1 − dβ + d }. If (α, β) ∈ I(m), then |α|1 ≤ m and |β|1 ≤ 3m+ d, so I(m) is finite. With a chosen N such that 2N |p|1 + d > m, we get by (27) and (28) s=n+d+|p|1−m f(s) = (α,β)∈I(m) Mα,β(l) u∈Sn−1 uβ dS(u) = C(f,m) with the convention ∅ = 0. Thus, n+d+ |p|1−m is a pole of f if and only if C(f,m) 6= 0. 
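The dichotomy of Theorem 2.5 (i) can be seen numerically in the simplest case $n = 1$, $P = 1$ (so $d = 0$ and the only possible pole is at $s = d+n = 1$). The sketch below is an illustration under stated assumptions rather than part of the proof: it uses symmetric partial sums over $0 < |k| \le K$ (a choice of summation made here), and the helper name `partial_sum` is hypothetical. For $a = 1/2 \notin \mathbb{Z}$ the series reduces to $2\sum_{k\ge1} (-1)^k k^{-s}$, whose partial sums at $s = 1$ settle at $-2\log 2$, whereas for $a = 0 \in \mathbb{Z}$ they grow like $2\log K$, reflecting the pole.

```python
import math


def partial_sum(a, s, K):
    """Symmetric partial sum over 0 < |k| <= K of exp(2*pi*i*k*a) / |k|^s.

    Case n = 1, P = 1. The terms for k and -k combine into
    2 * cos(2*pi*k*a) / k^s, so the result is real.
    """
    return sum(
        2.0 * math.cos(2.0 * math.pi * k * a) / k ** s for k in range(1, K + 1)
    )


if __name__ == "__main__":
    for K in (10**3, 10**4, 10**5):
        # a = 1/2: stabilizes; a = 0: keeps growing logarithmically.
        print(K, partial_sum(0.5, 1.0, K), partial_sum(0.0, 1.0, K))
```

The alternating-series bound gives an error of at most $2/(K+1)$ in the case $a = 1/2$, which is why a modest cutoff already suffices there.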
3 Noncommutative integration on a simple spectral triple

In this section, we revisit the notion of noncommutative integral pioneered by Alain Connes, paying particular attention to the reality (Tomita–Takesaki) operator $J$ and to kernels of perturbed Dirac operators by symmetrized one-forms.

3.1 Kernel dimension

We will have to compare here the kernels of $\mathcal{D}$ and $\mathcal{D}_A$, which are both finite dimensional:

Lemma 3.1. Let $(\mathcal{A},\mathcal{H},\mathcal{D})$ be a spectral triple with a reality operator $J$ and chirality $\chi$. If $A \in \Omega^1_{\mathcal{D}}$ is a one-form, the fluctuated Dirac operator
\[ \mathcal{D}_A := \mathcal{D} + A + \epsilon J A J^{-1} \]
(where $\mathcal{D}J = \epsilon\, J\mathcal{D}$, $\epsilon = \pm 1$) is an operator with compact resolvent, and in particular its kernel $\operatorname{Ker} \mathcal{D}_A$ is a finite dimensional space. This space is invariant by $J$ and $\chi$.

Proof. Let $T$ be a bounded operator and let $z$ be in the resolvent set of $\mathcal{D}+T$ and $z'$ be in the resolvent set of $\mathcal{D}$. Then
\[ (\mathcal{D}+T-z)^{-1} = (\mathcal{D}-z')^{-1}\, \big[ 1 - (T+z'-z)(\mathcal{D}+T-z)^{-1} \big]. \]
Since $(\mathcal{D}-z')^{-1}$ is compact by hypothesis and since the term in brackets is bounded, $\mathcal{D}+T$ has a compact resolvent. Applying this to $T = A + \epsilon JAJ^{-1}$, $\mathcal{D}_A$ has a finite dimensional kernel (see for instance [27, Theorem 6.29]).
Since, according to the dimension, $J^2 = \pm 1$, $J$ commutes or anticommutes with $\chi$, $\chi$ commutes with the elements in the algebra $\mathcal{A}$ and $\mathcal{D}\chi = -\chi\mathcal{D}$ (see [10] or [23, p. 405]), we get $\mathcal{D}_A\chi = -\chi\mathcal{D}_A$ and $\mathcal{D}_A J = \pm J\mathcal{D}_A$, which gives the result.

3.2 Pseudodifferential operators

Let $(\mathcal{A},\mathcal{D},\mathcal{H})$ be a given real regular spectral triple of dimension $n$. We denote by $P_0$ the projection on $\operatorname{Ker} \mathcal{D}$, by $P_A$ the projection on $\operatorname{Ker} \mathcal{D}_A$, and set
\[ D := \mathcal{D} + P_0, \qquad D_A := \mathcal{D}_A + P_A. \]
$P_0$ and $P_A$ are thus finite-rank selfadjoint bounded operators. We remark that $D$ and $D_A$ are selfadjoint invertible operators with compact inverses.

Remark 3.2. Since we only need to compute the residues and the value at 0 of the $\zeta_D$, $\zeta_{D_A}$ functions, it is not necessary to define the operators $\mathcal{D}^{-1}$ or $\mathcal{D}_A^{-1}$ and the associated zeta functions.
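However, we can remark that all the work presented here could be done using the process of Higson in [26], which proves that we can add any smoothing operator to $\mathcal{D}$ or $\mathcal{D}_A$ such that the result is invertible without changing anything to the computation of residues.

Define for any $\alpha \in \mathbb{R}$
\[ OP^0 := \{ T : t \mapsto F_t(T) \in C^\infty\big( \mathbb{R}, \mathcal{B}(\mathcal{H}) \big) \}, \qquad OP^\alpha := \{ T : T\,|D|^{-\alpha} \in OP^0 \}, \]
where $F_t(T) := e^{it|D|}\, T\, e^{-it|D|} = e^{it|\mathcal{D}|}\, T\, e^{-it|\mathcal{D}|}$ since $|D| = |\mathcal{D}| + P_0$. Define
\[ \delta(T) := [|D|, T], \qquad \nabla(T) := [D^2, T], \qquad \sigma_s(T) := |D|^s\, T\, |D|^{-s}, \quad s \in \mathbb{C}. \]
It has been shown in [13] that $OP^0 = \cap_{p\ge0} \operatorname{Dom}(\delta^p)$. In particular, $OP^0$ is a subalgebra of $\mathcal{B}(\mathcal{H})$ (while elements of $OP^\alpha$ are not necessarily bounded for $\alpha > 0$) and $\mathcal{A} \subseteq OP^0$, $J\mathcal{A}J^{-1} \subseteq OP^0$, $[\mathcal{D},\mathcal{A}] \subseteq OP^0$. Note that $P_0 \in OP^{-\infty}$ and $\delta(OP^0) \subseteq OP^0$. For any $t > 0$, $D^t$ and $|D|^t$ are in $OP^t$ and for any $\alpha \in \mathbb{R}$, $D^\alpha$ and $|D|^\alpha$ are in $OP^\alpha$. By hypothesis, $|D|^{-n} \in \mathcal{L}^{(1,\infty)}(\mathcal{H})$ so for any $\alpha > n$, $OP^{-\alpha} \subseteq \mathcal{L}^1(\mathcal{H})$.

Lemma 3.3. [13]
(i) For any $T \in OP^0$ and $s \in \mathbb{C}$, $\sigma_s(T) \in OP^0$.
(ii) For any $\alpha, \beta \in \mathbb{R}$, $OP^\alpha\, OP^\beta \subseteq OP^{\alpha+\beta}$.
(iii) If $\alpha \le \beta$, $OP^\alpha \subseteq OP^\beta$.
(iv) For any $\alpha$, $\delta(OP^\alpha) \subseteq OP^\alpha$.
(v) For any $\alpha$ and $T \in OP^\alpha$, $\nabla(T) \in OP^{\alpha+1}$.

Proof. See the appendix.

Remark 3.4. Any operator in $OP^\alpha$, where $\alpha \in \mathbb{R}$, extends as a continuous linear operator from $\operatorname{Dom} |D|^{\alpha+1}$ to $\operatorname{Dom} |D|$, where the $\operatorname{Dom} |D|^\alpha$ spaces have their natural norms (see [13, 26]).

We now introduce a definition of pseudodifferential operators in a slightly different way than in [9, 13, 26], which in particular pays attention to the reality operator $J$ and the kernel of $\mathcal{D}$ and allows $\mathcal{D}$ and $|D|^{-1}$ to be pseudodifferential operators. It is more in the spirit of [4].

Definition 3.5. Let us define $\mathcal{D}(\mathcal{A})$ as the polynomial algebra generated by $\mathcal{A}$, $J\mathcal{A}J^{-1}$, $\mathcal{D}$ and $|D|$.
A pseudodifferential operator is an operator $T$ such that there exists $d \in \mathbb{Z}$ such that for any $N \in \mathbb{N}$, there exist $p \in \mathbb{N}_0$, $P \in \mathcal{D}(\mathcal{A})$ and $R \in OP^{-N}$ ($p$, $P$ and $R$ may depend on $N$) such that $P\, D^{-2p} \in OP^d$ and
\[ T = P\, D^{-2p} + R. \]
Define $\Psi(\mathcal{A})$ as the set of pseudodifferential operators and $\Psi(\mathcal{A})^k := \Psi(\mathcal{A}) \cap OP^k$.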
Note that if $A$ is a 1-form, $A$ and $JAJ^{-1}$ are in $\mathcal{D}(\mathcal{A})$ and moreover $\mathcal{D}(\mathcal{A}) \subseteq \cup_{p\in\mathbb{N}_0} OP^p$. Since $|D| \in \mathcal{D}(\mathcal{A})$ by construction and $P_0$ is a pseudodifferential operator, for any $p \in \mathbb{Z}$, $|D|^p$ is a pseudodifferential operator (in $OP^p$). Let us remark also that $\mathcal{D}(\mathcal{A}) \subseteq \Psi(\mathcal{A}) \subseteq \cup_{k\in\mathbb{Z}} OP^k$.

Lemma 3.6. [9, 13] The set of all pseudodifferential operators $\Psi(\mathcal{A})$ is an algebra. Moreover, if $T \in \Psi(\mathcal{A})^d$ and $T' \in \Psi(\mathcal{A})^{d'}$, then $TT' \in \Psi(\mathcal{A})^{d+d'}$.

Proof. See the appendix.

Due to the slight difference of behavior between scalar and nonscalar pseudodifferential operators (i.e. when coefficients like $[\mathcal{D},a]$, $a \in \mathcal{A}$, appear in $P$ of Definition 3.5), it is convenient to also introduce

Definition 3.7. Let $\mathcal{D}_1(\mathcal{A})$ be the algebra generated by $\mathcal{A}$, $J\mathcal{A}J^{-1}$ and $\mathcal{D}$, and $\Psi_1(\mathcal{A})$ be the set of pseudodifferential operators constructed as before with $\mathcal{D}_1(\mathcal{A})$ instead of $\mathcal{D}(\mathcal{A})$. Note that $\Psi_1(\mathcal{A})$ is a subalgebra of $\Psi(\mathcal{A})$.

Remark that $\Psi_1(\mathcal{A})$ does not necessarily contain operators such as $|D|^k$ where $k \in \mathbb{Z}$ is odd. This algebra is similar to the one defined in [4].

3.3 Zeta functions and dimension spectrum

For any operator $B$ and if $X$ is either $D$ or $D_A$, we define
\[ \zeta^B_X(s) := \operatorname{Tr}\big( B\,|X|^{-s} \big), \qquad \zeta_X(s) := \operatorname{Tr}\big( |X|^{-s} \big). \]
The dimension spectrum $Sd(\mathcal{A},\mathcal{H},\mathcal{D})$ of a spectral triple has been defined in [9, 13]. It is extended here to pay attention to the operator $J$ and to our definition of pseudodifferential operator.

Definition 3.8. The dimension spectrum of the spectral triple is the subset $Sd(\mathcal{A},\mathcal{H},\mathcal{D})$ of all poles of the functions $\zeta^P_D := s \mapsto \operatorname{Tr}\big( P\,|D|^{-s} \big)$ where $P$ is any pseudodifferential operator in $OP^0$. The spectral triple $(\mathcal{A},\mathcal{H},\mathcal{D})$ is simple when these poles are all simple.

Remark 3.9. If $Sp(\mathcal{A},\mathcal{H},\mathcal{D})$ denotes the set of all poles of the functions $s \mapsto \operatorname{Tr}\big( P\,|D|^{-s} \big)$ where $P$ is any pseudodifferential operator, then $Sd(\mathcal{A},\mathcal{H},\mathcal{D}) \subseteq Sp(\mathcal{A},\mathcal{H},\mathcal{D})$.
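When $Sp(\mathcal{A}, \mathcal{H}, \mathcal{D}) = \mathbb{Z}$, $Sd(\mathcal{A}, \mathcal{H}, \mathcal{D}) = \{\, n - k : k \in \mathbb{N}_0 \,\}$: indeed, if $P$ is a pseudodifferential operator in $OP^0$ and $q \in \mathbb{N}$ is such that $q > n$, then $P\,|D|^{-s}$ is in $OP^{-\Re(s)}$, so it is trace-class for $s$ in a neighborhood of $q$; as a consequence, $q$ cannot be a pole of $s \mapsto \operatorname{Tr}\big(P\,|D|^{-s}\big)$.

Remark 3.10. $Sp(\mathcal{A}, \mathcal{H}, \mathcal{D})$ is also the set of poles of the functions $s \mapsto \operatorname{Tr}\big(B\,|D|^{-s-2p}\big)$ where $p \in \mathbb{N}_0$ and $B \in \mathcal{D}(\mathcal{A})$.

3.4 The noncommutative integral

We already defined the one parameter group $\sigma_z(T) := |D|^z\, T\, |D|^{-z}$, $z \in \mathbb{C}$. Introducing the notation (recall that $\nabla(T) = [\mathcal{D}^2, T]$) for an operator $T$,
$$\varepsilon(T) := \nabla(T)\, D^{-2},$$
we get from [4, (2.44)] the following expansion for $T \in OP^q$:
$$\sigma_z(T) \sim \sum_{r=0}^{N} g(z, r)\, \varepsilon^r(T) \mod OP^{-N-1+q} \qquad (29)$$
where $g(z, r) := \frac{1}{r!}\,\frac{z}{2}\big(\frac{z}{2} - 1\big) \cdots \big(\frac{z}{2} - (r-1)\big) = \binom{z/2}{r}$, with the convention $g(z, 0) := 1$.

We define the noncommutative integral by
$$\fint T := \operatorname{Res}_{s=0}\, \zeta_D^T(s) = \operatorname{Res}_{s=0}\, \operatorname{Tr}\big(T\,|D|^{-s}\big).$$

Proposition 3.11. [13] If the spectral triple is simple, $\fint$ is a trace on $\Psi(\mathcal{A})$.

Proof. See the appendix.

4 Residues of $\zeta_{D_A}$ for a spectral triple with simple dimension spectrum

We fix a regular spectral triple $(\mathcal{A}, \mathcal{H}, \mathcal{D})$ of dimension $n$ and a self-adjoint 1-form $A$. Recall that
$$\mathcal{D}_A := \mathcal{D} + \tilde{A} \ \text{ where } \tilde{A} := A + \varepsilon JAJ^{-1}, \qquad D_A := \mathcal{D}_A + P_A,$$
where $P_A$ is the projection on $\operatorname{Ker}\mathcal{D}_A$. Remark that $\tilde{A} \in \mathcal{D}(\mathcal{A}) \cap OP^0$ and $\mathcal{D}_A \in \mathcal{D}(\mathcal{A}) \cap OP^1$. We note $V_A := P_A - P_0$. As the following lemma shows, $V_A$ is a smoothing operator:

Lemma 4.1. (i) $\bigcap_{k \ge 1} \operatorname{Dom}(\mathcal{D}_A)^k \subseteq \bigcap_{k \ge 1} \operatorname{Dom}|D|^k$. (ii) $\operatorname{Ker}\mathcal{D}_A \subseteq \bigcap_{k \ge 1} \operatorname{Dom}|D|^k$. (iii) For any $\alpha, \beta \in \mathbb{R}$, $|D|^\beta P_A |D|^\alpha$ is bounded. (iv) $P_A \in OP^{-\infty}$.

Proof. (i) Let us define for any $p \in \mathbb{N}$, $R_p := (\mathcal{D}_A)^p - \mathcal{D}^p$, so $R_p \in OP^{p-1}$ and $R_p\big(\operatorname{Dom}|D|^p\big) \subseteq \operatorname{Dom}|D|$ (see Remark 3.4). Let us fix $k \in \mathbb{N}$, $k \ge 2$. Since $\operatorname{Dom}\mathcal{D}_A = \operatorname{Dom}\mathcal{D} = \operatorname{Dom}|D|$, we have
$$\operatorname{Dom}(\mathcal{D}_A)^k = \{\, \phi \in \operatorname{Dom}|D| : (\mathcal{D}^j + R_j)\phi \in \operatorname{Dom}|D|, \ \forall j,\ 1 \le j \le k-1 \,\}.$$
Let $\phi \in \operatorname{Dom}(\mathcal{D}_A)^k$. We prove by induction that for any $j \in \{1, \dots, k-1\}$, $\phi \in \operatorname{Dom}|D|^{j+1}$. We have $\phi \in \operatorname{Dom}|D|$ and $(\mathcal{D} + R_1)\phi \in \operatorname{Dom}|D|$. Thus, since $R_1\phi \in \operatorname{Dom}|D|$, $\mathcal{D}\phi \in \operatorname{Dom}|D|$, which proves that $\phi \in \operatorname{Dom}|D|^2$. Hence, the case $j = 1$ is done. Suppose now that $\phi \in \operatorname{Dom}|D|^{j+1}$ for a $j \in \{1, \dots, k-2\}$.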
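Since $(\mathcal{D}^{j+1} + R_{j+1})\phi \in \operatorname{Dom}|D|$ and $R_{j+1}\phi \in \operatorname{Dom}|D|$, we get $\mathcal{D}^{j+1}\phi \in \operatorname{Dom}|D|$, which proves that $\phi \in \operatorname{Dom}|D|^{j+2}$. Finally, if we set $j = k-1$, we get $\phi \in \operatorname{Dom}|D|^k$, so $\operatorname{Dom}(\mathcal{D}_A)^k \subseteq \operatorname{Dom}|D|^k$.

(ii) follows from $\operatorname{Ker}\mathcal{D}_A \subseteq \bigcap_{k \ge 1} \operatorname{Dom}(\mathcal{D}_A)^k$ and (i).

(iii) Let us first check that $|D|^\alpha P_A$ is bounded. We define $D_0$ as the operator with domain $\operatorname{Dom} D_0 = \operatorname{Im} P_A \cap \operatorname{Dom}|D|^\alpha$ and such that $D_0\phi = |D|^\alpha\phi$. Since $\operatorname{Dom} D_0$ is finite dimensional, $D_0$ extends as a bounded operator on $\mathcal{H}$ with finite rank. We have
$$\sup_{\phi \in \operatorname{Dom}|D|^\alpha P_A,\ \|\phi\| \le 1} \big\| |D|^\alpha P_A\, \phi \big\| \ \le \sup_{\phi \in \operatorname{Dom} D_0,\ \|\phi\| \le 1} \big\| |D|^\alpha \phi \big\| = \|D_0\| < \infty,$$
so $|D|^\alpha P_A$ is bounded. We can remark that by (ii), $\operatorname{Dom} D_0 = \operatorname{Im} P_A$ and $\operatorname{Dom}|D|^\alpha P_A = \mathcal{H}$. Let us prove now that $P_A |D|^\alpha$ is bounded. Let $\phi \in \operatorname{Dom} P_A|D|^\alpha = \operatorname{Dom}|D|^\alpha$. By (ii), we have $\operatorname{Im} P_A \subseteq \operatorname{Dom}|D|^\alpha$, so we get
$$\big\| P_A |D|^\alpha \phi \big\| \le \sup_{\psi \in \operatorname{Im} P_A,\ \|\psi\| \le 1} \big| \langle \psi, |D|^\alpha \phi \rangle \big| \le \sup_{\psi \in \operatorname{Im} P_A,\ \|\psi\| \le 1} \big| \langle |D|^\alpha \psi, \phi \rangle \big| \le \sup_{\psi \in \operatorname{Im} P_A,\ \|\psi\| \le 1} \big\| |D|^\alpha \psi \big\|\, \|\phi\| = \|D_0\|\, \|\phi\|.$$

(iv) For any $k \in \mathbb{N}_0$ and $t \in \mathbb{R}$, $\delta^k(P_A)\,|D|^t$ is a linear combination of terms of the form $|D|^\beta P_A |D|^\alpha$, so the result follows from (iii).

Remark 4.2. We will see later, on the noncommutative torus example, how important the difference between $\mathcal{D}_A$ and $\mathcal{D} + A$ is. In particular, the inclusion $\operatorname{Ker}\mathcal{D} \subseteq \operatorname{Ker}(\mathcal{D} + A)$ is not satisfied, since $A$ does not preserve $\operatorname{Ker}\mathcal{D}$, contrary to $\tilde{A}$.

The coefficient of the nonconstant term $\Lambda^k$ ($k > 0$) in the expansion (5) of the spectral action $S(\mathcal{D}_A, \Phi, \Lambda)$ is equal to the residue of $\zeta_{D_A}(s)$ at $k$. We will see in this section how we can compute these residues in terms of noncommutative integrals of certain operators. Define for any operator $T$, $p \in \mathbb{N}$, $s \in \mathbb{C}$,
$$K_p(T, s) := \big(-\tfrac{s}{2}\big)^p \int_{0 \le t_1 \le \cdots \le t_p \le 1} \sigma_{-st_1}(T) \cdots \sigma_{-st_p}(T)\, dt$$
with $dt := dt_1 \cdots dt_p$. Remark that if $T \in OP^\alpha$, then $\sigma_z(T) \in OP^\alpha$ for $z \in \mathbb{C}$ and $K_p(T, s) \in OP^{\alpha p}$. Let us define
$$X := \mathcal{D}_A^2 - \mathcal{D}^2 = \tilde{A}\mathcal{D} + \mathcal{D}\tilde{A} + \tilde{A}^2, \qquad X_V := X + V_A,$$
thus $X \in \mathcal{D}_1(\mathcal{A}) \cap OP^1$ and by Lemma 4.1,
$$X_V \sim X \mod OP^{-\infty}. \qquad (30)$$
We will use
$$Y := \log(D_A^2) - \log(D^2),$$
which makes sense since $D_A^2 = \mathcal{D}_A^2 + P_A$ is invertible for any $A$. By definition of $X_V$, we get $Y = \log(D^2 + X_V) - \log(D^2)$.

Lemma 4.3.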
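[4] (i) $Y$ is a pseudodifferential operator in $OP^{-1}$ with the following expansion for any $N \in \mathbb{N}$:
$$Y \sim \sum_{p=1}^{N} \ \sum_{k_1, \dots, k_p = 0}^{N-p} \frac{(-1)^{|k|_1 + p + 1}}{|k|_1 + p}\, \nabla^{k_p}\big(X \nabla^{k_{p-1}}(\cdots X \nabla^{k_1}(X) \cdots)\big)\, D^{-2(|k|_1 + p)} \mod OP^{-N-1}.$$
(ii) For any $N \in \mathbb{N}$ and $s \in \mathbb{C}$,
$$|D_A|^{-s} \sim |D|^{-s} + \sum_{p=1}^{N} K_p(Y, s)\,|D|^{-s} \mod OP^{-N-1-\Re(s)}. \qquad (31)$$

Proof. (i) We follow [4, Lemma 2.2]. By functional calculus, $Y = \int_0^\infty I(\lambda)\, d\lambda$, where
$$I(\lambda) \sim \sum_{p=1}^{N} (-1)^{p+1} \big((D^2 + \lambda)^{-1} X_V\big)^p (D^2 + \lambda)^{-1} \mod OP^{-N-3}.$$
By (30), $\big((D^2 + \lambda)^{-1} X_V\big)^p \sim \big((D^2 + \lambda)^{-1} X\big)^p \mod OP^{-\infty}$ and we get
$$I(\lambda) \sim \sum_{p=1}^{N} (-1)^{p+1} \big((D^2 + \lambda)^{-1} X\big)^p (D^2 + \lambda)^{-1} \mod OP^{-N-3}.$$
We set $A_p(X) := \big((D^2 + \lambda)^{-1} X\big)^p (D^2 + \lambda)^{-1}$ and $L := (D^2 + \lambda)^{-1} \in OP^{-2}$ for a fixed $\lambda$. Since $[D^2 + \lambda, X] \sim \nabla(X) \mod OP^{-\infty}$, an induction proves that if $T$ is an operator in $OP^r$, then, for $q \in \mathbb{N}_0$,
$$A_1(T) = LTL \sim \sum_{k=0}^{q} (-1)^k\, \nabla^k(T)\, L^{k+2} \mod OP^{r-q-5}.$$
With $A_p(X) = L X A_{p-1}(X)$, another induction gives, for any $q \in \mathbb{N}_0$,
$$A_p(X) \sim \sum_{k_1, \dots, k_p = 0}^{q} (-1)^{|k|_1}\, \nabla^{k_p}\big(X \nabla^{k_{p-1}}(\cdots X \nabla^{k_1}(X) \cdots)\big)\, L^{|k|_1 + p + 1} \mod OP^{-q-p-3},$$
which entails that
$$I(\lambda) \sim \sum_{p=1}^{N} (-1)^{p+1} \sum_{k_1, \dots, k_p = 0}^{N-p} (-1)^{|k|_1}\, \nabla^{k_p}\big(X \nabla^{k_{p-1}}(\cdots X \nabla^{k_1}(X) \cdots)\big)\, L^{|k|_1 + p + 1} \mod OP^{-N-3}.$$
With $\int_0^\infty (D^2 + \lambda)^{-(|k|_1 + p + 1)}\, d\lambda = \frac{1}{|k|_1 + p}\, D^{-2(|k|_1 + p)}$, we get the result provided we control the remainders. Such a control is given in [4, (2.27)].

(ii) We have $|D_A|^{-s} = e^{B - (s/2)Y}\, e^{-B}\, |D|^{-s}$ where $B := (-s/2)\log(D^2)$. Following [4, Theorem 2.4], we get
$$|D_A|^{-s} = |D|^{-s} + \sum_{p=1}^{\infty} K_p(Y, s)\,|D|^{-s}, \qquad (32)$$
and each $K_p(Y, s)$ is in $OP^{-p}$.

Corollary 4.4. For any $p \in \mathbb{N}$ and $r_1, \dots, r_p \in \mathbb{N}_0$, $\varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y) \in \Psi_1(\mathcal{A})$.

Proof. If for any $q \in \mathbb{N}$ and $k = (k_1, \dots, k_q) \in \mathbb{N}_0^q$,
$$\Gamma_q^k(X) := \frac{(-1)^{|k|_1 + q + 1}}{|k|_1 + q}\, \nabla^{k_q}\big(X \nabla^{k_{q-1}}(\cdots X \nabla^{k_1}(X) \cdots)\big),$$
then $\Gamma_q^k(X) \in OP^{|k|_1 + q}$. For any $N \in \mathbb{N}$,
$$Y \sim \sum_{q=1}^{N} \ \sum_{k_1, \dots, k_q = 0}^{N-q} \Gamma_q^k(X)\, D^{-2(|k|_1 + q)} \mod OP^{-N-1}. \qquad (33)$$
Note that the $\Gamma_q^k(X)$ are in $\mathcal{D}_1(\mathcal{A})$, which, with (33), proves that $Y$, and thus $\varepsilon^r(Y) = \nabla^r(Y)\, D^{-2r}$, are also in $\Psi_1(\mathcal{A})$.

We remark, as in [11], that the fluctuations leave invariant the first term of the spectral action (5).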
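This is a generalization of the fact that in the commutative case, the noncommutative integral depends only on the principal symbol of the Dirac operator $\mathcal{D}$, and this symbol is stable under the addition of a gauge potential as in $\mathcal{D} + A$. Note however that the symmetrized gauge potential $A + \varepsilon JAJ^{-1}$ is always zero in this case for any selfadjoint one-form $A$.

Lemma 4.5. If the spectral triple is simple, formula (6) can be extended as
$$\zeta_{D_A}(0) - \zeta_D(0) = \sum_{q=1}^{n} \frac{(-1)^q}{q} \fint (\tilde{A} D^{-1})^q. \qquad (34)$$

Proof. Since the spectral triple is simple, equation (32) entails that $\zeta_{D_A}(0) - \zeta_D(0) = \operatorname{Tr}\big(K_1(Y, s)\,|D|^{-s}\big)\big|_{s=0}$. Thus, with (29), we get $\zeta_{D_A}(0) - \zeta_D(0) = -\frac{1}{2} \fint Y$. Replacing $A$ by $\tilde{A}$, the same proof as in [4] gives
$$-\frac{1}{2} \fint Y = \sum_{q=1}^{n} \frac{(-1)^q}{q} \fint (\tilde{A} D^{-1})^q.$$

Lemma 4.6. For any $k \in \mathbb{N}_0$,
$$\operatorname{Res}_{s=n-k} \zeta_{D_A}(s) = \operatorname{Res}_{s=n-k} \zeta_D(s) + \sum_{p=1}^{k} \ \sum_{r_1, \dots, r_p = 0}^{k-p} \operatorname{Res}_{s=n-k}\, h(s, r, p)\, \operatorname{Tr}\big(\varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-s}\big),$$
where
$$h(s, r, p) := \big(-\tfrac{s}{2}\big)^p \int_{0 \le t_1 \le \cdots \le t_p \le 1} g(-st_1, r_1) \cdots g(-st_p, r_p)\, dt.$$

Proof. By Lemma 4.3 (ii), $|D_A|^{-s} \sim |D|^{-s} + \sum_{p=1}^{k} K_p(Y, s)\,|D|^{-s} \mod OP^{-(k+1)-\Re(s)}$, where the convention $\sum_{\emptyset} = 0$ is used. Thus, we get for $s$ in a neighborhood of $n - k$,
$$|D_A|^{-s} - |D|^{-s} - \sum_{p=1}^{k} K_p(Y, s)\,|D|^{-s} \in OP^{-(k+1)-\Re(s)} \subseteq \mathcal{L}^1(\mathcal{H}),$$
which gives
$$\operatorname{Res}_{s=n-k} \zeta_{D_A}(s) = \operatorname{Res}_{s=n-k} \zeta_D(s) + \sum_{p=1}^{k} \operatorname{Res}_{s=n-k} \operatorname{Tr}\big(K_p(Y, s)\,|D|^{-s}\big). \qquad (35)$$
Let us fix $1 \le p \le k$ and $N \in \mathbb{N}$. By (29) we get
$$K_p(Y, s) \sim \big(-\tfrac{s}{2}\big)^p \int_{0 \le t_1 \le \cdots \le t_p \le 1} \ \sum_{r_1, \dots, r_p = 0}^{N} g(-st_1, r_1) \cdots g(-st_p, r_p)\, \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\, dt \mod OP^{-N-p-1}. \qquad (36)$$
If we now take $N = k - p$, we get for $s$ in a neighborhood of $n - k$
$$K_p(Y, s)\,|D|^{-s} - \sum_{r_1, \dots, r_p = 0}^{k-p} h(s, r, p)\, \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-s} \in OP^{-k-1-\Re(s)} \subseteq \mathcal{L}^1(\mathcal{H}),$$
so (35) gives the result.

Our operators $|D_A|^k$ are pseudodifferential operators:

Lemma 4.7. For any $k \in \mathbb{Z}$, $|D_A|^k \in \Psi^k(\mathcal{A})$.

Proof. Using (36), we see that $K_p(Y, s)$ is a pseudodifferential operator in $OP^{-p}$, so (31) proves that $|D_A|^k$ is a pseudodifferential operator in $OP^k$.

The following result is quite important since it shows that one can use $\fint$ for $D$ or $D_A$:

Proposition 4.8. If the spectral triple is simple, $\operatorname{Res}_{s=0} \operatorname{Tr}\big(P\,|D_A|^{-s}\big) = \fint P$ for any pseudodifferential operator $P$.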
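In particular, for any $k \in \mathbb{N}_0$,
$$\fint |D_A|^{-(n-k)} = \operatorname{Res}_{s=n-k} \zeta_{D_A}(s).$$

Proof. Suppose $P \in OP^k$ with $k \in \mathbb{Z}$ and let us fix $p \ge 1$. With (36), we see that for any $N \in \mathbb{N}$,
$$P K_p(Y, s)\,|D|^{-s} \sim \sum_{r_1, \dots, r_p = 0}^{N} h(s, r, p)\, P\, \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-s} \mod OP^{-N-p-1+k-\Re(s)}.$$
Thus if we take $N = n - p + k$, we get
$$\operatorname{Res}_{s=0} \operatorname{Tr}\big(P K_p(Y, s)\,|D|^{-s}\big) = \sum_{r_1, \dots, r_p = 0}^{n-p+k} \operatorname{Res}_{s=0}\, h(s, r, p)\, \operatorname{Tr}\big(P\, \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-s}\big).$$
Since $s = 0$ is a zero of the analytic function $s \mapsto h(s, r, p)$ and $s \mapsto \operatorname{Tr}\big(P\, \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-s}\big)$ has only simple poles by hypothesis, we see that $\operatorname{Res}_{s=0}\, h(s, r, p)\, \operatorname{Tr}\big(P\, \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-s}\big) = 0$ and
$$\operatorname{Res}_{s=0} \operatorname{Tr}\big(P K_p(Y, s)\,|D|^{-s}\big) = 0. \qquad (37)$$
Using (31), $P\,|D_A|^{-s} \sim P\,|D|^{-s} + \sum_{p=1}^{k+n} P K_p(Y, s)\,|D|^{-s} \mod OP^{-n-1-\Re(s)}$ and thus,
$$\operatorname{Res}_{s=0} \operatorname{Tr}\big(P\,|D_A|^{-s}\big) = \fint P + \sum_{p=1}^{k+n} \operatorname{Res}_{s=0} \operatorname{Tr}\big(P K_p(Y, s)\,|D|^{-s}\big). \qquad (38)$$
The result now follows from (37) and (38). To get the last equality, one uses the pseudodifferential operator $|D_A|^{-(n-k)}$.

Proposition 4.9. If the spectral triple is simple, then
$$\fint |D_A|^{-n} = \fint |D|^{-n}. \qquad (39)$$

Proof. Lemma 4.6 and the previous proposition for $k = 0$.

Lemma 4.10. If the spectral triple is simple,
(i) $\fint |D_A|^{-(n-1)} = \fint |D|^{-(n-1)} - \frac{n-1}{2} \fint X\,|D|^{-n-1}$,
(ii) $\fint |D_A|^{-(n-2)} = \fint |D|^{-(n-2)} + \frac{n-2}{2} \Big( -\fint X\,|D|^{-n} + \frac{n}{4} \fint X^2\,|D|^{-2-n} \Big)$.

Proof. (i) By (31),
$$\operatorname{Res}_{s=n-1} \big(\zeta_{D_A}(s) - \zeta_D(s)\big) = \operatorname{Res}_{s=n-1} \big(-\tfrac{s}{2}\big) \operatorname{Tr}\big(Y\,|D|^{-s}\big) = -\frac{n-1}{2}\, \operatorname{Res}_{s=0} \operatorname{Tr}\big(Y\,|D|^{-(n-1)}\,|D|^{-s}\big),$$
where for the last equality we use the simple dimension spectrum hypothesis. Lemma 4.3 (i) yields $Y \sim X D^{-2} \mod OP^{-2}$ and $Y\,|D|^{-(n-1)} \sim X\,|D|^{-n-1} \mod OP^{-n-1} \subseteq \mathcal{L}^1(\mathcal{H})$. Thus,
$$\operatorname{Res}_{s=0} \operatorname{Tr}\big(Y\,|D|^{-(n-1)}\,|D|^{-s}\big) = \operatorname{Res}_{s=0} \operatorname{Tr}\big(X\,|D|^{-n-1}\,|D|^{-s}\big) = \fint X\,|D|^{-n-1}.$$

(ii) Lemma 4.6 gives
$$\operatorname{Res}_{s=n-2} \zeta_{D_A}(s) = \operatorname{Res}_{s=n-2} \zeta_D(s) + \operatorname{Res}_{s=n-2} \Big( \sum_{r=0}^{1} h(s, r, 1)\, \operatorname{Tr}\big(\varepsilon^r(Y)\,|D|^{-s}\big) + h(s, 0, 2)\, \operatorname{Tr}\big(Y^2\,|D|^{-s}\big) \Big).$$
We have $h(s, 0, 1) = -\frac{s}{2}$, $h(s, 1, 1) = \frac{1}{2}\big(\frac{s}{2}\big)^2$ and $h(s, 0, 2) = \frac{1}{2}\big(\frac{s}{2}\big)^2$. Using again Lemma 4.3 (i),
$$Y \sim X D^{-2} - \tfrac{1}{2} \nabla(X)\, D^{-4} - \tfrac{1}{2} X^2 D^{-4} \mod OP^{-3}.$$
Thus,
$$\operatorname{Res}_{s=n-2} \operatorname{Tr}\big(Y\,|D|^{-s}\big) = \fint X\,|D|^{-n} - \frac{1}{2} \fint \big(\nabla(X) + X^2\big)\,|D|^{-2-n}.$$
Moreover, using $\fint \nabla(X)\,|D|^{-k} = 0$ for any $k \ge 0$ since $\fint$ is a trace,
$$\operatorname{Res}_{s=n-2} \operatorname{Tr}\big(\varepsilon(Y)\,|D|^{-s}\big) = \operatorname{Res}_{s=n-2} \operatorname{Tr}\big(\nabla(X)\, D^{-4}\,|D|^{-s}\big) = \fint \nabla(X)\,|D|^{-2-n} = 0.$$
Similarly, since $Y \sim X D^{-2} \mod OP^{-2}$ and $Y^2 \sim X^2 D^{-4} \mod OP^{-3}$, we get
$$\operatorname{Res}_{s=n-2} \operatorname{Tr}\big(Y^2\,|D|^{-s}\big) = \operatorname{Res}_{s=n-2} \operatorname{Tr}\big(X^2 D^{-4}\,|D|^{-s}\big) = \fint X^2\,|D|^{-2-n}.$$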
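The three values of $h$ quoted in this proof, and the factor $\frac{n}{4}$ in Lemma 4.10 (ii), can be checked mechanically. The following is a minimal sketch in exact rational arithmetic, assuming $g(z, r) = \binom{z/2}{r}$ as in (29); the helper names (`g_poly`, `h`, `x2_coefficient`) are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_antiderivative(a):
    """Coefficients of t -> integral of a(u) du from 0 to t."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(a)]


def poly_eval(a, t):
    return sum(c * t**i for i, c in enumerate(a))


def g_poly(s, r):
    """Coefficients in t of g(-s*t, r) = (1/r!) * prod_{j<r} (-s*t/2 - j)."""
    p = [Fraction(1)]
    for j in range(r):
        p = poly_mul(p, [Fraction(-j), -Fraction(s) / 2])
    return [c / factorial(r) for c in p]


def h(s, rs):
    """h(s, r, p) = (-s/2)^p * integral over 0 <= t_1 <= ... <= t_p <= 1."""
    acc = [Fraction(1)]
    for r in rs:  # integrate the innermost variable t_1 first
        acc = poly_antiderivative(poly_mul(acc, g_poly(s, r)))
    return (-Fraction(s) / 2) ** len(rs) * poly_eval(acc, Fraction(1))


def x2_coefficient(n):
    """Coefficient of the X^2 |D|^{-2-n} integral in Res_{s=n-2} zeta_{D_A}(s)."""
    s = n - 2
    # -1/2 comes from the X^2 D^{-4} term of Y; the h(s, 1, 1) term multiplies zero.
    return Fraction(-1, 2) * h(s, (0,)) + h(s, (0, 0))
```

For instance, `h(s, (1,))` and `h(s, (0, 0))` both evaluate to $\frac{1}{2}(s/2)^2$, and `x2_coefficient(n)` equals $\frac{n-2}{2} \cdot \frac{n}{4}$ for every integer $n$ tried.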
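Thus,
$$\operatorname{Res}_{s=n-2} \zeta_{D_A}(s) = \operatorname{Res}_{s=n-2} \zeta_D(s) + \big(-\tfrac{n-2}{2}\big) \Big( \fint X\,|D|^{-n} - \frac{1}{2} \fint \big(\nabla(X) + X^2\big)\,|D|^{-2-n} \Big) + \frac{1}{2}\big(\tfrac{n-2}{2}\big)^2 \fint \nabla(X)\,|D|^{-2-n} + \frac{1}{2}\big(\tfrac{n-2}{2}\big)^2 \fint X^2\,|D|^{-2-n}.$$
Finally,
$$\operatorname{Res}_{s=n-2} \zeta_{D_A}(s) = \operatorname{Res}_{s=n-2} \zeta_D(s) + \big(-\tfrac{n-2}{2}\big) \Big( \fint X\,|D|^{-n} - \frac{1}{2} \fint X^2\,|D|^{-2-n} \Big) + \frac{1}{2}\big(\tfrac{n-2}{2}\big)^2 \fint X^2\,|D|^{-2-n}$$
and the result follows from Proposition 4.8.

Corollary 4.11. If the spectral triple is simple and satisfies $\fint |D|^{-(n-2)} = \fint \tilde{A}\mathcal{D}\,|D|^{-n} = \fint \mathcal{D}\tilde{A}\,|D|^{-n} = 0$, then
$$\fint |D_A|^{-(n-2)} = \frac{n(n-2)}{4} \Big( \fint \tilde{A}\mathcal{D}\tilde{A}\mathcal{D}\,|D|^{-n-2} + \frac{n-2}{n} \fint \tilde{A}^2\,|D|^{-n} \Big).$$

Proof. By the previous lemma,
$$\operatorname{Res}_{s=n-2} \zeta_{D_A}(s) = \frac{n-2}{2} \Big( -\fint \tilde{A}^2\,|D|^{-n} + \frac{n}{4} \fint \big( \tilde{A}\mathcal{D}\tilde{A}\mathcal{D} + \mathcal{D}\tilde{A}\mathcal{D}\tilde{A} + \tilde{A}\mathcal{D}^2\tilde{A} + \mathcal{D}\tilde{A}^2\mathcal{D} \big)\,|D|^{-n-2} \Big).$$
Since $\nabla(\tilde{A}) \in OP^1$, the trace property of $\fint$ yields the result.

5 The noncommutative torus

5.1 Notations

Let $C^\infty(\mathbb{T}^n_\Theta)$ be the smooth noncommutative $n$-torus associated to a non-zero skew-symmetric deformation matrix $\Theta \in M_n(\mathbb{R})$ (see [6], [30]). This means that $C^\infty(\mathbb{T}^n_\Theta)$ is the algebra generated by $n$ unitaries $u_i$, $i = 1, \dots, n$, subject to the relations
$$u_i\, u_j = e^{i\Theta_{ij}}\, u_j\, u_i, \qquad (40)$$
and with Schwartz coefficients: an element $a \in C^\infty(\mathbb{T}^n_\Theta)$ can be written as $a = \sum_{k \in \mathbb{Z}^n} a_k\, U_k$, where $\{a_k\} \in \mathcal{S}(\mathbb{Z}^n)$, with the Weyl elements defined by $U_k := e^{-\frac{i}{2} k.\chi k}\, u_1^{k_1} \cdots u_n^{k_n}$, $k \in \mathbb{Z}^n$. Relation (40) reads
$$U_k U_q = e^{-\frac{i}{2} k.\Theta q}\, U_{k+q}, \quad \text{and} \quad U_k U_q = e^{-i k.\Theta q}\, U_q U_k, \qquad (41)$$
where $\chi$ is the matrix restriction of $\Theta$ to its upper triangular part. Thus the unitary operators $U_k$ satisfy
$$U_k^* = U_{-k} \quad \text{and} \quad [U_k, U_l] = -2i \sin\big(\tfrac{1}{2} k.\Theta l\big)\, U_{k+l}.$$
Let $\tau$ be the trace on $C^\infty(\mathbb{T}^n_\Theta)$ defined by $\tau\big( \sum_{k \in \mathbb{Z}^n} a_k\, U_k \big) := a_0$, and let $\mathcal{H}_\tau$ be the GNS Hilbert space obtained by completion of $C^\infty(\mathbb{T}^n_\Theta)$ with respect to the norm induced by the scalar product $\langle a, b \rangle := \tau(a^* b)$. On $\mathcal{H}_\tau = \{\, \sum_{k \in \mathbb{Z}^n} a_k\, U_k : \{a_k\}_k \in l^2(\mathbb{Z}^n) \,\}$, we consider the left and right regular representations of $C^\infty(\mathbb{T}^n_\Theta)$ by bounded operators, that we denote respectively by $L(.)$ and $R(.)$. Let also $\delta_\mu$, $\mu \in \{1, \dots, n\}$, be the $n$ (pairwise commuting) canonical derivations, defined by
$$\delta_\mu(U_k) := i k_\mu U_k. \qquad (42)$$
We need to fix notations: let $\mathcal{A}_\Theta := C^\infty(\mathbb{T}^n_\Theta)$ act on $\mathcal{H} := \mathcal{H}_\tau \otimes \mathbb{C}^{2^m}$ with $n = 2m$ or $n = 2m + 1$ (i.e., $m = \lfloor \frac{n}{2} \rfloor$ is the integer part of $\frac{n}{2}$), the square integrable sections of the trivial spin bundle over $\mathbb{T}^n$.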
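Each element of $\mathcal{A}_\Theta$ is represented on $\mathcal{H}$ as $L(a) \otimes 1_{2^m}$ where $L$ (resp. $R$) is the left (resp. right) multiplication. The Tomita conjugation $J_0(a) := a^*$ satisfies $[J_0, \delta_\mu] = 0$ and we define $J := J_0 \otimes C_0$ where $C_0$ is an operator on $\mathbb{C}^{2^m}$. The Dirac operator is given by
$$\mathcal{D} := -i\, \delta_\mu \otimes \gamma^\mu, \qquad (43)$$
where we use hermitian Dirac matrices $\gamma$. It is defined and symmetric on the dense subset of $\mathcal{H}$ given by $C^\infty(\mathbb{T}^n_\Theta) \otimes \mathbb{C}^{2^m}$. We still note $\mathcal{D}$ its selfadjoint extension. The commutation relation $\mathcal{D}J = \varepsilon J\mathcal{D}$ then implies
$$C_0\, \gamma^\alpha = -\varepsilon\, \gamma^\alpha\, C_0, \qquad (44)$$
and
$$\mathcal{D}\, U_k \otimes e_i = k_\mu\, U_k \otimes \gamma^\mu e_i,$$
where $(e_i)$ is the canonical basis of $\mathbb{C}^{2^m}$. Moreover, $C_0^2 = \pm 1_{2^m}$ depending on the parity of $m$. Finally, one introduces the chirality (which in the even case is $\chi := \mathrm{id} \otimes (-i)^m \gamma^1 \cdots \gamma^n$) and this yields that $(\mathcal{A}_\Theta, \mathcal{H}, \mathcal{D}, J, \chi)$ satisfies all axioms of a spectral triple, see [8, 23].

The perturbed Dirac operator $V_u \mathcal{D} V_u^*$ by the unitary
$$V_u := \big(L(u) \otimes 1_{2^m}\big)\, J\, \big(L(u) \otimes 1_{2^m}\big)\, J^{-1},$$
defined for every unitary $u \in \mathcal{A}$, $uu^* = u^*u = U_0$, must satisfy condition (3) (which is equivalent to $\mathcal{H}$ being endowed with a structure of $\mathcal{A}_\Theta$-bimodule). This yields the necessity of a symmetrized covariant Dirac operator:
$$\mathcal{D}_A := \mathcal{D} + A + \epsilon J A J^{-1}$$
since $V_u \mathcal{D} V_u^* = \mathcal{D}_{L(u) \otimes 1_{2^m} [\mathcal{D},\, L(u^*) \otimes 1_{2^m}]}$: in fact, for $a \in \mathcal{A}_\Theta$, using $J_0 L(a) J_0^{-1} = R(a^*)$, we get
$$\epsilon J \big( L(a) \otimes \gamma^\alpha \big) J^{-1} = -R(a^*) \otimes \gamma^\alpha,$$
and the representation $L$ and the anti-representation $R$ are $\mathbb{C}$-linear, commute and satisfy
$$[\delta_\alpha, L(a)] = L(\delta_\alpha a), \qquad [\delta_\alpha, R(a)] = R(\delta_\alpha a).$$
This induces some covariance property for the Dirac operator: one checks that for all $k \in \mathbb{Z}^n$,
$$L(U_k) \otimes 1_{2^m}\, [\mathcal{D},\, L(U_k^*) \otimes 1_{2^m}] = 1 \otimes (-k_\mu \gamma^\mu), \qquad (45)$$
so with (44), we get $U_k [\mathcal{D}, U_k^*] + \epsilon J U_k [\mathcal{D}, U_k^*] J^{-1} = 0$ and
$$V_{U_k}\, \mathcal{D}\, V_{U_k}^* = \mathcal{D} = \mathcal{D}_{L(U_k) \otimes 1_{2^m} [\mathcal{D},\, L(U_k^*) \otimes 1_{2^m}]}. \qquad (46)$$
Moreover, we get the gauge transformation:
$$V_u\, \mathcal{D}_A\, V_u^* = \mathcal{D}_{\gamma_u(A)} \qquad (47)$$
where the gauge-transformed one-form of $A$ is
$$\gamma_u(A) := u [\mathcal{D}, u^*] + u A u^*, \qquad (48)$$
with the shorthand $L(u) \otimes 1_{2^m} \longrightarrow u$. As a consequence, the spectral action is gauge invariant:
$$S(\mathcal{D}_A, \Phi, \Lambda) = S(\mathcal{D}_{\gamma_u(A)}, \Phi, \Lambda).$$
An arbitrary selfadjoint one-form $A$ can be written as
$$A = L(-iA_\alpha) \otimes \gamma^\alpha, \qquad A_\alpha = -A_\alpha^* \in \mathcal{A}_\Theta, \qquad (49)$$
thus
$$\mathcal{D}_A = -i \big( \delta_\alpha + L(A_\alpha) - R(A_\alpha) \big) \otimes \gamma^\alpha.$$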
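(50)

Defining $\tilde{A}_\alpha := L(A_\alpha) - R(A_\alpha)$, we get
$$\mathcal{D}_A^2 = -g^{\alpha_1 \alpha_2} (\delta_{\alpha_1} + \tilde{A}_{\alpha_1})(\delta_{\alpha_2} + \tilde{A}_{\alpha_2}) \otimes 1_{2^m} - \tfrac{1}{2}\, \Omega_{\alpha_1 \alpha_2} \otimes \gamma^{\alpha_1 \alpha_2}$$
where
$$\gamma^{\alpha_1 \alpha_2} := \tfrac{1}{2} (\gamma^{\alpha_1} \gamma^{\alpha_2} - \gamma^{\alpha_2} \gamma^{\alpha_1}), \qquad \Omega_{\alpha_1 \alpha_2} := [\delta_{\alpha_1} + \tilde{A}_{\alpha_1},\, \delta_{\alpha_2} + \tilde{A}_{\alpha_2}] = L(F_{\alpha_1 \alpha_2}) - R(F_{\alpha_1 \alpha_2}),$$
$$F_{\alpha_1 \alpha_2} := \delta_{\alpha_1}(A_{\alpha_2}) - \delta_{\alpha_2}(A_{\alpha_1}) + [A_{\alpha_1}, A_{\alpha_2}]. \qquad (51)$$
In summary,
$$\mathcal{D}_A^2 = -\delta^{\alpha_1 \alpha_2} \big( \delta_{\alpha_1} + L(A_{\alpha_1}) - R(A_{\alpha_1}) \big) \big( \delta_{\alpha_2} + L(A_{\alpha_2}) - R(A_{\alpha_2}) \big) \otimes 1_{2^m} - \tfrac{1}{2} \big( L(F_{\alpha_1 \alpha_2}) - R(F_{\alpha_1 \alpha_2}) \big) \otimes \gamma^{\alpha_1 \alpha_2}. \qquad (52)$$

5.2 Kernels and dimension spectrum

We now compute the kernel of the perturbed Dirac operator:

Proposition 5.1. (i) $\operatorname{Ker}\mathcal{D} = U_0 \otimes \mathbb{C}^{2^m}$, so $\dim \operatorname{Ker}\mathcal{D} = 2^m$. (ii) For any selfadjoint one-form $A$, $\operatorname{Ker}\mathcal{D} \subseteq \operatorname{Ker}\mathcal{D}_A$. (iii) For any unitary $u \in \mathcal{A}$, $\operatorname{Ker}\mathcal{D}_{\gamma_u(A)} = V_u \operatorname{Ker}\mathcal{D}_A$.

Proof. (i) Let $\psi = \sum_{k,j} c_{k,j}\, U_k \otimes e_j \in \operatorname{Ker}\mathcal{D}$. Thus, $0 = \mathcal{D}^2 \psi = \sum_{k,j} c_{k,j}\, |k|^2\, U_k \otimes e_j$, which entails that $c_{k,j}\, |k|^2 = 0$ for any $k \in \mathbb{Z}^n$ and $1 \le j \le 2^m$. The result follows.

(ii) Let $\psi \in \operatorname{Ker}\mathcal{D}$. So, $\psi = U_0 \otimes v$ with $v \in \mathbb{C}^{2^m}$ and from (50), we get
$$\mathcal{D}_A \psi = \mathcal{D}\psi + (A + \epsilon JAJ^{-1})\psi = (A + \epsilon JAJ^{-1})\psi = -i [A_\alpha, U_0] \otimes \gamma^\alpha v = 0$$
since $U_0$ is the unit of the algebra, which proves that $\psi \in \operatorname{Ker}\mathcal{D}_A$.

(iii) This is a direct consequence of (47).

Corollary 5.2. Let $A$ be a selfadjoint one-form. Then $\operatorname{Ker}\mathcal{D}_A = \operatorname{Ker}\mathcal{D}$ in the following cases: (i) $A_u := L(u) \otimes 1_{2^m} [\mathcal{D},\, L(u^*) \otimes 1_{2^m}]$ when $u$ is a unitary in $\mathcal{A}$. (ii) $\|A\| < \frac{1}{2}$. (iii) The matrix $\frac{1}{2\pi}\Theta$ has only integral coefficients.

Proof. (i) This follows from the previous result because $V_u (U_0 \otimes v) = U_0 \otimes v$ for any $v \in \mathbb{C}^{2^m}$.

(ii) Let $\psi = \sum_{k,j} c_{k,j}\, U_k \otimes e_j$ be in $\operatorname{Ker}\mathcal{D}_A$ (so $\sum_{k,j} |c_{k,j}|^2 < \infty$) and $\phi := \sum_j c_{0,j}\, U_0 \otimes e_j$. Thus $\psi' := \psi - \phi \in \operatorname{Ker}\mathcal{D}_A$ since $\phi \in \operatorname{Ker}\mathcal{D} \subseteq \operatorname{Ker}\mathcal{D}_A$, and
$$\Big\| \sum_{0 \ne k \in \mathbb{Z}^n,\, j} c_{k,j}\, k_\alpha\, U_k \otimes \gamma^\alpha e_j \Big\|^2 = \|\mathcal{D}\psi'\|^2 = \| -(A + \epsilon JAJ^{-1})\psi' \|^2 \le 4 \|A\|^2 \|\psi'\|^2 < \|\psi'\|^2.$$
Defining $X_k := \sum_\alpha k_\alpha \gamma^\alpha$, $X_k^2 = \sum_\alpha |k_\alpha|^2\, 1_{2^m}$ is invertible and the vectors $\{\, U_k \otimes X_k e_j \,\}_{0 \ne k \in \mathbb{Z}^n,\, j}$ are orthogonal in $\mathcal{H}$, so
$$\sum_{0 \ne k \in \mathbb{Z}^n,\, j} \Big( \sum_\alpha |k_\alpha|^2 \Big) |c_{k,j}|^2 < \sum_{0 \ne k \in \mathbb{Z}^n,\, j} |c_{k,j}|^2,$$
which is possible only if $c_{k,j} = 0$ for all $k, j$, that is $\psi' = 0$ and $\psi = \phi \in \operatorname{Ker}\mathcal{D}$.

(iii) This is a consequence of the fact that the algebra is commutative, thus $A + \epsilon JAJ^{-1} = 0$.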
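Case (iii) rests on the Weyl elements commuting when $\frac{1}{2\pi}\Theta$ is integral. The sketch below checks this numerically. It assumes the product rule of (41) reads $U_k U_q = e^{-\frac{i}{2} k.\Theta q}\, U_{k+q}$, which is the reading consistent with the stated commutator $[U_k, U_l] = -2i \sin(\frac{1}{2} k.\Theta l)\, U_{k+l}$; the function names are illustrative only.

```python
import cmath
import math


def dot_theta(k, theta, q):
    """Bilinear form k.Theta q for integer vectors k, q and a matrix theta."""
    return sum(k[i] * theta[i][j] * q[j]
               for i in range(len(k)) for j in range(len(q)))


def weyl_product(k, q, theta):
    """Return (phase, index) such that U_k U_q = phase * U_{index}."""
    phase = cmath.exp(-0.5j * dot_theta(k, theta, q))
    return phase, tuple(a + b for a, b in zip(k, q))


def commutator_coefficient(k, q, theta):
    """Scalar c such that [U_k, U_q] = c * U_{k+q}."""
    p_kq, _ = weyl_product(k, q, theta)
    p_qk, _ = weyl_product(q, k, theta)
    return p_kq - p_qk
```

For a generic skew-symmetric `theta` the coefficient agrees with $-2i \sin(\frac{1}{2} k.\Theta q)$. For `theta` equal to $2\pi$ times an integer skew-symmetric matrix it vanishes up to rounding, so the algebra is commutative and $A + \epsilon JAJ^{-1} = 0$, as used in (iii).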
Note that if Ãu := Au + ǫJAuJ −1, then by (45), ÃUk = 0 for all k ∈ Zn and ‖AUk‖ = |k|, but for an arbitrary unitary u ∈ A, Ãu 6= 0 so DAu 6= D. Naturally the above result is also a direct consequence of the fact that the eigenspace of an iso- lated eigenvalue of an operator is not modified by small perturbations. However, it is interesting to compute the last result directly to emphasize the difficulty of the general case: Let ψ = l∈Zn,1≤j≤2m cl,j Ul⊗ ej ∈ KerDA, so l∈Zn,1≤j≤2m |cl,j |2 <∞. We have to show that ψ ∈ Ker D that is cl,j = 0 when l 6= 0. Taking the scalar product of 〈Uk ⊗ ei| with 0 = DAψ = l, α, j cl, j(l αUl − i[Aα, Ul])⊗ γαej , we obtain l, α, j cl, j lαδk,l − i〈Uk, [Aα, Ul]〉 〈ei, γαej〉. If Aα = α,l aα,l Ul ⊗ γα with { aα,l }l ∈ S(Zn), note that [Ul, Um] = −2i sin(12 l.Θm)Ul+m and 〈Uk, [Aα, Ul]〉 = l′∈Zn aα,l′(−2i sin(12 l ′.Θl)〈Uk, Ul′+l〉 = −2i aα,k−l sin(12k.Θl). cl, j lαδk,l − 2aα,k−l sin(12k.Θl) 〈ei, γαej〉, ∀k ∈ Zn, ∀i, 1 ≤ i ≤ 2m. (53) We conjecture that KerD = KerDA at least for generic Θ’s: the constraints (53) should imply cl,j = 0 for all j and all l 6= 0 meaning ψ ∈ KerD. When 12πΘ has only integer coefficients, the sin part of these constraints disappears giving the result. Lemma 5.3. If 1 Θ is diophantine, Sp C∞(TnΘ),H,D = Z and all these poles are simple. Proof. Let B ∈ D(A) and p ∈ N0. Suppose that B is of the form B = arbrDqr−1|D|pr−1ar−1br−1 · · · Dq1 |D|p1a1b1 where r ∈ N, ai ∈ A, bi ∈ JAJ−1, qi, pi ∈ N0. We note ai =: l ai,l Ul and bi =: i bi,l Ul. With the shorthand kµ1,µqi := kµ1 · · · kµqi and γ µ1,µqi = γµ1 · · · γµqi , we get Dq1 |D|p1a1b1 Uk ⊗ ej = l1, l a1,l1b1,l′1 Ul1UkUl′1 |k + l1 + l′1|p1 (k + l1 + l′1)µ1,µq1 ⊗ γ µ1,µq1 ej which gives, after r iterations, BUk⊗ej = ãlb̃lUlr · · ·Ul1UkUl′1 · · ·Ul′r |k+ l̂i+ l̂′i|pi(k+ l̂i+ l̂′i)µi1,µiqi ⊗γ qr−1 · · · γµ q1 ej where ãl := a1,l1 · · · ar,lr and b̃l′ := b1,l′1 · · · br,l′r . 
Let us note Fµ(k, l, l ′) := i=1 |k + l̂i + l̂′i|pi(k + l̂i + l̂′i)µi1,µiqi and γ µ := γ µr−11 ,µ qr−1 · · · γµ Thus, with the shortcut ∼c meaning modulo a constant function towards the variable s, B|D|−2p−s ãlb̃l′ τ U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r )Fµ(k,l,l′) |k|s+2p Tr(γµ) . Since Ulr · · ·Ul1Uk = UkUlr · · ·Ul1e−i 1 li.Θk we get U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r = δPr 1 li+l eiφ(l,l ′) e−i 1 li.Θk where φ is a real valued function. Thus, B|D|−2p−s eiφ(l,l ′) δPr 1 li+l ãlb̃l′ Fµ(k,l,l ′) e−i 1 li.Θk |k|s+2p Tr(γµ) ∼c fµ(s)Tr(γµ). The function fµ(s) can be decomposed has a linear combination of zeta function of type described in Theorem 2.17 (or, if r = 1 or all the pi are zero, in Theorem 2.5). Thus, s 7→ Tr B|D|−2p−s has only poles in Z and each pole is simple. Finally, by linearity, we get the result. The dimension spectrum of the noncommutative torus is simple: Proposition 5.4. (i) If 1 Θ is diophantine, the spectrum dimension of C∞(TnΘ),H,D equal to the set {n− k : k ∈ N0 } and all these poles are simple. (ii) ζD(0) = 0. Proof. (i) Lemma 5.3 and Remark 3.9. (ii) ζD(s) = 1≤j≤2m〈Uk ⊗ ej , |D|−sUk ⊗ ej〉 = 2m( + 1) = 2m(Zn(s) + 1). By (21), we get the result. We have computed ζD(0) relatively easily but the main difficulty of the present work is precisely to calculate ζDA(0). 5.3 Noncommutative integral computations We fix a self-adjoint 1-form A on the noncommutative torus of dimension n. Proposition 5.5. If 1 Θ is diophantine, then the first elements of the expansion (5) are given − |DA|−n = − |D|−n = 2m+1πn/2 Γ(n )−1. (54) − |DA|n−k = 0 for k odd. − |DA|n−2 = 0. We need few technical lemmas: Lemma 5.6. On the noncommutative torus, for any t ∈ R, − ÃD|D|−t = − DÃ|D|−t = 0. Proof. Using notations of (49), we have Tr(ÃD|D|−s) ∼c 〈Uk ⊗ ej,−ikµ|k|−s[Aα, Uk]⊗ γαγµej〉 ∼c −iTr(γαγµ) kµ|k|−s〈Uk, [Aα, Uk]〉 = 0 since 〈Uk, [Aα, Uk]〉 = 0. Similarly Tr(DÃ|D|−s) ∼c 〈Uk ⊗ ej , |k|−s aα,l 2 sin (l + k)µUl+k ⊗ γµγαej〉 ∼c 2Tr(γµγα) aα,l sin (l + k)µ |k|−s〈Uk, Ul+k〉 = 0. 
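The top coefficient $2^{m+1} \pi^{n/2}\, \Gamma(n/2)^{-1}$ of (54) is easy to evaluate numerically. The sketch below does so, assuming $m = \lfloor n/2 \rfloor$ as fixed in Section 5.1; the function name `top_coefficient` is illustrative.

```python
import math


def top_coefficient(n):
    """Value of 2^(m+1) * pi^(n/2) / Gamma(n/2) with m = floor(n/2), as in (54)."""
    m = n // 2
    return 2 ** (m + 1) * math.pi ** (n / 2) / math.gamma(n / 2)
```

For $n = 2$ this gives $4\pi$ and for $n = 4$ it gives $8\pi^2$, which are the leading coefficients of $\Lambda^2$ and $\Lambda^4$ in the spectral action computed in Section 6.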
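Any element $h$ in the algebra generated by $\mathcal{A}$ and $[\mathcal{D}, \mathcal{A}]$ can be written as a linear combination of terms of the form $a_1^{p_1} \cdots a_n^{p_r}$ where the $a_i$ are elements of $\mathcal{A}$ or $[\mathcal{D}, \mathcal{A}]$. Such a term can be written as a series
$$b := \sum a_{1,\alpha_1,l_1} \cdots a_{q,\alpha_q,l_q}\, U_{l_1} \cdots U_{l_q} \otimes \gamma^{\alpha_1} \cdots \gamma^{\alpha_q}$$
where the $a_{i,\alpha_i}$ are Schwartz sequences and when $a_i =: \sum_l a_l U_l \in \mathcal{A}$, we set $a_{i,\alpha,l} = a_{i,l}$ with $\gamma^\alpha = 1$. We define
$$L(b) := \tau\Big( \sum_l a_{1,\alpha_1,l_1} \cdots a_{q,\alpha_q,l_q}\, U_{l_1} \cdots U_{l_q} \Big)\, \operatorname{Tr}(\gamma^{\alpha_1} \cdots \gamma^{\alpha_q}).$$
By linearity, $L$ is defined as a linear form on the whole algebra generated by $\mathcal{A}$ and $[\mathcal{D}, \mathcal{A}]$.

Lemma 5.7. If $h$ is an element of the algebra generated by $\mathcal{A}$ and $[\mathcal{D}, \mathcal{A}]$,
$$\operatorname{Tr}\big( h\,|D|^{-s} \big) \sim_c L(h)\, Z_n(s).$$
In particular, $\operatorname{Tr}\big( h\,|D|^{-s} \big)$ has at most one pole at $s = n$.

Proof. We get with $b$ of the form $\sum a_{1,\alpha_1,l_1} \cdots a_{q,\alpha_q,l_q}\, U_{l_1} \cdots U_{l_q} \otimes \gamma^{\alpha_1} \cdots \gamma^{\alpha_q}$,
$$\operatorname{Tr}\big( b\,|D|^{-s} \big) \sim_c \sum_{k \in \mathbb{Z}^n}{}' \Big\langle U_k, \sum_l a_{1,\alpha_1,l_1} \cdots a_{q,\alpha_q,l_q}\, U_{l_1} \cdots U_{l_q} U_k \Big\rangle \operatorname{Tr}(\gamma^{\alpha_1} \cdots \gamma^{\alpha_q})\, |k|^{-s} \sim_c \tau\Big( \sum_l a_{1,\alpha_1,l_1} \cdots a_{q,\alpha_q,l_q}\, U_{l_1} \cdots U_{l_q} \Big) \operatorname{Tr}(\gamma^{\alpha_1} \cdots \gamma^{\alpha_q})\, Z_n(s) = L(b)\, Z_n(s).$$
The result now follows from linearity of the trace.

Lemma 5.8. If $\frac{1}{2\pi}\Theta$ is diophantine, the function $s \mapsto \operatorname{Tr}\big( \varepsilon JAJ^{-1} A\,|D|^{-s} \big)$ extends meromorphically on the whole plane with only one possible pole at $s = n$. Moreover, this pole is simple and
$$\operatorname{Res}_{s=n} \operatorname{Tr}\big( \varepsilon JAJ^{-1} A\,|D|^{-s} \big) = a_{\alpha,0}\, a^\alpha_0\ 2^{m+1} \pi^{n/2}\, \Gamma(n/2)^{-1}.$$

Proof. With $A = L(-iA_\alpha) \otimes \gamma^\alpha$, we get $\epsilon JAJ^{-1} = R(iA_\alpha) \otimes \gamma^\alpha$, and by multiplication $\varepsilon JAJ^{-1} A = R(A_\beta) L(A_\alpha) \otimes \gamma^\beta \gamma^\alpha$. Thus,
$$\operatorname{Tr}\big( \varepsilon JAJ^{-1} A\,|D|^{-s} \big) \sim_c \sum_{k \in \mathbb{Z}^n}{}' \langle U_k, A_\alpha U_k A_\beta \rangle\, |k|^{-s}\, \operatorname{Tr}(\gamma^\beta \gamma^\alpha) \sim_c \sum_{k \in \mathbb{Z}^n}{}' \sum_l a_{\alpha,l}\, a_{\beta,-l}\, e^{i k.\Theta l}\, |k|^{-s}\, \operatorname{Tr}(\gamma^\beta \gamma^\alpha) \sim_c 2^m \sum_{k \in \mathbb{Z}^n}{}' \sum_l a_{\alpha,l}\, a^\alpha_{-l}\, e^{i k.\Theta l}\, |k|^{-s}.$$
Theorem 2.5 (ii) entails that $\sum'_{k} \sum_l a_{\alpha,l}\, a^\alpha_{-l}\, e^{i k.\Theta l}\, |k|^{-s}$ extends meromorphically on the whole plane $\mathbb{C}$ with only one possible pole at $s = n$. Moreover, this pole is simple and we have
$$\operatorname{Res}_{s=n} \sum_{k \in \mathbb{Z}^n}{}' \sum_l a_{\alpha,l}\, a^\alpha_{-l}\, e^{i k.\Theta l}\, |k|^{-s} = a_{\alpha,0}\, a^\alpha_0\ \operatorname{Res}_{s=n} Z_n(s).$$
Equation (20) now gives the result.

Lemma 5.9. If $\frac{1}{2\pi}\Theta$ is diophantine, then for any $t \in \mathbb{R}$,
$$\fint X\,|D|^{-t} = \delta_{t,n}\ 2^{m+1} \Big( -\sum_l a_{\alpha,l}\, a^\alpha_{-l} + a_{\alpha,0}\, a^\alpha_0 \Big)\, 2\pi^{n/2}\, \Gamma(n/2)^{-1},$$
where $X = \tilde{A}\mathcal{D} + \mathcal{D}\tilde{A} + \tilde{A}^2$ and $A =: -i \sum_l a_{\alpha,l}\, U_l \otimes \gamma^\alpha$.

Proof. By Lemma 5.6, we get $\fint X\,|D|^{-t} = \operatorname{Res}_{s=0} \operatorname{Tr}\big( \tilde{A}^2\,|D|^{-s-t} \big)$. Since $A$ and $\varepsilon JAJ^{-1}$ commute, we have $\tilde{A}^2 = A^2 + JA^2J^{-1} + 2\varepsilon JAJ^{-1} A$.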
Thus, Tr(Ã2|D|−s−t) = Tr(A2|D|−s−t) + Tr(JA2J−1|D|−s−t) + 2Tr(εJAJ−1A|D|−s−t). Since |D| and J commute, we have with Lemma 5.7, Ã2|D|−s−t ∼c 2L(A2)Zn(s+ t) + 2Tr εJAJ−1A|D|−s−t Thus Lemma 5.8 entails that Tr(Ã2|D|−s−t) is holomorphic at 0 if t 6= n. When t = n, Ã2|D|−s−t = 2m+1 aα,l a −l + aα,0 a 2πn/2 Γ(n/2)−1, (55) which gives the result. Lemma 5.10. If 1 Θ is diophantine, then − ÃDÃD|D|−2−n = −n−2 − Ã2|D|−n. Proof. With DJ = εJD, we get − ÃDÃD|D|−2−n = 2 − ADAD|D|−2−n + 2 − εJAJ−1DAD|D|−2−n. Let us first compute ADAD|D|−2−n. We have, with A =: −iL(Aα)⊗ γα =: −i l aα,lUl ⊗ γα, ADAD|D|−s−2−n l1,l2 aα2,l2 aα1,l1 τ(U−kUl2Ul1Uk) kµ1(k+l1)µ2 |k|s+2+n Tr(γα,µ) where γα,µ := γα2γµ2γα1γµ1 . Thus, − ADAD|D|−2−n = − aα2,−l aα1,lRes ′ kµ1kµ2 |k|s+2+n Tr(γα,µ). We have also, with εJAJ−1 = iR(Aα)⊗ γa, εJAJ−1DAD|D|−s−2−n l1,l2 aα2,l2aα1,l1τ(U−kUl1UkUl2) kµ1 (k+l1)µ2 |k|s+2+n Tr(γα,µ). which gives − εJAJ−1DAD|D|−2−n = aα2,0aα1,0Res ′ kµ1kµ2 |k|s+2+n Tr(γα,µ). Thus, − ÃDÃD|D|−2−n = aα2,0aα1,0 − aα2,−laα1,l Ress=0 ′ kµ1kµ2 |k|s+2+n Tr(γα,µ). kµ1kµ2 |k|s+2+n δµ1µ2 Zn(s+ n) and Cn := Ress=0 Zn(s+ n) = 2π n/2Γ(n/2)−1 we obtain − ÃDÃD|D|−2−n = aα2,0aα1,0 − aα2,−laα1,l Tr(γα2γµγα1γµ). Since Tr(γα2γµγα1γµ) = 2 m(2− n)δα2,α1 , we get − ÃDÃD|D|−2−n = 2m − aα,0 aα0 + aα,−l a )Cn(n−2) Equation (55) now proves the lemma. Lemma 5.11. If 1 Θ is diophantine, then for any P ∈ Ψ1(A) and q ∈ N, q odd, − P |D|−(n−q) = 0. Proof. There exist B ∈ D1(A) and p ∈ N0 such that P = BD−2p + R where R is in OP−q−1. As a consequence, P |D|−(n−q) = B|D|−n−2p+q. Assume B = arbrDqr−1ar−1br−1 · · · Dq1a1b1 where r ∈ N, ai ∈ A, bi ∈ JAJ−1, qi ∈ N. If we prove that B|D|−n−2p+q = 0, then the general case will follow by linearity. We note ai =: l ai,l Ul and bi =: l bi,l Ul. 
With the shorthand kµ1,µqi := kµ1 · · · kµqi and γ µ1,µqi = γµ1 · · · γµqi , we get Dq1a1b1Uk ⊗ ej = a1,l1 b1,l′1 Ul1UkUl (k + l1 + l 1)µ1,µq1 ⊗ γ µ1,µq1ej which gives, after iteration, B Uk ⊗ ej = ãlb̃lUlr · · ·Ul1UkUl′1 · · ·Ul′r (k + l̂i + l̂ i)µi1,µiqi qr−1 · · · γµ where ãl := a1,l1 · · · ar,lr and b̃l′ := b1,l′1 · · · br,l′r . Let’s note Qµ(k, l, l ′) := i=1 (k + l̂i + l̂ i)µi1,µiqi and γµ := γ qr−1 · · · γµ q1 . Thus, − B |D|−n−2p+q = Res ãl b̃l′ τ U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r ) Qµ(k,l,l′) |k|s+2p+n−q Tr(γµ) . Since Ulr · · ·Ul1Uk = UkUlr · · ·Ul1e−i 1 li.Θk, we get U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r = δPr 1 li+l eiφ(l,l ′) e−i 1 li.Θk where φ is a real valued function. Thus, − B |D|−n−2p+q = Res eiφ(l,l ′) δPr 1 li+l ãl b̃l′ Qµ(k,l,l ′)e−i 1 li.Θk |k|s+2p+n−q Tr(γµ) =: Res fµ(s)Tr(γ We decompose Qµ(k, l, l ′) as a sum h=0Mh,µ(l, l ′)Qh,µ(k) where Qh,µ is a homogeneous poly- nomial in (k1, · · · , kn) and Mh,µ(l, l′) is a polynomial in (l1)1, · · · , (lr)n, (l′1)1, · · · , (l′r)n Similarly, we decompose fµ(s) as h=0 fh,µ(s). Theorem 2.5 (ii) entails that fh,µ(s) extends meromorphically to the whole complex plane C with only one possible pole for s+2p+n−q = n+d where d := deg Qh,µ. In other words, if d+ q− 2p 6= 0, fh,µ(s) is holomorphic at s = 0. Suppose now d+ q− 2p = 0 (note that this implies that d is odd, since q is odd by hypothesis), then, by Theorem 2.5 (ii) fh,µ(s) = V u∈Sn−1 Qh,µ(u) dS(u) where V := l,l′∈ZMh,µ(l, l ′) eiφ(l,l ′) δPr 1 li+l ,0 ãl b̃l′ and Z := { l, l′ : i=1 li = 0 }. Since d is odd, Qh,µ(−u) = −Qh,µ(u) and u∈Sn−1 Qh,µ(u) dS(u) = 0. Thus, Res fh,µ(s) = 0 in any case, which gives the result. As we have seen, the crucial point of the preceding lemma is the decomposition of the numer- ator of the series fµ(s) as polynomials in k. This has been possible because we restricted our pseudodifferential operators to Ψ1(A). Proof of Proposition 5.5. 
The top element follows from Proposition 4.9 and, according to (20),
$$\fint |D|^{-n} = \operatorname{Res}_{s=0} \operatorname{Tr}\big( |D|^{-s-n} \big) = 2^m \operatorname{Res}_{s=0} Z_n(s + n) = \frac{2^{m+1} \pi^{n/2}}{\Gamma(n/2)}.$$
For the second equality, we get from Lemmas 5.7 and 4.6
$$\operatorname{Res}_{s=n-k} \zeta_{D_A}(s) = \sum_{p=1}^{k} \ \sum_{r_1, \dots, r_p = 0}^{k-p} h(n-k, r, p) \fint \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-(n-k)}.$$
Corollary 4.4 and Lemma 5.11 imply that $\fint \varepsilon^{r_1}(Y) \cdots \varepsilon^{r_p}(Y)\,|D|^{-(n-k)} = 0$, which gives the result. The last equality follows from Lemma 5.10 and Corollary 4.11.

6 The spectral action

Here is the main result of this section.

Theorem 6.1. Consider the $n$-NC-torus $\big( C^\infty(\mathbb{T}^n_\Theta), \mathcal{H}, \mathcal{D} \big)$ where $n \in \mathbb{N}$ and $\frac{1}{2\pi}\Theta$ is a real $n \times n$ skew-symmetric diophantine matrix, and a selfadjoint one-form $A = L(-iA_\alpha) \otimes \gamma^\alpha$. Then, the full spectral action of $\mathcal{D}_A = \mathcal{D} + A + \epsilon JAJ^{-1}$ is
(i) for $n = 2$, $S(\mathcal{D}_A, \Phi, \Lambda) = 4\pi\, \Phi_2\, \Lambda^2 + O(\Lambda^{-2})$,
(ii) for $n = 4$, $S(\mathcal{D}_A, \Phi, \Lambda) = 8\pi^2\, \Phi_4\, \Lambda^4 - \frac{4\pi^2}{3}\, \Phi(0)\, \tau(F_{\mu\nu} F^{\mu\nu}) + O(\Lambda^{-2})$,
(iii) more generally, in
$$S(\mathcal{D}_A, \Phi, \Lambda) = \sum_{k=0}^{n} \Phi_{n-k}\, c_{n-k}(A)\, \Lambda^{n-k} + O(\Lambda^{-1}),$$
$c_{n-2}(A) = 0$ and $c_{n-k}(A) = 0$ for $k$ odd. In particular, $c_0(A) = 0$ when $n$ is odd.

This result (for $n = 4$) has also been obtained in [20] using the heat kernel method. It is however interesting to get the result via direct computations of (5), since it shows how efficient this formula is. As we will see, the computation of all the noncommutative integrals requires a lot of technical steps. One of the main points, namely to isolate where the Diophantine condition on $\Theta$ is assumed, is outlined here.

Remark 6.2. Note that all terms must be gauge invariant, namely, according to (48), invariant under $A_\alpha \longrightarrow \gamma_u(A_\alpha) = u A_\alpha u^* + u\, \delta_\alpha(u^*)$. A particular case is $u = U_k$ where $U_k\, \delta_\alpha(U_k^*) = -i k_\alpha U_0$. In the same way, note that there is no contradiction with the commutative case where, for any selfadjoint one-form $A$, $\mathcal{D}_A = \mathcal{D}$ (so $A$ is equivalent to 0!), since we assume in Theorem 6.1 that $\Theta$ is diophantine, so $\mathcal{A}$ cannot be commutative.

Conjecture 6.3. The constant term of the spectral action of $\mathcal{D}_A$ on the noncommutative $n$-torus is proportional to the constant term of the spectral action of $\mathcal{D} + A$ on the commutative $n$-torus.

Remark 6.4.
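The appearance of a Diophantine condition for $\Theta$ has been characterized in dimension 2 by Connes [7, Prop. 49] where in this case, $\Theta = \theta \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ with $\theta \in \mathbb{R}$. In fact, the Hochschild cohomology $H(\mathcal{A}_\Theta, \mathcal{A}_\Theta^*)$ satisfies $\dim H^j(\mathcal{A}_\Theta, \mathcal{A}_\Theta^*) = 2$ (or 1) for $j = 1$ (or $j = 2$) if and only if the irrational number $\theta$ satisfies a Diophantine condition like $|1 - e^{i 2\pi n \theta}|^{-1} = O(n^k)$ for some $k$. Recall that when the matrix $\Theta$ is quite irrational (see [23, Cor. 2.12]), then the $C^*$-algebra generated by $\mathcal{A}_\Theta$ is simple.

Remark 6.5. It is possible to generalize the above theorem to the case $\mathcal{D} = -i\, g^{\mu}{}_{\nu}\, \delta_\mu \otimes \gamma^\nu$ instead of (43) when $g$ is a positive definite constant matrix. The formulae in Theorem 6.1 are still valid.

6.1 Computations of $\fint$

In order to get this theorem, let us prove a few technical lemmas. We suppose from now on that $\Theta$ is a skew-symmetric matrix in $M_n(\mathbb{R})$. No other hypothesis is assumed for $\Theta$, except when it is explicitly stated. When $A$ is a selfadjoint one-form, we define for $n \in \mathbb{N}$, $q \in \mathbb{N}$, $2 \le q \le n$ and $\sigma \in \{-, +\}^q$
$$A^+ := A\, \mathcal{D} D^{-2}, \qquad A^- := \epsilon JAJ^{-1}\, \mathcal{D} D^{-2}, \qquad A^\sigma := A^{\sigma_q} \cdots A^{\sigma_1}.$$

Lemma 6.6. We have for any $q \in \mathbb{N}$,
$$\fint (\tilde{A} D^{-1})^q = \fint (\tilde{A}\, \mathcal{D} D^{-2})^q = \sum_{\sigma \in \{+, -\}^q} \fint A^\sigma.$$

Proof. Since $P_0 \in OP^{-\infty}$, $D^{-1} = \mathcal{D} D^{-2} \mod OP^{-\infty}$ and $\fint (\tilde{A} D^{-1})^q = \fint (\tilde{A}\, \mathcal{D} D^{-2})^q$.

Lemma 6.7. Let $A$ be a selfadjoint one-form, $n \in \mathbb{N}$ and $q \in \mathbb{N}$ with $2 \le q \le n$ and $\sigma \in \{-, +\}^q$. Then
$$\fint A^\sigma = \fint A^{-\sigma}.$$

Proof. Let us first check that $JP_0 = P_0 J$. Since $\mathcal{D}J = \varepsilon J\mathcal{D}$, we get $\mathcal{D}JP_0 = 0$ so $JP_0 = P_0 JP_0$. Since $J$ is an antiunitary operator, we get $P_0 J = P_0 JP_0$ and finally, $P_0 J = JP_0$. As a consequence, we get $JD^2 = D^2 J$, $J\mathcal{D}D^{-2} = \varepsilon\, \mathcal{D}D^{-2} J$, $JA^+J^{-1} = A^-$ and $JA^-J^{-1} = A^+$. In summary, $JA^{\sigma_i}J^{-1} = A^{-\sigma_i}$. The trace property of $\fint$ now gives
$$\fint A^\sigma = \fint A^{\sigma_q} \cdots A^{\sigma_1} = \fint JA^{\sigma_q}J^{-1} \cdots JA^{\sigma_1}J^{-1} = \fint A^{-\sigma_q} \cdots A^{-\sigma_1} = \fint A^{-\sigma}.$$

Definition 6.8. In [4] the vanishing tadpole hypothesis has been introduced:
$$\fint A D^{-1} = 0, \quad \text{for all } A \in \Omega^1_{\mathcal{D}}(\mathcal{A}). \qquad (56)$$
By the following lemma, this condition is satisfied for the noncommutative torus, a fact more or less already known within the noncommutative community [34].

Lemma 6.9.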
Let n ∈ N, A = L(−iAα)⊗γα = −i l∈Zn aα,l Ul⊗γα, Aα ∈ AΘ, { aα,l }l ∈ S(Zn), be a hermitian one-form. Then, ApD−q = (ǫJAJ−1)pD−q = 0 for p ≥ 0 and 1 ≤ q < n (case p = q = 1 is tadpole hypothesis.) (ii) If 1 Θ is diophantine, then BD−q = 0 for 1 ≤ q < n and any B in the algebra generated by A, [D,A], JAJ−1 and J [D,A]J−1. Proof. (i) Let us compute ∫ − Ap(ǫJAJ−1)p′D−q. With A = L(−iAα)⊗ γα and ǫJAJ−1 = R(iAα)⊗ γα, we get Ap = L(−iAα1) · · ·L(−iAαp)⊗ γα1 · · · γαp (ǫJAJ−1)p = R(iAα′1 ) · · ·R(iAα′ )⊗ γα′1 · · · γα We note ãα,l := aα1,l1 · · · aαp,lp . Since L(−iAα1) · · ·L(−iAαp)R(iAα′1) · · ·R(iAα′p′ )Uk = (−i) ãα,l ãα′,l′ Ul1 · · ·UlpUkUl′ · · ·Ul′1 , Ul1 · · ·UlpUk = UkUl1 · · ·Ulp e−i( i li).Θk, we get, with Ul,l′ := Ul1 · · ·UlpUl′ · · ·Ul′1 , gµ,α,α′(s, k, l, l ′) := eik.Θ kµ1 ...kµq |k|s+2q ãα,l ãα′,l′ , ′,µ := γα1 · · · γαpγα′1 · · · γα p′γµ1 · · · γµq , Ap(ǫJAJ−1)p D−q|D|−sUk ⊗ ei ∼c (−i)p ip gµ,α,α′(s, k, l, l ′)UkUl,l′ ⊗ γα,α ′,µei. Thus, Ap(ǫJAJ−1)p D−q = Res f(s) where f(s) : = Tr Ap(ǫJAJ−1)p D−q|D|−s ∼c (−i)p ip 〈Uk ⊗ ei, gµ,α,α′(s, k, l, l ′)UkUl,l′ ⊗ γα,α ′,µei〉 ∼c (−i)p ip gµ,α,α′(s, k, l, l ′)Ul,l′ Tr(γµ,α,α ∼c (−i)p ip gµ,α,α′(s, k, l, l Ul,l′ Tr(γµ,α,α It is straightforward to check that the series k,l,l′gµ,α,α′(s, k, l, l Ul,l′ is absolutely summable if ℜ(s) > R for a R > 0. Thus, we can exchange the summation on k and l, l′, which gives f(s) ∼c (−i)p ip gµ,α,α′(s, k, l, l Ul,l′ Tr(γµ,α,α If we suppose now that p′ = 0, we see that, f(s) ∼c (−i)p ′ kµ1 ...kµq |k|s+2q ãα,l δ Tr(γµ,α,α which is, by Proposition 2.16, analytic at 0. In particular, for p = q = 1, we see that AD−1 = 0, i.e. the vanishing tadpole hypothesis is satisfied. Similarly, if we suppose p = 0, we get f(s) ∼c (−i)p ′ kµ1 ...kµq |k|s+2q ãα,l′ δPp′ i=1 l Tr(γµ,α,α which is holomorphic at 0. 
(ii) Adapting the proof of Lemma 5.11 to our setting (taking qi = 0, and adding gamma matrices components), we see that − BD−q = Res eiφ(l,l ′) δPr 1 li+l ãα,l b̃β,l′ kµ1 ···kµq e 1 li.Θk |k|s+2q Tr(γ(µ,α,β)) where γ(µ,α,β) is a complicated product of gamma matrices. By Theorem 2.5 (ii), since we suppose here that 1 Θ is diophantine, this residue is 0. 6.1.1 Even dimensional case Corollary 6.10. Same hypothesis as in Lemma 6.9. (i) Case n = 2: − AqD−q = −δq,2 4π τ (ii) Case n = 4: with the shorthand δµ1,...,µ4 := δµ1µ2δµ3µ4 + δµ1µ3δµ2µ4 + δµ1µ4δµ2µ3 , − AqD−q = δq,4 π Aα1 · · ·Aα4 Tr(γα1 · · · γα4γµ1 · · · γµ4)δµ1,...,µ4 . Proof. (i, ii) The same computation as in Lemma 6.9 (i) (with p′ = 0, p = q = n) gives − AnD−n = Res (−i)n ′ kµ1 ...kµn |k|s+2n l∈(Zn)n ãα,lUl1 · · ·Uln Tr(γα1 · · · γαnγµ1 · · · γµn) and the result follows from Proposition 2.16. We will use few notations: If n ∈ N, q ≥ 2, l := (l1, · · · , lq−1) ∈ (Zn)q−1, α := (α1, · · · , αq) ∈ {1, · · · , n}q, k ∈ Zn\{0}, σ ∈ {−,+}q, (ai)1≤i≤n ∈ (S(Zn))n, lq := − 1≤j≤q−1 lj , λσ := (−i)q j=1...q σj , ãα,l := aα1,l1 . . . aαq ,lq , φσ(k, l) := 1≤j≤q−1 (σj − σq) k.Θlj + 2≤j≤q−1 σj (l1 + . . .+ lj−1).Θlj , gµ(s, k, l) := kµ1 (k+l1)µ2 ...(k+l1+...+lq−1)µq |k|s+2|k+l1|2...|k+l1+...+lq−1|2 with the convention 2≤j≤q−1 = 0 when q = 2, and gµ(s, k, l) = 0 whenever l̂i = −k for a 1 ≤ i ≤ q − 1. Lemma 6.11. Let A = L(−iAα) ⊗ γα = −i l∈Zn aα,l Ul ⊗ γα where Aα = −A∗α ∈ AΘ and { aα,l }l ∈ S(Zn), with n ∈ N, be a hermitian one-form, and let 2 ≤ q ≤ n, σ ∈ {−,+}q. Then, Aσ = Res f(s) where f(s) := l∈(Zn)q−1 φσ(k,l) gµ(s, k, l) ãα,l Tr(γ αqγµq · · · γα1γµ1). Proof. By definition, Aσ = Res f(s) where Tr(Aσq · · ·Aσ1 |D|−s) ∼c 〈Uk ⊗ ei, |k|−s Aσq · · ·Aσ1Uk ⊗ ei〉 =: f(s). Let r ∈ Zn and v ∈ C2m . Since A = L(−iAα)⊗ γα, and ǫJAJ−1 = R(iAα)⊗ γα, we get +Ur ⊗ v = ADD−2Ur ⊗ v = A rµ|r|2+δr,0Ur ⊗ γ µv = −i rµ |r|2+δr,0 AαUr ⊗ γαγµv , −Ur ⊗ v = ǫJAJ−1DD−2Ur ⊗ v = ǫJAJ−1 rµ|r|2+δr,0Ur ⊗ γ µv = i |r|2+δr,0 UrAα ⊗ γαγµv. 
With UlUr = e Ur+l and UrUl = e Ur+l, we obtain, for any 1 ≤ j ≤ q, σjUr ⊗ v = (−σj) i eσj r.Θl rµ |r|2+δr,0 aα,l Ur+l ⊗ γαγµv. We now apply q times this formula to get |k|−sAσq · · ·Aσ1Uk ⊗ ei = λσ l∈(Zn)q φσ(k,l) gµ(s, k, l) ãα,l Uk+ ⊗ γαqγµq · · · γα1γµ1ei φσ(k, l) := σ1 k.Θl1 + σ2 (k + l1).Θl2 + . . .+ σq (k + l1 + . . . + lq−1).Θlq. Thus, f(s) = l∈(Zn)q φσ(k,l) gµ(s, k, l) ãα,l U Tr(γαqγµq · · · γα1γµ1) l∈(Zn)q φσ(k,l) gµ(s, k, l) ãα,l δ( lj)Tr(γ αqγµq · · · γα1γµ1) l∈(Zn)q−1 φσ(k,l) gµ(s, k, l) ãα,l Tr(γ αqγµq · · · γα1γµ1) where in the last sum lq is fixed to − 1≤j≤q−1 lj and thus, φσ(k, l) = 1≤j≤q−1 (σj − σq) k.Θlj + 2≤j≤q−1 σj (l1 + . . .+ lj−1).Θlj . By Lemma 2.10, there exists a R > 0 such that for any s ∈ C with ℜ(s) > R, the family φσ(k,l) gµ(s, k, l) ãα,l (k,l)∈(Zn\{ 0 })×(Zn)q−1 is absolutely summable as a linear combination of families of the type considered in that lemma. As a consequence, we can exchange the summations on k and l, which gives the result. In the following, we will use the shorthand c := 4π Lemma 6.12. Suppose n = 4. Then, with the same hypothesis of Lemma 6.11, (i) 1 −(A+)2 = 1 −(A−)2 = c aα1,l aα2,−l lα1 lα2 − δα1α2 |l|2 (ii) − 1 −(A+)3 = −1 −(A−)3 = 4c li∈Z4 aα3,−l1−l2 a aα1,l1 sin l1.Θl2 (iii) 1 −(A+)4 = 1 −(A−)4 = 2c li∈Z4 aα1,−l1−l2−l3 aα2,l3 a aα2l1 sin l1.Θ(l2+l3) sin l2.Θl3 (iv) Suppose 1 Θ diophantine. Then the crossed terms in (A+ + A−)q vanish: if C is the set of all σ ∈ {−,+}q with 2 ≤ q ≤ 4, such that there exist i, j satisfying σi 6= σj, we have∑ Aσ = 0. Proof. (i) Lemma 6.11 entails that A++ = Res l∈Zn −f(s, l) where f(s, l) := ′ kµ1 (k+l)µ2 |k|s+2|k+l|2 ãα,l Tr(γ α2γµ2γα1γµ1) and ãα,l := aα1,l aα2,−l . We will now reduce the computation of the residue of an expression involving terms like |k+l|2 in the denominator to the computation of residues of zeta functions. To proceed, we use (16) into an expression like the one appearing in f(s, l). 
We see that the last term on the righthandside yields a Zn(s) while the first one is less divergent by one power of k. If this is not enough, we repeat this operation for the new factor of |k + l|2 in the denominator. For f(s, l), which is quadratically divergent at s = 0, we have to repeat this operation three times before ending with a convergent result. All the remaining terms are expressible in terms of Zn functions. We get, using three times (16), |k+l|2 − 2k.l+|l| (2k.l+|l|2)2 − (2k.l+|l| |k|6|k+l|2 . (57) Let us define fα,µ(s, l) := ′ kµ1 (k+l)µ2 |k|s+2|k+l|2 ãα,l so that f(s, l) = fα,µ(s, l)Tr(γ α2γµ2γα1γµ1). Equation (57) gives fα,µ(s, l) = f1(s, l)− f2(s, l) + f3(s, l)− r(s, l) with obvious identifications. Note that the function r(s, l) = ′ kµ1 (k+l)µ2 (2kl+|l| |k|s+8|k+l|2 ãα,l is a linear combination of functions of the typeH(s, l) satisfying the hypothesis of Corollary 2.13. Thus, r(s, l) satisfies (H1) and with the previously seen equivalence relation modulo functions satisfying this hypothesis we get fα,µ(s, l) ∼ f1(s, l)− f2(s, l) + f3(s, l). Let’s now compute f1(s, l). f1(s, l) = ′ kµ1 (k+l)µ2 |k|s+4 ãα,l = ãα,l ′ kµ1kµ2 |k|s+4 Proposition 2.1 entails that s 7→ ′ kµ1kµ2 |k|s+4 is holomorphic at 0. Thus, f1(s, l) satisfies (H1), and fα,µ(s, l) ∼ −f2(s, l) + f3(s, l). Let’s now compute f2(s, l) modulo (H1). We get, using several times Proposition 2.1, f2(s, l) = ′ kµ1 (k+l)µ2 (2kl+|l| |k|s+6 ãα,l = ′ (2kl)kµ1kµ2+(2kl)kµ1 lµ2+|l| 2kµ1kµ2+lµ2 |l| |k|s+6 ãα,l ∼ 0 + ′ (2kl)kµ1 lµ2 |k|s+6 ãα,l + ′ |l|2kµ1kµ2 |k|s+6 ãα,l + 0 . Recall that |k|s+6 Zn(s + 4). Thus, f2(s, l) ∼ 2lilµ2 ãα,l Zn(s+ 4) + |l|2 ãα,l δµ1µ2 Zn(s+ 4). Finally, let us compute f3(s, l) modulo (H1) following the same principles: f3(s, l) = ′ kµ1 (k+l)µ2 (2kl+|l| |k|s+8 ãα,l ′ (2kl)2kµ1kµ2+(2kl) 2kµ1 lµ2+|l| 4kµ1kµ2+|l| 4kµ1 lµ2+(4kl)|l| 2kµ1kµ2+(4kl)|l| 2kµ1 lµ2 |k|s+8 ãα,l ∼ 4lilj ′ kikjkµ1kµ2 |k|s+8 ãα,l + 0. 
In conclusion, fα,µ(s, l) ∼ −14(2lµ1 lµ2 + |l| 2 δµ1µ2)ãα,lZn(s+ 4) + 4l ilj ãα,l ′ kikjkµ1kµ2 |k|s+8 =: gα,µ(s, l). Proposition (2.1) entails that Zn(s+ 4) and s 7→ ′ kikjkµ1kµ2 |k|s+8 extend holomorphically in a punctured open disk centered at 0. Thus, gα,µ(s, l) satisfies (H2) and we can apply Lemma 2.14 to get −(A+)2 = Res f(s, l) = gα,µ(s, l)Tr(γ α2γµ2γα1γµ1) =: g(s, l). The problem is now reduced to the computation of Res g(s, l). Recall that Ress=0 Z4(s+4) = 2π by (20) or (17), and Ress=0 ′ kikjklkm |k|s+8 = (δijδlm + δilδjm + δimδjl) Thus, gα,µ(s, l) = −π ãα,l (lµ1 lµ2 + |l|2δµ1µ2). We will use Tr(γµ1 · · · γµ2j ) = Tr(1) all pairings of { 1···2j } s(P ) δµP1µP2 δµP3µP4 · · · δµP2j−1µP2j (58) where s(P ) is the signature of the permutation P when P2m−1 < P2m for 1 ≤ m ≤ n. This gives Tr(γα2γµ2γα1γµ1) = 2m(δα2µ2δα1µ1 − δα1α2δµ2µ1 + δα2µ1δµ2α1). (59) Thus, g(s, l) = −c ãα,l (lµ1 lµ2 + 12 |l| 2δµ1µ2)(δ α2µ2δα1µ1 − δα1α2δµ2µ1 + δα2µ1δµ2α1) = −2c ãα,l (lα1 lα2 − δα1α2 |l|2). Finally, −(A+)2 = 1 −(A−)2 = c aα1,l aα2,−l lα1 lα2 − δα1α2 |l|2 (ii) Lemma 6.11 entails that A+++ = Res (l1,l2)∈(Zn)2 f(s, l) where f(s, l) := l1Θl2 kµ1 (k+l1)µ2 (k+ bl2)µ3 |k|s+2|k+l1|2|k+bl2|2 ãα,l Tr(γ α3γµ3γα2γµ2γα1γµ1) =: fα,µ(s, l)Tr(γ α3γµ3γα2γµ2γα1γµ1), and ãα,l := aα1,l1 aα2,l2 aα3,−bl2 with l̂2 := l1 + l2. We use the same technique as in (i): |k+l1|2 − 2k.l1+|l1| (2k.l1+|l1| |k|4|k+l1|2 |k+bl2|2 |k|2 − 2k.bl2+|bl2| (2k.bl2+|bl2| |k|4|k+bl2|2 and thus, |k+l1|2|k+bl2|2 − 2k.l1 − 2k.bl2 +R(k, l) (60) where the remain R(k, l) is a term of order at most −6 in k. Equation (60) gives fα,µ(s, l) = f1(s, l) + r(s, l) where r(s, l) corresponds to R(k, l). Note that the function r(s, l) = l1Θl2 kµ1(k+l)µ2 (k+ bl2)µ3R(k,l) |k|s+2 ãα,l is a linear combination of functions of the type H(s, l) satisfying the hypothesis of Corollary (2.13). Thus, r(s, l) satisfies (H1) and fα,µ(s, l) ∼ f1(s, l). 
Let us compute f1(s, l) modulo (H1) f1(s, l) = l1Θl2 kµ1 (k+l1)µ2 (k+ bl2)µ3 |k|s+6 ãα,l − l1Θl2 kµ1 (k+l1)µ2 (k+ bl2)µ3 (2k.l1+2k. |k|s+8 ãα,l l1Θl2 kµ1kµ2 bl2µ3+kµ1kµ3 l1µ2 |k|s+6 ãα,l − l1Θl2 kµ1kµ2kµ3 (2k.l1+2k. |k|s+8 ãα,l = i e l1Θl2 ãα,l (l1µ2δµ1µ3 + l̂2µ3δµ1µ2) Z4(s+ 4)− 2(li1 + l̂i2) ′ kµ1kµ2kµ3ki |k|s+8 =: gα,µ(s, l). Since gα,µ(s, l) satisfies (H2), we can apply Lemma 2.14 to get −(A+)3 = Res (l1,l2)∈(Zn)2 f(s, l) (l1,l2)∈(Zn)2 gα,µ(s, l)Tr(γ α3γµ3γα2γµ2γα1γµ1) =: Recall that l3 := −l1 − l2 = −l̂2. By (17) and (19), gα,µ(s, l)i e l1Θl2 ãα,l 2(−li1 + li3)π (δµ1µ2δµ3i + δµ1µ3δµ2i + δµ1iδµ2µ3) + (l1µ2δµ1µ3 − l3µ3δµ1µ2)π We decompose Xl in five terms: Xl = 2 l1Θl2 ãα,l (T1 + T2 + T3 + T4 + T5) where T0 := (−li1 + li3)(δµνδρi + δµρδνi + δµiδνρ) + l1νδµρ − l3ρδµν , T1 := (δ α3ρδα2νδα1µ − δα3ρδα2α1δµν + δα3ρδα2µδα1ν)T0, T2 := (−δα2α3δρνδα1µ + δα2α3δα1ρδµν − δα2α3δρµδα1ν)T0, T3 := (δ α3νδα2ρδα1µ − δα3νδα1ρδα2µ + δα3νδρµδα1α2)T0, T4 := (−δα1α3δα2ρδµν + δα1α3δρνδα2µ − δα1α3δρµδα2ν)T0, T5 := (δ α3µδα2ρδα1ν − δα3µδρνδα1α2 + δα3µδα1ρδα2ν)T0. With the shorthand p := −l1 − 2l3, q := 2l1 + l3, r := −p− q = −l1 + l3, we compute each Ti, and find 3T1 = δ α1α2(2− 2m)pα3 + δα3α1qα2 − δα2α1qα3 + δα3α2qα1 + δα3α2rα1 − δα2α1rα3 + δα3α1rα2 , 3T2 = (2 m − 2)δα2α3pα1 − 2mδα2α3qα1 − 2mδα2α3rα1 , 3T3 = δ α1α3pα2 − δα2α3pα1 + δα1α2pα3 + 2mδα2α1qα3 + δα3α2rα1 − δα3α1rα2 + δα1α2rα3 , 3T4 = −δα1α32mpα2 − δα1α32mqα2 + δα1α3(2m − 2)rα2 , 3T5 = δ α1α3pα2 − δα1α2pα3 + δα3α2pα1 + δα3α2qα1 − δα1α2qα3 + δα3α1qα2 + (2− 2m)δα1α2rα3 . Thus, Xl = 2 m 2π2 l1.Θl2 ãα,l (q α3δα1α2 + rα2δα1α3 + pα1δα2α3) (61) −(A+)3 = i 2c (S1 + S2 + S3), where S1, S2 and S3 correspond to respectively q α3δα1α2 , rα2δα1α3 and pα1δα2α3 . In S1, we permute the li variables the following way: l1 7→ l3, l2 7→ l1, l3 7→ l2. Therefore, l3.Θ l1 7→ l3.Θ l1 and q 7→ r. With a similar permutation of the αi, we see that S1 = S2. We apply the same principles to prove that S1 = S3 (using permutation l1 7→ l2, l2 7→ l3, l3 7→ l1). 
Thus, −(A+)3 = i 2c ãα,l e l1.Θl2 (l1 − l2)α3δα1α2 = S4 − S5, where S4 correspond to l1 and S5 to l2. We permute the li variables in S5 the following way: l1 7→ l2, l2 7→ l1, l3 7→ l3, with a similar permutation on the αi. Since l1.Θ l2 7→ −l1.Θ l2, we finally get −(A+)3 = −4c aα1,l1 aα2,l2 aα3,−l1−l2 sin l1.Θl2 lα31 δ α1α2 . (iii) Lemma 6.11 entails that A++++ = Res (l1,l2,l3)∈(Zn)3 fµ,α(s, l)Tr γ µ,α where θ := l1.Θl2 + l1.Θl3 + l2.Θl3, Tr γµ,α := Tr(γα4γµ4γα3γµ3γα2γµ2γα1γµ1), fµ,α(s, l) := θ kµ1 (k+l1)µ2 (k+ bl2)µ3 (k+ bl3)µ4 |k|s+2|k+l1|2|k+bl2|2|k+bl3|2 ãα,l, ãα,l := aα1,l1 aα2,l2 aα3,l3 aα4,−l1−l2−l3 . Using (16) and Corollary 2.13 successively, we find fµ,α(s, l) ∼ θ kµ1kµ2kµ3kµ4 |k|s+2|k+l1|2|k+l1+l2|2|k+l1+l2+l3|2 ãα,l ∼ θ kµ1kµ2kµ3kµ4 |k|s+8 ãα,l. Since the function θ kµ1kµ2kµ3kµ4 |k|s+8 ãα,l satisfies (H2), Lemma 2.14 entails that −(A+)4 = (l1,l2,l3)∈(Zn)3 ãα,l Res ′ kµ1kµ2kµ3kµ4 |k|s+8 Tr γµ,α =: Therefore, with (19), we get Xl = ãα,l e (A+B + C), where A := Tr(γα4γµ4γα3γµ4γ α2γµ2γα1γµ2), B := Tr(γα4γµ4γα3γµ2γα2γµ4γ α1γµ2), C := Tr(γα4γµ4γα3γµ2γ α2γµ2γα1γµ4). Using successively {γµ, γν} = 2δµν and γµγµ = 2m 12m , we see that A = C = 4 Tr(γα4γα3γα2γα1), B = −4 Tr(γα4γα3γα1γα2) + Tr(γα4γα2γα3γα1) Thus, A+B +C = 8 2m δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1 , and ãα,l δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1 . (62) By (62), we get ∫ −(A+)4 = 2c (−2T1 + T2 + T3), where T1 := l1,...,l4 aα4,l4 aα3,l3 aα2,l2 aα1,l1 e δα4α2 δα3α1 , T2 := l1,...,l4 aα4,l4 aα3,l3 aα2,l2 aα1,l1 e δα4α3 δα2α1 , T3 := l1,...,l4 aα4,l4 aα3,l3 aα2,l2 aα1,l1 e δα4α1 δα3α2 . We now proceed to the following permutations of the li variables in the T1 term : l1 7→ l2, l2 7→ l1, l3 7→ l4, l4 7→ l3. While i li is invariant, θ is modified : θ 7→ l2.Θl1 + l2.Θl4 + l1.Θl4. With δ0, in factor, we can let l4 be −l1 − l2 − l3, so that θ 7→ −θ. We also permute the αi in the same way. Thus, l1,...,l4 aα3,l3 aα4,l4 aα1,l1 aα2,l2 e δα3α1 δα4α2 . 
Therefore, 2T1 = 2 l1,...,l4 aα4,l4 aα3,l3 aα2,l2 aα1,l1 cos δα4α2 δα3α1 . (63) The same principles are applied to T2 and T3. Namely, the permutation l1 7→ l1, l2 7→ l3, l3 7→ l2, l4 7→ l4 in T2 and the permutation l1 7→ l2, l2 7→ l3, l3 7→ l1, l4 7→ l4 in T3 (the αi variables are permuted the same way) give l1,...,l4 aα4,l4aα3,l3aα2,l2 aα1,l1 e δα4α2 δα3α1 , l1,...,l4 aα4,l4 aα3,l3aα2,l2 aα1,l1 e δα4α2 δα3α1 where φ := l1.Θ l2 + l1.Θ l3 − l2.Θ l3. Finally, we get −(A+)4 = 4c l1,...,l4 aα1,l4 aα2,l3 a − cos θ l1,...,l3 aα1,−l1−l2−l3 aα2,l3 a l1.Θ(l2+l3) sin l2.Θl3 . (64) (iv) Suppose q = 2. By Lemma 6.11, we get − Aσ = Res λσfα,µ(s, l)Tr(γ α2γµ2γα1γµ1) where fα,µ(s, l) := ′ kµ1 (k+l)µ2 |k|s+2|k+l|2 eiη k.Θl ãα,l and η := 1 (σ1 − σ2) ∈ {−1, 1}. As in the proof of (i), since the presence of the phase does not change the fact that r(s, l) satisfies (H1), we get fα,µ(s, l) ∼ f1(s, l)− f2(s, l) + f3(s, l) where f1(s, l) = ′ kµ1(k+l)µ2 |k|s+4 eiη k.Θl ãα,l, f2(s, l) = ′ kµ1(k+l)µ2 (2k.l+|l| |k|s+6 eiη k.Θl ãα,l, f3(s, l) = ′ kµ1(k+l)µ2 (2k.l+|l| |k|s+8 eiη k.Θl ãα,l. Suppose that l = 0. Then f2(s, 0) = f3(s, 0) = 0 and Proposition 2.1 entails that f1(s, 0) = kµ1kµ2 |k|s+4 ãα,0 is holomorphic at 0 and so is fα,µ(s, 0). Since 1 Θ is diophantine, Theorem 2.5 3 gives us the result. Suppose q = 3. Then Lemma 6.11 implies that − Aσ = Res l∈(Zn)2 fµ,α(s, l) Tr(γ µ3γα3 · · · γµ1γα1) where fµ,α(s, l) := ik.Θ(ε1l1+ε2l2)e σ2l1.Θl2 kµ1 (k+l1)µ2 (k+l1+l2)µ3 |k|s+2|k+l1|2|k+l1+l2|2 ãα,l, and εi := (σi − σ3) ∈ {−1, 0, 1}. By hypothesis (ε1, ε2) 6= (0, 0). There are six possibilities for the values of (ε1, ε2), corresponding to the six possibilities for the values of σ: (−,−,+), (−,+,+), (+,−,+), (+,+,−), (−,+,−), and (+,−,−). As in (ii), we see that fµ,α(s, l) ∼ ′ eik.Θ(ε1l1+ε2l2)kµ1 (k+l1)µ2 (k+ bl2)µ3 |k|s+6 ′ eik.Θ(ε1l1+ε2l2)kµ1 (k+l1)µ2 (k+ bl2)µ3 (2k.l1+2k. |k|s+8 λσ ãα,l e σ2l1.Θl2 . 
With Z := {(l1, l2) : ε1l1 + ε2l2 = 0}, Theorem 2.5 (iii) entails that l∈(Zn)2\Z fµ,α(s, l) is holomorphic at 0. To conclude we need to prove that g(σ) := fµ,α(s, l) Tr(γ µ3γα3 · · · γµ1γα1) is holomorphic at 0. By definition, λσ = iσ1σ2σ3 and as a consequence, we check that g(−,−,+) = −g(+,+,−), g(+,−,+) = −g(+,−,−), g(−,+,+) = −g(−,+,−), which implies that σ g(σ) = 0. The result follows. Suppose finally that q = 4. Again, Lemma 6.11 implies that − Aσ = Res l∈(Zn)3 fµ,α(s, l) Tr(γ µ4γα4 · · · γµ1γα1) where fµ,α(s, l) := i=1 εili e (σ2l1.Θl2+σ3(l1+l2).Θl3) kµ1 (k+l1)µ2 (k+l1+l2)µ3 (k+l1+l2+l3)µ4 |k|s+2|k+l1|2|k+l1+l2|2|k+l1+l2+l3|2 ãα,l and εi := (σi − σ4) ∈ {−1, 0, 1}. By hypothesis (ε1, ε2, ε3) 6= (0, 0, 0). There are fourteen pos- sibilities for the values of (ε1, ε2, ε3), corresponding to the fourteen possibilities for the values of σ: (−,−,−,+), (−,−,+,+), (−,+,−,+), (+,−,−,+), (−,+,+,+), (+,−,+,+), (+,+,−,+), (+,+,+,−), (−,−,+,−), (−,+,−,−), (+,−,−,−), (−,+,+,−), (+,−,+,−) and (+,+,−,−). As in (ii), we see that, with the shorthand θσ := σ2l1.Θl2 + σ3(l1 + l2).Θl3, fµ,α(s, l) ∼ i=1 εili e θσ kµ1kµ2kµ3kµ4 |k|s+8 ãα,l =: gµ,α(s, l) . With Zσ := {(l1, l2, l3) : i=1 εili = 0}, Theorem 2.5 (iii), the series l∈(Zn)3\Zσ fµ,α(s, l) is holomorphic at 0. To conclude, we need to prove that g(σ) := gµ,α(s, l) Tr(γ µ4γα4 · · · γµ1γα1) = 0. Let C be the set of the fourteen values of σ and C7 be the set of the seven first values of σ given above. Lemma 6.7 implies ∑ g(σ) = 2 g(σ). Thus, in the following, we restrict to these seven values. Let us note Fµ(s) := kµ1kµ2kµ3kµ4 |k|s+8 so that g(σ) = Res Fµ(s)λσ θσ ãα,l Tr(γ µ4γα4 · · · γµ1γα1). Recall from (62) that Fµ(s)Tr(γ µ4γα4 · · · γµ1γα1) = 2c δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1 As a consequence, we get, with ãα,l := aα1,l1 · · · aα4,l4 , g(σ) = 2cλσ l∈(Zn)4 θσ ãα,l δP4 i=1 li,0 i=1 εili,0 δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1 =: 2cλσ(T1 + T2 − 2T3). 
We proceed to the following change of variable in T1: l1 7→ l1, l2 7→ l3, l3 7→ l2, l4 7→ l4. Thus, we get θσ 7→ ψσ := σ2l1.Θl3 + σ3(l1 + l3).Θl2, and i=1 εili 7→ ε1l1 + ε3l2 + ε2l3 =: uσ(l). With a similar permutation on the αi, we get l∈(Zn)4 ψσ ãα,l δP4 i=1 li,0 δε1l1+ε3l2+ε2l3,0 δ α4α2δα3α1 . We proceed to the following change of variable in T2: l1 7→ l2, l2 7→ l3, l3 7→ l1, l4 7→ l4. Thus, we get θσ 7→ φσ := σ2l2.Θl3 + σ3(l2 + l3).Θl1, and i=1 εili 7→ ε3l1 + ε1l2 + ε2l3 =: vσ(l). After a similar permutation on the αi, we get l∈(Zn)4 φσ ãα,l δP4 i=1 li,0 δε3l1+ε1l2+ε2l3,0 δ α4α2δα3α1 . Finally, we proceed to the following change of variable in T3: l1 7→ l2, l2 7→ l1, l3 7→ l4, l4 7→ l3. Thus, we get θσ 7→ −θσ, and i=1 εili 7→ (ε2− ε3)l1+(ε1− ε3)l2− ε3l3 =: wσ(l). With a similar permutation on the αi, we get l∈(Zn)4 θσ ãα,l δP4 i=1 li,0 δ(ε2−ε3)l1+(ε1−ε3)l2−ε3l3,0δ α4α2δα3α1 . As a consequence, we get g(σ) = 2c l∈(Zn)4 Kσ(l1, l2, l3) ãα,l δP4 i=1 li,0 δα4α2δα3α1 , where Kσ(l1, l2, l3) = λσ ψσ δuσ(l),0 + e φσ δvσ(l),0 − e θσ δP3 i=1 εili,0 θσ δwσ(l),0 The computation of Kσ(l1, l2, l3) for the seven values of σ yields K−−++(l1, l2, l3) = δl1+l3,0 + δl2+l3,0 − δl1+l2,0 − δl1+l2,0, K−+−+(l1, l2, l3) = δl1+l2,0 + δl1+l2,0 − δl1+l3,0 − δl1+l3,0, K−−++(l1, l2, l3) = δl2+l3,0 + δl1+l3,0 − δl2+l3,0 − δl2+l3,0, K−−−+(l1, l2, l3) = − l1.Θl2δP3 i=1 li,0 l2.Θl1δP3 i=1 li,0 l2.Θl1δP3 i=1 li,0 l1.Θl2δl3,0 K−+++(l1, l2, l3) = − l3.Θl2δl1,0 + e l3.Θl1δl2,0 − e l2.Θl3δl1,0 − e l3.Θl1δl2,0 K+−++(l1, l2, l3) = − l1.Θl2δl3,0 + e l2.Θl1δl3,0 − e l1.Θl3δl2,0 − e l3.Θl2δl1,0 K++−+(l1, l2, l3) = − l1.Θl3δl2,0 + e l2.Θl3δl1,0 − e l1.Θl2δl3,0 − e l2.Θl1δP3 i=1 li,0 Thus, ∑ Kσ(l1, l2, l3) = 2i(δl3,0 − δP3 i=1 li,0 ) sin l1.Θl2 and ∑ g(σ) = i4c l∈(Zn)4 (δl3,0 − δP3 i=1 li,0 ) sin l1.Θl2 ãα,l δP4 i=1 li,0 δα4α2δα3α1 . 
The following change of variables: l1 7→ l2, l1 7→ l2, l3 7→ l4, l4 7→ l3 gives l∈(Zn)4 1 li,0 sin l1.Θl2 ãα,l δP4 1 li,0 δα4α2δα3α1 = − l∈(Zn)4 δl3,0 sin l1.Θl2 ãα,l δP4 1 li,0 δα4α2δα3α1 g(σ) = i8c l∈(Zn)4 δl3,0 sin l1.Θl2 ãα,l δP4 1 li,0 δα4α2δα3α1 . Finally, the change of variables: l2 7→ l4, l4 7→ l2 gives l∈(Zn)4 δl3,0 sin l1.Θl2 ãα,l δP4 1 li,0 δα4α2δα3α1 = − l∈(Zn)4 δl3,0 sin l1.Θl2 ãα,l δP4 1 li,0 δα4α2δα3α1 which entails that g(σ) = 0. Lemma 6.13. Suppose n = 4 and 1 Θ diophantine. For any self-adjoint one-form A, ζDA(0) − ζD(0) = −c τ(Fα1,α2Fα1α2). Proof. By (34) and Lemma 6.6 we get ζDA(0) − ζD(0) = (−1)q σ∈{+,−}q − Aσ. By Lemma 6.12 (iv), we see that the crossed terms all vanish. Thus, with Lemma 6.7, we get ζDA(0)− ζD(0) = 2 (−1)q −(A+)q. (65) By definition, Fα1α2 = i aα2,k kα1 − aα1,k kα2 aα1,k aα2,l [Uk, Ul] (aα2,k kα1 − aα1,k kα2)− 2 aα1,k−l aα2,l sin( τ(Fα1α2F α1α2) = α1, α2=1 (aα2,k kα1 − aα1,k kα2)− 2 l′∈Z4 aα1,k−l′ aα2,l′ sin( k.Θl′ (aα2,−k kα1 − aα1,−k kα2)− 2 l”∈Z4 aα1,−k−l” aα2,l” sin( k.Θl” One checks that the term in aq of τ(Fα1α2F α1α2) corresponds to the term (A+)q given by Lemma 6.12. For q = 2, this is l∈Z4, α1, α2 aα1,l aα2,−l lα1 lα2 − δα1α2 |l|2 For q = 3, we compute the crossed terms: k,k′,l (aα2,k kα1 − aα1,k kα2) a Uk[Uk′ , l] + [Uk′ , Ul]Uk which gives the following a3-term in τ(Fα1α2F α1α2) aα3,−l1−l2 a aα1,l1 sin l1.Θl2 For q = 4, this is aα1,−l1−l2−l3 aα2,l3 a l1.Θ(l2+l3) sin l2.Θl3 which corresponds to the term (A+)4. We get finally, (−1)q −(A+)q = − c τ(Fα1,α2F α1α2). (66) Equations (65) and (66) yield the result. Lemma 6.14. Suppose n = 2. Then, with the same hypothesis as in Lemma 6.11, −(A+)2 = −(A−)2 = 0. (ii) Suppose 1 Θ diophantine. Then − A+A− = − A−A+ = 0. Proof. (i) Lemma 6.11 entails that A++ = Res l∈Z2 −f(s, l) where f(s, l) := kµ1 (k+l)µ2 |k|s+2|k+l|2 ãα,l Tr(γ α2γµ2γα1γµ1) =: fµ,α(s, l)Tr(γ α2γµ2γα1γµ1) and ãα,l := aα1,l aα2,−l. 
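This time, since $n = 2$, it is enough to apply (16) just once to obtain an absolutely convergent series. Indeed, (16) gives
$$f_{\mu,\alpha}(s,l) = \sum_{k}{}' \frac{k_{\mu_1}(k+l)_{\mu_2}}{|k|^{s+4}}\,\tilde a_{\alpha,l} - \sum_{k}{}' \frac{k_{\mu_1}(k+l)_{\mu_2}\,(2k.l+|l|^2)}{|k|^{s+4}\,|k+l|^2}\,\tilde a_{\alpha,l},$$
and the function
$$r(s,l) := \sum_{k}{}' \frac{k_{\mu_1}(k+l)_{\mu_2}\,(2k.l+|l|^2)}{|k|^{s+4}\,|k+l|^2}\,\tilde a_{\alpha,l}$$
is a linear combination of functions of the type $H(s,l)$ satisfying the hypothesis of Corollary 2.13. As a consequence, $r(s,l)$ satisfies (H1) and
$$f_{\mu,\alpha}(s,l) \sim \sum_{k}{}' \frac{k_{\mu_1}(k+l)_{\mu_2}}{|k|^{s+4}}\,\tilde a_{\alpha,l} \sim \sum_{k}{}' \frac{k_{\mu_1}k_{\mu_2}}{|k|^{s+4}}\,\tilde a_{\alpha,l}.$$
Note that the function $(s,l) \mapsto h_{\mu,\alpha}(s,l) := \sum_{k}{}' \frac{k_{\mu_1}k_{\mu_2}}{|k|^{s+4}}\,\tilde a_{\alpha,l}$ satisfies (H2). Thus, Lemma 2.14 yields
$$\operatorname*{Res}_{s=0} \sum_{l\in\mathbb{Z}^2} f(s,l) = \sum_{l\in\mathbb{Z}^2} \operatorname*{Res}_{s=0} h_{\mu,\alpha}(s,l)\,\mathrm{Tr}(\gamma^{\alpha_2}\gamma^{\mu_2}\gamma^{\alpha_1}\gamma^{\mu_1}).$$
By Proposition 2.16, we get $\operatorname*{Res}_{s=0} h_{\mu,\alpha}(s,l) = \delta_{\mu_1\mu_2}\,\pi\,\tilde a_{\alpha,l}$. Therefore,
$$⨍ A^{++} = -\pi \sum_{l\in\mathbb{Z}^2} \tilde a_{\alpha,l}\,\mathrm{Tr}(\gamma^{\alpha_2}\gamma^{\mu}\gamma^{\alpha_1}\gamma_{\mu}) = 0$$
according to (59).

(ii) By Lemma 6.11, we obtain that
$$⨍ A^{-+} = \operatorname*{Res}_{s=0} \sum_{l\in\mathbb{Z}^2} \lambda_\sigma\, f_{\alpha,\mu}(s,l)\,\mathrm{Tr}(\gamma^{\alpha_2}\gamma^{\mu_2}\gamma^{\alpha_1}\gamma^{\mu_1})$$
where $\lambda_\sigma = -(-i)^2 = 1$ and
$$f_{\alpha,\mu}(s,l) := \sum_{k}{}' \frac{k_{\mu_1}(k+l)_{\mu_2}}{|k|^{s+2}\,|k+l|^2}\, e^{i\eta\,k.\Theta l}\,\tilde a_{\alpha,l}, \qquad \eta := \tfrac12(\sigma_1-\sigma_2) = -1.$$
As in the proof of (i), since the presence of the phase does not change the fact that $r(s,l)$ satisfies (H1), we get
$$f_{\alpha,\mu}(s,l) \sim \sum_{k}{}' \frac{k_{\mu_1}(k+l)_{\mu_2}}{|k|^{s+4}}\, e^{i\eta\,k.\Theta l}\,\tilde a_{\alpha,l} =: g_{\alpha,\mu}(s,l).$$
Since $\frac{1}{2\pi}\Theta$ is diophantine, the functions $s \mapsto \sum_{l\in\mathbb{Z}^2\setminus\{0\}} g_{\alpha,\mu}(s,l)$ are holomorphic at $s=0$ by Theorem 2.5 (iii). As a consequence,
$$⨍ A^{-+} = \operatorname*{Res}_{s=0}\, g_{\alpha,\mu}(s,0)\,\mathrm{Tr}(\gamma^{\alpha_2}\gamma^{\mu_2}\gamma^{\alpha_1}\gamma^{\mu_1}) = \operatorname*{Res}_{s=0} \sum_{k}{}' \frac{k_{\mu_1}k_{\mu_2}}{|k|^{s+4}}\,\tilde a_{\alpha,0}\,\mathrm{Tr}(\gamma^{\alpha_2}\gamma^{\mu_2}\gamma^{\alpha_1}\gamma^{\mu_1}).$$
Recall from Proposition 2.1 that $\operatorname*{Res}_{s=0} \sum_{k}{}' \frac{k_i k_j}{|k|^{s+4}} = \delta_{ij}\,\pi$. Thus, again with (59),
$$⨍ A^{-+} = \tilde a_{\alpha,0}\,\pi\,\mathrm{Tr}(\gamma^{\alpha_2}\gamma^{\mu}\gamma^{\alpha_1}\gamma_{\mu}) = 0.$$

Lemma 6.15. Suppose $n = 2$ and $\frac{1}{2\pi}\Theta$ diophantine. For any self-adjoint one-form $A$, $\zeta_{D_A}(0) - \zeta_D(0) = 0$.

Proof. As in Lemma 6.13, we use (34) and Lemma 6.6 so the result follows from Lemma 6.14.

6.1.2 Odd dimensional case

Lemma 6.16. Suppose $n$ odd and $\frac{1}{2\pi}\Theta$ diophantine. Then for any self-adjoint 1-form $A$ and $\sigma \in \{-,+\}^q$ with $2 \le q \le n$, $⨍ A^\sigma = 0$.

Proof. Since $A^\sigma \in \Psi_1(\mathcal{A})$, Lemma 5.11 with $k = n$ gives the result.

Corollary 6.17. With the same hypothesis of Lemma 6.16, for any self-adjoint one-form $A$, $\zeta_{D_A}(0) - \zeta_D(0) = 0$.

Proof.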
As in Lemma 6.13, we use (34) and Lemma 6.6 so the result follows from Lemma 6.16.

6.2 Proof of the main result

Proof of Theorem 6.1. (i) By (5) and Proposition 5.5, we get
$$S(D_A,\Phi,\Lambda) = 4\pi\,\Phi_2\,\Lambda^2 + \Phi(0)\,\zeta_{D_A}(0) + O(\Lambda^{-2}),$$
where $\Phi_2 = \frac12\int_0^\infty \Phi(t)\,dt$. By Lemma 6.15, $\zeta_{D_A}(0) - \zeta_D(0) = 0$ and from Proposition 5.4, $\zeta_D(0) = 0$, so we get the result.

(ii) Similarly, $S(D_A,\Phi,\Lambda) = 8\pi^2\,\Phi_4\,\Lambda^4 + \Phi(0)\,\zeta_{D_A}(0) + O(\Lambda^{-2})$ with $\Phi_4 = \frac12\int_0^\infty \Phi(t)\,t\,dt$. Lemma 6.13 implies that $\zeta_{D_A}(0) - \zeta_D(0) = -c\,\tau(F_{\mu\nu}F^{\mu\nu})$ and, by Proposition 5.4, $\zeta_{D_A}(0) = -c\,\tau(F_{\mu\nu}F^{\mu\nu})$, leading to the result.

(iii) is a direct consequence of (5), Propositions 5.4, 5.5, and Corollary 6.17.

A Appendix

A.1 Proof of Lemma 3.3

(i) We have $|D|\,T\,|D|^{-1} = T + \delta(T)|D|^{-1}$ and $|D|^{-1}T\,|D| = T - |D|^{-1}\delta(T)$. A recurrence proves that for any $k \in \mathbb{N}$,
$$|D|^{k}\,T\,|D|^{-k} = \sum_{q=0}^{k} \binom{k}{q}\,\delta^q(T)\,|D|^{-q}$$
and we get
$$|D|^{-k}\,T\,|D|^{k} = \sum_{q=0}^{k} (-1)^q \binom{k}{q}\,|D|^{-q}\,\delta^q(T).$$
As a consequence, since $T$, $|D|^{-q}$ and $\delta^q(T)$ are in $OP^0$ for any $q \in \mathbb{N}$, for any $k \in \mathbb{Z}$, $|D|^{k}T|D|^{-k} \in OP^0$. Let us fix $p \in \mathbb{N}_0$ and define $F_p(s) := \delta^p(|D|^{s}T|D|^{-s})$ for $s \in \mathbb{C}$. Since for $k \in \mathbb{Z}$, $F_p(k)$ is bounded, a complex interpolation proves that $F_p(s)$ is bounded, which gives $|D|^{s}T|D|^{-s} \in OP^0$.

(ii) Let $T \in OP^\alpha$ and $T' \in OP^\beta$. Thus, $T|D|^{-\alpha}$, $T'|D|^{-\beta}$ are in $OP^0$. By (i) we get $|D|^{\beta}T|D|^{-\alpha}|D|^{-\beta} \in OP^0$, so $T'|D|^{-\beta}|D|^{\beta}T|D|^{-\beta-\alpha} \in OP^0$. Thus, $T'T|D|^{-(\alpha+\beta)} \in OP^0$.

(iii) For $T \in OP^\alpha$, $|D|^{\alpha-\beta}$ and $T|D|^{-\alpha}$ are in $OP^0$, thus $T|D|^{-\beta} = T|D|^{-\alpha}|D|^{\alpha-\beta} \in OP^0$.

(iv) follows from $\delta(OP^0) \subseteq OP^0$.

(v) Since $\nabla(T) = \delta(T)|D| + |D|\delta(T) - [P_0, T]$, the result follows from (ii), (iv) and the fact that $P_0$ is in $OP^{-\infty}$.

A.2 Proof of Lemma 3.6

The non-trivial part of the proof is the stability under the product of operators. Let $T, T' \in \Psi(\mathcal{A})$. There exist $d, d' \in \mathbb{Z}$ such that for any $N \in \mathbb{N}$, $N > |d| + |d'|$, there exist $P, P'$ in $\mathcal{D}(\mathcal{A})$, $p, p' \in \mathbb{N}_0$, $R \in OP^{-N-d'}$, $R' \in OP^{-N-d}$ such that $T = PD^{-2p} + R$, $T' = P'D^{-2p'} + R'$, $PD^{-2p} \in OP^{d}$ and $P'D^{-2p'} \in OP^{d'}$. Thus,
$$TT' = PD^{-2p}P'D^{-2p'} + RP'D^{-2p'} + PD^{-2p}R' + RR'.$$
We also have $RP'D^{-2p'} \in OP^{-N-d'+d'} = OP^{-N}$ and similarly, $PD^{-2p}R' \in OP^{-N}$.
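Since $RR' \in OP^{-2N}$, we get $TT' \sim PD^{-2p}P'D^{-2p'} \mod OP^{-N}$. If $p = 0$, then $TT' \sim QD^{-2p'} \mod OP^{-N}$ where $Q = PP' \in \mathcal{D}(\mathcal{A})$ and $QD^{-2p'} \in OP^{d+d'}$. Suppose $p \neq 0$. A recurrence proves that for any $q \in \mathbb{N}_0$,
$$D^{-2}P' \sim \sum_{k=0}^{q} (-1)^k\,\nabla^k(P')\,D^{-2k-2} + (-1)^{q+1} D^{-2}\,\nabla^{q+1}(P')\,D^{-2q-2} \mod OP^{-\infty}.$$
By Lemma 3.3 (v), the remainder is in $OP^{d'+2p'-q-3}$, since $P' \in OP^{d'+2p'}$. Another recurrence gives for any $q \in \mathbb{N}_0$,
$$D^{-2p}P' \sim \sum_{k_1,\cdots,k_p=0}^{q} (-1)^{|k|_1}\,\nabla^{|k|_1}(P')\,D^{-2|k|_1-2p} \mod OP^{d'+2p'-q-1-2p}.$$
Thus, with $q_N = N + d + d' - 1$,
$$TT' \sim \sum_{k_1,\cdots,k_p=0}^{q_N} (-1)^{|k|_1}\,P\,\nabla^{|k|_1}(P')\,D^{-2|k|_1-2(p+p')} \mod OP^{-N}.$$
The last sum can be written $Q_N D^{-2r_N}$ where $r_N := p\,q_N + (p + p')$. Since $Q_N \in \mathcal{D}(\mathcal{A})$ and $Q_N D^{-2r_N} \in OP^{d+d'}$, the result follows.

A.3 Proof of Proposition 3.11

Let $P \in OP^{k_1}$, $Q \in OP^{k_2}$ be in $\Psi(\mathcal{A})$. With $[Q, |D|^{-s}] = \big(Q - \sigma_{-s}(Q)\big)|D|^{-s}$ and the equivalence
$$Q - \sigma_{-s}(Q) \sim -\sum_{r=1}^{N} g(-s,r)\,\varepsilon^r(Q) \mod OP^{-N-1+k_2},$$
we get
$$P\,[Q, |D|^{-s}] \sim -\sum_{r=1}^{N} g(-s,r)\,P\,\varepsilon^r(Q)\,|D|^{-s} \mod OP^{-N-1+k_1+k_2-\Re(s)}$$
which gives, if we choose $N = n + k_1 + k_2$,
$$\operatorname*{Res}_{s=0} \mathrm{Tr}\big(P\,[Q, |D|^{-s}]\big) = -\sum_{r=1}^{n+k_1+k_2} \operatorname*{Res}_{s=0}\, g(-s,r)\,\mathrm{Tr}\big(P\,\varepsilon^r(Q)\,|D|^{-s}\big).$$
By hypothesis $s \mapsto \mathrm{Tr}\big(P\,\varepsilon^r(Q)\,|D|^{-s}\big)$ has only simple poles. Thus, since $s = 0$ is a zero of the analytic function $s \mapsto g(-s,r)$ for any $r \ge 1$, we have $\operatorname*{Res}_{s=0}\, g(-s,r)\,\mathrm{Tr}\big(P\,\varepsilon^r(Q)\,|D|^{-s}\big) = 0$, which entails that $\operatorname*{Res}_{s=0} \mathrm{Tr}\big(P\,[Q, |D|^{-s}]\big) = 0$ and thus
$$⨍ PQ = \operatorname*{Res}_{s=0} \mathrm{Tr}\big(P\,|D|^{-s}Q\big).$$
When $s \in \mathbb{C}$ with $\Re(s) > 2\max(k_1 + n + 1, k_2)$, the operator $P|D|^{-s/2}$ is trace-class while $|D|^{-s/2}Q$ is bounded, so
$$\mathrm{Tr}\big(P\,|D|^{-s}Q\big) = \mathrm{Tr}\big(|D|^{-s/2}QP\,|D|^{-s/2}\big) = \mathrm{Tr}\big(\sigma_{-s/2}(QP)\,|D|^{-s}\big).$$
Thus, using (29) again,
$$\operatorname*{Res}_{s=0} \mathrm{Tr}\big(P\,|D|^{-s}Q\big) = ⨍ QP + \sum_{r=1}^{n+k_1+k_2} \operatorname*{Res}_{s=0}\, g(-s/2,r)\,\mathrm{Tr}\big(\varepsilon^r(QP)\,|D|^{-s}\big).$$
As before, for any $r \ge 1$, $\operatorname*{Res}_{s=0}\, g(-s/2,r)\,\mathrm{Tr}\big(\varepsilon^r(QP)\,|D|^{-s}\big) = 0$ since $g(0,r) = 0$ and the spectral triple is simple. Finally,
$$\operatorname*{Res}_{s=0} \mathrm{Tr}\big(P\,|D|^{-s}Q\big) = ⨍ QP.$$

Acknowledgments

We thank Pierre Duclos, Emilio Elizalde, Victor Gayral, Thomas Krajewski, Sylvie Paycha, Joe Varilly, Dmitri Vassilevich and Antony Wassermann for helpful discussions and Stéphane Louboutin for his help with Proposition 2.16. A.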
Sitarz would like to thank the CPT-Marseilles for its hospitality and the Université de Provence for its financial support and acknowledge the support of Alexander von Humboldt Foundation through the Humboldt Fellowship.

References

[1] A. L. Carey, J. Phillips, A. Rennie and F. A. Sukochev, “The local index formula in semifinite von Neumann algebras I: Spectral flow”, Advances in Math. 202 (2006), 415–516.
[2] L. Carminati, B. Iochum and T. Schücker, “Noncommutative Yang-Mills and noncommutative relativity: a bridge over troubled water”, Eur. Phys. J. C 8 (1999), 697–709.
[3] A. Chamseddine and A. Connes, “The spectral action principle”, Commun. Math. Phys. 186 (1997), 731–750.
[4] A. Chamseddine and A. Connes, “Inner fluctuations of the spectral action”, J. Geom. and Phys. 57 (2006), 1–21.
[5] A. Chamseddine, A. Connes and M. Marcolli, “Gravity and the standard model with neutrino mixing”, [arXiv:hep-th/0610241].
[6] A. Connes, “C∗-algèbres et géométrie différentielle”, C. R. Acad. Sci. Paris 290 (1980), 599–604.
[7] A. Connes, “Noncommutative differential geometry”, Pub. Math. IHÉS, 39 (1985), 257–
[8] A. Connes, Noncommutative Geometry, Academic Press, London and San Diego, 1994.
[9] A. Connes, “Geometry from the spectral point of view”, Lett. Math. Phys., 34 (1995), 203–238.
[10] A. Connes, “Noncommutative geometry and reality”, J. Math. Phys. 36 (1995), 6194–6231.
[11] A. Connes, Cours au Collège de France, January 2001.
[12] A. Connes and G. Landi, “Noncommutative manifolds, the instanton algebra and isospectral deformations”, Commun. Math. Phys. 221 (2001), 141–159.
[13] A. Connes and H. Moscovici, “The local index formula in noncommutative geometry”, Geom. And Funct. Anal. 5 (1995), 174–243.
[14] A. Edery, “Multidimensional cut-off technique, odd-dimensional Epstein zeta functions and Casimir energy of massless scalar fields”, J. Phys. A: Math. Gen. 39 (2006), 678–712.
[15] E. Elizalde, S. D. Odintsov, A. Romeo, A. A. Bytsenko and S. Zerbini, Zeta Regularization Techniques with Applications, Singapore: World Scientific, 1994.
[16] R. Estrada, J. M. Gracia-Bondía and J. C. Várilly, “On summability of distributions and spectral geometry”, Commun. Math. Phys. 191 (1998), 219–248.
[17] V. Gayral, “Heat-kernel approach to UV/IR Mixing on isospectral deformation manifolds”, Ann. H. Poincaré 6 (2005), 991–1023.
[18] V. Gayral and B. Iochum, “The spectral action for Moyal plane”, J. Math. Phys. 46 (2005), no. 4, 043503, 17 pp.
[19] V. Gayral, B. Iochum and J. C. Várilly, “Dixmier traces on noncompact isospectral deformations”, J. Funct. Anal. 237 (2006), 507–539.
[20] V. Gayral, B. Iochum and D. Vassilevich, “Heat kernel and number theory on NC-torus”, Commun. Math. Phys. 273 (2007), 415–443.
[21] P. B. Gilkey, Asymptotic Formulae in Spectral Geometry, Chapman & Hall/CRC, Boca Raton, FL, 2004.
[22] A. de Goursac, J.-C. Wallet and R. Wulkenhaar, “Noncommutative induced gauge theory”, Eur. Phys. J. C51 (2007), 977–988.
[23] J. M. Gracia-Bondía, J. C. Várilly and H. Figueroa, Elements of Noncommutative Geometry, Birkhäuser Advanced Texts, Birkhäuser, Boston, 2001.
[24] V. W. Guillemin, S. Sternberg and J. Weitsman, “The Ehrhart function for symbols”, arXiv:math.CO/06011714.
[25] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, Clarendon, Oxford, 1979.
[26] N. Higson, “The local index formula in noncommutative geometry”, Lectures given at the School and Conference on Algebraic K-theory and its applications, Trieste, 2002.
[27] T. Kato, Perturbation Theory For Linear Operators, Springer–Verlag, Berlin-Heidelberg-New York, 1980.
[28] M. Knecht and T. Schücker, “Spectral action and big desert”, Phys. Lett. B640 (2006), 272–277.
[29] R. Nest, E. Vogt and W. Werner, “Spectral action and the Connes–Chamseddine model”, p. 109–132 in Noncommutative Geometry and the Standard Model of Elementary Particle Physics, F. Scheck, H. Upmeier and W.
Werner (Eds.), Lecture Notes in Phys., 596, Springer, Berlin, 2002.
[30] M. A. Rieffel, “C∗-algebras associated with irrational rotations”, Pac. J. Math. 93 (1981), 415–429.
[31] M. A. Rieffel, Deformation Quantization for Actions of Rd, Memoirs Amer. Soc. 506, Providence, RI, 1993.
[32] L. Schwartz, Méthodes mathématiques pour les sciences physiques, Hermann, Paris, 1979.
[33] B. Simon, Trace ideals and their applications, London Math. Lecture Note Series, Cambridge University Press, Cambridge, 1979.
[34] W. van Suijlekom, Private communication.
[35] A. Strelchenko, “Heat kernel of non-minimal gauge field kinetic operators on Moyal plane”, Int. J. Mod. Phys. A22 (2007), 181–202.
[36] D. V. Vassilevich, “Non-commutative heat kernel”, Lett. Math. Phys. 67 (2004), 185–194.
[37] D. V. Vassilevich, “Heat kernel, effective action and anomalies in noncommutative theories”, JHEP 0508 (2005), 085.
[38] D. V. Vassilevich, “Induced Chern–Simons action on noncommutative torus”, [arXiv:hep-th/0701017].

ABSTRACT

The spectral action on the noncommutative torus is obtained, using a Chamseddine–Connes formula via computations of zeta functions. The importance of a Diophantine condition is outlined. Several results on holomorphic continuation of series of holomorphic functions are obtained in this context.

<|endoftext|><|startoftext|>

Introduction

The late-stage behavior of a material undergoing a first-order phase transition (due to changes in temperature and/or pressure, for example) is characterized by thermodynamic instability resolved through phase separation and consequent coarsening of the emerging phase. In the case of the new phase occupying a much smaller volume fraction, and thus appearing as well-separated particles, this coarsening process (known as Ostwald ripening) is driven by the minimization of surface energy at the interface via diffusional mass exchange between particles, while the total mass or volume of each phase is conserved. The result of this kind of mass diffusion from regions of high to regions of low interfacial curvature is the growth of large particles and the shrinkage and final extinction of smaller ones.
For a review of some aspects of Ostwald ripening, mainly from the physical and modeling viewpoint, see the survey by Voorhees [21] or the book by Ratke and Voorhees [18].

In this coarsening scenario the mass-diffusion process can be controlled by two different mechanisms: either by the diffusion of atoms away from the particles and into the bulk, or by the reaction rate of attachment of atoms at the phase interface. In the former case (diffusion control), the random exchange of atoms between the particles and the bulk is sufficiently rapid and the surrounding of each particle is in thermal equilibrium with the atoms in it; in the latter (interface-reaction control), detachment and attachment are slow compared to diffusion and the surrounding bulk can be out of equilibrium with the particle interface. We refer to the physics literature for more details, for example, Slezov and Sagalovich [19], Bartelt, Theis, and Tromp [3]; for a related mathematical treatment see Dai and Pego [5].

The classical theory for Ostwald ripening was developed by Lifshitz and Slyozov [9] and Wagner [22] in the case of supersaturated solid solutions in three dimensions. The Lifshitz–Slyozov–Wagner theory statistically characterizes the evolution by the particle-radius density $n(t, R)$, where $n(t, R)\,dR$ is defined to be the number of particles with radii between $R$ and $R + dR$ at time $t$ per unit volume. In the late stages of the phase transition nucleation and coalescence of particles can be neglected since new nuclei dissolve immediately and since particles cannot merge because of the large distances between them. Thus, the particle-radius density satisfies the continuity equation (see [18, §5.1])
\[ \partial_t n(t, R) + \partial_R \bigl( v(t, R)\, n(t, R) \bigr) = 0, \]
where $v(t, R)$ denotes the growth rate of particles of radius $R$ at time $t$. Using a mean-field ansatz (cf. Section 3), Lifshitz, Slyozov, and Wagner formally calculate
\[ \partial_t n(t, R) + \partial_R \Bigl( \frac{1}{R^2} (R\bar{u} - 1)\, n(t, R) \Bigr) = 0, \qquad \bar{u}(t) = \frac{\int_0^\infty n(t, R)\, dR}{\int_0^\infty R\, n(t, R)\, dR}, \]
in the diffusion-controlled case, and
\[ \partial_t n(t, R) + \partial_R \Bigl( \Bigl( \bar{u} - \frac{1}{R} \Bigr) n(t, R) \Bigr) = 0, \qquad \bar{u}(t) = \frac{\int_0^\infty R\, n(t, R)\, dR}{\int_0^\infty R^2 n(t, R)\, dR}, \]
in the reaction-controlled one, both results valid in the limit of vanishing mass or volume fraction of particles.

This work was supported by the DFG through the Graduiertenkolleg RTG-1128 “Analysis, Numerics, and Optimization of Multiphase Problems” at the Humboldt-Universität zu Berlin.
http://arxiv.org/abs/0704.0565v4
APOSTOLOS DAMIALIS

In [11] and [12] Niethammer rigorously derived the effective equations in the diffusion-controlled case, starting from a quasi-static one-phase Stefan problem with surface tension and kinetic undercooling,
\[ -\Delta u = 0 \ \text{in } \Omega \setminus G, \qquad V = \nabla u \cdot n \ \text{on } \partial G, \qquad u = H + \beta V \ \text{on } \partial G, \tag{1.1} \]
and restricting it to spherical particles. The same was also done in [11] for the full time-dependent parabolic problem but without the kinetic-drag term $\beta V$. Here, $u$ is a chemical potential, $n$ is the outer normal to the particle phase $G$, $V$ is the normal velocity of the phase interface $\partial G$, and $H$ is its mean curvature. The domain $\Omega \subset \mathbb{R}^3$ is considered bounded and $\beta$ is a parameter that comes from the nondimensionalization and scales like diffusivity over mobility. The second boundary condition is the Gibbs–Thomson law, coupling the curvature of the interface with the chemical potential, modified by accounting for kinetic drag. Note that while under diffusion control the parameter $\beta$ is small and the kinetic drag can even be neglected (thus yielding the well-known Mullins–Sekerka model [10]), in the reaction-controlled case the values of $\beta$ are large and, therefore, the kinetic-drag term is necessary. For a derivation of such sharp-interface free-boundary problems from continuum mechanics and thermodynamics see the book of Gurtin [8].
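Returning to the reaction-controlled mean-field law stated earlier in this introduction, the following minimal sketch integrates a finite system of radii with growth rate ū − 1/R and the discrete mean field ū = ΣR_i / ΣR_i², which is the discrete analogue of the ratio of integrals. The function names, the explicit Euler stepping, and the sample radii are assumptions made purely for illustration and are not part of the paper's analysis; the extinction of particles is deliberately not handled.

```python
def mean_field(radii):
    """Discrete analogue of u_bar = (int R n dR) / (int R^2 n dR)."""
    return sum(radii) / sum(r * r for r in radii)


def euler_step(radii, dt):
    """One explicit Euler step of dR_i/dt = u_bar - 1/R_i (reaction control)."""
    u_bar = mean_field(radii)
    return [r + dt * (u_bar - 1.0 / r) for r in radii]


def simulate(radii, dt, steps):
    """Integrate over a short time span; assumes no radius reaches zero."""
    for _ in range(steps):
        radii = euler_step(radii, dt)
    return radii


def volume(radii):
    """Total particle volume up to the constant factor 4*pi/3."""
    return sum(r ** 3 for r in radii)


def area(radii):
    """Total interfacial area up to the constant factor 4*pi."""
    return sum(r ** 2 for r in radii)


if __name__ == "__main__":
    start = [0.8, 1.0, 1.2, 1.5]  # hypothetical initial radii
    end = simulate(start, dt=1e-4, steps=1000)
    print("volume before/after:", volume(start), volume(end))
    print("area before/after:  ", area(start), area(end))
```

With this choice of mean field the first-order change of ΣR_i³ cancels exactly in every step, so the volume drifts only through the time-discretization error, while ΣR_i² decreases and radii below 1/ū shrink.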
The goal in the following is to use the techniques developed in [11] and [12] to derive the effective equations in the reaction-controlled case. This involves passing over to a different time scale incorporating the parameter β tending to infinity (see Section 2) and, as a result, some extra manipulations in the proofs. Apart from the scaling, in Section 2 we also give short proofs of some useful preliminaries and discuss the validity of the mean-field description, while in Section 3 we prove pointwise estimates for approximate solutions and for the growth rates of particles. Finally, using these estimates, in Section 4 we pass to the homogenization limit of infinitely many particles and obtain a weak form of the Lifshitz–Slyozov–Wagner equation.

THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS

In comparison with the results in the diffusion-controlled case, we make precise that the crucial quantity that has to vanish in order to neglect direct interactions between particles and justify the expected mean-field law is the surface-area density of the particles, in contrast to their capacity in the other case (see [11] and [12]). This difference is of interest since the asymptotic limits of vanishing surface area and capacity have different physical interpretations and further refine the naïve general limit of vanishing mass or volume. For the reaction-controlled case, though, the result is in some sense to be expected since the limit of vanishing surface-area density corresponds to the physics of the interface-reaction-controlled scenario, where there is an obvious dependence on the area of the interface.

2. Formulation, scaling, and preliminary estimates

We start with problem (1.1) where the quasi-static approximation to the parabolic diffusion equation is justified by the small interfacial velocities present during late-stage coarsening. (See the discussion in Mullins and Sekerka [10].)
We further suppose that the solid phase consists of spherical particles with centers fixed in space, a simplification that can be justified by the work of Alikakos and Fusco [1], [2], and Velázquez [20]. Denoting these particles as $B_i$, where each $B_i$ is the closed ball $B(x_i, R_i(t))$, the particle phase is then the union $\cup B_i$ and its isotropic evolution can be modeled by averaging the flux in the Stefan condition, i.e.,
\[ V = \dot{R}_i(t) := \fint_{\partial B_i} \nabla u \cdot n, \]
where the average integral is defined as
\[ \fint_D f := \frac{1}{|D|} \int_D f \]
for a function $f$ on some domain $D$, and where the overdot denotes a derivative with respect to time; the Gibbs–Thomson law then becomes
\[ u = \frac{1}{R_i} + \beta \dot{R}_i \quad \text{on } \partial B_i, \]
since in the case of spheres the mean curvature is the inverse radius.

To have many small particles in a bounded domain, for a system with size of order $O(1)$, say the unit cube $[0, 1]^3$, let $\delta$ be the typical particle distance with $0 < \delta \ll 1$. For the distribution of particle centers in space, we assume, for simplicity, that they are situated on a three-dimensional lattice of spacing $\delta$. Then, the initial number density of particles $N_i(\delta)$ will be bounded by $1/\delta^3$, and for the particles to be small let the typical particle size be $\delta^\alpha$ for $\alpha > 1$. For times $t \in [0, T]$ we choose $\delta$ small enough so that adjacent particles of size $\delta^\alpha$ will not collide during the evolution up to a maximal time $T$.

Concerning the assumption on the spatial distribution of particles, a more general assumption like $\inf_{i \neq j} |x_i - x_j| > c\delta$, for a constant $c > 0$, would still be enough for our purposes in this work. These considerations will also be used in the proof of Lemma 3.2 where we approximate a certain sum over all particles by an integral. For an approach using more sophisticated deterministic and stochastic assumptions on the distribution of particles with respect to homogenization we refer to Niethammer and Velázquez [16], [17], where further refinements of the theory are also made.
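The powers of δ implied by this lattice setup can be tabulated with a few lines of code. The sketch below is only an illustration: the function name and dictionary keys are hypothetical, the number of particles is replaced by its upper bound 1/δ³, and every particle is given the typical size δ^α.

```python
import math


def lattice_scalings(delta, alpha):
    """Powers of delta implied by lattice spacing delta and particle size delta**alpha."""
    number_bound = delta ** -3   # upper bound 1/delta^3 on the number of particles in [0, 1]^3
    radius = delta ** alpha      # typical (unscaled) particle size
    return {
        "number_bound": number_bound,
        "radius": radius,
        # distance left between the surfaces of two neighbouring particles
        "gap": delta - 2.0 * radius,
        # total particle volume in the unit cube, of order delta^(3*alpha - 3)
        "volume_fraction": number_bound * 4.0 / 3.0 * math.pi * radius ** 3,
        # total interfacial area in the unit cube, of order delta^(2*alpha - 3)
        "surface_area": number_bound * 4.0 * math.pi * radius ** 2,
    }


if __name__ == "__main__":
    for alpha in (1.5, 2.0):
        for delta in (0.1, 0.01):
            print(alpha, delta, lattice_scalings(delta, alpha))
```

Under these assumptions the total volume vanishes with δ for every α > 1, whereas the total surface area scales like δ^(2α−3) and therefore stays of order one at α = 3/2.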
To have particle sizes of order $O(1)$ as well, we rescale
\[ R_i^\delta := \frac{R_i}{\delta^\alpha}, \]
and motivated by the scaling invariance of problem (1.1) (cf. [5]),
\[ u^\delta := \delta^\alpha u, \qquad t^\delta := \frac{t}{\delta^{2\alpha}}. \]
Notice that this rescaling is another way of addressing the reaction-controlled regime. Instead of rescaling time by $\beta$ and then letting $\beta$ tend to infinity, we keep $\beta$ fixed and positive, and rescale as above, letting $\delta$ tend to zero. Since now $\beta$ plays no significant role, we will set it to unity in what follows. In addition, one easily sees that the transformations $R_i^\delta$, $u^\delta$, and $t^\delta$ preserve the form of the equations. From here on we also drop the superscript $\delta$ from the notation for time and, to denote the dependence on the new scale, we write $B_i^\delta := B(x_i, \delta^\alpha R_i^\delta)$. Finally, note that under diffusion control the relevant scale for time would be $\delta^{3\alpha}$ instead of $\delta^{2\alpha}$. This difference is key to all that follows, leading to different considerations on the validity of the mean-field model. (Cf. the remarks following Lemma 2.1.)

As initial data, for every particle center $x_i$ we associate a corresponding bounded initial radius $R_i^\delta(0)$ with the assumption that $\sup_{i \in N_i} R_i^\delta(0) \le R_0$, uniformly for some constant $R_0$. To consider a closed system, we impose a no-flux Neumann boundary condition on the outer boundary of $\Omega$, i.e., $\nabla u^\delta \cdot n = 0$ on $\partial\Omega$. In case the $i$th particle vanishes at time $t_i := \sup\{t \mid R_i^\delta(t) > 0\}$, for times later than $t_i$ we define $R_i^\delta$ to be zero, reduce the number $N(t) := \{j \mid R_j^\delta(t) > 0\}$ of active particles by one, and neglect the boundary $\partial B_i^\delta$ in the boundary conditions. In the following, all sums, unions, and suprema will run over the set $N(t)$, with $N(0) \equiv N_i$, and any further reference to the particle-number density will mean the active particle-number density $N$ unless otherwise noted.
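The statement that the rescaling preserves the form of the equations can be checked directly. The following worked computation is a sketch under stated assumptions: the rescaled quantities are taken to be $R_i = \delta^\alpha R_i^\delta$, $u = \delta^{-\alpha} u^\delta$, and $t = \delta^{2\alpha} t^\delta$, the space variable $x$ is left unscaled, and the overdot on a rescaled radius denotes $d/dt^\delta$.

```latex
\begin{align*}
\frac{dR_i}{dt}
  &= \frac{\delta^{\alpha}}{\delta^{2\alpha}} \frac{dR_i^\delta}{dt^\delta}
   = \delta^{-\alpha} \dot{R}_i^\delta, \\
u = \frac{1}{R_i} + \beta \frac{dR_i}{dt}
  &\iff \delta^{-\alpha} u^\delta
      = \delta^{-\alpha} \Bigl( \frac{1}{R_i^\delta} + \beta \dot{R}_i^\delta \Bigr)
   \iff u^\delta = \frac{1}{R_i^\delta} + \beta \dot{R}_i^\delta, \\
\frac{dR_i}{dt} = \frac{1}{4\pi R_i^2} \int_{\partial B_i} \nabla u \cdot n
  &\iff \delta^{-\alpha} \dot{R}_i^\delta
      = \frac{\delta^{-\alpha}}{4\pi \delta^{2\alpha} (R_i^\delta)^2}
        \int_{\partial B_i^\delta} \nabla u^\delta \cdot n
   \iff \dot{R}_i^\delta
      = \frac{1}{4\pi \delta^{2\alpha} (R_i^\delta)^2}
        \int_{\partial B_i^\delta} \nabla u^\delta \cdot n .
\end{align*}
```

Laplace's equation is unaffected because $u^\delta$ differs from $u$ only by a constant factor. The common factor $\delta^{-\alpha}$ cancels in both boundary relations precisely because time is measured in units of $\delta^{2\alpha}$.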
Summarizing, the restricted and rescaled problem for the particle radii can be considered as a nonlocal, $N$-dimensional system of ordinary differential equations
\[ \dot{R}_i^\delta(t) = \frac{1}{4\pi \delta^{2\alpha} R_i^\delta(t)^2} \int_{\partial B_i^\delta(t)} \nabla u^\delta \cdot n, \tag{2.1} \]
for times $t \in (0, t_i)$, $t_i < T$, and with bounded initial data $R_i^\delta(0)$ for every $i$, while the chemical potential is determined by
\[ -\Delta u^\delta(t, x) = 0 \quad \text{in } \Omega \setminus \cup B_i^\delta(t), \tag{2.2} \]
\[ u^\delta(t, x) = \frac{1}{R_i^\delta(t)} + \dot{R}_i^\delta(t) \quad \text{on } \partial B_i^\delta(t), \tag{2.3} \]
and the Neumann condition on the outer boundary.

Global existence and uniqueness of continuous, piecewise-smooth solutions for a similar restricted Stefan problem were proved in [12] by an application of the Picard–Lindelöf theorem, the only difference being the different time scale. These solutions are not globally smooth due to the singularities arising from the extinction of particles; however, they are smooth in the intervals between the extinction times $t_i$. In the following, when we mention solutions of the problem we will mean such continuous, piecewise-smooth solutions that exist up to any given time $T$.

It is easy to see that equations (2.1), (2.2), (2.3), along with the outer boundary condition, conserve the volume and decrease the interfacial area of the particle phase. Indeed, differentiating the total volume of particles with respect to time gives
\[ \frac{d}{dt} \sum_i R_i^\delta(t)^3 = 3 \sum_i R_i^\delta(t)^2 \dot{R}_i^\delta(t) = 3 \sum_i \frac{R_i^\delta(t)^2}{4\pi \delta^{2\alpha} R_i^\delta(t)^2} \int_{\partial B_i^\delta} \nabla u^\delta \cdot n = 0, \]
where the last sum vanishes due to the divergence theorem, equation (2.2), and the no-flux condition on $\partial\Omega$. The decrease of total surface area follows from the next a priori estimate.

Lemma 2.1. For any time $t \in (0, T)$, the solutions of the problem satisfy the following energy equality:
\[ \int_0^t \sum_i (R_i^\delta)^2 |\dot{R}_i^\delta|^2 + \frac{1}{2} \sum_i R_i^\delta(t)^2 + \frac{1}{4\pi \delta^{2\alpha}} \int_0^t \int_{\Omega \setminus \cup B_i^\delta} |\nabla u^\delta|^2 = \frac{1}{2} \sum_i R_i^\delta(0)^2. \]

Proof. Multiplying $-\Delta u^\delta = 0$ with $u^\delta$, integrating over $\Omega \setminus \cup B_i^\delta$, and integrating by parts gives
\[ \int_{\Omega \setminus \cup B_i^\delta} |\nabla u^\delta|^2 + \sum_i \int_{\partial B_i^\delta} (\nabla u^\delta \cdot n)\, u^\delta - \int_{\partial\Omega} (\nabla u^\delta \cdot n)\, u^\delta = 0, \]
where the last term vanishes due to the Neumann condition on the outer boundary.
Thus, using equations (2.3) and (2.1) we get
\[ \int_{\Omega \setminus \cup B_i^\delta} |\nabla u^\delta|^2 = -\sum_i \Bigl( \frac{1}{R_i^\delta} + \dot{R}_i^\delta \Bigr) \int_{\partial B_i^\delta} \nabla u^\delta \cdot n = -\sum_i \Bigl( \frac{1}{R_i^\delta} + \dot{R}_i^\delta \Bigr) 4\pi \delta^{2\alpha} (R_i^\delta)^2 \dot{R}_i^\delta = -4\pi \delta^{2\alpha} \sum_i \Bigl( R_i^\delta \dot{R}_i^\delta + (R_i^\delta)^2 |\dot{R}_i^\delta|^2 \Bigr) \]
and after rearranging,
\[ \sum_i (R_i^\delta)^2 |\dot{R}_i^\delta|^2 + \sum_i R_i^\delta \dot{R}_i^\delta + \frac{1}{4\pi \delta^{2\alpha}} \int_{\Omega \setminus \cup B_i^\delta} |\nabla u^\delta|^2 = 0. \tag{2.4} \]
The result follows from an integration over time. □

After normalization with respect to the initial particle-number density $N_i$, this energy equality can yield useful information on the validity of the mean-field approach. In fact, we have
\[ \frac{1}{N_i} \int_0^t \sum_i (R_i^\delta)^2 |\dot{R}_i^\delta|^2 + \frac{1}{2N_i} \sum_i R_i^\delta(t)^2 + \frac{1}{4\pi N_i \delta^{2\alpha}} \int_0^t \int_{\Omega \setminus \cup B_i^\delta} |\nabla u^\delta|^2 = \frac{1}{2N_i} \sum_i R_i^\delta(0)^2, \]
where the right-hand side is uniformly bounded by the assumption on the initial radii. For the left-hand side to stay bounded as well, if the quantity $N_i \delta^{2\alpha}$ tends to zero, the same must hold for $|\nabla u^\delta|$ and it is exactly this limit of vanishing surface-area density of particles that results in a mean field that is constant in space since, in particular,
\[ \nabla u^\delta \to 0 \quad \text{in } L^2\bigl(0, T; H^1(\Omega)\bigr). \]
Here and in the following, to obtain global estimates that are uniform in $\delta$ we extend $u^\delta$ to the interior of particles, and thus to the whole of $\Omega$, by its boundary values. It is important to note that in our scaling setup, for the surface area to vanish as $\delta$ tends to zero, the exponent $\alpha$ must be strictly larger than 3/2 since $N_i$ is $O(1/\delta^3)$. These facts will be made precise in Corollary 3.3 where we give an estimate of the mean-field effect. Note also that we do not address here the critical case $\alpha = 3/2$ that corresponds to finite surface area. For that one would have to use the different methods developed by Niethammer and Otto in [13].

Finally, note that for similar considerations under diffusion control, the corresponding quantity would be the capacity $N_i \delta^\alpha$ due to the different time scale. In three dimensions, this capacity effect fits general homogenization results as in the work of Cioranescu and Murat [4]; to our knowledge, though, the surface-area effect has not been explicitly discussed in the relevant literature.

3.
Approximation and growth-rate estimates As in the mean-field ansatz of Lifshitz, Slyozov, and Wagner, we suppose that the system is dilute enough so that particles behave as if they were isolated and we base our approximation on the solution of a single-particle problem. Consider problem (2.1), (2.2), (2.3) for a single spherical particle centered at the origin and with initially unscaled radius r that we rescale as rδ := r/δα, along with the corresponding reaction-controlled rescalings for a chemical potential uδr and time, as in Section 2. For this rescaled particle Bδr we consider the following problem in the whole space: ṙδ(t) = 4πδ2αrδ(t)2 ∇uδr · n on ∂Bδr , where the chemical potential uδr(t, x) satisfies −∆uδr(t, x) = 0 in R3 \Bδr , uδr(t, x) = rδ(t) + ṙδ(t) for x ∈ ∂Bδr , and the mean-field assumption is posed as a condition at infinity, i.e., |x|→∞ uδr(t, x) = ū r(t). This problem can be explicitly solved to give uδr(t, x) = ū r(t) + δαrδ(t) 1 + δαrδ(t) 1− ūδr(t)rδ(t) ṙδ(t) = 1 + δαrδ(t) ūδr(t)− rδ(t) Note that in the formal limit of δ tending to zero, the expected effective equations take the general form u(t, x) = ū(t) and ṙ = ū− 1 as in the reaction-controlled Lifshitz–Slyozov–Wagner theory. Going now back to the many-particle problem, a calculation using the single- particle growth rate above along with the requirement that the volume is conserved gives the following expression for the mean field ūδ = 1 + δαRδi (Rδi ) 1 + δαRδi . (3.1) THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 7 The effect of this mean field plus a sum of single-particle solutions will be the monopole approximation to the solution uδ supposing that there are no direct interactions between particles. To this end, let us define the approximate solution ζδ(t, x) := ūδ(t) + δαRδi (t) 1 + δαRδi (t) 1− ūδ(t)Rδi (t) |x− xi| (3.2) for x ∈ Ω \ ∪Bδi (t). Below is a maximum principle tailored to our setting that will be used to compare the approximation and the solution in the lemma next. 
Its proof can be found in [12]. Lemma 3.1. Let Ω be a Lipschitz domain and let ∪Bi ⊂ Ω be a finite collection of disjoint closed balls. Then, a function v which is constant on each of the boundaries ∂Bi and satisfies −∆v = 0 in Ω \ ∪Bi, v − ci ∇v · n ≥ 0 on ∂Bi, ∇v · n ≥ 0 on ∂Ω, where ci ≥ 0 for all i, also satisfies v ≥ 0 in Ω \ ∪Bi. Lemma 3.2. For any time t ∈ (0, T ) and small positive ε, the chemical potential and its approximation satisfy ‖uδ − ζδ‖L∞(Ω\∪Bδ )(t) ≤ Cδ2α−3−ε supRδi (t) 1 + ūδ(t) supRδi (t) Proof. Since the difference uδ − ζδ is already harmonic in Ω \ ∪Bδi as ζδ is a su- perposition of fundamental solutions, we would like to estimate to what extent it satisfies the maximum principle’s boundary conditions. For the condition on the particle boundaries, we use equations (2.1), (2.3), and the definition of ζδ to calculate for x on the boundary ∂Bδi of the ith particle, ζδ(t, x)−− ∇ζδ · n uδ(t, x)−− ∇uδ · n ζδ − 1 ∇ζδ · n ūδ − 1 δαRδj 1 + δαRδj (1− ūδRδj) |x− xj | |x− xj | and since by the divergence theorem there holds for j 6= i, |x− xj | · n = 0, while for j = i, |x− xi| · n = − 1 δα(Rδi ) 8 APOSTOLOS DAMIALIS we continue the calculation to get ūδ − 1 1− ūδRδi Rδi (1 + δ αRδi ) δαRδj 1 + δαRδj (1− ūδRδj) |x− xj | j 6=i δαRδj 1 + δαRδj (1− ūδRδj) |x− xj | ≤ δ2α−3 supRδj (1 + ūδ supRδj ) j 6=i |x− xj | ≤ Cδ2α−3 supRδj(1 + ūδ supRδj). (3.3) In the last step, keeping in mind the assumptions on the spatial distribution of particle centers, the sum is bounded for j 6= i since it is considered as a Riemann- sum approximation to the integral |x− y| which in turn is bounded using radial symmetry around the singularity and where the factor δ3 in the sum compensates for the scaling in space. 
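The boundedness of the lattice sum used in the last step can be checked numerically. The sketch below assumes particle centers on a cubic lattice of spacing δ = 1/n in the unit cube and evaluates δ³ Σ_{j≠i} 1/|x_i − x_j| for a center x_i near the middle of the cube; the function name and the choice of x_i are illustrative assumptions, not part of the proof.

```python
import math


def lattice_sum(n):
    """delta^3 * sum_{j != i} 1/|x_i - x_j| on an n x n x n lattice in [0, 1]^3."""
    delta = 1.0 / n
    i0 = n // 2  # index of a center close to the middle of the cube
    total = 0.0
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if (a, b, c) == (i0, i0, i0):
                    continue  # skip the singular term j = i
                dist = delta * math.sqrt((a - i0) ** 2 + (b - i0) ** 2 + (c - i0) ** 2)
                total += delta ** 3 / dist
    return total


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, lattice_sum(n))
```

As the lattice is refined the values should settle near the integral of 1/|x − y| over the cube, which is finite because the singularity is integrable in three dimensions.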
To further fulfil the maximum principle’s outer boundary condition on ∂Ω, we consider the comparison function ζδ+zδ, where the auxiliary function zδ solves the problem −∆zδ = ∇ζδ · n in Ω, ∇zδ · n = −∇ζδ · n on ∂Ω, zδ = 0, (3.4) such that the comparison function ζδ + zδ has zero normal derivative on ∂Ω. To work with the maximum principle, zδ also needs to be harmonic in Ω and for that we need that the integral ∇ζδ · n vanishes. But, ∇ζδ · n = δα δαRδi 1 + δαRδi (1−Rδi ūδ) |x− xi| · n , where the last integral equals −4π, independent of i. Thus, zδ is harmonic if and only if ūδ = 1 + δαRδi (Rδi ) 1 + δαRδi which is exactly the mean field (3.1) as dictated by the single-particle ansatz in the beginning of the section. Moreover, since now zδ is harmonic, the divergence theorem further gives ∇zδ · n = 0. A construction as in Lemma 3 of [11] and elliptic regularity theory (see Gilbarg and Trudinger [7]) give the estimate ‖zδ‖L∞(Ω) ≤ Cεδ2α−3−ε supRδi (1 + ūδ supRδi ), where ε is a small positive number. THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 9 Let us now apply the maximum principle to the function f+ := u δ − ζδ − zδ + Cδ2α−3−ε supRδi (1 + ūδ supRδi ). For a large enough constant C, say 2Cε, the following hold for f+: it is harmonic, there holds ∇f+ · n = 0 on ∂Ω by the construction of zδ, and for the constants ci = 1/4πδ 2α(Rδi ) 2, estimate (3.3) gives uδ − ζδ − zδ + Cδ2α−3−ε supRδi (1 + ūδ supRδi )− ci ∇(uδ − ζδ − zδ) · n ≥ 0. Thus, f+ satisfies the maximum principle’s conditions and therefore, f+ ≥ 0 in Ω \ ∪Bδi , i.e., uδ − ζδ − zδ ≥ −Cδ2α−3−ε supRδi (1 + ūδ supRδi ). Using the maximum principle with −v instead of v, the function f− := u δ − ζδ − zδ − Cδ2α−3−ε supRδi (1 + ūδ supRδi ), again satisfies the corresponding conditions and, as above, yields f− ≤ 0 in Ω\∪Bδi , i.e., uδ − ζδ − zδ ≤ Cδ2α−3−ε supRδi (1 + ūδ supRδi ). 
Combining the last two inequalities, we get ‖uδ − ζδ − zδ‖L∞(Ω\∪Bδ ) ≤ Cδ2α−3−ε supRδi (1 + ūδ supRδi ) and the lemma follows by the triangle inequality using the regularity of zδ. � In the previous lemma it is clear that our approach excludes the critical case α = 3/2. In the following we introduce, for technical reasons, a new exponent γ > 0 with the property δγ := max {δα, δ2α−3, δ2α−3−ε} for each α greater than 3/2 + ε. As a corollary to the previous lemma we can now estimate the effect of the mean field. Corollary 3.3. For any time t ∈ (0, T ) and γ > 0, the chemical potential and the mean field satisfy ‖uδ − ūδ‖L∞(Ω\∪Bδ )(t) ≤ Cδγ 1 + 2 supRδi (t) 1 + ūδ(t) supRδi (t) Proof. By the triangle inequality and Lemma 3.2 there holds ‖uδ − ūδ‖L∞(Ω\∪Bδ ) ≤ ‖ζδ − ūδ‖L∞(Ω\∪Bδ ) + Cδ 2α−3−ε supRδj(1 + ū δ supRδj). To estimate ‖ζδ− ūδ‖L∞(Ω\∪Bδ ), by the definition of ζ δ there holds for x ∈ Ω\∪Bδj , ζδ(t, x)− ūδ(t) δαRδj 1 + δαRδj (1− ūδRδj) |x− xj | δαRδi 1 + δαRδi (1 + ūδRδi ) |x− xi| j 6=i δαRδj 1 + δαRδj (1− ūδRδj) |x− xj | and since |x− xi| ≥ δαRδi in Ω \ ∪Bδj , arguing as in estimate (3.3) gives α(1 + ūδRδi ) 1 + δαRδi + Cδ2α−3 supRδj(1 + ū δ supRδj) ≤ C(δα + δ2α−3 supRδj )(1 + ūδ supRδj), 10 APOSTOLOS DAMIALIS thus, ‖ζδ − ūδ‖L∞(Ω\∪Bδ ) ≤ C(δα + δ2α−3 supRδj )(1 + ūδ supRδj), and finally, ‖uδ − ūδ‖L∞(Ω\∪Bδ ) ≤ C(δα + δ2α−3 supRδj + δ2α−3−ε supRδj )(1 + ūδ supRδj). Using the exponent γ, we get ‖uδ − ūδ‖L∞(Ω\∪Bδ ) ≤ Cδγ(1 + 2 supRδi )(1 + ūδ supRδi ). � The following lemma gives an estimate for the growth rate of particles in accor- dance with the reaction-controlled Lifshitz–Slyozov–Wagner theory. Lemma 3.4. For any time t ∈ (0, T ) and γ > 0, for the growth rates of particles holds Ṙδi − ūδ − 1 ≤ Cδγ(1 + ūδ supRδi ) 1 + (1 + δγ supRδi )(1 + 2 supR Proof. 
Let wδi be the capacity potential of the ball B i with respect to a larger ball Bλδi := B(xi, λδ αRδi ) for λ > 1, i.e., let wi solve −∆wδi = 0 in Bλδi \Bδi , wδi = 0 on ∂B wδi = 1 in B (3.5) An explicit calculation gives wδi = 1− λδ |x− xi| (3.6) and also ∇wδi · n = ∇wδi · n = 4π 1− λδ αRδi . (3.7) Using equations (2.1), (2.2), (2.3), and the Neumann boundary condition, along with the above properties of wδi , and integrating by parts, gives 4πδ2α(Rδi ) 2Ṙδi = ∇uδ · n wδi∇uδ · n ∇wδi∇uδ uδ∇wδi · n − uδ∇wδi · n + Ṙδi ∇wδi · n − uδ∇wδi · n δαRδi + Ṙδi − ūδ (uδ − ūδ)∇wδi · n , THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 11 where in the last equation we used (3.7) and added and subtracted ūδ. After rearranging, we have Ṙδi − ūδ − 1 δαRδi Ṙ 4πλδαRδi (uδ − ūδ)∇wδi · n ≤ λ− 1 δαRδi |Ṙδi |+ 4πλδαRδi (uδ − ūδ)∇wδi · n ≤ δαRδi |Ṙδi |+ ‖uδ − ūδ‖L∞(Ω\∪Bδ ), (3.8) where in the last step we again used equation (3.7). But by using equation (2.3) for uδ on ∂Bδi we have Rδi |Ṙδi | ≤ 1 +Rδi |uδ| ≤ 1 +Rδi (‖uδ − ūδ‖L∞(Ω\∪Bδ ) + ū δ). (3.9) Substituting back in (3.8) and using Corollary 3.3 gives the final estimate. � The next lemma ensures that the bounds in the approximation and the growth- rate estimates are indeed uniform. Lemma 3.5. For any time t ∈ (0, T ), the mean field and the radii of the particles are uniformly bounded, i.e., ūδ(t) ≤ C and supRδi (t) ≤ C. Proof. For the mean field (3.1) holds ūδ = 1 + δαRδi (Rδi ) 1 + δαRδi ≤ supRδi (1 + δα supRδi ) (Rδi ) and since by Hölder’s inequality (Rδi ) (Rδi ) δ3(Rδi ) conservation of the total volume of particles gives ūδ ≤ C supRδi (1 + δα supRδi ). or, using the exponent γ, ūδ ≤ C supRδi (1 + δγ supRδi ). (3.10) Consider now the set t | supRδi (t) ≤ then, for times t ∈ A, plugging (3.10) in estimate (3.9) and using Corollary 3.3 gives (Rδi ) 2 ≤ C sup(Rδi )2 + C. 
Integrating over the time interval (0, T ), Gronwall’s inequality implies that supi supt∈A∩[0,T ] (R 2 ≤ C(T ), therefore, [0, T ] ⊂ A, i.e., the radii are bounded up to time T as is the mean field by estimate (3.10). � Finally, the following lemma gives control over the growth rates of vanishing particles and will prove useful for some regularity considerations in the next section. 12 APOSTOLOS DAMIALIS Lemma 3.6. For any time t ∈ (0, T ) such that Rδi (t) ≤ 1/4 supt,δ ūδ(t) and for sufficiently small δ, there holds ≤ Ṙδi ≤ − and √ ti − t ≤ Rδi ≤ 2 ti − t. Proof. For δαRδi 1 + δαRδi |x− xi| it can be verified that the function uδ−g satisfies the assumptions of the maximum principle in Lemma 3.1 for the constants ci = 1/4πδ 2α(Rδi ) 2, thus yielding uδ ≥ g in Ω \ ∪Bδi . But since uδ = g on the boundary ∂Bδi , monotonicity implies that ∇uδ · n ≥ ∇g · n on ∂Bδi and taking the average integrals over ∂Bδi we have Ṙδi ≥ − Rδi (1 + δ αRδi ) ≥ − 2 Moreover, Lemma 3.4 gives Ṙδi ≤ ūδ − + Cδγ(1 + ūδ supRδi ) 1 + (1 + δγ supRδi )(1 + 2 supR Using now the assumption that Rδi ≤ 1/4 supt,δ ūδ and since from Lemma 3.5 it follows that for sufficiently small δ the O(δγ) term is uniformly bounded by 1/4Rδi , we get Ṙδi ≤ ≤ − 1 Let now y1 := ti − t, y2 := 2 ti − t be sub- and supersolutions that respectively solve ẏ1 = − , ẏ2 = − By comparison, we get the lemma’s second assertion, i.e., y1 ≤ Rδi ≤ y2. � 4. Homogenization In order to pass to the homogenization limit of infinitely-many particles, we need first describe the particle-radius density in the limit. To that end, define at any time t ∈ (0, T ) the empirical measure νδt as 〈φ, νδt 〉 = t, Rδi (t) dνδt := t, Rδi (t) for φ ∈ Cc, i.e., for functions φ(t, R) continuous and compactly supported in the radius variable. Using now the estimates from the previous section, we can prove the following Lemma 4.1. 
For a subsequence δ → 0 and for a function ū ∈ W 1, p(0, T ), for p < 2, holds ūδ → ū in L2(0, T ), uδ → ū in L2 0, T ;H1(Ω) THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 13 Furthermore, the measures νδt converge to a family νt of probability measures such φdνδt → φa(t) dνt uniformly in t, where a(t) denotes the percentage of active particles in the limit. Proof. As a consequence of Lemma 3.6, we have sup ‖Ṙδi ‖Lp(0,T ) ≤ C(p) for p < 2, thus, conservation of volume and boundedness of the radii give for p < 2, Lp(0,T ) ≤ C sup ‖Ṙδi ‖Lp(0,T ) ≤ C. Therefore, ūδ ∈ W 1, p(0, T ), for p < 2, and the compactness following from the Rellich–Kondrachov theorem gives that ūδ converges to a limit ū in L2. Taking into consideration that uδ is extended to the whole of Ω and using the lemmas in the previous section, ζδ converges to ū in L2(Ω) and uδ − ζδ converges uniformly to 0 as δ → 0, therefore, uδ converges to ū in L2(Ω). By the energy equality in Lemma 2.1 we have further control over ‖∇uδ‖L2 and thus, we have strong convergence in L2(0, T ;H1(Ω)). For the measures νδt holds ‖νδt ‖ := sup‖φ‖Cc≤1|〈φ, ν t 〉| ≤ 1 in the norm of (Cc) ∗, so for a subsequence δ → 0 there holds that νδt converges weakly-* to νt. Furthermore, for positive functions φ the limit measure νt is non- negative and from this it follows that νt becomes zero if there are no particles left in the system. Choosing now a function ψ(t) that depends only on time, we calculate ψ(t) dνδt = ψ(t). The ratio N/Ni is the percentage of active particles at time t. This ratio is bounded by 1 and decreasing, therefore it is uniformly bounded in the space BV (0, T ) and by the compact embedding of BV (0, T )∩L∞(0, T ) in L2(0, T ), it converges in L2, for a subsequence δ → 0, to a limit a ∈ BV (0, T ). If we project now the measure νt to the interval [0, T ], we get that the projection satisfies proj[0, T ] νt = a(t) dt and according to [6, Ch. 1, Thm. 
10], the decomposition and convergence to νt follow from the slicing of measures. � We conclude with the following theorem which states that the limit measure νt satisfies the Lifshitz–Slyozov–Wagner equation in a weak sense. Note that in the theorem’s statement, the initial condition is defined as t, Rδi (t) dνδ0 := 0, Rδi (0) Theorem 4.2. The measure νt satisfies the Lifshitz–Slyozov–Wagner equation in the sense that φ(t, R) + ū− 1 φ(t, R) a(t) dνt + φ(0, R) dν0 = 0, (4.1) 14 APOSTOLOS DAMIALIS for all smooth and compactly supported functions φ ∈ C∞c ([0, T )× R+), where the mean field ū is given by R dνt R2 dνt. Proof. We begin by computing the mean-field limit ū(t). For a continuous function φ(t) there holds, by the definition of ūδ, 1 + δαR dνδt = ∑ R2i 1 + δαRi 1 + δαRi 1 + δαR dνδt . Taking the limit δ → 0 on both sides, Lemma 4.1 gives R dνt R2 dνt. Consider now a smooth and compactly supported function φ as in the theorem’s statement. Then, the fundamental theorem of calculus and Lemma 3.4 give t, Rδi (t) dνδt + 0, Rδi (0) t, Rδi (t) + Ṙδi (t) t, Rδi (t) dνδt + 0, Rδi (0) t, Rδi (t) ūδ − 1 t, Rδi (t) dνδt +O(δ 0, Rδi (0) dνδ0 . The result follows by taking the limit for a subsequence δ → 0 and using the strong convergence of ūδ. � As a concluding remark, we note that the well-posedness (existence, uniqueness, and continuous dependence on initial data) of the weak formulation (4.1) can be treated by the methods developed by Niethammer and Pego in [14] and [15]. Acknowledgments Thanks are due to Barbara Niethammer for her substantial help and to Nick Alikakos and Bob Pego for helpful discussions. Thanks are also due to the anony- mous referee for a careful reading of the manuscript. References [1] N. D. Alikakos and G. Fusco. The equations of Ostwald ripening for dilute systems. J. Stat. Phys. 95 No. 5/6 (1999), pp. 851–866. [2] N. D. Alikakos and G. Fusco. Ostwald ripening for dilute systems under quasistationary dynamics. Comm. Math. Phys. 238 No. 
3 (2003), pp. 429–479. [3] N. C. Bartelt, W. Theis, and R. M. Tromp. Ostwald ripening of two-dimensional islands on Si(001). Phys. Rev. B 54 (1996), pp. 11741–11751. [4] D. Cioranescu and F. Murat. A strange term coming from nowhere. In Topics in the math- ematical modelling of composite materials, A. Cherkaev, R. Kohn eds. Birkhäuser, Boston, MA, 1997, pp. 45–94. THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 15 [5] S. Dai and R. L. Pego. Universal bounds on coarsening rates for mean-field models of phase transitions. SIAM J. Math. Anal. 37 No. 2 (2005), pp. 347–371. [6] L. C. Evans. Weak convergence methods for nonlinear partial differential equations. American Mathematical Society, Providence, RI, 1990. [7] D. Gilbarg and N. S. Trudinger. Elliptic partial differential equations of second order. Springer-Verlag, Berlin, second edition, 1983. [8] M. E. Gurtin. Thermomechanics of evolving phase boundaries in the plane. The Clarendon Press, Oxford, 1993. [9] I. M. Lifshitz and V. V. Slyozov. The kinetics of precipitation from supersaturated solid solutions. J. Phys. Chem. Solids 19 (1961), pp. 35–50. [10] W. W. Mullins and R. F. Sekerka. Morphological stability of a particle growing by diffusion or heat flow. J. Appl. Phys. 34 No. 2 (1963), pp. 323–329. [11] B. Niethammer. Derivation of the LSW-theory for Ostwald ripening by homogenization meth- ods. Arch. Rat. Mech. Anal. 147 (1999), pp. 119–178. [12] B. Niethammer. The LSW model for Ostwald ripening with kinetic undercooling. Proc. R. Soc. Edinburgh 130A No. 6 (2000), pp. 1337–1361. [13] B. Niethammer and F. Otto. Ostwald ripening: The screening length revisited. Calc. Var. 13 No. 1 (2001), pp. 33–68. [14] B. Niethammer and R. L. Pego. On the initial-value problem in the Lifshitz–Slyozov–Wagner theory of Ostwald ripening. SIAM J. Math. Anal. 31 No. 3 (2000), pp. 467–485. [15] B. Niethammer and R. L. Pego. Well-posedness for measure transport in a family of nonlocal domain coarsening models. Indiana Univ. Math. 
J. 54 No. 2 (2005), pp. 499–530. [16] B. Niethammer and J. J. L. Velázquez. Homogenization in coarsening systems I: Deterministic case. Math. Mod. Meth. Appl. Sci. 14 No. 8 (2004), pp. 1211–1233. [17] B. Niethammer and J. J. L. Velázquez. Homogenization in coarsening systems II: Stochastic case. Math. Mod. Meth. Appl. Sci. 14 No. 9 (2004), pp. 1–24. [18] L. Ratke and P. W. Voorhees. Growth and coarsening: Ostwald ripening in material process- ing. Springer-Verlag, Berlin, 2002. [19] V. V. Slezov and V. V. Sagalovich. Diffusive decomposition of solid solutions. Sov. Phys. Usp. 30 No. 1 (1987), pp. 23–45. [20] J. J. L. Velázquez. On the effect of stochastic fluctuations in the dynamics of the Lifshitz– Slyozov–Wagner model. J. Stat. Phys. 99 No. 1/2 (2000), pp. 231–252. [21] P. W. Voorhees. The theory of Ostwald ripening. J. Stat. Phys. 38 No. 1/2 (1985), pp. 231– [22] C. Wagner. Theorie der Alterung von Niederschlägen durch Umlösen. Z. Elektrochem. 65 No. 7/8 (1961), pp. 581–591. Institut für Mathematik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany E-mail address: damialis@mathematik.hu-berlin.de Current address: Department of Mathematics, University of Athens, Panepistemiopolis, 15784 Athens, Greece mailto:damialis@mathematik.hu-berlin.de 1. Introduction 2. Formulation, scaling, and preliminary estimates 3. Approximation and growth-rate estimates 4. Homogenization Acknowledgments References ABSTRACT We rigorously derive a weak form of the Lifshitz-Slyozov-Wagner equation as the homogenization limit of a Stefan-type problem describing reaction-controlled coarsening of a large number of small spherical particles. Moreover, we deduce that the effective mean-field description holds true in the particular limit of vanishing surface-area density of particles. <|endoftext|><|startoftext|> Introduction 1 1.1 Canonical AZD hcan . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Supercanonical AZD ĥcan . . . . . . . . . . . . . . . . . . . . 
. . 3 1.3 Variation of the supercanonical AZD ĥcan . . . . . . . . . . . . . 5 2 Proof of Theorem 1.7 6 2.1 Upper estimate of K̂Am . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 Lower estimate of K̂Am . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Independence of ĥcan,A from hA . . . . . . . . . . . . . . . . . . 9 2.4 Completion of the proof of Theorem 1.7 . . . . . . . . . . . . . . 10 2.5 Comparison of hcan and ĥcan . . . . . . . . . . . . . . . . . . . . 10 3 Variation of ĥcan under projective deformations 11 3.1 Construction of ĥcan on a family . . . . . . . . . . . . . . . . . . 12 3.2 Semipositivity of the curvature current of ĥm,A . . . . . . . . . . 13 3.3 Uniqueness of ĥcan,A for singular hA’s . . . . . . . . . . . . . . . 16 3.4 Case dimS > 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.5 Completion of the proof of Theorem 1.10 . . . . . . . . . . . . . 17 4 Appendix 20 1 Introduction Let X be a smooth projective variety and let KX be the canonical bundle of X . In algebraic geometry, the canonical ring R(X,KX) := ⊕∞m=0Γ(X,OX(mKX)) is one of the main object to study. http://arxiv.org/abs/0704.0566v5 Let X be a smooth projective variety such that KX is pseudoeffective. The purposes of this article are twofold. The first purpose is to construct a singular hermitian metric ĥcan on KX such that 1. ĥcan is uniquely determined by X . ĥcan is semipositive in the sense of current. 3. H0(X,OX(mKX)⊗I(ĥmcan)) ≃ H0(X,OX(mKX)) holds for every m ≧ 0, where I(ĥmcan) denotes the multiplier ideal sheaf of ĥmcan as is defined in [N]. And the second purpose is to study the behavior of ĥcan on projective families. We may summerize the 2nd and the 3rd conditions by introducing the following notion. Definition 1.1 (AZD)([T1, T2]) Let M be a compact complex manifold and let L be a holomorphic line bundle on M . A singular hermitian metric h on L is said to be an analytic Zariski decomposition (AZD in short), if the followings hold. 1. 
$\Theta_h$ is a closed positive current. 2. For every $m \geq 0$, the natural inclusion $H^0(M,\mathcal{O}_M(mL)\otimes\mathcal{I}(h^m)) \to H^0(M,\mathcal{O}_M(mL))$ is an isomorphism.

Remark 1.2 A line bundle $L$ on a projective manifold $X$ admits an AZD if and only if $L$ is pseudoeffective ([D-P-S, Theorem 1.5]). □

In this sense, the first purpose of this article is to construct an AZD on $K_X$ depending only on $X$, when $K_X$ is pseudoeffective (by Remark 1.2 this is the minimum requirement for the existence of an AZD). The main motivation to construct such a singular hermitian metric is to study the canonical ring in terms of it. This is indeed possible. For example, we obtain the invariance of plurigenera under smooth projective deformations (cf. Corollary 1.12). In fact the hermitian metric constructed here is useful in many other contexts. Other applications and a generalization to subKLT pairs will be treated in the forthcoming papers ([T6]).

I would like to express thanks to Professor Bo Berndtsson who pointed out an error in the previous version.

1.1 Canonical AZD $h_{can}$

If we make the stronger assumption that $X$ has nonnegative Kodaira dimension, we already know how to construct a canonical AZD for $K_X$. Let us review the construction in [T5].

Theorem 1.3 ([T5]) Let $X$ be a smooth projective variety with nonnegative Kodaira dimension. We set for every point $x \in X$
\[ K_m(x) := \sup\Bigl\{ |\sigma|^{\frac{2}{m}}(x) \; ; \; \sigma \in \Gamma(X,\mathcal{O}_X(mK_X)), \ \Bigl|\int_X (\sigma\wedge\bar{\sigma})^{\frac{1}{m}}\Bigr| = 1 \Bigr\} \]
and
\[ K_\infty(x) := \limsup_{m\to\infty} K_m(x). \]
Then
\[ h_{can} := \text{the lower envelope of } K_\infty^{-1} \]
is an AZD on $K_X$. □

Remark 1.4 By the ring structure of $R(X,K_X)$, we see that $\limsup_{m\to\infty} K_m(x) = \sup_{m\geqq 1} K_m(x)$ holds. □

Remark 1.5 Since $h_{can}$ depends only on $X$, the volume $\int_X h_{can}^{-1}$ is an invariant of $X$. □

Apparently this construction is very canonical, i.e., $h_{can}$ depends only on the complex structure of $X$. We call $h_{can}$ the canonical AZD of $K_X$. But this construction works only if we know a priori that the Kodaira dimension of $X$ is nonnegative. This is the main defect of $h_{can}$. For example, $h_{can}$ is useless for solving the abundance conjecture.
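Remark 1.4 can be checked directly from the definition of $K_m$. The following is only a sketch, assuming the normalization $|\int_X(\sigma\wedge\bar\sigma)^{1/m}| = 1$ in Theorem 1.3; the auxiliary integer $\ell$ is introduced here purely for illustration.

```latex
% Sketch: \limsup_m K_m(x) = \sup_m K_m(x) via tensor powers.
% Take \sigma \in \Gamma(X,\mathcal{O}_X(mK_X)) with
% |\int_X (\sigma\wedge\bar\sigma)^{1/m}| = 1 and an integer \ell \geq 1 (assumed notation).
\begin{align*}
\sigma^{\otimes\ell} &\in \Gamma(X,\mathcal{O}_X(\ell m K_X)), \\
\Bigl|\int_X \bigl(\sigma^{\otimes\ell}\wedge\overline{\sigma^{\otimes\ell}}\bigr)^{\frac{1}{\ell m}}\Bigr|
  &= \Bigl|\int_X (\sigma\wedge\bar\sigma)^{\frac{1}{m}}\Bigr| = 1, \\
|\sigma^{\otimes\ell}|^{\frac{2}{\ell m}}(x) &= |\sigma|^{\frac{2}{m}}(x).
\end{align*}
% Taking the supremum over \sigma gives K_{\ell m}(x) \geq K_m(x) for every \ell,
% so the limsup dominates each K_m(x); the reverse inequality is trivial:
\[
\sup_{m\geq 1} K_m(x) \;\leq\; \limsup_{m\to\infty} K_m(x) \;\leq\; \sup_{m\geq 1} K_m(x).
\]
```

The only structure used is that tensor powers of pluricanonical sections are again pluricanonical sections, which is the ring structure of $R(X,K_X)$ mentioned in the remark.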
1.2 Supercanonical AZD $\hat{h}_{can}$

To avoid the defect of $h_{can}$ we introduce the new AZD $\hat{h}_{can}$. Let us use the following terminology.

Definition 1.6 Let $(L,h_L)$ be a singular hermitian line bundle on a complex manifold $X$. $(L,h_L)$ is said to be pseudoeffective, if the curvature current of $h_L$ is semipositive. □

Let $X$ be a smooth projective $n$-fold such that the canonical bundle $K_X$ is pseudoeffective. Let $A$ be a sufficiently ample line bundle such that for every pseudoeffective singular hermitian line bundle $(L,h_L)$ on $X$, $\mathcal{O}_X(A+L)\otimes\mathcal{I}(h_L)$ and $\mathcal{O}_X(K_X+A+L)\otimes\mathcal{I}(h_L)$ are globally generated. Such an ample line bundle $A$ exists by $L^2$-estimates. Let $h_A$ be a $C^\infty$ hermitian metric on $A$ with strictly positive curvature (footnote 1). Let us fix a $C^\infty$ volume form $dV$ on $X$. By the $L^2$-extension theorem ([O]) we may and do assume that $A$ is sufficiently ample so that for every $x \in X$ and for every pseudoeffective singular hermitian line bundle $(L,h_L)$, there exists a bounded interpolation operator
\[ I_x : A^2(x,(A+L)_x,h_A\cdot h_L,\delta_x) \to A^2(X,A+L,h_A\cdot h_L,dV) \]
such that the operator norm of $I_x$ is bounded by a positive constant independent of $x$ and $(L,h_L)$, where $A^2(X,A+L,h_A\cdot h_L,dV)$ denotes the Hilbert space defined by
\[ A^2(X,A+L,h_A\cdot h_L,dV) := \Bigl\{\sigma \in \Gamma(X,\mathcal{O}_X(A+L)\otimes\mathcal{I}(h_L)) \;\Big|\; \int_X |\sigma|^2\cdot h_A\cdot h_L\cdot dV < +\infty \Bigr\} \]
with the $L^2$ inner product
\[ (\sigma,\sigma') := \int_X \sigma\cdot\bar{\sigma}'\cdot h_A\cdot h_L\cdot dV \]
and $A^2(x,(A+L)_x,h_A\cdot h_L,\delta_x)$ is defined similarly, where $\delta_x$ is the Dirac measure supported at $x$. We note that if $h_L(x) = +\infty$, then $A^2(x,(A+L)_x,h_A\cdot h_L,\delta_x) = 0$.

Footnote 1: Later we shall also consider the case that $h_A$ is any $C^\infty$ hermitian metric (without positivity of curvature) or a singular hermitian metric on $A$.

For every $x \in X$ we set
\[ \hat{K}^A_m(x) := \sup\Bigl\{ |\sigma|^{\frac{2}{m}}(x) \;\Big|\; \sigma \in \Gamma(X,\mathcal{O}_X(A+mK_X)), \ \Bigl|\int_X h_A^{\frac{1}{m}}\cdot(\sigma\wedge\bar{\sigma})^{\frac{1}{m}}\Bigr| = 1 \Bigr\}. \]
Here $|\sigma|^{\frac{2}{m}}$ is not a function on $X$, but the supremum is taken as a section of the real line bundle $|A|^{\frac{2}{m}}\otimes|K_X|^2$ in the obvious manner (footnote 2). Then $h_A^{\frac{1}{m}}\cdot\hat{K}^A_m$ is a continuous semipositive $(n,n)$ form on $X$. Under the above notations, we have the following theorem.
Theorem 1.7 We set
\[ \hat{K}^A_\infty := \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot\hat{K}^A_m \]
and
\[ \hat{h}_{can,A} := \text{the lower envelope of } (\hat{K}^A_\infty)^{-1}. \]
Then $\hat{h}_{can,A}$ is an AZD of $K_X$. And we define
\[ \hat{h}_{can} := \text{the lower envelope of } \inf_A \hat{h}_{can,A}, \]
where inf means the pointwise infimum and $A$ runs over all the ample line bundles on $X$. Then $\hat{h}_{can}$ is a well defined AZD (footnote 3) depending only on $X$. □

Definition 1.8 (Supercanonical AZD) We call $\hat{h}_{can}$ in Theorem 1.7 the supercanonical AZD of $K_X$. And we call the semipositive $(n,n)$ form $\hat{h}_{can}^{-1}$ the supercanonical volume form on $X$. □

Remark 1.9 Here “super” means that the corresponding volume form $\hat{h}_{can}^{-1}$ satisfies the inequality $\hat{h}_{can}^{-1} \geqq h_{can}^{-1}$ if $X$ has nonnegative Kodaira dimension (cf. Theorem 2.9). □

In the statement of Theorem 1.7, one may think that $\hat{h}_{can,A}$ may depend on the choice of the metric $h_A$. But later we prove that $\hat{h}_{can,A}$ is independent of the choice of $h_A$ (cf. Theorem 2.7).

Footnote 2: We have abused the notations $|A|$, $|K_X|$ here. These notations are similar to the notations of the corresponding linear systems. But I think there is no fear of confusion.

Footnote 3: I believe that $\hat{h}_{can,A}$ is already independent of the sufficiently ample line bundle $A$.

1.3 Variation of the supercanonical AZD $\hat{h}_{can}$

Let $f : X \to S$ be an algebraic fiber space, i.e., $X$, $S$ are smooth projective varieties and $f$ is a projective morphism with connected fibers. Suppose that for a general fiber $X_s := f^{-1}(s)$, $K_{X_s}$ is pseudoeffective (footnote 4). In this case we may define a singular hermitian metric $\hat{h}_{can}$ on $K_{X/S}$ similarly as above. Then $\hat{h}_{can}$ has the following nice properties on $f : X \to S$.

Theorem 1.10 Let $f : X \to S$ be an algebraic fiber space such that for a general fiber $X_s$, $K_{X_s}$ is pseudoeffective. Let $S^\circ$ be the maximal nonempty Zariski open subset of $S$ such that $f$ is smooth over $S^\circ$ and set $X^\circ = f^{-1}(S^\circ)$. Then there exists a unique singular hermitian metric $\hat{h}_{can}$ on $K_{X/S}$ such that 1. $\hat{h}_{can}$ has semipositive curvature in the sense of current. 2. $\hat{h}_{can}|X_s$ is an AZD of $K_{X_s}$ for every $s \in S^\circ$. 3.
There exists an at most countable union $F$ of proper subvarieties of $S$ such that for every $s \in S\setminus F$, $\hat{h}_{can}|X_s \leqq \hat{h}_{can,s}$ holds, where $\hat{h}_{can,s}$ denotes the supercanonical AZD of $K_{X_s}$. 4. There exists a subset $G$ of measure 0 in $S^\circ$, such that for every $s \in S^\circ\setminus G$, $\hat{h}_{can}|X_s = \hat{h}_{can,s}$ holds.

Remark 1.11 Even for $s \in G$, $\hat{h}_{can}|X_s$ is an AZD of $K_{X_s}$ by 2. I do not know whether $F$ or $G$ really exists in some cases. □

By Theorem 1.10 and the $L^2$-extension theorem ([O-T, p.200, Theorem]), we obtain the following corollary immediately.

Corollary 1.12 ([S1, S2, T3]) Let $f : X \to S$ be a smooth projective family over a complex manifold $S$. Then the plurigenera $P_m(X_s) := \dim H^0(X_s,\mathcal{O}_{X_s}(mK_{X_s}))$ are locally constant functions on $S$. □

The following corollary is an immediate consequence of Theorem 1.10, since the supercanonical AZD always has minimal singularities (cf. Definition 2.2 and Remark 2.8).

Corollary 1.13 Let $f : X \to Y$ be an algebraic fiber space. Suppose that $K_X$ and $K_Y$ are pseudoeffective. Let $\hat{h}_{can}$ be the canonical singular hermitian metric on $K_{X/Y}$ constructed as in Theorem 1.10. Let $\hat{h}_{can,X}$, $\hat{h}_{can,Y}$ be the supercanonical AZD's of $K_X$ and $K_Y$ respectively. Then there exists a positive constant $C$ such that $\hat{h}_{can,X} \leqq C\cdot\hat{h}_{can}\cdot f^*\hat{h}_{can,Y}$ holds on $X$. □

Footnote 4: This condition is equivalent to the one that for some regular fiber $X_s$, $K_{X_s}$ is pseudoeffective. This is well known. For the proof, see Lemma 3.7 below for example.

Corollary 1.13 is very close to Iitaka’s conjecture, which asserts that $\mathrm{Kod}(X) \geqq \mathrm{Kod}(Y) + \mathrm{Kod}(F)$ holds for any algebraic fiber space $f : X \to Y$, where $F$ is a general fiber of $f : X \to Y$ and $\mathrm{Kod}(M)$ denotes the Kodaira dimension of a compact complex manifold $M$.

In this paper all the varieties are defined over $\mathbb{C}$. And we frequently use the classical result that the supremum of a family of plurisubharmonic functions locally uniformly bounded from above is again plurisubharmonic, if we take the upper semicontinuous envelope of the supremum ([L, p.26, Theorem 5]).
For simplicity, we denote the upper (resp. lower) semicontinuous envelope simply by the upper (resp. lower) envelope. We note that this adjustment occurs only on a set of measure 0. In this paper all the singular hermitian metrics are supposed to be lower semicontinuous.

There are other applications of the supercanonical AZD. It is also immediate to generalize it to the log category, and another generalization involving hermitian line bundles with semipositive curvature is also possible. These will be discussed in the forthcoming papers.

2 Proof of Theorem 1.7

In this section we shall prove Theorem 1.7. We shall use the same notations as in Section 1.2. The upper estimate of $\hat{K}^A_m$ is almost the same as in [T5], but the lower estimate of $\hat{K}^A_m$ requires the $L^2$-extension theorem ([O-T, O]).

2.1 Upper estimate of $\hat{K}^A_m$

Let $X$ be as in Theorem 1.7, let $n$ denote $\dim X$ and let $x \in X$ be an arbitrary point. Let $(U,z_1,\cdots,z_n)$ be a coordinate neighbourhood of $X$ which is biholomorphic to the unit open polydisk $\Delta^n$ such that $z_1(x) = \cdots = z_n(x) = 0$. Let $\sigma \in \Gamma(X,\mathcal{O}_X(A+mK_X))$. Taking $U$ sufficiently small, we may assume that $(z_1,\cdots,z_n)$ is a holomorphic local coordinate on a neighbourhood of the closure of $U$ and there exists a local holomorphic frame $e_A$ of $A$ on a neighbourhood of the closure of $U$. Then there exists a bounded holomorphic function $f_U$ on $U$ such that
\[ \sigma = f_U\cdot e_A\cdot(dz_1\wedge\cdots\wedge dz_n)^{\otimes m} \]
holds. Suppose that
\[ \Bigl|\int_X h_A^{\frac{1}{m}}\cdot(\sigma\wedge\bar{\sigma})^{\frac{1}{m}}\Bigr| = 1 \]
holds. Then we see that
\[ \int_U |f_U(z)|^{\frac{2}{m}}\,d\mu(z) \leqq \bigl(\inf_U h_A(e_A,e_A)\bigr)^{-\frac{1}{m}}\cdot\int_U h_A(e_A,e_A)^{\frac{1}{m}}\,|f_U|^{\frac{2}{m}}\,d\mu(z) \leqq \bigl(\inf_U h_A(e_A,e_A)\bigr)^{-\frac{1}{m}} \]
hold, where $d\mu(z)$ denotes the standard Lebesgue measure on the coordinate. Hence by the submeanvalue property of plurisubharmonic functions,
\[ h_A^{\frac{1}{m}}\cdot|\sigma|^{\frac{2}{m}}(x) \leqq \Bigl(\frac{h_A(e_A,e_A)(x)}{\inf_U h_A(e_A,e_A)}\Bigr)^{\frac{1}{m}}\cdot\pi^{-n}\cdot|dz_1\wedge\cdots\wedge dz_n|^2(x) \]
holds. Let us fix a $C^\infty$ volume form $dV$ on $X$. Since $X$ is compact and every line bundle on a contractible Stein manifold is trivial, we have the following lemma.
Lemma 2.1 There exists a positive constant $C$ independent of the line bundle $A$ and the $C^\infty$ metric $h_A$ such that
\[ \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot\hat{K}^A_m \leqq C\cdot dV \]
holds on $X$. □

2.2 Lower estimate of $\hat{K}^A_m$

Let $h_X$ be any $C^\infty$ hermitian metric on $K_X$. Let $h_0$ be an AZD of $K_X$ defined by the lower envelope of
\[ \inf\{h(x) \mid h \text{ is a singular hermitian metric on } K_X \text{ with } \Theta_h \geqq 0, \ h \geqq h_X\}. \]
Then by the classical theorem of Lelong ([L, p.26, Theorem 5]) it is easy to verify that $h_0$ is an AZD of $K_X$ (cf. [D-P-S, Theorem 1.5]). $h_0$ is of minimal singularities in the following sense.

Definition 2.2 Let $L$ be a pseudoeffective line bundle on a smooth projective variety $X$. An AZD $h$ on $L$ is said to be an AZD of minimal singularities, if for any AZD $h'$ on $L$, there exists a positive constant $C$ such that $h \leqq C\cdot h'$ holds. □

Let us compare $h_0$ and $\hat{h}_{can}$. By the $L^2$-extension theorem ([O]), we have the following lemma.

Lemma 2.3 There exists a positive constant $C$ independent of $m$ such that
\[ K(A+mK_X,h_A\cdot h_0^{m-1}) \geqq C\cdot(h_A\cdot h_0^m)^{-1} \]
holds, where $K(A+mK_X,h_A\cdot h_0^{m-1})$ is the (diagonal part of the) Bergman kernel of $A+mK_X$ with respect to the $L^2$-inner product
\[ (\sigma,\sigma') := (\sqrt{-1})^{n^2}\int_X \sigma\wedge\bar{\sigma}'\cdot h_A\cdot h_0^{m-1}, \]
where we have considered $\sigma$, $\sigma'$ as $A+(m-1)K_X$ valued canonical forms. □

Proof of Lemma 2.3. By the extremal property of the Bergman kernel (see for example [Kr, p.46, Proposition 1.4.16]) we have that
\[ K(A+mK_X,h_A\cdot h_0^{m-1})(x) = \sup\{|\sigma(x)|^2 \mid \sigma \in \Gamma(X,\mathcal{O}_X(A+mK_X)\otimes\mathcal{I}(h_0^{m-1})), \ \|\sigma\| = 1\} \qquad (1) \]
holds for every $x \in X$, where $\|\sigma\| = (\sigma,\sigma)^{\frac{1}{2}}$. Let $x$ be a point such that $h_0$ is not $+\infty$ at $x$. Let $dV$ be an arbitrary $C^\infty$ volume form on $X$ as in Section 1.2. Then by the $L^2$-extension theorem ([O, O-T]) and the sufficient ampleness of $A$ (see Section 1.2), we may extend any $\tau_x \in (A+mK_X)_x$ with $h_A\cdot h_0^{m-1}\cdot dV^{-1}(\tau_x,\tau_x) = 1$ to a global section $\tau \in \Gamma(X,\mathcal{O}_X(A+mK_X)\otimes\mathcal{I}(h_0^{m-1}))$ such that $\|\tau\| \leqq C_0$, where $C_0$ is a positive constant independent of $x$ and $m$. Let $C_1$ be a positive constant such that $h_0 \geqq C_1\cdot dV^{-1}$ holds on $X$. By (1), we obtain the lemma by taking $C = C_0^{-1}\cdot C_1$.
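The choice of the constant in the last step can be spelled out as follows. This is only a sketch under two stated assumptions: that $C_0$ bounds $\|\tau\|^2$ (if it bounds $\|\tau\|$ itself, $C_0^{-1}$ becomes $C_0^{-2}$ below), and that the normalization of $\tau_x$ is read as $|\tau(x)|^2 = (h_A h_0^{m-1})^{-1}(x)\cdot dV(x)$.

```latex
% Sketch of the final step of Lemma 2.3 (constants as assumed in the lead-in).
% From h_0 \geqq C_1 dV^{-1} we get dV \geqq C_1 h_0^{-1}.
\begin{align*}
K(A+mK_X, h_A\cdot h_0^{m-1})(x)
  &\geqq \frac{|\tau(x)|^2}{\|\tau\|^2}
   \geqq C_0^{-1}\,(h_A\cdot h_0^{m-1})^{-1}(x)\cdot dV(x) \\
  &\geqq C_0^{-1} C_1\,(h_A\cdot h_0^{m-1})^{-1}(x)\cdot h_0^{-1}(x)
   = C_0^{-1} C_1\,(h_A\cdot h_0^{m})^{-1}(x).
\end{align*}
% At points where h_0(x) = +\infty the right-hand side vanishes,
% so the inequality holds trivially there.
```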
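□

Let $\sigma \in \Gamma(X,\mathcal{O}_X(A+mK_X)\otimes\mathcal{I}(h_0^{m-1}))$ be such that
\[ \Bigl|\int_X \sigma\wedge\bar{\sigma}\cdot h_A\cdot h_0^{m-1}\Bigr| = 1 \quad\text{and}\quad |\sigma|^2(x) = K(A+mK_X,h_A\cdot h_0^{m-1})(x) \]
hold, i.e., $\sigma$ is a peak section at $x$. Then by the Hölder inequality we have that
\[ \Bigl|\int_X h_A^{\frac{1}{m}}\cdot(\sigma\wedge\bar{\sigma})^{\frac{1}{m}}\Bigr| \leqq \Bigl(\int_X h_A\cdot h_0^m\cdot|\sigma|^2\cdot h_0^{-1}\Bigr)^{\frac{1}{m}}\cdot\Bigl(\int_X h_0^{-1}\Bigr)^{\frac{m-1}{m}} = \Bigl(\int_X h_0^{-1}\Bigr)^{\frac{m-1}{m}} \]
hold. Hence the inequality
\[ \hat{K}^A_m(x) \geqq K(A+mK_X,h_A\cdot h_0^{m-1})(x)^{\frac{1}{m}}\cdot\Bigl(\int_X h_0^{-1}\Bigr)^{-\frac{m-1}{m}} \qquad (2) \]
holds. Now we shall consider the limit
\[ \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot K(A+mK_X,h_A\cdot h_0^{m-1})^{\frac{1}{m}}. \]
Let us recall the following result.

Lemma 2.4 ([D, p.376, Proposition 3.1])
\[ \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot K(A+mK_X,h_A\cdot h_0^{m-1})^{\frac{1}{m}} = h_0^{-1} \]
holds. □

Remark 2.5 In [D, p.376, Proposition 3.1], Demailly considered only the local version of Lemma 2.4. But the same proof works in our case by the sufficient ampleness of $A$. This kind of localization principle for Bergman kernels is quite standard. □

In fact the $L^2$-extension theorem ([O-T, O]) implies the inequality
\[ \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot K(A+mK_X,h_A\cdot h_0^{m-1})^{\frac{1}{m}} \geqq h_0^{-1} \]
and the converse inequality is elementary. See [D] for details and applications. Hence letting $m$ tend to infinity in (2), by Lemma 2.4, we have the following lemma.

Lemma 2.6
\[ \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot\hat{K}^A_m \geqq \Bigl(\int_X h_0^{-1}\Bigr)^{-1}\cdot h_0^{-1} \]
holds. □

By Lemmas 2.1 and 2.6, we see that
\[ \hat{K}^A_\infty := \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot\hat{K}^A_m \]
exists as a bounded semipositive $(n,n)$ form on $X$. We set
\[ \hat{h}_{can,A} := \text{the lower envelope of } (\hat{K}^A_\infty)^{-1}. \]

2.3 Independence of $\hat{h}_{can,A}$ from $h_A$

In the above construction, $\hat{h}_{can,A}$ depends a priori on the choice of the $C^\infty$ hermitian metric $h_A$. But actually $\hat{h}_{can,A}$ is independent of the choice of $h_A$. Let $h'_A$ be another $C^\infty$ hermitian metric on $A$. We define
\[ (\hat{K}^A_m)' := \sup\Bigl\{ |\sigma|^{\frac{2}{m}} \; ; \; \sigma \in \Gamma(X,\mathcal{O}_X(A+mK_X)), \ \Bigl|\int_X (h'_A)^{\frac{1}{m}}\cdot(\sigma\wedge\bar{\sigma})^{\frac{1}{m}}\Bigr| = 1 \Bigr\}. \]
We note that the ratio $h_A/h'_A$ is a positive $C^\infty$ function on $X$ and
\[ \lim_{m\to\infty}\Bigl(\frac{h_A}{h'_A}\Bigr)^{\frac{1}{m}} = 1 \]
uniformly on $X$. Since the definitions of $\hat{K}^A_m$ and $(\hat{K}^A_m)'$ use the extremal properties, we see easily that for every positive number $\varepsilon$, there exists a positive integer $N$ such that for every $m \geqq N$
\[ (1-\varepsilon)(\hat{K}^A_m)' \leqq \hat{K}^A_m \leqq (1+\varepsilon)(\hat{K}^A_m)' \]
holds on $X$. Hence we obtain the following uniqueness theorem.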
Theorem 2.7 $\hat{K}^A_\infty = \limsup_{m\to\infty} h_A^{\frac{1}{m}}\cdot\hat{K}^A_m$ is independent of the choice of the $C^\infty$ hermitian metric $h_A$. Hence $\hat{h}_{can,A}$ is independent of the choice of the $C^\infty$ hermitian metric $h_A$. □

2.4 Completion of the proof of Theorem 1.7

Let $h_0$ be an AZD of $K_X$ constructed as in Section 2.2. Then by Lemma 2.6 we see that
\[ \hat{h}_{can,A} \leqq \Bigl(\int_X h_0^{-1}\Bigr)\cdot h_0 \]
holds. Hence we see that $\mathcal{I}(\hat{h}^m_{can,A}) \supseteq \mathcal{I}(h_0^m)$ holds for every $m \geqq 1$. This implies that
\[ H^0(X,\mathcal{O}_X(mK_X)\otimes\mathcal{I}(h_0^m)) \subseteq H^0(X,\mathcal{O}_X(mK_X)\otimes\mathcal{I}(\hat{h}^m_{can,A})) \subseteq H^0(X,\mathcal{O}_X(mK_X)) \]
hold. Since $h_0$ is an AZD, the first and the last spaces coincide, hence
\[ H^0(X,\mathcal{O}_X(mK_X)\otimes\mathcal{I}(\hat{h}^m_{can,A})) \simeq H^0(X,\mathcal{O}_X(mK_X)) \]
holds for every $m \geqq 1$. And by the construction and the classical theorem of Lelong ([L, p.26, Theorem 5]) stated in Section 1.3, $\hat{h}_{can,A}$ has semipositive curvature in the sense of current. Hence $\hat{h}_{can,A}$ is an AZD of $K_X$ and depends only on $X$ and $A$ by Theorem 2.7.

Let us consider
\[ \hat{K}_\infty := \sup_A \hat{K}^A_\infty, \]
where sup means the pointwise supremum and $A$ runs over all the sufficiently ample line bundles on $X$. Then by Lemma 2.1, we see that $\hat{K}_\infty$ is a well defined semipositive $(n,n)$ form on $X$. We set
\[ \hat{h}_{can} := \text{the lower envelope of } \hat{K}_\infty^{-1}. \]
Then by the construction, $\hat{h}_{can} \leqq \hat{h}_{can,A}$ for every ample line bundle $A$. Since $\hat{h}_{can,A}$ is an AZD of $K_X$, $\hat{h}_{can}$ is also an AZD of $K_X$ (again by [L, p.26, Theorem 5]). Since $\hat{h}_{can,A}$ depends only on $X$ and $A$, $\hat{h}_{can}$ is uniquely determined by $X$. This completes the proof of Theorem 1.7. □

Remark 2.8 As one sees from Section 2.2, $\hat{h}_{can}$ is an AZD of $K_X$ of minimal singularities (cf. Definition 2.2). □

2.5 Comparison of $h_{can}$ and $\hat{h}_{can}$

Suppose that $X$ has nonnegative Kodaira dimension. Then by Theorem 1.3, we can define the canonical AZD $h_{can}$ on $K_X$. We shall compare $h_{can}$ and $\hat{h}_{can}$.

Theorem 2.9 $\hat{h}_{can,A} \leqq h_{can}$ holds on $X$. In particular $\hat{h}_{can} \leqq h_{can}$ holds on $X$. □

Proof of Theorem 2.9. If $X$ has negative Kodaira dimension, then the right hand side is infinity. Hence the inequality is trivial. Suppose that $X$ has nonnegative Kodaira dimension. Let $\sigma \in \Gamma(X,\mathcal{O}_X(mK_X))$ be an element such that
\[ \Bigl|\int_X (\sigma\wedge\bar{\sigma})^{\frac{1}{m}}\Bigr| = 1. \]
Let $x \in X$ be an arbitrary point on $X$.
Since OX(A) is globally generated by the definition of A, there exists an element τ ∈ Γ(X,OX(A)) such that τ(x) 6= 0 and hA(τ, τ) ≦ 1 on X . Then we see that hA(τ, τ) m · (σ ∧ σ̄) 1m ≦ 1 holds. This implies that K̂Am(x) ≧|τ(x) | m ·Km(x) holds at x. Noting τ(x) 6= 0,letting m tend to infinity, we see that K̂A∞(x) ≧ K∞(x) holds. Since x is arbitrary, this completes the proof of Theorem 2.9. � Remark 2.10 The equality hcan = ĥcan implies the abundance of KX . � By the same proof we obtain the following comparison theorem (without assuming X has nonnegative Kodaira dimension). Theorem 2.11 Let A,B a sufficiently ample line bundle on X. Suppose that B −A is globally generated, then ĥcan,B ≦ ĥcan,A holds. � Remark 2.12 Theorem 2.11 implies that ĥcan = lim ĥcan,ℓA holds for any ample line bundle A on X. � 3 Variation of ĥcan under projective deforma- tions In this section we shall prove Theorem 1.10. The main ingredient of the proof is the variation of Hodge structure. 3.1 Construction of ĥcan on a family Let f : X −→ S be an algebraic fiber space as in Theorem 1.10. The construction of ĥcan can be performed simultaeneously on the family as follows. The same construction works for flat projective family with only canonical singularities. But for simplicity we shall work on smooth category. Let S◦ be the maximal nonempty Zariski open subset of S such that f is smooth over S◦ and let us set X◦ := f−1(S◦). Hereafter we shall assume that dimS = 1. The general case of Theorem 1.10 easily follows from just by cutting down S to curves. Let A be a sufficiently ample line bundle on X such that for every pseudoeffective singular hermitian line bundle (L, hL), OX(A+L)⊗I(hL) and OX(KX+A+L)⊗I(hL) are globally generated and OXs(A+L |Xs)⊗I(hL |Xs) and OXs(KXs+A+L |Xs)⊗I(hL |Xs) are globally generated for every s ∈ S◦ as long as hL|Xs is well defined. Let us assume that there exists a smooth member D of | 2A | such that D does not contain any fiber over S◦. 
Let σD a holomorphic section of 2A with divisor D. We consider the singular hermitian metric hA := | σD | on A. We set Em := f∗OX(A+mKX/S). Since we have assumed that dimS = 1, Em is a vector bundle for every m ≧ 1. We denote the fiber of the vector bundle over s ∈ S by Em,s. Then we shall define the sequence of 1 A-valued relative volume forms by K̂Am,s := sup{|σ | m ;σ ∈ Em,s, | A · (σ ∧ σ̄) m |= 1} for every s ∈ S◦. This fiberwise construction is different from that in Section 1.2 in the following two points : 1. We use the singular metric hA |Xs instead of a C∞ hermitian metric on A |Xs. 2. We use Em,s instead of Γ(Xs,OXs(A|Xs +mKXs)). We note that the 2nd difference occurs only over at most countable union of proper analytic subsets in S◦. Since hA is singular, at some point s ∈ S◦ and for some positive integer m0, K̂ might be identically 0 on Xs. But for any s ∈ S◦ we find a positive integer m0 such that for every m ≧ m0, we have A |Xs) = OXs holds for every m ≧ m0. Hence even in this case we see that K̂Am,s is not identically 0 for every sufficiently large m. We define the relative |A | 2m valued volume form K̂Am by K̂Am|Xs := K̂Am,s(s ∈ S) and a relative volume form K̂A∞ by K̂A∞|Xs := lim sup A · K̂ m,s(s ∈ S). Of course the above construction of K̂Am,s(s ∈ S◦) works also for C∞ hermitian metric instead of the singular hA as above. The reason why we use the singular hA is that we shall use the variation of Hodge structure to prove the plurisub- harmonic variation of log K̂Am,s.We may use a C ∞ metric with strictly positive curvature on A, instead of the singular hA as above, if we use the plurisubhar- monicity properties of Bergman kernels ([Ber, Theorem 1.2]) instead of Theorem 3.1. See Theorem 4.1 below. We define singular hermitian metrics on A+mKX/S by ĥm,A := the lower envelope of (K̂ Let us fix a C∞ hermitian metric hA,0 on A and we set ĥcan,A := the lower envelope of lim inf A,0 · ĥm,A. 
Cleary ĥcan,A does not depend on the choice of hA,0 (in this sense, the presence of hA,0 is rather auxilary). Then we define ĥcan := the lower envelope of inf ĥcan,A, where A runs all the ample line bundle on X . At this moment, ĥcan is defined only on KX/S |X◦. The extension of ĥcan to the singular hermitian metric on the whole KX/S will be discussed later. 3.2 Semipositivity of the curvature current of ĥm,A To prove the semipositivity of the curvature of ĥm,A, the following theorem is essential. Theorem 3.1 ([Ka3, p.174,Theorem 1.1] see also [F, Ka1]) φ : M −→ C be a projective morphism with connected fibers from a smooth projective variety M onto a smooth curve C. Let KM/C be the relative canonical bundle. We set F := φ∗OM (KM/C)) and let C◦ denote the nonempty maximal Zariski open subset of C such that φ is smooth over C◦. Let hM/C be the hermitian metric on F | C◦ by hM/C(σ, σ ′) := ( σ ∧ σ′, where n = dimM − 1. Let π : P(F ∗) −→ C be the projective bundle associated with F ∗ and Let L −→ P(F ∗) be the tautological line bundle. Let hL denote the hermitian metric on L | π−1(S◦) induced by hM/C . Then hL has semipositive curvature on π −1(S◦) and hL extends to the sin- gular hermitian metric on L with semipositive curvature current. � We define the pseudonorm ‖σ‖ 1 of σ ∈ Em,s by ‖σ‖ 1 A · (σ ∧ σ̄) m |m2 . We set Em = f∗OX(A+mKX/S) and let Lm be the tautological line bundle on P(E∗m), where E m denotes the dual of Em. By Theorem 3.1 and the branched covering trick, we obtain the following essential lemma. Lemma 3.2 ([Ka1, p.63, Lemma 7 and p.64, Lemma 8]) Let σ ∈ Γ(X,OX(A + mKX/S)). Then ‖ ‖ defines a singular hermitian metric with semipositive curvature on Lm. � Proof of Lemma 3.2. If there were no A, the lemma is completely the same as [Ka1, p.63, Lemma 7 and p.64, Lemma 8]. In our case, we use the Kawamata’s trick to reduce the logarithmic case to the non logarithmic case. 
Since this trick has been used repeatedly by Kawamata himself (see [Ka2, Ka3] for example), the following argument has no originality. We consider the multivalued relative log canonical form Then there exists a finite cyclic covering µ : Y −→ X such that µ∗( σ√ m is a (single valued) relative canonical form on Y 5. Here the branch locus of µ may be much larger than the union of D∪(σ). But it does not matter. The branch covering is used to reduce the log canonical case to the canonical case. Let π : Ỹ −→ Y be an equivariant resolution of singularities and let f̃ : Ỹ −→ S be the resulting family. We shall denote the composition µ ◦ π : Ỹ −→ X by µ̃. Let U be a Zariski open subset of Sσ such that f̃ is smooth. We note that the Galois group action is isometric on f̃∗OỸ (KỸ /S) with respect to the natural L2-inner product on f̃∗OỸ (KỸ /S). Therefore by Theorem 3.1, we see that ‖ ‖ defines a singular hermitian metric on Lm with semipositive curvature on a nonempty Zariski open of P(E∗m). Again by Theorem 3.1 the singular hermitian metric extends to the whole P(E∗m) preserving semipositive curvature property. We also present an alternative proof indicated by Bo Berndtsson at the workshop at MSRI in April, 2007. Alternative proof of Lemma 3.2(cf. [B-P, Section 6]). We use the eqality | σ | 2m= | σ | | σ |2m−1m and view | σ |2m−1m as a singular hermitian metric on (m− 1)KX/S +A. Then by [T4, Therem 5.4] or [B-P], we see that A · (σ ∧ σ̄) m | 2m defines a singular hermitian metric with semipositive curvature current on Lm. The rest of the proof is identical as the previous one. � 5If we use a C∞ hermitian metric instead of the above hA, we also construct a cyclic covering µ : Y −→ X such that 1 µ∗L is a genuine line bundle on Y and µ∗σ m is a 1 valued canonical form on Y . Remark 3.3 The metric hA can be replaced by a C ∞-hermitian metric with semipositive curvature in the second proof. 
� Corollary 3.4 (see also [B-P, Section 6]) The curvature Θ ĥm,A −1∂∂̄ log K̂m,A is semipositive everywhere on X◦. � Proof. Let x ∈ Xs(s ∈ S◦) and let Ω be a holomorphic local generatorof KX/S and let eA be a holomorphic local generator of A on a neighbourhood U of x in X◦. Viewing ξ(y) := (e−1A · Ω−m)(y) as an element of the dual of Em,f(y) by σ ∈ Em,f(y) 7→ σ(y) · (e−1A · Ω−m)(y)(y ∈ U), log(K̂m,A(y)· | eA |− m · | Ω |−2 (y)) (y ∈ U) is plurisubharmonic function on U , since | ξ(y) | m ·K̂m,A(y) = sup{ | ξ(y) · σ(y) | 2m ‖ [σ][ξ(y)] ‖ ; σ ∈ Em,f(y), [σ][ξ(y)] 6= 0} holds, where [σ][ξ(y)] denotes the class of σ ∈ Em,f(y) in the fiber Lm,[ξ(y)] at [ξ(y)] ∈ P(E∗m). � Now let us consider the behavior of ĥm,A along X\X◦. Since the problem is local, we may and do assume S is a unit open disk ∆ in C for the time being. For every local holomorphic section σ of Em the function A · (σ ∧ σ̄) is of algebraic growth along S \S◦. More precisely for s0 ∈ S \S◦ as in [Ka1, p.59 and p. 66] there exist positive numbers C,α, β such that A · (σ ∧ σ̄) m |≦ C· |s− s0|−α · | log(s− s0) |β (3) holds. Moreover as [Ka1, p.66] for a nonvanishing holomorphic section σ of Em around p ∈ S \S◦, the pseudonorm ‖σ‖ 1 A (σ ∧ σ̄) has a positive lower bound around every p ∈ S. This implies that ĥm,A is bounded from below by a smooth metric along the boundary X \X◦. By the above estimate, ĥm,A is of algebraic growth along the fiber on X \X◦ by its definition and ĥm,A extends to a singular hermitian metric on A+KX/S with semipositive curvature on the whole X . Now we set ĥcan,A := the lower envelope of lim inf A,0 · ĥm,A, where hA,0 be a C ∞ metric on A (with strictly positive curvature) as in the last subsetion 6. To extend ĥcan,A across S \S◦, we use the following useful lemma. 6One may use hA instead of hA,0 here. But the corresponding limits may be different along D, although the difference is negligible by taking the lower envelopes. 
Lemma 3.5 ([B-T, Corollary 7.3]) Let {uj} be a sequence of plurisubharmonic functions locally bounded above on the bounded open set Ω in Cm. Suppose further lim sup is not identically −∞ on any component of Ω. Then there exists a plurisubhar- monic function u on Ω such that the set of points {x ∈ Ω | u(x) 6= (lim sup uj)(x)} is pluripolar. � Since ĥm,A extends to a singular hermitian metric on A + KX/S with semipositive curvature current on the whole X and ĥcan,A := the lower envelope of lim inf A,0 · ĥm,A exists as a singular hermitian metric on KX/S on X ◦ = f−1(S◦), we see that ĥcan,A extends as a singular hermitian metric with semipositive curvature cur- rent on the whole X by Lemma 3.5. Repeating the same argument we see that ĥcan is a well defined singular her- mitian metric with semipositive curvature current on KX/S |X◦ and it extends to a singular hermitian metric on KX/S with semipositive curvature current on the whole X . 3.3 Uniqueness of ĥcan,A for singular hA’s In the above construction, we use a singular hermitian metric hA on A instead of a C∞ hermitian metric. We note that hA is singular along the divisor D. Hence the resulting metric may be a little bit different from the original construction apriori. But actually Theorem 2.7 still holds. Our metric hA is defined as as above. Let h′A be a C ∞ hermitian metric on A. Let us fix an arbitrary point s ∈ S◦. Let us fix a Kähler metric on X and let Uε be the ε neighbourhood of D with respect to the metric. By the upper estimate Lemma 2.1, we see that although hA is singular along D, there exists a positive integer m0 and a positive constnat C depending only on s such that for every m ≧ m0 and any σ ∈ Em,s with ‖ σ ‖ 1 A · (σ ∧ σ̄) 2 = 1, Uε∩Xs A · (σ ∧ σ̄) m |≦ C · ε holds. This means that there is no mass concentration around the neighbour- hood of D ∩ Xs. We note that on Xs \Uε the ratio (hA/h′A) m converges uni- formly to 1 as m tends to infinity. 
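The uniform convergence used in the last sentence is elementary. A short sketch, where the bounds $c_\varepsilon$, $C_\varepsilon$ are names introduced only for this illustration: since $h_A$ is smooth and positive away from $D$ and $X_s\setminus U_\varepsilon$ is compact, the ratio $h_A/h'_A$ is bounded above and below by positive constants there.

```latex
% Sketch: uniform convergence of (h_A/h'_A)^{1/m} on X_s \setminus U_\varepsilon.
% c_\varepsilon, C_\varepsilon are assumed bounds of the ratio on this compact set.
\[
0 < c_\varepsilon \leqq \frac{h_A}{h'_A} \leqq C_\varepsilon
\quad\text{on } X_s\setminus U_\varepsilon
\;\Longrightarrow\;
c_\varepsilon^{\frac{1}{m}} \leqq \Bigl(\frac{h_A}{h'_A}\Bigr)^{\frac{1}{m}} \leqq C_\varepsilon^{\frac{1}{m}},
\qquad
\lim_{m\to\infty} c_\varepsilon^{\frac{1}{m}} = \lim_{m\to\infty} C_\varepsilon^{\frac{1}{m}} = 1 .
\]
```

The bounds degenerate as $\varepsilon \to 0$, which is why the estimate on $U_\varepsilon\cap X_s$ above is needed separately.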
Hence by the definitions of K̂Am,s and (K̂ we see that for every s ∈ S◦ and δ > 0, there exists a positive integer m1 such that for every m ≧ m1 (1− δ)(K̂Am,s)′ ≦ K̂Am,s ≦ (1 + δ)(K̂Am,s)′ holds on Xs. Hence we have the following lemma. Lemma 3.6 K̂A∞,s is same as the one defined by a C ∞ hermitian metric on A for every s ∈ S◦. � 3.4 Case dimS > 1 In Sections 3.1,3.2, we have assumed that dimS = 1. In the case of dimS > 1 the same proof works similarly. But there are several minor differences. First there may not exist D ∈| 2A | which does not contain any fibers, hence the restriction of hA may not be well defined on some fibers in this case. But this can be taken care by Lemma 3.6. Namely ĥcan is independent of the choice of D. Hence replacing hA by a C ∞ hermitian metric, we see that K̂A∞ is defined on all fibers over S◦. Second in this case Em = f∗OX(A+mKX/S) may not be locally free on S◦. If Em.s is not locally free at s0 ∈ S◦, then K̂A∞ may be discontinuous at s0. But J := {s ∈ S◦ | Em is not locally free at s for some m ≧ 1} is at most a countable union of proper subvarieties of S◦ and ĥcan,A := the lower envelope of is a well defined singular hermitian metric with semipositive curvature current on X◦, i.e., the construction is indifferent to the thin set J . Hence we may construct ĥcan on X ◦ in this case. The extension of ĥcan as a singular hermitian metric onKX/S with semipositive curvature current can be accomplished just by slicing S by curves. Hence we complete the proof of the assertion 1 in Theorem 1.10. 3.5 Completion of the proof of Theorem 1.10 To complete the proof of Theorem 1.10, we need to show that ĥcan defines an AZD for KXs for every s ∈ S. To show this fact, we modify the construction of K̂Am. Here we do not assume dimS = 1. Let us fix s ∈ S◦ and let h0,s be an AZD constructed as in Section 2.2. Let U be a neighbourhood of s ∈ S◦ in S◦ which is biholomorphic to an open ball in k(k := dimS). 
By the L2-extension theorem ([O-T, O]), we have the following lemma. Lemma 3.7 Every element of Γ(Xs,OXs(A | Xs+mKXs)⊗I(hm−10,s )) extends to an element of Γ(f−1(U),OX(A+mKX)) for every positive integer m. � Proof of Lemma 3.7. We prove the lemma by induction on m. If m = 1, then the L2-extension theorem ([O-T, O]) implies that every element of Γ(Xs,OXs(A +KXs)) extends to an element of Γ(f−1(U),OX(A +KX)). Let {σ(m−1)1,s , · · · , σ (m−1) N(m−1)} be a basis of Γ(Xs,OXs(A | Xs+(m−1)KXs)⊗I(h̃ 0,s )) for some m ≧ 2. Suppose that we have already constructed holomorphic exten- sions {σ̃(m−1)1,s , · · · , σ̃ (m−1) N(m−1),s} ⊂ Γ(f −1(U),OX(A+ (m− 1)KX)) of {σ(m−1)1,s , · · · , σ (m−1) N(m−1),s} to f −1(U). We define the singular hermitian metric Hm−1 on (A+ (m− 1)KX) | f−1(U) by Hm−1 := 1∑N(m−1) j=1 | σ̃ (m−1) j,s |2 We note that by the choice of A, OXs(A |Xs + mKXs) ⊗ I(hm−10,s ) is globally generated. Hence we see that I(hm0,s) ⊆ I(hm−10,s ) ⊆ I(Hm−1|Xs) hold on Xs. Apparently Hm−1 has a semipositive curvature current. Hence by the L2-extension theorem ([O-T, p.200, Theorem]), we may extend every element of Γ(Xs,OXs(A+mKXs)⊗ I(hm−10,s )) extends to an element of Γ(f−1(U),OX(A+mKX)⊗ I(Hm−1)). This completes the proof of Lemma 3.7 by induction. � Let hA,0 be a C ∞ hermitian metric on A with strictly positive curvature as in the end of the last subsection. We define the sequence of {K̃Am,s} by K̃Am,s := sup{| σ | m ; σ ∈ Γ(Xs,OXs(A | Xs+mKXs)⊗I(hm−10,s )), | A,0·(σ∧σ̄) m |= 1}. By Lemma 3.7, we obtain the following lemma immediately. Lemma 3.8 lim sup A,0 · K̃ m,s ≦ K̂ holds. � Proof. We set K̂A,0m,s = sup{| σ | m ; σ ∈ Em,s, | A,0 · (σ ∧ σ̄) m |= 1}. Then by the definition of K̃Am,s and Lemma 3.7 we have that K̃Am,s ≦ K̂ m,s (4) holds on Xs. On the other hand by Lemma 3.6, we see that lim sup A,0 · K̂ m,s = lim sup A,0 · K̂ m,s = K̂∞,s (5) hold. Hence combining (4) and (5), we complete the proof of Lemma 3.8. � We set h̃m,A,s := (K̃ We have the following lemma. 
Lemma 3.9 If we define K̃A∞,s := lim sup A,0 · K̃ h̃∞,A,s := the lower envelope of K̃ ∞.A,s, h̃∞,A,s is an AZD of KXs . � Proof. Let h0,s be an AZD of KXs as above. We note that OXs(A |Xs + mKXs) ⊗ I(hm−10,s ) is globally generated by the definition of A. Then by the definition of K̃Am,s, I(hm0,s) ⊆ I(h̃mm,A,s) holds for every m ≧ 1. Hence by repeating the arugument in Section 2.2, similar to Lemma 2.6, we have that h̃∞,A,s ≦ ( h−10,s) · h0,s holds. Hence h̃∞,A,s is an AZD of KXs . � Since by the construction and Lemma 3.6 ĥcan,s ≦ h̃∞,A,s holds on s, we see that ĥcan |Xs is an AZD of KXs . Since s ∈ S◦ is arbitrary, we see that ĥcan |Xs is an AZD of KXs for every s ∈ S◦. This completes the proof of the assertion 2 in Theorem 1.10. We have already seen that the singular hermitian metric ĥcan has semipositive curvature in the sense of current (cf. Section 3.2 expecially Corollary 3.4). We note that there exists the union F of at most countable union of proper subvarieties of S such that for every s ∈ S◦ \F E(ℓ)m,s = Γ(Xs,OX(ℓA+mKXs)) holds for every ℓ,m ≧ 1. Then by the construction and Theorem 2.11(see Remark 2.12)7 for every s ∈ S◦ \F , ĥcan|Xs ≦ ĥcan,s holds, where ĥcan,s is the supercanonical AZD of KXs . This completes the proof of the assertion 3 in Theorem 1.10. 7Theorem 2.11 is used because some ample line bundle on the fiber may not extends to an ample line bundle on X in general. We shall define the singular hermitian metric Ĥcan on KX/S|X◦ by Ĥcan|Xs := ĥcan,s (s ∈ S◦). Then by the construction of ĥcan there exists a subset Z of measure 0 in X such that Ĥcan|X◦ \Z = ĥcan|X◦ \Z holds. Let us set G := {s ∈ S◦ | Xs ∩ Z is not of measure 0 in Xs}. Then since Z is of measure 0, G is of measure 0 in S◦. For s ∈ S \G, by the definition of the supercanonical AZD ĥcan,s of KXs , we see that ĥcan|Xs = ĥcan,s holds. This completes the proof of Theorem 1.10. 
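The claim that $G$ has measure 0 is a Fubini-type statement. The following is a sketch under assumptions not spelled out in the text: $Z$ is measurable, $V \subset S^\circ$ is a small open set over which $f$ is differentiably trivial, and $\mu_X$, $\mu_{X_s}$, $\lambda$ denote smooth measures on $f^{-1}(V)$, $X_s$ and $V$ compatible with that product structure (any smooth volume has the same null sets).

```latex
% Sketch: Z of measure 0 in X implies Z \cap X_s of measure 0 for almost every s.
% V, \mu_X, \mu_{X_s}, \lambda are assumed notation (see the lead-in).
\[
0 = \mu_X\bigl(Z\cap f^{-1}(V)\bigr)
  = \int_V \mu_{X_s}(Z\cap X_s)\, d\lambda(s)
\;\Longrightarrow\;
\mu_{X_s}(Z\cap X_s) = 0 \ \text{ for almost every } s\in V .
\]
% Covering S^\circ by countably many such V shows that
% G = \{ s \in S^\circ \mid \mu_{X_s}(Z\cap X_s) > 0 \} has measure 0.
```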
� Remark 3.10 As above we have used the singular hermitian metric hA to prove Theorem 1.10 and then go back to the case of a C∞ metric by the uniqueness result (Lemma 3.6). This kind of interaction between singular and smooth met- rics have been seen in the convergence of the currents associated with random sections of a positive line bundle to the 1-st Chern form of the positive line bundle (see [S-Z]). My first plan of the proof of Theorem 1.10 was to use the random sections to go to the smooth case from the singular case. Although I cannot justify it, it seems to be interesting to pursue this direction. � 4 Appendix The following theorem is a generalization of Theorem 3.1. Theorem 4.1 φ : M −→ C be a projective morphism with connected fibers from a smooth projective variety M onto a smooth curve C. Let KM/C be the relative canonical bundle. Let (L, hL) be a pseudoeffective singular hermitian line bundle on M Let m be a positive integer. We set F := φ∗OM (mKM/C +L) and let C◦ denote the nonempty maximal Zariski open subset of C such that φ is smooth over C◦. Let π : P(F ∗) −→ C be the projective bundle associated with F ∗ and Let H −→ P(F ∗) be the tautological line bundle. Let hH denote the singular hermitian metric on H | π−1(S◦) defined by hH(σ, σ) := {( L · (σ ∧ σ) m }m2 , where n = dimM − 1. Then hH has semipositive curvature on π−1(S◦) and hH extends to the singular hermitian metric on H with semipositive curvature current. � Proof. The proof is a minor modification of the proof of Lemma 3.2. Let σ be a local holomorphic section of H on π−1(S◦). We consider the multivalued L-valued canonical form m σ and uniformize it by taking a suitable cyclic Galois covering µ : Y −→ X as in Lemma 3.2. Then applying [Ber, Theorem 1.2] or [T4, Theorem 5.4] (see also [B-P]) on Y , as in Lemma 3.2, we see that hH defines a singular hermitian metric on the tautological line bundle on P(E∗m). Hence we see that hH has semipositive curvature on π−1(S◦). 
The extension of $h_H$ to the whole of $H$ also follows from [T4, Theorem 5.4]. This completes the proof of Theorem 4.1. □

References

[B-T] E. Bedford, B.A. Taylor: A new capacity of plurisubharmonic functions, Acta Math. 149 (1982), 1-40.
[Ber] B. Berndtsson: Curvature of vector bundles associated to holomorphic fibrations, math.CV/0511225 (2005).
[B-P] B. Berndtsson and M. Paun: Bergman kernels and the pseudoeffectivity of relative canonical bundles, math.AG/0703344 (2007).
[D] J.P. Demailly: Regularization of closed positive currents and intersection theory, J. Algebraic Geom. 1 (1992), no. 3, 361–409.
[D-P-S] J.P. Demailly, T. Peternell, M. Schneider: Pseudo-effective line bundles on compact Kähler manifolds, International Jour. of Math. 12 (2001), 689-742.
[F] T. Fujita: On Kähler fiber spaces over curves, J. Math. Soc. Japan 30 (1978), 779-794.
[Ka1] Y. Kawamata: Kodaira dimension of Algebraic fiber spaces over curves, Invent. Math. 66 (1982), pp. 57-71.
[Ka2] Y. Kawamata: Subadjunction of log canonical divisors II, alg-geom math.AG/9712014, Amer. J. of Math. 120 (1998), 893-899.
[Ka3] Y. Kawamata: On effective nonvanishing and base point freeness, Kodaira’s issue, Asian J. Math. 4 (2000), 173-181.
[Kr] S. Krantz: Function theory of several complex variables, John Wiley and Sons (1982).
[L] P. Lelong: Fonctions Plurisousharmoniques et Formes Differentielles Positives, Gordon and Breach (1968).
[N] A.M. Nadel: Multiplier ideal sheaves and existence of Kähler-Einstein metrics of positive scalar curvature, Ann. of Math. 132 (1990), 549-596.
[O-T] T. Ohsawa, K. Takegoshi: L2-extension of holomorphic functions, Math. Z. 195 (1987), 197-204.
[O] T. Ohsawa: On the extension of L2 holomorphic functions V, effects of generalization, Nagoya Math. J. 161 (2001), 1-21, Erratum: Nagoya Math. J. 163 (2001).
[S-Z] B. Shiffman, S. Zelditch: Distribution of zeros of random and quantum chaotic sections of positive line bundles, Comm. Math. Phys. 200 (1999), no.
3, 661–683.
[S1] Y.-T. Siu: Invariance of plurigenera, Invent. Math. 134 (1998), 661-673.
[S2] Y.-T. Siu: Extension of twisted pluricanonical sections with plurisubharmonic weight and invariance of semipositively twisted plurigenera for manifolds not necessarily of general type, Collected papers Dedicated to Professor Hans Grauert (2002), pp. 223-277.
[T1] H. Tsuji: Analytic Zariski decomposition, Proc. of Japan Acad. 61(1992) 161-
[T2] H. Tsuji: Existence and Applications of Analytic Zariski Decompositions, Trends in Math. Analysis and Geometry in Several Complex Variables, (1999) 253-272.
[T3] H. Tsuji: Deformation invariance of plurigenera, Nagoya Math. J. 166 (2002), 117-134.
[T4] H. Tsuji: Dynamical construction of Kähler-Einstein metrics, math.AG/0606023 (2006).
[T5] H. Tsuji: Curvature semipositivity of relative pluricanonical systems, math.AG/0703729 (2007).
[T6] H. Tsuji: Kodaira dimension of algebraic fiber spaces, in preparation.

Author’s address
Hajime Tsuji
Department of Mathematics
Sophia University
7-1 Kioicho, Chiyoda-ku 102-8554
Japan

ABSTRACT We introduce a new class of canonical AZD's (called the supercanonical AZD's) on the canonical bundles of smooth projective varieties with pseudoeffective canonical classes.
We study the variation of the supercanonical AZD $\hat{h}_{can}$ under projective deformations and give a new proof of the invariance of plurigenera. <|endoftext|><|startoftext|> Introduction We consider a model for the term structure of interest rates, where the short rate (rt)t≥0 is given under the martingale measure by a one-dimensional conserva- tive affine process in the sense of Duffie, Filipović, and Schachermayer [2003]. An affine short rate process of this type will lead to an exponentially-affine structure of zero-coupon bond prices and thus also to an affine term structure of yields and forward rates. We emphasize here that the definition of Duffie et al. [2003] is not limited to diffu- sions, but also includes processes with jumps and even with jumps whose intensity depends in an affine way on the state of the process itself. The class of models we consider naturally includes the Vasiček model, the CIR model and variants of them that are obtained by adding jumps, such as the JCIR-model of Brigo and Mercurio [2006, Section 22.8]. Since they are the best-known, the two ‘classical’ models of Vasiček and Cox-Ingersoll-Ross will serve as the starting point for our discussion of yield curve shapes: A common criticism of the (time-homogenous) CIR and the Vasiček model is that they are not flexible enough to accommodate more complex shapes of yield curves, such as curves with a dip (a local minimum), curves with a dip and a hump, or Date: November 4, 2018. 2000 Mathematics Subject Classification. 60J25, 91B28. Key words and phrases. affine process, term structure of interest rates, Ornstein-Uhlenbeck process, yield curve. Supported by the Austrian Science Fund (FWF) through project P18022 and the START programm Y328. Supported by the module M5 “Modelling of Fixed Income Markets” of the PRisMa Lab, financed by Bank Austria and the Republic of Austria through the Christian Doppler Research Association. 
Both authors would like to thank Josef Teichmann for most valuable discussions and encouragement. We also thank various proof-readers at FAM for their comments.
http://arxiv.org/abs/0704.0567v2
2 MARTIN KELLER-RESSEL AND THOMAS STEINER
other shapes that are frequently observed in the markets. Often these shortcomings are explained by ‘too few parameters’ in the model (cf. Carmona and Tehranchi [2006, Section 2.3.5] or Brigo and Mercurio [2006, Section 3.2]). However, if jumps are added to the mentioned models, additional parameters (potentially infinitely many) are introduced through the jump part, while the model still remains within the scope of affine models. It is not clear per se what consequences the introduction of jumps will have for the range of attainable yield curves, and this is one question we intend to answer in this article.

Moreover, there seems to be some confusion about what shapes of yield curves are actually attainable even in well-studied models like the CIR model. While most sources (including the original paper of Cox et al. [1985]) mention inverse, normal and humped shapes, Carmona and Tehranchi [2006, Section 2.3.5] write that ‘tweaking the parameters [of the CIR model] can produce yield curves with one hump or one dip’, and Brigo and Mercurio [2006, Section 3.2] state that ‘some typical shapes, like that of an inverted yield curve, may not be reproduced by the [CIR or Vasiček] model.’ In our main result, Theorem 3.9, we settle this question and prove that in any time-homogeneous, affine one-factor model the attainable yield curves are either inverse, normal or humped. The proof relies only on tools of elementary analysis and on the characterization of affine processes through the generalized Riccati equations of Duffie et al. [2003].
Another related problem is how the shape of the yield curve is determined by the parameters of the model, and also how, when the parameters are fixed, the yield curve is determined by the level of the current short rate. We show in Section 4.2 that in this respect, too, the CIR model has not been completely understood, and we discuss a misconception that originates in [Cox et al., 1985] and is repeated for example in [Rebonato, 1998].

In Section 3.3 we provide conditions under which an affine process converges to a limit distribution. We also characterize the limit distribution in terms of its cumulant generating function, extending results of Jurek and Vervaat [1983] and Sato and Yamazato [1984] for OU-type processes to the class of affine processes. These results can again be interpreted in the context of interest rates, where they can be used to derive the risk-neutral asymptotic distribution of the short rate $(r_t)_{t \geq 0}$ as $t$ goes to infinity.

We conclude our article in Section 4 by applying the theoretical results to several interest rate models, such as the Vasiček model, the CIR model, the JCIR model and an Ornstein-Uhlenbeck-type model.

2. Preliminaries

In this section we collect some key results on affine processes from Duffie et al. [2003]. In their article affine processes are defined on the $(m+n)$-dimensional state space $\mathbb{R}^m_{\geq 0} \times \mathbb{R}^n$, and we will try to simplify notation where this is possible in the one-dimensional case. Results on affine processes with state space $\mathbb{R}_{\geq 0}$ can also be found in Filipović [2001].

YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 3

Definition 2.1 (One-dimensional affine process). A time-homogeneous Markov process $(r_t)_{t \geq 0}$ with state space $D = \mathbb{R}_{\geq 0}$ or $\mathbb{R}$ and its semigroup $(P_t)_{t \geq 0}$ are called affine, if the characteristic function of its transition kernel $p_t(x, \cdot)$, given by
\[ \hat{p}_t(x,u) = \int_D e^{u\xi}\, p_t(x, d\xi) \]
and defined (at least) on
\[ U = \begin{cases} \{u \in \mathbb{C} : \operatorname{Re} u \leq 0\} & \text{if } D = \mathbb{R}_{\geq 0}, \\ \{u \in \mathbb{C} : \operatorname{Re} u = 0\} & \text{if } D = \mathbb{R}, \end{cases} \]
is exponentially affine in $x$.
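That is, there exist $\mathbb{C}$-valued functions $\phi(t,u)$ and $\psi(t,u)$, defined on $\mathbb{R}_{\geq 0} \times U$, such that
\[ \hat{p}_t(x,u) = \exp\left(\phi(t,u) + x\,\psi(t,u)\right) \tag{2.1} \]
for all $x \in D$, $(t,u) \in \mathbb{R}_{\geq 0} \times U$.

For subsequent results the following regularity condition for $(r_t)_{t \geq 0}$ will be needed:

Definition 2.2. An affine process is called regular if it is stochastically continuous and the right-hand derivatives $\partial_t^+ \phi(t,u)|_{t=0}$ and $\partial_t^+ \psi(t,u)|_{t=0}$ exist for all $u \in U$ and are continuous at $u = 0$.

Definition 2.3. The parameters $(a, \alpha, b, \beta, c, \gamma, m, \mu)$ are called admissible for a process with state space $\mathbb{R}_{\geq 0}$ if
\[ a = 0, \qquad \alpha, b, c, \gamma \in \mathbb{R}_{\geq 0}, \qquad \beta \in \mathbb{R}, \]
and $m, \mu$ are Lévy measures on $(0,\infty)$, where $m$ satisfies $\int_{(0,\infty)} (\xi \wedge 1)\, m(d\xi) < \infty$. They are called admissible for a process with state space $\mathbb{R}$ if
\[ a, c \in \mathbb{R}_{\geq 0}, \qquad b, \beta \in \mathbb{R}, \qquad \alpha = 0, \quad \gamma = 0, \quad \mu \equiv 0, \]
and $m$ is a Lévy measure on $\mathbb{R} \setminus \{0\}$. Moreover define the truncation functions
\[ h_F(\xi) = \begin{cases} 0 & \text{if } D = \mathbb{R}_{\geq 0}, \\ \frac{\xi}{1+\xi^2} & \text{if } D = \mathbb{R}, \end{cases} \qquad h_R(\xi) = \begin{cases} \frac{\xi}{1+\xi^2} & \text{if } D = \mathbb{R}_{\geq 0}, \\ 0 & \text{if } D = \mathbb{R}, \end{cases} \]
and finally the functions $F(u)$, $R(u)$ for $u \in \mathbb{C}$ as
\[ F(u) = a u^2 + b u - c + \int_{D \setminus \{0\}} \left(e^{u\xi} - 1 - u\, h_F(\xi)\right) m(d\xi), \tag{2.2} \]
\[ R(u) = \alpha u^2 + \beta u - \gamma + \int_{D \setminus \{0\}} \left(e^{u\xi} - 1 - u\, h_R(\xi)\right) \mu(d\xi). \tag{2.3} \]

The next result is a one-dimensional version of the key result of Duffie et al. [2003]:

Theorem 2.4 (Duffie, Filipović, and Schachermayer, Theorem 2.7). Suppose $(r_t)_{t \geq 0}$ is a one-dimensional regular affine process. Then it is a Feller process. Let $A$ be its infinitesimal generator. Then $C_c^\infty(D)$ is a core of $A$, $C_c^2(D) \subseteq D(A)$ and there exist some admissible parameters $(a, \alpha, b, \beta, c, \gamma, m, \mu)$ such that, for $f \in C_c^2(D)$,
\[ \begin{aligned} Af(x) = {} & (a + \alpha x) f''(x) + (b + \beta x) f'(x) - (c + \gamma x) f(x) \\ & + \int_{D \setminus \{0\}} \left(f(x+\xi) - f(x) - f'(x)\, h_F(\xi)\right) m(d\xi) \\ & + x \int_{D \setminus \{0\}} \left(f(x+\xi) - f(x) - f'(x)\, h_R(\xi)\right) \mu(d\xi). \end{aligned} \tag{2.4} \]
Moreover $\phi(t,u)$ and $\psi(t,u)$, defined by (2.1), solve the generalized Riccati equations
\[ \partial_t \phi(t,u) = F(\psi(t,u)), \qquad \phi(0,u) = 0, \tag{2.5a} \]
\[ \partial_t \psi(t,u) = R(\psi(t,u)), \qquad \psi(0,u) = u. \tag{2.5b} \]
Conversely let $(a, \alpha, b, \beta, c, \gamma, m, \mu)$ be some admissible parameters.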
Then there exists a unique regular affine semigroup $(P_t)_{t \geq 0}$ with infinitesimal generator (2.4), and (2.1) holds with $\phi(t,u)$ and $\psi(t,u)$ given by (2.5).

Closely related to affine processes is the notion of an Ornstein-Uhlenbeck (OU-)type process. These processes are of some importance, since they usually offer good analytic tractability and have been studied for longer than affine processes. Following Sato [1999, Chapter 17] an OU-type process $(X_t)_{t \geq 0}$ can be defined as the solution of the Langevin SDE
\[ dX_t = -\lambda X_t\, dt + dL_t, \qquad \lambda \in \mathbb{R}, \quad X_0 \in \mathbb{R}, \]
where $(L_t)_{t \geq 0}$ is a Lévy process, often called the background driving Lévy process (BDLP). In an equivalent definition, an OU-type process is a time-homogeneous Markov process whose transition kernel $p_t(x, \cdot)$ has the characteristic function
\[ \hat{p}_t(x,u) = \exp\left( \int_0^t F(e^{-\lambda s} u)\, ds + x e^{-\lambda t} u \right), \]
where $F(u)$ is the characteristic exponent of $(L_t)_{t \geq 0}$. From the last equation it is immediately seen that every OU-type process is an affine process in the sense of Definition 2.1. It is also seen that in the generalized Riccati equations (2.5) for an OU-type process necessarily $R(u) = -\lambda u$. Comparing this with (2.3) and Definition 2.3, it is seen that any regular affine process with state space $\mathbb{R}$ is a process of OU-type. The converse, however, is not true, as there also exist OU-type processes with state space $\mathbb{R}_{\geq 0}$. We will give an example of such a process in Section 4.4.

Naturally we will not only be interested in the process $(r_t)_{t \geq 0}$ itself, but also in its integral $\int_0^t r_s\, ds$ and in quantities of the type
\[ Q_t f(x) := E\left[ \exp\left( -\int_0^t r_s\, ds \right) f(r_t) \,\middle|\, r_0 = x \right], \tag{2.6} \]
where $f$ is a bounded function on $D$. The next result is an application of the Feynman-Kac formula for Feller semigroups (cf. Rogers and Williams [1994, Section III.19]) and can be found in Duffie et al. [2003]. It relies on the positivity of $(r_t)_{t \geq 0}$ and is therefore only applicable if $D = \mathbb{R}_{\geq 0}$.

Proposition 2.5 (Duffie, Filipović, and Schachermayer, Proposition 11.1).
Let (rt)t≥0 be a one-dimensional, regular affine process with state space R>0. Then the fam- ily (Qt)t≥0 defined by (2.6) forms a regular, affine semigroup with infinitesimal generator Bf(x) = Af(x)− xf(x) for all f ∈ C2c (D) . We will make extensive use of the convexity and continuous differentiability of the functions F and R from Definition 2.3. These properties are established in this Lemma: Lemma 2.6. If c = γ = 0 then F, R as defined in Definition 2.3 have the following properties: (i) R(0) = 0 and F (0) = 0. (ii) R(u) <∞ for all u ∈ (−∞, 0]. (iii) If F (u) < ∞ on (c1, c2) ⊆ R, then F is either strictly convex on (c1, c2) or F (u) = bu for all u ∈ R. The same holds for R with b replaced by β. (iv) If F (u) <∞ on (c1, c2) ⊆ R, then F is continuously differentiable on (c1, c2). Also the one-sided derivatives at c1 and c2 are defined but may take the values −∞ (at c1) and +∞ (at c2). The same holds for R. Proof. Property (i) is obvious. If D = R then by Definition 2.3 R(u) = βu such that (ii) follows immediately. If D = R>0 we use the estimate (2.7) |euξ − 1− uhR(ξ)| ≤ |u| O(ξ2) ∧ 1 for all u ∈ (−∞, 0] and ξ ∈ R>0, and (ii) follows from (2.3). For Property (iii) note that by the Lévy-Khintchine formula there exists an infinitely divisible random variableX , such that F is its cumulant generating function, i.e. F (u) = logE for u ∈ (c1, c2). Choosing two distinct numbers u, v ∈ (c1, c2), we apply the Cauchy- Schwarz inequality to = logE 2 · e vX2 ≤ log E[euX ] · E[evX ] = F (u) + F (v) which shows convexity of F . The inequality is strict unless there exists some c 6= 0 such that euX = cevX almost surely. This can only be the case if X is con- stant a.s., in which case F is linear. The same argument applies to R. Property (iv) follows from the convexity and from the fact that F and R are analytic on {u ∈ C : Reu ∈ (c1, c2)} (cf. Lukacs [1960, Chapter 7]). � 3. 
Theoretical Results

We will now use the theory from the last section to calculate bond prices, yields and other quantities in an interest rate model where the short rate follows a one-dimensional regular affine process $(r_t)_{t \geq 0}$ under the martingale measure. Naturally we will also make the assumption that $(r_t)_{t \geq 0}$ is conservative, i.e. that $p_t(x, D) = 1$ for all $(t,x) \in \mathbb{R}_{\geq 0} \times D$. This implies by Duffie et al. [2003, Proposition 9.1] that $c = \gamma = 0$ in Definition 2.3. We will need some additional assumptions which are summarized in the following condition:

Condition 3.1. The one-dimensional affine process $(r_t)_{t \geq 0}$ is assumed to be regular and conservative. In addition, if the process has state space $\mathbb{R}$, such that by Definition 2.3 $R(u) = \beta u$, we require that
\[ F(u) < \infty \quad \text{for all } u \in \begin{cases} (1/\beta, 0] & \text{if } \beta < 0, \\ (-\infty, 0] & \text{else.} \end{cases} \tag{3.1} \]

It will be seen that the condition on $F$ is necessary to guarantee existence of bond prices for all maturities in the term structure model. By Sato [1999, Theorem 25.17] we get an equivalent formulation of Condition 3.1 if we replace $F(u) < \infty$ by $\int_{|\xi| > 1} e^{u\xi}\, m(d\xi) < \infty$.

Next we define a quantity that will generalize the coefficient of mean reversion from OU-type processes:

Definition 3.2 (quasi-mean-reversion). Given a one-dimensional conservative affine process $(r_t)_{t \geq 0}$, define the quasi-mean-reversion $\lambda$ as the positive solution of
\[ R(-1/\lambda) = 1. \tag{3.2} \]
If there is no positive solution we set $\lambda = 0$.

Since $R$ is by Lemma 2.6 a convex function satisfying $R(0) = 0$, it is easy to see that (3.2) can have at most one solution and thus $\lambda$ is well-defined. The name quasi-mean-reversion is derived from the fact that if $(r_t)_{t \geq 0}$ is a process of OU-type with positive mean-reversion, then $R(u) = \beta u$ and the quasi-mean-reversion $\lambda = -\beta$ is exactly the coefficient of mean reversion of $(r_t)_{t \geq 0}$. When the process $(r_t)_{t \geq 0}$ satisfies Condition 3.1, it is seen that $F$ must be defined at least on $(-1/\lambda, 0]$.
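As a numerical illustration of Definition 3.2, the following Python sketch finds the positive solution of $R(-1/\lambda) = 1$ by bisection. It assumes, purely for illustration, a CIR-type specification without jumps, $R(u) = \alpha u^2 + \beta u$ with $\alpha = \sigma^2/2$ and $\beta = -a$; the names `R_cir`, `quasi_mean_reversion`, `sigma` and `a` are hypothetical and not taken from the text. For this particular $R$ the equation is a quadratic in $\lambda$ with positive root $\lambda = (a + \sqrt{a^2 + 2\sigma^2})/2$, which serves as a cross-check.

```python
import math

def R_cir(u, sigma, a):
    """R(u) = alpha*u^2 + beta*u with alpha = sigma^2/2 and beta = -a.

    Assumed CIR-type example without jumps (illustrative parameterization).
    """
    return 0.5 * sigma**2 * u**2 - a * u

def quasi_mean_reversion(R, lo=1e-8, hi=1e8, iterations=200):
    """Return the positive solution lam of R(-1/lam) = 1, or 0.0 if none is found.

    For convex R with R(0) = 0, the map lam -> R(-1/lam) - 1 is positive to the
    left of the solution and negative to the right, so bisection applies.
    The bracket [lo, hi] is an assumption of this sketch.
    """
    def g(lam):
        return R(-1.0 / lam) - 1.0

    if g(lo) <= 0.0:
        # No sign change inside the bracket: treat as "no positive solution".
        return 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    sigma, a = 0.1, 0.5
    lam_numeric = quasi_mean_reversion(lambda u: R_cir(u, sigma, a))
    lam_closed = 0.5 * (a + math.sqrt(a * a + 2.0 * sigma**2))
    print(lam_numeric, lam_closed)
```

For a linear $R(u) = \beta u$ with $\beta \geq 0$ the function returns 0.0, in line with the convention $\lambda = 0$ when (3.2) has no positive solution.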
We will encounter several times the condition that $\lambda > 0$. The next result gives an equivalent formulation in terms of $(\alpha, \beta, \mu)$:

Proposition 3.3. The quasi-mean-reversion $\lambda$ is strictly positive if and only if $\alpha > 0$, $\int_{D \setminus \{0\}} h_R(\xi)\, \mu(d\xi) = \infty$, or $\beta - \int_{D \setminus \{0\}} h_R(\xi)\, \mu(d\xi) < 0$.

Proof. First note that by Lemma 2.6 $R(u) < \infty$ for all $u \in (-\infty, 0]$. Using the estimate (2.7) and a dominated convergence argument it is seen from (2.3) that
\[ \lim_{u \to -\infty} \frac{R(u)}{u^2} = \alpha, \tag{3.3} \]
\[ \lim_{u \to -\infty} \frac{R(u) - \alpha u^2}{u} = \beta_0 := \beta - \int_{D \setminus \{0\}} h_R(\xi)\, \mu(d\xi), \tag{3.4} \]
where $\beta_0$ can also take the value $-\infty$. Suppose now that $\alpha > 0$. Then by (3.3) we get $\lim_{u \to -\infty} R(u) = \infty$. Since $R(0) = 0$ and $R$ is continuous it follows that there exists a $\lambda > 0$ such that $R(-1/\lambda) = 1$. Similarly if $\alpha = 0$, but $\beta_0 < 0$, it follows from (3.4) that $\lim_{u \to -\infty} R(u) = \infty$ and thus again that $\lambda > 0$. Conversely, suppose that $\alpha = 0$ and $\beta_0 \geq 0$. Then
\[ \lim_{u \to -\infty} R'(u) = \lim_{u \to -\infty} \frac{R(u)}{u} = \beta_0 \geq 0. \]
By the convexity of $R$ it follows that $R'(u) \geq 0$ for all $u \in (-\infty, 0)$. Since $R(0) = 0$ this implies that $R(u) \leq 0$ for all $u \in (-\infty, 0)$, and consequently that $\lambda = 0$. □

3.1. Bond Prices. We consider now the price $P(t, t+x)$ of a zero-coupon bond with time to maturity $x$, at time $t$, given by
\[ P(t, t+x) = E\left[ \exp\left( -\int_t^{t+x} r_s\, ds \right) \,\middle|\, \mathcal{F}_t \right]. \]
The affine structure of $(r_t)_{t \geq 0}$ carries over to the bond prices, and we get the following result:

Proposition 3.4. Let the short rate be given by a one-dimensional affine process $(r_t)_{t \geq 0}$ satisfying Condition 3.1. Then the bond price $P(t, t+x)$ exists for all $t, x \geq 0$ and is given by
\[ P(t, t+x) = \exp\left( A(x) + r_t B(x) \right), \tag{3.5} \]
where $A$ and $B$ solve the generalized Riccati equations
\[ \partial_x A(x) = F(B(x)), \qquad A(0) = 0, \tag{3.6a} \]
\[ \partial_x B(x) = R(B(x)) - 1, \qquad B(0) = 0. \tag{3.6b} \]

Proof. If $D = \mathbb{R}_{\geq 0}$ the assertion follows directly from Proposition 2.5 by noting that $P(t, t+x) = Q_x 1$. If $D = \mathbb{R}$ then, as discussed after Theorem 2.4, $(r_t)_{t \geq 0}$ is a process of OU-type and $R(u)$ has the simple structure $R(u) = \beta u$.
By Sato [1999, (17.2) - (17.3)] we obtain in this case directly that
\[ E\left[ \exp\left( -\int_t^{t+x} r_s\, ds \right) \,\middle|\, \mathcal{F}_t \right] = \exp\left( \int_0^x F(B(s))\, ds + r_t B(x) \right) \tag{3.7} \]
with $B(x) = (1 - e^{\beta x})/\beta$ if $\beta \neq 0$ and $B(x) = -x$ when $\beta = 0$. As a function of $x \in \mathbb{R}_{\geq 0}$, $B$ is continuously decreasing from $0$ to $1/\beta$ if $\beta < 0$, and from $0$ to $-\infty$ if $\beta \geq 0$. It is therefore seen that the integral on the right side of (3.7) is finite for all $x \in \mathbb{R}_{\geq 0}$ if and only if $F$ satisfies (3.1), as required by Condition 3.1. □

Corollary 3.5. Let $(r_t)_{t \geq 0}$ satisfy Condition 3.1 and have quasi-mean-reversion $\lambda$. Then the function $B(x)$ from Proposition 3.4 is strictly decreasing and satisfies
\[ \lim_{x \to \infty} B(x) = -1/\lambda. \]

Proof. The result follows from a qualitative analysis of the autonomous ODE (3.6b). Let $\lambda > 0$. Since $R(-1/\lambda) - 1 = 0$ the point $x^* := -1/\lambda$ is a critical point of (3.6b). By the convexity of $R$ and the fact that $R(0) = 0$ it follows that $R'(x^*) < 0$, such that $x^*$ is asymptotically stable, i.e. solutions entering a small enough neighborhood of $x^*$ must converge to $x^*$. Since $R(x) - 1 < 0$ for $x \in (x^*, 0]$ and there is no other critical point in $(x^*, 0]$, we conclude that $B(x)$, the solution of (3.6b) starting at $0$, is strictly decreasing and converges to $x^*$. If $\lambda = 0$ then there is no critical point in $(-\infty, 0]$ and $R(x) - 1 < 0$ for $x \in (-\infty, 0]$. It follows that $B(x)$ is strictly decreasing and diverges to $-\infty$. □

3.2. The Yield Curve and the Forward Rate Curve. The next results are the central theoretical results of this article and describe the global shapes of attainable yield curves in any affine one-factor term structure model.

Definition 3.6. The (zero-coupon) yield $Y(r_t, x)$ is given by $Y(r_t, 0) := r_t$ and
\[ Y(r_t, x) := -\frac{\log P(t, t+x)}{x} = -\frac{A(x)}{x} - \frac{r_t B(x)}{x} \quad \text{for all } x > 0. \tag{3.8} \]
For $r_t$ fixed, we call the function $Y(r_t, \cdot)$ the yield curve. The (instantaneous) forward rate $f(r_t, x)$ is given by $f(r_t, 0) := r_t$ and
\[ f(r_t, x) := -\partial_x \log P(t, t+x) = -A'(x) - r_t B'(x) \quad \text{for all } x > 0. \tag{3.9} \]
For $r_t$ fixed, we call the function $f(r_t, \cdot)$ the forward rate curve.
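The yield curve of Definition 3.6 can be computed numerically by integrating the Riccati equations (3.6) and then forming $Y(r_t, x) = -(A(x) + r_t B(x))/x$. The Python sketch below does this with a classical fourth-order Runge-Kutta scheme. The integrator, the function names and the CIR-type example $F(u) = a\theta u$, $R(u) = \tfrac{1}{2}\sigma^2 u^2 - a u$ with its parameter values are assumptions made for illustration, not part of the text.

```python
import math

def solve_riccati(F, R, x_max, n_steps=1000):
    """Integrate A' = F(B), B' = R(B) - 1 with A(0) = B(0) = 0 by classical RK4.

    Returns the grid xs and the values As, Bs on that grid.
    """
    h = x_max / n_steps
    A, B = 0.0, 0.0
    xs, As, Bs = [0.0], [0.0], [0.0]
    for i in range(n_steps):
        # The system is autonomous and both right-hand sides depend on B only.
        k1 = R(B) - 1.0
        k2 = R(B + 0.5 * h * k1) - 1.0
        k3 = R(B + 0.5 * h * k2) - 1.0
        k4 = R(B + h * k3) - 1.0
        l1 = F(B)
        l2 = F(B + 0.5 * h * k1)
        l3 = F(B + 0.5 * h * k2)
        l4 = F(B + h * k3)
        A += h * (l1 + 2.0 * l2 + 2.0 * l3 + l4) / 6.0
        B += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append((i + 1) * h)
        As.append(A)
        Bs.append(B)
    return xs, As, Bs

def yield_curve(F, R, r, x_max, n_steps=1000):
    """Return (xs, ys) with ys[0] = r and ys[j] = -(A(x_j) + r*B(x_j)) / x_j for j >= 1."""
    xs, As, Bs = solve_riccati(F, R, x_max, n_steps)
    ys = [r] + [-(A + r * B) / x for x, A, B in zip(xs[1:], As[1:], Bs[1:])]
    return xs, ys

if __name__ == "__main__":
    # Hypothetical CIR-type parameters, chosen only for illustration.
    a, theta, sigma = 0.5, 0.04, 0.1
    F = lambda u: a * theta * u
    R = lambda u: 0.5 * sigma**2 * u**2 - a * u
    xs, ys = yield_curve(F, R, r=0.03, x_max=10.0)
    for x, y in list(zip(xs, ys))[::200]:
        print(f"maturity {x:5.2f}  yield {y:.5f}")
```

For this jump-free CIR-type choice the function $B$ is known in closed form, $B(x) = -2(e^{gx} - 1)/((g + a)(e^{gx} - 1) + 2g)$ with $g = \sqrt{a^2 + 2\sigma^2}$, which gives a convenient check on the integrator.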
By l’Hospital’s rule and the generalized Riccati equations (3.6) it is seen that both the yield and the forward rate curve are continuous at $0$.

The first quantity associated with the yield curve that we consider is the asymptotic level $b_{\mathrm{asymp}}$ of the yield curve as $x \to \infty$, also known as long-term yield, consol yield or simply ‘long end’.

Theorem 3.7. Let the short rate process be given by a one-dimensional affine process $(r_t)_{t \geq 0}$ satisfying Condition 3.1 with quasi-mean-reversion $\lambda$. If $\lambda > 0$ then
\[ b_{\mathrm{asymp}} := \lim_{x \to \infty} Y(r_t, x) = \lim_{x \to \infty} f(r_t, x) = -F(-1/\lambda). \]
If $\lambda = 0$ then
\[ b_{\mathrm{asymp}} = \lim_{u \to -\infty} \left( -F(u) + r_t \left(1 - R(u)\right) \right). \]

Proof. From (3.6a) we obtain that
\[ \lim_{x \to \infty} \frac{A(x)}{x} = \lim_{x \to \infty} A'(x) = \lim_{x \to \infty} F(B(x)). \tag{3.10} \]
If $\lambda > 0$ then by Corollary 3.5
\[ \lim_{x \to \infty} B(x) = -1/\lambda, \qquad \lim_{x \to \infty} \frac{B(x)}{x} = 0 \qquad \text{and} \qquad \lim_{x \to \infty} B'(x) = 0, \tag{3.11} \]
and the assertion follows by combining (3.8) – (3.11). If $\lambda = 0$ then $\lim_{x \to \infty} B(x) = -\infty$ and
\[ \lim_{x \to \infty} \frac{B(x)}{x} = \lim_{x \to \infty} B'(x) = \lim_{x \to \infty} \left( R(B(x)) - 1 \right). \]
By setting $u := B(x)$ we obtain the desired result. □

From Theorem 3.7 it is clear that for practical purposes only models with $\lambda > 0$ will be useful. So far we know that in this case the short end of the yield curve is given by $Y(r_t, 0) = r_t$ and the long end by $Y(r_t, \infty) = b_{\mathrm{asymp}}$. We will now examine what happens between these two endpoints.

Definition 3.8. The yield curve $Y(r_t, x)$ is called
• normal if it is a strictly increasing function of $x$,
• inverse if it is a strictly decreasing function of $x$,
• humped if it has exactly one local maximum and no minimum on $(0, \infty)$.
In addition we call the yield curve flat if it is constant over all $x \in \mathbb{R}_{\geq 0}$.

This is our main result on the shapes of yield curves in affine one-factor models:

Theorem 3.9. Let the risk-neutral short rate process be given by a one-dimensional affine process $(r_t)_{t \geq 0}$ satisfying Condition 3.1 and with quasi-mean-reversion $\lambda > 0$. In addition suppose that $F \neq 0$ and that either $F$ or $R$ is non-linear. Then the following holds:
• The yield curve $Y(r_t, \cdot)$ can only be normal, inverse or humped.
• Define
\[ b_{\mathrm{norm}} := -\frac{F'(-1/\lambda)}{R'(-1/\lambda)} \qquad \text{and} \qquad b_{\mathrm{inv}} := \begin{cases} -\dfrac{F'(0)}{R'(0)} & \text{if } R'(0) < 0, \\ +\infty & \text{if } R'(0) \geq 0. \end{cases} \]
The yield curve is normal if $r_t \leq b_{\mathrm{norm}}$, humped if $b_{\mathrm{norm}} < r_t < b_{\mathrm{inv}}$ and inverse if $r_t \geq b_{\mathrm{inv}}$.

The above theorem is visualized in Figure 1. For its proof we will use a simple lemma. We state the lemma without proof, since it follows in an elementary way from the usual definition of a convex function on $\mathbb{R}$.

Lemma 3.10. A strictly convex or a strictly concave function on $\mathbb{R}$ intersects an affine function in at most two points. In the case of two intersection points $p_1 < p_2$, the convex function lies strictly below the affine function on the interval $(p_1, p_2)$; if the function is concave it lies strictly above the affine function on $(p_1, p_2)$.

Proof of Theorem 3.9. Define the function $H(x) : \mathbb{R}_{\geq 0} \to \mathbb{R}$ by
\[ H(x) := Y(r_t, x)\, x = -A(x) - r_t B(x). \tag{3.12} \]
We will see that the convexity behavior of $H$ will be crucial for the shape of the yield curve $Y(r_t, \cdot)$. From the generalized Riccati equations (3.6) the first derivative of $H$ is calculated as
\[ \partial_x H(x) = -F(B(x)) - r_t \left( R(B(x)) - 1 \right) \tag{3.13} \]
and the second as
\[ \partial_{xx} H(x) = -B'(x) \left( F'(B(x)) + r_t R'(B(x)) \right). \tag{3.14} \]
Note that $F$ and $R$ are continuously differentiable by Lemma 2.6, and also $B$ by (3.6b), such that the second derivative of $H$ is well-defined and continuous. Since $B$ is strictly decreasing by Corollary 3.5, the factor $-B'(x)$ is positive for all $x \in \mathbb{R}_{\geq 0}$. The sign of $\partial_{xx} H(x)$ therefore equals the sign of
\[ k(x) := F'(B(x)) + r_t R'(B(x)). \tag{3.15} \]
From the fact that $B$ is decreasing and $F$ and $R$ are convex it is obvious that $k$ must be decreasing. We will now show that $k$ has at most a single zero in $[0, \infty)$:
(a) $D = \mathbb{R}_{\geq 0}$: We have assumed that either $F$ or $R$ is non-linear. By Lemma 2.6 this implies that either $F$ or $R$ is strictly convex, and thus that either $F'$ or $R'$ is strictly increasing. If $r_t > 0$, then it follows that $k$ is strictly decreasing and thus has at most a single zero.
If $r_t = 0$, an additional argument is needed: It could happen that $F$ is of the form $F(u) = bu$, such that $k(x) = b$ and $k$ is no longer strictly decreasing. However, by assumption, $F \neq 0$, such that in this case $k$ has no zero in $[0, \infty)$.
(b) $D = \mathbb{R}$: In this case, by the admissibility conditions in Definition 2.3, we have necessarily $R(u) = \beta u$. Also, since either $F$ or $R$ is non-linear, $F$ must be non-linear and thus by Lemma 2.6 strictly convex. It follows that $k(x) = F'(B(x)) + r_t \beta$ is strictly decreasing and thus has at most a single zero in $[0, \infty)$.

We have shown that $k$ is decreasing and has at most a single zero; to determine whether it has a zero for some value of $r_t$, we consider the two ‘endpoints’ $k(0)$ and $\lim_{x \to \infty} k(x)$. First we show that
\[ k(0) \geq 0 \quad \text{if and only if} \quad r_t \leq b_{\mathrm{inv}} := \begin{cases} -\dfrac{F'(0)}{R'(0)} & \text{if } R'(0) < 0, \\ +\infty & \text{if } R'(0) \geq 0. \end{cases} \tag{3.16} \]
Since $B(0) = 0$ by Proposition 3.4 it follows that
\[ k(0) = F'(0) + r_t R'(0). \]
We distinguish two cases:
(a) If $R'(0) < 0$ then the assertion (3.16) follows immediately.
(b) Consider the case that $R'(0) \geq 0$: Assume that $D = \mathbb{R}$. Then we have $R(u) = \beta u$ and $R'(0) = \beta \geq 0$. This, however, stands in contradiction to our assumption $\lambda > 0$, which implies that $\beta = -\lambda < 0$ (cf. Definition 3.2). Thus we must have $D = \mathbb{R}_{\geq 0}$ and $r_t \geq 0$; in this case it follows that $k(0) \geq 0$ for all $r_t \in D$, and we set $b_{\mathrm{inv}} = +\infty$.

Next we consider the right end of $k(x)$ and show that
\[ \lim_{x \to \infty} k(x) \leq 0 \quad \text{if and only if} \quad r_t \geq b_{\mathrm{norm}} := -\frac{F'(-1/\lambda)}{R'(-1/\lambda)}. \tag{3.17} \]
Since $\lim_{x \to \infty} B(x) = -1/\lambda$ by Corollary 3.5 we have that
\[ \lim_{x \to \infty} k(x) = F'(-1/\lambda) + r_t R'(-1/\lambda). \tag{3.18} \]
By assumption $\lambda > 0$, and by Definition 3.2 it holds that $R(-1/\lambda) = 1$. Also $R(0) = 0$, and by the mean value theorem
\[ 1 = R(-1/\lambda) - R(0) = -\frac{1}{\lambda} R'(\xi) \]
for some $\xi \in (-1/\lambda, 0)$. Since $R'$ is increasing, it follows that $R'(-1/\lambda) \leq -\lambda < 0$, and we can deduce (3.17) directly from (3.18).

We summarize our results on the function $k$ so far: $k$ stays negative on $(0, \infty)$ if $r_t \geq b_{\mathrm{inv}}$ and positive if $r_t \leq b_{\mathrm{norm}}$.
It has a single zero on $(0, \infty)$ if and only if $b_{\mathrm{norm}} < r_t < b_{\mathrm{inv}}$. If $k$ has a zero on $(0, \infty)$, then, since $k$ is decreasing, the sign of $k$ will be positive to the left of the zero and negative to the right of the zero. Since $\partial_{xx} H$ has the same sign as $k$, the statements above translate in the obvious way to the convexity behavior of $H$.

We will now use the convexity behavior of $H$ to derive our results about the yield curve. Consider the equation
\[ H(x) = cx, \qquad x \in [0, \infty) \tag{3.19} \]
for some fixed $c \in \mathbb{R}$. Since $H(0) = 0$ this equation has at least one solution, $x_0 = 0$. If $r_t \geq b_{\mathrm{inv}}$ then $H(x)$ is strictly concave on $[0, \infty)$, and by Lemma 3.10 the equation (3.19) has at most one additional solution $x_1$. Also, when the solution exists, $H(x)$ crosses $cx$ from above at $x_1$. Similarly if $r_t \leq b_{\mathrm{norm}}$ then $H(x)$ is strictly convex, and there exists at most one additional solution $x_2$ to (3.19) on $[0, \infty)$. If the solution exists, then $cx$ is crossed from below at $x_2$. In the last case $b_{\mathrm{norm}} < r_t < b_{\mathrm{inv}}$, there exists an $x^*$, the zero of $k(x)$, such that $H(x)$ is strictly convex on $(0, x^*)$ and strictly concave on $(x^*, \infty)$. Now there can exist at most two additional solutions $x_1, x_2$ to (3.19) with $x_1 < x^* < x_2$, such that $cx$ is crossed from below at $x_1$ and from above at $x_2$.

Because of definition (3.12), every solution to (3.19), excluding $x_0 = 0$, is also a solution to
\[ Y(r_t, x) = c, \qquad x \in (0, \infty) \tag{3.20} \]
with $r_t$ fixed. Also the properties of crossing from above/below are preserved since $x$ is positive. This means that in the case $r_t \geq b_{\mathrm{inv}}$, equation (3.20) has at most a single solution, or in other words, that every horizontal line is crossed by the yield curve in at most a single point. If it is crossed, it is crossed from above. This
implies that $Y(r_t, x)$ is a strictly decreasing function of $x$, or following Definition 3.8, that the yield curve is inverse. In the case $r_t \leq b_{\mathrm{norm}}$ we have again that (3.20) has at most a single solution and that every horizontal line is crossed from below by the yield curve, if it is crossed. In other words, the yield curve is normal. In the last case of $b_{\mathrm{norm}} < r_t < b_{\mathrm{inv}}$, the yield curve crosses every horizontal line at most twice, in which case it crosses first from below, then from above. Thus in this case the yield curve is humped. □

Corollary 3.11. Under the conditions of Theorem 3.9 the instantaneous forward rate curve has the same global behavior as the yield curve, i.e.
• $Y(r_t, \cdot)$ is inverse $\iff$ $f(r_t, \cdot)$ is strictly decreasing,
• $Y(r_t, \cdot)$ is humped $\iff$ $f(r_t, \cdot)$ has exactly one local maximum and no local minimum,
• $Y(r_t, \cdot)$ is normal $\iff$ $f(r_t, \cdot)$ is strictly increasing.
In the second case the maximum of the forward rate curve is $f(r_t, x^*)$, where $x^*$ solves
\[ r_t = -\frac{F'(B(x))}{R'(B(x))}, \qquad x \in (0, \infty). \tag{3.21} \]

Proof. This follows from the fact that $\partial_x H(x)$ as given in (3.13) is exactly the forward rate $f(r_t, x)$. The derivative of the forward rate is therefore $\partial_{xx} H(x)$, which is given in (3.14) as
\[ \partial_x f(r_t, x) = \partial_{xx} H(x) = -B'(x) \cdot k(x). \]
The factor $-B'(x)$ is always positive, and the possible sign changes and zeroes of $k(x)$ are discussed in the proof of Theorem 3.9, leading to the stated equivalences. Equation (3.21) is simply the condition $k(x^*) = 0$. □

Corollary 3.12. Under the conditions of Theorem 3.9 it holds that
\[ b_{\mathrm{norm}} < b_{\mathrm{asymp}} < b_{\mathrm{inv}} \tag{3.22} \]
whenever the quantities are finite. In addition it holds that
\[ D \cap (b_{\mathrm{norm}}, b_{\mathrm{inv}}) \neq \emptyset. \tag{3.23} \]

Remark 3.13. Note that equation (3.23) implies that there is always some $r_t \in D$ such that the yield curve $Y(r_t, \cdot)$ is humped.

Proof. By the mean value theorem there exists a $\xi \in (-1/\lambda, 0)$ such that
\[ b_{\mathrm{asymp}} = -F(-1/\lambda) = F(0) - F(-1/\lambda) = \frac{1}{\lambda} F'(\xi). \]
Since $F$ is convex and thus $F'$ increasing, it holds that
\[ \frac{1}{\lambda} F'(-1/\lambda) \leq b_{\mathrm{asymp}} \leq \frac{1}{\lambda} F'(0). \tag{3.24} \]
Applying the mean value theorem to $R$, there exists another $\xi \in (-1/\lambda, 0)$ such that
\[ 1 = R(-1/\lambda) - R(0) = -\frac{1}{\lambda} R'(\xi). \]
Since R′ is increasing we deduce that R′(−1/λ) ≤ −λ < 0. Assuming that also R′(0) < 0 we get

(3.25) $-\frac{1}{R'(-1/\lambda)} \le \frac{1}{\lambda} \le -\frac{1}{R'(0)}$.

Since either F or R is non-linear, one of the functions is strictly convex by Lemma 2.6. Consequently either both inequalities in (3.24) or in (3.25) are strict. Putting them together we get

$-\frac{F'(-1/\lambda)}{R'(-1/\lambda)} < b_{asymp} < -\frac{F'(0)}{R'(0)}$,

proving (3.22) under the assumption that R′(0) < 0. If R′(0) ≥ 0 then by definition b_inv = ∞. Equation (3.24) still holds, but in (3.25) only the left inequality sign remains valid. Together this still proves that b_norm < b_asymp and we have shown (3.22).

To prove (3.23) we distinguish two cases:

(a) D = R. In this case it is sufficient to prove −∞ < b_inv and b_norm < ∞. Consider first b_inv. If R′(0) ≥ 0 then by definition b_inv = ∞ and there is nothing to prove. If R′(0) < 0 then b_inv = −F′(0)/R′(0). By convexity F′(0) > −∞ and the assertion follows. Consider now b_norm = −F′(−1/λ)/R′(−1/λ). From (3.25) we know that R′(−1/λ) ≤ −λ < 0. By convexity F′(−1/λ) < ∞ and it follows that b_norm < ∞.

(b) D = R>0. In this case it is sufficient to prove 0 ≤ b_norm and to apply (3.22). As above we have that b_norm = −F′(−1/λ)/R′(−1/λ) and that R′(−1/λ) ≤ −λ < 0. By Definition 2.3,

$F'(-1/\lambda) = b + \int_{(0,\infty)} \xi e^{-\xi/\lambda}\, m(d\xi)$

with b ≥ 0. It follows that F′(−1/λ) ≥ 0, proving the assertion. □

The last corollary of this section shows the interesting fact that the occurrence of a humped yield curve is a necessary and sufficient sign of randomness in the short rate model:

Corollary 3.14. Let the risk-neutral short rate process be given by a one-dimensional affine process (r_t)t≥0 satisfying Condition 3.1 with F ≠ 0 and quasi-mean-reversion λ > 0. Then the following statements are equivalent:

(i) There exists an r_t ∈ D such that Y(r_t, .) is flat.
(ii) There exists no r_t ∈ D such that Y(r_t, .) is humped.
(iii) The short rate process (r_t)t≥0 is deterministic.
(iv) F(u) = bu and R(u) = βu.

Proof.
Theorem 3.9, together with Corollary 3.12, shows already that ¬(iv) implies ¬(i) and ¬(ii). Also, from the form of the generator in (2.4), it is seen that (iii) and (iv) are equivalent. It remains to show that (iv) implies (i) and (ii). Proceeding as in the proof of Theorem 3.9 we obtain instead of (3.15) simply

k(x) = b + r_t β.

The yield curve will be humped if and only if k has a single (isolated) zero in [0,∞). Since k is a constant function, this cannot be the case for any r_t ∈ D and we have shown (ii). By the same arguments as in the proof of Theorem 3.9 the yield curve is flat if and only if k is constant and equal to 0. This is the case if r_t = −b/β. It remains to show that r_t ∈ D. Note that β = −λ < 0. In particular β ≠ 0, such that for D = R we are already done. If D = R>0 we have by the admissibility conditions in Definition 2.3 that b ≥ 0. Thus r_t = −b/β ≥ 0 and we have shown (i). □

[Figure 1: yield curves plotted against time to maturity, with the levels b_inv = −F′(0)/R′(0), b_asymp = −F(−1/λ) and b_norm = −F′(−1/λ)/R′(−1/λ) marked.]

Figure 1. This figure shows a graphical summary of Theorems 3.7 and 3.9, as well as the definitions of the key quantities b_norm, b_asymp and b_inv. In any affine model satisfying the conditions of Theorem 3.9, the shapes of yield curves will follow the picture given here. They will be normal if r_0 is below b_norm, humped if r_0 is between b_norm and b_inv and inverse if r_0 is above b_inv. Also all yield curves will tend asymptotically to the same level b_asymp.

3.3. The Limit Distribution of an Affine Process. It is well-known that the Gaussian Ornstein-Uhlenbeck process, for example, converges in law to a limit distribution and that this distribution is Gaussian. The goal of this section is to establish a corresponding result for affine processes.
While calculating the marginal distributions of an affine process involves solving the generalized Riccati equations (2.5), it will be seen that the limit distribution is much easier to obtain and can be determined directly from the functions F and R. In the interest rate model considered in the preceding section, the short rate follows an affine process under the martingale measure, such that the results will allow us to characterize the risk-neutral asymptotic short rate distribution. Often the limit distribution under the objective measure is also of interest, but the affine property is in general not preserved by an equivalent change of measure, such that the results are not directly applicable. Nevertheless, for the sake of tractability, conditions on the measure change can be imposed, such that the model is affine under both the objective and the risk-neutral measure. (See Nicolato and Venardos [2003] for an example from option pricing and Cheridito et al. [2005] for more general results.) In such a setting the results can also be applied under the objective measure.

Before we state the result, we want to recall that a real-valued random variable L is called self-decomposable if for every c ∈ (0, 1) there exists a random variable L_c, independent of L, such that

$L \overset{d}{=} cL + L_c$.

Since self-decomposability is a distributional property, we will identify L and its law, and refer to both as self-decomposable. For OU-type processes, limit distributions have been studied for some time; the first results can be found in Jurek and Vervaat [1983] and Sato and Yamazato [1984]. The next theorem summarizes these results, and can be found in similar form in Sato [1999, Theorem 17.5]:

Theorem 3.15. Let (r_t)t≥0 be an OU-type process on R.
If β < 0 and $\int_{|\xi|>1} \log|\xi|\, m(d\xi) < \infty$, then (r_t)t≥0 converges in law to a limit distribution L which is independent of r_0 and has the following properties:

(i) L is self-decomposable.
(ii) The cumulant generating function $\kappa(u) = \log \int_{\mathbb{R}} e^{ux}\, dL(x)$ satisfies

(3.26) $\kappa(iu) = -\frac{1}{\beta}\int_0^u \frac{F(is)}{s}\, ds$ for all u ∈ R.

Conversely, if L is a self-decomposable distribution on R and β < 0, there exists a unique triplet (a, b, m) satisfying the admissibility conditions of Definition 2.3, such that L is the limit distribution of the affine process (of OU-type) given by the parameters (a, b, m, β).

As discussed in Section 2, every regular affine process with state space R is of OU-type, such that the above theorem applies. We now state our corresponding result for affine processes on R>0:

Theorem 3.16. Let (r_t)t≥0 be a one-dimensional, regular, conservative affine process with state space R>0. If R′(0) < 0 and $\int_{\xi>1} \log\xi\, m(d\xi) < \infty$, then (r_t)t≥0 converges in law to a limit distribution L which is independent of r_0, and whose cumulant generating function κ is given by

(3.27) $\kappa(u) = \int_u^0 \frac{F(s)}{R(s)}\, ds$ for all u ∈ (−∞, 0].

Proof. By Theorem 2.4 the transition kernel p_t(x, .) of the process (r_t)t≥0 has the characteristic function

p̂_t(x, u) = exp(φ(t, u) + xψ(t, u)),

where φ and ψ satisfy the generalized Riccati equations (2.5) for all u ∈ U, and thus in particular for all u ∈ (−∞, 0]. Since R(0) = 0, 0 is a critical point of the autonomous ODE (2.5b), and by the assumption R′(0) < 0 it is asymptotically stable. By the convexity of R, R′(0) < 0 also implies that R(u) > 0 for all u ∈ (−∞, 0), such that ψ(t, u) is a strictly increasing function in t for all u ∈ (−∞, 0). Since 0 is the only critical point of (2.5b) on (−∞, 0] it also follows that

$\lim_{t\to\infty} \psi(t,u) = 0$ for all u ∈ (−∞, 0].

Consequently,

(3.28) $\lim_{t\to\infty} \log \hat{p}_t(x,u) = \lim_{t\to\infty} \phi(t,u) = \int_0^\infty F(\psi(r,u))\, dr = \int_u^0 \frac{F(s)}{R(s)}\, ds$,

where the last two equalities follow from (2.5) and the transformation s = ψ(r, u).
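The change of variables behind the last equality in (3.28) can be spelled out as follows. This is only a sketch: it assumes that the generalized Riccati equations (2.5), which are not reproduced in this section, have the usual form ∂_t φ = F(ψ), ∂_t ψ = R(ψ) with φ(0, u) = 0 and ψ(0, u) = u.

```latex
\begin{aligned}
\phi(t,u) &= \int_0^t F(\psi(r,u))\,dr, \\
s = \psi(r,u) &\;\Longrightarrow\; ds = \partial_r\psi(r,u)\,dr = R(s)\,dr, \\
\lim_{t\to\infty}\phi(t,u) &= \int_{\psi(0,u)}^{\lim_{r\to\infty}\psi(r,u)} \frac{F(s)}{R(s)}\,ds
  = \int_u^0 \frac{F(s)}{R(s)}\,ds .
\end{aligned}
```

The substitution is admissible for u < 0 because ψ(r, u) is strictly increasing in r, and for u = 0 both sides vanish.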
We will now show that the last integral in (3.28) converges absolutely for all u ∈ (−∞, 0]: Since R(u) ≥ 0 and F(u) ≤ 0 for all u ∈ (−∞, 0] we obtain

$\int_u^0 \left|\frac{F(s)}{R(s)}\right| ds = -\int_u^0 \frac{F(s)}{R(s)}\, ds \le -\frac{1}{R'(0)} \int_u^0 \frac{F(s)}{s}\, ds, \quad u \in (-\infty, 0]$,

where the inequality follows from the fact that the convex function R is supported by its tangent at 0. From the definition of F(u) in (2.2) it is clear that the convergence of the last integral will depend only on the jump part of F, i.e. the integral converges if and only if

(3.29) $\int_u^0 \int_{(0,\infty)} \frac{e^{s\xi}-1}{s}\, m(d\xi)\, ds < \infty$ for all u ∈ (−∞, 0].

Define $M(u,\xi) = \int_u^0 \frac{e^{s\xi}-1}{s}\, ds$. For a fixed u ∈ (−∞, 0], it is easily verified that M(u, ξ) = O(ξ) as ξ → 0, and that M(u, ξ) = O(log ξ) as ξ → ∞. Since the Lévy measure m(dξ) integrates (ξ ∧ 1) by Definition 2.3, and log ξ · 1{ξ>1} by assumption, it must also integrate M(u, ξ). Applying Fubini’s theorem, (3.29) follows, such that $\kappa(u) := \int_u^0 \frac{F(s)}{R(s)}\, ds$ converges for all u ∈ (−∞, 0]. In particular lim_{u↑0} κ(u) = 0, such that the limit in (3.28) is a function that is left-continuous at 0. By standard results on Laplace transforms of probability measures (cf. Steutel and van Harn [2004, Theorem A.3.1]), the pointwise convergence of cumulant generating functions to a function that is left-continuous at 0 implies convergence in distribution of (r_t)t≥0 to a limit distribution L with cumulant generating function given by (3.28). □

Since the marginal distributions of an affine process are infinitely divisible, the limit distribution L must also be infinitely divisible, if it exists. In Theorem 3.15 a stronger result is given for an affine process on R: In this case L is also self-decomposable. An obvious question is whether this result can be extended to the state space R>0. We will see that the answer is negative. In Section 4.3 an example of an affine process with state space R>0 is given, which converges to an infinitely divisible limit distribution that is not self-decomposable.
This result is interesting, since it leaves open the possibility of some unexpected properties of the limit distribution of an affine process. For example a self-decomposable distribution is always unimodal, whereas an infinitely divisible distribution need not be.

4. Applications

4.1. The Vasiček model. We apply the results of the last section to the classical Vasiček model

(4.1) $dr_t = -\lambda(r_t - \theta)\, dt + \sigma\, dW_t, \quad r_0 \in \mathbb{R}$,

where (W_t)t≥0 is a standard Brownian motion under the risk-neutral measure and λ, θ, σ > 0. The Vasiček model is arguably the simplest affine model, and no surprises are to be expected here. In fact all results that we state here can already be found in the original paper of Vasiček [1977]. We advise the reader to view this paragraph as a warm-up for the following examples. Clearly (r_t)t≥0 is a conservative affine process with

(4.2) $F(u) = \lambda\theta u + \frac{\sigma^2}{2}u^2$,
(4.3) $R(u) = -\lambda u$.

From the quadratic term in F and Definition 2.3, it is seen that (r_t)t≥0 has state space R. This property is often criticized, since it allows the short rate to become negative. From Theorem 3.9 we calculate

$b_{inv} = \theta$ and $b_{norm} = \theta - \frac{\sigma^2}{\lambda^2}$,

such that the yield curve in the Vasiček model is normal if r_t ≤ θ − σ²/λ², inverse if r_t ≥ θ and humped in the remaining cases. The long term yield is calculated from (3.7) as

$b_{asymp} = -F(-1/\lambda) = \theta - \frac{\sigma^2}{2\lambda^2}$,

in this case exactly the arithmetic mean of b_inv and b_norm. Theorem 3.15 applies and the cumulant generating function κ of the risk-neutral limit distribution L satisfies

$\kappa(iu) = -\frac{1}{\beta}\int_0^u \frac{F(is)}{s}\, ds = \int_0^u \left(i\theta - \frac{\sigma^2}{2\lambda}s\right) ds = ui\theta - \frac{\sigma^2}{4\lambda}u^2$

for u ∈ R, where β = −λ. Hence, L is Gaussian with mean θ and variance σ²/(2λ).

4.2. The Cox-Ingersoll-Ross model. The Cox-Ingersoll-Ross (CIR) model was introduced by Cox et al. [1985]. In this model the short rate process (r_t)t≥0 is given by the SDE

(4.4) $dr_t = -a(r_t - \theta)\, dt + \sigma\sqrt{r_t}\, dW_t, \quad r_0 \in \mathbb{R}_{>0}$,

where (W_t)t≥0 is a standard Brownian motion under the risk-neutral measure and a, θ, σ > 0.
The process (r_t)t≥0 is a conservative affine process with

(4.5) $F(u) = a\theta u$,
(4.6) $R(u) = \frac{\sigma^2}{2}u^2 - au$.

From Definition 2.3 it is seen that (r_t)t≥0 has state space R>0. The fact that interest rates stay non-negative in the CIR model is often cited as an advantage of the model over the Vasiček model. Calculating the quasi-mean-reversion (see Definition 3.2), we find that

$\lambda = \frac{1}{2}\left(\sqrt{a^2 + 2\sigma^2} + a\right)$.

From Theorem 3.7 we find that the long-term yield is given by

$b_{asymp} = -F(-1/\lambda) = \frac{2a\theta}{\sqrt{a^2 + 2\sigma^2} + a}$.

The boundary between humped and inverse behavior b_inv is calculated from Theorem 3.9 as

$b_{inv} = -\frac{F'(0)}{R'(0)} = \theta$.

Both quantities b_asymp and b_inv can also be found in [Cox et al., 1985, Eq. (26) and following paragraph]. Before we consider b_norm, we quote (with notation adapted to (4.4)) from page 394 of [Cox et al., 1985] where the shape of the yield curve is discussed:

‘When the spot rate is below the long-term yield [= b_asymp], the term structure is uniformly rising. With an interest rate in excess of θ [= b_inv], the term structure is falling. For intermediate values of the interest rate, the yield curve is humped.’

In our terminology, they claim that the yield curve is normal for r_t ≤ b_asymp, humped for b_asymp < r_t < b_inv and inverse for r_t ≥ b_inv. This stands in clear contradiction to Theorem 3.9 and Corollary 3.12, where we have obtained that yield curves are normal if and only if r_t ≤ b_norm and that b_norm < b_asymp, or, in plain words, that there are yield curves starting strictly below the long-term yield that are still humped. The claims of Cox et al. [1985] are repeated in [Rebonato, 1998, p. 244f], where even several plots of ‘yield surfaces’ (the yield as a function of r_t and x) are presented as evidence. However Rebonato fails to indicate the level of b_asymp in the plots, such that the conclusion remains ambiguous.
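The claim that CIR yield curves can start below the long-term yield and still be humped can be checked numerically. The following is a minimal sketch, not the authors' code. It evaluates the three thresholds directly from their definitions in Theorem 3.9 and Figure 1, using F and R from (4.5) and (4.6), and computes the yield from the standard closed-form CIR bond price, which is assumed here and not derived in this section. The parameter values a = 0.5, σ = 0.5, θ = 6% and r_0 = 4.2% are the ones quoted for the CIR curves of Figure 1; the function names and the maturity grid are illustrative choices.

```python
import math

def cir_thresholds(a, theta, sigma):
    """Return (b_norm, b_asymp, b_inv) for the CIR model from F and R."""
    gamma = math.sqrt(a * a + 2.0 * sigma * sigma)
    lam = 0.5 * (gamma + a)                  # quasi-mean-reversion, solves R(-1/lam) = 1
    F = lambda u: a * theta * u              # (4.5)
    dF = lambda u: a * theta                 # F'
    dR = lambda u: sigma * sigma * u - a     # R' for R(u) = sigma^2 u^2 / 2 - a u
    b_inv = -dF(0.0) / dR(0.0)
    b_asymp = -F(-1.0 / lam)
    b_norm = -dF(-1.0 / lam) / dR(-1.0 / lam)
    return b_norm, b_asymp, b_inv

def cir_yield(r, x, a, theta, sigma):
    """Yield -log P / x, with P taken from the standard CIR bond price formula."""
    gamma = math.sqrt(a * a + 2.0 * sigma * sigma)
    e = math.expm1(gamma * x)                # exp(gamma * x) - 1
    denom = (gamma + a) * e + 2.0 * gamma
    B = 2.0 * e / denom                      # loading on the short rate
    log_A = (2.0 * a * theta / sigma ** 2) * (
        math.log(2.0 * gamma) + 0.5 * (a + gamma) * x - math.log(denom)
    )
    return (B * r - log_A) / x               # log P = log_A - B * r

a, sigma, theta, r0 = 0.5, 0.5, 0.06, 0.042
b_norm, b_asymp, b_inv = cir_thresholds(a, theta, sigma)

maturities = [0.05 * i for i in range(1, 601)]   # 0.05 to 30 years
yields = [cir_yield(r0, x, a, theta, sigma) for x in maturities]
peak = max(range(len(yields)), key=yields.__getitem__)

print("b_norm  =", b_norm)
print("b_asymp =", b_asymp)
print("b_inv   =", b_inv)
print("maximal yield", yields[peak], "at maturity", maturities[peak])
```

With these parameters b_norm is about 3.46% and b_asymp about 4.39%, so r_0 = 4.2% lies between them: the curve starts below the long-term yield, rises above it at an interior maturity and then falls back towards it.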
To clarify the scope of humped yield curves in the CIR model we calculate b_norm from Theorem 3.9:

$b_{norm} = -\frac{F'(-1/\lambda)}{R'(-1/\lambda)} = \frac{a\theta}{\sqrt{a^2 + 2\sigma^2}}$.

The relation b_norm < b_asymp < b_inv is immediately confirmed by noting that b_asymp is the harmonic mean of b_norm and b_inv. For a graphical illustration we refer to the second yield curve from below in Figure 1. The plot actually shows CIR yield curves with parameters a = 0.5, σ = 0.5, θ = 6% plotted over a time scale of 25 years. The second curve from below starts at r_0 = 4.2%, i.e. below the long-term yield, but is visibly humped.

To calculate the limit distribution of (r_t)t≥0, we apply Theorem 3.16: The cumulant generating function κ(u) of the limit distribution is given by

$\kappa(u) = \int_u^0 \frac{F(s)}{R(s)}\, ds = -\theta \int_u^0 \frac{1}{1 - s\sigma^2/2a}\, ds = -\frac{2a\theta}{\sigma^2} \log\left(1 - \frac{\sigma^2}{2a}u\right)$.

This is the cumulant generating function of a gamma distribution with shape parameter 2aθ/σ² and scale parameter σ²/2a. Again this result can already be found in Cox et al. [1985, p. 392].

4.3. An extension of the CIR model. To illustrate the power of the affine setting, we consider now an extension of the CIR model that is obtained by adding jumps to (4.4). We define the risk-neutral short rate process by

(4.7) $dr_t = -a(r_t - \theta)\, dt + \sigma\sqrt{r_t}\, dW_t + dJ_t, \quad r_0 \ge 0$,

where (J_t)t≥0 is a compound Poisson process with intensity c > 0 and exponentially distributed jumps of mean 1/ν, ν > 0. This model has been introduced by Duffie and Gârleanu [2001] as a model for default intensity and is used by Filipović [2001] as a short rate model. It can also be found in Brigo and Mercurio [2006] under the name JCIR model. It is easily calculated that

(4.8) $F(u) = a\theta u + \frac{cu}{\nu - u}, \quad u \in (-\infty, \nu)$,
(4.9) $R(u) = \frac{\sigma^2}{2}u^2 - au$.

Solving the generalized Riccati equations (3.6) for A(x) and B(x) becomes quite tedious, but the quantities b_inv, b_asymp, b_norm can be calculated from Theorem 3.7 and Theorem 3.9 in a few lines: The quasi-mean reversion λ stays the same as in the CIR model, since R does not change.
From $F'(u) = a\theta + \frac{c\nu}{(\nu - u)^2}$ we derive immediately

$b_{inv} = \theta + \frac{c}{a\nu}$,
$b_{asymp} = \frac{2a\theta}{a + \gamma} + \frac{2c}{\nu(a + \gamma) + 2}$,
$b_{norm} = \frac{a\theta}{\gamma} + \frac{c\nu\sigma^4}{\gamma(\sigma^2\nu + \gamma - a)^2}$,

where $\gamma = \sqrt{a^2 + 2\sigma^2}$. Note that by setting the jump intensity c to zero, the expressions of the (original) CIR model are recovered. Next we calculate the limit distribution of the model. Using the abbreviations ρ := σ²/2 and ∆ := a − νρ we obtain

$\kappa(u) = \int_u^0 \frac{F(s)}{R(s)}\, ds = -\theta\int_u^0 \frac{ds}{1 - s\rho/a} + c\int_0^u \frac{ds}{(s - \nu)(\rho s - a)}$

$\kappa(u) = \left(\frac{c}{\Delta} - \frac{a\theta}{\rho}\right)\log\left(1 - \frac{\rho u}{a}\right) - \frac{c}{\Delta}\log\left(1 - \frac{u}{\nu}\right)$ if ∆ ≠ 0,
$\kappa(u) = -\theta\nu\log\left(1 - \frac{u}{\nu}\right) + \frac{cu}{a(\nu - u)}$ if ∆ = 0,

as the cumulant generating function of the limit distribution L under the martingale measure. We now take a closer look at the distribution L, since this will answer the question raised at the end of Section 3.3: For certain parameters, L is an example of a limit distribution of an affine process that is infinitely divisible, but not self-decomposable. We consider the case ∆ = 0 and define

(4.10) $l(x) := \left(\theta + \frac{c}{a}x\right)\nu e^{-\nu x}, \quad x \in \mathbb{R}_{>0}$.

By Frullani’s integral formula

(4.11) $\kappa(u) = \int_0^\infty (e^{ux} - 1)\,\frac{l(x)}{x}\, dx$

for all u ∈ (−∞, ν). Since l is non-negative on R>0, l(x)/x is the density of a Lévy measure and (4.11) is seen to be the Lévy-Khintchine representation for the cumulant generating function of the infinitely divisible distribution L. In addition, L is self-decomposable if and only if l is non-negative and non-increasing on R>0 (cf. Sato [1999, Corollary 15.11]). In the case of l(x) given by (4.10), it is easily calculated that l(x) has a single maximum at $x^* = \frac{1}{\nu} - \frac{a\theta}{c}$. Thus, if c ≤ aθν, then x∗ ≤ 0, such that l is non-increasing on R>0 and L is self-decomposable. If c > aθν then l is increasing in the interval [0, x∗) and the limit distribution L is infinitely divisible, but not self-decomposable.

4.4. The gamma model. Instead of analyzing the properties of a known model, we will now follow a different route and construct a model that satisfies some given properties. We want to construct an affine process on R>0 that has the same limit distribution as the CIR model (i.e. a gamma distribution), but is a process of OU-type. The second property is equivalent to R(u) = βu. Considering Theorem 3.16, we know that if we want to obtain a limit distribution, we need β < 0. To keep with the notation of the Vasiček model, we will write R(u) = −λu where λ > 0. Now by (3.27) the cumulant generating function of the limit distribution is given by

(4.12) $\kappa(u) = \frac{1}{\lambda}\int_0^u \frac{F(s)}{s}\, ds$ for all u ∈ (−∞, 0].

Let the limit distribution be a gamma distribution with shape parameter k > 0 and scale parameter θ > 0. Then κ(u) = −k log(1 − θu) and by (4.12)

$F(u) = \frac{\lambda\theta k u}{1 - \theta u}$.

Setting c = λk and ν = 1/θ it is seen that F(u) is equal to the last term in (4.8). This means that the driving Lévy process of (r_t)t≥0 is of the same kind as the process (J_t)t≥0 in (4.7), i.e. (r_t)t≥0 is a pure jump OU-type process with exponentially distributed jump heights of mean θ and with jump intensity λk.

We interpret the affine process we have constructed as a risk-neutral short rate process. It is clear that the bond prices are of the exponentially-affine form (3.5). From the generalized Riccati equation (3.6b) we obtain

$B(x) = \frac{e^{-\lambda x} - 1}{\lambda}$.

From equation (3.6a) we calculate

$A(x) = \int_0^x F(B(s))\, ds = \frac{\lambda k}{\theta + \lambda}\left(\log(1 - \theta B(x)) - \theta x\right)$,

such that the bond prices are given by

$P(t, t+x) = \exp\left(-x\,\frac{\lambda\theta k}{\theta + \lambda} + r_t B(x)\right)\left(1 - \theta B(x)\right)^{\frac{\lambda k}{\theta + \lambda}}$.

The global shape of the yield curve is described by the quantities

$b_{inv} = k\theta, \quad b_{asymp} = \frac{k}{1/\theta + 1/\lambda}, \quad b_{norm} = \frac{k/\theta}{(1/\theta + 1/\lambda)^2}$,

and it is seen that for the gamma-OU-process b_asymp is the geometric average of b_inv and b_norm.

5. Conclusions

In this article we have given, under very general conditions, a characterization of the yield curve shapes that are attainable in term structure models where the risk-neutral short rate is given by a time-homogeneous, one-dimensional affine process. Even though the parameter space for this class of models is infinite-dimensional, the scope of attainable yield curves is very narrow, with only three possible global shapes.
In addition we have given conditions under which an affine process converges to a limit distribution, and we have characterized the limit distribution in terms of its cumulant generating function, extending some known results on OU-type processes.

The most obvious question for future research is the extension of these results to multi-factor models. It is evident from numerical results that in two-factor models yield curves with e.g. a dip, or also with a dip and a hump, can be obtained. It would be interesting to see if more complex shapes can also be produced, or if there are similar limitations as in the single-factor case. Also, in the one-factor case the dependence of the yield curve shape on the current short rate is basically described by the intervals D ∩ (−∞, b_norm], (b_norm, b_inv) and [b_inv,∞). In the two-factor case the partitioning of the state-space might be more complex, and we expect to see more interesting transitions between yield curve types. Another aspect is that, as affine processes become better understood as a general framework, extensions of classical models, e.g. by adding jumps as in the JCIR model described in Section 4.3, become more feasible and attractive for applications.

References

Damiano Brigo and Fabio Mercurio. Interest Rate Models - Theory and Practice. Springer Finance. Springer, 2nd edition, 2006.
René Carmona and Michael Tehranchi. Interest Rate Models: An Infinite Dimensional Stochastic Analysis Perspective. Springer Finance. Springer, 2006.
Patrick Cheridito, Damir Filipović, and Marc Yor. Equivalent and absolutely continuous measure changes for jump-diffusion processes. The Annals of Applied Probability, 15(3), 2005.
John C. Cox, Jonathan E. Ingersoll, and Stephen A. Ross. A theory on the term structure of interest rates. Econometrica, 53(2):385–407, 1985.
Darrell Duffie and Nicolae Gârleanu. Risk and valuation of collateralized debt obligations.
Financial Analysts Journal, 57(1):41–59, 2001.
Darrell Duffie, Damir Filipović, and Walter Schachermayer. Affine processes and applications in finance. The Annals of Applied Probability, 13(3):984–1053, 2003.
Damir Filipović. A general characterization of one factor affine term structure models. Finance and Stochastics, 5:389–412, 2001.
Zbigniew J. Jurek and Wim Vervaat. An integral representation for self-decomposable Banach space valued random variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 62:247–262, 1983.
Eugene Lukacs. Characteristic Functions. Charles Griffin & Co Ltd., 1960.
Elisa Nicolato and Emmanouil Venardos. Option pricing in stochastic volatility models of the Ornstein-Uhlenbeck type. Mathematical Finance, 13(4):445–466, 2003.
Riccardo Rebonato. Interest-Rate Option Models. Wiley, 2nd edition, 1998.
L.C.G. Rogers and David Williams. Diffusions, Markov Processes and Martingales, Volume 1. Cambridge Mathematical Library, 2nd edition, 1994.
Ken-iti Sato. Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999.
Ken-iti Sato and M. Yamazato. Operator-selfdecomposable distributions as limit distributions of processes of Ornstein-Uhlenbeck type. Stochastic Processes and Applications, 17:73–100, 1984.
Fred Steutel and Klaas van Harn. Infinite Divisibility of Probability Distributions on the Real Line. Marcel Dekker Inc., 2004.
Oldrich Vasiček. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, 1977.

Vienna University of Technology, Wiedner Hauptstrasse 8–10, A-1040 Wien, Austria
E-mail address: mkeller@fam.tuwien.ac.at

Vienna University of Technology, Wiedner Hauptstrasse 8–10, A-1040 Wien, Austria
E-mail address: thomas@fam.tuwien.ac.at

1. Introduction 2. Preliminaries 3. Theoretical Results 3.1. Bond Prices 3.2. The Yield Curve and the Forward Rate Curve 3.3. The Limit Distribution of an Affine Process 4. Applications 4.1.
The Vasicek model 4.2. The Cox-Ingersoll-Ross model 4.3. An extension of the CIR model 4.4. The gamma model 5. Conclusions References ABSTRACT We consider a model for interest rates, where the short rate is given by a time-homogenous, one-dimensional affine process in the sense of Duffie, Filipovic and Schachermayer. We show that in such a model yield curves can only be normal, inverse or humped (i.e. endowed with a single local maximum). Each case can be characterized by simple conditions on the present short rate. We give conditions under which the short rate process will converge to a limit distribution and describe the limit distribution in terms of its cumulant generating function. We apply our results to the Vasicek model, the CIR model, a CIR model with added jumps and a model of Ornstein-Uhlenbeck type. <|endoftext|><|startoftext|> Introduction The discussion about the presence of localised states and their density naturally leads to the question about the kind of traps we are dealing with. In a first approach we should distinguish between intrinsic and extrinsic defects. In the first type we should inscribe polymer end groups, grain boundaries, structural defects, conformational disorder up to molecular groups with large permanent dipole moment that could increase the level of energetic disorder [1]. For the second type we should mention the chemical impurities, somehow unavoidable in the synthesis of organic molecules. We can further distinguish the kind of traps in function of their location, interfacial or bulk, or in term of energy, deep traps or shallow traps. Also polarons could be seen, in a simplistic way, like defects caused by an electron plus an induced lattice polarisation [2] followed by a lattice distortion. Traps are into the samples in a great variety and in different proportion, often despite the same preparation procedure. For that reason sometimes their nature is difficult to investigate and the data must be handled with care. 
In studies of trapped states, because of the variety of defects in solids underlined above, the most successful approach is to start by introducing a single type of defect into a well-known system in a controlled way.

2. Experimental Part

In our work we focused on low molecular compounds as well as on polymers, especially on two classes of materials: oxadiazoles and quinoxalines. Both classes of organic compounds are well known as electron transport materials in OLEDs. PPQs (see figure 1) in general show very high solubility [3] in a variety of common organic solvents, and according to the literature they exhibit a glass transition at quite high temperature (250-350 °C). The materials were deposited by spin coating on gold, aluminium or silicon at different speeds and concentrations in order to optimise the films. The layer thickness was controlled by Dektak profilometry and ellipsometry.

Thermally stimulated luminescence (TSL) and thermally stimulated current (TSC) are powerful tools for studying de-trapping and relaxation processes in organic materials. TSL is a contactless technique that makes it possible to distinguish between deep and shallow trapping states. The proposed mathematical model for TSL makes it possible to study trap levels and recombination centres inside the band gap. The analytical solution of the rate equations allows two different de-trapping regimes, including or excluding subsequent re-trapping effects. First order kinetics means that no re-trapping is permitted: the electron released from a localised level recombines with a hole in a recombination centre, and its probability of being re-trapped before recombining is negligible. Second order kinetics deals with the opposite extreme case: the phonon-assisted release of an electron is followed by multiple re-trapping, and the probability that a released electron gets re-trapped is very high.
The main factors governing both solutions are the energy depth of the traps, measured with respect to the conduction band edge, and the frequency factor. This second factor generally indicates the attempt-to-escape frequency of electrons from the localised levels. The mathematical model also takes into account distributions of localised levels. For a Gaussian distribution of localised states, a meaningful parameter is the width of the distribution. Numerical simulations, calculated with the proposed model, show that while a first order peak is characterised by an asymmetric shape with a steep falling side, second order peaks are characterised by a more symmetric shape. The signal is smeared along the whole peak temperature range due to the re-trapping effect.

Figure 1. Poly-[2,2’-(1,4-phenylene)-6,6’-bis(3-phenylquinoxaline)] (PPQ IA)

The same theoretical description holds for both techniques, TSL and TSC. However, TSC theory requires the presence of an extended conduction band. During a TSC experiment a driving voltage is applied to the sample and the de-trapped charges are extracted at the device contacts. The equations describing a TSC peak are nevertheless similar to those describing TSL. Additionally, the density of trapping states can be determined by evaluating the area under a TSC peak. Simultaneous TSL and TSC measurements give useful information about the localised states, combining the strengths of both thermal techniques. Unambiguous information about trap depth, density of states, kinetic order and frequency factor can be extracted by making use of the full possibilities of the combined measurements.

In a typical thermally stimulated experiment a sample is heated in a controlled way and the current (in the case of TSC), the light emission (in the case of TSL), or both simultaneously are monitored.
The effect appears only when an optical or electrical excitation takes place prior to the heating. TSC, in contrast to TSL, requires the presence of good ohmic contacts. In the following, a TSL experiment is described in more detail and the rate equations are derived.

The sample, in an equilibrium state at room temperature where all the shallow traps are empty, is cooled down to a low temperature. Then it is illuminated with electromagnetic radiation of a certain energy. The incident radiation excites electrons from the valence band to the conduction band across the gap. In the case of prompt fluorescence the generated electrons recombine promptly. Otherwise they can form an electron-hole pair, followed by geminate recombination or by dissociation with subsequent trapping. Charge carriers can get trapped in localised levels that, considering the random fluctuation of the potential in disordered materials, are distributed in energy. From statistical considerations, the distribution should generally be Gaussian. The thermal emission from traps at low temperature is negligibly small. Therefore, the perturbed equilibrium created by the incident radiation persists for a long time and the electrons are simply stored in the localised levels. The temperature is then raised in a controlled way; electrons acquire energy, finally escape from the traps by means of a phonon-assisted jump, and recombine with holes through a recombination centre, with subsequent photon emission. By means of spectrally resolved TSL experiments it is possible to obtain information about the recombination centres by studying the wavelength of the emitted light as a function of temperature and intensity. The processes described above are illustrated in figure 2. The illustrated scheme for thermoluminescence is simple, but despite its simplicity it can describe all fundamental features of a thermoluminescence process.
Following Chen [4], the electron exchange between the HOMO and LUMO levels during the trap emptying can be described by the following three differential equations:

(1) $\frac{dn_h}{dt} = -n_h \cdot n_c \cdot A_r$
(2) $\frac{dn}{dt} = n_c \cdot (N - n) \cdot A - n \cdot p$
(3) $\frac{dn_c}{dt} = n \cdot p - n_c \cdot (N - n) \cdot A - n_c \cdot n_h \cdot A_r$

Here n_h is the concentration of holes in the recombination centres, n_c is the concentration of electrons in the conduction band, A_r is the recombination coefficient for electrons in the conduction band with holes in the recombination centres, n is the concentration of electrons in traps, N is the function describing the concentration of electron traps at depth E below the edge of the conduction band, A is the transition coefficient for electrons in the conduction band becoming trapped, and p is the probability per unit time of thermal release of electrons from traps, i.e. their release rate, given by the Boltzmann expression p = s·exp(−E/kT), where k is the Boltzmann constant.

Equation (1) describes the change of the hole density n_h in the recombination centres with time. The recombination rate depends both on the concentration of free electrons (n_c) and on the concentration of holes already present in the recombination centres, through a probability coefficient (A_r) that depends on the cross section and the thermal velocity of the electrons. An increase in these parameters results in an increase of the recombination probability. Equation (2) describes the exchange of electrons between conduction band and traps. The first term on the right hand side includes the probability A for an electron to be trapped. That probability A, like A_r, also depends on the thermal velocity of the electrons and on the cross section of the traps. The second term on the right hand side is the de-trapping term. It is proportional to the concentration of trapped electrons and to the Boltzmann function exp(−E/kT).

Figure 2. Energy diagram describing the elementary process of the simple model for TSL (labels: conduction band, valence band, electron centre, hole centre, recombination centre)
The proportionality factor s, often called the frequency factor or pre-exponential factor, should be of the order of $10^{10}$ to $10^{14}$ s$^{-1}$ when interpreted in terms of the attempt-to-escape frequency of an electron from the potential well. A saturation effect for carrier release from traps, caused by a limited number of available states in the conduction band, is neglected in this model: at each moment the number of available states in the conduction band is much higher than the number of electrons released from the localised states [5]. Equation (3) describes the variation of the electron density in the conduction band and essentially takes into account the charge neutrality of the whole system. The rate of change of the electron density in the conduction band depends on the electrons being released (first term on the right-hand side), the electrons being trapped (second term) and the electrons that recombine (third term). Electrons and holes in this model are generated at the same time as geminate pairs, but they are not necessarily still bound. Saturation effects due to filled deeper traps, or to recombination centres that already hold a hole, are not considered. More complex models have the disadvantage of introducing an increased number of parameters [7]. In fact, several combinations of too many parameters can generate the same shape of a real glow curve, making it impossible to find a most probable fit [8]. For that reason it is preferable to deal with a reasonably simple model that involves few reliable parameters. The proposed simple model can successfully describe the experimental glow curves, but it is also necessary to take the energetic distribution of localised states into account in order to describe the complex behaviour of disordered systems such as amorphous polymers or organic polycrystalline thin films. In that case the total number of traps is represented by the following equation (4).
The traps do not have a single activation energy; they are continuously distributed in energy:

$$N = \int N(E)\, dE \qquad (4)$$

Here N(E) can in principle be any kind of distribution, but considering the statistical disorder in organic materials it should have a Gaussian shape. In principle the frequency factor should also have the same energetic distribution. In order to solve the system of differential equations (1)-(4), equation (5), which gives the time dependence of the temperature, should be added:

$$T = T_0 + \beta \cdot t \qquad (5)$$

In equation (5) β is the constant experimental heating rate. It should be noted that, as long as T is a well-known function of time, the only real variable is the time. For that reason it is very important, experimentally, to have perfect control of the linearity of the temperature ramp.

3. Results and discussion

The main peak in figure 3 has its maximum at Tm = 159.7 K and the second peak, of roughly half the intensity, at Tm = 230.7 K. The peak at Tm = 159.7 K has a symmetry factor µ = 0.54, very near to the typical value for second-order kinetics. This gives a hint that in PPQ IA an electron has, before recombination, a high probability of being re-trapped several times. Because of its hidden position, the analysis of the minor peak of PPQ IA appearing at Tm = 230.7 K is very difficult [6].

Figure 3. TSL of PPQ IA (horizontal axis: temperature in K; peaks marked at Tm = 159.7 K and Tm = 230.7 K).

Figure 4. TSL glow curve of a PPQ IA sample (red line) compared with the second order numerical simulation of the first peak (green line) (horizontal axis: temperature in K).

Figure 4 shows the numerical fitting of the main TSL peak of PPQ IA. The fit is performed by means of a second-order equation characterised by a Gaussian distribution of traps. While the high-temperature side fits the glow curve perfectly, the low-temperature side does not follow the curve shape. For that reason the numerical simulation is not completely satisfactory.
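As a rough illustration of the second-order kinetics mentioned above, the following sketch evaluates the closed-form second-order (Garlick-Gibson) glow curve for a single trap depth under a linear heating ramp and extracts the symmetry factor. It is not the fitting procedure used for figure 4: it ignores the Gaussian distribution of traps, and the heating rate, the temperature window and all function names are assumptions chosen only for illustration. The trap depth of 0.37 eV and frequency factor of 10^10 s^-1 are taken from the values quoted for the main peak, and the symmetry factor is assumed to follow the usual definition µ = (T2 - Tm)/(T2 - T1), with T1 and T2 the half-maximum temperatures.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def second_order_glow_curve(energy_ev, s, beta, t_start, t_stop,
                            step_k=0.1, filling=1.0):
    """Second-order glow curve for one trap depth (intensity per trap, in 1/s).

    energy_ev: trap depth in eV; s: frequency factor in 1/s;
    beta: linear heating rate in K/s; filling: initial trap filling n0/N.
    """
    temps, intens = [], []
    integral = 0.0  # running integral of exp(-E/kT') dT' from t_start
    prev = math.exp(-energy_ev / (K_B * t_start))
    steps = int(round((t_stop - t_start) / step_k))
    for i in range(steps + 1):
        temp = t_start + i * step_k
        boltz = math.exp(-energy_ev / (K_B * temp))
        if i > 0:
            integral += 0.5 * (prev + boltz) * step_k  # trapezoidal rule
        prev = boltz
        denom = 1.0 + filling * (s / beta) * integral
        temps.append(temp)
        intens.append(filling ** 2 * s * boltz / denom ** 2)
    return temps, intens


def peak_and_symmetry(temps, intens):
    """Return (Tm, mu) with mu = (T2 - Tm) / (T2 - T1) at half maximum."""
    i_max = max(range(len(intens)), key=intens.__getitem__)
    half = 0.5 * intens[i_max]
    i_low = next(i for i in range(i_max, -1, -1) if intens[i] < half)
    i_high = next(i for i in range(i_max, len(intens)) if intens[i] < half)
    t1, t2 = temps[i_low], temps[i_high]
    return temps[i_max], (t2 - temps[i_max]) / (t2 - t1)


# Illustrative parameters; beta = 0.1 K/s is an assumed heating rate.
temps, intens = second_order_glow_curve(0.37, 1e10, 0.1, 80.0, 260.0)
t_max, mu = peak_and_symmetry(temps, intens)
print(f"Tm = {t_max:.1f} K, symmetry factor = {mu:.2f}")
```

A distribution of trap depths could be handled in the same spirit by summing such curves over energies with Gaussian weights, at the cost of the additional free parameters discussed in the text.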
The distribution has a width σ = 0.12 eV and the distribution maximum is at Em = 0.37 eV. The energy maximum Em lies exactly in the middle of the integration limits E1 = 0.25 eV and E2 = 0.49 eV, so that the distribution has in this case a perfectly symmetric Gaussian shape. The natural frequency s of this peak can be estimated to be of the order of s = $1 \times 10^{10}$ s$^{-1}$, assuming, as is usual, a ratio of recombination coefficient to trapping coefficient equal to 1 and a density of traps of $10^{14}$ cm$^{-3}$. The value of Em, derived by numerical analysis, is far from the expected energy depth of a trap calculated with the initial rise method. However, this mismatch can be explained by the particular complexity of the peak, which could result from the sum of at least two distributed peaks. In effect, the initial rise procedure reveals the activation energy of a hidden peak at low temperature. This is an important point to clarify because of the importance of shallow traps in materials suitable for plastic electronic applications. Shallow traps are empty at room temperature and they play a crucial role in the electron transport properties of organic materials. For a different thickness we obtain the glow curve shown in figure 5.

Figure 5. Glow curve of PPQ IA for 1500 nm (horizontal axis: temperature in K).

We performed TSC measurements on poly(3-hexylthiophene) (P3HT) on SiO2, both treated and untreated in oxygen plasma. The TSC experiment consisted of:
(1) Cooling to -180 °C
(2) Trap filling by light (mercury lamp) with +4 V bias voltage
(3) Application of the readout voltage, heating at 0.10 K/s and measurement of the detrapping current

All measurements were carried out in vacuum ($4.5 \times 10^{-5}$ mbar). The traps were filled by creating carriers with band-to-band photoexcitation of the samples. The light source was a mercury lamp (200 W). The thermally stimulated currents were measured with a Keithley 617 electrometer.
The TSC and temperature data were stored in a personal computer as described earlier. In a typical experiment, the samples are cooled down to T = 80 K and kept at this temperature for 15 min. They are then illuminated through the front electrode for 15 minutes at a bias voltage of +4 V. Measurements were started after the exposure to light, and the samples were then heated at a constant rate (β = 0.1 K/s) from 50 up to 240 K. We measured and compared two samples of P3HT on SiO2, one treated and one untreated in oxygen plasma. Both experiments were performed under the same conditions. The concentration of the traps was estimated using the relation (Manfredotti et al.):

$$N_T = \frac{Q}{A \cdot L \cdot e \cdot G} \qquad (6)$$

Here Q is the quantity of charge released during a TSC experiment and can be calculated from the area under the TSC peaks; A and L are the area and the thickness of the sample, respectively; e is the electronic charge and G is the photoconductivity gain, which equals the number of electrons passing through the sample for each absorbed photon. $N_T$ was calculated by assuming G = 1. For these samples L = 60 nm and the sample size is 2.5 cm × 2.5 cm, i.e. A = $6.25 \times 10^{-4}$ m$^2$. The trap is characterized by the temperature (Tm) corresponding to the maximum of the thermally stimulated current peak. The energy associated with the trap is the thermal energy at Tm given by:

$$E = \alpha \, f(T_m, T', \beta) \, k \, T_m \qquad (7)$$

In this equation, α is a dimensionless model dependent constant.

Figure 6. TSC for P3HT on SiO2 untreated sample (horizontal axis: temperature in K; peak labels at 134.43, 155.05, 171.08 and 198.56 K; annotated activation energies Ea of 17.5, 20.1, 22.2 and 25.8 meV).

Figure 7. TSC for P3HT on SiO2 treated sample (horizontal axis: temperature in K; peak labels at 127.709, 167.161 and 202.914 K; annotated activation energies Ea of 16, 21 and 26 meV).
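The trap concentration estimate can be sketched numerically as follows. The sketch assumes that relation (6) has the standard form N_T = Q/(A L e G) and that Q is obtained by integrating the measured current over time, i.e. over temperature divided by the constant heating rate. The function names and the flat test current are hypothetical and serve only to show the arithmetic; a real evaluation would first subtract the background current under the peak.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in C


def released_charge(temps_k, currents_a, beta_k_per_s):
    """Integrate I(T) dT / beta with the trapezoidal rule; returns charge in C."""
    total = 0.0
    for i in range(1, len(temps_k)):
        d_temp = temps_k[i] - temps_k[i - 1]
        total += 0.5 * (currents_a[i] + currents_a[i - 1]) * d_temp
    return total / beta_k_per_s


def trap_concentration(charge_c, area_m2, thickness_m, gain=1.0):
    """Assumed form of relation (6): N_T = Q / (A * L * e * G), in m^-3."""
    return charge_c / (area_m2 * thickness_m * E_CHARGE * gain)


# Hypothetical example: 1 pA flowing over a 100 K wide window at 0.1 K/s,
# with the sample geometry quoted in the text (A = 6.25e-4 m^2, L = 60 nm).
temps = [100.0 + i for i in range(101)]
currents = [1e-12] * len(temps)
q = released_charge(temps, currents, 0.1)
n_t = trap_concentration(q, 6.25e-4, 60e-9)
print(f"Q = {q:.3e} C, N_T = {n_t * 1e-6:.3e} cm^-3")
```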
The variable T' is the temperature at half of the maximum current value on the low-temperature side of the current peak. Assuming the Grossweiner model, the constant α and the function f are given by:

$$\alpha = 1.51 \qquad (8)$$

$$f(T_m, T', \beta) = \frac{T'}{T_m - T'} \qquad (9)$$

We observed a difference of two orders of magnitude between the trap concentrations in these two samples (treated and untreated).

4. Conclusions

Thermal techniques are a powerful tool in the study of localised levels in inorganic and organic materials. Thermally stimulated luminescence, thermally stimulated currents and thermally stimulated depolarisation currents allow, when applied in synergy, the details of shallow traps and deeper levels to be investigated. In synergy with dielectric spectroscopy, they also permit the study of polarisation and depolarisation effects. The analysis of the thermograms emerging from the thermal techniques can be performed starting from the differential rate equations of the de-trapping phenomena. Such an approach, made possible by the computing power of modern computers, is not the most fruitful, because the number of free variables involved in the numerical solution of the differential rate equations is too high. Sometimes completely different sets of parameters can fit the same thermally stimulated peak, and ambiguous results are often obtained.

5. Acknowledgements

Many thanks to the European Commission (contract number HPRN-CT-2002-00327, RTN EUROFET) for the financial support, as well as to all co-workers and many friends of the EUROFET network.

6. References

[1] Ashcroft, N. W. & Mermin, N. D. Solid State Physics (Holt, Rinehart & Winston, New York, 1976).
[2] McKeever, S. W. S. Thermoluminescence of Solids (eds. Cahn, R. W., Davis, E. A. & Ward, I. M.) (Cambridge University Press, Cambridge, 1985).
[3] Paolo Imperia, Localised States in Organic Semiconductors and their Detection, University of Potsdam, 2003.
[4] Bässler, H.
Charge Transport in Disordered Organic Photoconductors, a Monte Carlo Simulation Study. phys. stat. sol. (b) 175, 15 (1993).
[5] M. Prelipceanu, O.G. Tudose, S. Schrader, Thermally Stimulated Luminescence Investigations of New Materials For OFET’s and OLED’s, Winterschool on Organic Electronics (OEWS’04) Materials, Thin Films, Charge Transport & Device, Planneralm, Austria, 2004.
[6] van Turnhout, J. in Electrets (ed. Sessler, G. M.) 81 (Springer Verlag, Berlin, 1980).
[7] Schrader, S., Imperia, P., Koch, N., Leising, G. & Falk, B. in Organic Light-Emitting Materials and Devices, (ed. Kafafi, Z. H.) 209 (SPIE, San Diego, California, 1999).

ABSTRACT The present work is focused on the theoretical and experimental study of localised levels in organic materials suitable for light-emitting devices and field effect transistors by means of thermal techniques. In our work we focused on low molecular compounds as well as on polymers, especially on two classes of materials: oxadiazoles and quinoxalines. Both are well known as electron transport materials in OLEDs. <|endoftext|><|startoftext|> A microfluidic device based on droplet storage for screening solubility diagrams

Philippe Laval, Nicolas Lisai, Jean-Baptiste Salmon, and Mathieu Joanicot
LOF, unité mixte Rhodia-CNRS-Bordeaux 1, 178 avenue du Docteur Schweitzer, F-33608 Pessac cedex, FRANCE
(Dated: October 25, 2018)

This work describes a new microfluidic device developed for rapid screening of solubility diagrams. In several parallel channels, hundreds of nanoliter-volume droplets of a given solution are first stored with a gradual variation in the solute concentration. Then, the application of a temperature gradient along these channels enables us to read directly and quantitatively phase diagrams, concentration vs. temperature. We show, using a solution of adipic acid, that we can measure ten points of the solubility curve in less than 1 hr and with only 250 µL of solution.

I.
INTRODUCTION

Chemistry, biology, and pharmacology are facing ever more complex systems that depend on multiple parameters. Their complete investigation therefore takes time and requires significant amounts of products. In this context, robotic fluidic workstations have already met with great success and proved their efficiency, for instance in genome sequencing and analysis [1]. However, these instruments remain very expensive, require considerable labor, and the volumes involved (≤ mL) are still too large for some specific applications (e.g. proteomics) [2, 3]. Nowadays, other high throughput techniques based on microfluidics [4, 5] can offer suitable alternative solutions for the development of rapid screening tools. Microfluidic devices are now widely used in biological and chemical fields for multiple applications [6] such as molecular separations and cell sorting [7], polymerase chain reaction [8, 9], and rapid micromixing and analysis of chemical reactions [10, 11, 12, 13]. Moreover, the development of microvalves and micromixers has made possible the production of highly integrated systems which can be used to address hundreds of reaction chambers individually [14]. These devices are well adapted to carrying out high throughput screening of phase diagrams, particularly in the case of protein crystallization investigations. However, their fabrication and multiplexing are still complicated.

Another possible strategy is the use of droplets playing the role of nanoliter-sized reaction compartments. These droplets can be produced in specific microfluidic geometries [15], and their volume and chemical composition can be fixed in a controlled way. In addition, they allow a rapid mixing of the different compounds, prevent any hydrodynamic dispersion and cross contamination, and can be stored in microchannels (see Ref. [16] and references therein). Such a strategy has already proved to be useful for crystallization studies: e.g.
screening of protein crystallization conditions [17, 18], or crystal nucleation kinetics measurements [19].

Figure 1 summarizes the main insights of our work. We have engineered a new microfluidic chip that allows a direct and quantitative reading of two-dimensional diagrams. Hundreds of nanoliter-sized droplets of different chemical compositions can be stored in parallel microchannels, and a temperature gradient applied along these channels enables us to obtain a two-dimensional array of droplets of different concentrations and temperatures. For solubility diagram screening, droplets containing a given solute are first stored with a gradual variation of concentration. Then, crystallization in the droplets is induced by cooling, and finally, the application of an adequate temperature gradient dissolves the crystals in droplets whose temperature is higher than their solubility temperature. As a result, we directly read the limit between droplets with and without crystals, as shown in Fig. 1(c), which gives the solubility temperatures of the solution at the different concentrations.

Electronic address: philippe.laval-exterieur@eu.rhodia.com

In the materials and methods section, we describe the microfluidic device, the method used to store the droplets in the channels, and the temperature control setup. We also characterize the concentration and temperature gradients. In the last section, we present an experimental protocol to measure solubility diagrams quantitatively using this device. We demonstrate its efficiency by measuring, with only 250 µL of solution, the solubility curve of an organic compound.

II. MATERIALS AND METHODS

A. Microfabrication

The microfluidic device is fabricated in poly(dimethylsiloxane) (PDMS) by using soft-lithographic techniques [20]. PDMS (Silicone Elastomer Base, Sylgard 184; Dow Corning) is molded on a master fabricated on a silicon wafer (3-Inch-Si-Wafer; Siegert Consulting e.k.) using a negative photoresist (SU-8 2100; MicroChem).
To make molds of 500 µm height, we successively spin two 250 µm thick SU-8 layers on the wafer. After each spin coating process, the wafer is soft-baked (10 min/65 °C and 60 min/95 °C). Photolithography is used to define negative images of the microchannels. Eventually, the wafer is hard-baked (25 min/95 °C) and developed (SU-8 Developer; MicroChem). A 10:1 mixture of PDMS is molded on the SU-8 master described above (65 °C/60 min). The cross-linked PDMS layer is then peeled off the mold and holes for the inlets and outlets (1/32 and 1/16 in. o.d.) are punched into the material. Then, the PDMS surface and a clean silicon wafer surface (3-Inch-Si-Wafer; 500 µm; Siegert Consulting e.k.) are activated for 2 min in a UV ozone apparatus (UVO Cleaner, Model 144AX; Jelight) and brought together. Finally, the device is placed at 65 °C for 2 hr to improve the sealing.

http://arxiv.org/abs/0704.0569v1

FIG. 1: (a) Design of the microfluidic device (channel width 500 µm). Silicone oil is injected in inlet 1 and aqueous solutions in inlets 2 and 3. The two dotted areas indicate the positions of the two Peltier modules used to apply temperature gradients ∇T. The three lines of dots mark the positions of temperature measurements. (b) Picture of the microfluidic chip made of PDMS sealed with a glass slide (76×52 mm […] improve clarity. Droplets containing a colored dye at different concentrations are stored in the ten parallel channels. (c) Example of direct reading of a solubility diagram. The droplets contain an organic solute. The dotted line bounding the droplets containing crystals gives an estimation of the solubility limit (see section Results for details).

B. Droplet storage protocol

The device, presented in Fig. 1(a), is composed of three inlets and ten outlets located at the extremities of channels 1 to 10. As shown in Fig. 1(b), each outlet is connected to a ≈ 20 cm long rigid tubing (FEP 1/16 in.)
ended with a piece of soft PVC tubing (Nalgene, ≈ 5 cm long) inserted in an automated pinch electrovalve (105S-01059P; Asco Joucomatic). Thanks to this system, each outlet can be independently closed or opened by pinching or releasing the corresponding PVC tubing. However, the pinching of a tube leads to a liquid displacement. To minimize the subsequent liquid disturbance in the microchannels, the electrovalves are placed close to the rigid tubing, and the hydrodynamic resistance after the electrovalves is kept as low as possible by using large tubing.

Silicone oil (500 cSt; Rhodorsil) is injected in inlet 1 at a constant flow rate Q1 ≈ 3 mL hr$^{-1}$, and aqueous phases are injected at flow rates Q2 and Q3 ranging from 0 to about 1 mL hr$^{-1}$, in inlets 2 and 3 respectively. All liquids are injected with syringe pumps (PHD 2000 infusion; Harvard Apparatus). At the intersection between the oil and the aqueous streams, monodisperse droplets of the aqueous phase in oil are continuously produced [21]. Both the droplet volume (about 100-300 nL) and their production frequency (typically between one and ten droplets per second) can be tuned by the ratio of oil to aqueous phase flow rates. The droplet composition is set by the ratio Q2/Q3.

Thanks to the possible opening and closing of each outlet, we can store droplets of given aqueous compositions in the different storage channels i. Several steps are necessary to perform such a filling. First, all the channels are initially filled with silicone oil. Secondly, the outlet of channel 1 is opened and all the others are closed. In this configuration, all the droplets of a given composition flow through channel 1. Finally, once the flow is stable, the outlet of channel 1 is suddenly closed and, simultaneously, the outlet of channel 2 is opened. All the droplets previously present in channel 1 stay immobilized, whereas the other droplets, whose composition can be changed, flow through channel 2. Successively, in the same way, we can store droplets of various compositions in all channels i.

C.
Chemical composition control

For solubility investigations, the control of the concentrations in the droplets is crucial. However, because of the PDMS elasticity and the precision of the syringe pumps, an inaccuracy in the droplet concentration remains. To estimate this error, we have performed investigations with a confocal Raman microscope (HR800 Horiba; Jobin-Yvon). A 50× microscope objective was used for focusing a 532 nm wavelength laser beam in the droplets, and for collecting the Raman scattered light, subsequently dispersed with a grating of 600 lines per millimeter. To minimize the out-of-focus background signals, we fixed the confocal pinhole at 500 µm. Experiments were performed on droplets made of two initial aqueous solutions of K4Fe(CN)6 (0.5 M) and K3Fe(CN)6 (0.5 M) injected in inlets 2 and 3 respectively. These two compounds display strong and distinct Raman signals [22].

Figure 2 shows three Raman spectra measured in droplets containing different concentration ratios RC = [K4Fe(CN)6]/[K3Fe(CN)6]. The two bands centered at 2060 and 2095 cm$^{-1}$ correspond to K3Fe(CN)6 and the one at 2136 cm$^{-1}$ corresponds to K4Fe(CN)6. The concentration of each compound can be probed from the area under its specific Raman bands by:

$$A_i = K_i C_i t V, \qquad (1)$$

where $A_i$ is the area under the Raman band of the compound i, $C_i$ its concentration, $K_i$ a specific constant, t the acquisition time, and V the analysis volume. As a consequence, the ratio RA of the Raman band areas of K4Fe(CN)6 and K3Fe(CN)6 is proportional to the concentration ratio RC, and does not depend on the acquisition parameters.

FIG. 2: Raman spectra of droplets containing different concentration ratios RC of potassium ferrocyanide K4Fe(CN)6 and potassium ferricyanide K3Fe(CN)6 (horizontal axis: wave number in cm$^{-1}$, with ticks at 2000, 2060, 2095, 2136 and 2200). (a) RC = 0 (b) RC = 1 (c) RC = 9.

In order to optimize the filling protocol, we first use Raman microscopy to follow the kinetics of the concentration stabilization in the droplets after a sudden change in the aqueous phase flow rates.
Indeed, due to the PDMS elasticity and the injection system (syringe pumps), the finite response time of the device does not allow an instantaneous change of the concentrations. To estimate this response time, we have performed the following experiment: for t < 0 s, Q2 = 0 and Q3 = 500 µL hr$^{-1}$, and for t > 0 s, Q2 = Q3 = 250 µL hr$^{-1}$. Droplets first flow through channel 1, which is closed after 30 s. Then, droplets are stored in five other channels after 1, 2, 4, 6, and 10 min. Thus, Raman spectra obtained from the droplets in the different channels enable us to follow the evolution of RA as a function of time after the change of flow rates. Figure 3(a) shows that it reaches an almost constant value after 60 s, meaning that the concentrations become stable after this time. Such measurements illustrate that 20 min long protocols are sufficient to store droplets of desired compositions in the ten channels (≈ 2 min per channel).

A second series of experiments was performed to estimate and characterize the concentration gradient we can apply in the device. The storage channels are filled with droplets of different concentrations in K3Fe(CN)6 and K4Fe(CN)6 set from the flow rates. In each channel i, to reach a stable droplet composition, we maintain the flow for 90 s before closing the outlet to store the droplets [see Fig. 3(a)]. By measuring the Raman spectra of the droplets in the different channels, we obtain the ratio RA as a function of the theoretical ratio RC = Q2/Q3. The error bar corresponds to the standard deviation of the measurements performed on the droplets in a given channel. As can be seen in Fig. 3(b), a linear relationship between RA and RC is observed, as expected. Deviations of a few percent around the linear law are probably due to the Raman measurement uncertainties, to the accuracy of the injection system, and also to the PDMS elasticity.
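The expected linear law follows directly from the band-area relation $A_i = K_i C_i t V$. Written out for the two compounds, and assuming both bands are read from the same spectrum (same t and V); the subscripts 4 and 3 are shorthand introduced here for K4Fe(CN)6 and K3Fe(CN)6:

```latex
R_A = \frac{A_4}{A_3}
    = \frac{K_4\, C_4\, t\, V}{K_3\, C_3\, t\, V}
    = \frac{K_4}{K_3}\, R_C ,
\qquad R_C = \frac{C_4}{C_3}.
```

Under these assumptions the slope of the linear fit is simply the ratio of the two specific constants, and the acquisition time and analysis volume cancel out.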
These Raman measurements demonstrate that, with the developed protocol, we are able to store hundreds of droplets in ten channels in about 20 min, while consuming less than a few hundred µL of solution. We believe that more rigid and smaller microdevices, combined with an even more reactive injection system, would significantly decrease the amount of liquid used when filling the channels. Other strategies, involving for instance droplet generation with integrated microvalves [23], may also prove useful to decrease the required volumes of solution.

FIG. 3: (a) Evolution of the ratio RA in the droplets after a sudden change of the aqueous solution flow rates Q2 and Q3 (horizontal axis: t in s). Before t = 0 s, Q2 = 0 and Q3 = 500 µL hr$^{-1}$. For t > 0 s, Q2 = Q3 = 250 µL hr$^{-1}$. Between 5 and 30 s, the RA values are obtained from three single droplets in channel 1. After t = 60 s, each point is a mean value calculated on several droplets in a given channel. (b) Concentration ratio RA in droplets as a function of the concentration ratio RC determined from the aqueous solution flow rates. The dotted line corresponds to the linear fit of the data.

D. Temperature control

The temperature field of the chip is controlled with two Peltier modules (30×30×3.3 mm$^3$; CP1.4-71-06L; Melcor) placed underneath the wafer at the positions marked by the two dotted areas in Fig. 1(a). Since the two Peltier modules are independent, we can heat or cool the device, and also apply large temperature gradients. We use a silicon wafer as chip support to optimize thermal transfers and thus to create regular temperature gradients along the storage channels. Thin thermocouples (type K, 76 µm o.d., 5SRTC-TTKI-40-1M; Omega) measure the temperature of the device along three series of positions parallel to the storage channels. The first series is placed above channel 1, the second one between channels 5 and 6, and the third one below channel 10 [see Fig. 1(a)].
To reach the maximal precision on liquid temperature measurements inside the channels, the thermocouples are inserted in holes previously punched through the PDMS layer and filled with silicone oil. Thermocouple signals are processed with a data acquisition instrument (USB-9161; National Instruments) and LabView software. Figure 4(a) shows that we can easily apply temperature gradients up to […] °C over 5 cm. To estimate the temperature at any position along the storage channels, we perform a longitudinal and transverse linear interpolation of the three series of measurements. The final profile obtained after such an interpolation is depicted in Fig. 4(b). Note that the temperature is not perfectly homogeneous transversely to the storage channels. This is due to the size of the Peltier module as compared to the size of the droplet storage area: a smaller storage area, or larger Peltier modules, would give homogeneous temperature profiles along the transverse direction of the channels.

FIG. 4: Temperature profiles of the chip for a given temperature gradient (horizontal axes: X in mm). (a) Temperatures measured along the storage channels with thermocouples inserted through the PDMS layer at the different positions shown in Fig. 1(a). (▲) measurement series above channel 1; (■) series between channels 5 and 6; (▼) series below channel 10. (b) Interpolated temperature profile of the chip.

III. RESULTS

In the previous section we have shown that our microdevice allows us to build a two-dimensional array of droplets with both concentration and temperature gradients. We now present an application of this chip by measuring the solubility curve of an organic solute.

Such measurements are carried out with an adipic acid solution previously prepared in a beaker. It is made of 10.14 g of adipic acid (99%; Aldrich) in 50.66 g of deionized water. The solubility temperature of the solution is 63 °C.
To avoid any crystallization before the droplet formation, the syringe containing the solution and the corresponding tubing are heated at about 65 °C with two flexible heaters (Minco) controlled with temperature controllers (Minco). A stereo microscope (SZX12; Olympus) with an objective (DF PLFL 0.5× PF; Olympus) enables us to observe the device during the solubility study.

We inject the adipic acid solution in inlet 2 and deionized water in inlet 3. By changing the flow rate ratio we fill the storage channels with droplets whose concentration in adipic acid varies from 20 g / 100 g of water in channel 1 down to 6 g / 100 g of water in channel 10. The mass concentration C in the droplets is calculated according to:

$$C = \frac{C_0}{1 + (1 + C_0)\, Q_3 / Q_2}$$

where $C_0$ is the mass concentration of the initial adipic acid solution, and $Q_2$ and $Q_3$ are the respective flow rates of the solution and water (we checked that density variations induced by the presence of adipic acid are negligible). The microfluidic chip is kept at about 65 °C using the Peltier modules to avoid any crystallization during the droplet storage. Before stopping the droplets in a channel, we keep it open for 90 s for flow stabilization. In these conditions, the total filling of the ten channels is reached in less than 20 min and only 250 µL of solution are spent.

After the droplet storage, crystallization is induced by cooling. Note that the mean time of crystal nucleation is inversely proportional to the reactor volume. Indeed, the mean nucleation time is given by 1/(JV), where J is the nucleation rate, which does not depend on the volume V of the reactor (see Refs. [19, 24, 25, 26] and references therein). Crystal nucleation in a droplet of 100 nL thus takes $10^4$ times longer than in a vial of 1 mL. To reduce such a long induction time, we apply a strong cooling to increase the supersaturation significantly. In our case, down to ≈ −5 °C, crystals appear in all the droplets after a few minutes.
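The dilution relation used to set the droplet concentrations can be checked with a few lines of code. The sketch below assumes that the relation reads C = C0 / (1 + (1 + C0) Q3/Q2), with C expressed in grams of solute per gram of water, which is what a mass balance gives when density variations are neglected. The function names are hypothetical; the initial concentration is computed from the 10.14 g of adipic acid in 50.66 g of water quoted above.

```python
def droplet_concentration(c0, q2, q3):
    """Mass concentration (g solute / g water) in droplets.

    c0: concentration of the stock solution (g solute / g water);
    q2: flow rate of the stock solution; q3: flow rate of pure water
    (same units for both; only their ratio matters).
    """
    if q2 <= 0:
        raise ValueError("the solution flow rate q2 must be positive")
    return c0 / (1.0 + (1.0 + c0) * q3 / q2)


def water_to_solution_ratio(c0, c_target):
    """Flow rate ratio Q3/Q2 needed to dilute c0 down to c_target."""
    if not 0 < c_target <= c0:
        raise ValueError("c_target must lie in (0, c0]")
    return (c0 / c_target - 1.0) / (1.0 + c0)


c0 = 10.14 / 50.66  # stock solution, about 20 g / 100 g of water
for target in (0.20, 0.12, 0.06):
    ratio = water_to_solution_ratio(c0, target)
    check = droplet_concentration(c0, 1.0, ratio)
    print(f"target {100 * target:.0f} g/100 g water: Q3/Q2 = {ratio:.3f}, "
          f"check = {100 * check:.2f} g/100 g water")
```

With no added water (Q3 = 0) the function returns C0 itself, which corresponds to the most concentrated channel.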
To obtain the solubility curve directly on the chip, we then apply a temperature gradient between 32 and 65 °C after the crystallization step. Crystals dissolve in all the droplets whose temperature is higher than their solubility temperature. In the other droplets, crystals are partly dissolved but still exist (the equilibrium is reached in about 20 min). Typical images of the storage area are presented in Fig. 5. Since adipic acid crystals have birefringent properties, they are easily detected under crossed polarizers. The smallest detectable crystal size is about 50×50 µm$^2$ at the magnification used. Figure 5(a) enables us to observe the limit of crystal presence directly. Using interpolated temperature profiles such as the one displayed in Fig. 4 allows us to estimate the solubility temperatures for all ten concentrations (we choose them in the middle of the two successive droplets with and without crystals).

FIG. 5: Images of a part of the storage area obtained under crossed polarizers. Droplets of adipic acid solution are stored in the channels. The concentration in adipic acid was gradually changed between the upper and the bottom channels. After crystallization of all the droplets, a temperature gradient is applied (low temperature on the left and high temperature on the right). (a) The dotted line separating the droplets containing crystals from the empty droplets gives an estimation of the solubility limit. (b) Same image but with a different contrast displaying the droplet positions.

Figure 6 presents such solubility temperatures measured with our microfluidic device. The error corresponds to the temperature difference between the two droplets enclosing the solubility limit positions. These results are in good agreement with data obtained from the literature [27].

Naturally, the errors made in such measurements depend on the distance between two successive droplets, and on the amplitude of the temperature gradient.
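The reading of the solubility limit along one channel can be sketched as follows. The droplet temperatures are assumed to come from the interpolated profile, and the crystal flags from the crossed-polarizer image. The function name, the linear profile and the numerical values (3 mm spacing, 0.7 °C per mm, crystals surviving below 45 °C) are illustrative assumptions rather than measured data, and the error bar is taken here as half the temperature difference between the two enclosing droplets.

```python
def solubility_temperature(temps_c, has_crystals):
    """Locate the solubility limit along one channel.

    temps_c: droplet temperatures in deg C, ordered from cold to hot;
    has_crystals: one boolean per droplet (True if crystals are visible).
    Returns (midpoint, half_width) for the first pair of successive
    droplets with and without crystals, or None if no such pair exists.
    """
    for i in range(len(temps_c) - 1):
        if has_crystals[i] and not has_crystals[i + 1]:
            t_cold, t_hot = temps_c[i], temps_c[i + 1]
            return 0.5 * (t_cold + t_hot), 0.5 * (t_hot - t_cold)
    return None


# Hypothetical channel: droplets every 3 mm in a 0.7 deg C per mm gradient
# starting at 32 deg C, with crystals surviving below 45 deg C.
spacing_mm, gradient = 3.0, 0.7
temps = [32.0 + gradient * spacing_mm * i for i in range(16)]
crystals = [t < 45.0 for t in temps]
t_sol, err = solubility_temperature(temps, crystals)
print(f"solubility temperature = {t_sol:.2f} +/- {err:.2f} deg C")
```

Reducing either the droplet spacing or the gradient shrinks the returned half-width in direct proportion.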
In our case, the temperature gradient of 0.7 °C mm$^{-1}$ and a typical distance of 3 mm between two droplets give an error of ±1 °C. The application of smaller temperature gradients and the reduction of the distance between two successive droplets would give a better accuracy on the solubility limit.

FIG. 6: (•) Solubility of adipic acid in water measured in the case of a temperature gradient of 0.7 °C mm$^{-1}$ (horizontal axis: T in °C). (◦) Solubility data from the literature; the dotted line is a guide for the eye.

For the moment, the maximal temperature which can be investigated is limited by the evaporation of water through the PDMS layer [28]. Simple measurements show that the volume of an aqueous droplet stored in our device at 60 °C decreases by ≈ 10% in 4 hr. Such an effect is negligible for the experiments described above (droplet filling time 20 min at 65 °C), but may explain the small discrepancy observed in Fig. 6 at high temperature. We believe that the use of non-permeable materials such as glass, instead of PDMS, could easily broaden the possibilities offered by our system.

IV. CONCLUSION

In this work we have presented a new microfluidic tool to perform rapid screening of solubility diagrams. The device enables us to store hundreds of droplets (≈ 100 nL) of various chemical compositions in parallel microchannels, and to apply large temperature gradients. We have demonstrated, using a model system (adipic acid in water), that we could easily and directly access ten simultaneous measurements of the solubility curve over a large temperature range, in less than 1 hr, and with only 250 µL of solution. To conclude, we believe our device is a suitable tool for solubility diagram screening, more rapid, with a better temperature control, and cheaper than classical robotic workstations. Such a microfluidic tool may also be useful for many other applications where two-dimensional screening, temperature vs. composition, is required.

Acknowledgments

We gratefully thank G. Cristobal, J. Krishnamurti, J.
Leng, and F. Sarrazin for fruitful discussions and critical reading of this manuscript. We also acknowledge Région Aquitaine for funding and support, and the Atelier Mécanique of the CRPP for their technical help.

[1] G. H. W. Sanders and A. Manz, Trends Anal. Chem. 19, 364 (2000).
[2] J. R. Luft, J. Wolfley, I. Jurisica, J. Glasgow, S. Fortier, and G. T. DeTitta, J. Cryst. Growth 232, 591 (2001).
[3] D. L. Chen and R. F. Ismagilov, Curr. Opin. Chem. Biol. 10, 226 (2006).
[4] H. A. Stone, A. D. Stroock, and A. Ajdari, Annu. Rev. Fluid Mech. 36, 381 (2004).
[5] T. M. Squires and S. R. Quake, Rev. Mod. Phys. 77, 977 (2005).
[6] T. Vilkner, D. Janasek, and A. Manz, Anal. Chem. 76, 3373 (2004).
[7] N. Minc, C. Futterer, K. D. Dorfman, A. Bancaud, C. Gosse, C. Goubault, and J. L. Viovy, Anal. Chem. 76, 3770 (2004).
[8] J. Khandurina and A. Guttman, J. Chromatogr. A 943, 159 (2002).
[9] M. Chabert, K. D. Dorfman, P. de Cremoux, J. Roeraade, and J.-L. Viovy, Anal. Chem. 78, 7722 (2006).
[10] A. D. Stroock, S. K. Dertinger, A. Ajdari, I. Mezic, H. A. Stone, and G. M. Whitesides, Science 295, 647 (2002).
[11] E. M. Chan, A. P. Alivisatos, and R. A. Mathies, J. Am. Chem. Soc. 127, 13854 (2005).
[12] J.-B. Salmon, C. Dubrocq, P. Tabeling, S. Charier, D. Alcor, L. Jullien, and F. Ferrage, Anal. Chem. 77, 3417 (2005).
[13] S. A. Khan, A. Gunther, M. A. Schmidt, and K. F. Jensen, Langmuir 20, 8604 (2004).
[14] T. Thorsen, S. J. Maerkl, and S. R. Quake, Science 298, 580 (2002).
[15] T. Thorsen, R. W. Roberts, F. H. Arnold, and S. R. Quake, Phys. Rev. Lett. 86, 4163 (2001).
[16] H. Song, D. L. Chen, and R. F. Ismagilov, Angew. Chem. Int. Ed. Engl. 45, 7336 (2006).
[17] B. Zheng, L. S. Roach, and R. F. Ismagilov, J. Am. Chem. Soc. 125, 11170 (2003).
[18] J. Shim, G. Cristobal, D. Link, T. Thorsen, and S. Fraden, Using microfluidics to decouple nucleation and growth of protein crystals, Unpublished Work (2006).
[19] P. Laval, J.-B. Salmon, and M. Joanicot, J. Cryst. Growth doi:10.1016/j.jcrysgro.2006.12.044 (2007).
[20] J. C. McDonald and G. M. Whitesides, Acc. Chem. Res. 35, 491 (2002).
[21] S. L. Anna, N. Boutoux, and H. A. Stone, Appl. Phys. Lett. 82, 364 (2003).
[22] G. Cristobal, L. Arbouet, F. Sarrazin, D. Talaga, J.-L. Bruneel, M. Joanicot, and L. Servant, Lab Chip 6, 1140 (2006).
[23] B. T. Lau, C. A. Baitz, X. P. Dong, and C. L. Hansen, J. Am. Chem. Soc. 129, 454 (2007).
[24] A. C. Zettlemoyer, Nucleation (Marcel Dekker, New York, 1969).
[25] D. Kashchiev and G. M. Rosmalen, Cryst. Res. Technol. 38, 555 (2003).
[26] J. W. Mullin, Crystallization (Butterworth-Heinemann, Oxford, 2001), 4th ed.
[27] A. Apelblat and E. Manzurola, J. Chem. Thermodynamics 19, 317 (1986).
[28] J. Leng, B. Lonetti, P. Tabeling, M. Joanicot, and A. Ajdari, Phys. Rev. Lett. 96, 084503 (2006).

ABSTRACT This work describes a new microfluidic device developed for rapid screening of solubility diagrams. In several parallel channels, hundreds of nanoliter-volume droplets of a given solution are first stored with a gradual variation in the solute concentration. Then, the application of a temperature gradient along these channels enables us to read directly and quantitatively phase diagrams, concentration vs. temperature. We show, using a solution of adipic acid, that we can measure ten points of the solubility curve in less than 1 hr and with only 250 $\mu$L of solution. 
<|endoftext|><|startoftext|> Introduction One and two quasiparticles in the Laughlin state The ground state and the quasihole states One quasiparticle Two or more quasiparticles Quasiparticles and quasiholes Composite Fermion states in the Jain series The = 2/5 composite fermion ground state The quasihole operators The quasiparticle operator The = 3/7 state and the Jain series Connection to effective Chern-Simons theories and edge states Localized quasiparticles and fractional charge and statistics Numerical tests Two-quasiparticle wave function Random Phase Approximation Summary and Outlook The background charge Equivalence between CFT and CF wave functions An identity Equivalence between the = 2/5 CF and CFT wave functions. The general CF operators and the Jain series The normalization factors N1 and N2 References ABSTRACT It is known that a subset of fractional quantum Hall wave functions has been expressed as conformal field theory (CFT) correlators, notably the Laughlin wave function at filling factor $\nu=1/m$ ($m$ odd) and its quasiholes, and the Pfaffian wave function at $\nu=1/2$ and its quasiholes. We develop a general scheme for constructing composite-fermion (CF) wave functions from conformal field theory. Quasiparticles at $\nu=1/m$ are created by inserting anyonic vertex operators, $P_{\frac{1}{m}}(z)$, that replace a subset of the electron operators in the correlator. The one-quasiparticle wave function is identical to the corresponding CF wave function, and the two-quasiparticle wave function has correct fractional charge and statistics and is numerically almost identical to the corresponding CF wave function. 
We further show how to exactly represent the CF wavefunctions in the Jain series $\nu = s/(2sp+1)$ as the CFT correlators of a new type of fermionic vertex operators, $V_{p,n}(z)$, constructed from $n$ free compactified bosons; these operators provide the CFT representation of composite fermions carrying $2p$ flux quanta in the $n^{\rm th}$ CF Landau level. We also construct the corresponding quasiparticle- and quasihole operators and argue that they have the expected fractional charge and statistics. For filling fractions 2/5 and 3/7 we show that the chiral CFTs that describe the bulk wave functions are identical to those given by Wen's general classification of quantum Hall states in terms of $K$-matrices and $l$- and $t$-vectors, and we propose that to be generally true. Our results suggest a general procedure for constructing quasiparticle wave functions for other fractional Hall states, as well as for constructing ground states at filling fractions not contained in the principal Jain series. <|endoftext|><|startoftext|> Untitled 0 → π+π−π0 Time Dependent Dalitz analysis at BaBar. Gianluca Cavoto∗ INFN Sezione di Roma, Piazzale Aldo Moro 2, 00185 Rome, Italy I present here results of a time-dependent analysis of the Dalitz structure of neutral B meson decays to π+π−π0 from a dataset of 346 million BB̄ pairs collected at the Υ (4S) center of mass energy by the BaBar detector at the SLAC PEP-II e+e− accelerator. No significant CP violation effects are observed and 68% confidence interval is derived on the weak angle α to be [75,152] I. INTRODUCTION The time-dependent analysis of the B0 → π+π−π0 Dalitz plot (DP), dominated by the ρ(770) intermedi- ate resonances, extracts simultaneously the strong tran- sition amplitudes and the weak interaction phase α ≡ arg [−VtdV ∗tb/VudV ∗ub] of the Unitarity Triangle [1]. In the Standard Model, a non-zero value for α is respon- sible for the occurrence of mixing-induced CP violation in this decay. 
ρ±π∓ is not a CP eigenstate, and four flavor-charge configurations (B0(B̄0) → ρ±π∓) must be considered. The corresponding isospin analysis [2] is unfruitful with the present statistics since two pentagonal amplitude relations with 12 unknowns have to be solved (compared to 6 unknowns for the π+π− and ρ+ρ− systems). The differential B0 decay width with respect to the Mandelstam variables s+, s− (i.e., the Dalitz plot [3]) reads

$$ d\Gamma(B^0 \to \pi^+\pi^-\pi^0) = \frac{1}{(2\pi)^3}\,\frac{|A_{3\pi}|^2}{8 m_{B^0}^3}\, ds_+\, ds_- , $$

where A3π (Ā3π) is the Lorentz-invariant amplitude of the three-body decay B0 → π+π−π0 (B̄0 → π+π−π0). We assume in the following that the amplitudes are dominated by the three resonances ρ+, ρ− and ρ0 and we write

$$ A_{3\pi} = f_+ A^+ + f_- A^- + f_0 A^0 , \qquad \bar A_{3\pi} = f_+ \bar A^+ + f_- \bar A^- + f_0 \bar A^0 , $$

where the fκ (with κ = {+,−, 0} denoting the charge of the ρ from the decay of the B0 meson) are functions of s+ and s− that incorporate the kinematic and dynamical properties of the B0 decay into a (vector) ρ resonance and a (pseudoscalar) pion, and where the Aκ are complex amplitudes that include weak and strong transition phases and that are independent of the Dalitz variables. With ∆t ≡ t3π − ttag defined as the proper time interval between the decay of the fully reconstructed $B^0_{3\pi}$ and that of the other meson $B^0_{\rm tag}$, the time-dependent decay rate when the tagging meson is a B0 (B̄0) is given by

$$ |A^{\pm}_{3\pi}(\Delta t)|^2 = \frac{e^{-|\Delta t|/\tau_{B^0}}}{4\tau_{B^0}} \Big[ |A_{3\pi}|^2 + |\bar A_{3\pi}|^2 \mp \big(|A_{3\pi}|^2 - |\bar A_{3\pi}|^2\big)\cos(\Delta m_d \Delta t) \pm 2\,{\rm Im}\big[\bar A_{3\pi} A^*_{3\pi}\big]\sin(\Delta m_d \Delta t) \Big] , \qquad (1) $$

where τB0 is the mean B0 lifetime and ∆md is the B0B̄0 oscillation frequency. Here, we have assumed that CP violation in B0B̄0 mixing is absent (|q/p| = 1), ∆ΓBd = 0 and CPT is conserved. Inserting the amplitudes A3π and Ā3π one obtains for the terms in Eq. (1)

$$ |A_{3\pi}|^2 \pm |\bar A_{3\pi}|^2 = \sum_{\kappa\in\{+,-,0\}} |f_\kappa|^2\, U^{\pm}_{\kappa} + 2\sum_{\kappa<\sigma\in\{+,-,0\}} \Big( {\rm Re}[f_\kappa f^*_\sigma]\, U^{\pm,{\rm Re}}_{\kappa\sigma} - {\rm Im}[f_\kappa f^*_\sigma]\, U^{\pm,{\rm Im}}_{\kappa\sigma} \Big) , $$

$$ {\rm Im}\big[\bar A_{3\pi} A^*_{3\pi}\big] = \sum_{\kappa\in\{+,-,0\}} |f_\kappa|^2\, I_{\kappa} + \sum_{\kappa<\sigma\in\{+,-,0\}} \Big( {\rm Re}[f_\kappa f^*_\sigma]\, I^{\rm Im}_{\kappa\sigma} + {\rm Im}[f_\kappa f^*_\sigma]\, I^{\rm Re}_{\kappa\sigma} \Big) . \qquad (2) $$

The 27 real-valued coefficients defined in Table I that multiply the $f_\kappa f^*_\sigma$ bilinears are determined by the fit. Each of the coefficients is related in a unique way to physically more intuitive quantities, such as tree-level and penguin-type amplitudes, the angle α, or the quasi-two-body CP and dilution parameters [4] (cf. Section IV B). We determine the quantities of interest in a subsequent least-squares fit to the measured U and I coefficients.

∗Electronic address: gianluca.cavoto@roma1.infn.it
http://arxiv.org/abs/0704.0571v1

II. DALITZ MODEL

The ρ resonances are assumed to be the sum of the ground state ρ(770) and the radial excitations ρ(1450) and ρ(1700), with resonance parameters determined by a combined fit to τ+ → ν̄τπ+π0 and e+e− → π+π− data [5]. Since the hadronic environment is different in B decays, we cannot rely on this result and therefore determine the relative ρ(1450) and ρ(1700) amplitudes simultaneously with the CP parameters from the fit. Variations of the other parameters and possible contributions to the B0 → π+π−π0 decay other than the ρ's are studied as part of the systematic uncertainties (Section IV A). Following Ref. [5], the ρ resonances are parameterized in fκ by a modified relativistic Breit-Wigner function introduced by Gounaris and Sakurai (GS) [6].

FIG. 1: Square Dalitz plots for Monte-Carlo generated B0 → π+π−π0 decays. The decays have been simulated without any detector effect and the amplitudes A+, A− and A0 have all been chosen equal to 1 in order to have destructive interferences at equal ρ masses. The main overlap regions between the charged and neutral ρ bands are indicated by the hatched areas. Dashed lines in both plots correspond to $\sqrt{s_{+,-,0}} = 1.5$ GeV/c2: the central region of the Dalitz plot contains almost no signal event.
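As a rough numerical illustration of the time-dependent rate of Eq. (1), the sketch below evaluates it for given total amplitudes. The function names are hypothetical, and the default lifetime and oscillation frequency are approximate values chosen for illustration (assumptions, not numbers from this text); the normalization follows the usual convention with a 1/(4τ) prefactor.

```python
import math

# Approximate, illustrative constants (assumptions, not taken from the text).
TAU_B0_PS = 1.53          # mean B0 lifetime in ps
DELTA_MD_PER_PS = 0.507   # B0 oscillation frequency in ps^-1


def total_amplitude(f, a):
    """Return sum over kappa in {+, -, 0} of f_kappa * A^kappa.

    f and a are dicts keyed by '+', '-', '0' holding complex numbers.
    """
    return sum(f[k] * a[k] for k in ("+", "-", "0"))


def decay_rate(a3pi, abar3pi, dt_ps, tag_is_b0,
               tau_ps=TAU_B0_PS, dmd=DELTA_MD_PER_PS):
    """Time-dependent rate for a B0 tag (upper signs) or B0bar tag (lower signs)."""
    sign = 1.0 if tag_is_b0 else -1.0
    total = abs(a3pi) ** 2 + abs(abar3pi) ** 2
    diff = abs(a3pi) ** 2 - abs(abar3pi) ** 2
    interference = (abar3pi * a3pi.conjugate()).imag
    envelope = math.exp(-abs(dt_ps) / tau_ps) / (4.0 * tau_ps)
    return envelope * (total
                       - sign * diff * math.cos(dmd * dt_ps)
                       + sign * 2.0 * interference * math.sin(dmd * dt_ps))


# Hypothetical form-factor values at one Dalitz-plot point, unit amplitudes.
f = {"+": 1.0 + 0.0j, "-": 0.5j, "0": 0.2 + 0.0j}
a = {"+": 1.0 + 0.0j, "-": 1.0 + 0.0j, "0": 1.0 + 0.0j}
amp = total_amplitude(f, a)
rate_b0_tag = decay_rate(amp, amp, dt_ps=2.0, tag_is_b0=True)
```

When the B0 and B̄0 amplitudes are equal, the cosine and sine terms vanish and both tags give the same rate, which is a quick sanity check of the sign conventions.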
Large variations occurring in small areas of the Dalitz plot are very difficult to describe in detail. These regions are particularly important since this is where the interference, and hence our ability to determine the strong phases, occurs. We therefore apply the transformation

$$ ds_+\, ds_- \longrightarrow |\det J|\, dm'\, d\theta' , $$

which defines the Square Dalitz plot (SDP). The new coordinates are

$$ m' \equiv \frac{1}{\pi}\arccos\!\left( 2\,\frac{m_0 - m_0^{\min}}{m_0^{\max} - m_0^{\min}} - 1 \right) , \qquad \theta' \equiv \frac{1}{\pi}\,\theta_0 , $$

where m0 is the invariant mass between the charged tracks, $m_0^{\max} = m_{B^0} - m_{\pi^0}$ and $m_0^{\min} = 2m_{\pi^+}$ are the kinematic limits of m0, and θ0 is the ρ0 helicity angle; θ0 is defined by the angle between the π+ in the ρ0 rest frame and the ρ0 flight direction in the B0 rest frame. J is the Jacobian of the transformation that zooms into the kinematic boundaries of the Dalitz plot, shown in Fig. 1.

III. ANALYSIS DESCRIPTION

The U and I coefficients and the B0 → π+π−π0 event yield are determined by a maximum-likelihood fit of the signal model to the selected candidate events. Kinematic and event shape variables exploiting the characteristic properties of the events are used in the fit to discriminate signal from background.

A. Signal and background parametrization

We reconstruct B0 → π+π−π0 candidates from pairs of oppositely-charged tracks, which are required to form a good quality vertex, and a π0 candidate. In order to ensure that all events are within the Dalitz plot boundaries, we constrain the three-pion invariant mass to the B mass. A B-meson candidate is characterized kinematically by the energy-substituted mass $m_{\rm ES} = \big[(\tfrac{1}{2}s + \mathbf{p}_0\cdot\mathbf{p}_B)^2/E_0^2 - p_B^2\big]^{1/2}$ and the energy difference $\Delta E = E_B^* - \tfrac{1}{2}\sqrt{s}$, where (EB, pB) and (E0, p0) are the four-vectors of the B-candidate and the initial electron-positron system, respectively. The asterisk denotes the Υ(4S) frame, and s is the square of the invariant mass of the electron-positron system. We require 5.272 < mES < 5.288 GeV/c2. The ∆E resolution exhibits a dependence on the π0 energy and therefore varies across the Dalitz plot.
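The square Dalitz plot coordinates defined above are simple to compute once m0 and θ0 are known. The sketch below is illustrative only: the function name is hypothetical and the particle masses are approximate values inserted as assumptions.

```python
import math

# Approximate masses in GeV/c^2 (assumed, illustrative values).
M_B0 = 5.2796
M_PI0 = 0.13498
M_PI_CHARGED = 0.13957


def square_dalitz(m0, theta0):
    """Map (m0, theta0) to the square Dalitz plot coordinates (m', theta').

    m0     -- invariant mass of the two charged tracks, in GeV/c^2
    theta0 -- rho0 helicity angle in radians, in [0, pi]
    """
    m_min = 2.0 * M_PI_CHARGED
    m_max = M_B0 - M_PI0
    x = 2.0 * (m0 - m_min) / (m_max - m_min) - 1.0
    x = max(-1.0, min(1.0, x))  # guard against rounding just outside [-1, 1]
    return math.acos(x) / math.pi, theta0 / math.pi


# The kinematic limits of m0 map onto the edges of the unit square in m'.
m_prime_low, _ = square_dalitz(2.0 * M_PI_CHARGED, 0.0)
m_prime_high, _ = square_dalitz(M_B0 - M_PI0, 0.0)
```

Both coordinates fall in [0, 1]; the arccos stretches the regions near the kinematic limits of m0, which is where the ρ bands overlap.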
We account for this effect by introducing the transformed quantity ∆E′ = (2∆E − ∆E+ − ∆E−)/(∆E+ − ∆E−), with ∆E±(m0) = c± − (c± ∓ c̄) (m0/mmax0 )2, where m0 is strongly correlated with the energy of π0. We use the val- ues c̄ = 0.045GeV, c− = −0.140GeV, c+ = 0.080GeV, mmax0 = 5.0GeV, and require −1 < ∆E′ < 1. Backgrounds arise primarily from random combina- tions in continuum qq̄ events. To enhance discrimination between signal and continuum, we use a neural network (NN) [7] to combine discriminating topological variables. The time difference ∆t is obtained from the measured distance between the z positions (along the beam direc- tion) of the B03π and B tag decay vertices, and the boost βγ = 0.56 of the e+e− system: ∆t = ∆z/βγc. To deter- mine the flavor of the B0tag we use the B flavor tagging algorithm of Ref. [8]. This produces six mutually exclu- sive tagging categories. Events with multiple B candidates passing the full se- lection occur in 16% (ρ±π∓) and 9% (ρ0π0) of the time, according to signal MC. If the multiple candidates have different π0 candidates, we choose the B candidate with the reconstructed π0 mass closest to the nominal π0 mass; in the case that both candidates have the same π0, we pick the first one. The signal efficiency determined from MC simulation is 24% for B0 → ρ±π∓ and B0 → ρ0π0 events, and 11% for non-resonant B0 → π+π−π0 events. Of the selected signal events, 22% of B0 → ρ±π∓, 13% of B0 → ρ0π0, and 6% of non-resonant events are misreconstructed. Misreconstructed events occur when a track or neutral cluster from the tagging B is assigned to the reconstructed signal candidate. This occurs most often for low-momentum tracks and photons and hence the misreconstructed events are concentrated in the cor- ners of the Dalitz plot. Since these are also the areas where the ρ resonances overlap strongly, it is important to model the misreconstruced events correctly. We use MC simulated events to study the background from other B decays. 
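The transformed energy difference ∆E′ and the conversion ∆t = ∆z/(βγc) quoted above can be written out as a short sketch. The constants c̄, c−, c+, m0max and βγ are the values given in the text; the function names and the choice of units (GeV for energies, µm for ∆z, ps for ∆t) are assumptions made for the example.

```python
C_BAR = 0.045      # GeV
C_MINUS = -0.140   # GeV
C_PLUS = 0.080     # GeV
M0_MAX = 5.0       # GeV, as quoted in the text
BETA_GAMMA = 0.56  # boost of the e+e- system
C_UM_PER_PS = 299.792458  # speed of light in micrometres per picosecond


def delta_e_bounds(m0):
    """Return (dE_minus, dE_plus), the m0-dependent Delta E window in GeV."""
    r2 = (m0 / M0_MAX) ** 2
    de_plus = C_PLUS - (C_PLUS - C_BAR) * r2
    de_minus = C_MINUS - (C_MINUS + C_BAR) * r2
    return de_minus, de_plus


def delta_e_prime(delta_e, m0):
    """Rescale Delta E so that the m0-dependent window maps onto [-1, 1]."""
    de_minus, de_plus = delta_e_bounds(m0)
    return (2.0 * delta_e - de_plus - de_minus) / (de_plus - de_minus)


def delta_t_ps(delta_z_um):
    """Convert a vertex separation along the beam axis (um) to Delta t (ps)."""
    return delta_z_um / (BETA_GAMMA * C_UM_PER_PS)


# At m0 = M0_MAX the window shrinks to the symmetric interval [-C_BAR, +C_BAR].
window_at_max = delta_e_bounds(M0_MAX)
```

By construction ∆E′ equals +1 at ∆E = ∆E+ and −1 at ∆E = ∆E−, so the requirement −1 < ∆E′ < 1 selects a window whose width follows the π0 energy.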
More than a hundred channels were considered in preliminary studies, of which twenty- nine are included in the final likelihood model. For each mode, the expected number of selected events is com- puted by multiplying the selection efficiency (estimated using MC simulated decays) by the world average branch- ing fraction (or upper limit), scaled to the dataset lumi- nosity (310 fb−1). The selected on-resonance data sample is assumed to consist of signal, continuum-background and B-background components, separated by the flavor and tagging category of the tag side B decay. The sig- nal likelihood consists of the sum of a correctly recon- structed (“truth-matched”, TM) component and a mis- reconstructed (“self-cross-feed”, SCF) component. B. Dalitz and ∆t distribution The Dalitz plot PDFs require as input the Dalitz plot- dependent relative selection efficiency, ǫ = ǫ(m′, θ′), and SCF fraction, fSCF = fSCF(m ′, θ′). Both quantities are taken from MC simulation. Away from the Dalitz plot corners the efficiency is uni- form, while it decreases when approaching the corners, where one of the three particles in the final state is close to rest so that the acceptance requirements on the par- ticle reconstruction become restrictive. Combinatorial backgrounds and hence SCF fractions are large in the corners of the Dalitz plot due to the presence of soft neu- tral clusters and tracks. The width of the dominant ρ(770) resonance is large compared to the mass resolution for TM events (about 8MeV/c2 core Gaussian resolution). We therefore neglect resolution effects in the TM model. Misreconstructed events have a poor mass resolution that strongly varies across the Dalitz plot. It is described in the fit by a 2 × 2-dimensional resolution function, convoluted with signal Dalitz PDF. The ∆t resolution function for signal and B- background events is a sum of three Gaussian distribu- tions, with parameters determined by a fit to fully recon- structed B0 decays [8]. 
The Dalitz plot- and ∆t-dependent PDFs factorize for the charged-B-backgroundmodes, but not necessarily for the neutral-B background due to B0B0 mixing. The charged B-background contribution to the likeli- hood parametrizes tag-“charge” correlation (represented by an effective flavor-tag-versus-Dalitz-coordinate cor- relation), and therefore possible direct CP violation in these events. The Dalitz plot PDFs are obtained from MC simula- tion and are described with the use of non-parametric functions. The ∆t resolution parameters are determined by a fit to fully reconstructed B+ decays. The neutral-B background is parameterized with PDFs that depend on the flavor tag of the event and, depending on the final states they can show correla- tions between the flavor tag and the Dalitz coordinate. The Dalitz plot PDFs are obtained from MC simulation and are described with the use of non-parametric func- tions. For neutral-B background, the signal ∆t resolution model is assumed. The Dalitz plot of the continuum events is parametrized with an empirical shape. extracted from on-resonance events selected in the mES sidebands and corrected for feed-through from B decays. The contin- uum ∆t distribution is parameterized as the sum of three Gaussian distributions with common mean and three dis- tinct widths that scale the ∆t per-event error, all deter- mined by the fit. IV. RESULTS The maximum-likelihood fit results in a B0 → π+π−π0 event yield of 1847 ± 69, where the error is statistical only. For the U and I coefficients, the results are given together with their statistical and systematic errors in Table IV. The signal is dominated by B0 → ρ±π∓ de- cays. We observe an excess of ρ0π0 events, which is in agreement with our previous upper limit [9], and the lat- est measurement from the Belle collaboration [10]. The result for the ρ(1450) amplitude is in agreement with the findings in τ and e+e− decays [5]. 
For the relative strong phase between the ρ(770) and the ρ(1450) amplitudes we find (171 ± 23)° (statistical error only), which is compatible with the result from τ and e+e− data.

A. Systematics studies

The most important contribution to the systematic uncertainty stems from the modeling of the Dalitz plot dynamics for signal. We evaluate this by observing the difference between the true values and Monte Carlo fit results, in which events are generated based on an alternative model. The alternative fit model has, in addition, a uniform Dalitz distribution for the non-resonant events and possible resonances including f0(980), f2(1270), and a low mass S-wave σ. The fit does not find a significant number of any of those decays. However, the inclusion of a low mass π+π− S-wave component significantly degrades our ability to identify ρ0π0 events.

We vary the mass and width of the ρ(770), ρ(1450), and ρ(1700) within ranges that exceed twice the errors found for these parameters in the fits to τ and e+e− data [5], and assign the observed differences in the measured U and I coefficients as systematic uncertainties.

To validate the fitting tool, we perform fits on large MC samples with the measured proportions of signal, continuum and B-background events. No significant biases are observed in these fits, and the statistical uncertainties on the fit parameters are taken as systematic uncertainties.

"Quasi two-body"   U^±_κ = |A^κ|² ± |Ā^κ|²
  U^+_0   ρ0π0 fit fraction        0.237 ± 0.053 ± 0.043
  U^+_−   ρ−π+ fit fraction        1.33 ± 0.11 ± 0.04
  U^−_0   Direct CPV (ρ0π0)        −0.055 ± 0.098 ± 0.13
  U^−_−   Direct CPV (ρ−π+)        −0.30 ± 0.15 ± 0.03
  U^−_+   Direct CPV (ρ+π−)        0.53 ± 0.15 ± 0.04
"Quasi two-body"   I_κ = Im[Ā^κ A^κ*]
  I_0     Int. Mixing CPV ρ0π0     −0.028 ± 0.058 ± 0.02
  I_−     Int. Mixing CPV ρ−π+     −0.03 ± 0.10 ± 0.03
  I_+     Int. Mixing CPV ρ+π−     −0.039 ± 0.097 ± 0.02
"Interference"   U^{±,Re(Im)}_κσ = Re(Im)[A^κ A^σ* ± Ā^κ Ā^σ*]
  U^{±,Re(Im)}_{+−}   0.62 ± 0.54 ± 0.72;  0.13 ± 0.94 ± 0.17;  0.38 ± 0.55 ± 0.28;  2.14 ± 0.91 ± 0.33
  U^{±,Re(Im)}_{+0}   0.03 ± 0.42 ± 0.12;  −0.75 ± 0.40 ± 0.15;  −0.93 ± 0.68 ± 0.08;  −0.47 ± 0.80 ± 0.3
  U^{±,Re(Im)}_{−0}   −0.03 ± 0.40 ± 0.23;  −0.52 ± 0.32 ± 0.08;  0.24 ± 0.61 ± 0.2;  −0.42 ± 0.73 ± 0.28
"Interference"   I^Re_κσ = Re[Ā^κ A^σ* − Ā^σ A^κ*]
  I^Re_{+−}   −0.1 ± 1.9 ± 0.3
  I^Re_{+0}   0.2 ± 1.1 ± 0.4
  I^Re_{−0}   0.92 ± 0.91 ± 0.4
"Interference"   I^Im_κσ = Im[Ā^κ A^σ* + Ā^σ A^κ*]
  I^Im_{+−}   −1.9 ± 1.1 ± 0.1
  I^Im_{+0}   −0.1 ± 1.1 ± 0.3
  I^Im_{−0}   0.7 ± 1.0 ± 0.3

TABLE I: Definitions and results for the 26 U and I observables extracted from the fit. We determine the values of the U and I coefficients relative to U^+_+.

Another major source of systematic uncertainty is the B-background model. The expected event yields from the background modes are varied according to the uncertainties in the measured or estimated branching fractions. Since B-background modes may exhibit CP violation, the corresponding parameters are varied within appropriate uncertainty ranges.

The continuum Dalitz plot PDF is extrapolated from the mES sideband, and large samples of off-resonance data with loosened requirements on ∆E and the NN are used to compare the distributions of m′ and θ′ between the mES sideband and the signal region. No significant differences are found. We assign as systematic error the effect seen when weighting the continuum Dalitz plot PDF by the ratio of both data sets. This effect is mostly statistical in origin.

Other systematic effects due to the signal PDFs comprise uncertainties in the PDF parameterization, the treatment of misreconstructed events, the tagging performance, and the modeling of the signal contributions, and are estimated using various data control samples.

FIG. 2: Confidence level functions for α (in degrees; BABAR preliminary). Indicated by the dashed horizontal lines are the confidence level (C.L.) values corresponding to 1σ and 2σ, respectively.

B. Interpretation of the results

The U and I coefficients are related to the quasi-two-body parameters (Table II) defined in Ref. [4], explicitly accounting for the presence of interference effects, and are thus exact even for a ρ with finite width. The systematic errors are dominated by the uncertainty on the CP content of the B-related backgrounds. One can transform the experimentally convenient, namely uncorrelated, direct CP-violation parameters C and Aρπ into the physically more intuitive quantities A^{+−}_ρπ and A^{−+}_ρπ. The significance, including systematic uncertainties and calculated by using a minimum χ2 method, for the observation of non-zero direct CP violation is at the 3.0σ level.

  C = (C+ + C−)/2                                0.154 ± 0.090 ± 0.037
  S = (S+ + S−)/2                                0.01 ± 0.12 ± 0.028
  ∆C = (C+ − C−)/2                               0.377 ± 0.091 ± 0.021
  ∆S = (S+ − S−)/2                               0.06 ± 0.13 ± 0.029
  Aρπ                                            −0.142 ± 0.041 ± 0.015
  A^{+−}_ρπ = (|κ^{+−}|² − 1)/(|κ^{+−}|² + 1)    0.03 ± 0.07 ± 0.03
  A^{−+}_ρπ = (|κ^{−+}|² − 1)/(|κ^{−+}|² + 1)    −0.38 +0.15/−0.16 ± 0.07

TABLE II: Quasi-two-body parameter definitions and results, with C± and S± as defined in Ref. [4]; κ^{+−} = (q/p)(Ā^−/A^+) and κ^{−+} = (q/p)(Ā^+/A^−), so that A^{+−}_ρπ (A^{−+}_ρπ) involves only diagrams where the ρ (π) meson is emitted by the W boson. A^{+−}_ρπ and A^{−+}_ρπ are evaluated as

$$ A^{+-}_{\rho\pi} = -\frac{A_{\rho\pi} + C + A_{\rho\pi}\Delta C}{1 + \Delta C + A_{\rho\pi} C} , \qquad A^{-+}_{\rho\pi} = \frac{A_{\rho\pi} - C - A_{\rho\pi}\Delta C}{1 - \Delta C - A_{\rho\pi} C} . $$

Their correlation coefficient is 0.62.

The measurement of the resonance interference terms allows us to constrain the relative phase δ+− = arg(A^{+*}A^−) between the amplitudes of the decays B0 → ρ−π+ and B0 → ρ+π−. This constraint can be improved with the use of strong isospin symmetry. The amplitudes Aκ represent the sum of tree-level (Tκ) and penguin-type (Pκ) amplitudes, which have different CKM factors. Here we denote by κ̄ the charge conjugate of κ, where 0̄ = 0.
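The expressions quoted in the caption of Table II for A^{+−}_ρπ and A^{−+}_ρπ can be checked against the central values of C, ∆C and Aρπ in the same table. The short sketch below does only that; the function name is hypothetical and uncertainties and correlations are ignored.

```python
def direct_cp_asymmetries(a_rhopi, c, delta_c):
    """Return (A^{+-}_rhopi, A^{-+}_rhopi) from A_rhopi, C and Delta C."""
    a_pm = -(a_rhopi + c + a_rhopi * delta_c) / (1.0 + delta_c + a_rhopi * c)
    a_mp = (a_rhopi - c - a_rhopi * delta_c) / (1.0 - delta_c - a_rhopi * c)
    return a_pm, a_mp


# Central values from Table II.
a_pm, a_mp = direct_cp_asymmetries(a_rhopi=-0.142, c=0.154, delta_c=0.377)
```

With these inputs the two asymmetries come out close to the tabulated central values of 0.03 and −0.38.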
We define [11]

$$ A^{\kappa} = T^{\kappa} e^{-i\alpha} + P^{\kappa} , \qquad \bar A^{\kappa} = T^{\bar\kappa} e^{+i\alpha} + P^{\bar\kappa} , $$

where the magnitudes of the CKM factors have been absorbed in the $T^{\kappa}$, $P^{\kappa}$, $T^{\bar\kappa}$ and $P^{\bar\kappa}$. Using strong isospin symmetry and neglecting isospin-breaking effects, one can identify P0 = −(P+ + P−)/2, and 9 unknowns have to be determined by the fit. We find for the solution that is favored by the fit δ+− = (34 ± 29)°, where the errors include both statistical and systematic effects, but only a marginal constraint on δ+− is obtained for C.L. < 0.05.

Finally, following the same procedure, we can also derive a constraint on α. The resulting C.L. function versus α is given in Fig. 2 and includes systematic uncertainties. Ignoring the mirror solution at α + 180°, we find α ∈ (75°, 152°) at 68% C.L. No constraint on α is achieved at two sigma and beyond.

V. CONCLUSIONS

We have presented the preliminary measurement of CP-violating asymmetries in B0 → π+π−π0 decays dominated by the ρ resonance. The results are obtained from a data sample of 346 million Υ(4S) → BB̄ decays. We perform a time-dependent Dalitz plot analysis. From the measurement of the coefficients of 26 form factor bilinears we determine the three CP-violating and two CP-conserving quasi-two-body parameters, where we find 3.0σ evidence of direct CP violation. Taking advantage of the interference between the ρ resonances in the Dalitz plot, we derive constraints on the relative strong phase between B0 decays to ρ+π− and ρ−π+, and on the angle α of the Unitarity Triangle. These measurements are consistent with the expectation from the CKM fit [12].

Acknowledgments

The author wishes to thank the conference organizers for an enjoyable and well-organized workshop. This work is supported by the Istituto Nazionale di Fisica Nucleare (INFN) and the United States Department of Energy (DOE) under contract DE-AC02-76SF00515.

[1] H.R. Quinn and A.E. Snyder, Phys. Rev. D48, 2139 (1993).
[2] H.J. Lipkin, Y. Nir, H.R. Quinn and A. Snyder, Phys. Rev. 
D44, 1454 (1991). [3] W. M. Yao et al. [Particle Data Group], J. Phys. G 33 (2006) 1. [4] BABAR Collaboration (B. Aubert et al.), Phys. Rev. Lett. 91, 201802 (2003); updated preliminary results at BABAR-PLOT-0055 (2003). [5] ALEPH Collaboration, (R. Barate et al.), Z. Phys. C76, 15 (1997); we use updated lineshape fits including new data from e+e− annihilation [13] and τ spectral func- tions [14] (masses and widths in MeV/c2): mρ±(770) = 775.5±0.6, mρ0(770) = 773.1±0.5, Γρ±(770) = 148.2±0.8, Γρ±(770) = 148.0 ± 0.9, mρ(1450) = 1409 ± 12, Γρ(1450) = 500± 37, mρ(1700) = 1749 ± 20, and Γρ(1700) ≡ 235. [6] G.J. Gounaris and J.J. Sakurai, Phys. Rev. Lett. 21, 244 (1968). [7] P. Gay, B. Michel, J. Proriol, and O. Deschamps, “Tag- ging Higgs Bosons in Hadronic LEP-2 Events with Neural Networks.”, In Pisa 1995, New computing techniques in physics research, 725 (1995). [8] BABAR Collaboration, B. Aubert et al., Phys. Rev. D66, 032003 (2002). [9] BABAR Collaboration (B. Aubert et al.), Phys. Rev. Lett. 93, 051802 (2004). [10] Belle Collaboration (J. Dragic et al.), Phys. Rev. D73, 111105 (2006). [11] The BABAR Physics Book, Editors P.F. Harrison and H.R. Quinn, SLAC-R-504 (1998). [12] M. Bona et al., JHEP, 0507 (2005) 028, J. Charles et al., Eur. Phys. J. C41, 1 (2005). [13] R.R. Akhmetshin et al. (CMD-2 Collaboration), Phys. Lett. B527, 161 (2002). [14] ALEPH Collaboration, ALEPH 2002-030 CONF 2002- 019, (July 2002). ABSTRACT I present here results of a time-dependent analysis of the Dalitz structure of neutral $B$ meson decays to $\pip\pim\piz$ from a dataset of 346 million $B \bar B$ pairs collected at the $\Upsilon(4S)$ center of mass energy by the BaBar detector at the SLAC PEP-II $e^+e^-$ accelerator. 
No significant CP violation effects are observed and a 68% confidence interval is derived on the weak angle $\alpha$ to be [75$^\circ$,152$^\circ$] <|endoftext|><|startoftext|> Introduction

Thermally stable polymers have attracted a lot of interest due to their potential use as the active component in electronic, optical and optoelectronic applications, such as light-emitting diodes, light-emitting electrochemical cells, photodiodes, photovoltaic cells, field-effect transistors, optocouplers and optically pumped lasers in solution and solid state. Polymer-based structures are the focus of intensive investigations as mechanically and physically flexible, processible materials for large-area photoemitting and photosensitive devices. Their wide practical application is inhibited by present-day limitations in control over luminescent spectra, sensitivity and efficiency. We report results of our investigations into the thermal treatment of poly(p-phenylene vinylene) (PPV) films grown on quartz and glass substrates. The samples studied had a thickness in the range 50 – 200 nm. Film thickness, morphology and structural properties were investigated by a range of techniques, in particular atomic force microscopy (AFM), Dektak profilometry, ellipsometry and UV-VIS spectroscopy.

2. Experimental Part

Thin polymeric films are often used in the microelectronic industry and in the development of optoelectronic applications. Homogeneous films with thickness varying from 50 to 200 nm are commonly prepared by spin coating. In this technique, a polymer solution is dropped on the substrate surface (in our case glass and quartz), which rotates at a given angular velocity for a given period of time. The film thickness is controlled by the concentration of the polymer in solution (5% PPV in our experiment), the polymer molecular weight, the spinning velocity and the solvent evaporation rate. 
The polymer films are annealed at elevated temperature in vacuum and in normal atmosphere, then investigated, and the results are compared. This work is concerned with the morphology of thin films obtained by spin coating and subjected to different annealing methods. The interactions between substrate, polymer and solvent were qualitatively correlated with the resulting surface morphology of the spin-coated films and with the treatment applied. We chose quartz and glass as substrates because they are transparent and convenient for spectroscopic investigations in transmission mode, and because P-PPV dissolves in common solvents like toluene and chloroform. Moreover, the determination of the optical absorption and transmission, morphology and stability of the films is important for the development of electronic applications and waveguides [1]. Analytical grade toluene and chloroform were used to prepare the solutions at a polymer concentration of 5 mg mL-1. The P-PPV was dissolved in solvents in which no phase separation takes place. The chemical structure of PPV is schematically represented in Figure 1.

Figure 1. Synthesis of PPV (after C.J. Brabec et al.). [Scheme labels: H2O2 / TeO2; NaOtBu; 200 °C; compounds III and IV.]

3. Methods and Results

Spin coating – The PPV films were prepared by spin coating on commercial quartz and glass substrates. The substrates, of dimensions 1 cm x 2 cm, were previously cleaned in a standard manner and dried under a stream of N2 [2]. All coatings were performed with a spinning velocity of 2000 rpm and a spinning time of 60 seconds.

Figure 2. Spin coating deposition

Ellipsometry – The mean thickness and index of refraction (n) of the films were determined by means of ellipsometry with a Plasmos SD2000 Automatic ellipsometer, Munich, Germany. The sample characteristics are shown in Table 1.

Table 1. Characteristics of PPV films obtained by spin coating. All measurements were performed at 24 ± 2 °C.

  Sample                                              Solvent                   Thickness (nm)   Refractive index
  P-PPV on quartz (before annealing)                  Toluene and Chloroform    87 ± 5           1.3 ± 0.05
  PPV on quartz (annealed in vacuum)                  Toluene and Chloroform    45 ± 5           2.590 ± 0.05
  PPV on quartz (annealed in normal atmosphere)       Toluene and Chloroform    58 ± 5           2.567 ± 0.05
  PPV on quartz (after second annealing in vacuum)    Toluene and Chloroform    44 ± 5           2.578 ± 0.05

Dektak measurements – We measured and compared the morphology, thickness and aspect of the films before and after treatment. For PPV films before annealing we obtained a thickness of 87 nm, which is shown in Figure 3; after annealing in vacuum we obtained 45 nm (Figure 4). Figures 5 and 6 show the aspect of the layers before and after annealing [3].

Figure 3. PPV film thickness before annealing (profile height vs. distance in µm; step of 87 nm)
Figure 4. PPV film thickness after annealing in vacuum for 2 h at 200 °C (profile height vs. distance in µm)
Figure 5. PPV film aspect before annealing
Figure 6. PPV film aspect after annealing

Atomic Force Microscopy – Measurements were carried out with an instrument from Park Scientific Instruments (Sunnyvale, CA, USA) in non-contact mode in air at room temperature. All AFM images represent unfiltered original data and are displayed in color scale in Figures 7, 8 and 9 [4].

Figure 7. AFM image of PPV films after annealing in vacuum at 200 °C for 2 h
Figure 8. AFM image of PPV films after second annealing in vacuum at 200 °C for 2 h
Figure 9. AFM image of PPV films after annealing in normal atmosphere at 200 °C for 2 h

Figures 7 and 8 show images of PPV films after the first and second annealing in vacuum at 200 °C for 2 hours. There are few changes between the films, and the thicknesses were almost the same (45 nm and 44 nm, respectively) [5]. Figure 9 shows the surface structure of PPV films after annealing in normal atmosphere at 200 °C for 2 hours; the film structure is different from that of the films annealed in vacuum.
For all images, the films are continuous and smooth, with a root-mean-square (r.m.s.) roughness of 2–3 nm for the films annealed in vacuum and 5–7 nm for the films annealed in normal atmosphere [6]. We obtain amorphous films in both cases. The main observations from the Dektak measurements, AFM investigations and ellipsometry are as follows. The surface roughness of the films depends on the heating rate: slow heating increases the roughness, while rapid heating leads to smoother films. The same behaviour is found for vacuum annealing and for annealing in normal atmosphere. Moreover, the thickness of the layers is reduced to about half after annealing in both cases. The PPV layers are not oriented with either annealing method [6].

UV/VIS measurements – These were made with a Perkin Elmer Lambda 16 UV/VIS spectrometer. Spectra were acquired from 300 to 900 nm. Figures 10, 11 and 12 show sets of absorption spectra of spin-coated PPV films converted by heating under vacuum and in normal atmosphere at 200 °C for 2 hours [7]. Each spectrum was normalized by dividing it by its absorption at the maximum, so that relative changes within a spectrum and between spectra are easily observed. The positions of the absorption maxima differ for PPV films prepared with different annealing methods. The changes in the optical spectra of PPV films converted from the precursor under vacuum and in normal atmosphere (Figures 10, 11 and 12) are consistent with earlier observations [8].

4. Conclusions

We summarize our findings as follows:
(i) Annealing of PPV films causes ordering of the polymer chains and, as a result, a change in the luminescence intensity and spectra.
(ii) The spectral characteristics of the converted PPV precursor depend strongly on the preparation conditions of the precursor.
(iii) The thickness of the layers is reduced to about half after annealing.
(iv) The surface roughness of the films depends on the heating rate: slow heating increases the roughness, while rapid heating leads to smoother films.
(v) PPV is thermally stable up to more than 500 °C, as measured by TGA.
(vi) Spin coating yields amorphous films.

5. Acknowledgements

Financial support of the European Commission under contract number FP6-505478-1 (ODEON project) and of the RTN EUROFET project is gratefully acknowledged.

Figure 10. UV-VIS spectra of PPV films obtained by vacuum conversion and by normal-atmosphere conversion (curves: before annealing, after annealing in vacuum, after annealing in normal atmosphere; horizontal axis: wavelength, nm).
Figure 11. UV-VIS spectra of PPV films after repeated vacuum conversion (curves: before annealing, after 1st annealing in vacuum, after 2nd annealing in vacuum).
Figure 12. Comparison of UV-VIS spectra of PPV films before and after vacuum conversion (curves: before annealing, after annealing; horizontal axis: wavelength, nm).

6. References

[1] L. Bakueva, E.H. Sargent, R. Resendes, A. Bartole, I. Manners, J. Mater. Sci.: Mater. Electron. 12 (2001) 21.
[2] M. Pope, C.E. Swenberg, Electronic Processes in Organic Crystals and Polymers, Oxford Science Publications, Oxford, 1999.
[3] L. Bakueva, S. Musikhin, E.H. Sargent, A. Shik, MRS Fall Meeting, Boston, November 26–30, 2001, Book of Abstracts.
[4] D. Moses, A. Dogariu, A.J. Heeger, Synth. Met. 116 (2001) 19.
[5] B. Hu, F.E. Karasz, Chem. Phys. 227 (1998) 263.
[6] X.-R. Zeng, T.-M. Ko, J. Polym. Sci. B 35 (1997) 1993.
[7] C.E. Lee, C.-H. Jin, Synth. Met. 117 (2001) 27.
[8] D.F.S. Petri, J. Braz. Chem. Soc. 13 (5) (2002) 695-699.
ABSTRACT Thermally stable polymers have attracted considerable interest due to their potential use as the active component in electronic, optical and optoelectronic applications, such as light-emitting diodes, light-emitting electrochemical cells, photodiodes, photovoltaic cells, field-effect transistors, optocouplers and optically pumped lasers in solution and solid state. We report results of investigations into the thermal treatment of poly(p-phenylene vinylene) (PPV) films grown on quartz and glass substrates. Film thickness, morphology and structural properties were investigated by a range of techniques, in particular atomic force microscopy (AFM), the Dektak method, ellipsometry and UV-VIS spectroscopy. <|endoftext|><|startoftext|> arXiv:0704.0573v1 [quant-ph] 4 Apr 2007

Relativistic treatment in D-dimensions to a spin-zero particle with noncentral equal scalar and vector ring-shaped Kratzer potential

Sameer M. Ikhdair∗ and Ramazan Sever†
∗Department of Physics, Near East University, Nicosia, North Cyprus, Mersin 10, Turkey
†Department of Physics, Middle East Technical University, 06531 Ankara, Turkey.
(November 4, 2018)

Abstract
The Klein-Gordon equation in D-dimensions for a recently proposed Kratzer potential plus ring-shaped potential is solved analytically by means of the conventional Nikiforov-Uvarov method. The exact bound-state energies and the corresponding wave functions of the Klein-Gordon equation are obtained in the presence of the noncentral equal scalar and vector potentials. The results obtained in this work are more general and can be reduced to the standard forms in three dimensions given by other works.

Keywords: Energy eigenvalues and eigenfunctions, Klein-Gordon equation, Kratzer potential, ring-shaped potential, non-central potentials, Nikiforov and Uvarov method.
PACS numbers: 03.65.-w; 03.65.Fd; 03.65.Ge.
∗sikhdair@neu.edu.tr †sever@metu.edu.tr
http://arxiv.org/abs/0704.0573v1

I. INTRODUCTION

In various physical applications, including those in nuclear physics and high-energy physics [1,2], one of the interesting problems is to obtain exact solutions of relativistic equations such as the Klein-Gordon and Dirac equations for mixed vector and scalar potentials. The Klein-Gordon and Dirac wave equations are frequently used to describe particle dynamics in relativistic quantum mechanics. The Klein-Gordon equation has also been used to understand the motion of a spin-0 particle in a large class of potentials. In recent years, much effort has been devoted to solving these relativistic wave equations for various potentials by different methods. These relativistic equations contain two objects: the four-vector linear momentum operator and the scalar rest mass. They allow us to introduce two types of potential coupling: the four-vector potential (V) and the space-time scalar potential (S).

Recently, many authors have worked on solving these equations with physical potentials, including the Morse potential [3], Hulthén potential [4], Woods-Saxon potential [5], Pöschl-Teller potential [6], reflectionless-type potential [7], pseudoharmonic oscillator [8], ring-shaped harmonic oscillator [9], $V_0\tanh^2(r/r_0)$ potential [10], five-parameter exponential potential [11], Rosen-Morse potential [12] and generalized symmetrical double-well potential [13]. It is remarkable that in most works in this area the scalar and vector potentials are taken to be equal (i.e., S = V) [2,14]. In a few other cases, the scalar potential is taken to be greater than the vector potential (i.e., S > V) in order to guarantee the existence of Klein-Gordon bound states [15-19]. Nonetheless, such physical potentials are very few. Bound-state solutions for this last case have been obtained for the exponential potential in the s-wave Klein-Gordon equation [15].
The study of exact solutions of the nonrelativistic equation for a class of non-central potentials with a vector potential and a non-central scalar potential is of considerable interest in quantum chemistry [20-22]. In recent years, numerous studies [23] have analyzed the bound states of an electron in a Coulomb field in the simultaneous presence of an Aharonov-Bohm (AB) field [24] and/or a magnetic Dirac monopole [25], as well as Aharonov-Bohm plus oscillator (ABO) systems. In most of these works, the eigenvalues and eigenfunctions are obtained by separation of variables in spherical or other orthogonal curvilinear coordinate systems. The path integral for particles moving in non-central potentials has been evaluated to derive the energy spectrum of such systems analytically [26]. In addition, the idea of SUSY and shape invariance has been used to obtain exact solutions of such non-central but separable potentials [27,28]. Very recently, the conventional Nikiforov-Uvarov (NU) method [29] has been used to give a clear recipe for obtaining explicit exact bound-state solutions, namely the energy eigenvalues and the corresponding wave functions in terms of orthogonal polynomials, for a class of non-central potentials [30].

Another type of noncentral potential is the ring-shaped Kratzer potential, which combines a Coulomb potential, an inverse-square potential and a noncentral angular part [31,32]. The Kratzer potential has been used to describe the vibrational-rotational motion of isolated diatomic molecules [33]. It has a mixed energy spectrum containing both bound and scattering states, and its bound states have been widely used in molecular spectroscopy [34]. The ring-shaped Kratzer potential consists of radial and angle-dependent potentials and is useful in studying ring-shaped molecules [22].
To take relativistic effects into account for a spin-0 particle in the presence of a class of noncentral potentials, Yasuk et al [35] applied the NU method to solve the Klein-Gordon equation for the noncentral Coulombic ring-shaped potential [21] for the case V = S. Further, Berkdemir [36] used the same method to solve the Klein-Gordon equation for the Kratzer-type potential.

Recently, Chen and Dong [37] proposed a new ring-shaped potential and obtained the exact solution of the Schrödinger equation for the Coulomb potential plus this new ring-shaped potential, which has possible applications to ring-shaped organic molecules such as cyclic polyenes and benzene. The potential used by Chen and Dong [37] appears to be very similar to the potential used by Yasuk et al [35]. Moreover, Cheng and Dai [38] proposed a new potential consisting of the modified Kratzer potential [33] plus the ring-shaped potential proposed in [37]. They presented the energy eigenvalues for this exactly solvable non-central potential in the three-dimensional (i.e., D = 3) Schrödinger equation by means of the NU method. The two quantum systems solved in Refs [37,38] are closely related, since both deal with a Coulombic field interaction; they differ only by a slight perturbation of the original angular momentum barrier. This barrier acts as a repulsive core that, for arbitrary angular momentum ℓ, prevents collapse of the system in any space dimension. Very recently, we have also applied the NU method to solve the Schrödinger equation in arbitrary dimension D for this new modified Kratzer-type potential [39,40].

The aim of the present paper is to consider the relativistic effects for the spin-0 particle in our recent works [39,40].
We therefore present a systematic recipe for solving the D-dimensional Klein-Gordon equation for the Kratzer plus new ring-shaped potential proposed in [38] using the simple NU method. This method is based on solving the Klein-Gordon equation by reducing it to a generalized hypergeometric equation.

This work is organized as follows. In section II, we present the Klein-Gordon equation in spherical coordinates for a spin-0 particle in the presence of equal scalar and vector noncentral Kratzer plus new ring-shaped potentials, and we separate it into radial and angular parts. Section III is devoted to a brief description of the NU method. In section IV, we present the exact solutions of the radial and angular parts of the Klein-Gordon equation in D-dimensions. Finally, the relevant conclusions are given in section V.

II. THE KLEIN-GORDON EQUATION WITH EQUAL SCALAR AND VECTOR POTENTIALS

In relativistic quantum mechanics, the Klein-Gordon equation is usually used to describe the dynamics of a scalar (spin-0) particle. The discussion of the relativistic behavior of spin-zero particles requires understanding the single-particle spectrum and the exact solutions of the Klein-Gordon equation, which are constructed by using the four-vector potential $A_\lambda$ ($\lambda = 0, 1, 2, 3$) and the scalar potential (S). In order to simplify the solution of the Klein-Gordon equation, the four-vector potential can be written as $A_\lambda = (A_0, 0, 0, 0)$. The first component of the four-vector potential is represented by a vector potential (V), i.e., $A_0 = V$. In this case, the motion of a relativistic spin-0 particle in a potential is described by the Klein-Gordon equation with the potentials V and S [1]. For the case S ≥ V, there exist bound-state (real) solutions for a relativistic spin-zero particle [15-19].
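On the other hand, for S = V the Klein-Gordon equation reduces to a Schrödinger-like equation, so the bound-state solutions are easily obtained with the well-known methods developed in nonrelativistic quantum mechanics [2].

The Klein-Gordon equation describing a scalar (spin-0) particle with scalar $S(r,\theta,\varphi)$ and vector $V(r,\theta,\varphi)$ potentials is given by [2,14]
$$\left\{\mathbf{P}^2 - \left[E_R - \tfrac{1}{2}V(r,\theta,\varphi)\right]^2 + \left[\mu + \tfrac{1}{2}S(r,\theta,\varphi)\right]^2\right\}\psi(r,\theta,\varphi) = 0, \tag{1}$$
where $E_R$, $\mathbf{P}$ and $\mu$ are the relativistic energy, momentum operator and rest mass of the particle, respectively. The potential terms in (1) are scaled following Alhaidari et al [14] so that in the nonrelativistic limit the interaction potential becomes V. In this work, we consider the case of equal scalar and vector potentials, $S(r,\theta,\varphi) = V(r,\theta,\varphi)$, for which the $V^2/4$ terms in (1) cancel and the operator becomes $\mathbf{P}^2 + (E_R+\mu)V - (E_R^2-\mu^2)$. The potential is the recently proposed general non-central potential, taken in the form of the Kratzer plus ring-shaped potential [38-40]:
$$V(r,\theta,\varphi) = V_1(r) + \frac{V_2(\theta)}{r^2} + \frac{V_3(\varphi)}{r^2\sin^2\theta}, \tag{2}$$
$$V_1(r) = -\frac{A}{r} + \frac{B}{r^2}, \qquad V_2(\theta) = C\cot^2\theta, \qquad V_3(\varphi) = 0, \tag{3}$$
where $A = 2a_0r_0$, $B = a_0r_0^2$ and C is a positive real constant, with $a_0$ the dissociation energy and $r_0$ the equilibrium internuclear distance [33]. The potentials in Eq. (3), introduced by Cheng and Dai [38], reduce to the Kratzer potential in the limiting case C = 0 [33]. In fact, the energy spectrum for this potential can be obtained directly by considering it as a special case of the general non-central separable potentials [30].

In relativistic atomic units ($\hbar = c = 1$), the D-dimensional Klein-Gordon equation in (1) becomes [41]
$$\left\{\frac{1}{r^{D-1}}\frac{\partial}{\partial r}\left(r^{D-1}\frac{\partial}{\partial r}\right) + \frac{1}{r^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right] - (E_R+\mu)\left[V_1(r) + \frac{V_2(\theta)}{r^2} + \frac{V_3(\varphi)}{r^2\sin^2\theta}\right] + \left(E_R^2-\mu^2\right)\right\}\psi(r,\theta,\varphi) = 0, \tag{4}$$
with $\psi(r,\theta,\varphi)$ being the total wave function in spherical coordinates, separated as follows:
$$\psi_{njm}(r,\theta,\varphi) = R(r)\,Y_j^m(\theta,\varphi), \qquad R(r) = r^{-(D-1)/2}g(r), \qquad Y_j^m(\theta,\varphi) = H(\theta)\Phi(\varphi). \tag{5}$$
Inserting Eqs (3) and (5) into Eq. (4) and using the method of separation of variables, the following differential equations are obtained:
$$\frac{1}{r^{D-1}}\frac{d}{dr}\left(r^{D-1}\frac{dR(r)}{dr}\right) - \left[\frac{j(j+D-2)}{r^2} + \alpha_2^2\left(\alpha_1^2 - \frac{A}{r} + \frac{B}{r^2}\right)\right]R(r) = 0, \tag{6}$$
$$\left[\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d}{d\theta}\right) - \frac{m^2 + C\alpha_2^2\cos^2\theta}{\sin^2\theta} + j(j+D-2)\right]H(\theta) = 0, \tag{7}$$
$$\frac{d^2\Phi(\varphi)}{d\varphi^2} + m^2\Phi(\varphi) = 0, \tag{8}$$
where $\alpha_1^2 = \mu - E_R$ and $\alpha_2^2 = \mu + E_R$, m and j are constants, and $m^2$ and $\lambda_j = j(j+D-2)$ are the separation constants.

For a nonrelativistic treatment with the same potential, the Schrödinger equation in spherical coordinates is
$$\left\{\frac{1}{r^{D-1}}\frac{\partial}{\partial r}\left(r^{D-1}\frac{\partial}{\partial r}\right) + \frac{1}{r^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right] + 2\mu\left[E_{NR} - V_1(r) - \frac{V_2(\theta)}{r^2} - \frac{V_3(\varphi)}{r^2\sin^2\theta}\right]\right\}\psi(r,\theta,\varphi) = 0, \tag{9}$$
where $\mu$ and $E_{NR}$ are the reduced mass and the nonrelativistic energy, respectively. The total wave function appearing in Eq. (9) has the same representation as in Eq. (5), but with the replacement $j \to \ell$. Inserting Eq. (5) into Eq. (9) leads to the following differential equations [39,40]:
$$\frac{1}{r^{D-1}}\frac{d}{dr}\left(r^{D-1}\frac{dR(r)}{dr}\right) + \left[2\mu\left(E_{NR} + \frac{A}{r} - \frac{B}{r^2}\right) - \frac{\ell(\ell+D-2)}{r^2}\right]R(r) = 0, \tag{10}$$
$$\left[\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d}{d\theta}\right) - \frac{m^2 + 2\mu C\cos^2\theta}{\sin^2\theta} + \ell(\ell+D-2)\right]H(\theta) = 0, \tag{11}$$
$$\frac{d^2\Phi(\varphi)}{d\varphi^2} + m^2\Phi(\varphi) = 0, \tag{12}$$
where $m^2$ and $\lambda_\ell = \ell(\ell+D-2)$ are the separation constants. Equations (6)-(8) have the same functional form as Eqs (10)-(12). Therefore, the solution of the Klein-Gordon equation can be reduced to the solution of the Schrödinger equation with the appropriate choice of parameters: $j \to \ell$, $\alpha_1^2 \to -E_{NR}$ and $\alpha_2^2 \to 2\mu$.

The solution of Eq. (8) is well known. It must satisfy the periodic boundary condition $\Phi(\varphi + 2\pi) = \Phi(\varphi)$, which gives the azimuthal-angle solution
$$\Phi_m(\varphi) = \exp(\pm im\varphi), \qquad m = 0, 1, 2, \ldots \tag{13}$$
Eqs (6) and (7) are the radial and polar-angle equations; they will be solved by using the Nikiforov-Uvarov (NU) method [29], which is described briefly in the following section.

III. NIKIFOROV-UVAROV METHOD

The NU method is based on reducing the second-order differential equation to a generalized equation of hypergeometric type [29].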
In this sense, the Schrödinger equation, after an appropriate coordinate transformation $s = s(r)$, transforms to the following form:
$$\psi_n''(s) + \frac{\tilde{\tau}(s)}{\sigma(s)}\psi_n'(s) + \frac{\tilde{\sigma}(s)}{\sigma^2(s)}\psi_n(s) = 0, \tag{14}$$
where $\sigma(s)$ and $\tilde{\sigma}(s)$ are polynomials of at most second degree, and $\tilde{\tau}(s)$ is a polynomial of at most first degree. Using a wave function $\psi_n(s)$ of the simple form
$$\psi_n(s) = \phi_n(s)\,y_n(s) \tag{15}$$
reduces (14) to an equation of hypergeometric type,
$$\sigma(s)y_n''(s) + \tau(s)y_n'(s) + \lambda y_n(s) = 0, \tag{16}$$
where
$$\sigma(s) = \pi(s)\frac{\phi(s)}{\phi'(s)}, \tag{17}$$
$$\tau(s) = \tilde{\tau}(s) + 2\pi(s), \qquad \tau'(s) < 0, \tag{18}$$
and $\lambda$ is a parameter defined as
$$\lambda = \lambda_n = -n\tau'(s) - \frac{n(n-1)}{2}\sigma''(s), \qquad n = 0, 1, 2, \ldots \tag{19}$$
Here primes denote differentiation with respect to s, and the derivative of the polynomial $\tau(s)$ must be negative. It is worthwhile to note that $\lambda$ or $\lambda_n$ is obtained from a particular solution of the form $y(s) = y_n(s)$, which is a polynomial of degree n. Further, the other part $y_n(s)$ of the wave function (15) is the hypergeometric-type function whose polynomial solutions are given by the Rodrigues relation
$$y_n(s) = \frac{B_n}{\rho(s)}\frac{d^n}{ds^n}\left[\sigma^n(s)\rho(s)\right], \tag{20}$$
where $B_n$ is the normalization constant and the weight function $\rho(s)$ must satisfy the condition [29]
$$\frac{d}{ds}w(s) = \frac{\tau(s)}{\sigma(s)}w(s), \qquad w(s) = \sigma(s)\rho(s). \tag{21}$$
The function $\pi(s)$ and the parameter $\lambda$ are defined as
$$\pi(s) = \frac{\sigma'(s) - \tilde{\tau}(s)}{2} \pm \sqrt{\left(\frac{\sigma'(s) - \tilde{\tau}(s)}{2}\right)^2 - \tilde{\sigma}(s) + k\sigma(s)}, \tag{22}$$
$$\lambda = k + \pi'(s). \tag{23}$$
In principle, since $\pi(s)$ has to be a polynomial of degree at most one, the expression under the square root sign in (22) must be the square of a polynomial of first degree [29]. This is possible only if its discriminant is zero, which yields an equation for k. After solving this equation, the obtained values of k are substituted in (22). In addition, by comparing equations (19) and (23), we obtain the energy eigenvalues.

IV. EXACT SOLUTIONS OF THE RADIAL AND ANGLE-DEPENDENT EQUATIONS

A. Separating variables of the Klein-Gordon equation

We seek to solve the radial and angular parts of the Klein-Gordon equation given by Eqs (6) and (7).
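The discriminant condition of the NU recipe in Section III can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: the function names `nu_k_values` and `nu_lambda_n`, the coefficient-tuple representation of the polynomials, and the numerical input are all assumptions made for the example. It forms the radicand of Eq. (22), $\left[(\sigma' - \tilde{\tau})/2\right]^2 - \tilde{\sigma} + k\sigma$, and solves for the values of k at which its discriminant in s vanishes.

```python
import math


def nu_k_values(sigma, sigma_tilde, tau_tilde):
    """Return the k values for which the radicand of Eq. (22) is a perfect square.

    sigma, sigma_tilde: coefficient tuples (c0, c1, c2) for c0 + c1*s + c2*s**2.
    tau_tilde: coefficient tuple (t0, t1) for t0 + t1*s.
    """
    s0, s1, s2 = sigma
    # h(s) = (sigma'(s) - tau_tilde(s)) / 2 = h0 + h1*s
    h0 = (s1 - tau_tilde[0]) / 2.0
    h1 = (2.0 * s2 - tau_tilde[1]) / 2.0
    # Radicand without the k term: h(s)**2 - sigma_tilde(s) = c + b*s + a*s**2
    c = h0 * h0 - sigma_tilde[0]
    b = 2.0 * h0 * h1 - sigma_tilde[1]
    a = h1 * h1 - sigma_tilde[2]
    # Adding k*sigma(s) and requiring zero discriminant in s gives
    # (b + k*s1)**2 - 4*(a + k*s2)*(c + k*s0) = 0, i.e. qa*k**2 + qb*k + qc = 0.
    qa = s1 * s1 - 4.0 * s2 * s0
    qb = 2.0 * b * s1 - 4.0 * (a * s0 + c * s2)
    qc = b * b - 4.0 * a * c
    if qa == 0.0:
        if qb == 0.0:
            raise ValueError("discriminant condition does not determine k")
        return [-qc / qb]
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        raise ValueError("no real k satisfies the discriminant condition")
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)])


def nu_lambda_n(n, tau_prime, sigma_second):
    """Eq. (19): lambda_n = -n*tau' - n*(n-1)/2 * sigma''."""
    return -n * tau_prime - 0.5 * n * (n - 1) * sigma_second


# Illustrative input (not taken from the paper): sigma = s, tau_tilde = 0,
# sigma_tilde = -0.25*s**2 + 2*s - 0.75.
k_values = nu_k_values((0.0, 1.0, 0.0), (-0.75, 2.0, -0.25), (0.0, 0.0))
print(k_values)
```

Each admissible k is then inserted into Eq. (22) to obtain $\pi(s)$, and the branch with $\tau'(s) < 0$ is kept; equating `nu_lambda_n` with $k + \pi'(s)$ from Eq. (23) gives the quantization condition.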
Equation (6) for the radial part can be written in the following simple form [39-41]:
$$\frac{d^2g(r)}{dr^2} - \left[\frac{(M-1)(M-3)}{4r^2} - \alpha_2^2\left(\frac{A}{r} - \frac{B}{r^2}\right) + \alpha_1^2\alpha_2^2\right]g(r) = 0, \tag{24}$$
where
$$M = D + 2j. \tag{25}$$
On the other hand, Eq. (7) for the angular part of the Klein-Gordon equation takes the simple form
$$\frac{d^2H(\theta)}{d\theta^2} + \cot\theta\,\frac{dH(\theta)}{d\theta} - \left[\frac{m^2 + C\alpha_2^2\cos^2\theta}{\sin^2\theta} - j(j+D-2)\right]H(\theta) = 0. \tag{26}$$
Eqs (24) and (26) are solved with the NU method in the following subsections.

B. Eigenvalues and eigenfunctions of the angle-dependent equation

In order to apply the NU method [29,30,33,35,36,38-40,42-44], we use the transformation $s = \cos\theta$. The polar-angle part of the Klein-Gordon equation in (26) can then be written in the following universal associated-Legendre differential equation form [38-40]:
$$\frac{d^2H(s)}{ds^2} - \frac{2s}{1-s^2}\frac{dH(s)}{ds} + \frac{1}{(1-s^2)^2}\left[j(j+D-2)(1-s^2) - m^2 - C\alpha_2^2s^2\right]H(s) = 0. \tag{27}$$
Equation (27) has already been solved for the three-dimensional Schrödinger equation through the NU method in [38]. The aim of this subsection is to solve it with the different parameters resulting from the D-dimensional Klein-Gordon equation. Comparing Eq. (27) with (14), the following identifications are obtained:
$$\tilde{\tau}(s) = -2s, \qquad \sigma(s) = 1 - s^2, \qquad \tilde{\sigma}(s) = -m'^2 + (1-s^2)\nu', \tag{28}$$
where
$$\nu' = j'(j'+D-2) = j(j+D-2) + C\alpha_2^2, \qquad m'^2 = m^2 + C\alpha_2^2. \tag{29}$$
Inserting the above expressions into equation (22), one obtains the following function:
$$\pi(s) = \pm\sqrt{(\nu'-k)s^2 + k - \nu' + m'^2}. \tag{30}$$
Following the method, the polynomial $\pi(s)$ takes the following possible values:
$$\pi(s) = \begin{cases} m's & \text{for } k_1 = \nu' - m'^2,\\ -m's & \text{for } k_1 = \nu' - m'^2,\\ m' & \text{for } k_2 = \nu',\\ -m' & \text{for } k_2 = \nu'. \end{cases} \tag{31}$$
Imposing the condition $\tau'(s) < 0$ of equation (18), one selects
$$k_1 = \nu' - m'^2 \quad \text{and} \quad \pi(s) = -m's, \tag{32}$$
which yields
$$\tau(s) = -2(1+m')s. \tag{33}$$
Using equations (19) and (23), the following expressions for $\lambda$ are obtained, respectively:
$$\lambda = \lambda_{\tilde{n}} = 2\tilde{n}(1+m') + \tilde{n}(\tilde{n}-1), \tag{34}$$
$$\lambda = \nu' - m'(1+m'). \tag{35}$$
Comparing equations (34) and (35) gives $\nu' = (\tilde{n}+m')(\tilde{n}+m'+1)$, so the new angular momentum values j and j' are obtained as
$$j = -\frac{D-2}{2} + \frac{1}{2}\sqrt{(D-2)^2 + (2\tilde{n}+2m'+1)^2 - 4C\alpha_2^2 - 1}, \tag{36}$$
$$j' = -\frac{D-2}{2} + \frac{1}{2}\sqrt{(D-2)^2 + (2\tilde{n}+2m'+1)^2 - 1}. \tag{37}$$
Using Eqs (15)-(17) and (20)-(21), the polynomial solution $y_n$ is expressed in terms of Jacobi polynomials [39,40], which are one family of the orthogonal polynomials:
$$H_{\tilde{n}}(\theta) = N_{\tilde{n}}\sin^{m'}(\theta)\,P_{\tilde{n}}^{(m',m')}(\cos\theta), \tag{38}$$
where
$$N_{\tilde{n}} = \frac{1}{2^{m'}(\tilde{n}+m')!}\sqrt{\frac{(2\tilde{n}+2m'+1)(\tilde{n}+2m')!\,\tilde{n}!}{2}}$$
is the normalization constant determined by $\int_{-1}^{+1}\left[H_{\tilde{n}}(s)\right]^2ds = 1$ and the orthogonality relation of Jacobi polynomials [35,45,46]. Besides,
$$\tilde{n} = -\frac{1+2m'}{2} + \frac{1}{2}\sqrt{(2j+1)^2 + 4j(D-3) + 4C\alpha_2^2}, \tag{39}$$
where m' is defined by equation (29).

C. Eigensolutions of the radial equation

The radial part of the Klein-Gordon equation, Eq. (24), for the Kratzer potential has already been solved by means of the NU method in [39]. Very recently, using the same method, the problem for the non-central potential in (2) has been solved in three dimensions (3D) by Cheng and Dai [38]. The aim of this subsection is to solve the problem with a different radial separation function g(r) in arbitrary dimensions. In what follows, we present the exact bound-state (real) solution of Eq. (24). Letting
$$\varepsilon^2 = \alpha_1^2\alpha_2^2, \qquad 4\gamma^2 = (M-1)(M-3) + 4B\alpha_2^2, \qquad \beta^2 = A\alpha_2^2, \tag{40}$$
and substituting these expressions in equation (24), one gets
$$\frac{d^2g(r)}{dr^2} + \left(\frac{-\varepsilon^2r^2 + \beta^2r - \gamma^2}{r^2}\right)g(r) = 0. \tag{41}$$
To apply the conventional NU method, equation (41) is compared with (14), resulting in the following expressions:
$$\tilde{\tau}(r) = 0, \qquad \sigma(r) = r, \qquad \tilde{\sigma}(r) = -\varepsilon^2r^2 + \beta^2r - \gamma^2. \tag{42}$$
Substituting the above expressions into equation (22) gives
$$\pi(r) = \frac{1}{2} \pm \frac{1}{2}\sqrt{4\varepsilon^2r^2 + 4(k-\beta^2)r + 4\gamma^2 + 1}. \tag{43}$$
Therefore, we can determine the constant k by using the condition that the discriminant of the expression under the square root is zero, that is,
$$k = \beta^2 \pm \varepsilon\sqrt{4\gamma^2+1}, \qquad 4\gamma^2 + 1 = (D+2j-2)^2 + 4B\alpha_2^2. \tag{44}$$
In view of that, we arrive at the following four possible functions $\pi(r)$:
$$\pi(r) = \begin{cases} \frac{1}{2} + \left[\varepsilon r + \frac{1}{2}\sqrt{4\gamma^2+1}\right] & \text{for } k_1 = \beta^2 + \varepsilon\sqrt{4\gamma^2+1},\\ \frac{1}{2} - \left[\varepsilon r + \frac{1}{2}\sqrt{4\gamma^2+1}\right] & \text{for } k_1 = \beta^2 + \varepsilon\sqrt{4\gamma^2+1},\\ \frac{1}{2} + \left[\varepsilon r - \frac{1}{2}\sqrt{4\gamma^2+1}\right] & \text{for } k_2 = \beta^2 - \varepsilon\sqrt{4\gamma^2+1},\\ \frac{1}{2} - \left[\varepsilon r - \frac{1}{2}\sqrt{4\gamma^2+1}\right] & \text{for } k_2 = \beta^2 - \varepsilon\sqrt{4\gamma^2+1}. \end{cases} \tag{45}$$
The correct value of $\pi(r)$ is chosen such that the function $\tau(r)$ given by Eq. (18) has a negative derivative [29]. So we select the physical values
$$k = \beta^2 - \varepsilon\sqrt{4\gamma^2+1} \quad \text{and} \quad \pi(r) = -\varepsilon r + \frac{1}{2}\left(1 + \sqrt{4\gamma^2+1}\right), \tag{46}$$
which yield
$$\tau(r) = -2\varepsilon r + \left(1 + \sqrt{4\gamma^2+1}\right), \qquad \tau'(r) = -2\varepsilon < 0. \tag{47}$$
Using Eqs (19) and (23), the following expressions for $\lambda$ are obtained, respectively:
$$\lambda = \lambda_n = 2n\varepsilon, \qquad n = 0, 1, 2, \ldots, \tag{48}$$
$$\lambda = \beta^2 - \varepsilon\left(1 + \sqrt{4\gamma^2+1}\right). \tag{49}$$
Equating (48) and (49) and using $\varepsilon = \sqrt{\mu - E_R}\sqrt{\mu + E_R}$, we obtain the Klein-Gordon energy eigenvalues from the following relation:
$$\left[1 + 2n + \sqrt{(D+2j-2)^2 + 4(\mu+E_R)B}\right]\sqrt{\mu - E_R} = A\sqrt{\mu + E_R}, \tag{50}$$
and hence, for the Kratzer plus new ring-shaped potential, it becomes
$$\left[1 + 2n + \sqrt{(D+2j-2)^2 + 4a_0r_0^2(\mu+E_R)}\right]\sqrt{\mu - E_R} = 2a_0r_0\sqrt{\mu + E_R}, \tag{51}$$
with j defined in (36). Although Eq. (51) is exactly solvable for $E_R$, the result looks somewhat complicated. It is therefore interesting to investigate the solution for the Coulomb potential. Applying the transformations $A = Ze^2$, $B = 0$ and $j = \ell$, the central part of the potential in (3) turns into the Coulomb potential, with the Klein-Gordon energy spectrum given by
$$E_R = \mu\left[1 - \frac{2q^2e^2}{q^2e^2 + (2n+2\ell+D-1)^2}\right], \qquad n, \ell = 0, 1, 2, \ldots, \tag{52}$$
where $q = Ze$ is the charge of the nucleus. Further, Eq. (52) can be expanded as a series in the nuclear charge as
$$E_R = \mu - \frac{2\mu q^2e^2}{(2n+2\ell+D-1)^2} + \frac{2\mu q^4e^4}{(2n+2\ell+D-1)^4} - O\left((qe)^6\right). \tag{53}$$
The physical meaning of each term in the last equation was given in detail in Ref. [36]. The difference from the conventional nonrelativistic form arises from the choice of the vector $V(r,\theta,\varphi)$ and scalar $S(r,\theta,\varphi)$ parts of the potential in Eq. (1).
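The energy condition in Eq. (50) is implicit in $E_R$, so a quick numerical check can be useful. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it solves Eq. (50) by bisection in units $\hbar = c = 1$ and compares the Coulomb limit (B = 0, j = ℓ) with the closed form of Eq. (52). The function names `kg_energy` and `coulomb_energy` and the parameter values (A = 0.5, μ = 1) are hypothetical choices for the example.

```python
import math


def kg_energy(n, j, D, A, B, mu=1.0, iterations=200):
    """Solve Eq. (50) for E_R in (-mu, mu) by bisection (units hbar = c = 1)."""
    if A <= 0.0 or B < 0.0:
        raise ValueError("this sketch assumes A > 0 and B >= 0")

    def residual(E):
        # Left-hand side minus right-hand side of Eq. (50)
        root = math.sqrt((D + 2.0 * j - 2.0) ** 2 + 4.0 * (mu + E) * B)
        return (1.0 + 2.0 * n + root) * math.sqrt(mu - E) - A * math.sqrt(mu + E)

    # residual(-mu) > 0 and residual(mu) < 0, so a root is bracketed
    lo, hi = -mu, mu
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coulomb_energy(n, ell, D, A, mu=1.0):
    """Closed form of Eq. (52), with A playing the role of q*e."""
    N = 2 * n + 2 * ell + D - 1
    return mu * (1.0 - 2.0 * A * A / (A * A + N * N))


# Illustrative parameters (not from the paper): ground state in D = 3
E_numeric = kg_energy(0, 0, 3, 0.5, 0.0)
E_closed = coulomb_energy(0, 0, 3, 0.5)
print(E_numeric, E_closed)
```

For B > 0 the same routine returns a root of Eq. (50) without a closed form to compare against; bisection is used here only because the sign change at the ends of the interval $(-\mu, \mu)$ guarantees a bracketed root.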
Moreover, if the value of j obtained from Eq. (36) is inserted into the eigenvalue condition (51) for the radial part of the Klein-Gordon equation, we finally find the energy eigenvalues for a bound particle in the noncentral potential of Eq. (2) from
$$\left[1 + 2n + \sqrt{(2j'+D-2)^2 + 4(a_0r_0^2 - C)(\mu+E_R)}\right]\sqrt{\mu - E_R} = 2a_0r_0\sqrt{\mu + E_R}, \tag{54}$$
where $m' = \sqrt{m^2 + C(\mu+E_R)}$ and $\tilde{n}$ is given by Eq. (39). On the other hand, the solution of the Schrödinger equation, Eq. (9), for this potential has already been obtained with the same method in Ref. [39], and it has the Coulombic-like form
$$E_{NR} = -\frac{8\mu a_0^2r_0^2}{\left[2n + 1 + \sqrt{(2\ell'+D-2)^2 + 8\mu(a_0r_0^2 - C)}\right]^2}, \qquad n = 0, 1, 2, \ldots \tag{55}$$
$$2\ell' + D - 2 = \sqrt{(D-2)^2 + (2\tilde{n}+2m'+1)^2 - 1}, \tag{56}$$
where $m' = \sqrt{m^2 + 2\mu C}$. Applying the transformation $\mu + E_R \to 2\mu$, $\mu - E_R \to -E_{NR}$, $j \to \ell$ to Eq. (54) gives exactly the nonrelativistic limit in Eq. (55).

Let us now turn to the radial wave functions for this potential. Substituting the values of $\sigma(r)$, $\pi(r)$ and $\tau(r)$ from Eqs (42), (46) and (47) into Eqs (17) and (21), we find
$$\phi(r) = r^{(\zeta+1)/2}e^{-\varepsilon r}, \tag{57}$$
$$\rho(r) = r^{\zeta}e^{-2\varepsilon r}, \tag{58}$$
where $\zeta = \sqrt{4\gamma^2+1}$. Then from equation (20) we obtain
$$y_{nj}(r) = B_{nj}\,r^{-\zeta}e^{2\varepsilon r}\frac{d^n}{dr^n}\left(r^{n+\zeta}e^{-2\varepsilon r}\right), \tag{59}$$
and the wave function g(r) can be written in terms of the generalized Laguerre polynomials as
$$g(\rho) = C_{nj}\left(\frac{\rho}{2\varepsilon}\right)^{(1+\zeta)/2}e^{-\rho/2}L_n^{\zeta}(\rho), \tag{60}$$
where for the Kratzer potential we have
$$\zeta = \sqrt{(D+2j-2)^2 + 4a_0r_0^2(\mu+E_R)}, \qquad \rho = 2\varepsilon r. \tag{61}$$
Finally, the radial wave functions of the Klein-Gordon equation are obtained as
$$R(\rho) = C_{nj}\left(\frac{\rho}{2\varepsilon}\right)^{(\zeta+2-D)/2}e^{-\rho/2}L_n^{\zeta}(\rho), \tag{62}$$
where $C_{nj}$ is the normalization constant to be determined below. Using the normalization condition $\int_0^\infty R^2(r)\,r^{D-1}dr = 1$ and the orthogonality relation of the generalized Laguerre polynomials, $\int_0^\infty z^{\eta+1}e^{-z}\left[L_n^{\eta}(z)\right]^2dz = \frac{(2n+\eta+1)(n+\eta)!}{n!}$, we have
$$C_{nj} = \left(2\sqrt{\mu^2 - E_R^2}\right)^{1+\zeta/2}\sqrt{\frac{n!}{(2n+\zeta+1)(n+\zeta)!}}. \tag{63}$$
Finally, we may express the normalized total wave functions as
$$\psi(r,\theta,\varphi) = \frac{C_{nj}N_{\tilde{n}}}{\sqrt{2\pi}}\,r^{(\zeta+2-D)/2}\exp\left(-\sqrt{\mu^2 - E_R^2}\,r\right)L_n^{\zeta}\left(2\sqrt{\mu^2 - E_R^2}\,r\right)\sin^{m'}(\theta)\,P_{\tilde{n}}^{(m',m')}(\cos\theta)\exp(\pm im\varphi), \tag{64}$$
where $C_{nj}$ is given by Eq. (63), $N_{\tilde{n}}$ is the normalization constant defined after Eq. (38), $\zeta$ is defined in Eq. (61) and m' is given after Eq. (54).

V. CONCLUSIONS

The D-dimensional Klein-Gordon equation for a relativistic spin-0 particle has been solved for its exact bound states with equal scalar and vector ring-shaped Kratzer potentials through the conventional NU method. The analytical expressions for the total energy levels and eigenfunctions of this system reduce to their conventional three-dimensional form upon setting D = 3. Further, the noncentral potentials treated in [30] can be introduced as a perturbation to the Kratzer potential by adjusting the strength of the coupling constant C in terms of $a_0$, which is the coupling constant of the Kratzer potential. Additionally, the radial and polar-angle wave functions of the Klein-Gordon equation are found in terms of Laguerre and Jacobi polynomials, respectively. The method presented in this paper is general and worth extending to other interaction problems. It is simple and useful for solving other complicated systems analytically, without imposing restrictive conditions on the solution as is the case in some other models. We have seen that, for the nonrelativistic model, the exact energy spectra can be obtained either by solving the Schrödinger equation in (9) (cf. Ref. [39] or Eq. (55)) or by applying an appropriate transformation to the relativistic solution given by Eq. (54). Finally, we point out that the exact results obtained for this newly proposed form of the potential (2) may have interesting applications in the study of different quantum mechanical systems and in atomic and molecular physics.

ACKNOWLEDGMENTS

This research was partially supported by the Scientific and Technological Research Council of Turkey. S.M.
Ikhdair wishes to dedicate this work to his family for their love and assistance.
ABSTRACT The Klein-Gordon equation in D-dimensions for a recently proposed Kratzer potential plus ring-shaped potential is solved analytically by means of the conventional Nikiforov-Uvarov method. The exact energy bound-states and the corresponding wave functions of the Klein-Gordon are obtained in the presence of the noncentral equal scalar and vector potentials. The results obtained in this work are more general and can be reduced to the standard forms in three-dimensions given by other works. 
<|endoftext|><|startoftext|> Introduction Introducing red noise terms Overall formalism Modifying the detection statistic to account for red noise Multiple transits Impact on the noise budget and detection statistic Noise budget on transit time-scales Detection probability PS/N Additional considerations Turnoff mass Saturation mass Radial velocity follow up Applications PG05a's fiducial cluster Example galactic open clusters Conclusions ABSTRACT We present an extension of the formalism recently proposed by Pepper & Gaudi to evaluate the yield of transit surveys in homogeneous stellar systems, incorporating the impact of correlated noise on transit time-scales on the detectability of transits, and simultaneously incorporating the magnitude limits imposed by the need for radial velocity follow-up of transit candidates. New expressions are derived for the different contributions to the noise budget on transit time-scales and the least-squares detection statistic for box-shaped transits, and their behaviour as a function of stellar mass is re-examined. Correlated noise that is constant with apparent stellar magnitude implies a steep decrease in detection probability at the high mass end which, when considered jointly with the radial velocity requirements, can severely limit the potential of otherwise promising surveys in star clusters. However, we find that small-aperture, wide field surveys may detect hot Neptunes whose radial velocity signal can be measured with present-day instrumentation in very nearby (<100 pc) clusters. <|endoftext|><|startoftext|> Introduction In 1873, J. Bertrand [1] published a short but important paper in which he proved that there are only two central fields for which all radially bounded orbits are closed, namely the newtonian field and the isotropic harmonic oscillator field. Because of this additional degeneracy it is no wonder that the properties of those two fields have been under close scrutiny since Newton’s time. 
Newton addresses the isotropic harmonic oscillator in proposition X, Book I, and the inverse-square law in proposition XI [2]. Newton shows that both fields give rise to an elliptical orbit, with the difference that in the first case the force is directed towards the geometrical centre of the ellipse and in the second case the force is directed to one of the foci. (∗e-mail: filadelf@if.ufrj.br; †e-mail: vsoares@if.ufrj.br; ‡e-mail: tort@if.ufrj.br; http://arxiv.org/abs/0704.0575v1) Bertrand’s result, also known as Bertrand’s theorem, continues to fascinate old and new generations of physicists interested in classical mechanics, and unsurprisingly papers devoted to it continue to be produced and published. Bertrand’s proof is concise and elegant and, contrary to what one may be led to think by a number of perturbative demonstrations that can be found in the modern literature, textbooks and papers on the subject, it is fully non-perturbative. As examples of perturbative demonstrations the reader can consult references [3, 4, 5]. We can also find in the literature demonstrations that resemble the spirit of Bertrand’s original work, for example [6]. As far as the present authors are aware, all those demonstrations have a restrictive feature: they first limit the possible central fields with the property mentioned above to a finite number and finally show explicitly that among the surviving possibilities only two, the newtonian and the isotropic harmonic oscillator, are really possible. In his paper, Bertrand first proves, by taking into consideration the equal radii limit, that a central force f(r) acting on a point-like body and capable of generating radially bounded orbits must necessarily be of the form $f(r) = \kappa\, r^{(1/m^2) - 3}$, where r is the radial distance to the center of force, κ is a constant and m is a rational number. 
Next, making use of this particular form of the law of force and considering also an additional limiting condition, Bertrand finally shows that only for m = 1 and m = 1/2, which correspond to Newton’s gravitational law of force $f(r) = -\kappa/r^2$ and to the isotropic harmonic oscillator law of force $f(r) = -\kappa r$, respectively, we can have orbits with the properties stated in the theorem. Conversely, we can also prove that for these laws of force all bounded orbits are closed. Here we offer an alternative non-perturbative proof of Bertrand’s theorem that leads in a more concise way directly to the two allowed fields. 2 Bertrand’s theorem In a central field one can introduce a potential function V(r) through the property $\mathbf{f} = -\nabla V(r)$, (1) in such a way that the mechanical energy of a point-like body of mass µ, $E = \frac{1}{2}\mu v^2 + V(r)$, (2) is conserved. For radially bounded orbits there are two extreme radii, rmax and rmin, the so-called apsidal points ra, that are determined by the condition ṙa = 0, and between which the particle oscillates indefinitely. Moreover, the conservation of the angular momentum of the particle under the action of a central field obliges the motion to take place on a fixed plane and allows the introduction of the effective potential $U(r) = V(r) + \frac{\ell^2}{2\mu r^2}$, (3) where ℓ is the magnitude of the angular momentum, with the help of which it is possible to reduce this problem to an equivalent one-dimensional one. This procedure can be found in several textbooks at the undergraduate and graduate level, see for example [7]. In terms of the effective potential, radially bounded orbits are characterised by apsidal distances rmax and rmin that satisfy the condition E = U(ra). Evidently there is an intermediate point r0 where the effective potential has a minimum that satisfies $U'(r_0) = V'(r_0) - \frac{\ell^2}{\mu r_0^3} = 0$. (4) The angular displacement of the particle between two successive apsidal points, the apsidal angle ∆θa, is determined by $\Delta\theta_a = \int_{r_{\min}}^{r_{\max}} \frac{\ell}{r^2}\,\frac{dr}{\sqrt{2\mu\,[E - U(r)]}}$ . 
(5) By considering the effective potential U as the independent variable and by making use of the inverse function r(U), Tikochinsky [3] produced a very ingenious proof of Bertrand’s theorem. The inversion of equation (3), however, is not possible over the whole domain on which the radial coordinate r is defined because the function is not one-to-one on the real numbers. To circumvent this difficulty we define two one-to-one branches of the function U(r), namely, one to the left and the other to the right of the point r0. Then we introduce the inverse functions r1 = r1(U) and r2 = r2(U), defined to the left and to the right of the point r0, respectively, see Figure 1. We first express the angular displacement, equation (5), when the particle moves from the point of minimum radial distance rmin to the point r0, in terms of the variable U: $\Delta\theta_1 = \int_{r_{\min}}^{r_0} \frac{\ell}{r^2}\,\frac{dr}{\sqrt{2\mu\,[E - U]}} = \frac{\ell}{\sqrt{2\mu}} \int_{U_0}^{E} \frac{d}{dU}\!\left(\frac{1}{r_1}\right) \frac{dU}{\sqrt{E - U}}$. (6) [Figure 1: General form of the effective potential energy as a function of the radial distance r, with rmin and r0 marked.] By the same token we will also have $\Delta\theta_2 = \int_{r_0}^{r_{\max}} \frac{\ell}{r^2}\,\frac{dr}{\sqrt{2\mu\,[E - U]}} = -\frac{\ell}{\sqrt{2\mu}} \int_{U_0}^{E} \frac{d}{dU}\!\left(\frac{1}{r_2}\right) \frac{dU}{\sqrt{E - U}}$, (7) for the angular displacement from r0 to the point of maximum radial distance rmax. Upon adding up equations (6) and (7) we obtain the angular displacement between two successive apsidal points, $\Delta\theta_a = \frac{\ell}{\sqrt{2\mu}} \int_{U_0}^{E} \frac{F(U)}{\sqrt{E - U}}\, dU$, (8) where $F(U) = \frac{d}{dU}\!\left(\frac{1}{r_1} - \frac{1}{r_2}\right)$. (9) Equation (8) is Abel’s integral equation, the solution of which can be found, for example, in Landau’s well-known book on classical mechanics [8]. A beautiful and straightforward solution of this equation is the one by Oldham and Spanier [9]. Abel’s solution reads $\frac{1}{r_1(U)} - \frac{1}{r_2(U)} = \frac{\sqrt{2\mu}}{\pi\ell} \int_{U_0}^{U} \frac{\Delta\theta_a(E)}{\sqrt{U - E}}\, dE$, (10) where the explicit dependence of the apsidal angle on the energy was stressed. If all bounded orbits are closed then the apsidal angle ∆θa(E), for these orbits, cannot change when the energy changes in a continuous manner, otherwise the continuous changes would inevitably lead to open orbits. 
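The energy independence of the apsidal angle can be illustrated numerically. The following is a minimal sketch, not part of the original proof: it evaluates the integral of equation (5) for the two force laws named in the introduction, taking units with ℓ = µ = κ = 1. The function name `apsidal_angle`, the bisection search for the apsidal radii and the choice of Gauss-Chebyshev quadrature are assumptions made for this illustration only.

```python
import math

def apsidal_angle(V, r0, E, ell=1.0, mu=1.0, n=400):
    """Apsidal angle of equation (5) for a potential V at energy E.

    r0 is the radius of the minimum of the effective potential
    (supplied by the caller); bounded motion needs U(r0) < E.
    """
    def excess(r):
        # E - U(r), positive between the two apsidal radii
        return E - V(r) - ell**2 / (2.0 * mu * r**2)

    def root(inside, outside):
        # Bisection with excess(inside) > 0 >= excess(outside)
        for _ in range(200):
            mid = 0.5 * (inside + outside)
            if excess(mid) > 0.0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    # Bracket the apsidal radii on either side of r0
    lo = r0
    while excess(lo) > 0.0:
        lo *= 0.5
    hi = r0
    while excess(hi) > 0.0:
        hi *= 2.0
    r_min, r_max = root(r0, lo), root(r0, hi)

    # Gauss-Chebyshev quadrature absorbs the inverse-square-root
    # singularities of the integrand at r_min and r_max.
    c, h = 0.5 * (r_max + r_min), 0.5 * (r_max - r_min)
    total = 0.0
    for k in range(1, n + 1):
        r = c + h * math.cos((2 * k - 1) * math.pi / (2 * n))
        weight = math.sqrt((r_max - r) * (r - r_min))
        total += ell * weight / (r**2 * math.sqrt(2.0 * mu * excess(r)))
    return math.pi / n * total

# newtonian field V = -1/r (r0 = 1, U0 = -1/2) at two energies
print([apsidal_angle(lambda r: -1.0 / r, 1.0, E) for E in (-0.4, -0.1)])
# isotropic oscillator V = r^2/2 (r0 = 1, U0 = 1) at two energies
print([apsidal_angle(lambda r: 0.5 * r * r, 1.0, E) for E in (1.5, 4.0)])
```

Within each list the two values should agree to quadrature accuracy, whereas replacing V by another power law is expected to give an angle that drifts with E.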
Taking this fact into account let us determine the central potentials that produce the same apsidal angle for all radially bounded orbits. After integrating equation (10) with a constant apsidal angle we obtain $\frac{1}{r_1} - \frac{1}{r_2} = \frac{2\sqrt{2\mu}\,\Delta\theta_a}{\pi\ell}\sqrt{U - U_0}$. (11) Equation (11) was derived in Ref. [3], where a perturbative technique applied on a circular orbit leads to Bertrand’s result. The functions r1(U) and r2(U), being inverse functions of the function U(r), are not independent of each other and, combined as they are in equation (11), do not allow an efficient manipulation and hide the unique inverse we are looking for. At this point we perform an analytic continuation of the function U(r) such that we can consider its inverse function r = r(U). Therefore we write $\frac{1}{r} = \frac{\sqrt{2\mu}\,\Delta\theta_a}{\pi\ell}\sqrt{U - U_0} + \Phi(U, U_0)$, (12) where Φ(U, U0) is an analytic function of the complex variable U in an open neighborhood of U0 satisfying the condition Φ(U0, U0) = 1/r0, and whose analytic continuation cannot have poles but can have other branch points. Notice that it is not necessary to make use of the symbol ± before the square-root term of equation (12) because the square root has two branches. The positive sign corresponds to r < r0 and the negative one to r > r0. Taking equation (3) into equation (12) we obtain $\frac{1}{r} = \frac{\sqrt{2\mu}\,\Delta\theta_a}{\pi\ell}\,\frac{1}{r}\sqrt{r^2 V(r) + \frac{\ell^2}{2\mu} - U_0 r^2} + \Phi(U, U_0)$. (13) The left-hand side of the identity (13) represents a meromorphic function with a single pole at r = 0, and the right-hand side of this same identity contains several terms, but only one can spoil the analyticity of the complete function at some point not equal to r = 0, namely the term that depends on the square root, which generates a branch point at r = r0. To avoid this it is mandatory to undo the branching effect inherent to the square root. This is possible only if the radicand is the square of an analytic function with a zero at r = r0. 
In this way we identify two possibilities for the potential V(r), to wit $V(r) = -\kappa/r$, newtonian potential, (14) $V(r) = \frac{1}{2}\kappa r^2$, isotropic harmonic oscillator potential; (15) for which the apsidal angle is independent of the energy. We can calculate the corresponding constant apsidal angles for those two potentials as follows. For the newtonian potential the effective potential, equation (3), is given by $U = -\frac{\kappa}{r} + \frac{\ell^2}{2\mu r^2}$. (16) Solving equation (16) with respect to 1/r we obtain $\frac{1}{r} = \frac{\mu\kappa}{\ell^2} \pm \sqrt{\frac{\mu^2\kappa^2}{\ell^4} + \frac{2\mu U}{\ell^2}}$. (17) Making use of equation (4) with the effective potential given by equation (16) we find $r_0 = \ell^2/(\mu\kappa)$ and the corresponding minimum energy $U_0 = -\mu\kappa^2/(2\ell^2)$. Now we can recast equation (17) into the form $\frac{1}{r} = \frac{\sqrt{2\mu}}{\ell}\sqrt{U - U_0} + \frac{1}{r_0}$. (18) Comparing equation (12) with equation (18) we can finally determine the apsidal angle for the newtonian potential, which reads ∆θa = π. (19) The procedure employed with the newtonian potential can also be applied, with a little more effort, to the case of the isotropic harmonic oscillator. The effective potential is now given by $U = \frac{1}{2}\kappa r^2 + \frac{\ell^2}{2\mu r^2}$. (20) This equation is a quartic equation in 1/r, biquadratic more precisely, and its solution is given by $\frac{1}{r^2} = \frac{1}{r_0^2}\left[\frac{U}{U_0} \pm \sqrt{\frac{U^2}{U_0^2} - 1}\,\right]$. (21) Factoring the right-hand side of equation (21) we have $\frac{1}{r} = \frac{1}{\ell}\sqrt{\frac{\mu}{2}}\,\sqrt{U - U_0} + \frac{1}{\ell}\sqrt{\frac{\mu}{2}}\,\sqrt{U + U_0}$, (22) where now we have made use of the relations $r_0^2 = \ell/\sqrt{\mu\kappa}$ and $U_0 = \ell\sqrt{\kappa/\mu}$. Comparing equations (12) and (22) we obtain $\frac{\sqrt{2\mu}\,\Delta\theta_a}{\pi\ell} = \frac{1}{\ell}\sqrt{\frac{\mu}{2}}$, ∴ ∆θa = π/2. (23) We can see that for both potentials for which the apsidal angle is constant the orbits are closed. For the newtonian case the radius oscillates only once in a complete cycle and for the oscillator case the radius oscillates twice. 3 Final Remarks In this brief paper we derived Bertrand’s theorem in a non-perturbative way. We have shown that simple analytic function techniques, applied to the problem of finding the only central fields that allow an entire class of bounded, closed orbits with a minimum number of restrictions, lead in a concise, straightforward way directly to the two allowed fields. 
We believe that the derivation discussed here is a valid alternative non-perturbative proof of Bertrand’s theorem and can be presented at the undergraduate and graduate level or assigned as a problem for classroom discussion. References [1] Bertrand J 1873 C.R. Acad. Sci. Paris 77 849 [2] Newton I 1687 Philosophiae Naturalis Principia Mathematica (London: Royal Society). English translation by A Motte revised by F Cajori 1962 (University of California Press, Berkeley CA) [3] Tikochinsky Y 1988 Am. J. Phys. 56 1073 [4] Brown L S 1978 Am. J. Phys. 46 930 [5] Zarmi Y 2002 Am. J. Phys. 70 446 [6] Arnol’d V I 1976 Les Méthodes Mathématiques de la Mécanique Classique (Mir: Moscou) [7] Goldstein H, Poole C and Safko J 2002 Classical Mechanics 3rd edn (Reading: Addison-Wesley) [8] Landau L and Lifchitz E 1969 Mécanique 3e édition revue (Mir: Moscou) [9] Oldham K B and Spanier J 1974 The Fractional Calculus (London: Academic Press) Introduction Bertrand's theorem Final Remarks ABSTRACT We discuss an alternative non-perturbative proof of Bertrand's theorem that leads in a concise way directly to the two allowed fields: the newtonian and the isotropic harmonic oscillator central fields. <|endoftext|><|startoftext|> [1] [2] (Received 12 January 2007) Neutron-Capture Elements in the Double-Enhanced Star HE 1305-0007: a New s- and r-Process Paradigm∗ CUI Wen-Yuan1,2,3, CUI Dong-Nuan1, DU Yun-Shuang1, ZHANG Bo1,2 Department of Physics, Hebei Normal University, Shijiazhuang 050016 National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012 Graduate School of the Chinese Academy of Sciences, Beijing 100049 The star HE 1305-0007 is a metal-poor double-enhanced star with metallicity [Fe/H] = −2.0, which is just at the upper limit of the metallicity for the observed double-enhanced stars. Using a parametric model, we find that almost all s-elements were made in a single neutron exposure. 
This star should be a member of a post-common-envelope binary. After the s-process material has experienced only one neutron exposure in the nucleosynthesis region and is dredged-up to its envelope, the AGB evolution is terminated by the onset of common-envelope evolution. Based on the high radial-velocity of HE 1305-0007, we speculate that the star could be a runaway star from a binary system, in which the AIC event has occurred and produced the r-process elements. PACS numbers: 97.10.Cv,26.45.+h,97.10.Tk The discovery that several stars show enhancements of both r-process and s-process elements (s+r stars hereafter)[1,2] is puzzling, as they require pollution from both an AGB star and a supernova. In 2003, Qian and Wasserburg[3] proposed a theory, i.e. accretion-induced collapse(AIC), for the possible creation of s+r-process stars. Another possible s+r scenario is that the AGB star transfers s-rich matter to the observed star but not suffer from a large mass loss and at the end of the AGB phase, the degenerate core of low-metallicity, high-mass AGB star may reach the Chandresekhar mass, leading to type-1.5 supernova.[4] Because the initial-final-mass re- lation flats at higher metallicity,[4] the degenerate cores of high-metallicity AGB stars are smaller than those of the low-metallicity stars, the formation of AIC or SN1.5 is more difficult in the high-metallicity binary system, which can explain the upper limit of the metallicity ([Fe/H] < −2.0) for the observed r+s stars.[5] Recently, Barbuy et al.[6] and Wanajo et al.[7] suggested massive AGB stars (M = 8 ∼ 12M⊙) to be the origin of these double enhancements. Such a large mass AGB star could possibly provide the observed enhancement of s-process elements in the first phase, and explode or collapse pro- viding the r-process elements. 
However, the modeling of the evolution of such a large mass metal-poor star is a difficult task, an amount of the s-process material is pro- duced and its abundance distribution is still uncertain.[7] The generally favoured s-process model till now is as- sociated with the partial mixing of protons (PMP here- after) into the radiative C-rich layers during thermal pulses.[8−11] PMP activates the chain of reactions 12C(p, γ)13N(β)13C(α, n)16O, which likely occurs in a narrow mass region of the He intershell (i.e. 13C-pocket) during the interpulse phases of an AGB star. The nucleosynthe- sis of neutron-capture elements in the carbon-enhanced metal-poor stars (CEMP stars hereafter)[12] can be in- vestigated by abundance studies of s-rich or r-rich stars. In 2006, Goswami et al.[13] analysed the spectra of the s- and r-rich metal-poor star HE 1305-0007, and concluded that the observed abundances could not be well fit by a scaled solar system r-process pattern nor by the s-process pattern of an AGB model. This star shows that the en- hancements of the neutron-capture elements Sr and Y are much lower than the enhancement of Ba and the abundances ratio [Pb/Ba] is only about 0.05. Because of the Na overabundance, which is believed to be formed through deep CNO-burning, Goswami et al.[13] have also speculated that this star should be polluted by a massive AGB star. Clearly, the restudy of elemental abundances in this object is still very important for well understand- ing the nucleosynthesis of neutron-capture elements in metal-poor stars. The chemical abundance distributions of the very metal-poor double-enhanced stars are excellent informa- tion to set new constraints on models of neutron-capture processes at low metallicity. The metallicity of HE 1305- 0007 is [Fe/H] = −2.0, which is just at the upper limit of the metallicity for the observed double-enhanced stars. There have been many theoretical studies of s-process nucleosynthesis in low-mass AGB stars. 
Unfortunately, the precise mechanism for chemical mixing of protons from the hydrogen-rich envelope into the 12C-rich layer to form a 13C-pocket is still unknown.[14] It is interesting to adopt the parametric model for metal-poor stars presented by Aoki et al.[15] and developed by Zhang et al.[5] to study the physical conditions which could reproduce the observed abundance pattern found in this star. In this Letter, we investigate the characteristics of the nucleosynthesis pathway that produces the special abundance ratios of s- and r-rich object HE 1305-0007 using the s-process parametric model.[5] The calculated results are presented. We also discuss the characteristics of the s-process nucleosynthesis at low metallicity. We explored the origin of the neutron-capture elements in HE 1305-0007 by comparing the observed abundances with predicted s- and r-process contribution. For this purpose, we adopt the parametric model for metal-poor stars presented by Zhang et al.[5] (http://arxiv.org/abs/0704.0576v1) The ith element abundance in the envelope of the star can be calculated by $N_i(Z) = C_s N_{i,s} + C_r N_{i,r}\,10^{[\mathrm{Fe/H}]}$, (1) where Z is the metallicity of the star, $N_{i,s}$ is the abundance of the ith element produced by the s-process in the AGB star and $N_{i,r}$ is the abundance of the ith element produced by the r-process (per Si = $10^6$ at Z = Z⊙), Cs and Cr are the component coefficients that correspond to contributions from the s-process and r-process respectively. There are four parameters in the parametric model of s- and r-rich stars. They are the neutron exposure per thermal pulse ∆τ, the overlap factor r, the component coefficient of the s-process Cs and the component coefficient of the r-process Cr. The adopted initial abundances of seed nuclei lighter than the iron peak elements were taken to be the solar-system abundances, scaled to the value of [Fe/H] of the star. 
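The bookkeeping of Eq. (1) can be sketched in a few lines. This is only an illustration of how the two components combine for one element: the function names are hypothetical, and the per-element yields passed in the usage line (100.0 and 1.0) are placeholder numbers, not values from the model. Only Cs = 0.0047, Cr = 67.4 and [Fe/H] = −2.0 are taken from the text.

```python
def envelope_abundance(n_s, n_r, c_s, c_r, fe_h):
    """Eq. (1): N_i(Z) = C_s * N_{i,s} + C_r * N_{i,r} * 10**[Fe/H].

    Returns the total together with its s- and r-process parts.
    """
    s_part = c_s * n_s
    r_part = c_r * n_r * 10.0 ** fe_h
    return s_part + r_part, s_part, r_part

def s_to_r_percent(n_s, n_r, c_s, c_r, fe_h):
    """Percentage split between the s- and r-process contributions."""
    total, s_part, r_part = envelope_abundance(n_s, n_r, c_s, c_r, fe_h)
    return 100.0 * s_part / total, 100.0 * r_part / total

# Placeholder yields n_s = 100.0 and n_r = 1.0 (hypothetical inputs)
print(s_to_r_percent(100.0, 1.0, c_s=0.0047, c_r=67.4, fe_h=-2.0))
```

An s:r ratio for a given element follows from the same split once the model yields for that element are supplied.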
Because the neutron-capture-element component of the interstellar gas that forms very metal-deficient stars is expected to consist of mostly pure r-process elements, for the other heavier nuclei we use the r-process abundances of the solar system,[16] normalized to the value of [Fe/H]. The abundances of r-process nuclei in Eq. (1) are taken to be the solar-system r-process abundances[16] for the elements heavier than Ba; for the other lighter nuclei we use solar-system r-process abundances multiplied by a factor of 0.4266.[5,17] Using the observed data in the sample star HE 1305-0007, the parameters in the model can be obtained from the parametric approach. Figure 1 shows our calculated best-fit result. For this star, the curves produced by the model are consistent with the observed abundances within the error limits. The agreement of the model results with the observations provides strong support to the validity of the parametric model. In the AGB model, the overlap factor r is a fundamental parameter. In 1998, Gallino et al.[8] (G98 hereafter) found an overlap factor of r ≃ 0.4−0.7 in their standard evolution model of low-mass (1.5−3.0M⊙) AGB stars at solar metallicity. The overlap factor calculated for other s-enhanced metal-poor stars lies between 0.1 and 0.81.[5] The overlap factor deduced for HE 1305-0007 is about $r = 0^{+0.17}_{-0.00}$, which is much smaller than the range presented by G98. This just implies that iron seeds could experience only one neutron exposure in the nucleosynthesis region.[18] For the third dredge-up and the AGB model, several important properties depend primarily on the core mass.[19−21] In the core-mass range 0.6 ≤ Mc ≤ 1.36, an analytical formula for the AGB stars was given by Iben[19] showing that the overlap factor increases with decreasing core mass. Combining the formula and the initial-final mass relations,[4] Cui and Zhang [22] obtained the overlap factor as a function of the initial mass and metallicity. 
In an evolution model of AGB stars, a small r may be realized if the third dredge-up is deep enough for the s-processed material to be diluted by extensive admixture of unprocessed material. [FIG. 1: Best fit to observational result of HE 1305-0007 (panel labels: ∆τ = 0.71 mbarn−1, r = 0.00, Cr = 67.43, Cs = 0.0047, χ2 = 1.290). The black circles with appropriate error bars denote the observed element abundances, the solid line represents predictions from s-process calculations considering r-process contribution (taken from Ref. [13]).] Karakas[21] and Herwig[23,24] have found that the third dredge-up is more efficient for the AGB stars with larger core masses, confirming the low values of r obtained by Iben[19] in these cases. In AGB stars with initial mass in the range M = 1.0−4.0M⊙, the core mass Mc lies between 0.6 and 1.2M⊙ at [Fe/H] = −2.0. According to the formula presented by Iben,[19] the corresponding values of r would range between 0.76 and 0.26. Obviously, the overlap factor of HE 1305-0007 is smaller than this range. We have extensively explored the convergence of the abundance distribution of s-process elements through recurrent neutron exposures. All elements, including Pb, were found to be made in the first neutron exposure. This is consistent with the small overlap factor r ≃ 0 deduced in our best-fit model. Thus the possibility exists that the s-process material has experienced only one neutron exposure in the nucleosynthesis region. In 2000, Fujimoto, Ikeda and Iben[25] proposed a scenario for the extra-metal-poor AGB stars with [Fe/H] < −2.5 in which the convective shell triggered by the thermal runaway develops inside the helium layer. Once this occurs, 12C captures protons to synthesize 13C and other neutron-source nuclei. The thermal runaway continues to heat material in the thermal pulse so that neutrons produced by the 22Ne(α, n)25Mg reaction as well as the 13C(α, n)16O reaction may contribute. 
In this case, only one episode of proton mixing into the He intershell layer occurs in metal-poor stars.[25,15,45] After the first two pulses no more proton mixing occurs although the third dredge-up events continue to repeat, so the abundances of the s-rich metal-poor stars can be characterized by only one neutron exposure. Obviously, the metallicity of HE 1305-0007 is higher than the range of metallicity for this scenario. [FIG. 2: Best fit to observational result of metal-deficient star HE 1305-0007, showing the calculated abundances logε(Pb), logε(Ba) and logε(Sr) and reduced χ2 (bottom) as a function of the neutron exposure ∆τ in a model with Cr = 67.4, Cs = 0.0047 and r = 0. These are compared with the observed abundances of HE 1305-0007.] [FIG. 3: The same as those in Fig. 2 but as a function of the overlap factor r in a model with ∆τ = 0.71.] One major goal of this work is to explore the characteristics of the binary system that HE 1305-0007 originally belonged to. The enhancement of the neutron-capture elements Ba and Pb suggests that in a binary system a mass-transfer episode from a former AGB star took place. The radial-velocity measurement indicates that HE 1305-0007 is a high-velocity star, with a radial velocity of 217.8 km s−1. From the high velocity of HE 1305-0007, we could speculate that the star could be a runaway star from a binary system, which has experienced the AIC event. The strong overabundance of r-process elements for HE 1305-0007 (Cr = 67.4) should be significant evidence for the AIC scenario. In this case, the orbital separation must be small enough to allow for capture of a sufficient amount of material to trigger this event. Assuming that HE 1305-0007 is formed in a binary system, the AGB connection strongly suggests that this star is a member of a post-common-envelope binary. 
This must be the case if the overabundances of s-process elements are attributed to mass-transfer from an AGB star. We can only speculate about the effects of the common-envelope phase on the nuclear signatures in a metal-poor star that was formed from this mechanism. One case could involve several thermal pulses with dredge-up causing the observed abundance distribution corresponding to a larger overlap factor. However, after the s-process material has experienced only one neutron exposure in the nucleosynthesis region and is dredged-up to its envelope, the AGB evolution is terminated by the onset of common-envelope evolution. This could explain the characteristic of single neutron exposure in this star. In addition, based on the Na overabundance, Goswami et al.[13] have speculated that HE 1305-0007 should be polluted by a massive AGB star, which has a large core-mass and favours the formation of AIC. Clearly, a detailed theoretical investigation of this scenario is highly desirable. The neutron exposure per pulse, ∆τ, is another fundamental parameter in the AGB model. In 2006, Zhang et al.[5] deduced the neutron exposure per pulse for other s-enhanced metal-poor stars, which lies between 0.45 and 0.88 mbarn−1. The neutron exposure deduced for HE 1305-0007 is about $\Delta\tau = 0.71^{+0.06}_{-0.04}$ mbarn−1. Figures 2 and 3 show the calculated abundances logε(Pb), logε(Ba) and logε(Sr) versus the neutron exposure ∆τ in a model with Cr = 67.4, Cs = 0.0047 and r = 0 and versus the overlap factor r with ∆τ = 0.71 mbarn−1, respectively. These are compared with the observed abundances of HE 1305-0007. There is only one region in Fig. 2, $\Delta\tau = 0.71^{+0.06}_{-0.04}$ mbarn−1, in which all the observed ratios of three representative elements can be accounted for within the error limits. The bottom panel in Fig. 
2 displays the reduced χ2 value calculated in our model with all detected elemental abundances being taken into account and there is a minimum, with χ2 = 1.290, at ∆τ = 0.71 mbarn−1. From Fig. 3, we find that the abundances logε(Pb), logε(Ba) and logε(Sr) are insensitive to the overlap factor r in a wider range, 0 ≤ r ≤ 0.17. The uncertainties of the parameters for the star HE 1305-0007 are similar to those for metal-poor stars LP 625-44 and LP 706-7 obtained by Aoki et al.[15] In addition, it is worth further commenting on the behaviour of logε(Sr), logε(Ba) and logε(Pb) as a function of the neutron exposure ∆τ seen in Fig. 2. The non-linear trends displayed in the plot reveal the complex dependence on the neutron exposure. The trends can be illustrated as follows. Starting from low neutron exposure and moving toward higher neutron exposure values, they show how the Sr peak elements are preferentially produced at nearly ∆τ ∼ 0.4 mbarn−1. At larger neutron exposure (e.g., ∆τ ∼ 0.7 mbarn−1), the Ba-peak elements become dominant. In fact, the higher neutron exposure favors large amounts of production of the heavier elements such as Ba, La, etc. and less Sr, Y, etc.,[22] which is the reason of the abundance pattern of the s-process elements in HE 1305-0007, i.e. the enhancements of the neutron-capture elements Sr and Y are much lower than the enhancement of Ba and the abundances ratio [Pb/Ba] is only about 0.05. Then a higher value of logε(Pb) ∼ 4 follows at ∆τ = 1.5 mbarn−1. In this case, the s-process flow extends beyond the Sr-peak and Ba-peak nuclei to cause an accumulation at 208Pb. Clearly, logε(Pb) is very sensitive to the neutron exposure. The r- and s-process component coefficients of HE 1305-0007 are about 67.4 and 0.0047, which implies that this star belongs to s+r stars. Recently, Zhang et al.[5] have calculated 12 s+r stars with 0.0005 ≤ Cs ≤ 0.0060. The s-process component coefficient of HE 1305-0007 lies in this range. 
The Ba and Eu abundances are most use- ful for unraveling the sites and nuclear parameters asso- ciated with the s- and r-process corresponding to those in extremely metal-poor stars, polluted by material with a few times of nucleosynthesis processing. In the Sun, the elemental abundances of Ba and Eu consist of signif- icantly different combinations of s- and r-process isotope contributions, with s:r ratios for Ba and Eu of 81:19 and 6:94, respectively.[16] From Eq. (1), we can obtain the s:r ratios for Ba and Eu are 95.7:4.3 and 30.1:69.9, which are obviously larger than the ratios in the solar system. From Fig. 1 we find that our model cannot explain the larger errors of some neutron-capture elements, such as Y and Zr in HE 1305-0007. This implies that our un- derstanding of the true nature of s-process or r-process is incomplete for at least some of these elements.[27] In conclusion, the star HE 1305-0007 is an s+r star with metallicity [Fe/H] = −2.0, which is just at the upper limit of the metallicity for the observed double-enhanced stars. Theoretical predictions for abundances starting with Sr fit well the observed data for the sample star, providing an estimation for neutron exposure occurred in AGB star. The calculated results indicated that al- most all s-elements were made in the first neutron expo- sure. Once this happens, after only one time dredge-up, the observed abundance profile of the s-rich stars may be reproduced in a single neutron exposure. From the high radial-velocity of HE 1305-0007, we speculate that the star could be a runaway star from a binary system, which has experienced the AIC event. The r-process el- ements in HE 1305-0007 (Cr = 67.4) should come from the AIC event. Because the orbital separation must be small enough to allow for capture of a sufficient amount of material to create the formation of AIC, this star should be a member of a post-common-envelope binary. 
After the s-process material has experienced only one neutron exposure in the nucleosynthesis region and is dredged up to its envelope, the AGB evolution is terminated by the onset of common-envelope evolution. Clearly, such an idea requires a more detailed high-resolution study and long-term radial-velocity monitoring in order to reach a definitive conclusion. More in-depth theoretical and observational studies of this scenario are highly desirable.

References
[1] Hill V et al 2000 Astron. Astrophys. 353 557
[2] Cohen J G et al 2003 Astrophys. J. 588 1082
[3] Qian Y Z and Wasserburg G J 2003 Astrophys. J. 588 1099
[4] Zijlstra A A 2004 Mon. Not. R. Astron. Soc. 348
[5] Zhang B, Ma K and Zhou G D 2006 Astrophys. J. 642 1075
[6] Barbuy B et al 2005 Astron. Astrophys. 429 1031
[7] Wanajo S et al 2005 Astrophys. J. 636 842
[8] Gallino R et al 1998 Astrophys. J. 497 388
[9] Gallino R et al 2003 Nucl. Phys. A 718 181
[10] Straniero O et al 1995 Astrophys. J. 440 L85
[11] Straniero O, Gallino R and Cristallo S 2006 Nucl. Phys. A 777 311
[12] Cohen J G et al 2005 Astrophys. J. 633 L109
[13] Goswami A et al 2006 Mon. Not. R. Astron. Soc. 372 343
[14] Busso M et al 2001 Astrophys. J. 557 802
[15] Aoki W et al 2001 Astrophys. J. 561 346
[16] Arlandini C et al 1999 Astrophys. J. 525 886
[17] Cui W Y et al 2007 Astrophys. J. 657 1037
[18] Ma K, Cui W Y and Zhang B 2007 Mon. Not. R. Astron. Soc. 375 1418
[19] Iben I Jr 1977 Astrophys. J. 217 788
[20] Groenewegen M A T and de Jong T 1993 Astron. Astrophys. 267 410
[21] Karakas A I, Lattanzio J C and Pols O R 2002 PASA 19 515
[22] Cui W Y and Zhang B 2006 Mon. Not. R. Astron. Soc. 368 305
[23] Herwig F 2000 Astron. Astrophys. 360 952
[24] Herwig F 2004 Astrophys. J. 605 425
[25] Fujimoto M Y, Ikeda Y and Iben I Jr 2000 Astrophys. J. 529 L25
[26] Iwamoto N et al 2003 Nucl. Phys. A 718 193
[27] Travaglio C et al 2004 Astrophys. J. 601 864

∗ Supported by the National Natural Science Foundation of China under Grant Nos 10373005, 10673002 and 10778616.
∗∗ To whom correspondence should be addressed. Email: zhangbo@hebtu.edu.cn

ABSTRACT
The star HE 1305-0007 is a metal-poor double-enhanced star with metallicity [Fe/H] $=-2.0$, which is just at the upper limit of the metallicity for the observed double-enhanced stars. Using a parametric model, we find that almost all s-elements were made in a single neutron exposure. This star should be a member of a post-common-envelope binary. After the s-process material has experienced only one neutron exposure in the nucleosynthesis region and is dredged up to its envelope, the AGB evolution is terminated by the onset of common-envelope evolution. Based on the high radial velocity of HE 1305-0007, we speculate that the star could be a runaway star from a binary system in which the AIC event has occurred and produced the r-process elements.
<|endoftext|><|startoftext|>
Introduction

In eleven-dimensional M theory, there exist two extended brane solutions, i.e. the membrane and the M5-brane. The membrane was found in [1] as an elementary solution of D = 11 supergravity that preserves half of the spacetime supersymmetry and is an electric source of the four-form field. The M5-brane was found in [2] as a soliton solution of D = 11 supergravity that also preserves half of the spacetime supersymmetry, but is a magnetic source of the same four-form field. These extended brane solutions can be related to the corresponding brane solutions in ten-dimensional string theory. After compactification and some dualities, these branes reduce to D-branes or other brane solutions in string theory [3].

In this paper, we investigate the properties of an M2-brane in the M5-brane background. We do not investigate the case of intersecting branes.
Instead, we are mainly concerned with the classical dynamics of the membrane in the given background. As will be illustrated, the membrane evolves nontrivially because of the gravitational force of the M5-brane.

In 11-dimensional supergravity, the classical solution for N coincident M5-branes reads
$$ ds^2 = H^{-1/3}\,\eta_{\mu\nu}dx^{\mu}dx^{\nu} + H^{2/3}\,\delta_{ij}dx^{i}dx^{j}, \qquad H = 1 + \frac{\pi N l_p^3}{R^3}, \qquad R^2 = (x^i)^2 = r^2 + (x^{11})^2, $$
$$ \mu,\nu = 0,1,\cdots,5, \qquad i,j = 6,7,8,9,11, \tag{1.1} $$
and the 4-form field strength takes the form
$$ F_4 = dA_3 = 3\pi N l_p^3\, dv_{S^4}, \tag{1.2} $$
where $dv_{S^4}$ denotes the volume form of a unit $S^4$ and $l_p$ is the Planck length in the 11-dimensional theory. The N coincident M5-branes are parallel to the $x^{\mu}$ directions and located at R = 0 in the transverse space. In the near horizon limit R → 0, the harmonic function becomes $H = \pi N l_p^3/R^3$, while the other parts keep the same forms as in equations (1.1) and (1.2).

As in [4], if we suppose that there is a periodic configuration of N coincident M5-branes along the $x^{11}$ direction at intervals of $2\pi R_{11}$, and take the limit $1 \ll r/R_{11}$, then the background metric and the 4-form field strength become
$$ ds^2 = f^{-1/3}\,\eta_{\mu\nu}dx^{\mu}dx^{\nu} + f^{2/3}\,\delta_{ij}dx^{i}dx^{j} + f^{2/3}\,(dx^{11})^2, \qquad f = 1 + \frac{N l_p^3}{R_{11} r^2}, $$
$$ F_4 = \frac{2 N l_p^3}{R_{11}}\, dv_{S^3}\wedge dx^{11}, \qquad r^2 = (x^i)^2, \qquad x^{11} = R_{11}\phi, \tag{1.3} $$
where μ, ν = 0, 1, · · · , 5, i, j = 6, 7, 8, 9 and 0 ≤ φ ≤ 2π. This metric has an SO(4) symmetry group of rotations in the directions transverse to the M5-brane. In the near horizon limit, the harmonic function f becomes $f = N l_p^3/(R_{11} r^2)$, while the other parts of the background (1.3) remain unchanged. In fact, if the radius of the $x^{11}$ coordinate is sent to zero, the metric (1.3) reduces to the solution for N coincident NS5-branes in ten-dimensional string theory [5].

Here we mainly focus on the classical dynamics of an M2-brane in the backgrounds (1.1) and (1.3). The dynamics of this single membrane can be described by an effective action of Nambu-Goto and Wess-Zumino type.
However, for coincident membranes, unlike coincident D-branes in string theory, which can be described by the effective action of [6], the worldvolume action is still not very clear [7].

We choose the worldvolume coordinates of the membrane as $x^0, x^1, x^2$, and those of the M5-brane as $x^0, \cdots, x^5$. Hence the M2-brane is "parallel" to the M5-brane, i.e. it is extended in some of the M5-brane worldvolume directions $x^{\mu}$ and point-like in the directions transverse to the M5-brane $(x^6, x^7, x^8, x^9, x^{11})$. This configuration breaks supersymmetry completely. We label the worldvolume coordinates of the M2-brane by $\xi^{\mu}$, μ = 0, 1, 2, and use reparameterization invariance on the worldvolume of the M2-brane to set $\xi^{\mu} = x^{\mu}$. The position of the M2-brane in the transverse directions $(x^6, \cdots, x^9, x^{11})$ gives rise to scalar fields on the worldvolume of the M2-brane, $(X^6(\xi^{\mu}), \cdots, X^9(\xi^{\mu}), X^{11}(\xi^{\mu}))$. The worldvolume action of a single M2-brane [8] is given by the sum of the Nambu-Goto action and a Wess-Zumino type term,
$$ S_{M2} = -T_2\int d^3\xi\,\sqrt{-\det P[G]_{\mu\nu}} + T_2\int P[A], \tag{1.4} $$
where the tension of the M2-brane is $T_2 = 1/(4\pi^2 l_p^3)$, and P[· · ·] denotes the pullback operation
$$ P[G]_{\mu\nu} = \frac{\partial X^M}{\partial\xi^{\mu}}\frac{\partial X^N}{\partial\xi^{\nu}}\,G_{MN}(X), \qquad P[A]_{\mu\nu\rho} = \frac{\partial X^M}{\partial\xi^{\mu}}\frac{\partial X^N}{\partial\xi^{\nu}}\frac{\partial X^L}{\partial\xi^{\rho}}\,A_{MNL}(X). \tag{1.5} $$
The indices M, N, L run over the whole eleven-dimensional spacetime, and the fields $G_{MN}$, $A_{MNL}$ denote the metric and the form field in eleven dimensions.

In the following sections, we discuss the classical dynamics of the M2-brane in the above backgrounds, and suppose that its coordinates transverse to the M5-brane depend only on the time coordinate. In this case the Wess-Zumino term in the membrane action vanishes.

2 Classical dynamics of membrane

Let us first consider the membrane dynamics in the background (1.1). Since we have supposed that the transverse coordinates $X^i$, i = 6, 7, 8, 9, 11, are functions of time t only, the pullback quantities are
$$ P[G]_{tt} = -H^{-1/3} + H^{2/3}\dot X^i\dot X^i, \qquad P[G]_{x^1x^1} = P[G]_{x^2x^2} = H^{-1/3}, \qquad P[A] = 0. \tag{2.1} $$
Substituting (2.1) into the M2-brane action (1.4), we get
$$ S_{M2} = -V T_2\int dt\,\sqrt{H^{-1} - \dot X^i\dot X^i}, \tag{2.2} $$
where V is the spatial volume of the M2-brane and i = 6, · · · , 9, 11. This is very similar to the corresponding action in [9], which is the DBI action of a D-brane in the background of N NS5-branes. Through a Legendre transformation, the Hamiltonian is
$$ \mathcal{H} = \frac{T_2 V}{H\sqrt{H^{-1} - \dot X^i\dot X^i}} \equiv V E, \tag{2.3} $$
where E denotes the energy density. The equation of motion is
$$ \frac{d}{dt}\left(\frac{\dot X^i}{\sqrt{H^{-1} - \dot X^j\dot X^j}}\right) = \frac{\partial_i H}{2H^2\sqrt{H^{-1} - \dot X^j\dot X^j}}. \tag{2.4} $$
Using this equation of motion, one can check that the Hamiltonian is conserved. To solve (2.4), we need the initial conditions $\vec X(t=0)$ and $\dot{\vec X}(t=0)$. These two vectors define a plane in $R^5$. By an SO(5) rotation, we can rotate this plane into the $(x^6, x^7)$ plane. The motion then remains in the $(x^6, x^7)$ plane for all time. Thus, without loss of generality, we can study trajectories in this plane. We choose the polar coordinates
$$ X^6 = R\cos\theta, \qquad X^7 = R\sin\theta. \tag{2.5} $$
Then the energy density (2.3) becomes
$$ E = \frac{T_2}{H\sqrt{H^{-1} - \dot R^2 - R^2\dot\theta^2}}, \tag{2.6} $$
and the angular momentum density is
$$ L_\theta = \frac{T_2 R^2\dot\theta}{\sqrt{H^{-1} - \dot R^2 - R^2\dot\theta^2}}. \tag{2.7} $$
This angular momentum of the M2-brane is conserved as well. From the membrane action (2.2), we can obtain the energy-momentum tensor. The components of $T_{\mu\nu}$ are
$$ T_{00} = \frac{T_2}{H\sqrt{H^{-1} - \dot X^i\dot X^i}}, \qquad T_{ij} = -T_2\,\delta_{ij}\sqrt{H^{-1} - \dot X^i\dot X^i}, \tag{2.8} $$
and the other components of the stress tensor are zero. From the angular momentum density (2.7) and the energy density (2.6), we get the equations for the coordinates R and θ,
$$ \dot R^2 = \frac{1}{H} - \frac{1}{E^2H^2}\left(T_2^2 + \frac{L_\theta^2}{R^2}\right), \tag{2.9} $$
$$ \dot\theta = \frac{L_\theta}{E\,H(R)\,R^2}. \tag{2.10} $$
For simplicity, we first consider the case $L_\theta = 0$; then the radial equation is
$$ \dot R^2 = \frac{1}{H} - \frac{T_2^2}{E^2H^2}. \tag{2.11} $$
The right-hand side of this equation cannot be negative, so we get a constraint on the coordinate R,
$$ \frac{\pi N l_p^3}{R^3} \ge \frac{T_2^2}{E^2} - 1. \tag{2.12} $$
From this we see that if the energy density E is larger than the tension $T_2$ of an M2-brane, the constraint (2.12) is empty and the M2-brane can escape to infinity. However, for $E < T_2$, the M2-brane does not have enough energy to overcome the gravitational pull of the M5-brane, and it falls onto the M5-brane from its initial position. In the near horizon limit the harmonic function becomes $H = \pi N l_p^3/R^3$, and equation (2.11) becomes
$$ \dot R^2 = \frac{R^3}{\pi N l_p^3} - \frac{T_2^2}{\pi^2N^2E^2l_p^6}\,R^6. \tag{2.13} $$
Since the left-hand side of equation (2.13) is nonnegative, the coordinate R has a maximal value $R_{\max} = (\pi N E^2 l_p^3/T_2^2)^{1/3}$. Also from this equation, the minimal value of R is zero. Apart from these two, there are no other extrema, but there is one inflexion point between R = 0 and $R_{\max}$. We can take the M2-brane to be at the maximal value $R_{\max}$ at the initial time. Because of the gravitational force of the M5-brane, the M2-brane then rolls down towards the M5-brane. As t → ∞, the radial coordinate R approaches zero. The energy-momentum tensor is
$$ T_{ij} = -\delta_{ij}\frac{T_2^2}{E\,H(R)} = -\delta_{ij}\frac{T_2^2R^3}{\pi N E\,l_p^3}, $$
so $T_{ij}$ approaches zero as R → 0. This may be regarded as the pressure decreasing to zero. We should mention, however, that the coordinate R cannot actually reach zero, since at this point the supergravity background is not reliable, and the classical analysis of the membrane near R = 0 given above becomes incorrect. Thus, in order to use the supergravity approximation, we must constrain the coordinate R to be larger than the Planck length $l_p$.

We now consider the case of nonzero angular momentum. From the radial equation of motion (2.9), after substituting the harmonic function $H = 1 + \pi N l_p^3/R^3$, we get the constraint on the radial coordinate R,
$$ \frac{1}{R^3} - \frac{L_\theta^2}{\pi N E^2 l_p^3}\frac{1}{R^2} \ge \frac{1}{\pi N l_p^3}\left(\frac{T_2^2}{E^2} - 1\right). \tag{2.14} $$
In the case of equality, the constraint becomes
$$ \frac{1}{R^3} - \frac{L_\theta^2}{\pi N E^2 l_p^3}\frac{1}{R^2} - \frac{1}{\pi N l_p^3}\left(\frac{T_2^2}{E^2} - 1\right) = 0. \tag{2.15} $$
This equation has only one real root, which is the maximal distance by which the M2-brane can be separated from the M5-brane. For simplicity, we choose the near horizon limit; then the equation of motion for the radial coordinate becomes
$$ \dot R^2 = \frac{R^3}{\pi N l_p^3} - \frac{L_\theta^2}{\pi^2N^2E^2l_p^6}\,R^4 - \frac{T_2^2}{\pi^2N^2E^2l_p^6}\,R^6. \tag{2.16} $$
Equation (2.16) is still very difficult to solve, so we analyze it qualitatively instead. For $L_\theta = 0$ it reduces to equation (2.13). Setting the left-hand side of (2.16) to zero gives the extremal values of R. There are only two real extremal values of the radial coordinate R. One is R = 0; the other is
$$ R = \frac{D^{2/3} - 12L_\theta^2}{6\,T_2\,D^{1/3}}, \qquad D = 108\pi N l_p^3E^2T_2 + 12\sqrt{3}\,\sqrt{4L_\theta^6 + 27\pi^2N^2l_p^6E^4T_2^2}. \tag{2.17} $$
When $L_\theta = 0$, this value reduces to $(\pi N E^2 l_p^3/T_2^2)^{1/3}$. As in the $L_\theta = 0$ case, there is an inflexion point between R = 0 and (2.17). We can suppose that the M2-brane is at the maximal value (2.17) at the initial time; then, under the gravitational pull of the M5-brane, it approaches the M5-brane monotonically. For nonzero $L_\theta$, equation (2.10) for the θ coordinate in the near horizon background is
$$ \dot\theta = \frac{L_\theta}{E\,H\,R^2} = \frac{R\,L_\theta}{\pi N E\,l_p^3}. $$
Thus, when the radial coordinate R takes the value (2.17), the angular velocity is at its maximum, and as R → 0 the angular velocity also approaches zero. The energy-momentum tensor satisfies
$$ T_{ij} = -\delta_{ij}\frac{T_2^2}{E\,H(R)} = -\delta_{ij}\frac{T_2^2R^3}{\pi N E\,l_p^3}. $$
Thus, it again goes to zero as in the $L_\theta = 0$ case. As mentioned above, near the region R = 0 the classical background is unstable because of the strong interaction. Hence the above supergravity analysis becomes unreliable in this region.

From the first section, we already know that, after compactifying on a periodic circle along the coordinate $x^{11}$, the metric (1.1) becomes the background (1.3). In the following, we study the membrane dynamics in this background (1.3).
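The step from (1.1) to (1.3) can be checked numerically. The sketch below is an illustration added here, not part of the original derivation: it sums the $1/R^3$ term of the harmonic function in (1.1) over a periodic array of M5-brane stacks spaced by $2\pi R_{11}$ along $x^{11}$, and compares the result with the smeared form $N l_p^3/(R_{11} r^2)$ that is expected for $1 \ll r/R_{11}$. The function names, the truncation `n_max` and the unit choice $l_p = 1$ are assumptions made for the example.

```python
import math


def periodic_sum(r, x11, n_branes, r11, lp=1.0, n_max=20000):
    """Sum pi*N*lp^3 / R_n^3 over images located at x11 = 2*pi*r11*n.

    Only the position-dependent part of H is summed (the constant 1 is omitted).
    The infinite sum is truncated at |n| <= n_max, which is an assumption of this sketch.
    """
    total = 0.0
    for n in range(-n_max, n_max + 1):
        dist2 = r * r + (x11 - 2.0 * math.pi * r11 * n) ** 2
        total += math.pi * n_branes * lp ** 3 / dist2 ** 1.5
    return total


def smeared(r, n_branes, r11, lp=1.0):
    """Smeared harmonic function minus one: N*lp^3 / (r11 * r^2)."""
    return n_branes * lp ** 3 / (r11 * r * r)


if __name__ == "__main__":
    n_branes, r11 = 5, 0.1
    for r in (0.05, 0.5, 5.0):
        ratio = periodic_sum(r, 0.0, n_branes, r11) / smeared(r, n_branes, r11)
        print(f"r/R11 = {r / r11:6.1f}   periodic sum / smeared form = {ratio:.6f}")
```

The ratio should approach 1 as $r/R_{11}$ grows and deviate strongly for $r \lesssim R_{11}$, which is the regime where the text later warns that the background (1.3) cannot be trusted.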
Here, we still suppose that the coordinates transverse to the M5-brane, $X^i$ (i = 6, 7, 8, 9) and $X^{11}$, are functions of time t only. The pullback quantities then take the form
$$ P[G]_{tt} = -f^{-1/3} + f^{2/3}\dot X^i\dot X^i + f^{2/3}\dot X^{11}\dot X^{11}, \qquad P[G]_{x^1x^1} = P[G]_{x^2x^2} = f^{-1/3}, \qquad P[A] = 0. \tag{2.18} $$
After inserting (2.18) into the M2-brane action (1.4), we get
$$ S_{M2} = -V T_2\int dt\,\sqrt{f^{-1} - \dot X^i\dot X^i - \dot X^{11}\dot X^{11}}, \tag{2.19} $$
where V is the spatial volume of the M2-brane. This action is also very similar to the action in [9], except for the harmonic function and the dimension. From the Lagrangian in (2.19), with $X^{11} = R_{11}\phi$, we derive the equations of motion for the membrane in this background:
$$ \frac{d}{dt}\left(\frac{\dot X^i}{\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}\right) = \frac{\partial_i f}{2f^2\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}, \tag{2.20} $$
$$ \frac{d}{dt}\left(\frac{R_{11}\dot\phi}{\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}\right) = 0. \tag{2.21} $$
Owing to the symmetries of this system, there are also conserved charges. Time translation invariance implies that the energy
$$ \mathcal{H} = P_i\dot X^i + P_\phi\dot\phi - L \tag{2.22} $$
is conserved. The momenta are obtained by varying the Lagrangian L,
$$ P_i = \frac{\delta L}{\delta\dot X^i} = \frac{T_2V\dot X_i}{\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}, \tag{2.23} $$
$$ P_\phi = \frac{\delta L}{\delta\dot\phi} = \frac{T_2VR_{11}^2\dot\phi}{\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}. \tag{2.24} $$
Substituting (2.23) and (2.24) into (2.22), we find that the energy is given by
$$ \mathcal{H} = \frac{T_2V}{f\sqrt{f^{-1} - \dot X^i\dot X^i - R_{11}^2\dot\phi^2}} \equiv V E. \tag{2.25} $$
Since the harmonic function is $f = 1 + N l_p^3/(R_{11}r^2)$, we have $\partial_i f(r) = X^i f'(r)/r$, and the equation of motion (2.20) can be rewritten as
$$ \frac{d}{dt}\left(\frac{\dot X^i}{\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}\right) = \frac{X^i f'}{2rf^2\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}, \tag{2.26} $$
while (2.21) is unchanged. To solve these equations, we need to specify initial conditions for the coordinates. One set is $\vec X(t=0)$ and $\dot{\vec X}(t=0)$. These two vectors define a plane in $R^4$. By an SO(4) rotation, we can rotate this plane into the $(x^6, x^7)$ plane. The other set is φ(t = 0) and $\dot\phi(t=0)$. The motion of the membrane then remains in the $(x^6, x^7, \phi)$ space for all time. Thus, without loss of generality, we can study trajectories in this space. In addition to the energy, the angular momentum of the M2-brane is conserved as well. Its density is given by
$$ L_\theta = \frac{1}{V}\left(X^6P^7 - X^7P^6\right). \tag{2.27} $$
Using the expression (2.23) for the momentum, we find that
$$ L_\theta = \frac{T_2\left(X^6\dot X^7 - X^7\dot X^6\right)}{\sqrt{f^{-1} - \dot X^j\dot X^j - R_{11}^2\dot\phi^2}}. \tag{2.28} $$
Another quantity of interest is the stress tensor $T_{\mu\nu}$ associated with the moving M2-brane. The component $T_{00}$ is the energy density, so it is given by the expression (2.25) for E, with the factor of the volume stripped off. The components of $T_{\mu\nu}$ are
$$ T_{00} = \frac{T_2}{f\sqrt{f^{-1} - \dot X^i\dot X^i - R_{11}^2\dot\phi^2}}, \qquad T_{ij} = -T_2\,\delta_{ij}\sqrt{f^{-1} - \dot X^i\dot X^i - R_{11}^2\dot\phi^2}, \qquad T_{\phi\phi} = -T_2R_{11}^2\sqrt{f^{-1} - \dot X^i\dot X^i - R_{11}^2\dot\phi^2}, \tag{2.29} $$
and the other components of the stress tensor are zero. Because of the SO(4) rotation symmetry in the directions transverse to the M5-brane, it is convenient to change to the polar coordinates
$$ X^6 = r\cos\theta, \qquad X^7 = r\sin\theta. \tag{2.30} $$
In these coordinates, the energy density and the angular momentum densities become
$$ E = \frac{T_2}{f\sqrt{f^{-1} - \dot r^2 - r^2\dot\theta^2 - R_{11}^2\dot\phi^2}}, \tag{2.31} $$
$$ L_\theta = \frac{T_2r^2\dot\theta}{\sqrt{f^{-1} - \dot r^2 - r^2\dot\theta^2 - R_{11}^2\dot\phi^2}}, \tag{2.32} $$
$$ L_\phi = \frac{T_2R_{11}^2\dot\phi}{\sqrt{f^{-1} - \dot r^2 - r^2\dot\theta^2 - R_{11}^2\dot\phi^2}}. \tag{2.33} $$
One can check directly that $L_\theta$ and $L_\phi$ are conserved by using the equations of motion (2.26) and (2.21). To solve the equations of motion for given energy and angular momentum densities E, $L_\theta$ and $L_\phi$, we solve equation (2.32) for $\dot\theta$ and substitute the solution into (2.31). The equation for $\dot\theta$ is
$$ \dot\theta = \frac{L_\theta}{E f r^2}. \tag{2.34} $$
Inserting it into (2.31) and (2.32) and solving for $\dot r$, we find
$$ \dot r^2 = \frac{1}{f} - \frac{1}{E^2f^2}\left(T_2^2 + \frac{L_\theta^2}{r^2} + \frac{L_\phi^2}{R_{11}^2}\right). \tag{2.35} $$
We also have the equation for $\dot\phi$,
$$ \dot\phi = \frac{L_\phi}{EfR_{11}^2}. \tag{2.36} $$
Next, we study the solutions of the equations of motion (2.34), (2.35) and (2.36). First, we consider the case of angular momentum $L_\theta = 0$. Then equation (2.34) implies that θ is constant, while the radial equation (2.35) takes the form
$$ \dot r^2 = \frac{1}{f} - \frac{1}{E^2f^2}\left(T_2^2 + \frac{L_\phi^2}{R_{11}^2}\right). \tag{2.37} $$
Since the left-hand side of equation (2.37) is non-negative, we get the condition
$$ \frac{1}{f} - \frac{1}{E^2f^2}\left(T_2^2 + \frac{L_\phi^2}{R_{11}^2}\right) \ge 0. $$
After substituting the harmonic function f of (1.3), we find the constraint on r (for fixed energy density E)
$$ \frac{N l_p^3}{R_{11}r^2} \ge \frac{T_e^2}{E^2} - 1, \tag{2.38} $$
where we define the effective M2-brane tension by
$$ T_e^2 = T_2^2 + \frac{L_\phi^2}{R_{11}^2}. \tag{2.39} $$
From the constraint (2.38), if the energy density E is larger than the effective tension $T_e$ of an M2-brane, the constraint is empty and the M2-brane can escape to infinity. For $E < T_e$, the M2-brane does not have enough energy to escape the gravitational pull of the M5-brane, which means that it cannot exceed some maximal distance from the M5-brane. In the near horizon limit, the harmonic function becomes $f = N l_p^3/(R_{11}r^2)$, and the constraint (2.38) becomes
$$ \frac{N l_p^3}{R_{11}r^2} \ge \frac{T_e^2}{E^2}. $$
Thus, for $r \ll \sqrt{N l_p^3/R_{11}}$ the constraint is compatible with $T_e/E \gg 1$, whereas for $r \gg \sqrt{N l_p^3/R_{11}}$ the opposite holds. In this near horizon case, we can solve for the trajectory r(t), φ(t) exactly. Substituting the harmonic function $f = N l_p^3/(R_{11}r^2)$ into (2.37), we find the equation of motion
$$ \dot r^2 = \frac{R_{11}}{N l_p^3}\,r^2 - \frac{R_{11}^2}{E^2N^2l_p^6}\left(T_2^2 + \frac{L_\phi^2}{R_{11}^2}\right)r^4. \tag{2.40} $$
The solution is
$$ \frac{1}{r} = \sqrt{\frac{L_\phi^2 + R_{11}^2T_2^2}{NR_{11}E^2l_p^3}}\;\cosh\left(\sqrt{\frac{R_{11}}{N l_p^3}}\;t\right), \tag{2.41} $$
where we choose t = 0 to be the time at which the M2-brane reaches its maximal distance from the M5-brane. For an observer living on the M5-brane, it takes an infinite time for the M2-brane to reach r = 0. The radial motion of the M2-brane is also similar to the motion of a D-brane. The equation for φ following from (2.24) becomes
$$ \dot\phi^2 = \frac{L_\phi^2}{R_{11}^2\left(L_\phi^2 + T_2^2R_{11}^2\right)}\left(\frac{R_{11}}{N l_p^3}\,r^2 - \dot r^2\right). \tag{2.42} $$
Substituting the solution for r into equation (2.42), we get
$$ \dot\phi = \frac{EL_\phi}{L_\phi^2 + T_2^2R_{11}^2}\;\frac{1}{\cosh^2\left(\sqrt{R_{11}/(N l_p^3)}\;t\right)}. \tag{2.43} $$
Solving this equation gives
$$ \phi = \frac{EL_\phi}{L_\phi^2 + T_2^2R_{11}^2}\sqrt{\frac{N l_p^3}{R_{11}}}\;\tanh\left(\sqrt{\frac{R_{11}}{N l_p^3}}\;t\right). \tag{2.44} $$
It is interesting to calculate the energy-momentum tensor of the M2-brane in this case. The energy density $T_{00}$ is constant and equal to E throughout the time evolution. For the components $T_{ij}$ and $T_{\phi\phi}$, however, we find
$$ T_{ij} = -\delta_{ij}\frac{T_2^2}{Ef}, \qquad T_{\phi\phi} = -\frac{R_{11}^2T_2^2}{Ef}. \tag{2.45} $$
We see that the pressure goes smoothly to zero as r → 0, since $f(r) \sim 1/r^2$.
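The closed-form trajectory for $L_\theta = 0$ can be verified against the equations it is meant to solve. The following sketch is an added illustration under assumed parameter values (units with $l_p = 1$; the numbers chosen for N, $R_{11}$, E and $L_\phi$ are arbitrary and only need to satisfy $E < T_e$). It evaluates $r(t) = A/\cosh(\omega t)$ with $\omega^2 = R_{11}/(N l_p^3)$ and $A^2 = N R_{11} E^2 l_p^3/(L_\phi^2 + R_{11}^2 T_2^2)$, together with $\phi(t) \propto \tanh(\omega t)$, and checks by central differences that $\dot r^2$ matches the near-horizon radial equation (2.40) and that $\dot\phi = L_\phi/(E f R_{11}^2)$, as in (2.36). All function and variable names are hypothetical.

```python
import math

# Illustrative parameters in units with l_p = 1 (arbitrary values, not taken from the paper).
N_BRANES, R11, LP = 4, 0.5, 1.0
T2 = 1.0 / (4.0 * math.pi ** 2 * LP ** 3)   # membrane tension T_2 = 1/(4 pi^2 l_p^3)
E, L_PHI = 0.003, 0.002                     # energy density and angular momentum density

K = N_BRANES * LP ** 3                      # shorthand for N l_p^3
OMEGA = math.sqrt(R11 / K)                  # rate appearing in cosh(omega t)
AMP = math.sqrt(K * R11 * E ** 2 / (L_PHI ** 2 + R11 ** 2 * T2 ** 2))  # maximal radius


def r_of_t(t):
    """Radial trajectory r(t) = A / cosh(omega t)."""
    return AMP / math.cosh(OMEGA * t)


def phi_of_t(t):
    """Angle along the compact direction, phi(t) proportional to tanh(omega t)."""
    prefactor = E * L_PHI / (L_PHI ** 2 + R11 ** 2 * T2 ** 2)
    return prefactor * math.tanh(OMEGA * t) / OMEGA


def rhs_radial(r):
    """Right-hand side of the near-horizon radial equation with L_theta = 0."""
    te2 = T2 ** 2 + L_PHI ** 2 / R11 ** 2   # effective tension squared
    return R11 * r ** 2 / K - R11 ** 2 * te2 * r ** 4 / (E ** 2 * K ** 2)


def derivative(func, t, h=1e-5):
    """Central finite-difference derivative."""
    return (func(t + h) - func(t - h)) / (2.0 * h)


def max_residuals(times):
    """Largest relative mismatch of the radial and phi equations over the given times."""
    worst_r = worst_phi = 0.0
    for t in times:
        r = r_of_t(t)
        scale = R11 * r ** 2 / K            # natural size of the terms in the radial equation
        worst_r = max(worst_r, abs(derivative(r_of_t, t) ** 2 - rhs_radial(r)) / scale)
        phidot_expected = L_PHI * r ** 2 / (E * K * R11)   # L_phi/(E f R11^2), f = K/(R11 r^2)
        worst_phi = max(worst_phi,
                        abs(derivative(phi_of_t, t) - phidot_expected) / phidot_expected)
    return worst_r, worst_phi


if __name__ == "__main__":
    print(max_residuals([0.0, 0.5, 1.0, 2.0, 5.0]))
```

Both residuals should be at the level of finite-difference error. The check is purely algebraic and says nothing about the validity of the near-horizon approximation itself.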
But again, as in the analysis of the background (1.1), this may be unreliable near the r = 0 region.

So far we have discussed trajectories with vanishing angular momentum density (2.32). A natural question is whether anything qualitatively new occurs for non-zero $L_\theta$. Following [9], the radial equation of motion (2.35) can be thought of as describing a particle with mass m = 2, moving in one dimension r in the effective potential
$$ V_{\rm eff}(r) = \frac{1}{E^2f^2}\left(T_2^2 + \frac{L_\theta^2}{r^2} + \frac{L_\phi^2}{R_{11}^2}\right) - \frac{1}{f} \tag{2.46} $$
with zero energy. We now discuss the properties of this effective potential. In the small r region, it behaves as
$$ V_{\rm eff}(r) \simeq \frac{R_{11}}{N l_p^3}\left(\frac{R_{11}L_\theta^2}{E^2N l_p^3} - 1\right)r^2. \tag{2.47} $$
For large r, the leading terms of the potential are
$$ V_{\rm eff}(r) \simeq \frac{T_e^2}{E^2} - 1. \tag{2.48} $$
If the energy density of the M2-brane is smaller than the effective tension of an M2-brane, $E < T_e$, then the effective potential approaches the positive constant (2.48) as r → ∞, which means that the membrane cannot escape to infinity. From equation (2.47), we find that in order to have trajectories at non-zero r, the angular momentum must satisfy the constraint
$$ L_\theta < E\sqrt{\frac{N l_p^3}{R_{11}}}. \tag{2.49} $$
If the constraint (2.49) is not satisfied, the only solution is r = 0. If the condition (2.49) is satisfied, the trajectory of the M2-brane is qualitatively similar to that in the $L_\theta = 0$ case: it approaches the M5-brane and has no stable orbits at finite r. For $T_e \gg E$, the whole trajectory again lies in the region $r \ll \sqrt{N l_p^3/R_{11}}$, and one can approximate the harmonic function (1.3) by $f = N l_p^3/(R_{11}r^2)$. Equation (2.35) for $\dot r$ then becomes
$$ \dot r^2 = \frac{R_{11}}{N l_p^3}\left(1 - \frac{R_{11}L_\theta^2}{N l_p^3E^2}\right)r^2 - \frac{R_{11}^2}{E^2N^2l_p^6}\left(T_2^2 + \frac{L_\phi^2}{R_{11}^2}\right)r^4, \tag{2.50} $$
with the solution
$$ \frac{1}{r} = \sqrt{\frac{L_\phi^2 + R_{11}^2T_2^2}{R_{11}NE^2l_p^3 - R_{11}^2L_\theta^2}}\;\cosh\left(\frac{\sqrt{NR_{11}E^2l_p^3 - R_{11}^2L_\theta^2}}{NEl_p^3}\;t\right). \tag{2.51} $$
We see that the non-zero angular momentum slows down the exponential decrease of r as t → ∞. In the near horizon limit $f(r) = N l_p^3/(R_{11}r^2)$, the solution of equation (2.34) for θ is
$$ \theta = \frac{R_{11}L_\theta}{EN l_p^3}\;t. \tag{2.52} $$
The solutions (2.51) and (2.52) mean that the M2-brane in the background (1.3) spirals towards the origin, circling around it an infinite number of times in the process. The equation for φ is
$$ \dot\phi = \frac{L_\phi\left(NE^2l_p^3 - R_{11}L_\theta^2\right)}{EN l_p^3\left(L_\phi^2 + R_{11}^2T_2^2\right)}\;\frac{1}{\cosh^2\left(\frac{\sqrt{R_{11}E^2N l_p^3 - R_{11}^2L_\theta^2}}{EN l_p^3}\;t\right)}, \tag{2.53} $$
and its solution reads
$$ \phi = \frac{L_\phi}{L_\phi^2 + R_{11}^2T_2^2}\sqrt{\frac{NE^2l_p^3}{R_{11}} - L_\theta^2}\;\tanh\left(\frac{\sqrt{R_{11}NE^2l_p^3 - R_{11}^2L_\theta^2}}{EN l_p^3}\;t\right). \tag{2.54} $$
At t = 0 we have φ = 0, while as t → ∞,
$$ \phi \to \frac{L_\phi}{L_\phi^2 + R_{11}^2T_2^2}\sqrt{\frac{NE^2l_p^3}{R_{11}} - L_\theta^2}. $$
Thus, the non-zero angular momentum $L_\theta$ slows down the variation of φ. From these three solutions, we see that the M2-brane circles in the θ direction, moves along φ and falls towards the M5-brane in the process. Also, the energy-momentum tensor components $T_{ij}$ and $T_{\phi\phi}$ approach zero as r → 0, since $f(r) \sim 1/r^2$. But we must mention that, near the r = 0 region, this discussion may be incorrect because of strong coupling.

In the background (1.3), the dynamics of an M2-brane has properties similar to those studied in [9]. This can be understood from the fact that the D2-brane and the NS5-brane in type IIA string theory can be obtained by compactifying one transverse dimension of the M2-brane and the M5-brane in M theory. The solutions of the equations of motion describe the M2-brane falling towards the M5-brane. For non-zero angular momentum $L_\theta$, the M2-brane spirals towards the M5-brane. In both cases, the M2-brane carries an angular momentum $L_\phi$. We need to mention that the background (1.3) is only correct in the limit $1 \ll r/R_{11}$. Therefore, the result that the energy-momentum tensor components $T_{ij}$ and $T_{\phi\phi}$ approach zero as the M2-brane approaches the M5-brane may be unreliable, since there the radial coordinate r is smaller than the radius $R_{11}$. So we are not sure whether the membrane behaves in the same way as the late-time behavior of an unstable D-brane [10, 11, 12, 13, 14].

In the above sections, we investigated the classical dynamics of the membrane in various M5-brane backgrounds.
There may be some generalizations, since under the Penrose limit the solution (1.1) for N coincident M5-branes reduces to the AdS7 × S4 geometry. Hence one can investigate the membrane dynamics in this geometry. For the backgrounds (1.1) and (1.3) and their near horizon geometries, after deriving the classical equations of motion of the membrane from the membrane action (1.4), we can analyze the trajectories of the membrane. In some particular cases, we can obtain exact solutions for the trajectories. In general, however, the equations of motion are very difficult to solve. Through analyzing these equations, we can still obtain some qualitative information about the motion of the membrane. Consequently, in the M5-brane background, the membrane falls and spirals towards the M5-brane under the gravitational force of the M5-brane. In the region near the M5-brane, i.e. for R (or r) of the order of the Planck length $l_p$, the above analysis of the classical dynamics of the membrane may not be trusted, since the supergravity approximation is unreliable.

Acknowledgements

We would like to thank Yi-hong Gao for useful suggestions and discussions.

References
[1] M. J. Duff and K. Stelle, “Multimembrane solutions of d = 11 supergravity,” Phys. Lett. B 253 (1991) 113.
[2] R. Gueven, “Black p-brane solutions of D = 11 supergravity theory,” Phys. Lett. B 276 (1992) 49.
[3] J. Polchinski, “String Theory (Vol. I, Vol. II),” Cambridge Press, 1998.
[4] Y. Hyakutake, “Expanded Strings in the Background of NS5-branes via a M2-brane, a D2-brane and D0-branes,” hep-th/0112073.
[5] C. G. Callan, J. A. Harvey and A. Strominger, “Worldbrane actions for string solitons,” Nucl. Phys. B367: 60-82, 1991; “World sheet approach to heterotic instantons and solitons,” Nucl. Phys. B359: 611-634, 1991.
[6] R. C. Myers, “Dielectric branes,” JHEP 9912: 022, 1999 [hep-th/9910053].
[7] A. Basu and J. A. Harvey, “The M2-M5 brane system and a generalized Nahm’s equation,” Nucl. Phys. B713: 136-150, 2005 [hep-th/0412310].
[8] E. Bergshoeff, E. Sezgin and P. K. Townsend, “Properties Of The Eleven-Dimensional Super Membrane Theory,” Annals Phys. 185 (1988) 330.
[9] D. Kutasov, “D-Brane Dynamics Near NS5-Branes,” hep-th/0405058; “A Geometric interpretation of the open string tachyon,” hep-th/0408073; K. L. Panigrahi, “D-Brane Dynamics in Dp-Brane Background,” hep-th/0407134.
[10] A. Sen, “Tachyon dynamics in open string theory,” Int. J. Mod. Phys. A20: 5513-5656, 2005 [hep-th/0410103].
[11] A. Sen, “Rolling tachyon,” JHEP 0204, 048 (2002) [hep-th/0203211].
[12] F. Larsen, A. Naqvi and S. Terashima, “Rolling tachyons and decaying branes,” hep-th/0212248.
[13] T. Okuda and S. Sugimoto, “Coupling of rolling tachyon to closed strings,” Nucl. Phys. B647, 101 (2002) [hep-th/0208196].
[14] N. Lambert, H. Liu and J. Maldacena, “Closed strings from decaying D-branes,” hep-th/0303139.

ABSTRACT
In this paper, we investigate the properties of a membrane in the M5-brane background. By solving the classical equations of motion of the membrane, we can understand the classical dynamics of the membrane in this background.
<|endoftext|><|startoftext|>
Introduction

Solar research is currently working on understanding how turbulent convection on the Sun transports mass and energy through the convective zone, how it couples with the magnetic field and how it manages to deposit in the higher parts of the solar atmosphere the energy released from the corona. Among the different approaches to these questions, observations of the solar photosphere are essential, as they provide the only direct look at what is happening just below the solar surface.
The hierarchy of surface features found on the photosphere is the visible representation of the plasma flows beneath the photosphere. These features are customarily classified by size and lifetime as patterns of granulation (1 Mm, 0.2 hr), mesogranulation (5-10 Mm, 5 hr) and supergranulation (15-35 Mm, 24 hr). They were initially regarded as the direct manifestation of convection cells of various sizes existing in the convection zone (Schrijver et al., 1997; Raju et al., 1999); more recently, the idea has been gaining ground that meso- and supergranulation are signatures of a collective interaction of granular cells (Rast, 2003; Roudier et al., 2003; Berrilli et al., 2005). Despite years of intensive study, the character of their motions is still not completely understood (Beck & Duvall, 2000; Krishan et al., 2002; Berrilli et al., 2004; Del Moro et al., 2004; DeRosa & Toomre, 2004). The aim of the study we present is to investigate the origin of the supergranular (SG) flow field: whether it is directly convective or the result of a collective interaction of smaller convective features.

Send offprint requests to: delmoro@roma2.infn.it

The study performed by Simon & Leighton (1964) initiated the campaign to characterize supergranular flows. Outflows on SG scales have been observed to sweep embedded granules and magnetic flux elements toward convergence lanes between cells (Leighton, 1964; Zwaan, 1978; Rimmele, 1989; Shine et al., 2000). Such behaviour makes the chromospheric Ca II K line a good proxy for the network of intercellular lanes, because of the higher density of magnetic elements on the SG perimeters.
The advent of full-disk Doppler imaging, provided by the MDI instrument onboard the SOHO spacecraft, has considerably improved our capability to study such features (Hathaway et al., 2002; DeRosa & Toomre, 2004; Paniveni et al., 2004; Meunier et al., 2007). Direct observations of supergranular flows are nevertheless still hindered by three facts: there is no contrast on supergranular scales in visible light, observations in Ca II only provide the cell network boundaries, and Doppler images show SG only away from disk centre.

At present, the only methods to reconstruct the full 3D vector velocity field are direct Doppler measurement in combination with a tracking-type measure for the horizontal velocity component (above the τ = 1 surface) or Local Helioseismology (below the τ = 1 surface). To gain complete insight into the dynamics of the plasma flows inside a SG structure, we need a spatial and temporal resolution not yet reached by local helioseismology, while obtaining the 3D velocity field through the other method requires observations with very high spatial, spectral and temporal resolution. With the assumption that granule motions are mainly driven by plasma flows (Rieutord et al., 2001), it is possible to employ the TST to infer the horizontal velocity field.

In this work we reconstruct the 3D velocity field of a single SG structure and investigate in detail its plasma flow using data acquired with the IBIS spectrometer, trying to discern whether the SG pattern has a convective nature or originates from the interaction of small-scale structures.

http://arxiv.org/abs/0704.0578v2
Del Moro et al.: 3D photospheric velocity field of a SG cell

Fig. 1. A representative synoptic panel from the 16th October 2003 dataset. Upper left panel: Ca II wing intensity image. Upper middle panel: Doppler velocity field computed from the Fe I 709.0 nm line scan. Upper right panel: Fe I 709.0 nm line core intensity. Lower left panel: Doppler velocity field computed from the Fe II 722.4 nm line scan. Lower middle panel: Fe II 722.4 nm line core intensity. Lower right panel: Continuum (near 709.0 nm) intensity image.

Line    Wavelength [nm]   zTcore [km]   FWHM RF [km]   zVline [km]   FWHM RF [km]
Fe I    709.0             ≃100          ∼300           ≃140          ∼300
Fe II   722.4             ≃50           ∼200           ≃70           ∼200

Table 1. Line RF peak depths. Depths are in km above the level τ500nm = 1.

2. Observations

The data utilized in this analysis were acquired with the IBIS (Interferometric BIdimensional Spectrometer) 2D spectrometer (Cavallini et al., 2001; Cavallini, 2006) on October 16, 2003 (from 14:24 UT to 17:32 UT). We imaged a roundish network cell near the solar disk centre (SLAT=7.8N, SLONG=3.6E). When observed in MDI high-resolution magnetograms, all the features outlining the cell exhibit negative polarity and seem to survive for at least 10 hours, with little or no evolution.

The full dataset consists of 600 sequences, each containing a 16-image scan of the Fe I 709.0 nm line, a 14-image scan of the Fe II 722.4 nm line and 5 spectral images in the wing (line centre + 12 nm) of the Ca II 854.2 nm line, imaging a round Field of View (FoV) of about 80″ diameter. Each monochromatic image was acquired with a 25 ms exposure time by a 12-bit CCD detector, whose pixel scale was 0.17″ pixel⁻¹. The time required for the acquisition of a single sequence was 19 s, which sets the temporal resolution. Each image was reduced with the standard IBIS pipeline (Janssen & Cauzzi, 2006; Giordano et al., 2007), correcting for CCD non-linearity effects, dark current, gain table and blue shift. The Line of Sight (LoS) velocity fields were computed for the Fe I and Fe II lines from Doppler shifts, evaluated pixel by pixel by fitting a Gaussian to the line profile. In order to remove the orbital contribution, we set to zero the average value of each LoS velocity image. The 5-minute oscillations were removed by applying a 3D Fourier filter in the $k_h - \omega$ domain with a cut-off velocity of 7 km·s−1 to both the intensity and the velocity image series.
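The LoS velocity step described above (a Gaussian fit to each pixel's line profile, followed by subtraction of the image mean) can be sketched in a few lines. This is a simplified stand-in and not the IBIS pipeline: instead of a full least-squares fit, it locates the line centre from the three samples around the profile minimum, using the fact that the logarithm of a Gaussian absorption depth is a parabola. It assumes a continuum normalized to 1, uniform wavelength sampling, noise low enough that the three depths stay positive, and a sign convention in which redshift is positive. All function names and the synthetic test profile are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def gaussian_line_centre(wavelengths, intensities, continuum=1.0):
    """Centre of a Gaussian absorption line from the three samples around its minimum.

    ln(depth) of a Gaussian is a parabola, so the vertex of the parabola through
    three equally spaced samples gives the line centre.
    """
    depths = [continuum - value for value in intensities]
    # Index of the deepest sample, excluding the end points so both neighbours exist.
    k = max(range(1, len(depths) - 1), key=lambda j: depths[j])
    la, lb, lc = math.log(depths[k - 1]), math.log(depths[k]), math.log(depths[k + 1])
    step = wavelengths[k + 1] - wavelengths[k]
    return wavelengths[k] + 0.5 * step * (la - lc) / (la - 2.0 * lb + lc)


def doppler_velocity(line_centre, rest_wavelength):
    """LoS velocity in km/s; positive means redshift (assumed convention)."""
    return C_KM_S * (line_centre - rest_wavelength) / rest_wavelength


def remove_mean(image):
    """Subtract the image average, as done in the text to remove the orbital contribution."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in image]


def synthetic_profile(rest_wavelength, velocity_km_s, n_points=16, step=0.004, sigma=0.006):
    """Noise-free Gaussian absorption line shifted by the given velocity (test data only)."""
    centre = rest_wavelength * (1.0 + velocity_km_s / C_KM_S)
    grid = [rest_wavelength + (j - (n_points - 1) / 2.0) * step for j in range(n_points)]
    profile = [1.0 - 0.6 * math.exp(-((w - centre) ** 2) / (2.0 * sigma ** 2)) for w in grid]
    return grid, profile


if __name__ == "__main__":
    grid, profile = synthetic_profile(709.0, 1.0)
    print(doppler_velocity(gaussian_line_centre(grid, profile), 709.0))
```

On real data a least-squares Gaussian fit over the whole scan is more robust to noise than the three-point estimate used here; the three-point form is chosen only to keep the example dependency-free.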
After the whole reduction process and selecting only the period of good seeing we are left with a 30 minutes dataset imaging a square FoV of ∼ 50”. An example of the im- ages of this reduced dataset is shown in Fig. 1. The mean resolution due to the seeing of the CaII images is 0.35”; the mean resolution of the LoS velocity images, the inten- sity continuum images and the line core images is instead 0.45”, somewhat degraded by both the reduction pipeline and the kh − ω filtering. In order to obtain information about the depth dependence of a photospheric quantity by associating a suitable ‘for- mation zone’ with a line, it is possible to consider its ef- Del Moro et al.: 3D photospheric velocity field of a SG cell 3 fect on the line characteristic as linear perturbations and to study the Response Function RF of the emergent line characteristic at the observed wavelengths within the line. In particular, the RF Ip is, at each depth, the function we must use to weigh the perturbation p in order to get the variation of the emergent intensity I (Caccin et al., 1977). This approach has been employed to derive the RF IT and Fig. 2. Core intensity fields of FeII 722.4 nm (z≃50 km) and FeI 709.0 nm (z≃100 km ) in comparison with con- tinuum image (z≃0 km) and CaII 854.2 nm wing intensity field (z≃150 km) (Qu & Xu , 2002). The z axis is greatly exaggerated with respect to the x-y axes in order to allow a better visualization. V for the spectral lines FeI 709.0 nm and FeII 722.4 nm (Del Moro, 2005). In Table 1 we report the photospheric depths of the line core RF IT maximum (zTcore), and of the mean RF V maxi- mum (zVline) for the two spectral lines. We also report the RF full width at half maximum for the two spectral lines: these rather large values imply broad formation zones for both the FeI 709.0 nm and FeII 722.4 nm Doppler velocity and core intensity signals. 3. 
3D Velocity Field

The TST procedure (Del Moro, 2004) has been applied to the continuum image series and to both the FeII 722.4 nm and FeI 709.0 nm Doppler field series, in order to retrieve the horizontal velocity field at different depths of the solar atmosphere. To minimize the effect of the proper motion of the granules, which are used as trackers of the mean plasma flow, we computed the horizontal velocity field using all the structures that were tracked, so that statistically at least one tracker is present in each interpolated horizontal velocity field pixel, as suggested by Behan (2000). This means we used a grid step of ∼ 1.5 Mm and a temporal window of ∼ 30 min. This may not completely remove the noise associated with granule proper motions or residuals from the 5-min oscillation filtering, but it should minimize it.

Combining the horizontal velocity fields retrieved from the Dopplergrams by the TST and the Doppler LoS velocity, the 3D vector field has been reconstructed for the FeII 722.4 nm and FeI 709.0 nm lines. We are aware that associating the vector fields with precise heights in the photosphere is an oversimplification, as can be readily understood from the large FWHM reported in Table 1; nevertheless, we did it for the sake of a good visualization: in Fig. 3 we show the mean 3D velocity field associated with the dataset. In both the 3D fields we retrieved, the vector velocity appears to be structured quite coherently with the SG feature visible in the CaII wing images.

Fig. 3. 3D representation of velocity vectors extracted from continuum (z≃0 km), FeII 722.4 nm (z≃70 km) and FeI 709.0 nm (z≃140 km). Cone size is proportional to the velocity vector modulus: the yellowish cone corresponds to 1 km·s−1. The z axis is greatly exaggerated with respect to the x-y axes in order to allow a better visualization.

In order to further investigate the structuring of the velocity field, we computed the average continuum (upper panel of Fig.
4) and CaII wing (bottom panel of Fig. 4) intensity images, the average Dopplergram from FeII 722.4 nm (middle panel of Fig. 4) and the average Dopplergram from FeI 709.0 nm (upper panel of Fig. 4) and correlated them. While the average continuum image does not show any evident signal, there is a significant correlation between strong downflows and bright CaII features, in particular for the complex cluster of features in the lower part of the FoV. This issue can be at least partially explained by the coherence of the 3D velocity field with the SG structure.

We expanded this study by comparing the averaged images with the horizontal velocity field extracted by the TST from granules as seen in the continuum and from up-flows in the FeI 709.0 nm Doppler images. We excluded the FeII 722.4 nm Doppler images from this analysis because we found their horizontal velocity field to be less reliable than the others. This is due to the TST finding a less than optimal number of features to track because of the shallowness of the FeII 722.4 nm line. The Doppler shift of a shallow line is much harder to measure with the LoS velocity reconstruction procedure, resulting in a noisier Dopplergram. This noise is interpreted by the TST as a fast variation of the structures, therefore causing many of them to be rejected for the tracking. As a consequence, the TST finds too few trackers in the 722.4 Doppler images for the divergence field to be reliably reconstructed.

Fig. 4. Upper panel: average continuum image with the horizontal velocity field (obtained by tracking granules) represented as red arrows. The granules were tracked by applying the TST to the continuum image time series. Lower panel: divergence field computed from the interpolated horizontal velocity field.

The horizontal velocity fields extracted by the TST are shown superimposed on the average images in the left panels of Fig. 4 and Fig.
5, above the associated 2D divergence images. The values of the divergence fields range from +0.25 km s−1 Mm−1 in the brightest part of the image to −0.25 km s−1 Mm−1 in the darkest parts. The continuum granules and FeI 709.0 nm up-flow fields show a divergent flow from the centre of the SG structure and convergent flows at the border of the SG structure. In detail, these two fields agree very well, showing a single, large divergent feature in the centre of the SG, whose mean value is about +0.1 km s−1 Mm−1, almost completely surrounded by convergent flows of the same magnitude. The peak divergence signals we retrieved both in the centre and in the periphery of the SG cell are an order of magnitude larger than the averaged values reported by Meunier et al. (2007). This discrepancy probably stems mainly from the different temporal and spatial averaging processes in the divergence reconstruction and marginally from the different resolution of the two datasets. The structuring of the divergence field is fully compatible with a net flow from the centre of the SG to its border.

Fig. 5. Upper panel: average FeI 709.0 nm Doppler velocity image with the horizontal velocity field (obtained by tracking up-flows) represented as red arrows. The up-flows were tracked by applying the TST to the FeI 709.0 nm Doppler velocity field time series. Lower panel: divergence field computed from the interpolated horizontal velocity field.

Moreover, examining the LoS velocity fields, we found a strong and stable up-flow region near the divergence maximum, with a mean FeII 722.4 nm Doppler velocity value of Vc ∼ 200 m·s−1 for almost the whole time span. This last region is likely to be the origin of the divergence signal we measured, possibly as suggested by Rieutord et al. (2000) and Roudier et al. (2003). Observing the divergence images (bottom panels of Fig. 4 and Fig.
5) extracted from the horizontal velocity fields, the supergranule is outlined by convergences on ∼ 66% of its circumference, while the bright cluster area clearly visible in the CaII wing image does not seem to be a region of strong convergence, despite the fact that it is mostly formed of down-flows. This region has a mean FeII 722.4 nm Doppler velocity value Vc ∼ −100 m·s−1 for the whole dataset time span (T ≃ 0.5 hour), and it seems to maintain similar values also for the part of the observations discarded because of the loss of spatial resolution due to worsening seeing conditions. A similar cluster of bright CaII structures is present in the upper-left part of the FoV, but its associated downflow shows a much smaller coherence: it has a mean FeII 722.4 nm Doppler velocity value Vc ∼ −50 m·s−1 for more than half of the time span, then it drops to ∼ −20 m·s−1. Whether or not downflow regions like these may be organizing the SG pattern, as predicted by Rast (2003), is a question we cannot address due to the short duration of our dataset.

4. Horizontal Flow Analysis via Cork Tracking

To further extract information about the plasma motion inside the SG structure, we tracked the evolution of tracers (corks) passively advected by instantaneous velocity and intensity fields. The corks, initially randomly spread over the FoV, are moved following the local gradient towards sites of minimum intensity or of minimum velocity in the case of intensity or velocity fields, respectively. We compare the final and initial positions of the corks, which gives us information about the motion of downflows in the field of view. Corks are tracked for ∼ 16 minutes (a time sufficiently longer than the characteristic time scale of photospheric fields (Müller et al., 2001; Berrilli et al., 2002) to let the cork settle in a downflow feature and track it for a while) and their initial and final positions are stored.
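The cork motion described above can be illustrated on a single static field. This is only a sketch under simplifying assumptions: the field is a hypothetical synthetic grid with two sinks rather than an observed, time-evolving image, each cork moves one pixel per step to its lowest neighbour, and the function names are illustrative.

```python
import math


def advect_cork(field, start, max_steps=100):
    """Move a cork downhill on a 2D grid until it sits in a local minimum."""
    rows, cols = len(field), len(field[0])
    r, c = start
    for _ in range(max_steps):
        best = (r, c)
        # Look for the lowest value among the 8 neighbouring pixels.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and field[rr][cc] < field[best[0]][best[1]]:
                    best = (rr, cc)
        if best == (r, c):  # no lower neighbour: the cork has settled
            break
        r, c = best
    return (r, c)


def radial_displacement(start, end, centre):
    """Final minus initial distance from the image centre, in pixels."""
    rho_start = math.hypot(start[0] - centre[0], start[1] - centre[1])
    rho_end = math.hypot(end[0] - centre[0], end[1] - centre[1])
    return rho_end - rho_start


# Hypothetical 10x10 field with two sinks, at pixels (2, 2) and (7, 7).
field = [[min((r - 2) ** 2 + (c - 2) ** 2, (r - 7) ** 2 + (c - 7) ** 2)
          for c in range(10)] for r in range(10)]

end_a = advect_cork(field, (0, 0))
end_b = advect_cork(field, (9, 9))
delta_rho = radial_displacement((4, 5), advect_cork(field, (4, 5)), (4.5, 4.5))
```

Corks that start at different positions but end in the same sink share their final position, which is the behaviour the scatter-plot analysis relies on.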
As corks tend to accumulate in long-lasting downflow structures, new corks are added every ∼ 5 minutes in order to also track structures forming during the observations. In Fig. 6 we report the result of the cork tracing for a continuum image series and for both the FeII 722.4 nm and the FeI 709.0 nm Doppler fields. In particular, we plotted the difference between the final and initial distances of the corks from the image centre versus their initial distances from the image centre. The alignment effect of the scatter plot is due to an inverse linear relationship between the ρstart and the δρ of corks with different initial positions which end in the same ‘attractor’ and therefore share their final position. As the image is centred on the SG structure, this gives us information about a possible difference of mean flows inside and outside the SG. In order to investigate the properties of the distribution, we fit a sigmoidal function to the scatter plots:

$$ \frac{A_1 - A_2}{1 + e^{(x - x_0)/dx}} + A_2 \qquad (1) $$

so that x0 tells where the transition between the two values A1 and A2 of the distribution takes place and dx tells how fast this transition is. The parameters of the fit are retrieved by a recursive Levenberg-Marquardt minimization algorithm. The fits agree in retrieving positive values of A1 and near-zero values for A2 (Table 2). This means that the corks inside a circle of radius x0 from the image centre tend to increase their radial distance, while the corks outside have no preferred direction in their motion. Several simulations on randomly generated velocity fields showed that we can neglect the contribution from corks whose initial position is so near to the image centre that they are biased towards positive radial displacement. Finally, we tested the robustness of these results against the initial guesses and against the SG centre position in the FoV.
The retrieved parameters do not depend on these factors, as long as the initial guesses are of the same order of magnitude as the convergence values or the FoV shift is less than 2.5 Mm. In Fig. 4 we show the mean intensity fields associated with the plots in Fig. 6, with the location of the change of the A value superimposed as an annulus of mean radius x0 and thickness 2dx. The three annular shapes essentially agree in retrieving the same SG diameter of ∼ 25 Mm. The width of the annuli, instead, seems to depend on the atmospheric altitude. In the upper panel of Fig. 8 we report the value of dx computed by the sigmoidal fits versus the photospheric height. Error bars represent the standard deviation from the fit.

Fig. 6. Cork displacements versus initial positions. Top Panel: results of the cork tracking for the intensity field from continuum images. Central Panel: results of the cork tracking for the vertical velocity field extracted from FeII 722.4 nm. Bottom Panel: results of the cork tracking for the vertical velocity field extracted from FeI 709.0 nm. Each scatter plot has been fitted with a sigmoidal function (equation 1). The retrieved fits are overplotted on the relative scatter plots.

                          A1 [Mm]        A2 [Mm]         x0 [Mm]       dx [Mm]
Continuum Intensity       0.49 ± 0.02    0.08 ± 0.01     11.3 ± 0.2    0.3 ± 0.2
FeII 722.4 LoS Velocity   0.61 ± 0.03    −0.09 ± 0.02    12.0 ± 0.2    1.1 ± 0.2
FeI 709.0 LoS Velocity    0.67 ± 0.08    −0.09 ± 0.03    10.9 ± 0.7    2.9 ± 0.6

Table 2. Parameters of the sigmoidal fits to the scatter plots reported in Fig. 6.

Recently, Berrilli et al. (2002) found a similar height dependence of the statistical properties of granular flows. In particular, they reported an intense braking in the first ∼ 120 km of the photosphere, confirmed by Puschmann et al. (2005), and a damping effect that filtered out small features in higher atmospheric layers, letting only large flow features penetrate into the upper photosphere.
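To make the role of the four fit parameters concrete, the sketch below evaluates the sigmoidal function of equation 1 with the best-fit values listed in Table 2. The function name and the choice of sampling the curve at x0 ± 5dx are illustrative assumptions; the fitting itself (Levenberg-Marquardt in the text) is not reproduced here.

```python
import math


def sigmoid(x, a1, a2, x0, dx):
    """Equation 1: tends to a1 for x << x0 and to a2 for x >> x0."""
    return (a1 - a2) / (1.0 + math.exp((x - x0) / dx)) + a2


# Best-fit parameters in Mm, copied from Table 2: (A1, A2, x0, dx).
fits = {
    "Continuum Intensity": (0.49, 0.08, 11.3, 0.3),
    "FeII 722.4 LoS Velocity": (0.61, -0.09, 12.0, 1.1),
    "FeI 709.0 LoS Velocity": (0.67, -0.09, 10.9, 2.9),
}

for name, (a1, a2, x0, dx) in fits.items():
    inside = sigmoid(x0 - 5 * dx, a1, a2, x0, dx)   # well inside the annulus
    middle = sigmoid(x0, a1, a2, x0, dx)            # centre of the transition
    outside = sigmoid(x0 + 5 * dx, a1, a2, x0, dx)  # well outside the annulus
    print(f"{name}: {inside:.2f} {middle:.2f} {outside:.2f} Mm")
```

At x = x0 the function equals the average of A1 and A2, and almost all of the transition takes place within a few dx of x0, which is why the annulus of radius x0 and thickness 2dx is a natural way to draw the SG border.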
The same process can explain the broadening of the SG border we found: in higher layers the corks are collected in larger and fewer downflow structures. As more corks are collected by the same structures, the number of independent tracers decreases and so does the precision with which the boundary is retrieved. On the other hand, we can exclude that such a smoothing effect is due to data reduction or seeing, because in that case the SG border retrieved from the FeI 709.0 nm LoS field would have been thinner than the one retrieved from the FeII 722.4 nm LoS field, as the latter shows lower contrast features, more prone to be degraded by the loss of spatial resolution.

Due to the form of equation 1, the difference A1 − A2, divided by the time allotted to the corks to move, gives the mean radial velocity experienced by the corks. We plot in the bottom panel of Fig. 8 the radial velocity retrieved from the three scatter plots as a function of the photospheric altitude. Error bars represent the standard deviation from the fit. To account for these results, we assume that the large and more coherent features present in the LoS Dopplergrams are reliable for retrieving the radial velocity, while the measure from the continuum images is somewhat reduced by the presence of tiny structures which are more turbulent in their motion. Such structures are not present in the Dopplergrams of the higher layers because of the damping effect already discussed. We therefore discard the value obtained from the WL dataset because it is probably smeared by the turbulent motions of very small scale features and take into account only the two values retrieved from the higher layers, retrieving a mean velocity of 0.75 ± 0.05 km s−1. Such a value for the flows from the SG structure centre is consistent with the literature (Simon & Leighton, 1964; Hathaway et al., 2002; Paniveni et al., 2004; Meunier et al., 2007).

5.
Conclusions

The study of the full 3D velocity field of a SG shows that strong downflows are located on the border of the supergranular structure, but also that the mean granular flow regresses from the centre to the periphery of the SG.

The divergence images show that the SG structure is outlined by convergence sites on ∼ 66% of its border. The retrieved divergence values show a nearly radial flow of ∼ 0.1 km s−1 Mm−1 from the centre of the SG and convergent flows of the same magnitude at its border.

The analysis of the evolution of passive tracers on intensity and velocity fields shows that inside the SG structure there is a preferential radial flow towards the SG border of 0.75 ± 0.05 km s−1.

The height behaviour of the thickness of the SG border, again retrieved via cork tracing, shows an increase of the border width with height. This is probably due to a filtering effect with height, which preferentially allows large flow features to penetrate into the upper photospheric layers.

The large, CaII-bright cluster of structures in the lower part of the FoV is not a site of strong convergence, but is a site of long-lasting downflows. We also found a strong and stable upflow near the centre of the cell, likely to organize the CaII bright structures by sweeping them out of the SG cell.

The results presented in this paper are extracted from a subset of a longer time series of excellent spectral and temporal resolution, but varying spatial quality due to seeing. The 30 min subset used is characterized by a constant and good spatial resolution. This allowed us to detect precisely the flow associated with the SG. However, our analysis would have greatly benefited from a longer time sequence and other SG structures to analyze. In the future, we plan to apply this analysis to a collection of SG structures, so as to derive some statistical descriptors and possibly generalize the results.

Acknowledgements. We thank the referee, T.
Roudier, for suggestions and comments that have significantly improved this paper. Part of this work was supported by Rome “Tor Vergata” University Physics Department grants. The data were acquired by instruments operated by the National Solar Observatory. The National Solar Observatory is a Division of the National Optical Astronomy Observatories, which is operated by the Association of Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science Foundation. DDM thanks the High Altitude Observatory for support and C. Sormani for helpful comments. The authors acknowledge K. Janssen for the development of the IBIS data reduction pipeline, V. Penza for the calculation of the line RFs and M. Rast for very useful discussions and comments.

References

Behan, A. 2000, Proceedings of the 19th ISPRS Congress and Exhibition - Geoinformation for All. Amsterdam, The Netherlands, 16th - 23rd July 2000.
Beck, J. G., Duvall, T. L., Jr. 2000, BAAS, 32, 802
Berrilli, F., Consolini, G., Pietropaolo, E., Caccin, B., Penza, V., Lepreti, F. 2002, A&A, 381, 253
Berrilli, F., Del Moro, D., Consolini, G., Pietropaolo, E., Duvall, T. L., Jr., Kosovichev, A. G. 2004, Sol. Phys., 221, 33
Berrilli, F., Del Moro, D., Russo, S., Consolini, G., Straus, Th. 2005, ApJ, 632, 677
Caccin, B., Gomez, M. T., Marmolino, C., Severino, G. 1977, A&A, 54, 227
Cavallini, F., Berrilli, F., Cantarano, S., Egidi, A. 2001, Mem. SaIt, 72, 554
Cavallini, F. 2006, Sol. Phys., 236, 415
Del Moro, D., Berrilli, F., Duvall, T. L., Jr., Kosovichev, A. G. 2004, Sol. Phys., 221, 23
Del Moro, D. 2004, A&A, 428, 1007
Del Moro, D. 2005, PhD Thesis
DeRosa, M. L., Toomre, J. 2004, ApJ, 616, 1242
Deubner, F.-L. 1971, Sol. Phys., 17, 6
Frazier, E. N. 1970, Sol. Phys., 14, 89
Giordano, S., Del Moro, D., Berrilli, F. 2007, submitted
Hathaway, D. H., Beck, J. G., Han, S., Raymond, J. 2002, Sol. Phys., 205, 25
Janssen, K., Cauzzi, G.
2006, A&A, 450, 365
Krishan, V., Paniveni, U., Singh, Jagdev, Srikanth, R. 2002, MNRAS, 334, 230
Leighton, R. B. 1964, ApJ, 140, 1547
Meunier, N., Tkaczuk, R., Roudier, Th., Rieutord, M. 2007, A&A, 461, 1141
Müller, D. A. N., Steiner, O., Schlichenmaier, R., Brandt, P. N. 2001, Sol. Phys., 203, 211
Musman, S., Rust, D. S. 1970, Sol. Phys., 13, 261
Paniveni, U., Krishan, V., Singh, Jagdev, Srikanth, R. 2004, MNRAS, 347, 1279
Puschmann, K. G., Ruiz Cobo, B., Vázquez, M., Bonet, J. A., Hanslmeier, A. 2005, A&A, 441, 1157
Qu, Z., Xu, Z. 2002, Chin. J. Astron. Astrophys., 2, 71
Rast, M. P. 2003, ApJ, 597, 1200
Raju, K. P., Srikanth, R., Singh, Jagdev 1999, BASI, 27, 65
Rieutord, M., Roudier, Th., Malherbe, J. M., Rincon, F. 2000, A&A, 357, 1063
Rieutord, M., Roudier, Th., Ludwig, H. G., Nordlund, Å., Stein, R. 2001, A&A, 377, L14
Rimmele, T., Schroeter, E. H. 1989, A&A, 221, 137
Roudier, Th., Lignieres, F., Rieutord, M., Brandt, P. N., Malherbe, J. M. 2003, A&A, 409, 301
Schrijver, C. J., Hagenaar, H. J., Title, A. M. 1997, ApJ, 475, 328
Shine, R. A., Simon, G. W., Hurlburt, N. E. 2000, Sol. Phys., 193, 313
Simon, G. W., Leighton, R. B. 1964, ApJ, 140, 1120
Zwaan, C. 1978, Sol. Phys., 60, 213

Fig. 7. Mean images with the SG dimension extracted from the cork tracking superimposed. Top Panel: mean FeI 709.0 nm Doppler image with SG extracted from FeI 709.0 nm Dopplergrams (∼ 140 km). Central Panel: mean FeII 722.4 nm Doppler image with SG extracted from FeII 722.4 nm Dopplergrams (∼ 70 km). Bottom Panel: mean CaII 854.2 nm wing image with SG extracted from the intensity continuum images (∼ 0 km).

(Figure 8 plots; horizontal axis of both panels: Photospheric Altitude (km).)

Fig. 8. Top panel: dx (annulus width) versus photospheric height. Bottom panel: radial velocity versus photospheric height.
ABSTRACT

We investigate the plasma flow properties inside a Supergranular (SG) cell, in particular its interaction with small scale magnetic field structures. The SG cell has been identified using the magnetic network (CaII wing brightness) as proxy, applying the Two-Level Structure Tracking (TST) to high spatial, spectral and temporal resolution observations obtained by IBIS. The full 3D velocity vector field for the SG has been reconstructed at two different photospheric heights. In order to strengthen our findings, we also computed the mean radial flow of the SG by means of cork tracing. We also studied the behaviour of the horizontal and Line of Sight plasma flow cospatial with a cluster of bright CaII structures of magnetic origin to better understand the interaction between photospheric convection and small scale magnetic features. The SG cell we investigated seems to be organized with an almost radial flow from its centre to the border. The large scale divergence structure is probably created by a compact region of constant up-flow close to the cell centre. On the edge of the SG, isolated regions of strong convergent flow are nearby or cospatial with extended clusters of bright CaII wing features forming the knots of the magnetic network.
<|endoftext|><|startoftext|> Introduction

According to the current cosmological paradigm, large structures in the Universe form hierarchically. Clusters of galaxies are the largest structures that have grown through mergers of smaller units and have achieved near dynamical equilibrium. In the hierarchical scenario, clusters are a rather young population, and we should be able to observe their formation process even at rather low redshifts. A signature of such a process is the presence of cluster substructures.
A cluster is said to contain substructures (or subclusters) when its surface density is characterized by multiple, statistically significant peaks on scales larger than the typical galaxy size, with “surface density” being referred to the cluster galaxies, the intra-cluster (IC) gas or the dark matter (DM hereafter; Buote 2002).

Studying cluster substructure therefore allows us to investigate the process by which clusters form, constrain the cosmological model of structure formation, and ultimately test the hierarchical paradigm itself (e.g. Richstone et al. 1992; Mohr et al. 1995; Thomas et al. 1998). In addition, it also allows us to better understand the mechanisms affecting galaxy evolution in clusters, which can be accelerated by the perturbative effects of a cluster-subcluster collision and of the tidal field experienced by a group accreting onto a cluster (Bekki 1999; Dubinski 1999; Gnedin 1999). If clusters are to be used as cosmological tools, it is important to calibrate the effects substructures have on the estimate of their internal properties (e.g. Schindler & Müller 1993; Pinkney et al. 1996; Roettiger et al. 1998; Biviano et al. 2006; Lopes et al. 2006). Finally, detailed analyses of cluster substructures can be used to constrain the nature of DM (Markevitch et al. 2004; Clowe et al. 2006).

Send offprint requests to: Massimo Ramella, ramella@oats.inaf.it
⋆ Figure 6 is only available in electronic form via http://www.edpsciences.org

The analysis of cluster substructures can be performed using the projected phase-space distribution of cluster galaxies (e.g. Geller & Beers 1982), the surface-brightness distribution and temperature of the X-ray emitting IC gas (e.g. Briel et al. 1992), or the shear pattern in the background galaxy distribution induced by gravitational lensing, that directly samples substructure in the DM component (e.g. Abdelsalam et al. 1998).
None of these tracers of cluster substructure (cluster galaxies, IC gas, background galaxies) can be considered optimal. The identification of substructures is in fact subject to different biases depending on the tracer used. In X-rays projection effects are less important than in the optical, but the identification of substructures is more subject to a z-dependent bias, arising from the point spread function of the X-ray telescope and detector (e.g. Böhringer & Schuecker 2002). Moreover, the different cluster components respond in a different way to a cluster-subcluster collision. The subcluster IC gas can be ram-pressure braked and stripped from the colliding subcluster and lags behind the subcluster galaxies and DM along the direction of collision (e.g. Roettiger et al. 1997; Barrena et al. 2002; Clowe et al. 2006). Hence, it is equally useful to address cluster substructure analysis in the X-ray and in the optical.

http://arxiv.org/abs/0704.0579v1
M. Ramella et al.: Substructures in the WINGS clusters

Traditionally, the first detections of cluster substructures were obtained from the projected spatial distributions of galaxies (e.g. Shane & Wirtanen 1954; Abell et al. 1964), in combination, when possible, with the distribution of galaxy velocities (e.g. van den Bergh 1960, 1961; de Vaucouleurs 1961). Increasingly sophisticated techniques for the detection and characterization of cluster substructures have been developed over the years (see Moles et al. 1986; Perea et al. 1986a,b; Buote 2002; Girardi & Biviano 2002, and references therein). In many of these techniques substructures are identified as deviations from symmetry in the spatial and/or velocity distribution of galaxies and in the X-ray surface-brightness (e.g. West et al. 1988; Fitchett & Merritt 1988; Mohr et al. 1993; Schuecker et al. 2001).
In other techniques substructures are identified as significant peaks in the surface density distribution of galaxies or in the X-ray surface brightness, either as residuals left after the subtraction of a smooth, regular model representation of the cluster (e.g. Neumann & Böhringer 1997; Ettori et al. 1998), or in a non-parametric way, e.g. by the technique of wavelets (e.g. Escalera et al. 1994; Slezak et al. 1994; Biviano et al. 1996) and by adaptive-kernel techniques (e.g. Kriessler & Beers 1997; Bardelli et al. 1998a, 2001).

The performances of several different methods have been evaluated both using numerical simulations (e.g. Mohr et al. 1995; Crone et al. 1996; Pinkney et al. 1996; Buote & Xu 1997; Cen 1997; Valdarnini et al. 1999; Knebe & Müller 2000; Biviano et al. 2006) and by applying different methods to the same cluster data-sets and examining the differences in the results (e.g. Escalera et al. 1992, 1994; Mohr et al. 1995, 1996; Kriessler & Beers 1997; Fadda et al. 1998; Kolokotronis et al. 2001; Schuecker et al. 2001; Lopes et al. 2006). Generally speaking, the sensitivity of substructure detection increases with both increasing statistics (e.g. more galaxies or more X-ray photons) and increasing dimensionality of the test (e.g. using galaxy velocities in addition to their positions, or using X-ray temperature in addition to X-ray surface brightness).

Previous investigations have found very different fractions of clusters with substructure in nearby clusters, depending on the method and tracer used for substructure detection, on the cluster sample, and on the size of sampled cluster regions (e.g. Geller & Beers 1982; Dressler & Shectman 1988; Mohr et al. 1995; Girardi et al. 1997; Kriessler & Beers 1997; Jones & Forman 1999; Solanes et al. 1999; Kolokotronis et al. 2001; Schuecker et al. 2001; Flin & Krywult 2006; Lopes et al. 2006).
Although the distribution of subcluster masses has not been determined observationally, it is known that subclusters of ∼ 10% the cluster mass are typical, while more massive subclusters are less frequent (Escalera et al. 1994; Girardi et al. 1997; Jones & Forman 1999). The situation is probably different for distant clusters, which tend to show massive substructures more often than nearby clusters, clearly suggesting that the hierarchical growth of clusters was more intense in the past (e.g. Gioia et al. 1999; van Dokkum et al. 2000; Haines et al. 2001; Maughan et al. 2003; Huo et al. 2004; Rosati et al. 2004; Demarco et al. 2005; Jeltema et al. 2005).

Additional evidence for the hierarchical formation of clusters is provided by the analysis of brightest cluster galaxies (BCGs hereafter) in substructured clusters. BCGs usually sit at the bottom of the potential well of their host cluster (e.g. Adami et al. 1998b). When a BCG is found to be significantly displaced from its cluster dynamical center, the cluster displays evidence of substructure (e.g. Beers et al. 1991; Ferrari et al. 2006). From the correlation between cluster and BCG luminosities, Lin & Mohr (2004) conclude that BCGs grow by merging as their host clusters grow hierarchically. The related evolution of BCGs and their host clusters is also suggested by the alignment of the main cluster and BCG axes (e.g. Binggeli 1982; Durret et al. 1998). Both the BCG and the cluster axes are aligned with the surrounding large scale structure distribution, where infalling groups come from. These infalling groups are finally identified as substructures once they enter the cluster environment (Durret et al. 1998; Arnaud et al. 2000; West & Blakeslee 2000; Ferrari et al. 2003; Plionis et al. 2003; Adami et al. 2005). Hence, substructure studies really provide direct evidence for the hierarchical formation of clusters.
Concerning the impact of subclustering on global cluster properties, it has been found that subclustering leads to overestimating cluster velocity dispersions and virial masses (e.g. Perea et al. 1990; Bird 1995; Maurogordato et al. 2000), but not in the general case of small substructures (Escalera et al. 1994; Girardi et al. 1997; Xu et al. 2000). During the collision of a subcluster with the main cluster, both the X-ray emitting gas distribution and its temperature have been found to be significantly affected (e.g. Markevitch & Vikhlinin 2001; Clowe et al. 2006). As a consequence, it has been argued that substructure can explain at least part of the scatter in the scaling relations of optical-to-X-ray cluster properties (e.g. Fitchett 1988; Girardi et al. 1996; Barrena et al. 2002; Lopes et al. 2006).

As far as the internal properties of cluster galaxies are concerned, there is observational evidence that a higher fraction of cluster galaxies with spectral features characteristic of recent or ongoing starburst episodes is located in substructures or in the regions of cluster-subcluster interactions (Caldwell et al. 1993; Abraham et al. 1996; Biviano et al. 1997; Caldwell & Rose 1997; Bardelli et al. 1998b; Moss & Whittle 2000; Miller et al. 2004; Poggianti et al. 2004; Miller 2005; Giacintucci et al. 2006).

In this paper we search for and characterize substructures in the sample of 77 nearby clusters of the WIde-field Nearby Galaxy-cluster Survey (WINGS hereafter, Fasano et al. 2006). This sample is an almost complete sample in X-ray flux in the redshift range 0.04 < z < 0.07. We detect substructures from the spatial, projected distribution of galaxies in the cluster fields, using the adaptive-kernel based DEDICA algorithm (Pisani 1993, 1996).

In Sect. 2 we describe our data-set; in Sect. 3 we describe the procedure of substructure identification; in Sect. 4 we use Monte Carlo simulations in order to tweak our procedure; in Sect.
5 we describe the identification of substructures in our data-set; in Sect. 6 the catalog of identified substructures is provided. In Sect. 7 we investigate the properties of the identified substructures, and in Sect. 8 we consider the relation between the BCGs and the substructures. We provide a summary of our work in Sect. 9.

2. The Data

WINGS is an all-sky, photometric (multi-band) and spectroscopic survey, whose global goal is the systematic study of the local cosmic variance of the cluster population and of the properties of cluster galaxies as a function of cluster properties and local environment.

The WINGS sample consists of 77 clusters selected from three X-ray flux limited samples compiled from ROSAT All-Sky Survey data, with constraints just on the redshift (0.04 < z < 0.07) and distance from the galactic plane (|b| ≥ 20 deg). The core of the project consists of wide-field optical imaging of the selected clusters in the B and V bands. The imaging data were collected using the WFC@INT (La Palma) and the WFI@MPG (La Silla) in the northern and southern hemispheres, respectively.

The observation strategy of the survey favors the uniformity of photometric depth inside the different CCDs, rather than complete coverage of the fields that would require dithering. Thus, the gaps in the WINGS optical imaging correspond to the physical gaps between the different CCDs of the mosaics.

During the data reduction process, we give particular care to sky subtraction (also in presence of crowded fields including big halo galaxies and/or very bright stars), image cleaning (spikes and bad pixels) and star/galaxy classification (obtained with both automatic and interactive tools).

According to Fasano et al. (2006) and Varela et al. (2007), the overall quality of the data reported in the WINGS photometric catalogs can be summarized as follows: (i) the astrometric errors for extended objects have r.m.s.
∼0.2 arcsec; (ii) the average limiting magnitude is ∼24.0, ranging from 23.0 to 25.0; (iii) the completeness of the catalogs is achieved (on average) up to V ∼ 22.0; (iv) the total (systematic plus random) photometric r.m.s. errors, derived from both internal and external comparisons, vary from ∼0.02 mag, for bright objects, up to ∼0.2 mag, for objects close to the detection limit.

3. The DEDICA Procedure

We base our search for substructures in WINGS clusters on the DEDICA procedure (Pisani 1993, 1996). This procedure has the following advantages:

1. DEDICA gives a total description of the clustering pattern, in particular the membership probability and significance of structures besides geometrical properties;
2. DEDICA is scale invariant;
3. DEDICA does not assume any property of the clusters, i.e. it is completely non-parametric. In particular it does not require particularly rich samples to run effectively.

The basic nature and properties of DEDICA are described in Pisani (1993, 1996, and references therein). Here we summarize the main structure of the algorithm and how we apply it to our data sample. The core structure of DEDICA is based on the assumption that a structure (or a "cluster" in the algorithm jargon) corresponds to a local maximum in the density of galaxies.

We proceed as follows. First we need to estimate the probability density function Ψ(r_i) (with i = 1, ..., N) associated with the set of N galaxies with coordinates r_i. Second, we need to find the local maxima in our estimate of Ψ(r_i) in order to identify clusters and also to evaluate their significance relative to the noise. Third and finally, we need to estimate the probability that a galaxy is a member of the identified clusters.

3.1. The probability density

DEDICA is a non-parametric method in the sense that it does not require any assumption on the probability density function that it aims to estimate.
The only assumptions are that Ψ(r_i) must be continuous and at least twice differentiable.

The function f(r_i) is an estimate of Ψ(r_i) and it is built by using an adaptive kernel method given by:

$$ f_{ka}(\mathbf{r}) = \frac{1}{N} \sum_{i=1}^{N} K(\mathbf{r}_i, \sigma_i; \mathbf{r}) \qquad (1) $$

where we use the two-dimensional Gaussian kernel K(r_i, σ_i; r) centered in r_i with size σ_i.

The most valuable feature of DEDICA is the procedure to select the values of the kernel widths σ_i. It is possible to show that the optimal choice for σ_i, i.e. with asymptotically minimum variance and null bias, is obtained by minimizing the distance between our estimate f(r_i) and Ψ(r_i). This distance can be evaluated by a particular function called the integrated square error ISE(f), given by:

$$ ISE(f) = \int [\Psi(\mathbf{r}) - f(\mathbf{r})]^2 \, d\mathbf{r} \qquad (2) $$

Once the minimum ISE(f) is reached we have obtained the DEDICA estimate of the density as in Eq. 1.

3.2. Cluster Identification

The second step of DEDICA consists in the identification of the local maxima in f_ka(r). The positions of the peaks in the density function f_ka(r) are found as the solutions of the iterative equation:

$$ \mathbf{r}_{m+1} = \mathbf{r}_m + a \cdot \frac{\nabla f_{ka}(\mathbf{r}_m)}{f_{ka}(\mathbf{r}_m)} \qquad (3) $$

where a is a scale factor set according to optimal convergence requirements. The limit R of the sequence r_m defined in Eq. 3 depends on the starting position r_{m=1}:

$$ \lim_{m \to +\infty} \mathbf{r}_m = \mathbf{R}(\mathbf{r}_{m=1}) \qquad (4) $$

We run the sequence in Eq. 3 at each data position r_i. We label each data point with the limit R_i = R(r_{m=1} = r_i). Each limit R_i is the position of the peak to which the i-th galaxy belongs. In the case that all the galaxies are members of a unique cluster, all the labels R_i are the same. At the other extreme each galaxy is a one-member cluster and all R_i have different values. All the members of a given cluster belong to the same peak in f_ka(r) and have the same R_i. We identify cluster members by listing galaxies having the same values of R. We end up with ν different clusters each with n_µ (µ = 1, ..., ν) members.
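The density estimate of Eq. 1 and the peak-finding iteration of Eq. 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the DEDICA code: the kernel widths `sigmas` are taken as given (the ISE-based width selection is not reproduced), the function names are hypothetical, and the choice of the scale factor `a` is an assumption (for equal widths, `a = sigma**2` turns the update into a mean-shift step).

```python
import math

def kernel(r, ri, sigma):
    """Two-dimensional Gaussian kernel K(ri, sigma; r)."""
    d2 = (r[0] - ri[0]) ** 2 + (r[1] - ri[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def density(r, points, sigmas):
    """Kernel estimate f_ka(r) = (1/N) * sum_i K(ri, sigma_i; r), as in Eq. 1."""
    return sum(kernel(r, p, s) for p, s in zip(points, sigmas)) / len(points)

def gradient(r, points, sigmas):
    """Analytic gradient of f_ka at r for Gaussian kernels."""
    gx = gy = 0.0
    for p, s in zip(points, sigmas):
        k = kernel(r, p, s)
        gx += k * (p[0] - r[0]) / s ** 2
        gy += k * (p[1] - r[1]) / s ** 2
    n = len(points)
    return gx / n, gy / n

def climb(start, points, sigmas, a, tol=1e-9, max_iter=10000):
    """Iterate r_{m+1} = r_m + a * grad f / f (Eq. 3) until the step is tiny."""
    r = start
    for _ in range(max_iter):
        f = density(r, points, sigmas)  # assumed non-zero near the data
        gx, gy = gradient(r, points, sigmas)
        step = (a * gx / f, a * gy / f)
        r = (r[0] + step[0], r[1] + step[1])
        if math.hypot(step[0], step[1]) < tol:
            break
    return r

def label_points(points, sigmas, a, digits=2):
    """Label every data point with the (rounded) peak it converges to."""
    peaks = [climb(p, points, sigmas, a) for p in points]
    return [(round(x, digits), round(y, digits)) for x, y in peaks]
```

Points that share a label belong to the same peak; counting the distinct labels gives the number of structures found in this toy setting.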
In order to maintain a coherent notation, we identify with the label µ = 0 the n_0 isolated galaxies, considered a system of background galaxies. We have:

$$ n_0 = N - \sum_{\mu=1}^{\nu} n_\mu $$

3.3. Cluster Significance and Membership Probability

The statistical significance S_µ (µ = 1, ..., ν) of each cluster is based on the assumption that the presence of the µ-th cluster causes an increase in the local probability density as well as in the sample likelihood L_N = Π_i [f_ka(r_i)] relative to the value L_µ that one would have if the members of the µ-th cluster were all isolated, i.e. belonging to the background.

A large value of the ratio L_N/L_µ characterizes the most important clusters. According to Materne (1979) it is possible to estimate the significance of each cluster by using the likelihood ratio test. In other words 2 ln(L_N/L_µ) is distributed as a χ² variable with ν − 1 degrees of freedom. Therefore, once we compute the value of χ² for each cluster (χ²_S), we can also compute the significance S_µ of the cluster.

Here we assume that the contribution to the global density field f_ka(r_i) of the µ-th cluster is F_µ(r_i). The ratio between the value of F_µ(r_i) and the total local density f_ka(r_i) can be used to estimate the membership probability of each galaxy relative to the identified clusters. This criterion also allows us to estimate the probability that a galaxy is isolated.

At the end of the DEDICA procedure we are left with a) a catalog of galaxies, each with information on position, membership, local density and size of the Gaussian kernel, and b) a catalog of structures with information on position, richness, the χ²_S parameter, and peak density. For each cluster we also compute from the coordinate variance matrix the cluster major axis, ellipticity and position angle.

4. Tweaking the Algorithm with Simulations

In this section we describe our analysis of the performance of DEDICA and the guidelines we obtain for the interpretation of the clustering analysis of our observations.

We build simulated fields containing a cluster with and without subclusters. The simulated fields have the same geometry as the WFC field and are populated with the typical number of objects we will analyze. For simplicity we consider only WFC fields. Because DEDICA is scale-free, a different sampling of the same field of view has no consequence on our analysis.

In the next section we limit our analysis to M_V,lim ≤ −16. At the median redshift of the WINGS clusters, z ≃ 0.05, this absolute magnitude limit corresponds to an apparent magnitude V_lim ≃ 21. Within this magnitude limit the representative number of galaxies in our frames is N_tot = 900.

We then consider N_tot = N_mem + N_bkg, with N_mem the number of cluster members and N_bkg the number of field (or background) galaxies. We set N_bkg = 670, close to the average number of background galaxies we expect in our frame based on typical observed field counts, e.g. Berta et al. (2006) or Arnouts et al. (1997). With this choice, we have N_mem = 230.

We distribute the N_bkg objects uniformly at random. We distribute at random the remaining N_mem = 230 objects in one or more overdensities depending on the test we perform. We populate overdensities according to a King profile (King 1962) with a core radius R_core = 90 kpc, representative of our clusters. We then scale R_core with the number of members of the substructure, N_S. We use Rcore = 250 NC + NS where N_C is the number of objects in the cluster with N_mem = N_S + N_C. This scaling of R_core with cluster richness is from Adami et al. (1998a) assuming direct proportionality between cluster richness and luminosity (e.g. Popesso et al. 2006).
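A simulated field of this kind can be sketched as follows. The sketch rests on several assumptions that are not taken from the text: the projected King profile is approximated by a surface density proportional to 1/(1 + (R/R_core)²) truncated at a hypothetical radius `r_max`; the field size, core radius and truncation radius (in pixels) are placeholder values; both overdensities use the same core radius (the richness scaling of R_core is not reproduced); and the members are split between cluster and subcluster by a cluster-to-subcluster richness ratio called `r_cs` here.

```python
import math
import random

def richness_split(n_mem, r_cs):
    """Split n_mem members into (cluster, subcluster) for a ratio r_cs = N_C / N_S."""
    n_c = round(n_mem * r_cs / (1 + r_cs))
    return n_c, n_mem - n_c

def king_radius(rng, r_core, r_max):
    """Draw a projected radius from Sigma(R) ~ 1 / (1 + (R/r_core)^2), R < r_max.

    Inverse-CDF sampling: N(<R) ~ ln(1 + (R/r_core)^2).
    """
    u = rng.random()
    return r_core * math.sqrt((1.0 + (r_max / r_core) ** 2) ** u - 1.0)

def king_clump(rng, n, center, r_core, r_max):
    """Place n points around center with random position angles."""
    pts = []
    for _ in range(n):
        r = king_radius(rng, r_core, r_max)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((center[0] + r * math.cos(phi), center[1] + r * math.sin(phi)))
    return pts

def simulate_field(r_cs, separation, n_tot=900, n_bkg=670, field_size=6000.0,
                   r_core=250.0, r_max=1000.0, seed=0):
    """Uniform background plus a cluster and a subcluster `separation` pixels apart."""
    rng = random.Random(seed)
    n_c, n_s = richness_split(n_tot - n_bkg, r_cs)
    background = [(rng.uniform(0.0, field_size), rng.uniform(0.0, field_size))
                  for _ in range(n_bkg)]
    mid = field_size / 2.0
    cluster = king_clump(rng, n_c, (mid - separation / 2.0, mid), r_core, r_max)
    sub = king_clump(rng, n_s, (mid + separation / 2.0, mid), r_core, r_max)
    return background + cluster + sub
```

Fixing `seed` makes each realization reproducible, so a set of independent realizations is obtained simply by looping over seeds.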
As far as the relative richnesses of the cluster and subcluster are concerned, we consider the following richness ratios r_cs = N_C/N_S = 1, 2, 4, 8. With these richness ratios, the numbers of objects in the cluster are N_C = 115, 153, 184, 204, and those in subclusters are N_S = 115, 77, 46, 26 respectively.

In a first set of simulated fields we place the substructure at 2731 pixels (15 arcmin) from the main cluster so that they do not overlap. In a second set of simulations, we place main cluster and substructure at shorter distances, 683 and 1366 pixels, in order to investigate the ability of DEDICA to resolve structures. At each of these shorter distances we build simulations with both r_cs = 1 and 2.

For each richness ratio and/or distance between cluster and subcluster we produce 16 simulations with different realizations of the random positions of the data points representing galaxies.

In order to minimize the effect of the borders on the detection of structures we add to the simulation a "frame" of 1000 pixels. We fill this frame with a grid of data points at the same density as the average density of the field.

Fig. 1. Fraction of recovered members of each substructure for different r_cs (horizontal axis: simulation). The solid line connects substructures with r_cs = 2 and 4.

The first result we obtain from the runs of DEDICA on the simulations with varying richness ratio is the positive rate at which we detect real structures. We find that we always recover both cluster and substructure even when the substructure only contains N_S = 1/8 N_C objects, i.e. 26 objects (on top of the uniform background). In other words, if there is a real structure DEDICA finds it.

We also check how many original members the procedure assigns to the structures it recovers. The results are summarized in Fig. 1. In the diagram, the fraction of recovered members of each substructure is represented by the values of its r_cs. The solid line connects substructures with r_cs = 2 and 4. From Fig.
1 it is clear that our procedure recovers a large fraction of members, almost irrespective of the richness of the original structure. It is also interesting to note that the fluctuations identified as substructures are located very close to the center of the corresponding simulated substructures. In almost all cases the distance between original and detected substructure is significantly shorter than the mean inter-particle distance.

The second important result we obtain from the simulations is the false positive rate, i.e. the fraction of noise fluctuations that are as significant as the fluctuations corresponding to real structures.

First of all we need to define an operative measure of the reliability of the detected structures. In fact DEDICA provides a default value S_µ (µ = 1, ..., ν) of the significance (see Sect. 3.3). However, S_µ has a relatively small dynamical range, in particular for highly significant clusters.

Density or richness both allow a reasonable "ranking" of structures. However, both large low-density noise fluctuations (often built up from more than one noise fluctuation) and very high density fluctuations produced by a few very close data points could be mistakenly ranked as highly significant structures according to, respectively, richness and density criteria.

Fig. 2. χ²_S of simulated noise fluctuations (solid line). Labels are the r_cs of simulated structures at the abscissa corresponding to their χ²_S and at arbitrary ordinates.

We therefore prefer to use the parameter χ²_S, which stands at the base of the estimate of S_µ and which is naturally provided by DEDICA. The main characteristic of χ²_S is that it depends both on the density of a cluster relative to the background and on its richness. Using χ²_S we classify correctly significantly more structures than with either density or richness alone.

In Fig.
2 we plot the distribution of χ²_S of noise fluctuations (solid line). In the same plot we also mark the r_cs of real structures as detected by our procedure. We use labels indicating r_cs and place them at the abscissa corresponding to their χ²_S and at arbitrary ordinates.

Fig. 2 shows that the structures detected with r_cs = 1, 2 are always distinguishable from noise fluctuations. Substructures with r_cs = 4 or higher, although correctly detected, have χ²_S values that are close to or lower than the level of noise.

With the second set of simulations, we test the minimum distance at which cluster and subcluster can still be identified as separate entities. We place cluster and substructure (r_cs = 1, 2) at distances d_cs = 683 and 1366 pixels. These distances are 1/4 and 1/2 respectively of the distance between cluster and substructure in the first set of simulations. Again we produce 16 simulations for each of the 4 cases.

We find that at d_cs = 1366 pixels cluster and substructure are always correctly identified. At the shorter distance d_cs = 683 pixels, DEDICA merges cluster and substructure in 1 out of 16 cases for r_cs = 1 and in 8 out of 16 cases for r_cs = 2. With our density profile, d_cs = 683 pixels corresponds to d_cs ≃ R_c + R_s, with R_c, R_s the radii of the main cluster and of the subcluster respectively.

In order to verify the results we obtain for 900 data points we produce more simulations with N_tot = 450, 600 and 1200. In all these simulations R_C and R_S are the same as in the set with N_tot = 900. We vary N_bkg and N_mem so that N_mem/N_bkg is the same as in the case N_tot = 900.

Fig. 3. Small symbols correspond to χ²_S as a function of the number of members of noise fluctuations. Crosses, circles, dots and triangles are χ²_S for the noise fluctuations of the simulations with N_tot = 450, 600, 900, and 1200 respectively. Large symbols are χ²_S of simulated clusters and subclusters with r_cs = 1. Horizontal lines mark the levels of χ²_S,threshold.
These simulations confirm the results we obtain in the case N_tot = 900, and allow us to set a detection threshold, χ²_S,threshold(N_tot), for significant fluctuations in the analysis of real clusters.

We summarize the behavior of the noise fluctuations in our simulations in Fig. 3. In this figure, the small symbols correspond to χ²_S as a function of the number of members of noise fluctuations. In particular, crosses, circles, dots and triangles are χ²_S for the noise fluctuations of the simulations with N_tot = 450, 600, 900, and 1200 respectively.

The larger symbols are the χ²_S of the fluctuations corresponding to simulated clusters and subclusters of equal richness (r_cs = 1). The 4 horizontal lines mark the level of χ²_S,threshold, i.e. the average χ²_S of the 3 most significant noise fluctuations in each of the 4 groups of simulations with N_tot = 450, 600, 900, and 1200. The expected increase of χ²_S,threshold with N_tot is evident.

We note that the only significant difference with respect to these findings that we obtain from the simulations with r_cs = 2 is that the χ²_S of simulated clusters and subclusters is closer to χ²_S,threshold (but still higher).

We fit χ²_S,threshold with N_tot and obtain

$$ \log(\chi^2_{S,\mathrm{threshold}}) = \frac{1}{2.55} \log(N_{tot}) + 0.394 \qquad (5) $$

in good agreement with the expected behavior of the poissonian fluctuations.

As a final test we verify that infra-chip gaps do not have a dramatic impact on the detection of structures in the cases r_cs = 1 and 2. We place a 50 pixel wide gap where it has the maximum impact, i.e. where the kernel size is shortest. Even if the infra-chip gap cuts through the center of the structures, DEDICA is able to identify these structures correctly.

We summarize here the main results of our tests on simulated clusters with substructures:

– DEDICA successfully detects even the poorest structures above a uniform poissonian noise background.
– DEDICA recovers a large fraction (typically > 3/4) of the real members of a substructure, almost irrespective of the richness of the structure.
– DEDICA is able to distinguish between noise fluctuations and true structures only if these structures are rich enough. In the case of our simulations, structures have to be richer than 1/4th of the main structure.
– DEDICA is able to separate neighboring structures provided they do not overlap.
– infra-chip gaps do not threaten the detection of structures that are rich enough to be reliably detected.
– the χ²_S threshold we use to identify significant structures is a function of the total number of points and can be scaled within the whole range of numbers of galaxies observed within our fields.

In the next section we apply these results to the real WINGS clusters.

5. Substructure detection in WINGS clusters

We apply our clustering procedure to the 77 clusters of the WINGS sample. The photometric catalog of each cluster is deep, reaching a completeness magnitude V_complete ≤ 22. The number of galaxies is correspondingly large, from N_gal ≃ 3,000 to N_gal ≃ 10,000.

The large number of bright background galaxies (faint apparent magnitudes) dilutes the clustering signal of local WINGS clusters. We perform test runs of the procedure on several clusters with magnitude cuts brighter than V_complete. Based on these tests, we decide to cut galaxy catalogs to the absolute magnitude threshold M_V = −16.0. With this choice a) we maximize the signal-to-noise ratio of the detected subclusters and b) we still have enough galaxies for a stable identification of the system.

At the median redshift of WINGS clusters, z ≃ 0.0535, our absolute magnitude cut corresponds to an apparent magnitude V ≃ 21.2. This apparent magnitude also approximately corresponds to the magnitude where the contrast of our typical cluster relative to the field is maximum (this estimate is based on the average cluster luminosity function of (Yagi et al.
2002; De Propris et al. 2003) and on the galaxy counts of (Berta et al. 2006)).

The number of galaxies that are brighter than the threshold M_V = −16.0 is in the range 600 < N_tot < 1200 for a large fraction of clusters observed with either WFC@INT or with WFI@ESO2.2.

In order to proceed with the identification of significant structures within WINGS clusters, we need to verify that our simulations are sufficiently representative of the real cases. In practice we need to compare the observed distributions of χ²_S values of noise fluctuations with the corresponding simulated distributions. In the observations it is impossible to identify individual fluctuations as noise. In order to have an idea of the distributions of χ²_S of noise fluctuations we consider that our fields are centered on real clusters. As a consequence, on average, fluctuations in the center of the frames are more likely to correspond to real systems than those at the borders.

Fig. 4. χ²_S distributions for border (thick solid histogram) and central (thick dashed histogram) observed fluctuations. The thin solid line is the normalized distribution of χ²_S of the noise fluctuations in our simulations.

We therefore consider separately the fluctuations within the central regions of the frames and all other fluctuations (borders). We define the central regions as the central 10% of WFC and WFI areas. We plot in Fig. 4 the two distributions. The thick solid histogram is for the border and the thick dashed histogram for the center of the frames. The difference between "noise" and "signal" is clear. In the same figure we also plot the normalized distribution of χ²_S of the noise fluctuations in our simulations (thin solid line). The distributions of χ²_S of the observed and simulated fluctuations are in reasonable agreement considering a) the simple model used for the simulations and b) that in the observations we cannot exclude real low-χ²_S structures among noise fluctuations.
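For the range of N_tot quoted above, the fitted threshold of Eq. 5 can be evaluated directly. The short sketch below assumes that "log" in Eq. 5 denotes the base-10 logarithm; the function name is hypothetical.

```python
import math

def chi2_threshold(n_tot):
    """Detection threshold from Eq. 5, assuming base-10 logarithms."""
    return 10.0 ** (math.log10(n_tot) / 2.55 + 0.394)

# Thresholds for the field richnesses used in the simulations.
thresholds = {n: chi2_threshold(n) for n in (450, 600, 900, 1200)}
```

Under this assumption the threshold grows slowly with the number of points, from roughly 27 at N_tot = 450 to roughly 40 at N_tot = 1200.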
We conclude that for our clusters we can adopt the same reliability threshold χ²_S,threshold we determine from our simulations (Eq. 5).

6. The Catalog of Substructures

We detect at least one significant structure in 55 (71%) clusters. We find that 12 clusters (16%) have no structure above the threshold (undetected). In the case of another 10 (13%) clusters we find significant structures only at the border of the field of view. In the absence of a detection in the center of the frame, we consider these border structures unrelated to the target cluster. We also verify that in the Color-Magnitude Diagram (CMD) these border structures are redder than expected given the redshift of the target cluster. We also consider these 10 clusters undetected.

Here we list the 22 undetected clusters: A0133, A0548b, A0780, A1644, A1668, A1983, A2271, A2382, A2589, A2626, A2717, A3164, A3395, A3490, A3497, A3528a, A3556, A3560, A3809, A4059, RX1022, Z1261.

We note that undetected clusters are real physical systems according to their X-ray selection. From an operative point of view, the fact that these clusters are not detected by DEDICA is the result of the division into too many structures of the total available clustering signal in the field (or of too large a fraction of the clustering signal going into border structures). Several physical situations could be at the origin of missed detections. One possibility is an excess of physical substructures of comparable richness. Another possibility is that these clusters are embedded in regions of the large scale structure that are highly clustered.

We do not try to recover these structures because they cannot be prominent enough. Since our analysis is bidimensional, we can only detect and use confidently the most prominent structures. Redshifts are needed for a more detailed analysis of cluster substructures.

Fig. 5. Isodensity contours (logarithmically spaced) of the Abell 85 field. The title lists the coordinates of the center. The orientation is East to the left, North to the top. Galaxies belonging to the systems detected by DEDICA are shown as dots of different colors. Black, light green, blue, red, magenta, dark green are for the main system and the subsequent substructures ordered as in Table A.1. Large symbols are for galaxies with M_V ≤ −17.0 that lie where local densities are higher than the median local density of the structure the galaxy belongs to. Open symbols mark the positions of the first- and second-ranked cluster galaxies, BCG1 and BCG2 respectively. Similar plots for the 55 analysed clusters are available in the electronic version of this Journal.

We list the 55 clusters with significant structures in Table A.1. We give, for each substructure: (1) the name of the parent cluster; (2) the classification of the structure as main (M), subcluster (S), or background (B) together with their order number; (3) right ascension (J2000), and (4) declination (J2000) in decimal degrees of the DEDICA peak; the parameters of the ellipse we obtain from the variance matrix of the coordinates of galaxies in the substructure, i.e. (5) major axis in arcminutes, (6) ellipticity, and (7) position angle in degrees; (8) luminosity (see the next section); (9) χ²_S.

We make available contour plots of the number density fields of all clusters in Fig. 6 of the electronic version of this Journal. In Fig. 5 we show an example of these plots. Isodensity contours are drawn at ten logarithmic intervals. Galaxies belonging to the systems detected by DEDICA are shown as dots of different colors. We use large symbols for brighter galaxies (M_V ≤ −17.0) that lie where local densities are higher than the median local density of the structure the galaxy belongs to. We also mark with open symbols the positions of the first- and second-ranked cluster galaxies, BCG1 and BCG2 respectively.
Color coding is black, light green, blue, red, magenta, dark green for the main system and the subsequent substructures ordered as in Table A.1.

We describe and analyze in detail our catalog in the next section.

7. Properties of substructures

The first problem we face in order to study the statistical and physical properties of substructures is to determine their association with the main structure. In fact, the main structure itself has to be identified among the structures detected by DEDICA in each frame.

In most cases it is easy to identify the main structure of a cluster since it is located at the center of the frame and it has a high χ²_S. In two cases (A0168 and A1736) the choice of the main structure is complicated because there are several similar structures near the center of the frame. In these cases we select as the main structure the one with the highest χ²_S.

At this point we limit our analysis to members of the structure that a) have an absolute magnitude M_V ≤ −17 (corrected for Galactic absorption) and that b) are in the upper half of the distribution of DEDICA-defined local galaxy densities of the system they belong to. The galaxy density threshold we apply allows us to separate adjacent structures whose definition becomes more uncertain at lower galaxy density levels. The magnitude cut increases the relative weight of the galaxies we use to evaluate the nature of structures in the CMD.

After having identified the main structure, we need to determine which structures in the field of view of a given cluster have to be considered background structures. We consider a structure a physical substructure (or subcluster) if its color-magnitude relation (CMR hereafter) is identical, within the errors, to the CMR of the main structure.

As a first step we define the color-magnitude relation (CMR) of the "whole cluster", i.e. of galaxies in the main structure together with all other galaxies not assigned to any structure by DEDICA.
We compute the (B − V) CMR of the Coma cluster from published data (Adami et al. 2006). Then we keep fixed the slope of the linear CMR of Coma and shift it to the mean redshift of the cluster.

In order to determine that the main structure and a substructure are at the same redshift, we evaluate the fraction of background (red) galaxies, f_bg, that each structure has in the CMD. If these fractions are identical within the errors (Gehrels 1986), we consider the two structures to be at the same redshift.

In practice we determine f_bg by assigning to the background those galaxies of a structure that are redder than a line parallel to the CMR and vertically shifted (i.e. redwards) by 2.33 times the root-mean square of the colors of galaxies in the CMR. We note that the probability that a random variable is greater than 2.33 in a Gaussian distribution is only 1%.

Fig. 7. Cumulative distributions of the two different indicators of subclustering: left panel N_sub, right panel f_Lsub.

The result of the selection of main structures and substructures is the following: 40 clusters have a total of 69 substructures at the same redshift as the main structure, and only 15 clusters are left without substructures. A total of 35 systems are found in the background. Considering a) the number density of poor-to-rich clusters (Mazure et al. 1996; Zabludoff et al. 1993), b) the average luminosity function of clusters (Yagi et al. 2002; De Propris et al. 2003), c) the total area covered by the 55 cluster fields, and d) the limiting apparent magnitude corresponding to our absolute magnitude threshold M_V = −16.0, we expect to find ∼0.5 ± 0.2 background systems per cluster field, 28 ± 11 in total. This estimate is consistent with the 35 background systems we find.
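The background-fraction criterion and the two numbers quoted above (the 1% Gaussian tail and the expected total of background systems) can be checked with a short sketch. The CMR slope, intercept and scatter passed to `background_fraction` are hypothetical inputs, not values from the paper, and the function names are illustrative only.

```python
import math

def gaussian_upper_tail(x):
    """Probability that a standard Gaussian variable exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def background_fraction(mags, colors, slope, intercept, rms, n_sigma=2.33):
    """Fraction of galaxies redder than the CMR shifted redwards by n_sigma * rms.

    The CMR is modelled as color = slope * mag + intercept (assumed inputs).
    """
    red = sum(1 for m, c in zip(mags, colors)
              if c > slope * m + intercept + n_sigma * rms)
    return red / len(mags)

def expected_background(rate, rate_err, n_fields):
    """Scale a per-field rate and its uncertainty to n_fields fields."""
    return rate * n_fields, rate_err * n_fields

tail = gaussian_upper_tail(2.33)                   # about 0.0099, i.e. 1%
total, total_err = expected_background(0.5, 0.2, 55)  # 27.5 and 11, i.e. 28 +/- 11
```

Two structures would then be compared through their `background_fraction` values and the associated small-number uncertainties, which are not modelled in this sketch.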
The fraction of clusters with subclusters (73%) is higher than generally found in previous investigations (typically ∼30%, see, e.g., Girardi & Biviano 2002; Flin & Krywult 2006; Lopes et al. 2006, and references therein). Even if we count all undetected clusters as clusters without substructures, this fraction only decreases to 52% (40/77). It is however acknowledged that the fraction of substructured clusters depends, among other factors, on the algorithm used to detect substructures and on the quality and depth of the galaxy catalog. For example, Kolokotronis et al. (2001), using optical and X-ray data, find that the fraction of clusters with substructures is ≥ 45%, while Burgett et al. (2004), using a battery of tests, detect substructures in 84% of the 25 clusters of their sample.

Having established the "global" fraction of substructured clusters, we now investigate the degree of subclustering of individual clusters, i.e. the distribution of the number of substructures N_sub we find in our sample. We find 15 (27%) clusters without substructures; 22 (40%) clusters with N_sub = 1; 10 (18%) clusters with N_sub = 2; 6 (11%) clusters with N_sub = 3; and 2 (4%) clusters with N_sub = 4. We plot in the left panel of Fig. 7 the integral distribution of N_sub.

The distribution of the level of subclustering does not change when we measure it as the fractional luminosity of subclusters, f_Lsub, relative to the luminosity of the whole cluster (see Fig. 7, right panel). The luminosities we estimate are background corrected using the counts of Berta et al. (2006). We use the ellipses output from DEDICA (see previous section) as a measure of the area of subclusters. We find that N_sub and f_Lsub are clearly correlated according to the Spearman rank-correlation test.

We now consider the distribution of subcluster luminosities and plot the corresponding histogram in Fig. 8. In the same figure we also plot with arbitrary scaling the power-law ∝ L^−1.
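A Spearman rank-correlation coefficient such as the one used above for N_sub and f_Lsub can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below uses only the standard library; it returns the coefficient only (no significance level), and the sample values in the usage lines are invented for illustration, not taken from the catalog.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank-correlation coefficient (Pearson correlation of ranks)."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented example: number of subclusters and fractional subcluster luminosity.
n_sub = [0, 1, 1, 2, 3, 4]
f_lsub = [0.0, 0.05, 0.12, 0.2, 0.31, 0.4]
rho = spearman(n_sub, f_lsub)
```

Because N_sub takes only a few integer values, ties are unavoidable, which is why the average-rank convention matters here.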
This relation is the prediction for the differential mass function of substructures in the cosmological simulations of De Lucia et al. (2004).

Fig. 8. Observed differential distribution of subcluster luminosities (histogram) and theoretical model (arbitrary scaling; De Lucia et al. 2004).

Our observations are consistent to within the uncertainties with the theoretical prediction of De Lucia et al. (2004) down to L ∼ 10^11.2 L⊙. The disagreement at lower luminosity is expected since: a) below this limit galaxy-sized halos become important among the simulated substructures, and b) only above this limit do we expect our catalog to be complete. In fact only subclusters with luminosities brighter than L = 10^11.2 L⊙ always have richnesses that are ≥ 1/3 of the main structure. This richness limit approximately corresponds to the completeness limit of DEDICA detections according to our simulations (see Sect. 4).

8. Brightest Cluster Galaxies

Here we investigate the relation between BCGs and cluster structures.

We find that, on average, BCG1s are located close to the density peak of the main structures. In projection on the sky, the biweight average (see Beers et al. 1990) distance of BCG1s from the peak of the main system is 72 ± 11 kpc. If we only consider the 44 BCG1s that are on the CMR and are assigned to main systems by DEDICA, the average distance decreases to 56 ± 8 kpc. The fact that BCG1s are close to the center of the system is consistent with the current theoretical view on the formation of BCGs (e.g. Dubinski 1998; Nipoti et al. 2004).

BCG2s are more distant than BCG1s from the peak of the main system: the biweight average distance is 345 ± 47 kpc. If we only consider the 26 BCG2s that are on the CMR and are assigned to main systems by DEDICA, the average distance decreases to 161 ± 34 kpc.
Projected distances of BCG2s from density peaks remain larger than those of BCG1s even when we consider the density peak of the structure or substructure they belong to. In Fig. 9 we plot the cumulative distributions of the distances of BCG1s (solid line) and BCG2s (dashed line) from the density peak of their systems. The distributions are different at the > 99.99% level according to a Kolmogorov-Smirnov test (KS-test).

Fig. 9. Cumulative distributions of distances of BCG1 (solid line) and BCG2 (dashed line) from the density peak of their systems.

Now we turn to luminosities and find that the magnitude difference between BCG1s and BCG2s, ∆M12, is larger in clusters without substructures than in clusters with substructures. In Fig. 10 we plot the cumulative distributions of ∆M12 for clusters with (dashed line) and without (solid line) subclusters.

Fig. 10. Cumulative distributions of the magnitude difference between BCG1 and BCG2 in clusters with (dashed line) and without subclusters (solid line).

The two distributions are different according to a KS-test at the 99.1% confidence level. We note that Lin & Mohr (2004) find that ∆M12 is independent of cluster properties. These authors however do not consider subclustering.

In order to determine whether the higher values of ∆M12 in clusters without subclusters are due to an increased luminosity of the BCG1 (L1) or to a decreased luminosity of the BCG2 (L2), we consider the luminosity of the 10th brightest galaxy (L10) as a reference. The biweight average luminosity ratios are ⟨L1/L10⟩ = 8.6 ± 1.0 and ⟨L2/L10⟩ = 3.3 ± 0.3 in clusters without substructures, and ⟨L1/L10⟩ = 7.1 ± 0.4 and ⟨L2/L10⟩ = 3.4 ± 0.2 in clusters with substructures. We then conclude that the ∆M12-effect is caused by a brightening of the BCG1 relative to the BCG2 in clusters without substructures.
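The luminosity ratios above can be translated into an indicative magnitude gap through the standard relation ∆M = 2.5 log10(L1/L2). The sketch below applies it to the quoted average ratios; this is only a rough consistency check, since the ratio of two averages is not the same as the average ∆M12 of individual clusters, and the function name is hypothetical.

```python
import math

def magnitude_gap(l1_over_l10, l2_over_l10):
    """Magnitude difference implied by two luminosities given relative to L10."""
    return 2.5 * math.log10(l1_over_l10 / l2_over_l10)

gap_without = magnitude_gap(8.6, 3.3)  # clusters without substructures, about 1.04 mag
gap_with = magnitude_gap(7.1, 3.4)     # clusters with substructures, about 0.80 mag
```

The difference between the two gaps is driven almost entirely by the change in L1/L10, since L2/L10 is nearly the same in both groups.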
The fact that ∆M12 is higher in clusters without substructures can be interpreted, at least qualitatively, in the framework of the hierarchical scenario of structure evolution. Clusters without substructures are likely to be evolved systems that have gone through several merger phases. Their BCG1s have already had time to accrete many galaxies, in particular the more massive ones, which slow down and sink to the cluster center as the result of dynamical friction. Some of these galaxies may even have been BCGs of the merging structures. The simulations by De Lucia & Blaizot (2006) show that the BCG1s continue to increase their mass via cannibalism even at recent times, and that there is a large variance in the mass accretion history of BCG1s from cluster to cluster. The result of such a cannibalism process is an increase of the BCG1 luminosity with respect to other cluster galaxies, which in extreme cases may lead to the formation of fossil groups (Khosroshahi et al. 2006).

However, according to these simulations, only 15% of all BCG1s have accreted > 30% of their mass over the last 2 Gyr, while another 15% have accreted < 3% of their mass over the same period. Our results indicate that about 60% of the BCG1s are more than 1 magnitude brighter than the corresponding BCG2s. Given the size and generality of the luminosity differences, it would seem that cannibalism alone, even if present along the merging history of a given cluster, cannot account for them. Most of the BCG1s should then have been assembled at early times, as pointed out in the downsizing scenario for galaxy formation (Cowie et al. 1996), and entered that merging history already with a luminosity not far from the present one.

9. Summary

In this paper we search for and characterize cluster substructures, or subclusters, in the sample of 77 nearby clusters of WINGS (Fasano et al. 2006). This sample is almost complete in X-ray flux in the redshift range 0.04 < z < 0.07.
We detect substructures in the projected spatial distribution of galaxies in the cluster fields using DEDICA (Pisani 1993, 1996), an adaptive-kernel technique. DEDICA has the following advantages for our study of WINGS clusters:
a) DEDICA gives a total description of the clustering pattern, in particular membership probability and significance of structures besides geometrical properties;
b) DEDICA is scale invariant;
c) DEDICA does not assume any property of the clusters, i.e. it is completely non-parametric. In particular, it does not require particularly rich samples to run effectively.

In order to test DEDICA and to set guidelines for the interpretation of the results of its application to our observations, we run DEDICA on several sets of simulated fields containing a cluster with and without subclusters. We find that: a) DEDICA always identifies both cluster and subcluster, even when the cluster-to-subcluster richness ratio is rcs = 8; b) DEDICA recovers a large fraction of members, almost irrespective of the richness of the original structure (≳ 70% in most cases); c) structures with richness ratios rcs ≲ 3 are always distinguishable from noise fluctuations of the Poissonian simulated field.

These simulations also allow us to define a threshold that we use to identify significant structures in the observed fields.

We apply our clustering procedure to the 77 clusters of the WINGS sample. We cut galaxy catalogs at the absolute magnitude threshold MV = −16.0 in order to maximize the signal-to-noise ratio of the detected subclusters.

We detect at least one significant structure in 55 (71%) cluster fields. We find that 12 clusters (16%) have no structure above the threshold (undetected). In the remaining 10 (13%) clusters we find significant structures only at the border of the field of view. In the absence of a detection in the center of the frame, we consider these border structures unrelated to the target cluster.
We also verify that in the CMD these border structures are redder than expected given the redshift of the target cluster. We also consider these clusters undetected.

We provide the coordinates of all substructures in the 55 clusters together with their main properties.

Using the CMR of the early-type cluster galaxies, we separate "true" subclusters from unrelated background structures. We find that 40 clusters out of 55 (73%) have a total of 69 substructures, with 15 clusters left without substructures.

The fraction of clusters with subclusters (73%) we identify is higher than most previously published values (typically ∼ 30%, see, e.g., Girardi & Biviano 2002, and references therein). It is however acknowledged that the fraction of substructured clusters depends, among other factors, on the algorithm used to detect substructures and on the quality and depth of the galaxy catalog (Kolokotronis et al. 2001; Burgett et al. 2004).

Another important result of our analysis is the distribution of subcluster luminosities. In the luminosity range where our substructure detection is complete (L ≥ 10^11.2 L⊙), we find that the distribution of subcluster luminosities is in agreement with the power law ∝ L^−1 predicted for the differential mass function of substructures in the cosmological simulations of De Lucia et al. (2004).

Finally, we investigate the relation between BCGs and cluster structures.

We find that, on average, BCG1s are located close to the density peak of the main structures. In projection on the sky, the biweight average distance of BCG1s from the peak of the main system is 72 ± 11 kpc. BCG2s are significantly more distant than BCG1s from the peak of the main system (345 ± 47 kpc). The fact that BCG1s are close to the center of the system is consistent with the current theoretical view of the formation of BCGs (Dubinski 1998).
A more surprising result is that the magnitude difference between BCG1s and BCG2s, ∆M12, is significantly larger in clusters without substructures than in clusters with substructures. This fact may be interpreted in the framework of the hierarchical scenario of structure evolution (e.g. De Lucia & Blaizot 2006).

References

Abdelsalam H.M., Saha P., Williams L.L.R., Oct. 1998, AJ, 116, 1541 Abell G.O., Neyman J., Scott E.L., 1964, AJ, 69, 529 Abraham R.G., Smecker-Hane T.A., Hutchings J.B., et al., Nov. 1996, ApJ, 471, Adami C., Mazure A., Biviano A., Katgert P., Rhee G., Mar. 1998a, A&A, 331, Adami C., Mazure A., Katgert P., Biviano A., Aug. 1998b, A&A, 336, 63 Adami C., Biviano A., Durret F., Mazure A., Nov. 2005, A&A, 443, 17 Adami C., Picat J.P., Savine C., et al., Jun. 2006, A&A, 451, 1159 Arnaud M., Maurogordato S., Slezak E., Rho J., Mar. 2000, A&A, 355, 461 Arnouts S., de Lapparent V., Mathez G., et al., Jul. 1997, A&AS, 124, 163 Bardelli S., Pisani A., Ramella M., Zucca E., Zamorani G., Oct. 1998a, MNRAS, 300, 589 Bardelli S., Zucca E., Zamorani G., Vettolani G., Scaramella R., May 1998b, MNRAS, 296, 599 Bardelli S., Zucca E., Baldi A., Jan. 2001, MNRAS, 320, 387 Barrena R., Biviano A., Ramella M., Falco E.E., Seitz S., May 2002, A&A, 386, Beers T.C., Flynn K., Gebhardt K., Jul. 1990, AJ, 100, 32 Beers T.C., Gebhardt K., Forman W., Huchra J.P., Jones C., Nov. 1991, AJ, 102, Bekki K., Jan. 1999, ApJ, 510, L15 Berta S., Rubele S., Franceschini A., et al., Jun. 2006, A&A, 451, 881 Binggeli B., Mar. 1982, A&A, 107, 338 Bird C.M., Jun. 1995, ApJ, 445, L81 Biviano A., Durret F., Gerbal D., et al., Jul. 1996, A&A, 311, 95 Biviano A., Katgert P., Mazure A., et al., May 1997, A&A, 321, 84 Biviano A., Murante G., Borgani S., et al., Sep. 2006, A&A, 456, 23 Böhringer H., Schuecker P., Jun. 2002, Observational signatures and statistics of galaxy Cluster Mergers: Results from X-ray observations with ROSAT, ASCA, and XMM-Newton, 133–162, ASSL Vol.
272: Merging Processes in Galaxy Clusters Briel U.G., Henry J.P., Boehringer H., Jun. 1992, A&A, 259, L31 Buote D.A., Jun. 2002, X-Ray Observations of Cluster Mergers: Cluster Morphologies and Their Implications, 79–107, ASSL Vol. 272: Merging Processes in Galaxy Clusters Buote D.A., Xu G., Jan. 1997, MNRAS, 284, 439 Burgett W.S., Vick M.M., Davis D.S., et al., Aug. 2004, MNRAS, 352, 605 Caldwell N., Rose J.A., Feb. 1997, AJ, 113, 492 Caldwell N., Rose J.A., Sharples R.M., Ellis R.S., Bower R.G., Aug. 1993, AJ, 106, 473 Cen R., Aug. 1997, ApJ, 485, 39 Clowe D., Bradač M., Gonzalez A.H., et al., Sep. 2006, ApJ, 648, L109 Cowie L.L., Songaila A., Hu E.M., Cohen J.G., Sep. 1996, AJ, 112, 839 Crone M.M., Evrard A.E., Richstone D.O., Aug. 1996, ApJ, 467, 489 De Lucia G., Blaizot J., Jun. 2006, ArXiv Astrophysics e-prints De Lucia G., Kauffmann G., Springel V., et al., Feb. 2004, MNRAS, 348, 333 De Propris R., Colless M., Driver S.P., et al., Jul. 2003, MNRAS, 342, 725 de Vaucouleurs G., May 1961, ApJS, 6, 213 Demarco R., Rosati P., Lidman C., et al., Mar. 2005, A&A, 432, 381 Dressler A., Shectman S.A., Apr. 1988, AJ, 95, 985 Dubinski J., Jul. 1998, ApJ, 502, 141 Dubinski J., Aug. 1999, In: Merritt D.R., Valluri M., Sellwood J.A. (eds.) ASP Conf. Ser. 182: Galaxy Dynamics - A Rutgers Symposium, 491–+ Durret F., Forman W., Gerbal D., Jones C., Vikhlinin A., Jul. 1998, A&A, 335, Escalera E., Slezak E., Mazure A., Oct. 1992, A&A, 264, 379 Escalera E., Biviano A., Girardi M., et al., Mar. 1994, ApJ, 423, 539 Ettori S., Fabian A.C., White D.A., Nov. 1998, MNRAS, 300, 837 Fadda D., Slezak E., Bijaoui A., Jan. 1998, A&AS, 127, 335 Fasano G., Marmo C., Varela J., et al., Jan. 2006, A&A, 445, 805 Ferrari C., Maurogordato S., Cappi A., Benoist C., Mar. 2003, A&A, 399, 813 Ferrari C., Arnaud M., Ettori S., Maurogordato S., Rho J., Feb. 2006, A&A, 446, Fitchett M., Merritt D., Dec. 1988, ApJ, 335, 18 Fitchett M.J., 1988, In: Dickey J.M. (ed.) ASP Conf. Ser. 
5: The Minnesota lec- tures on Clusters of Galaxies and Large-Scale Structure, 143–174 Flin P., Krywult J., Apr. 2006, A&A, 450, 9 Gehrels N., Apr. 1986, ApJ, 303, 336 Geller M.J., Beers T.C., Jun. 1982, PASP, 94, 421 Giacintucci S., Venturi T., Bardelli S., et al., Apr. 2006, New Astronomy, 11, 437 Gioia I.M., Henry J.P., Mullis C.R., Ebeling H., Wolter A., Jun. 1999, AJ, 117, Girardi M., Biviano A., Jun. 2002, Optical Analysis of Cluster Mergers, 39–77, ASSL Vol. 272: Merging Processes in Galaxy Clusters Girardi M., Fadda D., Giuricin G., et al., Jan. 1996, ApJ, 457, 61 Girardi M., Escalera E., Fadda D., et al., Jun. 1997, ApJ, 482, 41 Gnedin O.Y., Oct. 1999, Ph.D. Thesis Haines C.P., Clowes R.G., Campusano L.E., Adamson A.J., May 2001, MNRAS, 323, 688 Huo Z.Y., Xue S.J., Xu H., Squires G., Rosati P., Mar. 2004, AJ, 127, 1263 Jeltema T.E., Canizares C.R., Bautz M.W., Buote D.A., May 2005, ApJ, 624, Jones C., Forman W., Jan. 1999, ApJ, 511, 65 Khosroshahi H.G., Ponman T.J., Jones L.R., Oct. 2006, MNRAS, 372, L68 King I., Oct. 1962, AJ, 67, 471 Knebe A., Müller V., Feb. 2000, A&A, 354, 761 Kolokotronis V., Basilakos S., Plionis M., Georgantopoulos I., Jan. 2001, MNRAS, 320, 49 Kriessler J.R., Beers T.C., Jan. 1997, AJ, 113, 80 Lin Y.T., Mohr J.J., Dec. 2004, ApJ, 617, 879 Lopes P.A.A., de Carvalho R.R., Capelato H.V., et al., Sep. 2006, ApJ, 648, 209 Markevitch M., Vikhlinin A., Dec. 2001, ApJ, 563, 95 Markevitch M., Gonzalez A.H., Clowe D., et al., May 2004, ApJ, 606, 819 Materne J., Apr. 1979, A&A, 74, 235 Maughan B.J., Jones L.R., Ebeling H., et al., Apr. 2003, ApJ, 587, 589 Maurogordato S., Proust D., Beers T.C., et al., Mar. 2000, A&A, 355, 848 Mazure A., Katgert P., den Hartog R., et al., Jun. 1996, A&A, 310, 31 Miller N.A., Dec. 2005, AJ, 130, 2541 Miller N.A., Owen F.N., Hill J.M., et al., Oct. 2004, ApJ, 613, 841 Mohr J.J., Fabricant D.G., Geller M.J., Aug. 1993, ApJ, 413, 492 Mohr J.J., Evrard A.E., Fabricant D.G., Geller M.J., Jul. 
1995, ApJ, 447, 8 Mohr J.J., Geller M.J., Fabricant D.G., et al., Oct. 1996, ApJ, 470, 724 Moles M., Perea J., del Olmo A., Jan. 1986, MNRAS, 213, 365 Moss C., Whittle M., Sep. 2000, MNRAS, 317, 667 Neumann D.M., Böhringer H., Jul. 1997, MNRAS, 289, 123 Nipoti C., Treu T., Ciotti L., Stiavelli M., Dec. 2004, MNRAS, 355, 1119 Perea J., Moles M., del Olmo A., Jan. 1986a, MNRAS, 219, 511 Perea J., Moles M., del Olmo A., Jan. 1986b, MNRAS, 222, 49 Perea J., del Olmo A., Moles M., Jan. 1990, A&A, 228, 310 Pinkney J., Roettiger K., Burns J.O., Bird C.M., May 1996, ApJS, 104, 1 Pisani A., Dec. 1993, MNRAS, 265, 706 Pisani A., Feb. 1996, MNRAS, 278, 697 Plionis M., Benoist C., Maurogordato S., Ferrari C., Basilakos S., Sep. 2003, ApJ, 594, 144 Poggianti B.M., Bridges T.J., Komiyama Y., et al., Jan. 2004, ApJ, 601, 197 Popesso P., Biviano A., Böhringer H., Romaniello M., Jun. 2006, ArXiv Astrophysics e-prints Richstone D., Loeb A., Turner E.L., Jul. 1992, ApJ, 393, 477 Roettiger K., Loken C., Burns J.O., Apr. 1997, ApJS, 109, 307 Roettiger K., Stone J.M., Mushotzky R.F., Jan. 1998, ApJ, 493, 62 Rosati P., Tozzi P., Ettori S., et al., Jan. 2004, AJ, 127, 230 Schindler S., Müller E., May 1993, A&A, 272, 137 Schuecker P., Böhringer H., Reiprich T.H., Feretti L., Nov. 2001, A&A, 378, 408 Shane C.D., Wirtanen C.A., Sep. 1954, AJ, 59, 285 Slezak E., Durret F., Gerbal D., Dec. 1994, AJ, 108, 1996 Solanes J.M., Salvador-Solé E., González-Casado G., Mar. 1999, A&A, 343, 733 Thomas P.A., Colberg J.M., Couchman H.M.P., et al., Jun. 1998, MNRAS, 296, Valdarnini R., Ghizzardi S., Bonometto S., Mar. 1999, New Astronomy, 4, 71 van den Bergh S., 1960, MNRAS, 121, 387 van den Bergh S., Feb. 1961, PASP, 73, 46 van Dokkum P.G., Franx M., Fabricant D., Illingworth G.D., Kelson D.D., Sep. 2000, ApJ, 541, 95 Varela et al., Jan. 2007, A&A, submitted West M.J., Blakeslee J.P., Nov. 2000, ApJ, 543, L27 West M.J., Oemler A.J., Dekel A., Apr.
1988, ApJ, 327, 1 Xu W., Fang L.Z., Wu X.P., Apr. 2000, ApJ, 532, 728 Yagi M., Kashikawa N., Sekiguchi M., et al., Jan. 2002, AJ, 123, 87 Zabludoff A.I., Geller M.J., Huchra J.P., Ramella M., Oct. 1993, AJ, 106, 1301

Appendix A: The catalog of substructures

We provide here the catalog of substructures. In Table A.1 we give, for each substructure: (1) the name of the parent cluster; (2) the classification of the structure as main (M), subcluster (S), or background (B) together with their order number; (3) right ascension (J2000), and (4) declination (J2000) in decimal degrees of the DEDICA peak; the parameters of the ellipse we obtain from the variance matrix of the coordinates of galaxies in the substructure, i.e. (5) major axis in arcminutes, (6) ellipticity, and (7) position angle in degrees; (8) luminosity; (9) χ²_S.

ID  class  αJ2000 (deg)  δJ2000 (deg)  a (arcmin)  e  PA (deg)  L (10^12 L⊙)  χ²_S
A0085  M  10.4752  -9.3025  2.0  0.23  -17.  0.41536  48.4
A0085  S1  10.4410  -9.4430  1.8  0.35  -39.  0.17649  42.9
A0085  S2  10.3947  -9.3501  2.3  0.40  -72.  0.12337  32.8
A0119  M  14.0625  -1.2630  4.1  0.44  -65.  0.83955  63.4
A0119  S1  14.1183  -1.2106  4.6  0.60  -23.  0.26847  50.8
A0119  S2  14.0267  -1.0441  3.4  0.34  80.  0.03592  32.0
A0119  B1  13.9402  -1.4979  3.1  0.39  46.  –  23.5
A0147  M  17.0648  2.2033  3.9  0.45  79.  0.31392  45.2
A0147  S1  16.8673  2.1393  4.1  0.25  -50.  0.05638  24.8
A0147  S2  17.1925  1.9284  4.4  0.38  55.  0.05052  21.4
A0147  B1  17.0753  2.3174  4.4  0.37  75.  –  58.0
A0151  M  17.2186  -15.4219  1.7  0.26  -16.  0.47344  39.9
A0151  S1  17.3516  -15.3652  2.1  0.37  -58.  0.13761  42.9
A0151  S2  17.2632  -15.5564  1.6  0.26  -53.  0.19762  40.9
A0151  B1  17.1375  -15.6116  1.5  0.08  -4.  –  59.0
A0160  M  18.2344  15.5126  3.6  0.37  82.  0.55525  66.7
A0160  S1  18.2483  15.3138  5.0  0.41  85.  0.03120  38.1
A0160  S2  18.1141  15.7501  3.0  0.16  86.  0.15196  28.3
A0160  S3  17.9981  15.4150  3.9  0.41  0.  0.06315  27.8
A0168  M  18.7755  0.3999  3.1  0.32  -11.  0.24492  30.5
A0168  S1  18.8799  0.2993  2.0  0.33  4.  0.06871  28.6
A0193  M  21.2894  8.6994  2.1  0.08  36.  0.61982  105.7
A0193  B1  20.9945  8.6119  4.9  0.45  -1.  –  39.1
A0311  M  32.3793  19.7722  2.3  0.19  43.  0.43320  44.0
A0376  M  41.4276  36.9517  1.7  0.07  -67.  0.13477  40.8
A0376  S1  41.5569  36.9214  4.4  0.49  -22.  0.24350  33.6
A0500  M  69.6476  -22.1308  2.0  0.31  16.  0.41203  45.5
A0500  S1  69.5915  -22.2377  2.3  0.19  36.  0.20520  47.3
A0602  M  118.3638  29.3528  1.8  0.55  -46.  0.20112  55.4
A0602  S1  118.1848  29.4145  2.3  0.52  31.  0.08470  34.8
A0671  M  127.1237  30.4269  1.6  0.24  -51.  0.68582  69.8
A0671  S1  127.2241  30.4342  2.0  0.40  -5.  0.19736  44.3
A0671  S2  127.1617  30.2967  1.9  0.24  -90.  0.13778  43.0
A0754  M  137.1073  -9.6370  2.0  0.25  53.  0.56063  46.8
A0754  S1  137.3707  -9.6760  3.2  0.53  -8.  0.30590  54.9
A0754  S2  137.2619  -9.6367  1.7  0.14  76.  0.23734  51.2
A0957x  M  153.4095  -0.9259  2.0  0.09  -83.  0.42106  38.6
A0957x  B1  153.5517  -0.7023  2.2  0.44  -63.  –  37.9
A0970  M  154.3595  -10.6921  1.5  0.27  -30.  0.46130  62.5
A0970  S1  154.2369  -10.6422  1.7  0.15  15.  0.13660  42.3
A0970  B1  154.1833  -10.6771  1.8  0.23  -76.  –  32.2
A1069  M  159.9418  -8.6883  2.8  0.31  52.  0.37270  50.2
A1069  S1  159.9286  -8.5506  2.4  0.23  88.  0.18532  32.7
A1069  B1  159.7678  -8.9262  3.5  0.55  77.  –  54.7
A1291  M  173.0467  56.0255  2.5  0.51  -11.  0.25272  32.1
A1291  S1  172.9090  56.1872  1.4  0.48  -82.  0.03530  37.6
A1631a  M  193.2410  -15.3413  1.4  0.35  40.  0.20077  33.9
A1736  M  202.0097  -27.3131  3.1  0.35  58.  0.41824  52.1
A1736  S1  201.7305  -27.0170  2.8  0.32  9.  0.24023  42.6
A1736  S2  201.7662  -27.4067  3.4  0.28  7.  0.14528  42.3
A1736  S3  201.5672  -27.4291  2.7  0.44  73.  0.16926  40.4
A1736  S4  201.9057  -27.1600  3.5  0.21  -1.  0.24192  32.9
A1736  S5  201.7036  -27.1236  3.0  0.39  -12.  0.40395  31.7
A1795  M  207.1911  26.5586  0.6  0.17  55.  0.12341  52.4
A1795  S1  207.2329  26.7362  1.3  0.38  82.  0.05123  46.7
A1831  M  209.8120  27.9714  1.9  0.43  9.  1.08418  56.0
A1831  S1  209.7356  28.0636  2.1  0.34  41.  0.36295  59.7
A1831  B1  209.5725  28.0206  1.7  0.25  -10.  –  47.7
A1991  M  223.6405  18.6390  2.3  0.54  -78.  0.28195  40.3
A1991  S1  223.7575  18.7812  2.5  0.32  41.  0.11412  49.9
A1991  B1  223.7683  18.7022  1.7  0.31  71.  –  36.6
A2107  M  234.9497  21.8075  2.7  0.19  48.  0.50994  61.1
A2107  B1  235.0699  22.0127  4.3  0.48  83.  –  32.9
A2107  B2  235.1409  21.8276  2.4  0.10  55.  –  20.4
A2124  M  236.2400  36.0990  1.3  0.24  32.  0.41727  43.3
A2124  B1  236.0207  36.1779  1.6  0.22  34.  –  59.7
A2149  M  240.3723  53.9406  1.5  0.46  -10.  0.37347  48.7
A2169  M  243.4867  49.1875  0.6  0.22  72.  0.15358  34.2
A2256  M  255.9260  78.6412  1.9  0.29  -86.  1.46563  95.6
A2256  B1  256.3094  78.4886  2.2  0.48  75.  –  48.2
A2256  B2  256.6024  78.4283  2.0  0.12  -88.  –  46.8
A2399  M  329.3693  -7.7772  3.5  0.64  -26.  0.40505  38.8
A2415  M  331.3829  -5.5444  2.3  0.23  -60.  0.36780  44.8
A2415  S1  331.5610  -5.3960  2.3  0.33  52.  0.05032  33.9
A2415  B1  331.3800  -5.4017  1.9  0.34  32.  –  41.4
A2415  B2  331.3295  -5.3890  1.6  0.41  4.  –  37.3
A2457  M  338.9462  1.4765  4.3  0.50  -84.  0.88720  107.3
A2457  S1  339.0392  1.6459  4.1  0.65  -50.  0.05960  23.1
A2457  B1  339.0667  1.3266  5.6  0.53  77.  –  73.8
A2572a  M  349.3192  18.7197  2.9  0.39  23.  0.44749  47.8
A2572a  S1  349.1122  18.5320  4.1  0.25  8.  0.07320  34.5
A2572a  S2  349.3851  18.5395  2.6  0.31  67.  0.05345  25.1
A2572a  S3  349.0037  18.7220  3.2  0.34  86.  0.00884  20.5
A2593  M  351.0766  14.6539  1.1  0.25  58.  0.28333  33.8
A2593  S1  351.0677  14.4048  2.2  0.42  80.  0.09810  27.0
A2622  M  353.7384  27.3856  3.1  0.09  76.  0.48920  68.1
A2622  S1  353.4880  27.2877  4.2  0.35  46.  0.03070  35.1
A2622  B1  353.7837  27.3182  3.0  0.49  -5.  –  53.7
A2622  B2  353.8009  27.6217  2.6  0.38  68.  –  29.9
A2657  M  356.1725  9.1818  4.5  0.47  22.  0.27061  49.6
A2657  S1  356.2755  9.1799  3.2  0.35  10.  0.20771  34.0
A2657  B1  355.9569  8.9422  2.8  0.06  -10.  –  38.6
A2665  M  357.7050  6.1582  3.6  0.26  71.  0.67950  121.8
A2665  S1  357.4003  5.8659  4.9  0.64  84.  0.01780  17.5
A2665  B1  357.8218  6.3522  3.5  0.44  -50.  –  32.7
A2734  M  2.8363  -28.8652  3.4  0.33  3.  0.48700  56.3
A2734  S1  2.6950  -28.7728  4.0  0.27  -7.  0.18970  55.0
A2734  S2  2.6987  -29.0394  3.2  0.24  55.  0.03030  43.3
A2734  S3  2.5727  -29.0562  3.4  0.51  84.  0.03100  33.1
A2734  B1  2.7701  -28.6488  3.7  0.39  14.  –  28.0
A3128  M  52.4825  -52.5764  2.2  0.17  -36.  0.40452  37.2
A3128  S1  52.7366  -52.7089  4.1  0.44  -9.  0.25240  51.8
A3128  S2  52.6655  -52.4413  2.3  0.32  -65.  0.19646  51.0
A3128  S3  52.3697  -52.7570  3.2  0.47  81.  0.17169  39.0
A3158  M  55.7477  -53.6334  2.6  0.62  2.  0.70205  52.8
A3158  S1  55.8382  -53.6780  3.4  0.58  10.  0.45553  53.4
A3266  M  67.7893  -61.4637  1.1  0.27  -72.  0.42993  63.5
A3376  M  90.1628  -39.9950  2.5  0.14  -43.  0.33708  43.1
A3376  S1  90.4344  -39.9776  2.7  0.20  -4.  0.21279  59.8
A3376  S2  90.4712  -39.7946  2.1  0.39  -88.  0.00904  31.2
A3528b  M  193.5928  -29.0136  1.3  0.04  -24.  0.65638  66.4
A3528b  S1  193.6030  -29.0721  1.3  0.26  10.  0.16706  59.0
A3530  M  193.9098  -30.3606  1.9  0.26  33.  0.53043  34.9
A3532  M  194.3035  -30.3732  3.6  0.52  -44.  0.76920  51.1
A3532  B1  194.0413  -30.2130  3.8  0.38  66.  –  56.3
A3558  M  201.9587  -31.4892  4.9  0.54  49.  1.14860  64.1
A3558  B1  202.2501  -31.6887  2.6  0.49  44.  –  37.6
A3562  M  203.4603  -31.6812  2.5  0.18  82.  0.39087  42.4
A3562  S1  203.1622  -31.7742  4.0  0.40  -86.  0.15010  51.1
A3562  S2  203.3137  -31.6953  3.6  0.40  76.  0.11706  41.0
A3562  S3  203.6982  -31.7171  2.5  0.21  -50.  0.07820  36.7
A3562  S4  203.6541  -31.5969  4.0  0.55  -81.  0.06542  30.3
A3667  M  303.1637  -56.8598  2.1  0.14  23.  0.56803  42.0
A3667  S1  303.5297  -56.9660  3.0  0.39  -86.  0.27086  39.2
A3667  S2  302.7241  -56.6674  1.5  0.17  75.  0.28948  34.0
A3667  S3  302.7081  -56.7557  2.3  0.39  12.  0.09961  33.7
A3716  M  312.9910  -52.7677  4.7  0.38  36.  0.76450  77.9
A3716  S1  312.9769  -52.6434  3.5  0.22  -6.  0.49159  56.9
A3716  B1  312.7735  -52.8976  4.1  0.40  28.  –  48.4
A3716  B2  313.1888  -52.4785  3.2  0.23  21.  –  22.6
A3880  M  336.9796  -30.5474  3.8  0.33  -50.  0.25840  44.1
A3880  B1  336.8684  -30.8171  2.5  0.30  64.  –  30.9
A3880  B2  336.7356  -30.7839  2.7  0.35  46.  –  28.4
IIZW108  M  318.4443  2.5706  2.6  0.38  42.  0.49940  33.4
IIZW108  S1  318.6247  2.5533  3.2  0.36  -14.  0.08565  42.3
IIZW108  B1  318.3335  2.7751  1.6  0.24  43.  –  44.5
IIZW108  B2  318.5190  2.8039  2.5  0.44  22.  –  33.6
MKW3s  M  230.3916  7.7281  2.3  0.30  -8.  0.37614  48.7
MKW3s  S1  230.4576  7.8769  3.3  0.46  -22.  0.04585  39.3
MKW3s  B1  230.7349  7.8882  2.0  0.40  16.  –  25.5
RX0058  M  14.5875  26.8816  2.4  0.22  -31.  0.31967  44.2
RX0058  S1  14.7652  27.0424  3.8  0.60  64.  0.31661  50.6
RX0058  B1  14.4012  26.7041  3.3  0.08  -12.  –  28.9
RX1740  M  265.1398  35.6416  2.8  0.21  -26.  0.17896  42.1
RX1740  S1  265.2600  35.4366  3.3  0.22  38.  0.01946  31.7
RX1740  S2  264.8688  35.6053  3.7  0.52  28.  0.01340  27.9
RX1740  S3  265.0744  35.8116  3.5  0.44  -7.  0.01166  21.8
Z2844  M  150.7281  32.6483  2.9  0.20  -85.  0.10143  48.3
Z2844  S1  150.6524  32.7621  5.2  0.58  63.  0.04930  50.5
Z2844  S2  150.5821  32.8890  2.6  0.39  0.  0.00395  23.1
Z8338  M  272.7447  49.9078  3.0  0.37  -67.  0.45876  43.1
Z8338  S1  272.8606  49.7916  3.2  0.11  62.  0.05549  31.9
Z8338  S2  272.6903  49.9737  3.1  0.67  62.  0.07089  25.7
Z8338  B1  272.4479  49.6815  1.7  0.18  9.  –  25.8
Z8852  M  347.6024  7.5824  2.7  0.41  -45.  0.76110  67.9
Z8852  S1  347.5926  7.3999  5.8  0.56  62.  0.12022  32.4
Z8852  S2  347.6986  7.8018  2.3  0.13  81.  0.02493  25.5
Z8852  B1  347.7381  7.6808  2.1  0.25  -73.  –  35.9
Z8852  B2  347.4951  7.8165  2.4  0.21  72.  –  27.0

Online Material

Fig. 6. Isodensity contours (logarithmically spaced) of the 55 clusters with significant structures. The title lists the coordinates of the center. The orientation is East to the left, North to the top. Galaxies belonging to the systems detected by DEDICA are shown as dots of different colors. Black, light green, blue, red, magenta, dark green are for the main system and the subsequent substructures ordered as in Table A.1.
Large symbols are for galaxies with MV ≤ −17.0 that lie where local densities are higher than the median local density of the structure the galaxy belongs to. Open symbols mark the positions of the first- and second-ranked cluster galaxies, BCG1 and BCG2 respectively.

Introduction
The Data
The DEDICA Procedure
The probability density
Cluster Identification
Cluster Significance and of Membership Probability
Tweaking the Algorithm with Simulations
Substructure detection in WINGS clusters
The Catalog of Substructures
Properties of substructures
Brightest Cluster Galaxies
Summary
The catalog of substructures

ABSTRACT

We search for and characterize substructures in the projected distribution of galaxies observed in the wide field CCD images of the 77 nearby clusters of the WIde-field Nearby Galaxy-cluster Survey (WINGS).
This sample is complete in X-ray flux in the redshift range 0.04
<|startoftext|> Ising-like dynamics and frozen states in systems of ultrafine magnetic particles

Stefanie Russ and Armin Bunde
Institut für Theoretische Physik III, Justus-Liebig-Universität Giessen, D-35392 Giessen, Germany
(Dated: February 28, 2022)

We use Monte Carlo simulations to study aging phenomena and the occurrence of spin-glass phases in systems of single-domain ferromagnetic nanoparticles under the combined influence of dipolar interaction and anisotropy energy, for different combinations of positional and orientational disorder. We find that the magnetic moments orient themselves preferentially parallel to their anisotropy axes and that changes of the total magnetization are achieved solely by 180 degree flips of the magnetic moments, as in Ising systems. Since the dipolar interaction favors the formation of antiparallel chain-like structures, antiparallel chain-like patterns are frozen in at low temperatures, leading to aging phenomena characteristic of spin glasses. Contrary to intuition, these aging effects are more pronounced in ordered than in disordered structures.

PACS numbers: 75.75.+a, 75.40.Mg, 75.50.Lk, 75.50.Tt

INTRODUCTION

In the last decade, systems of ultrafine magnetic nanoparticles have received considerable interest, due both to their important technological applications (mainly in magnetic storage and recording) and to their rich and often unusual experimental behavior, which is related to their role as a complex mesoscopic system [1, 2]. It has been a matter of controversy under which circumstances these systems are able to show spin-glass phases. While experiments on disordered magnetic materials present indications of a spin-glass phase [2, 3, 4] or of a glass-like random anisotropy system [5], the situation is less clear on the theoretical side.
Simulations of the zero-field cooling (ZFC) and field-cooling susceptibility showed no indication of a spin-glass phase [6, 7]. In contrast, simulations of aging [8] (on a simplified system, where the dipolar interaction was only considered up to a cut-off radius) and of magnetic relaxation [9, 10] favor the spin-glass hypothesis, but the structure of the frozen history-dependent states, as well as the actual mechanism leading to them, has not yet been clarified.

In this Letter, in order to clarify these questions, we use Monte Carlo simulations [11] to study aging phenomena in a large variety of systems of ultrafine magnetic nanoparticles (see Fig. 1). Our simulations not only point to the existence of frozen history-dependent states at low temperatures that are characteristic of spin glasses, but also yield an insight into the structure of the frozen states and the underlying dynamics. We find that, under the combined influence of dipolar and anisotropy energy, the magnetic moments have a tendency to align in an Ising-like manner either parallel or antiparallel to their anisotropy axes and to change their directions by 180 degree flips, as in Ising systems. In this way, chain-like structures are formed in which all magnetic moments point in the same direction, and neighboring chains have the tendency to orient themselves in an antiparallel way.

FIG. 1: Two-dimensional sketches of the geometries considered in this paper: (a) cubic arrangement of the particles and all anisotropy axes aligned in the z-direction, (b) liquid-like arrangement and all axes aligned, (c) cubic arrangement and all axes randomly oriented and (d) liquid-like arrangement and all axes randomly oriented. In the simulations, the systems were three-dimensional (64 particles per cube).
These topological chains, which freeze in at low temperatures, form simple straight lines when the particles are arranged on the sites of a cubic lattice [10] and complex winding curves when the arrangement of the particles is liquid-like. As a consequence, if a small external magnetic field is applied, the magnetic moments can follow the field more easily in a disordered system than in the ordered configuration. This leads, contrary to intuition, to more pronounced aging effects (characteristic of spin glasses) in ordered than in disordered structures.

http://arxiv.org/abs/0704.0580v1

MODEL SYSTEM AND NUMERICAL SIMULATIONS

For the numerical calculations, we focus on the same model as in earlier papers [6, 9], which (i) assumes a coherent magnetization rotation within the anisotropic particles, and (ii) takes into account the magnetic dipolar interaction between them. Every particle $i$ of volume $V_i$ is considered to be a single magnetic domain with magnetic moment $\vec{\mu}_i$, with all its atomic magnetic moments rotating coherently; the $V_i$ are taken from a Gaussian distribution of width $\sigma_V = 0.4$ and $\langle V \rangle = 1$ (see also [6, 9]). This results in a constant absolute value $|\vec{\mu}_i| = M_s V_i$ of the total magnetic moment of each particle, where $M_s$ is the saturation magnetization. The energy of each particle consists of three contributions: anisotropy energy, dipolar interaction and magnetic energy of an external field. We assume a temperature-independent uniaxial anisotropy energy $E_A^{(i)} = -K V_i \left( \vec{\mu}_i \cdot \vec{n}_i / |\vec{\mu}_i| \right)^2$, where $K$ is the anisotropy constant and the unit vector $\vec{n}_i$ denotes the easy direction. In addition, the magnetic moments are coupled to an external field $\vec{H}$, leading to the field energy $E_H^{(i)} = -\vec{\mu}_i \cdot \vec{H}$. Finally, the energy of the magnetic dipolar interaction between two particles $i$ and $j$ separated by $\vec{r}_{ij}$ is given by

$$E_D^{(i,j)} = \frac{\vec{\mu}_i \cdot \vec{\mu}_j}{r_{ij}^3} - \frac{3 (\vec{\mu}_i \cdot \vec{r}_{ij}) (\vec{\mu}_j \cdot \vec{r}_{ij})}{r_{ij}^5}.$$

Adding up the three energy contributions and summing over all $N$ particles, we obtain the total energy

$$E = \sum_{i} \left[ E_A^{(i)} + E_H^{(i)} + \frac{1}{2} \sum_{j \neq i} E_D^{(i,j)} \right]. \qquad (1)$$

In the Monte Carlo simulations we concentrate on samples of $N = L^3$ particles placed inside a cube of side length $L = 4$ and average over 1000 configurations. During the simulations, both the positions of the particles and their easy axes are kept fixed. The unitless concentration $c$ is defined as the ratio between the total volume $\sum_i V_i$ occupied by the particles and the volume $V_s$ of the sample. Here, we focus on the concentration $c/c_0 \approx 0.3$, where $c_0 = 2K/M_s^2$ is a dimensionless material-dependent constant, $c_0 \sim 1.4$ for iron nitride and $c_0 \sim 2.1$ for maghemite nanoparticles [9]. We also tested systems with higher concentrations, $c/c_0 \approx 0.4$ and (the extremely high concentration) $c/c_0 \approx 0.6$, and found that the results remain qualitatively unchanged. The temperature is measured in units of the reduced temperature $\tilde{T} \equiv 1/(2\beta K V)$, where $2KV$ is the height of the anisotropy barrier and $\beta = 1/(k_B T)$. Similarly, the magnetic field is measured in units of the anisotropy field $H_a = 2K/M_s$. The relaxation of the individual magnetic moments is simulated by the standard Metropolis algorithm [12]. In contrast to [8], where dipole interactions between the particles were only considered up to a cut-off radius, we calculate the interaction energies by the Ewald sum method with periodic boundary conditions in the x-, y- and z-directions [6, 13] and are thus able to account fully for the long-range character of the dipole forces. The magnetic moment $\vec{\mu}_i$ is characterized by the spherical angles $\theta_i$ and $\varphi_i$ relative to a coordinate frame in which the z-axis is parallel to the external field [9, 14, 15].

FIG. 2: (Color online) The magnetization $m(\tau)$ after waiting times $t_w = 0$ (filled symbols) and $t_w = 10000$ Monte Carlo steps (open symbols), plotted versus $\tau$ (number of Monte Carlo steps with applied external field) for (a) cubic lattice and aligned axes, (b) liquid-like system and aligned axes, (c) cubic system and random axes and (d) liquid-like system and random axes, for the reduced temperatures $\tilde{T} = k_B T/(2KV) = 1/5$ (black symbols, circles), $\tilde{T} = 1/10$ (red symbols, squares) and $\tilde{T} = 1/40$ (blue symbols, diamonds).

To study the magnetic relaxation, we determine as a function of time $t$ (number of Monte Carlo steps) for each particle $i$ the angle $\theta_i$ between the magnetic moment $\vec{\mu}_i$ and the z-axis, from which we obtain the relevant quantities, e.g. the normalized magnetization

$$m(t) = \frac{1}{N} \sum_{i=1}^{N} \cos\theta_i(t). \qquad (2)$$

To obtain the orientation of $\vec{\mu}_i$ relative to $\vec{n}_i$, we introduce the "orientational order parameter" $O_\mu \equiv \langle |\vec{\mu}_i \cdot \vec{n}_i| \rangle$, i.e. the average of the absolute values of the scalar product $\vec{\mu}_i \cdot \vec{n}_i$ over all $N$ particles and all configurations. $O_\mu$ does not distinguish between parallel and antiparallel alignment. It is equal to zero when all $\vec{\mu}_i$ are perpendicular to their axes $\vec{n}_i$ and equal to 1 if they are all parallel or antiparallel to them.

To study aging phenomena, we determine the magnetization in a ZFC simulation. First, starting from a random configuration of the magnetic moments, the system is cooled down in the absence of an external field from $T = \infty$ to a reduced temperature $\tilde{T}$ with a constant cooling rate of $\Delta\beta/\Delta t = 0.1$, corresponding to 400 Monte Carlo steps for $\tilde{T} = 1/40$ and 10 steps for $\tilde{T} = 1$. Second, the cooling process is stopped at $\tilde{T}$ and the system is allowed to relax for a certain waiting time $t_w$.
Finally, in the third step, a small external field $h = 0.1 H_a$ is applied in the z-direction. The magnetization $m(\tau)$ is determined as a function of $\tau \equiv t - t_w$ (number of Monte Carlo steps after switching on the field). Aging effects are represented by differences between the $m(\tau)$ curves for different $t_w$ and occur when many different relaxation rates exist in the system, so that after a given waiting time $t_w$ the system has only partly relaxed towards equilibrium. Experimentally, aging effects have already been found in several spin glasses, e.g. in Permalloy/alumina granular films [16], rare-earth manganates [17], CuMn spin glasses [18], multilayer systems [19] and Fe3N nanoparticle systems [20].

NUMERICAL RESULTS

Figure 2 shows $m(\tau)$ for the systems of Fig. 1 without waiting time, $t_w = 0$ (filled symbols), and for $t_w = 10^4$ (open symbols). The different colors (and symbols) stand for three different temperatures $\tilde{T} = 1/5$, $1/10$ and $1/40$. Clearly, all curves show aging effects, similar to the experimental results of Refs. [16, 17, 18]. Systems with no or only small $t_w$ follow the external field faster than the systems with longer waiting times, indicating that the longer relaxation leads to more stable chains. The aging effects are most pronounced for those systems where all anisotropy axes are oriented in the direction of the external field (Fig. 2(a,b)) and less pronounced but still visible for the systems with disordered anisotropy axes (Fig. 2(c,d)). In these systems with orientational disorder, the $m(\tau)$ curves coincide for small $\tau$ and show aging effects only after a certain crossover time (close to $10^2$ Monte Carlo steps). This indicates that in these systems a certain fraction of dipoles does not belong to quasi-stable chain-like structures and can follow the external field nearly instantaneously, independently of the waiting time, and thus dominates the short-time behavior.
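The three-step ZFC protocol that produces these curves (cooling, waiting, field) can be written down as a per-step schedule. The following Python sketch is only an illustration under stated assumptions: the function name `zfc_schedule` and its parameters are hypothetical, inverse temperature is expressed as $1/\tilde{T}$, and the field is given in units of $H_a$.

```python
def zfc_schedule(t_reduced, t_wait, n_field_steps, h=0.1, dbeta=0.1):
    """Return a list of (inverse reduced temperature, field) pairs,
    one entry per Monte Carlo step of the three-step protocol."""
    beta_final = 1.0 / t_reduced
    n_cool = round(beta_final / dbeta)
    schedule = []
    # Step 1: cool from T = infinity (beta = 0) at constant rate, no field.
    for k in range(1, n_cool + 1):
        schedule.append((k * dbeta, 0.0))
    # Step 2: hold the final temperature for t_wait steps, still no field.
    schedule.extend([(beta_final, 0.0)] * t_wait)
    # Step 3: switch on the small field and record the relaxation.
    schedule.extend([(beta_final, h)] * n_field_steps)
    return schedule


cooling_only_cold = zfc_schedule(1 / 40, 0, 0)
cooling_only_warm = zfc_schedule(1.0, 0, 0)
full_run = zfc_schedule(1 / 40, 10000, 1000)
```

With the cooling rate 0.1 per step, the cooling stage has 400 entries for $\tilde{T} = 1/40$ and 10 entries for $\tilde{T} = 1$, which matches the step counts quoted in the text. A Metropolis sweep would then be run once per schedule entry; that part is omitted here.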
The aging effects decrease with increasing $\tilde{T}$, when the order is destroyed by the thermal fluctuations.

In order to understand the dynamical behavior in a more microscopic way, we compare $m(\tau)$ with the time dependence of the corresponding orientational order parameters $O_\mu(\tau)$. Figure 3 shows $O_\mu$ in the third step of the aging process for $t_w = 0$ and $t_w = 10000$ (filled and open symbols, respectively) and for the same geometries as before (see Fig. 1). The figure shows that, quite contrary to expectation, apart from a slight minimum at intermediate $\tau$, $O_\mu$ is constant in time for the systems with $t_w = 10000$. Without waiting time, the curves start at much smaller values of $O_\mu$ but increase rapidly until they reach the common plateau value at a crossover time $\tau_c$ of about $10^3$ Monte Carlo steps. In the plateau regime, the dipolar moments $\vec{\mu}_i$ are oriented either parallel or antiparallel to their easy axes $\vec{n}_i$ and therefore flip only between these two directions. Accordingly, the value of $O_\mu$ depends neither on the external field nor on the functional form of $m(\tau)$. Since $O_\mu(\tau)$ stays constant for large $t_w$ or $\tau > \tau_c$, while $m(\tau)$ increases with time (see Fig. 2), the $\vec{\mu}_i$ have already reached their parallel or antiparallel position and can only perform spin flips by 180 degrees, thereby increasing $m(\tau)$ and leaving $O_\mu$ unchanged. To make this point still clearer, we plot in Fig. 4 the percentage $N_{up}$ of particles pointing upwards, i.e. with $\theta_i < \pi/2$, again for the geometries of Fig. 1. The similarity between Fig. 4 and Fig. 2 is obvious, showing that the number of magnetic moments oriented upwards determines the shape of $m(\tau)$.

FIG. 3: (Color online) The order parameter $O_\mu$ after a waiting time $t_w = 0$ (filled symbols) and $t_w = 10000$ (open symbols), plotted versus $\tau$ (number of Monte Carlo steps) for the same geometries, temperatures, system parameters, symbols and colors as in Fig. 2.

FIG. 4: (Color online) The percentage $N_{up}$ of particles per system pointing upwards after waiting times $t_w = 0$ (filled symbols) and $t_w = 10000$ (open symbols), plotted versus $\tau$ (number of Monte Carlo steps) for the same geometries, temperatures, system parameters, symbols and colors as in Fig. 2.

FIG. 5: (Color online) The magnetization $m(\tau)$ after a waiting time $t_w = 0$ (filled symbols) and $t_w = 10000$ (open symbols), plotted versus $\tau$ (number of Monte Carlo steps) for systems without dipole interaction, at $\tilde{T} = 1/10$ (red symbols) and $\tilde{T} = 1/40$ (blue symbols), for aligned and randomly oriented anisotropy axes (red circles and diamonds, respectively, for $\tilde{T} = 1/10$; blue squares and triangles, respectively, for $\tilde{T} = 1/40$).

We therefore arrive at a remarkably simple Ising-like dynamics of these ultrafine magnetic particles. The amount of aging is directly related to the degree of order a system can achieve during $t_w$. In the fully ordered system of Figs. 1-3(a), after a long waiting time $t_w$, the $\vec{\mu}_i$ prefer to be aligned in stable chains [10] along the z-direction and thus cannot follow an external field easily. Single magnetic moments inside a chain will hardly flip to the other side, and flips of whole chains possess extremely large relaxation times. Without waiting time, on the other hand, the $\vec{\mu}_i$ are in unstable positions, which allows them to follow the external field quite rapidly, leading to large aging effects in ordered systems. As Figs. 2(b-d) show, the situation is different in systems with positional and/or orientational disorder. The relaxation times for spin flips decrease with the amount of disorder, in particular with the amount of orientational disorder.
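The three observables compared above ($m$, $O_\mu$ and $N_{up}$) are simple averages over the moment directions. The Python sketch below is an illustration rather than the authors' code: it assumes that moments and easy axes are stored as unit vectors, the function name `observables` is hypothetical, and $N_{up}$ is returned as a fraction rather than a percentage.

```python
def dot(a, b):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def observables(moments, axes):
    """Return (m, O_mu, N_up) for unit moments and unit easy axes.

    m    : mean z-component, i.e. the average of cos(theta_i)
    O_mu : mean |mu_i . n_i|, blind to parallel versus antiparallel
    N_up : fraction of moments with theta_i < pi/2 (positive z-component)
    """
    n = len(moments)
    m = sum(mu[2] for mu in moments) / n
    o_mu = sum(abs(dot(mu, ax)) for mu, ax in zip(moments, axes)) / n
    n_up = sum(1 for mu in moments if mu[2] > 0) / n
    return m, o_mu, n_up


# Four moments on easy axes aligned with z; one moment flips by 180 degrees.
axes = [(0.0, 0.0, 1.0)] * 4
before = [(0, 0, 1), (0, 0, 1), (0, 0, -1), (0, 0, -1)]
after = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, -1)]

m0, o0, up0 = observables(before, axes)
m1, o1, up1 = observables(after, axes)
```

In this toy configuration a single 180 degree flip raises $m$ and $N_{up}$ while $O_\mu$ stays at 1. For moments that are all parallel or antiparallel to axes aligned with z, $m = 2 N_{up} - 1$ holds exactly, which is one way to read the similarity of Fig. 4 and Fig. 2 for the aligned-axes panels.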
When the chains are winding and aligned in different directions, they are less stable and possess a large variety of intermediate positions through which to flip to the other side. Accordingly, aging effects become weaker with increasing disorder.

For illustration, we visualize the aging process in Fig. 6 for the system with the highest order and the strongest aging effects, i.e. for the cubic system with aligned anisotropy axes. For this visualization, we follow the definition of the transversal order parameter of Ref. [10]: each of the $L^2$ sites in the xy plane is either a + site or a − site, if a chain has already been formed and all magnetic moments in the chain point in the positive or negative z direction, respectively (white sites). If this is not the case, the site is a 0 site (grey sites). The figure shows that chains are formed in the second step of the aging process during the waiting time $t_w$, as can most easily be seen by comparing Fig. 6(a), where $t_w = 10000$, with Fig. 6(d), where $t_w = 0$. In (a), many chains are formed during $t_w$ that appear to be quite stable in the following third step of the aging process (Fig. 6(b,c)), when an external field is applied in the + direction. We can see that most of the − chains persist in spite of the external field. The situation is different in Fig. 6(d-f), where only few chains exist at the end of the second step of the aging process (Fig. 6(d)). Here, after switching on the external magnetic field, new chains can be built from the 0 sites and the system therefore follows the field much more easily than in Fig. 6(a-c).

Recently, it has been argued that a broad distribution of anisotropy energy barriers might also lead to aging effects in superparamagnetic systems [20]. To show that these kinds of aging effects are in fact negligible compared with systems where both energy contributions are present, we have studied systems without dipole interaction (solely anisotropy energy) at temperatures $\tilde{T} = 1/10$ and $1/40$.
In this case, the particle positions play no role, so that the geometries of Fig. 1(a) and (b), as well as those of (c) and (d), are physically identical. The results of $m(\tau)$ for these two geometries are shown in Fig. 5 for the same aging procedure as before. The figure shows that the differences between the curves for $t_w = 0$ and $t_w = 10000$ are orders of magnitude smaller than in the systems with dipolar interaction. It is interesting to note that also for systems with only dipolar interaction, some kind of aging can be seen, but orders of magnitude smaller than for systems with both energy contributions.

In summary, analyzing the microscopic dynamics of ultrafine magnetic particles, we found that irrespective of the strength of the dipolar interaction, the dipoles orient themselves either parallel or antiparallel to their anisotropy axes. We therefore arrive at a remarkably simple picture of the dipole dynamics, where, similar to the Ising model, the $\vec{\mu}_i$ perform "spin flips" between these two orientations. Aging effects occur when, after a certain waiting time, the magnetic dipoles have arranged themselves in stable configurations and flips of single magnetic moments are suppressed. These aging effects increase in a counter-intuitive way with the order of the system and are thus most pronounced in completely ordered systems with cubic arrangement of the particles and axes aligned in the direction of the magnetic field.

FIG. 6: Visualization of the chains perpendicular to the xy-plane in the cubic system with aligned anisotropy axes at $\tilde{T} = 1/5$ for one typical system. The complete chains are indicated by white sites and by + or − signs, depending on the direction of the chain. Sites where chains have not (yet) been built are indicated by the grey shade. (a-c) System with waiting time $t_w = 10000$, i.e. (a) after the cooling process and $t_w = 10000$, (b,c) after an external magnetic field in the + direction has been applied for (b) 1000 and (c) 10000 Monte Carlo steps. (d-f) System without waiting time ($t_w = 0$), i.e. (d) after the cooling process (and $t_w = 0$), (e,f) after an external magnetic field in the + direction has been applied for (e) 1000 and (f) 10000 Monte Carlo steps.

ACKNOWLEDGEMENTS

We gratefully acknowledge very valuable discussions with W. Kleemann and financial support from the Deutsche Forschungsgemeinschaft.

[1] X. Batlle and A. Labarta, J. Phys. D 35, R15 (2002).
[2] Xi Chen, S. Sahoo, W. Kleemann, S. Cardoso, and P. P. Freitas, Phys. Rev. B 70, 172411 (2004).
[3] T. Jonsson, J. Mattsson, C. Djurberg, F. A. Khan, P. Nordblad, and P. Svedlindh, Phys. Rev. Lett. 75, 4138 (1995).
[4] R. W. Chantrell, M. El-Hilo, and K. O'Grady, IEEE Trans. Magn. 27, 3570 (1991).
[5] W. Luo, S. R. Nagel, T. F. Rosenbaum, and R. E. Rosensweig, Phys. Rev. Lett. 67, 2721 (1991).
[6] J. Garcia-Otero, M. Porto, J. Rivas, and A. Bunde, Phys. Rev. Lett. 84, 167 (2000).
[7] M. Porto, Eur. Phys. J. B 45, 369 (2005).
[8] J.-O. Andersson et al., Phys. Rev. B 56, 13983 (1997).
[9] M. Ulrich, J. Garcia-Otero, J. Rivas, and A. Bunde, Phys. Rev. B 67, 024416 (2003).
[10] S. Russ and A. Bunde, Phys. Rev. B 74, 064426 (2006).
[11] U. Nowak, R. W. Chantrell, and E. C. Kennedy, Phys. Rev. Lett. 84, 163 (2000).
[12] In every step, we select a particle $i$ at random and generate an attempted orientation of its magnetization, chosen in a spherical segment around the present orientation with an aperture angle $d\theta$ (see also Ref. [6]). By varying $d\theta$, i.e. the maximum jump angle, it is possible to modify the rate of acceptance and to optimize the simulation. As a compromise between simulations at low and high temperatures, we chose $d\theta = 0.1$ for all simulations, independent of temperature, which corresponds to an acceptance rate between 0.5 and 0.8 for $\tilde{T}$ between 1/40 and 1/5.
We also tested larger values of $d\theta$ with considerably lower acceptance rates and found that they did not change the final states significantly.
[13] M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids (Clarendon, Oxford, 1987).
[14] R. V. Chamberlin, G. Mozurkewich, and R. Orbach, Phys. Rev. Lett. 52, 867 (1984).
[15] K. L. Ngai and U. Strom, Phys. Rev. B 38, 10350 (1988).
[16] E. Vincent, Y. Yuan, J. Hamman, H. Hurdequint, and F. Guevara, J. of Mag. and Mag. Mat. 161, 209 (1996).
[17] A. K. Kundu, P. Nordblad, and C. N. R. Rao, Phys. Rev. B 72, 144423 (2005).
[18] L. Lundgren, P. Svedlindh, P. Nordblad, and O. Beckmann, Phys. Rev. Lett. 51, 811 (1983).
[19] S. Bedanta, O. Petracic, E. Kentzinger, W. Kleemann, U. Rücker, A. Paul, Th. Brückel, S. Cardoso, and P. P. Freitas, Phys. Rev. B 72, 024419 (2005).
[20] M. Sasaki, P. E. Jönsson, H. Takayama, and H. Mamiya, Phys. Rev. B 71, 104405 (2005).

ABSTRACT

We use Monte Carlo simulations to study aging phenomena and the occurrence of spin-glass phases in systems of single-domain ferromagnetic nanoparticles under the combined influence of dipolar interaction and anisotropy energy, for different combinations of positional and orientational disorder. We find that the magnetic moments orient themselves preferably parallel to their anisotropy axes and that changes of the total magnetization are solely achieved by 180 degree flips of the magnetic moments, as in Ising systems. Since the dipolar interaction favors the formation of antiparallel chain-like structures, antiparallel chain-like patterns are frozen in at low temperatures, leading to aging phenomena characteristic of spin glasses. Contrary to intuition, these aging effects are more pronounced in ordered than in disordered structures.

<|endoftext|><|startoftext|>

Introduction

Let $G$ be a finite group and $V$ be a finite $G$–module of characteristic $p$. If $(|G|, |V|) = 1$, then in [3, Theorem 2.2] R.
Knörr presented a beautiful argument showing how to obtain strong upper bounds for $k(GV)$ (the number of conjugacy classes of $GV$) by using only information on $C_G(v)$ for a fixed $v \in V$. Note that his result immediately implies the important special case that if $G$ has a regular orbit on $V$ (i.e., there is a $v \in V$ with $C_G(v) = 1$), then $k(GV) \le |V|$, which was a crucial result in the solution of the $k(GV)$–problem. In this note we give a much shorter proof of this result (see Proposition 3.1 below). The main objective of the paper, however, is to modify and generalize Knörr's argument in various directions to include non–coprime situations. This way we obtain a number of bounds on certain subsets of $\mathrm{Irr}(GV)$, such as the following:

Theorem A. Let $G$ be a finite group and let $V$ be a finite $G$–module of characteristic $p$. Let $v \in V$ and $C = C_G(v)$, and suppose that $(|C|, |V|) = 1$. Then the number of irreducible characters of $GV$ whose restriction to $\langle v \rangle$ is not a multiple of the regular character of $\langle v \rangle$ is bounded above by
$$\sum_{i=1}^{k(C)} |C_V(c_i)|,$$
where the $c_i$ are representatives of the conjugacy classes of $C$.

Theorem B. Let $G$ be a finite group and $V$ be a finite $G$–module. Let $g \in G$ be of prime order not dividing $|V|$. Then the number of irreducible characters of $GV$ whose restriction to $A = \langle g \rangle$ is not a multiple of the regular character of $\langle g \rangle$ is bounded above by
$$|C_G(g)| \, n(C_G(g), C_V(g)),$$
where $n(C_G(g), C_V(g))$ denotes the number of orbits of $C_G(g)$ on $C_V(g)$.

Stronger versions and refinements of these results are proved in the paper. It is hoped that these results prove useful in solving the non–coprime $k(GV)$–problem, as discussed, for instance, in [2] and [1]. Theorems A and B will be proved in Sections 3 and 4 below, respectively.

In Section 2, we will generalize a recent result of P. Schmid [5, Theorem 2(a)] stating that in the situation of the $k(GV)$–problem, if $G$ has a regular orbit on $V$, then $k(GV) = |V|$ can only hold if $G$ is abelian. We prove

Theorem C.
Let $G$ be a finite group and $V$ a finite faithful $G$–module with $(|G|, |V|) = 1$. Suppose that $G$ has a regular orbit on $V$. Then
$$k(GV) \le |V| - |G| + k(G).$$

Our proof is different from the approach taken in [5], and we will actually prove a slightly stronger result that includes some non–coprime actions.

Notation: If the group $A$ acts on the set $B$, we write $n(A, B)$ for the number of orbits of $A$ on $B$. All other notation is standard or explained along the way.

2 k(GV) = |V| and regular orbits

In this paper we often work under the hypothesis of the $k(GV)$–problem, which is the following.

2.1 Hypothesis. Let $G$ be a finite group and let $V$ be a finite faithful $G$–module such that $(|G|, |V|) = 1$. Write $p$ for the characteristic of $V$.

In [5, Theorem 2(a)] P. Schmid proved that under Hypothesis 2.1, if $G$ has a regular orbit on $V$, $V$ is irreducible, and $k(GV) = |V|$, then $G$ is abelian, and from this it follows easily that either $|G| = 1$ and $|V| = p$, or $G$ is cyclic of order $|V| - 1$. The proof in [5] is somewhat technical. The goal of this section is to give a short proof of a generalization of Schmid's result based on a beautiful argument of Knörr [3]. We word it in such a way that we do not even need the coprime hypothesis, so that the result may also be useful for studying the non–coprime $k(GV)$–problem. To do this, for any group $X$ and $x \in X$ we introduce the set
$$\mathrm{Irr}(X, x) = \{\chi \in \mathrm{Irr}(X) \mid \chi|_{\langle x \rangle} \text{ is not an integer multiple of the regular character of } \langle x \rangle\}$$
and write $k(X, x) = |\mathrm{Irr}(X, x)|$.

2.2 Theorem. Let $G$ be a finite group and let $V$ be a finite $G$–module such that $G$ possesses a regular orbit on $V$. Let $v \in V$ be a representative of such an orbit. Then
$$k(GV, v) \le |V| - |G| + k(G).$$

Proof. Let $p$ be the characteristic of $V$. We proceed exactly as in Case (ii) of the proof of [3, Theorem 2.2]. Write $C = C_G(v)$.
As $C = 1$, we see that for $A = \langle v \rangle$ we trivially have that $|C|$ and $|A|$ are coprime, and so that proof yields

$$(p - 1)|V| = \sum_{\tau \in \mathrm{Irr}(GV)} (\tau\eta, \tau)_A, \qquad (1)$$

where $\eta$ is the character of $A$ defined by $\eta = p 1_A - \rho_A$, with $\rho_A$ being the regular character of $A$. Now for any $\tau \in \mathrm{Irr}(GV)$ we have

$$(\tau\eta, \tau)_A = \frac{1}{|A|} \sum_{a \in A} \tau(a)\,(p - \rho_A(a))\,\overline{\tau(a)} = \sum_{1 \ne a \in A} |\tau(a)|^2 \quad \begin{cases} = 0 & \text{if } \tau|_A \text{ is an integer multiple of } \rho_A, \\ \ge p - 1 & \text{otherwise,} \end{cases} \qquad (2)$$

where the last step follows from [4, Corollary 4]. Next observe that if $\tau \in \mathrm{Irr}(GV)$ with $V \le \ker \tau$, then $\tau \in \mathrm{Irr}(G)$ and clearly $\tau|_A$ is not a multiple of $\rho_A$, and then clearly

$$(\tau\eta, \tau)_A = \sum_{1 \ne a \in A} |\tau(a)|^2 = \sum_{1 \ne a \in A} \tau(1)^2 = (p - 1)\tau(1)^2. \qquad (3)$$

Thus with (1), (2), and (3) we get

$$(p - 1)|V| = \sum_{\tau \in \mathrm{Irr}(G)} (\tau\eta, \tau)_A + \sum_{\substack{\tau \in \mathrm{Irr}(GV) \\ V \not\le \ker \tau}} (\tau\eta, \tau)_A \ \ge\ \sum_{\tau \in \mathrm{Irr}(G)} (p - 1)\tau(1)^2 + (k(GV, v) - k(G))(p - 1),$$

which yields

$$|V| \ge \sum_{\tau \in \mathrm{Irr}(G)} \tau(1)^2 + k(GV, v) - k(G) = |G| + k(GV, v) - k(G).$$

This implies the assertion of the theorem, and we are done. ✸

The following consequence implies Schmid's result [5, Theorem 2(a)].

2.3 Corollary. Assume Hypothesis 2.1 and that $G$ has a regular orbit on $V$. Then
$$k(GV) \le |V| - |G| + k(G).$$
In particular, if $k(GV) = |V|$, then $G$ is abelian.

Proof. By Ito's theorem and as $(|G|, |V|) = 1$, we know that $\chi(1)$ divides $|G|$ for every $\chi \in \mathrm{Irr}(GV)$, so in particular $p$ does not divide $\chi(1)$. Thus for any $v \in V^{\#}$ we see that $\chi|_{\langle v \rangle}$ cannot be an integer multiple of $\rho_{\langle v \rangle}$. Therefore $k(GV, v) = k(GV)$. Now the assertion follows from Theorem 2.2. ✸

3 Bounds for k(GV)

In this section we study more variations of Knörr's argument in [3, Theorem 2.2] and generalize it to some non–coprime situations. We begin, however, by looking at a classical application of it. An important and immediate consequence of Knörr's result is that if under Hypothesis 2.1 $G$ has a regular orbit on $V$, then $k(GV) \le |V|$. This important result can be obtained in the following shorter way.

3.1 Proposition. Let $G$ be a finite group and let $V$ be a finite faithful $G$–module. Let $v \in V$.
Then $k(GV, v) \le |C_G(v)||V|$; in particular, if $(|G|, |V|) = 1$ and $G$ has a regular orbit on $V$, then $k(GV) \le |V|$.

Proof. Put $A = \langle v \rangle$. If $\tau \in \mathrm{Irr}(GV, v)$, then by [4, Corollary 4] we know that $\sum_{1 \ne a \in A} |\tau(a)|^2 \ge p - 1$. With this and well–known character theory we get

$$\begin{aligned}
(p - 1)\,k(GV, v) &\le k(GV, v) \min_{\tau \in \mathrm{Irr}(GV, v)} \sum_{1 \ne a \in A} |\tau(a)|^2 \\
&\le \sum_{\tau \in \mathrm{Irr}(GV)} \sum_{1 \ne a \in A} |\tau(a)|^2 \\
&= \sum_{1 \ne a \in A} \sum_{\tau \in \mathrm{Irr}(GV)} \tau(a)\overline{\tau(a)} \\
&= \sum_{1 \ne a \in A} |C_{GV}(a)| = \sum_{1 \ne a \in A} |C_G(v)||V| = (p - 1)|C_G(v)||V|.
\end{aligned}$$

This implies the first result. If $(|G|, |V|) = 1$, then by Ito's result $\tau(1)$ divides $|G|$ for all $\tau \in \mathrm{Irr}(GV)$, so $p$ cannot divide $\tau(1)$, and thus $k(GV, v) = k(GV)$, and the second result now follows by choosing $v$ to be in a regular orbit of $G$ on $V$. ✸

Now we turn to generalizing Knörr's argument. We discuss various ways to do so.

3.2 Remark. Let $G$ be a finite group and let $V$ be a finite faithful $G$–module of characteristic $p$. Let $v \in V$ and put $C = C_G(v)$ and $A = \langle v \rangle$. Let
$$\mathrm{Irr}(GV, C, v) := \mathrm{Irr}_0(GV) := \mathrm{Irr}(GV) - \{\chi \in \mathrm{Irr}(GV) \mid \chi|_{C \times \langle v \rangle} = \tau \times \rho_A \text{ for a character } \tau \text{ of } C\}$$
and
$$\mathrm{Irr}_{p'}(GV) = \{\chi \in \mathrm{Irr}(GV) \mid p \text{ does not divide } \chi(1)\},$$
so that clearly $\mathrm{Irr}_{p'}(GV) \subseteq \mathrm{Irr}_0(GV)$. Note that if $(|G|, |V|) = 1$, then by Ito $\mathrm{Irr}(GV) = \mathrm{Irr}_{p'}(GV)$.

To work towards our next result, we again proceed somewhat similarly as in [3, Theorem 2.2]. In the following we work under the hypothesis that $(|C|, |V|) = 1$. Let $N = N_G(A)$. Then $|N : C|$ divides $p - 1$. Moreover, from Knörr's proof we know that if $c_i$ ($i = 1, \ldots, k(C)$) with $c_1 = 1$ are representatives of the conjugacy classes of $C$ and $a_j$ ($j = 1, \ldots, \frac{p-1}{|N:C|}$) are representatives of the $N$–conjugacy classes of $A - 1$, then the $c_i a_j$ are representatives of those conjugacy classes of $GV$ which intersect $C \times (A - 1)$ nontrivially. Moreover, recall from Knörr's proof that for $c \in C$, $1 \ne a \in A$, $g \in G$, $u \in V$ we know that $(ca)^{gu} \in C \times A$ if and only if $g \in N$ and $u \in C_V(c^g)$.

Now define a character $\eta$ on $C \times A$ by $\eta = 1_C \times (p 1_A - \rho_A)$.
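Then for $c \in C$, $a \in A$ we have

$$\eta(ca) = \begin{cases} p, & \text{if } a \ne 1, \\ 0, & \text{if } a = 1. \end{cases}$$

Therefore $\eta^{GV}$ vanishes on all conjugacy classes of $GV$ which intersect $C \times (A - 1)$ trivially, whereas for $c \in C$, $1 \ne a \in A$ we have that

$$\eta^{GV}(ca) = \frac{1}{|C \times A|} \sum_{\substack{g \in G \\ u \in V}} \dot{\eta}((ca)^{gu}) = \frac{1}{|C|\,p} \sum_{g \in N} \sum_{u \in C_V(c^g)} \eta(c^g a^g) = \frac{1}{|C|\,p} \sum_{g \in N} p\,|C_V(c^g)| = \frac{|N|}{|C|}\,|C_V(c)| = |N : C|\,|C_V(c)|.$$

Thus if $x_i$ ($i = 1, \ldots, k(GV)$) are representatives of the conjugacy classes of $GV$, then we get

$$\sum_{i=1}^{k(GV)} \eta^{GV}(x_i) = \sum_{i=1}^{k(C)} \sum_{j=1}^{\frac{p-1}{|N:C|}} \eta^{GV}(c_i a_j) = \sum_{i=1}^{k(C)} \sum_{j=1}^{\frac{p-1}{|N:C|}} |N : C|\,|C_V(c_i)| = \frac{p-1}{|N : C|}\,|N : C| \sum_{i=1}^{k(C)} |C_V(c_i)| = (p - 1) \sum_{i=1}^{k(C)} |C_V(c_i)|,$$

and thus

$$(p - 1) \sum_{i=1}^{k(C)} |C_V(c_i)| = \sum_{i=1}^{k(GV)} \eta^{GV}(x_i) = \sum_{\tau \in \mathrm{Irr}(GV)} (\tau\eta^{GV}, \tau)_{GV} = \sum_{\tau \in \mathrm{Irr}(GV)} (\tau\eta, \tau)_{C \times A}. \qquad (1)$$

Now if $\tau \in \mathrm{Irr}(GV)$, as in [3] write

$$\tau|_{C \times A} = \sum_{\lambda \in \mathrm{Irr}(A)} \tau_\lambda \times \lambda, \qquad (2)$$

where $\tau_\lambda$ is a character of $C$ or $\tau_\lambda = 0$. Then as in [3] we see that

$$(\tau\eta, \tau)_{C \times A} = \frac{1}{|C \times A|} \sum_{\substack{c \in C \\ a \in A}} \tau(ca)\,\eta(ca)\,\overline{\tau(ca)} = \frac{1}{|C|} \sum_{\substack{c \in C \\ 1 \ne a \in A}} \tau(ca)\,\overline{\tau(ca)} = \sum_{\lambda < \mu} \big((\tau_\lambda - \tau_\mu), (\tau_\lambda - \tau_\mu)\big)_C, \qquad (3)$$

where "$\le$" is some arbitrary ordering on $\mathrm{Irr}(A)$. Now if $\tau_\lambda - \tau_\mu$ is a nonzero multiple of $\rho_C$, then

$$(\tau_\lambda - \tau_\mu, \tau_\lambda - \tau_\mu)_C \ge |C| \qquad (4)$$

and thus $(\tau\eta, \tau)_{C \times A} \ge |C|$. Moreover, note that if $\tau \in \mathrm{Irr}_0(GV)$, then not all $\tau_\lambda - \tau_\mu$ can be equal to 0, as otherwise from (2) we see that $\tau|_{C \times A}$ would be equal to $\tau_\lambda \times \rho_A$ for any $\lambda$. So we can partition the set $\mathrm{Irr}(A)$ into two disjoint nonempty subsets $\Lambda_1 = \{\lambda \in \mathrm{Irr}(A) \mid \tau_\lambda = \tau_1\}$ and $\Lambda_2 = \{\lambda \in \mathrm{Irr}(A) \mid \tau_\lambda \ne \tau_1\}$, and thus as in [3] we see that $|\Lambda_1||\Lambda_2| \ge p - 1$, so there are at least $p - 1$ pairs $\lambda, \mu \in \mathrm{Irr}(A)$ such that $\tau_\lambda - \tau_\mu \ne 0$. Thus

$$(\tau\eta, \tau)_{C \times A} \ge p - 1 \quad \text{for all } \tau \in \mathrm{Irr}_0(GV). \qquad (5)$$

Therefore by (1) and (5) we get that

$$(p - 1) \sum_{i=1}^{k(C)} |C_V(c_i)| = \sum_{\tau \in \mathrm{Irr}(GV)} (\tau\eta, \tau)_{C \times A} \ge \sum_{\tau \in \mathrm{Irr}_0(GV)} (\tau\eta, \tau)_{C \times A} \ge (p - 1)\,|\mathrm{Irr}_0(GV)|$$

and thus

$$|\mathrm{Irr}_0(GV)| \le \sum_{i=1}^{k(C)} |C_V(c_i)|. \qquad (6)$$

From now on we assume that $C > 1$. Now we repeat the arguments of this proof, but replace $\eta$ by $\eta_1 = (|C| 1_C - \rho_C) \times (p 1_A - \rho_A)$, so for $c \in C$ and $a \in A$ we have

$$\eta_1(ca) = \begin{cases} |C|\,p, & \text{if } c \ne 1 \text{ and } a \ne 1, \\ 0, & \text{if } c = 1 \text{ or } a = 1. \end{cases}$$

Now from the above we know that the $c_i a_j$ ($i = 2, \ldots, k(C)$, $j = 1, \ldots, \frac{p-1}{|N:C|}$) are representatives of those conjugacy classes which intersect $(C - 1) \times (A - 1)$ nontrivially.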
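Clearly $\eta_1^{GV}$ vanishes on all conjugacy classes of $GV$ which intersect $(C - 1) \times (A - 1)$ trivially, whereas for $1 \ne c \in C$, $1 \ne a \in A$, if $(|C|, |V|) = 1$, we have that

$$\eta_1^{GV}(ca) = \frac{1}{|C \times A|} \sum_{\substack{g \in G \\ u \in V}} \dot{\eta}_1((ca)^{gu}) = \frac{1}{|C|\,p} \sum_{g \in N} \sum_{u \in C_V(c^g)} |C|\,p = |N|\,|C_V(c)|.$$

Next we conclude that

$$\sum_{i=1}^{k(GV)} \eta_1^{GV}(x_i) = \sum_{i=2}^{k(C)} \sum_{j=1}^{\frac{p-1}{|N:C|}} \eta_1^{GV}(c_i a_j) = (p - 1)\,|C| \sum_{i=2}^{k(C)} |C_V(c_i)|,$$

and so as in (1) we see that

$$(p - 1)\,|C| \sum_{i=2}^{k(C)} |C_V(c_i)| = \sum_{\tau \in \mathrm{Irr}(GV)} (\tau\eta_1, \tau)_{C \times A}. \qquad (7)$$

Now with (2), similarly as in [3], we see that

$$\begin{aligned}
(\tau\eta_1, \tau)_{C \times A} &= \frac{1}{|C \times A|} \sum_{\substack{c \in C \\ a \in A}} \tau(ca)\,\eta_1(ca)\,\overline{\tau(ca)} = \sum_{\substack{1 \ne c \in C \\ 1 \ne a \in A}} \tau(ca)\,\overline{\tau(ca)} \\
&= \sum_{\substack{1 \ne c \in C \\ 1 \ne a \in A}} \Big( \sum_{\lambda \in \mathrm{Irr}(A)} \tau_\lambda(c)\lambda(a) \Big) \overline{\Big( \sum_{\mu \in \mathrm{Irr}(A)} \tau_\mu(c)\mu(a) \Big)} \\
&= \sum_{\lambda, \mu \in \mathrm{Irr}(A)} \sum_{1 \ne c \in C} \tau_\lambda(c)\overline{\tau_\mu(c)} \sum_{1 \ne a \in A} \lambda(a)\overline{\mu(a)} \\
&= (p - 1) \sum_{\lambda \in \mathrm{Irr}(A)} \sum_{1 \ne c \in C} \tau_\lambda(c)\overline{\tau_\lambda(c)} - \sum_{\substack{\lambda, \mu \in \mathrm{Irr}(A) \\ \lambda \ne \mu}} \sum_{1 \ne c \in C} \tau_\lambda(c)\overline{\tau_\mu(c)} \\
&= p \sum_{\lambda \in \mathrm{Irr}(A)} \sum_{1 \ne c \in C} \tau_\lambda(c)\overline{\tau_\lambda(c)} - \sum_{\lambda, \mu \in \mathrm{Irr}(A)} \sum_{1 \ne c \in C} \tau_\lambda(c)\overline{\tau_\mu(c)} \\
&= \sum_{\lambda < \mu} \sum_{1 \ne c \in C} (\tau_\lambda(c) - \tau_\mu(c))\,\overline{(\tau_\lambda(c) - \tau_\mu(c))} = \sum_{\lambda < \mu} \sum_{1 \ne c \in C} |\tau_\lambda(c) - \tau_\mu(c)|^2 \qquad (8)
\end{aligned}$$

for some arbitrary ordering $\le$ on $\mathrm{Irr}(A)$.

Now recall that if $\tau \in \mathrm{Irr}_0(GV)$, then not all of the $\tau_\lambda - \tau_\mu$ can be 0. So choose $\lambda, \mu \in \mathrm{Irr}(A)$ such that $\tau_\lambda - \tau_\mu \ne 0$. If all the $\tau_\mu$ ($\mu \in \mathrm{Irr}(A)$) are integer multiples of $\rho_C$, then put $\Lambda_1 = \{\phi \in \mathrm{Irr}(A) \mid \tau_\phi = \tau_\lambda\}$ and $\Lambda_2 = \{\phi \in \mathrm{Irr}(A) \mid \tau_\phi \ne \tau_\lambda\}$, so $\Lambda_1 \ne \emptyset$ and $\Lambda_2 \ne \emptyset$, and from $0 \le (|\Lambda_1| - 1)(|\Lambda_2| - 1)$ together with $|\Lambda_1| + |\Lambda_2| = p$ we deduce that $|\Lambda_1||\Lambda_2| \ge p - 1$, so there are at least $p - 1$ pairs $(\phi_1, \phi_2) \in \mathrm{Irr}(A) \times \mathrm{Irr}(A)$ such that $\tau_{\phi_1} - \tau_{\phi_2}$ is a nonzero multiple of $\rho_C$.

So next we assume that $\tau_\lambda$ is not a multiple of $\rho_C$. Then put
$$\Gamma_1 = \{\phi \in \mathrm{Irr}(A) \mid \tau_\lambda - \tau_\phi \text{ is a multiple of } \rho_C\}, \qquad \Gamma_2 = \{\phi \in \mathrm{Irr}(A) \mid \tau_\lambda - \tau_\phi \text{ is not a multiple of } \rho_C\}.$$
Clearly $\lambda \in \Gamma_1$, so $\Gamma_1 \ne \emptyset$. If $\Gamma_2 = \emptyset$, then $\mathrm{Irr}(A) = \Gamma_1$, and if we define $\Lambda_1$, $\Lambda_2$ as in the previous argument, we see that there are at least $p - 1$ pairs $(\phi_1, \phi_2) \in \mathrm{Irr}(A) \times \mathrm{Irr}(A)$ such that $\tau_{\phi_1} - \tau_{\phi_2}$ is a nonzero multiple of $\rho_C$. So now suppose $\Gamma_2 \ne \emptyset$.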
Then |Γ1| + |Γ2| = p, and if φ1 ∈ Γ1 and φ2 ∈ Γ2, then τφ1 − τφ2 = (τφ1 − τλ) + (τλ − τφ2) clearly is not a multiple of ρC , and by the same argument as used before we see that |Γ1||Γ2| ≥ p − 1, so there are at least (p − 1) pairs (φ1, φ2) ∈ Irr(A) × Irr(A) such that τφ1 − τφ2 is not a multiple of ρC . Altogether we thus have shown that for any τ ∈ Irr0(GV ) one of the following holds: (A) There are at least (p− 1) pairs (φ1, φ2) ∈ Irr(A)× Irr(A) such that τφ1 − τφ2 is a nonzero multiple of ρC , or (B) there are at least (p − 1) pairs (φ1, φ2) ∈ Irr(A)× Irr(A) such that τφ1 − τφ2 is not a multiple of ρC . Now it remains to consider two cases: Case 1: At least half of the τ ∈ Irr0(GV ) satisfy (A). Then for any of these τ by (3) and (4) we have (τη, τ)C×A = ((τλ − τµ), (τλ − τµ))C ≥ (p − 1)|C| and so by (1) we see that (p − 1) |CV (ci)| ≥ τ∈Irr0(GV ) (τη, τ)C×A ≥ |Irr0(GV )|(p− 1)|C| which implies |Irr0(GV )| ≤ |CV (ci)| (9). Case 2: At least half of the τ ∈ Irr0(GV ) satisfy (B). Then for any of these τ by (8) and [4, Corollary 4] we have (τη1, τ)C×A ≥ (p− 1)(k(C) − 1). Thus by (7) we have that (p− 1)|C| |CV (ci)| ≥ τ∈Irr0(GV ) (τη1, τ)C×A ≥ |Irr0(GV )|(p − 1) · (k(C)− 1) whence |Irr0(GV )| ≤ k(C)− 1 |CV (ci)| (10). Now we drop the assumption (|C|, |V |) = 1 and work towards a general bound for |Irr0(GV )|. For this, fix g0 ∈C such that g0 is of prime order q and put C0 = 〈g0〉 andN0 = NG(C0). Trivially there are at most |C0|(p − 1) = q(p − 1) conjugacy classes of GV that intersect C0 × (A − 1) nontrivially, and given 1 6= c ∈ C0, 1 6= a ∈ A, we see that for g ∈ G, u ∈ V (ca)gu = cg[cg, u]ag ∈ C0 ×A first implies c g ∈ C0, i.e., g ∈ N0, and for each fixed g ∈ N0, the equation [c g, u]ag ∈ A implies [cg, u] ∈ Aa−g which has at most |CV (c g)| |Ag−1| = p|CV (g0)| solutions u. Moreover, if c = 1, then (ca)gu = agu = ag implies g ∈ NG(A) = N and u ∈ V. Now we define the character η2 on C0 ×A by η2 = 1C0 × (p1A − ρA). 
Thus η 2 vanishes on all conjugacy classes of GV which intersect C0×(A−1) trivially, whereas for 1 6= c ∈ C0, 1 6= a ∈ A we get ηGV2 (ca) = |C0 ×A| g ∈ G u ∈ V η̇ ((ca)gu) p|CV (g0)|p |N0||CV (g0)|, and for c = 1, 1 6= a ∈ A we get ηGV2 (ca) = η 2 (a) = |V |p = |N ||V |. Thus if xi (i = 1, . . . , k(GV )) are representatives of the conjugacy classes of GV , then k(GV ) ηGV2 (xi) ≤ (p − 1) |N ||V |+ (q − 1)(p − 1) |N0||CV (g0)| and as in (1) we see that k(GV ) ηGV2 (xi) = τ∈Irr(GV ) (τη2, τ)C0×A. Now arguing as in (2), (3), (5) and (6) above will yield |Irrp′(GV )| ≤ k(GV, v) ≤ |Irr(GV,C0, v)| ≤ (|N ||V |+ (q − 1)p|N0||CV (g0)|), where Irr(GV,C0, v) is as defined at the beginning of Remark 3.2. Putting the main results together, altogether we have proved the following: 3.3 Theorem. Let G be a finite group and let V be a finite faithful G–module of characteristic p. Let v ∈ V and put C = CG(v). If ci (i = 1, . . . , k(C)) are representatives of the conjugacy classes of C, then the following hold: (a) If (|C|, |V |) = 1, then |Irr0(GV )| ≤ |CV (ci)| and if C > 1, then |Irr0(GV )| ≤ max |CV (ci)|, k(C)− 1 |CV (ci)| (b) If (|G|, |V |) = 1, then Irr0(GV ) = Irr(G), so k(GV ) = |Irr0(GV )| and the bounds in (a) hold true for k(GV ) instead of |Irr0(GV )|. (c) In general, if g ∈ C such that o(g) = q is a prime, then |Irrp′(GV )| ≤ k(GV, v) ≤ |NG(〈v〉)||V |+ (q − 1)p|NG(〈g〉)||CV (g)| 4 The dual approach In the previous section, we always fixed v ∈ V and obtained bounds on the size of suitable subsets of Irr(GV ) in terms of properties of the action of CG(v) on V . In this section we consider a ”dual” approach: We fix g ∈ G and find bounds in terms of the action of CG(g) on CV (g). For this, put Irrg(GV ) = {χ ∈ Irr(G) | χ|〈g〉×CV (g) cannot be written as ρ〈g〉×ψ for a character ψ of CV (g)}. In particular, Irr(GV, g) ⊆ Irrg(GV ). 4.1 Theorem. Let G be a finite group and V be a finite G–module. Let g ∈ G such that (o(g), |V |) = 1. Write A = 〈g〉, N = NG(A) and C = CV (g). 
Then (a) |Irrg(GV )| ≤ (n(N,A)−1)n(CG(A),C) (|A|−1)|C| max16=a∈A(|NG(〈a〉)||CV (a)|) (b) if g is of prime order, then |Irrg(GV )| ≤ |CG(A)|n(CG(A), C) (c) there are X,Y ⊆ Irrg(GV ) such that Irrg(GV ) is a disjoint union of X and Y and |X| ≤ (n(N,A)− 1)n(CG(A), C) (|A| − 1)|C|2 16=a∈A (|NG(〈a〉)||CV (a)|) and |Y | ≤ (n(N,A)− 1)(n(CG(A), C)− 1) (|A| − 1)|C| 16=a∈A (|NG(〈a〉)||CV (a)|) (d) if g is of prime order and X,Y are as in (c), then |X| ≤ |CG(A)|n(CG(A), C) and |Y | ≤ |CG(A)|(n(CG(A), C)− 1) Proof. If a1, a2 ∈ A and c1, c2 ∈ C − {1}, then it is straightforward to see that (a1, c1) (a2, c2) GV implies that aG1 = a 2 . Hence if T is a set of representatives of the orbits of N on A−{1}, then every conjugacy class of GV that intersects nontrivially with (A−{1})×C has a representative ac for some a ∈ T and some c ∈ C. Moreover, for each a ∈ T we have that if c3, c4 ∈ C are CG(A)–conjugate, then ac3 and ac4 are CG(A)–conjugate and thus (ac3) G = (ac4) This shows that for each a ∈ T there are at most n(CG(A), C) conjugacy classes of GV inter- secting nontrivially with {a} × C. Hence altogether we see that there are at most |T |n(CG(A), C) = (n(N,A) − 1)n(CG(A), C) (1) conjugacy classes of GV which intersect (A− {1}) × C nontrivially. Moreover observe that for 1 6= a ∈ A, c ∈ C, h ∈ G and u ∈ V we have (ac)hu ∈ A× C if and only if h ∈ NG(〈a〉), c h ∈ C and u ∈ CV (a) because the condition (ac)hu = ah[ah, u]ch ∈ A × C first forces ah ∈ A which implies (as A is cyclic) ah ∈ 〈a〉, so h ∈ NG(〈a〉), and then as c ∈ C ≤ CV (〈a〉), it follows that c h ∈ CV (〈a〉) and [ah, u] ∈ [〈a〉, V ]. Now as by our hypothesis we have V = CV (〈a〉) × [〈a〉, V ], we see that (ac)hu ∈ A× C now forces [ah, u] = 1 and ch ∈ C. Hence u ∈ CV (a h) = CV (a). Note that the direct product A×C is a subgroup of GV . We now define a generalized character η on A× C by η = (|A| · 1A − ρA)× 1C where ρA is the regular character of A. 
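So for $a \in A$, $c \in C$ we have
\[ \eta(ac) = \begin{cases} 0, & a = 1 \\ |A|, & a \ne 1. \end{cases} \]
Therefore $\eta^{GV}$ vanishes on all conjugacy classes of $GV$ which intersect $(A-\{1\})\times C$ trivially, whereas for $c \in C$ and $1 \ne a \in A$ we have
\[ \eta^{GV}(ac) = \frac{1}{|A\times C|}\sum_{h\in G}\sum_{u\in V}\dot\eta\bigl((ac)^{hu}\bigr) = \frac{1}{|A||C|}\sum_{\substack{h\in N_G(\langle a\rangle)\\ \text{with } c^h\in C}}\ \sum_{u\in C_V(a)}\eta\bigl((ac)^{hu}\bigr) = \frac{1}{|A||C|}\sum_{\substack{h\in N_G(\langle a\rangle)\\ \text{with } c^h\in C}}\ \sum_{u\in C_V(a)}\eta(a^hc^h) = \frac{|C_V(a)|}{|A||C|}\sum_{\substack{h\in N_G(\langle a\rangle)\\ \text{with } c^h\in C}}|A| \le \frac{|N_G(\langle a\rangle)|\,|C_V(a)|}{|C|}. \qquad (2) \]
Thus if $\{x_i \mid i = 1,\dots,k(GV)\}$ is a set of representatives for the conjugacy classes of $GV$, then by (1) and (2) we see that
\[ (n(N,A)-1)\,n(C_G(A),C)\cdot\frac{1}{|C|}\max_{1\ne a\in A}\bigl(|N_G(\langle a\rangle)|\,|C_V(a)|\bigr) \ \ge\ \sum_{i=1}^{k(GV)}\eta^{GV}(x_i) = \sum_{\tau\in\mathrm{Irr}(GV)}(\tau\eta^{GV},\tau)_{GV} = \sum_{\tau\in\mathrm{Irr}(GV)}(\tau\eta,\tau)_{A\times C}. \qquad (3) \]
Observe that in case that $A$ is of prime order, then
\[ n(N,A)-1 = \frac{|A|-1}{|N:C_G(A)|} = \frac{(|A|-1)\,|C_G(A)|}{|N|} \quad\text{and}\quad \max_{1\ne a\in A}\bigl(|N_G(\langle a\rangle)|\,|C_V(a)|\bigr) = |N||C|, \]
so that (3) becomes
\[ |C_G(A)|\,(|A|-1)\,n(C_G(A),C) \ \ge\ \sum_{\tau\in\mathrm{Irr}(GV)}(\tau\eta,\tau)_{A\times C}. \qquad (3a) \]
Since $A\times C$ is a direct product, we can write $\tau_{A\times C} = \sum_{\lambda\in\mathrm{Irr}(C)}(\tau_\lambda\times\lambda)$, where $\tau_\lambda$ is a character of $A$ or $\tau_\lambda = 0$. Then
\[ (\tau\eta,\tau)_{A\times C} = \frac{1}{|A\times C|}\sum_{a\in A,\,c\in C}\tau(ac)\,\eta(ac)\,\overline{\tau(ac)} = \frac{1}{|A||C|}\sum_{1\ne a\in A,\,c\in C}\tau(ac)\,|A|\,\overline{\tau(ac)} = \frac{1}{|C|}\sum_{1\ne a\in A,\,c\in C}\Bigl(\sum_{\lambda\in\mathrm{Irr}(C)}\tau_\lambda(a)\lambda(c)\Bigr)\overline{\Bigl(\sum_{\mu\in\mathrm{Irr}(C)}\tau_\mu(a)\mu(c)\Bigr)} \]
\[ = \sum_{1\ne a\in A}\ \sum_{\lambda,\mu\in\mathrm{Irr}(C)}\tau_\lambda(a)\overline{\tau_\mu(a)}\;\frac{1}{|C|}\sum_{c\in C}\lambda(c)\overline{\mu(c)} = \sum_{1\ne a\in A}\ \sum_{\lambda,\mu\in\mathrm{Irr}(C)}\tau_\lambda(a)\overline{\tau_\mu(a)}\,(\lambda,\mu)_C. \]
As $(\lambda,\mu)_C = 1$ if $\lambda = \mu$ and $(\lambda,\mu)_C = 0$ if $\lambda \ne \mu$, we further obtain
\[ (\tau\eta,\tau)_{A\times C} = \sum_{1\ne a\in A}\ \sum_{\lambda\in\mathrm{Irr}(C)}\tau_\lambda(a)\overline{\tau_\lambda(a)} = \sum_{\lambda\in\mathrm{Irr}(C)}\ \sum_{1\ne a\in A}|\tau_\lambda(a)|^2. \qquad (4) \]
Now observe that $\tau(1) = \sum_{\lambda\in\mathrm{Irr}(C)}\tau_\lambda(1)$. If all the $\tau_\lambda$ are multiples of $\rho_A$, then clearly $\tau \notin \mathrm{Irr}_g(GV)$, and so if $\tau \in \mathrm{Irr}_g(GV)$, then by [4, Corollary 4] with (4) we see that
\[ (\tau\eta,\tau)_{A\times C} \ge |A|-1. \qquad (5) \]
So (3) and (5) yield
\[ |\mathrm{Irr}_g(GV)| \le \frac{(n(N,A)-1)\,n(C_G(A),C)}{(|A|-1)\,|C|}\,\max_{1\ne a\in A}\bigl(|N_G(\langle a\rangle)|\,|C_V(a)|\bigr), \qquad (6) \]
and if $g$ is of prime order, then (3a) and (5) yield
\[ |\mathrm{Irr}_g(GV)| \le |C_G(A)|\,n(C_G(A),C). \qquad (6a) \]
Now, as in Section 3, we repeat the same arguments, but use $\eta_1 = (|A|1_A - \rho_A)\times(|C|1_C - \rho_C)$ instead of $\eta$.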
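As a sanity check on identity (4), the following minimal sketch evaluates both sides numerically in a toy situation. It assumes, purely for illustration, that A and C are the cyclic groups Z/m and Z/k written additively (so a = 0 plays the role of the identity), and that the restriction of τ to A × C is given by a table of multiplicities. The function names and the multiplicity table are hypothetical and not taken from the paper.

```python
import cmath


def restricted_character(mult, m, k):
    """Return tau as a function on Z/m x Z/k.

    mult[s][r] is the (assumed) multiplicity of chi_r x lambda_s in the
    restriction of tau to A x C, where chi_r and lambda_s are the linear
    characters of Z/m and Z/k.
    """
    def tau(a, c):
        return sum(
            mult[s][r]
            * cmath.exp(2j * cmath.pi * r * a / m)
            * cmath.exp(2j * cmath.pi * s * c / k)
            for s in range(k)
            for r in range(m)
        )
    return tau


def eta(a, m):
    """eta = (|A| 1_A - rho_A) x 1_C: zero at the identity of A, |A| elsewhere."""
    return 0 if a % m == 0 else m


def inner_product_with_eta(mult, m, k):
    """(tau * eta, tau)_{A x C}, computed directly from the definition."""
    tau = restricted_character(mult, m, k)
    total = sum(
        abs(tau(a, c)) ** 2 * eta(a, m) for a in range(m) for c in range(k)
    )
    return total / (m * k)


def right_hand_side_of_4(mult, m, k):
    """Sum over lambda and over a != identity of |tau_lambda(a)|^2."""
    total = 0.0
    for s in range(k):
        for a in range(1, m):
            tau_lambda = sum(
                mult[s][r] * cmath.exp(2j * cmath.pi * r * a / m)
                for r in range(m)
            )
            total += abs(tau_lambda) ** 2
    return total


if __name__ == "__main__":
    example = [[1, 0, 2], [0, 1, 1]]  # k = 2 rows, m = 3 columns
    print(inner_product_with_eta(example, 3, 2))
    print(right_hand_side_of_4(example, 3, 2))
```

In this toy setting the inner product vanishes when every τ_λ is a multiple of the regular character of A, and it equals |A| − 1 for a single linear character, which matches the role these two cases play in (5).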
One can then easily check that (n(N,A)− 1)(n(CG(A), C) − 1) · 16=a∈A (|NG(〈a〉)||CV (a)|) ≥ τ∈Irr(GV ) (τη1, τ)A×C (3b) and if g is of prime order, then |CG(A)|(|A| − 1)(n(CG(A), C) − 1) ≥ τ∈Irr(GV ) (τη1, τ)A×C (3c) Moreover it is easily seen that (τη1, τ)A×C = 1 6= a ∈ A 1 6= c ∈ C τ(ac)τ(ac) 16=a∈A λ,µ∈Irr(C) τλ(a)τµ(a) 16=c∈C λ(c)µ(c), and as 16=c∈C λ(c)µ(c) = −1, if λ 6= µ |C| − 1, if λ = µ , it follows that (τη1, τ)A×C = 16=a∈A |τλ(a)− τµ(a)| 2 (7) where ”≤” is an arbitrary ordering on Irr(C). Next suppose that there are exactly a characters τ ∈ Irrg(GV ) such that there is a character ψ of A (depending on τ) and there are aλ ∈ ZZ (λ ∈ Irr(C)) such that τλ = ψ + aλρA for all λ ∈ Irr(C) and ψ is not a multiple of ρA. Then by (4) and [4, Corollary 4] we know that (τη, τ)A×C = λ∈Irr(C) 16=a∈A |ψ(a)|2 ≥ |C|(|A| − 1) and hence by (3) we get (n(N,A)− 1)n(CG(A), C) (|A| − 1)|C|2 16=a∈A (|NG(〈a〉)||CV (a)|), (8) and if g is of prime order, then by (3a) even |CG(A)|n(CG(A), C) Now let b be the number of τ ∈ Irrg(GV ) such that there is no such ψ. Then there exist λ, µ ∈ Irr(C) with 16=a∈A |τλ(a)− τµ(a)| 2 6= 0, and thus by [4, Corollary 4] we have (τη1, τ) ≥ |A| − 1 (9) So (3b) and (9) yield (n(N,A)− 1)(n(CG(A), C) − 1) |C|(|A| − 1) 16=a∈A (|NG(〈a〉)||CV (a)|) (10) and, if g is of prime order, then by (3c) b ≤ |CG(A)|(n(CG(A), C) − 1), (10b) and clearly a+ b = |Irrg(GV )|, and hence all the assertions follow and we are done. ✸ References [1] R. Guralnick, P. H. Tiep, The non–coprime k(GV )–problem, J. Algebra 279 (2004), 694– [2] T. M. Keller, Fixed conjugacy classes of normal subgroups and the k(GV )–problem, J. Algebra 305 (2006), 457–486. [3] R. Knörr, On the number of characters in a p–block of a p–solvable group, Illinois J. Math 28 (1984), 181–209. [4] G. R. Robinson, A bound on norms of generalized characters with applications, J. Algebra 212 (1999), 660–668. [5] P. Schmid, Some remarks on the k(GV )–theorem, J. Group Theory 8 (2005), 589–604. 
ABSTRACT Let $G$ be a finite group and $V$ be a finite $G$--module. We present upper bounds for the cardinalities of certain subsets of $\Irr(GV)$, such as the set of those $\chi\in\Irr(GV)$ such that, for a fixed $v\in V$, the restriction of $\chi$ to $\langle v\rangle$ is not a multiple of the regular character of $\langle v\rangle$. These results might be useful in attacking the non--coprime $k(GV)$--problem. <|endoftext|><|startoftext|>

University of Groningen, Department of Mathematics and Computing Sciences, Blauwborgje 3, 9747 AC Groningen, The Netherlands, kuelske@math.rug.nl, http://www.math.rug.nl/∼kuelske/
Dipartimento di Matematica, Università degli Studi "Roma Tre", Largo San Leonardo Murialdo, 1, 00146 Roma, ITALY, orlandi@mat.uniroma3.it, http://www.mat.uniroma3.it/users/orlandi/
http://arxiv.org/abs/0704.0582v1

Introduction 1.1 The setup The study of lattice effective interface models, continuous and discrete, has a long tradition in statistical mechanics [14, 5, 9, 10, 13, 2, 3, 4]. The model we study is given in terms of variables ϕi ∈ R which, physically speaking, are thought to represent height variables of a random surface at the sites i ∈ Zd. Mathematically speaking they are just continuous unbounded (spin) variables. The model is defined in terms of: a pair potential V , a quenched random term, and a pinning term at interface height zero. More precisely, we are interested in the behavior of the quenched finite-volume Gibbs measures in a finite volume Λ⊂Zd with fixed boundary condition at height zero, given by
\[ \mu_{\varepsilon,\Lambda}[\eta](d\varphi_\Lambda) = \frac{1}{Z_{\varepsilon,\Lambda}[\eta]}\exp\Bigl(-\sum_{\langle i,j\rangle\in\Lambda}V(\varphi_i-\varphi_j) - \sum_{i\in\Lambda,\,j\in\Lambda^c,\,|i-j|=1}V(\varphi_i) + \sum_{i\in\Lambda}\eta_i\varphi_i\Bigr)\prod_{i\in\Lambda}\bigl(d\varphi_i + \varepsilon\,\delta_0(d\varphi_i)\bigr) \qquad (1) \]
where the partition function Zε,Λ[η] denotes the normalization constant that turns the last expression into a probability measure. The Dirac measures at the interface height zero are multiplied by the parameter ε, which has the meaning of a coupling strength.
The disorder configuration η = (ηi)i∈Zd denotes an arbitrary fixed configuration of external fields, modelling a "quenched" (or frozen) random environment. What do we expect for such a model? Recall that the variance of a free massless interface in a finite box diverges like the logarithm of the sidelength when there are no random fields. Adding an arbitrarily small pinning ε (without disorder) always localizes the interface uniformly in the volume, with the variance of the field behaving on the scale | log ε| when ε tends to zero. Indeed, there is a beautiful and complete mathematical understanding of the model without disorder, in the case of both Gaussian and uniformly elliptic potentials (see [1, 7]) with precise asymptotics as the pinning force tends to zero. These results follow from the analysis of the distribution of pinned sites and the random walk (arising from the random walk representation of the covariance of the ϕi's) with killing at these sites. In this sense there is already a random system that needs to be analyzed even without disorder in the original model. What do we expect if we turn on randomness in the model and add the ηi's? Let us first review what we know about the same model without a pinning force. In d = 2 we recently proved the deterministic lower bound µΛN [η](|ϕ0| ≥ t logL) ≥ c exp(−ct^2) uniformly for any fixed disorder configuration η, for general potentials V (assuming not too slow growth at infinity) [12]. So, it is not possible to stabilize an interface by cleverly choosing a random field configuration (one could think e.g. that this might be possible with a staggered field). As this result holds for every fixed configuration, we do not need any assumptions on the distribution of the random fields. This result clearly excludes the existence of an infinite-volume Gibbs measure describing a two-dimensional interface in infinite volume in the presence of random fields.
In another paper [8] the question of existence of gradient Gibbs measures (Gibbs distributions of the increments of the interface) in infinite volume was raised. Note that while interface states may not exist in the infinite volume such gradient states may very well exist, as the example of the two-dimensional Gaussian free field shows, by computation. (For existence beyond the Gaussian case which is far less trivial, see [10, 11].) It was proved in [8] that there are no such gradient Gibbs measures in the random model in dimension d = 2. Now, turn to the full model in d = 2. In view of the localization taking place at any positive pinning force ε without disorder, a natural guess might be that with disorder at least at very large ε there would be pinning. However, we show as a result of the present paper that this is not the case, somewhat to our own surprise, and an arbitrarily strong pinning does not suffice to keep the interface bounded.

1.2 Main results

Delocalization in d = 2 - superextensivity of the overlap

Denote by ΛL the square of sidelength 2L+ 1 centered at the origin. In this subsection we consider the disorder average of the overlap in ΛL showing that it grows faster than the volume. This in particular implies that in two dimensions there is never pinning, for arbitrarily weak random field and arbitrarily large pinning forces ε. Here is the result.

Theorem 1.1 Assume that $\sup_t V''(t) \le 1$, $\liminf_{|t|\uparrow\infty}\frac{\log V(t)}{\log|t|} > 1$, and let ηi be symmetrically distributed, i.i.d. with finite second moment. Let d = 2. Then there is a constant a > 0, independent of the distribution of the random fields and the pinning strength ε ≥ 0, such that
\[ \liminf_{L\uparrow\infty}\frac{1}{L^2\log L}\sum_{i\in\Lambda_L}E\bigl(\eta_i\,\mu_{\varepsilon,\Lambda_L}[\eta](\varphi_i)\bigr) \ \ge\ a\,E(\eta_0^2). \qquad (2) \]
Note that the growth condition on V includes the quadratic case and ensures the finiteness of the integrals appearing in (1) for all arbitrarily fixed choices of η, even at ε = 0.
Generalizations to interactions that are non-nearest neighbor are obvious; all results go through e.g. for finite range and we skip them in this presentation for the sake of simplicity. We like to exhibit the case of Gaussian random fields (and not necessarily Gaussian potential V ) since the bound acquires a form that looks even more striking because it becomes independent of the size of the variance of the ηi’s (as long as this is strictly positive). Corollary 1.2 Let us assume that the random fields ηi have an i.i.d. Gaussian distri- bution with mean zero and strictly positive variance of arbitrary size. Then, with the same constant a as above, we have the bound lim inf L2 logL µε,ΛL [η](ϕ i )− µε,ΛL [η](ϕi) ≥ a > 0 (3) for any 0 ≤ ε < ∞. (3) follows from (2) by partial integration w.r.t. the Gaussian disorder average (transforming the overlap into the variance of the ϕi’s). Note that, even in the unpinned case of ε = 0, Theorem 1.1 is not entirely trivial in the case of general potentials V . Here it provides an alternative simple way to see the delocalization in the presence of random fields (while the explicit lower bound on the tails of [12] provides more information.) Lower bound on overlap in d ≥ 3 The analogue of Theorem 1.1 for higher dimensions is the following. Theorem 1.3 Let d ≥ 3 and let ε ≥ 0 be arbitrary and assume the same conditions on V and ηi as in Theorem 1.1. Then there are positive constants B1, B2 < ∞, independent of the distribution of the random fields and the pinning strength ε ≥ 0, such that lim inf ηiµε,ΛL[η](ϕi) E(η20) (−∆−1)0,0 − log(B1 +B2ε) (4) where the positive constant (−∆−1)0,0 is the diagonal element of the inverse of the infinite-volume lattice Laplace operator whose existence is guaranteed in d ≥ 3. Lower bound on the pinned volume in d ≥ 3 We complement the previous lower bounds on the overlaps which are depinning-type of results by a pinning-type result. 
It is a lower bound on the disorder average of the quenched Gibbs-expectation of the fraction of pinned sites. While we needed an upper bound on the interaction potential V before we are assuming now a lower bound on V . Theorem 1.4 Let d ≥ 3. Assume that inft V ′′(t) = c− > 0 and let ηi be symmetrically distributed, i.i.d. with finite second moment. Then there exist dimension-dependent constants C1, C2 > 0, independent of the distribution of the disorder, such that, for all ε and for all volumes Λ, the disorder average of the fraction of pinned sites obeys the estimate µε,Λ[η](ϕi = 0) ≥ 1− C1 + C2E(η log ε . (5) This shows pinning for the large ε regime in the ”thermodynamic sense” that the fraction of pinned sites can be made arbitrarily close to one, uniformly in the volume. As usual this result does not allow to make statement about the Gibbs measure itself. The proofs follows from ”thermodynamic reasoning”. The first ”depinning-type” result follows from taking the log of the partition function and differentiating and in- tegrating back w.r.t. the coupling strength of the random fields. Exploiting the linear form of the random fields, convexity, comparison of non-Gaussian with the Gaussian partition functions, and asymptotics of Green’s functions the results follow, see Chapter 2 Proof of Depinning-type results The estimates in formulas (2), (3), and (4) are immediate consequences of the following fixed-disorder estimate. Proposition 2.1 For any dimension d, there are constants CnG,d < ∞ and cG,d > 0 such that, for all fixed configurations of local fields η, we have i,j∈Λ (−∆Λ)−1i,j ηiηj − |Λ| log CnG,d + ε ηiµε,Λ[η](ϕi). (6) Proof of the Proposition: Let us see what comes out when we differentiate and integrate back the free energy in finite volume w.r.t. strength of the random fields. logZε,Λ[hη] = ηiµε,Λ[hη](ϕi). (7) At every fixed η, this quantity is a monotone function of h, which is seen by another differentiation w.r.t. h which produces the variance. 
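The differentiate-and-integrate-back mechanism can be illustrated on a single site, where everything is available in closed form. The sketch below is a toy model and not the lattice system of the paper: it assumes one site with potential V(ϕ) = ϕ²/2, no neighbours, field strength hη and pinning weight ε, so that Z(h) = √(2π) exp(h²η²/2) + ε. It checks by finite differences that the h-derivative of log Z equals η times the mean of ϕ, and that one more derivative produces η² times the variance, which is non-negative. All function names are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def partition_function(h, eta, eps):
    """Z(h) for the one-site toy measure exp(-phi^2/2 + h*eta*phi)(dphi + eps*delta_0)."""
    t = h * eta
    return SQRT_2PI * math.exp(0.5 * t * t) + eps


def mean(h, eta, eps):
    """Expectation of phi; the Dirac mass at zero contributes nothing to the numerator."""
    t = h * eta
    return SQRT_2PI * t * math.exp(0.5 * t * t) / partition_function(h, eta, eps)


def variance(h, eta, eps):
    """Variance of phi under the one-site toy measure."""
    t = h * eta
    second_moment = (
        SQRT_2PI * (1.0 + t * t) * math.exp(0.5 * t * t)
        / partition_function(h, eta, eps)
    )
    return second_moment - mean(h, eta, eps) ** 2


def central_difference(f, x, step=1e-5):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + step) - f(x - step)) / (2.0 * step)


if __name__ == "__main__":
    h, eta, eps = 0.7, 1.3, 2.0
    dlogz = central_difference(lambda s: math.log(partition_function(s, eta, eps)), h)
    print(dlogz, eta * mean(h, eta, eps))
    doverlap = central_difference(lambda s: eta * mean(s, eta, eps), h)
    print(doverlap, eta ** 2 * variance(h, eta, eps))
```

Since the second derivative is a variance, the overlap η·mean is non-decreasing in h, which is the monotonicity used when integrating back over h.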
We have Zε,Λ[η] Zε,Λ[0] dhηiµε,Λ[hη](ϕi) ≤ ηi µε,Λ[η](ϕi). (8) We note the lower bound on the numerator which we get by dropping the pinning term, giving us Zε,Λ[η] ≥ Zε=0,Λ[η] ≥ ZGaussε=0,Λ[η] = exp i,j∈Λ (−∆Λ)−1i,j ηiηj ZGaussε=0,Λ[0] ≥ exp i,j∈Λ (−∆Λ)−1i,j ηiηj Here we have denoted by ZGaussε=0,Λ[η] the Gaussian partition function with potential V (t) = Further we used that the lower bound on V (t) taken from the hypothesis implies that, for any partition function in any volume D, we have Zε=0,D[0] ≤ C |D|nG,d. This gives Zε,Λ[0] = ε|A|Zε,Λ\A[0] ε|A|C |Λ\A| nG,d = (CnG,d + ε) So the desired estimate on the overlap follows from (8),(9),(10). This concludes the proof of the Proposition. � It is easy to obtain the Theorems 1.1 and 1.3 from the proposition. Indeed, taking a disorder average we have E(η20) (−∆Λ)−1i,i − |Λ| log CnG,d + ε ηiµε,Λ[η](ϕi) . (11) Now use the asymptotics of the Green’s-function in a square (−∆ΛL) i,i ∼ logL at fixed i to get the first theorem. The proof of the case d ≥ 3 follows from the existence of the infinite-volume Green’s-function in d ≥ 3. Finally let us note in passing that a constant magnetic field is always winning against an arbitrarily strong pinning, and even more strongly than a random field. Indeeed, let d ≥ 2, let ηi = h ≥ 0 for all sites i and let ε ≥ 0 be arbitrary. Then, there is a constant cd > 0, independent of h and ε, such that lim inf µε,ΛL [h](ϕi) ≥ cdh. (12) This again follows from the Proposition, using i,j∈Λ(−∆ΛL) i,j ∼ Ld+2. 3 Proof of Pinning-type results To prove the lower bound on the fraction of pinning sites in dimension d ≥ 3 given in Theorem 1.4 we will in fact prove the following fixed-disorder lower bound: For all finite volumes Λ and for all realizations η we have, for any ε0 > 0 µε,Λ[η](ϕi = 0) log ε 2c−|Λ| i,j∈Λ (−∆Λ)−1i,j ηiηj with a constant CG,d defined in (21). Taking a disorder-expectation (5) follows by the finiteness of Green’s function in the infinite volume (−∆ )−10,0 with ε0 = 1. 
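The logarithmic growth of the diagonal Green's function in d = 2, used above to pass from the Proposition to Theorem 1.1, can be observed numerically on small boxes. The following sketch makes its own assumptions: it takes −Δ to be the graph Laplacian on the square of sidelength 2L + 1 with zero boundary condition outside the box (the normalization in the paper may differ by a constant factor), and it solves the linear system by plain Gauss-Seidel iteration. The function name is hypothetical.

```python
def green_at_origin(L, tol=1e-12, max_sweeps=100000):
    """Diagonal entry at the centre of the inverse Dirichlet graph Laplacian.

    Solves 4*g(x) - sum of g over the four neighbours of x = delta_0(x) on the
    square box of sidelength 2L+1, with g = 0 outside the box, by Gauss-Seidel.
    """
    n = 2 * L + 1
    # One extra ring of zeros encodes the boundary condition.
    g = [[0.0] * (n + 2) for _ in range(n + 2)]
    centre = L + 1
    for _ in range(max_sweeps):
        largest_change = 0.0
        for x in range(1, n + 1):
            row = g[x]
            for y in range(1, n + 1):
                source = 1.0 if (x == centre and y == centre) else 0.0
                new_value = (
                    source + g[x - 1][y] + g[x + 1][y] + row[y - 1] + row[y + 1]
                ) / 4.0
                largest_change = max(largest_change, abs(new_value - row[y]))
                row[y] = new_value
        if largest_change < tol:
            break
    return g[centre][centre]


if __name__ == "__main__":
    for L in (1, 2, 4, 8):
        print(L, green_at_origin(L))
```

With this normalization each doubling of L adds a roughly constant amount to the diagonal entry, and for large L that increment should approach approximately (1/2π) log 2; on the small boxes used here it is still somewhat smaller.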
� Proof of (13): The proof is based on the trick to differentiate and integrate back the log of the partition function, now w.r.t. ε: Differentiation gives logZε,Λ[η] = µε,Λ[η](ϕi = 0). (14) We integrate this relation back, and it will be important for us to do it starting from a positive ε0 > 0. So we get Zε,Λ[η] Zε0,Λ[η] µε̃,Λ[η](ϕi = 0) ≤ log µε,Λ[η](ϕi = 0) (15) where we have used that i∈Λ µε̃,Λ[η](ϕi = 0) is a monotone function of ε̃. Note that the integrand itself is not a monotone function. (Compare [6] for a related non-random pinning scenario, with back-integration from zero.) Now we have the trivial lower bound obtained by keeping only the contribution in the expansion where all sites are pinned, i.e. Zε,Λ[η] ≥ ε|Λ|. (16) For the upper bound on the partition function of the full model (at ε0) we first use the lower bound on the potential V (t) ≥ c−t giving us a comparison with a Gaussian partition function with curvature c−: Zε0,Λ[η] ≤ Z Gauss,c− [η]. (17) It is a simple matter to rescale the Gaussian curvature away Gauss,c− [η] = c− 2 ZGauss 2 η] (18) where the partition function on the r.h.s. is taken with unity curvature potential. For the Gaussian partition function we claim the upper bound (writing again in the original parameters) of the form ZGaussε,Λ [η] ≤ ZGaussε=0,Λ[η]. (19) Here is an elementary proof: We will replace successively the single-site integrations involving the Dirac measure by integrations only over the Lebesgue measure with the appropriately adjusted prefactor. Indeed, consider one site i and compute the contri- bution to the partition function while fixing the values of ϕj for j not equal to i. Then use that dϕi + εδ0(dϕi) ϕj + ηi)ϕi = (2π) 2 exp j∼i ϕj + ηi) 2 exp j∼i ϕj + ηi) dϕi exp ϕj + ηi)ϕi and iterate over the sites. For the Gaussian unpinned partition function use ZGaussε=0,Λ[η] = exp i,j∈Λ (−∆Λ)−1i,j ηiηj ZGaussε=0,Λ[0] ≤ exp i,j∈Λ (−∆Λ)−1i,j ηiηj with a suitable constant. 
From here (5) follows from (15,16,17,18,19,21) � Acknowledgements: The authors thank Pietro Caputo for an interesting discus- sion and Aernout van Enter for comments on a previous draft of the manuscript. C.K. thanks the university Roma Tre for hospitality. References [1] E. Bolthausen, Y. Velenik, Critical behavior of the massless free field at the depinning transition. Comm. Math. Phys. 223, 161-203, 2001. [2] M. Biskup and R. Kotecký, Phase coexistence of gradient Gibbs states. Published Online in Probab. Theory Rel. Fields DOI 10.1007/s00440-006-0013-6, 2007. [3] A. Bovier and C. Külske, A rigorous renormalization group method for interfaces in random media. Rev. Math. Phys. 6, 413–496, 1994. [4] A. Bovier and C. Külske, There are no nice interfaces in (2 + 1)-dimensional SOS models in random media, J. Statist. Phys., 83: 751–759, 1996. [5] J. Bricmont, A. El Mellouki, and J. Fröhlich, Random surfaces in statistical mechanics: roughening, rounding, wetting, . . . J. Statist. Phys. 42, 743–798, 1986. [6] P. Caputo, Y. Velenik, A note on wetting transition for gradient fields. Stochastic Process. Appl. 87, 107–113, 2000. [7] J.-D. Deuschel, Y. Velenik, Non-Gaussian surface pinned by a weak potential. Probab. Theory Related Fields 116, 359-377, 2000. [8] A. C. D. van Enter, C. Külske, Non-existence of random gradient Gibbs measures in contin- uous interface models in d = 2., math.PR/0611140, to be published in Annals of Applied Probability [9] G. Forgacs, R. Lipowski and Th.M. Nieuwenhuizen, The Behaviour of Interfaces in Ordered and Disordered Systems, in Phase Transitions and Critical Phenomena, vol. 14, edited by C. Domb and J.L. Lebowitz, Academic Press, 1986. [10] T. Funaki, Stochastic Interface models. 2003 Saint Flour lectures, Springer Lecture Notes in Mathematics, 1869, 103–294, 2005. [11] T. Funaki and H. Spohn, Motion by mean curvature from the Ginzburg-Landau ∇ϕ inter- face model. Comm. Math. Phys. 185, 1–36, 1997. [12] C. Külske, E. 
Orlandi, A simple fluctuation lower bound for a disordered massless random continuous spin model in d = 2. Electronic Comm. Probab. 11, 200–205, 2006. [13] S. Sheffield, Random surfaces, large deviations principles and gradient Gibbs measure classifications. arXiv math.PR/0304049, Astérisque 304, 2005. [14] Y. Velenik, Localization and delocalization of random interfaces. Probability Surveys 3, 112–169, 2006.

ABSTRACT We consider statistical mechanics models of continuous height effective interfaces in the presence of a delta-pinning at height zero. There is a detailed mathematical understanding of the depinning transition in 2 dimensions without disorder. Then the variance of the interface height w.r.t. the Gibbs measure stays bounded uniformly in the volume for any positive pinning force and diverges like the logarithm of the pinning force when it tends to zero. How does the presence of a quenched disorder term in the Hamiltonian modify this transition? We show that an arbitrarily weak random field term is enough to beat an arbitrarily strong delta-pinning in 2 dimensions and will cause delocalization. The proof is based on a rigorous lower bound for the overlap between local magnetizations and random fields in finite volume. In 2 dimensions it implies growth faster than the volume, which contradicts localization. We also derive a simple complementary inequality which shows that in higher dimensions the fraction of pinned sites converges to one when the pinning force tends to infinity. <|endoftext|><|startoftext|> Introduction A unital and separable C∗-algebra D ≠ C is strongly self-absorbing if there is an isomorphism D → D ⊗ D which is approximately unitarily equivalent to the inclusion map D → D ⊗ D, d ↦ d ⊗ 1D ([14]).
Strongly self-absorbing C∗-algebras are known to be simple and nuclear; moreover, they are either purely infinite or stably finite. The only known examples of strongly self-absorbing C∗-algebras are the UHF algebras of infinite type (i.e., every prime number that occurs in the respective supernatural number occurs with infinite multiplicity), the Cuntz algebras O2 and O∞, the Jiang–Su algebra Z and tensor products of O∞ with UHF algebras of infinite type, see [14]. All these examples are K1-injective, i.e., the canonical map U(D)/U0(D) → K1(D) is injective. It was observed in [14] that any two unital ∗-homomorphisms σ, γ : D → A ⊗ D are approximately unitarily equivalent, where A is another unital and separable C∗-algebra. If D is K1-injective, the unitaries implementing the equivalence may even be chosen to be homotopic to the unit. When D is O2 or O∞, it was known that σ and γ are even asymptotically unitarily equivalent, i.e., they can be intertwined by a continuous path of unitaries, parametrized by a half-open interval. Up to this point, it was not clear whether the respective statement holds for the Jiang–Su algebra Z. Theorem 2.2 below provides an affirmative answer to this problem. Even more, we show that the path intertwining σ and γ may be chosen in the component of the unit. We believe this result, albeit technical, is interesting in its own right, and that it will be a useful ingredient for the systematic further use of strongly self-absorbing C∗-algebras in Elliott's program to classify nuclear C∗-algebras by K-theory data.

Date: August 3, 2021. 2000 Mathematics Subject Classification. 46L05, 47L40. Key words and phrases. Strongly self-absorbing C∗-algebras, KK-theory, asymptotic unitary equivalence, continuous fields of C∗-algebras. Supported by: The first named author was partially supported by NSF grant #DMS-0500693. The second named author was supported by the DFG (SFB 478). http://arxiv.org/abs/0704.0583v1 MARIUS DADARLAT AND WILHELM WINTER
In fact, this point of view is our main motivation for the study of strongly self-absorbing C∗-algebras; see [8], [10], [16], [17], [18] and [15] for already existing results in this direction. For the time being, we use Theorem 2.2 to derive some consequences for the Kasparov groups of the form KK(D, A ⊗ D). More precisely, we show that all the elements of the Kasparov group KK(D, A ⊗ D) are of the form [ϕ] − n[ι] where ϕ : D → K ⊗ A ⊗ D is a ∗-homomorphism, ι : D → A ⊗ D is the inclusion ι(d) = 1A ⊗ d, and n ∈ N. Moreover, two non-zero ∗-homomorphisms ϕ,ψ : D → K⊗A⊗D with ϕ(1D) = ψ(1D) = e have the same KK-theory class if and only if there is a unitary-valued continuous map u : [0, 1) → e(K ⊗ A ⊗ D)e, t ↦ ut such that u0 = e and $\lim_{t\to 1}\|u_t\,\varphi(d)\,u_t^* - \psi(d)\| = 0$ for all d ∈ D. In addition, we show that KKi(D,D ⊗A) ≅ Ki(D ⊗A), i = 0, 1. One may note the similarity to the descriptions of KK(O∞,O∞ ⊗ A) ([8],[10]) and KK(C,C ⊗ A). However, we do not require that D satisfies the universal coefficient theorem (UCT) in KK-theory. In the same spirit, we characterize O2 and the universal UHF algebra Q using K-theoretic conditions, but without involving the UCT. As another application of Theorem 2.2 (and the results of [7]), we prove in [4] an automatic trivialization result for continuous fields with strongly self-absorbing fibres over finite dimensional spaces. The second named author would like to thank Eberhard Kirchberg for an inspiring conversation on the problem of proving Theorem 2.2.

1. Strongly self-absorbing C∗-algebras

In this section we recall the notion of strongly self-absorbing C∗-algebras and some facts from [14].

1.1 Definition: Let A, B be C∗-algebras and σ, γ : A → B be ∗-homomorphisms. Suppose that B is unital.

ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 3

(i) We say that σ and γ are approximately unitarily equivalent, σ ≈u γ, if there is a sequence (un)n∈N of unitaries in B such that $\|u_n\sigma(a)u_n^* - \gamma(a)\| \to 0$ as $n\to\infty$ for every a ∈ A.
If all un can be chosen to be in U0(B), the connected component of 1B in the unitary group U(B), then we say that σ and γ are strongly approximately unitarily equivalent, written σ ≈su γ.

(ii) We say that σ and γ are asymptotically unitarily equivalent, σ ≈uh γ, if there is a norm-continuous path (ut)t∈[0,∞) of unitaries in B such that $\|u_t\sigma(a)u_t^* - \gamma(a)\| \to 0$ as $t\to\infty$ for every a ∈ A. If one can arrange that u0 = 1B (and hence ut ∈ U0(B) for all t), then we say that σ and γ are strongly asymptotically unitarily equivalent, written σ ≈suh γ.

1.2 The concept of strongly self-absorbing C∗-algebras was formally introduced in [14, Definition 1.3]:

Definition: A separable unital C∗-algebra D is strongly self-absorbing, if D ≠ C and there is an isomorphism ϕ : D → D ⊗D such that ϕ ≈u idD ⊗ 1D.

1.3 Recall [14, Corollary 1.12]:

Proposition: Let A and D be unital C∗-algebras, with D strongly self-absorbing. Then, any two unital ∗-homomorphisms σ, γ : D → A⊗D are approximately unitarily equivalent. In particular, any two unital endomorphisms of D are approximately unitarily equivalent.

We note that the assumption that A is separable which appears in the original statement of [14, Corollary 1.12] is not necessary and was not used in the proof.

1.4 Lemma: Let D be a strongly self-absorbing C∗-algebra. Then there is a sequence of unitaries (wn)n∈N in the commutator subgroup of U(D ⊗ D) such that for all d ∈ D, $\|w_n(d\otimes 1_D)w_n^* - 1_D\otimes d\| \to 0$ as $n\to\infty$.

Proof: Let F ⊂ D be a finite normalized set and let ε > 0. By [14, Prop. 1.5] there is a unitary u ∈ U(D⊗D) such that $\|u(d\otimes 1_D)u^* - 1_D\otimes d\| < \varepsilon$ for all d ∈ F. Let θ : D⊗D → D be a ∗-isomorphism. Then $\|(\theta(u^*)\otimes 1_D)\,u(d\otimes 1_D)u^*\,(\theta(u)\otimes 1_D) - 1_D\otimes d\| < \varepsilon$ for all d ∈ F. By Proposition 1.3, θ ⊗ 1D ≈u idD⊗D and so there is a unitary v ∈ U(D ⊗ D) such that $\|\theta(u^*)\otimes 1_D - vu^*v^*\| < \varepsilon$ and hence $\|(\theta(u^*)\otimes 1_D)u - vu^*v^*u\| < \varepsilon$. Setting $w = vu^*v^*u$ we deduce that $\|w(d\otimes 1_D)w^* - 1_D\otimes d\| < 3\varepsilon$ for all d ∈ F.
1.5 Remark: In the situation of Proposition 1.3, suppose that the commutator subgroup of U(D) is contained in U0(D). This will happen for instance if D is assumed to be K1- injective. Then one may choose the unitaries (un)n∈N which implement the approximate 4 MARIUS DADARLAT AND WILHELM WINTER unitary equivalence between σ and γ to lie in U0(A⊗D). This follows from [14, (the proof of) Corollary 1.12], since the unitaries (un)n∈N are essentially images of the unitaries (wn)n∈N of Lemma 1.4 under suitable unital ∗-homomorphisms. 2. Asymptotic vs. approximate unitary equivalence It is the aim of this section to establish a continuous version of Proposition 1.3. 2.1 Lemma: Let D be separable unital strongly self-absorbing C∗-algebra. For any finite subset F ⊂ D and ε > 0, there are a finite subset G ⊂ D and δ > 0 such that the following holds: If A is another unital C∗-algebra and σ : D → A⊗D is a unital ∗-homomorphism, and if w ∈ U0(A⊗D) is a unitary satisfying ‖[w, σ(d)]‖ < δ for all d ∈ G, then there is a continuous path (wt)t∈[0,1] of unitaries in U0(A ⊗ D) such that w0 = w, w1 = 1A⊗D and ‖[wt, σ(d)]‖ < ε for all d ∈ F , t ∈ [0, 1]. Proof: We may clearly assume that the elements of F are normalized and that ε < 1. Let u ∈ D ⊗D be a unitary satisfying (1) ‖u(d ⊗ 1D)u ∗ − 1D ⊗ d‖ < for all d ∈ F . There exist k ∈ N and elements s1, . . . , sk, t1, . . . , tk ∈ D of norm at most one such that (2) ‖u− si ⊗ ti‖ < (3) δ := k · 10 (4) G := {s1, . . . , sk} ⊂ D. Now let w ∈ U0(A⊗D) be a unitary as in the assertion of the lemma, i.e., w satisfies (5) ‖[w, σ(si)]‖ < δ for all i = 1, . . . , k. We proceed to construct the path (wt)t∈[0,1]. By [14, Remark 2.7] there is a unital ∗-homomorphism ϕ : A⊗D ⊗D → A⊗D ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 5 such that (6) ‖ϕ(a⊗ 1D)− a‖ < for all a ∈ σ(F) ∪ {w}. Since w ∈ U0(A⊗D), there is a path (w̄t)t∈[ 1 ,1] of unitaries in A⊗D such that (7) w̄ 1 = w and w̄1 = 1A⊗D. 
For t ∈ [1 , 1] define (8) wt := ϕ((σ ⊗ idD)(u) ∗(w̄t ⊗ 1D)(σ ⊗ idD)(u)) ∈ U(A⊗D); then (wt)t∈[ 1 ,1] is a continuous path of unitaries in A ⊗ D. For t ∈ [ , 1] and d ∈ F we ‖[wt, σ(d)]‖ = ‖wtσ(d)w t − σ(d)‖ < ‖wtϕ(σ(d) ⊗ 1D)w t − ϕ(σ(d) ⊗ 1D)‖+ 2 · ≤ ‖((σ ⊗ idD)(u)) ∗(w̄t ⊗ 1D)((σ ⊗ idD)(u(d ⊗ 1D)u ∗))(w̄∗t ⊗ 1D) ·((σ ⊗ idD)(u)) − ((σ ⊗ idD)(d⊗ 1D))‖ + < ‖((σ ⊗ idD)(u)) ∗(w̄t ⊗ 1D)((σ ⊗ idD)(1D ⊗ d))(w̄ t ⊗ 1D) ·((σ ⊗ idD)(u)) − ((σ ⊗ idD)(d⊗ 1D))‖ + = ‖(σ ⊗ idD)(u ∗(1D ⊗ d)u− d⊗ 1D)‖+ 6 MARIUS DADARLAT AND WILHELM WINTER where for the last equality we have used that the w̄t are unitaries and that σ is a unital ∗-homomorphism. Furthermore, we have (7),(8) = ‖ϕ(((σ ⊗ idD)(u)) ∗(w ⊗ 1D)((σ ⊗ idD)(u))) − w‖ < ‖ϕ(((σ ⊗ idD)(u)) ∗(w ⊗ 1D)( σ(si)⊗ ti))− w‖+ ≤ ‖ϕ(((σ ⊗ idD)(u)) σ(si)⊗ ti)(w ⊗ 1D))− w‖ ‖[w, σ(si)]‖ · ‖ti‖+ (5),(4),(2) < ‖ϕ(w ⊗ 1D)− w‖+ k · δ + 2 · (6),(3) + 2 · The above estimate allows us to extend the path (wt)t∈[ 1 ,1] to the whole interval [0, 1] in the desired way: We have ‖w 1 w∗ − 1D‖ < < 2, whence −1 is not in the spectrum of w 1 w∗. By functional calculus, there is a = a∗ ∈ A ⊗ D with ‖a‖ < 1 such that w∗ = exp(πia). For t ∈ [0, 1 ) we may therefore define a continuous path of unitaries wt := (exp(2πita))w ∈ U(A⊗D). It is clear that w0 = w and wt → w 1 as t→ (1 )−, whence (wt)t∈[0,1] is a continuous path of unitaries in A satisfying w0 = w and w1 = 1A ⊗D. Moreover, it is easy to see that ‖wt − w‖ ≤ ‖w 1 − w‖ < for all t ∈ [0, 1 ), whence ‖[wt, σ(d)]‖ < ‖[w 1 , σ(d)]‖ + for t ∈ [0, 1 ), d ∈ F . We have now constructed a path (wt)t∈[0,1] ⊂ U(A) with the desired properties. 2.2 Theorem: Let A and D be unital C∗-algebras, with D separable, strongly self- absorbing and K1-injective. Then, any two unital ∗-homomorphisms σ, γ : D → A⊗D are strongly asymptotically unitarily equivalent. In particular, any two unital endomorphisms of D are strongly asymptotically unitarily equivalent. 
ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 7 Proof: Note that the second statement follows from the first one with A = D, since D ∼= D ⊗D by assumption. Let A be a unital C∗-algebra such that A ∼= A ⊗ D and let σ, γ : D → A be unital ∗-homomorphisms. We shall prove that σ and γ are strongly asymptotically unitarily equivalent. Choose an increasing sequence F0 ⊂ F1 ⊂ . . . of finite subsets of D such that Fn is a dense subset of D. Let 1 > ε0 > ε1 > . . . be a decreasing sequence of strictly positive numbers converging to 0. For each n ∈ N, employ Lemma 2.1 (with Fn and εn in place of F and ε) to obtain a finite subset Gn ⊂ D and δn > 0. We may clearly assume that (10) Fn ⊂ Gn ⊂ Gn+1 and that δn+1 < δn < εn for all n ∈ N. Since σ and γ are strongly approximately unitarily equivalent by Proposition 1.3 and Remark 1.5, there is a sequence of unitaries (un)n∈N ⊂ U0(A) such that (11) ‖unσ(d)u n − γ(d)‖ < for all d ∈ Gn and n ∈ N. Let us set wn := u n+1un, n ∈ N. Then wn ∈ U0(A) and ‖[wn, σ(d)]‖ = ‖wnσ(d)w n − σ(d)‖ ≤ ‖u∗n+1unσ(d)u nun+1 − u n+1γ(d)un+1‖ +‖u∗n+1γ(d)un+1 − σ(d)‖ for d ∈ Gn, n ∈ N. Now by Lemma 2.1 (and the choice of the Gn and δn), for each n there is a continuous path (wn,t)t∈[0,1] of unitaries in U0(A) such that wn,0 = wn, wn,1 = 1A and (12) ‖[wn,t, σ(d)]‖ < εn for all d ∈ Fn, t ∈ [0, 1]. Next, define a path (ūt)t∈[0,∞) of unitaries in U0(A) by ūt := un+1wn,t−n if t ∈ [n, n+ 1). 8 MARIUS DADARLAT AND WILHELM WINTER We have that (13) ūn = un+1wn = un and that ūt → un+1 as t → n + 1 from below, which implies that the path (ūt)t∈[0,∞) is continuous in U0(A). Furthermore, for t ∈ [n, n+ 1) and d ∈ Fn we obtain ‖ūtσ(d)ū t − γ(d)‖ = ‖un+1wn,t−nσ(d)w n,t−nu n+1 − γ(d)‖ < ‖un+1σ(d)u n+1 − γ(d)‖ + εn (11),(10) < 2εn. Since the Fn are nested and the εn converge to 0, we have (14) ‖ūtσ(d)ū t − γ(d)‖ for all d ∈ n=0Fn; by continuity and since n=0Fn is dense in D, we have (14) for all d ∈ D. Since ū0 ∈ U0(A) we may arrange that ū0 = 1A. 3. 
The group KK(D, A⊗D) and some applications

3.1 For a separable C∗-algebra D we endow the group of automorphisms Aut(D) with the point-norm topology.

Corollary: Let D be a separable, unital, strongly self-absorbing and K1-injective C∗-algebra. Then [X, Aut(D)] reduces to a point for any compact Hausdorff space X.

Proof: Let ϕ, ψ : X → Aut(D) be continuous maps. We identify ϕ and ψ with unital ∗-homomorphisms ϕ, ψ : D → C(X) ⊗ D. By Theorem 2.2, ϕ is strongly asymptotically unitarily equivalent to ψ. This gives a homotopy between the two maps ϕ, ψ : X → Aut(D).

3.2 Remark: The conclusion of Corollary 3.1 was known before for D a UHF algebra of infinite type and X a CW complex by [13], for D = O2 by [8] and [10], and for D = O∞ by [2]. It is new for the Jiang–Su algebra.

3.3 For unital C∗-algebras D and B we denote by [D, B] the set of homotopy classes of unital ∗-homomorphisms from D to B. By an argument similar to the one above we also have the following corollary.

Corollary: Let D and A be unital C∗-algebras. If D is separable, strongly self-absorbing and K1-injective, then [D, A⊗D] reduces to a singleton.

3.4 For separable unital C∗-algebras D and B, let χi : KKi(D, B) → KKi(C, B) ∼= Ki(B), i = 0, 1, be the morphism of groups induced by the unital inclusion ν : C → D.

Theorem: Let D be a unital, separable and strongly self-absorbing C∗-algebra. Then for any separable C∗-algebra A, the map χi : KKi(D, A ⊗ D) → Ki(A ⊗ D) is bijective, for i = 0, 1. In particular both groups KKi(D, A⊗D) are countable and discrete with respect to their natural topology.

Proof: Since D is KK-equivalent to D ⊗ O∞, we may assume that D is purely infinite and in particular K1-injective by [11, Prop. 4.1.4]. Let CνD denote the mapping cone C∗-algebra of ν. By [3, Cor. 3.10], there is a bijection [D, A⊗D] → KK(CνD, SA⊗D) and hence KK(CνD, SA⊗D) = 0 for all separable and unital C∗-algebras A as a consequence of Corollary 3.3.
Since KK(CνD, A ⊗ D) is isomorphic to KK(CνD, S²A ⊗ D) by Bott periodicity and the latter group injects in KK(CνD, SC(T) ⊗ A ⊗ D) = 0, we have that KKi(CνD, D ⊗ A) = 0 for all unital and separable C∗-algebras A and i = 0, 1. Since KKi(CνD, D ⊗ A) is a subgroup of KKi(CνD, D ⊗ Ã) = 0 (where Ã is the unitization of A) we see that KKi(CνD, D ⊗ A) = 0 for all separable C∗-algebras A. Using the Puppe exact sequence

KKi+1(CνD, A⊗D) → KKi(D, A⊗D) → KKi(C, A⊗D) → KKi(CνD, A⊗D),

where the map KKi(D, A⊗D) → KKi(C, A⊗D) is χi = ν∗, we conclude that χi is an isomorphism, i = 0, 1. The map χi = ν∗ is continuous since it is given by the Kasparov product with a fixed element (we refer the reader to [12], [9] or [1] for background on the topology of the Kasparov groups). Since the topology of Ki is discrete and χi is injective, it follows that the topology of KKi(D, A⊗D) is also discrete. The countability of KKi(D, A⊗D) follows from that of Ki(A⊗D), as A⊗D is separable.

3.5 Remark: In contrast to Theorem 3.4, if D is the universal UHF algebra, then KK(D, C) ∼= Ext(Q, Z) ∼= Q^N has the power of the continuum [6, p. 221].

3.6 Let D and A be as in Theorem 3.4 and assume in addition that D is K1-injective and A is unital. Let ι : D → A⊗D be defined by ι(d) = 1A ⊗ d.

Corollary: If e ∈ K ⊗ A ⊗ D is a projection, and ϕ, ψ : D → e(K ⊗ A ⊗ D)e are two unital ∗-homomorphisms, then ϕ ≈suh ψ and hence [ϕ] = [ψ] ∈ KK(D, A⊗D). Moreover:

KK(D, A⊗D) = {[ϕ] − n[ι] | ϕ : D → K⊗A⊗D is a ∗-homomorphism, n ∈ N}.

Proof: Let ϕ, ψ and e be as in the first part of the statement. By [14, Cor. 3.1], the unital C∗-algebra e(K⊗A⊗D)e is D-stable, being a hereditary subalgebra of a D-stable C∗-algebra. Therefore ϕ ≈suh ψ by Theorem 2.2. Now for the second part of the statement, let x ∈ KK(D, A ⊗ D) be an arbitrary element. Then χ0(x) = [e] − n[1A⊗D] for some projection e ∈ K⊗A⊗D and n ∈ N. Since e(K ⊗ A ⊗ D)e is D-stable, there is a unital ∗-homomorphism ϕ : D → e(K ⊗ A ⊗ D)e.
Then χ0([ϕ] − n[ι]) = [ϕ(1D)] − n[ι(1D)] = [e] − n[1A⊗D] = χ0(x), and hence [ϕ] − n[ι] = x since χ0 is injective by Theorem 3.4.

In the remainder of the paper we give characterizations for the Cuntz algebra O2 and for the universal UHF-algebra which do not require the UCT. The latter result is a variation of a theorem of Effros and Rosenberg [5].

3.7 Proposition: Let D be a separable unital strongly self-absorbing C∗-algebra. If [1D] = 0 in K0(D), then D ∼= O2.

Proof: Since D must be nuclear (see [14]), D embeds unitally in O2 by Kirchberg’s theorem. D is not stably finite since [1D] = 0. By the dichotomy of [14, Thm. 1.7] D must be purely infinite. Since [1D] = 0 in K0(D), there is a unital embedding O2 → D, see [11, Prop. 4.2.3]. We conclude that D is isomorphic to O2 by [14, Prop. 5.12].

3.8 Proposition: Let D, A be separable, unital, strongly self-absorbing C∗-algebras. Suppose that for any finite subset F of D and any ε > 0 there is a u.c.p. map ϕ : D → A such that ‖ϕ(cd) − ϕ(c)ϕ(d)‖ < ε for all c, d ∈ F. Then A ∼= A⊗D.

Proof: By [14, Thm. 2.2] it suffices to show that for any given finite subsets F of D, G of A and any ε > 0 there is a u.c.p. map Φ : D → A such that (i) ‖Φ(cd) − Φ(c)Φ(d)‖ < ε for all c, d ∈ F and (ii) ‖[Φ(d), a]‖ < ε for all d ∈ F and a ∈ G. We may assume that ‖d‖ ≤ 1 for all d ∈ F. Since A is strongly self-absorbing, by [14, Prop. 1.10] there is a unital ∗-homomorphism γ : A⊗A → A such that ‖γ(a⊗1A) − a‖ < ε/2 for all a ∈ G. On the other hand, by assumption there is a u.c.p. map ϕ : D → A such that ‖ϕ(cd) − ϕ(c)ϕ(d)‖ < ε for all c, d ∈ F. Let us define a u.c.p. map Φ : D → A by Φ(d) = γ(1A ⊗ ϕ(d)). It is clear that Φ satisfies (i) since γ is a ∗-homomorphism. To conclude the proof we check now that Φ also satisfies (ii). Let d ∈ F and a ∈ G. Then

‖[Φ(d), a]‖ ≤ ‖[Φ(d), a − γ(a ⊗ 1A)]‖ + ‖[Φ(d), γ(a ⊗ 1A)]‖
 ≤ 2‖Φ(d)‖ ‖a − γ(a ⊗ 1A)‖ + ‖[γ(1A ⊗ ϕ(d)), γ(a ⊗ 1A)]‖
 < 2 · ε/2 + 0 = ε.
3.9 Proposition: Let D be a separable, unital, strongly self-absorbing C∗-algebra. Suppose that D is quasidiagonal, that it has cancellation of projections and that [1D] ∈ nK0(D) for all n ≥ 1. Then D is isomorphic to the universal UHF algebra Q with K0(Q) ∼= Q.

Proof: Since D is separable, unital and quasidiagonal, there is a unital ∗-representation π : D → B(H) on a separable Hilbert space H and a sequence of nonzero projections pn ∈ B(H) of finite rank k(n) such that limn→∞ ‖[pn, π(d)]‖ = 0 for all d ∈ D. Then the sequence of u.c.p. maps ϕn : D → pnB(H)pn ∼= Mk(n)(C) ⊂ Q is asymptotically multiplicative, i.e. limn→∞ ‖ϕn(cd) − ϕn(c)ϕn(d)‖ = 0 for all c, d ∈ D. Therefore Q ∼= Q⊗D by Proposition 3.8.

In the second part of the proof we show that D ∼= D ⊗ Q. Let En : Q → Mn!(C) ⊂ Q be a conditional expectation onto Mn!(C). Then limn→∞ ‖En(a) − a‖ = 0 for all a ∈ Q. By assumption, for each n there is a projection e in D ⊗ Mm(C) (for some m) such that n![e] = [1D] in K0(D). Let ϕ : Mn!(C) → Mn!(C) ⊗ e(D ⊗ Mm(C))e be defined by ϕ(b) = b ⊗ e. Since D has cancellation of projections and since n![e] = [1D], there is a partial isometry v ∈ Mn!(C) ⊗ D ⊗ Mm(C) such that v∗v = 1Mn!(C) ⊗ e and vv∗ = e11 ⊗ 1D ⊗ e11. Therefore b ↦ v ϕ(b) v∗ gives a unital embedding of Mn!(C) into D. Finally, ψn(a) = v (ϕ ◦ En(a)) v∗ defines a sequence of asymptotically multiplicative u.c.p. maps Q → D. Therefore D ∼= D ⊗ Q by Proposition 3.8.

3.10 Remark: Let D be a separable, unital, strongly self-absorbing and quasidiagonal C∗-algebra. Then D ⊗ Q ∼= Q by the first part of the proof of Proposition 3.9. In particular K1(D) ⊗ Q = 0 and K0(D) ⊗ Q ∼= Q by the Künneth formula (or by writing Q as an inductive limit of matrices).

References

[1] M. Dadarlat. On the topology of the Kasparov groups and its applications, J. Funct. Anal. 228 (2005), 394–418.
[2] M. Dadarlat.
Continuous fields of C∗-algebras over finite dimensional spaces, arXiv preprint math.OA/0611405 (2006).
[3] M. Dadarlat. The homotopy groups of the automorphism group of Kirchberg algebras, J. Noncomm. Geom. 1 (2007), 113–139.
[4] M. Dadarlat and W. Winter. Trivialization of C(X)-algebras with strongly self-absorbing fibres, preprint (2007).
[5] E. G. Effros and J. Rosenberg. C∗-algebras with approximately inner flip, Pacific J. Math. 77 (1978), 417–443.
[6] L. Fuchs. Infinite abelian groups, vol. 1, Academic Press, New York and London, 1970.
[7] I. Hirshberg, M. Rørdam and W. Winter. C0(X)-algebras, stability and strongly self-absorbing C∗-algebras, arXiv preprint math.OA/0610344 (2006). To appear in Math. Ann.
[8] E. Kirchberg. The classification of purely infinite C∗-algebras using Kasparov’s theory, preprint (1994).
[9] M. V. Pimsner. A topology on the Kasparov groups, draft.
[10] N. C. Phillips. A classification theorem for nuclear purely infinite simple C∗-algebras, Documenta Math. 5 (2000), 49–114.
[11] M. Rørdam. Classification of Nuclear C∗-Algebras, Encyclopaedia Math. Sci., vol. 126, Springer, Berlin, 2002.
[12] C. Schochet. The fine structure of the Kasparov groups I. Continuity of the KK-pairing, J. Funct. Anal. 186 (2001), 25–61.
[13] K. Thomsen. The homotopy type of the group of automorphisms of a UHF-algebra, J. Funct. Anal. 72 (1987), 182–207.
[14] A. Toms and W. Winter. Strongly self-absorbing C∗-algebras, arXiv preprint math.OA/0502211 (2005). To appear in Trans. Amer. Math. Soc.
[15] A. Toms and W. Winter. Z-stable ASH algebras, arXiv preprint math.OA/0508218 (2005). To appear in Can. J. Math.
[16] W. Winter. On the classification of simple Z-stable C∗-algebras with real rank zero and finite decomposition rank, J. London Math. Soc. 74 (2006), 167–183.
[17] W. Winter. Simple C∗-algebras with locally finite decomposition rank, J. Funct. Anal.
243 (2007), 394–425.
[18] W. Winter. Localizing the Elliott conjecture, in preparation.

Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA
E-mail address: mdd@math.purdue.edu

Mathematisches Institut der Universität Münster, Einsteinstr. 62, D-48149 Münster, Germany
E-mail address: wwinter@math.uni-muenster.de

0. Introduction
1. Strongly self-absorbing C*-algebras
2. Asymptotic vs. approximate unitary equivalence
3. The group KK(D, A ⊗ D) and some applications
References

ABSTRACT Let $\mathcal{D}$ and $A$ be unital and separable $C^{*}$-algebras; let $\mathcal{D}$ be strongly self-absorbing. It is known that any two unital $^*$-homomorphisms from $\mathcal{D}$ to $A \otimes \mathcal{D}$ are approximately unitarily equivalent. We show that, if $\mathcal{D}$ is also $K_{1}$-injective, they are even asymptotically unitarily equivalent. This in particular implies that any unital endomorphism of $\mathcal{D}$ is asymptotically inner. Moreover, the space of automorphisms of $\mathcal{D}$ is compactly-contractible (in the point-norm topology) in the sense that for any compact Hausdorff space $X$, the set of homotopy classes $[X,\mathrm{Aut}(\mathcal{D})]$ reduces to a point. The respective statement holds for the space of unital endomorphisms of $\mathcal{D}$. As an application, we give a description of the Kasparov group $KK(\mathcal{D}, A\otimes \mathcal{D})$ in terms of $^*$-homomorphisms and asymptotic unitary equivalence. Along the way, we show that the Kasparov group $KK(\mathcal{D}, A\otimes \mathcal{D})$ is isomorphic to $K_0(A\otimes \mathcal{D})$. <|endoftext|><|startoftext|> Effective interactions from q-deformed inspired transformations

V. S. Timóteo (a), C. L. Lima (b)

(a) Centro Superior de Educação Tecnológica, Universidade Estadual de Campinas, 13484-370, Limeira, SP, Brasil
(b) Instituto de Física, Universidade de São Paulo, CP 66318, 05315-970, São Paulo, SP, Brazil

Abstract

From the mass term for the transformed quark fields, we obtain effective contact interactions of the NJL type.
The parameters of the model that maps a system of non-interacting transformed fields into quarks interacting via NJL contact terms are discussed.

Preprint submitted to Elsevier, 4 November 2018. http://arxiv.org/abs/0704.0584v1

It is very common in physics to use transformations that make one particular system mathematically simpler while still describing the same phenomena. A clear example is the use of canonical transformations in classical mechanics. q-Deformed algebras provide a nice framework to incorporate, in an effective way, interactions not originally contained in the Lagrangian of a particular system.

In hadron physics, the NJL model is a very simple effective model for strong interactions that describes important features like dynamical mass generation, spontaneous chiral symmetry breaking, and chiral symmetry restoration at finite temperature.

In recent works, we have been investigating possible applications of quantum algebras in hadronic physics. In general, we observed that when we deform the underlying algebra, the system acquires correlations between its constituents. We have studied in detail the NJL model under the influence of a quantum su(2) algebra.

The question we approach in this letter is: is it possible to obtain a transformation connecting the NJL model to a simpler non-interacting system? We verified that we can indeed obtain the same dynamics of the NJL interaction with a simple transformation of the quark fields, inspired by the q-deformed quark fields of previous works [2], [3], [6].

Mass term

We start by writing a mass term for the transformed quark fields

$$\mathcal{L}^{\rm mass}_q = -M\,\overline{\Psi}\Psi = -M\left(\overline{\Psi}_1\Psi_1 + \overline{\Psi}_2\Psi_2\right) = -M\left(\overline{U}U + \overline{D}D\right), \tag{1}$$

where

$$\Psi = \begin{pmatrix}\Psi_1\\ \Psi_2\end{pmatrix} = \begin{pmatrix}U\\ D\end{pmatrix}. \tag{2}$$

The transformed quark fields can be written in terms of the standard fields as

$$\Psi_1 = \psi_1 + (q^{-1}-1)\,\psi_1\,\overline{\psi}_2\gamma_0\psi_2, \tag{3}$$
$$\Psi_2 = \psi_2 + (q^{-1}-1)\,\psi_2\,\overline{\psi}_1\gamma_0\psi_1, \tag{4}$$
$$U = u + (q^{-1}-1)\,u\,\overline{d}\gamma_0 d, \tag{5}$$
$$D = d + (q^{-1}-1)\,d\,\overline{u}\gamma_0 u, \tag{6}$$

with

$$\psi = \begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} = \begin{pmatrix}u\\ d\end{pmatrix}.$$
(7)

Here both components are modified in the same way, so that the above expressions are different from the ones used in [2,3], where only one component is affected. Extending the transformation to both components is required to obtain a set of terms that will form an interaction of the NJL type. This implies that the anti-commutation relations for the deformed fields Ψ will also be different from the ones in [2,3]. Since obtaining the new anti-commutation relations is not in the scope of this work, we focus on the effective interactions contained in the non-interacting Lagrangian.

Using Eqs. (5) and (6), we can re-write the Lagrangian Eq. (1) in terms of the standard quark fields

$$\overline{U}U = \overline{u}u + Q\,\overline{u}u\,d^{\dagger}d + Q\,d^{\dagger}d\,\overline{u}u + Q^{2}\,\overline{d}d\,\overline{u}u\,\overline{d}d, \tag{8}$$
$$\overline{D}D = \overline{d}d + Q\,\overline{d}d\,u^{\dagger}u + Q\,u^{\dagger}u\,\overline{d}d + Q^{2}\,\overline{u}u\,\overline{d}d\,\overline{u}u, \tag{9}$$

where Q = (q⁻¹ − 1). We can re-write the above equations as follows 1 + 2Q d†d dduudd+ dduudd , (10) 1 + 2Q u†u uudduu+ uudduu , (11) so that we identify the contact interactions between the quarks contained in the non-interacting deformed fields Lagrangian. Figure 1 shows the six-point contact interactions contained in the mass term for the q-deformed quark fields.

We can reduce the six-point interactions to four-point contact terms in a mean field approach [5], so that we have UU +DD 1 + 2Q uu+ dd dduu+ dddd+ uudd+ uuuu , (12) where ⟨ψ†ψ⟩ = ⟨u†u⟩ = ⟨d†d⟩ = ρv, ⟨ψ̄ψ⟩ = ⟨ūu⟩ = ⟨d̄d⟩ = ρs, and A = A(T; q) has the same dimension as the condensate and will be determined later in this letter. The reduction of the six-point to four-point contact terms by closing one fermion line is also shown in Figure 1.
Now we can write the mass term for the transformed quark fields Lmassq = −MΨΨ = −M 1 + 2 ψψ − M Γ2 ψψψψ , (13) with Γ = Q/A Kinetic energy term Accordingly, the kinetic energy term for the transformed fields, Ψγµ∂µΨ, can be written in terms of the standard ones as Ψγµ∂µΨ=Uγ µ∂µU +Dγ = uγµ∂µu+Q dγ0duγ µ∂µu+ uγ µ∂µudγ0d + dγµ∂µd+Q uγ0udγ µ∂µd+ dγ µ∂µduγ0u dγ0duγ µ∂µudγ0d uγ0udγ µ∂µduγ0u By using an extreme mean field approximation, namely, substituting every- where in the kinetic energy contribution 〈ψ†ψ〉 = 〈u†u〉 = 〈d†d〉 → ρv, and = 〈uu〉 = → ρs, we obtain Ψγµ∂µΨ= uγ µ∂µu (1 + 2Γρv) + dγµ∂µd (1 + 2Γρv) + (uγµ∂µu) Γ 2ρv + dγµ∂µd uγµ∂µu+ dγ (1 + Γρv) This corresponds to a usual kinetic energy with a shifted momentum p → p (1 + Γρv) The full Lagrangian The treatment of the density dependence of the kinetic energy term is rather cumbersome and will be postponed to a further contribution. We will consider the influence of this momentum dependent kinetic energy term in an effective way. Therefore, we will study a class of Lagrangians of the type L′q = (1 + Γρv) Lq = ψγµ∂µψ −M 1 + 2 (1 + Γρv) (1 + Γρv) ψψψψ (16) This representative of the full Lagrangian Lq = Ψγµ∂µΨ + Lmassq , when writ- ten in terms of the standard quark fields, can be identified with the NJL Lagrangian LNJL = ψγµ∂µψ −m0 ψψ +G ψψψψ . (17) The conditions for both Lagrangians, LNJL and L′q, to be equivalent for any values of T and q are (1 + Γρv) (1 + 2Γρv) m0 , (18) G = −M (1 + Γρv) . (19) Inserting Eq. (18) in Eq. (19), we obtain an equation for Γ Γ2 − 2αρv Γ− α = 0 , (20) where α = − 2G > 0. (21) This equation has two solutions Γ± = αρv . (22) The mass of the transformed fermion fields, M , has to be positive, so we associate the two solutions Γ− and Γ+ with the two regimes q < 1 and q > 1, respectively. The quantity A will be negative in both cases. 
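As a quick numerical check of Eq. (20), the following minimal sketch solves the quadratic Γ² − 2αρvΓ − α = 0 in closed form, Γ± = αρv ± √(α²ρv² + α), and verifies both roots by substitution. The function name `gamma_roots` and the sample values of α and ρv are illustrative assumptions; they are not numbers taken from this letter.

```python
import math


def gamma_roots(alpha, rho_v):
    """Return (Gamma_minus, Gamma_plus), the two roots of
    Gamma**2 - 2*alpha*rho_v*Gamma - alpha = 0, i.e. Eq. (20)."""
    disc = (alpha * rho_v) ** 2 + alpha  # one quarter of the discriminant
    if disc < 0:
        raise ValueError("no real roots for these inputs")
    root = math.sqrt(disc)
    return alpha * rho_v - root, alpha * rho_v + root


# Hypothetical inputs, chosen only to exercise the formula.
alpha, rho_v = 2.0, 0.5
g_minus, g_plus = gamma_roots(alpha, rho_v)

# Substitute each root back into Eq. (20); the residual should vanish.
for g in (g_minus, g_plus):
    residual = g * g - 2.0 * alpha * rho_v * g - alpha
    assert abs(residual) < 1e-12
```

For α > 0 the product of the two roots equals −α, so Γ− is always negative and Γ+ always positive, which matches the assignment of the two solutions to two distinct regimes.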
The scalar (ρs) and vector (ρv) densities were calculated from the NJL model at finite temperature: [1− n− n] , (23) dpp2 [n− n] , (24) where n(p, T, µ) = 1 + exp [β (E − µ)] , (25) n(p, T, µ) = 1 + exp [β (E + µ)] , (26) are the fermions and anti-fermions distribution functions respectively with p2 +m2. First we solve the set of coupled gap equations for m, µ, and ρv (Eqs. 27 and 24, respectively) in the NJL model at finite temperature and chemical potential m = m0 − 2Gρs , µ = µ0 − GNcρv . The next step is to calculate the scalar and vector densities entering in the equation for Γ for a given value of the transformation parameter q. In this way we obtain A(T ; q), which in turn is used to obtain M . The numerical results are displayed in Figures 2 and 3, where we show the quantity A and the mass M as a function of both temperature and transformation parameter in the q > 1 and q < 1 regimes. It is worth to note that the mass of the transformed fermion fields does not depend on the transformation parameter. The well known results of the NJL model are mapped through A(T ; q) from the non-interacting transformed fermion fields Lagrangian. It is worth to note that the mass of the q-deformed fermion fields does not depend on the deformation of the algebra. The quantity A (T ; q) maps the simple non-interacting model into the NJL model. It represents, in an effective way, the correlations introduced by the transformations, when we write the non-interacting Lagrangian in terms of the standard quark fields. These correlations, in a mean field approximation, are effectively represented by contact interactions of the NJL type. It is also important to mention that it inherits the phase transition. When the con- densate and the dynamical mass vanishes with increasing T , the quantity A also experiences the phase transition. This is an expected behavior, since it depends on the dynamical mass. 
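The fermion and anti-fermion distribution functions of Eqs. (25) and (26), with β = 1/T and E = √(p² + m²), can be sketched as follows. This is only an illustration in natural units: passing the mass m as an explicit argument, the helper names, and the numerically stable form of the Fermi function are assumptions of this sketch, and no prefactors of the density integrals are reproduced here.

```python
import math


def fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def energy(p, m):
    """E = sqrt(p^2 + m^2), natural units."""
    return math.sqrt(p * p + m * m)


def n_fermion(p, T, mu, m):
    """Eq. (25): 1 / (1 + exp[beta (E - mu)]) with beta = 1/T."""
    return fermi((energy(p, m) - mu) / T)


def n_antifermion(p, T, mu, m):
    """Eq. (26): 1 / (1 + exp[beta (E + mu)]) with beta = 1/T."""
    return fermi((energy(p, m) + mu) / T)


# Hypothetical sample point (GeV): at mu = 0 both distributions coincide.
assert n_fermion(0.2, 0.1, 0.0, 0.3) == n_antifermion(0.2, 0.1, 0.0, 0.3)
```

At vanishing chemical potential the two distributions are equal, so a difference n − n̄ vanishes identically; a finite μ is what makes fermions and anti-fermions contribute differently.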
For a given temperature, T, and transformation parameter, q, there is a value of the mapping function, A(T; q), that makes the Lagrangians Eq. (16) and Eq. (17) equivalent.

Summarizing, we have shown that it is possible to describe the dynamics of an interacting system of the NJL type with a simple non-interacting system by using a set of quantum algebra inspired transformations and a mapping function.

Acknowledgments

C. L. L. thanks Profs. D. Galetti and B. M. Pimentel for most helpful discussions. This work was partially supported by FAPESP Grant No. 2002/10896-7. V.S.T. would like to thank FAEPEX/UNICAMP for financial support.

References

[1] D. Galetti and B. M. Pimentel, An. Acad. Bras. Ci. 67 (1995) 7; S. S. Avancini, A. Eiras, D. Galetti, B. M. Pimentel, and C. L. Lima, J. Phys. A: Math. Gen. 28 (1995) 4915; D. Galetti, J. T. Lunardi, B. M. Pimentel, and C. L. Lima, Physica A 242 (1997) 501.
[2] M. Ubriaco, Phys. Lett. A 219 (1996) 205.
[3] L. Tripodi and C. L. Lima, Phys. Lett. B 412 (1997) 7.
[4] Y. Nambu and G. Jona-Lasinio, Physical Review 122 (1961) 345.
[5] U. Vogl and W. Weise, Prog. Part. Nucl. Phys. 27 (1991) 195.
[6] V. S. Timóteo and C. L. Lima, Phys. Lett. B 448 (1999) 1.
[7] V. S. Timóteo and C. L. Lima, Mod. Phys. Lett. A 15 (2000) 219.
[8] V. S. Timóteo and C. L. Lima, nucl-th/0509089.

Fig. 1. Contact interactions generated by the mass term for the q-deformed fermion fields and their reduction from six-point to four-point by closing one fermion line.

Fig. 2. The quantity A, in units of the chiral condensate at zero temperature ρs(T = 0) = −1.42 × 10⁻² GeV³, as a function of temperature and q-deformation for the q > 1 and q < 1 regimes. (Axes: T (GeV) and q.)

Fig. 3. The mass of the q-deformed quark fields, in units of the current quark mass m0 = 5 MeV, as a function of T for both q > 1 and q < 1 regimes. For small temperatures, M = m0. (Horizontal axis: T (GeV), from 0 to 0.2.)

<|endoftext|><|startoftext|> Magnetospectroscopy of epitaxial few-layer graphene

M.L. Sadowski (a), G. Martinez (a), M. Potemski (a), C. Berger (b,c), W.A. de Heer (b)

(a) Grenoble High Magnetic Field Laboratory, Grenoble, France
(b) Georgia Institute of Technology, Atlanta, Georgia, USA
(c) Institut Néel, CNRS, Grenoble, France

Abstract

The inter-Landau level transitions observed in far-infrared transmission experiments on few-layer graphene samples show a behaviour characteristic of the linear dispersion expected in graphene. This behaviour persists in relatively thick samples, and is qualitatively different from that of thin samples of bulk graphite.

Key words: Graphene, Cyclotron resonance
PACS: 71.70.Di, 76.40.+b, 78.30.-j, 78.67.-n

The interest in two-dimensional graphite is fuelled by its particular band structure and ensuing dispersion relation for electrons, leading to numerous differences with respect to “conventional” two-dimensional electron gases (2DEG). Single graphite layers (graphene) have long been used as a starting point in band structure calculations of bulk graphite [1,2,3] and, more recently, carbon nanotubes [4]. The band structure of a single graphene sheet is considered to be composed of cones located at two inequivalent Brillouin zone corners at which the conduction and valence bands merge.
In the vicinity of these points the electron energy depends linearly on its momentum, which implies that free charge carriers in graphene are governed not by Schrödinger’s equation, but rather by Dirac’s relativistic equation for zero rest mass particles, with an effective velocity c̃, which replaces the speed of light [5,6].

The recent appearance of ultrathin graphite layers (few-layer graphene, FLG), obtained by epitaxial [7,8,9] and exfoliation techniques [10], followed by single-layer graphene and its unusual sequence of quantum Hall states [11,12], has re-ignited this interest. The prospects of studying quantum electrodynamics in solid state experiments on the one hand and the possibility of future applications in carbon-based electronics on the other are currently driving a considerable research effort. The majority of the published literature remains theoretical; the extremely small lateral dimensions (≈ 10 µm) of the graphene flakes used in the above-mentioned transport experiments make them difficult objects for experimental studies. Moreover, due to the somewhat hit-and-miss character of the exfoliation method, as well as the inherent difficulty of obtaining large numbers of samples, it appears to be an unlikely candidate for possible applications. Epitaxial methods on the other hand offer the opportunity of obtaining relatively large, high quality two-dimensional graphite [13].

Preprint submitted to Solid State Communications, 4 November 2018. http://arxiv.org/abs/0704.0585v1

Fig. 1. Transmission spectrum of epitaxial graphene at 0.4 T and 1.9 K. The inset shows a schematic of the assignments of the observed transitions. (Horizontal axis: energy in cm⁻¹, from 100 to 700.)

In the following, we present optical measurements of the characteristic dispersion relation of FLG, confirming directly its linear (“relativistic”) character. A number of epitaxial graphene samples have been studied by means of far-infrared magnetotransmission measurements.
The samples were about 4 × 4 mm² in area, grown by sublimating SiC substrates at high temperatures [9,13]. The experimental details and part of the results have been described elsewhere [14].

A representative transmission spectrum of a three-graphene-layer sample is shown in Fig. 1 for a weak magnetic field of 0.4 T. When the magnetic field is increased, all the features visible in this figure are displaced towards higher energies. Furthermore, their strength increases [14] and more features become visible at higher energies. The positions of the features observed for the sample containing 3 graphene layers are plotted versus the square root of the magnetic field in Fig. 2. It may be seen that the resonant energies observed evolve proportionally to the square root of the magnetic field. The oscillator strength of the transition labelled B in Fig. 1 has also been shown [14] to increase linearly with the square root of the magnetic field.

Table 1. Observed lines and their assignments.

Line   Slope in units of c̃√(2eħ)   Transition
A      √2 − 1                      L1 → L2
B      1                           L0 → L1 (L−1 → L0)
C      √2 + 1                      L−1 → L2 (L−2 → L1)
D      √3 + √2                     L−2 → L3 (L−3 → L2)
E      √4 + √3                     L−3 → L4 (L−4 → L3)
F      √5 + √4                     L−4 → L5 (L−5 → L4)
G      √6 + √5                     L−5 → L6 (L−6 → L5)
H      √7 + √6                     L−6 → L7 (L−7 → L6)
I      √8 + √7                     L−7 → L8 (L−8 → L7)

These results are, to a first approximation, in excellent agreement with predictions arising from a simple single-particle model of non-interacting massless Dirac fermions.

Using appropriate graphene wavefunctions [4] and the Hamiltonian commonly used to describe electrons in a single graphene layer, it is fairly straightforward to work out the optical selection rules [15]. It may then be shown that the allowed transitions are Ln → Lm such that |m| = |n| − 1 for the “+” circular polarisation and |m| = |n| + 1 for the “−” circular polarisation. For unpolarised radiation, used in the current experiment, the allowed transitions are simply those between states n, m such that |m| = |n| ± 1.
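The selection rules just stated can be written down as a small predicate. This is a minimal sketch: the function names are hypothetical, and it encodes only the index rule |m| = |n| ± 1 from the text, without any check of level occupation or of which level lies higher in energy.

```python
def polarisation(n, m):
    """Return '+' or '-' if L_n -> L_m is allowed in that circular
    polarisation according to the rules in the text, else None."""
    if abs(m) == abs(n) - 1:
        return "+"
    if abs(m) == abs(n) + 1:
        return "-"
    return None


def allowed_unpolarised(n, m):
    """Unpolarised light: allowed whenever |m| = |n| +/- 1."""
    return polarisation(n, m) is not None


# Enumerate the allowed index pairs among the levels L_-2 ... L_2.
levels = range(-2, 3)
pairs = [(n, m) for n in levels for m in levels if allowed_unpolarised(n, m)]
```

Because the rule depends only on |n| and |m|, a pair such as L−1 → L2 is allowed on the same footing as L1 → L2; which of these actually appears in absorption depends on the filling of the levels, which this sketch does not model.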
The Landau level energies are obtained as

$$E_n = \tilde{c}\,\sqrt{2\hbar e B\,|n|} \tag{1}$$

where c̃ is the effective velocity of the Dirac fermions, B is the magnetic field and n = 0, ±1, ±2, ... is the Landau level index (the electron and hole levels being identical). The energies of the allowed optical transitions may then be concisely written as

$$\tilde{c}\,\sqrt{2\hbar e B}\,\left(\sqrt{|n+1|} \pm \sqrt{|n|}\right) \tag{2}$$

The positions of the transitions shown in Fig. 2 are summarised in Table 1. It should be stressed that the positions of all the observed lines are described by a single fitting parameter, the effective light velocity c̃.

Fig. 2. Evolution with magnetic field of transitions observed in transmission. The letters correspond to those used in Fig. 1; the shaded region corresponds to the range where the substrate is opaque. (Horizontal axis: B^{1/2} in T^{1/2}.)

We should add, for the sake of completeness, that the present experiment, using unpolarised light, does not distinguish between electrons and holes, which are expected to be identical in terms of the effective mass and dispersion relation. Thus, transition A, attributed to the L1 → L2 process, could also be due to the corresponding L−2 → L−1 one. While a p-type character appears to be unlikely, it cannot be ruled out on the basis of the experiment in question.

The striking agreement of the experimental data obtained using several layers of graphene with expectations for a single layer is surprising, given that calculations suggest a completely different behaviour already for a graphene bilayer [17]. On the other hand, it has long been known that particles with a linear dispersion exist in bulk graphite as well: a minority pocket of carriers in the vicinity of the H point of the Brillouin zone was shown to give rise to electronic transitions following a square root dependence on the magnetic field [18]. The question therefore is posed: at what point, if at all, does epitaxial FLG become bulk graphite?
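The square-root scaling of Eqs. (1) and (2) is easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and the effective velocity `C_EFF = 1.0e6` m/s is a placeholder assumption, not the fitted value from the experiment. Energies are returned in joules.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C
C_EFF = 1.0e6               # placeholder effective velocity in m/s (assumption)


def landau_energy(n, B, c_eff=C_EFF):
    """Eq. (1): c_eff * sqrt(2 * hbar * e * B * |n|), in joules."""
    return c_eff * math.sqrt(2.0 * HBAR * E_CHARGE * B * abs(n))


def transition_slope(n, sign):
    """Bracket of Eq. (2): sqrt(|n+1|) + sign * sqrt(|n|), with sign = +1 or -1.
    This is the slope in units of c_eff * sqrt(2 * e * hbar)."""
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    return math.sqrt(abs(n + 1)) + sign * math.sqrt(abs(n))


def transition_energy(n, B, sign, c_eff=C_EFF):
    """Eq. (2): transition energy in joules at magnetic field B (tesla)."""
    return c_eff * math.sqrt(2.0 * HBAR * E_CHARGE * B) * transition_slope(n, sign)


# Quadrupling the field doubles every transition energy (sqrt(B) scaling).
e_low = transition_energy(0, 0.4, +1)
e_high = transition_energy(0, 1.6, +1)
assert abs(e_high / e_low - 2.0) < 1e-12
```

With n = 0 the bracket equals 1, and with n = 1 the two signs give √2 − 1 and √2 + 1, so a single value of c̃ fixes the slopes of all lines at once.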
Early work on epitaxial graphene [7] suggested that the process of baking SiC substrates led to a single graphene layer floating above a graphite layer. More recent calculations [22] suggest that the first carbon layer on top of an SiC substrate has an electronic structure different from that of graphene, and acts as a buffer, allowing subsequent layers to behave like graphene. A strong dependence of the electronic structure of FLG on the type of stacking has also been suggested [23]. The common Bernal, or AB, stacking found for example in HOPG graphite is usually assumed for all FLG structures as well; this is not necessarily the case. Also, let us note that the HOPG interlayer distance of 3.354 Å may not be the correct value for epitaxial graphene.

Fig. 3. Transmission spectra at 4 T for epitaxial FLG samples (top three: 60 layers, 9 layers and 3 layers) of varying thickness and, for comparison, of HOPG graphite at the same magnetic field. (Horizontal axis: energy in cm⁻¹, from 100 to 700.)

In order to elucidate the effect of multiplying layers on the transmission spectrum, samples of varying thickness were studied and compared with a layer of HOPG obtained by exfoliation. The details of this study shall be presented elsewhere [19]; for the time being let us note the qualitative differences in the spectra, shown in Fig. 3. Four spectra are shown, at a magnetic field of 4 T: for samples consisting of 3, 9 and 60 layers of graphene on SiC, and for the HOPG sample. The dominant feature in the epitaxial samples is always the L0 → L1 (L−1 → L0) transition; we can see that it grows stronger as the number of layers is increased, and is several times stronger for the sample containing 60 layers. In this sample one can also see the appearance of other features at lower energies, which were absent in the thinner samples, and which appear to correspond to bulk-like features visible in the lowest (HOPG) trace in the figure.
On the other hand, the L0 → L1 (L−1 → L0) transition, which has a square root dispersion even in the 60 layer sample, is absent from the HOPG spectrum.

The observed persistence of the Dirac fermion-like behaviour of the carriers in epitaxial FLG up to relatively thick (≈ 19 nm) structures appears to suggest that the structure of this material is in fact different from that of bulk HOPG. The simplest explanation would be a far weaker interaction between adjacent graphene layers, leading to a sequence of graphene layers instead of bulk, or even multilayer, graphene. More studies are necessary to elucidate this question.

The GHMFL is a “Laboratoire conventionné avec l’UJF et l’INPG de Grenoble”. The present work was supported in part by the European Commission through grant RITA-CT-2003-505474 and by grants from the Intel Research Corporation and the NSF: NIRT “Electronic Devices from Nano-Patterned Epitaxial Graphite”.

References

[1] P.R. Wallace, Phys. Rev. 71, 622 (1947)
[2] J.W. McClure, Phys. Rev. 104, 666 (1956)
[3] J.C. Slonczewski and P.R. Weiss, Phys. Rev. 109, 272 (1958)
[4] T. Ando, J. Phys. Soc. Jpn. 74, 777 (2005)
[5] F.D.M. Haldane, Phys. Rev. Lett. 61, 2015 (1988)
[6] Y. Zheng and T. Ando, Phys. Rev. B 65, 245420 (2002)
[7] I. Forbeaux, J.-M. Themlin, and J.-M. Debever, Phys. Rev. B 58, 16396 (1998)
[8] A. Charrier, A. Coati, T. Argunova, F. Thibaudau, Y. Garreau, R. Pinchaux, I. Forbeaux, J.-M. Debever, M. Sauvage-Simkin, J.-M. Themlin, J. Appl. Phys. 92, 2479 (2002)
[9] C. Berger, Z. Song, T. Li, X. Li, A.Y. Ogbazghi, R. Feng, Z. Dai, A.N. Marchenkov, E.H. Conrad, P.N. First, and W.A. de Heer, J. Phys. Chem. B 108, 19912 (2004)
[10] K.S. Novoselov, A.K. Geim, S.V. Morozov, D. Jiang, Y. Zhang, S.V. Dubonos, I.V. Grigorieva, and A.A. Firsov, Science 306, 666 (2004)
[11] K.S. Novoselov, A.K. Geim, S.V. Morozov, D. Jiang, M.I. Katsnelson, I.V. Grigorieva, S.V. Dubonos, and A.A. Firsov, Nature 438, 197 (2005)
[12] Y. Zhang, Y.-W. Tan, H.L. Stormer and P. Kim, Nature 438, 201 (2005)
[13] C. Berger, Z. Song, T. Li, X. Li, X. Wu, N. Brown, C. Naud, D. Mayou, A.N. Marchenkov, E.H. Conrad, P.N. First, and W.A. de Heer, Science 312, 1191 (2006)
[14] M.L. Sadowski, G. Martinez, M. Potemski, C. Berger, and W.A. de Heer, Phys. Rev. Lett. 97, 266405 (2006)
[15] M.L. Sadowski, G. Martinez, M. Potemski, C. Berger, and W.A. de Heer, Int. J. Mod. Phys. B, in press
[16] V.P. Gusynin, S.G. Sharapov, and J.P. Carbotte, J. Phys.: Condens. Matter 19, 026222 (2007)
[17] D.S.L. Abergel and V.I. Fal’ko, cond-mat/0610673
[18] W.W. Toy, M.S. Dresselhaus, and G. Dresselhaus, Phys. Rev. B 15, 4077 (1977)
[19] M.L. Sadowski et al., to be published
[20] T. Ohta, A. Bostwick, T. Seyller, K. Horn, and E. Rotenberg, Science 313, 951 (2006)
[21] B. Partoens and F.M. Peeters, Phys. Rev. B 74, 075404 (2006)
[22] F. Varchon, R. Feng, J. Hass, X. Li, B.N. Nguyen, C. Naud, P. Mallet, J.-Y. Veuillen, C. Berger, E.H. Conrad, and L. Magaud, cond-mat/0702311
[23] F. Guinea, A.H. Castro Neto, N.M.R. Peres, Phys. Rev. B 73, 245426 (2006)

<|endoftext|><|startoftext|> Introduction SN dust formation revisited Survival in the reverse shock Dynamics of the reverse shock Dust grain survival Extinction and emission from SN dust Summary Stochastic heating from electron collisions

ABSTRACT The presence of dust at high redshift requires efficient condensation of grains in SN ejecta, in accordance with current theoretical models.
Yet, observations of the few well studied SNe and SN remnants imply condensation efficiencies which are about two orders of magnitude smaller. Motivated by this tension, we have (i) revisited the model of Todini & Ferrara (2001) for dust formation in the ejecta of core collapse SNe and (ii) followed, for the first time, the evolution of newly condensed grains from the time of formation to their survival - through the passage of the reverse shock - in the SN remnant. We find that 0.1 - 0.6 M_sun of dust form in the ejecta of 12 - 40 M_sun stellar progenitors. Depending on the density of the surrounding ISM, between 2-20% of the initial dust mass survives the passage of the reverse shock, on time-scales of about 4-8 x 10^4 yr from the stellar explosion. Sputtering by the hot gas induces a shift of the dust size distribution towards smaller grains. The resulting dust extinction curve shows a good agreement with that derived by observations of a reddened QSO at z =6.2. Stochastic heating of small grains leads to a wide distribution of dust temperatures. This supports the idea that large amounts (~ 0.1 M_sun) of cold dust (T ~ 40K) can be present in SN remnants, without being in conflict with the observed IR emission. <|endoftext|><|startoftext|> Preferential interaction coefficient for nucleic acids and other cylindrical poly-ions Emmanuel Trizac∗ CNRS; Univ. Paris Sud, UMR8626, LPTMS, ORSAY CEDEX, F-91405 and Center for Theoretical Biological Physics, UC San Diego, 9500 Gilman Drive MC 0374 - La Jolla, CA 92093-0374, USA Gabriel Téllez† Departamento de F́ısica, Universidad de Los Andes, Apartado Aéreo 4976, Bogotá, Colombia The thermodynamics of nucleic acid processes is heavily affected by the electric double-layer of micro-ions around the polyions. 
We focus here on the Coulombic contribution to the salt-polyelectrolyte preferential interaction (Donnan) coefficient and we report extremely accurate analytical expressions valid in the range of low salt concentration (when the polyion radius is smaller than the Debye length). The analysis is performed at Poisson-Boltzmann level, in cylindrical geometry, with emphasis on highly charged poly-ions (beyond “counter-ion condensation”). The results hold for any electrolyte of the form z−:z+. We also obtain a remarkably accurate expression for the electric potential in the vicinity of the poly-ion.

Coulombic interactions between salt and poly-anions play a key role in the equilibrium and kinetics of nucleic acid processes [1]. A convenient quantity for quantifying such interactions, and for analyzing and interpreting their thermodynamic consequences, is the so-called preferential interaction coefficient. Several definitions have been proposed and their interrelation studied, see e.g. [2, 3, 4]. In the present work, the coefficient is defined as the integrated deficit (with respect to bulk conditions) of co-ion concentration around a rod-like poly-ion. Our goal is to provide analytical expressions describing the effect of salt concentration and poly-ion structural parameters on the preferential interaction coefficient, for a broad class of asymmetric electrolytes. For symmetric electrolytes, it will be shown that our formulas improve upon existing analytical results. For other asymmetries, they seem to have no counterpart in the literature. Our analysis holds for highly (i.e. beyond counter-ion condensation [5, 6]) and uniformly charged cylindrical poly-ions, and is explicitly limited to the low salt regime (i.e. when the poly-ion radius a is smaller than the Debye length 1/κ). These conditions are most relevant for RNA or DNA in their single, double, or triple strand forms.
∗Electronic address: trizac@lptms.u-psud.fr
†Electronic address: gtellez@uniandes.edu.co

As in several previous approaches [7, 8, 9, 10], we adopt the mean-field framework of the Poisson-Boltzmann equation, in a homogeneous dielectric background of permittivity ε. The same starting point has proven relevant for related structural physical chemistry studies of nucleic acids [11]. In a z−:z+ electrolyte, the dimensionless electrostatic potential φ = eϕ/kT (with e > 0 the elementary charge and kT the thermal energy) then obeys the equation [12]
$$\frac{1}{r}\frac{d}{dr}\left(r\frac{d\phi}{dr}\right) = \frac{\kappa^2}{z_+ + z_-}\left[e^{z_-\phi} - e^{-z_+\phi}\right], \tag{1}$$
where r is the radial distance to the rod axis. The valencies z+ and z− of salt ions are both taken positive. Denoting the derivative with a prime, the boundary conditions read rφ′(r) = 2ξ > 0 at the polyion radius (r = a) and φ → 0 for r → ∞. The latter condition expresses the limit of infinite poly-ion dilution and ensures that the whole system is electrically neutral, since it (indirectly) implies that rφ′ → 0 for r → ∞. We consider a negatively charged poly-anion for which φ < 0 and the line charge density reads λ = −eξ/ℓB < 0, where ℓB = e²/(εkT) denotes the Bjerrum length (0.71 nm in water at room temperature). Finally, the Debye length 1/κ is defined from the bulk ionic densities $n_+^\infty$ and $n_-^\infty$ through $\kappa^2 = 4\pi\ell_B\,(z_+^2 n_+^\infty + z_-^2 n_-^\infty)$.

The Coulombic contribution to the anionic preferential interaction coefficient is defined as [7, 8, 9, 10, 13]
$$\Gamma = \kappa^2 \int_a^\infty \left(e^{z_-\phi} - 1\right) r\,dr, \tag{2}$$
while its cationic counterpart follows from electro-neutrality. This quantity, which provides a measure of the Donnan effect [14], can be expressed in closed form as a function of the electrostatic potential, see Appendix A. As can be seen in (A3) and (A4), Γ depends exponentially on the surface potential φ0, so that deriving a precise analytical expression is a challenging task.
Furthermore, we are interested here in the limit κa < 1 (including the regime κa ≪ 1), which is analytically more difficult than the opposite high salt situation where, to leading order, the charged rod behaves as an infinite plane, and curvature corrections can be perturbatively included [15, 16, 17].

We will proceed in two steps. Focusing first on the surface potential φ0 = φ(a), we make use of recent results [18] that have been obtained from a mapping of Eq. (1) onto a Painlevé type III problem [19, 20, 21]. The exact expressions thereby derived only hold for 1:1, 1:2 and 2:1 electrolytes, but may be written in a way that is electrolyte independent. This remarkable feature is specific to the short distance behaviour of φ and has been overlooked so far, since not only short distance but also large distance properties have been studied [18]. We are then led to conjecture that the corresponding expression holds for any binary electrolyte z−:z+, and we explicitly check the relevance of our assumption on several specific examples. Technical details are deferred to the appendices. It is in particular concluded in Appendix B that the surface potential may be written
$$e^{-z_+\phi_0} \simeq \frac{2(z_+ + z_-)}{z_+(\kappa a)^2}\left[(z_+\xi - 1)^2 + \tilde\mu^2\right], \tag{3}$$
where
$$\tilde\mu \simeq \frac{-\pi}{\log(\kappa a) + C - (z_+\xi - 1)^{-1}}. \tag{4}$$
Expression (4) is valid for κa < 1 and z+ξ > 1 [in fact z+ξ > 1 + O(1/|log κa|)]. These conditions are easily fulfilled for nucleic acids.

TABLE I: Values of C appearing in Eq. (4) as a function of electrolyte asymmetries. For z+/z− = 1, 1/2 and 2, C is known analytically from the results of [18]. The corresponding values are recalled in Appendix B. For other values of z+/z−, C has been determined numerically, see in particular Fig. 6 of Appendix B.

z+/z−   1/10    1/3     1/2      1        2        3       10
C       -2.51   -1.94   -1.763   -1.502   -1.301   -1.21   -1.06

The “constant” C appearing in (4) depends smoothly on the ratio z+/z− but is otherwise salt and charge independent. We report in Table I its values for several electrolyte asymmetries. The decrease (in absolute value) of C when z+/z− increases is a signature of more efficient (non-linear) screening with counter-ions of higher valencies.

From Eq. (3) and the results of Appendix B, our approximation for Γ takes a simple form
$$\Gamma \simeq -\frac{z_-}{z_+}\left(1 + \tilde\mu^2\right). \tag{5}$$
This expression is tested in Figures 1 and 2 against the “true” numerical results that serve as a benchmark. In Fig. 1, which corresponds to a monovalent salt (or more generally a z:z electrolyte), we also show the prediction of Ref. [9], which is, to our knowledge, the most accurate existing formula for a 1:1 salt. For the technical reasons discussed in Appendix B, and that are evidenced in Figure 6, our expression improves upon that of Shkel, Tsodikov and Record [9], particularly at lower salt content. For 1:2 and 2:1 salts, we expect Eq. (5) to be also accurate, since it is based on exact expansions. The situation of other salt asymmetries is more conjectural (see Appendix B), but Eq. (5) is nevertheless in remarkable agreement with the full solution of Eq. (1), see Fig. 2. To be specific, in both Figures 1 and 2, the relative accuracy of our approximation is better than 0.2% for κa = 10⁻² (for both ss and ds RNA parameters). At κa = 0.1, the accuracy is on the order of 1%.

As illustrated in Fig. 3, approximation (4) assumes that z+ξ > 1. The corresponding expression for Γ therefore breaks down when ξ is too low. More general expressions, still for κa < 1, may be found in Appendix C. The inset of Fig. 3 offers an illustration and shows that the limitations of approximation (4) may be circumvented at little cost, providing a quasi-exact value for Γ. Moreover, it is shown in this appendix that for z+ξ = 1, µ̃ reads
$$\tilde\mu = \frac{-\pi}{2\left[\log(\kappa a) + C\right]}. \tag{6}$$
On the other hand, Eq. (3) still holds. The corresponding Γ is shown in Fig. 4.
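As a hedged numerical illustration of the closed-form results of this section, the following Python sketch evaluates µ̃, the surface-potential factor and Γ. It assumes that the formulas read e^{−z+φ0} ≃ 2(z+ + z−)/[z+(κa)²] [(z+ξ − 1)² + µ̃²] for Eq. (3), µ̃ ≃ −π/[log(κa) + C − (z+ξ − 1)⁻¹] for Eq. (4), Γ ≃ −(z−/z+)(1 + µ̃²) for Eq. (5) and µ̃ = −π/{2[log(κa) + C]} for Eq. (6), which is the reading consistent with Eqs. (B10) and (B11). All function names are ours and purely illustrative.

```python
import math
from fractions import Fraction

# Values of C from Table I, keyed by the ratio z+/z-.
TABLE_I_C = {
    Fraction(1, 10): -2.51,
    Fraction(1, 3): -1.94,
    Fraction(1, 2): -1.763,
    Fraction(1, 1): -1.502,
    Fraction(2, 1): -1.301,
    Fraction(3, 1): -1.21,
    Fraction(10, 1): -1.06,
}


def table_constant(z_plus, z_minus):
    """Look up C for the ratio z+/z- (raises KeyError if not tabulated)."""
    return TABLE_I_C[Fraction(z_plus, z_minus)]


def mu_tilde(kappa_a, xi, z_plus, C):
    """Assumed Eq. (4) for z+ xi > 1, assumed Eq. (6) for z+ xi = 1."""
    if not 0.0 < kappa_a < 1.0:
        raise ValueError("low-salt regime only: need 0 < kappa*a < 1")
    s = z_plus * xi - 1.0
    log_term = math.log(kappa_a) + C
    if math.isclose(s, 0.0, abs_tol=1e-12):
        return -math.pi / (2.0 * log_term)        # threshold case, Eq. (6)
    if s < 0.0:
        raise ValueError("these closed forms need z+ * xi >= 1")
    # Eq. (4): only accurate when z+ xi exceeds 1 by O(1/|log(kappa a)|).
    return -math.pi / (log_term - 1.0 / s)


def exp_minus_zplus_phi0(kappa_a, xi, z_plus, z_minus, C):
    """Surface-potential factor exp(-z+ phi0), assumed Eq. (3)."""
    mu = mu_tilde(kappa_a, xi, z_plus, C)
    prefactor = 2.0 * (z_plus + z_minus) / (z_plus * kappa_a ** 2)
    return prefactor * ((z_plus * xi - 1.0) ** 2 + mu * mu)


def preferential_coefficient(kappa_a, xi, z_plus, z_minus, C):
    """Gamma from the assumed Eq. (5)."""
    mu = mu_tilde(kappa_a, xi, z_plus, C)
    return -(z_minus / z_plus) * (1.0 + mu * mu)


if __name__ == "__main__":
    C = table_constant(1, 1)
    for label, xi in (("ss-RNA", 2.2), ("ds-RNA", 5.0)):
        gamma = preferential_coefficient(1e-2, xi, 1, 1, C)
        print(label, "xi =", xi, "-Gamma =", -gamma)
```

Only the ratio z+/z− enters the lookup, so a 2:2 salt reuses the 1:1 constant. The sketch deliberately refuses z+ξ < 1, where the text points to the more general expressions of Appendix C.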
We provide in Appendix C a general expression of the short scale (i.e. valid up to κr ∼ 1) radial dependence of the electric potential, see Eq. (C1). The bare charge should not be too low [more precisely, one must have ξ > ξc with ξc given by Eq. (C5)], and µ̃, which encodes the dependence on ξ, follows from solving Eq. (C2). In general, the corresponding solution should be found numerically. However, one can show a) that µ̃ vanishes for ξ = ξc, b) that µ̃ takes the value (6) when z+ξ = 1 and c) that µ̃ is given by (4) when z+ξ exceeds unity by a small and salt dependent amount. In practice, for DNA and RNA, we have ξ > 2 and Eq. (4) provides excellent results whenever κa < 0.1. To illustrate this, we compare in Figure 5 the potential following from the analytical expression (C1) to its numerical counterpart. We do not display 1:1, 1:2 and 2:1 results since in these cases, Eq. (C1) is obtained from an exact expansion and fully captures the r-dependence of the potential. For the asymmetry 1:3, Fig. 5 shows that the relatively simple form (C1) is very reliable. A similar agreement has been found for all pairs z−:z+ sampled, with the trend that the validity of (C1) extends to larger distances as z+/z− is decreased. In this respect, the agreement shown in Fig. 5, for which z+/z− is quite high (3), is one of the “worst” observed.

FIG. 1: Preferential interaction coefficient for a 1:1 salt. The main graph corresponds to ss-RNA with reduced line charge ξ = 2.2 while the inset is for ds-RNA (ξ = 5). The circles correspond to the value of (2) following from the numerical solution of Eq. (1). The prediction of Eq. (5) with µ̃ given by (4) and C ≃ −1.502, shown with the continuous curve, is compared to that of Ref. [9], shown with the dashed line. As in all other figures, the opposite of Γ is displayed, to consider a positive quantity.

FIG. 2: Same as Figure 1 for a 1:3 and a 3:1 electrolyte. From Table I, we have C ≃ −1.21 in the 1:3 case and conversely C ≃ −1.94 in the 3:1 case. The symbols correspond to the numerical solution of Eq. (1) and the continuous curves show the results of Eq. (5) with again µ̃ given by (4).

FIG. 3: Preferential interaction coefficient for a 1:1 salt (hence C ≃ −1.502) and κa = 10⁻². The circles show the numerical solution of PB theory (1), the continuous curve is for (5) with (4) and the dashed line is the prediction of Ref. [9]. Although approximation (4) breaks down at low ξ, the inset shows that µ̃ following from the solution of Eq. (C2) gives through (5) a Γ (continuous curve) that is in excellent agreement with the “exact one”, shown with circles as in the main graph.

FIG. 4: Same as Fig. 1 for ξ = 1 and z+/z− = 1. The same quantities are shown: our prediction for Γ [Eqs. (5) and (6) with C ≃ −1.502] is compared to that of Ref. [9]. The inset shows −z+Γ/z− for a 1:2 salt such as MgCl2 where C takes the value -1.301. Circles: numerical data; curve: our prediction.

FIG. 5: Opposite of the electric potential versus radial distance in a 1:3 electrolyte with κa = 10⁻². The continuous curve shows the prediction of Eq. (C1) with µ̃ given by (4); the circles show the numerical solution of Eq. (1). The potential for ξ = 2.2 is shown in the main graph on a log-linear scale, and on a linear scale in the lower inset. The upper inset is for ξ = 5.

Conclusion. The poly-ion-ion preferential interaction coefficient Γ describes the exclusion of co-ions in the vicinity of a polyelectrolyte in an aqueous solution. We have obtained an accurate expression for Γ in the regime of low salt (κa < 1).
The present results are particularly relevant for highly charged poly-ions (z+ξ > 1, that is beyond the classical Manning threshold [22]), but are somewhat more general and hold in the range ξc < ξ < 1, where ξ stands for the line charge per Bjerrum length and ξc is a salt dependent threshold, given by Eq. (C5). Our formulae have been shown to hold for arbitrary mixed salts of the form z−:z+ (magnesium chloride, cobalt hexamine etc). They have been derived from exact expansions valid in the 1:1, 1:2 and 2:1 cases, from which a more general conjecture has been inferred. The validity of this conjecture, backed up by analytical arguments, has been extensively tested for various values of z+/z−, poly-ion charge and salt content. These tests have provided the numerical value of the constant C reported in Table I, which only depends on the ratio z+/z−. As a byproduct of our analysis, we have obtained a very accurate expression for the electric potential in the vicinity of the charged rod (r < κ⁻¹).

It should be emphasized that the validity of our mean-field description relying on the non-linear Poisson-Boltzmann equation depends on the valency of counter-ions (z+), and to a lesser extent on the value of z− [12, 23]. For the 1:1 case in a solvent like water at room temperature, micro-ionic correlations can be neglected up to a salt concentration of 0.1M [8]. For z+ ≥ 2 or in solvents of lower dielectric permittivity, they play a more important role. Our results however provide mean-field benchmarks from analytical expressions, from which the effects of correlations may be assessed in cases where they cannot be ignored (see e.g. [8] for a detailed discussion).

Acknowledgments. This work was supported by an ECOS Nord/COLCIENCIAS action of French and Colombian cooperation. G. T. acknowledges partial financial support from Comité de Investigaciones, Facultad de Ciencias, Universidad de los Andes.
This work has been supported in part by the NSF PFC-sponsored Center for Theoretical Biological Physics (Grants No. PHY-0216576 and PHY-0225630). APPENDIX A In order to explicitly relate the preferential coefficient Γ in (2) to the electric potential, we follow a procedure similar to that which leads to an analytical solution in the cell model, without added salt [24]. Implicit use will be made of the boundary conditions associated to (1). First, integrating Eq. (1), one gets [r′φ′(r′)]ra = z+ + z− e−z+φ − ez−φ r′dr′, (A1) where the notation [F (r′)]ra = F (r) − F (a) has been in- troduced. Then, multiplying Eq. (1) by r2φ′ and inte- grating, we obtain z+ + z− (r′φ′)2 e−z+φ + r′2 e−z+φ dr′.(A2) Combining both relations with adequate weights, in order to suppress the integral over counter-ion (+) density, we r′(ez−φ − 1)dr′ = 2(z+ + z−) ez−φ0 − 1 e−z+φ0 − 1 where φ0 = φ(a) is the surface potential. Equation (A3) will turn useful in the formulation of a general conjec- ture concerning the surface potential φ0, see Appendix B. We also note that for the systems under investigation here, the surface potential is quite high, and a very good approximation to (A3) is r′(ez−φ − 1)dr′ ≃ a2z−e −z+φ0 2(z+ + z−) APPENDIX B We start by analyzing a 1:1 electrolyte, for which it has been shown [19, 20] that the short distance behaviour reads eφ/2 = 2µ log − 2Ψ(µ) + O (κr) where Ψ denotes the argument of the Euler Gamma func- tion Ψ(x) = arg[Γ(ix)] [19, 20]. In (B1), µ denotes the smallest positive root of tan [2µ log(κa/8)− 2Ψ(µ)] = ξ − 1 . (B2) Expressions (B1) and (B2) require that ξ exceeds a salt dependent threshold [denoted ξc below and given by Eq. (C5)] that is always smaller than 1 [18]. They thus always hold for ξ ≥ 1 and in particular encompass the interesting limiting case ξ = 1, which is sufficient for our purposes. 
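The threshold ξc and the root µ̃ mentioned above can be explored numerically. The sketch below is only an illustration under stated assumptions: it uses the electrolyte-independent form of the matching condition, taken here to be tan[µ̃ log(κa) + µ̃C] = µ̃/(z+ξ − 1) for Eq. (C2), and z+ξc = 1 + 1/[log(κa) + C] for Eq. (C5). The latter reading reproduces the value ξc ≃ 0.836 quoted in Appendix C for κa = 10⁻² and z+/z− = 1. The function names are hypothetical, and the tangent condition is rewritten as s·sin(µ̃L) − µ̃·cos(µ̃L) = 0 to avoid the poles of the tangent during bisection.

```python
import math


def xi_c(kappa_a, z_plus, C):
    """Threshold line charge from the assumed Eq. (C5)."""
    return (1.0 + 1.0 / (math.log(kappa_a) + C)) / z_plus


def solve_mu_tilde(kappa_a, xi, z_plus, C, iterations=200):
    """Smallest positive root of the assumed Eq. (C2), found by bisection.

    With L = log(kappa a) + C < 0 and s = z+ xi - 1, the condition
    tan(mu L) = mu / s is solved in the pole-free form
    s sin(mu L) - mu cos(mu L) = 0 on the interval (0, pi/|L|).
    """
    if not 0.0 < kappa_a < 1.0:
        raise ValueError("low-salt regime only: need 0 < kappa*a < 1")
    L = math.log(kappa_a) + C
    if L >= 0.0:
        raise ValueError("expected log(kappa a) + C < 0")
    s = z_plus * xi - 1.0
    if s * L >= 1.0:
        raise ValueError("xi must exceed the threshold xi_c")

    def f(mu):
        return s * math.sin(mu * L) - mu * math.cos(mu * L)

    # f < 0 just above mu = 0 (for xi > xi_c) and f > 0 at mu = pi/|L|.
    # Extremely close to xi_c the lower bracket may need to be reduced.
    lo, hi = 1e-9, math.pi / abs(L)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    C = -1.502  # 1:1 electrolyte, Table I
    print("xi_c =", xi_c(1e-2, 1, C))
    for xi in (0.9, 1.0, 2.2, 5.0):
        print("xi =", xi, "mu_tilde =", solve_mu_tilde(1e-2, xi, 1, C))
```

At z+ξ = 1 the root reduces to −π/{2[log(κa) + C]}, and for larger ξ it stays close to the explicit approximation (B5), which gives a quick consistency check on both.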
For large ξ, we have proposed in [18] an approximation which amounts to linearizing the argument of the tangent in (B1) in the vicinity of −π, and similarly linearizing Ψ to first order: Ψ(x) ≃ −π/2 − γx + O(x3) where γ is the Euler constant, close to 0.577. It turns out however that finding accurate expressions for exp(−z+φ0), which is useful for the computation of the preferential interac- tion coefficient, requires to include the first non-linear correction in the expansion of the tangent. After some algebra, we find : log(κa) + C − (ξ − 1)−1 6(log(κa) + C − (ξ − 1)−1)4 (ξ − 1)3 ψ(2)(1) where the constant C = C1:1 reads C1:1 = γ − log 8 ≃ −1.502 and ψ(2)(1) = d3 ln Γ(x)/dx3|x=1. From (B3) and (B1) where the sinus is expanded to third order, we ob- (κa)2e−φ0 ≃ 4[(ξ − 1)2 + µ̃2] (B4) where µ̃ is given by log(κa) + C − (z+ξ − 1)−1 . (B5) In writing (B5), we have introduced the change of vari- able µ̃ = 2µ [25]. The reason is that similar changes for other electrolyte asymmetries allows to put the final re- sult in a “universal” (electrolyte independent) form, see below. A similar reason holds for introducing z+, here equal to 1, in the denominator of (B5). The functional proximity between our expressions and those reported in [9] in the very same context is striking. We note however that our µ̃ (denoted β in [9]) involves a different constant C. More importantly, the functional form of (B1) differs from that given in [9]. The compari- son of the performances of our results with those of [9] is addressed below, and is also discussed in the main text. Performing a similar analysis as above in the 1:2 case where z+ = 2 and z− = 1, we obtain from the expressions derived in [18]: (κa)2e−z+φ0 ≃ 3[(ξ − 1)2 + µ̃2] (B6) and similarly, in the 2:1 case (z+ = 1, z− = 2): (κa)2e−z+φ0 ≃ 6[(ξ − 1)2 + µ̃2]. 
(B7) In both cases, provided again that ξ is not too low (see below), µ̃ is given by (B5) [26], with however a different numerical value for C [$C_{1:2}$ = γ − (3 log 3)/2 − (log 2)/3 ≃ −1.301 and $C_{2:1}$ = γ − (3 log 3)/2 − log 2 ≃ −1.763]. The similarity of expressions (B4), (B6) and (B7) leads us to conjecture that this form holds for any z−:z+ electrolyte:
$$(\kappa a)^2 e^{-z_+\phi_0} \simeq A\left[(z_+\xi - 1)^2 + \tilde\mu^2\right]. \tag{B8}$$
We then have to determine the prefactor A as a function of z+ and z−. To this end, we make use of the exact relation (A3) [or equivalently (A4)], where in the limit of large ξ, the lhs is finite while the two terms on the rhs diverge. This yields the leading order behaviour:
$$(\kappa a)^2 \exp(-z_+\phi_0) \sim 2\,\frac{z_+ + z_-}{z_+}\,(z_+\xi - 1)^2. \tag{B9}$$
It then follows that A = 2(z+ + z−)/z+ so that our general expression (B8) takes the form:
$$(\kappa a)^2 e^{-z_+\phi_0} \simeq 2\,\frac{z_+ + z_-}{z_+}\left[(z_+\xi - 1)^2 + \tilde\mu^2\right]. \tag{B10}$$
This expression holds regardless of the approximation used for µ̃. If Eq. (B5) is used, then z+ξ should not be too close to unity (see Appendix C for more general results including the case z+ξ = 1).

In order to test the accuracy of (B10) in conjunction with (B5), we have solved numerically Eq. (1) for several values of κa < 1 and electrolyte asymmetry and checked that for several different values of z+ξ > 1, the quantity
$$Q = -\pi\left[\frac{z_+(\kappa a)^2 e^{-z_+\phi_0}}{2(z_+ + z_-)} - (z_+\xi - 1)^2\right]^{-1/2} - \log(\kappa a) + (z_+\xi - 1)^{-1} \tag{B11}$$
is a constant C, which only depends on z+/z− but not on salt and ξ [it should be borne in mind that Eq. (B5) is a small κa and large ξ expansion, which becomes increasingly incorrect as κa is increased and/or ξ lowered]. This is quite a stringent test [since the two terms on the rhs of (B11) are large and close] which requires high numerical accuracy. This is achieved following the procedure outlined in [27]. In doing so, we confirm the validity of (B10) and collect the values of C given in Table I. In the 1:1 case, we predict that C = γ − log 8 ≃ −1.502, in excellent agreement with the numerical data of Figure 6.
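The analytical constants and the definition of Q can be checked with a few lines of Python. The sketch below is an illustration rather than the procedure used for Table I: it assumes that Eq. (B5) reads µ̃ = −π/[log(κa) + C − (z+ξ − 1)⁻¹], that Eq. (B10) reads (κa)² e^{−z+φ0} ≃ 2(z+ + z−)/z+ [(z+ξ − 1)² + µ̃²], and that Q in Eq. (B11) is −π[z+(κa)² e^{−z+φ0}/(2(z+ + z−)) − (z+ξ − 1)²]^{−1/2} − log(κa) + (z+ξ − 1)⁻¹. Feeding the surface factor of (B10) back into Q must then return C exactly; in the paper the same Q is instead evaluated from the numerical solution of Eq. (1). Function names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329

# Analytical constants quoted in Appendix B.
C_1_1 = EULER_GAMMA - math.log(8.0)
C_1_2 = EULER_GAMMA - 1.5 * math.log(3.0) - math.log(2.0) / 3.0
C_2_1 = EULER_GAMMA - 1.5 * math.log(3.0) - math.log(2.0)


def mu_tilde_b5(kappa_a, xi, z_plus, C):
    """Assumed Eq. (B5), valid for z+ xi > 1."""
    return -math.pi / (math.log(kappa_a) + C - 1.0 / (z_plus * xi - 1.0))


def surface_factor_b10(kappa_a, xi, z_plus, z_minus, C):
    """(kappa a)^2 exp(-z+ phi0) from the assumed Eq. (B10) with (B5)."""
    mu = mu_tilde_b5(kappa_a, xi, z_plus, C)
    return 2.0 * (z_plus + z_minus) / z_plus * ((z_plus * xi - 1.0) ** 2 + mu * mu)


def q_b11(kappa_a, xi, z_plus, z_minus, surface_factor):
    """Quantity Q of the assumed Eq. (B11), from a given surface factor."""
    s = z_plus * xi - 1.0
    bracket = z_plus * surface_factor / (2.0 * (z_plus + z_minus)) - s * s
    return -math.pi / math.sqrt(bracket) - math.log(kappa_a) + 1.0 / s


if __name__ == "__main__":
    print("C(1:1) =", C_1_1, " C(1:2) =", C_1_2, " C(2:1) =", C_2_1)
    factor = surface_factor_b10(1e-3, 3.0, 1, 1, C_1_1)
    print("Q recovered from (B10):", q_b11(1e-3, 3.0, 1, 1, factor))
```

The round trip is an algebraic identity, so it only validates the bookkeeping between (B5), (B10) and (B11); any deviation of Q from C in Fig. 6 reflects the accuracy of the approximation itself.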
On the other hand, the prediction of Ref. [9] that Q reaches a constant close to -1.90 (shown by the horizontal dashed line in Fig 6) is incorrect. Figure 6 shows that the quality of expression (B10) deteriorates when κa increases, as ex- pected. It is noteworthy however that for κa = 10−1, its accuracy is excellent whenever ξ > 2. The inset of Fig. 6 shows the validity of (B10) for a 3:1 electrolyte. When z+ξ is close to 1, Eq. (B5) becomes an irrelevant approx- imation to the solution of (B2), and can therefore not be inserted into the general formula (B10). This explains the large deviations between Q and the asymptotic value C observed in Fig. 6 for the lower values of ξ reported. We come back to this point in Appendix C. The present results hold for z+ξ > 1 +O(1/| log κa|). In this regime, our analysis shows that Eq. (B10) [with µ̃ given by (B5)] is correct up to order 1/ log4(κa) for any (z−, z+). On the other hand the results of [9] ,valid in the 1:1 case, appear to be correct to order 1/ log2(κa). In addition, our expression for the surface potential may be generalized to a broader range of ξ values and an ex- pression for the short distance dependence of the electric potential may also be provided. This is the purpose of appendix C. 1 2 3 4 5 1 2 3 4 Q = -1.90 FIG. 6: Plot of the quantity Q defined in (B11) versus line charge ξ for a 1:1 electrolyte at κa = 10−3 (continuous curve) and κa = 10−1 (dashed curve). The value reached at large ξ is compared to the prediction of [9] Q → eγ + log 2− γ ≃ −1.90 (horizontal dashed-dotted line) whereas Eqs. (B10) and (B5) imply Q → γ− log 8 ≃ −1.50, shown by the horizontal dotted line. The inset shows the same quantity for a 3:1 electrolyte at κa = 10−5 [such a very low value is required to determine precisely the value of the asymptotic constant C, that can subsequently be used at experimentally relevant (higher) salt concentrations]. 
Here, we obtain Q → −1.94 (dotted line) which is the value reported for C in Table I. APPENDIX C In Appendix B, the “universal” results valid for all (z+, z−) have been unveiled partly by a change of variable µ → µ̃ from existing expressions [18]. In light of these results, and of their accuracy (assessed in particular by the precision reached for the preferential interaction co- efficient), it is tempting to go further without invoking approximations of (B2), or related expressions for other asymmetries than 1:1. Inspection of the results given in [18] for the 1:1, 1:2 and 2:1 cases lead, with again the help of (A4), to the conjecture that ez+φ/2 ≃ 2(z+ + z−) sin [µ̃ log(κr) + µ̃ C] (C1) tan [ µ̃ log(κa) + µ̃ C ] = z+ξ − 1 . (C2) We emphasize that (C1), much as (B1), is a short dis- tance expansion and typically holds for κr < 1 (hence the requirement that κa < 1). In appendix D we give further analytical support for conjecture (C1). A typical plot showing the accuracy of (C1) is provided in the main text (Fig. 5). For κr < 0.1, the agreement with the ex- act result is better than 0.1%, and becomes progressively worse at higher distances (20% disagreement at κr = 1). From (C1), it follows that the integrated charge q(r) in a cylinder of radius r [that is q(r) = −rφ′(r)/2] reads z+q(r) = −1 + µ̃ tan µ̃ log where the so-called Manning radius [18, 28, 29] is given κRM = exp . (C4) The Manning radius is a convenient measure of the coun- terion condensate thickness. It is the point r where not only z+q(r) = 1 but also where q(r) versus log r exhibits an inflection point [30]. For high enough ξ, the logarith- mic dependence of 1/µ̃ with salt [see (B5)] is such that RM ∝ κ −1/2. The two relations (C1) and (C2) encompass those given in Appendix B and allow to investigate the regime z+ξc < z+ξ, and in particular the case z+ξ = 1, the so-called Manning threshold [5]. However, (C1) and (C2) are not valid for ξ < ξc, with z+ξc = 1 + log κa+ C . 
(C5) Note that ξc < 1, since the constant C is negative and that salt should fulfill κa < 1. For κa = 10−2 and z+/z− = 1, we obtain ξc ≃ 0.836. This is precisely the point where −Γ = 1 in the inset of Fig. 3. This inset also shows that the value of Γ resulting from the use of the solution of (C2) is remarkably accurate. At this point, it seems useful to investigate the Man- ning threshold case z+ξ = 1 (which corresponds to the onset of counterion condensation when κa → 0 [5, 18, 30]). It is readily seen that the solution of (C2) reads z+ξ=1 log(κa) + C , (C6) which should be inserted in (C1) to obtain the potential profile, or in (5) to get the interaction coefficient. APPENDIX D In this appendix we give further support for the con- jecture (C1) which gives the short-distance expansion of the electric potential. Let us suppose initially that the charge is below the Manning threshold ξ < ξc. It is straightforward to verify that Poisson–Boltzmann equation (1) admits solutions which behave as φ(r) = −2A ln(κr) + lnB + o(1) for κr ≪ 1. Injecting this ex- pansion into equation (1) allows us to compute higher order terms. To study the regime beyond the Man- ning threshold, we compute all higher order terms of the form r2n(1+z+A) (for a negatively charged macroion) and r2n(1−z−A) (for a positively charged macroion), with n a positive integer. These terms turn out to present them- selves as the series expansion of the logarithm, thus re- summing them we obtain φ(r) = −2A ln(κr) + lnB (D1) −z+ (κr)2(1+z+A) 8(z+ + z−)(1 + z+A)2 − (κr)2(1−z−A) 8(z+ + z−)(1 − z−A)2 + · · · The dots represent terms of order r2n(1+z+A)+2m(1−z−A) with n and m two nonzero positive integers. When the Manning threshold is approached, z+A + 1 = 0 for neg- atively charged macroion, the terms r2n(1+z+A) (second line of Eq. (D1)) become of order one, but the rest of the terms (third line of Eq. (D1) and dots) remain higher or- der: a change in the small distance behavior of φ occurs. 
A similar situation is reached for 1 − z−A = 0 which is the Manning threshold for a positively charged macroion. A and B in the previous equations are constants of in- tegration, which should be determined with the bound- ary conditions rφ′(r) = 2ξ at the polyion radius (r = a) and φ → 0 for r → ∞. Thus to proceed further, we have to connect the long and the short distance behavior of φ. This connection problem has been only solved in the cases 1:1, 1:2 and 2:1 in Refs. [19, 31]. In particular, once A has been chosen (notice that for a = 0, A = −ξ), B should be one and only one function of A in order to satisfy φ→ 0 for r → ∞. The results from [19, 31] show B = 26Aγ ((1 +A)/2) (1 : 1) (D2) B = 33A22Aγ (2(1 +A)/3)γ ((1 +A)/3) (1 : 2) B = 33A22Aγ ((1 + 2A)/3)γ ((2 +A)/3) (2 : 1) where γ(x) = Γ(x)/Γ(1 − x). B turns out to have some interesting properties in the cases 1:1, 1:2 and 2:1, where its exact expression (D2) is known. Namely, at the Man- ning threshold 1 + z+A = 0, A→−1/z+ 8(z+ + z−)(1 + z+A)2 = 1 (D3) Furthermore if we put 1 + z+A = iµ̃, and define e2iΨ(eµ) = 8(z+ + z−)(1 + z+A)2 then for µ̃ ∈ R, Ψ(µ̃) ∈ R is a real function of µ̃, with Ψ(0) = 0. Let us now study the regime beyond the Manning threshold for a negatively charged macroion. From Eq. (D1) we can write ez+φ(r)/2 ∼ (κr)−z+ABz+/2 −z+ (κr)2(1+z+A) 8(z+ + z−)(1 + z+A)2 neglecting terms of higher order when z+A is close to −1. Let us conjecture that the properties of B as a function of A presented above hold in the general case z− : z+. Then using the parameter µ̃ defined above we find after some simple algebra ez+φ(r)/2 = 2(z+ + z−) sin [µ̃ log(κr) + Ψ(µ̃)] +O(r3+2z−/z+) (D6) Recalling that |µ̃| ≪ 1 we can approximate Ψ(µ̃) ≃ µ̃C, where C = Ψ′(0). Replacing this approximation into (D6) and imposing the boundary condition aφ′(a) = 2ξ leads to (C1) and (C2). Numerical values obtained for the constants C are reported in Table I, for different charge asymmetries z− : z+. 
The previous analysis shows that analytical predictions for C could be made if the con- nection problem is solved and the equivalent of expres- sions (D2) are found for the general case z− : z+. [1] C.F. Anderson and M.T. Record Jr, Annu. Rev. Phys. Chem. 33, 191 (1984). [2] H. Eisenberg, Biological Macromolecules and Polyelec- trolytes in Solution, Clarendon, Oxford (1976). [3] J.A. Schellman, Biophys. Chem. 37, 121 (1990). [4] S.M. Timasheff, Biochemistry 31, 9857 (1992). [5] G.S. Manning, J. Chem. Phys. 51, 924 (1969). [6] F. Oosawa, Polyelectrolytes, Dekker, New York (1971). [7] K.A. Sharp, Biopolymers 36, 227 (1995). [8] H. Ni, C.F. Anderson and M.T. Record Jr, J. Phys. Chem. B 103, 3489 (1999). [9] I.A. Shkel, O.V. Tsodikov and M.T. Record Jr, Proc. Natl. Acad. Sci. USA 99, 2597 (2002). [10] C.H. Taubes, U. Mohanty and S. Chu, J. Phys. Chem. B 109, 21267 (2005). [11] M. Gueron, J.-Ph. Demaret and M. Filoche, Biophys. Journal 78, 1070 (2000). [12] Y. Levin, Rep. Prog. Phys. 65, 1577 (2002). [13] In the 1:1 case, our definition differs from the more stan- dard one as found e.g. in [9] by a factor 4ξ. The reason for doing so is that this allows easier comparison of the salt dependence of Γ for different values of the poly-ion charge. [14] F.G. Donnan, Chem. Rev. 1, 73 (1924). [15] I.A. Shkel, O.V. Tsodikov and M.T. Record Jr, J. Phys. Chem. B 104, 5161 (2000). [16] M. Aubouy, E. Trizac, L. Bocquet, J. Phys. A: Math. Gen. 36, 5835 (2003). [17] G. Tellez and E.Trizac, Phys. Rev. E 70, 011404 (2004). [18] E. Trizac and G. Téllez, Phys. Rev. Lett 96, 038302 (2006) ; G. Téllez and E. Trizac, J. Stat. Mech. P06018 (2006). [19] B.M. McCoy, C.A. Tracy and T.T. Wu, J. Math. Phys. 18, 1058 (1977). [20] J.S. McCaskill and E.D. Fackerell, J. Chem. Soc., Fara- day Trans. 2 84, 161 (1988). [21] C.A. Tracy and H. Widom, Physica A 244, 402 (1997). [22] We emphasize that accurate results for Γ, φ etc may be obtained for ξ < ξc from the results given in [18]. 
We did not investigate this regime here, since it is of little relevance for nucleic acids.
[23] A.Y. Grosberg, T.T. Nguyen and B.I. Shklovskii, Rev. Mod. Phys. 74, 329 (2002).
[24] R.M. Fuoss, A. Katchalsky and S.F. Lifson, Proc. Natl. Acad. Sci. USA 37, 579 (1951).
[25] It then appears that the expression given for µ̃ = 2µ in (B5) corresponds to the dominant term only in (B3) (the first one on the rhs).
[26] Compared to the expressions given in [18] where a parameter µ plays a key role, the corresponding change of variables should be performed: µ̃ = 3µ (1:2 case) and µ̃ = 3µ/2 for 2:1 electrolytes.
[27] E. Trizac, L. Bocquet, M. Aubouy and H.H. von Grünberg, Langmuir 19, 4027 (2003).
[28] M. Gueron and G. Weisbuch, Biopolymers 19, 353 (1980).
[29] B. O’Shaughnessy and Q. Yang, Phys. Rev. Lett. 94, 048302 (2005).
[30] M. Deserno, C. Holm and S. May, Macromolecules 33, 199 (2000).
[31] C.A. Tracy and H. Widom, Commun. Math. Phys. 190, 697 (1998).

ABSTRACT The thermodynamics of nucleic acid processes is heavily affected by the electric double-layer of micro-ions around the polyions. We focus here on the Coulombic contribution to the salt-polyelectrolyte preferential interaction (Donnan) coefficient and we report extremely accurate analytical expressions valid in the range of low salt concentration (when polyion radius is smaller than the Debye length). The analysis is performed at Poisson-Boltzmann level, in cylindrical geometry, with emphasis on highly charged poly-ions (beyond ``counter-ion condensation''). The results hold for any electrolyte of the form $z_-$:$z_+$. We also obtain a remarkably accurate expression for the electric potential in the vicinity of the poly-ion.
<|endoftext|><|startoftext|> Introduction One of the important quantities in information theory is the mutual information of two random variables X and Y which is expressed in terms of the Boltzmann-Gibbs entropy H(·) as follows: I(X ∧ Y ) = −H(X, Y ) +H(X) +H(Y ) when X, Y are continuous variables. For the expression of I(X∧Y ) of discrete variables X, Y , the aboveH(·) is replaced by the Shannon entropy. A more practical and rigorous definition via the relative entropy is I(X ∧ Y ) := S(µ(X,Y ), µX ⊗ µY ), where µ(X,Y ) denotes the joint distribution measure of (X, Y ) and µX⊗µY the product of the respective distribution measures of X, Y . The aim of this paper is to show that the mutual information I(X∧Y ) is gained as a certain asymptotic limit of the volume of “discrete micro-states” consisting of permu- tations approximating joint moments of (X, Y ) in some way. In Section 1, more gener- ally we consider an n-tuple of real bounded random variables (X1, . . . , Xn). Denote by ∆(X1, . . . , Xn;N,m, δ) the set of (x1, . . . ,xn) of xi ∈ R N whose joint moments (on the uniform distributed N -point set) of order up tom approximate those of (X1, . . . , Xn) up to an error δ. Furthermore, denote by ∆sym(X1, . . . , Xn;N,m, δ) the set of (σ1, . . . , σn) of permutations σi ∈ SN such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N,m, δ) for some x1, . . . ,xn ∈ R ≤ , where R ≤ is the R N -vectors arranged in increasing order. Then, the asymptotic volume log γ⊗nSN ∆sym(X1, . . . , Xn;N,m, δ) under the uniform probability measure γSN on SN is shown to converge as lim supN→∞ (also lim infN→∞) and then limm→∞,δց0 to −H(X1, . . . , Xn) + H(Xi) 1 Supported in part by Grant-in-Aid for Scientific Research (B)17340043. 2 Supported in part by the Hungarian Research Grant OTKA T068258. AMS subject classification: Primary: 62B10, 94A17. http://arxiv.org/abs/0704.0588v1 2 F. HIAI AND D. PETZ as long as H(Xi) > −∞ for 1 ≤ i ≤ n. 
Thus, we obtain a kind of discretization of the mutual information via symmetric group (or permutations). The approach can be applied to an n-tuple of discrete random variables (X1, . . . , Xn) as well. But the definition of the ∆sym-set of micro-states for discrete variables is somewhat different from the continuous variable case mentioned above, and we discuss the discrete variable case in Section 2 separately. The idea comes from the paper [3]. Motivated by theory of mutual free information in [6], a similar approach to Voiculescu’s free entropy is provided there. The free entropy is the free probability counterpart of the Boltzmann-Gibbs entropy, and RN -vectors and the symmetric group SN here are replaced by Hermitian N ×N matrices and the unitary group U(N), respectively. In this way, the “discretization approach” here is in some sense a classical analog of the “orbital approach” in [3]. 1. The continuous case For N ∈ N let RN≤ be the convex cone of the N -dimensional Euclidean space R consisting of x = (x1, . . . , xN ) such that x1 ≤ x2 ≤ · · · ≤ xN . The space R N is naturally regarded as the real function algebra on the N -point set. Let SN be the symmetric group of order N (i.e., the permutations on {1, 2, . . . , n}). Throughout this section let (X1, . . . , Xn) be an n-tuple of real random variables on a probability space (Ω,P), and assume that the Xi’s are bounded (i.e., Xi ∈ L ∞(Ω;P)). The Boltzmann-Gibbs entropy of (X1, . . . , Xn) is defined to be H(X1, . . . , Xn) := − · · · p(x1, . . . , xn) log p(x1, . . . , xn) dx1 · · · dxn if the joint density p(x1, . . . , xn) of (X1, . . . , Xn) exists; otherwise H(X1, . . . , Xn) = −∞. Note that the above integral is well defined in [−∞,∞) since the density p is compactly supported. Definition 1.1. The mean value of x = (x1, . . . , xN) in R N is given by κN(x) := For each N,m ∈ N and δ > 0 we define ∆(X1, . . . , Xn;N,m, δ) to be the set of all n-tuples (x1, . . . ,xn) of xi = (xi1, . . . 
, xiN ) ∈ R N , 1 ≤ i ≤ n, such that |κN(xi1 · · ·xik)− E(Xi1 · · ·Xik)| < δ for all 1 ≤ i1, . . . , ik ≤ n with 1 ≤ k ≤ m, where xi1 · · ·xik means the pointwise product, i.e., xi1 · · ·xik := (xi11 · · ·xik1, xi12 · · ·xik2, . . . , xi1N · · ·xikN ) ∈ R and E(·) denotes the expectation on (Ω,P). For each R > 0, define ∆R(X1, . . . , Xn; N,m, δ) to be the set of all (x1, . . . ,xn) ∈ ∆(X1, . . . , Xn;N,m, δ) such that xi ∈ [−R,R]N for all 1 ≤ i ≤ n. Heuristically, ∆(X1, . . . , Xn;N,m, δ) is the set of “micro-states” consisting of n- tuples of discrete random variables on the N -point set with the uniform probability A NEW APPROACH TO MUTUAL INFORMATION 3 such that all joint moments of order up to m give the corresponding joint moments of X1, . . . , Xn up to an error δ. For x ∈ RN write ‖x‖p := (N j=1 |xj | p)1/p for 1 ≤ p < ∞ and ‖x‖∞ := max1≤j≤N |xj| while ‖X‖p denotes the L p-norm of a real random variable X on (Ω,P). The next lemma is seen from [4, 5.1.1] based on the Sanov large deviation theorem, which says that the Boltzmann-Gibbs entropy is gained as an asymptotic limit of the volume of the approximating micro-states. Lemma 1.2. For every m ∈ N and δ > 0 and for any choice of R ≥ max1≤i≤n ‖Xi‖∞, the limit log λ⊗nN ∆R(X1, . . . , Xn;N,m, δ) exists, where λN is the Lebesgue measure on R N . Furthermore, one has H(X1, . . . , Xn) = lim m→∞,δց0 log λ⊗nN ∆R(X1, . . . , Xn;N,m, δ) independently of the choice of R ≥ max1≤i≤n ‖Xi‖∞. In the following let us introduce some kinds of mutual information in the discretiza- tion approach using micro-states of permutations. Definition 1.3. The action of SN on R N is given by σ(x) := (xσ−1(1), xσ−1(2), . . . , xσ−1(N)) for σ ∈ SN and x = (x1, . . . , xN) ∈ R N . For each N,m ∈ N, δ > 0 and R > 0 we denote by ∆sym,R(X1, . . . , Xn;N,m, δ) the set of all (σ1, . . . , σn) ∈ S N such that (σ1(x1), . . . , σn(xn)) ∈ ∆R(X1, . . . , Xn;N,m, δ) for some (x1, . . . ,xn) ∈ (R n. For each R > 0 define Isym,R(X1, . . . 
, Xn) := − lim m→∞,δց0 lim sup log γ⊗nSN ∆sym,R(X1, . . . , Xn;N,m, δ) where γSN is the uniform probability measure on SN . Define also Isym,R(X1, . . . , Xn) by replacing lim sup by lim inf. Obviously, 0 ≤ Isym,R(X1, . . . , Xn) ≤ Isym,R(X1, . . . , Xn). Moreover, ∆sym,∞(X1, . . . , Xn;N,m, δ) is defined by replacing ∆R(X1, . . . , Xn;N,m, δ) in the above by ∆(X1, . . . , Xn;N,m, δ) without cut-off by the parameter R. Then Isym,∞(X1, . . . , Xn) and Isym,∞(X1, . . . , Xn) are also defined as above. Definition 1.4. For each 1 ≤ i ≤ n we choose and fix a sequence ξi = {ξi(N)} of ξi(N) ∈ R ≤ , N ∈ N, such that κN (ξi(N) k) → E(Xki ) as N → ∞ for all k ∈ N, i.e., ξi(N) → Xi in moments. For each N,m ∈ N and δ > 0 we define ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) to be the set of all (σ1, . . . , σn) ∈ S N such that (σ1(ξ1(N)), . . . , σn(ξn(N))) ∈ ∆(X1, . . . , Xn;N,m, δ). 4 F. HIAI AND D. PETZ Define Isym(X1, . . . , Xn : ξ1, . . . , ξn) := − lim m→∞,δց0 lim sup log γ⊗nSN ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) and Isym(X1, . . . , Xn : ξ1, . . . ξn) by replacing lim sup by lim inf. The next proposition asserts that the quantities in Definitions 1.3 and 1.4 are all equivalent. Lemma 1.5. For any choice of R ≥ max1≤i≤n ‖Xi‖∞ and for any choices of approxi- mating sequences ξ1, . . . , ξn one has Isym,∞(X1, . . . , Xn) = Isym,R(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn), (1.1) Isym,∞(X1, . . . , Xn) = Isym,R(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn). (1.2) Proof. It is obvious that ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) is included in ∆sym,∞(X1, . . . , Xn;N,m, δ) for any approximating sequences ξi. Moreover, for each 1 ≤ i ≤ n an approximating sequence ξi can be chosen so that ‖ξi(N)‖∞ ≤ ‖Xi‖∞ for all N ; then ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) ⊂ ∆sym,R(X1, . . . , Xn; N,m, δ) for any R ≥ R0 := max1≤i≤n ‖Xi‖∞. 
Hence it suffices to prove that for any approximating sequences ξi and for every m ∈ N and δ > 0, there are an m ′ ∈ N, a δ′ > 0 and an N0 ∈ N so that ∆sym,∞(X1, . . . , Xn;N,m ′, δ′) ⊂ ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) for all N ≥ N0. Choose a ρ ∈ (0, 1) with m(R0 + 1) m−1ρ < δ/2. By [5, Lemma 4.3] (also [4, 4.3.4]) there exist an m′ ∈ N with m′ ≥ 2m, a δ′ > 0 with δ′ ≤ min{1, δ/2} and an N0 ∈ N such that for every 1 ≤ i ≤ n and every x ∈ R ≤ with N ≥ N0, if |κN (x k) − E(Xki )| < δ ′ for all 1 ≤ k ≤ m′, then ‖x − ξi(N)‖m < ρ. Suppose N ≥ N0 and (σ1, . . . , σn) ∈ ∆sym,∞(X1, . . . , Xn;N,m ′, δ′); then (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N,m ′, δ′) for some (x1, . . . ,xn) ∈ (R n. Since |κN(x i ) − E(X i )| < δ for all 1 ≤ k ≤ m′, we get ‖xi − ξi(N)‖m ≤ ρ and ‖xi‖m ≤ ‖xi‖2m = κN(x < (E(X2mi ) + 1) ≤ (R2m0 + 1) 1/2m ≤ R0 + 1. Therefore, |κN(σi1(ξi1(N)) · · ·σik(ξik(N)))− E(Xi1 · · ·Xik)| ≤ |κN(σi1(ξi1(N)) · · ·σik(ξik(N)))− κN(σi1(xi1) · · ·σik(xik))| + |κN(σi1(xi1) · · ·σik(xik))− E(Xi1 · · ·Xik)| ≤ m(R0 + 1) m−1ρ+ δ′ < δ for all 1 ≤ i1, . . . , ik ≤ n with 1 ≤ k ≤ m. The above latter inequality follows from the Hölder inequality. Hence (σ1, . . . , σn) ∈ ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ), and the result follows. � A NEW APPROACH TO MUTUAL INFORMATION 5 Consequently, we denote all the quantities in (1.1) by the same Isym(X1, . . . , Xn) and those in (1.2) by Isym(X1, . . . , Xn). We call Isym(X1, . . . , Xn) and Isym(X1, . . . , Xn) the mutual information and upper mutual information of (X1, . . . , Xn), respectively. The terminology “mutual information” will be justified after the next theorem. In the continuous variable case, our main result is the following exact relation of Isym and Isym with the Boltzmann-Gibbs entropy H(·), which says that Isym(X1, . . . , Xn) is formally the sum of the separate entropiesH(Xi)’s minus the compoundH(X1, . . . , Xn). Thus, a naive meaning of Isym(X1, . . . 
, Xn) is the entropy (or information) overlapping among the Xi’s. Theorem 1.6. H(X1, . . . , Xn) = −Isym(X1, . . . , Xn) + H(Xi) = −Isym(X1, . . . , Xn) + H(Xi). Proof. If the coordinates si of s ∈ R N are all distinct, then s is uniquely written as s = σ(x) with x ∈ RN≤ and σ ∈ SN . Note that the set of s ∈ R N with si = sj for some i 6= j is a closed subset of λN -measure zero. Under the correspondence s ∈ RN ←→ (x, σ) ∈ RN≤ × SN , s = σ(x) (well defined on a co-negligible subset of RN), the measure λN is transformed into the product of λN |RN and the counting measure on SN . In the following proof we adopt, due to Lemma 1.5, the description of Isym and Isym as Isym,R(X1, . . . , Xn) and Isym,R(X1, . . . , Xn) with R := max1≤i≤n ‖Xi‖∞. For each N,m ∈ N and δ > 0, suppose (s1, . . . , sn) ∈ ∆R(X1, . . . , Xn;N,m, δ) and write si = σi(xi) with xi ∈ R ≤ and σi ∈ SN . Then it is obvious that (x1, . . . ,xn; σ1, . . . , σn) ∆R(Xi;N,m, δ) ∩ R ×∆sym,R(X1, . . . , Xn;N,m, δ). By Lemma 1.2 and the fact stated at the beginning of the proof, we obtain H(X1, . . . , Xn) ≤ lim log λ⊗nN ∆R(X1, . . . , Xn;N,m, δ) ≤ lim inf log λN ∆R(Xi;N,m, δ) ∩ R + log#∆sym,R(X1, . . . , Xn;N,m, δ) = lim inf log λN ∆R(Xi;N,m, δ) − n logN ! 6 F. HIAI AND D. PETZ + log#∆sym,R(X1, . . . , Xn;N,m, δ) log λN ∆R(Xi;N,m, δ) + lim inf log γ⊗nSN ∆sym,R(X1, . . . , Xn;N,m, δ) This implies that H(X1, . . . , Xn) ≤ H(Xi)− Isym(X1, . . . , Xn). (1.3) Conversely, for each m ∈ N and δ > 0, by [5, Lemma 4.3] (also [4, 4.3.4]) there are an m′ ∈ N with m′ ≥ m, a δ′ > 0 with δ′ ≤ δ/2 and an N0 ∈ N such that for every N ∈ N and for every x,y ∈ RN≤ , if ‖x‖∞ ≤ R and |κN(x k) − κN(y k)| < 2δ′ for all 1 ≤ k ≤ m′, then ‖x− y‖1 < δ/2m(R + 1) m−1. Suppose N ≥ N0 and (x1, . . . ,xn; σ1, . . . , σn) ∆R(Xi;N,m ′, δ′) ∩ RN≤ ×∆sym,R(X1, . . . , Xn;N,m ′, δ′) so that (σ1(y1), . . . , σn(yn)) ∈ ∆R(X1, . . . , Xn;N,m ′, δ′) for some (y1, . . . 
,yn) ∈ (R Since |κN(x i )− κN(y i )| ≤ |κN(x i )− E(X i )|+ |κN(y i )− E(X i )| < 2δ for all 1 ≤ k ≤ m′, we get ‖xi − yi‖1 < δ/2m(R + 1) m−1 for 1 ≤ i ≤ n. Therefore, |κN (σi1(xi1) · · ·σik(xik))− E(Xi1 · · ·Xik)| ≤ |κN(σi1(xi1) · · ·σik(xik))− κN(σi1(yi1) · · ·σik(yik))| + |κN(σi1(yi1) · · ·σik(yik))− E(Xi1 · · ·Xik)| ≤ m(R + 1)m−1 max 1≤i≤n ‖xi − yi‖1 + δ + δ′ ≤ δ for all 1 ≤ i1, . . . , ik ≤ n with 1 ≤ k ≤ m. This implies that (σ1(x1), . . . , σn(xn)) ∈ ∆R(X1, . . . , Xn;N,m, δ). By Lemma 1.2 we obtain H(Xi)− Isym(X1, . . . , Xn) log λN ∆R(Xi;N,m ′, δ′) + lim sup log γ⊗nSN ∆sym,R(X1, . . . , Xn;N,m ′, δ′) = lim sup log λN ∆R(Xi;N,m ′, δ′) ∩ RN≤ A NEW APPROACH TO MUTUAL INFORMATION 7 + log#∆sym,R(X1, . . . , Xn;N,m ′, δ′) ≤ lim sup log λ⊗nN ∆R(X1, . . . , Xn;N,m, δ) This implies by Lemma 1.2 once again that H(Xi)− Isym(X1, . . . , Xn) ≤ H(X1, . . . , Xn). (1.4) The result follows from (1.3) and (1.4). � Let µ(X1,...,Xn) be the joint distribution measure on R n of (X1, . . . , Xn) while µXi is that of Xi for 1 ≤ i ≤ n. Let S(µ(X1,...,Xn), µX1 ⊗· · ·⊗µXn) denote the relative entropy (or the Kullback-Leibler divergence) of µ(X1,...,Xn) with respect to the product measure µX1 ⊗ · · · ⊗ µXn , i.e., S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn) := dµ(X1,...,Xn) d(µX1 ⊗ · · · ⊗ µXn) dµ(X1,...,Xn) if µ(X1,...,Xn) is absolutely continuous with respect to µX1 ⊗ · · · ⊗ µXn ; otherwise S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn) := +∞. When H(Xi) > −∞ for all 1 ≤ i ≤ n, one can easily verify that S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn) = −H(X1, . . . , Xn) + H(Xi). Thus, the above theorem yields the following: Corollary 1.7. If H(Xi) > −∞ for all 1 ≤ i ≤ n, then Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn) = S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn). Corollary 1.8. Under the same assumption as the above corollary, Isym(X1, . . . , Xn) = 0 if and only if X1, . . . , Xn are independent. 
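The identity behind Corollaries 1.7 and 1.8 can be illustrated numerically. The following is a minimal sketch for the finite discrete analogue only (Shannon entropy in place of the Boltzmann-Gibbs entropy), not the continuous setting of this section: it checks that the relative entropy of a joint distribution with respect to the product of its marginals equals the sum of the marginal entropies minus the joint entropy, and that it vanishes for an independent pair. The function names and the two example distributions are hypothetical choices made for illustration.

```python
import math
from itertools import product


def entropy(p):
    """Shannon entropy -sum p log p (natural log) of a dict {outcome: prob}."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def marginals(joint, n):
    """Marginal distributions of an n-variable joint given as {tuple: prob}."""
    margs = [dict() for _ in range(n)]
    for outcome, v in joint.items():
        for i, t in enumerate(outcome):
            margs[i][t] = margs[i].get(t, 0.0) + v
    return margs


def relative_entropy_to_product(joint, n):
    """S(joint, product of marginals) = sum joint * log(joint / product)."""
    margs = marginals(joint, n)
    total = 0.0
    for outcome, v in joint.items():
        if v > 0:
            prod = math.prod(margs[i][t] for i, t in enumerate(outcome))
            total += v * math.log(v / prod)
    return total


def entropy_gap(joint, n):
    """Sum of the marginal entropies minus the joint entropy."""
    return sum(entropy(m) for m in marginals(joint, n)) - entropy(joint)


# A correlated pair and an independent pair on {0, 1} x {0, 1} (illustrative data).
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {
    (a, b): pa * pb
    for (a, pa), (b, pb) in product([(0, 0.3), (1, 0.7)], [(0, 0.6), (1, 0.4)])
}

for name, joint in [("dependent", dependent), ("independent", independent)]:
    print(name, relative_entropy_to_product(joint, 2), entropy_gap(joint, 2))
```

For each distribution the two printed numbers agree up to rounding; for the independent pair both are zero up to rounding, as in Corollary 1.8.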
In particular, the original mutual information $I(X_1 \wedge X_2)$ of two real random variables $X_1, X_2$ is normally defined as $I(X_1 \wedge X_2) := S(\mu_{(X_1,X_2)}, \mu_{X_1} \otimes \mu_{X_2})$. Hence we have $I(X_1 \wedge X_2) = I_{\mathrm{sym}}(X_1, X_2) = \overline{I}_{\mathrm{sym}}(X_1, X_2)$ as long as $H(X_1) > -\infty$ and $H(X_2) > -\infty$ (and $X_1, X_2$ are bounded). For this reason, we gave the term “mutual information” to $I_{\mathrm{sym}}$.

Finally, some open problems are in order:

(1) Without the assumption $H(X_i) > -\infty$ for $1 \le i \le n$, does $I_{\mathrm{sym}}(X_1, \dots, X_n) = \overline{I}_{\mathrm{sym}}(X_1, \dots, X_n)$ hold true?

(2) More strongly, does a limit such as $\lim_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}(\Delta_{\mathrm{sym},R}(X_1, \dots, X_n; N, m, \delta))$ or $\lim_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}(\Delta_{\mathrm{sym}}(X_1, \dots, X_n : \xi_1(N), \dots, \xi_n(N); N, m, \delta))$ exist as in Lemma 1.2?

(3) Without the assumption $H(X_i) > -\infty$ for $1 \le i \le n$, does $I_{\mathrm{sym}}(X_1, \dots, X_n) = S(\mu_{(X_1,\dots,X_n)}, \mu_{X_1} \otimes \cdots \otimes \mu_{X_n})$ hold true? Also, is $I_{\mathrm{sym}}(X_1, \dots, X_n) = 0$ equivalent to the independence of $X_1, \dots, X_n$?

(4) Although the boundedness assumption for $X_1, \dots, X_n$ is rather essential in the above discussions, it is desirable to extend the results in this section to $X_1, \dots, X_n$ not necessarily bounded but having all moments.

2. The discrete case

Let $\mathcal{Y}$ be a finite set with a probability measure $p$. The Shannon entropy of $p$ is $S(p) := -\sum_{y \in \mathcal{Y}} p(y) \log p(y)$. For each sequence $y = (y_1, \dots, y_N) \in \mathcal{Y}^N$, the type of $y$ is a probability measure on $\mathcal{Y}$ given by $\nu_y(t) := N_y(t)/N$, where $N_y(t) := \#\{j : y_j = t\}$, $t \in \mathcal{Y}$. The number of possible types is smaller than $(N+1)^{\#\mathcal{Y}}$. If $\nu$ is a type and $T_N(\nu)$ denotes the set of all sequences of type $\nu$ from $\mathcal{Y}^N$, then the cardinality of $T_N(\nu)$ is estimated as follows:

$(N+1)^{-\#\mathcal{Y}} e^{N S(\nu)} \le \# T_N(\nu) \le e^{N S(\nu)}$ (2.1)

(see [1, 12.1.3] and [2, Lemma 2.2]).

For each $N \in \mathbb{N}$ and $\delta > 0$ we define $\Delta(p; N, \delta)$ to be the set of all sequences $y \in \mathcal{Y}^N$ such that $|\nu_y(t) - p(t)| < \delta$ for all $t \in \mathcal{Y}$. In other words, $\Delta(p; N, \delta)$ is the set of all $\delta$-typical sequences (with respect to the measure $p$). Then the next lemma is well known.

Lemma 2.1. $S(p) = \lim_{\delta \searrow 0} \lim_{N \to \infty} \frac{1}{N} \log \# \Delta(p; N, \delta)$.
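The type-class estimate (2.1) is easy to check on small cases. The sketch below assumes only that a type is specified by its vector of symbol counts, so that $\# T_N(\nu)$ is a multinomial coefficient; the function names and the sample count vectors are hypothetical and serve only as an illustration.

```python
import math


def type_class_size(counts):
    """#T_N(nu): number of sequences with the given symbol counts (multinomial coefficient)."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return size


def type_entropy(counts):
    """Shannon entropy S(nu) (natural log) of the type nu(t) = N(t)/N."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def bounds_2_1(counts):
    """Return (lower, size, upper) for estimate (2.1), taking #Y = len(counts)."""
    n = sum(counts)
    upper = math.exp(n * type_entropy(counts))
    lower = upper / (n + 1) ** len(counts)
    return lower, type_class_size(counts), upper


for counts in [(5, 5), (3, 2, 1), (10, 20, 30)]:
    lower, size, upper = bounds_2_1(counts)
    print(counts, size, lower <= size <= upper)
```

Since the lower and upper bounds differ only by the polynomial factor $(N+1)^{\#\mathcal{Y}}$, the quantity $\frac{1}{N}\log \# T_N(\nu)$ is pinned to $S(\nu)$ up to an error of order $(\log N)/N$, which is the mechanism used for Lemma 2.1.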
In fact, Lemma 2.1 easily follows from (2.1). Let $P_{N,\delta}$ be the maximizer of the Shannon entropy on the set of all types $\nu_y$, $y \in \mathcal{Y}^N$, such that $|\nu_y(t) - p(t)| < \delta$ for all $t \in \mathcal{Y}$. We can use the Shannon entropy of the type class corresponding to $P_{N,\delta}$ to estimate the cardinality of $\Delta(p; N, \delta)$:

$(N+1)^{-\#\mathcal{Y}} e^{N S(P_{N,\delta})} \le \# \Delta(p; N, \delta) \le e^{N S(P_{N,\delta})} (N+1)^{\#\mathcal{Y}}$.

It follows that

$\lim_{N\to\infty} \frac{1}{N} \log \# \Delta(p; N, \delta) = \sup\{S(q) : q \text{ is a probability measure on } \mathcal{Y} \text{ such that } |q(t) - p(t)| < \delta,\ t \in \mathcal{Y}\}$,

and the lemma follows.

We consider the case where $p$ is the joint distribution of an $n$-tuple $(X_1, \dots, X_n)$ of discrete random variables on $(\Omega, \mathbb{P})$. Throughout this section we assume that the random variables $X_1, \dots, X_n$ have their values in a finite set $\mathcal{X} = \{t_1, \dots, t_d\}$.

Definition 2.2. Let $p_{(X_1,\dots,X_n)}$ denote the joint distribution of $(X_1, \dots, X_n)$, which is a measure on $\mathcal{X}^n$, while the distribution $p_{X_i}$ of $X_i$ is a measure on $\mathcal{X}$, $1 \le i \le n$. We write $\Delta(X_i; N, \delta)$ for $\Delta(p_{X_i}; N, \delta)$ and $\Delta(X_1, \dots, X_n; N, \delta)$ for $\Delta(p_{(X_1,\dots,X_n)}; N, \delta)$.

Next, we introduce the counterparts of Definitions 1.3 and 1.4 in the discrete variable case.

Definition 2.3. The action of $S_N$ on $\mathcal{X}^N$ is similar to that on $\mathbb{R}^N$ given in Definition 1.3. For $N \in \mathbb{N}$ let $\mathcal{X}^N_{\le}$ denote the set of all sequences of length $N$ of the form $x = (t_1, \dots, t_1, t_2, \dots, t_2, \dots, t_d, \dots, t_d)$. Obviously, such a sequence $x$ is uniquely determined by $(N_x(t_1), \dots, N_x(t_d))$, that is, by the type of $x$. Thus $\mathcal{X}^N_{\le}$ is regarded as the set of all types from $\mathcal{X}^N$. For each $N \in \mathbb{N}$ and $\delta > 0$ we denote by $\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta)$ the set of all $(\sigma_1, \dots, \sigma_n) \in S_N^n$ such that $(\sigma_1(x_1), \dots, \sigma_n(x_n)) \in \Delta(X_1, \dots, X_n; N, \delta)$ for some $(x_1, \dots, x_n) \in (\mathcal{X}^N_{\le})^n$. Define

$I_{\mathrm{sym}}(X_1, \dots, X_n) := -\lim_{\delta \searrow 0} \limsup_{N\to\infty} \frac{1}{N} \log \gamma_{S_N}^{\otimes n}(\Delta_{\mathrm{sym}}(X_1, \dots, X_n; N, \delta))$,

and $\overline{I}_{\mathrm{sym}}(X_1, \dots, X_n)$ by replacing lim sup by lim inf. Moreover, for each $1 \le i \le n$, choose a sequence $\xi_i = \{\xi_i(N)\}$ of $\xi_i(N) = (\xi_i(N)_1, \dots, \xi_i(N)_N) \in \mathcal{X}^N_{\le}$ such that $\nu_{\xi_i(N)} \to p_{X_i}$ as $N \to \infty$.
We then define ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N, δ), Isym(X1, . . . , Xn : ξ1, . . . , ξn) and Isym(X1, . . . , Xn : ξ1, . . . , ξn) as in Definition 1.4. Lemma 2.4. For any choices of approximating sequences ξ1, . . . , ξn one has Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn), Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn). Proof. It suffices to show that for each δ > 0 there are a δ′ > 0 and an N0 ∈ N such ∆sym(X1, . . . , Xn;N, δ ′) ⊂ ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N, δ) (2.2) for all N ≥ N0. Choose δ ′ > 0 so that 3ndn+1δ′ ≤ δ, where d = #X . Suppose (σ1, . . . , σn) is in the left-hand side of (2.2) so that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn; N, δ′) for some (x1, . . . ,xn), xi = (xi1, . . . , xiN) ∈ X ≤ . Since |ν(σ1(x1),...,σn(xn))(z1, . . . , zn)− p(X1,...,Xn)(z1, . . . , zn)| < δ ′, (z1, . . . , zn) ∈ X n, (2.3) 10 F. HIAI AND D. PETZ νxi(t) = z1,...,zi−1,zi+1,...,zn∈X ν(σ1(x1),...,σn(xn))(z1, . . . , zi−1, t, zi+1, . . . , zn), t ∈ X , pXi(t) = z1,...,zi−1,zi+1,...,zn∈X p(X1,...,Xn)(z1, . . . , zi−1, t, zi+1, . . . , zn), t ∈ X , it follows that |νxi(t)− pXi(t)| < d n−1δ′ (2.4) for any 1 ≤ i ≤ n and t ∈ X . Now, choose an N0 ∈ N so that |νξi(N)(t)− pXi(t)| < δ and hence |νξi(N)(t)− νxi(t)| < 2d n−1δ′ (2.5) for any 1 ≤ i ≤ n and t ∈ X and for all N ≥ N0. Since |(Nξi(N)(t1) + · · ·+Nξi(N)(tl))− (Nxi(t1) + · · ·+Nxi(tl))| ≤ |Nξi(N)(t1)−Nxi(t1)|+ · · ·+ |Nξi(N)(tl)−Nxi(tl)| < 2Ndnδ′ for every 1 ≤ l ≤ d thanks to (2.5), it is easily seen that j ∈ {1, . . . , N} : ξi(N)j 6= xij < 2Ndn+1δ′ for any 1 ≤ i ≤ n. Hence we get |ν(σ1(ξ1(N)),...,σn(ξn(N)))(z1, . . . , zn)− ν(σ1(x1),...,σn(xn))(z1, . . . , zn)| ∣#{j : ξ1(N)σ−1 (j) = z1, . . . , ξn(N)σ−1n (j) = zn} −#{j : x1σ−1 (j) = z1, . . . , xnσ−1n (j) = zn} #{j : ξi(N)j 6= xij} < 2nd n+1δ′ so that thanks to (2.3) |ν(σ1(ξ1(N)),...,σn(ξn(N)))(z1, . . . , zn)− p(X1,...,Xn)(z1, . . . , zn)| < 3nd n+1δ′ ≤ δ for every (z1, . . . , zn) ∈ X n. Therefore, (σ1, . . 
. , σn) is in the right-hand side of (2.2), as required. � The next theorem is the discrete variable version of Theorem 1.6. Theorem 2.5. Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn) = −S(X1, . . . , Xn) + S(Xi). Proof. For each sequence (N1, . . . , Nd) of integers Nl ≥ 0 with l=1Nl = N , let S(N1, . . . , Nd) denote the subgroup of SN consisting of products of permutations of {1, . . . , N1}, {N1 + 1, . . . , N1 +N2}, . . . , {N1 + · · ·+Nd−1 + 1, . . . , N}, and let SN/S(N1, . . . , Nd) be the set of left cosets of S(N1, . . . , Nd). For each x ∈ X ≤ and σ ∈ SN we write [σ]x for the left coset of S(Nx(t1), . . . , Nx(td)) containing σ. Then it is clear that A NEW APPROACH TO MUTUAL INFORMATION 11 every s ∈ XN is represented as s = σ(x) with a unique pair (x, [σ]x) of x ∈ X ≤ and [σ]x ∈ SN/S(Nx(t1), . . . , Nx(td)). For any ε > 0 one can choose a δ > 0 such that for every 1 ≤ i ≤ n and every probability measure p on X , if |p(t)−pXi(t)| < δ for all t ∈ X , then |S(p)−S(pXi)| < ε. This implies that for each N ∈ N and 1 ≤ i ≤ n, one has |S(νx) − S(pXi)| < ε whenever x ∈ ∆(Xi;N, δ). Notice that ∆sym(X1, . . . , Xn;N, δ/d n−1) is the union of [σ1]x1 × · · · × [σn]xn for all (x1, . . . ,xn; [σ1]x1 , . . . , [σn]xn) of xi ∈ X ≤ and [σi]xi ∈ SN/S(Nxi(t1), . . . , Nxi(td)) such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d n−1). Now, suppose (x1, . . . ,xn) ∈ (X n, (σ1, . . . , σn) ∈ S N and (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d n−1). Then, for each 1 ≤ i ≤ n we get xi ∈ ∆(Xi;N, δ), i.e., |νxi(t)− pXi(t)| < δ for all t ∈ X as (2.4). Hence we have [σ1]x1 × · · · × [σn]xn x∈∆(Xi;N,δ) Nx(t)! (2.6) so that #∆sym(X1, . . . , Xn;N, δ/d ≤ #∆(X1, . . . , Xn;N, δ/d n−1) · x∈∆(Xi;N,δ) Nx(t)! Therefore, log γ⊗nSN ∆sym(X1, . . . , Xn;N, δ/d log#∆(X1, . . . , Xn;N, δ/d x∈∆(Xi;N,δ) logNx(t)! logN !. (2.7) For each 1 ≤ i ≤ n and for any x ∈ ∆(Xi;N, , δ), the Stirling formula yields logNx(t)!− logN ! 
Nx(t) logNx(t)− Nx(t) − logN + 1 + o(1) = −S(νx) + o(1) ≤ −S(pXi) + ε+ o(1) as N →∞ (2.8) thanks to the above choice of δ > 0. Here, note that the o(1) in the above estimate is uniform for x ∈ ∆(Xi;N, δ). Hence, by (2.7), (2.8) and by Lemma 2.1 applied to p(X1,...,Xn) on X n, we obtain −Isym(X1, . . . , Xn) ≤ S(p(X1,...,Xn))− S(pXi) + nε and hence Isym(X1, . . . , Xn) ≥ −S(X1, . . . , Xn) + S(Xi). (2.9) 12 F. HIAI AND D. PETZ Next, we prove the converse direction. For any ε > 0 choose a δ > 0 as above. For N ∈ N let Ξ(N, δ/dn−1) be the set of all (x1, . . . ,xn) ∈ (X n such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d for some (σ1, . . . , σn) ∈ S N . Furthermore, for each (x1, . . . ,xn) ∈ Ξ(N, δ/d n−1), let Σ(x1, . . . ,xn;N, δ/d n−1) be the set of all ([σ1]x1 , . . . , [σn]xn) ∈ SN/S(Nxi(t1), . . . , Nxi(td)) such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d n−1). Then it is obvious that #∆(X1, . . . , Xn;N, δ/d n−1) ≤ (x1,...,xn)∈Ξ(N,δ/dn−1) #Σ(x1, . . . ,xn;N, δ/d n−1). (2.10) When (x1, . . . ,xn) ∈ Ξ(N, δ/d n−1), we get xi ∈ ∆(Xi;N, δ) as (2.4) for 1 ≤ i ≤ n. Hence it is seen that #Ξ(N, δ/dn−1) ≤ #∆(Xi;N, δ) (N1, . . . , Nd) : Nl ≥ 0 is an integer in N(pXi(tl)− δ), N(pXi(tl) + δ) for 1 ≤ l ≤ d < (2Nδ + 1)nd. (2.11) For any fixed (x1, . . . ,xn) ∈ Ξ(N, δ/d n−1), suppose ([σ1]x1 , . . . , [σn]xn) ∈ Σ(x1, . . . ,xn; N, δ/dn−1); then we get [σ1]x1 × · · · × [σn]xn x∈∆(Xi;N,δ) Nx(t)! similarly to (2.6). Therefore, #∆sym(X1, . . . , Xn;N, δ/d ([σ1]x1 ,...,[σn]xn )∈Σ(x1,...,xn;N,δ/d [σ1]x1 × · · · × [σn]xn ≥ #Σ(x1, . . . ,xn;N, δ/d n−1) · x∈∆(Xi;N,δ) Nx(t)! . (2.12) By (2.10)–(2.12) we obtain #∆(X1, . . . , Xn;N, δ/d n−1) ≤ #∆sym(X1, . . . , Xn;N, δ/d n−1) · (2Nδ + 1)nd minx∈∆(Xi;N,δ) t∈X Nx(t)! A NEW APPROACH TO MUTUAL INFORMATION 13 so that log#∆(X1, . . . , Xn;N, δ/d log γ⊗nSN ∆sym(X1, . . . , Xn;N, δ/d x∈∆(Xi;N,δ) logNx(t)! logN ! + log(2Nδ + 1). Since it follows similarly to (2.8) that logNx(t)! + logN ! 
≤ S(pXi) + ε+ o(1) as N →∞ with uniform o(1) for all x ∈ ∆(Xi;N, δ), we obtain S(p(X1,...,Xn)) ≤ −Isym(X1, . . . , Xn) + S(pXi) + nε by Lemma 2.1 again, and hence Isym(X1, . . . , Xn) ≤ −S(X1, . . . , Xn) + S(Xi). (2.13) The conclusion follows from (2.9) and (2.13). � In particular, the mutual information I(X1 ∧ X2) of X1 and X2 is equivalently ex- pressed as I(X1 ∧X2) = S(p(X1,X2), pX1 ⊗ pX2) = −S(p(X1,X2)) + S(pX1) + S(pX2) = Isym(X1, X2) = Isym(X1, X2). Similarly to the problem (2) mentioned in the last of Section 1, it is unknown whether the limit log γ⊗nSN ∆sym(X1, . . . , Xn;N, δ) exists or not. References [1] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second edition, Wiley- Interscience, Hoboken, NJ, 2006. [2] I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial, in “Foundations and Trends in Communications and Information Theory,” Vol. 1, No. 4 (2004), 417-528, Now Publishers. [3] F. Hiai, T. Miyamoto and Y. Ueda, Orbital approach to microstate free entropy, preprint, 2007, math.OA/0702745. [4] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy, Mathematical Surveys and Monographs, Vol. 77, Amer. Math. Soc., Providence, 2000. [5] D. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability theory, II, Invent. Math. 118 (1994), 411–440. [6] D. Voiculescu, The analogue of entropy and of Fisher’s information measure in free probability theory VI: Liberation and mutual free information, Adv. Math. 146 (1999), 101–166. http://arxiv.org/abs/math/0702745 14 F. HIAI AND D. PETZ Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai 980- 8579, Japan Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, H-1053 Budapest, Reáltanoda u. 13-15, Hungary Introduction 1. The continuous case 2. 
The discrete case References ABSTRACT A new expression as a certain asymptotic limit via "discrete micro-states" of permutations is provided for the mutual information of both continuous and discrete random variables. <|endoftext|><|startoftext|> Introduction

Zhou and Sornette (2003) analyzed the deflated quarterly average sales prices p(t) from December 1992 to December 2002 of new houses sold in all the states in the USA and by regions (northeast, midwest, south and west) and found that, while there was undoubtedly a strong growth rate, there was no evidence of a bubble in the latest six years (as qualified by a super-exponential growth). Then, Zhou and Sornette (2006) analyzed the quarterly average sale prices of new houses sold in the USA as a whole, in the northeast, midwest, south, and west of the USA, and in each of the 50 states and the District of Columbia up to the first quarter of 2005, to determine whether they had grown faster than exponentially (which is taken as the diagnostic of a bubble). Zhou and Sornette (2006) found that 22 states (mostly Northeast and West) exhibit clear-cut signatures of a fast-growing bubble. From the analysis of the S&P 500 Home Index, they concluded that the turning point of the bubble would probably occur around mid-2006. The specific statement found at the bottom of page 306 of Zhou and Sornette (2006) is:

“We observe a good stability of the predicted tc ≈ mid-2006 for the two LPPL models (2) and (3). The spread of tc is larger for the second-order LPPL fits but brackets mid-2006. As mentioned before, the power-law fits are not reliable. We conclude that the turning point of the bubble will probably occur around mid-2006.”

It should be stressed that these studies departed from most other reports by analysts and consulting firms on real estate prices in that Zhou and Sornette (2003, 2006) did not characterize the housing market as overpriced in 2003.
Only in 2004-2005 did they confirm that the signatures of an unsustainable bubble path had been revealed.

Let us briefly analyze how this prediction has fared. The upper panel of Figure 1 shows the quarterly house price indexes (HPIs) in the 21 states and in the District of Columbia (DC) from 1994 to the fourth quarter of 2006, as released by the OFHEO. It is evident that the growth in most of these 22 HPIs slowed down or even stopped during 2006. When we look at the S&P Case-Shiller Home Indexes (S&P/CSIs) of the 20 major US cities, as illustrated in the lower panel of Figure 1, we observe that the majority of the S&P/CSIs had a maximum, denoted by a solid dot, in the middle of 2006, validating the prediction of Zhou and Sornette (2006). Specifically, the times of the maxima are respectively 2006/06/01, 2006/09/01, 2005/11/01, 2006/05/01, 2006/08/01, 2006/05/01, 2006/12/01, 2006/07/01, 2006/08/01, 2006/09/01, 2005/09/01, 2005/12/01, 2006/09/01, 2006/09/01, 2006/08/01, 2006/06/01, 2006/07/01, 2006/09/01, 2006/08/01, 2006/12/01, 2006/06/01, and 2006/07/01 for the 20 cities shown in the legend of the lower panel. The only two cities with a maximum occurring later, towards the end of 2006 (2006/12/01), are Miami and Seattle. However, their growth rates decreased remarkably in 2006, as shown in the figure. Furthermore, the S&P/CS Home Price Composite-10 reached its historical high of 226.29 on 2006/06/01 and the Composite-20 peaked at 206.53 on 2006/07/01, again confirming remarkably well the validity of the forecast of Zhou and Sornette (2006).

Education Foundation (Grant 101086), and the Alfred Kastler Foundation which supported W.-X. Zhou for a visiting position in France.
[Figure 1. Legend of the lower panel, in order: Phoenix-AZ, Los Angeles, San Diego, San Francisco, Denver, Washington, Miami, Tampa-FL, Atlanta-GA, Chicago, Boston, Detroit-MI, Minneapolis-MN, Charlotte-NC, Las Vegas, New York, Cleveland-OH, Portland-OR, Dallas-TX, Seattle-WA.]

Fig. 1. Evaluation of the prediction of Zhou and Sornette (2006) that “the turning point of the bubble will probably occur around mid-2006” using the OFHEO HPI data (upper panel) and the S&P CSI data (lower panel).

In this note, we provide a more regional study of the diagnostic of bubbles and the prediction of their demise. Specifically, we analyze the Case-Shiller-Weiss (CSW) Zip Code Indexes of 27 different Las Vegas regions, computed at a monthly frequency from June 1983 to March 2005. The CSW Indexes are based on the so-called repeat sales methods, which directly measure house price appreciations. The key to these data is that they are observations of multiple transactions on the same property, repeated over many properties and then pooled in an index. Prices from different time periods are combined to create “matched pairs,” providing a direct measure of price changes for a given property over a known period of time. Bailey et al. (1963) proposed the basic repeat sales method over four decades ago, but only after the work by Case and Shiller (1987, 1989, 1990) did the idea receive significant attention in the housing research community.

Studying the Las Vegas database is particularly suitable since Las Vegas belongs to a state which was identified by Zhou and Sornette (2006) as one of the 22 states with a fast-growing bubble in 2005. With access to 27 different CSW Zip Code Indexes of Las Vegas, we are able to obtain more reliable and fine-grained measures, which both confirm and extend the previous analyses of Zhou and Sornette (2003, 2006).

The next section recalls the conceptual background underlying our empirical approach.
Then, section 3 analyzes the regional CSW indexes for Las Vegas, showing that there is a regime shift, with two regimes separated by a bubble around the year 2004. Section 4 identifies and then analyzes the yearly periodicity and intra-year pattern detected in the growth rate of the regional CSW indexes. Section 5 offers a preliminary forecast based on the periodicity analyses in Sec. 4. Section 6 concludes.

2 Conceptual background of our empirical analysis

2.1 Humans as social animals and herding

Humans are perhaps the most social mammals and they shape their environment to their personal and social needs. This statement is based on a growing body of research at the frontier between new disciplines called neuro-economics, evolutionary psychology, cognitive science, and behavioral finance. This body of evidence emphasizes the very human nature of humans with its biases and limitations, as opposed to the previously prevailing view of rational economic agents optimizing their decisions based on unlimited access to information and to computation resources.

Here, we focus on an empirical question (the existence and detection of real-estate bubbles) which, we hypothesize, is a footprint of perhaps the most robust trait of humans and the most visible imprint in our social affairs: imitation and herding. Imitation has been documented in psychology and in the neurosciences as one of the most evolved cognitive processes, requiring a developed cortex and sophisticated processing abilities. In short, we learn our basics and how to adapt mostly by imitation throughout our lives. It seems that imitation has evolved as an evolutionarily advantageous trait, and may even have promoted the development of our anomalously large brain (compared with other mammals). It is actually “rational” to imitate when lacking sufficient time, energy and information to take a decision based only on private information and processing, that is, most of the time.
Imitation, in obvious or subtle forms, is a pervasive activity of humans. In the modern business, economic and financial worlds, the tendency for humans to imitate leads in its strongest form to herding and to crowd effects. Based on a theory of cooperative herding and imitation, we have shown that imitation leads to positive feedbacks, that is, an action leads to consequences which themselves reinforce the action and so on, leading to virtuous or vicious circles. We have formalized these ideas in a general mathematical theory which has led to an observable signature of herding, in the form of the so-called log-periodic power law acceleration of prices. A power law acceleration of prices reflects the positive feedback mechanism. When present, log-periodicity takes into account the competition between positive feedback (self-fulfilling sentiment), negative feedbacks (contrarian behavior and fundamental/value analysis) and inertia (everything takes time to adjust). Sornette (2003) presented a general introduction, a synthesis and examples of applications.

2.2 Definition and mechanism for bubbles

The term “bubble” is widely used but rarely clearly defined. Following Case and Shiller (2003), the term “bubble” refers to a situation in which excessive public expectations of future price increases cause prices to be temporarily elevated. During a housing price bubble, homebuyers think that a home that they would normally consider too expensive for them is now an acceptable purchase because they will be compensated by significant further price increases. They will not need to save as much as they otherwise might, because they expect the increased value of their home to do the saving for them. First-time homebuyers may also worry during a housing bubble that if they do not buy now, they will not be able to afford a home later.
Furthermore, the expectation of large price increases may have a strong impact on demand if people think that home prices are very unlikely to fall, and certainly not likely to fall for long, so that there is little perceived risk associated with an investment in a home.

What is the origin of bubbles? In a nutshell, speculative bubbles are caused by “precipitating factors” that change public opinion about markets or that have an immediate impact on demand, and by “amplification mechanisms” that take the form of price-to-price feedback, as stressed by Shiller (2000). A number of fundamental factors can influence price movements in housing markets. On the demand side, demographics, income growth, employment growth, changes in financing mechanisms or interest rates, as well as changes in location characteristics such as accessibility, schools, or crime, to name a few, have been shown to have effects. On the supply side, attention has been paid to construction costs, the age of the housing stock, and the industrial organization of the housing market. The elasticity of supply has been shown to be a key factor in the cyclical behavior of home prices. The cyclical process that we observed in the 1980s in those cities experiencing boom-and-bust cycles was caused by the general economic expansion, best proxied by employment gains, which drove demand up. In the short run, those increases in demand encountered an inelastic supply of housing and developable land, inventories of for-sale properties shrank, and vacancy declined. As a consequence, prices accelerated. This provided an amplification mechanism as it led buyers to anticipate further gains, and the bubble was born. Once prices overshoot or supply catches up, inventories begin to rise, time on the market increases, vacancy rises, and price increases slow down, eventually encountering downward stickiness.
The predominant story about home prices is always the prices themselves (see Shiller, 2000; Sornette, 2003); the feedback from initial price increases to further price increases is a mechanism that amplifies the effects of the precipitating factors. If prices are going up rapidly, there is much word-of-mouth communication, a hallmark of a bubble. The word of mouth can spread optimistic stories and thus help cause an overreaction to other stories, such as stories about employment. The amplification can also work on the downside. Price decreases will generate publicity for negative stories about the city, but downward stickiness is encountered initially.

2.3 Was there a bubble? Status of the argument based on the ratio of cost of owning versus cost of renting

In recent years, there has been increasing debate on whether or not there was a real estate bubble in the United States of America. Case and Shiller (2003), Shiller (2006) and Smith and Smith (2006) argued that the house prices over the period 2000-2005 were not abnormal as they reflected only the convergence of the prices to their fundamentals from below. In contrast, Zhou and Sornette (2006) and Roehner (2006) have suggested that there was a bubble, which became identifiable only after 2003, that is, after the work of Zhou and Sornette (2003).

In this context, it is instructive to comment on the study by Himmelberg et al. (2005), from the Federal Reserve Bank of New York, as it reflects the never-ending debate between proponents of the fundamental valuation explanation and those invoking speculative bubbles. We are resolutely part of the second group.

Himmelberg et al. (2005) constructed measures of the annual cost of single-family housing for 46 metropolitan areas in the United States over the last 25 years and compared them with local rents and incomes as a way of judging the level of housing prices.
In a nutshell, they claimed in 2005 that conventional metrics like the growth rate of house prices, the price-to-rent ratio, and the price-to-income ratio can be misleading and lead to incorrect conclusions on the existence of the real-estate bubble. Their measure showed that, during the 1980s, houses looked most overvalued in many of the same cities that subsequently experienced the largest house price declines. But they found that, from the trough of 1995 to 2004, the cost of owning rose somewhat relative to the cost of renting, but not, in most cities, to levels that made houses look overvalued.

The rosy conclusion of Himmelberg et al. (2005), that 2004-2005 prices were justifiable and that there was no risk of deflation as no bubble was present, is based on a particularly curious comparison between cost of owning and cost of renting, as noticed by Jorion (2005) in a letter to the Wall Street Journal. Indeed, they candidly revealed that their “cost of owning” calculations involve an “expected appreciation on the property” coefficient. The value for this factor is no doubt derived from figures for appreciation as currently observed on the housing market, meaning they regarded the current appreciation level as a reasonable assumption for what would indeed happen next, which is precisely what our analyses and that of others question. In other words, the authors had unwittingly hard-wired into their model the assertion that there was no housing bubble; little wonder then that this is also what they felt authorized to conclude.
The circularity of their reasoning is particularly obvious in an illustration they gave for San Francisco where, for more than 60 years, the price-to-rent ratio has exceeded the national average, which, so they claimed, “does not necessarily make owning there more expensive than renting.” The reason is that “high financing costs are offset by above-average expected capital gains.” Translated, this means that as long as there is a bubble, prices will go up and investing in a house remains a profitable operation. This trivial statement is hollow; the real question is whether the trend that is observed now remains sustainable.

In addition to this criticism put forward by Jorion (2005), there are other reasons to doubt the validity of the conclusion of Himmelberg et al. (2005). In the words of Himmelberg et al. (2005) themselves, “the ratio of the cost of owning to the cost of renting is especially sensitive to the real long-term interest rates.” They are right in their rosy conclusion as long as the long-term interest rates remain exceptionally low. It is particularly surprising that their estimation of the ratio of the cost of owning to the cost of renting was based on the most recent rates over the year preceding their analysis (2004), while a house is a long-term investment: what will be the long-term rates in 10, 20, 30, or 50 years?

Another problem is that their analysis was “mono-dimensional”: they proposed that everything depends only on the ratio of the cost of owning to the cost of renting, and they missed the interest rates as an independent variable. As a consequence, it is not reasonable to compare the 1980s and the present time, as the long-term interest rates had nothing in common.

Another problem with their analysis is that they assumed “equilibrium,” while people are sensitive to the history-dependent path followed by the prices.
In other words, people are sensitive to the way prices reach a certain level, and in particular to an acceleration that can fuel itself for a while, whereas Himmelberg et al. (2005) discussed only the mono-dimensional level of the price, and not how it got there. We think that this general error made by “equilibrium” economists constitutes a fundamental flaw which fails to capture the real nature of the organization of human societies and their decision process. In the sequel, we actually focus our attention on signatures of price trajectories that highlight the importance of history dependence for prediction.

This discussion is reminiscent of the proposition by Mauboussin and Hiler (1999), offered close to the peak of the Internet and new technology bubble that culminated in 2000, that better business models, the network effect, first-to-scale advantages, and the real options effect could account rationally for the high prices of dot.com and other New Economy companies. These interesting views expounded in early 1999 were in synchrony with the bull market of 1999 and preceding years. They participated in the general optimistic view and added to the strength of the herd. Later, after the collapse of the bubble, these explanations seemed less attractive. This did not escape the then U.S. Federal Reserve chairman Alan Greenspan (1997), who said: “Is it possible that there is something fundamentally new about this current period that would warrant such complacency? Yes, it is possible. Markets may have become more efficient, competition is more global, and information technology has doubtless enhanced the stability of business operations. But, regrettably, history is strewn with visions of such new eras that, in the end, have proven to be a mirage.
In short, history counsels caution.”

3 Regime shift in the CSW Zip Code Indexes of Las Vegas

3.1 Description of the data

We now turn to the analysis of the CSW indexes of 27 different Las Vegas zip regions, sampled monthly. The 27 monthly CSW data sets start in June-1983 and end in March-2005. Figure 2 shows the price trajectories of all the 27 CSW indexes. Visual inspection shows (i) a very similar behavior of all the different zip codes and (ii) a sudden increase of the indexes since mid-2003. Let us now analyze these data quantitatively.

Fig. 2. Time evolution of the Case-Shiller-Weiss (CSW) Zip Code Indexes of 27 Las Vegas zip regions from June-1983 to March-2005.

3.2 Power law fits

The simplest mathematical equation capturing the positive feedback effect and herding is the power law formula (see Broekstra et al., 2005, for a simple introduction in a similar context)

$$ I(t) = A + B\,|t_c - t|^{m} , \tag{1} $$

where $B < 0$ and $0 < m < 1$, or $B > 0$ and $m < 0$. Other cases do not qualify as a power law acceleration. For $B < 0$ and $0 < m < 1$, or $B > 0$ and $m < 0$, the trajectory of $I(t)$ described by (1) expresses the existence of an accelerating bubble, which is faster than exponential. This is taken as one hallmark of the existence of a bubble.

Notice also that this formula expresses the existence of a singularity at time $t_c$, which should be interpreted as a change of regime (the mathematical singularity does not exist in reality and is rounded off by so-called finite-size effects and the appearance of a large susceptibility to other mechanisms). This critical time $t_c$ must be interpreted as the end of the bubble and the time when the regime transits to another state through a crash, or simply a plateau or a slowly moving correction.

We have fitted each of the 27 individual CSW indexes using the pure power law model (1). The data used for fitting are from Dec-1995 to Jun-2005.
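As an illustration of how a fit of the pure power law (1) can be organized, the following minimal sketch exploits the fact that, for fixed t_c and m, the parameters A and B enter linearly and follow from ordinary least squares, so that only the pair (t_c, m) has to be scanned over a grid. The function name, the grids and the synthetic data are assumptions made for this example only; the fitting procedure actually used for the CSW indexes is not specified here.

```python
def fit_power_law(times, values, tc_grid, m_grid):
    """Grid search over (tc, m); A and B by linear least squares.

    Model: I(t) = A + B * |tc - t|**m.  Every tc in tc_grid is assumed to
    lie strictly beyond the last observation, so that |tc - t| > 0.
    Returns (rms, A, B, tc, m) for the best grid point.
    """
    n = len(times)
    mean_y = sum(values) / n
    best = None
    for tc in tc_grid:
        for m in m_grid:
            x = [abs(tc - t) ** m for t in times]
            mean_x = sum(x) / n
            sxx = sum((xi - mean_x) ** 2 for xi in x)
            sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
            B = sxy / sxx
            A = mean_y - B * mean_x
            rms = (sum((A + B * xi - yi) ** 2 for xi, yi in zip(x, values)) / n) ** 0.5
            if best is None or rms < best[0]:
                best = (rms, A, B, tc, m)
    return best


# Synthetic monthly series (hypothetical numbers, not CSW data):
# I(t) = 100 - 20 * (2005 - t)**0.5, sampled from 2001.0 to 2004.5.
times = [2001.0 + k / 12 for k in range(43)]
values = [100.0 - 20.0 * abs(2005.0 - t) ** 0.5 for t in times]
tc_grid = [2004.6 + 0.1 * k for k in range(15)]
m_grid = [0.1 * k for k in range(1, 10)]
rms, A, B, tc, m = fit_power_law(times, values, tc_grid, m_grid)
```

On noisy data the residual surface in (t_c, m) is typically flat, which is one way to see why short acceleration windows can yield poorly constrained critical times.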
We do not show the results as the signature of a power law growth is not evident, essentially because the acceleration is only over a rather short period of time from approximately 2002 to 2004. As a consequence, power law fits give unreliable critical times $t_c$ that lie too far in the future (2008 and beyond). We have thus redone the fits of the 27 CSW indexes over a shorter time interval from Aug-2001 to Jun-2005. A typical example is shown in Fig. 3. The other 26 CSW indexes are very similar, with some variations of the parameters, but the message is the same: while there is a clear faster-than-exponential acceleration over most of the time interval, the price trajectory has clearly transitioned into another regime in the latter part of the time interval considered here. The transition occurred smoothly from mid-2004 to mid-2005 (the end of the time period analyzed here).

[Figure 3. In-figure annotation: $t_c$ = 2012.94; $m$ = −12; $\chi$ = 0.007679.]

Fig. 3. Typical evolution of a CSW index from Aug-2001 to Jun-2005 and its fit by a power law, showing both the faster-than-exponential growth up to mid-2004 and the smooth transition to a much slower growth at later times. The root-mean-square $\chi$ of the residuals of the fit as well as $t_c$ and $m$ are given inside the figure.

It is important to recognize that the power law regime is expected only relatively close to the critical time $t_c$, while other behaviors are expected far from $t_c$. The simplest model is to consider that, far from $t_c$, the price follows an exponential growth with an approximately constant growth rate $\mu$:

$$ I(t) = a + b\,e^{\mu t} . \tag{2} $$

A fuller description is thus to consider that formula (2) holds from the beginning of the time series up to a cross-over time $t^*$, beyond which expression (1) takes over. Any given price trajectory should thus be fitted by (2) from some initial time $t_{\rm start}$ to time $t^*$ and then by (1) from $t^*$ to the end of the time series.
Technically, $t^*$ is known from the parameters $a, b, \mu, A, B, t_c, m$ by the condition of continuity of $I(t)$ at $t = t^*$, that is, both formulas give the same value at $t = t^*$. We can further determine one of the parameters $a$, $b$ or $\mu$ by imposing a condition of differentiability at $t^*$, that is, the first time-derivative of $I(t)$ is continuous at $t^*$. This approach is known in numerical analysis as “asymptotic matching” (see Bender and Orszag, 1978).

A simplified description of such a cross-over between a standard exponential growth and the power law super-exponential acceleration is obtained by using the more compact formulation

$$ I(t) = A + B \left\{ \tanh[(t_c - t)/\tau] \right\}^{m} , \tag{3} $$

where tanh denotes the hyperbolic tangent function. This expression derives from a study of the transition from the non-critical to the critical regime in rupture processes (to which bubbles and their terminal singularity belong) conducted by Sornette and Andersen (1998). This expression has the virtue of automatically providing a smooth transition between the exponential behavior (2) and the pure power law (1), since $\tanh[(t_c - t)/\tau] \approx (t_c - t)/\tau$ for $t_c - t < \tau$ and $\tanh[(t_c - t)/\tau] \approx 1 - 2e^{2(t - t_c)/\tau}$ for $t_c - t > \tau$. In this latter case $t_c - t > \tau$, expression (3) with $m = 1$ becomes of the form (2) with

$$ a = A + B , \tag{4} $$
$$ b = -2B\,e^{-2t_c/\tau} , \tag{5} $$
$$ \mu = 2/\tau . \tag{6} $$

In contrast, for $t_c - t < \tau$, expression (3) becomes of the form (1) with the correspondence $B/\tau^{m} \to B$. Expression (3) has only five free parameters, in contrast with the model involving the cross-over from (2) to (1), which has 7 free parameters ($a, b, \mu, A, B, t_c, m$), while $t^*$ is determined by the asymptotic matching. The pure power law formula (1) has 4 parameters while the exponential law (2) has just 3 parameters. The problem with expression (3) is that it does not recover a pure exponential growth, even for $t_c - t > \tau$, when $m \neq 1$.
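The two limiting behaviors of the hyperbolic tangent invoked above can be checked numerically. The sketch below is only an illustration: the function names and all parameter values are hypothetical, chosen so that one evaluation point lies in the regime where t_c − t is much smaller than τ, and one argument is well inside the opposite regime.

```python
import math


def crossover_model(t, A, B, tc, tau, m):
    """Expression (3): I(t) = A + B * tanh((tc - t) / tau)**m, for t < tc."""
    return A + B * math.tanh((tc - t) / tau) ** m


def power_law(t, A, B, tc, m):
    """Expression (1): I(t) = A + B * |tc - t|**m."""
    return A + B * abs(tc - t) ** m


# Hypothetical parameter values, for illustration only.
A, B, tc, tau, m = 1.0, -2.0, 10.0, 100.0, 0.5

# Regime tc - t << tau: (3) reduces to (1) with B replaced by B / tau**m.
t_near = 9.0
near_tanh = crossover_model(t_near, A, B, tc, tau, m)
near_power = power_law(t_near, A, B / tau ** m, tc, m)

# Regime of large argument x = (tc - t) / tau: tanh(x) ~ 1 - 2 * exp(-2x).
x = 3.0
far_exact = math.tanh(x)
far_approx = 1.0 - 2.0 * math.exp(-2.0 * x)
```

Both pairs of values agree closely, which is the numerical counterpart of the two approximations quoted in the text.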
Thus, expression (3) is limited in fully describing a possible cross-over between a standard mild exponential growth and a super-exponential power law acceleration. Our tests (not shown) find that a fit with model (3) retrieves the pure power law model (1) with the same critical time $t_c$ and exponent $m$ and the same root-mean-square (r.m.s.) residual: the fit adjusts the parameter $\tau$ to a very large value, ensuring that the fit is always in the regime $t_c - t \ll \tau$, so that the hyperbolic tangent model reduces to the pure power law model. Thus, contrary to our initial hopes, this approach does not provide any additional insight.

Inspired by these tests, we could propose the following modified model

$$ I(t) = a + b\,e^{\mu t}\,(t_c - t)^{m} . \tag{7} $$

It has 5 adjustable parameters, like model (3), but it seems more flexible to describe the looked-for cross-over: for large $t_c - t$, the power law term $(t_c - t)^{m}$ changes slowly, especially for $0 < m < 1$ as is expected here; for small $t_c - t$, the power law term changes a lot while the exponential term is basically constant. But this model is correct for a critical point only if $m < 0$, so that $b > 0$; otherwise, if $0 < m < 1$, then $b < 0$ and, for $t_c - t$ large, the exponential term, which dominates, does not describe a growth but an exponentially accelerating decay. For $0 < m < 1$, we thus need a different formulation. We propose

$$ I(t) = a + b\,e^{\mu t} + c\,(t_c - t)^{m} . \tag{8} $$

We have fitted this formula to the data over the four periods 1983 - Oct. 2004, 1991 - Oct. 2004, 1983 - Mar. 2005 and 1991 - Mar. 2005. While the fits are reasonable, the critical time $t_c$ is found to overshoot to 2007-2008, which is a typical signature that the model is not predictive.

To conclude this first preliminary study, the presence of a bubble (faster-than-exponential growth) is confirmed but the determination of the end of this phase is for the moment unreliable.
3.3 Dependence of the growth rate on the index value

The monthly growth rate $g(t)$ of a given CSW index at time $t$ is defined by

$$ g(t) = \ln[p(t)/p(t-1)] , \tag{9} $$

where $p(t)$ is the price of that CSW index at time $t$. Figure 4 shows the evolution of the growth rates of the 27 CSW indexes from June-1983 to March-2005. While there are some variations, all 27 CSW indexes follow practically the same pattern. We clearly observe a large peak of growth over the period 2003-2005. Notice that this recent peak is much larger and more coherent than the previous one ending in 1991, which was followed by a price stabilization and even a price drop in certain cases. This figure stresses that the acceleration in growth rate is a very localized event which occurred essentially in 2003-2004, and that the subsequent growth rate has leveled off to pre-bubble levels. We can conclude that there was no bubble from 1990 to 2002, approximately, then a short-lived bubble until mid-2004, followed by a smooth transition back to normal.

Fig. 4. Evolution of the growth rates of the 27 regional CSW indexes from June-1983 to March-2005.

Fig. 5 plots the price growth rate $g(t)$ versus the price $p(t)$ itself for the 27 CSW indexes. A linear regression of the data points in Fig. 5, shown as the red straight line, gives a correlation coefficient of 0.494. If we perform a linear regression for each index, then we find an average correlation coefficient of 0.503 ± 0.036, confirming the robustness of this estimation of the correlation between growth rate and price level. The relation between $g$ and $p$ obtained from this correlation analysis is captured by the following regression

$$ g = 0.00922 \times \frac{p}{100} - 0.00747 . \tag{10} $$

In words, if $p$ is large, then $g$ is large on average, which confirms the concept of a positive feedback of price on its further growth. The continuous time limit of $g(t)$ defined by (9) is

$$ g(t) = \frac{d \ln p}{dt} . \tag{11} $$

This last equation together with (10), which we write as $g(t) = \alpha p - \beta$ (with $\alpha = 0.00922/100$ and $\beta = 0.00747$), implies the following ordinary differential equation

$$ \frac{dp}{dt} = \alpha p^{2} - \beta p , \tag{12} $$

which indeed gives a power law acceleration $p(t) \sim 1/(t_c - t)$ asymptotically close to the critical time $t_c$. Note that this critical time is determined by the initial conditions, and is called in mathematics a movable singularity. We conclude from this first analysis that the rough linear growth of the growth rate with price confirms the existence of a bubble growing faster than exponential according to an approximate power law. But of course, the exponent of this power law is poorly constrained, in particular because the growth rate $g(t)$ exhibits significant variability and, furthermore, nonlinearity, as can be seen in Fig. 5.

Fig. 5. Dependence of the growth rate $g$ on the price $p$ for all CSW indexes. The overall correlation coefficient is 0.494. The red line is the linear fit of the data points.

It is useful to refine this analysis by separating the whole time interval into three distinct intervals. The corresponding plot of the growth rate $g$ as a function of price is shown in Fig. 5 with different symbols: period 1 is from Jul. 1983 to Sept. 2003, period 2 is from Oct. 2003 to Sept. 2004, and period 3 is from Oct. 2004 to Mar. 2005. An anomaly can be clearly outlined, associated with the red dots which correspond to the anomalous peak in the growth rate in the period from Oct. 2003 to Sept. 2004. Notice also that the most recent time interval from Oct. 2004 to Mar. 2005 shows practically the same behavior as the first period before 2003. In other words, when removing the data in red for the period from Oct. 2003 to Sept. 2004, the growth rate $g(t)$ is practically independent of $p$, which characterizes the normal regime.
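To make the movable singularity of the differential equation for p(t) concrete, the sketch below uses its closed-form solution, obtained with the substitution u = 1/p, which turns the equation into the linear one du/dt = βu − α. The values of α and β are those quoted in the text; the starting level p0 = 300 and the function names are assumptions made for illustration only, and the result should not be read as a forecast, since the text stresses that the exponent is poorly constrained.

```python
import math

ALPHA = 0.00922 / 100   # slope of g versus p, as quoted in the text
BETA = 0.00747          # offset, as quoted in the text


def critical_time(p0, alpha=ALPHA, beta=BETA):
    """Months until dp/dt = alpha*p**2 - beta*p diverges, starting from p0."""
    if alpha * p0 <= beta:
        return math.inf          # at or below the threshold beta/alpha: no blow-up
    return math.log(alpha * p0 / (alpha * p0 - beta)) / beta


def price(t, p0, alpha=ALPHA, beta=BETA):
    """Closed-form solution p(t) = 1/u(t), with u = 1/p solving du/dt = beta*u - alpha."""
    u = alpha / beta + (1.0 / p0 - alpha / beta) * math.exp(beta * t)
    return 1.0 / u


p0 = 300.0                        # hypothetical starting index level
tc = critical_time(p0)            # depends on p0: the singularity is "movable"
eps = 1e-3                        # a short time (in months) before tc
near_tc = price(tc - eps, p0)     # close to 1 / (ALPHA * eps)
```

Changing p0 changes tc, and just before tc the solution behaves as 1/[α(t_c − t)], in line with the asymptotic power law stated above.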
We can thus conclude that this so-called “phase-portrait” of the growth rate versus price has clearly identified an anomalous time interval associated with extremely fast accelerating prices, followed by a more recent period where the price growth has resumed a more normal regime.

4 Yearly periodicity and intra-year structure

4.1 Yearly periodicity from superposed year analysis and spectral analysis

In Fig. 4, the time dependence of the monthly growth rate exhibits a clear seasonality (or periodicity), which appears visually to be predominantly a yearly phenomenon. This visual observation is made quantitative by performing a spectral Fourier analysis. The power spectrum of a typical CSW index is shown in Fig. 6 (all CSW indexes show the same power spectrum). Since the unit of time used here is one year, the frequency $f$ is in units of 1/year. A periodic behavior with period one year should translate into a peak at $f = 1$ plus all its harmonics $f = 2, 3, 4, \cdots$, which is indeed observed in Fig. 6. Note also that the spectrum has large peaks at $f = 4$ and $f = 8$ among the harmonics of $f = 1$, which indicates a weak periodicity with a period of one quarter. This is consistent with Fig. 7, where four oscillations in the averaged monthly growth rates can be observed.

Fig. 6. Spectrum analysis to confirm the strong periodicity in $g(t)$. Note that the power spectrum itself is periodic with a period of 12, which is the sampling frequency, equal to double the Nyquist frequency.

There are also many peaks in the low-frequency region (time scales larger than one year) close to $f = 0$, which are associated with the time scales of the global trends produced by the big peaks in $g(t)$ around year 2004 as well as around 1990.

To further explore this seasonal variability of the price growth rates, we calculate the averages of the growth rates for given months, where the average is performed over all years.
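A minimal sketch of these two diagnostics, the month-by-month average over all years and the spectral power at a chosen frequency in cycles per year, is given below. It assumes a gap-free monthly series whose index 0 is taken as the first month of the cycle and which covers at least one full year; the function names and the synthetic series are hypothetical and are not the CSW data.

```python
import cmath
import math


def monthly_means(growth, n_months=12):
    """Average growth[t] over all years for each month-of-year index t % n_months."""
    sums = [0.0] * n_months
    counts = [0] * n_months
    for t, g in enumerate(growth):
        sums[t % n_months] += g
        counts[t % n_months] += 1
    return [s / c for s, c in zip(sums, counts)]


def spectral_power(growth, cycles_per_year, n_months=12):
    """Periodogram value at a given frequency (in 1/year) of the demeaned series."""
    n = len(growth)
    mean = sum(growth) / n
    z = sum(
        (g - mean) * cmath.exp(-2j * math.pi * cycles_per_year * t / n_months)
        for t, g in enumerate(growth)
    )
    return abs(z) ** 2 / n


# Synthetic growth rates: ten years with a pure yearly cycle.
growth = [0.01 + 0.005 * math.cos(2.0 * math.pi * t / 12.0) for t in range(120)]
means = monthly_means(growth)
power_yearly = spectral_power(growth, 1.0)
power_off_peak = spectral_power(growth, 1.5)
```

For this synthetic series the monthly means reproduce the cosine profile and the spectral power is concentrated at one cycle per year.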
Consider for instance the month of January: we look up the growth rate for all the data over all years for the month of January and take the average. We do the same for each successive month. The result is shown in Figure 7 for two time periods, which gives the average growth rate $\langle g \rangle$ for different months of the year. The red dashed line and circles give the resultant $\langle g \rangle$ for all the data, and the black dashed line and triangles give the standard deviation $\sigma_g$ for all data (which is a measure of the variability from year to year and from zip code to zip code around the average). The difference between the two time periods is precisely the time interval from June 2003 to March 2005: this period is responsible for a significant increase of the average growth rate (compare the red dashed line (filled circles) with the red continuous line (open circles)) and an even larger increase of the variability (compare the black continuous line (filled triangles) with the dashed black line (open triangles)), again confirming the evidence of an anomalous behavior in that period. In 2005, it appears that the growth rate relaxed back to the normal level (according to the historical record).

Fig. 7. Monthly average growth rate (circles) and its standard deviation (triangles) as a function of the month within the year (horizontal axis: Jan to Dec). Dashed: results obtained over all 27 indexes over the period from Jun. 1983 to Mar. 2005; solid: results obtained over all 27 indexes over the period from Jun. 1983 to May 2003.

4.2 Yearly periodicity and intra-year structure with a scale and translation modulated model

Inspired by these results, we propose the following quantitative model. Consider a time $t$ in units of months. We write $t = 12T + m$, where $T$ is the year and $m$ is the month within that year and thus goes from 1 (January) to 12 (December).
For instance, $t = 26$ corresponds to $T = 2$ and $m = 2$ (February), while $t = 38$ corresponds to $T = 3$ and again the same month $m = 2$ (February) within the year. We propose to model the intra-year structure of the growth rate $g(t)$ together with possible yearly variations by the following expression

$$ g(t = 12T + m) = f(T)\,h(m) + j(T) . \tag{13} $$

In words, the growth rate has an intra-year structure $h(m)$ modulated from year to year in amplitude by $f(T)$, up to a possible overall translation $j(T)$ which can also vary from year to year. We can expect $f(T)$ and $j(T)$ to be approximately constant for most years, except around 1990 and 2004 for which we should see an anomaly in either or both of them, since these two periods had bubbles. Note that this model (13) gives an exact yearly periodicity if $f(T)$ and $j(T)$ are constant. A non-constant $f(T)$ describes an amplitude modulation of the yearly periodicity. In particular, we expect a strong peak around $T = 2004$. With this model, we can focus on predicting $f(T)$ and $j(T)$ only, because we have removed the complex intra-year structure.

We have thus fitted the model (13) to three subsets of the whole available time series for the growth rate $g(t)$ and also to the whole set taken globally, in order to test for the robustness of the model. For this, we use the cost function

$$ \sum_{T=1}^{T_{\max}} \sum_{m=1}^{12} \left[ g(t = 12T + m) - f(T)\,h(m) - j(T) \right]^{2} , \tag{14} $$

which is minimized with respect to the 12 unknown variables $h(1), ..., h(12)$ and the $2 \times T_{\max}$ variables $[f(1), j(1)], ..., [f(T_{\max}), j(T_{\max})]$. There are $12\,T_{\max}$ terms in the sum and $12 + 2 \times T_{\max}$ unknown variables. This shows that the system is well-constrained as soon as $T_{\max} \geq 2$. For instance, for $T_{\max} = 20$, we have 52 unknown variables to fit and 240 terms in the sum to constrain the fit.

Figure 8 illustrates the result of the fit of model (13) to the growth rate over the whole time interval from 1985 to 2005.
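One simple way to minimize the cost function (14) is alternating least squares: for a fixed intra-year profile h(m), each year's pair [f(T), j(T)] follows from a linear regression of that year's twelve growth rates on h, and for fixed f and j the profile h has a closed-form update. The sketch below illustrates this idea on synthetic data. It is an assumption of this example, not necessarily the minimization procedure used for the figures, and it fixes the scale and offset ambiguity between h and (f, j) by normalizing h to zero mean and unit variance (the input is assumed to have a non-constant monthly profile).

```python
def fit_seasonal_model(g, n_iter=20):
    """Alternating least squares for g[T][m] ~ f[T] * h[m] + j[T].

    g is a list of years, each a list of 12 monthly growth rates.
    h is normalised to zero mean and unit variance, which removes the
    scale and shift ambiguity between h and (f, j).
    """
    n_years, n_months = len(g), len(g[0])

    def normalise(h):
        mean = sum(h) / len(h)
        centred = [x - mean for x in h]
        std = (sum(x * x for x in centred) / len(h)) ** 0.5
        return [x / std for x in centred]

    def update_f_j(h):
        # With h centred and of unit variance, the per-year regression of
        # g[T] on h has slope mean(g[T] * h) and intercept mean(g[T]).
        f = [sum(row[m] * h[m] for m in range(n_months)) / n_months for row in g]
        j = [sum(row) / n_months for row in g]
        return f, j

    # Start from the superposed-year average of each month.
    h = normalise([sum(row[m] for row in g) / n_years for m in range(n_months)])
    for _ in range(n_iter):
        f, j = update_f_j(h)
        norm = sum(x * x for x in f)
        h = normalise([
            sum(f[T] * (g[T][m] - j[T]) for T in range(n_years)) / norm
            for m in range(n_months)
        ])
    f, j = update_f_j(h)
    return f, h, j


# Synthetic check: data generated exactly from the model (hypothetical values).
h_true = [0.1, -0.3, 0.5, 0.0, 0.8, -0.2, -0.4, 0.6, -0.1, -0.5, 0.2, 0.7]
f_true = [1.0, 2.0, 0.5, 3.0]
j_true = [0.01, -0.02, 0.0, 0.05]
g = [[f_true[T] * h_true[m] + j_true[T] for m in range(12)] for T in range(4)]
f, h, j = fit_seasonal_model(g)
max_error = max(abs(f[T] * h[m] + j[T] - g[T][m]) for T in range(4) for m in range(12))
```

Because of the normalization, the fitted f and j differ from f_true and j_true by the scale and mean of h_true, but the reconstructed growth rates and the year with the largest amplitude are unaffected.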
As expected, we can observe a clear peak in the amplitude $f(T)$ corresponding to the year 2004, while there is no appreciable peak around 1990. This means that the recent bubble appears significantly stronger than any other episode in the last 20 years and dwarfs them. The anomalous nature of the recent bubble is reinforced by the existence of a peak in $j(T)$ for the same year 2004, showing that both the amplitude and translation components of the growth rates were completely anomalous in 2004.

The middle graph of the top panel of figure 8 shows the intra-year pattern captured by the model, which is in remarkable agreement with the pattern shown in figure 7: one can observe a peak in March, May, August and December, the largest peak being in May. The bottom panel of figure 8 shows visually how well (or badly) the model fits the actual data. The quality of the fit is excellent, except in 2004-2005. In other words, we clearly identify a very anomalous or exceptional behavior in 2004-2005, again providing a confirmation that something exceptional or anomalous occurred during that period.

Fig. 8. Upper panels: three graphs showing the three functions $f(T)$, $h(m)$ and $j(T)$ fitted on the growth rate over the whole time interval from 1985 to 2005. Lower panel: Comparison between the growth rate data (empty blue circles) and the model (13) (red line).

Figure 9 is the same as figure 8 for the period from 1985 to 1990. One can clearly observe here a peak in the scaling amplitude $f(T)$ at $T = 1988$ and in the translation term $j(T)$ at $T = 1986$, suggesting that the first bubble of the 1985-2005 period occurred over a relatively large time period, 1985-1990, with two successive contributions.
The intra-year structure $h(m)$ also has its peaks in March, May, August and December, but this intra-year structure is weaker than for other sub-periods. The lower panel of figure 9 shows that the model captures very well the overall trend as well as the intra-year structure. The main discrepancies are in the amplitude of the large peaks and valleys, which are not fully predicted.

Fig. 9. Same as figure 8 for the period from 1985 to 1990.

Figure 10 is the same as figure 8 for the period from 1991 to 2000. One can clearly observe here a peak in the scaling amplitude $f(T)$ at $T = 1995$ and in the translation term $j(T)$ at $T = 1994$. This thus identifies a small bubble in the mid-1990s. The intra-year structure $h(m)$ also has its peaks in March, May, August and December, with very large amplitudes. The lower panel of figure 10 shows a truly excellent fit.

Fig. 10. Same as figure 8 for the period from 1991 to 2000.

Figure 11 is the same as figure 8 for the period from 2001 to 2005. One can clearly observe here a peak in the scaling amplitude $f(T)$ at $T = 2004$ and in the translation term $j(T)$ also at $T = 2004$. This thus clearly identifies the bubble as peaking in 2004. The intra-year structure $h(m)$ also has its peaks in March, May, August and December, with very large amplitudes and very good agreement with the other three figures.
The lower panel of figure 11 shows an excellent fit up to early 2003 and then a rather large discrepancy from early 2003 all the way to the last data point approaching mid-2005. In particular, note that the intra-year structure is washed out by the anomalous growth rate culminating in mid-2004. Symmetrically, the intra-year structure is also absent in the fast decay of the growth rate back to normal. We do not have enough data to ascertain whether the growth rate has resumed its normal intra-year pattern. We believe that this is a very important diagnostic for characterizing abnormal behavior and that it could be a very useful variable to monitor on a monthly basis.

Fig. 11. Same as figure 8 for the period from 2001 to 2005.

The four figures 8-11 validate model (13): in particular, they show the very robust intra-year structure with peaks in March, May, August and December. One possible contribution to this quarterly periodicity comes from the construction of the CSI: the monthly indexes use a three-month moving average algorithm. Home sales pairs are accumulated in rolling three-month periods, on which the repeat sales methodology is applied. The index point for each reporting month is based on sales pairs found for that month and the preceding two months. For example, the December 2005 index point is based on repeat sales data for October, November and December of 2005. This averaging methodology is used to offset delays that can occur in the flow of sales price data from county deed recorders and to keep sample sizes large enough to create meaningful price change averages. A three-month rolling window construction corresponds in general to a convolution of the bare price with a kernel which possesses a three-month periodicity (or size).
The Fourier transform of a convolution is the product of the Fourier transforms. Thus the spectrum of the signal should contain the peaks of the Fourier spectrum of the kernel, which by construction contains a peak at three months. However, our synthetic tests (not shown) suggest that this effect is far too small to explain the strong amplitude of the observed quarterly periodicity. It would be important to understand why such intra-year structure develops: is it the result of a natural intra-year organization of buyers' behaviors associated with tax or income constraints, a problem of reporting, or perhaps the effect of other calendar regularities? Or is it the result of patterns coming from the supply part of the equation, namely home-builders, developers, and perhaps the time modulation of the rates of allocated permits? Answering these questions is important to determine how much emphasis one should give to these results. But if the intra-year structure is indeed a genuine non-artificial phenomenon, we believe that it offers a remarkable opportunity for monitoring in real time the normal versus abnormal evolution of the market and also for developing forecasts on a monthly time horizon.

4.3 Intra-year pattern from signs of growth rate increments

The existence of a strong and robust intra-year structure in the price growth rate can be further demonstrated by studying the sign of g(t + 1) − g(t). A positive (negative) sign means that the growth rate tends to increase (decrease) from one month to the next. Based on the seasonality of the growth rate, we are able to answer the following question: given the current growth rate g(t), will the growth rate increase or decrease at time t + 1? This amounts to asking what the sign of g(t + 1) − g(t) is. Technically, we construct the (unconditional) number of times the sign of the increment g(t + 1) − g(t) is positive or negative, irrespective of the value of g(t). From Fig. 4, we obtain a sequence of signs: −−+−+−−+−−++.
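The sign statistics used in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the input array `g` of monthly growth rates and the convention of attributing the increment g(t + 1) − g(t) to the calendar month of t are assumptions made for the example. The second function simulates the white-noise benchmark of Sornette and Andersen (2000) cited in this section, using the rule "predict a decrease when the current value is above the median, an increase otherwise" as an assumed reading of that benchmark.

```python
import numpy as np

def sign_percentages_by_month(g, first_month):
    """Percentage of positive / negative increments g(t+1) - g(t) per calendar month.

    g           : 1-D sequence of monthly growth rates (hypothetical input).
    first_month : calendar month (1..12) of g[0].
    The increment g(t+1) - g(t) is attributed to the month of t (an assumption).
    Returns two length-12 arrays (positive %, negative %); NaN where no data.
    """
    g = np.asarray(g, dtype=float)
    inc = np.diff(g)                                       # g(t+1) - g(t)
    month = (first_month - 1 + np.arange(inc.size)) % 12   # 0 = Jan, ..., 11 = Dec
    pos = np.full(12, np.nan)
    neg = np.full(12, np.nan)
    for m in range(12):
        sel = inc[month == m]
        if sel.size:
            pos[m] = 100.0 * np.mean(sel > 0)
            neg[m] = 100.0 * np.mean(sel < 0)
    return pos, neg

def white_noise_baseline(n=200_000, seed=0):
    """Success rate of the median rule on simulated white noise.

    Predict an increase when x(t) is below the sample median and a
    decrease otherwise, then compare with the realized sign.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    inc = np.diff(x)
    predicted_up = x[:-1] < np.median(x)
    return np.mean(predicted_up == (inc > 0))

if __name__ == "__main__":
    # Toy alternating series starting in January, for illustration only.
    pos, neg = sign_percentages_by_month([0.0, 1.0] * 12, first_month=1)
    print("positive % per month:", pos)
    print("white-noise success rate:", white_noise_baseline())
```

On simulated white noise the success rate should come out close to the 75% benchmark, which is the reference level against which the per-month percentages are judged.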
For each month, we calculate the percentage of positive and negative signs, respectively. The second and third rows of Table 1 give the percentage of positive and negative signs for each month. The fourth and fifth rows give the dominant sign and the associated percentage. For instance, the table says that the “probability” of the sign of g(t = Feb) − g(t = Jan) being “-” is about 92.1%. If we know g(t = Jan), we can say that it is very probable that the growth rate of February will be less than this January value. Thus, this table has predictive power in the sense that the probabilities of predicting the signs correctly are much higher than the value of 75% obtained under the null hypothesis that g(t) is a white noise process (see Sornette and Andersen, 2000). This table is another way to rephrase and expand on our preceding analysis of the yearly periodicity by identifying a very strong and robust intra-year structure.

Table 1. Analysis of the signs of g(t + 1) − g(t). The second and third rows give the percentage of positive and negative signs for each month. The fourth and fifth rows give the sign that dominates for each month and the associated percentage.

Mon    Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
+%     7.91  17.2  88.0  5.64  97.7  8.47  8.47  91.4  6.57  8.92  84.2  82.2
-%     92.1  82.8  12.0  94.4  2.29  91.5  91.5  8.59  93.4  91.1  15.8  17.8
sign   -     -     +     -     +     -     -     +     -     -     +     +
%      92.1  82.8  88.0  94.4  97.7  91.5  91.5  91.4  93.4  91.1  84.2  82.2

Since our initial analysis, performed in the summer of 2005 using data up to March 2005, new data for the 27 CSW indexes have become available, covering the interval from Apr. 2005 to Sept. 2006. It is very interesting to check whether the signs of the growth variations obtained in Table 1 using the data until March 2005 still apply to the new data. The realized signs of the newly available months are calculated and the sequence of signs is the following: - (Apr. 2005, 27 CSW indexes out of 27), + (May 2005, 27 out of 27), - (Jun.
2005, 27 out of 27), - (Jul. 2005, 27 out of 27), + (Aug. 2005, 27 out of 27), - (Sep. 2005, 27 out of 27), - (Oct. 2005, 27 out of 27), + (Nov. 2005, 27 out of 27), + (Dec. 2005, 21 out of 27), - (Jan. 2006, 27 out of 27), - (Feb. 2006, 27 out of 27), + (Mar. 2006, 27 out of 27), - (Apr. 2006, 27 out of 27), + (May 2006, 27 out of 27), - (Jun. 2006, 27 out of 27), - (Jul. 2006, 27 out of 27), + (Aug. 2006, 27 out of 27), and - (Sep. 2006, 27 out of 27). Thus, Table 1 predicts exactly the signs of the growth rate variations of all 27 CSW indexes for all months except Dec. 2005, for which there are 6 errors: Table 1 predicts that the growth rate variation from Dec. 2005 to Jan. 2006 should be +, which is correct for 21 CSW indexes out of 27, corresponding to a success ratio of about 78% (close to the white noise case). This score is slightly lower than the previously estimated probability of 82.2% for the month of December, which is the lowest among all months. Overall, the success rate is remarkably high, adding further evidence that the Las Vegas property market has returned to a more normal phase (no bubble from April 2005 to Sept. 2006).

5 Predicting the monthly growth rate

Conditional on the evidence that the anomalous faster-than-exponential growth has ended, let us attempt to predict the future evolution of the CSW indexes based only on the strong seasonality of the growth rate. Figure 12 presents the predictions one year ahead for the 27 regional CSW indexes. Two different prediction schemes are used. The red lines are based on the average growth rate obtained from all 27 indexes, while the magenta lines are based on the average growth rate obtained from the individual index under investigation. There is no discernible difference between the two. A similar prediction of the Clark County (Las Vegas MSA) indexes (NVC003Q and NVC003C) has also been made using the average growth rates obtained from all 27 regional indexes.
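A seasonality-only forecast of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the growth rate is taken to be the log-increment g(t) = ln I(t + 1) − ln I(t), the seasonal profile is the plain average of g over each calendar month, and the function name `seasonal_forecast` and its arguments are hypothetical. The input must cover every calendar month at least once (13 or more monthly index values).

```python
import numpy as np

def seasonal_forecast(index, first_month, horizon=12):
    """Extrapolate a monthly index using only its average seasonal growth rate.

    index       : 1-D sequence of monthly index values (hypothetical input).
    first_month : calendar month (1..12) of index[0].
    horizon     : number of months to predict ahead.
    Assumes g(t) = ln I(t+1) - ln I(t), attributed to the month of t.
    """
    index = np.asarray(index, dtype=float)
    g = np.diff(np.log(index))
    month = (first_month - 1 + np.arange(g.size)) % 12   # 0 = Jan, ..., 11 = Dec
    # Average growth rate for each calendar month (the seasonal profile).
    g_avg = np.array([g[month == m].mean() for m in range(12)])
    # Calendar month of the last observed index point.
    last_month = (first_month - 1 + index.size - 1) % 12
    future_months = (last_month + np.arange(horizon)) % 12
    # Accumulate the seasonal growth rates starting from the last observation.
    log_path = np.log(index[-1]) + np.cumsum(g_avg[future_months])
    return np.exp(log_path)

if __name__ == "__main__":
    # Toy index growing 1% per month, starting in June, for illustration only.
    toy = 100.0 * 1.01 ** np.arange(36)
    print(seasonal_forecast(toy, first_month=6))
```

Replacing `g_avg` by a profile averaged over several indexes would mimic the pooled scheme, while computing it from the index itself mimics the individual scheme; both choices are left to the caller in this sketch.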
Since these two indexes are only available from July 2000 to March 2005, we do not have enough data to calculate the average growth rates using the indexes themselves. The results are shown in Fig. 13.

Fig. 12. Predicting regional CSW indexes one year ahead. Red lines: Prediction using average growth rate obtained from all 27 indexes; Magenta lines: Prediction using average growth rate obtained from the individual index under investigation. The two kinds of prediction are almost indistinguishable.

Fig. 13. Predicting Clark County (Las Vegas MSA) indexes (NVC003Q and NVC003C) one year ahead. The curves show the raw data and the prediction for NVC003C and for NVC003Q.

6 Conclusion

We have analyzed 27 house price indexes of Las Vegas from Jun. 1983 to Mar. 2005, corresponding to 27 different zip codes. These analyses confirm the existence of a real-estate bubble, defined as a price acceleration faster than exponential. This bubble is found, however, to be confined to a rather limited time interval in the recent past, from approximately 2003 to mid-2004, and has progressively transformed into a more normal growth rate in 2005. The data up to mid-2005 suggest that the current growth rate has now come back to pre-bubble levels. We conclude that there has been no bubble from 1990 to 2002 except for a medium-sized surge in 1995, then a short-lived but very strong bubble until mid-2004, which has been followed by a smooth transition back to what appears to be normal. It thus seems that, while the real-estate bubble was very strong over the period 2003-2004, the price appreciation rate has basically returned to normal. In addition, we have identified a strong yearly periodicity which provides a good potential for fine-tuned prediction from month to month.
As the intra-year structure is likely a genuine non-artificial phenomenon, it offers a remarkable opportunity for monitoring in real time the normal versus abnormal evolution of the market and also for developing forecasts on a monthly time horizon. In particular, monthly monitoring using the model that we have developed here could confirm, by testing the intra-year structure, whether the market has indeed returned to “normal” or whether more turbulence is expected ahead. In addition, it would provide a real-time observatory of upsurges and other anomalous behavior at the monthly scale. This requires additional technical developments and tests beyond this report. Compared with the previous analyses of Zhou and Sornette (2003, 2006) at the scale of states and whole regions (northeast, midwest, south and west), the present analysis demonstrates the existence of very significant variations at the local scale, in the sense that the bubble in Las Vegas seems to have preceded the more global USA bubble and has ended approximately two years earlier (mid-2004 for Las Vegas compared with mid-2006 for the whole of the USA).

References

Bailey, M., Muth, R., Nourse, H., 1963. A regression method for real estate price index construction. Journal of the American Statistical Association 58, 933–942.
Bender, C., Orszag, S. A., 1978. Advanced Mathematical Methods for Scientists and Engineers. McGraw-Hill, New York.
Broekstra, G., Sornette, D., Zhou, W.-X., 2005. Bubble, critical zone and the crash of Royal Ahold. Physica A 346, 529–560.
Case, K. E., Shiller, R. J., 1987. Prices of single-family homes since 1970: New indexes for four cities. New England Economic Review Sep/Oct, 45–56.
Case, K. E., Shiller, R. J., 1989. The efficiency of the market for single-family homes. American Economic Review 79, 125–137.
Case, K. E., Shiller, R. J., 1990. Forecasting prices and excess returns in the housing market. AREUEA J. 18, 253–273.
Case, K. E., Shiller, R. J., 2003.
Is there a bubble in the housing market? Brookings Papers on Economic Activity (2), 299–362.
Greenspan, A., 1997. Federal Reserve's semiannual monetary policy report, before the Committee on Banking, Housing, and Urban Affairs, U.S. Senate, February 26.
Himmelberg, C., Mayer, C., Sinai, T., September 2005. Assessing high house prices: Bubbles, fundamentals, and misperceptions. Tech. Rep. Staff Report no. 218, Federal Reserve Bank of New York.
Jorion, P., 2005. Is housing market surge really sustainable? The Wall Street Journal September 22, A17.
Mauboussin, M. J., Hiler, R., 1999. Rational Exuberance? Equity research report of Credit Suisse First Boston, January 26.
Roehner, B. M., 2006. Real estate price peaks: A comparative overview. Evolutionary and Institutional Economics Review in press, physics/0605133.
Shiller, R. J., 2000. Irrational Exuberance. Princeton University Press, New York.
Shiller, R. J., 2006. Long-term perspectives on the current boom in home prices. Economists’ Voice 3(4), Art. 4.
Smith, M. H., Smith, G., 2006. Bubble, bubble, where’s the housing bubble? Brookings Papers on Economic Activity (1), 1–67.
Sornette, D., 2003. Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press, Princeton.
Sornette, D., Andersen, J.-V., 1998. Scaling with respect to disorder in time-to-failure. European Physical Journal B 1, 353–357.
Sornette, D., Andersen, J.-V., 2000. Increments of uncorrelated time series can be predicted with a universal 75% probability of success. International Journal of Modern Physics C 11, 713–720.
Zhou, W.-X., Sornette, D., 2003. 2000-2003 real estate bubble in the UK but not in the USA. Physica A 329, 249–263.
Zhou, W.-X., Sornette, D., 2006. Is there a real-estate bubble in the US? Physica A 361, 297–308.

Biographies: Wei-Xing Zhou is a Professor of Finance at the School of Business in the East China University of Science and Technology.
He received his PhD in Chemical Engineering from the East China University of Science and Technology in 2001. His current research interest focuses on the modeling and prediction of catastrophic events in complex systems. Didier Sornette holds the Chair of Entrepreneurial Risks at the Department of Management, Technology and Economics of ETH Zurich. He received his PhD in Statistical Physics from the University of Nice, France. His current research focuses on the modeling and prediction of catastrophic events in complex systems, with applications to finance, economics, seismology, geophysics and biology. Introduction Conceptual background of our empirical analysis Humans as social animals and herding Definition and mechanism for bubbles Was there a bubble? Status of the argument based on the ratio of cost of owning versus cost of renting Regime shift in the CSW Zip Code Indexes of Las Vegas Description of the data Power law fits Dependence of the growth rate on the index value Yearly periodicity and intra-year structure Yearly periodicity from superposed year analysis and spectral analysis Yearly periodicity and intra-year structure with a scale and translation modulated model Intra-year pattern from signs of growth rate increments Predicting the monthly growth rate Conclusion ABSTRACT We analyze 27 house price indexes of Las Vegas from Jun. 1983 to Mar. 2005, corresponding to 27 different zip codes. These analyses confirm the existence of a real-estate bubble, defined as a price acceleration faster than exponential, which is found however to be confined to a rather limited time interval in the recent past from approximately 2003 to mid-2004 and has progressively transformed into a more normal growth rate comparable to pre-bubble levels in 2005. There has been no bubble till 2002 except for a medium-sized surge in 1990. In addition, we have identified a strong yearly periodicity which provides a good potential for fine-tuned prediction from month to month.
A monthly monitoring using a model that we have developed could confirm, by testing the intra-year structure, if indeed the market has returned to ``normal'' or if more turbulence is expected ahead. We predict the evolution of the indexes one year ahead, which is validated with new data up to Sep. 2006. The present analysis demonstrates the existence of very significant variations at the local scale, in the sense that the bubble in Las Vegas seems to have preceded the more global USA bubble and has ended approximately two years earlier (mid 2004 for Las Vegas compared with mid-2006 for the whole of the USA). <|endoftext|><|startoftext|> Introduction Hermitian Codes and Syndrome Computation Lemma 1 Definition 1 Lemma 2 Systematic Encoding Lemma 3 Algorithm 1 Algorithm 2 Lemma 5 Algorithm 3 Efficient Implementation of a Systematic Encoder Module A: Multiplication with Matrix A, A' Module B: Multiplication with Matrix A-1, A'-1 Module C: Systematic Encoding of Codes Ei Module D: Systematic Encoding of Codes Dl Encoder Final Remarks Acknowledgement References ABSTRACT We present an algorithm for systematic encoding of Hermitian codes. For a Hermitian code defined over GF(q^2), the proposed algorithm achieves a run time complexity of O(q^2) and is suitable for VLSI implementation. The encoder architecture uses as main blocks q varying-rate Reed-Solomon encoders and achieves a space complexity of O(q^2) in terms of finite field multipliers and memory elements. <|endoftext|><|startoftext|> Quantum criticality and disorder in the antiferromagnetic critical point of NiS2 pyrite N. Takeshita1, S. Takashima2, C. Terakura1, H. Nishikubo2, S. Miyasaka3, M. Nohara2, Y. Tokura1,4, and H. 
Takagi1,2
1Correlated Electron Research Center (CERC), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-8562, Japan
2Department of Advanced Materials, University of Tokyo, Kashiwa 277-8561, Japan
3Department of Physics, Osaka University, Toyonaka 560-0043, Japan
4Department of Applied Physics, University of Tokyo, Tokyo 113-8656, Japan
(Dated: November 11, 2021)

A quantum critical point (QCP) between the antiferromagnetic and the paramagnetic phases was realized by applying a hydrostatic pressure of ∼ 7 GPa on single crystals of NiS2 pyrite with a low residual resistivity, ρ0, of 0.5 µΩcm. We found that the critical behavior of the resistivity, ρ, in this clean system contrasts sharply with that observed in its disordered analogue, the NiS2−xSex solid solution, demonstrating the unexpectedly drastic effect of disorder on the quantum criticality. Over the whole paramagnetic region investigated up to P = 9 GPa, a crossover temperature, defined as the onset of the T^2 dependence of ρ, an indication of a Fermi liquid, was suppressed to a substantially low temperature T ∼ 2 K and, instead, a non-Fermi liquid behavior of ρ, a T^{3/2} dependence, robustly showed up.

PACS numbers:

A hallmark of strongly correlated electron systems is the presence of a rich variety of phases often competing with each other. When two phases meet each other in the T = 0 limit by tuning a phase controlling parameter such as pressure or chemical doping, quantum fluctuations often give rise to exotic states of electrons, which have been attracting considerable interest in condensed matter research [1]. One of the most fascinating cases is a breakdown of the Fermi liquid at magnetic quantum critical points (QCP) in itinerant magnets, which has been believed to be captured by self-consistent renormalization theory [2] and scaling analysis [3, 4].
The onset temperature of Fermi liquid coherence, probed by a quadratic temperature dependence of resistivity ρ(T) ∝ T^2, is predicted to be suppressed by the presence of low lying spin fluctuations near the QCP and, right at the QCP, a non-Fermi liquid ground state is realized which manifests itself as a non-trivial power law behavior of resistivity ρ(T) ∝ T^n down to the T = 0 limit, where n = 3/2 for an antiferromagnetic (AF) QCP and n = 5/3 for a ferromagnetic (F) QCP [5]. A V-shaped recovery of Fermi liquid behavior, T^2 resistivity, around the QCP is anticipated as a function of the phase tuning parameter.

The critical behavior of ρ(T) near the AF QCP in the NiS2−xSex solid solution is a textbook demonstration of standard theories for QCP. NiS2 crystallizes in the pyrite structure. Ni is divalent and therefore accommodates two electrons in doubly degenerate eg orbitals (t2g^6 eg^2). Due to a strong onsite Coulomb repulsion among Ni eg electrons, the system is an S = 1 Mott insulator [6, 7]. By substituting S with Se, the effective bandwidth can be increased due to the increase of p-d hybridization. With increasing x in NiS2−xSex, the system experiences a weakly first order transition to an AF metal with a collinear spin structure [8] at x ∼ 0.4 and then a second order transition to a paramagnetic metal at x = 1.0.

In the AF metal phase of NiS2−xSex, the AF transition temperature is TN ∼ 90 K at x = 0.5 and, with increasing x, continuously goes down to T = 0 at x = 1.0, giving rise to a well defined AF QCP. The T^{3/2} dependence of ρ(T), expected for an AF QCP, is observed at least down to 1.7 K at x = 1.0. On going away from x = 1.0, the T^2 behavior of ρ(T) quickly recovers and a V-shaped region with T^2 resistivity is identified around x = 1.0. It is known that the application of pressure is equivalent to Se substitution in that it increases the bandwidth.
By applying pressure P on an AF metal NiS1.3Se0.7, suppression of TN analogous to Se substitution was indeed observed and, eventually, the QCP was approached with P = 2 GPa [9]. The phase diagram and the critical behavior of the resistivity in pressurized NiS1.3Se0.7 were essentially identical, with the Se content x simply replaced with P.

Recently, however, there has been growing evidence that the above mentioned textbook picture is violated in a variety of intermetallic systems. In a helical magnet MnSi [10] and a weak ferromagnet ZrZn2 [11], a non-trivial power law behavior of resistivity, ρ(T) ∝ T^{3/2}, dominates the resistivity down to a very low temperature, not only right at the QCP but also over a wide range of the paramagnetic phase. At the AF QCP in CePd2Si2, with increasing sample purity, the exponent of the power law resistivity was found to deviate significantly from the standard value of 3/2 [12]. The common feature among these systems is that they are clean, with a low residual resistivity of ρ0 < 1 µΩcm [10, 11]. In contrast, in the NiS2−xSex solid solution, where the textbook example of critical behavior of ρ(T) is observed, Se substitution inherently gives rise to disorder. Indeed, ρ0 of NiS2−xSex around the QCP is as large as several tens of µΩcm.

Questions immediately arise. Does the non-trivial behavior in the intermetallics represent a generic property of magnetic QCP in clean systems? Does the standard behavior of QCP show up only when the system is disordered? To tackle these questions experimentally, we attempted to realize a “clean” QCP in NiS2−xSex. The parent compound of NiS2−xSex, NiS2, is pure and presumably clean. If one can approach the QCP of pure NiS2 by pressure without relying on Se substitution, a clean analogue of the AF QCP in NiS2−xSex can be explored and the impact of disorder on the QCP can be captured. Recent progress in high pressure techniques enabled us to do so.
In this Letter, we address the issue of criticality and disorder by examining the critical behavior of the resistivity of pure NiS2 under pressure. The AF QCP of NiS2 was reached at ∼ 7 GPa, where the system was found to be very clean, with a low residual resistivity ρ0 of ∼ 0.5 µΩcm. Not only right at the QCP but over the entire range of the paramagnetic phase investigated, the recovery of the Fermi liquid T^2 behavior of ρ(T) is suppressed substantially to a very low temperature below ∼ 2 K and non-Fermi liquid behavior with a T^{3/2} dependence of ρ(T) dominates. We demonstrate the drastic influence of disorder on this AF QCP by contrasting the result with previous pressure work on NiS1.3Se0.7 with a residual resistivity of 60 µΩcm [9].

The NiS2 sample used in this study was prepared by a vapor transport technique. The resistivity measurement was performed by a conventional four probe technique under hydrostatic pressure up to ∼ 10 GPa in a cubic anvil type pressure system down to 3 K and also in a modified Bridgman-type pressure cell down to 180 mK. The results obtained by the two different pressure setups agreed reasonably well in the temperature range of overlap, indicating a very good homogeneity of pressure. Pressure was calibrated by measuring the superconducting transition temperature of Pb [13].

The inset of Fig. 1 shows ρ(T) of NiS2 at relatively low pressures below 4 GPa. With applied pressure, the insulating behavior of ρ(T) switches into metallic behavior, indicating the occurrence of a metal-insulator transition. Between 2.6 and 3.4 GPa, we observe a discontinuous jump of resistivity as a function of temperature, which corresponds to a first order metal-insulator transition line on the phase diagram in Fig. 1. The discontinuous jump appears to terminate around 200 K, indicating the presence of a critical end point. In the phase diagram of the NiS2−xSex solid solution, the first order phase line terminates at a much lower temperature and is hard to identify [14].
This difference appears to suggest a strong influence of disorder on the Mott criticality.

As seen in the inset of Fig. 1, ρ(T) of pure NiS2 showed metallic behavior above P = 2.6 GPa. The residual resistivity at the critical point was as low as ∼ 0.5 µΩcm, demonstrating that the system is indeed very clean. Magnetic ordering in the AF metal phase manifests itself as a very weak but sharp kink in ρ(T) at TN, as indicated by the arrows. The antiferromagnetic transition temperature TN thus determined decreases systematically with pressure and approaches T = 0 somewhere around 7-7.5 GPa.

FIG. 1: The electronic phase diagram of clean NiS2 pyrite as a function of pressure. PM and AFM denote paramagnetic metal and AF metal, respectively. The inset shows the temperature dependent resistivity under pressures, P = 0 - 3.4 GPa (left; curves at 0, 2.6, 3.0, 3.2, 3.3, 3.36 and 3.4 GPa) and P = 4.0 - 7.5 GPa (right; curves at 4.0, 5.0, 6.2 and 7.5 GPa).

No superconductivity was observed between P = 6 and 9.1 GPa down to 180 mK, in spite of the low residual resistivity. This appears to suggest that realizing an AF QCP in clean systems alone is not enough to achieve exotic superconductivity as observed in heavy Fermion compounds [15, 16, 17, 18] and that additional ingredients such as Kondo physics must be invoked.

The pressure dependence of TN, determined by the kink in ρ(T), together with the first order metal-insulator transition, is summarized as a phase diagram in Fig. 1. TN appears to decrease almost linearly, in contrast to the (Pc − P)^{2/3} dependence expected from self-consistent renormalization theory [5]. An unusual linear suppression of the magnetic transition temperature was also observed analogously for a helical magnet MnSi [10] and a weak ferromagnet ZrZn2 [11] when the sample is very clean.
It is interesting to note that, in these clean systems, the magnetic transition as a function of pressure is reported to be first order rather than second order. In the clean NiS2, we cannot rule out the possibility of a first order transition at this stage, because ρ(T) is not very sensitive to TN near the critical point.

The signature of AF criticality in this clean system was explored. The inset of Fig. 2 shows ρ(T) below 30 K, plotted as ρ vs. T^{3/2}. In the antiferromagnetic phase at P = 6.2 GPa, the ρ vs. T^{3/2} curve is linear above TN but shows superlinear behavior below TN. The temperature dependence below TN is found to be approximately T^2, indicative of the formation of coherent quasiparticles. In the paramagnetic phase above ∼ 7 GPa, however, the ρ vs. T^{3/2} curve shows a linear behavior down to very low temperature, which is expected for the antiferromagnetic critical point due to low lying spin fluctuations. It is remarkable to see T^{3/2} behavior characteristic of the antiferromagnetic critical point over such a wide range of pressure, from ∼ 7 GPa to ∼ 9 GPa. ρ(T) is surprisingly insensitive to the pressure in the paramagnetic region above 7 GPa and it is hard to find a signature of criticality. This is analogous to the behavior observed in a helical magnet MnSi [10] and a weak ferromagnet ZrZn2 [11] when the sample is clean, implying that unusual critical behavior in clean systems is ubiquitous.

FIG. 2: Temperature dependent part of resistivity ρ − ρ0 as a function of temperature under pressure above ∼ 7 GPa, where the system is paramagnetic, plotted as log(ρ − ρ0) vs. log T. The inset shows the ρ vs. T^{3/2} plot.

To investigate the details of the unusual temperature dependence in the paramagnetic phase further, we plotted the temperature dependence of ρ − ρ0 as log(ρ − ρ0) vs. log T in the main panel of Fig. 2.
The residual resistivity ρ0 was determined by extrapolating the ρ vs. T^2 curve to the T = 0 limit. We note here that the temperature dependent part ρ − ρ0 is comparable to ρ0 at ∼ 3 K and, therefore, the ambiguity originating from the estimate of ρ0 does not influence the temperature dependence of ρ − ρ0, at least above 1 K. It is again clear that the slope is smaller than 2 and instead close to 3/2 above ∼ 2 K. At the lowest temperatures below ∼ 2 K, however, the slope becomes steeper and the T^2 resistivity appears to recover eventually. This crossover temperature to T^2 resistivity is again insensitive to pressure and is always 2-3 K up to 9 GPa. Close inspection of the data indicates that the crossover temperature is lowest around 7.5 GPa, but only slightly lower than at the other pressures.

FIG. 3: Contour plot of the exponent n of the power law dependence of resistivity on the pressure-temperature plane, demonstrating the criticality observed in the temperature dependence of resistivity. The main panel is data for clean NiS2 and the inset shows data for dirty NiS2−xSex (NiS1.3Se0.7). The Néel temperature determined by the resistivity anomaly is shown by the white line.

This strong suppression of the crossover to T^2 behavior in ρ(T), over a remarkably wide range of pressure from ∼ 7 GPa up to ∼ 9 GPa, contrasts sharply with the observation in Se-doped samples, where the recovery of T^2 resistivity was clearly observed not only on the magnetic side but also on the paramagnetic side. As a function of Se doping, a metal-insulator transition and an antiferromagnetic critical point occur at around Se content x = 0.4 and 1.0, respectively, while under pressure ∼ 2.5 GPa and ∼ 7 GPa are required to reach the metal-insulator transition and the QCP, respectively. This yields a conversion ratio of phase controlling parameters of ∼ 0.15 Se per 1 GPa. In this disordered NiS2−xSex, the T^{3/2} dependence of ρ(T) dominates at the QCP of x = 1.0.
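As a quick check of the conversion ratio quoted above, the two reference points (metal-insulator transition and QCP) can be compared directly. This is only a back-of-the-envelope reading of the numbers in the text, assuming a simple proportionality between Se content x and pressure P; the subscripts MI and QCP are labels introduced here for the two reference points.

```latex
\frac{x_{\mathrm{MI}}}{P_{\mathrm{MI}}} \approx \frac{0.4}{2.5\ \mathrm{GPa}} = 0.16\ \mathrm{GPa}^{-1},
\qquad
\frac{x_{\mathrm{QCP}}}{P_{\mathrm{QCP}}} \approx \frac{1.0}{7\ \mathrm{GPa}} \approx 0.14\ \mathrm{GPa}^{-1},
\qquad
\Delta P \approx \frac{\Delta x}{0.15\ \mathrm{GPa}^{-1}}
\;\Rightarrow\;
\Delta x = 0.33 \ \text{gives}\ \Delta P \approx 2.2\ \mathrm{GPa}.
```

The two ratios bracket the quoted value of ∼ 0.15 Se per GPa, and a change of Se content of 0.33 maps onto roughly 2 GPa of additional pressure.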
With further doping of Se up to x = 1.33, which is equivalent to an additional pressure of ≃ 2 GPa, however, the T^2 resistivity is fully recovered and can be observed below ∼ 80 K [9]. Analogously, in a Se doped NiS1.3Se0.7 crystal under pressure, on going from the QCP at P ∼ 2 GPa to P = 4 GPa, T^2 resistivity recovers quickly and shows up below 80 K with an increase of 2 GPa. These should be compared with the low crossover temperature of 2-3 K in pure NiS2 approximately 2 GPa above the critical point.

To visually illustrate these points, we plotted the exponent n of the power law dependence of ρ(T) as a contour map on the pressure-temperature plane in Fig. 3. The exponent was determined by taking the derivative of the log(ρ − ρ0) vs. log T curve in Fig. 2. It is clear that the V-shaped recovery of Fermi liquid behavior around the QCP is absent in clean NiS2. The recovery can be seen only on the antiferromagnetic side below 7 GPa, where the region with n = 2 (T^2) develops below TN. Above the critical point of P ∼ 7 GPa, it is clear that the n = 1.5 (T^{3/2}) region occupies the majority of the paramagnetic phase. A thin region with a different color lies at the T = 0 limit. This corresponds to the marginal recovery of Fermi liquid behavior below ∼ 2 K. In the inset of Fig. 3, we have also constructed the contour map for the NiS1.3Se0.7 data under pressure from a previous report [9]. Note again the V-shaped recovery on the temperature scale of 100 K over ∼ 2 GPa of pressure.

The remarkable contrast in the critical behavior between pure NiS2 and NiS2−xSex, visually demonstrated in Fig. 3, indicates that the influence of disorder on quantum criticality is surprisingly drastic, since the only difference between the two systems is the disorder produced by Se substitution. In the NiS1.3Se0.7 solid solution, the residual resistivity ρ0 is approximately 60 µΩcm, which is larger than that of pure NiS2 by two orders of magnitude.
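The two analysis steps described in the text, estimating ρ0 by extrapolating ρ vs. T^2 to T = 0 and extracting the local exponent n as the logarithmic derivative of ρ − ρ0, can be sketched as follows. This is an illustrative sketch on synthetic data, not the authors' analysis code: the function names, the linear least-squares fit used for the extrapolation and the finite-difference derivative are assumptions made for the example.

```python
import numpy as np

def estimate_rho0(T, rho):
    """Extrapolate rho vs. T^2 linearly to T = 0 and return the intercept.

    T, rho : low-temperature data (hypothetical inputs) restricted by the
             caller to the range where rho is linear in T^2.
    """
    T = np.asarray(T, dtype=float)
    rho = np.asarray(rho, dtype=float)
    slope, intercept = np.polyfit(T**2, rho, 1)   # rho = intercept + slope * T^2
    return intercept

def local_exponent(T, rho, rho0):
    """Local power-law exponent n(T) = d log(rho - rho0) / d log T.

    Uses finite differences on the (possibly non-uniform) log T grid.
    Requires rho > rho0 and T > 0 everywhere.
    """
    T = np.asarray(T, dtype=float)
    rho = np.asarray(rho, dtype=float)
    return np.gradient(np.log(rho - rho0), np.log(T))

if __name__ == "__main__":
    # Synthetic curve with a pure T^{3/2} law, for illustration only.
    T = np.linspace(1.0, 30.0, 200)
    rho = 0.5 + 0.02 * T**1.5
    print(local_exponent(T, rho, rho0=0.5)[:5])
```

Evaluating `local_exponent` at each pressure and stacking the results on a common temperature grid would give the kind of n(P, T) map that a contour plot displays; in practice noisy data would need smoothing before differentiation.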
When the samples are disordered, we do see canonical behavior of the QCP as predicted by standard theories [3, 4, 5]. To our surprise, once the system becomes clean, the textbook behavior is gone and the Fermi liquid coherence seen in ρ(T) is dramatically suppressed. We should note that the magnitude of ρ(T) − ρ0 is roughly 50 µΩcm in the temperature range below 100 K around the QCP. In the Se-doped crystal, inelastic scattering is always weaker than, or at most comparable to, elastic scattering due to disorder below 100 K. In pure NiS2, in contrast, the same situation, ρ − ρ0 < ρ0, occurs only below 2-3 K, where we observed the crossover to T² resistivity. This suggests that disorder might be controlling the appearance of the T² resistivity. One plausible scenario for the strong influence of disorder and the robust non-Fermi liquid behavior might be a dichotomy of the Fermi surface [19]. It is natural that a specific part of the Fermi surface, a “hot spot”, is coupled strongly to critical antiferromagnetic fluctuations with a characteristic wave vector Q. There may remain a region with well-defined quasiparticles free from critical fluctuations, a “cold spot”. The transport is then determined by an interplay of these two contributions at high temperatures, but eventually the cold spot with its T² dependence should dominate the conduction at very low temperatures. This phase separation in k-space might explain in part the unusual temperature dependence observed in pure NiS2, but it is not clear whether the robustness of the non-Fermi liquid behavior can be properly described. In this scenario, the strong influence of disorder can be naturally understood. Strong elastic scattering should mix the hot spot and the cold spot, so that the inelastic scattering becomes effectively isotropic, which might be close to the situation implicitly assumed in standard theories [5].
Another scenario might be a phase separation and the resultant domain or bubble formation in real space, as discussed for clean MnSi, where the helical spin order disappears discontinuously at a first-order transition [10]. These bubbles and domains have been proposed to be responsible for the robust non-Fermi liquid behavior in the paramagnetic phase. It is worth checking for a possible first-order transition by carefully examining the magnetism at ∼ 7 GPa. In conclusion, we have demonstrated the sharp contrast in the quantum critical behavior of ρ(T) between the clean and the disordered systems by examining a single crystal of NiS2 with a low residual resistivity of ∼ 0.5 µΩcm. Previously, the V-shaped recovery of Fermi liquid behavior (T² behavior of the resistivity) around the antiferromagnetic critical point was clearly observed as a function of pressure and Se content in the dirty NiS2−xSex systems with ρ − ρ0 < ρ0. In sharp contrast, we found a robust non-Fermi liquid behavior over a wide pressure range on the paramagnetic side of the QCP in the clean system with ρ − ρ0 ≫ ρ0. These results clearly demonstrate that our understanding of the quantum critical point is still far from complete and that some important ingredient must be missing. The authors would like to thank M. Imada and H. Fukuyama for discussion. This work was partly supported by a Grant-in-Aid for Scientific Research (No. 18043009) from the Ministry of Education, Culture, Sports, Science and Technology of Japan and CREST, Japan Science and Technology Agency (JST). [1] P. Coleman and A. J. Schofield, Nature 433, 226 (2005). [2] T. Moriya, Spin Fluctuations in Itinerant Electron Magnetism (Springer-Verlag, Berlin, 1985). [3] J. A. Hertz, Phys. Rev. B. 14, 1165 (1976). [4] A. J. Millis, Phys. Rev. B. 48, 7183 (1993). [5] T. Moriya and K. Ueda, Adv. Phys., 49, 555 (2000). [6] S. Ogawa, J. Appl. Phys., 50, 2308 (1979). [7] J. A.
Wilson, The Metallic and Non-metallic States of Matter, pp.215-260 (Taylor & Francis, London, 1985). [8] T. Miyadai et al., J. Phys. Soc. Jpn., 38, 115 (1975). [9] S. Miyasaka et al., J. Phys. Soc. Jpn., 69, 3166 (2000). [10] C. Pfleiderer, S. R. Julian, and G. G. Lonzarich, Nature 414, 427 (2001). [11] S. Takashima et al., J. Phys. Soc. Jpn. 76, 043704 (2007). [12] F. M. Grosche et al., J. Phys.: Condens. Matter, 12, L533 (2000). [13] N. Môri, H. Takahashi, and N. Takeshita, High Pressure Research, 24, 225 (2004). [14] G. Czjzek et al., J. Magn. Mag. Mat., 3, 58 (1976). [15] D. Jaccard, K. Behnia, and J. Sierro, Phys. Lett. A 163, 475 (1992). [16] S. R. Julian et al., J. Phys.: Condens. Matter, 8, 9675 (1996). [17] C. Petrovic et al., J. Phys.: Condens. Matter, 13, L337 (2001). [18] S. S. Saxena et al., Nature 406, 587 (2000). [19] A.Rosch, Phys. Rev. B, 62, 4945 (2000). ABSTRACT A quantum critical point (QCP) between the antiferromagnetic and the paramagnetic phases was realized by applying a hydrostatic pressure of ~ 7 GPa on single crystals of NiS_{2} pyrite with a low residual resistivity, rho_{0}, of 0.5 mu-Omega-cm. We found that the critical behavior of the resistivity, rho, in this clean system contrasts sharply with those observed in its disordered analogue, NiS_{2-x}Se_{x} solid-solution, demonstrating the unexpectedly drastic effect of disorder on the quantum criticality. Over a whole paramagnetic region investigated up to P = 9 GPa, a crossover temperature, defined as the onset of T^{2} dependence of rho, an indication of Fermi liquid, was suppressed to a substantially low temperature T sim 2 K and, instead, a non Fermi liquid behavior of rho, T^{3/2}-dependence, robustly showed up. <|endoftext|><|startoftext|> Spin coherence of holes in GaAs/AlGaAs quantum wells M. Syperek1,2, D. R. Yakovlev1,†, A. Greilich1, J. Misiewicz2, M. Bayer1, D. Reuter3, and A. D. 
Wieck3 1Experimentelle Physik II, Universität Dortmund, D-44221 Dortmund, Germany; 2Institute of Physics, Wrocław University of Technology, 50-370 Wrocław, Poland; and 3Angewandte Festkörperphysik, Ruhr-Universität Bochum, D-44780 Bochum, Germany (Dated: August 2, 2021) The carrier spin coherence in a p-doped GaAs/(Al,Ga)As quantum well with a diluted hole gas has been studied by picosecond pump-probe Kerr rotation with an in-plane magnetic field. For resonant optical excitation of the positively charged exciton the spin precession shows two types of oscillations. Fast oscillating electron spin beats decay with the radiative lifetime of the charged exciton of 50 ps. Long-lived spin coherence of the holes is observed, with dephasing times up to 650 ps. The spin dephasing time as well as the in-plane hole g factor show a strong temperature dependence, underlining the importance of hole localization at cryogenic temperatures. PACS numbers: 42.25.Kb, 78.55.Cr, 78.67.De Recently the investigation of the coherent spin dynamics in semiconductor quantum wells (QW) and quantum dots has attracted much attention, due to the possible use of the spin degree of freedom in novel fields of solid state research such as spin-based electronics or quantum information processing [1, 2, 3]. Until now the interest has been mostly focused on the spin coherence of electrons, while experimental information about the spin coherence of holes is limited [4]. The hole as a Luttinger spinor has properties strongly different from the electron spin, such as a strong spin-orbit coupling, a strong directional anisotropy, etc. It also plays an important role in the coherent control of electron spins, since in many optical schemes charged electron-hole complexes are proposed as intermediate manipulation states [5]. Earlier, the hole spin dynamics in GaAs-based QWs has been measured by optical orientation, detecting the photoluminescence (PL) either time-integrated or time-resolved [4, 6, 7, 8, 9].
Experimental studies addressed the longitudinal spin relaxation time T1 [6, 7, 8] and the dephasing time T2*, exploiting the observation of hole spin quantum beats [4]. The reported relaxation times vary from 4 ps [6] up to ∼1 ns [4, 8], demonstrating a strong dependence on doping level, doping density and excitation energy. A major drawback of PL techniques is, however, that the spin coherence can be traced only as long as both electrons and holes are present and photoluminescence can occur. Further, they work only for studying the spin dynamics of minority carriers in a sea of majority carriers and are therefore restricted to undoped or n-type doped QWs. However, then the holes can interact with electrons, providing additional relaxation channels via exchange or shake-up processes [8, 10]. These mechanisms can be excluded for p-doped structures if the hole spin relaxation occurs on time scales longer than the radiative annihilation of electrons. A pump-probe Kerr rotation (KR) technique using resonant excitation allows such measurements to be realized, which up to now have been reported only for bulk p-type GaN [11] and not yet for low-dimensional systems. The theoretical analysis of the hole spin dynamics in QWs has been focused on free holes [10, 12, 13, 14] by considering different relaxation mechanisms: (i) a Dyakonov-Perel-like mechanism, (ii) an acoustic-phonon-assisted spin flip due to spin mixing of valence band states, (iii) an exchange-induced spin flip due to scattering on electrons, which resembles the Bir-Aronov-Pikus mechanism, but for holes. Recently, attention has been drawn to the role of hole localization, and the dephasing caused by fluctuations of the in-plane g factor has been calculated [15]. In this paper we use time-resolved pump-probe Kerr rotation [16] to investigate the spin coherence of holes in a p-doped GaAs/Al0.34Ga0.66As single QW with a low hole density.
We find spin dephasing times reaching almost the ns range at a temperature T = 1.6 K with a hole in-plane g factor of about 0.01. Both quantities decrease strongly with increasing temperature, suggesting an important influence of hole localization. We also discuss a mechanism that provides generation of spin coherence for the hole gas under resonant excitation of the positively charged exciton. The structure was fabricated by molecular-beam epitaxy on a (100)-oriented GaAs substrate. A 15 nm-wide GaAs QW was grown on top of a 380 nm-thick Al0.34Ga0.66As barrier and overgrown by a 190 nm-thick Al0.34Ga0.66As layer. 21 nm-thick layers with Al0.34Ga0.66As effective composition, realized by GaAs/AlAs short-period superlattices, were deposited on both sides of the QW in order to improve interface planarity. Two δ-doped layers with carbon acceptors were positioned symmetrically at 110 nm distance from the QW. The hole gas concentration and mobility in the QW are 1.51 × 10^11 cm^−2 and 1.2 × 10^5 cm^2/Vs, respectively, as determined by Hall measurements at T = 4.2 K. It was possible to deplete the hole density in the QW by above-barrier illumination and even invert the majority carrier type, resulting in a diluted electron gas (DEG). The sample temperature was varied from 1.6 to 6 K. A mode-locked Ti:Sapphire laser with a repetition rate of 75.6 MHz and a pulse duration of ∼1.5 ps (∼1 meV full width at half maximum) was used for optical excitation.
http://arxiv.org/abs/0704.0592v1
FIG. 1: (a) KR traces for a p-type 15 nm-wide GaAs/Al0.34Ga0.66As QW vs time delay between pump and probe pulses at B = 0 and 7 T with the field tilted by ϑ = 4° out of the QW plane. The laser at 1.5365 eV is resonant with the T+ line. Power was set to 5 and 1 W/cm^2 for pump and probe, respectively. The bottom trace was recorded with additional laser illumination at 2.33 eV. T = 1.6 K. (b) PL spectra for the DHG regime (excitation at 1.579 eV) and the DEG regime (above-barrier excitation at 2.33 eV).
(c) Comparison of KR traces for ϑ = 0 and 4°.
The laser beam was split into a circularly polarized pump and a linearly polarized probe beam. Both beams were focused on the sample surface to a spot diameter of ∼100 µm. Magnetic fields B ≤ 10 T were applied approximately perpendicular to the structure growth z-axis (Voigt geometry). In a pump-probe KR experiment the pump pulse coherently excites carriers with spins polarized along the z-axis. The subsequent coherent evolution of the spins in the form of a precession about the magnetic field is tested by the probe pulse polarization. To detect the change of the linear probe polarization plane (the KR angle), a homodyne technique based on phase-sensitive balanced detection was used. Photoluminescence spectra excited above and below the band gap of the Al0.34Ga0.66As barriers are shown in Fig. 1(b) at B = 0 and 7 T. A single PL line corresponding to the positively charged trion T+ (consisting of two holes and one electron) is seen for the regime of a diluted hole gas (DHG) established for below-barrier excitation. After inverting the type of majority carriers to the DEG regime by above-barrier illumination, the PL spectra consist of the exciton (X) and negatively charged trion (T−) lines.
FIG. 2: Top trace: KR signal measured at B = 7 T for ϑ = 4°. Bottom traces are obtained by separating electron and hole contributions (see text). Excitation conditions as in Fig. 1.
The type of majority carriers in the QW can be identified by the KR signals measured at B = 7 T, with the laser energy tuned to the trion resonance. The bottom trace in Fig. 1(a) was measured with additional above-barrier illumination (DEG regime) and shows long-lived electron spin beats with a dephasing time of 2.5 ns, which is considerably longer than the radiative decay time of resonantly excited trions of about 50 ps. The precession frequency corresponds to a g factor |ge| = 0.285 ± 0.005, which is typical for electrons in GaAs-based QWs.
Without above-barrier illumination, in the DHG regime, fast electron precession is observed only during ∼ 200 ps after pump pulse arrival [see middle trace in Fig. 1(a)]. This signal is caused by the coherent precession of the electron in T+ and disappears with the trion recombination. The electron beats are superimposed on the hole beats, which have a much longer precession period. The hole beats decay with a time constant of about 100 ps and can be followed up to 500 ps delay. At these long times the KR signal is solely given by coherent hole precession. Experimentally, it is difficult to observe the hole spin quantum beats due to the very small in-plane hole g factor. To enhance the visibility we tilted the magnetic field slightly out of the plane by an angle ϑ = 4° to increase the hole g factor by mixing the in-plane component (g_{h,⊥}) with the one parallel to the QW growth axis (g_{h,∥}), which typically is much larger: $g_h(\vartheta) = \sqrt{g_{h,\parallel}^2 \sin^2\vartheta + g_{h,\perp}^2 \cos^2\vartheta}$ [17]. The strong change of the hole beat frequency with the tilt angle is seen in Fig. 1(c). The precession frequency is analyzed by $\omega_h = \mu_B |g_h| B/\hbar$, where µB is the Bohr magneton, and gives |g_{h,⊥}| = 0.012 ± 0.005 for ϑ = 0° and |g_h| = 0.048 ± 0.005 for ϑ = 4°.
FIG. 3: Hole component of KR signal at different magnetic fields and ϑ = 4°. Top inset: Magnetic field dependence of the hole dephasing time T2*. Solid line is a 1/B fit to the data. Bottom inset: Hole spin precession frequency vs magnetic field. Line is a guide to the eye. In the insets, closed and open circles show the data measured for pump to probe powers of 1 to 5 W/cm^2 and 5 to 1 W/cm^2, respectively. T = 1.6 K.
The electron and hole contributions to the KR amplitude, ΘK, can be separated by fitting the experimental data with a superposition of exponentially damped harmonic functions for electrons and holes:
$$\Theta_K = \sum_{i=e,h} A_i \exp\left(-\frac{\Delta t}{T^*_{2,i}}\right) \cos(\omega_i \Delta t). \qquad (1)$$
Here A_i are the corresponding signal amplitudes at pump-to-probe delay ∆t = 0, and T*_{2,i} are the spin dephasing times.
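The tilted-field g factor, the Larmor frequency and the two-component fit function of Eq. (1) can be checked numerically. The Python sketch below uses the g factor values quoted in the text and in Ref. [17]; the function names, the signal amplitudes and the choice of a 50 ps electron decay and 100 ps hole decay in the demo trace are illustrative assumptions, not the authors' fitting procedure.

```python
import math

MU_B = 9.2740100783e-24   # Bohr magneton, J/T
HBAR = 1.054571817e-34    # reduced Planck constant, J*s

def g_hole(theta_deg, g_par, g_perp):
    """Effective hole g factor for a field tilted by theta out of the QW plane."""
    th = math.radians(theta_deg)
    return math.sqrt((g_par * math.sin(th)) ** 2 + (g_perp * math.cos(th)) ** 2)

def larmor_omega(g, b_tesla):
    """Larmor angular frequency omega = mu_B |g| B / hbar, in rad/s."""
    return MU_B * abs(g) * b_tesla / HBAR

def kerr_signal(dt, components):
    """Sum of damped cosines, one (amplitude, T2_star, omega) tuple per carrier type."""
    return sum(a * math.exp(-dt / t2) * math.cos(w * dt) for a, t2, w in components)

# g_par = 0.60 from Ref. [17], g_perp = 0.012 from the theta = 0 measurement
g4 = g_hole(4.0, 0.60, 0.012)
period_h = 2 * math.pi / larmor_omega(0.048, 7.0)   # hole beats at 7 T
period_e = 2 * math.pi / larmor_omega(0.285, 7.0)   # electron beats at 7 T

# Hypothetical amplitudes; decay times taken as rough values mentioned in the text
trace = [
    kerr_signal(t * 1e-12, [(1.0, 50e-12, larmor_omega(0.285, 7.0)),
                            (0.5, 100e-12, larmor_omega(0.048, 7.0))])
    for t in range(0, 500, 5)
]

print(f"g_h(4 deg) = {g4:.4f}")
print(f"hole period = {period_h * 1e12:.0f} ps, electron period = {period_e * 1e12:.1f} ps")
```

The computed g_h(4°) comes out close to the measured 0.048 ± 0.005, and the two precession periods differ by roughly a factor of six, which is what makes the electron and hole contributions separable in a fit.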
An example of a decomposition of the KR signal in the DHG regime is shown in Fig. 2. Let us turn now to the hole coherence. Figure 3 shows the hole contribution to the KR signal for different B at T = 1.6 K. From the fit by Eq. (1) we have obtained the dephasing time T2*, which is plotted versus B in the inset. A very long-lived hole spin coherence with T2* = 650 ps is found at B = 1 T. With increasing B up to 10 T it shortens to 70 ps. The field dependence can be well described by a 1/B form (see the line in the inset), from which we conclude that the shortening of the dephasing time arises from the inhomogeneity of the hole g factor, ∆gh = 0.0007, in the QW, which is translated into a spread of the precession frequency: ∆ωh = ∆gh µB B/ħ. Since T2* ∝ 1/∆ωh, this explains the 1/B dependence of the dephasing time. The magnetic field dependence of the hole precession frequency in the lower inset of Fig. 3 shows an approximately linear dependence on B up to 7 T. For higher fields a superlinear increase is seen, which indicates a change of the hole g factor due to mixing between heavy- and light-hole states induced by the field.
FIG. 4: Temperature dependence of the hole KR signal at B = 7 T and ϑ = 5°. Pump and probe powers are set to 1 and 5 W/cm^2, respectively. Inset: Hole spin dephasing time T2* vs temperature.
Two sets of experimental data measured for pump-to-probe powers of 1 to 5 W/cm^2 and 5 to 1 W/cm^2 are compared in the insets of Fig. 3. The very similar results demonstrate that the experiment is performed in the linear regime for both pump and probe beams, with power not exceeding 5 W/cm^2. Insight into the origin of the long hole spin coherence can be gained from KR at varying temperatures. The data in Fig.
4 measured at ϑ = 5° show that (i) the dephasing time T2* decreases from 110 down to 60 ps when increasing the temperature from 1.6 to 6 K (see also inset), and (ii) simultaneously the precession frequency decreases notably, corresponding to a g factor decrease from 0.057 to 0.030. These results can be naturally explained by hole localization in the QW potential relief due to monolayer well width fluctuations. The localization energy does not exceed 0.5 meV, which is comparable with the thermal energy at T = 6 K. Free holes are expected to have a short spin coherence time T2 limited by the efficient spin relaxation mechanisms due to the spin-orbit interaction [10, 13, 14]. For localized holes these mechanisms are mostly switched off and one can expect long T2 times. However, in a KR experiment we do not measure the T2 time, but rather the ensemble dephasing time T2* (Ref. 15), as confirmed by the 1/B dependence in the inset of Fig. 3. The T2* time gives a lower bound for the spin coherence time T2. Therefore we can conclude that T2 for localized holes is at least 650 ps. Thermal delocalization of holes, on the one hand, decreases the role of inhomogeneities and reduces ∆gh, which should lead to longer dephasing times. On the other hand, the fast decoherence of free holes then becomes the limiting factor for the spin beat dynamics. Mixing of heavy- and light-hole states in a QW is enhanced by localization effects. This should be detectable by an increase of the in-plane hole g factor, which is close to zero for free holes [4, 15, 18]. The decrease of the hole g factor with increasing temperature shown in Fig. 4 is consistent with a hole delocalization scenario. We turn now to discussing the mechanism for optical generation of hole spin coherence in a QW with a DHG. The generation mechanism is similar to the one suggested for singly charged quantum dots [19, 20] and QWs with a DEG [21].
In our experiment pump and probe are resonant in energy with the positively charged trion T+. Due to the considerable heavy-light hole splitting, the circularly polarized pump creates holes and electrons with well-defined spin projections, Jh,z = ±3/2 and Se,z = ±1/2, respectively, according to the optical selection rules [12]. Therefore, |⇑⇓↓〉 (|⇑⇓↑〉) trions can be generated by a σ+ (σ−) polarized pump. Here the thick and thin arrows give the spin states of holes and electrons, respectively. The pump pulse duration is much shorter than the spin coherence and the electron-hole recombination times. If, in addition, the pump duration is shorter than the charge coherence time of the trion state, the pulse creates a coherent superposition of a resident hole from the DHG and a hole-singlet trion T+. The spin state of the resident hole with arbitrary spin orientation before excitation can be described by α |⇑〉 + β |⇓〉, where |α|² + |β|² = 1. Without magnetic field and for fields oriented normal to the z-axis, the net spin polarization of the hole ensemble is zero, so that the ensemble-averaged coefficients are equal: α = β. For σ+ polarized excitation, for which injection of an |⇑↓〉 electron-hole pair is possible, the excited superposition is given by α |⇑〉 + β cos(Θ/2) |⇓〉 + iβ sin(Θ/2) |⇑⇓↓〉. Here $\Theta = \int \mathbf{d}\cdot\mathbf{E}(t)\,dt/\hbar$ is the dimensionless pulse area, with the pump laser electric field E(t) and the dipole transition matrix element d. In general, the hole-trion superposition state may be driven coherently by varying the pulse area, giving rise to Rabi oscillations, as reported recently for (In,Ga)As quantum dots [20]. Such oscillations have not been found yet in QWs, most probably due to the fast carrier dephasing, in particular for strong excitation. Dephasing of the superposition occurs shortly after the pulse on a time scale of a few ps, converting the coherent polarization into a population consisting of holes with original spins ⇑ and ⇓ and trions with ⇑⇓↓.
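The populations that follow from the superposition α |⇑〉 + β cos(Θ/2) |⇓〉 + iβ sin(Θ/2) |⇑⇓↓〉 can be tabulated directly. The short Python sketch below does this for an unpolarized starting state (α = β); the function name and the chosen pulse areas are illustrative assumptions.

```python
import math

def populations(alpha, beta, pulse_area):
    """Probabilities (hole up, hole down, trion) right after a sigma+ pulse
    of dimensionless area pulse_area, for the state
    alpha|up> + beta*cos(area/2)|down> + i*beta*sin(area/2)|trion>."""
    p_up = abs(alpha) ** 2
    p_down = abs(beta * math.cos(pulse_area / 2)) ** 2
    p_trion = abs(1j * beta * math.sin(pulse_area / 2)) ** 2
    return p_up, p_down, p_trion

a = b = 1 / math.sqrt(2)   # no net hole polarization before the pulse
for area in (0.0, math.pi / 2, math.pi):
    p_up, p_down, p_tr = populations(a, b, area)
    # Net resident-hole polarization p_up - p_down equals the trion population here
    print(f"area = {area:.3f}: up = {p_up:.3f}, down = {p_down:.3f}, "
          f"trion = {p_tr:.3f}, hole polarization = {p_up - p_down:.3f}")
```

For α = β the imbalance between ⇑ and ⇓ resident holes equals the trion population, β² sin²(Θ/2), so in this simple picture it is largest for a pulse area of π.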
In a simplified picture, the spin coherence generation can be described as follows: The σ+ polarized pump creates, with a certain efficiency, trions T+ of spin configuration |⇑⇓↓〉. By this process |⇓〉 holes are pumped out of the DHG, leaving behind holes with opposite spin |⇑〉. Right after the pump pulse the KR signal has contributions from the |⇑〉 hole from the DHG and the |↓〉 electron of the T+. The further evolution of the coherent signal depends on the strength of the external magnetic field applied perpendicular to the z-axis. At B = 0, the carrier spins experience no Larmor precession. The electron spin relaxation time usually exceeds the lifetime of trions, which is limited by radiative decay, by one to two orders of magnitude. Trion recombination returns the hole to the DHG with the same spin orientation with which it was pumped out, if no electron spin scattering occurred in the meantime. This compensates the induced spin polarization and nullifies the KR signal at delays exceeding the trion lifetime. Indeed, the KR signal in the top trace in Fig. 1(a) shows a fast decay with a time constant of ∼50 ps, which is characteristic of radiative trion recombination in GaAs/(Al,Ga)As QWs [22]. The long-lived tail of the signal has a very small amplitude and is due to hole coherence provided by weak spin relaxation of electrons in T+ and/or hole relaxation in the DHG during the trion lifetime. In finite magnetic fields, the carrier spins start to precess about B. Due to the electron spin precession in T+, the hole spin returned to the DHG after trion recombination will not compensate the spin polarization of the resident holes. Therefore, a long-lived hole coherence with considerable amplitude will be induced. This coherence is observed in the KR signal as spin beats with low frequency (see Figs. 1 and 3).
Note that the Larmor precession of the resident holes may also contribute to the generation of hole spin coherence, but the effect is proportional to the ratio of the hole and electron Larmor frequencies and will therefore be rather small. Let us compare the spin coherence generation for QWs with a DHG and a DEG resonantly excited in the T+ and T− states, respectively. We are interested in a long-lived spin coherence which goes beyond the trion lifetime, i.e. in spin coherence induced for the resident carriers. In both cases the amplitude of the KR signal is controlled by the ratio of the electron spin beat period to the trion lifetime. Nevertheless, the two cases are quite different, as for the DHG the precessing electron is bound in the T+ trion, while for the DEG the background electron precesses. In the latter case the electron precession in T− is blocked due to the singlet spin character of the trion ground state. To conclude, a long-lived spin coherence has been found for localized holes in a GaAs/(Al,Ga)As QW with a diluted hole gas. The spin coherence time exceeds 650 ps and is still masked by the spin dephasing due to g factor inhomogeneities. Localization of holes suppresses most spin relaxation mechanisms inherent to free carriers. It is also worth noting that, due to the p-type Bloch wave functions, the holes do not interact with the nuclear spins, an interaction which provides the most efficient spin relaxation mechanism for localized electrons [23]. Acknowledgements. This work was supported by the BMBF program ’nanoquit’. [†] Also at Ioffe Physico-Technical Institute, Russian Academy of Sciences, St. Petersburg, Russia. [1] Semiconductor Spintronics and Quantum Computation, ed. by D. D. Awschalom, D. Loss, and N. Samarth, (Springer-Verlag, Heidelberg 2002). [2] I. Žutić, J. Fabian, and S. Das Sarma, Rev. Mod. Phys. 76, 323 (2004). [3] D. P. DiVincenzo, Science 270, 255 (1995); D. Loss and D. P. DiVincenzo, Phys. Rev. A 57, 120 (1998). [4] X. Marie et al., Phys. Rev.
B 60, 5811 (1999). [5] A. Imamoglu et al., Phys. Rev. Lett. 83, 4204 (1999). [6] T. C. Damen, L. Viña, J. E. Cunningham, J. E. Shah, and L. J. Sham, Phys. Rev. Lett. 67, 3432 (1991). [7] Ph. Roussignol et al., Surf. Sci. 305, 263 (1994). [8] B. Baylac et al., Sol. State Comm. 93, 57 (1995). [9] B. Baylac et al., Surf. Sci. 326, 161 (1995). [10] T. Uenoyama and L. J. Sham, Phys. Rev. Lett. 64, 3070 (1990); Phys. Rev. B 42, 7114 (1990). [11] C. Y. Hu et al., Phys. Rev. B 72, 121203(R) (2005). [12] Optical Orientation, ed. by F. Meier and B. P. Za- kharachenya (North-Holland, Amsterdam 1984), Ch. 2. [13] R. Ferreira and G. Bastard, Phys. Rev. B 43, 9687 (1991). [14] C. Lü, J. L. Cheng, and M. W. Wu, Phys. Rev. B 73, 125314 (2006). [15] Y. G. Semenov, K. N. Borysenko, and K. W. Kim, Phys. Rev. B 66, 113302 (2002). [16] J. J. Baumberg, D. D. Awschalom, N. Samarth, H. Luo, and J. K. Furdyna, Phys. Rev. Lett. 72, 717 (1994). [17] The value | gh,‖ |= 0.60 ± 0.01 was determined from the Zeeman splitting of PL lines at B = 7 T applied along the QW growth axis. [18] R. Winkler, S. J. Papadakis, E. P. De Poortere, and M. Shayegan, Phys. Rev. Lett. 85, 4574 (2000). [19] A. Shabaev, Al. L. Efros, D. Gammon, and I. A. Merkulov, Phys. Rev. B 68, 201305(R) (2003). [20] A. Greilich et al., Phys. Rev. Lett. 96, 227401 (2006). [21] T. A. Kennedy et al., Phys. Rev. B 73, 045307 (2006). [22] G. Finkelstein et al., Phys. Rev. B 58, 12637 (1998). [23] I. A. Merkulov, Al. L. Efros and M. Rosen, Phys. Rev. B 65, 205309 (2002). ABSTRACT The carrier spin coherence in a p-doped GaAs/(Al,Ga)As quantum well with a diluted hole gas has been studied by picosecond pump-probe Kerr rotation with an in-plane magnetic field. For resonant optical excitation of the positively charged exciton the spin precession shows two types of oscillations. Fast oscillating electron spin beats decay with the radiative lifetime of the charged exciton of 50 ps. 
Long lived spin coherence of the holes with dephasing times up to 650 ps. The spin dephasing time as well as the in-plane hole g factor show strong temperature dependence, underlining the importance of hole localization at cryogenic temperatures. <|endoftext|><|startoftext|> Local-field effects in radiatively broadened magneto-dielectric media: negative refraction and absorption reduction Jürgen Kästel and Michael Fleischhauer Fachbereich Physik, Technische Universität Kaiserslautern, D-67663 Kaiserslautern, Germany Gediminas Juzeliūnas Institute of Theoretical Physics and Astronomy, Vilnius University, A Goštauto 12, Vilnius 01108, Lithuania (Dated: August 11, 2021) We give a microscopic derivation of the Clausius-Mossotti relations for a homogeneous and isotropic magneto-dielectric medium consisting of radiatively broadened atomic oscillators. To this end the diagram series of electromagnetic propagators is calculated exactly for an infinite bi-cubic lattice of dielectric and magnetic dipoles for a lattice constant small compared to the resonance wavelength λ. Modifications of transition frequencies and linewidth of the elementary oscillators are taken into account in a selfconsistent way by a proper incorporation of the singular self-interaction terms. We show that in radiatively broadened media sufficiently close to the free-space resonance the real part of the index of refraction approaches the value -2 in the limit of ρλ3 ≫ 1, where ρ is the number density of scatterers. Since at the same time the imaginary part vanishes as 1/ρ local field effects can have important consequences for realizing low-loss negative index materials. PACS numbers: INTRODUCTION It is well known that in dense dielectric materials the induced polarization P alters the field strength Eloc act- ing on the constituents (i.e. the local field) compared to the average macroscopic field Em. 
Macroscopic considerations show that in systems with high symmetry, such as a cubic lattice, the two fields are related to each other according to Eloc = Em + P/(3ε0) [1, 2]. This leads to the well-known Clausius-Mossotti relation for the permittivity ε(ω),
$$\varepsilon(\omega) = 1 + \frac{\rho\alpha(\omega)/\varepsilon_0}{1 - \rho\alpha(\omega)/(3\varepsilon_0)}, \qquad (1)$$
where ρ is the density and α(ω) the polarizability of the oscillators. Similar arguments hold for a purely magnetic material [3], except that the required densities are usually much higher due to the smallness of magnetic dipole moments and polarizabilities. In linear response α(ω) is well described by a damped-oscillator model [1],
$$\alpha(\omega) = \alpha' + i\alpha'' = \frac{\alpha_0}{\omega_0^2 - \omega^2 - i\gamma\omega}. \qquad (2)$$
The corresponding (real-valued) parameters, such as the oscillator strength α0 and the resonance frequency and width, ω0 and γ, are determined by the microscopic model. In general the linewidth γ contains radiative as well as non-radiative contributions. For purely radiative interaction these parameters are strongly affected by the renormalization of energy levels and spontaneous emission processes caused by the interaction with the vacuum electromagnetic field in the medium [4, 5, 6, 7, 8, 9, 10, 11]. Since the mode structure of the electromagnetic field inside a dense medium can be substantially modified compared to free space, one would expect that the polarizability entering eq. (1) is different from that in free space. In a macroscopic approach, however, α(ω) is an input function and no conclusion can be drawn about possible changes due to the different structure of the vacuum modes inside the medium. Taking into account the modification of transition frequencies and radiative linewidth in a dense medium in a self-consistent way requires a microscopic approach.
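The step from the local-field relation Eloc = Em + P/(3ε0) to the Clausius-Mossotti permittivity of eq. (1) can be verified numerically. The Python sketch below is a hedged illustration: the Lorentzian form with α0 in the numerator is an assumed reading of the damped-oscillator model, and the function names and sample numbers are hypothetical.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def polarizability(omega, alpha0, omega0, gamma):
    """Damped-oscillator polarizability (assumed form alpha0 / (w0^2 - w^2 - i*gamma*w))."""
    return alpha0 / (omega0 ** 2 - omega ** 2 - 1j * gamma * omega)

def eps_clausius_mossotti(rho, alpha):
    """Permittivity from eq. (1): 1 + x / (1 - x/3) with x = rho*alpha/eps0."""
    x = rho * alpha / EPS0
    return 1 + x / (1 - x / 3)

def eps_from_local_field(rho, alpha):
    """Same quantity derived step by step:
    P = rho*alpha*E_loc and E_loc = E_m + P/(3 eps0)  =>  P/E_m, then eps = 1 + P/(eps0 E_m)."""
    p_over_em = rho * alpha / (1 - rho * alpha / (3 * EPS0))
    return 1 + p_over_em / EPS0

# Hypothetical sample values, chosen only so that rho*alpha/eps0 is of order 0.1
rho = 1e24                      # oscillators per m^3
alpha = (2 + 1j) * 1e-36        # complex polarizability in SI units
e1 = eps_clausius_mossotti(rho, alpha)
e2 = eps_from_local_field(rho, alpha)
print(e1, e2)
```

Both routes give the same permittivity, and the result satisfies the equivalent form (ε − 1)/(ε + 2) = ρα/(3ε0).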
In the present paper we develop a microscopic approach to local field effects in dense materials with simultaneous dielectric and magnetic response using Green's-function techniques similar to those used by de Vries and Lagendijk for purely dielectric materials [12]. To this end we consider an infinitely extended bi-cubic lattice of electric and magnetic point dipoles with isotropic response and with a lattice constant small compared to the transition wavelength. We do not, however, make use of the assumptions made in [12] to renormalize the singular self-interaction contributions to the lattice T-matrix, which eliminated radiative contributions to linewidth and transition frequencies altogether. We show that instead the self-interaction contributions can be summed to yield the dressed t-matrix of an isolated oscillator interacting with the vacuum modes of the electromagnetic field in free space. In this way we derive Clausius-Mossotti relations for general, radiatively broadened, isotropic magneto-dielectrica. Apart from non-radiative broadenings, the electric and magnetic polarizabilities entering these equations are shown to be exactly those of free space.
http://arxiv.org/abs/0704.0593v2
We then show that simultaneous local-field corrections to electric and magnetic fields in purely radiatively broadened magneto-dielectrica have a surprising and potentially important effect: For sufficiently large densities the real part of the refractive index saturates at the level of −2. At the same time, the imaginary part of the complex index approaches zero in inverse proportion to the density. Thus the medium becomes transparent and left-handed, i.e. it displays a negative index of refraction with low absorption.
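The saturation of the index at −2 can be illustrated with a toy calculation. The Python sketch below assumes, purely for illustration, that both ε and µ follow the single-species Clausius-Mossotti form 1 + x/(1 − x/3) with the same dimensionless x = ρα/ε0, and that on resonance with purely radiative broadening x is purely imaginary and grows linearly with density. These simplifications and all names are assumptions; the relations actually derived in the paper may differ in detail.

```python
import cmath

def cm_response(x):
    """Clausius-Mossotti form 1 + x / (1 - x/3) for a dimensionless x."""
    return 1 + x / (1 - x / 3)

def refractive_index(eps, mu):
    """Index as a product of principal square roots, which keeps Im(n) >= 0
    when eps and mu both have non-negative imaginary parts (passive medium)."""
    return cmath.sqrt(eps) * cmath.sqrt(mu)

results = []
for s in (30.0, 300.0, 3000.0):      # s plays the role of a scaled density
    x = 1j * s                       # on resonance: purely imaginary response (assumption)
    n = refractive_index(cm_response(x), cm_response(x))
    results.append((s, n))
    print(f"s = {s:7.1f}: n = {n.real:.5f} + {n.imag:.5f}i")
```

In this toy model the real part of n tends to −2 while the imaginary part falls off as 9/s, i.e. inversely with the density-like parameter, in line with the behavior stated in the text.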
LOCAL-FIELD EFFECTS AND RENORMALIZATION OF RADIATIVE SELF-INTERACTION IN DIELECTRIC MEDIA We start by developing a microscopic scattering ap- proach to local-field effects in dielectric media taking into account possible material induced modifications of radiative linewidth and transition frequencies in a self- consistent way. To this end we consider a simple cubic lattice of electric point dipoles with isotropic bare polar- izability αb αb(r) = αb δ(r−R), (3) whereR denote lattice vectors. The dipoles interact with the quantized electromagnetic field Ê which obeys the vector Helmholtz equation ~∇× ~∇× Ê(r, ω)− ω Ê(r, ω) = µ0ω P̂ . (4) In the weak-excitation, i.e. linear response limit, the op- erator of the microscopic electric polarization P̂ has the form P̂(r) = αb(r)Ê(r, ω). Solving eq.(4) we can deter- mine the (isotropic) dispersion relation k = k(ω) from which the permittivity ε(ω) can be extracted. In the linear response limit the solution of the quantum me- chanical interaction problem can most easily be obtained by means of Greensfunction techniques. In particular it is sufficient to calculate the scattering T -matrix of the oscillator lattice. The dispersion relation can then be obtained via [13, 14, 15] det T−1 = 0. (5) The scattering T -matrix obeys a linear Dyson equation T = V + V G(0)V + · · · = V + V G(0)T, (6) where G(0)(r, r′, ω) is the free-space retarded propagator of the electric field which is a solution to the classical vector Helmholtz equation ~∇× ~∇× G(0)(r, r′, ω)− ω G(0)(r, r′, ω) = = 1 δ(r− r′), (7) V (r, ω) = −ω 2αb(r) is a linear, isotropic point vertex. Note that integration over spatial variables was suppressed in eq.(6) for nota- tional simplicity. For a cubic lattice of isotropic scatterers, the series can be summed up to yield [16] T (k,k′) = − ei(k−k R 6=0 RG(0)(R) where G(0)(R) stands for G(0)(r, r+R, ω0) which due to the discrete translation invariance is independent on r. 
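The resummation in the Dyson equation (6) can be illustrated with a scalar toy model. The following is only a sketch under simplifying assumptions: `v` and `g` are plain complex numbers standing in for the vertex V and the propagator G(0) (in the text both are operators with suppressed spatial integrations), and the function names are hypothetical. It checks that the multiple-scattering series V + V G V + ... agrees with the closed-form solution of T = V + V G T whenever |v g| < 1.

```python
def dyson_closed_form(v: complex, g: complex) -> complex:
    """Solve the scalar Dyson equation t = v + v*g*t for t."""
    return v / (1 - v * g)


def dyson_series(v: complex, g: complex, n_terms: int = 200) -> complex:
    """Partial sum of the series v + v*g*v + v*g*v*g*v + ..."""
    term = v
    total = 0j
    for _ in range(n_terms):
        total += term
        term = term * g * v  # one additional scattering event
    return total


# Illustrative numbers only (not taken from the paper); |v*g| < 1 so the series converges.
v, g = 0.3 + 0.1j, 0.5 - 0.2j
t = dyson_closed_form(v, g)
print(abs(t - dyson_series(v, g)))   # difference between series and closed form
print(abs(1 / t - (1 / v - g)))      # inverse form: 1/t = 1/v - g in this toy model
```

In this scalar picture the inverse t-matrix is the inverse bare vertex shifted by the propagator, which is the structure that allows the singular on-site term to be absorbed into a dressed single-particle quantity.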
The single-particle scattering t-matrix t(ω) is determined by the bare polarizability [12] t(ω)−1 = + G(0)(0). (10) Note that G(0)(0) is diagonal and isotropic. In eq.(9) we have separated the contribution of the lattice ( R 6=0) from the multiple scattering events at the same oscilla- tor (G(0)(0)). This separation is crucial since G(0)(0) is singular. Rather than eliminating this singularity by a regularization procedure as done in [12], we note that ex- pression (10) gives the single-particle scattering t-matrix t(ω) dressed by the interaction with the vacuum field in free space. This quantity is experimentally observable and is related to the single-particle polarizability α(ω) in free space: α(ω) = t(ω) ε0 (11) αb on the other hand is not observable and thus only a theoretical notion. At this point other broadening mech- anisms can be incorporated by adding appropriate non- radiative decay rates γnon-rad to the polarizability α(ω) (11) (cf. equation (2) and discussion thereafter). Obviously, for the radiative part separating the sum∑ RG(0)(R) into G(0)(0)+ R 6=0 e ik′RG(0)(R) does the trick of writing the full lattice T -matrix in terms of the known free space t-matrix. As a drawback we are left with the sum over the lattice vectors R 6= 0. Unfortunately this sum can not be evaluated exactly and has to be treated approximately. According to Poisson’s summation formula f(n) = dxf(x)e−2πikx (12) the sum over R 6= 0 can be expressed in terms of a real space integral and a sum over inverse lattice vectors K of the Fourier transform of the free space Greensfunction G̃(0)(p) R 6=0 eikRG(0)(R) = Ξ(|r|) (2πa)3 ei(p+k−K)rG̃(0)(p) Here Ξ(|r|) is some smooth function with Ξ(0) = 0 and Ξ(|r| > 0) → 1 introduced to prevent the integral from touching the excluded singular point r = 0. In the following we restrict the discussion to lattices with a lattice constant much smaller than the resonant wavelength, i.e. ka ≪ 1. 
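Poisson's summation formula (12) can be checked numerically in one dimension. The sketch below uses a Gaussian test function f(x) = exp(−π a x²), chosen here purely for illustration because its Fourier transform, exp(−π k²/a)/√a, is known in closed form; the lattice sum in the text is the three-dimensional analogue applied to the Green's function. The function names and the parameter `a` are assumptions of this example.

```python
import math


def direct_sum(a: float, n_max: int = 50) -> float:
    """Left-hand side: sum over integers n of f(n) with f(x) = exp(-pi*a*x^2)."""
    return sum(math.exp(-math.pi * a * n * n) for n in range(-n_max, n_max + 1))


def reciprocal_sum(a: float, k_max: int = 50) -> float:
    """Right-hand side: sum over integers k of the Fourier transform of f,
    which for this Gaussian is exp(-pi*k^2/a) / sqrt(a)."""
    total = sum(math.exp(-math.pi * k * k / a) for k in range(-k_max, k_max + 1))
    return total / math.sqrt(a)


for a in (0.5, 2.0):
    print(a, direct_sum(a), reciprocal_sum(a))
```

A broad function in real space (small `a`) needs many terms in the direct sum but very few in the reciprocal sum, which is why only small reciprocal lattice vectors matter for a slowly varying summand.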
In this limit the lattice of oscillators behaves essentially as a homogeneous medium. Contributions from large K-vectors to the sum, which reflect the discreteness of the lattice, can be neglected as long as the singular contribution from the origin has been excluded. Therefore we only keep the term K = 0 and assume a Gaussian cutting function Ξ(|r|) = 1− e−r2/δ2 , with δ ≪ a. This yields R 6=0 eikRG(0)(R) ≈ 1 G̃(0)(k) π3/2δ3 (2π)3 dp p2e− (k2+p2)e− k·p̂G̃(0)(p), where p̂ = p/|p|. Apart from the Gaussian p-integral which provides a smooth cut-off in reciprocal space, δ can be treated as a small parameter. That allows to carry out the integration analytically which in leading order of δ yields R 6=0 eikRG(0)(R) ≈ 1 G̃(0)(k)− 1 3ω2/c2 1. (15) The free-space Greentensor G̃(0)(k) is given by [12] G̃(0)(k) = 1− |k|2∆k with ∆k = 1 − k̂ ⊗ k̂ being a projector to directions orthogonal to k. With this we are ready to evaluate eq. (5) which reads in the limit ka ≪ 1 ρα(ω)/ε0 1− |k|2∆k Solving eq. (17) for the (isotropic) dispersion k = k(ω) with k(ω) = ε(ω)ω2/c2 finally yields ε(ω) = 1 + ρα(ω)/ε0 1− ρα(ω)/3ε0 . (18) This is the well-known Clausius-Mossotti relation where for purely radiatively broadened systems α(ω) is the dressed polarizability of an isolated oscillator interacting with the free-space electromagnetic vacuum field. LOCAL-FIELD EFFECTS FOR MAGNETO-DIELECTRICS We now extend the above discussion to the case of a bi-cubic lattice of electric and magnetic dipole oscillators. The microscopic, space-dependent bare electric polariz- ability αbe(r) is then given by αbe(r) = αbe δ(r−R) = αbe eiKr (19) and, similarly, the bare magnetic polarizability by αbm(r) = αbm δ(r−R−∆r) = αbm eiK(r−∆r) HereR denotes again the lattice vectors and ∆r the spac- ing between the electric and magnetic sublattices. The bare atomic polarizabilities αbe and αbm are assumed to be scalar for simplicity corresponding to an isotropic medium. The last expressions in eqn. 
(19) and (20) give the bare polarizabilities in reciprocal space, with K being the reciprocal lattice vectors. Due to the simultaneous presence of electric and mag- netic dipole lattices we now have to solve the coupled set of vector Helmholtz equations for the operators of the electric and magnetic fields ∇×∇× Ê− ω Ê = iωµ0∇× M̂+ µ0ω2P̂ (21) ∇×∇× Ĥ− ω M̂− iω∇× P̂. (22) In linear response the operator of the polarization P̂ and the magnetization M̂ are proportional to the elec- tric and magnetic fields respectively, P̂(r) = αbe(r)Ê(r) and M̂(r) = µ0αbm(r)Ĥ(r). In the following we will pursue a slightly different ap- proach to solve the coupled set of equations than used in the previous section. Taking into account the lattice symmetry we first write the field variables in the form Ê(r) = Ẽ(k−K)ei(k−K)r, (23) where the dependence on frequency ω was suppressed for notational simplicity. The subscript denotes integration over the first Brillouin zone. Substituting this and the corresponding expression for Ĥ into (21)-(22) gives the Helmholtz equations in reciprocal space. After some ele- mentary manipulations the following closed set of equa- tions is derived: ραbe/ε0 1− |k−K|2∆k−K Ẽ(k−K′) = µ0αbm eiK∆r(k−K)× 1− |k−K|2∆k−K H̃(k−K′)e−iK ρµ0αbm 1− |k−K|2∆k−K H̃(k−K′)e−iK ′∆r = − c ωµ0αbm e−iK∆r(k−K)× 1− |k−K|2∆k−K Ẽ(k−K′) where ρ = 1/a3 is the particle density. The sum in the brackets on the left hand sides of eqs. (24,25) can be rewritten as 1− |k−K|2∆k−K eikRG(0)(R) = G(0)(0) + R 6=0 eikRG(0)(R), where in the second line we have separated the singular contribution G(0)(0). One recognizes that this term can be added to the expressions containing the bare polariz- abilities in eqs.(24) and (25) yielding the dressed scatter- ing t-matrices for isolated electric and magnetic dipoles interacting with the free-space vacuum field: te(ω) + G(0)(0), (26) tm(ω) (ω2µ0 + G(0)(0). 
(27) The sum over the Greensfunction excludingR = 0 can be evaluated in a similar way as in the previous section. If we again assume a lattice constant a much smaller than the resonant wavelength, reciprocal K vectors different from zero can be disregarded. This leads to ρte(ω) + G̃(0)(k)− 1 3ω2/c2 Ê(k) = (28) µ0αbm 1− k2∆k Ĥ(k), ρtm(ω) + G̃(0)(k) − 1 3ω2/c2 Ĥ(k) = (29) c2αbe ωµ0αbm 1− k2∆k Ê(k). Since we are furthermore only interested in propagating, i.e. transversal modes, we can further simplify the calcu- lation by projecting onto transversal modes using ∆k ραe(ω)/ε0 ∆kÊ(k) = µ0αbm k×∆kĤ(k) (30) ρµ0αm(ω) k×∆kĤ(k) = c2αbe ωµ0αbm ∆kÊ(k). (31) Here we have substituted the dressed single parti- cle t-matrices by the free-space dressed polarizabilities αe(m)(ω) = te(m)(ω)c 2/ω2ε0 In order to find the dispersion k(ω) = n2ω2/c2 we have to determine the solution of the secular equation of the linear set of eqs. (30,31), which results in the condition ραe(ω)/ε0 × (32) ρµ0αm(ω) Solving for the refractive index of the transversal modes then gives n2 = εµ, where ε = 1 + ραe(ω)/ε0 1− ραe(ω)/3ε0 µ = 1 + ρµ0αm(ω) 1− ρµ0αm(ω)/3 are the relative dielectric permittivity and magnetic permeability, respectively, both satisfying the Clausius- Mossotti relations. Note that for longitudinal modes eqs. (28) and (29) de- couple. This can be seen by applying the corresponding projector to longitudinal waves k̂⊗k̂ which leads to a dis- appearance of the cross-coupling terms. The dispersion obtained in this way gives either ε = 0 corresponding to electric excitons [17, 18] or µ = 0 for magnetic excitons. NEGATIVE REFRACTION AND ABSORPTION REDUCTION DUE TO LOCAL FIELD EFFECTS IN MAGNETO-DIELECTRIC MEDIA It is interesting to consider the implications of the Clausius Mossotti relations for radiatively broadened me- dia in the large density limit. 
Let us first consider a purely dielectric medium and let us assume that the polarizability αe(ω) = α′e(ω) + i α″e(ω) does not depend on the density, i.e. the medium is radiatively broadened. In this case one finds

$$\varepsilon(\omega) \;\xrightarrow{\rho\to\infty}\; -2 + i\,\frac{9\,\varepsilon_0\,\alpha_e''}{\rho\,|\alpha_e|^2}. \qquad (35)$$

In the high-density limit and sufficiently close to resonance the response saturates at a value of −2 with an imaginary part that vanishes as 1/ρ. At this point the medium becomes totally opaque, since the index of refraction attains the imaginary value n = i√2, indicating the emergence of a stop band. This is illustrated in the left column of Fig. 1 for a medium composed of either electric or magnetic dipole oscillators. For small densities (ρ|α0|/3 = 1/3) the resonance is centered at ω0, whereas for larger densities (ρ|α0|/3 = 3) the response shifts to smaller frequencies and is amplified. Eventually (ρ|α0|/3 = 30) the refractive index becomes almost purely imaginary, in which case light can no longer propagate.

This behavior changes dramatically if we consider media with overlapping electric and magnetic resonances described by both an electric polarizability αe(ω) and a magnetic polarizability αm(ω). Independent application of Clausius-Mossotti local-field corrections to the permittivity and the permeability leads in the high-density limit to

$$n \;\xrightarrow{\rho\to\infty}\; -2 + \frac{i}{2\rho}\left(\frac{9\,\varepsilon_0\,\alpha_e''}{|\alpha_e|^2} + \frac{9\,\alpha_m''}{\mu_0\,|\alpha_m|^2}\right). \qquad (36)$$

Thus in the spectral overlap region the real part of the index of refraction approaches the value −2, i.e. it attains a constant negative value. Furthermore the imaginary part, responsible for absorption losses, approaches zero in that spectral region as 1/ρ. This rather peculiar behavior is illustrated in the right column of Fig. 1. One clearly recognizes the emergence of a spectral region around the bare resonance frequency where the real part of the refractive index approaches −2 while the imaginary part is strongly suppressed.
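The high-density behaviour described above can be reproduced numerically. The following is a minimal sketch, not the authors' calculation: it works with the dimensionless combinations x_e = ρ αe/ε0 and x_m = ρ µ0 αm, applies the Clausius-Mossotti form to each, and picks the square-root branch with Im n ≥ 0, which is assumed here to be the appropriate choice for a passive medium. The function names and the sample value of x are illustrative assumptions.

```python
import cmath


def clausius_mossotti(x: complex) -> complex:
    """Response function 1 + x / (1 - x/3) for a dimensionless x
    (x = rho*alpha_e/eps0 for the permittivity, rho*mu0*alpha_m for the permeability)."""
    return 1 + x / (1 - x / 3)


def refractive_index(eps: complex, mu: complex = 1.0) -> complex:
    """n with n^2 = eps*mu, choosing the root with non-negative imaginary part
    (assumed branch choice for a passive, absorbing medium)."""
    n = cmath.sqrt(eps * mu)
    return n if n.imag >= 0 else -n


# On resonance the polarizability is taken as purely imaginary; a large |x| mimics high density.
x = 3.0e4j
eps = clausius_mossotti(x)
mu = clausius_mossotti(x)

print("dielectric only:   n =", refractive_index(eps))      # close to i*sqrt(2)
print("magneto-dielectric: n =", refractive_index(eps, mu))  # real part close to -2
```

With only the electric response the index is almost purely imaginary, whereas with both responses saturating at −2 the chosen branch gives a real part near −2 and a small positive imaginary part that shrinks as |x| grows.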
[Figure 1: panels (a) and (b), each with top, middle and bottom rows; horizontal axes: detuning ∆ in units of γ, from −3 to 3.]

FIG. 1: (color online) Spectrum of the real (solid) and imaginary (dashed) part of the refractive index as well as the real (dotted) part of the response function(s) ε and/or µ as a function of the detuning ∆ for (a) a pure dielectric or magnetic medium with ρ|α0|/3 at ∆ = 0 equal to 1/3 (top), 3 (middle) and 30 (bottom), and (b) a magneto-dielectric medium with ρ|α0|/3 at ∆ = 0 equal to 1/3 (top), 3 (middle) and 30 (bottom).

Negative refraction of light is currently one of the most active research areas in photonics [19, 20, 21] due to fascinating potential applications such as superlensing [22] or electromagnetic cloaking [23, 24, 25]. In recent years substantial progress has been made in realizing negative refraction in so-called meta-materials [26, 27, 28, 29]. These are artificial periodic structures of electric and magnetic dipoles with a resonance wavelength much larger than the lattice constant, which thus form a quasi-homogeneous magneto-dielectric medium. In order to achieve a large electromagnetic response, operation close to resonance is needed, which is associated with rather substantial losses. The elimination of these losses represents one of the main challenges in the field [30]. We have shown here that in a radiatively broadened medium, i.e. a medium in which density-dependent broadening mechanisms can still be disregarded for sufficiently large densities, local field effects can provide a negative index of refraction and at the same time efficiently suppress absorption losses.

SUMMARY

In the present paper we have given a rigorous microscopic derivation of Clausius-Mossotti relations for both the electric and magnetic response in an isotropic, radiatively broadened magneto-dielectric medium formed by a simple bi-cubic lattice of electric and magnetic dipoles.
As opposed to previous microscopic approaches we have taken into account possible modifications of the single-particle polarizabilities by the altered electromagnetic vacuum inside the medium in a self-consistent way. For a simple bi-cubic lattice it has been shown that the polarizabilities entering the Clausius-Mossotti relations are those of single oscillators interacting with the free-space vacuum field. We showed that as a consequence of the local field corrections a radiatively broadened medium with overlapping electric and magnetic resonances becomes lossless with a real part of the refractive index approaching the value −2 in the high-density limit. The latter could provide an interesting avenue to construct artificial materials with negative refraction and low losses.

This work was supported by the Alexander von Humboldt Foundation through the institutional collaboration grant between The Institute of Theoretical Physics and Astronomy of Vilnius University and the Technical University of Kaiserslautern. J.K. acknowledges financial support by the Deutsche Forschungsgemeinschaft through the GRK 792 “Nichtlineare Optik und Ultrakurzzeitphysik”.

[1] J. D. Jackson, Classical Electrodynamics (John Wiley & Sons, New York, 1999).
[2] H. A. Lorentz, Wiedem. Ann. 9, 641 (1880); L. Lorenz, ibid. 11, 70 (1881); L. Onsager, J. Am. Chem. Soc. 58, 1486 (1936); C. J. F. Böttcher, Theory of Electric Polarization (Elsevier, Amsterdam, 1973).
[3] D. M. Cook, The Theory of the Electromagnetic Field (Prentice-Hall, Englewood Cliffs, N.J., 1975).
[4] G. Nienhuis and C. Th. J. Alkemade, Physica C 81, 181 (1976).
[5] J. Knoester and S. Mukamel, Phys. Rev. A 40, 7065 (1989).
[6] R. J. Glauber and M. Lewenstein, Phys. Rev. A 43, 467 (1991).
[7] S. M. Barnett, B. Huttner, and R. Loudon, Phys. Rev. Lett. 68, 3698 (1992).
[8] G. Juzeliunas, Phys. Rev. A 55, R4015 (1997).
[9] S. Scheel, L. Knöll, and D. G. Welsch, Phys. Rev. A 60, 4094 (1999).
[10] M. Fleischhauer, Phys. Rev. A 60, 2534 (1999).
[11] H. T. Dung, S. Y. Buhmann, L. Knöll, D. G. Welsch, S. Scheel, and J. Kästel, Phys. Rev. A 68, 043816 (2003).
[12] P. de Vries, D. V. van Coevorden, and A. Lagendijk, Rev. Mod. Phys. 70, 447 (1998).
[13] J. Korringa, Physica 13, 392 (1947).
[14] W. Kohn and N. Rostoker, Phys. Rev. 94, 1111 (1953).
[15] J. M. Ziman, Proc. Phys. Soc. 86, 337 (1965).
[16] P. de Vries and A. Lagendijk, Phys. Rev. Lett. 81, 1381 (1998).
[17] A. S. Davydov, Theory of Molecular Excitons (Plenum, New York, 1971).
[18] V. M. Agranovich and M. D. Galanin, Electronic Excitation Energy Transfer in Condensed Matter, edited by V. M. Agranovich and A. A. Maradudin (North-Holland, Amsterdam, 1982).
[19] V. G. Veselago, Sov. Phys. Usp. 10, 509 (1968).
[20] V. M. Agranovich and Y. N. Gartstein, Physics Uspekhi 49, 1029 (2006).
[21] V. M. Shalaev, Nature Photonics 1, 41 (2007).
[22] J. B. Pendry, Phys. Rev. Lett. 85, 3966 (2000).
[23] U. Leonhardt, Science 312, 1777 (2006).
[24] J. B. Pendry, D. Schurig, and D. R. Smith, Science 312, 1780 (2006).
[25] D. Schurig et al., Science 314, 977 (2006).
[26] J. B. Pendry et al., IEEE Trans. Microwave Theory Tech. 47, 2075 (1999).
[27] D. R. Smith et al., Phys. Rev. Lett. 84, 4184 (2000); R. Shelby, D. R. Smith, and S. Schultz, Science 292, 77 (2001).
[28] T. J. Yen et al., Science 303, 1494 (2004).
[29] S. Linden et al., Science 306, 1351 (2004); C. Enkrich et al., Phys. Rev. Lett. 95, 203901 (2005).
[30] J. Kästel, M. Fleischhauer, S. F. Yelin, and R. L. Walsworth, Phys. Rev. Lett. 99, 073602 (2007).

ABSTRACT

We give a microscopic derivation of the Clausius-Mossotti relations for a homogeneous and isotropic magneto-dielectric medium consisting of radiatively broadened atomic oscillators. To this end the diagram series of electromagnetic propagators is calculated exactly for an infinite bi-cubic lattice of dielectric and magnetic dipoles for a lattice constant small compared to the resonance wavelength $\lambda$.
Modifications of transition frequencies and linewidth of the elementary oscillators are taken into account in a selfconsistent way by a proper incorporation of the singular self-interaction terms. We show that in radiatively broadened media sufficiently close to the free-space resonance the real part of the index of refraction approaches the value -2 in the limit of $\rho \lambda^3 \gg 1$, where $\rho$ is the number density of scatterers. Since at the same time the imaginary part vanishes as $1/\rho$ local field effects can have important consequences for realizing low-loss negative index materials. <|endoftext|><|startoftext|> Introduction The Standard Model (SM), although in agreement with the available experimental data [1], leaves several open questions. In particular, the number of fermion generations and their mass spectrum are not predicted. The measurement of the Z decay widths [1] established that the number of light neutrino species (m < mZ/2, where mZ is the Z boson mass) is equal to three. However, if a heavy neutrino or a neutrinoless extra generation exists, this bound does not exclude the possibility of extra generations of heavy quarks. Moreover the fit to the electroweak data [2] does not deteriorate with the inclusion of one extra heavy generation, if the new up and down-type quarks mass difference is not too large. It should be noticed however that in this fit no mixing of the extra families with the SM ones is assumed. The subject of this paper is the search for the pair production of a fourth generation b′-quark at LEP-II: b′ production and decay are discussed in section 2; in section 3, the data sets and the Monte Carlo (MC) simulation are described; the analysis is discussed in section 4; the results and their interpretation within a sequential model are presented in sections 5 and 6, respectively. 2 b′-quark production and decay Extra generations of fermions are predicted in several SM extensions [3,4]. 
In sequential models [5–7], a fourth generation of fermions carrying the same quantum numbers as the SM families is considered. In the quark sector, an up-type quark, t′, and a down-type quark, b′, are included. The corresponding 4 × 4 extended Cabibbo-Kobayashi-Maskawa (CKM) matrix is unitary, approximately symmetric and almost diagonal. As CP-violation is not considered in the model, all the CKM elements are assumed to be real.

The b′-quark may decay via charged currents (CC) to UW, with U = t′, t, c, u, or via flavour-changing neutral currents (FCNC) to DX, where D = b, s, d and X = Z, H, γ, g (Fig. 1). As in the SM, FCNC are absent at tree level, but can appear at one-loop level, due to CKM mixing. If the b′ is lighter than t′ and t, the decays b′ → t′W and b′ → tW are kinematically forbidden and the one-loop FCNC decays can be as important as the CC decays [6].

The analysis of the electroweak data [1] shows that the mass difference |mt′ − mb′| < 60 GeV/c² is consistent with the measurement of the ρ parameter [3,5]. In particular, when mZ + mb < mb′ < mH + mb, either the b′ → cW or the b′ → bZ decay tends to be dominant [5–7]. In this case, the partial widths of the CC and FCNC b′ decays depend mainly on mt′, mb′ and RCKM = |Vcb′/(Vtb′Vtb)|, where Vcb′, Vtb′ and Vtb are elements of the extended 4 × 4 CKM matrix [7].

Limits on the mass of the b′-quark have been set previously at various accelerators. At LEP-I, all the experiments searched for b′ pair production (e+e− → b′b̄′), yielding a lower limit on the b′ mass of about mZ/2 [8]. At the Tevatron, both the D0 [9] and CDF [10] experiments reported limits on σ(pp̄ → b′b̄′) × BR(b′ → bX)², where BR is the branching ratio corresponding to the considered FCNC b′ decay mode and X = γ, Z. Assuming BR(b′ → bZ) = 1, CDF excluded the region 100 < mb′ < 199 GeV/c². Although no dedicated analysis was performed for the b′ → cW decay, the D0 limits on σ(pp̄ → tt̄) × BR(t → cW)² from Fig.
44 and Table XXXI of reference [11] can give a hint on the possible values for BR(b′ → cW) [12].

In the present analysis the on-shell FCNC (b′ → bZ) and CC (b′ → cW) decay modes were studied and consequently the mass range 96 GeV/c² < mb′ < 103 GeV/c² was considered. This mass range is complementary to the one covered by CDF [10]. The mass range mW + mc < mb′ < mZ + mb was not considered because in this region the evaluation of the branching ratios for the different b′ decays is particularly difficult from the theoretical point of view [7]. In the present analysis no assumptions on BR(b′ → bZ) and BR(b′ → cW) were made in order to derive mass limits. Different final states, corresponding to the different b′ decay modes and subsequent decays of the Z and W bosons, were analysed.

3 Data samples and Monte Carlo simulation

The analysed data were collected with the DELPHI detector [13] during the years 1999 and 2000 in LEP-II runs at √s = 196 − 209 GeV and correspond to an integrated luminosity of about 420 pb⁻¹. The luminosity collected at each centre-of-mass energy is shown in Table 1. During the year 2000, an unrecoverable failure affected one sector of the central tracking detector (TPC), corresponding to 1/12 of its acceptance. The data collected during the year 2000 with the TPC fully operational were split into two energy bins, below and above √s = 206 GeV, with ⟨√s⟩ = 204.8 GeV and ⟨√s⟩ = 206.6 GeV, respectively. The data collected with one sector of the TPC turned off were analysed separately and have ⟨√s⟩ = 206.3 GeV.

√s (GeV)             196    200    202    205    207    206∗
luminosity (pb⁻¹)    76.0   82.7   40.2   80.0   81.9   59.2

Table 1: The luminosity collected with the DELPHI detector at each centre-of-mass energy is shown. The energy bin labelled 206∗ corresponds to the data collected with one sector of the TPC turned off.

Signal samples were generated using a modified version of PYTHIA 6.200 [14].
Although PYTHIA does not provide FCNC decay channels for quarks, it was possible to activate them by modifying the decay products of an available channel. The angular distributions assumed for b′ pair production and decay were those predicted by the SM for any heavy down-type quark. Different samples, corresponding to b′ masses in the range between 96 and 103 GeV/c² and with a spacing of 1 GeV/c², were generated at each centre-of-mass energy. Specific Monte Carlo simulations (for both SM and signal processes) were produced for the period when one sector of the TPC was turned off.

The most relevant background processes for the present analyses are those leading to WW or ZZ bosons in the final state, i.e. four-fermion backgrounds. Radiation in these events can mimic the six-fermion final states for the signal. Additionally qq̄(γ) and Bhabha events cannot be neglected since for signal final states with missing energy these backgrounds can become important. SM background processes were simulated at each centre-of-mass energy using several Monte Carlo generators. All the four-fermion final states (both neutral and charged currents) were generated with WPHACT [15], while the particular phase space regions of e+e− → e+e−f f̄ referred to as γγ interactions were generated using PYTHIA [14]. The qq̄(γ) final state was generated with KK2F [16]. Bhabha events were generated with BHWIDE [17].

The generated signal and background events were passed through the detailed simulation of the DELPHI detector [13] and then processed with the same reconstruction and analysis programs as the data.

4 Description of the analyses

Pair production of b′-quarks was searched for in both the FCNC (b′ → bZ) and CC (b′ → cW) decay modes. The b′ decay modes and the subsequent decays of the gauge bosons (Z or W) lead to several different final states (Fig. 2). The final states considered and their branching ratios are shown in Table 2.
The final states were chosen taking into account their signatures and branching ratios. About 81% and 90% of the branching ratio to the FCNC and CC channels were covered, respectively. All final states include two jets originating from the low energy b (c) quarks present in the FCNC (CC) b′ decay modes. A common preselection was adopted, followed by a specific analysis for each of the final states (Table 2).

b′ decay          boson decays      BR (%)   final state
b′ → bZ (FCNC)    ZZ → l+l−νν̄       4.0     bb̄l+l−νν̄
                  ZZ → qq̄νν̄        28.0     bb̄qq̄νν̄
                  ZZ → qq̄qq̄        48.6     bb̄qq̄qq̄
b′ → cW (CC)      WW → qq̄l+ν       43.7     cc̄qq̄l+ν
                  WW → qq̄qq̄        45.8     cc̄qq̄qq̄

Table 2: The final states considered in this analysis are shown. About 81% and 90% of the branching ratio to the FCNC and CC channels were covered, respectively.

Events were preselected by requiring at least eight good charged-particle tracks and the visible energy measured at polar angles (see footnote 1) above 20° to be greater than 0.2√s. Good charged-particle tracks were defined as those with a momentum above 0.2 GeV/c and impact parameters in the transverse plane and along the beam direction below 4 cm and below 4 cm/ sin θ, respectively.

The identification of muons relied on the association of charged particles to signals in the muon chambers and in the hadronic calorimeters and was provided by standard DELPHI algorithms [13]. The identification of electrons and photons was performed by combining information from the electromagnetic calorimeters and the tracking system. Radiation and interaction effects were taken into account by an angular clustering procedure around the main shower [18].

The search for isolated particles (charged leptons and photons) was done by constructing double cones oriented in the direction of charged-particle tracks or neutral energy deposits.
The latter ones were defined as calorimetric energy deposits above 0.5 GeV, not matched to charged-particle tracks and identified as photon candidates by the standard DELPHI algorithms [13,18]. For charged leptons (photons), the energy in the region between the two cones, which had half-opening angles of 5° and 25° (5° and 15°), was required to be below 3 GeV (1 GeV), to ensure isolation. All the charged-particle tracks and neutral energy deposits inside the inner cone were associated to the isolated particle. Its energy was then re-evaluated as the sum of the energies inside the inner cone and was required to be above 5 GeV. For well identified leptons or photons [13,18] the above requirements were weakened. In this case only the external cone was used (to ensure isolation) and its angle α was varied according to the energy of the lepton (photon) candidate, down to 2° for Pℓ ≥ 70 GeV/c (3° for Pγ ≥ 90 GeV/c), with the allowed energy inside the cone reduced by sin α/ sin 25° (sin α/ sin 15°). Isolated leptons were required to have a momentum greater than 10 GeV/c and a polar angle above 25°. Events with isolated photons were rejected.

Footnote 1: In the standard DELPHI coordinate system, the positive z axis is along the electron beam direction. The polar angle (θ) is defined with respect to the z axis. In this paper, polar angle ranges are always assumed to be symmetric with respect to the θ = 90° plane.

final state     assignment criteria
bb̄l+l−νν̄       at least 1 isolated lepton
bb̄qq̄νν̄        no isolated leptons, Emissing > 50 GeV
bb̄qq̄qq̄        no isolated leptons, Emissing < 50 GeV
cc̄qq̄l+ν        only 1 isolated lepton
cc̄qq̄qq̄        no isolated leptons, Emissing < 50 GeV

Table 3: Summary of the final state assignment criteria.

All the events were clustered into two, four or six jets using the Durham jet algorithm [19], according to the number of jets expected in the signal in each of the final states, unless explicitly stated otherwise.
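The assignment criteria of Table 3 can be written as a small decision function. This is only an illustrative sketch: the function name, the string labels for the final states and the return format are hypothetical, and the treatment of an event with a missing energy of exactly 50 GeV (left unassigned here) is an assumption, since Table 3 only quotes strict inequalities. The FCNC and CC channels are evaluated separately, so that the selections are mutually exclusive only within the same b′ decay channel.

```python
from typing import Dict, Optional

E_MISSING_CUT_GEV = 50.0  # missing-energy threshold quoted in Table 3


def assign_final_states(n_isolated_leptons: int,
                        e_missing_gev: float) -> Dict[str, Optional[str]]:
    """Return the final state an event is assigned to in each b' decay channel
    (None if no final state of that channel accepts the event)."""
    fcnc: Optional[str] = None
    cc: Optional[str] = None

    # FCNC channel (b' -> bZ)
    if n_isolated_leptons >= 1:
        fcnc = "bbllvv"          # at least 1 isolated lepton
    elif e_missing_gev > E_MISSING_CUT_GEV:
        fcnc = "bbqqvv"          # no isolated leptons, large missing energy
    elif e_missing_gev < E_MISSING_CUT_GEV:
        fcnc = "bbqqqq"          # no isolated leptons, small missing energy

    # CC channel (b' -> cW)
    if n_isolated_leptons == 1:
        cc = "ccqqlv"            # only 1 isolated lepton
    elif n_isolated_leptons == 0 and e_missing_gev < E_MISSING_CUT_GEV:
        cc = "ccqqqq"            # no isolated leptons, small missing energy

    return {"FCNC": fcnc, "CC": cc}


print(assign_final_states(2, 80.0))
print(assign_final_states(0, 20.0))
```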
Although two b jets are always present in the FCNC final states, they have a relatively low energy and b-tagging techniques [20] were not used.

Events were assigned to the different final states according to the number of isolated leptons and to the missing energy in the event, as detailed in Table 3. Within the same b′ decay channel, the different selections were designed to be mutually exclusive. For the final states involving charged leptons (bb̄l+l−νν̄ and cc̄qq̄l+ν), events were divided into different samples according to the lepton flavour identification: e sample (well identified electrons), µ sample (well identified muons) and no-id sample (leptons with unidentified flavour or two leptons identified with different flavours).

Specific analyses were then performed for each of the final states. The selection criteria for the bb̄qq̄qq̄ and cc̄qq̄qq̄ final states were the same. The bb̄l+l−νν̄ final state has a very clean signature (two leptons with ml+l− ∼ mZ, two low energy jets and missing mass close to mZ) and consequently a sequential cut analysis was adopted. For all the other final states, a sequential selection step was followed by a discriminant analysis. In this case, a signal likelihood (LS) and a background likelihood (LB) were assigned to each event, based on Probability Density Functions (PDF), built from the distributions of relevant physical variables. The discriminant variable was defined as ln(LS/LB).

4.1 The bb̄l+l−νν̄ final state

The FCNC bb̄l+l−νν̄ final state events were preselected as described above, by requiring at least eight good charged-particle tracks, the visible energy measured at polar angles above 20° to be greater than 0.2√s, and at least one isolated lepton. Distributions of the relevant variables are shown in Fig. 3 for all the events assigned to this final state after the preselection. The event selection was performed in two levels.
In the first one, events were required to have at least two leptons and an effective centre-of-mass energy [21], √s′, below 0.95√s. The particles other than the two leptons in the events were clustered into two jets and the Durham resolution variable in the transition from two jets to one jet (see footnote 2) was required to be greater than 0.002. The number of data events and the SM expectation after the first selection level are shown in Table 4. The background composition and the signal efficiencies at this level of selection for mb′ = 100 GeV/c² and √s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and √s values were found to be the same within errors. Data, SM expectation and signal distributions at this selection level are shown in Fig. 4.

data (SM expectation ± statistical error)
√s (GeV)   e sample          µ sample          no-id sample
196        2 (2.6±0.3)       1 (2.9±0.3)       47 (35.9±1.4)
200        3 (2.5±0.4)       4 (3.4±0.4)       30 (37.4±1.4)
202        2 (1.3±0.2)       1 (1.7±0.2)       20 (18.7±0.7)
205        5 (2.5±0.4)       3 (3.0±0.4)       35 (36.2±1.4)
207        3 (2.3±0.4)       3 (3.1±0.4)       45 (35.1±1.3)
206∗       1 (1.9±0.3)       2 (2.6±0.2)       31 (27.6±1.0)
total      16 (13.2±0.8)     14 (16.7±0.8)     208 (191.0±3.0)

Table 4: First selection level of the bb̄l+l−νν̄ final state: the number of events selected in data and the SM expectations after the first selection level for each sample and centre-of-mass energy are shown.

In the final selection level the momentum of the more energetic (less energetic) jet was required to be below 30 GeV/c (12.5 GeV/c). Events in the e and no-id samples had to have a missing energy greater than 0.4√s. In the µ sample events were required to have an angle between the two muons greater than 125°. In the no-id sample, the angle between the two charged leptons had to be greater than 140° and pmis/Emis < 0.4, where pmis and Emis are the missing momentum and energy, respectively.

After the final selection, one data event was selected for an expected background of 1.5±0.7.
This event belonged to the no-id sample and was collected at √s = 200 GeV. The signal efficiencies for mb′ = 100 GeV/c2 and √s = 205 GeV are 30.6 ± 2.5% (e sample), 48.6 ± 2.7% (µ sample) and 7.2 ± 0.8% (no-id sample) and their variation with mb′ and √s was found to be negligible in the relevant range.

4.2 The bb̄qq̄νν̄ final state

The FCNC bb̄qq̄νν̄ final state is characterised by the presence of four jets and a missing mass close to mZ. At least 20 good charged-particle tracks and √s′ > 0.5√s were required. Events were clustered into four jets. Monojet-like events were rejected by requiring − log10(y2→1) < 0.7 (y2→1 is the Durham resolution variable in the two to one jet transition). Furthermore, − log10(y4→3) was required to be below 2.8 and the energy of the leading charged particle of the most energetic jet was required to be below 0.1√s. A kinematic fit imposing energy-momentum conservation and no missing energy was applied and the background-like events with χ2/n.d.f. < 6 were rejected. The data, SM expectation and signal distributions of this variable are shown in Fig. 5. Table 5 summarizes the number of selected data events and the SM expectation. The background composition and the signal efficiency at this level of selection for mb′ = 100 GeV/c2 and √s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and √s values were found to be the same within errors.

2 The Durham resolution variable is the minimum value of the scaled transverse momentum obtained in the transition from n to n − 1 jets [19] and will be represented by yn→n−1.

√s (GeV)    data (SM expectation ± statistical error)
196         123 (106.3±4.0)
200         111 (104.8±4.0)
202         50 (49.8±1.9)
205         88 (94.2±3.7)
207         99 (91.2±3.6)
206∗        62 (65.7±2.6)
total       533 (511.7±8.3)

Table 5: First selection level of the bb̄qq̄νν̄ final state: the number of events selected in data and the SM expectation for each centre-of-mass energy are shown.
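As a small cross-check of how the "total" row of Table 5 relates to the individual centre-of-mass energies, the sketch below sums the data and SM expectations and combines the statistical errors in quadrature. Combining in quadrature is an assumption made here for illustration (the text does not state how the total error was obtained), and the names `table5` and `combine` are hypothetical.

```python
import math

# (data, SM expectation, statistical error) per centre-of-mass energy, copied from Table 5
table5 = {
    "196": (123, 106.3, 4.0),
    "200": (111, 104.8, 4.0),
    "202": (50, 49.8, 1.9),
    "205": (88, 94.2, 3.7),
    "207": (99, 91.2, 3.6),
    "206*": (62, 65.7, 2.6),
}


def combine(rows):
    """Sum counts and expectations; add statistical errors in quadrature (assumed)."""
    rows = list(rows)
    data = sum(d for d, _, _ in rows)
    expected = sum(e for _, e, _ in rows)
    error = math.sqrt(sum(s * s for _, _, s in rows))
    return data, expected, error


total_data, total_expected, total_error = combine(table5.values())
print(total_data, round(total_expected, 1), round(total_error, 1))
```

Under this assumption the combined values agree with the quoted total row, 533 (511.7±8.3), up to the rounding of the tabulated inputs.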
A discriminant selection was then performed using the following variables to build the PDFs:
• the missing mass;
• Aj1j2cop × min(sin θj1 , sin θj2), where Aj1j2cop is the acoplanarity (footnote 3) and θj1,j2 are the polar angles of the jets when forcing the events into two jets (footnote 4);
• the acollinearity between the two most energetic jets (footnote 5) with the event particles clustered into four jets;
• the sum of the first and third Fox-Wolfram moments (h1 + h3) [22];
• the polar angle of the missing momentum.
The data, SM expectation and signal distributions of these variables are shown in Fig. 6.

4.3 The bb̄qq̄qq̄ final state

The FCNC bb̄qq̄qq̄ final state is characterised by the presence of six jets and a small missing energy. All the events were clustered into six jets and only those with at least 30 good charged-particle tracks were accepted. Moreover, events were required to have √s′ > 0.6√s, − log10(y2→1) < 0.7 and − log10(y6→5) < 3.6. The number of selected data events and the expected background at this level are shown in Table 6. The background composition and the signal efficiency at this level of selection for mb′ = 100 GeV/c2 and √s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and √s values were found to be the same within errors. A discriminant selection was performed using the following variables to build the PDFs:

3 The acoplanarity between two particles is defined as |180° − |φ1 − φ2||, where φ1,2 are the azimuthal angles of the two particles (in degrees).
4 While the signal is characterised by the presence of four jets in the final state, the two jets configuration is used mainly for background rejection.
5 The acollinearity between two particles is defined as 180° − α1,2, where α1,2 is the angle (in degrees) between those two particles.
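The acoplanarity and acollinearity definitions given in footnotes 3 and 5 can be written out directly. The following is a minimal sketch, not the analysis code: the function names are hypothetical, azimuthal angles are assumed to be given in degrees in the range [0°, 360°), and momenta are assumed to be plain (px, py, pz) tuples.

```python
import math


def acoplanarity_deg(phi1_deg, phi2_deg):
    """Footnote 3: |180 - |phi1 - phi2||, with azimuthal angles in degrees."""
    return abs(180.0 - abs(phi1_deg - phi2_deg))


def acollinearity_deg(p1, p2):
    """Footnote 5: 180 - alpha, where alpha is the opening angle in degrees."""
    dot = sum(a * b for a, b in zip(p1, p2))
    norm1 = math.sqrt(sum(a * a for a in p1))
    norm2 = math.sqrt(sum(b * b for b in p2))
    # Clamp to [-1, 1] to protect acos against floating-point rounding.
    cos_alpha = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return 180.0 - math.degrees(math.acos(cos_alpha))


# Back-to-back objects have zero acoplanarity and zero acollinearity.
print(acoplanarity_deg(10.0, 190.0))
print(acollinearity_deg((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)))
```

Both quantities vanish for perfectly back-to-back objects, which is why they help to separate two-jet-like background from the multi-jet signal topology.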
√s (GeV)    data (SM expectation ± statistical error)
196         349 (326.7±5.3)
200         347 (342.1±5.5)
202         165 (162.1±2.6)
205         322 (319.0±5.2)
207         287 (307.6±5.0)
206∗        192 (215.8±3.6)
total       1662 (1673.9±11.4)

Table 6: First selection level of the bb̄qq̄qq̄ and cc̄qq̄qq̄ final states: the number of events selected in data and the SM expectations for each centre-of-mass energy are shown.

• the Durham resolution variable, − log10(y4→3);
• the Durham resolution variable, − log10(y5→4);
• the acollinearity between the two most energetic jets, with the event forced into four jets;
• the sum of the first and third Fox-Wolfram moments;
• the momentum of the most energetic jet;
• the angle between the two most energetic jets (with the events clustered into six jets).
The distributions of these variables are shown in Fig. 7 for data, SM expectation and signal.

4.4 The cc̄qq̄l+ν final state

The signature of this CC final state is the presence of four jets (two of them having low energy), one isolated lepton and missing energy (originating from the W → lν̄ decay). The events were accepted if they had at least 15 good charged-particle tracks. The event particles other than the identified lepton were clustered into four jets. Part of the qq̄ and γγ background was rejected by requiring − log10(y2→1) < 0.7. Furthermore, there should be only one charged-particle track associated to the isolated lepton, and the leading charged particle of the most energetic jet was required to have a momentum below 0.1√s. The number of selected data events and SM expectations at this level are summarized in Table 7. The background composition and the signal efficiencies at this level of selection for mb′ = 100 GeV/c2 and √s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and √s values were found to be the same within errors.
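The discriminant variable ln(LS/LB) used in these analyses can be illustrated with binned PDFs. The sketch below is only an illustration under stated assumptions: the event likelihood is taken as the product of one-dimensional binned PDFs (one per variable), empty bins are given a small floor to keep the logarithm finite, and all names (`make_pdf`, `pdf_value`, `log_likelihood_ratio`) and numbers are hypothetical, not taken from the DELPHI analysis.

```python
import math


def make_pdf(values, edges, floor=1e-6):
    """Binned PDF from simulated values; empty bins are set to `floor` (assumed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [max(c / total, floor) for c in counts]


def pdf_value(pdf, edges, v):
    """Look up the PDF bin content for v; values outside the range use the edge bins."""
    for i in range(len(pdf)):
        if edges[i] <= v < edges[i + 1]:
            return pdf[i]
    return pdf[-1] if v >= edges[-1] else pdf[0]


def log_likelihood_ratio(event, signal_pdfs, background_pdfs, edges):
    """ln(LS/LB), with LS and LB taken as products of per-variable PDFs (assumed)."""
    ln_ls = sum(math.log(pdf_value(signal_pdfs[k], edges[k], x)) for k, x in event.items())
    ln_lb = sum(math.log(pdf_value(background_pdfs[k], edges[k], x)) for k, x in event.items())
    return ln_ls - ln_lb


# Toy example with a single variable and made-up simulated samples.
edges = {"missing_mass": [0.0, 50.0, 100.0, 150.0]}
signal_pdfs = {"missing_mass": make_pdf([90.0, 91.0, 92.0, 88.0], edges["missing_mass"])}
background_pdfs = {"missing_mass": make_pdf([10.0, 20.0, 30.0, 95.0], edges["missing_mass"])}

signal_like = log_likelihood_ratio({"missing_mass": 90.0}, signal_pdfs, background_pdfs, edges)
background_like = log_likelihood_ratio({"missing_mass": 20.0}, signal_pdfs, background_pdfs, edges)
print(signal_like, background_like)
```

Signal-like events obtain positive values of ln(LS/LB) and background-like events negative ones, which is the behaviour exploited when the shapes of the discriminant distributions are used in the limit derivation.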
The PDFs used to calculate the background and signal likelihoods were based on the following variables:
• the sum of the first and third Fox-Wolfram moments;
• the invariant mass of the two jets, with the event particles other than the identified lepton clustered into two jets;
• the Durham resolution variable, − log10(y4→3);
• Σ|p⃗i|/√s, where p⃗i are the momenta of the charged particles (excluding the lepton) in the same hemisphere as the lepton (the hemisphere is defined with respect to the lepton);
• the acollinearity between the two most energetic jets;
• the angle between the lepton and the missing momentum.
The data, SM expectation and signal distributions of these variables are shown in Fig. 8.

            data (SM expectation ± statistical error)
√s (GeV)    e                 µ                 no-id
196         65 (51.1±1.4)     53 (56.1±1.5)     38 (34.4±1.4)
200         54 (58.1±1.7)     63 (59.9±1.6)     40 (35.0±1.4)
202         30 (27.8±0.8)     21 (28.4±0.8)     13 (16.9±0.7)
205         56 (50.8±1.5)     66 (53.6±1.5)     32 (33.3±1.4)
207         53 (53.8±1.6)     48 (57.2±1.6)     35 (33.8±1.4)
206∗        31 (37.2±1.4)     42 (39.3±1.1)     21 (23.4±1.0)
total       289 (278.8±3.5)   293 (294.5±3.4)   179 (176.8±2.8)

Table 7: First selection level of the cc̄qq̄l+ν final state: the number of events selected in data and the SM expectations for each sample and centre-of-mass energy are shown.

In order to improve the efficiency, events with no leptons seen in the detector were kept in a fourth sample. For this sample, the selection criteria of the bb̄qq̄νν̄ final state were applied and the same variables as in section 4.2 were used to build the PDFs. The signal efficiency after the first selection level for mb′ = 100 GeV/c2 and √s = 205 GeV was 8.9±0.9%. The efficiencies for the other relevant b′ masses and √s values were found to be the same within errors.

4.5 The cc̄qq̄qq̄ final state

This final state is very similar to bb̄qq̄qq̄ (with slightly different kinematics due to the mass difference between the Z and the W). The analysis described in section 4.3 was thus adopted.
The number of selected events and the SM expectations can be found in Table 6. At this level, the signal efficiency for mb′ = 100 GeV/c2 and √s = 205 GeV was 67.3±1.5%. The efficiencies for the other b′ masses and centre-of-mass energies were the same within errors. The PDFs were built using the same set of variables as in section 4.3.

5 Results

For all final states, a good agreement between data and SM expectation was found. The summary of the total number of selected data events, SM expectations, the corresponding background composition and the signal efficiencies for the studied final states are shown in Table 8. In the bb̄l+l−νν̄ final state, one data event was retained after the final selection level, for a SM expectation of 1.5 ± 0.7 events. This event belonged to the no-id sample and was collected at √s = 200 GeV. For all the other final states, discriminant analyses were used. In these cases, a discriminant variable, ln(LS/LB), was defined. The distributions of ln(LS/LB), for the different analysis channels are shown in Fig. 9. No evidence for a signal was found in any of the channels and the full information, i.e. event numbers and the shapes of the distributions of the discriminant variables were used to derive limits on BR(b′ → bZ) and BR(b′ → cW).

                                      data                   background composition (%)   signal
final state                           (SM ± stat. error)     qq̄    WW    ZZ    γγ         efficiency (%)
bb̄l+l−νν̄ (first selection level)
    e sample                          16 (13.2±0.8)          16    16    68    0          35.1±2.6
    µ sample                          14 (16.7±0.8)          0     10    90    0          53.4±2.7
    no-id sample                      208 (191.0±3.0)        8     80    12    0          12.3±1.0
bb̄qq̄νν̄                               533 (511.7±8.3)        76    17    2     5          57.6±1.7
bb̄qq̄qq̄                               1662 (1673.9±11.4)     35    65    0     0          66.0±1.5
cc̄qq̄l+ν
    e sample                          289 (278.8±3.5)        7     82    11    0          45.3±2.7
    µ sample                          293 (294.5±3.4)        2     97    1     0          56.4±2.7
    no-id sample                      179 (176.8±2.8)        9     84    7     0          5.3±0.7
    no lepton sample                  533 (511.7±8.3)        76    17    2     5          8.9±0.9
cc̄qq̄qq̄                               1662 (1673.9±11.4)     35    65    0     0          67.3±1.5

Table 8: Summary of the total number of selected data events and SM expectations for the studied final states after the final selection (first selection level for bb̄l+l−νν̄). The corresponding background composition and signal efficiencies for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown.

5.1 Limits on BR(b′ → bZ) and BR(b′ → cW)

Upper limits on the product of the e+e− → b′b̄′ cross-section and the branching ratio as a function of the b′ mass were derived at 95% confidence level (CL) in each of the considered b′ decay modes (FCNC and CC), taking into account the values of the discriminant variables and their expected distributions for signal and background, the signal efficiencies and the data luminosities at the various centre-of-mass energies. Assuming the SM cross-section for the pair production of heavy quarks at LEP [7,14], these limits were converted into limits on the branching ratios corresponding to the b′ → bZ and b′ → cW decay modes. The modified frequentist likelihood ratio method [23] was used. The different final states and centre-of-mass energy bins were treated as independent channels. For each b′ mass only the channels with √s > 2mb′ were considered. In order to avoid some non-physical fluctuations of the distributions of the discriminant variables due to the limited statistics of the generated events, a smoothing algorithm was used. The median expected limit, i.e.
the limit obtained if the SM background was the only contribution in data, was also computed. In Fig. 10 the observed and expected limits on BR(b′ → bZ) and BR(b′ → cW) are shown as a function of the b′ mass. The 1σ and 2σ bands around the expected limit are also shown. The observed and expected limits are statistically compatible. At 95% CL and for mb′ = 96 GeV/c2, the BR(b′ → bZ) and BR(b′ → cW) have to be below 51% and 43%, respectively. These limits were evaluated taking into account the systematic uncertainties, as explained in the next subsection. The limits obtained for BR(b′ → bZ) are compatible with those presented by CDF [10] for a b′ mass of 100 GeV/c2. Below this mass, the DELPHI result is more sensitive and the CDF limit degrades rapidly. For higher b′ masses, the LEP-II kinematical limit is reached and the present analysis loses sensitivity.

5.2 Systematic uncertainties

The evaluation of the limits was performed taking into account systematic uncertainties, which affect the background estimation, the signal efficiency and the shape of the distributions used. The following systematic uncertainties were considered:
• SM cross-sections: uncertainties on the SM cross-sections translate into uncertainties on the expected number of background events. The overall uncertainty on the most relevant SM background processes for the present analyses is typically less than 2% [24], which leads to relative changes on the branching ratio limits below 6%;
• Signal generation: uncertainties on the final state quark hadronisation and fragmentation modelling were studied. The Lund symmetric fragmentation function was tested and compared with schemes where the b and c quark masses are taken into account [14]. This systematic error source was estimated to be of the order of 20% in the signal efficiency, by conservatively taking the maximum observed variation.
The relative effect on the branching ratio limits is below 16%;
• Smoothing: the uncertainty associated to the discriminant variables smoothing was estimated by applying different smoothing algorithms. The smoothing procedure does not change the number of SM expected events or the signal efficiency, but may lead to differences in the shape of the discriminant variables. The relative effect of this uncertainty on the limits evaluation was found to be below 9%.
Further details on the evaluation of the systematic errors and the derivation of limits can be found in [25].

6 Constraints on RCKM

The branching ratios for the b′ decays can be computed within a four generations sequential model [5–7]. As discussed before, if the b′ is lighter than both the t and the t′ quarks and mZ < mb′ < mH, the main contributions to the b′ width are BR(b′ → bZ) and BR(b′ → cW) [7]. Using the unitarity of the CKM matrix, its approximate diagonality (Vub′Vub ≈ 0) and taking Vcb ≈ 10−2 [12], the branching fractions can be written as a function of three variables: RCKM = |Vcb′/(Vtb′Vtb)|, mt′ and mb′ [5–7]. Fixing mt′ − mb′, the limits on BR(b′ → bZ) and BR(b′ → cW) (Fig. 10) can be translated into 95% CL bounds on RCKM as a function of mb′. Two extreme cases were considered: the almost degenerate case, with mt′ − mb′ = 1 GeV/c2, and the case in which the mass difference is close to the largest possible value, mt′ − mb′ = 50 GeV/c2 [3,5]. The results are shown in Fig. 11 and Fig. 12. In the figures, the upper curve was obtained from the limit on BR(b′ → cW), while the lower curve was obtained from the limit on BR(b′ → bZ), which decreases with growing mt′. This suppression is due to the GIM mechanism [26] as mt′ approaches mt. On the other hand, as the b′ mass approaches the bZ threshold, the b′ → bg decay dominates over b′ → bZ [7] and the lower limit on RCKM becomes less stringent.
The expected limits on BR(b′ → bZ) did not allow exclusions to be set for low values of RCKM and mt′ − mb′ = 1 GeV/c2 (see Fig. 11).

7 Conclusions

The data collected with the DELPHI detector at √s = 196−209 GeV show no evidence for the pair production of b′-quarks with masses ranging from 96 to 103 GeV/c2. Assuming the SM cross-section for the pair production of heavy quarks at LEP, 95% CL upper limits on BR(b′ → bZ) and BR(b′ → cW) were obtained. It was shown that, at 95% CL and for mb′ = 96 GeV/c2, the BR(b′ → bZ) and BR(b′ → cW) have to be below 51% and 43%, respectively. The 95% CL upper limits on the branching ratios, combined with the predictions of the sequential fourth generation model, were used to exclude regions of the (RCKM, mb′) plane for two hypotheses of the mt′ − mb′ mass difference. It was shown that, for mt′ − mb′ = 1 (50) GeV/c2 and 96 GeV/c2 < mb′ < 102 GeV/c2, RCKM is bounded by an upper limit of 3.8×10−3 (1.2×10−3). For mb′ = 100 GeV/c2 and mt′ − mb′ = 50 GeV/c2, the CKM ratio was constrained to be in the range 4.6 × 10−4 < RCKM < 7.8 × 10−4.

Acknowledgements

We are greatly indebted to our technical collaborators, to the members of the CERN-SL Division for the excellent performance of the LEP collider, and to the funding agencies for their support in building and operating the DELPHI detector.
We acknowledge in particular the support of Austrian Federal Ministry of Education, Science and Culture, GZ 616.364/2-III/2a/98, FNRS–FWO, Flanders Institute to encourage scientific and technological research in the industry (IWT) and Belgian Federal Office for Scientific, Technical and Cultural affairs (OSTC), Belgium, FINEP, CNPq, CAPES, FUJB and FAPERJ, Brazil, Czech Ministry of Industry and Trade, GA CR 202/99/1362, Commission of the European Communities (DG XII), Direction des Sciences de la Matière, CEA, France, Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie, Germany, General Secretariat for Research and Technology, Greece, National Science Foundation (NWO) and Foundation for Research on Matter (FOM), The Netherlands, Norwegian Research Council, State Committee for Scientific Research, Poland, SPUB-M/CERN/PO3/DZ296/2000, SPUB-M/CERN/PO3/DZ297/2000, 2P03B 104 19 and 2P03B 69 23(2002-2004) FCT - Fundação para a Ciência e Tecnologia, Portugal, Vedecka grantova agentura MS SR, Slovakia, Nr. 95/5195/134, Ministry of Science and Technology of the Republic of Slovenia, CICYT, Spain, AEN99-0950 and AEN99-0761, The Swedish Research Council, Particle Physics and Astronomy Research Council, UK, Department of Energy, USA, DE-FG02-01ER41155, EEC RTN contract HPRN-CT-00292-2002.

References

[1] The LEP Collaborations ALEPH, DELPHI, L3, OPAL and the LEP Electroweak Working Group, A Combination of Preliminary Electroweak Measurements and Constraints on the Standard Model (2005) CERN-PH-EP/2005-051, hep-ex/0511027; ALEPH, DELPHI, L3, OPAL and SLD Coll., LEP Electroweak Working Group, SLD Heavy Flavour Groups, Phys. Rept. 427 (2006) 257.
[2] V.A. Novikov, L.B. Okun, A.N. Rozanov and M.I. Vysotsky, Phys. Lett. B529 (2002)
[3] P.H. Frampton, P.Q. Hung and M. Sher, Phys. Rep. 330 (2000) 263.
[4] A. Djouadi et al. in Electroweak symmetry breaking and new physics at the TeV scale, ed. Barklow, Timothy - World Scientific, Singapore (1997).
[5] A. Arhrib and W.S. Hou, Phys. Rev. D64 (2001) 073016; A. Arhrib and W.S. Hou, JHEP 0607 (2006) 009.
[6] W.S. Hou and R.G. Stuart, Phys. Rev. Lett. 62 (1989) 617; W.S. Hou and R.G. Stuart, Nucl. Phys. B320 (1989) 277; W.S. Hou and R.G. Stuart, Nucl. Phys. B349 (1991) 91.
[7] S.M. Oliveira and R. Santos, Phys. Rev. D68 (2003) 093012; S.M. Oliveira and R. Santos, Acta Phys. Polon. B34 (2003) 5523.
[8] ALEPH Coll., D. Decamp et al., Phys. Lett. B236 (1990) 511; DELPHI Coll., P. Abreu et al., Nucl. Phys. B367 (1991) 511; L3 Coll., O. Adriani et al., Phys. Rep. 236 (1993) 1; OPAL Coll., M.Z. Akrawy et al., Phys. Lett. B246 (1990) 285.
[9] D0 Coll., S. Abachi et al., Phys. Rev. Lett. 78 (1997) 3818.
[10] CDF Coll., T. Affolder et al., Phys. Rev. Lett. 84 (2000) 835.
[11] D0 Coll., S. Abachi et al., Phys. Rev. D52 (1995) 4877.
[12] Particle Data Group, W.-M. Yao et al., J. Phys. G33 (2006) 1.
[13] DELPHI Coll., P. Aarnio et al., Nucl. Instr. Meth. A303 (1991) 233; DELPHI Coll., P. Abreu et al., Nucl. Instr. Meth. A378 (1996) 57.
[14] T. Sjöstrand, Comp. Phys. Comm. 82 (1994) 74; T. Sjöstrand, PYTHIA 5.7 and JETSET 7.4, CERN-TH/7112-93; T. Sjöstrand et al., Comp. Phys. Comm. 135 (2001) 238.
[15] E. Accomando and A. Ballestrero, Comp. Phys. Comm. 99 (1997) 270; E. Accomando, A. Ballestrero and E. Maina, Comp. Phys. Comm. 150 (2003) 166; A. Ballestrero, R. Chierici, F. Cossutti and E. Migliore, Comp. Phys. Comm. 152 (2003) 175.
[16] S. Jadach, B.F.L. Ward and Z. Was, Comp. Phys. Comm. 130 (2000) 260.
[17] S. Jadach, W. Płaczek and B.F.L. Ward, Phys. Lett. B390 (1997) 298.
[18] F. Cossutti et al., REMCLU: a package for the Reconstruction of ElectroMagnetic CLUsters at LEP200, DELPHI Note 2000-164 PROG 242, http://delphiwww.cern.ch/pubxx/delnote/public/2000 164 prog 242.ps.gz.
[19] S. Catani et al., Phys. Lett. B269 (1991) 432.
[20] DELPHI Coll., J. Abdallah et al., Eur. Phys. J. C32 (2004) 185.
[21] P. Abreu et al., Nucl. Instr. Meth. A427 (1999) 487.
[22] G. Fox and S.
Wolfram, Phys. Lett. B82 (1979) 134.
[23] A.L. Read, CERN report 2000-005 (2000) 81, “Workshop on Confidence Limits”, edited by F. James, L. Lyons and Y. Perrin.
[24] S. Jadach et al., LEP2 Monte Carlo Workshop: Report of the Working Groups on Precision Calculations for LEP2 Physics, CERN report 2000-009 (2000); G. Altarelli et al., Physics at LEP2, CERN report 96-01 (1996).
[25] N. Castro, Search for a fourth generation b′-quark at LEP-II. MSc. Thesis, Instituto Superior Técnico da Universidade Técnica de Lisboa (2004), CERN-THESIS-2005-
[26] S. Glashow, J. Iliopoulos and L. Maiani, Phys. Rev. D2 (1970) 1285.

Figure 1: The Feynman diagrams corresponding to the b′ (a) FCNC and (b) CC decay modes are shown.

Figure 2: The final states associated to the b′ (a) FCNC and (b) CC decay modes are shown. Only those states analysed here are indicated.

Figure 3: Data and SM expectation after the preselection level for the bb̄l+l−νν̄ final state and centre-of-mass energies above 200 GeV. (a) The angle between the most energetic lepton and the closest charged-particle track (e sample), (b) the missing momentum (µ sample) and (c) the momentum of the most energetic jet (no-id sample) are shown. The signal distributions for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown with arbitrary normalisation. The background composition is 11% of qq̄, 69% of WW, 15% of ZZ and 5% of γγ for the e sample, 6% of qq̄, 90% of WW and 4% of ZZ for the µ sample and 45% of qq̄, 48% of WW, 5% of ZZ and 2% of γγ for the no-id sample.
Figure 4: Data and SM expectation after the first selection level for the bb̄l+l−νν̄ final state and for centre-of-mass energies above 200 GeV. (a) The momentum of the most energetic jet (e sample), (b) the angle between the two leptons (µ sample) and (c) the ratio between the missing momentum and missing energy (no-id sample) are shown. The signal distributions for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown with arbitrary normalisation. The arrows represent the cuts applied in the second selection level.

Figure 5: Comparison of data and SM expectation distributions of the χ2/n.d.f. of the fit imposing energy-momentum conservation and no missing energy for the bb̄qq̄νν̄ final state at centre-of-mass energies above 200 GeV. The arrow shows the applied cut. The signal for mb′ = 100 GeV/c2 and √s = 205 GeV is also shown with arbitrary normalisation.

Figure 6: Variables used in the discriminant analysis (bb̄qq̄νν̄ final state). The data and SM expectation distributions for centre-of-mass energies above 200 GeV are shown for (a) the missing mass, (b) Aj1j2cop × min(sin θj1 , sin θj2), where Aj1j2cop is the acoplanarity and θj1,j2 are the polar angles of the jets when forcing the events into two jets, (c) the acollinearity between the two most energetic jets (with the event particles clustered into four jets), (d) the sum of the first and third Fox-Wolfram moments and (e) the polar angle of the missing momentum.
The signal distributions for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown with arbitrary normalisation.

Figure 7: Variables used in the discriminant analysis (bb̄qq̄qq̄ final state). The data and SM expectation for centre-of-mass energies above 200 GeV are shown for (a) − log10(y4→3), (b) − log10(y5→4), (c) the acollinearity between the two most energetic jets, with the events clustered into four jets (see text for explanation), (d) the h1 + h3 Fox-Wolfram moments sum, (e) the momentum of the most energetic jet and (f) the angle between the two most energetic jets. The signal distributions for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown with arbitrary normalisation.

Figure 8: Variables used in the discriminant analysis (cc̄qq̄l+ν final state). The data events and background expectation for centre-of-mass energies above 200 GeV are shown for (a) the h1 + h3 Fox-Wolfram moments sum (e sample), (b) the invariant mass of the two jets with the events clustered into two jets (e sample), (c) − log10(y4→3) (µ sample), (d) Σ|p⃗i|/√s, where p⃗i are the momenta of the charged particles (excluding the lepton) in the same hemisphere as the lepton (µ sample), (e) the acollinearity between the two most energetic jets (no-id sample) and (f) the angle between the lepton and the missing momentum (no-id sample). The signal distributions for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown with arbitrary normalisation.
Figure 9: Discriminant variables ln(LS/LB) for data and SM simulation (centre-of-mass energies above 200 GeV). FCNC b′ decay mode: (a) bb̄qq̄νν̄ and (b) bb̄qq̄qq̄. CC b′ decay mode: (c) cc̄qq̄l+ν (e sample), (d) cc̄qq̄l+ν (µ sample), (e) cc̄qq̄l+ν (no-id sample), (f) cc̄qq̄l+ν (no lepton sample) and (g) cc̄qq̄qq̄. The signal distributions for mb′ = 100 GeV/c2 and √s = 205 GeV are also shown with arbitrary normalisation.

Figure 10: The observed and expected upper limits at 95% CL on (a) BR(b′ → bZ) and (b) BR(b′ → cW) are shown as a function of mb′ (GeV/c2). The 1σ and 2σ bands around the expected limit are also presented. Systematic errors were taken into account in the limit evaluation.

Figure 11: The excluded region in the plane (RCKM, mb′) with mt′ − mb′ = 1 GeV/c2, obtained from the 95% CL upper limits on BR(b′ → bZ) (bottom) and BR(b′ → cW) (top) is shown. The light and dark shadings correspond to the observed and expected limits, respectively. The expected limits on BR(b′ → bZ) did not allow exclusions to be set for low values of RCKM.

Figure 12: The excluded region in the plane (RCKM, mb′) with mt′ − mb′ = 50 GeV/c2, obtained from the 95% CL upper limits on BR(b′ → bZ) (bottom) and BR(b′ → cW) (top) is shown. The light and dark shadings correspond to the observed and expected limits, respectively.

ABSTRACT

A search for the pair production of fourth generation b'-quarks was performed using data taken by the DELPHI detector at LEP-II.
The analysed data were collected at centre-of-mass energies ranging from 196 to 209 GeV, corresponding to an integrated luminosity of 420 pb^{-1}. No evidence for a signal was found. Upper limits on BR(b' -> bZ) and BR(b' -> cW) were obtained for b' masses ranging from 96 to 103 GeV/c^2. These limits, together with the theoretical branching ratios predicted by a sequential four generations model, were used to constrain the value of R_{CKM}=|V_{cb'}/V_{tb'}V_{tb}|, where V_{cb'}, V_{tb'} and V_{tb} are elements of the extended CKM matrix.
<|endoftext|><|startoftext|>
Introduction

The main concern of this paper is the curvature of a special family of warped pseudo-metrics on product manifolds. We introduce a suitable form for the relations among the involved curvatures in such metrics and apply them to the existence and/or construction of Einstein and constant scalar curvature metrics in this family. Let $B = (B_m, g_B)$ and $F = (F_k, g_F)$ be two pseudo-Riemannian manifolds of dimensions $m \geq 1$ and $k \geq 0$, respectively and also let $B \times F$ be the usual product manifold of $B$ and $F$. For a given smooth function $w \in C^\infty_{>0}(B) = \{v \in C^\infty(B) : v(x) > 0, \forall x \in B\}$, the warped product $B \times_w F = ((B \times_w F)_{m+k}, g = g_B + w^2 g_F)$ was defined by Bishop and O’Neill in [19] in order to study manifolds of negative curvature.

Date: November 4, 2018.
1991 Mathematics Subject Classification. Primary: 53C21, 53C25, 53C50 Secondary: 35Q75, 53C80, 83E15, 83E30.
Key words and phrases. Warped products, conformal metrics, Ricci curvature, scalar curvature, semilinear equations, positive solutions, Lichnerowicz-York equation, concave-convex nonlinearities, Kaluza-Klein theory, string theory.
http://arxiv.org/abs/0704.0595v1
FERNANDO DOBARRO & BÜLENT ÜNAL

In this article, we deal with a particular class of warped products, i.e. when the pseudo-metric in the base is affected by a conformal change.
Precisely, for given smooth functions $c, w \in C^\infty_{>0}(B)$ we will call $((B \times F)_{m+k}, g = c^2 g_B + w^2 g_F)$ a $[c, w]$-base conformal warped product (briefly $[c, w]$-bcwp), denoted by $B \times_{[c,w]} F$. We will concentrate our attention on a special subclass of this structure, namely when there is a relation between the conformal factor $c$ and the warping function $w$ of the form $c = w^\mu$, where $\mu$ is a real parameter, and we will call the $[\psi^\mu, \psi]$-bcwp a $(\psi, \mu)$-bcwp. Note that we generically called the latter case special base conformal warped products, briefly sbcwp, in [29]. As we will explain in §2, metrics of this type play a relevant role in several topics of differential geometry and theoretical physics (see also [29]). This article concerns curvature related questions of these metrics which are of interest not only in the applications, but also from the points of view of differential geometry and the type of the involved nonlinear partial differential equations (PDE), such as those with concave-convex nonlinearities and the Lichnerowicz-York equations.

The article is organized in the following way: in §2, after a brief description of several fields where pseudo-metrics described as above are applied, we formulate the curvature problems that we deal with in the next sections and give the statements of the main results. In §3, we state Theorems 2.2 and 2.3 in order to express the Ricci tensor and scalar curvature of a $(\psi, \mu)$-bcwp and sketch their proofs (see [29, Section 3] for detailed computations). In §4 and 5, we establish our main results about the existence of $(\psi, \mu)$-bcwp’s of constant scalar curvature with compact Riemannian base.

2. Motivations and Main results

As we announced in the introduction, we firstly want to mention some of the major fields of differential geometry and theoretical physics where base conformal warped products are applied.
i: In the construction of a large class of non trivial static anti de Sitter vacuum space-times
• In the Schwarzschild solutions of the Einstein equations (see [10, 18, 41, 59, 69, 74]).
• In the Riemannian Schwarzschild metric, namely (see [10]).
• In the “generalized Riemannian anti de Sitter $T^2$ black hole metrics” (see §3.2 of [10] for details).
• In the Bañados-Teitelboim-Zanelli (BTZ) and de Sitter (dS) black holes (see [1, 15, 16, 28, 45] for details).

ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS

Indeed, all of them can be generated by an approach of the following type: let $(F_2, g_F)$ be a pseudo-Riemannian manifold and $g$ be a pseudo-metric on $\mathbb{R}_+ \times \mathbb{R} \times F_2$ defined by

(2.1) $g = \frac{1}{u^2(r)}\,dr^2 \pm u^2(r)\,dt^2 + r^2 g_F$.

After the change of variables $s = r^2$, $y = \frac{1}{2}t$, there results $ds^2 = 4r^2 dr^2$ and $dy^2 = \frac{1}{4}dt^2$. Then (2.1) is equivalent to

(2.2) $g = \frac{1}{\sqrt{s}}\left[\frac{1}{4\sqrt{s}\,u^2(\sqrt{s})}\,ds^2 \pm 4\sqrt{s}\,u^2(\sqrt{s})\,dy^2\right] + s\,g_F = \left(s^{\frac{1}{2}}\right)^{2(-\frac{1}{2})}\left[\left(2s^{\frac{1}{4}}u(s^{\frac{1}{2}})\right)^{2(-1)}ds^2 \pm \left(2s^{\frac{1}{4}}u(s^{\frac{1}{2}})\right)^{2}dy^2\right] + \left(s^{\frac{1}{2}}\right)^{2}g_F$.

Note that roughly speaking, $g$ is a nested application of two $(\psi, \mu)$-bcwp’s. That is, on $\mathbb{R}_+ \times \mathbb{R}$ and taking

(2.3) $\psi_1(s) = 2s^{\frac{1}{4}}u(s^{\frac{1}{2}})$ and $\mu_1 = -1$,

the metric inside the brackets in the last member of (2.2) is a $(\psi_1, \mu_1)$-bcwp, while the metric $g$ on $(\mathbb{R}_+ \times \mathbb{R}) \times F_2$ is a $(\psi_2, \mu_2)$-bcwp with

(2.4) $\psi_2(s, y) = s^{\frac{1}{2}}$ and $\mu_2 = -\frac{1}{2}$.

In the last section of [29], through the application of Theorems 2.2 and 2.3 below and several standard computations, we generalized the latter approach to the case of an Einstein fiber $(F_k, g_F)$ with dimension $k \geq 2$.

ii: In the study of the equivariant isometric embeddings of space-time slices in Minkowski spaces (see [39, 38]).

iii: In the Kaluza-Klein theory (see [76, §7.6, Particle Physics and Geometry], [60] and [77]) and in the Randall-Sundrum theory [30, 40, 63, 64, 65, 71] with $\mu$ as a free parameter. For example, in [46] the following metric is considered

(2.5) $e^{2A(y)} g_{ij}\,dx^i dx^j + e^{2B(y)}\,dy^2$,

with the notation $\{x^i\}$, $i = 0, 1, 2, 3$ for the coordinates in the 4-dimensional space-time and $x^5 = y$ for the fifth coordinate on an extra dimension.
In particular, Ito takes the ansatz
(2.6) $B = \alpha A$,
which corresponds exactly to our sbcwp metrics, considering $g_B = dy^2$, $g_F = g_{ij}\,dx^i dx^j$, $\psi(y) = e^{B(y)/\alpha} = e^{A(y)}$ and $\mu = \alpha$.

iii: In String and Supergravity theories, for instance, in the Maldacena conjecture about the duality between compactifications of M/string theory on various Anti-de Sitter space-times and various conformal field theories (see [55, 62]) and in warped compactifications (see [40, 72] and references therein). Besides all of these, metrics of this type also occur frequently in string topics (see [33, 34, 35, 36, 37, 53, 61, 71] and also [1, 12, 67] for some reviews of these topics).

iv: In the derivation of effective theories for warped compactification of supergravity and the Hořava-Witten model (see [50, 51]). For instance, in [51] the ansatz $ds^2 = h^{\alpha} ds^2(X_4) + h^{\beta} ds^2(Y)$ is considered, where $X_4$ is a four-dimensional space-time with coordinates $x^\mu$, $Y$ is a Calabi-Yau manifold (the so-called internal space) and $h$ depends on the four-dimensional coordinates $x^\mu$, in order to study the dynamics of the four-dimensional effective theory. We note that in those articles the structure of the expressions of the Ricci tensor and scalar curvature of the involved metrics is particularly useful. We observe that they correspond to very particular cases of the expressions obtained by us in [29]; see also Theorems 2.2 and 2.3 and Proposition 2.4 stated below.

v: In the discussion of Birkhoff-type theorems (generally speaking, these are the theorems in which the gravitational vacuum solutions admit more symmetry than the inserted metric ansatz; see [41, page 372] and [17, Chapter 3] for rigorous statements), especially in Equation 6.1 of [66], where H.-J. Schmidt considers a special form of a bcwp and basically shows that if a bcwp of this form is Einstein, then it admits one Killing vector more than the fiber.
In order to achieve that, the author considers, for a specific value of $\mu$, namely $\mu = (1-k)/2$, the following problem: Does there exist a smooth function $\psi \in C^\infty_{>0}(B)$ such that the corresponding $(\psi, \mu)$-bcwp $(B_2 \times F_k,\, \psi^{2\mu} g_B + \psi^2 g_F)$ is an Einstein manifold? (See also (Pb-Eins.) below.)

vi: In the study of bi-conformal transformations, bi-conformal vector fields and their applications (see [32, Remark in Section 7] and [31, Sections 7 and 8]).

vii: In the study of the spectrum of the Laplace-Beltrami operator for $p$-forms. For instance, in Equation (1.1) of [11], the author considers the following structure: let $\overline{M}$ be an $n$-dimensional compact Riemannian manifold with boundary, and let $y$ be a boundary-defining function; she endows the interior $M$ of $\overline{M}$ with a Riemannian metric $ds^2$ such that in a small tubular neighborhood of $\partial M$ in $M$, $ds^2$ takes the form
(2.7) $ds^2 = e^{-2(a+1)t}\,dt^2 + e^{-2bt}\,d\theta^2_{\partial M}$,
where $t := -\log y \in (c, +\infty)$ and $d\theta^2_{\partial M}$ is the Riemannian metric on $\partial M$ (see [11, 56] and references therein for details).

Notation 2.1. From now on, we will use the Einstein summation convention over repeated indices and consider only connected manifolds. Furthermore, we will denote the Laplace-Beltrami operator on a pseudo-Riemannian manifold $(N, h)$ by $\Delta_N(\cdot)$, i.e., $\Delta_N(\cdot) = \nabla^{N\,i}\nabla^{N}_{i}(\cdot)$. Note that $\Delta_N$ is elliptic if $(N, h)$ is Riemannian and hyperbolic if $(N, h)$ is Lorentzian. If $(N, h)$ is neither Riemannian nor Lorentzian, then the operator is ultra-hyperbolic.

Furthermore, we will consider the Hessian of a function $v \in C^\infty(N)$, denoted by $H^v_h$ or $H^v_N$, so that the second covariant differential of $v$ is given by $H^v_h = \nabla(\nabla v)$. Recall that the Hessian is a symmetric $(0, 2)$ tensor field satisfying
(2.8) $H^v_h(X, Y) = XYv - (\nabla_X Y)v = h(\nabla_X(\operatorname{grad} v), Y)$,
for any smooth vector fields $X, Y$ on $N$.
For a given pseudo-Riemannian manifold N = (N,h) we will denote its Riemann curvature tensor, Ricci tensor and scalar curvature by RN , RicN and SN , respectively. We will denote the set of all lifts of all vector fields of B by L(B). Note that the lift of a vector field X on B denoted by X̃ is the vector field on B × F given by dπ(X̃) = X where π : B × F → B is the usual projection map. In Section 3, we will sketch the proofs of the following two theorems related to the Ricci tensor and the scalar curvature of a generic (ψ, µ)-bcwp. Theorem 2.2. Let B = (Bm, gB) and F = (Fk, gF ) be two pseudo-Rieman- nian manifolds with m ≥ 3 and k ≥ 1, respectively and also let µ ∈ R \ {0, 1, µ, µ±} be a real number with µ := − k m− 2 and µ± := µ± µ2 − µ. Suppose ψ ∈ C∞>0(B). Then the Ricci curvature tensor of the corresponding (ψ, µ)-bcwp, denoted by Ric verifies the relation Ric = RicB + β − β∆ 1 α∆ gB on L(B)× L(B), Ric = 0 on L(B)× L(F ), Ric = RicF − ψ2(µ−1) α∆ gF on L(F )× L(F ), (2.9) 6 FERNANDO DOBARRO & BÜLENT ÜNAL where (2.10) (m− 2)µ+ k , (m− 2)µ+ k , −[(m− 2)µ + k] µ[(m− 2)µ+ k] + k(µ− 1) , [(m− 2)µ + k]2 µ[(m− 2)µ+ k] + k(µ− 1) . Theorem 2.3. Let B = (Bm, gB) and F = (Fk, gF ) be two pseudo-Rieman- nian manifolds of dimensions m ≥ 2 and k ≥ 0, respectively. Suppose that SB and SF denote the scalar curvatures of B = (Bm, gB) and F = (Fk, gF ), respectively. If µ ∈ R and ψ ∈ C∞>0(B), then the scalar curvature S of the corresponding (ψ, µ)-bcwp verifies, (i) If µ 6= − k m− 1 , then (2.11) − β∆Bu+ SBu = Su2µα+1 − SFu2(µ−1)α+1 where (2.12) α = 2[k + (m− 1)µ] {[k + (m− 1)µ] + (1− µ)}k + (m− 2)µ[k + (m− 1)µ] , (2.13) β = α2[k + (m− 1)µ] > 0 and ψ = uα > 0. (ii) If µ = − k m− 1 , then (2.14) − k |∇Bψ|2B m−1 [S − SFψ−2]− SB. From the mathematical and physical points of view, there are several interesting questions about (ψ, µ)-bcwp’s. In [29] we began the study of existence and/or construction of Einstein (ψ, µ)-bcwp’s and those of constant scalar curvature. 
These questions are closely connected to Theorems 2.2 and In [29], by applying Theorem 2.2, we give suitable conditions that allow us to study some particular cases of the problem: (Pb-Eins.) Given µ ∈ R, does there exist a smooth function ψ ∈ C∞>0(B) such that the corresponding (ψ, µ)-bcwp is an Einstein manifold? ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 7 In particular, we obtain the following result as an immediate corollary of Theorem 2.2. Proposition 2.4. Let us assume the hypothesis of Theorem 2.2. Then the corresponding (ψ, µ)-bcwp is an Einstein manifold with λ if and only if (F, gF ) is Einstein with ν constant and the system that follows is verified λψ2µgB = RicB + β − β∆ 1 α∆ gB on L(B)× L(B) λψ2 = ν − 1 ψ2(µ−1) (2.15) where the coefficients are given by (2.10). Compare the system (2.15) with the well known one for a classical warped product in [18, 49, 59]. By studying (2.15), we have obtained the generaliza- tion of the construction exposed in the above motivational examples in i and v, among other related results. We suggest the interested reader consider the results about the problem (Pb-Eins.) stated in [29]. Now, we focus on the problems which we will deal in §4. Let B = (Bm, gB) and F = (Fk, gF ) be pseudo-Riemannian manifolds. There is an extensive number of publications about the well known Yamabe problem namely: (Ya) [79, 75, 68, 13] Does there exist a function ϕ ∈ C∞>0(B) such that (Bm, ϕ m−2 gB) has constant scalar curvature? Analogously, in several articles the following problem has been studied: (cscwp) [27] Is there a function w ∈ C∞>0(B) such that the warped product B ×w F has constant scalar curvature? In the sequel we will suppose that B = (Bm, gB) is a Riemannian manifold. Thus, both problems bring to the study of the existence of positive solutions for nonlinear elliptic equations on Riemannian manifolds. 
The nonlinearities involved are powers with Sobolev critical exponent for the Yamabe problem and sublinear (linear if the dimension $k$ of the fiber is 3) for the problem of constant scalar curvature of a warped product.

In Section 4, we deal with a mixed problem between (Ya) and (cscwp), already proposed in [29], namely:

(Pb-sc) Given $\mu \in \mathbb{R}$, does there exist a function $\psi \in C^\infty_{>0}(B)$ such that the corresponding $(\psi, \mu)$-bcwp has constant scalar curvature?

Note that when $\mu = 0$, (Pb-sc) corresponds to the problem (cscwp), whereas when the dimension of the fiber is $k = 0$ and $\mu = 1$, (Pb-sc) corresponds to (Ya) for the base manifold. Finally, (Pb-sc) corresponds to (Ya) for the usual product metric with a conformal factor in $C^\infty_{>0}(B)$ when $\mu = 1$.

Under the hypothesis of Theorem 2.3 (i), the analysis of the problem (Pb-sc) leads to the study of the existence and multiplicity of solutions $u \in C^\infty_{>0}(B)$ of
(2.16) $-\beta\Delta_B u + S_B u = \lambda u^{2\mu\alpha+1} - S_F u^{2(\mu-1)\alpha+1}$,
where all the components of the equation are as in Theorem 2.3 (i) and $\lambda$ (the conjectured constant scalar curvature of the corresponding $(\psi, \mu)$-bcwp) is a real parameter. We observe that an easy argument of separation of variables, as in [24, §2] and [27], shows that there exists a positive solution of (2.16) only if the scalar curvature of the fiber $S_F$ is constant. Thus this will be a natural assumption in the study of (Pb-sc). Furthermore, note that the nonlinearities on the right-hand side of (2.16) change dramatically with the choice of the parameters; an exhaustive analysis of these changes is the subject matter of [29, §6].

There are several partial results about semi-linear elliptic equations like (2.16) with different boundary conditions; see for instance [2, 5, 6, 9, 21, 23, 26, 73, 78] and the references in [29].
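To see how the exponents in (2.16) move with the parameters, the following minimal sketch evaluates $\alpha$ and $\beta$ from (2.12) and (2.13) and the two exponents $p = 2\mu\alpha + 1$ and $q = 2(\mu-1)\alpha + 1$ in exact rational arithmetic. The function names `bcwp_exponents` and `yamabe_exponent` are illustrative and not taken from the source, and the identification of the Yamabe critical exponent with $(m+2)/(m-2)$ is the standard one, assumed here rather than quoted.

```python
from fractions import Fraction


def bcwp_exponents(m, k, mu):
    """Return (alpha, beta, p, q) for base dimension m, fiber dimension k and parameter mu.

    alpha and beta follow (2.12) and (2.13); p and q are the exponents of the two
    power terms on the right-hand side of (2.16). Names are hypothetical.
    """
    mu = Fraction(mu)
    s = k + (m - 1) * mu  # vanishes exactly when mu = -k/(m-1)
    if s == 0:
        raise ValueError("mu = -k/(m-1): case (i) of Theorem 2.3 does not apply")
    denom = (s + (1 - mu)) * k + (m - 2) * mu * s  # denominator of (2.12)
    if denom == 0:
        raise ValueError("degenerate parameters: denominator of (2.12) vanishes")
    alpha = 2 * s / denom
    beta = 2 * alpha * s
    p = 2 * mu * alpha + 1
    q = 2 * (mu - 1) * alpha + 1
    return alpha, beta, p, q


def yamabe_exponent(m):
    """Standard Yamabe critical exponent (m+2)/(m-2), an assumption of this sketch."""
    return Fraction(m + 2, m - 2)


if __name__ == "__main__":
    # Singly warped product (mu = 0) with a 2-dimensional fiber over a 3-dimensional base.
    print(bcwp_exponents(3, 2, 0))
    # Conformal change on the base alone (k = 0, mu = 1) in dimension 4.
    print(bcwp_exponents(4, 0, 1), yamabe_exponent(4))
```

With $\mu = 0$ the first exponent is $p = 1$, so the $\lambda$-term is linear, while with $k = 0$ and $\mu = 1$ it equals the critical exponent; this matches the two limiting problems (cscwp) and (Ya) described above.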
In this article we will state our first results about the problem (Pb-sc) when the base $B$ is a compact Riemannian manifold of dimension $m \ge 3$ and the fiber $F$ has non-positive constant scalar curvature $S_F$.

For brevity, it will be useful to introduce the following notation:
$\mu_{sc} := \mu_{sc}(m, k) = -\dfrac{k}{m-1}$ and $\mu_{pY} = \mu_{pY}(m, k) := -\dfrac{k+1}{m-2}$
(sc as scalar curvature and Y as Yamabe). Notice that $\mu_{pY} < \mu_{sc} < 0$.

We plan to study the case $\mu = \mu_{sc}$ in a future project; therefore the related results are not presented here. We can synthesize our results about (Pb-sc) in the case of non-positive $S_F$ as follows.

• The case of scalar flat fiber, i.e. $S_F = 0$.

Theorem 2.5. If $\mu \in (\mu_{pY}, \mu_{sc}) \cup (\mu_{sc}, +\infty)$, the answer to (Pb-sc) is affirmative.

By assuming some additional restrictions on the scalar curvature of the base $S_B$, we obtain existence results for the range $\mu \le \mu_{pY}$.

• The case of fiber with negative constant scalar curvature, i.e. $S_F < 0$.

In order to describe the $\mu$-ranges of validity of the results, we will apply the notations introduced in [29, §5] (see Appendix A for a brief introduction of these notations).

Theorem 2.6. If “$(m, k) \in D$ and $\mu \in (0, 1)$” or “$(m, k) \in CD$ and $\mu \in (0, 1) \cap (\mu_-, \mu_+)$” or “$(m, k) \in CD$ and $\mu \in (0, 1) \cap C[\mu_-, \mu_+]$”, then the answer to (Pb-sc) is affirmative.

Remark 2.7. The first two cases in Theorem 2.6 will be studied by adapting the ideas in [5] and the last case by applying the results in [73, p. 99]. In the former (Theorem 4.15), the nonlinearities involved are of the so-called concave-convex type, whereas in the latter (Theorem 4.16) they are singular, as in the Lichnerowicz-York equation about the constraints for the Einstein equations (see [22], [43], [58], [57, p. 542-543] and [73, Chp. 18]).

Similarly to the case $S_F = 0$, we obtain existence results for some remaining $\mu$-ranges by assuming some additional restrictions on the scalar curvature of the base $S_B$.
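For orientation, the ordering of the two threshold values $\mu_{pY}$ and $\mu_{sc}$ introduced before Theorem 2.5, and the reason for the label "Y", can be checked by a short computation. The following is only a sketch: it uses the definitions above together with (2.12), and it assumes that $p_Y$ denotes the Yamabe critical exponent $(m+2)/(m-2)$, which this part of the text does not spell out.

```latex
% Ordering of the thresholds (m >= 3, k >= 0):
\[
\mu_{sc} - \mu_{pY}
  = -\frac{k}{m-1} + \frac{k+1}{m-2}
  = \frac{(k+1)(m-1) - k(m-2)}{(m-1)(m-2)}
  = \frac{m+k-1}{(m-1)(m-2)} > 0 ,
\]
% and \mu_{sc} < 0 holds as soon as k >= 1.
%
% Value of p = 2\mu\alpha + 1 at \mu = \mu_{pY}, using (2.12):
\[
k + (m-1)\mu_{pY} = -\frac{m+k-1}{m-2}, \qquad
\bigl[k + (m-1)\mu_{pY}\bigr] + (1 - \mu_{pY}) = 0 ,
\]
\[
\alpha\big|_{\mu = \mu_{pY}}
  = \frac{2\,[k + (m-1)\mu_{pY}]}{(m-2)\,\mu_{pY}\,[k + (m-1)\mu_{pY}]}
  = -\frac{2}{k+1},
\qquad
p\big|_{\mu = \mu_{pY}}
  = 2\mu_{pY}\,\alpha + 1
  = \frac{4}{m-2} + 1
  = \frac{m+2}{m-2} .
\]
```

Under the stated assumption, $\mu = \mu_{pY}$ is therefore exactly the parameter value at which the power in the $\lambda$-term of (2.16) becomes critical.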
Naturally the study of (Pb-sc) allows us to obtain partial results of the related question: Given µ ∈ R and λ ∈ R does there exist a function ψ ∈ C∞>0(B) such that the corresponding (ψ, µ)-bcwp has constant scalar curvature λ? These are stated in the several theorems and propositions in §4. 3. The curvature relations - Sketch of the proofs The proofs of Theorems 2.2 and 2.3 require long and yet standard com- putations of the Riemann and Ricci tensors and the scalar curvature of a general base conformal warped product. Here, we reproduce the results for the Ricci tensor and the scalar curvature, and we also suggest the reader see [29, §3] for the complete computations. Theorem 3.1. The Ricci tensor of [c, w]-bcwp, denoted by Ric satisfies (1) Ric = RicB − (m− 2)1 HcB + k +2(m− 2) 1 dc⊗ dc+ k 1 [dc⊗ dw + dw ⊗ dc] (m− 3)gB(∇ Bc,∇Bc) gB(∇Bw,∇Bc) on L(B)× L(B), 10 FERNANDO DOBARRO & BÜLENT ÜNAL (2) Ric = 0 on L(B)× L(F ), (3) Ric = RicF − (m− 2)gB(∇ Bw,∇Bc) +(k − 1)gB(∇ Bw,∇Bw) gF on L(F )× L(F ). Theorem 3.2. The scalar curvature S of a [c, w]-bcwp is given by c2S = SB + SF − 2(m− 1)∆Bc − 2k∆Bw − (m− 4)(m− 1)gB(∇ Bc,∇Bc) − 2k(m− 2)gB(∇ Bw,∇Bc) − k(k − 1)gB(∇ Bw,∇Bw) The following two lemmas (3.3 and 3.7) play a central role in the proof of Theorems 2.2 and 2.3. Indeed, it is sufficient to apply them in a suitable mode and make use of Theorems 3.1 and 3.2 several times, the reader can find all the details in [29, §2 and 4]. LetN = (Nn, h) be a pseudo-Riemannian manifold of dimension n, |∇(·)|2 = |∇N (·)|2N = h(∇N (·),∇N (·)) and ∆h = ∆N . Lemma 3.3. Let Lh be a differential operator on C >0(N) defined by (3.1) Lhv = where ri, ai ∈ R and ζ := riai, η := i . Then, (3.2) Lhv = (η − ζ) ‖grad hv‖2h (ii) If ζ 6= 0 and η 6= 0, for α = ζ and β = , then we have (3.3) Lhv = β Remark 3.4. We also applied the latter lemma in the study of curvature of multiply warped products (see [28]). ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 11 Corollary 3.5. 
Let Lh be a differential operator defined by (3.4) Lhv = r1 for v ∈ C∞>0(N), where r1a1 + r2a2 6= 0 and r1a21 + r2a22 6= 0. Then, by changing the variables v = uα with 0 < u ∈ C∞(N), α = r1a1 + r2a2 1 + r2a and β = (r1a1 + r2a2) 1 + r2a α(r1a1 + r2a2) there results (3.5) Lhv = β Remark 3.6. By the change of variables as in Corollary 3.5 equations of the (3.6) Lhv = r1 = H(v, x, s), transform into (3.7) β∆hu = uH(u α, x, s). Lemma 3.7. Let Hh be a differential operator on C∞>0(N) defined by (3.8) Hhv = riai and η := i , where the indices extend from 1 to l ∈ N and any ri, ai ∈ R. Hence, (3.9) Hhv = (η − ζ) dv ⊗ dv + ζ 1 where ⊗ is the usual tensorial product. If furthermore, ζ 6= 0 and η 6= 0, (3.10) Hhv = β where α = and β = 4. The problem (Pb-sc) - Existence of solutions Throughout this section, we will assume that B is not only a Riemannian manifold of dimension m ≥ 3, but also “compact” and connected. We further assume that F is a pseudo-Riemannian manifold of dimension k ≥ 0 with constant scalar curvature SF ≤ 0. Moreover, we will assume that µ 6= µsc. Hence, we will concentrate our attention on the relations (2.11), (2.12) and (2.13) by applying Theorem 2.3 (i). 12 FERNANDO DOBARRO & BÜLENT ÜNAL Let λ1 denote the principal eigenvalue of the operator (4.1) L(·) = −β∆B(·) + SB(·), and u1 ∈ C∞>0(B) be the corresponding positive eigenfunction with ‖u1‖∞ = 1, where β is as in Theorem 2.3. First of all, we will state some results about uniqueness and non-existence of positive solutions for Equation (2.16) under the latter hypothesis. About the former, we adapt Lemma 3.3 in [5, p. 525] to our situation (for a detailed proof see [5], [20, Method II, p. 103] and also [70]). Lemma 4.1. Let f ∈ C0(R>0) such that t−1f(t) is decreasing. If v and w satisfy (4.2) −β∆Bv + SBv ≤ f(v), v ∈ C∞>0(B), (4.3) −β∆Bw + SBw ≥ f(w), w ∈ C∞>0(B), then w ≥ v on B. Proof. Let θ(t) be a smooth nondecreasing function such that θ(t) ≡ 0 for t ≤ 0 and θ(t) ≡ 1 for t ≥ 1. 
Thus, for all $\epsilon > 0$, $\theta_\epsilon(t) := \theta(t/\epsilon)$ is smooth, nondecreasing, nonnegative, and satisfies $\theta_\epsilon(t) \equiv 0$ for $t \le 0$ and $\theta_\epsilon(t) \equiv 1$ for $t \ge \epsilon$. Furthermore, $\gamma_\epsilon(t) := \int_0^t s\,\theta_\epsilon'(s)\,ds$ satisfies $0 \le \gamma_\epsilon(t) \le \epsilon$ for any $t \in \mathbb{R}$.

On the other hand, since $(B, g_B)$ is a compact Riemannian manifold without boundary and $\beta > 0$, as in [5, Lemma 3.3, p. 526] there results
(4.4) $\int_B [-v\beta\Delta_B w + w\beta\Delta_B v]\,\theta_\epsilon(v - w)\,dv_{g_B} \le \int_B [-\beta\Delta_B v]\,\gamma_\epsilon(v - w)\,dv_{g_B}$.
Hence, by the above considerations about $\theta_\epsilon$ and $\gamma_\epsilon$, (4.4) implies that
(4.5) $\int_B [-v\beta\Delta_B w + w\beta\Delta_B v]\,\theta_\epsilon(v - w)\,dv_{g_B} \le \epsilon \int_{[-\beta\Delta_B v \ge 0]} [-\beta\Delta_B v]\,dv_{g_B}$.
Now, by applying (4.2) and (4.3), there results
(4.6) $-v\beta\Delta_B w + w\beta\Delta_B v = vLw - wLv \ge v f(w) - w f(v) = vw\left[\dfrac{f(w)}{w} - \dfrac{f(v)}{v}\right]$.
Thus, by combining (4.6) and (4.5), as $\epsilon \to 0^+$ we are led to
(4.7) $\int_{[v > w]} vw\left[\dfrac{f(w)}{w} - \dfrac{f(v)}{v}\right] dv_{g_B} \le 0$
and conclude the proof as in [5, Lemma 3.3, p. 526-527]. But $\dfrac{f(w)}{w} > \dfrac{f(v)}{v}$ on $[v > w]$ and hence meas$[v > w] = 0$; thus $v \le w$.¹ □

Corollary 4.2. Let $f \in C^0(\mathbb{R}_{>0})$ be such that $t^{-1} f(t)$ is decreasing. Then
(4.8) $-\beta\Delta_B v + S_B v = f(v)$, $v \in C^\infty_{>0}(B)$
has at most one solution.

Proof. Assume that $v$ and $w$ are two solutions of (4.8). Then, by applying Lemma 4.1 first with $v$ and $w$, and then with the roles of $v$ and $w$ exchanged, the conclusion follows. □

Remark 4.3. Notice that Lemma 4.1 and Corollary 4.2 allow the function $f \in C^0(\mathbb{R}_{>0})$ to be singular at 0.

Related to the non-existence of smooth positive solutions of Equation (2.16), we will state an easy result under the general hypothesis of this section.

Proposition 4.4. If either $\max_B S_B \le \inf_{u \in \mathbb{R}_{>0}} u^{2\mu\alpha}(\lambda - S_F u^{-2\alpha})$ or $\min_B S_B \ge \sup_{u \in \mathbb{R}_{>0}} u^{2\mu\alpha}(\lambda - S_F u^{-2\alpha})$, then (2.16) has no solution in $C^\infty_{>0}(B)$.

Proof. It is sufficient to apply the maximum principle with some easy adjustments to the particular coefficients involved. □

• The case of scalar flat fiber, i.e. $S_F = 0$.
In this case, the term containing the nonlinearity $u^{2(\mu-1)\alpha+1}$ has no influence in (2.16); thus (Pb-sc) is equivalent to the study of the existence of solutions of the problem
(4.9) $-\beta\Delta_B u + S_B u = \lambda u^{2\mu\alpha+1}$, $u \in C^\infty_{>0}(B)$,
where $\lambda$ is a real parameter (i.e., the constant scalar curvature sought) and $\psi = u^\alpha$.

¹ meas denotes the usual $g_B$-measure on the compact Riemannian manifold $(B_m, g_B)$.

Remark 4.5.² Let $p \in \mathbb{R} \setminus \{1\}$ and let $(\lambda_0, u_0) \in (\mathbb{R} \setminus \{0\}) \times C^\infty_{>0}(B)$ be a solution of
(4.10) $-\beta\Delta_B u + S_B u = \lambda u^p$, $u \in C^\infty_{>0}(B)$.
Hence, by the difference of homogeneity between the two members of (4.10), it is easy to show that if $\lambda \in \mathbb{R}$ satisfies $\operatorname{sign}(\lambda) = \operatorname{sign}(\lambda_0)$, then $(\lambda, u_\lambda)$ is a solution of (4.10), where $u_\lambda = t_\lambda u_0$ and $t_\lambda = \left(\dfrac{\lambda_0}{\lambda}\right)^{\frac{1}{p-1}}$. Indeed, the left-hand side of (4.10) evaluated at $t u_0$ equals $t\lambda_0 u_0^p$, while the right-hand side equals $\lambda t^p u_0^p$.

Thus, by (4.9), we obtain geometrically: if the parameter $\mu$ is given in such a way that $p := 2\mu\alpha + 1 \ne 1$ and $B \times_{[\psi_0^\mu, \psi_0]} F$ has constant scalar curvature $\lambda_0 \ne 0$, then for any $\lambda \in \mathbb{R}$ verifying $\operatorname{sign}(\lambda) = \operatorname{sign}(\lambda_0)$, $B \times_{[\psi_\lambda^\mu, \psi_\lambda]} F$ has scalar curvature $\lambda$, where $\psi_\lambda = t_\lambda^{\alpha}\psi_0$ and $t_\lambda$ is given as above.

Theorem 4.6. (Case: $\mu = 0$) The scalar curvature of a $(\psi, 0)$-bcwp of base $B$ and fiber $F$ (i.e., a singly warped product $B \times_\psi F$) is a constant $\lambda$ if and only if $\lambda = \lambda_1$ and $\psi$ is a positive multiple of $u_1^{\frac{2}{k+1}}$ (i.e., $\psi = t u_1^{\frac{2}{k+1}}$ for some $t \in \mathbb{R}_{>0}$).

Proof. First of all note that $\mu = 0$ implies $\alpha = \dfrac{2}{k+1}$. On the other hand, in this case the problem (4.9) is linear, so it is sufficient to apply the well known results about the principal eigenvalue and its associated eigenfunctions of operators like (4.1) in a suitable setting. □

Theorem 4.7. (Case: $\mu_{sc} < \mu < 0$) The scalar curvature of a $(\psi, \mu)$-bcwp of base $B$ and fiber $F$ is a constant $\lambda$ only if $\operatorname{sign}(\lambda) = \operatorname{sign}(\lambda_1)$. Furthermore,
(1) if $\lambda = 0$, then there exists $\psi \in C^\infty_{>0}(B)$ such that $B \times_{[\psi^\mu, \psi]} F$ has constant scalar curvature 0 if and only if $\lambda_1 = 0$. Moreover, such $\psi$'s are the positive multiples of $u_1^\alpha$, i.e. $t u_1^\alpha$, $t \in \mathbb{R}_{>0}$.
(2) if λ > 0 then there exists ψ ∈ C∞>0(B) such that B ×[ψµ,ψ] F has constant scalar curvature λ if and only if λ1 > 0. In this case, the solution ψ is unique. (3) if λ < 0 then there exists ψ ∈ C∞>0(B) such that B ×[ψµ,ψ] F has constant scalar curvature λ when λ1 < 0 and is close enough to 0. Proof. The condition µsc < µ < 0 implies that 0 < p := 2µα + 1 < 1, i.e., the problem (4.9) is sublinear. Thus, to prove the theorem one can use variational arguments as in [24] (alternatively, degree theoretic arguments as in [7] or bifurcation theory as in [27]). 2Along this article we consider the sign function defined by sign = χ(0,+∞) − χ(−∞,0), where χA is the characteristic function of the set A. ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 15 We observe that in order to obtain the positivity of the solutions required in (4.9), one may apply the maximum principle for the case of λ > 0 and the antimaximum principle for the case of λ < 0. The uniqueness for λ > 0 is a consequence of Corollary 4.2. � Remark 4.8. In order to consider the next case we introduce the following notation. For a given p such that 1 < p ≤ pY , let (4.11) κp := inf |∇Bv|2 + SB dvgB , where Hp := v ∈ H1(B) : |v|p+1dvgB = 1 Now, we consider the following two cases. (1 < p < pY ): In this case by adapting [42, Theorem 1.3], there ex- ists up ∈ C∞>0(B) such that (βκp, up) is a solution of (4.10) and∫ up+1p dvgB = 1. (p = pY ): For this specific and important value, analogously to [42, §2], we distinguish three subcases along the study of our problem (4.10), in correspondence with the sign(κpY ). κpY = 0: in this case, there exists upY ∈ C∞>0(B) such that (0, upY ) is a solution of (4.10) and upY +1pY dvgB = 1. κpY < 0: here there exists upY ∈ C∞>0(B) such that (βκpY , upY ) is a solution of (4.10) and upY +1pY dvgB = 1. κpY > 0: this is a more difficult case, let Km be the sharp Eu- clidean Sobolev constant (4.12) Km = m(m− 2)ω where ωm is the volume of the unit m−sphere. 
Thus, if (4.13) κpY < then there exists upY ∈ C∞>0(B) such that (βκpY , upY ) is a solu- tion of (4.10) and upY +1pY dvgB = 1. Furthermore, the condi- (4.14) κpY ≤ 16 FERNANDO DOBARRO & BÜLENT ÜNAL is sharp by [42], so that this is independent of the underlying manifold and the potential considered. The equality case in (4.14) is discussed in [44]. This results allow to establish the following two theorems. Theorem 4.9. (Cases : µpY < µ < µsc or 0 < µ) There exists ψ ∈ C∞>0(B) such that the scalar curvature of B ×[ψµ,ψ] F is a constant λ if and only if sign(λ) = sign(κp) where p := 2µα+1 and κp is given by (4.11). Furthermore if λ < 0, then the solution ψ is unique. Proof. The conditions (µpY < µ < µsc or 0 < µ) imply that 1 < p := 2µα+ 1 < pY , i.e. the problem (4.9) is superlinear but subcritical with respect to the Sobolev immersion theorem (see [29, Remark 5.5]). By recalling that ψ = uα, it is sufficient to prove that follows. Let up be defined as in the case of (1 < p < pY ) in Remark 4.8. If (λ, u) is a solution of (4.9), then multiplying (4.9) by up and integrating by parts there results (4.15) βκp upudvgB = λ pdvgB . Thus sign(λ) = sign(κp) since β, up and u are all positive. Conversely, if λ is a real constant such that sign(λ) = sign(κp) 6= 0, then by Remark 4.5, (λ, uλ) is a solution of (4.9), where uλ = tλup and On the other side, if λ = κp = 0, then (0, up) is a solution of (4.9). Since 1 < p, the uniqueness for λ < 0 is a consequence of Corollary 4.2. � Theorem 4.10. (Cases : µ = µpY ) If there exists ψ ∈ C∞>0(B) such that the scalar curvature of B ×[ψµpY ,ψ] F is a constant λ, then sign(λ) = sign(κpY ). Furthermore, if λ ∈ R verifying sign(λ) = sign(κpY ) and (4.13), then there exists ψ ∈ C∞>0(B) such that the scalar curvature of B ×[ψµpY ,ψ] F is λ. Besides, if λ ∈ R is negative, then there exists at most one ψ ∈ C∞>0(B) such that the scalar curvature of B ×[ψµpY ,ψ] F is λ. Proof. 
The proof is similar to that of Theorem 4.9, but follows from the application of the case $(p = p_Y)$ in Remark 4.8. As above, the uniqueness for $\lambda < 0$ is a consequence of Corollary 4.2. □

In the next proposition, which includes the supercritical case, we will apply the following result (see also [73, p. 99]).

Lemma 4.11. Let $(N_n, g_N)$ be a compact connected Riemannian manifold without boundary of dimension $n \ge 2$ and $\Delta_{g_N}$ be the corresponding Laplace-Beltrami operator. Consider the equation of the form
(4.16) $-\Delta_{g_N} u = f(\cdot, u)$, $u \in C^\infty_{>0}(N)$,
where $f \in C^\infty(N \times \mathbb{R}_{>0})$. If there exist $a_0$ and $a_1 \in \mathbb{R}_{>0}$ such that
(4.17) $u < a_0 \Rightarrow f(\cdot, u) > 0$ and $u > a_1 \Rightarrow f(\cdot, u) < 0$,
then (4.16) has a solution satisfying $a_0 \le u \le a_1$.

Proposition 4.12. (Cases: $-\infty < \mu < \mu_{sc}$ or $0 < \mu$) If $\max S_B < 0$, then for all $\lambda < 0$ there exists $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is the constant $\lambda$. Furthermore, the solution $\psi$ is unique.

Proof. The conditions ($-\infty < \mu < \mu_{sc}$ or $0 < \mu$) imply that $1 < p := 2\mu\alpha + 1$. On the other hand, since $B$ is compact, by taking
$f(\cdot, u) = -S_B(\cdot)u + \lambda u^p = (-S_B + \lambda u^{p-1})u$,
we obtain that $\lim_{u \to 0^+} f(\cdot, u) = 0^+$ and $\lim_{u \to +\infty} f(\cdot, u) = -\infty$. Thus (4.17) is verified. Hence, the proposition is proved by applying Lemma 4.11 on $(B_m, g_B)$. Notice that, by the behavior of $f(\cdot, u)$ as $u \to 0^+$, the constant $a_0$ can be chosen positive and as close to 0 as needed; consequently the corresponding solution is positive. Again, since $\lambda < 0$ and $1 < p$, the uniqueness is a consequence of Corollary 4.2. □

Proof. (of Theorem 2.5) This is an immediate consequence of the above results. □

• The case of a fiber with negative constant scalar curvature, i.e. $S_F < 0$.

Here, (Pb-sc) becomes equivalent to the study of the existence of solutions of the problem
(4.18) $-\beta\Delta_B u + S_B u = \lambda u^p - S_F u^q$, $u \in C^\infty_{>0}(B)$,
where $\lambda$ is a real parameter (i.e., the constant scalar curvature sought), $\psi = u^\alpha$, $p = 2\mu\alpha + 1$ and $q = 2(\mu - 1)\alpha + 1$.

Remark 4.13.
Let $u$ be a solution of (4.18).
(i) If $\lambda_1 \le 0$, then $\lambda < 0$. Indeed, multiplying the equation in (4.18) by $u_1$ and integrating by parts, there results
(4.19) $\lambda_1 \int_B u_1 u\,dv_{g_B} + S_F \int_B u_1 u^q\,dv_{g_B} = \lambda \int_B u_1 u^p\,dv_{g_B}$,
where $u_1$ and $u$ are positive.
(ii) If $\lambda = 0$, then $\lambda_1 > 0$.
(iii) If $\mu = 0$ (the warped product case), then $\lambda < \lambda_1$. These cases have been studied in [27, 24].
(iv) If $\mu = 1$ (the Yamabe problem for the usual product with conformal factor in $C^\infty_{>0}(B)$), there results $\operatorname{sign}(\lambda) = \operatorname{sign}(\lambda_1 + S_F)$.

An immediate consequence of Remark 4.13 is the following lemma.

Lemma 4.14. Let $B$ and $F$ be given as in Theorem 2.3 (i). Suppose further that $B$ is a compact connected Riemannian manifold and $F$ is a pseudo-Riemannian manifold of constant scalar curvature $S_F < 0$. If $\lambda \ge 0$ and $\lambda_1 \le 0$ (for instance when $S_B \le 0$ on $B$), then there is no $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is $\lambda$.

Theorem 4.15. [29, Rows 6 and 8 in Table 4] Under the hypothesis of Theorem 2.3 (i), let $B$ be a compact connected Riemannian manifold and $F$ be a pseudo-Riemannian manifold of constant scalar curvature $S_F < 0$. Suppose that “$(m, k) \in D$ and $\mu \in (0, 1)$” or “$(m, k) \in CD$ and $\mu \in (0, 1) \cap C[\mu_-, \mu_+]$”.
(1) If $\lambda_1 \le 0$, then $\lambda \in \mathbb{R}$ is the scalar curvature of a $B \times_{[\psi^\mu, \psi]} F$ if and only if $\lambda < 0$.
(2) If $\lambda_1 > 0$, then there exists $\Lambda \in \mathbb{R}_{>0}$ such that $\lambda \in \mathbb{R} \setminus \{\Lambda\}$ is the scalar curvature of a $B \times_{[\psi^\mu, \psi]} F$ if and only if $\lambda < \Lambda$.
Furthermore, if $\lambda \le 0$, then there exists at most one $\psi \in C^\infty_{>0}(B)$ such that $B \times_{[\psi^\mu, \psi]} F$ has scalar curvature $\lambda$.

Proof. The proof of this theorem is the subject matter of §5. □

Once again we make use of Lemma 4.11 for the next theorem about the singular case and the following propositions.

Theorem 4.16. [29, Row 7 Table 4] Under the hypothesis of Theorem 2.3 (i), let $B$ be a compact connected Riemannian manifold and $F$ be a pseudo-Riemannian manifold of constant scalar curvature $S_F < 0$.
Suppose that “$(m, k) \in CD$ and $\mu \in (0, 1) \cap (\mu_-, \mu_+)$”. Then for any $\lambda < 0$ there exists $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is $\lambda$. Furthermore, the solution $\psi$ is unique.

Proof. First of all note that the conditions “$(m, k) \in CD$ and $\mu \in (0, 1) \cap (\mu_-, \mu_+)$” imply that $q < 0$ and $1 < p$, i.e. the problem (4.18) is superlinear in $p$ but singular in $q$. On the other hand, since $B$ is compact, taking
$f(\cdot, u) = -S_B(\cdot)u + \lambda u^p - S_F u^q = [(-S_B(\cdot) + \lambda u^{p-1})u^{1-q} - S_F]\,u^q$,
there result $\lim_{u \to 0^+} f(\cdot, u) = +\infty$ and $\lim_{u \to +\infty} f(\cdot, u) = -\infty$. Thus (4.17) is verified, and an application of Lemma 4.11 on $(B_m, g_B)$ concludes the proof of the existence part. The uniqueness part follows from Corollary 4.2. □

Remark 4.17. We observe that the arguments applied in the proof of Theorem 4.16 can be adjusted to the case of a compact connected Riemannian manifold $B$ with $0 \le q < 1 < p$, $\lambda < 0$ and $S_F < 0$, so that they also cover some of the situations included in Theorem 4.15. The two arguments are compatible but different.

Proof. (of Theorem 2.6) This is an immediate consequence of the above results. □

The approach in the next propositions is similar to that of Proposition 4.12 and Theorem 4.16.

Proposition 4.18. [29, Row 10 Table 4] Let $1 < \mu < +\infty$. If $\max S_B < 0$, then for all $\lambda < 0$ there exists $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is the constant $\lambda$.

Proof. The condition $1 < \mu < +\infty$ implies that $1 < q < p$. On the other hand, since $B$ is compact, taking
$f(\cdot, u) = -S_B(\cdot)u + \lambda u^p - S_F u^q = [-S_B(\cdot) + (\lambda u^{p-q} - S_F)u^{q-1}]\,u$,
there result $\lim_{u \to 0^+} f(\cdot, u) = 0^+$ and $\lim_{u \to +\infty} f(\cdot, u) = -\infty$. Thus (4.17) is satisfied, and an elementary application of Lemma 4.11 on $(B_m, g_B)$ proves the proposition. □

Proposition 4.19. [29, Rows 2, 4 and 3 in Table 4] Let either “$(m, k) \in D$ and $\mu \in (\mu_{sc}, 0)$” or “$(m, k) \in CD$ and $\mu \in (\mu_{sc}, 0) \cap C[\mu_-, \mu_+]$” or “$(m, k) \in CD$ and $\mu \in (\mu_{sc}, 0) \cap (\mu_-, \mu_+)$”.
If $\min S_B > 0$, then for all $\lambda \le 0$ there exists a smooth function $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is the constant $\lambda$.

Proof. If either “$(m, k) \in D$ and $\mu \in (\mu_{sc}, 0)$” or “$(m, k) \in CD$ and $\mu \in (\mu_{sc}, 0) \cap C[\mu_-, \mu_+]$”, then $0 < q < p < 1$. On the other hand, since $B$ is compact, taking
$f(\cdot, u) = -S_B(\cdot)u + \lambda u^p - S_F u^q = [-S_B(\cdot)u^{1-q} + \lambda u^{p-q} - S_F]\,u^q$,
there result $\lim_{u \to 0^+} f(\cdot, u) = 0^+$ and $\lim_{u \to +\infty} f(\cdot, u) = -\infty$. Thus (4.17) is verified and again we can apply Lemma 4.11 on $(B_m, g_B)$.

If “$(m, k) \in CD$ and $\mu \in (\mu_{sc}, 0) \cap (\mu_-, \mu_+)$”, then $q < 0 < p < 1$. Considering the limits as above, $\lim_{u \to 0^+} f(\cdot, u) = +\infty$ and $\lim_{u \to +\infty} f(\cdot, u) = -\infty$. So an application of Lemma 4.11 concludes the proof. □

Remark 4.20. Notice that in Theorems 4.15 and 4.16 we do not assume any hypothesis on the sign of $S_B(\cdot)$, unlike in Propositions 4.12, 4.18 and 4.19.

Proposition 4.21. [29, Rows 5 and 9 in Table 4] Let $(m, k) \in CD$.
(1) If either “$\mu \in \left(-\dfrac{k}{m-1}, 0\right) \cap \{\mu_-, \mu_+\}$ and $\min S_B > 0$” or “$\mu \in (0, 1) \cap \{\mu_-, \mu_+\}$”, then for all $\lambda < 0$ there exists a smooth function $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is the constant $\lambda$. In the second case, $\psi$ is also unique.
(2) If either “$\mu \in \left(-\dfrac{k}{m-1}, 0\right) \cap \{\mu_-, \mu_+\}$” or “$\mu \in (0, 1) \cap \{\mu_-, \mu_+\}$”, and furthermore $\lambda_1 > 0$, then there exists a smooth function $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_{[\psi^\mu, \psi]} F$ is 0.

Proof. In both cases $q = 0$, so by considering $f(\cdot, u) = -S_B(\cdot)u + \lambda u^p - S_F$, the proof of (1) follows as in the preceding propositions, while that of (2) is a consequence of the linear theory and the maximum principle. □

Remark 4.22. Finally, we observe a particular result about the cases studied in [27]. If $\mu = 0$, then $p = 1$ and $q = 1 - 2\alpha = \dfrac{k-3}{k+1}$. When the dimension of the fiber is $k = 2$, the exponent is $q = -\dfrac{1}{3}$.
So, writing the equation involved as
$-\beta\Delta_B u = f(\cdot, u) = -S_B(\cdot)u + \lambda u - S_F u^{-\frac{1}{3}}$
and applying Lemma 4.11 as above, we obtain that if $\lambda < \min S_B$, then there exists a smooth function $\psi \in C^\infty_{>0}(B)$ such that the scalar curvature of $B \times_\psi F$ is the constant $\lambda$. Furthermore, by Corollary 4.2 such $\psi$ is unique (see [27, 24] and [25]).

5. Proof of the Theorem 4.15

The subject matter of this section is the proof of Theorem 4.15, so we naturally assume its hypothesis. Since we often need to specify the dependence of (4.18) on $\lambda$, we will do so by writing $(4.18)_\lambda$. Furthermore, we will denote the right-hand side of $(4.18)_\lambda$ by $f_\lambda(t) := \lambda t^p - S_F t^q$.

The conditions either “$(m, k) \in D$ and $\mu \in (0, 1)$” or “$(m, k) \in CD$ and $\mu \in (0, 1) \cap C[\mu_-, \mu_+]$” imply that $0 < q < 1 < p$. But the type of nonlinearity on the right-hand side of $(4.18)_\lambda$ changes with $\operatorname{sign}\lambda$, i.e. it is purely concave for $\lambda < 0$ and concave-convex for $\lambda > 0$.

The uniqueness for $\lambda \le 0$ is again a consequence of Corollary 4.2. In order to prove the existence of a solution of $(4.18)_\lambda$ with $\operatorname{sign}\lambda \ne 0$, we adapt the approach of sub- and supersolutions in [5]. Thus, the proof of Theorem 4.15 will be an immediate consequence of the results that follow.

Lemma 5.1. $(4.18)_0$ has a solution if and only if $\lambda_1 > 0$.

Proof. This situation is included in the results of the second case of Theorem 4.7 by replacing $-S_F$ with $\lambda$ (see [24, Proposition 3.1]). □

Lemma 5.2. Let us assume that $\{\lambda : (4.18)_\lambda \text{ has a solution}\}$ is non-empty and define
(5.1) $\Lambda = \sup\{\lambda : (4.18)_\lambda \text{ has a solution}\}$.
(i) If $\lambda_1 \le 0$, then $\Lambda \le 0$.
(ii) If $\lambda_1 > 0$, then there exists a finite $\overline{\lambda} > 0$ such that $\Lambda \le \overline{\lambda}$.

Proof. (i) It is sufficient to observe Remark 4.13 (i).
(ii) As in [5], let $\overline{\lambda} > 0$ be such that
(5.2) $\lambda_1 t < \overline{\lambda}\, t^p - S_F t^q$, $\forall t \in \mathbb{R}$, $t > 0$.
Thus, if $(\lambda, u)$ is a solution of $(4.18)_\lambda$, then
$\lambda \int_B u_1 u^p\,dv_{g_B} - S_F \int_B u_1 u^q\,dv_{g_B} = \lambda_1 \int_B u_1 u\,dv_{g_B} < \overline{\lambda} \int_B u_1 u^p\,dv_{g_B} - S_F \int_B u_1 u^q\,dv_{g_B}$,
so $\lambda < \overline{\lambda}$.

Lemma 5.3. Let
(5.3) $\Lambda = \sup\{\lambda : (4.18)_\lambda \text{ has a solution}\}$.

Figure 1.
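The nonlinearity $f_\lambda$ in Lemma 5.3, i.e. $0 < q < 1 < p$, $S_F < 0$, $\lambda_1 > 0$, $\lambda > 0$.

(i) Let $E \in \mathbb{R}_{>0}$. There exist $0 < \lambda_0 = \lambda_0(E)$ and $0 < M = M(E, \lambda_0)$ such that for all $\lambda$ with $0 < \lambda \le \lambda_0$ we have
$$0 < E\,\frac{f_\lambda(EM)}{EM} < 1. \tag{5.4}$$
(ii) If $\lambda_1 > 0$, then $\{\lambda > 0 : \text{(4.18)}_\lambda \text{ has a solution}\} \neq \emptyset$. As a consequence, $\Lambda$ is finite.
(iii) If $\lambda_1 > 0$, then for all $0 < \lambda < \Lambda$ there exists a solution of the problem (4.18)$_\lambda$.

Proof. (i) For any $0 < \lambda < \lambda_0$ and $r > 0$,
$$0 < g_\lambda(r) := E\,\frac{f_\lambda(Er)}{Er} = E r^{q-1}\left(\lambda E^{p-1} r^{p-q} - S_F E^{q-1}\right) < E r^{q-1}\left(\lambda_0 E^{p-1} r^{p-q} - S_F E^{q-1}\right).$$
It is easy to see that
$$r_0 = \frac{1}{E}\left(\frac{S_F}{\lambda_0}\,\frac{q-1}{p-1}\right)^{\frac{1}{p-q}}$$
is a minimum point for $g_{\lambda_0}$ and
$$g_{\lambda_0}(r_0) = S_F E \left(\frac{S_F}{\lambda_0}\,\frac{q-1}{p-1}\right)^{\frac{q-1}{p-q}} \left[\frac{q-1}{p-1} - 1\right] \to 0^+, \quad \text{as } \lambda_0 \to 0^+.$$
Hence there exist $0 < \lambda_0 = \lambda_0(E)$ and $0 < M = M(E, \lambda_0)$ such that (5.4) is verified.

(ii) Since $\lambda_1 > 0$, by the maximum principle, there exists a solution $e \in C^\infty_{>0}(B)$ of
$$L_B(e) = -\beta\Delta_B e + S_B e = 1. \tag{5.5}$$
Then, applying item (i) above with $E = \|e\|_\infty$, there exist $0 < \lambda_0 = \lambda_0(\|e\|_\infty)$ and $0 < M = M(\|e\|_\infty, \lambda_0)$ such that for all $\lambda$ with $0 < \lambda \le \lambda_0$ we have
$$L_B(Me) = M \ge f_\lambda(Me), \tag{5.6}$$
hence $Me$ is a supersolution of (4.18)$_\lambda$.

On the other hand, since $\check{u}_1 := \inf u_1 > 0$, for all $\lambda > 0$
$$\epsilon^{-1} f_\lambda(\epsilon \check{u}_1) = \epsilon^{q-1}\left[\lambda \epsilon^{p-q} \check{u}_1^{\,p} - S_F \check{u}_1^{\,q}\right] \to +\infty, \quad \text{as } \epsilon \to 0^+. \tag{5.7}$$
Furthermore, note that $f_\lambda$ is nondecreasing when $\lambda > 0$. Hence for any $\lambda > 0$ there exists a small enough $\epsilon > 0$ verifying
$$L_B(\epsilon u_1) = \epsilon \lambda_1 u_1 \le \epsilon \lambda_1 \|u_1\|_\infty \le f_\lambda(\epsilon \check{u}_1) \le f_\lambda(\epsilon u_1), \tag{5.8}$$
thus $\epsilon u_1$ is a subsolution of (4.18)$_\lambda$.

Then for any $0 < \lambda < \lambda_0$ (taking $\epsilon > 0$ smaller if necessary), the pair of sub- and supersolutions constructed above satisfies
$$\epsilon u_1 < Me. \tag{5.9}$$
Now, by applying the monotone iteration scheme, we have that $\{\lambda > 0 : \text{(4.18)}_\lambda \text{ has a solution}\} \neq \emptyset$. Furthermore, by Lemma 5.2 (ii) it follows that $\Lambda$ is finite.

(iii) The proof of this item is completely analogous to that of Lemma 3.2 in [5]. We reproduce the argument here for completeness. Given $\lambda < \Lambda$, let $u_\nu$ be a solution of (4.18)$_\nu$ with $\lambda < \nu < \Lambda$.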
Then $u_\nu$ is a supersolution of (4.18)$_\lambda$ and, for small enough $\epsilon > 0$, the subsolution $\epsilon u_1$ of (4.18)$_\lambda$ verifies $\epsilon u_1 < u_\nu$; hence, as above, (4.18)$_\lambda$ has a solution.

Lemma 5.4. For any $\lambda < 0$, there exists $\gamma_\lambda > 0$ such that $\|u\|_\infty \le \gamma_\lambda$ for any solution $u$ of (4.18)$_\lambda$. Furthermore, if $S_B$ is nonnegative, then the positive zero of $f_\lambda$ can be chosen as $\gamma_\lambda$.

Figure 2. The nonlinearity in Lemma 5.5 (plotted curves: $f_\lambda(t)$ and $f_\lambda(t) + \nu t$), i.e. $0 < q < 1 < p$, $S_F < 0$, $\lambda_1 > 0$, $\lambda < 0$.

Proof. Define $\check{S}_B := \min S_B$ (recall that $B$ is compact). There are two different situations, namely:
• $0 \le \check{S}_B$: since there exists $x_1 \in B$ such that $u(x_1) = \|u\|_\infty$ and
$$0 \le -\beta\Delta_B u(x_1) = -S_B(x_1)\|u\|_\infty + \lambda\|u\|_\infty^p - S_F\|u\|_\infty^q,$$
it follows that $\|u\|_\infty \le \gamma_\lambda$, where $\gamma_\lambda$ is the strictly positive zero of $f_\lambda$.
• $\check{S}_B < 0$: we consider $\tilde{f}_\lambda(t) := \lambda t^p - S_F t^q - \check{S}_B t$. Now, our problem (4.18)$_\lambda$ is equivalent to
$$-\beta\Delta_B u + (S_B - \check{S}_B)u = \tilde{f}_\lambda(u), \quad u \in C^\infty_{>0}(B).$$
Here the potential $S_B - \check{S}_B$ is nonnegative and the function $\tilde{f}_\lambda$ has the same behavior as $f_\lambda$, with a positive zero $\tilde{\gamma}_\lambda$ to the right of the positive zero $\gamma_\lambda$ of $f_\lambda$. Thus, repeating the argument for the case $\check{S}_B \ge 0$, we obtain $\|u\|_\infty \le \tilde{\gamma}_\lambda$.

Lemma 5.5. Let $\lambda_1 > 0$. Then for all $\lambda < 0$ there exists a solution of (4.18)$_\lambda$.

Proof. We will apply again the monotone iteration scheme. Define $\check{S}_B := \min S_B$ (note that $B$ is compact).
• $0 \le \check{S}_B$: Clearly, the strictly positive zero $\gamma_\lambda$ of $f_\lambda$ is a supersolution of
$$-\beta\Delta_B u + (S_B + \nu)u = f_\lambda(u) + \nu u, \tag{5.10}$$
for all $\nu \in \mathbb{R}$. On the other hand, for $0 < \epsilon = \epsilon(\lambda)$ small enough,
$$L_B(\epsilon u_1) = \epsilon\lambda_1 u_1 \le f_\lambda(\epsilon u_1). \tag{5.11}$$
Then $\epsilon u_1$ is a subsolution of (5.10) for all $\nu \in \mathbb{R}$. By taking $\epsilon$ possibly smaller, we also have
$$0 < \epsilon u_1 < \gamma_\lambda. \tag{5.12}$$
We note that for large enough values of $\nu \in \mathbb{R}_{>0}$, the nonlinearity on the right hand side of (5.10), namely $f_\lambda(t) + \nu t$, is an increasing function on $[0, \gamma_\lambda]$. Thus, applying the monotone iteration scheme, we obtain a strictly positive solution of (5.10), and hence a solution of (4.18)$_\lambda$ (see [3], [4], [54]).
• $\check{S}_B < 0$: In this case, as in Lemma 5.4, we consider $\tilde{f}_\lambda(t) := \lambda t^p - S_F t^q - \check{S}_B t$. Then, the problem (4.18)$_\lambda$ is equivalent to
$$-\beta\Delta_B u + (S_B - \check{S}_B)u = \tilde{f}_\lambda(u), \quad u \in C^\infty_{>0}(B), \tag{5.13}$$
where the potential is nonnegative and the function $\tilde{f}_\lambda$ has a similar behavior to $f_\lambda$, with a positive zero $\tilde{\gamma}_\lambda$ to the right of the positive zero $\gamma_\lambda$ of $f_\lambda$. Here, it is clear that $\tilde{\gamma}_\lambda$ is a positive supersolution of
$$-\beta\Delta_B u + (S_B - \check{S}_B + \nu)u = \tilde{f}_\lambda(u) + \nu u, \tag{5.14}$$
for all $\nu \in \mathbb{R}$. Hence, we complete the proof similarly to the case $\check{S}_B \ge 0$.

Lemma 5.6. Let $\lambda_1 \le 0$, $\lambda < 0$, $\check{S}_B := \min S_B$, and let $\gamma_\lambda$ be a positive zero of $f_\lambda$ and $\tilde{\gamma}_\lambda$ be a positive zero of $\tilde{f}_\lambda := f_\lambda - \check{S}_B\,\mathrm{id}_{\mathbb{R}_{\ge 0}}$. Then there exists a solution $u$ of (4.18)$_\lambda$. Furthermore, any solution of (4.18)$_\lambda$ satisfies $\gamma_\lambda \le \|u\|_\infty \le \tilde{\gamma}_\lambda$.

Proof. First of all we observe that if $S_B \equiv 0$ (so $\lambda_1 = 0$), then $u \equiv \gamma_\lambda$ is the required solution of (4.18)$_\lambda$.

Now, we assume that $S_B \not\equiv 0$. Since $\lambda_1 \le 0$, it follows that $\check{S}_B < 0$. In this case, one can notice that $0 < \gamma_\lambda < \tilde{\gamma}_\lambda$.

On the other hand, the problem (4.18)$_\lambda$ is equivalent to
$$-\beta\Delta_B u + (S_B - \check{S}_B)u = \tilde{f}_\lambda(u), \quad u \in C^\infty_{>0}(B). \tag{5.15}$$
By the second part of the proof of Lemma 5.4, if $u$ is a solution of (4.18)$_\lambda$ (or equivalently (5.15)), then $\|u\|_\infty \le \tilde{\gamma}_\lambda$.

Besides, since
$$\int_B u_1\,(f_\lambda \circ u) = \lambda_1 \int_B u_1 u,$$
$u_1 > 0$ and $\lambda_1 \le 0$, it follows that $\gamma_\lambda \le \|u\|_\infty$.

From this point on, the proof of the existence of solutions for (5.15) follows the lines of the second part of Lemma 5.5. □

6. Conclusions and future directions

Now, we would like to summarize the content of the paper and to propose our future plans on this topic.

We remark to the reader that several computations and proofs, along with other complementary results mentioned in this article and references, can be found in [29]. We have chosen this procedure to avoid reproducing long computations here.

In brief, we introduced and studied curvature properties of a particular family of warped products of two pseudo-Riemannian manifolds which we called a base conformal warped product.
Roughly speaking, the metric of such a product is a mixture of a conformal metric on the base and a warped metric. We concentrated our attention on a special subclass of this structure, where there is a specific relation between the conformal factor $c$ and the warping function $w$, namely $c = w^\mu$ with $\mu$ a real parameter.

As we mentioned in §1 and the first part of §2, these kinds of metrics and considerations about their curvatures are very frequent in different physical areas, for instance the theory of general relativity, extra-dimension theories (Kaluza-Klein, Randall-Sundrum), string and super-gravity theories; also in global analysis, for example in the study of the spectrum of Laplace-Beltrami operators on p-forms, etc.

More precisely, in Theorems 3.1 and 3.2, we obtained the classical relations among the different involved Ricci tensors (respectively, scalar curvatures) for metrics of the form $c^2 g_B \oplus w^2 g_F$. Then the study of particular families of either scalar or tensorial nonlinear partial differential operators on pseudo-Riemannian manifolds (see Lemmas 3.3 and 3.7) allowed us to find reduced expressions of the Ricci tensor and scalar curvature for metrics as above with $c = w^\mu$, where $\mu$ is a real parameter (see Theorems 2.2 and 2.3). These reductions can be considered as generalizations of those used by Yamabe in [79] in order to obtain the transformation law of the scalar curvature under a conformal change in the metric and those used in [27] with the aim of obtaining a suitable relation among the involved scalar curvatures in a singly warped product (see also [52] for another particular application and our study on multiply warped products in [28]).

In §4 and §5, under the hypothesis that $(B, g_B)$ is a “compact” and connected Riemannian manifold of dimension $m \ge 3$ and $(F, g_F)$ is a pseudo-Riemannian manifold of dimension $k \ge 0$ with constant scalar curvature $S_F$, we dealt with the problem (Pb-sc).
This question leads us to analyze the existence and uniqueness of solutions for nonlinear elliptic partial differential equations with several kinds of nonlinearities. The type of nonlinearity changes with the value of the real parameter $\mu$ and the sign of $S_F$. In this article, we concentrated our attention on the cases of constant scalar curvature $S_F \le 0$, and accordingly the central results are Theorems 2.5 and 2.6. Although our results are partial, so that there are more cases to study in forthcoming works, we also obtained other complementary results under more restrictive hypotheses about the sign of the scalar curvature of the base.

Throughout our study, we meet several types of partial differential equations. Among them, the most important ones are those with concave-convex nonlinearities and the so-called Lichnerowicz-York equation. About the former, we deal with the existence of solutions and leave the question of multiplicity of solutions to a forthcoming study.

We observe that the previous problems, as well as the study of the Einstein equation on base conformal warped products, $(\psi, \mu)$-bcwp’s and their generalizations to multi-fiber cases, give rise to a rich family of interesting problems in differential geometry and physics (see for instance the several recent works of R. Argurio, J. P. Gauntlett, M. O. Katanaev, H. Kodama, J. Maldacena, H.-J. Schmidt, A. Strominger, K. Uzawa, P. S. Wesson among many others) and in nonlinear analysis (see the different works of A. Ambrosetti, T. Aubin, Y. Choquet-Bruhat, J. Escobar, E. Hebey, J. Isenberg, A. Malchiodi, D. Pollack, R. Schoen, S.-T. Yau among others).

Appendix A.

Let us assume the hypothesis of Theorem 2.3 (i), with the dimension of the base $m \ge 2$ and that of the fiber $k \ge 1$. In order to describe the classification of the type of nonlinearities involved in (2.11), we will introduce some notation (for a complete study of these nonlinearities see [29, Section 5]).
The example in Figure 3 will help the reader to clarify the notation.

Note that the denominator in (2.12) is
$$\eta := (m-1)(m-2)\mu^2 + 2(m-2)k\mu + (k+1)k \tag{A.1}$$
and verifies $\eta > 0$ for all $\mu \in \mathbb{R}$. Thus $\alpha$ in (2.12) is positive if and only if $\mu > -\frac{k}{m-1}$, and by the hypothesis $\mu \neq -\frac{k}{m-1}$ in Theorem 2.3 (i), it follows that $\alpha \neq 0$.

We now introduce the following notation:
$$p = p(m,k,\mu) = 2\mu\alpha + 1 \quad \text{and} \quad q = q(m,k,\mu) = 2(\mu-1)\alpha + 1 = p - 2\alpha, \tag{A.2}$$
where $\alpha$ is defined by (2.12).

Thus, for all $m, k, \mu$ given as above, $p$ is positive. Indeed, by (A.1), $p > 0$ if and only if $\varpi > 0$, where
$$\varpi := \varpi(m,k,\mu) := 4\mu[k + (m-1)\mu] + (m-1)(m-2)\mu^2 + 2(m-2)k\mu + (k+1)k = (m-1)(m+2)\mu^2 + 2mk\mu + (k+1)k.$$
But $\operatorname{discr}(\varpi) \le -4km^2 \le -16$ and $m > 1$, so $\varpi > 0$.

Unlike $p$, $q$ changes sign depending on $m$ and $k$. Furthermore, it is important to determine the position of $p$ and $q$ with respect to $1$ as a function of $m$ and $k$. In order to do that, we define
$$D := \{(m,k) \in \mathbb{N}_{\ge 2} \times \mathbb{N}_{\ge 1} : \operatorname{discr}(\varrho(m,k,\cdot)) < 0\}, \tag{A.3}$$
where $\mathbb{N}_{\ge l} := \{j \in \mathbb{N} : j \ge l\}$ and
$$\varrho := \varrho(m,k,\mu) := 4(\mu-1)[k + (m-1)\mu] + (m-1)(m-2)\mu^2 + 2(m-2)k\mu + (k+1)k = (m-1)(m+2)\mu^2 + 2(mk - 2(m-1))\mu + (k-3)k.$$
Note that by (A.1), $q > 0$ if and only if $\varrho > 0$. Furthermore, $q = 0$ if and only if $\varrho = 0$. But here $\operatorname{discr}(\varrho(m,k,\cdot))$ changes its sign as a function of $m$ and $k$.

We adopt here the notation of [29, Table 4], namely $CD = (\mathbb{N}_{\ge 2} \times \mathbb{N}_{\ge 1}) \setminus D$ if $D \subseteq \mathbb{N}_{\ge 2} \times \mathbb{N}_{\ge 1}$ and $CI = \mathbb{R} \setminus I$ if $I \subseteq \mathbb{R}$. Thus, if $(m,k) \in CD$, let $\mu_-$ and $\mu_+$ be the two roots (possibly coinciding, see [29, Remark 5.3]) of $q$, with $\mu_- \le \mu_+$. Besides, if $\operatorname{discr}(\varrho(m,k,\cdot)) > 0$, then $\mu_- < 0$, whereas $\mu_+$ can take any sign.

References

[1] O. Aharony, S. S. Gubser, J. Maldacena, H. Ooguri and Y. Oz, Large N Field Theories, String Theory and Gravity, Physics Reports 323 (2000), 183-386 [arXiv:hep-th/9905111].
[2] S. Alama, Semilinear elliptic equations with sublinear indefinite nonlinearities, Advances in Differential Equations 4 No. 6 (1999), 813-842.
[3] H. Amann, On the number of solutions of nonlinear equations in ordered Banach spaces, J. Funct. Anal. 11 (1972), 346-384.
[4] H. Amann, Fixed point equations and nonlinear eigenvalue problems in ordered Banach spaces, SIAM Rev. 18 (1976), 620-709.

Figure 3. Example: $(m,k) = (7, 4) \in CD$ (plotted curves: $p(m,k,\mu)$, $q(m,k,\mu)$, $\alpha(m,k,\mu)$, $\rho(m,k,\mu)$; marked values on the $\mu$-axis: $\mu_{sc}$, $\mu_{pY}$, $\mu_-$, $\mu_+$, $1$).

[5] A. Ambrosetti, H. Brezis and G. Cerami, Combined effects of concave and convex nonlinearities in some elliptic problems, J. Funct. Anal. 122 No. 2 (1994), 519-543.
[6] A. Ambrosetti, J. Garcia Azorero and I. Peral, Existence and multiplicity results for some nonlinear elliptic equations: a survey, Rendiconti di Matematica Serie VII Volume 20 (2000), 167-198.
[7] A. Ambrosetti and P. Hess, Positive solutions of asymptotically linear elliptic eigenvalue problems, J. Math. Anal. Appl. 73 (1980), 411-422.
[8] A. Ambrosetti, A. Malchiodi and W.-M. Ni, Singularly Perturbed Elliptic Equations with Symmetry: Existence of Solutions Concentrating on Spheres, Part I, (2002).
[9] A. Ambrosetti and P. H. Rabinowitz, Dual variational methods in critical points theory and applications, J. Funct. Anal. 14 (1973), 349-381.
[10] M. T. Anderson, P. T. Chrusciel and E. Delay, Non-trivial, static, geodesically complete, vacuum space-times with a negative cosmological constant, JHEP 10 (2002), 063.
[11] F. Antoci, On the spectrum of the Laplace-Beltrami operator for p-forms for a class of warped product metrics, Advances in Mathematics 188 (2) (2004), 247-293 [arXiv:math.SP/0311184].
[12] R. Argurio, Brane Physics in M-theory, PhD thesis (Université Libre de Bruxelles), ULB-TH-98/15 [arXiv:hep-th/9807171].
[13] T. Aubin, Nonlinear analysis on manifolds. Monge-Ampere equations, Comprehensive Studies in Mathematics no. 252, Springer Verlag, Berlin (1982).
[14] M. Badiale and F. Dobarro, Some Existence Results for Sublinear Elliptic Problems in $\mathbb{R}^n$, Funkcialaj Ekv.
39 (1996), 183-202.
[15] M. Bañados, C. Teitelboim and J. Zanelli, The black hole in three dimensional spacetime, Phys. Rev. Letters 69 (1992), 1849-1851.
[16] M. Bañados, C. Henneaux, C. Teitelboim and J. Zanelli, Geometry of 2+1 black hole, Phys. Rev. D 48 (1993), 1506-1525.
[17] J. K. Beem, P. E. Ehrlich and K. L. Easley, Global Lorentzian Geometry, 2nd Edition, Pure and Applied Mathematics Series Vol. 202, Marcel Dekker Inc., New York (1996).
[18] A. Besse, Einstein manifolds, Modern Surveys in Mathematics no. 10, Springer Verlag, Berlin (1987).
[19] R. L. Bishop and B. O’Neill, Manifolds of negative curvature, Trans. Amer. Math. Soc. 145 (1969), 1-49.
[20] H. Brezis and S. Kamin, Sublinear elliptic equation in $\mathbb{R}^N$, Manus. Math. 74 (1992), 87-106.
[21] J. Chabrowski and J. B. do O, On Semilinear Elliptic Equations Involving Concave and Convex Nonlinearities, Math. Nachr. 233-234 (2002), 55-76.
[22] Y. Choquet-Bruhat, J. Isenberg and D. Pollack, The constraint equations for the Einstein-scalar field system on compact manifolds, Class. Quantum Grav. 24 (2007), 809-828 [arXiv:gr-qc/0610045].
[23] C. Cortázar, M. Elgueta and P. Felmer, On a semilinear elliptic problem in $\mathbb{R}^n$ with a non-Lipschitzian nonlinearity, Advances in Differential Equations 1 (2) (1996), 199-218.
[24] V. Coti Zelati, F. Dobarro and R. Musina, Prescribing scalar curvature in warped products, Ricerche Mat. 46 (1) (1997), 61-76.
[25] M. G. Crandall, P. H. Rabinowitz and L. Tartar, On a Dirichlet problem with a singular nonlinearity, MRC Report 1680 (1976).
[26] D. De Figueiredo, J.-P. Gossez and P. Ubilla, Local superlinearity and sublinearity for indefinite semilinear elliptic problems, J. Funct. Anal. 199 (2) (2003), 452-467.
[27] F. Dobarro and E. Lami Dozo, Scalar curvature and warped products of Riemann manifolds, Trans. Amer. Math. Soc. 303 (1987), 161-168.
[28] F. Dobarro and B. Ünal, Curvature of multiply warped products, J. Geom. Phys. 55 (1) (2005), 75-106 [arXiv:math.DG/0406039].
[29] F. Dobarro and B. Ünal, Curvature of Base Conformal Warped Products, arXiv:math.DG/0412436.
[30] A. V. Frolov, Kasner-AdS spacetime and anisotropic brane-world cosmology, Phys. Lett. B 514 (2001), 213-216 [arXiv:gr-qc/0102064].
[31] A. Garcia-Parrado, Bi-conformal vector fields and their applications to the characterization of conformally separable pseudo-Riemannian manifolds, arXiv:math-ph/0409037.
[32] A. Garcia-Parrado and J. M. M. Senovilla, Bi-conformal vector fields and their applications, Class. Quantum Grav. 21, 2153-2177.
[33] J. P. Gauntlett, N. Kim and D. Waldram, M-Fivebranes Wrapped on Supersymmetric Cycles, Phys. Rev. D 63 (2001), 126001 [arXiv:hep-th/0012195].
[34] J. P. Gauntlett, N. Kim and D. Waldram, M-Fivebranes Wrapped on Supersymmetric Cycles II, Phys. Rev. D 65 (2002), 086003 [arXiv:hep-th/0109039].
[35] J. P. Gauntlett, N. Kim, S. Pakis and D. Waldram, M-theory solutions with AdS factors, Class. Quantum Grav. 19 (2002), 3927-3945.
[36] J. P. Gauntlett, D. Martelli, J. Sparks and D. Waldram, Supersymmetric AdS5 solutions of M-theory, Class. Quant. Grav. 21 (2004), 4335-4366 [arXiv:hep-th/0402153].
[37] A. M. Ghezelbash and R. B. Mann, Atiyah-Hitchin M-Branes, JHEP 0410 (2004), 012 [arXiv:hep-th/0408189].
[38] J. T. Giblin, Jr. and A. D. Hwang, Spacetime Slices and Surfaces of Revolution, J. Math. Phys. 45 (2004), 4551 [arXiv:gr-qc/0406010].
[39] J. T. Giblin Jr., D. Marolf and R. H. Garvey, Spacetime Embedding Diagrams for Spherically Symmetric Black Holes, Gen. Rel. Grav. 36 (2004), 83-99 [arXiv:gr-qc/0305102].
[40] B. R. Greene, K. Schalm and G. Shiu, Warped compactifications in M and F theory, Nucl. Phys. B 584 (2000), 480-508 [arXiv:hep-th/0004103].
[41] S. W. Hawking and G. F. Ellis, The large scale structure of space-time, Cambridge Monographs on Mathematical Physics, (1973).
[42] E. Hebey, Variational methods and elliptic equations in Riemannian geometry, Notes from lectures at ICTP, Workshop on recent trends in nonlinear variational problems, http://www.ictp.trieste.it, 2003 smr1486/3.
[43] E. Hebey, F. Pacard and D. Pollack, A variational analysis of Einstein-scalar field Lichnerowicz equations on compact Riemannian manifolds, arXiv:gr-qc/0702203.
[44] E. Hebey and M. Vaugon, From best constants to critical functions, Math. Z. 237 (2001), 737-767.
[45] S.-T. Hong, J. Choi and Y.-J. Park, (2 + 1) BTZ Black hole and multiply warped product space time, General Relativity and Gravitation 35 12 (2003), 2105-2116.
[46] M. Ito, Five dimensional warped geometry with bulk scalar field, arXiv:hep-th/0109040.
[47] M. O. Katanaev, T. Klösch and W. Kummer, Global properties of warped solutions in general relativity, Ann. Physics 276 (2) (1999), 191-222.
[48] J. Kazdan, Some applications of partial differential equations to problems in geometry, Surveys in Geometry Series, Tokyo Univ. (1983).
[49] D.-S. Kim and Y. H. Kim, Compact Einstein warped product spaces with nonpositive scalar curvature, Proc. Amer. Math. Soc. 131 (8) (2003), 2573-2576.
[50] H. Kodama and K. Uzawa, Moduli instability in warped compactifications of the type-IIB supergravity, JHEP07(2005)061 [arXiv:hep-th/0504193].
[51] H. Kodama and K. Uzawa, Comments on the four-dimensional effective theory for warped compactification, JHEP03(2006)053 [arXiv:hep-th/0512104].
[52] J. Lelong-Ferrand, Geometrical interpretations of scalar curvature and regularity of conformal homeomorphisms, Differential Geometry and Relativity, Mathematical Phys. and Appl. Math. Vol. 3, Reidel, Dordrecht (1976), 91-105.
[53] J. E. Lidsey, Supergravity Brane Cosmologies, Phys. Rev. D 62 (2000), 083515 [arXiv:hep-th/0007014].
[54] P. L. Lions, On the existence of positive solutions of semilinear elliptic equations, SIAM Review 24 4 (1982), 441-467.
[55] J. Maldacena, The Large N Limit of Superconformal field theories and supergravity, Adv. Theor. Math. Phys. 2 (1998), 231-252; Int. J. Theor. Phys. 38 (1999), 1113-1133 [arXiv:hep-th/9711200].
[56] R. Melrose, Geometric scattering theory, Stanford Lectures, Cambridge University Press, Cambridge (1995).
[57] C. W. Misner, J. A. Wheeler and K. S. Thorne, Gravitation, W. H. Freeman and Company, San Francisco (1973).
[58] Ó. N. Murchadha, Readings of the Lichnerowicz-York equation, Acta Physica Polonica B 36, 1 (2005), 109-120.
[59] B. O’Neill, Semi-Riemannian geometry, Academic Press, New York (1983).
[60] J. M. Overduin and P. S. Wesson, Kaluza-Klein Gravity, Phys. Rept. 283 (1997), 303-380 [arXiv:gr-qc/9805018].
[61] G. Papadopoulos and P. K. Townsend, Intersecting M-branes, Physics Letters B 380 (1996), 273-279 [arXiv:hep-th/9603087].
[62] J. L. Petersen, Introduction to the Maldacena Conjecture on AdS/CFT, Int. J. Mod. Phys. A 14 (1999), 3597-3672 [arXiv:hep-th/9902131].
[63] L. Randall and R. Sundrum, A large mass hierarchy from a small extra dimension, Phys. Rev. Letters 83 3770 (1999) [arXiv:hep-th/9905221].
[64] L. Randall and R. Sundrum, An alternative to compactification, Phys. Rev. Letters 83 (1999), 4690 [arXiv:hep-th/9906064].
[65] S. Randjbar-Daemi and V. Rubakov, 4d-flat compactifications with brane vorticities, JHEP 0410 (2004), 054 [arXiv:hep-th/0407176].
[66] H.-J. Schmidt, A new proof of Birkhoff’s theorem, Gravitation and Cosmology, Grav. Cosmol. 3 (1997), 185-190 [arXiv:gr-qc/9709071].
[67] H.-J. Schmidt, Lectures on mathematical cosmology, arXiv:gr-qc/0407095.
[68] R. Schoen, Conformal deformation of a Riemannian metric to constant scalar curvature, Journal of Differential Geometry 20 (1984), 479-495.
[69] K. Schwarzschild, On the Gravitational Field of a Mass Point according to Einstein’s Theory, Sitzungsberichte der Koeniglich Preussischen Akademie der Wissenschaften zu Berlin (1916), 189-196 [arXiv:physics/9905030].
[70] J. Shi and M. Yao, Positive solutions for elliptic equations with singular nonlinearity, EJDE Vol. 2005 (2005), 04, 1-11.
[71] J. Soda, Gravitational waves in brane world A Midi-superspace Approach, arXiv:hep-th/0202016.
[72] A. Strominger, Superstrings with torsion, Nucl. Phys. B 274 (1986), 253.
[73] M. E. Taylor, Partial Differential Equations III - Nonlinear Equations, Applied Mathematical Sciences - Springer (1996).
[74] K. Thorne, Warping spacetime, The Future of Theoretical Physics and Cosmology, Part 5, Cambridge University Press (2003), 74-104.
[75] N. Trudinger, Remarks concerning the conformal deformation of Riemannian structures on compact manifolds, Ann. Scuola Norm. Sup. Pisa 22 (1968), 265-274.
[76] P. S. Wesson, Space-Time-Matter, Modern Kaluza-Klein Theory, World Scientific (1999).
[77] P. S. Wesson, On Higher-Dimensional Dynamics, arXiv:gr-qc/0105059.
[78] M. Willem, Minimax Theorems, Birkhäuser, Boston (1996).
[79] H. Yamabe, On a deformation of Riemannian structures on compact manifolds, Osaka Math. J. 12 (1960), 21-37.

(F. Dobarro) Dipartimento di Matematica e Informatica, Università degli Studi di Trieste, Via Valerio 12/b, I-34127 Trieste, Italy
E-mail address: dobarro@dmi.units.it

(B. Ünal) Department of Mathematics, Bilkent University, Bilkent, 06800 Ankara, Turkey
E-mail address: bulentunal@mail.com

1. Introduction
2. Motivations and Main results
3. The curvature relations - Sketch of the proofs
4. The problem (Pb-sc) - Existence of solutions
5. Proof of the Theorem 4.15
6. Conclusions and future directions
Appendix A.
References

ABSTRACT

We consider the curvature of a family of warped products of two pseudo-Riemannian manifolds $(B,g_B)$ and $(F,g_F)$ furnished with metrics of the form $c^{2}g_B \oplus w^2 g_F$ and, in particular, of the type $w^{2 \mu}g_B \oplus w^2 g_F$, where $c, w \colon B \to (0,\infty)$ are smooth functions and $\mu$ is a real parameter. We obtain suitable expressions for the Ricci tensor and scalar curvature of such products that allow us to establish results about the existence of Einstein or constant scalar curvature structures in these categories.
If $(B,g_B)$ is Riemannian, the latter question involves nonlinear elliptic partial differential equations with concave-convex nonlinearities and singular partial differential equations of the Lichnerowicz-York type among others.
<|endoftext|><|startoftext|> Introduction

The present paper provides a finishing touch in a local classification of essentially conformally symmetric pseudo-Riemannian metrics.

A pseudo-Riemannian manifold of dimension $n \ge 4$ is called essentially conformally symmetric if it is conformally symmetric [2] (in the sense that its Weyl conformal tensor is parallel) without being conformally flat or locally symmetric.

The metric of an essentially conformally symmetric manifold is always indefinite [4, Theorem 2]. Compact essentially conformally symmetric manifolds are known to exist in all dimensions $n \ge 5$ with $n \equiv 5 \pmod 3$, where they represent all indefinite metric signatures [8], while examples of essentially conformally symmetric pseudo-Riemannian metrics on open manifolds of all dimensions $n \ge 4$ were first constructed in [16].

On every conformally symmetric manifold there is a naturally distinguished parallel distribution $\mathcal{D}$, of some dimension $d$, which we call the Olszak distribution. As shown by Olszak [13], for an essentially conformally symmetric manifold $d \in \{1, 2\}$. In [7] we described the local structure of all conformally symmetric manifolds with $d = 2$. See also Section 3. This paper establishes an analogous result (Theorem 4.1) for the case $d = 1$. In both cases, some of the metrics in question are locally symmetric. In Remark 4.2 we explain why a similar classification result cannot be valid just for essentially conformally symmetric manifolds.

Essentially conformally symmetric manifolds with $d = 1$ are all Ricci-recurrent, in the sense that, for every tangent vector field $v$, the Ricci tensor $\rho$ and the covariant derivative $\nabla_v\rho$ are linearly dependent at each point.
The local structure of essentially conformally symmetric Ricci-recurrent manifolds at points with $\rho \otimes \nabla\rho \neq 0$ has already been determined by the second author [16]. Our new contribution settles the one case still left open in the local classification problem, namely, that of essentially conformally symmetric manifolds with $d = 1$ at points where $\rho \otimes \nabla\rho = 0$.

2000 Mathematics Subject Classification. 53B30.
Key words and phrases. Parallel Weyl tensor, conformally symmetric manifold.
http://arxiv.org/abs/0704.0596v1
A. DERDZINSKI AND W. ROTER

The literature dealing with conformally symmetric manifolds includes, among others, [9, 10, 12, 15, 17, 18] and the papers cited above. A local classification of homogeneous essentially conformally symmetric manifolds can be found in [3].

1. Preliminaries

Throughout this paper, all manifolds and bundles, along with sections and connections, are assumed to be of class $C^\infty$. A manifold is, by definition, connected. Unless stated otherwise, a mapping is always a $C^\infty$ mapping between manifolds.

Given a connection $\nabla$ in a vector bundle $\mathcal{E}$ over a manifold $M$, a section $\psi$ of $\mathcal{E}$, and vector fields $u, v$ tangent to $M$, we use the sign convention
$$R(u,v)\psi = \nabla_v\nabla_u\psi - \nabla_u\nabla_v\psi + \nabla_{[u,v]}\psi \tag{1}$$
for the curvature tensor $R = R^\nabla$.

The Levi-Civita connection of a given pseudo-Riemannian manifold $(M, g)$ is always denoted by $\nabla$. We also use the symbol $\nabla$ for connections induced by $\nabla$ in various $\nabla$-parallel subbundles of $TM$ and their quotients.

The Schouten tensor $\sigma$ and Weyl conformal tensor $W$ of a pseudo-Riemannian manifold $(M, g)$ of dimension $n \ge 4$ are given by $\sigma = \rho - (2n-2)^{-1} s\,g$, with $\rho$ denoting the Ricci tensor, $s = \operatorname{tr}_g\rho$ standing for the scalar curvature, and
$$W = R - (n-2)^{-1} g \wedge \sigma. \tag{2}$$
Here $\wedge$ is the exterior multiplication of 1-forms valued in 1-forms, which uses the ordinary $\wedge$ as the valuewise multiplication; thus, $g \wedge \sigma$ is a 2-form valued in 2-forms.
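For readers who prefer an explicit formula, the product $g \wedge \sigma$ in (2) can be evaluated on four vector fields as sketched below. This is only an illustration under a stated assumption: the valuewise exterior product is taken without a factor of $1/2$, that is, $(\xi \wedge \eta)(w, w') = \xi(w)\eta(w') - \xi(w')\eta(w)$ for 1-forms $\xi, \eta$. The auxiliary vector fields $w, w'$ and the constant $K$ are our own notation, introduced only for this check.

```latex
% Assumption: (\xi\wedge\eta)(w,w') = \xi(w)\eta(w') - \xi(w')\eta(w), no factor 1/2.
\begin{align*}
(g\wedge\sigma)(u,v)
  &= g(u,\cdot)\wedge\sigma(v,\cdot) \;-\; g(v,\cdot)\wedge\sigma(u,\cdot),\\
(g\wedge\sigma)(u,v,w,w')
  &= g(u,w)\,\sigma(v,w') - g(u,w')\,\sigma(v,w)
   - g(v,w)\,\sigma(u,w') + g(v,w')\,\sigma(u,w).
\end{align*}
% Consistency check with (2): suppose R = (K/2)\, g\wedge g for a constant K, i.e.
%   R(u,v,w,w') = K\,[\,g(u,w)g(v,w') - g(u,w')g(v,w)\,].
% Contracting the first and third arguments gives
\begin{align*}
\rho &= (n-1)K\,g, \qquad s = n(n-1)K,\\
\sigma &= \rho - (2n-2)^{-1}s\,g = \tfrac{1}{2}(n-2)K\,g,\\
(n-2)^{-1}\,g\wedge\sigma &= \tfrac{K}{2}\,g\wedge g = R,
\qquad\text{hence } W = 0 .
\end{align*}
```

With this normalization, formula (2) returns $W = 0$ for such curvature tensors, as it should, which supports the reading of $\wedge$ adopted in the sketch.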
Let $(t, s) \mapsto x(t, s)$ be a fixed variation of curves in a pseudo-Riemannian manifold $(M, g)$, that is, an $M$-valued $C^\infty$ mapping from a rectangle (product of intervals) in the $ts$-plane. By a vector field $w$ along the variation we mean, as usual, a section of the pullback of $TM$ to the rectangle (so that $w(t, s) \in T_{x(t,s)}M$). Examples are $x_t$ and $x_s$, which assign to $(t, s)$ the velocity of the curve $t \mapsto x(t, s)$ or $s \mapsto x(t, s)$ at $t$ or $s$. Further examples are provided by restrictions to the variation of vector fields on $M$. The partial covariant derivatives of a vector field $w$ along the variation are the vector fields $w_t$, $w_s$ along the variation, obtained by differentiating $w$ covariantly along the curves $t \mapsto x(t, s)$ or $s \mapsto x(t, s)$. Skipping parentheses, we write $w_{ts}$, $w_{stt}$, etc., rather than $(w_t)_s$, $((w_s)_t)_t$ for higher-order derivatives, as well as $x_{ss}$, $x_{st}$ instead of $(x_s)_s$, $(x_s)_t$. One always has $w_{ts} = w_{st} + R(x_t, x_s)w$, cf. [11, formula (5.29) on p. 460], and, since the Levi-Civita connection $\nabla$ is torsionfree, $x_{st} = x_{ts}$. Thus, whenever $(t, s) \mapsto x(t, s)$ is a variation of curves in $M$,
$$x_{tss} = x_{sst} + R(x_t, x_s)x_s. \tag{3}$$

CONFORMALLY SYMMETRIC MANIFOLDS

2. The Olszak distribution

The Olszak distribution of a conformally symmetric manifold $(M, g)$ is the parallel subbundle $\mathcal{D}$ of $TM$, the sections of which are the vector fields $u$ with the property that $\xi \wedge \Omega = 0$ for all vector fields $v, v'$ and for the differential forms $\xi = g(u, \cdot\,)$ and $\Omega = W(v, v', \cdot\,, \cdot\,)$. The distribution $\mathcal{D}$ was introduced, in a more general situation, by Olszak [13], who also proved the following lemma.

Lemma 2.1. The following conclusions hold for the dimension $d$ of the Olszak distribution $\mathcal{D}$ in any conformally symmetric manifold $(M, g)$ with $\dim M = n \ge 4$.
(i) $d \in \{0, 1, 2, n\}$, and $d = n$ if and only if $(M, g)$ is conformally flat.
(ii) $d \in \{1, 2\}$ if $(M, g)$ is essentially conformally symmetric.
(iii) $d = 2$ if and only if $\operatorname{rank} W = 1$, in the sense that $W$, as an operator acting on exterior 2-forms, has rank 1 at each point.
(iv) If $d = 2$, the distribution $\mathcal{D}$ is spanned by all vector fields of the form $W(u, v)v'$ for arbitrary vector fields $u, v, v'$ on $M$.

Proof. See Appendix I. □

In the next lemma, parts (a) and (d) are due to Olszak [13, 2° and 3° on p. 214].

Lemma 2.2. If $d \in \{1, 2\}$, where $d$ is the dimension of the Olszak distribution $\mathcal{D}$ of a given conformally symmetric manifold $(M, g)$ with $\dim M = n \ge 4$, then
(a) $\mathcal{D}$ is a null parallel distribution,
(b) at any $x \in M$ the space $\mathcal{D}_x$ contains the image of the Ricci tensor $\rho_x$ treated, with the aid of $g_x$, as an endomorphism of $T_xM$,
(c) the scalar curvature is identically zero and $R = W + (n-2)^{-1} g \wedge \rho$,
(d) $W(u, \cdot\,, \cdot\,, \cdot\,) = 0$ whenever $u$ is a section of $\mathcal{D}$,
(e) $R(v, v', \cdot\,, \cdot\,) = W(v, v', \cdot\,, \cdot\,) = 0$ for any sections $v$ and $v'$ of $\mathcal{D}^\perp$,
(f) of the connections in $\mathcal{D}$ and $\mathcal{E} = \mathcal{D}^\perp/\mathcal{D}$, induced by the Levi-Civita connection of $g$, the latter is always flat, and the former is flat if $d = 1$.

Proof. Assertion (e) for $W$ is immediate from the definition of $\mathcal{D}$. Namely, at any point $x \in M$, every 2-form $\Omega_x$ in the image of $W_x$ (for $W_x$ acting on 2-forms at $x$) is $\wedge$-divisible by $\xi = g_x(u, \cdot\,)$ for each $u \in \mathcal{D}_x \smallsetminus \{0\}$, and so $\Omega_x(v, v') = 0$ if $v, v' \in \mathcal{D}_x^\perp$.

We now proceed to prove (a), (b), (c) and (d).

First, let $d = 2$. By Lemma 2.1(iii), this amounts to the condition $\operatorname{rank} W = 1$, so that (a), (b) and (c) follow from Lemma 2.1(iv) combined with [7, Lemma 17.1(ii) and Lemma 17.2]. Also, for a nonzero 2-form $\Omega_x$ chosen as in the last paragraph, $\mathcal{D}_x$ is the image of $\Omega_x$, that is, $\Omega_x$ equals the exterior product of two vectors in $\mathcal{D}_x$ (treated as 1-forms, with the aid of $g_x$). Now (d) follows since, by (a), $\Omega_x(u_x, \cdot\,) = 0$ if $u$ is a section of $\mathcal{D}$.

Next, suppose that $d = 1$. Replacing $M$ by a neighborhood of any given point, we may assume that $\mathcal{D}$ is spanned by a vector field $u$.
If $u$ were not null, we would have $W(u,v,u,v')=0$ for any sections $v,v'$ of $D^\perp$, as one sees contracting the twice-covariant tensor field $W(\cdot,v,\cdot,v')=0$, at any point $x$, in an orthogonal basis containing the vector $u_x$. (We have already established (e) for $W$.) Combined with (e) for $W$ and the symmetries of $W$, the relation $W(u,v,u,v')=0$ for $v,v'$ in $D^\perp$ would then give $W=0$, contrary to the assumption that $d=1$. Thus, $u$ is null, which yields (a). Now

(4)    we choose, locally, a null vector field $u'$ with $g(u,u')=1$.

For any section $v$ of $D^\perp$ one sees that $W(u,\cdot,u',v)=0$ by contracting the tensor field $W(\cdot,\cdot,\cdot,v)=0$ in the first and third arguments, at any point $x$, in

(5)    a basis of $T_xM$ formed by $u_x$, $u'_x$ and $n-2$ vectors orthogonal to them,

and using (e) for $W$, along with the inclusion $D\subset D^\perp$, cf. (a). Since $u'$ and $D^\perp$ span $TM$, assertion (e) for $W$ thus implies (d).

To prove (b) and (c) when $d=1$, we distinguish two cases: $(M,g)$ is either essentially conformally symmetric, or locally symmetric. For (c), it suffices to establish vanishing of the scalar curvature $\mathrm{s}$ (cf. (2)). Now, in the former case, $\mathrm{s}=0$ according to [5, Theorem 7], while (b) follows since, as shown in [6, Theorem 7 on p. 18], for arbitrary vector fields $v$, $v'$ and $v''$ on an essentially conformally symmetric pseudo-Riemannian manifold, $\xi\wedge\Omega=0$, where $\xi=\rho(v,\cdot)$ and $\Omega=W(v',v'',\cdot,\cdot)$. In the case where $g$ is locally symmetric, (b) and (c) are established in Appendix II.

Assertion (e) for $R$ is now obvious from (e) for $W$ and (c), since, by (b), $\rho(v,\cdot)=0$ for any section $v$ of $D^\perp$. The claim about $E$ in (f) is in turn immediate from (1) and (e) for $R$, which states that $R(w,w')v$, for arbitrary vector fields $w,w'$ and any section $v$ of $D^\perp$, is orthogonal to all sections of $D^\perp$ (and hence must be a section of $D$).
Finally, to prove (f) for $D$, with $d=1$, let us fix a section $u$ of $D$, a vector field $v$, and define a differential 2-form $\zeta$ by $\zeta(w,w')=(n-2)R(w,w',u,v)$ for any vector fields $w,w'$. By (c) and (e), $\zeta=g(u,\cdot)\wedge\rho(v,\cdot)$, as $D\subset D^\perp$ (cf. (a)), and so $\rho(u,\cdot)=0$ in view of (b) and symmetry of $\rho$. However, by (b), both $g(u,\cdot)$ and $\rho(v,\cdot)$ are sections of the subbundle of $T^*M$ corresponding to $D$ under the bundle isomorphism $TM\to T^*M$ induced by $g$, so that $\zeta=0$ since the distribution $D$ is one-dimensional. □

3. The case d = 2

For more details of the construction described below, we refer the reader to [7].

Let there be given a surface $\Sigma$, a projectively flat torsionfree connection $D$ on $\Sigma$ with a $D$-parallel area form $\alpha$, an integer $n\ge4$, a sign factor $\varepsilon=\pm1$, a real vector space $V$ of dimension $n-4$, and a pseudo-Euclidean inner product $\langle\,,\rangle$ on $V$.

We also assume the existence of a twice-contravariant symmetric tensor field $T$ on $\Sigma$ with $\operatorname{div}^D(\operatorname{div}^DT)+(\rho^D,T)=\varepsilon$ (in coordinates: $T^{jk}{}_{,jk}+T^{jk}R_{jk}=\varepsilon$). Here $\operatorname{div}^D$ denotes the $D$-divergence, $\rho^D$ is the Ricci tensor of $D$, and $(\,,)$ stands for the obvious pairing. Such $T$ always exists locally in $\Sigma$. In fact, according to [7, Theorem 10.2(i)] combined with [7, Lemma 11.2], $T$ exists whenever $\Sigma$ is simply connected and noncompact.

For $T$ chosen as above, we define a twice-covariant symmetric tensor field $\tau$ on $\Sigma$, that is, a section of $[T^*\Sigma]^{\odot2}$, by requiring $\tau$ to correspond to the section $T$ of $[T\Sigma]^{\odot2}$ under the vector-bundle isomorphism $T\Sigma\to T^*\Sigma$ which acts on vector fields $v$ by $v\mapsto\alpha(v,\cdot)$. In coordinates, $\tau_{jk}=\alpha_{jl}\alpha_{km}T^{lm}$.

Next, we denote by $h^D$ the Patterson-Walker Riemann extension metric [14] on the total space $T^*\Sigma$, obtained by requiring that all vertical and all $D$-horizontal vectors be $h^D$-null, while $h^D_x(\zeta,w)=\zeta(d\pi_xw)$ for $x\in T^*\Sigma$, any vector $w\in T_xT^*\Sigma$, any vertical vector $\zeta\in\operatorname{Ker}d\pi_x=T^*_{\pi(x)}\Sigma$, and the bundle projection $\pi:T^*\Sigma\to\Sigma$.
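The three defining requirements of the Patterson-Walker metric can be checked against an explicit coordinate expression. The following is a sketch under stated assumptions, not a formula quoted from this paper: $x^j$ are local coordinates on $\Sigma$, $p_j$ are the induced fibre coordinates on $T^*\Sigma$, $\Gamma^l_{jk}$ are the components of the torsionfree connection $D$, $H_k$ is the $D$-horizontal lift of $\partial_{x^k}$, and products of differentials denote symmetric products. All of these names are introduced here for illustration only.

```latex
% Assumed coordinate form of the Riemann extension metric and of the horizontal lift:
\[
  h^D \;=\; 2\,dp_j\,dx^j \;-\; 2\,p_l\,\Gamma^{l}_{jk}\,dx^j\,dx^k ,
  \qquad
  H_k \;=\; \partial_{x^k} + \Gamma^{l}_{kj}\,p_l\,\partial_{p_j} .
\]
% Vertical vectors are null, since h^D has no dp_j dp_k term:
\[
  h^D(\partial_{p_j},\partial_{p_k}) \;=\; 0 .
\]
% Pairing of a vertical vector \zeta = \zeta_j\,\partial_{p_j} with any vector w:
\[
  h^D(\partial_{p_j},\partial_{x^k}) \;=\; \delta^{j}_{k}
  \quad\Longrightarrow\quad
  h^D(\zeta,w) \;=\; \zeta_j\,dx^j(w) \;=\; \zeta(d\pi\,w) .
\]
% Horizontal vectors are null, using \Gamma^{l}_{jk} = \Gamma^{l}_{kj}:
\[
  h^D(H_j,H_k)
  \;=\; \Gamma^{l}_{kj}\,p_l + \Gamma^{l}_{jk}\,p_l - 2\,p_l\,\Gamma^{l}_{jk}
  \;=\; 0 .
\]
```

The last line is where torsionfreeness of $D$ enters: without the symmetry of $\Gamma^l_{jk}$ in $j,k$ the two cross terms would not cancel against the quadratic term.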
Finally, let $\gamma$ and $\theta$ be the constant pseudo-Riemannian metric on $V$ corresponding to the inner product $\langle\,,\rangle$, and the function $V\to\mathbb R$ with $\theta(v)=\langle v,v\rangle$. Our $\Sigma,D,\alpha,n,\varepsilon,V,\langle\,,\rangle$ now give rise to the pseudo-Riemannian manifold

(6)    $(T^*\Sigma\times V,\ h^D-2\tau+\gamma-\theta\rho^D)$,

of dimension $n$, with the metric $h^D-2\tau+\gamma-\theta\rho^D$, where the function $\theta$ and covariant tensor fields $\tau,\rho^D,h^D,\gamma$ on $\Sigma$, $T^*\Sigma$ or $V$ are identified with their pullbacks to $T^*\Sigma\times V$. (Thus, for instance, $h^D-2\tau+\gamma$ is a product metric.)

We have the following local classification result, in which $d$ stands for the dimension of the Olszak distribution $D$.

Theorem 3.1. The pseudo-Riemannian manifold (6) obtained as above from any data $\Sigma,D,\alpha,n,\varepsilon,V,\langle\,,\rangle$ with the stated properties is conformally symmetric and has $d=2$. Conversely, in any conformally symmetric pseudo-Riemannian manifold such that $d=2$, every point has a connected neighborhood isometric to an open subset of a manifold (6) constructed above from some data $\Sigma,D,\alpha,n,\varepsilon,V,\langle\,,\rangle$.

The manifold (6) is never conformally flat, and it is locally symmetric if and only if the Ricci tensor $\rho^D$ is $D$-parallel.

Proof. See [7, Section 22]. Note that, in view of Lemma 2.1(iii), the condition $\operatorname{rank}W=1$ used in [7] is equivalent to $d=2$. □

The objects $\Sigma,D,\alpha,n,\varepsilon,V,\langle\,,\rangle$ are treated as parameters of the above construction, while $T$ is merely assumed to exist, even though the metric $g$ in (6) clearly depends on $\tau$ (and hence on $T$). This is justified by the fact that, with fixed $\Sigma,D,\alpha,n,\varepsilon,V,\langle\,,\rangle$, the metrics corresponding to two choices of $T$ are, locally, isometric to each other, cf. [7, Remark 22.1].

The metric signature of (6) is clearly given by $-\,-\,\ldots\,+\,+$, with the dots standing for the sign pattern of $\langle\,,\rangle$.

4. The case d = 1

Let there be given an open interval $I$, a $C^\infty$ function $f:I\to\mathbb R$, an integer $n\ge4$, a real vector space $V$ of dimension $n-2$ with a pseudo-Euclidean inner product $\langle\,,\rangle$, and a nonzero traceless linear operator $A:V\to V$, self-adjoint relative to $\langle\,,\rangle$. As in [16], we then define an $n$-dimensional pseudo-Riemannian manifold

(7)    $(I\times\mathbb R\times V,\ \kappa\,dt^2+dt\,ds+\gamma)$,

where products of differentials represent symmetric products, $t,s$ denote the Cartesian coordinates on the $I\times\mathbb R$ factor, $\gamma$ stands for the pullback to $I\times\mathbb R\times V$ of the flat pseudo-Riemannian metric on $V$ that corresponds to the inner product $\langle\,,\rangle$, and the function $\kappa:I\times\mathbb R\times V\to\mathbb R$ is given by $\kappa(t,s,\psi)=f(t)\langle\psi,\psi\rangle+\langle A\psi,\psi\rangle$.

The manifolds (7) are characterized by the following local classification result, analogous to Theorem 3.1. As before, $d$ is the dimension of the Olszak distribution.

Theorem 4.1. For any $I,f,n,V,\langle\,,\rangle,A$ as above, the pseudo-Riemannian manifold (7) is conformally symmetric and has $d=1$. Conversely, in any conformally symmetric pseudo-Riemannian manifold such that $d=1$, every point has a connected neighborhood isometric to an open subset of a manifold (7) constructed from some such $I,f,n,V,\langle\,,\rangle,A$.

The manifold (7) is never conformally flat, and it is locally symmetric if and only if $f$ is constant.

A proof of Theorem 4.1 is given at the end of the next section.

Obviously, the metric $\kappa\,dt^2+dt\,ds+\gamma$ in (7) has the sign pattern $-\,\ldots\,+$, where the dots stand for the sign pattern of $\langle\,,\rangle$.

Remark 4.2. A classification result of the same format as Theorem 4.1 cannot be true just for essentially conformally symmetric manifolds with $d=1$. Namely, such manifolds do not satisfy a principle of unique continuation: formula (7) with $f$ which is nonconstant on $I$, but constant on some nonempty open subinterval $I'$ of $I$, defines an essentially conformally symmetric manifold with a locally symmetric open submanifold $U=I'\times\mathbb R\times V$.
At points of U, the local structure of (7) does not, therefore, arise from a construction that, locally, produces all essentially conformally symmetric manifolds and nothing else. As explained in [7, Section 24], an analogous situation arises when d = 2. 5. Proof of Theorem 4.1 The following assumptions will be used in Lemma 5.1. (a) (M, g) is a conformally symmetric manifold of dimension n ≥ 4 and y ∈M . CONFORMALLY SYMMETRIC MANIFOLDS 7 (b) The Olszak distribution D of (M, g) is one-dimensional. (c) u is a global parallel vector field spanning D. (d) t :M → R is a C∞ function with g(u, · ) = dt and t(y) = 0. (e) dim V = n− 2 for the space V of all parallel sections of E = D⊥/D. (f) ρ = (2−n)f(t) dt⊗dt for some C∞ function f : I ′ → R on an open interval I ′, where ρ is the Ricci tensor and f(t) denotes the composite f ◦ t. For local considerations, only (a) and (b) are essential. In fact, condition (e) (in which ‘parallel’ refers to the connection in E induced by the Levi-Civita connection of g), as well (c) and (d) for some u and t, follow from (a) – (b) if M is simply connected. See Lemma 2.2(f). On the other hand, (c) – (d), Lemma 2.2(b) and symmetry of ρ give ∇dt = 0 and ρ = χ dt⊗ dt for some function χ : M → R, so that ∇ρ = dχ⊗ dt⊗ dt. However, ∇ρ is totally symmetric (that is, ρ satisfies the Codazzi equation): our assumption ∇W = 0 implies the condition divW = 0, well known [11, formula (5.29) on p. 460] to be equivalent to the Codazzi equation for the Schouten tensor σ, while σ = ρ by Lemma 2.2(c). Thus, dχ equals a function times dt, and so χ is, locally, a function of t, which (locally) yields (f). For any section v of D⊥, we denote by v the image of v under the quotient-pro- jection morphism D⊥ → E = D⊥/D. The data required for the construction in Section 4 consist of I, f, n, V appearing in (a) – (f), along with the pseudo-Euclidean inner product 〈 , 〉 in V , induced in an obvious way by g (cf. 
Lemma 2.2(f)), and A : V → V characterized by 〈Aψ, ψ ′〉 = W (u′, v, v ′, u′), for ψ, ψ ′ ∈ V , with a vector field u′ and sections v, v ′ of D⊥ chosen, locally, so that g(u, u′) = 1, ψ = v and ψ ′ = v ′. (The resulting bilinear form (ψ, ψ ′) 7→ 〈Aψ, ψ ′〉 on V is well-defined, that is, unaffected by the choices of u′, v or v ′, as a consequence of Lemma 2.2(d),(e), while the function W (u′, v, v ′, u′) is in fact constant, by Lemma 2.2(d), as ones sees differentiating it via the Leibniz rule and noting that, since v and v ′ are parallel, the covariant derivatives of v and v ′ in the direction of any vector field are sections of D.) That A is traceless and self-adjoint is immediate from the symmetries of W. Finally, A 6= 0 since, otherwise, W would vanish. (Namely, in view of Lemma 2.2(d),(e), W would yield 0 when evaluated on any quadruple of vector fields, each of which is either u′ or a section of D⊥.) Under the assumptions (a) – (f), with f = f(t), we then have (8) R(u′, v)v ′ = [f g(v, v ′) + 〈Av, v ′〉]g(u′, u)u for any sections v, v ′ of D⊥ and any vector field u′. In fact, ρ(v, · ) = ρ(v ′, · ) = 0 from symmetry of ρ and Lemma 2.2(b), so that, by Lemma 2.2(c), R(u′, v)v ′ = W (u′, v)v ′ − (n − 2)−1g(v, v ′)ρu′, where ρu′ denotes the unique vector field with g(ρu′, · ) = ρ(u′, · ). Now (8) follows: due to (d), (f) and the definition of A, both sides have the same g-inner product with u′, and are orthogonal to u⊥ = D⊥ (with R(u′, v)v ′ orthogonal to D⊥ in view of Lemma 2.2(e)). 8 A. DERDZINSKI AND W. ROTER We fix an open subinterval I of I ′, containing 0, and a null geodesic I ∋ t 7→ x(t) in M with x(0) = y, parametrized by the function t (in the sense that the function t restricted to the geodesic coincides with the geodesic parameter). Namely, since ∇dt = 0, the restriction of t to any geodesic is an affine function of the parameter; thus, by (d), it suffices to prescribe the initial data formed by x(0) = y and a null vector ẋ(0) ∈ TyM with g(ẋ(0), uy) = 1. 
As $g(\dot x(0),u_y)=1$, the plane $P$ in $T_yM$, spanned by the null vectors $\dot x(0)$ and $u_y$ (cf. Lemma 2.2(a)) is $g_y$-nondegenerate, and so $T_yM=P\oplus\tilde V$, for $\tilde V=P^\perp$. Let $\mathrm{pr}:T_yM\to\tilde V$ be the orthogonal projection. Since $\mathrm{pr}(D_y)=\{0\}$, the restriction of $\mathrm{pr}$ to $D_y^\perp$ descends to the quotient $E_y=D_y^\perp/D_y$, producing an isomorphism $E_y\to\tilde V$, also denoted by $\mathrm{pr}$. Finally, for $\psi\in V$, we let $t\mapsto\tilde\psi(t)\in T_{x(t)}M$ be the parallel field with $\tilde\psi(0)=\mathrm{pr}\,\psi_y$, and set $\kappa(t,s,\psi)=f(t)\langle\psi,\psi\rangle+\langle A\psi,\psi\rangle$, as in Section 4. The formula $F(t,s,\psi)=\exp_{x(t)}(\tilde\psi(t)+su_{x(t)}/2)$ now defines a $C^\infty$ mapping $F$ from an open subset of $\mathbb R^2\times V$ into $M$.

Lemma 5.1. Under the above hypotheses, $F^*g=\kappa\,dt^2+dt\,ds+\gamma$.

Proof. The $F$-images $w,w',F_*\psi$ of the constant vector fields $(1,0,0)$, $(0,1,0)$ and $(0,0,\psi)$ in $\mathbb R^2\times V$, for $\psi\in V$, are vector fields tangent to $M$ along $F$ (sections of $F^*TM$). Since $D^\perp$ is parallel, its leaves are totally geodesic and, by Lemma 2.2(e), the Levi-Civita connection of $g$ induces on each leaf a flat torsionfree connection. Thus, $w'$ and each $F_*\psi$ are parallel along each leaf of $D^\perp$, as well as tangent to the leaf, and parallel along the geodesic $t\mapsto x(t)$. Therefore, $w'=u/2$, while the functions $g(w',F_*\psi)$ and $g(F_*\psi,F_*\psi')$, for $\psi,\psi'\in V$, are constant, and hence equal to their values at $y$, that is, $0$ and $\langle\psi,\psi'\rangle$.

It now remains to be shown that $g(w,w)=\kappa\circ F$, $g(w,u/2)=1/2$ and $g(w,F_*\psi)=0$. To this end, we consider the variation $x(t,s)=F(t,sa,s\psi)$ of curves in $M$, with any fixed $a\in\mathbb R$ and $\psi\in V$. Clearly, $w=x_t$ along the variation (notation of Section 1). Next, $x_{ts}=x_{st}$ is tangent to $D^\perp$, since so is $x_s$, while $D^\perp$ is parallel. Consequently, $[g(x_t,u)]_s=0$, as $u$ is parallel and tangent to $D$. Thus, $g(w,u)=g(x_t,u)=1$. (Note that $g(x_t,u)=1$ at $s=0$, due to (d), as the geodesic $t\mapsto x(t)$ is parametrized by the function $t$.)
However, $x_{ss}=0$ and $x_s$ is tangent to $D^\perp$, so that (3) and (8) now give $x_{tss}=[fg(x_s,x_s)+\langle Ax_s,x_s\rangle]u$, which is parallel in the $s$ direction, while $x_{ts}=x_{st}=0$ at $s=0$. Hence $x_{ts}=s[fg(x_s,x_s)+\langle Ax_s,x_s\rangle]u$, and so $g(x_{ts},x_{ts})=0$ (cf. (c) above and Lemma 2.2(a)). This further yields $[g(x_t,x_t)]_{ss}/2=g(x_t,x_{tss})=fg(x_s,x_s)+\langle Ax_s,x_s\rangle$. The last function is constant in the $s$ direction, while $g(x_t,x_t)=[g(x_t,x_t)]_s=0$ at $s=0$, and so $g(w,w)=g(x_t,x_t)=s^2[fg(x_s,x_s)+\langle Ax_s,x_s\rangle]=\kappa$. Finally, being proportional to $u$ at each point, $x_{ts}$ is orthogonal to $D^\perp$, and hence to $F_*\psi$, which implies that $[g(x_t,F_*\psi)]_s=0$, and, as $g(w,F_*\psi)=g(x_t,F_*\psi)=0$ at $s=0$, we get $g(w,F_*\psi)=0$ everywhere. □

We are now in a position to prove Theorem 4.1. First, (7) is conformally symmetric and has $d=1$, as one can verify by a direct calculation, cf. [16, Theorem 3]. Conversely, if conditions (a) and (b) above are satisfied, we may also assume (c)–(f). (See the comment following (f).) Our assertion is now immediate from Lemma 5.1.

Appendix I: Proof of Lemma 2.1

We prove Lemma 2.1 here, since Olszak's paper [13] may be difficult to obtain.

The condition $d=n$ is equivalent to conformal flatness of $(M,g)$, since $n>2$ and so $\Omega=0$ is the only 2-form $\wedge$-divisible by all nonzero 1-forms $\xi$. At a fixed point $x$, the metric $g_x$ allows us to treat the Ricci tensor $\rho_x$ and any 2-form $\Omega_x$ as endomorphisms of $T_xM$, so that we may consider their images (which are subspaces of $T_xM$). If $W\ne0$, fixing a nonzero 2-form $\Omega_x$ in the image of $W_x$ acting on 2-forms at $x$ we see that, for every $u\in D_x$, our $\Omega_x$ is $\wedge$-divisible by $\xi=g_x(u,\cdot)$, and so the image of $\Omega_x$ contains $D_x$. Thus, $d\le2$, and (i) follows. (Being nonzero and decomposable, $\Omega_x$ has rank 2.) As shown in [6, Theorem 7 on p. 18], if $(M,g)$ is essentially conformally symmetric, the image of $\rho_x$ is a subspace of $D_x$, so that (i) yields (ii), since $g$ in (ii) cannot be Ricci-flat.
Next, if d = 2, the image of our Ωx coincides with Dx (as rankΩx = 2). Every 2-form in the image of Wx thus is a multiple of Ωx, being the exterior product of two vectors in Dx, identified, via gx, with 1-forms. Hence rankW = 1. Conversely, if rankW = 1, all nonzero 2-forms Ωx in the image of Wx are of rank 2, as Wx, being self-adjoint, is a multiple of Ωx ⊗ Ωx, and so the Bianchi identity for W gives Ωx ∧ Ωx = 0. All such Ωx are therefore ∧-divisible by ξ = gx(u, · ), for every nonzero vector u in the common 2-dimensional image of such Ωx, which shows that d = 2. Finally, (iv) follows if one chooses Ωx 6= 0 equal to Wx(v, v ′, · , · ) for some v, v ′ ∈ TxM . Appendix II: Lemma 2.2(b),(c) in the locally symmetric case Parts (b) and (c) of Lemma 2.2 for locally symmetric manifolds with d = 1 could, in principle, be derived from Cahen and Parker’s classification [1] of pseudo-Riemann- ian symmetric manifolds. We prove them here directly, for the reader’s convenience. Our argument uses assertions (a), (d) in Lemma 2.2, along with (e) for W, which were established in the proof of Lemma 2.2 before Appendix II was mentioned. Suppose that ∇R = 0 and d = 1. Replacing M by an open subset, we also assume that the Olszak distribution D is spanned by a vector field u. By (1), (9) i) R( · , · )u = Ω ⊗ u or, in coordinates, ii) ulRjkl s = Ωjku for some differential 2-form Ω, which obviously does not depend on the choice of u. (It is also clear from (1) that Ω is the curvature form of the connection in the line bundle D, induced by the Levi-Civita connection of g.) Being unique, Ω is parallel, 10 A. DERDZINSKI AND W. ROTER and so are ρ and W, which implies the Ricci identities R · Ω = 0, R · ρ = 0, and R ·W = 0. In coordinates: Rmlj sτsk +Rmlk sτjs = 0, where τ = Ω or τ = ρ, and (10) Rqpj sWsklm + Rqpk sWjslm + Rqpl sWjksm + Rqpk sWjkls = 0. 
Summing Rmlj sΩsk + Rmlk sΩjs = 0 against u l, we obtain Ω ◦ Ω = 0, where the metric g is used to treat Ω as a bundle morphism TM → TM that sends each vector field v to the vector field Ωv with g(Ωv, v ′) = Ω(v, v ′) for all vector fields v ′. Lemma 2.2(d) and (9.i) give W ( · , · , u, v) = R( · , · , u, v) = 0 for our fixed vector field u, spanning D, and any section v of D⊥. Hence, by (2), g(u, · ) ∧ σ(v, · ) = g(v, · ) ∧ σ(u, · ). Thus, σu = cu for the Schouten tensor σ and some constant c, with σu defined analogously to Ωv. (Otherwise, choosing v such that u, σu and v are linearly independent at a given point x, we would obtain a contradiction with the equality between planes in TxM , corresponding to the above equality between exterior products.) Consequently, g(u, · ) ∧ (σ + cg)(v, · ) = 0, and so σv + cv is a section of D whenever v is a section of D⊥. Let us now fix u′ as in (4). Symmetry of σ gives g(σu′, u) = c. In a suitably ordered basis with (5), at any point x, the endomorphism of TxM corresponding to σx thus has an upper triangular matrix with the diagonal entries c,−c, . . . ,−c, c, so that trgσ = (4 − n)c. Consequently, (n − 2) s = 2(n − 1)(4 − n)c, for the scalar curvature s, and (n − 2)ρu = 2cu. However, contracting (9.ii) in k = s, we get ρu = −Ωu, and so (n− 2)Ωu = −2cu. The equality Ω ◦Ω = 0 that we derived from the Ricci identity R ·Ω = 0 now gives c = 0. Hence s = 0 (which yields Lemma 2.2(c)), and ρu = 0. As c = 0 and σ = ρ, the assertion about σv + cv obtained above means that ρv is a section of D whenever v is a section of D⊥. Let λ, µ, ξ be the 1-forms with λ = g(u, · ), µ = g(u′, · ), ξ(u′) = 0, and ρv = ξ(v)u for sections v of D⊥. Transvecting (9.ii) with µs, we get Ω = R( · , · , u, u ′) = (n − 2)−1λ ∧ ρ(u′, · ) from Lemma 2.2(c) with ρu = 0 and Lemma 2.2(d). However, evaluating ρ(u′, · ) on u′, u and sections v of D⊥, we see that ρ(u′, · ) = hλ+ ξ, with h = ρ(u′, u′). (Note that ξ(u) = 0 since ρu = 0, while D ⊂ D⊥ by Lemma 2.2(a).) 
Therefore, (11) i) (n− 2)Ω = λ ∧ ξ , ii) ρ = hλ⊗ λ + λ⊗ ξ + ξ ⊗ λ. In addition, if v ′ denotes the unique vector field with g(v ′, · ) = ξ, then u and v ′ are null and orthogonal, or, equivalently, (12) the 1-forms λ and ξ are null and mutually orthogonal. In fact, g(u, u) = 0 by Lemma 2.2(a), g(u, v ′) = 0 as ξ(u) = 0, and v ′ is null since (11) yields (n−2)[ρ(Ωu′)−Ω(ρu′)] = 2g(v ′, v ′)u, while, transvecting the Ricci identity Rmlj sRsk + Rmlk sRjs = 0 with u l and using (9.ii), we see that ρ and Ω commute as bundle morphisms TM → TM . Furthermore, transvecting with µkµm the coordinate form Rmlj sτsk+Rmlk sτjs = 0 of the Ricci identity R · τ = 0 for the parallel tensor field τ = (n − 2)Ω + ρ = CONFORMALLY SYMMETRIC MANIFOLDS 11 hλ⊗ λ + 2λ⊗ ξ (cf. (11)), we get 2λjblsξ s = 0, where b = W (u′, · , u′, · ). Namely, R = W + (n− 2)−1g ∧ ρ by Lemma 2.2(c), Wmlj sτsk = 0 in view of Lemma 2.2(d), µkµmWmlk sτjs = 2λjblsξ s since b(u, · ) = 0 (again from Lemma 2.2(d)), and the remaining terms, related to g ∧ ρ, add up to 0 as a consequence of (12), (11.ii) and the formula for τ . (Note that (12) gives Rj sτsk = Rj sτks = 0, and so four out of the eight remaining terms vanish individually.) However, u 6= 0, and so λ 6= 0, which gives b( · , v ′) = 0, where v ′ is the vector field with g(v ′, · ) = ξ. Thus, W (u′, · , u′, v ′) = 0. As a result, the 3-tensor W ( · , · , · , v ′) must vanish: it yields the value 0 whenever each of the three arguments is either u′ or a section of D⊥. (Lemma 2.2(e) for W is already established.) The relation W ( · , · , · , v ′) = 0 implies in turn that W ( · , · , · , ρv) = 0 (in coor- dinates: Wjkl sRsp = 0). In fact, by (11.ii), the image of ρ is spanned by u and v while W ( · , · , · , u) = 0 according to Lemma 2.2(d). As in [13, 1o on p. 214], we have W = (λ ⊗ λ) ∧ b (notation of (2)), where, again, b = W (u′, · , u′, · ). 
Namely, by Lemma 2.2(e) for W, both sides agree on any quadruple of vector fields, each of which is either u′ or a section of D⊥. Finally, transvecting (10) with µkµm and replacing R by W + (n− 2)−1g ∧ ρ, we obtain two contributions, one from W and one from g ∧ ρ, the sum of which is zero. Since W = (λ⊗λ)∧ b, the W contribution vanishes: its first two terms add up to 0, and so do its other two terms. (As we saw, b(u, · ) = 0, while, obviously, b(u′, · ) = 0.) Out of the sixteen terms forming the g∧ρ contribution, eight are separately equal to zero since Wjkl sRsp = 0, and so, in view of (11.ii) and the relation W = (λ⊗ λ) ∧ b, vanishing of the g∧ρ contribution gives λpSjlq = λqSjlp, for Sjlq = 2bjlξq−bqlξj−bqjξl. Thus, Sjlq = ηjlλq for some twice-covariant symmetric tensor field η, which, summed cyclically over j, l, q, yields 0 (due to the definition of Sjlq and symmetry of b). As λ 6= 0 and the symmetric product has no zero divisors, we get η = 0 and Sjlq = 0. The expression bjlξq− bqlξj is, therefore, skew-symmetric in j, l. As it is also, clearly, skew-symmetric in j, q, it must be totally skew-symmetric and hence equal to one- third of its cyclic sum over j, l, q. That cyclic sum, however, is 0 in view of symmetry of b, so that bjlξq = bqlξj. Thus, ξ = 0, for otherwise the last equality would yield b = ϕξ ⊗ ξ for some function ϕ, and hence W = (λ⊗ λ) ∧ b = ϕ(λ⊗ λ) ∧ (ξ ⊗ ξ), which would clearly imply that the vector field v ′ with g(v ′, · ) = ξ is a section of the Olszak distribution D, not equal to a function times u (as ξ(u′) = 0, while g(u, u′) = 1), contradicting one-dimensionality of D. Therefore, ρ = hλ ⊗ λ by (11.ii) with ξ = 0, which proves assertion (b) of Lemma 2.2 in our case. References [1] Cahen, M. & Parker, M., Pseudo-riemannian symmetric spaces. Mem. Amer. Math. Soc. 229 (1980), 1–108. 12 A. DERDZINSKI AND W. ROTER [2] Chaki, M. C. & Gupta, B., On conformally symmetric spaces. Indian J. Math. 5 (1963), 113–122. 
[3] Derdziński, A., On homogeneous conformally symmetric pseudo-Riemannian manifolds. Colloq. Math. 40 (1978), 167–185.
[4] Derdziński, A. & Roter, W., On conformally symmetric manifolds with metrics of indices 0 and 1. Tensor (N. S.) 31 (1977), 255–259.
[5] Derdziński, A. & Roter, W., Some theorems on conformally symmetric manifolds. Tensor (N. S.) 32 (1978), 11–23.
[6] Derdziński, A. & Roter, W., Some properties of conformally symmetric manifolds which are not Ricci-recurrent. Tensor (N. S.) 34 (1980), 11–20.
[7] Derdzinski, A. & Roter, W., Projectively flat surfaces, null parallel distributions, and conformally symmetric manifolds. Preprint, math.DG/0604568. To appear in Tohoku Math. J.
[8] Derdzinski, A. & Roter, W., Compact pseudo-Riemannian manifolds with parallel Weyl tensor. Preprint, http://arXiv.org/abs/math.DG/0702491.
[9] Deszcz, R., On hypercylinders in conformally symmetric manifolds. Publ. Inst. Math. (Beograd) (N. S.) 51(65) (1992), 101–114.
[10] Deszcz, R. & Hotloś, M., On a certain subclass of pseudosymmetric manifolds. Publ. Math. Debrecen 53 (1998), 29–48.
[11] Dillen, F. J. E. & Verstraelen, L. C. A. (eds.), Handbook of Differential Geometry, Vol. I. North-Holland, Amsterdam, 2000.
[12] Hotloś, M., On conformally symmetric warped products. Ann. Acad. Paedagog. Cracov. Stud. Math. 4 (2004), 75–85.
[13] Olszak, Z., On conformally recurrent manifolds, I: Special distributions. Zesz. Nauk. Politech. Śl., Mat.-Fiz. 68 (1993), 213–225.
[14] Patterson, E. M. & Walker, A. G., Riemann extensions. Quart. J. Math. Oxford Ser. (2) 3 (1952), 19–28.
[15] Rong, J. P., On 2K∗ space. Tensor (N. S.) 49 (1990), 117–123.
[16] Roter, W., On conformally symmetric Ricci-recurrent spaces. Colloq. Math. 31 (1974), 87–
[17] Sharma, R., Proper conformal symmetries of conformal symmetric space-times. J. Math. Phys. 29 (1988), 2421–2422.
[18] Simon, U., Compact conformally symmetric Riemannian spaces. Math. Z. 132 (1973), 173–177.
Department of Mathematics The Ohio State University Columbus, OH 43210 E-mail address : andrzej@math.ohio-state.edu Institute of Mathematics and Computer Science Wroc law University of Technology Wybrzeże Wyspiańskiego 27, 50-370 Wroc law Poland E-mail address : roter@im.pwr.wroc.pl http://arxiv.org/abs/math/0604568 Introduction 1. Preliminaries 2. The Olszak distribution 3. The case d=2 4. The case d=1 5. Proof of Theorem ?? Appendix I: Proof of Lemma ?? Appendix II: Lemma ??(b),(c) in the locally symmetric case References ABSTRACT This is a final step in a local classification of pseudo-Riemannian manifolds with parallel Weyl tensor that are not conformally flat or locally symmetric. <|endoftext|><|startoftext|> Introduction The space-time development of a hadronic system is still poorly understood, and models are necessary to transform a partonic system, governed by perturbative QCD, to final state hadrons observed in the detectors. WW events produced in e+e− collisions at LEP-2 constitute a unique laboratory to study and test the evolution of such hadronic systems, because of the clean environment and the well-defined initial energy in the process. Of particular interest is the possibility to study separately one single evolving hadronic system (one of the W bosons decaying semi-leptonically, the other decaying hadronically), and compare it with two hadronic systems evolving at the same time (both W bosons decaying hadronically). Interconnection effects between the products of the hadronic decays of the two W bosons (in the same event) are expected since the lifetime of the W bosons (τW ≃ ~/ΓW ≃ 0.1 fm/c) is an order of magnitude smaller than the typical hadronization times. 
These effects can happen at two levels:

• in the evolution of the parton shower, between partons from different hadronic systems by exchanging coloured gluons [1] (this effect is called Colour Reconnection (CR) for historical reasons);

• between the final state hadrons, due to quantum-mechanical interference, mainly due to Bose-Einstein Correlations (BEC) between identical bosons (e.g. pions with the same charge).

A detailed study by DELPHI of this second effect was recently published [2].

The first effect, the possible presence of colour flow between the two W hadronization systems, is the topic studied in this paper. This effect is worthy of study in its own right and for the possible effects induced on the W mass measurement in fully hadronic events (see for instance [3] for an introduction and [4] for an experimental review).

The effects at the perturbative level are expected to be small [3], whereas they may be large at the hadronization level (many soft gluons sharing the space-time) for which models have to be used to compare with the data.

The most tested model is the Sjöstrand-Khoze "Type 1" CR model SK-I [5]. This model of CR is based on the Lund string fragmentation phenomenology. The strings are considered as colour flux tubes with some volume, and reconnection occurs when these tubes overlap. The probability of reconnection in an event is parameterised by the value κ, set globally by the user, according to the space-time volume overlap of the two strings, $V_{\mathrm{overlap}}$:

$P_{\mathrm{reco}}(\kappa) = 1 - e^{-\kappa V_{\mathrm{overlap}}}$.    (1)

The parameter κ was introduced in the SK-I model to allow a variation of the percentage of reconnected events and facilitate studies of sensitivity to the effect. In this model only one string reconnection per event was allowed. The authors of the model propose the value of κ = 0.66 to give similar amounts of reconnection as other models of Colour Reconnection.
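Equation (1) can be sketched numerically as follows. This is a minimal illustration, assuming that each event supplies a single overlap value; the function names and the toy overlap values are hypothetical and are not taken from the SK-I implementation or from DELPHI data.

```python
import math


def reconnection_probability(kappa: float, v_overlap: float) -> float:
    """Per-event reconnection probability, Eq. (1): 1 - exp(-kappa * V_overlap)."""
    if kappa < 0 or v_overlap < 0:
        raise ValueError("kappa and v_overlap must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return -math.expm1(-kappa * v_overlap)


def reconnected_fraction(kappa: float, overlaps) -> float:
    """Mean of Eq. (1) over a sample of per-event overlap values."""
    overlaps = list(overlaps)
    if not overlaps:
        raise ValueError("need at least one event")
    total = sum(reconnection_probability(kappa, v) for v in overlaps)
    return total / len(overlaps)


if __name__ == "__main__":
    # Hypothetical overlap values in arbitrary units (assumption, not real data).
    toy_overlaps = [0.1, 0.5, 1.0, 2.0]
    for kappa in (0.0, 0.66, 2.0):
        print(kappa, reconnected_fraction(kappa, toy_overlaps))
```

Because the probability is monotonic in κ for a fixed overlap, scanning κ changes the fraction of reconnected events smoothly from 0 towards 1, which is what makes κ usable as a sensitivity parameter.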
By comparing the data with the model predictions evaluated at several κ values, it is possible to determine the value of κ most consistent with the data and extract the corresponding reconnection probability. Another model was proposed by the same authors, considering the colour flux tubes as infinitely thin, which allows for Colour Reconnection in the case the tubes cross each other and provided the total string length is reduced (SK-II′). This last model was not tested.

Two further models are tested here: those implemented in the HERWIG [6] and ARIADNE [7] Monte Carlo programs. In HERWIG the partons are reconnected, with a reconnection probability of 1/9, if the reconnection results in a smaller total cluster mass. In ARIADNE, which implements an adapted version of the Gustafson-Häkkinen model [8], the model used [9] allows for reconnections between partons originating in the same W boson, or from different W bosons if they have an energy smaller than the width of the W boson (this model will be referred to as 'AR-2').

Colour Reconnection has been previously investigated in DELPHI by comparing inclusive distributions of charged particles, such as the charged-particle multiplicity distribution or the production of identified (heavy) particles, in fully hadronic WW events and the distributions in semi-leptonic WW events. The investigations did not show any effect as they were limited by statistical and systematic errors and excluded only the most extreme models of CR (see [10]).

This article presents the results of the investigations of Colour Reconnection effects in hadronically decaying W pairs using two techniques. The first, proposed by L3 in [11], looks at the particle flow between the jets in a 4-jet WW event. The second, proposed by DELPHI in [12], takes into account the different sensitivity to Colour Reconnection of several W mass estimators. The first technique is more independent of the model and it can provide comparisons based on data.
The second technique is more dependent on the model tested, but has a much larger sensitivity to the models SK-I and HERWIG. Since the particle flow and W mass estimator methods were found to be largely uncorrelated, a combination of the results of these two methods is provided.

The paper is organised as follows. In the next section, the LEP operation and the components of the DELPHI detector relevant to the analyses are briefly described. In section 3 data and simulation samples are explained. Then both of the analysis methods discussed above are described and their results presented in sections 4 and 5. The combination of the results is given in section 6 and conclusions are drawn in the seventh and final section.

2 LEP Operation and Detector Description

At LEP-2, the second phase of the e⁺e⁻ collider at CERN, the accelerator was operated at centre-of-mass energies above the threshold for double W boson production from 1996 to 2000. In this period, the DELPHI experiment collected about 12000 WW events corresponding to a total integrated luminosity of 661 pb⁻¹. About 46% of the WW events are WW → q₁q̄₂q₃q̄₄ events (fully hadronic), and 44% are WW → q₁q̄₂ℓν̄, where ℓ is a lepton (semi-leptonic).

The detailed description of the DELPHI detector and its performance is provided in [13,14]. A brief summary of the main characteristics of the detector important for the analyses follows.

The tracking system of DELPHI consisted of a Time Projection Chamber (TPC), the main tracking device of DELPHI, and was complemented by a Vertex Detector (VD) closest to the beam pipe, the Inner and the Outer Detectors in the barrel region, and two Forward Chambers in the end caps. It was embedded in a 1.2 T magnetic field, aligned parallel to the beam axis.
The electromagnetic calorimeter consisted of the High density Projection Chamber (HPC) in the barrel region, the Forward Electromagnetic Calorimeter (FEMC) and the Small angle Tile Calorimeter (STIC) in the forward regions, complemented by detectors to tag the passage of electron-positron pairs from photons converted in the regions between the FEMC and the HPC. The total depths of the calorimeters corresponded to about 18 radiation lengths.

The hadronic calorimeter was composed of instrumented iron with a total depth along the shortest trajectory for a neutral particle of 6 interaction lengths, and covered 98% of the total solid angle. Embedded in the hadronic calorimeter were two planes of muon drift chambers to tag the passage of muons. The whole detector was surrounded by a further double plane of staggered muon drift chambers.

For LEP-2, the DELPHI detector was upgraded as described in the following. Changes were made to some of the subdetectors, the trigger system [15], the run control and the algorithms used in the offline reconstruction of tracks, which improved the performance compared to the earlier LEP-1 period.

The major changes were the extensions of the Vertex Detector (VD) and the Inner Detector (ID), and the inclusion of the Very Forward Tracker (VFT) [16], which increased the coverage of the silicon tracker to polar angles with respect to the z-axis (footnote 1) of 11° < θ < 169°. To further improve the track reconstruction efficiency in the forward regions of DELPHI, the tracking algorithms and the alignment and calibration procedures were optimised for LEP-2.

Changes were also made to the electronics of the trigger and timing system which improved the stability of the running during data taking. The trigger conditions were optimised for LEP-2 running, to give high efficiency for 2- and 4-fermion processes in the Standard Model and also to give sensitivity to events which may have been signatures of new physics.
In addition, improvements were made to the operation of the detector during the LEP operating states, to prepare the detector for data taking at the very start of stable collisions of the e+e− beams, and to respond to adverse background from LEP when it arose. These changes led to an overall improvement in the efficiency for collecting the delivered luminosity from about 85% in 1995, before the start of LEP-2, to about 95% at the end in 2000.

During the operation of the DELPHI detector in 2000 one of the 12 sectors of the central tracking chamber, the TPC, failed. After 1st September it was not possible to detect the tracks left by charged particles inside the broken sector. The data affected correspond to around 1/4 of the data collected in 2000. Nevertheless, the redundancy of the tracking system of DELPHI meant that tracks passing through the sector could still be reconstructed from signals in any of the other tracking detectors. As a result, the track reconstruction efficiency was only slightly reduced in the region covered by the broken sector, but the track parameter resolutions were degraded compared with the data taken prior to the failure of this sector.

3 Data and Simulation Samples

The analyses presented here use the data collected by DELPHI in the years 1997 to 2000, at centre-of-mass energies √s between 183 and 209 GeV. The data collected in the year 2000 with the TPC fully working, with centre-of-mass energies from 200 to 208 GeV and an integrated luminosity weighted average centre-of-mass energy of 206 GeV, were analysed together. Data acquired with the TPC with a broken sector, corresponding to an integrated luminosity weighted average centre-of-mass energy of 207 GeV, were analysed separately and included in the results presented here. The total integrated luminosity of the data sample is 660.8 pb⁻¹, and the integrated luminosity weighted average centre-of-mass energy of the data is 197.1 GeV.
To compare with the expected results from processes in the Standard Model with and without CR, Monte Carlo (MC) simulation was used to generate events and simulate the response of the DELPHI detector. These events were reconstructed and analysed with the same programs as used for the real data.

Footnote 1: The DELPHI coordinate system is a right-handed system with the z-axis collinear with the incoming electron beam, and the x axis pointing to the centre of the LEP accelerator.

The 4-fermion final states were generated with the code described in [17], based on WPHACT [18], for the WW signal (charged currents) and for the ZZ background (neutral currents), after which the events were fragmented with PYTHIA [19] tuned to DELPHI data [20]. The same WW events generated at 189, 200 and 206 GeV were also fragmented with PYTHIA implementing the SK-I model, with 100% reconnection probability. The systematic effects of fragmentation were studied using the above WW samples and WW samples generated with WPHACT and fragmented with either ARIADNE [7] or HERWIG [6] at 183, 189, 200 and 206 GeV. For systematic studies of Bose-Einstein Correlations (BEC), WW samples generated with WPHACT and fragmented with PYTHIA implementing the BE32 model [21] of BEC were used at all energies except 207 GeV. The integrated luminosity of the simulated samples was at least 10 times that of the data of the corresponding year, and the majority corresponded to 100 times that of the data.

To test the consistency of the SK-I model and measure the κ parameter, large WW samples were generated in an early stage of this work with EXCALIBUR [22] at 200 and 206 GeV, keeping only the fully hadronic decays. These samples were then fragmented with PYTHIA. It was verified for smaller subsets that the results using these large samples and the samples generated later with WPHACT are compatible.

The qq̄(γ) background events were generated at all energies with KK2f [23] and fragmented with PYTHIA.
For systematic studies, similar KK2f samples fragmented with ARIADNE [7] were used at 183, 189, 200 and 206 GeV. These samples will be referred to as “DELPHI samples”.

At 189 GeV, to compare with the other LEP experiments and with different CR models, 6 samples generated with KORALW [24] for the 4-fermion final states were also used. These samples (footnote 2) will be referred to as “Cetraro samples”. The events in the different samples have the final state quarks generated with the same kinematics, and differ only in the parton shower evolution and fragmentation. Three samples were fragmented respectively with PYTHIA, ARIADNE and HERWIG (using the tuning of the ALEPH collaboration), with no CR implementation. Three other samples were fragmented in the same manner but now implementing several CR models: the SK-I model with 100% reconnection probability, the AR-2 model, and the HERWIG implementation of CR with 1/9 of reconnected events, respectively.

Footnote 2: Produced by ALEPH after the LEP-W Physics Workshop in Cetraro, Italy, October 2001.

4 The Particle Flow Method

The first of the two analyses presented in this paper is based on the so-called “particle flow method”. The particle flow algorithm is based on the selection of special event topologies, in order to obtain well defined regions between any two jets originating from the same W (called the Inside-W region) or from different Ws (called the Between-W region). It is expected that Colour Reconnection decreases (increases) particle production in the Inside-W (Between-W) region. Hence, by studying the particle production in the inter-jet regions it is possible to measure the effects of Colour Reconnection. However, this method requires a selection of events with a suitable topology (see below) which has a low efficiency (≲ 25%).

4.1 Event and Particle Selection

Events with both Ws decaying into q1q̄2 are characterised by high multiplicity, large visible energy, and the tendency of the particles to be grouped in 4 jets.
The background is dominated by qq̄(γ) events.

Charged particles were required to have momentum p larger than 100 MeV/c and below 1.5 times the beam energy, a relative error on the momentum measurement ∆p/p < 1, and polar angle θ with respect to the beam axis between 20° and 160°. To remove tracks from secondary interactions, the distance of closest approach of the extrapolated track to the interaction point was required to be less than 4 cm in the plane perpendicular to the beam axis and less than 4/sin θ cm along the beam axis, and the reconstructed track length was required to be larger than 30 cm.

Clusters in the electromagnetic or hadronic calorimeters with energy larger than 0.5 GeV and polar angle in the interval 10° < θ < 170°, not associated to charged particles, were considered as neutral particles.

The events were pre-selected by requiring at least 12 charged particles and a scalar sum of the momenta transverse to the beam axis, of charged and neutral particles, above 20% of the centre-of-mass energy. These cuts reduced the contributions from gamma-gamma processes and beam-gas interactions to a negligible amount. The momentum distribution of the charged particles for the pre-selected events is shown in Figure 1 and compared to the expected distribution from the simulation. A good agreement between data and simulation is observed.

[Figure 1 plots (a) and (b): DELPHI, 189 GeV; legend: WW WPHACT, WW semileptonic; x-axis: p (GeV/c)]

Figure 1: Momentum distribution for charged particles (range 0-50 GeV/c (a) and 0-5 GeV/c (b)). Points represent the data and the histograms represent the contributions from simulation for the different processes (signal (white) and background contributions).
About half of the e+e− → qq̄(γ) events at high energy are associated with an energetic photon emitted by one of the beam electrons or positrons (radiative return events), thus reducing the energy available in the hadronic system to the Z mass. To remove these radiative return events, the effective centre-of-mass energy √s′, computed as described in [25], was required to be above 110 GeV. It was verified that this cut does not affect the signal from W pairs, but significantly reduces the contribution from the qq̄(γ) process.

In the WW fully hadronic decays four well separated energetic jets are expected which balance the momentum of the event and have a total energy close to the centre-of-mass energy. The charged and neutral particles in the event were thus clustered using the DURHAM algorithm [26], for a separation value of ycut = 0.005, and the events were kept if there were exactly 4 jets and a multiplicity (charged plus neutral) in each jet larger than 3.

The combination of these two cuts removed most of the semi-leptonic WW decays and the 2-jet and 3-jet events of the qq̄(γ) background. The charged-particle multiplicity distribution for the selected events at 189 GeV is given in Figure 2, with data points compared to the histogram from simulation of signal and background processes.

[Figure 2 plot: DELPHI, 189 GeV; legend: WW WPHACT, WW semileptonic; x-axis: charged-particle multiplicity]

Figure 2: Uncorrected charged-particle multiplicity distribution at a centre-of-mass energy of 189 GeV. Points represent the data and the histograms represent the contribution from simulation for the different processes.

For the study of the charged-particle flow between jets, the initial quark configuration should be well reconstructed with a good quark-jet association.

At 183 GeV and above, the produced W bosons are significantly boosted. This produces smaller angles in the laboratory frame of reference between the jets into which the W decays, when compared to these angles at threshold (back-to-back).
Hence, this property tends to reduce the ambiguity in the definition of the Between-W and Inside-W regions.

The selection criteria were designed to minimise the cases in which one jet from one W boson appears in the Inside-W region of the other W boson. The selection criteria are based on the event topology, with cuts on 4 of the 6 jet-jet angles. The smallest and the second smallest jet-jet angle should be below 100° and not adjacent (not have a common jet). Two other jet-jet angles should be between 100° and 140° and not adjacent (large angles). In the case that there are two different combinations of jets satisfying the above criteria for the large angles, the combination with the highest sum of large angles is chosen. This selection increases the probability of a correct pairing of jets to the same W boson.

√s    L      Eff.  Pur.  Nsel  MC tot.  WW 4j  qq̄(γ)  ZZ   W lep.  εPAIR
183   52.7   22%   74%   127   114.2    84.4   22.3   0.7  7.0     69%
189   157.6  21%   75%   340   341.4    255.9  56.8   2.4  26.4    75%
192   25.9   21%   75%   61    56.1     41.9   9.4    0.4  4.4     77%
196   77.3   19%   74%   176   159.2    117.6  26.2   1.3  14.0    79%
200   83.4   18%   72%   173   165.0    119.5  27.8   1.3  16.4    82%
202   40.6   17%   72%   82    75.7     54.6   12.5   0.7  8.0     82%
206   163.9  15%   70%   282   274.7    193.1  47.8   2.7  31.1    79%
207   59.4   15%   70%   102   99.7     70.1   17.6   1.0  11.1    80%

Table 1: Centre-of-mass energy (√s in GeV), integrated luminosity (L in pb⁻¹), efficiency and purity of the data samples, number of selected events, number of expected events from 4-jet WW and background processes (total and separated by process), and efficiency of correct pairing of jets to the same W boson.

The integrated luminosity, the efficiency to select 4-jet WW events and the purity of the selected data samples, estimated using simulation, and the number of selected events are summarised for each centre-of-mass energy in Table 1. The numbers of expected events are also given separately for the signal and the background processes, and were estimated using simulation.
The efficiency to select the correct pairing of jets to the same W boson, estimated with simulation as the fraction of WW events for which the selected jets 1 and 2 (see later) do correspond to the same W boson, is given in the last column of the Table.

The efficiency of the event selection criteria decreases with increasing centre-of-mass energy. This is primarily due to the ‘large’ angles being reduced as a result of the increased boost (becoming lower than the cut value of 100°) and ‘small’ angles being increased due to the larger phase-space available (becoming higher than the cut value of 100°). For much the same reason, the efficiency to assign two jets to the same W boson in the selected events increases slightly with increasing centre-of-mass energy. This is in contrast to what would happen at threshold, where a W boson decays into two back-to-back jets that would never be selected as coming from the same W boson, because of the requirement that their interjet angle be between 100° and 140°.

In the following analysis the jets and planar regions are labelled as shown in Figure 3: the planar region corresponding to the smallest jet-jet angle is region B in the plane made by jets 2 and 3; the second smallest jet-jet angle corresponds to the planar region D between jets 1 and 4 in the plane made by these two jets; the planar region corresponding to the greatest of the large jet-jet angles in this combination is region A and spans the angle between jets 1 and 2 in the plane made by these jets; and finally region C corresponds to the planar region spanned by the second large angle, between jets 3 and 4 in the plane made by these two jets. In general, the planar regions are not in the same plane, as the decay planes of the W bosons do not coincide, and the large angles in this combination are not necessarily the largest jet-jet angles in the event.
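The angular selection and region labelling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input format (four jet momentum 3-vectors) and the returned dictionary are hypothetical and are not taken from the DELPHI analysis code. Only the cut values (100° and 140°) and the adjacency rules come from the text.

```python
import itertools
import math


def angle_deg(u, v):
    """Opening angle in degrees between two 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def select_topology(jets, small_max=100.0, large_min=100.0, large_max=140.0):
    """Return the Between-W and Inside-W jet pairs, or None if the event fails.

    `jets` is a sequence of four momentum 3-vectors (hypothetical input format).
    """
    pairs = list(itertools.combinations(range(4), 2))
    ang = {p: angle_deg(jets[p[0]], jets[p[1]]) for p in pairs}

    # The two smallest angles must be below the cut and share no jet.
    s1, s2 = sorted(pairs, key=ang.get)[:2]
    if set(s1) & set(s2) or ang[s2] >= small_max:
        return None

    # The four remaining angles form two non-adjacent pairings; keep the
    # pairings whose two angles both lie in the 'large angle' window.
    candidates = []
    for p in pairs:
        q = tuple(sorted(set(range(4)) - set(p)))  # complementary pair
        if p < q and p not in (s1, s2):
            if all(large_min <= ang[x] <= large_max for x in (p, q)):
                candidates.append((ang[p] + ang[q], p, q))
    if not candidates:
        return None

    # If both pairings pass, take the one with the highest sum of angles.
    _, a, c = max(candidates)
    return {"between_w": (s1, s2), "inside_w": (a, c)}
```

For a planar toy event with jets at azimuths 0°, 120°, 190° and 310°, the two small angles (50° and 70°) define the Between-W pairs and the two 120° angles define the Inside-W pairs. An event whose two smallest angles share a jet is rejected.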
The distribution of the reconstructed masses of the jet pairings (1,2) and (3,4), after applying a 4C kinematic fit requiring energy and momentum conservation, is shown in Figure 4 (two entries per event). In the figure, data at 189 GeV (points) are compared to the expected distribution from the 4-jet WW signal without CR, plus background processes, estimated using the simulation (histograms). The contribution from the 4-jet WW signal simulation is split between the case in which the two pairs of jets making the large angles actually come from their parent W bosons and the case in which the jets of a pair come from different W bosons (mismatch).

[Figure 3 drawing: jets 1 to 4 with the two Inside-W regions and the two Between-W regions marked]

Figure 3: Schematic drawing of the angular selection.

4.2 Particle Flow Distribution

The particle flow analysis uses the number of particles in the Inside-W and the Between-W regions. An angular ordering of the jets is performed as in Figure 3. The two large jet-jet angles in the event are used to define the Inside-W regions, and the two smallest angles span the Between-W regions, the regions between the different Ws.

In general, the two W bosons will not decay in the same plane, and this must be accounted for when comparing the particle production in the Inside-W and Between-W regions. So, for each region (A, B, C and D) the particle momenta of all charged particles are projected onto the plane spanned by the jets of that region: jets 1 and 2 for region A; jets 2 and 3 for region B; jets 3 and 4 for region C; jets 4 and 1 for region D. Then, for each particle the rescaled angle $\Phi_{rescaled}$ is determined as a ratio of two angles:

$$\Phi_{rescaled} = \Phi_i / \Phi_r , \qquad (2)$$

when the particle momentum is projected onto the plane of the region r. The angle $\Phi_i$ is then the angle between the projected particle momentum and the first mentioned jet in the definition of the regions given above. The angle $\Phi_r$ is the full opening angle between the jets.
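As an illustration of Eq. (2), the following Python sketch projects a particle momentum onto the plane of two jets and returns the rescaled angle together with the momentum component transverse to that plane. The function name and the tuple-based vectors are assumptions made for this example, and the two jets are assumed not to be collinear. The sketch is not the DELPHI implementation.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rescaled_angle(p, jet_first, jet_second):
    """Return (Phi_i / Phi_r, momentum out of the jet plane) for one particle.

    The angle Phi_i is measured from `jet_first` towards `jet_second` within
    the plane spanned by the two jets; Phi_r is their full opening angle.
    A value between 0 and 1 means the projection falls between the jets.
    """
    # Orthonormal basis (e1, e2) of the jet plane, with e1 along the first jet.
    n1 = math.sqrt(_dot(jet_first, jet_first))
    e1 = tuple(x / n1 for x in jet_first)
    along = _dot(jet_second, e1)
    perp = tuple(b - along * e for b, e in zip(jet_second, e1))
    n2 = math.sqrt(_dot(perp, perp))  # zero only for collinear jets
    e2 = tuple(x / n2 for x in perp)

    phi_r = math.atan2(n2, along)         # opening angle, in (0, pi)
    px, py = _dot(p, e1), _dot(p, e2)     # in-plane components of p
    phi_i = math.atan2(py, px)            # signed angle from the first jet

    # Momentum component perpendicular to the plane of the region.
    p_out = math.sqrt(max(0.0, _dot(p, p) - px * px - py * py))
    return phi_i / phi_r, p_out
```

With jets along the x and y axes, a particle with momentum (1, 1, 0.5) is projected halfway between the jets (rescaled angle 0.5) and has 0.5 units of momentum out of the plane. A particle with momentum (1, -1, 0) gives a negative rescaled angle, i.e. it is not projected between this pair of jets.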
With this definition, $\Phi_{rescaled}$ varies between 0 and 1 for the particles whose momenta are projected between the pair of jets defining the plane. However, due to the aplanarity of the event about 9% of the particles in the data and in the 4-jet WW simulation have projected angles outside all four regions. These particles were discarded from further analysis. In the case where a particle could be projected onto more than one region, with 0 < $\Phi_{rescaled}$ < 1, the solution with the lower momentum transverse to the region was used. This happened for about 13% of the particles in data, after background subtraction, and in the 4-jet WW simulation.

This leads to the normalised particle flow distribution shown in Figure 5 at 189 GeV, where the rescaled angle of region A is plotted from 0 to 1, region B from 1 to 2, region C from 2 to 3 and region D from 3 to 4.

[Figure 4 plot: DELPHI, 189 GeV; legend: WW WPHACT, WW wrong pairing, WW semileptonic; x-axis: Mass (GeV/c²)]

Figure 4: Reconstructed dijet masses (after a 4C kinematic fit) for the selected pairs at 189 GeV (2 entries per event) (see text).

The statistical error on the bin contents (the average multiplicity per bin of $\Phi_{rescaled}$ divided by the bin width) was estimated using the Jackknife method [27], to correctly account for correlations between different bins.

In this distribution the regions between the jets coming from the same W bosons (A and C), and from different W bosons (B and D), have the same scale and thus can be easily compared. After subtracting bin-by-bin the expected background from the observed distributions, we define the Inside-W (Between-W) particle flow as the bin-by-bin sum of regions A and C (B and D).

These distributions are compared by taking the bin-by-bin ratio of the Inside-W particle flow to the Between-W particle flow. This ratio of distributions is shown for 189 GeV and 206 GeV in Figure 6. The data points are compared to several fully simulated WW MC samples with and without CR.
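The text cites the Jackknife method [27] without spelling out the variant used. As a minimal sketch, assuming a delete-one-event jackknife applied to a ratio of event-summed particle counts, the error estimate could look as follows. The function name and the per-event input lists are hypothetical.

```python
import math


def jackknife_ratio(inside_counts, between_counts):
    """Delete-one jackknife for the ratio sum(inside) / sum(between).

    Each list holds one entry per event (e.g. the number of particles an
    event contributes to the Inside-W and Between-W regions). Removing a
    whole event at a time keeps the correlation between the two counts.
    """
    n = len(inside_counts)
    tot_in, tot_bw = sum(inside_counts), sum(between_counts)
    full = tot_in / tot_bw

    # Ratio recomputed with each event left out in turn.
    leave_one_out = [
        (tot_in - a) / (tot_bw - b)
        for a, b in zip(inside_counts, between_counts)
    ]
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((r - mean) ** 2 for r in leave_one_out)
    return full, math.sqrt(variance)
```

For the toy input `inside_counts = [3, 4, 5, 4]` and `between_counts = [4, 4, 4, 4]` the ratio is 1.0 with a jackknife error of about 0.102.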
A good agreement was found between the predictions using the WPHACT WW MC samples and the predictions based on the KORALW WW MC samples, both for the scenario without CR and for the scenario with CR (SK-I model with 100% probability of reconnection). For both sets of predictions the regions of greatest difference between the two scenarios span the rescaled variable $\Phi_{rescaled}$ from 0.2 to 0.8.

[Figure 5 plot: DELPHI, 189 GeV; regions A, B, C, D; legend: WPH. SK-I 100%, WW WPHACT, WW semilept.; x-axis: $\Phi_{rescaled}$]

Figure 5: Normalised charged-particle flow at 189 GeV. The lines correspond to the sum of the simulated 4-jet WW signal with the background contributions (estimated from DELPHI MC samples), normalised to the total number of expected events (Nevents). The dashed histogram corresponds to the sum with the simulated 4-jet WW signal generated by WPHACT with 100% SK-I.

[Figure 6 plots: (a) DELPHI, 189 GeV; legend: KW. SK-I 100%, WPH. SK-I 100%, WW KORALW, WW WPHACT; (b) DELPHI, 206 GeV; legend: WPH. SK-I 100%, WW WPHACT; x-axis: $\Phi_{rescaled}$]

Figure 6: The ratio of the particle flow distributions (A+C)/(B+D) at 189 GeV (a) and at 206 GeV (b). The data (dots) are compared to WW MC samples generated with WPHACT (DELPHI samples) and KORALW (Cetraro samples), both without CR and implementing the SK-I model with 100% probability of reconnection. The lines corresponding to WPHACT are hardly distinguishable from the lines corresponding to KORALW in the same condition of implementation of CR.

4.3 Particle Flow Ratio

After summing the particle flow distributions for regions A and C, and regions B and D, the resulting distributions are integrated from 0.2 to 0.8. The ratio R of the Inside-W to the Between-W particle flow is then defined as (with Φ being the rescaled variable $\Phi_{rescaled}$):
$$R = \frac{\int_{0.2}^{0.8} \frac{dn_{ch}}{d\Phi}(A+C)\, d\Phi}{\int_{0.2}^{0.8} \frac{dn_{ch}}{d\Phi}(B+D)\, d\Phi} . \qquad (3)$$

To take into account possible statistical correlations between particles in the Inside-W and Between-W regions, the statistical error on this ratio R was again estimated through the Jackknife method [27].

The values for R obtained for the different centre-of-mass energies are shown in Table 2, and compared to the expectations from the DELPHI WPHACT WW samples without CR and implementing the SK-I model with 100% reconnection probability. These values for data and MC are plotted as function of the centre-of-mass energy in Figure 7.

√s (GeV)   R_Data           R_no CR          R_SK-I:100%
183        0.889 ± 0.084    0.928 ± 0.005    -
189        1.025 ± 0.063    0.966 ± 0.006    0.864 ± 0.005
192        1.008 ± 0.150    0.970 ± 0.006    -
196        1.041 ± 0.093    0.995 ± 0.006    -
200        0.922 ± 0.084    1.022 ± 0.007    0.889 ± 0.006
202        0.952 ± 0.126    1.015 ± 0.008    -
206        1.116 ± 0.088    1.012 ± 0.008    0.889 ± 0.006
207        1.039 ± 0.135    1.019 ± 0.008    -

Table 2: Values of the ratio R for each energy (errors are statistical only), and expected values with errors due to limited statistics of the simulation, all from DELPHI WPHACT WW samples.

The changes in the value of R for the MC samples are mainly due to the different values of the boost of the W systems. In order to quantify this effect a linear function $R(\sqrt{s} - 197.5) = A + B \cdot (\sqrt{s} - 197.5)$ was fitted to the MC points with CR (with √s in GeV), while for the points without CR the quadratic function $R(\sqrt{s} - 197.5) = \alpha + \beta \cdot (\sqrt{s} - 197.5) + \gamma \cdot (\sqrt{s} - 197.5)^2$ was assumed (with √s in GeV), giving reasonable χ²/d.o.f. values. The fits yielded the results shown in Table 3.

MC Sample    χ²/DF    α, A             β, B                     γ
no CR        7.31/5   1.001 ± 0.003    (3.20 ± 0.36) × 10⁻³     (−1.35 ± 0.40) × 10⁻⁴
SK-I 100%    1.46/1   0.880 ± 0.003    (1.68 ± 0.44) × 10⁻³     -

Table 3: Results of the fit to the evolution of R with (√s (GeV) − 197.5).

The MC without CR shows a stronger dependence on √s.
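The fitted parametrisations can be evaluated directly from the central values in Table 3. The short Python sketch below does this; the function names are chosen here for illustration, and the fit uncertainties are ignored. It only evaluates the two fit functions and does not reproduce how the rescaling of the data was applied, which the text does not specify in detail.

```python
E_REF = 197.5  # GeV, reference energy used in the fits


def r_no_cr(sqrt_s, alpha=1.001, beta=3.20e-3, gamma=-1.35e-4):
    """Quadratic fit to the MC without CR (central values of Table 3)."""
    x = sqrt_s - E_REF
    return alpha + beta * x + gamma * x * x


def r_sk1_full(sqrt_s, a=0.880, b=1.68e-3):
    """Linear fit to the MC with SK-I at 100% (central values of Table 3)."""
    return a + b * (sqrt_s - E_REF)


if __name__ == "__main__":
    for energy in (183, 189, 200, 206):
        print(energy, round(r_no_cr(energy), 3), round(r_sk1_full(energy), 3))
```

At 189 GeV the two functions give about 0.964 and 0.866, close to the simulated points 0.966 ± 0.006 and 0.864 ± 0.005 listed in Table 2.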
The function fitted to the MC sample without CR was used to rescale the measured values of R for the data collected at different energies to the energy of 189 GeV, the centre-of-mass energy at which the combination of the results of the LEP experiments was proposed in [4]. All the rescaled values were combined with a statistical error-weighted average. The average of the R ratios rescaled to 189 GeV was found to be

$$\langle R \rangle = 0.979 \pm 0.032\,\mathrm{(stat)} . \qquad (4)$$

[Figure 7 plot: DELPHI; legend: WW no CR, Fit to no CR, WW SK-I 100%, Fit to SK-I 100%, Combined Ratio; x-axis: √s (GeV)]

Figure 7: The ratio R as function of √s for data and MC (DELPHI WPHACT WW samples), and fits to the MC with and without CR, and the combined ratio after rescaling all values to √s = 189 GeV (see text). The value of the combined ratio at 189 GeV is shown at a displaced energy (upwards by 1 GeV) for better visibility, as well as all the values for the MC ‘WW no CR’ points and the corresponding fitted curve which are shown at centre-of-mass energies shifted downwards by 0.5 GeV. All errors for the MC values are smaller than the size of the markers.

Performing the same weighted average when using for the rescaling the fit to the MC with CR, one obtains:

$$\langle R_{CR\ rescale} \rangle = 0.987 \pm 0.032\,\mathrm{(stat)} . \qquad (5)$$

Repeating the procedure, but now without rescaling the R ratios, the result is:

$$\langle R_{no\ rescale} \rangle = 0.999 \pm 0.033\,\mathrm{(stat)} . \qquad (6)$$

4.4 Study of the Systematic Errors in the Particle Flow

The following effects were studied as sources of systematic uncertainties in this analysis.

4.4.1 Fragmentation and Detector response

A direct comparison between the particle flow ratios measured in fully hadronic data and MC samples, $R^{4q}_{Data}$ and $R^{4q}_{MC}$, respectively, is hampered by the uncertainties associated with the modelling of the WW fragmentation and the detector response. These systematic uncertainties were estimated using mixed semi-leptonic events.
In this technique, two hadronically decaying W bosons from semi-leptonic events were mixed together to emulate a fully hadronic WW decay.

Mixing Technique

Semi-leptonic WW decays were selected from the data collected by DELPHI at centre-of-mass energies between 189 and 206 GeV, by requiring two hadronic jets, a well isolated identified muon or electron or, in case of a tau candidate, a well isolated particle, all associated with missing momentum (corresponding to the neutrino) pointing away from the beam pipe. A neural network selection, developed in [28], was used to select the events. The same procedure was applied to the WPHACT samples fragmented with PYTHIA and HERWIG at centre-of-mass energies of 189, 200 and 206 GeV and with ARIADNE at 189 and 206 GeV. The background to this selection was found to be of negligible importance in this analysis.

Samples of mixed semi-leptonic events were built separately at each centre-of-mass energy for data and Monte Carlo semi-leptonic samples, following the mixing procedure developed in [2]. In each semi-leptonic event, the lepton (or tau-decay jet) was stripped off and the remaining particles constituted the hadronically decaying W boson. Two hadronically decaying W bosons were then mixed together to emulate a fully hadronic WW decay. The hadronic parts of W bosons were mixed in such a way as to have the parent W bosons back-to-back in the emulated fully hadronic WW decay. To increase the statistics of emulated events, and profiting from the cylindrical symmetry of the detector along the z axis, the hadronic parts of W bosons were rotated around the z axis, but were not moved from barrel to forward regions or vice-versa, as detailed in the following.

When mixing the hadronic parts of different W events it was required that the two Ws had reconstructed polar angles back-to-back or equal within 10 degrees.
In the latter case, when both Ws are on the same side of the detector, the z component of the momentum is sign flipped for all the particles in one of the Ws. The particles of one W event were then rotated around the beam axis, in order to have the two Ws also back-to-back in the transverse plane.

Each semi-leptonic event was used in the mixing procedure between 4 and 9 times, to minimise the statistical error on the particle flow ratio R measured in the mixed semi-leptonic data sample.

The mixed events were then subjected to the same event selection and particle flow analysis used for the fully hadronic events. The particle flow ratios $R^{mixed\,SL}_{Data}$ and $R^{mixed\,SL}_{MC}$ were measured in the mixed semi-leptonic data and MC samples, respectively, and are plotted as function of the centre-of-mass energy in Figure 8.

The values of $R^{mixed\,SL}$ measured in MC show a dependence on √s. This effect is quantified by performing linear fits to the points measured with PYTHIA, ARIADNE and HERWIG, respectively. The differences between the measured slopes were found to be small. The function fitted to the PYTHIA points was used to rescale the values of R measured in data at different energies to 189 GeV. The rescaled values were then combined in an average weighted by the scaled statistical errors. The weighted average R at 189 GeV for the mixed semi-leptonic events built from data was found to be

$$\langle R^{mixed\,SL}_{Data} \rangle = 1.052 \pm 0.027\,\mathrm{(stat)} . \qquad (7)$$

For each MC sample, the ratio $R^{mixed\,SL}_{Data}/R^{mixed\,SL}_{MC}$ was used to calibrate the particle flow ratio measured in the corresponding fully hadronic sample, $R^{4q}_{MC}$, to compare it to the ratio measured in the data, $\langle R^{4q}_{Data} \rangle$. The correction factor $R^{mixed\,SL}_{Data}/R^{mixed\,SL}_{MC}$ was computed from the values of R rescaled to 189 GeV, calculated from the fits to the mixed semi-leptonic samples built from the data and the MC.
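A small worked example may help to fix the arithmetic of the calibration. It assumes that the correction factor is applied multiplicatively to the fully hadronic MC ratio, which is how the text reads but is not stated as a formula. The function name is hypothetical, and the inputs are the PYTHIA no-CR value at 189 GeV from Table 2 and the numbers quoted in Table 4.

```python
def calibrate(r_4q_mc, data_over_mc_mixed):
    """Scale a fully hadronic MC ratio by the mixed semi-leptonic data/MC factor.

    Assumption: the calibration is a simple multiplicative correction.
    """
    return r_4q_mc * data_over_mc_mixed


# PYTHIA, no CR, 189 GeV: Table 2 value times the Table 4 factor.
r_pythia_calibrated = calibrate(0.966, 1.053)

# Calibrated values quoted in Table 4 for the three fragmentation models.
calibrated_table4 = {"PYTHIA": 1.018, "ARIADNE": 1.011, "HERWIG": 1.004}
model_spread = max(calibrated_table4.values()) - min(calibrated_table4.values())

if __name__ == "__main__":
    print(round(r_pythia_calibrated, 3), round(model_spread, 3))
```

The product comes out at about 1.017, which approximately reproduces the calibrated PYTHIA value of 1.018 in Table 4 (the inputs used here are rounded). The largest difference between the calibrated values of the three models is 0.014, the number taken as the fragmentation and detector-response systematic error.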
[Figure 8 plot: DELPHI; legend: Data Mixed SL, PYTHIA, Fit to PYTHIA, ARIADNE, Fit to ARIADNE, HERWIG, Fit to HERWIG; x-axis: √s (GeV)]

Figure 8: The ratio $R^{mixed\,SL}$ as function of √s for data and MC, and fits to the MC (see text). The ARIADNE points at 189 GeV and at 206 GeV have their centre-of-mass energy shifted and the error bars on data are tilted for readability.

MC sample                                       PYTHIA   ARIADNE   HERWIG
$R^{mixed\,SL}_{Data}/R^{mixed\,SL}_{MC}$       1.053    1.044     0.997
Calibrated $R^{4q}_{MC}$                        1.018    1.011     1.004

Table 4: Ratio of data to MC fitted values of R in mixed semi-leptonic samples, used to calibrate the $R^{4q}_{MC}$ values for different models (upper line), and calibrated values of $R^{4q}_{MC}$. All values were computed at √s = 189 GeV.

The values of the ratio $R^{mixed\,SL}_{Data}/R^{mixed\,SL}_{MC}$ are presented in Table 4, for the different models, along with the calibrated values of $R^{4q}$ for the same models.

The calibration factors differ from unity by less than 6%. The largest difference of the calibrated $R^{4q}_{MC}$ values when changing the fragmentation model, 0.014, was considered as an estimate of the systematic error due to simulation of the fragmentation and of the detector response, and was added in quadrature to the systematic error. The error in the calibrated $R^{4q}_{MC}$ values due to the statistical error on the $\langle R^{mixed\,SL}_{Data} \rangle$ value used for the calibration, 0.026, was also added in quadrature to the systematic error.

4.4.2 Bose-Einstein Correlations

Bose-Einstein correlations (BEC) between identical pions and kaons are known to exist and were established and studied in Z hadronic decays in [29]. They are expected to exist with a similar behaviour in the W hadronic decays, and this is studied in [2]. They are implemented in the MC simulation samples with BEC via the BE32 model of LUBOEI [21], which was tuned to describe the DELPHI data in [2]. However, the situation for the WW (ZZ) fully hadronic decays is not so clear, i.e.
whether there are correlations only between pions and kaons coming from the same W(Z) boson or also between pions and kaons from different W(Z) bosons. The analyses of Bose-Einstein correlations between identical particles coming from the decay of different W bosons do not show a significant effect [30] for three of the LEP experiments, whereas for DELPHI, an effect was found at the level of 2.4 standard deviations [2].

Thus, a comparison was made between the WPHACT samples without CR and with BEC only between the identical pions coming from the same W boson (BEC only inside), and the samples without CR and with BEC allowed for all the particles stemming from both W bosons, implemented with the BE32 variant of the LUBOEI model (BEC all). The R values were obtained at each centre-of-mass energy, after which a linear fit was performed for each model to obtain a best prediction at 189 GeV. The fit values were found to be in agreement with the estimate at 189 GeV alone, and for simplicity this estimate was used. The measurement of BEC from DELPHI of 2.4 standard deviations above zero (corresponding to BEC only inside) was used to interpolate the range of 4.1 standard deviations of separation between BEC only inside and BEC all. To include the error on the measured BEC effect, one standard deviation was added to the effect before the interpolation. The difference in the estimated values of R at √s = 189 GeV, between the model with BEC only inside and the model with partial BEC all (at the interpolated point of 3.4/4.1), −0.013, was added in quadrature to the systematic error.

4.4.3 qq̄(γ) Background Shape

The fragmentation effects in the shape of the qq̄(γ) background were estimated by comparing the values of R obtained when the subtracted qq̄(γ) sample was fragmented with ARIADNE instead of PYTHIA at the centre-of-mass energy of 189 GeV, and the difference, 0.003, was added in quadrature to the systematic error.
4.4.4 qq̄(γ) and ZZ Background Contribution

At the centre-of-mass energy of 189 GeV, the qq̄(γ) cross-section in the 4-jet region is poorly known, due to the difficulty in isolating the qq̄(γ) → 4-jet signal from other 4-jet processes such as WW and ZZ. The study performed in [31] has shown that the maximal difference in the estimated qq̄(γ) background rate is 10%, coming from changing from PYTHIA to HERWIG as the hadronization model, with the ARIADNE model giving intermediate results. Conservatively, at each centre-of-mass energy a variation of 10% on the qq̄(γ) cross-section was assumed, and the largest shift in R, 0.011, was added in quadrature to the systematic error.

The other background process considered is Z pair production. The Standard Model predicted cross-sections are in agreement with the data at an error level of 10% [32]. The cross-section was thus varied by ±10% at each energy and the effect on R was found to be negligible.

4.4.5 Evolution of R with Energy

The R ratios were rescaled to √s = 189 GeV using the fit to the MC without CR; however, the correct behaviour might be given by the MC with CR. Hence, the difference of 0.009 between the R values obtained using the two rescaling methods, using MC without CR, 〈R〉, and with CR, 〈R_{CR rescale}〉, was added in quadrature to the systematic error.

4.5 Results of the Particle Flow Analysis

The final result for the average of the ratios R rescaled to 189 GeV is

$$ \langle R \rangle = 0.979 \pm 0.032\,(\mathrm{stat}) \pm 0.035\,(\mathrm{syst}). \qquad (8) $$

MC sample            R
PYTHIA no CR         1.037 ± 0.004
PYTHIA SK-I 100%     0.917 ± 0.003
ARIADNE no CR        1.053 ± 0.004
ARIADNE AR2          1.021 ± 0.004
HERWIG no CR         1.059 ± 0.004
HERWIG 1/9 CR        1.040 ± 0.003

Table 5: R ratios for the Cetraro samples at 189 GeV, calibrated with the mixed semi-leptonic events.

In order to facilitate comparisons between the four LEP experiments, this value can be normalised by the one determined from simulation samples produced with the full detector simulation and analysed with the same method.
The LEP experiments agreed to use the Cetraro PYTHIA samples for this purpose. These events were generated with the ALEPH fragmentation tuning but were reconstructed with the DELPHI detector simulation and analysed with this analysis. The values of the R ratios obtained from the Cetraro samples at 189 GeV, calibrated using the mixed semi-leptonic events from these samples, are given in Table 5.

The value of 〈R〉 measured from data lies between the expected R ratios from PYTHIA without CR and with the SK-I model with a 100% fraction of reconnection. The error of this measurement is larger than the difference between the values of R from the ARIADNE samples without and with CR, and larger than the difference between the values of R from the HERWIG samples without CR and with 1/9 of reconnected events.

The following normalised ratios are obtained for the sample without CR and for the sample implementing the SK-I model with 100% CR probability, respectively:

$$ r^{\mathrm{data}}_{\mathrm{no\,CR}} = \frac{\langle R \rangle_{\mathrm{data}}}{R_{\mathrm{no\,CR}}} = 0.944 \pm 0.031\,(\mathrm{stat}) \pm 0.034\,(\mathrm{syst}), \qquad (9) $$

$$ r^{\mathrm{data}}_{\mathrm{CR}} = \frac{\langle R \rangle_{\mathrm{data}}}{R_{\mathrm{CR}}} = 1.067 \pm 0.035\,(\mathrm{stat}) \pm 0.039\,(\mathrm{syst}). \qquad (10) $$

In the above expressions, the statistical errors on the MC predicted values were propagated and added quadratically to the systematic errors on the ratios.

It is also possible to define the following quantity, taking the predictions for R_{CR} and R_{no CR} at √s = 189 GeV from the PYTHIA samples in Table 5,

$$ \delta_r = \frac{\langle R_{\mathrm{data}} \rangle - R_{\mathrm{no\,CR}}}{R_{\mathrm{CR}} - R_{\mathrm{no\,CR}}} = 0.49 \pm 0.27\,(\mathrm{stat}) \pm 0.29\,(\mathrm{syst}), \qquad (11) $$

from which it can be concluded that the measured 〈R_{data}〉 is compatible with an intermediate probability of CR, and differs from the CR in the SK-I model at 100% at the level of 1.3 standard deviations. The ability to distinguish between these two models can be computed from the inverse of the sum in quadrature of the statistical and systematic errors; it amounts to 2.5 standard deviations. In Figure 9 the result for δr is compared to the predicted values, in the scope of the SK-I model, as a function of the fraction of reconnected events.
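The arithmetic behind Equations 8 to 11 can be retraced with a short script. The sketch below is an illustration, not the collaboration's code: it uses only the rounded numbers quoted in this text (Equation 8, the PYTHIA rows of Table 5, and the systematic contributions quoted in Sections 4.4.1 to 4.4.5), so the last digit can differ slightly from the published values. All function and variable names are hypothetical.

```python
import math

# Values quoted in the text (Eq. 8 and the PYTHIA rows of Table 5).
R_DATA, STAT, SYST = 0.979, 0.032, 0.035
R_NO_CR, R_NO_CR_ERR = 1.037, 0.004
R_CR, R_CR_ERR = 0.917, 0.003

# Systematic contributions quoted in Sections 4.4.1 to 4.4.5 of this text.
SYST_TERMS = [0.014, 0.026, 0.013, 0.003, 0.011, 0.009]


def quadrature(terms):
    """Sum independent uncertainties in quadrature."""
    return math.sqrt(sum(t * t for t in terms))


def normalised_ratio(r_mc, r_mc_err):
    """Return (ratio, stat, syst) for <R>_data / R_MC.

    The MC statistical error is folded into the systematic error,
    as the text describes for Eqs. 9 and 10.
    """
    ratio = R_DATA / r_mc
    stat = STAT / r_mc
    syst = quadrature([SYST / r_mc, ratio * r_mc_err / r_mc])
    return ratio, stat, syst


def delta_r():
    """Return (value, stat, syst) for Eq. 11, ignoring MC statistical errors."""
    span = R_CR - R_NO_CR
    return (R_DATA - R_NO_CR) / span, STAT / abs(span), SYST / abs(span)


total_syst = quadrature(SYST_TERMS)
r_no_cr = normalised_ratio(R_NO_CR, R_NO_CR_ERR)
r_cr = normalised_ratio(R_CR, R_CR_ERR)
d_r = delta_r()

# Separation from the 100% SK-I point (delta_r = 1) and discrimination
# between delta_r = 0 and delta_r = 1, using the values quoted in Eq. 11.
sigma_quoted = quadrature([0.27, 0.29])
sep_from_full_cr = (1.0 - 0.49) / sigma_quoted
discrimination = 1.0 / sigma_quoted

if __name__ == "__main__":
    print("total systematic on <R>:", total_syst)
    print("r(no CR):", r_no_cr)
    print("r(CR):   ", r_cr)
    print("delta_r: ", d_r)
    print("separation, discrimination:", sep_from_full_cr, discrimination)
```

With these rounded inputs the quantities agree with the quoted ones to within about one unit in the last quoted digit, which is the level expected from rounding alone.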
[Figure 9 plot: horizontal axis: fraction of reconnected events (%), 0 to 100; DELPHI.]

Figure 9: Comparison of the measurement of the δr observable to the predictions from the SK-I model as a function of the fraction of reconnected events.

The result for the value of 〈R〉 can also be used to test for consistency with the SK-I model as a function of κ, and a log-likelihood curve was obtained. This also facilitates combination with the result obtained in the analysis in the following section, and for this reason the value of 〈R〉 is rescaled with PYTHIA without CR to a centre-of-mass energy of 200 GeV: the value obtained at 200 GeV is 〈R〉(200 GeV) = 1.024 ± 0.050. The values obtained for the predicted ratios R_N at 200 GeV and the log-likelihood curve, as a function of κ, are shown in Figure 10. The value of κ most compatible with the data within one standard deviation is

$$ \kappa_{\mathrm{SK\text{-}I}} = 4.13^{+20.97}_{-3.46}. \qquad (12) $$

[Figure 10 plots: two panels, horizontal axis: fraction of reconnected events (%), 0 to 100; DELPHI.]

Figure 10: a) Estimated ratio R_N at 200 GeV plotted as a function of different κ values (top scale), or as a function of the corresponding reconnection probabilities (bottom scale), compared to 〈R〉 measured from data after rescaling to 200 GeV (horizontal lines marked with R for the value and with 1σ (2σ) for the 〈R〉 value added/subtracted by one (two) standard deviations); the last three marks on the x axis, close to 100% of reconnection probability, correspond respectively to the values κ = 100, 300, 800; b) corresponding log-likelihood curve for the comparison of the estimated values (R_N) with the data (〈R〉).

5 Different MW Estimators as Observables

It has been shown [12] that the MW measurement inferred from hadronically decaying W+W− events at LEP-2, by the method of direct reconstruction, is influenced by CR effects, most visibly when changing the value of κ in the SK-I model. For the MW(4q) estimator within DELPHI this is shown in [33]. Other published MW estimators in LEP experiments are equally sensitive to κ [34].

To probe this sensitivity to CR effects, alternative estimators for the MW measurement were designed which have different sensitivity to κ. In the following, the standard estimator and two alternative estimators, studied in this paper, are presented. The standard estimator corresponds to that previously used in the measurement of the W mass by DELPHI [33]. Note that in the final DELPHI W mass analysis [35] results are given for the standard and hybrid cone estimators, with the hybrid cone estimator used to provide the primary result. The data samples, efficiencies and purities for the analysis corresponding to the standard estimator are provided in [33, 35].

• The standard MW estimator: This estimator is described in [33] and was optimised to obtain the smallest statistical uncertainty for the W mass measurement. It results in an event-by-event likelihood L_i(MW) for the parameter MW.

• The momentum cut MW estimator: For this alternative MW estimator the event selection was performed in exactly the same way as for the standard MW estimator. The particle-jet association was also taken from this analysis. However, when reconstructing the event for the MW extraction a tighter track selection was applied. The momentum and energy of the jets were calculated only from those tracks having a momentum higher than a certain pcut value. An event-by-event likelihood L_i(MW) was then calculated.

• The hybrid cone MW estimator: In this second alternative MW estimator the reconstruction of the event is the same as for the standard analysis, except when calculating the jet momenta used for the MW extraction.
An iterative procedure was used within each jet (defined by the clustering algorithm used in the standard analysis) to find a stable direction of a cone excluding some particles in the calculation of the jet momentum, illustrated in Figure 11. Starting with the direction of the original jet $\vec{p}^{\,\mathrm{jet}}_{\mathrm{std}}$, the jet direction was recalculated (direction (1) on the Figure) only from those particles which have an opening angle smaller than Rcone with this original jet. This process was iterated by constructing a second cone (of the same opening angle) around this new jet direction and the jet direction was recalculated again. The iteration was continued until a stable jet direction $\vec{p}^{\,\mathrm{jet}}_{\mathrm{cone}}$ was found. The jet momenta obtained, $\vec{p}^{\,\mathrm{jet}}_{\mathrm{cone}}$, were rescaled to compensate for the lost energy of particles outside the stable cone,

$$ \vec{p}^{\,\mathrm{jet}}_{\mathrm{cone}} \rightarrow \vec{p}^{\,\mathrm{jet}}_{\mathrm{cone}} \cdot \frac{E^{\mathrm{jet}}}{E^{\mathrm{jet}}_{\mathrm{cone}}}. \qquad (13) $$

The energies of the jets were taken to be the same as those obtained with the standard clustering algorithm ($E^{\mathrm{jet}}_{\mathrm{cone}} \rightarrow E^{\mathrm{jet}}$). This was done to increase the correlation of this estimator with the standard one. The rescaling was not done for the pcut estimator as it will be used in a cross-check observable with different systematic properties. Again the result is an event-by-event likelihood $L^{R_{\mathrm{cone}}}_i$(MW).

Figure 11: Illustration of the iterative cone algorithm within a predefined jet as explained in the text.

Each of these previously defined MW likelihoods had to be calibrated. The slope of the linear calibration curve for the MW estimators is tuned to be unity, therefore only a bias correction induced by the reconstruction method has to be applied. This bias is estimated with the nominal WPHACT Monte Carlo events and the dependence on the value of κ is estimated with the EXCALIBUR simulation. It was verified for smaller subsets that the results using these large EXCALIBUR samples and the samples generated with WPHACT are compatible.
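The iterative cone procedure and the rescaling of Equation 13 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DELPHI implementation: particles are treated as massless (E = |p|) with non-zero momentum, the stopping rule "same particle set twice" stands in for "stable direction", and the function name `cone_jet_momentum` and the `max_iter` guard are hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def _angle(a, b):
    """Opening angle in radians between two non-zero 3-vectors."""
    cos = sum(x * y for x, y in zip(a, b)) / (_norm(a) * _norm(b))
    return math.acos(max(-1.0, min(1.0, cos)))


def _vsum(vectors):
    return tuple(sum(c) for c in zip(*vectors))


def cone_jet_momentum(particles, r_cone, max_iter=50):
    """Iterative cone inside one predefined jet.

    particles: list of (px, py, pz) momenta already assigned to the jet.
    Returns (rescaled cone momentum, indices of particles in the stable cone).
    """
    axis = _vsum(particles)                  # direction of the original jet
    selected = list(range(len(particles)))
    for _ in range(max_iter):
        inside = [i for i, p in enumerate(particles)
                  if _angle(p, axis) < r_cone]
        if not inside:                       # empty cone: keep the previous axis
            break
        axis = _vsum([particles[i] for i in inside])
        if inside == selected:               # same particles twice: stable
            break
        selected = inside

    # Rescale the cone momentum by E_jet / E_cone (massless assumption),
    # so that the jet energy of the standard clustering is retained.
    e_jet = sum(_norm(p) for p in particles)
    e_cone = sum(_norm(particles[i]) for i in selected)
    scale = e_jet / e_cone
    return tuple(scale * c for c in axis), selected


if __name__ == "__main__":
    # Hypothetical jet: two collimated particles and one wide-angle particle.
    jet = [(0.0, 0.0, 10.0), (1.0, 0.0, 5.0), (5.0, 0.0, 1.0)]
    momentum, kept = cone_jet_momentum(jet, r_cone=0.5)
    print(kept, momentum)
```

In the hypothetical example the wide-angle particle falls outside the 0.5 rad cone, so the direction is set by the two collimated particles while the magnitude is scaled back up to the full jet energy.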
Neglecting the possible existence of Colour Reconnection (CR) in the Monte Carlo simulation results in event likelihoods L_i(MW|event without CR), while L_i(MW|event with CR) are the event likelihoods obtained when assuming the hypothesis that events do reconnect (100% CR in the scope of the SK-I model). To construct the event likelihoods for intermediate CR (values of κ larger than 0) the following weighting formula is used:

$$ L_i(M_W|\kappa) = [1 - P_i(\kappa)] \cdot L_i(M_W|\text{event without CR}) + P_i(\kappa) \cdot L_i(M_W|\text{event with CR}) \qquad (14) $$

where P_i(κ) is defined in Equation 1. The combined likelihood is produced for the event sample; the calibrated values for MW(κ) were obtained for different values of κ using the maximum likelihood principle. In Figure 12 the difference dMW(κ) = MW(κ) − MW(κ = 0), or the influence of κ on the bias of the MW estimator, is presented as a function of κ. The uncertainty on this difference is estimated with the Jackknife method [27] to take the correlation between MW(κ) and MW(κ = 0) into account.

It was observed from simulations that the dependence of the estimators on κ, for κ below about 5, was not significantly different in the centre-of-mass range between 189 and 207 GeV. Therefore in the determination of κ the dependence at 200 GeV was taken as default for all centre-of-mass energies. This value of the centre-of-mass energy is close to the integrated luminosity weighted centre-of-mass energy of the complete data sample, which is 197.1 GeV.

When neglecting the information content of low momentum particles or when using the hybrid cone algorithm, the influence of Colour Reconnection on the MW estimator is decreased. The dependence ∂MW/∂κ of the estimator on κ is decreased when increasing the value of pcut or when working with smaller cone opening angles Rcone.

5.1 The Measurement of κ

The observed difference ∆MW(std, i) = $M_W^{\mathrm{std}} - M_W^{i}$ in the event sample, where i is a certain alternative analysis, provides a measurement of κ.
When both estimators $M_W^{\mathrm{std}}$ and $M_W^{i}$ are calibrated in the same hypothesis of κ, the expectation values of ∆MW(std, i) will be invariant under a change of pcut or Rcone. When neglecting part of the information content of the events in these alternative MW analyses, by increasing pcut or decreasing Rcone, the statistical uncertainty on the value of the MW estimator is increased.

Figure 12: The difference dMW(κ) = MW(κ) − MW(κ = 0) is presented as a function of κ, for different MW estimators. The curve for the standard MW estimator is the curve at the top. The curves obtained with the hybrid cone analysis for different values of the cone opening angle, starting from the top with 1.00 rad down to 0.75 rad, 0.50 rad and 0.25 rad, are indicated with dotted lines. The curves obtained with the momentum cut analysis for different values of pcut, starting from the top with 1 GeV/c, down to 2 GeV/c and 3 GeV/c, are dashed. The vertical line (κ = 0.66) indicates the value of κ preferred by the SK-I authors [5] and commonly used to estimate systematic uncertainties on measurements using e+e− → W+W− → q1q̄2q3q̄4 events.

Therefore a balance must be found between the statistical precision on ∆MW(std, i) and the dependence of this difference on κ in order to obtain the largest sensitivity for a κ measurement. This optimum was found using the Monte Carlo simulated events and assuming that the data follow the κ = 0 hypothesis, resulting in the smallest expected uncertainty on the estimation of κ. For the pcut analysis an optimal sensitivity was found when using the difference ∆MW(std, pcut) with pcut equal to 2 GeV/c or 3 GeV/c. Even more information about κ could be extracted from the data when using the difference ∆MW(std, Rcone), which was found to have an optimal sensitivity around Rcone = 0.5 rad.
No significant improvement in the sensitivity was found when combining the information from these two observables. Therefore the best measure of κ using this method is extracted from the ∆MW(std, Rcone = 0.5 rad) observable. Nevertheless, the ∆MW(std, pcut = 2 GeV/c) observable was studied as a cross-check.

5.2 Study of the Systematic Errors in the ∆MW Method

The estimation of systematic uncertainties on the observables ∆MW(std, i) follows similar methods to those used within the MW analysis. Here the double difference is a measure of the systematic uncertainty between Monte Carlo simulation (‘MC’) and real data (‘DA’):

$$ \Delta_{\mathrm{syst}}(\mathrm{MC},\mathrm{DA}) = \left| \left[ M_W^{\mathrm{std}}(\mathrm{MC}) - M_W^{\mathrm{std}}(\mathrm{DA}) \right] - \left[ M_W^{i}(\mathrm{MC}) - M_W^{i}(\mathrm{DA}) \right] \right| \qquad (15) $$

where i is one of the alternative MW estimators. The systematic error components are described below and summarised in Table 6.

5.2.1 Jet Reconstruction systematics with MLBZs

A novel technique was proposed in [36] to study systematic uncertainties on jet reconstruction and fragmentation in W physics measurements with high statistical precision through the use of Mixed Lorentz Boosted Z events (MLBZs). The technique is similar to the one described in section 4.4.1. The main advantage of this method was that Monte Carlo simulated jet properties in W+W− events could be directly compared with the corresponding ones from real data using the large Z statistics.

The main extension of the method beyond that described in [36] consisted in an improved mixing and boosting procedure of the Z events into MLBZs, demonstrated in Figure 13. The 4-momenta of the four primary quarks in WPHACT generated W+W− → q1q̄2q3q̄4 events were used as event templates. The Z events from data or simulation were chosen such that their thrust axis directions were close in polar angle to one of the primary quarks of the W+W− event template. Each template W was then boosted to its rest frame.
The particles in the final state of a selected Z event were rotated so that the thrust axis matches the rest frame direction of the primary quarks in the W+W− template. After rescaling the kinematics of the Z events to match the W boson mass in the generated W+W− template, the two Z events were boosted to the lab frame of the W+W− template. All particles having an absolute polar angle with the beam direction smaller than 11° were removed from the event.

The same generated WPHACT events were used for the construction of both the data MLBZs and Monte Carlo MLBZs in order to increase the correlation between both emulated samples to about 31%. This correlation was taken into account when quoting the statistical uncertainty on the systematic shift on the observables between data and Monte Carlo MLBZs.

Figure 13: Illustration of the mixing and boosting procedure within the MLBZ method (see text for details). [Diagram labels: lab frame WW, boost, rest frame WW, rotate, "re-boost", lab frame MLBZ.]

It was verified that when introducing a significant mass shift of 300 MeV/c² on MW by using the cone rejection algorithm, it was reproduced within 15% by applying the MLBZ technique. Because the expected systematic uncertainties on the ∆MW(std, i) observables of interest are one order of magnitude smaller than 300 MeV/c², this method is clearly justified.

The double difference of Equation 15 was determined with the MLBZ method using Z events selected in the data sets collected during the 1998 calibration runs and Z events from the corresponding Monte Carlo samples. The following results were obtained for the ∆MW(std, Rcone = 0.5 rad) observable:

∆syst(ARIADNE, DA) = −1.9 ± 3.9 (stat) MeV/c²
∆syst(PYTHIA, DA) = −5.7 ± 3.9 (stat) MeV/c²
∆syst(HERWIG, DA) = −10.6 ± 3.9 (stat) MeV/c²

where the statistical uncertainty takes into account the correlation between the Monte Carlo and the data MLBZ events, together with the correlation between the two MW estimators.
This indicates that most of the fragmentation, detector and Between-W Bose-Einstein Correlation systematics are small. The study was not performed for the ∆MW(std, pcut) observable. Other systematic sources on the reconstructed jets are not considered, as the MW estimators used in the difference ∆MW(std, i) have a large correlation.

5.2.2 Additional Fragmentation systematic study

The fragmentation of the primary partons is modelled in the Monte Carlo simulation used for the calibration of the $M_W^{i}$ observables. The expected values of the MW estimators from simulation (in the κ = 0 hypothesis) change when different fragmentation models are used [33], resulting in systematic uncertainties on the measured $M_W^{i}$ observables and hence possibly also on the estimated κ. In Figure 14 the systematic shift δMW in the different $M_W^{i}$ observables is shown when using HERWIG or ARIADNE rather than PYTHIA as the fragmentation model in the no Colour Reconnection hypothesis.

When inferring κ from the data difference, ∆MW(std, i), the PYTHIA model is used to calibrate each $M_W^{i}$ observable. The data difference ∆MW(std, pcut = 2 GeV/c) changes (see footnote 3) by (27 ± 12) MeV/c² or (8 ± 12) MeV/c² when replacing PYTHIA by HERWIG or ARIADNE, respectively. Similarly, the observable ∆MW(std, Rcone = 0.5 rad) changes by (−4 ± 10) MeV/c² or (−6 ± 10) MeV/c² when replacing PYTHIA by HERWIG or ARIADNE, respectively. The largest shift of the observable when changing fragmentation models (or the uncertainty on this shift if larger) is taken as the systematic uncertainty on the value of the observable. Hence, systematic errors of 27 MeV/c² for the ∆MW(std, pcut = 2 GeV/c) observable and 10 MeV/c² for the ∆MW(std, Rcone = 0.5 rad) observable were assumed as the contribution from fragmentation uncertainties. The MLBZ studies (see above) are compatible with these results, hence no additional systematic due to fragmentation was quoted for the ∆MW(std, Rcone = 0.5 rad) observable.
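The rule used above to assign the fragmentation systematic (largest shift, or its uncertainty if that is larger) can be written down compactly. The sketch below is only an illustration of that rule applied to the four shifts quoted in the text; the function name and the data layout are hypothetical.

```python
def fragmentation_systematic(shifts):
    """Apply the 'largest shift, or its uncertainty if larger' rule.

    shifts: list of (shift, uncertainty) pairs in MeV/c^2, one per
    alternative fragmentation model compared with PYTHIA.
    """
    return max(max(abs(shift), err) for shift, err in shifts)


# Shifts quoted in the text when replacing PYTHIA by HERWIG and ARIADNE.
P_CUT_SHIFTS = [(27.0, 12.0), (8.0, 12.0)]      # Delta M_W(std, pcut = 2 GeV/c)
R_CONE_SHIFTS = [(-4.0, 10.0), (-6.0, 10.0)]    # Delta M_W(std, Rcone = 0.5 rad)

syst_pcut = fragmentation_systematic(P_CUT_SHIFTS)
syst_rcone = fragmentation_systematic(R_CONE_SHIFTS)

if __name__ == "__main__":
    print(syst_pcut, syst_rcone)
```

For the pcut observable the HERWIG shift itself dominates, whereas for the Rcone observable both shifts are smaller than their uncertainties, so the uncertainty is what gets quoted.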
Footnote 3: This change, ∆MW(std, pcut = 2 GeV/c)_PYTHIA − ∆MW(std, pcut = 2 GeV/c)_HERWIG, is given by δMW(std ≡ pcut = 0.2 GeV/c)_PYTHIA−HERWIG − δMW(pcut = 2 GeV/c)_PYTHIA−HERWIG, and similar expressions hold for the ARIADNE and Rcone cases (for Rcone, std ≡ Rcone = π).

[Figure 14 plots: PYTHIA-ARIADNE and PYTHIA-HERWIG shifts versus pcut (GeV/c, 0 to 4) and versus Rcone (rad, 0 to 1.2); DELPHI.]

Figure 14: Systematic shifts δMW, on MW observables, when applying different fragmentation models, as a function of the pcut or Rcone values used in the construction of the MW observable. These Monte Carlo estimates were obtained at a centre-of-mass energy of 189 GeV. The uncertainties are determined with the Jackknife method.

5.2.3 Energy Dependence

The biases of the different MW estimators have a different dependence on the centre-of-mass energy, hence the calibration of ∆MW(i, j) will be energy dependent. The energy dependence of each individual MW estimator was parameterised with a second order polynomial. Since WPHACT event samples were used at a range of centre-of-mass energies, the uncertainty on the parameters describing these curves is small. Therefore a small systematic uncertainty of 3 MeV/c² was quoted on the ∆MW(i, j) observables due to the calibration.

5.2.4 Background

The same event selection criteria were applied for all the MW estimators, hence the same background contamination is present in all analyses. The influence of the qq̄(γ) background events on the individual MW estimators is small [33] and was taken into account when constructing the centre-of-mass energy dependent calibration curves of the individual MW estimators. The residual systematic uncertainty on both ∆MW(i, j) observables is 3 MeV/c².

5.2.5 Bose-Einstein Correlations

As for the particle flow method, the systematic uncertainties due to possible Bose-Einstein Correlations are estimated via Monte Carlo simulations.
The relevant values for the systematic uncertainties on the observables are the differences between the observables obtained from the Monte Carlo events with Bose-Einstein Correlations inside individual W’s (BEI) and those with, in addition, Bose-Einstein Correlations between identical particles from different W’s (BEA). The values were estimated to be (6.4 ± 9.3) MeV/c² for the ∆MW(std, pcut = 2 GeV/c) observable, and (7.2 ± 8.2) MeV/c² for the ∆MW(std, Rcone = 0.5 rad) observable. As the uncertainties in the estimated contributions were larger than the contributions themselves, these uncertainties were added in quadrature to the systematic errors on the relevant observables.

5.2.6 Cross-check in the Semi-leptonic Channel

Colour Reconnection between the decay products originating from different W boson decays can only occur in the W+W− → q1q̄2q3q̄4 channel. The semi-leptonic W+W− decay channel (i.e., qq̄′ℓνℓ) is by definition free of those effects. Therefore the determination of Colour Reconnection sensitive observables, like ∆MW(std, Rcone = 0.5 rad), in this decay channel could indicate the possible presence of residual systematic effects.

A study of the ∆MW(std, Rcone = 0.5 rad) observable was performed in the semi-leptonic decay channel. The semi-leptonic MW analysis in [33] was used and the cone algorithm was implemented in a similar way as for the fully hadronic decay channel. The same data sets have been used as presented throughout this paper and the following result was obtained:

$$ \Delta M_W(\mathrm{std}, R_{\mathrm{cone}}) = M_W^{\mathrm{std}} - M_W^{R_{\mathrm{cone}}} = (8 \pm 56\,(\mathrm{stat}))\ \mathrm{MeV}/c^2 \qquad (17) $$

where the statistical uncertainty was computed taking into account the correlation between both measurements. Although the statistical significance of this cross-check is small, a good agreement was found for both MW estimators.
5.3 Results from the MW Estimators Analyses

The observable ∆MW(std, Rcone) with Rcone equal to 0.5 rad (defined above) was found to be the most sensitive to the SK-I Colour Reconnection model, and the ∆MW(std, pcut = 2 GeV/c) observable was measured as a cross-check. The analyses were calibrated with PYTHIA κ = 0 WPHACT generated simulation events. The values measured from the combined DELPHI data at centre-of-mass energies ranging between 183 and 208 GeV are:

$$ \Delta M_W(\mathrm{std}, R_{\mathrm{cone}}) = M_W^{\mathrm{std}} - M_W^{R_{\mathrm{cone}}} = (59 \pm 35\,(\mathrm{stat}) \pm 14\,(\mathrm{syst}))\ \mathrm{MeV}/c^2 $$

$$ \Delta M_W(\mathrm{std}, p_{\mathrm{cut}}) = M_W^{\mathrm{std}} - M_W^{p_{\mathrm{cut}}} = (143 \pm 61\,(\mathrm{stat}) \pm 29\,(\mathrm{syst}))\ \mathrm{MeV}/c^2 $$

where the first uncertainty numbers represent the statistical components and the second the combined systematic ones. The full breakdown of the uncertainties on both observables can be found in Table 6.

Uncertainty contribution (MeV/c²)
Source              ∆MW(std, Rcone = 0.5 rad)   ∆MW(std, pcut = 2 GeV/c)
Fragmentation       11                          27
Calibration         3                           3
Background          3                           3
BEI-BEA             8                           9
Total systematic    14                          29
Statistical error   35                          61
Total               38                          67

Table 6: Breakdown of the total uncertainty on both relevant observables.

From these values estimates were made for the κ parameter by comparing them with the Monte Carlo expected values in different hypotheses of κ, shown in Figure 15 for the observable ∆MW(std, Rcone = 0.5 rad).

The Gaussian uncertainty on the measured observables was used to construct a log-likelihood function ℒ(κ) = −2 log L(κ) for κ. The log-likelihood function obtained is shown in Figure 16 for the first and in Figure 17 for the second observable.

The result shown in Figure 16 is the primary result of this analysis, because of the larger sensitivity of the ∆MW(std, Rcone = 0.5 rad) observable to the value of κ (see section 5.1). The value of κ most compatible with the data within one standard deviation of the measurement is

$$ \kappa_{\mathrm{SK\text{-}I}} = 1.75^{+2.60}_{-1.30}. \qquad (19) $$

The result on κ extracted from the cross-check ∆MW(std, pcut = 2 GeV/c) observable is found not to differ significantly from the quoted result obtained with the more optimal ∆MW(std, Rcone = 0.5 rad) observable. The significance can be determined from the difference between both MW estimators:

$$ M_W^{p_{\mathrm{cut}}} - M_W^{R_{\mathrm{cone}}} = (-84 \pm 59\,(\mathrm{stat}))\ \mathrm{MeV}/c^2. \qquad (20) $$

Taking into account that the expectation of this difference depends on κ, we find a statistical deviation of about 1 to 1.5σ between the measurements. No improved sensitivity is obtained by combining the information of both observables.

Figure 15: The dependence of the observable ∆MW(std, Rcone = 0.5 rad) from simulation events on the value of the SK-I model parameter κ. The dependence is given at three centre-of-mass energies (188.6 GeV, 199.5 GeV and 206.5 GeV). [The plot marks κ = 0.66.]

Figure 16: The log-likelihood function −2 log L(κ) obtained from the DELPHI data measurement of ∆MW(std, Rcone = 0.5 rad). The bottom curve (full line) gives the final result including the statistical uncertainty on ∆MW(std, Rcone = 0.5 rad) and the investigated systematic uncertainty contributions. The top curve (dashed) is centred on the same minimum and reflects the log-likelihood function obtained when only statistical uncertainties are taken into account.

Figure 17: The log-likelihood function −2 log L(κ) obtained from the DELPHI data measurement of ∆MW(std, pcut = 2 GeV/c). The bottom curve (full line) gives the final result including the statistical uncertainty on ∆MW(std, pcut = 2 GeV/c) and the investigated systematic uncertainty contributions.
The top curve (dashed) is centred on the same minimum and reflects the log-likelihood function obtained when only statistical uncertainties are taken into account.

In this paper the SK-I model for Colour Reconnection implemented in PYTHIA was studied because it parameterizes the effect as a function of the model parameter κ. Other phenomenological models implemented in the ARIADNE [7,8] and HERWIG [6] Monte Carlo fragmentation schemes exist and are equally plausible. Unfortunately their effect in W+W− → q1q̄2q3q̄4 events cannot be scaled with a model parameter, analogous to κ in SK-I, without affecting the fragmentation model parameters. Despite this non-factorization property, the consistency of these models with the data can still be examined. The Monte Carlo predictions of the observables in the hypothesis with Colour Reconnection (calibrated in the hypothesis of no Colour Reconnection) give the following values:

ARIADNE → $M_W^{\mathrm{std}} - M_W^{R_{\mathrm{cone}}}$ = (7.2 ± 4.1) MeV/c²
ARIADNE → $M_W^{\mathrm{std}} - M_W^{p_{\mathrm{cut}}}$ = (9.4 ± 7.0) MeV/c²
HERWIG → $M_W^{\mathrm{std}} - M_W^{R_{\mathrm{cone}}}$ = (19.7 ± 4.0) MeV/c²
HERWIG → $M_W^{\mathrm{std}} - M_W^{p_{\mathrm{cut}}}$ = (22.8 ± 6.9) MeV/c².

The small effects on the observables with the HERWIG implementation of Colour Reconnection compared to those predicted by SK-I are due to the fact that the fraction of events that reconnect is smaller in HERWIG (1/9) compared to SK-I (≳ 25% at √s = 200 GeV). After applying this scale factor between both models, their predicted effect on the W mass and on the ∆MW(i, j) observables becomes compatible. The ARIADNE implementation of Colour Reconnection has a much smaller influence on the observables compared to those predicted with the SK-I and HERWIG Monte Carlo.
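The totals in Table 6 and the difference in Equation 20 can be cross-checked from the quoted entries. The sketch below assumes, as the table layout suggests, that the individual sources are combined in quadrature; it uses only the rounded table values, and the dictionary layout and names are hypothetical.

```python
import math

# Entries of Table 6 in MeV/c^2 (rounded values as quoted).
TABLE_6 = {
    "Rcone": {"sources": [11, 3, 3, 8], "stat": 35, "central": 59},
    "pcut": {"sources": [27, 3, 3, 9], "stat": 61, "central": 143},
}


def quadrature(terms):
    """Sum independent uncertainties in quadrature."""
    return math.sqrt(sum(t * t for t in terms))


totals = {}
for name, row in TABLE_6.items():
    syst = quadrature(row["sources"])
    totals[name] = (syst, quadrature([syst, row["stat"]]))

# M_W(pcut) - M_W(Rcone) follows from the two measured differences,
# since both are taken with respect to the same standard estimator.
pcut_minus_rcone = TABLE_6["Rcone"]["central"] - TABLE_6["pcut"]["central"]

if __name__ == "__main__":
    for name, (syst, total) in totals.items():
        print(name, round(syst), round(total))
    print("M_W(pcut) - M_W(Rcone):", pcut_minus_rcone)
```

The statistical uncertainty of 59 MeV/c² in Equation 20 cannot be obtained this way, because it depends on the correlation between the two estimators, which Table 6 does not give.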
5.4 Correlation with Direct MW Measurement

When using a data observable to estimate systematic uncertainties on some measurand inferred from the same data sample, the correlation between the estimator used to measure the systematic bias and the estimator of the absolute value of the measurand should be taken into account. Therefore the correlation between the Colour Reconnection sensitive observables ∆MW(std, Rcone = 0.5 rad) and ∆MW(std, pcut = 2 GeV/c) and the absolute MW(std) estimator was calculated.

The correlation was determined from the Monte Carlo events with κ = 0, i.e. no Colour Reconnection. The values obtained were found to be stable as a function of κ within the statistical precision. The correlation between ∆MW(std, Rcone = 0.5 rad) and MW(std) was found to be 11%, while for the one between ∆MW(std, pcut = 2 GeV/c) and MW(std) a value of 8% was obtained.

The correlation between the different MW estimators was also estimated and found to be stable with the value of κ. A value of 83% was obtained for the correlation between MW(std) and $M_W^{R_{\mathrm{cone}} = 0.5\,\mathrm{rad}}$, while 66% was obtained between MW(std) and $M_W^{p_{\mathrm{cut}} = 2\,\mathrm{GeV}/c}$.

6 Combination of the Results in the Scope of the SK-I Model

The log-likelihood curve from the particle flow method was combined with the curve from the ∆MW method and the result is shown in Figure 18. The correlations between the analyses were neglected because the overlap between the samples is small and the nature of the analyses is very different. The total errors were used (statistical and systematic added in quadrature) in the combination.

[Figure 18 plot: log-likelihood of the measurement of κ (SK-I) from ∆MW, from particle flow, and from the two combined, versus the SK-I model parameter κ; DELPHI.]

Figure 18: The log-likelihood function −2 log L(κ) obtained from the combined DELPHI measurement via ∆MW(std, Rcone = 0.5 rad) and the particle flow. The full line gives the final result including the statistical and systematic uncertainties.
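Combining two uncorrelated −2 log L curves amounts to adding them point by point and reading the interval from where the sum rises by one unit above its minimum. The sketch below illustrates this mechanically; the two input curves are hypothetical parabolas chosen only to make the example checkable and are not the DELPHI curves, which are asymmetric in κ. All names are assumptions of this sketch.

```python
def combine(curve_a, curve_b):
    """Add two -2 log L curves sampled on the same grid (no correlation)."""
    return [a + b for a, b in zip(curve_a, curve_b)]


def best_value_and_interval(grid, curve, delta=1.0):
    """Return (best value, lower edge, upper edge) of the delta interval.

    Assumes the region with curve <= minimum + delta is one connected range
    that lies inside the scanned grid.
    """
    i_min = min(range(len(curve)), key=curve.__getitem__)
    threshold = curve[i_min] + delta
    allowed = [x for x, c in zip(grid, curve) if c <= threshold]
    return grid[i_min], min(allowed), max(allowed)


# Hypothetical inputs: a grid in kappa and two Gaussian (parabolic) curves.
KAPPA_GRID = [0.001 * i for i in range(10001)]
CURVE_A = [((k - 2.0) / 1.0) ** 2 for k in KAPPA_GRID]   # 2.0 +- 1.0
CURVE_B = [((k - 4.0) / 2.0) ** 2 for k in KAPPA_GRID]   # 4.0 +- 2.0

combined = combine(CURVE_A, CURVE_B)
best, low, high = best_value_and_interval(KAPPA_GRID, combined)

if __name__ == "__main__":
    print(best, low, high)
```

For Gaussian inputs this reproduces the usual inverse-variance weighted average; for asymmetric curves the same scan yields asymmetric errors.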
The log-likelihood functions are combined in the hypothesis of no correlation between the statistical and systematic uncertainties of both measurements.

The best value for κ from the minimum of the curve, with its error given by the width of the curve at the value −2 log L = (−2 log L)min + 1, is:

κSK-I = 2.2 −1.3 . (22)

7 Conclusions

Colour Reconnection (CR) effects in the fully hadronic decays of W pairs, produced in the DELPHI experiment at LEP, were investigated using the methods of the particle flow and the MW estimators, notably the ∆MW(std, Rcone = 0.5 rad) observable.

The average of the ratios R of the integrals between 0.2 and 0.8 of the particle distribution in Inside-W regions to the Between-W regions was found to be

$$ \langle R \rangle = 0.979 \pm 0.032\,(\mathrm{stat}) \pm 0.035\,(\mathrm{syst}). \qquad (23) $$

The values used in this average were obtained after rescaling the value at each energy to the value at a centre-of-mass energy of 189 GeV using a fit to the MC without CR.

The effects of CR on the values of the reconstructed mass of the W boson, as implemented in different Monte Carlo models, were studied with different estimators. From the estimator of the W mass with the strongest sensitivity to the SK-I model of CR, the ∆MW(std, Rcone = 0.5 rad) method, the difference in data was found to be

$$ \Delta M_W(\mathrm{std}, R_{\mathrm{cone}}) = M_W^{\mathrm{std}} - M_W^{R_{\mathrm{cone}} = 0.5\,\mathrm{rad}} = (59 \pm 35\,(\mathrm{stat}) \pm 14\,(\mathrm{syst}))\ \mathrm{MeV}/c^2. \qquad (24) $$

From the combination of the results from particle flow and MW estimators, corresponding to the curve in full line shown in Figure 18, the best value and total error for the κ parameter in the SK-I model was extracted to be:

κSK-I = 2.2 −1.3 (25)

which corresponds to a probability of reconnection of Preco = 52% and lies in the range 31% < Preco < 68% at 68% confidence level.

The two analysis methods used in this paper are complementary: the method of particle flow provides a model-independent measurement but has significantly less sensitivity towards the SK-I model of CR than the method of ∆MW estimators.
The obtained value of κ in equation (25) can be compared with similar values obtained by other LEP experiments, and it was found to be compatible with, but higher than, the values obtained with the particle flow by L3 [37] and OPAL [38]. It is also compatible with, but higher than, the values obtained with the method of different MW estimators by OPAL [39] and ALEPH [40]. Acknowledgements We thank the ALEPH Collaboration for the production of the simulated “Cetraro Samples”. We are greatly indebted to our technical collaborators, to the members of the CERN- SL Division for the excellent performance of the LEP collider, and to the funding agencies for their support in building and operating the DELPHI detector. We acknowledge in particular the support of Austrian Federal Ministry of Education, Science and Culture, GZ 616.364/2-III/2a/98, FNRS–FWO, Flanders Institute to encourage scientific and technological research in the industry (IWT) and Belgian Federal Office for Scientific, Technical and Cultural affairs (OSTC), Belgium, FINEP, CNPq, CAPES, FUJB and FAPERJ, Brazil, Czech Ministry of Industry and Trade, GA CR 202/99/1362, Commission of the European Communities (DG XII), Direction des Sciences de la Matière, CEA, France, Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie, Germany, General Secretariat for Research and Technology, Greece, National Science Foundation (NWO) and Foundation for Research on Matter (FOM), The Netherlands, Norwegian Research Council, State Committee for Scientific Research, Poland, SPUB-M/CERN/PO3/DZ296/2000, SPUB-M/CERN/PO3/DZ297/2000, 2P03B 104 19 and 2P03B 69 23(2002-2004) FCT - Fundação para a Ciência e Tecnologia, Portugal, Vedecka grantova agentura MS SR, Slovakia, Nr. 
95/5195/134, Ministry of Science and Technology of the Republic of Slovenia, CICYT, Spain, AEN99-0950 and AEN99-0761, The Swedish Research Council, Particle Physics and Astronomy Research Council, UK, Department of Energy, USA, DE-FG02-01ER41155, EEC RTN contract HPRN-CT-00292-2002. References [1] G. Gustafson, U. Pettersson and P. M. Zerwas, Phys. Lett. B 209 (1988) 90. [2] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 44 (2005) 161. [3] V. Khoze, L. Lönnblad, R. Møller, T. Sjöstrand, Š. Todorova and N. K. Watson in “Physics at LEP-2”, Yellow Report CERN 96-01, Eds. G. Altarelli, T. Sjöstrand and F. Zwirner, vol. 1 (1996) 191. [4] The LEP Collaborations ALEPH, DELPHI, L3, OPAL, and the LEP W Working Group, “Combined preliminary results on Colour Reconnection using Particle Flow in e+e− → W+W−”, note LEPEWWG/FSI/2002-01, ALEPH 2002-027 PHYSIC 2002- 016, DELPHI 2002-090 CONF 623, L3 note 2768, and OPAL TN-724, July 17th, 2002, contribution to the summer Conferences of 2002, available at http://delphiwww.cern.ch/pubxx/delnote/public/2002 090 conf 623.ps.gz . [5] T. Sjöstrand and V. A. Khoze, Z. Phys. C 62 (1994) 281. [6] G. Marchesini et al., Comp. Phys. Comm. 67 (1992) 465; G. Corcella et al., JHEP 0101 (2001) 010. [7] L. Lönnblad, Comp. Phys. Comm. 71 (1992) 15; H. Kharraziha and L. Lönnblad, Comp. Phys. Comm. 123 (1999) 153. [8] G. Gustafson and J. Häkkinen, Z. Phys. C 64 (1994) 659. [9] L. Lönnblad, Z. Phys. C 70 (1996) 107. [10] P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 416 (1998) 233. P. Abreu et al. [DELPHI Collaboration], Eur. Phys. J. C 18 (2000) 203 [Erratum- ibid. C 25 (2002) 493]. [11] D. Duchesneau, “New method based on energy and particle flow in e+e− → W+W− → hadrons events for colour reconnection studies”, LAPP-EXP-2000-02 (http://wwwlapp.in2p3.fr/preplapp/LAPP EX2000 02.pdf), Presented at Workshop on WW Physics at LEP-200 (WW99), Kolymbari, Chania, Greece, 20-23 Oct 1999. [12] J. D’ Hondt and N. J. 
Kjaer, “Measurement of Colour Reconnection model parameters using MW analyses”, contributed paper for ICHEP’02 (Amsterdam), note DELPHI 2002-048 CONF 582, available at http://delphiwww.cern.ch/pubxx/delnote/public/2002 048 conf 582.ps.gz . [13] P. A. Aarnio et al. [DELPHI Collaboration], Nucl. Instrum. Meth. A 303 (1991) 233. [14] P. Abreu et al. [DELPHI Collaboration], Nucl. Instrum. Meth. A 378 (1996) 57. [15] A. Augustinus et al. [DELPHI Trigger Group], Nucl. Instrum. Meth. A 515 (2003) [16] P. Chochula et al. [DELPHI Silicon Tracker Group], Nucl. Instrum. Meth. A 412 (1998) 304. [17] A. Ballestrero, R. Chierici, F. Cossutti and E. Migliore, Comp. Phys. Comm. 152 (2003) 175. [18] E. Accomando and A. Ballestrero, Comp. Phys. Comm. 99 (1997) 270; E. Accomando, A. Ballestrero and E. Maina, Comp. Phys. Comm. 150 (2003) 166. [19] T. Sjöstrand, Comp. Phys. Comm. 82 (1994) 74; T. Sjöstrand et al., Comp. Phys. Comm. 135 (2001) 238. [20] P. Abreu et al. [DELPHI Collaboration], Z. Phys. C 73 (1996) 11. [21] L. Lönnblad and T. Sjöstrand, Eur. Phys. J. C 2 (1998) 165. [22] F. A. Berends, R. Pittau and R. Kleiss, Comp. Phys. Comm. 85 (1995) 437. [23] S. Jadach, B. F. L. Ward and Z. Was, Phys. Lett. B 449 (1999) 97; S. Jadach, B. F. L. Ward and Z. Was, Comp. Phys. Comm. 130 (2000) 260. [24] S. Jadach et al., Comp. Phys. Comm. 140 (2001) 475. [25] P. Abreu et al., Nucl. Instrum. Meth. A 427 (1999) 487. [26] S. Catani et al., Phys. Lett. B 269 (1991) 432. [27] B. Efron, “Computers and the Theory of Statistics”, SIAM Rev. 21 (1979) 460. [28] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 34 (2004) 399. [29] P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 286 (1992) 201; P. Abreu et al. [DELPHI Collaboration], Z. Phys. C 63 (1994) 17; P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 355 (1995) 415; P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 471 (2000) 460; P. Achard et al. [L3 Collaboration], Phys. Lett. B 524 (2002) 55; P. Achard et al. 
[L3 Collaboration], Phys. Lett. B 540 (2002) 185; P. D. Acton et al. [OPAL Collaboration], Phys. Lett. B 267 (1991) 143; P. D. Acton et al. [OPAL Collaboration], Phys. Lett. B 287 (1992) 401; P. D. Acton et al. [OPAL Collaboration], Phys. Lett. B 298 (1993) 456; R. Akers et al. [OPAL Collaboration], Z. Phys. C 67 (1995) 389; G. Alexander et al. [OPAL Collaboration], Z. Phys. C 72 (1996) 389. K. Ackerstaff et al. [OPAL Collaboration], Eur. Phys. J. C 5 (1998) 239; G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 11 (1999) 239; G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 16 (2000) 423; G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 21 (2001) 23; G. Abbiendi et al. [OPAL Collaboration], Phys. Lett. B 523 (2001) 35; G. Abbiendi et al. [OPAL Collaboration], Phys. Lett. B 559 (2003) 131; D. Decamp et al. [ALEPH Collaboration], Z. Phys. C 54 (1992) 75; A. Heister et al. [ALEPH Collaboration], Eur. Phys. J. C 36 (2004) 147; S. Schael et al. [ALEPH Collaboration], Phys. Lett. B 611 (2005) 66. [30] P. Achard et al. [L3 Collaboration], Phys. Lett. B 547 (2002) 139; G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 36 (2004) 297; S. Schael et al. [ALEPH Collaboration], Phys. Lett. B 606 (2005) 265. [31] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 34 (2004) 127. [32] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 30 (2003) 447. [33] P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 511 (2001) 159. [34] The LEP Collaborations ALEPH, DELPHI, L3 and OPAL, and the LEP W Working Group, “Combined Preliminary Results on the Mass and Width of the W Boson Measured by the LEP Experiments”, note LEPEWWG/MASS/2001- 02, ALEPH 2001-044 PHYSIC 2001-017, DELPHI 2001-122 PHYS 899, L3 Note 2695, OPAL TN-697, contribution to EPS 2001, available at http://delphiwww.cern.ch/pubxx/delnote/public/2001 122 phys 899.ps.gz . [35] J. Abdallah et al. 
[DELPHI Collaboration], “Measurement of the mass and width of the W boson in e+e− collisions at √s = 161-209 GeV”, paper in preparation. [36] N. Kjaer and M. Mulders, “Mixed Lorentz boosted Z0’s”, CERN-OPEN-2001-026. [37] P. Achard et al. [L3 Collaboration], Phys. Lett. B 561 (2003) 202. [38] G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 45 (2006) 291. [39] G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 45 (2006) 307. [40] S. Schael et al. [ALEPH Collaboration], Eur. Phys. J. C 47 (2006) 309.

ABSTRACT

In the reaction e+e- -> WW -> (q_1 qbar_2)(q_3 qbar_4) the usual hadronization models treat the colour singlets q_1 qbar_2 and q_3 qbar_4 coming from two W bosons independently. However, since the final state partons may coexist in space and time, cross-talk between the two evolving hadronic systems may be possible during fragmentation through soft gluon exchange. This effect is known as Colour Reconnection. In this article the results of the investigation of Colour Reconnection effects in fully hadronic decays of W pairs in DELPHI at LEP are presented. Two complementary analyses were performed, studying the particle flow between jets and W mass estimators, with negligible correlation between them, and the results were combined and compared to models. In the framework of the SK-I model, the value for its kappa parameter most compatible with the data was found to be: kappa_{SK-I} = 2.2^{+2.5}_{-1.3}, corresponding to a probability of reconnection P_{reco} in the range 0.31 < P_{reco} < 0.68 at 68% confidence level, with its best value at 0.52.

<|endoftext|><|startoftext|> EVOLUTIONARY NEURAL GAS (ENG): A MODEL OF SELF ORGANIZING NETWORK FROM INPUT CATEGORIZATION.

Ignazio Licata (a) ↑ , Luigi Lella (b)

(a) Ixtucyber for Complex Systems, Marsala, TP and Institute for Scientific Methodology, Palermo, Italy; (b) A.R.C.H.I.
- Advanced Research Center for Health Informatics, Ancona, Italy

ABSTRACT

Despite their claimed biological plausibility, most self organizing networks have strict topological constraints and consequently cannot take into account a wide range of external stimuli. Furthermore, their evolution is conditioned by deterministic laws which are often not correlated with the structural parameters and the global status of the network, as should happen in a real biological system. In nature, environmental inputs are noisy and “fuzzy”. This raises the problem of investigating the possibility of emergent behaviour in a net that is not strictly constrained and is subjected to different inputs. We present here a new model, Evolutionary Neural Gas (ENG), without any topological constraints, trained by probabilistic laws depending on the local distortion errors and the network dimension. The network is considered as a population of nodes that coexist in an ecosystem sharing local and global resources. These features allow the network to adapt quickly to the environment, according to its dimensions. The analysis of the ENG model shows that the net evolves as a scale-free graph, and justifies, in a deeply physical sense, the term “gas” used here.

Key-words: Self-Organizing Networks; Neural Gas; Scale-Free Graph; Information in Network Functional Specialization.

1. INTRODUCTION

Self organizing networks are systems widely used in categorization tasks. A network can be seen as a set A = {c1, c2, …, cn} of units with associated reference vectors w_c ∈ R^n, where R^n is the same space in which the inputs are defined. Each unit (or node) can establish connections with the others; units belonging to the same cluster are subjected to similar modifications of their reference vectors. Self organizing networks can adapt automatically to input distributions without supervision by means of training algorithms that are simple sequences of deterministic rules.
Competitive Hebbian learning and neural gas are the most important strategies used for their training. The neural gas algorithm (Martinetz T.M. and Schulten K.J., 1991) sorts the network units according to the distance of their reference vectors to each input. The reference vectors are then adapted so that those of the first nodes in the rank order are moved closer to the considered input than the others. Competitive Hebbian learning (Martinetz and Schulten, 1991; Martinetz, 1993) consists in increasing the weight of the link connecting the two units whose reference vectors are closest to the considered input (the two most activated units). Both strategies are examples of deterministic rules. There are also other rules that constrain the topology of the network, which then has a fixed dimensionality. This is the case for Self Organizing Maps (Kohonen, 1982) and Growing Cell Structures (Fritzke, 1994).

↑ Corresponding author: Ignazio.licata@ejtp.info

In other cases the network structures have no topological constraints; they take on a well-ordered distribution by adapting exactly to the manifold of the inputs. For example, TRN (Martinetz and Schulten, 1994) and GNG are networks whose final structure is similar to a Delaunay Triangulation (Delaunay, 1934). We have tried to define a new self organizing network that is trained by probabilistic rules and avoids any topological constraints. According to Jefferson (1995), life and evolution are structured into at least four fundamental levels: molecular, cellular, organism and population. We propose an evolutionary algorithm at the population level, in which the network is seen as a population of units whose interactions are conditioned by the availability of resources in their ecosystem. The evolution of the population is driven by a selective process that favours the fittest units. This approach has a biological plausibility.
As stated by recent theories (Edelman, 1987), human brain evolution is subjected to similar selective pressures. Obviously we are not interested in recreating the same structure as the human brain. Our work aims at finding innovative and effective solutions to the categorization problem by adopting the strategies of natural systems. Our system therefore falls within the Artificial Life field (Langton, 1989). Our model is a complex system that shows emergent features. In particular, its structure evolves as a scale free graph. During training, clusters of units arise in which a limited number of nodes establish a large number of links with the others. Scale free graphs are a structure that is very common in natural systems. Human knowledge, for instance, seems to be structured as a scale free graph (Steyvers, Tenenbaum 2001). If we represent words and concepts as nodes, we find that some of these are more connected than the others.

Scale free graphs have three main features. The first is the small world structure: there is a relatively short path between any couple of nodes (Watts, Strogatz, 1998). The second is the inherent tendency to cluster, which is quantified by a coefficient introduced by Watts and Strogatz. Given a node i of degree ki, i.e. having ki edges which connect it to ki other nodes, if those neighbours form a cluster they can establish at most ki(ki-1)/2 edges among themselves. The ratio between the actual number of edges among the neighbours and this maximum number gives the clustering coefficient of node i. The clustering coefficient of the whole network is the average of all the individual clustering coefficients. The third feature is a particular degree distribution that has a power-law tail, P(k) ~ k^(−n). That is why such networks are called “scale free” (Albert, Barabasi, 2000). The three features are quantified by three parameters: the average path length between any couple of nodes, the clustering coefficient and the exponent of the power law tail.
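The clustering coefficient described above can be sketched in a few lines of Python. This is only an illustration of the Watts and Strogatz definition, not code from the ENG model: the adjacency representation (a dictionary mapping each node to the set of its neighbours), the function names and the toy graph are assumptions made here, as is the convention of assigning a coefficient of 0 to nodes with fewer than two neighbours.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """Local clustering coefficient of node i.

    adj maps each node to the set of its neighbours (undirected graph).
    Assumed convention: nodes with fewer than two neighbours get 0.
    """
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Count the edges that actually exist among the neighbours of i.
    actual = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    # The neighbours could establish at most k(k-1)/2 edges.
    return actual / (k * (k - 1) / 2)


def average_clustering(adj):
    """Clustering coefficient of the whole network: mean of the local values."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

In the toy graph, node 0 has three neighbours that could share three edges but share only one (1-2), so its coefficient is 1/3, while nodes 1 and 2 each have a coefficient of 1.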
We’ll show that the values of these parameters in our model seem to confirm its scale free nature.

2. AN OUTLINE ON SELF-ORGANIZATION AND EVOLUTIONARY SYSTEMS

The mechanism of natural selection has been used successfully in many industrial applications, ranging from design to real-time control and neural network training. It was in the 60s that Genetic Algorithms, based on the three main mechanisms of the Theory of Evolution (reproduction, mutation and fitness), were first used in dealing with optimization problems. Although the solution is reached by a population of individuals, systems based on this approach are not considered self organizing because their dynamics depend on the external constraint of the fitness function. In the 80s a new approach to the study of living systems emerged, which combined self organization and evolutionary systems (Rocha, 1997). Its success was due to studies of how biological systems work (metabolism, adaptability, autonomy, self repairing, growth, evolution etc.). Hybrid systems make it possible to better simulate both the evolutionary optimization processes and the modification of the internal structure, to reach a greater biological plausibility in the fitness Neuroevolutionary systems are an example of this approach. In classic neuroevolutionary models the network parameters are genetically set, whereas the connection weights are modified according to a training strategy. This solution follows the classic vision of cerebral development, in which genes control the formation of synaptic connections while their reinforcement depends on neural activity. More recent neuroevolutionary systems are characterized by different forms of self organizing processes, namely cooperative coevolution (Paredis, 1995; Smith, Forrest and Perelson, 1993) and synaptic Darwinism (Edelman, 1987).
Cooperative coevolutionary systems offer a promising alternative to classic evolutionary algorithms (EAs) when we face complex dynamical problems. The main difference with respect to classic EAs is that each individual represents only a partial solution of the problem. Complete solutions are obtained by grouping several individuals. The goal of each individual is to optimize only a part of the solution, cooperating with other individuals that optimize other parts of the solution. This avoids premature convergence towards a single group of individuals. An example of such an approach is the Symbiotic Adaptive Neuroevolution system (SANE; Moriarty and Miikkulainen, 1998), which operates on populations of neural networks. While in most neuroevolutionary systems each individual represents a complete neural network, in SANE each individual represents a hidden unit of a two-layered network. Units are continuously combined and the resulting networks are evaluated on the basis of their performance in a given task. The global effect is equivalent to schema promotion in standard EAs. In fact, during the evolution of the population the neural schemas having the highest fitness values are favoured, and possible mutations in the copies of these schemas do not affect the other copies in the population.

Other recent strategies focus on the evolution of connection schemas in the network. In the human brain the number of synapses established by a single neuron is always much lower than the overall number of neurons. This gives the network a sparsely connected aspect. In recent years several models have been proposed to emulate the mechanism involved in the selection of links without referring to the physical and chemical properties of neurons. The Chialvo and Bak model (Chialvo and Bak, 1999) is based on two simple and biologically inspired principles. First, the neural activity is kept low by selecting the activated units with a winner takes all strategy.
Second, the external environment gives a negative feedback that inhibits active synapses if the network behaviour is not satisfactory. With these simple rules the model operates in a highly adaptive state and in critical conditions (extreme dynamics). The fundamental difference between this strategy, based on synaptic inhibition, and the classic one, based on synaptic reinforcement, is that reinforcement-based learning is by definition a continuous process, while inhibition-based learning stops when the training goal is achieved. Synaptic inhibition is also biologically plausible. According to Young (Young, 1964; Young, 1966), learning is the result of the elimination of synaptic connections (closing of unneeded channels). Dawkins (Dawkins R., 1971) stressed that pattern learning is achieved by synaptic inhibition. As stated by the theory of neural group selection developed by Edelman (Edelman, 1978; Edelman, 1987), brain development is characterized by the generation of structural and dynamical variability within and between populations of neurons, by the interaction of the neural circuit with the environment, and by the differential attenuation or amplification of synaptic connections. Research in neurobiology seems to confirm the validity of the negative feedback model and the fact that neural development follows the process of Darwinian evolution.

The Chialvo and Bak model is a simple two-layered network. After training, each input pattern is associated with a single output unit, leading to the formation of an associative map. When an input pattern is presented, the most activated input unit i is selected. Then the neuron j from the hidden layer that establishes the most robust connection with i is selected. Finally the output neuron k that is the most strongly connected with j is selected. If k is not the desired output, the two links connecting i with j and j with k are inhibited by a coefficient d that is the only parameter of the model.
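The selection and inhibition steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' or Chialvo and Bak's implementation: the weight matrices `w_in` and `w_out`, the function names, the uniform random initialisation and the choice to model inhibition as subtracting d from the two used links are all assumptions made for this example.

```python
import random


def forward(w_in, w_out, i):
    """Follow the strongest links: input i -> hidden j -> output k."""
    j = max(range(len(w_in[i])), key=lambda h: w_in[i][h])
    k = max(range(len(w_out[j])), key=lambda o: w_out[j][o])
    return j, k


def train_step(w_in, w_out, i, target, d):
    """Inhibit the two used links by d when the output is wrong.

    Returns True if the selected output already equals the target.
    Assumption: inhibition is modelled as a plain subtraction of d.
    """
    j, k = forward(w_in, w_out, i)
    if k == target:
        return True
    w_in[i][j] -= d
    w_out[j][k] -= d
    return False


def train(mapping, n_hidden, n_out, d=0.1, max_steps=100_000, seed=0):
    """Apply train_step to randomly chosen inputs until every input is
    mapped to its target, or until max_steps is reached.

    mapping[i] is the desired output unit for input unit i.
    """
    rng = random.Random(seed)
    n_in = len(mapping)
    w_in = [[rng.random() for _ in range(n_hidden)] for _ in range(n_in)]
    w_out = [[rng.random() for _ in range(n_out)] for _ in range(n_hidden)]
    for step in range(max_steps):
        if all(forward(w_in, w_out, i)[1] == mapping[i] for i in range(n_in)):
            return w_in, w_out, step
        i = rng.randrange(n_in)
        train_step(w_in, w_out, i, mapping[i], d)
    return w_in, w_out, max_steps
```

Note that nothing is ever reinforced in this sketch: once every input reaches its desired output, no weight changes any more, which is the sense in which inhibition-based learning stops when the goal is achieved.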
The iterative application of these rules leads to a rapid convergence towards any input-output mapping. This selective process followed by an inhibitory one is the essence of natural selection in the evolutionary context. The fittest individual is selected on the basis of a strategy that doesn’t reward the best but punishes the worst. That is the reason why this model has been considered a particular kind of synaptic Darwinism.

Our neuroevolutionary model is also based on a selection strategy. The structural information of our network is not codified by genes. We directly consider the entire network as a population of nodes that can establish connections, generate other units or die. The probability of these events depends on the presence of local and global resources. If resources are scarce the population shrinks; if resources are plentiful the population grows. As in the Chialvo and Bak model, we don’t select the fittest nodes by reinforcing their links, but simply remove the worst nodes when the ecosystem resources are low. This generates a selective process that indirectly rewards the units which can better model the input patterns. Our evolutionary strategy can be seen as a selective retention process (Heylighen, 1992) that removes those units which cannot reach a stable state, remaining associated with several input patterns. Even if the stability of a unit is quantified by the minimum distortion error related to it, this information must not be considered environmental information. The minimum distortion error simply quantifies the difficulty encountered by the unit during the modelling of input patterns.

3. THE EVOLUTIONARY ALGORITHM

Research has confirmed (Roughgarden, 1979; Song and Yu, 1988) that in natural environments the population size, along with the competition and reproduction rates, changes continuously according to some natural resources and the available space in the ecosystem.
These mechanisms have been reproduced in some evolutionary algorithms, for example to optimize the evolution of a population of chromosomes in a genetic algorithm (Annunziato and Pizzuti, 2000). We have tried to use a similar strategy for the evolution of a population of units in a self organizing network, without using the string representation of genetic programming. In our model each node is defined by a vector of neighbouring units connected to it, a reference vector and a variable D that is the smallest distance between its reference vector and the closest modelled input. The value of this variable quantifies the debility degree of the unit. The lower D is, the higher the unit's chances of survival. At each presentation of the training input set, D is set to the maximum value. After the presentation of a given input x, if the reference vector w of the unit is modified, the resulting distance between the two vectors ||x-w|| is calculated. If this value is lower than D it becomes its new value.

The training algorithm used here can be subdivided into three phases:

1) Winners are selected. For each input the unit having the closest reference vector is selected.

2) The reference vectors of the winners and their neighbours are updated according to the following formula:

(3.1) $w(t+1) = w(t) + \alpha\,(x - w(t))$ .

The reference vectors w of the selected units are thus moved towards the corresponding inputs x by a certain fraction α of the distance that separates them. For winners this fraction is two or three orders of magnitude higher than the one used for their neighbours, so the reference vectors of the winners move more quickly towards the inputs.

3) The population of units evolves, producing descendants, establishing new connections and eliminating the worst performing units. All these events can occur with a well defined probability that depends on the availability of resources.

These rules are iterated until a given goal is achieved.
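The first two phases, together with the bookkeeping of the debility degree D, can be sketched as below. This is a hedged illustration rather than the authors' code: the `Unit` class, the function names and the default learning rates are hypothetical, and "the maximum value" of D is represented here by infinity. The only elements taken from the text are the winner selection, the update rule (3.1) and the rule that D keeps the smallest distance ||x-w|| seen after an update.

```python
import math


class Unit:
    """A node: reference vector, linked neighbours and debility degree D."""

    def __init__(self, w):
        self.w = list(w)          # reference vector
        self.neighbours = set()   # indices of connected units
        self.D = math.inf         # assumption: "maximum value" modelled as infinity


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def adapt(unit, x, alpha):
    """Rule (3.1): w(t+1) = w(t) + alpha * (x - w(t)), then refresh D."""
    unit.w = [wi + alpha * (xi - wi) for wi, xi in zip(unit.w, x)]
    unit.D = min(unit.D, dist(x, unit.w))


def present_epoch(units, inputs, alpha_win=0.1, alpha_nb=1e-4):
    """One presentation of the training set (phases 1 and 2 only).

    alpha_nb is three orders of magnitude below alpha_win; both values
    are illustrative defaults, not values given in the text.
    """
    for u in units:
        u.D = math.inf  # D is reset at each presentation of the input set
    winners = []
    for x in inputs:
        # Phase 1: the winner is the unit with the closest reference vector.
        win = min(range(len(units)), key=lambda c: dist(units[c].w, x))
        # Phase 2: move the winner strongly and its neighbours weakly.
        adapt(units[win], x, alpha_win)
        for nb in units[win].neighbours:
            adapt(units[nb], x, alpha_nb)
        winners.append(win)
    return winners
```

Units that are never selected keep D at infinity for that epoch, which in this sketch marks them as the weakest candidates for the evolutionary phase.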
For example the minimization of the expected quantization error, that is, the mean of the distances between the winners and the K inputs they model:

(3.2) $D = \frac{1}{K}\sum_{i=1}^{K} \lVert x_i - w_i \rVert$

If this value falls below a certain threshold Dmin, training is stopped. The first two phases can be considered a kind of winner takes all strategy, where only the most activated units are selected and enabled to modify their reference vectors. The third phase is the evolutionary phase (fig. 3.1). Each unit i, i = 1, …, N(t), where N(t) is the current population size, can meet the closest winner j with probability Pm.

Fig. 3.1 – The evolutionary phase of the algorithm.

If meeting occurs, the two units establish a link and they can interact by reproducing with probability Pr. In this case two new units are created. One is closer to the first parent, the other to the second parent: (3.3) If reproduction doesn’t take place due to the lack of resources, the weakest unit of the population, i.e. the one with the highest debility degree, is removed. If unit i doesn’t meet any winner it can interact with the closest node k with probability Pr, establishing a connection and producing a new unit whose reference vector is set between the parents' reference vectors:

(3.4) $w = \frac{w_{p_1}}{2} + \frac{w_{p_2}}{2}$

When we fix a maximum population size, the ratio between the current size and the threshold, N(t)/Nmax, can be seen as a global resource of the ecosystem affecting the probabilities of the events. For example, if the population size is low the reproduction rate should be high, so we can reasonably choose Pr = 1-N(t)/Nmax. If the population size is high, the chance for the units to meet each other will be higher, so we can set Pm = N(t)/Nmax. We can also consider a local resource, namely the ratio between the threshold Dmin and the debility degree Di of the unit i. Each unit i should then meet a winner with a probability Pm = (N(t)/Nmax)(1-Dmin/Di), with Pr = 1 − Pm.
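The two choices of probabilities can be written down as a small helper, together with the expected change in population per unit that follows from one reading of the events described above (meeting plus reproduction adds two units, meeting without reproduction removes one, no meeting plus reproduction adds one). The function names are hypothetical, and setting Pm to 0 when Di is not larger than Dmin is an assumption made here so that the result stays a valid probability; the text does not state how that case is handled.

```python
def event_probabilities(n, n_max, d_i=None, d_min=None):
    """Return (Pm, Pr) for a population of size n with maximum size n_max.

    Without d_i and d_min only the global resource N(t)/Nmax is used:
    Pm = N/Nmax and Pr = 1 - N/Nmax.
    With d_i and d_min the local resource Dmin/Di is included:
    Pm = (N/Nmax)(1 - Dmin/Di) and Pr = 1 - Pm.
    """
    x = n / n_max
    if d_i is None:
        return x, 1.0 - x
    if d_i <= d_min:
        # Assumption: a unit already below the threshold does not migrate.
        p_m = 0.0
    else:
        p_m = x * (1.0 - d_min / d_i)
    return p_m, 1.0 - p_m


def expected_change_per_unit(p_m, p_r):
    """Expected number of units gained per existing unit in one step.

    meeting and reproduction:        +2 (two new units)
    meeting without reproduction:    -1 (weakest unit removed)
    no meeting, reproduction with k: +1 (one new unit)
    """
    return 2 * p_m * p_r - p_m * (1 - p_r) + (1 - p_m) * p_r
```

For a half-full ecosystem, `event_probabilities(50, 100)` gives Pm = Pr = 0.5; including a local resource with Di twice Dmin lowers Pm to 0.25 and raises Pr to 0.75.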
In this way winners are not encouraged to migrate to other groups of nodes and weaker units don’t participate in reproduction activities. We can estimate the population growth rate in the following way:

(3.5)
$N(t+1) = N(t) + 2 P_m P_r N(t) - P_m (1 - P_r) N(t) + (1 - P_m) P_r N(t) = 2 N(t) \left(1 - P_m^2\right)$
$X(t+1) = 2 X(t) \left(1 - X^2(t)\right)$ (first model)
$X(t+1) = 2 X(t) \left[1 - X^2(t) \left(1 - \frac{D_{min}}{D}\right)^2\right]$ (second model)

where X(t) is the normalized size N(t)/Nmax and the second equality in the first line uses Pr = 1 − Pm, which holds in both models. This is the quadratic-logistic map of Annunziato and Pizzuti (Annunziato and Pizzuti, 2000):

(3.6) $X(t+1) = a X(t) \left(1 - X^2(t)\right)$

They proved that by varying the parameter a different chaotic regimes arise. For a<1.7 the behaviour is not chaotic, for 1.7<|startoftext|>
X-ray Dichroism and the Pseudogap Phase of Cuprates

S. Di Matteo1, 2 and M. R. Norman3

Laboratori Nazionali di Frascati INFN, via E. Fermi 40, C.P. 13, I-00044 Frascati, Italy
Equipe de physique des surfaces et interfaces, UMR-CNRS 6627 PALMS, Université de Rennes 1, 35042 Rennes Cedex, France
Materials Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
(Dated: October 26, 2018)

A recent polarized x-ray absorption experiment on the high temperature cuprate superconductor Bi2Sr2CaCu2O8+x indicates the presence of broken parity symmetry below the temperature, T*, where a pseudogap appears in photoemission. We critically analyze the x-ray data, and conclude that a parity-breaking signal of the kind suggested is unlikely based on the crystal structures reported in the literature. Possible other origins of the observed dichroism signal are discussed. We propose x-ray scattering experiments that can be done in order to determine whether such alternative interpretations are valid or not.

PACS numbers: 78.70.Ck, 78.70.Dm, 75.25.+z, 74.72.Hs

I.
INTRODUCTION

Twenty years after the discovery of high-temperature cuprate superconductivity, there is still no consensus on its origin. As the field has evolved, more and more attention has been directed to the pseudogap region of the phase diagram in underdoped compounds and the possible relation of this phase to the superconducting one [1]. Time-reversal breaking has been predicted to occur in this pseudogap phase due to the presence of orbital currents [2], and a subsequent experiment [3] using angle-dependent dichroism in photoemission has claimed to detect this. However, this result has been challenged by others [4], and an independent experimental verification would be highly desirable.

Recently, Kubota et al. [5] performed Cu K edge circular and linear dichroism x-ray experiments for underdoped Bi2Sr2CaCu2O8+x (Bi2212), claiming that no time-reversal breaking of the kind predicted in Ref. 2 exists and that, on the contrary, a parity-breaking signal (but time-reversal even) is present with the same temperature dependence as the photoemission dichroism signal, which was interpreted as x-ray natural circular dichroism (XNCD) as seen in other materials [6].

The aim of the present paper is to critically examine the conclusions of Kubota et al. [5]. In particular, we find that the XNCD signal for the average [7] Bi2212 crystal structure should be zero along all three crystallographic axes, therefore casting doubt on the original interpretation of Ref. 5. To look into alternate explanations, we performed detailed numerical simulations aimed at explaining the observed signal. At the basis of our study is the simple observation (see, e.g., Ref. 8) that circular dichroism in absorption can be generated either by a non-magnetic effect in the electric dipole-quadrupole (E1-E2) channel (XNCD, a parity-breaking signal), or by a magnetic signal in the E1-E1 channel (parity-even). Alternately, such a signal can be due to contamination from x-ray linear dichroism (XNLD).
We propose x-ray experiments that could be used to investigate these matters further.

The structure of the paper is as follows. In Sec. II we show how symmetry constrains any possible XNCD signal that would be observed in Bi2212. We also perform numerical simulations for XNCD assuming an alignment displaced from the c-axis, using several crystal structure refinements proposed in the literature. We also calculate the XNLD signal and comment on whether XNLD contamination could be responsible for the observed signal. In Sec. III we calculate the x-ray magnetic circular dichroism (XMCD) signal at both the Cu K and L edges, assuming magnetic order on either the copper or oxygen sites. Finally, in Sec. IV, we draw some general conclusions from our work.

II. NON-MAGNETIC CIRCULAR DICHROISM IN BI2212

The structure of Bi2212 is strongly layered, with insulating BiO blocks intercalated between superconducting CuO2 planes. Crystal structure refinements reveal the presence of an incommensurate modulation whose origin has been the subject of much debate. The presence of this modulation has complicated the determination of the average crystal structure. Two different average structures have been proposed in the literature for Bi2212: Bbmb [9,10] and Bb2b [11,12,13,14], where b is the modulation direction for the superstructure. We follow the general convention in the cuprate literature and use a rotated basis compared to those in the International Tables for Crystallography [15] (respectively, Cccm, No. 66, and Ccc2, No. 37). In this way, the c-crystallographic direction is orthogonal to the CuO2 planes.
The Bbmb structure is globally centrosymmetric and, as such, does not admit a non-zero value for the parity-odd operator $\vec{L}\cdot(\hat{\epsilon}^{*}\times\hat{\epsilon})(\vec{\Omega}\cdot\hat{k})$, whose expectation value determines the XNCD signal ($\vec{L}$, $\hat{k}$, $\hat{\epsilon}$ and $\vec{\Omega}$ are, respectively, the angular momentum operator, the direction and polarization of the x-ray beam, and the toroidal momentum operator; see, e.g., Refs. 8,16). Therefore, if the signal measured by Kubota et al. [5] were a true XNCD signal, this would imply a lower crystal symmetry than Bbmb. We note that most refinements in the literature suggest Bbmb [9,10], and this crystal structure is also consistent with recent photoemission data [17] that indicate the presence of both a glide plane and a mirror plane based on dipole selection rules.

http://arxiv.org/abs/0704.0599v2

The other average structure that has been suggested by x-ray and neutron diffraction is Bb2b [11,12,13,14]. This space group is not centrosymmetric and, therefore, a parity-breaking signal like that of XNCD is in principle allowed. However, not all wave vector directions are compatible with the presence of an XNCD signal, as demonstrated below by symmetry considerations. In the last part of Section II, we numerically calculate the XNCD for a geometrical configuration allowing a signal, such as $\vec{k}\parallel(101)$, by means of the multiple-scattering subroutine in the FDMNES program [18]. In the context of this program, atomic potentials are generated using a local density approximation with a Hedin-Lundqvist form for the exchange-correlation energy.
These potentials are then used in a muffin-tin approximation to calculate the resulting XANES signal by considering multiple scattering of the photoelectron about the absorbing site within a one-electron approximation [19]. In the future, it would be desirable to repeat these calculations using input from self-consistent band theory, as has recently been done for the Bbmb structure with regard to angle-resolved photoemission spectra [20].

In the Bb2b setting, Cu ions belong to sites of Wyckoff multiplicity 8d. These eight equivalent copper sites can be partitioned into two groups of four sites that are related by the vector (1/2,0,1/2). Within each group the four sites are related by the symmetry operations {Ê, Ĉ2y, m̂x, m̂z}, where Ê is the identity, Ĉ2y is a two-fold axis around the b crystallographic axis, and m̂x(z) is a mirror-symmetry plane orthogonal to the a(c)-axis. The absorption at the Cu K edge, expressed in Mbarn, can be calculated through the equations:

$$\sigma^{(\pm)} = \sum_{j=1}^{8} \sigma_j^{(\pm)}, \qquad (1)$$

$$\sigma_j^{(\pm)} = 4\pi^2 \alpha \hbar\omega \sum_n \left|\langle \Psi_n^{(j)} | \hat{O}^{(\pm)} | \Psi_0^{(j)} \rangle\right|^2 \delta[\hbar\omega - (E_n - E_0)]. \qquad (2)$$

The operator $\hat{O}^{(\pm)} \equiv \hat{\epsilon}^{(\pm)}\cdot\vec{r}\,(1 + i\,\vec{k}\cdot\vec{r})$ in Eq. (2) is the usual matter-radiation interaction operator expanded up to dipole (E1) and quadrupole (E2) terms, with the photon polarization $\hat{\epsilon}$ and the wave vector $\vec{k}$, where we label left- and right-handed polarization by the superscript ±. $\Psi_0^{(j)}$ ($\Psi_n^{(j)}$) is the ground (excited) state of the crystal, and $E_0$ ($E_n$) its energy. The sum in Eq. (2) is extended over all the excited states of the system and ħω is the energy of the incoming photon, with α the fine-structure constant. Finally, the index j = 1, ..., 8 indicates the lattice site of the copper photoabsorbing atom in the unit cell. Eqs. (1) and (2) are the basis of the numerical calculations of the FDMNES program [18].

[Figure 1: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: Pet (XNCD x 1000), Glady (XNCD x 400), Kan (XNCD x 100).]

FIG. 1: (Color online) XANES signal for $\vec{k}\parallel(001)$ and XNCD signal for $\vec{k}\parallel(101)$ at the Cu K edge for three Bb2b crystal refinements, with a cluster radius of 4.9 Å. The crystal structures are Pet for Petricek et al. [11], Glady for Gladyshevskii and Flükiger [13], and Kan for Kan and Moss [12]. The XNCD signals have been multiplied by the factors indicated. Each successive set of curves is displaced by 0.4 Mbarn.

The eight contributions can be written as the sum of two equal parts coming from the two subsets of four ions related by the (1/2,0,1/2) translation. Within each subset, the four absorption contributions are related to one another by the symmetry operations σ2 = Ĉ2y σ1, σ3 = m̂x σ1, and σ4 = m̂z σ1, implying that the total absorption is:

$$\sigma = 2(1 + \hat{m}_z)(1 + \hat{m}_x)\sigma_1. \qquad (3)$$

The group from σ5 to σ8 is equivalent to the first group of four modulo a translation (this is the reason for the factor of two in Eq. (3)). Notice that the symmetry operators in Eq. (3) are meant to operate just on the electronic part of the operator Ô in Eq. (2).

In the case of circular dichroism, the signal is given by σ = σ+ − σ−. If we suppose that no net magnetization is present in the material (we shall analyze the possibility of magnetism in Sec. III), then the dichroism is natural, i.e., necessarily coming from the interference E1-E2 contribution in Eq. (2). In this case the signal is parity-odd, which implies that m̂z(x) ≡ Î Ĉ2z(2x) → −Ĉ2z(2x) (Î is the inversion operator). Then Eq. (3) becomes

$$\sigma = 2(1 - \hat{C}_{2z})(1 - \hat{C}_{2x})\sigma_1, \qquad (4)$$

which implies that, of the five possible second-rank tensors $T^{(2)}_m$ involved in XNCD, only the term $T^{(2)}_{1} - T^{(2)}_{-1}$ survives. To arrive at this result, we applied the usual operator rules on spherical tensors [21]: $\hat{C}_{2x} T^{(2)}_m = T^{(2)}_{-m}$ and $\hat{C}_{2z} T^{(2)}_m = (-)^m T^{(2)}_m$. This in turn leads to a zero XNCD along the three crystallographic axes of the Bb2b crystal structure where, e.g., along the c-axis, the signal is proportional to $T^{(2)}_0$.

Therefore, even in the Bb2b crystal structure, the XNCD is exactly zero by symmetry when the wave vector is directed along the c-axis (i.e., orthogonal to the CuO2 planes) as in the experiment of Ref. 5.
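The selection rule following from Eq. (4) can be checked mechanically. The sketch below is only an illustration, not part of the FDMNES calculation: it assumes the transformation rules for second-rank spherical tensor components T_m quoted in the text (C2x T_m = T_{-m} and C2z T_m = (-1)^m T_m), and the function names are hypothetical. It applies (1 - C2z)(1 - C2x) to each component in turn.

```python
# Components m = -2..2 of a second-rank spherical tensor T_m.
M_VALUES = (-2, -1, 0, 1, 2)

def c2x(coeffs):
    """Apply C2x to a linear combination, using the rule C2x T_m = T_{-m}."""
    return {-m: c for m, c in coeffs.items()}

def c2z(coeffs):
    """Apply C2z to a linear combination, using the rule C2z T_m = (-1)^m T_m."""
    return {m: (-1) ** abs(m) * c for m, c in coeffs.items()}

def subtract(u, v):
    """Component-wise difference u - v of two linear combinations."""
    return {m: u[m] - v[m] for m in M_VALUES}

def project(coeffs):
    """Apply (1 - C2z)(1 - C2x), i.e. the operator of Eq. (4) without the factor 2."""
    inner = subtract(coeffs, c2x(coeffs))
    return subtract(inner, c2z(inner))

def basis(m0):
    """Coefficient dictionary representing the single component T_{m0}."""
    return {m: (1 if m == m0 else 0) for m in M_VALUES}

for m0 in M_VALUES:
    print(m0, project(basis(m0)))
```

Under these assumed rules, the components m = 0 and m = ±2 are annihilated, while m = ±1 map onto multiples of T_1 - T_{-1}, in agreement with the statement that a signal proportional to the m = 0 component (wave vector along the c-axis) vanishes.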
The vanishing of the XNCD along the c-axis has been further checked by numerical calculations with cluster radii up to 6.5 Å, i.e., 93 atoms, centered on the Cu ion, based on the average crystal structures reported in Refs. 11,12,13,14. The only ways to justify theoretically the experimental evidence of circular dichroism of a non-magnetic nature are a lowering of the orthorhombic Bb2b symmetry, a misalignment of the $\hat{k}$-direction with respect to the c-axis, or contamination from linear dichroism. We checked all of these possibilities.

If we take into account the monoclinic supercell proposed in Ref. 13, with space group Cc, this implies a reduction of the symmetry operations, with the loss of the two-fold screw axis. Nonetheless, the glide plane containing the normal to the CuO2 planes is still present (m̂x in Eq. (3)), which is responsible for the extinction rule of the quantity $\langle\Psi_n|\vec{L}\cdot(\hat{\epsilon}^{*}\times\hat{\epsilon})(\vec{\Omega}\cdot\hat{k})|\Psi_n\rangle$. Therefore the XNCD is again identically zero by symmetry, which we verified by direct numerical simulation of the supercell. Note that only a reduction to triclinic symmetry would allow for XNCD in the direction orthogonal to the CuO2 planes.

We also checked for the possibility of misalignment, as shown in Fig. 1, by a direct calculation with $\vec{k}\parallel(101)$, corresponding to a tilting θ ∼ 10° with respect to the c-axis (note that the a lattice constant is ∼ 5.4 Å, and the c one ∼ 30.9 Å). We first remark that the calculated XANES signal compares well to experiment (the XANES for $\vec{k}\parallel(001)$ and $\vec{k}\parallel(101)$ are identical on the scale of Fig. 1). Despite this, the energy profile of the XNCD signal is different from the one reported by Kubota et al. [5]. The main difference is the energy extension of the calculated dichroic signal, whose oscillations persist, though with decreased intensity, more than 50 eV above the edge itself.
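The quoted tilt of about 10° follows from the lattice constants alone. The short Python sketch below reproduces the estimate, assuming that (101) denotes the direct-lattice direction a + c of an orthorhombic cell (an interpretation chosen here because it matches the quoted angle) and using the approximate lattice constants given in the text; the function name is hypothetical.

```python
import math

A_LATTICE = 5.4   # angstrom, approximate a lattice constant quoted in the text
C_LATTICE = 30.9  # angstrom, approximate c lattice constant quoted in the text

def tilt_from_c_axis(h, l, a=A_LATTICE, c=C_LATTICE):
    """Angle in degrees between the direction h*a + l*c and the c-axis,
    for an orthorhombic cell (mutually orthogonal axes)."""
    return math.degrees(math.atan2(h * a, l * c))

print(round(tilt_from_c_axis(1, 1), 1))  # prints 9.9
```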
This extended energy range is present for all four Bb2b refinements we have looked at (as well as for the monoclinic supercell, which we do not show) and is at variance with the experimental results of Ref. 5, where the dichroic signal is confined to a small energy range around the edge.

Some comments on the calculations are in order. Each refinement gives rise to a different XNCD signal, and their intensities are quite different as well. They are found to be strongly dependent on the magnitude of the deviation of the atoms from their Bbmb positions, which differs significantly among the various Bb2b refinements. Moreover, the structure of Kan and Moss leads to a manifestly different energy profile. This difference is seen even for CuO4 and CuO5 clusters (the results for a CuO5 cluster are shown in Fig. 2) and has to do with the large departure of this particular structure from the Bbmb one. Although there has been some criticism in the literature concerning this particular refinement [10], the point we wish to make is that each refinement has a different XNCD signal, showing how sensitive this signal is to the actual crystal structure. We note that Fig. 1 was computed for a cluster radius of 4.9 Å, i.e., 37 atoms. In Fig. 3, we show results for the refinement of Petricek et al. [11] up to a radius of 6.5 Å (93 atoms), showing the development of additional structure in the energy profile as more and more atoms are included in the cluster.

[Figure 2: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: Pet (XNCD x 1000), Kan (XNCD x 100), Glady (XNCD x 400).]

FIG. 2: (Color online) XNCD signal, as in Fig. 1, but for a CuO5 cluster.

[Figure 3: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: R=2.1, R=3.1, R=4.1, R=4.9, R=5.5, R=6.5; XNCD x 1000.]

FIG. 3: (Color online) XNCD signal at the Cu K edge for $\vec{k}\parallel(101)$ as a function of the cluster radius (in Å) for the crystal structure of Petricek et al. [11]. The signal has been multiplied by 1000. Each successive curve is displaced by 0.06 Mbarn.
In the real system, the effective cluster radius is limited by the photoelectron escape depth, which is energy dependent [22].

We also remark that the magnitude of the XNCD signal we calculated for a 10 degree misalignment is comparable to that measured in Ref. 5. On the other hand, the signal goes as sin(2θ), where θ measures the displacement from the c-axis. We note that Kubota et al. [5] mention that their signal was insensitive to displacements from the c-axis of 5 degrees, which would argue against a misalignment given the strong angular dependence we predict. Moreover, we note that the size of the signal only depends on the projection of the k vector onto the a-c plane, i.e., a signal for $\vec{k}\parallel(111)$ is equivalent to that for $\vec{k}\parallel(101)$.

The above leads us to suspect that neither misalignment nor symmetry reduction is the basis of the signal detected in Ref. 5. We now turn to the third possibility for a non-magnetic signal, that due to intermixing of linear dichroism. All x-ray beams at a synchrotron have a linear polarization component (Kubota et al. [5] mention the possibility of up to 5% of linear admixture). The resulting linear dichroism, which vanishes for a uniaxial crystal, can swamp the intrinsic XNCD signal for a biaxial crystal where the a and b directions are inequivalent (like Bi2212). This was shown by Goulon et al. for a KTiOPO4 crystal with the same point group symmetry (mm2) as Bi2212 [23].

To analyze this further, we show the linear dichroism (the XANES signal for $\vec{E}\parallel(010)$ minus the one for $\vec{E}\parallel(100)$), calculated for the Bb2b refinement of Gladyshevskii et al. [14], for various cluster radii, in Fig. 4. Similar results have been obtained for the other crystal structures, including the Bbmb refinement of Miles et al. [10]. The energy profile, with a positive peak followed by a negative peak, and its location at the absorption edge, is very reminiscent of the data.
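To make the misalignment argument above concrete, the following Python sketch evaluates how strongly an XNCD signal would change with tilt angle, assuming the pure sin(2θ) dependence stated in the text and normalizing to the 10 degree case. The function name and the choice of sample angles are illustrative assumptions.

```python
import math

def xncd_relative(theta_deg, theta_ref_deg=10.0):
    """XNCD amplitude at tilt theta relative to the amplitude at theta_ref,
    assuming a pure sin(2*theta) angular dependence."""
    numerator = math.sin(math.radians(2.0 * theta_deg))
    denominator = math.sin(math.radians(2.0 * theta_ref_deg))
    return numerator / denominator

for theta in (0.0, 2.5, 5.0, 10.0):
    print(f"theta = {theta:4.1f} deg -> relative XNCD = {xncd_relative(theta):.2f}")
```

Under this assumption a 5 degree tilt gives roughly half the signal of a 10 degree tilt, so a misalignment-induced XNCD would not be insensitive to displacements of that size.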
Moreover, the size of the XNLD signal is large, meaning that only a few percent admixture is necessary to explain the size of the signal seen in Ref. 5. One issue is that Kubota et al. [5] did report the existence of an XNLD signal, but also claim that it is temperature independent. This is somewhat puzzling, as there are significant changes of the lattice constants with temperature [24]. One obvious question would be why such an XNLD contamination would only appear below T*, though it should be remarked that there are anomalies in the superstructure periodicity near T* [24]. A definitive test would be to rotate the sample under the beam, as any XNLD signal would vary as cos(2φ), where φ is the in-plane angle relative to the b-axis. Any circular dichroism (XNCD or XMCD) is instead φ-independent.

[Figure 4: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: R=3.0, R=4.1, R=5.1; XANES; XNLD x 50.]

FIG. 4: (Color online) XNLD signal at the Cu K edge for $\vec{k}\parallel(001)$ as a function of the cluster radius (in Å) for the crystal structure of Gladyshevskii et al. [14]. The XNLD signal has been multiplied by 50. The XANES curve has been displaced by 0.05 Mbarn.

A final possibility would be a small energy shift between the left and right circularly polarized beams. Differentiating the absorption edge in Fig. 1 would indeed lead to a signal similar to that seen in Ref. 5 (but with an enhanced positive peak relative to the negative peak). But such an energy shift is difficult to imagine with the particular experimental setup used.

The observed dichroism signal as a function of energy is also reminiscent of that typically seen for magnetic circular dichroism: in this case the signal would be of E1-E1 origin and its main features are indeed expected to be at the edge itself.
In addition, the nature of the observed signal (a single sharp positive peak followed by a single sharp negative peak) is also much like a magnetic signal where the main features are expected to be more localized in energy starting from the rising edge of the absorption. Whether this possibility is realistic or not can only be determined by a quantitative calculation, which we offer in the next section.

III. MAGNETIC DICHROISM IN BI2212

Over the years, there have been several claims of possible magnetic order in the pseudogap phase of cuprate superconductors. Recently, a magnetic signal at a (101) Bragg vector has been observed below T* for several underdoped YBa2Cu3O6+x (YBCO) samples by polarized neutron diffraction [25]. The signal, corresponding to a moment of order 0.05-0.1 µB, is not simple ferromagnetism as it was not observed at the (002) Bragg vector. Even more recently, a Kerr rotation below T* has been detected in underdoped YBCO corresponding to a net ferromagnetic moment of $10^{-4}$ µB.

[Figure 5: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: XANES Kan, XMCD Pet, XMCD Kan, XMCD Glady; XMCD x 10000.]

FIG. 5: (Color online) XMCD for $\vec{k}\parallel(001)$ at the Cu K edge with a ferromagnetic moment of 0.1 µB along the c-axis at each copper site. The cluster radius is 4.1 Å. The XMCD signal has been multiplied by 10000. The crystal structures are those of Fig. 1.

This motivates us to consider the possibility of a magnetic origin for the x-ray circular dichroism detected in Ref. 5. We restate that in absorption (see, e.g., Ref. 8), circular dichroism can be generated either by a non-magnetic effect in the E1-E2 channel (XNCD, a parity-breaking signal), or by a magnetic signal in the E1-E1 channel (XMCD, parity-even). The first possibility, analyzed in the previous section, does not seem to be compatible with the experimental results.
In order to analyze the second possibility, we need to provide the lattice with a magnetic structure that has a net magnetization (otherwise, the XMCD is zero). In what follows, we shall suppose two magnetic distributions, the first with the magnetic moments on the Cu sites, the second on the planar O sites. The numerical calculations are performed with the relativistic extension of the multiple-scattering program in the FDMNES code [18], and provide results that are an extension of those of the previous section.

The details of the calculations are as follows: we again used the average crystal structures discussed in Section II. For each magnetic configuration we employed clusters with radii ranging from 3.1 Å (a CuO5 cluster) to 4.9 Å (37 atoms) around the Cu photoabsorbing ion. In the first set of calculations, shown in Figs. 5 and 6, we built the input potential from a magnetic configuration of 4.55 3d↑ electrons and 4.45 3d↓ electrons (i.e., a moment of 0.1 µB per copper site). In the second set of calculations, shown in Fig. 7, we built the input potential from a magnetic configuration of 2.05 2p↑ electrons and 1.95 2p↓ electrons (i.e., a moment of 0.1 µB per planar oxygen site).

[Figure 6: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: XMCD Pet, XMCD Kan, XMCD Glady; x 10000.]

FIG. 6: (Color online) XMCD as in Fig. 5, but with a cluster radius of 3.1 Å, i.e., a CuO5 cluster.

[Figure 7: plot not reproduced. Horizontal axis: Energy (keV), 8.98 to 9.02. Curve labels: XANES Kan, XMCD Pet, XMCD Kan, XMCD Glady; XMCD x 5000.]

FIG. 7: (Color online) XMCD for $\vec{k}\parallel(001)$ at the Cu K edge with a ferromagnetic moment of 0.1 µB at each planar oxygen site. The XMCD signal has been multiplied by 5000. The XANES signal has been displaced by 0.02 Mbarn.

The following results are noteworthy:

a) Unlike the XNCD calculations shown in Fig. 1, all crystal structures give basically the same XMCD spectra.
The reason for this behavior may be related to the fact that XMCD, when x-rays are orthogonal to the CuO2 planes, mainly depends on the in-plane magnetization density and on the in-plane crystal structure, which is quite similar for the various refinements.

b) There is a more marked dependence on the cluster radius compared to the XNCD, as shown by the comparison of Fig. 5 and Fig. 6. The calculations with a radius bigger than 4.1 Å (6 Cu, 9 O, 4 Sr, and 4 Ca), above the pre-edge energy, are basically equivalent to those shown in Fig. 5, with a positive peak at the edge energy, followed by a double negative peak, the latter at variance with the experimental results. On the contrary, the energy shape obtained for a radius of 3.1 Å is very close to the experimental one, with a single negative peak after the sharp positive one, with relatively good agreement in the energy position and width. We could be tempted to suppose, therefore, that the virtual photoelectron has a very small mean free path before decaying and is sensitive just to the nearest neighbor oxygens. Indeed we checked that an identical XMCD profile is obtained with just the in-plane CuO4 cluster. On the other hand, the size of the signal we calculate is about an order of magnitude smaller than that seen in Ref. 5. Since the XMCD signal is proportional to the moment, we would need a moment of ∼1 Bohr magneton per copper to have a comparable signal. Such a huge moment would have been observed previously by neutron scattering if it existed. Of course we cannot exclude that spurious effects, such as strain fields, could have influenced the measurement.

[Figure 8: plot not reproduced. Horizontal axis: Energy (eV), 930 to 960. Curve labels: XANES Pet, XMCD Pet, XMCD Kan, XMCD Glady; XMCD x 10.]

FIG. 8: (Color online) XMCD for $\vec{k}\parallel(001)$ at the Cu L2,3 edges with a ferromagnetic moment of 0.1 µB along the c-axis at each copper site. The cluster radius is 4.1 Å. The XMCD signal has been multiplied by 10.
c) The energy profile in the case of magnetization at the oxygen sites is not much different from the copper case, except for the deeper negative peak around E ∼ 8.994 keV, as shown in Fig. 7. Also in this case the different crystal structure refinements give basically equivalent results, as again the CuO2 planes are practically equivalent in the various cases. Note that the relative intensity is equivalent to the copper case, as 0.1 Bohr magnetons per planar oxygen corresponds to 0.2 Bohr magnetons per CuO2 cell (note we multiply by 5000 in Fig. 7 as compared to 10000 in Fig. 5).

We also performed simulations for the Cu L2,3 edges for the magnetic configuration corresponding to Fig. 5, as shown in Fig. 8, which can be compared with future experimental investigations in order to confirm whether or not a net magnetization exists in this compound.

Finally, we remark that the dependence of the XMCD signal on the tilting angle θ (i.e., the displacement of the photon wave vector from the c-axis) goes like cos(θ) and therefore the signal is not very sensitive to small displacements of 5 degrees, as noted by Kubota et al. [5]. This different angular dependence from the XNCD signal suggests a relatively easy way to unravel the question experimentally: it is sufficient to measure the θ (azimuthal) dependence of the signal, noting that any XNLD contamination would be tested by the φ (polar) dependence of the signal.

IV. CONCLUSIONS

In our opinion, the experiments of Kubota et al. [5] have raised more questions than they have answered. Although not treated in our paper, we believe that their results from the measurement of non-reciprocal linear dichroism are at this stage not conclusive, as only one direction for the toroidal moment has been investigated, of the two possible suggested by the orbital current pattern of Varma [2]. The analysis performed in Section II showed, moreover, that the claimed XNCD signal is probably unjustified.
In fact, even though the space group Bb2b is non-centrosymmetric, XNCD is absent by symmetry when the x-ray wave vector is chosen orthogonal to the CuO2 planes, as in the experimental measurement geometry of Ref. 5. The same extinction rule survives for the monoclinic supercell structure refined in Ref. 13. Moreover, in both cases, it seems hard to maintain the hypothesis of misalignment, due to the experimental localization of the energy profile around the main absorption edge, which is absent in the calculations. We also note the difference of XNCD from the photoemission dichroism results of Kaminski et al. [3]. A direct comparison is however not immediate, as the former represents a q̂-integrated version of the latter (here q̂ is the solid angle in the space of the photoelectron wave vector; see, e.g., Ref. 8).

A more likely explanation is an XNLD contamination (Fig. 4), but then the challenge is to understand why such an effect would only exist below T*. We note that an optics experiment for an optimally doped Bi2212 sample has seen a change in linear birefringence below Tc, which was accompanied by a non-zero circular birefringence [28]. In addition, both Bi2223 [29] and the Fe analogue of Bi2212 [30] exhibit supercells with 222 space groups which would allow for dichroism. So, it is conceivable that there is a subtle structural transition associated with T*, which we suggest could be looked for by diffraction experiments.

A final comment about the physical quantities detected by x-ray circular dichroism, either natural or magnetic, is in order. At the K edge of transition metal oxides, XMCD in the E1-E1 channel, at the excitation energy $E = \hbar\omega - E_{\rm edge}$, gives information on the expectation value of $\hat{L}\cdot(\hat{\epsilon}^{*}\times\hat{\epsilon})$ for the excited states at the energy E. The orbital angular momentum is either induced from a spin moment via spin-orbit coupling (as calculated here) or directly by an orbital current (as in the scenario advocated in Ref. 2 [31]).
No direct spin information is available at the K edge and therefore an XMCD measurement is not directly related to the ground state magnetic moment as for the L edge. Moreover, the observed states are those with p-like angular momentum projection on the photoabsorbing Cu ion, which are extended and therefore mainly sensitive to the influence of the oxygen atoms surrounding the Cu site. In this case, the main contribution to the XMCD energy profile is expected in an energy range of 10-20 eV from the main edge to the first shoulder in the XANES spectrum, as found in Ref. 5 and in our own XMCD calculations.

Although the results of Section III are in principle consistent with Ref. 5, the size of the ferromagnetic moment necessary to get a signal of the magnitude seen by experiment, ∼1 Bohr magneton, is excessive. If such a large ferromagnetic moment existed, it would have surely been seen by neutron scattering. From this point of view, experiments performed at the Cu L edge and O K edge would be desirable as they are more sensitive to the presence of a magnetic moment.

Finally, we would like to remark that XNCD along the c-axis is insensitive to orbital currents. These latter, confined to the CuO2 planes, develop a parity breaking characterized by a toroidal moment ($\vec{\Omega}$) within the CuO2 planes. The XNCD experiment of Ref. 5 would only be sensitive to the projection of the toroidal moment out of this plane (i.e., along the direction of the x-ray wave vector). Therefore, if performed as stated, it cannot tell us about any possible orbital current order.

To conclude, we believe that the various interpretations, XNCD, XNLD, or XMCD, have their drawbacks, and therefore the origin of the experimental signal of Ref. 5 is still open. In this sense, further experimental checks of the energy extension of the dichroic signal would be highly desirable.
Based on our results, the most stringent experimental test on the physical origin of the signal would come from the measurement of the dependence on the tilting (azimuthal) angle θ, due to the different dependences of XNCD and XMCD, as well as the dependence on the in-plane (polar) angle φ, which would test for any possible XNLD contamination.

V. ACKNOWLEDGMENTS

The authors thank John Freeland, Zahir Islam, Stephan Rosenkranz, Daniel Haskel and Matti Lindroos for various discussions. This work is supported by the U.S. DOE, Office of Science, under Contract No. DE-AC02-06CH11357. SDM would like to thank the ID20 beamline staff at the ESRF for their kind hospitality.

[1] M. R. Norman, D. Pines and C. Kallin, Adv. Phys. 54, 715 (2005).
[2] C. M. Varma, Phys. Rev. B 55, 14554 (1997) and Phys. Rev. Lett. 83, 3538 (1999); M. E. Simon and C. M. Varma, ibid. 89, 247003 (2002).
[3] A. Kaminski, S. Rosenkranz, H. Fretwell, J. C. Campuzano, Z. Li, H. Raffy, W. G. Cullen, H. You, C. G. Olson, C. M. Varma and H. Hoechst, Nature 416, 610 (2002).
[4] S. V. Borisenko, A. A. Kordyuk, A. Koitzsch, T. K. Kim, K. A. Nenkov, M. Knupfer, J. Fink, C. Grazioli, S. Turchini and H. Berger, Phys. Rev. Lett. 92, 207001 (2004).
[5] M. Kubota, K. Ono, Y. Oohara and H. Eisaki, J. Phys. Soc. Jpn. 75, 053706 (2006).
[6] L. Alagna, T. Prosperi, S. Turchini, J. Goulon, A. Rogalev, C. Goulon-Ginet, C. R. Natoli, R. D. Peacock and B. Stewart, Phys. Rev. Lett. 80, 4799 (1998).
[7] The average crystal structure is the base orthorhombic unit cell before the incommensurate superstructure is taken into account. As the latter involves a translation operator, it should not affect the symmetry arguments in this paper.
[8] S. Di Matteo and C. R. Natoli, J. Synchr. Rad. 9, 9 (2002).
[9] A. Yamamoto, M. Onoda, E. Takayama-Muromachi, F. Izumi, T. Ishigaki and H. Asano, Phys. Rev. B 42, 4228 (1990); D. Grebille, H. Leligny, A. Ruyter, Ph. Labbé and B. Raveau, Acta Cryst. B52, 628 (1996); N. Jakubowicz, D. Grebille, M. Hervieu and H. Leligny, Phys. Rev. B 63, 214511 (2001).
[10] P. A. Miles, S. J. Kennedy, G. J. McIntyre, G. D. Gu, G. J. Russell and N. Koshizuka, Physica C 294, 275 (1998).
[11] V. Petricek, Y. Gao, P. Lee and P. Coppens, Phys. Rev. B 42, 387 (1990).
[12] X. B. Kan and S. C. Moss, Acta Cryst. B48, 122 (1992).
[13] R. E. Gladyshevskii and R. Flükiger, Acta Cryst. B52, 38 (1996).
[14] R. E. Gladyshevskii, N. Musolino and R. Flükiger, Phys. Rev. B 70, 184522 (2004). There is no superstructure for this Pb-doped variant, so the crystal structure used is the actual one, not the average one.
[15] International Tables for Crystallography, 5th ed., ed. T. Hahn (Kluwer, Dordrecht, 2002).
[16] P. Carra and R. Benoist, Phys. Rev. B 62, R7703 (2000).
[17] A. Mans, I. Santoso, Y. Huang, W. K. Siu, S. Tavaddod, V. Arpiainen, M. Lindroos, H. Berger, V. N. Strocov, M. Shi, L. Patthey and M. S. Golden, Phys. Rev. Lett. 96, 107007 (2006).
[18] Y. Joly, Phys. Rev. B 63, 125120 (2001). This program can be downloaded at http://www-cristallo.grenoble.cnrs.fr/fdmnes.
[19] C. R. Natoli, Ch. Brouder, Ph. Sainctavit, J. Goulon, Ch. Goulon-Ginet and A. Rogalev, Eur. Phys. J. B 4, 1 (1998).
[20] V. Arpiainen and M. Lindroos, Phys. Rev. Lett. 97, 037601 (2006).
[21] D. A. Varshalovich, A. N. Moskalev and V. K. Khersonskii, Quantum Theory of Angular Momentum (World Scientific, Singapore, 1988).
[22] The results presented involve a convolution of the calculated spectrum with both a core hole (1.9 eV for the Cu K edge) and a photoelectron lifetime, with the latter having a strong energy dependence (ranging up to 15 eV with a midpoint value at 30 eV above the Fermi energy). But the calculation has a fixed cluster radius.
[23] J. Goulon, C. Goulon-Ginet, A. Rogalev, G. Benayoun, C. Brouder and C. R. Natoli, J. Synchr. Rad. 7, 182 (2000). In addition, there is also an intrinsic non-XNCD contribution to the circular dichroism for a biaxial crystal, but this should vanish in Bi2212 for an x-ray beam aligned along the c-axis; see J. Goulon, C. Goulon-Ginet, A. Rogalev, V. Gotte, C. Brouder and C. Malgrange, Eur. Phys. J. B 12, 373 (1999).
[24] P. A. Miles, S. J. Kennedy, A. R. Anderson, G. D. Gu, G. J. Russell and N. Koshizuka, Phys. Rev. B 55, 14632 (1997).
[25] B. Fauqué, Y. Sidis, V. Hinkov, S. Pailhes, C. T. Lin, X. Chaud and P. Bourges, Phys. Rev. Lett. 96, 197001 (2006).
[26] A. Kapitulnik, unpublished results.
[27] It should be remarked, though, that single-particle based approaches can miss some features of the spectral weight transfer between the L2 and L3 edges.
[28] J. Kobayashi, T. Asahi, M. Sakurai, M. Takahashi, K. Okubo and Y. Enomoto, Phys. Rev. B 53, 11784 (1996).
[29] E. Giannini, N. Clayton, N. Musolino, A. Piriou, R. Gladyshevskii and R. Flükiger, IEEE Trans. Appl. Supercond. 15, 3102 (2005).
[30] Y. Le Page, W. R. McKinnon, J.-M. Tarascon and P. Barboux, Phys. Rev. B 40, 6810 (1989).
[31] X-ray dichroism experiments with regard to the orbital current scenario of Varma have been discussed by S. Di Matteo and C. M. Varma, Phys. Rev. B 67, 134502 (2003).

ABSTRACT

A recent polarized x-ray absorption experiment on the high temperature cuprate superconductor Bi2Sr2CaCu2O8 indicates the presence of broken parity symmetry below the temperature, T*, where a pseudogap appears in photoemission. We critically analyze the x-ray data, and conclude that a parity-breaking signal of the kind suggested is unlikely based on the crystal structures reported in the literature. Possible other origins of the observed dichroism signal are discussed. We propose x-ray scattering experiments that can be done in order to determine whether such alternative interpretations are valid or not.
<|endoftext|><|startoftext|> Introduction 1.1 Algebraic patterns within subsets of N We use extensively the notion of “algebraic pattern”. By an algebraic pattern we mean a solution of a diophantine system of equations. For example, an arithmetic progression of length k is an algebraic pattern corresponding to the following diophantine system: 2xi = xi−1 + xi+1, i = 2, 3, . . . , k − 1. We investigate the problem of finding linear algebraic patterns (these cor- respond to linear systems) within a family of subsets of natural numbers satisfying some asymptotic conditions. For instance, by Szemerédi theorem, subsets of positive upper Banach density (all S ⊂ N : d∗(S) > 0, where d∗(S) = lim supbn−an→∞ |S∩[an,bn]| bn−an+1 ) contain the pattern of an arithmetic progression of any finite length (see [12]). http://arxiv.org/abs/0704.0600v2 On the other hand, Schur patterns, namely triples of the form {x, y, x+ y}, which correspond to solutions of the so-called Schur equation, x+ y = z, do not necessarily occur in sets of positive upper density. For example, the odd numbers do not contain this pattern. But if we take a random subset of N by picking natural numbers with probability 1 independently, then this set contains the Schur pattern with probability 1. There is a deterministically defined analog of a random set - a normal set. To define a normal set we recall the notions of a normal infinite binary sequence and of a normal number. An infinite {0, 1}-valued sequence λ is called a normal sequence if every finite binary word w occurs in λ with frequency 1 , where |w| is the length of w. The more familiar notion is that of a normal number x ∈ [0, 1]. If to a number x ∈ [0, 1] we associate its dyadic expansion x = with xi ∈ {0, 1}, then x is called a normal number if the sequence (x1, x2, . . . , xn, . . .) is a normal sequence. Definition 1.1.1 A set S ⊂ N is called normal if the 0-1 sequence 1S (1S(n) = 1 ⇔ n ∈ S) is normal. 
Normal sets exhibit a non-periodic, “random” behavior. We notice that if $S$ is a normal set then $S - S$ contains $\mathbb{N}$. Therefore, the equation $z - y = x$ is solvable within every normal set. This implies that every normal set contains Schur patterns.

Normal sets are related to a class of dynamical systems displaying maximal randomness, namely Bernoulli systems. In this work we investigate the occurrence of linear patterns in sets corresponding to dynamical systems with a lower degree of randomness, the so-called weakly mixing dynamical systems. The sets we obtain will be called WM sets. We will make this precise in the next section. In the present paper we treat the following problem:

Give a complete characterization of the linear algebraic patterns which occur in all WM sets.

Remark 1.1.1 It will follow from our definition of a WM set that any normal set is a WM set.

The problem of the solvability of a nonlinear equation or system of equations is beyond the limits of the technique used in this paper. Nevertheless, some particular equations might be analyzed. In [3] it is shown that there exist normal sets in which the multiplicative Schur equation $xy = z$ is not solvable.

1.2 Generic points and WM sets

For a formal definition of WM sets we need the notions of measure preserving systems and of generic points.

Definition 1.2.1 Let $X$ be a compact metric space, $\mathcal{B}$ the Borel σ-algebra on $X$; let $T : X \to X$ be a continuous map and $\mu$ a probability measure on $\mathcal{B}$. The quadruple $(X, \mathcal{B}, \mu, T)$ is called a measure preserving system if for every $B \in \mathcal{B}$ we have $\mu(T^{-1}B) = \mu(B)$.

For a compact metric space $X$ we denote by $C(X)$ the space of continuous functions on $X$ with the uniform norm.

Definition 1.2.2 Let $(X, \mathcal{B}, \mu, T)$ be a measure preserving system. A point $\xi \in X$ is called generic for the system $(X, \mathcal{B}, \mu, T)$ if for any $f \in C(X)$ we have
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(T^n \xi) = \int_X f(x)\, d\mu(x). \qquad (1.1)$$

Example: Consider the Bernoulli system $(X = \{0,1\}^{\mathbb{N}_0}, \mathcal{B}, \mu, T)$, where $X$ is endowed with the Tychonoff topology, $\mathcal{B}$ is the Borel σ-algebra on $X$, $T$ is the shift to the left, $\mu$ is the product measure of the $\mu_i$'s, where $\mu_i(0) = \mu_i(1) = \frac{1}{2}$, and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$.

An alternative definition of a normal set which is purely dynamical is the following. A set $S$ is normal if and only if the sequence $1_S \in \{0,1\}^{\mathbb{N}_0}$ is a generic point of the foregoing Bernoulli system.

The notion of a WM set generalizes that of a normal set, where the role played by the Bernoulli dynamical system is taken over by dynamical systems of a more general character.

Let $\xi(n)$ be any $\{0,1\}$-valued sequence. There is a natural dynamical system $(X_\xi, T)$ connected to the sequence $\xi$: on the compact space $\Omega = \{0,1\}^{\mathbb{N}_0}$ endowed with the Tychonoff topology, we define a continuous map $T : \Omega \to \Omega$ by $(T\omega)_n = \omega_{n+1}$. Now for any $\xi$ in $\Omega$ we define $X_\xi = \overline{\{T^n \xi\}_{n \in \mathbb{N}_0}} \subset \Omega$.

Let $A$ be a subset of $\mathbb{N}$. Choose $\xi = 1_A$ and assume that for an appropriate measure $\mu$, the point $\xi$ is generic for $(X_\xi, \mathcal{B}, \mu, T)$. We can attach to the set $A$ dynamical properties associated with the system $(X_\xi, \mathcal{B}, \mu, T)$.

We recall the notions of ergodicity, total ergodicity and weak mixing in ergodic theory:

Definition 1.2.3 A measure preserving system $(X, \mathcal{B}, \mu, T)$ is called ergodic if every $A \in \mathcal{B}$ which is invariant under $T$, i.e. $T^{-1}(A) = A$, satisfies $\mu(A) = 0$ or $1$.
A measure preserving system $(X, \mathcal{B}, \mu, T)$ is called totally ergodic if for every $n \in \mathbb{N}$ the system $(X, \mathcal{B}, \mu, T^n)$ is ergodic.
A measure preserving system $(X, \mathcal{B}, \mu, T)$ is called weakly mixing if the system $(X \times X, \mathcal{B}_{X \times X}, \mu \times \mu, T \times T)$ is ergodic.

In our discussion of WM sets corresponding to weakly mixing systems, we shall add the proviso that the dynamical system in question not be the trivial 1-point system supported on the point $x \equiv 0$. This implies that the “density” of the set in question is positive.

Definition 1.2.4 Let $S \subset \mathbb{N}$.
If the limit of $\frac{1}{N} \sum_{n=1}^{N} 1_S(n)$ exists as $N \to \infty$ we call it the density of $S$ and denote it by $d(S)$.

Definition 1.2.5 A subset $S \subset \mathbb{N}$ is called a WM set if $1_S$ is a generic point of the weakly mixing system $(X_{1_S}, \mathcal{B}, \mu, T)$ and $d(S) > 0$.

1.3 Solvability of linear diophantine systems within WM sets and normal sets

Our main result is a complete characterization of the linear systems of diophantine equations which are solvable within every WM set. The characterization is given by describing the affine subspaces of $\mathbb{Q}^k$ which intersect $A^k$ for every WM set $A \subset \mathbb{N}$.

Theorem 1.3.1 An affine subspace of $\mathbb{Q}^k$ intersects $A^k$ for every WM set $A \subset \mathbb{N}$ if and only if it contains a set of the form $\{n\vec{a} + m\vec{b} + \vec{f} \mid n, m \in \mathbb{N}\}$, where $\vec{a}, \vec{b}, \vec{f}$ have the following description: $\vec{a} = (a_1, a_2, \ldots, a_k)^t$, $\vec{b} = (b_1, b_2, \ldots, b_k)^t \in \mathbb{N}^k$, $\vec{f} = (f_1, f_2, \ldots, f_k)^t \in \mathbb{Z}^k$ and there exists a partition $F_1, \ldots, F_l$ of $\{1, 2, \ldots, k\}$ such that:
a) for every $r \in \{1, \ldots, l\}$ there exist $c_{1,r}, c_{2,r} \in \mathbb{N}$ such that for every $i \in F_r$ we have $a_i = c_{1,r}$, $b_i = c_{2,r}$, and for every $j \in \{1, \ldots, k\} \setminus F_r$ we have
$$\det \begin{pmatrix} a_j & b_j \\ c_{1,r} & c_{2,r} \end{pmatrix} \neq 0;$$
b) $\forall r \in \{1, 2, \ldots, l\}$ $\exists c_r \in \mathbb{Z}$ such that $\forall i \in F_r$: $f_i = c_r$.

We also classify all affine subspaces of $\mathbb{Q}^k$ which intersect $A^k$ for every normal set $A \subset \mathbb{N}$.

Theorem 1.3.2 An affine subspace of $\mathbb{Q}^k$ intersects $A^k$ for every normal set $A \subset \mathbb{N}$ if and only if it contains a set of the form $\{n\vec{a} + m\vec{b} + \vec{f} \mid n, m \in \mathbb{N}\}$, where $\vec{a}, \vec{b}, \vec{f}$ have the following description: $\vec{a} = (a_1, a_2, \ldots, a_k)^t$, $\vec{b} = (b_1, b_2, \ldots, b_k)^t \in \mathbb{N}^k$, $\vec{f} = (f_1, f_2, \ldots, f_k)^t \in \mathbb{Z}^k$ and there exists a partition $F_1, \ldots, F_l$ of $\{1, 2, \ldots, k\}$ such that for every $r \in \{1, \ldots, l\}$ there exist $c_{1,r}, c_{2,r} \in \mathbb{N}$ such that for every $i \in F_r$ we have $a_i = c_{1,r}$, $b_i = c_{2,r}$, and for every $j \in \{1, \ldots, k\} \setminus F_r$ we have
$$\det \begin{pmatrix} a_j & b_j \\ c_{1,r} & c_{2,r} \end{pmatrix} \neq 0.$$

A family of linear algebraic patterns that has been studied previously is the family of “partition regular” patterns. These are patterns such that, for any finite partition $\mathbb{N} = C_1 \cup C_2 \cup \ldots \cup C_r$, the pattern necessarily occurs in some $C_j$. (For example, by van der Waerden's theorem arithmetic progressions are partition regular, and by Schur's theorem the Schur pattern is also partition regular.) A theorem of Rado gives a complete characterization of such patterns. We will show in Proposition 4.1 that every linear algebraic pattern which is partition regular occurs in every WM set.

It is important to mention that if we weaken the requirement of weak mixing to total ergodicity, then in the resulting family of sets Rado's patterns need not occur. For example, for $\alpha \notin \mathbb{Q}$ the set $S = \{n \in \mathbb{N} \mid n\alpha \ (\mathrm{mod}\ 1) \in I\}$, for a suitable interval $I \subset [0,1)$, is totally ergodic, i.e., $1_S$ is a generic point for a totally ergodic system and the density of $S$ is positive, but the equation $x + y = z$ is not solvable within $S$.

In the separate paper [4] we will address the question of solvability of more general algebraic patterns, not necessarily linear, in totally ergodic and WM sets.

The structure of the paper is the following. In Section 2 we prove the direction “⇐” of Theorems 1.3.1 and 1.3.2. In Section 3, by use of a probabilistic method, we prove the direction “⇒” of Theorems 1.3.1 and 1.3.2. In Section 4 we show that every linear system which is solvable in one of the cells of any finite partition of $\mathbb{N}$ is also solvable within every WM set. The paper ends with an Appendix in which we collect proofs of technical statements which are used in Sections 2 and 3.

1.4 Acknowledgments

This paper is a part of the author's Ph.D. thesis. I thank my advisor Prof. Hillel Furstenberg for introducing me to ergodic theory and for many useful ideas which I learned from him. I thank Prof. Vitaly Bergelson for fruitful discussions and valuable suggestions. Also, I would like to thank an anonymous referee for numerous valuable remarks.

2 Proof of Sufficiency

Notation: We introduce the scalar product of two vectors $v, w$ of length $N$ as follows:
$$\langle v, w \rangle_N = \frac{1}{N} \sum_{n=1}^{N} v(n) w(n).$$
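The scalar product of two vectors of length N can be sketched numerically as follows. This is an illustrative sketch only: the function names `inner_N` and `norm_N` are hypothetical, vectors are plain Python lists, and the code assumes the scalar product is normalized by 1/N, so that the constant vector 1 has norm 1.

```python
import math

def inner_N(v, w):
    """Normalized scalar product <v, w>_N = (1/N) * sum of v(n) * w(n)."""
    if len(v) != len(w) or not v:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(x * y for x, y in zip(v, w)) / len(v)

def norm_N(w):
    """Norm induced by the normalized scalar product."""
    return math.sqrt(inner_N(w, w))

ones = [1.0] * 8
indicator = [1, 0, 1, 0, 1, 0, 1, 0]
print(norm_N(ones))              # the constant vector 1 has norm 1
print(inner_N(indicator, ones))  # proportion of ones in the indicator vector
```

Under this normalization the scalar product of an indicator vector with the constant vector is the proportion of ones it contains, which is why densities of sets appear as limits of such products.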
We denote by $L^2(N)$ the (finite-dimensional) Hilbert space of all real vectors of length $N$ with the aforementioned scalar product. We define $\|w\|_N^2 = \langle w, w \rangle_N$.

First we state the following proposition, which will prove useful in the proof of the sufficiency of the conditions of Theorem 1.3.1.

Proposition 2.1 Let $A_i \subset \mathbb{N}$ ($1 \le i \le k$) be WM sets. Let $\xi_i(n) = 1_{A_i}(n) - d(A_i)$, where $d(A_i)$ denotes the density of $A_i$. Suppose there are $(a_1, b_1), (a_2, b_2), \ldots, (a_k, b_k) \in \mathbb{Z}^2$ such that $a_i > 0$, $1 \le i \le k$, and for every $i \neq j$
$$\det \begin{pmatrix} a_i & b_i \\ a_j & b_j \end{pmatrix} \neq 0.$$
Then for every $\varepsilon > 0$ there exists $M(\varepsilon) \in \mathbb{N}$ such that for every $M \ge M(\varepsilon)$ there exists $N(M, \varepsilon) \in \mathbb{N}$ such that for every $N \ge N(M, \varepsilon)$
$$\|w\|_N < \varepsilon,$$
where
$$w(n) = \frac{1}{M} \sum_{m=1}^{M} \xi_1(a_1 n + b_1 m)\, \xi_2(a_2 n + b_2 m) \cdots \xi_k(a_k n + b_k m)$$
for every $n = 1, 2, \ldots, N$.

Since the proof of Proposition 2.1 involves many technical details, we first show how our main result follows from it. Afterwards we state and prove all the lemmas necessary for the proof of Proposition 2.1.

We use an easy consequence of Proposition 2.1.

Corollary 2.1 Let $A$ be a WM set. Let $k \in \mathbb{N}$, suppose $(a_1, b_1), (a_2, b_2), \ldots, (a_k, b_k) \in \mathbb{Z}^2$ satisfy all requirements of Proposition 2.1 and suppose $f_1, \ldots, f_k \in \mathbb{Z}$. Then for every $\delta > 0$ there exists $M(\delta)$ such that for every $M \ge M(\delta)$ there exists $N(M, \delta)$ such that for every $N \ge N(M, \delta)$ we have
$$\big|\, \|v\|_N - d^k(A) \,\big| < \delta,$$
where
$$v(n) = \frac{1}{M} \sum_{m=1}^{M} 1_A(a_1 n + b_1 m + f_1)\, 1_A(a_2 n + b_2 m + f_2) \cdots 1_A(a_k n + b_k m + f_k)$$
for every $n = 1, 2, \ldots, N$.

Proof. We introduce normalized WM sequences $\xi_i(n) = \xi(n + f_i)$ (of zero average), where $\xi(n) = 1_A(n) - d(A)$. We rewrite $v(n)$ in the following form:
$$v(n) = \frac{1}{M} \sum_{m=1}^{M} \big(\xi_1(a_1 n + b_1 m) + d(A)\big) \cdots \big(\xi_k(a_k n + b_k m) + d(A)\big)$$
for every $n = 1, 2, \ldots, N$. By use of the triangle inequality and Proposition 2.1 it follows that for big enough $M$ and $N$ (which depends on $M$), $\|v\|_N$ is as close as we wish to $d^k(A)$. This finishes the proof.

Proof (of Theorem 1.3.1, “⇐”). Let $A \subset \mathbb{N}$ be a WM set. Without loss of generality, we can assume that for every $r$: $1 \le r \le l$ we have $r \in F_r$.
It follows from Corollary 2.1 that the vector v defined by 1A(a1n+ b1m+ f1)1A(a2n + b2m+ f2) . . . 1A(aln + blm+ fl) for every n = 1, 2, . . . , N , is not identically zero for big enough M and N . But this is possible only if for some n,m ∈ N we have (a1n+ b1m+ f1, a2n+ b2m+ f2, . . . , aln+ blm+ fl) ∈ A The latter implies that Ak intersects the affine subspace. Proof. (of Theorem 1.3.2, ⇚) For every r : 1 ≤ r ≤ l take all indices which comprise Fr. Denote this sequence of indices by Ir. Denote cr = mini∈Ir fi. Let Sr be the set of all non-zero shifts of fi, i ∈ Fr, centered at cr, i.e., Sr = {fi − cr | i ∈ Fr, fi > cr}. For example, if the sequence of fi’s where i ∈ F1 is (−5, 2, 3, 2,−5), then S1 = {7, 8}. Let A be a normal set. For every r : 1 ≤ r ≤ l we define sets Ar by Ar = {n ∈ N ∪ {0} |n ∈ A and n+ s ∈ A, ∀s ∈ Sr}. Then Ar is no longer a normal set provided that Sr 6= ∅ (d(A) = 21+|Sr | But, for all r : 1 ≤ r ≤ l the sets Ar’s are WM sets. Without loss of generality, assume that for every r : 1 ≤ r ≤ l we have r ∈ Fr. From Proposition 2.1 it follows that for big enough M and N 1A1(a1n+ b1m)1A(a2n+ b2m) . . . 1A(aln+ blm) ≈ d(Ar). The latter ensures that there exist m,n ∈ N such that (a1n+ b1m+ f1, . . . , akn+ bkm+ fk) ∈ A Now we state and prove all the claims that are required in order to prove Proposition 2.1. Definition 2.1 Let ξ be a WM-sequence (ξ is a generic point for a weakly mixing system (Xξ,BXξ , µ, T )) of zero average. The autocorrelation function of ξ of length j ∈ N with the shifts ~i = (i1, i2, . . . , ij) ∈ Z j and r ∈ Z is the sequence ψ which is defined by (n) = w∈{0,1}j ξ(n+ r + w ·~i), n ∈ N, where w ·~i is the usual scalar product in Qj, and (n) = 0, n ≤ 0. Lemma 2.1 Let ξ be a WM-sequence of zero average and suppose ε, δ > 0, b ∈ Z \ {0}. Then for every j ≥ 1, (c1, c2, . . . , cj) ∈ (Z \ {0}) j and (r1, r2, . . . , rj) ∈ Z j there exist I = I(ε, δ, c1, . . . 
, cn), a set S ⊂ [−I, I] density at least 1−δ and N(S, ε) ∈ N, such that for every N ≥ N(S, ε) there exists L(N, S, ε) such that for every L ≥ L(N, S, ε) r,(c1i1,...,cjij) (l + bn) for every (i1, i2, . . . , ij) ∈ S, where r = k=1 rk. Proof. We note that it is sufficient to prove the lemma in the case c1 = c2 = . . . = cj = 1, since if the average of nonnegative numbers over a whole lattice is small, then the average over a sublattice of a fixed positive density is also small. Recall that ξ ∈ Xξ = {T nξ}∞n=0 ⊂ supp(ξ) N0, where T is the usual shift to the left on the dynamical system supp(ξ)N0, and by the assumption that ξ is a WM-sequence of zero average it follows that ξ is a generic point of the weakly mixing system (Xξ,BXξ , µ, T ) and the function f : f(ω) = ω0 has zero integral. Denote ~i = (i1, . . . , ij). We define functions g on Xξ by T r+ǫ· ~i ◦ f, ǫ∈V ∗ T r+ǫ· ~i ◦ f, where Vj is the j-dimensional discrete cube {0, 1} j and V ∗j is the j-dimensional discrete cube except the zero point. Notice that (T nξ) = ψ We use the following theorem which is a special case of a multiparameter weakly mixing PET of Bergelson and McCutcheon (theorem A.1 in [2]; it is also a corollary of Theorem 13.1 of Host and Kra in [9]). Let (X, µ, T ) be a weakly mixing system. Given an integer k and 2k bounded functions fǫ on X, ǫ ∈ Vk , the functions Ni −Mi n∈[M1,N1)×...[Mk,Nk) ǫ∈V ∗ T ǫ1n1+...ǫknk ◦ fǫ converge in L2(µ) to the constant limit ǫ∈V ∗ when N1 −M1, . . . , Nk −Mk tend to +∞. From this theorem applied to the weakly mixing system Xξ × Xξ and the functions fǫ(x) = T r ◦ f ⊗ T r ◦ f for every ǫ ∈ Vj , we obtain for every Folner sequence {Fn} in N j that an average over the multi-index ~i = {i1, . . . , ij} of on Fn’s converges to zero in L 2(µ) (the integral of T r ◦ f ⊗ T r ◦ f is zero). Thus Xξ×Xξ Ni −Mi ~i∈[M1,N1)×...×[Mj,Nj) (y)dµ(x)dµ(y) = Ni −Mi ~i∈[M1,N1)×...×[Mj,Nj) (x)dµ(x) as N1 −M1, . . . , Nj −Mj → ∞. 
As a result we obtain the following statement: For every ε > 0, j ∈ N and every fixed (r1, r2, . . . , rj) ∈ N j, there exists a subset R ⊂ Nj of lower density equal to one, such that < ε (2.1) for every ~i ∈ R, where r = k=1 rj. Recall that lower density of a subset R ⊂ Nj is defined to be d∗(R) = lim inf N1−M1,...,Nj−Mj→∞ #{R ∩ [M1, N1)× . . .× [Mj , Nj)} k=1(Nk −Mk) Recall that ψ (l + bn) = g T l+bnξ The definition of the sequences ψj implies r1,~i (l + bn) = lim r2,(±i1,...,±ij) (l ± bn) for any r1, r2 ∈ Z, where ~i = (i1, . . . , ij). Therefore, in order to prove Lemma 2.1 it is sufficient to show the following: For every ε, δ > 0 and for any a priori chosen b ∈ N there exists I(ε, δ) ∈ N, such that for every I ≥ I(ε, δ) there exists a subset S ⊂ [1, I]j of density at least 1 − δ (namely, we have |S∩[1,I)j | ≥ 1 − δ) and N(S, ε) ∈ N, such that for every N ≥ N(S, ε) there exists L(N, S, ε) ∈ N such that for every L ≥ L(N, S, ε) the following holds for every ~i ∈ S: (l + bn) Let b ∈ N. Continuity of the function g0,~i and genericity of the point ξ ∈ Xξ yield (l + bn) = lim T bng0,~i T bng0,~i dµ. (2.2) By applying the von Neumann ergodic theorem to the ergodic system (Xξ,B, µ, T b) (ergodicity follows from weak-mixing of the original measure preserving system (Xξ,B, µ, T )) we have T bng0,~i → L2(Xξ) g0,~idµ. (2.3) From (2.1) there exists I(ε, δ) ∈ N big enough that for every I ≥ I(ε, δ) there exists a set S ⊂ [1, I]j of density at least 1− δ such that g0,~idµ for all ~i ∈ S. From equation (2.3) it follows that there exists N(S, ε) ∈ N, such that for every N ≥ N(S, ε) we have T bng0,~i for all ~i ∈ S. Finally, equation (2.2) implies that there exists L(N, S, ε) ∈ N, such that for every L ≥ L(N, S, ε) we have (l + bn) for all ~i ∈ S. The following lemma is a generalization of the previous lemma to a product of several autocorrelation functions. Lemma 2.2 Let ψ r1,~i , . . . , ψ rk,~i be autocorrelation functions of length j of WM-sequences ξ1, . . . 
, ξk of zero average, {c11, . . . , c j , . . . , c 1, . . . , c j} ∈ (Z \ {0}) jk and ε, δ > 0. Suppose (a1, b1), (a2, b2), . . . , (ak, bk) ∈ Z 2, such that ai > 0 for all i : 1 ≤ i ≤ k and for every i 6= j ai bi aj bj 6= 0. (If k = 1 assume that b1 6= 0.) Then there exists I(ε, δ) ∈ N, such that for every I ≥ I(ε, δ) there exist S ⊂ [−I, I]j of density at least 1− δ, M(S, ε) ∈ N, such that for every M ≥ M(S, ε) there exists X(M,S, ε) ∈ N, such that for every X ≥ X(M,S, ε) r1,(c i1,...,c (a1x+ b1m) . . . ψ rk,(c i1,...,c (akx+ bkm) for every (i1, i2, . . . , ij) ∈ S. Proof. The proof is by induction on k. THE CASE k = 1 (and arbitrary j): If a1 = 1 then the statement of the lemma follows from Lemma 2.1. If a1 > 1 then by Proposition 5.1 of Appendix for a given ~i = (i1, . . . , ij) ∈ S we have r1,(c i1,...,c (a1x+ b1m) r1,(c i1,...,c (x+ b1m) (2.4) (Limits exist by genericity of the point ξ.) By Lemma 2.1 the right hand side of (2.4) is small for large enough M . So, for large enough X (depending on M and (i1, . . . , ij)) the statement of the lemma is true. By finiteness of S we conclude that the statement of the lemma holds for k = 1. GENERAL CASE (k > 1): Suppose that the statement holds for k − 1. Denote vm(x) r1,(c i1,...,c (a1x+ b1m) . . . ψ rk,(c i1,...,c (akx+ bkm). Let ε, δ > 0. We show that there exists I(ε, δ) ∈ N such that for every I > I(ε, δ) a set S ⊂ [−I, I]j of density at least 1− δ can be chosen satisfying the following property: There exists I(ε, S) ∈ N such that for every I > I(ε, S) there exists M(I) ∈ N such that for all M > M(I) for a set of i’s in {1, 2, . . . , I} of density at least 1− ε we have < vm, vm+i >X (2.5) for all (i1, . . . , ij) ∈ S. The Van der Corput lemma (Lemma 5.1 of Appendix) finishes the proof. Note that the set of “good” i’s in the interval {1, 2, . . . , I} depends on (i1, . . . , ij) ∈ S. Denote < vm, vm+i >X 1,j+1 r1,(c i1,...c ij ,b1i) (a1x+ b1m) . . . ψ k,j+1 rk,(c i1,...,c ij ,bki) (akx+ bkm) Denote y = a1x+ b1m. 
Assume that (a1, b1) = d. Denote B̃y,m = ψ 1,j+1 r1,(c i1,...c ij ,b1i) (y) . . . ψ k,j+1 rk,(c i1,...,c ij ,bki) (a′ky + b where a′p = , b′p = bp − a pb1, 2 ≤ p ≤ k. We rewrite à as follows: y≡dl mod a1 m≡φ(l) mod B̃y,m + δX,M . (2.6) Here φ is a bijection of Za1 defined by the identity for every 0 ≤ l ≤ a1 − 1, Y = a1X , a p as above and δX,M accounts for the fact that for small y’s and y’s close to Y there is a difference between elements that are taken in the expression for à and in the expression on the right hand side of equation (2.6). Nevertheless, we have δX,M → 0 if Denote C̃y,m = ψ 2,j+1 r2,(c i1,...,c ij ,b2i) (a′2y + b 2m) . . . ψ k,j+1 rk,(c i1,...,c ij ,bki) (a′ky + b It will suffice to prove that there exists I(ε, δ) ∈ N such that for every I > I(ε, δ) we can find S ⊂ [−I, I]j of density at least 1 − δ with the following property: There exists I(ε, S) ∈ N such that for every I > I(ε, S) there exists M(I) ∈ N such that for every M > M(I) we can find X(M) ∈ N such that for every X > X(M) for a set of i’s in {1, 2, . . . , I} of density at least 1 − ε we have y≡dl mod a1 m≡φ(l) mod C̃y,m (2.7) for all 0 ≤ l ≤ a1 − 1, for all (i1, . . . , ij) ∈ S. Note that it is enough to prove the latter statement for every particular l : 0 ≤ l ≤ a1 Denote the left hand side of inequality (2.7) for a fixed l by D̃l. Introduce new variables z and n, such that y = za1+ dl and m = n +φ(l). We obtain D̃l = 2,j+1 t2n,z,l . . . ψ k,j+1 tkn,z,l 2,j+1 (a2z + c2n + q2) . . . ψ k,j+1 (akz + ckn+ qk) where shp = (rp, (c 1i1, . . . , c j ij , bpi)), n,z,l ap(a1z+dl)+(a1bp−apb1)( n+φ(l)) , qp = apld+(a1bp−apb1)φ(l) a1bp−apb1 6= 0, Z = Y and N = Md From the conditions on the function φ it follows that qp ∈ Z, 2 ≤ p ≤ k. From the conditions of the lemma we obtain for every p 6= q, p, q > 1, ap cp aq cq a1 det ap bp aq bq 6= 0. Therefore, D̃l can be rewritten as D̃l = φ2 (a2z + c2n) . . . φk (akz + ckn) where φℓ = ψ ℓ,j+1 rℓ+qℓ,(c i1,...,c ij ,bℓi) , 2 ≤ ℓ ≤ k. 
By the induction hypothesis the following is true. There exists Il(ε, δ ′) ∈ N big enough, such that for every Il ≥ Il(ε, δ ′) there exist a subset Sl ⊂ [−Il, Il] j+1 of density at least 1 − δ′2 and N(Sl, ε) ∈ N, such that for every N ≥ N(Sl, ε) there exists Z(N, Sl, ε) ∈ N, such that for every Z ≥ Z(N, Sl, ε) we have D̃l < (2.8) for all (i1, . . . , ij , i) ∈ Sl. For every (i1, . . . , ij) ∈ [−Il, Il] j we denote by Sli1,...,ij the fiber above (i1, . . . , ij): Sli1,...,ij = {i ∈ [−Il, Il] | (i1, . . . , ij , i) ∈ Sl}. Then there exists a set Tl ⊂ [−Il, Il] j of density at least 1− δ′, such that for every (i1, . . . , ij) ∈ Tl the density of S i1,...,ij is at least 1 − δ′. Let ε, δ > 0. Take δ′ < min ( ε , δ) and I > max (I ′(ε), Il(ε, δ ′)) (I ′(ε) is taken from the van der Corput lemma). Then it follows by (2.8) that there exists M(Tl, ε, δ) ∈ N, such that for every M ≥ M(Tl, ε, δ) there exists X(M,Tl, ε, δ) ∈ N, such that for every X ≥ X(M,Tl, ε, δ) the inequality (2.7) holds for every fixed (i1, . . . , ij) ∈ Tl for a set of i’s within the interval {1, . . . , I} of density at least 1 − ε . The lemma follows from the van der Corput lemma. Proof of Proposition 2.1. Denote vm(n) = ξ1(a1n+b1m) . . . ξk(akn+bkm). For every i ∈ N we introduce à defined by < vm, vm+i >N 0,(b1i) (a1n+ b1m) . . . ψ 0,(bki) (akn + bkm) where the functions ψp,j’s are autocorrelation functions of the ξp’s of length By Lemma 2.2 it follows that for every ε > 0 there exists I(ε) ∈ N such that for every I ≥ I(ε) there exist S ⊂ {1, 2, . . . , I} of density at least 1− ε M(S, ε) such that for every M ≥ M(S, ε) there exists N(M,S, ε) such that for every N ≥ N(M,S, ε) we have 0,(b2i) (a2n+ b2m) . . . ψ 0,(bki) (akn+ bkm) ≤ ε2. The proposition follows from the van der Corput Lemma 5.1. 3 Probabilistic constructions of WM sets The goal of this section is to prove the necessity of the conditions of Theorem 1.3.1. The following proposition is the main tool for this task. 
Proposition 3.1 Let $a, b \in \mathbb{N}$, $c \in \mathbb{Z}$ be such that $a \neq b$. Then there exists a normal set $A$ within which the equation
$$ax = by + c \qquad (3.1)$$
is unsolvable, i.e., for every $(x, y) \in A^2$ we have $ax \neq by + c$.

Remark 3.1 The proposition is a particular case of Theorem 1.3.1. It is a crucial ingredient in proving the necessity direction of the theorem in general.

Proof. Let $S \subset \mathbb{N}$. We construct from $S$ a new set $A_S$ within which the equation $ax = by + c$ is unsolvable. Without loss of generality, suppose that $a < b$. Assume $(a, b) = 1$ (the general case follows easily).

It follows from $(a, b) = 1$ that (3.1) is solvable. Any solution $(x, y)$ of the equation $ax = by + c$ has restrictions on $x$. Namely, $x \equiv \varphi(a, b, c) \pmod{b}$, where $\varphi(a, b, c) \in \{0, 1, \ldots, b-1\}$ is determined uniquely. Let us denote $l_0 = \varphi(a, b, c)$.

We define inductively a sequence $\{l_i\} \subset \mathbb{N} \cup \{0\}$. If a pair $(x, y)$ is a solution of equation (3.1) and $y \in b^i \mathbb{N} + l_{i-1}$, then choose $l_i \in \{0, 1, \ldots, b^{i+1} - 1\}$ such that $x \in b^{i+1} \mathbb{N} + l_i$. Note that from $(a, b) = 1$ it follows that $(a, b^{i+1}) = 1$. It is clear that if $u, v \in \mathbb{N}$ satisfy $(u, v) = 1$ then for any $w \in \mathbb{Z}$ there exists a solution $(x, y) \in \mathbb{N}^2$ of the equation $ux = vy + w$. The latter implies that there exist $x \in \mathbb{N}$, $y \in b^i \mathbb{N} + l_{i-1}$ such that $ax = by + c$. Any such $x$ should be a member of $b^{i+1} \mathbb{N} + l_i$. Note that $l_i$ and $l_{i-1}$ are connected by the identity
$$a l_i \equiv b l_{i-1} + c \pmod{b^{i+1}}. \qquad (3.2)$$
In addition, if $x \in \mathbb{N}$ is given then the equation $ax \equiv by + c \pmod{b^{i+1}}$ has at most one solution $y \in \{0, 1, \ldots, b^i - 1\}$.

We define sets $H_i = b^i \mathbb{N} + l_{i-1}$, $i \in \mathbb{N}$. We prove that for every $i \in \mathbb{N}$, $H_{i+1} \subset H_i$. All elements of $H_{i+1}$ are in the same class modulo $b^{i+1}$, therefore all elements of $H_{i+1}$ are in the same class modulo $b^i$. So, if we show for some $x \in H_{i+1}$ that $x \equiv l_{i-1} \pmod{b^i}$ then we are done.

For $i = 1$ we know that if $y \in \mathbb{N}$ then any $x \in \mathbb{N}$ such that $(x, y)$ is a solution of the equation (3.1) has to be in $H_1$. Take $x \in H_2$ such that there exists $y \in H_1$ with $ax = by + c$. Then $x \in H_1$.
Therefore, we have shown that $H_2 \subset H_1$.

For $i > 1$ there exists $x \in H_{i+1}$ such that there exists $y \in H_i$ with $ax = by + c$. By induction $H_i \subset H_{i-1}$. Therefore, the latter $y$ is in $H_{i-1}$. Therefore, by construction of the $l_i$'s we have that $x \in H_i$. This shows $H_{i+1} \subset H_i$.

We define sets $B_i$, $0 \le i < \infty$:
$$B_0 = \mathbb{N} \setminus H_1, \quad B_1 = H_1 \setminus H_2, \quad \ldots, \quad B_i = H_i \setminus H_{i+1}, \quad \ldots$$
Clearly we have $B_i \cap B_j = \emptyset$ for all $i \neq j$ and $|\mathbb{N} \setminus (\cup_{i=0}^{\infty} B_i)| = |\cap_{i=1}^{\infty} H_i| \le 1$. The latter is because for every $i$ the second element (in increasing order) of $H_i$ is $\ge b^i$.

We define $A_S = \cup_{i=0}^{\infty} A_i$, where the $A_i$'s are defined in the following manner:
$$A_0 = S \cap B_0, \qquad C_0 = B_0 \setminus A_0,$$
$$D_1 = B_1 \setminus \{x \mid ax \in bB_0 + c\}, \qquad A_1 = (B_1 \cap \{x \mid ax \in bC_0 + c\}) \cup (D_1 \cap S), \qquad C_1 = B_1 \setminus A_1,$$
$$\ldots$$
$$D_i = B_i \setminus \{x \mid ax \in bB_{i-1} + c\}, \qquad A_i = (B_i \cap \{x \mid ax \in bC_{i-1} + c\}) \cup (D_i \cap S), \qquad C_i = B_i \setminus A_i,$$
$$\ldots$$
Here it is worthwhile to remark that for every $i$, $B_i = A_i \cup C_i$. Therefore $A_S \subset \cup_{i=0}^{\infty} B_i$.

If for some $i \ge 1$ we have $y \in A_i \subset B_i = H_i \setminus H_{i+1}$, then any $x$ with $ax = by + c$ satisfies $ax \equiv b l_{i-1} + c \pmod{b^{i+1}}$. From $(a, b^{i+1}) = 1$ it follows that there exists a unique solution $x$ modulo $b^{i+1}$. By identity (3.2) we have $x \equiv l_i \pmod{b^{i+1}}$. Thus $x \in H_{i+1}$. If $x \in H_{i+2}$, then $x \equiv l_{i+1} \pmod{b^{i+2}}$. Thus we have $a l_{i+1} \equiv by + c \pmod{b^{i+2}}$. By uniqueness of a solution ($y$) modulo $b^{i+1}$ we get $y \equiv l_i \pmod{b^{i+1}}$. Thus $y \in H_{i+1}$. We have a contradiction, which shows that $x \in H_{i+1} \setminus H_{i+2} = B_{i+1}$. The same argument works for $y \in A_0 \subset B_0$ and it shows that any $x$ with $ax = by + c$ satisfies $x \in B_1$.

So, if $y \in A_i$ ($i \ge 0$) then any $x$ with $ax = by + c$ should satisfy $x \in B_{i+1}$. By construction of $A_S$, $x \notin A_S$. Thus equation (3.1) is not solvable in $A_S$.

We make the following claim: For almost every subset $S$ of $\mathbb{N}$ the set $A_S$ is a normal set. (The probability measure on subsets of $\mathbb{N}$ considered here is the product on $\{0,1\}^{\infty}$ of the probability measures $(\frac{1}{2}, \frac{1}{2})$.)

The tool for proving the claim is the following easy lemma (for a proof see the Appendix, Lemma 5.2).
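Before turning to that lemma, the construction of $A_S$ can be illustrated by a short computation. The following Python sketch is a simplified reading of the construction, not the author's procedure: it assumes $0 < a < b$ and $c \ge 0$, calls $y$ the parent of $x$ when $ax = by + c$ with $y \ge 1$, and decides membership in increasing order. The function names `build_A_S` and `is_solvable_within` and the chosen parameters are hypothetical.

```python
import random

def build_A_S(S, a, b, c, limit):
    """Return the part of A_S inside {1, ..., limit}, for 0 < a < b, c >= 0.

    Simplified reading of the construction: y is the parent of x when
    a*x == b*y + c with y >= 1.  A number with a parent is put in A_S
    exactly when its parent is not in A_S; a number without a parent is
    put in A_S exactly when it belongs to S.
    """
    if not (0 < a < b and c >= 0):
        raise ValueError("this sketch assumes 0 < a < b and c >= 0")
    in_A = [False] * (limit + 1)          # index 0 is unused
    for x in range(1, limit + 1):
        y, remainder = divmod(a * x - c, b)
        if remainder == 0 and y >= 1:     # the parent y < x is already decided
            in_A[x] = not in_A[y]
        else:
            in_A[x] = x in S
    return {x for x in range(1, limit + 1) if in_A[x]}

def is_solvable_within(A, a, b, c):
    """True if some (x, y) in A x A satisfies a*x == b*y + c."""
    return any((b * y + c) % a == 0 and (b * y + c) // a in A for y in A)

rng = random.Random(0)                    # fixed seed for reproducibility
S = {n for n in range(1, 2001) if rng.random() < 0.5}
A = build_A_S(S, a=2, b=3, c=1, limit=2000)
print(is_solvable_within(A, 2, 3, 1))     # False
```

Whenever $y$ belongs to the set produced here and $x = (by + c)/a$ is an integer within the range, $x$ is excluded because its parent $y$ was included, which mirrors the rule that descendants alternate along a chain.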
A subset $A$ of natural numbers is a normal set if and only if for any $k \in (\mathbb{N} \cup \{0\})$ and any $i_1 < i_2 < \ldots < i_k$ we have
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \chi_A(n) \chi_A(n + i_1) \cdots \chi_A(n + i_k) = 0, \qquad (3.3)$$
where $\chi_A(n) = 2 \cdot 1_A(n) - 1$.

First of all, we denote $T_N = \frac{1}{N} \sum_{n=1}^{N} \chi_{A_S}(n) \chi_{A_S}(n + i_1) \cdots \chi_{A_S}(n + i_k)$. Because of the randomness of $S$, $T_N$ is a random variable. We will prove that $\sum_{N=1}^{\infty} E(T_{N^2}^2) < \infty$, and this will imply by Lemma 5.3 that $T_N \to 0$ as $N \to \infty$ for almost every $S \subset \mathbb{N}$. We have
$$E(T_N^2) = \frac{1}{N^2} \sum_{n,m=1}^{N} E\big(\chi_{A_S}(n) \chi_{A_S}(n + i_1) \cdots \chi_{A_S}(n + i_k)\, \chi_{A_S}(m) \cdots \chi_{A_S}(m + i_k)\big).$$

Adding a finite set to (or removing a finite set from) a normal set does not affect the normality of the set. The set $\cup_i B_i$ might differ from $\mathbb{N}$ by at most one element ($|\cap_{i=1}^{\infty} H_i| \le 1$). This possible element does not affect the normality of $A_S$ and we assume without loss of generality that $\cap_{i=1}^{\infty} H_i = \emptyset$, thus $\mathbb{N} = \cup_{i=0}^{\infty} B_i$.

For every number $n \in \mathbb{N}$ we define the chain of $n$, $Ch(n)$, to be the following finite sequence:
If $n \in B_0$, then $Ch(n) = (n)$.
If $n \in B_1$, then two situations are possible. In the first one there exists a unique $y \in B_0$ such that $an = by + c$. We set $Ch(n) = (n, y) = (n, Ch(y))$. In the second situation we cannot find such a $y$ in $B_0$ and we set $Ch(n) = (n)$.
If $n \in B_{i+1}$, then again two situations are possible. In the first one there exists $y \in B_i$ such that $an = by + c$. In this case we set $Ch(n) = (n, Ch(y))$. In the second situation there is no such $y$ in $B_i$. In this case we set $Ch(n) = (n)$.

We define $l(n)$ to be the length of $Ch(n)$. For every $n \in \mathbb{N}$ we define the ancestor of $n$, $a(n)$, to be the last element of the chain of $n$ (of $Ch(n)$). Whether or not $n \in A_S$ depends on whether $a(n) \in S$. The exact relationship depends on the $i$ for which $n \in B_i$ and on the $j$ for which $a(n) \in B_j$, or in other words on the length of $Ch(n)$:
$$\chi_{A_S}(n) = (-1)^{i-j} \chi_S(a(n)) = (-1)^{l(n)-1} \chi_S(a(n)).$$
We say that $n$ is a descendant of $a(n)$.

It is clear that $E(\chi_{A_S}(n_1) \cdots \chi_{A_S}(n_k)) \neq 0$ ($E(\chi_{A_S}(n_1) \cdots \chi_{A_S}(n_k)) \in \{0, 1\}$) if and only if every number $a(n_i)$ occurs an even number of times among the numbers $a(n_1), a(n_2), \ldots, a(n_k)$.

We bound the number of pairs $(n, m)$ inside the square $[1, N] \times [1, N]$ such that
$$E\big(\chi_{A_S}(n) \chi_{A_S}(n + i_1) \cdots \chi_{A_S}(n + i_k)\, \chi_{A_S}(m) \chi_{A_S}(m + i_1) \cdots \chi_{A_S}(m + i_k)\big) \neq 0.$$
For a given $n \in [1, N]$ we count all $m$'s inside $[1, N]$ such that for the ancestor of $n$ there is a chance to have a twin among the ancestors of $n + i_1, \ldots, n + i_k, m, m + i_1, \ldots, m + i_k$.

First of all it is obvious that in the interval $[1, N]$ a given ancestor can have at most $\log_b N + C_1$ descendants, where $C_1$ is a constant. For all but a constant number of $n$'s it is impossible that one of $n + i_1, \ldots, n + i_k$ has the same ancestor as $n$. Therefore we should focus on the ancestors of the set $\{m, m + i_1, \ldots, m + i_k\}$. For a given $n$ we have at most $(k+1)(\log_b N + C_1)$ options for the number $m$ such that one of the elements of $\{m, m + i_1, \ldots, m + i_k\}$ has the same ancestor as $n$. Therefore for most $n \in [1, N]$ (except maybe a bounded number $C_2$ of $n$'s, which depends only on $\{i_1, \ldots, i_k\}$ and does not depend on $N$) we have at most $(k+1)(\log_b N + C_1)$ possibilities for $m$'s such that
$$E\big(\chi_{A_S}(n) \chi_{A_S}(n + i_1) \cdots \chi_{A_S}(n + i_k)\, \chi_{A_S}(m) \chi_{A_S}(m + i_1) \cdots \chi_{A_S}(m + i_k)\big) \neq 0.$$
Thus we have
$$E(T_N^2) \le \frac{1}{N^2} \Big( N (k+1)(\log_b N + C_1) + C_2 N \Big) \le \frac{1}{N} \big( (k+1) \log_b N + C_3 \big),$$
where $C_3$ is a constant. This implies
$$\sum_{N=1}^{\infty} E(T_{N^2}^2) < \infty.$$
Therefore $T_{N^2} \to 0$ as $N \to \infty$ for almost every $S \subset \mathbb{N}$. By Lemma 5.3 it follows that $T_N \to 0$ as $N \to \infty$ almost surely.

In the general case, where $a, b$ are not relatively prime, if $c$ satisfies (3.1) then it should be divisible by $(a, b)$. Therefore by dividing the equation (3.1) by $(a, b)$ we reduce the problem to the previous case.

We use the following notation: Let $W$ be a subset of $\mathbb{Q}^n$. Then for any increasing subsequence $I = (i_1, \ldots, i_p) \subset \{1, 2, \ldots, n\}$ we define
$$\mathrm{Proj}_I W = W_I = \{(w_{i_1}, \ldots, w_{i_p}) \mid \exists\, w = (w_1, w_2, \ldots, w_n) \in W\}.$$
We recall the notion of a cone.
Definition 3.1 A subset W ⊂ Qn is called a cone if (a) ∀w1, w2 ∈ W we have w1 + w2 ∈ W (b) ∀α ∈ Q : α ≥ 0 and ∀w ∈ W we have αw ∈ W . The next step involves an algebraic statement with a topological proof which we have to establish. Lemma 3.1 Let W be a non-trivial cone in Qn which has the property that for every two vectors ~a = {a1, a2, . . . , an} t,~b = {b1, b2, . . . , bn} t ∈ W there exist two coordinates 1 ≤ i < j ≤ n (depend on the choice of ~a,~b) such that ai bi aj bj There exist two coordinates i < j such that the projection of W on these two coordinates is of dimension ≤ 1 (dimQ SpanProj(i,j)W ≤ 1). Proof. First of all W has positive volume in V = SpanW (Volume is Haar measure which normalized by assigning measure one to a unit cube and W contains a parallelepiped). Fix an arbitrary non-zero element ~x ∈ W . For every i, j : 1 ≤ i < j ≤ n we define the subspace Ui,j = {~v ∈ V |Proj(i,j)~v ∈ SpanProj(i,j)~x}. From the assumptions of the lemma it follows that i,j;1≤i= x , < ~v, ~ej >= y}) is contained in a line, then Proj+i,jV is diagonal, i.e. it is contained in {(x, x)|x ∈ Q}. Otherwise, we can generate a partition of N into two disjoint sets S1, S2 such that no S 1 and no S 2 intersects V : This partition is constructed by an iterative process. Without loss of gener- ality we may assume that the line is x = ny, where n ∈ N. The general case is treated in the simillar way. We start with S1 = S2 = ∅. Let 1 ∈ S1. We “color” the infinite geometric progression {nm |m ∈ N} (adding elements to either S1 or S2) in such way that there is no (x, y) on the line from S 1 , S Then we take a minimal element from N which is still uncolored. Call it a. Add a to S1. Next, “color” {an m |m ∈ N}. Continuing in this fashion, we obtain the desired partition of N. This contradicts the assumption that the given system is partition-regular. Let F1, . . . , Fl be a partition of {1, 2, . . . , k} such that for every r ∈ {1, . . . 
, l} we have for every i 6= j , i, j ∈ Fr : dim QSpan(Proj i,jV ) = 1, and for every r : 1 ≤ r ≤ l, every i ∈ Fr .and for every j 6∈ Fr we have dim QSpan(Proj i,jV ) = 2. For every r : 1 ≤ r ≤ l we choose arbitrarily one representative index within Fr and denote it by jr (jr ∈ Fr). Then there exist g1, . . . , gl ∈ N such that VI = {(g1x1, . . . , glxl) | x1, . . . , xl ∈ Q}. The latter ensures that there exist vectors ~a,~b ∈ V which satisfy all the requirements of Theorem 1.3.1 and, therefore, the system is solvable in every WM set. 5 Appendix In this section we prove all technical lemmas and propositions that were used in the paper. We start with the key lemma which is a finite modification of Bergelson’s lemma in [1]. Its origin is in a lemma of van der Corput. Lemma 5.1 Suppose ε > 0 and {uj} j=1 is a family of vectors in Hilbert space, such that ‖uj‖ ≤ 1 (1 ≤ j ≤ ∞). Then there exists I ′(ε) ∈ N, such that for every I ≥ I ′(ε) there exists J ′(I, ε) ∈ N, such that the following holds: For J ≥ J ′(I, ε) for which we obtain 〈uj, uj+i〉 for a set of i’s in the interval {1, . . . , I} of density 1− ε we have Proof. For an arbitrary J define uk = 0 for every k < 1 or k > J . The following is an elementary identity: uj−i = I Therefore, the inequality i=1 ui i=1 ‖ui‖ yields ≤ (J + I) = (J + I) uj−p, uj−s〉 = (J+I) ‖uj−p‖ +2(J+I) r,s=1;s 0 and every integers b1, b2, . . . , bk ξ(n+ b1)ξ(n+ b2) . . . ξ(n+ bk) = ξ(an+ b1)ξ(an+ b2) . . . ξ(an+ bk), where ξ = 1A − d(A). Proof. Consider the weak-mixing measure preserving system (Xξ,B, µ, T ). The left side of the equation in the proposition is T b1fT b2f . . . T bkfdµ, where f(ω) = ω0 for every infinite sequence inside Xξ. We make use of the notion of disjointness of measure preserving systems. By [6] we know that every weak-mixing system is disjoint from any Kronecker system which is a compact monothethic group with Borel σ-algebra, the Haar probability measure, and the shift by a chosen element of the group. 
In particular, every weak-mixing system is disjoint from the measure preserving system (Za,BZa , S, ν), where Za = Z/aZ, S(n) = n+ 1( mod a). The measure and the σ-algebra of the last system are uniquely determined. Therefore, from Furstenberg’s theorem (see [6], Theorem I.6) it follows that the point (ξ, 0) ∈ Xξ×Za is a generic point of the product system (Xξ×Za,B×BZa , T×S, µ×ν). Thus, for every continuous function g on Xξ × Za we obtain Xξ×Za g(x,m)dµ(x)dν(m) = lim g(T nξ, Sn0). Let g(x,m) = f(x)10(m), which is obviously continuous on Xξ × Za. Then genericity of the point (ξ, 0) yields Xξ×Za f(x)10(m)dµ(x)dν(m) = f(x)dµ(x) = f(T nξ)10(n) = lim f(T anξ). Taking instead of the function f the continuous function T b1fT b2f . . . T bkf in the definition of g finishes the proof. The next two lemmas are very useful for constructing normal sets with specif- ical properties. Lemma 5.2 Let A ⊂ N. Let λ(n) = 21A(n)− 1. Then A is a normal set ⇔ for any k ∈ (N ∪ {0}) and any i1 < i2 < . . . < ik we have λ(n)λ(n+ i1) . . . λ(n+ ik) = 0. Proof. “⇒” If A is normal then any finite word w ∈ {−1, 1}∗ has the “right” frequency 1 inside wA. This guarantees that “half of the time” the function λ(n)λ(n + i1) . . . λ(n + ik) equals 1 and “half of the time” is equal to −1. Therefore we get the desired conclusion. “⇐” Let w be an arbitrary finite word of plus and minus ones: w = a1a2 . . . ak and we have to prove that w occurs in wA with the frequency 2 −k. For every n ∈ N the word w occurs in 1A and starting from n if and only if 1A(n) = a1 . . . 1A(n+ k − 1) = ak The latter is equivalent to the following λ(n) = 2a1 − 1 . . . λ(n+ k − 1) = 2ak − 1 The frequency of w within 1A is equal to λ(n)(2a1 − 1) + 1 . . . λ(n+ k − 1)(2ak − 1) + 1 The limit is equal to 1 Lemma 5.3 Let {an} be a bounded sequence. Let TN = n=1 an. Then TN converges to a limit t ⇔ there exists a sequence of increasing indices {Ni} such that Ni → 1 and TNi →i→∞ t. References [1] Bergelson, V. Weakly mixing PET. 
Ergodic Theory Dynam. Systems 7 (1987), no. 3, 337–349. [2] Bergelson, V.; McCutcheon, R. An ergodic IP polynomial Szemerédi theorem. Mem. Amer. Math. Soc. 146 (2000), no. 695. [3] Fish, A. Random Liouville functions and normal sets. Acta Arith. 120 (2005), no. 2, 191–196. [4] Fish, A. Polynomial largeness of sumsets and totally ergodic sets, see http://arxiv.org/abs/0711.3201. [5] Fish, A. Ph.D. thesis, Hebrew University, 2006. [6] Furstenberg, H. Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation. Math. Systems Theory 1 (1967), 1–49. [7] Furstenberg, H. Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions. J. d'Analyse Math. 31 (1977), 204–256. [8] Furstenberg, H. Recurrence in Ergodic Theory and Combinatorial Number Theory. Princeton Univ. Press 1981. [9] Host, B.; Kra, B. Nonconventional ergodic averages and nilmanifolds. Ann. of Math. (2) 161 (2005), no. 1, 397–488. [10] Rado, R. Note on combinatorial analysis. Proc. London Math. Soc. 48 (1943), 122–160. [11] Schur, I. Über die Kongruenz x^m + y^m ≡ z^m (mod p). Jahresbericht der Deutschen Math.-Ver. 25 (1916), 114–117. [12] Szemerédi, E. On sets of integers containing no k elements in arithmetic progression. Collection of articles in memory of Jurǐi Vladimirovič Linnik. Acta Arith. 27 (1975), 199–245. Current Address: Department of Mathematics University of Wisconsin-Madison 480 Lincoln Dr. Madison, WI 53706-1388 E-mail: afish@math.wisc.edu ABSTRACT We introduce a new class of "random" subsets of natural numbers, WM sets. This class contains normal sets (sets whose characteristic function is a normal binary sequence).
We establish necessary and sufficient conditions for solvability of systems of linear equations within every WM set and within every normal set. We also show that any partition-regular system of linear equations with integer coefficients is solvable in any WM set. <|endoftext|><|startoftext|> arXiv:0704.0601v4 [hep-ph] 24 Oct 2007 D − D̄ mixing and rare D decays in the Littlest Higgs model with non-unitarity matrix Chuan-Hung Chen1,2∗, Chao-Qiang Geng3,4† and Tzu-Chiang Yuan3‡ 1Department of Physics, National Cheng-Kung University, Tainan 701, Taiwan 2National Center for Theoretical Sciences, Hsinchu 300, Taiwan 3Department of Physics, National Tsing-Hua University, Hsinchu 300, Taiwan 4Theory Group, TRIUMF, 4004 Wesbrook Mall, Vancouver, B.C. V6T 2A3, Canada (Dated: October 27, 2018) Abstract We study the D − D̄ mixing and rare D decays in the Littlest Higgs model. Because a new weak singlet quark with electric charge 2/3 is introduced to cancel the quadratic divergence induced by the top quark, the standard unitary 3 × 3 Cabibbo-Kobayashi-Maskawa matrix is extended to a non-unitary 4 × 3 matrix in the quark charged currents, and Z-mediated flavor changing neutral currents are generated at tree level. In this model, we show that the D − D̄ mixing parameter can be as large as the current experimental value and that the decay branching ratio (BR) of D → Xuγ is small but its direct CP asymmetry could be O(10%). In addition, we find that the BRs of D → Xuℓ+ℓ−, D → Xuνν̄ and D → µ+µ− could be enhanced to be O(10^{-9}), O(10^{-8}) and O(10^{-9}), respectively. ∗ Email: physchen@mail.ncku.edu.tw † Email: geng@phys.nthu.edu.tw ‡ Email: tcyuan@phys.nthu.edu.tw I. INTRODUCTION Since the observation of the Bs − B̄s mixing in 2006 by CDF [1], all neutral pseudoscalar-antipseudoscalar oscillations (P − P̄) in the down-type quark systems have been seen.
In the standard model (SM), the most impressive features of flavor physics are the Glashow-Iliopoulos-Maiani (GIM) mechanism [2] and the large top quark mass. The former results in the cancellation between the lowest order short-distance (SD) contributions of the first two generations to the mass difference ∆mK in the K0 system, while the latter makes ∆mBq (q = d, s) in the Bq systems dominated by the SD effects [3]. In addition, these features also lead to sizable flavor changing neutral currents (FCNCs) from box and penguin diagrams, which contribute to the rare decays, such as K → πνν̄ and B → K(∗)ℓℓ̄. It is known that these processes could be good candidates to probe new physics effects [4–6]. However, any new physics signals that deviate from the SM predictions for the P − P̄ mixings and rare FCNC decays will have to wait for precision measurements of these processes. Unlike the K and Bq systems, the SD contributions to charmed-meson FCNC processes, such as the D − D̄ mixing [7] and the decays c → uℓ+ℓ− and D → ℓ+ℓ− [8], are highly suppressed due to the stronger GIM mechanism and weaker heavy quark mass enhancements in the loops. On the other hand, it is often claimed that the long-distance (LD) effect for the D − D̄ mixing should be the dominant contribution in the SM. Nevertheless, because the nonperturbative hadronic effects are hard to control, the result is still inconclusive [9–12]. Recently, the BABAR [13] and BELLE [14, 15] collaborations have reported evidence for the D − D̄ mixing with x′^2 = (−0.22 ± 0.30 ± 0.21) × 10^{-3} , y′ = (9.7 ± 4.4 ± 3.1) × 10^{-3} , (1) x ≡ ∆mD/ΓD = (0.80 ± 0.29 ± 0.17)% , y ≡ ∆ΓD/(2ΓD) = (0.33 ± 0.24 ± 0.15)% , yCP = (1.31 ± 0.32 ± 0.25)% , (2) respectively, where x′ = x cos δ + y sin δ and y′ = −x sin δ + y cos δ with the assumption of CP conservation and δ being the relative strong phase between the amplitudes for the doubly-Cabibbo-suppressed D → K+π− and Cabibbo-favored D → K−π+ decays [16, 17], and yCP = τ(D → K−π+)/τ(D → K+K−) − 1.
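The relations x′ = x cos δ + y sin δ and y′ = −x sin δ + y cos δ quoted above are a plane rotation of (x, y) by the strong phase δ. The following Python snippet is only a minimal sketch of that rotation; the function name `rotate_mixing` and the input numbers are assumptions chosen for illustration and do not reproduce any experimental fit.

```python
import math


def rotate_mixing(x, y, delta_deg):
    """Rotate the mixing parameters (x, y) by the strong phase delta (degrees).

    Implements x' = x cos(delta) + y sin(delta) and
    y' = -x sin(delta) + y cos(delta), as quoted in the text.
    """
    delta = math.radians(delta_deg)
    x_prime = x * math.cos(delta) + y * math.sin(delta)
    y_prime = -x * math.sin(delta) + y * math.cos(delta)
    return x_prime, y_prime


if __name__ == "__main__":
    # Illustrative inputs only (hypothetical central values, no uncertainties).
    x, y, delta_deg = 5.0e-3, 5.0e-3, -40.0
    x_prime, y_prime = rotate_mixing(x, y, delta_deg)
    print("x' =", x_prime, " y' =", y_prime)
    # A rotation preserves the length of the vector (x, y).
    print("x^2 + y^2   =", x * x + y * y)
    print("x'^2 + y'^2 =", x_prime ** 2 + y_prime ** 2)
```

Because the map is a rotation, x′^2 + y′^2 = x^2 + y^2 for any δ, which is why a measurement of (x′^2, y′) alone cannot separate x from y without external information on δ.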
Moreover, no evidence for CP violation is found. The combined results of Eqs. (1) and (2) at the 68% C.L. are [18] x = (5.5 ± 2.2) × 10^{-3} , y = (5.4 ± 2.0) × 10^{-3} , δ = (−38 ± 46)° . (3) Note that the upper bound of x < 0.015 at 95% C.L. can be extracted from the BELLE data in Eq. (2) [14, 15]. The evidence for nonzero mixing parameters from the BABAR and BELLE collaborations reveals that the era of rare charm physics has arrived. The results in Eq. (3) can not only test the SU(3) breaking effects for the D − D̄ mixing [10, 12], but also examine new physics beyond the SM [17–21]. It is known that a straightforward way to enhance the rare D processes is to include some new heavy quarks within the framework of the SM. For instance, if a new heavy quark with the electric charge of −1/3 is introduced, it could affect the D system since the extra down type quark violates the GIM mechanism. However, the constraint on this heavy quark is quite strong as it could also lead to FCNCs for the down type quark sector at tree level, which are strictly limited by the well measured rare K and B decays, such as KL → µ+µ− and B → Xsγ [22]. On the other hand, if the charge of the new heavy quark is 2/3, it could generate interesting tree level FCNCs for the up type quark sector, for which the constraints are much weaker. In this paper, we will study D physics based on a new weak singlet up-type quark. It is known that in the framework of the Littlest Higgs model [23], there exists a new SU(2)L singlet vector-like up quark [24], hereafter denoted by T. Since the number of down type quarks is the same as that in the SM, the standard unitary 3 × 3 Cabibbo-Kobayashi-Maskawa (CKM) matrix is extended to a non-unitary 4 × 3 matrix in the charged currents. Moreover, Z-mediated FCNCs for the up quark sector are generated at tree level. In Ref.
[25], it has been shown that the contributions of this new quark to the rare D processes are small and cannot reach the sensitivities of future experiments [25, 26]. In this paper, we will demonstrate that by adopting some plausible scenario, the effects could not only generate a large D − D̄ oscillation but also marginally reach the sensitivity proposed by BESIII for the rare D decays [27]. We note that the implication of the new data on the D − D̄ mixing in the Littlest Higgs model with T-parity has been recently studied in Ref. [21]. The paper is organized as follows. In Sec. II, we investigate that when a gauge singlet T - quark is introduced in the Littlest Higgs model, how the non-unitary matrix for the charged current and the tree level Z-mediated FCNC are formed. By using the leading perturbation, the mixing matrix elements related to the new parameters in the Littlest Higgs model are derived. In addition, we study how to get the small mixing matrix element for Vu(c)b, which describes the b → u(c) decays. In Sec. III, we discuss the implications of the non-unitarity on the D− D̄ mixing and rare D decays by presenting some numerical analysis. Finally, we summarize our results in Sec. IV. II. NON-UNITARY MIXING MATRIX IN THE LITTLEST MODEL To study the new flavor changing effects in the Littlest Higgs model, we start by writing the Yukawa interactions for the up quarks to be [24, 25] λabfǫijkǫxyχaiΣjxΣkyu b + λ0fTT c + h.c. , (4) where χT1 = (d1, u1, 0), χ 2 = (s2, c2, 0), χ 3 = (b3, t3, T ), u b is the weak singlet and Σ = eiΠ/fΣ0e iΠT /f with 112×2 112×2 , Π = 2 h∗/ φ hT/ . (5) The scale f denotes the global symmetry spontaneously breaking scale, which, as usual, could be around 1 TeV. Consequently, the 4× 4 up-quark mass matrix is given by [25] iλijv − − − − − 0 0 λ33f | λ0f . (6) We remark that the quadratic divergences for the Higgs mass from one-loop diagrams in- volving t and T get exactly cancelled as shown in Ref. [28]. 
Moreover, for other quarks other than the top quark, the one-loop quadratic divergent contributions do not necessitate fine-tuning the Higgs potential as the cutoff is around 10 TeV for f ∼ 1 TeV due to the small corresponding Yakawa couplings. That is why there is no need to introduce extra singlet states T [28, 29]. To obtain the quark mass hierarchy of mt ≫ mc ≫ mu, we can choose a basis such that the up-quark mass matrix is [30] m̂U 0 hf λ0f  (7) where m̂Uij = δijλiv/ 2 ≡ mi is diagonal matrix and h = (h1, h2, h3). The hi is related to λ33 by hi = Ṽ i3 λ33 and hh † = |λ33|2, in which Ṽ UR is the unitary transformation for the right-handed up quarks. We note that mi are not the physical masses and in principle their magnitudes could be as large as the weak scale. In order to preserve the hierarchy in the quark masses, one expects that m3 > m2 > m1. Furthermore, in terms of this basis, the charged and neutral currents, defined by LC = g√ J−µ W +µ − g√ 2 tan θ J−µ W H + h.c. , cos θW 3 − sin2 θWJµem tan θ 3 ZHµ + h.c. , (8) are expressed by J−µ = ŪLγµṼ 0aVDL , µṼ 0aV Ṽ 0†UL − µDL , (9) respectively, where UT = (u1, c2, t3, T ), D T = (d, s, b), aV = diag(1, 1, 1, 0) and Ṽ 0 = (V 0CKM)3×3 0  (10) with V 0CKMV CKM = 113×3. The null entry in aV denotes the new T -quark being a weak singlet; and without the new T -quark, V 0CKM is just the CKM matrix. Since the down quark sector is the same as that in the SM, we have set the unitary transformation UDL to be an identity matrix. For getting the physical eigenstates, the mass matrix in Eq. (7) can be diagonalized by unitary matrices V UL,R so that we have †diag U = V ULMUM L (11) m̂Um̂ f (|λ33|2 + |λ0|2)f 2  . (12) Since (|λ33|2 + |λ0|2)f 2 is much larger than other elements, we can take the leading order of the perturbation in himi/f (i = 1, 2, 3). According to Eq. (11), the leading expansion is given by †diag U = V ULMUM L ≈ (1 + ∆L)MUM †U (1−∆L) . 
(13) By looking at the off-diagonal terms $(M^{\rm diag}_U M^{\dagger\,{\rm diag}}_U)_{i4(4i)}$, we can easily get $\Delta_{Li4} \approx -\Delta_{L4i} = -\dfrac{h_i m_i f}{(|\lambda_{33}|^2+|\lambda_0|^2) f^2 - m_i^2}$ (14) with $i \neq 4$. From the diagonal entries, if we set the light quark masses to be $m_u \approx m_c \approx 0$, we obtain $0 \approx m^2_{u_j} \approx m_j^2 + 2\Delta_{Lj4}(M_U M_U^\dagger)_{4j}$ , $\Delta_{Lj4} \approx -\dfrac{m_j}{2 h_j f}$ (15) with j = 1, 2. To be consistent with Eq. (14), at the leading expansion the relation $2h_j^2 = |\lambda_{33}|^2+|\lambda_0|^2$ (16) should be satisfied. We emphasize that the choice of Eq. (16) is somewhat fine-tuned in order to have Eqs. (14) and (15) simultaneously. Since the top quark is much heavier than the other ordinary quarks, we have $2h_3^2 \approx (1 - m_t^2/m_3^2)(|\lambda_{33}|^2+|\lambda_0|^2)$ if $f > m_3 > m_t$. Similarly, one obtains the flavor mixing effects for $i \neq j \neq 4$ to be $\Delta_{Lij} = \dfrac{h_i h_j m_i m_j}{m_i^2 - m_j^2}\,\dfrac{f^2\,[\,2(|\lambda_{33}|^2+|\lambda_0|^2) f^2 - (m_i^2+m_j^2)\,]}{[(|\lambda_{33}|^2+|\lambda_0|^2) f^2 - m_j^2]\,[(|\lambda_{33}|^2+|\lambda_0|^2) f^2 - m_i^2]}$ . (17) After diagonalization, the currents become $J^-_\mu = \bar U_L\gamma_\mu V_{U_L}\tilde V^0 a_V D_L = \bar U_L\gamma_\mu V_{U_L}V^0 D_L$ , $J^\mu_3 = \bar U_L\gamma^\mu V_{U_L}\tilde V^0 a_V \tilde V^{0\dagger} V^\dagger_{U_L} U_L = \bar U_L\gamma^\mu V_{U_L}V^0 V^{0\dagger} V^\dagger_{U_L} U_L$ , (18) where $U^T = (u, c, t, T)$, $V^0 = \tilde V^0 a_V = \begin{pmatrix} (V^0_{\rm CKM})_{3\times 3} \\ 0 \end{pmatrix}$ (19) and diag$(V^0 V^{0\dagger}) = a_V$. Since the fourth component of $a_V$ is different from the first three, it is obvious that the matrix $V \equiv V_{U_L}V^0$ associated with the charged current does not satisfy unitarity. In addition, $V_{U_L} a_V V^\dagger_{U_L}$, which is associated with the neutral current, is not the identity matrix. As a result, Z-mediated FCNCs at tree level are induced. According to Eq. (18), we see that $V V^\dagger = V_{U_L} a_V V^\dagger_{U_L}$ , (20) which is just the same as the effects of the Z-mediated FCNCs. Due to V being a non-unitary matrix, one finds $(V V^\dagger)_{ij} = V_{i4}V^{*}_{j4}$ . (21) Consequently, the interesting phenomena arising from non-unitary matrix elements are always related to $V_{i4}V^{*}_{j4} = \Delta_{Li4}\Delta_{Lj4}$. We note that, as we do not particularly address the CP problem, in most cases we set the parameters to be real numbers. It is known that a large body of data gives strict bounds on the flavor changing effects.
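Returning briefly to Eq. (20), the failure of V V† to be the identity can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the construction used in the paper: the 3 × 3 block V0_CKM is replaced by the identity, V_UL is taken to be a product of two real rotations that mix the first and second rows with the fourth, and the two angles are hypothetical values picked only so that the off-diagonal entry comes out of order 10^{-4}. All function names are invented for this example.

```python
import math


def rotation(n, a, b, theta):
    """Return an n x n real rotation acting in the (a, b) plane (0-based)."""
    r = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    r[a][a], r[a][b] = c, s
    r[b][a], r[b][b] = -s, c
    return r


def matmul(x, y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def vv_dagger(theta14, theta24):
    """Build a toy V_UL and return (V_UL, V V^dagger) with V = V_UL V0.

    Assumptions of this sketch: real parameters, V0_CKM = identity,
    and V_UL = R14(theta14) R24(theta24).
    """
    u = matmul(rotation(4, 0, 3, theta14), rotation(4, 1, 3, theta24))
    # V0 is the 4 x 3 matrix with the 3 x 3 identity on top and a zero row.
    v0 = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0],
          [0.0, 0.0, 0.0]]
    v = matmul(u, v0)                      # 4 x 3 charged-current matrix
    return u, matmul(v, transpose(v))      # 4 x 4 product V V^dagger


if __name__ == "__main__":
    u, vvd = vv_dagger(0.012, 0.012)       # hypothetical mixing angles
    for row in vvd:
        print(["%.3e" % entry for entry in row])
    # Unitarity of V_UL implies (V V^dagger)_ij = delta_ij - U_i4 U_j4.
    print("(1,2) entry:", vvd[0][1], " -U14*U24:", -u[0][3] * u[1][3])
```

For a unitary V_UL the product equals δij minus the product of the fourth-column entries of V_UL, so the (1,2) entry has the magnitude that Eq. (21) associates with V14V*24; the overall sign depends on conventions that this sketch does not attempt to fix.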
In particular, the pattern describing the charged current has been fixed quite well. Any new parametrization should respect these constraints. It is therefore interesting to compare the mixing matrix with and without the new vector-like T-quark. From Eq. (18), we know that the new flavor mixing matrix for the charged current is given by $V = V_{U_L}V^0$. At the leading order perturbation, one gets $V = V_{U_L}V^0 \approx (1+\Delta_L)V^0 = V^0 + \Delta_L V^0$ . (22) If $V^0_{tb} \sim 1$ is taken, one finds that $V_{ub} \approx V^0_{ub} + \Delta_{L13}$ and $V_{cb} \approx V^0_{cb} + \Delta_{L23}$. In terms of Eq. (17) and $h_1 = h_2 \approx h_3$, and using Eq. (16) together with $f \gg m_i$, the relations $\Delta_{L13} \approx -m_1/m_3$ and $\Delta_{L23} \approx -m_2/m_3$ are obtained. Hence, in our approach, we have $V_{us} \approx V^0_{us} - \dfrac{m_1}{m_2}$ , $V_{ub} \approx V^0_{ub} - \dfrac{m_1}{m_3}$ , $V_{cb} \approx V^0_{cb} - \dfrac{m_2}{m_3}$ . (23) From these results, it is clear that when the T-quark decouples from ordinary quarks, $V_{us} \to V^0_{us}$, $V_{ub} \to V^0_{ub}$ and $V_{cb} \to V^0_{cb}$, while $m_1/m_2 \to m_u/m_c$, $m_1/m_3 \to m_u/m_t$ and $m_2/m_3 \to m_c/m_t$, respectively. According to the observations in the decays of b → uℓν̄ℓ and b → cℓν̄ℓ, the corresponding values have been determined to be $|V_{ub}| = (3.96 \pm 0.09) \times 10^{-3}$ and $|V_{cb}| = 42.21^{+0.10}_{-0.80} \times 10^{-3}$, respectively [16]. Since $V^0_{ij}$ and $m_i$ are free parameters, to satisfy the experimental limits while allowing interesting phenomena in low energy physics, it is reasonable to set the orders of magnitude for $m_1/m_3$ and $m_2/m_3$ ($m_1/m_2$) to be O(10^{-2}) and O(10^{-1}), respectively. Consequently, the non-unitary effects on the rare charmed meson decays governed by $V_{14}V^{*}_{24}$ could be as large as $\Delta_{L14}\Delta_{L24} \sim O(10^{-4})$, which could be one order of magnitude larger than those in Ref. [25]. III. D − D̄ MIXING AND RARE D DECAYS A. D − D̄ mixing It is well known that the GIM mechanism has played an important role in the K − K̄ oscillation in the SM. In addition, due to the top quark in the box and penguin diagrams, Bq − B̄q mixings are dominated by the SD effects, which are consistent with the data [16].
On the contrary, for the D − D̄ mixing the GIM cancellation further suppresses the mixing effect to be $\Delta m_D \sim O(m_s^4/(m_W^2 m_c^2))$ [7] and the bottom quark contribution actually is a subleading effect due to the suppression of $(V_{ub}V^{*}_{cb})^2$. In the SM, the SD contribution to the mixing parameter is O(10^{-7}) [34]. However, the LD contribution to the mixing is believed to be dominant. Due to the nonperturbative hadronic effects, the result is still uncertain with the prediction on the mixing parameter ranging from O(10^{-3}) [9] to O(10^{-2}) [10–12]. Nonetheless, the mixing parameters shown in Eq. (3) could arise from the LD contribution. Thus, it is important to have a better understanding of the LD effect. On the other hand, it is also possible that the mixings in Eq. (3) could result from new physics. In the following, we will concentrate on the Littlest Higgs model. In the quark sector of the Littlest Higgs model, due to the introduction of a new weak singlet, a direct impact on the low energy physics is the FCNCs at tree level. According to Eq. (18), the most attractive process with |∆C| = 2 via the Z-mediated c−u−Z interaction, illustrated in Fig. 1, is given by $H(|\Delta C| = 2) = \dfrac{g^2}{8\cos^2\theta_W m_Z^2}\,(V_{14}V^{*}_{24})^2\,\bar u\gamma_\mu P_L c\;\bar u\gamma^\mu P_L c = \dfrac{G_F}{\sqrt{2}}\,(V_{14}V^{*}_{24})^2\,\bar u\gamma_\mu P_L c\;\bar u\gamma^\mu P_L c$ . (24) In terms of the hadronic matrix element $\langle \bar D|(\bar u c)_{V-A}(\bar u c)_{V-A}|D\rangle \propto f_D^2 B_D$ , (25) the mass difference for the D meson is [25] $\Delta m_D \approx \dfrac{\sqrt{2}\,G_F}{3}\, f_D^2\, m_D B_D\, |(V_{14}V^{*}_{24})^2|$ . (26) FIG. 1: Z-mediated flavor diagram with |∆C| = 2. If we assume no cancellation between new physics and SM contributions, by taking $\tau_D = 1/\Gamma_D = 6.232 \times 10^{11}$ GeV$^{-1}$, $f_D\sqrt{B_D} = 200$ MeV [31, 32] and $m_D = 1.86$ GeV and using Eq. (3), we obtain $\zeta_0 \equiv |V_{14}V^{*}_{24}| = |\Delta_{L14}\Delta_{L24}| = (1.47 \pm 0.29) \times 10^{-4}$ , (27) which is in the desirable range. In other words, the result in Eq. (27) demonstrates that the non-unitarity in the Littlest Higgs model could enhance the D − D̄ mixing to the observed level. We note that the limit of x < 0.015 (95% C.L.) leads to $\zeta_0 < 2.5 \times 10^{-4}$.
(28) In addition, we note that cancellation between the LD effect in the SM and the SD one from new physics could happen. In this case, the values in Eqs. (27) and (28) could be relaxed. B. D → Xuγ decay In the SM, the D-meson FCNC related processes are all suppressed since the internal fermions in the loops are all much lighter than mW . For the decay of D → Xuγ, without QCD corrections, the branching ratio is O(10−17); and it becomes O(10−12) when one-loop QCD corrections are included [8]. However, it is found that the two-loop QCD corrections can boost the BR to be as large as 3.5× 10−8 [33]. It should be interesting to see how large the non-unitarity effect on c → uγ is in the Littlest Higgs model. To study the radiative decay of c → uγ, we write the effective Lagrangian to be Lc→uγ = − mcūσµνPRcF µν , (29) where C7 = C 7 + C 7 and C 7 ≈ −(0.007 + i0.02) = 0.021eiδs with δs = 70.7◦ [33] being the strong phase induced by the two-loop QCD corrections. In the extension of the SM by including a weak singlet particle, the flavor mixing matrix in the charged current is not unitary and the Z-mediated FCNC at tree level is generated as well. For c → uγ, besides the QED-penguin diagrams induced by the W -boson displayed in Figs. 2a and 2b, the Z-mediated QED-penguin one in Fig. 2c will also give contributions. We note that the d, s, b W d, s, b u, c, T (a) (c)(b) FIG. 2: Flavor diagrams for c → uγ. contributions from WH and ZH can be ignored as m and m2Z/m are much less than one. At the first sight, due to the light quarks in the loops, the contributions from Figs. 2a and 2b could be negligible. However, due to the non-unitarity of (V V †)uc = V14V 24 6= 0, even in the limits of md, s, b → 0, the contributions from the mass independent terms do not vanish anymore and can be sizable. In terms of unitary gauge [22], we obtain CW7 = (V V †)12 VusV ∗cs VusV ∗cs . (30) Furthermore, if we set mu ≈ mc = 0, the contributions from Fig. 
2c are given by CZ7 = (f u + f c + f T )/VusV fZc + f − eu sin2 θW [4ξ0(0)− 6ξ1(0) + 2ξ2(0)] +eu sin 2 θW [4ξ0(0)− 4ξ1(0)] fZT = eu [2ξ0(yT )− 3ξ1(yT ) + ξ2(yT )] , (31) where the functions ξn(x) are defined by ξn(x) ≡ zn+1dz 1 + (x− 1)z and yT = mT/mZ . Numerically, the total contribution in Fig. 2 is C7 = C 7 + C V ∗csVus 24 . (33) If we regard V14V 24 as an unknown complex parameter, i.e. V14V 24 = ζ0e iθ with θ being the CP violating phase, one can study the decay BR and direct CP asymmetry (CPA) of D → Xuγ defined by BR(D → Xuγ) = 6αem|C7|2 π|Vcd|2 BR(D → Xdeν̄e) , ACP = BR(c̄ → ūγ)− BR(c → uγ) BR(c̄ → ūγ) + BR(c → uγ) , (34) as functions of ζ0 and θ. In Fig. 3, the BR and CPA as functions of ζ0 are presented, where the solid, dotted, dashed and dash-dotted lines represent the CP violating phase at θ = 0, 45◦, 90◦ and 135◦, respectively. From these results, it is interesting to see that 0 1 2 3 4 5 6 0 1 2 3 4 5 6 (a) (b) FIG. 3: BR (in units of 10−8) and CPA (in units of 10−2) for D → Xuγ as functions of ζ0, where the solid, dotted, dashed and dash-dotted lines represent the CP violating phase at θ = 0, 45◦, 90◦ and 135◦, respectively. BR(D → Xuγ) is insensitive to the new physics effects, whereas the direct CPA could be as large as O(10%). Explicitly, if we take θ = 90◦ and ζ0 = 1.5 × 10−4, the CPA is about 3%. Note that this CPA vanishes in the SM. C. D → Xuℓℓ̄ and D0 → ℓ+ℓ− decays Because the current experimental measurements in K and Bq decays are all consistent with the SM predictions, it is inevitable that if we want to observe any deviations from the SM, we have to wait for precision measurements for K and Bq. SuperB factories or LHCb could provide a hope. However, the situation in D physics is straightforward. As stated before, unlike K and Bq systems, due to no heavy quark enhancement in the D system, the rare D-meson decays, such as D → Xuℓℓ̄ (ℓ = e, µ, ν), are always suppressed. 
Even by considering the long-distance effects, the related decays, such as D → µ+µ− and D → Xuνν̄, get small corrections to the SD predictions on the BRs [35]. Therefore, these rare decays definitely could be good candidates to probe the new physics effects. Since the values in the SM are hardly reachable at D factories [27], if any exotic event is found, it must be a strong evidence for new physics. In the following analysis, we are going to discuss the implication of the Littlest Higgs model on the rare D decays involving di-leptons. To study these decays, we first write the effective Hamiltonian for c → uℓ+ℓ− (ℓ = e, µ) H(c → uℓ+ℓ−) = − GFαem√ V ∗csVus Ceff9 O9 + C7O7 + C10O10 , (35) O7 = − ūiσµνq νPRcℓ̄γ O9 = ūγµPLc ℓ̄γ O10 = ūγµPLc ℓ̄γ µγ5ℓ , (36) where the effective Wilson coefficients are given by Ceff9 = (V V †)14 V ∗csVus cℓV + (h(zs, s)− h(zd, s)) (C2(mc) + 3C1(mc)) , C10 = − (V V †)14 V ∗csVus cℓA , (37) with s = q2/m2c , zi = mi/mc, c V = −1/2 + 2 sin2 θW , cℓA = −1/2 and h(z, s) = −4 ln z + (2 + x) |1− x| 1−x+1√ 1−x−1 − i π, for x ≡ 4z2/s < 1 , 2 arctan 1√ x−1 , for x ≡ 4z 2/s > 1 . (38) Here, we have neglected the small contributions from the penguin and box diagrams. We note that in the SM, the SD contributions are mainly from the term with h(z, s), induced by the insertion of O2 = ūLγµqLq̄Lγ µcL and mixing with O9 at one-loop level [35, 36]. We note that the resonant decays of D → XuV → Xuℓ+ℓ− (V = φ, ρ, ω) would have large corrections to c → uℓ+ℓ− at the resonant regions. However, in this paper we do not discuss these contributions as we only concentrate on the SD contributions. Moreover, these resonance contributions can be removed by imposing proper cuts in the phase space in dedicated searches. From Eq. (35), the decay rate for D → Xuℓ+ℓ− as a function of the invariant mass s = q2/m2c can be found to be 768π5 |VusV ∗cs|2(1− s)2R(s) , R(s) = |Ceff9 |2 + |C10|2 (1 + 2s) + 12Re(C∗7C 9 ) + 4 |C7|2 . 
(39) In addition, by utilizing the lepton angular distribution, we can also study the forward- backward asymmetry (FBA), given by −1 d cos θdΓ/dsd cos θ sgn(cos θ) −1 d cos θdΓ/dsd cos θ Ceff9 + , (40) where θ is the angle of ℓ+ related to the momentum of the D meson in the ℓ+ℓ− invariant mass frame. Since C10 is small in the SM, AFB is negligible. With mc = 1.4 GeV and the mixing parameter in Eq. (27), we get BR(D → Xue+e−) = (4.18± 0.91)× 10−10 , BR(D → Xuµ+µ−) = (2.51± 0.86)× 10−10 , (41) comparing with the SM predictions of BR(D → Xue+e−)SM = 2.1 × 10−10 and BR(D → +µ−)SM = 0.5 × 10−10, respectively. Clearly, if some cancellation occurs between new physics and SM contributions in the D − D̄ mixing, a larger value of ζ0 can be allowed. In Fig. 4, we show the tendency of the decay as a function of ζ0, where the negative horizontal values correspond to -ζ0. In addition, we present the differential decay BR [FBA] of D → Xue+e− as a function of s = q2/m2c in Fig. 5a [b], where the thick solid, dotted and dashed lines correspond to ζ0 = 1.5, 3.0 and 5.0, while the thin ones denote the cases for −ζ0 except ζ0 = 0 for the thin solid line in Fig. 5a. From Fig. 5b, we see that the FBA -6 -4 -2 0 2 4 6 FIG. 4: BR(in units of 10−9) for D → Xue+e− as a function of ζ0. 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 (a) (b) FIG. 5: (a)[(b)] Differential BR (in units of 10−9) [FBA] for D → Xue+e− as a function of s, where the thick solid, dotted and dashed lines correspond to ζ0 = 1.5, 3.0 and 5.0, while the thin ones denote the cases for −ζ0 except ζ0 = 0 for the thin solid line in (a). is only at percent level. In the Littlest Higgs model, this is because the Z coupling to the charged lepton cℓV = −1/2 + 2 sin2 θW appearing in Ceff9 is much smaller than one. This is quite different from that in b → sℓ+ℓ− where the dominant effect in the SM for the FBA is from the box and QED-penguin diagrams. Next, we discuss the decay of D → Xuνν̄. 
In the SM, the BR for D → Xuνν̄ is estimated to be O(10−16) − O(10−15) [35], which is vanishing small. In the Littlest Higgs model, by taking C7 = 0, C 9 = −C10 = −π(V V †)14/(αemVusV ∗cs), the effective Hamiltonian in Eq. (35) can be directly applied to c → uνν̄. The decay rate forD → Xuνν̄ as a function of s = q2/m2c can be obtained as 768π5 (1− s)2(1 + 2s) 2π2|(V V †)12|2 , (42) where the factor of 3 stands for the neutrino species. With ζ0 = 1.5×10−4, we get BR(D → Xuνν̄) = 1.31 × 10−9. However, if we relax the constraint on V14V †24, the BR as a function of ζ0 is shown in Fig. 6a. For a larger value of ζ0, BR(D → Xuνν̄) could be as large as O(10−8). 0 1 2 3 4 5 6 0 1 2 3 4 5 6 (a) (b) FIG. 6: (a) BR (in units of 10−8) for D → Xuνν̄ and (b) BR (in units of 10−9) for D → µ+µ−. Finally, we study the decays of D → ℓ+ℓ−. It has been known that, in the SM, the SD contributions to D → µ+µ− are O(10−18), while the LD ones are O(10−13) [35]. It is clear that any signal to be observed at the sensitivity of the proposed detector, such as BESIII, will indicate new physics effects. Since the effective interactions at quark level are the same as those in Eq. (35), one finds that BR(D → ℓ+ℓ−) = τDmDm |πV14V ∗24|2 . (43) Here we have used equation of motion for the charged lepton so that ℓ̄/pDℓ = 0. We also note that operators O7,9 make no contributions. With |V14V ∗24| = ζ0 = 1.5 × 10−4, the predicted BR for D → µ+µ− is 1.17× 10−10. In Fig. 6b, we present the BR as a function of ζ0. We see that BR(D → µ+µ−) in the Littlest Higgs model could be as large as O(10−9). IV. CONCLUSIONS We have studied the D − D̄ mixing and rare D decays in the Littlest Higgs model. 
In the model, as the new weak singlet vector-like quark T with the electric charge of 2/3 is introduced to cancel the quadratic divergence induced by the top quark, the standard unitary 3 × 3 CKM matrix is extended to a non-unitary 4 × 3 matrix in the quark charged currents, and Z-mediated flavor changing neutral currents are generated at tree level. We have shown that the effects on |∆C| = 2 and |∆C| = 1 processes are all related to V14V*24 in Eq. (21). To avoid the scenario adopted by Ref. [25], in which λ0 ∼ λ33 ≫ λij was assumed, we choose the basis such that the effective mass matrix for u1, c2 and t3 is diagonal, while the corresponding masses m1, m2 and m3 are free parameters and can be as large as the weak scale v. Since the global symmetry breaking scale f is larger than v, the mixing matrix relating physical and unphysical states can be extracted by taking the leading perturbative expansion. Accordingly, by using the approximation of mu ≈ mc ≈ 0, the explicit expressions for V14 and V24 have been obtained. In terms of the data for Vub and Vcb, we have found that the natural value for ζ0 ≡ |V14V*24| is O(10^{-4}), which agrees with the observed parameter in the D − D̄ mixing but is one order of magnitude larger than that in Ref. [25]. For the rare D decays, due to the non-unitarity effects in the model, BR(D → Xuℓ+ℓ−), BR(D → Xuνν̄) and BR(D → µ+µ−) could be enhanced to be O(10^{-9}), O(10^{-8}) and O(10^{-9}), respectively, which could marginally reach the sensitivity proposed by BESIII [27]. Acknowledgments This work is supported in part by the National Science Council of R.O.C. under Grant Nos. NSC-95-2112-M-006-013-MY2, NSC-95-2112-M-007-001, NSC-95-2112-M-007-059-MY3 and NSC-96-2918-I-007-010. [1] A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 97, 242003 (2006) [arXiv:hep-ex/0609040]. [2] S. L. Glashow, J. Iliopoulos and L. Maiani, Phys. Rev. D2, 1285 (1970). [3] J. S. Hagelin, Nucl. Phys. B193, 123 (1981). [4] S. R. Choudhury et al., Phys. Lett.
B601, 164 (2004); J. Hubisz, S. J. Lee and G. Paz, JHEP 06, 041 (2006). [5] M. Blanke et al., arXiv:hep-ph/0702136; M. Blanke et al., JHEP 01, 066 (2007); A. J. Buras et al., JHEP 11, 062 (2006); M. Blanke, JHEP 12, 003 (2006). [6] C. H. Chen, Phys. Lett. B521, 315 (2001); J. Phys. G28, L33 (2002); C. H. Chen and C. Q. Geng, Phys. Rev. D66, 014007 (2002); Phys. Rev. D66, 094018 (2002); Phys. Rev. D71, 054012 (2005); Phys. Rev. D71, 115004 (2005); C. H. Chen and H. Hatanaka, Phys. Rev. D73, 075003 (2006). [7] K. Niyogi and A. Datta, Phys. Rev. D20, 2441 (1979); A. Datta and D. Kumbhakhar, Z. Phys. C27, 515 (1985). [8] G. Burdman, E. Golowich, J. Hewett and S. Pakvasa, Phys. Rev. D52, 6383 (1995). [9] H. Georgi, Phys. Lett. B297, 353 (1992); T. Ohl, G. Ricciardi and E. Simmons, Nucl. Phys. B403, 605 (1993); I. Bigi and N. Uraltsev, Nucl. Phys. B592, 92 (2001). [10] A. A. Petrov, Int. J. Mod. Phys. A21, 5686 (2006). [11] J. Donoghue et al., Phys. Rev. D33, 179 (1986); L. Wolfenstein, Phys. Lett. B164, 170 (1985); P. Colangelo, G. Nardulli and N. Paver, Phys. Lett. B242, 71 (1990); T. A. Kaeding, Phys. Lett. B357, 151 (1995); A. A. Anselm and Y. I. Azimov, Phys. Lett. B85, 72 (1979). [12] A. F. Falk et al., Phys. Rev. D65, 054034 (2002); A. F. Falk et al., Phys. Rev. D69, 114021 (2004). [13] B. Aubert et al. (BABAR Collaboration), Phys. Rev. Lett. 98, 211802 (2007) [arXiv:hep-ex/0703020]. [14] K. Abe et al. (Belle Collaboration), Phys. Rev. Lett. 98, 211803 (2007) [arXiv:hep-ex/0703036]. [15] L. M. Zhang et al. (Belle Collaboration), arXiv:0704.1000 [hep-ex]. [16] Particle Data Group, W. M. Yao et al., J. Phys. G33, 1 (2006). [17] Y. Nir, arXiv:hep-ph/0703235. [18] M. Ciuchini et al., arXiv:hep-ph/0703204. [19] E. Golowich, S. Pakvasa and A. A. Petrov, arXiv:hep-ph/0610039. [20] P. Ball, arXiv:hep-ph/0703245; arXiv:0704.0786; X. G. He and G. Valencia, arXiv:hep-ph/0703270. [21] M. Blanke, A. J. Buras, S. Recksiegel, C. Tarantino and S. Uhlig, arXiv:hep-ph/0703254.
[22] C. H. Chang, D. Chang and W. Y. Keung, Phys. Rev. D61, 053007 (2000). [23] N. Arkani-Hamed, A. G. Cohen and H. Georgi, Phys. Lett. B513, 232 (2001); N. Arkani-Hamed, A. G. Cohen, E. Katz and A. E. Nelson, JHEP 07, 034 (2002) [arXiv:hep-ph/0206021]. [24] T. Han et al., Phys. Rev. D67, 095004 (2003). [25] J. Lee, JHEP 0412, 065 (2004). [26] S. Fajfer and S. Prelovsek, Phys. Rev. D73, 054026 (2006). [27] H. B. Li, hep-ex/0605004. [28] M. Perelstein, M. E. Peskin and A. Pierce, Phys. Rev. D 69, 075002 (2004); see also M. Perelstein, Prog. Part. Nucl. Phys. 58, 247 (2007). [29] M. Schmaltz and D. Tucker-Smith, Ann. Rev. Nucl. Part. Sci. 55, 229 (2005). [30] H. Fritzsch, Phys. Lett. B73, 317 (1978); Phys. Lett. B166, 423 (1986). [31] M. Artuso et al. [CLEO Collaboration], Phys. Rev. Lett. 95, 251801 (2005). [32] H. W. Lin, S. Ohta, A. Soni and N. Yamada, Phys. Rev. D74, 114506 (2006). [33] C. Greub, T. Hurth, M. Misiak and D. Wyler, Phys. Lett. B382, 415 (1996); S. Prelovsek and D. Wyler, Phys. Lett. B500, 304 (2001). [34] E. Golowich and A. A. Petrov, Phys. Lett. B625, 53 (2005). [35] G. Burdman, E. Golowich, J. Hewett and S. Pakvasa, Phys. Rev. D66, 014009 (2002). [36] S. Fajfer, P. Singer and J. Zupan, Eur. Phys. J. C27, 201 (2003).

ABSTRACT We study the $D-\bar D$ mixing and rare D decays in the Littlest Higgs model. As the new weak singlet quark with the electric charge of 2/3 is introduced to cancel the quadratic divergence induced by the top-quark, the standard unitary $3\times 3$ Cabibbo-Kobayashi-Maskawa matrix is extended to a non-unitary $4\times 3$ matrix in the quark charged currents and Z-mediated flavor changing neutral currents are generated at tree level. In this model, we show that the $D-\bar D$ mixing parameter can be as large as the current experimental value and the decay branching ratio (BR) of $D\to X_u \gamma$ is small but its direct CP asymmetry could be $O(10\%)$.
In addition, we find that the BRs of $D\to X_u \ell^{+} \ell^{-}$, $D\to X_u\nu \bar \nu$ and $D\to \mu^{+} \mu^{-}$ could be enhanced to be $O(10^{-9})$, $O(10^{-8})$ and $O(10^{-9})$, respectively. <|endoftext|><|startoftext|> CERN–PH–TH/2007–068 SU–4252–8489

Alternative Large N Schemes and Chiral Dynamics

Francesco Sannino∗
CERN Theory Division, CH-1211 Geneva 23, Switzerland. and University of Southern Denmark, Campusvej 55, DK-5230, Odense M., Denmark.

Joseph Schechter†
Department of Physics, Syracuse University, Syracuse, NY 13244-1130, USA.

(Dated: March 2007)

We compare the dependences on the number of colors of the leading ππ scattering amplitudes using the single index quark field and two index quark fields. These are seen to have different relationships to the scattering amplitudes suggested by chiral dynamics which can explain the long puzzling pion pion s wave scattering up to about 1 GeV. This may be interesting for getting a better understanding of the large Nc approach as well as for application to recently proposed technicolor models.

BACKGROUND

Gaining control of QCD in its strongly interacting (low energy) regime constitutes a real challenge. One very attractive approach is based on studying the theory in the large number of colors (Nc) limit [1, 2]. At the same time one may obtain more information by requiring the theory to model the (almost) spontaneous breakdown of chiral symmetry [3, 4]. A standard test case is pion pion scattering in the energy range up to about 1 GeV. Some time ago, an attempt was made [5, 6] to implement this combined scenario. Since the leading large Nc amplitude contains only tree diagrams involving mesons of the standard quark-antiquark type, it is expected that the required amplitude should be gotten by calculating just the chiral tree diagrams for rho meson exchange together with the four point pion contact diagram. There are no unknown parameters in this calculation.
The crucial question is whether the scattering amplitude calculated in this way will satisfy unitarity. When one compares the result with experimental data up to about 1 GeV on the real part of the (most sensitive to unitarity violation) J=I=0 partial wave, one finds (see Fig.1 of [6]) that the result violates the partial wave unitarity bound by just a “little bit”. On the other hand, the pion contact term by itself violates unitarity much more drastically so one might argue that the large Nc approach, which suggests that the tree diagrams of all quark anti-quark resonances in the relevant energy range be included, is helping a lot.

To make matters more quantitative one might ask the question: by how much should Nc be increased in order for the amplitude in question to remain within the unitarity bounds for energies below 1 GeV? This question was answered in a very simple way in [7], as we now briefly review. In terms of the conventional amplitude, A(s, t, u) the I = 0 amplitude is 3A(s, t, u) + A(t, s, u) + A(u, t, s). One gets the J = 0 channel by projecting out the correct partial wave. The current algebra (pion contact diagram) contribution to the conventional amplitude is

$$A_{ca}(s,t,u) = 2\,\frac{s - m_\pi^2}{F_\pi^2}, \qquad (1)$$

where the pion decay constant, Fπ depends on Nc as $F_\pi(N_c) = 131\sqrt{N_c/3}$ MeV so that Fπ(3) = 131 MeV. Furthermore mπ = 137 MeV is independent of Nc. The desired amplitude is obtained by adding to the current algebra term the following vector meson ρ(770) contribution:

$$A_\rho(s,t,u) = \frac{g_{\rho\pi\pi}^2}{2m_\rho^2}\,(4m_\pi^2 - 3s) - \frac{g_{\rho\pi\pi}^2}{2}\left[\frac{u-s}{(m_\rho^2 - t) - i m_\rho\Gamma_\rho\,\theta(t-4m_\pi^2)} + \frac{t-s}{(m_\rho^2 - u) - i m_\rho\Gamma_\rho\,\theta(u-4m_\pi^2)}\right], \qquad (2)$$

where $g_{\rho\pi\pi}(N_c) = 8.56\sqrt{3/N_c}$ is the ρππ coupling constant. Also, mρ = 771 MeV is independent of Nc and

$$\Gamma_\rho(N_c) = \frac{g_{\rho\pi\pi}^2(N_c)}{12\pi m_\rho^2}\left(\frac{m_\rho^2}{4} - m_\pi^2\right)^{3/2}. \qquad (3)$$

It should be noted that the first term in Eq. (2), which is an additional non-resonant contact interaction other than the current algebra contribution, is required when we include the ρ vector meson contribution in a chiral invariant manner. In Fig. 1 we show the real part of the I = J = 0 amplitude (denoted $R_0^0$) due to current algebra plus the ρ contribution for increasing values of Nc. Since in this channel the vector meson is never on shell we suppress the contribution of the width in the vector meson propagator in Eq. (2). One observes that the unitarity bound (i.e., $|R_0^0| \le 1/2$) is satisfied for Nc ≥ 6 till well beyond the 1 GeV region. However unitarity is still a problem for 3, 4 and 5 colors. At energy scales larger than the one associated with the vector meson clearly other resonances are needed [5] but we shall not be concerned with that energy range here. It is also interesting to note that these considerations are essentially unchanged when the pion mass (i.e. explicit chiral symmetry breaking in the Lagrangian) is set to zero.

FIG. 1: Real part of the I = J = 0 partial wave amplitude due to the current algebra + ρ terms, plotted for the following increasing values of Nc (from top to bottom), 3, 4, 5, 6, 7. The curve with largest violation of the unitarity bound corresponds to Nc = 3 while the ones within the unitarity bound are for Nc = 6, 7.

Note that essentially we are just using the scaling,

$$A(s,t,u) = \frac{3}{N_c}\,\tilde{A}(s,t,u), \qquad (4)$$

where Ã(s, t, u) is defined replacing Fπ and gρππ with the Nc independent quantities $\tilde F_\pi = \sqrt{3}\,F_\pi/\sqrt{N_c}$ and $\tilde g_{\rho\pi\pi} = \sqrt{N_c}\,g_{\rho\pi\pi}/\sqrt{3}$. Other authors [8] have found the same minimum value, Nc=6 for the practical consistency of the large Nc approximation, by using different methods.

In order to explain low energy ππ scattering for the physical value Nc = 3 one must go beyond the large Nc approximation. It is attractive to keep the assumption of tree diagram dominance involving nearby resonances, however. One easily sees that adding a scalar singlet resonance (sigma) at the location where the unitarity bound on $R_0^0(s)$ is first violated should restore unitarity.
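The Nc dependence of the bound discussed above can be sketched numerically. The following is a minimal Python sketch, not the calculation of [7]: it takes Eqs. (1) and (2) with the ρ width dropped (as stated for Fig. 1), and it assumes the partial wave normalization $R_0^0(s) = \frac{\sqrt{1-4m_\pi^2/s}}{64\pi}\int_{-1}^{1} d\cos\theta\; T^{I=0}$, which the text does not spell out. The function names (`r00`, `peak`) and the energy grid are illustrative choices.

```python
import math

M_PI = 137.0     # pion mass in MeV (independent of Nc)
M_RHO = 771.0    # rho mass in MeV (independent of Nc)
F_PI_3 = 131.0   # pion decay constant at Nc = 3, in MeV
G_RHO_3 = 8.56   # rho-pi-pi coupling at Nc = 3


def f_pi(nc):
    """'t Hooft scaling: F_pi grows like sqrt(Nc)."""
    return F_PI_3 * math.sqrt(nc / 3.0)


def g_rho(nc):
    """'t Hooft scaling: g_rho_pi_pi falls like 1/sqrt(Nc)."""
    return G_RHO_3 * math.sqrt(3.0 / nc)


def amp_ca(s, nc):
    """Current algebra amplitude of Eq. (1); it depends only on its first argument."""
    return 2.0 * (s - M_PI**2) / f_pi(nc) ** 2


def amp_rho(s, t, u, nc):
    """Rho contribution of Eq. (2) with the width suppressed."""
    g2 = g_rho(nc) ** 2
    contact = g2 / (2.0 * M_RHO**2) * (4.0 * M_PI**2 - 3.0 * s)
    exchange = -0.5 * g2 * ((u - s) / (M_RHO**2 - t) + (t - s) / (M_RHO**2 - u))
    return contact + exchange


def amp_isospin0(s, t, u, nc):
    """I = 0 combination 3A(s,t,u) + A(t,s,u) + A(u,t,s).

    With s + t + u = 4 m_pi^2 the rho part reduces to 2 A_rho(s,t,u):
    the s-channel rho poles cancel between A(t,s,u) and A(u,t,s).
    """
    ca = 3.0 * amp_ca(s, nc) + amp_ca(t, nc) + amp_ca(u, nc)
    return ca + 2.0 * amp_rho(s, t, u, nc)


def r00(sqrt_s, nc, n=400):
    """I = J = 0 partial wave (assumed normalization), Simpson's rule in cos(theta)."""
    s = sqrt_s**2
    k = 0.5 * (s - 4.0 * M_PI**2)
    h = 2.0 / n
    total = 0.0
    for i in range(n + 1):
        x = -1.0 + i * h
        t = -k * (1.0 - x)
        u = -k * (1.0 + x)
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * amp_isospin0(s, t, u, nc)
    integral = total * h / 3.0
    phase_space = math.sqrt(1.0 - 4.0 * M_PI**2 / s)
    return phase_space * integral / (64.0 * math.pi)


def peak(nc, lo=300, hi=1000, step=10):
    """Largest value of R_0^0 on a grid of sqrt(s) values in MeV."""
    return max(r00(float(e), nc) for e in range(lo, hi + 1, step))


for nc in (3, 4, 5, 6, 7):
    print(nc, "within bound below 1 GeV:", peak(nc) <= 0.5)
```

Because Fπ² grows like Nc and g²ρππ falls like 1/Nc while the masses stay fixed, every term in the sketch scales as 3/Nc, which is the content of Eq. (4).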
A sigma at that location helps because the real part of a Breit-Wigner resonance is zero at the pole location and negative just above it, so it will bring $R_0^0(s)$ below the bound, as required. In [7], the resonance mass was found to be around 550 MeV on this basis. Such a low value would make it different from a p-wave quark-antiquark state, which is expected to be in the 1000-1400 MeV range. We assume then that it is a four quark state (glueball states are expected to be in the 1.5 GeV range from lattice investigations). Four quark states of diquark-anti diquark type [9] and meson-meson type [10] have been discussed in the literature for many years. Accepting this picture, however, poses a problem for the accuracy of the large Nc inspired description of the scattering since four quark states are predicted not to exist in the large Nc limit of QCD. We shall take the point of view that a four quark type state is present since it allows a natural fit to the low energy data. Of course, it is still necessary to fine tune the parameters and shape of the sigma resonance to fit the experimental ππ scattering data in detail. In practice, since the parameters of the pion contact and rho exchange contributions are fixed, the sigma is the most important one for fitting and fits may even be achieved [11] if the vector meson piece is neglected. However the well established, presumably four quark type, f0(980) resonance must be included to achieve a fit in the region just around 1 GeV.

There is by now a fairly large recent literature [12]-[44] on the effect of light “exotic” scalars in low energy meson meson scattering. There seems to be a consensus, arrived at using rather different approaches (keeping, however, unitarity), that the sigma exists.

TWO INDEX QUARK FIELDS

Now, consider redefining the Nc = 3 quark field with color index A (and flavor index not written) as

$$q_A = \frac{1}{2}\,\epsilon_{ABC}\,q^{BC}, \qquad q^{BC} = -q^{CB}, \qquad (5)$$

so that, for example, $q_1 = q^{23}$ and similarly for the adjoint field, q̄1 = q̄23 etc.
This is just a trivial change of variables and will not change anything for QCD. However, if a continuation of the theory is made to Nc > 3 the resulting theory will be different since the two index antisymmetric quark representation has Nc(Nc − 1)/2 rather than Nc color components. As was pointed out by Corrigan and Ramond [45], who were mainly interested in the problem of the baryons at large Nc, this shows that the extrapolation of QCD to higher Nc is not unique. Further investigation of the properties of the alternative extrapolation model introduced in [45] was carried out by Kiritsis and Papavassiliou [46]. Here, we shall discuss the consequences for the low energy ππ scattering discussed above, of this alternative large Nc extrapolation, assuming for our purpose, that all the quarks extrapolate as antisymmetric two index objects.

It may be worthwhile to remark that gauge theories with two index quarks have recently gotten a great deal of attention. Armoni, Shifman and Veneziano [47, 48, 49, 50, 51] have proposed an interesting relation between certain sectors of the two index antisymmetric (and symmetric) theories at large number of colors and sectors of super Yang-Mills (SYM). Using a supersymmetry inspired effective Lagrangian approach 1/Nc corrections were investigated in [52]. Information on the super Yang-Mills spectrum has been obtained in [53]. On the validity of the large Nc equivalence between different theories and interesting new possible phases we refer the reader to [54, 55, 56]. The finite temperature phase transition and its relation with chiral symmetry has been investigated in [57] while the effects of a nonzero baryon chemical potential were studied in [58]. When adding flavors the phase diagram as a function of the number of flavors and colors has been provided in [59]. The complete phase diagram for fermions in arbitrary representations has been unveiled in [60].
The study of theories with fermions in a higher dimensional representation of the gauge group and the knowledge of the associated conformal window led to the construction of minimal models of technicolor [59, 61, 62] which are not ruled out by current precision measurements and lead to interesting dark matter candidates [63, 64, 65] as well as to a very high degree of unification of the standard model gauge couplings [66].

Besides these two limits a third one for massless one-flavor QCD, which is in between the ’t Hooft and Corrigan Ramond ones, has been very recently proposed in [67]. Here one first splits the QCD Dirac fermion into the two elementary Weyl fermions and afterwards assigns one of them to transform according to a rank-two antisymmetric tensor while the other remains in the fundamental representation of the gauge group. For three colors one reproduces one-flavor QCD and for a generic number of colors the theory is chiral. The generic Nc is a particular case of the generalized Georgi-Glashow model [68].

To illustrate the large Nc counting for the ππ scattering amplitude when quarks are designated to transform according to the two index antisymmetric representation of color SU(3) one may employ [1] the mnemonic where each tensor index of this group is represented by a directed line. Then the quark-quark gluon interaction is pictured as in Fig. 2. The two index quark is pictured as two lines with arrows pointing in the same direction, as opposed to the gluon which has two lines with arrows pointing in opposite directions. The coupling constant representing this vertex is taken to be $g_t/\sqrt{N_c}$, where gt (the ’t Hooft coupling constant) is to be held constant.

FIG. 2: Two index fermion - gluon vertex.

A “one point function”, like the pion decay constant, Fπ would have as its simplest diagram, Fig. 3. The X represents a pion insertion and is associated with a normalization factor for the color part of the pion’s wavefunction,

$$\sqrt{\frac{2}{N_c(N_c-1)}}, \qquad (6)$$

which scales for large Nc as 1/Nc. The two circles each carry a quark index so their joint factor scales as $N_c^2$ for large Nc; more precisely, taking the antisymmetry into account, the factor is

$$\frac{N_c(N_c-1)}{2}. \qquad (7)$$

FIG. 3: Diagram for Fπ for the two index quark.

The product of Eqs. (6) and (7) yields the Nc scaling for Fπ:

$$F_\pi^2(N_c) = \frac{N_c(N_c-1)}{6}\,F_\pi^2(3). \qquad (8)$$

For large Nc, Fπ scales proportionately to Nc rather than $\sqrt{N_c}$ as in the case of the ’t Hooft extrapolation. Using this scaling together with Eq.(1) suggests that the ππ scattering amplitude, A scales as,

$$A(N_c) = \frac{6}{N_c(N_c-1)}\,A(3), \qquad (9)$$

which, for large Nc scales as $1/N_c^2$ rather than as 1/Nc in the ’t Hooft extrapolation. This scaling law for large Nc may be verified from the mnemonic in Fig. 4, where there is an $N_c^2$ factor from the two loops multiplied by four factors of 1/Nc from the X’s.

FIG. 4: Diagram for the scattering amplitude, A with the 2 index quark.

It is interesting to find the minimum value of Nc for which the tree amplitude due to the pion and rho meson terms (given in Eqs.(1) and (2) above) is unitary in this two antisymmetric index quark extrapolation scheme. Fig. 1 shows that the peak value of the partial wave amplitude, $R_0^0$ due to these two terms is numerically about 0.9. This is to be identified with Aca(3) + Aρ(3) in Eq.(9). Thus the condition that the extrapolated amplitude be unitary is,

$$\frac{6}{N_c(N_c-1)}\,(0.9) < \frac{1}{2}. \qquad (10)$$

Clearly, the extrapolated amplitude is unitary already for Nc = 4, which indicates better convergence in Nc than for the ’t Hooft case which became unitary at Nc = 6.

There is still another different feature; consider the typical ππ scattering diagram with an extra internal (two index) quark loop, as shown in Fig. 5.

FIG. 5: Diagram for the scattering amplitude, A including an internal 2 index quark loop.

In this diagram there are four X’s (factor from Eq.(6)), five index loops (factor from Eq.(7)) and six gauge coupling constants.
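The counting rules of the mnemonic and the unitarity condition of Eq. (10) can be checked with a few lines of exact arithmetic. This is a minimal sketch with illustrative function names; it takes the peak value 0.9 read from Fig. 1 as an input and uses only the leading powers (each index loop contributes Nc, each pion insertion 1/Nc, each gauge coupling $1/\sqrt{N_c}$).

```python
from fractions import Fraction


def thooft_factor(nc):
    """Amplitude rescaling 3/Nc of the 't Hooft extrapolation, Eq. (4)."""
    return Fraction(3, nc)


def antisym_factor(nc):
    """Amplitude rescaling 6/(Nc(Nc-1)) of the two index scheme, Eq. (9)."""
    return Fraction(6, nc * (nc - 1))


def min_unitary_nc(factor, peak=Fraction(9, 10), bound=Fraction(1, 2)):
    """Smallest Nc >= 3 for which the rescaled peak lies below the bound."""
    nc = 3
    while peak * factor(nc) >= bound:
        nc += 1
    return nc


def nc_exponent(index_loops, pion_insertions, gauge_couplings):
    """Leading power of Nc of a diagram in the two index mnemonic."""
    return Fraction(index_loops) - pion_insertions - Fraction(gauge_couplings, 2)


print(min_unitary_nc(thooft_factor))    # 6
print(min_unitary_nc(antisym_factor))   # 4
print(nc_exponent(2, 1, 0))             # Fig. 3: F_pi grows like Nc, prints 1
print(nc_exponent(2, 4, 0))             # Fig. 4: prints -2
print(nc_exponent(5, 4, 6))             # Fig. 5: prints -2
```

Figs. 4 and 5 come out with the same power of Nc, so the extra internal two index quark loop costs nothing at leading order.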
These factors combine to give a large Nc scaling behavior proportional to $1/N_c^2$ for the ππ scattering amplitude. We see that diagrams with an extra internal 2 index quark loop are not suppressed compared to the leading diagrams. This is analogous, as pointed out in [46], to the behavior of diagrams with an extra gluon loop in the ’t Hooft extrapolation scheme. Now, Fig. 5 is a diagram which can describe a sigma particle exchange. Thus in the 2 index quark scheme, “exotic” four quark resonances can appear at the leading order in addition to the usual two quark resonances. Given the discussion we reviewed above, the possibility of a sigma appearing at leading order means that one can construct a unitary ππ amplitude already at Nc = 3 in the 2 antisymmetric index scheme. From the point of view of low energy ππ scattering, it seems to be unavoidable to say that the 2 index scheme is more realistic than the ’t Hooft scheme given the existence of a four quark type sigma.

Of course, the usual ’t Hooft extrapolation has a number of other things to recommend it. These include the fact that nearly all meson resonances seem to be of the quark-antiquark type, the predicted OZI rule holds to a good approximation and baryons emerge in an elegant way as solitons in the model.

A fair statement would seem to be that each extrapolation emphasizes different aspects of the true Nc = 3 QCD. In particular, the usual scheme is not really a replacement for the true theory. That appears to be the meaning of the fact that the continuation to Nc > 3 is not unique.

QUARKS IN TWO INDEX SYMMETRIC COLOR REPRESENTATION

Clearly the assignment of quarks to the two index symmetric representation of color SU(3) looks very similar. We may denote the quark fields as,

$$q_{AB} = q_{BA}, \qquad (11)$$

in contrast to Eq.(5). There will be Nc(Nc + 1)/2 different color states for the two index symmetric quarks.
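The color state counts for the different quark assignments are easy to tabulate. The sketch below is illustrative only (the function name and the scanned range of Nc are arbitrary choices); it compares the number of quark color states with the Nc² − 1 gluon states of SU(Nc) and looks for a value of Nc that would reproduce the 3 quark colors and 8 gluons of QCD.

```python
def color_states(nc):
    """Number of color states for each assignment in SU(nc)."""
    return {
        "fundamental": nc,
        "antisymmetric": nc * (nc - 1) // 2,
        "symmetric": nc * (nc + 1) // 2,
        "gluons": nc * nc - 1,
    }


def matches_qcd(nc, representation):
    """True if the assignment gives 3 quark colors and 8 gluons."""
    states = color_states(nc)
    return states[representation] == 3 and states["gluons"] == 8


for rep in ("fundamental", "antisymmetric", "symmetric"):
    print(rep, [nc for nc in range(2, 50) if matches_qcd(nc, rep)])
```

Both the fundamental and the antisymmetric assignments match at Nc = 3, while the symmetric one matches nowhere in the scanned range: it gives 3 quark states only at Nc = 2, where there are just 3 gluons.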
With Nc(Nc + 1)/2 color states, there is no value of Nc for which the symmetric theory can be made to correspond to true QCD. For Nc = 3 there are 6 color states of the quarks and 8 color states of the gluon. If we choose Nc = 2, there are 3 color states of the quarks but unfortunately only three color states of the gluon. On the other hand, for large Nc it would seem reasonable to make approximations like,

$$A^{sym}(N_c) \approx A^{asym}(N_c), \qquad (12)$$

for the ππ scattering amplitude. As far as the large Nc counting goes, the mnemonics in Figs. 2-5 are still applicable to the case of quarks in the two index symmetric color representation. For not so large Nc, the scaling factor for the pion insertion would be

$$\sqrt{\frac{2}{N_c(N_c+1)}}, \qquad (13)$$

and the pion decay constant would scale as

$$F^{sym}_\pi(N_c) \propto \sqrt{\frac{N_c(N_c+1)}{2}}. \qquad (14)$$

With the identification $A^{QCD} = A^{asym}(3)$, the use of Eq.(12) together with Eq.(9) enables us to estimate the large Nc scattering amplitude as,

$$A^{sym}(N_c) \approx \frac{6}{N_c(N_c-1)}\,A^{QCD}. \qquad (15)$$

In applications to recently proposed minimal walking technicolor theories this formula is useful for making estimates involving weak gauge bosons via the Goldstone boson equivalence theorem [69].

Finally we remark on the large Nc scaling rules for meson and glueball masses and decays in either the two index antisymmetric or two index symmetric schemes. Both meson and glueball masses scale as $N_c^0$. Furthermore, all six reactions of the type

$$a \to b + c, \qquad (16)$$

where a, b and c can stand for either a meson or a glueball, scale as 1/Nc. This is illustrated in Fig. 6 for the case of a meson decaying into two glueballs; note that the glueball insertion scales as 1/Nc and that two interaction vertices are involved.

FIG. 6: Diagram for meson decay into two glueballs.

SUMMARY

We have investigated the dependences on the number of colors of the leading ππ scattering amplitudes using the single and the two index quark fields.
We have seen that in the 2 index quark extension of QCD, exotic four quark resonances can appear at the leading order in addition to the usual two quark resonances. From the point of view of low energy ππ scattering the 2 index scheme is more realistic than the ’t Hooft one given the existence of a four quark type sigma. This allows one to explain the long puzzling pion pion s wave scattering up to about 1 GeV.

Of course, the usual ’t Hooft extrapolation has a number of other important predictions to recommend it. A fair statement is that each large Nc extrapolation of QCD captures different aspects of the physical Nc = 3 case.

We have also related the QCD scattering amplitude at large Nc with the one featuring two index symmetric quarks (similar connections exist for adjoint fermions).

The results are interesting for getting a better understanding of the large Nc approach as well as for application to recently proposed technicolor models.

Acknowledgments

It is a pleasure to thank A. Abdel Rehim, D. Black, D.D. Dietrich, A. H. Fariborz, M.T. Frandsen, M. Harada, S. Moussa, S. Nasri and K. Tuominen for helpful discussions. The work of F.S. is supported by the Marie Curie Excellence Grant under contract MEXT-CT-2004-013510 as well as the Danish Research Agency. The work of J.S. is supported in part by the U. S. DOE under Contract no. DE-FG-02-85ER 40231.

∗ Electronic address: francesco.sannino@nbi.dk
† Electronic address: schechte@physics.syr.edu

[1] G. ’t Hooft, Nucl. Phys. B72, 461 (1974). [2] E. Witten, Nucl. Phys. B160, 57 (1979). [3] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122, 345 (1961). [4] M. Gell-Mann and M. Levy, Nuovo Cimento 16, 705 (1960). [5] F. Sannino and J. Schechter, Phys. Rev. D 52, 96 (1995). [6] M. Harada, F. Sannino and J. Schechter, Phys. Rev. D 54, 1991 (1996). [7] M. Harada, F. Sannino and J. Schechter, Phys. Rev. D 69, 034005 (2004) [arXiv:hep-ph/0309206]. [8] J. R. Pelaez, Phys. Rev. Lett. 92, 102001 (2004); 97, 242002 (2006). M.
Uehara, arXiv:hep-ph/0308241, /0401037, /0404221. [9] R. L. Jaffe, Phys. Rev. D 15, 267 (1977); Phys. Rev. D 15, 281 (1977). [10] J. D. Weinstein and N. Isgur, Phys. Rev. Lett. 48, 659 (1982). [11] M. Harada, F. Sannino and J. Schechter, Phys. Rev. Lett. 78, 1603 (1997). [12] See the dedicated conference proceedings, S. Ishida et al “Possible existence of the sigma meson and its implication to hadron physics”, KEK Proceedings 2000-4, Soryushiron Kenkyu 102, No. 5, 2001. Additional points of view are expressed in the proceedings, D. Amelin and A.M. Zaitsev “Hadron Spectroscopy”, Ninth International Conference on Hadron Spectroscopy, Protvino, Russia (2001). [13] E. Van Beveren, T. A. Rijken, K. Metzger, C. Dullemond, G. Rupp and J. E. Ribeiro, Z. Phys. C 30, 615 (1986); E. van Beveren and G. Rupp, Eur. Phys. J. C 10, 469 (1999). See also J. J. De Swart, P. M. Maessen and T. A. Rijken, Talk given at U.S. / Japan Seminar: The Hyperon - Nucleon Interaction, Maui, HI, 25-28 Oct 1993 [arXiv:nucl-th/9405008]. [14] D. Morgan and M. R. Pennington, Phys. Rev. D 48, 1185 (1993). [15] A. A. Bolokhov, A. N. Manashov, V. V. Vereshagin and V. V. Polyakov, Phys. Rev. D 48, 3090 (1993). See also V. A. Andrianov and A. N. Manashov, Mod. Phys. Lett. A 8, 2199 (1993). Extension of this string-like approach to the πK case has been made in V. V. Vereshagin, Phys. Rev. D 55, 5349 (1997) and in A. V. Vereshagin and V. V. Vereshagin, Phys. Rev. D 59, 016002 (1999). [16] N. N. Achasov and G. N. Shestakov, Phys. Rev. D 49, 5779 (1994). [17] R. Kamiński, L. Leśniak and J. P. Maillet, Phys. Rev. D 50, 3145 (1994). [18] N. A. Tornqvist, Z. Phys. C 68, 647 (1995) and references therein. In addition see N. A. Törnqvist and M. Roos, Phys. Rev. Lett. 76, 1575 (1996); N. A. Tornqvist, Talk given at 7th International Conference on Hadron Spectroscopy (Hadron 97), Upton, NY, 25-30 Aug 1997 and at EuroDaphne Meeting, Barcelona, Spain, 6-9 Nov 1997 [arXiv:hep-ph/9711483]; Phys. Lett.
B 426, 105 (1998). [19] R. Delbourgo and M. D. Scadron, Mod. Phys. Lett. A 10, 251 (1995); See also D. Atkinson, M. Harada and A. I. Sanda, Phys. Rev. D 46, 3884 (1992). [20] G. Janssen, B. C. Pearce, K. Holinde and J. Speth, Phys. Rev. D 52, 2690 (1995). [21] M. Svec, Phys. Rev. D 53, 2343 (1996). [22] S. Ishida, M. Ishida, H. Takahashi, T. Ishida, K. Takamatsu and T. Tsuru, Prog. Theor. Phys. 95, 745 (1996); S. Ishida, M. Ishida, T. Ishida, K. Takamatsu and T. Tsuru, Prog. Theor. Phys. 98, 621 (1997). See also M. Ishida and S. Ishida, Talk given at 7th International Conference on Hadron Spectroscopy (Hadron 97), Upton, NY, 25-30 Aug 1997 [arXiv:hep-ph/9712231]. [23] D. Black, A. H. Fariborz, F. Sannino and J. Schechter, Phys. Rev. D 58, 054012 (1998). [24] D. Black, A. H. Fariborz, F. Sannino and J. Schechter, Phys. Rev. D 59, 074026 (1999). [25] L. Maiani, A. Polosa, F. Piccinini and V. Riquer, Phys. Rev. Lett. 93, 212002 (2004). Here the characteristic form for a four quark scalar coupling to two pions was obtained as in [24] above but with the difference that non-derivative coupling rather than derivative coupling was used. The derivative coupling appeared in [24] since the context was that of a non-linear chiral Lagrangian. [26] J. A. Oller, E. Oset and J. R. Pelaez, Phys. Rev. Lett. 80, 3452 (1998). See also K. Igi and K. I. Hikasa, Phys. Rev. D 59, 034005 (1999). [27] A. V. Anisovich and A. V. Sarantsev, Phys. Lett. B 413, 137 (1997). [28] V. Elias, A. H. Fariborz, F. Shi and T. G. Steele, Nucl. Phys. A 633, 279 (1998). [29] V. Dmitrasinovic, Phys. Rev. C 53, 1383 (1996). [30] P. Minkowski and W. Ochs, Eur. Phys. J. C 9, 283 (1999). [31] S. Godfrey and J. Napolitano, Rev. Mod. Phys. 71, 1411 (1999). [32] L. Burakovsky and T. Goldman, Phys. Rev.
D 57, 2879 (1998). [33] A. H. Fariborz and J. Schechter, Phys. Rev. D 60, 034002 (1999). [34] T. Hatsuda, T. Kunihiro and H. Shimizu, Phys. Rev. Lett. 82, 2840 (1999); S. Chiku and T. Hatsuda, Phys. Rev. D 58, 076001 (1998). [37] L. S. Celenza, S. f. Gao, B. Huang, H. Wang and C. M. Shakin, Phys. Rev. C 61, 035201 (2000). [38] D. Black, A. H. Fariborz and J. Schechter, Phys. Rev. D 61, 074030 (2000). See also V. Bernard, N. Kaiser and U. G. Meissner, Phys. Rev. D 44, 3698 (1991). [39] D. Black, A. H. Fariborz and J. Schechter, Phys. Rev. D 61, 074001 (2000). [40] D. Black, A. H. Fariborz, S. Moussa, S. Nasri and J. Schechter, Phys. Rev. D 64, 014031 (2001). [41] In addition to [39] and [40] above see T. Teshima, I. Kitamura and N. Morisita, J. Phys. G 28, 1391 (2002); F. Close and N. Tornqvist, ibid. 28, R249 (2002); A.H. Fariborz, Int. J. Mod. Phys. A 19, 2095 (2004); 5417 (2004); Phys. Rev. D 74, 054030 (2006); F. Giacosa, Th. Gutsche, V.E. Lyubovitskij and A. Faessler, Phys. Lett. B 622, 277 (2005); J. Vijande, A. Valcarce, F. Fernandez and B. Silvestre-Brac, Phys. Rev. D 72, 034025 (2005); S. Narison, Phys. Rev. D 73, 114024 (2006); L. Maiani, F. Piccinini, A.D. Polosa and V. Riquer, hep-ph/0604018. [42] Y. S. Kalashnikova, A. E. Kudryavtsev, A. V. Nefediev, C. Hanhart and J. Haidenbauer, Eur. Phys. J. A 24 (2005) 437 [hep-ph/0412340]. [43] The Roy equation for the pion amplitude, S.M. Roy, Phys. Lett. B 36, 353 (1971), has been used by several authors to obtain information about the f0(600) resonance. See T. Sawada, page 67 of ref. [12] above, I. Caprini, G. Colangelo and H. Leutwyler, Phys. Rev. Lett. 96, 132001 (2006). A similar approach has been employed to study the putative light kappa by S. Descotes-Genon and B.
Moussallam, Eur. Phys. J. C 48, 553 (2006). [44] Further discussion of the approach in ref. [43] above is given in D.V. Bugg, J. Phys. G 34, 151 (2007) [hep-ph/0608081]. [45] E. Corrigan and P. Ramond, Phys. Lett. B 87, 73 (1979). [46] E. B. Kiritsis and J. Papavassiliou, Phys. Rev. D 42, 4238 (1990). [47] A. Armoni, M. Shifman and G. Veneziano, Nucl. Phys. B 667, 170 (2003) [arXiv:hep-th/0302163]. [48] A. Armoni, M. Shifman and G. Veneziano, Phys. Rev. Lett. 91, 191601 (2003) [arXiv:hep-th/0307097]. [49] A. Armoni, M. Shifman and G. Veneziano, Phys. Rev. D 71, 045015 (2005) [arXiv:hep-th/0412203]. [50] A. Armoni, G. Shore and G. Veneziano, Nucl. Phys. B 740, 23 (2006) [arXiv:hep-ph/0511143]. [51] A. Armoni, M. Shifman and G. Veneziano, Phys. Lett. B 647, 515 (2007) [arXiv:hep-th/0701229]. [52] F. Sannino and M. Shifman, Phys. Rev. D 69, 125004 (2004) [arXiv:hep-th/0309252]. [53] A. Feo, P. Merlatti and F. Sannino, Phys. Rev. D 70, 096004 (2004) [arXiv:hep-th/0408214]. P. Merlatti and F. Sannino, Phys. Rev. D 70, 065022 (2004) [arXiv:hep-th/0404251]. [54] M. Unsal, arXiv:hep-th/0703025. [55] M. Unsal and L. G. Yaffe, Phys. Rev. D 74, 105019 (2006) [arXiv:hep-th/0608180]. [56] P. Kovtun, M. Unsal and L. G. Yaffe, Phys. Rev. D 72, 105006 (2005) [arXiv:hep-th/0505075]. [57] F. Sannino, Phys. Rev. D 72, 125006 (2005) [arXiv:hep-th/0507251]. [58] M. T. Frandsen, C. Kouvaris and F. Sannino, Phys. Rev. D 74, 117503 (2006) [arXiv:hep-ph/0512153]. [59] F. Sannino and K. Tuominen, Phys. Rev. D 71, 051901 (2005) [arXiv:hep-ph/0405209]. [60] D. D. Dietrich and F. Sannino, arXiv:hep-ph/0611341. To Appear in Phys. Rev. D. [61] D. K. Hong, S. D. H. Hsu and F. Sannino, Phys. Lett. B 597, 89 (2004) [arXiv:hep-ph/0406200]. [62] D. D. Dietrich, F. Sannino and K. Tuominen, Phys. Rev. D 72, 055001 (2005) [arXiv:hep-ph/0505059]. Phys. Rev. D 73, 037701 (2006) [arXiv:hep-ph/0510217]. [63] K. Kainulainen, K. Tuominen and J. Virkajarvi, arXiv:hep-ph/0612247. [64] S. B. Gudnason, C. 
Kouvaris and F. Sannino, Phys. Rev. D 74, 095008 (2006) [arXiv:hep-ph/0608055]. [65] C. Kouvaris, arXiv:hep-ph/0703266. [66] S. B. Gudnason, T. A. Ryttov and F. Sannino, arXiv:hep-ph/0612230. [67] T. A. Ryttov and F. Sannino, Phys. Rev. D 73, 016002 (2006) [arXiv:hep-th/0509130]. [68] H. Georgi, Nucl. Phys. B 266, 274 (1986). [69] J.M. Cornwall, D.N. Levin and G. Tiktopoulos, Phys. Rev. D 10, 1145 (1974); B.W. Lee, C. Quigg and H.B. Thacker, Phys. Rev. D 16, 1519 (1977); M.S. Chanowitz and M.K. Gaillard, Nucl. Phys. B 261, 379 (1985).

ABSTRACT We compare the dependences on the number of colors of the leading pion pion scattering amplitudes using the single index quark field and two index quark fields. These are seen to have different relationships to the scattering amplitudes suggested by chiral dynamics which can explain the long puzzling pion pion s wave scattering up to about 1 GeV.
This may be interesting for getting a better understanding of the large Nc approach as well as for application to recently proposed technicolor models. <|endoftext|><|startoftext|>

Introduction

Correlative analysis is an essential tool for disentangling causal relations in the solar atmosphere. Recently, Rutten & Krijger (2003) and Rutten et al. (2004) quantified the correlation of the reversed granulation observed in the low chromosphere and mid-photosphere with surface granulation, in a quest for the nature of the internetwork background brightness patterns in these layers. In agreement with these studies, Puschmann et al. (2003) demonstrated that filtering out the p-modes is necessary for studying the convective structures in the solar photosphere, because p-modes mostly reduce the correlation between various line parameters. Odert et al. (2005) showed that correlation coefficients can fluctuate strongly in time, with amplitudes of over 0.4 due to 5-min oscillations, and that the amplitudes are larger for lines formed higher. For weak lines the situation is even worse, because correlations derived from them are more strongly influenced by seeing.

In this paper, we address the dissimilarity between non-magnetic and magnetic regions seen in height variations of the correlation between temperature and line-of-sight velocity. We compare our results with a similar study by Rodríguez Hidalgo et al. (1999). Our analysis follows on from the paper of Koza et al. (2006, henceforth Paper I) and we invite the reader to have that paper at hand for further reference.

2 Observational data and inversion procedure

We use a time sequence of spectrograms obtained at the German Vacuum Tower Telescope at the Observatorio del Teide on April 28, 2000. The inversion code SIR (Ruiz Cobo & del Toro Iniesta 1992) was employed in the analysis of this observation. Thorough descriptions of the observational data, inversion procedure, and spectral lines are given in Paper I.
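The quantity studied in what follows is a correlation coefficient between temperature and line-of-sight velocity, evaluated separately at each optical depth over all slit pixels and time steps. A minimal sketch of such a height profile is given below; it assumes the ordinary Pearson coefficient (the text does not name the estimator), and the function names and the nested-list data layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def correlation_profile(temperature, velocity):
    """Correlation at each optical depth.

    temperature[d][i] and velocity[d][i] hold the value at depth index d
    for sample i (one sample per slit pixel and time step).
    """
    return [pearson(t, v) for t, v in zip(temperature, velocity)]
```

With the sign convention used here (negative velocities are upflows), hot upflowing granules and cool downflowing intergranular lanes give a negative coefficient.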
http://arxiv.org/abs/0704.0603v1 140 J. Koza et al.: Temperature – velocity correlation in the solar photosphere Figure 1. The height variation of the correlation between line-of-sight velocity and temperature for the results of Rodrı́guez Hidalgo et al. (1999) (solid) and for the non-magnetic (dashed) and magnetic (dotted) region defined in Paper I. 3 Results Figure 1 shows the height variations of the correlation between temperature and line-of-sight velocity for three different sets of data. The results of Rodrı́guez Hidalgo et al. (1999) in- dicate the significant anticorrelation between granules and intergranular lanes reaching the maximum of about −0.7 at log τ5 = −0.2. The subsequent weakening of this anticorrelation over the log τ5 ∈ 〈−0.2,−1.0〉 range is followed by a rise of correlation up to 0.28 at the middle photosphere at log τ5 = −1.75. No significant correlation exists in the upper photo- sphere. In the lower layers of the non-magnetic region (Paper I) the anticorrelation decreases to −0.63 at log τ5 = −0.4. However, in the middle photosphere temperature and line-of-sight velocity are almost uncorrelated with a local peak value of 0.08 at log τ5 = −1.7. Higher up at log τ5 = −2.9 the anticorrelation of about −0.42 is established again. In the sub-photospheric layers of the magnetic region the anticorrelation of −0.6 was found at log τ5 = 0.5. An approximately constant value of anticorrelation −0.2 was obtained in the low and middle photosphere. In the upper photosphere the anticorrelation reaches again −0.6. Figure 2 compares temperatures and line-of-sight velocities in the form of scatter corre- lation plots. Each plotted sample represents temperature and line-of-sight velocity specified along the x and y axes at a given pixel along the slit at a time within the interval of 15 min. From the top down, the row panels show correlations of the results of Rodrı́guez Hidalgo et al. 
(1999) and our results in the non-magnetic and magnetic region in three selected optical depths log τ5 = −0.3, −1.3, and −1.8. Plot saturation is avoided by showing sample density contours rather than individual points, except for the extreme outliers. The total distributions of temperatures and line-of-sight velocities are shown at the top and the left side of each panel, respectively. The first-moment curves are aligned at large correlation and become perpendicular in the absence of any correlation (Rutten & Krijger 2003).

Figure 2. Height-dependent scatter correlations of the line-of-sight velocity versus temperature. Top row: data from Rodríguez Hidalgo et al. (1999, see p. 315, Fig. 1). Middle and bottom row: our data for the non-magnetic and magnetic region (Paper I), respectively. The optical depths log τ5 and correlation coefficients cc are specified at each panel. Negative velocities indicate upflows. The rescaled total distributions of temperatures and line-of-sight velocities are shown as solid curves at the top and the left side of each panel, respectively. The dashed curves show the first moments of the sample density distributions over temperature and velocity bins.

The first column in Fig. 2 shows good agreement of correlation coefficients and positions of maxima of velocity distributions in the non-magnetic region with the results of Rodríguez Hidalgo et al. (1999). However, the temperature distributions are dissimilar both in terms of asymmetry and also the positions of maxima. Our results indicate predominance of higher temperatures in the sample in contrast with lower temperatures prevailing in the results of Rodríguez Hidalgo et al. (1999). In the magnetic region, weak anticorrelation was found. The temperature distribution in this region is almost symmetric with maximum at higher temperatures than in the non-magnetic region. The second column of Fig.
2 corresponds to the layers where granulation is almost erased. While the temperature distributions in the non-magnetic region and in the results of Rodríguez Hidalgo et al. (1999) are symmetric, in the magnetic region the asymmetry indicates an abundance of higher temperatures. The positive correlation in the results of Rodríguez Hidalgo et al. (1999) shown in the third column suggests reversed granulation. However, this is not seen in our results. In the magnetic region the asymmetries of the temperature and velocity distributions indicate a higher abundance of relatively hotter pixels with faster upflows.

4 Discussion

Figures 1 and 2 show dissimilarities both in the height variations of the correlation and in the distributions, although both we and Rodríguez Hidalgo et al. (1999) used VTT observations and the SIR code. Because the maximum of anticorrelation found in the sub-photospheric layers of the magnetic region is out of the range of sensitivity of the spectral lines (Paper I), we disregard this feature. The very low anticorrelation found over log τ5 ∈ 〈0.0,−2.0〉 in the magnetic region (Fig. 1) suggests that the magnetic field is another important decorrelating factor along with 5-min oscillations and seeing (Puschmann et al. 2003; Odert et al. 2005). In our results, the middle layers of the non-magnetic and magnetic region lack signatures of reversed granulation (Fig. 1). The sinusoidal shape of the correlation coefficient in the non-magnetic region over the log τ5 ∈ 〈−1.2,−3.5〉 range can be explained as a sum of the positive correlation typical of reversed granulation and the anticorrelation characteristic of 5-min oscillations.

5 Summary

Using a time sequence of high-resolution spectrograms and the SIR inversion code we have inferred the height variation of the correlation between the temperature and line-of-sight velocity throughout the quiet solar photosphere and a small magnetic region. 
The most important as- pect is comparison of the results with the akin study by Rodrı́guez Hidalgo et al. (1999). We found in agreement with Rodrı́guez Hidalgo et al. (1999) that the maximum anticorrelation −0.6 between the temperature and line-of-sight velocity in the non-magnetic region occurs at log τ5 = −0.4. The absence of signatures of reversed granulation in the middle layers of the non-magnetic region is likely to be due to 5-min oscillations, which negative anticorrelation prevails over weaker positive correlation typical for reversed granulation. Our results show that magnetic field is another decorrelating factor along with 5-min oscillations and seeing. Acknowledgements. The VTT is operated by the Kiepenheuer-Institut für Sonnenphysik, Freiburg, at the Observatorio del Teide of the Instituto de Astrofı́sica de Canarias. We are grateful to B. Ruiz Cobo (IAC) for kindly providing of the original data used in Figs. 1 and 2. This research is part of the European Solar Magnetism Network (EC/RTN contract HPRN-CT-2002-00313). This work was supported by the Slovak grant agency VEGA (2/6195/26) and by the Deutsche Forschungsgemein- schaft grant (DFG 436 SLK 113/7). J. Koza’s research is supported by a Marie Curie Intra-European Fellowships within the 6th European Community Framework Programme. References Koza, J., Kučera, A., Rybák, J., & Wöhl, H. 2006, A&A, 458, 941, (Paper I) Odert, P., Hanslmeier, A., Rybák, J., Kučera, A., & Wöhl, H. 2005, A&A, 444, 257 Puschmann, K., Vázquez, M., Bonet, J. A., Ruiz Cobo, B., & Hanslmeier, A. 2003, A&A, 408, 363 Rodrı́guez Hidalgo, I., Ruiz Cobo, B., Collados, M., & del Toro Iniesta, J. C. 1999, in ASP Conf. Ser. 173: Stellar Structure: Theory and Test of Connective Energy Transport, ed. A. Giménez, E. F. Guinan, & B. Montesinos, 313 Ruiz Cobo, B. & del Toro Iniesta, J. C. 1992, ApJ, 398, 375 Rutten, R. J., de Wijn, A. G., & Sütterlin, P. 2004, A&A, 416, 333 Rutten, R. J. & Krijger, J. M. 
2003, A&A, 407, 735 Introduction Observational data and inversion procedure Results Discussion Summary ABSTRACT We derive correlation coefficients between temperature and line-of-sight velocity as a function of optical depth throughout the solar photosphere for the non-magnetic photosphere and a small area of enhanced magnetic activity. The maximum anticorrelation of about -0.6 between temperature and line-of-sight velocity in the non-magnetic photosphere occurs at log tau5 = -0.4. The magnetic field is another decorrelating factor along with 5-min oscillations and seeing. <|endoftext|><|startoftext|> Magnetism in the high-Tc analogue Cs2AgF4 studied with muon-spin relaxation T. Lancaster,1, ∗ S.J. Blundell,1 P.J. Baker,1 W. Hayes,1 S.R. Giblin,2 S.E. McLain,2 F.L. Pratt,2 Z. Salman,1, 2 E.A. Jacobs,3 J.F.C. Turner,3 and T. Barnes3 Clarendon Laboratory, Oxford University Department of Physics, Parks Road, Oxford, OX1 3PU, UK ISIS Facility, Rutherford Appleton Laboratory, Chilton, Oxfordshire OX11 0QX, UK Department of Chemistry and Neutron Sciences Consortium, University of Tennessee, Knoxville, Tennessee 37996, USA (Dated: November 4, 2018) We present the results of a muon-spin relaxation study of the high-Tc analogue material Cs2AgF4. We find unambiguous evidence for magnetic order, intrinsic to the material, below TC = 13.95(3) K. The ratio of inter- to intraplane coupling is estimated to be |J ′/J | = 1.9 × 10−2, while fits of the temperature dependence of the order parameter reveal a critical exponent β = 0.292(3), implying an intermediate character between pure two- and three- dimensional magnetism in the critical regime. Above TC we observe a signal characteristic of dipolar interactions due to linear F–µ +–F bonds, allowing the muon stopping sites in this compound to be characterized. 
PACS numbers: 74.25.Ha, 74.72.-h, 75.40.Cx, 76.75.+i

Twenty years after its discovery, high-Tc superconductivity remains one of the most pressing problems in condensed matter physics. High-TC cuprates share a layered structure of [CuO2] planes with strong antiferromagnetic (AFM) interactions between S = 1/2 3d9 Cu2+ ions [1, 2]. However, analogous materials based upon 3d transition metal systems such as manganites [3] and nickelates [4] share neither the magnetic nor the superconducting properties of the high-TC cuprates, leading to speculation that the spin-1/2 character of Cu2+ is unique in this context. A natural extension to this line of inquiry is to explore compounds based on the 4d analogue of Cu2+, namely S = 1/2 4d9 Ag2+ [5]; this motivated the synthesis of the layered fluoride Cs2AgF4 which contains silver in the unusual divalent oxidation state [6, 7]. This material possesses several structural similarities with the superconducting parent compound La2CuO4; it is comprised of planes of [AgF2] instead of [CuO2] separated by planes of [CsF] instead of [LaO] (Fig. 1).

FIG. 1: (Color online.) Structure of Cs2AgF4 showing a possible magnetic structure. Candidate muon sites occur in both the [CsF] and [AgF2] planes.

Magnetic measurements [7] suggest that in contrast to the antiferromagnetism of La2CuO4, Cs2AgF4 is well modelled as a two-dimensional (2D) Heisenberg ferromagnet (described by the Hamiltonian $H = J \sum_{\langle ij \rangle} \mathbf{S}_i \cdot \mathbf{S}_j$) with intralayer coupling J/kB = 44.0 K. The observation of a magnetic transition below TC ≈ 15 K with no spontaneous magnetization in zero applied field (ZF) and a small saturation magnetization (∼ 40 mT), suggests the existence of a weak, AFM interlayer coupling. This behavior is reminiscent of the 2D ferromagnet K2CuF4 [8], where ferromagnetic (FM) exchange results from orbital ordering driven by a Jahn-Teller distortion [9, 10]. 
On this basis, it has been suggested that in Cs2AgF4 a staggered ordering of dz2−x2 and dz2−y2 hole- containing orbitals on the Ag2+ ions gives rise to the FM superexchange [7]. An alternative scenario has also been advanced on the basis of density functional calcula- tions in which a d3z2−r2 − p− dx2−y2 orbital interaction through the Ag–F–Ag bridges causes spin polarization of the dx2−y2 band [11]. Although inelastic neutron scattering measurements have been carried out on this material [7], both Cs and Ag strongly absorb neutrons, resulting in limited resolu- tion and a poor signal-to-noise ratio. In contrast, spin- polarized muons, which are very sensitive probes of local magnetic fields, suffer no such impediments and, as we shall see, are ideally suited to investigations of the mag- netism in fluoride materials. In this paper we present the results of a ZF muon- spin relaxation (µ+SR) investigation of Cs2AgF4. We are able to confirm that the material is uniformly ordered throughout its bulk below TC and show that the critical behavior associated with the magnetic phase transition is intermediate in character between 2D and 3D. In ad- dition, strong coupling between the muon and F− ions allows us to characterise the muon stopping states in this http://arxiv.org/abs/0704.0604v1 compound. ZF µ+SR measurements were made on the MuSR in- strument at the ISIS facility, using an Oxford Instru- ments Variox 4He cryostat. In a µ+SR experiment spin- polarized positive muons are stopped in a target sample, where the muon usually occupies an interstitial position in the crystal. The observed property in the experiment is the time evolution of the muon spin polarization, the behavior of which depends on the local magnetic field B at the muon site [12]. Polycrystalline Cs2AgF4 was syn- thesised as previously reported [7]. 
Due to its chemical reactivity, the sample was mounted under an Ar atmosphere in a gold plated Ti sample holder with a cylindrical sample space of diameter 2.5 cm and depth 2 mm. A 25 µm thick window was screw-clamped onto a gold o-ring on the main body of the sample holder resulting in an airtight seal.

Example ZF µ+SR spectra measured on Cs2AgF4 are shown in Fig. 2(a). Below TC (Fig. 2(c)) we observe oscillations in the time dependence of the muon polarization (the “asymmetry” A(t) [12]) which are characteristic of a quasi-static local magnetic field at the muon stopping site. This local field causes a coherent precession of the spins of those muons for which a component of their spin polarization lies perpendicular to this local field (expected to be 2/3 of the total spin polarization for a powder sample). The frequency of the oscillations is given by νi = γµ|Bi|/2π, where γµ is the muon gyromagnetic ratio (= 2π × 135.5 MHz T−1) and Bi is the average magnitude of the local magnetic field at the ith muon site. Any fluctuation in magnitude of these local fields will result in a relaxation of the oscillating signal [13], described by relaxation rates λi.

Maximum entropy analysis (inset, Fig. 2(c)) reveals two separate frequencies in the spectra measured below TC, corresponding to two magnetically inequivalent muon stopping sites in the material. The precession frequencies, which are proportional to the internal magnetic field experienced by the muon, may be viewed as an effective order parameter for these systems [12]. In order to extract the temperature dependence of the frequencies, the low temperature data were fitted to the function

$$A(t) = \sum_{i=1}^{2} A_i \exp(-\lambda_i t)\cos(2\pi\nu_i t) + A_3 \exp(-\lambda_3 t) + A_{\rm bg}, \qquad (1)$$

where A1 and A2 are the amplitudes of the precession signals and A3 accounts for the contribution from those muons with a spin component parallel to the local magnetic field. 
The term Abg reflects the non-relaxing signal from those muons which stop in the sample holder or cryostat tail. The ratio of the two precession frequencies was found to be ν2/ν1 = 0.83 across the temperature range T < TC and this ratio was fixed in the fitting procedure. The FIG. 2: (Color online.) (a) Temperature evolution of ZF µ+SR spectra measured on Cs2AgF4 between 1.3 K and 59.7 K. (b) Above TC low frequency oscillations are observed due to the dipole-dipole coupling of F–µ+–F states. Inset: The energy level structure allows three transitions, leading to three observed frequencies. (c) Below TC higher frequency oscillations are observed due to quasistatic magnetic fields at the muon sites. Inset: Maximum entropy analysis reveal two magnetic frequencies corresponding to two magnetically inequivalent muon sites. amplitudes Ai were found to be constant across the tem- perature range and were fixed at values A1 = 1.66%, A2 = 3.74% and A3 = 5.54%. This shows that the prob- ability of a muon stopping in a site that gives rise to fre- quency ν1 is approximately half that of a muon stopping in a site that corresponds to ν2. We note also that A3 is in excess of the expected ratio of A3/(A1 +A2) = 1/2. The unambiguous assignment of amplitudes is made difficult by the resolution limitations that a pulsed muon source places on the measurement. The initial muon pulse at ISIS has FWHM τmp ∼ 80 ns, limiting the response for frequencies above ∼ τ−1mp [12]. We should expect, there- fore, slightly reduced amplitudes or increased relaxation (see below) for the oscillating components in our spectra for which ν1,2 & 5 MHz. The amplitudes of the oscilla- tions are large enough, however, for us to conclude that the magnetic order in this material is an intrinsic prop- erty of the bulk compound. Moreover, above TC there is a complete recovery of the total expected muon asymme- try. 
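As a sketch of how the fitted amplitudes enter Eq. (1), the function below evaluates the model asymmetry with the reported values A1 = 1.66%, A2 = 3.74%, A3 = 5.54% and the fixed ratio ν2/ν1 = 0.83. The relaxation rates, the background amplitude and the value of ν1 passed in the example call are placeholders chosen for illustration, not fitted values from the text.

```python
import math

A1, A2, A3 = 1.66, 3.74, 5.54   # amplitudes in percent, as quoted in the text
FREQ_RATIO = 0.83               # nu2 / nu1, held fixed in the fits

def asymmetry(t_us, nu1_mhz, lam1, lam2, lam3, a_bg):
    """Eq. (1): two damped precession terms, one relaxing term, flat background.

    t_us is in microseconds, nu1_mhz in MHz, relaxation rates in 1/microsecond.
    """
    nu2_mhz = FREQ_RATIO * nu1_mhz
    osc1 = A1 * math.exp(-lam1 * t_us) * math.cos(2 * math.pi * nu1_mhz * t_us)
    osc2 = A2 * math.exp(-lam2 * t_us) * math.cos(2 * math.pi * nu2_mhz * t_us)
    return osc1 + osc2 + A3 * math.exp(-lam3 * t_us) + a_bg

# Placeholder (hypothetical) parameters, for illustration only.
a0 = asymmetry(0.0, nu1_mhz=6.0, lam1=1.0, lam2=2.0, lam3=0.05, a_bg=10.0)

ratio_sites = A1 / A2           # occupancy of the nu1 site relative to the nu2 site
ratio_long = A3 / (A1 + A2)     # to be compared with the expected value 1/2
```

The two ratios at the end correspond to the statements above that the ν1 site is populated roughly half as often as the ν2 site and that A3 exceeds the expected A3/(A1 + A2) = 1/2.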
This observation, along with the constancy of A1,2,3 below TC, leads us to believe that Cs2AgF4 is completely ordered throughout its bulk below TC.

FIG. 3: Results of fitting data measured below TC to Eq. (1). (a) Evolution of the muon-spin precession frequencies ν1 (closed circles) and ν2 (open circles) with temperature. Solid lines are fits to the function νi(T) = νi(0)(1 − T/TC)^β described in the text. Inset: Scaling plot of the precession frequencies with parameters TC = 13.95(3) K and β = 0.292. (b) Relaxation rates λ1 (closed circles), λ2 (open circles) and λ3 (closed triangles), as a function of temperature showing a rapid increase as TC is approached from below.

Fig. 3(a) shows the evolution of the precession frequencies νi, allowing us to investigate the critical behavior associated with the phase transition. From fits of νi to the form νi(T) = νi(0)(1 − T/TC)^β for T > 10 K, we estimate TC = 13.95(3) K and β = 0.292(3). In fact, good fits to νi(T) = νi(0)(1 − T/13.95)^0.292 are achieved over the entire measured temperature range (that is, no spin wave related contribution was evident at low temperatures), yielding ν1(0) = 6.0(1) MHz and ν2(0) = 4.9(2) MHz corresponding to local magnetic fields at the two muon sites of B1 = 44(1) mT and B2 = 36(1) mT. A value of β = 0.292(3) is less than expected for three dimensional models (β = 0.367 for 3D Heisenberg), but larger than expected for 2D models (β = 0.23 for 2D XY or β = 0.125 for 2D Ising) [14, 15]. This suggests that in the critical regime the behavior is intermediate in character between 2D and 3D; this contrasts with the magnetic properties of K2CuF4 where β = 0.33, typical of a 3D system, is observed in the reduced temperature region tr ≡ (TC − T)/TC > 7 × 10^−2, with a crossover to more 2D-like behavior at tr < 7 × 10^−2, where β = 0.22 [16, 17, 18]. 
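The conversion from the zero-temperature frequencies to local fields, and the power-law form used for the order parameter, can be checked with a few lines. The sketch uses only values quoted above (γµ/2π = 135.5 MHz T−1, TC = 13.95 K, β = 0.292); the function names are illustrative.

```python
GAMMA_MU_OVER_2PI = 135.5   # MHz per tesla, muon gyromagnetic ratio / (2*pi)
T_C = 13.95                 # K
BETA = 0.292

def local_field_mT(nu_mhz):
    """Invert nu = gamma_mu * |B| / (2*pi) and return |B| in millitesla."""
    return 1000.0 * nu_mhz / GAMMA_MU_OVER_2PI

def precession_frequency(temp_K, nu0_mhz, t_c=T_C, beta=BETA):
    """Power law nu(T) = nu(0) * (1 - T/T_C)**beta, set to zero at and above T_C."""
    if temp_K >= t_c:
        return 0.0
    return nu0_mhz * (1.0 - temp_K / t_c) ** beta

b1 = local_field_mT(6.0)    # compare with B1 = 44(1) mT
b2 = local_field_mT(4.9)    # compare with B2 = 36(1) mT
nu1_at_10K = precession_frequency(10.0, 6.0)
```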
Our measurements probe the behavior of Cs2AgF4 for tr ≥ 5.5 × 10−3, for which we do not ob- serve any crossover. A knowledge of TC and the intraplane coupling J , al- lows us to estimate the interplane coupling J ′. Recent studies of layered S = 1/2 Heisenberg ferromagnets us- ing the spin-rotation invariant Green’s function method [19], show that the interlayer coupling may be described by an empirical formula = exp with a = 2.414 and b = 2.506. Substituting our value of TC = 13.95 K and using |J |/kB = 44.0 K [7], we obtain |J ′|/kB = 0.266 K and |J ′/J | = 1.9× 10−2. The applica- tion of this procedure to K2CuF4 (for which TC = 6.25 K and |J |/kB = 20.0 K [8]) results in |J ′|/kB = 0.078 K and |J ′/J | = 3.9 × 10−3. This suggests that, although highly anisotropic, the interlayer coupling is stronger in Cs2AgF4 than in K2CuF4. This may account for the lack of dimensional crossover in Cs2AgF4 down to tr = 5.5× 10−3. Both transverse depolarization rates λ1 and λ2 are seen to decrease with increasing temperature (Fig. 3(b)) ex- cept close to TC where they rapidly increase. The large values of λ1,2 at low temperatures may reflect the re- duced frequency response of the signal due to the muon pulse width described above. The large upturn in the depolarization rate close to TC, which is also seen in the longitudinal relaxation rate λ3 (which is small and nearly constant except on approach to TC), may be at- tributed to the onset of critical fluctuations close to TC. The component in the spectra with the larger precession frequency ν1 has the smaller depolarization rate λ1 at all temperatures. These features provide further evidence for a magnetic phase transition at TC = 13.95 K. Above TC the character of the measured spectra changes considerably (Fig.2(a) and (b)) and we observe lower frequency oscillations characteristic of the dipole- dipole interaction of the muon and the 19F nucleus [20]. 
The Ag2+ electronic moments, which dominate the spectra for T < TC, are no longer ordered in the paramagnetic regime, and fluctuate very rapidly on the muon time scale. They are therefore motionally narrowed from the spectra, leaving the muon sensitive to the quasistatic nuclear magnetic moments. This interpretation is supported by µ+SR measurements of K2CuF4 where similar behavior was observed [21]. In many materials containing fluorine, the muon and two fluorine ions form a strong hydrogen bond usually separated by approximately twice the F− ionic radius. The linear F–µ+–F spin system consists of four distinct energy levels with three allowed transitions between them (inset, Fig. 2(b)) giving rise to the distinctive three-frequency oscillations observed. The signal is described by a polarization function [20]

$$D(\omega_d t) = \frac{1}{6}\left[3 + \sum_{j=1}^{3} u_j \cos(\omega_j t)\right],$$

where $u_1 = 1$, $u_2 = (1 + 1/\sqrt{3})$ and $u_3 = (1 - 1/\sqrt{3})$. The transition frequencies (shown in Fig. 2(b)) are given by $\omega_j = 3u_j\omega_d/2$ where $\omega_d = \mu_0\gamma_\mu\gamma_F\hbar/4\pi r^3$, γF is the nuclear gyromagnetic ratio and r is the µ+–19F separation. This function accounts for the observed frequencies very well, leading us to conclude that the F–µ+–F bonds are highly linear.

A successful fit of our data required the multiplication of D(ωdt) by an exponential function with a small relaxation rate λ4, crudely modelling fluctuations close to TC. The addition of a further exponential component A5 exp(−λ5t) was also required in order to account for those muon sites not strongly dipole coupled to fluorine nuclei. The data were fitted with the resulting relaxation function

$$A(t) = A_4 D(\omega_d t)\exp(-\lambda_4 t) + A_5 \exp(-\lambda_5 t) + A_{\rm bg}. \qquad (3)$$

The frequency ωd was found to be constant at all measured temperatures, taking the value ωd = 2π × 0.211(1) MHz, which corresponds to a constant F–µ+ separation of 1.19(1) Å, typical of linear bonds [20]. 
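The quoted separation can be reproduced from ωd. The sketch below assumes the dipolar frequency in SI form, ωd = µ0 γµ γF ħ / (4π r^3), with gyromagnetic ratios in rad s^-1 T^-1; the 19F value γF/2π ≈ 40.08 MHz T−1 is a standard tabulated number that is not given in the text, and the function name is illustrative.

```python
import math

MU0_OVER_4PI = 1.0e-7                 # T m / A
HBAR = 1.054571817e-34                # J s
GAMMA_MU = 2 * math.pi * 135.5e6      # rad s^-1 T^-1, value quoted in the text
GAMMA_F = 2 * math.pi * 40.08e6       # rad s^-1 T^-1, assumed 19F value

def mu_f_separation(omega_d):
    """Solve omega_d = (mu0/4pi) * gamma_mu * gamma_F * hbar / r**3 for r in metres."""
    return (MU0_OVER_4PI * GAMMA_MU * GAMMA_F * HBAR / omega_d) ** (1.0 / 3.0)

omega_d = 2 * math.pi * 0.211e6       # rad/s, fitted value from the text
r_angstrom = mu_f_separation(omega_d) * 1e10
ff_distance = 2 * r_angstrom          # F-F distance for a linear F-mu-F unit
print(f"r = {r_angstrom:.2f} A, F-F = {ff_distance:.2f} A")
```

Because r scales as ωd^(-1/3), the 0.5% uncertainty on ωd translates into an uncertainty of well under 1% on the separation.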
The relaxation rates only vary appreciably within 0.2 K of the magnetic transition, increasing as TC is approached from above, probably due to the onset of critical fluctua- tions. This provides further evidence for our assignment of TC = 13.95 K. Our determination of νi(0) and observation of the lin- ear F–µ+–F signal allow us to identify candidate muon sites in Cs2AgF4. Although the magnetic structure of the system is not known, magnetic measurements [7] suggest the existence of loosely coupled FM Ag2+ layers arranged antiferromagnetically along the c-direction. Dipole fields were calculated for such a candidate magnetic structure with Ag2+ moments in the ab planes oriented parallel (antiparallel) to the a direction for z = 0 (z = 1/2). The calculation was limited to a sphere containing ≈ 105 Ag ions with localized moments of 0.8 µB [7]. The above considerations suggest that the muon sites will be situ- ated midway between two F− ions. Two sets of candidate muon sites may be identified in the planes containing the fluorine ions. Magnetic fields corresponding to ν2(0) are found in the [CsF] planes (i.e. those with z = 0.145 and z = 0.355) at the positions (1/4, 1/4, z), (1/4, 3/4, z), (3/4, 1/4, z) and (3/4, 3/4, z). Sites corresponding to the frequency ν1(0) are more difficult to assign, but good candidates are found in the [AgF2] planes (at z = 0, 1/2) at positions (1/4, 1/2, z), (3/4, 1/2, z), (1/4, 0, z) and (3/4, 0, z). The candidate sites are shown in Fig. 1. We note that there are twice as many [CsF] planes in a unit cell than there are [AgF2] planes in agreement with our observation that components with frequency ν2 oc- cur with twice the amplitude of those with ν1. Such an assignment then implies that the presence of the muon distorts the surrounding F− ions such that their separa- tion is ∼ 2.38 Å. This contrasts with the in-plane F–F separation in the unperturbed material of 4.55 Å ([CsF] planes) and∼ 3.2 Å ([AgF2] planes) [7]. 
Thus the two ad- jacent F− ions in the magnetic [AgF2] planes each shift by ∼ 0.4 Å from their equilibrium positions towards the µ+, demonstrating that the muon introduces a non-negligible local distortion; however, the distortion in the Ag2+ ion positions is expected to be much less significant. In conclusion, we have shown unambiguous evi- dence for magnetic order in Cs2AgF4 with an exchange anisotropy of |J ′/J | ≈ 10−2 and critical behavior inter- mediate in character between 2D and 3D. The presence of coherent F–µ+–F states allows a determination of can- didate muon sites and an estimate of the perturbation of the system caused by the muon probe. This study demonstrates that µ+SR is an effective and useful probe of the Cs2AgF4 system. In order to further explore this system as an analogue to the high-TC materials it is desir- able to perform investigations of doped materials based on the Cs2AgF4 parent compound. Part of this work was carried out at the ISIS facility, Rutherford Appleton Laboratory, UK. This work is sup- ported by the EPSRC (UK). T.L. acknowledges support from the Royal Commission for the Exhibition of 1851. J.F.C.T and S.E.M acknowledge the U.S. National Sci- ence Foundation under awards CAREER-CHE 039010 and OISE 0404938, respectively. ∗ Electronic address: t.lancaster1@physics.ox.ac.uk [1] P.A. Lee, N. Nagaosa and X.-G. Wen, Rev. Mod. Phys. 78 17 (2006). [2] M.A. Kastner et al, Rev. Mod. Phys. 70 897 (1998). [3] E. Dagotta, T. Hotta and A Moreo, Phys. Rep. 344 1 (2001). [4] R.J. Cava et al., Phys. Rev. B 43 1229 (1991). [5] W. Grochala and R. Hoffmann, Angew. Chem. Int. Ed. 40 2742 (2001). [6] R.-H. Odenthal, D. Paus and R. Hoppe, Z. Anorg. Allg. Chem. 407 144 (1974). [7] S.E. McLain et al., Nature Mat. 5, 561 (2006). [8] I. Yamada, J. Phys. Soc. Jpn. 33, 979 (1972). [9] Y. Ito and J. Akimitsu, J. Phys. Soc. Jpn. 40, 1333 (1976). [10] D.I. Khomskii and K.I. Kugel, Solid State Commun. 13 763 (1973). [11] D. Dai et al., Chem. Mater. 
18, 3281 (2006). [12] S.J. Blundell, Contemp. Phys. 40, 175 (1999). [13] R.S. Hayano et al., Phys. Rev. B 20, 850 (1979). [14] S.J. Blundell, Magnetism in Condensed Matter (Oxford University Press, 2001). [15] S.T. Bramwell and P.C.W. Holdsworth, J. Phys.: Con- dens. Matter 5 L53 (1993). [16] K. Hirakawa and H. Ikeda, J. Phys. Soc. Jpn. 35, 1328 (1973). [17] T. Hashimoto et al., J. Magn. Magn. Mater., 15-18, 1025 (1980). [18] M. Suzuki and H. Ikeda, J. Phys. Soc. Jpn. 50, 1133 (1981). [19] D. Schmalfuß, J. Richter and D. Ihle, Phys. Rev. B 72 224405 (2005). [20] J.H. Brewer et al., Phys. Rev. B 33, 7813 (1986). [21] C. Mazzoli et al., Physica B 326 427 (2003). mailto:t.lancaster1@physics.ox.ac.uk ABSTRACT We present the results of a muon-spin relaxation study of the high-Tc analogue material Cs2AgF4. We find unambiguous evidence for magnetic order, intrinsic to the material, below T_C=13.95(3) K. The ratio of inter- to intraplane coupling is estimated to be |J'/J|=1.9 x 10^-2, while fits of the temperature dependence of the order parameter reveal a critical exponent beta=0.292(3), implying an intermediate character between pure two- and three- dimensional magnetism in the critical regime. Above T_C we observe a signal characteristic of dipolar interactions due to linear F-mu-F bonds, allowing the muon stopping sites in this compound to be characterized. <|endoftext|><|startoftext|> Introduction Flattè parametrization Flattè analysis: procedure and results Discussion Summary Acknowledgments References ABSTRACT We investigate the enhancement in the D^0\bar{D}^0\pi^0 final state with the mass M=3875.2\pm 0.7^{+0.3}_{-1.6}\pm 0.8 MeV found recently by the Belle Collaboration in the B\to K D^0\bar{D}^0\pi^0 decay and test the possibility that this is yet another manifestation of the well-established resonance X(3872). We perform a combined Flatte analysis of the data for the D^0\bar{D}^0\pi^0 mode, and for the \pi^+\pi^- J/\psi mode of the X(3872). 
Only if the X(3872) is a virtual state in the D^0\bar{D}^{*0} channel, the data on the new enhancement comply with those on the X(3872). In our fits, the mass distribution in the D^0\bar{D}^{*0} mode exhibits a peak at 2-3 MeV above the D^0\bar{D}^{*0} threshold, with a distinctive non-Breit-Wigner shape. <|endoftext|><|startoftext|> Introduction Tremendous interest has been generated recently in the electronic properties of two dimensional (2D) graphene in both experimental and theoretical arenas [1, 2, 3, 5, 6, 7]. Graphene is a single atomic layer of carbon atoms forming a dense honeycomb crystal lattice [8]. The massless energy dispersion relation of electrons and holes with zero (or close to zero) bandgap results in novel behavior of both single-particle and collective excitations [1, 2, 3]. In addition, the high mobility of electrons in graphene has generated interest in developing novel high speed devices. Recently, it has been shown that the frequencies of plasma waves in graphene at moderate carrier densities ( 109−1011 cm−2) are in the terahertz range [3]. Electron-hole decay through plasmon emission has been recently experimentally observed in graphene [4]. The zero bandgap of graphene leads to strong damping of the plasma waves (plasmons) at finite temperatures as plasmons can decay by exciting interband electron-hole pairs [1, 2]. In this paper we show that plasmon amplification through stimulated emission is possible in population inverted graphene layers. This process is depicted in Fig.1. We show that plasmons in graphene can have a net gain at frequencies in the 1-10 THz range even if plasmon losses from electron and hole intraband scattering are considered. A net gain for the plasmons implies that terahertz amplifiers and oscillators based on plasmon amplification through stimulated emission are possible. The gain at terahertz frequencies is possible due to the (almost) zero bandgap of graphene. 
Although terahertz gain is also achievable in population inverted subbands in 2D quantum wells [9], intrasubband plasmons in quantum wells, being longitudinal collective modes, do not couple with intersubband transitions that require field polarization perpendicular to the plane of the quantum wells. The electromagnetic energy in the two-dimensional plasmon mode is confined within very small distances of the graphene layer and therefore waveguiding structures with large dimensions, such as those required in terahertz quantum cascade lasers [9], are not required for realizing plasmon based terahertz devices. We also present results for plasmon gain under different population inversion conditions taking into account both intraband and interband electronic transitions and carrier scattering.

2 Theoretical Model

In this section we discuss the theoretical model used to obtain the values for the plasmon gain in graphene. In graphene, the valence and conduction bands resulting from the mixing of the pz-orbitals are degenerate at the inequivalent K and K′ points of the Brillouin zone [8]. Near these points, the conduction and valence band dispersion relations can be written compactly as [2],

$$E_{s,\mathbf{k}} = s\hbar v|\mathbf{k}| \qquad (1)$$

where s = ±1 stand for conduction (+1) and valence (−1) bands, respectively, and v is the “light” velocity of the massless electrons and holes. The wavevector k is measured from the K(K′) point. The frequencies ω(q) of the longitudinal plasmon modes of wavevector q are given by the equation ε(q, ω) = 0, where ε(q, ω) is the longitudinal dielectric function of graphene [2]. In the random phase approximation (RPA) ε(q, ω) can be written as [10],

$$\epsilon(\mathbf{q},\omega) = 1 - V(q)\,\Pi(\mathbf{q},\omega) \qquad (2)$$

Here, V(q) is the bare 2D Coulomb interaction and equals e^2/(2ε∞q). ε∞ is the average of the dielectric constant of the media on either side of the graphene layer. 
$\Pi(\mathbf{q},\omega)$ is the electron-hole propagator including both intraband and interband processes and is given by the expression [1, 2],
$$\Pi(\mathbf{q},\omega) = 4\sum_{s,s'}\int\frac{d^2\mathbf{k}}{(2\pi)^2}\,\left|\langle\psi_{s',\mathbf{k}+\mathbf{q}}|e^{i\mathbf{q}\cdot\mathbf{r}}|\psi_{s,\mathbf{k}}\rangle\right|^2\,\frac{f(E_{s,\mathbf{k}}-E_{fs})-f(E_{s',\mathbf{k}+\mathbf{q}}-E_{fs'})}{\hbar\omega+E_{s,\mathbf{k}}-E_{s',\mathbf{k}+\mathbf{q}}+i\eta} \tag{3}$$
The factor of 4 in the above equation comes from the two degenerate spins and the two valleys at $K$ and $K'$. $f(E-E_f)$ is the Fermi distribution function with Fermi energy $E_f$. $|\psi_{s,\mathbf{k}}\rangle$ are the Bloch functions for the conduction and valence bands near the $K$ ($K'$) point. The occupancies of electrons in the conduction and valence bands are described by different Fermi levels to allow for nonequilibrium population inversion. The Bloch functions have the following matrix elements [8],
$$\left|\langle\psi_{s',\mathbf{k}+\mathbf{q}}|e^{i\mathbf{q}\cdot\mathbf{r}}|\psi_{s,\mathbf{k}}\rangle\right|^2 = \frac{1}{2}\left[1 + ss'\,\frac{|\mathbf{k}|+|\mathbf{q}|\cos(\theta)}{|\mathbf{k}+\mathbf{q}|}\right] \tag{4}$$
where $\theta$ is the angle between the vectors $\mathbf{k}$ and $\mathbf{q}$. The condition $v|\mathbf{q}| < \omega(\mathbf{q})$ must be satisfied in order to avoid direct intraband absorption of plasmons. Assuming $v|\mathbf{q}| < \omega$, and using the symmetry between conduction and valence bands, the intraband and interband contributions to the propagator can be approximated as follows,
$$\Pi_{intra}(\mathbf{q},\omega) \approx \frac{q^2KT/\pi\hbar^2}{\omega(\omega+i/\tau)-v^2q^2/2}\,\log\left[\left(e^{E_{f+}/KT}+1\right)\left(e^{-E_{f-}/KT}+1\right)\right] \tag{5}$$
$$\Pi_{inter}(\mathbf{q},\omega) \approx \frac{q^2}{2\pi\hbar}\int_0^\infty d\omega'\,\frac{f(\hbar\omega'/2-E_{f+})-f(-\hbar\omega'/2-E_{f-})}{\omega'^2-\omega^2} + i\,\frac{q^2}{4\hbar\omega}\left[f(\hbar\omega/2-E_{f+})-f(-\hbar\omega/2-E_{f-})\right] \tag{6}$$
Here, $q = |\mathbf{q}|$. In Equation (5), the intraband contribution to the propagator is written in the plasmon-pole approximation that satisfies the f-sum rule [10]. This approximation is not valid for large values of the wavevector $q$, when $\omega(q) \to vq$. However, in this paper we will be concerned with small values of the wavevector for which the plasmons have net gain, and therefore the approximation used in Equation (5) is adequate. Plasmon energy loss due to intraband scattering has been included with a scattering time $\tau$ in the number-conserving relaxation-time approximation, which assumes that as a result of scattering the carrier distribution relaxes to the local equilibrium distribution [11].
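The angular factor in the Bloch-function matrix element can be checked numerically. The sketch below is only an illustration: it assumes the usual two-component spinor $(1, s e^{i\phi_\mathbf{k}})/\sqrt{2}$ near the $K$ point (a convention not spelled out in the text), it assumes the overlap has the closed form $\frac{1}{2}[1 + ss'(|\mathbf{k}|+|\mathbf{q}|\cos\theta)/|\mathbf{k}+\mathbf{q}|]$, and all function names and sample points are hypothetical.

```python
import cmath
import math


def spinor(s, k):
    """Assumed spinor (1, s*exp(i*phi_k))/sqrt(2); phi_k is the polar angle of k."""
    phi = math.atan2(k[1], k[0])
    return (1 / math.sqrt(2), s * cmath.exp(1j * phi) / math.sqrt(2))


def overlap_direct(s, sp, k, q):
    """Squared spinor overlap between band s at k and band sp at k+q."""
    a = spinor(s, k)
    b = spinor(sp, (k[0] + q[0], k[1] + q[1]))
    inner = b[0].conjugate() * a[0] + b[1].conjugate() * a[1]
    return abs(inner) ** 2


def overlap_closed_form(s, sp, k, q):
    """Closed form (1/2)[1 + s*sp*(|k| + |q| cos(theta))/|k+q|]."""
    k_mag = math.hypot(k[0], k[1])
    q_mag = math.hypot(q[0], q[1])
    cos_theta = (k[0] * q[0] + k[1] * q[1]) / (k_mag * q_mag)
    kq_mag = math.hypot(k[0] + q[0], k[1] + q[1])
    return 0.5 * (1 + s * sp * (k_mag + q_mag * cos_theta) / kq_mag)


def max_deviation(samples):
    """Largest difference between the two expressions over all band pairs."""
    return max(
        abs(overlap_direct(s, sp, k, q) - overlap_closed_form(s, sp, k, q))
        for k, q in samples
        for s in (1, -1)
        for sp in (1, -1)
    )


# Arbitrary (k, q) pairs in arbitrary units, chosen only for illustration.
SAMPLES = [((1.0, 0.0), (0.3, 0.2)), ((0.5, -1.2), (0.1, 0.4)), ((-2.0, 0.7), (1.5, -0.3))]
```

For collinear $\mathbf{k}$ and $\mathbf{q}$ the closed form gives 1 for intraband ($s = s'$) and 0 for interband ($s \neq s'$) transitions, which is why interband contributions start at order $q^2$.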
The real part of the interband contribution to the propagator modifies the effective dielectric constant and leads to a significant reduction in the plasmon frequency under population inversion conditions. The imaginary part of the interband contribution to the propagator incorporates plasmon loss or gain due to stimulated interband transitions. A necessary condition for plasmon gain from stimulated interband transitions is that the splitting of the Fermi levels of the conduction and valence electrons exceeds the plasmon energy, i.e. $E_{f+}-E_{f-} > \hbar\omega$. But the plasmons will have net gain only if the plasmon gain from stimulated interband transitions exceeds the plasmon loss due to intraband scattering. The real and imaginary parts of the propagator in Equations (5) and (6) satisfy the Kramers-Kronig relations. Equations (5) and (6) can be used with Equation (2) to calculate the real and imaginary parts of the plasmon frequency $\omega(q)$ as a function of $q$. However, from the point of view of device design, it is more useful to assume that the frequency $\omega$ is real and the propagation vector $q(\omega)$, written as a function of $\omega$, is complex. Since the charge density wave corresponding to plasmons has the form $e^{i\mathbf{q}\cdot\mathbf{r}-i\omega t}$, the imaginary part of the propagation vector corresponds to net gain or loss. We define the net plasmon energy gain $g(\omega)$ as $-2\,\mathrm{Imag}\{q(\omega)\}$.

3 Results and Discussion
In simulations we use $v = 10^{8}$ cm/s and $\epsilon_\infty = 4.0\epsilon_o$ (assuming silicon-dioxide on both sides of the graphene layer) [1]. We assume a nonequilibrium situation, as in a semiconductor interband laser [12], in which the electron and hole densities are equal and $E_{f+} = -E_{f-}$. Such a nonequilibrium situation can be realized experimentally either by carrier injection in an electrostatically defined graphene pn-junction or through optical pumping [13, 14]. The value of the scattering time $\tau$ (momentum relaxation time) is also critical for calculations of the net plasmon gain.
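To see how $\tau$ enters the loss, one can solve $\epsilon(q,\omega) = 0$ for a complex $q$ at real $\omega$ while keeping only the intraband term. The sketch below is a rough illustration under stated assumptions, not the calculation behind the figures: it assumes the plasmon-pole form $\Pi_{intra} \approx (q^2KT/\pi\hbar^2)\log[(e^{E_{f+}/KT}+1)(e^{-E_{f-}/KT}+1)]/[\omega(\omega+i/\tau)-v^2q^2/2]$, drops the $v^2q^2/2$ term (valid only for $vq \ll \omega$), ignores the interband contribution entirely, and uses a hypothetical Fermi level of 10 meV. All function names are illustrative.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
K_B = 1.380649e-23          # J / K
EPS0 = 8.8541878128e-12     # F / m


def intraband_weight(ef_plus, ef_minus, temperature):
    """KT*log[(e^{Ef+/KT}+1)(e^{-Ef-/KT}+1)]/(pi*hbar^2); energies in joules."""
    kt = K_B * temperature
    log_term = math.log1p(math.exp(ef_plus / kt)) + math.log1p(math.exp(-ef_minus / kt))
    return kt * log_term / (math.pi * HBAR ** 2)


def complex_wavevector(freq_hz, tau, ef_plus, ef_minus, temperature, eps_rel=4.0):
    """q(omega) in 1/m from 1 = V(q)*Pi_intra with the v^2 q^2/2 term neglected."""
    omega = 2 * math.pi * freq_hz
    weight = intraband_weight(ef_plus, ef_minus, temperature)
    return 2 * eps_rel * EPS0 * omega * (omega + 1j / tau) / (E_CHARGE ** 2 * weight)


def intraband_gain_per_cm(freq_hz, tau, ef_plus, ef_minus, temperature):
    """g = -2 Im{q} in 1/cm; negative values mean loss."""
    q = complex_wavevector(freq_hz, tau, ef_plus, ef_minus, temperature)
    return -2 * q.imag / 100.0


# Hypothetical operating point: 3 THz, T = 10 K, Ef+ = -Ef- = 10 meV.
EF = 0.010 * E_CHARGE
LOSS_SLOW = intraband_gain_per_cm(3e12, 0.5e-12, EF, -EF, 10.0)
LOSS_FAST = intraband_gain_per_cm(3e12, 0.25e-12, EF, -EF, 10.0)
```

In this simplified picture the intraband term alone always gives $g < 0$, and the loss scales as $1/\tau$; net gain requires the interband term to overcome it.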
The value of $\tau$ can be estimated from the experimentally reported values of mobility using the following expression for the graphene conductivity $\sigma$ (assuming that only electrons are present) [17],
$$\sigma = \frac{e^2\tau KT}{\pi\hbar^2}\,\log\left(e^{E_{f+}/KT}+1\right)$$
Values of mobility between 20,000 and 60,000 cm$^{2}$/V-s have been experimentally measured at low temperatures (T < 77K) in graphene [6, 7, 15]. Assuming a mobility value of 27,000 cm$^{2}$/V-s, reported in Ref. [15] for an electron density of $3.4\times10^{12}$ cm$^{-2}$ at T=58K, the value of $\tau$ comes out to be approximately 0.6 ps. The phonon scattering time was experimentally determined to be close to 4 ps at T=300K [15]. Therefore, impurity or defect scattering is expected to be the dominant momentum relaxation mechanism in graphene, and the scattering time is expected to be relatively independent of temperature [17]. In the results presented below, unless stated otherwise, we have used a temperature independent scattering time of 0.5 ps.

Figure 2: Calculated plasmon dispersion relation in graphene at 10K is plotted for different electron-hole densities ($n = p = 1, 3.5, 6, 8.5\times10^{9}$ cm$^{-2}$). The condition $\omega(q) > vq$ is satisfied for frequencies that have net gain in the terahertz range. The assumed values of $v$ and $\tau$ are $10^{8}$ cm/s and 0.5 ps, respectively. (Horizontal axis: wavevector in units of $10^{5}$ cm$^{-1}$; the line $\omega = vq$ is shown for reference.)

Figs. 2-7 show the calculated dispersion relation of the plasmons and the net plasmon gain at T=10K, 77K, and 300K for different electron-hole densities. At very low frequencies the losses from intraband scattering dominate. At frequencies ranging from 1 to 15 THz, the plasmons can have net gain. The net gain values are large, reaching $1$-$4\times10^{4}$ cm$^{-1}$ for electron-hole densities in the $10^{9}$ cm$^{-2}$ range at low temperatures and in the $10^{11}$ cm$^{-2}$ range at room temperature. The calculated plasmon dispersions indicate that $\omega(q) > vq$ at all frequencies for which the plasmons have net gain.
Therefore, direct intraband absorption of plasmons is not possible at these frequencies and will not reduce the calculated gain values. Plasmons acquire net gain for smaller electron-hole densities at lower temperatures. At higher temperatures the distribution of electrons and holes in energy is broader and the gain at any particular frequency is therefore smaller. At T=10K, the plasmons have net gain for electron-hole densities as small as $2\times10^{9}$ cm$^{-2}$. Almost an order of magnitude larger electron-hole densities are required to achieve the same net gain values at T=77K compared to T=10K. The linear energy dependence of the density of states associated with the massless dispersion relation of electrons and holes in graphene causes the maximum plasmon gain values to increase with the electron-hole density. The peak gain values shift to higher frequencies with the increase in the electron-hole density for the same reason. The fact that plasmons can acquire net gain for relatively small carrier densities suggests that plasmon gain is relatively robust with respect to intraband scattering losses.

Figure 3: Net plasmon gain in graphene at 10K is plotted for different electron-hole densities ($n = p = 1, 3.5, 6, 8.5\times10^{9}$ cm$^{-2}$). The assumed values of $v$ and $\tau$ are $10^{8}$ cm/s and 0.5 ps, respectively. (Horizontal axis: frequency in THz.)

Fig. 8 shows the net gain at T=10K for $n = p = 10^{10}$ cm$^{-2}$ and values of the intraband scattering time $\tau$ varying from 0.1 to 0.5 ps. The net gain decreases as the plasmon losses increase with a decrease in the value of $\tau$, and the maximum gain value equals zero for $\tau = 0.15$ ps. However, it should not be concluded from Fig. 8 that plasmons cannot have net gain for $\tau$ less than 0.15 ps, since the electron-hole density can always be increased to achieve net gain for smaller values of $\tau$.

Figure 4: Calculated plasmon dispersion relation in graphene at 77K is plotted for different electron-hole densities ($n = p = 1, 2, 3, 4\times10^{10}$ cm$^{-2}$). The condition $\omega(q) > vq$ is satisfied for frequencies that have net gain in the terahertz range. The assumed values of $v$ and $\tau$ are $10^{8}$ cm/s and 0.5 ps, respectively. (Horizontal axis: wavevector in units of $10^{5}$ cm$^{-1}$.)

Figure 5: Net plasmon gain in graphene at 77K is plotted for different electron-hole densities ($n = p = 1, 2, 3, 4\times10^{10}$ cm$^{-2}$). The assumed values of $v$ and $\tau$ are $10^{8}$ cm/s and 0.5 ps, respectively. (Horizontal axis: frequency in THz.)

Figure 6: Calculated plasmon dispersion relation in graphene at 300K is plotted for different electron-hole densities ($n = p = 1, 1.5, 2, 2.5\times10^{11}$ cm$^{-2}$). The condition $\omega(q) > vq$ is satisfied for frequencies that have net gain in the terahertz range. The assumed values of $v$ and $\tau$ are $10^{8}$ cm/s and 0.5 ps, respectively. (Horizontal axis: wavevector in units of $10^{5}$ cm$^{-1}$.)

Figure 7: Net plasmon gain in graphene at 300K is plotted for different electron-hole densities ($n = p = 1, 1.5, 2, 2.5\times10^{11}$ cm$^{-2}$). The assumed values of $v$ and $\tau$ are $10^{8}$ cm/s and 0.5 ps, respectively. (Horizontal axis: frequency in THz.)

Figure 8: Net plasmon gain in graphene at 10K is plotted for different intraband scattering times $\tau$ ($\tau$ = 0.5, 0.4, 0.3, 0.2, 0.15, 0.1 ps). The assumed value of $v$ is $10^{8}$ cm/s and the electron-hole density is $10^{10}$ cm$^{-2}$. (Horizontal axis: frequency in THz.)

Fig. 9 shows the net gain at T=10K for $n = p = 3\times10^{10}$ cm$^{-2}$ and values of the intraband scattering time $\tau$ varying from 75 to 150 fs. It can be seen that at these larger carrier densities plasmons have net gain for scattering times that are sub-100 fs.

Figure 9: Net plasmon gain in graphene at 10K is plotted for different scattering times $\tau$ ($\tau$ = 150, 125, 100, 75 fs). The assumed value of $v$ is $10^{8}$ cm/s and the electron-hole density is $3\times10^{10}$ cm$^{-2}$. (Horizontal axis: frequency in THz.)

The exceedingly large values of the net plasmon gain ($> 10^{4}$ cm$^{-1}$) in graphene imply that terahertz plasmon oscillators only a few microns long could have sufficient gain to overcome both intrinsic losses and losses associated with external radiation coupling. Plasmon fields with in-plane wavevector magnitude $q$ decay as $e^{-q|z|}$ away from the graphene layer, where $|z|$ is the distance from the graphene layer. Figs. 2, 4, and 6 show that $q$ has values exceeding $10^{5}$ cm$^{-1}$ at terahertz frequencies. Therefore, the electromagnetic energy associated with the terahertz plasmons is confined within 100 nm of the graphene layer. Strong field confinement and low plasmon losses at terahertz frequencies are both partly responsible for the high net gain values in graphene. Recent theoretical predictions for electron-hole recombination rates in graphene due to Auger scattering indicate that electron-hole recombination times can be much longer than 1 ps at temperatures ranging from 10K to 300K for electron-hole densities smaller than $10^{12}$ cm$^{-2}$ [16]. This suggests that population inversion can be experimentally achieved in graphene via current injection in electrostatically defined pn-junctions or via optical pumping [13, 14].
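The order-of-magnitude numbers quoted in this section can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: for the scattering time it assumes the degenerate (low-temperature) limit of the conductivity, in which $\mu = e\tau v^2/E_f$ with $E_f = \hbar v\sqrt{\pi n}$, and the function names are illustrative rather than taken from the paper.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C


def fermi_energy(density_m2, velocity):
    """Degenerate-limit Fermi energy E_f = hbar * v * sqrt(pi * n), in joules."""
    return HBAR * velocity * math.sqrt(math.pi * density_m2)


def scattering_time(mobility_m2_per_vs, density_m2, velocity):
    """tau = mu * E_f / (e * v^2), from mu = sigma/(n e) in the degenerate limit."""
    return mobility_m2_per_vs * fermi_energy(density_m2, velocity) / (E_CHARGE * velocity ** 2)


def decay_length_nm(q_per_cm):
    """1/e decay length of the field exp(-q|z|), in nanometres."""
    return 1.0 / (q_per_cm * 100.0) * 1e9


def gain_length_um(gain_per_cm):
    """Propagation length over which the plasmon energy grows by a factor e."""
    return 1.0 / (gain_per_cm * 100.0) * 1e6


# Mobility 27,000 cm^2/V-s, density 3.4e12 cm^-2, v = 1e8 cm/s (values from the text).
TAU = scattering_time(2.7, 3.4e16, 1.0e6)
CONFINEMENT_NM = decay_length_nm(1e5)   # q = 1e5 cm^-1
GAIN_LENGTH_UM = gain_length_um(1e4)    # g = 1e4 cm^-1
```

With these inputs the scattering time comes out a little under 0.6 ps, the field decay length is 100 nm, and the gain length is about one micron, in line with the statements in the text.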
It also needs to be pointed out here that graphene monolayers and multilayers produced with currently available experimental techniques are estimated to have defect/impurity densities anywhere between $10^{11}$ and $10^{12}$ cm$^{-2}$ [17]. Therefore, at low electron-hole densities (less than $10^{11}$ cm$^{-2}$) graphene is expected to exhibit localized electron and hole puddles rather than continuous electron and/or hole sheet charge densities [17]. This implies that with the currently available techniques graphene based terahertz plasmon oscillators might only be realizable with higher electron-hole densities ($> 10^{11}$ cm$^{-2}$) for operation at higher frequencies ($> 5$ THz).

4 Conclusion
In conclusion, we have shown that high gain values for plasmons are possible in population inverted graphene layers in the 1-10 THz frequency range. The plasmon gain remains positive even for carrier intraband scattering times shorter than 100 fs. The high gain values and the strong plasmon field confinement near the graphene layer could enable compact terahertz amplifiers and oscillators. The authors would like to thank Edwin Kan and Sandip Tiwari for helpful discussions.

References
[1] X. F. Wang, T. Chakraborty, Phys. Rev. B, 75, 033408 (2007).
[2] E. H. Hwang, S. Das Sarma, cond-mat/0610561, http://arxiv.org/abs/cond-mat/0610561.
[3] V. Ryzhii, A. Satou, J. Appl. Phys., 101, 024509 (2007).
[4] A. Bostwick, T. Ohta, T. Seyller, K. Horn, E. Rotenberg, Nature Phys., 3, 36 (2007).
[5] K. S. Novoselov et al., Nature, 438, 197 (2005).
[6] K. S. Novoselov et al., Science, 306, 666 (2004).
[7] Y. Zhang et al., Nature, 438, 201 (2005).
[8] R. Saito, G. Dresselhaus, M. S. Dresselhaus, Physical Properties of Carbon Nanotubes, Imperial College Press, London, UK (1999).
[9] B. Williams, H. Callebaut, S. Kumar, Q. Hu, Appl. Phys. Lett., 82, 1015 (2003).
[10] H. Haug, S. W. Koch, Quantum Theory of the Optical and Electronic Properties of Semiconductors, World Scientific, NJ (1994).
[11] N. D. Mermin, Phys. Rev. B, 1, 2362 (1970).
[12] L. A. Coldren, S. W. Corzine, Diode Lasers and Photonic Integrated Circuits, Wiley, NY (1995).
[13] J. R. Williams, L. DiCarlo, C. M. Marcus, cond-mat/0704.3487 (2007).
[14] B. Özyilmaz, P. Jarillo-Herrero, D. Efetov, D. A. Abanin, L. S. Levitov, P. Kim, cond-mat/0705.3044.
[15] W. De Heer et al., Science, 312, 1191 (2006).
[16] F. Rana, cond-mat/0705.1204.
[17] E. H. Hwang, S. Adam, S. Das Sarma, Phys. Rev. Lett., 98, 186806 (2007).

ABSTRACT We show that plasmons in two-dimensional graphene can have net gain at terahertz frequencies. The coupling of the plasmons to interband electron-hole transitions in population inverted graphene layers can lead to plasmon amplification through the process of stimulated emission. We calculate plasmon gain for different electron-hole densities and temperatures and show that the gain values can exceed $10^{4}$ cm$^{-1}$ in the 1-10 terahertz frequency range, for electron-hole densities in the $10^{9}$-$10^{11}$ cm$^{-2}$ range, even when plasmon energy loss due to intraband scattering is considered. Plasmons are found to exhibit net gain for intraband scattering times shorter than 100 fs. Such high gain values could allow extremely compact terahertz amplifiers and oscillators that have dimensions in the 1-10 $\mu$m range. <|endoftext|><|startoftext|> Introduction
Let $R$ be a Noetherian ring and $\mathbf{f} = \{f_1, \ldots, f_m\}$ a set of elements of $R$. Such sets are the ingredients of rational maps between affine and other spaces. At the cost of losing some definition, we choose to examine them in the setting of the ideal $I$ they generate. Specifically, we consider the presentation of the Rees algebra of $I$
$$0 \to M \longrightarrow S = R[T_1, \ldots, T_m] \xrightarrow{\ \varphi\ } R[It] \to 0, \quad T_i \mapsto f_it.$$
The context of Rees algebra theory allows for the examination of the syzygies of the $f_i$ but also of the relations of all orders, which are carriers of analytic information.
We set $\mathcal{R} = R[It]$ for the Rees algebra of $I$. The ideal $M$ will be referred to as the equations of the $f_j$, or by abuse of terminology, of the ideal $I$. If $M$ is generated by forms of degree 1, $I$ is said to be of linear type (this is independent of the set of generators). The Rees algebra $R[It]$ is then the symmetric algebra $\mathcal{S} = \mathrm{Sym}(I)$ of $I$. Such is the case when the $f_i$ form a regular sequence; $M$ is then generated by the Koszul forms $f_iT_j - f_jT_i$, $i < j$. We will treat mainly almost complete intersections in a Cohen-Macaulay ring $R$, that is, ideals of codimension $r$ generated by $r + 1$ elements. Almost exclusively, $I$ will be an ideal of finite co-length in a local ring, or in a ring of polynomials over a field. Our focus on $\mathcal{R}$ is shaped by the following fact. The class of ideals $I$ to be considered will have the property that both its symmetric algebra $\mathcal{S}$ and the normalization $\mathcal{R}'$ of $\mathcal{R}$ have amenable properties; for instance, one of them (when not both) is Cohen-Macaulay. In such case, the diagram
$$\mathcal{S} \twoheadrightarrow \mathcal{R} \subset \mathcal{R}'$$
gives a convenient dual platform from which to examine $\mathcal{R}$. There are specific motivations for looking at (and for) these equations. In order to describe our results in some detail, let us indicate their contexts.
(i) Ideals which are almost complete intersections occur in some of the more notable birational maps and in geometric modelling ([3], [4], [5], [6], [7], [8], [9], [10], [17], [18], [21]).
(ii) It is possible to interpret questions of birationality of certain maps as an interaction between the Rees algebra of the ideal and its special fiber. The mediation is carried by the first Chern coefficient of the associated graded ring of $I$. In the case of almost complete intersections the analysis is more tractable, including the construction of suitable algorithms.
(iii) At a recent talk in Luminy ([9]), D. Cox raised several questions about the character of the equations of Rees algebras in polynomial rings in two variables.
They are addressed in Section 4 as part of a general program of devising algorithms that produce all the equations of an ideal, or at least some distinguished polynomial (e.g. the ‘elimination equation’ in it) ([3], [13]). We now describe our results. Section 2 is an assemblage for the ideals treated here of basics on symmetric and Rees algebras, and on their Cohen-Macaulayness. We also introduce the general notion of a Sylvester form in terms of contents and coefficients in a polynomial ring over a base ring. This is concretely taken up in Section 4 when the base ring is a polynomial ring in 2 variables over a field. In Section 3 we examine the connection between typical algebraic invariants and the geometric background of rational maps and their images. Here, besides the dimension and the degree of the related algebras, we also consider the Chern number e1(I) of an ideal. In particular we explain a criterion for a rational map to be birational in terms of an equality of two such Chern numbers, provided the base locus of the map is empty and defined by an almost complete intersection ideal. In Section 4, we discuss the role of irreducible ideals in producing Sylvester forms. Of a general nature, we describe a method to obtain an irreducible decomposition of ideals of finite co-length. In rings such as k[s, t], due to a theorem of Serre, irreducible ideals are complete intersections, a fact that leads to Sylvester forms of low degree. Turning to the equations of almost complete intersections, we derive several Sylvester forms over a polynomial ring R = k[s, t], package them into ideals and examine the incident homological properties of these ideals and the associated algebras. It is a computer-assisted approach whose role is to produce a set of syzygies that afford hand computation: the required equations themselves are not generated by computation. 
Concretely, we model a generic class of ideals to define 'super-generic' ideals $L$ in rings with several new variables
$$L = (f, g, h_1, \ldots, h_m) \subset A.$$
Using Macaulay2 ([11]), we obtain the free resolutions of $L$. In degrees $\le 5$, the resolution has length $\le 3$ (2 when degree $= 4$)
$$0 \to F_3 \xrightarrow{\ d_3\ } F_2 \xrightarrow{\ d_2\ } F_1 \longrightarrow F_0 \longrightarrow L \to 0.$$
It has the property that after specialization the ideals of maximal minors of $d_3$ and $d_2$ have codimension 5 and $\ge 4$, respectively. Standard arguments of the theory of free resolutions will suffice to show that the specialization of $L$ is a prime ideal. For ideals in $R = k[s, t]$ generated by forms of degrees $\le 5$, the method succeeds in describing the full set of equations. In higher degree, in cases of special interest, it predicts the precise form of the elimination equation. For a technical reason, due to the character of irreducible ideals, the method is limited to dimension two. Nevertheless, it is supple enough to apply to non-homogeneous ideals. This may be exploited elsewhere, along with the treatment of ideals with larger numbers of generators in a two-dimensional ring.

2 Preliminaries on symmetric and Rees algebras
We will introduce some basic material on Rees algebras ([2], [12], [22]). Since most of the questions we will consider have a local character, we pick local rings as our setting. Whenever required, the transition to graded rings will be direct. Throughout we will consider a Noetherian local ring $(R, \mathfrak{m})$ and $I$ an $\mathfrak{m}$-primary ideal (or a graded algebra over a field $k$, $R = \bigoplus_{n\ge0} R_n = R_0[R_1]$, $R_0 = k$, and $I$ a homogeneous ideal of finite colength $\lambda(R/I) < \infty$). We assume that $I$ admits a minimal reduction $J$ generated by $n = \dim R$ elements. This is always possible when $k$ is infinite. The terminology means that for some integer $r$, $I^{r+1} = JI^r$. This condition in turn means that the inclusion of Rees algebras $R[Jt] \subset R[It]$ is an integral birational extension (birational in the sense that the two algebras have the same total ring of fractions).
The smallest such integer, $r_J(I)$, is called the reduction number of $I$ relative to $J$; the infimum of these numbers over all minimal reductions of $I$ is the (absolute) reduction number $r(I)$ of $I$. For any ideal, not necessarily $\mathfrak{m}$-primary, the special fiber of $R[It]$ (or of $I$, by abuse of terminology) is the algebra $\mathcal{F}(I) = R[It]\otimes_R (R/\mathfrak{m})$. The dimension of $\mathcal{F}(I)$ is called the analytic spread of $I$, and denoted $\ell(I)$. When $I$ is $\mathfrak{m}$-primary, $\ell(I) = \dim R$. A minimal reduction $J$ is generated by $\ell(I)$ elements, and $\mathcal{F}(J)$ is a Noether normalization of $\mathcal{F}(I)$.

Hilbert polynomials
The Hilbert polynomial of $I$ is the function that, for $m \gg 0$, gives the lengths ([2]):
$$\lambda(R/I^m) = e_0(I)\binom{m+n-1}{n} - e_1(I)\binom{m+n-2}{n-1} + \text{lower terms}.$$
$e_0(I)$ is the multiplicity of the ideal $I$. If $R$ is Cohen-Macaulay, $e_0(I) = \lambda(R/J)$, where $J$ is a minimal reduction of $I$ (generated by a regular sequence). For such rings, $e_1(I) \ge 0$. For instance, if $R = k[x_1, \ldots, x_n]$, $\mathfrak{m} = (x_1, \ldots, x_n)$ and $I = \mathfrak{m}^d$, then
$$\lambda(R/I^m) = \lambda(R/\mathfrak{m}^{md}) = \binom{md+n-1}{n} = d^n\binom{m+n-1}{n} - e_1(I)\binom{m+n-2}{n-1} + \text{lower terms},$$
where $e_1(I) = \frac{n-1}{2}(d^n - d^{n-1})$. Both coefficients will be the focus of our interest soon.

Cohen-Macaulay Rees algebras
There is a broad array of criteria expressing the Cohen-Macaulayness of Rees algebras (see [1], [14], [19], [23, Chapter 3]). Our needs will be filled by a single criterion whose proof is fairly straightforward. We briefly review its related contents. Let $(R, \mathfrak{m})$ be a Cohen-Macaulay local ring of dimension $\ge 1$, and let $I$ be an $\mathfrak{m}$-primary ideal with a minimal reduction $J$. The Rees algebra $R[Jt]$ is Cohen-Macaulay and serves as an anchor to derive many properties of $R[It]$. Here is one that we shall make use of. Define the Sally module $S_J(I)$ of $I$ relative to $J$ to be the cokernel of the natural inclusion of finite $R[Jt]$-modules $I\cdot R[Jt] \subset I\cdot R[It]$. Thus, $S_J(I) = \bigoplus_{t\ge2} I^t/IJ^{t-1}$. It has a Hilbert function, unlike the algebra $R[It]$, that gives information about the Hilbert function of $I$ (see [22, Chapter 2]). The module on the left, $I\cdot R[Jt]$, is a Cohen-Macaulay $R[Jt]$-module of depth $\dim R + 1$.
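As a sanity check on the Hilbert coefficients discussed above, $e_0$ and $e_1$ can be read off by brute force for monomial ideals of finite colength in $k[s,t]$. The sketch below is illustrative only: it represents a monomial $s^at^b$ by the pair $(a,b)$, assumes the generating set contains a pure power of $s$ and a pure power of $t$, and assumes the Hilbert function already agrees with the Hilbert polynomial at the sampled exponents (`m0` and the function names are hypothetical choices).

```python
from itertools import combinations_with_replacement


def power_generators(gens, m):
    """Exponent pairs generating I^m for a monomial ideal I of k[s, t]."""
    return {
        (sum(a for a, _ in combo), sum(b for _, b in combo))
        for combo in combinations_with_replacement(gens, m)
    }


def colength(gens, m):
    """lambda(R/I^m): the number of monomials s^a t^b not lying in I^m."""
    pg = power_generators(gens, m)
    bound_a = min(a for a, b in pg if b == 0)  # needs a pure power of s
    bound_b = min(b for a, b in pg if a == 0)  # needs a pure power of t
    return sum(
        1
        for a in range(bound_a)
        for b in range(bound_b)
        if not any(a >= ga and b >= gb for ga, gb in pg)
    )


def hilbert_coefficients(gens, m0=6):
    """(e0, e1) from lambda(R/I^m) = e0*C(m+1,2) - e1*m + e2 in dimension two."""
    h0, h1, h2 = (colength(gens, m0 + i) for i in range(3))
    e0 = h2 - 2 * h1 + h0             # second difference of the polynomial
    e1 = e0 * (m0 + 1) - (h1 - h0)    # first difference is e0*(m+1) - e1
    return e0, e1


# I = (s, t)^2: expect e0 = d^n = 4 and e1 = ((n-1)/2)(d^n - d^(n-1)) = 1.
SQUARE_OF_MAXIMAL = [(2, 0), (1, 1), (0, 2)]
# I = (s^3, s^2 t, t^3): an almost complete intersection with reduction (s^3, t^3).
ACI_EXAMPLE = [(3, 0), (2, 1), (0, 3)]
```

For $I = (s,t)^2$ this returns $(4, 1)$, matching the formula $e_1(I) = \frac{n-1}{2}(d^n - d^{n-1})$ with $n = d = 2$.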
The Cohen-Macaulayness of $I\cdot R[It]$ is directly related to that of $R[It]$. These considerations lead to the criterion:

Theorem 2.1 If $\dim R \ge 2$ and the reduction number of $I$ is $\le 1$, that is $I^2 = JI$, then $R[It]$ is Cohen-Macaulay. The converse holds if $\dim R = 2$.

Symmetric algebras
Throughout $R$ is a Cohen-Macaulay ring and $I$ is an almost complete intersection. The symmetric algebra $\mathrm{Sym}(I)$ will be denoted by $\mathcal{S}$. Hopefully there will be no confusion between $\mathcal{S}$ and the rings of polynomials $S = R[T_1, \ldots, T_n]$ that we use to give a presentation of either $\mathcal{R}$ or $\mathcal{S}$. What keeps symmetric algebras of almost complete intersections fairly under control is the following:

Proposition 2.2 Let $(R, \mathfrak{m})$ be a Cohen-Macaulay local ring. If $I$ is an almost complete intersection and $\operatorname{depth} R/I \ge \dim R/I - 1$, then $\mathcal{S}$ is Cohen-Macaulay. In particular, if $I$ is $\mathfrak{m}$-primary then $\mathcal{S}$ is Cohen-Macaulay.

Proof. The general assertion follows from [12, Proposition 10.3]; see also [16]. ✷

Let $R$ be a Noetherian ring and let $I$ be an $R$-ideal with a free presentation
$$R^m \xrightarrow{\ \varphi\ } R^n \longrightarrow I \to 0.$$
We assume that $I$ has a regular element. If $S = R[T_1, \ldots, T_n]$, the symmetric algebra $\mathcal{S}$ of $I$ is defined by the ideal $M_1 \subset S$ of 1-forms,
$$M_1 = I_1([T_1, \ldots, T_n]\cdot\varphi).$$
The ideal of definition of the Rees algebra $\mathcal{R}$ of $I$ is the ideal $M \subset S$ obtained by elimination
$$M = \bigcup_{t}(M_1 : x^t) = M_1 : x^\infty,$$
where $x$ is a regular element of $I$.

Sylvester forms
To get additional elements of $M$, evading the above calculation, we make use of general Sylvester forms. Recall how these are obtained. Let $\mathbf{f} = \{f_1, \ldots, f_n\}$ be a set of polynomials in $B = R[x_1, \ldots, x_r]$ and let $\mathbf{a} = \{a_1, \ldots, a_n\} \subset R$. If $f_i \in (\mathbf{a})B$ for all $i$, we can write
$$\mathbf{f} = [f_1 \cdots f_n] = [a_1 \cdots a_n]\cdot A = \mathbf{a}\cdot A,$$
where $A$ is an $n\times n$ matrix with entries in $B$. By an abuse of terminology, we refer to $\det(A)$ as a Sylvester form of $\mathbf{f}$ relative to $\mathbf{a}$, in notation $\det(\mathbf{f})_{(\mathbf{a})} = \det(A)$. It is not difficult to show that $\det(\mathbf{f})_{(\mathbf{a})}$ is well-defined mod $(\mathbf{f})$.
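A small worked instance may help fix the construction. The sketch below is an illustration chosen here, not an example from the text: it takes $I = (s^2, st, t^2) \subset k[s,t]$, whose syzygies give the 1-forms $f = tT_1 - sT_2$ and $g = tT_2 - sT_3$, writes $[f\ g] = [s\ t]\cdot A$, and checks at integer points that $\det(A) = T_1T_3 - T_2^2$ vanishes on the parametrization $T_1 = s^2$, $T_2 = st$, $T_3 = t^2$. Function names are hypothetical.

```python
from itertools import product


def param(s, t):
    """The generators (s^2, s*t, t^2), substituted for (T1, T2, T3)."""
    return (s * s, s * t, t * t)


def form_f(s, t, T):
    """f = t*T1 - s*T2, from the syzygy t*(s^2) - s*(s*t) = 0."""
    return t * T[0] - s * T[1]


def form_g(s, t, T):
    """g = t*T2 - s*T3, from the syzygy t*(s*t) - s*(t^2) = 0."""
    return t * T[1] - s * T[2]


def matrix_a(T):
    """A with [f g] = [s t] * A; rows correspond to s and t, columns to f and g."""
    return ((-T[1], -T[2]), (T[0], T[1]))


def sylvester_form(T):
    """det(A) = T1*T3 - T2^2."""
    a = matrix_a(T)
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def checks_pass(values=range(-3, 4)):
    """Verify the factorization and the vanishing on the parametrization."""
    for s, t in product(values, repeat=2):
        for T in product(values, repeat=3):
            a = matrix_a(T)
            if form_f(s, t, T) != s * a[0][0] + t * a[1][0]:
                return False
            if form_g(s, t, T) != s * a[0][1] + t * a[1][1]:
                return False
        if sylvester_form(param(s, t)) != 0:
            return False
    return True
```

Here the Sylvester form $T_1T_3 - T_2^2$ is exactly the implicit equation of the conic parametrized by $(s^2 : st : t^2)$, an element of $M$ that does not lie in $M_1 = (f, g)$ because it has degree 0 in $s, t$.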
The classical Sylvester forms are defined relative to sets of monomials (see [9]). We will make use of them in Section 4. The structure of the matrix $A$ may give rise to finer constructions (lower order Pfaffians, for example) in exceptional cases (see [20]). In our approach, the $f_i$ are elements of $M_1$, or were obtained in a previous calculation, and the ideal $(\mathbf{a})$ is derived from the matrix of syzygies $\varphi$.

3 Algebraic invariants in rational parametrizations
Let $f_1, \ldots, f_{n+1} \in R = k[x_1, \ldots, x_n]$ be forms of the same degree. They define a rational map
$$\Psi : \mathbb{P}^{n-1} \dashrightarrow \mathbb{P}^n, \quad p \mapsto (f_1(p) : f_2(p) : \cdots : f_{n+1}(p)).$$
Rational maps are defined more generally with any number $m$ of forms of the same degree, but in this work we only deal with the case where $m = n + 1$. There are two basic ingredients to the algebraic side of rational map theory: the ideal theoretic and the algebra aspects, both relevant for the nature of $\Psi$. First, the ideal $I = (f_1, \ldots, f_{n+1}) \subset R$, which in this context is called the base ideal of the rational map. Then there is the $k$-subalgebra $k[f_1, \ldots, f_{n+1}] \subset R$, which is homogeneous, hence a standard $k$-algebra up to degree renormalization. As such it gives the homogeneous coordinate ring of the (closed) image of $\Psi$. Finding the irreducible defining equation of the image is known as elimination or implicitization. We refer to [21] and [18] (also [20] for an even earlier overview) for the interplay between the ideal and the algebra, as well as its geometric consequences. In particular, the Rees algebra $\mathcal{R} = R[It]$ plays a fundamental role in the theory. A pleasant side of it is that, since $I$ is generated by forms of the same degree, one has $\mathcal{R}\otimes_R k \simeq k[f_1t, \ldots, f_{n+1}t] \subset \mathcal{R}$, which retro-explains the (closed) image of $\mathbb{P}^{n-1}$ by $\Psi$ as the image of the projection to $\mathbb{P}^n$ of the graph of $\Psi$. In particular, the fiber cone is reduced and irreducible.

3.1 Elimination degrees and birationality
Although a rational map $\mathbb{P}^{n-1} \dashrightarrow \mathbb{P}^n$ has a unique set of defining forms $f_1, \ldots, f_{n+1}$ of the same degree and unit gcd, two such maps may look "nearly" the same if they happen to be composite with a birational map of the target $\mathbb{P}^n$, a so-called Cremona transformation. If this is the case the two maps have the same degree; in particular the final elimination degrees are the same. However, it may still be the case that the two maps are composite with a rational map of the target which is not birational, so that their degrees as maps do not coincide, yet the degrees of the respective images are the same. In such an event, one would like to pick among all such maps one with smallest possible degree. This leads us to the notion of improper and proper rational parametrizations.

Definition 3.1 Let $\Psi = (f_1 : \cdots : f_{n+1}) : \mathbb{P}^{n-1} \dashrightarrow \mathbb{P}^n$ be a rational map, where $\gcd(f_1, \ldots, f_{n+1}) = 1$. We will say that $\Psi$ (or the parametrization defined by $f_1, \ldots, f_{n+1}$) is improper if there exists a rational map $\Psi' = (f'_1 : \cdots : f'_{n+1}) : \mathbb{P}^{n-1} \dashrightarrow \mathbb{P}^n$, with $\gcd(f'_1, \ldots, f'_{n+1}) = 1$, such that:
1. There is an inclusion of $k$-algebras $k[f_1, \ldots, f_{n+1}] \subset k[f'_1, \ldots, f'_{n+1}]$;
2. There is an isomorphism of $k$-algebras $k[f_1, \ldots, f_{n+1}] \simeq k[f'_1, \ldots, f'_{n+1}]$;
3. $\deg\Psi' < \deg\Psi$.

We note that if $\Psi$ is improper and $\Psi'$ is as above then the rational map $(P_1 : \cdots : P_{n+1}) : \mathbb{P}^n \dashrightarrow \mathbb{P}^n$ is not birational, where $f_j = P_j(f'_1, \ldots, f'_{n+1})$, for $1 \le j \le n + 1$. Of course, the transition forms $P_j = P_j(y_1, \ldots, y_{n+1})$ are not uniquely defined.

Example 3.2 The parametrization given by $f_1 = x_1^4$, $f_2 = x_1^2x_2^2$, $f_3 = x_2^4$ is improper since it factors through the parametrization $f'_1 = x_1^2$, $f'_2 = x_1x_2$, $f'_3 = x_2^2$ through either one of the rational maps $(y_1 : y_2 : y_3) \mapsto (y_1^2 : y_2^2 : y_3^2)$ or $(y_1 : y_2 : y_3) \mapsto (y_1^2 : y_1y_3 : y_3^2)$, neither of which is birational. Moreover, the forms $x_1^2, x_1x_2, x_2^2$ define a birational map onto its image.

We say that a rational map $\Psi = (f_1 : \cdots : f_{n+1}) : \mathbb{P}^{n-1} \dashrightarrow \mathbb{P}^n$ is proper if it is not improper.
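The factorization in Example 3.2 can be checked at integer points. The sketch below assumes the example reads $f = (x_1^4, x_1^2x_2^2, x_2^4)$ and $f' = (x_1^2, x_1x_2, x_2^2)$ with the two transition maps $(y_1^2 : y_2^2 : y_3^2)$ and $(y_1^2 : y_1y_3 : y_3^2)$; the function names are illustrative.

```python
from itertools import product


def psi(x1, x2):
    """The improper parametrization (x1^4 : x1^2 x2^2 : x2^4)."""
    return (x1 ** 4, x1 ** 2 * x2 ** 2, x2 ** 4)


def psi_prime(x1, x2):
    """The parametrization (x1^2 : x1 x2 : x2^2) through which psi factors."""
    return (x1 ** 2, x1 * x2, x2 ** 2)


def squaring_map(y):
    """(y1 : y2 : y3) -> (y1^2 : y2^2 : y3^2)."""
    return tuple(c * c for c in y)


def second_map(y):
    """(y1 : y2 : y3) -> (y1^2 : y1 y3 : y3^2)."""
    return (y[0] * y[0], y[0] * y[2], y[2] * y[2])


def proportional(u, v):
    """True when u and v represent the same projective point (all 2x2 minors vanish)."""
    return all(u[i] * v[j] == u[j] * v[i] for i in range(3) for j in range(i + 1, 3))


def factorization_holds(values=range(-4, 5)):
    """psi equals both composites with psi_prime at every sampled point."""
    return all(
        psi(a, b) == squaring_map(psi_prime(a, b)) == second_map(psi_prime(a, b))
        for a, b in product(values, repeat=2)
    )


def second_map_lands_on_conic(values=range(-3, 4)):
    """The image of second_map satisfies z1*z3 = z2^2, so the map is not dominant."""
    return all(
        second_map(y)[0] * second_map(y)[2] == second_map(y)[1] ** 2
        for y in product(values, repeat=3)
    )
```

The squaring map sends the distinct points $(1:1:1)$ and $(1:-1:1)$ to the same image, and the second map has image inside the conic $z_1z_3 = z_2^2$, so neither can be birational on $\mathbb{P}^2$.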
The need for considering proper rational maps will become apparent in the context. It is also a basic assumption in elimination theory when one is looking for the elimination degrees (see [9]). Clearly, if $\Psi$ is birational onto its image then it is proper. The converse does not hold, and one seeks precise conditions under which $\Psi$ is birational onto its image. This is the object of the following parts of this subsection. When the ideal $I = (f_1, \ldots, f_{n+1})$ has finite co-length, that is, $I$ is $(x_1, \ldots, x_n)$-primary, it is natural to consider another mapping, namely, the corresponding embedding of the Rees algebra $\mathcal{R} = R[It]$ into its integral closure $\widetilde{\mathcal{R}}$. We will bring the attached Hilbert functions into the determination of various degrees, including the elimination degree of the mapping. Thus, assume that $I$ has finite co-length. Then we may assume ($k$ is infinite) that $f_1, \ldots, f_n$ is a regular sequence, hence the multiplicity of $J = (f_1, \ldots, f_n)$ is $d^n$, the same as the multiplicity of $\mathfrak{m}^d$. This implies that $J$ is a minimal reduction of $I$ and of $\mathfrak{m}^d$. We will set up a comparison between $\mathcal{R}$ and $\mathcal{R}' = R[\mathfrak{m}^dt]$, where $\mathfrak{m} = (x_1, \ldots, x_n)$, through two relevant exact sequences:
$$0 \to \mathcal{R} \longrightarrow \mathcal{R}' \longrightarrow D \to 0, \tag{1}$$
and its reduction mod $\mathfrak{m}$
$$\bar{\mathcal{R}} \longrightarrow \bar{\mathcal{R}}' \longrightarrow \bar{D} \to 0. \tag{2}$$
$\mathcal{F} = \bar{\mathcal{R}}$ is the special fiber of $\mathcal{R}$ (or, of $I$), and since $I$ is generated by forms of the same degree, one has $\mathcal{F} \simeq k[f_1, \ldots, f_{n+1}]$ as graded $k$-algebras. By the same token, $\mathcal{F}' = \bar{\mathcal{R}}' \simeq k[\mathfrak{m}^d]$, the $d$-th Veronese subring of $R$. In particular, since $\dim\mathcal{F} = \dim\mathcal{F}'$, the leftmost map in the exact sequence (2) is injective. Also $D$ is annihilated by a power of $\mathfrak{m}$, hence $\dim D = \dim\bar{D}$. Let $\deg(\mathcal{F})$ and $\deg(\mathcal{F}')$ denote the degrees (multiplicities) of the special fibers. Since $\mathcal{F}'$ is an integral extension of $\mathcal{F}$, one has
$$\deg(\mathcal{F}') = \deg(\mathcal{F})\,[\mathcal{F}' : \mathcal{F}], \tag{3}$$
where $[\mathcal{F}' : \mathcal{F}] = \dim_K(\mathcal{F}'\otimes_{\mathcal{F}} K)$ and $K$ denotes the fraction field of $\mathcal{F}$ (see, e.g., [21, Proposition 6.1 (b) and Theorem 6.6] for more general formulas).
Since $\mathcal{F}'$ is, besides, integrally closed, the latter is also the field extension degree $[k(\mathfrak{m}^d) : K]$. Note that $[\mathcal{F}' : \mathcal{F}] = 1$ means that the extension $\mathcal{F} \subset \mathcal{F}'$ is birational (equivalently, the rational map $\Psi$ maps $\mathbb{P}^{n-1}$ birationally onto its image). As above, set $L = \mathfrak{m}^d$. We next characterize birationality in terms of both the coefficient $e_1$ and the dimension of the $\mathcal{R}$-module $D$.

Proposition 3.3 The following conditions are equivalent:
(i) $[\mathcal{F}' : \mathcal{F}] = 1$, that is, $\Psi$ is birational onto its image;
(ii) $\deg(\mathcal{F}) = d^{n-1}$;
(iii) $\dim\bar{D} \le n - 1$;
(iv) $\dim D \le n - 1$;
(v) $e_1(L) = e_1(I)$.

Proof. (i) $\Longleftrightarrow$ (ii) This is clear from (3) since $\deg(\mathcal{F}') = d^{n-1}$. (i) $\Longleftrightarrow$ (iii) Since $\ell(I) = n$ and $\mathcal{F} \subset \mathcal{F}'$ is integral, $\mathcal{F} \subset \mathcal{F}'$ is a birational extension if and only if its conductor $\mathcal{F} :_{\mathcal{F}} \mathcal{F}'$ is nonzero, equivalently, if and only if $\dim\bar{D} \le n - 1$. (iv) $\Longleftrightarrow$ (iii) Clearly, $\dim D \le n$ and in the case of equality its multiplicity is $e_1(L) - e_1(I) > 0$. Therefore, the equivalence of the two statements follows suit. ✷

There is some advantage in examining $\bar{D}$ since $\mathcal{F}$ is a hypersurface ring, $\mathcal{F} = k[T_1, \ldots, T_{n+1}]/(f) = R[T_1, \ldots, T_{n+1}]/(x_1, \ldots, x_n, f)$, a complete intersection. Since $\mathcal{F}'$ is also Cohen-Macaulay, with a well-known presentation, it affords an understanding of $\bar{D}$, and sometimes, of $D$.

3.2 Calculation of $e_1(I)$ of the base ideal of a rational map
One objective here is to apply some general formulas for the Chern number $e_1(I)$ of an ideal $I$ to the case of the base ideal of a rational map with source $\mathbb{P}^1 = \mathrm{Proj}(k[x_1, x_2])$. Here is a method put together from scattered facts in the literature of Rees algebras (see [23, Chapter 2]).

Proposition 3.4 Let $(R, \mathfrak{m})$ be a Cohen-Macaulay local ring of dimension $d$, let $I$ be an $\mathfrak{m}$-primary ideal with a minimal reduction $J = (a_1, \ldots, a_d)$. Set $R' = R/(a_1, \ldots, a_{d-1})$, $I' = IR'$.
Then
(i) $e_0(I) = e_0(I') = \lambda(R/J)$, $e_1(I) = e_1(I')$;
(ii) $r(I') < \deg R' \le e_0(I)$; in particular, for $n \ge r = r(I')$, one has $I'^{\,n+1} = a_d I'^{\,n}$;
(iii) $\lambda(R'/I'^{\,r+1}) = \lambda(R'/I'^{\,r}) + \lambda(I'^{\,r}/a_d I'^{\,r}) = e_0(I)(r + 1) - e_1(I)$;
(iv) $e_1(I) = -\lambda(R'/I'^{\,r}) + e_0(I)\,r$.

It would be desirable to develop a direct method suitable for the ideal $I = (a, b, c)$ generated by forms of $R = k[s, t]$ of degree $n$. We may assume that $a, b$ form a regular sequence (i.e. $\gcd(a, b) = 1$). We already know that $e_0(I) = n^2$. For regular rings, one knows ([15]) that $e_1(I) \le \frac{d-1}{2}\, e_0(I)$, $d = \dim R$. Nevertheless the steps above already lead to an efficient calculation for two reasons: the multiplicity $e_0(I)$ is known at the outset, and the calculation does not really involve the powers of $I$. Forms of degree up to 10 are handled well by Macaulay2 ([11]).

4 Sylvester forms in dimension two

We establish the basic notation to be used throughout. $R = k[s, t]$ is a polynomial ring over the infinite field $k$, and $I \subset R$ is a codimension 2 ideal generated by 3 forms of the same degree $n + 1$, with free graded resolution

$$0 \longrightarrow R(-n-1-\mu) \oplus R(-2(n+1)+\mu) \xrightarrow{\ \varphi\ } R^3(-n-1) \longrightarrow I \longrightarrow 0, \qquad \varphi = \begin{pmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \\ \gamma_1 & \gamma_2 \end{pmatrix}.$$

Then the symmetric algebra of $I$ is $S \simeq R[T_1, T_2, T_3]/(f, g)$ with

$$f = \alpha_1 T_1 + \beta_1 T_2 + \gamma_1 T_3, \qquad g = \alpha_2 T_1 + \beta_2 T_2 + \gamma_2 T_3.$$

Starting out from these 2 forms, the defining equations of $S$, and following [9], we obtain by elimination forms of higher degree in the defining ideal of $\mathcal{R}(I)$. We will make use of a computer-assisted methodology to show that these algorithmically specified sets generate the ideal of definition $M$ of $\mathcal{R}(I)$ in several cases of interest, in particular answering some questions raised in [9]. More precisely, the so-called ideal of moving forms $M$ is given when $I$ is generated by forms of degree at most 5. In arbitrary degree, the algorithm will provide the elimination equation in significant cases.

4.1 Basic Sylvester forms in dimension 2

Let $R = k[s, t]$ and let $F, G \in B = R[s, t, T_1, T_2, T_3]$.
If $F, G \in (u, v)B$ for some ideal $(u, v) \subset R$, say $F = au + bv$ and $G = cu + dv$, the form $h = ad - bc = \det(F, G)_{(u,v)}$ will be called a basic Sylvester form. To explain their naturalness, even for ideals $I$ not necessarily generated by forms, we give an approach to irreducible decomposition of certain ideals.

Theorem 4.1 Let $(R, \mathfrak{m})$ be a Gorenstein local ring and let $I$ be an $\mathfrak{m}$-primary ideal. Let $J \subset I$ be an ideal generated by a system of parameters and let $E = (J : I)/J$ be the canonical module of $R/I$. If $E = (e_1, \ldots, e_r)$, $e_i \ne 0$, and $I_i = \mathrm{ann}(e_i)$, then each $I_i$ is an irreducible ideal and $I = \bigcap_{i=1}^{r} I_i$.

The statement and its proof will apply to ideals of rings of polynomials over a field.

Proof. The module $E$ is the injective envelope of $R/I$, and therefore it is a faithful $R/I$-module (see [2, Section 3.2] for relevant notions). For each $e_i$, $Re_i$ is a nonzero submodule of $E$ whose socle is contained in the socle of $E$ (which is isomorphic to $R/\mathfrak{m}$), and therefore its annihilator $I_i$ (as an $R$-ideal) is irreducible. Since the intersection of the $I_i$ is the annihilator of $E$, the asserted equality follows. ✷

Corollary 4.2 Let $(R, \mathfrak{m})$ be a regular local ring of dimension two and let $I$ be an $\mathfrak{m}$-primary ideal with a free resolution

$$0 \to R^{n-1} \xrightarrow{\ \varphi\ } R^n \longrightarrow I \to 0, \qquad \varphi = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n-1} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n-1} \\ a_{n,1} & \cdots & a_{n,n-1} \end{pmatrix},$$

and suppose that the last two maximal minors $\Delta_{n-1}, \Delta_n$ of $\varphi$ form a regular sequence. If $e_1, \ldots, e_{n-1}$ are as above, then

$$(\Delta_{n-1}, \Delta_n) : I = I_{n-2}(\xi') = (e_1, \ldots, e_{n-1})$$

and each ideal $(\Delta_{n-1}, \Delta_n) : e_i$ is a complete intersection of codimension 2.

Proof. The assertion that the irreducible $I_i$ is a complete intersection is a result of Serre, valid for all two-dimensional regular rings whose projective modules are free. ✷

Remark 4.3 In our applications, $I = C(f, g)$, the content ideal of $f, g$. In some of these cases, $C(f, g) = (s, t)^n$ for some $n$, an ideal which admits the irreducible decomposition

$$(s, t)^n = \bigcap_{i=1}^{n} (s^i, t^{n+1-i}).$$
One can then process $f, g$ through all the pairs $\{s^i, t^{n-i+1}\}$, and collect the determinants for the next round of elimination. As with the classical Sylvester forms, the inclusion $C(f, g) \subset (s, t)^n$ may be used anyway to start the process, although without the measure of control of degrees afforded by the equality of ideals.

4.2 Cohen-Macaulay algebras

We pointed out in Theorem 2.1 that the basic control of Cohen-Macaulayness of the Rees algebra of an ideal $I \subset k[s, t]$ is that its reduction number be at most 1. We next give a means of checking this property directly off a free presentation of $I$.

Theorem 4.4 Let $I \subset R$ be an ideal of codimension 2, minimally generated by 3 forms of the same degree. Let

$$\varphi = \begin{pmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \\ \gamma_1 & \gamma_2 \end{pmatrix}$$

be the Hilbert-Burch presentation matrix of $I$. Then $\mathcal{R}$ is Cohen-Macaulay if and only if the equalities of ideals of $R$ hold

$$(\alpha_1, \beta_1, \gamma_1) = (\alpha_2, \beta_2, \gamma_2) = (u, v),$$

where $u, v$ are forms.

Proof. Consider the presentation

$$0 \to L \longrightarrow S = R[T_1, T_2, T_3]/(f, g) \longrightarrow \mathcal{R} \to 0,$$

where $f, g$ are the 1-forms $[f \ \ g] = [T_1 \ \ T_2 \ \ T_3] \cdot \varphi$.

If $\mathcal{R}$ is Cohen-Macaulay, the reduction number of $I$ is 1 by Theorem 2.1, so there must be a nonzero quadratic form $h$ with coefficients in $k$ in the presentation ideal $M$ of $\mathcal{R}$. In addition to $h$, this ideal contains $f, g$, hence in order to produce such terms its Hilbert-Burch matrix must be of the form

$$\begin{pmatrix} u & v \\ p_1 & p_2 \\ q_1 & q_2 \end{pmatrix},$$

where $u, v$ are forms of $k[s, t]$, and the other entries are 1-forms of $k[T_1, T_2, T_3]$. Since $p_1, p_2$ and $q_1, q_2$ are pairs of linearly independent 1-forms, the assertion about the ideals defined by the columns of $\varphi$ follows.

4.3 Base ideals generated in degree 4

This is the case treated by D. Cox in his Luminy lecture ([9]). We accordingly change the notation to $R = k[s, t]$, $I = (f_1, f_2, f_3)$, forms of degree 4. The field $k$ is infinite, and we further assume that $f_1, f_2$ form a regular sequence so that $J = (f_1, f_2)$ is a reduction of $I$ and of $(s, t)^4$.
Let

$$0 \to R(-4-\mu) \oplus R(-8+\mu) \xrightarrow{\ \varphi\ } R^3(-4) \longrightarrow R \longrightarrow R/I \to 0, \qquad \varphi = \begin{pmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \\ \gamma_1 & \gamma_2 \end{pmatrix} \tag{4}$$

be the Hilbert-Burch presentation of $I$. We obtain the equations of $f_1, f_2, f_3$ from this matrix. Note that $\mu$ is the degree of the first column of $\varphi$, and $4 - \mu$ is the degree of the other. Let us first consider (as in [9]) the case $\mu = 2$.

Balanced case

We shall now give a computer-assisted treatment of the balanced case, that is, when the resolution (4) of the ideal $I$ has $\mu = 2$ and the content ideal of the syzygies is $(s, t)^2$. Since $k$ is infinite, it is easy to show that there is a change of variables, $T_1, T_2, T_3 \to x, y, z$, so that $(s^2, st, t^2)$ is a syzygy of $I$. The forms $f, g$ that define the symmetric algebra of $I$ can then be written

$$[f \ \ g] = [s^2 \ \ st \ \ t^2] \begin{pmatrix} x & u \\ y & v \\ z & w \end{pmatrix},$$

where $u, v, w$ are linear forms in $x, y, z$. Finally, we will assume that the ideal $I_2\begin{pmatrix} x & y & z \\ u & v & w \end{pmatrix}$ has codimension two. Note that this is a generic condition.

We introduce now the equations of $I$.

• Linear equations $f$ and $g$:

$$[f \ \ g] = [x \ \ y \ \ z]\, \varphi = [x \ \ y \ \ z] \begin{pmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \\ \gamma_1 & \gamma_2 \end{pmatrix} = [s^2 \ \ st \ \ t^2] \begin{pmatrix} x & u \\ y & v \\ z & w \end{pmatrix},$$

where $u, v, w$ are linear forms in $x, y, z$.

• Biforms $h_1$ and $h_2$: Write $\Gamma_1$ and $\Gamma_2$ such that

$$[f \ \ g] = [x \ \ y \ \ z]\, \varphi = [s^2 \ \ t]\, \Gamma_1 = [s \ \ t^2]\, \Gamma_2.$$

Then $h_1 = \det \Gamma_1$ and $h_2 = \det \Gamma_2$.

• Implicit equation $F = \det \Theta$, where $[h_1 \ \ h_2] = [s \ \ t]\, \Theta$.

Using generic entries for $\varphi$, in place of the true $k$-linear forms in the old variables $x, y, z$, we consider the ideal of $k[s, t, x, y, z, u, v, w]$ defined by

$$\begin{aligned}
f &= s^2 x + sty + t^2 z \\
g &= s^2 u + stv + t^2 w \\
h_1 &= -syu - tzu + sxv + txw \\
h_2 &= -szu - tzv + sxw + tyw \\
F &= -z^2u^2 + yzuv - xzv^2 - y^2uw + 2xzuw + xyvw - x^2w^2
\end{aligned}$$

Proposition 4.5 If $I_2\begin{pmatrix} x & y & z \\ u & v & w \end{pmatrix}$ specializes to a codimension two ideal of $k[x, y, z]$, then $L = (f, g, h_1, h_2, F) \subset A = R[x, y, z, u, v, w]$ specializes to the defining ideal of $\mathcal{R}$.

Proof.
Macaulay2 ([11]) gives a resolution

$$0 \to A \xrightarrow{\ d_2\ } A^5 \longrightarrow A^5 \longrightarrow L \to 0,$$

where the entries of $d_2$ include $zv - yw$, $zu - xw$, $-yu + xv$. The assumption on $I_2\begin{pmatrix} x & y & z \\ u & v & w \end{pmatrix}$ says that the entries of $d_2$ generate an ideal of codimension four, and thus implies that the specialization $LS$ has projective dimension two and that it is unmixed. Since $LS \not\subset (s, t)S$, there is an element $q \in (s, t)R$ that is regular on $S/LS$. If $LS = Q_1 \cap \cdots \cap Q_r$ is the primary decomposition of $LS$, the localization $LS_q$ has the corresponding decomposition since $q$ is not contained in any of the $Q_i$. But now $\mathrm{Sym}_q = \mathcal{R}_q$, so $LS_q = (f, g)_q$, as $I_q = R_q$. ✷

Non-balanced case

We shall now give a similar computer-assisted treatment of the non-balanced case, that is, when the resolution (4) of the ideal $I$ has $\mu = 3$. This implies that the content ideal of the syzygies is $(s, t)$. Let us first indicate how the proposed algorithm would behave.

• Write the forms $f, g$ as

$$f = as + bt, \qquad g = cs + dt, \qquad \text{where } \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} x & y & z \\ u & v & w \end{pmatrix} \begin{pmatrix} s^2 \\ st \\ t^2 \end{pmatrix}.$$

• The next form is the Jacobian of $f, g$ with respect to $(s, t)$:

$$h_1 = \det(f, g)_{(s,t)} = ad - bc = -bxs^2 - byst - bzt^2 + aus^2 + avst + awt^2.$$

• The next two generators are

$$h_2 = \det(f, h_1)_{(s,t)} = b^2xs + b^2yt - abzt - abus - abvt + a^2wt$$

and the elimination equation

$$h_3 = \det(f, h_2)_{(s,t)} = -b^3x + ab^2y - a^2bz + ab^2u - a^2bv + a^3w.$$

Proposition 4.6 $L = (f, g, h_1, h_2, h_3) \subset A = k[s, t, x, y, z, u, v, w]$ specializes to the defining ideal of $\mathcal{R}$.

Proof. Macaulay2 ([11]) gives the following resolution of $L$:

$$0 \to A^2 \xrightarrow{\ \varphi\ } A^6 \xrightarrow{\ \psi\ } A^5 \longrightarrow L \to 0,$$

$$\psi = \begin{pmatrix}
-b^2x + abu & -b^2y + abz + abv - a^2w & -bsx - bty + asu + atv & -btz + atw & -s^2x - sty - t^2z & -s^2u - stv - t^2w \\
t & -s & 0 & 0 & 0 & 0 \\
a & b & t & -s & 0 & 0 \\
0 & 0 & a & b & t & -s \\
0 & 0 & 0 & 0 & a & b
\end{pmatrix}.$$

The ideal of $2 \times 2$ minors of $\varphi$ has codimension 4, even after we specialize from $A$ to $S$ in the natural manner. Since $LS$ has projective dimension two, it will be unmixed. As $LS \not\subset (s, t)$, there is an element $u \in (s, t)R$ that is regular on $S/LS$.
If $LS = Q_1 \cap \cdots \cap Q_r$ is the primary decomposition of $LS$, the localization $LS_u$ has the corresponding decomposition since $u$ is not contained in any of the $Q_i$. But now $\mathrm{Sym}_u = \mathcal{R}_u$, so $LS_u = (f, g)_u$, as $I_u = R_u$. ✷

4.4 Degree 5 and above

It may be worthwhile to extend this to arbitrary degree, that is, to assume that $I$ is defined by 3 forms of degree $n + 1$ (for convenience in the notation to follow). We first consider the case $\mu = 1$. Using the procedure above, we would obtain the following sequence of polynomials in $A = R[a, b, x_1, \ldots, x_n, y_1, \ldots, y_n]$.

• Write the forms $f, g$ as

$$f = as + bt, \qquad g = cs + dt, \qquad \text{where } \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} x_1 & \cdots & x_n \\ y_1 & \cdots & y_n \end{pmatrix} \begin{pmatrix} s^{n-1} \\ s^{n-2}t \\ \vdots \\ st^{n-2} \\ t^{n-1} \end{pmatrix}.$$

• The next form is the Jacobian of $f, g$ with respect to $(s, t)$: $h_1 = \det(f, g)_{(s,t)} = ad - bc$.

• Successively we would set $h_{i+1} = \det(f, h_i)_{(s,t)}$, $i < n$.

• The polynomial $h_n = \det(f, h_{n-1})_{(s,t)}$ is the elimination equation.

Proposition 4.7 $L = (f, g, h_1, \ldots, h_5) \subset A$ specializes to the defining ideal of $\mathcal{R}$.

In Macaulay2, we checked the degree 5 and degree 6 cases. In both cases, the ideal $L$ (which has one more generator in degree 6) has a projective resolution of length 2 and the ideal of maximal minors of the last map has codimension four.

Conjecture 4.8 For arbitrary $n$, $L = (f, g, h_1, \ldots, h_n) \subset A$ has projective dimension two and specializes to the defining ideal of $\mathcal{R}$.

In degree 5, the interesting case is when the Hilbert-Burch matrix $\varphi$ has degrees 2 and 3. Let us describe the proposed generators. For simplicity, by a change of coordinates, we assume that the coordinates of the degree 2 column of $\varphi$ are $s^2, st, t^2$:

$$\begin{aligned}
f &= s^2x + sty + t^2z \\
g &= (s^3w_1 + s^2tw_2 + st^2w_3 + t^3w_4)\,x + (s^3w_5 + s^2tw_6 + st^2w_7 + t^3w_8)\,y + (s^3w_9 + s^2tw_{10} + st^2w_{11} + t^3w_{12})\,z
\end{aligned}$$

Let

$$\begin{pmatrix} f \\ g \end{pmatrix} = \begin{pmatrix} x & y & z \\ sA & sB + tC & tD \end{pmatrix} \begin{pmatrix} s^2 \\ st \\ t^2 \end{pmatrix} = B_1 \begin{pmatrix} s^2 \\ t \end{pmatrix} = B_2 \begin{pmatrix} s \\ t^2 \end{pmatrix},$$

$$B_1 = \begin{pmatrix} x & ys + zt \\ sA + tB & stC + t^2D \end{pmatrix}, \qquad B_2 = \begin{pmatrix} xs + yt & z \\ s^2A + stB & sC + tD \end{pmatrix},$$

where $A, B, C, D$ are $k$-linear forms in $x, y, z$.
$$\begin{aligned}
h_1 = \det(B_1) &= s^2(-yA) + st(xC - yB - zA) + t^2(xD - zB) \\
&= s^2(-yA) + t(xCs - yBs - zAs + xDt - zBt) \\
&= s(-yAs + xCt - yBt - zAt) + t^2(xD - zB),
\end{aligned}$$

$$\begin{aligned}
h_2 = \det(B_2) &= s^2(xC - zA) + st(xD + yC - zB) + t^2(yD) \\
&= s^2(xC - zA) + t(xDs + yCs - zBs + yDt) \\
&= s(xCs - zAs + xDt + yCt - zBt) + t^2(yD).
\end{aligned}$$

$$C_1 = \begin{pmatrix} x & ys + zt \\ -yA & xCs - yBs - zAs + xDt - zBt \end{pmatrix}, \qquad C_2 = \begin{pmatrix} xs + yt & z \\ -yAs + xCt - yBt - zAt & xD - zB \end{pmatrix},$$

$$C_3 = \begin{pmatrix} x & ys + zt \\ xC - zA & xDs + yCs - zBs + yDt \end{pmatrix}, \qquad C_4 = \begin{pmatrix} xs + yt & z \\ xCs - zAs + xDt + yCt - zBt & yD \end{pmatrix},$$

$$\begin{aligned}
c_1 = \det(C_1) &= x^2(Cs + Dt) + xy(-Bs) + xz(-As - Bt) + yz(At) + y^2(As) \\
c_2 = \det(C_2) &= x^2(Ds) + xy(Dt) + xz(-Bs - Ct) + yz(As) + z^2(At) \\
c_3 = \det(C_3) &= x^2(Ds) + xy(Dt) + xz(-Bs - Ct) + yz(As) + z^2(At) \\
c_4 = \det(C_4) &= xy(Ds) + xz(-Cs - Dt) + yz(-Ct) + z^2(As + Bt) + y^2(Dt)
\end{aligned}$$

Finally, consider the matrix of coefficients of $f, h_1, h_2$ with respect to $s^2, st, t^2$,

$$\begin{pmatrix} x & y & z \\ -yA & xC - yB - zA & xD - zB \\ xC - zA & xD + yC - zB & yD \end{pmatrix}.$$

Then its determinant is

$$F = -x^3D^2 + x^2yCD + xy^2(-BD) + x^2z(2BD - C^2) + xz^2(2AC - B^2) + xyz(BC - 3AD) + y^2z(-AC) + yz^2(AB) + y^3(AD) + z^3(-A^2),$$

an equation of degree 5. In particular, the parametrization is birational.

Proposition 4.9 $L = (f, g, h_1, h_2, c_1, c_2, c_4, F)$ specializes to the defining ideal of $\mathcal{R}$.

Proof. Using Macaulay2, the ideal $L$ has a resolution:

$$0 \longrightarrow S^1 \xrightarrow{\ d_3\ } S^6 \xrightarrow{\ d_2\ } S^{12} \xrightarrow{\ d_1\ } S^8 \longrightarrow L \longrightarrow 0,$$

$$d_3 = [\,-z \ \ y \ \ x \ \ -t \ \ s \ \ 0\,]^t$$

y z 0 0 0 0 x 0 z 0 0 0 −v 0 0 z 0 x2w4 − xzw7 + xyw8 + xzw12 u 0 0 0 z −xzw3 + xyw4 + z 2w6 − yzw7 + y 2w8 − xzw8 − z 2w11 + yzw12 0 x −y 0 0 0 0 −v 0 −y 0 xzw1 − x 2w3 + yzw5 + z 2w9 − xzw11 0 u 0 0 −y xzw2 − x 2w4 + z 2w10 − xzw12 0 0 u 0 −x xzw1 + yzw5 − xzw6 + x 2w8 + z 0 0 0 u v 0 0 0 v x 0 −xyw1 + x 2w2 − y 2w5 + xyw6 − x 2w7 − yzw9 + xzw10 0 0 0 0 0 −t 0 0 0 0 0 s

The ideals of maximal minors give $\mathrm{codim}\, I_1(d_3) = 5$ and $\mathrm{codim}\, I_5(d_2) = 4$ after specialization. As we have been arguing, this suffices to show that the specialization is a prime ideal of codimension two. ✷

Elimination forms in higher degree

In degrees greater than 5, the methods above are not very suitable. However, in several cases they are still supple enough to produce the elimination equation. We have already seen this when one of the syzygies is of degree 1. Let us describe two other cases.
• Degree $n = 2p$, $f$ and $g$ both of degree $p$. We use the decomposition

$$(s, t)^p = \bigcap_{i=1}^{p} (s^i, t^{p+1-i}).$$

For each $1 \le i \le p$, let $h_i = \det(f, g)_{(s^i,\, t^{p+1-i})}$. These are quadratic polynomials with coefficients in $(s, t)^{p-1}$. We set

$$[h_1, \cdots, h_p] = [s^{p-1}, \cdots, t^{p-1}] \cdot A,$$

where $A$ is a $p \times p$ matrix whose entries are 2-forms in $k[x, y, z]$. The Sylvester form of degree $n$, $F = \det(A)$, is the required elimination equation.

• Degree $n = 2p + 1$, $f$ of degree $p$. We use the decomposition

$$(s, t)^p = \bigcap_{i=1}^{p} (s^i, t^{p+1-i}).$$

For each $1 \le i \le p$, let $h_i = \det(f, g)_{(s^i,\, t^{p+1-i})}$. These are quadratic polynomials with coefficients in $(s, t)^p$. We set

$$[f, h_1, \cdots, h_p] = [s^p, \cdots, t^p] \cdot B,$$

where $B$ is a $(p + 1) \times (p + 1)$ matrix with one column whose entries are linear forms and the remaining columns with entries 2-forms in $k[x, y, z]$. The Sylvester form $F = \det(B)$ is the required elimination equation.

References

[1] I. M. Aberbach, C. Huneke and N. V. Trung, Reduction numbers, Briançon-Skoda theorems and depth of Rees algebras, Compositio Math. 97 (1995), 403–434.
[2] W. Bruns and J. Herzog, Cohen-Macaulay Rings, Cambridge University Press, 1993.
[3] L. Busé and J.-P. Jouanolou, On the closed image of a rational map and the implicitization problem, J. Algebra 265 (2003), 312–357.
[4] L. Busé, M. Chardin and J.-P. Jouanolou, Complement to the implicitization of rational hypersurfaces by means of approximation complexes, Arxiv, 2006.
[5] L. Busé, D. Cox and C. D'Andrea, Implicitization of surfaces in P3 in the presence of base points, J. Algebra Appl. 2 (2003), 189–214.
[6] D. A. Cox, T. Sederberg, and F. Chen, The moving line ideal basis of planar rational curves, Comput. Aided Geom. Des. 15 (1998), 803–827.
[7] D. A. Cox, R. N. Goldman, and M. Zhang, On the validity of implicitization by moving quadrics for rational surfaces with no base points, J. Symbolic Computation 29 (2000), 419–440.
[8] D. A. Cox, Equations of parametric curves and surfaces via syzygies, Contemporary Mathematics 286 (2001), 1–20.
[9] D.
A. Cox, Four conjectures: Two for the moving curve ideal and two for the Bezoutian, Proceedings of “Commutative Algebra and its Interactions with Algebraic Geometry”, CIRM, Luminy, France, May 2006 (available in CD media).
[10] C. D’Andrea, Resultants and moving surfaces, J. Symbolic Computation 31 (2001), 585–602.
[11] D. Grayson and M. Stillman, Macaulay2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[12] J. Herzog, A. Simis and W. V. Vasconcelos, Koszul homology and blowing-up rings, in Commutative Algebra, Proceedings: Trento 1981 (S. Greco and G. Valla, Eds.), Lecture Notes in Pure and Applied Mathematics 84, Marcel Dekker, New York, 1983, 79–169.
[13] J. P. Jouanolou, Formes d’inertie et résultant: un formulaire, Adv. Math. 126 (1997), 119–250.
[14] B. Johnson and D. Katz, Castelnuovo regularity and graded rings associated to an ideal, Proc. Amer. Math. Soc. 123 (1995), 727–734.
[15] C. Polini, B. Ulrich and W. V. Vasconcelos, Normalization of ideals and Briançon-Skoda numbers, Math. Research Letters 12 (2005), 827–842.
[16] M. E. Rossi, On symmetric algebras which are Cohen-Macaulay, Manuscripta Math. 34 (1981), 199–210.
[17] T. Sederberg, R. Goldman and H. Du, Implicitizing rational curves by the method of moving algebraic curves, J. Symbolic Computation 23 (1997), 153–175.
[18] A. Simis, Cremona transformations and some related algebras, J. Algebra 280 (1) (2004), 162–179.
[19] A. Simis, B. Ulrich and W. V. Vasconcelos, Cohen-Macaulay Rees algebras and degrees of polynomial relations, Math. Annalen 301 (1995), 421–444.
[20] A. Simis, B. Ulrich and W. V. Vasconcelos, Jacobian dual fibrations, Amer. J. Math. 115 (1993), 47–75.
[21] A. Simis, B. Ulrich and W. V. Vasconcelos, Codimension, multiplicity and integral extensions, Math. Proc. Camb. Phil. Soc. 130 (2001), 237–257.
[22] W. V. Vasconcelos, Arithmetic of Blowup Algebras, London Math.
Soc., Lecture Note Series 195, Cambridge University Press, 1994.
[23] W. V. Vasconcelos, Integral Closure, Springer Monographs in Mathematics, New York, 2005.

ABSTRACT

We study birational maps with empty base locus defined by almost complete intersection ideals. Birationality is shown to be expressed by the equality of two Chern numbers. We provide a relatively effective method of their calculation in terms of certain Hilbert coefficients. In dimension two the structure of the irreducible ideals leads naturally to the calculation of Sylvester determinants via a computer-assisted method. For degree at most 5 we produce the full set of defining equations of the base ideal. The results answer affirmatively some questions raised by D. Cox.

<|endoftext|><|startoftext|> Introduction

We have all become accustomed to sending messages electronically, whether by fax machine, telephone, computer or other electronic media. Most of these messages contain data that is already publicly known or at least easily found. Other messages are things we would like to keep to ourselves, and it would be inconvenient if some third party came across the message. Still other messages are extremely private and resources, jobs, or even lives(!) might be lost if the message fell into the wrong hands.
A great deal of effort is employed to encrypt the messages that fall in this last category, sending them with some sort of code in order to prevent any third party from understanding them even if the messages are intercepted.[1] However, when a message is sent electronically there is no commonly available technology to determine if someone has been trying to intercept the message.

When sending typed letters, such a technology does exist, albeit in an imperfect form. We often seal our letters in envelopes. These envelopes are not secure, that is, they do not prevent anyone from opening the envelope and reading the letter inside. However, when an envelope is received intact, without any tears or other indication that it has been tampered with, we have a strong reason to believe that the message inside has not been seen by anyone since the earlier time when the sender sealed it. Yet a seal on an envelope is not to be wholly trusted for this task of detecting eavesdroppers. A skilled person might be able to examine the contents of the sealed envelope in any number of ways: by using x-rays or other similar non-destructive testing methods, by steaming the seal off and re-sealing, or by ripping open the envelope and then placing the letter in a new, forged envelope that matches the original in every detail.

1 Previous address: Army Research Laboratory, Adelphi, MD
http://arxiv.org/abs/0704.0609v1

In this paper we introduce a quantum cryptographic protocol that allows two users to send and receive a message in a manner that is, in effect, quite similar to the use of a sealed envelope. The receiver of the message has the opportunity to check if there have been any active eavesdroppers trying to learn the contents of the message. And similar to a message sealed in an envelope, the message remains unknown to anyone who is not actively trying to learn the contents.
This protocol has an advantage over sealing letters in envelopes: the limited types of interactions allowed by quantum mechanics prevent anyone from eavesdropping on the message without leaving signs of the eavesdropping activity.

It is important to make it clear that messages sent using the protocol introduced here are not secure. That is, an eavesdropper can always choose to take some action in order to determine the content of a message sent using this protocol. (We give an example of one such effective eavesdropping strategy below.) The quality that makes this protocol distinct from other methods of message transmission is that any such active eavesdropping strategy will cause an appreciable amount of “noise” that is detectable by the message receiver. The analysis that a message receiver undertakes to place a bound on what an eavesdropper could have learned during a particular message transmission is not undertaken here. This analysis can be found elsewhere.[2]

The goal of this manuscript is to examine a certain class of strategies for eavesdropping on these sealed messages, and it is divided into four parts. First, the quantum message sealing protocol is introduced. Following this, we examine a certain class of eavesdropping strategies and describe what an eavesdropper expects to learn by employing such strategies. Next, we describe the type and amount of disturbance the eavesdropper will cause by such an activity and work out the details of an example from this class of eavesdropping strategies. We conclude with a discussion of this protocol and its similarities and differences to other quantum cryptographic protocols.

2. Message sealing protocol

We describe the protocol where the message sender named Alice transfers a message to the receiver named Bob. This message will be a single bit b which is either zero or one. The protocol utilizes a single quantum mechanical system which has two degrees of freedom.
The standard notation for such a system is used, with |0〉 and |1〉 representing vectors that form an orthonormal basis. The protocol also involves a number of announcements made by the message sender. These announcements are to be considered as public announcements to which everyone is assumed to have access. A process, referred to as a single shot, will be repeated many times and goes as follows:

Step 1 - Bob prepares a quantum system, which we refer to as a particle, in one of four pure states: |0〉, |1〉, |+〉 ≡ (|0〉 + |1〉)/√2, or |−〉 ≡ (|0〉 − |1〉)/√2. The decision as to which state to prepare is made at random with equal probability for each state. He records the state he has prepared and then he sends the particle to Alice.

Step 2 - Alice makes one of two measurements with equal probability. She either makes a measurement corresponding to σ1 = |+〉〈+| − |−〉〈−| or she makes a measurement corresponding to σ3 = |0〉〈0| − |1〉〈1|. Each of these two measurements can be said to have a result m that is either m = +1 or m = −1.

Step 3 - Alice announces whether her measurement corresponded to σ1 or σ3.

Step 4 - Alice makes one of two possible announcements. With probability pa she makes a bit-announcement (described immediately below) and with probability (1 − pa) she makes a result-announcement. She also makes it known which of the two types of announcement she is making.

Bit-Announcement: She announces a bit c that is determined by using the message bit b and the measurement result. If her measurement yielded the result m = +1 then her announced bit c will be the same as the message bit b, and if her measurement yielded the result m = −1 her announced bit c will be the opposite of the message bit b.

Result-Announcement: She announces the result of her measurement, m = +1 or m = −1.
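The four steps of a single shot can be sketched in a few lines of Python. This is only an illustrative simulation of the noiseless case (no eavesdropper), not part of the protocol specification: the names `single_shot`, `bob_decode` and `prob_plus` are hypothetical, and the outcome probabilities are the Born-rule values for the four prepared states (1 or 0 when Bob's state is an eigenstate of Alice's measurement, 1/2 otherwise). The decoding function assumes the rule Bob uses when the bases match: keep the announced bit if he prepared |0〉 or |+〉, flip it if he prepared |1〉 or |−〉.

```python
import random

# Basis in which each of Bob's four states is an eigenstate (assumed labels).
BASIS_OF = {"0": "sigma3", "1": "sigma3", "+": "sigma1", "-": "sigma1"}
# States that are +1 eigenstates of their own basis.
PLUS_EIGENSTATES = ("0", "+")


def prob_plus(prepared, meas):
    """Born-rule probability of m = +1 for an undisturbed particle."""
    if BASIS_OF[prepared] == meas:
        return 1.0 if prepared in PLUS_EIGENSTATES else 0.0
    return 0.5  # mismatched basis: both results equally likely


def single_shot(b, p_a, rng):
    """Run Steps 1-4 once for message bit b and bit-announcement probability p_a."""
    prepared = rng.choice(["0", "1", "+", "-"])           # Step 1 (Bob)
    meas = rng.choice(["sigma1", "sigma3"])                # Step 2 (Alice)
    m = +1 if rng.random() < prob_plus(prepared, meas) else -1
    # Step 3: `meas` is announced publicly.
    if rng.random() < p_a:                                 # Step 4: bit-announcement
        c = b if m == +1 else 1 - b
        return {"prepared": prepared, "meas": meas, "kind": "bit", "value": c}
    return {"prepared": prepared, "meas": meas, "kind": "result", "value": m}


def bob_decode(shot):
    """Return the message bit Bob infers, or None if this shot is not usable."""
    if shot["kind"] != "bit" or BASIS_OF[shot["prepared"]] != shot["meas"]:
        return None
    c = shot["value"]
    return c if shot["prepared"] in PLUS_EIGENSTATES else 1 - c
```

With no disturbance on the channel, every shot that Bob can decode returns the message bit, and every result-announcement on a matching basis agrees with the state he prepared.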
When Bob prepares the particle in the state |0〉 or |1〉 and Alice makes a σ3 measurement, or when Bob prepares the particle in the state |+〉 or |−〉 and Alice makes a σ1 measurement we say that Alice’s measurement and Bob’s state preparation have a matching basis. They will have a matching basis on half the shots performed. When this occurs then Bob knows the result of the measurement without Alice having to announce it, provided that the state of the particle did not change from when Bob prepared it to when Alice makes the measurement. The correlations between Bob’s state preparation and Alice’s measurement results allow Bob to both determine the message bit and check the channel for any disturbances. When Alice makes a measurement in the basis matching Bob’s state preparation, Bob determines the message by applying a controlled-bit-flip operation on the announced bit. When the state in which he prepared the particle is either |0〉 or |+〉 then the message bit b is the same as the announced bit c and if he prepared |1〉 or |−〉 then the message b is the opposite of the announced bit c. From an eavesdropper’s point of view, the probability that the message bit is one value or the other is determined from the coded bit-announcements. When both values of the measurement result are equally likely then both values of the message bit are equally likely (for either bit-announcement). The four possible initial states that Bob prepares and the two possible measurements were chosen so that either measurement result is equally likely. Moreover, the only opportunity that an eavesdropper has to change these probabilities is to change the state of the particle when it is traveling from Bob to Alice. The rules of quantum mechanics allow for the state of a quantum mechanical system to change in two different ways: by a unitary evolution or by a measurement. 
If we want to describe the effects of coupling the quantum system composed of the particle to another (auxiliary) quantum system and then letting the state of the whole system (particle plus auxiliary) change via unitary evolution or measurement, the entire process can be described as a quantum operation or a generalized measurement on the state of the particle subsystem.[3] In the following sections we examine the case when an eavesdropper chooses to change the state of the particle by applying a quantum operation. It is worthwhile to emphasize that while using this type of eavesdropping activity is not optimal,[2] it provides us with some intuition as to how this protocol can be expected to work.

3. Information gain from quantum operations

In this section we quantify what an eavesdropper learns by applying a quantum operation to change the state of the particle as it travels from Bob to Alice. We first describe quantum operations[3] and then tackle the problem of quantifying an eavesdropper’s gain by using the Shannon mutual information.[4]

A quantum operation $\mathcal{E}$ acting on states in Hilbert space $\mathcal{H}$ is described by a set of operators $\{E_1, \ldots, E_n\}$ subject to the requirement that $\sum_i E_i^\dagger E_i = I$, where $I$ is the identity operator acting on $\mathcal{H}$. We say that the quantum operation $\mathcal{E}$ maps the initial state $\rho$ to the final state $\mathcal{E}(\rho) = \sum_i E_i \rho E_i^\dagger$. A quantum operation is a convex linear map on the space of mixed states, which is to say that if $\rho = p\rho_1 + (1 - p)\rho_2$ with $0 \le p \le 1$, then $\mathcal{E}(\rho) = p\,\mathcal{E}(\rho_1) + (1 - p)\,\mathcal{E}(\rho_2)$. A special class of quantum operations are the unital quantum operations that map the chaotic state, which is $\frac{1}{d} I$ where $d = \dim(\mathcal{H})$, to itself.

We quantify the amount an eavesdropper learns by using the Shannon mutual information between two random variables: the random variable B which describes the possible values of the message and their probabilities, and the random variable C which describes the possible strings of bit-announcements and their probabilities.
These strings result from the fact that there will be N shots, and an announcement will be made on each shot. On some of the shots only the result of the measurement will be announced, and this result does not depend on the message in any way. Therefore, only the bit-announcements will be of any concern to us in quantifying what the eavesdropper learns.

The possible messages are b = 0 and b = 1 with one-half prior probability each. On each shot there are four possible bit-announcements, namely (σ1, 0), (σ1, 1), (σ3, 0), and (σ3, 1), and when N shots are made, k of which result in bit-announcements (where 0 ≤ k ≤ N), there are $4^k$ possible bit-announcement strings. Because of the probabilistic nature of the protocol, the number of bit-announcements is not fixed. The probability $p_k$ of making k bit-announcements is found using the binomial distribution

$$p_k = \binom{N}{k}\, p_a^{\,k}\, (1 - p_a)^{N-k}.$$

We use the symbol c to denote a bit-announcement string, and we use the symbol $C^{(k)}$ to describe the ensemble of all possible bit-announcement strings of length k. Given that there are k bit-announcements, the Shannon mutual information $I(C^{(k)} : B)$ is calculated using

$$I(C^{(k)} : B) = -\sum_{c^{(k)}} \Pr(c) \log \Pr(c) + \frac{1}{2} \sum_{b} \sum_{c^{(k)}} \Pr(c \mid b) \log \Pr(c \mid b), \tag{1}$$

where the sum over $c^{(k)}$ indicates that this sum is taken over all $4^k$ bit-announcement strings of length k. This can be used to determine the expected mutual information when taking the weighted sum over the various possible lengths of bit-announcement strings,

$$I(C : B) = \sum_{k=0}^{N} p_k\, I(C^{(k)} : B). \tag{2}$$

This can be calculated once the probabilities Pr(c|b) are known for every c and both values of b. The remainder of this section is devoted to determining these probabilities, which will change depending upon which quantum operation is applied.

For a given value of the message, the probabilities of the four bit-announcements depend upon the probability of Alice getting the m = +1 measurement result.
That is,

$$\Pr(\sigma_i, c = b \mid b) = \Pr(m = +1 \mid \sigma_i)\Pr(\sigma_i) = \Pr(m = +1 \mid \sigma_i)/2,$$
$$\Pr(\sigma_i, c \ne b \mid b) = \Pr(m = -1 \mid \sigma_i)\Pr(\sigma_i) = \Pr(m = -1 \mid \sigma_i)/2,$$

where i = 1, 3 and b = 0, 1. The notation Pr(m = +1|σi), for example, is used to mean that this is the probability that the result m = +1 will be found when a measurement that corresponds to σi is made on the particle, and Pr(σi) is the probability that the measurement corresponding to σi will be performed.

Table 1. The probabilities for the four results relevant to the bit-announcements, given that an eavesdropper acts with a quantum operation $\mathcal{E}_{\lambda v}$ that maps the chaotic state to ρ(λv).

  $\Pr(m = +1 \mid \sigma_1, \mathcal{E}_{\lambda v}) = \tfrac{1}{2}(1 + \lambda v_1)$    $\Pr(m = -1 \mid \sigma_1, \mathcal{E}_{\lambda v}) = \tfrac{1}{2}(1 - \lambda v_1)$
  $\Pr(m = +1 \mid \sigma_3, \mathcal{E}_{\lambda v}) = \tfrac{1}{2}(1 + \lambda v_3)$    $\Pr(m = -1 \mid \sigma_3, \mathcal{E}_{\lambda v}) = \tfrac{1}{2}(1 - \lambda v_3)$

Of course, the machinery of quantum mechanics requires us to specify the state of the particle in order to calculate a probability of a certain measurement result. From an eavesdropper’s point of view, if she does nothing to the particle then there are four possible states with equal probability. So

$$\Pr(m = \pm 1 \mid \sigma_i) = \tfrac{1}{4}\Big( \mathrm{Tr}\big(\tfrac{1}{2}(I \pm \sigma_i)|0\rangle\langle 0|\big) + \mathrm{Tr}\big(\tfrac{1}{2}(I \pm \sigma_i)|1\rangle\langle 1|\big) + \mathrm{Tr}\big(\tfrac{1}{2}(I \pm \sigma_i)|+\rangle\langle +|\big) + \mathrm{Tr}\big(\tfrac{1}{2}(I \pm \sigma_i)|-\rangle\langle -|\big) \Big),$$

where i = 1, 3. By the linearity of the trace, this is equivalent to $\Pr(m = \pm 1 \mid \sigma_i) = \mathrm{Tr}\big(\tfrac{1}{2}(I \pm \sigma_i)\,\tfrac{1}{2} I\big)$. In this way, it is quite reasonable to say that the state of the particle, to the eavesdropper’s best description, is the chaotic state $\rho = \tfrac{1}{2} I$.

When an eavesdropper applies a quantum operation $\mathcal{E}$ to change the state of the particle, it will in general change each of the four possible initial states differently. By the linearity of the trace and the convex linearity of the quantum operation $\mathcal{E}$, the probability of m = ±1 can be calculated for the state $\rho' = \mathcal{E}(\tfrac{1}{2} I)$. That is, $\Pr(m = \pm 1 \mid \sigma_i) = \mathrm{Tr}\big(\tfrac{1}{2}(I \pm \sigma_i)\,\mathcal{E}(\tfrac{1}{2} I)\big)$ for i = 1, 3.

Every (generally mixed) state of a two-level quantum system can be described by $\rho(\lambda v) = \tfrac{1}{2}(I + \lambda[v_1\sigma_1 + v_2\sigma_2 + v_3\sigma_3])$ where $v_1^2 + v_2^2 + v_3^2 = 1$, $\sigma_2 = i\sigma_1\sigma_3$, and $0 \le \lambda \le 1$.
This “Bloch sphere” description of the two-level state can be pictured as a vector λv in a real three dimensional space. When $E(\tfrac{1}{2} I) = \tfrac{1}{2}\big(I + \lambda [v_1 \sigma_1 + v_2 \sigma_2 + v_3 \sigma_3]\big)$, the probabilities for the four possible announcements are shown in Table 1.

If an eavesdropper applies the same quantum operation each time a particle is sent from Bob to Alice, the probability of each bit-announcement string is found by taking the product of the probabilities of the individual announcements, with each probability appearing in the product as many times as the corresponding announcement appears in the string. We can now calculate the mutual information for any quantum operation by calculating the probabilities for each bit-announcement string and then using Equations (1) and (2).

To summarize this section, we have described how to calculate the mutual information which quantifies what an eavesdropper expects to learn about the message given a particular quantum operation used as an eavesdropping strategy. In the next section, we determine the amount of “noise” that such eavesdropping strategies cause.

4. Disturbance caused by quantum operations

In the previous section we focused on the bit-announcements and ignored the result-announcements. In this section we will do the opposite. The bit-announcements are used by both Bob and any eavesdroppers to determine the message, but the result-announcements are of no use to the eavesdropper and serve Bob’s purpose to check the channel for “noise”. There are sixteen different event statistics that are kept by Bob relating to the measurement-announcements: four possible initial states, two possible measurement types, and two possible measurement results for each measurement.

Table 2. The four events that correspond to mismatches.

Bob prepares the state    Alice measures    Measurement result
|+〉                       σ1                m = −1
|−〉                       σ1                m = +1
|0〉                       σ3                m = −1
|1〉                       σ3                m = +1
Out of these sixteen, there are four events that would be the most surprising to Bob, and would each indicate that the state of the particle, when Alice measured it, was not the same as the one he had prepared. These four types of events will be referred to as mismatches and are shown in Table 2. The probability of a mismatch, on a particular shot, is

$$\Pr(\text{mismatch}) = \Pr(|+\rangle, \sigma_1, -1) + \Pr(|-\rangle, \sigma_1, +1) + \Pr(|0\rangle, \sigma_3, -1) + \Pr(|1\rangle, \sigma_3, +1)$$
$$= \tfrac{1}{4}\big[ \Pr(\sigma_1, -1\,|\,|+\rangle) + \Pr(\sigma_1, +1\,|\,|-\rangle) + \Pr(\sigma_3, -1\,|\,|0\rangle) + \Pr(\sigma_3, +1\,|\,|1\rangle) \big]$$
$$= \tfrac{1}{8}\big[ \Pr(-1\,|\,|+\rangle, \sigma_1) + \Pr(+1\,|\,|-\rangle, \sigma_1) + \Pr(-1\,|\,|0\rangle, \sigma_3) + \Pr(+1\,|\,|1\rangle, \sigma_3) \big] . \qquad (3)$$

Of course, when Bob analyzes the data, a mismatch can only occur on a particular shot if the bases are matched up. A factor of 1/2 disappears when we account for this to give the probability that there will be a mismatch error on a shot when the bases are matched.

For a fixed quantum operation E employed by an eavesdropper, these probabilities are easily calculated. Note that these probabilities depend upon the final states E(|+〉〈+|), E(|−〉〈−|), E(|0〉〈0|), and E(|1〉〈1|), and not just on the evolution of the chaotic state. In general, there are many different quantum operations that have the same effect on the chaotic state. (The exception to this is when the chaotic state is mapped to a pure state, in which case it is easily seen by the convex linearity of quantum operations that every initial state must be mapped to that pure state.)

5. An Example

Let us now examine a family of eavesdropping strategies that utilize the quantum operation $E_x$, where x is a parameter which falls in the range 0 ≤ x ≤ 1. When x = 0 the strategy corresponds to the eavesdropper doing nothing (and as we shall see, learning nothing), and when x = 1 it corresponds to a quantum operation eavesdropping strategy with the greatest mutual information.
The quantum operation $E_x$ can be achieved by coupling the initial state ρ (from Bob) to an auxiliary quantum system in the state |φ〉, letting the coupled system evolve unitarily (described by some unitary operator U that acts on the combined system) and then tracing over the auxiliary system. The unitary operator acts as follows:

$$U\, |0\rangle \otimes |\phi\rangle = |0\rangle \otimes |F\rangle \equiv |\Gamma_0\rangle ,$$
$$U\, |1\rangle \otimes |\phi\rangle = \sqrt{x}\, |0\rangle \otimes |G\rangle + \sqrt{1 - x}\, |1\rangle \otimes |F\rangle \equiv |\Gamma_1\rangle ,$$

where 〈F|G〉 = 0 and 〈F|F〉 = 〈G|G〉 = 1. The fact that 〈0|1〉〈φ|φ〉 = 〈Γ0|Γ1〉 is sufficient to show that such a unitary operator U exists. The action of the quantum operation $E_x$ on any initial pure state |η〉 is found by tracing over the auxiliary subsystem after performing the unitary transformation U:

$$E_x\big(|\eta\rangle\langle\eta|\big) = \mathrm{Tr}_{\rm aux}\Big[ U \big(|\eta\rangle \otimes |\phi\rangle\big)\big(\langle\eta| \otimes \langle\phi|\big) U^{\dagger} \Big] .$$

By the convex linearity of quantum operations we also know the action of $E_x$ on any mixed state. From the preceding considerations, it is straightforward to show that $E_x$ acts on the relevant initial states in the following way:

$$E_x\big(|0\rangle\langle 0|\big) = |0\rangle\langle 0| ,$$
$$E_x\big(|1\rangle\langle 1|\big) = x\, |0\rangle\langle 0| + (1 - x)\, |1\rangle\langle 1| ,$$
$$E_x\big(|+\rangle\langle +|\big) = \tfrac{1}{2}\Big[ (1 + x)\, |0\rangle\langle 0| + (1 - x)\, |1\rangle\langle 1| + \sqrt{1 - x}\,\big( |0\rangle\langle 1| + |1\rangle\langle 0| \big) \Big] ,$$
$$E_x\big(|-\rangle\langle -|\big) = \tfrac{1}{2}\Big[ (1 + x)\, |0\rangle\langle 0| + (1 - x)\, |1\rangle\langle 1| - \sqrt{1 - x}\,\big( |0\rangle\langle 1| + |1\rangle\langle 0| \big) \Big] ,$$

from which it is easy to see that

$$E_x\big(\tfrac{1}{2} I\big) = \tfrac{1}{2}\Big[ (1 + x)\, |0\rangle\langle 0| + (1 - x)\, |1\rangle\langle 1| \Big] = \tfrac{1}{2}(I + x \sigma_3) .$$

Table 3. The probabilities, from the eavesdropper’s point of view, of the four possible bit-announcements for a given value of b when the quantum operation $E_x$, introduced in Section 5, is applied.

                      b = 0         b = 1
Pr(σ1, c = 0 | b)     1/4           1/4
Pr(σ1, c = 1 | b)     1/4           1/4
Pr(σ3, c = 0 | b)     (1 + x)/4     (1 − x)/4
Pr(σ3, c = 1 | b)     (1 − x)/4     (1 + x)/4

The probability of a mismatch, calculated using Equation (3), for this quantum operation is $\tfrac{1}{8}\big(1 + x - \sqrt{1 - x}\big)$. In order to calculate the mutual information for this quantum operation, we must be able to determine the values of $\Pr(c\,|\,b, E_x)$, that is, the probability of every string of bit-announcements c given each value of b.
If a particular string of k bit-announcements c(k, d1, d2, d3, d4) consists of (σ1, c = 0) announced d1 times, (σ1, c = 1) announced d2 times, (σ3, c = 0) announced d3 times, and (σ3, c = 1) announced d4 times, in any order, then the probability for this announcement to occur is

$$\Pr\big(c(k, d_1, d_2, d_3, d_4)\,|\,b = 0, E_x\big) = \left(\tfrac{1}{4}\right)^{k} (1 + x)^{d_3} (1 - x)^{d_4} \equiv p_{x,k,d_3,d_4} ,$$
$$\Pr\big(c(k, d_1, d_2, d_3, d_4)\,|\,b = 1, E_x\big) = \left(\tfrac{1}{4}\right)^{k} (1 - x)^{d_3} (1 + x)^{d_4} \equiv q_{x,k,d_3,d_4} .$$

This calculation utilizes the probabilities for the single announcements found in Table 3. There are k!/(d3! d4! (k − d3 − d4)!) different strings of k bit-announcements that share this same probability (for each value of b). Using these results, we can now calculate the mutual information:

$$I(C^{(k)} : B) = -\sum_{d_3 + d_4 \le k} \frac{k!}{d_3!\, d_4!\, (k - d_3 - d_4)!} \left[ \frac{p_{x,k,d_3,d_4} + q_{x,k,d_3,d_4}}{2} \log \frac{p_{x,k,d_3,d_4} + q_{x,k,d_3,d_4}}{2} - \frac{1}{2}\, p_{x,k,d_3,d_4} \log p_{x,k,d_3,d_4} - \frac{1}{2}\, q_{x,k,d_3,d_4} \log q_{x,k,d_3,d_4} \right] .$$

Figure 1. Mutual information I(C : B) as a function of x, describing the amount an eavesdropper learns about the message bit given that she uses the quantum operation $E_x$ on each shot when Bob sends N = 119 particles and Alice has probability pa = 0.01 of making a bit-announcement.

Figure 2. Probability of a mismatch as a function of x when an eavesdropper uses the quantum operation $E_x$ on each shot.

If we choose some exemplary values of pa and N, this will give us some numerical results for the mutual information. Say that Alice sets pa = 0.05 and Bob agrees with Alice to send N = 119 particles in order to communicate the value of a single bit. This choice of pa and N gives them slightly more than a 95% chance of matching their bases on a shot when a bit-announcement is made. The mutual information I(C : B), when an eavesdropper applies the quantum operation $E_x$ on every shot, is plotted for all values of 0 ≤ x ≤ 1 in Figure 1. Compare this with the disturbance caused, quantified by the probability of a mismatch, by applying the same quantum operation, which is shown in Figure 2.
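The two quantities compared here can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the authors' code: all function names are hypothetical, logarithms are taken base 2, the two σ1 announcements are lumped into a single outcome of probability 1/2 (they carry no information about b, so the mutual information is unchanged and the multinomial count becomes exact), the Kraus operators are the ones obtained by tracing the unitary above over the auxiliary basis {|F〉, |G〉}, and the mismatch prefactor 1/8 assumes four equally likely prepared states and two equally likely measurements.

```python
from math import comb, factorial, log2, sqrt

import numpy as np


def xlog2x(p):
    """Return p*log2(p), with the convention 0*log(0) = 0."""
    return p * log2(p) if p > 0.0 else 0.0


def info_given_k(x, k):
    """I(C^(k):B) in bits for k bit-announcements under E_x (Table 3)."""
    total = 0.0
    for d3 in range(k + 1):
        for d4 in range(k + 1 - d3):
            d1 = k - d3 - d4  # sigma_1 announcements, lumped together (assumption)
            count = factorial(k) // (factorial(d3) * factorial(d4) * factorial(d1))
            p = 0.5**d1 * ((1 + x) / 4) ** d3 * ((1 - x) / 4) ** d4  # b = 0
            q = 0.5**d1 * ((1 - x) / 4) ** d3 * ((1 + x) / 4) ** d4  # b = 1
            total -= count * (xlog2x((p + q) / 2) - 0.5 * xlog2x(p) - 0.5 * xlog2x(q))
    return total


def expected_info(x, n_shots, p_a):
    """Equation (2): binomially weighted sum over the number of bit-announcements."""
    return sum(
        comb(n_shots, k) * p_a**k * (1 - p_a) ** (n_shots - k) * info_given_k(x, k)
        for k in range(n_shots + 1)
    )


def apply_ex(rho, x):
    """Apply E_x through its two (real) Kraus operators."""
    k_f = np.array([[1.0, 0.0], [0.0, sqrt(1.0 - x)]])  # <F|U|phi>
    k_g = np.array([[0.0, sqrt(x)], [0.0, 0.0]])        # <G|U|phi>
    return k_f @ rho @ k_f.T + k_g @ rho @ k_g.T


def mismatch_probability(x):
    """Equation (3) for E_x, summing the four events of Table 2."""
    ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    plus, minus = (ket0 + ket1) / sqrt(2), (ket0 - ket1) / sqrt(2)
    # (prepared state, outcome state that signals a mismatch)
    events = [(plus, minus), (minus, plus), (ket0, ket1), (ket1, ket0)]
    total = 0.0
    for prepared, wrong in events:
        rho_out = apply_ex(np.outer(prepared, prepared), x)
        total += wrong @ rho_out @ wrong
    return total / 8.0


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0):
        print(x, expected_info(x, 119, 0.05), mismatch_probability(x))
```

Scanning x over [0, 1] with these two functions gives curves that can be compared against Figures 1 and 2; at x = 0 both quantities vanish, in line with the passive-eavesdropper limit.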
As a final note, this example demonstrates that a passive eavesdropper learns nothing about the message. That is, if we describe a passive eavesdropper as someone who is only listening to the announcements that Alice makes but does not interfere with the particles in any way,[1] that person’s eavesdropping strategy would correspond to $E_x$ when x = 0. It is easily seen from the Figures that this strategy causes the eavesdropper to learn nothing and also causes no disturbance.

6. Discussion

This protocol represents something new in the field of cryptography. It provides the message receiver with a way to check if an eavesdropper is attempting to access the message. The analysis shown here demonstrates both the amount learned by an eavesdropper and the disturbance caused, measured in the number of mismatches, when an eavesdropper employs a particular quantum operation. As shown in the example above, this protocol is not secure against active attacks in which an eavesdropper interacts with the particles as they travel from the message receiver to the message sender. However, this example also demonstrates that such attacks cause a disturbance in the system, which can be quantified by the number of mismatches found by the message receiver. A more general analysis of a message receiver’s bound on the amount of information an eavesdropper could have learned during a particular transmission is taken up elsewhere.[2]

The protocol discussed here has similarities to other quantum cryptography protocols that have been introduced and it is worthwhile to examine these similarities, as well as what makes this current protocol distinct. The three types of quantum cryptographic protocols that will be discussed here are quantum key distribution (QKD) protocols, quantum secure direct communication (QSDC) protocols, and quantum seal protocols.
The main distinction between this new protocol and the QKD protocols is that the goal of QKD is to develop a shared private key between two parties while here it is important that a particular message gets transmitted. Said in a different way, each party in a QKD setting starts with nothing and ends up with a random string of bits, but neither one of them cares which string of bits results from the process, so long as they share the same one. Here, one party starts with a particular string of bits — the message — and when the process ends the other party will (hopefully) have the message as well. (There is a tunably small probability that the process will be unsuccessful.[2]) Of course, in QKD the random string of bits can later be used to encrypt a message (which can be sent on a classical channel), but the QKD process itself transfers no information. It is worthwhile mentioning that this current protocol is very similar, in some ways, to a specific QKD protocol, called BB84.[5] The two protocols use the same four initial possible states and the same two measurements. The difference between the two is the classical messages that are sent and how these messages are used. These two protocols are so similar that if two users have a system that implements BB84 then they should be able to implement this new protocol with only minor modifications to the system. The second type of quantum cryptographic protocol that we will discuss is the so-called “quantum secure direct communication” (QSDC).[6] The greatest similarity between the QSDC protocols and the one introduced here is that they both use quantum states of some transferred system to transmit a message from one party to another, rather than generating a key. Moreover, this is done without the use of any pre-shared key. 
However, the goal of QSDC is to transmit the messages securely (that is, to prevent any eavesdropper from understanding the message), while the goal of the protocol introduced here is to detect the activity of any active eavesdroppers.

The final comparison we will make is with those quantum cryptographic protocols that have been called “quantum seal” protocols.[7] These quantum seal protocols are distinct from the current one. The goal of the quantum seal protocols is for a message sender to prepare a quantum mechanical system in some initial state so that someone else can determine the message by making a measurement on that quantum mechanical system. Moreover, the message preparer also creates correlations between the quantum mechanical system and a second quantum mechanical system so that a measurement can be made, by the message preparer, on the second system to determine if anyone has read the message. The major distinction between these quantum seal protocols and the protocol introduced here is that the protocol introduced here has a preferred message receiver (the person who sends the particles to the message sender) who can check if anyone else has tried to read the message, while in these earlier quantum seal protocols[7] all receivers are on equal footing and it is the message sender who can check if someone has accessed the message.

We conclude this discussion by emphasizing that the protocol introduced here is neither a QKD protocol, nor a QSDC protocol, nor a quantum seal protocol. It has distinct goals and the various security (or no-security) proofs that have been applied to these earlier protocols do not apply here.

Acknowledgments

This work was funded in part by the Disruptive Technology Office (DTO) and by the Army Research Office (ARO). This research was performed while Paul Lopata held a National Research Council Research Associateship Award at the Army Research Laboratory.
References

[1] Brassard G 1988 Modern Cryptology (Springer-Verlag New York, Inc.)
[2] Lopata P and Bahder T, manuscript in preparation
[3] Nielsen M and Chuang I 2000 Quantum Computation and Quantum Information (Cambridge University Press)
[4] Shannon C 1993 Claude Elwood Shannon Collected Papers (IEEE Press) p 84
[5] Bennett C and Brassard G 1984 Proceedings of IEEE International Conference on Computers, Systems and Signal Processing (IEEE Press) pp 175–179
[6] Boström K and Felbinger T 2002 Physical Review Letters 89 187902; Wójcik A 2003 Physical Review Letters 90 157901; Deng F-G, Long G L, and Liu X-S 2003 Physical Review A 68 042317; Deng F-G and Long G L 2004 Physical Review A 69 052319; Lucamarini M and Mancini S Physical Review Letters 94 140501; and others.
[7] Bechmann-Pasquinucci H 2003 Quantum Seals Preprint quant-ph/0303173; Bechmann-Pasquinucci H, D’Ariano G M, and Macchiavello C 2005 Impossibility of Perfect Sealing of Classical Information Preprint quant-ph/0501073; Singh S K and Srikanth R 2005 Physica Scripta 71 pp 433–5; He G-P 2005 Physical Review A 71 054304; Chau H F 2006 Physics Letters A 354 pp 31–4

ABSTRACT A quantum protocol is described which enables a user to send sealed messages and that allows for the detection of active eavesdroppers. We examine a class of eavesdropping strategies, those that make use of quantum operations, and we determine the information gain versus disturbance caused by these strategies. We demonstrate this tradeoff with an example and we compare this protocol to quantum key distribution, quantum direct communication, and quantum seal protocols.
<|endoftext|><|startoftext|> Shocks in nonlocal media

Neda Ghofraniha,1 Claudio Conti,2,3 Giancarlo Ruocco,3,4 Stefano Trillo5∗
1 Research Center SMC INFM-CNR, Università di Roma “La Sapienza”, P. A.
Moro 2, 00185, Roma, Italy
2 Centro Studi e Ricerche “Enrico Fermi”, Via Panisperna 89/A, 00184 Rome, Italy
3 Research center SOFT INFM-CNR Università di Roma “La Sapienza”, P. A. Moro 2, 00185, Roma, Italy
4 Dipartimento di Fisica, Università di Roma “La Sapienza”, P. A. Moro 2, 00185, Roma, Italy
5 Dipartimento di Ingegneria, Università di Ferrara, Via Saragat 1, 44100 Ferrara, Italy
(Dated: October 1, 2018)

We investigate the formation of collisionless shocks along the spatial profile of a gaussian laser beam propagating in nonlocal nonlinear media. For defocusing nonlinearity the shock survives the smoothing effect of the nonlocal response, though its dynamics is qualitatively affected by the latter, whereas for focusing nonlinearity it dominates over filamentation. The patterns observed in a thermal defocusing medium are interpreted in the framework of our theory.

Shock waves are a general phenomenon thoroughly investigated in disparate areas of physics (fluids and water waves, plasma physics, gas dynamics, sound propagation, physics of explosions, etc.), entailing the propagation of discontinuous solutions typical of hyperbolic PDE models [1, 2]. They are also expected in (non-hyperbolic) universal models for dispersive nonlinear media, such as the Korteweg-de Vries (KdV) and nonlinear Schrödinger (NLS, or analogous Gross-Pitaevskii) equations, since hydrodynamical approximations of such models hold true in certain regimes (typically, in the weakly dispersive or strongly nonlinear case) [3, 4, 5]. However, in the latter cases, no true discontinuous solutions are permitted. The general scenario, first investigated by Gurevich and Pitaevskii [3], is that dispersion regularizes the shock, determining the onset of oscillations that appear near wave-breaking points and expand afterwards.
This so-called collisionless shock has been observed for example in ion-acoustic waves [6], or wave-breaking of optical pulses in a normally dispersive fiber [7], and recently in a Bose-Einstein condensate with positive scattering length [8].

In this Letter we investigate how nonlocality of the nonlinear response affects the formation of a collisionless shock in a system ruled by a NLS model. In fact nonlocality plays a key role in many physical systems due to transport phenomena and finite range interactions (e.g. as in Bose-Einstein condensation), and can be naively thought to smooth and eventually wipe out steep fronts characteristic of shocks. More specifically, we place this problem in the context of nonlinear optics where nonlocality arises quite naturally in different media [9, 10, 11, 12], studying the spatial propagation of a fundamental (gaussian TEM00) laser mode subject to diffraction and nonlocal focusing/defocusing action (Kerr effect). In a defocusing and ideal (local and lossless) medium, high intensity portions of the beam diffract more rapidly than the tails, leading, at sufficiently high powers, to overtaking and oscillatory wave-breaking similar (in 1D) to what is observed in the temporal case [18]. We find that, while the shock is not hampered by the presence of (even strong) nonlocality, the mechanism of its formation as well as post-shock patterns are qualitatively affected by the nonlocality. Experimental results obtained with a thermal defocusing nonlinearity are consistent with our theory and shed new light on the interpretation of the thermal lensing phenomenon. Importantly, our theory also establishes that nonlocality allows the shock to form in the focusing regime as well where, contrary to the local case, it can prevail over filamentation or modulational instability (MI).
Theory. We start from the paraxial wave equation obeyed by the envelope A of a monochromatic field $E = (2/(c\,\varepsilon_0 n))^{1/2} A \exp(ikZ - i\omega T)$ ($|A|^2$ is the intensity)

$$ i\frac{\partial A}{\partial Z} + \frac{1}{2k}\left( \frac{\partial^2 A}{\partial X^2} + \frac{\partial^2 A}{\partial Y^2} \right) + k_0\, \Delta n\, A = -i\frac{\alpha_0}{2} A , \qquad (1)$$

where $k = k_0 n = (\omega/c)\, n$ is the wave-number, and $\alpha_0$ the intensity loss rate. A sufficiently general nonlocal model can be obtained by coupling Eq. (1) to an equation that rules the refractive index change ∆n of nonlinear origin. Introducing the scaled coordinates x, y, z = X/w0, Y/w0, Z/L, and complex variables $\psi = A/\sqrt{I_0}$ and $\theta = k_0 L_{nl} \Delta n$, where $L_{nl} = (k_0 |n_2| I_0)^{-1}$ is the nonlinear length scale associated with peak intensity $I_0$ and a local Kerr coefficient $n_2$ ($\Delta n = n_2 |A|^2$), $L_d = k w_0^2$ is the characteristic diffraction length associated with the input spot-size $w_0$, and $L \equiv \sqrt{L_{nl} L_d}$, such a model can be conveniently written as follows [12]

$$ i\varepsilon \frac{\partial \psi}{\partial z} + \frac{\varepsilon^2}{2} \nabla_\perp^2 \psi + \chi \theta \psi = -i\frac{\alpha}{2} \varepsilon \psi , \qquad (2)$$
$$ -\sigma^2 \nabla_\perp^2 \theta + \theta = |\psi|^2 , \qquad (3)$$

where $\alpha = \alpha_0 L$, $\nabla_\perp^2 = \partial_x^2 + \partial_y^2$, $\chi = n_2/|n_2| = \pm 1$ is the sign of the nonlinearity, and $\sigma^2$ is a free parameter that measures the degree of nonlocality. The peculiar dimensionless form of Eqs. (2-3), where $\varepsilon \equiv L_{nl}/L = \sqrt{L_{nl}/L_d}$ is a small quantity, highlights the fact that we will deal with the weakly diffracting (or strongly nonlinear) regime, such that the local (σ = 0) and lossless (α = 0) limit yields a semiclassical Schrödinger equation with cubic potential (ε and z replace Planck constant and time, respectively).

http://arxiv.org/abs/0704.0610v1

We study Eqs. (2-3) subject to the axi-symmetric gaussian input $\psi_0(r) = \exp(-r^2)$, $r \equiv \sqrt{x^2 + y^2}$, describing a fundamental laser mode at its waist. For ε ≪ 1, its evolution can be studied in the framework of the WKB transformation $\psi(r, z) = \sqrt{\rho(r, z)} \exp[i\phi(r, z)/\varepsilon]$ [4]. Substituting in Eqs. (2-3) and retaining only leading orders in ε, we obtain

$$ \rho_z + \frac{D - 1}{r} \rho u + (\rho u)_r = -\alpha \rho ; \qquad u_z + u u_r - \chi \theta_r = 0 ; \qquad -\sigma^2 \left( \theta_{rr} + \frac{D - 1}{r} \theta_r \right) + \theta = \rho , \qquad (4)$$

where $u \equiv \phi_r$ is the phase chirp, and D = 2 is the transverse dimensionality. The 1D case described by Eqs.
(4) with D = 1 and r → x ($\partial_y = 0$) illustrates the basic physics with least complexity. In the defocusing case (χ = −1) for an ideal medium (σ = α = 0, θ = ρ), Eqs. (4) are a well known hyperbolic system of conservation laws (Euler and continuity equations) with real celerities (or eigenspeeds, i.e. velocities of Riemann invariants) $v_\pm = u \pm \sqrt{-\chi \rho}$, which rules gas dynamics (u and ρ are velocity and mass density of a gas with pressure ∝ $\rho^2$). A gaussian input is known to develop two symmetric shocks at finite z [4]. Importantly the diffraction, which is initially of order $\varepsilon^2$, starts to play a major role in the proximity of the overtaking point, and regularizes the wave-breaking through the appearance of fast (wavelength ∼ ε) oscillations which connect the high and low sides of the front and expand outwards (far from the beam center) [3]. Such oscillations, characteristic of a collisionless shock, appear simultaneously in intensity and phase chirp (u) as clearly shown in Fig. 1(a,c).

In the nonlocal case, the index change θ(x) can be wider than the gaussian mode (for large σ) and the shock dynamics is essentially driven by the chirp u with ρ adiabatically following. This can be seen by means of the following approximate solution of Eqs. (4): considering that the equation for ρ is of lesser order [O(ε)] with respect to those for θ and u [O(1)], we assume $\rho = \exp(-2x^2)$ unchanged in z and solve exactly the third of Eqs. (4) for θ(x) (though derived easily, its full expression is quite cumbersome). Then, applying the theory of characteristics [1], the second of Eqs. (4) is reduced to the following ODEs, where dot stands for d/dz,

$$ \dot{x} = u ; \qquad \dot{u} = \chi \theta_x , \qquad (5)$$

equivalent to the motion of a unit mass in the potential V(x) = −χθ with conserved energy $E = u^2/2 + V(x)$. The solution of Eqs.
(5) with initial condition x(0) = s, u(0) = 0 yields x(s, z), u(s, z) in parametric form, from which overtaking is found whenever u(x, z) (obtained by eliminating s) becomes a multivalued function of x at finite z = $z_s$. The shock point corresponding to |du/dx| → ∞ is found from the solution u(x, z) displayed in Fig. 2(a) [2(b)], at positions $x = \pm x_s \neq 0$ (defocusing case) or $x_s = 0$ (focusing case). The shock distance $z_s$ increases with σ in both cases, as shown in Fig. 2(c).

FIG. 1: (Color online) 1D spatial profiles of phase chirp u(x) (a-b) and amplitude $|\psi(x)| = \sqrt{\rho(x)}$ (c-d), as obtained from Eqs. (2-3) with $\varepsilon = 10^{-3}$, α = 0, χ = −1 (defocusing), and $\psi_0 = \exp(-x^2)$, for different z as indicated: (a-c) local case, $\sigma^2 = 0$; (b-d) nonlocal case, $\sigma^2 = 5$.

FIG. 2: (Color online) (a) u(x) for different z and χ = 1 (focusing), σ = 1; (b) as in (a) for χ = −1 (defocusing); (c) shock distance $z_s$ (χ = −1 bold solid, χ = 1 thin solid) and shock position $x_s$ in the defocusing case (dashed line).

We have tested these predictions by integrating numerically Eqs. (2-3). Simulations with χ = −1 [see Fig. 1(b,d)] show indeed steepening and post-shock oscillations in the spatial chirp u, which are accompanied by a steep front in ρ moving outward. The shock location in x and z is in good agreement with the results of our approximate analysis summarized in Fig. 2.

Numerical simulations of Eqs. (2-3) also validate the focusing scenario. The field evolution displayed in Fig. 3(a) exhibits shock formation at the focus point ($x_s = 0$, $z_s \simeq 8$, for σ = 5) driven by the phase, whose chirp is shown in Fig. 3(b). This is remarkable because, in the local limit σ = 0, the celerities become imaginary (the equivalent gas would have pressure decreasing with increasing density ρ), and no shock could be claimed to exist. In this limit, the reduced problem (4) is elliptic and the initial value problem is ill-posed [13], an ultimate consequence of the onset of MI: modes with transverse (normalized) wavenumber $q < \Delta q$ grow exponentially with z, with both gain g and bandwidth ∆q scaling as 1/ε. However, the nonlocal response tends to frustrate MI (see also Refs. [9, 12]), as shown by standard linear stability analysis which yields $g = \sqrt{d(2\tilde{\chi} - d)/\varepsilon^2}$ (we set $d \equiv \varepsilon^2 q^2/2$ and $\tilde{\chi} \equiv \chi/(1 + \sigma^2 q^2)$), in turn implying a strong reduction of both gain and bandwidth for large σ. In order to emphasize the difference between the local and nonlocal regime, we contrast Fig. 3(a) with the analogous evolution [see Fig. 3(c)] obtained in the quasi-local limit ($\sigma^2 = 10^{-5}$), which appears to be clearly dominated by filamentation.

FIG. 3: (Color online) Level plot of the intensity in the focusing case (χ = 1, ε = 0.01): (a) nonlocal case ($\sigma^2 = 25$); (b) chirp profile for various z for (a); (c) quasi-local case ($\sigma^2 = 10^{-5}$).

Thermal nonlinearity. The physics of the defocusing case can be experimentally tested by exploiting thermal nonlinearities of strongly absorptive bulk samples, that we show below to fit our model. In this case, the system relaxes to a steady-state refractive index change ∆n = (dn/dT)∆T, where dn/dT is the thermal coefficient, and ∆T the local temperature change due to optical absorption. It is well known that this so-called thermal lens distorts a laser beam propagating in the medium [14, 15, 16]. However, only perturbative approaches to the problem have been proposed (ray optics or Fresnel diffraction theory is applied after the lens profile is worked out from a gaussian ansatz [14]), while the role of shock phenomena was completely overlooked.

FIG. 4: 2D evolution according to Eqs. (2-3) with $\sigma^2 = 1$, α = 1: (a) radial phase chirp at different z, as indicated, showing steepening and shock formation for $\varepsilon = 10^{-2}$; (b) corresponding intensity profile $|\psi|^2$ (maximum scaled to unity) at z = 4.9; (c) transverse intensity profile vs. x (at y = 0) at z = 1/(4ε) and different values of ε ($\alpha_0 = 62$ cm$^{-1}$, σ = 0.3).
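Returning to the characteristic analysis of Eqs. (5), the shock distance can be estimated with a short numerical sketch. This is an illustration under stated assumptions rather than the authors' procedure: the function names, grids, and step sizes are hypothetical; θ(x) is obtained from the whole-line Green's function exp(−|x|/σ)/(2σ) of the third of Eqs. (4) with ρ = exp(−2x²) held fixed; and overtaking is detected as the first z at which ∂x/∂s vanishes, by integrating the linearized (variational) equations alongside the characteristics with a fourth-order Runge-Kutta step.

```python
import numpy as np


def make_theta(sigma, x_max=40.0, n=2001):
    """Tabulate theta and theta_x for rho = exp(-2 x^2) held fixed in z.

    Solves -sigma^2 theta'' + theta = rho on the whole line by convolving rho
    with the Green's function exp(-|x|/sigma) / (2 sigma).
    """
    x = np.linspace(-x_max, x_max, n)
    xs = np.linspace(-4.0, 4.0, 801)  # rho is negligible outside this window
    dxs = xs[1] - xs[0]
    kernel = np.exp(-np.abs(x[:, None] - xs[None, :]) / sigma) / (2.0 * sigma)
    theta = (kernel @ np.exp(-2.0 * xs**2)) * dxs
    return x, theta, np.gradient(theta, x)


def shock_distance(sigma, chi, dz=0.01, z_max=40.0):
    """First z at which dx/ds = 0 along the characteristics of Eqs. (5).

    Returns None if no overtaking is found before z_max.
    """
    grid, theta, theta_x = make_theta(sigma)

    def rhs(state):
        x, u, x_s, u_s = state  # x_s = dx/ds, u_s = du/ds
        th = np.interp(x, grid, theta)
        th_x = np.interp(x, grid, theta_x)
        th_xx = (th - np.exp(-2.0 * x**2)) / sigma**2  # from the theta equation
        return np.array([u, chi * th_x, u_s, chi * th_xx * x_s])

    s = np.linspace(-3.0, 3.0, 301)  # launch points x(0) = s, u(0) = 0
    state = np.array([s, np.zeros_like(s), np.ones_like(s), np.zeros_like(s)])
    z = 0.0
    while z < z_max:
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dz * k1)
        k3 = rhs(state + 0.5 * dz * k2)
        k4 = rhs(state + dz * k3)
        state = state + dz * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        z += dz
        if state[2].min() <= 0.0:
            return z
    return None


if __name__ == "__main__":
    for sigma in (1.0, 5.0):
        print(sigma, shock_distance(sigma, +1), shock_distance(sigma, -1))
```

In the focusing case the characteristic launched at s = 0 sits at the bottom of a locally harmonic well, so the estimate reduces to a quarter period of that well; for σ = 5 this should land close to the value $z_s \simeq 8$ quoted above.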
We assume that the temperature field ∆T = ∆T⊥(X, Y) obeys the following 2D heat equation

$$ (\partial_X^2 + \partial_Y^2) \Delta T_\perp - C \Delta T_\perp = -\gamma |A|^2 , \qquad (6)$$

where the source term accounts for absorption proportional to intensity through the coefficient $\gamma = \alpha_0/(\rho_0 c_p D_T)$, where $\rho_0$ is the material density, $c_p$ the specific heat at constant pressure, and $D_T$ is the thermal diffusivity (see e.g. [16]). Eq. (6) has been already employed to model a refractive index of thermal origin in Ref. [10], and in Ref. [11] in the limit C = 0, which is equivalent to considering the range of nonlocality (measured by 1/C, see below) to be infinite. Starting from the 3D heat equation $\nabla^2 \Delta T = -\gamma |A|^2$, the latter regime amounts to assuming ∆T(X, Y, Z) = ∆T⊥(X, Y), which is justified when longitudinal changes in intensity $|A|^2$ are negligible, as for solitary (invariant in Z) wave-packets in the presence of low absorption [11]. Conversely, in the regime of strong absorption, we need to account for longitudinal temperature profiles that are known from solutions of the 3D heat equation to be peaked at a characteristic distance Ẑ in the middle of the sample and decay to room temperature on the facets [14]. Since highly nonlinear phenomena occur in the neighborhood of Ẑ where the index change is maximum, we can use a (longitudinal) parabolic approximation with characteristic width $L_{\rm eff}$ (∼ L) of the 3D temperature field, $\Delta T(X, Y, Z) = \left[ 1 - (Z - \hat{Z})^2 / (2 L_{\rm eff}^2) \right] \Delta T_\perp(X, Y)$, and consequently approximate $\nabla^2 \Delta T \simeq (\partial_X^2 + \partial_Y^2) \Delta T_\perp - L_{\rm eff}^{-2} \Delta T_\perp$, so that the 3D heat equation reduces to Eq. (6) with $C = 1/L_{\rm eff}^2$. Following this approach, Eq. (6) coupled to Eq. (1) can be cast in the form of Eqs. (2-3) by setting $\theta = k_0 L_{nl} |dn/dT| \Delta T_\perp$ and $\sigma^2 = 1/(C w_0^2) = L_{\rm eff}^2 / w_0^2$.
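The reduction just described can be spelled out step by step. The lines below are a sketch of the algebra under stated assumptions, not a derivation taken from the source: the field is assumed to be normalized as $\psi = A/\sqrt{I_0}$, and the transverse coordinates are scaled as $X = w_0 x$, $Y = w_0 y$.

```latex
% Longitudinal parabolic ansatz around Z = \hat{Z}
\Delta T(X,Y,Z) = \left[ 1 - \frac{(Z-\hat{Z})^2}{2 L_{\rm eff}^2} \right] \Delta T_\perp(X,Y)
\;\Rightarrow\;
\partial_Z^2 \Delta T = -\frac{\Delta T_\perp}{L_{\rm eff}^2},
\qquad
\nabla^2 \Delta T \big|_{Z=\hat{Z}}
  = (\partial_X^2 + \partial_Y^2)\,\Delta T_\perp - \frac{\Delta T_\perp}{L_{\rm eff}^2}.

% Eq. (6) divided by -C, with X = w_0 x, Y = w_0 y and |A|^2 = I_0 |\psi|^2
-\frac{1}{C w_0^2}\,\nabla_\perp^2 \Delta T_\perp + \Delta T_\perp
  = \frac{\gamma I_0}{C}\,|\psi|^2 .

% Multiplying by k_0 L_{nl} |dn/dT|, with L_{nl} = (k_0 |n_2| I_0)^{-1} and C = 1/L_{\rm eff}^2
-\sigma^2 \nabla_\perp^2 \theta + \theta
  = \frac{\gamma\,|dn/dT|\,L_{\rm eff}^2}{|n_2|}\,|\psi|^2 ,
\qquad
\sigma^2 = \frac{L_{\rm eff}^2}{w_0^2}.
```

On this reading, the right-hand side reduces to $|\psi|^2$, and Eq. (3) is recovered exactly, when $|n_2| = \gamma\,|dn/dT|\,L_{\rm eff}^2$, that is, when the Kerr coefficient is identified with the local (thin-sample) limit of the thermal response.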
The model reproduces the infinite range nonlocality for negligible losses ($L_{\rm eff} \to \infty$); while for thin samples [$|(\partial_X^2 + \partial_Y^2) \Delta T_\perp| \ll |L_{\rm eff}^{-2} \Delta T_\perp|$], $L_{\rm eff}$ can be related to the Kerr coefficient $n_2$ as

$$ L_{\rm eff} = \sqrt{\frac{|n_2|}{\gamma\, |dn/dT|}} = \sqrt{\frac{D_T \rho_0 c_p |n_2|}{\alpha_0\, |dn/dT|}} , \qquad (7)$$

which establishes a link between the degree of nonlocality and the strength of the nonlinear response (similarly to other nonlocal materials [12]).

Having retrieved the model Eqs. (2-3), let us show next that the scenario illustrated previously applies substantially unchanged in bulk (2D case), even when accounting for the optical power loss (α ≠ 0). An example of the general dynamics is shown in Fig. 4, where we report a simulation of the full model (2-3), with $\sigma^2 = 1$ and relatively large loss α = 1. In analogy to the 1D case, Fig. 4(a) clearly shows that the radial chirp $u = \phi_r$ steepens and then develops characteristic oscillations after the shock point (z ≃ 6, where $|\partial_r u| \to \infty$). Correspondingly the intensity also exhibits an external front which is connected to a flat central region with a characteristic overshoot [see Fig. 4(b)] corresponding to a brighter ring [inset in Fig. 4(c)]. For larger distances this structure moves outward following the motion of the shock. In the experiment such motion can be observed, at fixed physical length, by increasing the power, which amounts to decreasing ε while scaling z and α accordingly (z ∝ 1/ε, α ∝ ε), as displayed in Fig. 4(c) for σ = 0.3.

As a sample of a strongly absorbing medium we choose a 1 mm long cell filled with an aqueous solution of Rhodamine B (0.6 mM concentration). Our measurements of the linear and nonlinear properties of the sample, performed by means of the Z-scan technique, give data consistent with the literature [17], and allow us to extrapolate, at the operating vacuum wavelength of 532 nm, a linear index n = 1.3, a defocusing nonlinear index $n_2 = 7 \times 10^{-7}$ cm$^2$ W$^{-1}$, and $\alpha_0 = 62$ cm$^{-1}$.
For our sample $D_T = 1.5 \times 10^{-7}$ m$^2$ s$^{-1}$, $\rho_0 = 10^3$ kg m$^{-3}$, $c_p = 4 \times 10^3$ J kg$^{-1}$ K$^{-1}$ and $|dn/dT| = 10^{-4}$ K$^{-1}$ ($\gamma \cong 10^4$ K W$^{-1}$), and exploiting Eq. (7) we estimate $L_{\rm eff} \cong 10$ µm ($L_{\rm eff} \ll L$ because of the strong absorption that causes strong heating of our sample near the input facet), and correspondingly the degree of nonlocality σ ≅ 0.3. We operate with an input gaussian beam with fixed intensity waist $w_{0I} = w_0/\sqrt{2} = 20$ µm ($L_d \cong 12$ mm) focused onto the input face of the cell. With these numbers, an input power $P = \pi w_{0I}^2 I_0 = 200$ mW yields a nonlinear length $L_{nl} \cong 8$ µm (L ≅ 0.3 mm), which allows us to work in the semiclassical regime with ε ≅ 0.025.

The radial intensity profiles together with the 2D patterns imaged by means of a 40× microscope objective and a recording CCD camera are reported in Fig. 5. As shown, the beam exhibits the formation of the bright ring whose external front moves outward with increasing power, consistently with the reported simulations. We point out that, at higher powers, we observe (both experimentally and numerically) that the moving intensity front leaves behind damped oscillations that correspond to inner rings of lesser brightness, as reported in literature [15]. This, however, occurs well beyond the shock point that we have characterized so far.

FIG. 5: Radial profiles of intensity observed in the thermal medium for different input powers. The insets show the corresponding 2D output patterns.

In summary, the evolution of a gaussian beam in the strongly nonlinear regime is characterized by the occurrence of collisionless (i.e., regularized by diffraction) shocks that survive the smoothing effect of (even strong) nonlocality. While experimental results support the theoretical scenario in the defocusing case, the remarkable result that the nonlocality favours shock dynamics over filamentation requires future investigation.

∗ Electronic address: claudio.conti@phys.uniroma1.it
[1] G. B. Whitham, Linear and Nonlinear Waves (Wiley, New York, 1974).
[2] L. D. Landau and E. M. Lifshitz, Fluid Mechanics (Pergamon, 1995); M. A. Liberman and A. L. Velikovich, Physics of Shock Waves in Gases and Plasmas (Springer, Heidelberg, 1986).
[3] A. V. Gurevich and L. P. Pitaevskii, Sov. Phys. JETP 38, 291 (1973); A. V. Gurevich and A. L. Krylov, Sov. Phys. JETP 65, 944 (1987).
[4] J. C. Bronski and D. W. McLaughlin, in Singular Limits of Dispersive Waves, NATO ASI Series, Ser. B 320, pp. 21-28 (1994); M. G. Forest and K. T. R. McLaughlin, J. Nonlinear Science 7, 43 (1998); Y. Kodama, SIAM J. Appl. Math. 59, 2162 (1999); M. G. Forest, J. N. Kutz, and K. T. R. McLaughlin, J. Opt. Soc. Am. B 16, 1856 (1999).
[5] A. M. Kamchatnov, R. A. Kraenkel, and B. A. Umarov, Phys. Rev. E 66, 036609 (2002).
[6] R. J. Taylor, D. R. Baker, and H. Ikezi, Phys. Rev. Lett. 24, 206 (1970).
[7] J. E. Rothenberg and D. Grischkowsky, Phys. Rev. Lett. 62, 531 (1989).
[8] M. A. Hoefer, M. J. Ablowitz, I. Coddington, E. A. Cornell, P. Engels, and V. Schweikhard, Phys. Rev. A 74, 023623 (2006); V. M. Perez-Garcia, V. V. Konotop, V. A. Brazhnyi, Phys. Rev. Lett. 92, 220403 (2004); B. Damski, Phys. Rev. A 69, 043610 (2004).
[9] J. Wyller, W. Krolikowski, O. Bang, J. J. Rasmussen, Phys. Rev. E 66, 066615 (2002).
[10] A. Yakimenko, Y. Zaliznyak and Y. S. Kivshar, Phys. Rev. E 71, 065603(R) (2005).
[11] C. Rotschild, O. Cohen, O. Manela, M. Segev and T. Carmon, Phys. Rev. Lett. 95, 213904 (2005).
[12] C. Conti, M. Peccianti and G. Assanto, Phys. Rev. Lett. 91, 073901 (2003); Phys. Rev. Lett. 92, 113902 (2004); C. Conti, G. Ruocco and S. Trillo, Phys. Rev. Lett. 95, 183902 (2005).
[13] P. D. Miller and S. Kamvissis, Phys. Lett. A 247, 75 (1998); J. C. Bronski, Physica D 152, 163 (2001).
[14] C. A. Carter and J. M. Harris, Appl. Opt. 23, 476 (1984); S. Wu and N. J. Dovichi, J. Appl. Phys. 67, 1170 (1990); F. Jürgensen and W. Schröer, Appl. Opt. 34, 41 (1995).
[15] C. J. Wetterer, L. P. Schelonka, and M. A. Kramer, Opt. Lett. 14, 874 (1989).
[16] P. Brochard, V. Grolier-Mazza and R. Cabanel, J. Opt. Soc. Am. B 14, 405 (1997).
[17] S. Sinha, A. Ray, and K. Dasgupta, J. Appl. Phys. 87, 3222 (2000).
[18] Paraxial diffraction in defocusing media is well known to be isomorphous in 1D to propagation in a normally dispersive focusing medium as considered in Ref. [7].

ABSTRACT We investigate the formation of collisionless shocks along the spatial profile of a gaussian laser beam propagating in nonlocal nonlinear media. For defocusing nonlinearity the shock survives the smoothing effect of the nonlocal response, though its dynamics is qualitatively affected by the latter, whereas for focusing nonlinearity it dominates over filamentation. The patterns observed in a thermal defocusing medium are interpreted in the framework of our theory.
<|endoftext|><|startoftext|> Introduction Description of RBFNN Non parametric statistical modeling Quality of predictor Prediction of field evolution Time evolution of melt pool Optimal value of parameter Optimal number of joint sample pairs Choosing the surrounding S Optimal prediction of melt pool evolution Conclusion References ABSTRACT Efficient control of a laser welding process requires the reliable prediction of process behavior. A statistical method of field modeling, based on normalized RBFNN, can be successfully used to predict the spatiotemporal dynamics of surface optical activity in the laser welding process. In this article we demonstrate how to optimize RBFNN to maximize prediction quality. Special attention is paid to the structure of sample vectors, which represent the bridge between the field distributions in the past and future.
<|endoftext|><|startoftext|> Introduction Recently our understanding of the linear Balitsky–Fadin–Kuraev–Lipatov (BFKL) [4, 5] and non-linear Jalilian-Marian–Iancu–McLerran–Weigert–Leonidov–Kovner (JIMWLK) [6–13] and Balitsky-Kovchegov (BK) [14–18] small-x evolution equations in the Color Glass Condensate [6–29] has been improved due to the completion of the calculations determining the scale of the running coupling in the evolution kernel in [1,2,30,31]. The calculations in [1,2] proceeded by including αsNf corrections into the evolution kernel and by then completing Nf to the complete one-loop QCD beta-function by replacing Nf → −6πβ2. Calculation of the αsNf corrections is particularly easy in the s-channel light-cone perturbation theory formalism [32, 33] used to derive the BK and JIMWLK equations: there αs Nf corrections are solely due to chains of quark bubbles placed onto the s-channel gluon lines. The analytical results of [1,2] are not very concise and could not have been guessed without an explicit calculation. After finding αsNf corrections, the obtained contributions had to be divided into the running coupling part, which has a form of a running coupling correction to the leading-order (LO) JIMWLK or BK kernel, and into the ”subtraction piece”, which would bring in new structures into the kernel. Such separation had to be done both in [1] and in [2]. Unfortunately, there appears to be no unique way to perform this separation: it is not surprising, therefore, that it was done differently in both papers [1, 2]. This resulted in two different running coupling terms, shown below in Eqs. (35) and (36) along with Eqs. (7) and (8). Such a discrepancy has led to a misconception in the community that the calculations of [1] and of [2] disagree at some fundamental level. 
Indeed to compare the results of [1] and [2] one has to undo the separation into the running coupling and subtraction terms: combining both terms one should compare full kernels of the evolution equation obtained in [1, 2]. There is another more physical reason to perform such comparison: in principle, there is no small parameter making the subtraction term smaller than the running coupling term and thus justifying neglecting the former compared to the latter. Even the labeling of one term as “running coupling” piece is somewhat misleading, since it may give an impression that the neglected subtraction term has no running coupling corrections in it. As was shown in [31] both terms actually contribute to the running coupling corrections to the BFKL equation (if one uses the separation of [2] to define the terms). In this paper we perform numerical analysis of the BK evolution equation with the αsNf corrections resummed to all orders and with Nf completed to the QCD beta-function, Nf → −6πβ2, with β2 given in Eq. (20). We first solve the BK equation keeping the running coupling term only, with the kernels given by Eqs. (7) and (8). Indeed the solutions we find this way are different from each other. We then evaluate the subtraction terms for both cases and show that inclusion of subtraction terms puts the results of [1] and [2] in perfect agreement with each other! We complete our analysis by solving the BK equation with the full kernel including both the running coupling and subtraction terms. This work is structured as follows. Section 2 begins with Sect. 2.1 in which we review the αsNf corrections to the dipole scattering amplitude evolution equation recently derived in [1, 2] and the subtraction method employed in both works to separate the running coupling contributions from the subtraction terms. We discuss the scheme dependence of the running coupling terms introduced by this separation. We proceed in Sect. 
2.2 by deriving the explicit expressions for the subtracted terms. The calculation is based on the results of [2]. Our analytical results are summarized in Sect. 2.3, where we give the explicit final expression for the kernel of the subtraction term in Eq. (39), which, combined with Eq. (38), gives us the subtraction terms (40) and (41) for the subtractions performed in [1] and in [2] respectively. In Sect. 3 we explain the numerical method we use to solve the evolution equations. We also list the initial conditions used, along with the definition of the saturation scale employed.

Throughout the paper we will avoid the important question of the Landau pole and the contribution of renormalons to small-x evolution. As we explain in Sect. 3, we will simply “freeze” the running coupling at a constant value in the infrared. For a detailed study of the renormalon effects in the non-linear evolution we refer the readers to [30].

Our numerical results are presented in Sect. 4. By solving the evolution equations with the running coupling term only in Sect. 4.1 we show that the resulting dipole amplitude differs significantly from the fixed coupling case. We also observe that the amplitude obtained by solving the equation derived in [2] is very close to the result of solving the BK evolution with a postulated parent-dipole running of the coupling constant. Both these amplitudes are quite different from the solution of the equation derived in [1], as one can see from Fig. 4. In spite of that, all three evolution equations studied (the ones derived in [1], [2] and the parent-dipole running coupling model) give approximately identical scaling functions for the dipole amplitude at high rapidity, as demonstrated in Fig. 5 in Sect. 4.2. It is worth noting that, as can be seen from Fig. 6, the anomalous dimension we extracted from our solution is γ ≈ 0.85, which is different from the fixed coupling anomalous dimension of γ ≈ 0.64.
The former anomalous dimension also appears to disagree with the predictions of analytical approximations to the behavior of the dipole amplitude with running coupling from [34–38]. In Sect. 4.3 we numerically evaluate the subtraction terms for both [1] and [2] and show that their contributions are important, as shown in Fig. 7. However, subtraction terms decrease with increasing rapidity, such that at high enough rapidities their relative contribution becomes small (see Fig. 8). In Fig. 9 we show that inclusion of subtraction terms makes the results of [1] and [2] agree with each other.

Finally, the numerical solution of the full (all orders in $\alpha_s \beta_2$) evolution equation including both the running coupling and subtraction terms is performed in Sect. 4.4. The results are shown in Fig. 10. All the main features of the evolution with the running coupling are preserved in the full solution: the growth of the dipole amplitude and of the saturation scale with rapidity is slowed down (for the latter see Fig. 11). The scaling function of Fig. 5 is unaltered by the subtraction term, as shown in Fig. 12. We summarize and discuss our main conclusions in Sect. 5.

2 Scheme dependence

2.1 Inclusion of running coupling corrections: general concepts

The BK evolution equation for the dipole scattering matrix reads

$$\frac{\partial S(x_0, x_1; Y)}{\partial Y} = \int d^2z \, K(x_0, x_1, z) \left[ S(x_0, z; Y)\, S(z, x_1; Y) - S(x_0, x_1; Y) \right] , \tag{1}$$

where

$$K(x_0, x_1, z) = \frac{N_c\, \alpha_s}{2\pi^2} \, \frac{r^2}{r_1^2\, r_2^2} \tag{2}$$

is the kernel of the evolution. Here the transverse two-dimensional vectors $x_0$ and $x_1$ denote the transverse coordinates of the quark and the anti-quark in the parent dipole, while $z$ is the position of the gluon produced in one step of evolution [39–42]. We have introduced the notation $\mathbf{r} = x_0 - x_1$, $\mathbf{r}_1 = x_0 - z$, $\mathbf{r}_2 = z - x_1$ for the sizes of the parent and of the new (daughter) dipoles created by one step of the evolution. The notation $r \equiv |\mathbf{r}|$ for all the 2-dimensional vectors will also be employed throughout the rest of the paper. Eq.
(1) admits a clear physical interpretation: the original parent dipole, when boosted to higher rapidities, may emit a new gluon which, in the large-$N_c$ limit, is equivalent to a quark-antiquark pair. Thus, the original dipole splits into two new dipoles sharing a common transverse coordinate: the transverse position of the emitted gluon, $z$. The nonlinear term on the right hand side of Eq. (1) accounts for either one of the two new dipoles interacting with the target, along with the possibilities of only one dipole interacting or no interaction at all, while the subtracted linear term reflects virtual corrections. The kernel of the evolution is just the probability of one gluon emission calculated at leading logarithmic accuracy in $\alpha_s \ln(1/x_B)$, where $x_B$ is the fraction of momentum carried by the emitted gluon [39–42].

Under the eikonal approximation the dipole scattering matrix off a hadronic target at a fixed rapidity is given by the average over the hadron field configurations of Wilson lines $V$ calculated along fixed transverse coordinates (those of the quark and of the antiquark). More specifically

$$S(x_0, x_1; Y) = \frac{1}{N_c} \left\langle \mathrm{tr} \left\{ V(x_0)\, V^\dagger(x_1) \right\} \right\rangle . \tag{3}$$

Hence, the integrand of Eq. (1) can be regarded as a three point function in the sense that the gluon fields of the target are evaluated at three different transverse positions, those of the original quark and antiquark plus the one of the emitted gluon.

However, the inclusion of higher order corrections to the evolution equation via all order resummation of $\alpha_s N_f$ contributions as recently derived in [1, 2] brings in new physical channels that modify the three point structure of the leading-log equation. The dipole structure generated under evolution by diagrams like the one depicted in Fig.
1A (for a more detailed discussion of the diagrammatic content of the high order corrections see [2]) is identical to the one previously discussed for the leading order equation, the only novelty being that the propagator of the emitted gluon is now dressed with quark loops, modifying the emission probability but leaving the interaction terms untouched. In contrast, diagrams like the one in Fig. 1B, in which a quark-antiquark pair (rather than a gluon) is added to the evolved wave function, modify the interaction structure of the evolution equation. The evolution of the parent dipole scattering matrix driven by this kind of term is proportional to the scattering matrices of the two newly created dipoles (the one formed by the original quark and the new antiquark and vice versa), $\sim S(x_0, z_1)\, S(z_2, x_1)$. This term depends on four different transverse coordinates, i.e., it is a four point function and, therefore, its contribution to the evolution equation cannot be accounted for by a mere modification of the emission kernel of the leading order equation.

Figure 1: Schematic representation of the diagrams contributing to quark-NLO evolution.

To discuss in more detail the modifications introduced by the high order corrections, we find it useful to rewrite the evolution equation in the following, rather general way

$$\frac{\partial S(x_0, x_1; Y)}{\partial Y} = \mathcal{F}\left[ S(x_0, x_1; Y) \right] \tag{4}$$

where $\mathcal{F}$ is a functional of the dipole scattering matrix which for the original derivation of the equation is given by the right hand side of Eq. (1). In general it can be decomposed into two pieces $\mathcal{F}[S] = \mathcal{R}[S] - \mathcal{S}[S]$ .
(5) The first term, $\mathcal{R}$, which we will call the 'running coupling' contribution, gathers all the higher order in $\alpha_s N_f$ corrections to the evolution that can be recast in a functional form that looks identical to the leading order one but with a modified kernel, $\tilde{K}$, which includes all the terms setting the scale for the running coupling:

$$\mathcal{R}\left[ S(x_0, x_1; Y) \right] = \int d^2z \, \tilde{K}(x_0, x_1, z) \left[ S(x_0, z; Y)\, S(z, x_1; Y) - S(x_0, x_1; Y) \right] . \tag{6}$$

The second term, $\mathcal{S}$, henceforth referred to as the 'subtraction' contribution, encodes those contributions that depart from the three point structure of the leading-log equation. The explicit derivation and expressions for this term are presented in the next section. The relative minus sign between the two terms in Eq. (5) has been introduced for later convenience.

Importantly, the decomposition of $\mathcal{F}$ into running coupling and subtraction contributions, although constrained by unitarity arguments, is not unique. Two different separation schemes have been proposed in [1, 2]. They are both based on a similar strategy, sketched in Fig. 2, that can be summarized as follows. The newly created quark-antiquark pair added to the wave function in the diagrams of Fig. 1B is shrunk to a point, called the subtraction point, by integrating out one of the coordinates in the dipole-$q\bar{q}$ wave function, turning the previously discussed four point nature of these contributions into a three point one. This integrated three point contribution is added to the running coupling contribution, whereas the original four point term minus its integrated version is assigned to the subtraction contribution.

The divergence between the two approaches stems from the choice of the subtraction point. In the subtraction scheme proposed by Balitsky in [1] the subtraction point is chosen to be the transverse coordinate of either the quark, $z_2$, or the antiquark, $z_1$. The kernel for the running coupling functional, Eq. (6), obtained in this way is

$$\tilde{K}^{\rm Bal}(r, r_1, r_2) = \frac{N_c\, \alpha_s(r^2)}{2\pi^2} \left[ \frac{r^2}{r_1^2\, r_2^2} + \frac{1}{r_1^2} \left( \frac{\alpha_s(r_1^2)}{\alpha_s(r_2^2)} - 1 \right) + \frac{1}{r_2^2} \left( \frac{\alpha_s(r_2^2)}{\alpha_s(r_1^2)} - 1 \right) \right] .$$
(7) On the other hand, in the subtraction procedure followed in [2] (which we will refer to as KW) the zero size quark-antiquark pair is fixed at the transverse coordinate of the gluon, $z = \alpha z_1 + (1 - \alpha) z_2$, where $\alpha$ is the fraction of the gluon's longitudinal momentum carried by the quark, yielding the following expression for the kernel of the running coupling contribution:

$$\tilde{K}^{\rm KW}(r, r_1, r_2) = \frac{N_c}{2\pi^2} \left[ \alpha_s(r_1^2)\, \frac{1}{r_1^2} - 2\, \frac{\alpha_s(r_1^2)\, \alpha_s(r_2^2)}{\alpha_s(R^2)}\, \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{r_1^2\, r_2^2} + \alpha_s(r_2^2)\, \frac{1}{r_2^2} \right] , \tag{8}$$

where

$$R^2(r, r_1, r_2) = r_1\, r_2 \left( \frac{r_2}{r_1} \right)^{\frac{r_1^2 + r_2^2}{r_1^2 - r_2^2} \, - \, 2\, \frac{r_1^2\, r_2^2}{\mathbf{r}_1 \cdot \mathbf{r}_2}\, \frac{1}{r_1^2 - r_2^2}} . \tag{9}$$

Figure 2: Schematic representation of the subtraction procedure.

As we shall discuss later, the scheme dependence originating from the choice of the subtraction point is substantial and has an important effect on the solutions of the evolution equation when only the running coupling contribution is taken into account.

In our numerical study we will also consider the following ad hoc prescription for the kernel of the running coupling functional, in which the scale for the running of the coupling is set to be the size of the parent dipole:

$$\tilde{K}^{\rm pd}(r, r_1, r_2) = \frac{N_c\, \alpha_s(r^2)}{2\pi^2} \, \frac{r^2}{r_1^2\, r_2^2} . \tag{10}$$

This prescription is useful as a benchmark for comparison with previous numerical [3] and analytical works [34, 35, 43] where this ansatz was used.

2.2 Derivation of the subtraction term

We begin by considering the NLO contribution to the kernel of the JIMWLK and BK evolution equations with the s-channel gluon splitting into a quark-antiquark pair, which then interacts with the target, as shown on the left hand side of Fig. 3. The contribution of this diagram has been calculated in [2]. The resulting JIMWLK kernel is [2] KNLO1 (x0, x1; z1, z2) = 4Nf (2π)2 (2π)2 (2π)2 (2π)2 e−iq·(z−x0)+iq ′·(z−x )−i(k−k′)·z q2q′2 (1− 2α)2q · k k′ · q′ + q · q′ k · k′ − q · k′ k · q′ k2 + q2α(1− α) k′2 + q′2α(1− α) 2α (1− α) (1− 2α) k2 + q2α(1− α) k′2 + q′2α(1− α) k · q k′ · q′ 4α2 (1− α)2 k2 + q2α(1− α) k′2 + q′2α(1− α) . (11) The momentum labels in the above equation are explained on the left hand side of Fig. 3.
If $k_1$ and $k_2$ are the transverse momenta of the quark and of the antiquark in the produced pair as shown in Fig. 3, then the transverse momentum of the gluon is $q = k_1 + k_2$. The other transverse momentum we use is $k = k_1 (1 - \alpha) - k_2\, \alpha$, where $\alpha$ is the fraction of the gluon's “plus” momentum carried by the quark, $\alpha \equiv k_{1+}/(k_{1+} + k_{2+})$. The prime over the transverse momentum denotes the momentum of the same particle in the complex conjugate amplitude. For instance $q'$ is the momentum of the s-channel gluon in the complex conjugate amplitude. Finally, $z_1$ and $z_2$ denote the transverse coordinates of the quark and the antiquark. In Eq. (11) we use $z_{12} = z_1 - z_2$ (the transverse separation between the quark and the antiquark) and $z = \alpha z_1 + (1 - \alpha) z_2$ (the transverse coordinate of the gluon).

Figure 3: A lowest order leading-$N_f$ NLO correction which gives rise to the subtraction term is shown on the left. The same diagram with the gluon lines “dressed” by chains of fermion bubbles, as shown on the right, gives the full (resumming all powers of $\alpha_\mu N_f$) contribution to the subtraction term. Calculation of the subtraction term is pictured in Fig. 2.

To obtain the BK kernel from Eq. (11) one should sum over all possible emissions of the gluon off the quark and antiquark lines in the incoming dipole both in the amplitude and in the complex conjugate amplitude, which is accomplished by

$$K_1^{\rm NLO}(x_0, x_1; z_1, z_2) = C_F \sum_{m,n=0}^{1} (-1)^{m+n} \, \mathcal{K}_1^{\rm NLO}(x_m, x_n; z_1, z_2) . \tag{12}$$

Below we will label the JIMWLK kernel by the calligraphic letter $\mathcal{K}$ and the corresponding BK kernel by $K$. The contribution of the kernel from Eq. (12) to the right hand side of the NLO version of Eq. (1) is given by the following term

$$\alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{\rm NLO}(x_0, x_1; z_1, z_2) \, S(x_0, z_1, Y)\, S(z_2, x_1, Y) \tag{13}$$

with $\alpha_\mu$ the bare coupling. As shown in Fig. 2, at the NLO level, the subtraction term introduced in Eq.
(5), is then defined by

$$\mathcal{S}_{\rm NLO}[S] = \alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{\rm NLO}(x_0, x_1; z_1, z_2) \left[ S(x_0, w, Y)\, S(w, x_1, Y) - S(x_0, z_1, Y)\, S(z_2, x_1, Y) \right] , \tag{14}$$

where $w$ is the point of subtraction in the transverse coordinate space. In [1] it was chosen to be equal to the transverse coordinate of either the quark or the antiquark,

$$w = z_1 \quad \text{or} \quad w = z_2 , \tag{15}$$

as both choices lead to the same subtraction term $\mathcal{S}^{\rm Bal}_{\rm NLO}[S]$:

$$\mathcal{S}^{\rm Bal}_{\rm NLO}[S] = \alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{\rm NLO}(x_0, x_1; z_1, z_2) \left[ S(x_0, z_1, Y)\, S(z_1, x_1, Y) - S(x_0, z_1, Y)\, S(z_2, x_1, Y) \right] . \tag{16}$$

In [2] the subtraction point was chosen to be the transverse coordinate of the gluon $z$,

$$w = z = \alpha z_1 + (1 - \alpha) z_2 . \tag{17}$$

This leads to the following subtraction term, which we denote $\mathcal{S}^{\rm KW}_{\rm NLO}[S]$:

$$\mathcal{S}^{\rm KW}_{\rm NLO}[S] = \alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{\rm NLO}(x_0, x_1; z_1, z_2) \left[ S(x_0, z, Y)\, S(z, x_1, Y) - S(x_0, z_1, Y)\, S(z_2, x_1, Y) \right] . \tag{18}$$

Indeed the complete kernel in Eq. (5) is independent of the choice of $w$. However, since the subtraction term of Eq. (14) was neglected both in [1] and in [2], different choices of $w$ led to different expressions for the remaining running coupling part $\mathcal{R}[S]$, i.e., to different answers as far as the investigations in [1] and in [2] were concerned. The different choice of $w$ is the main source of the discrepancy between the final answers of [1] and [2], though it does not imply any disagreement in the full expression (5).

Our goal in this Section is to evaluate $\mathcal{K}_1^{\rm NLO}(x_0, x_1; z_1, z_2)$ from Eq. (11) including the running coupling corrections. The s-channel light-cone perturbation theory formalism makes such inclusion simple [2]: all we have to do is include infinite chains of quark bubbles on the gluon lines in the amplitude and in the complex conjugate amplitude, as depicted on the right hand side of Fig. 3.
Performing calculations similar to those done in [2] one arrives at K ❣1 (x0, x1; z1, z2) = 4Nf (2π)2 (2π)2 (2π)2 (2π)2 e−iq·(z−x0)+iq ′·(z−x )−i(k−k′)·z q2q′2 (1− 2α)2q · k k′ · q′ + q · q′ k · k′ − q · k′ k · q′ k2 + q2α(1− α) k′2 + q′2α(1− α) 2α (1− α) (1− 2α) k2 + q2α(1− α) k′2 + q′2α(1− α) k · q k′ · q′ 4α2 (1− α)2 k2 + q2α(1− α) k′2 + q′2α(1− α) 1 + αµβ2 ln q2 e−5/3 1 + αµβ2 ln q′2 e−5/3 ) (19) where K ❣1 denotes the kernel with the running coupling corrections resummed to all orders. Just like in [2, 31], here we will use MS renormalization scheme. Inclusion of fermion bubble chains generated two denominators at the end of Eq. (19), which is its only difference from Eq. (11). Here 11Nc − 2Nf . (20) Now we have to perform the transverse momentum integrals in Eq. (19). First we expand the denominators at the end of Eq. (19) into a power series and rewrite Eq. (19) as K ❣1 (x0, x1; z1, z2) = 4Nf n,m=0 (−αµβ2)n+m (2π)2 (2π)2 (2π)2 (2π)2 e−iq·(z−x0)+iq ′·(z−x )−i(k−k′)·z q2q′2 (1− 2α)2q · k k′ · q′ + q · q′ k · k′ − q · k′ k · q′ k2 + q2α(1− α) k′2 + q′2α(1− α) 2α (1− α) (1− 2α) k2 + q2α(1− α) k′2 + q′2α(1− α) k · q k′ · q′ 4α2 (1− α)2 k2 + q2α(1− α) k′2 + q′2α(1− α) λ=λ′=0 where we have defined µ2 = µ2 e5/3 to make the expressions more compact. Indeed we can not always expand the denominators of Eq. (19) into a geometric series employed in Eq. (21), but one has to remember that the summation of bubble chain diagrams shown on the right side of Fig. 3 gives one the geometric series. Hence the geometric series come first: later they are absorbed into the denominators shown in Eq. (19), which is an approximation not valid for all q and q′. Therefore, by keeping the geometric series in Eq. (21) we are not making any approximations. In general, in what follows we are not going to keep track of the issues of convergence of perturbation series. 
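The expansion step described above can be written out for a single bubble-chain denominator. The following is only a sketch, assuming (as the prefactor $(-\alpha_\mu\beta_2)^{n+m}$ and the evaluation at $\lambda=\lambda'=0$ suggest) that each denominator depends on the gluon momentum only through $\ln(q^2/\mu^2_{\overline{MS}})$ with $\mu^2_{\overline{MS}} = \mu^2 e^{5/3}$; the same steps apply to $q'$ with $m$ and $\lambda'$:

```latex
\frac{1}{1 + \alpha_\mu \beta_2 \ln\dfrac{q^2}{\mu_{\overline{MS}}^2}}
  = \sum_{n=0}^{\infty} \left(-\alpha_\mu \beta_2\right)^n
    \ln^n\!\frac{q^2}{\mu_{\overline{MS}}^2} ,
\qquad
\ln^n\!\frac{q^2}{\mu_{\overline{MS}}^2}
  = \frac{d^n}{d\lambda^n}
    \left(\frac{q^2}{\mu_{\overline{MS}}^2}\right)^{\lambda}\Bigg|_{\lambda=0} .
```

Trading powers of the logarithm for derivatives of a pure power of $q^2$ is presumably what makes the subsequent momentum integrals tractable, since powers of $q$ integrate against Bessel functions in closed form.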
The contribution of renormalons to non-linear small-x evolution was thoroughly investigated in [30] and was found to be significant at low Q2. We refer the interested reader to [30] for more details on this issue. Using the following formulas (2π)2 e−ik·z k2 + q2 K0(q z) (22) (2π)2 e−ik·z k2 + q2 q K1(q z) (23) we can now perform the k- and k′-integrals in Eq. (21). Integrating over the angles of q and q′ as well yields K ❣1 (x0, x1; z1, z2) = (2π)4 n,m=0 (−αµβ2)n+m dq q dq′ q′ z212 |z − x0| |z − x1| − 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1) × J1(q |z − x0|)K1(z12 q α ᾱ) J1(q ′ |z − x1|)K1(z12 q′ α ᾱ) + 2α ᾱ (α− ᾱ) z12 · (z − x0) z12 |z − x0| J1(q |z − x0|)K1(z12 q α ᾱ) J0(q ′ |z − x1|)K0(z12 q′ α ᾱ) z12 · (z − x1) z12 |z − x1| J0(q |z − x0|)K0(z12 q α ᾱ) J1(q ′ |z − x1|)K1(z12 q′ α ᾱ) + 4α2 ᾱ2 J0(q |z − x0|)K0(z12 q α ᾱ) J0(q ′ |z − x1|)K0(z12 q′ α ᾱ) λ=λ′=0 . (24) We have defined ᾱ = 1− α (25) for brevity. Now the integrals over q and q′ can be carried out to give K ❣1 (x0, x1; z1, z2) = (2π)4 n,m=0 (−αµβ2)n+m z212 µ 2 α ᾱ )λ+λ′ Γ2(1 + λ) Γ2(1 + λ′) − 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1) (1 + λ) (1 + λ′) z812 (α ᾱ) 1 + λ, 2 + λ; 2;−|z − x0| α ᾱ z212 1 + λ′, 2 + λ′; 2;−|z − x1| α ᾱ z212 α− ᾱ z12 · (z − x0) (1 + λ)F 1 + λ, 2 + λ; 2;−|z − x0| α ᾱ z212 1 + λ′, 1 + λ′; 1;−|z − x1| α ᾱ z212 z12 · (z − x1) 1 + λ, 1 + λ; 1;− |z − x0|2 α ᾱ z212 (1 + λ′)F 1 + λ′, 2 + λ′; 2;− |z − x1|2 α ᾱ z212 1 + λ, 1 + λ; 1;− |z − x0|2 α ᾱ z212 1 + λ′, 1 + λ′; 1;− |z − x1|2 α ᾱ z212 λ=λ′=0 . (26) Unfortunately further simplification of the expression in Eq. (26) is impossible without approx- imations. The series resulting from summation over n and m are likely to be divergent due to renormalons. As we mentioned before, here we neglect the renormalon problem referring the reader to [30]. 
Similar to how it was done in [2] we are not going to attempt to resum the series exactly: instead we will calculate the next-to-leading order terms and assume that with a good accuracy they give us the scale(s) of the running coupling constant. This procedure is similar to the well-known prescription due to Brodsky, Lepage and Mackenzie [44]. Using the Taylor-expansions of hypergeometric functions F (1 + λ, 2 + λ; 2; z) = − λ 1 1 + ln(1− z) + 1 ln(1− z) + o(λ2). (27) F (1 + λ, 1 + λ; 1; z) = ln (1− z) + o(λ2) (28) after some algebra we obtain K ❣1 (x0, x1; z1, z2) = [α (z1 − x0)2 + ᾱ (z2 − x0)2] [α (z1 − x1)2 + ᾱ (z2 − x0)2] z412 − 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1) 1− αµ β2 ln R2T (x0)µ + o(α2µ) 1− αµ β2 ln R2T (x1)µ + o(α2µ) + 2α ᾱ (α− ᾱ) z212 z12 · (z − x0) 1− αµ β2 ln R2T (x0)µ + o(α2µ) 1− αµ β2 ln R2L(x1)µ + o(α2µ) + z12 · (z − x1) 1− αµ β2 ln R2L(x0)µ + o(α2µ) 1− αµ β2 ln R2T (x1)µ + o(α2µ) +4α2 ᾱ2 z412 1− αµ β2 ln R2L(x0)µ + o(α2µ) 1− αµ β2 ln R2L(x1)µ + o(α2µ) In arriving at Eq. (29) we employed functions RT (x) and RL(x), which have dimensions of transverse coordinates and are defined by R2T (x)µ 4 e−2γ−5/3 [α (z1 − x)2 + ᾱ (z2 − x)2]µ2MS α ᾱ z212 (z − x)2 α (z1 − x)2 + ᾱ (z2 − x)2 α ᾱ z212 R2L(x)µ 4 e−2γ−5/3 [α (z1 − x)2 + ᾱ (z2 − x)2]µ2MS α (z1 − x)2 + ᾱ (z2 − x)2 α ᾱ z212 The subscripts T and L stand for transverse and longitudinal gluon polarizations which give rise to the two different functions under the logarithm. Recombining the series in Eq. (29) into physical running couplings finally yields α2µK ❣1 (x0, x1; z1, z2) = [α (z1 − x0)2 + ᾱ (z2 − x0)2] [α (z1 − x1)2 + ᾱ (z2 − x0)2] z412 − 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1) R2T (x0) R2T (x1) + 2α ᾱ (α− ᾱ) z212 z12 · (z − x0) αs R2T (x0) R2L(x1) +z12 · (z − x1)αs R2L(x0) R2T (x1) + 4α2 ᾱ2 z412 αs R2L(x0) R2L(x1) with the physical running coupling in the MS scheme given by αs(1/R 1 + αµβ2 ln R2 µ2 ) . (33) Eq. 
(32) is the contribution to the JIMWLK evolution kernel of the resummed diagram on the right hand side of Fig. 3.

2.3 Brief summary of analytical results

Let us briefly summarize our analytical results. The non-linear small-x evolution equation with the running coupling corrections included reads

$$\frac{\partial S(x_0, x_1; Y)}{\partial Y} = \mathcal{R}[S] - \mathcal{S}[S] . \tag{34}$$

The first term on the right hand side of Eq. (34) is referred to as the running coupling contribution. It was calculated independently in [1] and in [2]: the results of those calculations are given above in Eqs. (7) and (8) respectively, which have to be combined with Eq. (6) to obtain

$$\mathcal{R}^{\rm Bal}[S] = \int d^2z \, \tilde{K}^{\rm Bal}(x_0, x_1, z) \left[ S(x_0, z; Y)\, S(z, x_1; Y) - S(x_0, x_1; Y) \right] \tag{35}$$

$$\mathcal{R}^{\rm KW}[S] = \int d^2z \, \tilde{K}^{\rm KW}(x_0, x_1, z) \left[ S(x_0, z; Y)\, S(z, x_1; Y) - S(x_0, x_1; Y) \right] . \tag{36}$$

One notices immediately that $\mathcal{R}^{\rm Bal}[S]$ calculated in [1] is different from $\mathcal{R}^{\rm KW}[S]$ calculated in [2] due to the difference in the kernels $\tilde{K}^{\rm Bal}$ and $\tilde{K}^{\rm KW}$ in Eqs. (7) and (8). However, that does not imply disagreement between the calculations of [1] and [2]: after all, it is the full kernel on the right of Eq. (34), $\mathcal{R}[S] - \mathcal{S}[S]$, that needs to be compared. To do that one has to calculate the second term on the right hand side of Eq. (34).

The second term on the right hand side of Eq. (34) is referred to as the subtraction contribution. It is given by

$$\mathcal{S}[S] = \alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{❣}(x_0, x_1; z_1, z_2) \left[ S(x_0, w, Y)\, S(w, x_1, Y) - S(x_0, z_1, Y)\, S(z_2, x_1, Y) \right] \tag{37}$$

with the resummed BK kernel

$$K_1^{❣}(x_0, x_1; z_1, z_2) = C_F \sum_{m,n=0}^{1} (-1)^{m+n} \, \mathcal{K}_1^{❣}(x_m, x_n; z_1, z_2) . \tag{38}$$

The resummed JIMWLK kernel $\mathcal{K}_1^{❣}(x_m, x_n; z_1, z_2)$ is given by Eq. (32), along with Eqs. (30) and (31) defining the scales of the running couplings.
In the numerical solution below we will replace $N_f \to -6\pi\beta_2$ in its prefactor, obtaining α2µK ❣1 (x0, x1; z1, z2) = − [α (z1 − x0)2 + ᾱ (z2 − x0)2] [α (z1 − x1)2 + ᾱ (z2 − x0)2] z412 − 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1) R2T (x0) R2T (x1) + 2α ᾱ (α− ᾱ) z212 z12 · (z − x0) αs R2T (x0) R2L(x1) +z12 · (z − x1)αs R2L(x0) R2T (x1) + 4α2 ᾱ2 z412 αs R2L(x0) R2L(x1) (39) This substitution is the same as for all other factors of $N_f$. The same substitution was performed in [2] to calculate the running coupling term. In fact, as was shown in [31], the linear part of the subtraction term (calculated using the prescription of [2]) contributes to the running coupling corrections to the BFKL equation. Therefore, in that case, the factor of $N_f$ in front of Eq. (32) is definitely a part of the beta-function. Hence the replacement $N_f \to -6\pi\beta_2$ is justified even in the subtraction term.

Once again, in the numerical solution below we will use Eq. (39) along with Eq. (38) in Eq. (37) to calculate the subtraction term $\mathcal{S}[S]$. Substituting $w = z_1$ (or, equivalently, $w = z_2$) in Eq. (37) yields the subtraction term

$$\mathcal{S}^{\rm Bal}[S] = \alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{❣}(x_0, x_1; z_1, z_2) \left[ S(x_0, z_1, Y)\, S(z_1, x_1, Y) - S(x_0, z_1, Y)\, S(z_2, x_1, Y) \right] \tag{40}$$

which has to be subtracted from $\mathcal{R}^{\rm Bal}[S]$ calculated in [1] and given by Eq. (35) to obtain the complete evolution equation resumming all orders of $\alpha_s N_f$ in the kernel. Substituting $w = z = \alpha z_1 + (1 - \alpha) z_2$ in Eq. (37) yields

$$\mathcal{S}^{\rm KW}[S] = \alpha_\mu^2 \int d^2z_1\, d^2z_2 \, K_1^{❣}(x_0, x_1; z_1, z_2) \left[ S(x_0, z, Y)\, S(z, x_1, Y) - S(x_0, z_1, Y)\, S(z_2, x_1, Y) \right] \tag{41}$$

which has to be subtracted from $\mathcal{R}^{\rm KW}[S]$ calculated in [2] and given in Eq. (36), again to obtain the complete evolution equation resumming all orders of $\alpha_s N_f$ in the kernel. We checked explicitly by performing analytic calculations that the two evolution equations obtained this way agree at the NLO and NNLO.
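The agreement of the two schemes can be phrased as a single identity. The sketch below uses only Eqs. (34), (40) and (41); the label $K_1^{\rm res}$ is a shorthand introduced here for the resummed BK kernel of Eq. (38), and $z = \alpha z_1 + (1-\alpha) z_2$:

```latex
\mathcal{R}^{\rm Bal}[S] - \mathcal{S}^{\rm Bal}[S]
  = \mathcal{R}^{\rm KW}[S] - \mathcal{S}^{\rm KW}[S]
\;\;\Longleftrightarrow\;\;
\mathcal{R}^{\rm Bal}[S] - \mathcal{R}^{\rm KW}[S]
  = \mathcal{S}^{\rm Bal}[S] - \mathcal{S}^{\rm KW}[S]
  = \alpha_\mu^2 \int d^2z_1\, d^2z_2\, K_1^{\rm res}(x_0, x_1; z_1, z_2)
    \left[ S(x_0, z_1, Y)\, S(z_1, x_1, Y) - S(x_0, z, Y)\, S(z, x_1, Y) \right] .
```

The four point pieces $S(x_0, z_1, Y)\, S(z_2, x_1, Y)$ cancel in the difference of the two subtraction terms, so the two running coupling terms differ only by three point contributions evaluated at the two subtraction points.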
Below we will check the agreement of the two calculations to all orders by performing a numerical analysis of the solutions of these equations.

The above discussion demonstrates that the separation of the evolution kernel into the running coupling and subtraction pieces, as done in Eq. (34), is somewhat artificial, and has no small parameter justifying one or another separation prescription. Therefore, the small-x evolution equation including all running coupling (or, more precisely, $\alpha_s N_f$) corrections should combine both terms in Eq. (34). Below we will solve such an evolution equation numerically to obtain the full small-x evolution with the running coupling.

3 Numerical setup and initial conditions

In our numerical study we consider the translationally invariant approximation in which the scattering matrix is independent of the impact parameter of the collision, i.e., $S = S(r, Y)$. To solve the integro-differential equations corresponding to the BK equation with running coupling, we employ a second-order Runge-Kutta method with a step size in rapidity $\Delta Y = 0.1$. We discretize the variable $|r|$ into 800 points equally separated in logarithmic space between $r_{min} = 10^{-8}$ and $r_{max} = 50$. Throughout this paper, the units of $r$ will be GeV$^{-1}$, and those of $Q_s$ will be GeV. All the integrals have been performed using improved adaptive Gaussian quadrature methods. The accuracy of this numerical method has been checked in [3] to be better than 4% over the whole $r$ range.

We consider three different initial conditions for the dipole scattering amplitude, $N(r, Y) = 1 - S(r, Y)$. The first one is taken from the McLerran-Venugopalan (MV) model [22, 23]:

$$N^{MV}(r, Y=0) = 1 - \exp\left[ -\frac{r^2\, Q_s'^{\,2}}{4} \, \ln\left( \frac{1}{r \Lambda} + e \right) \right] , \tag{42}$$

where a constant term has been added to the argument of the logarithm in the exponent in order to regularize it for large values of $r$. The other two initial conditions are given by

$$N^{AN}(r) = 1 - \exp\left[ -\frac{(r\, Q_s')^{2\gamma}}{4} \right] , \tag{43}$$

with γ = 0.6 and γ = 0.8.
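The discretization and initial conditions just described can be sketched in a few lines. This is a minimal illustration, not the code used for the paper: the regulator `e` inside the MV logarithm, the factor 1/4 in the AN exponent, and the midpoint variant of the second-order Runge-Kutta step are assumptions (the text only specifies "second-order Runge-Kutta"), and the names `n_mv`, `n_an` and `rk2_step` are hypothetical.

```python
import numpy as np

# Radial grid quoted in the text: 800 points, equally spaced in ln r (r in GeV^-1).
R_MIN, R_MAX, N_POINTS = 1.0e-8, 50.0, 800
r_grid = np.geomspace(R_MIN, R_MAX, N_POINTS)

DELTA_Y = 0.1   # rapidity step quoted in the text
QS0 = 1.0       # Q'_s at Y = 0, in GeV
LAMBDA = 0.2    # GeV


def n_mv(r, qs0=QS0, lam=LAMBDA):
    """MV-type initial amplitude. ASSUMPTION: the constant that regulates
    the logarithm at large r is taken to be e."""
    exponent = 0.25 * (r * qs0) ** 2 * np.log(1.0 / (r * lam) + np.e)
    return -np.expm1(-exponent)   # equals 1 - exp(-exponent), accurate at small r


def n_an(r, gamma, qs0=QS0):
    """AN-type initial amplitude with small-r behaviour N ~ r^(2*gamma).
    ASSUMPTION: overall factor 1/4 in the exponent."""
    return -np.expm1(-0.25 * (r * qs0) ** (2.0 * gamma))


def rk2_step(rhs, n, dy=DELTA_Y):
    """One second-order Runge-Kutta step for dN/dY = rhs(N).
    ASSUMPTION: midpoint variant; the text does not name the RK2 scheme."""
    k1 = rhs(n)
    k2 = rhs(n + 0.5 * dy * k1)
    return n + dy * k2


initial_conditions = {
    "MV": n_mv(r_grid),
    "AN, gamma=0.6": n_an(r_grid, 0.6),
    "AN, gamma=0.8": n_an(r_grid, 0.8),
}
```

In a full solver, `rhs` would evaluate the right hand side of the evolution equation on `r_grid` by two-dimensional quadrature over the gluon position; that part is deliberately left out of this sketch.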
These last two initial conditions will be referred to hereinafter as AN06 and AN08 respectively. The interest in this ansatz, reminiscent of the Golec-Biernat–Wüsthoff model [45], is that the small-$r$ behavior $N^{AN} \propto r^{2\gamma}$ corresponds to an anomalous dimension $1 - \gamma$ of the unintegrated gluon distribution at large transverse momentum. (AN labels initial conditions with anomalous dimension.) Our choices γ = 0.6 and γ = 0.8 can be motivated a posteriori by the observation that the anomalous dimension of the evolved BK solution for running coupling lies in between those two values and the one for the MV initial condition, γ ≈ 1 (see Section 4.2). Thus, the choice of distinct initial conditions allows us to better track the onset of the expected asymptotic universal behavior that is eventually reached at high energies and to study the influence of the pre-asymptotic, non-universal corrections to the solutions of the evolution equations.

To completely determine our initial conditions, we set $Q_s' = 1$ GeV at $Y = 0$ in Eqs. (42) and (43) and put $\Lambda = 0.2$ GeV. Although $Q_s'$ is normally identified with the saturation scale, our definition of the saturation scale through the rest of the paper will be purely pragmatic and given by the condition

$$N(r = 1/Q_s(Y), Y) = \kappa , \tag{44}$$

with κ = 0.5. We have checked that this choice of κ, albeit arbitrary, does not affect any of the major conclusions to be drawn in the rest of the paper.

Finally, in order to avoid the Landau pole and to regularize the running coupling at large transverse sizes we adopt the following procedure: for small transverse distances $r < r_{fr}$, with $r_{fr}$ defined by $\alpha_s(1/r_{fr}^2) = 0.5$, the running coupling is given by the one loop expression

$$\alpha_s(1/r^2) = \frac{1}{\beta_2 \ln\left( \dfrac{1}{r^2 \Lambda^2} \right)} \tag{45}$$

with $N_f = 3$ and $\Lambda = 0.2$ GeV, whereas for larger sizes, $r > r_{fr}$, we “freeze” the coupling at a fixed value $\alpha_s = 0.5$. A detailed study of the role of the Landau pole in non-linear small-x evolution is given in [30].
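The freezing prescription, and its effect on the running coupling kernels, can be illustrated as follows. This is a sketch under stated assumptions: $\beta_2 = (11N_c - 2N_f)/(12\pi)$ is inferred from the replacement $N_f \to -6\pi\beta_2$, the one-loop form is the one described above, and the bracket structure in `kernel_balitsky` is the form commonly quoted for the kernel of [1] rather than something derived here. All function names are hypothetical.

```python
import numpy as np

N_C, N_F = 3, 3
LAMBDA = 0.2     # GeV
ALPHA_FR = 0.5   # value at which the coupling is frozen
BETA2 = (11.0 * N_C - 2.0 * N_F) / (12.0 * np.pi)
# r_fr solves alpha_s(1/r_fr^2) = ALPHA_FR for the one-loop form used below.
R_FR = np.exp(-0.5 / (ALPHA_FR * BETA2)) / LAMBDA


def alpha_s(r):
    """One-loop coupling at scale 1/r^2 for r < R_FR, frozen at ALPHA_FR beyond."""
    r_eff = np.minimum(np.asarray(r, dtype=float), R_FR)
    return 1.0 / (BETA2 * np.log(1.0 / (r_eff * LAMBDA) ** 2))


def kernel_parent_dipole(r, r1, r2):
    """Parent-dipole prescription: coupling evaluated at the parent size r."""
    return N_C * alpha_s(r) / (2.0 * np.pi**2) * r**2 / (r1**2 * r2**2)


def kernel_balitsky(r, r1, r2):
    """Balitsky-type running coupling kernel.
    ASSUMPTION: bracket structure as commonly quoted for the scheme of [1]."""
    a1, a2 = alpha_s(r1), alpha_s(r2)
    bracket = (r**2 / (r1**2 * r2**2)
               + (a1 / a2 - 1.0) / r1**2
               + (a2 / a1 - 1.0) / r2**2)
    return N_C * alpha_s(r) / (2.0 * np.pi**2) * bracket
```

Clamping the argument at `R_FR` rather than branching keeps `alpha_s` continuous and vectorized. Once all three dipole sizes exceed `R_FR`, the coupling ratios in the bracket equal one and both kernels reduce to the fixed coupling form with $\alpha_s = 0.5$.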
4 Results

In this Section, we discuss our numerical results and how they compare to previous numerical work and analytical estimates.

4.1 Running coupling

Fig. 4 shows the solutions of the evolution equation when only the running coupling contribution is taken into account, i.e., neglecting the subtraction term in Eq. (34), for different initial conditions and for the three schemes considered in this work: Balitsky’s, given by Eqs. (7) and (35), KW, given by Eqs. (8) and (36), and the ad hoc parent dipole implementation of the running coupling, shown in Eq. (10).

Figure 4: Solutions of the BK equation at rapidities Y=0, 5, 15 and 30 (curves are labeled from right to left) for the three running coupling schemes considered in this work: KW (solid line), Balitsky (dashed line) and parent dipole (dashed-dotted lines). The initial conditions are MV (top), AN08 (middle) and AN06 (bottom). The horizontal axis is r in GeV^{−1}.

As previously observed in [3, 46], the most relevant effect of including running coupling corrections in the evolution equation is a considerable reduction in the speed of the evolution with respect to the fixed coupling case. This is a common feature of the different running coupling schemes studied here and of other phenomenological ones considered in the literature (a detailed comparison between the solutions for fixed coupling evolution and for parent dipole running coupling can be found e.g. in [3]). This is not a surprising result, since a generic effect of the running of the coupling is to suppress the emission of small transverse size dipoles, which is the leading mechanism driving the evolution. However, despite this common feature of the running coupling solutions, significant differences are found between the solutions obtained under different schemes, as we infer from Fig. 4.
In particular, the evolution is much faster with the KW prescription than with that of Balitsky. Equivalently, the KW prescription yields a stronger growth of the saturation scale with rapidity/energy than Balitsky’s. Moreover, the solutions obtained when the parent dipole prescription is used lie much closer to those obtained within the KW scheme than to the ones obtained when Balitsky’s scheme is applied, contrary to what was suggested in [1]. As argued before, the differences observed in the solutions obtained using the two subtraction schemes are entirely due to neglecting the subtraction contribution and reflect the arbitrariness of the separation procedure.

4.2 Geometric scaling

It has been found in previous analytical [34, 36, 47] and numerical studies of the solutions of the BK equation at leading order [3, 48–50] and for different heuristic implementations of next-to-leading order corrections [3,46], including the parent dipole prescription for the running coupling also considered in this work, that the solutions of the evolution equation at high enough rapidities are no longer a function of two separate variables r and Y , but rather depend on a single scaling variable, τ = r Qs(Y ). This feature of the evolution, commonly referred to as geometric scaling, is an exact property of the solutions for fixed coupling evolution due to the conformal invariance of the leading-log kernel, and has become one of the key connections between the saturation-based formalisms and the phenomenology of heavy ion collisions and deep inelastic scattering experiments [51–57]. It can be seen from Fig. 5 that the solutions of the BK equation with the running coupling terms discussed in the previous section also exhibit the property of scaling, in agreement with the analytical study carried out in [38]: the rescaled high rapidity solutions lie on a single curve which is independent of both the running coupling scheme and the initial condition.
The scaling behavior of the solution is observed in the whole τ range studied in this work, including the saturation region, τ > 1. The tiny deviations from a pure scaling behavior observed in Fig. 5 may be attributed to the fact that the full asymptotic behavior is reached at even larger rapidities (Y ≳ 80, [3]) than those achieved by the numerical solution performed in this work. Remarkably, the scaling function for both KW and Balitsky’s scheme coincides with the one obtained with the parent dipole prescription, up to the above mentioned scaling violations. It has been observed in [3, 46, 48] that the scaling function differs significantly in the fixed and running coupling cases. Following that work, and to make a more quantitative study of the scaling property, we fitted our solutions to the functional form [34]
$f(\tau) = a\, \tau^{2\gamma} \left( \ln \tau^2 + b \right)$, (46)
with a, b and γ free parameters, within a fixed window below the saturation region, τ ∈ [10^{−5}, 0.1]. Notably, at large enough rapidities the whole fitting window lies within the geometric scaling window proposed in [36]: (Λ/Qs(Y )) < τ < 1, where Λ is some initial scale.

Figure 5: Solutions of the BK equation at rapidities Y=0 and 40 for KW (solid line), Balitsky (dashed line) and parent dipole (dashed-dotted lines) schemes plotted versus the scaling variable τ = rQs(Y ). The asymptotic solution obtained with fixed coupling αs = 0.2 at Y = 40 in [3] is shown (black dashed-dotted line) for comparison. The initial conditions are MV (left) and AN06 (right).

The value of γ extracted from the fits at rapidity Y = 40 lies between γ ∼ 0.8 and γ ∼ 0.9. This conclusion holds for the three initial conditions used here: the anomalous dimension seems to converge to some intermediate value, in agreement with the value found in [3] for asymptotic running coupling solutions (γ ∼ 0.85 at Y = 70).
This result for the anomalous dimension is very far from the value obtained in [3] for fixed coupling solutions (γ ∼ 0.64 at Y = 70) and from the anomalous dimension predicted for both running and fixed coupling solutions by analytical studies of the equation based on saddle point techniques [34–38], γc = χ(γc)/χ′(γc) = 0.6275, where χ is the leading-log BFKL kernel. It might be argued that the numerical value of the anomalous dimension extracted from our fits is conditioned by the choice of the fitting function and by the fitting interval. Actually, it was shown in [3] that the solutions of the evolution could be well fitted by other functional forms, including the double-leading-log solution of BFKL, within a fitting region similar to the one considered in this work. On the other hand, several phenomenological parameterizations of the solution of the evolution have been proposed in [54–57] and have been successfully confronted with HERA and RHIC experimental data.

Figure 6: Asymptotic solutions (Y=40) of the evolution equation for running coupling (solid line) and fixed coupling with αs = 0.2 (dashed line). A fit to a power-law function a τ^{2γ} in the region τ ∈ [10^{−6}, 10^{−2}] yields γ ≈ 0.85 for the running coupling solution and γ ≈ 0.6 for the fixed coupling one. (Plot legend: ∼ τ^{2·0.6} for fixed coupling, ∼ τ^{2·0.85} for running coupling.)

There, the dipole scattering amplitude at arbitrary rapidity is assumed to be given by a functional form analogous to our ansatz for the initial condition Eq. (43), but allowing for geometric scaling violations by replacing γ → γ(r, Y ). The value of the anomalous dimension at r = 1/Qs and/or for Y → ∞ is fixed to be the BFKL saddle point, γc ∼ 0.63 (the saddle point value considered in [57] is slightly different, γ ∼ 0.53), while the value γ = 1 is recovered in the limit r → ∞ at any finite rapidity.
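The pure power-law fit quoted in the caption of Fig. 6 amounts to a straight-line fit in log-log space, since ln f = ln a + 2γ ln τ. The following minimal sketch shows the idea on synthetic data; the function name, the sampling of the window τ ∈ [10^{−6}, 10^{−2}] and the test values a = 0.3, γ = 0.85 are assumptions made for the illustration and are not taken from the actual solutions.

```python
import math


def fit_power_law(taus, values):
    """Least-squares fit of ln f = ln a + 2*gamma*ln tau; returns (a, gamma)."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(mean_y - slope * mean_x), 0.5 * slope


# Synthetic scaling-function tail, sampled uniformly in log10(tau).
taus = [10 ** (-6 + 4 * i / 40) for i in range(41)]
synthetic = [0.3 * t ** (2 * 0.85) for t in taus]
a_fit, gamma_fit = fit_power_law(taus, synthetic)
```

Because the fit is linear in ln τ, the extracted γ is a window-averaged logarithmic slope, which is why the choice of fitting interval matters for the comparison with the saddle point value.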
The success of these phenomenological works supports the claim that the anomalous dimension of the solution is given by the BFKL saddle point, in agreement with the above mentioned analytical predictions. However, the values of momenta probed in current phenomenological applications are very different from the fitting region considered here. For example, the inclusive structure function measured at HERA is fitted in [54,56] within the region 0.045 GeV^2 < Q^2 < 45 GeV^2, whereas charged hadron pt spectra in dAu collisions are well reproduced by [55–57] in the region 1 GeV < pt < 4.5 GeV. Note that, for both sets of data, the measured regions overlap with the deeply saturated domain of the solution. On the contrary, our fitting region 10^{−5} < τ < 1 corresponds to values of momenta ∼ 10Qs(Y ) < pt < 10^5 Qs(Y ) (always well above the saturation scale), with Qs(Y = 40) ∼ 500÷1000 GeV for the different running coupling schemes considered and, therefore, has no overlap with the kinematic regions measured experimentally, since we scrutinize a momentum region strongly shifted to the ultraviolet compared to currently available data. Moreover, it should be noticed that the rapidity interval covered by both sets of experimental data is ∆Y < 4, while we study the solutions of the evolution at asymptotic rapidities, Y ∼ 40.

Figure 7: Subtraction contribution calculated in the KW scheme (triangles) and in Balitsky’s (stars). The trial functions correspond to the solutions of the evolution under Balitsky’s running coupling scheme at rapidities Y = 0, 30 for MV (left), AN08 (center) and AN06 (right) initial conditions.
We have checked that shifting our fitting region to larger values of τ (smaller momentum) would bring the value of γ extracted from our fits closer to the saddle point BFKL one, since the transition from the ultraviolet region to the deeply saturated domain of the scaling solution is realized by a locally less steep function (see Figs. 5 and 12). Therefore, there is no contradiction at all between the success of the phenomenological parameterizations of the solutions and the results reported here. With the above clarifications we reach the following conclusion: the asymptotic scaling solutions corresponding to fixed and running coupling evolution are intrinsically different in the whole r-range. This is emphasized in Fig. 6, where we represent the scaling solutions in a log scale for τ < 1. It is clear that the tail of the distribution falls off with decreasing τ much more steeply for the running coupling solution than for the fixed coupling one. A fit to a pure power-law function, f = a τ^{2γ}, in the region τ ∈ [10^{−6}, 10^{−2}] yields γ ∼ 0.85 for the running coupling and γ ∼ 0.61 for the fixed coupling solution. The differences between fixed and running coupling solutions at τ > 1 are evident from Fig. 5. This is a puzzling result that remains to be understood from purely analytical methods.

Figure 8: Ratio of the subtraction over the running terms, D(r, Y ) = S[N(r, Y )]/R[N(r, Y )], calculated in both KW (triangles) and Balitsky (stars) schemes for MV (left), AN08 (middle) and AN06 (right) initial conditions at rapidities Y=0 (top) and Y=30 (bottom).

4.3 Subtraction Term

Before attempting to solve the complete evolution equation, and in order to gain insight into the nature and structure of the subtraction contribution, we first evaluate the subtraction functional for both Balitsky, Eq. (40), and KW, Eq.
(41), schemes using a set of trial functions for S which we choose to consist of the solutions of the evolution equation with the running coupling in Balitsky’s scheme at different rapidities and of the three initial conditions considered above in this work. Two main remarks can be made about our results, shown in Fig. 7: i) For all the trial functions considered in this work, the subtraction contribution is much larger in the KW scheme than in Balitsky’s. A plausible explanation for this is that Balitsky’s subtraction contribution, Eq. (40), when expanded in terms of dipole scattering amplitudes, N = 1 − S, reduces to a sum of non-linear terms, since all the linear terms in the expansion cancel each other due to the z1 ↔ z2 symmetry of the kernel, whereas in the KW case no such cancellation happens and the subtraction contribution, Eq. (41), also includes linear terms, which are dominant over the non-linear ones in the non-saturated domain where N ≪ 1. ii) The subtraction contribution S has the same sign as the running coupling contribution R in the whole τ range which, together with the relative minus sign assigned to the subtraction term in Eq. (34), implies that the proper inclusion of the subtraction term reduces the value of the functional that governs the evolution, F . In other words: the subtraction contribution tends to systematically slow down the evolution, as we shall explicitly confirm in the next subsection. To better quantify the size of the subtraction contribution, we plot the ratio D(r, Y ) ≡ S[N(r, Y )]/R[N(r, Y )] in Fig. 8. At Y = 0, the relative weight of the subtraction contribution with respect to the running one within the KW scheme and for a MV initial condition goes from a D ∼ 0.4 at small τ to D ∼ 1 at τ ∼ 1. The same ratio for the Balitsky scheme takes significantly smaller values: it goes from D ∼ 0.1 at small τ to D ∼ 0.4 for τ ∼ 1. As the evolved solutions get closer to the scaling function, i.e. 
for larger rapidities, the r dependence of the ratio becomes flatter and its overall normalization goes down to an approximately constant value D ∼ 0.15 for the KW scheme and D ∼ 0.025 for that of Balitsky. This behavior remains unaltered when going from rapidity Y = 20 to Y = 30, which suggests that the ratio may saturate to a fixed value in the asymptotic region.

Figure 9: Total kernel F = R−S calculated under Balitsky’s scheme, Eqs. (7) and (40), (solid line) and under the KW scheme, Eqs. (8) and (41), (dashed line). The overlap of the two lines shows the agreement between the two calculations. Triangles stand for the running coupling term calculated in the KW approach, RKW, while stars stand for the running coupling term under Balitsky’s scheme, RBal. The trial functions N(r, Y ) correspond to the solution of the evolution with only running coupling under Balitsky’s scheme at Y=0 (left) and Y=30 (right) for an MV initial condition.

Figure 10: Solutions of the complete (all orders in αs β2) evolution equation given in Eq. (34) (solid lines), and of the equation with Balitsky’s (dashed lines) and KW’s (dashed-dotted) running coupling schemes at rapidities Y = 0, 5 and 10. The left plot uses the MV initial condition. The right plot employs the initial condition given by the dipole amplitude at rapidity Y = 35 evolved using Balitsky’s running coupling scheme and with r-dependence rescaled down such that Qs = Q′s = 1 GeV. The horizontal axis is r in GeV^{−1}.

Finally, we have checked that combining the subtraction and running coupling contributions for both schemes adds up to the same result. This is shown in Fig. 9, where we plot the value of the total functional F = R − S calculated under the KW scheme (Eqs. (8) and (36) for the running coupling term, R, and Eq.
(41) for the subtraction term, S) and under Balitsky’s scheme (Eqs. (7) and (35) for the running coupling term and Eq. (40) for the subtraction term). The two results coincide within the estimation of the numerical accuracy previously discussed. The agreement between the two results is better in the small-τ region, where the two curves lie almost on top of each other. In the saturation region, τ ≳ 1, the agreement is slightly worse, although the differences between the values of F calculated in both schemes are still much smaller than the differences between the running coupling terms themselves. This slight remaining disagreement between Balitsky’s and the KW prescriptions may also be due to inaccuracies in a Fourier transform of a geometric series performed in arriving at Eq. (39). This result serves as a cross-check of our numerical method and as an additional confirmation of the agreement of the independent calculations derived in [1, 2].

Figure 11: Saturation scale corresponding to the solutions plotted in Fig. 10. (Panels: MV initial condition and scaling function initial condition; curves: KW running coupling, Balitsky running coupling and F = Run − Sub; axis units GeV^2.)

4.4 Complete running coupling BK equation

In this section we calculate the solutions of the complete evolution equation, Eq. (34), including both the running and subtraction terms obtained by the all-orders αs Nf resummation and by the Nf → −6πβ2 replacement. Since the numerical evaluation of the subtraction contribution at each point of the grid and each step of the evolution would require an exceedingly large amount of CPU time, the strategy followed to include it in the evolution equation consists of calculating this contribution only at a small set of grid points at each step of the evolution, which we fixed at n = 16, between the points r1 and r2, which are determined at each step of the evolution by the conditions N(Y, r1) = 10^{−9} and N(Y, r2) = 0.99, and then using power-law interpolation and extrapolation to the other points of the grid.
Both the running and subtraction terms are calculated within Balitsky’s scheme. This procedure is motivated by the fact that, as discussed in the previous section, the subtraction contribution can be regarded as a small perturbation with respect to the running coupling term within Balitsky’s scheme and by the fact that it is a rather smooth function that can be well fitted by a power-law function in most of the r-range. The accuracy of this procedure has been checked by doubling the number of points at which the subtraction contribution is calculated at each step of the evolution, i.e. by setting n=32. At Y=2, the differences between the solutions obtained with the two above mentioned choices for n were less than 8% in the tail of the solution, r < r1, and less than 3% for r > r1. The results of the evolution calculated in this way and using MV and rescaled asymptotic running coupling solution (Y=35) as initial conditions are plotted in Fig. 10.

Figure 12: Rescaled solutions given by the complete αs β2-evolution equation (solid line) and for KW (dashed-dotted line) and Balitsky’s (dashed line) running coupling schemes at Y = 10. The initial condition corresponds to the dipole amplitude at rapidity Y = 35 evolved using Balitsky’s running coupling scheme and with r-dependence rescaled down such that Qs = Q′s = 1 GeV.

They confirm the expectations raised in the previous Subsection: the inclusion of the subtraction terms considerably slows down the evolution with respect to the sole consideration of the running coupling contributions. Moreover, the reduction in the speed of the wave front is much larger for the KW scheme than for Balitsky’s, for both initial conditions. However, the closer the initial condition is to the asymptotic running coupling scaling function, the smaller are the effects of the subtraction contribution.
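The sparse-evaluation strategy described above (evaluate the subtraction term at n nodes between r1 and r2, then extend it to the full grid by power laws) can be sketched as follows. This is a minimal illustration, assuming logarithmically spaced nodes, piecewise power laws between neighbouring nodes, and a strictly positive subtraction term; these choices and the function names are assumptions of the example and are not specified in the text.

```python
import math
from bisect import bisect_right


def log_nodes(r1, r2, n=16):
    """n nodes equally spaced in ln(r) between r1 and r2 (assumed spacing)."""
    return [r1 * (r2 / r1) ** (i / (n - 1)) for i in range(n)]


def power_law_interp(r_nodes, s_nodes, r):
    """Piecewise power-law (log-log linear) interpolation of positive values.

    Outside [r_nodes[0], r_nodes[-1]] the power law of the nearest end
    segment is continued, which provides the extrapolation.
    """
    i = bisect_right(r_nodes, r) - 1
    i = max(0, min(i, len(r_nodes) - 2))   # clamp to a valid segment
    x0, x1 = math.log(r_nodes[i]), math.log(r_nodes[i + 1])
    y0, y1 = math.log(s_nodes[i]), math.log(s_nodes[i + 1])
    slope = (y1 - y0) / (x1 - x0)
    return math.exp(y0 + slope * (math.log(r) - x0))


# Toy subtraction term that is an exact power law, S(r) = 2 r^1.5.
nodes = log_nodes(1e-3, 10.0)
values = [2.0 * r ** 1.5 for r in nodes]
inside = power_law_interp(nodes, values, 0.37)
below = power_law_interp(nodes, values, 1e-5)
above = power_law_interp(nodes, values, 40.0)
```

For an exact power law the scheme reproduces the function everywhere, including outside the node range; for a realistic subtraction term the quality of the extrapolation depends on how close the tails are to power laws, which is what the n = 16 versus n = 32 comparison quoted above probes.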
These features can be better quantified by inspecting the rapidity dependence of the saturation scale generated by the evolution, plotted in Fig. 11. At rapidity Y = 10 the ratio of the saturation scale Qs yielded by the KW scheme to Qs given by the complete αs β2-evolution equation is ∼ 2.5 for the MV initial condition and ∼ 2.1 for the asymptotic running coupling initial condition. At the same rapidity, the ratio of the saturation scale obtained under Balitsky’s scheme to Qs corresponding to the complete αs β2-evolution is ∼ 1.25 for the MV initial condition and ∼ 1.15 for the scaling function initial condition. Thus, in spite of the smallness of the ratio of the subtraction terms to the running coupling contributions at high rapidity, which is ∼ 0.025 for Balitsky’s and ∼ 0.15 for the KW scheme at Y = 30 (see bottom plots in Fig. 8), the proper inclusion of the subtraction term results in fairly sizable effects in the solutions of the evolution equation. Finally, we notice that the scaling behavior of the solution is not affected by the subtraction term. This is seen in Fig. 12, where we evolve starting from an initial condition already close to the running coupling scaling function and plot the solutions of the evolution equation obtained with just running coupling terms (see Section 4.1) and the solution of the complete αs β2-evolution at rapidity Y = 10. It is clear that, within the numerical accuracy, no departure from the scaling behavior is observed. Therefore the main effect of a proper consideration of the subtraction term is to reduce the speed of the evolution. It does not violate or modify the geometric scaling property of the solutions established in Section 4.2.
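The saturation scales compared above follow from the condition of Eq. (44), N(r = 1/Qs, Y ) = κ with κ = 0.5. A minimal sketch of how Qs could be extracted from an amplitude is given below; it assumes that N is monotonically increasing in r on the grid range and uses bisection in ln r. The function name is hypothetical, and the test amplitude uses the assumed AN form 1 − exp[−(r Q′s)^{2γ}/4] with Q′s = 1 GeV purely as an example.

```python
import math


def saturation_scale(amplitude, kappa=0.5, r_lo=1e-8, r_hi=50.0, iters=200):
    """Solve amplitude(r) = kappa by bisection in ln(r); return Qs = 1/r.

    Assumes amplitude(r) increases monotonically from below kappa at r_lo
    to above kappa at r_hi.
    """
    lo, hi = math.log(r_lo), math.log(r_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if amplitude(math.exp(mid)) < kappa:
            lo = mid          # root lies at larger r
        else:
            hi = mid          # root lies at smaller r
    return 1.0 / math.exp(0.5 * (lo + hi))


def an06(r):
    """Assumed AN initial condition with gamma = 0.6 and Qs' = 1 GeV."""
    return -math.expm1(-0.25 * r ** 1.2)


qs_an06 = saturation_scale(an06)
```

For this assumed form the condition can be solved in closed form, Qs = Q′s (4 ln 2)^{−1/(2γ)}, which makes explicit that the scale defined through Eq. (44) differs from the parameter Q′s of the initial condition by a factor of order one.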
In our understanding geometric scaling appears to persist when the running coupling effects are included because, at high enough rapidity, Qs(Y ) ≫ Λ, such that the new (from the LO standpoint) momentum scale Λ introduced by the running coupling can be safely neglected. Hence the dynamics is again characterized by a single momentum scale Qs(Y ). At the same time running coupling does modify the evolution kernel, leading to a different shape of the scaling function.

5 Conclusions

In this paper we have taken into account all corrections to the kernels of the non-linear JIMWLK and BK evolution equations containing powers of αs Nf . We reiterated the fact that the separation of the resulting kernel resumming all powers of αs Nf into the running coupling and subtraction parts, as done in the previous calculations of [1, 2], is not justified parametrically. We have then performed a numerical analysis with the following conclusions.

• First we solved the evolution equations derived in [1] and [2] keeping only the running coupling part of the evolution kernel and neglecting the subtraction term. Comparing to the results for fixed coupling obtained in [3] we confirmed the conclusion reached in [3] that the growth with rapidity is substantially reduced when running coupling corrections are included. The results for three different initial conditions are shown in Fig. 4. We observe that the solution of the equation derived in [2] differs significantly from that derived in [1], but agrees (with good numerical accuracy) with the solution of the BK evolution equation with the coupling running at the parent dipole size. (The latter is just a model of the running coupling not resulting from any calculations, which we plot for illustrative purposes.)
We also observe that at sufficiently high rapidity both equations from [1] and from [2] give us the same scaling function for the dipole amplitude N(r, Y ) as a function of r Qs(Y ), which is also in agreement with the scaling function given by the parent dipole running, as shown in Fig. 5. The fact that the scaling is preserved when the running coupling corrections are included was previously established in [3], though for models of running coupling only. The shape of the scaling function is very different from that obtained from the fixed coupling evolution equations. In particular, we found that for dipole sizes below 0.1/Qs the anomalous dimension of the scaling function in the running coupling case becomes γ ≈ 0.85 (see Fig. 6). This is different from the result of several analytical estimates [34–38], which expect the anomalous dimension not to change when running coupling corrections are included and to remain at its fixed coupling value of γ ≈ 0.63. • We have then evaluated the subtraction term for both calculations performed in [1] and [2]. We demonstrated that subtracting the subtraction terms from the running coupling terms makes the full answer agree for both calculations of [1] and [2], as shown in Fig. 9 for the right hand side of the evolution equation. It turns out that the subtraction term SBal[S], which has to be subtracted from the result of [1], is systematically smaller than SKW[S], to be subtracted from the result of [2], over the whole rapidity range studied here. This implies that the result of [1] should have a smaller correction than the result of [2] and is thus closer to the full answer. The subtraction terms SBal[S] and SKW[S] are plotted in Fig. 7 as functions of the dipole size r for different values of rapidity. Their relative contributions to the evolution kernel are shown in Fig. 8, where we plotted the subtraction functional divided by the running coupling functional. 
From those figures we conclude that both the magnitude of these extra terms and their relative contribution to the evolution kernel decrease with increasing rapidity. Hence, while at “moderate” rapidities (the ones closer to realistic experimental values) the subtraction term is important for both calculations [1, 2], it becomes increasingly less important at asymptotically large rapidities. The physics is easy to understand: the subtraction terms are O(αs^2), while the running coupling part of the kernel is O(αs). Hence, if we suppose that the effective value of the coupling is given by its magnitude at the saturation scale Qs(Y ), then, as rapidity increases, the coupling would decrease, making the subtraction term much smaller than the running coupling term. Indeed, while at asymptotically high rapidities the assumption of [1,2] that the subtraction term could be neglected is justified, making the results of [1] and [2] agree with each other, for rapidities relevant to modern-day experiments the subtraction term is numerically important.

• With the last conclusion in mind we continued by numerically solving the full evolution equation resumming all powers of αs Nf in the evolution kernel, which now combines both the running coupling and the subtraction terms. The five-dimensional integral in the subtraction term, Eq. (37), made obtaining this solution rather difficult. The outcome of the calculation is shown in Fig. 10. All the main conclusions stated above were again confirmed by the solution of the full equation. At asymptotically high rapidity the scaling regime is recovered, as can be seen from Fig. 12. As the subtraction term is less important in that regime, the scaling function appears to be the same as in the case of having only the running coupling term in the kernel. The anomalous dimension again turns out to be γ ≈ 0.85, in disagreement with the analytical expectations of [34–38].
However, the scaling of the saturation scale with rapidity appears to be in agreement with the expectations of the analytical work of [34, 35, 38], as shown in Fig. 11.

We conclude by observing that the knowledge of the non-linear small-x evolution equation with all the running coupling corrections included brings us to an unprecedented level of precision, allowing for a much more detailed comparison with experiments than was ever possible before.

Acknowledgments

We would like to thank Heribert Weigert for many informative and helpful discussions at the beginning of this work. A portion of the performed work was motivated by stimulating discussions with Robi Peschanski, which we gratefully acknowledge. This research is sponsored in part by the U.S. Department of Energy under Grant No. DE-FG02-05ER41377. This work was supported in part by an allocation of computing time from the Ohio Supercomputer Center.

References

[1] I. I. Balitsky, Quark Contribution to the Small-x Evolution of Color Dipole, Phys. Rev. D 75 (2007) 014001, [hep-ph/0609105]. [2] Y. Kovchegov and H. Weigert, Triumvirate of Running Couplings in Small-x Evolution, Nucl. Phys. A 784 (2007) 188–226, [hep-ph/0609090]. [3] J. L. Albacete, N. Armesto, J. G. Milhano, C. A. Salgado, and U. A. Wiedemann, Numerical analysis of the Balitsky-Kovchegov equation with running coupling: Dependence of the saturation scale on nuclear size and rapidity, Phys. Rev. D71 (2005) 014003, [hep-ph/0408216]. [4] E. A. Kuraev, L. N. Lipatov, and V. S. Fadin, The Pomeranchuk singularity in non-Abelian gauge theories, Sov. Phys. JETP 45 (1977) 199–204. [5] Y. Y. Balitsky and L. N. Lipatov, Sov. J. Nucl. Phys. 28 (1978) 822. [6] J. Jalilian-Marian, A. Kovner, A. Leonidov, and H. Weigert, The BFKL equation from the Wilson renormalization group, Nucl. Phys. B504 (1997) 415–431, [hep-ph/9701284]. [7] J. Jalilian-Marian, A. Kovner, A. Leonidov, and H.
Weigert, The Wilson renormalization group for low x physics: Towards the high density regime, Phys. Rev. D59 (1998) 014014, [hep-ph/9706377]. [8] J. Jalilian-Marian, A. Kovner, and H. Weigert, The Wilson renormalization group for low x physics: Gluon evolution at finite parton density, Phys. Rev. D59 (1998) 014015, [hep-ph/9709432]. [9] J. Jalilian-Marian, A. Kovner, A. Leonidov, and H. Weigert, Unitarization of gluon distribution in the doubly logarithmic regime at high density, Phys. Rev. D59 (1999) 034007, [hep-ph/9807462]. [10] A. Kovner, J. G. Milhano, and H. Weigert, Relating different approaches to nonlinear QCD evolution at finite gluon density, Phys. Rev. D62 (2000) 114005, [hep-ph/0004014]. [11] H. Weigert, Unitarity at small Bjorken x, Nucl. Phys. A703 (2002) 823–860, [hep-ph/0004044]. [12] E. Iancu, A. Leonidov, and L. D. McLerran, Nonlinear gluon evolution in the color glass condensate. I, Nucl. Phys. A692 (2001) 583–645, [hep-ph/0011241]. [13] E. Ferreiro, E. Iancu, A. Leonidov, and L. McLerran, Nonlinear gluon evolution in the color glass condensate. II, Nucl. Phys. A703 (2002) 489–538, [hep-ph/0109115]. [14] I. Balitsky, Operator expansion for high-energy scattering, Nucl. Phys. B463 (1996) 99–160, [hep-ph/9509348]. [15] I. Balitsky, Operator expansion for diffractive high-energy scattering, hep-ph/9706411. 
[16] I. Balitsky, Factorization and high-energy effective action, Phys. Rev. D60 (1999) 014020, [hep-ph/9812311]. [17] Y. V. Kovchegov, Small-x F2 structure function of a nucleus including multiple pomeron exchanges, Phys. Rev. D60 (1999) 034008, [hep-ph/9901281]. [18] Y. V. Kovchegov, Unitarization of the BFKL pomeron on a nucleus, Phys. Rev. D61 (2000) 074018, [hep-ph/9905214]. [19] L. V. Gribov, E. M. Levin, and M. G. Ryskin, Singlet structure function at small x: Unitarization of gluon ladders, Nucl. Phys. B188 (1981) 555–576. [20] A. H. Mueller and J.-w. Qiu, Gluon recombination and shadowing at small values of x, Nucl. Phys. B268 (1986) 427. [21] L. D. McLerran and R. Venugopalan, Green’s functions in the color field of a large nucleus, Phys. Rev. D50 (1994) 2225–2233, [hep-ph/9402335]. [22] L. D. McLerran and R. Venugopalan, Gluon distribution functions for very large nuclei at small transverse momentum, Phys. Rev. D49 (1994) 3352–3355, [hep-ph/9311205]. [23] L. D. McLerran and R.
Venugopalan, Computing quark and gluon distribution functions for very large nuclei, Phys. Rev. D49 (1994) 2233–2241, [hep-ph/9309289]. [24] Y. V. Kovchegov, Non-Abelian Weizsaecker-Williams field and a two-dimensional effective color charge density for a very large nucleus, Phys. Rev. D54 (1996) 5463–5469, [hep-ph/9605446]. [25] Y. V. Kovchegov, Quantum structure of the non-Abelian Weizsaecker-Williams field for a very large nucleus, Phys. Rev. D55 (1997) 5445–5455, [hep-ph/9701229]. [26] J. Jalilian-Marian, A. Kovner, L. D. McLerran, and H. Weigert, The intrinsic glue distribution at very small x, Phys. Rev. D55 (1997) 5414–5428, [hep-ph/9606337]. [27] E. Iancu and R. Venugopalan, The color glass condensate and high energy scattering in QCD, hep-ph/0303204. [28] H. Weigert, Evolution at small xbj: The Color Glass Condensate, Prog. Part. Nucl. Phys. 55 (2005) 461–565, [hep-ph/0501087]. [29] J. Jalilian-Marian and Y. V. Kovchegov, Saturation physics and deuteron gold collisions at RHIC, Prog. Part. Nucl. Phys. 56 (2006) 104–231, [hep-ph/0505052]. [30] E. Gardi, J. Kuokkanen, K. Rummukainen, and H. Weigert, Running coupling and power corrections in nonlinear evolution at the high-energy limit, Nucl. Phys. A784 (2007) 282–340, [hep-ph/0609087]. [31] Y. V. Kovchegov and H. Weigert, Quark loop contribution to BFKL evolution: Running coupling and leading-N(f) NLO intercept, Nucl. Phys. A789 (2007) 260, [hep-ph/0612071].
[32] G. P. Lepage and S. J. Brodsky, Exclusive processes in perturbative quantum chromodynamics, Phys. Rev. D22 (1980) 2157. [33] S. J. Brodsky, H.-C. Pauli, and S. S. Pinsky, Quantum chromodynamics and other field theories on the light cone, Phys. Rept. 301 (1998) 299–486, [hep-ph/9705477]. [34] A. H. Mueller and D. N. Triantafyllopoulos, The energy dependence of the saturation momentum, Nucl. Phys. B640 (2002) 331–350, [hep-ph/0205167]. [35] D. N. Triantafyllopoulos, The energy dependence of the saturation momentum from RG improved BFKL evolution, Nucl. Phys. B648 (2003) 293–316, [hep-ph/0209121]. [36] E. Iancu, K. Itakura, and L. McLerran, Geometric scaling above the saturation scale, Nucl. Phys. A708 (2002) 327–352, [hep-ph/0203137]. [37] S. Munier and R. Peschanski, Universality and tree structure of high energy QCD, Phys. Rev. D70 (2004) 077503, [hep-ph/0401215]. [38] G. Beuf and R.
Peschanski, Universality of QCD traveling-waves with running coupling, hep-ph/0702131. [39] A. H. Mueller, Soft gluons in the infinite momentum wave function and the BFKL pomeron, Nucl. Phys. B415 (1994) 373–385. [40] A. H. Mueller and B. Patel, Single and double BFKL pomeron exchange and a dipole picture of high-energy hard processes, Nucl. Phys. B425 (1994) 471–488, [hep-ph/9403256]. [41] A. H. Mueller, Unitarity and the BFKL pomeron, Nucl. Phys. B437 (1995) 107–126, [hep-ph/9408245]. [42] Z. Chen and A. H. Mueller, The dipole picture of high-energy scattering, the BFKL equation and many gluon compound states, Nucl. Phys. B451 (1995) 579–604. [43] S. Munier and R. Peschanski, Traveling wave fronts and the transition to saturation, Phys. Rev. D69 (2004) 034008, [hep-ph/0310357]. [44] S. J. Brodsky, G. P. Lepage, and P. B. Mackenzie, On the elimination of scale ambiguities in perturbative quantum chromodynamics, Phys. Rev. D28 (1983) 228. [45] K. Golec-Biernat and M. Wüsthoff, Saturation effects in deep inelastic scattering at low Q2 and its implications on diffraction, Phys. Rev. D59 (1998) 014017, [hep-ph/9807513]. [46] M. A. Braun, Pomeron fan diagrams with an infrared cutoff and running coupling, Phys. Lett. B 576 (2003) 115, [hep-ph/0308320]. [47] S. Munier and R. Peschanski, Geometric scaling as traveling waves, Phys. Rev. Lett. 91 (2003) 232001, [hep-ph/0309177]. 
[48] J. L. Albacete, N. Armesto, A. Kovner, C. A. Salgado, and U. A. Wiedemann, Energy dependence of the Cronin effect from non-linear QCD evolution, Phys. Rev. Lett. 92 (2004) 082001, [hep-ph/0307179]. [49] M. Lublinsky, Scaling phenomena from non-linear evolution in high energy DIS, Eur. Phys. J. C21 (2001) 513–519, [hep-ph/0106112]. [50] N. Armesto and M. A. Braun, Parton densities and dipole cross-sections at small x in large nuclei, Eur. Phys. J. C20 (2001) 517–522, [hep-ph/0104038]. [51] A. M. Stasto, K. Golec-Biernat, and J. Kwiecinski, Geometric scaling for the total γ∗p cross-section in the low x region, Phys. Rev. Lett. 86 (2001) 596–599, [hep-ph/0007192]. [52] N. Armesto, C. A. Salgado, and U. A. Wiedemann, Relating high-energy lepton hadron, proton nucleus and nucleus nucleus collisions through geometric scaling, hep-ph/0407018. [53] J. L. Albacete, N. Armesto, J. G. Milhano, C. A. Salgado, and U. A. Wiedemann, Nuclear size and rapidity dependence of the saturation scale from QCD evolution and experimental data, Eur. Phys. J. C43 (2005) 353–360, [hep-ph/0502167]. [54] E.
Iancu, K. Itakura, and S. Munier, Saturation and BFKL dynamics in the HERA data at small x, Phys. Lett. B590 (2004) 199–208, [hep-ph/0310338]. [55] A. Dumitru, A. Hayashigaki, and J. Jalilian-Marian, Geometric scaling violations in the central rapidity region of d + Au collisions at RHIC, Nucl. Phys. A770 (2006) 57–70, [hep-ph/0512129]. [56] V. P. Goncalves, M. S. Kugeratski, M. V. T. Machado, and F. S. Navarra, Saturation physics at HERA and RHIC: An unified description, Phys. Lett. B643 (2006) 273–278, [hep-ph/0608063]. [57] D. Kharzeev, Y. V. Kovchegov, and K. Tuchin, Nuclear modification factor in d + Au collisions: Onset of suppression in the color glass condensate, Phys. Lett. B599 (2004) 23–31, [hep-ph/0405045].
ABSTRACT We study the solution of the nonlinear BK evolution equation with the recently calculated running coupling corrections [hep-ph/0609105, hep-ph/0609090].
Performing a numerical solution we confirm the earlier result of [hep-ph/0408216] that the high energy evolution with the running coupling leads to a universal scaling behavior for the dipole scattering amplitude. The running coupling corrections calculated recently significantly change the shape of the scaling function as compared to the fixed coupling case leading to a considerable increase in the anomalous dimension and to a slow-down of the evolution with rapidity. The difference between the two recent calculations is due to an extra contribution to the evolution kernel, referred to as the subtraction term, which arises when running coupling corrections are included. These subtraction terms were neglected in both recent calculations. We evaluate numerically the subtraction terms for both calculations, and demonstrate that when the subtraction terms are added back to the evolution kernels obtained in the two works the resulting dipole amplitudes agree with each other! We then use the complete running coupling kernel including the subtraction term to find the numerical solution of the resulting full non-linear evolution equation with the running coupling corrections. Again the scaling regime is recovered at very large rapidity. <|endoftext|><|startoftext|> Anomalous c-axis transport in layered metals D. B. Gutman and D. L. Maslov Department of Physics, University of Florida, Gainesville, FL 32611, USA (Dated: November 4, 2018) Transport in metals with strongly anisotropic single-particle spectrum is studied. Coherent band transport in all directions, described by the standard Boltzmann equation, is shown to withstand both elastic and inelastic scattering as long as EF τ ≫ 1. A model of phonon-assisted tunneling via resonant states located in between the layers is suggested to explain a non-monotonic temperature dependence of the c-axis resistivity observed in experiments. 
PACS numbers: 72.10.-d, 72.10.Di

Electron transport in layered materials exhibits a number of unusual properties. The most striking example is a qualitatively different behavior of the in-plane (ρab) and out-of-plane (ρc) resistivities: whereas the temperature dependence of ρab is metallic-like, that of ρc is either insulating-like or even non-monotonic. At the level of non-interacting electrons, layered systems are metals with strongly anisotropic Fermi surfaces. A commonly used model is free motion along the planes and nearest-neighbor hopping between the planes:

$$\varepsilon_{\mathbf{k}} = \frac{k_{\parallel}^{2}}{2m_{ab}} + 2J\left(1-\cos k_{\perp}d\right), \qquad (1)$$

where k∥ and k⊥ are the in-plane and c-axis components of momentum, respectively, mab is the in-plane mass, and d is the lattice constant in the c-axis direction. For the strongly anisotropic case (J ≪ EF), the equipotential surfaces are “corrugated cylinders” (see Fig. 1).

If the Hamiltonian consists of the band motion with spectrum (1) and the interaction of electrons with potential disorder as well as with inelastic degrees of freedom, e.g., phonons, the Boltzmann equation predicts that the conductivities are given by

$$\sigma^{B}_{ab} = e^{2}\nu\langle v_{a}v_{b}\tau_{\mathrm{tr}}\rangle, \qquad \sigma^{B}_{c} = 4e^{2}\nu J^{2}d^{2}\langle \sin^{2}(k_{\perp}d)\,\tau_{\mathrm{tr}}\rangle, \qquad (2)$$

where ⟨. . .⟩ denotes averaging over the Fermi surface and over the thermal (Fermi) distribution, ν = mab/πd is the density of states, and τtr is the transport time, resulting from all scattering processes (we set ħ = kB = 1). If τtr decreases with the temperature, both σab and σc are expected to decrease with T as well. This is not what the experiment shows.

The c-axis puzzle received a lot of attention in connection to the HTC materials [1], and a non-Fermi-liquid nature of these materials was suggested to be responsible for the anomalous c-axis transport [2]. However, other materials, such as graphite [3], TaS2 [4], Sr2RuO4 [5], organic metals [6], etc., behave as canonical Fermi liquids in all aspects but the c-axis transport.
This suggests that the origin of the effect is not related to the specific properties of HTC compounds but is common to all layered materials. A large number of models were proposed to explain the c-axis puzzle. Despite this variety, most authors seem to agree that the coherent band transport in the c-axis direction is destroyed. Although there is no agreement as to what replaces the band transport in the “incoherent” regime, the most frequently discussed mechanisms include incoherent tunneling between the layers, assisted by either out-of-plane impurities [8, 10, 11, 12] or by coupling to a dissipative environment [13], and polarons [14, 15].

FIG. 1: Fermi surface corresponding to Eq. (1) with Fermi velocity vectors at two different points.

The message of this Letter is two-fold. First, we observe that neither elastic nor inelastic (electron-phonon) scattering can destroy band transport even in a strongly anisotropic metal as long as the familiar parameter EF τ is large. Nothing happens to the Boltzmann conductivities in Eq. (2) except for $\sigma^{B}_{c}$ becoming very small at high temperatures so that other mechanisms, not included in Eq. (2), dominate transport. This observation is in agreement with a recent experiment [7] where a coherent feature (angle-dependent magnetoresistance) was observed in a supposedly incoherent regime. Second, we propose phonon-assisted tunneling through resonant impurities as the mechanism competing with the band transport. As such tunneling provides an additional channel for transport, the total conductivity is [8]

$$\sigma_{c} = \sigma^{B}_{c} + \sigma_{\mathrm{res}}, \qquad (3)$$

where σres is the resonant-impurity contribution. Because σres increases with the temperature, the band channel is short-circuited by the resonant one at high enough temperatures [9]. Accordingly, σc goes through a minimum at a certain temperature (and ρc = 1/σc goes through a maximum).
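The competition between the two parallel channels in Eq. (3) can be illustrated with a toy calculation. The sketch below is not the model of this Letter: the forms σB ∝ 1/T and σres ∝ T, and the constants `a` and `c`, are assumptions chosen only so that one channel falls and the other rises with temperature. It shows that the sum then has a minimum at an intermediate temperature.

```python
import math

# Toy illustration only: the temperature laws below are assumed, not derived.
def sigma_band(T, a=1.0):
    """Hypothetical band (Boltzmann) channel, decreasing with temperature."""
    return a / T

def sigma_res(T, c=1.0e-4):
    """Hypothetical resonant channel, increasing with temperature."""
    return c * T

def sigma_c(T, a=1.0, c=1.0e-4):
    """Total c-axis conductivity: the two channels add in parallel."""
    return sigma_band(T, a) + sigma_res(T, c)

# Scan an integer temperature grid (arbitrary units) for the minimum.
temps = [float(t) for t in range(1, 1001)]
t_min = min(temps, key=sigma_c)

# For these assumed forms the minimum sits at sqrt(a / c).
t_analytic = math.sqrt(1.0 / 1.0e-4)
print(t_min, t_analytic)
```

With these assumed forms the minimum of σc (the maximum of ρc) occurs where the two channels contribute equally; the actual temperature dependences in the Letter are different.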
We consider phonon-assisted tunneling through a wide band of resonant levels distributed uniformly in space. We show that the non-perturbative (in the electron-phonon coupling) version of this theory is in quantitative agreement with the experiment on Sr2RuO4 [5]. Due to a similarity between phonon-assisted tunneling and other problems, in which interaction leads to the formation of a cloud surrounding the electron (such as the polaronic effect and the zero-bias anomaly), many ideas put forward earlier [8, 10, 11, 12, 13, 14, 15] agree with our picture. Nevertheless, we believe that only a combination of resonant impurities and electron-phonon interaction solves the puzzle of c-axis resistivity and provides a microscopic theory for some of the mechanisms considered in prior work. We begin with the discussion of the breakdown (or lack thereof) of the Boltzmann equation.

One may wonder whether the band transport along the c-axis breaks down because the Anderson localization transition occurs in the c-direction whereas the in-plane transport remains metallic. This does not happen, however, because an electron, encountering an obstacle for motion along the c-axis, moves quickly to another point in the plane, where such an obstacle is absent. More formally, it has been shown that the Anderson transition occurs only simultaneously in all directions [16, 17, 18] and only if J is exponentially smaller than 1/τ. Therefore, localization cannot explain the observed behavior.

Refs. [19, 20] suggested the idea of a “coherent-incoherent crossover”. It implies that the coherent band motion breaks down if electrons are scattered faster than they tunnel between adjacent layers, i.e., if Jτ ≪ 1. Consequently, the current in the c-direction is carried via incoherent hops between conducting layers.
It was noted by a number of authors that the assumption about the incoherent nature of the transport does not, by itself, explain the difference in temperature dependences of σab and σc [20, 21]: due to conservation of the in-plane momentum, σc is proportional to τ both in the coherent and incoherent regimes. Nevertheless, the issue of the “coherent-incoherent crossover” poses a fundamentally important question: can scattering destroy band transport only in some directions, if the spectrum is anisotropic enough [22]? We argue here that this is not the case. Since we have already ruled out elastic scattering, this leaves inelastic scattering as a potential culprit. We focus on the case of the electron-phonon interaction as a source of inelastic scattering.

For an isotropic metal, the quantum kinetic equation is derived from the Keldysh equations of motion for the Green’s function via the Prange-Kadanoff procedure [23] for any strength of the electron-phonon interaction. In this Letter, we apply the Prange-Kadanoff theory to metals with strongly anisotropic Fermi surfaces, such as the one in Fig. 1. We show that, exactly as in the isotropic case, the Boltzmann equation holds its standard form as long as EF τe-ph ≫ 1. Since this form does not change between coherent (Jτe-ph ≫ 1) and incoherent (Jτe-ph ≪ 1) regimes, it means that the coherent-incoherent crossover is, in fact, absent.

We adopt the standard Fröhlich Hamiltonian for the deformation-potential interaction with longitudinal acoustic phonons (ωq = sq)

$$\ldots\; a^{\dagger}_{\mathbf{k}+\mathbf{q}}\,a_{\mathbf{k}}\;\ldots \qquad (4)$$

Since tunneling matrix elements are much more sensitive to the increase in the inter-plane distance than the elastic moduli, the anisotropy of phonon spectra in layered materials, albeit significant, is still weaker than the anisotropy of electron spectra (see, e.g., Ref. [24]). Therefore, we treat phonons in the isotropic approximation, and assume that the magnitude of the Fermi velocity is larger than the speed of sound s.
For a static and uniform electric field, the Keldysh component of the electron’s Green function satisfies the Dyson equation L̂GK + [ReΣR,⊗GK ]− + [ΣK ,⊗ReGK ]− [ΣK ,⊗A]+ − [Γ,⊗GK ]+ . (5) Here L̂ = (∂t + v · ∇R + eE · ∇k) is the Liouville op- erator, A = i(GR − GA) is the spectral function, Γ = ΣR − ΣA , and ⊗ denotes the convolution in space and time. Thanks to the Migdal theorem, the self-energy does not depend on electron’s dispersion ξk ≡ εk − EF , and Eq.(5) can be integrated over ξk. This results in an equa- L̂gK + [ReΣR, gK ]− = 2iΣ K − 1 [Γ, gK ]+ (6) for the “distribution function” gK(ǫ, n̂) = GK(ǫ, ξk, n̂)dξk , (7) where n̂ = vk/ |vk| is a local normal to the Fermi surface. We consider a linear dc response, when the self-energy is needed only at equilibrium. Within the Migdal theory, the Matsubara self-energy is given by a single diagram Σ(ǫ, n̂) = − g2 (q)G(ǫ− ω,k− q)D(ω, q) , where the dressed phonon propagator D−1 = D−10 − g is expressed through bare one D0(ω, q) = −s2q2/ ω2 + s2q2 and polarization operator Π which, for EF > 2J, is given by its 2D form Π(ω, q) = −ν 1− |ω|/ v2F q ‖ + ω We assume that the electron-phonon vertex decays on some scale kD shorter than Fermi momentum (kD ≪ kF ). This assumption allows one to linearize the dispersion ξk−q ≈ ξk − vk · q and simplifies the analysis without changing the results qualitatively. As long as J ≪ EF , we have |vk| ≈ kF /mab ≈ vF , where kF is the radius of the cylinder in Fig. 1 for J = 0. Despite the fact that the electron velocity does have a small component along the c-axis, its in-plane component is large (cf. Fig. 1). Since it is the magnitude of vk that controls the Migdal’s approximation, the problem reduces to the interaction of fast 2D electrons with slow 3D phonons. With these simplifications, we find ReΣR(ǫ, n̂) = −1 ǫ; (8a) ImΣR(ǫ, n̂) = − ζ 12(1− ζ)2 , (8b) where ζ = νg2 is a dimensionless coupling constant and ωD = skD. 
We see that, despite the strong anisotropy, the self-energy remains local, i.e., independent of ξk.

Vertex renormalization leads to two types of corrections to the self-energy: those that are proportional to Migdal’s parameter (s/vF) and those that are proportional to ms²/ε. The second type of corrections invalidates Migdal’s theory for temperatures below ms², which is about 1 K in a typical metal. For metals with an anisotropic spectrum the existence of such a scale is potentially dangerous, since it is not obvious which of the masses (light or heavy) defines this scale. We find that the in-plane mass (mab) controls the vertex renormalization for the nearly cylindrical Fermi surface. This shows that the Migdal theory for layered metals has the same range of applicability as for isotropic metals [25].

The rest of the derivation proceeds in the same way as for the isotropic case [23], and the resulting Boltzmann equation assumes its standard form. Since no assumption about the relation between τe-ph and the dwell time (1/J) has been made, the conductivities obtained from the Boltzmann equation have the same form regardless of whether Jτe-ph is large or small. In other words, there is no coherent-incoherent crossover due to inelastic scattering in an anisotropic metal [29].

The situation changes qualitatively if resonant impurities are present in between the layers. Electrons that tunnel through such impurities move with a speed controlled by the broadening of a resonant level, i.e., much slower than the speed of sound. For that reason they cannot be treated within the formalism outlined above and require a separate study.

To evaluate the resonant-impurity contribution to the conductivity, we assume that the impurities are randomly distributed in space with density nimp whereas their energy levels are uniformly distributed over an interval Eb.
The tunneling conductance of a bilayer junction is G = −e2 dǫdǫ′Wǫ,ǫ′ (1 − n′ǫ) + , (9) where Wǫ,ǫ′ is a transition probability per unit time and nǫ is the Fermi function. To calculateWǫ,ǫ′ , we use the re- sults of Ref.[30, 31] for the probability of phonon-assisted tunneling through a single impurity Wǫ,ǫ′ = ΓLΓR it1(ǫ dt2dt3e i(t2−t3)(ǫ−ǭ0)−Γ(t2+t3) (10) × exp |αq|2 |1− e−it3 + eit1 e−it2 − 1 |2 coth e−it3 + eit2 + eit1(e−it2 − 1)(1− eit3)− c.c. where αq = −iΛq/ ρωq, Λ is the deformation-potential constant, ΓL and ΓR are tunneling widths of the resonant level, Γ = ΓL + ΓR, and ǭ0 is the energy of a resonant level renormalized by the electron-phonon interaction. In the limit of no electron-phonon interaction, Eq.(10) re- produces the well-known Breit-Wigner formula. From now on, we consider a wide band of resonant levels: Eb ≫ T ≫ Γ. Averaging Eq.(10) over spatial and en- ergy positions of resonant levels, one obtains σres=σel 1−coth sinh2 dteitǫ−λf(t) f(t)= (1−cos(ωt)) coth +i sin(ωt) .(11) Here σel is the conductivity due to elastic resonant tun- neling and λ ≡ Λ2ω2D/ρs5π2 is the dimensionless cou- pling constant for localized electrons. In the absence of electron-phonon interaction, σres is temperature inde- pendent and given by σel ≃ πe2Γ1nimpa0d/Eb[32], where a0 is the localization radius of a resonant state and Γ1 ≃ ǫ0e−d/a0 is its typical width. We note that the electron-phonon interaction is much stronger for localized electrons than for band ones: λ/ζ ∼ (kFd) (vF /s) ≫ 1. Since typically ζ ∼ 1, one needs to consider a non- perturbative regime of phonon-assisted tunneling. In that case, resonant tunneling is exponentially suppressed at T = 0: σres(T = 0) = σele −λ/2. At finite T , we find σres = σel e−λ/2 1 + π , T ≪ ωD√ , T ≫ λωD. As T increases, σres growth, resembling the zero-bias anomaly in disordered metals and Mössbauer effect. At high temperatures (T ≫ λωD) σres approaches the non- interacting value (σel). 
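To get a feel for the size of the zero-temperature suppression, the short sketch below evaluates the factor e^(−λ/2), reading the T = 0 expression above as σres(T = 0) = σel·e^(−λ/2). The numbers are the fit values this Letter quotes for Sr2RuO4 (σel = 43·10³ Ω⁻¹ cm⁻¹, ωD = 41 K, λ = 16); the function and variable names are illustrative only, and the full temperature dependence of Eq. (11) is not reproduced here.

```python
import math

def sigma_res_zero_temperature(sigma_el, lam):
    """Resonant conductivity at T = 0, assuming sigma_el * exp(-lam / 2)."""
    return sigma_el * math.exp(-lam / 2.0)

# Fit values quoted in the Letter for Sr2RuO4.
sigma_el = 43e3   # elastic resonant conductivity, Ohm^-1 cm^-1
lam = 16.0        # dimensionless coupling for localized electrons
omega_D = 41.0    # K

sigma_0 = sigma_res_zero_temperature(sigma_el, lam)
suppression = sigma_0 / sigma_el   # equals exp(-8)
high_T_scale = lam * omega_D       # temperature scale lambda * omega_D, in K

print(f"sigma_res(T=0) ~ {sigma_0:.1f} Ohm^-1 cm^-1")
print(f"suppression factor ~ {suppression:.2e}")
print(f"lambda * omega_D = {high_T_scale:.0f} K")
```

For these values the resonant channel is suppressed by a factor of roughly 3·10⁻⁴ at T = 0 (about 14.4 Ω⁻¹ cm⁻¹), and the scale λωD is 656 K.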
The asymptotic regimes in the interval ωD/√λ ≪ T ≪ λωD can also be studied but we will not pause for this here. Notice that, in contrast to the phenomenological model of Ref. [8], there is no simple relation between the T-dependences of $\sigma^{B}_{c}$ and σres.

To compare our model with the experiment, we extract $\sigma^{B}_{c}$ from the low-temperature (between 10 and 50 K) c-axis resistivity of Sr2RuO4 and extrapolate it to higher temperatures [5]. The resonant part of the conductivity is calculated numerically using Eq. (11). The fit to the data for σel = 43 · 10³ Ω⁻¹ cm⁻¹, ωD = 41 K and λ = 16 is shown in Fig. 2. The agreement between the theory and experiment is quite good and the values of the fitting parameters are reasonable. An immediate consequence of our model is the sample-to-sample variation of the c-axis conductivity. Among the layered materials, the largest amount of data is collected for graphite [3]. Even within the group of samples with comparable in-plane mobilities, the temperature of the maximum in ρc varies from 40 K to 300 K [3, 33].

FIG. 2: ρc vs temperature (T, in Kelvin). Solid: experimental data on Sr2RuO4; dashed: fit to the phonon-assisted tunneling model in the non-perturbative regime, Eq. (11).

To conclude, we have shown that the Boltzmann equation and its consequences are no less robust for anisotropic metals than they are for isotropic ones. The only condition controlling the validity of the Boltzmann equation is the large value of EF τ, regardless of whether τ comes from elastic or inelastic scattering. Out-of-plane localized states change the c-axis transport radically while playing only a minor role for the in-plane one. While ρab remains metallic, an interplay between phonon-assisted tunneling and conventional momentum relaxation causes insulating or non-monotonic dependence of ρc on temperature. This model is in good agreement with the experimental data on Sr2RuO4.

This research was supported by NSF-DMR-0308377.
We acknowledge stimulating discussions with B. Altshuler, A. Chubukov, A. Hebard, S. Hill, P. Hirschfeld, P. Littlewood, D. Khmelnistkii, N. Kumar, Yu. Makhlin, A. Mirlin, M. Reizer, A. Schofield, S. Tongay, A.A. Varlamov, and P. Wölfle. We are indebted to A. Hebard, A. Mackenzie, and S. Tongay for making their data available to us.

[1] S. L. Cooper and K. E. Gray, in Physical Properties of High Temperature Superconductors, edited by D. M. Ginsberg, (World Scientific, Singapore, 1994), p. 61. [2] P. W. Anderson, Science 256, 1526 (1990); P. W. Anderson and Z. Zou, Phys. Rev. Lett. 42, 2642 (1992); D. G. Clarke, S. P. Strong, and P. W. Anderson, Phys. Rev. Lett. 72, 3218-3221 (1994). [3] see N. B. Brandt, S. M. Chudinov, and Ya. G. Ponomarev, Semimetals: I. Graphite and its compounds, (North-Holland, Amsterdam, 1988) and references therein. [4] W. J. Wattamaniuk, J. P. Tidman, and R. F. Frindt, Phys. Rev. Lett. 35, 62 (1975). [5] A. W. Tyler, A. P. Mackenzie, S. NishiZaki, and Y. Maeno, Phys. Rev. B 58, 10107 (R) (1998). [6] J. Singleton and C. Mielke, Contemp. Phys. 43, 63 (2002). [7] J. Singleton et al., cond-mat/0610318. [8] A. Rojo and K. Levin, Phys. Rev. B 48, 16861 (1993). [9] V. Fleurov, M. Karpovski, M. Molotskii, A. Palevski, A. Gladkikh, R. Kris, Solid State Comm. 97, 543 (1996). [10] M. J. Graf, M. Palumbo, D. Rainer, and J. A. Sauls, Phys. Rev. B 52, 10588 (1995). [11] P. J. Hirschfeld, S. M. Quinlan, and D. J. Scalapino, Phys. Rev. B 55, 12742 (1997). [12] A. A. Abrikosov, Physica C 317-318, 154 (1999). [13] M. Turlakov and A. J. Leggett, Phys. Rev. B 63, 064518 (2001). [14] U. Lundin and R. H. McKenzie, Phys. Rev. B 68, 081101(R) (2003). [15] A. F. Ho and A. J. Schofield, Phys. Rev. B 71, 045101 (2005). [16] P. Wölfle and R. N. Bhatt, Phys. Rev. B 30, 3542 (1984). [17] N. Kumar, P. A. Lee, and B. Shapiro, Physica A 168, 447 (1990). [18] N. Dupuis, Phys. Rev. B 56, 9377 (1997). [19] N. Kumar and A. M.
Jayannavar, Phys. Rev. B 45, 5001 (1992). [20] P. Moses and R. H. McKenzie, Phys. Rev. B 60, 7998 (1999). [21] L. Ioffe, A. Larkin, A. Varlamov, and L. Yu, Phys. Rev. B 47, 8936 (1993). [22] D. G. Clarke, S. P. Strong, P. M. Chaikin, and E. I. Chashechkina, Science 279, 2071 (1998). [23] J. Rammer and H. Smith, Rev. Mod. Phys. 58, 323 (1986). [24] J. Paglione, C. Lupien, W.A. MacFarlane, J.M. Perz, L. Taillefer, Z.Q. Mao, and Y. Maeno, Phys. Rev. B 65, 220506(R). [25] The self-energy in Eqs. (8a, 8b) diverges at ζ = 1. This divergence, also present for the isotropic case, results from the renormalization of the sound velocity and is an artefact of the Fröhlich Hamiltonian. A divergence-free theory is obtained by applying the adiabatic approximation to the coupled system of electrons and ions [26, 27, 28]. [26] J. R. Schrieffer, Theory of Superconductivity, (Addison-Wesley, Redwood City, 1988). [27] E.G. Brovman and Yu. Kagan, Sov. Phys. JETP 25, 365 (1967). [28] B. T. Geilikman, Sov. Phys.-Usp. 18, 190 (1975). [29] Migdal’s theory also rules out models based entirely on polaronic effects since polarons are stable only if a typical electron velocity does not exceed the sound one. [30] L.I. Glazman and R.I. Shekhter, Sov. Phys. JETP 61, 163 (1988). [31] N. S. Wingreen, K. W. Jacobsen, and J. W. Wilkins, Phys. Rev. Lett. 59, 376 (1987); Phys. Rev. B 40, 11834 (1989). [32] A.I. Larkin and K. Matveev, Sov. Phys. JETP 66, 580 (1987). [33] S. Tongay and A. F. Hebard, private communication.
ABSTRACT Transport in metals with strongly anisotropic single-particle spectrum is studied. Coherent band transport in all directions, described by the standard Boltzmann equation, is shown to withstand both elastic and inelastic scattering as long as $E_F\tau\gg 1$. A model of phonon-assisted tunneling via resonant states located in between the layers is suggested to explain a non-monotonic temperature dependence of the c-axis resistivity observed in experiments.
<|endoftext|><|startoftext|> Introduction to complex analytic geometry. Translated from the Polish by Maciej Klimek, Birkhäuser Verlag, Basel, 1991. [Nik-Tho-Zwo 2007] N. Nikolov, P. J. Thomas, W. Zwonek, Discontinuity of the Lempert function and the Kobayashi-Royden metric of the spectral ball, preprint. [Ran-Whi 1991] T. J. Ransford, M. C. White, Holomorphic self-maps of the spectral unit ball, Bull. London Math. Soc. 23 (1991), 256–262. [Ros 2003] J. Rostand, On the automorphisms of the spectral unit ball, Studia Math. 155 (2003), 207–230. [Rud 1980] W. Rudin, Function theory in the unit ball of Cn (1980), Grundlehren der Mathematischen Wissenschaften 241 Springer-Verlag, New York-Berlin. Instytut Matematyki, Uniwersytet Jagielloński, Reymonta 4, 30-059 Kraków, Poland E-mail address: Wlodzimierz.Zwonek@im.uj.edu.pl ABSTRACT We prove an Alexander type theorem for the spectral unit ball $\Omega_n$ showing that there are no non-trivial proper holomorphic mappings in $\Omega_n$, $n\geq 2$. <|endoftext|><|startoftext|> Introduction The (Fitch) parsimony length of a character on a tree equals the minimum number of state changes (substitutions) required to fit the character onto a tree (Fitch, 1971). We turn this definition on its head and show how the parsimony length of a character equals the minimum number of changes in the tree required to fit the tree onto the character. This may be a back-to-front way to look at parsimony, but it is also a useful one. We detail two applications of the result. The first application is that this reformulation of parsimony provides a closer link between parsimony based analysis and supertree methods. We demonstrate that the maximum parsi- mony tree can be viewed as a type of median consensus tree, where the median is computed with respect to the SPR distance (see below). 
As well, the result shows how to conduct a parsimony based analysis not just on characters but on trees, without having to recode the trees as binary character matrices. This opens the way to a hybrid between the consensus approach and the total evidence approach, where the data is a mix of characters, trees, and subtrees.

The second application of our observation on parsimony is to the analysis of pairs of characters. We show that the score of the maximum parsimony tree for two characters is a simple function of the smallest number of recombinations required to explain the incongruence between the characters without homoplasy. This result provides the basis of a highly efficient test for recombination (Bruen et al., 2006).

Here and throughout the paper we assume that all phylogenetic trees are fully resolved (bifurcating) and that by ‘parsimony’ we refer to Fitch parsimony, where the character states are unordered and reversible. Some of the results presented here can be extended to other forms of parsimony, and possibly to incompletely resolved trees (Bruen, 2006), but these extensions lie beyond the scope of this paper.

Note that in this paper we are dealing with unrooted SPR rearrangements, which are those used in tree searches. There is a related, but distinct, concept of rooted SPR rearrangements, where the rearrangements are restricted to obey a type of temporal constraint (Song, 2003). It is this latter class of rooted SPR rearrangements that is used to model lateral gene transfers and recombination. It would be a worthwhile, but challenging, goal to investigate whether any of the results on unrooted SPR rearrangements in this paper can be extended to rooted SPR rearrangements.

Linking Parsimony with SPR

A subtree-prune and regraft (SPR) rearrangement is an operation on phylogenetic trees whereby a subtree is removed from one part of the tree and regrafted to another part of the tree; see Figure 1 (Felsenstein, 2004; Swofford et al., 1996).
These SPR rearrangements are widely used by tree searching software packages like PAUP (Swofford, 1998) and Garli (Zwickl, 2006). The SPR distance between two trees can be defined as the minimal number of SPR rearrangements required to transform one tree into the other (Hein, 1990; Allen and Steel, 2001; Goloboff, 2007). For example, the two trees T1 and T3 in Figure 1 can be transformed into each other using a minimum of two SPR rearrangements, via the tree T2, so their SPR distance is two. Figure 1: Two trees, T1 and T3, separated by two SPR rearrangements via the intermediate tree T2. A binary character of parsimony length 3 is indicated on tree T1 by the node colours. The character is compatible with a tree (T3) within SPR distance two, illustrating Theorem 1. The parsimony length of a character on a tree is the minimum number of steps required to fit that character on the tree, as computed by the algorithm of Fitch (1971). We will always assume unordered reversible characters. The length of a character Xi on a tree T is denoted ℓ(Xi, T). A character with ri states therefore has parsimony length at least (ri − 1), as every state not at the root has to arise at least once. A character is compatible with a tree if it requires at most (ri − 1) changes on that tree (Felsenstein, 2004). So far, one thinks of fitting a character onto a tree; we could just as well fit the tree onto the character. If the character and the tree are compatible then we have a perfect fit. When there is not a perfect fit we can measure how many SPR rearrangements are required to give a tree that does make a perfect fit. It turns out that this measure gives an equivalent score to parsimony length. More formally: Theorem 1. Let Xi be a character with ri states and let T be a fully resolved phylogenetic tree. It takes exactly ℓ(Xi, T) − (ri − 1) SPR rearrangements to transform T into a tree compatible with Xi. The result still holds if Xi has some missing states.
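As a minimal sketch of how the quantity in Theorem 1 can be evaluated, the following Python computes the Fitch length of a character on a tree and subtracts the minimum possible length. The nested-tuple tree encoding, the function names, and the two example trees are illustrative assumptions and are not the trees of Figure 1. The trees are rooted arbitrarily, which does not change the Fitch length for unordered reversible characters. Missing states are not handled here.

```python
def fitch_length(tree, states):
    """Fitch parsimony length of one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (assumed encoding)
    states -- dict mapping each leaf name to its character state
    """
    def visit(node):
        # Returns (candidate state set for this node, changes counted below it).
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint child sets force one extra state change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def spr_moves_to_compatibility(tree, states):
    """Length minus (number of states - 1), the count given by Theorem 1."""
    n_states = len(set(states.values()))
    return fitch_length(tree, states) - (n_states - 1)


# Character mapping A, C, D, F to 1 and B, E, G to 0.
x1 = {"A": 1, "C": 1, "D": 1, "F": 1, "B": 0, "E": 0, "G": 0}

# Hypothetical trees chosen for illustration only.
caterpillar = (((((("A", "B"), "C"), "D"), "E"), "F"), "G")
compatible = ((("A", "C"), ("D", "F")), (("B", "E"), "G"))

print(fitch_length(caterpillar, x1))                # 3
print(spr_moves_to_compatibility(caterpillar, x1))  # 2
print(spr_moves_to_compatibility(compatible, x1))   # 0
```

On the hypothetical caterpillar tree the character has length three, so by Theorem 1 two SPR rearrangements are needed to reach a compatible tree; on the second tree the character is already compatible.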
As an example, consider the character X1 mapping taxa A,C,D,F to one and B,E,G to zero. The length of this character on tree T1 of Figure 1 is three, and the number of SPR rearrangements needed to transform T1 into some tree T3 compatible with X1 is two. Note that there could be other trees compatible with X1 that are further than two SPR rearrangements away: the result only gives the number of rearrangements required to obtain the closest tree. Once stated, the theorem is not too difficult to prove. First show that performing an SPR rearrangement decreases the length by at most one step. Hence it takes at least ℓ(Xi, T) − (ri − 1) SPR rearrangements to transform T into a tree compatible with the character Xi. Then show that this many rearrangements always suffice. A formal proof is presented in the Appendix. A restricted (binary character) version of this theorem was proved in (Bryant, 2003). The theorem captures an issue that is central to the interpretation of incongruence: is an observed incongruence to be explained by positing homoplasy or by modifying the tree? Define the SPR distance from a tree T to a character Xi to be the SPR distance from T to the closest tree T′ that is compatible with Xi. Theorem 1 then tells us that the SPR distance from T to Xi is equal to the difference between the length ℓ(Xi, T) of Xi on T and the minimum possible length of Xi on any tree. Consensus trees, supertrees and parsimony In their insightful overview of supertree methods Thorley and Wilkinson (2003) characterise a family of supertree methods that all minimise a sum of the form $\sum_{i=1}^{n} d(T, t_i) = d(T, t_1) + d(T, t_2) + \dots + d(T, t_n)$. (1) Here t1, t2, . . . , tn are the input trees and d(T, ti) is a measure of the distance between the input tree ti and the supertree T. There are many choices for the distance measure d, and it need not be the case that the distance measure satisfies the symmetry condition d(T, ti) = d(ti, T).
Gordon (1986) was the first to propose this description of supertrees. Many supertree methods can be described in these terms, including Matrix Representation with Parsimony (MRP) (Baum, 1992; Ragan, 1992), Minimum Flip supertrees (Chen et al., 2006), the Median Supertree (Bryant, 1997), the Majority Rule Supertree (Cotton and Wilkinson, 2007) and the Average Consensus Supertree (Lapointe and Cucumel, 1997). Let ds(T,Xi) denote the SPR distance from T to the closest fully resolved tree Ti that is compatible with Xi. By Theorem 1, a maximum parsimony tree for X1, . . . , Xm is one that minimises the expression $\sum_{i=1}^{m} d_s(T, X_i) = d_s(T, X_1) + d_s(T, X_2) + \dots + d_s(T, X_m)$. (2) In this way, maximum parsimony is a form of median consensus. The significance of this observation doesn’t come from the fact that we can write the parsimony score of T in the form (2); it is from the close connection with SPR distances, and from the way we will now use this connection to combine different kinds of data in the same theoretical framework. An SPR median tree for fully resolved trees t1, . . . , tn on the same leaf set is a tree T that minimises $\sum_{i=1}^{n} d_s(T, t_i) = d_s(T, t_1) + d_s(T, t_2) + \dots + d_s(T, t_n)$, where here ds(T, ti) denotes the SPR distance from T to ti (Hill, 2007). We extend this directly to a supertree method by mimicking the situation for characters. Suppose that ti is a phylogenetic tree, not necessarily fully resolved, on a subset of the set of leaves. We say that a fully resolved tree T on the full set of leaves is compatible with ti (equivalently, T displays ti) if we can obtain ti from T by pruning off leaves and contracting edges. In this general situation, we let ds(T, ti) denote the SPR distance from T to the closest fully resolved tree Ti that is compatible with ti. This is equivalent to the more traditional definition whereby we first prune leaves off T then compute the distance from this pruned tree to ti. Now suppose that we have both characters and trees in the input.
Both types of phylogenetic data can be combined into an SPR median tree T, chosen to minimise the sum $\sum_{i=1}^{n} d_s(T, t_i) + \sum_{i=1}^{m} d_s(T, X_i)$. We have, then, a way to bring together both the supertree/consensus methodology and the total evidence methodology. In the case that the data comprises only trees, the tree is a median supertree; in the case that the data comprises only character data, the tree is the maximum parsimony tree. It is important to note the difference between this approach and the MRP method (Baum, 1992; Ragan, 1992), which could be used to combine trees and characters. In MRP, the trees are broken down into multiple independent characters. This is a problem, since the characters encoding a tree are nowhere near independent. In contrast, the SPR median tree approach treats a tree as a single indivisible unit of information. There is one critical issue that has been side-stepped: computation time. At present, computational limitations make the construction of SPR median trees infeasible for all but the smallest data sets: just computing the SPR distance between two trees is an NP-hard problem (Hickey et al., 2006). In contrast, total evidence and MRP approaches are possible for at least 100 taxa. However, there are now good heuristics for the unrooted SPR distance (Goloboff, 2007) and exact special case algorithms (Hickey et al., 2006) that could be applied to the problem. Below we describe a lower bound method for the SPR distance that should also aid construction of these SPR median trees. Parsimony on pairs of characters Another valuable application of Theorem 1 follows when we consider parsimony analysis of just two unordered and reversible characters. The concept of pairwise character compatibility was introduced by Le Quesne (1969) (see also Felsenstein (2004)). Two binary characters with states 0 and 1 are incompatible if and only if all four combinations of 00, 01, 10, and 11 are present as combinations of states for the two characters (Le Quesne, 1969).
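The four-combination criterion for binary characters is simple to check directly. The following Python is a sketch under the assumption that each character is given as a sequence of 0/1 states listed in the same taxon order; the function name and the example characters are hypothetical.

```python
def binary_characters_incompatible(x1, x2):
    """Return True if all four state pairs 00, 01, 10, 11 occur.

    x1, x2 -- sequences of 0/1 states, one entry per taxon, same taxon order.
    Entries other than 0 or 1 (for example a missing state) never match
    any of the four combinations and are therefore ignored.
    """
    if len(x1) != len(x2):
        raise ValueError("characters must be scored on the same taxa")
    observed = set(zip(x1, x2))
    return {(0, 0), (0, 1), (1, 0), (1, 1)} <= observed


# Taxa in the order A, B, C, D, E, F, G (illustrative data only).
x1 = [1, 0, 1, 1, 0, 1, 0]
x2 = [1, 1, 0, 0, 0, 1, 0]
x3 = [1, 1, 1, 1, 0, 1, 0]

print(binary_characters_incompatible(x1, x2))  # True: 11, 01, 10 and 00 all occur
print(binary_characters_incompatible(x1, x3))  # False: the pair 10 never occurs
```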
In a standard setting, character incompatibility is interpreted as implying that at least one of the characters has undergone convergent or recurrent mutation (homoplasy). In other words, for every possible phylogeny describing the history of the two characters, at least one homoplasy is posited for one of the characters. Another interpretation of incompatibility of two characters is that the characters evolved without homoplasy on two different phylogenies, where the phylogenies differ by one or more SPR rearrangements (Sneath et al., 1975; Hudson and Kaplan, 1985). Define the total incongruence score i(X1, X2) for two multi-state unordered characters X1 and X2 (with r1 and r2 states respectively) as $i(X_1, X_2) = \min_T \left[ \ell(X_1, T) + \ell(X_2, T) \right] - (r_1 - 1) - (r_2 - 1)$. (3) This is the maximum parsimony score of the two characters X1, X2 minus the minimum number of changes required for each character. Equation (3) generalises the incompatibility notion for two binary characters. It is also equivalent to the incongruence length difference statistic applied to only two characters (Farris et al., 1995). Importantly, the total incongruence score can be computed rapidly (Bruen and Bryant, 2006). The following consequence of Theorem 1 strengthens the connection between incongruence and SPR rearrangements. Theorem 2. The total incongruence score i(X1, X2) for two characters equals the minimum SPR distance between a tree T1 and T2 such that X1 is compatible with T1 and X2 is compatible with T2. Although the notion of total incongruence for two characters has been considered before in the context of character selection and weighting (Penny and Hendy, 1986), it has not been considered in the context of genealogical similarity.
Essentially, Theorem 2 shows that the total incongruence score equals the minimum possible number of SPR rearrangements that could have occurred between the phylogenetic histories for both characters, assuming that the characters have different histories with which they are each perfectly compatible. Indeed, Theorem 2 suggests a natural way to interpret genealogical similarity between two characters, which we have used to develop a powerful test for recombination (Bruen et al., 2006). Choosing two characters from two different genes (which have possibly different histories) gives a simple approach to identify the distinctiveness of the histories of the genes. We can also apply Theorem 2 to obtain a lower bound on an SPR distance between two trees. Suppose that we have two trees T1 and T2 and we wish to obtain a lower bound on the SPR distance d(T1, T2) between the two trees. If we choose any character X1 convex on T1 and any character X2 convex on T2 then, by Theorem 2, we have that i(X1, X2) ≤ d(T1, T2). By carefully choosing X1 and X2 we can obtain tighter bounds. One natural starting point for X1 and X2 is the four or five character encodings described by (Semple and Steel, 2002; Huber et al., 2005). Discussion and extensions We have presented a reformulation of parsimony that is, in some way, dual to the standard definitions. Instead of measuring how well a character fits onto a tree we look at how well the tree fits onto the character. A consequence of this new perspective is that we can combine trees and character data using one general SPR framework, and we also obtain new results connecting incongruence measures and recombination. Nevertheless, it is not immediately clear how the new reformulation can be interpreted in itself. 
Figure 2: Cartoon representation of parsimony in terms of tree rearrangements. Each character Xi gives a ‘cloud’ of trees containing those trees compatible with Xi. The maximum parsimony tree is then the tree closest to these clouds under the SPR distance.
One aid in this direction is to consider the information a single character, or tree, represents. Given a single character, we can imagine a cloud of trees comprising exactly those trees compatible with the character (Figure 2). If we are told that this character evolved without homoplasy, then we know that the true evolutionary tree must be contained somewhere within the cloud. However, as there is only one character there is a lot of uncertainty regarding the tree, so there are a lot of trees in the cloud. Now suppose we have multiple characters, each with its own cloud. There may not be a single tree contained in the intersection of all of these clouds. Instead, we search for a tree that is as close as possible to all of the clouds. The distance from T to the cloud associated to character Xi is exactly ds(T,Xi), so by Theorem 1 a tree closest to all of the clouds is a maximum parsimony tree. Each cloud represents the uncertainty around each piece of data (tree or character). We note that several of the results in this article can be extended. Firstly, Theorems 1 and 2 are both valid if we replace the SPR distance with the tree bisection and reconnection (TBR) distance. In a TBR rearrangement, a subtree is removed from the tree and then reattached elsewhere in the tree, the difference with SPR being that we can reattach using any of the nodes in the subtree (Allen and Steel, 2001; Felsenstein, 2004). The TBR distance between two trees is the minimum number of TBR rearrangements required to transform one tree into the other.
That Theorems 1 and 2 hold for the TBR distance might seem surprising, since the TBR distance between two trees is always less than, or equal to, the SPR distance between the trees. However the extension follows by a tiny change to the proof of Theorem 1, noting that a TBR move can still only reduce the parsimony score of a character by at most one. We have also explored extensions of the result to other distances between trees, notably the Robinson-Foulds or partition distance and the Nearest Neighbor Interchange distance, though the connections are not so clear. See Bruen (2006) for details. Acknowledgements We would like to thank Mike Steel, Sebastien Böcker, Olaf Bininda Emonds, Pablo Golloboff, Mark Wilkinson and an anonymous referee for their valuable suggestions. This research was partially supported by the New Zealand Marsden Fund. References Allen, B. and M. Steel. 2001. Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics 5:1–13. Baum, B. 1992. Combining trees as a way of combining datasets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41:3–10. Bruen, T. 2006. Discrete and statistical approaches to genetics. Ph.D. thesis McGill Univer- sity School of Computer Science. Bruen, T. and D. Bryant. 2006. A subdivision approach to maximum parsimony. Annals of Combinatorics In Press. Bruen, T., H. Philippe, and D. Bryant. 2006. A simple and robust statistical test to detect the presence of recombination. Genetics 172:1–17. Bryant, D. 1997. Building trees, hunting for trees and comparing trees. Ph.D. thesis Dept. Mathematics, University of Canterbury. Bryant, D. 2003. A classification of consensus methods for phylogenetics. Pages 163–184 in Bioconsensus vol. 61 of DIMACS. American Math Society, Providence, RI. Bryant, D. 2004. The splits in the neighborhood of a tree. Annals of Combinatorics 8:1–11. Chen, D., O. Eulenstein, D. Fernandez-Baca, and M. Sanderson. 2006. 
Minimum-flip su- pertrees: Complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinformatics 3:165–173. Cotton, J. and M. Wilkinson. 2007. Majority-rule supertrees. Systematic Biology 56:445– Farris, J. S., M. Källersjö, A. G. Kluge, and C. Bult. 1995. Constructing a significance test for incongruence. Systematic Biology 44:570–572. Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates. Fitch, W. M. 1971. Towards defining the course of evolution: Minimum change for a specific tree topology. Systematic Zoology 20:406–416. Goloboff, P. 2007. Calculating SPR distances between trees. Cladistics Online early access. Gordon, A. D. 1986. Consensus supertrees: the synthesis of rooted trees containing overlap- ping sets of labeled leaves. Journal of Classification 3:335–348. Hein, J. 1990. Reconstructing evolution of sequences subject to recombination using parsi- mony. Mathematical Biosciences 98:185–200. Hickey, G., F. Dehne, A. Rau-Chaplin, and C. Blouin. 2006. The computational complexity of the unrooted subtree prune and regraft distance. Tech. Rep. CS-2006-06 Faculty of Computer Science, Dalhousie University. Hill, T. 2007. Development of New Methods for Inferring and Evaluating Phylogenetic Trees. Ph.D. thesis Uppsala Universitet. Huber, K. T., V. Moulton, and M. A. Steel. 2005. Four characters suffice to convexly define a phylogenetic tree. SIAM Journal on Discrete Mathematics 18:835–843. Hudson, R. R. and N. L. Kaplan. 1985. Statistical properties of the number of recombination events in the history of a sample of dna sequences. Genetics 111:147–64. Lapointe, F.-J. and G. Cucumel. 1997. The average consensus procedure: combination of weighted taxa containing identical or overlapping sets of taxa. Systematic Biology 46:306– Le Quesne, W. J. 1969. A method of selection of characters in numerical taxonomy. System- atic Zoology 18:201–205. Penny, D. and M. Hendy. 1986. Estimating the reliability of evolutionary trees. 
Molecular Biology and Evolution 3:403–17. Ragan, M. A. 1992. Phylogenetic inference based on matrix representations of trees. Molecular Phylogenetics and Evolution 1:53–58. Semple, C. and M. Steel. 2002. Tree reconstruction from multi-state characters. Advances in Applied Mathematics 28:169–84. Semple, C. and M. Steel. 2003. Phylogenetics. Oxford University Press. Sneath, P., M. Sackin, and R. Ambler. 1975. Detecting evolutionary incompatibilities from protein sequences. Systematic Zoology 24:311–332. Song, Y. S. 2003. On the combinatorics of rooted binary phylogenetic trees. Ann. Comb. 7:365–379. Swofford, D., G. Olsen, P. Waddell, and D. Hillis. 1996. Molecular systematics chap. Phylogenetic Inference, Pages 407–514. Sinauer Associates, Inc. Swofford, D. L. 1998. PAUP*. Phylogenetic Analysis using Parsimony (*and other methods). Sinauer Associates, Sunderland, Massachusetts. Thorley, J. L. and M. Wilkinson. 2003. A view of supertree methods. Pages 185–194 in Bioconsensus (F. Roberts, ed.) vol. 61 of DIMACS series in discrete mathematics and theoretical computer science The American Mathematical Society, New York. Zwickl, D. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. thesis University of Texas at Austin. Appendix Refer to (Semple and Steel, 2003) for a detailed description of the notation. The first observation is that a TBR rearrangement of a tree increases the length of a character by at most one. As SPR rearrangements are a special case of TBR rearrangements, the same result holds for SPR. Lemma 1. Let T be a fully resolved phylogenetic tree and Xi an unordered reversible character. Let T′ be a phylogenetic tree that differs from T by a single TBR rearrangement. Then ℓ(Xi, T′) ≤ ℓ(Xi, T) + 1. Proof. The proof of Lemma 5.1 in (Bryant, 2004) for binary characters applies directly to the multistate case.
Let dSPR(T, T′) denote the unrooted SPR distance between two phylogenetic trees T and T′. Theorem 1. Let Xi be a character with ri states and let T be a fully resolved phylogenetic tree. It takes exactly ℓ(Xi, T) − (ri − 1) SPR rearrangements to transform T into a tree compatible with Xi. The result still holds if Xi has some missing states. Proof. Let T′ be a fully resolved phylogenetic tree compatible with Xi for which dSPR(T, T′) is minimized and let m = dSPR(T, T′). Then there exists a sequence of trees T′ = T0, ..., Tm = T such that every adjacent pair of trees in the sequence differ by exactly one SPR rearrangement. By Lemma 1 the existence of this sequence implies that ℓ(Xi, T) − ℓ(Xi, T′) ≤ dSPR(T, T′) and since T′ is compatible with Xi we have ℓ(Xi, T′) = ri − 1, giving ℓ(Xi, T) − (ri − 1) ≤ dSPR(T, T′). For the other direction, we show that we can construct a sequence of ℓ(Xi, T) − (ri − 1) SPR rearrangements that transform T into a tree T′ compatible with Xi. Firstly, if ℓ(Xi, T) − (ri − 1) = 0, then T is compatible with Xi so the proof is finished. Otherwise, let X̂i be an assignment of states to internal nodes that minimises the number of state changes (that is, a minimum extension of Xi). Then since Xi is not convex on T there exist three vertices u, v and w, where {u, v} ∈ E(T), v lies on the path from u to w and X̂i(u) = X̂i(w) ≠ X̂i(v). Perform an SPR rearrangement by removing edge {u, v}, suppressing the vertex v and creating a new edge {u, t} where t is a new vertex on an edge adjacent to w. Furthermore, set X̂i(t) = X̂i(w). Then the number of edges on which a change has occurred has decreased by 1, thereby decreasing the parsimony length by 1. This procedure can be repeated until the parsimony length equals ri − 1, constructing the desired sequence of trees and completing the proof.
Let T be a maximum parsimony phylogenetic tree for X1 and X2 and let Theorem 2 The total incongruence score i(X1, X2) for two characters equals the mini- mum SPR distance between a tree T1 and T2 such that X1 is compatible with T1 and X2 is compatible with T2. Proof. Let T1 and T2 be any two trees compatible with X1 and X2 respectively. Then `(X1, T1) = r1−1 and by Theorem 1, `(X2, T1)− (r2−1) ≤ dSPR(T1, T2). We have then that i(X1, X2) ≤ `(X1, T1) + `(X2, T1)− (r1 − 1)− (r2 − 1) ≤ dSPR(T1, T2) and so i(X1, X2) is a lower bound for dSPR(T1, T2). We show that this bound can be achieved. Let T be a maximum parsimony tree for the pair of characters X1, X2. By Theorem 1 there exist two trees T1 and T2 such that T1 is compatible with X1, T2 is compatible with X2 and dSPR(T1, T ) + dSPR(T2, T ) = i(X1, X2), implying that dSPR(T1, T2) ≤ dSPR(T1, T ) + dSPR(T2, T ) ≤ i(X1, X2) and hence dSPR(T1, T2) = i(X1, X2). ABSTRACT The parsimony score of a character on a tree equals the number of state changes required to fit that character onto the tree. We show that for unordered, reversible characters this score equals the number of tree rearrangements required to fit the tree onto the character. We discuss implications of this connection for the debate over the use of consensus trees or total evidence, and show how it provides a link between incongruence of characters and recombination. <|endoftext|><|startoftext|> Draft version June 8, 2021 Preprint typeset using LATEX style emulateapj v. 08/22/09 SPECTROPOLARIMETRIC OBSERVATIONS OF THE CA II 8498 Å AND 8542 Å LINES IN THE QUIET SUN A. Pietarila , H. Socas-Navarro High Altitude Observatory, National Center for Atmospheric Research2, 3080 Center Green, Boulder, CO 80301, USA T. 
Bogdan Space Environment Center, National Oceanic and Atmospheric Administration, 325 Broadway, Boulder, CO 80305, USA Draft version June 8, 2021 ABSTRACT The Ca II infrared triplet is one of the few magnetically sensitive chromospheric lines available for ground-based observations. We present spectropolarimetric observations of the 8498 Å and 8542 Å lines in a quiet Sun region near a decaying active region and compare the results with a simulation of the lines in a high plasma-β regime. Cluster analysis of Stokes V profile pairs shows that the two lines, despite arguably being formed fairly close, often do not have similar shapes. In the network, the local magnetic topology is more important in determining the shapes of the Stokes V profiles than the phase of the wave, contrary to what our simulations show. We also find that Stokes V asymmetries are very common in the network, and the histograms of the observed amplitude and area asymmetries differ significantly from the simulation. Both the network and internetwork show oscillatory behavior in the Ca II lines. It is stronger in the network, where shocking waves, similar to those in the high-β simulation, are seen and large self-reversals in the intensity profiles are common. Subject headings: polarization, Sun: chromosphere, waves 1. INTRODUCTION Our understanding of solar magnetic fields outside active regions has increased signifi- cantly during the last years. This is due to new and better instrumentation (e.g., THEMIS, Paletou & Molodij 2001; VSM on SOLIS, Keller & The Solis Team 2001; Swedish Solar Telescope, Scharmer, Bjelksjo, Korhonen, Lindberg, & Petterson 2003; Solar Optical Telescope on Hinode, Shimizu 2004; and SPINOR, Socas-Navarro et al. 2006), better diagnostic techniques (see for example Bellot Rubio 2006 for a review on inversion techniques) and advanced numerical simulations (Stein & Nordlund 2006 and references therein). A large portion of the work has focused on photospheric magnetic fields. 
Only now are we starting to have adequate tools for investigating chromospheric magnetism in more detail (for a review of chromospheric magnetic fields see Lagg 2005). This is not surprising considering the numerous difficulties in observing chromospheric magnetic fields, interpreting the data, and performing realistic MHD simulations. There are two different sets of lines that are often used for chromospheric spectropolarimetry, the He I infrared (IR) triplet at 10830 Å, and the Ca II IR triplet at 8500 Å. Both line sets have their advantages and disadvantages. The He I lines are formed over a relatively thin layer, and therefore observations can be inverted using a simple Milne-Eddington model. The drawback is that while the formation range is fairly narrow, the precise formation height remains uncertain, and the Milne-Eddington inversions do not give any information on the atmospheric gradients. The lines are also sensitive to the Paschen-Back effect, which must be included in the inversion code (Socas-Navarro et al. 2004). Furthermore, simulating the He I lines is difficult since coronal irradiation has a non-negligible effect on their formation (Andretta & Jones 1997). In contrast, the formation of the Ca II IR lines is fairly well understood (Lites et al. 1982). The broad Ca II lines sample a large region of the atmosphere, from the photosphere to the lower chromosphere. However, the Ca II lines are formed in non-LTE, making inversions considerably more cumbersome.
1 Institute of Theoretical Astrophysics, University of Oslo, P.O.Box 1029 Blindern, N-0315 Oslo, Norway
2 The National Center for Atmospheric Research (NCAR) is sponsored by the National Science Foundation.
Several investigations using the Ca II IR lines have studied intensity and velocity oscillations in the quiet Sun (e.g. Lites et al. 1982; Deubner & Fleck 1990) or, alternatively, magnetic fields in active regions (e.g. Socas-Navarro et al. 2000a).
In both cases the lines have proven useful as diagnostics of the solar chromosphere. In this paper we present results of spectropolarimetric observations of two of the lines in an enhanced network region. We have both spatial maps and time series data. The observations show that the Ca II lines are formed in a very interesting region, namely the region where the atmosphere is transforming from a plasma dominated (β >> 1) to a magnetic field dominated (β << 1) regime in terms of dynamic force balance. Wave propagation is clearly seen in the highly dynamic magnetic regions, whereas the weakly magnetic internetwork is found to be less variable. Interestingly, the two Ca II lines exhibit significant differences even though in calculations they are formed fairly close together. The importance of gradients in the chromospheric network is clearly demonstrated by the prevalence of asymmetric Stokes V profiles in the data. The paper is arranged as follows: in § 2 the data and their reduction are addressed. Results of analyzing the data using different approaches are presented in § 3. We performed cluster analyses on the Stokes V profiles to classify them and to describe spatial patterns seen in the data. Statistics, such as profile amplitudes and asymmetries, are presented. The time dependent behavior of the lines in different network and internetwork regions is also discussed. In § 4 the observations are compared to simulations of the lines in a high plasma-β regime (Pietarila et al. 2006, hereafter P06). Finally, in § 5 the main results are summarized and discussed. 2. OBSERVATIONS AND DATA REDUCTION The Spectro-Polarimeter for INfrared and Optical Regions (SPINOR, Socas-Navarro et al. 2006) at the Dunn Solar Telescope, Sacramento Peak Observatory, was used to observe two of the Ca II infrared triplet lines at 8498 Å and 8542 Å, as well as two photospheric Fe I lines at 8497 Å and 8538 Å.
The setup included several other lines but because of computer problems only data from the two Ca lines, which used the ASP TI TC245 cameras, were recorded fully. The data have 256 points in both the wavelength and spatial position with a typical noise level of $6 \times 10^{-4}\,I_c$ (1σ deviation from the mean) and a spectral sampling of 25 mÅ. The pixel height corresponds to ≈ 0.38 arcseconds on the solar surface along the slit. We observed a quiet Sun region near disk center at S17.3 W32.1 on May 19, 2005 at 14:14 UT. An MDI magnetogram of the region is shown in Figure 1. The slit was positioned in the vicinity of a decaying active region, AR10763, but avoided flux concentrations from the active region (i.e., plages). A time series consisting of 99 time steps of short scans (3 slit positions), with a spacing of 0.375 arcseconds each, was acquired during variable seeing conditions. The cadence is ≈ 10 seconds (i.e., a given slit position was repeated every 30 s). The time series was followed by a 63 step raster centered around the position where the slit was during the time series. The raster step size was 0.375 arcseconds. Adaptive optics (AO, Rimmele 2000) were used during the observing sequence but the compromised seeing conditions did not allow for continuous locking onto granulation. This caused the slit to jump occasionally, making the longest period with a stationary slit in the time series 17 time steps (8.5 min). The spatial resolution varied during the sequences, being at best less than an arc second, but on average a factor of two worse. Standard procedures for flat field and bias were used for the data reduction. Instrumental polarization was removed using the available calibration data, as explained in Socas-Navarro et al. (2006). No absolute wavelength calibration was attempted because no suitable telluric lines are present.
Instead a wavelength calibration using spatial pixels devoid of magnetic field was done by fitting the average spectrum to the Kitt Peak FTS spectral atlas (Neckel & Labs 1984). The FTS atlas was also used to find the normalization factor for the intensities to the quiet Sun continuum intensities. Because of detector flat-field residuals and prefilter shape, the continua in the raw data from both detectors are tilted. The tilts were removed a posteriori by subtracting a linear fit (y = a + bλ) obtained by matching the continuum intensity levels to those of the FTS atlas. The data were analyzed using both the raster and time series for statistical purposes. The period when the slit was stationary on the solar disk was used to study the time-dependent behavior of the lines. Because of the short length of this period, we do not present any Fourier analysis of the data. To make a classification of Stokes V profile morphologies, we did cluster analyses based on a Principal Component Analysis (PCA) in a similar manner as the work of Sánchez Almeida & Lites (2000) and Khomenko et al. (2003) for photospheric lines. We computed amplitudes for Stokes I and V profiles. Because the Ca line intensity profiles often exhibit strong self-reversals, no proxies for atmospheric velocities, such as lines’ centers of gravity or bisectors, are adequate. For those Stokes V profiles with amplitudes greater than $7 \times 10^{-3}\,I_c$ (i.e. ≥ 10σ), amplitude and area asymmetries were also calculated. The amplitude asymmetry of a Stokes V profile is defined by (Martínez Pillet 1997): $\sigma_a = \frac{a_b - a_r}{a_b + a_r}$, (1) where ab and ar are the unsigned extrema of the blue and red lobes of the Stokes V profile. The area asymmetry of a Stokes V profile is defined by (Martínez Pillet 1997): $\sigma_A = s \, \frac{\int V(\lambda)\, d\lambda}{\int |V(\lambda)|\, d\lambda}$, (2) where s is the sign of the blue lobe.
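The two definitions translate directly into a few lines of array code. The following Python is a sketch only, not the reduction code used for the observations: the function name, the use of a supplied line-center wavelength to separate the blue and red lobes, and the trapezoidal integration over the full input range are all assumptions made for illustration. The caller is expected to pass only the chosen integration range.

```python
import numpy as np


def stokes_v_asymmetries(wavelength, stokes_v, line_center):
    """Amplitude and area asymmetries of a Stokes V profile, Eqs. (1) and (2).

    wavelength  -- increasing wavelength grid covering the integration range
    stokes_v    -- Stokes V sampled on that grid
    line_center -- wavelength separating blue and red lobes (assumed known)
    """
    wavelength = np.asarray(wavelength, dtype=float)
    stokes_v = np.asarray(stokes_v, dtype=float)

    blue = stokes_v[wavelength < line_center]
    red = stokes_v[wavelength > line_center]

    # Unsigned extrema of the blue and red lobes.
    a_b = np.max(np.abs(blue))
    a_r = np.max(np.abs(red))
    amplitude_asymmetry = (a_b - a_r) / (a_b + a_r)

    # Sign of the blue lobe, taken at its extremum.
    s = np.sign(blue[np.argmax(np.abs(blue))])

    # Trapezoidal integrals of V and |V| over the supplied range.
    widths = np.diff(wavelength)
    signed = np.sum(0.5 * (stokes_v[1:] + stokes_v[:-1]) * widths)
    unsigned = np.sum(0.5 * (np.abs(stokes_v[1:]) + np.abs(stokes_v[:-1])) * widths)
    area_asymmetry = s * signed / unsigned

    return amplitude_asymmetry, area_asymmetry


# Synthetic antisymmetric profile (illustrative, arbitrary units).
dl = np.linspace(-1.0, 1.0, 201)
v_sym = -dl * np.exp(-(dl / 0.2) ** 2)
# Same profile with the blue lobe doubled: both asymmetries become 1/3.
v_asym = np.where(dl < 0, 2.0 * v_sym, v_sym)

print(stokes_v_asymmetries(dl, v_sym, 0.0))
print(stokes_v_asymmetries(dl, v_asym, 0.0))
```

For the antisymmetric profile both asymmetries vanish to within rounding; doubling the blue lobe gives (2a − a)/(2a + a) = 1/3 for the amplitude asymmetry and the same value for the area asymmetry.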
Because of the broad, deep lines and large velocities (compared with the photosphere) present in the chromosphere, the choice of the integration range for the area asymmetries is non-trivial for the Ca lines. We followed the same procedure as in P06. In the weak field regime the Stokes V profile is proportional to dI/dλ (strictly true only in the absence of atmospheric velocity and magnetic gradients). Inspection of the data showed that most of the observed Stokes V profiles have roughly the same structures as the dI/dλ profiles. The intensity in the blue wing (λ0) of the line profile was matched with a point in the red wing (λ1) with the same intensity. The signal-to-noise in the intensity profiles is much higher than in the Stokes V profiles and also the slope is much steeper. This makes matching points with the same value more accurate in the intensity than in the Stokes V profiles. The selection of a wavelength to start the integration range was made by choosing a wavelength point that is far enough from the line core so that self-reversals are not an issue. In our data this point, λ1, is at 600 mÅ from the reference wavelength of line center. The same value was used in

Fig. 1.— MDI magnetogram showing the position of the slit for the time series and the map (rectangular region). The observed region was close to the decaying active region, AR10763.

Magnetograms made from the 63 step scan are shown in Figure 2. The panels are in order of increasing formation height: Fe I 8497 Å, Fe I 8538 Å, Ca II 8498 Å and Ca II 8542 Å. The lower part of the slit was located above a flux concentration along the enhanced network and the upper part over an internetwork region with very little magnetic activity. The network becomes wider and more diffuse with increasing line formation height, as described by Giovanelli (1980). Not all magnetic flux seen in the photosphere can be identified in the chromosphere and vice versa.
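Since the magnetograms rely on the weak-field proportionality between Stokes V and dI/dλ, a short sketch of a weak-field estimate may help make the method concrete. It uses the standard weak-field relation V = −C g λ0² B dI/dλ with C = 4.6686 × 10⁻¹³ Å⁻¹ G⁻¹ and a least-squares fit over the profile. The function name, the least-squares form, and the use of `np.gradient` for the derivative are assumptions for illustration. The procedure actually used to produce Figure 2 is not specified beyond the reference to the weak field method.

```python
import numpy as np

# Standard weak-field constant for wavelengths in Angstrom and fields in Gauss.
C_WF = 4.6686e-13  # Angstrom^-1 Gauss^-1


def weak_field_blos(stokes_i, stokes_v, lam, lam0, g_eff):
    """Least-squares line-of-sight field (Gauss) from V = -C g lam0^2 B dI/dlam.

    stokes_i, stokes_v : profiles in the same units (e.g. continuum intensity)
    lam                : wavelength grid in Angstrom
    lam0               : line-center wavelength in Angstrom
    g_eff              : effective Lande factor of the line
    """
    didl = np.gradient(stokes_i, lam)        # numerical dI/dlambda
    scale = C_WF * g_eff * lam0 ** 2
    # Minimizing sum (V + scale * B * dI/dlam)^2 over B gives:
    return float(-np.sum(stokes_v * didl) / (scale * np.sum(didl ** 2)))
```

As a usage note, feeding the function a synthetic Gaussian absorption line and a Stokes V profile built from the same relation with B = 100 G returns 100 G. Real chromospheric profiles with self-reversed cores depart from this idealized case.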
However, interpreting the chromospheric magnetograms is difficult due to the self-reversed features in the cores of the Ca line Stokes V profiles.

Fig. 2.— Magnetograms of the map deduced by using the weak field method (Landi Degl’Innocenti 1992). The Stokes V signal is measured in units of Gauss. Vertical lines show the position of the slit during the time series. First panel: Fe I 8497 Å, second panel: Fe I 8538 Å, third panel: Ca II 8498 Å, fourth panel: Ca II 8542 Å. Location on the solar disk: S32.1, W17.3. The orientation of the magnetograms is 180 degrees from the MDI magnetogram in Figure 1. The plotted symbols (∗, ✸ and △) on the images show where the pixels discussed later in the text are located.

3. RESULTS

In Figure 3, Stokes I and V spectra of the solar surface under the slit are shown for both Ca II lines as well as the two photospheric Fe I lines in the Ca lines’ wings (marked by arrows). Since the Fe I 8497 Å line is blended in the Ca line’s wing and the Fe I 8538 Å line is very close to the edge of the detector, no quantitative analysis is done for them. No signal above the noise was recorded in Stokes Q and U, so they will not be addressed in what follows. Residual vertical fringing caused by the polarization modulator is visible in the Stokes V images. We chose not to try to remove the fringing since its amplitude is of the same order of magnitude as the noise.

The network, present in the lower part of the slit, is associated with less absorption in the intensity profiles. Both Ca lines often show self-reversals, which are usually stronger on the blue side of the line than on the red. The Stokes V profiles of both Ca lines have large, extended wings. At times, the profiles may have both polarities present on the blue side of the core, but in almost all cases the far blue wing of the profile has the same polarity (i.e., opposite sign) as the red wing. The Stokes I and V profiles of the chromospheric lines look distinctly different from the photospheric lines: the Ca lines have more structure, they are wider, and they exhibit more spatial variation than the photospheric Fe lines. Some differences are seen between the two Ca lines: the 8542 Å line is slightly broader, has more structure in the spectra and also stronger absorption than the 8498 Å line.

The internetwork region, present in the upper part of the slit, is mostly devoid of Stokes V signal, and Stokes I is more homogeneous than in the network. Self-reversals are usually not seen in the profiles. A small portion of the internetwork region has structures in Stokes I that are similar to those seen in the magnetic region: Stokes I is brighter than in the surrounding areas and the profiles show some self-reversals. Closer inspection of the images reveals a visible, albeit very small amplitude, Stokes V signal.

The spatial patterns of Stokes I and V amplitudes and asymmetries in the two Ca lines (Figure 4) are fairly similar to one another. The network is clearly visible in the Stokes I and V amplitudes, though it is more diffuse in the 8542 Å line. There is a structure in the upper part of the map that is seen best in the 8498 Å intensity image. Parts of this structure also appear in both lines’ Stokes V amplitude and asymmetry images. The edges of the network have more asymmetric Stokes V profiles. This is seen clearly in the 8542 Å amplitude asymmetry.

Photospheric velocities can be estimated from the locations of the iron lines’ intensity minima. Except for a nearly constant offset caused by the convective downflows in the network, the internetwork and network regions have very similar spatial and temporal patterns.

3.1. Classification of the Stokes V profiles

To classify the shapes of the 8498 Å and 8542 Å Stokes V profile pairs we used PCA (Rees et al. 2000) and cluster analysis.
The cluster analyses were performed separately for the map and the time series. Here we present a summary of the PCA procedure and cluster analysis for completeness.

With the PCA we are able to reduce the number of parameters needed to describe a given profile. Each profile, $S(\lambda_j)$, $j = 1, ..., N_\lambda$ ($N_\lambda$ is the number of wavelength points in the profile), is composed of a linear combination of eigenvectors $e_i(\lambda_j)$, $i = 1, ..., n$:

$$S(\lambda_j) = \sum_{i=1}^{n} c_i e_i(\lambda_j), \qquad (3)$$

where the $c_i$ are appropriate constants. The eigenvectors and constants for a given set of profiles are obtained from a singular value decomposition (SVD, Rees et al. 2000, Socas-Navarro et al. 2001) and form an orthonormal basis with $N_\lambda$ eigenvectors:

$$\sum_{j=1}^{N_\lambda} e_i(\lambda_j) e_k(\lambda_j) = \delta_{ik}. \qquad (4)$$

Not all eigenvectors contain the critical information needed to reproduce the profiles; some of the eigenvectors carry information about the noise pattern. We can therefore truncate the series expansion and use only a small number of eigenvectors and corresponding coefficients to reproduce a given profile. The PCA guarantees that when the expansion of Eq. 3 is truncated at a given order m, the amount of information in the lower orders is maximized.

We performed the SVD for the 8498 Å and 8542 Å Stokes V profiles separately. The resulting orthonormal bases, and also the cluster analysis, depend on the subset of profiles used to construct them. Because of this we included all Stokes V profiles from pixels where the 8498 Å Stokes V amplitude is above 7 × 10⁻³ Ic, altogether 13671 profiles. Visual inspection of the eigenvectors shows that the first 11 eigenvectors (approximately) contain relevant information about the actual shape of the profiles, whereas the remainder are associated with the noise patterns.

Fig. 3.— Dispersed images of the slit. The arrows mark the locations of the two photospheric iron lines and the horizontal lines in the intensity images are the hairlines used to spatially coalign the two detectors. Wavelengths are measured from 8498 Å (left) and 8542 Å (right).

The Stokes V profile pairs, now described with 11 × 2 coefficients corresponding to the 11 × 2 eigenvectors instead of 102 (51 for each profile) wavelength points, were organized into a predefined number of clusters. Before doing this the vectors consisting of the 22 coefficients were standardized, i.e., no information on the absolute Stokes V amplitudes is left, only the relative amplitudes of the 8498 Å and 8542 Å profiles. Based on the values of the coefficients, 6 cluster centers were identified using the k-means method (MacQueen 1967). It starts with k random clusters, which through iterations are changed to minimize the variability within a cluster and maximize it between clusters. Each profile pair is then assigned to the nearest cluster center in the 22-dimensional Euclidean space.

The choice of the number of clusters used for the cluster analysis is non-trivial. Since each data point is described by 22 numbers we cannot visually distinguish patterns in the spatial distribution of the points. Instead the number of clusters was defined by trial and error, i.e., so that each profile type in the time series or map is represented and each cluster is still clearly distinct from the others. For each cluster a profile was constructed using the eigenvectors and the averaged 2 × 11 coefficients of all profiles belonging to that cluster. Cluster analysis of the map shows the shapes of Stokes V profiles in network regions with different magnetic topologies, whereas the time series analysis describes how a set of profiles from a certain magnetic topology changes with time.

The results for the map are shown in Figure 5. Above each profile is the percentage of all profiles belonging to the cluster, the mean distance in the Euclidean space of the profiles to the cluster center, and the standard deviation of the mean. The smaller the distance to the cluster center, the more compact the cluster is and the better the cluster describes the profiles. The standard deviation is proportional to the spread of the distances in each cluster. In general, clusters with the fewest profiles belonging to them have larger mean distances.

Fig. 4.— Maps of the 8498 Å and 8542 Å lines’ Stokes I amplitudes, Stokes V amplitudes, area asymmetries and amplitude asymmetries in the raster scan. The horizontal line seen in the amplitude images is a hairline used to spatially coalign the detectors. The vertical line shows the position of the slit during the time series. Note that the x-axis is stretched compared with the y-axis.

Three points can be deduced from the figure. First, asymmetric profiles are common; in fact, they appear to be more common than symmetric ones. Second, even though the two Ca lines are formed fairly close to one another (the 8498 Å line core optical depth is unity at about 1 Mm and the 8542 Å 0.2 Mm higher up in the radiation hydrodynamic simulations by Carlsson & Stein 1997), the 8498 Å and 8542 Å profiles in a given cluster are often clearly different from one another. Third, in all cluster profiles the far-red wings have the same polarity as the far-blue wings, indicating that the lower parts of the atmosphere along the line of sight, where the wings are formed, are dominated by a single magnetic polarity.

The clusters differ from one another in several ways: the degree of asymmetry, the relationship between the 8498 Å and 8542 Å line profiles, relative amplitudes, etc. However, quantitative measures of the clusters, such as profile asymmetries, do not necessarily represent the members of a given cluster very well.
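The classification pipeline summarized in Section 3.1 (PCA via SVD, truncation to a few eigenvectors, normalization of the coefficient vectors, k-means clustering, and the per-cluster statistics quoted in Figure 5) can be sketched as below. This is an illustrative sketch under stated assumptions, not the code used for the analysis: all function names are hypothetical, "standardized" is read here as scaling each coefficient vector to unit length so that only relative amplitudes remain, and the k-means initialization picks k random data points as starting centers.

```python
import numpy as np


def pca_coefficients(profiles, n_keep):
    """SVD-based PCA. profiles has shape (n_profiles, n_lambda).

    Returns the coefficients c_i of Eq. 3, truncated to n_keep terms,
    and the matching orthonormal eigenvectors e_i (rows), as in Eq. 4.
    """
    _, _, vt = np.linalg.svd(profiles, full_matrices=False)
    eigvecs = vt[:n_keep]
    coeffs = profiles @ eigvecs.T
    return coeffs, eigvecs


def unit_normalize(x):
    """Assumed standardization: scale each coefficient vector to unit norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms > 0, norms, 1.0)


def _nearest(x, centers):
    """Index of the nearest center (Euclidean distance) for each row of x."""
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; an empty cluster keeps its previous center."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(x, centers)
        new_centers = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _nearest(x, centers), centers


def cluster_stats(x, labels, centers):
    """Per cluster: (percentage of members, mean distance, std of distances)."""
    stats = []
    for c in range(len(centers)):
        d = np.linalg.norm(x[labels == c] - centers[c], axis=1)
        stats.append((100.0 * d.size / len(x), float(d.mean()), float(d.std())))
    return stats
```

For profile pairs, the coefficients of the two lines would be obtained with separate calls to `pca_coefficients` and joined with `np.hstack` before normalization. Reconstructing `coeffs @ eigvecs` gives the truncated, largely noise-free profiles.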
For example, the variation of Stokes V amplitude asymmetries within a cluster is large and the mean is not necessarily the same as that of the cluster profile. The cluster analysis retrieves qualitative similarities and gives a basis for morphological classification, rather than representing quantitative similarities within the data. To illustrate this point, Figure 6 displays histograms of the clusters showing the Stokes V amplitudes and asymmetries for all profiles belonging to a given cluster.

Shown in Figure 7 is the spatial distribution of the clusters. The smallest network patches often consist of only cluster 1 and cluster 2 profiles. The middle of the largest network patch is a mixture of different clusters. In most cases, the profiles at the edges of the network patches belong to cluster 1. This is the most common cluster, comprising 35.6% of all the profile pairs in the map. The cluster 1 profiles are asymmetric, 8542 Å more so than 8498 Å, and they also have opposite signs of amplitude and area asymmetries. The amplitude histograms of profiles belonging to this cluster show that they have in general low amplitudes, as one might expect from profiles located at the edges of the network. The large amplitude asymmetry in the 8542 Å cluster profile is not seen in the observed profiles. In fact, only very few profiles exhibit such large asymmetries and there is only a slight tendency for the profiles to have negative rather than positive amplitude asymmetries. The cluster area asymmetries are in better agreement with the observed profiles belonging to this cluster.

Regions of cluster 2 profiles are often located adjacent to patches of cluster 1 profiles. The cluster 2 profiles account for 20.0% of all profile pairs in the map. The cluster profiles are fairly antisymmetric. This is seen in the observed profiles as well: the asymmetry histograms tend to be narrow and only slightly offset from zero.
The relative amplitudes of the two cluster profiles are very different: the 8498 Å amplitude is a factor of 3 larger. The disproportionality is not as large in the observed profiles, though the amplitude histograms show that in general 8498 Å has a larger amplitude than 8542 Å. The range of observed amplitudes is considerably larger than in cluster

Of the profile pairs in the map, 14.1% belong to cluster 3. These profiles, too, are often found in regions close to the network edges, next to the patches of cluster 1 profiles. Both cluster profiles have multiple lobes and are asymmetric, 8498 Å more in amplitude and 8542 Å in area. This is also seen in the histograms of the observed asymmetries. There is a strong emission feature on the blue side of the line in the 8498 Å cluster profile. It is weaker in the 8542 Å profile. The histograms for cluster 3 are nearly identical to those of cluster 1. This illustrates how cluster analysis based on PCA captures the qualitative differences in the line profiles.

Fig. 5.— Results of cluster analysis of the Stokes V profile pairs in the map. The line on the left is 8498 Å and on the right 8542 Å. Shown are the percentages of profile pairs belonging to each cluster, and the mean distance and its standard deviation of the profiles to the cluster center.

Cluster 4 consists of 13.5% of the profile pairs. Most of the observed profiles belonging to this cluster are near the middle of the largest network patches. The 8498 Å cluster profile is dominated by a strong emission feature in the blue lobe. This feature is not visible in the 8542 Å cluster profile. The overlap between the two lines’ amplitude histograms is fairly small. The cluster profiles also show this difference in the relative amplitudes: 8542 Å has a significantly lower amplitude than 8498 Å. Except for the 8542 Å area asymmetry histogram, all histograms are centered around zero.
The range of area asymmetries in the 8542 Å line is large and the distribution is skewed towards negative values. This trend in the 8542 Å area asymmetries is seen in several of the clusters.

Fig. 6.— Stokes V statistics of the map clusters. The histograms are for all profiles belonging to the given cluster and the dotted vertical lines show the area and amplitude asymmetries for the cluster profiles.

The patches of profiles belonging to the fifth cluster (9.6%) are also found in the less homogeneous middle regions of the network elements. The 8498 Å cluster profile has a factor of 2 lower amplitude than 8542 Å. This is not seen in the amplitude histograms but there is a large overlap between the two histograms. The cluster profiles are fairly antisymmetric and the histograms of observed profile asymmetries are also centered around zero. The 8542 Å area asymmetry is again the exception: it is centered around a negative value.

Cluster 6 is the smallest cluster with 7.2% of the profiles. Patches of cluster 6 profiles are located in regions with cluster 4 and 5 profiles. The 8498 Å cluster profile is very similar to that of cluster 5. Like cluster 5, the 8542 Å cluster 6 profile has a factor of 2 larger amplitude and the amplitude histograms overlap nearly entirely. All the cluster 6 histograms are very similar to cluster 5. The major difference between the two is that there is very little structure in the 8542 Å line profile.

3.2. Time-dependent behavior

The cluster analysis results of the time series are shown in Figure 8, and the spatio-temporal distribution is captured in Figure 9. The clusters consist of profiles at rest with varying degrees of structure, and profiles where the blue side is in emission. While there are temporal changes in the clusters, there are no clear periodic patterns visible. Most slit positions have a preferred cluster or in some cases the slit position is dominated by two clusters.
Positions where more than 2 clusters are dominant are rare.

Because the slit moved occasionally during the time series, no meaningful power spectra can be made from this data set. The time series data do, however, allow for a qualitative analysis of the time-dependent behavior. Comparing network and internetwork pixels reveals some interesting features: the network, especially in the intermediate flux regions, is very dynamic, with propagating shock-like features and large self-reversals appearing frequently in both Stokes I and V. In comparison, the internetwork is less dynamic; intensity oscillations are present but they are much weaker than in the network. No structures indicating the presence of shocks are seen in the internetwork profiles. In agreement with prior observations of chromospheric lines (e.g., Noyes 1967), any oscillations in the network appear to have a longer period than in the internetwork. We now examine three different regions, namely an internetwork pixel, an intermediate flux network pixel, and a strong network pixel.

3.2.1. Internetwork

In Figure 10 the time evolution of a typical internetwork pixel is shown. The location of the pixel is marked by an asterisk in Figure 2. The data were taken when the slit was stationary. No Stokes V signal above the noise level is seen in the pixel. The Stokes I profiles of both Ca lines change periodically in width and position of the line center, but no self-reversals are seen. Also, the line-wing intensity shows some oscillations.

Fig. 7.— Spatial distribution of the clusters in the map. The black areas (0 cluster) correspond to regions where the Stokes V amplitudes are below 7 × 10⁻³ Ic and where no cluster analysis was performed.

3.2.2. Intermediate flux network

The difference between the internetwork and network regions with intermediate flux (Fig. 11) is dramatic: the network region is much more dynamic, and highly asymmetric profiles, in both lines’ Stokes I and V, are seen. The time-dependent behavior of the photospheric iron line is quite similar to what is seen in the internetwork.

The Stokes I in both lines has a clearly oscillating behavior, with bright, very asymmetric episodes followed by darker, more symmetric episodes. The period of the oscillation is about 4 minutes, i.e., its frequency (≈ 4.2 mHz) is below the acoustic cutoff frequency (about 5.3 mHz). This may be caused by the presence of inclined magnetic fields, which can effectively lower the acoustic cutoff frequency (Bel & Leroy 1977). The time evolution of the 8542 Å Stokes I has a diagonal structure moving from blue to red. This indicates the presence of propagating compressible waves (Carlsson & Stein 1997). The bright part, which corresponds to a large self-reversal, is clearly shifted towards the blue. This is seen in the 8498 Å line profiles as well, although these profiles tend to be more flat-bottomed. In general, the self-reversals and overall variation are larger in the blue wing than in the red. This is true for all slit positions which exhibit strong time-dependent behavior.

The Stokes V image of the 8498 Å line also shows strong diagonal structures that coincide in time with the dark phases of Stokes I. Inspection of individual profiles (Fig. 12) reveals a pattern of multiple lobes in the Stokes V profiles. These lobes are on the blue side of the line core and their amplitudes and positions vary periodically in time, resulting in the diagonal structure seen in the image. The lobes can be identified with the emission features seen in the Stokes I profiles. The 8542 Å line Stokes V image shows a pattern of multi-lobed profiles whose amplitudes vary strongly in time. The large Stokes V amplitude phase coincides with the bright, very asymmetric phase seen in the intensity profiles. The red wings always exhibit less structure and variation than the blue wings.

3.2.3. Strong network

Stokes I and V profiles seen in the strong network regions (Figs. 13 and 14) would appear at first glance to be a mixture of the less dynamic internetwork and the highly dynamic intermediate flux region. The Stokes I profiles exhibit the same pattern of bright (more asymmetric) and dark (less asymmetric) phases as seen in the intermediate flux region. The difference between the two phases is, however, not as large: the amplitude of the self-reversals, especially in the 8542 Å intensity profiles, is much smaller than in the intermediate flux case.

The Stokes V images resemble those of the intermediate flux region: some diagonal structures are seen, but they are weaker. The 8542 Å line Stokes V profiles have a time-varying amplitude but the profiles are not as asymmetric and they are not necessarily multi-lobed. The difference between the time-dependent behavior of the red and blue lobes of the profile, i.e., the red lobe varies less in time, is even clearer here than in the intermediate flux region.

Fig. 8.— As Fig. 5 but for the time series.

3.3. Statistics

Histograms of the Stokes I amplitude integrated over 250 mÅ around the line core for the two Ca lines are shown in the top-left panel of Figure 15. These histograms include both the map and time-series profiles. Because there are almost five times as many profiles in the time series as there are in the map, the histograms are dominated by the time-series profiles. Both lines exhibit a wide range of values. Except for the peaks at low intensities, the histograms are fairly flat. The darkest (i.e., lowest core intensity or most absorption) amplitudes are associated with the internetwork, and the brightest with the network.

Histograms of the Stokes V amplitudes (top right panel of Fig. 15) peak at the same value in both lines, 0.003 Ic, but the 8498 Å histogram tail decays more slowly.
Since the 8498 Å line is formed the lower of the two and the lines are roughly equally sensitive to magnetic fields (effective Landé g factors are 1.07 and 1.10 for the 8498 Å and the 8542 Å lines, respectively), it is not surprising that the 8498 Å histogram has the longer tail.

Fig. 9.— The spatio-temporal distributions of the clusters for the first slit position in the time series. The black areas correspond to regions where the Stokes V amplitude is ≤ 7 × 10⁻³ Ic. The vertical lines show the period with the best seeing when the slit was stationary.

Both lines’ Stokes V amplitude asymmetry histograms (bottom left panel of Fig. 15) have very similar shapes and similar widths. There are more positive asymmetries in both lines: 56% in 8498 Å and 64% in 8542 Å (Table 1). The mean amplitude asymmetries are also positive, and the 8542 Å mean asymmetry is two times larger. There are more negative amplitude asymmetries in the 8542 Å map than in the time series. Non-zero amplitude asymmetries indicate at least one of two things: either the spatial pixels consist in most cases of at least two atmospheric components that are shifted relative to one another, or there are velocity and/or magnetic field gradients present in the atmosphere.

The area asymmetry histograms (bottom right panel of Fig. 15) of the two calcium lines repeat the pattern already seen in the cluster profiles: the 8542 Å histogram is centered around a negative value and the 8498 Å histogram is centered at roughly zero, though the mean is slightly positive. The 8542 Å histogram is significantly wider than the 8498 Å histogram. A multi-component atmosphere alone cannot produce area asymmetries, so the existence of non-zero area asymmetries indicates the presence of velocity and possibly magnetic gradients in the atmosphere.

In the 8542 Å line 66% of the profiles have negative area asymmetries, whereas in the 8498 Å line the majority of the profiles, 64%, have positive area asymmetries (Table 1).
To better understand why the area asymmetry histograms of the lines are so different, we need to look at the components of the area asymmetry separately, i.e., the sign of the blue lobe and the total area of the Stokes V profile. One possible cause for the difference in the histograms might be that the distribution of signs of the blue lobe is different in the two lines. Closer inspection reveals that this is not the explanation. The vast majority of profiles in both lines, over 80%, have a negative sign. (Here the sign is defined to be the sign of the local maximum or minimum amplitude of the blue lobe.) A second possible explanation is that $\int V(\lambda)\,d\lambda$ is different in the two lines. This is found to be the case. The 8542 Å line has more profiles with a positive area and the 8498 Å line has slightly more profiles with a negative area. (Note that the sign of the area asymmetry is the product of the sign of the blue lobe and the sign of the area; Eq. 2.) The area of the Stokes V profile is strongly affected by the emission features. These features, and their amplitudes, are related to the self-reversals seen in the Stokes I profiles. The self-reversals are stronger on the blue side of the line core than on the red. In general, the blue lobes of the Stokes V profiles have negative amplitudes, and the effect of the emission features is then to reduce the amplitude and, in some cases, make it positive, thereby reducing the overall negative area.

Fig. 10.— Time dependent behavior of Stokes I in an internetwork pixel. Location of the pixel is marked with an asterisk in Figure 2.

The effect of the emission features on the amplitude asymmetries is not as large because the amplitude will be affected only if the emission feature is located at the same wavelength as the maximum absolute amplitude. Also, if the profile has a wide blue lobe, i.e., the wings contribute significantly, a local reduction in peak amplitude is counterbalanced by a comparable signal in the other parts of the blue lobe.
The resulting profile will have nearly the same amplitude in the blue lobe as before, but the area will be reduced, leading to a smaller, or even negative, area asymmetry. Since the self-reversals are larger in the 8542 Å line, this scenario is more likely to apply to it than to the 8498 Å line.

Both lines’ area and amplitude asymmetries are found to be inversely proportional to the Stokes V amplitudes. The scatter, especially in the 8542 Å line, is fairly large.

PCA also allows us to ensure that the determination of Stokes V asymmetries is not dominated by noise. Reconstructing the profiles using only the first 11 eigenvectors (i.e., essentially noise-free profiles) and then computing the asymmetries reproduces the Stokes V amplitude and asymmetry histograms. To test if the negative histogram peak in the 8542 Å line is an artifact caused by data reduction, we computed area asymmetries for the datasets after first removing the fringe pattern caused by the optics. This did not alter the area asymmetry histogram. Another artifact that could cause the offset is an incorrect subtraction of the tilt caused by the detector in the continuum intensity. Removing the offset in the histograms by changing the tilt causes a clearly visible lopsidedness in the Stokes I profiles. Lastly, to make sure that the choice of the integration range is not the cause of the offset, we used a constant bandwidth for the area asymmetries; this also reproduces the 8542 Å area histogram offset. (Besides these issues, there are no other obvious artifacts that would cause the offset.) We therefore conclude that the offset is not caused by the fringing or incorrect subtraction of the tilt in the continuum intensity.

4. COMPARISON OF OBSERVATIONS WITH A HIGH-β SIMULATION

In P06 we synthesized Stokes profiles for the Ca IR triplet lines in the high-β regime.
This was done by combining a radiation hydrodynamic code (see for example Carlsson & Stein 1997) with a weak magnetic field and using a nLTE Stokes inversion and synthesis code (Socas-Navarro et al. 2000b) to produce, based on snapshots of the simulation, a time series of the lines’ Stokes vectors. The simulation is driven by a photospheric velocity piston and its dynamics are dominated by upward propagating acoustic waves in a simple magnetic field topology. The simulation shows that the radiative transfer is very similar in all the Ca IR triplet lines. The differences between the line behaviors in the simulation are mainly due to the lines having slightly different formation heights and thus experiencing a difference in the amplitudes of the shocking waves: the higher the line is formed, the larger the amplitude of the passing wave is. In the simulation there is no feedback from the magnetic fields on the dynamics and the waves are purely acoustic. The observations have limited spatial and temporal resolutions whereas the simulation is much better resolved.

Fig. 11.— Time dependent behavior of Stokes I and V in an intermediate flux pixel. Location of the pixel is marked with a diamond in Figure 2.

Fig. 12.— Time evolution of individual Stokes I and V profiles in an intermediate flux pixel. Location of the pixel is marked with a diamond in Figure 2.

Fig. 13.— Time evolution of Stokes I and V in a network pixel. Location of the pixel is marked with a triangle in Figure 2.

Fig. 14.— Time evolution of individual Stokes I and V profiles in a network pixel. Location of the pixel is marked with a triangle in Figure 2.

Fig. 15.— Histograms of Stokes I and V amplitudes, and Stokes V amplitude and area asymmetries of the map and time-series.

4.1. Comparison of time dependent behavior

As the acoustic waves in the simulation propagate upwards and eventually form shocks, a time-varying pattern of disappearing and reappearing Stokes V lobes is seen (Fig. 16).
The pattern is strongest in the highest forming line, i.e., 8542 Å. Wave propagation is also seen in the Stokes I profiles. There are no large self-reversals or brightenings; instead, the position of the line minimum changes periodically and forms a saw-tooth-like pattern where the red shift phase takes more time than the blue shift phase.

If we first compare the simulated profiles to the internetwork observations (Fig. 10), we see that the strong signatures of shocks seen in the simulation are not present in the observations. In the simulation the Ca IR triplet is formed in a region where the waves are just beginning to shock. If the formation height of the lines or the shocks in the simulation is off, compared to the real Sun, by a small amount, even 50 km, the lines’ temporal evolution may look very different. Another possible explanation for why we see no strong indications of shocks is the temporal and/or spatial resolution: there may be several components oscillating out of phase relative to one another in a given resolution element. However, the photospheric velocities are very similar in the internetwork and network, but the network profiles show strong self-reversals. This suggests that spatial and temporal resolution alone cannot explain the lack of strong signatures of shocks in the internetwork.

Observations of the quiet Sun show varying degrees of oscillatory power (compare, for example, the Ca II H and K data of Lites et al. 1993 or the UV data of Judge et al. 2003, McIntosh & Judge 2001 and Wikstøl et al. 2000). This variation may be related to the local magnetic topology, especially to the possible existence of a magnetic canopy (McIntosh et al. 2003; Vecchio et al. 2006). The region observed here was less oscillatory than average but still not exceptionally quiet.

Both the simulated profiles and observed network profiles (Fig. 13) show time-varying patterns where the Stokes I and V amplitudes change periodically.
In the simulation the wave propagation manifests itself in the Stokes I profiles most clearly as a shift of the line core and the saw-tooth shape of the time series. In the observations, waves cause the lines’ periodically varying self-reversals that result in alternating bright and dark phases. There are indications of diagonal structures in the observed Stokes I images, but they are not nearly as clear as in the simulation. In the simulation the upward propagating waves cause the blue and red lobes of the Stokes V profiles to disappear alternately. In contrast, the observed time varying pattern in Stokes V looks more complicated: there is much more structure in the observed profiles, especially in the line cores, than in the simulation. This is related to the simulated profiles not exhibiting strong self-reversals as seen in the observations.

In the simulation, because of radiative cooling and expansion of the falling material, the down flows are in general cooler than the up flows. In the synthesized profiles this manifests itself by the red wings of the Stokes I profiles showing less variations, though the difference with the blue wing is quite small. Similar behavior is also seen in the observations: the self-reversals are in general larger in the blue wing of the Stokes I profiles and the red lobes of the Stokes V profiles show clearly less variation.

4.2. Comparison of statistics and Stokes V morphologies

In the simulation the magnetic field decays exponentially with height and therefore the Ca II Stokes V amplitudes are significantly lower than the Fe I 8497 Å amplitude. In the observations the Ca and Fe line Stokes V profiles have roughly the same amplitudes. This may be explained by the field decaying much slower with height in the observations, or by the filling factor in the observations being smaller in the photosphere than in the chromosphere.
Both Ca II lines’ observed Stokes V profiles have a significant amount of signal in the wings. In the simulations only the 8498 Å line Stokes V has extended wings with large amplitudes (Fig. 4 in P06). The amount of signal in the wings depends on the atmospheric magnetic field gradient. If there is no gradient the wings of all three Ca lines have very little signal. In contrast, a model atmosphere with a constant field gradient produces profiles where all lines, 8498 Å the most, have some signal in the line wings, and an exponential field produces profiles with the largest wings. Depending on where the gradient is located and how strong the field is, the Ca lines may or may not have similar Stokes V profiles. Based on the profile shapes and relative amplitudes, it is obvious that the magnetic topology in the observations is different from the simulation.

Fig. 16.— Time evolution of Stokes I and V profiles in the high-β simulation (P06). The Stokes V signal in wavelength range -1.2 to -0.6 Å in the 8498 Å image is scaled down by a factor of 7.5 in order to display both the Ca II 8498 Å and Fe I 8497 Å lines in the same panel.

Formation of area and amplitude asymmetries in the simulation is coupled. The correlation is especially strong in the 8542 Å line (upper row of Fig. 17). In the 8498 Å Stokes V profiles the strong wings affect the asymmetries, and the correlation is weaker. The observed area and amplitude asymmetries of both lines show less correlation. This is at least partly because the observed profiles have more complex shapes than in the simulation.

The lower panels in Figure 17 show the Stokes V asymmetry histograms for the simulation. The observed histograms are re-plotted to enable direct comparison. In the simulation both lines’ amplitude and asymmetry histograms are centered roughly around zero (percentage-wise there are a couple of percent more negative than positive asymmetries).
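The amplitude and area asymmetries compared in this section can be computed directly from a single Stokes V profile. The following is a minimal Python sketch, assuming the common normalized-difference definitions, (blue − red)/(blue + red), applied to the unsigned lobe amplitudes and unsigned lobe areas, with the sign convention that a positive value means the blue lobe is larger. The function name `stokes_v_asymmetries`, the helper `trapezoid`, and the synthetic five-point profile are hypothetical and serve only as an illustration; they are not the code used for the analysis.

```python
def trapezoid(x, y):
    """Trapezoidal integral of y over x (plain Python lists)."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def stokes_v_asymmetries(wavelength, stokes_v, line_center):
    """Return (amplitude asymmetry, area asymmetry) of a Stokes V profile.

    Assumed definitions (illustrative, not taken from the paper):
        amplitude: (a_b - a_r) / (a_b + a_r)
        area:      (A_b - A_r) / (A_b + A_r)
    where b/r denote the blue/red side of line_center and a, A are the
    unsigned peak amplitude and unsigned area of each lobe.
    If line_center is not a grid point, the area between the two points
    straddling it is ignored.
    """
    blue = [(w, abs(v)) for w, v in zip(wavelength, stokes_v) if w <= line_center]
    red = [(w, abs(v)) for w, v in zip(wavelength, stokes_v) if w >= line_center]

    amp_blue = max(v for _, v in blue)
    amp_red = max(v for _, v in red)
    area_blue = trapezoid([w for w, _ in blue], [v for _, v in blue])
    area_red = trapezoid([w for w, _ in red], [v for _, v in red])

    if amp_blue + amp_red == 0 or area_blue + area_red == 0:
        raise ValueError("profile has no Stokes V signal")

    amplitude_asym = (amp_blue - amp_red) / (amp_blue + amp_red)
    area_asym = (area_blue - area_red) / (area_blue + area_red)
    return amplitude_asym, area_asym


# Synthetic profile: the blue lobe is twice as strong as the red lobe,
# so both asymmetries come out positive.
wl = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [0.0, 2.0, 0.0, -1.0, 0.0]
amp_asym, area_asym = stokes_v_asymmetries(wl, v, 0.0)
```

Applied to every profile of a map or time series, the two returned values would give the kind of samples from which asymmetry histograms and the amplitude-area correlation are built.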
In the observations this was not the case: all the asymmetries, except the 8542 Å area asymmetry, have clearly more positive than negative values, i.e. the blue lobe is larger in area/amplitude than the red lobe.

The observed 8498 Å profiles are more dynamic than the simulated ones. Consequently the observed 8498 Å asymmetry histograms are clearly wider than the simulated. Because there is very little signal in the simulated 8542 Å Stokes V profile wings, when an upward propagating wave causes a Stokes V lobe to disappear, there is no signal in the line wing to contribute to the amplitude. This leads to the extreme amplitude asymmetries in the simulations and to the additional lobes at large values in the simulated 8542 Å line area asymmetry histogram. Since the observed profiles have a significant amount of signal in the wings, the extreme amplitude asymmetries are moderated, and no lobes at large values are seen in the histogram.

5. CONCLUSIONS AND DISCUSSION

So far most spectropolarimetric studies using the Ca II IR triplet lines have focused on active regions (e.g., Socas-Navarro, Trujillo Bueno, & Ruiz Cobo 2000a; López Ariste, Socas-Navarro, & Molodij 2001; Socas-Navarro 2005; Uitenbroek, Balasubramaniam, & Tritschler 2006). The observations presented here show that these lines are also promising candidates for studying the magnetic chromosphere outside of active regions. Interpreting the observations, however, is not straightforward. The main results of the analysis presented here are:

• Classification of Stokes V profile shapes. Asymmetric line profiles are very common, and the two lines, despite being formed fairly close in a geometrical sense, often do not have similar shapes. Furthermore, the edges of the network patches exhibit profile shapes different from those seen in the center of the patches. The cluster analysis results, as expected, in a qualitative, not quantitative, description of the profile shapes.
• Statistics of the line profiles. The 8542 Å area asymmetry is predominantly negative, while the 8498 Å area asymmetry and the amplitude asymmetries are usually positive.

• Time dependent behavior. The enhanced network has very different dynamic behavior compared with the internetwork. It is more dynamic and the oscillation period, as seen in both Stokes I and V, is greater than in the internetwork.

• Comparison with high-β simulation. Oscillations are present in both the observations and the simulation. The simulated profiles are more dynamic than the observed internetwork profiles. The opposite is true for network profiles. In the simulation, the formation of asymmetries is more tightly coupled than what is seen in the observations. Except for the 8542 Å amplitude asymmetry the observed profiles show a wider range of asymmetries. And lastly, the peculiar negative area asymmetries seen in the observed 8542 Å line and the tendency of the other asymmetries to be positive are not reproduced by the simulation.

The tendency of large Stokes V asymmetries to decrease with an increasing signal amplitude has also been observed in photospheric lines (Grossmann-Doerth et al. 1996). In the photosphere a magnetic canopy is one possible explanation: the canopy gives rise to asymmetries in the lines, and as a flux tube diameter increases, the relative contribution from the canopy to the Stokes V signal decreases. In the photosphere the scatter in an amplitude vs. asymmetry plot is significantly larger in the area than in the amplitude. No large difference is seen in the area and amplitude asymmetry scatters of the Ca II lines. In the quiet Sun photosphere, more positive than negative Stokes V asymmetries are found (Grossmann-Doerth et al. 1996).
In contrast with the 8498 Å line (where there is no large difference in the mean area and amplitude asymmetries) the photospheric mean area asymmetries are significantly smaller (4 % in the Fe I 6302 Å line) than the mean amplitude asymmetries (15 % in the Fe I 6302 Å line). The photospheric asymmetries are often attributed to multiple atmospheric components within a resolution element. In the chromosphere, however, gradients have to play a dominant role since the formation of area asymmetries requires them. Another piece of evidence of the importance of gradients in the chromosphere is that Milne-Eddington inversions, which include the Paschen-Back effect of the He I 10830 Å triplet, are not able to reproduce the observed area asymmetries (Sasso et al. 2006).

Fig. 17.— Stokes V asymmetries of the simulated and observed profiles. Upper 4 panels show the correlation of amplitude and area asymmetries in the simulated and observed Ca lines. The Pearson correlation coefficient for each case is given. The asterisk symbols show the mean for each 0.1 wide bin and the error bars show the standard deviation. The lower panels are histograms of observed and simulated amplitude and area asymmetries.

Khomenko et al. (2005) used a 3-dimensional magnetoconvection model to synthesize photospheric magnetically sensitive lines in the visible and IR. There are more positive than negative Stokes V asymmetries in their synthetic profiles. They found that reducing the spatial resolution increases the number of irregular Stokes V profiles (though the number of strongly asymmetric profiles decreases). They conclude that the asymmetries reflect more inhomogeneities in the horizontal direction than in the vertical. In the chromosphere large velocity gradients are more common and variations in the vertical direction are likely to be more important than variations in the horizontal direction.
When these two factors are combined with the observed area asymmetries, one concludes that the chromospheric asymmetries mainly reflect the line-of-sight inhomogeneities, and not variations in the horizontal direction. Despite the apparent similarities between the photospheric and chromospheric Stokes V profiles, the underlying mechanism causing the asymmetries does not appear to be the same. Drawing parallels between the chromosphere and photosphere is problematic since the two regions exist in very different physical regimes.

The discrepancy between the Stokes V asymmetry histograms of the observations and the simulation may be related to the self-reversals. The simulated profiles exhibit only small self-reversals. The observations show large self-reversals in the Stokes I profiles and accompanying emission features in the Stokes V profiles. These features are stronger on the blue side of the line cores. Another effect that contributes to the imbalance is that the down flow phase lasts longer. Our observations, especially with a 5 second exposure time, sample more profiles with red-shifts and positive asymmetries (since there will be more emission on the blue side). However, inspection of Fig. 16 shows the same to be true of the simulations. If this is the case, why are there not more positive than negative asymmetries in the simulation as well?

The sample of these observations is limited because the majority of the profiles are drawn from the same three slit positions which sample the same local magnetic field configuration. It would not be surprising if histograms made of profiles from a variety of quiet-Sun magnetic field topologies would have somewhat different shapes. The complexity of the observed profiles makes the interpretation of the area and amplitude asymmetries difficult.
Because of multiple lobes and the strong signal in the line wings, the asymmetries are not necessarily good proxies for the overall complexity of the Stokes V profiles. This is especially true if the two asymmetries are viewed separately.

It is a well known result that the network intensity oscillations have a longer period than the internetwork (e.g. Orrall 1966, Lites et al. 1993, Banerjee et al. 2001). This has also been observed before in the Ca II IR lines (Deubner & Fleck 1990). Why do the intermediate flux regions in our observations appear to be more dynamic than the stronger flux regions? It may be related to a more complex magnetic topology at the edges of the network patches. The observations show no signal above the noise in Stokes Q and U, so we cannot draw any conclusions about possible horizontal fields. Any signal would be affected by atomic polarization (Manso Sainz & Trujillo Bueno 2003) making the interpretation exceedingly complex. The filling factor in the network is not likely to be very large, and is likely smaller at the edges than in the center of the network patch. Inversions by Bellot Rubio et al. (2000) of average Stokes profiles in a plage region gave a filling factor of 0.5 at z = 0 km. The filling factor in the photospheric network can safely be assumed to be lower than this. In fact, in recent inversions by Domínguez Cerdeña et al. (2006), which included a small patch of network, the photospheric filling factor in the patch center was as small as 0.1. The network magnetic fields must expand with height and consequently the chromospheric filling factor must exceed photospheric values. Results of comparing photospheric and chromospheric magnetograms (Zhang & Zhang 2000), however, suggest that the sizes of the network magnetic elements are not very different at the two heights. The chromospheric magnetograms in the comparison are based on the Hβ line.
Its interpretation is complicated by the magnetically sensitive blends close to the line core, and the line may suffer from the same problems as the Hα line when used as a proxy for chromospheric magnetic fields, namely that the photospheric contribution to the polarization signal is not insignificant (Socas-Navarro & Uitenbroek 2004). Lastly, the size of network patches is not directly linked with the filling factor. We see some expansion of the network with height in the magnetograms of the map (Fig. 2), especially when comparing the Ca II 8498 Å and 8542 Å magnetograms. But since the magnetograms were constructed by using the weak field formula, and the network fields have gradients and are not necessarily weak, the magnetograms are not accurate. Also the choice of color scaling of the images affects the comparison. However, the apparent expansion is not necessarily an artifact, since expansion of network seen in magnetograms has also been reported by Giovanelli (1980).

Obviously we need to understand better the topology of the network magnetic fields. To do this we plan to perform nLTE inversions of these data in the near future. The inversions will help further in understanding the formation dynamics of the Ca II IR lines in the quiet Sun, and hopefully reveal how the underlying atmosphere differs from that used in the simulation. An important question to answer is why the two Ca lines behave as differently as they do. Having a time series taken during good seeing would be helpful. Also, in order to expand the analysis to internetwork regions, better spatial resolution is required. Another interesting question is how much variation there is in dynamics in different internetwork regions, and how well the differences can be explained in terms of the surrounding magnetic fields as has been suggested by Vecchio et al. (2006) based on imaging data of Ca II 8542 Å Stokes I.
To fully investigate this in detail, high quality data of the full Stokes vector are needed.

TABLE 1
Observed Stokes V asymmetries

          8498 Å                          8542 Å
          < 0 (%)   > 0 (%)   mean (%)    < 0 (%)   > 0 (%)   mean (%)
σa        43.2      55.7      3.1         36.6      61.4      6.3
σA        35.5      64.5      3.3         69.7      30.3      -6.8

Note. — Percentages of observed Ca II 8498 Å and 8542 Å Stokes V amplitude and area asymmetries with negative (i.e. red lobe larger) and positive (i.e. blue lobe larger) signs.

Thanks to Doug Gilliam, Joe Elrod and Mike Bradford for all their invaluable help during the observing run.

REFERENCES

Andretta, V. & Jones, H. P. 1997, ApJ, 489, 375
Banerjee, D., O’Shea, E., Doyle, J. G., & Goossens, M. 2001, A&A, 371, 1137
Bel, N. & Leroy, B. 1977, A&A, 55, 239
Bellot Rubio, L. R. 2006, ArXiv Astrophysics e-prints
Bellot Rubio, L. R., Ruiz Cobo, B., & Collados, M. 2000, ApJ, 535, 489
Carlsson, M. & Stein, R. F. 1997, ApJ, 481, 500
Deubner, F.-L. & Fleck, B. 1990, A&A, 228, 506
Domínguez Cerdeña, I., Almeida, J. S., & Kneer, F. 2006, ApJ, 646, 1421
Giovanelli, R. G. 1980, Solar Phys., 68, 49
Grossmann-Doerth, U., Keller, C. U., & Schuessler, M. 1996, A&A, 315, 610
Judge, P. G., Carlsson, M., & Stein, R. F. 2003, ApJ, 597, 1158
Keller, C. U. & The Solis Team. 2001, in ASP Conf. Ser. 236: Advanced Solar Polarimetry – Theory, Observation, and Instrumentation, ed. M. Sigwarth, 16–+
Khomenko, E. V., Collados, M., Solanki, S. K., Lagg, A., & Trujillo Bueno, J. 2003, A&A, 408, 1115
Khomenko, E. V., Shelyag, S., Solanki, S. K., & Vögler, A. 2005, A&A, 442, 1059
Lagg, A. 2005, in ESA SP-596: Chromospheric and Coronal Magnetic Fields, ed. D. E. Innes, A. Lagg, & S. A. Solanki
Landi Degl’Innocenti, E. 1992, in Solar Observations: Techniques and Interpretation, First Canary Islands Winter School of Astrophysics, ed. F. Sanchez, M. Collados, & M. Vazquez
Lites, B. W., Chipman, E. G., & White, O. R. 1982, ApJ, 253, 367
Lites, B. W., Rutten, R. J., & Kalkofen, W.
1993, ApJ, 414, 345
López Ariste, A., Socas-Navarro, H., & Molodij, G. 2001, ApJ, 552, 871
MacQueen, J. 1967, in Proceedings Fifth Berkeley Symposium on Math. Stat. and Prob., ed. L. M. LeCam & J. Neyman, 281–+
Manso Sainz, R. & Trujillo Bueno, J. 2003, Physical Review Letters, 91, 111102
Martínez Pillet, V., Lites, B. W., & Skumanich, A. 1997, ApJ, 474, 810
McIntosh, S. W., Fleck, B., & Judge, P. G. 2003, A&A, 405, 769
McIntosh, S. W. & Judge, P. G. 2001, ApJ, 561, 420
Neckel, H. & Labs, D. 1984, Solar Phys., 90, 205
Noyes, R. W. 1967, in IAU Symp. 28: Aerodynamic Phenomena in Stellar Atmospheres, ed. R. N. Thomas, 293–+
Orrall, F. Q. 1966, ApJ, 143, 917
Paletou, F. & Molodij, G. 2001, in ASP Conf. Ser. 236: Advanced Solar Polarimetry – Theory, Observation, and Instrumentation, ed. M. Sigwarth, 9–+
Pietarila, A., Socas-Navarro, H., Bogdan, T., Carlsson, M., & Stein, R. F. 2006, ApJ, 640, 1142
Rees, D. E., López Ariste, A., Thatcher, J., & Semel, M. 2000, A&A, 355, 759
Rimmele, T. R. 2000, in Proc. SPIE Vol. 4007, Adaptive Optical Systems Technology, ed. P. L. Wizinowich, 218–231
Sánchez Almeida, J. & Lites, B. W. 2000, ApJ, 532, 1215
Sasso, C., Lagg, A., & Solanki, S. 2006, A&A, 456, 367
Scharmer, G. B., Bjelksjo, K., Korhonen, T. K., Lindberg, B., & Petterson, B. 2003, in Proc. SPIE Vol. 4853, Innovative Telescopes and Instrumentation for Solar Astrophysics, ed. S. L. Keil & S. V. Avakyan, 341–350
Shimizu, T. 2004, in ASP Conf. Ser. 325: The Solar-B Mission and the Forefront of Solar Physics, ed. T. Sakurai & T. Sekii, 3–+
Socas-Navarro, H. 2005, ApJ, 631, L167
Socas-Navarro, H., López Ariste, A., & Lites, B. W. 2001, ApJ, 553, 949
Socas-Navarro, H., Trujillo Bueno, J., & Landi Degl’Innocenti, E. 2004, ApJ, 612, 1175
Socas-Navarro, H., Trujillo Bueno, J., & Ruiz Cobo, B. 2000a, Science, 288, 1398
Socas-Navarro, H., Trujillo Bueno, J., & Ruiz Cobo, B. 2000b, ApJ, 530, 977
Socas-Navarro, H. & Uitenbroek, H.
2004, ApJ, 603, L129
Socas-Navarro, H. et al. 2006, Solar Phys., 235, 55
Stein, R. F. & Nordlund, Å. 2006, ApJ, 642, 1246
Uitenbroek, H., Balasubramaniam, K. S., & Tritschler, A. 2006, ApJ, 645, 776
Vecchio, A., Cauzzi, G., Reardon, K. P., Janssen, K., & Rimmele, T. 2006, astro-ph/0611206
Wikstøl, Ø., Hansteen, V., Carlsson, M., & Judge, P. G. 2000, ApJ, 531, 1150
Zhang, H. & Zhang, M. 2000, Solar Phys., 196, 269

http://arxiv.org/abs/astro-ph/0611206

ABSTRACT

The Ca II infrared triplet is one of the few magnetically sensitive chromospheric lines available for ground-based observations. We present spectropolarimetric observations of the 8498 Å and 8542 Å lines in a quiet Sun region near a decaying active region and compare the results with a simulation of the lines in a high plasma-beta regime. Cluster analysis of Stokes V profile pairs shows that the two lines, despite arguably being formed fairly close, often do not have similar shapes. In the network, the local magnetic topology is more important in determining the shapes of the Stokes V profiles than the phase of the wave, contrary to what our simulations show. We also find that Stokes V asymmetries are very common in the network, and the histograms of the observed amplitude and area asymmetries differ significantly from the simulation. Both the network and internetwork show oscillatory behavior in the Ca II lines. It is stronger in the network, where shocking waves, similar to those in the high-beta simulation, are seen and large self-reversals in the intensity profiles are common.
<|endoftext|><|startoftext|>
Introduction

In this paper we compute the number of moduli of certain families of irreducible plane curves with nodes and cusps as singularities. Let $\Sigma^n_{k,d} \subset \mathbb{P}(H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n))) := \mathbb{P}^N$, with $N = \frac{n(n+3)}{2}$, be the closure, in the Zariski topology, of the locally closed set of reduced and irreducible plane curves of degree $n$ with $k$ cusps and $d$ nodes.
Let $\Sigma \subset \Sigma^n_{k,d}$ be an irreducible component of the variety $\Sigma^n_{k,d}$. We denote by $\Sigma_0$ the open set of $\Sigma$ of points $[\Gamma] \in \Sigma$ such that $\Sigma$ is smooth at $[\Gamma]$ and such that $[\Gamma]$ corresponds to a reduced and irreducible plane curve of degree $n$ with $d$ nodes, $k$ cusps and no further singularities. Since the tautological family $\mathcal{S}_0 \to \Sigma_0$, parametrized by $\Sigma_0$, is an equigeneric family of curves, by normalizing the total space $\mathcal{S}_0 \subset \mathbb{P}^2 \times \Sigma_0$, we get a family over $\Sigma_0$ of smooth curves of genus $g = \binom{n-1}{2} - k - d$. Because of the functorial properties of the moduli space $M_g$ of smooth curves of genus $g$, we get a regular map $\Sigma_0 \to M_g$, sending every point $[\Gamma] \in \Sigma_0$ to the isomorphism class of the normalization of the plane curve $\Gamma$ corresponding to the point $[\Gamma]$. This map extends to a rational map $\Pi_\Sigma : \Sigma \dashrightarrow M_g$. We say that $\Pi_\Sigma$ is the moduli map of $\Sigma$ and we set
$$\text{number of moduli of } \Sigma := \dim(\Pi_\Sigma(\Sigma)).$$

Date: 06 September 2005.
1991 Mathematics Subject Classification. 14H15; 14H10; 14B05.
Key words and phrases. families of plane curves, number of moduli, nodes and cusps.
http://arxiv.org/abs/0704.0618v1
CONCETTINA GALATI

Notice that, when $\Sigma^n_{k,d}$ is reducible, two different irreducible components of $\Sigma^n_{k,d}$ can have different numbers of moduli. We say that $\Sigma$ has general moduli if $\Pi_\Sigma$ is dominant. Otherwise, we say that $\Sigma$ has special moduli.

Definition 1.1. When $\Sigma$ has the expected dimension equal to $3n + g - 1 - k$ and $g \ge 2$, we say that $\Sigma$ has the expected number of moduli if
$$\dim(\Pi_\Sigma(\Sigma)) = \min(\dim(M_g), \dim(M_g) + \rho - k),$$
where $\rho := \rho(2, g, n) = 3n - 2g - 6$ is the Brill-Noether number of the linear series of degree $n$ and dimension 2 on a smooth curve of genus $g$.

As we shall see in the next section, when $g \ge 2$ and when $\Sigma$ has the expected dimension equal to $3n + g - 1 - k$, the number of moduli of $\Sigma$ is at most equal to the expected one. This happens in particular if $k < 3n$.
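The numerical invariants entering Definition 1.1 can be tabulated mechanically. The following is a small Python sketch, not part of the original argument, that evaluates the genus $g = \binom{n-1}{2} - k - d$, the Brill-Noether number $\rho = 3n - 2g - 6$, the expected dimension $3n + g - 1 - k$ and the expected number of moduli $\min(\dim(M_g), \dim(M_g) + \rho - k)$, using the standard fact $\dim(M_g) = 3g - 3$ for $g \ge 2$. The function names are hypothetical and chosen only for this illustration; the sketch says nothing about whether a component with the given invariants exists.

```python
from math import comb


def genus(n, k, d):
    """Geometric genus of a degree-n plane curve with k cusps and d nodes."""
    return comb(n - 1, 2) - k - d


def brill_noether_number(n, g):
    """rho = 3n - 2g - 6 for linear series of dimension 2 and degree n."""
    return 3 * n - 2 * g - 6


def expected_dimension(n, k, d):
    """Expected dimension 3n + g - 1 - k, which equals N - d - 2k."""
    return 3 * n + genus(n, k, d) - 1 - k


def expected_number_of_moduli(n, k, d):
    """min(dim M_g, dim M_g + rho - k), assuming g >= 2 so dim M_g = 3g - 3."""
    g = genus(n, k, d)
    if g < 2:
        raise ValueError("Definition 1.1 requires genus g >= 2")
    dim_mg = 3 * g - 3
    rho = brill_noether_number(n, g)
    return min(dim_mg, dim_mg + rho - k)


# Example: degree 8 curves with 3 cusps and 2 nodes.
n, k, d = 8, 3, 2
g = genus(n, k, d)                         # 21 - 3 - 2 = 16
rho = brill_noether_number(n, g)           # 24 - 32 - 6 = -14
dim_sigma = expected_dimension(n, k, d)    # 24 + 16 - 1 - 3 = 36
moduli = expected_number_of_moduli(n, k, d)  # min(45, 45 - 14 - 3) = 28
```

In this example $\rho < 0$, so the expected number of moduli is finite and smaller than $\dim(M_g) = 45$, and the difference $36 - 28 = 8$ between the expected dimension and the expected number of moduli is the dimension of $\mathrm{Aut}(\mathbb{P}^2)$.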
If $k \ge 3n$, in general we do not have an upper bound for the dimension of $\Sigma$ and we cannot provide an upper bound for the number of moduli of $\Sigma$ (see lemma 2.2 and remark 2.3). Moreover, by classical Brill-Noether theory when $\rho$ is positive and by a well-known result of Sernesi when $\rho \le 0$ (see [18]), we have that $\Sigma^n_{0,d}$ (which is irreducible by [8]) has the expected number of moduli for every $d \le \binom{n-1}{2}$. When $k > 0$ there are known results giving sufficient conditions for the existence of irreducible components $\Sigma$ of $\Sigma^n_{k,d}$ with general moduli (see propositions 2.5 and 2.6 and corollary 2.7). In this article we construct examples of families of irreducible plane curves with nodes and cusps with finite and expected number of moduli. A large part of this paper is obtained working out the main ideas and techniques that Sernesi uses in [18].

In section 2.1 we introduce the varieties $\Sigma^n_{k,d}$ and we recall their main properties. In section 2.2 we discuss definition 1.1 and we summarize known results on the number of moduli of families of irreducible plane curves with nodes and cusps. In theorem 3.5 we prove the existence of plane curves with nodes and cusps as singularities whose singular points are in sufficiently general position to impose independent linear conditions to a linear system of plane curves of a certain degree. This result is related to the moduli problem by lemma 3.2, remark 3.4 and proposition 4.1, where we find sufficient conditions in order that an irreducible component $\Sigma \subset \Sigma^n_{k,d}$ has the expected number of moduli. If $\Sigma$ verifies the hypotheses of proposition 4.1, then the Brill-Noether number $\rho$ is not positive and $\Sigma$ has finite number of moduli. Moreover, by lemma 4.6 and corollary 4.7, for every $k' \le k$ and $d' \le d + k - k'$, there is at least an irreducible component $\Sigma' \subset \Sigma^n_{k',d'}$ such that $\Sigma \subset \Sigma'$ and the general element $[D] \in \Sigma'$ corresponds to a plane curve $D$ verifying the hypotheses of proposition 4.1 and so having the expected number of moduli.
Finally, the main result of this paper is contained in theorem 4.9, where, by using induction on the degree $n$ and on the genus $g$ of the general curve of the family, we construct examples of families of irreducible plane curves with nodes and cusps verifying the hypotheses of proposition 4.1. In particular, we prove that, if $k \le 6$ and $\rho \le 0$, then $\Sigma^n_{k,d}$ has at least an irreducible component which is not empty and which has the expected number of moduli. This result may be improved and examples of families of curves showing that the condition $k \le 6$ is not sharp are given in remark 4.10. Notice that the previous theorem provides only examples of families of plane curves with nodes and cusps with expected number of moduli, when $\rho$ is not positive. When the number of cusps $k$ is very small, we expect it is possible to prove the existence of irreducible components of $\Sigma^n_{k,d}$ with expected number of moduli, for every value of $\rho$. For example, from a result of Eisenbud and Harris, it follows that $\Sigma^n_{1,d}$ (which is irreducible by [16]) has general moduli if $\rho \ge 2$ (see corollary 2.7). In theorem 4.11, by using induction on $n$ we find that $\Sigma^n_{1,d}$ has general moduli also when $\rho = 1$. By recalling that, by theorem 4.9, $\Sigma^n_{1,d}$ has expected number of moduli when $\rho \le 0$, we conclude that $\Sigma^n_{1,d}$ has the expected number of moduli for every $\rho$ or, equivalently, for every $d \le \binom{n-1}{2} - 1$. We still don’t know examples of irreducible components of $\Sigma^n_{k,d}$ having number of moduli smaller than the expected one.

NUMBER OF MODULI OF IRREDUCIBLE FAMILIES...

2. Preliminaries

2.1. On Severi-Enriques varieties. We shall denote by $\mathbb{P}^N = \mathbb{P}^{\frac{n(n+3)}{2}}$ the Hilbert scheme of plane curves of degree $n$, by $[\Gamma] \in \mathbb{P}^N$ the point parametrizing a plane curve $\Gamma \subset \mathbb{P}^2$ and by $\Sigma^n_{k,d} \subset \mathbb{P}^N$ the closure, in the Zariski topology, of the locally closed set parametrizing reduced and irreducible plane curves of degree $n$ with $d$ nodes and $k$ cusps as singularities.
These varieties have been introduced at the beginning of the last century by Severi and Enriques. In particular, the case $k = 0$ has been studied first by Severi and for this reason the varieties $\Sigma^n_{0,d}$ are usually called Severi varieties, while for $k > 0$ the varieties $\Sigma^n_{k,d}$ are called Severi-Enriques varieties. We recall that every irreducible component $\Sigma$ of $\Sigma^n_{k,d}$ has dimension at least equal to
$$N - d - 2k = 3n + g - 1 - k,$$
where $g = \binom{n-1}{2} - k - d$. When the equality holds we say that $\Sigma$ has expected dimension. Moreover, it is well known that if $k < 3n$ then every irreducible component $\Sigma$ of $\Sigma^n_{k,d}$ has expected dimension (see for example [23] or [25]). On the contrary, when $k \ge 3n$, there exist examples of irreducible components of $\Sigma^n_{k,d}$ having dimension greater than the expected (see [25]). Moreover, we recall that $\Sigma^n_{0,d}$ is not empty for every $d \le \binom{n-1}{2}$ and it contains in its closure all points parameterizing irreducible plane curves of degree $n$ and genus $g = \binom{n-1}{2} - d$ (see [24], [25] and [1]). Often, we shall denote $\Sigma^n_{0,d}$ by $V_{n,g}$. While the proof of the existence of $V_{n,g}$ is quite elementary and it is due to Severi, the irreducibility of $V_{n,g}$ remained an open problem for a long time and it has been proved by Harris only in 1986. Later, by using the same techniques of Harris, Kang has proved the irreducibility of $\Sigma^n_{k,d}$ with $k \le 3$, see [8] and [16]. However, in general, $\Sigma^n_{k,d}$ is reducible and there exist values of $n$, $d$ and $k$ such that $\Sigma^n_{k,d}$ is empty (see [25], [12], [20], [11] or chapter 2 of [7] and related references).

Finally, we recall that, if $\Sigma \subset \Sigma^n_{k,d}$ is a non-empty irreducible component of the expected dimension equal to $3n + g - 1 - k$, then, for every $k' \le k$ and $d' \le d + k - k'$, there exists a non-empty irreducible component $\Sigma' \subset \Sigma^n_{k',d'}$ such that $\Sigma \subset \Sigma'$. This happens in particular if $k < 3n$. More precisely, it is true that, if $\Gamma \subset \mathbb{P}^2$ is a reduced (possibly reducible) plane curve of degree $n$ with $k < 3n$ cusps at points $q_1, \dots, q_k$, nodes at points $p_1, \dots, p_d$ and no further singularities, then, chosen arbitrarily $k_1$ cusps, say $q_1, \dots, q_{k_1}$ among the $k$ cusps of $\Gamma$, $k_2$ cusps $q_{k_1+1}, \dots, q_{k_2}$ among $q_{k_1+1}, \dots, q_k$ and $d_1$ nodes $p_1, \dots, p_{d_1}$ among the nodes of $\Gamma$, there exists a family of reduced plane curves $\mathcal{D} \to B \subset \mathbb{P}^N$ of degree $n$, whose special fibre is $\mathcal{D}_0 = \Gamma$ and whose general fibre $\mathcal{D}_t = D$ has a node in a neighborhood of every marked node of $\Gamma$, a cusp in a neighborhood of each point $q_1, \dots, q_{k_1}$, a node in a neighborhood of each point $q_{k_1+1}, \dots, q_{k_2}$ and no further singularities (see [25], corollary 6.3 of [11] or lemma 3.17 of chapter 2 of [7]). To save space, we shall say that the family $\mathcal{D} \to B$ is obtained from $\Gamma$ by preserving the singularities $q_1, \dots, q_{k_1}$ and $p_1, \dots, p_{d_1}$, by deforming in a node each cusp $q_{k_1+1}, \dots, q_{k_2}$ and by smoothing the other singularities.

2.2. Known results on the number of moduli of $\Sigma^n_{k,d}$. In order to explain definition 1.1, we need to recall some basics of Brill-Noether theory. Given a smooth curve $C$ of genus $g$, the set $G^2_n(C)$ of linear series $g^2_n$ on $C$ of dimension 2 and degree $n$ is a projective variety which verifies the following properties:

(1) $G^2_n(C)$ is not empty of dimension at least $\rho$, if $\rho(2, n, g) = 3n - 2g - 6 \ge 0$ (see theorem V.1.1 and proposition IV.4.1 of [4]).

(2) Let $g^2_n$ be a given linear series, let $H \in g^2_n$ be a divisor and let $W \subset H^0(C, H)$ be the three dimensional vector space corresponding to $g^2_n$. Denoting by $\omega_C = \mathcal{O}_C(K_C)$ the canonical sheaf of $C$ and by
$$\mu_{0,C} : W \otimes H^0(C, \omega_C(-H)) \to H^0(C, \omega_C)$$
the natural multiplication map, also called the Brill-Noether map of the pair $(C, W)$, we have that the dimension of the tangent space to $G^2_n(C)$ at the point $[g^2_n]$, corresponding to $g^2_n$, is equal to
$$\dim(T_{[g^2_n]} G^2_n(C)) = \rho + \dim(\ker(\mu_{0,C}))$$
(see [2] or proposition IV.4.1 of [4] for a proof).

(3) Moreover, if $C$ is a curve with general moduli (i.e.
if $[C]$ varies in an open set of $M_g$), the variety $G^2_n(C)$ is empty if $\rho < 0$, it consists of a finite number of points if $\rho = 0$ and it is a reduced, irreducible, smooth and not empty variety of dimension exactly $\rho$, when $\rho \ge 1$ (see theorem V.1.5 and theorem V.1.6 of [4]). In the latter case, the general $g^2_n$ on $C$ defines a local embedding on $C$ and it maps $C$ to $\mathbb{P}^2$ as a nodal curve (see theorem 3.1 of [1] or lemma 3.43 of [9]).

From (3), we deduce that the Severi variety $\Sigma^n_{0,d} = V_{n,g}$ of irreducible plane curves of genus $g = \binom{n-1}{2} - d$ has general moduli when $\rho \ge 0$ and it has special moduli when $\rho < 0$. When $\rho < 0$, and then $g \ge 3$, by definition 1.1, we expect that the image of $V_{n,g}$ into $M_g$ has codimension exactly $-\rho$. Equivalently, recalling that, in this case,
$$\dim(V_{n,g}) = 3n + g - 1 = 3g - 3 + \rho + 8 = \dim(M_g) + \rho + \dim(\mathrm{Aut}(\mathbb{P}^2)),$$
we expect that on the smooth curve $C$, obtained by normalizing the plane curve corresponding to the general element of $V_{n,g}$, there is only a finite number of $g^2_n$ mapping $C$ to the plane as a nodal curve. This is a well known result proved by Sernesi in [18].

Theorem 2.1 (Sernesi, [18]). The Severi variety $V_{n,g} = \Sigma^n_{0,d}$ of irreducible plane curves of degree $n$ and genus $g = \binom{n-1}{2} - d$ has number of moduli equal to
$$\min(\dim(M_g), \dim(M_g) + \rho).$$

What can we say about the number of moduli of an irreducible component $\Sigma$ of $\Sigma^n_{k,d}$, when $k > 0$? In this case we need to distinguish the two cases $k < 3n$ and $k \ge 3n$. In the first case we have the following result.

Lemma 2.2. For every not empty irreducible component $\Sigma$ of $\Sigma^n_{k,d}$, with $k < 3n$ and $g = \binom{n-1}{2} - k - d \ge 2$, the number of moduli of $\Sigma$ is at most equal to
$$\min(\dim(M_g), \dim(M_g) + \rho - k),$$
where $\rho = 3n - 2g - 6$ is the Brill-Noether number of linear series of dimension 2 and degree $n$ on a smooth curve of genus $g$.

Proof. We recall that an ordinary cusp $P$ of a plane curve $\Gamma$ corresponds to a simple ramification point $p$ of the normalization map $\phi : C \to \Gamma$, i.e. to a simple zero of the differential map $d\phi$.
If we denote by $G^2_{n,k}(C) \subset G^2_n(C)$ the set of $g^2_n$ on $C$ defining a birational morphism with $k$ simple ramification points, then $G^2_{n,k}(C)$ is a locally closed subset of $G^2_n(C)$ and every non-empty irreducible component $G$ of $G^2_{n,k}(C)$ has dimension at least $\rho - k$. In particular, if $F^2_{n,k}(C)$ is the variety whose points correspond to the pairs $([g^2_n], \{s_0, s_1, s_2\})$, where $[g^2_n] \in G^2_{n,k}(C)$ and $\{s_0, s_1, s_2\}$ is a frame of the three-dimensional space associated to the linear series $g^2_n$, then every irreducible component of $F^2_{n,k}(C)$ has dimension at least equal to $\min(8, \rho - k + 8)$.

Now, let $\Sigma$ be one of the irreducible components of $\Sigma^n_{k,d}$ and let $[\Gamma]$ be a general point of $\Sigma$. If $\Gamma \subset \mathbb{P}^2$ is the corresponding plane curve and $\phi : C \to \Gamma$ is the normalization map, then the fibre over the point $[C] \in \mathcal{M}_g$ of the moduli map $\Pi_\Sigma : \Sigma \dashrightarrow \mathcal{M}_g$ consists of an open set in one or more irreducible components of $F^2_{n,k}(C)$. In particular, every irreducible component of the general fibre of $\Pi_\Sigma$ has dimension at least equal to $\min(8, \rho - k + 8)$. Moreover, if $k < 3n$ then $\Sigma$ has the expected dimension, equal to $N - d - 2k = 3n + g - 1 - k$ (see [25] or [23]). Finally, if $g = \binom{n-1}{2} - k - d \ge 2$, then
\[ \dim(\Sigma) = 3n + g - 1 - k = 3g - 3 + \rho - k + 8. \]
This proves the statement. □

Remark 2.3. The proof of the previous lemma still holds if $k \ge 3n$ but $\Sigma$ has the expected dimension. However, in general, when $k \ge 3n$ we don't have a bound for $\dim(\Pi_\Sigma(\Sigma))$. Indeed, in this case the dimension of the general fibre of the moduli map of $\Sigma$ is still at least equal to $\rho - k + 8$, but $\Sigma$ may have dimension larger than $3n + g - 1 - k$. Anyhow, by the following proposition, every non-empty irreducible component of $\Sigma^n_{k,d}$ has special moduli if $k \ge 3n$.

Proposition 2.4 (Arbarello-Cornalba, [1]). Let $C$ be a general curve of genus $g \ge 2$ and let $\phi : C \to \mathbb{P}^2$ be a birational morphism. Then the degree of the zero divisor of the differential map of $\phi$ is smaller than $\rho$.
In particular, every irreducible component of $\Sigma^n_{k,d}$ has special moduli if $\rho = 3n - 2g - 6 < k$.

A sufficient condition for the existence of irreducible families of plane curves with nodes and cusps with general moduli is given by the following result.

Proposition 2.5 (Kang, [15]). $\Sigma^n_{k,d}$ is irreducible, not empty and with general moduli if $n > 2g - 1 + 2k$, where $g = \binom{n-1}{2} - d - k$.

Actually, in [15], Kang proves that if $n > 2g - 1 + 2k$, then $\Sigma^n_{k,d}$ is not empty and irreducible. But from his proof it follows that, under the hypothesis of proposition 2.5, $\Sigma^n_{k,d}$ has general moduli, because the general element of $\Sigma^n_{k,d}$ corresponds to a curve which is a projection of an arbitrary smooth curve $C$ of genus $g$ in $\mathbb{P}^{n-g}$ from a general $(n - g - 3)$-plane intersecting the tangent variety of $C$ in $k$ different points.

Another result which may be used to find examples of families of plane curves with nodes and cusps having general moduli is the following. Let $g^r_n$ be a linear series on $C$ associated to an $(r+1)$-space $W \subset H^0(C, L)$, where $L$ is an invertible sheaf on $C$, and let $\{s_0, \dots, s_r\}$ be a basis of $W$. Then the ramification sequence of the $g^r_n$ at $p$ is the sequence $b = (b_0, \dots, b_r)$ with $b_i = \mathrm{ord}_p s_i - i$. Choosing another basis of $W$, the ramification sequence of $g^r_n$ at $p$ doesn't change. We say that the ramification sequence of the $g^r_n$ at $p$ is at least equal to $b = (b_0, \dots, b_r)$ if $b_i \le \mathrm{ord}_p s_i - i$ for every $i$, and we write $(\mathrm{ord}_p s_0, \dots, \mathrm{ord}_p s_r - r) \ge (b_0, \dots, b_r)$.

Proposition 2.6 (Proposition 1.2 of [10]). Let $C$ be a general curve of genus $g$, let $p$ be a general point on $C$ and let $b = (b_0, \dots, b_r)$ be any ramification sequence. There exists a $g^r_n$ on $C$ having ramification at least $b$ at $p$ if and only if
\[ \sum_{i=0}^{r} (b_i + g - n + r)_+ \le g, \]
where $(-)_+ := \max(-, 0)$.

From proposition 2.6 we easily deduce the following result.

Corollary 2.7. Suppose that $k \le 3$ and $\rho = 3n - 2g - 6 \ge 2k$. Then $\Sigma^n_{k,d}$ is not empty, irreducible and it has general moduli.

Proof.
By [16], the variety $\Sigma^n_{k,d}$ is irreducible for every $k \le 3$ and $d \le$ […]. Moreover, by using classical arguments, one can prove that $\Sigma^n_{k,d}$ is not empty if $k \le 4$ and $d \le \binom{n-1}{2} - 4$ (see, for example, corollary 3.18 of chapter 2 of [7]). Finally, by theorem 1.1 of [21], using the terminology of proposition 2.6, under the hypothesis $k \le 3n - 4$, in particular if $k \le 3$, the variety $\Sigma^n_{k,d}$ contains every point of $\mathbb{P}^N$ corresponding to a plane curve $\Gamma$ of genus $g = \binom{n-1}{2} - k - d$ such that the normalization morphism of $\Gamma$ has at least one ramification point with ramification sequence $(b_0, b_1, b_2) \ge (0, k, k)$. Then, by proposition 2.6, if $\rho \ge 2k$ and $k \le 3$, the moduli map of $\Sigma^n_{k,d}$ is surjective. □

3. On the existence of certain families of plane curves with nodes and cusps in sufficiently general position

As we already observed, we don't have a complete answer to the existence problem for $\Sigma^n_{k,d}$. In this section we are interested in a slightly more specific existence problem. We shall prove the existence of plane curves with nodes and cusps as singularities whose singular points are in sufficiently general position to impose independent linear conditions to a linear system of plane curves of a certain degree.

Definition 3.1. A projective curve $C \subset \mathbb{P}^r$ is said to be geometrically $t$-normal if the linear series cut out on the normalization curve $\tilde{C}$ of $C$ by the pull-back to $\tilde{C}$ of the linear system of hypersurfaces of $\mathbb{P}^r$ of degree $t$ is complete.

From a geometric point of view, a projective curve $C \subset \mathbb{P}^r$ is geometrically $t$-normal if and only if the image curve $\nu_{t,r}(C)$ of $C$ by the Veronese embedding $\nu_{t,r} : \mathbb{P}^r \to \mathbb{P}^{\binom{t+r}{t} - 1}$ of degree $t$ is not a projection of a non-degenerate curve living in a higher dimensional projective space. We shall say that a curve is geometrically linearly normal (g.l.n. for short) if it is geometrically 1-normal. Every such curve $C$ is not a projection of a curve lying in a projective space of larger dimension.
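The dimension of the target space of the Veronese embedding mentioned above can be checked numerically. The following is a minimal Python sketch, not part of the original argument; the function names `h0_projective` and `veronese_target_dim` are illustrative assumptions. It relies only on the standard count $h^0(\mathbb{P}^r, \mathcal{O}(m)) = \binom{m+r}{r}$ for $m \ge 0$.

```python
from math import comb


def h0_projective(r: int, m: int) -> int:
    """Number of independent degree-m forms on P^r (0 when m < 0)."""
    return comb(m + r, r) if m >= 0 else 0


def veronese_target_dim(t: int, r: int) -> int:
    """Dimension of the projective space receiving the degree-t
    Veronese embedding of P^r, i.e. binom(t + r, t) - 1."""
    return h0_projective(r, t) - 1


if __name__ == "__main__":
    # Plane case r = 2: t = 1 is the identity of P^2,
    # conics map P^2 into P^5 and cubics map P^2 into P^9.
    for t in (1, 2, 3):
        print(t, veronese_target_dim(t, 2))
```

In this language, a plane curve is geometrically $t$-normal when the complete series $|\mathcal{O}_{\tilde{C}}(t)|$ on its normalization has no more sections than those coming from the ambient forms of degree $t$.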
The following result is proved under more general hypotheses in [5], theorem 2.1.

Lemma 3.2. Let $\Gamma \subset \mathbb{P}^2$ be an irreducible and reduced plane curve of degree $n$ and genus $g$ with at most nodes and cusps as singularities, and let $t$ be an integer. If $n - 3 - t < 0$, then $\Gamma$ is geometrically $t$-normal if and only if it is smooth. If, on the contrary, $n - 3 - t \ge 0$, then $\Gamma$ is geometrically $t$-normal if and only if its singular points impose independent linear conditions to plane curves of degree $n - 3 - t$.

We recall the following classical definition.

Definition 3.3. Let $\Gamma \subset \mathbb{P}^2$ be a plane curve of degree $n$ with $d$ nodes at $p_1, \dots, p_d$ and $k$ cusps at $q_1, \dots, q_k$ as singularities. Let $\phi : C \to \Gamma$ be the normalization of $\Gamma$. The adjoint divisor $\Delta$ of $\phi$ is the divisor on $C$ defined by
\[ \Delta = \sum_{i=1}^{d} \phi^{-1}(p_i) + \sum_{j=1}^{k} 2\phi^{-1}(q_j). \]

Proof of lemma 3.2. Let $\Gamma$ be a plane curve as in the statement of the lemma. Then $\Gamma$ is geometrically $t$-normal if and only if, by definition,
\[ h^0(C, \mathcal{O}_C(t)) = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(t)) - h^0(\mathbb{P}^2, \mathcal{I}_\Gamma(t)), \]
where $\mathcal{I}_\Gamma$ is the ideal sheaf of $\Gamma$ in $\mathbb{P}^2$ and $\mathcal{O}_C(t) := \mathcal{O}_C(t\phi^*(H))$, with $H$ the general line of $\mathbb{P}^2$. By the Riemann-Roch theorem, $\Gamma$ is geometrically $t$-normal if and only if
\[ h^0(C, \omega_C(-t)) = -nt + g - 1 + \frac{(t+1)(t+2)}{2} - h^0(\mathbb{P}^2, \mathcal{I}_\Gamma(t)), \tag{2} \]
where $g$ is the geometric genus of $\Gamma$ and $\omega_C$ is the canonical sheaf of $C$. On the other hand, it is well known that $H^0(C, \omega_C(-t)) = H^0(C, \mathcal{O}_C(n - 3 - t)(-\Delta))$, where $\Delta$ is the adjoint divisor of $\phi$ (see definition 3.3 and [4], appendix A). If $n - 3 - t < 0$, then $h^0(C, \mathcal{O}_C(n - 3 - t)) = 0$ and $\Gamma$ is geometrically $t$-normal if and only if
\[ h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(t)) - h^0(\mathbb{P}^2, \mathcal{I}_\Gamma(t)) = nt - \frac{n^2 - 3n}{2} + \delta, \]
where $\delta = \binom{n-1}{2} - g = \deg(\Delta)/2$. This equality holds if and only if $\delta = 0$, i.e. $\Gamma$ is smooth. If $n - 3 \ge t$, then $h^0(\mathbb{P}^2, \mathcal{I}_\Gamma(t)) = 0$ and (2) holds if and only if
\[ h^0(C, \mathcal{O}_C(n - 3 - t)(-\Delta)) = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 3 - t)) - \delta. \]
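The two numerical reductions used so far in the proof can be sanity-checked by brute force. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it uses the standard facts $h^0(\mathbb{P}^2, \mathcal{O}(m)) = \binom{m+2}{2}$ and $h^0(\mathbb{P}^2, \mathcal{I}_\Gamma(t)) = h^0(\mathbb{P}^2, \mathcal{O}(t - n))$ for a curve $\Gamma$ of degree $n$. It checks that the right-hand side of (2) equals $-\delta$ when $n - 3 - t < 0$ and $h^0(\mathcal{O}_{\mathbb{P}^2}(n - 3 - t)) - \delta$ when $n - 3 - t \ge 0$.

```python
def h0_plane(m: int) -> int:
    """h^0(P^2, O(m)): number of plane forms of degree m (0 if m < 0)."""
    return (m + 1) * (m + 2) // 2 if m >= 0 else 0


def genus(n: int, delta: int) -> int:
    """Geometric genus of a degree-n plane curve with delta nodes/cusps."""
    return (n - 1) * (n - 2) // 2 - delta


def rhs_of_eq_2(n: int, t: int, delta: int) -> int:
    """Right-hand side of (2); h^0(I_Gamma(t)) is taken to be h^0(O(t - n))."""
    return -n * t + genus(n, delta) - 1 + h0_plane(t) - h0_plane(t - n)


def check_reductions(max_n: int = 12) -> bool:
    """Test both reductions for all 3 <= n <= max_n, all admissible delta
    and all 1 <= t <= 2n."""
    for n in range(3, max_n + 1):
        for delta in range((n - 1) * (n - 2) // 2 + 1):
            for t in range(1, 2 * n + 1):
                value = rhs_of_eq_2(n, t, delta)
                if n - 3 - t < 0:
                    expected = -delta  # vanishes exactly when Gamma is smooth
                else:
                    expected = h0_plane(n - 3 - t) - delta
                if value != expected:
                    return False
    return True


if __name__ == "__main__":
    print(check_reductions())
```

The check is purely arithmetic; it says nothing about whether the singular points actually impose independent conditions, which is the geometric content of the lemma.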
On the other hand, if $\psi : S \to \mathbb{P}^2$ is the blowing-up of the plane at the singular locus of $\Gamma$, denoting by $\sum_i E_i$ the pullback of the singular locus of $\Gamma$ with respect to $\psi$ and by $\mathcal{O}_S(r)$ the sheaf $\mathcal{O}_S(r\psi^*(H))$, we have that
\[ h^0(C, \mathcal{O}_C(n - 3 - t)(-\Delta)) = h^0\Big(S, \mathcal{O}_S(n - 3 - t)\big(-\textstyle\sum_i E_i\big)\Big) = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 3 - t) \otimes \mathcal{A}), \]
where $\mathcal{A}$ is the ideal sheaf of the singular points of $\Gamma$. □

Remark 3.4. Notice that, if an irreducible and reduced plane curve $\Gamma$ of degree $n$ with only nodes and cusps as singularities is geometrically $t$-normal, with $t \le n - 3$, then it is geometrically $r$-normal for every $r \le t$. Indeed, if a set of points imposes independent linear conditions to a linear system $S$, then it imposes independent linear conditions to every linear system $S'$ containing $S$.

Theorem 3.5. Let $\Sigma^n_{k,d}$ be the variety of irreducible and reduced plane curves of degree $n$ with $d$ nodes and $k$ cusps. Suppose that $d$, $k$, $n$ and $t$ are such that
\[ d + k \le \frac{n^2 - (3 + 2t)n + 2 + t^2 + 3t}{2} = h^0(\mathcal{O}_{\mathbb{P}^2}(n - t - 3)), \tag{3} \]
\[ t \le n - 3 \quad \text{if } k = 0, \tag{4} \]
\[ k \le 6 \quad \text{if } t = 1, 2, \text{ and} \tag{5} \]
\[ k \le 6 + \left[\frac{n - 8}{3}\right] \quad \text{if } t = 3, \tag{6} \]
where $[-]$ is the integer part of $-$. Then the variety $\Sigma^n_{k,d}$ is not empty and there exists at least one irreducible component $W \subset \Sigma^n_{k,d}$ whose general element corresponds to a geometrically $t$-normal plane curve.

Remark 3.6. As we shall see in the next section (see proposition 4.1), the geometric linear normality of the plane curve corresponding to the general element of an irreducible component $\Sigma$ of $\Sigma^n_{k,d}$ is related to the number of moduli of $\Sigma$. Another motivation for the previous theorem has been the family of irreducible plane sextics with six cusps. By [25], we know that $\Sigma^6_{6,0}$ contains at least two irreducible components $\Sigma_1$ and $\Sigma_2$. The general point of $\Sigma_1$ corresponds to a sextic with six cusps on a conic, whereas the general element of $\Sigma_2$ corresponds to a sextic with six cusps not on a conic.
Note that, by the previous lemma, the general element of $\Sigma_2$ parameterizes a geometrically linearly normal sextic, unlike the general element of $\Sigma_1$, which corresponds to a projection of a canonical curve of genus four. Theorem 3.5 proves in particular that, under a suitable restriction on the genus of the curve corresponding to the general element of the family (see inequality (3)), and if the number of cusps is small, the variety $\Sigma^n_{k,d}$ contains a non-empty irreducible component whose general element corresponds to a curve which is not a projection of another curve lying in a projective space of larger dimension.

We notice that inequality (3) of the previous theorem can't be improved. Indeed, if $g = \binom{n-1}{2} - k - d$, then $k + d > h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 3 - t))$ if and only if $g < \frac{2tn - t^2 - 3t}{2}$. On the other hand, using the same notation as in theorem 3.5, if $g < \frac{2tn - t^2 - 3t}{2}$, then, by the Riemann-Roch theorem, we have that
\[ h^0(C, \mathcal{O}_C(t)) \ge tn - g + 1 > \frac{t^2 + 3t}{2} + 1 = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(t)). \]
On the contrary, inequalities (5) and (6) are not sharp (see example 3.7).

In the case $k = 0$ and $t = 1$, theorem 3.5 has been proved by Sernesi in [18], section 4. The case $k = 0$ and $t \le n - 3$ is already contained in [5]. To show theorem 3.5, we proceed by induction on the degree $n$ and on the number of nodes and cusps of the curve. The geometric idea underlying the induction on the degree of the curve is, mutatis mutandis, the same as that of Sernesi.

Proof of theorem 3.5. Let $t$ be a positive integer such that $n - 3 - t \ge 0$ and let $W \subset \Sigma^n_{k,d}$ be an irreducible component of $\Sigma^n_{k,d}$. By standard semicontinuity arguments it follows that, if there exists a point $[C] \in W$ corresponding to a geometrically $t$-normal curve with only $k$ cusps and $d$ nodes as singularities, then the general element of $W$ corresponds to a geometrically $t$-normal plane curve.
Moreover, if the theorem is true for fixed $n$, $t \le n - 3$, $k$ as in (5) or in (6) and $k + d$ as in (3), then the theorem is true for $n$, $t$ and any $k' \le k$ and $d' \le d + k - k'$. Indeed, from the hypotheses (3), (5) and (6), it follows in particular that $k < 3n$. By section 2.1, under this hypothesis, for every $k' \le k$ and for every $d' \le d + k - k'$, there exists a family of plane curves $\mathcal{C} \to \Delta$ of degree $n$, parametrized by a curve $\Delta \subset \Sigma^n_{k',d'}$, whose special fibre is $\mathcal{C}_0 = C$ and whose general fibre $\mathcal{C}_z$ has $d'$ nodes and $k'$ cusps as singularities. The statement follows by applying the semicontinuity theorem to the family $\tilde{\mathcal{C}} \to \tilde{\Delta}$, obtained by normalizing the total space of the pull-back family of $\mathcal{C} \to \Delta$ to the normalization curve $\tilde{\Delta}$ of $\Delta$. Finally, it is enough to show the theorem when equality holds in (5), (6) and (3).

First of all we consider the case $k = 0$. We will show the statement for any fixed $t$ and by induction on $n$. Let then $t \ge 1$ and $n = t + 3$. In this case equality holds in (3) if $d = 1 = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2})$. Since one point imposes independent linear conditions to regular functions, by using lemma 3.2 we find that every irreducible plane curve of degree $n = t + 3$ with one node and no further singularities is geometrically $t$-normal. So the first step of the induction is proved. Suppose now that the theorem is true for $n = t + 3 + a$ and let $[\Gamma] \in V_{n,g}$ be a point corresponding to a geometrically $t$-normal curve with $\frac{a^2 + 3a + 2}{2}$ nodes. Let $D$ be a line which intersects $\Gamma$ transversally and let $P_1, \dots, P_{t+1}$ be $t + 1$ marked points of $\Gamma \cap D$. If $\Gamma' = \Gamma \cup D \subset \mathbb{P}^2$, then $P_1, \dots, P_{t+1}$ are nodes for $\Gamma'$. Let $C \to \Gamma$ be the normalization of $\Gamma$ and $C' \to \Gamma'$ the partial normalization of $\Gamma'$, obtained by smoothing all singular points of $\Gamma'$ except $P_1, \dots, P_{t+1}$. We have the following exact sequence of sheaves on $C'$:
\[ 0 \to \mathcal{O}_D(t)(-P_1 - \dots - P_{t+1}) \to \mathcal{O}_{C'}(t) \to \mathcal{O}_C(t) \to 0, \tag{7} \]
where $\mathcal{O}_{C'}(t) := \mathcal{O}_{C'}(tH)$ and $H$ is the pull-back with respect to $C' \to \Gamma'$ of the general line of $\mathbb{P}^2$.
Since $\deg(\mathcal{O}_D(t)(-P_1 - \dots - P_{t+1})) < 0$, we get that $h^0(D, \mathcal{O}_D(t)(-P_1 - \dots - P_{t+1})) = 0$ and so
\[ h^0(C', \mathcal{O}_{C'}(t)) = h^0(C, \mathcal{O}_C(t)) = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(t)). \tag{8} \]
Now, by section 2.1, we can obtain $\Gamma'$ as the limit of a 1-parameter family of irreducible plane curves $\psi : \mathcal{C} \to \Delta \subset \mathbb{P}^{\frac{(n+1)(n+4)}{2}}$ of degree $n + 1 = t + a + 4$ with
\[ \frac{a^2 + 3a + 2}{2} + n - t - 1 = \frac{(a + 1)^2 + 3(a + 1) + 2}{2} = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n + 1 - t - 3)) \]
nodes specializing to nodes of $\Gamma'$ different from the marked points $P_1, \dots, P_{t+1}$. Moreover, one can prove that $\Delta$ is smooth (see [24] or [25]). Normalizing $\mathcal{C}$, we obtain a family whose general fibre is smooth and whose special fibre is exactly $C'$, and we conclude the inductive step by (8) and by the semicontinuity theorem.

Now we consider the case $t = 1$, $2$ or $3$ and $k$ as in (5) and in (6). Suppose the theorem is true for $n$ and let $[\Gamma] \in \Sigma^n_{k,d}$ be a general point in one of the irreducible components of $\Sigma^n_{k,d}$. Let $D$ be a smooth plane curve of degree $t$ if $t = 1, 2$, or an irreducible cubic with a cusp if $t = 3$. By the generality of $\Gamma$, we may suppose that $D$ intersects $\Gamma$ transversally. Let $P_1, \dots, P_{t^2+1}$ be $t^2 + 1$ fixed points of $\Gamma \cap D$. If $\Gamma' = \Gamma \cup D$, then $P_1, \dots, P_{t^2+1}$ are nodes for $\Gamma'$. Let $C \to \Gamma$ be the normalization of $\Gamma$ and $C' \to \Gamma'$ the partial normalization of $\Gamma'$, obtained by smoothing all singular points except $P_1, \dots, P_{t^2+1}$. Using the same notation and arguing as before, from the following exact sequence of sheaves on $C'$
\[ 0 \to \mathcal{O}_D(t)(-P_1 - \dots - P_{t^2+1}) \to \mathcal{O}_{C'}(t) \to \mathcal{O}_C(t) \to 0 \]
we deduce that
\[ h^0(C', \mathcal{O}_{C'}(t)) = h^0(C, \mathcal{O}_C(t)) = h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(t)). \tag{9} \]
Now, by section 2.1, we can obtain $\Gamma'$ as the limit of a family of irreducible plane curves $\phi : \mathcal{C} \to \Delta$ of degree $n + t$ with
\[ d + nt - t^2 - 1 = \frac{(n+t)^2 - (3 + 2t)(n+t) + t^2 + 3t + 2}{2} - k - \frac{t^2 - 3t + 2}{2} \]
nodes specializing to nodes of $\Gamma'$ different from $P_1, \dots, P_{t^2+1}$, and $k + \frac{t^2 - 3t + 2}{2}$ cusps specializing to cusps of $\Gamma'$. We conclude by (9) and by semicontinuity, as before.

Now we have to show the first step of the induction.
For $t = 1$ the induction begins with the cases $(n, k) = (4, 1), (5, 3), (6, 6)$. Trivially, if $n = 4$ and $k = 1$, one point imposes independent conditions to the linear system of regular functions. If $n = 5$ and $k = 3$ we have to show that there are irreducible quintics with three cusps not on a line. A quintic with three cusps is a projection of the rational normal quintic $C_5 \subset \mathbb{P}^5$ from a plane generated by three points lying on three different tangent lines to $C_5$. By the Bezout theorem, the three cusps of such a plane curve can't be aligned. If $n = k = 6$, one can repeat the classical argument used by Zariski, see [24] or example 3.20 of chapter 2 of [7].

For $t = 2$ we have to show the theorem for $(n, k) = (5, 1), (6, 3), (7, 6), (8, 6)$, while for $t = 3$ we have to show the theorem for $(n, k) = (6, 1), (7, 3), (8, 6), (9, 6), (10, 6)$. The case $t = 2$ and $(n, k) = (5, 1)$ is trivial. When $t = 2$, $n = 6$ and $k = 3$ we have that $n - 3 - t = 1$. To show that there exists an irreducible sextic with three cusps not on a line, consider a rational quartic $C_4$ with three cusps (see corollary 3.18 of chapter 2 of [7] for the existence). By the Bezout theorem, the three double points of $C_4$ can't be aligned. Then consider a sextic $C_6$ which is the union of $C_4$ and a conic $C_2$ intersecting $C_4$ transversally. By section 2.1, one can smooth the intersection points of $C_4$ and $C_2$, obtaining a family of sextics with three cusps not on a line. For $t = 2$, $n = 7$ and $k = 6$ we argue as in the previous case, by using a sextic $C_6$ with six cusps not on a conic and a line $R$ which intersects $C_6$ transversally. Similarly for $t = 2$, $n = 8$ and $k = 6$, and for $t = 3$ and $(n, k) = (6, 1), (7, 3), (8, 6), (9, 6), (10, 6)$. □

Example 3.7. Inequalities (5) and (6) are not sharp. To see this, we can consider the example of curves of degree 10. We recall that a plane curve is said to be geometrically linearly normal (g.l.n. for short) if it is geometrically 1-normal. Theorem 3.5 ensures the existence of g.l.n.
irreducible plane curves of degree 10 with $k \le 6$ cusps and nodes as singularities. But, by using the same ideas as in theorem 3.5, one can prove the existence of g.l.n. plane curves of degree 10 with nodes and $k \le 9$ cusps. It is enough to consider a sextic $\Gamma_6$ with six cusps not on a conic and a rational quartic $\Gamma_4$ with three cusps intersecting $\Gamma_6$ transversally. We choose five points $P_1, \dots, P_5$ of $\Gamma_4 \cap \Gamma_6$. If $\Gamma'_6$ and $\Gamma'_4$ are the normalization curves of $\Gamma_6$ and $\Gamma_4$ respectively and $C'$ is the partial normalization of $\Gamma_6 \cup \Gamma_4$ obtained by normalizing all its singular points except $P_1, \dots, P_5$, by considering the exact sequence
\[ 0 \to \mathcal{O}_{\Gamma'_4}(1)(-P_1 - \dots - P_5) \to \mathcal{O}_{C'}(1) \to \mathcal{O}_{\Gamma'_6}(1) \to 0 \]
we find that $h^0(C', \mathcal{O}_{C'}(1)) = 3$. Using the terminology of section 2.1, the statement follows by smoothing the singular points $P_1, \dots, P_5$ of $\Gamma_6 \cup \Gamma_4$, and by semicontinuity, as in the proof of theorem 3.5. The bound on the number of cusps in theorem 3.5 can also be improved for $t = 2$ or $t = 3$. For example, theorem 3.5 ensures the existence of geometrically 3-normal curves of degree 12 with $k \le 6$ cusps and nodes as further singularities. But, by considering a geometrically 3-normal curve of degree 8 with six cusps and a quartic with 3 cusps and arguing as before, we can find geometrically 3-normal irreducible plane curves of degree 12 with nodes and $k \le 9$ cusps.

4. Families of plane curves with nodes and cusps with finite and expected number of moduli.

Let $\Sigma \subset \Sigma^n_{k,d}$ be an irreducible component of $\Sigma^n_{k,d}$. We want to give sufficient conditions for $\Sigma$ to have the expected number of moduli. Let $[\Gamma] \in \Sigma$ be a general element, corresponding to a plane curve $\Gamma$ with normalization map $\phi : C \to \Gamma$. We shall denote by $\omega_C$ the canonical sheaf of $C$ and by $\mathcal{O}_C(1)$ the sheaf associated to the pullback to $C$ of the divisor cut out on $\Gamma$ by the general line of $\mathbb{P}^2$.

Proposition 4.1.
Let $\Sigma \subset \Sigma^n_{k,d}$ be an irreducible component of $\Sigma^n_{k,d}$ and let $[\Gamma] \in \Sigma$ be a general element, corresponding to a plane curve $\Gamma$ with normalization map $\phi : C \to \Gamma$. Suppose that $\Sigma$ is smooth of the expected dimension, equal to $3n + g - 1 - k$, at $[\Gamma]$. Moreover, suppose that:

(1) $\Gamma$ is geometrically linearly normal, i.e. $h^0(C, \mathcal{O}_C(1)) = 3$;

(2) the Brill-Noether map
\[ \mu_{o,C} : H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)) \to H^0(C, \omega_C) \]
is surjective.

Then $\Sigma$ has the expected number of moduli, equal to $3g - 3 + \rho - k$.

Proof. The case $k = 0$ has been proved by Sernesi in [18], section 4. We shall assume $k > 0$. Let $\Gamma$ be a plane curve verifying the hypotheses of the proposition. By lemma 1.5.(b) of [22], the hypothesis that $\Sigma$ is smooth of the expected dimension at $[\Gamma]$ implies the vanishing $H^1(C, \mathcal{N}_\phi) = 0$, where $\mathcal{N}_\phi$ is the normal sheaf of $\phi$. We recall that, denoting by $\Theta_C$ and $\Theta_{\mathbb{P}^2}$ the tangent sheaves of $C$ and $\mathbb{P}^2$ respectively, the normal sheaf of $\phi$ is defined as the cokernel of the differential map $\phi_*$ of $\phi$:
\[ 0 \to \Theta_C \xrightarrow{\ \phi_*\ } \phi^*\Theta_{\mathbb{P}^2} \to \mathcal{N}_\phi \to 0. \tag{10} \]
By theorem 3.1 of [13], the vanishing $H^1(C, \mathcal{N}_\phi) = 0$ is a sufficient condition for the existence of a universal deformation family of the normalization map $\phi$, whose parameter space $B$ is smooth at the point $0$ corresponding to $\phi$, with tangent space at $0$ equal to $H^0(C, \mathcal{N}_\phi)$. On the contrary, by [3], p. 487, the Severi variety $V_{n,g} = \Sigma^n_{0,k+d}$ of irreducible plane curves of genus $\binom{n-1}{2} - d - k$ is singular at the point $[\Gamma]$ and the universal deformation space $B$ of $\phi$ is a desingularization of $V_{n,g}$ at $[\Gamma]$. Moreover, by corollary 6.11 of [2], if $B_k = F^{-1}(\Sigma)$ is the locus of points of $B$ corresponding to a morphism with $k$ ramification points, then the tangent space to $B_k$ at $0$ is a subspace $W$ of $H^0(C, \mathcal{N}_\phi)$ of codimension $k$ such that $W \cap H^0(C, \mathcal{K}_\phi) = 0$, where $\mathcal{K}_\phi$ is the torsion subsheaf of $\mathcal{N}_\phi$.
By [3], p. 487, it follows that, if $F : B \to V_{n,g}$ is the natural $(1:1)$-map from $B$ to $V_{n,g}$, then the differential map $dF : H^0(C, \mathcal{N}_\phi) \to T_{[\Gamma]}V_{n,g}$ restricts to an isomorphism between $W$ and the tangent space $T_{[\Gamma]}\Sigma$ to $\Sigma$ at $[\Gamma]$. We can now go back to the number of moduli of $\Sigma$. From the exact sequence (10), using that $H^1(C, \mathcal{N}_\phi) = 0$, we get the long exact sequence
\[ 0 \to H^0(C, \Theta_C) \to H^0(C, \phi^*\Theta_{\mathbb{P}^2}) \to H^0(C, \mathcal{N}_\phi) \xrightarrow{\ \delta_C\ } H^1(C, \Theta_C) \to H^1(C, \phi^*\Theta_{\mathbb{P}^2}) \to 0. \]
Recalling that the space $H^1(C, \Theta_C)$ is canonically identified with the tangent space $T_{[C]}\mathcal{M}_g$ to $\mathcal{M}_g$ at the point associated to the normalization $C$ of $\Gamma$, the coboundary map $\delta_C : H^0(C, \mathcal{N}_\phi) \to H^1(C, \Theta_C)$ sends the Horikawa class of an infinitesimal deformation of $\phi$ to the Kodaira-Spencer class of the corresponding infinitesimal deformation of $C$. So $\delta_C|_W$ is the differential map at the point $0 \in B$ of the moduli map $\Pi_\Sigma \circ F : B_k = F^{-1}(\Sigma) \dashrightarrow \mathcal{M}_g$. Since the point $[\Gamma]$ is general in $\Sigma$, and recalling the isomorphism $dF : W \to T_{[\Gamma]}\Sigma$, the number of moduli of $\Sigma$ is equal to $\dim(\delta_C(W))$. Now, from the exact sequence (10), we have that
\[ \dim(\delta_C(H^0(C, \mathcal{N}_\phi))) = 3g - 3 - h^1(C, \phi^*\Theta_{\mathbb{P}^2}). \]
Moreover, from the pull-back to $C$ of the Euler exact sequence, we deduce the well-known isomorphism
\[ H^1(C, \phi^*\Theta_{\mathbb{P}^2}) \simeq \mathrm{coker}(\mu_{o,C}^*) \simeq (\ker(\mu_{o,C}))^* \]
and we conclude that
\[ \dim(\delta_C(H^0(C, \mathcal{N}_\phi))) = 3g - 3 - \dim(\ker(\mu_{o,C})). \tag{11} \]
Notice that the previous equality is always true, even if $\Gamma$ doesn't verify hypothesis (1) or (2) of the statement. Moreover, if $\Gamma$ is geometrically linearly normal, i.e. if $h^0(C, \mathcal{O}_C(1)) = 3$, we have that
\[ \rho = 3n - 2g - 6 = \dim(\mathrm{coker}(\mu_{o,C})) - \dim(\ker(\mu_{o,C})). \]
When $\mu_{o,C}$ is surjective, $\rho = -\dim(\ker(\mu_{o,C}))$ and
\[ \dim(\delta_C(H^0(C, \mathcal{N}_\phi))) = 3g - 3 + \rho = \dim(B) - 8 = \dim(V_{n,g}) - 8. \tag{12} \]
Since the fibre of the moduli map $\Pi_{V_{n,g}} \circ F : B \dashrightarrow \mathcal{M}_g$ has dimension at least equal to $8 = \dim(\mathrm{Aut}(\mathbb{P}^2))$, from (12) we deduce that the differential map of $\Pi_{V_{n,g}} \circ F$ has maximal rank at $0$ and, in particular, that $\dim((\Pi_{V_{n,g}} \circ F)^{-1}([C])) = 8$.
Equivalently, there exist only finitely many $g^2_n$ on $C$. It follows that there are only finitely many $g^2_n$ on $C$ mapping $C$ to the plane as a curve with $k$ cusps and $d$ nodes. Then $\dim(\delta_C(W)) = \dim(\Pi_\Sigma(\Sigma)) = 3g - 3 + \rho - k$.

Remark 4.2. Arguing as in the proof of the previous proposition, it has been proved in [18] that, if $\Gamma$ is a geometrically linearly normal plane curve with only $d$ nodes as singularities and the Brill-Noether map $\mu_{o,C}$ of the normalization morphism of $\Gamma$ is injective, then $\Sigma = \Sigma^n_{0,d}$ has general moduli. If $\Sigma \subset \Sigma^n_{k,d}$ and $[\Gamma] \in \Sigma$ verify the hypotheses of proposition 4.1 but we assume that $\mu_{o,C}$ is injective, we may only conclude that $\Pi_{V_{n,g}} \circ F$ is dominant with surjective differential map at $[\Gamma]$. So $\dim(\Pi^{-1}_{V_{n,g}}([C])) = \rho + 8$. But this is not useful to compute the dimension of $\delta_C(W) = \delta_C(T_{[\Gamma]}\Sigma)$. However, in this case we get that
\[ \delta_C(T_{[\Gamma]}\Sigma) + \delta_C(H^0(C, \mathcal{K}_\phi)) = \delta_C(H^0(C, \mathcal{N}_\phi)) = H^1(C, \Theta_C). \]
Then, using that $\dim(\delta_C(H^0(C, \mathcal{K}_\phi))) \le k$ and recalling that, if $\Sigma$ has the expected dimension, then the number of moduli of $\Sigma^n_{k,d}$ is at most the expected one (see lemma 2.2 and remark 2.3), we find that
\[ 3g - 3 - k \le \text{number of moduli of } \Sigma \le 3g - 3 + \rho - k. \]

Remark 4.3. Notice that, if a plane curve $\Gamma$ of genus $g$ verifies hypotheses (1) and (2) of the previous proposition, then the Brill-Noether number $\rho(2, g, n)$ is not positive and, in particular, $g \ge 3$. We don't know examples of complete irreducible families $\Sigma \subset \Sigma^n_{k,d}$ with the expected number of moduli whose general element $[\Gamma]$ corresponds to a curve $\Gamma$ of genus $g$, with $\rho(2, g, n) \le 0$, which doesn't verify properties (1) and (2).

Lemma 4.4 ([5], Corollary 3.4). Let $\Gamma$ be an irreducible plane curve of degree $n$ with only nodes and cusps as singularities and let $\phi : C \to \Gamma$ be the normalization morphism of $\Gamma$. Suppose that $\Gamma$ is geometrically 2-normal, i.e. $h^0(C, \mathcal{O}_C(2)) = 6$. Then the Brill-Noether map
\[ \mu_{o,C} : H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)) \to H^0(C, \omega_C) \]
is surjective.

Proof.
By lemma 3.2, the curve $\Gamma$ is geometrically 2-normal if and only if the scheme $N$ of the singular points of $\Gamma$ imposes independent linear conditions to the linear system $H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 5))$ of plane curves of degree $n - 5$. Since $H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 5)) \subset H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 4))$, $N$ imposes independent linear conditions to plane curves of degree $n - 4$ and, by using lemma 3.2, we get that $h^0(C, \mathcal{O}_C(1)) = 3$, i.e. $\Gamma$ is geometrically linearly normal. Now, denote by $\mathcal{I}_{N|\mathbb{P}^2}$ the ideal sheaf of $N$. Notice that the curve $\Gamma$ is geometrically 2-normal if and only if the sheaf $\mathcal{I}_{N|\mathbb{P}^2}(n - 4)$ is 0-regular (in the sense of Castelnuovo-Mumford). Indeed, since $h^2(\mathbb{P}^2, \mathcal{I}_{N|\mathbb{P}^2}(n - 6)) = 0$, the sheaf $\mathcal{I}_{N|\mathbb{P}^2}(n - 4)$ is 0-regular if and only if $h^1(\mathbb{P}^2, \mathcal{I}_{N|\mathbb{P}^2}(n - 5)) = 0$. Because of the 0-regularity of $\mathcal{I}_{N|\mathbb{P}^2}(n - 4)$, the natural map
\[ H^0(\mathbb{P}^2, \mathcal{I}_{N|\mathbb{P}^2}(n - 4)) \otimes H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(1)) \to H^0(\mathbb{P}^2, \mathcal{I}_{N|\mathbb{P}^2}(n - 3)) \]
is surjective (see [17]). Finally, by the geometric linear normality of $\Gamma$, the vertical maps of the commutative diagram
\[
\begin{array}{ccc}
H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(1)) \otimes H^0(\mathbb{P}^2, \mathcal{I}_{N|\mathbb{P}^2}(n - 4)) & \longrightarrow & H^0(\mathbb{P}^2, \mathcal{I}_{N|\mathbb{P}^2}(n - 3)) \\
\downarrow & & \downarrow \\
H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)) & \longrightarrow & H^0(C, \omega_C)
\end{array}
\]
are surjective and, hence, the Brill-Noether map $\mu_{o,C}$ is surjective too. □

Corollary 4.5. Let $\Sigma \subset \Sigma^n_{k,d}$ be an irreducible component of $\Sigma^n_{k,d}$ of dimension equal to $3n + g - 1 - k$, such that the general point $[\Gamma] \in \Sigma$ corresponds to a geometrically 2-normal plane curve. Then $\Sigma$ has the expected number of moduli, equal to $3g - 3 + \rho - k$.

Proof. It follows from proposition 4.1 and lemma 4.4. □

In order to produce examples of families of irreducible plane curves with nodes and cusps with the expected number of moduli, we study how the rank of the Brill-Noether map increases when we smooth a node or a cusp of the general curve of the family (in the sense of section 2.1).

Let $\Sigma \subset \Sigma^n_{k,d}$, with $n \ge 5$, be an irreducible component of $\Sigma^n_{k,d}$, let $[\Gamma] \in \Sigma$ be a general point of $\Sigma$ and let $\phi : C \to \Gamma$ be the normalization of $\Gamma$.
Choose a singular point $P \in \Gamma$ and denote by $\phi' : C' \to \Gamma$ the partial normalization of $\Gamma$ obtained by smoothing all singular points of $\Gamma$ except the point $P$. If $\omega_{C'}$ is the dualizing sheaf of $C'$ and
\[ \mu_{o,C'} : H^0(C', \mathcal{O}_{C'}(1)) \otimes H^0(C', \omega_{C'}(-1)) \to H^0(C', \omega_{C'}) \]
is the natural multiplication map, we have the following result.

Lemma 4.6. If $h^0(C, \mathcal{O}_C(1)) = 3$ and the geometric genus $g$ of $C$ is such that $g > n - 2$, with $n \ge 5$, then $\mathrm{rk}(\mu_{o,C'}) \ge \mathrm{rk}(\mu_{o,C}) + 1$. In particular, if $h^0(C, \mathcal{O}_C(1)) = 3$, $n \ge 5$ and $\mu_{o,C}$ is surjective, then $\mu_{o,C'}$ is also surjective.

Proof. Let $\psi : C \to C'$ be the normalization map. We recall that, if we set $\phi^*(P) := p_1 + p_2$ when $P$ is a node and $\phi^*(P) = 2\phi^{-1}(P)$ when $P$ is a cusp, then the dualizing sheaf of $C'$ is a subsheaf of $\psi_*(\omega_C(\phi^*(P)))$ (see for example [10], p. 80). In particular we have the exact sequence
\[ 0 \to \omega_{C'} \to \psi_*\omega_C(\phi^*(P)) \to \mathbb{C}_P \to 0, \tag{13} \]
where $\mathbb{C}_P$ is the skyscraper sheaf on $C'$ with support at $P$. From this exact sequence we deduce that $H^0(C', \omega_{C'}) \simeq H^0(C, \omega_C(\phi^*(P)))$. Moreover, tensoring (13) by $\mathcal{O}_{C'}(-1)$, we find the exact sequence
\[ 0 \to \omega_{C'}(-1) \to \psi_*\omega_C(\phi^*(P))(-1) \to \mathbb{C}_P \to 0, \tag{14} \]
from which we get an injective map $H^0(C', \omega_{C'}(-1)) \to H^0(C, \omega_C(\phi^*(P))(-1))$. On the other hand,
\[ h^0(C', \omega_{C'}(-1)) = h^0(C, \omega_C(\phi^*(P))(-1)) = g - n + 3 \tag{15} \]
and so $H^0(C', \omega_{C'}(-1)) \simeq H^0(C, \omega_C(\phi^*(P))(-1))$. Moreover, from the hypothesis $h^0(C, \mathcal{O}_C(1)) = 3$, we have that $H^0(C, \mathcal{O}_C(1)) \simeq H^0(C', \mathcal{O}_{C'}(1)) \simeq H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(1))$. Therefore, in the commutative diagram
\[
\begin{array}{ccc}
H^0(C', \mathcal{O}_{C'}(1)) \otimes H^0(C', \omega_{C'}(-1)) & \xrightarrow{\ \mu_{o,C'}\ } & H^0(C', \omega_{C'}) \\
\downarrow & & \downarrow \\
H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)(\phi^*(P))) & \xrightarrow{\ \mu'_{o,C}\ } & H^0(C, \omega_C(\phi^*(P)))
\end{array}
\]
where we denoted by $\mu'_{o,C}$ the natural multiplication map, the vertical maps are isomorphisms. In particular, $\mathrm{rk}(\mu_{o,C'}) = \mathrm{rk}(\mu'_{o,C})$. In order to compute the rank of $\mu'_{o,C}$, we consider the commutative diagram
\[
\begin{array}{ccc}
H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)) & \xrightarrow{\ \mu_{o,C}\ } & H^0(C, \omega_C) \\
\downarrow & & \downarrow {\scriptstyle G} \\
H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)(\phi^*(P))) & \xrightarrow{\ \mu'_{o,C}\ } & H^0(C, \omega_C(\phi^*(P)))
\end{array}
\]
where the vertical maps are injections.
Notice that, since we supposed $n \ge 5$, $h^0(C, \mathcal{O}_C(1)) = 3$ and $g > n - 2 \ge 3$, the sheaf $\mathcal{O}_C(1)$ is special. We deduce that $C$ is not hyperelliptic and that, having chosen a basis of $H^0(C, \omega_C)$, the associated map $C \to \mathbb{P}^{g-1}$ is an embedding. On the contrary, the sheaf $\omega_C(\phi^*(P))$ does not define an embedding on $C$. Choosing a basis of $H^0(C, \omega_C(\phi^*(P)))$ and denoting by $\Phi : C \to \mathbb{P}^g$ the associated map, this will be an embedding outside $\phi^*(P)$. If $P$ is a node of $\Gamma$ and $\phi^*(P) = p_1 + p_2$, the image of $C$ in $\mathbb{P}^g$ with respect to $\Phi$ will have a node at the image point $Q$ of $p_1$ and $p_2$. If $P \in \Gamma$ is a cusp, then $\Phi(C)$ will have a cusp at the image point $Q$ of $\phi^{-1}(P)$. The hyperplanes of $\mathbb{P}^g$ passing through $Q$ cut out on $C$ the canonical linear series $|\omega_C|$. Moreover, if we denote by $B \subset \mathbb{P}^g$ the subspace which is the base locus of the hyperplanes of $\mathbb{P}^g$ corresponding to $\mathrm{Im}(\mu'_{o,C})$, then $Q \notin B$. Indeed, $B$ intersects the curve $C$ in the image of the base locus of $|\mathcal{O}_C(1)| + |\omega_C(\phi^*(P))(-1)| := \mathbb{P}(\mathrm{Im}(\mu'_{o,C}))$, which coincides with the base locus of $|\omega_C(\phi^*(P))(-1)|$, since $|\mathcal{O}_C(1)|$ is base point free. Now, by (15), $h^0(\omega_C(\phi^*(P))(-1)) = 3 + g - n = h^0(C, \omega_C(-1)) + 1$. Then $\phi^*(P)$ does not belong to the base locus of $|\omega_C(\phi^*(P))(-1)|$, and so $\dim(\langle Q, B \rangle_{\mathbb{P}^g}) = \dim(B) + 1$. Finally, we find that
\[ \mathrm{rk}(\mu_{o,C}) = \mathrm{rk}(G\mu_{o,C}) \le \dim(\mathrm{Im}(G) \cap \mathrm{Im}(\mu'_{o,C})) \le g + 1 - \dim(\langle B, Q \rangle_{\mathbb{P}^g}) - 1 = g - 1 - \dim(B) = \mathrm{rk}(\mu'_{o,C}) - 1. \]

Corollary 4.7. Let $\Sigma \subset \Sigma^n_{k,d}$ be a non-empty irreducible component of the expected dimension of $\Sigma^n_{k,d}$, with $n \ge 5$. Suppose that $\Sigma$ has the expected number of moduli and that the general element $[\Gamma] \in \Sigma$ corresponds to a g.l.n. plane curve $\Gamma$ of geometric genus $g$ such that, if $C \to \Gamma$ is the normalization of $\Gamma$, then the map $\mu_{o,C}$ is surjective. Then, for every $k' \le k$ and $d' \le d + k - k'$, there is at least one irreducible component $\Sigma' \subset \Sigma^n_{k',d'}$ such that $\Sigma \subset \overline{\Sigma'}$, the general element $[D] \in \Sigma'$ corresponds to a g.l.n. plane curve $D$ of geometric genus $g'$ with normalization $D^\nu \to D$ and the Brill-Noether map $\mu_{o,D^\nu}$ surjective.
In particular, $\Sigma'$ has the expected number of moduli.

Proof. Let $\Gamma$ be the curve corresponding to the general element $[\Gamma]$ of $\Sigma \subset \Sigma^n_{k,d}$. Since by hypothesis $\Sigma$ is smooth of the expected dimension at $[\Gamma]$, by section 2.1, for every $k' \le k$ and for every $d' \le d + k - k'$ there exists an irreducible component $\Sigma'$ of $\Sigma^n_{k',d'}$ containing $\Sigma$. In order to prove the statement, it is enough to show it under the hypotheses $k' = k - 1$ and $d' = d + 1$, $k = k'$ and $d' = d - 1$, or $d = d'$ and $k' = k - 1$. If $k' = k - 1$ and $d' = d + 1$, then the statement follows by standard semicontinuity arguments. If $k = k'$ and $d' = d - 1$, or $d = d'$ and $k' = k - 1$, the statement follows by lemma 4.6 and by standard semicontinuity arguments. □

The following lemma has been stated and proved by Sernesi in [18]. Actually, Sernesi supposes that $\Gamma$ has only nodes as singularities. But, since his proof works for plane curves $\Gamma$ with any type of singularities and since we need it for curves with nodes and cusps, we state the lemma in a more general form.

Lemma 4.8 ([18], lemma 2.3). Let $\Gamma$ be an irreducible and reduced plane curve of degree $n \ge 5$ with any type of singularities. Denote by $C$ the normalization of $\Gamma$. Suppose that $h^0(C, \mathcal{O}_C(1)) = 3$ and that the Brill-Noether map
\[ \mu_{o,C} : H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)) \to H^0(C, \omega_C) \]
has maximal rank. Let $R$ be a general line and let $P_1$, $P_2$ and $P_3$ be three fixed points of $\Gamma \cap R$. We denote by $C'$ the partial normalization of $\Gamma' = \Gamma \cup R$ obtained by smoothing all the singular points except $P_1$, $P_2$ and $P_3$. Then $h^0(C', \mathcal{O}_{C'}(1)) = 3$ and, denoting by $\omega_{C'}$ the dualizing sheaf of $C'$, the multiplication map
\[ \mu_{o,C'} : H^0(C', \mathcal{O}_{C'}(1)) \otimes H^0(C', \omega_{C'}(-1)) \to H^0(C', \omega_{C'}) \]
has maximal rank.

Theorem 4.9. Let $\Sigma^n_{k,d}$ be the algebraic system of irreducible plane curves of degree $n \ge 4$ with $k$ cusps, $d$ nodes and geometric genus $g = \binom{n-1}{2} - k - d$. Suppose that:
\[ n - 2 \le g, \quad \text{equivalently} \quad k + d \le h^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(n - 4)), \tag{16} \]
\[ k \le 6 + \left[\frac{n - 8}{3}\right] \quad \text{if } 3n - 9 \le g \text{ and } n \ge 6, \tag{17} \]
\[ k \le 6 \quad \text{otherwise.} \tag{18} \]
Then Σnk,d has at least an irreducible component Σ which is not empty and such that, if Γ ⊂ P2 is the curve corresponding to the general element of Σ and C is the normalization curve of Γ, then h0(C,OC(1)) = 3 and the map µo,C has maximal rank. In particular, when ρ ≤ 0, the algebraic system Σ has the expected number of moduli equal to 3g − 3 + ρ− k. Proof. Suppose that (17) holds. Then, by observing that g ≥ 3n− 9 if and only if k + d ≤ h0(P2,OP2(n− 6)) and by using theorem 3.5 for t = 3, we have that there exists an irreducible com- ponent Σ of Σnk,d whose general element is a geometrically 3-normal plane curve Γ. By remark 3.4, it follows that also the linear systems cut out on C by the conics and the lines are complete. The statement follows from corollary 4.5. In order to prove the theorem under the hypothesis (18), we consider the following subcases: (1) 2n− 5 ≤ g ≤ 3n− 9, i.e. h0(OP2(n− 6)) ≤ k+ d ≤ h 0(OP2(n− 5)) and n ≥ 5, (2) n− 2 ≤ g ≤ 2n− 7 and n ≥ 5, (3) g = 2n− 6 and n ≥ 4. Suppose that (1) holds. By theorem 3.5 for t = 2, we know that, under this hypothesis, there exists a nonempty component Σ ⊂ Σnk,d, whose general element is geometrically 2-normal. We conclude as in the previous case, by corollary 4.5. Now, suppose that g and n verify (2). We shall prove the theorem by induction on n and g. Set g = 2n − 7 − a, with a ≥ 0 fixed. Suppose that the theorem is true for the pair (n, g), with n ≥ 7. We shall prove the theorem for (n + 1, g + 2), observing that g + 2 = 2(n + 1) − 7 − a. Let Γ be a g.l.n. irreducible plane curve 16 CONCETTINA GALATI of degree n and genus g = 2n − 7 − a with k ≤ 6 cusps, d nodes and no more singularities. Let C be the normalization of Γ. Suppose that the Brill-Noether map µo,C has maximal rank. Let R ⊂ P 2 be a general line and let P1, P2 and P3 be three fixed points of Γ∩R. 
By section 2.1, since k ≤ 6 < 3n, one can smooth the singular points P1, P2, P3 and preserve the other singularities of Γ ∪ R ⊂ P 2, obtaining a family of plane curves C → ∆ whose general fibre is irreducible, has degree n+1 and genus g+2. We conclude by lemma 4.8 and by standard semicontinuity arguments. Now we prove the first step of the induction for n ≥ 7. If n = 7, we get 0 ≤ a ≤ 2. Let a = 0, i.e. g = 2n−7−a= 7. Let Γ be a g.l.n. irreducible plane curve of degree n = 7, of genus g = n = 7 with k ≤ 6 cusps and nodes as singularities, such that no seven singular points of Γ lie on an irreducible conic. To prove that there exists such a plane curve, notice that, by applying theorem 3.5 for t = 1, we get that, for any fixed k ≤ 6, there exists a g.l.n. irreducible sextic D of genus four with k cusps and d = 6 − k nodes. Let R1, . . . , R6 be the singular points of D. Since the points R1, . . . , R6 of D impose independent linear conditions to the conics, however we choose five singular points Ri1 , . . . , Ri5 of D, with I = (i1, . . . , i5) ⊂ (1, . . . , 6), there exists only one conic CI , passing through these points. Let us set S = CI ∩D and let R be a line intersecting D transversally at six points out of S. By Bezout theorem, no seven singular points of Γ′ = D ∪ R belong to an irreducible conic. Moreover, if D̃ is the normalization of D, if Q1, . . . , Q4 are four fixed points of D∩R and D′ is the partial normalization of Γ′ obtained by smoothing the singular points except Q1, . . . , Q4, then, by the following exact sequence (19) 0 → OR(1)(−Q1 − · · · −Q4) → OD′(1) → OD̃(1) → 0 we find that h0(D′, OD′(1)) = 3. By section 2.1, one can smooth the singulari- ties Q1, . . . , Q4 and preserve the other singularities of D ∪ R, getting a family of irreducible septics G → ∆ whose general fibre Γ is a geometrically linearly normal irreducible septic with k cusps and 8 − k nodes such that no seven singular point of Γ belong to an irreducible conic. 
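The numerical bookkeeping of this construction can be checked with a short script. The sketch below is only illustrative: the function names are ours and do not come from the paper, and it assumes the usual genus formula g = (n−1)(n−2)/2 − k − d for an irreducible plane curve of degree n with k cusps and d nodes as its only singularities.

```python
from math import comb


def geometric_genus(n, k, d):
    """Geometric genus of an irreducible plane curve of degree n
    whose only singularities are k cusps and d nodes."""
    return comb(n - 1, 2) - k - d


def septic_from_sextic(k):
    """Follow the construction in the text for a fixed number k <= 6 of cusps.

    Start from a sextic D with k cusps and 6 - k nodes, add a transversal
    line R (six new nodes at the points of D ∩ R), then smooth four of
    those six points. Returns (degree, cusps, nodes, genus) of the septic.
    """
    nodes_on_union = (6 - k) + 6      # nodes of D plus the points of D ∩ R
    nodes_after = nodes_on_union - 4  # four intersection points are smoothed
    return 7, k, nodes_after, geometric_genus(7, k, nodes_after)


for k in range(7):
    assert geometric_genus(6, k, 6 - k) == 4   # the sextic D has genus four
    n, cusps, nodes, g = septic_from_sextic(k)
    assert (nodes, g) == (8 - k, 7)            # septic: 8 - k nodes, genus 7
    print(n, cusps, nodes, g)
```

For every k from 0 to 6 the script confirms the counts used above: the sextic has genus four, and the septic obtained from D ∪ R has eight double points and genus seven.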
Let, now, C be the normalization of Γ and let ∆ ⊂ C be the adjoint divisor of the normalization map φ : C → Γ. We shall prove that ker(µo,C) = 0. Since Γ is geometrically linearly normal, we have that h0(C, ωC(−1)) = h 0(C,OC(3)(−∆))) = g − n+ 2 = 2. Then, by the base point free pencil trick, we find that ker(µo,C) = H 0(C, ω∗C(B)⊗OC(2)), where B is the base locus of |ωC(−1) = OC(3)(−∆)|. Let F be the pencil of plane cubics passing through the eight double points P1, . . . , P8 of Γ and let BF be the base locus of the pencil F . Let Γ3 be the general element of F . Suppose that BF has dimension one. If BF contains a line l, then, by Bezout theorem, at most three points among P1, . . . , P8, say P1, . . . , P3 can lie on l and the other points have to be contained in the base locus of a pencil of conics F ′. Using again Bezout theorem, we find that the curves of F ′ are reducible and the base locus of F ′ contains a line l′. But also l′ contains at most three points of P4, . . . , P6. It follows that there is only one cubic through P1, . . . , P8. This is not possible by construction. Suppose that BF contains an irreducible conic Γ2. By Bezout theorem, at most seven points among P1, . . . , P8 may lie on Γ2. On the other hand, since dim(F) = 1, there are exactly seven points of P1, . . . , P8, say P1, . . . , P7, on Γ2 and the general cubic Γ3 of F is union of Γ2 and a line passing through P8. Since, by construction, no seven singular points of Γ lie on an irreducible conic, also in this case we get a contradiction. So the general element Γ3 of F is irreducible. Using again Bezout theorem, we find that Γ3 is smooth and F has only one more base point Q. We consider the following cases: a) Q doesn’t lie on Γ; NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 17 b) Q lies on Γ, but Q 6= P1, . . . , P8; c) Q is infinitely near to one of the points P1, . . . , P8, say Pî, i.e. 
the cubics of F have at P the same tangent line l, but l is not contained in the tangent cone to Γ d) Q is like in the case c), but l is contained in the tangent cone to Γ at P Suppose that the case a) or c) holds. Thus B = 0 and ker(µo,C) = H 0(C, ω∗C ⊗OC(2)) = H 0(C,OC(−2)(∆)). By Riemann-Roch theorem, h0(C,OC(−2)(∆)) = h 0(C,OC(6)(−2∆)) − 4. One sees that h0(C,OC(6)(−2∆)) = 4, by blowing-up the plane at P1, . . . , P8 and by using some standard exact sequences. Suppose now that the case b) holds. Thus B = Q and dim(ker(µo,C)) = h 0(C,OC(−2)(∆ +Q)) = h 0(C,OC(6)(−2∆−Q))− 3. Also in this case one sees that h0(C,OC(6)(−2∆ − Q)) = 3 by blowing-up at P1, . . . , P8 and Q and by using standard exact sequences. Finally, we analyze the case d). Let Φ : S → P2 be the blow-up of the plane at P1, . . . , P8 with exceptional divisors E1, . . . , E8. Let Q ∈ Eî be the intersection point of Eî and the strict transform C3 of the general cubic Γ3 of the pencil F . We denote by Φ̃ : S̃ → S the blow-up of S at Q and by Ψ : S̃ → P2 the composition map of the maps Φ and Φ̃. We still denote by E1, . . . , E8 their strict transforms on S̃, by C and C3 the strict transforms of Γ and Γ3 and by EQ the new exceptional divisor of S̃. In this case we have that Ψ∗(Γ) = C+2 Ei+3EQ, Ψ ∗(Γ3) = C3+ Ei+2EQ. Moreover, the divisor ∆ is cut out on C by iEi + EQ and the base locus B of the linear series |ωC(−1)| coincides with the intersection point of EQ and C. So, we have that dim(ker(µo,C)) = h 0(C,OC(−2)( Ei+2EQ)) = h 0(C,OC(6)(−2 Ei−3EQ))−3. Moreover, from the following exact sequence 0 → O (−1) → O (6)(−2 Ei − 3EQ) → OC(6)(−2 Ei − 3EQ) → 0 we find that H0(C,OC(6)(−2 Ei− 3EQ)) = H 0(S̃,O (6)(−2 Ei− 3EQ)). In order to show that h0(S̃,O (6)(−2 Ei − 3EQ)) = 3, we consider the following exact sequence 0 → O (3)(− Ei − EQ) → OS̃(6)(−2 Ei − 3EQ) →(20) → OC3(6)(−2 Ei − 3EQ) → 0 By Riemann-Roch theorem, we have that h0(C3,OC3(6)(−2 Ei − 3EQ)) = 1 and h 1(C3,OC3(6)(−2 Ei − 3EQ) = 0. 
Moreover, by Serre duality we have that H1(S̃,O (3)(− Ei − EQ))) = H 1(S̃,O (−6)(2 Ei + 3EQ))). From the exact sequence (21) 0 → O (−6)(+2 Ei + 3EQ)) → OS̃(1) → OC(1) → 0 18 CONCETTINA GALATI by using that the map H0(S̃,O (1)) → H0(C,OC(1)) is surjective and that h1(S̃,O (1)) = 0, we find that H1(S̃,O (−6)(+2 Ei + 3EQ))) = H 1(S̃,O (3)(− Ei − EQ))) = 0. Then, by (20), h0(S̃,O (6)(−2 Ei − 3EQ)) = h 0(S̃,O (3)(− Ei − EQ)) + h0(C3,OC3(6)(−2 Ei−3EQ)) = 3 and ker(µo,C) = 0. The first step of induction for g = n = 7 and k ≤ 6 is proved. We complete the proof of the first step of the induction, for n and g verifying (2). When n = 7 and 1 ≤ a ≤ 2, the existence of a g.l.n. plane curve Γ follows from theorem 3.5. Using the above notation, h0(C, ωC(−1)) = 1 if a = 1 and h0(C, ωC(−1)) = 0 if a = 2. In any case µo,C is injective. When n ≥ 8 and a ≤ n − 6 the theorem follows by induction from the case n = 7. For n ≥ 8 and a = n− 5, we find that g = n− 2, or, equivalently, k + d = h0(P2,OP2(n− 4)). In theorem 3.5, we proved the existence of geometrically linearly normal plane curves of degree n ≥ 8 and genus g = n− 2, with nodes and k ≤ 6 cusps. For every such plane curve Γ, using the above notation, the Brill-Noether map µo,C is injective since h0(C, ωC(−1)) = 0. The cases n = 5 and n = 6 are similar. Suppose now that n and g verify (3). First of all we prove the theorem for (n, g) = (4, 2), (5, 4), (6, 6). For n = 4 and g = 2, we find n = g + 2 and we argue as in the case n ≥ 8 and g = n − 2. Similarly, for (n, g) = (5, 4). For n = 6 and g = 6 in theorem 3.5 we proved the existence of geometrically linearly normal plane curves Γ with k ≤ 4 cusps and nodes as singularities. For every such a plane curve Γ, denoting by C its normalization, we get that h0(C, ωC(−1)) = 2, i.e. the linear system F of conics passing through the four singular points P1, . . . , P4 of Γ is a pencil which cuts out on C the complete linear series |ωC(−1)|. 
We have two possibilities: either the general element of this pencil is irreducible or it consists of a line containing exactly three singular points P1, P2, P3 of Γ and a line passing through P4. In any case the base locus of F intersects Γ only at P1, . . . , P4 and the linear series |ωC(−1)| has no base points. Then, by the base point free pencil trick , we find that ker(µo,C) = H 0(C, ω∗C ⊗ O(2)) = H 0(C,OC(−1)(∆)), where ∆ ⊂ C is the adjoint divisor of the normalization map C → Γ. By Riemann-Roch theorem, we have that h0(C,OC(−1)(∆)) = h 0(C,OC(4)(−2∆))−3. By blowing-up at P1, . . . , P4, one can see that h 0(C,OC(4)(−2∆)) = 3, as we wanted. Finally, we show the theorem under the hypothesis (3) for n ≥ 7, by using induction on n. In order to prove the inductive step we may use lemma 4.8, exactly as we did in the case (2). We prove the first step of induction. If n = 7 we have that g = 8. On pages 15 and 16 we proved the existence of geometrically linearly normal plane curves Γ of degree 7 and genus 7 with k ≤ 6, such that, if P1, . . . , P8 are the singular points of Γ, then no seven points among P1, . . . , P8 lie on a conic. In particular, we proved that, for every such a plane curve Γ, the general element of the pencil of cubics passing through P1, . . . , P8 is irreducible and, if φ : C → Γ is the normalization of Γ, then the Brill-Noether map µo,C is injective. Let C the partial normalization of Γ which we get by smoothing all the singular points of Γ except a node, say P8. By using the same notation and by arguing exactly as in the proof of lemma 4.6, we get the following commutative diagram H0(C′,OC′(1))⊗H 0(C′, ωC′(−1)) µo,C′ H0(C′, ωC′) H0(C,OC(1))⊗H 0(C, ωC(−1)(φ ∗(P8))) // H0(C, ωC(φ ∗(P8))) where µ′o,C is the multiplication map and the vertical maps are isomorphisms. We want to prove that the map µo,C′ is surjective. By the previous diagram it is enough NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 19 to prove that µ′o,C is surjective. 
Since $h^0(C, \omega_C(\phi^*(P_8))) = 8$ and $h^0(C, \mathcal{O}_C(1)) \cdot h^0(C, \omega_C(-1)(\phi^*(P_8))) = 3(7 - 7 + 3) = 9$, we have that $\dim(\ker(\mu_{o,C'})) \ge 1$, and $\mu_{o,C'}$ is surjective if $\dim(\ker(\mu_{o,C'})) = 1$. By recalling that Γ is geometrically linearly normal, we have that, if Z is the scheme of the points P1, . . . , P7 and $\mathcal{I}_{Z|\mathbb{P}^2}$ is the ideal sheaf of Z in $\mathbb{P}^2$, then in the following commutative diagram
$$\mu'_{o,C} : H^0(C, \mathcal{O}_C(1)) \otimes H^0(C, \omega_C(-1)(\phi^*(P_8))) \to H^0(C, \omega_C(\phi^*(P_8)))$$
$$\mu : H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(1)) \otimes H^0(\mathbb{P}^2, \mathcal{I}_{Z|\mathbb{P}^2}(3)) \to H^0(\mathbb{P}^2, \mathcal{I}_{Z|\mathbb{P}^2}(4))$$
(top and bottom rows, joined by vertical maps) the vertical maps are isomorphisms. Hence, it is enough to prove that the kernel of the multiplication map µ has dimension one. Let {f0, f1, f2} be a basis of the vector space $H^0(\mathbb{P}^2, \mathcal{I}_{Z|\mathbb{P}^2}(3))$. Since the general cubic passing through P1, . . . , P8 is irreducible, we may assume that f0, f1 and f2 are irreducible. Suppose, by contradiction, that there exist at least two linearly independent vectors in the kernel of µ. Then, there exist sections u0, u1, u2 and v0, v1, v2 of $H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(1))$ such that the sections $\sum_{i=0}^{2} u_i \otimes f_i$ and $\sum_{i=0}^{2} v_i \otimes f_i$ are linearly independent in $H^0(\mathbb{P}^2, \mathcal{O}_{\mathbb{P}^2}(1)) \otimes H^0(\mathbb{P}^2, \mathcal{I}_{Z|\mathbb{P}^2}(3))$ and
$$\sum_{i=0}^{2} u_i f_i = 0, \qquad \sum_{i=0}^{2} v_i f_i = 0. \quad (22)$$
We can look at (22) as a linear system in the variables f0, f1, f2. The space of solutions of (22) is generated by the vector
$$(u_1 v_2 - u_2 v_1,\; u_2 v_0 - u_0 v_2,\; u_0 v_1 - u_1 v_0).$$
In particular, if we denote by q0, q1, q2 the three components of this vector, we find that $f_j q_i = f_i q_j$ for every i ≠ j. But this is not possible, since f0, f1 and f2 are irreducible cubics while the q_i have degree two. We deduce that $\dim(\ker(\mu)) = \dim(\ker(\mu_{o,C'})) = 1$ and $\mu_{o,C'}$ is surjective. The existence of a plane septic of genus 8 with k ≤ 6 cusps and nodes as singularities, with surjective Brill-Noether map, follows now by smoothing the node P8 (in the sense of section 2.1) and by standard semicontinuity arguments. □
Remark 4.10. Notice that the conditions which we found in theorem 4.9 in order that $\Sigma^n_{k,d}$ has at least an irreducible component with the expected number of moduli are not sharp, even if we suppose ρ ≤ 0.
To see this, notice that in remark 3.6 we proved the existence of an irreducible component Σ of Σ129,0 whose general element corresponds to a 3-normal plane curve. By remark 3.4 and corollary 4.5, we have that Σ has the expected number of moduli. Theorem 4.11. Σn1,d has the expected number of moduli, for every d ≤ Proof. First of all, we recall that, by [16], Σn1,d is irreducible for every d ≤ Moreover, from theorem 4.9 and from corollary 2.7, we know that Σn1,d is not empty and it has the expected number of moduli if either ρ ≤ 0 or ρ ≥ 2. Next we shall prove that, if ρ = 1, then the algebraic system Σn1,d = Σ (n−3)2 has general moduli. Equivalently, we will show that, if [Γ] ∈ Σn1,d is a general point and g = − 1 − d = 3n−7 , then, on the normalization curve C of Γ there are only finitely many linear series g2n with at least a ramification point. Notice that, 20 CONCETTINA GALATI if g = − 1 − d = 3n−7 , then n is odd and n ≥ 5. We prove the statement by induction on n. If n = 5 then g = 4. Let C ⊂ P3 be the canonical model of a general curve of genus four and let 2P + Q, with P 6= Q be a divisor in a g13 on C. This divisor is cut out on C by the tangent line to C at P . The projection of C from Q is a plane quintic of genus four with a cusp. This proves that Σ51,1 has general moduli. Now we suppose that the theorem is true for n and we prove the theorem for n+2. Let Γ ⊂ P2 be the plane curve with a cusp and (n−3)2 −1 nodes corresponding to a general point [Γ] ∈ Σn (n−3)2 and let C2 be an irreducible conic intersecting Γ transversally. By section 2.1, the point [C2 ∪ Γ] belongs to Σ (n+2−3)2 particular, however we choose four points P1, . . . , P4 of intersection between Γ and C2, there exists an analytic branch SP1,..., P4 of Σ (n−1)2 , passing through [C2∪Γ] and whose general point corresponds to an irreducible plane curve of degree n + 2 with a cusp in a neighborhood of the cusp of Γ and a node at a neighborhood of every node of C2 ∪ Γ different from P1, . . . 
, P4. Moreover, S := SP1,..., P4 is smooth at the point [C2 ∪ Γ], (see [7], chapter 2). Let Π : Σn+2 (n−1)2 99K M 3(n+2)−7 be the moduli map of Σn+2 (n−1)2 . In order to prove that Π is dominant it is sufficient to show that Π(S) = M 3n−1 . By section 2.1, there exist an analytic open sets Si ⊂ Σn+2 (n−3)2 −1+2n−i , with i = 1, 2, 3, such that S0 := S ∩ (P5 × Σn (n−3)2 ) ⊂ S1 ⊂ S2 ⊂ S3 ⊂ S. Every Si, with i = 1, 2, 3, has irreducible components, passing through [C2∪Γ] and intersecting transversally at [C2 ∪Γ], (see [7], chapter 2 or [25]). Moreover, the general point of every irreducible component of Si, with i = 1, 2, 3, corresponds to an irreducible plane curve Γi of degree n + 2 with a cusp in a neighborhood of the cusp of Γ, a node in a neighborhood of every node of C2 ∪ Γ different from P1, . . . , P4 and 4 − i nodes specializing to 4 − i fixed points among P1, . . . , P4, as Γi specializes to C2 ∪ Γ. Now, notice that the moduli map Π is not defined at the point [C2 ∪ Γ], but, if S is sufficiently small, then the restriction of Π to S extends to a regular function on S. More precisely, let C → ∆ be any family of curves, parametrized by a projective curve ∆ ⊂ S, passing through the point [C2 ∪ Γ] and whose general point corresponds to an irreducible plane curve of degree n+2 of genus 3(n+2)−7 with a cusp and nodes as singularities. If we denote by C′ → ∆ the family of curves obtained from C → ∆ by normalizing the total space, we have that the general fibre of C′ → ∆ is a smooth curve of genus 3n−1 , corresponding to the normalization of the general fibre of C → ∆, whereas the special fibre C′0 is the partial normalization of C2∪Γ, obtained by normalizing all the singular points, except P1, . . . , P4. Then, the map Π|S is defined at [C2 ∪Γ] and it associates to the point [C2∪Γ] the isomorphism class of C 0. 
Similarly, if [Γi] is a general point in one of the irreducible components of Si, with i = 1, 2, 3, then Π|S ([Γi]) is the partial normalization of Γi obtained by smoothing all the singular points except for the 4−i nodes of Γi tending to 4−i fixed points among P1, . . . , P4 as Γi specializes to C2∪Γ. It follows that, if we denote by M the locus of M 3n−1 parametrizing j-nodal curves, then ΠS(S i) ⊆ M4−i3n−1 , for every i = 0, . . . , 4, and ΠS(S i) ΠS(S i+1). In particular, we find that dim(Π|S (S)) ≥ dim(Π|S (S 0)) + 4. NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 21 In order to compute the dimension of Π|S (S 0) we consider the rational map F : Π|S (S 0) 99K M 3n−7 forgetting the rational tail. By the hypothesis that Σn (n−3)2 has general moduli and hence F is dominant. Moreover, if C is the normalization curve of Γ, by the generality of [Γ] in Σn (n−3)2 , we may assume that C is general in M 3n−7 want to show that dim(F−1([C])) = 5. In order to see this, we recall that, by the hypothesis that Σn (n−3)2 has general moduli, on C there exist only finitely many linear series of degree n and dimension two, mapping C to the plane as curve with a cusp and nodes as singularities. Let g2n be one of these linear series, let {s0, s1, s2} be a basis of g2n and φ ′ : C → Γ′ ⊂ P2 the associated morphism. If Q1, . . . , Q4 are four general points of Γ′, then the linear system of conics through Q1, . . . , Q4 is a pencil F(Q1, . . . , Q4). Let C2 and D2 be two general conics of F(Q1, . . . , Q4). We claim that, if η : P1 → C2 and β : P 1 → D2 are isomorphisms between P 1 and C2 and D2 respectively, then the points η −1(Q1), . . . , η −1(Q4) are not projectively equivalent to the points β−1(Q1), . . . , β −1(Q4). In order to prove this, it is enough to prove that there are at least two conics in the pencil F(Q1, . . . , Q4) which verify the claim. LetD ⊂ P2 be a conic. If we choose two sets of points p1, . . . , p4 and q1, . . . 
, q4 ofD not projectively equivalent onD, we may always find projective automorphisms A : P2 → P2 and A′ : P2 → P2 such that A(pi) = Qi and A ′(qi) = (Qi), for every i. By construction, the conics C2 = A(D) and D2 = A ′(D) belong to the pencil F (Q1, . . . , Q4) and verify the claim. This implies that the partial normalizations C and D′ of Γ′∪C2 and Γ ′∪D2, obtained by smoothing all the singular points except Q1, . . . , Q4, are not isomorphic. Now, let C 2 be a general conic of F(Q1, . . . , Q4) and let R1, . . . , R4 be four general points of Γ ′, different from Q1, . . . , Q4. If D is a general conic of the pencil F(R1, . . . , R4), then the partial normalization C and D′ of Γ′ ∪C′2 and Γ ′ ∪D′2 obtained, respectively, by smoothing all the singular points except Q1, . . . , Q4 and R1, . . . , R4, are not isomorphic. Indeed, since C is a general curve of genus 3n−7 ≥ 7, the only automorphism of C is the identity. This proves that dim(F−1([C])) = 5. In particular, we deduce that dim(Π|S (S 0)) = 3 3n− 7 − 3 + 5 dim(Π|S (S) ≥ 3 3n− 7 − 3 + 9 = 3 3(n+ 2)− 7 Remark 4.12. We expect that it is possible to prove that Σnk,d has expected number of moduli for every ρ also when k = 2 or k = 3. By corollary 2.7 and theorem 4.9, Σnk,d is not empty, irreducible and it has expected number of moduli for ρ ≤ 0 and ρ ≥ 2k. In order to extend theorem 4.11 to the case k = 2 and k = 3 one needs to consider a finite number of cases. Acknowledgment. The results of this paper are part of my PhD-thesis. I would like to express my gratitude to my advisor Prof. C. Ciliberto who initiated me into the subject of algebraic geometry and who provided me many invaluable suggestions. I have also enjoyed and benefited from conversation with many people including F. Flamini, E. Sernesi, L. Chiantini, L. Caporaso and G. Pareschi. Finally, I would like to thank the referee for useful remarks which allowed me to improve the finale version of this paper. 22 CONCETTINA GALATI References [1] E. 
Arbarello and M. Cornalba: Su una proprietà notevole dei morfismi di una curva a moduli generali in uno spazio proiettivo, Rend. Sem. Mat. Univ. Politec. Torino, vol. 38 (1980), no. 2, 87–99 (1981). [2] E. Arbarello and M. Cornalba: Su una congettura di Petri., Comment. Math. Helv., vol. 56 (1981), no. 1, 1–38. [3] E. Arbarello and M. Cornalba: A few remarks about the variety of irreducible plane curves of given degree and genus., Ann. Sci. École Norm. Sup. (4), vol. 16 (1983), 467–488 (1984). [4] E. Arbarello and M. Cornalba, P.A. Griffiths, J. Harris: Geometry of algebraic curves., vol. 1, Springer-Verlag. [5] A. Arsie and C. Galati: Geometric k-normality of curves and applications, Le Matematiche, Vol. LVIII (2003), Fasc. II, 179–199. [6] S. Diaz and J. Harris: Ideals associated to deformations of singular plane curves, Transactions of the American Mathematical Society, vol. 309, n. 2, 433–468 (1988). [7] C. Galati: Number of moduli of plane curves with nodes and cusps., PhD thesis, Università degli Studi di Tor Vergata, 2004-2005. [8] J. Harris: On the Severi problem, Invent. Math., vol. 84 (1986), no. 3, 445–461. [9] J. Harris and I. Morrison: Moduli of curves, Graduate texts in mathematics, vol. 187, Springer, New York, 1988. [10] D. Eisenbud and J. Harris: The Kodaira dimension of the moduli space of curves of genus ≥ 23. Invent. Math. vol. 90 (1987), no. 2, 359–387. [11] G.M Greuel and U. Karras: Families of varieties with prescribed singularities, Compositio Math. vol. 69 (1989), no. 1, 83–110. [12] G.M. Greuel, C. Lossen, and E. Shustin: Castelnuovo function, zero-dimensional schemes and singular plane curves. J. Algebraic Geom. vol. 9 (2000), no. 4, 663–710. [13] E. Horikawa: On the deformations of the holomorphic maps I, J. Math. Soc. Japan, vol. 25 (1973), 372–396. [14] E. Horikawa: On the deformations of the holomorphic maps II, J. Math. Soc. Japan, vol. 26 (1974), 647–667. [15] P. 
Kang: A note on the variety of plane curves with nodes and cusps, Proc. Amer. Math. Soc. vol. 106 (1989), no. 2, 309–312.
[16] P. Kang: On the variety of plane curves of degree d with δ nodes and k cusps, Trans. Amer. Math. Soc. vol. 316 (1989), no. 1, 165–192.
[17] D. Mumford: Lectures on curves on an algebraic surface, Princeton University Press, 1966.
[18] E. Sernesi: On the existence of certain families of curves, Invent. Math. vol. 75 (1984), no. 1, 25–57.
[19] F. Severi: Vorlesungen über algebraische Geometrie, Teubner, Leipzig, 1921.
[20] E. Shustin: Smoothness and irreducibility of varieties of plane curves with nodes and cusps, Bull. Soc. Math. France vol. 122 (1994), no. 2, 235–253.
[21] E. Shustin: Equiclassical deformations of plane algebraic curves, Singularities (Oberwolfach, 1996), 195–204, Progr. Math., vol. 162, Birkhäuser, Basel, 1998.
[22] A. Tannenbaum: On the classical characteristic linear series of plane curves with nodes and cuspidal points: two examples of Beniamino Segre, Compositio Mathematica vol. 51 (1984), 169–183.
[23] J. Wahl: Deformations of plane curves with nodes and cusps, Amer. J. Math. vol. 96 (1974), 529–577.
[24] O. Zariski: Dimension theoretic characterization of maximal irreducible systems of plane nodal curves, Amer. J. Math. vol. 104 (1982), no. 1, 209–226.
[25] O. Zariski: Algebraic surfaces, Classics in mathematics, Springer.
Dipartimento di Matematica, Università degli Studi della Calabria, via P. Bucci, cubo 30B, Arcavacata di Rende (CS)
E-mail address: galati@mat.unical.it
ABSTRACT Consider the family S of irreducible plane curves of degree n with d nodes and k cusps as singularities. Let W be an irreducible component of S. We consider the natural rational map from W to the moduli space of curves of genus g=(n-1)(n-2)/2-d-k. We define the "number of moduli of W" as the dimension of the image of W with respect to this map. If W has the expected dimension equal to 3n+g-1-k, then the number of moduli of W is at most equal to min(3g-3, 3g-3+\rho-k), where \rho is the Brill-Noether number of the linear series of degree n and dimension 2 on a smooth curve of genus g. We say that W has the expected number of moduli if the equality holds. In this paper we construct examples of families of irreducible plane curves with nodes and cusps as singularities having expected number of moduli and with non-positive Brill-Noether number.
<|endoftext|><|startoftext|> Introduction
Identifying the mechanism of electroweak symmetry breaking will be one of the main goals of the LHC. Many possibilities have been studied in the literature, of which the most popular ones are the Higgs mechanism within the Standard Model (SM) and within the Minimal Supersymmetric Standard Model (MSSM) [1]. Contrary to the case of the SM, in the MSSM two Higgs doublets are required. This results in five physical Higgs bosons instead of the single Higgs boson of the SM. These are the light and heavy CP-even Higgs bosons, h and H, the CP-odd Higgs boson, A, and the charged Higgs boson, H±.¹ The Higgs sector of the MSSM can be specified at lowest order in terms of the gauge couplings, the ratio of the two Higgs vacuum expectation values, tan β ≡ v2/v1, and the mass of the CP-odd Higgs boson, MA. Consequently, the masses of the CP-even neutral Higgs bosons and the charged Higgs boson are dependent quantities that can be predicted in terms of the Higgs-sector parameters.
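The statement that the remaining Higgs masses are dependent quantities can be made concrete with the textbook lowest-order MSSM mass relations, which the text above does not write out. The following is a minimal sketch under that assumption; the function name and the numerical values of MZ and MW are illustrative choices of ours, not inputs taken from the paper.

```python
import math

M_Z = 91.1876  # GeV, Z-boson mass (illustrative input)
M_W = 80.379   # GeV, W-boson mass (illustrative input)


def tree_level_higgs_masses(m_a, tan_beta):
    """Return (M_h, M_H, M_Hpm) in GeV at lowest order for given M_A and tan(beta).

    Uses the standard tree-level relations
      M_{h,H}^2 = 1/2 [ M_A^2 + M_Z^2 -/+ sqrt((M_A^2 + M_Z^2)^2
                                               - 4 M_A^2 M_Z^2 cos^2(2 beta)) ],
      M_{H+-}^2 = M_A^2 + M_W^2.
    """
    beta = math.atan(tan_beta)
    c2b = math.cos(2.0 * beta)
    s = m_a**2 + M_Z**2
    root = math.sqrt(s**2 - 4.0 * m_a**2 * M_Z**2 * c2b**2)
    m_h = math.sqrt(0.5 * (s - root))
    m_H = math.sqrt(0.5 * (s + root))
    m_Hpm = math.sqrt(m_a**2 + M_W**2)
    return m_h, m_H, m_Hpm


for m_a in (100.0, 300.0, 1000.0):
    m_h, m_H, m_Hpm = tree_level_higgs_masses(m_a, tan_beta=10.0)
    print(f"M_A={m_a:7.1f}  M_h={m_h:6.1f}  M_H={m_H:7.1f}  M_H+-={m_Hpm:7.1f}")
```

At this order the light CP-even mass never exceeds MZ |cos 2β|, so the sketch is only a lowest-order illustration and not a substitute for a full calculation.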
Higgs-phenomenology in the MSSM is strongly affected by higher-order corrections, in particular from the sector of the third generation quarks and squarks, so that the dependencies on various other MSSM parameters can be important. After the termination of LEP in the year 2000 (the final LEP results can be found in Refs. [2, 3]), and the (ongoing) Higgs boson search at the Tevatron [4–6], the search will be continued at the LHC [7–9] (see also Refs. [10, 11] for recent reviews). The current exclusion bounds within the MSSM [3–5] and the prospective sensitivities at the LHC are usually dis- played in terms of the parameters MA and tan β that characterize the MSSM Higgs sector at lowest order. The other MSSM parameters are conventionally fixed according to certain benchmark scenarios [12–14]. The most prominent one is the “mmaxh scenario”, which in the search for the light CP-even Higgs boson allows to obtain conservative bounds on tan β for fixed values of the top-quark mass and the scale of the supersymmetric particles [15]. Besides the “no-mixing scenario”, which is similar to the mmaxh scenario, but assumes vanishing mix- ing in the stop sector, other CP-conserving scenarios that have been studied in LHC analyses (see e.g. Ref. [11]) are the “gluophobic Higgs scenario” and the “small αeff” scenario [13]. For the interpretation of the exclusion bounds and prospective discovery contours in the benchmark scenarios it is important to assess how sensitively the results depend on those parameters that have been fixed according to the benchmark prescriptions. While in the decoupling limit, which is the region of MSSM parameter space with MA ≫ MZ , the couplings of the light CP-even Higgs boson approach those of a SM Higgs boson with the same mass, the couplings of the heavy Higgs bosons of the MSSM can be sizably affected by higher-order contributions even for large values of MA. 
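The approach to SM-like couplings in the decoupling limit can be illustrated at lowest order. The sketch below assumes the standard tree-level CP-even mass matrix and the usual convention in which the coupling of h to W and Z bosons, relative to the SM, is sin(β − α); the function name and the value of MZ are our own illustrative choices, and the higher-order effects discussed in the text are deliberately left out.

```python
import math

M_Z = 91.1876  # GeV, Z-boson mass (illustrative input)


def sin_beta_minus_alpha(m_a, tan_beta):
    """Tree-level sin(beta - alpha): the h coupling to W/Z relative to the SM."""
    beta = math.atan(tan_beta)
    sb, cb = math.sin(beta), math.cos(beta)
    # Tree-level CP-even mass-squared matrix in the (phi_1, phi_2) basis.
    m11 = m_a**2 * sb**2 + M_Z**2 * cb**2
    m22 = m_a**2 * cb**2 + M_Z**2 * sb**2
    m12 = -(m_a**2 + M_Z**2) * sb * cb
    # Rotation angle of the heavier eigenstate H = cos(alpha) phi_1 + sin(alpha) phi_2.
    alpha = 0.5 * math.atan2(2.0 * m12, m11 - m22)
    return math.sin(beta - alpha)


for m_a in (100.0, 200.0, 500.0, 1000.0):
    print(f"M_A = {m_a:6.1f} GeV   sin(beta - alpha) = "
          f"{sin_beta_minus_alpha(m_a, tan_beta=10.0):.6f}")
```

As MA grows well beyond MZ, the printed factor tends to one, which is the lowest-order version of the statement that the light CP-even Higgs boson becomes SM-like.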
The kinematics of the heavy Higgs-boson production processes, on the other hand, is governed by the parameter MA, since in the region of large MA the heavy MSSM Higgs bosons are nearly mass-degenerate, MA ≈ MH ≈ MH±. In Ref. [14] it has been shown that higher-order contributions to the relation between the bottom-quark mass and the bottom-Yukawa coupling have a dramatic effect on the exclusion bounds in the MA–tanβ plane obtained from the bb̄φ, φ → bb̄ channel at the Tevatron.
In this article we investigate how the 5 σ discovery regions in the MA–tanβ plane for the heavy neutral MSSM Higgs bosons (a corresponding analysis for the charged Higgs-boson search will be presented elsewhere) obtainable with the CMS experiment at the LHC depend on the other MSSM parameters. For the experimental sensitivities achievable with CMS we use up-to-date results based on full simulation studies for 30 or 60 fb$^{-1}$ (depending on the channel) [9]. This information is combined with precise theory predictions for the Higgs-boson masses and the involved production and decay processes incorporating higher-order corrections at the one-loop and two-loop level. In our analysis we investigate the impact on the discovery reach arising both from higher-order corrections and from possible decays of the heavy Higgs bosons into supersymmetric particles.²
¹ We focus in this paper on the case without explicit CP-violation in the soft supersymmetry-breaking terms.
The search for the heavy neutral MSSM Higgs bosons at the LHC will mainly be pursued in the b quark associated production with a subsequent decay to τ leptons [7–9]. In the region of large tanβ this production process benefits from an enhancement factor of $\tan^2\beta$ compared to the SM case. The main search channels are³ (here and in the following φ denotes the two heavy neutral MSSM Higgs bosons, φ = H, A):
bb̄φ, φ → τ+τ− → 2 jets (1)
bb̄φ, φ → τ+τ− → µ + jet (2)
bb̄φ, φ → τ+τ− → e + jet (3)
bb̄φ, φ → τ+τ− → e + µ .
(4) For our numerical analysis we use the program FeynHiggs [19–22]. We study in particular the dependence of the “LHC wedge” region, i.e. the region in which only the light CP-even MSSM Higgs boson can be detected at the LHC at the 5 σ level, on the variation of the higgsino mass parameter µ. The dependence on µ enters in two different ways, on the one hand via higher-order corrections affecting the relation between the bottom mass and the bottom Yukawa coupling, and on the other hand via the kinematics of Higgs decays into supersymmetric particles. We analyze both effects separately and discuss the possible impact of other supersymmetric parameters. Our results for the discovery reach of the heavy neutral MSSM Higgs bosons extend the known results in the literature in various ways. In comparison with Refs. [23, 24], where the prospective 5σ discovery contours for CMS in the MA–tanβ plane of the m h benchmark scenario were given for three different values of µ, the results in the present paper are based on full simulation studies and make use of the most up-to-date CMS tools for triggering and event reconstruction. Furthermore, in the analysis of Refs. [23, 24] relevant higher- order corrections, in particular those depending on ∆b (see Sect. 2.2 below), have been neglected. The effects induced by the ∆b corrections have been investigated in Ref. [14], where the results were obtained by a simple rescaling of the experimental results given in Refs. [7, 23–25]. Our present analysis, on the other hand, makes use of the latest CMS studies and provides a separate treatment of the different τ final states, channels (1)–(4). As a second step of our analysis we investigate the experimental precision that can be achieved for the determination of the heavy Higgs-boson masses in the discovery channels (1)– 2We restrict our analysis to the impact of supersymmetric contributions. For a discussion of uncertainties related to parton distribution functions, see e.g. Ref. [16]. 
3In our analysis we do not consider diffractive Higgs production, pp → p ⊕ H ⊕ p [17]. For a detailed discussion of the search reach for the heavy neutral MSSM Higgs bosons in diffractive Higgs production we refer to Ref. [18]. (4). We discuss the prospective accuracy of the mass measurement in view of the possibility to experimentally resolve the signals of the heavy neutral MSSM Higgs bosons. The paper is organized as follows: Sect. 2 introduces our notation and gives a brief sum- mary of the most relevant supersymmetric radiative corrections to the Higgs-boson masses, production cross sections and decay widths at the LHC. The relevant benchmark scenarios are briefly reviewed. In Sect. 3 the experimental analysis is described. The results for the variation of the 5 σ discovery contours, obtainable at CMS with 30 or 60 fb−1 are given in Sect. 4, where we also discuss the achievable experimental precision in the Higgs mass determination. The conclusions can be found in Sect. 5. 2 Phenomenology of the MSSM Higgs sector 2.1 Notation The MSSM Higgs sector at lowest order is described in terms of two independent parameters (besides the SM gauge couplings): tan β ≡ v2/v1, the ratio of the two vacuum expectation values, and MA, the mass of the CP-odd Higgs boson A. Beyond the tree-level, large radiative corrections can occur from the t/t̃ sector, and for large values of tanβ also from the b/b̃ sector. Our notations for the scalar top and scalar bottom sector of the MSSM are as follows: the mass matrices in the basis of the current eigenstates t̃L, t̃R and b̃L, b̃R are given by +m2t + cos 2β ( s2w)M Z mtXt mtXt M +m2t + cos 2β s2wM , (5) +m2b + cos 2β (−12 + s2w)M Z mbXb mbXb M +m2b − 13 cos 2β s , (6) where mtXt = mt(At − µ cotβ ), mb Xb = mb (Ab − µ tanβ ). (7) Here MQ̃, Mt̃R and Mb̃R are the diagonal soft SUSY-breaking parameters, At denotes the trilinear Higgs–stop coupling, Ab denotes the Higgs–sbottom coupling, and µ is the higgsino mass parameter. 
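As an illustration of this notation, the following sketch builds the stop mass-squared matrix and diagonalizes it in closed form. It assumes the standard D-term coefficients (1/2 − 2/3 s_w²) and 2/3 s_w² in the diagonal entries and the off-diagonal entry mt Xt with Xt = At − µ cot β. The values of MZ and s_w², the sample parameter point, and all function names are assumptions made for this example only; they are not taken from the analysis.

```python
import math

M_Z = 91.19   # GeV, assumed Z-boson mass
SW2 = 0.231   # assumed value of s_w^2 (sine squared of the weak mixing angle)


def stop_mass_matrix(m_q, m_tr, m_t, a_t, mu, tan_beta):
    """Return (m11, m12, m22) of the symmetric stop mass-squared matrix in GeV^2."""
    cos2b = math.cos(2.0 * math.atan(tan_beta))
    x_t = a_t - mu / tan_beta  # X_t = A_t - mu * cot(beta)
    m11 = m_q**2 + m_t**2 + cos2b * (0.5 - 2.0 / 3.0 * SW2) * M_Z**2
    m22 = m_tr**2 + m_t**2 + 2.0 / 3.0 * cos2b * SW2 * M_Z**2
    return m11, m_t * x_t, m22


def mass_eigenvalues(m11, m12, m22):
    """Masses (lighter, heavier) from a real symmetric 2x2 mass-squared matrix."""
    mean = 0.5 * (m11 + m22)
    half_gap = math.hypot(0.5 * (m11 - m22), m12)
    return math.sqrt(mean - half_gap), math.sqrt(mean + half_gap)


# Hypothetical sample point: common soft mass of 1 TeV and large stop mixing.
m11, m12, m22 = stop_mass_matrix(m_q=1000.0, m_tr=1000.0, m_t=171.4,
                                 a_t=2020.0, mu=1000.0, tan_beta=50.0)
m_stop1, m_stop2 = mass_eigenvalues(m11, m12, m22)
print(f"m_stop1 = {m_stop1:.0f} GeV, m_stop2 = {m_stop2:.0f} GeV")
```

The closed-form diagonalization avoids any linear-algebra dependency; for large Xt the splitting between the two stop masses is driven almost entirely by the off-diagonal entry mt Xt.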
For the numerical evaluation, it is often convenient to choose

MQ̃ = Mt̃R = Mb̃R =: MSUSY. (8)

Concerning analyses for the case where Mt̃R ≠ MQ̃ ≠ Mb̃R, see e.g. Refs. [20, 26, 27]. It has been shown that the upper bound on the mass of the light CP-even Higgs boson, Mh, obtained using eq. (8) is the same as for the more general case, provided that MSUSY is identified with the heaviest mass of MQ̃, Mt̃R, Mb̃R [20].

Accordingly, the most important parameters entering the Higgs-sector predictions via higher-order corrections are mt, MSUSY, Xt, Xb and µ (see also the discussion in Sect. 2.2.2 below). The Higgs-sector observables furthermore depend on the SU(2) gaugino mass parameter, M2, the U(1) parameter M1 and the gluino mass, mg̃ (the latter enters the predictions for the Higgs-boson masses only from two-loop order on). In numerical analyses the U(1) gaugino mass parameter, M1, is often fixed via the GUT relation

\[ M_1 = \frac{5}{3} \frac{s_w^2}{c_w^2} M_2 . \qquad (9) \]

We will briefly comment below on the possible impact of complex phases entering the Higgs-sector predictions via higher-order contributions.

2.2 Higher-order corrections in the Higgs sector

In the following we briefly summarize the most important higher-order corrections affecting the observables in the MSSM Higgs-boson sector. As mentioned above, we focus on the MSSM with real parameters. For our numerical analysis we use the program FeynHiggs [19–22]⁴, which incorporates a comprehensive set of higher-order results obtained in the Feynman-diagrammatic approach [20–22, 28–30].

2.2.1 Higgs-boson propagator corrections

Higher-order corrections to the Higgs-boson masses and the wave function normalization factors of processes with external Higgs bosons arise from Higgs-boson propagator-type contributions. These corrections furthermore contribute in a universal way to all Higgs-boson couplings.
For the propagator-type corrections in the MSSM the complete one-loop results [31–34], the bulk of the two-loop contributions [20, 27–29, 35–39] and even leading three-loop corrections [40] are known. The remaining theoretical uncertainty on the light CP-even Higgs-boson mass has been estimated to be below ∼ 3 GeV [21, 41]. The by far dominant contribution is the O(αt) term due to top and stop loops (αt ≡ ht²/(4π), where ht denotes the top-quark Yukawa coupling). Effects of O(αb) can be important for large values of tanβ.

⁴ The code can be obtained from www.feynhiggs.de .

2.2.2 Corrections to the relation between the bottom-quark mass and the bottom Yukawa coupling

Concerning the corrections from the bottom/sbottom sector, large higher-order effects can in particular occur for large values of tanβ in the relation between the bottom-quark mass and the bottom Yukawa coupling, hb (which controls the interaction between the Higgs bosons and bottom quarks as well as between the Higgs and scalar bottoms). At lowest order the relation reads mb = hb v1. Beyond the tree level, large radiative corrections proportional to hb v2 are induced, giving rise to tanβ-enhanced contributions [36–38, 42]. At the one-loop level the leading terms proportional to v2 are generated either by gluino–sbottom one-loop diagrams of O(αs) or by chargino–stop loops of O(αt). The leading one-loop contribution ∆b in the limit of MSUSY ≫ mt and tanβ ≫ 1 takes the simple form [36]

\[ \Delta_b = \frac{2\alpha_s}{3\pi}\, m_{\tilde g}\, \mu \tan\beta \times I(m_{\tilde b_1}, m_{\tilde b_2}, m_{\tilde g}) + \frac{\alpha_t}{4\pi}\, A_t\, \mu \tan\beta \times I(m_{\tilde t_1}, m_{\tilde t_2}, \mu) , \qquad (10) \]

where the function I is given by

\[ I(a, b, c) = \frac{1}{(a^2 - b^2)(b^2 - c^2)(a^2 - c^2)} \left( a^2 b^2 \log\frac{a^2}{b^2} + b^2 c^2 \log\frac{b^2}{c^2} + c^2 a^2 \log\frac{c^2}{a^2} \right) \sim \frac{1}{\max(a^2, b^2, c^2)} . \qquad (11) \]

The leading contribution can be resummed to all orders in the perturbative expansion [36–38]. This leads in particular to the replacement

\[ \overline{m}_b \to \frac{\overline{m}_b}{1 + \Delta_b} , \qquad (12) \]

where m̄b denotes the running bottom quark mass including SM QCD corrections. For the numerical evaluations in this paper we choose m̄b = m̄b(mt) ≈ 2.97 GeV.
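The structure of the ∆b correction can be made concrete with a short numerical sketch. It assumes the standard form of the one-loop expression, ∆b = (2αs/3π) mg̃ µ tanβ I(mb̃1, mb̃2, mg̃) + (αt/4π) At µ tanβ I(mt̃1, mt̃2, µ), with I the symmetric three-mass loop function, and the resummed replacement m̄b → m̄b/(1 + ∆b). The couplings αs and αt, the squark masses and the function names below are illustrative assumptions, not inputs of the analysis.

```python
import math


def loop_I(a, b, c):
    """Loop function I(a, b, c); the three masses must be pairwise distinct in magnitude."""
    a2, b2, c2 = a * a, b * b, c * c
    num = (a2 * b2 * math.log(a2 / b2)
           + b2 * c2 * math.log(b2 / c2)
           + c2 * a2 * math.log(c2 / a2))
    den = (a2 - b2) * (b2 - c2) * (a2 - c2)
    return num / den


def delta_b(alpha_s, alpha_t, m_gluino, mu, tan_beta, a_t,
            m_sb1, m_sb2, m_st1, m_st2):
    """Leading one-loop Delta_b: gluino-sbottom plus chargino-stop contribution."""
    qcd = (2.0 * alpha_s / (3.0 * math.pi)) * m_gluino * mu * tan_beta \
        * loop_I(m_sb1, m_sb2, m_gluino)
    ew = (alpha_t / (4.0 * math.pi)) * a_t * mu * tan_beta \
        * loop_I(m_st1, m_st2, mu)
    return qcd + ew


def resummed_mb(mb_running, db):
    """Effective bottom mass after the replacement mb -> mb / (1 + Delta_b)."""
    return mb_running / (1.0 + db)


# Hypothetical inputs (GeV); alpha_s and alpha_t are assumed values.
point = dict(alpha_s=0.108, alpha_t=0.073, m_gluino=800.0, tan_beta=50.0,
             a_t=2000.0, m_sb1=980.0, m_sb2=1020.0, m_st1=830.0, m_st2=1170.0)
db_pos = delta_b(mu=+1000.0, **point)
db_neg = delta_b(mu=-1000.0, **point)
print(f"Delta_b(mu=+1 TeV) = {db_pos:+.2f}, mb_eff = {resummed_mb(2.97, db_pos):.2f} GeV")
print(f"Delta_b(mu=-1 TeV) = {db_neg:+.2f}, mb_eff = {resummed_mb(2.97, db_neg):.2f} GeV")
```

Because I depends only on the squared masses, flipping the sign of µ at fixed At flips the sign of ∆b exactly, which is the origin of the pronounced µ dependence of the bottom Yukawa coupling.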
The ∆b corrections are numerically sizable for large tanβ in combination with large values of the ratios µ mg̃/MSUSY² or µ At/MSUSY². Negative values of ∆b lead to an enhancement of the bottom Yukawa coupling as a consequence of eq. (12) (for extreme values of µ and tanβ the bottom Yukawa coupling can even acquire non-perturbative values when ∆b → −1), while positive values of ∆b give rise to a suppression of the Yukawa coupling. Since a change in the sign of µ reverses the sign of ∆b, the bottom Yukawa coupling can exhibit a very pronounced dependence on the parameter µ.

For large values of tanβ the correction to the production cross sections of the Higgs bosons H and A induced by ∆b enters approximately like tan²β/(1 + ∆b)², giving rise to potentially large numerical effects. In the case of the subsequent Higgs-boson decay φ → τ+τ−, however, the ∆b corrections in the production and the decay process cancel each other to a large extent. The residual ∆b dependence of σ(bb̄φ) × BR(φ → τ+τ−) is approximately given by tan²β/((1 + ∆b)² + 9), which has a much weaker ∆b dependence (see Ref. [14] for a more detailed discussion).

In the numerical analysis below the ∆b corrections, which have been discussed in this section in terms of simple approximation formulae, will be supplemented by other higher-order corrections as implemented in the program FeynHiggs (and possible decay modes into supersymmetric particles are taken into account). Higher-order corrections to Higgs decays into τ+τ− within the SM and MSSM have been evaluated in Refs. [34, 43].

2.2.3 Corrections to the Higgs production cross sections

For the prediction of Higgs-boson production processes at hadron colliders SM-type QCD corrections in general play an important role. The SM predictions for the process bb̄ → φ + X at the LHC are far advanced. In the five-flavor scheme the SM cross section is known at NNLO in QCD [44]. The cross section in the four-flavor scheme is known at NLO [45, 46].
Results obtained in the two schemes have been shown to be consistent [47–49] (see also Refs. [48, 50] and Refs. [45, 46] for results with one and two final-state b-quarks at high pT, respectively).

The predictions for the bb̄ → φ + X cross sections in the MSSM have been obtained with FeynHiggs [19–22]. The FeynHiggs implementation⁵ is based on the state-of-the-art SM prediction, namely the NNLO result in the five-flavor scheme [44] using MRST2002 parton distributions at NNLO [51], with the renormalization scale set equal to MHSM and the factorization scale set equal to MHSM/4. In order to obtain the MSSM prediction the SM cross section is rescaled with the ratio of the partial widths in the MSSM and the SM,

\[ \frac{\Gamma(\phi \to b\bar b)_{\rm MSSM}}{\Gamma(\phi \to b\bar b)_{\rm SM}} . \qquad (13) \]

⁵ The inclusion of the charged Higgs production cross sections is planned for the near future.

The evaluation of the partial widths incorporates one-loop SM QCD and SUSY QCD corrections, as well as (in the SUSY case) the resummation of all terms of O((αs tanβ)ⁿ) [34, 37, 43] and the proper normalization of the external Higgs bosons as discussed in Refs. [22, 52]. Since the approximation of rescaling the SM cross section with the ratio of partial widths does not take into account the MSSM-specific dynamics of the production processes, the theoretical uncertainty in the predictions for the cross sections will in general be somewhat larger than for the decay widths. It should be noted that in comparison with other approaches for treating the SM and SUSY contributions, for instance the program HQQ [53], sizable deviations can occur as a consequence of differences in the scale choices and the inclusion of higher-order corrections.
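To close the discussion of Sect. 2.2, the two approximate scaling laws quoted in Sect. 2.2.2 can be compared numerically: the production cross section alone scales like tan²β/(1 + ∆b)², whereas the product σ(bb̄φ) × BR(φ → τ+τ−) scales like tan²β/((1 + ∆b)² + 9). The sketch below only evaluates these two approximation formulae; the function names and the sample values of ∆b are chosen for illustration.

```python
def production_factor(tan_beta, delta_b):
    """Approximate scaling of the bb-phi production cross section."""
    return tan_beta**2 / (1.0 + delta_b)**2


def rate_factor(tan_beta, delta_b):
    """Approximate scaling of sigma(bb phi) x BR(phi -> tau tau)."""
    return tan_beta**2 / ((1.0 + delta_b)**2 + 9.0)


tan_beta = 50.0
for db in (-0.5, 0.0, 0.5):
    print(f"Delta_b = {db:+.1f}: "
          f"production ~ {production_factor(tan_beta, db):8.1f}, "
          f"production x BR ~ {rate_factor(tan_beta, db):6.1f}")

# Spread between Delta_b = -0.5 and Delta_b = +0.5 for each quantity.
spread_production = production_factor(tan_beta, -0.5) / production_factor(tan_beta, 0.5)
spread_rate = rate_factor(tan_beta, -0.5) / rate_factor(tan_beta, 0.5)
print(f"spread in production: x{spread_production:.1f}, in production x BR: x{spread_rate:.2f}")
```

For this range of ∆b the production cross section alone varies by a factor of 9, while the product with the τ+τ− branching ratio varies by only about 20%, which is the cancellation described in the text.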
2.3 The m_h^max and no-mixing benchmark scenarios

While the phenomenology of the production and decay processes of the heavy neutral MSSM Higgs bosons at the LHC is mainly characterised by the parameters MA and tanβ that govern the Higgs sector at lowest order, other MSSM parameters enter via higher-order contributions, as discussed above, and via the kinematics of Higgs-boson decays into supersymmetric particles. The other MSSM parameters are usually fixed in terms of benchmark scenarios. The most commonly used scenarios are the “m_h^max” and “no-mixing” benchmark scenarios [12–14]. According to the definition of Ref. [13] the m_h^max scenario is given by

m_h^max: MSUSY = 1000 GeV, Xt = 2 MSUSY, Ab = At, µ = 200 GeV, M2 = 200 GeV, mg̃ = 0.8 MSUSY . (14)

The no-mixing scenario differs from the m_h^max scenario only in that it has vanishing mixing in the stop sector and a larger value of MSUSY,

no-mixing: MSUSY = 2000 GeV, Xt = 0, Ab = At, µ = 200 GeV, M2 = 200 GeV, mg̃ = 0.8 MSUSY . (15)

The value of the top-quark mass in Ref. [13] was chosen according to the experimental central value at that time. For our numerical analysis below, we use the value mt = 171.4 GeV [54].⁶

⁶ Most recently the central experimental value has shifted to mt = 170.9 ± 1.8 GeV [55]. This shift has a negligible impact on our analysis.

In Ref. [14] it was suggested that in the search for heavy MSSM Higgs bosons the m_h^max and no-mixing scenarios, which originally were mainly designed for the search for the light CP-even Higgs boson h, should be extended by several discrete values of µ,

µ = ±200, ±500, ±1000 GeV . (16)

As discussed above, the variation of µ in particular has an impact on the correction ∆b, modifying in this way the bottom Yukawa coupling. For very large values of tanβ and large negative values of µ the bottom Yukawa coupling can be so much enhanced that a perturbative treatment is no longer possible.
We have checked that in our analysis of the LHC discovery contours the bottom Yukawa coupling stays in the perturbative regime, so that all values of µ down to µ = −1000 GeV can safely be inserted.

The variation of the parameter µ also modifies the mass spectrum and the couplings in the chargino and neutralino sector of the MSSM. Besides the small higher-order corrections induced by loop diagrams involving charginos and neutralinos, a change in the mass spectrum of the chargino and neutralino sector can have an important effect on Higgs phenomenology because decay modes of the heavy neutral MSSM Higgs bosons into charginos and neutralinos open up if the supersymmetric particles are sufficiently light (the mass spectrum in the m_h^max and no-mixing scenarios respects the limits from direct searches for charginos at LEP [56] for all values of µ specified in eq. (16)).

Differences between the m_h^max and no-mixing scenarios in the searches for heavy neutral MSSM Higgs bosons are induced in particular by a difference in the ∆b correction. While in the m_h^max scenario both the O(αs) and O(αt) contributions to ∆b can be sizable, see eq. (10), in the no-mixing scenario the O(αt) contribution is very small because At is close to zero in this case. The larger value of MSUSY in the no-mixing scenario gives rise to an additional suppression of |∆b| compared to the m_h^max scenario.

3 Experimental analysis

In this section we briefly review the recent CMS analysis of the φ → τ+τ− channel, see Ref. [9], yielding the number of events needed for a 5σ discovery (depending on the mass of the Higgs boson). The analysis was performed with full CMS detector simulation and reconstruction for the following four final states of di-τ-lepton decays: τ+τ− → jets [57], τ+τ− → e + jet [58], τ+τ− → µ + jet [59] and τ+τ− → e + µ [60]. The Higgs-boson production in association with b quarks, pp → bb̄φ, has been selected using single b-jet tagging in the experimental analysis.
The kinematics of the gg → bb̄φ production process (2 → 3) was generated with PYTHIA [61]. It has been shown that in this way the NLO kinematics is better reproduced than using the PYTHIA gb → bφ process (2 → 2) [62]. The backgrounds considered in the analysis were QCD multi-jet events (for the ττ → jets mode), tt̄, bb̄, Drell-Yan production of Z, γ∗, W+jet, Wt and ττbb̄. All background processes were generated using PYTHIA, except for τ+τ−bb̄, which was generated using CompHEP [63].

The results for the various channels, eqs. (1)–(4), are given in Tabs. 1–4. For every Higgs-boson mass point studied we show the number of signal events needed for 5σ discovery, NS, the total experimental selection efficiency, εexp, and the ratio of the di-τ mass resolution to the Higgs-boson mass, RMφ. The last row in Tabs. 1–4 shows the expected precision of the Higgs-boson mass measurement, evaluated as explained below, for parameter points on the 5σ discovery contour. Detector effects, experimental systematics and uncertainties of the background determination were taken into account in the evaluation of NS. These effects reduce the discovery region in the MA–tanβ plane as shown in previous analyses [9] (see in particular Fig. 5.6 of Ref. [9] for the τ+τ− → µ + jet mode).

φ → τ+τ− → jets, 60 fb⁻¹
MA [GeV]        200          500          800
NS              63           35           17
εexp            2.5 × 10⁻⁴   2.4 × 10⁻³   3.6 × 10⁻³
RMφ             0.176        0.171        0.187
∆Mφ/Mφ [%]      2.2          2.8          4.5

Table 1: Required number of signal events, NS, with L = 60 fb⁻¹ for a 5σ discovery in the channel φ → τ+τ− → jets. Furthermore given are the total experimental selection efficiency, εexp, the ratio of the di-τ mass resolution to the Higgs-boson mass, RMφ, and the expected precision of the Higgs-boson mass measurement, ∆Mφ/Mφ, obtainable from NS signal events.

φ → τ+τ− → e + jet, 30 fb⁻¹
MA [GeV]        200          300          500
NS              72.9         45.5         32.8
εexp            3.0 × 10⁻³   6.4 × 10⁻³   1.0 × 10⁻²
RMφ             0.216        0.214        0.230
∆Mφ/Mφ [%]      2.5          3.2          4.0

Table 2: Required number of signal events, NS, with L = 30 fb⁻¹ for a 5σ discovery in the channel φ → τ+τ− → e + jet. The other quantities are defined as in Tab. 1.

φ → τ+τ− → µ + jet, 30 fb⁻¹
MA [GeV]        200          500
NS              79           57
εexp            7.0 × 10⁻³   2.0 × 10⁻²
RMφ             0.210        0.200
∆Mφ/Mφ [%]      2.4          2.6

Table 3: Required number of signal events, NS, with L = 30 fb⁻¹ for a 5σ discovery in the channel φ → τ+τ− → µ + jet. The other quantities are defined as in Tab. 1.

φ → τ+τ− → e + µ, 30 fb⁻¹
MA [GeV]        200          250
NS              87.8         136.7
εexp            6.4 × 10⁻³   1.1 × 10⁻²
RMφ             0.262        0.412
∆Mφ/Mφ [%]      2.8          3.5

Table 4: Required number of signal events, NS, with L = 30 fb⁻¹ for a 5σ discovery in the channel φ → τ+τ− → e + µ. The other quantities are defined as in Tab. 1.

Now we turn to the evaluation of the expected precision of the Higgs-boson mass measurement. In spite of the escaping neutrinos, the Higgs-boson mass can be reconstructed in the H, A → ττ channel from the visible τ momenta (τ jets) and the missing transverse energy, ETmiss, using the collinearity approximation for neutrinos from highly boosted τ’s. In the investigated region of MA and tanβ the two states A and H are nearly mass-degenerate. For most values of the other MSSM parameters the mass difference of A and H is much smaller than the achievable mass resolution. In this case the difference in reconstructing the A or the H will have no relevant effect on the achievable accuracy in the mass determination. In some regions of the MSSM parameter space, however, a sizable splitting between MA and MH can occur even for MA ≫ MZ. We will discuss below the prospects in scenarios where the splitting between MA and MH is relatively large.

The precision ∆Mφ/Mφ shown in Tabs. 1–4 is derived for the border of the parameter space in which a 5σ discovery can be claimed, i.e. with NS observed Higgs events. The statistical accuracy of the mass measurement has been evaluated via

\[ \frac{\Delta M_\phi}{M_\phi} = \frac{R_{M_\phi}}{\sqrt{N_S}} . \qquad (17) \]

A higher precision can be achieved if more than NS events are observed.
The corresponding estimate for the precision is obtained by replacing NS in eq. (17) by the number of observed signal events, Nev. It should be noted that the prospective accuracy obtained from eq. (17) does not take into account the uncertainties of the jet and missing ET energy scales. In the τ+τ− → jets mode these effects can lead to an additional 3% uncertainty in the mass measurement [57]. A more dedicated procedure for the mass measurement from the signal plus background data still has to be developed in the experimental analysis. However, we do not expect that the additional uncertainties will considerably degrade the accuracy of the Higgs-boson mass measurement as calculated with eq. (17).

4 Results

The results quoted in Sect. 3 for the required number of signal events depend only on the Higgs-boson mass, i.e. the event kinematics, but are independent of any specific MSSM scenario. In order to determine the 5σ discovery contours in the MA–tanβ plane these results have to be confronted with the MSSM predictions. The number of signal events, Nev, for a given parameter point is evaluated via

Nev = L × σbb̄φ × BR(φ → τ+τ−) × BRττ × εexp . (18)

Here L denotes the luminosity collected with the CMS detector, σbb̄φ is the Higgs-boson production cross section, BR(φ → τ+τ−) is the branching ratio of the Higgs boson to τ leptons, BRττ is the product of the branching ratios of the two τ leptons into their respective final state,

BR(τ → jet + X) ≈ 0.65 , (19)
BR(τ → µ + X) ≈ BR(τ → e + X) ≈ 0.175 , (20)

and εexp denotes the total experimental selection efficiency for the respective process (as given in Tabs. 1–4). The Higgs-boson production cross sections and decay branching ratios have been evaluated with FeynHiggs as described in Sect. 2.2.
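The bookkeeping behind eqs. (17) and (18) can be sketched in a few lines. The sketch assumes the statistical precision of eq. (17) has the form ∆Mφ/Mφ = RMφ/√N, which reproduces the last row of Tab. 1 to within rounding, and it takes BRττ literally as the product of the two τ branching ratios (no additional combinatorial factor is assumed). The function names are hypothetical, and the cross section computed at the end is simply eq. (18) inverted, not a prediction.

```python
import math

BR_TAU_JET = 0.65      # eq. (19)
BR_TAU_LEPTON = 0.175  # eq. (20)

# Tab. 1 (tau tau -> jets, 60 fb^-1): M_A [GeV] -> (N_S, eps_exp, R_Mphi)
TABLE_1 = {
    200: (63, 2.5e-4, 0.176),
    500: (35, 2.4e-3, 0.171),
    800: (17, 3.6e-3, 0.187),
}


def n_events(lumi_fb, sigma_fb, br_phi_tautau, br_tautau, eps_exp):
    """Number of signal events, eq. (18); luminosity in fb^-1, cross section in fb."""
    return lumi_fb * sigma_fb * br_phi_tautau * br_tautau * eps_exp


def required_sigma_times_br(n_s, lumi_fb, br_tautau, eps_exp):
    """sigma x BR(phi -> tau tau) in fb that yields exactly n_s events (eq. (18) inverted)."""
    return n_s / (lumi_fb * br_tautau * eps_exp)


def mass_precision(r_mphi, n_signal):
    """Statistical precision Delta M / M for n_signal events (assumed form of eq. (17))."""
    return r_mphi / math.sqrt(n_signal)


for m_a, (n_s, eps, r) in TABLE_1.items():
    sigma_br = required_sigma_times_br(n_s, 60.0, BR_TAU_JET**2, eps)
    print(f"M_A = {m_a} GeV: sigma x BR >= {sigma_br / 1000.0:.2f} pb for 5 sigma, "
          f"Delta M / M = {100.0 * mass_precision(r, n_s):.1f} %")
```

Since the precision scales like 1/√N, observing four times as many signal events as required for discovery halves the statistical uncertainty on the mass.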
4.1 Discovery reach for heavy neutral MSSM Higgs bosons

The number of signal events, Nev, in the MSSM depends, besides the parameters MA and tanβ, which govern the MSSM Higgs sector at lowest order, in principle also on all other MSSM parameters. In the following we analyze how stable the results for the 5σ discovery contours in the MA–tanβ plane are with respect to variations of the other MSSM parameters. We take into account both effects from higher-order corrections, as discussed in Sect. 2.2, and from decays of the heavy Higgs bosons into supersymmetric particles. As starting point of our analysis we use the m_h^max and no-mixing benchmark scenarios, where we investigate in detail the sensitivity of the discovery contours with respect to variations of the parameter µ. We then discuss the possible impact of varying other MSSM parameters.

We have evaluated Nev in the two benchmark scenarios as a function of MA and tanβ. For fixed MA we have varied tanβ such that Nev = NS (as given in Tabs. 1–4). This tanβ value is then identified as the point on the 5σ discovery contour corresponding to the chosen value of MA. In this way we have determined the 5σ discovery contours for the m_h^max and the no-mixing scenarios for µ = ±200, ±1000 GeV.

In Figs. 1–3 we show the 5σ discovery contours obtained from the process bb̄φ, φ → τ+τ− for the final states τ+τ− → jets, τ+τ− → e + jet and τ+τ− → µ + jet. As can be seen from Tab. 4, the fourth channel discussed above, τ+τ− → e + µ, contributes for 30 fb⁻¹ only in the region of relatively small MA values and has a lower sensitivity than the other three channels. We therefore omit this channel in the following discussion.

The discovery contours in Figs. 1–3 are given for the m_h^max and no-mixing benchmark scenarios with µ = ±200, ±1000 GeV. As explained above, the 5σ discovery contours are affected by a change in µ in two ways.
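The contour-finding step described above (fix MA, vary tanβ until Nev = NS) amounts to a one-dimensional root search. The sketch below uses bisection on a toy event rate that follows the approximate scaling tan²β/((1 + ∆b)² + 9) with ∆b taken linear in µ tanβ. The normalization `norm`, the coefficient `kappa` and the function names are hypothetical placeholders; in the actual analysis Nev comes from eq. (18) with FeynHiggs input, and the toy rate ignores decays into supersymmetric particles.

```python
def toy_n_events(tan_beta, mu, norm=1.5, kappa=1.5e-5):
    """Toy event count; norm and kappa are illustrative, not fitted to the analysis."""
    delta_b = kappa * mu * tan_beta
    return norm * tan_beta**2 / ((1.0 + delta_b)**2 + 9.0)


def contour_tan_beta(n_required, mu, lo=1.0, hi=60.0, tol=1e-6):
    """Bisection for the tan(beta) at which toy_n_events equals n_required.

    Returns None when the target is not bracketed by [lo, hi]. The toy rate
    increases monotonically with tan(beta) as long as delta_b > -10.
    """
    def excess(t):
        return toy_n_events(t, mu) - n_required

    if excess(lo) > 0.0 or excess(hi) < 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N_S = 63  # required signal events at M_A = 200 GeV in the jets channel (Tab. 1)
for mu in (-1000.0, -200.0, 200.0, 1000.0):
    print(f"mu = {mu:+7.0f} GeV: tan(beta) on the toy contour = {contour_tan_beta(N_S, mu):.2f}")
```

In this toy setup a positive µ (positive ∆b) suppresses the rate, so a larger tanβ is needed to reach NS and the discovery region shrinks, while a negative µ enlarges it.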
Higher-order contributions, in particular the ones associated with ∆b, modify the Higgs-boson production cross sections and decay branching ratios. Furthermore the mass eigenvalues of the charginos and neutralinos vary with µ, possibly opening up the decay channels of the Higgs bosons to supersymmetric particles, which reduces the branching ratio to τ leptons.

[Figure 1, plots not reproduced. Horizontal axis: MA, 100–800 GeV/c²; curves for µ = −1000, −200, 200, 1000 GeV/c²; CMS, 60 fb⁻¹, pp → bb̄φ, φ → ττ → j+j. Left panel: m_h^max scenario, MSUSY = 1 TeV/c², M2 = 200 GeV/c², mgluino = 0.8 MSUSY, stop mixing Xt = 2 MSUSY. Right panel: no-mixing scenario, MSUSY = 2 TeV/c², M2 = 200 GeV/c², mgluino = 0.8 MSUSY, stop mixing Xt = 0.]

Figure 1: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− → jets in the m_h^max (left) and no-mixing (right) benchmark scenarios for different values of µ.

[Figure 2, plots not reproduced. Axes, curves and scenario parameters as in Fig. 1; CMS, 30 fb⁻¹, pp → bb̄φ, φ → ττ → e+j.]

Figure 2: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− → e + jet in the m_h^max (left) and no-mixing (right) benchmark scenarios for different values of µ.

[Figure 3, plots not reproduced. Axes, curves and scenario parameters as in Fig. 1; CMS, 30 fb⁻¹, pp → bb̄φ, φ → ττ → µ+j.]

Figure 3: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− → µ + jet in the m_h^max (left) and no-mixing (right) benchmark scenarios for different values of µ.

The results for the 5σ discovery contours for the final state τ+τ− → jets are shown in Fig. 1 for the m_h^max (left) and the no-mixing (right) scenario. As expected from the discussion of the ∆b corrections in Sect. 2.2, the variation of the 5σ discovery contours with µ is more pronounced in the m_h^max scenario, where a shift up to ∆tanβ = 12 can be observed for MA = 800 GeV. For low MA values (corresponding also to lower tanβ values on the discovery contours) the variation stays below ∆tanβ = 3. In the no-mixing scenario the variation does not exceed ∆tanβ = 5.

The τ+τ− → jets channel has also been discussed in Ref. [14]. Our results, which are based on the latest CMS studies using full simulation [57], are qualitatively in good agreement with Ref. [14], in which the earlier CMS studies of Refs. [23, 24] had been used. The 5σ discovery regions are largest for µ = −1000 GeV and pushed to highest tanβ values for µ = +200 GeV. In the low MA region our discovery contours are very similar to those obtained in Ref. [14]. In the high MA region, MA ∼ 800 GeV, corresponding to larger values of tanβ on the discovery contours, our improved evaluation of the 5σ discovery contours gives rise to a shift towards higher tanβ values compared to Ref. [14] of about ∆tanβ = 8 (mostly due to the up-to-date experimental input). Accordingly, we find a smaller discovery region compared to Ref. [14] and therefore an enlarged “LHC wedge” region where only the light CP-even MSSM Higgs boson can be detected at the 5σ level.

The results for the channel τ+τ− → e + jet are shown in Fig. 2. Again the m_h^max scenario shows a stronger variation than the no-mixing scenario.
The resulting shift in tanβ reaches up to ∆tanβ = 8 for MA = 500 GeV in the m_h^max scenario, but stays below ∆tanβ = 4 for the no-mixing scenario. Finally in Fig. 3 the results for the channel τ+τ− → µ + jet are depicted. The level of variation of the 5σ discovery contours is the same as for the e + jet final state.⁷

⁷ Since the results of the experimental simulation for this channel are available only for two MA values, the interpolation is a straight line. This may result in a slightly larger uncertainty of the results shown in Fig. 3 compared to the other figures.

[Figure 4, plots not reproduced. Horizontal axis: MA, 100–800 GeV/c²; curves for µ = −1000, −200, 200, 1000 GeV/c²; CMS, 60 fb⁻¹, pp → bb̄φ, φ → ττ → j+j. Left panel: m_h^max, BR(φ → χχ) = 0, MSUSY = 1 TeV/c², M2 = 200 GeV/c², mgluino = 0.8 MSUSY, stop mixing Xt = 2 MSUSY. Right panel: no mixing, BR(φ → χχ) = 0, MSUSY = 2 TeV/c², M2 = 200 GeV/c², mgluino = 0.8 MSUSY, stop mixing Xt = 0.]

Figure 4: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− → jets in the m_h^max (left) and no-mixing (right) benchmark scenarios for different values of µ in the case where no decays of the heavy Higgs bosons into supersymmetric particles are taken into account (see text).

In order to gain a better understanding of how sensitively the discovery contours in the MA–tanβ plane depend on the chosen SUSY scenario, it is useful to separately investigate the different effects caused by varying the parameter µ. For simplicity, we restrict the following discussion to the bb̄φ, φ → τ+τ− → jets channel. In Fig. 4 we show the same results as in Fig. 1, but for the case where no decays of the heavy Higgs bosons into supersymmetric particles are taken into account. As a consequence, the variation of the 5σ discovery contours with µ shown in Fig. 4 is purely an effect of higher-order corrections, predominantly those entering via ∆b. The difference between Fig. 1 and Fig. 4, on the other hand, is purely an effect of the change in BR(φ → τ+τ−) caused by the variation of the partial Higgs-boson decay widths into supersymmetric particles arising from a shift in the masses of the charginos and neutralinos.

In Fig. 4 the dependence of the 5σ discovery contours on µ differs significantly from the case of Fig. 1. While in Fig. 1 the inclusion of decays into supersymmetric particles leads to the smallest discovery region being found for small µ values, µ = +200 GeV (with the exception of the region of very small MA), in Fig. 4 the 5σ discovery contours are ordered monotonically in µ: the largest (smallest) 5σ discovery regions are obtained for µ = −(+)1000 GeV, i.e. for the largest (smallest) values of the bottom Yukawa coupling. As expected, the effect of the higher-order corrections is largest in the high-tanβ region (corresponding to large values of MA on the discovery contours). In this region the variation of µ shifts the discovery contours by up to ∆tanβ = 11 for the case of the m_h^max scenario (left plot of Fig. 4), i.e. the effect is about the same as for the case where decays into supersymmetric particles are included. For lower values of tanβ (corresponding to smaller values of MA on the discovery contours), on the other hand, the modification of the Higgs branching ratio as a consequence of decays into supersymmetric particles yields the dominant effect on the 5σ discovery contours. Accordingly, the observed variation with µ in this region is significantly smaller in Fig. 4 as compared to the full result of Fig. 1.
The reduced sensitivity of the discovery contours to µ can also clearly be seen for the case of the no-mixing scenario (right plot), where as discussed above the ∆b correction is smaller than in the m_h^max scenario.

[Figure 5, plots not reproduced. Horizontal axis: MA, 100–800 GeV/c²; curves for mgluino = 200, 500, 1000, 2000 GeV/c²; CMS, 60 fb⁻¹, pp → bb̄φ, φ → ττ → j+j; µ = 1000 GeV/c². Left panel: m_h^max scenario, MSUSY = 1 TeV/c², M2 = 200 GeV/c², stop mixing Xt = 2 MSUSY. Right panel: no-mixing scenario, MSUSY = 2 TeV/c², M2 = 200 GeV/c², stop mixing Xt = 0.]

Figure 5: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− → jets in the m_h^max (left) and no-mixing (right) benchmark scenarios with µ = +1000 GeV for different values of mg̃.

A parameter affecting the ∆b corrections, see eq. (10), but not the kinematics of the Higgs-boson decays is the gluino mass, mg̃. We now investigate the impact of varying this parameter, which is normally fixed to the values mg̃ = 800, 1600 GeV in the m_h^max and no-mixing benchmark scenarios, respectively. The results for four different values of the gluino mass, mg̃ = 200, 500, 1000, 2000 GeV, are shown in Fig. 5. The µ parameter has been set to µ = +1000 GeV in Fig. 5, such that the Higgs decay channels into charginos and neutralinos are suppressed. As one can see from eq. (10), the change of mg̃ affects the O(αs) part of ∆b; over the range considered, increasing mg̃ corresponds to a monotonic increase of ∆b. As an example, this yields for µ = 1000 GeV, tanβ = 50 in the two scenarios:

m_h^max, mg̃ = 200 GeV : ∆b = 0.50
m_h^max, mg̃ = 2000 GeV : ∆b = 0.94
no-mixing, mg̃ = 200 GeV : ∆b = 0.06
no-mixing, mg̃ = 2000 GeV : ∆b = 0.29 . (21)

In the no-mixing scenario the At value is close to zero, suppressing the mg̃-independent contribution to ∆b, while the higher SUSY mass scale results in an overall reduction of ∆b in this scenario. The value of ∆b in the no-mixing scenario would slightly increase if mg̃ were raised to even larger values, but this effect would not change the qualitative behaviour.

Fig. 5 shows that the results for the discovery reach in the MA–tanβ plane are relatively stable with respect to variations of the gluino mass. The shift in the discovery contours remains below about ∆tanβ = 4 for the m_h^max scenario (left plot) and ∆tanβ = 1 for the no-mixing scenario (right plot). For the positive sign of µ chosen in Fig. 5, where the ∆b correction yields a suppression of the bottom Yukawa coupling, the largest discovery reach is obtained for small mg̃, while the smallest discovery reach is obtained for large mg̃. This behaviour would be reversed by a change of sign of µ.

We have also investigated the possible impact of other MSSM parameters (besides µ and mg̃) on the 5σ discovery contours in the MA–tanβ plane. The ∆b corrections depend also on the parameters in the stop and sbottom sector, see eq. (10). While the formulas in Sect. 2.2.2 have been given for the region where MSUSY ≫ mt, the qualitative effect of reducing the stop and sbottom masses can nevertheless be inferred. Sizable ∆b corrections require relatively large values of µ and mg̃. If these parameters are kept large while the stop and sbottom masses are reduced, the ∆b corrections tend to decrease. It is obvious from eq. (10) that reducing the absolute value of At decreases the electroweak part of the ∆b correction. As discussed above, this effect of the ∆b corrections manifests itself in the comparison of the m_h^max and no-mixing scenarios, see Figs. 1–5.
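The gluino-mass dependence quoted in eq. (21) can be checked roughly with a few lines of code. The sketch assumes the standard one-loop form of ∆b, with prefactors 2αs/(3π) and αt/(4π), and additionally approximates both squark pairs as degenerate at MSUSY, for which the loop function reduces to I(m, m, c) = (1 − z + z ln z)/((1 − z)² m²) with z = c²/m². The coupling values αs = 0.108 and αt = 0.073 are illustrative inputs chosen by hand, not values taken from the analysis, and the function names are hypothetical.

```python
import math

ALPHA_S = 0.108  # assumed strong coupling at the SUSY scale
ALPHA_T = 0.073  # assumed alpha_t = h_t^2 / (4 pi)


def loop_I_degenerate(m, c):
    """I(m, m, c): loop function for two equal masses m and a third mass c."""
    z = (c / m) ** 2
    eps = z - 1.0
    if abs(eps) < 1e-4:
        # Series around z = 1 avoids the 0/0 of the closed form.
        return (0.5 - eps / 6.0) / m**2
    return (1.0 - z + z * math.log(z)) / ((1.0 - z) ** 2 * m**2)


def delta_b_degenerate(m_susy, x_t, m_gluino, mu, tan_beta):
    """Delta_b with sbottoms and stops both taken degenerate at m_susy."""
    a_t = x_t + mu / tan_beta  # from X_t = A_t - mu * cot(beta)
    qcd = (2.0 * ALPHA_S / (3.0 * math.pi)) * m_gluino * mu * tan_beta \
        * loop_I_degenerate(m_susy, m_gluino)
    ew = (ALPHA_T / (4.0 * math.pi)) * a_t * mu * tan_beta \
        * loop_I_degenerate(m_susy, abs(mu))
    return qcd + ew


scenarios = {"mhmax": (1000.0, 2000.0), "no-mixing": (2000.0, 0.0)}  # (M_SUSY, X_t) in GeV
for name, (m_susy, x_t) in scenarios.items():
    for m_gluino in (200.0, 500.0, 1000.0, 2000.0):
        db = delta_b_degenerate(m_susy, x_t, m_gluino, mu=1000.0, tan_beta=50.0)
        print(f"{name:9s} m_gluino = {m_gluino:6.0f} GeV: Delta_b = {db:.2f}")
```

With these assumed couplings the sketch lands within about 0.01 of the four values listed in eq. (21), and ∆b rises monotonically with mg̃ between 200 and 2000 GeV in both scenarios. The agreement should be read as a consistency check of the formula's structure rather than as a reproduction of the full calculation.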
Concerning the possible impact of the ∆b corrections on the 5 σ discovery contours for the bb̄φ, φ → τ+τ− channel in the MA–tanβ plane we conclude that larger effects than those shown in Figs. 1–5 (where we have displayed the discovery contours up to tan β = 50) would only arise if the variation of µ were extended over an even wider interval than −1000 GeV ≤ µ ≤ +1000 GeV as done in our analysis above. We now turn to the possible effects of other higher-order corrections beyond those entering via ∆b on the 5 σ discovery contours for the bb̄φ, φ → τ+τ− channel. These effects are in general non-negligible, see the discussions in Sect. 2.2 and in Sect. 4.2 below, but smaller than those induced by ∆b. As a consequence, the impact on the 5 σ discovery contours in the MA–tan β plane of other supersymmetric parameters entering via higher-order corrections is in general much smaller than the effect of varying µ in the high-tanβ region of Fig. 4. As an example, the difference observed in Figs. 1–5 between the mmaxh and no-mixing scenarios arising from the different values of At and MSUSY in the two scenarios (see eqs. (14), (15)) is mainly an effect of the ∆b corrections, while the impact of other higher-order corrections involving At and MSUSY is found to be small. Also the decays of the heavy neutral MSSM Higgs bosons into supersymmetric particles are in general affected by other supersymmetric parameters in addition to the dependence on µ, MA and tan β. The resulting effects on BR(φ → τ+τ−) turn out to be rather small, however. We find that sizable deviations from the values of BR(φ → τ+τ−) occurring in the mmaxh and no-mixing scenarios for −1000 GeV ≤ µ ≤ +1000 GeV are only possible in quite extreme regions of the MSSM parameter space that are already highly constrained by existing experimental data. Our discussion above has been given in the context of the MSSM with real parameters. 
Since the sensitivity of the 5σ discovery contours in the MA–tanβ plane to the other supersymmetric parameters can mainly be understood as an effect of higher-order corrections to the bottom Yukawa coupling and of the kinematics of Higgs-boson decays into supersymmetric particles, no qualitative changes of our results are expected for the case where complex phases are taken into account.

4.2 Higgs-boson mass precision

The discussion in the previous section shows that the prospective discovery reach of the bb̄φ, φ → τ+τ− channel in the MA–tanβ plane is rather stable with respect to variations of the other MSSM parameters. We now turn to the second part of our analysis and investigate the expected statistical precision of the Higgs-boson mass measurement. The expected statistical precision is evaluated as described in Sect. 3, see eq. (17). In Figs. 6–7 we show the expected precision for the mass measurement achievable from the channel bb̄φ, φ → τ+τ− using the final states τ+τ− → jets and τ+τ− → e + jet. Within the 5σ discovery region we have indicated contour lines corresponding to different values of the expected precision, ∆M/M. The results are shown in the m_h^max benchmark scenario for µ = −200 GeV (left plots) and µ = +200 GeV (right plots).

We find that experimental precisions of ∆Mφ/Mφ of 1–4% are reachable within the discovery region. A better precision is reached for larger tanβ and smaller MA as a consequence of the higher number of signal events in this region. The other scenarios and other values of µ discussed above yield qualitatively similar results to those shown in Figs. 6, 7.

As discussed above, for large values of MA the heavy neutral MSSM Higgs bosons are nearly mass-degenerate, MH ≈ MA. The experimental separation of the two states H and A (or the corresponding mass eigenstates in the CP-violating case) will therefore be challenging. The results shown in Figs. 6–7 have been obtained using the combined sample of H and A events.
It is important to note, however, that even in the region of large MA the mass splitting between MH and MA can reach the level of a few %. An example of such a scenario is (as above, we consider the CP-conserving case, i.e. the MSSM with real parameters; the corresponding scenario in the case of non-vanishing complex phases has been discussed in Ref. [22])

  MSUSY = 500 GeV, At = Ab = 1000 GeV, µ = 1000 GeV, M2 = 500 GeV, M1 = 250 GeV, mg̃ = 500 GeV .        (22)

In Fig. 8 the mass splitting |MH − MA| / min(MH, MA) is given as a function of Xt for tanβ = 40 and two MA values, MA = 300 GeV (solid line) and MA = 500 GeV (dashed line). The dot-dashed and dotted parts of the contours for MA = 300, 500 GeV, respectively, in the region of small |Xt| indicate parameter combinations that result in relatively low Mh values that are excluded by the search for the light CP-even Higgs boson of the MSSM at LEP [3].

Figure 6: The statistical precision of the Higgs-boson mass measurement achievable from the channel bb̄φ, φ → τ+τ− → jets in the m_h^max benchmark scenario for µ = −200 GeV (left) and µ = +200 GeV (right) is shown together with the 5σ discovery contour.

Figure 7: The statistical precision of the Higgs-boson mass measurement achievable from the channel bb̄φ, φ → τ+τ− → e + jet in the m_h^max benchmark scenario for µ = −200 GeV (left) and µ = +200 GeV (right) is shown together with the 5σ discovery contour.

[Figure 8 plot: horizontal axis Xt [GeV] from −1500 to 1500; curves for MA = 300 GeV and MA = 500 GeV, each with a segment marked "LEP excl."]

Figure 8: The mass splitting between the heavy neutral MSSM Higgs bosons, ∆MHA/M ≡ |MH − MA| / min(MH, MA), is shown as a function of Xt for MA = 300, 500 GeV in a scenario with MSUSY = 500 GeV, µ = 1000 GeV and tanβ = 40. The other parameters are given in eq. (22). The dot-dashed (dotted) parts of the contours for MA = 300 GeV (MA = 500 GeV) indicate parameter combinations that are excluded by the search for the light CP-even Higgs boson of the MSSM at LEP [3].
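As a small illustration of the quantity plotted in Fig. 8, the relative mass splitting can be computed directly from its definition and compared with an assumed relative mass precision. The masses used in the example are hypothetical round numbers chosen only to exercise the formula, not values read off Fig. 8, and the function names are likewise illustrative.

```python
# Minimal sketch of the relative H/A mass splitting defined in the text:
#   Delta M_HA / M = |M_H - M_A| / min(M_H, M_A)

def relative_mass_splitting(m_h_heavy, m_a):
    """Relative splitting between the heavy CP-even (H) and CP-odd (A) masses."""
    return abs(m_h_heavy - m_a) / min(m_h_heavy, m_a)

def splitting_exceeds_precision(m_h_heavy, m_a, relative_precision):
    """True if the splitting is larger than the given relative mass precision.

    This is only a necessary-looking comparison; it ignores the Higgs widths
    and overlapping signals, which a real analysis would have to include.
    """
    return relative_mass_splitting(m_h_heavy, m_a) > relative_precision

if __name__ == "__main__":
    # HYPOTHETICAL masses in GeV, for illustration only.
    m_a, m_h_heavy = 500.0, 515.0
    print(relative_mass_splitting(m_h_heavy, m_a))
    print(splitting_exceeds_precision(m_h_heavy, m_a, relative_precision=0.02))
```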
One can see in Fig. 8 that the mass splitting between MH and MA shows a pronounced dependence on Xt in this scenario. Mass differences of up to 5% are possible for large Xt (while the widths of the Higgs bosons are at the 1–1.5% level in this parameter region).

The example of Fig. 8 shows that a precise mass measurement at the LHC may in favourable regions of the MSSM parameter space open the exciting possibility to distinguish between the signals of H and A production. In confronting Fig. 8 with the expected accuracies obtained in Figs. 6–7 one of course needs to take into account that a separate treatment of the H and A channels in Figs. 6–7 would reduce the number of signal events by a factor of 2, resulting in a degradation of the expected statistical accuracies (for the same luminosity) by a factor of √2. A more detailed analysis of the potential for experimentally resolving two mass peaks would furthermore have to include effects arising from overlapping Higgs signals. Such an analysis goes beyond the scope of the present paper.

5 Conclusions

We have analyzed the reach of the CMS experiment with 30 or 60 fb−1 for the heavy neutral MSSM Higgs bosons, depending on tanβ and the Higgs-boson mass scale, MA. We have focused on the channel bb̄H/A, H/A → τ+τ− with the τ's subsequently decaying to jets and/or leptons. The experimental analysis, yielding the number of events needed for a 5σ discovery (depending on the mass of the Higgs boson), was performed with full CMS detector simulation and reconstruction for the final states of di-τ-lepton decays. The events were generated with PYTHIA.

The experimental analysis has been combined with predictions for the Higgs-boson masses, production processes and decay channels obtained with the code FeynHiggs, taking into account all relevant higher-order corrections as well as possible decays of the heavy Higgs bosons into supersymmetric particles. We have analyzed the sensitivity of the 5σ discovery contours in the MA–tanβ plane to variations of the other supersymmetric parameters. We have shown that the discovery contours are relatively stable with respect to the impact of additional parameters. The biggest effects, resulting from higher-order corrections to the bottom Yukawa coupling and from the kinematics of Higgs decays into charginos and neutralinos, are caused by varying the absolute value and the sign of the higgsino mass parameter µ. The corresponding shift in the 5σ discovery contours amounts to up to about ∆ tanβ = 10. The effects of other contributions to the relation between the bottom-quark mass and the bottom Yukawa coupling, arising from the gluino mass and the parameters in the stop and sbottom sector, are in general smaller than the shifts induced by a variation of µ. The same holds for the impact of higher-order contributions beyond the corrections to the bottom Yukawa coupling and for the possible effects of other decay modes of the heavy Higgs bosons into supersymmetric particles. The results of our analysis, which was carried out in the framework of the CP-conserving MSSM, should not be substantially affected by the inclusion of complex phases of the soft-breaking parameters.

We have analyzed the prospective accuracy of the mass measurement of the heavy neutral MSSM Higgs bosons in the channel bb̄H/A, H/A → τ+τ−. We find that statistical experimental precisions of 1–4% are reachable within the discovery region. These results, obtained from a simple estimate of the prospective accuracies, are not expected to degrade considerably if further uncertainties related to background effects and jet and missing ET scales are taken into account. We have pointed out that a %-level precision of the mass measurements could, in favourable regions of the MSSM parameter space, make it possible to experimentally resolve the signals of the two heavy MSSM Higgs bosons.

Acknowledgements

S.H. and G.W. thank M.
Carena and C.E.M. Wagner for collaboration on some of the theoretical aspects employed in this analysis. References [1] H. Nilles, Phys. Rept. 110 (1984) 1; H. Haber and G. Kane, Phys. Rept. 117 (1985) 75; R. Barbieri, Riv. Nuovo Cim. 11 (1988) 1. [2] [LEP Higgs working group], Phys. Lett. B 565 (2003) 61, hep-ex/0306033. [3] [LEP Higgs working group], Eur. Phys. J. C 47 (2006) 547, hep-ex/0602042. [4] V. Abazov et al. [D0 Collaboration], Phys. Rev. Lett. 95 (2005) 151801, hep-ex/0504018; Phys. Rev. Lett. 97 (2006) 121802, hep-ex/0605009; D0 Note 5331-CONF. [5] A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 96 (2006) 011802, hep-ex/0508051; CDF note 8676. [6] A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 96 (2006) 042003, hep-ex/0510065; R. Eusebi, Ph.d. thesis: “Search for charged Higgs in tt̄ decay products from proton- antiproton collisions at s = 1.96TeV”, University of Rochester, 2005. [7] ATLAS Collaboration, Detector and Physics Performance Technical Design Report, CERN/LHCC/99-15 (1999), see: atlasinfo.cern.ch/Atlas/GROUPS/PHYSICS/TDR/access.html ; [8] K. Cranmer, Y. Fang, B. Mellado, S. Paganis, W. Quayle and S. Wu, hep-ph/0401148. [9] CMS Physics Technical Design Report, Volume 2. CERN/LHCC 2006-021, see: cmsdoc.cern.ch/cms/cpt/tdr/ . [10] V. Büscher and K. Jakobs, Int. J. Mod. Phys. A 20 (2005) 2523, hep-ph/0504099. [11] M. Schumacher, Czech. J. Phys. 54 (2004) A103; hep-ph/0410112. [12] M. Carena, S. Heinemeyer, C. Wagner and G. Weiglein, hep-ph/9912223. [13] M. Carena, S. Heinemeyer, C. Wagner and G. Weiglein, Eur. Phys. J. C 26 (2003) 601, hep-ph/0202167. [14] M. Carena, S. Heinemeyer, C. Wagner and G. Weiglein, Eur. Phys. J. C 45 (2006) 797, hep-ph/0511023. [15] S. Heinemeyer, W. Hollik and G. Weiglein, JHEP 0006 (2000) 009, hep-ph/9909540. [16] A. Belyaev, J. Pumplin, W. Tung and C. Yuan, JHEP 0601 (2006) 069, hep-ph/0508222. 
[17] M. Albrow and A. Rostovtsev, hep-ph/0009336; V. Khoze, A. Martin and M. Ryskin, Eur. Phys. J. C 23 (2002) 311, hep-ph/0111078; A. De Roeck, V. Khoze, A. Martin, R. Orava and M. Ryskin, Eur. Phys. J. C 25 (2002) 391, hep-ph/0207042; B. Cox, AIP Conf. Proc. 753 (2005) 103, hep-ph/0409144; J. Forshaw, hep-ph/0508274. [18] S. Heinemeyer, V. Khoze, M. Ryskin, W. Stirling, M. Tasevsky and G. Weiglein, in preparation. [19] S. Heinemeyer, W. Hollik and G. Weiglein, Comput. Phys. Commun. 124 (2000) 76, hep-ph/9812320; hep-ph/0002213; see: www.feynhiggs.de . [20] S. Heinemeyer, W. Hollik and G. Weiglein, Eur. Phys. J. C 9 (1999) 343, hep-ph/9812472. [21] G. Degrassi, S. Heinemeyer, W. Hollik, P. Slavich and G. Weiglein, Eur. Phys. J. C 28 (2003) 133, hep-ph/0212020. [22] M. Frank, T. Hahn, S. Heinemeyer, W. Hollik, H. Rzehak and G. Weiglein, JHEP 02 (2007) 047, hep-ph/0611326. [23] S. Abdullin et al., Eur. Phys. J. C 39S2 (2005) 41. [24] R. Kinnunen and A. Nikitenko, CMS note 2003/006. [25] J. Thomas, ATL-PHYS-2003-003; D. Cavalli and D. Negri, ATL-PHYS-2003-009. [26] M. Carena, P. Chankowski, S. Pokorski and C. Wagner, Phys. Lett. B 441 (1998) 205, hep-ph/9805349. [27] J. Espinosa and I. Navarro, Nucl. Phys. B 615 (2001) 82, hep-ph/0104047. [28] S. Heinemeyer, W. Hollik and G. Weiglein, Phys. Rev. D 58 (1998) 091701, hep-ph/9803277; Phys. Lett. B 440 (1998) 296, hep-ph/9807423. [29] G. Degrassi, A. Dedes and P. Slavich, Nucl. Phys. B 672 (2003) 144, hep-ph/0305127. [30] M.
Carena, H. Haber, S. Heinemeyer, W. Hollik, C. Wagner, and G. Weiglein, Nucl. Phys. B 580 (2000) 29, hep-ph/0001002. [31] J. Ellis, G. Ridolfi and F. Zwirner, Phys. Lett. B 257 (1991) 83; Y. Okada, M. Yamaguchi and T. Yanagida, Prog. Theor. Phys. 85 (1991) 1; H. Haber and R. Hempfling, Phys. Rev. Lett. 66 (1991) 1815. [32] A. Brignole, Phys. Lett. B 281 (1992) 284. [33] P. Chankowski, S. Pokorski and J. Rosiek, Phys. Lett. B 286 (1992) 307; Nucl. Phys. B 423 (1994) 437, hep-ph/9303309. [34] A. Dabelstein, Nucl. Phys. B 456 (1995) 25, hep-ph/9503443; Z. Phys. C 67 (1995) 495, hep-ph/9409375. [35] R. Hempfling and A. Hoang, Phys. Lett. B 331 (1994) 99, hep-ph/9401219; J. Casas, J. Espinosa, M. Quirós and A. Riotto, Nucl. Phys. B 436 (1995) 3, E: ibid. B 439 (1995) 466, hep-ph/9407389; M. Carena, J. Espinosa, M. Quirós and C. Wagner, Phys. Lett. B 355 (1995) 209, hep-ph/9504316; M. Carena, M. Quirós and C. Wagner, Nucl. Phys. B 461 (1996) 407, hep-ph/9508343; H. Haber, R. Hempfling and A. Hoang, Z. Phys. C 75 (1997) 539, hep-ph/9609331; R. Zhang, Phys. Lett. B 447 (1999) 89, hep-ph/9808299; J. Espinosa and R. Zhang, JHEP 0003 (2000) 026, hep-ph/9912236; G. Degrassi, P. Slavich and F. Zwirner, Nucl. Phys. B 611 (2001) 403, hep-ph/0105096; J. Espinosa and R. Zhang, Nucl. Phys. B 586 (2000) 3, hep-ph/0003246; A. Brignole, G. Degrassi, P. Slavich and F. Zwirner, Nucl.
Phys. B 631 (2002) 195, hep-ph/0112177; A. Brignole, G. Degrassi, P. Slavich and F. Zwirner, Nucl. Phys. B 643 (2002) 79, hep-ph/0206101; S. Heinemeyer, W. Hollik, H. Rzehak and G. Weiglein, Eur. Phys. J. C 39 (2005) 465, hep-ph/0411114; hep-ph/0506254. [36] R. Hempfling, Phys. Rev. D 49 (1994) 6168; L. Hall, R. Rattazzi and U. Sarid, Phys. Rev. D 50 (1994) 7048, hep-ph/9306309; M. Carena, M. Olechowski, S. Pokorski and C. Wagner, Nucl. Phys. B 426 (1994) 269, hep-ph/9402253. [37] M. Carena, D. Garcia, U. Nierste and C. Wagner, Nucl. Phys. B 577 (2000) 577, hep-ph/9912516. [38] H. Eberl, K. Hidaka, S. Kraml, W. Majerotto and Y. Yamada, Phys. Rev. D 62 (2000) 055006, hep-ph/9912463. [39] S. Martin, Phys. Rev. D 65 (2002) 116003, hep-ph/0111209; Phys. Rev. D 66 (2002) 096001, hep-ph/0206136; Phys. Rev. D 67 (2003) 095012, hep-ph/0211366; Phys. Rev. D 68 (2003) 075002, hep-ph/0307101; Phys. Rev.D 70 (2004) 016005, hep-ph/0312092; Phys. Rev. D 71 (2005) 016012, hep-ph/0405022; Phys. Rev. D 71 (2005) 116004, hep-ph/0502168; S. Martin and D. Robertson, Comput. Phys. Commun. 174 (2006) 133, hep-ph/0501132. [40] S. Martin, hep-ph/0701051. [41] S. Heinemeyer, W. Hollik and G. Weiglein, Phys. Rept. 425 (2006) 265, hep-ph/0412214. [42] J. Guasch, P. Häfliger and M. Spira, Phys. Rev. D 68 (2003) 115001, hep-ph/0305101. [43] S. Gorishny, A. Kataev, S. Larin and L. Surguladze, Mod. Phys. Lett. A 5 (1990) 2703; Phys. Rev. D 43 (1991) 1633; A. Kataev and V. Kim, Mod. Phys. Lett. 
A 9 (1994) 1309; L. Surguladze, Phys. Lett. B 338 (1994) 229, hep-ph/9406294; Phys. Lett. B 341 (1994) 60, hep-ph/9405325; K. Chetyrkin, Phys. Lett. B 390 (1997) 309, hep-ph/9608318; K. Chetyrkin and A. Kwiatkowski, Nucl. Phys. B 461 (1996) 3, hep-ph/9505358; S. Larin, T. van Ritbergen and J. Vermaseren, Phys. Lett. B 362 (1995) 134, hep-ph/9506465; P. Chankowski, S. Pokorski and J. Rosiek, Nucl. Phys. B 423 (1994) 497; S. Heinemeyer, W. Hollik and G. Weiglein, Eur. Phys. J. C 16 (2000) 139, hep-ph/0003022. [44] R. Harlander and W. Kilgore, Phys. Rev. D 68 (2003) 013001, hep-ph/0304035. [45] S. Dittmaier, M. Kramer and M. Spira, Phys. Rev. D 70 (2004) 074010, hep-ph/0309204. [46] S. Dawson, C. Jackson, L. Reina and D. Wackeroth, Phys. Rev. D 69 (2004) 074027, hep-ph/0311067. [47] K. Assamagan et al. [Les Houches 2003 Higgs Working Group], hep-ph/0406152. [48] S. Dawson, C. Jackson, L. Reina and D.
Wackeroth, Phys. Rev. Lett. 94 (2005) 031802, hep-ph/0408077. [49] S. Dawson, C. Jackson, L. Reina and D. Wackeroth, Mod. Phys. Lett. A 21 (2006) 89, hep-ph/0508293. [50] J. Campbell, R. Ellis, F. Maltoni and S. Willenbrock, Phys. Rev. D 67 (2003) 095002, hep-ph/0204093. [51] A. Martin, R. Roberts, W. Stirling and R. Thorne, Eur. Phys. J. C 28 (2003) 455, hep-ph/0211080. [52] T. Hahn, S. Heinemeyer and G. Weiglein, Nucl. Phys. B 652 (2003) 229, hep-ph/0211204. [53] See: people.web.psi.ch/spira/hqq . [54] E. Brubaker et al. [Tevatron Electroweak Working Group], hep-ex/0608032, see: tevewwg.fnal.gov/top/ . [55] [Tevatron Electroweak Working Group], hep-ex/0703034. [56] G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 35 (2004) 1, hep-ex/0401026. [57] S. Gennai, A. Nikitenko and L. Wendland, CMS Note 2006/126. [58] R. Kinnunen and S. Lehti, CMS Note 2006/075. [59] A. Kalinowski, M. Konecki and D. Kotlinski, CMS Note 2006/105. [60] S. Lehti, CMS Note 2006/101. [61] T. Sjostrand et al., Comput. Phys. Commun. 135 (2001) 238; hep-ph/0010017. [62] J. Campbell, A. Kalinowski and A. Nikitenko, “Comparison between MCFM and Pythia for the gb → bh and gg → bb̄h processes at the LHC” in C. Buttar et al., Les Houches Physics at TeV Colliders 2005, “Standard Model and Higgs working group: Summary report”, hep-ph/0604120. [63] E.
Boos et al. [CompHEP Collaboration], Nucl. Instrum. Meth. A 534 (2004) 250, hep-ph/0403113.

Introduction; Phenomenology of the MSSM Higgs sector; Notation; Higher-order corrections in the Higgs sector; Higgs-boson propagator corrections; Corrections to the relation between the bottom-quark mass and the bottom Yukawa coupling; Corrections to the Higgs production cross sections; The m_h^max and no-mixing benchmark scenarios; Experimental analysis; Results; Discovery reach for heavy neutral MSSM Higgs bosons; Higgs-boson mass precision; Conclusions

ABSTRACT The search for MSSM Higgs bosons will be an important goal at the LHC. We analyze the search reach of the CMS experiment for the heavy neutral MSSM Higgs bosons with an integrated luminosity of 30 or 60 fb−1. This is done by combining the latest results for the CMS experimental sensitivities based on full simulation studies with state-of-the-art theoretical predictions of MSSM Higgs-boson properties. The results are interpreted in MSSM benchmark scenarios in terms of the parameters tanβ and the Higgs-boson mass scale, MA. We study the dependence of the 5σ discovery contours in the MA–tanβ plane on variations of the other supersymmetric parameters. The largest effects arise from a change in the higgsino mass parameter µ, which enters both via higher-order radiative corrections and via the kinematics of Higgs decays into supersymmetric particles. While the variation of µ can shift the prospective discovery reach (and correspondingly the “LHC wedge” region) by about ∆ tanβ = 10, we find that the discovery reach is rather stable with respect to the impact of other supersymmetric parameters. Within the discovery region we analyze the accuracy with which the masses of the heavy neutral Higgs bosons can be determined.
We find that an accuracy of 1-4% should be achievable, which could make it possible in favourable regions of the MSSM parameter space to experimentally resolve the signals of the two heavy MSSM Higgs bosons at the LHC. <|endoftext|><|startoftext|> Introduction White Dwarf (WD) mass distributions have been determined us- ing a variety of different methods. Discrepancies exist between the different determinations in particular between the photo- metric and spectroscopic WD masses. Boudreault & Bergeron (2005) compared the masses derived by fitting the observed Balmer lines with masses derived from trigonometric parallaxes and photometry. They found differences of ∼ 50 per cent for cool (6 500–14 000 K) DA white dwarfs. Spectroscopic masses are believed to be more accurate, especially for WDs in the temper- ature range between 15 000 and 40 000 K (Liebert et al. 2005). Atmospheric models are less well established for stars outside this range. For hotter WDs the atmospheric structure is modi- fied by an (often unknown) amount of metals and by non-LTE effects. For cooler WDs the convection has to be considered and the models are sensitive to the mixing length and the amount of helium convected to the surface (Boudreault & Bergeron 2005). Central stars of planetary nebulae (CSPN) provide a way to test the mass distributions. CSPNe evolve directly into WDs, with only very minor mass changes, allowing one to measure masses of currently forming white dwarfs. However, CSPN mass distributions have also been uncertain. For example, Napiwotzki (2006) shows that the very high CSPN masses (close to the Chandrasekhar limit) derived spectroscopically with state-of- the-art model atmospheres by Pauldrach et al. (2004) are physi- cally implausible and masses close to the peak of the CSPN/WD mass distribution are more likely. CSPN masses are normally obtained from the luminosi- ties. 
But more accurate masses can be derived using the age–temperature diagram, obtainable from the surrounding planetary nebula (PN). Gesicki et al. (2006) applied this to a sample of 101 PNe. In this Letter we discuss the resulting mass distributions for hydrogen-rich and hydrogen-poor CSPNe and compare with published WD masses.

2. Methods and results

2.1. Models

The method requires the age of the nebula and the temperature of the central star to be determined. Together these provide the heating time scale for the star.

We derive the age of the PNe using a combination of line ratios, diameters (taken from the literature), and new high resolution spectra (Gesicki et al. 2006). The diameters and line ratios are used to fit a spherically symmetric photo-ionization model. The model assumes a density distribution and finds a stellar black-body temperature. For each ion, the model finds a radial emissivity distribution. The observed line profiles for each ion represent the convolution of the thermal broadening and the expansion velocity at each radius. Thus, the line profiles for different ions are used to fit a velocity field. An iterative procedure is used to improve the ionization model. The emissivity distributions of different ions overlap, and this gives a strong constraint on the shape of the wings of the line profiles. A genetic algorithm, PIKAIA, is used to arrive at the optimum solution for ionization model and velocity field. A turbulent component is added if needed: turbulence is indicated by a Gaussian shape of the line profiles. The expansion velocities are found to increase with radius, due to the overpressure of the ionized region.

From the velocity field v(r), we derive the mass-weighted average over the nebula, vav. This parameter has been shown to be robust against the simplifications. Different models which provide comparable quality fits give the same vav to within 2 km s−1 (Gesicki et al. 2006).
Applying this to a radius of 0.8 times the outer radius (equivalent to the mass-averaged radius) allows us to define a kinematic age t of the nebula. A linear acceleration is assumed to have occurred from the AGB expansion velocity (10–15 km s−1) to the PN velocity vav (20–25 km s−1).

The derived nebular age and stellar temperature are compared to the H-burning tracks of Blöcker (1995), which provide the largest and most uniform collection available. We interpolate between different tracks to find, for each (t, Teff), the CSPN luminosity and mass.

K. Gesicki and A.A. Zijlstra: White dwarf masses from planetary nebulae

Fig. 1. Comparison of the 101 modelled PNe with the evolutionary tracks in the HR diagram. The model black-body temperatures are plotted against the luminosities interpolated from tracks. Filled circles indicate [WR] stars, open circles are wels and pluses indicate non-emission-line stars. The dotted lines show H-burning evolutionary models of Blöcker (1995), labeled by mass in units of M⊙. The solid lines are isochrones, labeled by the time after the nebula ejection, in units of 10^3 yr.

2.2. Different CSPN types

The CSPNe fall into two broad categories: the hydrogen-rich O-type stars and the emission-line central stars which are generally hydrogen-deficient. The second group consists of [WR]-type stars with strong emission lines and wels (weak emission line stars). The [WR] are subdivided into hot [WO] and cool [WC]. [WR] stars are in most cases hydrogen-free (three possible exceptions are mentioned by Werner & Herwig 2006). The wels may contain some hydrogen. Gesicki et al. (2006) show that one group of wels is located in the temperature gap between [WC] and [WO] stars. The other wels stars form a non-uniform group, including higher-mass objects where the high luminosity drives a wind but the star is not necessarily hydrogen-poor.
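Returning briefly to the kinematic age of Sect. 2.1, the definition can be sketched numerically as follows. The sketch assumes that "linear acceleration" means a velocity rising linearly in time from the AGB value to vav, so that the mean expansion velocity is the arithmetic mean of the two; this reading, the unit-conversion constants, the function name and the example radius are assumptions made for illustration, not taken from the text.

```python
# Minimal sketch of a kinematic nebular age.
# ASSUMPTION: the velocity rises linearly in time from v_AGB to v_av, so the
# distance 0.8 * R_out is covered at the mean velocity (v_AGB + v_av) / 2.

KM_PER_PC = 3.0857e13      # kilometres per parsec
SEC_PER_YR = 3.15576e7     # seconds per Julian year

def kinematic_age_yr(outer_radius_pc, v_av_kms, v_agb_kms=12.5,
                     radius_fraction=0.8):
    """Kinematic age in years for a nebula of given outer radius (pc)."""
    mean_velocity_kms = 0.5 * (v_agb_kms + v_av_kms)
    distance_km = radius_fraction * outer_radius_pc * KM_PER_PC
    return distance_km / mean_velocity_kms / SEC_PER_YR

if __name__ == "__main__":
    # HYPOTHETICAL nebula: R_out = 0.1 pc, v_av = 25 km/s, v_AGB = 10 km/s.
    print(round(kinematic_age_yr(0.1, 25.0, v_agb_kms=10.0)))
```

For the hypothetical nebula in the example the age comes out at a few thousand years, which is the range spanned by the isochrones in Fig. 1.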
The hydrogen-rich stars are believed to be related to the DA white dwarfs, while the [WR] may evolve into DBs.

2.3. The HR diagram

The full analyzed sample contains 101 PNe, of which about 60 are in the direction of the Galactic Bulge and the remainder are in the Galactic disk. Foreground confusion among the Bulge PNe is estimated at 20%. The sample contains 23 [WR]-type, 21 wels and 57 non-emission-line central stars (footnote 1). The CSPN classification was adopted from the literature. The last group also contains objects without any information about their spectrum.

In Fig. 1 we show the photoionization temperatures and interpolated luminosities, plotted on the HR diagram. The H-burning tracks of Blöcker (1995) are also shown: the luminosities and masses of CSPNe fall into a rather restricted range of values. Isochrones of 1, 2, 4, and 8 × 10^3 yr are also shown.

A previous HR diagram of CSPNe presented by Stanghellini et al. (2002) shows a much broader range of luminosities and, in consequence, masses. They use Zanstra temperatures and luminosities. The Zanstra method of locating a CSPN in the HR diagram was criticized by Schönberner & Tylenda (1990).

Footnote 1: The data file is available from web page www.astri.uni.torun.pl/∼gesicki/modelled pne.dat

Table 1. Comparison between our dynamical masses and spectroscopic masses from Kudritzki et al. (2006). Observed mass-loss rates from the same paper are also listed and compared to values from the model tracks of Blöcker (1995). He 2-108 is classified as wels, the other three are non-emission-line stars.

  Object      M [M⊙]          Teff [10^3 K]    log Ṁ [M⊙ yr−1]
              dyn.    spec.   dyn.    spec.    spec.    evol. tracks
  Tc 1        0.59    0.81    32      34       −7.46    −7.91
  He 2-108    0.57    0.63    32      34       −6.85    −8.16
  IC 418      0.61    0.92    37      36       −7.43    −7.82
  NGC 3242    0.61    0.63    79      75       −8.08    −7.86

Observationally, the accuracy of the luminosity determinations is about a factor of 2. On the Schönberner tracks, a CSPN mass change from, e.g., 0.57 to 0.7 M⊙ corresponds to a factor of 3 in luminosity.
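The size of the resulting mass uncertainty can be estimated with a short calculation. The sketch below assumes that the logarithm of the observable (here the luminosity) varies linearly with mass between the two quoted track masses; this interpolation law and the function name are assumptions for illustration, not a statement of how the tracks actually behave.

```python
import math

# Minimal sketch: mass uncertainty implied by a multiplicative uncertainty
# on an observable that changes by a known factor across a mass interval.
# ASSUMPTION: log(observable) is linear in mass between the two track masses.

def mass_uncertainty(mass_lo, mass_hi, observable_ratio, uncertainty_factor):
    """Mass error (in solar masses) for a given multiplicative uncertainty.

    observable_ratio   -- factor by which the observable changes from
                          mass_lo to mass_hi (e.g. 3 for the luminosity)
    uncertainty_factor -- multiplicative uncertainty on the observable
                          (e.g. 2 for 'a factor of 2')
    """
    slope = (mass_hi - mass_lo) / math.log(observable_ratio)
    return slope * math.log(uncertainty_factor)

if __name__ == "__main__":
    # Luminosity case from the text: factor 3 between 0.57 and 0.7 solar
    # masses, determined to about a factor of 2.
    print(round(mass_uncertainty(0.57, 0.70, 3.0, 2.0), 3))
```

With the numbers quoted above this gives roughly 0.08 M⊙, i.e. of order 0.1 M⊙. The same function can be applied to any other observable whose variation across the mass interval is known.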
The masses determined directly from luminosity are thus accurate to only 0.1 M⊙. This is less than the typical dispersion of masses. In contrast, for the same mass range, the dynamical time scales differ by a factor of 60. Even for a factor of 2 uncertainty in the nebular age, the mass changes by only 0.02 M⊙. Therefore, the dynamical method improves the accuracy.

Schönberner & Tylenda (1990) also developed a method to improve the CSPN mass determination. This method (Tylenda et al. 1991) results in masses similar to ours.

Table 1 compares, for four objects in common, our dynamical masses with the spectroscopic masses derived by Kudritzki et al. (2006). The spectroscopic masses are larger, in two cases very much larger. The lower masses are supported by the kinematical properties of Tc 1 and He 2-108 (see Fig. 5 of Napiwotzki 2006), which favour an old thin disk population. Kudritzki et al. also derive Teff: our photo-ionization values are in good agreement.

Pauldrach et al. (2004) find, from a spectroscopic analysis, five CSPNe with masses close to the Chandrasekhar limit. This result is implausible, as argued by Napiwotzki (2006). Three of their objects are also in our sample, and all are found to have regular masses.

2.4. The mass distributions

In Fig. 2 the upper panel presents the mass distribution of our whole sample of 101 PNe. All CSPN masses fall into a narrow range, 0.55–0.66 M⊙, with a mean mass of 0.61 M⊙. The range of masses is almost identical to that of Tylenda et al. (1991), but they obtained a smaller mean mass of 0.593 M⊙ and their distribution peaks at 0.58 M⊙.

The lower panel of Fig. 2 presents masses for the same types of CSPNe as shown in Fig. 1. The non-emission-line stars show a Gaussian mass distribution. The hydrogen-deficient emission-line stars seem to consist of two populations: one sharply peaked, containing [WR] stars, and the other showing a wider spread, composed of [WR] and wels.
The sharp peak consists, with a single exception, of hot [WO] stars only. The presented histograms seem to suggest that hot [WO] stars form a different group from the combined cooler [WC] and wels CSPNe.

K. Gesicki and A.A. Zijlstra: White dwarf masses from planetary nebulae 3

Fig. 2. The CSPN mass histograms. Upper panel: the histogram of all modelled PNe. Lower panel: the histogram of different subgroups of the 101 PNe. The dashed line indicates [WR] stars, the dotted line wels and the solid line non-emission-line stars.

3. Comparing CSPNe and WDs

3.1. The histograms

The comparable birth rates of PNe and WDs suggest that most white dwarfs go through the PN phase (e.g. Liebert et al. 2005). The mass distribution in both samples should therefore be similar.

Fig. 3 presents the histograms of our interpolated O-type CSPN masses and the masses of DA white dwarfs from recent surveys. The WD data of Madej et al. (2004), kindly provided by the authors, contain 1175 new DA WDs extracted from the Sloan Digital Sky Survey. The data of Liebert et al. (2005), taken from the electronic version of their article, contain 347 DA WDs from the Palomar Green Survey. For Fig. 3 we selected the objects with temperatures between 15 000 K and 40 000 K. The two WD histograms are not identical, but both peak at similar values and show extended low- and high-mass tails. We plot the histograms using narrower bins than is usually done for WDs, optimized to the mass resolution of our CSPN data.

The difference between the WD and CSPN distributions is striking. First, the obtained CSPN masses are restricted to a much narrower range of values than WDs, and are also much more sharply peaked. At face value, this implies that only some of the WDs have gone through the PN phase, in contrast to the conclusion from their similar birth rates (Liebert et al. 2005). Second, the two distributions peak at different masses. Here a systematic error cannot be excluded, as discussed below.

3.2.
Hydrogen-rich vs. hydrogen-deficient

Hansen & Liebert (2003) point to a variety of WD mass distributions with clear differences between hydrogen- and helium-rich cool stars. Beauchamp et al. (1996) found for hot helium-atmosphere DB stars a sharp peak almost entirely lacking low- and high-mass components. They also found that the DBA stars, which exhibit traces of atmospheric hydrogen, show a distinctly different, broad and flat distribution.

The CSPN show an apparent difference between hydrogen-rich and hydrogen-deficient mass distributions. The hydrogen-deficient stars show a very narrow mass distribution; it is tempting to relate this to the helium-rich DB and DBA populations.

We use hydrogen-burning tracks to derive these masses. The evolution after the thermal pulse leading to helium burners is very complicated and not well understood (Werner & Herwig 2006). This may not affect the derived masses too much: the effect of a thermal pulse is to change the temperature of the star, but as shown in Fig. 1, the isochrones have only a weak dependence on temperature. The resulting offset in time (still very uncertain), when accounted for, can shift those CSPN masses towards higher values.

Fig. 3. The mass distribution of non-emission-line O-type CSPNe (shaded area) is compared to two DA white dwarf distributions of intermediate temperatures: thin line: data from Liebert et al. (2005); dotted line: data from Madej et al. (2004), which are more numerous and are rescaled.

4. Discussion

4.1. Uncertainties in mass determinations

When comparing the CSPNe and WDs we have to remember that we compare different spatial distributions. Because of their faintness the WD observations are restricted to our nearest neighbourhood while PNe are observed across the whole Galaxy. Nevertheless we did not obtain significantly different distributions for PNe at different distances.

Our mass determination relies on a single set of evolutionary tracks.
There are two possible sources of errors in the Blöcker tracks. The first is the early post-AGB evolution, where the time scales depend on how and when the AGB wind terminates. The Blöcker tracks end this at Teff ∼ 6000 K (pulsation period of 50 days) to agree with the observations of detached shells around hotter stars but not around cooler stars. A later termination would lead to an earlier start of the ionization: in this case we would systematically overestimate the masses. For a reduction of the post-AGB transition time by 10^3 yr, the typical mass would reduce by 0.01 M☉.

The second uncertainty is the mass-loss rate during the post-AGB phase. For M ∼ 0.6 M☉, the post-AGB mass-loss rate in the Blöcker models is 0.1 times the nuclear burning rate, but for high-mass models the mass loss accelerates the evolution by 50% (Blöcker 1995). A higher post-AGB mass loss than assumed would reduce our masses, but for the typical masses we find a very large increase would be required. Table 1 compares the Blöcker mass-loss rates with observed values, where we used the dynamical mass to calculate the Blöcker rate. For the three non-emission-line stars, observed rates are higher by up to a factor of 3. This appears to be in part related to the high luminosity derived by Kudritzki et al.: if we compare their rates with Blöcker tracks at similar luminosity, then the Blöcker rates tend to be higher. The nuclear burning rate of log ṀH ∼ −6.8 exceeds the observed wind by a factor of four (more for NGC 3242). For this factor, the Blöcker tracks would underestimate the speed of evolution by only 10 per cent. We conclude that the post-AGB mass-loss rates have little effect on the derived masses. The exception is the wels star in the sample, where the wind mass-loss rate is comparable to the nuclear burning rate.

Table 2. Blöcker track time scales: PN visibility is defined as between log Teff = 4.4 and either a nebular age t = 10^4 yr or a stellar luminosity log L = 3.0, whichever occurs earlier.

Mass [M☉]   tstart [yr]    tend [yr]     tvisibility [yr]
0.546       90 × 10^3      -             -
0.565       4 × 10^3       10 × 10^3     6 × 10^3
0.605       1.5 × 10^3     7.4 × 10^3    5.9 × 10^3
0.625       660            3.6 × 10^3    2.9 × 10^3
0.696       100            880           780
0.836       100            840           740
0.940       12             90            78

There is also an uncertainty in the dynamical age estimate. A later acceleration would increase the ages by up to 50 per cent and shift the mass peak from 0.61 to 0.60 M☉.

The WD mass determinations also suffer from simplifications and model assumptions, in addition to the uncertainties concerning cool and hot WDs as described in the Introduction. One uncertainty is in contemporary plasma physics, concerning the pressure broadening in a very high density plasma (Madej et al. 2004). The mass-radius relations used depend on the assumed mass of the hydrogen layer. Napiwotzki et al. (1999) compared estimates from different studies and concluded that the gravities obtained from the spectroscopic method suffer from systematic errors of up to 0.1 dex in log g. This corresponds to an offset in masses of about 0.02 M☉ and could, in principle, explain the difference in peak masses between WDs and CSPNe. The width of the peak may also be narrower than derived from the models. Nevertheless, the wide tails of the mass distribution are not in doubt.

4.2. Time scales, birth rates and binarity

The derived CSPN mass distribution combines the effects of the birth rate as a function of mass, and the observable lifetime of the PN. The latter depends on mass as indicated in Table 2.
The period of visibility is defined here as beginning when the star reaches Teff = 25 × 10^3 K, and ending either when the star enters the cooling track (defined as log L = 3.00) or when the age of the nebula is 10^4 yr, whichever comes earlier. Our histogram should be corrected for the difference in visibility time. This increases the number at high CSPN mass only by a factor of up to 10, and brings the high-mass tail into somewhat better agreement. We may also have a sample bias against high masses, as these are not expected in the Bulge objects. The de-selection of bipolar objects may have removed a few higher-mass nebulae in the disk.

CSPNe with M < 0.56 M☉ would not produce a visible PN, as the post-AGB transition time becomes too long ('lazy PNe'). In the sample of Liebert et al. (2005), 30 per cent of white dwarfs have masses in this range, and 50 per cent in the sample of Madej et al. (2004). However, the sharp drop in the CSPN mass distribution below 0.60 M☉ occurs at too high a mass to be affected.

Hansen & Liebert (2003) argue that both the high- and low-mass tails in the WD distribution can be a result of binary evolution. Merging leads to high-mass WDs, while a close companion stripping the envelope can cause an early termination of the evolution and produce a low-mass helium WD. Both channels together may account for some 10 per cent of all WDs (Moe & de Marco 2006). Therefore the histogram for single WDs could be narrower. Close binary evolution can affect the PN phase as well, leading to strongly non-spherical nebulae. Our model analysis assumes spherical symmetry, and we did not analyze bipolar nebulae. Our selection therefore favours single CSPNe and rejects low-mass CSPNe in interacting binaries. Thus, the CSPN histogram (Fig. 3) is biased toward single-star evolution, while the WD histogram includes binary broadening. This may affect the tails of the WD histogram but is not expected to affect the main peak.
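The visibility-time correction described at the start of this subsection can be sketched from the Table 2 values. The snippet below is only an illustration: the visibility time is taken as tend minus tstart, which reproduces the tabulated column, and the weight is the ratio of visibility times relative to the 0.605 M☉ track. The choice of that reference track and the names `TRACKS`, `visibility_time` and `visibility_weight` are assumptions made here, not something specified in the text. The 0.546 M☉ track is omitted because Table 2 lists no end time for it.

```python
# Table 2 values (Blöcker tracks): mass [Msun] -> (t_start [yr], t_end [yr])
TRACKS = {
    0.565: (4.0e3, 10.0e3),
    0.605: (1.5e3, 7.4e3),
    0.625: (660.0, 3.6e3),
    0.696: (100.0, 880.0),
    0.836: (100.0, 840.0),
    0.940: (12.0, 90.0),
}


def visibility_time(mass):
    """Visibility period of the PN for a tabulated track, in years."""
    t_start, t_end = TRACKS[mass]
    return t_end - t_start


def visibility_weight(mass, reference_mass=0.605):
    """Factor by which a histogram bin would be scaled to compensate for
    a visibility time shorter (or longer) than that of the reference track."""
    return visibility_time(reference_mass) / visibility_time(mass)


for mass in sorted(TRACKS):
    print(f"{mass:.3f} Msun  t_vis = {visibility_time(mass):6.0f} yr  "
          f"weight = {visibility_weight(mass):5.1f}")
```

A histogram bin at a given mass would then be multiplied by the corresponding weight; masses between the tabulated tracks would need interpolation, which is left out of this sketch.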
Moe & de Marco (2006) predict a number of PNe in the Galaxy of around 46000. Based on local column densities, Zijlstra & Pottasch (1991) derive an actual number of 23000, suggesting that only about half the stars which could produce a PN do so. This comparison is limited by our knowledge of the time a PN remains observable. Moe & de Marco (2006) predict a birth rate of PNe of 1.1 × 10^−12 PNe yr^−1 pc^−3, comparable to the current, local WD birth rate of 1.0 × 10^−12 PNe yr^−1 pc^−3. Again assuming only half their predicted number of PNe is actually observed, the expectation is that half of all WDs have passed through the PN phase.

5. Conclusions

We show that the mass distribution of CSPNe is sharply peaked at M = 0.61 M☉. The published WD mass distributions show a much broader distribution peaking at a lower mass of M = 0.59 M☉. Part of the difference in the peak may indicate faster evolution during the early post-AGB phase than assumed in the Blöcker tracks. CSPN mass-loss rates cannot explain the difference. However, considering the uncertainty of 0.02 M☉ in the WD mass estimations, both peaks are in reasonable agreement. About 30 per cent of WDs have masses too low to have passed through the PN phase.

Acknowledgements. We thank our referee Ralf Napiwotzki for important comments. This project was financially supported by the “Polish State Committee for Scientific Research” through the grant No. 2.P03D.002.025 and by a NATO collaborative program grant No. PST.CLG.979726. AAZ and KG gratefully acknowledge hospitality from the SAAO.

References

Beauchamp, A., Wesemael, F. & Bergeron, P. 1996, in: C. S. Jeffery and U. Heber (eds.) Hydrogen-Deficient Stars, ASP Conference Series, Vol. 96, p. 295
Blöcker, T. 1995, A&A 299, 755
Boudreault, S. & Bergeron, P. 2005, ASP Conf. Series, Vol. 334, p. 249
Gesicki, K., Zijlstra, A. A., Acker, A., Gorny, S. K., Gozdziewski, K. & Walsh, J. R. 2006, A&A 451, 925
Hansen, B. M. S. & Liebert, J. 2003, ARA&A 41, 465
Kudritzki, R. P., Urbaneja, M. A. & Puls, J. 2006, IAU Symposium 234, Planetary Nebulae in our Galaxy and Beyond, M. J. Barlow and R. H. Méndez (eds.), CUP Cambridge, p. 119
Liebert, J., Bergeron, P. & Holberg, J. B. 2005, ApJS 156, 47
Madej, J., Nalezyty, M. & Althaus, L. G. 2004, A&A 419, L5
Moe, M. & de Marco, O. 2006, ApJ 650, 916
Napiwotzki, R., Green, P. J. & Saffer, R. A. 1999, ApJ 517, 399
Napiwotzki, R. 2006, A&A 451, L27
Pauldrach, A. W. A., Hoffmann, T. L. & Mendez, R. H. 2004, A&A 419, 1111
Schönberner, D. & Tylenda, R. 1990, A&A 234, 439
Stanghellini, L., Villaver, E., Manchado, A. & Guerrero, M. A. 2002, ApJ, 576,
Tylenda, R., Stasińska, G., Acker, A. & Stenholm, B. 1991, A&A 246, 221
Werner, K. & Herwig, F. 2006, PASP 118, 183
Zijlstra, A. A. & Pottasch, S. R. 1991, A&A 243, 478

ABSTRACT We compare the mass distribution of central stars of planetary nebulae (CSPN) with those of their progeny, white dwarfs (WD). We use a dynamical method to measure masses with an uncertainty of 0.02 M$_\odot$. The CSPN mass distribution is sharply peaked at $0.61 \rm M_\odot$. The WD distribution peaks at lower masses ($0.58 \rm M_\odot$) and shows a much broader range of masses. Some of the difference can be explained if the early post-AGB evolution is faster than predicted by the Bl\"ocker tracks. Between 30 and 50 per cent of WD may avoid the PN phase because of too low mass. However, the discrepancy cannot be fully resolved and WD mass distributions may have been broadened by observational or model uncertainties. <|endoftext|><|startoftext|> Introduction

This article discusses uniqueness theorems for Cauchy integrals of complex measures in the plane.
We consider the space $M = M(\mathbb{C})$ of finite complex measures $\mu$ in $\mathbb{C}$. The Cauchy integral of a measure from $M$ is defined in the sense of principal value. First, for any $\mu \in M$, $\varepsilon > 0$ and any $z \in \mathbb{C}$ consider

$$C^\mu_\varepsilon(z) := \int_{\zeta : |\zeta - z| > \varepsilon} \frac{d\mu(\zeta)}{\zeta - z}.$$

Consequently, the Cauchy integral of $\mu$ can be defined as

$$C^\mu(z) := \lim_{\varepsilon \to 0+} C^\mu_\varepsilon(z),$$

if the limit exists.

Unlike the Cauchy transform on the line, $C^\mu$ can vanish on a set of positive Lebesgue measure: consider for example $\mu = dz$ on a closed curve, whose Cauchy transform is zero at all points outside the curve. It is natural to ask if $C^\mu$ can also vanish on large sets with respect to $\mu$. If $\mu = \delta_z$ is a single point mass, its Cauchy transform will be zero $\mu$-a.e. due to the above definition of $C^\mu$ in the sense of principal value. Examples of infinite discrete measures with vanishing Cauchy transforms can also be constructed with little effort. After that one arrives at the following corrected version of the question:

Is it true that any continuous $\mu \in M$, such that $C^\mu(z) = 0$ at $\mu$-a.e. point, is trivial?

As usual, we call a measure continuous if it has no point masses. We denote the space of all finite complex continuous measures by $M_c(\mathbb{C})$.

This problem can also be interpreted in terms of uniqueness. Namely, if $f$ and $g$ are two functions from $L^1(|\mu|)$ such that $C^{(f-g)\mu} = 0$, $\mu$-a.e., does it imply that $f = g$, $\mu$-a.e.? This way it becomes a problem of injectivity of the planar Cauchy transform.

The first author is supported by grants No. MTM2004-00519 and 2001SGR00431. The second author is supported by N.S.F. Grant No. 0500852. The third author is supported by N.S.F. Grant No. 0501067.

http://arxiv.org/abs/0704.0621v1

2 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG

First significant progress towards the solution of this problem was achieved by X. Tolsa and J. Verdera in [14].
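The example of $\mu = dz$ on a closed curve mentioned above can be checked numerically. The following sketch is an illustration only: it takes the unit circle as the curve, uses the unnormalized kernel $1/(\zeta - z)$ from the definition above, and approximates the integral with an equally spaced quadrature rule. The function name `cauchy_integral_dz_on_circle` and the sample points are assumptions introduced here.

```python
import cmath
import math


def cauchy_integral_dz_on_circle(z, n=2000):
    """Approximate the integral of dζ / (ζ - z) over the unit circle.

    Uses n equally spaced nodes ζ_k = exp(iθ_k) with dζ = iζ dθ.
    The point z must not lie on the circle itself.
    """
    total = 0j
    step = 2.0 * math.pi / n
    for k in range(n):
        zeta = cmath.exp(1j * step * k)
        total += (1j * zeta * step) / (zeta - z)
    return total


outside = cauchy_integral_dz_on_circle(2.0 + 1.0j)   # point outside the curve
inside = cauchy_integral_dz_on_circle(0.3 - 0.2j)    # point inside the curve

print("outside:", abs(outside))
print("inside: ", inside, "vs 2*pi*i =", 2j * math.pi)
```

Outside the curve the value vanishes up to rounding, so the Cauchy transform of this measure is zero on an open set of positive Lebesgue measure, while inside it equals $2\pi i$ by the Cauchy integral formula.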
It was established that the answer is positive in two important particular cases: when $\mu$ is absolutely continuous with respect to Lebesgue measure $m_2$ in $\mathbb{C}$ and when $\mu$ is a measure of linear growth with finite Menger curvature. The latter class of measures is one of the main objects in the study of the planar Cauchy transform, see for instance [11], [12] or [13].

As to the complete solution to the problem, it seemed for a while that the answer could be positive for any $\mu \in M_c$, see for example [14]. However, in Section 5 of the present paper we show that there exists a large set of continuous measures $\mu$ satisfying $C^\mu(z) = 0$, $\mu$-a.e. Following [2], we call such measures reflectionless. This class seems to be an intriguing new object in the theory.

On the positive side, we prove that if the maximal function associated with the Cauchy transform is summable with respect to $|\mu|$ then $\mu$ cannot be reflectionless, see Theorem 2.1. This result is sharp in its scale because the simplest examples of reflectionless measures produce maximal functions that lie in the "weak" $L^1(|\mu|)$. We prove this result in Section 2.

In view of this fact, we believe that the class of continuous measures with summable Cauchy maximal functions also deserves attention. A full description of this class and the (disjoint) class of reflectionless measures remains an open problem.

Let us mention that if $\mu$ is a measure with linear growth and finite Menger curvature then its Cauchy maximal function belongs to $L^2(|\mu|)$, see [12, 13], and therefore is summable. This fact relates Theorem 2.1 to the aforementioned result from [14]. The latter can also be deduced in a different way, see Section 2.

From the point of view of uniqueness, our results imply that any bounded planar Cauchy transform is injective, see Corollary 2.5. This property is a clear analogue of the uniqueness results for the Cauchy integral on the line or the unit circle.

In Section 3 we discuss other applications of Theorem 2.2.
They involve structural theorems of De Giorgi and his notion of a set of finite perimeter, see [5].

In Section 4 we study asymptotic behavior of the Cauchy transform near its zero set. The results of this section imply that the Radon derivative of $\mu$ with respect to Lebesgue measure $m_2$ vanishes a.e. on the set $\{C^\mu = 0\}$. In particular the set $\{C^\mu = 0\}$ must be a zero set with respect to the variation of the absolutely continuous part of $\mu$, which is a slight generalization of the first result of [14]. It is interesting to note that the most direct analogue of this corollary on the real line is false: it is easy to construct an absolutely continuous (with respect to $m_1 = dx$) measure $\mu \in M(\mathbb{R})$ such that $|\mu|(\{C^\mu = 0\}) > 0$.

Finally, in Section 5 we attempt a geometric description of the set of reflectionless measures. We give a partial description of reflectionless measures on the line in terms of so-called comb-like domains. We also provide tools for the construction of various examples of such measures. In particular, we show that the harmonic measure on any compact subset (of positive Lebesgue measure) of $\mathbb{R}$ is reflectionless.

Acknowledgments. The authors are grateful to Fedja Nazarov for his invaluable comments and insights. The second author would also like to thank the administration and staff of Centre de Recerca Matemática in Barcelona for the hospitality during his visit in the Spring of 2006.

UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 3

2. Measures with summable maximal functions

If $\mu \in M$ we denote by $C^\mu_*(z)$ its Cauchy maximal function

$$C^\mu_*(z) := \sup_{\varepsilon > 0} |C^\mu_\varepsilon(z)|.$$

Our first result is the following uniqueness theorem.

Theorem 2.1. Let $\mu \in M_c$. Assume that $C^\mu_*(z) \in L^1(|\mu|)$ and that $C^\mu(z)$ exists and vanishes $\mu$-a.e. Then $\mu \equiv 0$.

We first prove

Theorem 2.2. If $C^\mu_* \in L^1(|\mu|)$ and $C^\mu(z)$ exists $\mu$-a.e. then

$$2C^{C^\mu d\mu}(z) = 2\int \frac{C^\mu(t)\,d\mu(t)}{t - z} = [C^\mu(z)]^2 \quad \text{for } m_2\text{-a.e. point } z \in \mathbb{C}. \qquad (1)$$

Proof. Put

$$F := \left\{ z \in \mathbb{C} : \int \frac{d|\mu|(t)}{|t - z|} < \infty \right\}.$$

As $|\mu|$ is a finite measure,

$$m_2(\mathbb{C} \setminus F) = 0. \qquad (2)$$

Let $z \in F$.
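Then the integral

$$I_\varepsilon = I_\varepsilon(z) := \iint_{|t-\zeta|>\varepsilon} \frac{d\mu(t)\,d\mu(\zeta)}{(t - z)(\zeta - z)}$$

is absolutely convergent for any $\varepsilon > 0$. Using the identity

$$\frac{1}{(t-z)(z-\zeta)} + \frac{1}{(z-\zeta)(\zeta-t)} + \frac{1}{(\zeta-t)(t-z)} = 0$$

we obtain

$$I_\varepsilon = \iint_{|t-\zeta|>\varepsilon} \left[ \frac{1}{(z-\zeta)(\zeta-t)} + \frac{1}{(\zeta-t)(t-z)} \right] d\mu(t)\,d\mu(\zeta) = \int \frac{d\mu(\zeta)}{\zeta - z} \int_{|t-\zeta|>\varepsilon} \frac{d\mu(t)}{t - \zeta} + \int \frac{d\mu(t)}{t - z} \int_{|\zeta-t|>\varepsilon} \frac{d\mu(\zeta)}{\zeta - t}$$

$$= \int \frac{C^\mu_\varepsilon(\zeta)\,d\mu(\zeta)}{\zeta - z} + \int \frac{C^\mu_\varepsilon(t)\,d\mu(t)}{t - z} = 2\int \frac{C^\mu_\varepsilon(t)\,d\mu(t)}{t - z}.$$

Let

$$E := \left\{ z \in \mathbb{C} : \int \frac{C^\mu_*(t)\,d|\mu|(t)}{|t - z|} < \infty \right\}.$$

By assumption, the numerator $C^\mu_*(t)\,d|\mu|(t)$ is a finite measure. Therefore

$$m_2(\mathbb{C} \setminus E) = 0. \qquad (3)$$

If $z \in E$ then

$$\lim_{\varepsilon \to 0} \int \frac{C^\mu_\varepsilon(t)\,d\mu(t)}{t - z} = \int \frac{C^\mu(t)\,d\mu(t)}{t - z}. \qquad (4)$$

This formula is true, by the dominated convergence theorem, as long as $C^\mu_* \in L^1(|\mu|)$ and the principal value $C^\mu$ exists $\mu$-a.e. Thus

$$\lim_{\varepsilon \to 0} I_\varepsilon = 2C^{C^\mu d\mu}(z) \quad \text{if } z \in E. \qquad (5)$$

It is left to show that, since $z \in F$,

$$\lim_{\varepsilon \to 0} I_\varepsilon = [C^\mu(z)]^2. \qquad (6)$$

Since $z \in F$, the following integral converges absolutely:

$$\varphi_\varepsilon(t, z) := \int_{\zeta \in \mathbb{C},\ |\zeta - t| > \varepsilon} \frac{d\mu(\zeta)}{\zeta - z},$$

and

$$I_\varepsilon = \int \varphi_\varepsilon(t, z)\, \frac{d\mu(t)}{t - z}.$$

Since the point $z$ is fixed in $F$, we have that $\frac{1}{|\zeta - z|} \in L^1(|\mu|)$, and therefore $\int_A \frac{d|\mu|(\zeta)}{|\zeta - z|}$ is small if $|\mu|(A)$ is small. Denoting the disc centered at $t$ and of radius $\varepsilon$ by $B(t, \varepsilon)$ we notice

1) $\varphi_\varepsilon(t, z) = \int \frac{d\mu(\zeta)}{\zeta - z} - \int_{B(t,\varepsilon)} \frac{d\mu(\zeta)}{\zeta - z}$;

2) $\lim_{\varepsilon \to 0} |\mu|(B(t, \varepsilon)) = 0$ uniformly in $t$. Otherwise $\mu$ would have an atom.

We conclude that, as $\varepsilon \to 0$, the functions $\varphi_\varepsilon(t, z)$ converge uniformly in $t \in \mathbb{C}$ to $\varphi(z) = \int \frac{d\mu(\zeta)}{\zeta - z}$. Hence for any $z \in F$ and any $t \in \mathbb{C} \setminus \{z\}$

$$\varphi_\varepsilon(t, z) \to \varphi(z), \quad \text{as } \varepsilon \to 0.$$

Since $\varphi_\varepsilon(t, z)$ converge uniformly and $z \in F$,

$$\int \frac{d\mu(t)}{t - z}\, \varphi_\varepsilon(t, z) \to \varphi(z) \int \frac{d\mu(t)}{t - z} = [C^\mu(z)]^2.$$

We have verified (6). Combining (5) and (6) we conclude that for $z \in E \cap F$ (so for $m_2$-a.e. $z \in \mathbb{C}$) we have

$$2C^{C^\mu d\mu}(z) = 2\int \frac{C^\mu(t)\,d\mu(t)}{t - z} = \lim_{\varepsilon \to 0} I_\varepsilon = [C^\mu(z)]^2 \quad \text{for } m_2\text{-a.e. point } z \in \mathbb{C}. \qquad (7)$$

This formula is true as long as $C^\mu_* \in L^1(|\mu|)$ and the principal value $C^\mu$ exists $\mu$-a.e.

To deduce Theorem 2.1 suppose that $C^\mu$ vanishes $\mu$-a.e. Then the left-hand side in (7) is zero for $m_2$-a.e. point $z$. The same must hold for $[C^\mu(z)]^2$. But if $C^\mu(z) = 0$ for Lebesgue-a.e. point $z \in \mathbb{C}$ then $\mu = 0$, see for example [6]. Theorem 2.1 is completely proved.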
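The three-term identity for the Cauchy kernel, on which the proof above rests, is easy to confirm numerically. The snippet below is a small sanity check rather than part of the argument: it evaluates the sum of the three products at random triples of complex points and rescales by the common denominator, so that the residual measures only rounding error. The names `three_term_sum` and `random_point` are introduced here for illustration.

```python
import random


def three_term_sum(t, z, zeta):
    """Left-hand side of the identity
    1/((t-z)(z-ζ)) + 1/((z-ζ)(ζ-t)) + 1/((ζ-t)(t-z)) = 0."""
    return (1 / ((t - z) * (z - zeta))
            + 1 / ((z - zeta) * (zeta - t))
            + 1 / ((zeta - t) * (t - z)))


def random_point(rng):
    """Random complex point in the square [-1, 1] x [-1, 1]."""
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


rng = random.Random(0)
max_residual = 0.0
for _ in range(1000):
    t, z, zeta = random_point(rng), random_point(rng), random_point(rng)
    common_denominator = (t - z) * (z - zeta) * (zeta - t)
    # After multiplying by the common denominator the exact value is
    # (ζ - t) + (t - z) + (z - ζ) = 0.
    residual = abs(three_term_sum(t, z, zeta) * common_denominator)
    max_residual = max(max_residual, residual)

print("largest scaled residual:", max_residual)
```

The comment inside the loop is the whole algebraic proof: over the common denominator the three numerators are $\zeta - t$, $t - z$ and $z - \zeta$, which sum to zero.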
Remark. In the statement of Theorem 2.2 the condition $C^\mu_* \in L^1(|\mu|)$ can be replaced with the condition that $C^\mu_\varepsilon$ converge in $L^1(|\mu|)$. The proof would have to be changed as follows. As in the above proof one can show that at Lebesgue-a.e. point $z$ the double integral $I = I_\varepsilon$ from the proof of Theorem 2.2 satisfies

$$\lim_{\varepsilon \to 0} I_\varepsilon = [C^\mu(z)]^2. \qquad (8)$$

The relation

$$I_\varepsilon = 2\int \frac{C^\mu_\varepsilon(t)\,d\mu(t)}{t - z}$$

for a.e. $z$ can also be established as before. Since $C^\mu_\varepsilon$ converge in $L^1(|\mu|)$, the last integral converges to $C^{C^\mu d\mu}(z)$ in the "weak" $L^2(dx\,dy)$, which concludes the proof.

Hence we arrive at the following version of Theorem 2.1:

Theorem 2.3. Let $\mu \in M_c$. Assume that $C^\mu_\varepsilon \to 0$ in $L^1(|\mu|)$. Then $\mu \equiv 0$.

This version has the following corollary:

Corollary 2.4 ([14]). Let $\mu \in M$ be a measure of linear growth and finite Menger curvature. If $C^\mu = 0$ at $\mu$-a.e. point then $\mu \equiv 0$.

Proof. The conditions on $\mu$ imply that the $L^2(|\mu|)$-norms of the functions $C^\mu_\varepsilon$ are uniformly bounded, see for instance [11]. Since $C^\mu_\varepsilon$ also converge $\mu$-a.e., they must converge in $L^1(|\mu|)$.

Remark. As was mentioned in the introduction, Corollary 2.4 also follows from Theorem 2.1. However, the above version of the argument allows one to obtain it without the additional results of [12, 13] on the maximal function.

We also obtain the following statement on the injectivity of any bounded planar Cauchy transform. As usual, we say that the Cauchy transform is bounded in $L^2(\mu)$ if the functions $C^{f\,d\mu}_\varepsilon$ are uniformly bounded in $L^2(\mu)$-norm for any $f \in L^2(\mu)$. If $C^\mu$ is bounded, then $C^{f\,d\mu}_\varepsilon$ converge $\mu$-a.e. as $\varepsilon \to 0$ and the image $C^{f\,d\mu}$ exists in a regular sense as a function in $L^2(\mu)$, see [13].

Corollary 2.5. Let $\mu \in M$ be a positive measure. If $C^\mu$ is bounded in $L^2(\mu)$ then it is injective (has a trivial kernel).

Proof. Suppose that there is $f \in L^2(\mu)$ such that $C^{f\,d\mu} = 0$ at $\mu$-a.e. point. Since both $f$ and $C^{f\,d\mu}_*$ are in $L^2(\mu)$, $C^{f\,d\mu}_*$ is in $L^1(|f|\,d\mu)$.
Hence $f$ is a zero-function by Theorem 2.1. □

Remark. We have actually obtained a slightly stronger statement: If $C^\mu$ is bounded in $L^2(\mu)$ then for any $f \in L^2(\mu)$ the functions $f$ and $C^{f\,d\mu}$ cannot have disjoint essential supports, i.e. the product $f\,C^{f\,d\mu}$ cannot equal 0 at $\mu$-a.e. point.

In the rest of this section we will discuss what other kernels could replace the Cauchy kernel in the statement of Theorem 2.1. If $K(x)$ is a complex-valued function in $\mathbb{R}^n$, bounded outside of any neighborhood of the origin, and $\mu$ is a finite measure on $\mathbb{R}^n$, one can define $K^\mu$ and $K^\mu_*$ in the same way as $C^\mu$ and $C^\mu_*$ were defined in the introduction.

The proof of Theorem 2.2 relied on the fact that the Cauchy kernel $K(z) = 1/z$ is odd, satisfies the symmetry condition

$$K(x - y)K(y - z) + K(y - z)K(z - x) + K(z - x)K(x - y) \equiv 0, \qquad (9)$$

and is summable as a function of $z$ for any $t$ with respect to Lebesgue measure. Any $K(x)$ having these three properties could be used in Theorem 2.1.

Out of these three conditions the symmetry condition (9) seems to be the most special. However, other symmetry conditions may result in formulas similar to Theorem 2.2 that could still yield Theorem 2.1. Here is a different example. It shows that much less symmetry can be required from the kernel if the measure is positive.

Theorem 2.6. Let $\mu$ be a positive measure in $\mathbb{R}^n$. Suppose that the real kernel $K(x)$ satisfies the following properties:

1) $K(-x) = -K(x)$ for any $x \in \mathbb{R}^n$;

2) $K(x) > 0$ for any $x$ from the half-space $\mathbb{R}^n_+ = \{x = (x_1, x_2, ..., x_n) \mid x_1 > 0\}$.

If $K^\mu_* \in L^1(\mu)$ and $K^\mu(x) = 0$ for $\mu$-a.e. $x$ then $\mu \equiv 0$.

Note that real and imaginary parts of the Cauchy kernel, Riesz kernels in $\mathbb{R}^n$, as well as many other standard kernels satisfy the conditions of the theorem.

We will need the following

Lemma 2.7. Let $K$ be an odd kernel and let $\mu, \nu \in M$. Then

$$\int K^\mu_\varepsilon(z)\,d\nu(z) = -\int K^\nu_\varepsilon(z)\,d\mu(z) \qquad (10)$$

for any $\varepsilon > 0$. Suppose that $K^\mu_* \in L^1(|\nu|)$. If $K^\mu(z)$ exists $\nu$-a.e.
then

$$\int K^\mu(z)\,d\nu(z) = -\lim_{\varepsilon \to 0} \int K^\nu_\varepsilon(z)\,d\mu(z).$$

In particular, suppose that both $K^\mu_* \in L^1(|\nu|)$ and $K^\nu_* \in L^1(|\mu|)$. If $K^\mu(z)$ exists $\nu$-a.e. and $K^\nu(z)$ exists $\mu$-a.e. then

$$\int K^\mu(z)\,d\nu(z) = -\int K^\nu(z)\,d\mu(z).$$

Proof. Since $K$ is odd, the first equation can be obtained simply by changing the order of integration. The second and third equations now follow from the dominated convergence theorem. □

Proof of Theorem 2.6. There exists a hyperplane $\{x_1 = c\}$ in $\mathbb{R}^n$ such that $\mu(\{x_1 = c\}) = 0$ but both $\mu(\{x_1 > c\})$ and $\mu(\{x_1 < c\})$ are non-zero. Denote by $\nu$ and $\eta$ the restrictions of $\mu$ onto $\{x_1 > c\}$ and $\{x_1 < c\}$ respectively. Then

$$\int K^\nu_\varepsilon(z)\,d\mu(z) = \int K^\nu_\varepsilon(z)\,d\nu(z) + \int K^\nu_\varepsilon(z)\,d\eta(z).$$

The first integral on the right-hand side is 0 because of the oddness of $K$ (apply the first equation in the last lemma with $\mu = \nu$). The second condition on $K$ and the positivity of the measure imply that the second integral is positive and increases as $\varepsilon \to 0$. Therefore $\int K^\nu_\varepsilon(z)\,d\mu(z)$ cannot tend to zero. This contradicts the fact that $K^\mu = 0$, $\nu$-a.e. and the second equation from the last lemma. □

3. Sets of finite perimeter

In this section we give another example of an application of Theorem 2.2. It involves the notion of a set of finite perimeter introduced by De Giorgi in the 50's, see [5].

We say that a set $G \subset \mathbb{R}^2$ has finite perimeter (in the sense of De Giorgi) if the distributional partial derivatives of its characteristic function $\chi_G$ are finite measures.

Such sets have structural theorems. For example, if $G$ is such a set then the measure $\nabla\chi_G$ is carried by a set $E$, rectifiable in the sense of Besicovitch, i.e. a subset of a countable union of $C^1$ curves and an $H^1$-null set, where $H^1$ is the one-dimensional Hausdorff measure. Also the measure $\nabla\chi_G$ is absolutely continuous with respect to $H^1$ restricted to $E$ and its Radon-Nikodym derivative is a unit normal vector $H^1$-a.e. (notice that $\nabla\chi_G$ is a vector measure). At $H^1$-almost all points of $E$ the function $\chi_G$ has an approximate "one-sided" limit.
For more details we refer the reader to [5].

The general question we consider can be formulated as follows: What can be said about $\mu$ if $C^\mu$ coincides at $\mu$-a.e. point with a "good" function $f$?

To avoid certain technical details, all measures in this section are compactly supported. Furthermore, we will only discuss the two simplest choices of $f$. As we will see, even in such elementary situations Theorem 2.2 yields interesting consequences.

As usual, when we say that $C^\mu = f$ at $\mu$-a.e. point, we imply that the principal value exists $\mu$-almost everywhere.

Theorem 3.1. Let $\mu \in M_c$ be compactly supported. Assume that $C^\mu(z) = 1$, $\mu$-almost everywhere and $C^\mu_* \in L^1(|\mu|)$. Then $\mu = \bar\partial\chi_G$, where $G$ is a set of finite perimeter. In particular, $\mu$ is carried by a set $E$, $H^1(E) < \infty$, rectifiable in the sense of Besicovitch, and $\mu$ is absolutely continuous with respect to the restriction of $H^1$ to $E$.

Remark. The most natural example of such a measure is $dz$ on a $C^1$ closed curve. The theorem says that, by the structural results of De Giorgi, this is basically the full answer.

Proof. By Theorem 2.2 we get that for Lebesgue-almost every point in $\mathbb{C}$

$$[C^\mu(z)]^2 = 2C^\mu(z). \qquad (11)$$

In other words for $m_2$-a.e. point $z$ we have $C^\mu(z) = 0$ or $C^\mu(z) = 2$. Let $G$ denote the set where $C^\mu(z) = 2$. Since the Cauchy transform of any compactly supported finite measure must tend to zero at infinity, this set is bounded. Consider the following equality

$$\chi_G = \tfrac{1}{2} C^\mu,$$

understood in the sense that the two functions are equal as distributions. Taking distributional derivatives on both sides we obtain $\bar\partial\chi_G = \mu/2$ and $\partial\chi_G = \bar\mu/2$. Hence $G$ has finite perimeter and the rest of the statement follows from the results of [5]. □

We say that a set $G$ has locally finite perimeter (in the sense of De Giorgi) if the distributional derivatives of $\chi_G$ are locally finite measures. Our second application is the following

Theorem 3.2. Let $\mu \in M_c$ be compactly supported.
Assume that $C^\mu(z) = z$, $\mu$-almost everywhere and $C^\mu_* \in L^1(|\mu|)$. If $\mu(\mathbb{C}) = 0$ then $\mu = 2z\bar\partial\chi_G$, where $G$ is a set with locally finite perimeter. Whether $\mu(\mathbb{C}) = 0$ or not, $\mu$ is carried by a set $E$, $H^1(E) < \infty$, which is a rectifiable set in the sense of Besicovitch, and $\mu$ is absolutely continuous with respect to the restriction of $H^1$ to $E$.

Remark. The most natural example of such a measure is $z\,dz$ on a $C^1$ closed curve. Our statement shows that this is basically one-half of the answer. The other half is given by $\sqrt{z^2 - c}\,dz$, as will be seen from the proof.

Proof. Again, from Theorem 2.2 we get that for Lebesgue-almost every point in $\mathbb{C}$

$$[C^\mu(z)]^2 = 2C^{\zeta d\mu(\zeta)}(z). \qquad (12)$$

Notice that

$$C^{\zeta d\mu(\zeta)}(z) = \int \frac{\zeta}{\zeta - z}\,d\mu(\zeta) = \mu(\mathbb{C}) + zC^\mu(z)$$

and we get a quadratic equation

$$[C^\mu(z)]^2 = 2zC^\mu(z) - p,$$

where $p := -2\mu(\mathbb{C})$.

First case: $p = 0$. Here we get $[C^\mu(z)]^2 = 2zC^\mu(z)$. We conclude that $C^\mu(z) = 0$ or $2z$ for Lebesgue-a.e. point $z \in \mathbb{C}$. Again a bounded set $G$ appears for which $C^\mu = 2z\chi_G(z)$ in terms of distributions. Therefore

$$\bar\partial\chi_G = \frac{d\mu}{2z},$$

and the right hand side is a finite measure on any compact set avoiding the origin. Therefore, $G$ is a (locally) De Giorgi set.

Let us consider the case $p \neq 0$. For simplicity we assume $p = 1$; other $p$'s are treated in the same way. Then we have to solve the quadratic equation

$$C^\mu(z)^2 - 2zC^\mu(z) + 1 = 0$$

for Lebesgue-a.e. point in $\mathbb{C}$. Let us make the slit $[-1, 1]$ and consider two holomorphic functions in $\mathbb{C} \setminus [-1, 1]$,

$$r_1(z) = z - \sqrt{z^2 - 1}, \qquad r_2(z) = z + \sqrt{z^2 - 1},$$

where the branch of the square root is chosen so that $r_1(z) \to 0$, $z \to \infty$.

In other words we have the sets $E_1$ and $E_2$ such that $m_2(\mathbb{C} \setminus (E_1 \cup E_2)) = 0$ and

$$z \in E_1 \Rightarrow C^\mu(z) = r_1(z), \qquad z \in E_2 \Rightarrow C^\mu(z) = r_2(z).$$

Obviously it is $E_1$ that contains a neighborhood of infinity. The function $z - \sqrt{z^2 - 1}$ outside of $[-1, 1]$ can be written as $C^{\mu_0}(z)$ where $d\mu_0(x) = \frac{1}{\pi}\sqrt{1 - x^2}\,dx$. Consider $\nu = \mu - \mu_0$. Then

$$z \in E_1 \Rightarrow C^\nu(z) = 0, \qquad z \in E_2 \Rightarrow C^\nu(z) = 2\sqrt{z^2 - 1} =: R(z).$$

Therefore, $C^\nu(z) = R(z)\chi_{E_2}$.
(13)

Notice that if $R$ were analytic in an open domain compactly containing $E_2$ we would conclude from the previous equality that $\nu = R(z)\bar\partial\chi_{E_2}$. If, in addition, $|R|$ were bounded away from zero on $E_2$, we would obtain that $\bar\partial\chi_{E_2}$ and $\partial\chi_{E_2}$ are measures of finite variation, and hence $E_2$ is a set of finite perimeter.

Notice that our $R(z) = 2\sqrt{z^2 - 1}$ is analytic in $O := \mathbb{C} \setminus [-1, 1]$ and is nowhere zero. We will conclude that $E_2$ is a set of locally finite perimeter. More precisely we will establish the following claim: For every open disk $V \subset O$ the set $V \cap E_2$ has finite perimeter.

Indeed, let $W$ be a disk compactly containing $V$, $W \subset O$. Let $\psi$ be a smooth function, supported in $W$, $\psi|_V = 1$. Multiply (13) by $\psi$ and take a distributional derivative (against smooth functions supported in $V$). Then we get (using the fact that $R$ is holomorphic on $V$)

$$\nu|_V = \bar\partial(\psi R\chi_{E_2 \cap V})|_V = \bar\partial(R\chi_{E_2 \cap V})|_V = R\,\bar\partial(\chi_{E_2 \cap V})|_V.$$

We conclude immediately that $E_2 \cap V$ is a set of finite perimeter. Therefore, $E_2 \cap D$ is a set of finite perimeter, where $D$ is a domain whose closure is contained compactly in $O$. Recalling that $\mu = \nu + \mu_0$ we finish the proof. □

Remark 3.3. It is interesting to note that, as follows from the proof, if $\mu$ is the measure from the statement of the theorem then one of the connected components of supp $\mu$ must contain both roots of the equation $z^2 + 2\mu(\mathbb{C}) = 0$.

We conclude this section with the following examples of measures $\mu$ whose Cauchy transform coincides with $z$ at $\mu$-a.e. point.

Examples. 1. Let $\Omega$ be an open domain with smooth boundary $\Gamma$. Suppose that $[-1, 1] \subset \Omega$. Let $\{D_j\}_{j=1}^\infty$ be smoothly bounded disjoint domains in $O := \Omega \setminus [-1, 1]$, $\gamma_j = \partial D_j$. Assume

$$\sum_j H^1(\gamma_j) < \infty. \qquad (14)$$

Let $R(z)$ be an analytic branch of $2\sqrt{z^2 - 1}$ in $O$. Consider the measure $\nu$ on $\Gamma \cup (\cup\gamma_j) \cup [-1, 1]$ defined as

$$\nu = R(z)\,dz|_\Gamma - R(z)\,dz|_{\cup\gamma_j} - \sqrt{1 - x^2}\,dx|_{[-1,1]}.$$

Then

$$C^\nu(z) = \begin{cases} 0 & \text{if } z \in \mathbb{C} \setminus \bar{O}, \\ 0 & \text{if } z \in \cup_j D_j, \\ R(z) & \text{if } z \in O \setminus \cup_j \bar{D}_j. \end{cases}$$
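The two branches $r_1$, $r_2$ and the function $R = r_2 - r_1$ used in the proof of Theorem 3.2 and in this example can be checked numerically. The sketch below is an illustration under one stated assumption: the branch of $\sqrt{z^2 - 1}$ analytic off $[-1, 1]$ is realized as the product of principal square roots $\sqrt{z - 1}\,\sqrt{z + 1}$, a standard device that is not taken from the paper. The helper names are hypothetical.

```python
import cmath


def sqrt_z2_minus_1(z):
    """Branch of sqrt(z^2 - 1) analytic off [-1, 1] and behaving like z at infinity.

    Assumption: built from principal square roots; their sign flips cancel
    across the ray (-inf, -1), leaving a cut on [-1, 1] only.
    """
    return cmath.sqrt(z - 1) * cmath.sqrt(z + 1)


def r1(z):
    """Root of c^2 - 2zc + 1 = 0 that tends to 0 at infinity."""
    return z - sqrt_z2_minus_1(z)


def r2(z):
    """The other root of c^2 - 2zc + 1 = 0."""
    return z + sqrt_z2_minus_1(z)


def quadratic_residual(c, z):
    """Value of c^2 - 2zc + 1, which vanishes at both roots (case p = 1)."""
    return c * c - 2 * z * c + 1


for z in (2 + 1j, -3 + 0.5j, 0.5j, 100 + 0j):
    print(z,
          abs(quadratic_residual(r1(z), z)),
          abs(quadratic_residual(r2(z), z)),
          abs(r1(z)))
```

Both residuals are at rounding level, $r_1 r_2 = 1$, and $|r_1(z)|$ decays as $|z|$ grows, which is the normalization that singles out $r_1$ as the branch valid near infinity.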
Recall that $R(z) = z + \sqrt{z^2 - 1} - (z - \sqrt{z^2 - 1})$ and that $C^{\mu_0}(z) = z - \sqrt{z^2 - 1}$ for $\mu_0 = \frac{1}{\pi}\sqrt{1 - x^2}\,dx|_{[-1,1]}$. We conclude that for $\mu = \nu + \mu_0$ one has
$$C^\mu(z) = \begin{cases} z - \sqrt{z^2 - 1} & \text{if } z \in \mathbb{C} \setminus \bar O, \\ z - \sqrt{z^2 - 1} & \text{if } z \in \cup_j D_j, \\ z + \sqrt{z^2 - 1} & \text{if } z \in O \setminus \cup_j \bar D_j. \end{cases}$$

2. The second example is exactly the same as the first one but $D_{j,k} = B(x_{j,k}, )$, $x_{j,k} = j$ , $1 \le k \le j$, $j = 1, 2, 3, \dots$ Here the assumption (14) fails. But $\nu$, defined as above, will still be a measure of finite variation (and so will be $\mu$): $|\nu|(\mathbb{C}) \le C$.

In both examples $C^\mu(z) = z$ for $\mu$-a.e. $z$.

4. Asymptotic behavior near the zero-set of $C^\mu$

In this section we take a slightly different approach. We study asymptotic properties of measures near the sets where the Cauchy transform vanishes. Theorem 4.2 below shows that near the density points of such sets the measure must display a certain "irregular" asymptotic behavior.

As was mentioned in the introduction, one of the results of [14] says that an absolutely continuous planar measure cannot be reflectionless. This result is not implied by our Theorem 2.1 because an absolutely continuous measure may not have a summable Cauchy maximal function. It is, however, implied by Theorem 4.2, see Corollary 4.4 below.

When estimating Cauchy integrals one often uses the elementary observation that the difference of any two Cauchy kernels $1/(z - a) - 1/(z - b)$ can be estimated as $O(|z|^{-2})$ near infinity. To obtain a higher order of decay one may consider higher order differences. Here we will utilize the following estimate of that kind, which can be verified through simple calculations.

Lemma 4.1. Let $a, b, c \in B(0, r)$ be different points, $|a - b| > r$. Then there exist constants $A, B \in \mathbb{C}$ such that $|A|, |B| < 2$ and
$\left| \frac{A}{z - a} + \frac{B}{z - b} - \frac{1}{z - c} \right| < \frac{C r^2}{|z|^3}$ (15)
outside of $B(0, 2r)$. (Namely, $A = \frac{b - c}{b - a}$, $B = \frac{a - c}{a - b}$.)

If $\mu \in M$ consider one of its Riesz transforms in $\mathbb{R}^3$, $R_1\mu(x, y, z)$, defined as
$R_1\mu(x, y, z) = \int \frac{z}{|(u, v, 0) - (x, y, z)|^3}\,d\mu(u + iv)$.
This transform is the planar analogue of the Poisson transform.
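The cancellation behind Lemma 4.1 above can be checked numerically. The following Python sketch assumes the coefficients $A = (b-c)/(b-a)$ and $B = (a-c)/(a-b)$, which are the choice that cancels the $1/z$ and $1/z^2$ terms at infinity. With that choice the combination equals $-(c-a)(c-b)/((z-a)(z-b)(z-c))$, so for $|z| \ge 2r$ it is bounded by $32r^2/|z|^3$. The function names and the sample points are hypothetical, chosen only for illustration.

```python
import cmath
import math


def lemma_coefficients(a: complex, b: complex, c: complex):
    """Coefficients A, B assumed in this sketch for the three-kernel estimate."""
    A = (b - c) / (b - a)
    B = (a - c) / (a - b)
    return A, B


def kernel_combination(z: complex, a: complex, b: complex, c: complex) -> complex:
    """A/(z-a) + B/(z-b) - 1/(z-c), the higher-order difference of Cauchy kernels."""
    A, B = lemma_coefficients(a, b, c)
    return A / (z - a) + B / (z - b) - 1 / (z - c)


def worst_ratio(a, b, c, r, scales=(2.0, 4.0, 16.0, 256.0), n_angles=720):
    """Largest sampled value of |combination| * |z|^3 / r^2 on circles |z| = t*r."""
    worst = 0.0
    for t in scales:
        for k in range(n_angles):
            z = t * r * cmath.exp(2j * math.pi * k / n_angles)
            ratio = abs(kernel_combination(z, a, b, c)) * abs(z) ** 3 / r ** 2
            worst = max(worst, ratio)
    return worst


if __name__ == "__main__":
    r = 1.0
    a, b, c = -0.8 + 0j, 0.7 + 0j, 0.3 + 0.4j  # points of B(0, r) with |a - b| > r
    print(lemma_coefficients(a, b, c), worst_ratio(a, b, c, r))
```

The sampled ratio stays bounded as the circles grow, which is the $r^2/|z|^3$ decay used later in the proof of Theorem 4.2; the constant 32 is only the crude bound from the closed form, not a constant claimed by the paper.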
In particular, $\lim_{z \to 0+} R_1\mu(x, y, z) = \frac{d\mu}{dm_2}(x + iy)$ for all points $w = x + iy \in \mathbb{C}$ where the Radon derivative
$\frac{d\mu}{dm_2}(w) = \lim_{r \to 0+} \frac{\mu(B(w, r))}{|B(w, r)|}$
exists.

For measures on the line or on the circle their Poisson integrals and Radon derivatives (with respect to the one-dimensional Lebesgue measure) are very much related but not always equivalent. When the asymptotics of the Poisson integral and the ratio from the definition of the Radon derivative are different near a certain point, it usually means that the measure is "irregular" near that point. It is not difficult to show that if $\mu$ is absolutely continuous then at a Lebesgue point of its density function the Radon derivative of $\mu$ and the Poisson integral of $|\mu|$ (or $R_1|\mu|$ if $n > 1$) behave equivalently. Even for singular measures on the circle, if a measure possesses a certain symmetry near a point, then the same equivalent behavior takes place, as follows for instance from [1], Lemma 4.1. In fact, it is not easy to construct a measure whose Poisson integral and Radon derivative behave differently near a large set of points. The same can be said about the Riesz transform and the Radon derivative. Thus one may interpret our next result as evidence that, for a planar measure $\mu$, most points where $C^\mu = 0$ are "irregular."

Theorem 4.2. Let $\mu \in M$ and let $w = x + iy$ be a point of density (with respect to $m_2$) of the set $E = \{C^\mu = 0\}$. Then
$\mu(B(w, r)) = o(R_1|\mu|(x, y, r))$ as $r \to 0+$.

In view of the above discussion this implies

Corollary 4.3. If $w$ is a point of density of the set $E = \{C^\mu = 0\}$ such that the Radon derivative $d|\mu|/dm_2(w)$ exists and is nonzero, then
$\mu(B(w, r)) = o(|\mu|(B(w, r)))$ as $r \to 0+$ (16)
and $d\mu/dm_2(w) = 0$.

Since $m_2$-almost every point of a set is its density point, we also obtain the following version of the result from [14]:

Corollary 4.4. The set $E = \{C^\mu = 0\}$ has measure zero with respect to the absolutely continuous component of $\mu$.

Proof of Theorem 4.2. Without loss of generality, $w = 0$.
Choose a C∞0 test-function φ sup- ported in B := B(0, r), and such that 0 ≤ φ ≤ D/r2, |∇φ| ≤ A/r3 and φ dm2 = 1. Denote the complement of E by Ec. Then φdµ = 〈φ, ∂̄Cµ〉 = 〈∂̄φ, Cµ〉 = 〈χEc∂̄φ, Cµ〉 = χEc ∂̄φ dm2(z) ζ − z dµ(ζ) (17) All we need is to show that the last integral is small. Then, since the first integral in (17) is similar to the right-hand side of (16) we will complete the proof. The main idea for 12 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG the rest of the proof is to make the function F (ζ) = χEc ∂̄φ dm2(z) ζ−z ”small” by subtracting a linear combination of Cauchy kernels corresponding to points from E, which will not change its integral with respect to µ. Namely, let a, b ∈ B(0, r) ∩ E be any two points such that |a − b| > r. By the previous lemma for any z ∈ B(0, r) there exist constants A = A(z), B = B(z), of modulus at most 2, such that (15) holds with c = z. Integrating (15) with respect to χEc ∂̄φ dm2(z) we obtain that ∣∣∣∣ χEc ∂̄φ dm2(z) ζ − z ζ − a ζ − b ∣∣∣∣ < C ε(r)r |ζ |3 outside of B(0, 2r) for some constants A∗, B∗, where ε(r) = |B(0, r)∩Ec|/r2 = o(1) as r → 0. The constants satisfy |A∗|, |B∗| < 2 ε(r) Notice that if w ∈ E then ζ−wdµ = 0 by the definition of the set E. Hence, since a, b ∈ E, χEc ∂̄φ dm2(z) ζ − z dµ(ζ) = χEc ∂̄φ dm2(z) ζ − z ζ − a ζ − b dµ(ζ) B(0,2r) C\B(0,2r) = I1 + I2. For I2 we now have C\B(0,2r) χEc ∂̄φ dm2(z) ζ − z ζ − a ζ − b dµ(ζ) C\B(0,2r) ε(r)r |ζ |3 d|µ|(ζ) ≤ Cε(r)R1|µ|(0, 0, r). In I1 we estimate each summand separately. First, B(0,2r) χEc ∂̄φ dm2(z) ζ − z dµ(ζ) ∣∣∣∣ ≤ B(0,2r) |ζ − z| χEcdm2(z)d|µ|(ζ) |µ|(B(0, 2r)) ≤ C ε(r)R1|µ|(0, 0, r). To estimate the second and third summands of I1, recall that the only restriction on the choice of a, b ∈ B(0, r) ∩ E was that |a − b| > r. This condition will be satisfied, for instance, if a ∈ B1 = B(−56r, r) and b ∈ B2 = B(56r, r). 
If we average the modulus of the second summand over all choices of a ∈ B1 ∩ E, recalling that A∗ = A∗(a) always satisfies |A∗| ≤ 2 ε(r) , we get |B1 ∩ E| B(0,2r) A∗(a) ζ − a dµ(ζ) ∣∣∣∣dm2(a) ≤ |B1 ∩ E| B(0,2r) |A∗(a)| |ζ − a| dm2(a)d|µ|(ζ) r|µ|(B(0, 2r)) ≤ Cε(r)R1|µ|(0, 0, r). It is left to choose a ∈ B1 ∩ E for which the modulus is no greater than its average. The same can be done for b. The proof is finished. � UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 13 5. Reflectionless measures and Combs As was mentioned in the introduction, following [2], we will call a non-trivial continuous finite measure µ ∈M(C) reflectionless if Cµ(z) = 0 at µ-a.e. point z. Perhaps the simplest example of a reflectionless measure is the measure µ = 1 (1−x2)−1/2dx on [−1, 1], the harmonic measure of C \ [−1, 1] corresponding to infinity. The fact that µ is reflectionless can be verified through routine calculations or via the conformal map interpretation of the harmonic measure. It will also follow from a more general Theorem 5.4 below. At the same time, since Cµ∗ ≍ (1 − x2)−1/2 on [−1, 1], this simple example complements the statement of Theorem 2.1. Since the function (1−x2)−1/2 belongs to the ”weak” L1(|µ|), the summability condition for the Cauchy maximal function proves to be exact in its scale. In the rest of this section we discuss further examples and properties of positive reflec- tionless measures on the line. Let us recall that functions holomorphic in the upper half plane C+ and mapping it to itself (having non-negative imaginary part) are called Nevanlinna functions. Let M+(R) denote the class of finite positive measures compactly supported on R. The function f is a Nevanlinna function if and only if it has a form f(z) = az + b+ t2 + 1 ]dρ(t) , where ρ is a positive measure on R such that dρ(t) < ∞, a > 0, b ∈ R are constants. If the representing measure is from M+(R) and f(∞) = 0, the formula becomes simpler: f(z) = dµ(x) x−z . Definition. 
A simply connected domain $O$ is comb-like if it is a subset of a half-strip $\{w : \Im w \in (0, \pi), \Re w > q\}$ for some $q \in \mathbb{R}$, contains another half-strip $\{w : \Im w \in (0, \pi), \Re w > r\}$ for some $r \in \mathbb{R}$, and has the property that for any $w_0 = u_0 + iv_0 \in O$
the whole ray $\{w = u + iv_0,\ u \ge u_0\}$ lies in $O$. (18)
If in addition $H^1(\partial O \cap B(0, R)) < \infty$ for all finite $R$, we say that $O$ is a rectifiable comb-like domain.

Let $O$ be a rectifiable comb-like domain, $\Gamma = \partial O$. Then by the Besicovitch theory we know that for $H^1$-a.e. point $w \in \Gamma$ there exists an approximate tangent line to $\Gamma$, see [3] for details. We wish to consider rectifiable comb-like domains satisfying the following geometric property:
for a.e. $w \in \Gamma$ the approximate tangent line is either vertical or horizontal. (19)

It is not difficult to verify that for any conformal map $F : \mathbb{C}_+ \to O$, $O$ is comb-like if and only if $F'$ is a Cauchy potential of $\mu \in M_+(\mathbb{R})$: $F'(z) = \int \frac{d\mu(x)}{x - z}$. It is, therefore, natural to ask the following

Question. Which comb-like domains correspond to reflectionless measures $\mu \in M_+(\mathbb{R})$?

An answer would give a geometric description of reflectionless measures from $M_+(\mathbb{R})$. If, in addition, a comb-like domain is rectifiable, then the answer is given by

Theorem 5.1. 1) Rectifiable comb-like domains correspond exactly to those measures $\mu \in M_+(\mathbb{R})$ that are absolutely continuous with respect to $dx$ and satisfy
$\int \frac{d\mu(x)}{x - z} \in H^1_{loc}(\mathbb{C}_+)$. (20)
2) An absolutely continuous measure satisfying (20) is reflectionless if and only if the corresponding comb-like domain has the property (19).

Remarks. 1) Of course not every comb-like domain gives rise to a reflectionless measure from $M_+(\mathbb{R})$. Just take any comb-like domain which appears as $F(\mathbb{C}_+)$, where $F = \int^z \int \frac{d\mu(x)}{x - z}$ for a singular $\mu \in M_+(\mathbb{R})$. By a result from [9] singular measures cannot be reflectionless.
2) On the other hand, even if $\mu = g(x)\,dx$ is a reflectionless absolutely continuous measure, the corresponding conformal map $F = \int^z \int \frac{d\mu(x)}{x - z} : \mathbb{C}_+ \to O$ can be onto a non-rectifiable domain.
3) For non-rectifiable domains we have no criteria to recognize which ones correspond to reflectionless measures.
4) It is well known, and not difficult to prove, that the antiderivative of a Nevanlinna function is a conformal map, see for instance [4]. If $F = \int^z \int \frac{d\mu(x)}{x - z}$, $\mu \in M_+(\mathbb{R})$, then $\Im F(x)$ is an increasing function on $\mathbb{R}$ whose derivative in the sense of distributions is $\mu$. The image $F(\mathbb{C}_+)$ lies in the strip $\{\Im w \in (0, \pi\|\mu\|)\}$.

Theorem 5.1 will follow from Theorems 5.2 and 5.3 below.

Theorem 5.2. Let $F$ be a conformal map of $\mathbb{C}_+$ onto a rectifiable comb-like domain $O$. Then $F(z) = \int^z \int \frac{d\mu(x)}{x - z}$, $\mu \in M_+(\mathbb{R})$, $\mu \ll dx$. Also $\int \frac{d\mu(x)}{x - z} \in H^1_{loc}(\mathbb{C}_+)$. If in addition $O$ satisfies (19) then $\mu$ is reflectionless.

Proof. Without loss of generality $O \subset \{\Re z > 0\}$. Put $\Phi = e^F$. Then the image $\Phi(O)$ is the subdomain of the complement of the unit half-disk in $\mathbb{C}_+$ which is the union of rays $(R(\theta)e^{i\theta}, \infty)$. Consider the subdomain of the upper half-disk $D := \{z : 1/z \in \Phi(O)\}$. Define $G$ as the smallest open domain containing $D$ and its reflection $\overline{D} := \{\bar z : z \in D\}$. Then $G$ is a star-like domain inside the unit disk. The preimage of $G \cap \mathbb{R}$ under $\Phi$ is the union of two infinite rays $R_1 = [-\infty, a)$, $R_2 = (b, \infty]$, $a < b$. Therefore, by the reflection principle, $\mathbb{C} \setminus [a, b]$ is mapped conformally (by the extension of $\Phi$, which we will also denote by $\Phi$) onto a star-like domain. Since $\Phi : \mathbb{C}_+ \to G$, where $G$ is star-like, it is well known that $\arg\Phi(x + i\delta)$ is an increasing function of $x$, see [7]. We conclude that the argument of $\Phi$ is monotone. Therefore, $\Im F(x + i\delta)$ is monotone, and so $\Im f(x + i\delta)$ is positive, where $f = F'$. We see that $f = F'$ is a Nevanlinna function. From the structure of our comb-like domain, we conclude immediately that its representing measure $\mu$ has compact support, so we are in $M_+(\mathbb{R})$. Also, let us prove that $\mu \ll dx$.
The boundary of our comb is locally rectifiable. So f = F ′ belongs locally to the Hardy class H1(C+), [16]. Since ℑf is the Poisson integral of µ, ℑf = Pµ = (x− t)2 + y2 dµ(t), UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 15 and f is in H1(C+) locally, we conclude that µ = ℑfdx,ℑf ≥ 0 a.e., [16]. Now suppose that, in addition, O = F (C+) has the property (19). Let us recall that for a simply connected domain with rectifiable boundary Γ the restriction of the Hausdorff measure H1|Γ is equivalent to the harmonic measure ν on O. Therefore the tangent lines to Γ are either vertical or horizontal a.e. with respect to ν. The measure ν is the image of the harmonic measure λ of C+ which is equivalent to the Lebesgue measure on the line. We have a conformal map F (a continuous function up to the boundary of C+ because it is an anti-derivative of an H1loc-function) which pushes forward λ to ν. Call a point w0 ∈ Γ accessible from O if there exists a ray x0 + iy, 0 < y < 1, such that w0 = limy→0 F (x0 + iy). Almost every point of Γ (w.r. to ν) is accessible from O. For ν-a.e. accessible w0 ∈ Γ where the tangent line is vertical (horizontal) we can say that ℜF ′(x0) = 0 (ℑF ′(x) = 0). So R = E1 ∪ E2 ∪ E3, where |E3| = 0, |E1 ∩ E2| = 0, and E1 = {x ∈ R : ℜF ′(x) = 0}, E2 = {x ∈ R : ℑF ′(x) = 0}. We already know that the measure µ = ℑF ′(x)dx represents f(z) = F ′(z) = dµ(t) t−z . Notice that R\E2 · = ·. But we also know that boundary values exist dx-almost everywhere, i.e. dµ(t) t− x− iy = ℜF ′(x) = 0 for a.e. x ∈ E1 and therefore for µ-a.e. x ∈ E1. This means (see [16]) that dµ(x) = 0 µ-a.e. Definition. A simply connected rectifiable comb-like domain O is called a comb if its “left” boundary consists of countably many horizontal and vertical segments. 
A comb is called a straight comb if $O = \{w : \Im w \in (0, \pi), \Re w > 0\} \setminus S$, where the set $S$ is relatively closed with respect to the strip $\{w : \Im w \in (0, \pi), \Re w > 0\}$ and is the union of countably many horizontal intervals $R_n = (iy_n, l_n + iy_n]$. We require also that $\sum_n l_n < \infty$.

Example. Let $F$ be a conformal map of $\mathbb{C}_+$ onto a comb $O$. By our last theorem $F'(z) = \int \frac{d\mu(x)}{x - z}$, where $\mu \in M_+(\mathbb{R})$ is reflectionless: $C^\mu(x) = 0$ for $\mu$-a.e. $x$.

Definition. Let $E$ be a compact subset of the real line. Let $E$ have positive logarithmic capacity, so Green's function $G$ of $\mathbb{C} \setminus E$ exists. The domain $\mathbb{C} \setminus E$ is called a Widom domain if
$\sum_c G(c) < \infty$,
where the summation goes over all critical points $c$ of $G$ (we assume that $G$ is the Green's function with pole at infinity).

Example. Let $E$ be a compact subset of the real line of positive length. We assume that every point of $E$ is regular in the sense of Dirichlet for the domain $\mathbb{C} \setminus E$, and we also assume that $\mathbb{C} \setminus E$ is not a Widom domain. Such $E$ exist in abundance. We will see below that the harmonic measure $\omega$ of $\mathbb{C} \setminus E$ (with pole at infinity) is reflectionless. Consider $F(z) = \int^z \int \frac{d\omega(x)}{z - x}$ for $z \in \mathbb{C}_+$. It is easy to see that
$F(z) = G(z) + i\tilde G(z) + \mathrm{const}$,
where $\tilde G$ is the harmonic conjugate of $G$. This $F$ is a conformal map (see [4]) of $\mathbb{C}_+$ onto a domain $D$ lying in the strip $\{w : \Im w \in (0, \pi)\}$. It is easy to see that complementary intervals of $E$ will be mapped by $F$ onto straight horizontal segments on the boundary of $D$. Each finite complementary interval contains exactly one critical point $c$ of $G$, and clearly the length of the corresponding straight horizontal segment is $G(c)$ (this follows from the formula $F(z) = G(z) + i\tilde G(z) + \mathrm{const}$). As the domain $\mathbb{C} \setminus E$ was not a Widom domain, the sum of the lengths of the above-mentioned straight horizontal segments is infinite. So the domain $D$ is not rectifiable.
Therefore the reflectionless property of $\mu$ alone does not say anything about the rectifiability of the target domain of the conformal map $F(z) = \int^z \int \frac{d\mu(x)}{z - x}$.

Theorem 5.3. Let $\mu$ be an absolutely continuous positive measure on $\mathbb{R}$ and let $C^\mu \in H^1_{loc}(\mathbb{C}_+)$. Then $F(z) = \int^z \int \frac{d\mu(x)}{x - z}$ is a conformal map of $\mathbb{C}_+$ onto a rectifiable comb-like domain $O$. If $\mu$ is reflectionless then $O$ has the property (19).

Proof. Consider $F(z) = \int^z \int \frac{d\mu(x)}{x - z}$. Since $\mu$ is positive, it is a conformal map. If $\mu$ is such that $f(z) = C^\mu \in H^1_{loc}(\mathbb{C}_+)$ then $F(z) = \int^z f$ maps $\mathbb{C}_+$ onto a domain with locally rectifiable boundary (see [16]). If, in addition, $\mu = \Im f\,dx$ is reflectionless, then for a.e. point of $P := \{x \in \mathbb{R} : \Im f(x) > 0\}$ we have $\Re f(x) = 0$. The conformal map $F(z)$ is continuous up to the boundary of $\mathbb{C}_+$ and its boundary values $F(x)$ form a (locally) absolutely continuous function, $F'(x) = f(x)$ a.e. As at almost every point we have either $\Im F'(x) = 0$ or $\Re F'(x) = 0$, we conclude that $O = F(\mathbb{C}_+)$ has the property (19).

We also need the following definition.

Definition. A compact subset $E$ in $\mathbb{R}$ is called homogeneous if there exist $r, \delta > 0$ such that for all $x \in E$, $|E \cap (x - h, x + h)| \ge \delta h$ for all $h \in (0, r)$.

Example. Let $E \subset \mathbb{R}$ be a compact set of positive length. Let $\mu$ be a reflectionless measure supported on $E$, $\mu = g(x)\,dx$. Let in addition $E$ be a homogeneous set. Then $F(z) = \int^z \int \frac{d\mu(x)}{x - z}$ is a conformal map from $\mathbb{C}_+$ onto a rectifiable comb-like domain satisfying (19).

Proof. The Cauchy integral $C^{g\,dx}$ considered in $\mathbb{C} \setminus E$ will be in the Hardy class $H^1(\mathbb{C} \setminus E)$. In fact the reflectionless property of $g\,dx$ implies that its limits from $\mathbb{C}_\pm$ will both be integrable with respect to $dx|_E$. Now we use the homogeneity of $E$ and Zinsmeister's theorem [15] to conclude that $f(z) = C^{g\,dx}(z)$ is in the usual $H^1_{loc}(\mathbb{C}_+)$. Then the conformal map $F(z) = \int^z f$ maps $\mathbb{C}_+$ onto a rectifiable subdomain of a strip. We use Theorem 5.3 to get the rest of our example's claims.
� The simple example of a reflectionless measure mentioned at the beginning of this section, as well as many other explicit examples, are given by our next statement. UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 17 Theorem 5.4. Let E be a compact set of positive lenght, E ⊂ R. Let ω be a harmonic measure of C \ E with pole at infinity. Then ω is reflectionless. Example. The simplest comb is a strip {w : ℑw ∈ (0, π),ℜw > 0}. Consider F (z) = log(z + z2 − 1). It maps conformally C+ onto the strip. Its derivative f(z) = 1√z2−1 is x−z and dµ = 1−x2 is the harmonic measure of C \ [−1, 1]. Proof of Theorem 5.4. We need to show that Cω = 0 at ω-a.e point. From our definitions it can be seen, that Cω on the line coincides with the Hilbert transform of ω, which in its turn is asymptotically equivalent to the conjugate Poisson transform Qω. Thus all we need to establish is that Qω(x+ ih) = (x− y)2 + h2 dω(y) = ℜ dω(y) x− ih− y → 0 as h→ 0+ (21) for almost every x. Instead, we have that the Green’s function F (x) defined as F (x) = log |x− y|dω(y) + C∞, where C∞ is a real constant (Robin’s constant), is equal to 0 at every density point of E, see for example [8]. The idea of the proof is to show that Qω(x + iε) behaves like (F (x+ ε) + F (x− ε))/ε near almost every x. The technical details are as follows. Introduce φ(y) := |1− y| |1 + y| y2 + 1 , (22) φx,h(y) := y − x The function φ(y) decreases as 1/y2 at infinity, hence it is in L1(R, dx) and so are φx,h(y) with a uniform bound on the norm. However, these functions are not bounded, which makes it difficult to use them in our estimates. To finish the proof we will first obtain a bounded version of φx,h(y) through the following averaging procedure. Let ω = g(x)dx. Choose x to be a Lebesgue point of g and a density point of E. Fixing sufficiently small h > 0 we can find the set A(x, h) ⊂ (x−h, x−h/2)∪ (x+h/2, x+h) such • A(x, h) consists of density points of E, • |A(x, h)| ≥ h/2, • A(x, h) is symmetric with respect to x. 
Let Tx,h := T := {t ∈ (0, h) : x+ t ∈ A(x, h)}. Then |T | ≥ h/4. Now put ψx,h(y) := φx,t(y) dt . By (22) one can see immediately that |ψx,h| ≤ for some M > 0 and |ψx,h(y)| ≤ C , for |y| > h . (23) 18 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG Also, since ∫ φ dy = 0 . we have that ∫ ψx,h dy = 0 . Therefore, g(y)ψx,h(y) dy| = | (g(y)− g(x))ψx,h(y) dy| ≤ |g(y)− g(x)||ψx,h|(y) dy. Now notice that (23) implies that |ψx,h| is majorated by an approximate unity (for instance, by a constant multiple of the Poisson kernel corresponding to z = x + ih). Since x is a Lebesgue point for g(x), this means that the last integral tends to 0 as h→ 0. Looking at the definitions of Tx,h and ψx,h(y) we can see that g(y)ψx,h(y) dy = |Tx,h| (F (x+ t)− F (x− t))−ℜ g(y)dy x− it− y where F (x) is the Green’s function. As we mentioned before, F is zero at the density points of E. We conclude that |Tx,h| g(y)dy x− it− y → 0, h→ 0 + . for a.e. x on the Borel support of g. Since the Cauchy integral of g has a limit a.e. we obtain g(y)dy x− ih− y → 0, h→ 0 + . Remark. All reflectionless measures on R discussed in this section, including those provided by Theorem 5.4 are absolutely continuous with respect to Lebesgue measure. One may wonder if there exist singular reflectionless measures. The answer is negative. More generally, as follows from a theorem from [9], if principal values of the Hilbert transform exist µ-a.e. for a continuous µ ∈M(R) then µ << dx . References [1] A. B. Alexandrov, J. M. Anderson, A. Nicolau. Inner functions, Bloch spaces and symmetric measures, Proc. London Math. Soc. (3) 79 (1999), no. 2, 318–352. [2] E.D. Belokolos, A.I. Bobenko, V.Z. Enol’skii, A.R. Its and V.B. Matveev, Algebro-Geometric Approach to Nonlinear Integrable Equations, Springer, Berlin (1994). [3] P. Mattila, Geometry of sets and measures in Euclidean spaces. Fractals and rectifiability. Cambridge Studies in Advanced Mathematics, 44. Cambridge University Press, Cambridge, 1995. xii+343 pp. 
[4] P. L. Duren, Univalent Functions, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 259. Springer-Verlag, New York, 1983.
[5] L. C. Evans, R. Gariepy, Measure Theory and Fine Properties of Functions, Studies in Advanced Mathematics. CRC Press, Boca Raton, FL, 1992.
[6] T. W. Gamelin, Uniform Algebras, Prentice-Hall, Inc., Englewood Cliffs, N. J., 1969.
[7] G. M. Golusin, Geometric theory of functions. Hochschulbücher für Mathematik, Bd. 31. VEB Deutscher Verlag der Wissenschaften, Berlin, 1957. xii+438 pp.
[8] W. K. Hayman, P. B. Kennedy, Subharmonic Functions, vol. 1, Academic Press, London-New York, 1976.
[9] P. Jones, A. Poltoratski, Asymptotic growth of Cauchy transforms, Ann. Acad. Sci. Fenn. Math., 2004.
[10] M. Krein, A. Nudelman, The Markov moment problem and extremal problems. Ideas and problems of P. L. Čebyšev and A. A. Markov and their further development. Translated from the Russian by D. Louvish. Translations of Mathematical Monographs, Vol. 50. American Mathematical Society, Providence, R.I., 1977. v+417 pp.
[11] M. Melnikov, J. Verdera, A geometric proof of the L2 boundedness of the Cauchy integral on Lipschitz graphs, Internat. Math. Res. Notices 1995, no. 7, 325–331.
[12] F. Nazarov, S. Treil, A. Volberg, Cauchy integral and Calderón-Zygmund operators on nonhomogeneous spaces, Int. Math. Res. Not. 15 (1997), 703–726.
[13] X. Tolsa, L2-boundedness of the Cauchy integral operator for continuous measures, Duke Math. J. 98 (1999), 269–304.
[14] X. Tolsa, J. Verdera, May the Cauchy transform of a non-trivial finite measure vanish on the support of the measure? Ann. Acad. Sci. Fenn. Math. 31 (2006), no. 2, 479–494.
[15] M. Zinsmeister, Espaces de Hardy et domaines de Denjoy. (French) [Hardy spaces and Denjoy domains] Ark. Mat. 27 (1989), no. 2, 363–378.
[16] I. Privalov, Boundary properties of analytic functions, Gosudarstv. Izdat. Tehn.-Teor.
Lit., Moscow- Leningrad, 1950. 336 pp. Department de Matematiques, Uneversitat Autonoma de Barcelona, 08193 Bellaterra, Barcelona, Spain E-mail address : mark.melnikov@gmail.com Texas A& M University, Department of Mathematics, College Station, TX 77843, USA E-mail address : alexeip@math.tamu.edu Dept. Math., Michigan State Univ., East Lansing MI 48823, USA, and, School of Math., University of Edinburgh, Edinburgh UK EH9 EJ6 E-mail address : sashavolberg@yahoo.com ABSTRACT If $\mu$ is a finite complex measure in the complex plane $\C$ we denote by $C^\mu$ its Cauchy integral defined in the sense of principal value. The measure $\mu$ is called reflectionless if it is continuous (has no atoms) and $C^\mu=0$ at $\mu$-almost every point. We show that if $\mu$ is reflectionless and its Cauchy maximal function $C^\mu_*$ is summable with respect to $|\mu|$ then $\mu$ is trivial. An example of a reflectionless measure whose maximal function belongs to the "weak" $L^1$ is also constructed, proving that the above result is sharp in its scale. We also give a partial geometric description of the set of reflectionless measures on the line and discuss connections of our results with the notion of sets of finite perimeter in the sense of De Giorgi. <|endoftext|><|startoftext|> Introduction Let Σnk,d ⊂ P(H0(P2,OP2(n))) := PN , with N = n(n+3) , be the closure, in the Zariski’s topology, of the locally closed set of reduced and irreducible plane curves of degree n with k cusps and d nodes. Let Σ ⊂ Σnk,d be an irreducible component of the variety Σnk,d. Denoting by Mg the moduli space of smooth curves of genus − k − d, it is naturally defined a rational map ΠΣ : Σ 99K Mg, sending the general point [Γ] ∈ Σ0 to the isomorphism class of the normalization of the curve Γ ⊂ P2 corresponding to [Γ]. We say that ΠΣ is the moduli map of Σ and we set number of moduli of Σ := dim(ΠΣ(Σ)). We say that Σ has general moduli if ΠΣ is dominant. 
Otherwise, we say that $\Sigma$ has special moduli or that $\Sigma$ has finite number of moduli. By Lemma 2.2 of [4], we know that the dimension of the general fibre of $\Pi_\Sigma$ is at least equal to $\max(8, 8 + \rho - k)$, where $\rho := \rho(2, g, n) = 3n - 2g - 6$ is the Brill-Noether number of linear series of degree $n$ and dimension 2 on a smooth curve of genus $g$. It follows that, if $\Sigma$ has the expected dimension equal to $3n + g - 1 - k$ and $g \ge 2$, then
(1) $\dim(\Pi_\Sigma(\Sigma)) \le \min(\dim(M_g), \dim(M_g) + \rho - k)$.

Definition 1.1. We say that $\Sigma$ has the expected number of moduli if equality holds in (1).

In particular, we expect that, if $\rho - k \le 0$, then on the normalization curve $C$ of the curve $\Gamma \subset \mathbb{P}^2$ corresponding to the general point $[\Gamma] \in \Sigma$, there exists only a finite number of linear series of degree $n$ and dimension 2 mapping $C$ to a plane curve with $d$ nodes and $k$ cusps as singularities and corresponding to a point of $\Sigma$ (see the proof of Lemma 2.2 of [4]). For a deeper discussion and a list of known results about the moduli problem of $\Sigma^n_{k,d}$ we refer to Sections 1 and 2 of [4] and related references. In particular, in [4] we have found sufficient conditions in order that an irreducible component $\Sigma$ of $\Sigma^n_{k,d}$ has finite and expected number of moduli. If $\Sigma$ verifies these conditions then $\rho(2, g, n) \le 0$. Finally, in [4] we constructed examples of families of plane curves with nodes and cusps with finite and expected number of moduli.

Key words and phrases. Number of moduli, sextics with six cusps, plane curves, Zariski pairs.
http://arxiv.org/abs/0704.0622v1
CONCETTINA GALATI

In this paper we consider the particular case of the variety $\Sigma^6_{6,0}$ of irreducible sextics with six cusps. It was proved by Zariski (see [8]) that $\Sigma^6_{6,0}$ has at least two irreducible components. One of them is the parameter space $\Sigma_1$ of the family of plane curves of equation
$f_2^3(x_0, x_1, x_2) + f_3^2(x_0, x_1, x_2) = 0$,
where $f_2$ and $f_3$ are homogeneous polynomials of degree two and three respectively.
The general point of $\Sigma_1$ corresponds to an irreducible sextic with six cusps on a conic as singularities. Moreover, $\Sigma^6_{6,0}$ contains at least one irreducible component $\Sigma_2$ whose general element corresponds to a sextic with six cusps not on a conic as singularities and containing in its closure the variety $\Sigma^6_{9,0}$ of elliptic sextics with nine cusps. Recently, A. Degtyarev has proved that $\Sigma_1$ and $\Sigma_2$ are the unique irreducible components of $\Sigma^6_{6,0}$ (see [1]). The moduli number of $\Sigma_1$ and $\Sigma_2$ cannot be calculated by using the result of [4]. Indeed, in this case $\rho(2, 4, 6) = 4 > 0$ and then the general element of every irreducible component of $\Sigma^6_{6,0}$ does not verify the hypotheses of Proposition 4.1 of [4]. On the contrary, it is easy to verify that, if $\Gamma \subset \mathbb{P}^2$ is the plane curve corresponding to the general element of one of the irreducible components of $\Sigma^6_{6,0}$ and $C$ is the normalization curve of $\Gamma$, then the map $\mu_{o,C}$ is injective. But, in contrast with the nodal case, this information is not useful in order to study the moduli problem of $\Sigma^n_{k,d}$ (see [6] and Remark 4.2 of [4]). In Proposition 2.2 and Corollary 2.4, we prove that $\Sigma_2$ has the expected number of moduli equal to seven. Moreover, we show that there exists a stratification
$\Sigma^6_{9,0} \subset \Sigma' \subset \tilde\Sigma \subset \Sigma_2$,
where $\Sigma'$ and $\tilde\Sigma$ are respectively irreducible components of $\Sigma^6_{8,0}$ and $\Sigma^6_{7,0}$ with expected number of moduli. Finally, in Corollary 2.8, we prove that also $\Sigma_1$ has the expected number of moduli by using that every element of $\Sigma_1$ is the branch locus of a triple plane.

2. On the number of moduli of complete irreducible families of plane sextics with six cusps

First of all we want to find sufficient conditions in order that, if an irreducible component $\Sigma$ of $\Sigma^n_{k,d}$ has the expected number of moduli, then every irreducible component $\Sigma'$ of $\Sigma^n_{k',d'}$ containing $\Sigma$ has the expected number of moduli. In Corollary 4.7 of [4] we considered this problem under the hypothesis that $\Sigma$ has the expected dimension and $\rho(2, g, n) \le 0$.
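The numbers quoted above for sextics with six cusps can be reproduced with a few lines of arithmetic. The Python sketch below uses the formula $\rho(2, g, n) = 3n - 2g - 6$ and the bound (1) from the introduction, together with $\dim(M_g) = 3g - 3$ for $g \ge 2$. It also assumes the standard genus formula $g = \binom{n-1}{2} - k - d$ for an irreducible plane curve of degree $n$ with $k$ cusps and $d$ nodes. The function names are hypothetical.

```python
def geometric_genus(n: int, k: int, d: int) -> int:
    """Genus of the normalization of a plane curve of degree n with k cusps, d nodes.

    Assumes the standard formula g = (n-1)(n-2)/2 - k - d.
    """
    return (n - 1) * (n - 2) // 2 - k - d


def brill_noether_rho(g: int, n: int) -> int:
    """Brill-Noether number rho(2, g, n) = 3n - 2g - 6 of nets of degree n."""
    return 3 * n - 2 * g - 6


def expected_moduli(n: int, k: int, d: int) -> int:
    """Right-hand side of (1): min(dim M_g, dim M_g + rho - k), for g >= 2."""
    g = geometric_genus(n, k, d)
    if g < 2:
        raise ValueError("dim M_g = 3g - 3 is only used here for g >= 2")
    dim_mg = 3 * g - 3
    return min(dim_mg, dim_mg + brill_noether_rho(g, n) - k)


if __name__ == "__main__":
    n, k, d = 6, 6, 0  # sextics with six cusps and no nodes
    g = geometric_genus(n, k, d)
    print(g, brill_noether_rho(g, n), expected_moduli(n, k, d))
```

For $(n, k, d) = (6, 6, 0)$ this gives $g = 4$, $\rho = 4 > 0$ and $\rho - k = -2$, so the bound in (1) is $9 - 2 = 7$, in agreement with the expected number of moduli stated for $\Sigma_2$.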
Now we are interested in the case $\rho > 0$. We need the following local result. Let
$\mathcal{C} = \{(a, b, x, y) \mid y^2 = x^3 + ax + b\} \subset \mathbb{C}^2 \times \mathbb{A}^2$
be the versal deformation family of an ordinary cusp (see [3] for the definition and properties of the versal deformation family of a plane singularity). We recall that the general curve of this family is smooth. The locus $\Delta$ of $\mathbb{C}^2$ of the pairs $(a, b)$ such that the corresponding curve is singular has equation $27b^2 = -4a^3$. For $(a, b) \in \Delta$ and $(a, b) \neq (0, 0)$, the corresponding curve has a node and no other singularities, whereas $(0, 0)$ is the only point parametrizing a cuspidal curve.

ON THE NUMBER OF MODULI OF PLANE SEXTICS WITH SIX CUSPS

Lemma 2.1 ([3], page 129). Let $G \to \mathbb{C}^2$ be a two parameter family of curves of genus $g \ge 2$, whose general fibre is stable and which is locally given by $y^2 = x^3 + ax + b$, with $(a, b) \in \mathbb{C}^2$, and let $D \subset \mathbb{C}^2$ be a curve passing through $(0, 0)$ and not tangent to the axis $b = 0$ at $(0, 0)$. Then the $j$-invariant of the elliptic tail of the curve which corresponds to the stable limit of $G_{(0,0)}$, with respect to the curve $D$, does not depend on $D$. Otherwise, for every $j_0 \in \mathbb{C}$, there exists a curve $D_{j_0} \subset \mathbb{C}^2$ passing through $(0, 0)$ and tangent to the axis $b = 0$ at this point, such that the elliptic tail of the stable reduction of $G_{(0,0)}$ with respect to $D_{j_0}$ has $j$-invariant equal to $j_0$.

Proposition 2.2. Let $\Sigma \subset \Sigma^n_{k,d}$, with $k < 3n$, be an irreducible component of $\Sigma^n_{k,d}$. Let $g$ be the geometric genus of the plane curve corresponding to the general element of $\Sigma$. Suppose that $g \ge 2$, $\rho(2, g, n) - k \le 0$ and $\Sigma$ has the expected number of moduli equal to $3g - 3 + \rho - k$. Then, every irreducible component $\Sigma'$ of $\Sigma^n_{k',d'}$, with $k' = k - 1$ and $d = d'$ or $k = k'$ and $d' = d - 1$, such that $\Sigma \subset \Sigma'$, has expected number of moduli.

Proof. First we consider the case $k' = k - 1$ and $d = d'$. Let $[\Gamma] \in \Sigma$ be a general point and let $q_1, \dots, q_k$ be the cusps of $\Gamma$. It is well known that, since $k < 3n$, then $[\Gamma] \in \Sigma^n_{k-1,d}$.
In particular, for every fixed cusp $q_i$ of $\Gamma$ there exists an irreducible analytic branch $S_i$ of $\Sigma^n_{k-1,d}$ passing through the point $[\Gamma]$ and whose general point corresponds to a plane curve $\Gamma'$ of degree $n$ with $d$ nodes and $k - 1$ cusps specializing to the singular points of $\Gamma$ different from $q_i$, as $\Gamma'$ specializes to $\Gamma$. Moreover, it is possible to prove that every $S_i$ is smooth at the point $[\Gamma]$, see [7] or Chapter 2 of [5]. Let $\Sigma'$ be one of the irreducible components of $\Sigma^n_{k-1,d}$ containing $\Sigma$. Notice that the general element of $\Sigma'$ corresponds to a curve of genus $g' = g + 1$. Since
$\rho(2, g', n) - k' = 3n - 2g - 2 - 6 - k + 1 = \rho(2, g, n) - k - 1 < 0$,
in order to prove the proposition it is enough to show that the general fibre of the moduli map $\Pi_{\Sigma'} : \Sigma' \dashrightarrow M_{g+1}$ has dimension equal to eight. Let us notice that the map $\Pi_{\Sigma'}$ is not defined at the general element $[\Gamma]$ of $\Sigma$. More precisely, let $\gamma \subset S_i \subset \Sigma'$ be a curve passing through $[\Gamma]$ and not contained in $\Sigma$. Let $\mathcal{C} \to \gamma$ be the tautological family of plane curves parametrized by $\gamma$. Let $\mathcal{C}' \to \gamma$ be the family obtained from $\mathcal{C} \to \gamma$ by normalizing the total space. The general fibre of $\mathcal{C}' \to \gamma$ is a smooth curve of genus $g + 1$, while the special fibre $\mathcal{C}'_0 := \Gamma'$ is the partial normalization of $\Gamma$ obtained by smoothing all the singular points of $\Gamma$, except the marked cusp $q_i$. If we restrict the moduli map $\Pi_{\Sigma'}$ to $\gamma$, we get a regular map which associates to $[\Gamma]$ the point corresponding to the stable reduction of $\Gamma$ with respect to the family $\mathcal{C}' \to \gamma$, which is the union of the normalization curve $C$ of $\Gamma$ and an elliptic curve, intersecting at the point $q \in C$ which maps to the cusp $q_i \in \Gamma$. Now, let $G \subset \Sigma' \times M_{g+1}$ be the graph of $\Pi_{\Sigma'}$, let $\pi_1 : G \to \Sigma'$ and $\pi_2 : G \to M_{g+1}$ be the natural projections and let $U \subset \Sigma$ be the open set parametrizing curves of degree $n$ and genus $g$ with exactly $k$ cusps and $d$ nodes as singularities.
From what we observed before, if we denote by $\Pi_{\Sigma'}(\Sigma)$ the Zariski closure in $\mathcal{M}_{g+1}$ of $\pi_2(\pi_1^{-1}(U))$, then $\Pi_{\Sigma'}(\Sigma)$ is contained in the divisor $\Delta_1 \subset \mathcal{M}_{g+1}$ whose points are isomorphism classes of reducible curves which are the union of a smooth curve of genus $g$ and an elliptic curve, meeting at a point. Denoting by $\Pi_\Sigma : \Sigma \to \mathcal{M}_g$ the moduli map of $\Sigma$, the rational map $\Delta_1 \dashrightarrow \mathcal{M}_g$ which forgets the elliptic tail restricts to a rational dominant map $q : \Pi_{\Sigma'}(\Sigma) \dashrightarrow \Pi_\Sigma(\Sigma)$. The dimension of the general fibre of $q$ is at most two. Since, by hypothesis, the dimension of the fibre of the moduli map $\Pi_\Sigma$ is eight, there exists only a finite number of $g^2_n$ on $C$, ramified at $k$ points, which map $C$ to a plane curve $D$ such that the associated point $[D] \in \mathbb{P}^{n(n+3)/2}$ belongs to $\Sigma$. In particular, the set of points $x$ of $C$ such that there is a $g^2_n$ with $k$ simple ramification points, one of which is at $x$, is finite. So, the dimension of the general fibre of $q$ is at most one.

In order to decide whether the general fibre of $q$ has dimension zero or one, we have to understand how the $j$-invariant of the elliptic tail of the stable reduction of $\Gamma'$ with respect to the family $\mathcal{C}' \to \gamma$ depends on $\gamma$. Let $\mathcal{C} \to \mathbb{C}^2$ be the étale versal deformation family of the cusp. By versality, for every fixed cusp $q_i$ of $\Gamma$, there exist étale neighborhoods $U \xrightarrow{u} \mathbb{P}^{n(n+3)/2}$ of $[\Gamma]$ in $\mathbb{P}^{n(n+3)/2}$, $V \xrightarrow{v} \mathbb{C}^2$ of $(0, 0)$ in $\mathbb{C}^2$ and $\mathcal{U}_i$ of $q_i$ in the tautological family $\mathcal{U} \to \mathbb{P}^{n(n+3)/2}$, with a morphism $f : U \to V$ such that the family $\mathcal{U}_i \to U$ is the pullback, with respect to $f$, of the restriction to $V$ of the versal family. By the properties of the étale versal deformation family of a plane singularity (see [2]), $f^{-1}((0, 0))$ is an étale neighborhood of $[\Gamma]$ in the (smooth) analytic branch of $\Sigma^n_{1,0}$ whose general element corresponds to an irreducible plane curve with only one cusp in a neighborhood of the cusp $q_i$ of $\Gamma$. So, $\dim(f^{-1}((0, 0))) = \frac{n(n+3)}{2} - 2$ and the map $f$ is surjective.
Moreover, if $g$ is the restriction of $f$ to $u^{-1}(\Sigma')$, then $g$ is also surjective. Indeed, $g^{-1}((0, 0)) = f^{-1}((0, 0)) \cap u^{-1}(\Sigma') = u^{-1}(\Sigma)$ and, since $k < 3n$, we have $\dim(\Sigma) = 3n + g - 1 - k = \dim(\Sigma') - 2$, so $g$ is surjective. By using lemma 2.1, it follows that the general fibre of the natural map $\Pi_{\Sigma'}(\Sigma) \to \Pi_\Sigma(\Sigma)$ has dimension exactly equal to one. Therefore, $\dim(\Pi_{\Sigma'}(\Sigma)) = \dim(\Pi_\Sigma(\Sigma)) + 1 = 3g - 3 + \rho(2, g, n) - k + 1 = 3(g + 1) - 3 + \rho(2, g + 1, n) - k$. By using that $\dim(\Pi_{\Sigma'}(\Sigma')) \geq \dim(\Pi_{\Sigma'}(\Sigma)) + 1 = 3(g + 1) - 3 + \rho(2, g + 1, n) - k + 1$, and by recalling that, by lemma 2.2 of [4], it is always true that $\dim(\Pi_{\Sigma'}(\Sigma')) \leq 3(g + 1) - 3 + \rho(2, g + 1, n) - k + 1$, the statement is proved in the case $k' = k - 1$ and $d' = d$.

Suppose, now, that $k = k'$ and $d' = d - 1$. Also in this case $\Sigma$ is not contained in the regularity domain of $\Pi_{\Sigma'}$. More precisely, if $[\Gamma] \in \Sigma$ is general, then $\Pi_{\Sigma'}([\Gamma])$ consists of a finite number of points, corresponding to the isomorphism classes of the partial normalizations of $\Gamma$ obtained by smoothing all the singular points of $\Gamma$ except for a node. Then $\Pi_{\Sigma'}(\Sigma)$ is contained in the divisor $\Delta_0$ of $\mathcal{M}_{g+1}$ parametrizing the isomorphism classes of the analytic curves of arithmetic genus $g + 1$ with a node and no other singularities. The natural map $\Delta_0 \dashrightarrow \mathcal{M}_g$ sending the general point $[C']$ of $\Delta_0$ to the isomorphism class of the normalization of $C'$ restricts to a rational dominant map $q : \Pi_{\Sigma'}(\Sigma) \dashrightarrow \Pi_\Sigma(\Sigma)$. Since we suppose that $\Sigma$ has the expected number of moduli and $\rho(2, g, n) - k \leq 0$, if $C$ is the normalization of the plane curve corresponding to the general element of $\Sigma$, then the set $S$ of the linear series of dimension 2 and degree $n$ on $C$ with $k$ simple ramification points, mapping $C$ to a plane curve $D$ such that the associated point $[D]$ in the Hilbert scheme belongs to $\Sigma$, is finite. We deduce that the set $S'$ of the pairs of points $(p_1, p_2)$ of $C$ for which there is a $g^2_n \in S$ whose associated morphism maps $p_1$ and $p_2$ to the same point of the plane is also finite.
So, $q^{-1}([C])$ is also finite and $\dim(\Pi_{\Sigma'}(\Sigma)) = \dim(\Pi_\Sigma(\Sigma))$. It follows that $\dim(\Pi_{\Sigma'}(\Sigma')) \geq 3g - 3 + 3n - 2g - 6 - k + 1 = 3(g + 1) - 3 + 3n - 2(g + 1) - 6 - k$.

Remark 2.3. Notice that the arguments used before to prove proposition 2.2 do not work if the general fibre of the moduli map of $\Sigma$ has dimension bigger than eight. Indeed, in this case, the dimension of the general fibre of the map $\Pi_{\Sigma'}(\Sigma) \dashrightarrow \Pi_\Sigma(\Sigma)$ could be bigger than one if $k' = k - 1$ and $d = d'$, or bigger than zero if $k' = k$ and $d' = d - 1$.

Corollary 2.4. There exists at least one irreducible component $\Sigma_2$ of $\Sigma^6_{6,0}$ having the expected number of moduli, equal to $\dim(\mathcal{M}_4) - 2$, and whose general element corresponds to a sextic with six cusps not on a conic.

Remark 2.5. As we already observed in the previous section, by [1] $\Sigma_2$ is the only component of $\Sigma^6_{6,0}$ parametrizing sextics with six cusps not on a conic.

Proof of corollary 2.4. Let $\Sigma^6_{9,0}$ be the variety of elliptic plane curves of degree six with nine cusps and no other singularities. It is non-empty and irreducible because, by the Plücker formulas, the family of dual curves is $\Sigma^3_{0,0} \simeq \mathbb{P}^9$, which is irreducible and non-empty. Moreover, if we compose a holomorphic map $\phi : C \to \mathbb{P}^2$ from a complex torus $C$ to a smooth plane cubic with the natural map $\phi(C) \to \phi(C)^*$, where we denote by $\phi(C)^*$ the dual curve of $\phi(C)$, we get a morphism from $C$ to a plane sextic with nine cusps. Therefore, the number of moduli of $\Sigma^6_{9,0}$ is equal to the number of moduli of $\Sigma^3_{0,0}$, which is one. Since $9 < 3n = 18$, there is at least one irreducible component $\Sigma'$ of $\Sigma^6_{8,0}$ containing $\Sigma^6_{9,0}$. Let $\Pi_{\Sigma'} : \Sigma' \dashrightarrow \mathcal{M}_2$ be the moduli map of $\Sigma'$ and let $G \subset \Sigma' \times \mathcal{M}_2$ be its graph.
If we denote by $\pi_1 : G \to \Sigma'$ and $\pi_2 : G \to \mathcal{M}_2$ the natural projections, by $U$ the open set of $\Sigma^6_{9,0}$ parametrizing sextics of genus one with nine cusps and by $\Pi_{\Sigma'}(\Sigma^6_{9,0})$ the Zariski closure in $\mathcal{M}_2$ of $\pi_2(\pi_1^{-1}(U))$, then, by arguing as in the first part of the proof of proposition 2.2, we have a dominant map $\Pi_{\Sigma'}(\Sigma^6_{9,0}) \dashrightarrow \mathcal{M}_1$ whose general fibre has dimension one. We conclude that $\dim(\Pi_{\Sigma'}(\Sigma')) \geq \dim(\Pi_{\Sigma'}(\Sigma^6_{9,0})) + 1 = 3$ and so the moduli map of $\Sigma'$ is dominant, as one expects, because $\rho(2, 2, 6) - 8 = 18 - 4 - 6 - 8 = 0$. Let $D$ be the plane sextic corresponding to the general point of $\Sigma'$. By Bezout's theorem, the eight cusps $P_1, \dots, P_8$ of $D$ do not belong to a conic and, however we choose five cusps of $D$, no four of them lie on a line. Then, let $C_2$ be the unique conic containing $P_1, \dots, P_5$. There exists at least one cusp, say $P_6$, which does not belong to $C_2$. Since $8 < 3n = 18$, there exists a family of plane sextics $\mathcal{D} \to \Delta$ whose special fibre is $D$ and whose general fibre has a cusp in a neighborhood of every cusp of $D$ different from $P_7$ and no further singularities. By proposition 2.2, the curve $\Delta$ is contained in an irreducible component of $\Sigma^6_{7,0}$ with the expected number of moduli. By repeating the same argument for the general fibre of the family $\mathcal{D} \to \Delta$ we get an irreducible component $\Sigma_2$ of $\Sigma^6_{6,0}$ with the expected number of moduli and whose general element parametrizes a sextic with six cusps not on a conic. □

Now we consider the irreducible component $\Sigma_1$ of $\Sigma^6_{6,0}$ parametrizing plane curves of equation $f_3^2(x_0, x_1, x_2) + f_2^3(x_0, x_1, x_2) = 0$, where $f_2$ is a homogeneous polynomial of degree two and $f_3$ is a homogeneous polynomial of degree three. The general element of $\Sigma_1$ corresponds to an irreducible plane curve of degree six with six cusps on a conic. We want to show that $\Sigma_1$ has the expected number of moduli, equal to $12 - 3 + \rho(2, 4, 6) - 6 = 7 = \dim(\mathcal{M}_4) - 2$. Equivalently, we want to show that the general fibre of the moduli map $\Sigma_1 \dashrightarrow \mathcal{M}_4$ has dimension equal to eight.

Lemma 2.6.
Let $\Gamma_2 : f_2(x_0, x_1, x_2) = 0$ and $\Gamma_3 : f_3(x_0, x_1, x_2) = 0$ be a smooth conic and a smooth cubic intersecting transversally. Then the plane curve $\Gamma : f_3^2(x_0, x_1, x_2) - f_2^3(x_0, x_1, x_2) = 0$ is an irreducible sextic of genus four with six cusps at the intersection points of $\Gamma_2$ and $\Gamma_3$ as singularities. The curve $\Gamma$ is the projection of a canonical curve $C \subset \mathbb{P}^3$ from a point $p \in \mathbb{P}^3$ which is contained in six tangent lines to $C$. Moreover, for every point $q \in \mathbb{P}^3 - C$ such that the plane projection $\pi_q(C)$ of $C$ from $q$ is a sextic with six cusps on a conic of equation $g_3^2(x_0, x_1, x_2) - g_2^3(x_0, x_1, x_2) = 0$, where $g_3$ and $g_2$ are two homogeneous polynomials of degree three and two respectively, there exists a cubic surface $S_3 \in |\mathcal{I}_{C|\mathbb{P}^3}(3)|$, containing $C$, such that the plane curve $\pi_q(C)$ is the branch locus of the projection $\pi_q : S_3 \to \mathbb{P}^2$.

Remark 2.7. Notice that, by [1], every irreducible sextic with six cusps on a conic as singularities has equation given by $(f_2(x_0, x_1, x_2))^3 + (f_3(x_0, x_1, x_2))^2 = 0$, with $f_2$ and $f_3$ homogeneous polynomials of degree two and three. In other words, all the sextics with six cusps on a conic as singularities are parametrized by points of $\Sigma_1$. Another proof of this result has been provided to us by G. Pareschi.

Proof of lemma 2.6. Let $f(x_0, x_1, x_2) = f_3^2(x_0, x_1, x_2) - f_2^3(x_0, x_1, x_2) = 0$ be the equation of $\Gamma$. From the relation $f_3(x) = \pm f_2(x)\sqrt{f_2(x)}$, we deduce that (x) = ±2∂f2 f2(x) and hence (x) = 2 (x)f3(x)− 3f2(x)2 (x) = −f2(x)2 By using that the conic $\Gamma_2 : f_2 = 0$ is smooth, it follows that, if a point $x \in \Gamma$ is singular, then $x \in \Gamma_2$ and hence $x \in \Gamma_3 \cap \Gamma_2$. On the other hand, again from (2), if $x \in \Gamma_2 \cap \Gamma_3$, then $x$ is a singular point of $\Gamma$. Hence, the singular locus of $\Gamma$ coincides with $\Gamma_3 \cap \Gamma_2$. Let $x$ be a singular point of $\Gamma$.
If $p_1(x, y) + \text{terms of degree two} = 0$ and $q_1(x, y) + \text{terms of degree} \geq \text{two} = 0$ are respectively affine equations of $\Gamma_2$ and $\Gamma_3$ at $x$, then the affine equation of $\Gamma$ at $x$ is given by $q_1(x, y)^2 - p_1(x, y)^3 + \text{terms of degree} \geq \text{four} = 0$. Since $\Gamma_2$ and $\Gamma_3$ intersect transversally, $q_1(x, y)$ does not divide $p_1(x, y)$ and hence $\Gamma$ has an ordinary cusp at $x$.

Let now $\phi : C \to \Gamma$ be the normalization of $\Gamma$. We recall that the cubics passing through the six cusps of $\Gamma$ cut out on $C$ the complete canonical series $|\omega_C|$. Since the cusps of $\Gamma$ are contained in the conic $\Gamma_2 \subset \mathbb{P}^2$ of equation $f_2 = 0$, the lines of $\mathbb{P}^2$ cut out on $C$ a subseries $g^2_6 \subset |\omega_C|$ of dimension two of the canonical series. Moreover, if we still denote by $C$ a canonical model of $C$ in $\mathbb{P}^3$, then the linear series $g^2_6$ is cut out on $C$ in $\mathbb{P}^3$ by a two-dimensional family of hyperplanes passing through a point $p \in \mathbb{P}^3 - C$. If we project $C$ from $p$ we get a plane curve projectively equivalent to $\Gamma$. Since $\Gamma$ has six cusps as singularities, we deduce that there are six tangent lines to $C$ passing through $p$.

To see that $\Gamma$ is the branch locus of a triple plane, let $S_3 \subset \mathbb{P}^3$ be the cubic surface of equation $F_3(x_0, \dots, x_3) = x_3^3 - 3f_2(x_0, x_1, x_2)x_3 + 2f_3(x_0, x_1, x_2) = 0$. If $p = [0, 0, 0, 1]$, then, by the Implicit Function Theorem, the ramification locus of the projection $\pi_p : S_3 \to \mathbb{P}^2$ is given by the intersection of $S_3$ with the quadric $S_2$ of equation $x_3^2 - f_2(x_0, x_1, x_2) = 0$. Now, if $x = [x_0, x_1, x_2, x_3] \in S_3 \cap S_2$, then $x_3 = \pm\sqrt{f_2(x_0, x_1, x_2)}$. By substituting in the equation of $S_3$, we find that the branch locus of the projection $\pi_p : S_3 \to \mathbb{P}^2$ coincides with the plane curve $\Gamma$. From what we proved before, it follows that the ramification locus of the projection map $\pi_p : S_3 \to \mathbb{P}^2$ is the normalization curve $C$ of $\Gamma$.
Finally, if $q \in \mathbb{P}^3 - C$ is another point such that the plane projection $\pi_q(C)$ is an irreducible sextic with six cusps on a conic, parametrized by a point $x_q \in \Sigma_1 \subset \mathbb{P}^{27}$, then, up to a projective motion, we may always assume that $q = [0 : 0 : 0 : 1]$ and hence, if $g_3^2(x_0, x_1, x_2) - g_2^3(x_0, x_1, x_2) = 0$ is the equation of the plane curve $\pi_q(C)$, then $C$ is the ramification locus of the projection from $q$ to the plane of the cubic surface of equation $x_3^3 - 3g_2(x_0, x_1, x_2)x_3 + 2g_3(x_0, x_1, x_2) = 0$.

Corollary 2.8. The irreducible component $\Sigma_1$ of $\Sigma^6_{6,0}$ parametrizing plane curves of equation $f_3^2(x_0, x_1, x_2) + f_2^3(x_0, x_1, x_2) = 0$, where $f_2$ is a homogeneous polynomial of degree two and $f_3$ is a homogeneous polynomial of degree three, has the expected number of moduli, equal to $7 = \dim(\mathcal{M}_4) + \rho(2, 4, 6) - 6$.

Proof. Let $\Gamma \subset \mathbb{P}^2$ be a plane sextic of equation $f_3^2(x_0, x_1, x_2) - f_2^3(x_0, x_1, x_2) = 0$, where the conic $f_2 = 0$ and the cubic $f_3 = 0$ are smooth and intersect transversally. Let $C \subset \mathbb{P}^3$ be the normalization curve of $\Gamma$ and let $S_C$ be the set of points $v = [v_0 : \dots : v_3] \in \mathbb{P}^3$ such that there exists a cubic surface $S_3 \in |\mathcal{I}_{C|\mathbb{P}^3}(3)|$, containing $C$, such that the curve $C$ is the ramification locus of the projection $\pi_v : S_3 \to \mathbb{P}^2$. By the former lemma, in order to prove that $\Sigma_1$ has the expected number of moduli, it is enough to find a point $[\Gamma]$ of $\Sigma_1$ corresponding to an irreducible plane sextic $\Gamma \subset \mathbb{P}^2$ with six cusps on a conic such that the set $S_C$ is finite. Let $\Gamma_2$ be the smooth conic of equation $f_2(x_0, x_1, x_2) = x_0^2 + x_1^2 - x_2^2 = 0$ and let $\Gamma_3$ be the smooth cubic of equation $f_3(x_0, x_1, x_2) = x_0^3 + x_0x_2^2 - x_1^2x_2 = 0$. If $a_1$, $a_2$ and $a_3$ are the three distinct roots of the polynomial $x^3 + x^2 + x - 1$, then $\Gamma_2$ and $\Gamma_3$ intersect transversally at the points $[a_i, \sqrt{1 - a_i^2}, 1]$, $[a_i, -\sqrt{1 - a_i^2}, 1]$, with $i = 1, 2, 3$.
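The explicit example above can be sanity-checked numerically. The following is only a sketch under stated assumptions: it takes $f_2 = x_0^2 + x_1^2 - x_2^2$ and $f_3 = x_0^3 + x_0x_2^2 - x_1^2x_2$, and assumes that the six intersection points are $[a_i, \pm\sqrt{1 - a_i^2}, 1]$ with $a_i$ the roots of $x^3 + x^2 + x - 1$. All function names are illustrative and not taken from the paper. It checks that each point lies on both curves and that the two gradients are linearly independent there, which is what transversality means for smooth curves.

```python
import cmath

def f2(x0, x1, x2):
    # Assumed conic of the example.
    return x0**2 + x1**2 - x2**2

def f3(x0, x1, x2):
    # Assumed cubic of the example.
    return x0**3 + x0 * x2**2 - x1**2 * x2

def grad_f2(x0, x1, x2):
    return (2 * x0, 2 * x1, -2 * x2)

def grad_f3(x0, x1, x2):
    return (3 * x0**2 + x2**2, -2 * x1 * x2, 2 * x0 * x2 - x1**2)

def cross(u, v):
    # Cross product; it vanishes exactly when u and v are proportional.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def cubic_roots():
    """Roots of x^3 + x^2 + x - 1: Newton for the real root, then deflation."""
    r = 0.5
    for _ in range(50):
        r -= (r**3 + r**2 + r - 1) / (3 * r**2 + 2 * r + 1)
    # x^3 + x^2 + x - 1 = (x - r)(x^2 + p x + q)
    p, q = 1 + r, 1 + r + r**2
    disc = cmath.sqrt(p * p - 4 * q)
    return [complex(r), (-p + disc) / 2, (-p - disc) / 2]

points = []
for a in cubic_roots():
    y = cmath.sqrt(1 - a * a)
    points += [(a, y, 1), (a, -y, 1)]

for pt in points:
    residual = max(abs(f2(*pt)), abs(f3(*pt)))
    normal = cross(grad_f2(*pt), grad_f3(*pt))
    # residual should be at rounding level; the cross product should be far from zero
    print(pt, residual, max(abs(c) for c in normal))
```

Two of the three roots are complex, so four of the six intersection points are not real; the check works over the complex numbers throughout.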
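By the former lemma, the plane sextic $\Gamma$ of equation $f_2^3 - f_3^2 = 0$ is irreducible and has six cusps at the intersection points of $\Gamma_2$ and $\Gamma_3$ as singularities. Moreover, the normalization curve $C$ of $\Gamma$ is the canonical curve of genus 4 in $\mathbb{P}^3$ which is the intersection of the cubic surface $S_3 \subset \mathbb{P}^3$ of equation $F_3(x_0, x_1, x_2, x_3) = x_3^3 + (x_0^2 + x_1^2 - x_2^2)x_3 + x_0^3 + x_0x_2^2 - x_1^2x_2 = 0$ and the quadric $S_2$ of equation $\frac{\partial F_3}{\partial x_3} = 3x_3^2 + x_0^2 + x_1^2 - x_2^2 = 0$. We want to show that $S_C$ is finite. To see this we observe that, since $h^0(\mathbb{P}^3, \mathcal{I}_{C|\mathbb{P}^3}(2)) = 1$ and $h^0(\mathbb{P}^3, \mathcal{I}_{C|\mathbb{P}^3}(3)) = 5$, the equation of every cubic surface containing $C$ which is not the union of $S_2$ and a hyperplane is given by

\[ G(x_0, \dots, x_3; \beta_0, \dots, \beta_3) = F_3(x_0, x_1, x_2, x_3) + \Big(\sum_{j=0}^{3} \beta_j x_j\Big) \frac{\partial F_3}{\partial x_3}(x_0, x_1, x_2, x_3), \]

with $\beta_j \in \mathbb{C}$, for $j = 0, \dots, 3$. Now, a point $[v] = [v_0, \dots, v_3]$ belongs to $S_C$ if and only if there exist $\beta_0, \dots, \beta_3$ such that $C$ is contained in the intersection of $G(x_0, \dots, x_3; \beta_0, \dots, \beta_3) = 0$ and

\[ \sum_{i=0}^{3} v_i \frac{\partial G}{\partial x_i}(x_0, \dots, x_3; \beta_0, \dots, \beta_3) = 0. \]

Still using that $h^0(\mathbb{P}^3, \mathcal{I}_{C|\mathbb{P}^3}(2)) = 1$, a point $[v] \in \mathbb{P}^3$ belongs to $S_C$ if and only if

\[ \sum_{i=0}^{3} v_i \frac{\partial G}{\partial x_i}(x_0, \dots, x_3; \beta_0, \dots, \beta_3) = \lambda \frac{\partial F_3}{\partial x_3}(x_0, \dots, x_3) \]

for some $\lambda \in \mathbb{R} - 0$, or, equivalently,

\[ \sum_{i=0}^{2} v_i \frac{\partial F_3}{\partial x_i} + \Big(\sum_{j=0}^{3} \beta_j x_j\Big) \sum_{i=0}^{3} v_i \frac{\partial^2 F_3}{\partial x_3 \partial x_i} = \Big(\lambda - \sum_{i=0}^{3} v_i\beta_i - v_3\Big) \frac{\partial F_3}{\partial x_3}. \]

The previous equality of polynomials is equivalent to the following bilinear system of ten equations in the variables $v_0, \dots, v_3$ and $\beta_0, \dots, \beta_3$, where each equation is labelled by the monomial whose coefficient it compares:

\[
\begin{aligned}
(1 + \beta_3)v_0 + 3\beta_0 v_3 &= 0 && (x_0x_3)\\
(1 + \beta_3)v_1 + 3\beta_1 v_3 &= 0 && (x_1x_3)\\
(1 + \beta_3)v_2 - 3\beta_2 v_3 &= 0 && (x_2x_3)\\
\beta_1 v_0 + \beta_0 v_1 &= 0 && (x_0x_1)\\
\beta_2 v_0 + (1 - \beta_0)v_2 &= 0 && (x_0x_2)\\
(1 - \beta_2)v_1 + \beta_1 v_2 &= 0 && (x_1x_2)\\
2\beta_1 v_1 - v_2 &= \lambda - \sum_{j=0}^{3} \beta_j v_j - v_3 && (x_1^2)\\
-v_0 + 2\beta_2 v_2 &= \lambda - \sum_{j=0}^{3} \beta_j v_j - v_3 && (x_2^2)\\
(3 + 2\beta_0)v_0 &= \lambda - \sum_{j=0}^{3} \beta_j v_j - v_3 && (x_0^2)\\
2\beta_3 v_3 &= \lambda - \sum_{j=0}^{3} \beta_j v_j - v_3 && (x_3^2)
\end{aligned}
\]

The points of $S_C$ are the solutions $v$ of the previous system, regarded as a linear system in $v$ whose coefficients depend on $\beta_0, \dots, \beta_3$. It is easy to prove that it has the unique solution $(v_0, v_1, v_2, v_3) = (0, 0, 0, \lambda)$ if $\beta_0 = \beta_1 = \beta_2 = \beta_3 = 0$ and that it has no solutions otherwise (see [5], page 98).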
By the previous lemma, we conclude that the point $[0 : 0 : 0 : 1] \in \mathbb{P}^3$ is the only point which belongs to six tangent lines to the canonical curve $C \subset \mathbb{P}^3$ which is the intersection of the cubic surface of equation $F_3(x_0, x_1, x_2, x_3) = x_3^3 + (x_0^2 + x_1^2 - x_2^2)x_3 + x_0^3 + x_0x_2^2 - x_1^2x_2 = 0$ and the quadric of equation $\frac{\partial F_3}{\partial x_3} = 3x_3^2 + x_0^2 + x_1^2 - x_2^2 = 0$. It follows that, on the normalization curve $D$ of the plane curve $\Gamma'$ corresponding to the general point of $\Sigma_1 \subset \Sigma^6_{6,0}$, there exists only a finite number of linear series of dimension two with six ramification points. □

Remark 2.9. By using the notation introduced in the proof of corollary 2.8, we observe that in this corollary we have proved that if $C$ is a general canonical curve of genus four such that the set $S_C$ is not empty, then $S_C$ is finite. Actually, C. Ciliberto pointed out to us that it is possible to show, with a very simple argument, that for every canonical curve $C$ of genus four such that $S_C$ is not empty, $S_C$ is finite. Finally, we observe that, by remark 2.7, for every canonical curve $C$ of genus four, the set $S_C$ coincides with the set of points of $\mathbb{P}^3$ which are contained in six tangent lines to $C$.

Acknowledgment. The results of this paper are contained in my PhD thesis. I would like to thank my advisor C. Ciliberto for introducing me to the subject and for providing me with very useful suggestions. I have also enjoyed and benefited from conversations with G. Pareschi and my colleague M. Pacini.

References
[1] A. Degtyarev: On deformations of singular plane sextics, math.AG/0511379, to appear in Journal of Algebraic Geometry.
[2] S. Diaz and J. Harris: Ideals associated to deformations of singular plane curves, Transactions of the American Mathematical Society, vol. 309, n. 2, 433–468 (1988).
[3] J. Harris and I. Morrison: Moduli of curves, Graduate texts in mathematics, vol. 187, Springer, New York, 1998.
[4] C. Galati: Number of moduli of irreducible families of plane curves with nodes and cusps, Collect.
Math. 57,3 (2006), 319–346.
[5] C. Galati: Number of moduli of plane curves with nodes and cusps, PhD thesis, Università degli Studi di Tor Vergata, 2004-2005, available on the homepage http://dspace.uniroma2.it/dspace/items-by-author?author=Galati
[6] E. Sernesi: On the existence of certain families of curves, Invent. Math. vol. 75 (1984), no. 1, 25–57.
[7] O. Zariski: Dimension theoretic characterization of maximal irreducible systems of plane nodal curves, Amer. J. Math. vol. 104 (1982), no. 1, 209–226.
[8] O. Zariski: Algebraic surfaces, Classics in mathematics, Springer.

Dipartimento di Matematica, Università della Calabria, Arcavacata di Rende (CS)
E-mail address: galati@mat.unical.it

1. Introduction 2. On the number of moduli of complete irreducible families of plane sextics with six cusps Acknowledgment References

ABSTRACT Let S be the variety of irreducible sextics with six cusps as singularities. Let W be one of the irreducible components of S. Denoting by M_4 the space of moduli of smooth curves of genus 4, the moduli map of W is the rational map from W to M_4 sending the general point of W, corresponding to a plane curve D, to the point of M_4 parametrizing the normalization curve of D. The number of moduli of W is, by definition, the dimension of the image of W with respect to the moduli map. We know that this number is at most equal to seven. In this paper we prove that both irreducible components of S have number of moduli exactly equal to seven.
<|endoftext|><|startoftext|> Introduction Angular distribution Diffusion on surfaces Asynchronous transitions Fourier analysis of asynchronous data Random walk about a single center Discussion Angular distributions Time distribution Acknowledgments References ABSTRACT In this paper I describe a specialized algorithm for anisotropic diffusion determined by a field of transition rates. The algorithm can be used to describe some interesting forms of diffusion that occur in the study of proton motion in a network of hydrogen bonds. The algorithm produces data that require a nonstandard method of spectral analysis which is also developed here. Finally, I apply the algorithm to a simple specific example. <|endoftext|><|startoftext|> Temporal Evolution Of Step-Edge Fluctuations Under Electromigration Conditions P.J. Rous∗ and T.W. Bole Department of Physics, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 (Dated: October 31, 2018) The temporal evolution of step-edge fluctuations under electromigration conditions is analysed using a continuum Langevin model. If the electromigration driving force acts in the step up/down direction, and step-edge diffusion is the dominant mass-transport mechanism, we find that significant deviations from the usual t1/4 scaling of the terrace-width correlation function occurs for a critical time τ which is dependent upon the three energy scales in the problem: kBT , the step stiffness, γ, and the bias associated with adatom hopping under the influence of an electromigration force, ±∆U . For (t < τ ), the correlation function evolves as a superposition of t1/4 and t3/4 power laws. For t ≥ τ a closed form expression can be derived. This behavior is confirmed by a Monte-Carlo simulation using a discrete model of the step dynamics. 
It is proposed that the magnitude of the electromigration force acting upon an atom at a step edge can be estimated by a careful analysis of the statistical properties of step-edge fluctuations on the appropriate time scale.

PACS numbers: 05.40.-a, 66.30.Qa, 68.35.Ja

I. INTRODUCTION

During the past decade, continuum models and discrete lattice simulations have been applied to understand direct imaging observations of the thermal fluctuations of step edges in which the step position is monitored as a function of time [1, 2]. Of particular interest has been the study of the dynamics of the equilibration of terrace width distributions, where the temporal evolution of step edge fluctuations is driven by the exchange of atoms between the step and the adjacent terrace and/or by motion of adatoms along the step edge itself [3, 4, 5, 6, 7, 8]. It is well known that the position of the step edge, as described by its temporal correlation function, has a time dependence that scales as a power law, $t^\beta$, with an exponent characteristic of the specific atomic processes driving the step fluctuations. In cases where mass transport at the step is dominated by diffusion of atoms along the step edge, $\beta = 1/4$. When mass transport proceeds via exchange of atoms between the step edge and the terrace, $\beta = 1/2$. Careful experiments allow a crossover from $t^{1/4}$ to $t^{1/2}$ scaling to be observed as a function of the sample temperature [9]. Further, experimental measurements of the correlation functions have been used to determine thermodynamic properties of the steps, such as the step stiffness [1, 2].

In this paper we investigate how the scaling of the step edge fluctuations is changed by the presence of an electromigration force [10] acting upon atoms diffusing along the step edges. The primary motivation for this study is the possibility of using measurements of these changes to obtain information about the electromigration force itself.

∗Electronic address: rous@umbc.edu
In conducting materials, a surface electromigration force can be generated by passing an electrical current through the sample [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. In terms of a simple discrete model, the presence of the electromigration force introduces a small bias in the diffusion of atoms at the step edge, parallel to the current (and field). By convention, this bias can be expressed as an energy difference between atoms diffusing parallel or anti-parallel to the current, $\Delta U = Z^* e E a_\perp$, where $E$ is the electric field applied to the sample, $Z^*$ is the effective valence of adatoms and $a_\perp$ is the lattice spacing perpendicular to the step edge. A characteristic property of surface electromigration is that the electromigration bias, $\Delta U$, is several orders of magnitude smaller than the other energy scales in the problem: $\gamma a_\perp$, where $\gamma$ is the step stiffness, and $k_BT$, the energy associated with thermal fluctuations. This suggests that thermal fluctuations will completely dominate the short-time behavior of the step fluctuations, with the effect of the electromigration bias emerging only on much longer time scales. Nevertheless, such time scales (of the order of seconds) are accessible to experiment, offering the possibility that the observation of step fluctuations under electromigration conditions may allow us to obtain quantitative information about the magnitude of the force itself, a quantity that, to date, has been hidden from experimental study.

This paper is organized as follows. In section 2 we present a continuum theory of step edge fluctuations under electromigration conditions. In section 3 we test the fidelity of the theory by showing the results of a Monte Carlo simulation of the temporal evolution of the step correlation function. Concluding remarks are contained in section 4.

http://arxiv.org/abs/0704.0624v1

II.
THEORY

In order to describe the dynamics of a step edge evolving under electromigration conditions we employ the usual Langevin formalism, in which each degree of freedom diffuses towards lower energy with a velocity that is proportional to the energy gradient, subject to random thermal fluctuations. The position of the step edge is described by its edge profile $y(x, t)$, where the $x$-axis is oriented parallel to the step edge; $y(x, t)$ is the position of the step edge at $x$ and at time $t$.

In this paper we consider the limiting case where the step motion occurs most easily by adatom diffusion along the step edge itself. Adatom exchange between the step edge and the adjacent terraces, via attachment and detachment, is neglected. It is well known that atomic diffusion along a step edge is driven by the step curvature [23], which generates a flux

\[ J_c = \frac{D_s \delta_s \gamma}{kT}\, \nabla_s \kappa, \tag{1} \]

where $J_c$ is the curvature-driven flux, $D_s = a_\parallel^2/\tau_h$ is the surface diffusion constant, $a_\parallel$ is the atomic diameter, $\delta_s = a_\perp$ is the width of an atom perpendicular to the step, $\tau_h$ is the average time between hopping events and $\gamma$ is the step stiffness. $\nabla_s \kappa$ is the gradient of the curvature along the step edge. Mass conservation determines the normal velocity of the step edge,

\[ \mathbf{v} \cdot \hat{n} = -\Omega\, \nabla_s \cdot J_c, \tag{2} \]

where $\Omega$ is the area of a single atom and $\hat{n}$ is normal to the step edge.

The inclusion of a thermal noise term, $\eta$, and linearization of the above equation leads to the well-known equation of motion for a step edge [3],

\[ \left[ \frac{\partial}{\partial t} - \frac{\Gamma_h \gamma}{kT} \frac{\partial^4}{\partial x^4} \right] y(x, t) = \eta(x, t), \tag{3} \]

where we have defined

\[ \Gamma_h = \frac{a_\parallel^3 a_\perp^2}{\tau_h}. \tag{4} \]

We model the effects of electromigration by adding to this equation of motion a term generated by a constant force, $F = Z^* e E$, felt by atoms at the step edge, which arises from the application of an electric field, $E$, to the material, oriented parallel to the $y$-axis. $Z^*$ is the effective valence of an atom at the step edge.
The electromigration force generates an additional flux,

\[ J_{EM} = \frac{D_s \delta_s Z^* e E_\parallel}{kT\, a_\parallel a_\perp} = \frac{D_s \delta_s Z^* e E}{kT\, a_\parallel a_\perp}\, \frac{y_x}{\sqrt{1 + y_x^2}}, \tag{5} \]

where $E_\parallel$ is the component of the electric field along the step edge and $y_x$ indicates an $x$ derivative.

The energy of any step configuration, relative to the energy of the straight step, is determined by the step stiffness $\gamma$ and the electromigration force, $F$, felt by atoms at the step edge. If the force acts perpendicular to the step edge (i.e. when $F > 0$, the force acts in the $+y$, or step down, direction) then, for small fluctuations, one can linearize the stochastic equation of motion for the step edge to obtain the following Langevin equation for the step dynamics,

\[ \left[ \frac{\partial}{\partial t} - \frac{\Gamma_h \gamma}{kT} \frac{\partial^4}{\partial x^4} - \frac{\Gamma_h F}{kT\, a_\parallel a_\perp} \frac{\partial^2}{\partial x^2} \right] y(x, t) = \eta(x, t), \tag{6} \]

where $a_\parallel$ and $a_\perp$ are the lattice parameters parallel and perpendicular to the step edge. The noise term, $\eta$, describes the thermal fluctuations and is correlated because, in our model, the random hopping of atoms occurs only between adjacent step edge sites:

\[ \langle \eta(x, t)\, \eta(x', t') \rangle = 2\Gamma_h\, \delta(t - t')\, \frac{\partial^2}{\partial x^2} \delta(x - x'). \tag{7} \]

To first order, the electromigration force does not alter the correlation properties of the noise, so that in the single-jump regime equation 4 remains valid.

Before proceeding, for notational simplicity we rewrite eqn. 6 as

\[ \left[ \frac{\partial}{\partial t} - \alpha \frac{\partial^4}{\partial x^4} \mp \alpha q_c^2 \frac{\partial^2}{\partial x^2} \right] y(x, t) = \eta(x, t), \tag{8} \]

where

\[ \alpha \equiv \frac{\Gamma_h \gamma}{kT}, \qquad q_c \equiv \sqrt{\frac{|F|}{\gamma\, a_\parallel a_\perp}}. \tag{9} \]

The parameter $q_c$ depends only on the magnitude of the electromigration force, and in eqn. 8 the $\mp$ denotes the force acting in the step down ($-$) or step up ($+$) direction.

In the case where there is no random thermal noise (i.e. $T \to 0$), eqn. 6 predicts that the amplitude of a small sinusoidal fluctuation of the step-edge profile, with wavevector $q$, evolves according to the following dispersion relation,

\[ i\omega = \alpha q^2 \left( q^2 \pm q_c^2 \right). \tag{10} \]

This is the same dispersion relation as the one that describes step flow under growth conditions [24, 25, 26]. If the electromigration force acts in the step up ($+$) direction the step fluctuations are linearly stable.
For a force acting in the step-down direction there exists a linear instability which initiates the well-known phenomenon of electromigration-induced step wandering for fluctuations with wavevector smaller than $q_c$ [27, 28, 29]. The critical wavevector is an important parameter in our Langevin model that, from eqn. 9, is determined by the ratio of the electromigration force and the step stiffness.

In order to determine the statistical properties of the solutions of eqn. 6 we take the usual approach and first derive the Green's function for the problem using standard Fourier transform methods,

\[ G(x, t | x', t') = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\; e^{-\alpha (k^4 \pm q_c^2 k^2)(t - t')}\, e^{ik(x - x')}. \tag{11} \]

In terms of the Green's function, the displacement of the step at time $t''$ is

\[ y(x, t'') = y(x, t') + \int_{t'}^{t''} \int_{-\infty}^{\infty} G(x, t'' | x', t)\, \eta(x', t)\, dx'\, dt. \tag{12} \]

We can now compute the time correlation function of the step edge, $g(t)$, after time $t = t'' - t'$ has elapsed,

\[ g(t) \equiv \left\langle \left( y(x, t'') - y(x, t') \right)^2 \right\rangle. \tag{13} \]

Substituting eqn. 11 into eqn. 12 and using the correlation properties of the noise (eqn. 7) we find, after some calculation,

\[ g_\pm(t) = \frac{\Gamma_h}{\pi\alpha} \int_0^\infty dk\; \frac{1 - e^{-2\alpha (k^4 \pm q_c^2 k^2) t}}{k^2 \pm q_c^2} \tag{14} \]

or, using the substitution $u = (\alpha t)^{1/4} k$,

\[ g_\pm(t) = \frac{\Gamma_h^{1/4}}{\pi}\, t^{1/4} \left( \frac{kT}{\gamma} \right)^{3/4} \times \int_0^\infty du\; \frac{1 - e^{-2u^2 \left( u^2 \pm \sqrt{\alpha t q_c^4} \right)}}{u^2 \pm \sqrt{\alpha t q_c^4}}. \tag{15, 16} \]

When there is no electromigration force (i.e. $|F| = 0$, $q_c = 0$), the integral in eqn. 16 is clearly time-independent and we recover the result derived by Bartelt et al. [3], where the correlation function $g(t)$ of the step edge fluctuations evolves with the well-known $t^{1/4}$ scaling law characteristic of step-edge limited diffusion:

\[ g_\pm(t) \equiv g_0(t) = \frac{2^{1/4}\, \Gamma(3/4)}{\pi}\, \Gamma_h^{1/4}\, t^{1/4} \left( \frac{kT}{\gamma} \right)^{3/4}. \tag{17} \]

It is helpful to re-express eqn. 17 in terms of the average time, $\tau_0$, that it takes for the mean-squared width of an initially straight step to reach a value equal to $g_0(\tau_0) = a_\perp^2$ (i.e. one square lattice spacing):

\[ g_0(t) = a_\perp^2 \left( \frac{t}{\tau_0} \right)^{1/4}, \qquad \text{where} \quad \tau_0 \equiv \frac{\tau_h}{2} \left( \frac{\gamma a_\perp^2}{kT a_\parallel} \right)^{3} \left( \frac{\pi}{\Gamma(3/4)} \right)^{4}. \tag{18} \]

When the electromigration force is finite (i.e. $|F| > 0$), it is apparent from eqn. 16 that this scaling behavior is modified by the appearance of an explicit time dependence in the integral.
This can be seen more clearly by defining a critical time, $\tau$, and a rescaled time, $\zeta = t/\tau$, where

\[ \tau \equiv \frac{1}{\alpha q_c^4} = \tau_h\, \frac{\gamma kT}{a_\parallel F^2}. \tag{19} \]

Then eqn. 16 can be rewritten as

\[ g_\pm(t) = a_\perp^2 \left( \frac{t}{\tau_0} \right)^{1/4} I_\pm\!\left( \frac{t}{\tau} \right), \tag{20} \]

where

\[ I_\pm(\zeta) \equiv \frac{1}{2^{1/4}\, \Gamma(3/4)} \times \int_0^\infty du\; \frac{1 - e^{-2u^2 (u^2 \pm \sqrt{\zeta})}}{u^2 \pm \sqrt{\zeta}}. \tag{21, 22} \]

$I_\pm$ is a universal function of the rescaled time $\zeta$ and is normalized such that $I_\pm(\zeta \to 0) = 1$. The integral appearing in eqn. 22 is easily evaluated numerically and is shown in fig. 1, in which we display $I_\pm$ plotted as a function of the rescaled time $\zeta$ (i.e. time is expressed in units of $\tau$). The solid curves show $I_\pm$ obtained for $F < 0$ (step up) and $F > 0$ (step down) electromigration forces (for $F = 0$, $I_\pm(\zeta) = 1$).

FIG. 1: The integral function $I(\zeta)$ (eqn. 22) plotted as a function of the rescaled time $\zeta = t/\tau$ for no electromigration force ($F = 0$), the electromigration force in the step down direction ($F > 0$) and in the step-up direction ($F < 0$).

From fig. 1 it is apparent that $g_\pm(t)$ (eqn. 20) deviates very significantly from the $t^{1/4}$ scaling behavior of $g_0(t)$ (eqn. 17) as $t$ approaches $\tau$ ($\zeta \to 1$). These deviations begin to appear at earlier times, $t \sim 0.1\tau$, when the effect of the force on the evolution of the step fluctuations begins to be felt. This can be seen more clearly in fig. 2, which displays the correlation function of a step plotted as a function of time for $\tau_0 = 5$ s and $\tau = 10^4$ s. The values for these parameters were chosen so that $\tau/\tau_0 \sim 10^4$, a ratio typical of accelerated electromigration experiments. As noted above, deviations from $t^{1/4}$ scaling start to appear when $t \sim 0.1\tau = 1000$ s.

FIG. 2: The time correlation function, $g(t)$, of the step-edge distribution predicted by the continuum Langevin model (eqns. 20 and 22), plotted as a function of time for $\tau_0 = 5$ s and $\tau = 10^4$ s.

The results shown in figs. 1 and 2 have a simple qualitative interpretation: the short time behavior of the step fluctuations ($t \ll \tau$) is completely dominated by the thermal fluctuations and the effect of the electromigration bias emerges only at later times.
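The numerical evaluation of the integral function mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the integrand $\left[1 - e^{-2u^2(u^2 \pm \sqrt{\zeta})}\right]/(u^2 \pm \sqrt{\zeta})$ with normalization $2^{1/4}\Gamma(3/4)$, and it compares the result with the small-$\zeta$ coefficients 0.3487 and 0.1500 quoted in the text. The names `integrand`, `tail` and `I_pm`, the cutoff `U` and the grid size `n` are our own choices.

```python
import math

NORM = 2 ** 0.25 * math.gamma(0.75)   # value of the integral at zeta = 0

def integrand(u, s):
    """(1 - exp(-2 u^2 (u^2 + s))) / (u^2 + s), with the removable 0/0 limit."""
    w = u * u + s
    if abs(w) < 1e-12:
        return 2.0 * u * u
    return -math.expm1(-2.0 * u * u * w) / w

def tail(U, s):
    """Integral of 1 / (u^2 + s) from U to infinity.

    Beyond U the exponential term is negligible, so the tail is analytic.
    For s < 0 this requires U > sqrt(|s|).
    """
    if s == 0.0:
        return 1.0 / U
    r = math.sqrt(abs(s))
    if s > 0.0:
        return (math.pi / 2 - math.atan(U / r)) / r
    return math.log((U + r) / (U - r)) / (2 * r)

def I_pm(zeta, sign=+1, U=4.0, n=4000):
    """Normalized integral; sign=+1 is the stable branch, sign=-1 the unstable one."""
    s = sign * math.sqrt(zeta)
    h = U / n                          # n must be even for Simpson's rule
    total = integrand(0.0, s) + integrand(U, s)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h, s)
    return (total * h / 3 + tail(U, s)) / NORM

def series(zeta, sign=+1, a0=0.3487, a1=0.1500):
    """Truncated small-zeta expansion 1 -/+ a0 sqrt(zeta) + a1 zeta."""
    return 1 - sign * a0 * math.sqrt(zeta) + a1 * zeta

for zeta in (0.0, 0.01, 0.1, 0.4):
    print(zeta, I_pm(zeta, +1), series(zeta, +1), I_pm(zeta, -1), series(zeta, -1))
```

The upper limit is cut at `U` and the remainder is added analytically, because the integrand only decays like $1/u^2$; the sketch is intended for $\zeta$ of order one or smaller, where `U = 4` is comfortably beyond the range in which the exponential matters.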
Such behavior is typical of the dynamics of diffusion driven by weak external forces.

It is instructive to perform a power series expansion of the integral about ζ = 0 such that eqn. 20 becomes, g±(t) = a 1∓ a 1 . . . The expansion coefficients can be obtained analytically, = 0.3487, a1 = = 0.1500, Shown as the dashed lines in fig. 1 are the results of this series approximation for I±(ζ) (eqn. 22), evaluated up to, and including, the terms linear in time. Clearly, this truncated expression is a reasonable approximation for t ≤ 0.4τ.

III. SIMULATION

In order to test the predictions of the continuum Langevin model described above, we developed a Monte Carlo simulation of step edge fluctuations in the presence of an external force. In this model, atomic diffusion was restricted to the step edges, with atoms jumping between adjacent step sites only. Only nearest-neighbor interactions on a square lattice were permitted (a⊥ = a|| = a = 1) and were modelled by a single bond energy ε. In order to describe the electromigration force, the atomic jump probabilities for motion parallel and anti-parallel to the force were biased by a small energy differential ∆U. In terms of the electromigration force, F, and the lattice spacing perpendicular to the step edge, a, the bias is ∆U = ±Fa.

Simulations were performed for steps of length ℓ = 10000a|| fluctuating on a square lattice. Periodic boundary conditions were employed. The bond energy was set to ε = 2.0kBT and the magnitude of the electromigration bias was |∆U| = 0.01kBT, a factor of $10^3$ smaller than the typical binding energy of an atom to the step edge. This value was chosen to generate statistically significant deviations from the (no-force) $t^{1/4}$ scaling within reasonable simulation times. If ε = 0.1 eV and it is assumed that a ∼ 1.5 Å (typical of metals) then this bias corresponds to an electric field with a magnitude of order 1000 V cm$^{-1}$ acting on an atom with effective valence of Z∗ ∼ ±10e.
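The order-of-magnitude field estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the standard relation F = Z∗eE between the force and the field (not written out in this section), uses ∆U = Fa from the text, and all variable names are hypothetical.

```python
# Rough check of the electric field implied by the simulation bias.
# Assumption (not spelled out in the text): F = Z* e E, so E = dU / (Z* e a).
# Working in eV means dU[eV] / e is simply dU in volts.

EPS_EV = 0.1          # bond energy epsilon, in eV (value quoted in the text)
EPS_OVER_KBT = 2.0    # epsilon = 2.0 kB T
BIAS_OVER_KBT = 0.01  # |dU| = 0.01 kB T
A_ANGSTROM = 1.5      # lattice spacing a, in Angstrom
Z_EFF = 10.0          # magnitude of the effective valence Z*

kbt_ev = EPS_EV / EPS_OVER_KBT        # thermal energy, eV
delta_u_ev = BIAS_OVER_KBT * kbt_ev   # bias per hop, eV
a_cm = A_ANGSTROM * 1.0e-8            # lattice spacing, cm

# Field in V/cm: bias (in volts per unit charge) over Z* times the hop length.
field_v_per_cm = delta_u_ev / (Z_EFF * a_cm)

print(f"kB T    = {kbt_ev:.3f} eV")
print(f"|dU|    = {delta_u_ev:.1e} eV")
print(f"E field = {field_v_per_cm:.0f} V/cm")
```

Under these assumptions the field comes out at a few thousand V cm$^{-1}$, consistent with the "order 1000 V cm$^{-1}$" quoted above.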
In actual accelerated electromigration experiments, a field of 0.1−1 V cm$^{-1}$ is typical.

Figure 3 shows the results of the simulation where the correlation function of an isolated step is plotted as a function of the time measured in Monte-Carlo steps per step-edge site (MCS). Shown is g(t) obtained when the electromigration force acts in the step-up and step-down directions and when ∆U = 0 (i.e. no electromigration force is acting at the step edge). We define one Monte Carlo step to be equal to the average time needed for every atom at the step edge to attempt a hop.

FIG. 3: Results of a kinetic Monte Carlo simulation where the correlation function of an isolated step is plotted as a function of the time (measured in Monte-Carlo steps per step-edge site; the lattice spacing is a = 1). Shown is g(t) obtained when the electromigration force acts in the step-up and step-down directions and when ∆U = 0 (i.e. no electromigration force is acting at the step edge). The curves shown were obtained by averaging the data from 200 randomly generated replicas after each was equilibrated for $10^5$ Monte Carlo steps.

The results shown in fig. 3 were obtained by averaging the data from 100 randomly generated replicas after each was equilibrated for $10^5$ Monte Carlo iterations per site. Comparing the simulation results (fig. 3) to the predictions of the Langevin theory (fig. 2) it is apparent that the qualitative features of the continuum theory are reproduced by the Monte Carlo simulation. These same data are presented in the form of a log-log plot in fig. 4. In the absence of the force (F = 0, ∆U = 0), a least-squares fit of the no-force simulation data shows that g0(t) is very well fit by a $t^\beta$ power law where β = 0.25 ± 0.01. Therefore, when there is no electromigration force present in the simulation, the correlation function of the step evolves according to the well-known $t^{1/4}$ scaling, as predicted by the Langevin analysis (eqn. 17).
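A power-law fit of this kind is commonly done as a straight-line least-squares fit in log-log space. The following is a minimal sketch of that idea, not the authors' fitting code: it assumes the form g0(t) = (t/τ0)^β with a = 1, the function name is hypothetical, and the data are synthetic values generated from β = 0.25 and τ0 = 5 rather than simulation output.

```python
import math


def fit_power_law(times, values):
    """Fit values = (t / tau0)**beta by linear least squares on
    ln(values) = beta * ln(t) - beta * ln(tau0)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the log-log line
    intercept = y_mean - beta * x_mean    # equals -beta * ln(tau0)
    tau0 = math.exp(-intercept / beta)
    return beta, tau0


# Synthetic, noise-free data (illustrative only, not simulation results).
times = [10.0 ** k for k in range(1, 6)]
values = [(t / 5.0) ** 0.25 for t in times]

beta, tau0 = fit_power_law(times, values)
print(f"beta = {beta:.4f}, tau0 = {tau0:.4f}")
```

With real, noisy data the same routine would return an estimate of β and τ0; an uncertainty such as the ±0.01 quoted above would need an additional error analysis that this sketch does not attempt.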
By least squares fitting the simulation results to eqn. 17 we obtain a value of τ0 = 5 MCS.

We now consider the results of the simulation obtained for a finite electromigration force, also shown in figs. 3 and 4. Equation 22 suggests that the value of τ can be extracted from the simulation results by considering the scaling of the difference between the correlation functions for the force in the up and down step directions: ∆(t) = g+ − g− = 2a 1 . . . (24) For the simulation results, this normalized difference is plotted in fig. 5. The behavior of this quantity is well fit by the leading order term in eqn. 24, from which we obtain a value of τ = 62000 ± 10000 MCS.

FIG. 4: A log-log plot of the simulation data shown in fig. 3. The dashed line shows the best fit of a power law, $(t/\tau_0)^\beta$, to the no-force (F = 0) data; β = 0.25, τ0 = 5 MCS.

For comparison with the fits to the continuum Langevin model, we can estimate τ from the microscopic parameters employed in the discrete Monte Carlo model used to generate the simulation data. Combining eqn. 19 with the step stiffness appropriate for our model 2kTa|| 2kBTa|| we obtain an estimate of τ in units of MCS: τ = 2 2kBTa|| The ratio of the hopping time in the Langevin model, τh, to the time between hopping attempts in the Monte Carlo simulation, τa, can be obtained by monitoring the success rate of hops between adjacent lattice sites in the simulation. We find that τa/τh = 3.6 ± 0.1. Thus our estimate for the value of τ in the Monte Carlo simulations, used to generate the data shown in figs. 3-5, is τ = 62000±10000 MCS.

FIG. 5: For the simulation results shown in figs. 3 and 4; a log-log plot of the normalized difference (eqn. 24) plotted as a function of time (MCS). The behavior of this quantity is well fit by the leading order term in eqn. 24 ($t^{1/2}$, dashed curve) from which we obtain a value of τ = 62000 ± 10000 MCS.
Clearly, the agreement between the continuum Langevin theory (τ = 71000 MCS) and the microscopic model (τ = 62000 ± 10000 MCS) is reasonable. Finally we note that in the high temperature limit (kBT ≫ ε) the ratio of the electromigration bias to the binding energy at a step edge, ε is related directly to τ that would be obtained from experiment: where τh can be determined from eqn. 18, if the step stiffness is known.

IV. CONCLUSIONS

The temporal evolution of step-edge fluctuations under electromigration conditions has been analyzed using a continuum Langevin model for the case where diffusion is limited by mass transport along the step edge. We find that the presence of the electromigration force, felt by atoms at the step edge, causes deviations from the usual $t^{1/4}$ scaling of the terrace-width distribution driven by thermal fluctuations alone. We have identified a critical time τ that is a function of the three energy scales in the problem: kBT, the step stiffness, γ, and the bias associated with adatom hopping under the influence of an electromigration force, ±∆U. For t < τ, the correlation function evolves, to a good approximation, as a superposition of $t^{1/4}$ and $t^{3/4}$ power laws. For all τ a closed form expression was derived. This behavior was confirmed by a Monte-Carlo simulation using a discrete model of the step dynamics. Finally we propose that the magnitude of the electromigration force acting upon an atom at a step-edge could be determined directly by careful measurement and analysis of the statistical properties of step-edge fluctuations on the appropriate time-scale.

Acknowledgments

We acknowledge helpful discussions with E.D. Williams. This work has been supported by the US Department of Energy grant DE-FG02-01ER45939 and by the NSF-Materials Research Science and Engineering Center under grant DMR-00-80008.

[1] H.-C. Jeong and E. D. Williams, Surface Science Reports 34, 171 (1999).
[2] M. Giesen, Progress in Surface Science 68, 1 (2001).
[3] N. C. Bartelt, J. L. Goldberg, T. L. Einstein, and E. D. Williams, Surf. Sci. 273, 252 (1992).
[4] S. V. Khare and T. L. Einstein, Phys. Rev. B 57, 4782 (1998).
[5] T. Ihle, C. Misbah, and O. Pierre-Louis, Phys. Rev. B 58, 2289 (1998).
[6] H.-C. Jeong and J. D. Weeks, Surf. Sci. 432, 101 (1999).
[7] C. P. Flynn, Phys. Rev. B 66, 155405 (2002).
[8] N. C. Bartelt, T. L. Einstein, and E. D. Williams, Surf. Sci. 521, L669 (2002).
[9] M. Giesen, Surf. Sci. 442, 543 (1999).
[10] R. S. Sorbello, in Solid State Physics, eds. H. Ehrenreich and F. Spaepen, 51, 159 (1999).
[11] D. Schumacher, Surface Scattering Experiments With Conduction Electrons, Springer Tracts in Modern Physics 128 (Springer, Berlin, 1993).
[12] T. W. Duryea and H. B. Huntington, Surf. Sci. 199, 261 (1988).
[13] H. Ishida, Phys. Rev. B 49, 14610 (1994).
[14] H. Ishida, Phys. Rev. B 52, 10819 (1995).
[15] H. Ishida, Phys. Rev. B 57, 4140 (1998).
[16] H. Ishida, Phys. Rev. B 60, 4532 (1999).
[17] H. Ishida, Phys. Rev. B 54, 10905 (1996).
[18] P. J. Rous, T. L. Einstein, and E. D. Williams, Surf. Sci. Lett. 315, 995 (1994).
[19] P. J. Rous, Phys. Rev. B 61, 8475 (2000).
[20] P. J. Rous, Phys. Rev. B 61, 8475 (2000).
[21] P. J. Rous, J. Appl. Phys. 87, 2780 (2000).
[22] H. Yasunaga and A. Natori, Surf. Sci. Rep. 15, 205 (1992).
[23] W. W. Mullins, J. Appl. Phys. 28, 333 (1957).
[24] G. Bales and A. Zangwill, Phys. Rev. B 41, 5500 (1990).
[25] O. Pierre-Louis, M. D’Orsogna, and T. Einstein, Phys. Rev. Lett. 82, 3661 (1999).
[26] M. R. Murty and B. Cooper, Phys. Rev. Lett. 83, 352 (1999).
[27] J. Krug, in Multiscale Modeling of Epitaxial Growth, ed. A Voight (Birkhauser), 2004.
[28] M. Degawa et al., Surf. Sci. 487, 171 (2001).
[29] T. Zhao, J. D. Weeks, and D. Kandel, Phys. Rev. B 70, 161303 (2004).

ABSTRACT The temporal evolution of step-edge fluctuations under electromigration conditions is analysed using a continuum Langevin model.
If the electromigration driving force acts in the step up/down direction, and step-edge diffusion is the dominant mass-transport mechanism, we find that significant deviations from the usual $t^{1/4}$ scaling of the terrace-width correlation function occur for a critical time $\tau$ which is dependent upon the three energy scales in the problem: $k_{B}T$, the step stiffness, $\gamma$, and the bias associated with adatom hopping under the influence of an electromigration force, $\pm \Delta U$. For $t < \tau$, the correlation function evolves as a superposition of $t^{1/4}$ and $t^{3/4}$ power laws. For $t \ge \tau$ a closed form expression can be derived. This behavior is confirmed by a Monte-Carlo simulation using a discrete model of the step dynamics. It is proposed that the magnitude of the electromigration force acting upon an atom at a step-edge can be estimated by a careful analysis of the statistical properties of step-edge fluctuations on the appropriate time-scale. <|endoftext|><|startoftext|> Introduction Renormgroup analysis The Split Higgsino scenario The model Lagrangian and its features Split Higgsino as the DM carrier SUSY scales and experimental possibilities -N scattering and collider signals Diffuse gamma spectrum from the Galactic halo Conclusions References ABSTRACT We present a renormalization group motivation of scale hierarchies in the SUSY SU(5) model. The Split Higgsino scenario with a high scale of the SUSY breaking is considered in detail. Its manifestations in experiments are discussed.
<|endoftext|><|startoftext|> Introduction ISW effect and steerable wavelets ISW effect Wavelets on the sphere Steerability and morphological measures Analysis procedures Data and simulations Generic procedure Local morphological analysis Matched intensity analysis Local morphological correlations Detections Foregrounds and systematics Matched intensity correlation Detections Foregrounds and systematics Conclusions ABSTRACT Using local morphological measures on the sphere defined through a steerable wavelet analysis, we examine the three-year WMAP and the NVSS data for correlation induced by the integrated Sachs-Wolfe (ISW) effect. The steerable wavelet constructed from the second derivative of a Gaussian allows one to define three local morphological measures, namely the signed-intensity, orientation and elongation of local features. Detections of correlation between the WMAP and NVSS data are made with each of these morphological measures. The most significant detection is obtained in the correlation of the signed-intensity of local features at a significance of 99.9%. By inspecting signed-intensity sky maps, it is possible for the first time to see the correlation between the WMAP and NVSS data by eye. Foreground contamination and instrumental systematics in the WMAP data are ruled out as the source of all significant detections of correlation. Our results provide new insight on the ISW effect by probing the morphological nature of the correlation induced between the cosmic microwave background and large scale structure of the Universe. Given the current constraints on the flatness of the Universe, our detection of the ISW effect again provides direct and independent evidence for dark energy. Moreover, this new morphological analysis may be used in future to help us to better understand the nature of dark energy. <|endoftext|><|startoftext|> Electromagnetic structure and weak decay of meson K in a light-front QCD-inspired model∗ Fabiano P. Pereira a, J. P. B. C. 
de Melo b †, T. Frederico c, and Lauro Tomio d
a Instituto de Física, Universidade Federal Fluminense, 24210-900, Niterói, RJ, Brazil
b Universidade Cruzeiro do Sul, CETEC, 08060-070, São Paulo, SP, Brazil
c Instituto Tecnológico de Aeronáutica, 12228-900, São José dos Campos, SP, Brazil
d Instituto de Física Teórica, UNESP, 01405-900, São Paulo, SP, Brazil

The kaon electromagnetic (e.m.) form factor is reviewed considering a light-front constituent quark model. In this approach, the relevance of the quark-antiquark pair terms for the full covariance of the e.m. current is discussed. It is also verified, by considering a QCD dynamical model, that a good agreement with experimental data can be obtained for the kaon weak decay constant once a probability of about 80% of the valence component is taken into account.

1. INTRODUCTION

The kaon, as a quark-antiquark bound state, is an appropriate system to study aspects of QCD at low and intermediate energy regions. By using quantum field theory at the light-front the subnuclear structure can be more easily studied [1, 2, 3]. Within the light-front framework and an appropriate choice of the frame, it is possible to obtain the pion electromagnetic form factor at both space- and time-like regimes [4].

Using the light-cone components $J^+_K = J^0 + J^3$ and $J^-_K = J^0 - J^3$ of the kaon electromagnetic current, one can obtain the corresponding form factors in the light-front formalism, with a pseudoscalar coupling for the quarks and considering the Breit frame ($q^+ = 0$, $\vec{q}_\perp = (q_x, 0) \neq 0$) [5]. In the case of $J^+_K$ there is no pair term contribution in the Breit frame. However, for the $J^-_K$ component of the electromagnetic current, the pair term contribution is different from zero and necessary to preserve the rotational symmetry of the current.
In the next section, we outline the main equations of the model for the kaon electromag- netic current, detailed in [ 5], with the corresponding results obtained for the kaon elastic form factor. In section 3, we briefly review a QCD inspired model, presenting results for the weak decay pseudoscalar constants compared to data. In section 4 we present our conclusions. ∗Work partially supported by the Brazilian funding agencies FAPESP and CNPq †JPBC de Melo thanks Instituto de F́ısica Teórica, UNESP, for supporting facilities http://arxiv.org/abs/0704.0627v1 2. ELECTROMAGNETIC FORM FACTOR The initial light-front wave function considered in the present model is given by: ΦiQ(x, k⊥) = (1− x)2 −M20 )(m −M2(mQ, mR)) , (1) where N is a normalization constant, Q ≡ {q̄, q} is the quark or antiquark index with mQ is the corresponding quark mass, m2 is the kaon mass, x = k+/P+ is the momentum fraction, and M2(mQ, mR) ≡ k2⊥ +m (P − k)2⊥ +m − P 2⊥, (2) with the free quark-mass operator given by M20 = M 2(mq̄, mq). mR is a mass constant chosen to regularize the triangle diagram. For the corresponding final wave-functions, q̄ and Φ q , we just need to exchange P ↔ P ′ in (1) and (2). The relation between the electromagnetic current Jµ and the space-like kaon electromagnetic form factor FK+(q is given by 〈P ′|Jµ|P ′〉 = (P ′ + P )µFK+(q 2) . In terms of the initial (Φiq̄) and final (Φ light-front wave functions, we have F+q̄ (q 2) = −eq̄ N2g2Nc 4π3P+ d2k⊥dx N+q̄ θ(x)θ(1− x) Φ q̄ (x, k⊥)Φ q̄(x, k⊥) , (3) F+q (q 2) = [ q ↔ q̄ in F+q̄ (q 2) ] , (4) where Nc is the color number, g is the coupling constant, eQ is the charge of quark Q, and N+q̄ = (−1/4)Tr[(/k +mq̄)γ 5(/k − /P ′ +mq)γ +(/k − /P +mq)γ . In the light-front approach, beside the valence contribution, we have also the non-valence contributions to the currents. In the case of the J+ component, the non-valence component does not contribute to the corresponding matrix elements [ 5]. 
The kaon electromagnetic form factor obtained with J+ is the sum of two contributions from quark and antiquark currents: (q2) = F+q (q 2) + F+q̄ (q 2) normalized such that F+ (0) = 1. (5) In the case that we consider the J− component, to obtain the kaon electromagnetic form factor, after considering the contribution from the interval 0 < k+ < P+ (interval I), we need to add a second contribution, which is originated from the pair terms, and non-zero in the interval P+ < k+ < P ′+ (interval II). The contribution is obtained after a Cauchy integral in k− is performed in the limit P ′+ → P+ [ 5]. So, instead of (5), we will have: (q2) = F−q (q 2) + F−q̄ (q F−(q2) , (6) normalized by the charge conservation to F− (0) = 1. The parameters of the model are the constituent quark masses, mq = mu = md = 220 MeV, ms = 419 MeV and the regulator mass mR =946 MeV, adjusted to fit the electro- magnetic radius of the kaon. The electromagnetic radius is related to the corresponding form factor, with the mean-square-radius given by 〈r2K+〉 = 6 dFK+(q . (7) With the parameters adjusted as given above, we have 〈r2 〉 = 0.354 fm2, which is very close to the experimental value 〈r2 〉|exp = 0.340 fm 2 [ 6]. Our results for the kaon electromagnetic form factor are presented in Fig. 1, in compar- ison with available experimental data [ 6]. We observe that the full kaon electromagnetic form factor is covariant only after the inclusion of the pair terms or non-valence contri- bution to the J− component of the electromagnetic current. 0.01 0.10 1.00 [(GeV/c) Figure 1. The kaon electromagnetic form factor is obtained with the plus and minus components of the e.m. current (both cases are shown by the solid-line results) and compared with experimental data [ 6]. The dashed-line curve shows the form factor without the pair terms contribution in J− 3. 
WEAK DECAY CONSTANTS IN A QCD INSPIRED MODEL Next, we briefly review the calculation of the pseudoscalar constants, in a light-front QCD-inspired dynamical model. In this case, the constituent quark masses need to be readjusted in view of the fact that, differently from the approach outlined in section 2, the wave-function is obtained from an eigenvalue equation, as follows. The valence wave function is obtained by solving an eigenvalue equation for the effective square mass operator M2ps [ 7]: ps ψ(x, ~k⊥) = M 0 (x, k⊥) ψ(x, ~k⊥)− dx′d~k′⊥θ(x ′)θ(1− x′) x(1− x)x′(1− x′) 4m1m2 − λpsg(M 0 (x, k⊥))g(M ′, k′⊥)) ψ(x′, ~k′⊥) , (8) where M20 (x, k⊥) ≡ ( ~k2⊥ + m 1)/x + ( ~k2⊥ + m 2)/(1 − x) is the free square mass operator in the meson rest frame, m1,2 are the constituent quark masses, α gives the strength of the Coulomb-like interaction. g(K) is the model form factor, with λps the strength of the separable interaction. We consider two expressions for the form factors: g(a)(K2) = β(a) +K2 and g(b)(K2) = , (9) where the parameters β(a,b) and λps are adjusted to reproduce the experimental val- ues of the pion electromagnetic radius and mass, mπ. For α = 0.5, we have β (a) = −(634.5 MeV)2 and β(b) = (1171 MeV)2. mu = 384 MeV, ms = 508 MeV. In Table 1, we have the results compared with experimental data [ 8]. Table 1 Results for the kaon and pion weak decay constants, compared with experimental data. The model is adjusted to reproduce pion radius and mass. qq f (a)ps (MeV) f ps (MeV) f ps (MeV) M ps (MeV) M ps (MeV) [ 8] π+(ud) 110 110 92.4± .07± 0.25 [ 8] 140 140 K+(us) 126 121 113.0± 1.0± 0.31[ 8] 490 494 4. CONCLUSIONS Considering a light-front model wave-function we have observed a good agreement of the results for the kaon electromagnetic form factor with experimental data. The electro- magnetic form factor was obtained using the plus and minus components of the electro- magnetic current. 
The inclusion of the non-valence component of the current was shown to be essential in this approach to obtain covariant results for the calculated matrix elements. We also show that a good agreement with experimental data is obtained for the kaon weak decay constants once a probability of the valence component of about 80% is taken into account.

REFERENCES
1. F. Cardarelli, I. L. Grach, I. M. Narodetsky, E. Pace, G. Salmè, S. Simula, Phys. Rev. D 53 (1996) 6682.
2. J. P. B. C. de Melo, H. W. Naus and T. Frederico, Phys. Rev. C 59 (1999) 2278.
3. B. L. G. Bakker, H.-M. Choi and C.-R. Ji, Phys. Rev. D 63 (2001) 074014.
4. J. P. B. C. de Melo, T. Frederico, E. Pace and G. Salmè, Phys. Rev. D 73 (2006) 074013; J. P. B. C. de Melo, T. Frederico, E. Pace and G. Salmè, Phys. Lett. B 581 (2004) 75.
5. F.P. Pereira, J.P.B.C. de Melo, T. Frederico and L. Tomio, Phys. of Part. and Nucl. 36 (2005) 5217; F.P. Pereira, Fatores de Forma Eletromagnéticos do Píon e do Kaon na Frente de Luz, MSc Dissertation, IFT, São Paulo, 2005.
6. S. R. Amendolia et al., Phys. Lett. B 178 (1986) 435.
7. T. Frederico and H.-C. Pauli, Phys. Rev. D 64 (2001) 054004; L. A. M. Salcedo, J. P. B. C. de Melo, D. Hadjmichef and T. Frederico, Eur. Phys. J. A 27 (2006) 213.
8. W.-M. Yao et al., Journal of Physics G 33 (2006) 1.

INTRODUCTION ELECTROMAGNETIC FORM FACTOR WEAK DECAY CONSTANTS IN A QCD INSPIRED MODEL CONCLUSIONS ABSTRACT The kaon electromagnetic (e.m.) form factor is reviewed considering a light-front constituent quark model. In this approach, the relevance of the quark-antiquark pair terms for the full covariance of the e.m. current is discussed. It is also verified, by considering a QCD dynamical model, that a good agreement with experimental data can be obtained for the kaon weak decay constant once a probability of about 80% of the valence component is taken into account. <|endoftext|><|startoftext|> Black hole puncture initial data with realistic gravitational wave content B. J.
Kelly,1, 2 W. Tichy,3 M. Campanelli,4, 2 and B. F. Whiting5, 2
1 Gravitational Astrophysics Laboratory, NASA Goddard Space Flight Center, 8800 Greenbelt Rd., Greenbelt, MD 20771, USA
2 Center for Gravitational Wave Astronomy, Department of Physics and Astronomy, The University of Texas at Brownsville, Brownsville, Texas 78520
3 Department of Physics, Florida Atlantic University, Boca Raton, Florida 33431-0991
4 Center for Computational Relativity and Gravitation, School of Mathematical Sciences, Rochester Institute of Technology, 78 Lomb Memorial Drive, Rochester, New York 14623
5 Department of Physics, University of Florida, Gainesville, Florida 32611-8440
(Dated: October 26, 2018)

We present improved post-Newtonian-inspired initial data for non-spinning black-hole binaries, suitable for numerical evolution with punctures. We revisit the work of Tichy et al. [W. Tichy, B. Brügmann, M. Campanelli, and P. Diener, Phys. Rev. D 67, 064008 (2003)], explicitly calculating the remaining integral terms. These terms improve accuracy in the far zone and, for the first time, include realistic gravitational waves in the initial data. We investigate the behavior of these data both at the center of mass and in the far zone, demonstrating agreement of the transverse-traceless parts of the new metric with quadrupole-approximation waveforms. These data can be used for numerical evolutions, enabling a direct connection between the merger waveforms and the post-Newtonian inspiral waveforms.

PACS numbers: 04.25.Dm, 04.25.Nx, 04.30.Db, 04.70.Bw

I. INTRODUCTION

Post-Newtonian (PN) methods have played a fundamental role in our understanding of the astrophysical implications of Einstein’s theory of general relativity. Most importantly, they have been used to confirm that the radiation of gravitational waves accounts for energy loss in known binary pulsar configurations.
They have also been used to create templates for the gravitational waves emitted from compact binaries which might be detected by ground-based gravitational wave observatories, such as LIGO [1, 2], and the NASA/ESA planned space-based mission, LISA [3, 4]. However, PN methods have not been extensively used to provide initial data for binary evolution in numerical relativity, nor, until recently (see [5, 6]), have they been extensively studied so that their limitations could be well identified and the results of numerical relativity independently confirmed.

Until the end of 2004, the field of numerical relativity had been struggling to compute even a single orbit for a black-hole binary (BHB). Although debate occurred on the advantages of one type of initial data over another, the primary focus within the numerical relativity community was on code refinement which would lead to more stable evolution. Astrophysical realism was very much a secondary issue. However, this situation has radically changed in the last few years with the introduction of two essentially independent, but equally successful techniques: the generalized harmonic gauge (GHG) method developed by Pretorius [7] and the “moving puncture” approach, independently developed by the UTB and NASA Goddard groups [8, 9]. Originally introduced by Brandt & Brügmann [10] in the context of initial data, the puncture method explicitly factored out the singular part of the metric. When used in numerical evolution in which the punctures remained fixed on the numerical grid, it resulted in distortions of the coordinate system and instabilities in the Baumgarte-Shapiro-Shibata-Nakamura (BSSN) [11, 12] evolution scheme. The revolutionary idea behind the moving puncture approach was precisely not to factor out the singular part of the metric, but rather to evolve it together with the regular part, allowing the punctures to move freely across the grid with a suitable choice of the gauge.
A golden age for numerical relativity is now emerging, in which multiple groups are using different computer codes to evolve BHBs for several orbits before plunge and merger [13, 14, 15, 16, 17, 18, 19, 20, 21]. Comparison of the numerical results obtained from these various codes has taken place [22, 23, 24], and comparison with PN inspiral waveforms has also been carried out with encouraging success [5, 6, 25, 26]. The application of successful numerical relativity tools to study some important astrophysical properties (e.g. precession, recoil, spin-orbit coupling, elliptical orbits, etc.) of spinning and/or unequal-mass black-hole systems is currently producing extremely interesting new results [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]. It now seems that the primary obstacle to further progress is simply one of computing power. In this new situation, it is perhaps time to return to the question of what initial data will best describe an astrophysical BHB.

To date, the best-motivated description of pre-merger BHBs has been supplied by PN methods. We might expect, then, that a PN-based approach would give us the most astrophysically correct initial data from which to run full numerical simulations. In practice, PN results are frequently obtained in a form ill-adapted to numerical evolution. PN analysis often deals with the full four-metric, in harmonic coordinates; numerical evolutions frequently use ADM-type coordinates, with a canonical decomposition of the four-metric into a spatial metric and extrinsic curvature.

Fortunately, many PN results have been translated into the language of ADM by Ohta, Damour, Schäfer and collaborators. Explicit results for 2.5PN BHB data in the near zone were given by Schäfer [43] and Jaranowski & Schäfer (JS) [44], and these were implemented numerically by Tichy et al. [45].
Their insight was that the ADM-transverse-traceless (TT) gauge used by Schäfer was well-adapted to a puncture approach. To facilitate comparison with this earlier work [45], we continue to use the results of Schäfer and co-workers, anticipating that higher-order PN results should eventually become available in a useful form.

The initial data provided previously by Tichy et al. already include PN information. They are accurate up to order $(v/c)^5$ in the near zone ($r \ll \lambda$), but the accuracy drops to order $(v/c)^3$ in the far zone ($r \gg \lambda$) [here $\lambda \sim \pi \sqrt{r_{12}^3 / G(m_1 + m_2)}$ is the gravitational wavelength]. These data were incomplete in the sense that they did not include the correct TT radiative piece in the metric, and thus did not contain realistic gravitational waves.

In this paper, we revisit the PN data problem in ADM-TT coordinates, with the aim of supplying Numerical Relativity with initial BHB data that extend as far as necessary, and contain realistic gravitational waves. To do this, we have evaluated the “missing pieces” of Schäfer’s TT metric for the case of two non-spinning particles. We have analyzed the near- and far-zone behavior of these data, and incorporated them numerically in the Cactus [46] framework. In principle, the most accurate PN metric available could be used at this step, but it is not currently available in ADM-TT form.

The remainder of this paper is laid out as follows. In Section II, we summarize the results of Schäfer (1985) [43], and Jaranowski & Schäfer (1997) [44] and their application by Tichy et al. (2003) [45], to the production of puncture data for numerical evolution. In Section III, we describe briefly the additional terms necessary to complete $h^{TT}$ to order $(v/c)^4$, deferring details to the Appendix. In Section IV, we study the full data both analytically and numerically. Section V summarizes our results, and lays the groundwork for numerical evolution of these data, to be presented in a subsequent article.

II.
ADM-TT GAUGE IN POST-NEWTONIAN

The “ADM-TT” gauge [43, 47] is a 3+1 split of data where the three-metric differs from conformal flatness precisely by a TT radiative part:

$g_{ij} = \left(1 + \frac{1}{8}\phi\right)^4 \eta_{ij} + h^{TT}_{ij}$, (1)
$\pi^{i}{}_{i} = 0$. (2)

The fields $\phi$, $\pi^{ij}$ and $h^{TT}_{ij}$ can all be expanded in a post-Newtonian series. Solving the constraint equations of 3+1 general relativity in this gauge, [43, 44] obtained explicit expressions valid up to $O(v/c)^5$ in the near zone, incorporating an arbitrary number of spinless point particles, with arbitrary masses $m_A$. For N particles, the lowest-order contribution to the conformal factor is$^1$:

$\phi^{(2)} = 4G \sum_{A=1}^{N} \frac{m_A}{r_A}$, (3)

where $r_A = |\vec{x} - \vec{x}_A|$ is the distance from the field point to the location of particle A.

In principle $h^{TT}_{ij}$ is computed from

$h^{TT}_{ij} = -\delta^{TT\,kl}_{ij}\, \Box^{-1}_{ret}\, s_{kl}$, (4)

where $\Box^{-1}_{ret}$ is the (flat space) inverse d’Alembertian (with a “no-incoming-radiation” condition [48]), $s_{kl}$ is a non-local source term and $\delta^{TT\,kl}_{ij}$ is the TT-projection operator. In order to compute $h^{TT}_{ij}$ we first rewrite Eq. (4) as

$h^{TT}_{ij} = -\delta^{TT\,kl}_{ij} \left[ \Delta^{-1} + (\Box^{-1}_{ret} - \Delta^{-1}) \right] s_{kl} = h^{TT\,(NZ)}_{ij} - \delta^{TT\,kl}_{ij} (\Box^{-1}_{ret} - \Delta^{-1})\, s_{kl}$. (5)

Note that the near-zone approximation $h^{TT\,(NZ)}_{ij}$ of $h^{TT}_{ij}$ has already been computed in [43] up to order $O(v/c)^4$ (see also Eq. 12 below). The last term in Eq. (5) is difficult to compute because

$s_{kl} = 16\pi G \sum_A \frac{p_{Ak}\, p_{Al}}{m_A}\, \delta(x - x_A) + \frac{1}{4} \phi^{(2)}_{,k}\, \phi^{(2)}_{,l}$ (6)

is a non-local source. However, we can approximate $s_{kl}$ by

$\bar{s}_{kl} = \sum_A \left[ \frac{p_{Ak}\, p_{Al}}{m_A} - \frac{G}{2} \sum_{B \neq A} \frac{m_A m_B}{r_{AB}}\, n_{ABk}\, n_{ABl} \right] \times 16\pi G\, \delta(x - x_A)$, (7)

and show that

$h^{TT}_{ij,(div)} = -\delta^{TT\,kl}_{ij} (\Box^{-1}_{ret} - \Delta^{-1})(s_{kl} - \bar{s}_{kl}) \sim O(v/c)^5$ (8)

in the near zone. Furthermore, outside the near zone $h^{TT}_{ij,(div)} \sim 1/r^2$, so that $h^{TT}_{ij,(div)}$ falls off much faster than the rest of $h^{TT}_{ij}$, which falls off like 1/r.

$^1$ We explicitly include the gravitational constant G in all expressions here, as the standard convention G = 1 used in Numerical Relativity differs from the convention 16πG = 1 employed by [43, 44].
Hence hTTij = h TT (NZ) ij − δ TT kl ij (✷ ret −∆−1)s̄kl + hTTij,(div), (9) where hTT ij,(div) can be neglected if we only keep terms up to O(v/c)4 generally, and O(1/r) at infinity. The full expression for hTTij for N interacting point particles from Eq. (4.3) of [43] is: hTTij = h TT (NZ) ij + h ij,(div) + 16πG d3~k dω dτ (2 π)4 pAi pAj B 6=A nABi nABj × (ω/k) ~k·(~x−~xA)−i ω (t−τ) k2 − (ω + i ǫ)2 . (10) The first term in (10), h TT (NZ) ij can be expanded in v/c TT (NZ) ij = h TT (4) ij + h TT (5) ij +O(v/c) 6. (11) The leading order term at O(v/c)4, is given explicitly by Eq. (A20) of [44]: hTT (4)ij = mA rA ‖ ~pA ‖2 −5 (n̂A · ~pA)2 δij + 2 piA p 3(n̂A · ~pA)2 − 5 ‖ ~pA ‖2 niA n A + 12(n̂A · ~pA)n B 6=A niABn AB + 2 rA + rB niA n rABrA + 3rA rABrA , (12) where sAB ≡ rA + rB + rAB. The other two terms in Eq. (10) can be shown to be small in the near zone (r ≪ λ, where the characteristic wavelength λ ∼ 100M for rAB ∼ 10M). However, hTT (NZ)ij is only a valid ap- proximation to hTTij in the near zone, and becomes highly inaccurate when used further afield. Setting aside these far-field issues, Tichy et al. [45] ap- plied Schäfer’s formulation, in the context of a black-hole binary system, to construct initial data that are accurate up to O(v/c)5 in the near zone. They noted that the ADM-TT decomposition was well-adapted to the use of a puncture approach to handle black-hole singularities. This approach is essentially an extension of the method introduced in [10]. It allows a simple numerical treatment of the black holes without the need for excision. The PN-based puncture data of Tichy et al. have not been used for numerical evolutions. This is in part because these data, just like standard puncture data [10, 49, 50, 51], do not contain realistic gravitational waves in the far zone: h TT (NZ) ij does not even vaguely agree with the 2PN approximation to the waveform amplitude nor with the quadrupole approximation to the waveform phase for realistic inspiral. 
To illustrate this, we restrict to the case of two point sources, and compute the “plus” and “cross” polarizations of the near-zone approximation for $h^{TT}_{ij}$:

$$ h^{(NZ)}_{+} = h^{TT\,(NZ)}_{ij}\, e^{i}_{\theta}\, e^{j}_{\theta}, \tag{13} $$

$$ h^{(NZ)}_{\times} = h^{TT\,(NZ)}_{ij}\, e^{i}_{\theta}\, e^{j}_{\phi}. \tag{14} $$

For comparison, the corresponding polarizations of the quadrupole approximation for the gravitational-wave strain are given by (paraphrasing Eq. (3.4) of [52]):

$$ h_{+} = \frac{2 G \mathcal{M}}{r}\,(1+\cos^2\theta)\,(\pi G \mathcal{M} f_{GW})^{2/3} \cos(\Phi_{GW}), \tag{15} $$

$$ h_{\times} = \frac{4 G \mathcal{M}}{r}\,\cos\theta\,(\pi G \mathcal{M} f_{GW})^{2/3} \sin(\Phi_{GW}), \tag{16} $$

where $\mathcal{M} \equiv \nu^{3/5} M$ is the “chirp mass” of the binary, given in terms of the total PN mass of the system $M = m_1 + m_2$, and the symmetric mass ratio $\nu = m_1 m_2/M^2$. The angle $\theta$ is the “inclination angle of orbital angular momentum to the line of sight toward the detector”; that is, just the polar angle to the field point, when the binary moves in the x-y plane. $\Phi_{GW}$ and $f_{GW}$ are the phase and frequency of the radiation at time $t$, exactly twice the orbital phase $\Phi(t-r)$ and orbital frequency $\Omega(t-r)/2\pi$.

The lowest-order PN prediction for radiation-reaction effects yields a simple inspiral of the binary over time, with orbital phasing given by

$$ \Phi(\tau) = \Phi(t_c) - \frac{1}{\nu}\,\Theta^{5/8}, \tag{17} $$

$$ \Omega(\tau) = \frac{1}{8 G M}\,\Theta^{-3/8}, \tag{18} $$

where $\Theta \equiv \nu\,(t_c - \tau)/5GM$, $M$ and $\nu$ are given below (16), and $t_c$ is a nominal “coalescence time”. To evaluate (13-14), we need the transverse momentum $p$ corresponding to the desired separation $r_{12}$. The simplest expression for this is the classical Keplerian relation, which we give parameterized by $\Omega(\tau)$:

$$ r_{12} = G^{1/3} M (M\Omega)^{-2/3}, \tag{19} $$

$$ p = M \nu\,(G M \Omega)^{1/3}. \tag{20} $$

In Fig. 1 we compare the plus polarization of the two waveforms (13) and (15) at a field point $r = 100M$, $\theta = \pi/4$, $\phi = 0$, for a binary in the x-y plane, with initial separation $r_{12} = 10M$. The orbital frequency of the binary is related to the separation $r_{12}$ and momenta $p$ entering (13) by (19-20). To this level of approximation, the binary has a nominal PN coalescence time $t_c \approx 780M$. As might have been anticipated, both phase and amplitude of $h^{TT\,(4)}_{ij}$ are wrong outside the near zone.
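The numbers quoted for the Fig. 1 configuration can be checked directly from Eqs. (17)-(20). The following is a minimal sketch, not the authors' code: it assumes geometric units $G = c = 1$ with $M = 1$, an equal-mass binary ($\nu = 1/4$), and the standard leading-order forms $\Omega = \Theta^{-3/8}/(8GM)$ and $\Theta = \nu (t_c - \tau)/5GM$. All function names are illustrative.

```python
import math

def kepler_omega(r12, M=1.0):
    """Orbital frequency from the Keplerian relation, Eq. (19), with G = 1."""
    return (r12 / M) ** -1.5 / M

def kepler_momentum(omega, M=1.0, nu=0.25):
    """Transverse momentum from Eq. (20), with G = 1."""
    return M * nu * (M * omega) ** (1.0 / 3.0)

def coalescence_time(omega0, M=1.0, nu=0.25):
    """Invert Omega(0) = Theta**(-3/8) / (8 M) for t_c, using Theta(0) = nu * t_c / (5 M)."""
    theta0 = (8.0 * M * omega0) ** (-8.0 / 3.0)
    return 5.0 * M * theta0 / nu

def gw_wavelength(omega):
    """Gravitational wavelength: f_GW = Omega / pi, so lambda = pi / Omega (c = 1)."""
    return math.pi / omega

r12 = 10.0                      # initial separation in units of M
omega0 = kepler_omega(r12)
p = kepler_momentum(omega0)
t_c = coalescence_time(omega0)
lam = gw_wavelength(omega0)
print(f"Omega = {omega0:.5f}/M, p = {p:.5f} M, t_c = {t_c:.2f} M, lambda = {lam:.2f} M")
```

Under these assumptions the coalescence time comes out close to the quoted $t_c \approx 780M$, and the wavelength close to the $\lambda \sim 100M$ used to delimit the near zone, so the field point $r = 100M$ sits right at the edge of the near zone.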
This means that the data constructed from h TT (4) ij have the wrong wave content, but nevertheless these data are still accurate up to order (v/c)3 in the far zone. It is evident from the present-time dependence of (12) that it cannot actually contain any of the past history of an inspiralling binary. We would expect that a cor- rect “wave-like” contribution should depend rather on the retarded time of each contributing point source. It seems evident that the correct behavior must, in fact, be contained in the as-yet unevaluated parts of (10). The requisite evaluation is what we undertake in the next sec- tion. III. COMPLETING THE EVALUATION OF hTTij To move forward, we will simplify (10) and (12) to the case of only two particles. Then (10) reduces to: hTTij = h TT (NZ) ij + 16πG p1 i p1 j ~k ·(~x−~x1) + p2 i p2 j ~k ·(~x−~x2) − G n12i n12j ~k ·(~x−~x1) n21i n21j ~k ·(~x−~x2) · (ω/k) 2 e−i ω (t−τ) k2 − (ω + i ǫ)2 d3~k dω dτ (2 π)4 + hTTij,(div) (21) TT (NZ) ij +H +HTT2ij −HTT1ij Gm1m2 2 r12 −HTT2ij Gm1m2 2 r12 +hTTij,(div), (22) where HTTAij [~u] := 16πG d3~k dω (2 π)4 [ui uj ] (ω/k)2 k2 − (ω + i ǫ)2 ~k·(~x−~xA(τ)) e−i ω (t−τ). (23) Here, the “TT projection” is effected using the operator i := δ i − ki kj/k2. For an arbitrary spatial vector ~u, [ui uj ] TT = uc ud (P Pij P = ui uj + ki kj u(i kj) . (24) Details on the evaluation of these terms are presented in Appendix A. After calculation, we write the result as a sum of terms evaluated at the present field-point time t, the retarded time trA defined by t− trA − rA(trA) = 0, (25) and integrals between trA and t, TTA[~u] = H TTA[~u; t] +H TTA[~u; t TTA[~u; t A → t], (26) where the three parts are given by: 0 100 200 300 400 500 600 700 800 -0.0015 -0.001 -0.0005 0.0005 0.001 (quadrupole) (NZ) FIG. 1: Plus polarization of the quadrupole (black/solid) and near-zone (red/dashed) strains observed at field point r = 100M , θ = π/4, φ = 0. 
The binary orbits in the x-y plane, with initial separation r12 = 10M , and a nominal coalescence time tc ≈ 780M . Both phase and amplitude of h TT (4) very wrong outside the near zone. TTA[~u; t] = − rA(t) u2 − 5 (~u · n̂A)2 δi j + 2 ui uj + 3 (~u · n̂A)2 − 5 u2 niA n +12 (~u · n̂A)u(i nj)A , (27) TTA[~u; t −2 u2 + 2 (~u · n̂A)2 δi j + 4 ui uj + 2 u2 + 2 (~u · n̂A)2 niA n −8 (~u · n̂A)u(i nj)A , (28) TTA[~u; t A → t] = −G (t− τ) rA(τ)3 −5 u2 + 9 (~u · n̂A)2 δi j + 6 ui uj − 12 (~u · n̂A)u(i nj)A 9 u2 − 15 (~u · n̂A)2 niA n (t− τ)3 rA(τ)5 u2 − 5 (~u · n̂A)2 δi j + 2 ui uj − 20 (~u · n̂A)u(i nj)A −5 u2 + 35 (~u · n̂A)2 niA n . (29) In Fig. 2, we show the retarded times calculated for each particle, as measured at points along the x axis, for the same orbit as in Fig. 1. We also show the corre- sponding retarded times for a binary in an exactly cir- cular orbit. Since the small-scale oscillatory effect of the finite orbital radius would be lost by the overall linear trend, we have multiplied by the orbital radius. A. Reconciling with Jaranowski & Schäfer’s h TT (4) From the derivation above it is clear that hTTij includes retardation effects, so it will not depend solely on the present time. We might even expect that all “present- time” contributions should vanish individually, or should cancel out. It can be seen easily from (27) that the “t” part of the second and third terms of Eq. (22) exactly cancel out the “kinetic” part (first line) of Eq. (12). Thus, we can simply remove that line in Eq. (12), and use the 500 1000 1500 particle 1 (circular) particle 1 (inspiral) particle 2 (circular) particle 2 (inspiral) FIG. 2: Retarded times for particles 1 and 2, as measured by observers along the x axis at the initial time t = 0, for the binary of Fig. 1. To highlight the oscillatory effect of the finite-radius orbit on tr, we first divide by the average field distance r. “tr” part instead. 
One may similarly inquire whether the “t” parts of the fourth and fifth terms of Eq. (22) above, TT (pot,now) ij ≡ −H Gm1m2 2 r12 n̂12; t −HTT2ij Gm1m2 2 r12 n̂12; t , (30) also cancel the remaining, “potential” parts of Eq. (12). The answer is “not completely”; expanding in powers of 1/r, we find: TT (pot,4) ij + h TT (pot,now) G2m1m2 r12 16 r3 (3 + 14W 2 − 25W 4) δi j − 4 (1 + 5W 2)n12i n12j −5 (1 + 6W 2 − 7W 4)n1i n1j + 2W (7 + 9W 2) (n12i n1j + n12j n1i) +O(1/r4),(31) where W ≡ sin θ cos(φ − Φ(t)), and Φ(t) is the orbital phase of particle 1 at the present time t. That is, the “new” contribution cancels the 1/r and 1/r2 pieces of TT (4) ij entirely. In the far zone the result is thus smaller than the hTT ij,(div) term which we are ignoring everywhere, since it is small both in the near and the far zone [43]. We note here two general properties of the contribu- tions to the full hTTij . 1. In the near zone h TT (4) ij is the dominant term since all other terms arise from (✷−1ret − ∆−1)skl. Thus all other terms must cancel within the accuracy of the near-zone approximation. TT (4) ij is wrong far from the sources; thus, the new corrections should “cancel” h TT (4) ij entirely, far from sources. Note, however, that while hij = −✷−1retskl depends only on retarded time, its TT- projection hTTij = δ TT kl ij hkl has a more complicated causal structure; E.g. the finite time integral comes from applying the TT-projection. [Proof: Even if we had a source given exactly by s̄kl, h TT (4) ij would depend only the present time, hij would depend only on retarded time, and hTTij would (as we have computed) contain a finite time integral term.] Additionally, the full hTTij agrees well with quadrupole predictions, which we demonstrate in Section IV. IV. NUMERICAL RESULTS AND INVARIANTS A. 
Phasing and Post-Keplerian Relations

It has been known for some time (see for example [53]) that gravitational wave phase plays an even more important part in source identification than does wave amplitude. In PN work, phase and amplitude are estimated somewhat separately; the amplitude requires knowledge of the time-dependent multipoles, used in developing the full metric, while the phase can be relatively simply approximated from the orbital equations of motion, taking into account the gravitational wave flux at infinity to evolve the orbital parameters [54].

The quadrupole waveform introduced for the comparison in Fig. 1 had an amplitude accurate to $O(v/c)^4$ and the simplest available time evolution for the phase. Waveform phase is a direct consequence of orbital phase. To lowest order, we could have assumed a binary moving in a circular orbit (of zero eccentricity) since, up to 2PN order, we can have circular orbits, where the linear momentum, $p$, of each particle is related to the separation $r_{12}$ by, say, Eq. (24) of [45]. Nevertheless, circular orbits are physically unrealistic, since radiation reaction will lead to inspiral and merger of the particles, and Eqs. (17-18) already include leading-order radiation-reaction effects. Moreover, the phase errors that would accrue from using purely circular orbits grow with distance from the sources.

The calculations of Section III lead to waveform amplitudes that are accurate at $O(v/c)^4$ everywhere. However, we desire that our initial-data wave content already encode the phase as accurately as possible. Highly accurate phase for our initial data (via $h^{TT}$), and hence in the leading edge of the waveforms we would extract from numerical evolution, is critical for parameter estimation following a detection.
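To get a feel for how the circular-orbit phase error grows with distance, one can compare the leading-order inspiral phase of Eq. (17) with a constant-frequency phase at the approximate retarded time $\tau \approx -r$. This is only a sketch under stated assumptions: $G = c = 1$, $M = 1$, $\nu = 1/4$, the standard leading-order forms $\Phi(\tau) = \Phi(t_c) - \Theta^{5/8}/\nu$ and $\Omega(\tau) = \Theta^{-3/8}/(8M)$, the Keplerian value $t_c = 781.25M$ for $r_{12} = 10M$, and the finite orbital radius neglected in the retarded time. The function names are hypothetical.

```python
M, NU = 1.0, 0.25   # total mass and symmetric mass ratio (equal masses assumed)
T_C = 781.25        # leading-order coalescence time for r12 = 10 M (assumed)

def theta(tau):
    """Dimensionless time to coalescence, Theta = nu (t_c - tau) / (5 M)."""
    return NU * (T_C - tau) / (5.0 * M)

def omega_inspiral(tau):
    """Orbital frequency along the leading-order inspiral, Eq. (18)."""
    return theta(tau) ** (-3.0 / 8.0) / (8.0 * M)

def phase_inspiral(tau):
    """Orbital phase from Eq. (17), measured relative to its value at tau = 0."""
    return -(theta(tau) ** (5.0 / 8.0) - theta(0.0) ** (5.0 / 8.0)) / NU

def phase_circular(tau):
    """Orbital phase for a circular orbit frozen at the tau = 0 frequency."""
    return omega_inspiral(0.0) * tau

def phase_drift(r):
    """Inspiral-minus-circular orbital phase at the approximate retarded time tau = -r."""
    tau = -r
    return phase_inspiral(tau) - phase_circular(tau)

for r in (100.0, 500.0, 1500.0):
    print(f"r = {r:6.0f} M: orbital phase drift = {phase_drift(r):.3f} rad")
```

The drift vanishes at the sources and grows monotonically with $r$, because the inspiralling binary was orbiting more slowly in the past than the frozen circular orbit assumes. The waveform phase error is twice the orbital value.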
For demonstrative purposes, in this section, we will restrict ourselves to the simplest phasing relations con- sistent with radiation-reaction inspiral as given by Eqs. (17-18), while using higher-order PN expressions than Eqs. (19 -20) for relating the orbit to the phase. For ex- ample, from [55], we have found to second PN (beyond leading) order: r12(Ω) = (GMΩ)−2/3 − (3− ν) − (18− 81ν − 8ν (GMΩ)2/3, (32) = (GMΩ)1/3 + (15− ν) (GMΩ) (441− 324ν − ν2) (GMΩ)5/3, (33) and we note that higher-order equivalents of these can be computed from [56]. In the numerical construction of initial data, the pri- mary input is the coordinate separation of the holes. In placing the punctures on the numerical grid, the separa- tion must be maintained exactly. To ensure this, we in- vert Eq. (32) to obtain the exact Ωr corresponding to our desired r12. Then we use Eq. (18) with t = 0 to find the coalescence time tc that yields this Ωr. Once we have ob- tained tc, we then find the orbital phase Φ and frequency Ω at any source time τ directly from Eqs. (17-18), and the corresponding separation r12 and momentum p from Eqs. (32-33), or their higher-order equivalents. In Fig. 3, we show a representative component of the retarded-time part of hTTij for both circular and leading- order inspiral orbits. For both orbits, we use the ex- tended Keplerian relations (32) and (33); otherwise the orbital configuration is that of Fig. 1. The coalescence time is now tc ∼ 1100M . We can see that the cumu- lative wavelength error of the circular-orbit assumption becomes very large at large distances from the sources. This demonstrates that using inspiral orbits instead of circular orbits will significantly enhance the phase accu- racy of the initial data, even though circular orbits are in principle sufficient when we include terms only up to O(v/c)4 as done in this work. From now on we use only inspiral orbits. 0 500 1000 1500 circular inspiral FIG. 
3: The xx component of the full $h^{TT}_{ij}$ for a binary with initial separation $r_{12} = 10M$ in a circular (black/solid) or inspiralling (red/dashed) orbit. Both fields have been rescaled by the observer radius $r = z$ to compensate for the leading $1/r$ fall-off. The orbital configuration is the same as for Fig. 1, apart from the Keplerian relations, where we have used the higher-order relations (32-33), yielding $t_c \sim 1100M$. Note the frequency broadening at more distant field points.

FIG. 4: Plus and cross polarizations of the strain observed at field point $r = 100M$, $\theta = \pi/4$, $\phi = 0$. Both the quadrupole-approximation waveform (black/solid and green/dot-dashed) and the full (red/dashed and blue/dotted) waveforms coming from $h^{TT}_{ij}$ are shown. The orbital configuration is the same as for Fig. 1.

Next, we compare our full waveform $h^{TT}_{ij}$ (expressed as the combinations $h_+$ and $h_\times$) at an intermediate-field position ($r = 100M$, $\theta = \pi/4$, $\phi = 0$) to the lowest-order quadrupole result. In Fig. 4, the orbital configuration is the same as for Fig. 1. As one can see, both the $+$ and $\times$ polarizations of our $h^{TT}_{ij}$ agree very well with quadrupole results, as they should. We demonstrate the near- and intermediate-zone behavior of the new data on the initial spatial slice in Fig. 5. The quadrupole and full solutions agree very well outside $\sim 100M$. However, the full solution’s phase and amplitude approach the NZ solution closer to the sources.

FIG. 5: Plus and cross polarizations of the strain observed at $t = 0$ along the z axis. We show the near-zone (solid/black), the quadrupole (dashed/red) and full (dot-dashed/green) waveforms. All waveforms have been rescaled by the observer radius $r = z$ to compensate for the leading $1/r$ fall-off. The orbital configuration is the same as for Fig. 1.

B.
Numerical Implementation After having confirmed that we have a PN three-metric gij that is accurate up to errors of order O(v/c) 5, and that correctly approaches the quadrupole limit outside the near zone, we are now ready to construct initial data for numerical evolutions. In order to do so, we need the intrinsic curvature Kij , which can be computed as in Tichy et al. [45] from the conjugate momentum. The difference is that here we use the full ḣTTij instead of the near-zone approximation ḣ TT (4) ij to obtain the conjugate momentum [43]. The result is Kij = −ψ−10PN ḣTTij + (φ(2)π̃ +O(v/c)6, (34) where the error term comes from neglecting terms like ij,(div) at O(v/c)5 in hTTij , and where ψPN , π̃ φ(2) can be found in Tichy et al. [45]. An additional difference is that the time derivative of hTTij is evaluated numerically in this work. Note that the results for gij are accurate up to O(v/c)4, while the results for Kij are 0 0.5 1 1.5 = 10M = 20M = 50M = 100M 0 0.5 1 1.5 FIG. 6: Upper panel: Hamiltonian constraint violation along the y axis of our new data in the near zone, as a function of binary separation r12. Lower panel: Momentum constraint (y-component) violation of the same data along the x axis. The orbital configuration is that of Fig. 3. Distances have been scaled relative to r12, so that the punctures are initially at y/r12 = ±0.5. accurate up O(v/c)5, because Kij contains an additional time derivative [45, 57, 58]. Next we show the violations of the Hamiltonian and momentum constraints computed from gij and Kij , as functions of the binary separation r12. As we can see in both panels of Fig. 6, the constraints become smaller for larger separations, because the post-Newtonian approxi- mation gets better. Note that, as in [45], the constraint violation remains finite everywhere, and is largest near each black hole. C. 
Curvature Invariants and Asymptotic Flatness

In analysis of both initial and evolved data, it is often instructive to investigate the behavior of scalar curvature invariants, as these give some idea of the far-field properties of our solution. We expect, for an asymptotically flat space-time, that in the far field, the speciality index $S \equiv 27 J^2/I^3$ will be close to unity. This can be seen from the following arguments. Let us choose a tetrad such that the Weyl tensor components $\psi_1$ and $\psi_3$ are both zero. Further, we assume that in the far field $\psi_0$ and $\psi_4$ are both perturbations of order $\epsilon$ off a Kerr background. Then

$$ S \approx 1 - 3\,\frac{\psi_0 \psi_4}{\psi_2^2} + O(\epsilon^3), \tag{35} $$

which is indeed close to one. Note, however, that this argument only works if the components of the Weyl tensor obey the peeling theorem, such that $\psi_2 \sim O(r^{-3})$, $\psi_0 \sim O(r^{-5})$ and $\psi_4 \sim O(r^{-1})$. In particular, if $\psi_0$ falls off more slowly than $O(r^{-5})$, $S$ will grow for large $r$. Now observe that $\psi_0 \sim O(r^{-5}) \sim M^3/r^5$ is formally of $O(v/c)^6$. Thus, in order to see the expected behavior of $S \approx 1$ in the far field we need to go to $O(v/c)^6$. If we only go to $O(v/c)^4$ (as done in this work) $\psi_0$ consists of uncontrolled remainders only, which should in principle be dropped. When we numerically compute $S$ we find that for our data, $S$ deviates further and further from unity for large distances from the binary. This reflects the fact that the so-called “incoming” Weyl scalar $\psi_0$ only falls off as $1/r^3$, due to uncontrolled remainders at $O(v/c)^6$, which arise from a mixing of the background with the TT waveform.

V. DISCUSSION AND FUTURE WORK

Exploring and validating PN inspiral waveforms is crucially important for gravitational-wave detection and for our theoretical understanding of black-hole binaries. Our goal has been to provide a step forward in this understanding by building a direct interface between the PN approach and numerical evolution, along the lines initially outlined in Ref. [45].
In this paper we have essentially completed the calculation of the transverse-traceless part of the ADM-TT metric to $O(v/c)^4$ provided in [45], yielding data that, on the initial Cauchy slice, will describe the space-time into the far-field. We have incorporated this formulation into a numerical initial-data routine adapted to the “puncture” topology that has been so successful recently, and have explored these data’s numerical properties on the initial slice.

Our next step is to evolve these data with moving punctures, and investigate how the explicit incorporation of post-Newtonian waveforms in the initial data affects both the ensuing slow binary inspiral of the sources and the release of radiation from the system. We note especially that our data are non-conformally flat beyond $O(v/c)^3$. We expect our data to incorporate smaller unphysical initial distortions in the black holes than is possible with conformal flatness, and hence less spurious gravitational radiation during the numerical evolution. We see this as a very positive step toward providing further validation of numerical relativity results for multiple-orbit simulations, since it permits comparison with PN results where they are expected to be reliable. Our initial data will also allow us to fully evaluate the validity of PN results for merging binaries by enabling comparison with the most accurate numerical relativity results.

We expect that further development of these data will certainly involve the use of more accurate orbital phasing information than the leading order given by Eqs. (17-18). This information is available in radiative coordinates (see, e.g. Eq. (6.29) of [59]) appropriate for far-field evaluation of the gravitational radiative modes; it may be possible to produce them in ADM-TT coordinates via a contact transformation, or by direct calculation (see, e.g. [60]).
For initial separations similar to the fiducial test case of this paper, $r_{12} = 10M$, the order necessary for clean matching of the initial wave content with the new radiation generated in evolution should not be particularly high [26]. As noted, the Keplerian relations Eqs. (32-33) can easily be extended to higher PN order.

The data presented already allow for arbitrary initial mass ratios $\nu$; this introduces the possibility of significant gravitational radiation in odd-$l$ multipoles, together with associated phenomena, such as in-plane recoil “kicks”. An interesting future development of these data will be the inclusion of spin angular momenta on the pre-merger holes. This will open our initial-data prescription to describing an even richer spectrum of binary radiation.

Acknowledgments

We would like to thank L. Blanchet and G. Schäfer for generous assistance and helpful discussion. M.C., B.K. and B.W. gratefully acknowledge the support of the NASA Center for Gravitational Wave Astronomy (NAG5-13396). M.C. and B.K. also acknowledge the NSF for financial support under grants PHY-0354867 and PHY-0722315. B.K. also acknowledges support from the NASA Postdoctoral Program at the Oak Ridge Associated Universities. The work of W.T. was supported by NSF grant PHY-0555644. W.T. also acknowledges partial support from the NCSA under Grant PHY-060040T. The work of B.W. was also supported by NSF grants PHY-0245024 and PHY-0555484.

APPENDIX A: DETAILS OF INTEGRAL CALCULATION

Here we present some more details of the calculations that lead to the three contributions to Eq. (23): Eqs. (27-29). Inserting Eq.
(24) in the general integral (23), we can write H TTA[~u] as a combination of scalar and tensor terms: HTTAij [~u] = 16πG ui uj − Iij A + [uc ud IcdA δij 2 uc u(i Icj)A + [uc ud I cdij A ,(A1) where the “I” integrals are defined as: d3~k dω (2 π)4 (ω/k)2 ei k rA cos θ−i ω T k2 − (ω + i ǫ)2 , (A2) d3~k dω (2 π)4 ki kj × (ω/k) 2 ei k rA cos θ−i ω T k2 − (ω + i ǫ)2 , (A3) i j c d d3~k dω (2 π)4 ki kj kc kd × (ω/k) 2 ei k rA cos θ−i ω T k2 − (ω + i ǫ)2 . (A4) Here T ≡ t − τ , and ~rA ≡ ~x − ~xA. We have also taken our integration coordinates such that ~rA lies in the z di- rection, so that the dummy momentum vector ~k satisfies ~k · ~rA = k rA cos θ, (A5) d3~k = k2 dk sin θ dθ dφ. (A6) Define the unit orthogonal vectors n̂A ≡ (0, 0, 1) , ℓ̂ ≡ (cosφ, sinφ, 0). Then we can write ~k = k cos θ n̂A + k sin θ ℓ̂ ⇒ ~k · ~rA = rA ~k · n̂A. We can also define a projector tensor onto ℓ̂: Qa b ≡ δa b − na nb ⇒ Qab = δab − na nb ⇒ QacQcb = Qab , Qab nb = 0 , Qab ℓb = ℓa. 1. Angular integration We will neglect the A subscript for now, until it be- comes relevant again. To calculate the integrals (A2-A4), we begin with the φ integration. The only φ dependence comes from the ~ℓ parts of the ~k terms. It can be seen from elementary trigonometric integrals that: dφ ℓa = dφ ℓa ℓb ℓc = 0, dφ ℓa ℓb = π Qa b, dφ ℓa ℓb ℓc ℓd = Qa bQc d +Qa cQb d +Qa dQb c We use these to calculate the φ integrals for Ia bA and Ia b c dA . Define w ≡ cos θ. Then dφ 1 = 2 π, ka kb = 2 π w2 na nb + π (1− w2)Qa b, ka kb kc kd = 2 π w4 na nb nc nd +6 πw2 (1− w2)Q(a b nc nd) (1− w2)2Q(a bQc d). So the next integrals will differ in their θ dependence, contained in the powers of w above. The θ integrals will contain the following basic types: g0(a) ≡ dw eaw = 2 sinh a , (A7) g2(a) ≡ dw w2 eaw = 2 sinh a − 4 cosha sinh a g4(a) ≡ dw w4 eaw = 2 sinh a − 8 cosha sinh a − 48 cosha sinh a . 
(A9) Now Ia b and Ia b c d can be written as the linear combi- nations: Ia b = Qa b I (na nb − 1 Qa b)K , (A10) Ia b c d = na nb nc nd − 3Q(a b nc nd) + 3 Q(a bQc d) 3Q(a b nc nd) − Q(a bQc d) Q(a bQc d) I . (A11) I here can be expressed in terms of g0(a) above: (2 π)3 (ω/k)2 k2 − (ω + i ǫ)2 e i k r cos θ−i ω T (2 π)3 ω2 e−i ω T k2 − (ω + i ǫ)2 g0(i k r) (2 π)3 ω2 e−i ω T J0. (A12) The 1/2 factor is because we moved to integrating k over the whole real line instead of the positive half-line (this is permissible as gn(a) is an even function of a). K and L are defined analogously to I, but with extra even powers of cos θ = w: (2 π)3 (ω/k)2 k2 − (ω + i ǫ)2 ei k r cos θ−i ω T cos2 θ (2 π)3 ω2 e−i ω T k2 − (ω + i ǫ)2 g2(i k r) (2 π)3 ω2 e−i ω T J2, (A13) (2 π)3 (ω/k)2 k2 − (ω + i ǫ)2 ei k r cos θ−i ω T cos4 θ (2 π)3 ω2 e−i ω T k2 − (ω + i ǫ)2 g4(i k r) (2 π)3 ω2 e−i ω T J4. (A14) 2. Momentum integration Now we address the k integrals, defined as: dk fn(k) = dk f+n (k) + dk f−n (k), where we collect the positive exponents in the gn in the integrand of f+n (k), and the negative exponents in f n (k): f+n (k) ≡ g+n (i k r)/2 k2 − (ω + i ǫ)2 , f−n (k) ≡ g−n (i k r)/2 k2 − (ω + i ǫ)2 We calculate this as the sum of contour integrals of the “plus” and “minus” integrands (necessary, as the oppo- site signs require different contours). Each of these has poles at k = 0, k = k+ ≡ ω + i ǫ, and k = k− ≡ −ω − i ǫ (the first of these is from the gn). We integrate the “plus” integrands anticlockwise around the contour C1, and the “minus” integrands anticlockwise around the contour C2 (see Fig. 7); taking the limit |k| → ∞, the contribu- tion from the curved segments vanishes, and the residue theorem gives us: Jn = 2 π iRes[f n , k+]− 2 π iRes[f−n , k−] +π iRes[f+n , 0]− π iRes[f−n , 0]. 
(A15) Calculating the residues, we find the values of each of the Jn: π ei r (ω+i ǫ) r (ω + i ǫ)2 r (ω + i ǫ)2 , (A16) π ei r (ω+i ǫ) r (ω + i ǫ)2 π ei r(ω+i ǫ) [−2 + 2 i r (ω + i ǫ)] r3 (ω + i ǫ)4 r3 (ω + i ǫ)4 , (A17) π ei r (ω+i ǫ) r (ω + i ǫ)2 4π ei r(ω+i ǫ) r5 (ω + i ǫ)6 [6− 6 i r (ω + i ǫ) −3 r2 (ω + i ǫ)2 + i r3 (ω + i ǫ)3 − 24 π r5 (ω + i ǫ)6 . (A18) 3. Frequency integration Now we perform the ω integration. Inserting the re- sults (A16-A18) into (A12-A14) respectively, we see that each of I, K and L contains a delta function, which we can extract: 4 π r [δ(T − r) − δ(T )], 4 π r δ(T − r) + e−r ǫ (2 π)3 e−iω (T−r) F2a(ω) (2 π)3 e−i ω T F2b(ω), 4 π r δ(T − r) + e−r ǫ (2 π)3 e−iω (T−r) F4a(ω) (2 π)3 e−i ω T F4b(ω), where the new terms on the right-hand side come from the Jn above, grouped by exponential, as that is what determines the contours chosen during integration (see Fig. 7): F2a(ω) = π ω2 [−2 + 2 i r (ω + i ǫ)] r3 (ω + i ǫ)4 F2b(ω) = 2 π ω2 r3 (ω + i ǫ)4 F4a(ω) = r5 (ω + i ǫ)6 [24− 24 i r (ω + i ǫ) −12 r2 (ω + i ǫ)2 + 4 i r3 (ω + i ǫ)3 F4b(ω) = − 24 π ω2 r5 (ω + i ǫ)6 Now the residues are as follows (taking the ǫ→ 0 limit): e−iω (T−r) F2a(ω),−i ǫ 2 π i T e−iω T F2b(ω),−i ǫ = −2 π i T e−iω (T−r) F4a(ω),−i ǫ 4 π i T 3 e−iω T F4b(ω),−i ǫ = −4 π i T The only pole is at ω = −i ǫ, so if we can close the contour in the upper half-plane, we will get zero. • For T < 0, both the “a” and “b” integrals can be closed in C1. Result: zero contribution. • For 0 < T < r, the “a” integrals can be closed in C1, but the “b” integrals must be closed in C2. Result: “b” contribution. • For T > r, both the “a” and “b” integrals must be closed in C2. But then the “a” and “b” residues cancel out. Result: zero contribution. Thus the only interesting contribution happens in the interval 0 < T < r ⇔ t− r(τ) < τ < t. In this case, the final integrals yield (2 π)3 e−iω (T−r) F2b(ω) = − 2 π r3 (2 π)3 e−iω (T−r) F4b(ω) = − −ω − iǫ ω + iǫ FIG. 
7: Contours needed to complete integration over k (left) and ω (right). leading to the final result for K and L: 4 π r δ(T − r)− 1 4 π r δ(T ), 4 π r δ(T − r)−Θ(T )Θ(r − T ) T 2 π r3 4 π r δ(T − r)−Θ(T )Θ(r − T ) T We use these to calculate the Ii j and Ii j k l: Ii j = ni nj 4 π r δ(T − r)−Θ(T )Θ(r − T ) T 2 π r3 4 π r δ(T ) + Θ(T )Θ(r − T ) T 2 π r3 ,(A19) Ii j k l = ni nj nk nl 4 π r δ(T − r) −Θ(T )Θ(r − T ) T − 3Q(i j nk nl) Θ(T )Θ(r − T ) 2 π r3 Q(i j Qk l) 4 π r δ(T ) + Θ(T )Θ(r − T ) . (A20) 4. Time integration The final integrations will be over the source time τ . The “crossing times” for the two Θ functions are τ = t and τ = tr, where t is the present field time, and tr the corresponding retarded time defined by (25). Now taking a general function y(τ), we find that dτ IA y(τ) = y(trA) 4 π rA(t − y(t) 4 π rA(t) A y(τ) = niA n 4 π rA 4 π rA 3niA n A − δ ) (t− τ) y(τ) 4 π rA(τ)3 i j k l A y(τ) = niA n 4 π rA 4 π rA −3Q(i jA n (t− τ) 2 π rA(τ)3 −niA n A + 3Q (t− τ)3 π rA(τ)5 y(τ). These can now be substituted into the general integral (A1). We write the result as a sum of terms at the present field-point time t, the retarded time trA, and interval terms between them, TTA[~u] = H TTA[~u; t] +H TTA[~u; t A] +H TTA[~u; t A → t], TTA[~u; t] = − rA(t) ui uj − u [uk ul Qk lA δ 2 uk u [uk ul TTA[~u; t ui uj − u niA n [uk ul nkA n 2 uk u [uk ul niA n TTA[~u; t A → t] = −4G (t− τ) rA(τ)3 3niA n A − δ [uk ul 3nkA n A − δk l 2 uk u A − δj) k [uk ul (t− τ)3 rA(τ)5 [uk ul niA n A − 3Q [1] R. Vogt, in Sixth Marcel Grossman Meeting on General Relativity (Proceedings, Kyoto, Japan, 1991), edited by H. Sato and T. Nakamura (World Scientific, Singapore, 1992), pp. 244–266. [2] B. Abbott et al. (LIGO Scientific), Nucl. Instrum. Meth. A517, 154 (2004), gr-qc/0308043. [3] P. Bender et al., Tech. Rep. MPQ 233, Max- Planck-Institut für Quantenoptik (1998), URL: http://www.lisa-science.org/resources/talks-articles/mission/prephasea.pdf. [4] K. Danzmann and A. Rudiger, Class. 
Quantum Grav. 20, S1 (2003).
[5] A. Buonanno, G. B. Cook, and F. Pretorius, Phys. Rev. D 75, 124018 (2007), gr-qc/0610122.
[6] E. Berti, V. Cardoso, J. A. Gonzalez, U. Sperhake, M. Hannam, S. Husa, and B. Brügmann (2007), gr-qc/0703053.
[7] F. Pretorius, Phys. Rev. Lett. 95, 121101 (2005), gr-qc/0507014.
[8] M. Campanelli, C. O. Lousto, P. Marronetti, and Y. Zlochower, Phys. Rev. Lett. 96, 111101 (2006), gr-qc/0511048.
[9] J. G. Baker, J. Centrella, D.-I. Choi, M. Koppitz, and J. van Meter, Phys. Rev. Lett. 96, 111102 (2006), gr-qc/0511103.
[10] S. Brandt and B. Brügmann, Phys. Rev. Lett. 78, 3606 (1997), gr-qc/9703066.
[11] M. Shibata and T. Nakamura, Phys. Rev. D 52, 5428 (1995).
[12] T. Baumgarte and S. Shapiro, Phys. Rev. D 59, 024007 (1999), gr-qc/9810065.
[13] B. Brügmann, W. Tichy, and N. Jansen, Phys. Rev. Lett. 92, 211101 (2004), gr-qc/0312112.
[14] M. Campanelli, C. O. Lousto, and Y. Zlochower, Phys. Rev. D 73, 061501(R) (2006), gr-qc/0601091.
[15] F. Pretorius, Class. Quantum Grav. 23, S529 (2006), gr-qc/0602115.
[16] J. G. Baker, J. Centrella, D.-I. Choi, M. Koppitz, and J. van Meter, Phys. Rev. D 73, 104002 (2006), gr-qc/0602026.
[17] B. Brügmann et al. (2006), gr-qc/0610128.
[18] M. A. Scheel et al., Phys. Rev. D 74, 104006 (2006), gr-qc/0607056.
[19] P. Marronetti, W. Tichy, B. Brügmann, J. Gonzalez, M. Hannam, S. Husa, and U. Sperhake, Class. Quantum Grav. 24, S43 (2007), gr-qc/0701123.
[20] W. Tichy, Phys. Rev. D 74, 084005 (2006), gr-qc/0609087.
[21] H. P. Pfeiffer, D. A. Brown, L. E. Kidder, L. Lindblom, G. Lovelace, and M. A. Scheel, Class. Quantum Grav. pp. S59–S81 (2007), gr-qc/0702106.
[22] J. G. Baker, M. Campanelli, F. Pretorius, and Y. Zlochower, Class. Quantum Grav. 24, S25 (2007), gr-qc/0701016.
[23] J. Thornburg, P. Diener, D. Pollney, L. Rezzolla, E. Schnetter, E. Seidel, and R. Takahashi, Class. Quantum Grav. 24, 3911 (2007), gr-qc/0701038.
[24] NRwaves home page: https://gravity.psu.edu/wiki NRwaves. [25] J. G. Baker, S. T. McWilliams, J. R. van Meter, J. Cen- trella, D.-I. Choi, B. J. Kelly, and M. Koppitz, Phys. Rev. D 75, 124024 (2007), gr-qc/0612117. [26] J. G. Baker, J. R. van Meter, S. T. McWilliams, J. Cen- trella, and B. J. Kelly (2006), gr-qc/0612024. [27] M. Campanelli, Class. Quant. Grav. 22, S387 (2005), astro-ph/0411744. [28] F. Herrmann, D. Shoemaker, and P. Laguna (2006), gr- qc/0601026. [29] J. G. Baker et al., Astrophys. J. 653, L93 (2006), astro- ph/0603204. [30] M. Campanelli, C. O. Lousto, and Y. Zlochower, Phys. Rev. D 74, 041501(R) (2006), gr-qc/0604012. [31] M. Campanelli, C. O. Lousto, and Y. Zlochower, Phys. Rev. D 74, 084023 (2006), astro-ph/0608275. [32] M. Campanelli, C. O. Lousto, Y. Zlochower, B. Krishnan, and D. Merritt, Phys. Rev. D 75, 064030 (2007), gr- qc/0612076. [33] J. A. Gonzalez, U. Sperhake, B. Bruegmann, M. Han- nam, and S. Husa, Phys. Rev. Lett. 98, 091101 (2007), gr-qc/0610154. [34] F. Herrmann, I. Hinder, D. Shoemaker, P. Laguna, and R. A. Matzner (2007), gr-qc/0701143. [35] M. Campanelli, C. O. Lousto, Y. Zlochower, and D. Mer- ritt, 659, L5 (2007), revised version has very different numbers/formulae, gr-qc/0701164. [36] M. Koppitz, D. Pollney, C. Reisswig, L. Rezzolla, J. thornburg, P. Diener, and E. Schnetter, Phys. Rev. Lett. 99, 041102 (2007), gr-qc/0701163. [37] J. A. Gonzalez, M. D. Hannam, U. Sperhake, B. Brügmann, and S. Husa, Phys. Rev. Lett. 98, 231101 (2007), gr-qc/0702052. [38] D.-I. Choi et al. (2007), gr-qc/0702016. [39] J. G. Baker et al. (2007), astro-ph/0702390. [40] F. Pretorius and D. Khurana, Class. Quant. Grav. 24, S83 (2007), gr-qc/0702084. [41] M. Campanelli, C. O. Lousto, Y. Zlochower, and D. Mer- ritt, Phys. Rev. Lett. 98, 231102 (2007), arXiv:gr- qc/0702133. [42] W. Tichy and P. Marronetti (2007), gr-qc/0703075. [43] G. Schäfer, Ann. Phys. 161, 81 (1985). [44] P. Jaranowski and G. Schäfer, Phys. Rev. 
D 57, 7274 (1998), errata: Phys. Rev. D 63, 029902(E) (2000), gr- qc/9712075. [45] W. Tichy, B. Brügmann, M. Campanelli, and P. Diener, Phys. Rev. D 67, 064008 (2003), gr-qc/0207011. [46] Cactus Compuational Toolkit, http://www.cactuscode.org. [47] T. Ohta, H. Okamura, T. Kimura, and K. Hiida, Prog. Theor. Phys. 51, 1598 (1974). [48] V. A. Fock, The Theory of Space, Time and Gravitation, 2nd ed. (Pergamon Press, 1964). [49] W. Tichy, B. Brügmann, and P. Laguna, Phys. Rev. D 68, 064008 (2003), gr-qc/0306020. [50] W. Tichy and B. Brügmann, Phys. Rev. D 69, 024006 (2004), gr-qc/0307027. [51] M. Ansorg, B. Brügmann, and W. Tichy, Phys. Rev. D 70, 064011 (2004), gr-qc/0404056. [52] L. S. Finn and D. F. Chernoff, Phys. Rev. D 47, 2198 (1993), gr-qc/9301003. [53] C. Cutler et al., Phys. Rev. Lett. 70, 2984 (1993), astro- ph/9208005. [54] W. Tichy, E. E. Flanagan, and E. Poisson, Phys. Rev. D 61, 104015 (2000), gr-qc/9912075. [55] G. Schäfer and N. Wex, Phys. Lett. A 174, 196 (1993). [56] R.-M. Memmesheimer, A. Gopakumar, and G. Schäfer, Phys. Rev. D 70, 104011 (2004), gr-qc/0407049. [57] N. Yunes, W. Tichy, B. J. Owen, and B. Brügmann, Phys. Rev. D 74, 104011 (2006), gr-qc/0503011. [58] N. Yunes and W. Tichy, Phys. Rev. D 74, 064013 (2006), gr-qc/0601046. [59] L. Blanchet, Phys. Rev. D 54, 1417 (1996), gr- qc/9603048. [60] T. Damour, A. Gopakumar, and B. R. Iyer, Phys. Rev. D 70, 064028 (2005), gr-qc/0404128. http://www.cactuscode.org ABSTRACT We present improved post-Newtonian-inspired initial data for non-spinning black-hole binaries, suitable for numerical evolution with punctures. We revisit the work of Tichy et al. [W. Tichy, B. Bruegmann, M. Campanelli, and P. Diener, Phys. Rev. D 67, 064008 (2003)], explicitly calculating the remaining integral terms. These terms improve accuracy in the far zone and, for the first time, include realistic gravitational waves in the initial data. 
We investigate the behavior of these data both at the center of mass and in the far zone, demonstrating agreement of the transverse-traceless parts of the new metric with quadrupole-approximation waveforms. These data can be used for numerical evolutions, enabling a direct connection between the merger waveforms and the post-Newtonian inspiral waveforms. <|endoftext|><|startoftext|> CLNS 07/1989 CLEO 07-01 Measurement of the Decay Constant f using D+ → ℓ+ν M. Artuso,1 S. Blusk,1 J. Butt,1 S. Khalil,1 J. Li,1 N. Menaa,1 R. Mountain,1 S. Nisar,1 K. Randrianarivony,1 R. Sia,1 T. Skwarnicki,1 S. Stone,1 J. C. Wang,1 G. Bonvicini,2 D. Cinabro,2 M. Dubrovin,2 A. Lincoln,2 D. M. Asner,3 K. W. Edwards,3 P. Naik,3 R. A. Briere,4 T. Ferguson,4 G. Tatishvili,4 H. Vogel,4 M. E. Watkins,4 J. L. Rosner,5 N. E. Adam,6 J. P. Alexander,6 D. G. Cassel,6 J. E. Duboscq,6 R. Ehrlich,6 L. Fields,6 L. Gibbons,6 R. Gray,6 S. W. Gray,6 D. L. Hartill,6 B. K. Heltsley,6 D. Hertz,6 C. D. Jones,6 J. Kandaswamy,6 D. L. Kreinick,6 V. E. Kuznetsov,6 H. Mahlke-Krüger,6 D. Mohapatra,6 P. U. E. Onyisi,6 J. R. Patterson,6 D. Peterson,6 J. Pivarski,6 D. Riley,6 A. Ryd,6 A. J. Sadoff,6 H. Schwarthoff,6 X. Shi,6 S. Stroiney,6 W. M. Sun,6 T. Wilksen,6 S. B. Athar,7 R. Patel,7 J. Yelton,7 P. Rubin,8 C. Cawlfield,9 B. I. Eisenstein,9 I. Karliner,9 D. Kim,9 N. Lowrey,9 M. Selen,9 E. J. White,9 J. Wiss,9 R. E. Mitchell,10 M. R. Shepherd,10 D. Besson,11 T. K. Pedlar,12 D. Cronin-Hennessy,13 K. Y. Gao,13 J. Hietala,13 Y. Kubota,13 T. Klein,13 B. W. Lang,13 R. Poling,13 A. W. Scott,13 A. Smith,13 P. Zweber,13 S. Dobbs,14 Z. Metreveli,14 K. K. Seth,14 A. Tomaradze,14 J. Ernst,15 K. M. Ecklund,16 H. Severini,17 W. Love,18 V. Savinov,18 O. Aquines,19 A. Lopez,19 S. Mehrabyan,19 H. Mendez,19 J. Ramirez,19 G. S. Huang,20 D. H. Miller,20 V. Pavlunin,20 B. Sanghi,20 I. P. J. Shipsey,20 B. Xin,20 G. S. Adams,21 M. Anderson,21 J. P. Cummings,21 I. Danko,21 D. Hu,21 B. Moziak,21 J. Napolitano,21 Q. He,22 J. Insler,22 H. 
Muramatsu,22 C. S. Park,22 E. H. Thorndike,22 and F. Yang22
(CLEO Collaboration)
1 Syracuse University, Syracuse, New York 13244
2 Wayne State University, Detroit, Michigan 48202
3 Carleton University, Ottawa, Ontario, Canada K1S 5B6
4 Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
5 Enrico Fermi Institute, University of Chicago, Chicago, Illinois 60637
6 Cornell University, Ithaca, New York 14853
7 University of Florida, Gainesville, Florida 32611
8 George Mason University, Fairfax, Virginia 22030
9 University of Illinois, Urbana-Champaign, Illinois 61801
10 Indiana University, Bloomington, Indiana 47405
11 University of Kansas, Lawrence, Kansas 66045
12 Luther College, Decorah, Iowa 52101
13 University of Minnesota, Minneapolis, Minnesota 55455
14 Northwestern University, Evanston, Illinois 60208
15 State University of New York at Albany, Albany, New York 12222
16 State University of New York at Buffalo, Buffalo, New York 14260
17 University of Oklahoma, Norman, Oklahoma 73019
18 University of Pittsburgh, Pittsburgh, Pennsylvania 15260
19 University of Puerto Rico, Mayaguez, Puerto Rico 00681
20 Purdue University, West Lafayette, Indiana 47907
21 Rensselaer Polytechnic Institute, Troy, New York 12180
22 University of Rochester, Rochester, New York 14627
(Dated: November 1, 2018)

We measure the decay constant $f_{D_s^+}$ using the $D_s^+ \to \ell^+\nu$ channel, where the $\ell^+$ designates either a µ+ or a τ+, when the τ+ → π+ν. Using both measurements we find $f_{D_s^+}$ = 274 ± 13 ± 7 MeV. Combining with our previous determination of $f_{D^+}$, we compute the ratio $f_{D_s^+}/f_{D^+}$ = 1.23 ± 0.11 ± 0.04. We compare with theoretical estimates.

PACS numbers: 13.20.Fc, 13.66.Bc

To extract precise information on the size of CKM matrix elements from $B_d$ and $B_s$ mixing measurements, the ratio of “decay constants,” that are related to the heavy and light quark wave-function overlap at zero separation, must be well known [1]. Recent measurement of $B_s^0$ mixing by CDF [2] has shown the urgent need for precise numbers.
Decay constants have been calculated for both B and D mesons using several methods, including lattice QCD [3]. Here we present the most precise measurement to date of $f_{D_s^+}$, and combined with our previous determination of $f_{D^+}$ [4, 5], we find $f_{D_s^+}/f_{D^+}$.

In the Standard Model (SM) purely leptonic $D_s$ decay proceeds via annihilation through a virtual W+. The decay rate is given by [6]

$$\Gamma(D_s^+ \to \ell^+\nu) = \frac{G_F^2}{8\pi}\, f_{D_s^+}^2\, m_\ell^2\, M_{D_s^+} \left(1 - \frac{m_\ell^2}{M_{D_s^+}^2}\right)^2 |V_{cs}|^2, \qquad (1)$$

where $M_{D_s^+}$ is the $D_s^+$ mass, $m_\ell$ is the lepton mass, $G_F$ is the Fermi constant, and $|V_{cs}|$ is a CKM matrix element with a value of 0.9738 [7].

In this Letter we report measurements of both $B(D_s^+ \to \mu^+\nu)$ and $B(D_s^+ \to \tau^+\nu)$, when τ+ → π+ν ($D_s^+ \to \pi^+\nu\nu$). More details are given in a companion paper [8]. The ratio $\Gamma(D_s^+ \to \tau^+\nu)/\Gamma(D_s^+ \to \mu^+\nu)$ predicted in the SM via Eq. 1 depends only on well-known masses, and equals 9.72; any deviation would be a manifestation of new physics as it would violate lepton universality [9]. New physics can also affect the expected widths; any undiscovered charged bosons would interfere with the SM W+ [10].

The CLEO-c detector [11] is equipped to measure the momenta of charged particles, identify them using dE/dx and Cherenkov imaging (RICH) [12], detect photons and determine their directions and energies. We use 314 pb$^{-1}$ of data produced in e+e− collisions using CESR near 4.170 GeV. Here the cross-section for our analyzed sample, $D_s^{*+}D_s^-$, is ∼1 nb. Other charm production totals ∼7 nb [13], and the underlying light-quark “continuum” is ∼12 nb. We fully reconstruct one $D_s^-$ as a “tag,” and examine the properties of the $D_s^+$. (Charge conjugate decays are used.) Track selection, particle identification, π0, η, and $K_S^0$ criteria are the same as those described in Ref. [4], except that RICH identification now requires a minimum momentum of 700 MeV/c. Tag modes are listed in Table I.
For resonance decays we select intervals in invariant mass within ±10 MeV of the known mass for η′ → π+π−η, ±10 MeV for φ → K+K−, ±100 MeV for K*0 → K−π+, and ±150 MeV for ρ− → π−π0. We require tags to have momentum consistent with coming from $D_s^* D_s$ production. The distribution for the K+K−π− mode (44% of all the tags) is shown in Fig. 1.

To select tags, we first fit the invariant mass distributions to the sum of two Gaussians centered at $M_{D_s}$. The r.m.s. resolution (σ) is defined as σ ≡ f1σ1 + (1 − f1)σ2, where σ1 and σ2 are the individual widths and f1 is the fractional area of the first Gaussian. We require the invariant masses to be within ±2.5σ (±2σ for the ηρ− mode) of $M_{D_s}$. We have a total of 31302 ± 472 tag candidates. Then we add a γ candidate that satisfies our shower shape requirement. Regardless of whether or not the γ forms a $D_s^*$ with the tag, for real $D_s^* D_s$ events, the missing mass squared, MM*², recoiling against the γ and the $D_s^-$ tag should peak at $M_{D_s^+}^2$.

FIG. 1: Invariant mass of K+K−π− candidates after requiring the total energy to be consistent with the beam energy. The curve shows a fit to a two-Gaussian signal function plus a polynomial background.

TABLE I: Tagging modes and numbers of signal and background events, within cuts, from two-Gaussian fits to the invariant mass plots, and the number of γ tags in each mode, within ±2.5σ from a fit to the signal Crystal Ball function (see text) and a 5th order Chebychev background polynomial and the associated background.

Mode        Invariant mass              MM*²
            Signal         Bkgrnd       Signal         Bkgrnd
K+K−π−      13871 ± 262    10850        8053 ± 211     13538
K0S K−      3122 ± 79      1609         1933 ± 88      2224
ηπ−         1609 ± 112     4666         1024 ± 97      3967
η′π−        1196 ± 46      409          792 ± 69       1052
φρ−         1678 ± 74      1898         1050 ± 113     3991
π+π−π−      3654 ± 199     25208        2300 ± 187     15723
K*−K*0      2030 ± 98      4878         1298 ± 130     5672
ηρ−         4142 ± 281     20784        2195 ± 225     17353
Sum         31302 ± 472    70302        18645 ± 426    63520
We calculate

$$\mathrm{MM}^{*2} = (E_{\rm CM} - E_{D_s} - E_\gamma)^2 - (\vec{p}_{\rm CM} - \vec{p}_{D_s} - \vec{p}_\gamma)^2,$$

where $E_{\rm CM}$ ($\vec{p}_{\rm CM}$) is the center-of-mass energy (momentum), $E_{D_s}$ ($\vec{p}_{D_s}$) is the energy (momentum) of the fully reconstructed $D_s^-$ tag, and $E_\gamma$ ($\vec{p}_\gamma$) is the energy (momentum) of the additional γ. We use a kinematic fit that constrains the decay products of the $D_s^-$ to $M_{D_s}$ and conserves overall momentum and energy. All γ’s in the event are used, except for those that are decay products of the $D_s^-$ tag.

The MM*² distribution from K+K−π− tags is shown in Fig. 2. We fit all the modes individually to determine the number of tag events. This procedure is enhanced by having information on the shape of the signal function. We use fully reconstructed $D_s^- D_s^+$ events, and examine the signal shape when one $D_s$ is ignored. The signal is fit to a Crystal Ball function [14], which determines σ and the shape of the tail. Though σ varies somewhat between modes, the tail parameters don’t change, since they depend on beam radiation and γ energy resolution.

FIG. 2: The MM*² distribution from events with a γ in addition to the K+K−π− tag. The curve is a fit to the Crystal Ball function and a 5th order Chebychev background function.

Fits of MM*² in each mode when summed show 18645 ± 426 events within a ±2.5σ interval (see Table I). There is a small enhancement of (4.8 ± 1.0)% in our ability to find tags in µ+ν (or π+νν) events (tag bias) as compared with generic events. Additional systematic errors are evaluated by changing the fitting range, using 4th and 6th order Chebychev background polynomials, and allowing the parameters of the tail of the fitting function to float, leading to an overall systematic uncertainty of 5%.

Candidate µ+ν events are required to have only a single additional track oppositely charged to the tag with an angle >35.9° with respect to the beam line. We also require that there not be any neutral energy cluster detected of more than 300 MeV, which is especially useful to reject $D_s^+ \to \pi^+\pi^0$ and ηπ+ decays.
Since here we are searching for events in which there is a single missing ν, the missing mass squared, MM², should peak at zero:

$$\mathrm{MM}^2 = (E_{\rm CM} - E_{D_s} - E_\gamma - E_\mu)^2 - (\vec{p}_{\rm CM} - \vec{p}_{D_s} - \vec{p}_\gamma - \vec{p}_\mu)^2,$$

where $E_\mu$ ($\vec{p}_\mu$) is the energy (momentum) of the candidate µ+ track.

We also make use of a set of kinematical constraints and fit each event to two hypotheses: (1) the $D_s^-$ tag is the daughter of a $D_s^{*-}$ and (2) the $D_s^{*+}$ decays into γ$D_s^+$. The kinematical constraints, in the center-of-mass frame, are

$$\vec{p}_{D_s} + \vec{p}_{D_s^*} = 0, \qquad E_{\rm CM} = E_{D_s} + E_{D_s^*},$$
$$E_{D_s^*} = \frac{E_{\rm CM}}{2} + \frac{M_{D_s^*}^2 - M_{D_s}^2}{2E_{\rm CM}} \quad \text{or} \quad E_{D_s} = \frac{E_{\rm CM}}{2} - \frac{M_{D_s^*}^2 - M_{D_s}^2}{2E_{\rm CM}},$$
$$M_{D_s^*} - M_{D_s} = 143.6~\text{MeV}.$$

In addition, we constrain the invariant mass of the $D_s^-$ tag to $M_{D_s}$. This gives a total of 7 constraints. The missing ν four-vector needs to be determined, so we are left with a three-constraint fit. We perform an iterative fit minimizing χ². To eliminate systematic uncertainties that depend on understanding the absolute scale of the errors, we do not make a χ² cut but simply choose the γ and the decay sequence in each event with the minimum χ².

We consider three separate cases: (i) the track deposits < 300 MeV in the calorimeter, characteristic of a non-interacting pion or a µ+; (ii) the track deposits > 300 MeV in the calorimeter, characteristic of an interacting pion; or (iii) the track satisfies our electron selection criteria. The separation between muons and pions is not complete. Case (i) contains 99% of the muons but also 60% of the pions, while case (ii) includes 1% of the muons and 40% of the pions [5]. Case (iii) does not include any signal but is used for background estimation. For cases (i) and (ii) we insist that the track not be identified as an electron or a kaon. Electron candidates have a match between the momentum measured in the tracking system and the energy deposited in the CsI calorimeter, and dE/dx and RICH measurements consistent with this hypothesis.
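The missing-mass and two-body energy relations above can be checked numerically. The following is a minimal sketch, not the collaboration's fitting code: the function names, the toy four-vectors, and the $D_s$ mass value are assumptions introduced only for illustration, while the 4.170 GeV energy and the 143.6 MeV mass difference are taken from the text.

```python
def missing_mass_squared(e_cm, p_cm, detected):
    """Return (E_CM - sum E_i)^2 - |p_CM - sum p_i|^2 in GeV^2.

    detected is a list of (E, (px, py, pz)) entries: tag and photon for
    MM*^2, or tag, photon and muon candidate for MM^2.
    """
    e_miss = e_cm - sum(e for e, _ in detected)
    p_miss = [p_cm[k] - sum(p[k] for _, p in detected) for k in range(3)]
    return e_miss ** 2 - sum(c * c for c in p_miss)


def two_body_energies(e_cm, m_dsstar, m_ds):
    """CM-frame energies (E_Ds*, E_Ds) for e+e- -> Ds* Ds."""
    delta = (m_dsstar ** 2 - m_ds ** 2) / (2.0 * e_cm)
    return e_cm / 2.0 + delta, e_cm / 2.0 - delta


E_CM = 4.170                 # GeV, from the text
P_CM = (0.0, 0.0, 0.0)       # CM frame
M_DS = 1.9682                # GeV, assumed value (not quoted in the text)
M_DSSTAR = M_DS + 0.1436     # mass difference from the text

# Toy event with made-up, unphysical four-vectors: only their sums matter.
# Everything is detected except a massless neutrino with p = (0.3, 0.4, 0).
detected = [
    (2.000, (-0.10, -0.30, 0.20)),   # stand-in for the tag
    (0.150, (0.00, 0.05, -0.10)),    # stand-in for the photon
    (1.520, (-0.20, -0.15, -0.10)),  # stand-in for the muon candidate
]
mm2 = missing_mass_squared(E_CM, P_CM, detected)
e_star, e_ds = two_body_energies(E_CM, M_DSSTAR, M_DS)
print(f"MM^2 = {mm2:.3e} GeV^2, E_Ds* = {e_star:.4f} GeV, E_Ds = {e_ds:.4f} GeV")
```

In this toy event the only undetected object is massless, so the missing mass squared vanishes up to rounding, which is the behaviour expected for a single missing ν. The two energies sum to the center-of-mass energy and correspond to equal and opposite momenta.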
For the µ+ν final state the MM² distribution is modeled as the sum of two Gaussians centered at zero. A Monte Carlo (MC) simulation of the MM² shows σ = 0.025 GeV² after the fit. We check the resolution using the $D_s^+ \to \bar{K}^0 K^+$ mode. We search for events with at least one additional track identified as a kaon using the RICH detector, in addition to a $D_s^-$ tag. The MM² resolution is 0.025 GeV², in agreement with the simulation.

In the π+νν final state, the extra missing ν results in a smeared MM² distribution that is almost triangular in shape, starting near −0.05 GeV², peaking near 0.10 GeV², and ending at 0.75 GeV².

FIG. 3: The MM² distributions from data using $D_s^-$ tags, and one additional opposite-sign charged track and no extra energetic showers, for cases (i), (ii), and (iii).

The MM² distributions from data are shown in Fig. 3. The overall signal region is −0.05 < MM² < 0.20 GeV². The upper limit is chosen to prevent background from ηπ+ and K0π+ final states. The peak in Fig. 3(i) is due to $D_s^+ \to \mu^+\nu$. Below 0.20 GeV² in both (i) and (ii) we have π+νν events. The specific signal regions are: for µ+ν, −0.05 < MM² < 0.05 GeV², corresponding to ±2σ; for π+νν, in case (i) 0.05 < MM² < 0.20 GeV² and in case (ii) −0.05 < MM² < 0.20 GeV². In these regions we find 92, 31, and 25 events, respectively.

We consider backgrounds from two sources: one from real $D_s^+$ decays and the other from the background under the single-tag signal peaks. For the latter, we estimate the background from data using side-bands of the invariant mass, shown in Fig. 1. For case (i) we find 3.5 (properly normalized) background events in the µ+ν region and 2.5 background events in the τ+ν region; for case (ii) we find 3 events. Our total background estimate summing over all of these cases is 9.0 ± 2.3 events.

The background from real $D_s^+$ decays is evaluated by identifying specific sources. For µ+ν the only possible background is $D_s^+ \to \pi^+\pi^0$.
Using a 195 pb$^{-1}$ subsample of our data, we limit the branching fraction as < 1.1 × 10$^{-3}$ at 90% C.L. [8]. This low rate coupled with the extra γ veto yields a negligible contribution. The real $D_s^+$ backgrounds for π+νν are listed in Table II. Using the SM expected ratio of decay rates we calculate a contribution of 7.4 π+νν events.

TABLE II: Event backgrounds in the π+νν sample from real $D_s^+$ decays.

Source                B(%)   case (i)     case (ii)    Sum
Ds+ → Xµ+ν            8.2    0+1.8        0            0+1.8
Ds+ → π+π0π0          1.0    0.03±0.04    0.08±0.03    0.11±0.04
Ds+ → τ+ν             6.4
  τ+ → π+π0ν          1.5    0.55±0.22    0.64±0.24    1.20±0.33
  τ+ → µ+νν           1.0    0.37±0.15    0            0.37±0.15
Sum                          1.0+1.8      0.7±0.2      1.7+1.8

The event yield in the signal region, $N_{\rm det}$ (92), is related to the number of tags, $N_{\rm tag}$, the branching fractions, and the background $N_{\rm bkgrd}$ (3.5) as

$$N_{\rm det} - N_{\rm bkgrd} = N_{\rm tag} \cdot \epsilon\,[\epsilon' B(D_s^+ \to \mu^+\nu) + \epsilon'' B(D_s^+ \to \pi^+\nu\nu)], \qquad (3)$$

where ε (80.1%) includes the efficiencies (77.8%) for reconstructing the single charged track including final state radiation, (98.3%) for not having another unmatched cluster in the event with energy greater than 300 MeV, and the correction for the tag bias (4.8%); ε′ (91.4%) is the product of the 99.0% µ+ calorimeter efficiency and the 92.3% acceptance of the MM² cut of |MM²| < 0.05 GeV²; ε′′ (7.6%) is the fraction of π+νν events contained in the µ+ν signal window (13.2%) times the 60% acceptance for a pion to deposit less than 300 MeV in the calorimeter. Using B(τ+ → π+ν) of (10.90 ± 0.07)% [7], the ratio of the π+νν to µ+ν widths is 1.059; we find:

$$B(D_s^+ \to \mu^+\nu) = (0.594 \pm 0.066 \pm 0.031)\%. \qquad (4)$$

We can also sum the µ+ν and τ+ν contributions for −0.05 < MM² < 0.20 GeV². Equation 3 still applies. The number of signal and background events changes to 148 and 10.7, respectively. ε′ becomes 96.2%, and ε′′ increases to 45.2%. The effective branching fraction, assuming lepton universality, is

$$B^{\rm eff}(D_s^+ \to \mu^+\nu) = (0.638 \pm 0.059 \pm 0.033)\%.$$
(5) The systematic errors on these branching fractions are dominated by the error on the number of tags (5%). Other errors include: (a) track finding (0.7%), determined from a detailed comparison of the simulation with double tag events where one track is ignored; (b) the error due to the requirement that the charged track deposit no more than 300 MeV in the calorimeter (1%), determined using two-body D0 → K−π+ decays [5]; (c) the γ veto efficiency (1%), determined by extrapolating measurements on fully reconstructed events. Systematic errors arising from the background estimates are negligible. The total systematic error for Eq. 4 is 5.2%, and is 5.1% for Eq. 5 as (b) doesn’t apply here.

We also analyze the τ+ν final state independently. For case (i) we define the signal region to be the interval 0.05 < [...]

ABSTRACT We measure the decay constant fDs using the Ds+ -> l+ nu channel, where the l+ designates either a mu+ or a tau+, when the tau+ -> pi+ nu. Using both measurements we find fDs = 274 ± 13 ± 7 MeV. Combining with our previous determination of fD+, we compute the ratio fDs/fD+ = 1.23 ± 0.11 ± 0.04. We compare with theoretical estimates.
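The arithmetic behind the quoted branching fractions can be retraced with a short script. This is a rough sketch under stated assumptions: the function names are hypothetical, the lepton and $D_s$ masses are assumed values that the text does not quote, and the decay rate is taken to have the standard helicity-suppressed form, proportional to $m_\ell^2 (1 - m_\ell^2/M_{D_s}^2)^2$. All yields and efficiencies are the ones stated in the text for Eq. 3.

```python
def sm_tau_mu_ratio(m_tau, m_mu, m_ds):
    """Gamma(Ds -> tau nu) / Gamma(Ds -> mu nu), assuming
    Gamma is proportional to m_l^2 (1 - m_l^2 / M^2)^2."""
    def lepton_factor(m_l):
        return m_l ** 2 * (1.0 - m_l ** 2 / m_ds ** 2) ** 2
    return lepton_factor(m_tau) / lepton_factor(m_mu)


def branching_fraction(n_det, n_bkg, n_tag, eff, eff_mu, eff_pi, width_ratio):
    """Solve Eq. 3 for B(Ds -> mu nu), using
    B(Ds -> pi nu nu) = width_ratio * B(Ds -> mu nu)."""
    return (n_det - n_bkg) / (n_tag * eff * (eff_mu + eff_pi * width_ratio))


# Masses in GeV: assumed values, not quoted in the text.
M_TAU, M_MU, M_DS = 1.77699, 0.105658, 1.9682
ratio = sm_tau_mu_ratio(M_TAU, M_MU, M_DS)

# B(tau -> pi nu) and the width ratio 1.059 are quoted in the text.
B_TAU_TO_PI_NU = 0.1090
width_ratio_from_masses = ratio * B_TAU_TO_PI_NU
WIDTH_RATIO = 1.059

# mu nu signal window (inputs to Eq. 4) and summed window (inputs to Eq. 5).
b_mu = branching_fraction(92, 3.5, 18645, 0.801, 0.914, 0.076, WIDTH_RATIO)
b_eff = branching_fraction(148, 10.7, 18645, 0.801, 0.962, 0.452, WIDTH_RATIO)

print(f"SM ratio = {ratio:.2f}")
print(f"B(mu nu) = {100 * b_mu:.3f}%, B_eff = {100 * b_eff:.3f}%")
```

With these inputs the script lands close to the quoted 9.72, 0.594% and 0.638%. Small differences are expected because the efficiencies in the text are rounded.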
<|endoftext|><|startoftext|> Introduction
The BABAR detector and dataset
Event Selection and Kinematic Fit
The K+K-π+π- Final State
Final Selection and Backgrounds
Selection Efficiency
Cross Section for e+e- → K+K-π+π-
Substructure in the K+K-π+π- Final State
The e+e- → K*0 K π Cross Section
The φ(1020)π+π- Intermediate State
The φ(1020) f0(980) Intermediate State
The K+K-π0π0 Final State
Final Selection and Backgrounds
Selection Efficiency
Cross Section for e+e- → K+K-π0π0
Substructure in the K+K-π0π0 Final State
The φ(1020)π0π0 Intermediate State
The φ(1020) f0(980) Intermediate State
The K+K-K+K- Final State
Final Selection and Background
Selection Efficiency
Cross Section for e+e- → K+K-K+K-
The φ(1020) K+K- Intermediate State
e+e- → φ f0 Near Threshold
The Charmonium Region
Summary
Acknowledgments
References

ABSTRACT We study the processes $e^+ e^-\to K^+ K^- \pi^+\pi^-\gamma$, $K^+K^-\pi^0\pi^0\gamma$ and $K^+ K^- K^+ K^-\gamma$, where the photon is radiated from the initial state. About 34600, 4400 and 2300 fully reconstructed events, respectively, are selected from 232 fb$^{-1}$ of BABAR data. The invariant mass of the hadronic final state defines the effective $e^+e^-$ center-of-mass energy, so that the $K^+ K^- \pi^+\pi^-\gamma$ data can be compared with direct measurements of the $e^+ e^-\to K^+K^- \pi^+\pi^-$ reaction; no direct measurements exist for the $e^+ e^-\to K^+ K^- \pi^0\pi^0$ or $e^+e^-\to K^+ K^- K^+ K^-$ reactions. Studying the structure of these events, we find contributions from a number of intermediate states, and we extract their cross sections where possible. In particular, we isolate the contribution from $e^+ e^-\to\phi(1020) f_{0}(980)$ and study its structure near threshold. In the charmonium region, we observe the $J/\psi$ in all three final states and several intermediate states, as well as the $\psi(2S)$ in some modes, and measure the corresponding branching fractions.
We see no signal for the Y(4260) and obtain an upper limit of $\mathcal{B}_{Y(4260)\to\phi\pi^+\pi^-}\cdot\Gamma^{Y}_{ee}<0.4$ eV at 90% C.L. <|endoftext|><|startoftext|> d-wave superconductivity from electron-phonon interactions
J.P.Hague
Dept. of Physics and Astronomy, University of Leicester, Leicester, LE1 7RH and Dept. of Physics, Loughborough University, Loughborough, LE11 3TU
(Dated: 4th May 2005)

I examine electron-phonon mediated superconductivity in the intermediate coupling and phonon frequency regime of the quasi-2D Holstein model. I use an extended Migdal–Eliashberg theory which includes vertex corrections and spatial fluctuations. I find a d-wave superconducting state that is unique close to half-filling. The order parameter undergoes a transition to s-wave superconductivity on increasing filling. I explain how the inclusion of both vertex corrections and spatial fluctuations is essential for the prediction of a d-wave order parameter. I then discuss the effects of a large Coulomb pseudopotential on the superconductivity (such as is found in contemporary superconducting materials like the cuprates), which results in the destruction of the s-wave states, while leaving the d-wave states unmodified.

Published as: Phys. Rev. B 73, 060503(R) (2006)
PACS numbers: 71.10.-w, 71.38.-k, 74.20.-z

The discoveries of high transition temperatures and a d-wave order parameter in the cuprate superconductors are remarkable results and have serious implications for the theory of superconductivity. The presence of large Coulomb interactions in the cuprates, which have the potential to destroy conventional s-wave BCS states, has prompted the search for new mechanisms that can give rise to superconductivity. However, electron-phonon mediated superconductivity is still not well understood, especially in lower dimensional systems.
In particular, the electron-phonon problem is difficult at intermediate couplings with large phonon frequency (such as found in the cuprates), and the electron-phonon mechanism cannot be fully ruled out. It is therefore of paramount importance to develop new theories to understand electron-phonon mediated superconductivity away from the BCS limit.

The assumption that electron-phonon interactions cannot lead to high transition temperatures and unusual order parameters was made on the basis of calculations from BCS theory, which is a very-weak-coupling mean-field theory (although of course highly successful for pre-1980s superconductors) [1]. In the presence of strong Coulomb interaction, the BCS s-wave transition temperature is vastly reduced. However, the recent measurement of large couplings between electrons and the lattice in the cuprate superconductors means that extensions to the conventional theories of superconductivity are required [2,3,4]. In particular, low dimensionality, intermediate dimensionless coupling constants of ∼ 1 and large and active phonon frequencies of ∼ 75 meV mean that BCS or the more advanced Migdal–Eliashberg (ME) theory cannot be applied. In fact, the large coupling constant and a propensity for strong renormalization in 2D systems indicate that the bare unrenormalized phonon frequency could be several times greater than the measured 75 meV [5].

Here I apply the dynamical cluster approximation (DCA) to introduce a fully self-consistent momentum-dependent self-energy to the electron-phonon problem [5,6,7,8]. Short ranged spatial fluctuations and lowest order vertex corrections are included, allowing the sequence of phonon absorption and emission to be reordered once. In particular, the theory used here is second order in the effective electron-electron coupling $U = -g^2/M\omega_0^2$, which provides the correct weak coupling limit from small to large phonon frequencies [18].
In this paper, I include symmetry broken states in the anomalous self energy to investigate unconventional order parameters such as d-wave. No assumptions are made in advance about the form of the order parameter.

DCA [6,8,9] is an extension to the dynamical mean-field theory for the study of low dimensional systems. To apply the DCA, the Brillouin zone is divided into $N_c$ subzones within which the self-energy is assumed to be momentum independent, and cluster Green functions are determined by averaging over the momentum states in each subzone. This leads to spatial fluctuations with characteristic range $N_c^{1/D}$. In this paper, $N_c = 4$ is used throughout. This puts an upper bound on the strength of the superconductivity, which is expected to be reduced in larger cluster sizes [10]. To examine superconducting states, DCA is extended within the Nambu formalism [7,8]. Green functions and self-energies are described by 2 × 2 matrices, with off diagonal terms relating to the superconducting states. The self-consistent condition is:

$$G(K_i, i\omega_n) = \int d\epsilon\, \frac{D_i(\epsilon)\,(\zeta(K_i, i\omega_n) - \epsilon)}{|\zeta(K_i, i\omega_n) - \epsilon|^2 + \phi(K_i, i\omega_n)^2} \qquad (1)$$

$$F(K_i, i\omega_n) = -\int d\epsilon\, \frac{D_i(\epsilon)\,\phi(K_i, i\omega_n)}{|\zeta(K_i, i\omega_n) - \epsilon|^2 + \phi(K_i, i\omega_n)^2} \qquad (2)$$

where $\zeta(K_i, i\omega_n) = i\omega_n + \mu - \Sigma(K_i, i\omega_n)$, µ is the chemical potential, $\omega_n$ are the Fermionic Matsubara frequencies, $\phi(K, i\omega)$ is the anomalous self energy and $\Sigma(K, i\omega)$ is the normal self energy. $G(K, i\omega_n)$ must obey the lattice symmetry. In contrast, it is only $|F(K, i\omega_n)|$ which is constrained by this condition, since φ is squared in the denominator of Eqn. 1. Therefore the sign of φ can change. For instance, if the anomalous self energy has the rotational symmetry φ(π, 0) = −φ(0, π), the on-diagonal Green function, which represents the electron propagation, retains the correct lattice symmetry G(π, 0) = G(0, π). Therefore, only inversion symmetry is required of the anomalous Green function representing superconducting pairs and the anomalous self energy.

FIG. 1: Diagrammatic representation of the current approximation. Series (a) represents the vertex-neglected theory which corresponds to the Migdal–Eliashberg approach, valid when the phonon energy $\omega_0$ and electron-phonon coupling U are small compared to the Fermi energy. Series (b) represents additional diagrams for the vertex corrected theory. The phonon self energies are labeled with Π, and Σ denotes the electron self-energies. Lines represent the full electron Green function and wavy lines the full phonon Green function.

Here I examine the Holstein model [11] of electron-phonon interactions. It treats phonons as nuclei vibrating in a time-averaged harmonic potential (representing the interactions between all nuclei), i.e. only one frequency $\omega_0$ is considered. The phonons couple to the local electron density via a momentum-independent coupling constant g [11].

$$H = -\sum_{\langle ij \rangle \sigma} t\, c^{\dagger}_{i\sigma} c_{j\sigma} + \sum_{i\sigma} n_{i\sigma}(g r_i - \mu) + \sum_i \left( \frac{M\omega_0^2 r_i^2}{2} + \frac{p_i^2}{2M} \right)$$

The first term in this Hamiltonian represents hopping of electrons between neighboring sites and has a dispersion $\epsilon_k = -2t \sum_{i=1}^{D} \cos(k_i)$. The second term couples the local ion displacement, $r_i$, to the local electron density. The last term is the bare phonon Hamiltonian, i.e. a simple harmonic oscillator. The creation and annihilation of electrons is represented by $c_i^{\dagger}$ ($c_i$), $p_i$ is the ion momentum and M the ion mass. The effective electron-electron interaction is

$$U(i\omega_s) = \frac{U \omega_0^2}{\omega_s^2 + \omega_0^2}$$

where $\omega_s = 2\pi s T$, s is an integer and $U = -g^2/M\omega_0^2$ represents the magnitude of the effective electron-electron coupling. D = 2 with t = 0.25, resulting in a non-interacting band width W = 2. A small interplanar hopping $t_\perp = 0.01$ is included. This is necessary to stabilise superconductivity, which is not permitted in a pure 2D system [12].

Perturbation theory in the effective electron-electron interaction (Fig. 1) is applied to second order in U, using a skeleton expansion. The electron self-energy has two terms, $\Sigma_{\rm ME}(\omega,K)$ neglects vertex corrections (Fig.
1(a)), and $\Sigma_{\rm VC}(\omega,K)$ corresponds to the vertex corrected case (Fig. 1(b)). $\Pi_{\rm ME}(\omega,K)$ and $\Pi_{\rm VC}(\omega,K)$ correspond to the equivalent phonon self energies. At large phonon frequencies, all second order diagrams including $\Sigma_{\rm VC}$ are essential for the correct description of the weak coupling limit. The phonon propagator D(z,K) is calculated from

$$D(i\omega_s, K) = \frac{\omega_0^2}{\omega_s^2 + \omega_0^2 - \Pi(i\omega_s, K)}$$

and the Green function from equations 1 and 2. $\Sigma = \Sigma_{\rm ME} + \Sigma_{\rm VC}$ and $\Pi = \Pi_{\rm ME} + \Pi_{\rm VC}$. Details of the translation of the diagrams in Fig. 1 and the iteration procedure can be found in Ref. 7. Calculations are carried out along the Matsubara axis, with sufficient Matsubara points for an accurate calculation. The equations were iterated until the normal and anomalous self-energies converged to an accuracy of approximately 1 part in 10$^3$.

Since the anomalous Green function is proportional to the anomalous self energy, initializing the problem with the non-interacting Green function leads to a non-superconducting (normal) state. A constant superconducting field with d-wave symmetry was applied to the system to induce superconductivity. The external field was then completely removed. Iteration continued without the field until convergence. This solution was then used to initialize self-consistency for other similar values of the parameters. The symmetry conditions used in Refs 5 and 7 have been relaxed to reflect the additional breaking of the anomalous lattice symmetry in the d-wave state. This does not affect the normal state Green function, but does affect the anomalous state Green function.

In Fig. 2, the anomalous self energy is examined for n = 1.0 (half-filling). The striking feature is that stable d-wave superconductivity is found. This is manifested through a change in sign of the anomalous self energy, which is negative at the (π, 0) point and positive at the (0, π) point. The electron Green function (equation 1) depends on φ², so causality and lattice symmetry are maintained.
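The model ingredients introduced above, the tight-binding dispersion and the frequency-dependent effective interaction, are simple enough to tabulate directly. The sketch below is illustrative only: the function names are hypothetical, the form $U(i\omega_s) = U\omega_0^2/(\omega_s^2 + \omega_0^2)$ is assumed, and the sign of U follows $U = -g^2/M\omega_0^2$, with the value 0.6 quoted for the figures treated as a magnitude (an assumption).

```python
import numpy as np


def dispersion(k, t=0.25):
    """eps_k = -2 t sum_i cos(k_i); k has shape (..., D)."""
    return -2.0 * t * np.sum(np.cos(k), axis=-1)


def effective_interaction(s, temperature, u, omega0):
    """U(i w_s) = U w0^2 / (w_s^2 + w0^2), with w_s = 2 pi s T."""
    omega_s = 2.0 * np.pi * np.asarray(s) * temperature
    return u * omega0 ** 2 / (omega_s ** 2 + omega0 ** 2)


# Non-interacting band width on a 2D grid that contains (0, 0) and (pi, pi).
grid = np.linspace(-np.pi, np.pi, 65)
kx, ky = np.meshgrid(grid, grid)
eps = dispersion(np.stack([kx, ky], axis=-1))
band_width = eps.max() - eps.min()

# Effective interaction on the first 50 bosonic Matsubara frequencies.
u_s = effective_interaction(np.arange(50), temperature=0.005, u=-0.6, omega0=0.4)
print(f"W = {band_width:.3f}, U(0) = {u_s[0]:.3f}, U(i w_49) = {u_s[-1]:.3f}")
```

For t = 0.25 and D = 2 the band runs from −1 to +1, which reproduces the band width W = 2 stated in the text. The interaction equals U at zero frequency and decays in magnitude once $\omega_s$ exceeds $\omega_0$.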
Since the gap function φ(iωn)/Z(iωn) is directly proportional to φ(iωn), and Z(iωn,K(π,0)) = Z(iωn,K(0,π)), the sign of the order parameter, i.e. the sign of the superconducting gap, changes under 90° rotation. Here Z(iωn) = 1 − Σ(iωn)/iωn.

FIG. 2: Anomalous self-energy at half-filling. The anomalous self energy is real. It is clear that φ(π, 0) = −φ(0, π). This is characteristic of d-wave order. Similarly, the electron self energy has the correct lattice symmetry Σ(π, 0) = Σ(0, π), which was not imposed from the outset. The gap function is related to the anomalous self energy via φ(iωn)/Z(iωn). [Plot legend entries: (0,0), (π,0), (0,π), (π,π).]

Figure 3 shows the variation of superconducting pairing across the Brillouin zone, $n_s(k) = T \sum_n F(i\omega_n, k)$, for U = 0.6, ω0 = 0.4, n = 1 and T = 0.005. The d-wave order can be seen very clearly. The largest anomalous densities are at the (π, 0) and (0, π) points, with a node situated at the (π/2, π/2) point and a sign change on 90° rotation. Pairing clearly occurs between electrons close to the Fermi surface.

FIG. 3: Variation of superconducting (anomalous) pairing density across the Brillouin zone. $n_s(k) = T \sum_n F(i\omega_n, k)$. U = 0.6, ω0 = 0.4, n = 1 and T = 0.005. The d-wave order can be seen very clearly, with a change in sign on 90° rotation and a node situated at the (π/2, π/2) point. The largest anomalous (superconducting) densities are at the (π, 0) and (0, π) points.

So far, the model has been analyzed at half filling. Figure 4 demonstrates the evolution of the order parameter as the number of holes is first increased, and then decreased. The total magnitude of the anomalous density, $n_s = \sum_i |n_s(K_i)|$, is examined. When the number of holes is increased, stable d-wave order persists to a filling of n = 1.18, while decreasing monotonically. At the critical point, there is a spontaneous transition to s-wave order.
Starting from a high filling, and reducing the number of holes, there is a spontaneous transition from s- to d-wave order at n = 1.04. There is therefore hysteresis associated with the self-consistent solution. It is reassuring that the d-wave state can be induced without the need for the external field. As previously established, s-wave order does not exist at half-filling as a manifestation of Hohenberg's theorem7, so the computed d-wave order at half-filling is the ground state of the model. It is interesting that the d- and s-channels are able to coexist, considering that the BCS channels are separate on a square lattice. This is due to the vertex corrections, since the self consistent equations are no longer linear in the gap function (the 1st order gap equation vanishes in the d-wave case, leaving 2nd order terms as the leading contribution).

FIG. 4: Hysteresis of the superconducting order parameter, $n_s = \sum_i |n_s(K_i)|$. Starting from a d-wave state at half-filling, increasing the chemical potential increases the filling and decreases the d-wave order. Eventually, at n = 1.18 the system changes to an s-wave state. On return from large filling, the s-wave superconductivity is persistent to a low filling of n = 1.04, before spontaneously reverting to a d-wave state. The system is highly susceptible to d-wave order, and application of a very small external superconducting field to an s-wave state results in a d-wave state. Note that d- and s-wave channels are coupled in the higher order theory, so the transition can take place spontaneously, unlike in the standard gap equations. [Plot annotation: External field.]

I finish with a brief discussion of Coulomb effects. In the Eliashberg equations, a Coulomb pseudopotential may be added to the theory as

$$ \phi_C = U_C T \sum_{K,n} F(i\omega_n, K) \qquad (6) $$

It is easy to see the effect of d-wave order on this term.
Since the sign of the anomalous Green function is modulated, the average effect of d-wave order is to nullify the Coulomb contribution to the anomalous self-energy (i.e. φCd = 0). This demonstrates that the d-wave state is stable to Coulomb perturbations, presumably because the pairs are distance separated. In contrast, the s-wave state is not stable to Coulomb interaction, with a corresponding reduction of the transition temperature (TC = 0 for λ < µC). Thus, such a Coulomb filter selects the d-wave state (see e.g. Ref. 13). Since large local Coulomb repulsions are present in the cuprates (and indeed most transition metal oxides), this mechanism seems the most likely to remove the hysteresis. Without the Coulomb interactions, it is expected that the s-wave state will dominate for n > 1.04, since the anomalous order is larger.

I note that a further consequence of strong Coulomb repulsion is antiferromagnetism close to half-filling. Typically magnetic fluctuations act to suppress phonon mediated superconducting order. As such, one might expect a suppression of superconducting order close to half-filling, with a maximum away from half filling. The current theory could be extended to include additional anomalous Green functions related to antiferromagnetic order. This would lead to a 4×4 Green function matrix. A full analysis of antiferromagnetism and the free energy will be carried out at a later date.

a. Summary

In this paper I have carried out simulations of the 2D Holstein model in the superconducting state. Vertex corrections and spatial fluctuations were included in the approximation for the self-energy. The anomalous self energy and superconducting order parameter were calculated. Remarkably, stable superconducting states with d-wave order were found at half-filling. d-wave states persist to n = 1.18, where the symmetry of the parameter changes to s-wave.
Starting in the s-wave phase and reducing the filling, d-wave states spontaneously appear at n = 1.04. The spontaneous appearance of d-wave states in a model of electron-phonon interactions is of particular interest, since it may negate the need for novel pairing mechanisms in the cuprates19. The inclusion of vertex corrections and spatial fluctuations was essential to the emergence of the d-wave states in the Holstein model, which indicates why BCS and ME calculations do not predict this phenomenon. For very weak coupling, the off diagonal Eliashberg self-energy has the form $-UT \sum_{Q,n} F(i\omega_n, Q) D_0(i\omega_s - i\omega_n)$, so it is clear (for the same reasons as the Coulomb pseudopotential) that this diagram has no contribution in the d-wave phase (the weak coupling phonon propagator is momentum independent for the Holstein model). Therefore, vertex corrections are the leading term in the weak coupling limit. Furthermore, I have discussed the inclusion of Coulomb effects to lowest order, which act to destabilize the s-wave states, while leaving the d-wave states unchanged. Since the Coulomb pseudopotential has no effect on the d-wave states, it is possible that electron-phonon interactions are the mechanism inducing d-wave states in real materials such as the cuprates. The Coulomb filtering mechanism works for p-wave symmetry and higher, so it is possible that electron-phonon interactions could explain many novel superconductors. Certainly, such a mechanism cannot be ruled out. The doping dependence of the order qualitatively matches that of La2−xSrxCuO4 (here order extends to x = 0.18, in the cuprate to x = 0.3). Antiferromagnetism is only present in the cuprate very close to half filling (up to approximately x = 0.02), and on a mean-field level does not interfere with the d-wave superconductivity at larger dopings.
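The cancellation invoked above for both the Coulomb pseudopotential and the weak-coupling Eliashberg term can be illustrated numerically. The following is a minimal sketch, not the calculation performed in this paper: it assumes a simple d-wave form factor cos(kx) − cos(ky) as a stand-in for the computed anomalous Green function, and the function names are hypothetical. It shows that a momentum-independent interaction, which only sees the Brillouin-zone average of the anomalous function, gives zero for a sign-changing d-wave function but not for an s-wave one.

```python
import math


def bz_average(form_factor, n=64):
    """Average form_factor(kx, ky) over a uniform n x n Brillouin-zone grid."""
    total = 0.0
    for ix in range(n):
        kx = -math.pi + 2.0 * math.pi * ix / n
        for iy in range(n):
            ky = -math.pi + 2.0 * math.pi * iy / n
            total += form_factor(kx, ky)
    return total / (n * n)


def d_wave(kx, ky):
    # Assumed d-wave form factor: negative at (pi, 0), positive at (0, pi),
    # with a node at (pi/2, pi/2). Illustrative only.
    return math.cos(kx) - math.cos(ky)


def s_wave(kx, ky):
    # Assumed momentum-independent (s-wave) form factor. Illustrative only.
    return 1.0


d_avg = bz_average(d_wave)
s_avg = bz_average(s_wave)
print(f"d-wave zone average: {d_avg:.3e}")
print(f"s-wave zone average: {s_avg:.3e}")
```

Because the momentum sum in Eq. (6) carries no momentum-dependent weight, only this zone average enters, which is why the d-wave state is unaffected while the s-wave state is not.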
It has been determined experimentally that strong electron-phonon interactions and high phonon frequencies are clearly visible in the electron and phonon band structures of the cuprates, and are therefore an essential part of the physics3,4. Similar effects to those observed in the cuprates are seen in the electron and phonon band structures of the 2D Holstein model in the normal phase5. It is clearly of interest to determine whether other features and effects in the cuprate superconductors could be explained with electron-phonon interactions alone.

b. Acknowledgments

I thank the University of Leicester for hospitality while carrying out this work. I thank E. M. L. Chung for useful discussions. I am currently supported under EPSRC grant no. EP/C518365/1.

1 J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 108, 1175 (1957).
2 G. M. Zhao, M. B. Hunt, H. Keller, and K. A. Müller, Nature 385, 236 (1997).
3 A. Lanzara, P. V. Bogdanov, X. J. Zhou, S. A. Kellar, D. L. Feng, E. D. Lu, T. Yoshida, H. Eisaki, A. Fujimori, K. Kishio, et al., Nature 412, 510 (2001).
4 R. J. McQueeney, Y. Petrov, T. Egami, M. Yethiraj, G. Shirane, and Y. Endoh, Phys. Rev. Lett. 82, 628 (1999).
5 J. P. Hague, J. Phys.: Condens. Matter 15, 2535 (2003).
6 M. H. Hettler, A. N. Tahvildar-Zadeh, M. Jarrell, T. Pruschke, and H. R. Krishnamurthy, Phys. Rev. B 58, R7475 (1998).
7 J. P. Hague, J. Phys.: Condens. Matter 17, 5663 (2005).
8 T. Maier, M. Jarrell, T. Pruschke, and M. H. Hettler, Rev. Mod. Phys. 77, 1027 (2005).
9 M. H. Hettler, M. Mukherjee, M. Jarrell, and H. R. Krishnamurthy, Phys. Rev. B 61, 12739 (2000).
10 M. Jarrell, Th. Maier, C. Huscroft, and S. Moukouri, Phys. Rev. B 64, 195130 (2001).
11 T. Holstein, Ann. Phys. 8, 325 (1959).
12 P. C. Hohenberg, Phys. Rev. 158, 383 (1967).
13 J. F. Annett, Superconductivity, Superfluidity and Condensates (Oxford University Press, 2004).
14 C. Grimaldi, L. Pietronero, and S. Strässler, Phys. Rev. Lett. 75, 1158 (1995).
15 A. A. Abrikosov, Physica C 244, 243 (1995).
16 A. A. Abrikosov, Phys. Rev. B 52, R15738 (1995).
17 R. J. Birgeneau and G. Shirane, in Physical Properties of High Temperature Superconductors I, edited by D. M. Ginsberg (World Scientific, Singapore, 1989).
18 I also note the extensions to Eliashberg theory carried out by Grimaldi et al.14.
19 On the basis of a screened electron-phonon interaction, Abrikosov claims to have found stable d-wave states in a BCS like theory15,16. However with an unscreened Holstein potential, the transition temperature in the d-wave channel given by the standard theory is zero. Also, the assumed order parameter in his work does not clearly have d-wave symmetry.

ABSTRACT I examine electron-phonon mediated superconductivity in the intermediate coupling and phonon frequency regime of the quasi-2D Holstein model. I use an extended Migdal-Eliashberg theory which includes vertex corrections and spatial fluctuations. I find a d-wave superconducting state that is unique close to half-filling. The order parameter undergoes a transition to s-wave superconductivity on increasing filling. I explain how the inclusion of both vertex corrections and spatial fluctuations is essential for the prediction of a d-wave order parameter. I then discuss the effects of a large Coulomb pseudopotential on the superconductivity (such as is found in contemporary superconducting materials like the cuprates), which results in the destruction of the s-wave states, while leaving the d-wave states unmodified. <|endoftext|><|startoftext|> Introduction Equilibrium conformational fluctuations of proteins about their folded, native structure play an important role in their biological function.1-4 Three prominent approaches used to compute conformational fluctuations of proteins are, in order of increasing computational efficiency and decreasing modeling resolution, molecular dynamics (MD), all-atom normal mode analysis (NMA), and coarse-grained elastic NMA (eNMA).
MD attempts to sample the equilibrium distribution of states in the vicinity of the native structure via time-integration of Newton’s equations of motion, typically modeling solvent explicitly.5 All-atom NMA assumes harmonic fluctuations about the native state in solving the free vibration problem for the protein while treating the solvent implicitly.2,3,6 Finally, eNMA employs a coarse-grained elastic description of the protein in which specific atomic interactions are replaced by a simple network of linear elastic springs, typically connecting Cα atoms within an arbitrary cut-off radius.7,8 Successively coarser and thus computationally more efficient eNMA descriptions are obtained by reducing the total number of interaction sites in the system.9-12 The idea of treating proteins as effective elastic media in calculating their normal modes dates back at least to Suezaki and Go.13 Despite their relative simplicity, elastic coarse-grained models have proven remarkably successful in calculating the slow, large length-scale vibrational modes of proteins and their supramolecular assemblies. As shown recently by Lu and Ma,14 their success may partially be attributed to the fact that biomolecular shape plays a dominant role in determining the lowest normal modes of proteins. Indeed, large length-scale modes naturally average over heterogeneous interactions present at atomic length-scales, thereby rendering elastic descriptions valid in this regime. Global structural averages such as backbone fluctuations and inter-residue correlations are in turn also successfully predicted because they are dominated by these low frequency modes. 
The success of eNMA motivates the current work, in which the elastic network model for proteins is cast in the framework of the well established Finite Element Method (FEM).15,16 In formulating the model, the protein is defined by its mass density, ρ, isotropic elastic modulus, E, and solvent-excluded surface (SES), which is obtained by rolling a water molecule-size probe-sphere over its van der Waals surface.17-20 As an initial exploration of the utility of the FEM in analyzing protein mechanical response, the normal modes of a mutant of T4 lysozyme and of F-actin are computed, as well as the critical Euler buckling load of F-actin when subject to axial compression. NMA results for T4 lysozyme are compared with all-atom NMA, the Rotation Translation Blocks (RTB) procedure,21,22 which treats residues as rigid but retains atomic-level interactions as modeled by the implicit solvent force-field EEF1,23 and experiment. Similar to eNMA, the proposed FE-based procedure offers several advantages over all-atom NMA, including the elimination of costly energy minimization that may distort the initial protein structure, direct applicability to x-ray data of proteins with unknown atomic structure,24-26 and a significant speed-up of the NMA itself due to a drastic reduction in the number of degrees of freedom simulated. Additionally, the FEM offers several distinct advantages over existing elastic network models that provide the primary motivation for the current work. 
Principal among these is the suitability of the FEM to calculate the mechanical response of proteins and their supramolecular assemblies to applied bending, buckling, and other generalized loading scenarios, which is needed to probe the structure-function relation of supramolecular assemblies such as viral capsids,27,28 microtubules,29,30 F-actin bundles,31,32 and molecular motors.33,34 Moreover, casting the coarse-grained elastic model in the framework of the FEM opens two important avenues of model refinement that are currently being pursued. First, the atomic Hessian can be projected onto the FE- space in order to incorporate atomic-level interactions into the model, thereby eliminating the a priori assumption of homogeneous isotropic elastic response. This idea is similar to the initial version of the Rotation Translation Blocks (RTB) procedure proposed in Durand et al.,35, as well as related works in modeling crystals.36,37 The incorporation of atomic-level interactions may be particularly important in modeling binding interfaces present between constituent monomers in supramolecular assemblies such as F-actin, MTs, and viral coat protein subunits, particularly near the onset of mechanical failure. Second, the FE-based protein model may be coupled directly to field calculations including the Poisson–Boltzmann Equation to model solvent-mediated electrostatic interactions38-40 and the Stokes Equations to model solvent-damping in dynamic response calculations.41,42 Methods The FEM is a mature field that is discussed in detail in references such as Bathe15 and Zienkiewicz and Taylor.16 Accordingly, the focus here is on its application to proteins and readers are kindly referred to the above-referenced books for details on its theoretical foundations. 
Generation of the FE model requires three steps: (1) definition and discretization of the protein volume; (2) definition of the local effective mass density and constitutive behavior of the protein; and (3) application of boundary conditions such as displacement- or force-based loading. The protein volume is defined by its bounding SES, which is also called the Richards Molecular Surface or simply the Molecular Surface. This surface is defined by the closest point of contact of a solvent-sized probe-sphere that is rolled over the van der Waals surface of the protein, which defines the molecular volume that is never penetrated by any part of the solvent probe-sphere.17-19 The SES is computed using MSMS ver. 2.6.1, which generates a high density triangulated approximation (one triangular vertex per Å2) to the exact SES.20 The MSMS-discretized SES is subsequently decimated to arbitrary prescribed spatial resolution using the surface simplification algorithm QSLIM.43-45 The QSLIM algorithm employs iterative vertex-pair contraction together with a quadric error metric to retain a near-optimal representation of the original surface while reducing the total number of faces by an arbitrary, user-specified amount.43 The protein volume that is bounded by the closed SES is subsequently discretized with 3D tetrahedral finite elements via automatic mesh generation using the commercial Finite Element program ADINA ver. 8.4 (Watertown, MA, USA). 
Application of the proposed FE-based procedure directly to x-ray data would require definition of the molecular volume from the electron density map using Voronoi tessellation or a similar procedure, as proposed by Wriggers et al.,46, and performed by Ming et al.24 The protein constitutive response is modeled using the standard Hooke’s law, which treats the protein as a homogeneous, isotropic, elastic continuum with Young’s modulus E and Poisson ratio ν.47 While this is conceptually similar to elastic-network based models, it is rigorously distinct: Elastic network models typically connect Cα atoms by springs of equal stiffness, which results in general in a locally anisotropic and inhomogeneous elastic material with length-scale dependent mechanical properties. In contrast, the FE-model defined here treats the protein as strictly homogeneous, with an isotropic elastic material response that is length-scale invariant. The mass density of the protein is taken to be homogeneous, although it could equally be defined as a spatially- varying function from the underlying atomic constitution or from electron density data. Finally, arbitrary boundary conditions consisting of displacement- or force-based loading may be applied to the molecule, modeling the effects of the protein environment. In the current application, the free vibration problem is solved for T4 lysozyme and F- actin in the absence of any boundary condition and the linearized buckling problem for F- actin is solved by applying co-axial compressive point loads to the ends of the molecule. 
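The isotropic Hooke's law referred to above is fully specified by the pair (E, ν). As a minimal sketch, assuming only the textbook relations of linear isotropic elasticity (they are not spelled out in this paper), the following converts E and ν into the shear modulus, Lamé parameter, and bulk modulus. The function name and the numerical value of E are illustrative assumptions.

```python
def isotropic_moduli(E, nu):
    """Return (G, lam, K) for a homogeneous isotropic linear-elastic solid.

    G   : shear modulus          G   = E / (2 (1 + nu))
    lam : Lame first parameter   lam = E nu / ((1 + nu) (1 - 2 nu))
    K   : bulk modulus           K   = E / (3 (1 - 2 nu))
    """
    if not -1.0 < nu < 0.5:
        raise ValueError("Poisson ratio must lie strictly between -1 and 0.5")
    G = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    K = E / (3.0 * (1.0 - 2.0 * nu))
    return G, lam, K


E = 2.0e9  # Young's modulus in Pa; illustrative value, not taken from the paper
for nu in (0.3, 0.4, 0.49):
    G, lam, K = isotropic_moduli(E, nu)
    print(f"nu = {nu:4.2f}: G = {G:.3e} Pa, lambda = {lam:.3e} Pa, K = {K:.3e} Pa")
```

As ν approaches 0.5 the bulk modulus diverges while the shear modulus changes only modestly, which is the sense in which the material becomes nearly incompressible.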
Given the protein volume, constitutive behavior, and boundary conditions, the FEM uses numerical volume-integration to derive a set of algebraic equations that is linear in the finite element nodal displacement degrees of freedom, u,

$$ \mathbf{M}\ddot{\mathbf{u}} + \mathbf{K}\mathbf{u} = \mathbf{R} \qquad (1) $$

where M is the diagonal mass-matrix, K is the elastic stiffness matrix, and R is a forcing vector that results from natural (force-based) boundary conditions.15 In the case of the free vibration problem relevant to the NMA of proteins, R = 0. Substitution of the oscillatory solution, $\mathbf{u} = \mathbf{y}\cos(\omega t + \gamma)$, into the free-vibration form of Eq. (1) results in the generalized eigenvalue problem,

$$ \mathbf{K}\mathbf{y} - \omega^2 \mathbf{M}\mathbf{y} = \mathbf{0} \qquad (2) $$

which after definition of the eigenvalues, $\lambda := \omega^2$, may be written in secular form,

$$ \det(\mathbf{K} - \lambda \mathbf{M}) = 0. \qquad (3) $$

Various efficient FE procedures exist to obtain the solution to the generalized eigenvalue problem, yielding the eigenvalues and eigenvectors, $(\lambda_i, \mathbf{y}_i)$. In the present application an accelerated subspace-iteration method15,48 is used for T4 lysozyme and F-actin. The substructure synthesis procedure49 commonly available in structural mechanics FE programs could also be applied to calculate the normal modes of F-actin, as recently proposed by Ming et al.50 The eigenvectors corresponding to the FE nodal degrees of freedom are linearly interpolated to the Cα positions given by the atomic coordinates that were used to define the FE model. Standard equilibrium thermal averages may then be computed, including the fluctuation of Cα atom i due to mode k,

$$ \langle \Delta r_{ik}^2 \rangle = \frac{k_B T\, a_{ik}^2}{m_i \lambda_k}, $$

the total fluctuation of Cα atom i due to all modes,

$$ \langle \Delta r_i^2 \rangle = \sum_k \langle \Delta r_{ik}^2 \rangle, $$

correlations in positional fluctuations of Cα atoms i and j,

$$ C_{ij} = \frac{\langle \Delta \mathbf{r}_i \cdot \Delta \mathbf{r}_j \rangle}{\sqrt{\langle \Delta \mathbf{r}_i \cdot \Delta \mathbf{r}_i \rangle \langle \Delta \mathbf{r}_j \cdot \Delta \mathbf{r}_j \rangle}}, \quad \text{where} \quad \langle \Delta \mathbf{r}_i \cdot \Delta \mathbf{r}_j \rangle = k_B T \sum_k \frac{\mathbf{a}_{ik} \cdot \mathbf{a}_{jk}}{\lambda_k \sqrt{m_i m_j}}, $$

and the overlap, $R_{ij}$, between normal modes i and j, defined by the normalized inner product of the modes,

$$ R_{ij} = \frac{\mathbf{a}_i \cdot \mathbf{a}_j}{|\mathbf{a}_i|\,|\mathbf{a}_j|}, \quad \text{where} \quad -1 \le R_{ij} \le 1. $$
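The thermal averages and the mode overlap defined above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ADINA workflow used here: the two modes are the analytically known non-rigid modes of a free chain of three unit masses joined by unit springs, displacements are restricted to the x-axis, k_B T is set to 1 in arbitrary units, and all function and variable names are hypothetical.

```python
import math

K_B_T = 1.0  # thermal energy in arbitrary units (assumption for illustration)


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_fluctuation(i, j, modes, eigenvalues, masses):
    """<dr_i . dr_j> = k_B T sum_k a_ik . a_jk / (lambda_k sqrt(m_i m_j))."""
    total = sum(dot(a[i], a[j]) / lam for a, lam in zip(modes, eigenvalues))
    return K_B_T * total / math.sqrt(masses[i] * masses[j])


def correlation(i, j, modes, eigenvalues, masses):
    """Normalized correlation C_ij of the fluctuations of atoms i and j."""
    cij = cross_fluctuation(i, j, modes, eigenvalues, masses)
    cii = cross_fluctuation(i, i, modes, eigenvalues, masses)
    cjj = cross_fluctuation(j, j, modes, eigenvalues, masses)
    return cij / math.sqrt(cii * cjj)


def overlap(mode_a, mode_b):
    """Overlap R = a . b / (|a| |b|), taken over all atoms and components."""
    flat_a = [x for atom in mode_a for x in atom]
    flat_b = [x for atom in mode_b for x in atom]
    return dot(flat_a, flat_b) / math.sqrt(dot(flat_a, flat_a) * dot(flat_b, flat_b))


# Toy system: modes[k][i] is the displacement vector of atom i in mode k.
s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
modes = [
    [(1 / s2, 0.0, 0.0), (0.0, 0.0, 0.0), (-1 / s2, 0.0, 0.0)],  # lambda = 1
    [(1 / s6, 0.0, 0.0), (-2 / s6, 0.0, 0.0), (1 / s6, 0.0, 0.0)],  # lambda = 3
]
eigenvalues = [1.0, 3.0]
masses = [1.0, 1.0, 1.0]

msf = [cross_fluctuation(i, i, modes, eigenvalues, masses) for i in range(3)]
c_ends = correlation(0, 2, modes, eigenvalues, masses)
r_12 = overlap(modes[0], modes[1])
print("mean-square fluctuations:", msf)
print("correlation of end atoms:", c_ends)
print("overlap of modes 1 and 2:", r_12)
```

In this toy chain the softest mode dominates the fluctuations of the end atoms, which is the same mechanism by which the lowest modes dominate the Cα fluctuations of a protein.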
As with elastic network models, the protein stiffness-scale (E) is unknown. Accordingly, the acoustic wave speed, $\sqrt{E/\rho}$, which is the relevant physical unit in the free-vibration problem, is adjusted to best-fit the pertinent Cα fluctuation data, which is either experimental or that from the all-atom NMA. In the case of F-actin, the average mass density, ρ, is set explicitly and the Young’s modulus is determined by matching its stretching stiffness to experiment,51 as also performed by ben-Avraham and Tirion52 and described in more detail below. The Poisson ratio is taken to be 0.3 for T4 lysozyme and F-actin, which is typical of crystalline solids. While the choice of ν = 0.3 has, to the best of the author’s knowledge, no rigorous justification, it is noted that its precise value does not affect the computed results within the range 0.3 ≤ ν ≤ 0.5. This is typical of response calculations such as those performed here, in which material compressibility does not play an important role. Two important considerations in generating the FE model are the choice of the probe-sphere radius used to define the protein volume and the degree of surface simplification performed. Regarding the choice of probe-sphere radius, two approaches were deliberated here. In the first, the probe-sphere radius is treated as an adjustable parameter, akin to the cut-off radius used in elastic network models. In this case, as the radius of the probe-sphere is increased, protein cavities in which solvent would normally be present become part of the effective elastic medium constituting the protein. Accordingly, the shape of the protein becomes a function of the probe-sphere radius, which will affect its mechanical response.
In the second approach, the probe-sphere radius is treated as a fixed, physically-based parameter that is approximately equal to the size of a water molecule, as in electrostatic field calculations.53-55 The homogeneous elastic medium of the protein is then strictly applied to those volumetric regions in which dense intramolecular packing involving close-ranged van der Waals, hydrogen-bond, and bonded interactions are present, and the molecular surface is a well-defined physical feature of the protein. The latter approach was taken here in order to retain the physical connection to atomic packing in solids. An important theoretical property of the FEM is that it guarantees convergence to the exact solution of the underlying mathematical model as the FE mesh is refined, where the mathematical model is defined by the protein’s analytical SES, constitutive behavior, and boundary conditions.15 Thus, any normal mode or mechanical response calculation performed using the proposed FE-based procedure should in principle systematically refine the discretized representation in order to ensure convergence of the computed model property to its exact result. In practice, however, the permissible degree of surface simplification using QSLIM or similar algorithm will depend on the sensitivity of the computed observable to details of molecular shape, which must be evaluated on a case- by-case basis, as addressed below for T4 lysozyme and F-actin. T4 lysozyme The initial structure of the 164 residue (18.7 kDa) mutant T4 phage lysozyme is taken from Matsumura et al.,56 (Protein Data Bank ID 3LZM).57 CHARMM ver. 33a158 is used with the implicit solvation model EEF123 to build in coordinates missing in the crystal structure and to perform energy minimization and NMA. 
Steepest descent minimization followed by adopted-basis Newton–Raphson minimization is performed in the presence of successively reduced harmonic constraints on backbone atoms to achieve a final root-mean-square (RMS) energy gradient of 5×10⁻⁴ kcal/(mol Å) with corresponding RMS deviation between the x-ray and energy-minimized structures of 1.3 Å (Fig. 1a). All-atom (ATM) and RTB NMA21 are used as implemented in CHARMM,22 using one block per residue for the RTB calculations.

Fig. 1 T4 lysozyme (a) crystal structure (Protein Data Bank ID 3LZM), (b) MSMS-triangulated SES, and (c) QSLIM-decimated SES used for the FE computation. Atomic structure rendered with VMD ver. 1.8.5 (ref. 59) and triangulated models rendered with ADINA ver. 8.4.

To define the FE model, MSMS is used to compute the SES of the energy-minimized structure of T4 lysozyme using the MSMS-default 1.5 Å radius probe and ignoring hydrogens. As noted previously, the FE model may be defined directly from the atomic structure without initial energy minimization; however, the energy-minimized structure is used here to be consistent with the ATM model, which requires minimization. MSMS generates a triangulated approximation to the analytical SES that consists of 17,300 triangular faces (Fig. 1b). This model is decimated using QSLIM to a reduced model consisting of 2,000 faces (Fig. 1c). The decimated surface-mesh is read into ADINA ver. 8.4 and used as a template to generate 6,843 4-node tetrahedral finite elements consisting of 1,627 nodes. Calculation of the 100 lowest non-rigid-body modes using an accelerated subspace-iteration method15,48 required 27 MB of RAM and about 10 seconds on a 2.1 GHz Intel Core2Duo processor. Drastically refining the surface representation from 2,000 faces to 17,300 faces (and the associated volume discretization) or computing more than 100 normal modes did not alter the Cα fluctuations significantly.
F-actin

The atomic structure of F-actin (52 protomers, 2.2 MDa molecular weight) is generated using FilaSitus ver. 1.4 (ref. 60) based on the Holmes fiber model61 and the structure of G-actin:ADP:Ca²⁺ from the actin-gelsolin segment-1 complex.62,63 This structure of F-actin-ADP models the filament in its “young” state when the DNase I binding region of subdomain 2 of G-actin (residues 40–48) is in its disordered loop conformation as opposed to its ordered α-helix conformation.63-65 Importantly, in its disordered loop conformation this region forms intramolecular contacts in F-actin that stabilize the filament and have direct consequences on its mechanical properties.66-68 Calculation of the SES using MSMS and a 3 Å radius probe (see footnote a) results in a model with 1,248,038 triangular faces, which is subsequently decimated in several seconds using QSLIM to a reduced model with 40,000 triangular faces. The decimated surface-mesh is read into ADINA ver. 8.4 and used as a template to generate 134,883 4-node tetrahedral finite elements consisting of 31,881 nodes (Fig. 2b). Planar axial stretching is used to determine the effective Young’s modulus of F-actin, E = 2.69 GPa, by fitting the computed stretching stiffness to its experimentally-measured value in the absence of tropomyosin, 43.7 nN.51 The homogeneous mass density, ρ = 1,170 kg/m³, is based on the 42 kDa molecular weight of G-actin and the calculated molecular volume of F-actin, which is equal to 3.1×10⁶ Å³ for the 52-mer considered. Normal mode analysis using the accelerated subspace-iteration procedure in ADINA requires 22 MB and less than 10 seconds to calculate the lowest 10 modes on a 2.1 GHz Intel Core2Duo processor. To test convergence of the FE solution to the exact solution, the FE mesh was coarsened considerably to a model consisting of only 7,558 4-node tetrahedral volume elements (4,000 surface triangles), for which the lowest four eigen-frequencies increased by at most 15% with respect to the more detailed model. Further mesh refinement beyond 40,000 surface elements was precluded by the problematic surface mesh generated by the proposed procedure, in which substantial element intersections were present.

a Use of the MSMS-default 1.5 Å radius probe resulted in QSLIM-decimated surface models that were poorly formed with multiple intersecting and degenerate triangles due to re-entrant surfaces of F-actin. Use of a 3 Å radius probe resolved this problem of SES-representation and is not expected to affect significantly the large length-scale normal modes of F-actin, which has relatively large minor and major diameters of ~40 and 80 Å, respectively.

Fig. 2 (a) Atomic structure of the 52-monomer F-actin filament analyzed and (b) the triangulated SES used to define the FE model. Atomic structure is rendered with VMD ver. 1.8.5 (ref. 59) and the FEM model rendered using ADINA ver. 8.4.

Results

T4 lysozyme

Equilibrium thermal fluctuations of Cα atoms aid in understanding protein function as mediated by local conformational flexibility and provide a first quantitative test for the proposed coarse-grained procedure. Experimental fluctuations are related to the experimental temperature- or B-factor by $B_i = (8\pi^2/3)\,\langle \Delta r_i^2 \rangle$, where $\langle \Delta r_i^2 \rangle$ is the mean-squared fluctuation of atom i. While both coarse-grained models capture well the overall experimental variation in flexibility of T4 lysozyme (Fig. 3a and Table 1),56 local differences are evident in disordered loop regions where conformational flexibility is overestimated significantly by both the RTB and FEM procedures (e.g., residue numbers 35–40). Comparison with the all-atom model indicates that these discrepancies are inherent to the protein structure, however, and not artifacts of the RTB and FEM procedures (Fig. 3b).
Indeed, Cα fluctuations calculated with the RTB and FEM models correlate notably better with fluctuations calculated with the all-atom model than with experiment (Table 1).

Fig. 3. Coarse-grained RMSF of Cα atoms in T4 lysozyme compared with (a) experiment and (b) all-atom NMA. 100 modes are used to compute the all-atom, RTB, and FEM fluctuations. Correlation coefficients provided in Table 1. [Axes in both panels: residue index (0 to 160) versus RMSF (Å).]

Table 1 Correlation coefficients corresponding to Cα atom RMSF in Figure 3.

        Experiment   ATM
RTB     0.73         0.95
FEM     0.68         0.89

Inter-residue spatial correlations measured at Cα atoms provide additional insight into protein function,69,70 as well as a further test of the proposed coarse-grained procedure. Interestingly, the RTB and FEM procedures provide similar information with respect to the all-atom model, as measured over either the lowest 10 or 100 modes (Fig. 4 top and bottom, respectively). The fact that the correlation maps are largely determined with as few as ten modes reconfirms numerous previous findings that the lowest modes of proteins dominate their free vibration response.14,71,72 The similarity in the FEM and ATM correlation maps provides additional evidence that T4 lysozyme behaves remarkably similarly to a homogeneous isotropic elastic solid in free vibration.

Fig. 4 T4 lysozyme inter-Cα correlations computed using (top) 10 modes and (bottom) 100 modes for the (a) ATM, (b) RTB, and (c) FEM models.

The lowest four mode shapes computed using the FEM may be projected onto the ground-state (energy-minimized) structure of T4 lysozyme to visualize their nature (Fig. 5). Similar to the native hen egg lysozyme, the lowest mode is a hinge-bending mode,1,73 whereas the three higher modes are a combination of hinge- and twist-deformations.
Quantitative comparison between the coarse-grained and all-atom models is made in Table 2 for the lowest four mode shapes and in Figure 6 for the lowest 200 frequencies.

Fig. 5 Lowest four eigenmodes computed by the FEM superimposed on the minimized structure of T4 lysozyme. Overlap with the all-atom model is given in Table 2. Images rendered using VMD ver. 1.8.5 (ref. 59).

Table 2 Overlap of coarse-grained model and all-atom normal modes as measured at Cα positions.

        Mode 1   Mode 2   Mode 3   Mode 4
RTB     0.97     0.93     0.82     0.28
FEM     0.91     0.86     0.76     0.71

The modal frequency distributions provide a final quantitative evaluation of the FEM and RTB approach for T4 lysozyme (Fig. 6). While the overall correlation between the FEM and all-atom frequencies is reasonable, particularly for low mode-numbers, the FEM tends to underestimate the “exact” frequency computed using the all-atom model at high mode-numbers. This suggests that the FEM models the protein as overly compliant in this regime, which is to be expected because higher modes excite shorter wavelength, stiffer degrees of freedom in the all-atom protein resulting from chain connectivity, whereas the elastic solid approximation assumes a compliance that is length-scale invariant. Backbone Cα fluctuations as well as Cα correlations are apparently unaffected by this approximation because the low modes dominate these observables. Interestingly, the opposite tendency was observed by Tama et al.21 for the RTB-approach with successively larger blocks. This is also to be expected because the assumption of rigid blocks in the protein renders the structure overly stiff on short length scales (high frequency modes), and the length-scale at which this deviation from the all-atom model becomes significant increases with increasing block-size.

[Figure 6: ωCG (cm⁻¹) versus ωATM (cm⁻¹)]

Fig. 6 Correlation between coarse-grained (CG) and all-atom (ATM) model frequencies for the lowest 200 modes.
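As a recap of the fluctuation analysis in this section, the following minimal sketch converts crystallographic B-factors to mean-squared fluctuations using the relation quoted above, read here as B_i = 8π²Δr_i²/3, and then correlates experimental and model RMSF values. The function names, the sample numbers, and the choice of a Pearson coefficient are assumptions made for illustration only; the text does not state which correlation measure underlies Table 1.

```python
import math

def bfactor_to_msf(b_factor):
    """Mean-squared fluctuation (A^2) from a B-factor (A^2): msf = 3 B / (8 pi^2)."""
    return 3.0 * b_factor / (8.0 * math.pi ** 2)

def msf_to_bfactor(msf):
    """Inverse relation: B = 8 pi^2 msf / 3."""
    return 8.0 * math.pi ** 2 * msf / 3.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-residue values, NOT data from the paper.
b_experiment = [15.0, 22.0, 40.0, 18.0, 30.0]   # B-factors in A^2
rmsf_model = [0.70, 0.85, 1.30, 0.80, 1.00]     # model RMSF in A

rmsf_experiment = [math.sqrt(bfactor_to_msf(b)) for b in b_experiment]
r = pearson(rmsf_experiment, rmsf_model)
print(f"correlation between experimental and model RMSF: {r:.2f}")
```

In practice the model RMSF would come from summing the contributions of the retained normal modes at each Cα position; that step is omitted here.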
F-actin

F-actin is a highly dynamic biopolymer with a considerable degree of internal plasticity in the state of tilt and twist of its constituent protomers, which depends on the bound nucleotide-state (ATP/ADP), bound actin-binding protein, and solvent conditions.74-78 Additionally, the bending stiffness of F-actin has been shown to increase by a factor of two in the presence of phalloidin, by 50% in the F-ADP-P versus F-ADP state, and to be regulated by tropomyosin in a Ca2+-dependent fashion.66 Thus, any modeling attempt to predict the mechanical properties of F-actin and investigate their relation to its detailed internal structure and composition must consider such variations.

Modeling attempts to investigate the structure-function relation of F-actin include an early study by ben-Avraham and Tirion,68 who treated G-actin monomers as internally rigid and connected to their nearest neighbor monomers by compliant springs; a more recent study by Ming et al.,79 in which conventional eNMA is used together with substructure-synthesis to calculate the large wavelength normal modes of a micron-long F-actin molecule; and most recently an all-atom MD study by Chu and Voth,67 who found that the loop-helix transition of the DNase I binding region of subdomain 2 of G-actin plays a central role in respectively stabilizing and destabilizing F-actin by disrupting inter-monomer interactions. Chu and Voth67 also calculated the apparent persistence length of F-actin and found that the loop-to-helix transition between the ATP- and ADP-bound states accounted for the approximately 50% decrease in associated bending stiffness observed experimentally.66

The normal modes of F-actin (52-mer, 0.14 μm length) computed here in free planar-vibration yield four bending modes as the lowest modes (Fig. 7).
Association of F-actin with a homogeneous elastic rod in free vibration80 results in an apparent bending stiffness, $\kappa = 6.8\times10^{-26}$ N m$^2$, for the lowest mode, which is near the upper limit of bending stiffness typically reported experimentally.66,81,82 Subjecting F-actin to an axially compressive load and performing a linearized buckling analysis yields the lowest critical Euler buckling load, $P_{crit} = 33$ pN. Association of the filament again with a homogeneous elastic Euler–Bernoulli beam yields the effective bending stiffness, $\kappa = P_{crit} L^2 / \pi^2 = 6.9\times10^{-26}$ N m$^2$, which is similar to the bending stiffness calculated from the lowest bending mode because that mode of deformation is the same as the lowest Euler buckling mode.

[Figure 7: panels labeled Mode 1, Mode 2, Mode 3, and Mode 4]

Fig. 7 Four lowest free vibration modes of F-actin (52-mer, 0.144 μm length) in planar deformation. The corresponding angular frequencies are $0.18\times10^{-2}$, $0.48\times10^{-2}$, $0.92\times10^{-2}$, and $0.16\times10^{-1}$ rad/psec.

The bending stiffness calculated here for F-actin is consistent with experimental measurements of the ATP-bound-state in which the DNase binding region (residues 40–48) in subdomain 2 of G-actin is in its disordered loop conformation, thereby stabilizing inter-monomer interactions.66,67 While this is to be expected given the structure of G-actin:ADP:Ca2+ employed, in which the DNase binding region is also in its disordered loop conformation, a similar coarse-grained analysis of F-actin must be performed in which the DNase binding region of G-actin is in its ordered α-helical structure. Only then may it be stated definitively whether the observed mechanical behavior is due solely to this detailed structural difference or to some other source, such as a lack of modeling resolution.
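The buckling-based stiffness quoted above can be checked with a few lines of arithmetic. The sketch below assumes the Euler relation κ = P_crit L²/π², which corresponds to an effective buckling length equal to the filament length, and uses the values stated in the text (P_crit = 33 pN, L = 0.144 μm). The persistence-length estimate l_p = κ/(k_B T) is added only for orientation, and the temperature of 300 K is an assumption that does not come from the source.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def euler_bending_stiffness(p_crit, length):
    """Effective bending stiffness (N m^2) from the lowest Euler buckling load.

    Assumes kappa = P_crit * L^2 / pi^2, i.e. an effective length equal to L.
    """
    return p_crit * length ** 2 / math.pi ** 2

def persistence_length(kappa, temperature=300.0):
    """Persistence length (m) l_p = kappa / (k_B T); T = 300 K is an assumed value."""
    return kappa / (BOLTZMANN * temperature)

p_crit = 33e-12      # N, critical buckling load from the text
length = 0.144e-6    # m, 52-mer filament length from the text

kappa = euler_bending_stiffness(p_crit, length)
print(f"kappa = {kappa:.2e} N m^2")
print(f"l_p   = {persistence_length(kappa) * 1e6:.1f} micrometers (at the assumed 300 K)")
```

The computed stiffness agrees with the 6.9×10⁻²⁶ N m² reported from the linearized buckling analysis.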
While a more detailed investigation of this type is of direct interest in evaluating the full utility of the proposed procedure, it is also of interest fundamentally to investigate the respective roles of molecular shape versus molecular interactions on determining the mechanical properties of supramolecular assemblies such as F-actin, MTs, and viral capsids. In particular, an intriguing hypothesis is that mechanical response is determined solely by molecular shape, in which case the mechanical properties of supramolecular assemblies would be robust to amino acid mutations that do not alter molecular shape. A competing hypothesis is that mechanical response is sensitive to both molecular shape and detailed molecular interactions, in which case amino acid mutations would be more tightly constrained. In either case, investigation of the respective roles of molecular shape versus specific interactions on protein mechanics clearly requires that all-atom models be considered, either directly or via incorporation into coarse-grained models. Such investigations are currently underway and are expected to provide fundamental insight into the origin and robustness of the mechanical properties of supramolecular assemblies. Concluding discussion A coarse-grained FE-based procedure is proposed to compute the normal modes and mechanical response of proteins and their supramolecular assemblies. The procedure takes as input the atomic structure to define uniquely the volume associated with the SES, mass density, and elastic stiffness of the protein. The initial, high resolution SES discretized at atomic resolution is simplified using a quadric simplification algorithm to obtain a molecular surface representation of arbitrary prescribed spatial resolution. 
While the proposed procedure is applied to proteins with known atomic structure, the molecular volume could equally be defined from electron density data, rendering the procedure applicable to a broad class of biomolecules and biomolecular complexes for which only a rough approximation to the molecular volume is known. As with existing coarse-grained elastic network models, energy minimization is not required prior to the NMA because the initial structure is assumed to be the ground-state structure.

Ongoing development of the proposed procedure is directed towards three areas of improvement. First, the atomic-based Hessian from all-atom force-fields such as CHARMM58 will be projected onto the FE-space such that the model optimally converges to the “exact” all-atom solution as the FE mesh is refined to atomic length scales. Such a procedure will enable the systematic coarsening of protein structure and interactions without the a priori assumption of elastic response. Indeed, an intriguing and as yet unresolved question regards the relative effects of molecular shape versus specific molecular interactions on the mechanical response of supramolecular assemblies such as F-actin, MTs, and viral capsids. Second, the Poisson–Boltzmann equation used to model aqueous electrolyte-mediated electrostatic interactions in proteins may be coupled directly to the elastic-based FE model, so that it may be included in computations of normal modes and mechanical response. Langevin dynamics may also be incorporated into the model by coupling the protein-domain to the Stokes equations to model solvent damping.41,83 Finally, the proposed surface discretization and simplification procedure requires improvement because it often results in surface meshes with intersecting or degenerate triangles, as encountered here for F-actin.

The utility of the proposed FE-based procedure is explored here for one specific globular protein and one supramolecular assembly, namely T4 lysozyme and F-actin, respectively.
Clearly, in order to evaluate the utility of the proposed procedure thoroughly, a set of proteins of drastically varying structure must be analyzed, as well as additional supramolecular assemblies. Additional response variables and the effects of internal structural variations of the molecules examined should also be investigated. Notwithstanding these additional analyses and the foregoing model improvements, the current communication establishes an effective theoretical framework for the computation of the normal modes and generalized mechanical response of proteins and their supramolecular assemblies based on the elastic medium theory of proteins. Acknowledgements Discussions with Marco Cecchini, Martin Karplus, Klaus–Jürgen Bathe, and Michael Garland are gratefully acknowledged, as is funding from the Alexander von Humboldt Foundation in the form of a post-doctoral fellowship. The author additionally thanks Michael Sanner for bringing QSLIM to his attention. References 1. Levitt M, Sander C, Stern PS. Protein Normal-mode Dynamics: Trypsin Inhibitor, Crambin, Ribonuclease and Lysozyme. Journal of Molecular Biology 1985;181(3):423-447. 2. Brooks B, Karplus M. Harmonic dynamics of proteins - Normal Modes and fluctuations in bovine pancreatic trypsin inhibitor. Proceedings of the National Academy of Sciences of the United States of America 1983;80(21):6571-6575. 3. Go N, Noguti T, Nishikawa T. Dynamics of a small globular protein in terms of low frequency vibrational modes. Proceedings of the National Academy of Sciences of the United States of America 1983;80(12):3696-3700. 4. Bruccoleri RE, Karplus M, McCammon JA. The hinge-bending mode of a lysozyme inhibitor complex. Biopolymers 1986;25(9):1767-1802. 5. Karplus M, McCammon JA. Molecular dynamics simulations of biomolecules. Nature Structural Biology 2002;9(9):646-652. 6. Brooks BR, Janezic D, Karplus M. Harmonic analysis of large systems. 1. Methodology. Journal of Computational Chemistry 1995;16(12):1522-1542. 
7. Tirion MM. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters 1996;77(9):1905-1908. 8. Bahar I, Atilgan AR, Erman B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design 1997;2(3):173-181. 9. Tama F, Brooks CL. Symmetry, form, and shape: Guiding principles for robustness in macromolecular machines. Annual Review of Biophysics and Biomolecular Structure 2006;35:115-133. 10. Tozzini V. Coarse-grained models for proteins. Current Opinion In Structural Biology 2005;15(2):144-150. 11. Bahar I, Rader AJ. Coarse-grained normal mode analysis in structural biology. Current Opinion in Structural Biology 2005;15(5):586-592. 12. Ma JP. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure 2005;13(3):373-380. 13. Suezaki Y, Go N. Breathing Mode of Conformational Fluctuations in Globular Proteins. International Journal of Peptide and Protein Research 1975;7(4):333- 334. 14. Lu MY, Ma JP. The role of shape in determining molecular motions. Biophysical Journal 2005;89(4):2395-2401. 15. Bathe KJ. Finite Element Procedures. Upper Saddle River, New Jersey: Prentice- Hall Inc.; 1996. 16. Zienkiewicz OC, Taylor RL. The finite element method. Boston: Butterworth– Heinemann; 2000. 17. Richards FM. Areas, volumes, packing, and protein-structure. Annual Review of Biophysics and Bioengineering 1977;6:151-176. 18. Connolly ML. Analytical Molecular Surface Calculation. Journal of Applied Crystallography 1983;16(OCT):548-558. 19. Greer J, Bush BL. Macromolecular Shape and Surface Maps by Solvent Exclusion. Proceedings of the National Academy of Sciences of the United States of America 1978;75(1):303-307. 20. Sanner MF, Olson AJ, Spehner JC. Reduced surface: An efficient way to compute molecular surfaces. Biopolymers 1996;38(3):305-320. 21. Tama F, Gadea FX, Marques O, Sanejouand YH. 
Building-block approach for determining low frequency normal modes of macromolecules. Proteins: Structure Function and Genetics 2000;41(1):1-7. 22. Li GH, Cui Q. A coarse-grained normal mode approach for macromolecules: An efficient implementation and application to Ca2+-ATPase. Biophysical Journal 2002;83(5):2457-2474. 23. Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins: Structure Function and Genetics 1999;35(2):133-152. 24. Ming D, Kong YF, Lambert MA, Huang Z, Ma JP. How to describe protein motion without amino acid sequence and atomic coordinates. Proceedings of the National Academy of Sciences of the United States of America 2002;99(13):8620-8625. 25. Tama F, Wriggers W, Brooks CL. Exploring global distortions of biological macromolecules and assemblies from low-resolution structural information and elastic network theory. Journal of Molecular Biology 2002;321(2):297-305. 26. Chacon P, Tama F, Wriggers W. Mega-Dalton biomolecular motion captured from electron microscopy reconstructions. Journal of Molecular Biology 2003;326(2):485-492. 27. Michel JP, Ivanovska IL, Gibbons MM, Klug WS, Knobler CM, Wuite GJL, Schmidt CF. Nanoindentation studies of full and empty viral capsids and the effects of capsid protein mutations on elasticity and strength. Proceedings of the National Academy of Sciences of the United States of America 2006;103(16):6184-6189. 28. Carrasco C, Carreira A, Schaap IAT, Serena PA, Gomez-Herrero J, Mateu MG, Pablo PJ. DNA-mediated anisotropic mechanical reinforcement of a virus. Proceedings of the National Academy of Sciences of the United States of America 2006;103(37):13706-13711. 29. de Pablo PJ, Schaap IAT, MacKintosh FC, Schmidt CF. Deformation and collapse of microtubules on the nanometer scale. Physical Review Letters 2003;91(9). 30. Kis A, Kasas S, Babic B, Kulik AJ, Benoit W, Briggs GAD, Schonenberger C, Catsicas S, Forro L. Nanomechanics of microtubules. Physical Review Letters 2002;89(24). 31. 
Claessens M, Bathe M, Frey E, Bausch AR. Actin-binding proteins sensitively mediate F-actin bundle stiffness. Nature Materials 2006;5(9):748-753. 32. Bathe M, Heussinger C, Claessens MMAE, Bausch AR, Frey E. Cytoskeletal bundle bending, buckling, and stretching behavior. Submitted 2006. 33. Wriggers W, Zhang Z, Shah M, Sorensen DC. Simulating nanoscale functional motions of biomolecules. Molecular Simulation 2006;32(10-11):803-815. 34. Howard J. Mechanics of Motor Proteins and the Cytoskeleton. Sunderland, MA: Sinauer Associates, Inc.; 2001. 35. Durand P, Trinquier G, Sanejouand YH. New approach for determining low frequency Normal-Modes in macromolecules. Biopolymers 1994;34(6):759-771. 36. Tadmor EB, Ortiz M, Phillips R. Quasicontinuum analysis of defects in solids. Philosophical Magazine A: Physics of Condensed Matter Structure Defects and Mechanical Properties 1996;73(6):1529-1563. 37. Abraham FF, Broughton JQ, Bernstein N, Kaxiras E. Spanning the continuum to quantum length scales in a dynamic simulation of brittle fracture. Europhysics Letters 1998;44(6):783-787. 38. Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of nanosystems: Application to microtubules and the ribosome. Proceedings of the National Academy of Sciences of the United States of America 2001;98(18):10037-10041. 39. Cortis CM, Friesner RA. Numerical solution of the Poisson-Boltzmann equation using tetrahedral finite-element meshes. Journal of Computational Chemistry 1997;18(13):1591-1608. 40. Holst M, Baker N, Wang F. Adaptive multilevel finite element solution of the Poisson-Boltzmann equation I. Algorithms and examples. Journal of Computational Chemistry 2000;21(15):1319-1342. 41. Lamm G, Szabo A. Langevin modes of macromolecules. Journal of Chemical Physics 1986;85(12):7334-7348. 42. Potter MJ, Luty B, Zhou HX, McCammon JA. Time-dependent rate coefficients from Brownian Dynamics simulations. Journal of Physical Chemistry 1996;100(12):5149-5154. 43. Heckbert PS, Garland M. 
Optimal triangulation and quadric-based surface simplification. Computational Geometry: Theory and Applications 1999;14(1- 3):49-65. 44. Garland M. Quadric-Based Polygonal Surface Simplification [Ph.D.]: Carnegie Mellon University; 1999. 45. Garland M, Heckbert PS. Surface simplification using quadric error metrics. 1997 August 1997. p 209-216. 46. Wriggers W, Agrawal RK, Drew DL, McCammon A, Frank J. Domain motions of EF-G bound to the 70S ribosome: Insights from a hand-shaking between multi- resolution structures. Biophysical Journal 2000;79(3):1670-1678. 47. Malvern LE. Introduction to the Mechanics of a Continuous Medium. Englewood Cliffs, N. J.: Prentice–Hall; 1969. 48. Bathe KJ, Ramaswamy S. An accelerated Subspace Iteration Method. Computer Methods in Applied Mechanics and Engineering 1980;23(3):313-331. 49. Bathe KJ, Gracewski S. On non-linear dynamic analysis using substructuring and mode superposition. Computers & Structures 1981;13(5-6):699-707. 50. Ming D, Kong YF, Wu YH, Ma JP. Substructure synthesis method for simulating large molecular complexes. Proceedings of the National Academy of Sciences of the United States of America 2003;100(1):104-109. 51. Kojima H, Ishijima A, Yanagida T. Direct measurement of stiffness of single actin-filaments with and without tropomyosin by in-vitro nanomanipulation. Proceedings of the National Academy of Sciences of the United States of America 1994;91(26):12962-12966. 52. Benavraham D, Tirion MM. Dynamic and elastic properties of F-actin: A Normal Modes Analysis. Biophysical Journal 1995;68(4):1231-1245. 53. Nicholls A, Honig B. A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson-Boltzmann equation. Journal of Computational Chemistry 1991;12(4):435-445. 54. Cortis CM, Friesner RA. An automatic three-dimensional finite element mesh generation system for the Poisson-Boltzmann equation. Journal of Computational Chemistry 1997;18(13):1570-1590. 55. Baker N, Holst M, Wang F. 
Adaptive multilevel finite element solution of the Poisson-Boltzmann equation II. Refinement at solvent-accessible surfaces in biomolecular systems. Journal of Computational Chemistry 2000;21(15):1343- 1352. 56. Matsumura M, Wozniak JA, Daopin S, Matthews BW. Structural Studies of Mutants of T4 Lysozyme that Alter Hydrophobic Stabilization. Journal of Biological Chemistry 1989;264(27):16059-16066. 57. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: A Computer- based Archival File for Macromolecular Structures. Journal of Molecular Biology 1977;112(3):535-542. 58. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. CHARMM—A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry 1983;4(2):187-217. 59. Humphrey W, Dalke A, Schulten K. VMD: Visual Molecular Dynamics. Journal of Molecular Graphics 1996;14(1):33. 60. Wriggers W, Milligan RA, McCammon JA. Situs: A package for docking crystal structures into low-resolution maps from electron microscopy. Journal of Structural Biology 1999;125(2-3):185-195. 61. Holmes KC, Popp D, Gebhard W, Kabsch W. Atomic model of the actin filament. Nature 1990;347(6288):44-49. 62. McLaughlin PJ, Gooch JT, Mannherz HG, Weeds AG. Structure of Gelsolin Segment-1-Actin Complex and the Mechanism of Filament Severing. Nature 1993;364(6439):685-692. 63. Wriggers W, Schulten K. Investigating a back door mechanism of actin phosphate release by steered molecular dynamics. Proteins: Structure Function and Genetics 1999;35(2):262-273. 64. Otterbein LR, Graceffa P, Dominguez R. The crystal structure of uncomplexed actin in the ADP state. Science 2001;293(5530):708-711. 65. Graceffa P, Dominguez R. Crystal structure of monomeric actin in the ATP state - Structural basis of nucleotide-dependent actin dynamics. Journal of Biological Chemistry 2003;278(36):34172-34180. 66. 
Isambert H, Venier P, Maggs AC, Fattoum A, Kassab R, Pantaloni D, Carlier MF. Flexibility of actin filaments derived from thermal fluctuations - Effect of bound nucleotide, phalloidin, and muscle regulatory proteins. Journal of Biological Chemistry 1995;270(19):11437-11444. 67. Chu JW, Voth GA. Allostery of actin filaments: Molecular dynamics simulations and coarse-grained analysis. Proceedings of the National Academy of Sciences of the United States of America 2005;102(37):13111-13116. 68. ben-Avraham D, Tirion MM. Dynamic and elastic properties of F-actin: A Normal Modes Analysis. Biophysical Journal 1995;68(4):1231-1245. 69. Cheng XL, Lu BZ, Grant B, Law RJ, McCammon JA. Channel opening motion of alpha 7 nicotinic acetylcholine receptor as suggested by normal mode analysis. Journal of Molecular Biology 2006;355(2):310-324. 70. Lange OF, Grubmuller H. Generalized correlation for biomolecular dynamics. Proteins: Structure Function and Bioinformatics 2006;62(4):1053-1061. 71. Cui Q, Li GH, Ma JP, Karplus M. A normal mode analysis of structural plasticity in the biomolecular motor F-1-ATPase. Journal of Molecular Biology 2004;340(2):345-372. 72. Nicolay S, Sanejouand YH. Functional modes of proteins are among the most robust. Physical Review Letters 2006;96(7):Art. No. 078104. 73. Brooks B, Karplus M. Normal-Modes for specific motions of macromolecules - Application to the hinge-bending mode of lysozyme. Proceedings of the National Academy of Sciences of the United States of America 1985;82(15):4995-4999. 74. Egelman EH, Francis N, Derosier DJ. F-actin is a helix with a random variable twist. Nature 1982;298(5870):131-135. 75. Egelman EH, Orlova A. Allostery, cooperativity, and different structural states in F-actin. Journal of Structural Biology 1995;115(2):159-162. 76. Orlova A, Galkin VE, VanLoock MS, Kim E, Shvetsov A, Reisler E, Egelman EH. Probing the structure of F-actin: Cross-links constrain atomic models and modify actin dynamics. 
Journal of Molecular Biology 2001;312(1):95-106. 77. Orlova A, Shvetsov A, Galkin VE, Kudryashov DS, Rubenstein PA, Egelman EH, Reisler E. Actin-destabilizing factors disrupt filaments by means of a time reversal of polymerization. Proceedings of the National Academy of Sciences of the United States of America 2004;101(51):17664-17668. 78. McGough A. F-actin-binding proteins. Current Opinion in Structural Biology 1998;8(2):166-176. 79. Ming DM, Kong YF, Wu YH, Ma JP. Simulation of F-actin filaments of several microns. Biophysical Journal 2003;85(1):27-35. 80. Den Hartog JP. Mechanical Vibrations. New York: McGraw-Hill; 1956. 81. LeGoff L, Hallatschek O, Frey E, Amblard F. Tracer studies on F-actin fluctuations. Physical Review Letters 2002;89(25):Art. No. 258101-258101. 82. Gittes F, Mickey B, Nettleton J, Howard J. Flexural rigidity of microtubules and actin filaments measured from thermal fluctuations in shape. Journal of Cell Biology 1993;120(4):923-934. 83. Case DA. Normal mode analysis of protein dynamics. Current Opinion in Structural Biology 1994;4(2):285-290. ABSTRACT A coarse-grained computational procedure based on the Finite Element Method is proposed to calculate the normal modes and mechanical response of proteins and their supramolecular assemblies. Motivated by the elastic network model, proteins are modeled as homogeneous isotropic elastic solids with volume defined by their solvent-excluded surface. The discretized Finite Element representation is obtained using a surface simplification algorithm that facilitates the generation of models of arbitrary prescribed spatial resolution. The procedure is applied to compute the normal modes of a mutant of T4 phage lysozyme and of filamentous actin, as well as the critical Euler buckling load of the latter when subject to axial compression. Results compare favorably with all-atom normal mode analysis, the Rotation Translation Blocks procedure, and experiment. 
The proposed methodology establishes a computational framework for the calculation of protein mechanical response that facilitates the incorporation of specific atomic-level interactions into the model, including aqueous-electrolyte-mediated electrostatic effects. The procedure is equally applicable to proteins with known atomic coordinates as it is to electron density maps of proteins, protein complexes, and supramolecular assemblies of unknown atomic structure. <|endoftext|><|startoftext|> Introduction Light element abundance Homogeneous BBN Parameters and Basic equations Theoretical predictions and observations of light elements Theoretical predictions and observations of heavy elements (92,94Mo, 96,98Ru) Diffusion during BBN Summary Acknowledgements References ABSTRACT This is a reply report to astro-ph/0604264. We studied heavy element production in high baryon density region in early universe astro-ph/0507439. However it is claimed in astro-ph/0604264 that small scale but high baryon density region contradicts observations for the light element abundance or in order not to contradict to observations high density region must be so small that it cannot affect the present heavy element abundance. In this paper we study big bang nucleosynthesis in high baryon density region and show that in certain parameter spaces it is possible to produce enough amount of heavy element without contradiction to CMB and light element observations. <|endoftext|><|startoftext|> Introduction Sum-over-states expressions for the time-ordered nonlinear response Quasiparticle expressions for Wannier excitons in semiconductors Connecting the sum-over-states and the quasiparticle pictures 2D correlation signals Discussion Acknowledgments Exciton representation of the two-band Hamiltonian for fermions The Nonlinear Exciton Equations Response functions of quasiparticles The exciton scattering-matrix SOS expressions for third order techniques. 
Quasiparticle picture for soft-core and hard-core bosons References ABSTRACT Two-dimensional correlation spectroscopy (2DCS) based on the nonlinear optical response of excitons to sequences of ultrafast pulses, has the potential to provide some unique insights into carrier dynamics in semiconductors. The most prominent feature of 2DCS, cross peaks, can best be understood using a sum-over-states picture involving the many-body eigenstates. However, the optical response of semiconductors is usually calculated by solving truncated equations of motion for dynamical variables, which result in a quasiparticle picture. In this work we derive Green's function expressions for the four wave mixing signals generated in various phase-matching directions and use them to establish the connection between the two pictures. The formal connection with Frenkel excitons (hard-core bosons) and vibrational excitons (soft-core bosons) is pointed out. <|endoftext|><|startoftext|> Introduction Since the solar corona is optically thin, studies based on coronal loop observations must include some form of subtraction of the background contribution (see e.g., Klimchuk 2000, Schmelz & Martens 2006, López Fuentes, Klimchuk & Démoulin 2006). In their recent statistical study based on observations from the Transition Region and Coronal Explorer (TRACE, see Handy et al. 1999), Aschwanden, Nightingale & Boerner (2007) showed that the background can be several times brighter than the loops themselves. It is likely that the background corona is formed by a number of loops that are too faint to produce a large enough contrast to make them detectable. However, these unobserved structures constitute a spatially fluctuating background for actual observed loops. Therefore, even for loops with constant intensity along their length, fluctuations due to the structuring of the background are expected. 
The determination of morphological properties of a loop, such as its diameter, can be affected by the characteristics of the background, and therefore it is important that the background be taken into account during such analyses.

In a recent paper (López Fuentes, Klimchuk & Démoulin 2006, henceforth LKD06) we explored the problem of the apparent constant width of coronal loops. Since loops are the trace of magnetic flux tubes rooted in the photosphere, we might expect on the basis of simple force-free magnetic field models that most loops would expand with height. However, observations show that this is not the case; both X-ray loops (Klimchuk 2000) observed with the Soft X-ray Telescope (SXT, see Tsuneta et al. 1991) aboard Yohkoh, and EUV loops (Watko & Klimchuk 2000, and LKD06) observed with TRACE, seem to correspond to constant cross-sections. In LKD06, we compared a number of observed TRACE loops with corresponding model flux tubes obtained from force-free extrapolations of magnetogram data from the Michelson Doppler Imager (MDI, see Scherrer et al. 1995) aboard the Solar and Heliospheric Observatory (SOHO). To quantify the expansion of the loops and flux tubes, we defined the expansion factor Γ as the ratio between the widths averaged over the middle and footpoint sections. We found that the mean expansion factor of the model flux tubes is about twice that of the corresponding observed loops. Another important result is that the cross section is much more asymmetric (from footpoint to footpoint) for the model flux tubes than for observed loops. We suggest that the origin of this asymmetry lies in the complexity of the magnetic connectivity of the solar atmosphere. In LKD06 we proposed a mechanism to explain the observed symmetry of real loops.

Although the measured widths of observed loops have very little global variation, there are short distance fluctuations as large as 25% of the average width.
In LKD06 the loop background was subtracted by linearly interpolating between the intensities on either side of the loop. Since the background intensity can be as much as three to five times the intrinsic intensity of the loop, we might expect the width determination to be less reliable at positions where the ratio of loop to background intensities is smaller. If so, then measured width variations might be partly or largely an artifact of imperfect background subtraction.

It is also possible that the width fluctuations are indicative of real structural properties of the loops. For instance, loops may be bundles of thinner unresolved magnetic strands that wrap around each other. If there are only a few such strands, then we might expect the width of the bundle to fluctuate on top of a global trend. Furthermore, the width should be anti-correlated with the intensity, since the bundle will be thinner and brighter in places where the strands are lined up along the line of sight, and it will be fatter and fainter in places where the strands are side-by-side across the plane of the sky.

The filamentary nature of coronal loops, and the solar corona in general, has become progressively evident from the combination of models and observations (for a review, see e.g., Klimchuk 2006). Our ability to discern the internal structure of loops is limited by the instrument resolution. It can be seen from TRACE images that structures many times wider than the instrument Point Spread Function (PSF) are clearly made of thinner strands. It is not surprising then that recognizable “individual” loops are no thicker than a few times the PSF width. Since identifiable individual loops are close to the resolution limit, it has been suggested that the apparent constant width may just be an artifact of the resolution (see the recent paper by DeForest 2007).
If loops are everywhere much smaller than the PSF, then they will appear to have a constant width equal to that of the PSF, even if the true width is varying greatly. We have carefully accounted for the PSF in our earlier studies and concluded that this is not a viable explanation for the observed constant widths. What we have not addressed in as much detail is the possible role of imperfect background subtraction. This paper describes a study that addresses both the background subtraction and finite resolution and the extent to which they influence the measured widths of loops. Our approach is to produce synthetic loops with constant and variable cross-sections, and place them on real TRACE backgrounds to simulate loop observations. We then process the synthetic data following the same procedure used in LKD06, so we can compare them with actual TRACE loops. This allows us to determine whether the procedure followed in LKD06 is able to distinguish expanding loops from constant cross-section loops. We will answer the question of whether the lack of global expansion in observed loops is real or simply an observational artifact, as suggested by DeForest (2007). We will also investigate the reliability of the shorter length scale fluctuations that are often observed. In Section 2 we describe the main properties of the set of loops studied in LKD06. We explain the synthetic loop construction in Section 3. In Section 4 we compare synthetic and observed loops, and we discuss and conclude in Section 5.

2. Observed TRACE loops

In LKD06 we studied a set of 20 loops from TRACE images in the 171 Å passband. To determine the width of the loops we followed a procedure based on the measurement of the second moment (the standard deviation) of the cross-axis intensity profile at each position along the loop. The measurement is done on a straightened version of the loop, as described in LKD06.
Assuming circular cross-sections and uniform emissivity, the cross-section diameter (that we refer to as the width) will be 4 times the standard deviation of the profile. The same procedure had been used in previous studies (see Klimchuk et al. 1992, Klimchuk 2000, Watko & Klimchuk 2000). The actual background is estimated by linear interpolation of the background pixels at both sides of the loop. The obtained width is corrected for instrumental resolution (i.e., the combined PSF due to telescope smearing and detector pixelation). The typical length of the studied loops is around 150 TRACE pixels or 54 Mm, though loops as long as 300 pixels (108 Mm) are included in the set. The average width for all loops in the set is 4.2 pixels or 1.5 Mm. Figure 1 shows a typical case having the average width and length. The upper panel shows the loop as observed in the TRACE image, and the lower panel is the “straightened” version. For the resolution correction we use a conversion curve (see Figure 4 in LKD06) to transform each measured standard deviation value to width. The curve has been obtained assuming a Gaussian PSF with a full width at half maximum of 2.25 pixels and loop cross-sections that are circular and uniformly filled. The chosen PSF width is an upper limit for values obtained in different studies (Golub et al. 1999, Gburek, Sylwester & Martens 2006). The resolution correction curve plays a two-fold role. First, it allows us to obtain a more realistic width value from a measured quantity like the standard deviation. Second, it is a filter for measurements that are clearly unreliable. Standard deviation measurements smaller than a minimum value equal to the standard deviation of the PSF itself (where the conversion curve crosses the abscissa axis, see LKD06 Figure 4) are considered untrustworthy, and the corresponding width is set to zero (i.e., rejected). Problems resulting from significant errors in the background subtraction can also be identified in this way.
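The width measurement described in this section (linear background between the chosen loop edges, then 4 times the standard deviation of the net profile) can be illustrated with a minimal Python sketch. It is not the LKD06 code: the function names, the clipping of negative residuals to zero, and the demonstration values (a 20-pixel-wide loop on a sloping background) are assumptions made for illustration, and the resolution correction is deliberately left out. The test profile is the chord length through a uniformly filled circular cross-section, for which the standard deviation is exactly one quarter of the diameter in the continuous limit.

```python
import math


def width_from_profile(intensity, left, right):
    """Return 4 * (standard deviation) of a background-subtracted profile.

    intensity   : pixel values (DN) across the straightened loop
    left, right : indices of the pixels chosen as the loop edges
                  (a subjective choice in the interactive procedure)
    """
    span = right - left
    positions = list(range(left, right + 1))
    net = []
    for x in positions:
        # Background: straight line between the two edge pixels.
        slope = (intensity[right] - intensity[left]) / span
        background = intensity[left] + slope * (x - left)
        # Assumption of this sketch: negative residuals are clipped to zero.
        net.append(max(intensity[x] - background, 0.0))
    total = sum(net)
    if total == 0.0:
        return 0.0
    centroid = sum(x * w for x, w in zip(positions, net)) / total
    variance = sum(w * (x - centroid) ** 2 for x, w in zip(positions, net)) / total
    return 4.0 * math.sqrt(variance)


def demo_profile(diameter=20.0, center=30.0, n_pixels=61):
    """Chord-length profile of a uniform circular loop on a sloping background."""
    radius = diameter / 2.0
    profile = []
    for x in range(n_pixels):
        chord = 2.0 * math.sqrt(max(radius ** 2 - (x - center) ** 2, 0.0))
        profile.append(50.0 + 0.5 * x + chord)  # illustrative background values
    return profile


width = width_from_profile(demo_profile(), left=15, right=45)
print(f"recovered width: {width:.2f} pixels (true diameter: 20)")
```

The demonstration loop is much wider than the TRACE loops discussed here so that pixel sampling has little effect; for loops a few pixels wide, the PSF contribution to the standard deviation is no longer negligible and the resolution correction becomes essential.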
It is worth remarking that our approach is quite cautious in that the PSF we have assumed is wider than the most recent estimates (see Gburek, Sylwester & Martens 2006). Some of the measurements we reject as being unresolved may in fact be valid. Figure 2 is a plot of width (asterisks) versus position along the loop shown in Figure 1. The horizontal line corresponds to the average width. It is nearly identical to the mean width of all the loops in the set. The three “zero width” values that lie on the abscissa axis correspond to standard deviation measurements that were below the resolution limit as explained above. To quantify the expansion of loops from footpoint to top we defined expansion factors as follows:

$$ \Gamma_{m/se} = \frac{2 W_m}{W_s + W_e}, \quad \Gamma_{m/s} = \frac{W_m}{W_s}, \quad \mathrm{and} \quad \Gamma_{m/e} = \frac{W_m}{W_e}, \qquad (1) $$

where $W_m$, $W_s$ and $W_e$ are the average widths of portions that cover 15% of the loop length at the middle, start footpoint, and end footpoint, respectively. Start and end refer to the magnetic field line traces used to define the magnetic flux tubes in the extrapolation models. The model flux tubes expand much more than the corresponding observed loops (LKD06). Their expansion factors are 1.5 to 2 times larger. As explained in Section 1, the loop width fluctuates as much as 25% over short distances (see e.g., Figure 2). We tried alternate measures of the loop width (full width at half maximum and equivalent width of the intensity profile), and the same fluctuations are present. Our conclusion is that the fluctuations are most likely due to the influence of the background (see below). Since these fluctuations have a short length scale and vary quasi-randomly around a global trend, they do not significantly affect the measured expansion factors.

3. Synthetic loops

In this study we create a set of synthetic loops with similar characteristics to the TRACE loops studied in LKD06, and we overlay them on real TRACE backgrounds. The axis of the loop is linear and its cross-section is circular.
To analyze the possibility that the apparent constant width is due to a resolution effect we create loops of two kinds: loops with constant diameter along their length, and loops that are wider in the middle than at the footpoints. For the second class of synthetic loops we use an expansion factor that is typical of the model flux tubes obtained in LKD06. We set the diameter of the loop at the midpoint to be twice the diameter at the ends, and we assume that the diameter varies quadratically with position. Since the expansion factor defined in Equation (1) involves averages along 15% sections of the loop, Γm/se = 1.57 rather than 2.0. We have chosen two kinds of background for the synthetic loops. Background I, shown in the top-left panel in Figure 3, corresponds to a typical TRACE loop background: it has similar intensity magnitude and fluctuations, and it contains moss (see Berger et al. 1999, Martens et al. 2000) and other intense features. Background II, shown in the top-right panel, is fainter and fluctuates less than Background I. Although it does not correspond very well to real loop backgrounds, we consider it interesting to study how this kind of background affects the width determination. The average intensities of Backgrounds I and II are approximately 70 and 30 DN (Data Numbers), respectively. For comparison, the typical intrinsic (background subtracted) intensity of observed loops is between 20 and 40 DN. Both background areas have been extracted from a TRACE image in the 171 Å band obtained at 01:45 UT on July 30, 2002. To create a simulated TRACE image containing the synthetic loop we proceed as follows. We create an image of the loop without background. The maximum intensity of the loop (at the axis) is set proportional to the average intensity of the background image on which it will be later superposed. The constant of proportionality is referred to as the intensity factor Φ.
Since the background intensity tends to be higher than the intrinsic loop intensity, Φ is generally smaller than 1. To simulate the finite resolution, we smooth the image of the loop using a Gaussian profile with a full width at half maximum of 2.25 pixels corresponding to the instrument PSF. The resulting loop is then placed on the previously selected background (I or II) from the TRACE image. The images in Figure 3 have been created as described above. Both panels in each row use the same synthetic loop placed in one case on Background I (left) and in the other case on Background II (right). The four loops differ in the following ways. The loop in the second row (panels a and b) has a constant diameter of 4 pixels, corresponding to Γm/se = 1. The loop in the third row (panels c and d) has a diameter that expands from 2.5 pixels at the ends to 5 pixels at the center, corresponding to Γm/se = 1.57. The loop in the fourth row (panels e and f) has a constant diameter of 3 pixels. Finally, the loop in the bottom row (panels g and h) expands from 2 pixels at the ends to 4 pixels in the middle. Notice that the ends of this last loop are narrower than the PSF. The intensity factor Φ has been adjusted so that the resulting loops look similar, by eye, to typical TRACE loops. We used Φ = 0.5 for Background I and Φ = 0.7 for Background II. Considering the average intensities of Backgrounds I and II, this gives intrinsic loop intensities of around 35 DN and 25 DN, respectively. These values are consistent with the intrinsic intensities of observed loops. The photon statistical noise associated with TRACE data is given by $\sqrt{N}$, where N is the number of photon counts per pixel (Handy et al. 1999). Since 1 DN corresponds to 12 photon counts, the photon noise as a percentage of the signal is:

$$ \mathrm{PN}_{\%} = 100\, \frac{\sqrt{12 I}}{12 I} \approx \frac{30}{\sqrt{I}}, \qquad (2) $$

where I is the intensity of the signal in DN/pix. The synthetic loop data constructed here includes the photon noise present in the TRACE image used for the background.
As we now demonstrate, this contribution dominates the noise from the loop itself, so we can safely ignore the loop contribution. Let us first consider the extreme case of low background and loop intensities, namely: Ib = 30 DN and Il = 10 DN. According to Equation (2) the photon noise of the total signal is PN% = $30/\sqrt{40}$, or 4.7%. On the other hand, for our synthetic images (photon noise from the background only) it is PN% = $30\sqrt{30}/40$, or 4.1%, meaning a difference of 0.6%. For a more typical case of Ib = 70 DN and Il = 25 DN, the same percentages are 3.1% and 2.6% respectively, implying a difference of 0.5% of the total signal. These differences are minor and will have a negligible effect on the results of the following sections.

4. Results

4.1. Can we detect expanding loops?

From the set of loop images, we measured the width following the same procedure used in LKD06 for real loops and described in Section 2. We used the conversion curve (Figure 4 in LKD06) to correct for the instrument PSF. The non-linearity of the curve increases the dispersion of the resulting widths at smaller values approaching the width of the PSF. In Figure 4 we plot the “measured” width (asterisks) versus position along the loop for the eight cases in Figure 3. The format of the figures is the same. For comparison, we also plot as continuous lines the actual diameters used to construct the images. It can be seen that, despite the fluctuations, expanding and constant width loops are clearly distinguishable. This is true for loops that are relatively wide (top two rows) and loops that are relatively narrow (bottom two rows). This demonstrates convincingly that, if loops expanded as expected from standard force-free extrapolation models, then it would be noticeable from observations even when they are very close to the resolution limit (last row). Since that is not the case, this may imply that actual magnetic fields have more complexity than is present in the standard models.
We know, for example, that the field is composed of many thin flux strands (elemental kilogauss tubes) that are tangled by photospheric convection. We believe this can explain the symmetry of observed loops with respect to their summit (see discussion in LKD06 and Klimchuk 2006), but whether it can also explain the lack of a general expansion with height is unclear. It is interesting to note from the plots in Figure 4 that the measured width is systematically smaller than the width set in the construction of the loops. This tendency appears in all loops and is very likely due to an underestimation of the real width in the measurement procedure. The procedure requires a subjective selection of the loop edges for the purpose of defining the background and computing the standard deviation. During this step one can miss the faint tail of the cross-axis intensity profile that blends in with the background. We have verified that there is a tendency to define the loop edges to be slightly inside the actual edges. This causes the measured width to be artificially small, both because the tail of the profile is missing from the standard deviation computation and because too strong a background is subtracted from the loop. We expect the effect to be greatest for loops that are especially faint or especially narrow, as discussed below. If this explanation is correct, we can conclude that the TRACE loops studied in LKD06 are actually slightly wider than our measurements seem to indicate. The fact that the measured width is a lower bound for the real width gives further support to our assertion that the analyzed loops are instrumentally resolved. In LKD06, we estimated the width uncertainties associated with background subtraction by repeating each measurement using different choices for the loop edges. We concluded that rule-of-thumb error bars range from 10% below to 20% above the measured best value. It now appears that the actual error bars may be somewhat larger.
However, we stress that this does not impact our ability to distinguish expanding loops from non-expanding loops, as is readily apparent from Figure 4. To quantify this claim, we computed the expansion factors (Γm/se in Equation 1) of all the synthetic loops shown in Figure 4. These are listed in Table 1. The upper and lower limits that define the error bars are the expansion factors Γm/s and Γm/e. For comparison, in the case of the observed loop of Figures 1 and 2, the expansion factor computed in the same way is 1.03 ± 0.04. The values given in Table 1 clearly confirm our conclusion that loops with constant and expanding cross section can be easily distinguished. It is interesting to note that the expansion factors for the same loops placed on different backgrounds can be notably different. The same is true for loops of the same kind (expanding or not) but with different characteristic size (wide or narrow). Compare row 1 with row 3, and row 2 with row 4 in Figure 4 and the table. The error bars are also different in all cases. Part of these differences may be due to the subjective part of the analysis procedure (the selection of the loop edges). However, repeating the width measurements we obtain approximately the same expansion factors. Therefore, the distribution of the background emission and the characteristic size of the loop both play a role in determining the precise value of the expansion factors and the error bars. In particular, Backgrounds I and II tend to give an underestimation and an overestimation of the expansion factors, respectively (compared to the values set during the loop construction). Nevertheless, we want to stress that the measured expansion factors of the expanding and non-expanding synthetic loops are clearly clustered around the actual values, implying that loops with constant and expanding cross section are readily distinguishable.
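As an illustration of how the expansion factors of Equation (1) are evaluated, the following Python sketch averages a width array over 15% sections at the start, middle and end of a loop. The function names and the sampling of 200 positions are assumptions of this sketch, not part of the LKD06 analysis code, and rejected (zero-width) measurements are not treated specially here. Applied to a diameter that grows quadratically from 2.5 pixels at the ends to 5 pixels at the midpoint, it gives a value close to the Γm/se = 1.57 quoted in Section 3, rather than 2.0.

```python
def section_mean(widths, start_frac, end_frac):
    """Mean width over the fraction [start_frac, end_frac) of the loop length."""
    n = len(widths)
    lo = int(round(start_frac * n))
    hi = int(round(end_frac * n))
    segment = widths[lo:hi]
    return sum(segment) / len(segment)


def expansion_factors(widths, frac=0.15):
    """Return (Gamma_m/se, Gamma_m/s, Gamma_m/e) as defined in Equation (1)."""
    w_s = section_mean(widths, 0.0, frac)                           # start footpoint
    w_e = section_mean(widths, 1.0 - frac, 1.0)                     # end footpoint
    w_m = section_mean(widths, 0.5 - frac / 2.0, 0.5 + frac / 2.0)  # middle
    return 2.0 * w_m / (w_s + w_e), w_m / w_s, w_m / w_e


def quadratic_widths(n, end_diameter, mid_diameter):
    """Diameter varying quadratically from end_diameter to mid_diameter and back."""
    widths = []
    for i in range(n):
        u = 2.0 * (i + 0.5) / n - 1.0  # -1 at the start, 0 at the middle, +1 at the end
        widths.append(mid_diameter - (mid_diameter - end_diameter) * u * u)
    return widths


gamma_se, gamma_s, gamma_e = expansion_factors(quadratic_widths(200, 2.5, 5.0))
print(f"expanding loop: Gamma_m/se = {gamma_se:.2f}")

gamma_const = expansion_factors([4.0] * 200)[0]
print(f"constant loop:  Gamma_m/se = {gamma_const:.2f}")
```

For measured widths, Γm/s and Γm/e generally differ from each other, and it is this pair of values that defines the error bars listed in Table 1.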
Next, we study how the observed loop expansion is affected by the relative intensity of the loop compared to the background. To test this, we created synthetic data in the way described in Section 3, for different values of the loop-to-background intensity ratio Φ. In Figure 5 we plot the expansion factor Γm/se (Equation 1) versus Φ for 4 narrow synthetic loops with similar characteristics to those shown in panels e) to h) of Figures 3 and 4. The difference is that the loops of Figure 5 are 300 pixels long, instead of 200 pixels. The definition of Γ is not affected by the change of length. On the other hand, longer loops provide more measurements and better statistics for studying how loop expansion depends on the loop-to-background intensity ratio. We chose thin loops for Figure 5 because their expansion is more challenging to measure and they are more affected by the background. If the intensity ratio Φ is too small, it is difficult to detect a loop above the background, much less measure its width. Our previous studies of observed loops have therefore avoided such cases. We subjectively define a lower limit for loop visibility of around Φ = 0.3. Below that, the width determination is unreliable. The upper value Φ = 1.5 is extreme for most TRACE loops, but it is interesting for analysis and may be appropriate to other datasets. The intermediate Φ values are 0.5, 0.7 and 1.0. Figure 5 provides strong additional support for our claim that the expansion factors of expanding and non-expanding loops can be clearly distinguished, even for the most critical cases of very low intensities and narrow widths. In no case do the error bars of expanding and non-expanding loops overlap. It is interesting to note that the synthetic loops used for Figure 5 overlap with more of the background image than do the shorter synthetic loops used for Figures 3 and 4. The footpoint and middle sections therefore combine with different portions of the background. 
Since the expansion factors are qualitatively similar in the corresponding cases, we can be confident that our results are not an artifact of the particular loop-background combinations. So far we have not considered loops that are completely below the resolution limit. In Figure 6 we show two cases of unresolved synthetic loops. Both have a constant diameter of 0.5 pix, and both use Background I (Figure 3). The loops differ only in the intensity ratio Φ, which is set to 1 for case (a) and 3 for case (b). Note, however, that because the loops occupy only a fraction of a pixel, the “observed” intensity ratios are much smaller: around 0.25 for the Φ=1 loop and 0.5 for the Φ=3 loop. Figure 7 shows the widths of the loops as measured in the usual way, including correction for instrument resolution. A majority of the measurements are equal to zero, meaning that the computed standard deviation is below that of the PSF. This is especially true for the fainter loop of case (a). We can understand this behavior as follows. Due to the influence of the variable background, we expect some measurements to be too large and others to be too small. However, because of the systematic effects associated with loop edge selection, discussed above, we expect more of the measurements to be too small. The conversion from standard deviation to width is very sensitive at small values, where the conversion curve is nonlinear, and it only takes small errors in the standard deviation to produce a zero width value. The solid line in Figure 7 indicates the actual loop width of 0.5 pix, while the dashed line indicates the full width at half maximum of the PSF. The most important conclusion to draw from the figure is that our measurement technique can easily detect when loops are unresolved, i.e., when they are thinner than the PSF. As we stated before, the loops analyzed in LKD06 and previous works are all wider than the PSF (see also Section 5 below).
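The sensitivity of the resolution correction for nearly unresolved loops can be illustrated with an idealized calculation. The sketch below is not the LKD06 conversion curve (which is not reproduced here); it assumes an untruncated profile, a Gaussian PSF with a FWHM of 2.25 pixels, and a uniformly filled circular cross-section, in which case the variances of the loop profile and of the PSF simply add under convolution. The function names are hypothetical.

```python
import math

FWHM_PSF = 2.25  # pixels, the PSF width assumed in the text
SIGMA_PSF = FWHM_PSF / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def observed_sigma(diameter):
    """Standard deviation of the smoothed profile of a loop with this diameter.

    A uniform circular cross-section has sigma = diameter / 4; convolution
    with the Gaussian PSF adds the two variances.
    """
    sigma_loop = diameter / 4.0
    return math.sqrt(sigma_loop ** 2 + SIGMA_PSF ** 2)


def width_from_sigma(sigma_measured):
    """Idealized resolution correction: invert observed_sigma.

    Measurements at or below the PSF standard deviation are rejected
    (width set to zero), as in the procedure described in Section 2.
    """
    if sigma_measured <= SIGMA_PSF:
        return 0.0
    return 4.0 * math.sqrt(sigma_measured ** 2 - SIGMA_PSF ** 2)


for diameter in (0.5, 2.0, 4.0):
    sigma = observed_sigma(diameter)
    low = width_from_sigma(0.99 * sigma)  # effect of a 1% underestimate of sigma
    print(f"true width {diameter:3.1f} pix: sigma/sigma_PSF = {sigma / SIGMA_PSF:.3f}, "
          f"width after a 1% underestimate = {low:.2f} pix")
```

Under these assumptions a 0.5 pixel loop raises the measured standard deviation less than 1% above that of the PSF, so even a 1% underestimate is enough to produce a zero-width (rejected) measurement, whereas a 4 pixel loop is only mildly affected.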
Finally, in Figure 8 we plot width versus position along a synthetic loop with a footpoint width of 2.5 pixels and a model expansion factor of Γm/se = 2.2. Our measurement procedure tracks the loop expansion very well. The expansion factor computed from the observed width as in Table 1 gives Γm/se = 2.1 ± 0.2. Therefore, the loop can be readily distinguished from the Γm/se = 1.57 loop having the same footpoint size in Figure 4, panel (d).

4.2. Short length scale width fluctuations

As discussed in Section 1, the measured widths of observed loops fluctuate as much as 25% over short length scales. It is important to know whether these variations are real or an artifact of the background. Comparison of panels (a) and (b) in Figure 4 with Figure 2 shows that synthetic and observed loops with similar characteristic width exhibit similar width fluctuations. For the observed loop of Figure 2, the amplitude of the fluctuations computed as the ratio of the standard deviation of the measured width to its average is 18%. The corresponding ratios for the synthetic loops of panels (a) and (b) in Figure 4 are 17% and 25%, respectively. This suggests that the fluctuations are not real and argues against loops being composed of a small number of braided strands (the possibility that they are bundles of many tangled strands is not affected). This is not a firm conclusion, however, since the fluctuations are somewhat more coherent for the observed loop than for the synthetic loop. We return to this issue below. Narrower loops (e.g., panels (e) and (f) in Figure 4) show larger amplitude fluctuations (21% and 38%, respectively) mostly because of the non-linearity of the resolution correction curve (LKD06 Figure 4), which exaggerates differences at smaller widths. To study how the background fluctuations affect the width determination, we analyze the relationships between the width and the loop and background intensities.
In Figure 9 we plot as a function of position along the loop: the intensity of the background pixels at either side of the loop (from which the loop background is linearly interpolated; continuous lines), the loop width (dotted), the loop intensity (maximum intensity of the background-subtracted profile; dashed), and the absolute value of the difference between the two background pixel intensities (dot-dashed). The loop width is given in pixels and multiplied by 10 for easier comparison with the intensities. The upper panel corresponds to the observed loop example of Figures 1 and 2, and the lower panel corresponds to the synthetic loop of Figures 3 and 4, panel (a). Figure 9 shows that our synthetic loop data share the main qualitative characteristics of real loops. The fluctuations of the background intensity and its difference at the sides of the loop, and the loop intrinsic intensity and its fluctuations, are similar in both cases. There are obvious differences due to the spatial structure unique to each background that can easily be identified in the images. For example, the bumps between 30 and 60 for the observed loop, and between positions 0 and 40 and between 90 and 130 for the synthetic loop, can be traced to patches of enhanced emission in Figures 1 and 3. Another difference is the global variation in the intensity of the observed and synthetic loops. The measured intensity of the synthetic loop is nearly constant because the loop was constructed that way (small fluctuations come entirely from imperfect background subtraction). The measured intensity of the observed loop, on the other hand, tends to diminish systematically toward the right end. This is likely to be real and not an artifact of the background subtraction. Despite these expected differences, the comparison shows that the synthetic loops reproduce the main properties of the observed cases.
We have suggested that small-scale fluctuations of the measured intensities and widths of loops are due to imperfect background subtraction. To further assess this, we look for statistical correlations between these quantities. In the upper panels of Figure 10 we plot width versus intensity for all positions along the observed and synthetic loops, respectively. We find that there is a small direct correlation between the width and intensity in both cases: wider sections of the loops tend to be brighter. The lines in the scatter plots are least-squares fits, which have the indicated slopes and intercepts. The correlation between width and intensity can be explained by the tendency, during the interactive analysis procedure, to miss the wings of the intensity profile and define the loop edges to be inside the true edges. As described earlier, this causes an overestimation of the background intensity and produces artificially narrow loop widths and artificially faint loop intensities. We expect the magnitude of this effect to vary depending on the brightness of the background relative to the loop. It will be stronger (i.e., the underestimates of width and intensity will be greater) when the background is relatively bright. This is confirmed in the second row of Figure 10. It shows an inverse correlation between the measured width and the background-to-loop intensity ratio for both the observed and synthetic loops. The background intensity used here is the average of the sloping background subtracted during the analysis (i.e., the average of the values on either side of the loop). Notice also that for the synthetic loop, the measured width tends to be smaller than the model width (4 pixels) when the relative intensity of the background is larger. This effect is almost certainly responsible for the width-intensity correlation of the synthetic loop and seems a likely explanation for the observed loop, as well.
Whether it is strong enough to allow the possibility that loops are bundles of a few (3-5) intertwined strands is unclear. Recall that such loops would exhibit an inverse correlation between width and intensity if the measurements were perfect. Are measurement errors large enough to negate this inverse correlation and produce a small direct correlation, as observed? Only more involved modeling can answer this question. It seems plausible that cross-loop gradients in the background could also have an effect on the measured width. Certainly small scale inhomogeneities are more difficult to subtract than a flat background. In the upper panels of Figure 11, we plot width versus the absolute value of the background intensity difference on the two sides of the loop. No correlation is apparent for either the observed or synthetic loops. We confirmed a lack of correlation using a non-parametric statistical analysis. We also find no correlation between the intrinsic loop intensity and the background intensity difference. The right bottom panel of Figure 11 indicates how the known error in width measurement for the synthetic loop correlates with the background intensity gradient. The ordinate is the absolute value of the difference between the measured width and the width used during the loop construction. The abscissa is the absolute value of the background intensity difference on the two sides, normalized by the loop intensity. The normalization is meant to compensate for the fact that background gradients should have a lesser impact on bright loops. The left bottom panel of Figure 11 is a corresponding scatter plot for the observed loop. Since the actual width is not known, the ordinate is replaced by the absolute value of the deviation of the measured width from its mean. In neither case is there a correlation, as confirmed by statistical analysis.
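The text does not state which non-parametric test was applied. As one common choice, the Spearman rank correlation can be used to check for a monotonic relation between two measured quantities, such as width and background intensity difference. The sketch below is a plain-Python illustration under that assumption; the function names and the sample values are hypothetical and are not data from this study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (not measurements from the paper).
widths = [4.1, 3.6, 4.4, 3.9, 3.2, 4.0, 3.7, 4.3]
background_difference = [5.0, 12.0, 3.0, 9.0, 7.0, 15.0, 2.0, 11.0]
rho = spearman(widths, background_difference)
print(f"Spearman rank correlation: {rho:+.2f}")
```

A coefficient near zero indicates no monotonic relation; assessing its significance for a given sample size requires an additional test that is not included in this sketch.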
We conclude that the magnitude of the background has a bigger effect on the width measurements than does the difference in the background on the two sides of the loop (the cross-loop gradient).

5. Discussion and conclusion

In this paper we study the effect of the background and the instrument PSF on the determination of the apparent width of EUV coronal loops observed by TRACE. Our main motivation is to extend the results obtained in our previous work: López Fuentes, Klimchuk & Démoulin (2006; LKD06). There, we compared a set of observed TRACE loops with corresponding force-free model flux tubes, and we found that observed loops do not expand with height as expected from the extrapolation model. Here, we construct artificial loops with expansion factors similar to those of the studied loops and the model flux tubes, and we overlay them on real TRACE backgrounds. We repeat on these synthetic loops the same procedure followed in LKD06, and compare the results with real loops. We find that even for loops close to the resolution limit the procedure followed in LKD06 discerns expanding and non-expanding cross-sections. The method includes a resolution correction that identifies measurements that are below the resolution limit and therefore unreliable (see explanation in Section 2). We used a Gaussian Point Spread Function (PSF) for the instrument with a FWHM of 2.25 pixels, which is an upper bound for values found by different authors (Golub et al. 1999, Gburek et al. 2006). In a recent paper, DeForest (2007) has proposed the interesting idea that most thin individual loops observed by TRACE are actually extremely bright structures well under the resolution limit. In this scenario, the loop apparent width would be given by the instrument PSF. In this way, loops may actually expand, but their size both at the top and the footpoints would be unresolved and would appear the same.
The motivations for this conjecture are the apparent constant width of loops, and the observation that TRACE loops have an intensity scale height that is considerably larger than expected for static equilibrium (Winebarger et al. 2003, Aschwanden et al. 2001) or steady flow (Patsourakos & Klimchuk 2004). More precisely, for expanding loops that are everywhere unresolved, the density gradient present in the corona is larger than inferred from the observations under the assumption of constant cross section. According to the above explanation, we should expect all individual TRACE loops to have a true width less than that of the PSF and an apparent width roughly equal to that of the PSF. However, observations do not support this. The mean width of the loops studied in LKD06 is 4.2 pix after correction for the instrument resolution (see also Watko & Klimchuk 2000). As shown in Figure 7 and discussed in Section 4.1, our method can easily identify loops that are intrinsically narrower than the PSF. The loops selected for our studies are clearly not of this type. As we discussed in Section 1, coronal structures that are many times wider than the TRACE PSF are observed to be formed by thinner individual loops. Therefore, there is an intermediate range of widths (let us say between one and three PSF widths) for which the profiles produced by unresolved threads could overlap to form apparently wider loops. This, together with the effect of a fluctuating and intense background, are the arguments provided by DeForest (2007) to explain loops wider than the PSF. A key point in this discussion is that unresolved neighboring threads might be expected to separate from each other with height for the same reasons that individual strands might be expected to expand with height (e.g., if the field behaves as simple force-free extrapolation models predict). In this respect, a structure formed by diverging threads does not differ from the expanding loops studied in Section 3.
As we discussed there, the plots in Figures 4 and 5 and the Γ factors given in Table 1 show clearly that our procedure for the width determination would be able to detect the expansion if it existed, even for loops near the resolution limit. It is interesting to compare the synthetic images in Figure 2 in DeForest’s article with our Figure 3. There, he claims that synthetic loops made from a single unresolved thread of constant width and from two diverging threads are indistinguishable from each other and from actual TRACE loops. In our Figure 3 it is also very difficult, by eye, to determine which loops have expanding widths or constant widths. However, the plots in Figure 4 show that a careful examination through a quantitative measurement provides the answer. One of DeForest’s main arguments is that it is difficult to measure the width of features that are at or near the instrument resolution due to effects such as the smearing from the telescope, pixelation from the detector, and the presence of background emission. We agree, but these claims need to be quantified. It is not sufficient to make eyeball comparisons of features. Quantitative measures must be used. We have adopted the standard deviation of the loop’s cross-axis intensity profile as one such measure. We have been very careful in our work to indicate when the measurements are reliable and when they are not. Measured widths that are very close to the instrument resolution have very large error bars that we show (see LKD06) and that we take into account. We have paid particularly careful attention to the effects of the combined PSF, which accounts for both smearing and pixelation. DeForest is correct that measurements of very thin features depend critically on the PSF. We have therefore adopted a conservative value for the PSF width that is greater than the estimates determined by the instrument teams and others.
Furthermore, features as narrow as our assumed PSF are routinely observed, which would not be possible if the actual PSF were wider. DeForest is also correct that background emission can be important and may lead to spurious results. It is therefore vital to subtract the background before making measurements, as we have done. In LKD06, we have avoided loops where the background is especially bright or complicated. We attempted in our earlier studies to estimate the uncertainties associated with imperfect background subtraction, but this was not as careful as our treatment of resolution effects. The main purpose of this paper was to rigorously evaluate the effects of background on the measurement of loop widths. Regarding the importance of quantitative measurements versus visual inspection, we concur with DeForest that the visual determination of the edge of a feature is subjective and largely based on the intensity gradient across the feature. This can lead to erroneous conclusions about width variations if there is a systematic variation of intensity along the feature, such as decreasing intensity with height. Our quantitative measure of width, based on the standard deviation of the intensity profile, by construction moderates such bias. The positive correlation found between the loop width and the maximum intensity (top panels in Figure 10) could be a remnant of this effect or an intrinsic property of the loops. DeForest correctly points out that, with optically-thin coronal emission, the observed intensity scale height of a hydrostatic structure is larger for an expanding loop than for a constant cross section loop, especially if the loop is unresolved. In fact, for the 1-2 MK model examples he shows (Figures 5 and 6), the intensity actually increases with height by a factor of 2-3 to a maximum brightness at altitudes near 7 × 10^9 cm. Whether actual TRACE loops have this property is unknown and should be investigated.
The variation of temperature with height combined with the transmission properties of the filter used will complicate the interpretation. We note that the observation of super-hydrostatic scale heights is different from the observation of excess densities in TRACE loops. For most TRACE loops, the density inferred from the observed emission measure and diameter is much larger than that expected from static equilibrium theory, given the observed temperature and loop length (Aschwanden et al. 2001, Winebarger et al. 2003). DeForest's idea of unresolved loops would make this discrepancy even worse, since a higher density is required to produce the same emission measure from a smaller volume. The loops identified and measured by DeForest are qualitatively very different from the loops identified and measured in our studies. We chose cases that are not obviously composed of a few resolved or quasi-resolved strands (although we believe that our loops may be composed of large numbers of elemental strands that are far below the resolution limit). The only one of his loops with no apparent internal structure (Loop 6 in his Figure 8) would not have been selected by us, because it is barely discernible above the background. On the other hand, some of the thinner structures within DeForest's loop bundles (e.g., at the bottom edge of his Loop 3) are not unlike the loops we have investigated. In this regard, we must clarify a comment attributed to one of us at the end of Section 5 in his paper. We suggest that researchers seeking to study monolithic-looking loops will tend to select cases that are only a few resolution elements across. Significantly wider loops (e.g., all except Loop 6 in DeForest's sample) usually show evidence of internal structure and will be rejected. We agree fully with DeForest that collections of loops (loop bundles) expand appreciably with height.
However, we stand by our claim that individual loops that are clearly discernible within a bundle have a much more uniform width. This is not an artifact of the resolution. A hare and hounds exercise, as currently planned, is one useful way to clarify any remaining differences of opinion. An important topic of the present study has been the analysis of how the properties of the background affect the loop width determination. We searched for correlations between the width and: the loop intrinsic intensity, the ratio of background intensity to loop intensity, and the absolute value of the background difference. The background intensity is computed as the average of the pixels on both sides of the loop, which is used for the estimation of the actual loop background, while the background difference is the difference between those pixels. We found a direct correlation between the width and the maximum intensity of the loop profile (see Figure 10, upper panels). This is probably due to the fact that we tend to miss the "tails" of the loop profile at positions where the loop is less intense with respect to the background, and therefore the measured profile tends to be narrower. This is confirmed by the inverse correlation found between the width and the ratio of background intensity to loop intensity (see Figure 10, bottom panels). It can be seen from the plots that the width tends to be abnormally narrow, and the points more dispersed, for larger background to loop intensity ratios. We found no evidence of correlation between the width and the background difference. This shows that the background gradients are less important in the determination of the width than the background relative intensity.
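The background quantities used in these correlations can be sketched as follows. This is a minimal illustration that assumes a single cross-axis intensity profile and two hypothetical indices marking the pixels on either side of the loop. Taking the loop intensity as the peak value minus the estimated background is also an assumption made here for illustration.

```python
import numpy as np

def background_stats(profile, left_idx, right_idx):
    """Return (background, background difference, background-to-loop ratio).

    profile   : cross-axis intensity profile (1-D sequence)
    left_idx  : index of the background pixel on one side of the loop
    right_idx : index of the background pixel on the other side
    """
    profile = np.asarray(profile, dtype=float)
    left, right = profile[left_idx], profile[right_idx]
    background = 0.5 * (left + right)      # average of the two side pixels
    difference = abs(right - left)         # absolute background difference
    # Assumed definition: loop intensity = peak above the estimated background
    loop_intensity = profile[left_idx:right_idx + 1].max() - background
    if loop_intensity <= 0:
        raise ValueError("no loop signal above the estimated background")
    return background, difference, background / loop_intensity

# Usage with a toy profile (arbitrary intensity units)
bkg, diff, ratio = background_stats([10, 12, 20, 30, 20, 12, 14], 0, 6)
```

A large ratio marks positions where the loop is faint relative to its surroundings, which is where the text reports abnormally narrow and more scattered width measurements.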
We stress, however, that this does not affect our ability to determine the global expansion properties of loops, and that despite the background contribution we are readily able to distinguish constant width loops from loops that expand as predicted from simple force-free magnetic models. The results presented here are extremely intriguing and provide clues and new questions to guide future investigations. However, it is expected that definitive answers will come from improved observations using new generations of solar instruments with higher resolution.

We acknowledge the Transition Region and Coronal Explorer (TRACE) team. We wish to thank Craig DeForest for fruitful discussions about the nature of observed loops. We also thank our anonymous referee for his/her valuable suggestions and comments. The authors acknowledge financial support from CNRS (France) and CONICET (Argentina) through their cooperative science program (No. 20326). MLF thanks the Secretary of Science and Technology of Argentina, through its RAICES program, for travel support. This work was partially funded by NASA and the Office of Naval Research.

REFERENCES
Aschwanden, M. J., Nightingale, R. W., & Boerner, P. 2007, ApJ, 656, 577
Aschwanden, M. J., Schrijver, C. J., & Alexander, D. 2001, ApJ, 550, 1036
Berger, T. E., De Pontieu, B., Fletcher, L., Schrijver, C. J., Tarbell, T. D., & Title, A. M. 1999, Sol. Phys., 190, 409
DeForest, C. E. 2007, ApJ, 661, 532
Gburek, S., Sylwester, J., & Martens, P. 2006, Sol. Phys., 239, 531
Golub, L., et al. 1999, Physics of Plasmas, 6, 2205
Handy, B. N., et al. 1999, Sol. Phys., 187, 229
Klimchuk, J. A. 2000, Sol. Phys., 193, 53
Klimchuk, J. A., Antiochos, S. K., & Norton, D. 2000, ApJ, 542, 504
Klimchuk, J. A. 2006, Sol. Phys., 234, 41
Klimchuk, J. A., Lemen, J. R., Feldman, U., Tsuneta, S., & Uchida, Y. 1992, PASJ, 44,
López Fuentes, M. C., Klimchuk, J. A., & Démoulin, P. 2006, ApJ, 639, 459 (LKD06)
Martens, P. C. H., Kankelborg, C. C., & Berger, T. E. 2000, ApJ, 537, 471
Patsourakos, S. & Klimchuk, J. A. 2004, ApJ, 603, 322
Scherrer, P. H. et al. 1995, Sol. Phys., 162, 129
Schmelz, J. T., & Martens, P. C. H. 2006, ApJ, 636, L49
Watko, J. A. & Klimchuk, J. A. 2000, Sol. Phys., 193, 77
Winebarger, A. R., Warren, H. P., & Mariska, J. T. 2003, ApJ, 587, 439

Table 1: Expansion factors Γm/se (Equation 1) for the synthetic loops shown in Figures 3 and 4 (see detailed explanation in Section 4.1).

Synthetic loop                Imposed   Background I   Background II
Const. width (4 pix)          1         0.85 ± 0.02    0.95 ± 0.04
Variable width (2.5-5 pix)    1.57      1.38 ± 0.25    1.76 ± 0.01
Const. width (3 pix)          1         0.82 ± 0.05    1.03 ± 0.03
Variable width (2-4 pix)      1.57      1.59 ± 0.28    2.11 ± 0.20

Fig. 1. The loop shown is an example of the TRACE loops studied in LKD06. The lower panel shows the straightened version of the loop that is used for the width determination.

Fig. 2. Measured width versus position along the loop of Figure 1. Background subtraction and PSF correction have been applied, as described in Section 2. The horizontal line shows the mean value.

Fig. 3. Synthetic data created by superposing loops with different specified properties on real TRACE backgrounds. The top panels show the background used in each column. The ends (footpoints) and middle of the loops have an imposed diameter (in pixels) of: (a,b) 4-4, (c,d) 2.5-5, (e,f) 3-3, (g,h) 2-4. This provides both constant and expanding (by a factor 2) synthetic loops close to the spatial resolution. For a detailed description of the panels see Section 3.

Fig. 4. Measured width corrected for instrument resolution (asterisks) versus position along the synthetic loops shown in panels a) to h) of Figure 3. Continuous lines indicate the model width.

Fig. 5. Expansion factor Γm/se versus loop-to-background intensity ratio Φ, for expanding and non-expanding synthetic narrow loops on backgrounds I and II (similar to loops in panels e-h of Figure 4). The error bars are defined by the expansion factors Γm/s and Γm/e (see Section 4.1). The horizontal line indicates the expansion factor of the model.

Fig. 6. Two examples of unresolved synthetic loops constructed with a constant width of 0.5 pixels. The loops differ only in the loop-to-background intensity ratio Φ, which is set to 1 for case (a) and 3 for case (b) (see Section 4.1).

Fig. 7. Width measurements for the two synthetic loops in Figure 6, corrected for the instrument resolution. The solid line indicates the actual model loop width, and the dashed line indicates the PSF full width at half maximum.

Fig. 8. Width versus position along a synthetic loop with a model expansion factor of 2.2. The width has been corrected for instrument resolution.

Fig. 9. Different loop and background properties versus position along the loop. Top panel: example loop from Figure 1; bottom panel: synthetic loop from Figure 3, panel a). For a detailed description see Section 4.2.

Fig. 10. Scatter plots of measured quantities for the observed loop in Figure 1 (left column) and for the synthetic loop of Figures 3 and 4, panels (a) (right column). Top: width versus on-axis loop intensity. Bottom: width versus background intensity to loop intensity ratio (see Section 4.2). Continuous lines correspond to least-squares fits of the data. Slopes and intercepts are given in the respective panels.

Fig. 11. Scatter plots of measured quantities for the observed loop in Figure 1 (left column) and for the synthetic loop of Figures 3 and 4, panels (a) (right column). Top: width versus absolute value of the background intensity difference across the loop. Bottom left: width deviation from the mean versus the background intensity difference normalized by the loop intensity (see Section 4.2). Bottom right: same kind of plot, but the deviation is relative to the model width.

Introduction Observed TRACE loops Synthetic loops Results Can we detect expanding loops? Short length scale width fluctuations Discussion and conclusion

ABSTRACT We study the effect of the coronal background in the determination of the diameter of EUV loops, and we analyze the suitability of the procedure followed in a previous paper (López Fuentes, Klimchuk & Démoulin 2006) for characterizing their expansion properties. For the analysis we create different synthetic loops and we place them on real backgrounds from data obtained with the Transition Region and Coronal Explorer (TRACE). We apply to these loops the same procedure followed in our previous works, and we compare the results with real loop observations. We demonstrate that the procedure allows us to distinguish constant width loops from loops that expand appreciably with height, as predicted by simple force-free field models. This holds even for loops near the resolution limit. The procedure can easily determine when loops are below the resolution limit and therefore not reliably measured. We find that small-scale variations in the measured loop width are likely due to imperfections in the background subtraction. The greatest errors occur in especially narrow loops and in places where the background is especially bright relative to the loop. We stress, however, that these effects do not impact the ability to measure large-scale variations. The result that observed loops do not expand systematically with height is robust. <|endoftext|><|startoftext|> Polarizations of J/ψ and ψ(2S) Mesons Produced in pp̄ Collisions at √s = 1.96 TeV A. Abulencia,24 J. Adelman,13 T. Affolder,10 T. Akimoto,55 M.G. Albrow,17 S. Amerio,43 D. Amidei,35 A. Anastassov,52 K. Anikeev,17 A. Annovi,19 J.
Antos,14 M. Aoki,55 G. Apollinari,17 T. Arisawa,57 A. Artikov,15 W. Ashmanskas,17 A. Attal,3 A. Aurisano,42 F. Azfar,42 P. Azzi-Bacchetta,43 P. Azzurri,46 N. Bacchetta,43 W. Badgett,17 A. Barbaro-Galtieri,29 V.E. Barnes,48 B.A. Barnett,25 S. Baroiant,7 V. Bartsch,31 G. Bauer,33 P.-H. Beauchemin,34 F. Bedeschi,46 S. Behari,25 G. Bellettini,46 J. Bellinger,59 A. Belloni,33 D. Benjamin,16 A. Beretvas,17 J. Beringer,29 T. Berry,30 A. Bhatti,50 M. Binkley,17 D. Bisello,43 I. Bizjak,31 R.E. Blair,2 C. Blocker,6 B. Blumenfeld,25 A. Bocci,16 A. Bodek,49 V. Boisvert,49 G. Bolla,48 A. Bolshov,33 D. Bortoletto,48 J. Boudreau,47 A. Boveia,10 B. Brau,10 L. Brigliadori,5 C. Bromberg,36 E. Brubaker,13 J. Budagov,15 H.S. Budd,49 S. Budd,24 K. Burkett,17 G. Busetto,43 P. Bussey,21 A. Buzatu,34 K. L. Byrum,2 S. Cabreraq,16 M. Campanelli,20 M. Campbell,35 F. Canelli,17 A. Canepa,45 S. Carilloi,18 D. Carlsmith,59 R. Carosi,46 S. Carron,34 B. Casal,11 M. Casarsa,54 A. Castro,5 P. Catastini,46 D. Cauz,54 M. Cavalli-Sforza,3 A. Cerri,29 L. Cerritom,31 S.H. Chang,28 Y.C. Chen,1 M. Chertok,7 G. Chiarelli,46 G. Chlachidze,17 F. Chlebana,17 I. Cho,28 K. Cho,28 D. Chokheli,15 J.P. Chou,22 G. Choudalakis,33 S.H. Chuang,52 K. Chung,12 W.H. Chung,59 Y.S. Chung,49 M. Cilijak,46 C.I. Ciobanu,24 M.A. Ciocci,46 A. Clark,20 D. Clark,6 M. Coca,16 G. Compostella,43 M.E. Convery,50 J. Conway,7 B. Cooper,31 K. Copic,35 M. Cordelli,19 G. Cortiana,43 F. Crescioli,46 C. Cuenca Almenarq,7 J. Cuevasl,11 R. Culbertson,17 J.C. Cully,35 S. DaRonco,43 M. Datta,17 S. D’Auria,21 T. Davies,21 D. Dagenhart,17 P. de Barbaro,49 S. De Cecco,51 A. Deisher,29 G. De Lentdeckerc,49 G. De Lorenzo,3 M. Dell’Orso,46 F. Delli Paoli,43 L. Demortier,50 J. Deng,16 M. Deninno,5 D. De Pedis,51 P.F. Derwent,17 G.P. Di Giovanni,44 C. Dionisi,51 B. Di Ruzza,54 J.R. Dittmann,4 M. D’Onofrio,3 C. Dörr,26 S. Donati,46 P. Dong,8 J. Donini,43 T. Dorigo,43 S. Dube,52 J. Efron,39 R. Erbacher,7 D. Errede,24 S. Errede,24 R. Eusebi,17 H.C. 
Fang,29 S. Farrington,30 I. Fedorko,46 W.T. Fedorko,13 R.G. Feild,60 M. Feindt,26 J.P. Fernandez,32 R. Field,18 G. Flanagan,48 R. Forrest,7 S. Forrester,7 M. Franklin,22 J.C. Freeman,29 I. Furic,13 M. Gallinaro,50 J. Galyardt,12 J.E. Garcia,46 F. Garberson,10 A.F. Garfinkel,48 C. Gay,60 H. Gerberich,24 D. Gerdes,35 S. Giagu,51 P. Giannetti,46 K. Gibson,47 J.L. Gimmell,49 C. Ginsburg,17 N. Giokarisa,15 M. Giordani,54 P. Giromini,19 M. Giunta,46 G. Giurgiu,25 V. Glagolev,15 D. Glenzinski,17 M. Gold,37 N. Goldschmidt,18 J. Goldsteinb,42 A. Golossanov,17 G. Gomez,11 G. Gomez-Ceballos,33 M. Goncharov,53 O. González,32 I. Gorelov,37 A.T. Goshaw,16 K. Goulianos,50 A. Gresele,43 S. Grinstein,22 C. Grosso-Pilcher,13 R.C. Group,17 U. Grundler,24 J. Guimaraes da Costa,22 Z. Gunay-Unalan,36 C. Haber,29 K. Hahn,33 S.R. Hahn,17 E. Halkiadakis,52 A. Hamilton,20 B.-Y. Han,49 J.Y. Han,49 R. Handler,59 F. Happacher,19 K. Hara,55 D. Hare,52 M. Hare,56 S. Harper,42 R.F. Harr,58 R.M. Harris,17 M. Hartz,47 K. Hatakeyama,50 J. Hauser,8 C. Hays,42 M. Heck,26 A. Heijboer,45 B. Heinemann,29 J. Heinrich,45 C. Henderson,33 M. Herndon,59 J. Heuser,26 D. Hidas,16 C.S. Hillb,10 D. Hirschbuehl,26 A. Hocker,17 A. Holloway,22 S. Hou,1 M. Houlden,30 S.-C. Hsu,9 B.T. Huffman,42 R.E. Hughes,39 U. Husemann,60 J. Huston,36 J. Incandela,10 G. Introzzi,46 M. Iori,51 A. Ivanov,7 B. Iyutin,33 E. James,17 D. Jang,52 B. Jayatilaka,16 D. Jeans,51 E.J. Jeon,28 S. Jindariani,18 W. Johnson,7 M. Jones,48 K.K. Joo,28 S.Y. Jun,12 J.E. Jung,28 T.R. Junk,24 T. Kamon,53 P.E. Karchin,58 Y. Kato,41 Y. Kemp,26 R. Kephart,17 U. Kerzel,26 V. Khotilovich,53 B. Kilminster,39 D.H. Kim,28 H.S. Kim,28 J.E. Kim,28 M.J. Kim,17 S.B. Kim,28 S.H. Kim,55 Y.K. Kim,13 N. Kimura,55 L. Kirsch,6 S. Klimenko,18 M. Klute,33 B. Knuteson,33 B.R. Ko,16 K. Kondo,57 D.J. Kong,28 J. Konigsberg,18 A. Korytov,18 A.V. Kotwal,16 A.C. Kraan,45 J. Kraus,24 M. Kreps,26 J. Kroll,45 N. Krumnack,4 M. Kruse,16 V. Krutelyov,10 T. Kubo,55 S. E. Kuhlmann,2 T. 
Kuhr,26 N.P. Kulkarni,58 Y. Kusakabe,57 S. Kwang,13 A.T. Laasanen,48 S. Lai,34 S. Lami,46 S. Lammel,17 M. Lancaster,31 R.L. Lander,7 K. Lannon,39 A. Lath,52 G. Latino,46 I. Lazzizzera,43 T. LeCompte,2 J. Lee,49 J. Lee,28 Y.J. Lee,28 S.W. Leeo,53 R. Lefèvre,20 N. Leonardo,33 S. Leone,46 S. Levy,13 J.D. Lewis,17 C. Lin,60 C.S. Lin,17 M. Lindgren,17 E. Lipeles,9 A. Lister,7 D.O. Litvintsev,17 T. Liu,17 N.S. Lockyer,45 A. Loginov,60 M. Loreti,43 R.-S. Lu,1 D. Lucchesi,43 P. Lujan,29 P. Lukens,17 G. Lungu,18 L. Lyons,42 J. Lys,29 R. Lysak,14 E. Lytken,48 P. Mack,26 D. MacQueen,34 R. Madrak,17 K. Maeshima,17 K. Makhoul,33 T. Maki,23 P. Maksimovic,25 S. Malde,42 S. Malik,31 G. Manca,30 F. Margaroli,5 R. Marginean,17 C. Marino,26 C.P. Marino,24 A. Martin,60 M. Martin,25 V. Marting,21 M. Martínez,3 R. Martínez-Ballarín,32 T. Maruyama,55 P. Mastrandrea,51 T. Masubuchi,55 H. Matsunaga,55 M.E. Mattson,58 R. Mazini,34 P. Mazzanti,5 K.S. McFarland,49 P. McIntyre,53 R. McNultyf,30 A. Mehta,30 P. Mehtala,23 S. Menzemerh,11 A. Menzione,46 P. Merkel,48 C. Mesropian,50 A. Messina,36 T. Miao,17 N. Miladinovic,6 J. Miles,33 R. Miller,36 C. Mills,10 M. Milnik,26 A. Mitra,1 G. Mitselmakher,18 A. Miyamoto,27 S. Moed,20 N. Moggi,5 B. Mohr,8 C.S. Moon,28 R. Moore,17 M. Morello,46 P. Movilla Fernandez,29 J. Mülmenstädt,29 A. Mukherjee,17 Th. Muller,26 R. Mumford,25 P. Murat,17 M. Mussini,5 J. Nachtman,17 A. Nagano,55 J. Naganoma,57 K. Nakamura,55 I. Nakano,40 A. Napier,56 V. Necula,16 C. Neu,45 M.S. Neubauer,9 J. Nielsenn,29 L. Nodulman,2 O. Norniella,3 E. Nurse,31 S.H. Oh,16 Y.D. Oh,28 I. Oksuzian,18 T. Okusawa,41 R. Oldeman,30 R. Orava,23 K. Osterberg,23 C. Pagliarone,46 E. Palencia,11 V. Papadimitriou,17 A. Papaikonomou,26 A.A. Paramonov,13 B. Parks,39 S. Pashapour,34 J. Patrick,17 G. Pauletta,54 M. Paulini,12 C. Paus,33 D.E. Pellett,7 A. Penzo,54 T.J. Phillips,16 G. Piacentino,46 J. Piedra,44 L. Pinera,18 K. Pitts,24 C. Plager,8 L. Pondrom,59 X.
Portell,3 O. Poukhov,15 N. Pounder,42 F. Prakoshyn,15 A. Pronko,17 J. Proudfoot,2 F. Ptohose,19 G. Punzi,46 J. Pursley,25 J. Rademackerb,42 A. Rahaman,47 V. Ramakrishnan,59 N. Ranjan,48 I. Redondo,32 B. Reisert,17 V. Rekovic,37 P. Renton,42 M. Rescigno,51 S. Richter,26 F. Rimondi,5 L. Ristori,46 A. Robson,21 T. Rodrigo,11 E. Rogers,24 S. Rolli,56 R. Roser,17 M. Rossi,54 R. Rossin,10 P. Roy,34 A. Ruiz,11 J. Russ,12 V. Rusu,13 H. Saarikko,23 A. Safonov,53 W.K. Sakumoto,49 G. Salamanna,51 O. Saltó,3 L. Santi,54 S. Sarkar,51 L. Sartori,46 K. Sato,17 P. Savard,34 A. Savoy-Navarro,44 T. Scheidle,26 P. Schlabach,17 E.E. Schmidt,17 M.P. Schmidt,60 M. Schmitt,38 T. Schwarz,7 L. Scodellaro,11 A.L. Scott,10 A. Scribano,46 F. Scuri,46 A. Sedov,48 S. Seidel,37 Y. Seiya,41 A. Semenov,15 L. Sexton-Kennedy,17 A. Sfyrla,20 S.Z. Shalhout,58 M.D. Shapiro,29 T. Shears,30 P.F. Shepard,47 D. Sherman,22 M. Shimojimak,55 M. Shochet,13 Y. Shon,59 I. Shreyber,20 A. Sidoti,46 P. Sinervo,34 A. Sisakyan,15 A.J. Slaughter,17 J. Slaunwhite,39 K. Sliwa,56 J.R. Smith,7 F.D. Snider,17 R. Snihur,34 M. Soderberg,35 A. Soha,7 S. Somalwar,52 V. Sorin,36 J. Spalding,17 F. Spinella,46 T. Spreitzer,34 P. Squillacioti,46 M. Stanitzki,60 A. Staveris-Polykalas,46 R. St. Denis,21 B. Stelzer,8 O. Stelzer-Chilton,42 D. Stentz,38 J. Strologas,37 D. Stuart,10 J.S. Suh,28 A. Sukhanov,18 H. Sun,56 I. Suslov,15 T. Suzuki,55 A. Taffardp,24 R. Takashima,40 Y. Takeuchi,55 R. Tanaka,40 M. Tecchio,35 P.K. Teng,1 K. Terashi,50 J. Thomd,17 A.S. Thompson,21 E. Thomson,45 P. Tipton,60 V. Tiwari,12 S. Tkaczyk,17 D. Toback,53 S. Tokar,14 K. Tollefson,36 T. Tomura,55 D. Tonelli,46 S. Torre,19 D. Torretta,17 S. Tourneur,44 W. Trischuk,34 R. Tsuchiya,57 S. Tsuno,40 Y. Tu,45 N. Turini,46 F. Ukegawa,55 S. Uozumi,55 S. Vallecorsa,20 N. van Remortel,23 A. Varganov,35 E. Vataga,37 F. Vazquezi,18 G. Velev,17 G. Veramendi,24 V. Veszpremi,48 M. Vidal,32 R. Vidal,17 I. Vila,11 R. Vilar,11 T. Vine,31 I. Vollrath,34 I. Volobouevo,29 G. 
Volpi,46 F. Würthwein,9 P. Wagner,53 R.G. Wagner,2 R.L. Wagner,17 J. Wagner,26 W. Wagner,26 R. Wallny,8 S.M. Wang,1 A. Warburton,34 D. Waters,31 M. Weinberger,53 W.C. Wester III,17 B. Whitehouse,56 D. Whiteson,45 A.B. Wicklund,2 E. Wicklund,17 G. Williams,34 H.H. Williams,45 P. Wilson,17 B.L. Winer,39 P. Wittichd,17 S. Wolbers,17 C. Wolfe,13 T. Wright,35 X. Wu,20 S.M. Wynne,30 A. Yagil,9 K. Yamamoto,41 J. Yamaoka,52 T. Yamashita,40 C. Yang,60 U.K. Yangj,13 Y.C. Yang,28 W.M. Yao,29 G.P. Yeh,17 J. Yoh,17 K. Yorita,13 T. Yoshida,41 G.B. Yu,49 I. Yu,28 S.S. Yu,17 J.C. Yun,17 L. Zanello,51 A. Zanetti,54 I. Zaw,22 X. Zhang,24 J. Zhou,52 and S. Zucchelli5 (CDF Collaboration∗) 1Institute of Physics, Academia Sinica, Taipei, Taiwan 11529, Republic of China 2Argonne National Laboratory, Argonne, Illinois 60439 3Institut de Fisica d’Altes Energies, Universitat Autonoma de Barcelona, E-08193, Bellaterra (Barcelona), Spain 4Baylor University, Waco, Texas 76798 5Istituto Nazionale di Fisica Nucleare, University of Bologna, I-40127 Bologna, Italy 6Brandeis University, Waltham, Massachusetts 02254 7University of California, Davis, Davis, California 95616 8University of California, Los Angeles, Los Angeles, California 90024 9University of California, San Diego, La Jolla, California 92093 10University of California, Santa Barbara, Santa Barbara, California 93106 11Instituto de Fisica de Cantabria, CSIC-University of Cantabria, 39005 Santander, Spain 12Carnegie Mellon University, Pittsburgh, PA 15213 13Enrico Fermi Institute, University of Chicago, Chicago, Illinois 60637 14Comenius University, 842 48 Bratislava, Slovakia; Institute of Experimental Physics, 040 01 Kosice, Slovakia 15Joint Institute for Nuclear Research, RU-141980 Dubna, Russia 16Duke University, Durham, North Carolina 27708 ∗ With visitors from aUniversity of Athens, bUniversity of Bristol, cUniversity Libre de Bruxelles, dCornell University, eUniversity of Cyprus, fUniversity of Dublin, gUniversity of Edinburgh, 
hUniversity of Heidelberg, iUniversidad Iberoamericana, jUniversity of Manchester, kNagasaki Institute of Applied Science, lUniversity de Oviedo, mUniversity of London, Queen Mary College, nUniversity of California Santa Cruz, oTexas Tech University, pUniversity of California Irvine, and qIFIC(CSIC-Universitat de Valencia). 17Fermi National Accelerator Laboratory, Batavia, Illinois 60510 18University of Florida, Gainesville, Florida 32611 19Laboratori Nazionali di Frascati, Istituto Nazionale di Fisica Nucleare, I-00044 Frascati, Italy 20University of Geneva, CH-1211 Geneva 4, Switzerland 21Glasgow University, Glasgow G12 8QQ, United Kingdom 22Harvard University, Cambridge, Massachusetts 02138 23Division of High Energy Physics, Department of Physics, University of Helsinki and Helsinki Institute of Physics, FIN-00014, Helsinki, Finland 24University of Illinois, Urbana, Illinois 61801 25The Johns Hopkins University, Baltimore, Maryland 21218 26Institut für Experimentelle Kernphysik, Universität Karlsruhe, 76128 Karlsruhe, Germany 27High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki 305, Japan 28Center for High Energy Physics: Kyungpook National University, Taegu 702-701, Korea; Seoul National University, Seoul 151-742, Korea; SungKyunKwan University, Suwon 440-746, Korea 29Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California 94720 30University of Liverpool, Liverpool L69 7ZE, United Kingdom 31University College London, London WC1E 6BT, United Kingdom 32Centro de Investigaciones Energeticas Medioambientales y Tecnologicas, E-28040 Madrid, Spain 33Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 34Institute of Particle Physics: McGill University, Montréal, Canada H3A 2T8; and University of Toronto, Toronto, Canada M5S 1A7 35University of Michigan, Ann Arbor, Michigan 48109 36Michigan State University, East Lansing, Michigan 48824 37University of New Mexico, Albuquerque, New Mexico 87131 38Northwestern 
University, Evanston, Illinois 60208 39The Ohio State University, Columbus, Ohio 43210 40Okayama University, Okayama 700-8530, Japan 41Osaka City University, Osaka 588, Japan 42University of Oxford, Oxford OX1 3RH, United Kingdom 43University of Padova, Istituto Nazionale di Fisica Nucleare, Sezione di Padova-Trento, I-35131 Padova, Italy 44LPNHE, Universite Pierre et Marie Curie/IN2P3-CNRS, UMR7585, Paris, F-75252 France 45University of Pennsylvania, Philadelphia, Pennsylvania 19104 46Istituto Nazionale di Fisica Nucleare Pisa, Universities of Pisa, Siena and Scuola Normale Superiore, I-56127 Pisa, Italy 47University of Pittsburgh, Pittsburgh, Pennsylvania 15260 48Purdue University, West Lafayette, Indiana 47907 49University of Rochester, Rochester, New York 14627 50The Rockefeller University, New York, New York 10021 51Istituto Nazionale di Fisica Nucleare, Sezione di Roma 1, University of Rome “La Sapienza,” I-00185 Roma, Italy 52Rutgers University, Piscataway, New Jersey 08855 53Texas A&M University, College Station, Texas 77843 54Istituto Nazionale di Fisica Nucleare, University of Trieste/ Udine, Italy 55University of Tsukuba, Tsukuba, Ibaraki 305, Japan 56Tufts University, Medford, Massachusetts 02155 57Waseda University, Tokyo 169, Japan 58Wayne State University, Detroit, Michigan 48201 59University of Wisconsin, Madison, Wisconsin 53706 60Yale University, New Haven, Connecticut 06520

We have measured the polarizations of J/ψ and ψ(2S) mesons as functions of their transverse momentum pT when they are produced promptly in the rapidity range |y| < 0.6 with pT ≥ 5 GeV/c. The analysis is performed using a data sample with an integrated luminosity of about 800 pb^-1 collected by the CDF II detector. For both vector mesons, we find that the polarizations become increasingly longitudinal as pT increases from 5 to 30 GeV/c. These results are compared to the predictions of nonrelativistic quantum chromodynamics and other contemporary models.
The effective polarizations of J/ψ and ψ(2S) mesons from B-hadron decays are also reported.

PACS numbers: 13.88.+e, 13.20.Gd, 14.40.Lb

An effective field theory, nonrelativistic quantum chromodynamics (NRQCD) [1], provides a rigorous formalism for calculating the production rates of charmonium (cc̄) states. NRQCD explains the direct production cross sections for J/ψ and ψ(2S) mesons observed at the Tevatron [2, 3] and predicts their increasingly transverse polarizations as pT increases, where pT is the meson's momentum component perpendicular to the colliding beam direction [4]. The first polarization measurements at the Tevatron [5] did not show such a trend. This Letter reports on J/ψ and ψ(2S) polarization measurements with a larger data sample than previously available. This allows the extension of the measurement to a higher pT region and makes a more stringent test of the NRQCD prediction. The NRQCD cross section calculation for cc̄ production separates the long-distance nonperturbative contributions from the short-distance perturbative behavior. The former is treated as an expansion of the matrix elements in powers of the nonrelativistic charm-quark velocity. This expansion can be computed by lattice simulations, but currently the expansion coefficients are treated as universal parameters, which are adjusted to match the cross section measurements at the Tevatron [2, 3]. The calculation also applies to cc̄ production in ep collisions, but HERA measurements of J/ψ polarization tend to disagree with the NRQCD prediction [6]. These difficulties have led some authors to explore alternative power expansions of the long-distance interactions for the cc̄ system [7]. There are also new QCD-inspired models, the gluon tower model [8] and the kT-factorization model [9], that accommodate vector-meson cross sections at both HERA and the Tevatron and predict the vector-meson polarizations as functions of pT.
These authors emphasize that measuring the vector-meson polarizations as functions of pT is a crucial test of NRQCD. The CDF II detector is described in detail elsewhere [3, 10]. In this analysis, the essential features are a muon system covering the central region of pseudorapidity, |η| < 0.6, and the tracking system, immersed in the 1.4 T solenoidal magnetic field and composed of a silicon microstrip detector and a cylindrical drift chamber called the central outer tracker (COT). The data used here correspond to an integrated luminosity of about 800 pb^-1 and were recorded between June 2004 and February 2006 by a dimuon trigger, which requires two opposite-charge muon candidates, each having pT > 1.5 GeV/c. Decays V → µ+µ− of vector mesons V (either J/ψ or ψ(2S)) are selected from dimuon events for which each track has segments reconstructed in both the COT and the silicon microstrip detector. The pT of each muon is required to exceed 1.75 GeV/c in order to guarantee a well-measured trigger efficiency. The muon track pair is required to be consistent with originating from a common vertex and to have an invariant mass M within the range 2.8 (3.4) < M < 3.4 (3.9) GeV/c^2 to be considered as a J/ψ (ψ(2S)) candidate. To have a reasonable polarization sensitivity, the vector-meson candidates are required to have pT ≥ 5 GeV/c in the rapidity range $|y| < 0.6$, with $y \equiv \frac{1}{2}\ln\frac{E+p_\parallel}{E-p_\parallel}$, where $E$ is the energy and $p_\parallel$ is the momentum parallel to the beam direction of the dimuon system. Events are separated into a signal region and sideband regions, as indicated in Fig. 1. The fit to the data uses a double (single) Gaussian for the J/ψ (ψ(2S)) signal and a linear background shape. The fits are used only to define signal and background regions. The signal regions are within 3σV of the fitted mass peaks MV, where σV is the width obtained in the fit to the invariant mass distribution.
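The kinematic selection described above can be sketched as follows. This is only an illustration: the function names and the dictionary of mass windows are hypothetical, units are taken to be GeV with c = 1, and the muon-level requirements (trigger, track quality, common vertex) are not modeled.

```python
import math

# Invariant-mass windows quoted in the text, in GeV/c^2
MASS_WINDOWS = {"jpsi": (2.8, 3.4), "psi2s": (3.4, 3.9)}

def rapidity(energy, p_parallel):
    """Rapidity y = (1/2) ln[(E + p_par) / (E - p_par)] of the dimuon system."""
    return 0.5 * math.log((energy + p_parallel) / (energy - p_parallel))

def is_candidate(meson, mass, pt, energy, p_parallel):
    """Apply the mass window, pT >= 5 GeV/c and |y| < 0.6 requirements."""
    low, high = MASS_WINDOWS[meson]
    y = rapidity(energy, p_parallel)
    return low < mass < high and pt >= 5.0 and abs(y) < 0.6

# Usage: a dimuon with M = 3.1, pT = 6, p_par = 2 (GeV units, c = 1)
mass, pt, p_par = 3.1, 6.0, 2.0
energy = math.sqrt(mass**2 + pt**2 + p_par**2)
accepted = is_candidate("jpsi", mass, pt, energy, p_par)
```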
Both the background distribution and the quantity of background events under the signal peak are estimated by events from the lower and upper mass sidebands. The sideband regions are 7σJ/ψ (4σψ(2S)) away from the signal region for J/ψ (ψ(2S)). For each candidate, we compute ct = M Lxy/pT, where t is the proper decay time and Lxy is the transverse distance between the beam line and the decay vertex in the plane normal to the beam direction. The ct distributions of the selected dimuon events are shown in Fig. 2. The ct distribution of prompt events is a Gaussian distribution centered at zero due to finite tracking resolution. For J/ψ, the prompt events are due to direct production or the decays of heavier charmonium states such as χc and ψ(2S); for ψ(2S), the prompt events are almost entirely due to direct production since heavier charmonium states rarely decay to ψ(2S) [11]. Both the J/ψ and the ψ(2S) samples contain significant numbers of events originating from long-lived B-hadron decays, as can be seen from the event excess at positive ct. We have measured the fraction of B → J/ψ + X events in the J/ψ sample and found agreement with other results [3]. We select prompt events by requiring the sum of the squared impact parameter significances of the positively and negatively charged muon tracks to satisfy $S \equiv (d_0^+/\sigma^+)^2 + (d_0^-/\sigma^-)^2 \le 8$. The impact parameter d0 is the distance of closest approach of the track to the beam line in the transverse plane. Vector-meson candidates from B-hadron decays are selected by requiring S > 16 and ct > 0.03 cm. This requirement retains a negligible fraction of prompt events in the B sample. To measure the polarizations of prompt J/ψ and ψ(2S) mesons as functions of pT, the J/ψ events are analyzed in six pT bins and the ψ(2S) events in three bins, shown in Table I. We determine the fraction of B-decay background remaining in prompt samples fbkd by subtracting the number of negative ct events from the number of positive ct events.
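The prompt and B-decay selections can be sketched as follows. The function names are hypothetical, lengths are in cm, and mass and pT are in GeV with c = 1. The normalization of fbkd by the total number of events is an assumption made here, since the text only states that negative-ct events are subtracted from positive-ct events.

```python
def proper_decay_length(mass, lxy, pt):
    """ct = M * Lxy / pT, in the same length unit as Lxy."""
    return mass * lxy / pt

def impact_significance_sum(d0_pos, sigma_pos, d0_neg, sigma_neg):
    """S = (d0+/sigma+)^2 + (d0-/sigma-)^2 for the two muon tracks."""
    return (d0_pos / sigma_pos) ** 2 + (d0_neg / sigma_neg) ** 2

def classify(s, ct):
    """Label a candidate using the thresholds quoted in the text."""
    if s <= 8:
        return "prompt"
    if s > 16 and ct > 0.03:
        return "b_decay"
    return "rejected"

def b_background_fraction(ct_values):
    """Excess of positive-ct over negative-ct events, divided by the total.

    Prompt events are symmetric about ct = 0, so the excess estimates the
    long-lived contamination. The division by the total is an assumption.
    """
    n_pos = sum(1 for ct in ct_values if ct > 0)
    n_neg = sum(1 for ct in ct_values if ct < 0)
    total = n_pos + n_neg
    return (n_pos - n_neg) / total if total else 0.0

# Usage: one candidate with small impact parameters
s = impact_significance_sum(0.002, 0.001, -0.001, 0.001)
label = classify(s, proper_decay_length(3.1, 0.005, 10.0))
```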
Only a negligible fraction (< 0.2%) of B decays produce vector-meson events with negative ct. For both vector mesons, fbkd increases with pT , as listed in Table I. The prompt polarization from the fitting algorithm is corrected for this contamination. FIG. 1: Invariant mass distributions, as functions of M (GeV/c$^2$), for (a) J/ψ and (b) ψ(2S) candidates. The curves are fits to the data. The solid (dashed) lines indicate the signal (sideband) regions. FIG. 2: Sideband-subtracted ct (cm) distributions for (a) J/ψ and (b) ψ(2S) events. The prompt Gaussian peak, positive excess from B-hadron decays, and negative tail from mismeasured events are shown. The dotted line is the reflection of the negative ct histogram about zero. The polarization information is contained in the distribution of the muon decay angle θ∗, the angle of the µ+ in the rest frame of the vector meson with respect to the vector-meson boost direction in the laboratory system. The decay angle distribution depends on the polarization parameter α: $dN/d\cos\theta^* \propto 1 + \alpha\cos^2\theta^*$ (−1 ≤ α ≤ 1). For fully transverse (longitudinal) polarization, α = +1 (−1). Intermediate values of α indicate a mixture of transverse and longitudinal polarization. A template method is used to account for acceptance and efficiency. Two sets of cos θ∗ distributions for fully polarized decays of J/ψ and ψ(2S) events, one longitudinal (L) and the other transverse (T ), are produced with the CDF simulation program using the efficiency-corrected pT spectra measured from data [3, 12]. We use the muon trigger efficiency measured using data as a function of track parameters (pT , η, φ) to account for detector non-uniformities. The parametrized efficiency is used as a filter on all simulated muons. Events that pass reconstruction represent the behavior of fully polarized vector-meson decays in the detector.
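The statement that intermediate α corresponds to a mixture of transverse and longitudinal polarization can be made concrete. The sketch below works with the ideal angular shapes only, with no acceptance or efficiency effects (which the real templates include), and shows that the normalized 1 + α cos²θ∗ distribution equals a weighted sum of the α = +1 and α = −1 shapes with transverse weight 2(1 + α)/(3 + α). The function names are hypothetical.

```python
def angular_pdf(cos_theta, alpha):
    """Normalized dN/dcos(theta*) on [-1, 1] for polarization parameter alpha."""
    return 3.0 * (1.0 + alpha * cos_theta ** 2) / (2.0 * (3.0 + alpha))

def transverse_fraction(alpha):
    """Weight of the fully transverse (alpha = +1) shape in the mixture."""
    return 2.0 * (1.0 + alpha) / (3.0 + alpha)

def mixed_pdf(cos_theta, alpha):
    """Same distribution written as a mixture of the T and L ideal shapes."""
    w = transverse_fraction(alpha)
    transverse = angular_pdf(cos_theta, 1.0)     # (3/8)(1 + cos^2)
    longitudinal = angular_pdf(cos_theta, -1.0)  # (3/4)(1 - cos^2)
    return w * transverse + (1.0 - w) * longitudinal
```

An unpolarized sample (α = 0) corresponds to a transverse weight of 2/3, as expected for two transverse and one longitudinal helicity states.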
The fitting algorithm [5] uses two binned cos θ∗ distributions for each pT bin, one made by NS events from the signal region (signal plus background) and the other made by NB events from the sideband regions (background). The χ2 minimization is done simultaneously for both cos θ∗ distributions. The fitting algorithm includes an individual background term for each cos θ∗ bin, normalized to NB. Simulation shows that the cos θ∗ resolution at all decay angles over the entire pT range is much smaller than the bin width of 0.05 (0.10 for ψ(2S)) used here. The data, fit, and template distributions for the worst fit (9% probability) in the J/ψ data are shown in Fig. 3. FIG. 3: cos θ∗ distribution of data (points) and polarization fit for the worst χ2 probability bin in the J/ψ data (7 ≤ pT < 9 GeV/c). The dotted (dashed) line is the template for fully L (T) polarization. The fit describes the overall trend of the data well. All systematic uncertainties are much smaller than the statistical uncertainties. Varying the pT spectrum used in the simulation by 1σ changed the polarization parameter for J/ψ at most by 0.002. A systematic uncertainty of 0.007 was estimated by the change in the polarization parameter when a modification was made on all trigger efficiencies by ±1σ. For ψ(2S), the dominant systematic uncertainty came from the yield estimate because of the radiative tail and the large background. The total systematic uncertainties shown in Table I were taken to be the quadrature sum of these individual uncertainties. Other possible sources of systematic uncertainties - signal definition and cos θ∗ binning - were determined to be negligible. Corrections to prompt polarization from B-decay contamination were small, so that uncertainties on B-decay polarization measurements also had negligible effect. No φ-dependence of the polarizations was observed.
The polarization of J/ψ mesons from inclusive Bu and Bd decays was measured by the BABAR collaboration [13]. In this analysis, the B-hadron direction is unknown, so we define θ∗ with respect to the J/ψ direction in the laboratory system. The resulting polarization is somewhat diluted. As discussed in Ref. [3], CDF uses a Monte Carlo procedure to adapt the BABAR measurement to predict the effective J/ψ polarization parameter. For the J/ψ events with 5 ≤ pT < 30 GeV/c, the CDF model for Bu and Bd decays gives αeff = −0.145± 0.009, independent of pT . We have measured the polarization of vector mesons from B-hadron decays. For J/ψ, we find αeff = −0.106±0.033 (stat)±0.007 (syst). At this level of accuracy, a polarization contribution by J/ψ mesons from Bs and b-baryon decays cannot be separated from the effective polarization due to those from Bu and Bd decays. We also report the first measurement of the ψ(2S) polarization from B-hadron decays: αeff = 0.36± 0.25 (stat)± 0.03 (syst). The polarization parameters for both prompt vector mesons corrected for fbkd using our experimental results on αeff are listed as functions of pT in Table I and are plotted in Fig. 4. The polarization parameters for J/ψ are negative over the entire pT range of measurement and become increasingly negative (favoring longitudinal polarization) as pT increases. For ψ(2S), the central value of the polarization parameter is positive at small pT , but, given the uncertainties, its behavior is consistent with the trend shown in the measurement of the J/ψ polarization. The polarization behavior measured previously with 110 pb−1 [5] is not consistent with the results presented here. This is a differential measurement, and the muon efficiencies in this analysis are true dimuon efficiencies. In Ref. [5], they are the product of independent single muon efficiencies. The efficiency for muons with pT < 4 GeV/c is crucial for good polarization sensitivity. 
In this analysis, the muon efficiency varies smoothly from 99% to 97% over this range. In the analysis of Ref. [5], it varied from 93% to 40% with significant jumps between individual data points. Data from periods of drift chamber aging were omitted from this analysis because the polarization results were inconsistent with the remainder of the data. Studies such as this were not done in the analysis of Ref. [5]. The systematics of the polarization measurement are much better understood in this analysis. These polarization measurements for the charmed vector mesons extend to a pT regime where perturbative QCD should be applicable. The results are compared to the predictions of NRQCD and the kT -factorization model in Fig. 4.

TABLE I: Polarization parameter α for prompt production in each pT bin. The first (second) uncertainty is statistical (systematic). ⟨pT⟩ is the average transverse momentum.

pT (GeV/c)   ⟨pT⟩ (GeV/c)   fbkd (%)      α                          χ2/d.o.f.
J/ψ
5−6          5.5            2.8 ± 0.2     −0.004 ± 0.029 ± 0.009     15.5/21
6−7          6.5            3.4 ± 0.2     −0.015 ± 0.028 ± 0.010     24.1/23
7−9          7.8            4.1 ± 0.2     −0.077 ± 0.023 ± 0.013     35.1/25
9−12         10.1           5.7 ± 0.3     −0.094 ± 0.028 ± 0.007     34.0/29
12−17        13.7           6.7 ± 0.6     −0.140 ± 0.043 ± 0.007     35.0/31
17−30        20.0           13.6 ± 1.4    −0.187 ± 0.090 ± 0.007     33.9/35
ψ(2S)
5−7          5.9            1.6 ± 0.9     +0.314 ± 0.242 ± 0.028     13.1/11
7−10         8.2            4.9 ± 1.2     −0.013 ± 0.201 ± 0.035     18.5/13
10−30        12.6           8.6 ± 1.8     −0.374 ± 0.222 ± 0.062     26.9/17

FIG. 4: Prompt polarizations as functions of pT (GeV/c): (a) J/ψ and (b) ψ(2S). The band (line) is the prediction from NRQCD [4] (the kT -factorization model [9]).

The prediction of the kT -factorization model is presented for pT < 20 GeV/c and does not include the contribution from the decays of heavier charmonium states for J/ψ production. The polarizations for prompt production of both vector mesons become increasingly longitudinal as pT increases beyond 10 GeV/c.
This behavior is in strong disagreement with the NRQCD prediction of large transverse polarization at high pT . It is striking that the NRQCD calculation and the other models reproduce the measured J/ψ and ψ(2S) cross sections at the Tevatron, but fail to describe the polarization at high pT . This indicates that there is some important aspect of the production mechanism that is not yet understood. We thank the Fermilab staff and the technical staffs of the participating institutions for their vital contributions. This work was supported by the U.S. Department of Energy and National Science Foundation; the Italian Istituto Nazionale di Fisica Nucleare; the Ministry of Education, Culture, Sports, Science and Technology of Japan; the Natural Sciences and Engineering Research Council of Canada; the National Science Council of the Republic of China; the Swiss National Science Foundation; the A.P. Sloan Foundation; the Bundesministerium für Bildung und Forschung, Germany; the Korean Science and Engineering Foundation and the Korean Research Foundation; the Particle Physics and Astronomy Research Council and the Royal Society, UK; the Institut National de Physique Nucleaire et Physique des Particules/CNRS; the Russian Foundation for Basic Research; the Comisión Interministerial de Ciencia y Tecnoloǵıa, Spain; the European Community’s Human Potential Programme; the Slovak R&D Agency; and the Academy of Finland. [1] G. T. Bodwin, E. Braaten, and G. P. Lepage, Phys. Rev. D 51, 1125 (1995); Erratum, ibid. Phys. Rev. D 55, 5853 (1997); E. Braaten and S. Fleming, Phys. Rev. Lett. 74, 3327 (1995). [2] F. Abe et al. (CDF Collaboration), Phys. Rev. Lett. 79, 572 (1997). [3] D. Acosta et al. (CDF Collaboration), Phys. Rev. D 71, 032001 (2005). [4] P. Cho and M. Wise, Phys. Lett. B 346, 129 (1995); M. Beneke and I. Z. Rothstein, Phys. Lett. B 372, 157 (1996); Erratum, ibid. Phys. Lett. B 389, 769 (1996); E. Braaten, B. A. Kniehl, and J. Lee, Phys. Rev. D 62, 094005 (2000). [5] T. 
Affolder et al. (CDF Collaboration), Phys. Rev. Lett. 85, 2886 (2000). [6] C. Adloff et al. (H1 Collaboration), Eur. Phys. J. C 25, 41 (2002); S. Chekanov et al. (ZEUS Collaboration), Eur. Phys. J. C 44, 13 (2005). [7] S. Fleming, I. Z. Rothstein, and A. K. Leibovich, Phys. Rev. D 64, 036002 (2001). [8] V. A. Khoze, A. D. Martin, M. G. Ryskin, and W. J. Stirling, Eur. Phys. J. C 39, 163 (2005). [9] S. P. Baranov, Phys. Rev. D 66, 114003 (2002). [10] The CDF coordinate system has ẑ along the proton direction, x̂ horizontal pointing outward from the Tevatron ring, and ŷ vertical. θ (φ) is the polar (azimuthal) angle measured with respect to ẑ, and η is the pseudorapidity defined as −ln (tan (θ/2)). The transverse momentum of a particle is denoted as pT = p sin θ. [11] W.-M. Yao et al. (Particle Data Group), J. Phys. G 33, 1 (2006). [12] A Letter on ψ(2S) cross section measurement is in preparation. [13] B. Aubert et al. (BABAR Collaboration), Phys. Rev. D 67, 032002 (2003). References ABSTRACT We have measured the polarizations of $\jpsi$ and $\psiprime$ mesons as functions of their transverse momentum $\pt$ when they are produced promptly in the rapidity range $|y|<0.6$ with $\pt \geq 5 \pgev$. The analysis is performed using a data sample with an integrated luminosity of about $800 \ipb$ collected by the CDF II detector. For both vector mesons, we find that the polarizations become increasingly longitudinal as $\pt$ increases from 5 to $30 \pgev$. These results are compared to the predictions of nonrelativistic quantum chromodynamics and other contemporary models. The effective polarizations of $\jpsi$ and $\psiprime$ mesons from $B$-hadron decays are also reported. <|endoftext|><|startoftext|> A measure of the non-Gaussian character of a quantum state Marco G. Genoni,1 Matteo G. A. Paris,1, 2, ∗ and Konrad Banaszek3 1Dipartimento di Fisica dell’Università di Milano, I-20133, Milano, Italia. 
2Institute for Scientific Interchange, I-10133 Torino, Italia 3Institute of Physics, Nicolaus Copernicus University, PL-87-100 Toruń, Poland (Dated: November 3, 2018) We address the issue of quantifying the non-Gaussian character of a bosonic quantum state and introduce a non-Gaussianity measure based on the Hilbert-Schmidt distance between the state under examination and a reference Gaussian state. We analyze in detail the properties of the proposed measure and exploit it to evaluate the non-Gaussianity of some relevant single- and multi-mode quantum states. The evolution of non-Gaussianity is also analyzed for quantum states undergoing the processes of Gaussification by loss and de-Gaussification by photon-subtraction. The suggested measure is easily computable for any state of a bosonic system and allows one to define a corresponding measure for the non-Gaussian character of a quantum operation. PACS numbers: 03.67.-a, 03.65.Bz, 42.50.Dv I. INTRODUCTION Gaussian states play a crucial role in quantum information processing with continuous variables. This is especially true for quantum optical implementations since radiation at thermal equilibrium, including the vacuum state, is itself a Gaussian state and most of the Hamiltonians achievable within the current technology are at most bilinear in the field operators, i.e. preserve the Gaussian character [1, 2, 3]. As a matter of fact, using single-mode and entangled Gaussian states, linear optical circuits and Gaussian operations, like homodyne detection, several quantum information protocols have been implemented, including teleportation, dense coding and quantum cloning [4]. On the other hand, quantum information protocols required for long distance communication, as for example entanglement distillation and entanglement swapping, rely on non-Gaussian operations.
In addition, it has been demonstrated that teleportation [5, 6, 7] and cloning [8] of quantum states may be improved by using non-Gaussian states and non-Gaussian operations. Indeed, de-Gaussification protocols for single-mode and two-mode states have been proposed [5, 6, 7] and realized [9]. It should also be noted that any strongly superadditive function is minimized, at fixed covariance matrix, by Gaussian states. This is crucial to prove extremality of Gaussian states and Gaussian operations [10, 11] with respect to various quantities such as channel capacities [12], multipartite entanglement measures [13] and distillable secret key in quantum key distribution protocols. Since in most cases these quantities can be computed only for Gaussian states, a non-Gaussianity measure may serve as a guideline to quantify them for the class of non-Gaussian states. Overall, non-Gaussianity is revealing itself as a resource for continuous variable quantum information, and thus a measure able to quantify the non-Gaussian character of a quantum state is needed. ∗Electronic address: matteo.paris@fisica.unimi.it In this paper we introduce a novel quantity, the non-Gaussianity δ[̺] of a quantum state, which quantifies how much a state fails to be Gaussian. Our measure, which is based on the Hilbert-Schmidt distance between the state itself and a reference Gaussian state, can be easily computed for any state, either single-mode or multi-mode. The paper is structured as follows. In the next Section we introduce notation and review the basic properties of Gaussian states. Then, in Section III we introduce the formal definition of δ[̺] and study its properties in detail. In Section IV we evaluate non-Gaussianity of relevant quantum states whereas in Section V we analyze the evolution of non-Gaussianity for known Gaussification and de-Gaussification maps. Section VI closes the paper with some concluding remarks. II.
GAUSSIAN STATES For concreteness, we will use here the quantum optical terminology of modes carrying photons, but our theory applies to general bosonic systems. Let us consider a system of n modes described by mode operators $a_k$, $k = 1, \dots, n$, satisfying the commutation relations $[a_k, a_j^\dagger] = \delta_{kj}$. A quantum state ̺ of the n modes is fully described by its characteristic function [14] $\chi[\varrho](\lambda) = \mathrm{Tr}[\varrho\, D(\lambda)]$, where $D(\lambda) = \bigotimes_{k=1}^{n} D_k(\lambda_k)$ is the n-mode displacement operator, with $\lambda = (\lambda_1, \dots, \lambda_n)^T$, $\lambda_k \in \mathbb{C}$, and where $D_k(\lambda_k) = \exp\{\lambda_k a_k^\dagger - \lambda_k^* a_k\}$ is the single-mode displacement operator. The canonical operators are given by $q_k = \frac{1}{\sqrt{2}}(a_k + a_k^\dagger)$ and $p_k = \frac{1}{i\sqrt{2}}(a_k - a_k^\dagger)$, with commutation relations given by $[q_j, p_k] = i\delta_{jk}$. Upon introducing the real vector $R = (q_1, p_1, \dots, q_n, p_n)^T$, the commutation relations rewrite as $[R_k, R_j] = i\Omega_{kj}$, where $\Omega_{kj}$ are the elements of the symplectic matrix $\Omega = \bigoplus_{k=1}^{n} i\sigma_2$, σ2 being the y-Pauli matrix. The covariance matrix σ ≡ σ[̺] and the vector of mean values X ≡ X[̺] of a quantum state ̺ are defined as $X_j = \langle R_j\rangle$ and $\sigma_{kj} = \frac{1}{2}\langle\{R_k, R_j\}\rangle - \langle R_j\rangle\langle R_k\rangle$, where {A,B} = AB+BA denotes the anti-commutator, and 〈O〉 = Tr[̺ O] is the expectation value of the operator O. A quantum state ̺G is referred to as a Gaussian state if its characteristic function has the Gaussian form $\chi[\varrho_G](\Lambda) = \exp\{-\frac{1}{2}\Lambda^T \sigma \Lambda + X^T \Omega \Lambda\}$, where Λ is the real vector $\Lambda = (\mathrm{Re}\,\lambda_1, \mathrm{Im}\,\lambda_1, \dots, \mathrm{Re}\,\lambda_n, \mathrm{Im}\,\lambda_n)^T$. Of course, once the covariance matrix and the vector of mean values are given, a Gaussian state is fully determined. For a single-mode system the most general Gaussian state can be written as $\varrho_G = D(\alpha) S(\zeta) \nu(n_t) S^\dagger(\zeta) D^\dagger(\alpha)$, D(α) being the displacement operator, $S(\zeta) = \exp[\frac{1}{2}\zeta (a^\dagger)^2 - \frac{1}{2}\zeta^* a^2]$ the squeezing operator, α, ζ ∈ C, and $\nu(n_t) = (1 + n_t)^{-1}[n_t/(1 + n_t)]^{a^\dagger a}$ a thermal state with nt average number of photons. III.
A MEASURE OF THE NON-GAUSSIAN CHARACTER OF A QUANTUM STATE In order to quantify the non-Gaussian character of a quantum state ̺ we use a quantity based on the distance between ̺ and a reference Gaussian state τ , which itself depends on ̺. Specifically, we define the non-Gaussianity δ[̺] of the state ̺ as $\delta[\varrho] = D^2_{HS}[\varrho, \tau] / \mu[\varrho]$ (2), where DHS [̺, τ ] denotes the Hilbert-Schmidt distance between ̺ and τ , $D^2_{HS}[\varrho, \tau] = \frac{1}{2}\mathrm{Tr}[(\varrho - \tau)^2] = \frac{1}{2}\left(\mu[\varrho] + \mu[\tau] - 2\kappa[\varrho, \tau]\right)$ (3), with µ[̺] = Tr[̺2] and κ[̺, τ ] = Tr[̺τ ] denoting the purity of ̺ and the overlap between ̺ and τ respectively. The Gaussian reference τ is the Gaussian state such that X[̺] = X[τ ] and σ[̺] = σ[τ ], i.e. τ is the Gaussian state with the same covariance matrix σ and the same vector X of the state ̺. The relevant properties of δ[̺], which confirm that it represents a good measure of the non-Gaussian character of ̺, are summarized by the following Lemmas: Lemma 1: δ[̺] = 0 iff ̺ is a Gaussian state. Proof: If δ[̺] = 0 then ̺ = τ and thus it is a Gaussian state. If ̺ is a Gaussian state, then it is uniquely identified by its first and second moments and thus the reference Gaussian state τ is given by τ = ̺, which, in turn, leads to DHS [̺, τ ] = 0 and thus to δ[̺] = 0. Lemma 2: If U is a unitary map corresponding to a symplectic transformation in the phase space, i.e. if U = exp{−iH} with hermitian H that is at most bilinear in the field operators, then δ[U̺U †] = δ[̺]. This property ensures that displacement and squeezing operations do not change the Gaussian character of a quantum state. Proof: Let us consider ̺′ = U̺U †. Then the covariance matrix transforms as σ[̺′] = Σσ[̺]ΣT , Σ being the symplectic transformation associated to U . At the same time the vector of mean values simply translates to X ′ = X +X0, where X0 is the displacement generated by U . Since any Gaussian state is fully characterized by its first and second moments, then the reference state must necessarily transform as τ ′ = UτU †, i.e.
with the same unitary transformation U . Since the Hilbert-Schmidt distance and the purity of a quantum state are invariant under unitary transformations the lemma is proved. Lemma 3: δ[̺] is proportional to the squared L2(Cn) distance between the characteristic functions of ̺ and of the reference Gaussian state τ . In formula: $\delta[\varrho] \propto \int d^{2n}\lambda\, [\chi[\varrho](\lambda) - \chi[\tau](\lambda)]^2$. (4) Since the notion of Gaussianity of a quantum state is defined through the shape of its characteristic function, and since the characteristic function of a quantum state belongs to the L2(Cn) space [14], we regard the L2(Cn) distance as a good indicator of the non-Gaussian character of ̺. Proof: Since characteristic functions of self-adjoint operators are even functions of λ and by means of the identity $\mathrm{Tr}[O_1 O_2] = \frac{1}{\pi^n}\int d^{2n}\lambda\, \chi[O_1](\lambda)\, \chi[O_2](-\lambda)$, we obtain $D^2_{HS}[\varrho, \tau] = \frac{1}{2\pi^n}\int d^{2n}\lambda\, [\chi[\varrho](\lambda) - \chi[\tau](\lambda)]^2$. Lemma 4: Consider a bipartite state ̺ = ̺A ⊗ ̺G. If ̺G is a Gaussian state then δ[̺] = δ[̺A]. Proof: we have µ[̺] = µ[̺A]µ[̺G], µ[τ ] = µ[τA]µ[τG], and κ[̺, τ ] = κ[̺A, τA]κ[̺G, ̺G]. Therefore, since κ[̺G, ̺G] = µ[̺G] we arrive at $\delta[\varrho] = \frac{\mu[\varrho_A]\mu[\varrho_G] + \mu[\tau_A]\mu[\varrho_G] - 2\kappa[\varrho_A, \tau_A]\kappa[\varrho_G, \varrho_G]}{2\mu[\varrho_A]\mu[\varrho_G]} = \delta[\varrho_A]$. (5) The four properties illustrated by the above lemmas are the natural properties required for a good measure of the non-Gaussian character of a quantum state. Notice that by using the trace distance DT [̺, τ ] = Tr|̺−τ | instead of the Hilbert-Schmidt distance we would lose Lemmas 3 and 4, and that the invariance expressed by Lemma 4 holds thanks to the renormalization of the Hilbert-Schmidt distance through the purity µ[̺]. We stress the fact that our measure of non-Gaussianity is a computable one: It may be evaluated for any quantum state of n modes by the calculation of the first two moments of the state, followed by the evaluation of the overlap with the corresponding Gaussian state. Notice that δ[̺] is not additive (nor multiplicative) with respect to the tensor product.
If we consider a (separable) multipartite quantum state in the product form $\varrho = \otimes_{k=1}^{n} \varrho_k$, the non-Gaussianity is given by $\delta[\varrho] = \frac{\prod_{k=1}^{n}\mu[\varrho_k] + \prod_{k=1}^{n}\mu[\tau_k] - 2\prod_{k=1}^{n}\kappa[\varrho_k, \tau_k]}{2\prod_{k=1}^{n}\mu[\varrho_k]}$, where τk is the Gaussian state with the same moments as ̺k. In fact, since the state ̺ is factorisable, we have that the corresponding Gaussian τ is a factorisable state too. IV. NON-GAUSSIANITY OF RELEVANT QUANTUM STATES Let us now exploit the definition (2) to evaluate the non-Gaussianity of some relevant quantum states. At first we consider Fock number states |p〉 of a single mode as well as multimode factorisable states |p〉⊗n made of n copies of a number state. The reference Gaussian states are a thermal state τp = ν(p) with average photon number p and a factorisable thermal state τN = [ν(p)]⊗n with average photon number p in each mode [15]. Non-Gaussianity may be analytically evaluated, leading to $\delta[|p\rangle\langle p|] = \frac{1}{2}\left[1 + \frac{1}{2p+1}\right] - \frac{1}{1+p}\left(\frac{p}{1+p}\right)^{p}$ and $\delta[(|p\rangle\langle p|)^{\otimes n}] = \frac{1}{2}\left[1 + \frac{1}{(2p+1)^{n}}\right] - \left[\frac{1}{1+p}\left(\frac{p}{1+p}\right)^{p}\right]^{n}$. In the multimode case of |p〉⊗n, we seek the number of copies that maximizes the non-Gaussianity. In Fig. 1 we show both δp ≡ δ[|p〉〈p|] and δ̄p = maxn δ[(|p〉〈p|)⊗n] as a function of p. As is apparent from the plot, the non-Gaussianity of Fock states |p〉 increases monotonically with the number of photons p, with the limiting value δp = 1/2 obtained for p → ∞. Upon considering multi-mode copies of Fock states we obtain larger values of non-Gaussianity: δ̄p is a decreasing function of p, approaching δ̄ = 1/2 from above. The value of δ̄p corresponds to n = 3 for p < 26 and to n = 2 for 27 ≤ p ≲ 250. FIG. 1: (Top): Non-Gaussianity of single mode Fock states (gray) |p〉 and of multi-mode Fock states |p〉⊗n (black) as a function of p. Non-Gaussianity for multi-mode states has been maximized over the number of copies n. (Bottom): Non-Gaussianity, as a function of the parameter φ, for the two-mode superpositions |Φ〉〉 (dashed gray), |Ψ〉〉 (solid gray), and for the single-mode superposition of coherent states |ψS〉 for α = 0.5 (solid black) and α = 5 (dashed black). Another example is the superposition of coherent states $|\psi_S\rangle = N^{-1/2}(\cos\phi\,|\alpha\rangle + \sin\phi\,|-\alpha\rangle)$ (7) with normalization N = 1 + sin(2φ) exp{−2α2} which for φ = ±π/4 reduces to the so-called Schrödinger cat states, and whose reference Gaussian state is a displaced squeezed thermal state τS = D(C)S(r)ν(N)S†(r)D†(C), where the real parameters C, r, and N are analytical functions of φ and α. Finally we evaluate the non-Gaussianity of the two-mode Bell-like superpositions of Fock states |Φ〉〉 = cosφ|0, 0〉+ sinφ|1, 1〉 and |Ψ〉〉 = cosφ|0, 1〉+ sinφ|1, 0〉, which for φ = ±π/4 reduce to the Bell states |Φ±〉 and |Ψ±〉. The corresponding reference Gaussian states are respectively a two-mode squeezed thermal state τΦ = S2(ξ)[ν(N) ⊗ ν(N)]S†2(ξ), where $S_2(\xi) = \exp(\xi a^\dagger b^\dagger - \xi^* a b)$ denotes the two-mode squeezing operator, and τΨ = R(θ)[ν(N1)⊗ν(N2)]R†(θ), namely the correlated two-mode state obtained by mixing a single-mode thermal state with the vacuum at a beam splitter of transmissivity cos2 θ, i.e. $R(\theta) = \exp[i\theta(a_1^\dagger a_2 + a_2^\dagger a_1)]$. All the parameters involved in these reference Gaussian states are analytical functions of the superposition parameter φ. Non-Gaussianities are thus evaluated by means of (2) and are reported in Fig. 1 as a function of the parameter φ. As is apparent from the plot, the non-Gaussianity of single-mode states does not surpass the value δ = 1/2, and this fact is confirmed by other examples not reported here. As concerns the cat-like states, we notice that for small values of α the non-Gaussianity of the superposition |ψS〉 shows a different behavior for positive and negative values of the parameter φ: for φ > 0 and α = 0.5 we have almost zero δ, while higher values are achieved for φ < 0. For higher values of α (α = 5 in Fig. 1), non-Gaussianity becomes an even function of φ.
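The Fock-state results of this section can be checked numerically. The sketch below assumes the normalization δ = (µ[̺] + µ[τ] − 2κ[̺, τ]) / (2µ[̺]), which is the form appearing in Eq. (5), together with the thermal reference ν(p), for which µ[τ] = 1/(2p + 1) and κ = 〈p|ν(p)|p〉. The function names are hypothetical.

```python
def fock_overlap(p):
    """kappa = <p| nu(p) |p> for a thermal reference with mean photon number p."""
    return (p / (1.0 + p)) ** p / (1.0 + p)

def delta_fock(p, n=1):
    """Non-Gaussianity of n copies of the Fock state |p> (pure, so mu[rho] = 1)."""
    mu_tau = (2.0 * p + 1.0) ** (-n)   # purity of the n-mode thermal reference
    kappa = fock_overlap(p) ** n       # overlaps multiply for product states
    return 0.5 * (1.0 + mu_tau) - kappa

def best_copies(p, n_max=20):
    """Number of copies n in 1..n_max that maximizes delta_fock(p, n)."""
    return max(range(1, n_max + 1), key=lambda n: delta_fock(p, n))
```

Under these assumptions a single photon gives δ = 5/12, the single-mode values stay below 1/2, and three copies of |1〉 already exceed 1/2, in line with the behavior described for Fig. 1.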
This different behavior can be understood by looking at the Wigner functions of even and odd Schrödinger cat states for different values of α: for small values of α the even cat’s Wigner function is similar to a Gaussian function, while the odd cat’s Wigner function shows a non-Gaussian hole at the origin of the phase space; on increasing the value of α the Wigner functions of the two kinds of states become similar and deviate from a Gaussian function. We have also done a numerical analysis of non-Gaussianity of single-mode quantum states represented by finite superpositions of Fock states $\varrho_d = \sum_{n,k=0}^{d} \varrho_{nk}|n\rangle\langle k|$. (8) To this aim we randomly generate quantum states in finite-dimensional subspaces, dim(H) ≡ d+ 1 ≤ 21, following the algorithm proposed by Zyczkowski et al. [16, 17], i.e. by generating a random diagonal state (i.e. a point on the simplex) and a random unitary matrix according to the Haar measure. In Fig. 2 we report the distribution of non-Gaussianity δ[̺d], as evaluated for 10$^5$ random quantum states, for three different values of the maximum number of photons d. As is apparent from the plots, the distribution of δ[̺d] becomes Gaussian-like for increasing d. In the fourth panel of Fig. 2 we thus report the mean values and variances of the distributions as a function of the maximum number of photons d. The mean value increases with the dimension whereas the variance is a monotonically decreasing function of d. Also in the simulations of finite superpositions we did not observe non-Gaussianity higher than 1/2. Therefore, although we have no proof, we conjecture that δ = 1/2 is a limiting value for the non-Gaussianity of a single-mode state. Higher values are achievable for two-mode or multi-mode quantum states (e.g. δ = 2/3 for the Bell states |Ψ±〉〉). FIG. 2: Distribution of non-Gaussianity δ[̺d] as evaluated for 10$^5$ random quantum states, for three different values of the maximum number of photons d. Top: d = 2 (left), d = 10 (right); Bottom: d = 20 (left). (Bottom-right): Mean values and variances of the non-Gaussianities evaluated for 10$^5$ random quantum states, as a function of the maximum number of photons d. V. GAUSSIFICATION AND DE-GAUSSIFICATION PROCESSES We have also studied the evolution of non-Gaussianity of quantum states undergoing either Gaussification or de-Gaussification processes. First we have considered the Gaussification of Fock states due to the interaction of the system with a bath of oscillators at zero temperature. This is perhaps the simplest example of a Gaussification protocol. In fact the interaction drives asymptotically any quantum state to the vacuum state of the harmonic system, which, in turn, is a Gaussian state. The evolution of the system is governed by the Lindblad Master equation $\dot{\varrho} = \frac{\gamma}{2} L[a]\varrho$, where ˙̺ denotes time derivative, γ is the damping factor and the Lindblad superoperator acts as follows: $L[a]\varrho = 2a\varrho a^\dagger - a^\dagger a\varrho - \varrho a^\dagger a$. Upon writing η = e−γt the solution of the Master equation can be written as $\varrho(\eta) = \sum_m V_m \varrho V_m^\dagger$ (9) with $V_m = [(1-\eta)^m/m!]^{1/2}\, a^m\, \eta^{\frac{1}{2}(a^\dagger a - m)}$, where ̺ is the initial state. In particular for the system initially prepared in a Fock state ̺p = |p〉〈p|, we obtain, after evolution, the mixed state $\varrho_p(\eta) = \sum_m V_m \varrho_p V_m^\dagger = \sum_{l=0}^{p} \alpha_{l,p}(\eta)|l\rangle\langle l|$ (10) with $\alpha_{l,p}(\eta) = \binom{p}{l}(1-\eta)^{p-l}\eta^{l}$. The reference Gaussian state corresponding to ̺p(η) is a thermal state τp(η) = ν(pη) with average photon number pη. Non-Gaussianity of ̺p(η) can be evaluated analytically; we have $\delta_{p\eta} \equiv \delta[\varrho_p(\eta)] = \frac{1}{2} + \frac{(1+2p\eta)^{-1} - 2(1+(p-1)\eta)^{p}/(1+p\eta)^{p+1}}{2(1-\eta)^{2p}\, {}_2F_1\!\left(-p,-p,1;\frac{\eta^2}{(\eta-1)^2}\right)}$, 2F1(a, b, c;x) being a hypergeometric function. We show the behavior of δpη in Fig. 3 as a function of 1 − η for different values of p. As is apparent from the plot, δpη is a monotonically decreasing function of 1 − η as well as a monotonically increasing function of p. That is, at fixed time t the higher is the initial photon number p, the larger is the resulting non-Gaussianity.
FIG. 3: (Left): Non-Gaussianity of Fock states |p〉 undergoing Gaussification by loss mechanism due to the interaction with a bath of oscillators at zero temperature. We show δpη as a function of 1 − η for different values of p: from bottom to top p = 1, 10, 100, 1000. (Right): Non-Gaussianity of ̺IPS as a function of T for r = 0.5 and for different values of ǫ = 0.2, 0.4, 0.6, 0.8 (from bottom to top). δIPS is a monotonically increasing function of T , while ǫ only slightly changes the non-Gaussian character of the state. Let us now consider the de-Gaussification protocol obtained by the process of photon subtraction. Inconclusive Photon Subtraction (IPS) has been introduced for single-mode and two-mode states in [6, 7, 18] and experimentally realized in [9]. In the IPS protocol an input state ̺(in) is mixed with the vacuum at a beam splitter (BS) with transmissivity T and then, on/off photodetection with quantum efficiency ǫ is performed on the reflected beam. The process can be thus characterized by two parameters: the transmissivity T and the detector efficiency ǫ. Since the detector can only discriminate the presence from the absence of light, this measurement is inconclusive, namely it does not resolve the number of detected photons. When the detector clicks, an unknown number of photons is subtracted from the initial state and we obtain the conditional IPS state ̺IPS . The conditional map induced by the measurement is non-Gaussian [7], and the output state is de-Gaussified. Upon applying the IPS protocol to the (Gaussian) single-mode squeezed vacuum S(r)|0〉 (r ∈ R), where S(r) is the real squeezing operation, we obtain [18] the conditional state ̺IPS , whose characteristic function χ[̺IPS ](λ) is a sum of two Gaussian functions and therefore is no longer Gaussian.
The corresponding Gaussian reference state is a squeezed thermal state τIPS = S(ξIPS)ν(NIPS)S†(ξIPS) where the parameters ξIPS and NIPS are analytic functions of r, T and ǫ. Non-Gaussianity δIPS = δIPS(T, ǫ, r) has been evaluated, and in Fig. 3 (right) we report δIPS for r = 0.5 as a function of the transmissivity T for different values of the quantum efficiency ǫ. As is apparent from the plot, the IPS protocol indeed de-Gaussifies the input state, i.e. nonzero values of the non-Gaussianity are obtained. We found that δIPS is an increasing function of the transmissivity T , which is the relevant parameter, while the quantum efficiency ǫ only slightly affects the non-Gaussian character of the output state. The highest value of non-Gaussianity is achieved in the limit of unit transmissivity and unit quantum efficiency, $\lim_{T,\epsilon\to 1}\delta_{IPS} = \delta[|1\rangle\langle 1|] = \delta[S(r)|1\rangle\langle 1|S^\dagger(r)]$, where the last equality is derived from Lemma 2. This result is in agreement with the fact that a squeezed vacuum state undergoing the IPS protocol is driven towards the target state S(r)|1〉 in the limit of T, ǫ → 1 [18]. Finally, we notice that for T, ǫ ≠ 1 and for r → ∞ the non-Gaussianity vanishes. In turn, this corresponds to the fact that one of the coefficients of the two Gaussians of χ[̺IPS ](λ) vanishes, i.e. the output state is again a Gaussian one. VI. CONCLUSIONS AND OUTLOOK Having at our disposal a good measure of non-Gaussianity for quantum states allows us to define a measure of the non-Gaussian character of a quantum operation. Let us denote by G the whole set of Gaussian states. A convenient definition for the non-Gaussianity of a map E reads as follows: δ[E ] = max̺∈G δ[E(̺)], where E(̺) denotes the quantum state obtained after the evolution imposed by the map. Indeed, for a Gaussian map Eg , which transforms any input Gaussian state into a Gaussian state, we have δ[Eg] = 0. Work along this line is in progress and results will be reported elsewhere.
In conclusion, we have proposed a measure of the non-Gaussian character of a CV quantum state. We have shown that our measure satisfies the natural properties expected from a good measure of non-Gaussianity, and have evaluated the non-Gaussianity of some relevant states, in particular of states undergoing Gaussification and de-Gaussification protocols. Using our measure, an analogous non-Gaussianity measure for quantum operations may be introduced.

Acknowledgments

This work has been supported by MIUR project PRIN2005024254-002, the EC Integrated Project QAP (Contract No. 015848) and Polish MNiSW grant 1 P03B 011 29.

APPENDIX A: GAUSSIAN REFERENCE WITH UNCONSTRAINED MEAN VALUE

As we have seen from the above examples, δ[ϱ] of Eq. (2) represents a good measure of the non-Gaussian character of a quantum state. A question arises on whether different choices for the reference Gaussian state τ may lead to alternative, valid definitions. For example, for single-mode states we may define

$\delta'[\varrho] = \min_C D^2_{HS}[\varrho, \tau] / \mu[\varrho]$, (A1)

where τ = D(C)S(ξ)ν(N)S†(ξ)D†(C) is a Gaussian state with the same covariance matrix as ϱ and an unconstrained vector of mean values X = (Re C, Im C), which is used to minimize the Hilbert-Schmidt distance. Here we report a few examples comparing the results already obtained using (2) with those coming from (A1). As we will see, either the two definitions coincide or δ′ and δ are monotone functions of each other. Since the definition (2) corresponds to an easily computable measure, we conclude that it represents the most convenient choice.

Let us first consider the Fock state ϱ = |p〉〈p|. According to (A1), the reference Gaussian state is a displaced thermal state τ′ = D(C) ν_p D†(C). The overlap between ϱ and τ′ is given by

$\kappa[|p\rangle\langle p|, \tau'] = \frac{1}{1+p} \left(\frac{p}{1+p}\right)^{p} \exp\left\{-\frac{|C|^2}{1+p}\right\} L_p\left(-\frac{|C|^2}{p(1+p)}\right)$, (A2)

where L_p(x) denotes the Laguerre polynomial of degree p. The maximum of (A2) is achieved for C = 0, which coincides with the assumption C = Tr[a|p〉〈p|].
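The claim that the overlap (A2) peaks at C = 0 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that (A2) is the standard photon-number distribution of a displaced thermal state with p thermal photons, evaluated at photon number p, and the function names `laguerre` and `overlap_fock_displaced_thermal` are hypothetical.

```python
import math


def laguerre(p, x):
    """Laguerre polynomial L_p(x) from the three-term recurrence
    (k + 1) L_{k+1} = (2k + 1 - x) L_k - k L_{k-1}."""
    prev, cur = 1.0, 1.0 - x
    if p == 0:
        return prev
    for k in range(1, p):
        prev, cur = cur, ((2 * k + 1 - x) * cur - k * prev) / (k + 1)
    return cur


def overlap_fock_displaced_thermal(p, C):
    """Assumed form of (A2): overlap of |p><p| with a thermal state of
    p mean photons displaced by the complex amplitude C (p >= 1)."""
    x = abs(C) ** 2
    prefactor = (p / (1 + p)) ** p / (1 + p)
    return prefactor * math.exp(-x / (1 + p)) * laguerre(p, -x / (p * (1 + p)))


if __name__ == "__main__":
    for p in (1, 2, 3):
        values = [overlap_fock_displaced_thermal(p, C) for C in (0.0, 0.5, 1.0, 2.0)]
        print(p, ["%.6f" % v for v in values])
```

Under this assumption the overlap decreases as |C| grows, so the minimization over C in (A1) selects C = 0, in line with the statement above.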
Let us consider the quantum state (10) obtained as the solution of the loss Master Equation for an initial Fock state |p〉〈p|. The unconstrained Gaussian reference is again a displaced thermal state τ′ = D(C) ν_{pη} D†(C), and the overlap is given by

$\kappa[\varrho_p(\eta), \tau'] = \mathrm{Tr}[\tau' \varrho_p(\eta)] = \frac{(1+\eta(p-1))^{p}}{(1+p\eta)^{p+1}} \exp\left\{-\frac{|C|^2}{1+p\eta}\right\} L_p\left(\frac{\eta |C|^2}{(1+p\eta)(\eta(1-p)-1)}\right)$.

Again, since the overlap is maximum for C = Tr[aϱ_p(η)] = 0, both definitions give the same results for the non-Gaussianity.

Let us now consider the Schrödinger cat-like states of (7). The reference Gaussian state is a displaced squeezed thermal state, with squeezing and thermal photons as calculated before. The optimization over the free parameter C may be done numerically. In Fig. 4 we show the non-Gaussianity, both as resulting from (A1) and by choosing C = Tr[aϱ_S] as in (2), as a function of ε. The two curves are almost the same, with no qualitative differences.

[1] A. Ferraro, S. Olivares and M. G. A. Paris, Gaussian States in Quantum Information (Bibliopolis, Napoli, 2005).
[2] J. Eisert and M. B. Plenio, Int. J. Quant. Inf. 1, 479 (2003).
[3] F. Dell'Anno et al., Phys. Rep. 428, 53 (2006).
[4] S. L. Braunstein and P. van Loock, Rev. Mod. Phys. 77, 513 (2005).
[5] T. Opatrny et al., Phys. Rev. A 61, 032302 (2000).
[6] P. T. Cochrane et al., Phys. Rev. A 65, 062306 (2002).
[7] S. Olivares et al., Phys. Rev. A 67, 032314 (2003); S. Olivares and M. G. A. Paris, Las. Phys. 16, 1533 (2006).
[8] N. J. Cerf et al., Phys. Rev. Lett. 95, 070501 (2005).
[9] J. Wenger et al., Phys. Rev. Lett. 92, 153601 (2004); A. Ourjoumtsev et al., Science 312, 83 (2006).
[10] M. M. Wolf et al., Phys. Rev. Lett. 96, 080502 (2006).
[11] M. M. Wolf et al., Phys. Rev. Lett. 98, 130501 (2007).
[12] A. S. Holevo and R. F. Werner, Phys. Rev. A 63, 032312 (2001).
[13] L. M. Duan et al., Phys. Rev. Lett. 84, 4002 (2000); R. F. Werner and M. M. Wolf, Phys. Rev. Lett. 86, 3658 (2001).
[14] K. E. Cahill and R. J. Glauber, Phys. Rev. 177, 1882 (1969).
[15] P. Marian and T. Marian, Phys. Rev. A 47, 4474 (1993).
[16] K. Zyczkowski and M. Kus, J. Phys. A 27, 4235 (1994).
[17] K. Zyczkowski, P. Horodecki, A. Sanpera and M. Lewenstein, Phys. Rev. A 58, 883 (1998).
[18] S. Olivares and M. G. A. Paris, J. Opt. B 7, S392 (2005).

FIG. 4: Non-Gaussianity of a Schrödinger cat-like state as a function of the superposition parameter φ, with either C obtained by numerical minimization (solid) or with C = Tr[aϱ] (dotted). (Left): α = 0.5; (Right): α = 5.

ABSTRACT We address the issue of quantifying the non-Gaussian character of a bosonic quantum state and introduce a non-Gaussianity measure based on the Hilbert-Schmidt distance between the state under examination and a reference Gaussian state. We analyze in detail the properties of the proposed measure and exploit it to evaluate the non-Gaussianity of some relevant single- and multi-mode quantum states. The evolution of non-Gaussianity is also analyzed for quantum states undergoing the processes of Gaussification by loss and de-Gaussification by photon subtraction. The suggested measure is easily computable for any state of a bosonic system and allows one to define a corresponding measure for the non-Gaussian character of a quantum operation.

<|endoftext|><|startoftext|> Introduction

Recall that a Hadamard matrix A of order m is a {±1}-matrix of size m × m such that AA^T = mI_m, where T denotes the transpose and I_m the identity matrix. A skew-Hadamard matrix is a Hadamard matrix A such that A − I_m is a skew-symmetric matrix. We refer the reader to [1] for a survey of known results about skew-Hadamard matrices.

The construction of skew-Hadamard matrices is lagging considerably behind that for arbitrary Hadamard matrices. Our previous four notes, written more than 13 years ago, were motivated by the desire to improve this situation.
We constructed skew-Hadamard matrices of order m = 4n for the following 24 odd integers n:

[2]: 37, 43;
[3]: 67, 113, 127, 157, 163, 181, 241;
[4]: 39, 49, 65, 93, 121, 129, 133, 217, 219, 267;
[6]: 81, 103, 151, 169, 463.

At the time of publication, such matrices of these orders were not known to exist. Due to the manifold increase in computing power since that time, one can now make further progress.

In [6], we listed 45 odd integers n < 300 for which no skew-Hadamard matrix of order 4n was known at that time. (In the first edition of [1], Table 24.31 was incomplete.) The smallest of these n's was 47. The next one, 59, has been removed recently by Fletcher, Koukouvinos and Seberry [7]. In this note we shall remove the integers 47 and 97 from the mentioned list by constructing examples of skew-Hadamard matrices of orders 4 · 47 = 188 and 4 · 97 = 388. (We have constructed many examples, but we have saved and will present only a few of them.) Consequently, the revised list now consists of the 42 integers:

69, 89, 101, 107, 109, 119, 145, 149, 153, 167, 177, 179, 191, 193, 201, 205, 209, 213, 223, 225, 229, 233, 235, 239, 245, 247, 249, 251, 253, 257, 259, 261, 265, 269, 275, 277, 283, 285, 287, 289, 295, 299.

We construct our examples of skew-Hadamard matrices of orders 188 and 388 by first constructing suitable supplementary difference sets, and then using these sets to build four circulant blocks, which one plugs into the Goethals–Seidel array. The procedure used to find these supplementary difference sets is not new. I have used it in several papers during the last 15 years. It is described in my note [5].

The author was supported by an NSERC Discovery Grant. http://arxiv.org/abs/0704.0640v2

2. The case n = 47

We denote the additive group of integers modulo n by Z_n. In this section we set n = 47.
In the literature on Hadamard matrices it is customary to refer to difference families (DF) as supplementary difference sets (SDS) and to employ more elaborate and more informative notation by listing the order v of the underlying abelian group, the number of sets in the family as well as their cardinals, and also the parameter λ. We have constructed four suitable difference families in Z_n. The first two are the following.

Proposition 2.1. Define six subsets of Z_47:

X1 = {2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 25, 27, 30, 31, 33, 35, 37, 38, 39, 40, 42, 43, 44},
X2 = {1, 3, 6, 7, 8, 11, 13, 14, 15, 19, 20, 21, 24, 27, 30, 33, 39, 41, 43, 44, 45, 46},
X3 = {3, 6, 8, 10, 11, 12, 14, 20, 21, 23, 24, 25, 26, 27, 30, 31, 32, 34, 35, 41, 42, 45},
Y1 = {1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 17, 18, 19, 21, 23, 24, 25, 27, 28, 29, 30, 31, 35, 38, 41, 43, 44, 46},
Y2 = {3, 6, 7, 8, 10, 11, 12, 16, 22, 25, 26, 31, 32, 33, 34, 37, 39, 41, 42, 43, 44, 46},
Y3 = {3, 7, 12, 13, 15, 16, 18, 20, 21, 23, 25, 26, 27, 28, 32, 35, 38, 39, 42, 44, 45, 46}.

The triples {X1, X2, X3} and {Y1, Y2, Y3} are difference families, i.e., they are 3 − (47; 30, 22, 22; 39) supplementary difference sets in Z_47. The two families are not equivalent.

Proof. Use the computer to verify the claims. Note that the cardinals n_k = |X_k| = |Y_k| are indeed n_1 = 30 and n_2 = n_3 = 22. The parameter λ is 39, i.e., each nonzero integer in Z_n occurs 39 times in the list of differences created from the sets X_k and also from the Y_k.

The second claim can be verified in several ways. We used the following ad hoc method. We compare the lists of differences generated by the sets X1 and Y1. Each nonzero integer i ∈ Z_n occurs in one of these lists, say, µ_i times. The µ_i's take only three values: 18, 19 or 20. But the number of µ_i's equal to 18, 19 and 20 is 12, 26 and 8 for X1 and 14, 22 and 10 for Y1.
Hence X1 and Y1 are not equivalent under translations and automorphisms of the additive group Z_n. □

For any subset X ⊆ Z_n let a_X = (a_0, a_1, . . . , a_{n−1}) be the {±1}-row vector such that a_i = −1 iff i ∈ X. We denote by A_X the n × n circulant matrix having a_X as its first row.

Let X0 ⊆ Z_n be the Paley difference set (the set of nonzero squares in the finite field Z_n). Recall that X0 is of skew type, i.e., for nonzero i ∈ Z_n we have i ∈ X0 iff −i ∉ X0. Its cardinal is n_0 = |X0| = 23.

For simplicity, write A_k instead of A_{X_k} for k = 0, 1, 2, 3. We can now plug our matrices A_k into the Goethals–Seidel template to construct a skew-Hadamard matrix of order 188:

$\begin{bmatrix} A_0 & A_1R & A_2R & A_3R \\ -A_1R & A_0 & -A_3^TR & A_2^TR \\ -A_2R & A_3^TR & A_0 & -A_1^TR \\ -A_3R & -A_2^TR & A_1^TR & A_0 \end{bmatrix}$

As usual, R denotes the matrix having ones on the back-diagonal and all other entries zero.

Clearly, we can use the second difference family to construct another skew-Hadamard matrix of order 188. Both solutions have the same associated decomposition of 4n as a sum of four squares:

$4n = 188 = 13^2 + 3^2 + 3^2 + 1^2 = \sum_{k=0}^{3} (n - 2n_k)^2$.

The remaining two difference families have different parameters from the first two.

Proposition 2.2. Define six subsets of Z_47:

P1 = {0, 2, 4, 5, 9, 10, 12, 16, 17, 19, 21, 22, 23, 25, 27, 28, 35, 36, 37, 43, 46},
P2 = {0, 1, 2, 6, 8, 9, 11, 15, 16, 19, 25, 32, 33, 35, 36, 37, 38, 40, 44},
P3 = {1, 2, 3, 4, 5, 6, 7, 10, 11, 16, 18, 22, 24, 28, 31, 35, 38, 40, 43},
Q1 = {4, 5, 6, 8, 11, 12, 15, 20, 21, 23, 25, 26, 28, 29, 30, 31, 32, 36, 39, 41, 43},
Q2 = {1, 2, 5, 7, 13, 14, 21, 22, 24, 26, 31, 32, 35, 36, 37, 39, 40, 42, 46},
Q3 = {1, 2, 3, 4, 5, 9, 12, 18, 20, 21, 24, 25, 32, 34, 38, 39, 43, 44, 46}.

The triples {P1, P2, P3} and {Q1, Q2, Q3} are difference families, i.e., they are 3 − (47; 21, 19, 19; 24) supplementary difference sets in Z_47. These two families are not equivalent to each other or to the ones above.
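The computer verification invoked in the propositions, and the assembly of the Goethals–Seidel array, can be sketched as follows. This is an illustrative sketch rather than the author's program: the helper names (`is_sds`, `circulant`, `goethals_seidel`) are hypothetical, the block layout is the standard Goethals–Seidel arrangement, and the demonstration uses a toy family in Z_3 (with the Paley set {1} as the skew-type block) instead of the sets of order 47.

```python
from collections import Counter


def is_sds(sets, n):
    """Return lambda if every nonzero residue mod n occurs equally often
    among the differences i - j (i != j) within the sets, else None."""
    counts = Counter((i - j) % n for s in sets for i in s for j in s if i != j)
    values = {counts[r] for r in range(1, n)}
    return values.pop() if len(values) == 1 else None


def circulant(X, n):
    """n x n circulant whose first row has -1 exactly at the positions in X."""
    row = [-1 if i in X else 1 for i in range(n)]
    return [[row[(j - i) % n] for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]


def transpose(A):
    return [list(c) for c in zip(*A)]


def neg(A):
    return [[-x for x in r] for r in A]


def goethals_seidel(A0, A1, A2, A3):
    """Plug four circulant blocks into the Goethals-Seidel array."""
    n = len(A0)
    R = [[1 if i + j == n - 1 else 0 for j in range(n)] for i in range(n)]
    m = lambda A: matmul(A, R)             # A R
    t = lambda A: matmul(transpose(A), R)  # A^T R
    blocks = [
        [A0, m(A1), m(A2), m(A3)],
        [neg(m(A1)), A0, neg(t(A3)), t(A2)],
        [neg(m(A2)), t(A3), A0, neg(t(A1))],
        [neg(m(A3)), neg(t(A2)), t(A1), A0],
    ]
    return [sum((blk[i] for blk in brow), []) for brow in blocks for i in range(n)]


# Toy example in Z_3: X0 = {1} is the Paley set (skew type).
n = 3
toy = [{1}, {0}, {0}, set()]
H = goethals_seidel(*[circulant(X, n) for X in toy])
print(is_sds(toy, n))  # lambda = n0 + n1 + n2 + n3 - n
print(matmul(H, transpose(H))[0])
```

The same functions can be applied to the Paley set together with {X1, X2, X3} (or the other families) in Z_47; the result is only as reliable as the transcription of the sets.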
Just as the first two families, {P1, P2, P3} and {Q1, Q2, Q3} can be used to construct two more skew-Hadamard matrices of order 188. The associated decomposition into a sum of four squares is now different:

$188 = 9^2 + 9^2 + 5^2 + 1^2$.

3. The case n = 97

For the remainder of this note we set n = 97. Let G be the multiplicative group of the nonzero elements of Z_n, a cyclic group of order n − 1 = 96, and let H = 〈35〉 = {1, 35, 61} be its subgroup of order 3. We use the same enumeration of the 32 cosets α_i, 0 ≤ i ≤ 31, of H in G as in our computer program. Thus we impose the condition that α_{2i+1} = −1 · α_{2i} for 0 ≤ i ≤ 15. For even indices we have

α_0 = H, α_2 = 2H, α_4 = 3H, α_6 = 4H, α_8 = 5H, α_10 = 6H, α_12 = 7H, α_14 = 9H, α_16 = 10H, α_18 = 12H, α_20 = 13H, α_22 = 15H, α_24 = 18H, α_26 = 20H, α_28 = 23H, α_30 = 26H.

Next define four index sets:

J0 = {1, 2, 4, 6, 9, 11, 13, 14, 17, 18, 21, 23, 25, 27, 29, 30},
J1 = {1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 23, 27, 29},
J2 = {0, 1, 2, 5, 6, 12, 13, 15, 16, 20, 24, 25, 26, 29, 30, 31},
J3 = {0, 2, 3, 4, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 23, 28, 29}

and introduce the following four subsets of Z_n:

$U_k = \bigcup_{i \in J_k} \alpha_i$, k = 0, 1, 2, 3.

Their cardinals n_k = |U_k| = 3|J_k| are n_0 = n_2 = 48, n_1 = 39, n_3 = 51, and we set λ = n_0 + n_1 + n_2 + n_3 − n = 89. Observe that U_0 is of skew type, i.e., we have U_0 ∩ (−U_0) = ∅ and U_0 ∪ (−U_0) = Z_n \ {0}.

Proposition 3.1. The four subsets U_0, U_1, U_2, U_3 ⊂ Z_n form a difference family, i.e., they are 4 − (97; 48, 39, 48, 51; 89) supplementary difference sets in Z_97.

Proof. For r ∈ {1, 2, . . . , 96} let λ_k(r) denote the number of solutions of the congruence i − j ≡ r (mod 97) with {i, j} ⊆ U_k. It is easy to verify (by using a computer) that λ_0(r) + λ_1(r) + λ_2(r) + λ_3(r) = λ is valid for all such r. Hence the sets U_0, U_1, U_2, U_3 form a difference family in Z_n. □

Let A_k now denote the n × n circulant matrices A_{U_k}. The SDS-property implies that the {±1}-matrices A_0, . . . , A_3 satisfy the identity

$\sum_{k=0}^{3} A_k A_k^T = 4n I_n$.
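The coset enumeration and the sets U_k can be rebuilt with a few lines of code. The sketch below is an assumption-laden illustration, not the author's program: the names `ALPHA`, `J` and `U` are hypothetical, and the coset representatives and index sets are copied from the text above, so any transcription error there carries over.

```python
from collections import Counter

N = 97
H = {pow(35, e, N) for e in range(3)}  # subgroup generated by 35

# Representatives of the even-indexed cosets alpha_0, alpha_2, ..., alpha_30.
EVEN_REPS = [1, 2, 3, 4, 5, 6, 7, 9, 10, 12, 13, 15, 18, 20, 23, 26]


def coset(rep):
    return frozenset(rep * h % N for h in H)


ALPHA = []
for rep in EVEN_REPS:
    ALPHA.append(coset(rep))        # alpha_{2i}
    ALPHA.append(coset(-rep % N))   # alpha_{2i+1} = -alpha_{2i}

J = [
    {1, 2, 4, 6, 9, 11, 13, 14, 17, 18, 21, 23, 25, 27, 29, 30},
    {1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 23, 27, 29},
    {0, 1, 2, 5, 6, 12, 13, 15, 16, 20, 24, 25, 26, 29, 30, 31},
    {0, 2, 3, 4, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 23, 28, 29},
]

# U_k is the union of the cosets alpha_i with i in J_k.
U = [set().union(*(ALPHA[i] for i in Jk)) for Jk in J]

# Multiplicity of each nonzero residue among the differences within the U_k.
counts = Counter((i - j) % N for Uk in U for i in Uk for j in Uk if i != j)
multiplicities = sorted({counts[r] for r in range(1, N)})

print("cardinalities:", [len(Uk) for Uk in U])
print("distinct difference multiplicities:", multiplicities)
```

If the index sets are transcribed correctly, the last line should report the single value λ claimed in Proposition 3.1; replacing `J` by the index sets K_k gives the corresponding check for the second example.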
One can now plug the matrices A_k into the Goethals–Seidel template to obtain a Hadamard matrix A of order 4n = 388. Since U_0 is of skew type, A is also skew-Hadamard.

Our second example, B, is constructed in the same way by using the index sets:

K0 = {0, 3, 4, 7, 9, 11, 12, 14, 17, 19, 20, 22, 24, 27, 28, 30},
K1 = {4, 7, 8, 10, 12, 13, 14, 15, 17, 18, 20, 26, 27},
K2 = {0, 1, 2, 3, 6, 7, 8, 11, 12, 14, 20, 23, 24, 25, 28, 31},
K3 = {1, 2, 4, 7, 8, 9, 10, 12, 13, 19, 21, 23, 24, 25, 26, 27, 31},

with the corresponding subsets of Z_n:

$V_k = \bigcup_{i \in K_k} \alpha_i$, k = 0, 1, 2, 3,

with V_0 of skew type.

Proposition 3.2. The four subsets V_0, V_1, V_2, V_3 ⊂ Z_n form a difference family, i.e., they are 4 − (97; 48, 39, 48, 51; 89) supplementary difference sets in Z_97.

The two SDS's that we used to construct A and B are not equivalent. For instance, the sets U_1 and V_1 are not equivalent under translations and group automorphisms of Z_n. Since the two SDS's have the same parameters, they share the same decomposition of 4n into a sum of four squares:

$4n = 388 = 19^2 + 5^2 + 1^2 + 1^2 = \sum_{k=0}^{3} (n - 2n_k)^2$.

References

[1] C.J. Colbourn and J.H. Dinitz, Handbook of Combinatorial Designs, 2nd Edition, CRC Press, New York, 2006.
[2] D.Ž. Ðoković, Skew Hadamard matrices of order 4×37 and 4×43, J. Combinat. Theory, Series A, 61 (1992), 319–321.
[3] D.Ž. Ðoković, Construction of some new Hadamard matrices, Bull. Austral. Math. Soc. 45 (1992), 327–332.
[4] D.Ž. Ðoković, Ten new orders for Hadamard matrices of skew type, Univ. Beograd, Publ. Elektrotehn. Fak. Ser. Mat. 3 (1992), 47–59.
[5] D.Ž. Ðoković, Good matrices of orders 33, 35 and 127 exist, J. Comb. Math. Comb. Comp. 14 (1993), 145–152.
[6] D.Ž. Ðoković, Five new orders for Hadamard matrices of skew type, Australasian J. Comb. 10 (1994), 259–264.
[7] R.J. Fletcher, C. Koukouvinos and J. Seberry, New skew-Hadamard matrices of order 4 · 59 and new D-optimal designs of order 2 · 59, Discrete Math. 286 (2004), 251–253.
Department of Pure Mathematics, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
E-mail address: djokovic@uwaterloo.ca

ABSTRACT We construct several difference families on cyclic groups of orders 47 and 97, and use them to construct skew-Hadamard matrices of orders 188 and 388. Such difference families and matrices are constructed here for the first time. The matrices are constructed by using the Goethals-Seidel array.

<|endoftext|><|startoftext|> Quantum engineering of photon states with entangled atomic ensembles

D. Porras and J. I. Cirac
Max-Planck Institut für Quantenoptik, Hans-Kopfermann-Str. 1, Garching, D-85748, Germany
(Dated: November 4, 2018)

We propose and analyze a new method to produce single and entangled photons which does not require cavities. It relies on the collective enhancement of light emission as a consequence of the presence of entanglement in atomic ensembles. Light emission is triggered by a laser pulse, and therefore our scheme is deterministic. Furthermore, it allows one to produce a variety of photonic entangled states by first preparing certain atomic states using simple sequences of quantum gates. We analyze the feasibility of our scheme, and particularize it to: ions in linear traps, atoms in optical lattices, and in cells at room temperature.

The deterministic generation of collimated single and entangled photons is of crucial importance in Quantum Information, for instance in quantum cryptography [1], quantum computation [2], quantum lithography [3] or quantum interferometry [4, 5]. Most of the methods tested so far require high-Q cavities, something which is very demanding in practice [6, 7, 8, 9]. The engineering of quantum states in atomic systems is now possible thanks to the experimental progress made by the field of Atomic Physics in recent years.
In fact, with trapped ions it has already been possible to create so-called W [10] and GHZ [11] states of up to 8 ions. At the same time, scientists have been able to produce other kinds of entangled states [12] with atoms in optical lattices. Furthermore, with the advent of Rydberg techniques [13] it will soon be possible to create W-like states in that system or in atomic ensembles at room temperature. Apart from their fundamental interest, some of those states may have applications in precision spectroscopy [14, 15].

In this work we show that the ability to create those atomic states may have a strong impact in different subfields of quantum information, as it may lead to a very efficient way of creating certain kinds of entangled photonic states which are required in various applications. The main idea is to use a laser and an internal level configuration such that we can map the atomic state onto photonic states corresponding to modes propagating in a well-defined direction. Our scheme uses the well-known fact [16, 17] that, under certain circumstances, light scattering takes place predominantly in the forward direction due to an interference effect. In fact, this effect is the basis of one of the building blocks of the repeater scheme proposed in [18], and has been recently demonstrated in a series of experiments [19, 20, 21]. There, a single excitation is created in an atomic ensemble by detecting a photon emission in a certain direction. Then, the excitation is released in the forward direction by using a laser. Building on this fact, we propose to create certain kinds of excitations by using quantum gates or atomic interactions, which give rise to the desired entangled states when they are released using a laser, and which propagate in the desired direction due to the mentioned interference effect.

Let us consider a set of N atoms with (ground) hyperfine levels |g〉 and |s_{a,b}〉 (see Fig. 1 (a)).
We consider states of the form

$|\mathbf{k}_a^{(n_a)}, \mathbf{k}_b^{(n_b)}\rangle = \frac{1}{\sqrt{n_a!\, n_b!}} \left(\sigma^\dagger_{\mathbf{k}_a}\right)^{n_a} \left(\sigma^\dagger_{\mathbf{k}_b}\right)^{n_b} |0\rangle$, (1)

and linear combinations thereof. Here,

$|0\rangle = |g\rangle_1 \ldots |g\rangle_N, \qquad \sigma^\dagger_{\mathbf{k}_x} = \frac{1}{\sqrt{N}} \sum_{j} e^{-i \mathbf{k}_x \mathbf{r}_j^0}\, \sigma^\dagger_{x,j}, \quad x = a, b$, (2)

where σ†_{x,j} excites an atom from |g〉_j to |s_x〉_j, and r_j^0 are the equilibrium positions of the atoms. In the limit n_x ≪ N, Eq. (1) defines a set of orthonormal collective states with n_x atoms excited in |s_x〉 and linear momentum k_x. Those states can indeed be readily created using trapped ions or Rydberg techniques (see Appendix A1,3).

In order to release the photons, one sends a laser pulse of wavevector k_L which couples level |s_x〉 to some electronically excited ones |e_x〉, respectively. The large population of level |g〉, together with the initial entanglement (coherences) between the atoms, will now stimulate the emission of photons from the excited states to the level |g〉, which overall will produce the mapping between these states and the photonic states,

$|\mathbf{k}_a^{(n_a)}, \mathbf{k}_b^{(n_b)}\rangle \to |n_a\rangle_{\mathbf{k}_a + \mathbf{k}_L, \sigma_a} |n_b\rangle_{\mathbf{k}_b + \mathbf{k}_L, \sigma_b}$; (3)

that is, (1) is mapped to a Fock state of n_x photons with momenta k_x + k_L and polarization σ_x, where σ_x is the polarization of the light in each decay channel. Moreover, due to the linearity of this process, superpositions of states of the form (1) will be mapped onto superpositions of photonic states (3). For example, the atomic state $\frac{1}{\sqrt{2}}\left(|\mathbf{k}^{(1)}, \mathbf{q}^{(1)}\rangle + |\mathbf{q}^{(1)}, \mathbf{k}^{(1)}\rangle\right)$ will emit a pair of entangled photons in different directions. The mapping (3) is strictly valid under ideal conditions and in the limit N → ∞, and the directionality in the photon emission is directly connected to the momentum conservation which, in turn, is a consequence of the constructive interference in the field emitted by each atom. Thus, the crucial issue in our scheme is to determine how this mapping is modified in finite atomic ensembles under nonideal conditions.
In the following we analyze such questions in detail, concentrating on the simplest case in which we have a single excitation with momentum k_0 in |s_a〉 (i.e. our initial state is a W-like state) and thus we produce a single photon. We determine a function f(Ω), which is proportional to the probability density that the photon is emitted in the direction Ω. In general, f = f_coh + f_inc; that is, it is the sum of a coherent contribution and an incoherent one. The latter appears whenever the positions of the particles fluctuate. f_coh contains the forward scattering contribution, which is emitted in a cone with a width ∆Ω that decreases with the number of particles. f_inc, on the contrary, describes isotropic light emission; thus, even when the light emitted in ∆Ω is collected, the contribution f_inc leads to a limitation in the efficiency of the setup. To quantify the error probability, we define

$E = \frac{\int d\Omega\, f_{inc}(\Omega)}{\int d\Omega\, f(\Omega)}$. (4)

As long as the number of excited atoms is small, n_x ≪ N, this analysis can be easily generalized to the emission of states with many photons (1). One obtains that the overall error is bounded by 1 − (1 − E)^{n_a+n_b}.

The emission pattern can be obtained by studying the Heisenberg equations of motion of the field operators. The calculation involves the study of the decay of the atomic state under collective effects (see Appendix C). To simplify our analysis we ignore the dipole pattern, in which case we get:

$f(\Omega) = \frac{1}{N} \sum_{i,j=1}^{N} \langle e^{-i(k_L \mathbf{n}_\Omega - \mathbf{k}_L)(\mathbf{r}_i - \mathbf{r}_j)} \rangle\, e^{i \mathbf{k}_0 (\mathbf{r}_i^0 - \mathbf{r}_j^0)}$. (5)

Here r_j are the coordinate operators of the atoms, and thus Eq. (5) allows us to describe fluctuations in the position of the particles during the emission of light. In the following, we will show three different experimental set-ups where our scheme can be implemented. In order to analyze the performance in each of them, we first particularize the above formula to three different situations which are directly connected with those set-ups.
We will focus on the angular width of the forward-scattering cone, ∆Ω, which measures the collimation of the emitted photons, and the error probability, E, as figures of merit. Then we will introduce the possible implementations and will use those formulas to specify the conditions for them to operate correctly.

(i) Fixed atomic positions. In the case of a square lattice of particles trapped in 3D (see Fig. 1 (b)), the emission pattern is given by

$f_0(\Omega) = \frac{1}{N} \prod_{\alpha = x,y,z} \frac{\sin^2\left((k_L n_\Omega^\alpha - (\mathbf{k}_L + \mathbf{k}_0)^\alpha)\, d_0 N_\alpha / 2\right)}{\sin^2\left((k_L n_\Omega^\alpha - (\mathbf{k}_L + \mathbf{k}_0)^\alpha)\, d_0 / 2\right)}$, (6)

with N_α the number of atoms in each direction and d_0 the lattice spacing. f_0(Ω) has a series of diffraction peaks, which are reduced to a single one if d_0 < λ/2. In this regime, the emission is centered in a cone with n_Ω in the direction of k_L + k_0. Note that for simultaneous energy and momentum conservation the condition |k_L + k_0| = k_L has to be fulfilled. Since the positions of the atoms do not fluctuate, f_0 has only a coherent contribution (E = 0), and the only limitation for the efficiency of the setup is the width of the emission cone, which scales in 3D like ∆θ_3D ≈ 1/(N^{1/3} k_L d_0). In the case of a chain of atoms (1D) momentum is conserved only along the direction of the chain. Photon emission can still be directed efficiently along the axis of the chain, in a cone whose width scales like $\Delta\theta_{1D} \approx 1/\sqrt{N k_L d_0}$.

(ii) Fluctuating atomic positions. Let us consider a lattice of atoms at temperature T, trapped by independent harmonic potentials. The emission pattern is now the sum of two contributions,

$f_{coh}(\Omega) = f_0(\Omega)\, g_T(\Omega), \qquad f_{inc}(\Omega) = 1 - g_T(\Omega)$, (7)

where $g_T(\Omega) = e^{-((k_L \mathbf{n}_\Omega - \mathbf{k}_L)\, \boldsymbol{\xi}_T)^2}$, and ξ_T is the vector whose components are the size of the position fluctuations in each spatial direction, $(\xi_T^\alpha)^2 = (x_0^\alpha)^2 (1 + 2 n_T^\alpha)$, with $x_0^\alpha$ the size of the ground state in the harmonic potential and $n_T^\alpha$ the number of motional quanta at T. Light scattered into f_inc represents an important fraction whenever $\xi_T^\alpha \gg d_0$.
In this case, the emission of light is centered around k_L, since the uncertainty in the position of the particles averages out the initial linear momentum k_0. The scaling of E in this regime strongly depends on the dimensionality of the system. In particular, in the case of a chain of atoms, E_1D = d_0/λ, whereas in the square 3D lattice, we get E_3D ≈ 12.6 (d_0/λ)^2 N^{−1/3}.

(iii) Statistical distribution of particles. Consider an ensemble of atoms (see Fig. 1 (c)), which move inside a square box of size L, such that their motion is faster than their radiative decay, that is, their average velocity v is such that v/L ≫ Γ, with Γ the emission rate. This situation can be described by assuming that the atoms are in a statistical distribution with equal probability to be at any point in the box. The situation is thus similar to that of a thermal state,

$f_{coh}(\Omega) = N g_{box}(\Omega), \qquad f_{inc}(\Omega) = 1 - g_{box}(\Omega)$, (8)

where

$g_{box}(\Omega) = \prod_{\alpha = x,y,z} \mathrm{sinc}^2\left(L (k_L n_\Omega^\alpha - k_L^\alpha)\right)$. (9)

Defining the average distance between particles as d_0 = L N^{−1/3}, we find the same scaling of ∆θ as in case (i), and of E as in case (ii).

Trapping schemes for atomic ensembles are simpler to realize but face the difficulty that the conditions for the directionality of photon emission are more stringent. In the case of a lattice of particles at fixed positions, forward scattering is ensured whenever the condition d_0 < λ/2 is fulfilled. On the contrary, in the case of atomic ensembles, the incoherent contribution f_inc has to be small enough such that E ≪ 1, which implies d_0 ≪ λ in 1D, or, alternatively, a number of particles large enough in 3D.

FIG. 1: (a) Level configuration for the release of atomic entangled states in photonic channels. (b) Release of a collective state with linear momentum k_0, that has been generated in a lattice of atoms. (c) Emission of photons from an atomic ensemble, which consists of an incoherent contribution (isotropic), and a coherent one in the forward-scattering direction.
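The role of the condition d_0 < λ/2 in case (i) can be illustrated numerically for a chain. The sketch below makes several assumptions that are not fixed by the text: the chain lies along the laser axis, positions are frozen, the dipole pattern is ignored, f is normalized with a prefactor 1/N as in Eq. (5), lengths are in units of the wavelength, and the function name `emission_pattern` is hypothetical.

```python
import cmath
import math


def emission_pattern(theta, positions, k=2 * math.pi, k0z=0.0):
    """f(theta) for atoms at fixed positions z_j on the laser axis.

    theta is the emission angle measured from the chain axis, k the laser
    wavenumber and k0z the momentum of the stored excitation along the axis.
    For fixed positions the double sum in Eq. (5) factorizes into a squared
    modulus of a single sum.
    """
    q = k * math.cos(theta) - k - k0z
    amplitude = sum(cmath.exp(-1j * q * z) for z in positions)
    return abs(amplitude) ** 2 / len(positions)


N = 30
dense = [0.40 * j for j in range(N)]   # spacing below lambda/2
sparse = [0.75 * j for j in range(N)]  # spacing above lambda/2

angles = [0.6 + i * (math.pi - 0.6) / 2000 for i in range(2001)]
print("forward peak (dense):", emission_pattern(0.0, dense))
print("largest value away from forward cone (dense):",
      max(emission_pattern(t, dense) for t in angles))
print("second Bragg peak (sparse):",
      emission_pattern(math.acos(-1.0 / 3.0), sparse))
```

For the dense chain the only full-height peak is the forward one, whereas the sparse chain shows a second diffraction peak of the same height, which is the situation the requirement d_0 < λ/2 is meant to exclude. Because `positions` is an arbitrary list, the same function can be used for unequally spaced chains.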
Now we introduce three experimental set-ups where our scheme can be implemented. In the Appendix we show how to create the atomic states that we are considering here.

Trapped ions. This system is ideally suited to create collective states like (1), as was demonstrated recently in ref. [10]. Most usually ions are arranged in chains, such that we deal with the 1D situation discussed above. Even though trapped ions are not equally spaced, under the condition d̄_0 < λ/2, with d̄_0 the average distance, we still get light emission in the forward-scattering cone only, see Fig. 2. Considering two different internal levels, which can correspond to different states in a hyperfine multiplet, states such as those defined by Eq. (1) can be created by a number of quantum operations that scales linearly with the number of ions N (see A1). For example, the state $\frac{1}{\sqrt{2}}\left(|0, 2\mathbf{k}_L\rangle + |2\mathbf{k}_L, 0\rangle\right)$ would emit two photons in the forward and backward directions along the chain axis, entangled in polarization. The main difficulty for the implementation of this idea with ions lies in the fact that ion-ion distances are usually in the range of a few µm, and thus the condition d_0 < λ/2 is not fulfilled when considering optical wavelengths. A way out of this problem is to use optical transitions which lie in the range of λ ≳ 5 µm, which can be found in ions such as Hg+, Ba+, or Yb+ [22].

Cold atoms in optical lattices. By using optical lattices we fulfill the need of placing atoms at interparticle distances comparable to optical wavelengths, since potential wells in a standing wave are indeed separated by d_0 = λ_sw/2, with λ_sw the wavelength of the counterpropagating lasers. By using an optical transition such that λ > λ_sw, we are in the regime in which light emission is focused into a single Bragg peak. Although one could think of performing quantum gates between ultracold neutral atoms to generate collective atomic states [23], this procedure faces the difficulties of quantum computation in this system, for example how to achieve single-atom addressability. More efficiently, one could avoid the use of quantum gates by using the dipole-blockade mechanism with Rydberg atoms, which allows us to generate W-states, as well as states which emit Fock states with a number M of photons [13] (see Appendix 3).

FIG. 2: Probability of photon emission from an ion chain with N = 30 ions initially in a W-state. The blue line corresponds to a chain with equally spaced ions with two diffraction peaks. Black and red lines correspond to an ion Coulomb chain, in which ions are in an overall trapping potential and thus are not equally spaced. However, in the case that the average distance d̄_0 is small enough, light is also preferentially emitted in the forward-scattering direction.

Atomic ensembles at room temperature. The very same techniques which can be applied to Rydberg atoms in an optical lattice can also be used in the case of hot ensembles. On the one hand, this setup has the advantage that atoms do not need to be cooled and placed in an optical lattice. On the other hand, it can be described by a statistical distribution of particles, and thus suffers from the fact that high efficiency in the release of photons is achieved under more severe conditions of particle density and atom number, as discussed above. However, densities which are high enough to fulfill the requirement E ≪ 1 have been recently reported in [24].

In conclusion, we have proposed to use current techniques for quantum engineering to generate atomic multipartite entangled states which can be efficiently mapped into photonic states. Our proposal relies on the release of spin-wave-like excitations into a given spatial direction by means of an interference effect, and can be implemented with trapped ions, atoms in optical lattices, and atomic ensembles at room temperature.

This work was supported by E.U.
projects (SCALA and CONQUEST), and the Deutsche Forschungsgemeinschaft.

APPENDIX A: CREATION OF COLLECTIVE ATOMIC STATES IN A CHAIN OF ATOMS

Entangled states of the form (1) and their linear combinations can be generated in a chain of particles, for example, of trapped ions, by means of a limited number of quantum operations. To demonstrate this, we first show that they can be written as Matrix Product States with a small bond dimension D, i.e. they can be written

$|\Psi\rangle = \sum_{i_1, \ldots, i_N} \langle \Phi_F | V^{i_N}_{[N]} \ldots V^{i_1}_{[1]} | \Phi_I \rangle\, |i_1\rangle \ldots |i_N\rangle$. (A1)

In (A1), the indices i_j = g, s_a, s_b, and the $V^{i_j}_{[j]}$ are D × D matrices acting on an auxiliary D-dimensional Hilbert space. D is given by the number of states which appear in the singular value decomposition (s.v.d.) of |Ψ〉 at any site in the chain [25]. As is shown in [26], the state (A1) can be prepared by performing N gates which act on [log_2 D] + 1 qubits. Thus, as long as D is independent of N, the number of gates to be applied scales linearly with the total number of atoms.

To evaluate D, consider first the case of a state like (1) with atoms excited in level s_a only, and a partition of the chain in two parts L and R. We get n_a + 1 states in the s.v.d. with respect to this partition, which correspond to states with a number of excited atoms in part L ranging from 0 to n_a. This result is easily generalized to a linear combination of M states of the form (1), in which case we get D = M(n_a + 1)(n_b + 1). For example, an entangled state of the form

$|\Psi\rangle = \frac{1}{\sqrt{2}} \left( |\mathbf{k}^{n_a=1}, \mathbf{q}^{n_b=1}\rangle + |\mathbf{q}^{n_a=1}, \mathbf{k}^{n_b=1}\rangle \right)$ (A2)

has D = 8.

APPENDIX B: QUANTUM STATE ENGINEERING WITH RYDBERG BLOCKADE

Interactions between excited atomic states, like those that take place in Rydberg atoms, can be used to create the states defined by Eq. (A2). This can be achieved in a single experimental step, without the need for quantum gates, if the proper configuration of atomic interactions is chosen. As an example, consider the 3 level configuration shown in Fig.
1 (a), and interactions be- tween excited states such that atoms in levels |sa〉, |sb〉, interact strongly only if they are in the same excited state, that is, Uaa = Ubb = U , but Uab = 0. We ap- ply two lasers with wavectors k1,2 and Rabi frequencies Ω1,2, detuned with respect to the |g〉 – |sa,b〉 transition, such that ∆1 = −∆2 = ∆. If condition ∆1,2 ≫ Ω1,2 is fulfilled, then the lasers induce a two–photon transi- tion with Rabi frequency Ωeff = Ω1Ω2/∆. Furthermore, if Ωeff ≪ U , states with two atoms in the same excited state are not populated. Under these conditions there are two possible excitation channels, depicted in Fig. 3, which give rise to the linear combination (A2). FIG. 3: Lasers and level configuration for the creation of atomic entangled states which emit pairs of photons entan- gled in polarization. APPENDIX C: CALCULATION OF THE PHOTON DISTRIBUTION We consider for simplicity the lambda configuration depicted in Fig. 1 (a), considering a single excited state |s〉, and a single auxiliary level |e〉. The interaction of the quantized electromagnetic field with the ensemble of atoms, after the adiabatic elimination of level |e〉, is de- scribed by j,k,λ jak,λe i(k−kL)rj+iωLt + h.c. gk,λ = (ǫkλ · dge) , (C1) σj refers to the |g〉 – |s〉 atomic transition, ΩL and kL are the Rabi frequency and wave–vector of the classical field, respectively, ωk is the photon energy, ǫk,λ are the polarization vectors, and dge is the dipole matrix element for the |g〉 – |e〉 transition. The probability of photon emission is proportional to the the diagonal elements of the one–photon density ma- trix, which are obtained from the Heisenberg equation of motion for the field operators, ak〉 = dτ1dτ2e −i(ωk−ωL)(τ1−τ2) 〈e−i(k−kL)(ri−rj)〉〈σ†i (τ1)σj(τ2)〉. (C2) Since we are interested in the conditions for momentum conservation due to interference effects, we consider the following atomic initial state, |k0〉 = σ†k0 |0〉, σ e−ik0r j . 
(C3) The emission pattern depends thus on the two–time atomic correlation function, which in turns can be eval- uated by means of a master equation which describes the decay of the atomic levels. In the case of the ini- tial atomic state k0 (C3), fixed atom positions, and ne- glecting boundary effects, this correlation function can be evaluated exactly, 〈σ†i (τ1)σj(τ2)〉 = e −Γk(τ1+τ2)/2eik0(r ), (C4) where we have neglected an energy shift due to dipole– dipole interactions. Integrating (C2) over the absolute value of k yields the probability of photon emission, I(Ω) = Ī(Ω)f(Ω). (C5) Ī(Ω) is the dipole pattern, Ī(Ω) = 1− (negnΩ)2 , (C6) where Γ is the single atom radiative decay rate, Γk0 is the collective decay rate, neg is the unit vector of the atomic transition, and nΩ is a unit vector in the direc- tion defined by the solid angle Ω. The factor f(Ω) in I(Ω) describes the interference between the emission from dif- ferent atoms, and is given by Eq. (5). Below we deduce the master equation which leads to (C4) and we sketch its solution in the case of collective states with a single excited atom. APPENDIX D: MASTER EQUATION The master equation for the reduced density matrix of the internal levels, which describes the ratiative decay of a set of atoms under the coupling to the quantized radiation field given by Eq. (C1), is ∂tρ = 2 σiρσ j − σ i σjρ− ρσ Gij [σ i σj , ρ], (D1) where the coupling constants depend on Jij = dτgij(τ)e −iωLτ = ei(ωk−ωL)τ+i(k−kL)r,(D2) in the following way: ℜ(Jij) = Γij , ℑ(Jij) = Gij . (D3) The master equation (D1) can be solved for the particular case of an initial state (C3) by noticing that the evolu- tion of ρ is closed within the subspace spanned by the states |k0〉, |0〉. This fact can be easily proved by direct substitution of ρ(0) = |k0〉〈k0| in Eq. 
(D1), which yields the following evolution for the atomic density matrix: ρ(t) = e−Γk0 t|k0〉〈k0|+ 1− e−Γk0 t |0〉〈0|, (D4) where the collective decay rate Γk0 is just the Fourier transform of the coupling constants in the master equa- tion, Γk0 = Γi,je ik0(r j). (D5) A similar result holds for nondiagonal elements of ρ(t), which together with the quantum regression theorem yields the evolution of the atomic correlation function (C4). [1] Nicolas Gisin, Grégoire Ribordy, Wolfgang Tittel, and Hugo Zbinden, Rev. Mod. Phys. 74, 145 (2002). [2] E. Knill, R. Laflamme, and G. J. Milburn. Nature 409, 46 (2001). [3] A.N. Boto et al.. 5to Beat the Diffraction Limit. Phys. Rev. Lett. 85, 2733 (2000). [4] J.J. Bollinger, W.M. Itano, D.J. Wineland, and D.J. Heinzen, Phys. Rev. A 54, R4649 (1996). [5] V. Giovannetti, S. Lloyd, and L. Maccone. Science 306 1330 (2004). [6] P. Michler et al.. Science 290, 2282–2285 (2000). [7] A. Kuhn, M. Heinrich, and G. Rempe. Phys. Rev. Lett. 89, 067901 (2002). [8] K. Keller et al.. Nature 431, 1075–1078 (2004). [9] J. McKeever et al.. Science 303, 1992 (2004). [10] H. Häffner et al.. Nature 438, 643 (2004). [11] D. Leibfried et al.. Nature 438, 639 (2004). [12] O. Mandel et al.. Nature 425, 937 (2003). [13] M. D. Lukin et al.. Phys. Rev. Lett. 87, 037901 (2001). [14] D.J. Wineland et al.. Phys. Rev. A 46, R6797 (1992). [15] C.F. Roos et al.. Nature 443, 316 (2006). [16] J.D. Jackson. Classical Electrodynamics. Wiley, New York (1962). [17] M.O. Scully, E.S. Fry, C. H. Raymond Ooi, and K. Wd- kiewicz. Phys. Rev. Lett. 96, 010501 (2006). [18] L–M. Duan, M.D. Lukin, J.I. Cirac, and P. Zoller, Nature 41, 413 (2001). [19] C.W. Chou et al.. Nature 438, 828 (2005). [20] Nature 438, 833 (2005). [21] M. D. Eisaman et al.. Nature 438, 837 (2005). [22] For example λ(2D3/2 - 2P1/2) = 10.8 µm in Hg +, or λ(2D3/2 - 2D5/2) = 12.5 µm in Ba [23] D. Jaksch et al.. Phys. Rev. Lett. 85, 2208 (2000). [24] R. Heidemann et al.. Preprint at (2007). [25] G. Vidal. Phys. 
Rev. Lett. 91, 147902 (2003). [26] C. Schön et al., Phys. Rev. Lett. 95, 110503 (2005). http://arxiv.org/abs/quant-ph/0701120 ABSTRACT We propose and analyze a new method to produce single and entangled photons which does not require cavities. It relies on the collective enhancement of light emission as a consequence of the presence of entanglement in atomic ensembles. Light emission is triggered by a laser pulse, and therefore our scheme is deterministic. Furthermore, it allows one to produce a variety of photonic entangled states by first preparing certain atomic states using simple sequences of quantum gates. We analyze the feasibility of our scheme, and particularize it to: ions in linear traps, atoms in optical lattices, and in cells at room temperature. <|endoftext|><|startoftext|> Direct photons and dileptons via color dipoles

B. Z. Kopeliovich,1,2 A. H. Rezaeian,1 H. J. Pirner,3 and Iván Schmidt1
1 Departamento de Física y Centro de Estudios Subatómicos, Universidad Técnica Federico Santa María, Casilla 110-V, Valparaíso, Chile
2 Joint Institute for Nuclear Research, Dubna, Russia
3 Institute for Theoretical Physics, University of Heidelberg, Philosophenweg 19, D-69120 Heidelberg, Germany
(Dated: November 4, 2018)

Drell-Yan dilepton pair production and inclusive direct photon production can be described within a unified framework in the color dipole approach. The inclusion of non-perturbative primordial transverse momenta and DGLAP evolution is studied. We successfully describe data for dilepton spectra from 800-GeV pp collisions, inclusive direct photon spectra for pp collisions at RHIC energies $\sqrt{s}=200$ GeV, and for $p\bar{p}$ collisions at Tevatron energies $\sqrt{s}=1.8$ TeV, in a formalism that is free from any extra parameters.

PACS numbers: 13.85.Qk, 13.60.Hb, 13.85.Lg

I.
INTRODUCTION

Massive lepton pair production and inclusive direct photon production in hadronic collisions have historically provided an important tool to gain access to parton distributions in hadrons. Moreover, direct photons, i.e. photons not from hadronic decay, can also be a powerful probe of the initial state of matter created in heavy ion collisions, since they interact with the medium only electromagnetically and therefore provide a baseline for the interpretation of jet-quenching models.

In the parton model, the Feynman diagrams for partonic subprocesses that are present in Drell-Yan (DY) lepton pair production and in inclusive direct photon production are different, and the connection between both production mechanisms within a unique approximation scheme is not obvious. Since in the target rest frame the DY process looks like bremsstrahlung of a virtual photon decaying into a lepton pair, we will show that the color dipole formalism defined in this frame is well suited to describe both production processes in a unified framework free of parameters. As an illustrative example, we study dilepton spectra in 800-GeV pp collisions from the E866 experiment [1], inclusive direct-photon spectra in pp collisions at $\sqrt{s}=200$ GeV from the PHENIX experiment [2], and in $p\bar{p}$ collisions at $\sqrt{s}=1.8$ TeV from the CDF experiment [3].

There have already been some attempts to describe the DY transverse momentum distribution in the color dipole approach [4], but unfortunately the experimental data used for comparison are not fully within the kinematic range of validity of the model. Here we confront the dipole approach with experimental data in a region where the model is expected to be valid. Furthermore, we will also study the inclusion of both non-perturbative primordial transverse momenta and DGLAP evolution.
Despite many years of intense studies, a satisfactory description of all existing inclusive direct photon production data in hadronic collisions, based on perturbative QCD (pQCD) calculations, remains elusive [5]. This letter is an alternative attempt. We show that the color dipole approach can successfully describe inclusive photon production in hadron-hadron collisions.

II. COLOR DIPOLE FORMALISM

The color dipole formalism, developed in [6] for the case of the total and diffractive cross sections, can also be applied to radiation [7]. Although in the process of electromagnetic bremsstrahlung by a quark no dipole participates, the cross section can be expressed via the more elementary cross section σqq̄ of interaction of a q̄q dipole. Nevertheless, this is a fake, or effective, dipole. Similar to a real dipole, where color screening is provided by interactions with either the quark or the antiquark, in the case of radiation the two amplitudes for radiation prior to or after the interaction screen each other, leading to cancellation of the infrared divergences [7].

The transverse momentum pT distribution of photon bremsstrahlung in quark-nucleon interactions, integrated over the final quark transverse momentum, was derived in [8] in terms of the dipole formalism,

$\frac{d\sigma^{qN}(q\to q\gamma)}{d(\ln\alpha)\,d^2\vec p_T} = \frac{1}{(2\pi)^2}\sum_{in,f}\int d^2\vec r_1\, d^2\vec r_2\; e^{i\vec p_T\cdot(\vec r_1-\vec r_2)}\; \phi^{\star T,L}_{\gamma q}(\alpha,\vec r_1)\,\phi^{T,L}_{\gamma q}(\alpha,\vec r_2)\,\Sigma_\gamma(x,\vec r_1,\vec r_2,\alpha)$, (1)

where

$\Sigma_\gamma(x,\vec r_1,\vec r_2,\alpha) = \frac{1}{2}\left\{\sigma_{q\bar q}(x,\alpha r_1) + \sigma_{q\bar q}(x,\alpha r_2)\right\} - \frac{1}{2}\,\sigma_{q\bar q}(x,\alpha(\vec r_1-\vec r_2))$, (2)

and $\vec r_1$ and $\vec r_2$ are the quark-photon transverse separations in the two radiation amplitudes contributing to the cross section, Eq. (1), which correspondingly contains double-Fourier transformations. The parameter α is the relative fraction of the quark momentum carried by the photon, and is the same in both amplitudes, since the interaction does not change the sharing of longitudinal momentum.
The transverse displacement between the initial and final quarks is αr1 and αr2, respectively. Since the amplitude of quark interaction has a phase factor $\exp(i\vec b\cdot\vec p_T)$, where $\vec b$ is the impact parameter of the collision, the transverse displacement between the initial and final quarks leads to the color screening factor $1-\exp(i\alpha\,\vec r\cdot\vec p_T)$. In Eq. (1) T stands for transverse and L for longitudinal photons. The energy dependence of the dipole cross section, which comes via the variable x = 2p1·q/s, where p1 is the projectile four-momentum and q is the four-momentum of the dilepton, is generated by additional radiation of gluons, which can be resummed in the leading ln(1/x) approximation.

In Eq. (1) the light-cone (LC) wavefunction of the projectile quark γq fluctuation has been decomposed into transverse $\phi^{T}_{\gamma q}(\alpha,\vec r)$ and longitudinal $\phi^{L}_{\gamma q}(\alpha,\vec r)$ components, and an average over the initial quark polarization and a sum over all final polarization states of quark and photon is performed. These LC wavefunction components $\phi^{T,L}_{\gamma q}(\alpha,\vec r)$ can be represented at the lowest order as

$\sum_{in,f}\phi^{T\star}_{\gamma q}(\alpha,\vec r_1)\,\phi^{T}_{\gamma q}(\alpha,\vec r_2) = \frac{\alpha_{em}}{2\pi^2}\left\{ m_q^2\alpha^4 K_0(\epsilon r_1)K_0(\epsilon r_2) + [1+(1-\alpha)^2]\,\epsilon^2\,\frac{\vec r_1\cdot\vec r_2}{r_1 r_2}\,K_1(\epsilon r_1)K_1(\epsilon r_2)\right\}$,

$\sum_{in,f}\phi^{L\star}_{\gamma q}(\alpha,\vec r_1)\,\phi^{L}_{\gamma q}(\alpha,\vec r_2) = \frac{\alpha_{em}}{\pi^2}\,M^2(1-\alpha)^2\,K_0(\epsilon r_1)K_0(\epsilon r_2)$, (3)

in terms of the transverse separation $\vec r$ between photon γ and quark q and the relative fraction α of the quark momentum carried by the photon. Here K0,1(x) denotes the modified Bessel function of the second kind. We have also introduced the auxiliary variable $\epsilon^2 = \alpha^2 m_q^2 + (1-\alpha)M^2$, where M denotes the mass of the dilepton and mq is an effective quark mass which can be conceived as a cutoff regularization. This quark mass has less influence on dilepton production in pp collisions, although it is a numerically important parameter for direct photon production, where M = 0. In general the quark mass mq should not be considered an extra parameter.
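As a quick numerical illustration of the auxiliary variable ε² = α² mq² + (1−α)M² introduced above, the following Python sketch compares the dilepton case with the direct photon case. It is only a sketch: the function names `epsilon` and `typical_separation_fm` are ours, and the conversion constant ħc ≈ 0.1973 GeV fm is a standard value that is not quoted in the text. For M = 0 the variable reduces to ε = α mq, so the typical quark-photon separation 1/ε grows without bound as mq → 0, which is why mq acts as a cutoff there.

```python
import math

# hbar*c in GeV*fm; a standard conversion constant, assumed here (not from the text)
HBARC_GEV_FM = 0.1973


def epsilon(alpha, m_q, mass):
    """Return eps = sqrt(alpha^2 m_q^2 + (1 - alpha) M^2) in GeV."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return math.sqrt(alpha**2 * m_q**2 + (1.0 - alpha) * mass**2)


def typical_separation_fm(alpha, m_q, mass):
    """Typical transverse separation 1/eps, converted from GeV^-1 to fm."""
    return HBARC_GEV_FM / epsilon(alpha, m_q, mass)


# Effective quark mass used in the text (GeV); alpha = 0.5 is an arbitrary example.
M_Q = 0.2
for mass in (4.8, 0.0):
    sep = typical_separation_fm(0.5, M_Q, mass)
    print(f"M = {mass} GeV: 1/eps = {sep:.3f} fm")
```

For a massive dilepton the separation is set mainly by M, which is consistent with the weak dependence on mq stated above; for M = 0 it is set entirely by α mq.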
Depending on the kinematical variable M, the Feynman variable xF and the square of the center-of-mass energy of the colliding hadrons s, there always exists a range of values of mq where the result does not depend on the specific mq value. For direct photons, M = 0, mq cannot be zero since the wave function becomes divergent. In this paper, as in Refs. [8, 9], we take mq = 0.2 GeV for both dilepton and direct photon production. Notice also that mq is a more important parameter in proton-nucleus collisions, where a value of mq = 0.2 GeV is needed in order to describe the nuclear shadowing effect [10].

In order to obtain the hadron cross section from the elementary partonic cross section Eq. (1), one should sum up the contributions from quarks and antiquarks weighted with the corresponding parton distribution functions (PDFs) [8, 9],

$\frac{d\sigma^{DY}(pp\to\gamma^\star X)}{dM^2\,dx_F\,d^2\vec p_T} = \frac{\alpha_{em}}{3\pi M^2}\,\frac{x_1}{x_1+x_2}\int_{x_1}^{1}\frac{d\alpha}{\alpha^2}\sum_f Z_f^2\left\{q_f\!\left(\frac{x_1}{\alpha},Q\right)+\bar q_f\!\left(\frac{x_1}{\alpha},Q\right)\right\}\frac{d\sigma^{qN}(q\to q\gamma^\star)}{d(\ln\alpha)\,d^2\vec p_T}$

$\qquad = \frac{\alpha_{em}}{3\pi M^2(x_1+x_2)}\int_{x_1}^{1}\frac{d\alpha}{\alpha}\,F_2^p\!\left(\frac{x_1}{\alpha},Q\right)\frac{d\sigma^{qN}(q\to q\gamma^\star)}{d(\ln\alpha)\,d^2\vec p_T}$, (4)

$\frac{d\sigma^{\gamma}(pp\to\gamma X)}{dx_F\,d^2\vec p_T} = \frac{1}{x_1+x_2}\int_{x_1}^{1}\frac{d\alpha}{\alpha}\,F_2^p\!\left(\frac{x_1}{\alpha},Q\right)\frac{d\sigma^{qN}(q\to q\gamma)}{d(\ln\alpha)\,d^2\vec p_T}$. (5)

The PDFs of the projectile enter in a combination which can be written in terms of the proton structure function $F_2^p$. Notice that with our definitions the fractional quark charge Zf is not included in the LC wave function of Eq. (3), and that the factor αem in Eq. (4) accounts for the decay of the photon into the lepton pair. We use the standard notation for the kinematical variables: $x_1 = (\sqrt{x_F^2+4\tau}+x_F)/2$ denotes the momentum fraction that the photon carries away from the projectile hadron in the target frame, we define $x_2 = x_1 - x_F$, $x_F = 2p_L/\sqrt{s}$ is the Feynman variable and $\tau = (M^2+p_T^2)/s$, where pL and pT denote the longitudinal and transverse momentum components of the photon in the hadron-hadron center of mass frame, s is the center of mass energy squared of the colliding protons and M is the dilepton mass. We also need to identify the scale Q entering in the proton structure function in Eq.
(5), and relate the energy scale x of the dipole cross section entering in Eq. (2) to measurable variables. From our previous definition, and following previous works [9, 11], we have that x = x2. At zero transverse momentum, the dominant term in the LC wavefunction Eq. (3) is the one that contains the modified Bessel function K1(εr). This function decays exponentially at large values of the argument, so that the mean distances which numerically contribute are of order 1/ε. On the other hand, the minimal value of α is x1, and therefore the virtuality Q² which enters into the problem at zero transverse momentum is ∼ (1−x1)M². Thus the hard scale at which the projectile parton distribution is probed turns out to be $Q^2 = p_T^2 + (1-x_1)M^2$. Notice that in the previous studies, M² [9] and (1−x1)M² [11] were used for the scale Q². Nevertheless, these different choices for Q² bring less than about a 20% effect at small x2 values.

The dipole cross section is theoretically unknown, although several parametrizations have been proposed in the literature. For our purposes, here we consider two parametrizations, the saturation model of Golec-Biernat and Wüsthoff (GBW) [12] and the modified GBW coupled to DGLAP evolution (GBW-DGLAP) [13].

A. GBW model parametrization

In the GBW model [12] the dipole cross section is parametrized as

$\sigma_{q\bar q}(x,\vec r) = \sigma_0\left(1 - e^{-r^2/R_0^2}\right)$, (6)

where the parameters, fitted to DIS HERA data at small x, are given by σ0 = 23.03 mb and $R_0 = 0.4\ \mathrm{fm}\times(x/x_0)^{0.144}$, where $x_0 = 3.04\times 10^{-4}$. This parametrization gives a quite good description of DIS data at x < 0.01. A salient feature of the model is that for decreasing x, the dipole cross section saturates for smaller dipole sizes, and that at small r, as perturbative QCD implies, σ ∼ r² vanishes. This is the so-called color transparency phenomenon [6]. One of the obvious shortcomings of the GBW model is that it does not match QCD evolution (DGLAP) at large values of Q².
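The kinematic relations quoted above and the GBW parametrization are simple enough to evaluate directly. The Python sketch below is a minimal illustration, not the code used for the results of this paper. It assumes the standard GBW form σ0[1 − exp(−r²/R0²)] for Eq. (6); the function names are ours, and the value √s ≈ 38.8 GeV for an 800-GeV proton beam on a fixed proton target is our assumption (it is not quoted in the text).

```python
import math

SIGMA0_MB = 23.03    # GBW normalisation sigma_0 in mb (from the text)
X0 = 3.04e-4         # GBW reference value x_0 (from the text)
R0_REF_FM = 0.4      # R_0 at x = x_0, in fm (from the text)
R0_EXPONENT = 0.144  # power of (x / x_0) in R_0 (from the text)


def dy_kinematics(x_f, mass, p_t, sqrt_s):
    """Return (x1, x2, Q^2) from x_F, M, p_T and sqrt(s); all energies in GeV."""
    tau = (mass**2 + p_t**2) / sqrt_s**2
    x1 = 0.5 * (math.sqrt(x_f**2 + 4.0 * tau) + x_f)
    x2 = x1 - x_f
    q2 = p_t**2 + (1.0 - x1) * mass**2  # hard scale chosen in the text
    return x1, x2, q2


def gbw_r0_fm(x):
    """Saturation radius R_0(x) in fm."""
    return R0_REF_FM * (x / X0) ** R0_EXPONENT


def gbw_sigma_mb(x, r_fm):
    """GBW dipole cross section in mb for dipole size r (fm), assumed form of Eq. (6)."""
    # -expm1(-t) equals 1 - exp(-t) but stays accurate for very small t
    return -SIGMA0_MB * math.expm1(-((r_fm / gbw_r0_fm(x)) ** 2))


# Example: lowest-mass E866 bin of the text; sqrt_s = 38.8 GeV is our assumption.
x1, x2, q2 = dy_kinematics(x_f=0.63, mass=4.8, p_t=0.0, sqrt_s=38.8)
print(f"x1 = {x1:.4f}, x2 = {x2:.4f}, Q^2 = {q2:.2f} GeV^2")
for r_fm in (0.05, 0.2, 1.0):
    print(f"r = {r_fm} fm: sigma = {gbw_sigma_mb(x2, r_fm):.3f} mb")
```

The dipole cross section is evaluated at x = x2, as stated above. The printed values show the two limits described in the text: a cross section that grows like r² at small r and levels off at σ0 for large r.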
The mismatch of the GBW model with DGLAP evolution can be clearly seen in the energy dependence of $\sigma^{\gamma^\star p}_{tot}$ for Q² > 20 GeV², where the model predictions are below the data [12, 13].

B. GBW coupled to the DGLAP equation and dipole evolution

A modification of the GBW dipole parametrization model, Eq. (6), was proposed in Ref. [13],

$\sigma_{q\bar q}(x,\vec r) = \sigma_0\left(1 - \exp\left[-\frac{\pi^2 r^2\,\alpha_s(\mu^2)\,xg(x,\mu^2)}{3\sigma_0}\right]\right)$, (7)

where the scale μ² is related to the dipole size by

$\mu^2 = \frac{C}{r^2} + \mu_0^2$. (8)

Here the gluon density g(x, μ²) is evolved to the scale μ² with the leading order (LO) DGLAP equation [14]. Moreover, the quark contribution to the gluon density is neglected in the small x limit, and therefore

$\frac{\partial\, xg(x,\mu^2)}{\partial\ln\mu^2} = \frac{\alpha_s(\mu^2)}{2\pi}\int_x^1 dz\, P_{gg}(z)\,\frac{x}{z}\,g\!\left(\frac{x}{z},\mu^2\right)$, (9)

where Pgg(z) and αs(μ²) denote the QCD splitting function and coupling, respectively. The initial gluon density is taken at the scale Q0² = 1 GeV² in the form

$xg(x,\mu^2) = A_g\,x^{-\lambda_g}(1-x)^{5.6}$, (10)

where the parameters C = 0.26, μ0² = 0.52 GeV², Ag = 1.20 and λg = 0.28 are fixed from a fit to the DIS data for x < 0.01 and in a range of Q² between 0.1 and 500 GeV² [13]. We use the LO formula for the running of the strong coupling αs, with three flavors and for ΛQCD = 0.2 GeV. The dipole size determines the evolution scale μ² through Eq. (8). The evolution of the gluon density is performed numerically for every dipole size r during the integration of Eq. (1). Therefore, the DGLAP equation is now coupled to our master equations (4,5). It is important to stress that the GBW-DGLAP model preserves the successes of the GBW model at low Q² and its saturation property for large dipole sizes, while incorporating the evolution of the gluon density by modifying the small-r behaviour of the dipole cross section.

The proton structure function in Eqs. (4,5) is parametrized as

$F_2^p(x,Q) = A(x)\left[\frac{\ln(Q^2/\Lambda^2)}{\ln(Q_0^2/\Lambda^2)}\right]^{B(x)}\left[1+\frac{C(x)}{Q^2}\right]$, (11)

with Q0² = 20 GeV², Λ = 0.25 GeV, and the functions A(x), B(x) and C(x) are parametrized in terms of 17 parameters fitted to different experiments, and whose functional forms can be found in the Appendix of Ref. [15].
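To make the ingredients of Eqs. (7), (8) and (10) concrete, here is a minimal Python sketch. It is not the numerical procedure of this paper: by default the gluon density is frozen at its input form, Eq. (10), instead of being evolved with the LO DGLAP equation (9). The sketch assumes that Eq. (7) has the form of Ref. [13], σ0{1 − exp[−π² r² αs xg/(3σ0)]}, that Eq. (8) reads μ² = C/r² + μ0², and that the LO coupling is the textbook one-loop expression αs(μ²) = 4π/[β0 ln(μ²/Λ²)] with β0 = 11 − 2nf/3. The function names and the unit conversions (1 mb = 0.1 fm², ħc ≈ 0.1973 GeV fm) are ours, and σ0 for this model is not quoted in the text, so the example reuses the GBW value purely for illustration.

```python
import math

HBARC_GEV_FM = 0.1973                     # assumed standard conversion constant
MB_TO_GEV_M2 = 0.1 / HBARC_GEV_FM**2      # 1 mb = 0.1 fm^2, expressed in GeV^-2

C_PAR = 0.26       # from the text
MU0_SQ = 0.52      # GeV^2, from the text
A_G = 1.20         # from the text
LAMBDA_G = 0.28    # from the text
LAMBDA_QCD = 0.2   # GeV, from the text
N_FLAVORS = 3      # from the text


def mu2_of_r(r):
    """Assumed form of Eq. (8): mu^2 = C / r^2 + mu_0^2, with r in GeV^-1."""
    return C_PAR / r**2 + MU0_SQ


def alpha_s_lo(mu2):
    """One-loop running coupling (textbook LO formula, assumed here)."""
    beta0 = 11.0 - 2.0 * N_FLAVORS / 3.0
    return 4.0 * math.pi / (beta0 * math.log(mu2 / LAMBDA_QCD**2))


def xg_input(x):
    """Eq. (10): input gluon density x*g at the initial scale."""
    return A_G * x ** (-LAMBDA_G) * (1.0 - x) ** 5.6


def sigma_gbw_dglap(x, r, sigma0, xg=None):
    """Assumed form of Eq. (7); r in GeV^-1, sigma0 and the result in GeV^-2.

    xg is a callable (x, mu2) -> x*g(x, mu2). If omitted, the input density
    of Eq. (10) is used at every scale, i.e. no DGLAP evolution is applied.
    """
    mu2 = mu2_of_r(r)
    gluon = xg_input(x) if xg is None else xg(x, mu2)
    exponent = math.pi**2 * r**2 * alpha_s_lo(mu2) * gluon / (3.0 * sigma0)
    return -sigma0 * math.expm1(-exponent)


# Illustration only: sigma0 borrowed from the GBW fit (an assumption).
sigma0 = 23.03 * MB_TO_GEV_M2
for r in (0.1, 1.0, 5.0):
    ratio = sigma_gbw_dglap(0.01, r, sigma0) / sigma0
    print(f"r = {r} GeV^-1: mu^2 = {mu2_of_r(r):.2f} GeV^2, sigma/sigma0 = {ratio:.4f}")
```

A caller who has an evolved gluon density available can pass it through the `xg` argument; the frozen default only reproduces the structure of the formula, not the fitted model.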
The structure-function parametrization of Eq. (11) is only valid in the kinematic range of the data sets, which cover correlated regions in the ranges 3.5×10⁻⁵ < x < 0.85 and 0.2 < Q² < 5000 GeV².

III. NUMERICAL RESULTS

Before we proceed to present the results in the color dipole approach, some words regarding the validity of this formulation are in order. Although both valence and sea quarks in the projectile are taken into account through the proton structure function in Eqs. (4, 5), the color dipole picture accounts only for Pomeron exchange from the target, while ignoring its valence content. In terms of Regge phenomenology, this means that Reggeons are not taken into account, and as a consequence, the dipole approach predicts the same cross sections for both particle and antiparticle induced DY reactions. Therefore, in principle this approach is well suited for high-energy processes, i.e. small x2. The exact range of validity of the dipole approach is of course not known a priori, but there is evidence [9, 11] in its favor for values of x2 < 0.1. In our case, however, we use a parametrization of the dipole cross section fitted to DIS data for Bjorken-x < 0.01 and for energy scales Q² < 500 GeV². Given these restrictions, at present there are not many data for the DY cross section at low x2. Notice also that some data are integrated over xF and M, and are therefore contaminated by contributions not included in the color dipole approach.

We compare the present approach to data for 800-GeV pp collisions from E866 [1], which are not integrated over xF and M, and correspond to the lowest x2 values, i.e. lightest M and highest xF. We selected an xF bin where 0.55 < xF < 0.8, with an average value 〈xF〉 = 0.63. Within this bin we selected two bins with the lightest average values for M, one for 4.20 < Mμ+μ− < 5.20 GeV, with an average value of 〈Mμ+μ−〉 = 4.80 GeV, and the other for 5.20 < Mμ+μ− < 6.20 GeV, with an average value 〈Mμ+μ−〉 = 5.70 GeV. The experimental data are plotted, with errors, in Figs.
1 and 2. In our calculations we have taken the experimental average values for xF and M.

FIG. 1: The dilepton spectrum in 800-GeV pp collisions at xF = 0.63 and M = 5.7 GeV. We show the result of the GBW dipole model (dashed line) and the GBW-DGLAP model (dotted line). We also show the result when a constant primordial momentum 〈k0²〉 = 0.4 GeV² is incorporated within the GBW-DGLAP dipole model (solid line). Experimental data are from Ref. [1]. The E866 error bars are the linear sum of the statistical and systematic uncertainties. An additional ±6.5% uncertainty in the experimental data points due to the normalization is also common to all points.

FIG. 2: The same as Fig. 1, except for M = 4.8 GeV.

In Figs. 1, 2 we show the result obtained by the dipole approach, for both the GBW and the GBW-DGLAP dipole models. At low transverse momentum pT < 2 GeV both model predictions are almost identical, but at higher pT the dipole parametrization improved by DGLAP evolution bends down towards the experimental points, improving the result. This is more obvious for higher values of M. Notice that for the case of a smaller value of x2 with a lighter M, Fig. 2, where the dipole approach is better suited, the GBW model without inclusion of the DGLAP evolution already provides a good description of the data. We stress that the theoretical curves in Figs. 1, 2 are the results of a parameter-free calculation. As we already pointed out, varying the quark mass mq leaves the numerical results almost unaffected. Notice also that in contrast to the LO parton model, no K-factor was introduced, since the dipole parametrization fitted to DIS data already includes contributions from higher order perturbative corrections as well as non-perturbative effects contained in DIS data.
One data point that, surprisingly, is missed by our theoretical curves for both values of M is the one at the lowest pT. In the dipole approach the DY cross section is finite at pT = 0 due to the saturation of the dipole cross section, which is in striking contrast to the LO pQCD correction to the parton model, where one needs to resum the large logarithms ln(pT²/M²) from soft gluon radiation in order to obtain a physically sensible result at pT = 0 [16]. One of the possible reasons behind the lack of agreement between our result and the experimental data at pT → 0 may be a soft non-perturbative primordial transverse momentum distribution of the partons in the colliding protons. Such a primordial transverse momentum may have various non-perturbative origins, e.g. finite size effects of the hadron, instanton effects, pion-cloud contributions. Moreover, in the parton model it has been shown that even within the next-to-leading order (NLO) pQCD correction, experimental data of heavy quark pair production [17], direct photon production [18] and DY lepton pair production [19] can only be described if an average primordial momentum as large as 1 GeV is included (see also Ref. [20]). Such a large value for the initial transverse momentum strongly indicates its perturbative origin in the parton model, and in principle must have been already included in the pQCD correction. Therefore, in the pQCD approach, it is still an open question how to separate what is truly intrinsic and what is pQCD generated transverse momentum. However, in the dipole approach all perturbative and non-perturbative contributions, apart from the finite-size effect of hadrons, are already encoded into the cross section via fitting the dipole parameters to DIS data. Therefore, we expect that in the dipole approach the primordial momentum should have a purely non-perturbative origin, and be considerably less than in the parton model.
One may introduce an intrinsic momentum contribution in the following factorized form

$F(\vec p_T) \to \int d^2\vec k_T\, F(\vec p_T-\vec k_T)\,G_N(\vec k_T)$, (12)

where the function F denotes the cross section defined in Eqs. (4,5). We assume that the initial pT distribution GN(pT) has a Gaussian form,

$G_N(k_T) = \frac{1}{\pi\langle k_T^2\rangle_N}\exp\left(-\frac{k_T^2}{\langle k_T^2\rangle_N}\right)$, (13)

where 〈kT²〉N is the square of the two-dimensional width of the pT-distribution for an incoming quark, and also that 〈kT²〉N is a constant independent of the hard scale Q, since the pQCD radiation-generating transverse momenta are already taken into account in our approach. The differential cross section convoluted with the primordial momentum distribution in the GBW-DGLAP dipole model is shown in Figs. 1 and 2 with curves denoted GBW-DGLAP-Primordial. A value around 〈kT²〉N = 0.4 GeV² can describe the experimental points at low pT for both sets of data plotted in Figs. 1 and 2. This value, as we expected, is lower than the primordial momentum which has been used in the parton model.

The experimental data points for pT → 0 should be taken with some caution, since there exists some disagreement between different experiments for DY lepton pair production at low pT. Indeed, although the E772 and E866 measurements [1] agree well over a wide range of values, they disagree at pT → 0. Therefore, the discrepancy between our theoretical results and experimental data at pT → 0 might in fact be just an artifact of the experiments.

Next we calculate the inclusive direct photon spectra within the same framework. For direct photons we have M = 0, and we assume again a quark mass mq = 0.2 GeV. As illustrative examples we compare our results with the PHENIX and CDF experiments. Notice that the direct photon problem (with M = 0), compared to the massive virtual photon case (with M as big as ∼ 5 GeV), is numerically more involved since the integrand in Eq. (1) is divergent when mq → 0. In Fig.
3 we show the differential cross section obtained from the GBW and the GBW-DGLAP dipole models at midrapidity, for pp collisions at RHIC energy $\sqrt{s}=200$ GeV. The experimental data are from the PHENIX measurements for inclusive direct photon production at y = 0 [2]. We have also checked that incorporating the same transverse primordial momentum 〈kT²〉N = 0.4 GeV², which can describe the dilepton spectra at low pT, has in this case too small an effect to improve the results in the pT range of the experimental data. Without a physically sound guiding principle, however, the introduction of a higher value of intrinsic momentum is somewhat unsatisfactory and will not be further discussed here. Notice also that, in contrast to the parton model, we have not included any photon fragmentation function [21, 22, 23] for computing the cross section, since the dipole formulation already incorporates all perturbative (via Pomeron exchange) and non-perturbative radiation contributions. It has been shown that the NLO pQCD predictions [21, 23] are also consistent with the RHIC data within the uncertainties [2].

FIG. 3: Inclusive direct photon spectra obtained from the GBW and the GBW-DGLAP dipole models for midrapidity η = 0 at RHIC energy $\sqrt{s}=200$ GeV. Experimental data are from Ref. [2]. In the lower panel we use the GBW-DGLAP dipole model result for the theory. The error bars are the linear sum of the statistical and systematic uncertainties.

In Fig. 4 we show the dipole model predictions for inclusive prompt-photon production at midrapidity, and for CDF energy $\sqrt{s}=1.8$ TeV. The experimental points are taken from CDF data for inclusive isolated photons, averaged over |η| < 0.9 [3]. At lower transverse momentum pT < 30 GeV the GBW dipole model reproduces the experimental data fairly well, and at higher pT values DGLAP evolution significantly improves the results.
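The Gaussian smearing of Eqs. (12) and (13), with the width 〈kT²〉N = 0.4 GeV² used above, can be checked numerically. The Python sketch below is only an illustration under our own assumptions: it takes Eq. (13) to be the normalised Gaussian exp(−kT²/〈kT²〉N)/(π〈kT²〉N), and the function names, the polar-grid midpoint integration and the Gaussian toy spectrum standing in for F are ours (the real F of Eqs. (4,5) requires the full dipole calculation). For a Gaussian toy spectrum the convolution is known in closed form, since the squared widths add, which gives a simple check of the numerics.

```python
import math


def g_n(k_t, k2_mean):
    """Assumed form of Eq. (13): 2D Gaussian normalised to unity over d^2k_T."""
    return math.exp(-k_t**2 / k2_mean) / (math.pi * k2_mean)


def smear(f, p_t, k2_mean, n_k=400, n_phi=64, k_max_widths=6.0):
    """Eq. (12): integrate F(|p_T - k_T|) G_N(k_T) d^2k_T on a polar midpoint grid.

    f is a function of the magnitude of the transverse momentum only.
    The radial integral is cut off at k_max_widths * sqrt(k2_mean).
    """
    k_max = k_max_widths * math.sqrt(k2_mean)
    dk = k_max / n_k
    dphi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_k):
        k = (i + 0.5) * dk
        weight = g_n(k, k2_mean) * k * dk * dphi
        for j in range(n_phi):
            phi = (j + 0.5) * dphi
            # |p_T - k_T| with p_T chosen along the x axis
            q = math.hypot(p_t - k * math.cos(phi), k * math.sin(phi))
            total += weight * f(q)
    return total


def toy_spectrum(p_t, width2=1.0):
    """Toy stand-in for F: a normalised 2D Gaussian (purely illustrative)."""
    return math.exp(-p_t**2 / width2) / (math.pi * width2)


K2_MEAN = 0.4  # GeV^2, the value quoted in the text
for p_t in (0.0, 1.0, 2.0):
    smeared = smear(toy_spectrum, p_t, K2_MEAN)
    print(f"p_T = {p_t} GeV: F = {toy_spectrum(p_t):.5f}, smeared F = {smeared:.5f}")
```

With the toy spectrum, the smearing lowers the value at pT = 0 and raises the tail, which is the qualitative effect of a primordial momentum on a falling spectrum; how large the effect is for the actual cross sections depends on their shape.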
In the collider experiments at the Tevatron, in order to reject the overwhelming background of secondary photons which come from the decays of pions, isolation cuts are imposed [3]. These cuts affect the direct-photon cross section, in particular by reducing the fragmentation effects. Isolation conditions are not imposed in our calculation, although the experimental data are for isolated photons. However, it has been shown that the cross section does not vary by more than 10% under CDF isolation conditions and kinematics [24]. Therefore, the main source of uncertainty in our approach is due to the fact that the experimental points are averaged over rapidity and contaminated by Reggeon contributions which are ignored in the dipole approach. One should also notice that the parametrizations of the dipole cross section and proton structure function employed in our computation have been fitted to data at considerably lower Q² values (see previous section). The NLO pQCD calculation for direct photon production at the Tevatron energy was performed in Ref. [25]. A new independent measurement of direct photons at the Tevatron energy, which is in agreement with the previous CDF measurement [3], provided further evidence that the shape of the cross section as a function of pT cannot be fully described by the available NLO pQCD computation [26].

FIG. 4: Inclusive direct photon spectra obtained from the GBW and GBW-DGLAP dipole models for midrapidity at CDF energy $\sqrt{s}=1.8$ TeV. Experimental data are for inclusive isolated photons from the CDF experiment for $p\bar{p}$ collisions at CDF energy and |η| < 0.9 [3]. The NLO QCD curve is from the authors of reference [25] (given in table 3 of Ref. [26]) and uses the CTEQ5M parton distribution functions with all scales set to pT. In the lower panel we use the GBW-DGLAP dipole model result for the theory. The error bars are the linear sum of the statistical and systematic uncertainties.

In this letter, we showed that both direct photon production and DY dilepton pair production processes can be described within the same color dipole approach without any free parameters. In contrast to the parton model, in the dipole approach there is no ambiguity in defining the intrinsic transverse momentum. Such a purely non-perturbative primordial momentum improves the results in the case of dilepton pair production, but does not play a significant role for direct photon production in the given experimental range of pT. We also showed that the color dipole formulation coupled to the DGLAP evolution provides a better description of data at large transverse momentum compared to the GBW dipole model.

Acknowledgments

The authors would like to thank T. Isobe for providing the experimental data in Ref. [2]. This work was supported in part by Fondecyt (Chile) grants 1070517 and 1050519 and by DFG (Germany) grant PI182/3-1.

[1] J. C. Webb, FERMILAB-THESIS-2002-56, hep-ex/0301031.
[2] PHENIX Collaboration, Phys. Rev. Lett. 98, 012002 (2007).
[3] CDF Collaboration, Phys. Rev. Lett. 73, 2662 (1994); 74, 1891 (1995).
[4] M. A. Betemps, M. B. Gay Ducati, M. V. T. Machado and J. Raufeisen, Phys. Rev. D67, 114008 (2003).
[5] P. Aurenche, M. Fontannaz, J. Ph. Guillet, B. Kniehl, E. Pilon and M. Werlen, Eur. Phys. J. C9, 107 (1999).
[6] A. B. Zamolodchikov, B. Z. Kopeliovich and L. I. Lapidus, JETP Lett. 33, 595 (1981).
[7] B. Z. Kopeliovich, Proc. of the workshop Hirschegg '95: Dynamical Properties of Hadrons in Nuclear Matter, Hirschegg, January 16-21, 1995, ed. by H. Feldmeyer and W. Nörenberg, Darmstadt, 1995, p. 102 (hep-ph/9609385).
[8] B. Z. Kopeliovich, A. Schaefer and A. V. Tarasov, Phys. Rev. C59, 1609 (1999).
[9] B. Z.
Kopeliovich, J. Raufeisen and A. V. Tarasov, Phys. Lett. B503, 91 (2001).
[10] B. Z. Kopeliovich, J. Raufeisen and A. V. Tarasov, Phys. Rev. C62, 035204 (2000).
[11] J. Raufeisen, J.-C. Peng and G. C. Nayak, Phys. Rev. D66, 034024 (2002).
[12] K. Golec-Biernat and M. Wusthoff, Phys. Rev. D59, 014017 (1999); D60, 114023 (1999).
[13] J. Bartels, K. Golec-Biernat and H. Kowalski, Phys. Rev. D66, 014001 (2002).
[14] M. A. J. Botje, QCDNUM16: A fast QCD evolution, ZEUS Note 97-006, 1997.
[15] SMC Collaboration, Phys. Rev. D58, 112001 (1998).
[16] P. Chiappetta and H. J. Pirner, Nucl. Phys. B291, 765 (1987).
[17] M. N. Mangano, P. Nason, and G. Ridolfi, Nucl. Phys. B373, 295 (1992).
[18] Fermilab E706 Collaboration, Phys. Rev. Lett. 81, 2642 (1998); L. Apanasevich et al., Phys. Rev. D59, 074007 (1999).
[19] D. C. Hom et al., Phys. Rev. Lett. 37, 1374 (1976); D. M. Kaplan et al., Phys. Rev. Lett. 40, 435 (1978).
[20] X.-N. Wang, Phys. Rev. C61, 064910 (2000).
[21] L. E. Gordon and W. Vogelsang, Phys. Rev. D48, 3136 (1993); D50, 1901 (1994).
[22] E. L. Berger and J.-W. Qiu, Phys. Rev. D44, 2002 (1991).
[23] P. Aurenche et al., Phys. Lett. B140, 87 (1984); Nucl. Phys. B297, 661 (1988); H. Baer et al., Phys. Rev. D42, 61 (1990); Phys. Lett. B234, 127 (1990).
[24] S. Catani, M. Fontannaz, J. Ph. Guillet and E. Pilon, JHEP 0205, 028 (2002).
[25] M. Gluck, L. E. Gordon, E. Reya and W. Vogelsang, Phys. Rev. Lett. 73, 388 (1994).
[26] CDF Collaboration, Phys. Rev. D70, 074008 (2004).

ABSTRACT

Drell-Yan dilepton pair production and inclusive direct photon production can be described within a unified framework in the color dipole approach. The inclusion of non-perturbative primordial transverse momenta and DGLAP evolution is studied.
We successfully describe data for dilepton spectra from 800-GeV pp collisions, inclusive direct photon spectra for pp collisions at RHIC energies $\sqrt{s}=200$ GeV, and for $p\bar{p}$ collisions at Tevatron energies $\sqrt{s}=1.8$ TeV, in a formalism that is free from any extra parameters.
<|endoftext|><|startoftext|>
Submitted to The Astrophysical Journal

MAPPING THE YOUNGEST GALAXIES TO REDSHIFT ONE1,2

Yuko Kakazu, Lennox L. Cowie, Esther M. Hu

ABSTRACT

We describe the results of a narrow band search for ultra-strong emission line galaxies (USELs) with EW(Hβ) ≥ 30 Å. 542 candidate galaxies are found in a half square degree survey using two ∼ 100 Å filters centered at 8150 Å and 9140 Å with Subaru/SuprimeCam. Followup spectroscopy has been obtained for randomly selected objects in the candidate sample with KeckII/DEIMOS and has shown that they consist of [O III]λ5007, [O II]λ3727, and Hα selected strong-emission line galaxies at intermediate redshifts (z < 1), and Lyα emitting galaxies at high redshift (z >> 5). We determine the Hβ luminosity functions and the star formation density of the USELs, which is 5-10% of the value found from ultraviolet continuum objects at z = 0 − 1, suggesting that they correspond to a major epoch in the galaxy formation process at these redshifts. Many of the USELs show the temperature-sensitive [O III]λ4363 auroral lines and about a dozen have oxygen abundances satisfying the criteria of eXtremely Metal Poor Galaxies (XMPGs). These XMPGs are the most distant known today, and our high yield rate of XMPGs suggests that the narrowband method is powerful in finding such populations. Moreover, the lowest metallicity measured in our sample is 12+log(O/H)=7.06 (6.78−7.44), which is close to the minimum metallicity found in local galaxies, though we need deeper spectra to minimize the errors.
HST/ACS images of several USELs exhibit a wide range of morphologies, from relatively compact high surface brightness objects to very diffuse low surface brightness ones. The luminosities, metallicities and star formation rates of USELs are consistent with the strong emitters being start-up intermediate mass galaxies which will evolve into more normal galaxies, and suggest that galaxies are still forming in relatively chemically pristine sites at z < 1.

Subject headings: cosmology: observations; galaxies: distances and redshifts; galaxies: abundances; galaxies: evolution; galaxies: starburst

1. INTRODUCTION

The study of low-metallicity galaxies is of considerable interest for the clues that it can provide about the first stages of galaxy formation and chemical enrichment. We would also like to know if there are any genuinely young galaxies undergoing their first episodes of star formation at low redshifts. To date, the most metal-poor systems studied have been the blue compact emission-line galaxies found in the local Universe, with systems such as I Zw 18 and SBS 0335-052W defining the low metallicity boundary with measured 12+log (O/H) of ∼ 7.1 − 7.2 (Sargent & Searle 1970; Thuan & Izotov 2005; Izotov et al. 2005). More recently, the Sloan Digital Sky Survey (SDSS) has yielded additional extremely metal-poor galaxies (XMPGs) (12+log (O/H) < 7.65 or Z < Z⊙/12; Kniazev et al. 2003; Izotov et al. 2006a). Despite enormous efforts, only a few dozen such XMPGs are known, all at redshift z < 0.05 (e.g., Oey 2006; Izotov 2006b).

Historically, objective prism surveys have been used to select emission-line galaxies for low-metallicity studies (e.g., the Hamburg QSO Survey (Popescu et al. 1996) and its HSS sequel (Ugryumov et al. 1999), which discovered HS 2134+0400 (Pustilnik et al. 2006), and the Kitt Peak International Spectroscopy Survey (KISS; Salzer et al. 2000; Melbourne & Salzer 2002)). The advantage of using the objective prism technique rather than the continuum selection employed with the SDSS (Kniazev et al. 2003) or DEEP2 surveys (Hoyos et al. 2005) is that it has a higher efficiency and provides a more uniform selection. By comparison, continuum/broad-band surveys have a very low yield rate (8 new XMPGs and 4 recovered XMPGs from an analysis of 250,000 spectra over ∼3000 deg² for the SDSS; Kniazev et al. 2003), since low-metallicity populations in their first outburst have weak continua and strong emission lines.

1 Based in part on data obtained at the Subaru Telescope, which is operated by the National Astronomical Observatory of Japan.
2 Based in part on data obtained at the W. M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California, and NASA and was made possible by the generous financial support of the W. M. Keck Foundation.
3 Institute for Astronomy, University of Hawaii, 2680 Woodlawn Drive, Honolulu, HI 96822.

An alternative method of discovering strong emission-line, low-metallicity galaxies is to use narrowband surveys. Strong emission-line galaxies have historically been picked up in high-z Lyman alpha searches (e.g., Stockton & Ridgway 1998; Hu et al. 1998, 2004; Stern et al. 2000; Tran et al. 2004; Ajiki et al. 2006), where they have been considered contaminants. However, the low redshift emission line galaxies seen in these surveys are of great interest in their own right, as we shall show in the present paper. While some spectroscopic studies have been carried out for low-redshift galaxies selected from narrowband surveys (e.g., Maier et al. 2006; Ly et al. 2007), the small sample sizes have precluded any detailed investigation of metallicity and identification of a low-metallicity population.

The narrowband method probes to much deeper limits than the objective prism surveys.
This enables probing star-forming populations out to near redshift z ∼ 1, where the cosmic star formation rates are considerably higher. Furthermore, the narrow-band emission-line selection can allow us to assemble very large samples of strong-emission line objects, with a clean selection of different line types for the construction of luminosity functions. Such a sample allows us to address such questions as whether there are substantial populations of strong star-forming galaxies with low metallicities among more massive galaxies. There has been considerable controversy about the interpretation of the low metallicity measurements in the blue compact galaxy samples, where the ease with which gas may be ejected in these dwarf galaxies has complicated the picture (e.g., Corbin et al. 2006) or, at least, resulted in identifying low metallicity systems which are not forming their first generation of stars. The identification of low metallicity galaxies, at the level of the XMPGs, among massive galaxies can provide less ambiguous examples of galaxies that are genuinely 'young' and caught in the initial stages of star formation. Current efforts to identify low metallicity galaxies from continuum selected surveys (e.g., Kobulnicky et al. 2003; Lilly et al. 2003; Kobulnicky & Kewley 2004; Hoyos et al. 2005) have low-metallicity thresholds that are higher than this, about one-third solar (in O/H). With a narrow-band selection criterion, much larger emission-line samples including such low metallicity galaxies can be identified. With these large samples it is also possible to determine whether there is an observed lower metallicity threshold for such galaxies, and to estimate what the contribution of such strong star-formers might be at these epochs.

http://arxiv.org/abs/0704.0643v1

In the present work we use a number of deep, narrowband images obtained with the SuprimeCam mosaic CCD camera (Miyazaki et al.
2002) on the Subaru 8.2-m telescope to find a large sample of extreme emission-line galaxies. We first (§2) outline the selection criteria (magnitude and flux thresholds) for the target fields, resulting in a sample of 542 galaxies, which we call USELs (Ultra-Strong Emission Liners). We then describe (§3) the spectroscopic followups for 161 of these galaxies using multi-object masks with the DEIMOS spectrograph (Faber et al. 2003) on the 10-m Keck II telescope. Sample spectra for each class of object are shown. Flux calibration and equivalent width distributions are presented in §4, and the resulting measured line ratios are discussed. In §5 luminosity functions are constructed and star formation rates are estimated for the sample. These galaxies are estimated to contribute roughly 10% to the measured star-formation rate (without extinction corrections) at this epoch. Analysis of the metallicities is given in §6. Their morphologies and dynamical masses are discussed in §7 and a final summary discussion is given in §8. We use a standard H0 = 70 km s⁻¹ Mpc⁻¹, Ωm = 0.3, ΩΛ = 0.7 cosmology throughout.

2. THE NARROW BAND SELECTION

The emission line sample was chosen from a set of narrow band images obtained with the SuprimeCam camera on the Subaru 8.2-m telescope. The data were obtained in a number of runs between 2001 and 2005 under photometric or near photometric conditions. We used two ∼120 Å (FWHM) filters centered at nominal wavelengths of 8150 Å and 9140 Å in regions of low sky background between the OH bands. The nominal specifications for the Subaru filters may be found at http://www.naoj.org/Observing/Instruments/SCam/sensitivity.html and are also described in Ajiki et al. (2003). We shall refer to these filters as NB816 and NB912. About 5 hour exposures were obtained with NB816 and ∼10 hour exposures with NB912, yielding 5 sigma limits fainter than 26 mags in both bands. Deep exposures in B, V, R, I and z′ were also taken for the fields. The data were taken as a sequence of dithered background-limited exposures, and successive mosaic sequences were rotated by 90 degrees. Only the central uniformly covered areas of the images were used. Corresponding continuum exposures were always obtained in the same observing run as the narrowband exposures to avoid false identifications of transients, such as high-z supernovae or Kuiper belt objects, as emission-line candidates. A detailed description of the full reduction procedure for images is given in Capak et al. (2004).

Fig. 1. Schematic illustration of the selection process and a typical spectrum of the galaxies we find (horizontal axis: wavelength in Å). The objects are chosen based on their excess light in one of two narrow band filters at 8150 Å and 9140 Å. The present case corresponds to an Hα emission line object found in the 9140 Å filter (shown with the narrow solid curve). Also illustrated are the broad band V (dash-dot), R (solid), I (dashed), and z′ (dotted) filters used to measure the continuum. The spectrum shown corresponds to object 205 in Table 3 and is an Hα emitter at z = 0.3983. The easily visible lines are the Balmer series and the [O III] lines at λλ5007, 4959, and 4363 Å.

All magnitudes are given in the AB system (Oke 1990). These were measured in 3′′ diameter apertures, and had average aperture corrections applied to give total magnitudes.

The primary purpose of the program was to study Lyα emitters at redshifts of z ∼ 5.7 and z ∼ 6.6 (Hu et al. 2004, 2007), but the narrow band imaging is also ideal for selecting lower redshift emission line galaxies, and it is for this purpose that we use these data in the present paper. The fields which we use and the area covered (approximately a half square degree in each bandpass) are summarized in Table 1. These are distributed over the sky to deal with cosmic variance.
TABLE 1
Narrowband Survey Area Coverage

Field       RA (J2000)    Dec (J2000)    (lII, bII)        EB−V a   NB816 (arcmin²)   NB912 (arcmin²)
SSA22       22:17:57.00   +00:14:54.5    ( 63.1, −44.1)    0.07     674               591
SSA22 new   22:18:24.67   +00:36:53.4    ( 63.6, −43.9)    0.06     278               278
A370 new    02:41:16.27   −01:34:25.1    (173.4, −53.3)    0.03     278               278
HDF-N       12:36:49.57   +62:12:54.0    (125.9, +54.8)    0.01     710               528
Total                                                               1940              1675

Note. An adjacent field to A370 new is a site of a gravitational lensing cluster at z ∼ 0.375, and was omitted from the survey.
a Estimated using http://irsa.ipac.caltech.edu/applications/DUST/ based on Schlegel et al. (1998).

Fig. 2. Continuum minus narrow band magnitude versus narrow band magnitude, N(AB), for all objects with narrow band magnitude brighter than 24. The diamonds show the narrowband excess emission magnitude of the NB912 sample and the squares the NB816 sample. Galaxies which would be included in an R < 24 continuum selected sample are shown with solid symbols. The upper panel shows the complete sample while the lower panel shows the subsample which has been spectroscopically identified.

We selected galaxies in the narrowband NB816 filter using the Cousins I band filter as a reference continuum bandpass and including all galaxies with NB816 < 25 and I − NB816 greater than 0.8. We selected galaxies in the NB912 filter with the z filter as the reference continuum bandpass and included all galaxies with NB912 < 25 and z − NB912 greater than 1. The selection process is illustrated for a galaxy found in the NB912 filter in Figure 1. Both selections correspond roughly to choosing objects with emission lines with rest frame equivalent widths greater than 100 Å.
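
As a rough illustration, the two color cuts above can be written as a small predicate. This is a minimal sketch rather than the survey pipeline: the function name `is_usel_candidate` and the dictionary layout are hypothetical, and only the thresholds (narrowband magnitude brighter than 25, continuum minus narrowband excess greater than 0.8 for NB816 and greater than 1 for NB912) are taken from the text.

```python
# Color-excess cuts quoted in the text (AB magnitudes).
# The names below are illustrative, not taken from the survey software.
SELECTION_CUTS = {
    "NB816": {"continuum": "I", "nb_limit": 25.0, "min_excess": 0.8},
    "NB912": {"continuum": "z", "nb_limit": 25.0, "min_excess": 1.0},
}


def is_usel_candidate(band, nb_mag, continuum_mag):
    """Return True if an object passes the narrowband excess selection."""
    cut = SELECTION_CUTS[band]
    excess = continuum_mag - nb_mag  # e.g. I - NB816 or z - NB912
    return nb_mag < cut["nb_limit"] and excess > cut["min_excess"]
```

An object with NB816 = 24.0 and I = 25.0 would pass under these assumptions, while the same narrowband magnitude with I = 24.5 would not.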
The exact equivalent width limit depends on the precise position of the emission line in the filter and the redshift of the galaxy, which in turn depends on which emission line is producing the excess light in the narrow band.

The final USEL sample consists of 542 galaxies (267 in the NB816 filter and 275 in the NB912 filter). Tabulated coordinates, multi-color magnitudes, and redshifts (where measured) for these objects are summarized in Table 2. Very few of these objects would be included in continuum-selected spectroscopic samples. Figure 2 shows the narrow band excess as a function of narrowband (NAB) magnitude for objects with narrow band magnitudes brighter than 24. The open symbols show the present sample while the solid symbols show objects which would be included in an R < 24 continuum-selected sample.

3. SPECTRA

Spectroscopic observations were obtained for 161 USELs from the sample using the Deep Extragalactic Imaging Multi-Object Spectrograph (DEIMOS; Faber et al. 2003) on Keck II in a series of runs between 2003 and 2006. The emission line objects were included in masks designed to observe high-z Lyα candidates and, as can be seen in the lower panel of Figure 2, constitute an essentially random sample of the emission line galaxies.

The observations were primarily made with the G830 ℓ/mm grating blazed at 8640 Å and used 1′′ wide slitlets. In this configuration, the resolution is 3.3 Å, which is sufficient to distinguish the [O II]λ3727 doublet structure. This allows us to easily identify [O II]λ3727 emitters, where often the [O II]λ3727 doublet is the only emission feature. The spectra cover a wavelength range of approximately 4000 Å and were centered at an average wavelength of 7800 Å, though the exact wavelength range for each spectrum depends on the slit position with respect to the center of the mask along the dispersion direction.
The G830 grating used with the OG550 blocker gives a throughput greater than 20% for most of this range, and ∼ 28% at 8150 Å. The observations were not generally taken at the parallactic angle, since this was determined by the mask orientation, so considerable care must be taken in measuring line fluxes, as we discuss below. Each ∼ 1 hr exposure was broken into three subsets, with the objects stepped along the slit by 1.5′′ in each direction. Some USELs were observed multiple times, resulting in total exposure times for these galaxies of 2−3 hours. The two-dimensional spectra were reduced following the procedure described in Cowie et al. (1996), and the final one-dimensional spectra were extracted using a profile weighting based on the strongest emission line in the spectrum. A small number of the spectra were obtained with the ZD600 ℓ/mm grating, giving a correspondingly lower resolution but a wider wavelength coverage. These observations were centered at 7200 Å.

Fig. 3. Spectrum of an Hα emission galaxy selected in the NB912 filter (horizontal axis: rest wavelength in Å). In the upper plot we have decreased the scale of the vertical axis by a factor of 10 to show the continuum and the weaker lines. The more important emission line features are labelled and marked with the dotted lines.

Fig. 4. Spectrum of an [O III] galaxy in the NB816 selected sample (horizontal axis: rest wavelength in Å). The lower plot shows the relative strengths of the very strong emission lines in the spectrum. In the upper plot we have decreased the scale of the vertical axis by a factor of 10 to show the continuum and the weaker lines. The more important emission line features are labelled and marked with the dotted lines.

Fig. 5. Spectrum of an [O III] galaxy selected in the NB912 filter (horizontal axis: rest wavelength in Å). The lower plot shows the relative strengths of the very strong emission lines in the spectrum. In the upper plot we have decreased the scale of the vertical axis by a factor of 10 to show the continuum and the weaker lines. The more important emission line features are labelled and marked with the dotted lines.

Fig. 6. Spectrum of an [O II] galaxy selected in the NB816 filter (horizontal axis: rest wavelength in Å). The plot shows the relative strengths of the very strong emission lines in the spectrum. The more important emission line features are labelled and marked with the dotted lines.

Fig. 7. (a) Distribution of redshifts for the spectroscopically identified sources. [O III] λ5007 emitters are the most common. Since the focus of this paper is on intermediate-redshift (z ≲ 1) strong emission line galaxies, we did not plot high redshift Lyα galaxies (z >> 5) in our NEO sample. High-z Lyα emitters are discussed in Hu et al. (2004, 2007). (b) Flux versus redshift for the spectroscopically identified sample. Squares are Hα, diamonds are [O III] λ5007, and triangles are [O II] λ3727. The solid line shows the flux limit corresponding to the narrow band magnitude limit of N(AB)=25 for an emitter with very large equivalent width. Some objects with lower equivalent widths fall below this limit.

Essentially all of the emission line candidates which were observed were identified, though two of the objects in the NB816 sample are stars where the absorption line structure mimics emission in the band. Sample spectra are shown in Figures 3, 4, 5, and 6. The measured redshifts are given in Tables 2 and 3. The narrow band emission line selection produces a mixture of objects corresponding to Hα, [O III]λ5007, and [O II]λ3727 and, at the faintest magnitudes (> 24), high redshift Lyα emitters.
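
To see why a single narrowband filter yields this mixture, one can compute the redshift at which each line falls on the nominal filter center, z = λ_filter/λ_rest − 1. The sketch below is illustrative only: the filter centers are the nominal values quoted in the text, the [O III] and [O II] rest wavelengths follow the line labels used here, and the Hα and Lyα rest wavelengths (6563 Å and 1216 Å) are standard rounded values assumed for the example. The function names are hypothetical.

```python
# Nominal filter centers from the text (Angstrom).
FILTER_CENTERS = {"NB816": 8150.0, "NB912": 9140.0}

# Approximate rest wavelengths (Angstrom). The H-alpha and Ly-alpha values
# are standard rounded numbers assumed here, not quoted in the text.
REST_WAVELENGTHS = {
    "Halpha": 6563.0,
    "[OIII]5007": 5007.0,
    "[OII]3727": 3727.0,
    "Lyalpha": 1216.0,
}


def line_redshift(filter_name, line_name):
    """Redshift at which a line lands on the nominal filter center."""
    return FILTER_CENTERS[filter_name] / REST_WAVELENGTHS[line_name] - 1.0


def redshift_table():
    """Map (filter, line) -> redshift rounded to two decimals."""
    return {
        (f, line): round(line_redshift(f, line), 2)
        for f in FILTER_CENTERS
        for line in REST_WAVELENGTHS
    }
```

With these inputs the Hα entries come out near z = 0.24 (NB816) and 0.39 (NB912), and the [O III]λ5007 entries near z = 0.63 and 0.83.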
The number of objects seen in each line and the redshifts where they are found are shown in Figure 7. The spectroscopically identified sample from both bands contains 13 Hα, 92 [O III]λ5007, and 23 [O II]λ3727 emitters. In the remainder of the paper we shall focus primarily on the Hα and [O III]λ5007 selected galaxies which lie between redshifts zero and one.

Since only 30% of the USELs are spectroscopically identified, we must apply a substantial incompleteness correction in computing the line luminosity function and the universal star formation histories. Because the type mix may vary as a function of magnitude, we have adopted a magnitude dependent weighting for each galaxy equal to the total number of galaxies at this magnitude divided by the number of spectroscopically identified galaxies. However, the analysis is not particularly sensitive to the adopted scheme since the fraction of identified galaxies is relatively constant with magnitude.

4. FLUX CALIBRATIONS

Generally our spectra were not obtained at the parallactic angle, since this is determined by the DEIMOS mask orientation. Flux calibration using standard stars is therefore problematic due to atmospheric refraction effects, and special care must be taken for the flux calibration. We thus employed three independent methods for the flux calibration. In §4.1 we introduce "primary fluxes", which are absolute fluxes of the emission lines used to select the galaxies. Primary fluxes are computed from the SuprimeCam broadband and narrowband magnitudes. We use these fluxes to derive luminosity functions of Hα and [O III]λ5007 emitters at z < 1 (§5.1).

In §4.2 we measure line fluxes from the spectra. Relative line fluxes can be measured from the spectra without flux calibration as long as we restrict the line measurements to a short wavelength range where the DEIMOS response is essentially constant. For example, one can assume the response of neighboring lines (e.g.
[O III]λ4959 and [O III]λ5007) are the same and therefore one can measure the flux ratio without calibration. For bright galaxies, we can absolutely calibrate the fluxes by integrating spectra and equating them to Subaru broadband fluxes. These line fluxes derived from the spectra are used as a check of the primary fluxes. We show that the ratio of [O III]λ4959/[O III]λ5007 is indeed close to 1/3, and that the fluxes computed from the spectra are highly consistent with the primary fluxes measured in §4.1. In §4.3, we show that the Balmer flux ratios f(Hβ)/f(Hα) of bright Hα emitters are close to the Case B value, suggesting very little reddening.

Metallicity measurements by the direct method require four emission lines that are widely displaced over the spectral wavelength range ([O III]λλ4959, 5007, [O III]λ4363, and [O II]λ3727). To calibrate these lines, we used neighboring Balmer lines with the assumption of Case B conditions, and this is described in §4.4.

4.1. Narrow Band Fluxes − Primary Fluxes

For the emission lines used to select each galaxy we may compute the equivalent widths and absolute fluxes directly from the narrow band magnitudes (N) and the corresponding continuum magnitudes (C) from our SuprimeCam imaging data. For example, in the case of the NB816 selected emission-line galaxies, N corresponds to the NB816 magnitude and C is the I band magnitude. We shall refer to the values calculated in this way as the primary fluxes and use this quantity to compute the luminosity functions for each emitter in §5.1. Defining the quantity

$$R = 10^{-0.4(N-C)}$$

the observed frame equivalent width becomes

$$\mathrm{EW} = \frac{R - 1}{\phi - R/\Delta\lambda}$$

where φ is the narrow band filter response normalized such that the integral over wavelength is unity and ∆λ is the effective width of the continuum filter. The narrow band filter is often assumed to be rectangular, in which case φ becomes 1/δλ where δλ is the width of the narrow band, but as can be seen from Figure 1 this is not a very good approximation in the present case. For very high equivalent width objects the denominator in this equation becomes uncertain unless the broad band data are very deep, and this can result in a large scatter in the very highest equivalent widths where the continuum is poorly determined.

In the case of the [O III]λ5007 line we must include the second member of the doublet, which also lies within the narrow band filter. We have computed these cases assuming the flux of the [O III]λ4959 line is 0.34 times that of the [O III]λ5007 line. Then φ = φ1 + 0.34 × φ2, where φ1 is the filter response at the redshifted 5007 Å wavelength and φ2 is the filter response at redshifted 4959 Å.

Fig. 8. (a) Distribution of the rest frame equivalent widths determined from the narrow band magnitudes for the spectroscopically identified [O III] λ5007 sources. (b) Distribution of the rest frame equivalent widths for the Hα selected sample. The horizontal axes show the logarithm of the rest frame EW.

Fig. 9. The ratio of the [O III] λ4959 line to [O III] λ5007, plotted against the [O III] λ5007 flux in erg cm⁻² s⁻¹. The errors are plus and minus 1 sigma. The median ratio is 0.338 and the scatter around this value is consistent with that expected from the statistical errors.

The distribution of the rest frame equivalent widths for the Hα and [O III]λ5007 samples is shown in Figure 8. The [O III]λ5007 sample selects objects with rest frame equivalent widths above about 100 Å, while the lower redshift Hα sample selects objects with rest frame equivalent widths above about 150 Å. Since the [O III]λ5007 lines are also generally stronger than the Hα lines, the [O III] selection chooses less extreme objects than the Hα selection and will include a larger fraction of galaxies at the given redshift.
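
The equivalent-width relation of this subsection can be illustrated numerically. The sketch below is not the authors' code. It assumes a single emission line of flux F on a flat continuum f_c, so that the narrowband flux density is f_c + Fφ and the broadband flux density is f_c + F/∆λ; solving R = (f_c + Fφ)/(f_c + F/∆λ) for EW = F/f_c gives EW = (R − 1)/(φ − R/∆λ). All function names are hypothetical, and the division by (1 + z) for the rest frame is the standard conversion.

```python
def flux_ratio(nb_mag, continuum_mag):
    """R = 10**(-0.4 * (N - C)): narrowband over continuum flux density."""
    return 10.0 ** (-0.4 * (nb_mag - continuum_mag))


def doublet_response(phi_5007, phi_4959, ratio_4959_to_5007=0.34):
    """Effective response when [O III] 4959 also falls in the filter."""
    return phi_5007 + ratio_4959_to_5007 * phi_4959


def observed_equivalent_width(nb_mag, continuum_mag, phi, delta_lambda):
    """Observed-frame EW (Angstrom) under the flat-continuum assumption.

    phi          : normalized narrowband response at the line (1/Angstrom)
    delta_lambda : effective width of the continuum filter (Angstrom)
    """
    r = flux_ratio(nb_mag, continuum_mag)
    denominator = phi - r / delta_lambda
    if denominator <= 0.0:
        # The continuum is too faint to constrain the equivalent width.
        raise ValueError("equivalent width is unconstrained (denominator <= 0)")
    return (r - 1.0) / denominator


def rest_equivalent_width(observed_ew, redshift):
    """Convert an observed-frame EW to the rest frame."""
    return observed_ew / (1.0 + redshift)
```

The guard on the denominator mirrors the remark in the text that the highest equivalent widths become uncertain when the continuum is poorly determined.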
The high observed frame equivalent widths make the line fluxes insensitive to the continuum determination, and these may simply be found from

$$f = \frac{A\left(10^{-0.4N} - 10^{-0.4C}\right)}{\phi}$$

where A is the AB zeropoint at the narrow band wavelength in units of erg cm⁻² s⁻¹ Å⁻¹. The flux depends on the filter response at the emission line wavelength and correspondingly is most uncertain at the edges of the filters, where this quantity changes rapidly. The primary fluxes defined here are measured using the narrowband (N) and broadband (C) magnitudes from the Subaru imaging data, with the object redshift from the Keck spectra fixing the filter response (φ) at the exact emission line wavelength. We use these primary fluxes to construct the luminosity functions of Hα and [O III]λ5007 selected emitters, as we discuss in §5.1.

4.2. Line Fluxes from the Spectra

For the short wavelength range where the DEIMOS response is essentially constant, we may also compute the relative line fluxes from the spectra without calibration. For each spectrum we fitted a standard set of lines. For the stronger lines we used a full Gaussian fit together with a linear fit to the continuum baseline. For weaker lines we held the full width constant using the value measured in the stronger lines and set the central wavelength to the nominal redshifted value. We also measured the noise as a function of wavelength by fitting to random positions in the spectrum and computing the dispersion in the results.

These fits should provide accurate relative fluxes over short wavelength intervals where the DEIMOS response is similar, but may be expected to be poorer over longer wavelength intervals where the true calibration can vary from the adopted value. We tested the relative flux calibration for neighboring lines and the noise measurement by measuring the ratio of the [O III]λ4959/[O III]λ5007 lines. This is expected to have a value of approximately 0.34.
The ratio is shown as a function of the [O III]λ5007 flux in Figure 9. The measured values scatter around the expected value and the dispersion is consistent with the noise determination. This result supports our assumption of [O III]λ4959/[O III]λ5007 = 0.34 in the primary flux measurements described in §4.1.

The brighter objects may be absolutely calibrated using the continuum magnitudes obtained from our Subaru data. We integrated the spectrum convolved with each SuprimeCam filter response and set this equal to the broad band flux to normalize the spectrum in each of the filters. We then used the Gaussian fits to obtain the spectral line fluxes for lines lying within that broad band. This procedure only works for sources with well determined continuum magnitudes (C < 24.5), where the sky subtraction can be well determined in the spectra. For these objects the spectrally determined fluxes are compared with the primary fluxes in Figure 10, where we plot the ratio of the spectral to the primary flux versus the primary line flux. The values agree well, though the measured spectral line fluxes are on average about 80−90% of the primary flux values. This may reflect a selection bias in the choice of the objects, or the narrow band filters could be slightly narrower than the nominal profiles.

Fig. 10. Ratio of fluxes computed from the spectra and the broad band magnitudes versus those from the narrow band magnitudes (horizontal axis: flux in erg cm⁻² s⁻¹). Hα lines are shown as solid boxes, [O III] λ5007 lines as diamonds and [O II] λ3727 lines as crosses.

4.3. Balmer Ratios

We next measured the Balmer ratios for the sample of objects selected in Hα for which the continuum magnitudes were bright enough to absolutely flux calibrate the spectra. The ratio of f(Hβ)/f(Hα) is shown in Figure 11.
The values average closely to the Case B ratio, which is shown as the solid line, and at brighter fluxes the individual values also closely match this value, suggesting that the galaxies have very little reddening. However, at fainter fluxes the scatter about the average value is considerably higher than the statistical errors. At the faintest fluxes it appears that the systematic uncertainty can be as high as a multiplicative factor of two.

Fig. 11. The ratio of the Hβ/Hα fluxes versus Hα flux (in erg cm⁻² s⁻¹). The values average to the unreddened Balmer decrement shown by the solid line, but at the lower fluxes the scatter is larger than expected from the statistical errors, reflecting the calibration uncertainties for the fainter sources. The figure shows the ten objects detected in the Hα line with continuum magnitudes brighter than 24.5 in the bandpasses corresponding to the lines.

4.4. Final Flux Calibration for Metallicity Analysis

For the metallicity analysis we adopted the procedure of normalizing the longer wavelength lines ([O III]λλ4959, 5007, [O III]λ4363) to their nearest Balmer line to determine the unreddened fluxes. For example, in the case of the Hα emission selected galaxies, we can measure Hα absolute fluxes by the primary flux method described in §4.1. We can then derive Hβ and Hγ fluxes from the Hα fluxes by assuming Case B recombination [e.g., f(Hα) = 2.85 × f(Hβ), f(Hγ) = 0.469 × f(Hβ) at T = 10⁴ K and Ne ∼ 10²−10⁴ cm⁻³; Osterbrock 1989]. As Hβ and [O III]λλ4959, 5007 have very similar DEIMOS response, the relative fluxes should remain the same with or without the flux calibration, and this can be expressed by a simple equation:

$$\frac{f_0(\mathrm{H}\beta)}{f_0([\mathrm{O\,III}]\lambda4959, \lambda5007)} = \frac{f(\mathrm{H}\beta)}{f([\mathrm{O\,III}]\lambda4959, \lambda5007)}$$

where f0(Hβ) and f0([O III]λ4959, λ5007) are the flux counts in the uncalibrated, reddened DEIMOS spectra, while f(Hβ) and f([O III]λ4959, λ5007) are absolute, unreddened fluxes.
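
The neighboring-line scaling can be sketched in a few lines. This is an illustrative example only, with hypothetical function names: it uses the Case B ratios quoted above and rearranges the count-ratio equality to f(line) = f(Balmer) × f0(line)/f0(Balmer), which applies to any line close enough to a Balmer line that the instrumental response can be taken as equal.

```python
# Case B ratios quoted in the text (T = 10^4 K).
HALPHA_OVER_HBETA = 2.85
HGAMMA_OVER_HBETA = 0.469


def balmer_fluxes_from_halpha(f_halpha):
    """Absolute H-beta and H-gamma fluxes implied by an H-alpha flux."""
    f_hbeta = f_halpha / HALPHA_OVER_HBETA
    f_hgamma = HGAMMA_OVER_HBETA * f_hbeta
    return f_hbeta, f_hgamma


def calibrate_neighbor(counts_line, counts_balmer, f_balmer):
    """Scale a line by its neighboring Balmer line.

    Uses f0(Balmer) / f0(line) = f(Balmer) / f(line), i.e.
    f(line) = f(Balmer) * f0(line) / f0(Balmer).
    """
    if counts_balmer <= 0.0:
        raise ValueError("Balmer line counts must be positive")
    return f_balmer * counts_line / counts_balmer
```

For an Hα-selected object, one would pass the primary Hα flux to `balmer_fluxes_from_halpha` and then call `calibrate_neighbor` with the uncalibrated counts of the [O III] lines and Hβ.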
Since we know f(Hβ) from f(Hα) with the Case B assumption and f0(Hβ)/f0([O III]λ4959, λ5007) from the DEIMOS spectra, we can derive f([O III]λ4959, λ5007) using this simple formula. Similarly, we can absolutely calibrate the [O III]λ4363 line by using its neighboring Balmer line, Hγ:

$$\frac{f_0(\mathrm{H}\gamma)}{f_0([\mathrm{O\,III}]\lambda4363)} = \frac{f(\mathrm{H}\gamma)}{f([\mathrm{O\,III}]\lambda4363)}$$

where f0(Hγ) and f0([O III]λ4363) are again the counts in the flux-uncalibrated, reddened DEIMOS spectra, and f(Hγ) and f([O III]λ4363) are absolute fluxes.

In the case of the [O III] selected emitters, we first derive [O III]λλ4959, 5007 absolute fluxes using the primary fluxes method (§4.1), and then use the above formula to get absolute fluxes of Hβ, then Hγ (by the Case B ratio), and finally [O III]λ4363.

Fig. 12.— The luminosity function of Hα at z = 0.24 (top panel) and at z = 0.39 (bottom panel). In each case the open boxes show the luminosity functions determined from the spectroscopic sample alone while the solid boxes show the function corrected for the incompleteness in the spectroscopic identification. The errors are plus and minus 1 sigma and at the highest luminosity we show the 1 sigma upper limit. [Panels: z = 0.24 and z = 0.39. Horizontal axis: L(Hα) (erg s⁻¹), ticks 40 to 43.]

This flux calibration technique using neighboring Balmer lines should work well for the [O III]λλ4959, 5007, λ4363 lines and the [N II] lines, which all lie close to Balmer lines, but may be slightly more approximate for the [S II] lines. The higher order Balmer lines are too uncertain to apply this procedure due to inadequate S/N of the lines, and we have used the continuum flux calibrated values with no reddening for the [O II]λ3727 and [Ne III] lines. These values will have correspondingly higher flux uncertainties. Fortunately the [O II]λ3727 line is very weak in most of the objects and the uncertainty has little effect on the metallicity determinations.
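The calibration chain of this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary keys, and the example numbers are hypothetical and are not taken from the reduction software; only the Case B ratios (2.85 and 0.469) come from the text.

```python
# Case B ratios quoted in the text (T = 1e4 K; Osterbrock 1989).
CASE_B_HA_OVER_HB = 2.85   # f(Halpha) / f(Hbeta)
CASE_B_HG_OVER_HB = 0.469  # f(Hgamma) / f(Hbeta)


def _from_balmer(f_hbeta, counts):
    """Scale uncalibrated count ratios by the absolute Balmer fluxes.

    `counts` holds uncalibrated line counts (hypothetical keys):
    "hbeta", "oiii_4959_5007", "hgamma", "oiii_4363".
    """
    f_hgamma = CASE_B_HG_OVER_HB * f_hbeta
    return {
        "hbeta": f_hbeta,
        "hgamma": f_hgamma,
        # f([O III]) = f(Balmer) * f0([O III]) / f0(Balmer)
        "oiii_4959_5007": f_hbeta * counts["oiii_4959_5007"] / counts["hbeta"],
        "oiii_4363": f_hgamma * counts["oiii_4363"] / counts["hgamma"],
    }


def calibrate_from_halpha(f_halpha, counts):
    """Halpha-selected case: start from the primary Halpha flux."""
    return _from_balmer(f_halpha / CASE_B_HA_OVER_HB, counts)


def calibrate_from_oiii(f_oiii_4959_5007, counts):
    """[O III]-selected case: invert the same ratio to get Hbeta first."""
    f_hbeta = f_oiii_4959_5007 * counts["hbeta"] / counts["oiii_4959_5007"]
    return _from_balmer(f_hbeta, counts)


if __name__ == "__main__":
    # Made-up counts and flux, purely to show the call pattern.
    counts = {"hbeta": 100.0, "oiii_4959_5007": 500.0,
              "hgamma": 47.0, "oiii_4363": 4.7}
    print(calibrate_from_halpha(2.85e-16, counts))
```

Because each forbidden line is tied to a Balmer line at nearly the same wavelength, the sketch never needs the instrument response or a reddening curve; that is the design choice the text describes, and it is also why lines far from any Balmer line (such as [O II]λ3727) cannot be handled this way.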
However, ionization analyses based on the [Ne III] line should be undertaken with caution.

5. STAR FORMATION HISTORY

5.1. Hα and [O III]λ5007 Luminosity Functions

Because of the high observed frame equivalent widths the primary fluxes are insensitive to the continuum determination. However, they do depend on the filter response at the emission line wavelength, so we first restrict ourselves to redshifts where the nominal filter response is greater than 70% of the peak value. This also has the advantage of providing a uniform selection, and we assume the window function is flat over the defined redshift range. The volume is then simply defined by the selected redshift range for all objects above the minimum luminosity, which we take as corresponding to a flux of 1.5×10⁻¹⁷ erg cm⁻² s⁻¹ (Figure 7). The luminosity function is obtained by dividing the number of objects in each luminosity bin by the volume. The incompleteness corrected luminosity function is obtained from the sum of the weights in each luminosity bin divided by the volume. The 1 sigma errors shown are calculated from the Poissonian errors based on the number of objects in the bin. The calculated Hα luminosity function is shown for the two redshift ranges corresponding to the NB816 and NB912 selections in Figure 12 and the corresponding [O III]λ5007 luminosity functions in Figure 13.

Fig. 13.— The luminosity function of [O III] λ5007 emitters at z = 0.63 (top panel) and at z = 0.83 (bottom panel). In each case the open boxes show the luminosity functions determined from the spectroscopic sample alone while the solid boxes show the function corrected for the incompleteness in the spectroscopic identification. The errors are plus and minus 1 sigma and at the highest luminosity we show the 1 sigma upper limit. [Panels: z = 0.63 and z = 0.83. Horizontal axis: L(OIII) (erg s⁻¹), ticks 40 to 43.]

5.2.
Star Formation Rates

The individual objects have Hα luminosities stretching up to about 3×10⁴¹ erg s⁻¹ where, at the higher redshifts, we use the Hβ luminosity to infer the Hα value assuming there is no reddening. For steady star formation this would require a star formation rate of a few solar masses per year if we adopt the Kennicutt (1998) conversion rate for his Salpeter mass function.

Fig. 14.— The star formation history inferred from the Hα or Hβ luminosity density as a function of redshift. The data from our sample are shown in red. The open squares show the total rate from the entire sample while the solid squares show the values for objects with Hα rest frame equivalent widths in excess of 200 Å or Hβ rest frame equivalent widths in excess of 70 Å. The diamonds show the UV star formation rates (uncorrected for extinction) from the ground based work of Wilson et al. (2002) and the triangles the Galex results of Wyder et al. (2005) and Schiminovich et al. (2005). Hα measurements from the literature as summarized in Ly et al. (2007) are shown with filled circles. In all cases the errors are ±1σ.

Since the objects are more probably caused by starbursts, the true star formation rate will depend on the evolutionary history. However, the Hα luminosity density should give a reasonable estimate of the universal star formation density of the objects, provided only that most of the ionizing photons are absorbed in the galaxies. We first formed the total Hα or Hβ luminosity density of the galaxies by summing over the incompleteness weighted luminosities in each redshift interval. We only included detected objects and did not attempt to extrapolate to lower luminosities, but the results are not particularly sensitive to this because the luminosity functions are relatively flat at the lower luminosities (Figures 12 and 13). We then converted these to star formation rates with the Kennicutt (1998) conversion. The results are shown in Figure 14.
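The binning and conversion steps described in §5.1 and §5.2 can be summarized in a short Python sketch. It is a minimal illustration, not the analysis code: the function and key names are hypothetical, the simple 1/√N fractional error is an assumed stand-in for whatever Poisson treatment was actually applied to sparsely populated bins, and the coefficient is the standard Kennicutt (1998) Hα calibration for a Salpeter IMF, SFR [M⊙ yr⁻¹] = 7.9×10⁻⁴² L(Hα) [erg s⁻¹].

```python
import math

# Kennicutt (1998), Salpeter IMF: SFR [Msun/yr] = 7.9e-42 * L(Halpha) [erg/s].
KENNICUTT_HALPHA = 7.9e-42


def luminosity_function(log_lum, weights, volume, bin_edges):
    """Raw and incompleteness-corrected number densities per luminosity bin.

    log_lum   : log10 line luminosities (erg/s) of the identified objects
    weights   : incompleteness weight of each object (>= 1)
    volume    : comoving survey volume for the redshift window
    bin_edges : increasing log10-luminosity bin edges
    """
    bins = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        w_in_bin = [w for l, w in zip(log_lum, weights) if lo <= l < hi]
        n = len(w_in_bin)
        bins.append({
            "log_lum_range": (lo, hi),
            "n": n,
            "phi_raw": n / volume,                     # counts / volume
            "phi_corrected": sum(w_in_bin) / volume,   # sum of weights / volume
            # Assumed simple Poisson fractional error; None for empty bins.
            "frac_error": 1.0 / math.sqrt(n) if n > 0 else None,
        })
    return bins


def sfr_density(log_lum, weights, volume):
    """Weighted Halpha luminosity density converted to an SFR density."""
    lum_density = sum(w * 10.0 ** l for l, w in zip(log_lum, weights)) / volume
    return KENNICUTT_HALPHA * lum_density


if __name__ == "__main__":
    # The brightest objects quoted in the text, L(Halpha) ~ 3e41 erg/s:
    print(KENNICUTT_HALPHA * 3e41)  # about 2.4 Msun/yr
```

With L(Hα) = 3×10⁴¹ erg s⁻¹ the adopted coefficient gives roughly 2.4 M⊙ yr⁻¹, in line with the "few solar masses per year" quoted above. Only detected objects enter the sums, mirroring the choice in the text not to extrapolate the luminosity function to fainter limits.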
We first plot the rate for the total samples at each redshift, shown by the open squares. We have shown star formation rates for UV continuum samples for comparison, and the present sample of strong emitters gives star formation rates which are about 10% of the UV values over the redshift interval. For comparison, we also show the star formation rates from Hα selected samples reported in the literature and summarized in Ly et al. (2007).

In order to better understand the evolution we have also restricted our own sample to objects with rest frame equivalent widths of Hα in excess of 200 Å at low redshifts and Hβ equivalent widths in excess of 70 Å at the higher redshifts. The star formation rates for this sample are shown with the solid squares. This provides a more uniform selection with redshift and gives a slower increase than the total inferred star formation rate. For this sample the SFR is evolving roughly as (1+z)³, broadly similar to other UV and optically determined formation rates in this redshift interval. These more restricted objects account for about 5% of the UV star formation rates at the higher redshifts.

6. GALAXY METALLICITIES

6.1. [O III] emitters

The spectra are of variable quality and, in order to measure the metallicities, we need very high signal to noise observations. It is also important that Balmer lines are well detected since our flux calibrations are done using the neighboring Balmer lines (§4.4). We therefore restricted ourselves to [O III] emitters whose Hβ fluxes are detected above 15 sigma. Among 92 [O III] emitters in our total spectroscopic sample, 8 such [O III] emitters were chosen in the NB912 sample, and 10 in the NB816. These emitters have Hγ detected above 4 sigma. Tables 4 and 5 give the oxygen line fluxes normalized by their Hβ fluxes for the NB816 and NB912 selected emitters, respectively. 1σ upper limits are listed when the measured flux is below 1σ.
The most direct method to estimate the gas-phase oxygen abundance is to use the electron temperature of the HII region. Higher gas metallicity increases nebular cooling, leading to lower electron temperatures. Therefore electron temperature is a good indicator of the gas metallicity. The electron temperature can be derived from the ratio of the [O III]λ4363 auroral line to [O III]λλ5007,4959. This procedure is often referred to as the ‘direct’ method or Te method (e.g., Seaton 1975; Pagel et al. 1992; Pilyugin & Thuan 2005; Izotov et al. 2006c). One well-known problem with the direct method, however, is that [O III]λ4363 is generally very weak even in low-metallicity galaxies. For higher metallicity systems, the far-IR lines become the dominant coolant and therefore the optical auroral line is essentially not detectable. However, the majority of our sample exhibit [O III]λ4363, already suggesting that these are metal-deficient systems.

To derive Te[O III] and the oxygen abundances, we used the Pagel et al. (1992) calibrations with the Te[O II]−Te[O III] relations derived by Garnett (1992). The results are shown in Table 4 (for NB816 selected [O III] emitters) and Table 5 (for NB912 selected [O III] emitters). The Izotov et al. (2006c) formula, which was developed with the latest atomic data and photoionization models, gives consistent abundances within 0.1 dex. The [S II]λλ6717, 6731 lines that are usually used for the determination of the electron number density are beyond the Keck/DEIMOS wavelength coverage for our [O III] emitters. Therefore we assumed ne = 100 cm⁻³. However, the choice of electron density has little effect, as the electron temperature is insensitive to the electron density; we get the same results even when we use ne = 1000 cm⁻³. The 1σ upper and lower limits on Te[O III] and the oxygen abundances are also shown in the tables. Because the [O III]λ4363 flux is weak (< 4σ), the range of our metallicity measurements is quite wide.
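To make the temperature dependence of the auroral-to-nebular ratio concrete, the sketch below inverts a widely used textbook approximation for the [O III] ratio, R = (I4959 + I5007)/I4363 ≈ 7.90 exp(3.29×10⁴/T) / (1 + 4.5×10⁻⁴ ne/√T). This is an assumed stand-in and not the Pagel et al. (1992) calibration adopted in the text; the coefficients are those commonly quoted in later editions of the Osterbrock textbook and should be treated as an assumption, and the function names are hypothetical.

```python
import math


def oiii_ratio(t_e, n_e=100.0):
    """Approximate (I4959 + I5007) / I4363 for temperature t_e [K], density n_e [cm^-3].

    Textbook approximation; coefficients are an assumption of this sketch.
    """
    return 7.90 * math.exp(3.29e4 / t_e) / (1.0 + 4.5e-4 * n_e / math.sqrt(t_e))


def electron_temperature(ratio, n_e=100.0, t_lo=5.0e3, t_hi=5.0e4):
    """Solve oiii_ratio(T) = ratio for T by bisection.

    The ratio falls monotonically with T over the bracket, so bisection is safe.
    """
    if not (oiii_ratio(t_hi, n_e) <= ratio <= oiii_ratio(t_lo, n_e)):
        raise ValueError("ratio outside the range covered by the temperature bracket")
    for _ in range(100):
        t_mid = 0.5 * (t_lo + t_hi)
        if oiii_ratio(t_mid, n_e) > ratio:
            t_lo = t_mid  # predicted ratio too high: true temperature is hotter
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)


if __name__ == "__main__":
    r = oiii_ratio(15000.0, n_e=100.0)
    # Same observed ratio, two assumed densities:
    print(electron_temperature(r, n_e=100.0), electron_temperature(r, n_e=1000.0))
```

In this approximation the density enters only through the small correction term in the denominator, so changing ne from 100 to 1000 cm⁻³ shifts the inferred temperature by well under one percent, which illustrates why the assumed density has little effect on the results.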
However, of the 18 [O III] emitters, even the upper metallicity limits on 6 emitters satisfy the definition of XMPGs [12 + log(O/H) < 7.65; Kunth & Östlin 2000]. All our emitters, except the ones that only have lower limits on metallicities such as ID31 in Table 4, have very low metallicities; even the upper metallicity limits are about 0.02−0.2 Z⊙. A few emitters may even have metallicities that are comparable to the currently known most metal-poor galaxies [I Zw 18 and SBS0335−052W; 12 + log(O/H) ∼ 7.1−7.2]. However, our current metallicity errors are too large to measure the baseline metallicity accurately and higher S/N spectra will be necessary for this purpose.

Our discovery rate of XMPGs appears to be significantly higher than those of other surveys. Only 14 new XMPGs have been discovered from the analysis of ∼530,000 galaxy spectra in the SDSS, and they are all located nearby (z < 0.005) (SDSS DR3: Kniazev et al. 2003; SDSS DR4: Izotov et al. 2006a). At higher redshift, 17 metal-poor (7.8 < 12 + log(O/H) < 8.3) galaxies have been found at z ∼ 0.7 in the initial phase of the DEEP2 survey of 3,900 galaxies and the Team Keck Redshift Survey of 1,536 galaxies (Hoyos et al. 2005). But none of these galaxies satisfies the condition of XMPGs. The present sample may be the first XMPGs at intermediate redshift (z ∼ 1) whose oxygen abundances are securely measured by the direct method. The narrowband method is very powerful for finding not only high-redshift (z >> 5) galaxies, but also strong emission-line, extremely metal-deficient galaxies at intermediate redshifts (z < 1).

Fig. 15.— [O III]λ4959+λ5007/[O III]λ4363 versus [O II]λ3727/[O III]λ5007 for the [O III] and Hα selected emitters in Tables 4 and 5. The electron temperature of the HII region is also shown.

Figure 15 shows the electron temperature sensitive line ratio, [O III](λ4959+λ5007)/[O III]λ4363, versus [O II]λ3727/[O III]λ5007.
If we have an estimate of the metallicity, as in the present case, we can use the [O II]λ3727/[O III]λ5007 ratio to estimate the ionization parameter q. The ionization parameter q is defined as the number of hydrogen ionizing photons passing through a unit area per second per unit hydrogen number density (Kewley & Dopita 2002). For the low metallicity systems with strong [O III]λ4363 lines, we can see from Figure 15 that [O II]λ3727 is very weak compared to [O III]λ5007, with values ranging downwards from 0.3. Assuming the metallicity is less than 0.2 Z⊙, this would place a lower bound of q = 10⁸ cm s⁻¹ on the ionization parameter based on the Kewley & Dopita (2002) model. The higher metallicity objects have stronger [O II]λ3727/[O III]λ5007 which, while in part due to the metallicity, also requires these objects to have lower ionization parameters, suggesting we are seeing an evolutionary sequence.

6.2. Hα emitters

Among 13 Hα emitters in our spectroscopic sample, only 3 NB912 selected emitters have Hβ fluxes adequate (> 15σ) for the purpose of metallicity measurements. Their Te[O III] and oxygen abundances were measured using the direct method described above, and are shown in Table 5 together with the data for the [O III] emitters. The [O II]λ3727 line of ID266 is outside the Keck/DEIMOS wavelength coverage. In order to derive an upper limit on the metallicity, we assumed [O II]λ3727/[O III]λ5007 = 1.0, which is the maximum value in our sample (see Fig. 15). All our Hα emitters are metal poor (Zupper < 0.45 Z⊙), but none of them are XMPGs.

6.3. Composite Spectrum

As can be seen in Figure 15, the objects with low [O II]λ3727/[O III]λ5007 have relatively uniform values of ([O III]λ5007+λ4959)/[O III]λ4363 and similar metallicities. Given the relatively low signal to noise of the individual spectra it therefore seems of interest to form a composite spectrum.
Such a spectrum will have weightings on the lines which depend on the individual ionization parameters and metallicity but will give a rough estimate of the average metallicity and temperature of the sample. In Figure 16 we show the composite spectrum of the 8 objects with [O II]λ3727/[O III]λ5007 less than 0.1. The [O III]λ4363 line is now strongly detected with a value of 16.7 ± 2.1, or eight sigma. The mean temperature is 19,500 ± 1,500 K, the average abundance is 12+log(O/H) = 7.5 ± 0.1, and the mean rest frame equivalent width of Hβ is 57 Å. The results are similar to the values obtained by averaging the individual analyses of the eight objects.

7. MORPHOLOGIES

The morphology of the USELs may give us a clue to the mechanism of their high star formation activity (SFR ∼ a few M⊙/yr) and star formation history: what has triggered the star formation, whether mergers, gas infall, or galactic winds? High resolution HST/ACS images are available for the GOODS-North (GOODS-N) region (Giavalisco et al. 2004), which is one of our survey fields. There are 17 NB816 selected USELs in the GOODS-N, and 16 in the NB912 sample. Figures 17 and 18 show thumbnails of the NB816 and NB912 selected USELs in the GOODS-N field (respectively) with each thumbnail 12.″5 on a side. The white dashes point to the largest galaxy. We used continuum broadband images to show underlying stellar populations: HST/ACS B, V, z′-band images were used for NB816 emitters, and B, V, I-band for NB912 emitters. High-redshift Lyα emitters (z >> 5) are very red because of the continuum absorption below Lyα emission caused by neutral hydrogen in the intergalactic medium. We do not have spectra for most of the USELs in the GOODS-N field yet, and none of the USELs in GOODS-N have metallicity measurements, either due to lack of spectra or low spectral S/N.
But we can qualitatively argue that the USELs at intermediate redshift (z < 1) exhibit a wide range of morphologies, from relatively compact high surface brightness objects to very diffuse low surface brightness ones.

Fig. 16.— Composite spectrum of the 8 emitters with [O II]λ3727/[O III]λ5007 less than 0.1. The eight spectra have simply been summed without weighting. The lower panel shows the stronger lines and the upper the continuum and the weaker lines. The stronger emission lines are labelled and marked with the vertical dotted lines. [Horizontal axis: rest wavelength (Å), 3600 to 5200.]

Fig. 19.— The oxygen abundance versus the absolute rest frame B magnitude for the [O III] selected samples (red squares). One sigma errors are shown for the oxygen abundances and one sigma lower limits are shown with upward pointing arrows. The solid line shows the Skillman et al. (1989) relation for the nearby dwarf irregulars. As with the local XMPGs (filled circles, Kniazev et al. 2003; Kewley et al. 2007) and GRB hosts (open squares, Stanek et al. 2006; Kewley et al. 2007), the present galaxies are much more luminous at a given metallicity than the local irregulars. Metal-poor luminous galaxies (but not XMPGs) at z ∼ 1 from Hoyos et al. (2005) are shown as triangles. A few of our emitters may have oxygen abundances comparable to the most metal-deficient galaxies, I Zw 18 (12 + log O/H = 7.17 ± 0.01, Thuan & Izotov 2005) and SBS 0335-052W (12 + log O/H = 7.12 ± 0.03, Izotov et al. 2005).

8. DISCUSSION

The present emitters differ from the local dwarf HII galaxies in a large number of ways, though they appear much more similar to the XMPGs found in the SDSS samples. They are much more luminous, have large [O III]/[O II] ratios, and they are a relatively high fraction (about 10% by number from Figure 13) of other galaxy populations at these redshifts.
Taken together this suggests that we are seeing much more massive galaxies in the early stages of formation and, since we need these to have relatively long lifetimes in order to understand their frequency, that we are seeing objects undergoing continuous star formation rather than single starbursts. For the case of constant star formation with a standard Salpeter IMF a forming galaxy can have equivalent widths above 30 Å for 10⁹ yr (Leitherer et al. 1999), which would allow us to understand the observed number density of strong emitters relative to the total galaxy population.

In this type of model we would expect the metallicity to grow with time and that higher metallicity galaxies would have higher continuum magnitudes and lower equivalent widths in Hβ. We plot the absolute rest frame B magnitudes versus the oxygen abundance in Figure 19. As with the case for the local XMPGs found in the SDSS (filled circles, Kniazev et al. 2003; Kewley et al. 2007) and the metal-poor galaxies (7.8 < 12 + log O/H < 8.3) at z ∼ 1 (triangles, Hoyos et al. 2005), the present emitters (red squares) are much more luminous at a given metallicity than is found for the local dIrrs (solid line, Skillman et al. 1989). Furthermore, there does indeed seem to be a trend to higher continuum luminosities at higher metallicity, consistent with ongoing star formation raising the luminosity. Recently Kewley et al. (2007) reported the similarity between XMPGs and long duration GRB hosts: they share similar SFRs and extinction levels, and both lie in a similar region of the luminosity-metallicity diagram. Our sample of metal-deficient galaxies, which also lie in the region of GRB hosts, may provide additional support for the connection between XMPGs and GRB hosts.
Of the six galaxies with continuum magnitudes brighter than −18, all but one have metallicities or lower limits which would place them near or above 12+log(O/H) = 8, while the lower luminosity galaxies primarily have 12+log(O/H) in the range 7.1−7.8. If we assumed that the metallicity were a simple linear function of the age, then the more luminous galaxies would be several times older than the less luminous ones, which is not quite enough to account for the luminosity increase (see e.g. Leitherer et al. 1999), suggesting that the enrichment process may be more complex. However, the accuracy of our current metallicity measurements may be inadequate for measuring the lowest metallicities in the sample and we could be underestimating the amount of metallicity evolution.

The relation between the metallicity and the Hβ equivalent width is shown in Figure 20. There clearly is a large scatter in metallicity at all equivalent widths, suggesting that the star formation may be episodic, with periods in which bursts of star formation enhance the Hβ equivalent widths in objects where previous star formation has raised the metallicity.

Fig. 20.— The oxygen abundance versus the rest frame Hβ equivalent width for the [O III] selected samples. One sigma errors are shown for the oxygen abundances and one sigma lower limits are shown with upward pointing arrows. The dotted line shows the metallicity of I Zw 18. [Horizontal axis: EW (Hβ), 10 to 1000.]

With better spectra and more accurate metal estimates we should be able to refine these tests and also determine whether the number density of objects versus metallicity is consistent with that expected in a simple growth model. Perhaps even more importantly, as larger spectroscopic samples are obtained we should be able to determine if there is a floor on the metallicity corresponding to the enrichment in the intergalactic gas.
Within the errors we have yet to find an object with lower metallicity than the lowest metallicity local galaxies, but this could easily change as the observations are improved.

9. SUMMARY

We have described the results of spectroscopic observations of a narrowband selected sample of extreme emission line objects. The results show that such objects are common in the z = 0−1 redshift interval and produce about 5-10% of the star formation seen in ultraviolet or emission line measurements at these redshifts. A very large fraction of the strong emitters are detected in the [O III]λ4363 line and oxygen abundances can be measured using the direct method. The abundances primarily lie in the 12+log(O/H) range of 7−8 characteristic of XMPGs.

The results suggest that we are seeing early chemical enrichment of startup galaxies at these redshifts which are forming in relatively chemically regions. As we develop larger samples of these objects and improve the accuracy of their abundance estimates we should be able to test this model and to determine if there is a floor on the metallicity of the galaxies.

We are indebted to the staff of the Subaru and Keck observatories for their excellent assistance with the observations. We acknowledge the Subaru/SuprimeCam support astronomer, Hisanori Furusawa, for his help over several years with the observations. Y. Kakazu acknowledges invaluable advice from Lisa J. Kewley and Roberto Terlevich on metallicity measurements. This work was supported in part by NSF grants AST04-07374 (LLC) and AST06-87850 (EMH), and Spitzer grant JPL 1289080 (EMH).

REFERENCES

Ajiki, M., et al. 2003, AJ, 126, 2091, astro-ph/0307325
Ajiki, M., et al. 2006, PASJ, 58, 113, astro-ph/0511471
Capak, P., et al. 2004, AJ, 127, 180, astro-ph/0312635
Corbin, M. R., Vacca, W. D., Cid Fernandes, R., Hibbard, J. E., Somerville, R. S., & Windhorst, R. A. 2006, ApJ, 651, 861, astro-ph/0607280
Cowie, L. L., Songaila, A., Hu, E. M., & Cohen, J. G.
1996, AJ, 112, 839, astro-ph/9606079
Faber, S. M., et al. 2003, Proc. SPIE, 4841, 1657
Garnett, D. R. 1992, AJ, 103, 1330
Giavalisco, M., et al. 2004, ApJ, 600, L93, astro-ph/0309105
Hoyos, C., Koo, D. C., Phillips, A. C., Willmer, C. N. A., & Guhathakurta, P. 2005, ApJ, 635, L21, astro-ph/0511066
Hu, E. M., Cowie, L. L., & McMahon, R. G. 1998, ApJ, 502, L99, astro-ph/9801003
Hu, E. M., Cowie, L. L., Capak, P., Hayashino, T., & Komiyama, Y. 2004, AJ, 127, 563, astro-ph/0311528
Hu, E. M., Cowie, L. L., Kakazu, Y., & Capak, P. 2007, in preparation
Izotov, Y. I., Thuan, T. X., & Guseva, N. G. 2005, ApJ, 415, 87, astro-ph/0506498
Izotov, Y. I., Papaderos, P., Guseva, N. G., Fricke, K. J., & Thuan, T. X. 2006a, A&A, 454, 137, astro-ph/0604234
Izotov, Y. I. 2006b, in ASP Conf. Ser. 353, Stellar Evolution at Low Metallicity: Mass Loss, Explosions, Cosmology, ed. H. J. G. L. M. Lamers, N. Langer, T. Nugis, & K. Annuk (San Francisco: ASP), 349
Izotov, Y. I., Stasińska, G., Meynet, G., Guseva, N. G., & Thuan, T. X. 2006c, A&A, 448, 955, astro-ph/0511644
Kennicutt, R. C., Jr. 1998, ARA&A, 36, 189, astro-ph/9807187
Kewley, L. J., & Dopita, M. A. 2002, ApJS, 142, 35, astro-ph/0206495
Kewley, L. J., Brown, W. R., Geller, M. J., Kenyon, S. J., & Kurtz, M. J. 2007, ApJ, in press, astro-ph/0609246
Kniazev, A. Y., Grebel, E. K., Hao, L., Strauss, M. A., Brinkmann, J., & Fukugita, M. 2003, ApJ, 593, L73, astro-ph/0307401
Kobulnicky, H. A., & Kewley, L. J. 2004, ApJ, 617, 240, astro-ph/0408128
Kobulnicky, H. A., et al. 2003, ApJ, 599, 1006, astro-ph/0310346
Kunth, D., & Östlin, G. 2000, A&A Rev., 10, 1, astro-ph/9911094
Leitherer, C., et al. 1999, ApJS, 123, 3, astro-ph/9902334
Lilly, S. J., Carollo, C. M., & Stockton, A. N. 2003, ApJ, 597, 730, astro-ph/0307300
Ly, C., et al. 2007, ApJ, 657, 738, astro-ph/0610846
Maier, C., Lilly, S. J., Carollo, C. M., Meisenheimer, K., Hippelein, H., & Stockton, A. 2006, ApJ, 639, 858, astro-ph/0511255
Melbourne, J., & Salzer, J. J.
2002, AJ, 123, 2302, astro-ph/0202301
Miyazaki, S., et al. 2002, PASJ, 54, 833, astro-ph/0211006
Oey, M. S. 2006, in ASP Conf. Ser. 353, Stellar Evolution at Low Metallicity: Mass Loss, Explosions, Cosmology, ed. H. J. G. L. M. Lamers, N. Langer, T. Nugis, & K. Annuk (San Francisco: ASP), 253
Oke, J. B. 1990, AJ, 99, 1621
Osterbrock, D. E. 1989, Astrophysics of gaseous nebulae and active galactic nuclei (Mill Valley, CA: University Science Books)
Pagel, B. E. J., Simonson, E. A., Terlevich, R. J., & Edmunds, M. G. 1992, MNRAS, 255, 325
Papovich, C., et al. 2004, ApJ, 600, L111, astro-ph/0310888
Pilyugin, L. S., & Thuan, T. X. 2005, ApJ, 631, 231
Popescu, C. C., Hopp, U., Hagen, H. J., & Elsaesser, H. 1996, A&AS, 116, 43, astro-ph/9510127
Pustilnik, S. A., Engels, D., Kniazev, A. Y., Pramskij, A. G., Ugryumov, A. V., & Hagen, H.-J. 2006, Astronomy Letters, 32, 228, astro-ph/0508255
Salzer, J. J., et al. 2000, AJ, 120, 80, astro-ph/0004074
Sargent, W. L. W., & Searle, L. 1970, ApJ, 162, L155
Schiminovich, D., et al. 2005, ApJ, 619, L47, astro-ph/0411424
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525, astro-ph/9710327, used to estimate dust extinctions from source location by the NASA/IPAC Infrared Science Archive Dust Reddening and Extinction Tool (http://irsa.ipac.caltech.edu/applications/DUST/)
Seaton, M. J. 1975, MNRAS, 170, 475
Skillman, E. D., Kennicutt, R. C., & Hodge, P. W. 1989, ApJ, 347, 875
Stanek, K. Z., et al. 2006, Acta Astron., 56, 333, astro-ph/0604113
Stasińska, G., Schaerer, D., & Leitherer, C. 2002, Ap&SS, 281, 335
McMahon, R. G. 2003, MNRAS, 342, 439, astro-ph/0302212
Stern, D., Bunker, A., Spinrad, H., & Dey, A. 2000, ApJ, 537, 73, astro-ph/0002239
Stockton, A., & Ridgway, S. E. 1998, AJ, 115, 1340, astro-ph/9801056
Thuan, T. X., & Izotov, Y. I. 2005, ApJS, 161, 240, astro-ph/0507209
Tran, K.-V. H., Lilly, S. J., Crampton, D., & Brodwin, M. 2004, ApJ, 612, L89, astro-ph/0407648
Turnshek, D. A., Bohlin, R. C., Williamson II, R.
L., Lupie, O. L., Koornneef, J., & Morgan, D. H. 1990, AJ, 99, 1243
Ugryumov, A. V., et al. 1999, A&AS, 135, 511
Wilson, G., Cowie, L. L., Barger, A. J., & Burke, D. J. 2002, AJ, 124, 1258, astro-ph/0203168
Wyder, T. K., et al. 2005, ApJ, 619, L15, astro-ph/0411364

TABLE 2
NB816 Emission-Line Sample

No.  RA(2000)   Dec(2000)  N(AB)  Z(AB)  I      R      V       B       z
  1  40.115555  −1.694722  23.92  25.28  24.84  25.44  −99.00  −99.00  −1.0000
  2  40.116665  −1.617361  24.26  25.88  25.52  25.74  −99.00  −99.00  −1.0000
  3  40.138332  −1.405639  23.49  24.94  24.70  25.22  −99.00  −99.00   0.6343
  4  40.174721  −1.704750  24.27  25.46  25.35  25.82  −99.00  −99.00   0.6355
  5  40.183056  −1.495417  24.58  25.28  25.85  26.86  −99.00  −99.00   5.6886
  6  40.216946  −1.494805  24.80  26.08  26.31  26.14  −99.00  −99.00   0.2416
  7  40.250832  −1.744639  23.51  24.35  24.33  24.54  −99.00  −99.00  −1.0000
  8  40.276112  −1.518139  24.72  24.78  25.53  25.75  −99.00  −99.00  −1.0000
  9  40.276390  −1.623250  24.55  25.36  25.36  25.76  −99.00  −99.00  −1.0000
 10  40.284168  −1.453583  21.82  23.20  23.02  22.56  −99.00  −99.00   0.2480
 11  40.298889  −1.447389  24.32  25.95  25.42  25.71  −99.00  −99.00  −1.0000
 12  40.304165  −1.391694  20.88  22.35  22.06  22.32  −99.00  −99.00  −1.0000
 13  40.306946  −1.638500  24.91  25.67  25.72  26.02  −99.00  −99.00  −1.0000
 14  40.311111  −1.535111  24.08  25.93  25.49  25.71  −99.00  −99.00   0.6240
 15  40.318333  −1.548222  24.60  25.03  25.50  25.96  −99.00  −99.00  −1.0000
 16  40.318890  −1.430889  23.37  23.77  24.32  24.42  −99.00  −99.00   1.1804
 17  40.319168  −1.446333  23.60  23.85  24.43  24.56  −99.00  −99.00   1.1873
 18  40.320835  −1.778028  20.70  21.57  21.60  21.86  −99.00  −99.00  −1.0000
 19  40.324165  −1.409972  24.29  24.38  25.18  25.45  −99.00  −99.00  −1.0000
 20  40.326111  −1.709805  23.24  24.75  24.62  24.96  −99.00  −99.00  −1.0000
 21  40.336388  −1.570194  24.26  24.73  25.06  25.33  −99.00  −99.00  −1.0000
 22  40.337223  −1.388194  24.81  27.37  26.71  26.77  −99.00  −99.00  −1.0000
 23  40.337502  −1.658306  24.89  25.20  25.76  26.05  −99.00  −99.00  −1.0000
 24  40.340279  −1.689472  24.55  26.37  26.07  26.51  −99.00  −99.00  −1.0000
 25  40.340557  −1.551889  24.99  25.70  25.93  26.35  −99.00  −99.00  −1.0000
 26  40.340832  −1.371222  22.42  23.43  23.29  23.16  −99.00  −99.00  −1.0000
 27  40.340832  −1.516250  24.89  25.06  25.69  25.78  −99.00  −99.00  −1.0000
 28  40.341110  −1.493500  23.37  24.62  24.55  24.23  −99.00  −99.00  −1.0000
 29  40.341389  −1.484139  24.48  25.46  25.49  25.73  −99.00  −99.00  −1.0000
 30  40.342777  −1.599528  24.65  25.31  25.50  25.85  −99.00  −99.00  −1.0000
 31  40.343056  −1.438833  23.17  24.60  24.54  24.54  −99.00  −99.00   0.6226
 32  40.347500  −1.403833  24.86  27.45  26.16  26.35  −99.00  −99.00   0.6324
 33  40.349724  −1.598472  23.11  23.87  24.11  24.46  −99.00  −99.00   1.1956
 34  40.356388  −1.515722  24.94  26.19  26.18  26.42  −99.00  −99.00  −5.0000
 35  40.372223  −1.390361  23.85  24.68  24.72  24.62  −99.00  −99.00  −1.0000
 36  40.373611  −1.722528  24.93  25.85  25.74  25.88  −99.00  −99.00  −1.0000
 37  40.377777  −1.701889  23.45  23.82  24.26  24.70  −99.00  −99.00  −1.0000
 38  40.388054  −1.697361  23.96  24.32  24.79  24.90  −99.00  −99.00  −1.0000
 39  40.388889  −1.573361  24.79  25.13  25.65  26.07  −99.00  −99.00  −1.0000
 40  40.394444  −1.521389  22.73  23.69  23.84  24.06  −99.00  −99.00   0.6292

Note. — Magnitudes are measured in 3′′ diameter apertures. An entry of ‘−99’ indicates that no excess flux was measured. −1.0000 in the redshift column means no spectroscopic data were obtained for the object. This is a sample table showing the first entries of the electronic version of the table that will accompany the published paper.

TABLE 3
NB912 Emission-Line Sample

No.  R.A. (J2000.0)  Decl. (J2000.0)  N(AB)  z′(AB)  I      R      V       B       zspec
(1)  (2)             (3)              (4)    (5)     (6)    (7)    (8)     (9)     (10)
  1  40.131668       −1.408361        23.86  25.08   25.89  25.97  −99.00  −99.00   0.8371
  2  40.133888       −1.575222        24.76  25.92   26.13  25.84  −99.00  −99.00   1.4498
  3  40.148056       −1.593555        23.21  24.62   25.53  26.11  −99.00  −99.00   0.8207
  4  40.148335       −1.725417        24.76  25.97   26.98  26.39  −99.00  −99.00   0.8111
  5  40.150833       −1.737556        23.31  24.35   24.88  25.32  −99.00  −99.00   0.8269
  6  40.153332       −1.536833        23.64  25.00   26.18  26.87  −99.00  −99.00   0.8301
  7  40.156387       −1.765833        21.93  23.05   23.51  23.74  −99.00  −99.00  −1.0000
  8  40.165833       −1.580056        24.94  26.20   25.92  25.84  −99.00  −99.00   1.4482
  9  40.183334       −1.389583        21.77  22.79   23.33  23.52  −99.00  −99.00   0.8325
 10  40.184444       −1.596444        23.09  24.50   25.30  25.55  −99.00  −99.00   0.8293
 11  40.193611       −1.690083        24.32  25.40   25.83  25.69  −99.00  −99.00   0.8266
 12  40.194168       −1.373722        23.93  24.93   25.21  25.35  −99.00  −99.00  −1.0000
 13  40.194721       −1.373917        24.87  25.90   26.30  26.47  −99.00  −99.00  −1.0000
 14  40.196667       −1.378333        24.07  25.42   26.05  26.18  −99.00  −99.00  −1.0000
 15  40.202221       −1.584472        24.22  25.30   25.81  26.19  −99.00  −99.00   0.8289
 16  40.203335       −1.471861        24.77  26.20   25.78  25.54  −99.00  −99.00   0.3965
 17  40.214722       −1.519917        23.14  24.40   25.38  25.94  −99.00  −99.00   0.8288
 18  40.220276       −1.753778        24.34  25.41   26.40  26.29  −99.00  −99.00  −1.0000
 19  40.220833       −1.388556        23.24  24.36   25.13  24.99  −99.00  −99.00  −1.0000
 20  40.226944       −1.542111        23.02  24.47   25.77  25.48  −99.00  −99.00   0.8208
 21  40.229168       −1.720889        24.99  27.85   27.11  · · ·  −99.00  −99.00   6.4800
 22  40.229443       −1.376472        23.75  24.98   25.72  24.66  −99.00  −99.00  −1.0000
 23  40.245834       −1.578972        24.61  25.82   27.22  26.91  −99.00  −99.00   0.8285
 24  40.280556       −1.421056        24.82  25.86   26.13  25.91  −99.00  −99.00  −1.0000
 25  40.290833       −1.746361        23.89  25.16   25.81  25.03  −99.00  −99.00   0.0000
 26  40.323891       −1.697667        24.73  25.80   26.47  25.71  −99.00  −99.00  −1.0000
 27  40.330833       −1.612389        23.03  24.68   26.06  25.47  −99.00  −99.00   0.0000
 28  40.339722       −1.395361        23.87  25.54   26.98  25.45  −99.00  −99.00   0.3889
 29  40.346668       −1.448305        23.93  24.98   25.37  25.48  −99.00  −99.00   0.8274
 30  40.382500       −1.554056        23.52  24.87   25.60  25.08  −99.00  −99.00   0.3930
 31  40.393055       −1.713694        24.94  26.83   26.70  26.50  −99.00  −99.00  −1.0000
 32  40.398888       −1.466417        23.87  24.91   24.83  24.64  −99.00  −99.00   1.4590
 33  40.403889       −1.530167        24.88  26.08   27.44  27.30  −99.00  −99.00   0.8223
 34  40.409443       −1.369222        21.52  23.44   24.47  24.04  −99.00  −99.00  −1.0000
 35  40.411667       −1.691417        24.41  25.50   25.68  25.55  −99.00  −99.00  −1.0000
 36  40.424999       −1.454028        21.91  22.93   23.42  23.41  −99.00  −99.00  −1.0000
 37  40.430000       −1.501111        23.13  24.21   24.76  24.92  −99.00  −99.00   0.8267
 38  40.446110       −1.676139        24.91  26.09   26.37  26.04  −99.00  −99.00  −1.0000
 39  40.478889       −1.534278        24.87  26.06   25.79  25.56  −99.00  −99.00   0.3861
 40  40.506111       −1.755111        22.73  24.27   25.24  25.19  −99.00  −99.00  −1.0000
 41  40.511665       −1.596944        24.73  25.96   25.55  25.67  −99.00  −99.00  −1.0000
 42  40.518055       −1.666139        22.47  23.73   24.37  24.45  −99.00  −99.00  −1.0000

Note. — Magnitudes are measured in 3′′ diameter apertures. An entry of ‘−99’ indicates that no excess flux was measured. −1.0000 in the redshift column means no spectroscopic data were obtained for the object. This is a sample table showing the first entries of the electronic version of the table that will accompany the published paper.
TABLE 4
Line Fluxes and Oxygen Abundances for NB816-selected emitters
Object  f([OIII]5007)  f([OIII]4959)  f([OIII]4363)  f([OII]3727)  Te[OIII]              12+log(O/H)
[OIII] emitters
31      513.6 ± 24.0   222.3 ± 11.4   < 6.60         54.4 ± 4.62   < 1.19                > 8.09
40      577.9 ± 21.6   191.3 ± 8.05   9.40 ± 3.40    140.9 ± 6.22  1.14 < 1.34 < 1.53    7.86 < 8.03 < 8.25
51      401.5 ± 12.6   146.9 ± 5.52   9.40 ± 4.50    < 2.39        1.19 < 1.55 < 1.90    7.51 < 7.62 < 7.94
76      464.4 ± 10.5   191.3 ± 4.86   < 2.90         344.5 ± 8.07  < 0.95                > 8.55
118     492.6 ± 29.7   193.9 ± 12.9   34.4 ± 12.8    11.3 ± 2.61   2.16 < 3.08 < 4.32    6.93 < 7.16 < 7.44
195     335.0 ± 21.4   129.5 ± 10.2   24.0 ± 10.9    97.1 ± 9.71   2.02 < 3.17 < 4.86    6.78 < 7.06 < 7.44
206     597.1 ± 19.5   204.1 ± 7.41   21.6 ± 9.20    < 1.72        1.48 < 1.97 < 2.48    7.42 < 7.55 < 7.84
208     658.0 ± 30.9   249.8 ± 12.3   15.6 ± 8.00    < 22.1        1.16 < 1.56 < 1.93    7.67 < 7.85 < 8.22
223     242.9 ± 15.3   83.3 ± 7.53    23.7 ± 18.9    < 10.1        1.45 < 4.64 < 19.62   6.14 < 6.61 < 7.53
252     466.8 ± 9.32   157.9 ± 3.57   9.20 ± 4.00    139.0 ± 3.71  1.16 < 1.45 < 1.72    7.68 < 7.87 < 8.14
Note: Only emitters with > 15σ Hβ fluxes are listed. All fluxes are normalized by their f(Hβ) and multiplied by 100. 1σ upper limits are listed for [OII]3727 fluxes below 3σ and [OIII]4363 fluxes below 1σ. Te[OIII] is given in units of 10^4 K.

THE YOUNGEST GALAXIES 17

Fig. 17. HST/ACS (B, V, z’) composite images of NB816 emitters in the GOODS-N field with overlaid object IDs from Table 2 and redshifts, where known. Fields are 12.′′5 on a side.
TABLE 5
Line Fluxes and Oxygen Abundances for NB912-selected emitters
Object  f([OIII]5007)  f([OIII]4959)  f([OIII]4363)  f([OII]3727)  Te[OIII]              12+log(O/H)
[OIII] emitters
3       550.9 ± 12.9   187.9 ± 4.91   23.9 ± 7.90    8.6 ± 2.5     1.74 < 2.20 < 2.71    7.26 < 7.43 < 7.65
6       588.1 ± 35.1   216.0 ± 14.2   18.4 ± 11.0    52.0 ± 8.6    1.20 < 1.79 < 2.39    7.40 < 7.68 < 8.14
9       442.3 ± 15.3   154.7 ± 6.42   < 12.3         157.2 ± 6.88  < 1.70                > 7.70
10      490.0 ± 11.9   178.7 ± 5.14   13.7 ± 4.40    < 2.95        1.42 < 1.69 < 1.97    7.55 < 7.61 < 7.82
17      342.5 ± 20.0   129.7 ± 9.29   15.9 ± 9.40    < 7.72        1.41 < 2.26 < 3.28    7.03 < 7.22 < 7.70
20      418.7 ± 17.4   135.1 ± 6.96   16.8 ± 5.50    24.5 ± 2.45   1.69 < 2.11 < 2.57    7.18 < 7.36 < 7.58
239     202.4 ± 10.2   75.6 ± 6.40    < 8.20         190.6 ± 10.2  < 2.08                > 7.34
270     351.7 ± 15.1   149.7 ± 7.73   12.4 ± 3.40    30.7 ± 2.72   1.59 < 1.87 < 2.16    7.28 < 7.43 < 7.61
Hα emitters
52      589.1 ± 10.0   179.2 ± 3.42   18.3 ± 1.59    < 1.56        1.75 < 1.83 < 1.92    7.62 < 7.67 < 7.72
60      619.1 ± 33.5   206.7 ± 12.5   10.7 ± 7.77    48.4 ± 8.7    0.90 < 1.37 < 1.77    7.67 < 7.96 < 8.57
266     682.8 ± 10.3   217.7 ± 3.57   14.7 ± 2.42    ...           1.40 < 1.52 < 1.63    < 8.3
Note: Same as Table 4 but for NB912 emitters. The [OII]λ3727 line of ID266 lies beyond the DEIMOS wavelength coverage and thus was not measured.

Fig. 18. HST/ACS (B, V, I) composite images of NB912 emitters in the GOODS-N field with overlaid object IDs from Table 3 and redshifts, where known. Fields are 12.′′5 on a side.

ABSTRACT
We describe results of a narrow band search for ultra-strong emission line galaxies (USELs) with EW(Hβ) > 30 Å. 542 candidate galaxies are found in a half square degree survey using two ~100 Å wide filters centered at 8150 Å and 9140 Å with Subaru/SuprimeCam. Followup spectroscopy for randomly selected objects in the sample with KeckII/DEIMOS shows they consist of [OIII] 5007, [OII] 3727, and Hα selected strong-emission line galaxies at intermediate redshifts (z < 1), and Lyα emitting galaxies at high-redshift (z >> 5).
We determine the Hβ luminosity functions and the star formation density of the USELs, which is 5-10% of the value found from ultraviolet continuum objects at z=0-1, suggesting they correspond to a major epoch in galaxy formation at these redshifts. Many USELs show the temperature-sensitive [OIII] 4363 auroral lines and about a dozen have oxygen abundances characteristic of eXtremely Metal Poor Galaxies (XMPGs). These XMPGs are the most distant known today. Our high yield rate of XMPGs suggests this is a powerful method for finding such populations. The lowest metallicity measured in our sample is 12+log(O/H) = 7.06 (6.78-7.44), close to the minimum metallicity found in local galaxies. The luminosities, metallicities and star formation rates of USELs are consistent with the strong emitters being start-up intermediate mass galaxies and suggest that galaxies are still forming in relatively chemically pristine sites at z < 1.
<|endoftext|><|startoftext|> Submitted to ApJ Letters December 18, 2006; Accepted April 04, 2007
DISCOVERY OF EXTREME ASYMMETRY IN THE DEBRIS DISK SURROUNDING HD 15115
Paul Kalas, Michael P. Fitzgerald, James R. Graham
ABSTRACT
We report the first scattered light detection of a dusty debris disk surrounding the F2V star HD 15115 using the Hubble Space Telescope in the optical, and Keck adaptive optics in the near-infrared. The most remarkable property of the HD 15115 disk relative to other debris disks is its extreme length asymmetry. The east side of the disk is detected to ∼315 AU radius, whereas the west side of the disk has radius >550 AU. We find a blue optical to near-infrared scattered light color relative to the star that indicates grain scattering properties similar to the AU Mic debris disk.
The existence of a large debris disk surrounding HD 15115 adds further evidence for membership in the β Pic moving group, which was previously argued based on kinematics alone. Here we hypothesize that the extreme disk asymmetry is due to dynamical perturbations from HIP 12545, an M star 0.5° (0.38 pc) east of HD 15115 that shares a common proper motion vector, heliocentric distance, galactic space velocity, and age.
Subject headings: stars: individual(HD 15115) - circumstellar matter
1. INTRODUCTION
Volume-limited, far-infrared surveys of the solar neighborhood suggest that ∼15% of main sequence stars have excess thermal emission indicative of circumstellar grains (Aumann 1985; Backman & Paresce 1993; Meyer et al. 2007). Direct imaging of dust scattered light reveals the geometry of the grain population relative to the star, which further elucidates the origin of dust. In some cases, a circumstellar nebulosity may be amorphous with asymmetric striated features produced when stellar radiation pressure deflects interstellar dust (Kalas et al. 2002). In other cases, such as β Pictoris and Fomalhaut, the geometry of dust is consistent with a circumstellar disk or belt related to the formation of planetesimals (Smith & Terrile 1984; Kalas et al. 2005). Though larger bodies such as comets and asteroids are not directly observed, they most likely exist as a reservoir for injecting fresh debris into the system over the lifetime of the star. Furthermore, circumstellar debris disks display significant structure and asymmetry that may be linked, in principle, to dynamical perturbations from a planetary system (Roques et al. 1994; Liou & Zook 1999; Moro-Martín & Malhotra 2002). Unfortunately, only ∼10% of stars with excess thermal emission have detected scattered light disks due to the high contrast between the host star and the low surface brightness nebulosity at optical and near-infrared wavelengths.
Fortunately, the observational capabilities have improved in recent years due to instrument upgrades on the Hubble Space Telescope (HST) and the implementation of adaptive optics (AO) on large, ground-based telescopes.
Here we show new scattered light images of a debris disk surrounding HD 15115, an F2 star at 45 pc (Table 1), first reported as a source of thermal excess emission by Silverstone (2000). The spectral energy distribution is consistent with a single temperature dust belt at ∼35 AU radius with an estimated dust mass of 0.047 M⊕ (Zuckerman & Song 2004; Williams & Andrews 2006). Recently, Moór et al. (2006) identified HD 15115 as a candidate for membership in the 12 Myr-old β Pic moving group (BPMG), based on new radial velocity measurements that resulted in galactic kinematics similar to those of the BPMG.
1 Astronomy Department and Radio Astronomy Laboratory, 601 Campbell Hall, Berkeley, CA 94720
2 National Science Foundation Center for Adaptive Optics, University of California, Santa Cruz, CA 95064
2. OBSERVATIONS & DATA ANALYSIS
We first detected the HD 15115 disk in scattered light using the HST ACS High Resolution Camera (HRC) on 2006 July 17. We used the F606W broadband filter and the 1.8′′ diameter occulting spot to artificially eclipse the star. Three flatfielded frames of 700 seconds each from standard pipeline processing of the HST data archive were median combined for cosmic ray rejection. The point spread function was then subtracted iteratively using five other stars of similar spectral type obtained from the HST archive. The relative intensity scaling between images was iteratively adjusted until the residual image showed a mean radial profile equal to zero intensity perpendicular to the circumstellar disk. The images were then corrected for geometric distortion, giving a 25 mas pixel−1 scale.
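The two HST reduction steps described above, median combination for cosmic-ray rejection and subtraction of a scaled reference PSF, can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline actually used: the function names and toy arrays are hypothetical, and the scale factor is computed in closed form so that the mean residual vanishes over pixels assumed to be free of disk light, which stands in for the iterative adjustment described in the text.

```python
import numpy as np


def median_combine(frames):
    """Pixel-wise median of registered, equally exposed frames.

    A cosmic-ray hit that appears in only one frame is rejected,
    because the median ignores a single outlier among three values.
    """
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])
    return np.median(stack, axis=0)


def subtract_scaled_psf(target, reference, off_disk):
    """Subtract a reference-star PSF scaled to the target.

    `off_disk` is a boolean mask of pixels assumed to contain no disk
    light (an assumption of this sketch). The scale is chosen so the
    mean residual over those pixels is zero.
    """
    scale = target[off_disk].sum() / reference[off_disk].sum()
    return target - scale * reference, scale


# Toy data (hypothetical): a reference PSF, a faint "disk" pixel,
# and a target that is three times brighter than the reference.
reference = np.array([[4.0, 2.0], [2.0, 1.0]])
disk = np.array([[0.0, 0.5], [0.0, 0.0]])
clean = 3.0 * reference + disk

frames = [clean.copy(), clean.copy(), clean.copy()]
frames[1][1, 1] += 1000.0  # cosmic-ray hit in one frame only

combined = median_combine(frames)
off_disk = disk == 0.0
residual, scale = subtract_scaled_psf(combined, reference, off_disk)
```

In this toy case the median removes the single-frame hit, the recovered scale is 3, and the residual image contains only the disk pixel. Real data would also need image registration and a choice of off-disk region, neither of which is modeled here.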
The resulting optical images revealed a needle-like feature projecting westward from the star to the edge of the field, but with almost no counterpart to the east. Given the high degree of asymmetry that could conceivably arise from instrumental scattering, we endeavored to confirm the disk using the Keck II telescope with AO on 2006 October 07 and 2007 January 26. Utilizing the near-infrared camera NIRC2, a 0.4′′ radius occulting spot and a 10 mas pixel scale, we confirmed the existence of the disk in J (1.2 µm), H (1.6 µm), and K′ (2.2 µm). PSF subtraction is accomplished by allowing the sky to rotate relative to the detector, thereby separating the stellar PSF from the disk. The observing procedure and data reduction procedure are fully described in Fitzgerald et al. (2007). Due to poor observing conditions in October, including intermittent cirrus clouds, we used only the best fraction of data by visually selecting frames of relatively constant intensity and PSF sharpness. The resulting effective integration times are 450 s, 980 s, and 600 s for J, H, and K′, respectively. Standard star observations were obtained under similar, non-photometric conditions and processed in a similar manner.
http://arxiv.org/abs/0704.0645v1
In January 2007 we re-observed HD 15115 (1930 s cumulative integration time) and two standard stars under photometric conditions from Keck using the same instrumentation with the H broadband filter. However, the observations were made after meridian transit and the limited rotation of the sky relative to the instrumental PSF causes disk emission to be included in the PSF estimate, resulting in disk self-subtraction at small radii. Our analysis of the 2007 January data therefore yields a detection of the west ansa in the region 1.3′′ − 3.3′′ radius.
The photometry in this second epoch agrees well with that of the first epoch (on average, the 2007 January disk photometry is 0.13 mag arcsec−2 fainter than 2006 October), suggesting that our frame selection technique for the first epoch of cloudy conditions effectively filtered out non-photometric data.
3. RESULTS
Fig. 1 shows the PSF-subtracted images of HD 15115 with HST and with Keck. The west side of the disk in the optical HST data has PA = 278.5° ± 0.5° and is detected from the edge of the occulting spot at 1.5′′ (67 AU) radius to the edge of the field at 12.38′′ (554 AU) radius. The east midplane is detected as far as ∼7′′ (315 AU) radius. At this radius the east midplane begins to intersect the outer portion of the coronagraph’s 3.0′′ occulting spot. Further east, past the spot and to the edge of the field, no nebulosity is detected between 9.0′′ and 14.9′′ radius. The appearance of the disk is more symmetric in the 2006 October Keck data, which show the disk between 0.7′′ (31 AU) and 2.5′′ (112 AU) radius.
Optical surface brightness contours (Fig. 2) reveal a sharp midplane morphology for the west extension that indicates an edge-on orientation to the line of sight. The west midplane is qualitatively similar to that of β Pic’s northeast midplane, including a characteristic width asymmetry (Kalas & Jewitt 1995; Golimowski et al. 2006). The northern side of the west midplane is more vertically extended than the southern side. For example, the full-width at half-maximum across the disk midplane at 2′′ radius is 0.19′′ ± 0.10′′ in both the optical and NIR data. However, the vertical cuts are not symmetric about the midplane when measuring the half-width at quarter-maximum (HWQM). The HWQM north of the west midplane is 1.6 ± 0.1 times greater than the HWQM south of the west midplane. This width asymmetry is confirmed in the Keck data.
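The projected radii quoted in this section follow from the small-angle relation, in which an angle of 1′′ at a distance of 1 pc subtends 1 AU. The short Python sketch below reproduces the conversions under the assumption that the Hipparcos distance of 44.78 pc from Table 1 is used; the function and variable names are hypothetical and chosen only for this illustration.

```python
DISTANCE_PC = 44.78  # Hipparcos distance to HD 15115 (Table 1)


def projected_au(angle_arcsec, distance_pc=DISTANCE_PC):
    """Projected separation in AU for a small angle in arcseconds.

    Uses the definition of the parsec: 1 arcsec at 1 pc is 1 AU.
    """
    return angle_arcsec * distance_pc


inner_edge = projected_au(1.5)     # edge of the HST occulting spot, about 67 AU
west_extent = projected_au(12.38)  # edge of the HST field, about 554 AU
east_extent = projected_au(7.0)    # approximate east detection limit
keck_inner = projected_au(0.7)     # about 31 AU
keck_outer = projected_au(2.5)     # about 112 AU

# The west radius is set by the field of view, so this is a lower limit.
length_ratio = west_extent / east_extent
```

The east value comes out near 313 AU because the text quotes the angular limit only approximately (∼7′′); the resulting west-to-east ratio is about 1.77 and is a lower limit for the reason given in the comment.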
If the width asymmetry is found to be in the opposite direction in the opposite midplane, then Kalas & Jewitt (1995) refer to such a feature as the butterfly asymmetry. The butterfly asymmetry is evident in the morphology of β Pic, which Golimowski et al. (2006) recently related to the presence of a second disk midplane tilted relative to the main disk midplane. However, our detection of HD 15115’s east midplane has insufficient signal to noise to confirm the presence of a width asymmetry here.
We note that none of the surface brightness profiles show evidence for significant flattening inward toward the star (Fig. 3). All four surface brightness profiles are well-represented by a single power law decrease with radius. If there is an inner dust depletion, then it resides within 40 AU radius. This constraint is consistent with model fits of the spectral energy distribution that place the dominant emitting dust component at ∼35 AU radius (Zuckerman & Song 2004; Williams & Andrews 2006).
The color of the disk may be estimated in the 2.0′′ − 3.3′′ region where the H-band and V-band data overlap (Fig. 3). At face value, ΣV − ΣH ≈ −0.6 mag arcsec−2 at 2′′ radius, increasing to −1.9 mag arcsec−2 at 3.3′′ radius for the west disk extension. The east ansa has similar blue scattering at 2′′ radius, but the V-band surface brightness profile is steeper in the east than in the west, giving a roughly constant blue color with increasing radius in the east.
In a future paper we will present a detailed model of dust scattering and thermal emission properties, which requires a more complicated treatment of the obvious disk asymmetry. However, for isotropically scattering grains in an edge-on disk, an analytic approach shows that the grain number density distribution as a function of radius within the disk midplane follows a power law whose exponent is shallower by one than that of the sky-projected radial midplane profile. From the Keck data in Fig.
3, we estimate that the disk number density distribution decreases with disk radius as r^−3 in the inner region up to ∼3.3′′ radius for both sides of the disk. At > 3.3′′ radius, the optical data show that this profile continues for the east extension, but that the disk number density profile flattens for the west extension, as described in Fig. 3. A precise measurement of the color and polarization of the disk scattered light is necessary to further constrain the grain size distribution, the corresponding scattering phase function and albedo, and the effect on the disk number density profile.
4. DISCUSSION
Asymmetric disk structure is evident in the majority of debris disks, and most authors invoke planetary perturbations as the likely origin. Secular perturbations may offset the center of global disk symmetry from the location of the star, though this effect may also be produced by an external perturber (Wyatt et al. 1999). The edge-on debris disk surrounding β Pic displays a variety of radial and vertical asymmetries on large scales (Kalas & Jewitt 1995) that may be most relevant to the study of HD 15115. In the deepest optical images of the β Pic disk, the northeast and southwest disk midplanes are traced to 1835 AU and 1450 AU, respectively, giving a ratio of 1.27 (Larwood & Kalas 2001). In the case of HD 15115 the corresponding ratio is >1.75. This ratio is a lower limit given that the 550 AU extent of the west midplane is limited only by our field of view.
A single stellar flyby, or a periodic flyby by a bound companion on an eccentric orbit, has been studied as a potential mechanism for producing β Pic’s large-scale asymmetry (Larwood & Kalas 2001). However, in a kinematic study of Hipparcos-detected stars with published radial velocities, Kalas et al. (2001) did not find any perturbers that approached closer than 0.6 pc of β Pic, though the sample was estimated as only 20% complete. In the case of HD 15115, Moór et al.
(2006) noted that another β Pic moving group member, HIP 12545, is located relatively nearby in sky position.
Table 1 summarizes the observed properties of both stars. Their projected separation is 0.51°, which translates to 0.38 pc at a mean heliocentric distance of 43 pc. Within the uncertainties, the proper motion vectors, the (U, V) galactic space motions, and heliocentric distance are identical. Furthermore, the eastward location of HIP 12545 is in the direction of the truncated side of the HD 15115 debris disk. This geometry is consistent with the dynamical simulation of a disk disrupted by a stellar flyby in Larwood & Kalas (2001). Specifically, in their Fig. 18, the long end of a highly perturbed disk is located in the direction of periastron. The perturber follows a parabolic trajectory such that in a later epoch it is located in the direction opposite of periastron, or in the direction of the truncated side of the disk. Periastron in these models is ∼700 AU, with an initial disk radius of ∼500 AU. Overall, the ensemble of evidence favors further consideration of HD 15115 and HIP 12545 as a possible wide-separation multiple system with a highly eccentric orbit (e > 0.95).
If the heliocentric distances are in fact nearly equivalent, then the projected sky separation approximates the true separation. Kalas et al. (2001) discuss the Roche radius, a_t, of a star as containing the volume within which the stellar potential dominates the Galactic tidal field. Using their Eq. 2 and the stellar mass estimates in our Table 1, we find a_t = 1.1 pc and a_t = 0.7 pc for HD 15115 and HIP 12545, respectively. Therefore, for a small body gravitationally bound to HD 15115, the potential well of HIP 12545 exerts a more significant perturbing force than the Galactic tidal field at the current epoch. This is not the case if we take the Hipparcos parallaxes at face value.
These give a line-of-sight separation between the stars of ∼4 pc, and we derive a 3-D separation of 5.1±2.8 pc. We further calculate that closest approach will occur ∼1 Myr in the future. Therefore, improving the parallax measurements for both stars is a critical task for future work that would examine their possible physical association.
A prediction of the Larwood & Kalas (2001) model is that the perturber may capture disk material, and display a tenuous and highly asymmetric tail of escaping material pointing away from the mother disk. To test the hypothesis that HD 15115 suffered a close encounter with HIP 12545, high-contrast observations of HIP 12545 should reveal circumstellar nebulosity due to captured material. Since this is captured material, the nebulosity may not resemble a disk, and any tail should point away from the mother disk (eastward).
To further investigate this hypothesis, we examined ACS/HRC coronagraphic observations of HIP 12545 obtained by program GO-10487 (Principal Investigator David Ardila). The observing technique is similar to that described here for HD 15115. After PSF subtraction, we do not detect nebulosity in the vicinity of HIP 12545. Therefore the possibility that the extreme disk asymmetry of HD 15115 is created by dynamical interactions with HIP 12545 does not have further supporting evidence at the present time.
Finally, we note that among the four debris disks imaged in scattered light in the BPMG, the dust appears depleted for HD 15115. The values of dust optical depth in units of 10^−4 are given as 24.3±1.1, 4.9±0.4, 29.3±1.6 and 4.0±0.3 for β Pic (A5V), HD 15115 (F2V), HD 181327 (F5.5V) and AU Mic (M2V), respectively (Moór et al. 2006). The factor of ∼5 smaller optical depth for HD 15115 compared to β Pic and HD 181327 suggests a different evolutionary path for the disk.
Though a stellar flyby is one possibility, migration and dynamical instabilities within a hypothetical planetary system may also play a role in the rapid diminution of dust parent bodies around HD 15115 (e.g. Morbidelli & Valsecchi 1997).
5. SUMMARY
Optical and near-infrared coronagraphic images of the F2 star HD 15115 reveal a highly asymmetric debris disk with an edge-on orientation. We describe the morphological and photometric properties of the disk, deferring a detailed model of scattering and thermal emission of grains to future work. The blue scattered light color may indicate grain properties most similar to those of the AU Mic debris disk, where ΣV − ΣH ≈ −1 mag arcsec−2 relative to the star (Fitzgerald et al. 2007), and less like those of β Pic, which is predominantly red scattering (Golimowski et al. 2006). A key follow-up measurement would be polarization, which in the case of AU Mic revealed highly porous macroscopic grains (Graham et al. 2007).
With outer optical radius >550 AU, HD 15115 possesses the second largest debris disk next to β Pic. However, the length asymmetry between its west and east midplanes greatly exceeds that of β Pic and other disks. HD 15115 is now the fourth debris disk discovered in scattered light among the β Pic moving group members. Future work should test our hypothesis that extreme asymmetries are due to dynamical perturbations from the nearby M star HIP 12545.
Acknowledgements: Support for GO-10896 was provided by NASA through a grant from STScI under NASA contract NAS5-26555.
REFERENCES
Augereau, J. C., Lagrange, A. M., Mouillet, D., Papaloizou, J. C. B. & Grorod, P.A. 1999, A&A, 348, 557
Aumann, H.H. 1985, PASP, 97, 885
Backman, D. E. & Paresce, F. 1993, in Protostars and Planets III, eds. E. H. Levy & J. I. Lunine, (Univ. Arizona Press, Tucson), p. 1253
Fitzgerald, M. P., Kalas, P., Duchene, G., Pinte, C. & Graham, J. R. 2007, ApJ, submitted
Golimowski, D.A., Ardila, D.R., Krist, J.R., et al. 2006, AJ, 131, 3109
Graham, J.R., Kalas, P. & Matthews, B. 2007, ApJ, 654, 595
Kalas, P. & Jewitt, D. 1995, AJ, 110, 794
Kalas, P., Deltorn, J.-M. & Larwood, J. 2001, ApJ, 533, 410
Kalas, P., Graham, J.R., Beckwith, S.V.W., Jewitt, D.C. & Lloyd, J.P. 2002, ApJ, 567, 999
Kalas, P., Graham, J.R. & Clampin, M.C. 2005, Nature, 435, 1067
Larwood, J. D. & Kalas, P. 2001, MNRAS, 323, 402
Liou, J.-C. & Zook, H. A. 1999, AJ, 118, 580
Meyer, M.R., Backman, D.E., Weinberger, A. J. & Wyatt, M. 2007, in Protostars and Planets V, in press
Moór, A., Abraham, P., Derekas, A., et al. 2006, ApJ, 644, 525
Morbidelli, A. & Valsecchi, G. B. 1997, Icarus, 128, 464
Moro-Martín, A. & Malhotra, R. 2002, AJ, 124, 2305
Roques, F., Scholl, H., Sicardy, B. & Smith, B.A. 1994, Icarus, 108, 37
Silverstone, M.D. 2000, Ph.D. thesis
Song, I., Zuckerman, B. & Bessel, M.S. 2003, ApJ, 599, 342
Smith, B.A. & Terrile, R. J. 1984, Science, 226, 1421
Strubbe, L. E. & Chiang, E. I. 2005, ApJ, 648, 652
Williams, J. P. & Andrews, S. M. 2006, ApJ, 653, 1480
Wyatt, M. C., Dermott, S.F., Telesco, C.M., et al. 1999, ApJ, 527, 918
Zuckerman, B. & Song, I. 2004, ApJ, 603, 738

Fig. 1. False-color, log scale images of the HD 15115 disk as originally discovered using the ACS/HRC F606W (λc = 591 nm, ∆λ = 234 nm) [LEFT] and confirmed in H-band with Keck II adaptive optics [RIGHT; 2006 October 26 data]. North is up, east is left and the scale bars span 5′′. In the HST image we use gray fields over the occulting bar and 3.0′′ occulting spot located to the left of HD 15115, as well as a gray disk covering PSF-subtraction artifacts surrounding HD 15115 itself. If the HD 15115 disk were a symmetric structure, then the east side of the disk would have been detected within the rectangular box, shown below the ACS/HRC occulting finger.
The NIR data [RIGHT] show a more symmetric disk within 2′′ radius, with asymmetry becoming more apparent beyond 2′′ radius. Due to poor observing conditions, the field is contaminated by residual noise due to the diffraction pattern of the telescope (e.g. at 2 o’clock and 7 o’clock relative to HD 15115). However, whereas the residual diffraction pattern noise of the telescope rotates relative to the sky orientation over a series of exposures, the image of the disk remains fixed and it is confirmed as real.

Fig. 2. Surface brightness isocontours for the HD 15115 debris disk converted from F606W to Johnson V-band (derived using STSDAS/SYNPHOT with a Kurucz model atmosphere and the appropriate observatory parameters). The disk was rotated by 8° clockwise such that the midplane lies along a horizontal line. The bottom frame is the east extension, transposed across the vertical axis, and the gray region marks the area occupied by the ACS/HRC occulting finger and 3.0′′ occulting spot. The left edge of the frame corresponds to 2′′ radius from the star. The innermost contour (bold) is 19.0 mag arcsec−2 and the outermost contour represents 23.0 mag arcsec−2, with a contour interval of 0.5 mag arcsec−2.

Fig. 3. Radial surface brightness (mag arcsec−2) distribution along the west and east midplanes of HD 15115. We plot the difference between the measured disk surface brightness and the stellar magnitudes of H=5.86 and V=6.80. Disk photometry was extracted from boxes 0.25′′ × 0.25′′ centered on the midplane. We plot a representative sample of error bars that gives the standard deviation of the background residuals as a function of radius. The aperture corrections derived from point source photometry are 0.48 and 0.57 for the H-band data in the 2006 October and 2007 January observations, respectively, and 0.70 for the V band data.
The H-band radial profiles between 0.7′′ and 2.3′′ radius may be described by power-laws with indices -3.7 and -4.4 for the west and east disk extensions, respectively. In the V band, the east midplane profile may be fit by a power-law with index -4.0 between 2.0′′ and 6.0′′ radius. Thus, our data do not show a significant color gradient as a function of radius for the east ansa. The west midplane profile in V band may be fit by a single power-law with index -3.0 between 2.0′′ and 10.0′′ radius. This profile is significantly shallower than the H band profile, resulting in a blue color gradient as a function of radius for the west ansa.

TABLE 1
Stellar Properties
                 HD 15115            HIP 12545           Ref.
Spectral Type    F2                  M0                  Hipparcos
mV (mag)         6.79                10.28               Hipparcos
Mass (M⊙)        1.6                 0.5                 Astrophys. Quant.
Distance (pc)    44.78 +2.22/−2.01   40.54 +4.38/−3.61   Hipparcos
RA (ICRS)        02 26 16.2447       02 41 25.89         Hipparcos
DEC (ICRS)       +06 17 33.188       +05 59 18.41        Hipparcos
µα (mas/yr)      86.09 ± 1.09        82.32 ± 4.46        Hipparcos
µδ (mas/yr)      −50.13 ± 0.71       −55.13 ± 2.45       Hipparcos
µα (mas/yr)      87.1 ± 1.2          82.3 ± 4.3          Tycho-2
µδ (mas/yr)      −50.9 ± 1.2         −55.1 ± 2.7         Tycho-2
U (km/s)         −13.2 ± 1.9         −14.0 ± 0.5         a
V (km/s)         −17.8 ± 1.2         −16.7 ± 0.9         a
W (km/s)         −6.0 ± 2.3          −10.0 ± 0.5         a
a Galactic kinematics for HD 15115 and HIP 12545 from Moór et al. (2006) and Song et al. (2003), respectively.

ABSTRACT
We report the first scattered light detection of a dusty debris disk surrounding the F2V star HD 15115 using the Hubble Space Telescope in the optical, and Keck adaptive optics in the near-infrared. The most remarkable property of the HD 15115 disk relative to other debris disks is its extreme length asymmetry. The east side of the disk is detected to ~315 AU radius, whereas the west side of the disk has radius >550 AU. We find a blue optical to near-infrared scattered light color relative to the star that indicates grain scattering properties similar to the AU Mic debris disk.
The existence of a large debris disk surrounding HD 15115 adds further evidence for membership in the Beta Pic moving group, which was previously argued based on kinematics alone. Here we hypothesize that the extreme disk asymmetry is due to dynamical perturbations from HIP 12545, an M star 0.5 degrees (0.38 pc) east of HD 15115 that shares a common proper motion vector, heliocentric distance, galactic space velocity, and age. <|endoftext|><|startoftext|> ABSTRACT I explore physics implications of the External Reality Hypothesis (ERH) that there exists an external physical reality completely independent of us humans. I argue that with a sufficiently broad definition of mathematics, it implies the Mathematical Universe Hypothesis (MUH) that our physical world is an abstract mathematical structure. I discuss various implications of the ERH and MUH, ranging from standard physics topics like symmetries, irreducible representations, units, free parameters, randomness and initial conditions to broader issues like consciousness, parallel universes and Gödel incompleteness. I hypothesize that only computable and decidable (in Gödel's sense) structures exist, which alleviates the cosmological measure problem and helps explain why our physical laws appear so simple. I also comment on the intimate relation between mathematical structures, computations, simulations and physical systems. <|endoftext|><|startoftext|> Introduction Cosmological observations may provide several interesting ways of testing string theory, which is important for its further development. For example, a discovery of the cosmological acceleration corresponding to the existence of the cosmological constant $\Lambda \sim 10^{-120}$ (in Planck units $M_p = 1$, where $M_p = 2.435\times 10^{18}$ GeV) initially was viewed as a problem for string theory. For a while it was not known how to describe an accelerating 4D universe in a vacuum state with a positive energy density.
Eventually the problem was resolved by the KKLT construction [1] (building on [2]), which made it possible to explain acceleration in a metastable vacuum state. Earlier and further investigation of these issues [3], combined with the ideas of eternal inflation [4, 5], resulted in the development of the idea of the inflationary multiverse [5, 6] and the string landscape scenario [7], which may have important implications for the general methodology of theoretical physics.

There are some other ways in which cosmology can be used for testing string theory. During the past few years, starting with [8], the string theory and cosmology communities have devoted much attention to the possible future detection of cosmic strings produced after inflation [9, 10]. It is viewed as a possible window of string theory into the real world. If detected, cosmic strings in the sky may test various ideas in string theory and cosmology.

One may also try to check which versions of string theory lead to the best description of inflation, in agreement with the existing measurements of the anisotropy of the cosmic microwave background radiation produced by scalar perturbations of the metric [11]. These measurements provide important information about the structure of the inflaton potential [12, 13, 14, 15]. In particular, observational constraints on the amplitude of scalar perturbations, in the slow-roll approximation, imply that

$$ \frac{V^{3/2}}{V'} \simeq 5\times 10^{-4} , \qquad (1.1) $$

whereas the spectral index of the scalar perturbations is given by

$$ n_s = 1 - 3\left(\frac{V'}{V}\right)^2 + 2\,\frac{V''}{V} \approx 0.95 \pm 0.02 \qquad (1.2) $$

if the ratio of tensor perturbations to scalar perturbations is sufficiently small, r ≪ 0.1. For larger values of r, e.g. for r ∼ 0.2, $n_s = 0.98 \pm 0.02$. However, these data give rather indirect information about V: one can reduce the overall scale of the energy density by many orders of magnitude, change its shape, and still obtain scalar perturbations with the same properties.
In this sense, a measurement of the tensor perturbations (gravitational waves) [16], or of the tensor-to-scalar ratio r = T/S, would be especially informative, since it is directly related to the value of the inflationary potential and the Hubble constant during inflation [12],

$$ r = 8\left(\frac{V'}{V}\right)^2 \approx 3\times 10^{7}\, V \sim 10^{8}\, H^2 . \qquad (1.3) $$

The last part of this equation follows from Eq. (1.1) and from the Einstein equation $H^2 = V/3$.

The purpose of this note is to address the issues of string cosmology in view of the possibility that tensor modes in the primordial spectrum may be detected. We will argue here that the possible detection of tensor modes from inflation may have dramatic consequences for string theory and for fundamental physics in general.

The current limit on the ratio of tensor to scalar fluctuations is r < 0.3. During the next few years one expects to probe tensor modes with r ∼ 0.1 and gradually reach the level of r ∼ 0.01. It is believed that probing below $r \sim 10^{-2} - 10^{-3}$ will be “formidably difficult” [17]. However, the interval between r = 0.3 and $r \sim 10^{-3}$ is quite large, and it can be probed by cosmological observations.

The expected amplitude of tensor perturbations in stringy inflation appears to be very low, $r \ll 10^{-3}$; see in particular [18, 19]. In Section 2 we will briefly review their results, as well as some other recent results concerning string theory inflation [20]. In Section 3 we give some independent arguments using the relation between the maximal value of the Hubble constant during inflation and the gravitino mass [21], which suggest that in the superstring models based on the generic KKLT construction the amplitude of tensor perturbations in string theory inflation with $m_{3/2} \lesssim 1$ TeV should be extremely small, $r \lesssim 10^{-24}$. One could argue therefore that the experimental detection of tensor modes would be in contradiction with the existing models of string cosmology.
Let us remember, however, that many of us did not expect the discovery of the tiny cosmological constant $\Lambda \sim 10^{-120}$, and that it took some time before we learned how to describe acceleration of the universe in the context of string theory. Since there exists a class of rather simple non-stringy inflationary models predicting r in the interval $10^{-3} \lesssim r \lesssim 0.3$ [22, 23, 24, 25, 26, 28], it makes a lot of sense to look for tensor perturbations using the CMB experiments. It is important to think, therefore, about what will happen if cosmological observations discover tensor perturbations in the range $10^{-3} < r < 0.3$. As we will see, this result would not necessarily contradict string theory, but it may have important implications for the models of string theory inflation, as well as for particle phenomenology based on string theory.

2 Tensor modes in the simplest inflationary models

Before discussing the amplitude of tensor modes in string theory, we will briefly mention what happens in general non-stringy inflationary models. The predicted value of r depends on the exact number of e-foldings N which happened after the time when the structure was formed on the scale of the present horizon. This number, in turn, depends on the mechanism of reheating and other details of the post-inflationary evolution. For N ∼ 60, one should have r ∼ 0.14 for the simplest chaotic inflation model $m^2\phi^2/2$, and r ∼ 0.28 for the model $\lambda\phi^4/4$. In the slow-roll approximation, one would have r = 8/N for the model $m^2\phi^2/2$ and r = 16/N for the model $\lambda\phi^4/4$ [12].

If one considers the standard spontaneous symmetry breaking model with the potential

$$ V = -\frac{m^2}{2}\phi^2 + \frac{\lambda}{4}\phi^4 + \frac{m^4}{4\lambda} = \frac{\lambda}{4}\,(\phi^2 - v^2)^2 , \qquad (2.1) $$

with $v = m/\sqrt{\lambda}$, it leads to chaotic inflation with a tensor-to-scalar ratio which can take any value in the interval $10^{-2} \lesssim r \lesssim 0.3$, for N ∼ 60. The value of r depends on the scale of the spontaneous symmetry breaking v [23, 24]; see Fig. 1.
The situation in the so-called natural inflation model [25] is very similar [26], except for the upper branch of the curve above the green star (the first star from below) shown in Fig. 1, which does not appear in natural inflation.

Figure 1: Possible values of r and $n_s$ in the theory $(\phi^2 - v^2)^2$ for different initial conditions and different v, for N = 60. In the small v limit, the model has the same predictions as the theory $\lambda\phi^4/4$. In the large v limit it has the same predictions as the theory $m^2\phi^2$. The branch above the green star (the first star from below) corresponds to inflation which occurs while the field rolls down from large φ, as in the simplest models of chaotic inflation. The lower branch corresponds to the motion from φ = 0, as in new inflation.

If one considers chaotic inflation with the potential including terms $\phi^2$, $\phi^3$ and $\phi^4$, one can considerably alter the properties of inflationary perturbations [27] and cover almost all parts of the area in the (r, $n_s$) plane allowed by the latest observational data [28]. However, in all of these models the value of r is large because the change of the inflaton field during the last 60 e-folds of inflation is greater than $M_p = 1$ [29], which is not the case in many other inflationary models, such as new inflation [30] and hybrid inflation [31]; see [29, 32] for a discussion of this issue. Therefore the bet for the possibility of the observational discovery of tensor modes in non-stringy inflationary models would be a bet for the triumph of simplicity over majority.

3 Existing models of string theory inflation do not predict a detectable level of tensor modes

String theory at present has produced two classes of models of inflation: brane inflation and modular inflation; see [10, 20, 33] for recent reviews. The possibility of a significant level of tensor modes in known brane inflation models was carefully investigated by several authors.
The following conclusion has been drawn from our analysis of the work performed by Bean, Shandera, Tye, and Xu [19]. They compared the brane inflationary model to recent cosmological data, including WMAP 3-year cosmic microwave background (CMB) results, Sloan Digital Sky Survey luminous red galaxies (SDSS LRG) power spectrum data and Supernovae Legacy Survey (SNLS) Type Ia supernovae distance measures. When they used the bound on the distance in the warped throat geometry derived by Baumann and McAllister [18], it became clear that in all currently known models of brane inflation (including DBI models [34]) the resulting primordial spectrum could not simultaneously have a significant deviation from the slow-roll behavior and satisfy the bound [18]. Moreover, the slow-roll inflation models that satisfy the bound have very low tensors, not measurable by current or even upcoming experiments. The known models of brane inflation include the motion of a D3 brane down a single throat in the framework of the KKLMMT scenario [9]. In short, the bound on the inflaton field, which is interpreted as a distance between branes, does not permit fields with vev’s of Planckian scale or larger, which would lead to tensor modes. Work on an improved derivation of the bound, including the breathing mode of the internal geometry, is in progress [35].

At present, there is still a hope that it may be possible to go beyond the simplest models of brane inflation and evade the constraint on the field range. However, this still has to be done before one can claim that string theory has a reliable class of brane inflation models predicting tensor modes, or, on the contrary, that brane inflation predicts a non-detectable level of tensor modes.

All known models of modular inflation in string theory (no branes) do not predict a detectable level of gravity waves [33], [20].
The only string-theory-inspired version of the assisted inflation model [36], N-flation [37], would predict a significant level of tensors, as in chaotic and natural inflation [22, 25, 26], if some assumptions underlying the derivation of this model were realized. The main assumption is that in the effective supergravity model with numerous complex moduli, $t_n = \frac{\phi_n}{f_n} + i M^2 R_n^2$, all moduli $R_n^2$ quickly go to their minima. Then only the axions $\phi_n$ remain to drive inflation. The reason for this assumption is that the Kähler potential depends only on the volume modulus of all two-cycles, $R_n^2 = -\frac{i}{2M^2}(t_n - \bar t_n)$, but it does not depend on the axions $\frac{\phi_n}{f_n} = \frac{1}{2}(t_n + \bar t_n)$, so one could expect that the axion directions in the first approximation remain flat. Recently this issue was re-examined in [20], and it was found that in all presently available models this assumption is not satisfied. The search for models in various regions of the string theory landscape which would support the assumptions of N-flation is in progress [38].

Thus at present we are unaware of any string inflation models predicting a detectable level of gravitational waves. However, a search for such models continues. We should mention here possible generalizations of N-flation, new types of brane inflation listed in Sec. 5 of [19] and some work in progress on DBI models in a more general setting [39].

We may also try to find a string theory generalization of a class of inflationary models in N = 1, d = 4 supergravity which have shift symmetry and predict large tensor modes. One model is a supergravity version [40] of chaotic inflation, describing fields Φ and X with

$$ K = \frac{1}{2}(\Phi + \bar\Phi)^2 + X\bar X , \qquad W = m\Phi X . \qquad (3.1) $$

This model effectively reproduces the simplest version of chaotic inflation with $V = \frac{1}{2} m^2\phi^2$, where the inflaton field is $\phi = i(\Phi - \bar\Phi)$. Here the prediction for r, depending on the number of e-foldings, is $0.14 \lesssim r \lesssim 0.20$. Another model is a supergravity version [20] of natural inflation [25],

$$ K = \frac{1}{2}(\Phi + \bar\Phi)^2 , \qquad W = w_0 + B e^{-b\Phi} . \qquad (3.2) $$

This model has an axion valley potential in which the radial part of the complex field quickly reaches the minimum. Therefore this model effectively reproduces natural inflation with the axion playing the role of the inflaton with potential $V = V_0(1 - \cos(b\phi))$, where $\phi = i(\Phi - \bar\Phi)$. Here the possible range of r, depending on the number of e-foldings and the axion decay constant $(\sqrt{2}\, b)^{-1}$, is approximately $5\times 10^{-3} \lesssim r \lesssim 0.20$ [26].

Both models have one feature in common. They require shift symmetry of the canonical Kähler potential,

$$ K = \frac{1}{2}(\Phi + \bar\Phi)^2 , \qquad \Phi \to \Phi + i\delta , \qquad \delta = \bar\delta . \qquad (3.3) $$

The inflaton potential appears because this shift symmetry is slightly broken by the superpotential.

If supersymmetry is discovered in the future, one would expect that the inflationary potential should be represented by a supergravity potential, or even better, by the supergravity effective potential derivable from string theory. It is gratifying that at least some supergravity models capable of predicting a large amplitude of tensor perturbations from inflation are available. So far, neither of the supergravity models in (3.1), (3.2) with a detectable level of gravity waves has been derived from string theory.^1 It would be most important to study all possible corners of the landscape in search of models which may eventually predict detectable tensor fluctuations, or to prove that this is not possible. The future data on r will make a final judgment on the theories discussed above.

If some models in string cosmology with $r > 10^{-3}$ are found, one can use the detection of gravity waves for testing models of moduli stabilization in string theories, and in this way relate cosmology to particle physics. The main point here is that the value of the Hubble constant during inflation is directly measurable if gravity waves are detected.
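The slow-roll relations quoted so far are easy to evaluate numerically. The following is a minimal sketch, not part of the original analysis: it assumes the estimates r = 8/N and r = 16/N from Section 2, the order-of-magnitude relation r ∼ 10^8 H² of Eq. (1.3) in Planck units, and the value of the Planck mass quoted in the Introduction. The function names are hypothetical and chosen only for illustration.

```python
import math

# Planck mass in GeV, as quoted in the Introduction (M_p = 1 in Planck units).
M_P_GEV = 2.435e18


def r_quadratic(n_efolds):
    """Slow-roll estimate r = 8/N for the m^2 phi^2 / 2 model."""
    return 8.0 / n_efolds


def r_quartic(n_efolds):
    """Slow-roll estimate r = 16/N for the lambda phi^4 / 4 model."""
    return 16.0 / n_efolds


def hubble_from_r(r):
    """Invert the rough relation r ~ 1e8 H^2 of Eq. (1.3); H in Planck units."""
    return math.sqrt(r / 1e8)


if __name__ == "__main__":
    for n in (40, 50, 60):
        print(f"N={n}: r(quadratic)={r_quadratic(n):.3f}, "
              f"r(quartic)={r_quartic(n):.3f}")
    for r in (1e-3, 1e-2, 0.3):
        h = hubble_from_r(r)
        print(f"r={r:g}: H ~ {h:.1e} M_p ~ {h * M_P_GEV:.1e} GeV")
```

For N = 60 the two estimates give about 0.13 and 0.27, close to the values r ∼ 0.14 and r ∼ 0.28 quoted in Section 2. Because Eq. (1.3) is itself only an order-of-magnitude relation, the resulting values of H should be read as rough scales rather than precise predictions.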
4 Scale of SUSY breaking, the gravitino mass, and the amplitude of the gravitational waves in string theory inflation

So far, we have not discussed the relation of the new class of models to particle phenomenology. This relation is rather unexpected and may impose strong constraints on particle phenomenology and on inflationary models: in the simplest models based on the KKLT mechanism the Hubble constant H should be smaller than the present value of the gravitino mass [21],

$$ H \lesssim m_{3/2} . \qquad (4.1) $$

The reason for this bound is that the mass of the gravitino at the supersymmetric KKLT minimum with DW = 0 before the uplifting is given by $3 m_{3/2}^2 = |V_{\rm AdS}|$. Uplifting of the AdS minimum to the present nearly Minkowski vacuum is achieved by adding to the potential a term of the type $C/\sigma^n$, where σ is the volume modulus and n = 3 for generic compactification and n = 2 for the highly warped throat geometry. Since the uplifting is less significant at large σ, the barrier created by the uplifting generically is a bit smaller than $|V_{\rm AdS}|$. Adding the energy of the inflaton field leads to an additional uplifting. Since it is also proportional to an inverse power of the volume modulus, it is greater at the minimum of the KKLT potential than at the top of the barrier. Therefore adding a large vacuum energy density to the KKLT potential, which is required for inflation, may uplift the minimum to a height greater than the height of the barrier, and destabilize it; see Fig. 2. This leads to the bound (4.1).

Footnote 1: There is a difference between an arbitrary N = 1, d = 4 supergravity model of the general type and models derived from string theory, where various fields in the effective supergravity theory have some higher-dimensional interpretation, like volumes of cycles, distance between branes, etc. However, there are situations in string theory when the actual value of the Kähler potential is not known, and therefore models like (3.1), (3.2) are not a priori excluded.
Figure 2: The lowest curve with dS minimum is the potential of the KKLT model (horizontal axis: the volume modulus σ). The second one shows what happens to the volume modulus potential when the inflaton potential $V_{\rm infl} = V(\phi)$ is added to the KKLT potential. The top curve shows that when the inflaton potential becomes too large, the barrier disappears, and the internal space decompactifies. This explains the constraint $H \lesssim m_{3/2}$.

One should note that the exact form of this bound is a bit more complicated than (4.1), containing additional factors which depend logarithmically on certain parameters of the KKLT potential. However, unless these parameters are exponentially large or exponentially small, one can use the simple form of this bound, $H \lesssim m_{3/2}$.

Therefore if one believes in the standard SUSY phenomenology with $m_{3/2} \lesssim O(1)$ TeV, one should find a realistic particle physics model where the nonperturbative string theory dynamics occurs at the LHC scale (the mass of the volume modulus is not much greater than the gravitino mass), and inflation occurs at a density at least 30 orders of magnitude below the Planck energy density. Such models are possible, but their parameters should be substantially different from the parameters used in all presently existing models of string theory inflation.

An interesting observational consequence of this result is that the amplitude of the gravitational waves in all string inflation models of this type should be extremely small. Indeed, according to Eq. (1.3), one has $r \approx 3\times 10^{7}\, V \approx 10^{8}\, H^2$, which implies that

$$ r \lesssim 10^{8}\, m_{3/2}^2 , \qquad (4.2) $$

in Planck units. In particular, for $m_{3/2} \lesssim 1$ TeV $\sim 4\times 10^{-16}\, M_p$, which is in the range most often discussed by SUSY phenomenology, one has

$$ r \lesssim 10^{-24} . \qquad (4.3) $$

If CMB experiments find that $r \gtrsim 10^{-2}$, then this will imply, in the class of theories described above, that

$$ m_{3/2} \gtrsim 10^{-5}\, M_p \sim 2.4\times 10^{13}\ {\rm GeV} , \qquad (4.4) $$

which is 10 orders of magnitude greater than the standard gravitino mass range discussed by particle phenomenologists.
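The bounds (4.2) to (4.4) follow from simple unit conversions, which the sketch below makes explicit. It is an illustration only: it assumes the simple form of the bound, $H \lesssim m_{3/2}$, together with r ∼ 10^8 H² in Planck units and $M_p = 2.435\times 10^{18}$ GeV; the function names are hypothetical.

```python
import math

# Planck mass in GeV, as quoted in the Introduction.
M_P_GEV = 2.435e18


def r_max_from_gravitino(m32_gev):
    """Upper bound r <~ 1e8 * m_{3/2}^2 (Eq. 4.2), with m_{3/2} given in GeV."""
    m32_planck = m32_gev / M_P_GEV  # convert to Planck units
    return 1e8 * m32_planck ** 2


def gravitino_min_from_r(r):
    """Lower bound on m_{3/2} in GeV implied by a measured r (inverse of Eq. 4.2)."""
    return math.sqrt(r / 1e8) * M_P_GEV


if __name__ == "__main__":
    print(f"m_3/2 = 1 TeV  ->  r <~ {r_max_from_gravitino(1e3):.1e}")
    print(f"r = 1e-2       ->  m_3/2 >~ {gravitino_min_from_r(1e-2):.2e} GeV")
```

With these rounded coefficients a 1 TeV gravitino gives a bound of order $10^{-23}$, the same extremely small scale as the estimate (4.3), while r = 10^{-2} reproduces the value $2.4\times 10^{13}$ GeV of (4.4). Since the text notes that the exact bound contains additional logarithmic factors, only the overall scale of these numbers is meaningful.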
There are several different ways to address this problem. First of all, one may consider KKLT models with the racetrack superpotential containing at least two exponentials and find such parameters that the supersymmetric minimum of the potential even before the uplifting occurs at zero energy density [21], which would mean $m_{3/2} = 0$. Then, by a slight change of parameters one can get the gravitino mass squared much smaller than the height of the barrier, which removes the constraint $H \lesssim m_{3/2}$.

If we want to increase the upper bound on H from 1 TeV up to $10^{13}$ GeV for $m_{3/2} \sim 1$ TeV, we would need to fine-tune the parameters of the model of Ref. [21] with very high accuracy. Therefore it does not seem easy to increase the measurable value of r in the model of [21] from $10^{-24}$ up to $10^{-3}$. However, this issue requires a more detailed analysis, since this model is rather special: in its limiting form, it describes a supersymmetric Minkowski vacuum without any need of uplifting, and it has certain advantages with respect to vacuum stability, being protected by supersymmetry, as discussed in [41]. Therefore it might happen that this model occupies a special place in the landscape which allows a natural way towards large r.

We will now discuss several other models of moduli stabilization in string theory to see whether one can overcome the bound (4.2). A new class of moduli stabilization in M-theory was recently developed in [42]. In particular cases studied numerically, the height of the barrier after the uplifting is about $V_{\rm barrier} \approx 50\, m_{3/2}^2$; in some other cases, $V_{\rm barrier} \leq O(500)\, m_{3/2}^2$ [43]. It seems plausible that for this class of models, just as in the simplest KKLT models, the condition $V_{\rm barrier} \geq 3H^2$ is required for stabilization of moduli during inflation. Since the gravitino mass in this model is in the range from 1 TeV to 100 TeV, the amplitude of the tensor modes is expected to be negligibly small.
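To see how small "negligibly small" is in this class of models, one can combine the stability condition with Eq. (1.3). The sketch below assumes, as the text suggests is plausible, that moduli stabilization requires $V_{\rm barrier} \geq 3H^2$ with $V_{\rm barrier} = c\, m_{3/2}^2$ and c between about 50 and 500. The parameter `barrier_factor` (standing for c) and the function names are illustrative, not taken from the references.

```python
import math

# Planck mass in GeV, as quoted in the Introduction.
M_P_GEV = 2.435e18


def h_max_gev(m32_gev, barrier_factor):
    """Largest H (in GeV) compatible with V_barrier >= 3 H^2,
    assuming V_barrier = barrier_factor * m_{3/2}^2."""
    return math.sqrt(barrier_factor / 3.0) * m32_gev


def r_max(m32_gev, barrier_factor):
    """Corresponding bound on r from r ~ 1e8 H^2 (H in Planck units)."""
    h_planck = h_max_gev(m32_gev, barrier_factor) / M_P_GEV
    return 1e8 * h_planck ** 2


if __name__ == "__main__":
    for m32 in (1e3, 1e5):      # 1 TeV and 100 TeV, in GeV
        for c in (50, 500):     # barrier heights quoted in the text
            print(f"m_3/2 = {m32:.0e} GeV, c = {c}: r <~ {r_max(m32, c):.1e}")
```

Even in the most favorable case of this sketch (a 100 TeV gravitino and c = 500), the bound stays many orders of magnitude below $10^{-3}$, in line with the conclusion of the paragraph above.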
Another possibility is to consider the large volume compactification models with stringy α′ corrections taken into account [44]. At first glance, this also does not seem to help. The AdS minimum at which moduli are stabilized before the uplifting is not supersymmetric, which means that generically in the AdS minimum $3 m_{3/2}^2 = |V|_{\rm AdS} + e^{K}|DW|^2 \geq |V|_{\rm AdS}$. Upon uplifting, generically the height of the barrier is not much different from the absolute value of the potential in the AdS minimum, $V_{\rm barrier} \sim |V|_{\rm AdS}$. As a result, the situation with the destabilization during inflation may seem even more difficult than in the simplest KKLT models: the extra term due to broken supersymmetry, $e^{K}|DW|^2 \neq 0$, tends to increase the gravitino mass squared as compared to $|V|_{\rm AdS}$. This decreases the ratio of the height of the barrier after the uplifting to the gravitino mass squared.

However, a more detailed investigation of this model is required to verify this conjecture. As we already mentioned, an important assumption in the derivation of the constraint $H \lesssim m_{3/2}$ in the simplest version of the KKLT model is the absence of exponentially large parameters. Meanwhile the volume of compactification in [44] is exponentially large. One should check whether this can help to keep the vacuum stabilized for large H.

But this class of models offers another possible way to address the low-H problem: in the phenomenological models based on [44] the gravitino mass can be extremely large. Phenomenological models with superheavy gravitinos were also considered in [45, 46]. In particular, certain versions of the split supersymmetry models allow gravitino masses in the range of $10^{13} - 10^{14}$ GeV [46]. Therefore in such models the constraint $H \lesssim m_{3/2}$ is quite consistent with the possibility of the discovery of tensor modes with $10^{-3} \lesssim r \lesssim 0.3$, if the problems with constructing the corresponding inflationary models discussed in the previous section are resolved.
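As a rough check of the last statement, one can insert the quoted gravitino masses into the simple form of the bound (4.2). This is only an order-of-magnitude illustration, assuming $M_p = 2.435\times 10^{18}$ GeV and ignoring the logarithmic corrections mentioned in Section 4:

```latex
r \lesssim 10^{8}\left(\frac{m_{3/2}}{M_p}\right)^{2},
\qquad M_p = 2.435\times 10^{18}\ \mathrm{GeV},
\\[4pt]
m_{3/2} = 10^{13}\ \mathrm{GeV}:\quad
r \lesssim 10^{8}\,\bigl(4.1\times 10^{-6}\bigr)^{2} \approx 1.7\times 10^{-3},
\\[4pt]
m_{3/2} = 10^{14}\ \mathrm{GeV}:\quad
r \lesssim 10^{8}\,\bigl(4.1\times 10^{-5}\bigr)^{2} \approx 0.17 .
```

Both values fall inside the observationally interesting window $10^{-3} \lesssim r \lesssim 0.3$.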
We would like to stress that we presented here only a first scan of possibilities available in string cosmology with regard to detectability of the tensor modes, and so far the result is negative. More studies are required to have a better prediction of r in string cosmology. It would be most important either to construct a reliable inflationary model in string theory predicting tensors with $10^{-3} \lesssim r \lesssim 0.3$, or to prove a no-go theorem. If tensor modes are not detected, this issue will disappear; the attention will move to more precise values of the tilt of the spectrum $n_s$, non-gaussianity, cosmic strings and other issues which will be clarified by observations in the next few years. However, a possible discovery of tensor modes may force us to reconsider several basic assumptions of string cosmology and particle phenomenology. In particular, it may imply that the gravitino must be superheavy. Thus, investigation of gravitational waves produced during inflation may serve as a unique source of information about string theory and about fundamental physics in general.

Acknowledgments

We are grateful to D. Baumann, R. Bean, S.E. Church, G. Efstathiou, S. Kachru, L. Kofman, D. Lyth, L. McAllister, V. Mukhanov, S. Shenker, E. Silverstein and H. Tye for very stimulating discussions. This work was supported by NSF grant PHY-0244728.

References

[1] S. Kachru, R. Kallosh, A. Linde and S. P. Trivedi, “De Sitter vacua in string theory,” Phys. Rev. D 68, 046005 (2003) [arXiv:hep-th/0301240].
[2] S. B. Giddings, S. Kachru and J. Polchinski, “Hierarchies from fluxes in string compactifications,” Phys. Rev. D 66, 106006 (2002) [arXiv:hep-th/0105097]; E. Silverstein, “(A)dS backgrounds from asymmetric orientifolds,” arXiv:hep-th/0106209; A. Maloney, E. Silverstein and A. Strominger, “De Sitter space in noncritical string theory,” arXiv:hep-th/0205316.
[3] W. Lerche, D. Lüst and A. N. Schellekens, “Chiral Four-Dimensional Heterotic Strings From Selfdual Lattices,” Nucl.
Phys. B 287, 477 (1987); R. Bousso and J. Polchinski, “Quantization of four-form fluxes and dynamical neutralization of the cosmological constant,” JHEP 0006, 006 (2000) [arXiv:hep-th/0004134]; M. R. Douglas, “The statistics of string / M theory vacua,” JHEP 0305, 046 (2003) [arXiv:hep-th/0303194]; F. Denef and M. R. Douglas, “Distributions of flux vacua,” JHEP 0405, 072 (2004) [arXiv:hep-th/0404116]; M. R. Douglas and S. Kachru, “Flux compactification,” arXiv:hep-th/0610102.
[4] A. Vilenkin, “The Birth Of Inflationary Universes,” Phys. Rev. D 27, 2848 (1983).
[5] A. D. Linde, “Eternally Existing Self-reproducing Chaotic Inflationary Universe,” Phys. Lett. B 175, 395 (1986); A. D. Linde, D. A. Linde and A. Mezhlumian, “From the Big Bang theory to the theory of a stationary universe,” Phys. Rev. D 49, 1783 (1994) [arXiv:gr-qc/9306035].
[6] A. D. Linde, Particle Physics and Inflationary Cosmology (Harwood, Chur, Switzerland, 1990) [arXiv:hep-th/0503203]; A. Linde, “Inflation, quantum cosmology and the anthropic principle,” in Science and Ultimate Reality: From Quantum to Cosmos (eds. J. D. Barrow, P. C. W. Davies and C. L. Harper, Cambridge University Press, 2003) [arXiv:hep-th/0211048].
[7] L. Susskind, “The anthropic landscape of string theory,” arXiv:hep-th/0302219.
[8] E. J. Copeland, R. C. Myers and J. Polchinski, “Cosmic F- and D-strings,” JHEP 0406, 013 (2004) [arXiv:hep-th/0312067].
[9] S. Kachru, R. Kallosh, A. Linde, J. M. Maldacena, L. McAllister and S. P. Trivedi, “Towards inflation in string theory,” JCAP 0310, 013 (2003) [arXiv:hep-th/0308055].
[10] S. H. Henry Tye, “Brane inflation: String theory viewed from the cosmos,” arXiv:hep-th/0610221.
[11] V. F. Mukhanov and G. V. Chibisov, “Quantum Fluctuation And ‘Nonsingular’ Universe,” JETP Lett. 33, 532 (1981) [Pisma Zh. Eksp. Teor. Fiz. 33, 549 (1981)]; S. W. Hawking, “The Development Of Irregularities In A Single Bubble Inflationary Universe,” Phys. Lett. B 115, 295 (1982); A. A.
Starobinsky, “Dynamics Of Phase Transition In The New In- flationary Universe Scenario And Generation Of Perturbations,” Phys. Lett. B 117, 175 http://arxiv.org/abs/hep-th/0301240 http://arxiv.org/abs/hep-th/0105097 http://arxiv.org/abs/hep-th/0106209 http://arxiv.org/abs/hep-th/0205316 http://arxiv.org/abs/hep-th/0004134 http://arxiv.org/abs/hep-th/0303194 http://arxiv.org/abs/hep-th/0404116 http://arxiv.org/abs/hep-th/0610102 http://arxiv.org/abs/gr-qc/9306035 http://arxiv.org/abs/hep-th/0503203 http://arxiv.org/abs/hep-th/0211048 http://arxiv.org/abs/hep-th/0302219 http://arxiv.org/abs/hep-th/0312067 http://arxiv.org/abs/hep-th/0308055 http://arxiv.org/abs/hep-th/0610221 (1982); A. H. Guth and S. Y. Pi, “Fluctuations In The New Inflationary Universe,” Phys. Rev. Lett. 49, 1110 (1982); J. M. Bardeen, P. J. Steinhardt and M. S. Turner, “Sponta- neous Creation Of Almost Scale - Free Density Perturbations In An Inflationary Universe,” Phys. Rev. D 28, 679 (1983); V. F. Mukhanov, “Gravitational Instability Of The Universe Filled With A Scalar Field,” JETP Lett. 41, 493 (1985) [Pisma Zh. Eksp. Teor. Fiz. 41, 402 (1985)]; V. F. Mukhanov, Physical Foundations of Cosmology, Cambridge University Press, 2005. [12] A.R. Liddle and D. H. Lyth, Cosmological Inflation and Large-Scale Structure, (Cambridge University Press, Cambridge 2000). [13] H. V. Peiris et al., “First year Wilkinson Microwave Anisotropy Probe (WMAP) observations: Implications for inflation,” Astrophys. J. Suppl. 148, 213 (2003) [arXiv:astro-ph/0302225]. [14] M. Tegmark et al., “Cosmological Constraints from the SDSS Luminous Red Galaxies,” Phys. Rev. D 74, 123507 (2006) [arXiv:astro-ph/0608632]. [15] C. L. Kuo et al., “Improved Measurements of the CMB Power Spectrum with ACBAR,” arXiv:astro-ph/0611198. [16] A. A. Starobinsky, “Spectrum Of Relict Gravitational Radiation And The Early State Of The Universe,” JETP Lett. 30, 682 (1979) [Pisma Zh. Eksp. Teor. Fiz. 30, 719 (1979)]; A. A. 
Starobinsky, “Cosmic Background Anisotropy Induced by Isotropic Flat-Spectrum Gravitational-Wave Perturbations,” Sov. Astron. Lett. 11, 133 (1985). [17] J. A. Peacock, P. Schneider, G. Efstathiou, J. R. Ellis, B. Leibundgut, S. J. Lilly and Y. Mellier, “Report by the ESA-ESO Working Group on Fundamental Cosmology,” arXiv:astro-ph/0610906. [18] D. Baumann and L. McAllister, “A microscopic limit on gravitational waves from D-brane inflation,” arXiv:hep-th/0610285. [19] R. Bean, S. E. Shandera, S. H. Henry Tye and J. Xu, “Comparing brane inflation to WMAP,” arXiv:hep-th/0702107. [20] R. Kallosh, “On inflation in string theory,” arXiv:hep-th/0702059. [21] R. Kallosh and A. Linde, “Landscape, the scale of SUSY breaking, and inflation,” JHEP 0412, 004 (2004) [arXiv:hep-th/0411011]. [22] A. D. Linde, “Chaotic Inflation,” Phys. Lett. B 129, 177 (1983). [23] H. J. de Vega and N. G. Sanchez, “Predictions of single field inflation for the tensor/scalar ratio and the running spectral index,” Phys. Rev. D 74, 063519 (2006). [24] A. Linde, “Inflationary cosmology,” a contribution to the proceedings of the conference “Inflation + 25,” in preparation. http://arxiv.org/abs/astro-ph/0302225 http://arxiv.org/abs/astro-ph/0608632 http://arxiv.org/abs/astro-ph/0611198 http://arxiv.org/abs/astro-ph/0610906 http://arxiv.org/abs/hep-th/0610285 http://arxiv.org/abs/hep-th/0702107 http://arxiv.org/abs/hep-th/0702059 http://arxiv.org/abs/hep-th/0411011 [25] K. Freese, J. A. Frieman and A. V. Olinto, “Natural inflation with pseudo - Nambu- Goldstone bosons,” Phys. Rev. Lett. 65, 3233 (1990). [26] C. Savage, K. Freese and W. H. Kinney, “Natural inflation: Status after WMAP 3-year data,” Phys. Rev. D 74, 123511 (2006) [arXiv:hep-ph/0609144]. [27] H. M. Hodges, G. R. Blumenthal, L. A. Kofman and J. R. Primack, “Nonstandard primor- dial fluctuations from a polynomial inflaton potential,” Nucl. Phys. B 335, 197 (1990). [28] C. Destri, H. J. de Vega and N. G. 
Sanchez, “MCMC analysis of WMAP3 data points to broken symmetry inflaton potentials and provides a lower bound on the tensor to scalar ratio,” arXiv:astro-ph/0703417. [29] D. H. Lyth, “What would we learn by detecting a gravitational wave signal in the cosmic microwave background anisotropy?,” Phys. Rev. Lett. 78, 1861 (1997) [arXiv:hep-ph/9606387]; D. H. Lyth, “Particle physics models of inflation,” arXiv:hep-th/0702128. [30] A. D. Linde, “A New Inflationary Universe Scenario: A Possible Solution Of The Horizon, Flatness, Homogeneity, Isotropy And Primordial Monopole Problems,” Phys. Lett. B 108, 389 (1982); A. Albrecht and P. J. Steinhardt, “Cosmology For Grand Unified Theories With Radiatively Induced Symmetry Breaking,” Phys. Rev. Lett. 48, 1220 (1982); A. D. Linde, “Coleman-Weinberg Theory And A New Inflationary Universe Scenario,” Phys. Lett. B 114, 431 (1982); A. D. Linde, “Temperature Dependence Of Coupling Constants And The Phase Transition In The Coleman-Weinberg Theory,” Phys. Lett. B 116, 340 (1982); A. D. Linde, “Scalar Field Fluctuations In Expanding Universe And The New Inflationary Universe Scenario,” Phys. Lett. B 116, 335 (1982). [31] A. D. Linde, “Axions in inflationary cosmology,” Phys. Lett. B 259, 38 (1991); A. D. Linde, “Hybrid inflation,” Phys. Rev. D 49, 748 (1994) [astro-ph/9307002]. [32] S. Chongchitnan and G. Efstathiou, “Prospects for direct detection of primordial gravi- tational waves,” Phys. Rev. D 73, 083511 (2006) [arXiv:astro-ph/0602594]; G. Efstathiou and S. Chongchitnan, “The search for primordial tensor modes,” Prog. Theor. Phys. Suppl. 163, 204 (2006) [arXiv:astro-ph/0603118]. [33] J. M. Cline, “String cosmology,” arXiv:hep-th/0612129. [34] M. Alishahiha, E. Silverstein and D. Tong, “DBI in the sky,” Phys. Rev. D 70, 123505 (2004) [arXiv:hep-th/0404084]. [35] D. Baumann and L. McAllister, private communication. [36] A. R. Liddle, A. Mazumdar and F. E. Schunck, “Assisted inflation,” Phys. Rev. 
D 58, 061301 (1998) [arXiv:astro-ph/9804177]. [37] S. Dimopoulos, S. Kachru, J. McGreevy and J. G. Wacker, “N-flation,” arXiv:hep-th/0507205. http://arxiv.org/abs/hep-ph/0609144 http://arxiv.org/abs/astro-ph/0703417 http://arxiv.org/abs/hep-ph/9606387 http://arxiv.org/abs/hep-th/0702128 http://arxiv.org/abs/astro-ph/9307002 http://arxiv.org/abs/astro-ph/0602594 http://arxiv.org/abs/astro-ph/0603118 http://arxiv.org/abs/hep-th/0612129 http://arxiv.org/abs/hep-th/0404084 http://arxiv.org/abs/astro-ph/9804177 http://arxiv.org/abs/hep-th/0507205 [38] R. Kallosh, N. Sivanandam, M. Soroush, “Looking for Axion Valley in the Landscape”, work in progress. [39] E. Silverstein, work in progress. [40] M. Kawasaki, M. Yamaguchi and T. Yanagida, “Natural chaotic inflation in supergravity,” Phys. Rev. Lett. 85, 3572 (2000) [arXiv:hep-ph/0004243]. [41] J. J. Blanco-Pillado, R. Kallosh and A. Linde, “Supersymmetry and stability of flux vacua,” JHEP 0605, 053 (2006) [arXiv:hep-th/0511042]. [42] B. S. Acharya, K. Bobkov, G. L. Kane, P. Kumar and J. Shao, “Explaining the electroweak scale and stabilizing moduli in M theory,” arXiv:hep-th/0701034. [43] We are grateful to the authors of [42] for providing us with this information. [44] V. Balasubramanian, P. Berglund, J. P. Conlon and F. Quevedo, “Systematics of moduli stabilisation in Calabi-Yau flux compactifications,” JHEP 0503, 007 (2005) [arXiv:hep-th/0502058]. [45] O. DeWolfe and S. B. Giddings, “Scales and hierarchies in warped compactifications and brane worlds,” Phys. Rev. D 67, 066008 (2003) [arXiv:hep-th/0208123]. [46] N. Arkani-Hamed and S. Dimopoulos, “Supersymmetric unification without low en- ergy supersymmetry and signatures for fine-tuning at the LHC,” JHEP 0506, 073 (2005) [arXiv:hep-th/0405159]; N. Arkani-Hamed, S. Dimopoulos, G. F. Giudice and A. Romanino, “Aspects of split supersymmetry,” Nucl. Phys. B 709, 3 (2005) [arXiv:hep-ph/0409232]. 
http://arxiv.org/abs/hep-ph/0004243 http://arxiv.org/abs/hep-th/0511042 http://arxiv.org/abs/hep-th/0701034 http://arxiv.org/abs/hep-th/0502058 http://arxiv.org/abs/hep-th/0208123 http://arxiv.org/abs/hep-th/0405159 http://arxiv.org/abs/hep-ph/0409232 Introduction Tensor modes in the simplest inflationary models Existing models of string theory inflation do not predict a detectable level of tensor modes Scale of SUSY breaking, the gravitino mass, and the amplitude of the gravitational waves in string theory inflation ABSTRACT Future detection/non-detection of tensor modes from inflation in CMB observations presents a unique way to test certain features of string theory. Current limit on the ratio of tensor to scalar perturbations, r=T/S, is r < 0.3, future detection may take place for r > 10^{-2}-10^{-3}. At present all known string theory inflation models predict tensor modes well below the level of detection. Therefore a possible experimental discovery of tensor modes may present a challenge to string cosmology. The strongest bound on r in string inflation follows from the observation that in most of the models based on the KKLT construction, the value of the Hubble constant H during inflation must be smaller than the gravitino mass. For the gravitino mass in the usual range, m_{3/2} < O(1) TeV, this leads to an extremely strong bound r < 10^{-24}. A discovery of tensor perturbations with r > 10^{-3} would imply that the gravitinos in this class of models are superheavy, m_{3/2} > 10^{13} GeV. This would have important implications for particle phenomenology based on string theory. <|endoftext|><|startoftext|> Introduction Quick behavioral response to strong aversive stimuli (such as threat from a predator or an imminent danger of being hurt) is a key to survival throughout the animal kingdom. Network models of animal behavior have been elaborately discussed in (Schmajuk 1997). 
A neural network model of anxiety behavior in rats was studied by Salum and colleagues (Salum et al. 2000), but it did not take into account the functioning of the nerve cells in the brain during task performance. Recently a neurodynamical model for a conditional visuomotor association task has been proposed (Loh & Deco 2005). That model assumes a trial-and-error paradigm in a stochastic decision space and realizes the paradigm with an integrate-and-fire neuronal network. A neural network model of the brain as a cognitive state machine (CSM) has also been proposed to study decision making in a competitive environment (Rabinovich et al. 2006). There the dynamics of making a choice among multiple conflicting options are formulated by Lotka-Volterra type equations. It is evident that intelligent decisions in a sequential behavior have to be stable against noise and reproducible, to allow memorization and reuse of successful decision sequences in the future. On the other hand, they also have to be sensitive to new information from the environment. These two fundamentally contradictory requirements are both addressed in (Rabinovich et al. 2006).

In this paper a neurodynamical model of the response behavior of a neural network to strong aversive stimuli is presented. The assumption is that the network will behave so as to avoid repeating the negative experiences of the past. No specific neuronal network model is assumed. In this approach information is extracted (at least theoretically) from a behaving neuronal network by applying the FFT to the spike trains of all the neurons in the network, which should work for any network of neurons irrespective of its architecture. The neurodynamical system has a unique representation in the information space, where the Fourier coefficients of a spike train are arranged as a vector that uniquely represents the spike train. All the calculations are carried out in that space.
The model depends on past memory, synaptic plasticity and intensity of feeling. Interestingly, a short latency (120–160 ms) has been reported before the response to aversive stimuli in the right prefrontal cortex of a human subject. No such latency was observed in the case of pleasant or neutral stimuli (Kawasaki et al. 2001). The present model can offer a possible explanation for this apparently perplexing phenomenon.

The following assumptions have been made:
1) Structure: Any behavior involves a network in the nervous system which, if represented as a directed multigraph, will have neurons as vertices and synapses as (directed) edges.
2) Function: The functional behavior of this network is manifested by the collective behavior of the neurons present in the network.
3) Memory: Memory of experiences of past behaviors of this network is stored entirely within the network and nowhere outside of it.
4) Plasticity: Plasticity of each synapse is a time dependent function (called synaptic weight), and the total plasticity of the network between any two behaviors is also a time dependent function, called network plasticity.
5) Feeling: Feeling is associated with each behavior as a specific mathematical function which controls how the experience associated with this behavior will mediate the intensity of the effect of this behavior on any other behavior.
6) Interaction: Network plasticity mediates the interaction of the ongoing behavior with the memory of past behaviors stored in the network.

In section 2 each of the above points is explained briefly and represented with appropriate mathematical expressions and equations. In section 3, with the help of an analogy from classical electrostatics, the behavior dynamics of the network is formulated in terms of those expressions and equations. The notions of a 'force like' and a 'potential energy like' expression are introduced.
Despite the computational difficulties involved with the model, in section 4 it is related to reality, first by offering an explanation of a hitherto unexplained observation in the human brain, second by relating it to a successful decision making model, and third by relating this electrophysiological model to the hemodynamical activity of the brain. The paper concludes with discussions and future directions.

2. Modeling of the parts

A) Structure

The closest computational analog of a neuronal circuit is a directed multigraph in which each node represents a neuron and each edge a synapse (Majumdar & Kozma, 2006). It can be represented as a three dimensional array a[i, j, k]. If there are p different synapses joining neuron i with neuron j (assuming that all neurons in the brain have been numbered) then a[i, j, k] gives the weight (the gain in synaptic transmission) of the kth synapse joining neuron i with neuron j, where 1 ≤ k ≤ p. If neuron i and neuron j are excitatory then a[i, j, k] will be positive; if they are inhibitory then a[i, j, k] will be negative. For given values of i, j and k, the triple [i, j, k] uniquely determines a synapse and a[i, j, k] represents the synaptic weight. In general a[i, j, k] will be a function of time and may be written as a[i, j, k](t) or a_{ijk}(t), where t denotes time.

B) Function

Computations in the central nervous system (CNS) have been viewed from three different angles: (a) synaptic computation (Abbott & Regehr 2004), (b) dendritic computation (London & Hausser 2005) and (c) neuronal computation (Koch 1999; Borisyuk & Rinzel 2005). In a neuronal network synapses activate the neurons and neurons activate the synapses. In this sense the aggregate behavior of the neurons in a network is equivalent to the aggregate behavior of the synapses in that network.
On average every neuron receives input from about 10^4 synapses, and therefore for the purpose of computational modeling it is more convenient to consider the aggregate behavior of neurons rather than the aggregate behavior of synapses in a network. The behavior of a neuron during a particular epoch of time is completely represented by the spike train it generates during that period of time. A neuronal spike train carries information in the following manner (Kandel et al. 2000): (1) the number of action potentials (spikes); and (2) the time intervals between them. (Although this argument simplifies the description of brain functions to a great extent, it ignores the fact that the neural circuit can change without any change in the firing patterns of the neurons. This has been discussed in the Conclusion.) The duration of a spike is typically 1 to 2 milliseconds (ms), depending on the temperature (Koch 1999). When the sample frequency is high (1000 Hz or more), reliable information can be extracted from a neuronal spike train by FFT. The FFT produces the vector (a_0, a_1, b_1, a_2, b_2, ...), where

a_n = (nth Fourier coefficient + conjugate of nth Fourier coefficient), (2.1)

b_n = i (nth Fourier coefficient − conjugate of nth Fourier coefficient), (2.2)

and i = \sqrt{-1}. For convenience the vector can be rewritten as (e(1), ..., e(2r+1)), where e(1) = a_0 and, for n ≥ 2,

e(n) = a_{n/2} if n is even, e(n) = b_{(n-1)/2} if n is odd, (2.3)

in uniform symbols. More conventionally, the vector v_k associated with the spike train of the kth neuron in the network can be written as

v_k = (e_k(1), ..., e_k(2r+1))^T. (2.4)

The suffix T stands for transpose. If the duration of the spike train is p seconds then r = 500p (assuming the sample frequency is 1000 Hz). Clearly the vector v_k uniquely represents the spike train of the kth neuron in the network, assuming all the neurons in the network have been uniquely numbered. Since the Fourier series is convergent, e_k(2r+1) → 0 as r → ∞. Let there be N neurons in the network.
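The construction of the vector v_k can be illustrated with a minimal sketch. It assumes that spikes are binned into a 0/1 sequence at the sample frequency, that NumPy's real FFT (normalized by the number of samples) supplies the Fourier coefficients, and that b_n carries the factor i mentioned in the text. The function name spike_train_vector and the example spike times are hypothetical and are not taken from the paper.

```python
import numpy as np

def spike_train_vector(spike_times, duration, fs=1000.0):
    """Return (a_0, a_1, b_1, ..., a_r, b_r) for a binned spike train.

    spike_times: spike instants in seconds (hypothetical input format).
    duration:    length p of the recording in seconds.
    fs:          sample frequency in Hz.
    """
    n = int(round(duration * fs))               # number of samples
    train = np.zeros(n)
    idx = np.floor(np.asarray(spike_times) * fs).astype(int)
    train[idx[(idx >= 0) & (idx < n)]] = 1.0    # 1 in every bin holding a spike
    c = np.fft.rfft(train) / n                  # c[m]: m-th Fourier coefficient
    r = n // 2
    a = 2.0 * c.real                            # coefficient + its conjugate
    b = -2.0 * c.imag                           # i * (coefficient - conjugate)
    v = np.empty(2 * r + 1)
    v[0] = a[0]
    v[1::2] = a[1:r + 1]                        # a_1, a_2, ..., a_r
    v[2::2] = b[1:r + 1]                        # b_1, b_2, ..., b_r
    return v

# A one-second recording sampled at 1000 Hz gives r = 500 and 2r + 1 = 1001 entries.
v_example = spike_train_vector([0.1, 0.35, 0.8], duration=1.0)
print(v_example.shape)
```

With f = 1000 Hz the length of the returned vector is 2r + 1 with r = 500p, in agreement with the text. Stacking one such vector per neuron gives the cluster of N vectors that represents a behavior of the network.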
The behavior B_i of the network for a duration of p seconds is represented by the cluster of N vectors {v_k}_{k=1}^{N}, where

v_k = (e_k(1), ..., e_k(2r+1))^T, (2.5)

r = fp/2, (2.6)

and f is the sample frequency. It is important to note that even when a neuron is oscillating below the threshold for spike initiation, it can still release neurotransmitter and shape the final circuit output (Harris-Warrick & Marder 1991). For simplicity of modeling I shall ignore this fact in this paper.

C) Memory

Memory is not a single concept and there is no universally agreed upon definition of it. However, the starting point for virtually any scientific analysis of memory involves a decomposition into processes of encoding, storage and retrieval (Schacter 2004). Wilder Penfield explored the cortical surface in more than a thousand epileptic patients. On rare occasions (about 8% of all the subjects he tried) he found that electrical stimulation in the temporal lobes produced a coherent recollection of an earlier experience (Kandel et al. 2000). A similar phenomenon has been observed by stimulating the inferotemporal (IT) cortex of macaque monkeys (Afraz et al. 2006). Mild stimulation of the IT cortex of macaques previously trained to distinguish between faces and non-faces created in the animals the impression of seeing a face where there was actually no face. An opposite phenomenon has been reported for the human brain (Quian Quiroga et al. 2005). Viewing images of interesting objects, including faces of celebrities, can make particular neurons fire in the sub-region of the medial temporal lobes (MTL) of the human brain consisting of the hippocampus, amygdala, entorhinal cortex and parahippocampal gyrus. This supports the hypothesis that object cognition activates a network in the brain, and that artificially activating the network to the appropriate degree will create the impression of perceiving the object even when it is not present in the environment.
The following hypothesis seems to hold. A particular cognition involves a particular network in the brain where the memory of the cognition remains stored. The higher processing areas of the network takes increasingly greater part in the cognition and also greater part in storing and retrieving the memory. For the purpose of this paper it would be enough to be able to express memory in terms of behavior of a network. If the behavior of the network is iB the memory associated with it will be { }N ki uM 1== . The relationship between ku and kv is determined by long term potentiation (LTP) and long term depression (LTD) if iM is residing in the network long after iB has happened. It may be appropriate to emphasize at this point that the collective firing pattern of neurons (i.e., the collective spike trains of neurons belonging to the network) to evoke iM is preserved by the synaptic connections in the network, for they control input to the neurons in response to the stimuli and therefore iM is stored in the synaptic strengths of the network. A metric between iB and iM can be defined in the following manner { }∑ ∑ MBd ii , (2.7) where N is the number of neurons in the network. Although virtually impossible to compute in this form d can be a measure of plasticity of the whole network with respect to iB and iM . Note that the time duration has been taken care of in p at the time of determining )( je i and )( je i (neuronal firing patterns may not be identical at the time of a behavior and recalling it later both with respect to identical set of stimuli). A single network can mediate multiple behaviors (Harris-Warrick & Marder 1991) and therefore can store multiple memories. Memory associated with a behavior may either be positive (appetitive) or negative (aversive) or neutral or a combination of them. The sense positive, negative and neutral are totally subjective and no attempt will be made here to define them. 
The meaning will become clear in the contexts in which they occur. A simple behavior is one with which only one type of memory (i.e., either positive or negative or neutral) remains associated. A behavior with combined types of memory can be called complex behavior and let us assume any behavior can be decomposed into simple behaviors. A network therefore can be thought to have memories of simple behaviors only. When the difference between iB and iM is small i.e., in (2.7) ),( ii MBd becomes small, consolidation of iM is good. ),( ii MBd is intimately related to synaptic plasticity. D) Plasticity (2.7) gives us an immediate measure (no matter however difficult it is or may even be impossible to implement) of plasticity of a neuronal network. The measure of combined long term plasticity )(tW as a combination of long term potentiation (LTP) and long term depression (LTD) can be given by the following formula ∑ ii MBd ),( , (2.8) where p is the duration of recording of the neural response when iB is happening and when iM is being retrieved both in response to identical set of stimuli and t is the time difference between end of happening of iB and start of retrieving iM . Usually p remains fixed and therefore )(),( tWtpW = . m is the total number of past behaviors whose memory is still preserved in the network. When the gap between occurrence of iB and recalling of iM is long (30 minutes or more according to some estimate (Koch 1999)) (2.8) gives the long term plasticity and when it is shorter (2.8) gives short term plasticity. The following notion will also be useful ),()( iii MBdtW = , (2.9) where )(iWi is the network plasticity between behavior iB and memory iM after time t . E) Feeling From a modeling or computational point of view the feelings may be taken as mediating the intensity of a behavioral response. In this sense if iB is a simple behavior the feeling associated with it determines how intensely negative or intensely positive will be the memory of it. 
Ideally iM should have an ‘intensity distribution function’ similar to a normal distribution function, which is decayed as the distance from the mean position is increased. iM is not a single point, but a cluster of points. So if the intensity distribution function is to be modeled after the normal distribution function the most appropriate candidate for the mean point turns out to be the mean point of iM M ii  += ∑∑ )12(),.......,1( , (2.10) where T stands for transpose. The feeling function RRF : associated with iM is to be defined as  2/12/)12( i MXMXXF , (2.11) where Σ is the covariance matrix and Σ is the determinant of Σ . Since X has 12 +r independent component variables only diagonal entries of Σ are nonzero and each of them is the variance of a component variable in X . Σ is the parameter which controls the intensity of )(XFi beyond iM . The hormones and neuromodulators responsible for mediating feelings act by controlling the entries of Σ . F) Interaction Interaction means here the part played by synapses in the network in mediating the effects of sM i ' on an ongoing behavior B . In mathematical term it can be put as ))(())(()( tpWfttpWftI iii −−+−= , (2.12) where )( tI i denotes the interaction and t is any instant during occurrence of B . if is a continuous function which is almost everywhere differentiable. When B takes place the plasticity of the network changes and so also the interaction. Let us normalize interaction by the following formula )( , (2.13) where T is the duration of happening of B . 3. Integrative dynamics Now let us briefly consider a phenomenon in electrostatics. Let there be seven charged particles on a plane and they are all in arbitrary but fixed positions. Four of them are positively charged and three are negative. A new negative charge is introduced, which is allowed to move freely in the plane, that is, it has two degrees of freedom along the X and Y axes. Assume apart from sign all the charges are quantitatively equal. 
Let the locus of the introduced negative charge be ),( yx . The position of four positive charges be { }4 ),( =iii yx and that of the three negative charges be { } ),( =iii yx . The Coulomb force ),( yxF acting on the free charge is given by ∑∑ == −+− )()()()( i iii ii yyxx yxf , (3.1) where C is a constant. (3.1) will govern the dynamics of the whole system. Now what should be the condition to keep the introduced charge fixed within a bounded region enclosing the seven fixed positioned charge so that the potential energy on the introduced particle becomes minimum? On line of this analogy the dynamics of a new behavior B should follow the governing expression )(BG given by the following equation ( ) ( ) = +−= smi i )( , (3.2) where m is the number of simple behaviors whose memories are stored in the network under consideration. It has been assumed that B will be attracted towards s behaviors with positive experience and will be repulsed by the remaining sm − behaviors with negative experience. Like (3.1) there is no apparent reason why inverse square law should hold for (3.2) also. If the inverse square law does not hold then the denominators on the right side of (3.2) should be replaced by a general polynomial in the metric d , for any continuous function can be approximated to any desired degree within a compact interval by a suitable polynomial. Like the electrostatic situation for a given iM )( iMG must have a unique pole at the iM . This means in general (3.2) can have the form ( ) ( ) = +−= 1 1 ),( )( , (3.3) for some positive n . However in this paper I shall adhere to (3.2), for the model here is essentially electrophysiological and therefore Coulomb interaction seems to be probable. In summary the dynamics of the new behavior B is described by (2.7), (2.11), (2.13) and (3.2). 
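The electrostatic analogy can be made concrete with a small sketch. It assumes unit charges, the ordinary inverse-square Coulomb law in the plane and a constant C = 1. The charge positions and the function name net_force are hypothetical choices made only for illustration; they do not come from the paper.

```python
import numpy as np

def net_force(pos, fixed_pos, fixed_sign, free_sign=-1.0, C=1.0):
    """Vector Coulomb force on a free unit charge at pos.

    fixed_pos:  (n, 2) array-like of fixed charge positions.
    fixed_sign: +1.0 or -1.0 for each fixed charge.
    free_sign:  sign of the free charge (negative in the text).
    """
    pos = np.asarray(pos, dtype=float)
    force = np.zeros(2)
    for q_pos, q_sign in zip(np.asarray(fixed_pos, dtype=float), fixed_sign):
        sep = pos - q_pos                  # points from the fixed charge to the free one
        dist = np.linalg.norm(sep)
        # Like signs push the free charge along sep, unlike signs pull it back.
        force += C * free_sign * q_sign * sep / dist**3
    return force

# Four positive and three negative fixed charges (hypothetical positions).
fixed_positions = [(1, 0), (0, 1), (-1, 0), (0, -1), (2, 2), (-2, 1), (1, -2)]
fixed_signs = [1, 1, 1, 1, -1, -1, -1]
print(net_force((0.3, 0.2), fixed_positions, fixed_signs))
```

The magnitude of each term falls off as the inverse square of the distance, which is the property the text carries over to the behavior dynamics when the distance is replaced by the metric d between a behavior and a stored memory.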
In analogy with the electrostatic system (3.2) is a ‘force like’ expression with which a ‘potential energy like’ expression needs to be associated, which for the sake of stability of the system (i.e., if perturbed infinitesimally will come back to the original state) needs to be minimized. Let the potential function associated with B due to iM be iφ given by ∑ ))12(),......,1((φφ . (3.4) The potential function associated with the emerging new behavior B due to all previous behaviors iB ( iM is memory of iB ) be φ , which is given by ∑ φφ . (3.5) To derive the potential energy classically from the force field the force must have to be conservative, i.e., independent of time (Goldstein 1950). In (3.2) the expression )(tI i is a function in time. However in case of a very strong aversive stimulus like, locating a predator dangerously close all of a sudden or being on the verge of falling down deep underneath from a very high roof top after a sudden slip, an extremely fast behavioral response must have to be shown. Within this duration synaptic plasticity does not get much chance to act and the feeling must have to be very strong (such as intense fear) to compensate for that as is evident from (3.2). This gives us an opportunity to treat )(tI as a fixed quantity which makes )(BG in (3.2) time independent for the duration of B . Then in analogy with the electrostatic system )(BG must satisfy the following relation ∑ ∑∑ . (3.6) The value of )(BG has only been considered and not its direction. (3.2) and (3.6) together give ( ) ( ) ∑ ∑∑ ∑∑ = +−== = smi i jeN 1 1 . (3.7) (3.7) describes the dynamics of the behavior of B in the 12 +r R under the assumption that )(tI i is constant (otherwise the ‘force field’ would not have been conserved and only space dependent potential expressions could not have been brought in the dynamics). Also the time scale is very small. 
φ as given by (3.5) will have to be minimized (minimization of energy is an important criterion for neural computation (Laughlin 2004)), which means )}12(,.....,1{,0 . (3.8) Combining (3.4) and (3.5) the expression for φ becomes ∑∑ k ree ))12(),......,1((φφ . (3.9) If (3.9) is to give a global minimum it should not only be true when φ is given by (3.9), but also it must hold when φ is given by ∑ ∑ kki reeyx ))12(),......,1((φφ , (3.10) where )1,1(, ccyx ki +−∈ for some small 0>c . This is because a global minimum is a very stable position and therefore under a small perturbation the system always comes back to it. Let ikki zyx = . Then combining (3.8) and (3.10) the following is obtained +++ 0 ..... ..... ..... ..... ..... ..... ..... 12,,12,2,112,1,1 2,,2,2,12,1,1 1,,1,2,11,1,1 NmrNmrr , (3.11) where ))12(),......,1(( . Note that each kiz , or ikz can take uncountably infinitely many values from some (small) open interval and for all of them (3.11) holds under the perturbation principle. Therefore the mNr ×+ )12( matrix on the left of (3.11) must represent a null linear transformation and it must be a null matrix, which implies jki jki ,,,0 ))12(),......,1(( ,, ∀= . (3.12) (3.7) and (3.12) together imply ( ) ( ) = +−= smi i . (3.13) (3.13) says, “Negative and positive experiences in a neuronal network must counter balance each other for a stable (which will not be altered due to presence of some amount of noise or distraction) unsupervised learning from the experience gained through a new behavior.” 4. Application What do we get from the principle enunciated at the end of the last section in response to a strong aversive stimulus? Its mathematical formulation (3.13) says, in the face of strong aversive stimuli (such as an imminent danger) the ensuing behavior B must avoid repeating the past behaviors sBi ' with negative experience sM i ' . Note that the stimuli can invoke an iM if and only if iB is at least partially activated by the stimuli. 
(3.13) says that the ensuing behavior B will have to be such that none of the B_i's with negative experience is repeated. This means that in the information space R^{2r+1} the behavior B will have to sit away from each M_i that denotes the memory of a negative behavior. B will have its own positive and negative parts, which will later be stored as new positive and negative memories in the network. B cannot be neutral. This will be the subject of a future work.

A) Aversion response latency

The prefrontal cortex participates in linking the perception of stimuli to the guidance of behavior, including the flexible execution of strategies for obtaining rewards and avoiding punishments as an organism interacts with its environment. In recordings from neurons within healthy tissue in the ventral sites of the right prefrontal cortex, short latency (120–160 ms) responses selective for aversive visual stimuli have been observed. No such latency was observed for pleasant or neutral stimuli (Kawasaki et al. 2001). (3.13) can offer an explanation of this phenomenon. Aversive stimuli evoke negative (or aversive) memory (that is how the stimuli are identified as aversive; even a novel aversive stimulus has to be decomposable into known aversive features), and therefore the ensuing behavior B must ensure that it has minimum overlap with the negative memories in the space R^{2r+1}. This in turn makes sure that B acts on the network as little as possible in a way that repeats the behaviors associated with the negative memories. When there are only aversive stimuli (no pleasant stimulus), the new behavior has to be organized to avoid the stimuli, particularly when the stimuli are strongly aversive (like an immediate threat). Clearly a strongly aversive stimulus must invoke the memory M_j of at least one simple negative behavior with which a strong feeling is associated. In other words, the amygdala makes sure that Σ in (2.11) has larger entries.
Then there must be the memory jM of a simple positive behavior which can take appropriate action in response to the behavior (sensation) corresponding to jM . Even a new born is hardwired to express displeasure by crying in response to an aversive stimulus so that some one else (possibly the mother) is alerted and come in help to avoid the stimulus. In order to avoid the aversive stimuli the dynamics of B will be such that (3.13) can take the following form ( ) ( )∑∑ == + , (4.1) where )(XF j is the feeling associated with jM and )(tI j is the network interaction between B and jM , k is the number of simple positive behavior recalled. This will counterbalance the aversive effect of the stimuli and make sure that (3.13) holds. Whereas in case of pleasant or neutral stimuli (without the presence of aversive stimuli) avoidance is not necessary, the brain is free to repeat the behaviors with pleasant memory. In that case no global minimization of the potential function will be necessary and left side of (3.7) does not need to be zero. Therefore equality in (4.1) will also not be necessary. The new behavior can sit anywhere and no ‘optimum positioning’ (making the potential function globally minimum) will be necessary and this will require less time for a response and therefore there will be no latency in the response to pleasant stimuli. When there will be both positive and negative stimuli (3.13) will hold. Thus it appears that whenever aversive (negative) stimuli are present either (3.13) or (4.1) needs to be calculated. Time taken by this ‘calculation’ in the brain is the reason for latency when there are aversive stimuli. Whereas no such latency is necessary for positive or neutral stimuli. Probably the brain does not calculate the way shown in this paper. Had it done so it would have taken a much longer latency. 
But the reasoning here is compelling enough to conclude that calculations responsible for the observed latency do take place in the brain in one form or another.

B) Sequential decision making

A brain always has to make choices; i.e., a behavior is a series of switching or decision-making procedures. Rabinovich and colleagues have considered the dynamics of decision making (DM) by the brain, or cognitive state machine (CSM), at the psychological level (Rabinovich et al. 2006). The focus of this paper, in contrast, is on the dynamics of the neural substrate of such processes. At each DM instant of a CSM, the underlying neural dynamics of the DM generates the psychological process of DM in the CSM. The dynamics of the CSM is governed by the Lotka-Volterra type equation (Rabinovich et al. 2006)

$\dot{a}_i = a_i \left[ \sigma_i(I,t) - \left( a_i + \sum_{j \neq i}^{N} \rho_{ij} a_j \right) \right] + \eta_i(t)$, (4.2)

where $a_i(t)$ is the state of the CSM, $\sigma_i(I,t)$ is a control function which controls the dynamics given by (4.3), $I$ is the input (environmental stimulus), $\rho_{ij}$ is a coupling constant between $a_i$ and $a_j$ based on genetic and memorized information (very similar to the interaction $I_i(t)$ (2.13) between the memories of the old behavior and the ensuing new behavior, which in this paper has been taken to be constant), $N$ is the total number of states and $\eta_i(t)$ is the external noise. Note that in (4.2) the meanings of $\sigma_i$ and $N$ are different from those anywhere else in the earlier part of this paper.

[…], (4.3)

where $U_i$ is a potential function and $\tau$ is a characteristic time, which is very small. Changes of state occur at the instants $t_i$ where a decision needs to be made, as shown in Figure 1. At an instant $t_j$ where a decision has to be made, $a_i$ can be chosen in as many ways as there are minima of $U_i$.

Figure 1: A sequence of cognitive states. Thin lines are possible paths and the decision path is shown by thick lines. $t_3, t_4, t_6, t_9$ are the instants at which a decision is made. Adapted from (Rabinovich et al. 2006).
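To make the role of each term in a Lotka-Volterra type competition concrete, the following is a minimal numerical sketch; it is not the computation of Rabinovich et al. It assumes the generalized Lotka-Volterra form da_i/dt = a_i (σ_i − Σ_j ρ_ij a_j) + noise with constant σ_i (the diagonal entries ρ_ii = 1 carry the self-inhibition term), integrates it with a simple Euler-Maruyama step, and clips the states at zero. The function name `simulate_glv` and all parameter values are assumptions made here for illustration.

```python
import random


def simulate_glv(sigma, rho, a0, dt=0.01, steps=5000, noise=0.0, seed=0):
    """Euler-Maruyama integration of an assumed generalized Lotka-Volterra system.

    da_i/dt = a_i * (sigma[i] - sum_j rho[i][j] * a_j) + noise * white noise.
    All parameters are illustrative; sigma is held constant in time.
    """
    rng = random.Random(seed)
    n = len(a0)
    a = list(a0)
    for _ in range(steps):
        updated = []
        for i in range(n):
            inhibition = sum(rho[i][j] * a[j] for j in range(n))
            drift = a[i] * (sigma[i] - inhibition)
            kick = noise * rng.gauss(0.0, 1.0) * dt ** 0.5
            # States are kept non-negative, as activity levels.
            updated.append(max(a[i] + drift * dt + kick, 0.0))
        a = updated
    return a


# Two mutually inhibiting states; the one that starts ahead wins the competition.
final = simulate_glv(sigma=[1.0, 1.0],
                     rho=[[1.0, 1.5], [1.5, 1.0]],
                     a0=[0.6, 0.4])
print(final)
```

With strong mutual inhibition (off-diagonal couplings larger than the diagonal ones) the noise-free system settles into a state in which one component dominates, which is the kind of switching behavior a decision instant in Figure 1 represents.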
(3.13) is concerned with the dynamics of the neural substrate of each single decision making (DM) in the sequence of DMs shown in Figure 1. (3.13) is also derived by minimizing the potential function (3.9). Since the dynamics given by (3.7) is concerned with a local decision making in Figure 1, it was possible to take the global optimum of the potential function given by (3.9) (global optimum of a local potential function), and its stability ensures that the local dynamics is not perturbed by a small amount of noise. This is very important for stable cognition or stable behavior. According to this model, the alteration in behavior (to make it dynamic) should be introduced by changing feeling (given by (2.11)) and synaptic plasticity (given by (2.13)), which happens from state to state in a CSM. Therefore, despite the global minimum of the local dynamics, the CSM manages to move on from one state to the next until it reaches the end of life, which may be signified by lack of emotional impetus. On the other hand, at the sequential DM level in a CSM (Figure 1), introduction of noise plays a significant role in dynamically altering the behavior (equation (4.2)). In both the dynamical system of this paper and that of Rabinovich et al., the time length has been taken to be short, so short that the dynamics of $\sigma_i(I,t)$ (in (4.3)) and $I_i(t)$ (in (2.13)) do not change within that period. In this scenario, comparing (3.13) and (4.2), it appears that the external noise $\eta_i(t)$ may play an important role in mediating feeling (emotional distraction or attraction with respect to an external stimulus). (3.13) and (4.1) describe how new behavior is formed depending on synaptic plasticity, memory and feeling. However, there is no deterministic way to compute the new behavior from (3.13) or (4.1), even if all the information is available.
Also (3.13) or (4.1) will be extremely difficult (if not impossible) to compute except for cases like a simple behavior (such as inking or gill withdrawal) of a simple animal (such as the sea snail Aplysia californica). This is in conformity with the fact that closer is the model to the neuronal network level of the brain the more difficult it would be to implement. On the other hand the CSM dynamics given by (4.2) is very convenient to compute. C) Potential function The potential function introduced by (3.9) in the neuronal network dynamics of the brain can be related to metabolic energy consumption during activation of the network to execute a behavioral or cognitive task. (3.9) is based on electrophysiology and metabolism is related to hemodynamics. Lot of research is going on to find the relationship between these two activities of the brain (Logothetis & Wandell 2004). Recently a correlation study between stimulus based ERP and fMRI in different parts of the human brain has been reported (Gore et al. 2006). Despite convincing evidence of their interdependence, such as in the form of dependence of fMRI on ERP or EEG, precise knowledge about the nature of dependence is still lacking. A linear transform model had been proposed based on the hypothesis that fMRI responses are proportional to local average neural activity averaged over a period of time (Boynton et al. 1996). It has been reported that in the visual cortex of macaque monkey fMRI response depends closely on the local field potential (LFP) (Logothetis et al. 2001). Note that the potential function given by (3.5) can be obtained from (3.7) when the time course is short and the quantities on the right side are known. From (3.7) it is clear that the potential function φ (which is defined on the Fourier coefficients of neural spike trains (3.9)) depends on past memory, synaptic plasticity and feelings. 
For a potential function φ, the sum of local components gives the function over the whole space. Therefore, no matter how large and distributed the network is, the local component of φ at a point $x$ on the cortex (denoted by $\varphi_x$) can be represented by (3.7), where only the neurons in a small neighborhood of $x$, denoted by $n(x)$, participate in the computation of $\varphi_x$. Since $\varphi_x$ is defined on the Fourier coefficients of spike trains of the neurons in $n(x)$, and the value of $\varphi_x$ depends on past memories, plasticity of synapses within $n(x)$ and the part of feeling arising within $n(x)$, it is supposed to have a role in the BOLD fMRI response at $x$ with respect to a stimulus. $\varphi_x$ may have an affine or linear relation with the stimulus-driven difference in local BOLD signal at $x$ as modeled in (Logothetis & Wandell 2004). But this needs experimentation to establish.

Conclusion

In this paper a system-level model of the brain functions for behavior has been proposed. The model needs input from individual neurons under the assumption that a brain circuit responsible for a behavior can be understood by the behavior of neurons alone. This assumption has limitations, for significant change in a neural network may occur at the sub-threshold level in the neurons and therefore without changing the spike trains (Harris-Warrick & Marder 1991). Note that the FFT-based information retrieval technique followed in this paper will hold equally well for subthreshold signals, whereas spike detection or prediction algorithms may not work for those signals. To account for a fuller neural computation, synaptic computation (Abbott & Regehr 2004) and dendritic computation (London & Hausser 2005) will also have to be incorporated. A close investigation into the nature of $I(t)$ (equation (2.13)) will inevitably call attention to synaptic and dendritic computations.
The model equation (3.2) will be valid in all generality, but the important consequence (3.13) cannot be drawn as easily as shown in this paper. The effectiveness of the model dynamics is only for a very short period of time and therefore only behaviors in response to strong aversive stimuli have been considered, for such response behaviors must have to be very quick. The time duration is so short that it has been assumed – the average change in the collection of synapses in a network remains fixed within that time. This is an important constraint for introducing potential energy function whose global minimization gives stability to the behavioral response in the sense that it remains unperturbed in a noisy environment. Even without incorporating synaptic and dendritic computations the model stands extremely difficult to implement (for example, calculating iB and then calculating iM will be very challenging). However it can be implemented in case of a simple behavior by a simple animal. In invertebrates like the sea snail Aplysia lesser number of neurons will have to be monitored (only a few in case of simple behaviors like gill withdrawal or inking) and single cell recordings will be less challenging than the mammals. This will make the verification of the system possible. In case of human brain functions if the behavior of the circuit involved can be monitored to a good extent by recording signals only from a few neurons verification of the model may be feasible for some very simple behavior like the following. Consider the experiment of tapping the leg (Kandel et al. 2000). If there is a sharp edge such that dragging the leg too much backward will make it bump on the sharp edge which is another aversive stimulus. The final position of the leg will be away from the source of tapping as well as the sharp edge. 
If a relationship between the potential function φ and the metabolic or hemodynamic activity of the brain can be established, study of the system will become easier. Introduction of an electrophysiological potential function in the neural computation is a significant outcome of the FFT based information extraction method followed in this paper. Apart from the prospect of relating it to metabolic energy requirements of the brain φ gives the opportunity of deriving interesting theoretical results as shown in this paper. Since the expression of feeling as given by (2.11) is independent of time here the detail of feeling did not have to be considered. But feeling is likely to be dependent on time in the long run and in that case the entries of Σ (which is a diagonal matrix) will be function of time and the dynamics will be much more complicated. How the eigen values of Σ are controlled by the limbic system of the brain is an open question. No attempt has been made in this paper to answer the question. Acknowledgement The author thankfully acknowledges the Institute of Mathematical Sciences in Chennai, India for a postdoctoral fellowship under which this work has been carried out. Some comments by Peter Dayan on a preliminary version of this manuscript are also being acknowledged. References Abbott, L. A. and Regehr, W. G. (2004) Synaptic computation. Nature 431:796–803. Afraz, S. R., Kiani, R. and Esteky, H. (2006) Microstimulation of inferotemporal cortex influences face categorization. Nature 442: 692–695. Borisyuk, R. and Rinzel, J. (2005) Understanding neuronal dynamics by geometrical dissections of minimal models. In: Chow., C, Gutkin, B, Hansel, D, Meunier, C and Dalibard, J, (eds.) Models and Methods in Neurophysics, Proc Les Houches Summer School 2003, (Session LXXX) Elsevier:19–72. Boynton, G. M., Engel, S. A., Glover, G. H. and Heeger, D. J. (1996) Linear systems analysis of functional magnetic resonance imaging in human V1. J. Neurosci. 16(13): 4207–4221. 
Engel, A.K., Fries, P. and Singer, W. (2001) Dynamic predictions: oscillations and synchrony in top-down processing. Nat. Rev. Neurosci. 2: 704–716. Goldstein, H., (1950) Classical Mechanics, Reading, MA, Addison-Wesley. Harris-Warrick, R. M. and Marder, E. (1991) Modulation of neural networks for behavior. Annu. Rev. Neurosci. 14: 39–57. Gore, J. C., Horovitz, S. G., Cannistraci, C. J. and Skudlarski, P. (2006) Integration of fMRI, NIROT and ERP for studies of human brain function. Magnetic Resonance Imaging 24: 507–513. Kandel, E. R., Schwartz, J. H. and Jessell, T. M. (2000). Principles of Neural Science, New York, McGraw Hill. Kawasaki, H., Adolphs, R., Kaufman, O., Damasio, H., Damasio, A. R., Granner, M., Bakken, H., Hori, T. and Howard III, M. A. (2001) Single-neuron responses to emotional visual stimuli recorded in human ventral prefrontal cortex. Nature Neuroscience 4: 15–16. Koch, C. (1999) Biophysics of Computation: Information Processing in Single Neuron. Oxford University Press, New York. Laughlin, S. B. (2004) The implications of metabolic energy requirements for the representation of information in neurons. In: Gazzaniga, M. S. (ed.) The Cognitive Neurosciences, MIT Press, Cambridge, MA, 187–196. Logothetis, N. K., Pauls, J., Augath, M., Trinath, T. and Oeltermann, A. (2001) Neurophysiological investigation of the basis of the fMRI signal. Nature 412: 150–157. Logothetis, N. K. and Wandell, B. A. (2004) Interpreting the BOLD signal, Annu. Rev. Physol. 66: 735– 769. Loh, M. and Deco, G. (2005) Cognitive flexibility and decision-making in a model of conditional visuomotor associations. Euro. J. Neurosci. 22: 2927–2936. London, M. and Hausser, M. (2005) Dendritic computation. Annu. Rev. Neurosci. 28:503–532. Majumdar, K. K. and Kozma, R. (2006). Studies on sparse array cortical modeling and memory cognition duality, Proc. IJCNN’06, Vancouver, Canada:4954–4957. Quian Quiroga, R., Reddy, L., Kreiman, G., Koch, C. and Fried, I. 
(2005) Invariant visual representation by single neurons in the human brain. Nature 435: 1102–1107. Rabinovich, M. I., Huerta, R. and Afraimovich, V. (2006) Dynamics of sequential decision making. Phys. Rev. Lett. 97: 188103–1–4. Salum, C., Morato, S. and Roque-da-Silva, A. C. (2000) Anxiety-like behavior in rats: a computational model. Neural Networks 13: 21–29. Schacter, D. L. (2004) Introduction (to Part VI: Memory). In: Gazzaniga, M. S. (ed.) The Cognitive Neurosciences, MIT Press, Cambridge, MA, 643–645. Schmajuk, N. A. (1997) Proteus Caught in A (Neural) Net. Animal Learning and Cognition: A Neural Network Approach, Cambridge University Press, Cambridge. ABSTRACT In this paper a theoretical model of functioning of a neural circuit during a behavioral response has been proposed. A neural circuit can be thought of as a directed multigraph whose each vertex is a neuron and each edge is a synapse. It has been assumed in this paper that the behavior of such circuits is manifested through the collective behavior of neurons belonging to that circuit. Behavioral information of each neuron is contained in the coefficients of the fast Fourier transform (FFT) over the output spike train. Those coefficients form a vector in a multidimensional vector space. Behavioral dynamics of a neuronal network in response to strong aversive stimuli has been studied in a vector space in which a suitable pseudometric has been defined. The neurodynamical model of network behavior has been formulated in terms of existing memory, synaptic plasticity and feelings. The model has an analogy in classical electrostatics, by which the notion of force and potential energy has been introduced. Since the model takes input from each neuron in a network and produces a behavior as the output, it would be extremely difficult or may even be impossible to implement. But with the help of the model a possible explanation for an hitherto unexplained neurological observation in human brain has been offered. 
The model is compatible with a recent model of sequential behavioral dynamics. The model is based on electrophysiology, but its relevance to hemodynamics has been outlined. <|endoftext|><|startoftext|>
1. Introduction
2. Quivers and path algebras
3. Potentials and their Jacobian ideals
4. Quivers with potentials
5. Mutations of quivers with potentials
6. Some mutation invariants
7. Nondegenerate QPs
8. Rigid QPs
9. Relation to cluster-tilted algebras
10. Decorated representations and their mutations
11. Some three-vertex examples
12. Some open problems
13. Appendix. Proof of Lemma 4.12
Acknowledgments
References

HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Date: April 18, 2007; revised July 10, 2007 and March 12, 2008.
2000 Mathematics Subject Classification. Primary 16G10, Secondary 16G20, 16S38.
Research of H. D. supported by the NSF grant DMS-0349019. Research of J. W. supported by the NSF grant DMS-0600229. Research of A. Z. supported by the NSF grant DMS-0500534 and by a Humboldt Research Award.
http://arxiv.org/abs/0704.0649v4

1. Introduction

The main objects of study in this paper are quivers with potentials (QPs for short). Roughly speaking, a QP is a quiver Q together with an element S of the path algebra of Q such that S is a linear combination of cyclic paths. We associate to S the two-sided ideal J(S) in the path algebra generated by the (noncommutative) partial derivatives of S with respect to the arrows of Q. We refer to J(S) as the Jacobian ideal, and to the quotient of the path algebra modulo J(S) as the Jacobian algebra. They appeared in physicists’ work on superpotentials in the context of the Seiberg duality in mirror symmetry (see e.g., [17, 2, 6]).
Since in some of their work the superpotentials are required to satisfy some form of Serre duality, we prefer not to use this terminology, and just refer to S as a potential; another reason for this is that we are working with the completed path algebra, so our potentials are possibly infinite linear combinations of cyclic paths. The Jacobian algebras also play an important role in the recent work on Calabi-Yau algebras [4, 25, 26, 27].

In this paper we introduce and study mutations for QPs and their (decorated) representations. In the context of Calabi-Yau algebras, the mutations were discussed in [26] but our approach is much more elementary and down-to-earth. Namely, we develop the setup that directly extends to QPs the Bernstein-Gelfand-Ponomarev reflection functors [3] and their “decorated” version [28].

The original motivation for our study comes from the theory of cluster algebras introduced and studied in a series of papers [18, 19, 1, 20]. In this paper, we deal only with the underlying combinatorics of this theory embodied in skew-symmetrizable integer matrices and their mutations. Furthermore, we restrict our attention to skew-symmetric integer matrices. Such matrices can be encoded by quivers without loops and oriented 2-cycles. Namely, a skew-symmetric integer n × n matrix $B = (b_{i,j})$ corresponds to a quiver $Q(B)$ with vertices 1, . . . , n, and $b_{i,j}$ arrows from j to i whenever $b_{i,j} > 0$. For every vertex k, the mutation at k transforms B into another skew-symmetric integer n × n matrix $\mu_k(B) = \overline{B} = (\overline{b}_{i,j})$. The formula for $\overline{b}_{i,j}$ is given below in (7.4). It is well-known (see Proposition 7.1 below) that the quiver $Q(\overline{B})$ can be obtained from $Q(B)$ by the following three-step procedure:
Step 1. For every incoming arrow a : j → k and every outgoing arrow b : k → i, create a “composite” arrow [ba] : j → i; thus, whenever $b_{i,k}, b_{k,j} > 0$, we create $b_{i,k} b_{k,j}$ new arrows from j to i.
Step 2.
Reverse all arrows at k; that is, replace each arrow a : j → k with a⋆ : k → j, and b : k → i with b⋆ : i → k.
Step 3. Remove any maximal disjoint collection of oriented 2-cycles (that can appear as a result of creating new arrows in Step 1).

In the case where k is a source or a sink of $Q(B)$, the first and last steps of the above procedure are not applicable, so $Q(\overline{B})$ is obtained from $Q(B)$ by just reversing all the arrows at k. In this situation, J. Bernstein, I. Gelfand, and V. Ponomarev [3] introduced the reflection functor at k sending representations of a quiver $Q(B)$ (without relations) into representations of $Q(\overline{B})$. A modification of these functors acting on decorated representations was introduced in [28] to establish a link between cluster algebras and quiver representations (the definition of decorated representations for general QPs is given below in Section 10).

QUIVERS WITH POTENTIALS I

The elementary approach of [28] has not been further pursued until now, giving way to a more sophisticated approach via cluster categories and cluster-tilted algebras developed in [7, 8, 9, 10, 12, 13, 14, 15] and many other publications. Most of the results in these papers are for the quivers obtained by mutations from hereditary algebras (i.e., quivers without oriented cycles and without relations). In this paper we return to the more elementary point of view of [28] and propose an alternative approach (which is in fact more general, since we do not impose any restrictions on quivers in question). In this approach, the mutations at arbitrary vertices (not just sources or sinks) are defined for QPs and their decorated representations. The construction for QPs is carried out in Section 5, and for their representations in Section 10. It turns out to be rather delicate and requires a lot of technical preparation.
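The three-step mutation procedure on quivers described above is easy to experiment with on small examples. The following is a minimal sketch, under the assumption that a quiver without loops and oriented 2-cycles is stored as a multiset of (tail, head) pairs; the function name `mutate` and this encoding are illustrative choices made here, not notation from the paper, and the sketch concerns quivers only, not potentials.

```python
from collections import Counter


def mutate(arrows, k):
    """Mutate a quiver at vertex k.

    `arrows` maps (tail, head) to the number of arrows tail -> head and is
    assumed to contain no loops and no oriented 2-cycles.
    """
    result = Counter()
    # Step 2: reverse the arrows at k; all other arrows are copied unchanged.
    for (tail, head), n in arrows.items():
        if k in (tail, head):
            result[(head, tail)] += n
        else:
            result[(tail, head)] += n
    # Step 1: one composite arrow j -> i for every pair j -> k, k -> i.
    for (j, head), n_in in arrows.items():
        if head != k:
            continue
        for (tail, i), n_out in arrows.items():
            if tail == k:
                result[(j, i)] += n_in * n_out
    # Step 3: cancel oriented 2-cycles between each pair of vertices.
    for (tail, head) in list(result):
        cancel = min(result[(tail, head)], result[(head, tail)])
        if cancel:
            result[(tail, head)] -= cancel
            result[(head, tail)] -= cancel
    return Counter({edge: n for edge, n in result.items() if n > 0})


linear = Counter({(1, 2): 1, (2, 3): 1})   # the quiver 1 -> 2 -> 3
cyclic = mutate(linear, 2)                 # arrows 2 -> 1, 3 -> 2, 1 -> 3
print(sorted(cyclic.items()))
print(mutate(cyclic, 2) == linear)
```

For the quiver 1 → 2 → 3, mutation at 2 creates the composite arrow 1 → 3 and reverses the two arrows at 2; mutating at 2 again produces a 2-cycle between 1 and 3 that Step 3 removes, so the original quiver is recovered.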
The first two steps of the above mutation procedure extend to QPs in a relatively straightforward way, but Step 3 presents a real challenge: we need to accompany the removal of oriented 2-cycles from a quiver with a suitable modification of the potential, leaving the corresponding Jacobian algebra unchanged. Our main device in dealing with this difficulty is Theorem 4.6, which is the crucial technical result of the paper. Roughly speaking, Theorem 4.6 asserts that every potential S can be transformed by an automorphism of the path algebra into the sum of two potentials $S_{\rm triv}$ and $S_{\rm red}$ on the disjoint sets of arrows, where the trivial part $S_{\rm triv}$ is a linear combination of cyclic 2-paths, while the reduced part $S_{\rm red}$ involves only cyclic d-paths with d ≥ 3. Furthermore, the Jacobian algebra of $S_{\rm red}$ is isomorphic to that of S.

Several comments on this result are in order. First, our arguments heavily depend on the setup using completed path algebras, thus allowing potentials to involve infinite sums of cyclic paths. Second, the reduction $S \mapsto S_{\rm red}$ is not given by a canonical procedure. As a consequence, our construction of mutations for QPs and their representations is not functorial in any obvious sense. On the positive side, we prove that every mutation is a well-defined transformation on the right-equivalence classes of QPs (and their representations), where, roughly speaking, two QPs are right-equivalent if they can be obtained from each other by an automorphism of the path algebra (for more precise definitions see Definitions 4.2 and 10.2). Finally, it is important to keep in mind that, even with the help of Theorem 4.6, in order to get rid of all oriented 2-cycles in the mutated QP, one needs to impose some “genericity” conditions on the initial potential S. These conditions are studied in Section 7.
They are not very explicit in general, but we introduce an important class of rigid QPs (see Definitions 6.7 and 6.10) for which the absence of oriented 2-cycles after any sequence of QP mutations is guaranteed.

We now describe the contents of the paper in more detail. In Section 2 we introduce an algebraic setup for dealing with quivers and their path algebras. We fix a base field K, and encode a quiver with the vertex set $Q_0$ and the arrow set $Q_1$ by its vertex span $R = K^{Q_0}$ and arrow span $A = K^{Q_1}$. Thus, R is a finite-dimensional commutative K-algebra, and A is a finite-dimensional R-bimodule. We then introduce the path algebra $R\langle A\rangle = \bigoplus_{d=0}^{\infty} A^d$ and, more importantly for our purposes, the complete path algebra $R\langle\langle A\rangle\rangle = \prod_{d=0}^{\infty} A^d$; here $A^d$ stands for the d-fold tensor power of A as an R-bimodule. We view R〈〈A〉〉 as a topological algebra via the m-adic topology, where m is the two-sided ideal generated by A.

In Section 3 we introduce some of our main objects of study: potentials and their Jacobian ideals. It is natural to view potentials as elements of the trace space R〈〈A〉〉/{R〈〈A〉〉, R〈〈A〉〉}, where {R〈〈A〉〉, R〈〈A〉〉} is the closure of the vector subspace in R〈〈A〉〉 spanned by all commutators. It is more convenient for us to define a potential S as an element of the cyclic part of R〈〈A〉〉; for all practical purposes, S can be replaced by a cyclically equivalent potential, that is, the one with the same image in the trace space. To define the Jacobian ideal J(S) and derive its basic properties, we develop the formalism of cyclic derivatives, in particular, establishing “cyclic” versions of the Leibniz rule and the chain rule. The main result of Section 3 is Proposition 3.7 that asserts that any isomorphism ϕ of path algebras sends J(S) to J(ϕ(S)). Note that cyclic derivatives for general non-commutative algebras were introduced in [29], and the results we present can be easily deduced from those in loc. cit.
For the convenience of the reader, we present complete independent proofs. Victor Ginzburg informed us that in the context of path algebras of quivers, cyclic derivatives were introduced and studied in [5, 24], and that Proposition 3.7 is a consequence of the geometric interpretation of J(S) given in [25, Definition 5.1.1, Lemma 5.1.3].

In Section 4 we introduce quivers with potentials (QPs) and define the right-equivalence relation on them, which plays an important role in the paper. We then state and prove the key technical result of the paper: Splitting Theorem 4.6, already discussed above. The proof is elementary but pretty involved; it uses in an essential way the topology in a complete path algebra. In order not to interrupt the argument, we move to the Appendix our treatment of the topological properties needed for the proof of one of the technical lemmas.

In Section 5 we finally introduce the mutations of QPs. Using Theorem 4.6, we prove that the mutation at an arbitrary vertex is a well-defined involution on the set of right-equivalence classes of reduced QPs (Theorem 5.7).

In Section 6, we study some mutation invariants of QPs. In particular, we show that mutations preserve the class of QPs with finite-dimensional Jacobian algebras (Corollary 6.6). Another important property of QPs preserved by mutations is rigidity (Corollary 6.11), which was already mentioned above. For the precise definition of rigid QPs see Definitions 6.7 and 6.10 below; intuitively, a QP is rigid if its right-equivalence class is invariant under infinitesimal deformations.

In Section 7, we introduce and study nondegenerate QPs, that is, those to which one can apply an arbitrary sequence of mutations without creating oriented 2-cycles. In Corollary 7.4 we show that nondegeneracy is guaranteed by non-vanishing of countably many nonzero polynomial functions on the space of potentials.
In particular, if the base field K is uncountable, a nondegenerate QP exists for every underlying quiver.

Section 8 contains some examples of rigid and non-rigid potentials and some further results illustrating the importance of rigidity. A simple but important Proposition 8.1 asserts that rigid QPs have no oriented 2-cycles. Combining this with the fact that rigidity is preserved by mutations, we conclude that every rigid QP is nondegenerate. Using a result by Keller-Reiten [27], we show in Example 8.7 that the class of rigid QPs (as well as the class of QPs with finite-dimensional Jacobian algebras) is strictly greater than the class of QPs mutation-equivalent to acyclic ones. On the other hand, Example 8.6 exhibits an underlying quiver without oriented 2-cycles that does not admit a rigid QP; thus, the class of nondegenerate QPs is strictly greater than the class of rigid ones.

In Section 9, we consider quivers that are mutation-equivalent to a Dynkin quiver. For every such underlying quiver, we compute explicitly the corresponding rigid QP (Proposition 9.1). Comparing this result with the description of cluster-tilted algebras obtained in [13, 9], we conclude in Corollary 9.3 that in the case in question, every cluster-tilted algebra can be identified with the Jacobian algebra of the corresponding rigid QP. Thus, Jacobian algebras can be viewed as generalizations of cluster-tilted algebras.

In Section 10 we introduce decorated representations of QPs (QP-representations, for short) and their right-equivalence (Definitions 10.1 and 10.2). We then present a representation-theoretic extension of Splitting Theorem 4.6 by defining the reduced part of a QP-representation M (Definition 10.4) and proving that, up to right-equivalence, it is determined by the right-equivalence class of M (Proposition 10.5).
We use this result to introduce mutations of QP-representations and to prove a representation-theoretic extension of Theorem 5.7: the mutation at every vertex is an involution on the set of right-equivalence classes of reduced QP-representations (Theorem 10.13).

Some examples of QP-representations and their mutations are given in Section 11. All these examples treat quivers with three vertices. In particular, we describe the effect of mutations on a special family of band representations coming from the theory of string algebras [11, 21].

The concluding Section 12 contains some open problems about QPs and their representations that we find essential for better understanding of the theory.

In the forthcoming continuation of this paper, we plan to discuss applications of QP-representations and their mutations to the structure of the corresponding cluster algebras.

2. Quivers and path algebras

A quiver $Q = (Q_0, Q_1, h, t)$ consists of a pair of finite sets $Q_0$ (vertices) and $Q_1$ (arrows) supplied with two maps $h : Q_1 \to Q_0$ (head) and $t : Q_1 \to Q_0$ (tail). It is represented as a directed graph with the set of vertices $Q_0$ and directed edges $a : ta \to ha$ for $a \in Q_1$. Note that this definition allows the underlying graph to have multiple edges and (multiple) loops.

We fix a field K, and associate to a quiver Q two vector spaces $R = K^{Q_0}$ and $A = K^{Q_1}$ consisting of K-valued functions on $Q_0$ and $Q_1$, respectively. We will sometimes refer to R as the vertex span of Q, and to A as the arrow span of Q. The space R is a commutative algebra under the pointwise multiplication of functions. The space A is an R-bimodule, with the bimodule structure defined as follows: if $e \in R$ and $f \in A$ then $(e \cdot f)(a) = e(ha)f(a)$ and $(f \cdot e)(a) = f(a)e(ta)$ for all $a \in Q_1$. We denote by $Q^\star$ the dual or opposite quiver obtained by reversing the arrows in Q (i.e., replacing $Q = (Q_0, Q_1, h, t)$ with $Q^\star = (Q_0, Q_1, t, h)$).
The corresponding arrow span is naturally identified with the dual bimodule $A^\star$ (the dual vector space of $A$ with the standard $R$-bimodule structure).

For a given vertex set $Q_0$ with the vertex span $R$, every finite-dimensional $R$-bimodule $B$ is the arrow span of some quiver on $Q_0$. To see this, consider the elements $e_i \in R$ for $i \in Q_0$ given by $e_i(j) = \delta_{i,j}$ (the Kronecker delta symbol). They form a basis of idempotents of $R$, hence every $R$-bimodule $B$ has a direct sum decomposition $B = \bigoplus_{i,j \in Q_0} B_{i,j}$, where $B_{i,j} = e_i B e_j \subseteq B$ for every $i, j \in Q_0$. If $B$ is finite-dimensional, we can identify the (finite) set of arrows $Q_1$ with a $K$-basis in $B$ which is the union of bases in all components $B_{i,j}$; under this identification, every $a \in Q_1 \cap B_{i,j}$ has $h(a) = i$ and $t(a) = j$.

It is convenient to represent an $R$-bimodule $B$ by a matrix of vector spaces $(B_{i,j})$ whose rows and columns are labeled by vertices. In this model, the left (resp. right) action of an element $c = \sum_i c_i e_i \in R$ is given by the left (resp. right) multiplication by the diagonal matrix with diagonal entries $c_i$. The tensor product over $R$ is given by the usual matrix multiplication: if $B = \bigoplus_{i,j} B_{i,j}$ and $C = \bigoplus_{i,j} C_{i,j}$, then
\[ (B \otimes_R C)_{i,j} = \bigoplus_k (B_{i,k} \otimes C_{k,j}). \]

Returning to a quiver $Q$ with the arrow span $A$, for each nonnegative integer $d$, let $A^d$ denote the $R$-bimodule
\[ A^d = \underbrace{A \otimes_R A \otimes_R \cdots \otimes_R A}_{d}, \]
with the convention $A^0 = R$.

Definition 2.1. The path algebra of $Q$ is defined as the (graded) tensor algebra
\[ R\langle A\rangle = \bigoplus_{d=0}^{\infty} A^d. \]
For each $i, j \in Q_0$, the component $R\langle A\rangle_{i,j} = e_i R\langle A\rangle e_j$ is called the space of paths from $j$ to $i$.

As above, we identify the set of arrows $Q_1$ with some basis of $A$ consisting of homogeneous elements, that is, each $a \in Q_1$ belongs to some component $A_{i,j}$. Then for every $d \geq 1$, the products $a_1 \cdots a_d$ such that all $a_k$ belong to $Q_1$, and $t(a_k) = h(a_{k+1})$ for $1 \leq k < d$, form a $K$-basis of $A^d$. We call this basis the path basis of $A^d$ associated to $Q_1$.
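As an illustration of the path basis just described, the following Python sketch enumerates the products $a_1 \cdots a_d$ with $t(a_k) = h(a_{k+1})$ for a small quiver. The encoding of a quiver as a dictionary from arrow names to (head, tail) pairs, the function name `path_basis`, and the two test quivers are assumptions made for this sketch only; they are not notation from the paper.

```python
from itertools import product


def path_basis(arrows, d):
    """Enumerate the path basis of A^d for d >= 1.

    `arrows` maps each arrow name to a pair (head, tail), following the
    convention a : t(a) -> h(a).  A tuple (a_1, ..., a_d) stands for the
    product a_1 ... a_d and is kept when t(a_k) = h(a_{k+1}) holds for
    every pair of consecutive factors.
    """
    if d < 1:
        raise ValueError("for d = 0 the path basis consists of the idempotents e_i")
    names = sorted(arrows)
    return [
        seq
        for seq in product(names, repeat=d)
        if all(arrows[seq[k]][1] == arrows[seq[k + 1]][0] for k in range(d - 1))
    ]


# Hypothetical test quivers: a single loop a : 1 -> 1,
# and two arrows a : 1 -> 2, b : 2 -> 1.
loop = {"a": (1, 1)}
two_cycle = {"a": (2, 1), "b": (1, 2)}

print(path_basis(loop, 3))       # the single path aaa
print(path_basis(two_cycle, 2))  # the alternating products ab and ba
```

The brute-force filter over all $|Q_1|^d$ sequences is chosen for readability; it is only meant for very small quivers and degrees.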
For $d = 0$, we call $\{e_i \mid i \in Q_0\}$ the path basis of $A^0 = R$. We refer to the union of path bases for all $d$ as the path basis of $R\langle A\rangle$. The elements of the path basis will sometimes be referred to simply as paths. We depict $a_1 \cdots a_d$ as a path of length $d$ starting in the vertex $t(a_d)$ and ending in $h(a_1)$. Note that the product $(a_1 \cdots a_d)(a_{d+1} \cdots a_{d+k})$ of two paths is $0$ unless $t(a_d) = h(a_{d+1})$, in which case the product is given by concatenation of paths. This description implies the following:
\[ \text{If } 0 \neq p \in A^k e_i \text{ and } 0 \neq q \in e_i A^\ell \text{ for some vertex } i, \text{ then } pq \neq 0. \tag{2.1} \]

Definition 2.2. The complete path algebra of $Q$ is defined as
\[ R\langle\langle A\rangle\rangle = \prod_{d=0}^{\infty} A^d. \]
Thus, the elements of $R\langle\langle A\rangle\rangle$ are (possibly infinite) $K$-linear combinations of the elements of a path basis in $R\langle A\rangle$; and the multiplication in $R\langle\langle A\rangle\rangle$ naturally extends the multiplication in $R\langle A\rangle$.

Note that, if the quiver $Q$ is acyclic (that is, has no oriented cycles), then $A^d = \{0\}$ for $d \gg 0$, hence in this case $R\langle\langle A\rangle\rangle = R\langle A\rangle$, and this algebra is finite-dimensional.

Example 2.3. Consider the quiver $Q = (Q_0, Q_1)$ with $Q_0 = \{1\}$ and $Q_1 = \{a\}$ with $a : 1 \to 1$. This is the loop quiver. In this case $R = K^{Q_0} = K$, and $A = K^{Q_1} = Ka$. We have $R\langle A\rangle = K[a]$, and $R\langle\langle A\rangle\rangle = K[[a]]$, the algebra of formal power series.

Let $\mathfrak{m} = \mathfrak{m}(A)$ denote the (two-sided) ideal of $R\langle\langle A\rangle\rangle$ given by
\[ \mathfrak{m} = \mathfrak{m}(A) = \prod_{d=1}^{\infty} A^d. \tag{2.2} \]
Thus the powers of $\mathfrak{m}$ are given by
\[ \mathfrak{m}^n = \prod_{d=n}^{\infty} A^d. \]
We view $R\langle\langle A\rangle\rangle$ as a topological $K$-algebra via the $\mathfrak{m}$-adic topology having the powers of $\mathfrak{m}$ as a basic system of open neighborhoods of $0$. Thus, the closure of any subset $U \subseteq R\langle\langle A\rangle\rangle$ is given by
\[ \overline{U} = \bigcap_{n=0}^{\infty} (U + \mathfrak{m}^n). \tag{2.3} \]
It is clear that $R\langle A\rangle$ is a dense subalgebra of $R\langle\langle A\rangle\rangle$.

In dealing with $R\langle\langle A\rangle\rangle$, the following fact is quite useful: every (non-commutative) formal power series over $R$ in a finite number of variables can be evaluated at arbitrary elements of $\mathfrak{m}$ to obtain a well-defined element of $R\langle\langle A\rangle\rangle$.
To illustrate, let us show that $\mathfrak{m}$ is the unique maximal two-sided ideal of $R\langle\langle A\rangle\rangle$ having zero intersection with $R = A^0$. Indeed, it is enough to show that any element $x \in R\langle\langle A\rangle\rangle - \mathfrak{m}$ generates an ideal having nonzero intersection with $R$. Let $x = c + y$ with $c$ a nonzero element of $R$, and $y \in \mathfrak{m}$. Multiplying $x$ on both sides by suitable elements of $R$, we can assume that $c = e_i$ for some $i \in Q_0$, and $e_i y = y e_i = y$. But then $z = e_i - y + y^2 - y^3 + \cdots$ is a well-defined element of $R\langle\langle A\rangle\rangle$, and we have $xz = e_i$, proving our claim.

This characterization of $\mathfrak{m}$ implies that it is invariant under any algebra automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ such that $\varphi|_R$ is the identity. Thus, $\varphi$ is continuous, i.e., is an automorphism of $R\langle\langle A\rangle\rangle$ as a topological algebra.

The same argument shows that, more generally, if $A$ and $A'$ are finite-dimensional $R$-bimodules then any algebra homomorphism $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ such that $\varphi|_R = \mathrm{id}$ sends $\mathfrak{m}(A)$ into $\mathfrak{m}(A')$, hence is continuous. Thus, $\varphi$ is uniquely determined by its restriction to $A^1 = A$, which is an $R$-bimodule homomorphism $A \to \mathfrak{m}(A') = A' \oplus \mathfrak{m}(A')^2$. We write $\varphi|_A = (\varphi^{(1)}, \varphi^{(2)})$, where $\varphi^{(1)} : A \to A'$ and $\varphi^{(2)} : A \to \mathfrak{m}(A')^2$ are $R$-bimodule homomorphisms.

Proposition 2.4. Any pair $(\varphi^{(1)}, \varphi^{(2)})$ of $R$-bimodule homomorphisms $\varphi^{(1)} : A \to A'$ and $\varphi^{(2)} : A \to \mathfrak{m}(A')^2$ gives rise to a unique homomorphism of topological algebras $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ such that $\varphi|_R = \mathrm{id}$, and $\varphi|_A = (\varphi^{(1)}, \varphi^{(2)})$. Furthermore, $\varphi$ is an isomorphism if and only if $\varphi^{(1)}$ is an $R$-bimodule isomorphism $A \to A'$.

Proof. The uniqueness of $\varphi$ is clear. To show the existence, let $Q_1 = \{a_1, \dots, a_N\} \subset A = A^1$ be the set of arrows in $A$. Writing an element $x \in R\langle\langle A\rangle\rangle$ as an infinite $K$-linear combination of the elements of the corresponding path basis in $R\langle A\rangle$, we express $x$ as a (non-commutative) formal power series $F(a_1, \dots, a_N)$. Therefore, a desired algebra homomorphism can be obtained by setting
\[ \varphi(x) = F(\varphi^{(1)}(a_1) + \varphi^{(2)}(a_1), \dots, \varphi^{(1)}(a_N) + \varphi^{(2)}(a_N)). \]
If $\varphi$ is an isomorphism then $\varphi^{(1)} : A \to A'$ is clearly an isomorphism of $R$-bimodules. To show the converse implication, we can identify $A$ and $A'$ with the help of $\varphi^{(1)}$, and so assume that $\varphi^{(1)}$ is the identity automorphism of $A$. Then the (infinite) matrix that expresses $\varphi$ as a $K$-linear map in the path basis of $R\langle\langle A\rangle\rangle$ is lower-triangular with all the diagonal entries equal to $1$ (we order the basis elements so that their degrees weakly increase). Clearly, such a matrix is invertible, completing the proof of Proposition 2.4. $\square$

Definition 2.5. Let $\varphi$ be the automorphism of $R\langle\langle A\rangle\rangle$ corresponding to a pair $(\varphi^{(1)}, \varphi^{(2)})$ as in Proposition 2.4. If $\varphi^{(2)} = 0$, then we call $\varphi$ a change of arrows. If $\varphi^{(1)}$ is the identity automorphism of $A$, we say that $\varphi$ is a unitriangular automorphism; furthermore, we say that $\varphi$ is of depth $d \geq 1$, if $\varphi^{(2)}(A) \subset \mathfrak{m}^{d+1}$.

The following property of unitriangular automorphisms is immediate from the definitions:
\[ \text{If } \varphi \text{ is a unitriangular automorphism of } R\langle\langle A\rangle\rangle \text{ of depth } d, \text{ then } \varphi(u) - u \in \mathfrak{m}^{n+d} \text{ for } u \in \mathfrak{m}^n. \tag{2.4} \]

3. Potentials and their Jacobian ideals

In this section we introduce some of our main objects of study: potentials and their Jacobian ideals in the complete path algebra $R\langle\langle A\rangle\rangle$ given by Definition 2.2.

We fix a path basis in $R\langle A\rangle$; recall that it consists of the elements $e_i \in R = A^0$ together with the products $a_1 \cdots a_d$ (paths) such that all $a_k$ belong to $Q_1$, and $t(a_k) = h(a_{k+1})$ for $1 \leq k < d$. Then each space $A^d$ has a direct $R$-bimodule decomposition $A^d = \bigoplus_{i,j \in Q_0} A^d_{i,j}$, where the component $A^d_{i,j}$ is spanned by the paths $a_1 \cdots a_d$ with $h(a_1) = i$ and $t(a_d) = j$.

Definition 3.1.
• For each $d \geq 1$, we define the cyclic part of $A^d$ as the sub-$R$-bimodule $A^d_{\mathrm{cyc}} = \bigoplus_{i \in Q_0} A^d_{i,i}$. Thus, $A^d_{\mathrm{cyc}}$ is the span of all paths $a_1 \cdots a_d$ with $h(a_1) = t(a_d)$; we call such paths cyclic.
• We define a closed vector subspace $R\langle\langle A\rangle\rangle_{\mathrm{cyc}} \subseteq R\langle\langle A\rangle\rangle$ by setting $R\langle\langle A\rangle\rangle_{\mathrm{cyc}} = \prod_{d=1}^{\infty} A^d_{\mathrm{cyc}}$, and call the elements of $R\langle\langle A\rangle\rangle_{\mathrm{cyc}}$ potentials.
• For every $\xi \in A^\star$, we define the cyclic derivative $\partial_\xi$ as the continuous $K$-linear map $R\langle\langle A\rangle\rangle_{\mathrm{cyc}} \to R\langle\langle A\rangle\rangle$ acting on paths by
\[ \partial_\xi(a_1 \cdots a_d) = \sum_{k=1}^{d} \xi(a_k)\, a_{k+1} \cdots a_d a_1 \cdots a_{k-1}. \tag{3.1} \]
• For every potential $S$, we define its Jacobian ideal $J(S)$ as the closure of the (two-sided) ideal in $R\langle\langle A\rangle\rangle$ generated by the elements $\partial_\xi(S)$ for all $\xi \in A^\star$ (see (2.3)); clearly, $J(S)$ is a two-sided ideal in $R\langle\langle A\rangle\rangle$.
• We call the quotient $R\langle\langle A\rangle\rangle / J(S)$ the Jacobian algebra of $S$, and denote it by $\mathcal{P}(Q, S)$ or $\mathcal{P}(A, S)$.

An easy check shows that a cyclic derivative $\partial_\xi : R\langle\langle A\rangle\rangle_{\mathrm{cyc}} \to R\langle\langle A\rangle\rangle$ does not depend on the choice of a path basis. Furthermore, cyclic derivatives do not distinguish between the potentials that are equivalent in the following sense.

Definition 3.2. Two potentials $S$ and $S'$ are cyclically equivalent if $S - S'$ lies in the closure of the span of all elements of the form $a_1 \cdots a_d - a_2 \cdots a_d a_1$, where $a_1 \cdots a_d$ is a cyclic path.

The following proposition is immediate from (3.1).

Proposition 3.3. If two potentials $S$ and $S'$ are cyclically equivalent, then $\partial_\xi(S) = \partial_\xi(S')$ for all $\xi \in A^\star$, hence $J(S) = J(S')$ and $\mathcal{P}(A, S) = \mathcal{P}(A, S')$.

It is easy to see that the definition of cyclical equivalence does not depend on the choice of a path basis. In fact, it can be given in more invariant terms as follows.

Definition 3.4. For any topological $K$-algebra $U$, its trace space $\mathrm{Tr}(U)$ is defined as $\mathrm{Tr}(U) = U / \{U, U\}$, where $\{U, U\}$ is the closure of the vector subspace in $U$ spanned by all commutators. We denote by $\pi = \pi_U : U \to \mathrm{Tr}(U)$ the canonical projection.

The following proposition is a direct consequence of the definitions.

Proposition 3.5. Two potentials $S$ and $S'$ are cyclically equivalent if and only if $\pi_{R\langle\langle A\rangle\rangle}(S) = \pi_{R\langle\langle A\rangle\rangle}(S')$. Thus, the Jacobian ideal and the Jacobian algebra of a potential $S$ depend only on the image of $S$ in $\mathrm{Tr}(R\langle\langle A\rangle\rangle)$.

Recall that we identify the set of arrows $Q_1$ with a $K$-basis in $A = A^1$.
For $a \in Q_1$, we will use the notation $\partial_a$ for the cyclic derivative $\partial_{a^\star}$, where $Q_1^\star = \{a^\star \mid a \in Q_1\}$ is the dual basis of $Q_1$ in $A^\star$.

Example 3.6. Consider the quiver $Q = (Q_0, Q_1)$ with $Q_0 = \{1, 2\}$ and $Q_1 = \{a, b\}$, where $a : 1 \to 2$ and $b : 2 \to 1$. The vertex and arrow spans of $Q$ are given by $R = K^{Q_0} = Ke_1 \oplus Ke_2$, and $A = K^{Q_1} = Ka \oplus Kb$. The paths in $R\langle\langle A\rangle\rangle$ are $e_1$, $e_2$ and all products of the generators $a$ and $b$ in which the factors $a$ and $b$ alternate. The potentials are (possibly infinite) linear combinations of the elements $(ab)^n$ and $(ba)^n$ for all $n \geq 1$. Using (3.1), we obtain
\[ \partial_a((ab)^n) = \partial_a((ba)^n) = n\, b(ab)^{n-1}, \qquad \partial_b((ab)^n) = \partial_b((ba)^n) = n\, a(ba)^{n-1} \qquad (n \geq 1). \]
Up to cyclical equivalence, every potential can be written in the form
\[ S = \sum_{n=1}^{\infty} \alpha_n (ab)^n \qquad (\alpha_n \in K). \]

Returning to the general theory, it is clear that every algebra homomorphism $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ such that $\varphi|_R = \mathrm{id}$ sends potentials to potentials.

Proposition 3.7. Every algebra isomorphism $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ such that $\varphi|_R = \mathrm{id}$ sends $J(S)$ onto $J(\varphi(S))$, inducing an isomorphism of Jacobian algebras $\mathcal{P}(A, S) \to \mathcal{P}(A', \varphi(S))$.

Proof. We start by developing some "differential calculus" for cyclic derivatives. We need a few pieces of notation. We set
\[ R\langle\langle A\rangle\rangle \,\widehat{\otimes}\, R\langle\langle A\rangle\rangle = \prod_{d,e \geq 0} (A^d \otimes A^e) \]
(the tensor product on the right is over the base field $K$), and view this space as a topological vector space with a basic system of open neighborhoods of $0$ formed by the sets $\prod_{d+e \geq n} (A^d \otimes A^e)$ for all $n \geq 0$; thus, $R\langle A\rangle \otimes R\langle A\rangle$ is dense in $R\langle\langle A\rangle\rangle \,\widehat{\otimes}\, R\langle\langle A\rangle\rangle$. Now, for every $\xi \in A^\star$, we define a continuous $K$-linear map $\Delta_\xi : R\langle\langle A\rangle\rangle \to R\langle\langle A\rangle\rangle \,\widehat{\otimes}\, R\langle\langle A\rangle\rangle$ by setting $\Delta_\xi(e) = 0$ for $e \in R = A^0$, and
\[ \Delta_\xi(a_1 \cdots a_d) = \sum_{k=1}^{d} \xi(a_k)\, a_1 \cdots a_{k-1} \otimes a_{k+1} \cdots a_d \tag{3.2} \]
for any path $a_1 \cdots a_d$ of length $d \geq 1$. Note that $\Delta_\xi$ does not depend on the choice of a path basis. We will use the same convention as for cyclic derivatives: for $a \in Q_1$, we write $\Delta_a$ instead of $\Delta_{a^\star}$.
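The cyclic derivative (3.1) is easy to experiment with on the quiver of Example 3.6. The sketch below assumes a purely illustrative encoding that is not part of the paper: a potential is a dictionary mapping cyclic paths (tuples of arrow names) to coefficients, and `cyclic_derivative` and `power` are hypothetical helper names. It computes $\partial_a$ and $\partial_b$ for an arrow of $Q_1$, and can be used to check the formulas of Example 3.6 for small $n$.

```python
from collections import Counter


def cyclic_derivative(potential, arrow):
    """Cyclic derivative of a potential with respect to an arrow, as in (3.1).

    `potential` maps cyclic paths (tuples of arrow names) to coefficients.
    Each occurrence a_k == arrow contributes the rotated path
    a_{k+1} ... a_d a_1 ... a_{k-1}.
    """
    result = Counter()
    for path, coeff in potential.items():
        for k, a_k in enumerate(path):
            if a_k == arrow:
                result[path[k + 1:] + path[:k]] += coeff
    # Drop terms whose coefficients cancelled.
    return {p: c for p, c in result.items() if c != 0}


def power(word, n):
    """The path word^n, e.g. power(("a", "b"), 2) is abab."""
    return tuple(word) * n


S = {power(("a", "b"), 3): 1}      # the potential (ab)^3
print(cyclic_derivative(S, "a"))   # 3 * b(ab)^2
print(cyclic_derivative(S, "b"))   # 3 * a(ba)^2
```

In line with Proposition 3.3, applying `cyclic_derivative` to the difference $(ab)^2 - (ba)^2$ of two cyclically equivalent potentials returns the empty dictionary, i.e. the zero element.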
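For instance, in the situation of Example 3.6, we have
\[ \Delta_a((ab)^n) = \sum_{k=1}^{n} (ab)^{k-1} \otimes b(ab)^{n-k}, \qquad \Delta_b((ab)^n) = \sum_{k=1}^{n} (ab)^{k-1} a \otimes (ab)^{n-k}. \]

Next, we denote by $(f, g) \mapsto f \,\square\, g$ a continuous $K$-bilinear map $(R\langle\langle A\rangle\rangle \,\widehat{\otimes}\, R\langle\langle A\rangle\rangle) \times R\langle\langle A\rangle\rangle \to R\langle\langle A\rangle\rangle$ given by
\[ (u \otimes v) \,\square\, g = v g u \tag{3.3} \]
for $u, v \in R\langle A\rangle$. We are now ready to state the Leibniz rule.

Lemma 3.8 (Cyclic Leibniz rule). Let $f \in R\langle\langle A\rangle\rangle_{i,j}$ and $g \in R\langle\langle A\rangle\rangle_{j,i}$ for some vertices $i$ and $j$. Then for every $\xi \in A^\star$, we have
\[ \partial_\xi(fg) = \Delta_\xi(f) \,\square\, g + \Delta_\xi(g) \,\square\, f. \tag{3.4} \]
More generally, for any finite sequence of vertices $i_1, \dots, i_d, i_{d+1} = i_1$ and for any $f_1, \dots, f_d$ such that $f_k \in R\langle\langle A\rangle\rangle_{i_k, i_{k+1}}$, we have
\[ \partial_\xi(f_1 \cdots f_d) = \sum_{k=1}^{d} \Delta_\xi(f_k) \,\square\, (f_{k+1} \cdots f_d f_1 \cdots f_{k-1}). \tag{3.5} \]

Proof. It is enough to check (3.4) in the case where $f = a_1 \cdots a_d$ and $g = a_{d+1} \cdots a_{d+s}$ are two paths such that $t(a_d) = h(a_{d+1})$ and $t(a_{d+s}) = h(a_1)$. Using (3.1), we obtain
\[ \partial_\xi(fg) = \sum_{k=1}^{d+s} \xi(a_k)\, a_{k+1} \cdots a_{d+s} a_1 \cdots a_{k-1}. \]
Comparing this expression with (3.2) and (3.3), we see that the part of the last sum where $k$ runs from $1$ to $d$ (resp. from $d+1$ to $d+s$) is equal to $\Delta_\xi(f) \,\square\, g$ (resp. to $\Delta_\xi(g) \,\square\, f$), proving (3.4). The identity (3.5) follows from (3.4) by induction on $d$. $\square$

Lemma 3.9 (Cyclic chain rule). Suppose that $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ is an algebra homomorphism as in Proposition 2.4. Then, for every potential $S \in R\langle\langle A\rangle\rangle_{\mathrm{cyc}}$ and $\xi \in A'^\star$, we have:
\[ \partial_\xi(\varphi(S)) = \sum_{a \in Q_1} \Delta_\xi(\varphi(a)) \,\square\, \varphi(\partial_a(S)). \tag{3.6} \]

Proof. It suffices to treat the case where $S = a_1 \cdots a_d$ is a cyclic path. Applying (3.5) and (3.1), we obtain
\[ \partial_\xi(\varphi(S)) = \sum_{k=1}^{d} \Delta_\xi(\varphi(a_k)) \,\square\, \varphi(a_{k+1} \cdots a_d a_1 \cdots a_{k-1}) = \sum_{a \in Q_1} \Delta_\xi(\varphi(a)) \,\square\, \varphi\Bigl(\sum_{k :\, a_k = a} a_{k+1} \cdots a_d a_1 \cdots a_{k-1}\Bigr) = \sum_{a \in Q_1} \Delta_\xi(\varphi(a)) \,\square\, \varphi(\partial_a(S)), \]
as desired. $\square$

Now we are ready to prove Proposition 3.7. By Lemma 3.9, for every $\xi \in A'^\star$, the element $\partial_\xi(\varphi(S))$ lies in the ideal generated by the elements $\varphi(\partial_a(S))$ for $a \in Q_1$, hence, it lies in $\varphi(J(S))$. Thus, we have the inclusion $J(\varphi(S)) \subseteq \varphi(J(S))$.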
We can also apply this to the inverse isomorphism $\varphi^{-1}$ and the potential $\varphi(S)$:
\[ J(S) = J(\varphi^{-1}(\varphi(S))) \subseteq \varphi^{-1}(J(\varphi(S))). \]
Applying $\varphi$ to both sides yields $\varphi(J(S)) \subseteq J(\varphi(S))$, completing the proof. $\square$

4. Quivers with potentials

We now introduce our main objects of study.

Definition 4.1. Suppose $Q$ is a quiver with the arrow span $A$, and $S \in R\langle\langle A\rangle\rangle_{\mathrm{cyc}}$ is a potential. We say that a pair $(Q, S)$ (or $(A, S)$) is a quiver with potential (QP for short) if it satisfies the following two conditions:
(4.1) The quiver $Q$ has no loops, i.e., $A_{i,i} = 0$ for all $i \in Q_0$.
(4.2) No two cyclically equivalent cyclic paths appear in the decomposition of $S$.

In view of (4.1), every potential $S$ belongs to $\mathfrak{m}(A)^2$; and condition (4.2) excludes, for instance, any non-zero potential $S$ cyclically equivalent to $0$.

Definition 4.2. Let $(A, S)$ and $(A', S')$ be QPs on the same vertex set $Q_0$. By a right-equivalence between $(A, S)$ and $(A', S')$ we mean an algebra isomorphism $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ such that $\varphi|_R = \mathrm{id}$, and $\varphi(S)$ is cyclically equivalent to $S'$ (see Definition 3.2).

In view of Proposition 3.5, any algebra homomorphism $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ such that $\varphi|_R = \mathrm{id}$ sends cyclically equivalent potentials to cyclically equivalent ones. It follows that right-equivalences of QPs have the expected properties: the composition of two right-equivalences, as well as the inverse of a right-equivalence, is again a right-equivalence. Note also that an isomorphism $\varphi : R\langle\langle A\rangle\rangle \to R\langle\langle A'\rangle\rangle$ induces an isomorphism of $R$-bimodules $A$ and $A'$ (cf. Proposition 2.4), so in dealing with right-equivalent QPs we can assume without loss of generality that $A = A'$. In view of Propositions 3.3 and 3.7, any right-equivalence of QPs $(A, S) \cong (A', S')$ induces an isomorphism of the Jacobian ideals $J(S) \cong J(S')$ and of the Jacobian algebras $\mathcal{P}(A, S) \cong \mathcal{P}(A', S')$.
For every two QPs $(A, S)$ and $(A', S')$ (on the same set of vertices $Q_0$), we can form their direct sum $(A, S) \oplus (A', S') = (A \oplus A', S + S')$; it is well-defined since both complete path algebras $R\langle\langle A\rangle\rangle$ and $R\langle\langle A'\rangle\rangle$ have canonical embeddings into $R\langle\langle A \oplus A'\rangle\rangle$ as closed $R$-subalgebras.

We start our analysis of QPs with the case $S \in A^2$. In this case, $J(S)$ is the closure of the ideal generated by the subspace
\[ \partial S = \{\partial_\xi(S) \mid \xi \in A^\star\} \subseteq A. \tag{4.3} \]

Definition 4.3. We say that a QP $(A, S)$ is trivial if $S \in A^2$, and $\partial S = A$, or, equivalently, $\mathcal{P}(A, S) = R$.

The following description of trivial QPs is seen by standard linear algebra.

Proposition 4.4. A QP $(A, S)$ with $S \in A^2$ is trivial if and only if the set of arrows $Q_1$ consists of $2N$ distinct arrows $a_1, b_1, \dots, a_N, b_N$ such that each $a_k b_k$ is a cyclic 2-path, and there is a change of arrows $\varphi$ (see Definition 2.5) such that $\varphi(S)$ is cyclically equivalent to $a_1 b_1 + \cdots + a_N b_N$.

Returning to general QPs, we now show that taking direct sums with trivial ones does not affect the Jacobian algebra.

Proposition 4.5. If $(A, S)$ is an arbitrary QP, and $(C, T)$ is a trivial one, then the canonical embedding $R\langle\langle A\rangle\rangle \to R\langle\langle A \oplus C\rangle\rangle$ induces an isomorphism of Jacobian algebras $\mathcal{P}(A, S) \to \mathcal{P}(A \oplus C, S + T)$.

Proof. Let $L$ denote the closure of the two-sided ideal in $R\langle\langle A \oplus C\rangle\rangle$ generated by $C$; thus, $L$ is the set of all (possibly infinite) linear combinations of paths, each of which contains at least one arrow from $C$. The definitions readily imply that
\[ R\langle\langle A \oplus C\rangle\rangle = R\langle\langle A\rangle\rangle \oplus L, \qquad J(S + T) = J(S) \oplus L \]
(in the last equality, $J(S)$ is understood as the Jacobian ideal of $S$ in $R\langle\langle A\rangle\rangle$). Therefore,
\[ \mathcal{P}(A \oplus C, S + T) = R\langle\langle A \oplus C\rangle\rangle / J(S + T) = (R\langle\langle A\rangle\rangle \oplus L)/(J(S) \oplus L) \cong R\langle\langle A\rangle\rangle / J(S) = \mathcal{P}(A, S), \]
as desired. $\square$

For an arbitrary QP $(A, S)$, we denote by $S^{(2)} \in A^2$ the degree 2 homogeneous component of $S$. We call $(A, S)$ reduced if $S^{(2)} = 0$, i.e., $S \in \mathfrak{m}(A)^3$.
We define the trivial and reduced arrow spans of $(A, S)$ as the finite-dimensional $R$-bimodules given by
\[ A_{\mathrm{triv}} = A_{\mathrm{triv}}(S) = \partial S^{(2)}, \qquad A_{\mathrm{red}} = A_{\mathrm{red}}(S) = A / \partial S^{(2)} \tag{4.4} \]
(see (4.3)). The following statement will play a crucial role in later sections.

Theorem 4.6 (Splitting Theorem). For every QP $(A, S)$ with the trivial arrow span $A_{\mathrm{triv}}$ and the reduced arrow span $A_{\mathrm{red}}$, there exist a trivial QP $(A_{\mathrm{triv}}, S_{\mathrm{triv}})$ and a reduced QP $(A_{\mathrm{red}}, S_{\mathrm{red}})$ such that $(A, S)$ is right-equivalent to the direct sum $(A_{\mathrm{triv}}, S_{\mathrm{triv}}) \oplus (A_{\mathrm{red}}, S_{\mathrm{red}})$. Furthermore, the right-equivalence class of each of the QPs $(A_{\mathrm{triv}}, S_{\mathrm{triv}})$ and $(A_{\mathrm{red}}, S_{\mathrm{red}})$ is determined by the right-equivalence class of $(A, S)$.

Let us first prove the existence of a desired right-equivalence
\[ (A, S) \cong (A_{\mathrm{triv}}, S_{\mathrm{triv}}) \oplus (A_{\mathrm{red}}, S_{\mathrm{red}}). \tag{4.5} \]
There is nothing to prove if $(A, S)$ is reduced, so let us assume that $S^{(2)} \neq 0$. Using Proposition 4.4 and replacing $S$ if necessary by a cyclically equivalent potential, we can assume that $S$ is of the form
\[ S = \sum_{k=1}^{N} (a_k b_k + a_k u_k + v_k b_k) + S', \tag{4.6} \]
where each $a_k b_k$ is a cyclic 2-path, the arrows $a_1, b_1, \dots, a_N, b_N$ form a basis of $A_{\mathrm{triv}}$, the elements $u_k$ and $v_k$ belong to $\mathfrak{m}^2$, and the potential $S' \in \mathfrak{m}^3$ is a linear combination of cyclic paths containing none of the arrows $a_k$ or $b_k$. The existence of a right-equivalence (4.5) becomes a consequence of the following lemma.

Lemma 4.7. For every potential $S$ of the form (4.6), there exists a unitriangular automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ such that $\varphi(S)$ is cyclically equivalent to a potential of the form (4.6) with $u_k = v_k = 0$ for all $k$.

We say that a potential $S$ is $d$-split if it is of the form (4.6) with $u_k, v_k \in \mathfrak{m}^{d+1}$ for all $k$. To prove Lemma 4.7, we first show the following.

Lemma 4.8. Suppose a potential $S$ is $d$-split for some $d \geq 1$. There exists a unitriangular automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ having depth $d$ and such that $\varphi(S)$ is cyclically equivalent to a $2d$-split potential $\tilde{S}$ with $\varphi(S) - \tilde{S} \in \mathfrak{m}^{2d+2}$.

Proof.
Let us write $S$ in the form (4.6) with $u_k, v_k \in \mathfrak{m}^{d+1}$. Let $\varphi$ be the unitriangular automorphism of $R\langle\langle A\rangle\rangle$ acting on arrows as follows:
\[ \varphi(a_k) = a_k - v_k, \qquad \varphi(b_k) = b_k - u_k, \qquad \varphi(c) = c \quad (c \in Q_1 - \{a_1, b_1, \dots, a_N, b_N\}). \]
Then $\varphi$ is of depth $d$, so by (2.4), for each $k$, we have
\[ \varphi(u_k) = u_k + u'_k, \qquad \varphi(v_k) = v_k + v'_k \qquad (u'_k, v'_k \in \mathfrak{m}^{2d+1}). \]
Therefore, we obtain
\[ \varphi(S) = \sum_{k} \bigl((a_k - v_k)(b_k - u_k) + (a_k - v_k)(u_k + u'_k) + (v_k + v'_k)(b_k - u_k)\bigr) + S' = \sum_{k} (a_k b_k + a_k u'_k + v'_k b_k) + S_1 + S', \]
where
\[ S_1 = -\sum_{k} (v_k u_k + v_k u'_k + v'_k u_k) \in \mathfrak{m}^{2d+2}. \]
In view of Definition 3.2, $S_1$ is cyclically equivalent to a potential of the form $\sum_k (a_k u''_k + v''_k b_k) + S''$, where $u''_k, v''_k \in \mathfrak{m}^{2d+1}$, and $S''$ is a linear combination of cyclic paths containing none of the $a_k$ or $b_k$. Furthermore, we have
\[ S_1 - S'' - \sum_{k} (a_k u''_k + v''_k b_k) \in \mathfrak{m}^{2d+2}. \]
We see that the desired potential $\tilde{S}$ can be chosen as
\[ \tilde{S} = \sum_{k} \bigl(a_k b_k + a_k (u'_k + u''_k) + (v'_k + v''_k) b_k\bigr) + S' + S'', \]
completing the proof of Lemma 4.8. $\square$

Proof of Lemma 4.7. Starting with a potential $S$ of the form (4.6) and using repeatedly Lemma 4.8, we construct a sequence of potentials $S_1, S_2, \dots$, and a sequence of unitriangular automorphisms $\varphi_1, \varphi_2, \dots$, with the following properties:
(1) $S_1 = S$.
(2) $S_d$ is $2^{d-1}$-split.
(3) $\varphi_d$ is of depth $2^{d-1}$.
(4) $\varphi_d(S_d)$ is cyclically equivalent to $S_{d+1}$, and $\varphi_d(S_d) - S_{d+1} \in \mathfrak{m}^{2^d + 2}$.

By property (3), setting
\[ \varphi = \lim_{n \to \infty} \varphi_n \varphi_{n-1} \cdots \varphi_1, \tag{4.7} \]
we obtain a well defined unitriangular automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$; indeed, in view of (2.4), for any $u \in R\langle\langle A\rangle\rangle$, if we write $\varphi_n \varphi_{n-1} \cdots \varphi_1(u)$ as $\sum_{d=0}^{\infty} u_n^{(d)}$ with $u_n^{(d)} \in A^d$, then each homogeneous component $u_n^{(d)}$ stabilizes as $n \to \infty$. We claim that this automorphism $\varphi$ satisfies the required properties in Lemma 4.7. To see this, for $d \geq 1$, denote $C_d = \varphi_d(S_d) - S_{d+1}$. By (4), $C_d \in \{R\langle\langle A\rangle\rangle, R\langle\langle A\rangle\rangle\} \cap \mathfrak{m}^{2^d + 2}$ (recall from Definition 3.4 that $\{R\langle\langle A\rangle\rangle, R\langle\langle A\rangle\rangle\}$ denotes the closure of the vector subspace in $R\langle\langle A\rangle\rangle$ spanned by all commutators).
Using (1), it is easy to see that
\[ \varphi_n \varphi_{n-1} \cdots \varphi_1(S) = S_{n+1} + \sum_{d=1}^{n} \varphi_n \varphi_{n-1} \cdots \varphi_{d+1}(C_d) \]
for every $n \geq 1$; passing to the limit as $n \to \infty$ yields
\[ \varphi(S) = \lim_{n \to \infty} S_n + \varphi\Bigl(\sum_{d=1}^{\infty} (\varphi_d \cdots \varphi_1)^{-1}(C_d)\Bigr) \]
(the convergence of the series on the right is clear since any automorphism of $R\langle\langle A\rangle\rangle$ preserves the powers of $\mathfrak{m}$). We conclude that $\varphi(S)$ is cyclically equivalent to $\lim_{n \to \infty} S_n$. In view of (2), the latter element is of the form (4.6) with $u_k = v_k = 0$ for all $k$. This completes the proofs of Lemma 4.7 and of the existence of a right-equivalence (4.5). $\square$

The above argument makes it clear that the right-equivalence class of $(A_{\mathrm{triv}}, S_{\mathrm{triv}})$ is determined by the right-equivalence class of $(A, S)$. To prove Theorem 4.6, it remains to show that the same is true for $(A_{\mathrm{red}}, S_{\mathrm{red}})$. Changing notation a little bit, we need to prove the following.

Proposition 4.9. Let $(A, S)$ and $(A, S')$ be reduced QPs, and $(C, T)$ a trivial QP. If $(A \oplus C, S + T)$ is right-equivalent to $(A \oplus C, S' + T)$ then $(A, S)$ is right-equivalent to $(A, S')$.

We deduce Proposition 4.9 from the following result of independent interest.

Proposition 4.10. Let $(A, S)$ and $(A, S')$ be reduced QPs such that $S' - S \in J(S)^2$. Then we have:
(1) $J(S') = J(S)$.
(2) $(A, S)$ is right-equivalent to $(A, S')$. More precisely, there exists an algebra automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ such that $\varphi(S)$ is cyclically equivalent to $S'$, and $\varphi(u) - u \in J(S)$ for all $u \in R\langle\langle A\rangle\rangle$.

Proof. (1) Since $(A, S)$ is reduced, we have $J(S) \subseteq \mathfrak{m}^2$. As an easy consequence of the cyclic Leibniz rule (3.4), we see that
\[ \partial_\xi(J(S)^2)_{\mathrm{cyc}} \subseteq \mathfrak{m} J(S) + J(S) \mathfrak{m} \]
for any $\xi \in A^\star$. It follows that
\[ \partial_\xi S' - \partial_\xi S \in \mathfrak{m} J(S) + J(S) \mathfrak{m}, \tag{4.8} \]
implying that $J(S') \subseteq J(S)$. To show the reverse inclusion, note that (4.8) also implies that
\[ J(S) \subseteq J(S') + (\mathfrak{m} J(S) + J(S) \mathfrak{m}). \]
Applying the same inclusion to each of the terms $J(S)$ on the right, we obtain
\[ J(S) \subseteq J(S') + (\mathfrak{m}^2 J(S) + \mathfrak{m} J(S) \mathfrak{m} + J(S) \mathfrak{m}^2). \]
Continuing in the same way, we get
\[ J(S) \subseteq J(S') + \sum_{k=0}^{n} \mathfrak{m}^k J(S) \mathfrak{m}^{n-k} \subseteq J(S') + \mathfrak{m}^{n+2} \]
for any $n \geq 1$.
Remembering the definition of topology in $R\langle\langle A\rangle\rangle$ (see (2.3)) and the fact that $J(S')$ is closed, we conclude that $J(S) \subseteq J(S')$, finishing the proof of part (1) of Proposition 4.10.

(2) Let $Q_1 = \{a_1, \dots, a_N\}$ be the set of arrows (that is, a basis of $A$). Then a unitriangular automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ is specified by an $N$-tuple of elements $b_1, \dots, b_N \in \mathfrak{m}^2$ such that
\[ \varphi(a_k) = a_k + b_k \qquad (k = 1, \dots, N). \tag{4.9} \]

Lemma 4.11. Let $(A, S)$ be a reduced QP, and let $\varphi$ be a unitriangular automorphism of $R\langle\langle A\rangle\rangle$ given by (4.9). Then the potential $\varphi(S) - S - \sum_{k=1}^{N} b_k \partial_{a_k} S$ is cyclically equivalent to an element of $\mathfrak{m} I^2$, where $I$ is the closure of the ideal in $R\langle\langle A\rangle\rangle$ generated by $b_1, \dots, b_N$.

Proof. First consider the case where $S = a_{k_1} \cdots a_{k_d}$ is a cyclic path of length $d \geq 3$. Then $\varphi(S) = (a_{k_1} + b_{k_1}) \cdots (a_{k_d} + b_{k_d})$. Expanding this product, we see that the term that contains no factors $b_{k_j}$ is equal to $S$, while the sum of the terms that contain exactly one factor $b_{k_j}$ is easily seen to be cyclically equivalent to $\sum_{k=1}^{N} b_k \partial_{a_k} S$ (cf. (3.1)), and the rest of the terms are cyclically equivalent to elements of $\sum_{k=1}^{N} \mathfrak{m}(\mathfrak{m}^{d-1} \cap I) b_k$. Writing a general potential $S \in \mathfrak{m}^3$ as a linear combination of cyclic paths, we see that $\varphi(S) - S - \sum_{k=1}^{N} b_k \partial_{a_k} S$ is cyclically equivalent to $\sum_{k=1}^{N} c_k b_k$, where each $c_k$ is of the form
\[ c_k = \sum_{d=3}^{\infty} \sum_{\ell=1}^{N} a_\ell\, c^{(d)}_{k\ell} \]
with $c^{(d)}_{k\ell} \in \mathfrak{m}^{d-1} \cap I$. Since $I$ is closed, each $c_k$ is a well-defined element of $\mathfrak{m} I$, implying the assertion of Lemma 4.11. $\square$

We will also need one more lemma whose proof will be given in Section 13.

Lemma 4.12. Let $I$ be a closed ideal of $R\langle\langle A\rangle\rangle$, and $J$ be the closure of an ideal generated by finitely many elements $f_1, f_2, \dots, f_N$, which are bi-homogeneous with respect to the vertex bigrading. Then every potential belonging to the ideal $IJ$ is cyclically equivalent to an element of the form $\sum_{k=1}^{N} b_k f_k$, where all $b_k$ belong to $I$.

To prove part (2) of Proposition 4.10, we construct a sequence of $N$-tuples (b1n, . . . 
, bNn) (n ≥ 1) of elements of $\mathfrak{m}^2$ and the corresponding unitriangular automorphisms $\varphi_n$ of $R\langle\langle A\rangle\rangle$ (so that $\varphi_n(a_k) = a_k + b_{kn}$ for $k = 1, \dots, N$) such that, for all $n \geq 1$, we have
(1) $b_{kn} \in \mathfrak{m}^{n+1} \cap J(S)$ for $k = 1, \dots, N$.
(2) $S'$ is cyclically equivalent to $\varphi_0 \varphi_1 \cdots \varphi_{n-1}\bigl(S + \sum_{k=1}^{N} b_{kn} \partial_{a_k} S\bigr)$ (with the convention that $\varphi_0$ is the identity automorphism).

We proceed by induction on $n$. In the basic case $n = 1$, the existence of an $N$-tuple $(b_{11}, \dots, b_{N1})$ with desired properties follows from Lemma 4.12 applied to $I = J = J(S)$ and $f_k = \partial_{a_k} S$ (note that $J(S) \subseteq \mathfrak{m}^2$, since $(A, S)$ is assumed to be reduced). Now assume that, for some $n \geq 1$, we have already defined the elements $b_{k\ell}$ for $k = 1, \dots, N$ and $\ell = 1, \dots, n$, satisfying (1) and (2). Applying Lemma 4.11 to $b_k = b_{kn}$ (so that $\varphi = \varphi_n$), we obtain that $\varphi_n(S) - \bigl(S + \sum_{k=1}^{N} b_{kn} \partial_{a_k} S\bigr)$ is cyclically equivalent to an element of $\mathfrak{m}(\mathfrak{m}^{n+1} \cap J(S))^2$. We have
\[ \mathfrak{m}(\mathfrak{m}^{n+1} \cap J(S))^2 \subseteq (\mathfrak{m}^{n+2} \cap J(S)) J(S). \]
This implies in particular that $\varphi_n(S) - S$ is cyclically equivalent to an element of $J(S)^2$. Combining Proposition 3.7 with the already proved part (1) of Proposition 4.10, we conclude that $\varphi_n(J(S)) = J(\varphi_n(S)) = J(S)$. It follows that $\varphi_n(S) - \bigl(S + \sum_{k=1}^{N} b_{kn} \partial_{a_k} S\bigr)$ is cyclically equivalent to an element of $\varphi_n\bigl((\mathfrak{m}^{n+2} \cap J(S)) J(S)\bigr)$. Applying Lemma 4.12 to $I = \mathfrak{m}^{n+2} \cap J(S)$, $J = J(S)$ and $f_k = \partial_{a_k} S$, we see that every potential in $(\mathfrak{m}^{n+2} \cap J(S)) J(S)$ is cyclically equivalent to a potential of the form $\sum_{k=1}^{N} b_{k,n+1} \partial_{a_k} S$ for some $b_{k,n+1} \in \mathfrak{m}^{n+2} \cap J(S)$. It follows that $S + \sum_{k=1}^{N} b_{kn} \partial_{a_k} S$ is cyclically equivalent to $\varphi_n\bigl(S + \sum_{k=1}^{N} b_{k,n+1} \partial_{a_k} S\bigr)$. Thus, conditions (1) and (2) get satisfied with $n$ replaced by $n + 1$, completing our inductive step.

In view of condition (1), $\lim_{n \to \infty} \varphi_1 \cdots \varphi_n$ is a well-defined automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ such that $\varphi(u) - u \in J(S)$ for all $u \in R\langle\langle A\rangle\rangle$. Passing to the limit $n \to \infty$ in condition (2), we conclude that $S'$ is cyclically equivalent to $\varphi(S)$, completing the proof of part (2) of Proposition 4.10. $\square$

Proof of Proposition 4.9.
We abbreviate $J = J(S)$ and $J' = J(S')$ (understood as the Jacobian ideals of $S$ and $S'$ in $R\langle\langle A\rangle\rangle$). As in Proposition 4.5, let $L$ denote the closure of the two-sided ideal in $R\langle\langle A \oplus C\rangle\rangle$ generated by $C$. Then we have
\[ R\langle\langle A \oplus C\rangle\rangle = R\langle\langle A\rangle\rangle \oplus L, \qquad J(S + T) = J \oplus L, \qquad J(S' + T) = J' \oplus L. \tag{4.10} \]
Let $\varphi$ be an automorphism of $R\langle\langle A \oplus C\rangle\rangle$ such that $\varphi(S + T)$ is cyclically equivalent to $S' + T$. In view of (4.10) and Proposition 3.7, we have
\[ \varphi(J \oplus L) = J' \oplus L. \tag{4.11} \]
Let $\psi : R\langle\langle A\rangle\rangle \to R\langle\langle A\rangle\rangle$ denote the restriction to $R\langle\langle A\rangle\rangle$ of the composition $p\varphi$, where $p$ is the projection of $R\langle\langle A \oplus C\rangle\rangle$ onto $R\langle\langle A\rangle\rangle$ along $L$. In view of Proposition 4.10, it suffices to show the following:
\[ \psi \text{ is an automorphism of } R\langle\langle A\rangle\rangle \text{ such that } S' - \psi(S) \text{ is cyclically equivalent to an element of } \psi(J^2) \tag{4.12} \]
(indeed, assuming (4.12) and using Proposition 3.7, we see that $\psi(J^2) = J(\psi(S))^2$, hence one can apply Proposition 4.10 to potentials $S'$ and $\psi(S)$).

Clearly, $\psi$ is an algebra homomorphism, so can be represented by a pair $(\psi^{(1)}, \psi^{(2)})$ as in Proposition 2.4. To show that $\psi$ is an automorphism of $R\langle\langle A\rangle\rangle$, it suffices to show that $\psi^{(1)}$ is an $R$-bimodule automorphism of $A$. By the definition, if we write the $R$-bimodule automorphism $\varphi^{(1)}$ of $A \oplus C$ as a matrix
\[ \begin{pmatrix} \varphi_{AA} & \varphi_{AC} \\ \varphi_{CA} & \varphi_{CC} \end{pmatrix}, \]
then $\psi^{(1)} = \varphi_{AA}$. Since
\[ \varphi(C) \subset \varphi(J \oplus L) = J' \oplus L \subseteq \mathfrak{m}(A)^2 \oplus L, \]
it follows that $\varphi_{AC} = 0$, implying that $\psi^{(1)} = \varphi_{AA}$ is an $R$-bimodule automorphism of $A$, and that $\psi$ is an automorphism of $R\langle\langle A\rangle\rangle$.

Since $S' + T$ is cyclically equivalent to $\varphi(S + T)$, the same is true for the potentials obtained from them by applying the projection $p$; it follows that $S' - \psi(S)$ is cyclically equivalent to $p\varphi(T)$. Since $T \in C^2$, the claim that $S' - \psi(S)$ is cyclically equivalent to an element of $\psi(J^2)$ follows from the fact that $p\varphi(L) \subseteq \psi(J)$, or, equivalently, that $\varphi(L) \subseteq \varphi(J) + L$. Applying the inverse automorphism $\varphi^{-1}$ to both sides, it suffices to show that $L \subseteq J + \varphi^{-1}(L)$.
Using the obvious symmetry between $J$ and $J'$, it is enough to show the inclusion $L \subseteq J' + \varphi(L)$.

Let us abbreviate $M = \mathfrak{m}(A \oplus C)$, and $I = J' + \varphi(L)$. Since $\varphi(J) \subseteq J' \oplus L$, and $J \subseteq \mathfrak{m}(A)^2$, it follows that $\varphi(J) \subseteq J' \oplus (L \cap M^2) = J' + ML + LM$. Therefore, we have
\[ L \subseteq J' + L = \varphi(J) + \varphi(L) \subseteq I + ML + LM. \]
Substituting this upper bound for $L$ into its right hand side, we deduce the inclusion
\[ L \subseteq I + M^2 L + MLM + LM^2. \]
Continuing in the same way, for every $n > 0$, we have the inclusion
\[ L \subseteq I + \sum_{k=0}^{n} M^k L M^{n-k} \subseteq I + M^{n+1}. \]
In view of (2.3), it follows that $L$ is contained in $\overline{I}$, the closure of $I$ in $R\langle\langle A \oplus C\rangle\rangle$. However, it is easy to see that $I = J' + \varphi(L)$ is closed in $R\langle\langle A \oplus C\rangle\rangle$ (indeed, the closedness of $I$ is equivalent to that of $\varphi^{-1}(I) = \varphi^{-1}(J') + L$, and so, by symmetry, it is enough to show that $\varphi(J) + L$ is closed; but this is clear since $\varphi(J) + L = p^{-1}(\psi(J))$ is the inverse image of the closed ideal $\psi(J)$ of $R\langle\langle A\rangle\rangle$). This completes the proofs of Proposition 4.9 and Theorem 4.6. $\square$

Definition 4.13. We call the component $(A_{\mathrm{red}}, S_{\mathrm{red}})$ in the decomposition (4.5) the reduced part of a QP $(A, S)$ (by Theorem 4.6, it is determined by $(A, S)$ up to right-equivalence).

Definition 4.14. We call a quiver $Q$ (as well as its arrow span $A$) 2-acyclic if it has no oriented 2-cycles, i.e., satisfies the following condition:
(4.13) For every pair of vertices $i \neq j$, either $A_{i,j} = \{0\}$ or $A_{j,i} = \{0\}$.

In the rest of this section we study the conditions on a QP $(A, S)$ guaranteeing that its reduced part is 2-acyclic. We need some preparation. For a quiver $Q$ with the arrow span $A$, let $\mathcal{C} = \mathcal{C}(A)$ denote the set of cyclic paths on $A$ up to cyclical equivalence. Thus, $\mathcal{C}$ is either empty (if $Q$ has no oriented cycles at all), or countable. The space of potentials up to cyclical equivalence is naturally identified with $K^{\mathcal{C}}$. We say that a $K$-valued function on $K^{\mathcal{C}}$ is polynomial if it depends on finitely many components of a potential $S$ and can be expressed as a polynomial in these components.
For a nonzero polynomial function $F$, we denote by $U(F) \subset K^{\mathcal{C}}$ the set of all potentials $S$ such that $F(S) \neq 0$. By a regular function on $U(F)$ we mean a ratio of two polynomial functions on $K^{\mathcal{C}}$ such that the denominator vanishes nowhere on $U(F)$; in particular, any function of the form $G/F^n$, where $G$ is a polynomial, is regular on $U(F)$. If $A'$ is the arrow span of another quiver $Q'$, we say that a map $K^{\mathcal{C}(A)} \to K^{\mathcal{C}(A')}$ is polynomial if its every component is a polynomial function; similarly, a map $U(F) \to K^{\mathcal{C}(A')}$ is regular if its every component is a regular function on $U(F)$.

Now suppose that the arrow span $A$ satisfies (4.1), and let $\{a_1, b_1, \dots, a_N, b_N\}$ be any maximal collection of distinct arrows in $Q$ such that $b_k a_k$ is a cyclic 2-path for $k = 1, \dots, N$. Then the quiver obtained from $Q$ by removing this collection of arrows is clearly 2-acyclic. To such a collection we associate a nonzero polynomial function on $K^{\mathcal{C}(A)}$ given by
\[ D^{b_1, \dots, b_N}_{a_1, \dots, a_N}(S) = \det(x_{b_q a_p})_{p,q = 1, \dots, N}, \tag{4.14} \]
where $x_{b_q a_p}$ is the sum of the coefficients of $b_q a_p$ and $a_p b_q$ in a potential $S$, with the convention that $x_{b_q a_p} = 0$ unless $b_q a_p$ is a cyclic 2-path.

Proposition 4.15. The reduced part $(A_{\mathrm{red}}, S_{\mathrm{red}})$ of a QP $(A, S)$ is 2-acyclic if and only if $D^{b_1, \dots, b_N}_{a_1, \dots, a_N}(S) \neq 0$ for some collection of arrows as above. Furthermore, if $A'$ is the arrow span of the quiver obtained from $Q$ by removing all arrows $a_1, b_1, \dots, a_N, b_N$, then there exists a regular map $H : U(D^{b_1, \dots, b_N}_{a_1, \dots, a_N}) \to K^{\mathcal{C}(A')}$ such that, for any QP $(A, S)$ with $S \in U(D^{b_1, \dots, b_N}_{a_1, \dots, a_N})$, the reduced part $(A_{\mathrm{red}}, S_{\mathrm{red}})$ is right-equivalent to $(A', H(S))$.

The proof of Proposition 4.15 follows by tracing the construction of $(A_{\mathrm{red}}, S_{\mathrm{red}})$ given in the proof of Lemma 4.7. Note that we use the following convention. If $A$ is 2-acyclic from the start then the only collection {a1, b1, . . . 
, aN , bN} as above is the empty set; in this case, the function $D^{b_1,\dots,b_N}_{a_1,\dots,a_N}$ is understood to be equal to 1, and $H$ is just the identity mapping.

5. Mutations of quivers with potentials

Let $(A, S)$ be a QP. Suppose that a vertex $k \in Q_0$ does not belong to an oriented 2-cycle. In other words, $k$ satisfies the following condition:
(5.1) For every vertex $i$, either $A_{i,k}$ or $A_{k,i}$ is zero.
Replacing $S$ if necessary with a cyclically equivalent potential, we can also assume
(5.2) No cyclic path occurring in the expansion of $S$ starts (and ends) at $k$.
Under these conditions, we associate to $(A, S)$ a QP $\widetilde{\mu}_k(A, S) = (\widetilde{A}, \widetilde{S})$ on the same set of vertices $Q_0$. We define the homogeneous components $\widetilde{A}_{i,j}$ as follows:
(5.3) $\widetilde{A}_{i,j} = \begin{cases} (A_{j,i})^\star & \text{if } i = k \text{ or } j = k; \\ A_{i,j} \oplus A_{i,k}A_{k,j} & \text{otherwise;} \end{cases}$
here the product $A_{i,k}A_{k,j}$ is understood as a subspace of $A^2 \subseteq R\langle\langle A\rangle\rangle$. Thus, the $R$-bimodule $\widetilde{A}$ is given by
(5.4) $\widetilde{A} = \bar{e}_k A \bar{e}_k \oplus A e_k A \oplus (e_k A)^\star \oplus (A e_k)^\star$,
where we use the notation
(5.5) $\bar{e}_k = 1 - e_k = \sum_{i \in Q_0 - \{k\}} e_i$.
We associate to $Q_1$ the set of arrows $\widetilde{Q}_1$ in the following way:
• Take all the arrows $c \in Q_1$ not incident to $k$.
• For each incoming arrow $a$ and outgoing arrow $b$ at $k$, create a "composite" arrow $[ba]$ corresponding to the product $ba \in A e_k A$.
• Replace each incoming arrow $a$ (resp. each outgoing arrow $b$) at $k$ by the corresponding arrow $a^\star$ (resp. $b^\star$) oriented in the opposite way.
More formally, for $i = k$ or $j = k$, we set
(5.6) $\widetilde{Q}_1 \cap \widetilde{A}_{i,j} = \{a^\star \mid a \in Q_1 \cap A_{j,i}\}$ (the dual basis);
and for $i$ and $j$ different from $k$, we define
(5.7) $\widetilde{Q}_1 \cap \widetilde{A}_{i,j} = (Q_1 \cap A_{i,j}) \sqcup \{[ba] \mid b \in Q_1 \cap A_{i,k},\ a \in Q_1 \cap A_{k,j}\}$,
where $[ba] \in \widetilde{Q}_1 \cap A_{i,k}A_{k,j}$ denotes the arrow in $\widetilde{Q}_1$ associated with the product $ba$.
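The three rules defining the arrow set in (5.6) and (5.7) are purely combinatorial, so they can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding of a quiver as a dictionary `name -> (tail, head)`, the function name `premutate_arrows`, and the string labels `a*` and `[ba]` are hypothetical conventions of this sketch and do not come from the text. The sketch also assumes that (5.1) holds at `k` and that there is no loop at `k`.

```python
def premutate_arrows(arrows, k):
    """Return the arrow set of the premutated quiver at vertex k.

    arrows: dict mapping an arrow name to (tail, head).
    Assumes k lies on no oriented 2-cycle (condition (5.1)) and
    carries no loop; neither assumption is checked here.
    """
    incoming = [a for a, (t, h) in arrows.items() if h == k]
    outgoing = [b for b, (t, h) in arrows.items() if t == k]
    new = {}
    # Rule 1: keep every arrow not incident to k.
    for c, (t, h) in arrows.items():
        if t != k and h != k:
            new[c] = (t, h)
    # Rule 2: one composite arrow [ba] : t(a) -> h(b) per pair through k.
    for a in incoming:
        for b in outgoing:
            new[f"[{b}{a}]"] = (arrows[a][0], arrows[b][1])
    # Rule 3: reverse every arrow incident to k.
    for x in incoming + outgoing:
        t, h = arrows[x]
        new[f"{x}*"] = (h, t)
    return new


# An oriented 4-cycle a: 1->2, b: 2->3, c: 3->4, d: 4->1, premutated at vertex 2.
quiver = {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (4, 1)}
result = premutate_arrows(quiver, 2)
```

For the oriented 4-cycle used here, the sketch keeps `c` and `d`, reverses `a` and `b`, and adds the single composite arrow `[ba]` from vertex 1 to vertex 3.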
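We now associate to $S$ the potential $\widetilde{\mu}_k(S) = \widetilde{S} \in R\langle\langle \widetilde{A}\rangle\rangle$ given by
(5.8) $\widetilde{S} = [S] + \Delta_k$,
where
(5.9) $\Delta_k = \Delta_k(A) = \sum_{a,b \in Q_1:\ h(a) = t(b) = k} [ba]\,a^\star b^\star$,
and $[S]$ is obtained by substituting $[a_p a_{p+1}]$ for each factor $a_p a_{p+1}$ with $t(a_p) = h(a_{p+1}) = k$ of any cyclic path $a_1 \cdots a_d$ occurring in the expansion of $S$ (recall that none of these cyclic paths starts at $k$). It is easy to see that neither $[S]$ nor $\Delta_k$ depends on the choice of a basis $Q_1$ of $A$.

The following proposition is immediate from the definitions.

Proposition 5.1. Suppose a QP $(A, S)$ satisfies (5.1) and (5.2), and a QP $(A', S')$ is such that $e_k A' = A' e_k = \{0\}$. Then we have
(5.10) $\widetilde{\mu}_k(A \oplus A', S + S') = \widetilde{\mu}_k(A, S) \oplus (A', S')$.

Theorem 5.2. The right-equivalence class of the QP $(\widetilde{A}, \widetilde{S}) = \widetilde{\mu}_k(A, S)$ is determined by the right-equivalence class of $(A, S)$.

Proof. Let $\widehat{A}$ be the finite-dimensional $R$-bimodule given by
(5.11) $\widehat{A} = A \oplus (e_k A)^\star \oplus (A e_k)^\star$.
The natural embedding $A \to \widehat{A}$ identifies $R\langle\langle A\rangle\rangle$ with a closed subalgebra in $R\langle\langle \widehat{A}\rangle\rangle$. We also have a natural embedding $\widetilde{A} \to R\langle\langle \widehat{A}\rangle\rangle$ (sending each arrow $[ba]$ to the product $ba$). This allows us to identify $R\langle\langle \widetilde{A}\rangle\rangle$ with another closed subalgebra in $R\langle\langle \widehat{A}\rangle\rangle$, namely, with the closure of the linear span of the paths $\hat{a}_1 \cdots \hat{a}_d$ such that $\hat{a}_1 \notin e_k A$ and $\hat{a}_d \notin A e_k$. Under this identification, the potential $\widetilde{S}$ given by (5.8) and viewed as an element of $R\langle\langle \widehat{A}\rangle\rangle$ is cyclically equivalent to the potential
$$S + \Bigl(\sum_{b \in Q_1 \cap A e_k} b^\star b\Bigr)\Bigl(\sum_{a \in Q_1 \cap e_k A} a a^\star\Bigr).$$
Taking this into account, we see that Theorem 5.2 becomes a consequence of the following lemma.

Lemma 5.3. Every automorphism $\varphi$ of $R\langle\langle A\rangle\rangle$ can be extended to an automorphism $\hat{\varphi}$ of $R\langle\langle \widehat{A}\rangle\rangle$ satisfying
(5.12) $\hat{\varphi}(R\langle\langle \widetilde{A}\rangle\rangle) = R\langle\langle \widetilde{A}\rangle\rangle$,
(5.13) $\hat{\varphi}\Bigl(\sum_{a \in Q_1 \cap e_k A} a a^\star\Bigr) = \sum_{a \in Q_1 \cap e_k A} a a^\star, \qquad \hat{\varphi}\Bigl(\sum_{b \in Q_1 \cap A e_k} b^\star b\Bigr) = \sum_{b \in Q_1 \cap A e_k} b^\star b$.

Proof. In order to extend $\varphi$ to an automorphism $\hat{\varphi}$ of $R\langle\langle \widehat{A}\rangle\rangle$, we need only to define the elements $\hat{\varphi}(a^\star)$ and $\hat{\varphi}(b^\star)$ for all arrows $a \in Q_1 \cap e_k A$ and $b \in Q_1 \cap A e_k$. We first deal with $\hat{\varphi}(a^\star)$. Let $Q_1 \cap e_k A = \{a_1, \dots, a_s\}$.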
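In view of Proposition 2.4, the action of $\varphi$ on these arrows is given by
(5.14) $\begin{pmatrix} \varphi(a_1) & \varphi(a_2) & \cdots & \varphi(a_s) \end{pmatrix} = \begin{pmatrix} a_1 & a_2 & \cdots & a_s \end{pmatrix} (C_0 + C_1)$,
where:
• $C_0$ is an invertible $s \times s$ matrix with entries in $K$ such that its $(p, q)$-entry is 0 unless $t(a_p) = t(a_q)$;
• $C_1$ is an $s \times s$ matrix whose $(p, q)$-entry belongs to $\mathfrak{m}(A)_{t(a_p),t(a_q)}$.
Note that $C_0 + C_1$ is invertible, and its inverse is of the same form: indeed, we have
$$(C_0 + C_1)^{-1} = (I + C_0^{-1}C_1)^{-1} C_0^{-1} = \Bigl(I + \sum_{n=1}^{\infty} (-1)^n (C_0^{-1}C_1)^n\Bigr) C_0^{-1}.$$
Now we define the elements $\hat{\varphi}(a_p^\star)$ by setting
$$\begin{pmatrix} \hat{\varphi}(a_1^\star) \\ \hat{\varphi}(a_2^\star) \\ \vdots \\ \hat{\varphi}(a_s^\star) \end{pmatrix} = (C_0 + C_1)^{-1} \begin{pmatrix} a_1^\star \\ a_2^\star \\ \vdots \\ a_s^\star \end{pmatrix}.$$
It follows that
$$\hat{\varphi}\Bigl(\sum_{p} a_p a_p^\star\Bigr) = \begin{pmatrix} \hat{\varphi}(a_1) & \cdots & \hat{\varphi}(a_s) \end{pmatrix} \begin{pmatrix} \hat{\varphi}(a_1^\star) \\ \vdots \\ \hat{\varphi}(a_s^\star) \end{pmatrix} = \begin{pmatrix} a_1 & \cdots & a_s \end{pmatrix} \begin{pmatrix} a_1^\star \\ \vdots \\ a_s^\star \end{pmatrix} = \sum_{p} a_p a_p^\star.$$
For $b \in Q_1 \cap A e_k$, we define $\hat{\varphi}(b^\star)$ in a similar way. Namely, let $Q_1 \cap A e_k = \{b_1, \dots, b_t\}$. As above, the action of $\varphi$ on these arrows is given by
(5.15) $\begin{pmatrix} \varphi(b_1) \\ \varphi(b_2) \\ \vdots \\ \varphi(b_t) \end{pmatrix} = (D_0 + D_1) \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_t \end{pmatrix}$,
where:
• $D_0$ is an invertible $t \times t$ matrix with entries in $K$ such that its $(p, q)$-entry is 0 unless $h(b_p) = h(b_q)$;
• $D_1$ is a $t \times t$ matrix whose $(p, q)$-entry belongs to $\mathfrak{m}(A)_{h(b_p),h(b_q)}$.
As above, we see that $D_0 + D_1$ is invertible, and its inverse is of the same form. Now we define the elements $\hat{\varphi}(b_q^\star)$ by setting
$$\begin{pmatrix} \hat{\varphi}(b_1^\star) & \hat{\varphi}(b_2^\star) & \cdots & \hat{\varphi}(b_t^\star) \end{pmatrix} = \begin{pmatrix} b_1^\star & b_2^\star & \cdots & b_t^\star \end{pmatrix} (D_0 + D_1)^{-1}.$$
It follows that
$$\hat{\varphi}\Bigl(\sum_{q} b_q^\star b_q\Bigr) = \begin{pmatrix} \hat{\varphi}(b_1^\star) & \cdots & \hat{\varphi}(b_t^\star) \end{pmatrix} \begin{pmatrix} \hat{\varphi}(b_1) \\ \vdots \\ \hat{\varphi}(b_t) \end{pmatrix} = \begin{pmatrix} b_1^\star & \cdots & b_t^\star \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_t \end{pmatrix} = \sum_{q} b_q^\star b_q.$$
The condition (5.13) is then clearly satisfied; the construction also makes clear that the automorphism $\hat{\varphi}$ of $R\langle\langle \widehat{A}\rangle\rangle$ preserves the subalgebra $R\langle\langle \widetilde{A}\rangle\rangle$. As a consequence of Proposition 2.4, $\hat{\varphi}$ restricts to an automorphism of $R\langle\langle \widetilde{A}\rangle\rangle$, verifying (5.12) and completing the proofs of Lemma 5.3 and Theorem 5.2. □

Note that even if a QP $(A, S)$ is assumed to be reduced, the QP $\widetilde{\mu}_k(A, S) = (\widetilde{A}, \widetilde{S})$ is not necessarily reduced because the component $[S]^{(2)} \in \widetilde{A}^2$ may be non-zero. Combining Theorems 4.6 and 5.2, we obtain the following corollary.

Corollary 5.4.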
Suppose a QP $(A, S)$ satisfies (5.1) and (5.2), and let $\widetilde{\mu}_k(A, S) = (\widetilde{A}, \widetilde{S})$. Let $(\overline{A}, \overline{S})$ be a reduced QP such that
(5.16) $(\widetilde{A}, \widetilde{S}) \cong (\widetilde{A}_{\mathrm{triv}}, \widetilde{S}^{(2)}) \oplus (\overline{A}, \overline{S})$
(see (4.5)). Then the right-equivalence class of $(\overline{A}, \overline{S})$ is determined by the right-equivalence class of $(A, S)$.

Definition 5.5. In the situation of Corollary 5.4, we use the notation $\mu_k(A, S) = (\overline{A}, \overline{S})$ and call the correspondence $(A, S) \mapsto \mu_k(A, S)$ the mutation at vertex $k$.

Note that if a QP $(A, S)$ satisfies (5.1) then the same is true for $\widetilde{\mu}_k(A, S)$ and for $\mu_k(A, S)$. Thus, the mutation $\mu_k$ is a well-defined transformation on the set of right-equivalence classes of reduced QPs satisfying (5.1). (With some abuse of notation, we sometimes denote a right-equivalence class by the same symbol as any of its representatives.)

Example 5.6. Consider the quiver $Q$ with vertices $\{1, 2, 3, 4\}$ and arrows $a : 1 \to 2$, $b : 2 \to 3$, $c : 3 \to 4$ and $d : 4 \to 1$. Let $S = dcba$. Let us perform the mutation at vertex 2. The arrow $a$ is replaced by $e := a^\star : 2 \to 1$, and $b$ is replaced by $f := b^\star : 3 \to 2$. We also have a new arrow $g := [ba] : 1 \to 3$. So $\widetilde{\mu}_2(A)$ corresponds to the quiver with vertices $\{1, 2, 3, 4\}$ and arrows $c, d, e, f, g$. The potential $\widetilde{\mu}_2(S) = \widetilde{S}$ is given by $\widetilde{S} = dcg + gef$; thus, $\widetilde{\mu}_2(A, S)$ is reduced, and we have $\widetilde{\mu}_2(A, S) = \mu_2(A, S)$. Note that $\widetilde{S}$ does not satisfy condition (5.2) with respect to vertex $k = 3$ since the path $gef$ starts and ends at 3. But we can restore this condition by replacing $\widetilde{S}$ with a cyclically equivalent potential, say $S' = dcg + efg$. Now let us mutate $(\widetilde{A}, S')$ at vertex 3. The arrows $c, f, g$ are replaced by $c^\star : 4 \to 3$, $f^\star : 2 \to 3$ and $g^\star : 3 \to 1$, respectively. We also add new arrows $[cg] : 1 \to 4$ and $[fg] : 1 \to 2$. Thus, $\widetilde{\mu}_3(\widetilde{A}, S')$ has arrows $\{d, e, c^\star, f^\star, g^\star, [cg], [fg]\}$. The potential $\widetilde{\mu}_3(S')$ is given by
$$\widetilde{\mu}_3(S') = d[cg] + e[fg] + [fg]g^\star f^\star + [cg]g^\star c^\star.$$
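The two premutations carried out in this example can be reproduced mechanically from (5.6) to (5.9). The Python sketch below is an illustration only and not part of the original construction: the encoding of a potential as a list of `(coefficient, word)` pairs, the function name `premutate`, and the generated labels such as `[ba]` and `a*` are assumptions of this sketch. It rotates each cyclic word once, if needed, so that (5.2) holds, and it assumes (5.1) and the absence of loops at `k` without checking them.

```python
def premutate(arrows, potential, k):
    """Premutation at k of a quiver with potential, following (5.6)-(5.9).

    arrows: dict name -> (tail, head).
    potential: list of (coefficient, word); a word is a tuple of arrow
    names written left to right as in the text, so the rightmost arrow
    of a word is traversed first.
    """
    incoming = [a for a, (t, h) in arrows.items() if h == k]
    outgoing = [b for b, (t, h) in arrows.items() if t == k]
    # Arrow set: untouched arrows, composites [ba], reversed arrows.
    new_arrows = {c: th for c, th in arrows.items() if k not in th}
    for a in incoming:
        for b in outgoing:
            new_arrows[f"[{b}{a}]"] = (arrows[a][0], arrows[b][1])
    for x in incoming + outgoing:
        t, h = arrows[x]
        new_arrows[f"{x}*"] = (h, t)

    new_potential = []
    for coeff, word in potential:
        # Enforce (5.2): a word starts at k exactly when h(word[0]) = k.
        if arrows[word[0]][1] == k:
            word = word[1:] + word[:1]
        # [S]: replace each factor "b a" passing through k by "[ba]".
        out, i = [], 0
        while i < len(word):
            if arrows[word[i]][0] == k:  # t(word[i]) = k = h(word[i + 1])
                out.append(f"[{word[i]}{word[i + 1]}]")
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_potential.append((coeff, tuple(out)))
    # Delta_k: the sum of [ba] a* b* over all pairs through k.
    for a in incoming:
        for b in outgoing:
            new_potential.append((1, (f"[{b}{a}]", f"{a}*", f"{b}*")))
    return new_arrows, new_potential


Q = {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (4, 1)}
S = [(1, ("d", "c", "b", "a"))]
Q2, S2 = premutate(Q, S, 2)    # premutation at vertex 2
Q3, S3 = premutate(Q2, S2, 3)  # then at vertex 3
```

In the labels produced by this sketch, `[ba]`, `a*` and `b*` play the roles of $g$, $e$ and $f$, so `S2` encodes $dcg + gef$ and `S3` encodes the four terms $d[cg] + e[fg] + [cg]g^\star c^\star + [fg]g^\star f^\star$. The sketch does not perform the reduction step; it only builds the premutation.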
It is not reduced, so to obtain the reduced QP $\mu_3(\widetilde{A}, S')$, we need to remove the trivial part of $\widetilde{\mu}_3(\widetilde{A}, S')$. The resulting quiver has the three arrows $c^\star : 4 \to 3$, $f^\star : 2 \to 3$ and $g^\star : 3 \to 1$ (figure omitted). Since it is acyclic (that is, has no oriented cycles), the corresponding potential is 0.

Our next result is that every mutation is an involution.

Theorem 5.7. The correspondence $\mu_k : (A, S) \to (\overline{A}, \overline{S})$ acts as an involution on the set of right-equivalence classes of reduced QPs satisfying (5.1), that is, $\mu_k^2(A, S)$ is right-equivalent to $(A, S)$.

Proof. Let $(A, S)$ be a reduced QP satisfying (5.1) and (5.2). Let $\widetilde{\mu}_k(A, S) = (\widetilde{A}, \widetilde{S})$ and $\widetilde{\mu}_k^2(A, S) = \widetilde{\mu}_k(\widetilde{A}, \widetilde{S}) = (\widetilde{\widetilde{A}}, \widetilde{\widetilde{S}})$. In view of Theorem 4.6 and Proposition 5.1, it is enough to show that
(5.17) $(\widetilde{\widetilde{A}}, \widetilde{\widetilde{S}})$ is right-equivalent to $(A, S) \oplus (C, T)$, where $(C, T)$ is a trivial QP.
Using (5.4) twice, and identifying $(e_k A)^\star$ with $A^\star e_k$, and $(A e_k)^\star$ with $e_k A^\star$, where $A^\star$ is the dual $R$-bimodule of $A$, we conclude that
(5.18) $\widetilde{\widetilde{A}} = A \oplus A e_k A \oplus A^\star e_k A^\star$.
Furthermore, the basis of arrows in $\widetilde{\widetilde{A}}$ consists of the original set of arrows $Q_1$ in $A$ together with the arrows $[ba] \in A e_k A$ and $[a^\star b^\star] \in A^\star e_k A^\star$ for $a \in Q_1 \cap e_k A$ and $b \in Q_1 \cap A e_k$. Remembering (5.8) and (5.9), we see that the potential $\widetilde{\widetilde{S}}$ is given by
(5.19) $\widetilde{\widetilde{S}} = [[S]] + [\Delta_k(A)] + \Delta_k(\widetilde{A}) = [S] + \sum_{a,b \in Q_1:\ h(a) = t(b) = k} \bigl([ba][a^\star b^\star] + [a^\star b^\star]ba\bigr)$,
hence it is cyclically equivalent to
(5.20) $S_1 = [S] + \sum_{a,b \in Q_1:\ h(a) = t(b) = k} ([ba] + ba)[a^\star b^\star]$
(recall that $[S]$ is obtained by substituting $[a_p a_{p+1}]$ for each factor $a_p a_{p+1}$ with $t(a_p) = h(a_{p+1}) = k$ of any cyclic path $a_1 \cdots a_d$ occurring in the path expansion of $S$). Let us abbreviate
$$(C, T) = \Bigl(A e_k A \oplus A^\star e_k A^\star,\ \sum_{a,b \in Q_1:\ h(a) = t(b) = k} [ba][a^\star b^\star]\Bigr).$$
This is a trivial QP (cf. Proposition 4.4); therefore to prove Theorem 5.7 it suffices to show that the QP $(\widetilde{\widetilde{A}}, S_1)$ given by (5.18) and (5.20) is right-equivalent to $(A, S) \oplus (C, T)$. We proceed in several steps.
Step 1: Let $\varphi_1$ be the change of arrows automorphism of $R\langle\langle \widetilde{\widetilde{A}}\rangle\rangle$ (see Definition 2.5) multiplying each arrow $b \in Q_1 \cap A e_k$ by $-1$, and fixing the rest of the arrows in $\widetilde{\widetilde{A}}$. Then the potential $S_2 = \varphi_1(S_1)$ is given by
$$S_2 = [S] + \sum_{a,b \in Q_1:\ h(a) = t(b) = k} ([ba] - ba)[a^\star b^\star].$$

Step 2: Let $\varphi_2$ be the unitriangular automorphism of $R\langle\langle \widetilde{\widetilde{A}}\rangle\rangle$ (see Definition 2.5) sending each arrow $[ba] \in A e_k A$ to $[ba] + ba$, and fixing the rest of the arrows in $\widetilde{\widetilde{A}}$. Remembering the definition of $[S]$, it is easy to see that the potential $\varphi_2(S_2)$ is cyclically equivalent to a potential of the form
$$S_3 = S + \sum_{a,b \in Q_1:\ h(a) = t(b) = k} [ba]\bigl([a^\star b^\star] + f(a, b)\bigr)$$
for some elements $f(a, b) \in \mathfrak{m}(A \oplus A e_k A)^2$.

Step 3: Let $\varphi_3$ be the unitriangular automorphism of $R\langle\langle \widetilde{\widetilde{A}}\rangle\rangle$ sending each arrow $[a^\star b^\star] \in A^\star e_k A^\star$ to $[a^\star b^\star] - f(a, b)$, and fixing the rest of the arrows in $\widetilde{\widetilde{A}}$. Then we have $\varphi_3(S_3) = S + T$.

Combining these three steps, we conclude that the QP $(\widetilde{\widetilde{A}}, S_1)$ is right-equivalent to $(\widetilde{\widetilde{A}}, S + T) = (A, S) \oplus (C, T)$, finishing the proof of Theorem 5.7. □

6. Some mutation invariants

In this section we fix a vertex $k$ and study the effect of the mutation $\mu_k$ on the Jacobian algebra $\mathcal{P}(A, S)$. We will use the following notation: for an $R$-bimodule $B$, denote
(6.1) $B_{\hat{k},\hat{k}} = \bar{e}_k B \bar{e}_k = \bigoplus_{i,j \neq k} B_{i,j}$
(see (5.5)). Note that if $B$ is a (topological) algebra then $B_{\hat{k},\hat{k}}$ is a (closed) subalgebra of $B$.

Proposition 6.1. Suppose a QP $(A, S)$ satisfies (5.1) and (5.2), and let $(\widetilde{A}, \widetilde{S}) = \widetilde{\mu}_k(A, S)$ be given by (5.4) and (5.8). Then the algebras $\mathcal{P}(A, S)_{\hat{k},\hat{k}}$ and $\mathcal{P}(\widetilde{A}, \widetilde{S})_{\hat{k},\hat{k}}$ are isomorphic to each other.

Proof. In view of (5.4), we have
(6.2) $\widetilde{A}_{\hat{k},\hat{k}} = A_{\hat{k},\hat{k}} \oplus A e_k A$.
Thus, the algebra $R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle$ is generated by the arrows $c \in Q_1 \cap A_{\hat{k},\hat{k}}$ and $[ba]$ for $a \in Q_1 \cap e_k A$ and $b \in Q_1 \cap A e_k$. The following fact is immediate from the definitions.

Lemma 6.2.
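The correspondence sending each $c \in Q_1 \cap A_{\hat{k},\hat{k}}$ to itself, and each generator $[ba]$ to $ba$ extends to an algebra isomorphism
$$R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle \to R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}.$$

Let $u \mapsto [u]$ denote the isomorphism $R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}} \to R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle$ inverse of that in Lemma 6.2. It acts in the same way as the correspondence $S \mapsto [S]$ in (5.8): $[u]$ is obtained by substituting $[a_p a_{p+1}]$ for each factor $a_p a_{p+1}$ with $t(a_p) = h(a_{p+1}) = k$ of any path $a_1 \cdots a_d$ occurring in the path expansion of $u$.

Lemma 6.3. The correspondence $u \mapsto [u]$ induces an algebra epimorphism $\mathcal{P}(A, S)_{\hat{k},\hat{k}} \to \mathcal{P}(\widetilde{A}, \widetilde{S})_{\hat{k},\hat{k}}$.

Proof. It is enough to prove the following two facts:
(6.3) $R\langle\langle \widetilde{A}\rangle\rangle_{\hat{k},\hat{k}} = R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle + J(\widetilde{S})_{\hat{k},\hat{k}}$;
(6.4) $[J(S)_{\hat{k},\hat{k}}] \subseteq R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle \cap J(\widetilde{S})_{\hat{k},\hat{k}}$.
To show (6.3), we note that if a path $\tilde{a}_1 \cdots \tilde{a}_d \in R\langle\langle \widetilde{A}\rangle\rangle_{\hat{k},\hat{k}}$ does not belong to $R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle$ then it must contain one or more factors of the form $a^\star b^\star$ with $h(a) = t(b) = k$. In view of (5.8) and (5.9), we have
(6.5) $a^\star b^\star = \partial_{[ba]}\widetilde{S} - \partial_{[ba]}[S]$.
Substituting this expression for each factor $a^\star b^\star$, we see that $\tilde{a}_1 \cdots \tilde{a}_d \in R\langle\langle \widetilde{A}_{\hat{k},\hat{k}}\rangle\rangle + J(\widetilde{S})_{\hat{k},\hat{k}}$, as desired.

To show (6.4), we note that $J(S)_{\hat{k},\hat{k}}$ is easily seen to be the closure of the ideal in $R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}$ generated by the elements $\partial_c S$ for all arrows $c \in Q_1$ with $t(c) \neq k$ and $h(c) \neq k$, together with the elements $(\partial_a S)a'$ for $a, a' \in Q_1 \cap e_k A$, and $b'(\partial_b S)$ for $b, b' \in Q_1 \cap A e_k$. Let us apply the map $u \mapsto [u]$ to these generators. First, we have:
(6.6) $[\partial_c S] = \partial_c \widetilde{S}$.
With a little bit more work (using (6.5)), we obtain
(6.7) $[(\partial_a S)a'] = \sum_{t(b)=k} (\partial_{[ba]}[S])[ba'] = \sum_{t(b)=k} (\partial_{[ba]}\widetilde{S} - a^\star b^\star)[ba'] = \sum_{t(b)=k} (\partial_{[ba]}\widetilde{S})[ba'] - a^\star\,\partial_{a'^\star}\widetilde{S}$,
(6.8) $[b'(\partial_b S)] = \sum_{h(a)=k} [b'a](\partial_{[ba]}[S]) = \sum_{h(a)=k} [b'a](\partial_{[ba]}\widetilde{S} - a^\star b^\star) = \sum_{h(a)=k} [b'a](\partial_{[ba]}\widetilde{S}) - (\partial_{b'^\star}\widetilde{S})\,b^\star$.
This implies the desired inclusion in (6.4). □

To finish the proof of Proposition 6.1, it is enough to show that the epimorphism in Lemma 6.3 (let us denote it by $\alpha$) is in fact an isomorphism.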
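To do this, we construct the left inverse algebra homomorphism $\beta : \mathcal{P}(\widetilde{A}, \widetilde{S})_{\hat{k},\hat{k}} \to \mathcal{P}(A, S)_{\hat{k},\hat{k}}$ (such that $\beta\alpha$ is the identity map on $\mathcal{P}(A, S)_{\hat{k},\hat{k}}$). We define $\beta$ as the composition of three maps. First, we apply the epimorphism $\mathcal{P}(\widetilde{A}, \widetilde{S})_{\hat{k},\hat{k}} \to \mathcal{P}(\widetilde{\widetilde{A}}, \widetilde{\widetilde{S}})_{\hat{k},\hat{k}}$ defined in the same way as $\alpha$. Remembering the proof of Theorem 5.7 and using the notation introduced there, we then apply the isomorphism $\mathcal{P}(\widetilde{\widetilde{A}}, \widetilde{\widetilde{S}})_{\hat{k},\hat{k}} \to \mathcal{P}(A \oplus C, S + T)_{\hat{k},\hat{k}}$ induced by the automorphism $\varphi_3\varphi_2\varphi_1$ of $R\langle\langle A \oplus C\rangle\rangle$. Finally, we apply the isomorphism $\mathcal{P}(A \oplus C, S + T)_{\hat{k},\hat{k}} \to \mathcal{P}(A, S)_{\hat{k},\hat{k}}$ given in Proposition 4.5. Since all the maps involved are algebra homomorphisms, it is enough to check that $\beta\alpha$ fixes the generators $p(c)$ and $p(ba)$ of $\mathcal{P}(A, S)_{\hat{k},\hat{k}}$, where $p$ is the projection $R\langle\langle A\rangle\rangle \to \mathcal{P}(A, S)$, and $a, b, c$ have the same meaning as above. This is done by direct tracing of the definitions. □

Proposition 6.4. In the situation of Proposition 6.1, if the Jacobian algebra $\mathcal{P}(A, S)$ is finite-dimensional then so is $\mathcal{P}(\widetilde{A}, \widetilde{S})$.

Proof. We start by showing that finite dimensionality of $\mathcal{P}(A, S)$ follows from a seemingly weaker condition.

Lemma 6.5. Let $J \subseteq \mathfrak{m}(A)$ be a closed ideal in $R\langle\langle A\rangle\rangle$. Then the quotient algebra $R\langle\langle A\rangle\rangle/J$ is finite dimensional provided the subalgebra $R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}/J_{\hat{k},\hat{k}}$ is finite dimensional. In particular, the Jacobian algebra $\mathcal{P}(A, S)$ is finite-dimensional if and only if so is the subalgebra $\mathcal{P}(A, S)_{\hat{k},\hat{k}}$.

Proof. Similarly to (6.1), for an $R$-bimodule $B$, we denote
$$B_{k,\hat{k}} = e_k B \bar{e}_k = \bigoplus_{j \neq k} B_{k,j}, \qquad B_{\hat{k},k} = \bar{e}_k B e_k = \bigoplus_{i \neq k} B_{i,k}.$$
We need to show that if $R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}/J_{\hat{k},\hat{k}}$ is finite dimensional then so is each of the spaces $R\langle\langle A\rangle\rangle_{k,\hat{k}}/J_{k,\hat{k}}$, $R\langle\langle A\rangle\rangle_{\hat{k},k}/J_{\hat{k},k}$ and $R\langle\langle A\rangle\rangle_{k,k}/J_{k,k}$. Let us treat $R\langle\langle A\rangle\rangle_{k,k}/J_{k,k}$; the other two cases are done similarly (and are a little simpler). Let
$$Q_1 \cap A_{k,\hat{k}} = \{a_1, \dots, a_s\}, \qquad Q_1 \cap A_{\hat{k},k} = \{b_1, \dots, b_t\}.$$
We have
$$R\langle\langle A\rangle\rangle_{k,k} = K e_k \oplus \bigoplus_{\ell,m} a_\ell\, R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}\, b_m.$$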
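It follows that there is a surjective map $\alpha : K \times \mathrm{Mat}_{s \times t}(R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}) \to R\langle\langle A\rangle\rangle_{k,k}/J_{k,k}$ given by
$$\alpha(c, C) = p\Bigl(c\,e_k + \begin{pmatrix} a_1 & a_2 & \cdots & a_s \end{pmatrix} C \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_t \end{pmatrix}\Bigr),$$
where $\mathrm{Mat}_{s \times t}(B)$ stands for the space of $s \times t$ matrices with entries in $B$, and $p$ is the projection $R\langle\langle A\rangle\rangle \to R\langle\langle A\rangle\rangle/J$. The kernel of $\alpha$ contains the space $\mathrm{Mat}_{s \times t}(J_{\hat{k},\hat{k}})$, hence $R\langle\langle A\rangle\rangle_{k,k}/J_{k,k}$ is isomorphic to a quotient of the finite-dimensional space $K \times \mathrm{Mat}_{s \times t}(R\langle\langle A\rangle\rangle_{\hat{k},\hat{k}}/J_{\hat{k},\hat{k}})$. Thus, $R\langle\langle A\rangle\rangle_{k,k}/J_{k,k}$ is finite dimensional, as desired. □

To finish the proof of Proposition 6.4, suppose that $\mathcal{P}(A, S)$ is finite dimensional. Then $\mathcal{P}(\widetilde{A}, \widetilde{S})_{\hat{k},\hat{k}}$ is finite dimensional by Proposition 6.1. Applying Lemma 6.5 to the QP $(\widetilde{A}, \widetilde{S})$, we conclude that $\mathcal{P}(\widetilde{A}, \widetilde{S})$ is finite dimensional, as desired. □

Remembering (5.16) and using Proposition 4.5, we see that Propositions 6.1 and 6.4 have the following corollary.

Corollary 6.6. Suppose $(A, S)$ is a reduced QP satisfying (5.1), and let $(\overline{A}, \overline{S}) = \mu_k(A, S)$ be a reduced QP obtained from $(A, S)$ by the mutation at $k$. Then the algebras $\mathcal{P}(A, S)_{\hat{k},\hat{k}}$ and $\mathcal{P}(\overline{A}, \overline{S})_{\hat{k},\hat{k}}$ are isomorphic to each other, and $\mathcal{P}(A, S)$ is finite-dimensional if and only if so is $\mathcal{P}(\overline{A}, \overline{S})$.

We see that the class of QPs with finite dimensional Jacobian algebras is invariant under mutations. Let us now present another such class.

Definition 6.7. For every QP $(A, S)$, we define its deformation space $\mathrm{Def}(A, S)$ by
(6.9) $\mathrm{Def}(A, S) = \mathrm{Tr}(\mathcal{P}(A, S))/R$
(see Definitions 3.1 and 3.4).

Remark 6.8. Definition 6.7 can be motivated as follows (we keep the following arguments informal although with some work they can be made rigorous). Let $G = \mathrm{Aut}(R\langle\langle A\rangle\rangle)$ be the group of algebra automorphisms of $R\langle\langle A\rangle\rangle$ (acting as the identity on $R$). Using Proposition 2.4, we can think of $G$ as an infinite dimensional algebraic group. In view of Definition 3.4, $G$ acts naturally on the trace space $\mathrm{Tr}(R\langle\langle A\rangle\rangle)$.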
Remembering Definition 4.2, it is natural to think of the deformation space of a QP $(A, S)$ as the normal space at $\pi(S)$ of the orbit $G \cdot \pi(S)$ in the ambient space $\pi(\mathfrak{m}(A)^2)$ (recall that $\pi$ stands for the natural projection $R\langle\langle A\rangle\rangle \to \mathrm{Tr}(R\langle\langle A\rangle\rangle)$). Arguing as in Lemma 4.11, we conclude that the infinitesimal action of (the Lie algebra of) $G$ on $\pi(\mathfrak{m}(A)^2)$ is by the transformations $\pi(u) \mapsto \pi\bigl(\sum_{k} b_k\,\partial_{a_k} u\bigr)$, where $Q_1 = \{a_1, \dots, a_N\}$ is the set of arrows, and $b_k \in \mathfrak{m}(A)_{h(a_k),t(a_k)}$ (this is well defined in view of Proposition 3.3). This makes it natural to identify the tangent space at $\pi(S)$ of $G \cdot \pi(S)$ with $\pi(J(S))$, hence to identify the corresponding normal space with $\pi(\mathfrak{m}(A))/\pi(J(S))$, or equivalently, with the space $\mathrm{Def}(S)$ given by (6.9).

Proposition 6.9. In the situation of Proposition 6.1, the deformation spaces $\mathrm{Def}(\widetilde{A}, \widetilde{S})$ and $\mathrm{Def}(A, S)$ are isomorphic to each other.

Proof. In view of Proposition 3.5, $\mathrm{Def}(A, S)$ is isomorphic to $\mathrm{Tr}(\mathcal{P}(A, S)_{\hat{k},\hat{k}})/R_{\hat{k},\hat{k}}$. Therefore, our assertion is immediate from Proposition 6.1. □

Definition 6.10. We call a QP $(A, S)$ rigid if $\mathrm{Def}(A, S) = \{0\}$, i.e., if $\mathrm{Tr}(\mathcal{P}(A, S)) = R$.

Combining Propositions 4.5 and 6.9, we obtain the following corollary.

Corollary 6.11. If a reduced QP $(A, S)$ satisfies (5.1) and is rigid, then the QP $(\overline{A}, \overline{S}) = \mu_k(A, S)$ is also rigid.

Some examples of rigid and non-rigid QPs will be given in Section 8.

7. Nondegenerate QPs

If we wish to be able to apply to a reduced QP $(A, S)$ the mutation at every vertex of $Q_0$, the $R$-bimodule $A$ must satisfy (5.1) at all vertices. Thus, the arrow span $A$ must be 2-acyclic (see Definition 4.14). Such an arrow span $A$ can be encoded by a skew-symmetric integer matrix $B = B(A) = (b_{i,j})$ with rows and columns labeled by $Q_0$, by setting
(7.1) $b_{i,j} = \dim A_{i,j} - \dim A_{j,i}$.
Indeed, the dimensions of the components $A_{i,j}$ are recovered from $B$ by
(7.2) $\dim A_{i,j} = [b_{i,j}]_+$,
where we use the notation
(7.3) $[x]_+ = \max(x, 0)$.

Proposition 7.1.
Let $(A, S)$ be a 2-acyclic reduced QP, and suppose that the reduced QP $\mu_k(A, S) = (\overline{A}, \overline{S})$ obtained from $(A, S)$ by the mutation at some vertex $k$ (see Definition 5.5) is also 2-acyclic. Let $B(A) = (b_{i,j})$ and $B(\overline{A}) = (\overline{b}_{i,j})$ be the skew-symmetric integer matrices given by (7.1). Then we have
(7.4) $\overline{b}_{i,j} = \begin{cases} -b_{i,j} & \text{if } i = k \text{ or } j = k; \\ b_{i,j} + [b_{i,k}]_+ [b_{k,j}]_+ - [-b_{i,k}]_+ [-b_{k,j}]_+ & \text{otherwise.} \end{cases}$

Proof. First we note that by Proposition 4.4, if $(C, T)$ is a trivial QP then $\dim C_{i,j} = \dim C_{j,i}$ for all $i, j$. In view of (5.16), this implies that
(7.5) $\overline{b}_{i,j} = \dim \overline{A}_{i,j} - \dim \overline{A}_{j,i} = \dim \widetilde{A}_{i,j} - \dim \widetilde{A}_{j,i}$,
where $(\widetilde{A}, \widetilde{S}) = \widetilde{\mu}_k(A, S)$. Using (5.3), we obtain
$$\dim \widetilde{A}_{i,j} = \begin{cases} \dim A_{j,i} & \text{if } i = k \text{ or } j = k; \\ \dim A_{i,j} + \dim A_{i,k} \dim A_{k,j} & \text{otherwise.} \end{cases}$$
To obtain (7.4), it remains to substitute these expressions into (7.5) and use (7.2). □

An easy calculation using the obvious identity $x = [x]_+ - [-x]_+$ shows that the second case in (7.4) can be rewritten in several equivalent ways as follows:
$$\overline{b}_{i,j} = b_{i,j} + \mathrm{sgn}(b_{i,k})\,[b_{i,k}b_{k,j}]_+ = b_{i,j} + [-b_{i,k}]_+\, b_{k,j} + b_{i,k}\,[b_{k,j}]_+ = b_{i,j} + \frac{|b_{i,k}|\,b_{k,j} + b_{i,k}\,|b_{k,j}|}{2}.$$
It follows that the transformation $B \mapsto \overline{B}$ given by (7.4) coincides with the matrix mutation at $k$ which plays a crucial part in the theory of cluster algebras (cf. [18, (4.3)], [20, (2.2), (2.5)]).

We see that the mutations of 2-acyclic QPs provide a natural framework for matrix mutations. With some abuse of notation, we denote by $\mu_k(A)$ the 2-acyclic $R$-bimodule such that the skew-symmetric matrix $B(\mu_k(A))$ is obtained from $B(A)$ by the mutation at $k$; thus, $\mu_k(A)$ is determined by $A$ up to an isomorphism. Note that the matrix mutations at arbitrary vertices can be iterated indefinitely, while the 2-acyclicity condition (4.13) can be destroyed by a QP mutation. We will study QPs for which this does not happen.

Definition 7.2. Let $k_1, \dots, k_\ell \in Q_0$ be a finite sequence of vertices such that $k_p \neq k_{p+1}$ for $p = 1, \dots, \ell - 1$.
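As an aside on the matrix side of the story, the rule (7.4) and its iteration along a sequence of vertices of the kind just introduced can be sketched in Python. This is a minimal illustration under stated assumptions, not an implementation taken from the text: the function names `pos`, `mutate_matrix`, `mutate_matrix_abs` and `mutate_along` are hypothetical, matrices are plain lists of lists, and vertices are indexed from 0. The second function uses the last of the equivalent rewritings of (7.4), the one with absolute values and a division by 2, so that the two forms can be compared numerically.

```python
def pos(x):
    """[x]_+ = max(x, 0), as in (7.3)."""
    return max(x, 0)


def mutate_matrix(B, k):
    """Matrix mutation at index k following (7.4)."""
    n = len(B)
    return [[-B[i][j] if k in (i, j)
             else B[i][j] + pos(B[i][k]) * pos(B[k][j])
                          - pos(-B[i][k]) * pos(-B[k][j])
             for j in range(n)] for i in range(n)]


def mutate_matrix_abs(B, k):
    """Same mutation, using the rewriting with absolute values."""
    n = len(B)
    return [[-B[i][j] if k in (i, j)
             # The numerator is always even, so integer division is exact.
             else B[i][j] + (abs(B[i][k]) * B[k][j] + B[i][k] * abs(B[k][j])) // 2
             for j in range(n)] for i in range(n)]


def mutate_along(B, sequence):
    """Apply matrix mutations at the vertices k_1, ..., k_l in order."""
    for k in sequence:
        B = mutate_matrix(B, k)
    return B


# A skew-symmetric matrix whose graph is an oriented triangle.
B = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]
B1 = mutate_matrix(B, 1)
```

On this input the mutation at index 1 removes the entry linking indices 0 and 2, and applying `mutate_matrix(B1, 1)` returns the original matrix, in line with the involution property of matrix mutation.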
We say that a QP $(A, S)$ is $(k_\ell, \dots, k_1)$-nondegenerate if all the QPs $(A, S)$, $\mu_{k_1}(A, S)$, $\mu_{k_2}\mu_{k_1}(A, S)$, $\dots$, $\mu_{k_\ell} \cdots \mu_{k_1}(A, S)$ are 2-acyclic (hence well-defined). We say that $(A, S)$ is nondegenerate if it is $(k_\ell, \dots, k_1)$-nondegenerate for every sequence of vertices as above.

To state our next result recall the terminology introduced before Proposition 4.15. In particular, for a given quiver with the arrow span $A$, the QPs on $A$ are identified with the elements of $K^{\mathcal{C}(A)}$.

Proposition 7.3. Suppose that the base field $K$ is infinite, $Q$ is a 2-acyclic quiver with the arrow span $A$, a sequence of vertices $k_1, \dots, k_\ell$ is as in Definition 7.2, and $A' = \mu_{k_\ell} \cdots \mu_{k_1}(A)$. Then there exist a nonzero polynomial function $F : K^{\mathcal{C}(A)} \to K$ and a regular map $G : U(F) \to K^{\mathcal{C}(A')}$ such that every QP $(A, S)$ with $S \in U(F)$ is $(k_\ell, \dots, k_1)$-nondegenerate, and, for any QP $(A, S)$ with $S \in U(F)$, the QP $\mu_{k_\ell} \cdots \mu_{k_1}(A, S)$ is right-equivalent to $(A', G(S))$.

Proof. We proceed by induction on $\ell$. First let us deal with the case $\ell = 1$, that is, with a single mutation $\mu_k$. Recall that $\mu_k(A, S) = (\overline{A}, \overline{S})$ is the reduced part of the QP $\widetilde{\mu}_k(A, S) = (\widetilde{A}, \widetilde{S})$ given by (5.3) and (5.8). It is clear from the definition that $\widetilde{S} = \widetilde{G}(S)$ for a polynomial map $\widetilde{G} : K^{\mathcal{C}(A)} \to K^{\mathcal{C}(\widetilde{A})}$. Now let us apply Proposition 4.15 to the quiver with the arrow span $\widetilde{A}$. We see that there exists a polynomial function of the form $D^{d_1,\dots,d_N}_{c_1,\dots,c_N}$ on $K^{\mathcal{C}(\widetilde{A})}$ (see (4.14), where we have changed the notation for the arrows to avoid the notation conflict with Section 5) such that the reduced part $(\overline{A}, \overline{S})$ of a QP $(\widetilde{A}, \widetilde{S})$ is 2-acyclic whenever $\widetilde{S} \in U(D^{d_1,\dots,d_N}_{c_1,\dots,c_N})$. Furthermore, for $\widetilde{S} \in U(D^{d_1,\dots,d_N}_{c_1,\dots,c_N})$, the QP $(\overline{A}, \overline{S})$ is right-equivalent to $(A', H(\widetilde{S}))$ for some regular map $H : U(D^{d_1,\dots,d_N}_{c_1,\dots,c_N}) \to K^{\mathcal{C}(A')}$, where $A' = \mu_k(A)$. We now define a polynomial function $F : K^{\mathcal{C}(A)} \to K$ and a regular map $G : U(F) \to K^{\mathcal{C}(A')}$ by setting
(7.6) $F = D^{d_1,\dots,d_N}_{c_1,\dots,c_N} \circ \widetilde{G}, \qquad G = H \circ \widetilde{G}$.
To finish the argument for $\ell = 1$, it remains to show that $F$ is not identically equal to zero. But this is clear from the definitions (4.14) and (5.8), since the oriented 2-cycles in $\widetilde{A}$ (up to cyclical equivalence) are of the form $c[ba]$ and so are in a bijection with the oriented 3-cycles $cba$ in $A$ that pass through $k$.

Now assume that $\ell \geq 2$, and that our assertion holds if we replace $\ell$ by $\ell - 1$. Let $A_1 = \mu_{k_1}(A)$, so $A' = \mu_{k_\ell} \cdots \mu_{k_2}(A_1)$. By the inductive assumption, there exist a nonzero polynomial function $F' : K^{\mathcal{C}(A_1)} \to K$ and a regular map $G' : U(F') \to K^{\mathcal{C}(A')}$ such that, for any QP $(A_1, S_1)$ with $S_1 \in U(F')$, the QP $\mu_{k_\ell} \cdots \mu_{k_2}(A_1, S_1)$ is right-equivalent to $(A', G'(S_1))$. Also by the already established case $\ell = 1$, there exists a nonzero polynomial function $F'' : K^{\mathcal{C}(A_1)} \to K$ such that, for any QP $(A_1, S_1)$ with $S_1 \in U(F'')$, the QP $\mu_{k_1}(A_1, S_1)$ is 2-acyclic, hence is right-equivalent to some QP on $A$. Since the base field $K$ is assumed to be infinite, we have $U(F') \cap U(F'') \neq \emptyset$. Choose $S_1^{(0)} \in U(F') \cap U(F'')$, and let $(A, S_0) = \mu_k(A_1, S_1^{(0)})$, where $k = k_1$. By Theorem 5.7, we have $\mu_k(A, S_0) = (A_1, S_1^{(0)})$. By the above argument for $\ell = 1$, there exist a nonzero polynomial function $F_1 : K^{\mathcal{C}(A)} \to K$ and a regular map $G_1 : U(F_1) \to K^{\mathcal{C}(A_1)}$ (of the type (7.6)) such that $\mu_k(A, S) = (A_1, G_1(S))$ for $S \in U(F_1)$. In particular, we have $G_1(S_0) = S_1^{(0)}$, implying that $F' \circ G_1$ is a nonzero polynomial function on $K^{\mathcal{C}(A)}$. It follows that the nonzero polynomial function $F(S) = F_1(S)\,F'(G_1(S))$ and the regular map $G = G' \circ G_1 : U(F) \to K^{\mathcal{C}(A')}$ are well-defined and satisfy all the required conditions. This completes the proof of Proposition 7.3. □

Corollary 7.4. For every 2-acyclic arrow span $A$, there exists a countable family $\mathcal{F}$ of nonzero polynomial functions on $K^{\mathcal{C}(A)}$ such that the QP $(A, S)$ is nondegenerate whenever $S \in \bigcap_{F \in \mathcal{F}} U(F)$. In particular, if the base field $K$ is uncountable, then there exists a nondegenerate QP on $A$.

Proof. By Proposition 7.3, for every sequence k1, . . . 
, kℓ as in Definition 7.2, there exists a nonzero polynomial function $F_{k_1,\dots,k_\ell}$ on $K^{\mathcal{C}(A)}$ such that a QP $(A, S)$ is $(k_\ell, \dots, k_1)$-nondegenerate for $S \in U(F_{k_1,\dots,k_\ell})$. These functions form a desired countable family $\mathcal{F}$. It remains to prove that $\bigcap_{F \in \mathcal{F}} U(F) \neq \emptyset$ provided $K$ is uncountable. If $A$ is acyclic, i.e., has no oriented cycles, then $K^{\mathcal{C}(A)} = \{0\}$, and each of the functions in $\mathcal{F}$ is just a nonzero constant, so there is nothing to prove; no assumption on $K$ is needed here. If $A$ has at least one oriented cycle then the set $\mathcal{C}(A)$ is countable (recall that it consists of cyclic paths up to cyclical equivalence). Thus, we can identify the polynomial functions on $K^{\mathcal{C}(A)}$ with the elements of the polynomial ring $K[X_1, X_2, \dots]$ in countably many indeterminates. Since $K$ is uncountable, we can choose $x_1$ so that $F(x_1) \neq 0$ for all $F \in \mathcal{F} \cap K[X_1]$. Then we choose $x_2$ so that $F(x_1, x_2) \neq 0$ for all $F \in \mathcal{F} \cap K[X_1, X_2]$. Continuing like this, we find a sequence $x_1, x_2, \dots$ such that $F(x_1, x_2, \dots) \neq 0$ for all $F \in \mathcal{F}$. □

8. Rigid QPs

Proposition 8.1. Every rigid reduced QP $(A, S)$ is 2-acyclic.

Proof. First note that the definition of rigidity can be conveniently restated as follows: a QP $(A, S)$ is rigid if and only if
(8.1) every potential on $A$ is cyclically equivalent to an element of $J(S)$.
Now suppose for the sake of contradiction that for some $i \neq j$ both components $A_{i,j}$ and $A_{j,i}$ are non-zero. Choose non-zero elements $a \in A_{i,j}$ and $b \in A_{j,i}$. Remembering the definition of the Jacobian ideal (see Definition 3.1), it is easy to see that the cyclic part of $J(S)$ is contained in $\mathfrak{m}(A)^3$. It follows that $ab$ is not cyclically equivalent to an element of $J(S)$, in contradiction with (8.1). □

Combining Proposition 8.1 with Corollary 6.11, we obtain the following result.

Corollary 8.2. Any rigid QP is nondegenerate.

Let us now give some examples.

Example 8.3.
Recall that a skew-symmetric integer matrix $B$ is acyclic if the corresponding directed graph (with an arrow $i \to j$ associated with each entry $b_{i,j} > 0$) has no oriented cycles. If the matrix $B(A)$ given by (7.1) is acyclic, then $R\langle\langle A\rangle\rangle_{\mathrm{cyc}} = \{0\}$, and so the only QP associated with $A$ is $(A, 0)$, which is clearly rigid.

Now suppose that $A$ is 2-acyclic, and that $B(A)$ is not necessarily acyclic but is mutation equivalent to an acyclic matrix (i.e., can be transformed to an acyclic matrix by a sequence of mutations). As a consequence of Corollary 6.11 and Theorem 5.7, there exists a potential $S$ such that $(A, S)$ is a rigid reduced QP; moreover, $(A, S)$ is unique up to right-equivalences.

Example 8.4. For $A$ arbitrary, the deformation space of a QP $(A, 0)$ is naturally identified with the space of potentials modulo cyclical equivalence, hence it is infinite-dimensional provided $A$ has at least one oriented cycle.

Example 8.5 (Cyclic triangle). Let $Q$ be the quiver with three vertices 1, 2, 3 and three arrows $a : 1 \to 2$, $b : 2 \to 3$ and $c : 3 \to 1$. An arbitrary potential $S$ is cyclically equivalent to the one of the form $S = F(cba)$, where $F \in K[[t]]$ is a formal power series. The deformation space $\mathrm{Def}(A, S)$ is naturally isomorphic to the quotient space of $tK[[t]]$ modulo the ideal generated by $t\,dF/dt$. If $\mathrm{char}\,K = 0$, and $n \geq 1$ is the smallest exponent such that $t^n$ appears in $F$, then $\dim \mathrm{Def}(A, S) = n - 1$. In particular, $(A, S)$ is rigid if and only if $n = 1$.

Now let us consider the QP $(\widetilde{A}, \widetilde{S}) = \widetilde{\mu}_2(A, S)$; in view of (5.6), (5.7) and (5.8), $\widetilde{A}$ has four arrows $a^\star, b^\star, c, [ba]$, and $\widetilde{S} = F(c[ba]) + [ba]a^\star b^\star$. Thus, if $n \geq 2$ then $(\widetilde{A}, \widetilde{S})$ is reduced and so is equal to $\mu_2(A, S) = (\overline{A}, \overline{S})$. Since $\mu_2(A, S)$ has an oriented 2-cycle formed by the arrows $c$ and $[ba]$, the mutations at vertices 1 and 3 cannot be applied. We see that the QP $(A, F(cba))$ is degenerate for $n \geq 2$.

Example 8.6 (Double cyclic triangle).
Now consider the quiver with three vertices 1, 2, 3 and six arrows $a_1, a_2 : 1 \to 2$, $b_1, b_2 : 2 \to 3$ and $c_1, c_2 : 3 \to 1$. Any potential $S$ on $A$ is cyclically equivalent to the one whose degree 3 component belongs to the 8-dimensional space $A^3_{1,1} = A_{1,3}A_{3,2}A_{2,1}$. It is known that the diagonal action of the group $GL_2^3$ on $K^2 \otimes K^2 \otimes K^2$ has seven orbits, see e.g., [23, Chapter 14, Example 4.5]. Thus, by performing a change of arrows automorphism, we can assume that the degree 3 component of $S$ is one of the representatives of these orbits. An easy case-by-case analysis shows that no potential can give rise to a rigid QP on $A$. For instance, let
(8.2) $S = c_1b_1a_1 + c_2b_2a_2$.
Then $J(S)$ is the closure of the ideal in $R\langle\langle A\rangle\rangle$ generated by six elements $c_1b_1$, $b_1a_1$, $a_1c_1$, $c_2b_2$, $b_2a_2$, $a_2c_2$. One checks easily that the cyclic path $c_1b_2a_1c_2b_1a_2$ is not cyclically equivalent to an element of $J(S)$, hence $(A, S)$ is not rigid.

Now let us compute $\mu_2(A, S)$. Again setting $(\widetilde{A}, \widetilde{S}) = \widetilde{\mu}_2(A, S)$, we see that $\widetilde{A}$ has ten arrows $a_1^\star, a_2^\star, b_1^\star, b_2^\star, c_1, c_2, [b_1a_1], [b_1a_2], [b_2a_1], [b_2a_2]$, and
$$\widetilde{S} = c_1[b_1a_1] + c_2[b_2a_2] + \sum_{i,j=1}^{2} [b_ia_j]\,a_j^\star b_i^\star.$$
To obtain the splitting (4.5) of $(\widetilde{A}, \widetilde{S})$, we apply the automorphism $\varphi$ of $R\langle\langle \widetilde{A}\rangle\rangle$ fixing all arrows except $c_1$ and $c_2$, and such that $\varphi(c_i) = c_i - a_i^\star b_i^\star$. An easy check shows that $\mu_2(A, S) = (\overline{A}, \overline{S})$ can be described as follows: $\overline{A}$ is 6-dimensional with the arrows $a_1^\star, a_2^\star, b_1^\star, b_2^\star, [b_1a_2], [b_2a_1]$, and
$$\overline{S} = [b_1a_2]\,a_2^\star b_1^\star + [b_2a_1]\,a_1^\star b_2^\star.$$
Thus, the mutated QP $(\overline{A}, \overline{S})$ can be obtained from the initial QP $(A, S)$ by a renumbering of the vertices. This implies that one can apply to $(A, S)$ unlimited mutations at arbitrary vertices, so $(A, S)$ is a non-rigid, nondegenerate QP.

Example 8.7. For each $n \geq 0$, let us consider the following quiver $Q(n)$, which we refer to as the triangular grid of order $n$.
The vertex set of $Q(n)$ is $Q(n)_0 = \{(p, q, r) \in \mathbb{Z}^3_{\geq 0} \mid p + q + r = n\}$; and there is a single arrow $(p_1, q_1, r_1) \to (p_2, q_2, r_2)$ if and only if $(p_2, q_2, r_2) - (p_1, q_1, r_1)$ is one of the three vectors $(-1, 1, 0)$, $(0, -1, 1)$, $(1, 0, -1)$. Thus, the vertices of $Q(n)$ form a regular triangular grid with $n^2$ cyclically oriented unit triangles. For example, the quiver $Q(4)$ is (figure omitted)
<|startoftext|> Coherent macroscopic quantum tunneling in boson-fermion mixtures
D. Mozyrsky, I. Martin, and E. Timmermans
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545
(Dated: August 11, 2021)

We show that the cold atom systems of simultaneously trapped Bose-Einstein condensates (BECs) and quantum degenerate fermionic atoms provide promising laboratories for the study of macroscopic quantum tunneling. Our theoretical studies reveal that the spatial extent of a small trapped BEC immersed in a Fermi sea can tunnel and coherently oscillate between the values of the separated and mixed configurations (the phases of the phase separation transition of BEC-fermion systems). We evaluate the period, amplitude and dissipation rate for $^{23}$Na and $^{40}$K atoms and we discuss the experimental prospects for observing this phenomenon.

PACS numbers: 05.30.Jp, 03.75.Kk, 32.80.Pj, 67.90.+z

The tunneling of a macroscopic (or collective) variable of a many-body system through a classically forbidden region, macroscopic quantum tunneling (MQT), is a phenomenon of fundamental interest [1] and a recurring theme in a variety of fields ranging from nuclear (fission) and condensed matter physics (e.g. quantum magnets [2], SQUIDs) to quantum optics (macroscopic Schrödinger cat states [3] and beyond-standard limit measurements). Nevertheless, stringent tests under well-understood and controlled conditions remain an experimental challenge.
Cold atom gases, arguably the cleanest and best under- stood mesoscopic systems which, furthermore, offer un- precedented control knobs such as the ability to vary the inter-particle interactions [4], now provide an intriguing candidate laboratory for the study of MQT. The first cold atom MQT proposals [5] suggested ob- serving the collapse of a trapped dilute gas Bose-Einstein condensate (BEC) of mutually attracting bosons. How- ever, the experimental results [6, 7] were either too sensi- tive to particle number to distinguish MQT from classi- cal collapse [6], or the analysis was complicated by more complex dynamics (such as ’clumping’) [7]. Evidence of coherence (of the many-body system taking on a linear superposition of states that correspond to the macro- scopic variable residing on either side of the barrier) is even more difficult to gather. Such coherence would be more readily observable in the MQT between long-lived states, in which case one could set up a coherent pop- ulation oscillation between the many-body states. Such long-lived states naturally occur in (zero-temperature) first order phase transitions in which the order parame- ter, which provides the macroscopic variable, can tunnel through the barrier of its Landau-Ginzburg potential. In the infinite system limit, the coupling between the two states rigorously vanishes, but finite- size cold atom sys- tems of moderate particle numbers provide, once again, a promising candidate to observe the MQT coherence be- tween states of different phases, as we show below. An earlier proposal to observe MQT between states in which the components of a BEC-mixture arrange them- selves differently in space, involved a very low coupling on account of the small spatial overlap between the sin- gle component densities in the different states [8]. In this paper, we propose that MQT can be realized and its coherence, perhaps, observed in trapped gas mixtures of a single-component fermion system and a BEC. 
Such mixtures are currently created [9], e.g., in the sympathetic cooling scheme in which the colder BEC cools the fermions. The tunneling and coherent oscillations that we target would occur between states of the mixed and separated phases in the phase separation transition of the fermion-BEC mixture [10]. Such transitions could be accessed by varying the scattering length of the boson-fermion interaction [11]. We consider $N_B$ atomic bosons confined in a spherically symmetric harmonic trap (of frequency $\omega_T$) interacting with a much larger system of atomic fermions. For simplicity we assume the fermions to occupy an infinite volume. The Hamiltonian of the bosons is described by the standard Gross-Pitaevskii (GP) form [1], i.e., with inter-particle interactions described by a contact potential ($\propto \lambda_{BB}\delta(\mathbf{r} - \mathbf{r}')$), which we choose to be repulsive ($\lambda_{BB} > 0$). We assume that the interaction of bosons with fermions is also contact-like, contributing $\lambda_{BF}|\Psi_B|^2|\Psi_F|^2$ to the Hamiltonian density, where $\lambda_{BF}$ is the fermion-boson coupling constant. Furthermore, all fermions occupy the same spin state, so that the short-range inter-fermion interactions do not contribute by virtue of the Pauli exclusion principle. We are interested in the dynamics of the reduced system of bosons described by the functional $S = S_{\rm BEC} + {\rm Tr}\log\left[\hbar\partial_\tau - \frac{\hbar^2\nabla^2}{2m_F} - \mu_F + \lambda_{BF}|\Psi_B|^2\right]$ (1), where $S_{\rm BEC}$ is the action of the bosons alone, $S_{\rm BEC} = \int d\tau\,\bigl(\hbar\int d^3r\,\dot{\Psi}_B\Psi_B^{*} - H_{\rm BEC}\bigr)$, and the second term is a contribution due to the interaction of bosons with fermions; $\mu_F$ is the chemical potential of the fermions. Here and throughout the paper we use the imaginary time (Matsubara) representation, unless stated otherwise. An explicit evaluation of the second term is a challenging task. However, here we are interested in the dynamics of the slow breathing mode of the BEC, $\Psi^0_B$, which can be treated in the self-similar density approximation.
This dynamics describes the longitudinal expansions (and contractions) of the condensate. Finite size effects such as the appearance of a non-vanishing ex- citation energy (gap) can decouple this mode from other excitation modes. Hence, Ψ0B peaks at small frequen- cies (ω) and small wavevectors (q), giving a Ψ0B that is a slowly varying function of spacial and temporal coor- dinates. In such case the Tr log[...] in Eq. (1) can be evaluated within the Thomas-Fermi approximation. A straightforward zero-temperature calculation yields δTr log[...] λBF |ΨB(r)| ΨB, (2) where kF is the Fermi wavevector. Eq. (2) represents an additional term in the Gross-Pitaevskii (GP) equation, δSGP /δΨ B = 0, resulting from interaction with fermions. In order to analyze the physical meaning of Eq. (2) let us expand it in powers of ΨB. The first nontrivial con- tribution is a term −2λ′|ΨB| 2ΨB, λ ′ = (λ2BF k F /4π 2µF ), which corresponds to the attraction between bosons me- diated by interaction with fermions. For nonzero, but small ω and q there is an additional term (of the or- der of λ2BF ) related to the dissipation of the condensate due to the Landau damping, as we discuss below. The next order yields η|ΨB| 4ΨB, η = (k BF /8π 2µ2F ). Un- like the previous term this one is positive, and represents reduction in the effective boson-boson attraction due to depletion of fermions in the regions of high density of the bosons. The next order terms (in λBF ) prove to be unimportant as can be verified directly from Eq. (2). Therefore we will replace the potential energy contribu- tion in GP equation given by Eq. (2) by the two terms discussed above [12]. To analyze the dynamics of the slow (breathing) mode described by the Hamiltonian |∇ΨB| 2 (3) (λBB − λ ′)|ΨB| we apply the time dependent variational principle. Since we are interested in ground state properties of Eq. 
(5) we use a spherically symmetric Gaussian trial wavefunction Ψ0B(r) = 4 (xR0) 2(xR0)2 , (4) parameterized by a dimensionless parameter x that char- acterizes the BEC’s spatial width in units of the zero point motion amplitude R0 = (~/2mBωT ) 1/2. Substitu- tion of this wavefunction into Eq. (5) yields the following dependence of the ground-state energy E0 on x: E0(x) = 3NB~ωT , (5) where α = NB(λ ′ − λBB)/[(2π) 3/2R30~ωT ] and β = 4N2Bη/(3 5/2π3R60~ωT ). For positive but relatively small FIG. 1: Density profiles of the bosons (solid lines 1 and 2) and corresponding density profiles of the fermions (dashed lines 1’ and 2’) in two metastable states: (1) with fermions having zero density at the center of the trap (“separated phase”) and (2) with nonzero density of fermions (“mixed phase”). The dotted line represents schematically an effective potential for the breathing mode of the bosons. α, i.e., for α < αcr = 32(2/5) 1/4/15 ≃ 1.69, E0 may de- velop two competing minima, depending on the value of the β-parameter. The energy barrier separating the min- ima is caused by the same effect as the barrier appearing in the description of a BEC with attractive interactions: it arises due to the competition between the kinetic and the interaction energies, i.e, the first and the third terms in the right-hand side (rhs) of Eq. (5). In the absence of the last term in the rhs of Eq. (5) the state in this well would have been metastable - the energy would tend to −∞ at x → 0. The 1/x6 term stabilizes the system: for small x this term rapidly increases, giving rise to another minimum of E0(x), now due to the competition between the last two terms in the rhs of Eq. (5). At certain values of α and β the two minima of E0(x) will have the same energy, and the ground state of the system becomes degenerate. Since our system is finite, this degeneracy will be lifted by the quantum tunnel- ing transition between the two states. 
Such mechanism has been suggested to be the dominant decay process for condensates with attractive interactions between parti- cles [5]. The tunneling corresponds to the low energy ex- citations of the breathing mode, described by the wave- function in Eq. (4). It has been shown in [5] that by accounting for the superfluid motion of the condensate (which can be done by introducing a phase-factor eiφ for the wavefunction in Eq. (4) and requiring the superfluid velocity vs = (~/mB)∇φ to satisfy the continuity equa- tion) one obtains an effective action for the breathing mode of the condensate S0[x(τ)] = + E0(x) , (6) where E0(x) is given by Eq. (5) and m0 = 3mBNBR Thus the dynamics of the ground state wavefunction of the condensate is that of a quantum particle of mass m0 moving in the potential E0(x). A direct analysis of the Shrodinger equation corre- sponding to Eq. (6), however, is quite cumbersome since the two wells are generally quite asymmetric. Instead we choose an alterative route: we compute the ground state energy and obtain the tunneling rate by numerically solving the time-independent GP equation, δH/δΨB = EΨB, where H is given by Eq. (3). The latter approach also serves as an independent justification of the varia- tional method and confirms that macroscopic quantum tunneling, QMT, is the mechanism that causes the tran- sition between the two states of the condensate. Upon substitution ΨB ∼ φ/r, the time-independent GP equa- tion can be cast in the form φ = µφ, (7) where the φ(x)-function is normalized to unity, a = (π/2)1/2α, b = 35/2πβ/16, and x = r/R0, µ = E/~ωT . We find the ground state numerically by replacing the rhs of Eq. (7) by −∂τφ and propagating φ in the imag- inary time τ until it converges to the ground state φ0 (or Ψ0B). We then evaluate the ground state energy ac- cording to Eq. (3) and present the results in Fig. 2(a) as a function of the b-parameter for different values of a. Fig. 
2(b) shows the dispersion of the ground state width, $(1/N_B)\int d^3r\,|\Psi^0_B(r)|^2$, as a function of those same parameters. For $a < a_{cr} = 1.83$ the ground state energy and dispersion undergo a sharp crossover between the states with compressed and expanded BEC wavefunctions (corresponding to the phase separated and mixed states) as functions of b. Note that the value $a_{cr}$ corresponds to $\alpha_{cr} = 1.46$, which is quite close to the critical value of 1.69 obtained above from the variational approach. The dependence of the ground state energy near the critical value of a is shown in the inset of Fig. 2(a). Clearly the ground state energy exhibits an avoided level crossing, which is in accordance with the above conjecture (e.g., Eq. (6)) of macroscopic quantum tunneling between the two local energy minima. The value of the tunneling matrix element ∆ between the two local "ground" states with energies $\varepsilon_1$ and $\varepsilon_2$ can be deduced by fitting the calculated energy curves in Fig. 2(a) with the standard expression [13], $\varepsilon = (\varepsilon_1 + \varepsilon_2)/2 - [(\varepsilon_1 - \varepsilon_2)^2/4 + \Delta^2]^{1/2}$, and assuming that in the vicinity of the point of crossover both $\varepsilon_1$ and $\varepsilon_2$ are linear functions of the parameter b. For a = 1.81 one finds $\Delta \sim 10^{-4} \times \hbar\omega_T$, while for a = 1.82, $\Delta \sim 10^{-2} \times \hbar\omega_T$. Assuming that the ground state wavefunctions have a Gaussian shape, $\sim R_{1,2}^{-3}\exp(-r^2/R_{1,2}^2)$, from Fig. 2 one finds that $\bar{R} = (R_1 + R_2)/2 \simeq 0.85R_0$ for both a = 1.81 and a = 1.82, and $\delta R = |R_1 - R_2| \simeq 0.09R_0$ for a = 1.81 and $\delta R \simeq 0.03R_0$ for a = 1.82. For a typical value of the trapping frequency $\nu_T = 10^2$ Hz ($\omega_T = 2\pi\nu_T$), the two tunneling rates are $\Delta_{1.81}/\hbar = 10^{-2}$ s$^{-1}$ and $\Delta_{1.82}/\hbar = 10^{2}$ s$^{-1}$. FIG. 2: (a) Dependence of the ground state energy of the BEC (per particle, in units of $\hbar\omega_T$) as a function of parameters a and b; (b) Dispersion of the ground state spatial extent as a function of the same parameters. Since the value of $R_0$ for most trapped atomic BECs is of the order of a few microns, the difference between the radii of the two condensate states δR is submicron.
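The two-level fit described above can be illustrated with a minimal sketch (not the authors' fitting code). It evaluates the lower and upper branches of an avoided crossing for a 2×2 matrix with diagonal energies ε1, ε2 and coupling ∆, and recovers ∆ as half of the minimum splitting along a scan in b. The linear dependence of ε1 and ε2 on b follows the text; the slopes, the crossing point `b0`, and the function names are illustrative assumptions.

```python
import math

def branches(eps1, eps2, delta):
    """Return (lower, upper) eigenvalues of [[eps1, delta], [delta, eps2]]."""
    mean = 0.5 * (eps1 + eps2)
    half_gap = math.sqrt(0.25 * (eps1 - eps2) ** 2 + delta ** 2)
    return mean - half_gap, mean + half_gap

def estimate_delta(b_values, eps1_of_b, eps2_of_b, delta):
    """Recover delta as half of the minimum branch splitting along a scan in b."""
    gaps = []
    for b in b_values:
        lower, upper = branches(eps1_of_b(b), eps2_of_b(b), delta)
        gaps.append(upper - lower)
    return 0.5 * min(gaps)

# Hypothetical linear levels crossing at b0 (slopes and units are illustrative).
b0 = 1.0
scan = [b0 + 1e-4 * k for k in range(-1000, 1001)]
delta_est = estimate_delta(
    scan,
    lambda b: 0.5 * (b - b0),    # assumed slope of the first level
    lambda b: -0.3 * (b - b0),   # assumed slope of the second level
    1e-2,
)
```

In a real fit, the computed ground state energies would replace the lower branch, and ∆ together with the two slopes would be the fitting parameters.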
Such small variation may be difficult to observe in situ by optical means. However, the expansion process that takes place in time-of-flight measurements after the trap potential is shut off and the expanding atoms are observed, has successfully magnified small distance features in other experiments. Role of dissipation: The above analysis determines the tunneling rate, but does not address the question whether the tunneling process is quantum coherent. Will the probability of the system to occupy one of the two macroscopic states oscillate in time as cos2 (∆ t/~)? The fermions not only provide the BEC with the effective in- teraction, they also cause fluctuations which can destroy the macroscopic quantum coherence. To evaluate the effect of fluctuations, it is sufficient to consider the first non-vanishing frequency-dependent contribution into the effective action of the bosons coming from the perturba- tive expansion of the Tr log [...] term in Eq. (1): (2π)3 χ0(q, ω)|ρB(ω,q)| 2. (8) Here ρB(q, ω) is the Fourier transform of ρB(r, t) and χ0 is the response function of the non-interacting fermions. In the small frequency domain χ0 = (1/4π)[~2k3F /(πµF ) + m F |ω|/(~ 2q)]. The frequency- independent part of χ0 has already been incorporated in the effective interaction between bosons, i.e., λ′|ΨB| term in Eq. (3). The second term in χ0 is responsible for damping. To quantify its role we employ a two- state approximation in describing the tunneling dynam- ics. In this representation the tunneling is described by the Hamiltonian Htun = ∆σ̂x, where σ̂x is a Pauli ma- trix with non-zero off-diagonal elements, and the posi- tion operator, i.e. the spatial width of the ground-state BEC wavefunction, is given by R̂ = R̄+(δR/2)σ̂z, where σ̂z is the diagonal Pauli matrix (with ±1 along the di- agonal). The dissipative part of the action for Htun can be derived from Eq. 
(8) by substituting a Gaus- sian ansatz, ρB(r, t) = NB/[π 3/2R3(t)] exp [−r2/R2(t)], where R(t) = R̄ + (δR/2)σz(t), σz = ±1, into Eq. (8). For δR ≪ R̄ one obtains Sdiss = γ~ dτdτ ′σz(τ)σz(τ ′)(τ − τ ′)−2, (9) where γ = N2Bλ 2/[2(2π~)4R̄4]. Eq. (9), to- gether with Htun defined above, describes dissipative dy- namics of a two-state system. Such dynamics has been extensively studied in connection with macroscopic quan- tum tunneling of a superconducting phase in Josephson junctions, and is known to depend critically on the value the parameter γ. Specifically, for γ > 1 the two-state os- cillation is always overdamped and at zero temperature it exhibits localization as a result of quantum fluctua- tions [14]. It is therefore instructive to evaluate γ for our situation. For estimates we consider an atomic mixture of 23Na (bosons) and 40K (fermions), which have natural scattering lengthes aBB ≃ 1nm (λBB = 4π~ 2aBB/mB) and aBF ≃ 4nm (λBF = 2π~ 2aBF [(1/mB) + (1/mF )]). For these data we obtain a critical value of N crB ≃ 12400 (again for νT = 10 2Hz) and the fermion density ncrF ≃ 7.4×1015cm−3. Then, for a = 1.81 we obtain γ1.81 ≃ 1.1, which corresponds to the localized case (at T = 0), whereas for a = 1.82 one gets γ1.82 ≃ 0.1. In the high temperature limit (for kBT > ∆) the relaxation rate Γ can be expressed in terms of γ as ~Γ = πγkBT [14], and therefore coherent (underdamped) oscillations can be observed for T ≪ ∆1.82/(γ1.82kB) = 0.5nK. The situation can be improved, however, if one utilizes a Fes- hbach resonance [4] to increase the aBF scattering length. For example, for aBF = 80nm one finds N B ≃ 25 and ncrF ≃ 2.6 × 10 11cm−3, and γ1.82 ≃ 2.5 × 10 −4. For such parameters coherent oscillations can be observed for T ≪ 0.2µK, which is easily observable. A low particle number also reduces the uncertainty of an atomic count- ing measurement that can be carried out in the time-of- flight procedure [15]. 
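The temperature scale quoted above can be checked with a short sketch, assuming only the relations stated in the text: the coherence condition $T \ll \Delta/(\gamma k_B)$ and the high-temperature relaxation rate $\hbar\Gamma = \pi\gamma k_B T$. The function names are hypothetical; the inputs ($\Delta = 10^{-2}\hbar\omega_T$, $\nu_T = 10^2$ Hz, $\gamma = 0.1$) are the values given for a = 1.82.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J / K

def coherence_temperature(delta_over_hbar_omega, nu_trap_hz, gamma):
    """Temperature scale Delta / (gamma * k_B), in kelvin."""
    omega = 2.0 * math.pi * nu_trap_hz
    delta = delta_over_hbar_omega * HBAR * omega   # tunneling matrix element, J
    return delta / (gamma * K_B)

def relaxation_rate(gamma, temperature_k):
    """High-temperature relaxation rate Gamma = pi * gamma * k_B * T / hbar, in 1/s."""
    return math.pi * gamma * K_B * temperature_k / HBAR

# Values quoted in the text for a = 1.82.
t_scale = coherence_temperature(1e-2, 100.0, 0.1)
```

With these inputs `t_scale` comes out at about half a nanokelvin, in line with the 0.5 nK estimate in the text.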
In summary, we argue that a trapped boson-fermion mixture can exhibit MQT and coherent oscillations. Our studies indicate that MQT can be observed in 23Na and 40K atomic mixtures at sufficiently low temperatures. We thank M. Boshier and S. A. Gurvitz for valuable discussions. The work is supported by the US DOE. [1] A. J. Leggett et al., Rev. Mod. Phys. 59, 1 (1987). [2] L. Gunther and B. Barbara, Eds., Quantum Tunneling of Magnetization - QTM'94 (Kluwer, Dordrecht, Netherlands, 1995). [3] J. I. Cirac et al., Phys. Rev. A 57, 1208 (1998). [4] E. Timmermans et al., Phys. Rep. 315, 199 (1999). [5] H. T. C. Stoof, J. Stat. Phys. 87, 1353 (1997); M. Ueda and A. J. Leggett, Phys. Rev. Lett. 80, 1576 (1998); J. A. Freire and D. P. Arovas, Phys. Rev. A 59, 1461 (1999); C. Huepe, S. Metes, G. Dewel, P. Borckmans, and M. E. Brachet, Phys. Rev. Lett. 82, 1616 (1999). [6] C. A. Sackett et al., Phys. Rev. Lett. 82, 876 (1999); J. M. Gerton et al., Nature 408, 692 (2000). [7] E. A. Donley et al., Nature 412, 295 (2001); J. L. Roberts et al., Phys. Rev. Lett. 86, 4211 (2001). [8] K. Kasamatsu et al., Phys. Rev. A 64, 053605 (2001). [9] A. G. Truscott et al., Science 291, 2570 (2001); F. Schreck et al., Phys. Rev. Lett. 87, 080403 (2001); G. Modugno et al., Science 297, 2240 (2002); M. W. Zwierlein et al., Phys. Rev. Lett. 92, 120403 (2004); T. Bourdel et al., Phys. Rev. Lett. 93, 050401 (2004); Stan et al., Phys. Rev. Lett. 93, 143001 (2004). [10] The spatial arrangements in the fermion-BEC mixtures were first discussed in K. Molmer, Phys. Rev. Lett. 80, 1804 (1998), and the infinite system phase separation transition was described in L. Viverit et al., Phys. Rev. A 61, 053605 (2000). [11] A. Simoni et al., Phys. Rev. Lett. 90, 163202 (2003). [12] A quantitative analysis of the GP equation for the potential given by Eq. (2) will be presented elsewhere. [13] L. D. Landau and E. M. Lifshitz, Quantum Mechanics (Pergamon Press, 1965). [14] A. J.
Leggett et al., Rev. Mod. Phys. 59, 1 (1987). [15] Low particle numbers can be measured very accurately with resonance fluorescence, for instance, see D. Frese et al., Phys. Rev. Lett. 85, 3777 (2000); recent work also illustrated sub-Poissonian counting (∆N < N) for larger numbers, T. Campey et al., Phys. Rev. A 74, 043612 (2006). <|endoftext|><|startoftext|> Efficiency of thin film photocells D. Mozyrsky and I. Martin Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA (Dated: Printed November 4, 2018) We propose a new concept for the design of high-efficiency photocells based on ultra-thin (submicron) semiconductor films of controlled thickness. Using a microscopic model of a thin dielectric layer interacting with incident electromagnetic radiation, we evaluate the efficiency of conversion of solar radiation into electric power. We determine the optimal range of parameters that maximizes the efficiency of such a photovoltaic element. Improving the efficiency of semiconductor photovoltaic elements (solar cells) has been an important technological challenge for several decades. The maximum possible efficiency is obtained when every incident photon generates an electron-hole pair, which then separates into an electron flowing to the cathode and a hole flowing to the anode [1].
The limitations that reduce the efficiency of practical solar cells relative to the ideal are 1) light reflection at the interfaces, 2) incomplete absorption of light entering the device due to finite thickness, 3) electron-hole relaxation inside the absorbing medium during diffusion to the leads [2, 3]. The interplay between the latter two mechanisms leads to the existence of an optimal device thickness, typically a few optical wavelengths. Here we show that the interface reflection, commonly considered a completely independent loss mechanism, shows an interesting interplay with absorption in ultra-thin film devices. This opens a possibility for a new generation of ultra-thin (sub-wavelength) photovoltaic elements with efficiencies rivaling the best conventional devices. A “working body” of a solar cell is typically a semiconductor with a relatively high absorption index at frequencies corresponding to those of the solar quanta, h̄ωsun ∼ kBTsun, Tsun ≃ 6000 K. Such semiconductors, however, possess a rather high refraction index n at these frequencies. As a consequence, a fraction $(n - 1)^2/(n + 1)^2$ of the incident light is reflected from the surface of a thick device. To reduce this loss, anti-reflective coatings are often applied to the surface of the device. On the other hand, for sub-wavelength thin films, the reflection can be significantly smaller (for a reason similar to why even metallic films are transparent when thin enough). Thus, by reducing the film thickness one should reach an optimum where reflection is reduced but the absorption is still significant. Also, in such thin devices the carrier recombination is naturally reduced. Electron-hole recombination, which prevents efficient charge separation in the photocell, is a major limiting factor in device operation. There are numerous mechanisms that lead to charge relaxation in the bulk of a semiconductor.
These mechanisms include spontaneous emission as well as phonon- or impurity-induced relaxation. While it is difficult to control these processes in the bulk of a semiconductor, it is clear that their contribution can be significantly reduced if the diffusion length of electrons and holes is large compared to the width of the semiconducting layer. FIG. 1: Insets (a) and (b): Schematics of the device. Insets (c) and (d): Band structure of the device without and with external load. The diffusion length for most semiconductors used in photocells is of the order of a few microns. Since this distance is comparable to a typical wavelength of sunlight, one could expect that the specific absorption (the ratio of the absorbed power to the incident power of radiation) in such a thin semiconducting layer is insufficient for any practical use. In order to see whether this is the case, it is instructive to look at the absorption of radiation in a layer of thickness d, e.g., Fig. 1(a,b). For simplicity we assume that the radiation is incident perpendicular to the surface of the layer and is monochromatic with wavelength λ. The layer has a dielectric constant whose real and imaginary parts are $\epsilon_{\rm Re}$ and $\epsilon_{\rm Im}$, respectively. The specific absorption of the dielectric layer can be easily evaluated by solving the wave equation $(n^2/c^2)\ddot{A} = \partial_z^2 A$ (A is the vector potential of the radiation field, $n = \epsilon^{1/2}$ is the refraction index, and c is the speed of light) in the three regions, e.g., Fig. 1(b), and taking into account the continuity conditions at the boundaries of the dielectric layer, A1 = A2, A2 = A3, ∂zA1 = ∂zA2, etc.
After straightforward algebra one finds that the specific absorption is $P_{abs} = 1 - |t|^2 - |r|^2$, where the amplitudes of the transmitted and reflected waves are

$t = \frac{4n\exp(id/\lambda)}{(n+1)^2\exp(-ind/\lambda) - (n-1)^2\exp(ind/\lambda)}$, (1a)

$r = \frac{(n^2-1)\,[\exp(ind/\lambda) - \exp(-ind/\lambda)]}{(n+1)^2\exp(-ind/\lambda) - (n-1)^2\exp(ind/\lambda)}$. (1b)

The specific absorption evaluated according to the above equations is presented in Fig. 2 for a GaAs slab as a function of its thickness d. GaAs has a relatively narrow (∼ 1.4 eV) bandgap and therefore is widely used in high-efficiency photocells. Since the dielectric function of GaAs is strongly frequency dependent [4], in Fig. 2 we plot the specific absorption for several energies typical of the quanta of solar radiation. The 2 eV curve corresponds to a relatively low imaginary part of the dielectric constant and thus saturates slowly, exhibiting several oscillations due to the interference between reflected and transmitted components. The 3 eV and 4 eV curves correspond to much higher absorption (for example $\epsilon^{\rm GaAs}_{\rm Im}(3\,{\rm eV}) \simeq 17$) and saturate much faster. Prior to saturation both curves exhibit a peak (again due to the interference) at roughly $d \simeq \lambda/|\epsilon|$. Remarkably, the value of the specific absorption at the peak (∼ 0.42 at $d \simeq \lambda/|\epsilon| \simeq 20$ nm for the 3 eV curve) is nearly the same as its saturation value (0.51 at d ≫ λ). Thus we conclude that solar radiation can be absorbed by a semiconducting layer of submicron thickness almost as efficiently as by an infinitely thick slab. In this paper, following the simple considerations above, we propose a new concept for the design of photovoltaic elements based on thin semiconductor films of controlled thickness. To put our arguments on a more rigorous footing, in the following we consider a detailed microscopic model of a dielectric layer interacting with solar radiation.
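The slab absorption of Eqs. (1a,b) discussed above can be explored numerically with the following minimal sketch. It is not the authors' code: the function names are hypothetical, the thickness is entered in units of the length λ appearing in Eqs. (1a,b) (assumed here to be the reduced wavelength c/ω), and the unit-modulus vacuum phase factor of t is dropped since only |t|² enters. Numerator and denominator are multiplied by exp(ind/λ) so that thick absorbing slabs do not overflow.

```python
import cmath

def slab_coefficients(eps, d_over_lambda):
    """Transmission and reflection amplitudes of a slab at normal incidence.

    eps is the (complex) dielectric constant; d_over_lambda is the thickness
    in units of the length lambda of Eqs. (1a,b) (assumed: c / omega).
    """
    n = cmath.sqrt(eps)                        # principal root, Im n >= 0 if Im eps >= 0
    phase = cmath.exp(1j * n * d_over_lambda)  # decays inside an absorbing slab
    denom = (n + 1) ** 2 - (n - 1) ** 2 * phase ** 2
    t = 4 * n * phase / denom                  # overall vacuum phase omitted
    r = (n * n - 1) * (phase ** 2 - 1) / denom
    return t, r

def specific_absorption(eps, d_over_lambda):
    """P_abs = 1 - |t|^2 - |r|^2."""
    t, r = slab_coefficients(eps, d_over_lambda)
    return 1.0 - abs(t) ** 2 - abs(r) ** 2

def thick_slab_limit(eps):
    """Absorption of a very thick absorbing slab: only the front reflection is lost."""
    n = cmath.sqrt(eps)
    return 1.0 - abs((n - 1) / (n + 1)) ** 2
```

Scanning `specific_absorption` over `d_over_lambda` for a strongly absorbing `eps` is one way to look for the interference peak described in the text; the dielectric constants to use at each photon energy have to come from tabulated data such as Ref. [4].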
Effective Hamiltonian can be derived from standard quantum-mechanical interaction between matter and radiation, (e/mc)A(r)·p + (e2/2mc2)A2(r), where here and in the following we assume Coulomb gauge for the electromagnetic field. The radiation in- duces transitions between valence and conduction band of the semiconductor. We assume that the temperature of the semiconductor is 0, and therefore these are the only possible transitions in the system (the valence band is completely full and the conduction band is empty). De- noting the Bloch states for the valence(conduction) bands (r) = exp (−ikr)uv(c) (r), H0ψ (H0 is a Hamiltonian of the crystal in the absence of coupling to radiation field) one can rewrite the radiation- matter interaction Hamiltonian in terms of single particle states in the semiconductor as Hint = 〈uc0|pα|uv0α〉 c dk−q⊥αA +H.c. . (2) We make the following assumptions: (1) While the electromagnetic field does not significantly vary with distance inside the film, the electronic wave-functions are effectively 3-dimensional - we assume that λsun ∼ 2πh̄c/(kBTsun) ≫ d ≫ h̄/(m∗Eg)1/2, where d is the thickness of the film, m∗ is exciton effective mass, (m∗)−1 = m−1v +m c , mv(c) are effective masses in va- lence and conduction bands, and Eg is the band-gap (in this paper we assume that bands have extremuma at zero momentum). This assumption allows one to carry out an analytic calculation with rather simple and transparent results. We will discuss the validity of this approxima- tion at the end of the paper; (2) The bands have different symmetry, say s and p, so index α denotes angular mo- mentum of an electron in p band. Since wave-vectors of the incident radiation are nearly perpendicular to the surface of the film and the film is assumed infinite in x − y dimension, only in-plane components of the an- gular momentum (α = x, y) are relevant in Eq. (2); (3) The coupling in Eq. 
(2) is isotropic and the coupling con- stant tα = 〈uc0|epα/(mc)|uv0α〉 ≃ Egp/(cS1/2), where p is the effective dipole moment per unit cell of the film and S is the surface area of the film. (4) Due to external electric load the bands have effectively different chem- ical potentials, µn and µp, e.g., Figs. 1(c) and 1(d). That is, once electron is promoted from valence to con- duction band, it immediately “rolls over” to the left lead, which corresponds to the infinite transition rate between the semiconductor and the metallic lead (an infinitely thin Shotkey barrier). Clearly, were the rate comparable or slower than the electron-hole relaxation rate, the effi- ciency of the cell would have decreased. (5) The p − n junction is prepared (doped) as shown in Fig. 1(c) - the top of the valence band in the p-doped area of the junc- tion lies just below the bottom of the of the conduction band in the n-doped area. Therefore the maximum volt- age the photo-cell can sustain is equal to Eg/e, which corresponds to the assumption of maximum efficiency of Shockley and Quasser[1], i.e., each electronic transition from valence to conduction band generates energy Eg in the circuit. The photocurrent is defined as the rate of the charge transfer between the valence and conduction band, Îph = [Hint, ck]. To the lowest non-vanishing order it can be expressed as Iph = (Eck − Evk−q⊥α, z = z ′ = 0) . (3) The photocurrent of Eq. (3) is independent of the voltage across the cell as far as it does not exceed the bandgap of the semiconductor, e.g., Fig. 1(d). For larger bias, within our assumptions, a reverse current begins to flow. In Eq. (3) D<αβq⊥(ω, z, z dtd2r⊥ exp i(ωt+ r⊥ · q⊥) 〈TKA−α (t, r⊥, z)A+β (0,0, z′)〉 is the “lesser” Green’s function of the electromagnetic field defined along the Keldysh contour, where super- scripts ± denote forward and return branches of the contour [5]. The Green’s function is inhomogeneous 0 0.5 1 1.5 4 eV 3 eV 2 eV FIG. 
2: Specific absorption of GaAs slab as a function of its thickness. Different curves correspond to different frequencies (energies) of incident radiation. along z-direction, i.e., perpendicular to the film surface. In order to incorporate effects of absorption and reflec- tion from the film, it is necessary to include renormaliza- tion of the photon Green’s function due to interactions with the film according to Eq. (2). These effects can be treated by means of Dyson equation, which, for the case of two-dimensional film reads D̂αβ(ω,q⊥, z, z ′) = D̂0αβ(ω,q⊥, z − z′) +D̂0αγ(ω,q⊥, z)Σ̂γδ(ω,q⊥)D̂δβ(ω,q⊥, 0, z ′) , (4) where hats denote the standard 2 × 2 matrix struc- ture of the non-equilibrium Green’s functions. The self-energy Σ̂γδ in Eq. (4) is quasi-two-dimensional. Since l ≫ h̄/(m∗Eg)1/2, effects related to the finite width of the semiconducting slab can be neglected and Σ̂γδ(ω,q⊥, 0) = Σ̂γδ(ω,q), where Σ̂γδ(ω,q) is the self-energy defined for the bulk of the semiconductor. Moreover, due to x− y symmetry Σ̂γδ reduces to a δγδΣ̂, where Σ̂ depends only on |q⊥|. Therefore we obtain a closed form equation for D̂αβq⊥(ω, z = z ′ = 0): D̂αβ(0, 0) = D̂0αβ(0) + D̂0αγ(0)Σ̂D̂γβ(0, 0) , (5) where components of the bare Green’s function D̂0αβ for solar radiation are: 0αβ (ω,q⊥, 0) = 4π(δαβ − qαqβ/q2) ω2 − ω2q ± iδ , (6a) D<0αβ(ω,q⊥, 0) = (δαβ − qαqβ/q2) δ(ω − ωq)ñq − δ(ω + ωq)(1 + ñ−q) . (6b) In Eq. (6) ωq = h̄c|q| and the ñq is the distribution function of solar radiation. We assume that the incident radiation wavevectors are uniformly distributed within a cone with an opening angle 2φ (φ ≪ 1). Moreover, in order for the incident power to be maximum, we assume that the surface of the cell is perpendicular to the cone’s axis. Then ñq = n q θ(qz)θ(φqz −|q⊥|), where nBq is Bose distribution function with temperature Tsun. Furthermore due to the homogeneity of the Green’s function D̂αβq⊥ in x− y plane one can seek for solution of Eq. 
(5) in the form D̂αβ = D̂1δαβ + D̂2 qαqβ/q 2, where D̂1(2) are 2×2 matrices in the Keldysh space, but depend only on the absolute value of the wavevector q. Substi- tution of this ansatz into Eq. (5) yields two independent equations for D̂1 and for D̂3 = D̂1 + D̂2. After solving those equations one finds (DR01(3)) 01(3) (DA01(3)) −1 +Σ< 01(3) )−1 − ΣR][(DA 01(3) )−1 − ΣA] , (7) where D̂01 is the diagonal part of D̂0αβ , e.g., Eqs. (6), and D̂03 = D̂01 + D̂02, where D̂02 is the transverse part of D̂0αβ . Σ R(A) and Σ< are retarded(advanced) and “lesser” parts of photon self-energy. Also we find a stan- dard expression for the retarded(advanced) Green’s func- tions of the radiation: 01(3) )−1 − ΣR(A) . (8) We can now evaluate the photo-current in Eq. (3) in terms of the Green’s function of Eq. (7). The first contri- bution is due to absorption of incident radiation accom- panied by transfer of electrons from valence to conduction band. It comes from the first term in the numerator in the RHS of Eq. (7). This term can also lead to the reverse current due to spontaneous and stimulated emission, i.e., transitions from conduction band to the valence band ac- companied by creation of a real photon. It arises due to the δ(ω+ωq) term in Eq. (6b). This process is, however, not allowed while the energy gap Eg exceeds the applied voltage µn − µp, e.g., Fig. 1(c,d). In this situation the cell becomes a light-emitting diode, and, as was stated above, we are not interested in such case in this paper. The second contribution comes from the Σ< term in the RHS of Eq. (7). It corresponds to the emission of vir- tual quanta of radiation by one electron-hole pair and their subsequent re-absorbtion by another pair, resulting in an incoherent simultaneous transfer of two electrons from conduction to valence band. This process gives a reverse contribution to the current which, once again, is non-zero only in the “light-emitting” regime (or at high temperature). The self-energies in Eqs. 
(7,8) can be evaluated in terms of the electronic Green's functions. The leading contribution comes from the conventional polarization diagram, i.e., a convolution of two electronic Green's functions. For a non-equilibrium situation one obtains

$$\Sigma^{R}(\omega,0) = G^{K}_{v\mathbf{k}}(\omega+\omega')\,G^{A}_{c\mathbf{k}}(\omega') + G^{R}_{v\mathbf{k}}(\omega+\omega')\,G^{K}_{c\mathbf{k}}(\omega') + (v \leftrightarrow c) , \qquad (9)$$

and $\Sigma^{A} = (\Sigma^{R})^{*}$. In Eq. (9), $G^{R(A)}$ are the retarded (advanced) Green's functions of valence (conduction) electrons and $G^{K}$ is the Keldysh Green's function. Note that in Eq. (9) we evaluated the self-energy at zero wavevector, since photon wavevectors are small compared to those of electrons, and therefore the self-energy $\hat{\Sigma}$ depends only weakly on q for direct bandgap materials.

The self-energies $\Sigma^{R(A)}$ can be easily evaluated for non-interacting electrons. Then $G^{R(A)}_{v(c)\mathbf{k}}(\omega) = (\omega - E_{v(c)\mathbf{k}} \pm i\delta)^{-1}$ and $G^{K}_{v(c)\mathbf{k}}(\omega) = (1 - 2n^{F}_{n(p)\mathbf{k}})\,\delta(\omega - E_{v(c)\mathbf{k}})$, where $n^{F}_{n(p)\mathbf{k}}$ are Fermi filling factors. The valence and conduction electrons are assumed to have chemical potentials corresponding to those of the two leads, $\mu_n$ and $\mu_p$ respectively. For these Green's functions the imaginary part of the self-energy yields:

Im (ω, 0) = ± Θ(ω − Eg) ω − Eg , (10)

where we have introduced a dimensionless parameter a = p2dE ∗)3/2/(21/2h̄4c). A similar calculation for the real part of $\Sigma^{R(A)}$ yields an ultraviolet divergence. This, however, is an artifact of our approximation of an infinitely thin absorbing layer (a similar problem occurs in the quantum electrodynamics treatment of the electron-photon interaction). In a proper microscopic theory this divergence in the real part of the self-energy is exactly cancelled by the $e^2A^2(\mathbf{r})/(2mc^2)$ term in the radiation-matter interaction Hamiltonian (the frequency sum rule). Taking this cancellation into account is equivalent to performing the Kramers-Kronig transformation on the dielectric function [$\mathrm{Re}\,\epsilon(\omega) = 4\pi c^2\,\mathrm{Re}\,\Sigma(\omega)/\omega^2$], rather than on the self-energy:

ΣRRe(ω, 0) = ΣRIm(ω, 0)dω ω′2(ω′ − ω) |ω + Eg| − |ω − Eg|) , (11)

Note that in Eqs.
(10,11) we used the three-dimensional expression for the self-energy and therefore we can re-express the parameter a in terms of the conventional zero-frequency dielectric constant $\epsilon_0$, $a = 2lE_g(\epsilon_0 - 1)/(\hbar c)$.

From Eqs. (6,7,10,11) one can evaluate the photocurrent $I_{\rm ph}$ given by Eq. (3). In this paper we are interested in the maximum efficiency of the photocell, which can be defined as $\eta_{\max} = (IV)_{\max}/P_{\rm in}$, where $(IV)_{\max}$ is the power that is dissipated in the circuit assuming an optimal load and $P_{\rm in}$ is the power of incident solar radiation.

FIG. 3: Dependence of photovoltaic efficiency on dimensionless parameters a and b.

Since the photo-current is only weakly dependent on voltage for $V < E_g$ and becomes negative due to spontaneous and stimulated emission for $V > E_g$, we have $(IV)_{\max} \simeq I_{\rm ph}E_g/e$. The incident power per unit area is $P_{\rm in} = cu$, with the energy density $u = (2/v)\sum_{\mathbf{q}} \hbar\omega_q\tilde{n}_{\mathbf{q}}$, where v is the mode quantization volume and the factor 2 is due to the two polarizations of the light waves. Carrying out the calculation we obtain the following closed-form expression for the maximum photovoltaic efficiency of the cell:

ηmax = 30ab4 exp (bx)− 1 (x+ a x− 1)2 + a2(2− x+ 1− x− 1)2 , (12)

where we introduced another dimensionless parameter $b = E_g/T_{\rm sun}$. The function $\eta_{\max}(a, b)$ shown in Fig. 3 has a pronounced maximum at $a \simeq 2.0$ and $b \simeq 2.4$. This maximum corresponds to the first maximum of the specific absorption curves in Fig. 2. Note that since our microscopic theory is valid only for thin dielectric layers ($d \ll \lambda_{\rm sun}$), it does not account for the saturation of specific absorption at larger thickness, i.e., when $d > \lambda$. However, since $\eta_{\max}$ reaches its maximum at $d \sim \lambda_{\rm sun}/\epsilon_0$, our theory is fully self-consistent for semiconductors with a sufficiently high value of the dielectric constant. The optimal value of parameter b corresponds to the bandgap energy $E_g^{\rm opt} \simeq 1.2$ eV.
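As a quick numerical check of the optimal gap quoted above, the relation $b = E_g/T_{\rm sun}$ can be inverted once a solar temperature is chosen. The sketch below assumes $T_{\rm sun} \approx 5800$ K, a value that is not stated in the text, and restores the Boltzmann constant that the paper sets to unity. The function and constant names are illustrative only.

```python
# Boltzmann constant in eV/K (CODATA value); the paper works in units with k_B = 1.
K_B_EV_PER_K = 8.617333262e-5


def optimal_gap_ev(b_opt: float, t_sun_k: float) -> float:
    """Return the band gap E_g = b * k_B * T_sun in eV."""
    return b_opt * K_B_EV_PER_K * t_sun_k


# b_opt is the optimum read off from the text (b ~ 2.4).
B_OPT = 2.4
# Assumed effective solar temperature in kelvin; NOT given in the paper.
T_SUN_K = 5800.0

if __name__ == "__main__":
    e_gap = optimal_gap_ev(B_OPT, T_SUN_K)
    print(f"E_g^opt ~ {e_gap:.2f} eV")
```

Under these assumptions the result lands close to the 1.2 eV quoted in the text; a different choice of $T_{\rm sun}$ shifts it proportionally.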
Since this value is close to the bandgap energy of GaAs ($E_g^{\rm GaAs} \simeq 1.4$ eV), we conclude that GaAs is a good candidate for the practical realization of thin-film photocells. According to Eq. (12), the optimal thickness of such a GaAs layer is $d_{\rm opt} \simeq 1.2\,\hbar c/[E_g^{\rm opt}(\epsilon_0^{\rm GaAs} - 1)] \simeq 15$ nm, which is within the fabrication capabilities of contemporary molecular beam epitaxy technology. Another promising material is amorphous silicon which, unlike crystalline Si, has a large imaginary part of the dielectric constant [6]. Moreover, in the ultra-thin devices considered here, the thermal equilibration of photo-generated carriers may occur on time scales longer than the charge separation time scale. Thus hot-carrier physics [7] may lead to a further enhancement of the efficiency.

We thank D. Smith and A. Findikoglu for useful discussions. The work was supported by the US DOE.

[1] W. Shockley, H.J. Queisser, J. Appl. Phys. 32 (1961) 510-
[2] P. Würfel, Physics of Solar Cells, Wiley-VCH, 2005.
[3] J. Nelson, The Physics of Solar Cells, Imperial College Press, 2003.
[4] J.S. Blakemore, J. Appl. Phys. 53 (1982) 520-531.
[5] J. Rammer, H. Smith, Rev. Mod. Phys. 58 (1986) 323-359.
[6] K.C. Kao, R.D. McLeod, C.H. Leung, H.C. Card, H. Watanabe, J. Phys. D: Appl. Phys. 16 (1983) 1801-1811.
[7] R.T. Ross, A.J. Nozik, J. Appl. Phys. 53 (1982) 3813-3818.

ABSTRACT

We propose a new concept for the design of high-efficiency photocells based on ultra-thin (submicron) semiconductor films of controlled thickness. Using a microscopic model of a thin dielectric layer interacting with incident electromagnetic radiation, we evaluate the efficiency of conversion of solar radiation into electric power. We determine the optimal range of parameters that maximizes the efficiency of such a photovoltaic element.

<|endoftext|><|startoftext|>

arXiv:0704.0652v1 [astro-ph] 4 Apr 2007

Draft version November 28, 2021

GALACTIC WIND SIGNATURES AROUND HIGH REDSHIFT GALAXIES

Daisuke Kawata and Michael Rauch

ABSTRACT

We carry out cosmological chemodynamical simulations with different strengths of supernova (SN) feedback and study how galactic winds from star-forming galaxies affect the features of hydrogen (HI) and metal (CIV and OVI) absorption systems in the intergalactic medium at high redshift. We find that the outflows tend to escape to low density regions, and hardly affect the dense filaments visible in HI absorption. As a result, the strength of HI absorption near galaxies is not reduced by galactic winds, but even slightly increases. We also find that a lack of HI absorption for lines of sight (LOS) close to galaxies, as found by Adelberger et al., can be created by hot gas around the galaxies induced by accretion shock heating. In contrast to HI, metal absorption systems are sensitive to the presence of winds. The models without feedback can produce strong CIV and OVI absorption lines only in LOS within 50 kpc of galaxies, while strong SN feedback is capable of creating strong CIV and OVI lines out to about twice that distance. We also analyze the mean transmissivity of HI, CIV, and OVI within 1 $h^{-1}$ Mpc of star-forming galaxies. The probability distribution of the transmissivity of HI is independent of the strength of SN feedback, but strong feedback produces LOS with lower transmissivity of metal lines. Additionally, strong feedback can produce strong OVI lines even in cases where HI absorption is weak. We conclude that OVI is probably the best tracer for galactic winds at high redshift.

Subject headings: galaxies: kinematics and dynamics; galaxies: formation; galaxies: stellar content

1. INTRODUCTION

Supernova (SN) explosions are thought to be capable of ejecting part of the interstellar medium (ISM) from galaxies. Such outflows are often called "galactic winds" (Johnson & Axford 1971; Mathews & Baker 1971; Veilleux et al.
2005). Galactic winds are believed to be an important mechanism for enriching the intergalactic medium (IGM) (Ikeuchi 1977; Aguirre et al. 2001; Madau et al. 2001; Scannapieco et al. 2002; Cen et al. 2005) and are thought to play a crucial role in shaping the mass-metallicity relation of galaxies (e.g. Larson 1974; Dekel & Silk 1986; Arimoto & Yoshii 1987; Gibson 1997; Kawata & Gibson 2003), and in heating the IGM (Ikeuchi & Ostriker 1986). Galactic winds have been observed in local star-forming galaxies (e.g. Lynds & Sandage 1963; Martin 1998; Ohyama et al. 2002), where their outflow morphology and kinematics have been extensively studied (e.g. Martin 2005; Rupke et al. 2005). Some local galaxies show outflows to about 20 kpc (Veilleux et al. 2003). Such outflows are expected to be more common at high redshift, where star formation is more active (z > 1) (Madau et al. 1996). Galactic winds are also believed to terminate star formation in ellipticals (Mathews & Baker 1971; Kawata & Gibson 2003) and have been invoked at high redshift to explain the high age of their stellar populations (Kodama & Arimoto 1997; Labbé et al. 2005; Kriek et al. 2006). Theoretical studies suggest that for progenitors of disk galaxies the gas outflow due to SN heating at high redshift can effectively suppress star formation, leading to a less dense stellar halo (Brook et al. 2004) and a larger disk (Sommer-Larsen et al. 2003; Robertson et al. 2004; Governato et al. 2006) at z = 0. Therefore, observing galactic wind signatures at high redshift may elucidate an important ingredient in the formation of galaxies.

1 The Observatories of the Carnegie Institution of Washington, 813 Santa Barbara Street, Pasadena, CA 91101
2 Swinburne University of Technology, Hawthorn VIC 3122, Australia

Observational studies at high redshift have uncovered evidence that outflows from star-forming galaxies at high redshift may be common, as has been shown for Lyman break selected galaxies (e.g.
Pettini et al. 1998; Ohyama et al. 2003; Shapley et al. 2003). These results have contributed to the debate about how the outflows from star-forming galaxies may affect the intergalactic medium (IGM). Rauch et al. (2001a) uncovered evidence for repeated injection of kinetic energy into higher density, CIV absorbing gas, possibly driven by recent galactic winds. In a study of the lower density, general Lyman alpha forest, Rauch et al. (2001b) found that most of the HI absorption systems lack signs of being disturbed by winds, and derived upper limits on the filling factor of wind bubbles. Simcoe et al. (2002) surveyed the properties of strong OVI absorption systems at high redshift and proposed that the apparent temperatures and the kinematics of the OVI gas as well as their rate of incidence could be explained if massive Lyman break galaxies are driving winds out to 50 proper kpc.

Adelberger et al. (2003, 2005a) studied the absorption line features in the spectra of background QSOs whose line of sight (LOS) passes close to Lyman-break selected star-forming galaxies. They found a deficit of neutral hydrogen near these galaxies out to 0.5 $h^{-1}$ comoving Mpc, accompanied by a surplus of HI beyond that radius, and suggested that most Lyman break galaxies may reside in bubbles where superwinds have depleted the HI in the interior and piled up more neutral gas beyond the hot bubble. The more recent and more statistically significant of these studies (Adelberger et al. 2005a), however, does not support this claim, but still appears to show a significant fraction (about 7 out of 24) of the LOS exhibiting weak or no absorption within 1 $h^{-1}$ comoving Mpc. Numerical simulations of the IGM without SN feedback predict a much lower fraction of such weak absorption systems near the galaxies (Kollmeier et al.
2003), a fact that may conceivably be explained if at least some galaxies have outflows destroying HI in the IGM in their vicinity.

Adelberger et al. (2003, 2005a) also show that there is a correlation between the spatial distribution of CIV absorption lines and the star-forming galaxies, with the strongest CIV absorption lines being observed at the LOS closest to the galaxies (∼ 80 proper kpc). This result may indicate that outflows related to recent star formation activity have enriched the IGM locally. Other evidence for the association of CIV with galaxies has been reported (Pieri et al. 2006; Simcoe et al. 2006). Scannapieco et al. (2006b) compared the observed LOS correlation functions of CIV and SiIV with analytic outflow models and concluded that the observed correlation can be formally explained if there are outflows with a scale of about 2 comoving Mpc from large galaxies whose stellar mass is about $10^{12}\,M_\odot$. However, the large range of the individual wind bubbles required to explain the CIV-galaxy correlation is puzzling. Theoretical arguments suggest that the clustering of CIV with Lyman break galaxies is not necessarily proof that those same galaxies produced the metal enrichment (Porciani & Madau 2005; Scannapieco 2005), and the relation between metals in the IGM and galaxies clearly needs further study.

Cosmological numerical simulations have proven a useful tool for understanding metal absorption lines (Rauch et al. 1997). Comparisons between the observations and the statistics of metal absorption lines derived from numerical simulations have mostly been concerned with measurements of the metallicity and ionization state of the IGM (e.g. Davé et al. 1998; Aguirre et al. 2002; Schaye et al. 2003; Aguirre et al. 2004). The relation between star-forming galaxies and absorption line features has recently been studied by a number of authors (Croft et al. 2002; Kollmeier et al. 2003; Bruscoli et al. 2003; Kollmeier et al.
2006; Tasker & Bryan 2006), with several papers (e.g. Croft et al. 2002; Kollmeier et al. 2006; Theuns et al. 2002) considering observable effects of galactic outflows on the Lyman alpha forest absorption lines. These studies generally have concluded that such outflows hardly affect the strength of the HI absorption lines, because the winds tend to escape into less dense regions and do not impact the IGM where the density is high enough to produce HI absorption lines.

So far, most studies based on full-blown cosmological numerical simulations have focused on HI absorption lines. Although there have been a number of papers discussing the origin and properties of metal absorption lines (e.g. Rauch et al. 1997; Theuns et al. 2002; Aguirre et al. 2005; Tasker & Bryan 2006; Oppenheimer & Davé 2006), the relation between metal absorption line systems and outflows from the coeval galaxy population has remained largely unclear, in particular because the effect of winds on the physical properties of metal lines is not well known. Nevertheless, the fact that the metallicity is expected to be increased by galactic outflows, and the availability of multiple transitions and several heavy elements with different enrichment histories, should make metal lines a potentially much more useful tracer of galactic winds than neutral hydrogen.

Fig. 1. Number of SN and chemical yields as a function of the age of a star particle with a mass of $1\,M_\odot$. The upper panel shows the total number of SN. The middle and lower panels present the total ejected carbon and oxygen masses, respectively. The thin solid, thick solid, and gray thick dotted lines indicate the history of a star particle with the metallicity of $Z = 10^{-4}$, 0.02, and 0.04, respectively.

The current paper studies the properties of both HI and metal absorption lines, and looks for observational signatures of galactic winds.
To this end, we run cosmological simulations with the original version of the Galactic Chemodynamics Code, GCD+ (Kawata & Gibson 2003), which is capable of tracing the chemical evolution of the IGM and galaxies self-consistently. We carry out simulations with different strengths of SN feedback, and compare the features of the HI and metal absorption lines between the simulations, attempting to identify features sensitive to the presence of galactic winds.

The following section explains our method, including the description of the numerical simulations with GCD+ and the analysis of absorption lines. Unlike previous studies, we follow different heavy elements separately, and take into account the abundance evolution of the different elements when creating fake QSO spectra from the simulation. This is important because the different elements come from different types of SNe or are due to mass loss from intermediate-mass stars with different lifetimes. Section 3 shows our results. First, Section 3.1 focuses on one galaxy in the simulation volume, and compares the absorption line features around the galaxy among models with different SN feedback strengths. Then, in Section 3.2 we discuss the results more quantitatively using artificial QSO spectra in 1000 random LOS. Section 4 summarizes our conclusions.

Fig. 2. Projected gas surface density (upper) and temperature (middle) map and star particle distribution (lower) for models NF (left), SF (middle), and ESF (right).

Fig. 3. The history of the star formation rate down to z = 2.43 for models NF (left), SF (middle), and ESF (right). The time of first star formation is different among the three models, although there should not be any difference before feedback from stars happens.
This difference arises because the different models were run on different computers, and the star formation model in GCD+ uses a random number generator (see Kawata & Gibson 2003, for details) whose sequences differ between simulations.

2. METHOD

2.1. Numerical Simulations

The simulations were carried out using the Galactic Chemodynamics Code GCD+ (Kawata & Gibson 2003). GCD+ is a three-dimensional tree N-body/smoothed particle hydrodynamics (SPH) code that incorporates self-gravity, hydrodynamics, radiative cooling, star formation, SN feedback, and metal enrichment. GCD+ takes into account chemical enrichment by both Type II (SNe II) (Woosley & Weaver 1995) and Type Ia (SNe Ia) (Iwamoto et al. 1999; Kobayashi et al. 2000) SNe and mass loss from intermediate-mass stars (van den Hoek & Groenewegen 1997), and follows the chemical enrichment history of both the stellar and gas components of the system. Figure 1 shows the total number of SN (both SN II and SN Ia) and the total amount of carbon and oxygen ejected from a star particle with a mass of $1\,M_\odot$ as a function of its age. Initially, SNe II go off, and they continue until the $8\,M_\odot$ star dies (∼0.04 Gyr in the case of Z = 0.02). There is then no SN until SNe Ia start to occur around 0.7 Gyr. A star particle with $Z = 10^{-4}$ does not lead to SN Ia, because the adopted SN Ia model restricts the metallicity range for progenitors of SN Ia to $\log Z/Z_\odot > -1.1$ (see Kobayashi et al. 2000, for details). Oxygen is produced mainly by SN II. After SN II cease, the continuous ejection of oxygen and carbon is mainly due to the contribution from intermediate-mass stars. Although the oxygen yield is mainly from the pre-enriched ejecta, carbon is newly processed in intermediate-mass stars, which explains the significant yield in the low metallicity case.

Fig. 4. Overdensity, temperature, metallicity, and [C/O] map (from left to right) for model NF at Y = +50 (upper), 0 (middle), and −50 (lower) proper kpc, where the biggest galaxy is set to be at the center.

TABLE 1
Properties of the central galaxy at z = 2.43

Model  E_SN       M_vir^a      r_vir^b  M_gas,vir    M_DM,vir     M_star,vir   M_200^c      T_vir^d
Name   (erg)      (M⊙)         (kpc)    (M⊙)         (M⊙)         (M⊙)         (M⊙)         (K)
NF     0          2.1 × 10^11  57       1.7 × 10^10  1.7 × 10^11  2.5 × 10^10  2.0 × 10^11  3.8 × 10^5
SF     3 × 10^51  2.0 × 10^11  57       1.5 × 10^10  1.7 × 10^11  1.6 × 10^10  1.9 × 10^11  3.7 × 10^5
ESF    5 × 10^51  1.7 × 10^11  56       1.2 × 10^10  1.6 × 10^11  1.7 × 10^9   1.6 × 10^11  3.2 × 10^5

a Virial mass in the definition of Kitayama & Suto (1996)
b Virial radius in the definition of Kitayama & Suto (1996)
c Mass within the radius of a sphere whose mean density is 200 times the critical density at z = 2.43
d Virial temperature in the definition of Kitayama & Suto (1996)

Fig. 5. Same as Fig. 4, but for model SF.

The adopted version of the code also includes non-equilibrium chemical reactions of hydrogen and helium species (H, H$^+$, He, He$^+$, He$^{++}$, H$_2$, H$_2^+$, H$^-$) and their cooling processes, following the method of Abel et al. (1997), Anninos et al. (1997), and Galli & Palla (1998). The details of the non-equilibrium chemical reactions are described in the Appendix of Kawata et al. (2006).
We have made the following update from the code used in Kawata et al. (2006). We adopt a density threshold for star formation, and permit star formation from gas whose hydrogen number density ($n_{\rm H} = f_{\rm H}\rho_g/m_p$, where $f_{\rm H}$ and $\rho_g$ are the hydrogen mass fraction and density of each gas particle and $m_p$ is the proton mass) is higher than 0.01 cm$^{-3}$ (Schaye 2004). It is also crucial to take into account the effect of the UV background radiation when studying the properties of the IGM. We use the UV background spectrum suggested by Haardt & Madau (2001). The code follows non-equilibrium chemical reactions of hydrogen and helium species subjected to the UV background. In addition, radiative cooling and heating due to heavy elements are taken into account based on the Raymond-Smith code (Raymond & Smith 1977), as used in Cen et al. (1995). The simulation starts at z = 29.7, and the initial temperature and the fractions of hydrogen and helium species are calculated by RECFAST (Seager et al. 1999, 2000). We turn on the UV background radiation at z = 6 (Becker et al. 2001; Fan et al. 2001).

The cosmological simulation adopts a Λ-dominated cold dark matter (ΛCDM) cosmology ($\Omega_0 = 0.24$, $\Lambda_0 = 0.76$, $\Omega_b = 0.042$, $h = 0.73$, $\sigma_8 = 0.74$, and $n_s = 0.95$) consistent with the measured parameters from the three-year Wilkinson Microwave Anisotropy Probe data (Spergel et al. 2006). We use a multi-resolution technique to achieve high resolution in the regions of interest, including the tidal forces from neighboring large-scale structures. The initial conditions for the simulations are constructed using the public software LINGER and GRAFIC2 (Bertschinger 2001). Gas dynamics and star formation are included only within the relevant high-resolution region (∼6 Mpc at z = 0); the surrounding low-resolution region (∼55 Mpc) contributes to the high-resolution region only through gravity. Consequently, the initial condition consists of a total of 1350380 dark matter particles and 255232 gas particles.
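The density criterion for star formation described earlier in this section can be sketched as follows. This is only an illustration of the threshold test $n_{\rm H} = f_{\rm H}\rho_g/m_p > 0.01$ cm$^{-3}$: the function names and the example hydrogen mass fraction of 0.76 are assumptions made here, and any additional star formation criteria used by GCD+ are not modeled.

```python
# Proton mass in grams (CGS units are assumed throughout this sketch).
M_P_G = 1.67262192e-24

# Threshold quoted in the text, in cm^-3.
N_H_THRESHOLD = 0.01


def hydrogen_number_density(rho_g: float, f_h: float) -> float:
    """n_H = f_H * rho_g / m_p, with rho_g in g cm^-3 and the result in cm^-3."""
    return f_h * rho_g / M_P_G


def passes_density_threshold(rho_g: float, f_h: float,
                             n_h_min: float = N_H_THRESHOLD) -> bool:
    """True if a gas particle exceeds the hydrogen number density threshold."""
    return hydrogen_number_density(rho_g, f_h) > n_h_min


if __name__ == "__main__":
    f_h_example = 0.76  # assumed, roughly primordial hydrogen mass fraction
    for rho in (1.0e-27, 1.0e-25):  # hypothetical particle densities, g cm^-3
        n_h = hydrogen_number_density(rho, f_h_example)
        print(rho, n_h, passes_density_threshold(rho, f_h_example))
```

In this sketch only the denser of the two hypothetical particles passes the test; the threshold itself is the only number taken from the text.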
The mass and softening lengths of individual gas (dark matter) particles in the high-resolution region are $7.61 \times 10^{6}$ ($3.59 \times 10^{7}$) $M_\odot$ and 1.15 (1.93) kpc, respectively. The high-resolution region is chosen as the region within 8 times the virial radius of a small group scale halo with a total mass of $M_{\rm tot} = 3 \times 10^{12}\,M_\odot$ and a virial radius of $r_{\rm vir} = 380$ kpc at z = 0.

We simulate the following three models with these initial conditions to investigate the effect of SN feedback. Model NF is a "no-SN-feedback" model: although the model follows the chemical evolution due to SNe and mass loss from stars, we ignore the effect of energy feedback by SNe. Model SF is a "strong feedback" model, where each SN yields a thermal energy of $3 \times 10^{51}$ erg. This model produces a feedback effect noticeable in a number of observables. In the final model, ESF, an "extremely strong feedback" model, a thermal energy release of $5 \times 10^{51}$ erg per SN is assumed. We found that this model produces feedback effects that are too strong and forms too few stars. The model is therefore clearly unrealistic. However, we retain this extreme case to help put the other models in perspective.

Fig. 6. Same as Fig. 4, but for model ESF.

We analyze the properties of the IGM for all the models at z = 2.43. As mentioned above, we adopt the multi-resolution technique. We extract a spherical volume within a radius of $r_p = 800$ kpc (in physical scale at z = 2.43) from a galaxy in the high-resolution region. The central galaxy is the biggest galaxy in the simulation volume, and the radius is chosen to avoid contamination from the low-resolution particles. In this paper, we use the coordinate system where the central galaxy resides at (x,y,z) = (0,0,0). Figure 2 shows the gas density, temperature, and stellar density map of the central 800 × 800 proper kpc$^2$ region of this volume analyzed for all the models.
Figure 2 demonstrates that stronger feedback affects the gas density distribution, and suppresses star formation more dramatically.

2.2. The Artificial QSO Spectra

The aim of this paper is to search for signatures of galactic winds among the absorption lines in the background QSO spectrum. To this end, we construct artificial QSO spectra, with lines of sight through the simulation volume from various orientations and projected positions, and compare the absorption line features among models with different strengths of SN feedback. For a given line of sight we identify the gas particles whose projected distance is smaller than their SPH smoothing length. In this paper, we focus on three absorption features, HI1216, CIV1548, and OVI1032, and we call them HI, CIV, and OVI hereafter. The ionization fractions of HI, CIV, and OVI for each gas particle are derived as follows. The HI fraction is self-consistently calculated in our simulations, because GCD+ follows the non-equilibrium chemical reactions of the hydrogen and helium ions. The CIV and OVI fractions for each gas particle are computed with version 6.02b of CLOUDY, described by Ferland et al. (1998), assuming optically thin gas in ionization equilibrium. Here, we put in the density, temperature, and the abundances of different elements for the gas particles in the simulation, and run CLOUDY adopting the same UV background radiation as used in the simulations (Haardt & Madau 2001). Unfortunately, we realized that even our (unrealistically) strong feedback model cannot enrich the lower density regions in the IGM as much as what is observed (Cowie & Songaila 1998; Schaye et al. 2003; Aguirre et al. 2004). For example, some filaments are not enriched at all even in model ESF, as will be seen in Figure 6, although quantitative comparisons with the observational data (see also Oppenheimer & Davé 2006) will be pursued in a future paper. This is likely because the limited resolution of our simulations is unable to resolve the formation of smaller galaxies which form at a higher redshift and could enrich the IGM. Thus, to mimic pre-enrichment for the low density regions we add metals at the level of [C/H] = −3 and [O/H] = −2.5 to all gas particles. Once the ionization fractions are obtained, the column densities along the LOS for each species are computed for each gas particle, using the two-dimensional version of the SPH kernel. The optical depth $\tau(v)$ profiles along the LOS are calculated as the sum of the Voigt absorption profiles of the particles, taking into account their temperature and LOS velocity, $v_{i,{\rm LOS}}$, which is the sum of the Hubble expansion and peculiar velocity. The final spectra are constructed assuming an overall signal-to-noise ratio of 50 per 0.04 Å pixel, a read-out signal-to-noise ratio of 500, and FWHM = 6.7 km s$^{-1}$. We stress that we take into account the difference between carbon and oxygen abundances in our chemodynamical simulations when obtaining the CIV and OVI fractions.

Fig. 7. Density of HI, CIV, and OVI and OVI weighted temperature map (from left to right) for model NF at Y = +50 (upper), 0 (middle), and −50 (lower) proper kpc, where the biggest galaxy is set to be at the center. The arrows represent the velocity field weighted by HI, CIV, OVI, and OVI in the panels from left to right, respectively. The size of each arrow corresponds to the magnitude of the velocity, as indicated in the upper right panel.

Note that the simulation volume analyzed is only 1.6 proper Mpc in scale at z = 2.43. The Hubble expansion at z = 2.43 is 236 km s$^{-1}$ Mpc$^{-1}$ in the adopted cosmology, and 1.6 proper Mpc corresponds to 378 km s$^{-1}$. In addition, since the volume is an overdense region, the expansion velocity is smaller than the Hubble expansion. In Section 3.2, we analyze the mean transmissivity within 1 $h^{-1}$ comoving Mpc from the galaxies.
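The velocity scales used in this part of the analysis follow directly from the adopted cosmological parameters. The sketch below assumes a flat ΛCDM expansion rate with radiation neglected, $H(z) = H_0[\Omega_0(1+z)^3 + \Lambda_0]^{1/2}$, and converts lengths to Hubble-flow velocities; the function names are illustrative and peculiar velocities are ignored.

```python
import math

# Cosmological parameters adopted in the text.
OMEGA_M = 0.24
OMEGA_L = 0.76
LITTLE_H = 0.73
H0 = 100.0 * LITTLE_H  # km s^-1 Mpc^-1


def hubble_rate(z: float) -> float:
    """H(z) in km s^-1 per proper Mpc for flat LCDM (radiation neglected)."""
    return H0 * math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_h_to_proper_mpc(d_h_comoving: float, z: float) -> float:
    """Convert a length in h^-1 comoving Mpc to proper Mpc at redshift z."""
    return d_h_comoving / (LITTLE_H * (1.0 + z))


if __name__ == "__main__":
    z = 2.43
    h_z = hubble_rate(z)
    # Hubble-flow velocity across the 1.6 proper Mpc analysis volume.
    v_box = 1.6 * h_z
    # Hubble-flow velocity corresponding to 1 h^-1 comoving Mpc.
    v_cmpc = comoving_h_to_proper_mpc(1.0, z) * h_z
    print(f"H(z) = {h_z:.0f} km/s/Mpc")
    print(f"1.6 proper Mpc -> {v_box:.0f} km/s")
    print(f"1 h^-1 comoving Mpc -> {v_cmpc:.1f} km/s")
```

Under these assumptions the expansion rate and the velocity across the volume come out within about 1 km s$^{-1}$ of the values quoted in the text, which suggests that the quoted numbers are pure Hubble-flow velocities.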
The velocity that corresponds to 1 $h^{-1}$ comoving Mpc is 94.2 km s$^{-1}$ at z = 2.43, which is well within the range of the volume. However, in the real Universe, if there are absorbers outside the volume along the LOS and their peculiar velocity is large, they can contribute to absorption in the velocity range we focus on (see Kollmeier et al. 2003, for a more detailed discussion of this effect). Here we ignore the contamination from such absorption, and consider the idealized absorption systems produced only by absorbers that are spatially close.

Fig. 8. Same as Fig. 7, but for model SF.

3. RESULTS

3.1. Absorption features around a galaxy

To study how the gas outflows from galaxies affect the absorption line features, we generate two sets of QSO spectra for all the models. In this section, we analyze the spectra whose LOS are chosen as the 5 × 5 grid points each separated by 50 kpc (in physical scale at z = 2.43) as projected onto the plane of the sky. The grid is centered on a galaxy. In the next section, we generate 1000 random LOS spectra, and compare them among the three models. The properties of the central galaxy chosen for the present section for the different feedback models are summarized in Table 1. The total mass of the central galaxy is slightly smaller than the estimated mass of the BX galaxies ($M_{200} \sim 6.3 \times 10^{11} - 1.6 \times 10^{12}\,M_\odot$ in Adelberger et al. 2005b) from the observational studies of the IGM-galaxy connection around z = 1.9 − 2.6 (Adelberger et al. 2003, 2005a; Simcoe et al. 2006). However, the accurate mass range for such rest-frame UV-selected galaxies is still unknown. In this paper, we simply assume that the central galaxy is a typical UV-selected star-forming galaxy, and that our simulation volume is a typical environment for such galaxies. This assumption will be tested in our future studies. Figure 3 shows the history of the total star formation rate for the star particles within r = 5 proper kpc at z = 2.43.
The extremely strong feedback in model ESF terminates star formation in the system. We confirmed that SNe Ia continuously heat the ISM, which keeps maintaining an outflow in model ESF. We chose the X–Y plane in Figure 2 as the projected plane of the sky, and define the Z-axis as the LOS di- rection. Figures 4-6 demonstrate the overdensity, tem- perature, metallicity, and abundance ratio of carbon to oxygen, [C/O], in the X–Z plane at three different posi- tion of Y= ±50 and 0 proper kpc for models NF, SF, and ESF, respectively. In the same planes, Figures 7-9 give the density distributions of HI, CIV, and OVI, and the OVI weighted temperature map. In these figures, the velocity field of the gas component is also shown with tangential arrows. In model NF the central galaxy ends up surrounded by a hot gaseous halo with a radius of about 100 kpc (Fig. 4). This is due to infalling gas being shock-heated to the virial temperature (Table 1). This situation appears common for high redshift galaxies (Rauch et al. 1997) and corresponds to the hot accretion mode of (Kereš et al. 2005; Dekel & Birnboim 2006) Because of the inflow, the metallicity of the hot gas is low. On the other hand, the high density filaments are cold, and part of the gas keeps accreting through the filaments onto the galaxy, i.e., the cold accretion mode described by Kereš et al. (2005) and Dekel & Birnboim (2006) (see also the velocity field in Fig. 7). As a result, the HI density is higher along the filament, and gets sig- nificantly higher in the collapsed region near the central Galactic Wind Signatures 9 Fig. 9.— Same as Fig. 7, but for model ESF. Fig. 10.— Composite image of overdensity (blue contour) and metallicity (red contour) distribution at Y= 0 for model SF. The edges of contours for overdenisty and metallicity correspond to log ρg/ < rhog >= −1 and logZ/Z⊙ = −2.5, respectively. galaxy and the neighboring galaxies. 
The spatial distribution of CIV and OVI also traces the filaments, and their densities are high in the region close to the galaxies.

Model SF produces a more extended hot gas region than model NF, and the hot gas is now metal-enriched (Fig. 5), compared to the NF case. Here the hot gas is dominated by a galactic wind induced by strong SN feedback. The temperature map of Figure 5, the arrows in Figure 8, and Figure 10 demonstrate that the enriched gas tends to escape toward the lower density regions. The filament remains unaffected by the wind. As a result, the cold accretion is still maintained through the filaments, which continue funneling gas to the galaxy. This is similar to what was observed in previous numerical simulations with strong feedback (e.g., Theuns et al. 2002; Kollmeier et al. 2006). Consequently, the distribution of HI density is not significantly different from model NF. On the other hand, higher density CIV and OVI gas extends over a much larger region. This is because the enriched gas is blown out from the galaxy, and helps to raise the abundance of carbon and oxygen in the [...]

Interestingly, we find that the gas whose OVI density is high in model SF has a low temperature which is consistent with photoionization equilibrium for OVI (typical logarithmic temperatures are around Log(T (K)) = 4.2) as opposed to the collisional ionization temperature (typically Log(T (K)) = 5.5). Comparison between the 3rd and 4th columns in Figure 8 demonstrates that the region where the density of OVI is high has temperatures around Log(T (K)) < 4.5. This is gas that has been blown out of the galaxy, and cooled down by radiative cooling after colliding with the ambient IGM. CIV in low density halo and void regions is in a similar thermal state.

Fig. 11.— Mosaic of 5 × 5 lines of sight spectra of HI, CIV, and OVI lines around a galaxy at z = 2.43 for model NF. The central panel corresponds to the position of the galaxy, and each line of sight is separated by 50 kpc in physical scale. The numbers at the upper left corner of each panel present the (x,y) coordinate (in kpc) for each LOS. In the case of CIV and OVI only the stronger line of each doublet is shown. The LOS velocity is adjusted so that the LOS velocity of the galaxy equals zero.

Fig. 12.— Same as Fig. 11, but for model SF.

Fig. 13.— Same as Fig. 11, but for model ESF.

Fig. 14.— The mean transmissivity of HI (upper), CIV (middle), and OVI (lower) for the pixels within 1 h−1 comoving Mpc of the galaxies as a function of the impact parameter for models NF (left), SF (middle), and ESF (right). The gray lines indicate median and 25th and 75th percentile. Note that the scale of the y-axis, i.e., f1Mpc, is different in each panel.

Fig. 15.— The probability of the flux decrement 1 − f1Mpc of HI (upper), CIV (middle), and OVI (lower), where f1Mpc is the mean transmissivity for the pixels within 1 h−1 comoving Mpc of the galaxies. The left/middle/right panel shows the results of model NF/SF/ESF. The dotted histograms in the upper panels present the observational results of Adelberger et al. (2005a).

Fig. 16.— The mean transmissivity of HI, f1Mpc,HI, for the pixels within 1 h−1 comoving Mpc of the galaxies as a function of the mean transmissivity of OVI, f1Mpc,OVI, for models NF (left), SF (middle), and ESF (right). The lines indicate median (black line) and 25th and 75th percentile (grey lines).

Figure 6 shows that the extremely strong SN feedback in model ESF develops a much stronger galactic wind and a larger hot gas bubble. Feedback is now strong enough to affect the filaments. Gas accretion through the filaments is suppressed, so that star formation in the central galaxy ceases (Fig. 3). However, even here the denser regions of the filaments survive, and the density distribution of HI is similar to the one seen in models NF and SF.
Figure 9 reveals that in model ESF there is more collisionally ionized OVI, especially in the region close to the galaxy (r ≤ 40 kpc) where the OVI-weighted temperature is around Log(T (K)) = 5.5 and the OVI density peaks.

Figures 11-13 show the spectra whose LOS are chosen as the 5 × 5 grid points, each separated by 50 proper kpc, projected on the sky, i.e., in the X–Y plane. In the figures, the middle panels correspond to the LOS through the center of the galaxy. First, we compare HI absorption lines between the three models. At the LOS through the central galaxy, the HI lines are heavily saturated, i.e., they produce a damped Lyman alpha line, except in model ESF. In model ESF, not enough cold gas can survive in the galaxy due to the extremely strong feedback. In all the models, the HI absorption becomes weaker with the projected distance from the galaxy. General features of HI absorption lines are similar among the three models. Thus, the signature of a galactic wind seems to be difficult to see in HI absorption lines, confirming the conclusions of previous studies (Theuns et al. 2002; Croft et al. 2002; Bruscoli et al. 2003; Kollmeier et al. 2006).

However, a more detailed comparison of HI absorption line features between models NF and SF (Figs. 11 and 12, respectively) shows that the HI absorption lines tend to be stronger in model SF than those in model NF, even close to the galaxy. This is a counter-intuitive result, because it seems natural that a strong wind should predominantly be destroying HI clouds, as suggested by Adelberger et al. (2003) and Bertone & White (2006). However, as shown in Figures 2 and 5, strong feedback redistributes the high density gas in the galaxy to the surrounding region, and the filaments become broader. This leads to the stronger HI lines in model SF, especially in the outer regions of the galaxy. A similar effect is seen in model ESF (Fig. 13). Therefore, we suggest that stronger SN feedback actually increases the HI absorption. We will test this more quantitatively in the next section.

Conversely, we do see a lack of neutral hydrogen along some LOS close to the galaxy. However, this does not seem to have anything to do with winds. In model NF, the LOS at (X,Y) = (−100,−50) and (−100,−100) have very weak HI absorption lines, although their projected distance is smaller than ∼120 proper kpc (∼300 h−1 comoving kpc). Adelberger et al. (2005a) studied HI absorption lines around star-forming galaxies at 2 ≤ z ≤ 3, and found that some LOS which are within 1 h−1 comoving Mpc from the galaxies show weak or absent HI absorption. They argue that such a lack of absorption may be caused by a galactic superwind destroying the neutral hydrogen. However, our model NF does not include any SN feedback. The LOS at (X,Y) = (−100,−50) corresponds to the line at X = −100 in the lower panels of Figures 4 and 7. This LOS passes through hot accretion-shocked gas which cannot accommodate HI, and misses the more HI-rich filaments. This example demonstrates that it is possible to have LOS close to galaxies which do not show any strong HI absorption, without the need for a galactic wind.

Fig. 17.— The mean transmissivity of CIV (upper) and OVI (lower) for the pixels within 1 h−1 comoving Mpc of the galaxies as a function of the impact parameter. The left panel shows the result of model SF. The middle panel presents the result of model SF when the IGM is assumed to be homogeneously enriched at the level of [C/H] = −2.5 and [O/H] = −3. The right panel shows the result of model SF when the QSO-only UV background radiation suggested by Haardt & Madau (2001) is adopted. The gray lines indicate median and 25th and 75th percentile.

It is also worth mentioning that there are some LOS which show double HI absorption lines, especially in model NF, e.g., the LOS at (X,Y) = (50, 0).
We find that this is due to symmetric gas infall from the filaments. The LOS at (X,Y) = (50, 0) in model NF can be seen at X = 50 kpc in the middle panels of Figure 7. This LOS passes through two filaments at Z ∼ −120 and ∼ 75 kpc, and the velocity map shows the filament at Z ∼ −100 (∼ 75) has positive (negative) LOS velocity. As a result, these two filaments appear as double absorption components. Such double HI absorption line features become less obvious in the cases of stronger feedback, because outflow from the galaxy makes the velocity field more chaotic, and fills the gap between the components in velocity space.

We also compare CIV and OVI lines among the models. In model NF, there are almost no CIV or OVI lines at R ≥ 50 proper kpc. The LOS at (X,Y) = (50,−50) shows strong CIV lines. However, this is due to the next closest galaxy, as seen in Figure 7. In contrast, the stronger feedback in models SF and ESF creates CIV and OVI lines further away from the galaxy. We also investigated the absorption lines with a much finer grid of LOS, i.e., smaller separations, and found that strong CIV or OVI lines are rare at projected radii R ≥ 100 kpc even in models SF and ESF, unless there is another galaxy close to the LOS. This can also be seen in Figures 8 and 9, where dense CIV and OVI regions extend to about 100 kpc. OVI lines are very rare in model NF, and only exist where the HI absorption is saturated. On the other hand, strong feedback models produce more OVI lines, which are sometimes stronger than CIV lines. These figures demonstrate that the OVI lines are the most sensitive signature of a galactic wind in absorption. We study this possibility more quantitatively in the next section.

3.2. The mean transmissivity of HI, CIV, and OVI

In this section, we analyze the artificial QSO spectra in 1000 random LOS. Since our simulation volume is a spherical volume (Sec. 2.2), we cannot use the LOS at too large an impact parameter.
Therefore, we generate 1000 spectra for random LOS at projected radii of R < 400 proper kpc. We also change the angle of projection randomly for each LOS. Within the three dimensional radius of r = 400 proper kpc, the high-resolution volume of the simulations contains two galaxies whose virial mass is more than 10^11 M⊙. Since the virial mass of the observed UV-selected galaxies is not well known, as mentioned above (see also Erb et al. 2006), we assume these two galaxies are such galaxies, and apply a similar analysis to the one done for the observed UV-selected galaxies (Adelberger et al. 2003, 2005a). Adelberger et al. (2005a) measured the mean transmissivity of all HI pixels in their QSO spectra that lie within 1 h−1 comoving Mpc from the galaxies, as determined from the projected distance and the LOS velocity. In this paper we denote the mean transmissivity as f1Mpc. We analyze f1Mpc not only for HI (f1Mpc,HI) but also for CIV (f1Mpc,CIV) and OVI (f1Mpc,OVI) for all the LOS spectra for our two galaxies. In reality, it is difficult to measure f1Mpc for metal lines. For example, OVI lines are often contaminated by interloper Lyman alpha lines. However, we carry out this theoretical exercise to understand the effect of galactic winds on absorption lines quantitatively.

Figure 14 shows the mean transmissivity of HI, CIV, and OVI as a function of the projected distance from the galaxy, i.e., the impact parameter, b. Figure 15 shows the histogram of the probability of the decrement, which is defined as 1 − f1Mpc, from Figure 14. The top panels of Figures 14 and 15 correspond to the lower panels of Figures 13 and 15 of Adelberger et al. (2005a), respectively. The left-top panel of Figure 15 shows results similar to those from previous numerical simulation studies without feedback (Kollmeier et al. 2003; Tasker & Bryan 2006). Although the observational data (dotted histogram) of Adelberger et al. (2005a) agree as far as strong absorption is concerned, their data show a much higher probability for very weak absorption (1 − f1Mpc < 0.2). Adelberger et al. (2005a) claim that this may be because in the real universe galactic winds turn moderate absorption into weak absorption, but do not affect the strong absorption systems. However, as seen in the top panels of Figures 14 and 15, our simulations predict that the existence of a strong galactic wind does not change the mean flux transmissivity of the HI lines. Interestingly, if we compare the transmissivity at the LOS close to the galaxy (b ≤ 0.4 h−1 comoving Mpc) in Figure 14, the stronger feedback leads to slightly stronger mean absorption, i.e., smaller median f1Mpc,HI, although the difference is subtle. Therefore, again, we conclude that HI absorption lines generally are unaffected by galactic winds. This leaves an inconsistency between the observations and numerical simulations. Unfortunately, the current observational sample is not large enough (31 systems in Adelberger et al. 2005a) to reach firm conclusions. On the theory side, the implementation of feedback has been one of the more uncertain ingredients in current numerical simulations (e.g., Okamoto et al. 2005; Kobayashi et al. 2006; Scannapieco et al. 2006a). We adopt the simplest implementation, but different implementations may lead to different conclusions. Clearly, further observational studies and numerical simulations are required to address this problem.

Figures 14 and 15 also show the results for CIV and OVI lines, which appear to be more sensitive to the effect of SN feedback. In model NF, low values of f1Mpc,CIV occur almost independently of the impact parameter. On the other hand, models SF and ESF show that significantly more LOS have a lower f1Mpc,CIV at b ≲ 0.2 h−1 comoving Mpc. The mean transmissivity of OVI lines also shows a similar trend.
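The statistic f1Mpc described above can be sketched as follows. The code assumes a spectrum given as lists of normalized flux and LOS velocity relative to the galaxy, converts velocity to a LOS separation using the 94.2 km s−1 per h−1 comoving Mpc quoted earlier for z = 2.43, and averages the flux of the pixels whose three-dimensional separation from the galaxy is below 1 h−1 comoving Mpc. The function name, the argument layout, and the pure Hubble-flow conversion are illustrative assumptions, not the authors' pipeline.

```python
import math

KMS_PER_HINV_CMPC = 94.2  # velocity width of 1 h^-1 comoving Mpc at z = 2.43


def mean_transmissivity(flux, velocity_kms, impact_hinv_cmpc, radius=1.0):
    """Mean flux of pixels within `radius` h^-1 comoving Mpc of a galaxy.

    flux             -- normalized flux per pixel (1 = no absorption)
    velocity_kms     -- LOS velocity of each pixel relative to the galaxy
    impact_hinv_cmpc -- impact parameter b in h^-1 comoving Mpc
    Returns None when no pixel lies inside the sphere.
    """
    selected = []
    for f, v in zip(flux, velocity_kms):
        los = v / KMS_PER_HINV_CMPC  # LOS separation inferred from velocity
        if math.hypot(impact_hinv_cmpc, los) < radius:
            selected.append(f)
    if not selected:
        return None
    return sum(selected) / len(selected)


if __name__ == "__main__":
    velocities = [-150.0, -50.0, 0.0, 50.0, 150.0]
    flux = [1.0, 0.4, 0.0, 0.6, 1.0]
    f1mpc = mean_transmissivity(flux, velocities, impact_hinv_cmpc=0.3)
    print(f1mpc, 1.0 - f1mpc)  # mean transmissivity and flux decrement
```

In this toy spectrum only the three central pixels fall inside the sphere, so the outer, unabsorbed pixels do not dilute the average. A LOS with an impact parameter larger than the radius contributes no pixels at all, which is why the function returns None in that case.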
However, for OVI the absorption becomes noticeably stronger, i.e., f1Mpc,OVI decreases, as the impact parameter decreases below b ≲ 0.4 h−1 comoving Mpc. Figure 15 reveals that model NF barely shows a decrement 1 − f1Mpc higher than 0.2 for both CIV and OVI, but models SF and ESF can produce such strong absorption lines. However, note that the y-axis of the figure is the logarithm of the probability, and only ∼1 % of the f1Mpc values show such strong absorption in models SF and ESF.

In the previous section, we suggested that OVI is a good tracer for a galactic wind, and that in model NF OVI lines are only observable where the HI lines are saturated. On the other hand, models SF and ESF produce OVI lines even where the HI is not saturated. Figure 16 plots f1Mpc,HI against f1Mpc,OVI. The figure clearly shows that models SF and ESF have a significant fraction of spectra with relatively weak HI (f1Mpc,HI > 0) and stronger OVI (f1Mpc,OVI < 0.95). We have calculated the probability of low f1Mpc,OVI for the spectra with f1Mpc,HI > 0.2. Model NF has only 14 % of the spectra with f1Mpc,OVI < 0.95, while models SF and ESF have 24 and 28 %, respectively. Although the difference is small, the stronger feedback seems to produce more such HI-weak, OVI-strong lines.

Finally, we briefly mention how sensitive our results are to the distribution of metals and the assumed UV background radiation. Since re-simulations changing the metal yields and the UV background radiation are computationally too expensive, we analyzed the results of model SF, assuming different metal distributions and UV background radiation. Figure 17 shows the results of the same analysis as Figure 14 in the cases when the IGM is assumed to be homogeneously enriched at the level of [C/H] = −2.5 and [O/H] = −3 (model SFhZ) and when the QSO-only UV background radiation suggested by Haardt & Madau (2001) is adopted (model SFQ), while model SF includes radiation from both QSOs and galaxies.
Model SFhZ demonstrates that if the heavy elements are homogeneously distributed with the assumed metallicity, there is very little correlation between the strength of metal lines and the impact parameter. Therefore, it is important to follow the metal distribution in the IGM self-consistently. Model SFQ shows that f1Mpc,CIV and f1Mpc,OVI differ little between the QSO and galaxy radiation case and the QSO-only UV background radiation case, except for a very subtle decrease in CIV absorption and increase in OVI absorption (see also Aguirre et al. 2005).

4. CONCLUSIONS

We have analyzed the QSO absorption features obtained from cosmological numerical simulations with different strengths of SN feedback. Our simulations self-consistently follow the metal exchange histories among the IGM, ISM, and stellar components. We investigate not only the neutral hydrogen absorption lines but also the ionization lines for heavy elements, keeping track of the abundance history of the elements.

We have paid particular attention to the properties of the IGM around high-redshift (z = 2.43) galaxies with Mvir ∼ 10^11 M⊙. We found that a model without mechanical feedback creates hot gas halos around galaxies due to shock heating, with radii up to 100 proper kpc. We found that such hot gas can lead to a lack of HI absorption (Fig. 11) even for LOS close to galaxies, as found by Adelberger et al. (2005a), without having to invoke a galactic wind.

In our strong feedback models, outflows induced by SN feedback produce larger hot bubbles around galaxies (Figs. 5 and 6). However, such outflows tend to escape to lower density regions, and hardly affect the dense filaments producing HI absorption systems, so that the transmissivity of HI Lyman alpha is virtually independent of the strength of SN feedback. If anything, the absorption by neutral hydrogen slightly increases in the presence of a wind.
We conclude that the presence or absence of HI absorption lines is not a good indicator of the presence or absence of a galactic wind.

On the other hand, we found that the metal lines, especially OVI, are sensitive to the existence of outflows. Without feedback, it is difficult to enrich the IGM enough to produce strong OVI lines further away from galaxies (Fig. 4), unless there are nearby satellite galaxies intersected by the LOS by chance. We also found that, in the no-feedback model, strong OVI lines are almost always associated with saturated HI lines. On the other hand, the strong feedback model can produce strong OVI lines even where HI lines are unsaturated, because strong feedback can re-distribute the enriched gas to relatively low density regions. We have confirmed this by searching the 1000 spectra with random LOS for those whose OVI flux is less than 0.8 over more than 5 pixels and whose mean HI flux within ±50 km s−1 of the HI velocity corresponding to the OVI lines is higher than 0.2. The no-feedback model has no such spectra, while strong feedback (model SF) has 12 such spectra. We point out that Figure 9 of Simcoe et al. (2004) shows an OVI line where the HI is not saturated. Our results suggest that this is likely to be a region where the effect of a galactic wind is significant. Analyzing the transmissivity of OVI lines, we found that strong feedback creates more LOS with lower transmissivity, i.e., stronger OVI absorption, near the star-forming galaxies. The statistical analysis of transmissivity also shows that there are more LOS where stronger OVI is associated with weaker HI in the presence of galactic winds. We expect that the pixel-optical depth analysis of OVI against HI (Schaye et al. 2000) would be sensitive to the presence of a galactic wind, and we will test this idea in a future paper. In conclusion, OVI appears to be a theoretically good tracer of galactic winds that merits further attention.
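The search for OVI-strong, HI-weak spectra described above can be sketched in a few lines. The example assumes that the OVI and HI spectra have already been placed on a common velocity grid, that "more than 5 pixels" means a run of at least 6 consecutive pixels, and that the OVI line velocity is taken at the deepest pixel of the run. All three choices, and the function names, are assumptions made for illustration; the text does not specify them.

```python
def find_strong_runs(flux, threshold=0.8, min_pixels=6):
    """Return (start, end) index pairs of runs with flux < threshold.

    A run qualifies when it spans at least `min_pixels` consecutive pixels,
    i.e. "more than 5 pixels" for the default value. `end` is exclusive.
    """
    runs, start = [], None
    for i, f in enumerate(flux):
        if f < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_pixels:
                runs.append((start, i))
            start = None
    if start is not None and len(flux) - start >= min_pixels:
        runs.append((start, len(flux)))
    return runs


def is_ovi_strong_hi_weak(velocity, flux_ovi, flux_hi,
                          window_kms=50.0, hi_min_mean=0.2):
    """True if any strong OVI run coincides with unsaturated HI."""
    for start, end in find_strong_runs(flux_ovi):
        # Velocity of the deepest OVI pixel in the run (assumed line centre).
        k = min(range(start, end), key=lambda i: flux_ovi[i])
        v0 = velocity[k]
        nearby = [fh for v, fh in zip(velocity, flux_hi)
                  if abs(v - v0) <= window_kms]
        if sum(nearby) / len(nearby) > hi_min_mean:
            return True
    return False


if __name__ == "__main__":
    velocity = [10.0 * i for i in range(-10, 11)]          # km/s
    flux_ovi = [0.5 if 7 <= i <= 13 else 1.0 for i in range(21)]
    print(is_ovi_strong_hi_weak(velocity, flux_ovi, [0.6] * 21))
    print(is_ovi_strong_hi_weak(velocity, flux_ovi, [0.0] * 21))
```

Counting the spectra for which this function returns True, over the 1000 random LOS of each model, would give a number comparable in spirit to the counts quoted above, although the exact values depend on the assumptions listed in the lead-in.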
DK thanks the financial support of the JSPS, through Postdoctoral Fellowship for research abroad. We ac- knowledge the Center for Computational Astrophysics of the National Astronomical Observatory, Japan (project ID: imn33a), the Institute of Space and Astronautical Science of Japan Aerospace Exploration Agency, and the Australian and Victorian Partnerships for Advanced Computing, where the numerical computations for this paper were performed. MR is grateful to the NSF for support under grant AST-05-06845. REFERENCES Abel, T., Anninos, P., Zhang, Y., & Norman, M. L. 1997, New Astronomy, 2, 181 Adelberger, K. L., Shapley, A. E., Steidel, C. C., Pettini, M., Erb, D. K., & Reddy, N. A. 2005a, ApJ, 629, 636 Adelberger, K. L., Steidel, C. C., Pettini, M., Shapley, A. E., Reddy, N. A., & Erb, D. K. 2005b, ApJ, 619, 697 Adelberger, K. L., Steidel, C. C., Shapley, A. E., & Pettini, M. 2003, ApJ, 584, 45 Aguirre, A., Hernquist, L., Schaye, J., Weinberg, D. H., Katz, N., & Gardner, J. 2001, ApJ, 560, 599 Aguirre, A., Schaye, J., Hernquist, L., Kay, S., Springel, V., & Theuns, T. 2005, ApJ, 620, L13 Aguirre, A., Schaye, J., Kim, T.-S., Theuns, T., Rauch, M., & Sargent, W. L. W. 2004, ApJ, 602, 38 Aguirre, A., Schaye, J., & Theuns, T. 2002, ApJ, 576, 1 Anninos, P., Zhang, Y., Abel, T., & Norman, M. L. 1997, New Astronomy, 2, 209 Arimoto, N., & Yoshii, Y. 1987, A&A, 173, 23 Becker, R. H., Fan, X., White, R. L., Strauss, M. A., Narayanan, V. K., Lupton, R. H., Gunn, J. E., Annis, J., Bahcall, N. A., Brinkmann, J., Connolly, A. J., Csabai, I., Czarapata, P. C., Doi, M., Heckman, T. M., Hennessy, G. S., Ivezić, Ž., Knapp, G. R., Lamb, D. Q., McKay, T. A., Munn, J. A., Nash, T., Nichol, R., Pier, J. R., Richards, G. T., Schneider, D. P., Stoughton, C., Szalay, A. S., Thakar, A. R., & York, D. G. 2001, AJ, 122, 2850 Bertone, S., & White, S. D. M. 2006, MNRAS, 367, 247 Bertschinger, E. 2001, ApJS, 137, 1 Brook, C. B., Kawata, D., Gibson, B. K., & Flynn, C. 
2004, MNRAS, 349, 52 Bruscoli, M., Ferrara, A., Marri, S., Schneider, R., Maselli, A., Rollinde, E., & Aracil, B. 2003, MNRAS, 343, L41 Cen, R., Kang, H., Ostriker, J. P., & Ryu, D. 1995, ApJ, 451, 436 Cen, R., Nagamine, K., & Ostriker, J. P. 2005, ApJ, 635, 86 Cowie, L. L., & Songaila, A. 1998, Nature, 394, 44 Croft, R. A. C., Hernquist, L., Springel, V., Westover, M., &White, M. 2002, ApJ, 580, 634 Davé, R., Hellsten, U., Hernquist, L., Katz, N., & Weinberg, D. H. 1998, ApJ, 509, 661 Dekel, A., & Birnboim, Y. 2006, MNRAS, 368, 2 Dekel, A., & Silk, J. 1986, ApJ, 303, 39 Erb, D. K., Steidel, C. C., Shapley, A. E., Pettini, M., Reddy, N. A., & Adelberger, K. L. 2006, ApJ, 646, 107 Fan, X., Narayanan, V. K., Lupton, R. H., Strauss, M. A., Knapp, G. R., Becker, R. H., White, R. L., Pentericci, L., Leggett, S. K., Haiman, Z., Gunn, J. E., Ivezić, Ž., Schneider, D. P., Anderson, S. F., Brinkmann, J., Bahcall, N. A., Connolly, A. J., Csabai, I., Doi, M., Fukugita, M., Geballe, T., Grebel, E. K., Harbeck, D., Hennessy, G., Lamb, D. Q., Miknaitis, G., Munn, J. A., Nichol, R., Okamura, S., Pier, J. R., Prada, F., Richards, G. T., Szalay, A., & York, D. G. 2001, AJ, 122, 2833 Ferland, G. J., Korista, K. T., Verner, D. A., Ferguson, J. W., Kingdon, J. B., & Verner, E. M. 1998, PASP, 110, 761 Galli, D., & Palla, F. 1998, A&A, 335, 403 Gibson, B. K. 1997, MNRAS, 290, 471 Governato, F., Willman, B., Mayer, L., Brooks, A., Stinson, G., Valenzuela, O., Wadsley, J., & Quinn, T. 2006, ArXiv Astrophysics e-prints Haardt, F., & Madau, P. 2001, in Clusters of Galaxies and the High Redshift Universe Observed in X-rays, ed. D. M. Neumann & J. T. V. Tran Ikeuchi, S. 1977, Progress of Theoretical Physics, 58, 1742 Ikeuchi, S., & Ostriker, J. P. 1986, ApJ, 301, 522 Iwamoto, K., Brachwitz, F., Nomoto, K., Kishimoto, N., Umeda, H., Hix, W. R., & Thielemann, F. 1999, ApJS, 125, 439 Johnson, H. E., & Axford, W. I. 1971, ApJ, 165, 381 Kawata, D., Arimoto, N., Cen, R., & Gibson, B. K. 
2006, ApJ, 641, 785 Kawata, D., & Gibson, B. K. 2003, MNRAS, 340, 908 Kereš, D., Katz, N., Weinberg, D. H., & Davé, R. 2005, MNRAS, 363, 2 Kitayama, T., & Suto, Y. 1996, ApJ, 469, 480 Kobayashi, C., Springel, V., & White, S. D. M. 2006, ArXiv Astrophysics e-prints Kobayashi, C., Tsujimoto, T., & Nomoto, K. 2000, ApJ, 539, 26 Kodama, T., & Arimoto, N. 1997, A&A, 320, 41 Kollmeier, J. A., Miralda-Escudé, J., Cen, R., & Ostriker, J. P. 2006, ApJ, 638, 52 16 Kawata and Rauch Kollmeier, J. A., Weinberg, D. H., Davé, R., & Katz, N. 2003, ApJ, 594, 75 Kriek, M., van Dokkum, P., Franx, M., Quadri, R., Gawiser, E., Herrera, D., Illingworth, G., Labbe, I., Lira, P., Marchesini, D., Rix, H.-W., Rudnick, G., Taylor, E., Toft, S., Urry, M., & Wuyts, S. 2006, ArXiv Astrophysics e-prints Labbé, I., Huang, J., Franx, M., Rudnick, G., Barmby, P., Daddi, E., van Dokkum, P. G., Fazio, G. G., Schreiber, N. M. F., Moorwood, A. F. M., Rix, H.-W., Röttgering, H., Trujillo, I., & van der Werf, P. 2005, ApJ, 624, L81 Larson, R. B. 1974, MNRAS, 169, 229 Lynds, C. R., & Sandage, A. R. 1963, ApJ, 137, 1005 Madau, P., Ferguson, H. C., Dickinson, M. E., Giavalisco, M., Steidel, C. C., & Fruchter, A. 1996, MNRAS, 283, 1388 Madau, P., Ferrara, A., & Rees, M. J. 2001, ApJ, 555, 92 Martin, C. L. 1998, ApJ, 506, 222 —. 2005, ApJ, 621, 227 Mathews, W. G., & Baker, J. C. 1971, ApJ, 170, 241 Ohyama, Y., Taniguchi, Y., Iye, M., Yoshida, M., Sekiguchi, K., Takata, T., Saito, Y., Kawabata, K. S., Kashikawa, N., Aoki, K., Sasaki, T., Kosugi, G., Okita, K., Shimizu, Y., Inata, M., Ebizuka, N., Ozawa, T., Yadoumaru, Y., Taguchi, H., & Asai, R. 2002, PASJ, 54, 891 Ohyama, Y., Taniguchi, Y., Kawabata, K. S., Shioya, Y., Murayama, T., Nagao, T., Takata, T., Iye, M., & Yoshida, M. 2003, ApJ, 591, L9 Okamoto, T., Eke, V. R., Frenk, C. S., & Jenkins, A. 2005, MNRAS, 363, 1299 Oppenheimer, B. D., & Davé, R. 2006, MNRAS, 373, 1265 Pettini, M., Kellogg, M., Steidel, C. C., Dickinson, M., Adelberger, K. 
L., & Giavalisco, M. 1998, ApJ, 508, 539 Pieri, M. M., Schaye, J., & Aguirre, A. 2006, ApJ, 638, 45 Porciani, C., & Madau, P. 2005, ApJ, 625, L43 Rauch, M., Haehnelt, M. G., & Steinmetz, M. 1997, ApJ, 481, 601 Rauch, M., Sargent, W. L. W., & Barlow, T. A. 2001a, ApJ, 554, Rauch, M., Sargent, W. L. W., Barlow, T. A., & Carswell, R. F. 2001b, ApJ, 562, 76 Raymond, J. C., & Smith, B. W. 1977, ApJS, 35, 419 Robertson, B., Yoshida, N., Springel, V., & Hernquist, L. 2004, ApJ, 606, 32 Rupke, D. S., Veilleux, S., & Sanders, D. B. 2005, ApJS, 160, 115 Scannapieco, C., Tissera, P. B., White, S. D. M., & Springel, V. 2006a, MNRAS, 371, 1125 Scannapieco, E. 2005, ApJ, 624, L1 Scannapieco, E., Ferrara, A., & Madau, P. 2002, ApJ, 574, 590 Scannapieco, E., Pichon, C., Aracil, B., Petitjean, P., Thacker, R. J., Pogosyan, D., Bergeron, J., & Couchman, H. M. P. 2006b, MNRAS, 365, 615 Schaye, J. 2004, ApJ, 609, 667 Schaye, J., Aguirre, A., Kim, T.-S., Theuns, T., Rauch, M., & Sargent, W. L. W. 2003, ApJ, 596, 768 Schaye, J., Rauch, M., Sargent, W. L. W., & Kim, T.-S. 2000, ApJ, 541, L1 Seager, S., Sasselov, D. D., & Scott, D. 1999, ApJ, 523, L1 —. 2000, ApJS, 128, 407 Shapley, A. E., Steidel, C. C., Pettini, M., & Adelberger, K. L. 2003, ApJ, 588, 65 Simcoe, R. A., Sargent, W. L. W., & Rauch, M. 2002, ApJ, 578, —. 2004, ApJ, 606, 92 Simcoe, R. A., Sargent, W. L. W., Rauch, M., & Becker, G. 2006, ApJ, 637, 648 Sommer-Larsen, J., Götz, M., & Portinari, L. 2003, ApJ, 596, 47 Spergel, D. N., Bean, R., Dore’, O., Nolta, M. R., Bennett, C. L., Hinshaw, G., Jarosik, N., Komatsu, E., Page, L., Peiris, H. V., Verde, L., Barnes, C., Halpern, M., Hill, R. S., Kogut, A., Limon, M., Meyer, S. S., Odegard, N., Tucker, G. S., Weiland, J. L., Wollack, E., & Wright, E. L. 2006, ArXiv Astrophysics e-prints Tasker, E. J., & Bryan, G. L. 2006, ApJ, 642, L5 Theuns, T., Viel, M., Kay, S., Schaye, J., Carswell, R. F., & Tzanavaris, P. 2002, ApJ, 578, L5 van den Hoek, L. B., & Groenewegen, M. A. T. 
1997, A&AS, 123, Veilleux, S., Cecil, G., & Bland-Hawthorn, J. 2005, ARA&A, 43, Veilleux, S., Shopbell, P. L., Rupke, D. S., Bland-Hawthorn, J., & Cecil, G. 2003, AJ, 126, 2185 Woosley, S. E., & Weaver, T. A. 1995, ApJS, 101, 181 ABSTRACT We carry out cosmological chemodynamical simulations with different strengths of supernova (SN) feedback and study how galactic winds from star-forming galaxies affect the features of hydrogen (HI) and metal (CIV and OVI) absorption systems in the intergalactic medium at high redshift. We find that the outflows tend to escape to low density regions, and hardly affect the dense filaments visible in HI absorption. As a result, the strength of HI absorption near galaxies is not reduced by galactic winds, but even slightly increases. We also find that a lack of HI absorption for lines of sight (LOS) close to galaxies, as found by Adelberger et al., can be created by hot gas around the galaxies induced by accretion shock heating. In contrast to HI, metal absorption systems are sensitive to the presence of winds. The models without feedback can produce the strong CIV and OVI absorption lines in LOS within 50 kpc from galaxies, while strong SN feedback is capable of creating strong CIV and OVI lines out to about twice that distance. We also analyze the mean transmissivity of HI, CIV, and OVI within 1 h$^{-1}$ Mpc from star-forming galaxies. The probability distribution of the transmissivity of HI is independent of the strength of SN feedback, but strong feedback produces LOS with lower transmissivity of metal lines. Additionally, strong feedback can produce strong OVI lines even in cases where HI absorption is weak. We conclude that OVI is probably the best tracer for galactic winds at high redshift. <|endoftext|><|startoftext|> cond-mat/somewhere ‘Stückelberg interferometry’ with ultracold molecules M. Mark,1 T. Kraemer,1 P. Waldburger,1 J. Herbig,1 C. Chin,1,3 H.-C. Nägerl,1 R. 
Grimm1,2
1 Institut für Experimentalphysik und Forschungszentrum für Quantenphysik, Universität Innsbruck, 6020 Innsbruck, Austria
2 Institut für Quantenoptik und Quanteninformation, Österreichische Akademie der Wissenschaften, 6020 Innsbruck, Austria
3 Physics Department and James Franck Institute, University of Chicago, Chicago, IL 60637, USA
(Dated: August 18, 2021)
http://arxiv.org/abs/0704.0653v1

We report on the realization of a time-domain ‘Stückelberg interferometer’, which is based on the internal state structure of ultracold Feshbach molecules. Two subsequent passages through a weak avoided crossing between two different orbital angular momentum states in combination with a variable hold time lead to high-contrast population oscillations. This allows for a precise determination of the energy difference between the two molecular states. We demonstrate a high degree of control over the interferometer dynamics. The interferometric scheme provides new possibilities for precision measurements with ultracold molecules.

PACS numbers: 34.50.-s, 05.30.Jp, 32.80.Pj, 67.40.Hf

The creation of molecules on Feshbach resonances in atomic quantum gases has opened up a new chapter in the field of ultracold matter [1]. Molecular quantum gases are now readily available in the lab for various applications. Prominent examples are given by the creation of strongly interacting many-body systems based on molecular Bose-Einstein condensates [2], experiments on few-body collision physics [3], the realization of molecular matter-wave optics [4], and by the demonstration of exotic pairs in optical lattices [5]. Recent experimental progress has shown that full control over all degrees of freedom can be expected for such molecules [6, 7, 8]. Ultracold molecular samples with very low thermal spread and long interaction times could greatly increase the sensitivity in measurements of fundamental physical properties such as the existence of an electron dipole moment [9] and a possible time-variation of the fine-structure constant [10, 11].

Most of today’s most accurate and precise measurements rely on interferometric techniques applied to ultracold atomic systems. For example, long coherence times in atomic fountains or in optical lattices allow ultraprecise frequency metrology [12, 13]. Molecules, given their rich internal structure, greatly extend the scope of possible precision measurements. Molecular clocks, for example, may provide novel access to fundamental constants and interaction effects, different from atomic clocks. The fast progress in preparing cold molecular samples thus opens up fascinating perspectives for precision interferometry. Recently, the technique of Stark deceleration has allowed a demonstration of Ramsey interferometry with a cold and slowed molecular beam [10]. Ultracold trapped molecular ensembles are expected to further enhance the range of possible measurements.

In this Letter, we report on the realization of an internal-state interferometer with ultracold Cs2 molecules. A weak avoided crossing is used as a ‘beam splitter’ for molecular states as a result of partial Landau-Zener tunneling when it is traversed by means of an appropriately chosen magnetic field ramp. Using the avoided crossing twice, first for splitting, and then for recombination of molecular states, leads to the well-known ‘Stückelberg oscillations’ [14]. We thus call our scheme a ‘Stückelberg interferometer’. Our realization of this interferometer allows full control over the interferometer dynamics. In particular, the hold time between splitting and recombination can be freely chosen. In analogy to the well-known Ramsey interferometer [15], the acquired interferometer phase is mapped onto the relative populations of the two output states, which can be well discriminated upon molecular dissociation. To demonstrate the performance of the Stückelberg interferometer we use it for precision molecular spectroscopy to determine the position and coupling strength of the avoided crossing.

FIG. 1: (color online). (a) Molecular energy structure below the dissociation threshold as a function of magnetic field (G), showing all molecular states up to ℓ = 8. The relevant states for the present experiment (solid lines) are labeled |g〉, |g′〉, and |l〉. Molecules in state |g〉 or |l〉 are detected upon dissociation as shown in (b) and (c). The crossing used for the interferometer is the one between |g′〉 and |l〉 near 11.4 G. Initially, ultracold molecules are generated in state |g〉 on the Feshbach resonance at 19.8 G.

FIG. 2: (a) Scheme of the ‘Stückelberg interferometer’. By ramping the magnetic field over the avoided crossing at Bc at a rate near the critical ramp rate Rc the population in the initial molecular state is coherently split. ∆E is the binding energy difference at the given hold field B0. After the hold time τ a reverse ramp coherently recombines the two populations. The populations in the two ‘output ports’ are then determined as a function of the acquired phase difference φ ∝ ∆E × τ. (b) Corresponding magnetic field ramp.

The energy structure of weakly bound Cs2 dimers in the relevant range of low magnetic field strength is shown in Fig. 1 [16]. Zero binding energy corresponds to the threshold of dissociation into two free Cs atoms in the lowest hyperfine sublevel |F = 3, mF = 3〉 and thus to the zero-energy collision limit of two atoms. The states relevant for this work are labeled |g〉, |g′〉, and |l〉 [17].
While |g〉 and |g′〉 are g-wave states with orbital angular momentum ℓ = 4, the state |l〉 is an l-wave state with a high orbital angular momentum of ℓ = 8 [18]. Coupling with ∆ℓ = 4 between s-wave atoms and g-wave molecules and between g- and l-wave states is a result of the strong indirect spin-spin interaction between two Cs atoms [16].

The starting point for our experiments is a Bose-Einstein condensate (BEC) with ∼2 × 10⁵ Cs atoms in the |F = 3, mF = 3〉 ground state confined in a crossed-beam dipole trap generated by a broad-band fiber laser with a wavelength near 1064 nm [19, 20]. The BEC allows us to efficiently produce molecules on a narrow Feshbach resonance at 19.84 G [21] in an optimized scheme as described in Ref. [22]. With an efficiency of typically 25% we produce a pure molecular ensemble with up to 2.5 × 10⁴ ultracold molecules all in state |g〉, initially close to quantum degeneracy [21]. The following experiments are performed on the molecular ensemble in free fall. During the initial expansion to a 1/e-diameter of about 28 µm along the radial and about 46 µm along the axial direction the peak density is reduced to 1 × 10¹¹ cm⁻³ so that molecule-molecule interactions [3] can be neglected on the timescale of the experiment.

The molecules can now be transferred to any one of the molecular states shown in Fig. 1 with near 100% efficiency by controlled ‘jumping’ or adiabatic following at the various crossings [23]. When the magnetic field strength is decreased, the molecules first encounter the crossing at 13.6 G. At all ramp rates used in our present experiments the passage through this crossing takes place in a fully adiabatic way. The molecules are thus transferred from |g〉 to |g′〉 along the upper branch of the crossing. They then encounter the next crossing at a magnetic field of Bc ≈ 11.4 G. We accidentally found this weak crossing in our previous magnetic moment measurements [3, 23]. This allowed the identification of the l-wave state |l〉 [18].
This crossing between |g′〉 and |l〉 plays a central role in the present experiment. It can be used as a tunable ‘beam splitter’ that allows adiabatic transfer, coherent splitting (as will be shown below), or diabatic transfer of the molecular states involved, depending on the chosen magnetic ramp rate near the crossing. We find that a critical ramp rate of Rc ≈ 14 G/ms leads to a 50/50-splitting into |g′〉 and |l〉 [23]. Using the well-known Landau-Zener formula and an estimate for the difference in magnetic moment for states |g′〉 and |l〉 [18] we determine the coupling strength V between |g′〉 and |l〉 to be ∼h × 15 kHz.

We state-selectively detect the molecules by ramping up the magnetic field to bring the molecules above the threshold for dissociation. There the quasi-bound molecules decay into the atomic scattering continuum. For state |g〉 dissociation is observed for magnetic fields above the 19.84 G position of the corresponding Feshbach resonance. Fig. 1 (b) shows a typical absorption image of the resulting atom cloud [21]. For state |l〉 dissociation is observed above 16.5 G. The molecular states can thus be easily discriminated by the different magnetic field values needed for dissociation. Moreover, the expansion pattern is qualitatively different from the one connected to state |g〉. The absorption image in Fig. 1 (c) shows an expanding ‘bubble’ of atoms with a relatively large kinetic energy of about kB × 20 µK per atom. Here, kB is Boltzmann’s constant. We find that significant dissociation occurs only when the state |l〉 couples to a quasi-bound g-wave state about h × 0.7 MHz above threshold [24]. The resulting bubble is not fully spherically symmetric, which indicates higher partial-wave contributions [25]. The different absorption patterns allow us to clearly distinguish between the two different dissociation channels in a single absorption picture when the magnetic field is ramped up to ∼22 G.
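As a rough cross-check of the beam-splitter numbers quoted above (Rc ≈ 14 G/ms for a 50/50 split, V ∼ h × 15 kHz), the sketch below evaluates the textbook Landau-Zener expression for a linear sweep through a two-level crossing, P = exp(−2πV²/(ħ ∆µ R)), where P is the probability of staying on the diabatic state. The Letter does not spell out the formula it uses, so the convention that V is the off-diagonal coupling (minimum gap 2V), the function names, and the use of ∆µ ≈ 0.73 µB (the fitted value reported later in this Letter) as input are assumptions made for illustration.

```python
import math

MU_B_OVER_H = 1.399624e6  # Bohr magneton divided by h, in Hz per gauss


def lz_diabatic_probability(v_hz, dmu_hz_per_g, ramp_g_per_s):
    """Probability of staying on the diabatic state in one sweep.

    v_hz          coupling V/h in Hz (assumed: off-diagonal element, gap = 2V)
    dmu_hz_per_g  differential magnetic moment divided by h, in Hz/G
    ramp_g_per_s  magnetic field ramp rate in G/s
    """
    # exp(-2*pi*V^2 / (hbar * dmu * R)), rewritten in frequency units
    exponent = 4.0 * math.pi ** 2 * v_hz ** 2 / (dmu_hz_per_g * ramp_g_per_s)
    return math.exp(-exponent)


def critical_ramp_rate(v_hz, dmu_hz_per_g):
    """Ramp rate (G/s) at which a single sweep gives a 50/50 split."""
    return 4.0 * math.pi ** 2 * v_hz ** 2 / (dmu_hz_per_g * math.log(2.0))


if __name__ == "__main__":
    dmu = 0.73 * MU_B_OVER_H  # Hz/G, illustrative input
    v = 15e3                  # Hz, illustrative input
    rc = critical_ramp_rate(v, dmu)
    print(f"critical ramp rate: {rc / 1e3:.1f} G/ms")
    for rate in (0.1 * rc, rc, 10.0 * rc):
        p = lz_diabatic_probability(v, dmu, rate)
        print(f"R = {rate / 1e3:6.1f} G/ms -> diabatic fraction {p:.3f}")
```

Under these assumptions the critical rate comes out at roughly 12.5 G/ms, the same order as the measured value. Ramps ten times slower are almost fully adiabatic, and ramps ten times faster are mostly diabatic, which is the tunability the text describes.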
These dissociation channels serve as the interferometer ‘output ports’.

FIG. 3: (color online). Series of dissociation patterns showing about one oscillation period with ∆E/h = 155 kHz at a hold field of 11.19 G. From one picture to the next the hold time τ is increased by 1 µs. The first and the last of the absorption images mainly show dissociation of l-wave molecules, whereas the central image shows predominant dissociation of g-wave molecules.

FIG. 4: Interferometer fringes for magnetic hold fields B0 below the crossing of |g′〉 and |l〉. The g-wave molecular fraction is plotted as a function of the hold time τ. Sinusoidal fits give the oscillation frequency as indicated. [Panel labels: B0 = 11.24 G, 94(1) kHz; B0 = 11.20 G, 132(1) kHz; B0 = 11.17 G, 169(1) kHz. Horizontal axis: hold time (µs).]

The interferometer is based on two subsequent passages through the crossing following the scheme illustrated in Fig. 2. For an initial magnetic field above the crossing a downward magnetic-field ramp brings the initial molecular state into a coherent superposition of |g′〉 and |l〉. After the ramp the field is kept constant at a hold field B0 below the crossing for a variable hold time τ. A differential phase φ is then accumulated between the two components, which increases linearly with the product of the binding energy difference ∆E and the hold time τ. The magnetic field is then ramped back up, and the second passage creates a new superposition state depending on φ. For a 50/50-splitting ratio, this can lead to complete destructive or constructive interference in the two output ports and thus to high-contrast fringes as a function of τ or ∆E. These fringes, resulting from two passages through the same crossing, are analogous to the well-known Stückelberg oscillations in collision physics [14, 26] or in the physics of Rydberg atoms [27, 28].
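The two-passage scheme described above can be illustrated with an idealized two-path model: a lossless beam splitter with single-passage splitting probability p, a differential phase φ = 2π(∆E/h)τ accumulated during the hold time, and a second identical passage. The sketch below is a minimal illustration under these assumptions. The real-valued splitter matrix, the function name `output_fractions`, and the lumping of any phase picked up during the ramps into a constant `phi0` are simplifications that are not taken from the Letter.

```python
import cmath
import math


def output_fractions(tau_s, delta_e_hz, p_split=0.5, phi0=0.0):
    """Populations in the two output ports after two passages.

    tau_s       hold time in seconds
    delta_e_hz  binding energy difference divided by h, in Hz
    p_split     probability to stay in the initial state in one passage
    phi0        additional constant phase (assumed; lumps ramp effects)
    """
    a = math.sqrt(p_split)
    b = math.sqrt(1.0 - p_split)
    phase = cmath.exp(1j * (2.0 * math.pi * delta_e_hz * tau_s + phi0))
    # First passage: start in port 0 and split into amplitudes (a, b).
    c0, c1 = a, b
    # Hold time: the second component acquires the differential phase.
    c1 = c1 * phase
    # Second passage: same unitary splitter [[a, -b], [b, a]].
    out0 = a * c0 - b * c1
    out1 = b * c0 + a * c1
    return abs(out0) ** 2, abs(out1) ** 2


if __name__ == "__main__":
    delta_e = 155e3  # Hz, the value quoted for a hold field of 11.19 G
    print(f"fringe period: {1e6 / delta_e:.2f} us")
    for k in range(8):
        tau = k * 1e-6  # 1 us steps, as in the image series of Fig. 3
        n0, n1 = output_fractions(tau, delta_e)
        print(f"tau = {k} us  port 0: {n0:.2f}  port 1: {n1:.2f}")
```

In this model the population in one port varies as 4p(1 − p) cos²(φ/2), so the contrast is complete only for p = 1/2. With ∆E/h = 155 kHz the fringe period is 1/(155 kHz) ≈ 6.5 µs, consistent with Fig. 3 covering about one oscillation in 1 µs steps.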
Note that our realization of a ‘Stückelberg interferometer’ gives full control over the interferometer dynamics by appropriate choice of ramp rates and magnetic offset fields.

A typical ramp sequence, as shown in Fig. 2 (b), starts with a sample of |g′〉 molecules at a magnetic field of 11.6 G, about 250 mG above the crossing. At the critical ramp rate Rc we ramp the magnetic field within about 50 µs to a hold field B0 below the crossing. After the variable hold time τ we reverse the ramp and traverse the crossing a second time at the critical ramp rate. The output of the interferometer is detected by rapidly ramping the magnetic field up to 22 G and by imaging the pattern of dissociating |l〉 and |g〉 molecules.

For one period of oscillation the dependence of the dissociation pattern on the hold time τ is demonstrated by the series of absorption images shown in Fig. 3. The hold time is increased in steps of 1 µs while the entire preparation, ramping, and detection procedure is repeated for each experimental cycle, lasting about 20 s. The molecular population oscillates from being predominantly l-wave to predominantly g-wave and back. For a quantitative analysis of the molecular population in each output port we fit the images with a simple model function [29] and extract the fraction of molecules in each of the two output ports. Fig. 4 shows the g-wave molecular population as a function of hold time τ for various hold fields B0 corresponding to different ∆E. The existence of these Stückelberg oscillations confirms that coherence is preserved by the molecular beam splitter. Their high contrast shows that near 50/50-splitting is achieved. Sinusoidal fits to the data allow for an accurate determination of the oscillation frequency and hence of ∆E.

Fig. 5 shows ∆E as a function of magnetic field strength near the avoided crossing. For magnetic fields below the crossing we obtain ∆E as described before.
For magnetic fields above the crossing, we invert the interferometric scheme. Molecules are first transferred from |g′〉 into |l〉 using a slow adiabatic ramp. The field is then ramped up above the crossing with a rate near Rc, kept constant for the variable time τ at the hold field B0 and then ramped down to close the interferometer. An adiabatic ramp through the crossing maps population in |g′〉 onto |l〉 and vice versa. Detection then proceeds as before. For both realizations of the interferometer we obtain high-contrast fringes even when it is not operated in the Landau-Zener regime and the fast ramps are stopped right at the crossing (see inset to Fig. 5). This allows us to measure ∆E in the crossing region. A fit to the data with a hyperbolic fit function according to the standard Landau-Zener model yields Bc = 11.339(1) G for the position of the crossing, ∆µ = 0.730(6) µB for the difference in magnetic moment of the two states involved (µB is Bohr’s magneton), and V = h × 14(1) kHz for the coupling strength. While the measured ∆µ agrees reasonably well with the result from an advanced theoretical model of the Cs2 dimer [18], Bc and V cannot be obtained from these calculations with sufficient accuracy.

FIG. 5: Interferometrically measured binding energy difference ∆E in the region of the crossing between states |g′〉 and |l〉 as a function of magnetic field. Solid circles: Standard ramp sequence of the interferometer. Open circles: Inverted scheme for field values above the crossing. The one-sigma statistical error from the sinusoidal fit is less than the size of the symbols. The solid curve is a hyperbolic fit to the experimental data. Inset: Oscillation at 26.6(3) kHz for a hold field B0 = 11.34 G right on the crossing. [Horizontal axes: magnetic field (G); inset: hold time (µs).]

The present interferometer allows us to observe up to 100 oscillations at 200 kHz. Shot-to-shot fluctuations increasingly scramble the phase of the oscillations for longer hold times until the phase appears to be fully randomized, while large amplitude variations of the molecular populations persist. The peak-to-peak amplitude of these fluctuations decays slowly and is still 50% of the initial contrast after 1 ms. We attribute this phase scrambling to magnetic field noise that causes shot-to-shot variations of ∆E that are, however, the same for each molecule. The large amplitude of these fluctuations indicates that phase coherence is preserved within the molecular sample. We attribute the gradual loss of peak-to-peak amplitude to spatial magnetic field inhomogeneities. We expect that straightforward technical improvements regarding the magnetic field stability and homogeneity, together with applying the interferometer to trapped molecular samples, will allow us to extend the hold times far into the millisecond range. It will then be possible to measure ultraweak crossings with coupling strengths well below h × 1 kHz.

We have demonstrated a molecular Stückelberg interferometer with full control over the interferometer dynamics. The interferometer can be used as a spectroscopic tool as it allows precise measurements of binding energy differences of molecular states. In particular, the technique can be employed to measure feeble interactions between molecular states, such as parity non-conserving interactions [30]. The sensitivity to ultraweak level crossings, combined with long storage times in optical molecule traps [3] or lattices [5, 6, 7], may allow the detection of interaction phenomena on the h × 1 Hz scale. In view of the rapid progress in various preparation methods for cold molecular samples, new tools for precision measurements on molecular samples, such as our Stückelberg interferometer, will open up exciting avenues for future research.

We thank E. Tiesinga for discussions and for theoretical support.
We acknowledge financial support by the Austrian Science Fund (FWF) within SFB 15 (project part 16) and by the EU within the Cold Molecules TMR Network under contract No. HPRN-CT-2002-00290. M.M. and C.C. acknowledge support by DOC [PhD-Program of the Austrian Academy of Science] and the FWF Lise-Meitner program, respectively.

[1] For a review, see T. Köhler, K. Góral, and P.S. Julienne, Rev. Mod. Phys. 78, 1311 (2006).
[2] See e.g. Ultracold Fermi Gases, Proceedings of the International School of Physics “Enrico Fermi”, Course CLXIV, Varenna, 20 - 30 June 2006, edited by M. Inguscio, W. Ketterle, and C. Salomon, in press.
[3] C. Chin et al., Phys. Rev. Lett. 94, 123201 (2005).
[4] J.R. Abo-Shaeer et al., Phys. Rev. Lett. 94, 040405 (2005).
[5] K. Winkler et al., Nature 441, 853 (2006).
[6] G. Thalhammer et al., Phys. Rev. Lett. 96, 050402 (2006).
[7] T. Volz et al., Nature Physics 2, 692 (2006).
[8] K. Winkler et al., Phys. Rev. Lett. 98, 043201 (2007).
[9] J. J. Hudson, B. E. Sauer, M. R. Tarbutt, and E. A. Hinds, Phys. Rev. Lett. 89, 023003 (2002).
[10] E. R. Hudson, H. J. Lewandowski, B. C. Sawyer, and J. Ye, Phys. Rev. Lett. 96, 143004 (2006).
[11] C. Chin and V. V. Flambaum, Phys. Rev. Lett. 96, 230801 (2006).
[12] S. Bize et al., J. Phys. B 38, S449 (2005).
[13] M. M. Boyd et al., Phys. Rev. Lett. 98, 083002 (2007).
[14] E.C.G. Stückelberg, Helv. Phys. Acta 5, 369 (1932).
[15] N.F. Ramsey, Molecular Beams (Oxford University Press, London, 1956).
[16] C. Chin et al., Phys. Rev. A 70, 032701 (2004).
[17] Using the notation |f, mf; l, ml〉 of Ref. [16], the three states |g〉, |g′〉, and |l〉 correspond to |4, 4; 4, 2〉, |6, 6; 4, 0〉, and |6, 3; 8, 3〉, respectively.
[18] E. Tiesinga (private communication).
[19] T. Kraemer et al., Appl. Phys. B 79, 1013 (2004).
[20] T. Weber et al., Science 299, 232 (2003).
[21] J. Herbig et al., Science 301, 1510 (2003).
[22] M. Mark et al., Europhys. Lett. 69, 706 (2005).
[23] M. Mark et al., manuscript in preparation.
[24] S. Knoop et al., manuscript in preparation.
[25] S. Dürr, T. Volz, and G. Rempe, Phys. Rev. A 70, 031601(R) (2004).
[26] E. E. Nikitin and S. Ya. Umanskii, Theory of Slow Atomic Collisions (Springer, Berlin, 1984).
[27] M. C. Baruch and T. F. Gallagher, Phys. Rev. Lett. 68, 3515 (1992).
[28] S. Yoakum, L. Sirko, and P. M. Koch, Phys. Rev. Lett. 68, 1919 (1992).
[29] We model the dissociation pattern with appropriately chosen spherical harmonic functions to account for the angular distribution [25].
[30] E. D. Commins, Adv. At. Mol. Opt. Phys. 40, 1 (1999).

ABSTRACT We report on the realization of a time-domain ‘Stückelberg interferometer’, which is based on the internal state structure of ultracold Feshbach molecules. Two subsequent passages through a weak avoided crossing between two different orbital angular momentum states in combination with a variable hold time lead to high-contrast population oscillations. This allows for a precise determination of the energy difference between the two molecular states. We demonstrate a high degree of control over the interferometer dynamics. The interferometric scheme provides new possibilities for precision measurements with ultracold molecules.

<|endoftext|><|startoftext|> Introduction

Galaxy counterparts to Damped Lyman-α systems (DLAs) seen in quasar (QSO) spectra have been suggested to be (proto)-disk galaxies with line of sight clouds of neutral gas with column densities N(H i) > 2 × 10²⁰ cm⁻² (Wolfe et al. 1986). Analyses of absorption line profiles have indicated that rotational components with velocities of ∼200 km s⁻¹ can be involved in these systems, suggesting that DLAs reside in large disk galaxies (Prochaska & Wolfe 1997; Ledoux et al. 1998a). On the other hand, numerical simulations show that in a hierarchical formation scenario merging proto-galactic clumps can also give rise to the observed line profiles (Haehnelt et al. 1998).
A large fraction of the neutral hydrogen present at z > 2 is contained in high column density DLA systems (Lanzetta et al. 1995; Storrie-Lombardi & Wolfe 2000; Péroux et al. 2001). In addition to the classical DLAs, clouds with column densities 10¹⁹ < N(H i) < 2 × 10²⁰ cm⁻² also show some degree of damping wings, which is characteristic of DLA systems. It is suggested that sub-DLA systems contain a significant fraction of the neutral matter in the Universe (Péroux et al. 2003). Metallicity studies have shown that the properties of the sub-DLA systems are similar to those of DLA systems (Dessauges-Zavadsky et al. 2003), apart from the latter category having large ionisation corrections (Prochaska & Herbert-Fort 2004).

Send offprint requests to: L. Christensen
⋆ Based on observations collected at the Centro Astronómico Hispano Alemán (CAHA), operated by the Max-Planck Institut für Astronomie and the Instituto Astrofisica de Andalucia (CSIC).

The association of DLAs with galaxies has been a subject of much study. Originally, either space-based or ground-based deep images were obtained to identify objects near the line of sight to the QSOs (Steidel et al. 1995; Le Brun et al. 1997; Warren et al. 2001). To confirm nearby objects as galaxies that are responsible for the DLA lines in the QSO spectra, follow-up spectra are needed to find the galaxy redshifts. At z < 1, confirmations of 14 systems exist to date (Rao et al. 2003; Chen & Lanzetta 2003; Lacy et al. 2003; Chen et al. 2005, and references therein), while at z ≳ 2 only 6 DLA galaxies are confirmed through spectroscopic observations of Lyα emission from the DLA galaxies (Møller & Warren 1993; Djorgovski et al. 1996; Møller et al. 1998; Leibundgut & Robertson 1999; Møller et al. 2002, 2004), three of which are located at the same redshifts as the QSOs themselves. Other techniques to identify DLA galaxies involve narrow-band imaging (e.g. Fynbo et al. 1999, 2000) or Fabry-Perot imaging.
A Fabry-Perot imaging study of several QSO fields did not result in detections of emission from DLA galaxies (Lowenthal et al. 1995), while recently the same method was used to identify a few emission line candidates (Kulkarni et al. 2006).

Integral field spectroscopy (IFS) presents an alternative that provides images and spectra at each point on the sky simultaneously. This technique can be used to look for emission line objects at known wavelengths but unknown spatial locations. It is therefore ideally suited to look for Lyα emission lines from the galaxies responsible for strong QSO absorption lines. At the Lyα wavelength corresponding to the redshift of the DLA system, the QSO emission has been absorbed, enabling us to locate emission line objects very near to the QSO line of sight. Because of the large column densities in DLAs and the resonant nature of Lyα photons the corresponding emission line may be offset in velocity space relative to the DLA line (e.g. Leibundgut & Robertson 1999), but such an offset is not always observed (e.g. Møller et al. 2004).

http://arxiv.org/abs/0704.0654v1
L. Christensen et al.: An IFS survey for high-z DLA galaxies

IFS searches for emission from DLA galaxies towards two QSOs have resulted in upper limits for their fluxes (Petitjean et al. 1996; Ledoux et al. 1998b), while a sub-DLA galaxy previously known to be a Lyα emitter was confirmed with IFS (Christensen et al. 2004). Here we present a survey using IFS to look for Lyα emitting DLA galaxies. Section 2 describes the sample of QSOs included in the survey, which were previously known to have DLAs and sub-DLAs in their spectra. Section 3 describes the observations and data reduction. In Section 4 the method of detecting emission line candidates is described. Section 5 presents the results and comments on each object.
Properties of the Lyα emission candidates detected in the survey in relation to the six previously known Lyα emitting DLA galaxies are presented in Section 6. Section 7 summarises our findings. A flat cosmology with H0 = 70 km s⁻¹ Mpc⁻¹, Ωm = 0.3, and ΩΛ = 0.7 is used throughout.

This study, as well as previous ones that try to identify the host galaxies of DLA systems, can be biased since the galaxy detected at the right redshift is likely to be the brightest emission line object close to the line of sight. If the host galaxy is a much fainter galaxy in a group, it will not be identified correctly. In the remaining part of the paper, an ‘identified’ DLA galaxy is one whose (line) emission is confirmed by independent observations, while the ‘candidates’ are reported only in these IFS observations. Although extensive tests are done on the data to distinguish the candidates from potential artifacts, independent observations are needed to confirm them as Lyα emitters connected with the DLAs.

2. Sample selection

We selected a number of DLA systems without previous detections of associated Lyα emission. The selected QSOs with known DLAs were chosen based on the following criteria:
1. N(H i) > 2 × 10²⁰ cm⁻²
2. DLA redshift 2 < z < 4
3. Northern hemisphere object

The first criterion includes only classic DLA systems. Many QSO spectra show additional sub-DLA systems. Although their relationship to DLAs is still debated, these systems were included in the survey because of their probable physical association with galaxies.

To increase the sample size with a minimum number of pointings we preferentially selected QSOs with multiple DLAs. IFS covers a range of wavelengths, and correspondingly Lyα emission at a large range of redshifts in the line of sight for each QSO.
However, in retrospect, this can affect the emission line detections, because extinction in foreground DLAs could obscure emission from background ones when the galaxies lie in the same line of sight. Hence, upper limits on detections of the higher redshift systems can be biased.

From the list of DLA systems compiled by S. Curran¹, we found 66 QSOs matching these criteria in 2003. More recently, detections of DLAs in the Sloan Digital Sky Survey QSO spectra have greatly increased the number of known DLAs (Prochaska & Herbert-Fort 2004; Prochaska et al. 2005). A systematic survey of all 66 objects would require a large amount of time with present instruments, so we selected a few systems based on their observability during the allocated observing runs. We avoided DLAs with Lyα absorption lines close to sky emission lines.

¹ http://www.phys.unsw.edu.au/~sjc/dla/

The total sample consists of 9 QSOs with a total of 14 DLA systems as listed in Table 1. These QSOs have an additional 8 sub-DLAs which are included in the survey. Because of the small number of DLAs involved in the survey, a proper statistical study is not the aim of this paper. Instead we focus on a few systems to exploit the benefits of IFS for this kind of investigation.

To study the applicability of IFS in identifying DLA galaxies we initially observed DLA galaxies where Lyα emission had been reported previously in the literature. Two of these systems could be observed during our runs: Q2233+131 and PHL 1222, originally identified by Steidel et al. (1995), Djorgovski et al. (1996) and Møller et al. (1998). Both objects are reported to have extended Lyα emission (Fynbo et al. 1999; Christensen et al. 2004). Table 1 includes these two previously known Lyα emitting DLA and sub-DLA galaxies, although they do not satisfy the criteria listed above. The absorption system towards Q2233+131 has a column density that classifies it as a sub-DLA.
Unless otherwise noted, these two objects are kept separate from the detection of candidate emission line objects in the remainder of the paper.

Most of the DLAs in the IFS study lie towards bright QSOs (R < 19). This ensured that the PSF variations as a function of wavelength could be determined, which was necessary for the subtraction of the QSO emission. Bright QSOs had larger residuals from the subtraction of the continuum emission, which potentially affected our ability to recover emission line objects that were offset in velocity space and located closer than 1″ to the QSO line of sight. However, tests with artificial objects showed that this was a minor problem for the data set (see Sect. 5.3).

3. Observations and data reduction

Using the Potsdam Multi Aperture Spectrophotometer (PMAS) mounted on the 3.5 m telescope at Calar Alto we observed the objects listed in Table 2 during several runs from 2002–2004. The PMAS integral field unit (IFU) was used in the standard configuration where 256 fibres are coupled to a 16×16 element lens array. During the observations each fibre covered 0.5″ × 0.5″ on the sky, giving a field of view of 8″ × 8″. Each fibre resulted in a spatial element (spaxel) represented by a single spectrum. The 256 spectra were recorded on a 2k×4k CCD which was read out in a 2×2 binned mode. With a separation of 7 pixels between individual spectra, the fibre to fibre cross-talk was negligible (less than 0.4% for an extraction of all 7 pixels). Detailed overviews of the PMAS instrument and capabilities are given in Roth et al. (2000, 2005).

For individual exposures a maximum time of 1800 s was used because of the large number of pixels affected by cosmic ray hits. Furthermore, because of varying conditions such as the atmospheric transmission and seeing, the total exposure time for each object was adjusted, or sometimes an observation was repeated under better conditions.
The photometric conditions during observations were monitored in real time with the PMAS acquisition and guiding camera (A&G camera) which is equipped with a 1k×1k CCD. Seeing values listed in Table 2 refer to the seeing FWHM measured in the A&G camera images. Determining actual spectrophotometric conditions requires monitoring of the extinction coefficients which cannot be determined from the A&G camera images. In Table 2 ‘stable’ means that the photometry of the guiding star was constant within 1% during the observations.

Coordinate name  Alt. name        zem    zabs   log N(H i)   [Fe/H]       [Si/H]       Ref.
Q0151+048A       PHL 1222         1.93   1.934  20.36±0.10   ...          ...          (1)
Q0953+4749       PC 0953+4749     4.457  3.404  21.15±0.15   >–2.178      >–2.09       (2,3)
                                         3.891  21.20±0.10   >–1.712      >–1.60
                                         4.244  20.90±0.15   –2.50±0.17   –2.23±0.15
Q1347+112        ...              2.679  2.471  20.3         ...          ...          (4,5,6)
                                         2.05   20.3†        ...          ...          (7)
Q1425+606        SBS 1425+606     3.163  2.827  20.30±0.04   –1.33±0.04   >–1.03       (8,9,10,21)
Q1451+1223       B1451+123        3.246  2.469  20.39±0.10   –2.54±0.12   –1.95±0.16   (11,24)
                                         3.171  19.70±0.15   –1.87±0.16   –1.62±0.15   (19)
                                         2.254  20.30±0.15   –1.47±0.17   >–0.40       (6,12,19)
Q1759+7539       GB2 1759+756     3.05   2.625  20.76±0.10   –1.21±0.10   –0.82±0.10   (13,18,20)
                                         2.91   19.8         –1.65±0.01   –1.26±0.01   (18)
Q1802+5616       PSS J1802+5616   4.158  3.391  20.30±0.10   –1.54±0.11   >–1.55       (23)
                                         3.554  20.50±0.10   –1.93±0.12   >–1.82
                                         3.762  20.55±0.15   –1.82±0.26   >–1.74
                                         3.811  20.35±0.20   –2.19±0.23   –2.04±0.22
Q2155+1358       PSS J2155+1358   4.256  3.316  20.55±0.15   >–1.68       –1.26±0.17   (3)
                                         3.142  19.94±0.10   –2.21±0.21   –1.85±0.12   (14,19)
                                         3.565  19.37±0.15   <–2.40       –1.27±0.16   (14,19)
                                         4.212  19.61±0.15   –2.18±0.25   –1.92±0.11   (14,19)
Q2233+131        ...              3.295  3.153  20.0         >–1.4±0.1    >–1.04       (15,16,17)
                                         2.551  20.0         ...          ...          (12,16)

Table 1. List of the observed DLA and sub-DLA systems with column densities and metallicities taken from the literature. † denotes a system where the reported N(H i) needs to be confirmed through high resolution spectroscopy. References for either DLA redshifts or metallicities: (1) Møller et al. (1998), (2) Schneider et al. (1991), (3) Prochaska et al. (2003d), (4) Smith et al. (1986), (5) Wolfe et al. (1986), (6) Turnshek et al. (1989), (7) Wolfe et al. (1995), (8) Chaffee et al. (1994), (9) Stepanian et al. (1996), (10) Prochaska et al. (2002a), (11) Bechtold (1994), (12) Lanzetta et al. (1991), (13) Prochaska et al. (2001), (14) Péroux et al. (2003), (15) Steidel et al. (1995), (16) Lu et al. (1997), (17) Prochaska et al. (2003a), (18) Outram et al. (1999), (19) Dessauges-Zavadsky et al. (2003), (20) Prochaska et al. (2002b), (21) Lu et al. (1996), (22) Prochaska et al. (2003c), (23) Prochaska et al. (2003b), (24) Petitjean et al. (2000).

The data were obtained using two gratings, one with 300 lines mm⁻¹ and one with 600 lines mm⁻¹, set at a chosen tilt to cover a selected wavelength range. The FWHM of the sky lines was measured to be 6.4 and 3.2 Å, respectively. Observations of spectrophotometric standard stars were carried out at the beginning and end of each night at the grating position used for the observations.

Data reduction was done by first subtracting an average bias frame. Before extracting the 256 spectra most cosmic ray hits were removed by the routine described in Pych (2004). A high threshold was chosen such that not all cosmic rays were removed, because a low threshold would also remove bright sky emission lines from some spectra. Remaining cosmic rays were removed from the extracted spectra using the program L.A. Cosmic (van Dokkum 2001).

The locations of the spectra on the CCD were found from exposures of a continuum lamp, taken either before or after the science exposures, using a tracing algorithm developed for the IDL based PMAS data reduction package P3D (Becker 2002). The spectral extraction was done in two ways: a ‘simple extraction’ that added all flux from each spectrum on the CCD (i.e.
an extraction width of 7 pixels), and another method that took into account the profile of the spectrum on the CCD. This second method assumed that the spectral profiles were represented by Gaussian functions (Gaussian extraction) where the widths were allowed to vary with wavelength. Widths were determined by fits to each of the 256 spectra as a function of the wavelength, and the extraction used these widths in combination with the centre found from the tracing algorithm. The Gaussian profile is an approximation because the profiles are not strictly Gaussian. The second method increased the signal-to-noise ratio by >10% for faint objects and therefore, unless otherwise noted, the results from the ‘Gaussian extraction’ data cubes will be reported (see also Sánchez 2006).

After extraction, the spectra were wavelength calibrated using exposures of emission line lamps taken just before or after the observations. The wavelength calibration was done using the P3D reduction tool. Comparisons with sky emission lines indicated an accuracy of the wavelength calibration of about 10% of the spectral resolution.

The spectra show a wavelength-dependent fibre-to-fibre transmission. To correct for this effect, we extracted sky spectra obtained from twilight sky observations in the same way as the science observations. A one-dimensional average sky spectrum was calculated. Each of the 256 spectra was divided by this average spectrum, and the ratio was fit by a polynomial function to reduce noise. These polynomials were used to flat field the science spectra.

Before combining individual frames, the extracted spectra were arranged into data cubes. Each data cube was corrected for extinction using an average extinction curve for Calar Alto (Hopp & Fernandez 2002). The data cube combination took into account a correction for the differential atmospheric refraction using a theoretical prediction (Filippenko 1982). Relative spatial shifts between individual data cubes were determined from a two-dimensional Gaussian fit to the QSO PSF at a wavelength close to the strong DLA absorption lines.

QSO          date          exp. time (s)   grating   λ coverage (Å)   seeing    conditions
Q0151+048A   2003-08-27    5×1800          V600      3500–5080        0.8–1.2   stable
Q0953+4749   2004-04-16    4×1800          V300      3630–6980        0.9       stable
             2004-04-21    5×1800          V300      3630–6980        1.0       non-phot.
Q1347+112    2004-04-20    7×1800          V300      3630–6980        0.6       non-phot.
Q1425+606    2004-04-19    6×1800          V300      3630–6750        1.0       stable
Q1451+1223   2004-04-17    7×1800          V300      3630–6980        0.8       non-phot.
Q1759+7539   2004-04-21    7×1800          V300      3630–6980        1.0–1.5   non-phot.
Q1802+5616   2003-06-18    2×1800          V600      5100–6650        1.0       non-phot.
             2003-06-20    3×1800          V600      5100–6650        1.0       non-phot.
             2003-06-21    4×1800          V600      5100–6650        1.8       non-phot.
             2003-06-22    6×1800          V600      5100–6650        0.9       stable
Q2155+1358   2003-08-26    7×1800          V600      4015–5610        0.7       stable
             2003-08-27    4×1800          V600      4015–5610        0.8       non-phot.
Q2233+131    2002-09-03†   4×1800          V300      3930–7250        1.0–1.3
             2003-08-24    6×1800          V600      4000–5600        0.6       stable
             2003-08-25    4×1800          V600      4000–5600        0.7       non-phot.

Table 2. Log of the observations. The last two columns show the average seeing during the integrations and the photometric conditions derived from the A&G camera images. † Results from these observations are published in Christensen et al. (2004).

Subtraction of the sky background was an iterative process because the locations of the emission line objects of interest were not known beforehand. PMAS, in the configuration used, does not have specifically allocated sky fibres. Instead, we selected 10–20 fibres uncontaminated by the QSO emission and the average spectrum was subtracted from all 256 spectra. Different spaxel selections were examined visually to select an appropriate sky spectrum which had no emission line or noisy pixels in the spectral region of interest.
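The sky subtraction step described above can be sketched in a few lines. This is a minimal illustration only, not the code used in the PMAS reduction: the function name `subtract_sky`, the list-of-lists layout of the spectra and the fibre indices in the toy example are assumptions made for the sketch.

```python
def subtract_sky(spectra, sky_fibres):
    """Subtract the mean spectrum of the chosen sky fibres from every spectrum.

    spectra    : list of spectra, each a list of flux values on a common
                 wavelength grid (hypothetical layout, one entry per fibre)
    sky_fibres : indices of fibres judged free of QSO and object emission
    """
    if not sky_fibres:
        raise ValueError("at least one sky fibre is required")
    n_pix = len(spectra[0])
    # Average sky spectrum, pixel by pixel, over the selected fibres.
    sky = [sum(spectra[i][p] for i in sky_fibres) / len(sky_fibres)
           for p in range(n_pix)]
    # The same average sky spectrum is removed from all fibres.
    cleaned = [[flux - s for flux, s in zip(spec, sky)] for spec in spectra]
    return cleaned, sky


# Toy example: 4 fibres, 3 wavelength pixels; fibre 0 carries object flux.
spectra = [[5.0, 9.0, 5.0],
           [1.0, 2.0, 1.0],
           [3.0, 2.0, 1.0],
           [2.0, 2.0, 1.0]]
cleaned, sky = subtract_sky(spectra, sky_fibres=[1, 2, 3])
```

In practice the fibre selection would be repeated with different spaxel sets, as in the iterative procedure described in the text, until the resulting sky spectrum shows no emission line or noisy pixels near the wavelength of interest.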
Flux calibration was done in the standard way using observations of spectrophotometric standard stars. A one-dimensional spectrum of the standard star was constructed by co-adding flux from all 256 spaxels. This was used to create a sensitivity function that could be applied to each of the 256 spectra in the science exposures. For non-photometric nights the flux calibrated spectra were compared with QSO spectra from the literature to estimate photometric errors. However, no correction factor was applied to our spectra, because an intrinsic variability of the QSOs would make such scaling uncertain. For some cases we note in Sect. 5.5 that there are differences which could be caused by either non-photometric conditions or intrinsic variability.

For reference we show spectra of the target QSOs in Fig. 1. Where present, metal absorption lines corresponding to the highest column density DLAs are indicated. For QSOs with multiple DLA lines, only the DLA lines and their redshifts are indicated, since the wavelength coverage does not include lines redwards of the QSO Lyα line. A detailed analysis of metal absorption lines requires higher resolution spectroscopy as presented elsewhere (e.g. Prochaska et al. 2003c; Péroux et al. 2003; Dessauges-Zavadsky et al. 2003). DLA redshifts derived from the metal absorption lines were consistent with those reported in the literature within the accuracy of the wavelength calibration of the data cubes.

4. Search for DLA optical counterparts

The observations covered the wavelengths of Lyα for all but one of the strong absorption systems listed in Table 1. Only the highest redshift sub-DLA system towards Q2155+1358 was not covered, i.e. the total number of systems included in this analysis is 21 DLA and sub-DLA systems.

4.1.
Expected sizes

For this project we are only interested in small wavelength regions corresponding to Lyα at the DLA redshifts, and thus the search for candidate galaxies could be carried out using customised narrow-band filters. IFS, on the other hand, has the advantage that the widths of the narrow-band filters can be adjusted to match those of the emission lines. Typically customised narrow-band filters have a larger transmission FWHM than the spectral resolution of IFS data. Hence, IFS allows detection of emission line objects with a higher signal-to-noise ratio than that possible with narrow-band filters. A disadvantage is the relatively small field of view of current IFUs, but this is not a serious concern.

One can estimate the expected sizes of DLA galaxies (see Wolfe et al. 1986). Using a Schechter luminosity function and a power-law relation between the disk luminosity and gas radius given by the Holmberg relation R/R∗ = (L/L∗)^β, one can calculate the expected impact parameter. Combining β = 0.26 found for DLA galaxies at z < 1 (Chen et al. 2005) with the luminosity function for z = 3 galaxies (Poli et al. 2003) one finds R∗ ≈ 30 kpc. If DLA galaxies are similar to or fainter than L∗ galaxies this implies that DLA galaxies at z > 2 are expected to lie closer than ∼4′′ from the QSO line of sight. The small field of view of IFUs is therefore well suited to search for Lyα emission from DLA galaxies.

The estimated galaxy sizes are highly dependent on the parameters of the DLA galaxy luminosity and slope β. Most probably, high redshift DLA galaxies are not regular disks like those in the local universe. Numerical models of DLAs predict that the galaxies are mostly smaller than 10 kpc, while observations that give limits on the star-formation rates associated with DLAs suggest that DLAs are located in neutral gas around Lyman break galaxies (Wolfe & Chen 2006).
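The conversion from R∗ ≈ 30 kpc to an angular separation of ∼4′′ quoted above can be checked with a short calculation. The sketch below assumes a flat ΛCDM cosmology with H0 = 70 km s−1 Mpc−1 and Ωm = 0.3; these values, and the function names `kpc_per_arcsec` and `holmberg_radius`, are assumptions made for illustration and are not taken from the text.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def kpc_per_arcsec(z, h0=70.0, omega_m=0.3, steps=2000):
    """Proper transverse scale in kpc per arcsec for a flat Lambda-CDM model.

    h0 (km/s/Mpc) and omega_m are assumed values, not taken from the paper.
    steps must be even for the Simpson rule used below.
    """
    omega_l = 1.0 - omega_m

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    # Comoving distance via composite Simpson integration of dz / E(z).
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    d_c_mpc = (C_KM_S / h0) * total * h / 3.0
    d_a_kpc = d_c_mpc / (1.0 + z) * 1000.0  # angular diameter distance
    arcsec_in_rad = math.pi / (180.0 * 3600.0)
    return d_a_kpc * arcsec_in_rad


def holmberg_radius(l_over_lstar, r_star_kpc=30.0, beta=0.26):
    """Gas radius in kpc from the Holmberg relation R/R* = (L/L*)^beta."""
    return r_star_kpc * l_over_lstar ** beta


scale = kpc_per_arcsec(3.0)                  # kpc per arcsec at z = 3
theta_lstar = holmberg_radius(1.0) / scale   # arcsec, L = L* galaxy
theta_faint = holmberg_radius(0.1) / scale   # arcsec, L = 0.1 L* galaxy
```

Under these assumptions an L∗ galaxy at z = 3 subtends a little under 4′′, and a galaxy ten times fainter roughly half of that, so both fall well inside an 8′′ field centred on the QSO.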
As DLA galaxies at z > 2 are generally found to be fainter than an L∗ galaxy (Colbert & Malkan 2002), we choose to consider only objects with impact parameters smaller than 30 kpc for a more detailed analysis. The impact parameters that we measure in the data correspond to the radially projected distances, so the real distances to the absorber can be larger. Two candidates are found at impact parameters larger than 30 kpc, and they are likely not associated directly with the absorbers themselves.

[Fig. 1. Spectra of the target QSOs. Vertical axis: fλ in units of 10−16 erg s−1 cm−2 Å−1. Labelled absorption lines include Lyβ (1026), Lyα (1215.67), Si II (1260.42), O I (1302.17), C II (1334.53), Si IV (1393.76, 1402.77), C IV (1549), Fe II (1608.45), Al II (1670.79) and Al III (1854.7, 1862.8); for Q0953+4749 and Q1802+5616 only the Lyα lines of the individual DLAs and of the QSO are labelled.]

4.2.
Candidate selection

Some Lyα emission lines from DLA galaxies are offset from the QSO-DLA line by ∼200 km s−1 (Møller et al. 2002), whereas Lyα emission from high redshift galaxies can have even larger offsets from the galaxy systemic redshift (Shapley et al. 2003). We therefore chose to focus on regions in the data cubes with velocities ranging from approximately –1000 to +1000 km s−1 from the DLA lines.

First, the reduced data cubes were stacked in a two-dimensional frame and inspected visually around the DLA lines for emission line objects. When the spatial offset from the QSO is larger than the seeing, or alternatively when the QSO is very faint, emission line objects can be identified directly because of the ordering of the spectra in the stacked spectrum. Where no objects could be detected visually, further sampling of the data cubes was necessary to increase the signal-to-noise ratio to detect candidate emission line objects. Inspection of the data cubes was done using the Euro3D visualisation tool (Sánchez 2004).

From the reduced, sky-subtracted and combined data cubes, narrow-band images were created with an initial width of 10–15 Å depending on the spectral resolution of the observations. A set of images was created offset by –10 to +10 Å from the DLA line to allow for possible velocity shifts of the Lyα emission line, and inspected visually for objects brighter than the background. If detected, spectra from these brighter regions were co-added and inspected for emission lines at the wavelength chosen in the narrow-band image. This step was necessary to discriminate between emission lines and individual noisy spectra. It is known that three blocks of 16 fibres, i.e. 48 fibres in an area of 1.′′5 to the west in the field of view, have lower than average transmission. The effect of correcting for the total throughput was that these spectra had lower signal-to-noise ratios.
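As an illustration of the search window and narrow-band image construction described above, the following sketch converts a velocity range around the DLA Lyα line into observed wavelengths and collapses a cube over a chosen wavelength interval. It is a hedged example rather than the procedure actually used: the function names, the `cube[y][x][k]` nested-list layout and the non-relativistic Doppler approximation are assumptions.

```python
LYA_REST = 1215.67      # rest wavelength of Lyα in Å
C_KM_S = 299792.458     # speed of light in km/s


def lya_window(z_dla, v_min=-1000.0, v_max=1000.0):
    """Observed wavelength range (Å) for velocities relative to the DLA line.

    Uses the non-relativistic Doppler shift, adequate for |v| ~ 1000 km/s.
    """
    centre = LYA_REST * (1.0 + z_dla)
    return centre * (1.0 + v_min / C_KM_S), centre * (1.0 + v_max / C_KM_S)


def narrow_band_image(cube, wavelengths, lam_lo, lam_hi):
    """Sum cube[y][x][k] over the channels with lam_lo <= wavelength <= lam_hi."""
    chans = [k for k, lam in enumerate(wavelengths) if lam_lo <= lam <= lam_hi]
    return [[sum(spaxel[k] for k in chans) for spaxel in row] for row in cube]


# Search window for a DLA at z = 3 (toy value).
lam_lo, lam_hi = lya_window(3.0)

# Toy 2 x 2 spaxel cube with 5 channels; one spaxel holds an emission line.
wavelengths = [4850.0, 4855.0, 4860.0, 4865.0, 4870.0]
cube = [[[0.0] * 5, [0.0, 0.0, 2.0, 3.0, 0.0]],
        [[0.0] * 5, [0.0] * 5]]
centre = LYA_REST * 4.0
image = narrow_band_image(cube, wavelengths, centre - 5.0, centre + 5.0)
```

For z = 3 the ±1000 km s−1 window spans roughly ±16 Å around 4862.7 Å, and the 10 Å wide toy image picks up only the two channels containing the line.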
When narrow-band images were created from the cubes, the higher variance in these spaxels could result in extreme values, seemingly inconsistent with the neighboring spaxels. Only by looking at the spectrum associated with a bright spaxel could it be determined if an emission line was present, or if the spectrum was just noisy.

If an emission line was seen, a second pass narrow-band image was created using the value of the emission line width to increase the signal of the detection. A second pass one-dimensional spectrum was created after inspecting the narrow-band image for more bright spaxels surrounding the emission line candidate. This procedure was iterated until the signal in either narrow-band images or spectra did not increase. We found that an interactive visual identification of faint emission lines was more effective than an automatic routine.

To allow a better visual detection of emission line objects, the narrow-band images were interpolated to pixel scales 0.′′2 pixel−1 as shown in Fig. 2. In all panels the images are 8′′ by 8′′, with orientation north up and east left. The left panels show interpolated images of the QSO at wavelengths near to the DLA line. Contours correspond to an image centered on the visually detected emission feature close to the DLA redshift. In the middle panels in Fig. 2 the plots are reversed, such that the image shows the emission line object and the contours correspond to the QSO narrow-band image. Here, the innermost contour corresponds to the seeing FWHM.

To enhance the visibility of the candidates the QSO emission was subtracted from the data cubes before creating the images. This subtraction of the QSO emission was done using a simple approach (see Christensen et al. 2006). A scale factor was determined for each spaxel by dividing each spectrum by the extracted one-dimensional QSO spectrum.
Using this scale factor, the QSO emission was subtracted, a process which retains objects with spectral characteristics different from the QSO in the data cube.

The spectra of the candidates are shown in the right hand column in Fig. 2. These are created by co-adding spectra from between 4 and 10 spaxels. The dotted line corresponds to the 1σ noise level determined from a statistical analysis of the pixel values in the data cube, while the lower sub-panels show the background noise spectra in the data cubes, obtained from 4–10 background spaxels.

Properties of the candidate objects corresponding to those with spectra in Fig. 2 are listed in Table 3. Offsets in RA, DEC from the QSO and the corresponding projected distance at the DLA redshift are listed in columns 2, 3, and 4. Emission lines were fit using ngaussfit in IRAF, and the redshifts listed in column 5 are derived from these fits. Fluxes in column 6 are derived from the Gaussian fits, and errors in the peak intensity, line width, and continuum placement are propagated to calculate the uncertainties. The fluxes have not been corrected for the Galactic extinction. The flux measurements and the associated errors indicate that most of the candidates are detected at a significance below 4σ. Column 7 gives the velocity difference between the systemic DLA redshift and the candidate Lyα emission lines. We integrate the signal-to-noise estimate over the emission line (Column 8), S/Nint = f/(√N × σ), where f is the line flux, N is the number of pixels the emission line covers and σ is the noise in adjacent wavelength intervals. Column 9 gives the observed emission line FWHM after correcting for the instrumental resolution. Finally, column 10 gives the significance class of the candidate detection, which is explained in Section 5.1.

Columns 3 and 4 in Table 4 list the values of Galactic reddening towards each QSO (Schlegel et al.
1998), and the correction factors to be applied to the candidate fluxes for a Milky Way extinction curve (Fitzpatrick 1999).

[Fig. 2. Left panels: narrow-band images of the QSOs with overlaid contours of narrow-band images centered on the Lyα wavelengths of the DLAs. The images are 8′′ square. Contour levels correspond to 2, 3, 4σ levels above the background noise. Middle panels: the reverse, where the contours are arbitrary apart from the central one that shows the QSO seeing FWHM. Right hand panels: spectra of candidate emission line objects created from co-adding spectra from spaxels associated with the emission line candidates, plotted against wavelength [Å]. The widths of the grey bars over the emission lines correspond to the wavelength ranges of the narrow-band images. The lines below the spectra show the background sky noise spectra determined from background spaxels. Panels: DLA Q0151+048A; DLA Q0953+4749 (z=3.404); DLA Q0953+4749 (z=3.891); DLA Q0953+4749 (z=4.244), no candidate; DLA Q1347+112 (z=2.484); DLA Q1347+112 (z=2.057); DLA Q1425+606; DLA Q1451+1223; sub-DLA Q1451+1223; DLA Q1759+7539; sub-DLA Q1759+7539; DLA Q1802+5616 (z=3.391); DLA Q1802+5616 (z=3.762); sub-DLA Q2155+1358 (z=3.146); DLA Q2155+1358; sub-DLA Q2233+131 (z=3.153). No candidates are found for the z = 2.254 DLA towards Q1451+1223, the z = 3.554 and z = 3.811 DLAs towards Q1802+5616, the z = 3.565 sub-DLA towards Q2155+1358, or the z = 2.51 sub-DLA towards Q2233+131.]

QSO          ∆RA       ∆DEC       b          z        fλ          ∆v         S/Nint   FWHM       significance
             (′′)      (′′)       (kpc)                           (km s−1)            (km s−1)   class
(1)          (2)       (3)        (4)        (5)      (6)         (7)        (8)      (9)        (10)
Q0151+048A   2.5:0.4   –2.5:1.7   3.3:25.4   1.9363   (150±70)    +225       7        280±220    conf.
Q0953+4749   –1.2      –0.2       9.0        3.4041   (6.6±2.9)   0          15       290±260    2
             –0.5      1.8        11.1       3.9029   (4.9±2.1)   +730       16       570±230    3
Q1347+112    –2.0      1.5        20.2       2.4835   (3.5±1.9)   +1080      8        190±370    3
             –4.1      0.4        34.3       2.0568   (4.2±3.0)   +670       7        610±210    1
Q1425+606    –4.1      0.4        32.3       2.8280   (8.5±3.1)   +80        14       590±220    3
Q1451+1223   –3.0      3.8        39.2       2.4764   (5.8±2.6)   +640       10       320±260    3
             –1.0      2.8        22.5       3.1739   (3.1±2.0)   +210       7        <100       3
Q1759+7539   –1.4      3.5        30.0       2.6377   (5.8±3.0)   +1050      10       290±220    2
             0.1       –1.2       9.4        2.9090   (6.0±2.9)   –80        21       240±260    1
Q1802+5616   –0.2      –2.0       14.9       3.3820   (3.5±0.9)   –610       10       180±90     4
             0.2       1.8        12.9       3.7652   (4.6±1.9)   +200       7        <100       2
Q2155+1358   0.7       1.2        10.2       3.3174   (9.4±3.0)   +100       9        780±210    3
             1.7       1.9        19.4       3.1461   (4.1±2.4)   +290       7        260±220    3
Q2233+131    0.6       2.3        18.0       3.1543   (9.6±2.5)   +90        9        230±110    conf.

Table 3. Properties of candidate Lyα emission lines. Columns 2, 3, and 4 list the offsets of the candidate in RA and DEC and in projected kpc at the Lyα emission redshifts (in col. 5), respectively. Column 6 lists the integrated Lyα flux in units of 10−17 erg cm−2 s−1, and column 7 the velocity offset from the DLA redshift. Column 8 lists the integrated signal-to-noise ratio of the Lyα emission line, and column 9 gives the line width of the emission lines. Fluxes have not been corrected for Galactic extinction. Column 10 lists the significance class of the detections as described in Sect. 5.5. ‘Conf.’ implies candidates that were confirmed previously (Møller et al. 1998; Djorgovski et al. 1996).

5. Results

This section describes the classification of the emission line candidates. We estimate the contamination from spurious detections and from interlopers. Notes on each observed object are presented as well.

5.1. Candidate significance class

To estimate how reliable the candidate detection was, various tests were applied to the data cubes. The candidates were assigned a significance class: 1, 2, 3, and 4 according to how many of the following tests were passed.

1. Instead of co-adding all data cubes, two independent subsets of exposures were created, and the emission line candidate was visible in both subset combinations. In Sect. 5.5 these will be referred to as subcombinations.
2. The emission line candidate was visible in the ‘simple extractions’, i.e. where a Gaussian profile was not assumed.
3. Emission line candidates were visible in the narrow-band images when the QSO spectrum was subtracted from the data cube.
4. Emission line objects that were directly visible in individual or combined data cubes, or in the stacked spectra.

In all cases, for a candidate to be considered further it was required to be detected above 3σ in both narrow-band images and the associated spectra.
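The bookkeeping behind the significance class can be written down compactly. The sketch below is only an illustration of the counting rule stated above: the class `CandidateTests`, the function `significance_class` and the example candidates are hypothetical, and returning `None` for objects below the 3σ requirement is a choice made here, not something specified in the text.

```python
from dataclasses import dataclass


@dataclass
class CandidateTests:
    in_both_subcombinations: bool   # test 1
    in_simple_extraction: bool      # test 2
    in_qso_subtracted_image: bool   # test 3
    directly_visible: bool          # test 4


def significance_class(tests, snr_image, snr_spectrum, threshold=3.0):
    """Return the number of tests passed, or None if the candidate is not
    detected above the threshold in both the image and the spectrum."""
    if snr_image <= threshold or snr_spectrum <= threshold:
        return None
    return sum([tests.in_both_subcombinations, tests.in_simple_extraction,
                tests.in_qso_subtracted_image, tests.directly_visible])


# Hypothetical candidates, not entries from Table 3.
strong = CandidateTests(True, True, True, True)
marginal = CandidateTests(True, True, False, False)
class_strong = significance_class(strong, snr_image=4.0, snr_spectrum=3.5)
class_marginal = significance_class(marginal, snr_image=3.2, snr_spectrum=3.1)
```

Counting tests rather than quoting a single σ value keeps the classification tied to independent checks of the reduction (subsets of exposures, extraction method, QSO subtraction), which is the intent described in the text.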
The significance classes of the candidates are listed in column 10 in Table 3, and comments for each object are described in Section 5.5. Since the candidates were found from visual inspections of the data cubes, this classification was done to describe the candidates in a more qualitative manner. As the classes involve various tests on the data sets, this classification goes beyond the simple statistical significance in terms of the σ level at which the object is detected.

5.2. Non-detections

In the data cubes where no candidates were found, we estimated the upper limits for the emission line fluxes. Spectra from spaxels within one seeing element (e.g. 4 spaxels corresponding to a seeing of 1′′) were co-added to create a one-dimensional spectrum. Artificial emission lines with varying line fluxes were added to this spectrum at the DLA wavelength, and Gaussian profile fits to these lines were used to estimate the detection level. The results are listed in Table 5. The varying limits are due to the wavelength dependent noise in the data cubes and in particular the presence of residuals from nearby sky emission lines.

QSO                       EB−V    ffrac
Q0151+048A                0.044   1.216
Q0953+4749 (z = 3.4041)   0.011   1.032
Q0953+4749 (z = 3.9028)           1.028
Q1347+112 (z = 2.4835)    0.035   1.145
Q1347+112 (z = 2.0568)            1.163
Q1425+606 (z = 2.827)     0.012   1.043
Q1451+1223 (z = 2.4764)   0.031   1.128
Q1451+1223 (z = 3.1739)           1.102
Q1759+7539 (z = 2.6377)   0.053   1.220
Q1759+7539 (z = 2.91)             1.199
Q1802+5616 (z = 3.3820)   0.052   1.164
Q1802+5616 (z = 3.7652)           1.145
Q2155+1358 (z = 3.3174)   0.067   1.222
Q2155+1358 (z = 3.1461)   0.067   1.237
Q2233+131                 0.068   1.240

Table 4. Column 2 and 3 give values of the Galactic reddening and the corresponding correction factor to be applied to the emission line candidates.

QSO          zabs    flim(3σ)
Q0953+4749   4.244   2.5
Q1451+1223   2.256   4.0
Q1802+5616   3.554   4.0
Q1802+5616   3.811   2.2
Q2155+1358   3.565   3.5
Q2233+131    2.551   4.8

Table 5.
DLA and sub-DLA systems where no candidate emission lines are found and 3σ upper limits for the line fluxes. Fluxes are in units of 10−17 erg cm−2 s−1.

5.3. Experiments with artificial objects

To investigate how the efficiency of the visual inspection depended on object properties, several experiments with artificial data cubes were made. Similar to artificial experiments for one- and two-dimensional data sets, artificial emission line objects were added to the data cubes. These objects were described by the location in RA and DEC, central wavelength, peak emission intensity, and the widths in RA, DEC and wavelength. For simplicity we assumed that an emission line object seen as a point source in the data cube could be represented by a Gaussian profile in each direction, i.e. described by a Gaussian ellipsoid in the data cube.

We first tested completely simulated data cubes with statistical noise levels corresponding to the typical noise level in the combined data cubes. An emission line object with a flux of 5 × 10−17 erg cm−2 s−1, a width of 800 km s−1, and spatial FWHM of 1′′ was placed at a previously known wavelength. In the stacked spectra no objects could be seen immediately. The emission line was only identified after inspecting the data cube in the visualisation tool, and it was extracted and analysed in the same way as the real data. Similar tests were made by adding an emission line to a real data cube, where the background noise included the systematic noise as well as the pure Poissonian noise. These tests produced similar results for the faint emission lines with Lyα flux f ∼ 5 × 10−17 erg cm−2 s−1, i.e. 1) the emission line flux could be recovered within uncertainties, 2) even at very small impact parameters the object could be found, and 3) the reconstructed PSF of the emission line object was irregular as in any of the images in Fig. 2.
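A Gaussian ellipsoid of the kind used for the artificial objects can be injected into a cube as follows. This is a minimal sketch under stated assumptions: the function name `add_gaussian_source`, the `cube[y][x][k]` nested-list layout, the toy cube dimensions and the pixel-unit widths are illustrative and are not taken from the actual experiments.

```python
import math

# Conversion from FWHM to the Gaussian standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def add_gaussian_source(cube, centre, fwhm, peak):
    """Add a Gaussian ellipsoid to cube[y][x][k] in place.

    centre, fwhm : (y, x, k) tuples in pixel units
    peak         : peak value added at the centre
    """
    sig = [f * FWHM_TO_SIGMA for f in fwhm]
    for y, plane in enumerate(cube):
        for x, spectrum in enumerate(plane):
            for k in range(len(spectrum)):
                r2 = (((y - centre[0]) / sig[0]) ** 2
                      + ((x - centre[1]) / sig[1]) ** 2
                      + ((k - centre[2]) / sig[2]) ** 2)
                spectrum[k] += peak * math.exp(-0.5 * r2)
    return cube


# Empty 9 x 9 spaxel cube with 21 spectral channels (toy dimensions).
cube = [[[0.0] * 21 for _ in range(9)] for _ in range(9)]
# Spatial FWHM of 2 spaxels (1 arcsec for an assumed 0.5 arcsec spaxel scale)
# and a spectral FWHM of 4 channels.
add_gaussian_source(cube, centre=(4, 4, 10), fwhm=(2.0, 2.0, 4.0), peak=1.0)
```

Adding the same object to a noise-only cube and to a real cube, then running the usual detection steps on both, mirrors the two kinds of experiment described above.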
We also tested an automatic routine where the re-detection of the artificial objects was done with no visual intervention. A set of narrow-band images was created in wavelength ranges around the artificial line. For the detection of an emission line the location was constrained to be within ±10 Å of the input central wavelengths. These images were smoothed and a two-dimensional Gaussian profile was fit to the images. When an object was detected above a certain threshold, spaxels around the centre within the seeing FWHM were co-added. A series of tests showed that the recovered flux was consistent within 1σ errors for fluxes down to f = 5 × 10−17 erg cm−2 s−1. In a typical data cube this was also the detection limit where 50% of the objects were re-identified, while the fraction of re-identified emission lines at this flux level from a visual inspection was larger.

Tests on the frequency of false detections in data cubes where no objects were present showed that simultaneous detections of objects in narrow-band images and associated spectra with S/N > 3 occurred at a rate of less than 5% in a series of experiments. Therefore false detections cannot explain the large number of candidate objects.

5.4. Field Lyα emitters

We estimate here whether the detected candidates are likely to be field Lyα emitters having no association with the DLAs. Observations of high redshift objects have partly focused on detecting Lyα emission from galaxies to determine the global comoving star-formation rates (e.g. Hu et al. 1998, 2004). The density of Lyα emitters at z ∼ 3 is estimated to be 15000 deg−2 ∆z−1 with line fluxes brighter than a mean of f = 1.5 × 10−17 erg cm−2 s−1 (Hu et al. 1998; Kudritzki et al. 2000). From the luminosity function at z ≈ 3 (van Breukelen et al. 2005), the expected number of field Lyα emitters at a flux limit of 5 × 10−17 erg cm−2 s−1 is 1.7×10−4 arcsec−2 ∆z−1.
In our survey, the 9 data cubes sample a total redshift interval of ∆z = 21.55 around z ≈ 3. Statistically, it is expected that there are 0.2 field Lyα emitters in the whole sample presented here. Because these very faint lines are difficult to locate when the approximate wavelength is not known in advance, we did not look for field emission objects.

The negligible number of expected field emitters furthermore shows that the emission candidates, if proved to be real, are unlikely to be interloping field Lyα emitters. They are much more likely to be associated with the DLA galaxies.

5.5. Notes on individual objects

This section explains the significance of the candidates for each individual QSO.

Q0151+048A. – This is a zem ≈ zabs system at z ≈ 1.93. After flux calibration, the QSO spectrum is 2 magnitudes brighter than that presented in Møller et al. (1998). The low instrument sensitivity at 3560 Å combined with a variable extinction coefficient at Calar Alto makes the calibration uncertain.

Extended Lyα emission was observed in a region of 3′′×6′′ around the QSO, mostly to the east of the QSO (Fynbo et al. 1999). Long slit spectroscopy along the long axis revealed velocity structures of 400 km s−1 that could be interpreted as a rotation curve (Møller 1999). In the IFS data extended emission is detected to some degree in Fig. 2, but not with the same detail as in the higher spatial resolution and larger field of view data in Fynbo et al. (1999). This is the only case in the sample where extended emission is found, but the signal is not strong enough to determine the velocity structure over the extended region. The spectrum shown in Fig. 2 is the total spectrum co-added from the whole nebula.

Q0953+4749. – This zem = 4.457 QSO has three DLAs at zabs = 3.407, 3.891, and 4.244 (Bunker et al. 2003). A candidate associated with the lowest redshift DLA is visible in the narrow-band image in Fig. 2.
Independent subcombinations, the simple extraction, and the corresponding spectra show a faint emission line. This emission line coincides with a sky emission line 1.6 Å away and could be due to sky subtraction errors, so we only assign this candidate a significance class of 2. A Lyα emission line from the DLA galaxy has been reported (A. Bunker, private comm.) but its line flux is below our detection limit. For the second DLA system at z = 3.891 the object is present in subcombinations, the simple-extracted images, and in the extracted spectra. This candidate is assigned significance class 3. No candidate is found for the highest redshift DLA to the detection limit reported in Table 5.

The locations of the candidates are compared to WFPC2 images obtained from the HST archive, but no continuum counterpart could be identified.

Q1347+112. – This zem = 2.679 QSO has a DLA at zabs = 2.471 and another possible one at zabs = 2.05, which needs confirmation from spectroscopy at higher spectral resolution. An emission candidate for the z = 2.471 DLA is visible in the subcombinations and the extracted one-dimensional spectra. In the simple extraction, the spectrum has a low signal-to-noise ratio and the emission feature in the spectrum is faint. We assign a significance class of 3 to this candidate. For the z = 2.0568 DLA system we detect a candidate emission line object, but note an increase in the background noise shortwards of 3750 Å. The object is not seen in one of the subcombinations, nor the extracted spectra, and therefore the candidate is assigned a significance class of 1.

A snapshot WFPC F555W image (Bahcall et al. 1992) obtained from the HST archive has a 5σ limiting magnitude of 24.4 mag arcsec−2, but no continuum counterpart can be seen at the location of the candidate.

Q1425+606. – This zem = 3.163 QSO has a DLA at zabs = 2.827.
Because this QSO is very bright, strong residuals within 1′′ from the QSO centre are present in the narrow-band image where the QSO emission is subtracted. A faint object offset by ∼4′′ to the west is visible in the narrow-band image in Fig. 2. The candidate is present in subcombinations and in the constructed spectra and is assigned a significance class of 3. In PMAS data cubes, spaxels in the west region are noisier than the average due to an overall lower transmission. Note that the tests suggest a good candidate, but the impact parameter (> 30 kpc) is large.

Q1451+1223. – This zem = 3.246 QSO has two DLAs at zabs = 2.469 and zabs = 2.254 and a sub-DLA at zabs = 3.171. For the DLA system at z = 2.469 an object appears after the QSO subtraction close to the centre. It is caused by residuals, since the spectrum has no emission lines at the expected wavelength. For the same DLA, a region ∼4′′ to the north west appears in narrow-band imaging, subcombinations, simple extractions, and the constructed spectra. We therefore assign a significance class of 3 to it, but note that it has a large impact parameter (39 kpc).

For the z = 3.171 sub-DLA an object is detected to the north. Narrow-band images from subcombinations and simple extractions show the emission line candidate, but the corresponding spectra have emission lines with very low signals. The candidate is assigned a significance class of 3. No candidate is found for the z = 2.254 DLA.

A deep optical broad-band image of the field surrounding this QSO was obtained by Steidel et al. (1995), who found no obvious candidates for the absorbers. Warren et al. (2001) found one candidate offset by 3.′′9 to the south-west of the QSO in a NICMOS image, but this object is outside the field of view of the IFS data. An HST/STIS archive image shows that the emission line candidates lie in regions where no continuum emitting counterpart is found.

Q1759+7539.
– This zem = 3.05 QSO has a DLA at zabs = 2.625 and a sub-DLA at zabs = 2.91. The candidate detected for the DLA system in Fig. 2 lies near the northern edge of the field of view and can be affected by flat field errors. Although it is bright, the candidate is not visible in both subcombinations, and is therefore assigned a significance class of 2. A bright area 1.′′8 south west of the QSO appears after the QSO emission is subtracted but it is likely due to residuals. It has no emission lines at the expected wavelength and is not considered further.

The higher redshift sub-DLA system has an emission line candidate which is visible only after the QSO PSF has been subtracted from the final cube. However, the candidate is only visible in one out of two subcombinations and we assign this candidate a low significance class of 1.

A NICMOS snapshot image showed no bright galaxies near the QSO to a limit corresponding to an L∗ galaxy (Colbert & Malkan 2002).

Q1802+5616. – This zem = 4.158 QSO has four DLAs at zabs = 3.391, 3.554, 3.762, and 3.811. The candidate for the lowest redshift DLA system is directly visible in the reduced and combined data cube when looking at the stacked spectra. The candidate can also be identified in individual subcombinations and in the simple extracted spectrum. Therefore this candidate is assigned the class 4. In a narrow-band image at the wavelength of Lyα at z = 3.7652 there is an emission region to the south (see Fig. 2) and the corresponding spectrum shows an emission feature. However, this line is coincident with a faint sky emission line, so this candidate is assigned the class 2. No candidates are found for the other two DLA systems.

Q2155+1358. – This zem = 4.256 QSO has a DLA at zabs = 3.316 and three sub-DLAs at zabs = 3.142, 3.565, and 4.212. The observations only cover Lyα for the three lower redshift systems.
IFS covering the highest redshift system has revealed a possible faint candidate emission line object (Francis & McDonnell 2006). The candidate Lyα emission line associated with the DLA system is visible in independent subcombinations and in the simple extraction and is therefore assigned a high significance class of 3. Because of the partial spatial overlap with the QSO, the emission from the QSO is subtracted to give a cleaned emission line object and the associated spectrum shown in Fig. 2. A candidate is found to the south for the z = 3.142 sub-DLA system. This object is visible in the simple extraction and in the subcombinations, but only one associated spectrum shows a clearly detected emission line. We assign a significance class of 3 to this candidate. No candidate is found for the z = 3.565 sub-DLA.

Q2233+131. – This $z_{\rm em} = 3.295$ QSO has two sub-DLAs at $z_{\rm abs} = 3.153$ and $z_{\rm abs} = 2.551$. The galaxy responsible for the z = 3.153 DLA was found by Steidel et al. (1995), and follow-up spectroscopy confirmed this by the detection of Lyα emission (Djorgovski et al. 1996). Previous IFS of this object suggested that the Lyα emission was extended (Christensen et al. 2004). This is not confirmed by the higher spectral resolution data included in this paper, although there appears to be some faint emission to the east of the object in Fig. 2. The new data and the improved data reduction, which optimises the signal-to-noise ratio, confirm the Lyα line flux reported in Djorgovski et al. (1996). No candidate was found for the z = 2.551 sub-DLA system, consistent with the upper limit from a deeper Fabry-Perot imaging analysis (Kulkarni et al. 2006).

6. Properties of candidate DLA counterparts

We proceed with a more detailed analysis of the properties of the detected candidate Lyα emission lines. Only those candidates assigned values 3 and 4 are included.
Of the eight good candidates, we reject two due to their large impact parameters (> 30 kpc). However, since they fulfill the criteria for good candidates, they could instead belong to a brighter component in a group. The average redshift of all the DLAs in the whole sample is $\bar{z}_{\rm sample} = 3.13$ while that of the six remaining candidates is $\bar{z}_{\rm cand} = 3.23$, hence we find no preference for detection of either lower or higher redshift candidates. We emphasise that the candidate emission lines have fluxes that are detected at the 3σ level, but with this in mind we compare their properties with those of confirmed Lyα emission lines from DLA galaxies.

6.1. Line fluxes

Fig. 3 shows the inferred line fluxes of the candidates as a function of redshift. The triangles denote our candidates and square symbols indicate already confirmed objects from the literature (Møller & Warren 1993; Djorgovski et al. 1996; Møller et al. 1998; Leibundgut & Robertson 1999; Møller et al. 2002, 2004). This figure shows that the line fluxes for the candidates are similar to those for the previously confirmed ones, which have deeper observations and detection levels of 5–10σ.

Fabry-Perot imaging studies of QSOs with DLAs have managed to reach similar or lower flux limits than our IFS survey (Lowenthal et al. 1995; Kulkarni et al. 2006). With their detection limit some objects should have been detected if the Lyα fluxes of DLA galaxies are around the level we find for the candidates and the confirmed objects. IFS is useful to look for emission lines as it allows us to adjust a posteriori the central wavelength, whereas in Fabry-Perot images, the emission line could fall at the wings of the filter where the transmission is lower. Another advantage of IFS observations is the knowledge of the spatial QSO PSF as a function of wavelength, which allows a modeling and subtraction of the QSO emission (Wisotzki et al. 2003; Sánchez et al. 2004).
This allows detection of emission lines even when they are superimposed on the QSO. Nevertheless, we do not detect emission line candidates closer than about 1″ from the QSO, possibly due to subtraction residuals. The fact that the confirmed objects are found at smaller impact parameters compared to the candidates (Sect. 6.3) could indicate a bias.

6.2. Velocity differences

An anticorrelation is expected between the Lyα luminosity and the velocity difference between the Lyα emission line and optical emission lines (Weatherley et al. 2005). The resonant nature of Lyα causes a shift of the emission line towards slightly longer wavelengths where the photons can escape absorption. When a larger fraction of the blue part of the line profile is absorbed, the remaining emission line of lower luminosity will be more shifted in velocity compared to brighter ones. This explanation is supported by the study of Lyα emission lines from Lyman break galaxies (LBGs) (Shapley et al. 2003).

Fig. 3. Line fluxes of Lyα emission objects as a function of redshift, where square symbols represent already confirmed objects and triangles the candidates. The solid and dashed lines correspond to Lyα luminosities of $10^{42}$ and $10^{43}$ erg s$^{-1}$, respectively.

Fig. 4 shows the velocity differences between Lyα emission lines and the DLA redshifts for the candidates as a function of the Lyα luminosity. There is no evidence for a correlation for the candidates. We note that the only candidate that shows a negative velocity offset is the best candidate in the sample: the z = 3.391 DLA towards Q1802+5616. For the candidates we find an average velocity difference of 300±580 km s$^{-1}$, which is similar to the velocity differences measured for LBGs; Pettini et al.
(2001) find 560±410 km s$^{-1}$, while a larger sample has Δv = 650 km s$^{-1}$ between the Lyα emission line and low-ionisation absorption lines (Shapley et al. 2003). In the case that DLAs are associated with bright galaxies we would expect to see large velocity offsets too. Furthermore, as the lines of sight towards the emission line object and the QSO differ by 10–30 kpc, a larger velocity offset can be expected due to differences in kinematics within the host and its environment. If instead the DLA galaxy resides in a group, the velocity offset will reflect the velocity dispersion in the group rather than being related to the host. In support of this idea, it has been shown that bright Lyman break galaxies at z > 2 are surrounded by gas extending to large distances (Adelberger et al. 2005). Correlation studies have revealed that DLAs cluster on almost the same scale as LBGs (Cooke et al. 2006), indicating that a similar amount of gas is present in their environments. Like flux-limited surveys, this IFS study selects the brightest emission component, and it is possible that the real absorbing galaxy is a fainter component in a group.

In the case that DLA galaxies are related to rotating large disks, it can be expected that the velocity difference increases with impact parameter, but Fig. 5 shows no clear correlation. In three of the confirmed DLA galaxies optical emission lines have velocity differences between −200 and 30 km s$^{-1}$ relative to Lyα (Weatherley et al. 2005). Some candidates have larger offsets, possibly affected more strongly by resonant scattering.

6.3. H I extension

The average impact parameter of 16 kpc derived for the candidates is larger than that expected from numerical simulations, which favor impact parameters of b = 3 kpc for DLA galaxies and have fewer than 25% with b > 10 kpc at all redshifts (Haehnelt et al. 2000; Okoshi & Nagashima 2005; Hou et al. 2005). Larger DLA galaxy sizes of 10–15 kpc at 2 < z < 4 are inferred by other simulations (Gardner et al. 2001). A possible reason for the difference between observations and simulations is that the simulations assume a single disk scenario, while DLA galaxies could exist in groups (Hou et al. 2005). The real absorbing galaxy could be fainter and lie closer to the QSO line of sight than the detected candidate galaxy.

Fig. 4. Velocity differences between the Lyα emission lines and DLA redshifts as a function of the Lyα luminosity. An average error bar for the Lyα emission candidates is shown in the lower right corner.

Fig. 5. Velocity differences between the Lyα emission lines and DLA redshifts as a function of the impact parameter.

Fig. 6. Angular impact parameter vs. redshift. The solid line corresponds to the size-luminosity relation for an $L_B^*$ galaxy (Chen & Lanzetta 2003), while the dotted and dashed lines correspond to galaxies with B band luminosities $L_B = 0.1\,L_B^*$ and $L_B = 0.01\,L_B^*$, respectively. Squares are objects from the literature and triangles the candidates from this survey.

An anticorrelation between N(H I) and the distance to the nearest galaxy is found in simulations (Gardner et al. 2001), but no analysis of this effect for observed DLA galaxies has been attempted. A trend for larger column density absorbers at smaller impact parameters was observed in a sample of DLA galaxies at z < 1 (Rao et al. 2003). At lower column densities in the Lyα forest such an anticorrelation has been shown to exist (Chen et al. 1998, 2001). Observations of the galaxies giving rise to Mg II absorption lines showed an anticorrelation between the impact parameters and column densities of both Mg II and H I (Churchill et al.
2000), but recent observations of a larger sample have indicated that the correlation is not always present (Churchill et al. 2005).

We here investigate whether the candidates show a similar anticorrelation, using the impact parameters for the candidates as a proxy for the sizes of neutral gas envelopes around proto-galaxies. Assuming such a correlation is necessarily a rough approximation because large morphological differences between individual systems are expected (Rao et al. 2003; Chen & Lanzetta 2003). Specifically, the possible presence of sub-clumps is neglected.

6.3.1. DLA galaxy sizes and luminosities

To analyse the sizes of DLA galaxies at z < 1, Chen & Lanzetta (2003) describe the extension of the neutral gas cloud associated with DLA galaxies as

$$\frac{R}{R^*} = \left(\frac{L_B}{L_B^*}\right)^{t}, \qquad (1)$$

assuming that DLA galaxies follow a Holmberg relation between galaxy sizes and luminosities. Their fit to the observed DLA galaxies gives $R^* = 30\,h^{-1}$ kpc and $t = 0.26^{+0.24}_{-0.06}$, where $L_B^*$ corresponds to a galaxy with $M_B^* = -19.6$. Based on the morphologies and impact parameters, Chen et al. (2005) argued that dwarf galaxies alone cannot represent the DLA galaxy population at z < 1.

In the case that the DLA galaxy population evolves from low luminosity objects at high redshifts to higher luminosity at lower redshifts, this will affect the expected impact parameters. In Fig. 6 the impact parameters of candidates and confirmed objects are shown as a function of their redshifts. Overlaid on this figure are curves for the size-luminosity relation derived for low redshift DLA galaxies. If DLA galaxies at high redshift follow the low redshift scaling relation, they comprise a mix of galaxy luminosities.

Fig. 7. Column density of neutral hydrogen as a function of impact parameter for the candidates and previously confirmed objects. The symbols are similar to the previous figures.
The solid and dashed lines are fits to the power-law relation $b/b^* = (N/N^*)^{\beta}$ for the candidates and confirmed objects, respectively.

6.3.2. Powerlaw profiles

Using similar arguments as above one could expect that there is a relation between the impact parameter b and the column density measured for the DLA. Fig. 7 shows the N(H I) measured for the DLA as a function of the impact parameters in kpc. We assume a similar scaling relation as in Eq. (1) for the impact parameter and N(H I), i.e.

$$\frac{b}{b^*} = \left(\frac{N(\mathrm{H\,I})}{N(\mathrm{H\,I})^*}\right)^{\beta}. \qquad (2)$$

We set $\log N(\mathrm{H\,I})^* = 20.3$, and the error on the measured impact parameter is given by the fibre size of 0.5″, which corresponds to ∼4 kpc. A fit of the observed impact parameters for the candidates gives $b^* = 15.9 \pm 1.4$ kpc and $\beta = -0.23 \pm 0.08$, as shown by the solid line, while for the confirmed objects we find $b^* = 12.0 \pm 3.7$ kpc and $\beta = -0.36 \pm 0.14$, as represented by the dashed line. The fits to the candidates and confirmed objects are consistent within 1σ uncertainties.

6.3.3. Exponential profiles

Radio observations of the 21 cm emission from H I disks in the local Universe have shown that an exponential profile is a poor representation in the central part of disk galaxies, where the 21 cm flux density either stays constant or decreases towards the centre (e.g. Verheijen & Sancisi 2001). However, at optical wavelengths disk galaxies are well represented by exponential profiles. Here we fit the impact parameter distribution by the relation

$$N(\mathrm{H\,I}) = N(\mathrm{H\,I})_0 \exp(-b/h), \qquad (3)$$

where h is the scale length, $N(\mathrm{H\,I})_0$ the central column density of a simple exponential disk, and N(H I) is the column density measured for the DLAs. The resulting fit to all the candidates is shown by the solid line in Fig. 8, which has $\log N(\mathrm{H\,I})_0 = 21.7 \pm 1.1$ cm$^{-2}$ and $h = 5.1^{+2.5}_{-1.3}$ kpc. This result is similar within the uncertainties to a fit to the confirmed objects only ($\log N_0 = 21.7 \pm 1.1$ cm$^{-2}$ and $h = 4.5^{+3.6}_{-1.4}$ kpc).

Fig. 8.
Column density of neutral hydrogen as a function of impact parameter for the candidates and previously confirmed objects. The solid and dashed lines show the fit to the exponential relation $N(\mathrm{H\,I}) = N(\mathrm{H\,I})_0 \exp(-b/h)$ for the candidates and confirmed objects, respectively.

Local disk galaxies have optical scale lengths ranging from ∼2 kpc to ∼6 kpc, and observations of the H I profile in low surface brightness galaxies have indicated scale lengths > 10 kpc (Matthews et al. 2001). In contrast, higher redshift spiral galaxies in the HST Ultra Deep Field have smaller optical scale lengths of 1.5–3 kpc (Elmegreen et al. 2005), possibly biased towards smaller values due to the easier detection of high surface brightness, high star formation rate regions. The question is how extended the gaseous envelopes are around these young galaxies. The impact parameters of the candidates suggest that high redshift DLAs reside far from the host galaxy, if not in a regular proto-galactic disk (Wolfe et al. 1986), then in a region of the same physical scale, possibly in merging clumps of gas surrounding the actual proto-galaxy. In this picture, DLAs can be found far from the centre of the parent galaxy.

The two objects that were originally discarded as candidates due to their large impact parameters (> 30 kpc) do not follow the relations for either the exponential or power-law profiles. Including them would make the scatter around the fit substantial.

6.4. Metallicity effects on Lyα emission

Using a space-based imaging survey and follow-up long-slit spectroscopic observations, Møller et al. (2004) found indications of a positive metallicity–Lyα luminosity relation, such that Lyα emission was preferentially observed in higher metallicity systems. They argued that this positive correlation could overpower the negative dust–Lyα luminosity effects which are expected to be strong in high metallicity environments (Charlot & Fall 1993).
Studies of Lyα emission from nearby star-forming galaxies have not revealed any correlations between metallicity and Lyα emission strength (Keel 2005). Differences in the velocity-metallicity relation between high and low redshift DLAs could be explained by higher redshift DLAs residing in lower mass galaxies (Ledoux et al. 2006) that have fainter Lyα emission.

In this context we investigate the distribution of metallicities for the emission line candidates in comparison to the total sample. We compare the cumulative distributions of metallicities ([Si/H]) for the DLAs that have candidate Lyα detections with the metallicities of the remaining objects that have either no Lyα candidates or rejected candidates. The distributions in Fig. 9 show the fraction of DLAs with metallicities larger than a given value. Table 1 has several lower limits on [Si/H], and Fig. 9 treats the limits as actual detections.

Fig. 9. Cumulative distribution of Si metallicities ([Si/H]) of the six good emission line candidates compared to the remaining part of the sample. The probability that the two distributions are similar is 38%, estimated from a Kolmogorov-Smirnov test.

A two-sided Kolmogorov-Smirnov (KS) test gives a probability of 38% that the two samples have the same underlying distribution. A similar analysis for [Fe/H] gives the same result. Hence, neither test gives clear statistical evidence for a difference between the two populations.

Only a small number of DLAs are included in this survey. For the two samples, with $N_1$ and $N_2$ being the number of objects in each sample, we find $N_1 N_2/(N_1 + N_2) = 4$. For the KS test to be statistically valid a value larger than 4 is required (Press et al. 1992), hence a few more objects are needed to make the test more statistically significant.

6.5. Metallicity gradients

In local galaxies the metallicities of H II regions are shown to decrease with increasing radial distance in the disk (Zaritsky et al. 1994). If DLAs arise in disks, lower metallicities are expected at larger linear separations between the QSO line of sight and the galaxy centres. A comparison of absorption metallicities for 3 DLAs at z < 0.6 with abundances derived from strong emission line diagnostics from the galaxy spectra revealed that gradients are likely present at a level of −0.041±0.012 dex kpc$^{-1}$ (Chen et al. 2005). Uncertainties in the gradients arise from an unknown correction for dust depletion, and the inclination of the galaxy also plays an important role due to projection effects. Metallicity differences due to differential depletion within a single DLA galaxy can be strong, as demonstrated by observations of a lensed QSO (Lopez et al. 2005).

Observations of nebular emission lines from seven galaxies at 2.0 < z < 2.5 indicate an average solar metallicity (Shapley et al. 2004), with evidence for the presence of metallicity gradients (Förster Schreiber et al. 2006). If metallicity gradients exist for high redshift DLA galaxies, we would expect to see a tendency for higher metallicities for the DLA galaxies detected at smaller impact parameters. DLA galaxies are on average fainter than LBGs detected in flux-limited surveys (Møller et al. 2002). Combining this with a high redshift luminosity-metallicity relation can imply low DLA metallicities without involving metallicity gradients.

Fig. 10. DLA metallicities as a function of the impact parameters, where symbol shapes have similar meanings as in the previous figures. The dotted line shows a fit to all the objects excluding the limits. This line has a gradient of −0.024 ± 0.015 dex kpc$^{-1}$. Data for the confirmed objects are either [Si/H] or [Zn/H] taken from Møller et al. (2004).

Fig.
10 shows the DLA metallicities as a function of the impact parameters. The line shows a fit to all objects ignoring those with lower limits on [Si/H]. This gradient is −0.024±0.015 dex kpc$^{-1}$, i.e. it is consistent with zero at the 2σ level. The large scatter in the plot could be real and unrelated to gradients. Different DLAs exhibit a large range of star formation histories (Erni et al. 2006; Herbert-Fort et al. 2006), which makes it unreasonable to expect a smooth relation between metallicities and impact parameters for a sample of DLAs. Clearly more data are needed to determine the reality of any relation.

7. Conclusions

We have presented an integral field spectroscopic survey of 9 high redshift QSOs, which have a total of 14 DLA systems and 8 sub-DLA systems. We detect eight good candidates for Lyα emission lines from DLA and sub-DLA galaxies. Two of these are found at impact parameters larger than 30 kpc, and are not likely associated directly with the absorbing galaxy, but could be associated with galaxy groups in which the real absorbing galaxy resides. All candidates are detected at a statistically significant level in reconstructed narrow-band images as well as in the co-added one-dimensional spectra. Further observations will be useful to independently confirm the candidates at an even higher signal-to-noise ratio. We compare the properties inferred from the IFS data with those for previously spectroscopically confirmed Lyα emission lines from DLA galaxies reported in the literature. We find that line luminosities are similar to those of previously confirmed objects, that the average impact parameters are larger by a factor of ∼2, and that some candidates have larger velocity offsets between the Lyα emission line and the systemic redshift of the DLA system.

We analyse the distribution of DLA column densities as a function of impact parameters.
Assuming that the average DLA galaxy is similar to a disk galaxy with an exponential profile, we show that it has a scale length of 5 kpc. Such a scale length is similar to disk scale lengths found for local spiral galaxies. This could imply that DLAs belong to large disks even at high redshifts, as originally suggested by Wolfe et al. (1986). However, it is probably too simplistic to expect that high redshift DLAs reside in regular disks with similar structure to large local disks. DLA systems are generally not associated with luminous galaxies (e.g. Colbert & Malkan 2002; Møller et al. 2002). The large impact parameters found for the candidates indicate that the distribution of H I clouds in DLA galaxies extends significantly beyond the optical sizes of fainter dwarf galaxies.

Furthermore, Wolfe & Chen (2006) showed that high redshift DLAs do not reside in extended disks that follow the local Schmidt-Kennicutt law for star formation. The IFS results presented here suggest that the Lyα emission is generally not extended, and that star formation takes place at a distance of several kpc from the DLA. Hence, we may speculate that DLAs arise in the outskirts of proto-galaxies, for example in clouds of neutral gas around LBGs. In this case one would expect a significant scatter in the relation between the impact parameter and column density of the DLA, since neutral clouds could be distributed irregularly around the galaxy. Contrary to expectations, the objects show a small scatter around the relations in Figs. 7 and 8. Regardless of the distribution of neutral gas in DLA galaxies, we conclude that there is a tendency to find a lower column density DLA with increasing impact parameter. Extending the investigation to include DLAs and sub-DLAs with $N(\mathrm{H\,I}) > 10^{19.6}$ cm$^{-2}$, this tendency emerges for both the candidates and the confirmed objects.
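As an illustration of the two profile relations referred to here, the following minimal Python sketch evaluates the power-law relation (Eq. 2) and the exponential relation (Eq. 3) using the best-fit values quoted for the candidates in Sects. 6.3.2 and 6.3.3. The function and constant names are hypothetical and chosen only for this example; the sketch ignores the quoted uncertainties and is not the fitting procedure used for the survey.

```python
import math

# Best-fit values quoted in the text for the candidates (uncertainties ignored).
LOG_N0 = 21.7        # log10 of the central column density N(H I)_0 [cm^-2]
H_KPC = 5.1          # exponential scale length h [kpc]
LOG_N_STAR = 20.3    # log10 of the reference column density N(H I)^* [cm^-2]
B_STAR_KPC = 15.9    # power-law normalisation b^* [kpc]
BETA = -0.23         # power-law index beta


def log_column_density_exponential(b_kpc, log_n0=LOG_N0, h_kpc=H_KPC):
    """log10 N(H I) at impact parameter b for N = N_0 exp(-b/h) (Eq. 3)."""
    return log_n0 - (b_kpc / h_kpc) * math.log10(math.e)


def impact_parameter_power_law(log_n, b_star_kpc=B_STAR_KPC, beta=BETA,
                               log_n_star=LOG_N_STAR):
    """Impact parameter in kpc from b/b^* = (N/N^*)^beta (Eq. 2)."""
    return b_star_kpc * 10.0 ** (beta * (log_n - log_n_star))


def impact_parameter_at_threshold(log_n_threshold=LOG_N_STAR,
                                  log_n0=LOG_N0, h_kpc=H_KPC):
    """Impact parameter in kpc where the exponential profile drops to a
    given log10 column density (Eq. 3 solved for b)."""
    return h_kpc * (log_n0 - log_n_threshold) * math.log(10.0)


if __name__ == "__main__":
    for b in (0.0, 5.0, 10.0, 20.0):
        print(f"b = {b:5.1f} kpc  ->  log N(H I) = "
              f"{log_column_density_exponential(b):.2f}")
    print(f"b at log N = 20.3: {impact_parameter_at_threshold():.1f} kpc")
```

With these parameter values the exponential profile falls to log N(H I) = 20.3 at roughly 16 kpc, and the power-law relation returns b = b* at the same column density by construction.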
The velocity offsets between the Lyα emission lines and the systemic redshifts of the DLAs are larger for half of the candidates compared to the confirmed objects. This could indicate an origin in groups of galaxies, where the DLA resides in a less luminous component than the galaxy detected in Lyα. To determine whether resonant scattering affects the candidate Lyα lines more strongly and gives rise to larger velocity offsets than for the confirmed objects, observations of the corresponding optical emission lines, which are shifted to the near-IR, are required (e.g. Weatherley et al. 2005). Optical emission lines furthermore have the advantage of being less affected by dust absorption and are therefore better for estimating the star-formation rates. Alternatively, non-resonance UV lines such as C IV could be studied.

This survey was carried out with IFS on a 4-m class telescope, and the signals were generally near the detection limit. To verify this IFS method, it is necessary to get independent, higher signal-to-noise ratio spectra with a larger aperture telescope to confirm the existence of the emission lines.

Acknowledgements. This study was supported by the German Verbundforschung associated with the ULTROS project, grant no. 05AE2BAA/4. S.F. Sánchez acknowledges the support from the Euro3D Research Training Network, grant no. HPRN-CT2002-00305. K. Jahnke acknowledges support from DLR project No. 50 OR 0404. We thank the referee for useful suggestions that clarified the paper.

References

Adelberger, K. L., Shapley, A. E., Steidel, C. C., et al. 2005, ApJ, 629, 636
Bahcall, J. N., Maoz, D., Doxsey, R., et al. 1992, ApJ, 387, 56
Bechtold, J. 1994, ApJS, 91, 1
Becker, T. 2002, PhD thesis, Astrophysikalisches Institut Potsdam, Germany
Bunker, A., Smith, J., Spinrad, H., Stern, D., & Warren, S. 2003, Ap&SS, 284,
Chaffee, F. H., Stepanian, J. A., Chavushian, V. A., Foltz, C. B., & Green, R. F.
1994, Bulletin of the American Astronomical Society, 26, 1338
Charlot, S. & Fall, S. M. 1993, ApJ, 415, 580
Chen, H.-W., Kennicutt, R. C., & Rauch, M. 2005, ApJ, 620, 703
Chen, H.-W. & Lanzetta, K. M. 2003, ApJ, 597, 706
Chen, H.-W., Lanzetta, K. M., Webb, J. K., & Barcons, X. 1998, ApJ, 498, 77
Chen, H.-W., Lanzetta, K. M., Webb, J. K., & Barcons, X. 2001, ApJ, 559, 654
Christensen, L., Jahnke, K., Wisotzki, L., & Sánchez, S. F. 2006, A&A, 459, 717
Christensen, L., Sánchez, S. F., Jahnke, K., et al. 2004, A&A, 417, 487
Churchill, C. W., Kacprzak, G. G., & Steidel, C. C. 2005, In Probing Galaxies through Quasar Absorption Lines
Churchill, C. W., Mellon, R. R., Charlton, J. C., et al. 2000, ApJ, 543, 577
Colbert, J. W. & Malkan, M. A. 2002, ApJ, 566, 51
Cooke, J., Wolfe, A. M., Gawiser, E., & Prochaska, J. X. 2006, ApJ, 652, 994
Dessauges-Zavadsky, M., Péroux, C., Kim, T.-S., D’Odorico, S., & McMahon, R. G. 2003, MNRAS, 345, 447
Djorgovski, S. G., Pahre, M. A., Bechtold, J., & Elston, R. 1996, Nature, 382,
Elmegreen, B. G., Elmegreen, D. M., Vollbach, D. R., Foster, E. R., & Ferguson, T. E. 2005, ApJ, 634, 101
Erni, P., Richter, P., Ledoux, C., & Petitjean, P. 2006, A&A, 451, 19
Filippenko, A. V. 1982, PASP, 94, 715
Fitzpatrick, E. L. 1999, PASP, 111, 63
Förster Schreiber, N. M., Genzel, R., Lehnert, M. D., et al. 2006, ApJ, 645, 1062
Francis, P. J. & McDonnell, S. 2006, MNRAS, 656
Fynbo, J. U., Burud, I., & Møller, P. 2000, A&A, 358, 88
Fynbo, J. U., Møller, P., & Warren, S. J. 1999, MNRAS, 305, 849
Gardner, J. P., Katz, N., Hernquist, L., & Weinberg, D. H. 2001, ApJ, 559, 131
Haehnelt, M. G., Steinmetz, M., & Rauch, M. 1998, ApJ, 495, 647
Haehnelt, M. G., Steinmetz, M., & Rauch, M. 2000, ApJ, 534, 594
Herbert-Fort, S., Prochaska, J. X., Dessauges-Zavadsky, M., et al. 2006, PASP, 118, 1077
Hopp, U. & Fernandez, M. 2002, Calar Alto Newsletter No.4, http://www.caha.es/newsletter/news02a/hopp/paper.pdf
Hou, J. L., Shu, C. G., Shen, S. Y., et al.
2005, ApJ, 624, 561
Hu, E. M., Cowie, L. L., Capak, P., et al. 2004, AJ, 127, 563
Hu, E. M., Cowie, L. L., & McMahon, R. G. 1998, ApJ, 502, L99
Keel, W. C. 2005, AJ, 129, 1863
Kudritzki, R.-P., Méndez, R. H., Feldmeier, J. J., et al. 2000, ApJ, 536, 19
Kulkarni, V. P., Woodgate, B. E., York, D. G., et al. 2006, ApJ, 636, 30
Lacy, M., Becker, R. H., Storrie-Lombardi, L. J., et al. 2003, AJ, 126, 2230
Lanzetta, K. M., McMahon, R. G., Wolfe, A. M., et al. 1991, ApJS, 77, 1
Lanzetta, K. M., Wolfe, A. M., & Turnshek, D. A. 1995, ApJ, 440, 435
Le Brun, V., Bergeron, J., Boisse, P., & Deharveng, J. M. 1997, A&A, 321, 733
Ledoux, C., Petitjean, P., Bergeron, J., Wampler, E. J., & Srianand, R. 1998a, A&A, 337, 51
Ledoux, C., Petitjean, P., Fynbo, J. P. U., Møller, P., & Srianand, R. 2006, A&A, 457, 71
Ledoux, C., Theodore, B., Petitjean, P., et al. 1998b, A&A, 339, L77
Leibundgut, B. & Robertson, J. G. 1999, MNRAS, 303, 711
Lopez, S., Reimers, D., Gregg, M. D., et al. 2005, ApJ, 626, 767
Lowenthal, J. D., Hogan, C. J., Green, R. F., et al. 1995, ApJ, 451, 484
Lu, L., Sargent, W. L. W., & Barlow, T. A. 1997, ApJ, 484, 131
Lu, L., Sargent, W. L. W., Barlow, T. A., Churchill, C. W., & Vogt, S. S. 1996, ApJS, 107, 475
Matthews, L. D., van Driel, W., & Monnier-Ragaigne, D. 2001, A&A, 365, 1
Møller, P. 1999, in Astrophysics with the NOT, ed. H. Karttunen & V. Piirola (University of Turku), 80
Møller, P., Fynbo, J. U., & Fall, S. M. 2004, A&A, 422, L33
Møller, P. & Warren, S. J. 1993, A&A, 270, 43
Møller, P., Warren, S. J., Fall, S. M., Fynbo, J. U., & Jakobsen, P. 2002, ApJ, 574, 51
Møller, P., Warren, S. J., & Fynbo, J. U. 1998, A&A, 330, 19
Okoshi, K. & Nagashima, M. 2005, ApJ, 623, 99
Outram, P. J., Chaffee, F. H., & Carswell, R. F. 1999, MNRAS, 310, 289
Péroux, C., Dessauges-Zavadsky, M., D’Odorico, S., Kim, T., & McMahon, R. G. 2003, MNRAS, 345, 480
Péroux, C., Storrie-Lombardi, L. J., McMahon, R. G., Irwin, M., & Hook, I. M.
2001, AJ, 121, 1799
Petitjean, P., Pecontal, E., Valls-Gabaud, D., & Charlot, S. 1996, Nature, 380,
Petitjean, P., Srianand, R., & Ledoux, C. 2000, A&A, 364, L26
Pettini, M., Shapley, A. E., Steidel, C. C., et al. 2001, ApJ, 554, 981
Poli, F., Giallongo, E., Fontana, A., et al. 2003, ApJ, 593, L1
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in FORTRAN. The art of scientific computing (Cambridge: University Press, 1992, 2nd ed.)
Prochaska, J. X., Castro, S., & Djorgovski, S. G. 2003a, ApJS, 148, 317
Prochaska, J. X., Castro, S., & Djorgovski, S. G. 2003b, ApJS, 148, 317
Prochaska, J. X., Gawiser, E., Wolfe, A. M., Castro, S., & Djorgovski, S. G. 2003c, ApJL, 595, L9
Prochaska, J. X., Gawiser, E., Wolfe, A. M., Cooke, J., & Gelino, D. 2003d, ApJS, 147, 227
Prochaska, J. X., Henry, R. B. C., O’Meara, J. M., et al. 2002a, PASP, 114, 933
Prochaska, J. X. & Herbert-Fort, S. 2004, PASP, 116, 622
Prochaska, J. X., Herbert-Fort, S., & Wolfe, A. M. 2005, ApJ, 635, 123
Prochaska, J. X., Howk, J. C., O’Meara, J. M., et al. 2002b, ApJ, 571, 693
Prochaska, J. X. & Wolfe, A. M. 1997, ApJ, 487, 73
Prochaska, J. X., Wolfe, A. M., Tytler, D., et al. 2001, ApJS, 137, 21
Pych, W. 2004, PASP, 116, 148
Rao, S. M., Nestor, D. B., Turnshek, D. A., et al. 2003, ApJ, 595, 94
Roth, M. M., Bauer, S., Dionies, F., et al. 2000, in Proc. SPIE, Vol. 4008, 277–
Roth, M. M., Kelz, A., Fechner, T., et al. 2005, PASP, 117, 620
Sánchez, S. F. 2004, AN, 325, 167
Sánchez, S. F. 2006, AN, 327, 850
Sánchez, S. F., Garcia-Lorenzo, B., Mediavilla, E., González-Serrano, J. I., & Christensen, L. 2004, ApJ, 615, 156
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Schneider, D. P., Schmidt, M., & Gunn, J. E. 1991, AJ, 101, 2004
Shapley, A. E., Erb, D. K., Pettini, M., Steidel, C. C., & Adelberger, K. L. 2004, ApJ, 612, 108
Shapley, A. E., Steidel, C.
C., Pettini, M., & Adelberger, K. L. 2003, ApJ, 588,
Smith, H. E., Cohen, R. D., & Bradley, S. E. 1986, ApJ, 310, 583
Steidel, C. C., Pettini, M., & Hamilton, D. 1995, AJ, 110, 2519
Stepanian, J. A., Chavushian, V. H., Chaffee, F. H., Foltz, C. B., & Green, R. F. 1996, A&A, 309, 702
Storrie-Lombardi, L. J. & Wolfe, A. M. 2000, ApJ, 543, 552
Turnshek, D. A., Wolfe, A. M., Lanzetta, K. M., et al. 1989, ApJ, 344, 567
van Breukelen, C., Jarvis, M. J., & Venemans, B. P. 2005, MNRAS, 359, 895
van Dokkum, P. G. 2001, PASP, 113, 1420
Verheijen, M. A. W. & Sancisi, R. 2001, A&A, 370, 765
Warren, S. J., Møller, P., Fall, S. M., & Jakobsen, P. 2001, MNRAS, 326, 759
Weatherley, S. J., Warren, S. J., Møller, P., et al. 2005, MNRAS, 358, 985
Wisotzki, L., Becker, T., Christensen, L., et al. 2003, A&A, 408, 455
Wolfe, A. M. & Chen, H.-W. 2006, ApJ, 652, 981
Wolfe, A. M., Lanzetta, K. M., Foltz, C. B., & Chaffee, F. H. 1995, ApJ, 454,
Wolfe, A. M., Turnshek, D. A., Smith, H. E., & Cohen, R. D. 1986, ApJS, 61,
Zaritsky, D., Kennicutt, R. C., & Huchra, J. P. 1994, ApJ, 420, 87

ABSTRACT

We search for galaxy counterparts to damped Lyman-alpha absorbers (DLAs) at z>2 towards nine quasars, which have 14 DLAs and 8 sub-DLAs in their spectra. We use integral field spectroscopy to search for Ly-alpha emission line objects at the redshifts of the absorption systems. Besides recovering two previously confirmed objects, we find six statistically significant candidate Ly-alpha emission line objects.
The candidates are identified as having wavelengths close to the DLA line where the background quasar emission is absorbed. In comparison with the six currently known Ly-alpha emitting DLA galaxies the candidates have similar line fluxes and line widths, while velocity offsets between the emission lines and systemic DLA redshifts are larger. The impact parameters are larger than 10 kpc, and lower column density systems are found at larger impact parameters. Assuming that a single gas cloud extends from the QSO line of sight to the location of the candidate emission line, we find that the average candidate DLA galaxy is surrounded by neutral gas with an exponential scale length of ~5 kpc. <|endoftext|><|startoftext|> Introduction

Variability is an important phenomenon in astrophysical studies of structure and evolution, both stellar and galactic. Some variable stars, such as RR Lyrae, are an excellent tool for studying the Galaxy. Being nearly standard candles (thus making distance determination relatively straightforward) and being intrinsically bright, they are a particularly suitable tracer of Galactic structure. In extragalactic astronomy, the optical continuum variability of quasars is utilized as an efficient method for their discovery (van den Bergh, Herbst & Pritchet 1973; Hawkins 1983; Koo, Kron & Cudworth 1986; Hawkins & Veron 1995), and is also frequently used to constrain the origin of their emission (Kawaguchi et al. 1998; Trevese et al. 2001; Martini & Schneider 2003).

Despite the importance of variability, the variable optical sky remains largely unexplored and poorly quantified, especially at the faint end. To what degree different variable populations contribute to the overall variability, how they are distributed in magnitude and color, what the characteristic time-scales and the dominant mechanisms of variability are, are just some of the questions that still remain to be answered.
To address these questions, several contemporary projects aimed at regular monitoring of the optical sky were started. Some of the more prominent surveys in terms of the sky coverage, depth, and cadence are:

• The Faint Sky Variability Survey (Groot et al. 2003) is a very deep (17 < V < 24) BVI survey of 23 deg² of sky, containing about 80,000 sources sampled at timescales ranging from minutes to years.
• The QUEST Survey (Vivas et al. 2001) monitors 700 deg² of sky from V = 13.5 to a limit of V = 21.
• ROTSE-I (Akerlof et al. 2000) monitors the entire observable sky twice a night from V = 10 to a limit of V = 15.5. The Northern Sky Variability Survey (Woźniak et al. 2004) is based on ROTSE-I data.
• OGLE (most recently OGLE III; Udalski et al. 2002) monitors ∼ 100 deg² towards the Galactic bulge from I = 11.5 to a limit of I = 20. Due to the very high stellar density towards the bulge, OGLE II has detected about 270,000 variable stars (Woźniak et al. 2002; Żebruń et al. 2002).
• The MACHO Project monitored the brightness of ∼ 60 million stars in ∼ 90 deg² of sky toward the Magellanic Clouds and the Galactic bulge for ∼ 7 years to a limit of V ∼ 24 (Alcock et al. 2001).

A comprehensive review of past and ongoing variability surveys can be found in Becker et al. (2004).

Recognizing the outstanding importance of variable objects, the last Decadal Survey Report (Astronomy and Astrophysics Survey Committee, Board on Physics and Astronomy, Space Studies Board, National Research Council 2001) highly recommended a major new initiative for studying the variable sky, the Large Synoptic Survey Telescope (LSST; Tyson et al. 2002; Walker 2003). The LSST¹ will offer an unprecedented view of the faint variable sky: according to the current designs it will scan the entire accessible sky every three nights to a limit of V ∼ 25 with two observations per night in two different bands (selected from a set of six).
One of the LSST science goals² will be the exploration of the transient optical sky: the discovery and analysis of rare and exotic objects (e.g. neutron star and black hole binaries), gamma-ray bursts, X-ray flashes, and of new classes of transients, such as binary mergers and stellar disruptions by black holes. The observed volume of space, and the requirement to recognize and monitor these events in real time on a “normally” variable sky, will present a challenge to the project.

Since LSST will utilize³ the Sloan Digital Sky Survey (SDSS; York et al. 2000) photometric system (ugriz, Fukugita et al. 1996), multiple photometric observations obtained by the SDSS represent an excellent dataset for a pre-LSST study that characterizes the faint variable sky and quantifies the variable population and its distribution in magnitude-color-variability space. Here we present such a study of unresolved sources in a region that has been imaged multiple times by the SDSS.

In Section 2 we give a brief overview of the SDSS imaging survey and repeated scans of a ∼ 290 deg² region called Stripe 82. In Section 3, we describe methods used to select candidate variable sources from the SDSS Stripe 82 data assembled, averaged and recalibrated by Ivezić et al. (2007), and present tests that show the robustness of the adopted selection criteria. In the same section, we discuss the distribution of selected variable sources in magnitude-color-variability space. The Milky Way halo structure traced by selected candidate RR Lyrae stars is discussed in Section 4, and in Section 5 we estimate the fraction of variable quasars. Implications for surveys such as the LSST are discussed in Section 6, and our main results are summarized in Section 7.

2. Overview of the SDSS Imaging and Stripe 82 Data

The quality of photometry and astrometry, as well as the large area covered by the survey, make the SDSS stand out among available optical sky surveys (Sesar et al. 2006).
The SDSS is providing homogeneous and deep (r < 22.5) photometry in five bandpasses (u, g, r, i, and z, Gunn et al. 1998; Hogg et al. 2002; Smith et al. 2002; Gunn et al. 2006; Tucker et al. 2006) accurate to 0.02 mag (root-mean-square scatter, hereafter rms) for unresolved sources not limited by photon statistics (Scranton et al. 2002; Ivezić et al. 2003a), and with a zeropoint uncertainty of 0.02 mag (Ivezić et al. 2004a). The survey sky coverage of 10,000 deg² in the northern Galactic cap, and 300 deg² in the southern Galactic cap will result in photometric measurements for well over 100 million stars and a similar number of galaxies (Stoughton et al. 2002). The recent Data Release 5 (Adelman-McCarthy et al. 2007)⁴ lists photometric data for 215 million unique objects observed in 8000 deg² of sky as part of the “SDSS-I” phase that ran through June 2005. Astrometric positions are accurate to better than 0.1″ per coordinate (rms) for sources with r < 20.5 (Pier et al. 2003), and the morphological information from the images allows reliable star-galaxy separation to r ∼ 21.5 (Lupton et al. 2002). In addition, the 5-band SDSS photometry can be used for very detailed source classification; e.g. separation of quasars and stars (Richards et al. 2002), spectral classification of stars to within 1-2 spectral subtypes (Lenz et al. 1998; Finlator 2000; Hawley et al. 2002), and even remarkably efficient color selection of the horizontal branch and RR Lyrae stars (Yanny et al. 2000; Sirko et al. 2004; Ivezić et al. 2005) and low-metallicity G and K giants (Helmi et al. 2003).

¹ See http://www.lsst.org
² For more details see http://www.lsst.org/Science/science_goals.shtml
³ LSST will also use the Y band at ∼ 1 µm. For more details see the LSST Science Requirement Document at http://www.lsst.org/Science/lsst_baseline.shtml
⁴ Please see http://www.sdss.org/dr5

The equatorial Stripe 82 region (22h 24m < αJ2000 < 04h 08m, −1.27° < δJ2000 < +1.27°, ∼ 290 deg²), observed in the southern Galactic cap, presents a valuable data source for variability studies. The region was repeatedly observed (65 imaging runs by July 2005, but not all cover the entire region), and it is the largest source of multi-epoch data in the SDSS-I phase. Another source of the large number of scans is the SDSS-II Supernova Survey (Frieman et al. 2007). By averaging the repeated observations of Stripe 82 sources, more accurate photometry than the nominal 0.02 mag single-scan accuracy can be achieved. This motivated Ivezić et al. (2007) to produce a catalog of recalibrated Stripe 82 observations.

The catalog lists 58 million photometric observations for 1.4 million unresolved sources that were observed at least 4 times in each of the gri bands (with a median of 10 observations obtained over ∼ 5 years). The random photometric errors for PSF (point spread function) magnitudes are below 0.01 mag for stars brighter than 19.5, 20.5, 20.5, 20, 18.5 in ugriz, respectively (about twice as accurate as individual SDSS runs), and the spatial variation of photometric zeropoints is not larger than ∼ 0.01 mag (rms). Following Ivezić et al. (2007), we use PSF magnitudes because they go deeper at a given signal-to-noise ratio than aperture magnitudes, and have more accurate photometric error estimates than model magnitudes. In addition, various low-order statistics such as root-mean-square scatter (Σ), χ² per degree of freedom (χ²), lightcurve skewness (γ), and minimum and maximum PSF magnitude were computed for each ugriz band and each source. We compute χ² per degree of freedom as

$$\chi^2 = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_i - \langle x \rangle)^2}{\xi_i^2} \tag{1}$$

and lightcurve skewness γ as⁵

$$\gamma = \frac{n^2}{(n-1)(n-2)} \, \frac{\mu_3}{\Sigma^3} \tag{2}$$

$$\mu_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \langle x \rangle)^3 \tag{3}$$

$$\Sigma = \left[ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \langle x \rangle)^2 \right]^{1/2} \tag{4}$$

where n is the number of detections, x_i is the magnitude, ⟨x⟩ is the mean magnitude, and ξ_i is the photometric error.
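As a minimal sketch of how these low-order statistics could be computed for one source in one band, the Python function below evaluates χ² per degree of freedom, the rms scatter Σ, and the skewness γ from a list of magnitudes and their photometric errors. The function name `lightcurve_stats` is hypothetical, the unweighted mean is assumed for ⟨x⟩, and the skewness is assumed to follow the small-sample form γ = n² μ₃ / [(n − 1)(n − 2) Σ³]; none of this is taken from the catalog pipeline itself.

```python
import math


def lightcurve_stats(mags, errs):
    """Return (chi2_per_dof, rms_scatter, skewness) for one band of one source.

    mags: magnitudes x_i; errs: photometric errors xi_i (same length).
    Assumes an unweighted mean magnitude and the small-sample skewness
    estimator; both are illustrative choices, not the catalog's code.
    """
    n = len(mags)
    if n < 3 or len(errs) != n:
        raise ValueError("need at least 3 detections and one error per detection")

    mean = sum(mags) / n
    resid = [x - mean for x in mags]

    # chi^2 per degree of freedom: error-scaled squared residuals over (n - 1)
    chi2_dof = sum((r / e) ** 2 for r, e in zip(resid, errs)) / (n - 1)

    # rms scatter Sigma with the (n - 1) normalization
    rms = math.sqrt(sum(r * r for r in resid) / (n - 1))

    # third central moment mu_3 with the 1/n normalization
    mu3 = sum(r ** 3 for r in resid) / n

    # a constant lightcurve has no defined asymmetry; report 0 in that case
    skew = n * n / ((n - 1) * (n - 2)) * mu3 / rms ** 3 if rms > 0 else 0.0
    return chi2_dof, rms, skew


if __name__ == "__main__":
    # one dimming episode (larger magnitude) in an otherwise flat lightcurve
    print(lightcurve_stats([18.0, 18.0, 18.0, 18.0, 18.6], [0.02] * 5))
```

With this sign convention a source that is occasionally fainter than usual (a few points at larger magnitude) yields positive skewness, and a source that is occasionally brighter yields negative skewness.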
Separation of quasars and stars, as well as efficient color selection of horizontal branch and RR Lyrae stars, depend on accurate u band photometry. To ensure this, we select 748,084 unresolved sources from the Ivezić et al. (2007) catalog with at least 4 detections in the u band. A catalog of variable sources selected from this sample is analyzed in Section 3 below.

⁵ We use equations from http://www.xycoon.com/skewness_small_sample_test_1.htm.

3. Analysis of Stripe 82 Catalog of Variable Sources

In this section we describe methods for selecting candidate variable sources, and present tests that show the robustness of the adopted selection criteria. The distribution of selected variable sources in magnitude-color-variability space is also presented.

3.1. Methods and Selection Criteria

Due to a relatively small number of observations per source and random sampling, we do not perform lightcurve fitting, but instead use low-order statistics to select candidate variables and study their properties. There are four parameters (median PSF magnitude, root-mean-square scatter Σ, χ², and lightcurve skewness γ) measured in five photometric bands (u, g, r, i, and z), for a total of 20 parameters. In the analysis presented here, we utilize eight of them:

• median PSF magnitudes in the ugr bands (corrected for interstellar extinction using the map from Schlegel, Finkbeiner & Davis 1998) because the g − r vs. u − g color-color diagram has the most classification power (e.g. Smolčić et al. 2004 and references therein),
• Σ and χ² in the g and r bands, and
• lightcurve skewness γ(g) (the g band combines a high signal-to-noise ratio and large variability amplitude for the majority of variable sources).

The observed root-mean-square scatter Σ includes both the intrinsic variability σ and the mean photometric error ⟨ξ(m)⟩ as a function of magnitude. The dependence of Σ on magnitude in the ugriz bands is shown in Figure 1.
For sources brighter than 18, 19.5, 19.5, 19, and 17.5 mag in the ugriz bands, respectively, the SDSS delivers 2% photometry with little or no dependence on magnitude. We determine ⟨ξ(m)⟩ by fitting a fourth degree polynomial to median Σ values in 0.5 mag wide bins (here we assume that the majority of sources are not variable). The theoretically expected ⟨ξ(m)⟩ function (Strateva et al. 2001)

$$\langle \xi(m) \rangle = a + b\,10^{0.4m} + c\,10^{0.8m} \tag{5}$$

provides equally good fits. We define the intrinsic variability σ (hereafter rms scatter σ) as

$$\sigma = \left( \Sigma^2 - \langle \xi(m) \rangle^2 \right)^{1/2} \tag{6}$$

for Σ > ⟨ξ(m)⟩, and σ = 0 otherwise.

As the first variability selection criterion, we adopt σ(g) > 0.05 mag and σ(r) > 0.05 mag (hereafter written as σ(g, r) > 0.05 mag). At the bright end, this criterion is equivalent to selecting sources with rms scatter greater than 2.5σ0, where σ0 = 0.02 mag is the measurement noise. Selection cuts are applied simultaneously in the g and r bands to reduce the number of “false positives” (intrinsically non-variable sources selected as candidate variable sources due to measurement noise). About 6% of sources pass the σ cut in each band separately, and ∼ 3% of sources pass the cut in both bands simultaneously.

By selecting sources with σ(g, r) > 0.05 mag, we also select faint sources that have large σ due to large photometric errors at the faint end. To only select faint sources with statistically significant rms scatter, we apply the χ² test as the second selection cut.

In the χ² test, the value of χ² per degree of freedom (calculated with respect to a weighted mean magnitude and using errors computed by the photometric pipeline) determines whether the observed lightcurve is consistent with the Gaussian distribution of errors. Large χ² values show that the rms scatter is inconsistent with random fluctuations. Ivezić et al. (2003a, 2007) used multi-epoch SDSS observations to show that the photometric error distribution in the SDSS roughly follows a Gaussian distribution.
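The first selection step described above can be sketched in a few lines of Python. The sketch assumes that the observed scatter Σ and the mean photometric error ⟨ξ(m)⟩ are already available for each band; the function names and the coefficient values used in the example are hypothetical, and only the functional forms and the 0.05 mag threshold come from the text.

```python
import math


def expected_error(m, a, b, c):
    """Mean photometric error at magnitude m: a + b*10^(0.4 m) + c*10^(0.8 m).

    The coefficients a, b, c must come from a fit to the data; no fitted
    values are given in the text, so the caller has to supply them.
    """
    return a + b * 10 ** (0.4 * m) + c * 10 ** (0.8 * m)


def intrinsic_scatter(observed_rms, mean_error):
    """Intrinsic rms scatter: sqrt(Sigma^2 - <xi>^2), or 0 if Sigma <= <xi>."""
    if observed_rms <= mean_error:
        return 0.0
    return math.sqrt(observed_rms ** 2 - mean_error ** 2)


def passes_sigma_cut(sigma_g, sigma_r, threshold=0.05):
    """First variability cut: intrinsic scatter above threshold in both g and r."""
    return sigma_g > threshold and sigma_r > threshold


if __name__ == "__main__":
    # bright source: 0.02 mag measurement noise, 0.07 mag observed scatter in g and r
    sigma_g = intrinsic_scatter(0.07, 0.02)
    sigma_r = intrinsic_scatter(0.07, 0.02)
    print(sigma_g, passes_sigma_cut(sigma_g, sigma_r))
```

Requiring the cut in both bands at once is what suppresses noise-driven false positives, since a random fluctuation is unlikely to push both g and r over the threshold for the same source.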
A comparison of χ² distributions in the g and r bands with a reference Gaussian χ² distribution is shown in Figure 2. As evident, χ² distributions in both bands roughly follow the reference Gaussian χ² distribution for χ² < 1, demonstrating that median photometric errors are correctly determined. The discrepancy for larger χ² is due to variable sources rather than non-Gaussian error distributions, as we demonstrate below.

The second selection cut, χ²(g) > 3 and χ²(r) > 3 (hereafter written as χ²(g, r) > 3), selects ∼ 90% of σ(g, r) > 0.05 mag sources, as shown in Figure 2 (middle panels). The effectiveness of the χ² test is demonstrated in the bottom panel of Figure 2. For magnitudes fainter than g = 20.5, the fraction of candidate variables decreases as photometric errors increase. The selection is relatively uniform for sources brighter than g = 20.5, and we adopt this value as the flux limit for the selected variable sample.

There are 662,195 sources brighter than g = 20.5 in the full sample. Using σ(g, r) > 0.05 mag and χ²(g, r) > 3 as the selection criteria, we select 13,051 candidate variable sources⁶. Therefore, at least 2% of unresolved optical sources brighter than g = 20.5 appear variable at the > 0.05 mag level (rms) simultaneously in the g and r bands. The fraction of selected variable sources is not a strong function of the minimum required number of observations, but it does depend on the stellar density because the number of stars increases at lower Galactic latitudes (see Fig. 5 in Ivezić et al. 2007) while the quasar count remains the same.

⁶ A list of candidate variable sources and their data from Ivezić et al. (2007) are publicly available from http://www.sdss.org/dr5/products/value_added/index.html

3.2. The Counts of Variable Sources

In this section we estimate the completeness and efficiency of the candidate variable sample, and discuss the dependence of counts, rms scatter, σ(g)/σ(r) ratio, and the lightcurve skewness γ(g) on the position in the g − r vs. u − g color-color diagram.

3.2.1. Completeness

The selection completeness, defined as the fraction of true variable sources recovered by the algorithm, depends on the lightcurve shape and amplitudes. Due to a fairly large number of observations (median of 10), and small σ(g, r) cutoff compared to typical amplitudes of variable sources (e.g. most RR Lyrae stars and quasars have peak-to-peak amplitudes ∼ 1 mag), we expect the completeness to be fairly high for RR Lyrae stars (≳ 95%, see Section 4) and quasars (∼ 90%, see Section 5). The completeness for other types of variable sources, such as flares and eclipsing binaries, is hard to estimate, but is probably low due to sparse sampling.

3.2.2. Efficiency

The selection efficiency, defined as the fraction of true variable sources in the candidate variable sample, determines the robustness of the selection algorithm. The main diagnostic for the robustness of the adopted selection criteria is the distribution of selected candidates in the SDSS color-magnitude and color-color diagrams. The position of a source in these diagrams is a good proxy for its spectral classification (Lenz et al. 1998; Fan 1999; Finlator 2000; Smolčić et al. 2004).

Figure 3 compares the distribution of candidate variable sources to that of all sources in the g − r vs. u − g color-color diagram. Were the selection a random process, the selected candidates would have the same distribution as the full sample. The distributions of candidate variables and of the full sample are remarkably different, demonstrating that the candidate variables are not randomly selected from the parent sample. The three dominant classes of variable objects are quasars, RR Lyrae stars, and stars from the main stellar locus.
The most obvious difference between the variable and the full sample distributions is a much higher fraction of low-redshift quasars (z < 2.2, recognized by their UV excess, u − g < 0.7, see Richards et al. 2002) and RR Lyrae stars (u − g ∼ 1.15, g − r < 0.3, see Ivezić et al. 2005) in the variable sample, as vividly shown in the bottom panel of Figure 3. Another interesting feature visible in this panel is a gradient in the fraction of variable main stellar locus stars (perpendicular to the main stellar locus). We investigate this gradient by first defining principal colors

$$P_1 = 0.91u - 0.495g - 0.415r - 1.28 \tag{7}$$

$$s = -0.249u + 0.794g - 0.555r + 0.234 \tag{8}$$

where P1 and s are the principal axes parallel and perpendicular to the main stellar locus, respectively (Ivezić et al. 2004a). The s color is a measure of metallicity (Lenz et al. 1998), and s > 0.05 stars are expected to be metal poor (Helmi et al. 2003). Sources with r < 19 and 0 < P1 < 0.9 are selected and binned in four s bins. For each bin we calculate the fraction of sources with σ(g) > 0.05 mag, the fraction of variable sources (selected with σ(g, r) > 0.05 mag and χ²(g, r) > 3), median σ(g), and the total number of sources in the bin (see Table 2). A greater fraction of variable sources in the last bin (s > 0.06) indicates that, on average, metal-poor main stellar locus stars are more variable than the metal-rich stars. This could be because this sample of metal-poor stars is expected to have a high fraction of giants.

In order to quantify the differences between the full and the variable sample, we follow Sesar et al. (2006) and divide the g − r vs. u − g color-color diagram into six characteristic regions, each dominated by a particular type of source, as shown in Figure 4. The fractions and counts of variable and all sources in each region are listed in Table 1 for g < 19, g < 20.5, and g < 22 flux-limited samples.
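The principal-color selection described above (P1 and s, the r < 19 and 0 < P1 < 0.9 cuts, and the binning in s) can be sketched as follows. The coefficients and cuts are those quoted in the text; the function names are hypothetical, and the bin edges in the example are an assumption, since only the lower edge of the last bin (s > 0.06) is stated.

```python
import bisect


def principal_colors(u, g, r):
    """Principal colors along (P1) and across (s) the main stellar locus."""
    p1 = 0.91 * u - 0.495 * g - 0.415 * r - 1.28
    s = -0.249 * u + 0.794 * g - 0.555 * r + 0.234
    return p1, s


def in_locus_sample(u, g, r):
    """Sample cut used before binning in s: r < 19 and 0 < P1 < 0.9."""
    p1, _ = principal_colors(u, g, r)
    return r < 19.0 and 0.0 < p1 < 0.9


def s_bin_index(s, edges):
    """Index of the s bin; len(edges) interior edges define len(edges) + 1 bins."""
    return bisect.bisect_right(edges, s)


if __name__ == "__main__":
    # hypothetical interior edges giving four bins; only 0.06 is taken from the text
    edges = [-0.02, 0.02, 0.06]
    u, g, r = 20.0, 17.9, 17.5
    p1, s = principal_colors(u, g, r)
    print(p1, s, in_locus_sample(u, g, r), s_bin_index(s, edges))
```

Because the P1 coefficients sum to zero, P1 does not change when all three magnitudes shift by the same amount, so it behaves as a pure color.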
Notably, for the adopted g < 20.5 flux limit, the fraction of Region II sources (dominated by low-redshift quasars) in the variable sample is 63%, or ∼ 30 times greater than the fraction of Region II sources in the full sample (∼ 2%). The fraction of Region IV sources (which include RR Lyrae stars) in the variable sample is also high when compared to the full sample (∼ 6 times higher).

As shown in Table 1, in the g = 20.5 flux-limited sample, we find that low-redshift quasars and RR Lyrae stars (i.e. Regions II and IV) make up 70% of the variable population, while representing only 3% of all sources. Quasars alone account for 63% of the variable population. Stars from the main stellar locus represent 95% of all sources and 25% of the variable sample: about 0.5% of the stars from the locus are variable at the > 0.05 mag level.

3.3. The Properties of Variable Sources

Various lightcurve properties, such as shape and amplitude, are expected to be correlated with stellar types. In this section we study the distribution of the rms scatter in the u and g bands, and σ(g)/σ(r) ratio as a function of the u − g and g − r colors. To emphasize trends, we bin sources and present median values for each bin.

The distribution of the median σ(u) and σ(g) values in the g − r vs. u − g color-color diagram is shown in the top two panels of Figure 5. RR Lyrae stars show larger rms scatter (≳ 0.3 mag) in the u and g bands than low-redshift quasars or stars from the main stellar locus. Quasars also show slightly larger rms scatter in the u band (∼ 0.1 mag) than in the g band (∼ 0.07 mag), as discussed by Kinney et al. (1991), Ivezić et al. (2004b), and Vanden Berk et al. (2004). If we define the degree of variability as the root-mean-square scatter in the g band, then on average RR Lyrae stars show the greatest variability, followed by quasars and the main stellar locus stars.

Another distinctive characteristic of variable sources is the ratio of flux changes in different bandpasses.
This property can be used to select different types of variable sources. For example, RR Lyrae stars are bluer when brighter, a behavior used by Ivezić et al. (2000) to select RR Lyrae using 2-epoch SDSS data. Here we define a new parameter, σ(g)/σ(r), to express the ratio of flux changes in the g and r bands, and study its distribution in the g − r vs. u − g color-color diagram. In particular, we examine this distribution and its median values for three dominant classes of variable sources: quasars, RR Lyrae stars, and stars from the main stellar locus.

The bottom left panel in Figure 5 shows the distribution of median σ(g)/σ(r) values as a function of u − g and g − r colors. Using Fig. 5 we note that on average:

• RR Lyrae stars have σ(g)/σ(r) ∼ 1.4,
• Main stellar locus stars have σ(g)/σ(r) ∼ 1, and
• Quasars show a σ(g)/σ(r) gradient in the g − r vs. u − g color-color diagram.

The average value of σ(g)/σ(r) ∼ 1.4 in Region IV indicates that RR Lyrae stars dominate the variable source count in this region. The ratio of 1.4 for RR Lyrae stars was also previously found by Ivezić et al. (2000). While Figure 5 only presents median values of the rms scatter, Figure 6 shows how the rms scatter in the g and r bands correlates with the u − g color for individual sources. Variable sources that follow the σ(g) = 1.4σ(r) relation also correlate with the u − g color, and have u − g ∼ 1, as expected for RR Lyrae stars.

The average ratio of σ(g)/σ(r) ∼ 1 (i.e. gray flux variations) for stars in the main stellar locus suggests that the variability could be caused by eclipsing systems. The distribution of γ(g) for main stellar locus stars further strengthens this possibility, as discussed in Section 3.4 below.

The gradient in the σ(g)/σ(r) ratio observed for low-redshift quasars in the g − r vs. u − g color-color diagram suggests that the variability correlation between the g and r bands is more complex than in the case of RR Lyrae or main stellar locus stars. Wilhite et al. (2006) show that the photometric color changes for quasars depend on the combined effects of continuum changes, emission-line changes, redshift, and the selection of photometric bandpasses. They note that due to the lack of variability of the lines, measured photometric color is not always bluer in brighter phases, but depends on redshift and the filters used. To verify the dependence of broad-band photometric variability on redshift, we plot σ(g)/σ(r) vs. redshift for all spectroscopically confirmed unresolved quasars from Schneider et al. (2005) which are in Stripe 82, as shown in Figure 7.

We confirm that the broad-band photometric variability depends on the redshift, and that the σ(g)/σ(r) gradient in the g − r vs. u − g color-color diagram can be explained by the increase in σ(g)/σ(r) from ∼ 1 to ∼ 1.6 in the 1.0 to 1.6 redshift range. This effect is due to the Mg II emission line (more stable in flux than the continuum) moving through the r band filter over this redshift range. The implied correlation of the u − g and g − r colors with redshift is consistent with the discussion by Richards et al. (2002). The lack of noticeable correlation of σ(g) with redshift is due to the combined effects of the dependence of σ(g) on the rest-frame wavelength and time which cancel out (for a detailed model see Ivezić et al. 2004b).

3.4. Skewness as a Proxy for Dominant Variability Mechanism

Lightcurve skewness, a measure of the lightcurve asymmetry, provides additional information on the type of variability. Negatively skewed, asymmetric lightcurves indicate variable sources that spend more time fainter than (m_min + m_max)/2, where m_min and m_max are magnitudes at the minimum and maximum. Type ab RR Lyrae stars, for example, have negatively skewed lightcurves (γ ∼ −0.5, Wils, Lloyd & Bernhard 2006). Positively skewed, asymmetric lightcurves indicate variable sources that spend more time brighter than (m_min + m_max)/2 (e.g. eclipsing systems).
Sources with symmetric lightcurves will have γ ∼ 0.

The bottom right panel in Figure 5 shows the distribution of the median γ(g) as a function of the position in the g − r vs. u − g color-color diagram. On average, quasars and c type RR Lyrae stars (u − g ∼ 1.15, g − r < 0.15) have γ(g) ∼ 0, ab type RR Lyrae (u − g ∼ 1.15, g − r > 0.15) have negative skewness (γ(g) ∼ −0.5), and stars in the main stellar locus have positive skewness.

Figure 8 shows the distribution of the lightcurve skewness in the ugi bands for spectroscopically confirmed unresolved quasars from Schneider et al. (2005) which are in Stripe 82, candidate RR Lyrae stars (selection details are discussed in Section 4 below), and main stellar locus stars from our variable sample. Stars in the main stellar locus show a bimodal γ(g) distribution. This distribution suggests at least two, and perhaps more, different populations of variables. Indeed, when spectroscopically confirmed M dwarfs are selected, a third peak appears at γ(g) ∼ −2.5, possibly associated with flaring M dwarfs (Kowalski et al. 2007). The bimodality similar to the one in the g band is also discernible in the r band, while it is less pronounced in the i band and not detected in the u and z bands (the r and z data are not shown).

A comparison of the u − g and g − r color distributions for variable main stellar locus stars brighter than g = 19 and a subset with highly asymmetric lightcurves (γ(g) > 2.5) is shown in Figure 9. The subset with asymmetric lightcurves has an increased fraction of stars with colors u − g ∼ 2.5 and g − r ∼ 1.4, which correspond to M stars. This may indicate that M stars have a higher probability of being associated with an eclipsing companion than stars with earlier spectral types. However, the selection effects are probably important since a companion is easier to detect (due to the low luminosity of M dwarfs). Kowalski et al. (2007) examine these issues using lightcurve data on a sample of spectroscopically confirmed M dwarfs.

Finally, quasars have symmetric lightcurves (γ ∼ 0) and their distribution of skewness does not change between bands.

4. The Milky Way Halo Structure Traced by Candidate RR Lyrae Stars

Studies of substructures in the Galactic halo, such as clumps and streams, can constrain the formation history of the Milky Way. RR Lyrae stars are among the best tracers of the outer halo because they are nearly standard candles, are sufficiently bright to be detected at large distances (5–100 kpc for 14 < r < 20.7), and are sufficiently numerous to trace the halo substructure with a high spatial resolution.

The General Catalog of Variable Stars (GCVS; Kholopov et al. 1988) lists⁷ RR Lyrae stars as RR Lyrae type ab (RRab) and type c (RRc) stars. RRab stars have asymmetric lightcurves, periods from 0.3 to 1.2 days, and amplitudes from V ∼ 0.5 to V ∼ 2. RRc stars have nearly symmetric, sometimes sinusoidal, lightcurves, with periods from 0.2 to 0.5 days, and amplitudes not greater than V ∼ 0.8. In this work we assume M_V = 0.7 as the absolute V band magnitude of RRab and RRc stars. A comprehensive review of RR Lyrae stars can be found in Smith (1995).

In this section we fine-tune criteria for selecting candidate RR Lyrae stars, and estimate the selection completeness and efficiency. Using selected candidate RR Lyrae stars, we recover a known halo clump associated with the Sgr dwarf tidal stream, and find several new halo substructures.

⁷ A list of GCVS variability types can be found at http://www.sai.msu.su/groups/cluster/gcvs/gcvs/iii/vartype.txt

4.1. Criteria for Selecting RR Lyrae Stars

Figures 3, 4, and 5 show that RR Lyrae stars occupy a well-defined region (Region IV) in the g − r vs. u − g color-color diagram, and Figure 6 shows how RR Lyrae stars follow the σ(g) = 1.4σ(r) relation.
Motivated by these results, we introduce color and σ(g)/σ(r) cuts to specifically select candidate RR Lyrae stars from the variable sample, and study their distribution in the rms scatter-color-lightcurve skewness parameter space.

RR Lyrae stars have distinctive colors and can be selected with the following criteria (Ivezić et al. 2005):

$$0.98 < u - g < 1.30 \tag{9}$$

$$-0.05 < D_{ug} < 0.35 \tag{10}$$

$$0.06 < D_{gr} < 0.55 \tag{11}$$

$$-0.15 < r - i < 0.22 \tag{12}$$

$$-0.21 < i - z < 0.25 \tag{13}$$

where

$$D_{ug} = (u - g) + 0.67(g - r) - 1.07 \tag{14}$$

$$D_{gr} = 0.45(u - g) - (g - r) - 0.12. \tag{15}$$

We apply these cuts to our sample of candidate variables and select 846 sources. It is implied by Ivezić et al. (2005) that RR Lyrae should always stay within these color boundaries, even though their colors change as a function of phase. The distribution of the selected sources in the g − r vs. u − g color-color diagram and their rms scatter in the g band are shown in Figure 10 (top left panel).

The distribution of sources in the RR Lyrae region is inhomogeneous. Sources with large rms scatter in the g band (≳ 0.2 mag) are centered around u − g ∼ 1.15, and are separated by g − r ∼ 0.12 into two groups. A comparison with Figure 3 from Ivezić et al. (2005) suggests that these large rms scatter sources might be RR Lyrae type ab (RRab, g − r > 0.12) and type c stars (RRc, g − r < 0.12). Small rms scatter sources (≲ 0.1 mag) have a fairly uniform distribution, and are slightly bluer with u − g ≲ 1.1.

The distribution of sources from the RR Lyrae region in the σ(r) vs. σ(g) diagram is presented in the top right panel of Figure 10. The majority of large rms scatter sources follow the σ(g) = 1.4σ(r) relation, as expected for RR Lyrae stars. Since RR Lyrae stars are bluer when brighter, or equivalently, have greater rms scatter in the g band than in the r band, we require 1 < σ(g)/σ(r) ≤ 2.5 and select 683 candidate RR Lyrae stars.
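The color box of Eqs. 9 to 15 and the scatter-ratio requirement can be written down directly. The sketch below is only an illustration: the function names are hypothetical, the magnitudes are assumed to be extinction-corrected median PSF magnitudes, and the example source is invented rather than taken from the catalog.

```python
def has_rr_lyrae_colors(u, g, r, i, z):
    """Color box for RR Lyrae stars, following Eqs. 9-15 in the text."""
    ug, gr, ri, iz = u - g, g - r, r - i, i - z
    d_ug = ug + 0.67 * gr - 1.07
    d_gr = 0.45 * ug - gr - 0.12
    return (
        0.98 < ug < 1.30
        and -0.05 < d_ug < 0.35
        and 0.06 < d_gr < 0.55
        and -0.15 < ri < 0.22
        and -0.21 < iz < 0.25
    )


def is_rr_lyrae_candidate(u, g, r, i, z, sigma_g, sigma_r):
    """Color box plus the requirement 1 < sigma(g)/sigma(r) <= 2.5."""
    if sigma_r <= 0.0:
        # the ratio is undefined for a source with no intrinsic r-band scatter
        return False
    ratio = sigma_g / sigma_r
    return has_rr_lyrae_colors(u, g, r, i, z) and 1.0 < ratio <= 2.5


if __name__ == "__main__":
    # invented source with u-g = 1.15, g-r = 0.20 and sigma(g)/sigma(r) = 1.4
    print(is_rr_lyrae_candidate(18.35, 17.2, 17.0, 16.95, 16.95, 0.28, 0.20))
```

Keeping the color test separate from the ratio test mirrors the two counts quoted in the text: the color box alone (846 sources) and the color box combined with the ratio cut (683 sources).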
A comparison of u − g color distributions for candidate RR Lyrae stars and of sources with RR Lyrae colors, but not tagged as RR Lyrae stars, presented in the bottom left panel of Figure 10, demonstrates the robustness of the RR Lyrae selection. The two distributions are very different (the probability that they are the same is 10⁻⁴, as given by the KS test), with the candidate RR Lyrae distribution peaking at u − g ∼ 1.15, as expected for RR Lyrae stars.

One property that distinguishes RRab from RRc stars is the shape (or skewness) of their lightcurves (in addition to lightcurve amplitude and period). RRab stars have asymmetric lightcurves, while RRc lightcurves are symmetric. In the top left panel of Figure 10, we noted that g − r ∼ 0.12 seemingly separates high rms scatter sources into two groups. If g − r ∼ 0.12 is the boundary between the RRab and RRc stars, then the same boundary should show up in the distribution of lightcurve skewness as a function of the g − r color. As shown in Figure 10 (bottom left panel), this is indeed the case. On average, sources with g − r < 0.12 have γ(g) ∼ 0 (symmetric lightcurves), as RRc stars, while g − r > 0.12 sources have γ(g) ∼ −0.5 (asymmetric lightcurves) typical of RRab stars. We show in Section 4.2 that candidate RR Lyrae stars with γ(g) > 1 are contaminated by eclipsing variables. Therefore, to reduce the contamination by eclipsing variables, we also require γ(g) ≤ 1, and select 634 sources as our final sample of candidate RR Lyrae stars.

4.2. Completeness and efficiency

The selection completeness, defined as the fraction of recovered RR Lyrae stars, will depend on the color cuts, σ(g, r) cutoff, and the number of observations. The color cuts (Eqs. 9 to 15) applied in Section 4.1 were chosen to minimize contamination by sources other than RR Lyrae stars while maintaining an almost 100% completeness (Ivezić et al. 2005).
With the σ(g, r) cutoff at 0.05 mag (small compared to the ∼ 1 mag typical peak-to-peak amplitudes of RR Lyrae stars), and a fairly large number of observations per source (median of 10), we estimate the RR Lyrae selection completeness to be ≳ 95% (see Appendix in Ivezić et al. 2000).

To determine the selection efficiency, defined as the fraction of true RR Lyrae stars in the RR Lyrae candidate sample, we positionally match the 683 candidate RR Lyrae stars selected by 1 < σ(g)/σ(r) ≤ 2.5 to a sample of RR Lyrae sources selected from the SDSS Light-Motion-Curve Catalog (LMCC; Bramich et al. 2007). This catalog covers the same region of the sky as the one discussed here, but includes more recent SDSS-II observations that allow the construction of lightcurves. We match 613 candidates, while 70 candidate RR Lyrae stars from our sample do not have a match in the LMCC, for reasons that are not yet clear (De Lee, private communication). Following the classification based on phased lightcurves by De Lee et al. (2007), we find that 71% of sources in our candidate RR Lyrae sample are classified as RRab and RRc, 28% are classified as variable non-RR Lyrae stars, and only 1% are spurious, non-variable sources. The most significant contamination comes from a population of variable sources bluer than u − g ∼ 1.1 (dotted line, bottom left panel of Figure 11), possibly Population II δ Scuti stars, also known as SX Phoenicis stars (Hoffmeister, Richter & Wenzel 1985).

The top left and the bottom right panels in Figure 11 show that the RRab and RRc-dominated regions are separated by g − r ∼ 0.12, as already hinted in Figure 10. Also, variable non-RR Lyrae sources with γ(g) > 1 are classified by De Lee et al. (2007) as eclipsing variables, justifying our γ(g) ≤ 1 cut. To summarize, using color criteria and criteria based on σ(g), σ(r), and γ(g), RR Lyrae stars are selected with ≳ 95% completeness and ∼ 70% efficiency.

4.3.
The Spatial Distribution of Candidate RR Lyrae Stars

Using the selection criteria from Section 4.1 we isolate 634 RR Lyrae candidates. The magnitude-position diagram for these candidates within 2.5° from the Celestial Equator is shown in Figure 12. As discussed by Ivezić et al. (2005), an advantage of the data representation utilized in Figure 12 (magnitude vs. right ascension diagram) is its simplicity: only "raw" data are shown, without any post-processing. However, the magnitude scale is logarithmic and thus the spatial extent of structures is heavily distorted. In order to avoid these shortcomings, we have applied a Bayesian method for estimating continuous spatial density distribution developed by Ivezić et al. (2005) (see their Appendix B). The resulting density map is shown in the right panel in Figure 13. The advantage of that representation is that it better conveys the significance of various local overdensities. For comparison, we also show a map of the northern part of the equatorial strip constructed using 2-epoch data discussed by Ivezić et al. (2000).

We detect several new halo substructures at ≳ 3σ significance (compared to expected Poissonian fluctuations) and present their approximate locations and properties in Table 3. The most distant clump is at 100 kpc from the Galactic center. The strongest clump in the left wedge belongs to the Sgr dwarf tidal stream, as does the clump marked by C in the right wedge (Ivezić et al. 2003a). We note that the apparent "clumpiness" of the candidate RR Lyrae distribution increases with increasing radius, similar to CDM predictions by Bullock, Kravtsov & Weinberg (2001). A detailed comparison of their models with the data presented here will be discussed elsewhere (Sesar et al., in prep).

5. Are All Quasars Variable?
The optical continuum variability of quasars has been recognized since their first optical identification (Matthews & Sandage 1963), and it has been proposed and utilized as an efficient method for their discovery (van den Bergh, Herbst & Pritchet 1973; Hawkins 1983; Koo, Kron & Cudworth 1986; Hawkins & Veron 1995). The observed characteristics of the variability of quasars are frequently used to constrain the origin of their emission (e.g. Kawaguchi et al. 1998 and references therein; Martini & Schneider 2003; Pereyra et al. 2006). Recently, significant progress in the description of quasar variability has been made by employing the SDSS data (de Vries, Becker & White 2003; Ivezić et al. 2004b; Vanden Berk et al. 2004; de Vries et al. 2005; Sesar et al. 2006). Here we expand these studies by quantifying the efficiency of quasar discovery using variability.

A preliminary comparison of color and variability based methods for selecting quasars using SDSS data was presented by Ivezić et al. (2003b). They found that 47% of spectroscopically confirmed unresolved quasars with UV excess have a g band magnitude difference between two observations obtained two years apart larger than 0.15 mag. We can improve on their analysis because now there are significantly more observations obtained over a longer time period. Since quasars vary erratically and the rms scatter of their variability (the so-called structure function) increases with time (e.g. Vanden Berk et al. 2004 and references therein), the variability selection completeness is expected to be higher than the ∼ 50% obtained by Ivezić et al. (2003b).

First, although the adopted variability selection criteria discussed above are fairly conservative, we find that at least 63% of low-redshift quasars are variable at the > 0.05 mag level (simultaneously in the g and r bands over observer's time scales of several years) in the g < 20.5 flux-limited sample.
Second, even this estimate is only a lower limit: given the spectroscopic confirmation for a large flux-limited sample of quasars, it is possible to relax the adopted variability selection cutoff without a prohibitive contamination by non-variable sources. There are 2,492 unresolved quasars in the catalog of spectroscopically confirmed SDSS quasars (Schneider et al. 2005) from Stripe 82. The fraction of these objects that vary more than σ in the g and r bands, as a function of σ, is shown in Figure 14. We also show the analogous fraction for stars from the stellar locus. About 93% of quasars vary with σ > 0.03 mag. For a small fraction of these objects the measured rms scatter is due to photometric noise, and the stellar data limit this fraction to be at most 3%. Conservatively assuming that none of these 3% of stars is intrinsically variable, we estimate that at least 90% of quasars are variable at the 0.03 mag level on time scales up to several years.

6. Implications for Surveys such as LSST

The Large Synoptic Survey Telescope (LSST) is a proposed imaging survey that aims to obtain repeated multi-band imaging to faint limiting magnitudes over a large fraction of the sky. The LSST Science Requirement Document (footnote 8) calls for ∼ 1000 repeated observations of a solid angle of ∼ 20,000 deg² distributed over the six ugrizY photometric bandpasses and over 10 years. The results presented here can be extrapolated to estimate the lower limit on the number of variable sources that the LSST would discover.

The single-epoch LSST images will have a 5σ detection limit (footnote 9) at r ∼ 24.7. Hence, 2% accurate photometry, comparable to the subsample with g < 20.5 discussed here, will be available for stars with r ≲ 22. The USNO-B catalog (Monet et al. 2003) shows that there are about 10⁹ stars with r < 21 across the entire sky. About half of these stars are in the parts of the sky to be surveyed by the LSST.
The simulations based on contemporary Milky Way models, such as those developed by Robin et al. (2003) and Jurić et al. (2007), predict that there are about twice as many stars with r < 22 as with r < 21 across the whole sky. Hence, it is expected that the LSST will detect about a billion stars with r < 22. This estimate is uncertain to within a factor of two or so due to unknown details in the spatial distribution of dust in the Galactic plane and towards the Galactic center.

We found that at least 0.5% of stars from the main stellar locus can be detected as variable with photometry accurate to ∼ 2%. This is only a lower limit because a much larger number of LSST observations obtained over a longer timespan than the SDSS data discussed here would increase this fraction. Hence, our results imply that the LSST will discover at least 50 million variable stars (without accounting for the fact that stellar counts greatly increase closer to the Galactic plane). Unlike the SDSS sample, where RR Lyrae stars account for ∼ 25% of all variable stars, the number of RR Lyrae stars in the LSST sample will be negligible compared to other types of variable stars.

As estimated by Jurić et al. (2007) using deeper coadded SDSS photometry, there are about 100 deg⁻² low-redshift quasars with r < 22 (see also Beck-Winchatz & Anderson 2007 and references therein). Therefore, with a sky coverage of ∼ 20,000 deg², the LSST will obtain well-sampled accurate multi-color lightcurves for ∼ 2 million low-redshift quasars. Even at the redshift limit of ∼ 2, this sample will be complete to M_r ∼ −24, that is, almost to the formal quasar luminosity cutoff, and will represent an unprecedented sample for studying quasar physics.

Footnote 8: Available at http://www.lsst.org/Science/lsst_baseline.shtml
Footnote 9: An LSST Exposure Time Calculator is available at www.lsst.org

7.
Conclusions and Discussion

We have designed and tested algorithms for selecting candidate variable sources from a catalog based on multiple SDSS imaging observations. Using a sample of 13,051 selected candidate variable sources in the adopted g < 20.5 flux-limited sample, we find that at least 2% of unresolved optical sources appear variable at the > 0.05 mag level, simultaneously in the g and r bands. A similar fraction of variable sources (∼ 1%) was also found by Sesar et al. (2006), using recalibrated photometric POSS and SDSS measurements, and by Morales-Rueda et al. (2006), using the Faint Sky Variability Survey data.

Thanks to the multi-color nature of the SDSS photometry, and especially to the u band data, we can obtain robust classification of selected variable sources. The majority (2/3) of variable sources are low-redshift (< 2) quasars, although they represent only 2% of all sources in the adopted g < 20.5 flux-limited sample. We find that about 1/4 of variable stars are RR Lyrae stars, and that only 0.5% of stars from the main stellar locus are variable at the 0.05 mag level.

The distribution of γ(g) for main stellar locus stars is bimodal, suggesting at least two, and perhaps more, different populations of variables. About a third of variable stars from the stellar locus show gray flux variations in the g and r bands (σ(g)/σ(r) ∼ 1) and positive lightcurve skewness, suggesting variability caused by eclipsing systems. This population has an increased fraction of M type stars.

RR Lyrae stars show the largest rms scatter in the u and g bands, followed by low-redshift quasars. The ratio of rms scatter in the g and r bands for RR Lyrae is ∼ 1.4, in agreement with the Ivezić et al. (2000) results based on 2-epoch photometry. The mean lightcurve skewness for RR Lyrae stars is ∼ −0.5, in agreement with Wils, Lloyd & Bernhard (2006). We selected a sample of 634 candidate RR Lyrae stars, with an estimated ≳ 95% completeness and ∼ 70% efficiency.
Using these stars, we detected rich halo substructure out to distances of 100 kpc. The apparent "clumpiness" of the candidate RR Lyrae distribution increases with increasing radius, similar to CDM predictions by Bullock, Kravtsov & Weinberg (2001).

Low-redshift quasars show a dependence of σ(g)/σ(r) on redshift, consistent with discussions in Richards et al. (2002) and Wilhite et al. (2006). The lightcurve skewness distribution for quasars is centered on zero in all photometric bands. We find that at least 90% of quasars are variable at the 0.03 mag level (rms) on time scales up to several years. This confirms that variability is as good a method for finding low-redshift quasars at high (|b| > 30) Galactic latitudes as is the UV excess color selection. The fraction of variable quasars at the > 0.1 mag level obtained here (30%, see Figure 14) is comparable to the 36% found by Rengstorf, Brunner & Wilhite (2006).

The multiple photometric observations obtained by the SDSS represent an excellent dataset for estimating the impact of surveys such as the LSST on studies of the variable sky. Our results indicate that the LSST will obtain well-sampled 2% accurate multi-color lightcurves for ∼ 2 million low-redshift quasars, and will discover at least 50 million variable stars. The number of variable stars discovered by the LSST will be of the same order as the number of all stars detected by the SDSS. With about 1000 data points in six photometric bands, it will be possible to recognize and classify variable objects using lightcurve moments of higher order than the skewness discussed here, including lightcurve folding for periodic variables.

We acknowledge support by NSF grant AST-0551161 to the LSST for design and development activity.

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S.
Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

REFERENCES

Adelman-McCarthy, J. K., Agüeros, M. A., Allam, S. S. et al., 2007, submitted
Akerlof, C., Amrose, S., Balsano, R. et al., 2000, AJ, 119, 1901
Alcock, C., Allsman, R. A., Alves, D. R. et al., 2001, ApJS, 136, 439
Astronomy and Astrophysics Survey Committee, Board on Physics and Astronomy, Space Studies Board, National Research Council, 2001, Astronomy and Astrophysics in the New Millennium (The National Academies Press)
Becker, A. C., Wittman, D. M., Boeshaar, P. C. et al., 2004, ApJ, 611, 418
Beck-Winchatz, B., & Anderson, S. F., 2007, MNRAS, 374, 1506
Bramich, D. et al., 2007, in prep
Bullock, J. S., Kravtsov, A. V., & Weinberg, D. H., 2001, ApJ, 548, 33
De Lee, N. et al., 2007, in prep
de Vries, W. H., Becker, R. H., & White, R. L., 2003, AJ, 126, 1217
de Vries, W. H., Becker, R.
H., White, R. L., & Loomis, C., 2005, AJ, 129, 615
Fan, X., 1999, AJ, 117, 2528
Finlator, K., Ivezić, Ž., Fan, X. et al., 2000, AJ, 120, 2615
Frieman, J. A. et al., 2007, in prep
Fukugita, M., Ichikawa, T., Gunn, J. E. et al., 1996, AJ, 111, 1748
Groot, P. J., Vreeswijk, P. M., Huber, M. E. et al., 2003, MNRAS, 339, 427
Gunn, J. E., Carr, M., Rockosi, C. et al., 1998, AJ, 116, 3040
Gunn, J. E., Siegmund, W. A., Mannery, E. J. et al., 2006, AJ, 131, 2332
Hawkins, M. R. S., 1983, MNRAS, 202, 571
Hawkins, M. R. S., & Veron, P., 1995, MNRAS, 275, 1102
Hawley, S. L., Covey, K. R., Knapp, G. R. et al., 2002, AJ, 123, 3409
Helmi, A., Ivezić, Ž., Prada, F. et al., 2003, ApJ, 586, 195
Hoffmeister, C., Richter, G., & Wenzel, W., 1985, Variable Stars (New York: Springer)
Hogg, D. W., Finkbeiner, D. P., Schlegel, D. J., & Gunn, J. E., 2002, AJ, 122, 2129
Ivezić, Ž., Goldston, J., Finlator, K. et al., 2000, AJ, 120, 963
Ivezić, Ž., Lupton, R. H., Anderson, S. et al., 2003a, Proceedings of the Workshop Variability with Wide Field Imagers, Mem. Soc. Astron. Italiana, 74, 978 (also astro-ph/0301400)
Ivezić, Ž., Lupton, R. H., Johnston, D. E. et al., 2003b, Proceedings of "AGN Physics with the SDSS", Richards, G. T. & Hall, P. B., eds., in press (also astro-ph/0310566)
Ivezić, Ž., Lupton, R. H., Schlegel, D. et al., 2004a, Astronomische Nachrichten, 325, No. 6-8, 583-589 (also astro-ph/0410195)
Ivezić, Ž., Lupton, R. H., Jurić, M. et al., 2004b, Proceedings of "The Interplay among Black Holes, Stars, and ISM in Galactic Nuclei", IAU Symposium No. 222, Bergmann, Th. S., Ho, L. C., & Schmitt, H. R., eds., p. 525 (also astro-ph/0404487)
Ivezić, Ž., Vivas, A. K., Lupton, R. H., & Zinn, R., 2005, AJ, 129, 1096
Ivezić, Ž., Smith, J. A., Miknaitis, G. et al., 2007, submitted to AJ (also astro-ph/0703157)
Jurić, M., Ivezić, Ž., Brooks, A. et al., 2007, submitted to AJ (also astro-ph/0510520)
Kawaguchi, T., Mineshige, S., Umemura, M., & Turner, E. L., 1998, ApJ, 504, 671
Kholopov, P.
N., Samus, N. N., Frolov, M. S. et al., 1988, General Catalog of Variable Stars, 4th Ed. (Moscow: Nauka)
Kinney, A. L., Bohlin, R. C., Blades, J. C., & York, D. G., 1991, ApJS, 75, 645
Koo, D. C., Kron, R. G., & Cudworth, K. M., 1986, PASP, 98, 285
Kowalski, A. et al., 2007, in prep
Lenz, D. D., Newberg, J., Rosner, R., Richards, G. T., & Stoughton, C., 1998, ApJS, 119,
Lupton, R. H., Ivezić, Ž., Gunn, J. E., Knapp, G. R., Strauss, M. A., & Yasuda, N., 2002, in "Survey and Other Telescope Technologies and Discoveries", Tyson, J. A. & Wolff, S., eds., Proc. SPIE, 4836, 350
Martini, P., & Schneider, D. P., 2003, ApJ, 597, L109
Matthews, T. A., & Sandage, A. R., 1963, ApJ, 138, 30
McMillan, J. D., & Herbst, W., 1991, AJ, 101, 1788
Monet, D. G., Levine, S. E., Canzian, B. et al., 2003, AJ, 125, 984
Morales-Rueda, L., Groot, P. J., Augusteijn, T. et al., 2006, MNRAS, 371, 1681
Pier, J. R., Munn, J. A., Hindsley, R. B., Hennesy, G. S., Kent, S. M., Lupton, R. H., & Ivezić, Ž., 2003, AJ, 125, 1559
Pereyra, N. A., Vanden Berk, D. E., & Turnshek, D. A., 2006, ApJ, 642, 87
Richards, G. T., Fan, X., Newberg, H. J. et al., 2002, AJ, 123, 2945
Rengstorf, A. W., Mufson, S. L., Andrews, P. et al., 2004, AJ, 617, 184
Rengstorf, A. W., Brunner, R. J., & Wilhite, B. C., 2006, AJ, 131, 1923
Robin, A. C., Reylé, C., Derriére, S., & Picaud, S., 2003, A&A, 409, 523
Schlegel, D., Finkbeiner, D. P., & Davis, M., 1998, ApJ, 500, 525
Schneider, D. P., Hall, P. B., Richards, G. T. et al., 2005, AJ, 367
Scranton, R., Johnston, D. et al., 2002, ApJ, 579, 48
Sesar, B., Svilković, D., Ivezić, Ž. et al., 2006, AJ, 131, 2801
Sirko, E., Goodman, J., Knapp, G. et al., 2004, AJ, 127, 899S
Smith, H. A., 1995, RR Lyrae Stars (Cambridge University Press)
Smith, J.
A., Tucker, D. L., Kent, S. M. et al., 2002, AJ, 123, 2121
Smolčić, V., Ivezić, Ž., Knapp, G. R. et al., 2004, ApJ, 615L, 141
Strateva, I., Ivezić, Ž., Knapp, G. R. et al., 2001, 122, 1861
Tonry, J. L., Howell, S. B., Everett, M. E. et al., 2005, PASP, 117, 281
Stoughton, C., Lupton, R. H., Bernardi, M. et al., 2002, AJ, 123, 485
Trevese, D., Kron, R. G., & Bunone, A., 2001, ApJ, 551, 103
Tucker, D., Kent, S., Richmond, M. W. et al., 2006, Astronomische Nachrichten, 327, 821
Tyson, J. A., 2002, in "Survey and Other Telescope Technologies and Discoveries", Tyson, J. A. & Wolff, S., eds., Proc. SPIE, 4836, 10
Udalski, A., Zebrun, K., Szymanski, M. et al., 2002, Acta Astron., 52, 115
van den Bergh, S., Herbst, E., & Pritchet, C., 1973, AJ, 78, 375
Vanden Berk, D. E., Wilhite, B. C., Kron, R. G. et al., 2004, ApJ, 601, 692
Vivas, A. K., Zinn, R., Andrews, P. et al., 2001, ApJ, 554, L33
Walker, A. R., 2003, Proceedings of the Workshop Variability with Wide Field Imagers, Mem. Soc. Astron. Italiana, 74, 999 (also astro-ph/0303012)
Wilhite, B. C., Vanden Berk, D. E., Kron, R. G. et al., 2006, ApJ, 633, 638
Wils, P., Lloyd, C., & Bernhard, K., 2006, MNRAS, 368, 1757
Woźniak, P. R., Udalski, A., Szymanski, M. et al., 2002, Acta Astron., 52, 129
Woźniak, P. R. et al., 2004, AJ, 127, 2436
Yanny, B., Newberg, H. J., Kent, S. et al., 2000, ApJ, 540, 825
York, D. G., Adelman, J., Anderson, S. et al., 2000, AJ, 120, 1579
Żebruń, K., Soszyński, I., Woźniak, P. R. et al., 2002, Acta Astron., 51, 317

This preprint was prepared with the AAS LaTeX macros v5.2.

Table 1.
The distribution of candidate variable sources in the g − r vs. u − g diagram

                                  ---------- g < 19 ----------     --------- g < 20.5 ---------     ---------- g < 22 ----------
Region(a)  Name(b)                % all    % var  var/all  Nvar/N   % all    % var  var/all  Nvar/N   % all    % var  var/all  Nvar/N
                                   (c)      (d)     (e)     (f)      (c)      (d)     (e)     (f)      (c)      (d)     (e)     (f)
I          white dwarfs            0.14     0.59    4.25    3.50     0.24     0.40    1.69    3.34     0.28     0.45    1.64    4.51
II         low-redshift QSOs       0.45    30.88   68.83   56.58     1.90    62.90   33.03   65.10     4.07    70.01   17.22   47.30
III        dM/WD pairs             0.08     0.53    6.54    5.37     0.83     2.08    2.50    4.92     1.21     3.79    3.13    8.61
IV         RR Lyrae stars          1.28    16.81   13.11   10.78     1.33     7.95    5.99   11.81     1.48     6.41    4.33   11.90
V          stellar locus stars    96.27    48.77    0.51    0.42    94.49    25.15    0.27    0.52    91.89    18.33    0.20    0.55
VI         high-redshift QSOs      1.78     2.42    1.36    1.12     1.21     1.52    1.26    2.48     1.07     1.01    0.95    2.60
           total count          411,667    3,384                  662,195   13,051                  748,067   20,553

(a) These regions are defined in the g − r vs. u − g color-color diagram, with their boundaries shown in Fig. 4
(b) An approximate description of the dominant source type found in the region
(c) The fraction of all sources in a magnitude-limited sample found in this color region, with the magnitude limits listed on top
(d) The number of candidate variables from the region, expressed as a fraction of all variable sources
(e) The ratio of values listed in columns (d) and (c)
(f) The number of candidate variables from the region, expressed as a fraction of all sources in that region

Table 2. The fraction of variable main stellar locus stars as a function of the s color

Bin                  % σ(g) > 0.05 (a)   % var (b)   〈σ(g)〉 (c)   Counts (d)
s < −0.02                  3.23             0.36        0.017         46,836
−0.02 < s < 0.02           2.92             0.28        0.017        136,910
0.02 < s < 0.06            4.61             1.18        0.019         29,106
s > 0.06                  11.50             4.10        0.027          4,547

(a) Fraction of sources with σ(g) > 0.05 mag
(b) Fraction of variable sources (selected using σ(g, r) > 0.05 mag and χ²(g, r) > 3)
(c) Median σ(g)
(d) Number of sources in the bin

Table 3.
Approximate locations and properties of detected overdensities

Label(a)   N(b)   〈R.A.〉(c)   〈d〉(d)   〈r〉(e)   〈u − g〉(f)   〈g − r〉(g)   〈γ(g)〉(h)   Nb/N (i)
A            84     330.95       21      17.02       1.14         0.18        -0.50       0.62
B           144     309.47       22      16.76       1.12         0.16        -0.57       0.64
C            54      33.69       25      17.61       1.13         0.20        -0.68       0.29
D             8     347.91       29      18.02       1.14         0.23        -0.50       0.38
E            11     314.06       43      18.84       1.09         0.20        -0.41       0.75
F            11     330.26       48      19.16       1.07         0.20        -0.46       0.38
G            10     354.81       55      19.46       1.10         0.22        -0.69       0.38
H             7      43.57       57      19.32       1.05         0.04         0.06       1.34
I             4     311.34       72      19.98       1.08         0.11        -0.10       2.0
J            26     353.58       81      20.21       1.11         0.20        -0.27       0.58
K             8      28.39       84      20.35       1.10         0.20         0.14       0.44
L             3     339.01       92      20.45       1.06         0.16         0.08       0.67
M             5      39.45      102      20.73       1.07         0.11         0.36       1.67

(a) Overdensity's label from Fig. 13
(b) Number of candidate RR Lyrae in the overdensity
(c) Median Right Ascension
(d) Median distance (in kpc)
(e) Median r band magnitude
(f) Median u − g color
(g) Median g − r color
(h) Median γ(g)
(i) The number ratio of candidate RR Lyrae with g − r < 0.12 and g − r > 0.12

Fig. 1. The dependence of the median root-mean-square (rms) scatter Σ in SDSS ugrz bands on magnitude (symbols). The vertical bars show the rms scatter of Σ in each bin (not the error of the median). The dependence of Σ in the i band is similar to the r band dependence. In each band, a fourth-degree polynomial is fitted through the medians and shown by the solid line.

Fig. 2. Top: The cumulative distribution of χ²(g) and χ²(r) values for all sources (solid line) and a reference Gaussian χ² distribution with 9 degrees of freedom (dashed line). Vertical dashed lines show the adopted selection cuts on χ²(g) and χ²(r) values. Middle: The fraction of σ(g, r) > 0.05 mag sources with χ² per degree of freedom greater than χ² (only in the g or r band: solid line, both in the g and r bands: dashed line). Bottom: The fraction of σ(g, r) > 0.05 mag sources with χ²(m) > 2 (dashed line) or χ²(m) > 3 (solid line) as a function of magnitude for m = g, r bands, respectively.

Fig.
3. The distribution of counts for the full sample (top), the candidate variable sample (middle), and the ratio of the two counts (bottom) in the g − r vs. u − g color-color diagram for sources brighter than g = 20.5, binned in 0.05 mag bins. Contours outline distributions of unbinned counts. Note the remarkable difference between the distribution of all sources and that of the variable sample, which demonstrates that the latter are robustly selected.

Fig. 4. The distribution of 18,329 candidate variable sources brighter than g = 21 in representative SDSS color-magnitude and color-color diagrams. Candidate variables are color-coded by their rms scatter in the g band (0.05-0.2, see the legend; red if greater than or equal to 0.2). Only sources brighter than g = 20 are plotted in color-color diagrams. Note how RR Lyrae stars (u − g ∼ 1.15, red dots, σ(g) ≳ 0.2 mag) and low-redshift quasars (u − g ≤ 0.7, green dots, σ(g) ≳ 0.1 mag) stand out as highly variable sources. The regions marked in the top right panel are used for quantitative comparison of the overall and variable source distributions (see Table 1).

Fig. 5. The distribution of the rms scatter σ(u) (top left), rms scatter σ(g) (top right), σ(g)/σ(r) ratio (bottom left), and γ(g) (bottom right) for the variable sample in the g − r vs. u − g color-color diagram. Sources are binned in 0.05 mag wide bins and the median values are color-coded. Color ranges are given at the top of each panel, going from blue to red, where the green color is in the mid-range. Values outside the range saturate in blue or red. Contours outline the count distributions on a linear scale in steps of 15%. The flux limit is g < 20.5, with an additional u < 20.5 limit in the top left panel. Bottom left: On average, RR Lyrae stars have σ(g)/σ(r) ∼ 1.4, main stellar locus stars have σ(g)/σ(r) ∼ 1, while low-redshift quasars show a gradient of σ(g)/σ(r) values.
Bottom right: On average, quasars and c type RR Lyrae stars (u − g ∼ 1.15, g − r < 0.15) have γ(g) ∼ 0, ab type RR Lyrae (u − g ∼ 1.15, g − r > 0.15) have negative skewness, and stars in the main stellar locus have positive skewness.

Fig. 6. The distribution of candidate variable sources in the g < 20.5 flux-limited sample is shown by linearly-spaced contours, and by symbols color-coded by the u − g color for sources with σ(g) > 0.05 mag and σ(r) > 0.05 mag. The dotted lines show the adopted σ(g, r) selection cut. The thick solid line shows σ(g) = σ(r), while the dashed line shows the σ(g) = 1.4σ(r) relation representative of RR Lyrae stars. Note that sources following the σ(g) = 1.4σ(r) relation tend to have u − g ∼ 1, as expected for RR Lyrae stars. The greyscale background shows the fraction of χ²(g, r) > 3 sources which also have σ(g) > x and σ(r) > y, and demonstrates that large χ² sources also have large σ.

Fig. 7. The dependence of σ(g)/σ(r) (top), g − r (middle), and σ(g) (bottom) on redshift for a sample of spectroscopically confirmed unresolved quasars from Schneider et al. (2005). The σ(g)/σ(r) gradient shown in Fig. 5 (bottom left panel) can be explained by the local maximum of σ(g)/σ(r) in the 1.0 to 1.6 redshift range.

Fig. 8. The lightcurve skewness distribution in the ugi bands for spectroscopically confirmed unresolved quasars (dotted line), candidate RR Lyrae stars (dashed line), and variable main stellar locus stars (solid line, Region V, see Fig. 4 for the definition). The distribution of the skewness in the r band is similar to the g band distribution, and the distribution of skewness in the z band is similar to the u band distribution (therefore the r and z data are not shown). Stars in the main stellar locus show bimodality in γ(g), suggesting at least two, and perhaps more, different populations of variables.
Similar bimodality is also discernible in the r band, while it is less pronounced in the i band and not detected in the u and z bands. Quasars have symmetric lightcurves (γ ∼ 0) and their distribution of skewness does not change between bands.

Fig. 9. A comparison of the u − g (left) and g − r (right) color distributions for variable main stellar locus stars brighter than g = 19 (dashed lines), and a subset with highly asymmetric lightcurves (γ(g) > 2.5, solid lines). The subset with highly asymmetric lightcurves has an increased fraction of stars with colors u − g ∼ 2.5 and g − r ∼ 1.5, characteristic of M stars.

Fig. 10. Top left: The distribution of 846 candidate variable sources from the RR Lyrae region (dashed lines, see Fig. 3 in Ivezić et al. 2005) in the g − r vs. u − g color-color diagram. The symbols mark the time-averaged values and are color-coded by σ(g) (0.05 to 0.2, blue to red). The dotted horizontal line shows the boundary between the RRab and RRc-dominated regions. Top right: Sources from the top left panel divided into 3 groups according to their σ(g)/σ(r) values: candidate RR Lyrae stars with 1 < σ(g)/σ(r) ≤ 2.5 (large dots), sources with σ(g)/σ(r) ≤ 1 (triangles), and sources with σ(g)/σ(r) > 2.5 (squares). Small dots show sources with RR Lyrae colors that fail the variability criteria. The dashed lines show the σ(g) = σ(r) and σ(g) = 2.5σ(r) relations, while the dotted line shows the σ(g) = 1.4σ(r) relation. Bottom left: A comparison of the u − g color distributions for candidate RR Lyrae stars (solid line) and sources with RR Lyrae colors but not tagged as RR Lyrae stars (dashed line). Bottom right: The dependence of γ(g) on the g − r color for candidate RR Lyrae stars.
The boundary g − r = 0.12 (vertical dotted line) separates candidate RR Lyrae stars into those with asymmetric (γ(g) ∼ −0.5) and symmetric (γ(g) ∼ 0) lightcurves, corresponding to RRab and RRc stars, respectively. The condition γ(g) ≤ 1 (horizontal dashed line) is used to reduce the contamination of the RR Lyrae sample by eclipsing variables.

Fig. 11. The distribution of candidate RR Lyrae stars selected with 1 < σ(g)/σ(r) ≤ 2.5 and classified by De Lee et al. (2007), shown in diagrams similar to Fig. 10. Symbols show RRab stars (red dots), RRc stars (blue dots), variable non-RR Lyrae stars (green dots), and non-variable sources (open squares, only four sources). A comparison of the u − g color distribution for RRab (solid line), RRc (dashed line), and variable non-RR Lyrae stars (dotted line) is shown in the bottom left panel.

Fig. 12. The magnitude-position distribution of 634 Stripe 82 RR Lyrae candidates within −55° < R.A. < 60° and |Dec| ≤ 1.27°. Approximate distance (shown on the right y-axis) is calculated assuming M_r = 0.7 mag for RR Lyrae stars. Dashed lines show where sample completeness decreases from approximately 99% to 60% due to the χ² cut (see the bottom right panel in Fig. 2). Closed curves are remapped ellipses and circles from Fig. 13 that mark halo substructure.

Fig. 13. Left: The spatial distribution of candidate RR Lyrae stars discovered by SDSS along the Celestial Equator. Distance is calculated assuming Eq. 3 from Ivezić et al. (2005) and M_V = 0.7 mag as the absolute magnitude of RR Lyrae in the V band. The right wedge corresponds to candidate RR Lyrae selected in this work (634 candidates, shown in Fig. 12) and the left wedge is based on the sample from Ivezić et al. (2000) (296 candidates).
Right: The number density distribution of candidate RR Lyrae stars shown in the left panel, computed using an adaptive Bayesian density estimator developed by Ivezić et al. (2005). The color scheme represents the number density multiplied by the cube of the galactocentric radius, and displayed on a logarithmic scale with a dynamic range of 300 (from light blue to red). The green color corresponds to the mean density: both wedges with the data would have this color if the halo number density distribution followed a perfectly smooth r⁻³ power law. The purple color marks the regions with no data. The yellow regions are formally ∼3σ significant overdensities, and orange/red regions have an even higher significance (using only the counts variance). The strongest clump in the left wedge belongs to the Sgr dwarf tidal stream, as does the clump marked by C in the right wedge (Ivezić et al. 2003a). The approximate locations and properties of the labeled overdensities are listed in Table 3. The Ivezić et al. (2000) sample is based on only 2 epochs and thus has a much lower completeness (∼56%), resulting in a lower density contrast.

Fig. 14. The fraction of spectroscopically confirmed unresolved QSOs (f_QSO, solid line) and the fraction of sources from the stellar locus (f_loc, dashed line) brighter than g = 19.5 and r = 19.5 that have rms scatter larger than σ in the g and r bands. The ratio f_QSO/(1 + f_loc) (dotted line), which corresponds to the implied fraction of variable QSOs, peaks at a level of 90% for σ = 0.03 mag.
Introduction
Overview of the SDSS Imaging and Stripe 82 Data
Analysis of Stripe 82 Catalog of Variable Sources
Methods and Selection Criteria
The Counts of Variable Sources
Completeness
Efficiency
The Properties of Variable Sources
Skewness as a Proxy for Dominant Variability Mechanism
The Milky Way Halo Structure Traced by Candidate RR Lyrae Stars
Criteria for Selecting RR Lyrae Stars
Completeness and efficiency
The Spatial Distribution of Candidate RR Lyrae Stars
Are All Quasars Variable?
Implications for Surveys such as LSST
Conclusions and Discussion

ABSTRACT

We quantify the variability of faint unresolved optical sources using a catalog based on multiple SDSS imaging observations. The catalog covers SDSS Stripe 82, and contains 58 million photometric observations in the SDSS ugriz system for 1.4 million unresolved sources. In each photometric bandpass we compute various low-order lightcurve statistics and use them to select and study variable sources. We find that 2% of unresolved optical sources brighter than g=20.5 appear variable at the 0.05 mag level (rms) simultaneously in the g and r bands. The majority (2/3) of these variable sources are low-redshift (<2) quasars, although they represent only 2% of all sources in the adopted flux-limited sample. We find that at least 90% of quasars are variable at the 0.03 mag level (rms) and confirm that variability is as good a method for finding low-redshift quasars as is the UV excess color selection (at high Galactic latitudes). We analyze the distribution of lightcurve skewness for quasars and find that it is centered on zero. We find that about 1/4 of the variable stars are RR Lyrae stars, and that only 0.5% of stars from the main stellar locus are variable at the 0.05 mag level. The distribution of lightcurve skewness in the g-r vs. u-g color-color diagram on the main stellar locus is found to be bimodal (with one mode consistent with Algol-like behavior).
Using over six hundred RR Lyrae stars, we demonstrate rich halo substructure out to distances of 100 kpc. We extrapolate these results to expected performance by the Large Synoptic Survey Telescope and estimate that it will obtain well-sampled 2% accurate, multi-color lightcurves for ~2 million low-redshift quasars, and will discover at least 50 million variable stars.
<|endoftext|><|startoftext|>
Introduction

The theory of time scales is a relatively new area that unifies and generalizes difference and differential equations [5]. It was initiated by Stefan Hilger in the 1990s [7, 8], and is now a subject of active research in many different fields in which dynamic processes can be described with discrete or continuous models [1]. The calculus of variations on time scales was introduced by Bohner [4] and by Hilscher and Zeidan [9], and appears to have many opportunities for application in economics [2]. In all those works, necessary optimality conditions are only obtained for the basic (simplest) problem of the calculus of variations on time scales: in [2, 4] for the basic problem with fixed endpoints, in [9] for the basic problem with general (jointly varying) endpoints.

∗This work is part of the first author’s PhD project. http://arxiv.org/abs/0704.0656v1

Having in mind the classical setting (the situation when the time scale $\mathbb{T}$ is either $\mathbb{R}$ or $\mathbb{Z}$; see e.g. [6, 14] and [10, 11], respectively), one suspects that the Euler-Lagrange equations in [2, 4, 9] are easily generalized for problems with higher-order delta derivatives. This is not exactly the case, even beginning with the formulation of the problem. The basic problem of the calculus of variations on time scales is defined (cf.
[4, 9], see §2 below for the meaning of the $\Delta$-derivative and $\Delta$-integral) as
\[ \mathcal{L}[y(\cdot)] = \int_a^b L(t, y^\sigma(t), y^\Delta(t))\,\Delta t \longrightarrow \min, \quad (y(a) = y_a), \quad (y(b) = y_b), \tag{1} \]
with $L : \mathbb{T}\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$, $(y, u) \mapsto L(t, y, u)$ a $C^2$-function for each $t$, and where parentheses around the endpoint conditions mean that the conditions may or may not be present: the case with fixed boundary conditions $y(a) = y_a$ and $y(b) = y_b$ is studied in [4], for admissible functions $y(\cdot)$ belonging to $C^1_{rd}(\mathbb{T};\mathbb{R}^n)$ (rd-continuously $\Delta$-differentiable functions); general boundary conditions of the type $f(y(a), y(b)) = 0$, which include the case $y(a)$ or $y(b)$ free, and over admissible functions in the wider class $C^1_{prd}(\mathbb{T};\mathbb{R}^n)$ (piecewise rd-continuously $\Delta$-differentiable functions), are considered in [9].

One question immediately comes to mind: why is the basic problem on time scales defined as (1) and not as
\[ \mathcal{L}[y(\cdot)] = \int_a^b L(t, y(t), y^\Delta(t))\,\Delta t \longrightarrow \min, \quad (y(a) = y_a), \quad (y(b) = y_b)\,? \tag{2} \]
The answer is simple: compared with (2), definition (1) simplifies the Euler-Lagrange equation, in the sense that it makes it similar to the classical one. The reader is invited to compare the Euler-Lagrange condition (6) of problem (1) and the Euler-Lagrange condition (13) of problem (2) with the classical expression (on the time scale $\mathbb{T} = \mathbb{R}$):
\[ \frac{d}{dt} L_{y'}(t, y_*(t), y_*'(t)) = L_y(t, y_*(t), y_*'(t)), \quad t \in [a, b]. \]
It turns out that problems (1) and (2) are equivalent: since we assume $y(\cdot)$ to be $\Delta$-differentiable, $y(t) = y^\sigma(t) - \mu(t) y^\Delta(t)$, and so (i) any problem (1) can be written in the form (2), and (ii) any problem (2) can be written in the form (1). We claim, however, that the formulation (2) we are promoting here is more natural and convenient. An advantage of our formulation (2) with respect to (1) is that it makes clear how to generalize the basic problem on time scales to the case of a Lagrangian $L$ containing delta derivatives of $y(\cdot)$ up to an order $r$, $r \ge 1$.
The higher-order problem will be naturally defined as
\[ \mathcal{L}[y(\cdot)] = \int_a^{\rho^{r-1}(b)} L(t, y(t), y^{\Delta}(t), \ldots, y^{\Delta^r}(t))\,\Delta t \longrightarrow \min, \]
\[ y(a) = y_a^0, \quad y(\rho^{r-1}(b)) = y_b^0, \quad \ldots, \quad y^{\Delta^{r-1}}(a) = y_a^{r-1}, \quad y^{\Delta^{r-1}}(\rho^{r-1}(b)) = y_b^{r-1}, \tag{3} \]
where $y^{\Delta^i}(t) \in \mathbb{R}^n$, $i \in \{0, \ldots, r\}$, $y^{\Delta^0} = y$, and $n, r \in \mathbb{N}$ (assumptions on the data of the problem will be specified later, in Section 3). One of the new results in this paper is a necessary optimality condition in delta integral form for problem (3) (Theorem 4). It is obtained using the interplay of problems (1) and (2) in order to deal with more general optimal control problems (16).

The paper is organized as follows. In Section 2 we give a brief introduction to time scales and recall the main results of the calculus of variations in this general setting. Our contributions are found in Section 3. We start in §3.1 by proving the Euler-Lagrange equation and transversality conditions (natural boundary conditions, i.e., $y(a)$ and/or $y(b)$ free) for the basic problem (2) (Theorem 2). As a corollary, the Euler-Lagrange equation in [4] and [9] for (1) is obtained. Regarding the natural boundary conditions, the one which appears when $y(a)$ is free turns out to be simpler and closer in aspect to the classical condition $L_{y'}(a, y_*(a), y_*'(a)) = 0$ for problem (1) than for (2) (compare condition (9) for problem (2) with the corresponding condition (14) for problem (1)); but the opposite happens when $y(b)$ is free (compare condition (15) for problem (1) with the corresponding condition (10) for (2)), this last being simpler and closer in aspect to the classical expression $L_{y'}(b, y_*(b), y_*'(b)) = 0$ valid on the time scale $\mathbb{T} = \mathbb{R}$. In §3.2 we formulate a more general optimal control problem (16) on time scales, proving the respective necessary optimality conditions in Hamiltonian form (Theorem 3).
As corollaries, we obtain a Lagrange multiplier rule on time scales (Corollary 2), and in §3.3 the Euler-Lagrange equation for the problem of the calculus of variations with higher-order delta derivatives (Theorem 4). Finally, as an illustrative example, we consider in §4 a discrete time scale and obtain the well-known Euler-Lagrange equation in delta differentiated form.

All the results obtained in this paper can be extended: (i) to nabla derivatives (see [5, §8.4]) with the appropriate modifications and as done in [2] for the simplest functional; (ii) to more general classes of admissible functions and to problems with more general boundary conditions, as done in [9] for the simplest functional of the calculus of variations on time scales.

2 Time scales and previous results

We begin by recalling the main definitions and properties of time scales (cf. [1, 5, 7, 8] and references therein). A nonempty closed subset of $\mathbb{R}$ is called a time scale and is denoted by $\mathbb{T}$. The forward jump operator $\sigma : \mathbb{T} \to \mathbb{T}$ is defined by
\[ \sigma(t) = \inf\{s \in \mathbb{T} : s > t\}, \quad \text{for all } t \in \mathbb{T}, \]
while the backward jump operator $\rho : \mathbb{T} \to \mathbb{T}$ is defined by
\[ \rho(t) = \sup\{s \in \mathbb{T} : s < t\}, \quad \text{for all } t \in \mathbb{T}, \]
with $\inf\emptyset = \sup\mathbb{T}$ (i.e., $\sigma(M) = M$ if $\mathbb{T}$ has a maximum $M$) and $\sup\emptyset = \inf\mathbb{T}$ (i.e., $\rho(m) = m$ if $\mathbb{T}$ has a minimum $m$). A point $t \in \mathbb{T}$ is called right-dense, right-scattered, left-dense and left-scattered if $\sigma(t) = t$, $\sigma(t) > t$, $\rho(t) = t$ and $\rho(t) < t$, respectively.

Throughout the text we let $\mathbb{T} = [a, b] \cap \mathbb{T}_0$ with $a < b$ and $\mathbb{T}_0$ a time scale. We define $\mathbb{T}^k = \mathbb{T}\setminus(\rho(b), b]$, $\mathbb{T}^{k^2} = (\mathbb{T}^k)^k$ and, more generally, $\mathbb{T}^{k^n} = (\mathbb{T}^{k^{n-1}})^k$ for $n \in \mathbb{N}$. The following standard notation is used for $\sigma$ (and $\rho$): $\sigma^0(t) = t$, $\sigma^n(t) = (\sigma \circ \sigma^{n-1})(t)$, $n \in \mathbb{N}$. The graininess function $\mu : \mathbb{T} \to [0,\infty)$ is defined by
\[ \mu(t) = \sigma(t) - t, \quad \text{for all } t \in \mathbb{T}. \]
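The jump operators and the graininess function are easy to experiment with on a finite time scale. The following is only a minimal sketch, assuming the time scale is stored as a sorted Python list `ts`; the function names (`forward_jump`, `backward_jump`, `graininess`, `classify`) and the sample time scale are illustrative choices, not notation from the paper.

```python
from bisect import bisect_left, bisect_right

def forward_jump(ts, t):
    """sigma(t) = inf{s in T : s > t}, with sigma(max T) = max T."""
    i = bisect_right(ts, t)
    return ts[i] if i < len(ts) else ts[-1]

def backward_jump(ts, t):
    """rho(t) = sup{s in T : s < t}, with rho(min T) = min T."""
    i = bisect_left(ts, t)
    return ts[i - 1] if i > 0 else ts[0]

def graininess(ts, t):
    """mu(t) = sigma(t) - t."""
    return forward_jump(ts, t) - t

def classify(ts, t):
    """Classify t by the definitions above (sigma(t) = t means right-dense)."""
    right = "right-dense" if forward_jump(ts, t) == t else "right-scattered"
    left = "left-dense" if backward_jump(ts, t) == t else "left-scattered"
    return right, left

# Hypothetical finite time scale; any sorted list of distinct reals works.
time_scale = [0, 1, 3, 7]
for t in time_scale:
    print(t, forward_jump(time_scale, t), backward_jump(time_scale, t),
          graininess(time_scale, t), classify(time_scale, t))
```

Note that, with the conventions $\sigma(M) = M$ and $\rho(m) = m$, the maximum of a finite time scale is classified as right-dense and the minimum as left-dense, exactly as the definitions prescribe.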
We say that a function $f : \mathbb{T} \to \mathbb{R}$ is delta differentiable at $t \in \mathbb{T}^k$ if there is a number $f^\Delta(t)$ such that for all $\varepsilon > 0$ there exists a neighborhood $U$ of $t$ (i.e., $U = (t-\delta, t+\delta) \cap \mathbb{T}$ for some $\delta > 0$) such that
\[ |f(\sigma(t)) - f(s) - f^\Delta(t)(\sigma(t) - s)| \le \varepsilon|\sigma(t) - s|, \quad \text{for all } s \in U. \]
We call $f^\Delta(t)$ the delta derivative of $f$ at $t$. Now, we define the $r$th delta derivative ($r \in \mathbb{N}$) of $f$ to be the function $f^{\Delta^r} : \mathbb{T}^{k^r} \to \mathbb{R}$, provided $f^{\Delta^{r-1}}$ is delta differentiable on $\mathbb{T}^{k^r}$.

For delta differentiable $f$ and $g$, the next formulas hold:
\[ f^\sigma(t) = f(t) + \mu(t) f^\Delta(t), \tag{4} \]
\[ (fg)^\Delta(t) = f^\Delta(t) g^\sigma(t) + f(t) g^\Delta(t) = f^\Delta(t) g(t) + f^\sigma(t) g^\Delta(t), \]
where we abbreviate $f \circ \sigma$ by $f^\sigma$.

Next, a function $f : \mathbb{T} \to \mathbb{R}$ is called rd-continuous if it is continuous at right-dense points and if its left-sided limit exists at left-dense points. We denote the set of all rd-continuous functions by $C_{rd}$ or $C_{rd}[\mathbb{T}]$ and the set of all delta differentiable functions with rd-continuous derivative by $C^1_{rd}$ or $C^1_{rd}[\mathbb{T}]$. It is known that rd-continuous functions possess an antiderivative, i.e., there exists a function $F$ with $F^\Delta = f$, and in this case an integral is defined by $\int_a^b f(t)\,\Delta t = F(b) - F(a)$. It satisfies
\[ \int_t^{\sigma(t)} f(\tau)\,\Delta\tau = \mu(t) f(t). \tag{5} \]
We now present some useful properties of the delta integral:

Lemma 1. If $a, b \in \mathbb{T}$ and $f, g \in C_{rd}$, then
1. $\int_a^b f(\sigma(t)) g^\Delta(t)\,\Delta t = [(fg)(t)]_{t=a}^{t=b} - \int_a^b f^\Delta(t) g(t)\,\Delta t$;
2. $\int_a^b f(t) g^\Delta(t)\,\Delta t = [(fg)(t)]_{t=a}^{t=b} - \int_a^b f^\Delta(t) g(\sigma(t))\,\Delta t$.

The main result of the calculus of variations on time scales is given by the following necessary optimality condition for problem (1).

Theorem 1 ([4]). If $y_*$ is a weak local minimizer (cf. §3) of the problem
\[ \mathcal{L}[y(\cdot)] = \int_a^b L(t, y^\sigma(t), y^\Delta(t))\,\Delta t \longrightarrow \min, \quad y(\cdot) \in C^1_{rd}[\mathbb{T}], \quad y(a) = y_a, \quad y(b) = y_b, \]
then the Euler-Lagrange equation
\[ L^\Delta_{y^\Delta}(t, y_*^\sigma(t), y_*^\Delta(t)) = L_{y^\sigma}(t, y_*^\sigma(t), y_*^\Delta(t)), \quad t \in \mathbb{T}^{k^2}, \tag{6} \]
holds.

The main ingredients in the proof of Theorem 1 are item 1 of Lemma 1 and the Dubois-Reymond lemma:

Lemma 2 ([4]). Let $g \in C_{rd}$, $g : [a, b]^k \to \mathbb{R}^n$. Then,
\[ \int_a^b g(t) \cdot \eta^\Delta(t)\,\Delta t = 0 \quad \text{for all } \eta \in C^1_{rd} \text{ with } \eta(a) = \eta(b) = 0 \]
if and only if $g(t) = c$ on $[a, b]^k$ for some $c \in \mathbb{R}^n$.
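On a finite time scale every point except the maximum is right-scattered, so the delta derivative reduces to a difference quotient and the delta integral to a weighted sum. The sketch below, under that assumption, checks formula (4) and item 1 of Lemma 1 with exact rational arithmetic. The helper names and the sample functions `f` and `g` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def sigma(ts, t):
    """Forward jump on a finite time scale (sigma(max T) = max T)."""
    later = [s for s in ts if s > t]
    return min(later) if later else t

def mu(ts, t):
    """Graininess mu(t) = sigma(t) - t."""
    return sigma(ts, t) - t

def delta_derivative(ts, f, t):
    """f^Delta(t) = (f(sigma(t)) - f(t)) / mu(t) at a right-scattered point t."""
    s = sigma(ts, t)
    if s == t:
        raise ValueError("t must be right-scattered in this finite sketch")
    return Fraction(f(s) - f(t), s - t)

def delta_integral(ts, f, a, b):
    """Delta integral over [a, b): sum of mu(t) * f(t), cf. formula (5)."""
    return sum(mu(ts, t) * f(t) for t in ts if a <= t < b)

ts = [0, 1, 3, 4, 7]          # hypothetical finite time scale
a, b = ts[0], ts[-1]
f = lambda t: t * t           # illustrative integer-valued functions
g = lambda t: 3 * t + 1

# Item 1 of Lemma 1: int f(sigma) g^Delta = [fg]_a^b - int f^Delta g
lhs = delta_integral(ts, lambda t: f(sigma(ts, t)) * delta_derivative(ts, g, t), a, b)
rhs = f(b) * g(b) - f(a) * g(a) - delta_integral(
    ts, lambda t: delta_derivative(ts, f, t) * g(t), a, b)
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared with `==` rather than a floating-point tolerance.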
3 Main results

Assume that the Lagrangian $L(t, u_0(t), u_1(t), \ldots, u_r(t))$ ($r \ge 1$) is a $C^{r+1}$ function of $(u_0(t), u_1(t), \ldots, u_r(t))$ for each $t \in \mathbb{T}$. Let $y \in C^r_{rd}[\mathbb{T}]$, where
\[ C^r_{rd}[\mathbb{T}] = \left\{ y : \mathbb{T}^{k^r} \to \mathbb{R}^n : y^{\Delta^r} \text{ is rd-continuous on } \mathbb{T}^{k^r} \right\}. \]
We want to minimize the functional $\mathcal{L}$ of problem (3). For this, we say that $y_* \in C^r_{rd}[\mathbb{T}]$ is a weak local minimizer for the variational problem (3) provided there exists $\delta > 0$ such that $\mathcal{L}[y_*] \le \mathcal{L}[y]$ for all $y \in C^r_{rd}[\mathbb{T}]$ satisfying the constraints in (3) and $\|y - y_*\|_{r,\infty} < \delta$, where
\[ \|y\|_{r,\infty} := \sum_{i=0}^{r} \|y^{\Delta^i}\|_\infty, \]
with $y^{\Delta^0} = y$ and $\|y\|_\infty := \sup_{t \in \mathbb{T}^{k^r}} |y(t)|$.

3.1 The basic problem on time scales

We start by proving the necessary optimality condition for the simplest variational problem ($r = 1$):
\[ \mathcal{L}[y(\cdot)] = \int_a^b L(t, y(t), y^\Delta(t))\,\Delta t \longrightarrow \min, \quad y(\cdot) \in C^1_{rd}[\mathbb{T}], \quad (y(a) = y_a), \quad (y(b) = y_b). \tag{7} \]

Remark 1. We are assuming in problem (7) that the time scale $\mathbb{T}$ has at least 3 points. Indeed, for the delta integral to be defined we need at least 2 points. Assume that the time scale has only two points: $\mathbb{T} = \{a, b\}$, with $b = \sigma(a)$. Then, $\int_a^{\sigma(a)} L(t, y(t), y^\Delta(t))\,\Delta t = \mu(a) L(a, y(a), y^\Delta(a))$. In the case both $y(a)$ and $y(\sigma(a))$ are fixed, since $y^\Delta(a) = \frac{y(\sigma(a)) - y(a)}{\mu(a)}$, $\mathcal{L}[y(\cdot)]$ would be a constant for every admissible function $y(\cdot)$ (there would be nothing to minimize and problem (7) would be trivial). Similarly, for (3) we assume the time scale to have at least $2r + 1$ points (see Remark 15).

Theorem 2. If $y_*$ is a weak local minimizer of (7) (problem (3) with $r = 1$), then the Euler-Lagrange equation in $\Delta$-integral form
\[ L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) = \int_a^{\sigma(t)} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi + c \tag{8} \]
holds for all $t \in \mathbb{T}^k$ and some $c \in \mathbb{R}^n$. Moreover, if the initial condition $y(a) = y_a$ is not present ($y(a)$ is free), then the supplementary condition
\[ L_{y^\Delta}(a, y_*(a), y_*^\Delta(a)) - \mu(a) L_y(a, y_*(a), y_*^\Delta(a)) = 0 \tag{9} \]
holds; if $y(b) = y_b$ is not present ($y(b)$ is free), then
\[ L_{y^\Delta}(\rho(b), y_*(\rho(b)), y_*^\Delta(\rho(b))) = 0. \tag{10} \]

Remark 2.
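For the time scale $\mathbb{T} = \mathbb{R}$ equalities (9) and (10) give, respectively, the well-known natural boundary conditions $L_{y'}(a, y_*(a), y_*'(a)) = 0$ and $L_{y'}(b, y_*(b), y_*'(b)) = 0$.

Proof. Suppose that $y_*(\cdot)$ is a weak local minimizer of $\mathcal{L}[\cdot]$. Let $\eta(\cdot) \in C^1_{rd}$ and define $\Phi : \mathbb{R} \to \mathbb{R}$ by $\Phi(\varepsilon) = \mathcal{L}[y_*(\cdot) + \varepsilon\eta(\cdot)]$. This function has a minimum at $\varepsilon = 0$, so we must have $\Phi'(0) = 0$. Applying the delta-integral properties and the integration by parts formula (second item in Lemma 1), we have
\[ 0 = \Phi'(0) = \int_a^b \left[ L_y(t, y_*(t), y_*^\Delta(t)) \cdot \eta(t) + L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) \cdot \eta^\Delta(t) \right] \Delta t \]
\[ = \left. \int_a^t L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi \cdot \eta(t) \right|_{t=a}^{t=b} - \int_a^b \left[ \int_a^{\sigma(t)} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi \cdot \eta^\Delta(t) - L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) \cdot \eta^\Delta(t) \right] \Delta t. \tag{11} \]
Let us restrict the set of all delta-differentiable functions $\eta(\cdot)$ with rd-continuous derivatives to those which satisfy the condition $\eta(a) = \eta(b) = 0$ (this condition is satisfied by all the admissible variations $\eta(\cdot)$ in the case both $y(a) = y_a$ and $y(b) = y_b$ are fixed). For these functions we have
\[ \int_a^b \left[ L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) - \int_a^{\sigma(t)} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi \right] \cdot \eta^\Delta(t)\,\Delta t = 0. \]
Therefore, by the lemma of Dubois-Reymond (Lemma 2), there exists a constant $c \in \mathbb{R}^n$ such that (8) holds:
\[ L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) - \int_a^{\sigma(t)} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi = c, \tag{12} \]
for all $t \in \mathbb{T}^k$. Because of (12), condition (11) simplifies to
\[ \left. \int_a^t L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi \cdot \eta(t) \right|_{t=a}^{t=b} + \left. c \cdot \eta(t) \right|_{t=a}^{t=b} = 0, \]
for any admissible $\eta(\cdot)$. If $y(a) = y_a$ is not present in problem (7) (so that $\eta(a)$ need not be zero), taking $\eta(t) = t - b$ we find that $c = 0$; if $y(b) = y_b$ is not present, taking $\eta(t) = t - a$ we find that $\int_a^b L_y(t, y_*(t), y_*^\Delta(t))\,\Delta t + c = 0$. Applying these two conditions to (12) and having in mind formula (5), we may state that
\[ L_{y^\Delta}(a, y_*(a), y_*^\Delta(a)) - \int_a^{\sigma(a)} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi = 0 \iff L_{y^\Delta}(a, y_*(a), y_*^\Delta(a)) - \mu(a) L_y(a, y_*(a), y_*^\Delta(a)) = 0, \]
and (note that $\sigma(\rho(b)) = b$)
\[ L_{y^\Delta}(\rho(b), y_*(\rho(b)), y_*^\Delta(\rho(b))) - \int_a^b L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi - c = 0 \iff L_{y^\Delta}(\rho(b), y_*(\rho(b)), y_*^\Delta(\rho(b))) = 0. \]

Remark 3.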
Since $\sigma(t) \ge t$ for all $t \in \mathbb{T}$, we must have
\[ L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) - \int_a^{\sigma(t)} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi = c \iff L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) - \mu(t) L_y(t, y_*(t), y_*^\Delta(t)) = \int_a^{t} L_y(\xi, y_*(\xi), y_*^\Delta(\xi))\,\Delta\xi + c, \]
by formula (5). Delta differentiating both sides, we obtain
\[ \left[ L_{y^\Delta}(t, y_*(t), y_*^\Delta(t)) - \mu(t) L_y(t, y_*(t), y_*^\Delta(t)) \right]^\Delta = L_y(t, y_*(t), y_*^\Delta(t)), \quad t \in \mathbb{T}^{k^2}. \tag{13} \]
Note that we cannot expand the left-hand side of this last equation, because we are not assuming that $\mu(t)$ is delta differentiable. In fact, generally $\mu(t)$ is not delta differentiable (see Example 1.55, page 21 of [5]). We say that (13) is the Euler-Lagrange equation for problem (7) in the delta differentiated form.

As mentioned in the introduction, the formulations of the problems of the calculus of variations on time scales with "$(t, y^\sigma(t), y^\Delta(t))$" and with "$(t, y(t), y^\Delta(t))$" are equivalent. It is trivial to derive the previous Euler-Lagrange equation (6) from our equation (13), and the other way around (one can derive (13) directly from (6)).

Corollary 1. If $y_* \in C^1_{rd}[\mathbb{T}]$ is a weak local minimizer of
\[ \mathcal{L}[y(\cdot)] = \int_a^b L(t, y^\sigma(t), y^\Delta(t))\,\Delta t, \quad (y(a) = y_a), \quad (y(b) = y_b), \]
then the Euler-Lagrange equation (6) holds. If $y(a)$ is free, then the extra transversality condition (natural boundary condition)
\[ L_{y^\Delta}(a, y_*^\sigma(a), y_*^\Delta(a)) = 0 \tag{14} \]
holds; if $y(b)$ is free, then
\[ L_{y^\sigma}(\rho(b), y_*^\sigma(\rho(b)), y_*^\Delta(\rho(b)))\,\mu(\rho(b)) + L_{y^\Delta}(\rho(b), y_*^\sigma(\rho(b)), y_*^\Delta(\rho(b))) = 0. \tag{15} \]

Proof. Since $y(t)$ is delta differentiable, (4) holds. This permits us to write
\[ L(t, y^\sigma(t), y^\Delta(t)) = L(t, y(t) + \mu(t) y^\Delta(t), y^\Delta(t)) = F(t, y(t), y^\Delta(t)). \]
Applying equation (13) to the functional $F$ we obtain
\[ \left[ F_{y^\Delta}(t, y(t), y^\Delta(t)) - \mu(t) F_y(t, y(t), y^\Delta(t)) \right]^\Delta = F_y(t, y(t), y^\Delta(t)). \]
Since
\[ F_y(t, y(t), y^\Delta(t)) = L_{y^\sigma}(t, y^\sigma(t), y^\Delta(t)), \qquad F_{y^\Delta}(t, y(t), y^\Delta(t)) = L_{y^\sigma}(t, y^\sigma(t), y^\Delta(t))\,\mu(t) + L_{y^\Delta}(t, y^\sigma(t), y^\Delta(t)), \]
the result follows.
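To see the integral form (8) and the differentiated form (13) at work, one can take the discrete time scale $\mathbb{T} = \{0, 1, \ldots, N\}$, where $\mu \equiv 1$. The sketch below assumes the illustrative Lagrangian $L(t, y, v) = \tfrac12 v^2 + \tfrac12 y^2$, which is not an example from the paper. For it, (13) becomes the recurrence $y(t+2) = 3y(t+1) - y(t)$. The code builds a sequence from that recurrence, checks that $L_{y^\Delta} - \int_a^{\sigma(t)} L_y\,\Delta\xi$ is constant along it, and compares the functional with its value on a perturbed sequence having the same endpoints.

```python
from fractions import Fraction

def functional(y):
    """J[y] = sum over t in [0, N) of L(t, y(t), y^Delta(t)), with mu = 1
    and the assumed Lagrangian L = v**2/2 + y**2/2."""
    half = Fraction(1, 2)
    return sum(half * (y[t + 1] - y[t]) ** 2 + half * y[t] ** 2
               for t in range(len(y) - 1))

def extremal(y0, y1, n):
    """Sequence on {0, ..., n} generated by (13) for this Lagrangian:
    y(t+2) = 3*y(t+1) - y(t)."""
    y = [y0, y1]
    while len(y) <= n:
        y.append(3 * y[-1] - y[-2])
    return y

def integral_form_values(y):
    """Values of L_{y^Delta}(t) - int_0^{sigma(t)} L_y for t in T^k, i.e.
    y^Delta(t) - (y(0) + ... + y(t)); equation (8) says this is a constant c."""
    values, running = [], 0
    for t in range(len(y) - 1):
        running += y[t]
        values.append((y[t + 1] - y[t]) - running)
    return values

y_star = extremal(1, 2, 4)                  # [1, 2, 5, 13, 34]
constants = integral_form_values(y_star)    # the same c at every t

eta = [0, 1, -1, 2, 0]                      # variation vanishing at both endpoints
eps = Fraction(1, 10)
perturbed = [y + eps * e for y, e in zip(y_star, eta)]
gap = functional(perturbed) - functional(y_star)
print(constants, gap)
```

Because this Lagrangian is quadratic, the gap equals $\varepsilon^2$ times a positive quadratic form in $\eta$, so the first-order term vanishes exactly on the extremal.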
3.2 The Lagrange problem on time scales

Now we consider a more general variational problem with delta-differential side conditions:
\[ J[y(\cdot), u(\cdot)] = \int_a^b L(t, y(t), u(t))\,\Delta t \longrightarrow \min, \quad y^\Delta(t) = \varphi(t, y(t), u(t)), \quad (y(a) = y_a), \quad (y(b) = y_b), \tag{16} \]
where $y(\cdot) \in C^1_{rd}[\mathbb{T}]$, $u(\cdot) \in C_{rd}[\mathbb{T}]$, $y(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^m$ for all $t \in \mathbb{T}$, and $m \le n$. We assume $L : \mathbb{T} \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and $\varphi : \mathbb{T} \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ to be $C^1$-functions of $y$ and $u$ for each $t$, and that for each control function $u(\cdot) \in C_{rd}[\mathbb{T};\mathbb{R}^m]$ there exists a corresponding solution $y(\cdot) \in C^1_{rd}[\mathbb{T};\mathbb{R}^n]$ of the $\Delta$-differential equation $y^\Delta(t) = \varphi(t, y(t), u(t))$. We remark that conditions for existence or uniqueness are available for O$\Delta$Es from the very beginning of the theory of time scales (see [8, Theorem 8]). Roughly speaking, forward solutions exist, while existence of backward solutions needs extra assumptions (e.g. regressivity). In control theory, however, one usually needs only forward solutions, so we do not need to impose such extra assumptions [3].

We are interested in finding necessary conditions for a pair $(y_*, u_*)$ to be a weak local minimizer of $J$.

Definition 1. Take an admissible pair $(y_*, u_*)$. We say that $(y_*, u_*)$ is a weak local minimizer for (16) if there exists $\delta > 0$ such that $J[y_*, u_*] \le J[y, u]$ for all admissible pairs $(y, u)$ satisfying $\|y - y_*\|_{1,\infty} + \|u - u_*\|_\infty < \delta$.

Remark 4. Problem (16) is very general and includes: (i) problem (7) (this is the particular case where $m = n$ and $\varphi(t, y, u) = u$), (ii) the problem of the calculus of variations with higher-order delta derivatives (3) (such problems receive special attention in Section 3.3 below), (iii) isoperimetric problems on time scales. Suppose that the isoperimetric condition
\[ I[y(\cdot), u(\cdot)] = \int_a^b g(t, y(t), u(t))\,\Delta t = \beta, \]
$\beta$ a given constant, is prescribed. We can introduce a new state variable $y_{n+1}$ defined by
\[ y_{n+1}(t) = \int_a^t g(\xi, y(\xi), u(\xi))\,\Delta\xi, \quad t \in \mathbb{T}, \]
with boundary conditions $y_{n+1}(a) = 0$ and $y_{n+1}(b) = \beta$.
Then $y_{n+1}^\Delta(t) = g(t, y(t), u(t))$, $t \in \mathbb{T}$, and we can always recast an isoperimetric problem as a Lagrange problem (16).

Establishing necessary optimality conditions for (16) is more complicated than for the basic problem of the calculus of variations on time scales (1) or (2), owing to the possible existence of abnormal extremals (Definition 2). The abnormal case never occurs for the basic problem (Proposition 2).

Theorem 3 (The weak maximum principle on time scales). If $(y_*(\cdot), u_*(\cdot))$ is a weak local minimizer of problem (16), then there exists a set of multipliers $(\psi_{0*}, \psi_*(\cdot)) \ne 0$, where $\psi_{0*}$ is a nonnegative constant and $\psi_*(\cdot) : \mathbb{T} \to \mathbb{R}^n$ is a delta differentiable function on $\mathbb{T}^k$, such that $(y_*(\cdot), u_*(\cdot), \psi_{0*}, \psi_*(\cdot))$ satisfy
\[ y_*^\Delta(t) = H_{\psi^\sigma}(t, y_*(t), u_*(t), \psi_{0*}, \psi_*^\sigma(t)), \quad (\Delta\text{-dynamic equation for } y) \tag{17} \]
\[ \psi_*^\Delta(t) = -H_y(t, y_*(t), u_*(t), \psi_{0*}, \psi_*^\sigma(t)), \quad (\Delta\text{-dynamic equation for } \psi) \tag{18} \]
\[ H_u(t, y_*(t), u_*(t), \psi_{0*}, \psi_*^\sigma(t)) = 0, \quad (\Delta\text{-stationary condition}) \tag{19} \]
for all $t \in \mathbb{T}^k$, where the Hamiltonian function $H$ is defined by
\[ H(t, y, u, \psi_0, \psi^\sigma) = \psi_0 L(t, y, u) + \psi^\sigma \cdot \varphi(t, y, u). \tag{20} \]
If $y(a)$ is free in (16), then
\[ \psi_*(a) = 0; \tag{21} \]
if $y(b)$ is free in (16), then
\[ \psi_*(b) = 0. \tag{22} \]

Remark 5. From the definition (20) of $H$, it follows immediately that (17) holds true for any admissible pair $(y(\cdot), u(\cdot))$ of problem (16). Indeed, condition (17) is nothing more than the control system $y_*^\Delta(t) = \varphi(t, y_*(t), u_*(t))$.

Remark 6. For the time scale $\mathbb{T} = \mathbb{Z}$, (17)-(19) reduce to well-known conditions in discrete time (see e.g. [13, Ch. 8]): the $\Delta$-dynamic equation for $y$ takes the form $y(k+1) - y(k) = H_\psi(k, y(k), u(k), \psi_0, \psi(k+1))$; the $\Delta$-dynamic equation for $\psi$ gives $\psi(k+1) - \psi(k) = -H_y(k, y(k), u(k), \psi_0, \psi(k+1))$; and the $\Delta$-stationary condition reads as $H_u(k, y(k), u(k), \psi_0, \psi(k+1)) = 0$; with the Hamiltonian $H = \psi_0 L(k, y(k), u(k)) + \psi(k+1) \cdot \varphi(k, y(k), u(k))$.
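The discrete-time conditions just listed can be checked on a small example. The sketch below assumes a hypothetical scalar problem on $\mathbb{T} = \{0, \ldots, N\}$ with $L = \tfrac12(y^2 + u^2)$, $\varphi(t, y, u) = u$, $y(0)$ fixed and $y(N)$ free, in the normal case $\psi_0 = 1$. These data are chosen only for illustration and do not come from the paper. Since the conditions are linear here, a backward sweep from $\psi(N) = 0$ followed by a rescaling produces the extremal.

```python
from fractions import Fraction

def solve_extremal(n_steps, y_start):
    """Backward sweep for the assumed problem L = (y^2 + u^2)/2, phi = u,
    psi0 = 1, y(N) free. Starts from y(N) = 1, psi(N) = 0 (condition (22))
    and rescales at the end, which is valid because the system is linear."""
    y, u, psi = [Fraction(1)], [], [Fraction(0)]
    for _ in range(n_steps):
        u_k = -psi[0]          # (19): H_u = u(k) + psi(k+1) = 0
        y_k = y[0] - u_k       # (17): y(k+1) - y(k) = u(k)
        psi_k = psi[0] + y_k   # (18): psi(k+1) - psi(k) = -y(k)
        u.insert(0, u_k)
        y.insert(0, y_k)
        psi.insert(0, psi_k)
    scale = Fraction(y_start) / y[0]
    return ([scale * v for v in y], [scale * v for v in u],
            [scale * v for v in psi])

def cost(y_start, controls):
    """J = sum_k (y(k)^2 + u(k)^2)/2 along y(k+1) = y(k) + u(k)."""
    y, total = Fraction(y_start), Fraction(0)
    for u_k in controls:
        total += (y * y + u_k * u_k) / 2
        y += u_k
    return total

y, u, psi = solve_extremal(3, 1)
best = cost(1, u)
shifted = [u_k + d for u_k, d in zip(u, [Fraction(1, 10), Fraction(-1, 5), Fraction(1, 3)])]
print(y, u, psi, best, cost(1, shifted))
```

For this convex problem the necessary conditions are also sufficient, so any perturbation of the controls increases the cost.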
For $\mathbb{T} = \mathbb{R}$, Theorem 3 is known in the literature as the Hestenes necessary condition, which is a particular case of the Pontryagin Maximum Principle [12].

Corollary 2 (Lagrange multiplier rule on time scales). If $(y_*(\cdot), u_*(\cdot))$ is a weak local minimizer of problem (16), then there exists a collection of multipliers $(\psi_{0*}, \psi_*(\cdot))$, $\psi_{0*}$ a nonnegative constant and $\psi_*(\cdot) : \mathbb{T} \to \mathbb{R}^n$ a delta differentiable function on $\mathbb{T}^k$, not all vanishing, such that $(y_*(\cdot), u_*(\cdot), \psi_{0*}, \psi_*(\cdot))$ satisfy the Euler-Lagrange equation of the augmented functional $J^*$:
\[ J^*[y(\cdot), u(\cdot), \psi(\cdot)] = \int_a^b L^*\bigl(t, y(t), u(t), \psi^\sigma(t), y^\Delta(t)\bigr)\,\Delta t = \int_a^b \left[ \psi_0 L(t, y(t), u(t)) + \psi^\sigma(t) \cdot \bigl( \varphi(t, y(t), u(t)) - y^\Delta(t) \bigr) \right] \Delta t = \int_a^b \left[ H(t, y(t), u(t), \psi_0, \psi^\sigma(t)) - \psi^\sigma(t) \cdot y^\Delta(t) \right] \Delta t. \tag{23} \]

Proof. The Euler-Lagrange equations (13) and (6) applied to (23) give
\[ \left[ L^*_{y^\Delta} - \mu(t) L^*_y \right]^\Delta = L^*_y, \qquad \left[ -\mu(t) L^*_u \right]^\Delta = L^*_u, \qquad L^*_{\psi^\sigma} = 0, \]
that is,
\[ \left[ \psi^\sigma(t) + \mu(t) \cdot H_y \right]^\Delta = -H_y, \tag{24} \]
\[ \left[ -\mu(t) H_u \right]^\Delta = H_u, \tag{25} \]
\[ y^\Delta(t) = H_{\psi^\sigma}, \]
where the partial derivatives of $H$ are evaluated at $(t, y(t), u(t), \psi_0, \psi^\sigma(t))$. Obviously, from (19) we obtain (25). It remains to prove that (18) implies (24) along $(y_*(\cdot), u_*(\cdot), \psi_{0*}, \psi_*(\cdot))$. Indeed, from (18) we can write $\mu(t)\psi^\Delta(t) = -\mu(t)H_y$, which is equivalent to $\psi(t) = \psi^\sigma(t) + \mu(t)H_y$.

Remark 7. Condition (18) or (24) implies that along the minimizer
\[ \psi^\sigma(t) = -\int_a^{\sigma(t)} H_y(\xi, y(\xi), u(\xi), \psi_0, \psi^\sigma(\xi))\,\Delta\xi - c \tag{26} \]
for some $c \in \mathbb{R}^n$.

Remark 8. The assertion in Theorem 3 that the multipliers cannot all be zero is crucial. Indeed, without this requirement, for any admissible pair $(y(\cdot), u(\cdot))$ of (16) there would always exist a set of multipliers satisfying (18)-(19) (namely, $\psi_0 = 0$ and $\psi(t) \equiv 0$).

Remark 9. Throughout this work we consider $\psi$ as a row vector.

Remark 10. If the multipliers $(\psi_0, \psi(\cdot))$ satisfy the conditions of Theorem 3, then $(\gamma\psi_0, \gamma\psi(\cdot))$ also do, for any $\gamma > 0$. This simple observation allows us to conclude that it is enough to consider two cases: $\psi_0 = 0$ or $\psi_0 = 1$.

Definition 2.
An admissible quadruple $(y(\cdot), u(\cdot), \psi_0, \psi(\cdot))$ satisfying conditions (17)-(19) (also (21) or (22) if $y(a)$ or $y(b)$ are, respectively, free) is called an extremal for problem (16). An extremal is said to be normal if $\psi_0 = 1$ and abnormal if $\psi_0 = 0$.

So, Theorem 3 asserts that every minimizer is an extremal.

Proposition 1. The Lagrange problem on time scales (16) has no abnormal extremals (in particular, all the minimizers are normal) when at least one of the boundary conditions $y(a)$ or $y(b)$ is absent (when $y(a)$ or $y(b)$ is free).

Proof. Without loss of generality, let us consider $y(b)$ free. We want to prove that the nonnegative constant $\psi_0$ is nonzero. The fact that $\psi_0 \ne 0$ follows from Theorem 3. Indeed, the multipliers $\psi_0$ and $\psi(t)$ cannot vanish simultaneously at any point $t \in \mathbb{T}$. Since $y(b)$ is free, the solution to the problem must satisfy the condition $\psi(b) = 0$. The condition $\psi(b) = 0$ requires a nonzero value for $\psi_0$ at $t = b$. But since $\psi_0$ is a nonnegative constant, we conclude that $\psi_0$ is positive, and we can normalize it (Remark 10) to unity.

Remark 11. In the general situation abnormal extremals may occur. More precisely (see the proof of Theorem 3), abnormality is characterized by the existence of a nontrivial solution $\psi(t)$ for the system $\psi^\Delta(t) + \psi^\sigma(t) \cdot \varphi_y = 0$.

Proposition 2. There are no abnormal extremals for problem (7), even in the case where $y(a)$ and $y(b)$ are both fixed ($y(a) = y_a$, $y(b) = y_b$).

Proof. Problem (7) is the particular case of (16) with $y^\Delta(t) = u(t)$. If $\psi_0 = 0$, then the Hamiltonian (20) takes the form $H = \psi^\sigma \cdot u$. From Theorem 3, $\psi^\Delta = 0$ and $\psi^\sigma = 0$, for all $t \in \mathbb{T}^k$. Since $\psi^\sigma = \psi + \mu(t)\psi^\Delta$, this means that $\psi_0$ and $\psi$ would both be zero, which is not a possibility.

Corollary 3. For problem (7), Theorem 3 gives Theorem 2.

Proof. For problem (7) we have $\varphi(t, y, u) = u$. From Proposition 2, the Hamiltonian becomes $H(t, y, u, \psi_0, \psi^\sigma) = L(t, y, u) + \psi^\sigma \cdot u$. By the $\Delta$-stationary condition (19) we may write $L_u(t, y, u) + \psi^\sigma = 0$.
Now apply (26) and the result follows.

To prove Theorem 3 we need the following result:

Lemma 3 (Fundamental lemma of the calculus of variations on time scales). Let $g \in C_{rd}$, $g : \mathbb{T}^k \to \mathbb{R}^n$. Then,
\[ \int_a^b g(t) \cdot \eta(t)\,\Delta t = 0 \quad \text{for all } \eta \in C_{rd} \]
if and only if $g(t) = 0$ on $\mathbb{T}^k$.

Proof. If $g(t) = 0$ on $\mathbb{T}^k$, then obviously $\int_a^b g(t) \cdot \eta(t)\,\Delta t = 0$ for all $\eta \in C_{rd}$. Now, suppose (without loss of generality) that $g(t_0) > 0$ for some $t_0 \in \mathbb{T}^k$. We divide the proof into two steps.

Step 1: Assume that $t_0$ is right-scattered. Define in $\mathbb{T}^k$
\[ \eta(t) = \begin{cases} 1 & \text{if } t = t_0; \\ 0 & \text{if } t \ne t_0. \end{cases} \]
Then $\eta$ is rd-continuous and
\[ \int_a^b g(t)\eta(t)\,\Delta t = \int_{t_0}^{\sigma(t_0)} g(t)\eta(t)\,\Delta t = \mu(t_0) g(t_0) > 0, \]
which is a contradiction.

Step 2: Suppose that $t_0$ is right-dense. Since $g$ is rd-continuous, it is continuous at $t_0$. So there exists $\delta > 0$ such that for all $t \in (t_0 - \delta, t_0 + \delta) \cap \mathbb{T}^k$ we have $g(t) > 0$. If $t_0$ is left-dense, define in $\mathbb{T}^k$
\[ \eta(t) = \begin{cases} (t - t_0 + \delta)^2 (t - t_0 - \delta)^2 & \text{if } t \in (t_0 - \delta, t_0 + \delta); \\ 0 & \text{otherwise.} \end{cases} \]
It follows that $\eta$ is rd-continuous and
\[ \int_a^b g(t)\eta(t)\,\Delta t = \int_a^{t_0 - \delta} g(t)\eta(t)\,\Delta t + \int_{t_0 - \delta}^{t_0 + \delta} g(t)\eta(t)\,\Delta t + \int_{t_0 + \delta}^{b} g(t)\eta(t)\,\Delta t > 0, \]
which is a contradiction. If $t_0$ is left-scattered, define in $\mathbb{T}^k$
\[ \eta(t) = \begin{cases} (t - t_0 - \delta)^2 & \text{if } t \in [t_0, t_0 + \tilde\delta); \\ 0 & \text{otherwise,} \end{cases} \]
where $0 < \tilde\delta < \min\{\mu(\rho(t_0)), \delta\}$. Then $\eta$ is rd-continuous and
\[ \int_a^b g(t)\eta(t)\,\Delta t = \int_{t_0}^{t_0 + \tilde\delta} g(t)\eta(t)\,\Delta t > 0, \]
which again leads to a contradiction.

Proof (of Theorem 3). We begin by noting that $u(t) = (u_1(t), \ldots, u_m(t))$ in problem (16), $t \in \mathbb{T}^k$, are arbitrarily specified functions (controls). Once $u(\cdot) \in C_{rd}[\mathbb{T};\mathbb{R}^m]$ is fixed, $y(t) = (y_1(t), \ldots, y_n(t))$, $t \in \mathbb{T}^k$, is determined from the system of delta-differential equations $y^\Delta(t) = \varphi(t, y(t), u(t))$ (and boundary conditions, if present). Since $u(\cdot)$ is an arbitrary function, variations $\omega(\cdot) \in C_{rd}[\mathbb{T};\mathbb{R}^m]$ for $u(\cdot)$ can also be considered arbitrary. This is not true, however, for the variations $\eta(\cdot) \in C^1_{rd}[\mathbb{T};\mathbb{R}^n]$ of $y(\cdot)$. Suppose that $(y_*(\cdot), u_*(\cdot))$ is a weak local minimizer of $J[\cdot, \cdot]$.
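Let $\varepsilon \in (-\delta, \delta)$ be a small real parameter and $y_\varepsilon(t) = y_*(t) + \varepsilon\eta(t)$ (with $\eta(a) = 0$ if $y(a) = y_a$ is given; $\eta(b) = 0$ if $y(b) = y_b$ is given) be the trajectory generated by the control $u_\varepsilon(t) = u_*(t) + \varepsilon\omega(t)$, $\omega(\cdot) \in C_{rd}[\mathbb{T};\mathbb{R}^m]$:
\[ y_\varepsilon^\Delta(t) = \varphi(t, y_\varepsilon(t), u_\varepsilon(t)), \tag{27} \]
$t \in \mathbb{T}^k$, $(y_\varepsilon(a) = y_a)$, $(y_\varepsilon(b) = y_b)$. We define the following function:
\[ \Phi(\varepsilon) = J[y_\varepsilon(\cdot), u_\varepsilon(\cdot)] = J[y_*(\cdot) + \varepsilon\eta(\cdot), u_*(\cdot) + \varepsilon\omega(\cdot)] = \int_a^b L(t, y_*(t) + \varepsilon\eta(t), u_*(t) + \varepsilon\omega(t))\,\Delta t. \]
It follows that $\Phi : (-\delta, \delta) \to \mathbb{R}$ has a minimum for $\varepsilon = 0$, so we must have $\Phi'(0) = 0$. From this condition we can write that
\[ \int_a^b \left[ \psi_0 L_y(t, y_*(t), u_*(t)) \cdot \eta(t) + \psi_0 L_u(t, y_*(t), u_*(t)) \cdot \omega(t) \right] \Delta t = 0 \tag{28} \]
for any real constant $\psi_0$. Differentiating (27) with respect to $\varepsilon$, we get
\[ \eta^\Delta(t) = \varphi_y(t, y_\varepsilon(t), u_\varepsilon(t)) \cdot \eta(t) + \varphi_u(t, y_\varepsilon(t), u_\varepsilon(t)) \cdot \omega(t). \]
In particular, with $\varepsilon = 0$,
\[ \eta^\Delta(t) = \varphi_y(t, y_*(t), u_*(t)) \cdot \eta(t) + \varphi_u(t, y_*(t), u_*(t)) \cdot \omega(t). \tag{29} \]
Let $\psi(\cdot) \in C^1_{rd}[\mathbb{T};\mathbb{R}^n]$ be an as yet unspecified function. Multiplying (29) by $\psi^\sigma(t) = [\psi_1^\sigma(t), \ldots, \psi_n^\sigma(t)]$, and delta-integrating the result with respect to $t$ from $a$ to $b$, we get that
\[ \int_a^b \psi^\sigma(t) \cdot \eta^\Delta(t)\,\Delta t = \int_a^b \left[ \psi^\sigma(t) \cdot \varphi_y \cdot \eta(t) + \psi^\sigma(t) \cdot \varphi_u \cdot \omega(t) \right] \Delta t \tag{30} \]
for any $\psi(\cdot) \in C^1_{rd}[\mathbb{T};\mathbb{R}^n]$. Integrating by parts (see Lemma 1, formula 1),
\[ \int_a^b \psi^\sigma(t) \cdot \eta^\Delta(t)\,\Delta t = \left. \psi(t) \cdot \eta(t) \right|_a^b - \int_a^b \psi^\Delta(t) \cdot \eta(t)\,\Delta t, \tag{31} \]
and we can write from (28), (30) and (31) that
\[ \int_a^b \left[ \left( \psi^\Delta(t) + \psi_0 L_y + \psi^\sigma(t) \cdot \varphi_y \right) \cdot \eta(t) + \left( \psi_0 L_u + \psi^\sigma(t) \cdot \varphi_u \right) \cdot \omega(t) \right] \Delta t - \left. \psi(t) \cdot \eta(t) \right|_a^b = 0 \tag{32} \]
holds for any $\psi(t)$. Using the definition (20) of $H$, we can rewrite (32) as
\[ \int_a^b \left[ \left( \psi^\Delta(t) + H_y \right) \cdot \eta(t) + H_u \cdot \omega(t) \right] \Delta t - \left. \psi(t) \cdot \eta(t) \right|_a^b = 0. \tag{33} \]
It is, however, not yet possible to employ Lemma 3, because the variations $\eta(t)$ are not arbitrary. Now choose $\psi(t) = \psi_*(t)$ so that the coefficient of $\eta(t)$ in (33) vanishes: $\psi_*^\Delta(t) = -H_y$ (and $\psi_*(a) = 0$ if $y(a)$ is free, i.e. $\eta(a) \ne 0$; $\psi_*(b) = 0$ if $y(b)$ is free, i.e. $\eta(b) \ne 0$). In the normal case $\psi_*(t)$ is determined by $(y_*(\cdot), u_*(\cdot))$, and we choose $\psi_{0*} = 1$.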
The abnormal case is characterized by the existence of a non-trivial solution $\psi_*(t)$ for the system $\psi_*^\Delta(t) + \psi_*^\sigma(t) \cdot \varphi_y = 0$: in that case we choose $\psi_{0*} = 0$ in order for the first coefficient of $\eta(t)$ in (32) or (33) to vanish. Given this choice of the multipliers, the necessary optimality condition (33) takes the form
\[ \int_a^b H_u \cdot \omega(t)\,\Delta t = 0. \]
Since $\omega(t)$ can be arbitrarily assigned for all $t \in \mathbb{T}^k$, it follows from Lemma 3 that $H_u = 0$.

3.3 The higher-order problem on time scales

As a corollary of Theorem 3 we obtain the Euler-Lagrange equation for problem (3). We first introduce some notation:
\[ y^0(t) = y(t), \quad y^1(t) = y^\Delta(t), \quad \ldots, \quad y^{r-1}(t) = y^{\Delta^{r-1}}(t), \quad u(t) = y^{\Delta^r}(t). \]

Theorem 4. If $y_* \in C^r_{rd}[\mathbb{T}]$ is a weak local minimizer for the higher-order problem (3), then
\[ \psi_*^{r-1}(\sigma(t)) = -L_u(t, x_*(t), u_*(t)) \tag{34} \]
holds for all $t \in \mathbb{T}^{k^r}$, where $x_*(t) = \left( y_*(t), y_*^\Delta(t), \ldots, y_*^{\Delta^{r-1}}(t) \right)$ and $\psi_*^{r-1}(\sigma(t))$ is defined recursively by
\[ \psi_*^0(\sigma(t)) = -\int_a^{\sigma(t)} L_{y^0}(\xi, x_*(\xi), u_*(\xi))\,\Delta\xi + c_0, \tag{35} \]
\[ \psi_*^i(\sigma(t)) = -\int_a^{\sigma(t)} \left[ L_{y^i}(\xi, x_*(\xi), u_*(\xi)) + \psi_*^{i-1}(\sigma(\xi)) \right] \Delta\xi + c_i, \quad i = 1, \ldots, r-1, \tag{36} \]
with $c_j$, $j = 0, \ldots, r-1$, constants. If $y^{\Delta^i}(\alpha)$ is free in (3) for some $i \in \{0, \ldots, r-1\}$, $\alpha \in \{a, \rho^{r-1}(b)\}$, then the corresponding condition $\psi_*^i(\alpha) = 0$ holds.

Remark 12. From (34), (35) and (36) it follows that
\[ L_u + \sum_{i=0}^{r-1} (-1)^{r-i} \Bigl( \underbrace{\int_a^{\sigma} \cdots \int_a^{\sigma}}_{r-i \text{ times}} L_{y^i} + [c_i]_{r-i-1} \Bigr) = 0, \tag{37} \]
where $[c_i]_{r-i-1}$ means that the constant is free from the composition of the $r-i$ integrals when $i = r-1$ (for simplicity, we have omitted the arguments in $L_u$ and $L_{y^i}$).

Remark 13. If we delta differentiate (37) $r$ times, we obtain the delta differentiated equation for the problem of the calculus of variations with higher-order delta derivatives. However, as observed in Remark 3, one can only expand formula (37) under suitable conditions of delta differentiability of $\mu(t)$.

Remark 14. For the particular case with $\varphi(t, y, u) = u$, equation (8) is (37) with $r = 1$.

Proposition 3. The higher-order problem on time scales (3) does not admit abnormal extremals, even when the boundary conditions $y^{\Delta^i}(a)$ and $y^{\Delta^i}(\rho^{r-1}(b))$, i = 0, . .
. , r − 1, are all fixed.

Remark 15. We require the time scale $\mathbb{T}$ to have at least $2r + 1$ points. Let us consider problem (3) with all the boundary conditions fixed. Since we have $r$ delta derivatives, the boundary conditions $y^{\Delta^i}(a) = y_a^i$ and $y^{\Delta^i}(\rho^{r-1}(b)) = y_b^i$ for all $i \in \{0, \ldots, r-1\}$ imply that we must have at least $2r$ points in order to have the problem well defined. If we had only this number of points, then the time scale could be written as $\mathbb{T} = \{a, \sigma(a), \ldots, \sigma^{2r-1}(a)\}$ and
\[ \int_a^{\rho^{r-1}(b)} L(t, y(t), y^\Delta(t), \ldots, y^{\Delta^r}(t))\,\Delta t = \sum_{i=0}^{r-1} \int_{\sigma^i(a)}^{\sigma^{i+1}(a)} L(t, y(t), y^\Delta(t), \ldots, y^{\Delta^r}(t))\,\Delta t = \sum_{i=0}^{r-1} \mu(\sigma^i(a))\, L\bigl(\sigma^i(a), y(\sigma^i(a)), y^\Delta(\sigma^i(a)), \ldots, y^{\Delta^r}(\sigma^i(a))\bigr), \tag{38} \]
where we have used the fact that $\rho^{r-1}(\sigma^{2r-1}(a)) = \sigma^r(a)$. Now, having in mind the boundary conditions and the formula
\[ f^\Delta(t) = \frac{f(\sigma(t)) - f(t)}{\mu(t)}, \]
we can conclude that the sum in (38) would be constant for every admissible function $y(\cdot)$, and there would be nothing to minimize.

The following technical result is used in the proof of Proposition 3.

Lemma 4. Suppose that a function $f : \mathbb{T} \to \mathbb{R}$ is such that $f^\sigma(t) = 0$ for all $t \in \mathbb{T}^k$. Then, $f(t) = 0$ for all $t \in \mathbb{T}\setminus\{a\}$ if $a$ is right-scattered.

Proof. First note that, since $f^\sigma(t) = 0$, $f^\sigma(t)$ is delta differentiable, hence continuous for all $t \in \mathbb{T}^k$. Now, if $t$ is right-dense, the result is obvious. Suppose that $t$ is right-scattered. We analyze two cases: (i) if $t$ is left-scattered, then $t \ne a$ and by hypothesis $0 = f^\sigma(\rho(t)) = f(t)$; (ii) if $t$ is left-dense, then $f(t) = \lim_{s \to t^-} f^\sigma(s) = f^\sigma(t)$, by the continuity of $f^\sigma$. The proof is done.

Proof (of Proposition 3). Suppose that $\psi_0 = 0$. With the notation (40) introduced below, the higher-order problem (3) would have the abnormal Hamiltonian given by
\[ H(t, y^0, \ldots, y^{r-1}, u, \psi^0, \ldots, \psi^{r-1}) = \sum_{i=0}^{r-2} \psi^i(\sigma(t)) \cdot y^{i+1}(t) + \psi^{r-1}(\sigma(t)) \cdot u(t) \]
(compare with the normal Hamiltonian (41)).
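From Theorem 3, we can write the system of equations
$$\begin{cases} \hat{\psi}^0(t) = 0 \\ \hat{\psi}^1(t) = -\psi^0(\sigma(t)) \\ \quad \vdots \\ \hat{\psi}^{r-1}(t) = -\psi^{r-2}(\sigma(t)) \\ \psi^{r-1}(\sigma(t)) = 0 \end{cases} \qquad (39)$$
for all $t \in \mathbb{T}^k$, where we are using the notation $\hat{\psi}^i(t) = (\psi^i)^{\Delta}(t)$, $i = 0, \ldots, r-1$. From the last equation, and in view of Lemma 4, we have $\psi^{r-1}(t) = 0$, $\forall t \in \mathbb{T}^k \setminus \{a\}$ if a is right-scattered. This implies that $\hat{\psi}^{r-1}(t) = 0$, $\forall t \in \mathbb{T}^k$, and consequently $\psi^{r-2}(\sigma(t)) = 0$, $\forall t \in \mathbb{T}^k \setminus \{a\}$. As before, $\psi^{r-2}(t) = 0$, $\forall t \in \mathbb{T}^k \setminus \{a, \sigma(a)\}$ if $\sigma(a)$ is right-scattered. Repeating this procedure, we finally have $\hat{\psi}^1(t) = 0$, $\forall t \in \mathbb{T}^k \setminus \{a, \ldots, \sigma^{r-2}(a)\}$ if $\sigma^i(a)$ is right-scattered for all $i \in \{0, \ldots, r-2\}$. Now, the first and second equations in the system (39) imply that, $\forall t \in A = \mathbb{T}^k \setminus \{a, \ldots, \sigma^{r-2}(a)\}$,
$$0 = \hat{\psi}^1(t) = -\psi^0(\sigma(t)) = -\psi^0(t) - \mu(t)\hat{\psi}^0(t) = -\psi^0(t) .$$
We pick again the first equation to point out that $\psi^0(t) = c$, $\forall t \in \mathbb{T}^k$, for some constant c. Since the time scale has at least 2r + 1 points (Remark 15), the set A is nonempty and therefore $\psi^0(t) = 0$, $\forall t \in \mathbb{T}^k$. Substituting this in the second equation, we get $\hat{\psi}^1(t) = 0$, $\forall t \in \mathbb{T}^k$. As before, it follows that $\psi^1(t) = d$, $\forall t \in \mathbb{T}^k$, for some constant d. But we have seen that there exists some $t_0$ such that $\psi^1(t_0) = 0$, hence $\psi^1(t) = 0$, $\forall t \in \mathbb{T}^k$. Repeating this procedure, we conclude that for all $i \in \{0, \ldots, r-1\}$, $\psi^i(t) = 0$ for all $t \in \mathbb{T}^k$. This contradicts Theorem 3, and we conclude that $\psi_0 \neq 0$.

Proof (of Theorem 4). Denoting $\hat{y}(t) = y^{\Delta}(t)$, problem (3) takes the following form:
$$\mathcal{L}[y(\cdot)] = \int_a^{\rho^{r-1}(b)} L(t, y^0(t), y^1(t), \ldots, y^{r-1}(t), u(t))\,\Delta t \longrightarrow \min,$$
$$\begin{cases} \hat{y}^0 = y^1 \\ \hat{y}^1 = y^2 \\ \quad \vdots \\ \hat{y}^{r-2} = y^{r-1} \\ \hat{y}^{r-1} = u \end{cases} \qquad (40)$$
$$y^i(a) = y_a^i , \quad y^i(\rho^{r-1}(b)) = y_b^i , \quad i = 0, \ldots, r-1, \quad y_a^i \text{ and } y_b^i \in \mathbb{R}^n .$$
System (40) can be written in the form $y^{\Delta} = Ay + Bu$, where
$$y = (y^0, y^1, \ldots, y^{r-1}) = (y_1^0, \ldots, y_n^0, y_1^1, \ldots, y_n^1, \ldots, y_1^{r-1}, \ldots, y_n^{r-1}) \in \mathbb{R}^{nr}$$
and the matrices A (nr by nr) and B (nr by n) are
$$A = \begin{pmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}, \qquad B = \mathrm{col}\{0, \ldots, 0, I\},$$
in which I denotes the n by n identity matrix, and 0 the n by n zero matrix.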
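The block structure of A and B can be checked numerically. The following is a minimal sketch in plain Python, assuming small illustrative sizes r = 3 and n = 2; the helper names `companion_blocks`, `matvec`, and `rhs` are hypothetical and do not come from the paper. It verifies that Ay + Bu shifts the stacked vector (y^0, ..., y^{r-1}) up by one block and places u in the last block, which is what system (40) states.

```python
def companion_blocks(r, n):
    """Build A (n*r x n*r) and B (n*r x n) as nested lists of 0/1 entries."""
    size = n * r
    A = [[0] * size for _ in range(size)]
    B = [[0] * n for _ in range(size)]
    # Identity blocks on the block superdiagonal of A.
    for row in range(n * (r - 1)):
        A[row][row + n] = 1
    # Identity block in the last block row of B.
    for k in range(n):
        B[n * (r - 1) + k][k] = 1
    return A, B


def matvec(M, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rhs(A, B, y, u):
    """Right-hand side A y + B u of the first-order system."""
    return [a + b for a, b in zip(matvec(A, y), matvec(B, u))]


r, n = 3, 2                      # illustrative sizes (assumption)
A, B = companion_blocks(r, n)
y = [1, 2, 3, 4, 5, 6]           # stacked (y^0, y^1, y^2), each in R^2
u = [7, 8]
shifted = rhs(A, B, y, u)        # stacked (y^1, y^2, u)
```

With these inputs the first two blocks of the result are y^1 and y^2 and the last block is u, matching the equations ŷ^i = y^{i+1} and ŷ^{r-1} = u.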
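From Proposition 3 we can fix $\psi_0 = 1$: problem (40) is a particular case of (16) with the Hamiltonian given by
$$H(t, y^0, \ldots, y^{r-1}, u, \psi^0, \ldots, \psi^{r-1}) = L(t, y^0, \ldots, y^{r-1}, u) + \sum_{i=0}^{r-2} \psi^i(\sigma(\cdot)) \cdot y^{i+1} + \psi^{r-1}(\sigma(\cdot)) \cdot u . \qquad (41)$$
From (26) and (19), we obtain
$$\psi^i(\sigma(t)) = -\int_a^{\sigma(t)} H_{y^i}(\xi, x(\xi), u(\xi), \psi^{\sigma}(\xi))\,\Delta\xi + c_i , \quad i \in \{0, \ldots, r-1\} , \qquad (42)$$
$$0 = H_u(t, x(t), u(t), \psi^{\sigma}(t)) , \qquad (43)$$
respectively. Equation (43) is equivalent to (34), and from (42) we get (35)-(36).

4 An example

We end with an application of our higher-order Euler-Lagrange equation (37) to the time scale $\mathbb{T} = [a, b] \cap \mathbb{Z}$, which leads to the usual and well-known discrete-time Euler-Lagrange equation (in delta differentiated form); see e.g. [11]. Note that $\forall t \in \mathbb{T}$ we have $\sigma(t) = t + 1$ and $\mu(t) = \sigma(t) - t = 1$. In particular, we conclude immediately that $\mu(t)$ is r times delta differentiable. Also, for any function g, $g^{\Delta}$ exists $\forall t \in \mathbb{T}^k$ (see Theorem 1.16 (ii) of [5]) and $g^{\Delta}(t) = g(t+1) - g(t) = \Delta g$ is the usual forward difference operator (obviously $g^{\Delta^2}$ exists $\forall t \in \mathbb{T}^{k^2}$, and more generally $g^{\Delta^r}$ exists $\forall t \in \mathbb{T}^{k^r}$, $r \in \mathbb{N}$). Now, for any function $f : \mathbb{T} \to \mathbb{R}$ and for any $j \in \mathbb{N}$ we have
$$\Bigg( \underbrace{\int_a^{\sigma} \cdots \int_a^{\sigma}}_{j-i \text{ integrals}} f \Bigg)^{\Delta^j} = f^{\Delta^i \sigma^{j-i}} , \quad i \in \{0, \ldots, j-1\} , \qquad (44)$$
where $f^{\Delta^i \sigma^{j-i}}(t)$ stands for $f^{\Delta^i}(\sigma^{j-i}(t))$. To see this we proceed by induction. For j = 1,
$$\int_a^{\sigma(t)} f(\xi)\,\Delta\xi = \int_a^{t+1} f(\xi)\,\Delta\xi = \int_a^{t} f(\xi)\,\Delta\xi + \int_t^{t+1} f(\xi)\,\Delta\xi = \int_a^{t} f(\xi)\,\Delta\xi + f(t) ,$$
and then
$$\left( \int_a^{\sigma(t)} f(\xi)\,\Delta\xi \right)^{\Delta} = f(t) + f^{\Delta}(t) = f^{\sigma} .$$
Assuming that (44) is true for all $j = 1, \ldots, k$, then
$$\Bigg( \underbrace{\int_a^{\sigma} \cdots \int_a^{\sigma}}_{k+1-i \text{ integrals}} f \Bigg)^{\Delta^{k+1}} = \Bigg( \Bigg( \underbrace{\int_a^{\sigma} \cdots \int_a^{\sigma}}_{k-i \text{ integrals}} f \Bigg)^{\sigma} \Bigg)^{\Delta^{k}} = \left( f^{\Delta^i \sigma^{k-i}} \right)^{\sigma} = f^{\Delta^i \sigma^{k+1-i}} .$$
Delta differentiating r times both sides of equation (37), and in view of (44), we obtain the Euler-Lagrange equation in delta differentiated form (remember that $y^0 = y$, ..., $y^{r-1} = y^{\Delta^{r-1}}$, $y^{\Delta^r} = u$):
$$L_{y^{\Delta^r}}^{\Delta^r}(t, y, y^{\Delta}, \ldots, y^{\Delta^r}) + \sum_{i=0}^{r-1} (-1)^{r-i} L_{y^{\Delta^i}}^{\Delta^i \sigma^{r-i}}(t, y, y^{\Delta}, \ldots, y^{\Delta^r}) = 0 .$$

5 Conclusion

We introduce a new perspective to the calculus of variations on time scales.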
In all the previous works [2, 4, 9] on the subject, the motivation for having $y^{\sigma}$ (or $y^{\rho}$) in the formulation of problem (1) is not mentioned. We claim that the formulation (2) without σ (or ρ) is more natural and convenient. One advantage of the approach we are promoting is that it becomes clear how to generalize the simplest functional of the calculus of variations on time scales to problems with higher-order delta derivatives. We also note that the Euler-Lagrange equation in Δ-integral form (8), for a Lagrangian L with y instead of $y^{\sigma}$, closely follows the classical condition. The main results of the paper include: necessary optimality conditions for the Lagrange problem of the calculus of variations on time scales, covering both normal and abnormal minimizers; and necessary optimality conditions for problems with higher-order delta derivatives. Much remains to be done in the calculus of variations and optimal control on time scales. We trust that our perspective provides interesting insights and opens new possibilities for further investigations.

Acknowledgments

This work was partially supported by the Portuguese Foundation for Science and Technology (FCT), through the Control Theory Group (cotg) of the Centre for Research on Optimization and Control (CEOC, http://ceoc.mat.ua.pt). The authors are grateful to M. Bohner and S. Hilger for useful and stimulating comments, and for sharing their expertise on time scales.

References

[1] R. Agarwal, M. Bohner, D. O'Regan, A. Peterson. Dynamic equations on time scales: a survey, J. Comput. Appl. Math. 141 (2002), no. 1-2, 1–26.
[2] F. M. Atici, D. C. Biles, A. Lebedinsky. An application of time scales to economics, Math. Comput. Modelling 43 (2006), no. 7-8, 718–726.
[3] Z. Bartosiewicz, E. Pawłuszewicz. Realizations of linear control systems on time scales, Control Cybernet. 35 (2006), no. 4 (in press).
[4] M. Bohner. Calculus of variations on time scales, Dynam. Systems Appl. 13 (2004), no. 3-4, 339–349.
[5] M. Bohner, A. C. Peterson. Dynamic equations on time scales: an introduction with applications, Birkhäuser Boston, Inc., Boston, MA, 2001.
[6] I. M. Gelfand, S. V. Fomin. Calculus of variations, Dover, New York, 2000.
[7] S. Hilger. Analysis on measure chains - a unified approach to continuous and discrete calculus, Results Math. 35 (1990), 18–56.
[8] S. Hilger. Differential and difference calculus - unified!, Proceedings of the Second World Congress of Nonlinear Analysts, Part 5 (Athens, 1996). Nonlinear Anal. 30 (1997), no. 5, 2683–2694.
[9] R. Hilscher, V. Zeidan. Calculus of variations on time scales: weak local piecewise rd solutions with variable endpoints, J. Math. Anal. Appl. 289 (2004), no. 1, 143–166.
[10] R. Hilscher, V. Zeidan. Nonnegativity and positivity of quadratic functionals in discrete calculus of variations: survey, J. Difference Equ. Appl. 11 (2005), no. 9, 857–875.
[11] J. D. Logan. Higher dimensional problems in the discrete calculus of variations, Internat. J. Control (1) 17 (1973), 315–320.
[12] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, E. F. Mishchenko. The mathematical theory of optimal processes. Translated from the Russian by K. N. Trirogoff; edited by L. W. Neustadt. Interscience Publishers, John Wiley & Sons, Inc., New York-London, 1962.
[13] S. P. Sethi, G. L. Thompson. Optimal control theory, Second edition, Kluwer Acad. Publ., Boston, MA, 2000.
[14] B. van Brunt. The calculus of variations, Universitext, Springer-Verlag, New York, 2004.

ABSTRACT

We study more general variational problems on time scales.
Previous results are generalized by proving necessary optimality conditions for (i) variational problems involving delta derivatives of more than the first order, and (ii) problems of the calculus of variations with delta-differential side conditions (Lagrange problem of the calculus of variations on time scales).
<|endoftext|><|startoftext|>
Introduction

The solar wind evacuates a cavity in the interstellar medium (ISM) known as the heliosphere, from which interstellar ions are excluded. In contrast, neutral interstellar gas flows through the heliosphere until destroyed by charge-transfer with the solar wind and photoionization. These neutrals form the parent population of pickup ions (PUI) and anomalous cosmic rays (ACR) observed inside of the heliosphere. The properties of the surrounding interstellar medium set the boundary conditions of the heliosphere and determine its configuration and evolution.

An ionization gradient is expected in the cloud feeding ISM into the heliosphere because of the hardness of the interstellar radiation field and the low opacity of the ISM (Cheng & Bruhweiler 1990; Vallerga 1996; Slavin & Frisch 2002, hereafter SF02). Because of this ionization gradient, the densities of partially ionized species in the local interstellar cloud (LIC) will differ from the values in the circumheliospheric interstellar medium (CHISM) that forms the boundary conditions of the heliosphere. Hence we have undertaken a series of studies to determine the boundary conditions of the heliosphere based on both astronomical and heliospheric data. In turn, our results provide tighter constraints on the heliosphere models used to calculate the filtration factors for neutrals, which then permit comparisons between ISM inside and outside of the heliosphere.

Send offprint requests to: J. Slavin
The distribution and velocity of interstellar H^0 inside of the heliosphere were first determined over 30 years ago from the fluorescence of solar Lyα radiation from these atoms (Thomas & Krassa 1971; Bertaux & Blamont 1971; Adams & Frisch 1977). Similar observations of solar 584 Å fluorescence from interstellar He^0 showed that H/He ∼ 6 for interstellar gas inside of the heliosphere, in contrast to the cosmic value H/He = 10 (Ajello 1978; Weller & Meier 1981). More recent measurements of n(H^0) compared to n(He^0) inside of the heliosphere find a similar ratio of H/He ∼ 6–7 (Richardson et al. 2004; Gloeckler & Geiss 2004; Witte 2004; Möbius et al. 2004). This difference can be attributed to two effects: the loss of 40–60% of interstellar H^0 due to charge-transfer with protons in the heliosheath region, a process denoted “filtration” (Ripken & Fahr 1983), and the hardness of the interstellar radiation field at the Sun, which ionizes more He than H (§4).

The Copernicus satellite first showed that the local interstellar cloud (LIC) surrounding the Sun is low density, ∼ 0.1 atoms cm^-3, partially ionized (n(H^+) ∼ n(H^0), York 1974; McClintock et al. 1975), and warm (temperature < 10^4 K, e.g. McClintock et al. 1978). Copernicus, FUSE and HST data have shown that the cluster of local interstellar clouds (CLIC, low density clouds within ∼ 30 pc) has low column densities, N(H I) < 10^18.7 cm^-2, and N ionization levels of > 30% (e.g. Lehner et al. 2003; Wood et al. 2005), indicating partially ionized gas because H and N ionization are coupled by charge-transfer. The low column densities of the LIC itself, N(H I) < 10^18 cm^-2, indicate it is partially opaque to H ionizing radiation but not to He ionizing radiation.

Slavin and Frisch: Boundary Conditions of the Heliosphere (http://arxiv.org/abs/0704.0657v2)
Cloud opacities of unity are reached for N(H I) ∼ 10^17.2 cm^-2 for photons close to the ionization threshold of hydrogen (13.6 eV), and N(H I) ∼ 10^17.7 cm^-2 for photons at the He^0 ionization edge (24.6 eV). The result is that the LIC is partially ionized with a significant ionization gradient between the edge and center. Because of this, radiative transfer effects are important and need to be modeled carefully in order to determine the boundary conditions of the heliosphere.

The LIC belongs to a flow of low density ISM embedded in the very low density and apparently hot (T ∼ 10^6 K) Local Bubble (Frisch 1981; Frisch & York 1986; McCammon et al. 1983; Snowden et al. 1990). The bulk motion of the CLIC through the local standard of rest¹ corresponds to a velocity of −17.0 to −19.4 km s^-1 from the direction ℓ ∼ 331°, b ∼ −5° (Frisch & Slavin 2006). This upwind direction is near the center of the Loop I superbubble and the center of the “ring” shadow that has been attributed to the merging of Loop I and the Local Bubble (Frisch & York 1986; Frisch 2007; Egger & Aschenbach 1995). Individual cloudlets with distinct velocities are identified in this flow (Lallement et al. 1986; Frisch et al. 2002). The velocity of the cloud feeding gas into the heliosphere has been determined from the velocity of interstellar He^0 in the heliosphere measured by the GAS detector on Ulysses, −26.3 km s^-1 (Witte 2004). Curiously, absorption lines at the LIC velocity are not observed in the nearest interstellar gas in the upwind hemisphere, such as towards the closest star α Cen or towards 36 Oph, located ∼ 5 pc beyond the heliosphere nose (Adams & Frisch 1977; Landsman et al. 1984; Linsky & Wood 1996; Wood et al. 2000a,b). This lack of an absorption component at the LIC velocity in the closest stars in the upstream direction indicates that the Sun is near the edge of the LIC, so the CHISM may vary over short distances (and hence timescales).
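The unit-opacity column densities quoted above for the H^0 and He^0 edges can be roughly recovered from τ = σN. The sketch below is illustrative only: the threshold cross-sections (6.3 × 10^-18 cm^2 for H^0 and 7.4 × 10^-18 cm^2 for He^0) and the ν^-3 scaling for hydrogen are standard textbook approximations assumed here, not values taken from this paper, and the neutral ratio N(H I)/N(He I) = 14 is borrowed from Table 1. The function names are hypothetical.

```python
import math

# Assumed textbook values (not from the paper).
SIGMA_H_THRESHOLD = 6.3e-18    # cm^2, H I photoionization cross-section at 13.6 eV
SIGMA_HE_THRESHOLD = 7.4e-18   # cm^2, He I photoionization cross-section at 24.6 eV
E_H_EDGE = 13.6                # eV
E_HE_EDGE = 24.6               # eV


def sigma_hydrogen(energy_ev):
    """Approximate H I cross-section with a nu^-3 falloff above threshold."""
    return SIGMA_H_THRESHOLD * (E_H_EDGE / energy_ev) ** 3


def log_nhi_for_unit_opacity(sigma_per_h):
    """log10 of the H I column density (cm^-2) at which tau = sigma * N = 1."""
    return math.log10(1.0 / sigma_per_h)


# At the hydrogen edge only H I absorbs.
log_n_h_edge = log_nhi_for_unit_opacity(sigma_hydrogen(E_H_EDGE))

# At the helium edge both H I and He I absorb; express the He I opacity
# per H I atom using the observed neutral ratio N(H I)/N(He I) = 14.
HI_TO_HEI = 14.0
sigma_eff = sigma_hydrogen(E_HE_EDGE) + SIGMA_HE_THRESHOLD / HI_TO_HEI
log_n_he_edge = log_nhi_for_unit_opacity(sigma_eff)

print(f"tau = 1 at 13.6 eV: log N(H I) ~ {log_n_h_edge:.1f}")
print(f"tau = 1 at 24.6 eV: log N(H I) ~ {log_n_he_edge:.1f}")
```

Under these assumptions the hydrogen-edge value reproduces log N(H I) ≈ 17.2, and the helium-edge value falls within about 0.1 dex of the quoted 17.7; the residual difference depends on the assumed cross-sections and neutral ratio.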
We are able to test for possible past variations by comparing the first interplanetary H^0 Lyα glow spectrum obtained by Copernicus in 1975 with Hubble Space Telescope observations of the Lyα spectra obtained during the mid-1990s solar minimum conditions. The observed H^0 velocity and intensity have not varied to within uncertainties over the twenty-year period separating these two sets of observations, so the CHISM velocity field is relatively smooth over spatial scales of ∼ 120 AU in the downwind direction (Frisch & Slavin 2005). The 1975 Copernicus data were acquired in the direction corresponding to ecliptic coordinates λ = 264.3°, β = +15.0°, or ∼ 13.3° from the most recent upwind direction derived from SOHO H I Lyα data (Quémerais et al. 2006b,a, Q06). The Copernicus look direction was just outside of the “groove” expected in the Lyα glow in the ecliptic during solar minimum. The groove is caused by increased charge-transfer in the solar wind current sheet, which has a small tilt during minimum conditions (Bzowski 2003). The velocity of the Lyα profile observed by Copernicus corresponds to −24.8 ± 2.6 km s^-1, after correction to the SOHO upwind direction. Since Q06 measured H^0 upwind velocities during the solar minimum years of 1996 and 1997 of −25.7 ± 0.2 km s^-1 and −25.3 ± 0.2 km s^-1, which are consistent with the Copernicus results, we conclude that the flow of interstellar H^0 into the heliosphere was relatively constant between 1975 and 1997, so that any observed variations in the interplanetary Lyα glow properties must be due to solar activity alone. The thermal broadening of the Copernicus spectrum corresponded to a temperature of ∼ 5400 K; however, measurement uncertainties allowed temperatures of up to 20,000 K.

¹ Heliocentric motions are converted to the local standard of rest, LSR, using the Standard solar apex motion.
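The consistency claim above can be illustrated with a short calculation. This is a sketch under stated assumptions: the velocity offsets are compared with the quadrature sum of the quoted 1σ uncertainties, and the spatial scale is estimated by taking the Ulysses He^0 inflow speed of 26.3 km s^-1 over the 22 years between 1975 and 1997. Both choices are illustrative and not necessarily how the cited papers derived their numbers; the helper names are hypothetical.

```python
import math

KM_PER_AU = 1.495978707e8       # kilometres in one astronomical unit
SECONDS_PER_YEAR = 3.15576e7    # Julian year in seconds

# (velocity, 1-sigma uncertainty) in km/s, as quoted in the text.
copernicus_1975 = (-24.8, 2.6)
soho_measurements = [(-25.7, 0.2), (-25.3, 0.2)]   # 1996 and 1997


def offset_in_sigma(a, b):
    """Velocity difference in units of the combined (quadrature) uncertainty."""
    return abs(a[0] - b[0]) / math.hypot(a[1], b[1])


offsets = [offset_in_sigma(copernicus_1975, m) for m in soho_measurements]


def distance_au(speed_km_s, years):
    """Distance in AU covered at a constant speed over a number of years."""
    return abs(speed_km_s) * years * SECONDS_PER_YEAR / KM_PER_AU


# Assumed inflow speed: Ulysses He0 value; assumed baseline: 1975 to 1997.
baseline_au = distance_au(26.3, 1997 - 1975)

for (v, _), off in zip(soho_measurements, offsets):
    print(f"SOHO {v} km/s differs from Copernicus by {off:.2f} sigma")
print(f"Path length sampled over the baseline: about {baseline_au:.0f} AU")
```

Both offsets come out well below 1σ, and the sampled path length is of order 120 AU, in line with the scale quoted in the text.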
In this paper we present new photoionization models of the LIC and show that they reproduce both the densities of the ISM at the heliosphere and the column densities in the LIC component towards the star ε CMa. In earlier papers, photoionization models of both the “LIC” and “Blue Cloud” components observed towards ε CMa were used to constrain the models (SF02, Frisch & Slavin 2003). SF02 grouped the properties of the LIC and Blue Cloud, both of which are within 2.7 pc of the Sun because they are also observed towards Sirius, ∼ 12° from the ε CMa sightline. The LIC velocity and density are sampled by observations of interstellar He I inside of the heliosphere (−26.3 km s^-1, Witte 2004), and Gry & Jenkins (2001) show that the properties of the LIC and Blue Cloud differ somewhat. The present study therefore focuses on obtaining the best model of heliosphere boundary conditions by using only data on the LIC inside of the heliosphere and the LIC component towards ε CMa. The present study also benefits from new atomic rates for the critical Mg II → Mg I dielectronic recombination coefficient, improved cooling rates in the radiative transfer code Cloudy, recent values for the pickup ion densities at the termination shock, and recent values for solar abundances.

The unique quality of the LIC is that we are inside of it and can therefore sample the cloud directly via in situ observations carried out by a variety of spacecraft within the Solar System. Of all the available measurements of LIC gas flowing into the Solar System, the observations of the density and temperature of neutral He are apparently the most robust.
Helium, unlike hydrogen, undergoes little ionization or heating in traversing the heliosheath regions and no deflection due to radiation pressure, and it is destroyed by photoionization and electron-impact ionization within ∼ 1 AU of the Sun, so we expect that the density and temperature of He^0 derived from the observations in the Solar System are truly representative of the values in the LIC (Möbius et al. 2004). In this paper we put a special emphasis on matching these He I data by determining model parameters that yield close agreement with the n(He^0) and T(He^0) data simultaneously.

2. Photoionization Model Constraints and Assumptions

The primary data constraints on our photoionization models are the LIC component column densities towards ε CMa (Gry & Jenkins 2001, hereafter GJ01), in situ observations of He^0 (Witte 2004; Möbius et al. 2004), pickup ions (PUI, Gloeckler & Fisk 2007), and anomalous cosmic rays (ACR, Cummings et al. 2002). The astronomical and in situ observational constraints are summarized in Table 1.

Table 1. Observational Constraints

Observed Quantity        Observed Value (a)            Notes (b)
N(C II) (cm^-2)          1.4 − 2.1 × 10^14             1, S
N(C II*) (cm^-2)         1.3 ± 0.2 × 10^12             1
N(C IV) (cm^-2)          1.2 ± 0.3 × 10^12             1
N(N I) (cm^-2)           1.70 ± 0.05 × 10^13           1
N(O I) (cm^-2)           1.4 (+0.5, −0.2) × 10^14      1, S
N(Mg I) (cm^-2)          7 ± 2 × 10^9                  1
N(Mg II) (cm^-2)         3.1 ± 0.1 × 10^12             1
N(Si II) (cm^-2)         4.52 ± 0.2 × 10^12            1
N(Si III) (cm^-2)        2.3 ± 0.2 × 10^12             1
N(S II) (cm^-2)          8.6 ± 2.1 × 10^12             1
N(Fe II) (cm^-2)         1.35 ± 0.05 × 10^12           1
N(H I)/N(He I)           14 ± 0.4 (c)                  2
T (K)                    6300 ± 340                    4
n(He^0) (cm^-3)          0.015 ± 0.003                 4, f = 1 (d)
n(N^0) (e) (cm^-3)       5.47 ± 1.37 × 10^-6           3, f = 0.68 − 0.95 (d)
n(O^0) (e) (cm^-3)       4.82 ± 0.53 × 10^-5           3, f = 0.64 − 0.99 (d)
n(Ne^0) (e) (cm^-3)      5.82 ± 1.16 × 10^-6           3, f = 0.84 − 0.95 (d)
n(Ar^0) (e) (cm^-3)      1.63 ± 0.73 × 10^-7           3, f = 0.53 − 0.95 (d)

(a) N(C II*), N(N I), N(O I), N(Mg II), N(Si II), N(S II) and N(Fe II) are used to constrain the input abundances of the models.
(b) The first number indicates the reference (below), and “S” indicates that the column density is based on a saturated line. Filtration factors (f, §2.2) are listed for neutrals that have been observed as pickup ions.
(c) The uncertainty given is only that due to uncertainties listed in Dupuis et al. (1995) for the observed H I and He I column densities, with the implicit assumption that the ratio is the same on all lines of sight. Given the substantial intrinsic variation in this ratio, however, the quoted uncertainty must be regarded as a lower limit to the true uncertainty.
(d) We assume in this paper that f_He = 1 for He, and f is not allowed to exceed 1. Heliosphere models predict the range f_He = 0.92 − 1 for He and f_H = 0.50 − 0.74 for H (see filtration factors in Cummings et al. 2002; Müller & Zank 2004a; Müller et al. 2007; Izmodenov et al. 2004).
(e) Note that the pickup ion ratios are for values at the termination shock. N^0, O^0, Ne^0, and Ar^0 densities need to be corrected for filtration in the heliosheath regions (§2.2).
References: (1) Gry & Jenkins (2001), where the values shown are for the LIC component towards ε CMa; (2) Dupuis et al. (1995); (3) Gloeckler & Fisk (2007); (4) Witte (2004).

2.1. Astronomical Constraints – The LIC towards ε CMa

The data toward ε CMa of GJ01 show four separate velocity components detected in several different ions including C II, Si III, Si II, and Mg II. One of these velocity components, with a heliocentric velocity of ∼ 17 km s^-1, is identified as the LIC. Another is identified with a second local cloud, the “Blue Cloud” (BC), at ∼ 10 km s^-1. In our previous study of LIC ionization (SF02) we noted that since the BC is close to the LIC, perhaps actually abutting it, and both are detected towards Sirius at 2.7 pc (Lallement et al. 1994; Hébrard et al. 1999), the two should be treated as a single cloud for this line of sight. Counter arguments to this line of reasoning include the fact that the BC apparently has different properties than the LIC, though whether it is colder or hotter is unclear (see Hébrard et al. 1999, GJ01). For this reason, and because we wish to find the ionization models that best predict interstellar neutral densities inside and outside of the heliosphere, we assume in this paper that the LIC and BC are truly separate clouds and select only the LIC components towards ε CMa to constrain the models.

In the context of these models, the best astronomical constraints on the neutral ISM component are N I, the saturated O I line, and to some extent Mg I (though the line is weak). The electron density can be deduced from the ratio Mg II/Mg I, which is determined by photoionization and both dielectronic and radiative recombination, and from the excitation of the C II fine-structure lines, C II/C II*. However, the heavy saturation of the C II 1335 Å line in the ISM limits the accuracy of the determination of N(C II), so in this paper we have de-emphasized C II/C II* as an ionization diagnostic. For a discussion of the use of C II/C II* and other observations in diagnosing the C abundance and C/S ratio see Slavin & Frisch (2006). Elements with first ionization potentials (FIP) < 13.6 eV (e.g. Mg, Si, S, and Fe) are generally almost entirely singly ionized in the LIC, and thus the column densities of these ions are close to the total column densities for the elements.

2.2. In situ Constraints – He^0, Pickup Ions, and Anomalous Cosmic Rays

Neutral atoms in the LIC penetrate the outer heliosphere regions and become ionized primarily by charge-transfer with the solar wind ions, photoionization, and electron impact ionization (Rucinski et al. 1996). The composition of this neutral population reflects the partially ionized state of the LIC, rather than indicating a pure FIP effect. Thus the observed abundances of neutrals representing high FIP elements, such as He, Ne, and Ar, as well as H, N, and O, do not reflect their elemental abundances directly. As a result, these neutrals provide an interesting and unique constraint on the photoionization models. Once ionized, these interstellar wind particles form a population of ions with a distinct velocity distribution that are “picked up” by the solar wind and convected outwards, where they are measured by various spacecraft (Möbius et al. 2004; Gloeckler & Geiss 2004). These pickup ions (PUI) are accelerated in the heliosheath region and form a population of cosmic rays with an anomalous composition reflective of their origin as interstellar neutrals in a partially ionized gas (Cummings et al. 2002). In situ observations of these byproducts of the ISM interaction with the heliosphere, H, He, N, O, Ar, and Ne, provide a unique opportunity to constrain theoretical models of an interstellar cloud using the combination of sightline-integrated data and data from a “single” spatial location, the heliosphere.

We adopt the Ulysses He measurements as the best set of constraints on the ISM inside of the heliosphere. The Ulysses satellite provides direct measurements of interstellar He^0 at high ecliptic latitudes throughout the solar cycle (the GAS detector, Witte 2004) and also measurements of the He pickup ion component (the SWICS detector, Gloeckler & Geiss 2004; Gloeckler & Geiss 2007).
Although interstellar He^0 close to the Sun is detected through the resonant scattering of solar 584 Å radiation, geocoronal contamination of the interstellar signal is present, so we prefer the Ulysses data (Möbius et al. 2004). We adopt the Ulysses GAS and SWICS results, n(He^0) = 0.0151 ± 0.0015 cm^-3, T(He^0) = 6,300 ± 340 K.

Data on the density of neutral N, O, Ne, and Ar in the surrounding ISM are provided by PUI and ACR data. The densities of interstellar N^0, O^0, Ne^0, and Ar^0 at the termination shock are listed in Table 1. These densities must be corrected by the filtration factors, which correspond to the ratios of the densities at the termination shock to those in the LIC. Filtration occurs when neutral interstellar atoms are removed from the inflow by charge-transfer with interstellar protons as the atoms cross the heliosheath regions. Filtration values are listed in Table 1, based on values in Cummings et al. (2002), Müller & Zank (2004b), and Izmodenov et al. (2004). Filtration values larger than 1 are not considered, although some models suggest possible net creation of O^0 through charge-transfer between O^+ and H^0 in the heliosheath regions (Müller & Zank 2004a). We adopt f_He = 1 for He. Our models must be compared to the interstellar densities obtained by correcting densities at the termination shock by filtration factors.

Fig. 1. The modeled interstellar radiation field at the Sun (model 26) is shown as a function of wavelength (bottom X-axis) and energy (top X-axis). The black histogram is the modeled hot gas (i.e. Local Bubble) spectrum while the gray histogram is the cloud boundary contribution. The other line is the stellar EUV/FUV background. The list of elements at the top of the plot identifies the ionization potentials for neutrals and ions of interest. The energy/wavelength at which an optical depth of 1 is reached for several different H I column densities is shown along the bottom of the plot. Observed flux levels of the soft X-ray diffuse background in the Wisconsin Be and B bands are plotted as lines (Bloch et al. 1986; McCammon et al. 1983).

2.3. Interstellar Radiation Field at the Cloud Surface

The spectrum and flux of the cosmic radiation field control the ionization of the very local warm partially ionized medium (WPIM). The local interstellar radiation field (ISRF) that ionizes the LIC and other nearby cloudlets is determined by the location of the Sun in the interior of a hole in the neutral interstellar gas and dust referred to as the Local Bubble. The clustering tendency of hot O and B stars, and attenuation of radiation by interstellar dust and gas, yield the well known spatial variation of the intensity and spectrum of the ISRF. We ignore possible temporal variations of the radiation field (e.g. Parravano et al. 2003), and model the ISRF throughout the LIC based on the observations of the present-day radiation field.

The radiation field we use in our models is based on observations of the ISRF at the Sun, supplemented by theoretical calculations of the spectra in the EUV and soft X-rays, where lack of sensitivity and/or spectral resolution requires the use of models to create a realistic spectrum. The directly observed radiation field includes the far ultraviolet (FUV) field created primarily by B stars, the extreme ultraviolet (EUV) field from two B stars (ε CMa and β CMa) and hot nearby white-dwarf stars, and the diffuse soft X-ray background (SXRB). We show the radiation field at the cloud surface in Fig. 1. Because the interstellar opacity for radiation with E > 13.6 eV is vastly greater than for E < 13.6 eV, the EUV part of the ISRF originates in nearby regions with N(H I) ≲ 10^18 cm^-2 while the FUV comes from a much larger volume.
We use the EUV field of Vallerga (1998), which is based on data collected by the Extreme Ultraviolet Explorer (EUVE) satellite. The EUVE spectrometers were sensitive over the wavelength range of 504–730 Å and showed that the stellar part of the EUV background is dominated by ε CMa and β CMa, with substantial contributions from nearby hot white dwarfs at shorter wavelengths. Vallerga extrapolated those measurements to the H^0 ionization edge at 912 Å using a total interstellar H^0 column density towards ε CMa of N(H I) = 9 × 10^17 cm^-2. This value for N(H I) appears somewhat high based on the observations of Gry & Jenkins (2001), which, though dependent on assumptions for gas phase abundances, indicate a value for N(H I) of ∼ 7 × 10^17 cm^-2. This uncertainty in the total N(H I) affects only the extrapolated portion of the spectrum in our models, since the other portion of the spectrum is derived by de-absorbing the observed spectrum by the value of N(H I) assumed just for the LIC. For the results presented here we have assumed that the total N(H I) towards ε CMa is 7 × 10^17 cm^-2. Assuming a larger value would increase the flux in the extrapolated region, though our calculations indicate that the overall effect on our results is only at the ∼ 2% level at most.

The FUV field is important because it sets the ionization rate of Mg^0, which has a first ionization potential of 7.65 eV (1621 Å). The radiation field shortwards of 1600 Å is heavily dominated by O and B stars in Gould's Belt, particularly those occupying the unattenuated regions of the third and fourth galactic quadrants. A pronounced spatial asymmetry in the 1565 Å radiation field has been observed by the TD-1 satellite S2/68 telescope survey of the interstellar radiation field, and we use those data and the extrapolation down to 912 Å from the 1564 Å measurements as calculated by Gondhalekar et al. (1980).
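The wavelength and energy values quoted in this section are related by E = hc/λ. The following minimal sketch uses the standard conversion hc ≈ 12398.42 eV Å (a physical constant assumed here, not a number from the paper) to cross-check the quoted pairs; the function names are hypothetical.

```python
HC_EV_ANGSTROM = 12398.42   # h*c in eV * Angstrom (standard value, assumed)


def ev_to_angstrom(energy_ev):
    """Photon wavelength in Angstrom for a given energy in eV."""
    return HC_EV_ANGSTROM / energy_ev


def angstrom_to_ev(wavelength_angstrom):
    """Photon energy in eV for a given wavelength in Angstrom."""
    return HC_EV_ANGSTROM / wavelength_angstrom


# Mg0 first ionization potential quoted as 7.65 eV (1621 Angstrom).
mg0_edge_angstrom = ev_to_angstrom(7.65)

# H0 ionization edge at 912 Angstrom, and the short-wavelength end of the
# EUVE range at 504 Angstrom, which coincides with the He0 edge.
h0_edge_ev = angstrom_to_ev(912.0)
euve_short_ev = angstrom_to_ev(504.0)

print(f"7.65 eV -> {mg0_edge_angstrom:.0f} Angstrom")
print(f"912 Angstrom -> {h0_edge_ev:.2f} eV")
print(f"504 Angstrom -> {euve_short_ev:.2f} eV")
```

The conversion shows why the EUVE band (504–730 Å) covers photons that ionize H^0 but lie at or below the He^0 edge, while the 1565 Å TD-1 band lies close to the Mg^0 threshold.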
The asymmetries in the TD-1 1565 Å radiation field are reproduced by diffuse interstellar radiation field models (Henry 2002).

The diffuse soft X-ray background (SXRB) has been observed over the entire sky at relatively low spatial and spectral resolution by ROSAT (Snowden et al. 1997) and by proportional counters flown on sounding rockets by the Wisconsin group (e.g., McCammon et al. 1983). The broadband count rates in the low energy bands, particularly the B and C bands (130–188 eV and 160–284 eV respectively), have been modeled as coming from an optically thin, hot plasma at a temperature of ∼ 10^6 K that occupies the low density cavity extending to ∼ 50–200 pc from the Sun in all directions (Snowden et al. 1990). Refinements to this picture have been required by ROSAT data showing absorption by relatively distant clouds (e.g. MBM12, Snowden et al. 1993). Snowden et al. (1998) propose a picture in which the emission is divided between a LB component (unabsorbed except for the LIC) and a distant absorbed component, mainly in the Galactic halo. More recently there has been growing evidence that some, possibly large, fraction of the SXRB is generated within the heliosphere from charge exchange between solar wind ions and neutral atoms (SWCX). We discuss this further in §5.6.

We model the spectrum measured by the broad-band soft X-ray observations by assuming that the SXR background consists of a local (unabsorbed) component and a distant (halo) component absorbed by an H I column density of 10^19 cm^-2. The emission measures (intrinsic intensities) of the local and distant components are assumed to be equal. The spectrum is calculated using the Raymond & Smith (1977, updated) plasma emission code, assuming a hot, optically thin plasma in collisional ionization equilibrium. We explore temperatures for the hot gas of log T_h = 5.9, 6.0 and 6.1.
The total flux, scaled by the emission measure, is fixed so that the B band flux matches the all-sky average from McCammon et al. (1983).

The boundary region between the warm LIC and the adjacent Local Bubble plasma may be another significant source of EUV radiation, and we include this flux in our models 1–30. We model this transition region as a conductive interface between the LIC and Local Bubble plasma in the same way as described in Slavin & Frisch (2002). The cloud is assumed to be steadily evaporating into the surrounding hot gas. The partially ionized gas of the LIC is heated and ionized as it flows into the Local Bubble. The ionization falls out of equilibrium, with low ionization stages persisting into the hot gas. The non-equilibrium ionization is followed and the emission in the boundary is calculated, again using the Raymond & Smith (1977) code. Ions in the outflow are typically excited several times before being ionized, and the boundary region radiates strongly in the 13.6–54.4 eV band. The contribution of the interface emission to the total B band flux is taken into account in the calculation of the emission measure for the hot gas of the Local Bubble, so that the B band flux still matches the all-sky average.

We note that no attempt is made to make this model consistent with the size of the local cavity, which is proposed to contain the hot gas in the standard model for the SXRB (e.g., Snowden et al. 1998). The pressure in the hot gas is not adjusted to fit such models. Rather, the total pressure in the hot gas for an evaporating cloud model is dictated by the assumed density, temperature and magnetic field in the cloud. In fact the pressures in our models come out far too low for the standard model to produce the SXRB within the confines of the Local Cavity as deduced, e.g., from Na I observations (Lallement et al. 2003).
This in turn means that if the thermal pressure in the Local Bubble turns out to be much lower than was assumed in those models because a substantial fraction of the SXRB comes from SWCX, then the emission from the cloud boundary would be unaffected.

An example of the profiles of the hydrodynamic variables for two different models is shown in Fig. 2. The magnetic field strength in these calculations is assumed to be proportional to the density at every point in the outflow. The treatment here is the same as in Slavin (1989), which contains a more thorough discussion of the issues involved in this sort of calculation. The effect of the field on the conductivity is parametrized in a simple way by a constant reduction factor of 0.5.

The importance of the magnetic field for our calculations lies in the way it affects the thermal pressure in the layer. Since the density drops sharply in the outflow and |B| ∝ n, the magnetic pressure (∝ B^2) drops even more sharply. The total pressure is roughly constant in the boundary, so the thermal pressure necessarily rises to make up for the decreasing magnetic pressure. Since all our models have nearly the same thermal pressure in the cloud, the primary effect of the magnetic field is to help determine the thermal pressure in the interface and thus the radiative flux from the boundary. Since the total soft X-ray flux is fixed by requiring a match with the Wisconsin B band all-sky average count rate (McCammon et al. 1983), a larger assumed magnetic field increases only the EUV flux, which is not constrained by the B band data. The effect on the cloud of a larger EUV flux is to increase its temperature and ionization.

A secondary effect of the magnetic field can be seen in the temperature profiles in Figure 2.
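The pressure argument above (constant total pressure, |B| proportional to n, magnetic pressure proportional to B^2) can be sketched numerically. The following is only an illustration under stated assumptions: the function names are hypothetical, the cloud thermal pressure of 2000 K cm^-3 is a placeholder chosen for the example and is not a value from the models, and the magnetic pressure is taken as B^2/(8π) in CGS units.

```python
import math

K_B = 1.380649e-16  # Boltzmann constant, erg/K (CGS)


def magnetic_pressure_over_k(b_microgauss):
    """Magnetic pressure B^2/(8*pi) divided by k_B, in K cm^-3."""
    b_gauss = b_microgauss * 1.0e-6
    return b_gauss**2 / (8.0 * math.pi * K_B)


def interface_thermal_pressure_over_k(p_thermal_cloud, b0_microgauss, density_ratio):
    """Thermal pressure/k at a point where n has dropped to density_ratio * n_cloud.

    Assumes the total (thermal + magnetic) pressure is constant through the
    boundary and that |B| scales linearly with density.
    """
    p_total = p_thermal_cloud + magnetic_pressure_over_k(b0_microgauss)
    b_local = b0_microgauss * density_ratio
    return p_total - magnetic_pressure_over_k(b_local)


# Hypothetical cloud thermal pressure/k (K cm^-3); illustrative only.
P_THERMAL_CLOUD = 2000.0

for b0 in (2.0, 5.0):  # the two field strengths used in the model grid
    p_b = magnetic_pressure_over_k(b0)
    p_out = interface_thermal_pressure_over_k(P_THERMAL_CLOUD, b0, density_ratio=0.01)
    print(f"B0 = {b0} microgauss: P_B/k = {p_b:.0f}, outer P_th/k = {p_out:.0f} K cm^-3")
```

With the same cloud thermal pressure, the stronger field leaves a higher thermal pressure in the low-density part of the interface, which is the behavior the text attributes to the B_0 = 5 µG models.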
In evaporating clouds in the ISM, if the temperature gradient is large enough and the thermal pressure is low enough, the conduction becomes “saturated”, which means that the heat flux expected from the gas temperature and temperature gradient exceeds the flux that can be carried by the electrons (Cowie & McKee 1977). Saturation leads to a steepening of the temperature gradient and a relative slowing of the mass loss rate. In the two cases shown in Figure 2, the lower magnetic field case (B_0 = 2 µG) is moderately saturated (in terms of Cowie & McKee’s parameter, σ_0 = 3) while the other case (B_0 = 5 µG), because of the higher magnetic pressure, is less saturated.

Fig. 2. Profiles of hydrodynamic variables in two different cloud boundary calculations. The solid lines are for model 6, which has n_H = 0.273 cm^-3, log T_h = 6.0, B_0 = 2 µG and N_HI = 4.5 × 10^17 cm^-2, while the dashed lines are for model 8, which differs from model 6 only in the strength of the magnetic field, B_0 = 5 µG. Note that in the upper right panel, the plot of thermal pressure, the radial scale is much smaller in order to show the variation in thermal pressure. The higher magnetic pressure inside the cloud for model 8 leads to the higher external thermal pressure for that case. The temperature profile differs in the two cases because the degree of heat flux saturation is reduced for the higher thermal pressure of model 8, which in turn leads to a shallower temperature gradient.

3. Photoionization Models

The photoionization models of the LIC are developed following the same underlying procedure as in SF02, but selecting only the LIC absorption component towards ε CMa for comparison. Improvements include using updated values for in situ ISM observations and using a recent version of the Cloudy radiative transfer/thermal equilibrium code (version 06.02.09a, Ferland et al. 1998).
We run Cloudy with the assumption of a plane-parallel cloud geometry and format our calculated photoionizing spectrum to be used as input. The selected options include utilizing recent calculations for the dielectronic recombination rates for Mg^+ → Mg^0 (e.g. Altun et al. 2006, and the commands “set dielectronic recombination Badnell” and “set radiative recombination Badnell”), the assumption of constant pressure, and the inclusion of interstellar dust grains at 50% abundance compared to a standard ISM value. As we discuss below, the only role that dust plays is net heating of the gas since there is far too little column for extinction to be important (expected E(B − V) ∼ 10^-4.2). The fraction of the heating provided by dust is ∼ 4% of the total heating, while dust provides ∼ 2% of the cooling (mainly by capture of electrons onto grain surfaces). A cosmic ray ionization rate at the default background level of 2.5 × 10^-17 s^-1 (for H ionization) is included. The LIC is assumed to be in ionization and thermal equilibrium, and Cloudy calculates the detailed transfer of radiation, including absorption and scattering of the radiation incident on the cloud surface, as well as the diffuse continuum and emission lines generated within the cloud.

Table 2. Model Input Parameter Values

Model No.   n_H (cm^-3)   log T_h (K)   B_0 (µG)   N_HI (10^17 cm^-2)
1           0.273         5.9           2.0        3.0
2           0.273         5.9           2.0        4.5
3           0.273         5.9           5.0        3.0
4           0.273         5.9           5.0        4.5
5           0.273         6.0           2.0        3.0
6           0.273         6.0           2.0        4.5
7           0.273         6.0           5.0        3.0
8           0.273         6.0           5.0        4.5
9           0.273         6.1           2.0        3.0
10          0.273         6.1           2.0        4.5
11          0.273         6.1           5.0        3.0
12          0.273         6.1           5.0        4.5
13          0.218         5.9           2.0        3.0
14          0.218         5.9           2.0        4.5
15          0.218         5.9           5.0        3.0
16          0.218         5.9           5.0        4.5
17          0.218         6.0           2.0        3.0
18          0.218         6.0           2.0        4.5
19          0.218         6.0           5.0        3.0
20          0.218         6.0           5.0        4.5
21          0.218         6.1           2.0        3.0
22          0.218         6.1           2.0        4.5
23          0.218         6.1           5.0        3.0
24          0.218         6.1           5.0        4.5
25          0.226         5.9           4.7        3.0
26          0.213         5.9           2.5        4.0
27          0.226         6.0           3.8        3.0
28          0.216         6.0           2.1        4.0
29          0.232         6.1           3.4        3.0
30          0.223         6.1           0.05       4.0
42          0.218         6.1           –          4.5

The procedure for creating a model begins with generating the incident radiation field at the cloud surface (§2.3). The radiative transfer model is then run, and the output predictions of the model are compared to observations of interstellar absorption lines in the LIC towards ε CMa (N(C II*), N(N I), N(O I), N(Mg II), N(S II), N(Si II), N(Fe II)) and in situ observations of n(He^0) by spacecraft inside of the solar system. The abundances of C, N, O, Mg, Si, S and Fe are then adjusted to be consistent with the observed column densities. With these new abundances, the Cloudy run is repeated and this process is continued until no adjustment of the abundances is needed. Because the abundances do have an impact on the emission from the cloud boundary, the cloud evaporation model is then re-run with the new abundances as well to re-generate the input ionizing spectrum. The iterative process of generating the spectrum and doing the Cloudy photoionization runs generally requires only a couple of runs of the cloud evaporation program and a few runs of Cloudy.

4. Model Results

To investigate the dependence of the results on the input parameters we calculate a grid of 24 models. We explore total H density, n_H = 0.273, 0.218 cm^-3; Local Bubble hot gas temperature, log T_h = 5.9, 6.0, 6.1; magnetic field strength, B_0 = 2, 5 µG; and H I column density, N_HI = 3 × 10^17, 4.5 × 10^17 cm^-2. The values for T and n(He I) at the solar location, the endpoint of the calculations, are shown in Fig. 3 for this set of models.

Fig. 3. Model results for the He^0 density and temperature in the ISM just outside the heliosphere. The squares, circles, and triangles are for models that are part of the initial grid of 24 models, while the stars are for models 25–30, for which the magnetic field, B, and the total H density, n_H, were varied to match the observed n(He^0) and T. For the grid models, the empty symbols are for models with N(H I) = 3 × 10^17 cm^-2, while the filled symbol models have N(H I) = 4.5 × 10^17 cm^-2. As the legend shows, the color (black vs. gray) indicates the magnetic field strength and the symbol shape indicates the temperature assumed for the hot gas of the Local Bubble. For identical kinds of points, the one to the left is for a model with n(H^0) = 0.218 cm^-3 and the one to the right has n(H^0) = 0.273 cm^-3. The ellipse is the error range around the observed values for n(He^0) and T.

We then explore another six models in which we do the calculations over a grid in log T_h and N_HI but vary the values of n_H and B_0 in order to match the observed T and n(He I). We employed a multiple linear regression to assist in narrowing down the search region for the values of n_H and B_0 needed to match the observations. Since the dependencies of the results on the parameters really are quite non-linear, this procedure could not work to predict exactly correct values for the required parameters, but was useful for getting close to the correct values.
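The initial 24-model grid is the Cartesian product of the four parameter lists given above. A minimal sketch of how such a grid could be enumerated follows; the variable and function names are hypothetical, and the numbering convention (column density varying fastest, total density slowest) is inferred from the order of rows 1–24 in Table 2 rather than stated by the authors.

```python
from itertools import product

N_H = [0.273, 0.218]        # total H density, cm^-3
LOG_T_H = [5.9, 6.0, 6.1]   # log of Local Bubble hot gas temperature (K)
B_0 = [2.0, 5.0]            # magnetic field strength, microgauss
N_HI = [3.0, 4.5]           # H I column density, 10^17 cm^-2


def build_grid():
    """Return {model_number: (n_H, log_T_h, B_0, N_HI)} for models 1-24.

    itertools.product varies its last argument fastest, which reproduces
    the row order of Table 2 when the lists are passed in this order.
    """
    combos = product(N_H, LOG_T_H, B_0, N_HI)
    return {number: params for number, params in enumerate(combos, start=1)}


grid = build_grid()
print(len(grid), "models")
print("model 6:", grid[6])
print("model 8:", grid[8])
```

Models 6 and 8, the two cases plotted in Fig. 2, differ only in the third tuple entry (the magnetic field strength) under this numbering.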
Based on the results for the initial grid of models, we use log T_h = 5.9, 6.0, 6.1 and N_HI = 3 × 10^17, 4 × 10^17 cm^-2 for this smaller grid, models 25–30. We chose to use 4 × 10^17 cm^-2 rather than 4.5 × 10^17 cm^-2 because the higher column density models, for the most part, produced temperatures that were too high. As can be seen from Fig. 3, all of these models (25–30), plotted as stars, are consistent with the observed T and n(He I), indicated by the ellipse in the figure. Table 2 gives the input parameters for each model. Model predictions for the ε CMa sightline integrated through the LIC are presented in Table 3. Model predictions for the CHISM (i.e. at the solar location) are shown in Table 4.

Table 3. Model Column Density Results

Model^a  log N(H_tot)  log N(Ar I)  log N(Ar II)  log N(Si III)  N(Mg II)/N(Mg I)  N(C II)/N(C II*)  N(H I)/N(He I)
Obs.^b   –             –            –             12.40          443 (+197, −110)  93–430^c          12–16
1        17.58         11.48        11.76         8.895          706.3             213.0             11.36
2        17.78         11.67        11.95         9.641          325.3             234.1             11.40
3        17.65         11.33        11.84         9.844          246.1             182.1             11.44
4        17.82         11.55        12.00         10.12          136.5             198.0             11.80
5        17.58         11.49        11.75         8.961          717.8             230.0             11.62
6        17.78         11.67        11.94         9.711          286.7             244.9             11.62
7        17.64         11.34        11.81         9.909          207.4             194.8             12.47
8        17.82         11.56        11.97         10.17          119.7             208.2             12.66
9        17.59         11.46        11.75         9.289          553.0             239.5             12.27
10       17.78         11.66        11.93         9.861          210.3             248.1             12.14
11       17.64         11.33        11.78         9.993          171.2             201.7             13.44
12       17.82         11.54        11.95         10.23          104.3             210.6             13.52
13       17.60         11.44        11.78         9.100          768.9             255.1             11.52
14       17.79         11.64        11.96         9.738          336.4             274.4             11.47
15       17.67         11.30        11.85         9.944          251.2             215.9             11.65
16       17.84         11.51        12.01         10.21          140.6             232.1             12.01
17       17.58         11.47        11.76         8.926          840.1             262.5             11.65
18       17.79         11.65        11.95         9.792          308.9             286.3             11.64
19       17.66         11.30        11.82         10.02          208.5             229.2             12.76
20       17.84         11.52        11.98         10.26          123.5             242.1             12.95
21       17.59         11.43        11.76         9.318          633.1             280.8             12.41
22       17.79         11.63        11.94         9.926          224.9             288.4             12.25
23       17.66         11.29        11.79         10.10          175.0             235.5             13.81
24       17.84         11.50        11.96         10.31          110.2             245.5             13.85
25       17.66         11.32        11.84         9.887          271.3             213.6             11.71
26       17.74         11.56        11.91         9.676          383.9             270.1             11.70
27       17.63         11.36        11.79         9.751          331.1             242.0             12.51
28       17.73         11.59        11.90         9.625          411.6             284.5             11.79
29       17.62         11.38        11.77         9.720          329.3             251.6             13.01
30       17.72         11.60        11.88         9.642          386.6             295.1             12.08
42       17.77         11.70        11.92         9.555          482.9             317.9             11.51

a The best models, consistent with all observational data (see §4), are indicated by bold face.
b Observational results from Gry & Jenkins (2001) (see table 1). The values listed for N(H I)/N(He I) are the range of values observed excluding Feige 24, which is one of the most distant stars observed by Dupuis et al. (1995) and has unusually large N(H I) and ratio values.
c The upper limit on N(C II) is not well determined observationally because of the saturation of the line. Gry & Jenkins (2001) define it by assuming a solar C/S abundance ratio and using the observed S II column density. We find (see Slavin & Frisch 2006) that the abundance of C required to match the N(C II*) observations is supersolar, with C/S ∼ 36–44, which results in a much higher upper limit on N(C II).

In SF02 we tested models with no emission from the cloud boundary. This amounts to assuming that the boundary is a sharp transition from the hot gas of the Local Bubble to the warm gas of the LIC. We have again explored such models using the LIC data as constraints in the same way as for the models discussed above with a conductive interface. When there is no evaporative boundary, our models do not depend on the magnetic field in the cloud, since in that case the ionizing flux consists only of diffuse emission from the hot gas of the Local Bubble and EUV emission from nearby hot stars, and the spectrum is not related to the properties of the cloud. For these models our model grid consists of total H density, n_H = 0.273, 0.218 cm^-3, Local Bubble hot gas temperature, log T_h = 5.9, 6.0, 6.1, and H I column density, N_HI = 3 × 10^17, 4.5 × 10^17 cm^-2, for a total of twelve models that we label as models 31–42. Of these models, however, only two resulted in ionizing fluxes sufficient to heat the cloud to a temperature ∼ 6000 K at the same time as matching the constraints on the ion column densities. These models were the ones with log T_h = 6.1 and N_HI = 4.5 × 10^17 cm^-2 (models 36 and 42). For the other cases, either at the surface or deeper into the cloud, there are insufficient photons to provide the heating to balance the cooling and the cloud temperature drops sharply to < 1000 K. The model with n_H = 0.218 cm^-3 (model 42) is consistent with the observed value of n(He^0) and with the column density data as well.

From these 42 models we have selected those that provide acceptable results for the observational constraints, according to prioritized requirements. The first requirement is that the model predict the density and temperature of He I observed inside of the solar system. Models 14 (marginally), 15, 25–30 and 42 predict a He density and temperature consistent with the observed values within the reported errors. They also predict the PUI Ne densities, for an assumed Ne/H = 123 ppm. Models 26 and 28 successfully match the PUI data for Ne/H as low as ∼ 100 ppm.
These models are also required to match the observed Mg II/Mg I ratio in the LIC towards ε CMa. Models 14, 27, and 29 marginally fit this criterion, while 26, 28, 30, and 42 successfully fit this criterion. Note that the new models now provide acceptable predictions for the CHISM temperature, which was not the case for the best models in SF02. These models are also consistent with the observed C II/C II* ratios in the LIC component towards ε CMa, though this constraint is weak because of the large uncertainties in N(C II). Based on these comparisons, and given the uncertainties in both data and models, Models 14, 26–30, and 42 are plausible models, but Models 26 and 28 appear to best match the observational constraints.

Based on the constraints and assumptions presented in the previous section, we select models 14, 26–30 and 42 as the best models for the LIC ionization, with models 26 and 28 favored by the PUI Ne data provided that Ne/H > 100 ppm. We believe that these seven models then give a realistic range for the uncertainties in the boundary conditions of the heliosphere, provided that the underlying assumptions implicit in the Cloudy code, e.g. photoionization equilibrium, are correct. The predictions of the best models provide excellent matches to the observational constraints. Based on these models, we find the boundary conditions of the heliosphere to be describable as n(H^0) = 0.19–0.20 cm^-3, n_e = 0.05–0.08 cm^-3, X(H) ≡ H^+/(H^0 + H^+) = 0.19–0.26, and X(He) ≡ He^+/(He^0 + He^+) = 0.36–0.43. For these models, we find abundances of O/H = 295–437 ppm, C/H = 589–813 ppm, and N/H = 40.7–64.6 ppm (Table 7). The total LIC density is n(H)_0 = 0.213–0.232 cm^-3, while the strength of the interstellar magnetic field in the cloud varies between 0 and 3.8 µG.

Table 4. Model Results for Solar Location (densities in cm^-3, T in K)

Model^a  X(H)   X(He)  n(H^0)  n(He^0)  n(N^0)        n(O^0)        n(Ne^0)       n(Ar^0)       n_e     n_p     T
Obs.^b   –      –      –       0.015    5.5 × 10^-6   4.8 × 10^-5   5.8 × 10^-6   1.6 × 10^-7   –       –       6300
1        0.176  0.299  0.318   0.0269   1.84 × 10^-5  1.48 × 10^-4  1.39 × 10^-5  3.47 × 10^-7  0.0800  0.0678  4310
2        0.196  0.352  0.251   0.0201   9.63 × 10^-6  7.71 × 10^-5  8.51 × 10^-6  2.69 × 10^-7  0.0724  0.0611  6450
3        0.286  0.404  0.230   0.0191   1.34 × 10^-5  1.07 × 10^-4  7.81 × 10^-6  1.82 × 10^-7  0.106   0.0922  6280
4        0.259  0.428  0.229   0.0175   8.86 × 10^-6  6.95 × 10^-5  6.45 × 10^-6  1.92 × 10^-7  0.0937  0.0799  7560
5        0.174  0.310  0.300   0.0249   1.72 × 10^-5  1.39 × 10^-4  1.13 × 10^-5  3.30 × 10^-7  0.0750  0.0631  4610
6        0.197  0.363  0.242   0.0190   9.13 × 10^-6  7.44 × 10^-5  7.08 × 10^-6  2.61 × 10^-7  0.0707  0.0593  6770
7        0.278  0.441  0.225   0.0173   1.33 × 10^-5  1.04 × 10^-4  5.75 × 10^-6  1.78 × 10^-7  0.101   0.0866  6670
8        0.257  0.459  0.223   0.0161   8.65 × 10^-6  6.79 × 10^-5  4.92 × 10^-6  1.89 × 10^-7  0.0918  0.0773  7860
9        0.193  0.357  0.265   0.0209   1.53 × 10^-5  1.23 × 10^-4  7.52 × 10^-6  2.71 × 10^-7  0.0761  0.0634  5410
10       0.203  0.392  0.235   0.0177   9.06 × 10^-6  7.24 × 10^-5  5.52 × 10^-6  2.43 × 10^-7  0.0723  0.0600  7230
11       0.278  0.474  0.220   0.0158   1.29 × 10^-5  1.02 × 10^-4  4.42 × 10^-6  1.69 × 10^-7  0.0999  0.0847  7020
12       0.262  0.490  0.219   0.0148   8.41 × 10^-6  6.66 × 10^-5  3.83 × 10^-6  1.78 × 10^-7  0.0928  0.0776  8160
13       0.201  0.333  0.230   0.0191   1.34 × 10^-5  1.08 × 10^-4  8.94 × 10^-6  2.32 × 10^-7  0.0681  0.0580  4720
14       0.215  0.376  0.193   0.0152   7.29 × 10^-6  5.95 × 10^-5  5.99 × 10^-6  1.94 × 10^-7  0.0625  0.0528  6650
15       0.310  0.436  0.176   0.0143   1.02 × 10^-5  8.18 × 10^-5  5.35 × 10^-6  1.28 × 10^-7  0.0905  0.0789  6470
16       0.284  0.461  0.175   0.0131   6.76 × 10^-6  5.36 × 10^-5  4.40 × 10^-6  1.35 × 10^-7  0.0812  0.0695  7750
17       0.178  0.315  0.255   0.0211   1.47 × 10^-5  1.19 × 10^-4  9.51 × 10^-6  2.75 × 10^-7  0.0656  0.0553  4360
18       0.214  0.383  0.188   0.0146   7.16 × 10^-6  5.81 × 10^-5  5.07 × 10^-6  1.91 × 10^-7  0.0610  0.0513  6900
19       0.302  0.472  0.173   0.0129   1.01 × 10^-5  8.03 × 10^-5  3.93 × 10^-6  1.26 × 10^-7  0.0872  0.0748  6860
20       0.282  0.491  0.172   0.0120   6.61 × 10^-6  5.26 × 10^-5  3.37 × 10^-6  1.33 × 10^-7  0.0802  0.0677  8050
21       0.204  0.374  0.211   0.0164   1.20 × 10^-5  9.85 × 10^-5  5.54 × 10^-6  2.04 × 10^-7  0.0648  0.0541  5390
22       0.221  0.414  0.184   0.0136   6.96 × 10^-6  5.69 × 10^-5  3.90 × 10^-6  1.78 × 10^-7  0.0628  0.0523  7350
23       0.305  0.506  0.168   0.0117   9.91 × 10^-6  7.82 × 10^-5  2.97 × 10^-6  1.18 × 10^-7  0.0868  0.0737  7220
24       0.286  0.520  0.169   0.0111   6.57 × 10^-6  5.16 × 10^-5  2.60 × 10^-6  1.25 × 10^-7  0.0808  0.0676  8310
25       0.297  0.427  0.187   0.0151   1.09 × 10^-5  8.67 × 10^-5  5.75 × 10^-6  1.41 × 10^-7  0.0907  0.0789  6360
26       0.224  0.385  0.192   0.0151   8.32 × 10^-6  6.66 × 10^-5  5.96 × 10^-6  1.83 × 10^-7  0.0653  0.0554  6320
27       0.258  0.426  0.195   0.0149   1.13 × 10^-5  8.97 × 10^-5  5.03 × 10^-6  1.63 × 10^-7  0.0796  0.0677  6240
28       0.212  0.376  0.194   0.0152   8.24 × 10^-6  6.73 × 10^-5  5.49 × 10^-6  1.95 × 10^-7  0.0622  0.0523  6350
29       0.242  0.429  0.202   0.0150   1.17 × 10^-5  9.30 × 10^-5  4.53 × 10^-6  1.74 × 10^-7  0.0769  0.0646  6300
30       0.203  0.379  0.197   0.0151   8.51 × 10^-6  6.81 × 10^-5  4.78 × 10^-6  2.01 × 10^-7  0.0605  0.0502  6500
42       0.189  0.355  0.186   0.0145   6.91 × 10^-6  5.71 × 10^-5  4.60 × 10^-6  2.04 × 10^-7  0.0522  0.0433  6590

a The best models (see §4) are indicated by bold face.
b Observational results from Gloeckler & Geiss (2004), Gloeckler (2005, private communication) and Witte et al. (1996) (see table 1 for uncertainties).
The Ne PUI data further favor densities of n(H^0) ≈ 0.19 cm^-3 and n_e = 0.06–0.07 cm^-3.

In this analysis we have assumed negligible filtration for He in the heliosheath regions. Modeling of the He filtration factor, however, allows values as small as f_He = 0.92, which yields n(He I) = 0.0164 cm^-3 for the CHISM (Table 1). The predictions of Model 21 agree with this value for n(He I), as well as with the ratios Mg II/Mg I and C II/C II* towards ε CMa and the pickup ion Ne and Ar data (Table 3). The predicted cloud temperature is low by ∼ 1000 K. The density and ionization for Model 21 are n(H^0) = 0.21 cm^-3 and n_e = 0.06 cm^-3. We therefore conclude that our best models listed above are robust in the sense that they predict consistent values for n(H^0) to within 5%, and electron densities to within 25%.

The radiation field incident on the LIC for Model 26 is shown in Fig. 1 and the spectral characteristics of the field for each model are listed in Table 5. The wavelength regions λ ≤ 912 Å and λ ∼ 1500 Å are of primary importance for the photoionization of the cloud, the former because it determines H^0 and He^0 ionization, and the latter because it determines Mg^0 ionization. The ionization parameter is defined as U ≡ Φ/(n(H) c), where Φ is the H ionizing photon flux, n(H) is the total (neutral + ionized) hydrogen density at the cloud surface and c is the speed of light. The total ionizing photon fluxes at the cloud surface (photons cm^-2 s^-1) for the three bands 13.6–24.6, 24.6–54.4, and 54.4–100 eV are given by Φ_H, Φ_He0, and Φ_He+, respectively. The ratio of the total number of H^0 and He^0 ionizing photons in the incident radiation field is given by Q(He^0)/Q(H^0). The quantity ⟨E⟩ (eV) is the mean energy of an ionizing photon, equal to the integrated energy flux from 13.6 to 100 eV divided by the integrated photon flux over the same energy range.

In Figure 3 we show n(He^0) and temperature of the CHISM for our model calculations. In Figure 4 we show N II/N I vs.
Mg II/Mg I, illustrating an anti-correlation of the ratios caused by the fact that Mg II/Mg I decreases with electron density while N II/N I indicates the ionization level in the cloud. The ionization of the CHISM is listed in Table 6 for Model 26, where commonly observed elements are listed. The abundances of He, Ne, Na, Al, P, Ar, and Ca were assumed, based on solar abundances, and were not adjusted in the modeling process. The abundances of C, N, O, Mg, Si, S and Fe were adjusted for each model to match observed column densities towards ε CMa (see §2). The elemental abundances that have been assumed, with the exception of that for He, are not expected to have any significant impact on the model results.

Table 5. Characteristics of the Model Radiation Field

Model^a  U            Φ_H (photons cm^-2 s^-1)  Φ_He0 (photons cm^-2 s^-1)  Φ_He+ (photons cm^-2 s^-1)  Q(He^0)/Q(H^0)  ⟨E⟩ (eV)
1        2.0 × 10^-6  4.6 × 10^3                7.5 × 10^3                  2.8 × 10^3                  0.46            74.6
2        2.0 × 10^-6  5.3 × 10^3                7.3 × 10^3                  2.7 × 10^3                  0.44            72.4
3        3.1 × 10^-6  9.0 × 10^3                1.3 × 10^4                  2.7 × 10^3                  0.49            57.7
4        3.0 × 10^-6  8.6 × 10^3                1.2 × 10^4                  2.6 × 10^3                  0.49            58.6
5        2.0 × 10^-6  4.0 × 10^3                7.0 × 10^3                  3.8 × 10^3                  0.43            79.0
6        2.1 × 10^-6  4.9 × 10^3                6.9 × 10^3                  3.8 × 10^3                  0.41            76.2
7        3.2 × 10^-6  7.4 × 10^3                1.4 × 10^4                  3.8 × 10^3                  0.53            61.3
8        3.1 × 10^-6  7.4 × 10^3                1.3 × 10^4                  3.7 × 10^3                  0.52            61.5
9        2.2 × 10^-6  3.8 × 10^3                7.4 × 10^3                  5.6 × 10^3                  0.40            80.0
10       2.3 × 10^-6  4.8 × 10^3                7.2 × 10^3                  5.6 × 10^3                  0.38            77.3
11       3.5 × 10^-6  6.6 × 10^3                1.5 × 10^4                  5.6 × 10^3                  0.53            63.4
12       3.4 × 10^-6  6.9 × 10^3                1.5 × 10^4                  5.5 × 10^3                  0.52            63.2
13       2.3 × 10^-6  4.3 × 10^3                7.0 × 10^3                  2.8 × 10^3                  0.46            76.9
14       2.4 × 10^-6  5.1 × 10^3                6.8 × 10^3                  2.7 × 10^3                  0.43            74.3
15       3.7 × 10^-6  8.5 × 10^3                1.2 × 10^4                  2.7 × 10^3                  0.49            58.8
16       3.6 × 10^-6  8.2 × 10^3                1.2 × 10^4                  2.6 × 10^3                  0.49            59.5
17       2.3 × 10^-6  3.7 × 10^3                6.4 × 10^3                  3.8 × 10^3                  0.42            81.8
18       2.4 × 10^-6  4.7 × 10^3                6.2 × 10^3                  3.8 × 10^3                  0.39            78.4
19       3.9 × 10^-6  7.1 × 10^3                1.4 × 10^4                  3.8 × 10^3                  0.53            62.2
20       3.8 × 10^-6  7.2 × 10^3                1.3 × 10^4                  3.7 × 10^3                  0.52            62.2
21       2.6 × 10^-6  3.6 × 10^3                6.8 × 10^3                  5.6 × 10^3                  0.39            82.0
22       2.7 × 10^-6  4.6 × 10^3                6.6 × 10^3                  5.6 × 10^3                  0.36            78.9
23       4.2 × 10^-6  6.3 × 10^3                1.5 × 10^4                  5.6 × 10^3                  0.52            64.4
24       4.2 × 10^-6  6.7 × 10^3                1.4 × 10^4                  5.5 × 10^3                  0.51            64.1
25       3.5 × 10^-6  7.9 × 10^3                1.2 × 10^4                  2.7 × 10^3                  0.49            60.3
26       2.5 × 10^-6  5.0 × 10^3                7.5 × 10^3                  2.7 × 10^3                  0.45            73.2
27       3.1 × 10^-6  5.4 × 10^3                1.0 × 10^4                  3.8 × 10^3                  0.50            68.9
28       2.4 × 10^-6  4.3 × 10^3                6.4 × 10^3                  3.8 × 10^3                  0.40            79.5
29       3.0 × 10^-6  4.6 × 10^3                9.7 × 10^3                  5.6 × 10^3                  0.45            73.6
30       2.4 × 10^-6  3.8 × 10^3                5.5 × 10^3                  5.6 × 10^3                  0.33            84.6
42       2.2 × 10^-6  3.6 × 10^3                3.6 × 10^3                  5.6 × 10^3                  0.25            91.1

a The best models (see §4) are indicated by bold face.

Fig. 4. Model results for N(N II)/N(N I) versus Mg II/Mg I. The symbols have the same meaning as in Figure 3. In this case the models with higher n(H^0) lie below and slightly to the left of those with lower density. N(N II)/N(N I) is an indicator of cloud ionization fraction while Mg II/Mg I goes as 1/n_e. The dotted line is the observed value for Mg II/Mg I and the dashed lines indicate the 1σ error range for the value. We see that models that match the observed ratio all correspond to relatively low ionization, X(H) ∼ 0.20–0.27 in the CHISM.

5. Discussion

5.1. Heliosphere Boundary Conditions

As discussed above, the best models of those we calculated are determined by the match to the CHISM He^0 density and temperature found by the in situ Ulysses measurements (§2.2), combined with matching the LIC component column density ratios Mg II/Mg I and C II/C II*. These models span a fairly large range in the model parameters: n_H = 0.213–0.232 cm^-3, log T_h (K) = 5.9–6.1, B_0 = 0.05–3.8 µG, and N(H I) = (3.0–4.5) × 10^17 cm^-2. Despite this, the predicted values for neutral H density and electron density in the CHISM lie within a remarkably small range: n(H^0) = 0.19–0.20 cm^-3, n_e = 0.05–0.08 cm^-3, for models that include the conductive boundary.
For the one model without evaporation that is consistent with the data we find n(H^0) = 0.186 cm^-3, n_e = 0.052 cm^-3. Including the PUI Ne data as a constraint narrows the density results to n(H^0) ≈ 0.19 cm^-3 and n_e = 0.06–0.07 cm^-3. For these densities to stray out of this range would appear to require significant errors in the underlying comparison data, e.g. n(He^0) in the CHISM, or substantial non-equilibrium ionization effects in the LIC. The variation in the interstellar radiation field between the different models that match the data gives us some degree of confidence that these densities are not highly sensitive to the details of the radiation field.

The range n_e = 0.05–0.08 cm^-3 corresponds to an electron plasma frequency of 2.0–2.5 kHz, which is the frequency of the mysterious weak radio emission detected beyond the termination shock in the outer heliosphere (Gurnett & Kurth 2005; Mitchell et al. 2004).

Table 6. Model 26 Results for Ionization Fractions^a

Element  PPM       I         II      III       IV
H        10^6      0.776     0.224   –         –
He       10^5      0.611     0.385   4.36(-3)  –
C        661       2.68(-4)  0.975   0.0244    0.000
N        46.8      0.720     0.280   8.52(-5)  0.000
O        331       0.814     0.186   4.71(-5)  0.000
Ne       123       0.196     0.652   0.152     2.79(-6)
Na       2.04      1.47(-3)  0.843   0.155     6.34(-6)
Mg       6.61      1.98(-3)  0.850   0.148     0.000
Al       0.0794    5.37(-5)  0.976   0.0118    0.0123
Si       8.13      4.21(-5)  0.999   8.02(-4)  3.10(-5)
P        0.219     1.35(-4)  0.977   0.0232    9.29(-5)
S        15.8      6.47(-5)  0.971   0.0288    1.95(-6)
Ar       2.82      0.263     0.500   0.238     2.83(-6)
Ca       4.07(-4)  9.21(-6)  0.0155  0.984     1.87(-4)
Fe       2.51      7.01(-5)  0.975   0.0245    5.75(-6)

a Numbers less than 10^-3 are written as x(y) where y is the exponent and x is the mantissa (or significand).

5.2. Hydrogen Filtration Factor

Tracers of H^0 inside of the termination shock, after filtration, include the H I Lyα backscattered radiation, H pickup ions, and the slowdown of the solar wind at distances beyond 5 AU from mass-loading by H PUIs.
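The plasma-frequency range quoted in §5.1 for n_e = 0.05–0.08 cm^-3 can be checked with the standard cold-plasma expression ω_p = (4π n_e e^2 / m_e)^(1/2) in Gaussian units. The sketch below is only a consistency check; the function name is hypothetical and the physical constants are standard CGS values, not numbers taken from the paper.

```python
import math

E_CHARGE = 4.80320e-10   # electron charge, esu (statcoulomb)
M_ELECTRON = 9.10938e-28  # electron mass, g


def plasma_frequency_khz(n_e_cm3):
    """Electron plasma frequency in kHz for an electron density in cm^-3."""
    omega_p = math.sqrt(4.0 * math.pi * n_e_cm3 * E_CHARGE**2 / M_ELECTRON)  # rad/s
    return omega_p / (2.0 * math.pi) / 1.0e3


for n_e in (0.05, 0.08):
    print(f"n_e = {n_e} cm^-3 -> f_p = {plasma_frequency_khz(n_e):.1f} kHz")
```

Because the frequency scales as the square root of n_e, the factor of 1.6 in density across the best models maps onto only about a 25% spread in frequency.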
The range of n(H^0) found above to best fit the combined heliospheric n(He^0) and LIC data towards ε CMa, n(H^0) = 0.19–0.20 cm^-3, represents the density of neutral interstellar H atoms outside of the heliosphere, and removed from heliospheric influences. The hydrogen filtration factor, f_H, can be obtained from comparisons between these models and interstellar H^0 densities at the termination shock as inferred from in situ observations of interstellar H inside of the heliosphere.

The accompanying papers in this special section provide estimates of the interstellar H^0 density at the termination shock. The solar wind slows down due to mass-loading by interstellar H, yielding n(H^0) = 0.09 ± 0.01 cm^-3 at the termination shock (Richardson et al. 2007). The density of H pickup ions observed by Ulysses is inferred at the termination shock, yielding n(H^0) = 0.11 ± 0.01 cm^-3; models of H atoms traversing the heliosheath regions then yield for the CHISM n(H^0) = 0.20 ± 0.02 cm^-3 and n_p = 0.04 ± 0.02 cm^-3, or n_e ∼ 0.05 cm^-3 (Bzowski et al. 2007). The radial variation in the response of the interplanetary Lyα 1215 Å backscattered radiation to the solar rotational modulation of the Lyα “beam” that excites the fluorescence yields n(H^0) ∼ 0.085–0.095 cm^-3, depending on the heliosphere model (Pryor et al. 2007). From these n(H^0) values at the termination shock, we estimate that 43%–58% of the H atoms successfully traverse the heliosheath region, or f_H ∼ 0.43–0.58. Müller et al. (2007) evaluate filtration using five different plasma-neutral models, and find a range of f_H = 0.52–0.74. A hydrogen filtration of f_H = 0.55 ± 0.03 is consistent with both in situ data and radiative transfer models.

Table 7. Elemental Gas Phase Abundances (ppm)

Model No.  C    N     O    Mg    Si    S     Fe
14         589  40.7  295  5.89  7.24  14.1  2.24
21         955  60.3  447  9.77  11.5  22.9  3.55
25         631  66.1  437  7.76  10.0  19.5  3.09
26         661  46.8  331  6.61  8.13  15.8  2.51
27         759  64.6  437  8.71  10.7  20.9  3.31
28         708  45.7  331  7.08  8.32  16.6  2.57
29         813  64.6  437  9.33  11.0  21.9  3.39
30         741  46.8  331  7.41  8.51  17.0  2.63
42         724  39.8  295  6.76  7.76  15.1  2.34

5.3. Gas-Phase Abundances

The LIC photoionization models are forced to match the observed set of column densities (Tables 1 and 3). The gas-phase abundances of most elements are treated as free parameters that can be varied in order to match observed column densities, so that the successful models yield elemental abundances for the LIC that are automatically corrected for unobserved H^+ (Table 7). The exceptions are the He, Ne, and Ar abundances which, being unconstrained by the observations toward ε CMa, are not adjusted but are assumed to be 10^5 ppm, 123 ppm and 2.82 ppm, respectively. In the models, N(C II*) is a constraint on both the C abundance (in place of the heavily saturated C II 1335 Å line) and n_e, such that the product of the abundance and n_e is more tightly limited than either quantity individually. The requirement to match both N(C II*) and n(He^0) effectively restricts the ionization fraction of H, which in turn limits the ionization of O and N, whose ionization fractions are tied by charge transfer to the H ionization at LIC temperatures.

Early studies showed that the abundances of refractory elements in the very local ISM are enhanced compared to abundances in cold disk gas (Marschall & Hobbs 1972; Stokes 1978; Frisch 1981). Throughout warm and cold disk gas, the underabundances of refractory elements compared to solar abundances (by factors of 10^-1 to 10^-4) are taken to represent depletion onto interstellar dust grains (e.g. Savage & Sembach 1996).
This view is supported by the correlation found between elemental depletions and the temperature characteristic of condensation at solar pressure and composition (Ebel 2000), and assumes that there is a reference abundance pattern that characterizes the cloud, and remains constant over the cloud lifetime as atoms are exchanged between the gas and dust phases. Below (§5.4) we compare solar abundances with observed gas-phase abundances to predict the gas-to-dust mass ratios for the LIC, based on the assumption that LIC gas and dust have remained coupled over the cloud lifetime.

An important question is whether the LIC has solar abundances. The ^18O and ^22Ne isotopes measured in the ACR population suggest that this is so. ACRs are characterized by a rising particle flux for energies below 10–50 MeV/nucleon, and this characteristic spectral signature is seen for ^16O, ^18O, ^20Ne, and ^22Ne. Ratios of ^16O/^18O ∼ 500 and ^20Ne/^22Ne ∼ 13.7 are found for both the ACRs and solar material, indicating that the CHISM and solar material have similar compositions (Leske et al. 2000; Leske 2000). We therefore adopt solar abundances as the underlying reference abundance pattern for the LIC.

Table 8. Solar Abundances (ppm)

Element  Grevesse & Sauval (1998)  Holweger (2001)    Lodders (2003)  Grevesse et al. (2007)^a
C        334 ± 46                  391 (+110, −86)    290 ± 27        275 ± 34
N        84 ± 12                   85.3 (+25, −19)    82 ± 20         67 ± 10
O        683 ± 94                  545 (+107, −90)    579 ± 66        513 ± 63
Ne       121 ± 17                  100 (+17, −15)     91 ± 21         77 ± 12
Mg       38 ± 4                    34.5 (+5.1, −4.5)  41 ± 2          38 ± 9
Si       35 ± 4                    34.4 (+4.1, −3.7)  41 ± 2          36 ± 4
S        22 ± 5                    –                  18 ± 2          15 ± 2
Ar       2.54 ± 0.35               –                  4.24 ± 0.77     1.70 ± 0.34
Fe       32 ± 4                    28.1 (+5.8, −4.8)  35 ± 2          32 ± 4

a Protosolar abundances are obtained by increasing the photospheric abundances by 0.05 dex for elements heavier than He, as suggested by Grevesse et al. (2007).
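If a solar reference pattern is adopted, the fraction of each element locked in dust follows from the gas-phase abundance as 1 − (gas/reference). The sketch below illustrates that arithmetic only; it is not the authors' gas-to-dust calculation. The choice of Model 26 (Table 7) and of the Grevesse et al. (2007) column of Table 8 as the reference is an assumption made for the example, uncertainties are ignored, and the dictionary and function names are hypothetical.

```python
# Gas-phase abundances for Model 26 (Table 7), in ppm.
GAS_MODEL_26 = {"C": 661, "N": 46.8, "O": 331, "Mg": 6.61,
                "Si": 8.13, "S": 15.8, "Fe": 2.51}

# Reference abundances from the Grevesse et al. (2007) column of Table 8, in ppm.
SOLAR_G07 = {"C": 275, "N": 67, "O": 513, "Mg": 38,
             "Si": 36, "S": 15, "Fe": 32}


def dust_fraction(gas_ppm, reference_ppm):
    """Fraction of an element missing from the gas relative to the reference.

    A negative value means the gas-phase abundance exceeds the reference,
    i.e. the element appears overabundant rather than depleted.
    """
    return 1.0 - gas_ppm / reference_ppm


fractions = {el: dust_fraction(GAS_MODEL_26[el], SOLAR_G07[el])
             for el in GAS_MODEL_26}

for element, fraction in fractions.items():
    print(f"{element:>2}: {fraction:+.2f}")
```

Under these assumptions the refractory elements (Mg, Si, Fe) come out strongly depleted, O moderately so, and C comes out negative, which matches the supersolar C abundance noted in the footnotes to Table 3.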
Unfortunately, a prominent uncertainty exists in the solar abundances of volatile elements such as O and S, which have low condensation temperatures (Tcond of 180 K and 700 K, respectively), and of noble elements such as Ne and Ar. Solar abundances are determined from photospheric data (C, N, O, Mg, Si, S, Fe), the solar wind (Ar, Ne), solar active regions (Ne), and helioseismic data (He); abundances of non-volatile elements are also found from meteoritic data (Grevesse & Sauval 1998; Holweger 2001; Lodders 2003; Grevesse et al. 2007). Solar abundances from these studies are listed in Table 8. Our results, discussed below, indicate that if the LIC has a solar abundance composition, as indicated by the 18O and 22Ne data, then the lower abundances found by Grevesse et al. (2007) are preferred by our models.

Ne: In these models we have assumed the Ne abundance is 123 ppm (Anders & Grevesse 1989), which is based on a combination of photospheric and interstellar data. Solar system Ne abundances are difficult to measure because of FIP effects; however, values include 77 ppm (Grevesse et al. 2007, after adding 0.05 dex to account for gravitational settling of the elements) and ∼ 41 ppm for solar wind in coronal holes (Gloeckler & Geiss 2007). The predicted densities of n(Ne0) in the CHISM for models 26 and 28 (Table 4) are in agreement with the most recent PUI results for Ne (Table 1), and Ne abundances as low as ∼ 100 ppm are allowed when filtration is included.

The CHISM Ne abundances indicated by these results appear to be consistent with Ne abundances and ionization levels in the global ISM. The Ne abundance in the Orion nebula is 100 ppm (Simpson et al. 2004). Takei et al. (2002) measured X-ray absorption edges formed by Ne and O in the interstellar gas and dust towards Cyg X-2, and found abundances of Ne/H ∼ 92 ppm and O/H ∼ 579 ppm when both atomic and compound forms in the sightline were included. Juett et al.
(2006) observed the X-ray absorption edges of Ne and O towards nine X-ray binaries which sampled both neutral and ionized warm material, and found that the ionized states formed in the ionized material have the ratio Ne iii/Ne ii ∼ 0.23. This value is identical to the prediction of model 26, Ne iii/Ne ii ∼ 0.23, a fortuitous agreement that may indicate that the EUV radiation field in the CHISM is similar to the generic galactic EUV field in the solar vicinity.

Ar: Solar Ar abundance determinations range between Ar/H ∼ 1.4 − 5.0 ppm (Table 8); we have assumed Ar/H = 2.82 ppm. The predicted Ar density at the Sun is within the uncertainties of the PUI data, although the range of possible filtration factors (0.64 – 0.95, see §2.2) also allows considerable leeway.

O: In the warm ISM such as the LIC, the ionization of oxygen and hydrogen is tightly coupled over timescales of ∼100 years by charge transfer (Field & Steigman 1971), so that the assumed N(H i) combined with N(O i) measurements acts to constrain the deduced O abundance in the gas. The two best models (26, 28) correspond to O/H = 331 ppm; however, the O column density measurements are based on the saturated 1302 Å line, and have ∼ 35% uncertainty. The modeled LIC value of O/H ∼ 331 ppm indicates that ∼ 35% of the O atoms are depleted onto dust grains. An oxygen filtration factor of fO ∼ 0.75 is required by the PUI data and Model 26.

These models yield gas-phase O abundances that are consistent with observations of more distant interstellar sightlines. The ratio N(O i)/N(H i) is measured in both low and high extinction clouds. Oliveira et al. (2003) used unsaturated O i lines in the 910–1100 Å interval and found O i/H i = 317 ± 19 ppm for ∼ 30 sightlines that included both types of material. Sightlines with detected H2, N(H) > 10^20.5 cm−2 and 〈nH〉 = 0.1−3.3 cm−3 yield O/H = 319 ± 14 ppm (Meyer et al. 1997). A survey of 19 stars with an average distance of 2.6 kpc by André et al.
(2003) found O i/H i = 408 ± 14 ppm, where the long sightline and high average value N(H)/E(B − V) = 6.3 × 10^21 cm−2 mag−1 indicate a bias towards sightlines containing many clouds that individually have low extinctions. A larger sample of 56 sightlines for a range of extinctions and distances shows that sightlines with higher average mean densities, 〈nH〉, have O/H = 284 ± 12 ppm, versus O/H = 390 ± 10 ppm for stars with low values of 〈nH〉 (Cartledge et al. 2004). For comparison, solar abundance studies yield a range of ∼ 450 − 780 ppm.

C: The best Models 26 and 28 yield gas-phase abundances of C/H = 661 and 708 ppm, compared to solar abundances of ∼ 240 − 500 ppm, which is consistent with our earlier results (Slavin & Frisch 2006) indicating an overabundance of C in the LIC. We speculate that shock destruction of carbonaceous grains, perhaps combined with some local spatial decoupling between carbonaceous and silicate grains, may explain these findings. Singly-ionized carbon is an important coolant in the LIC (§5.5), so the C overabundance is required to maintain the temperature of the CHISM at the observed value. The carbon abundance obtained here indirectly depends on the Mg ii→Mg i dielectronic recombination coefficient that determines the ratio Mg ii/Mg i, since that ratio is used as a criterion for the best models. The same ionization correction that gives the C abundance also successfully predicts Ne ionization in global WPIM and the S abundances in the LIC, although this may be a fortuitous coincidence. In the adjacent sightline towards Sirius the LIC has N(C ii)/N(H i) = 1050 ppm (Hébrard et al. 1999). An ionization correction of 300% is required to make this value consistent with solar abundances, and such a large ionization correction is not consistent with the ionization levels of X(H) ∼ 20 − 26% found here. In contrast, sightlines with cold ISM show C abundances on the order of 135 ± 46 ppm (Sofia et al. 1997; Sembach et al. 2000).
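The depletion percentages quoted in this section can be checked directly from the tabulated abundances. The following is a minimal sketch, assuming the depleted fraction of an element is 1 − (gas-phase abundance)/(reference abundance), that the reference is the Grevesse et al. (2007) column of Table 8, and that the quoted percentages are simple averages over Models 26 and 28; the function and variable names are illustrative and the averaging choice is an assumption, not something stated in the text.

```python
# Gas-phase abundances (ppm) for the two best models, copied from Table 7.
GAS_PPM = {
    26: {"O": 331.0, "Mg": 6.61, "Si": 8.13, "Fe": 2.51},
    28: {"O": 331.0, "Mg": 7.08, "Si": 8.32, "Fe": 2.57},
}

# Reference abundances (ppm): central values of the Grevesse et al. (2007)
# column of Table 8. Choosing this column is an assumption of the sketch.
REF_PPM = {"O": 513.0, "Mg": 38.0, "Si": 36.0, "Fe": 32.0}


def depleted_fraction(gas_ppm, ref_ppm):
    """Fraction of an element missing from the gas, clipped at zero."""
    return max(0.0, 1.0 - gas_ppm / ref_ppm)


def mean_depletion(element):
    """Average depleted fraction over the models in GAS_PPM (assumed averaging)."""
    fractions = [
        depleted_fraction(abundances[element], REF_PPM[element])
        for abundances in GAS_PPM.values()
    ]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    for element in REF_PPM:
        print(f"{element}: {100.0 * mean_depletion(element):.0f}% in dust")
```

Under these assumptions the sketch gives depletions close to 35% for O and to the 92%, 82%, and 77% figures given for Fe, Mg, and Si. A different reference column from Table 8 would shift all of these values.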
N: The best models (26 and 28) find N/H = 46–47 ppm, compared to solar values of ∼ 57−110 ppm. These results are consistent with the PUI results, N/H ∼ 19−47 ppm, after filtration factor uncertainties are included. The N and O results favor an ISM abundance pattern for volatiles similar to the Grevesse et al. (2007) photospheric abundances.

S: The best models predict S/H = 16–17 ppm, compared to solar values of ∼ 13−27 ppm (including uncertainties, see Table 8). Sulfur is found to have little or no depletion onto dust grains in warm diffuse ISM (e.g. Welty et al. 1999).

Mg, Si, Fe: These refractory elements are observed in the LIC gas with abundances far below solar (factors of 3–15). Approximately 92%, 82%, and 77% of the Fe, Mg, and Si, respectively, are presumably depleted onto interstellar dust grains. If Grevesse et al. (2007) abundances are assumed for the LIC, then to within the uncertainties the LIC dust has the relative composition Fe:Mg:Si:O = 1:1:1:4, consistent with amorphous olivines MgFeSiO4. Fe and Si are dominantly singly ionized, while Mg has a significant fraction (∼ 15%) that is twice ionized. The gas-phase abundances of these refractory elements are highly subsolar, even after ionization corrections are made, indicating that these elements are substantially depleted onto interstellar dust grains. In contrast to C, however, the silicate dust in the LIC that carries the missing Mg, Si, and Fe has experienced far less destruction than the carbonaceous grains.

Ca ii, Na i: Weak lines of the trace ionization species Ca ii and Na i are common diagnostics of ionization and abundance for interstellar clouds, including the partially ionized LIC; Na i is also frequently used as a diagnostic of the H column density.
We note that our models show that the ratios N(Ca ii)/N(Na i), N(Na i)/N(H), and N(Na i)/N(H i) vary by 30%, 77%, and 93%, respectively, between the best models (Models 26–30, 42). As trace ionization species, the densities of Na i and Ca ii are highly sensitive to volume density, n(Na0), n(Ca+) ∝ n(H)ne. We therefore conclude that Na i and Ca ii are imprecise diagnostics of ionization levels, H density, and abundances in warm partially ionized clouds.

5.4. Gas-to-Dust Mass Ratio

Because the abundances are automatically corrected for unobserved H+, we use the model results to infer the total mass of the interstellar dust, provided that the gas and dust in the LIC form a coupled and closed system that evolves together as the cloud moves through the LSR. The LIC LSR velocity is 16–21 pc/Myr, so that a LIC origin related to the Loop I or Scorpius-Centaurus superbubble would require that the LIC gas and dust remained a closed system over timescales of 4–5 Myr (Frisch & Slavin 2006; Frisch 1981).

Gas-to-dust mass ratios calculated from the best models (26 and 28) using the missing-mass argument (see footnote 2) are in the range RG/D = 149 − 217, depending on solar abundances. The detailed information about RG/D for different assumptions and the different models is listed in Table 9. For comparison, RG/D determined from in situ observations of interstellar dust inside of the solar system, compared to the gas densities of these models, yields RG/D = 115–125 (Table 9, Landgraf et al. 2000; Altobelli et al. 2004). The in situ RG/D is an upper limit, since the smallest interstellar dust grains (radii ≤ 0.15 µm) with large charge-to-mass ratios (and thus small Larmor radii) are excluded from the heliosphere by the interstellar magnetic field which is draped over the heliosphere.
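The missing-mass argument can be illustrated with a short calculation. The sketch below is only a rough reconstruction under stated assumptions: the dust mass is taken as the sum over C, N, O, Mg, Si, S, and Fe of (reference − gas-phase) abundance times atomic mass, with negative differences set to zero; the gas mass includes H, He at 10^5 ppm, and the same seven elements; Ne and Ar are ignored. The exact bookkeeping behind Table 9 is not given in the text, so this need not reproduce the tabulated values exactly, and all names are illustrative.

```python
# Atomic masses in amu (standard values, rounded).
MASS_AMU = {
    "H": 1.008, "He": 4.0026, "C": 12.011, "N": 14.007, "O": 15.999,
    "Mg": 24.305, "Si": 28.086, "S": 32.06, "Fe": 55.845,
}

# Model 26 gas-phase abundances (ppm) from Table 7, plus He at 1e5 ppm,
# which is how this sketch reads the He abundance assumed in the models.
GAS_MODEL_26 = {
    "He": 1.0e5, "C": 661.0, "N": 46.8, "O": 331.0,
    "Mg": 6.61, "Si": 8.13, "S": 15.8, "Fe": 2.51,
}

# Central values of the Grevesse & Sauval (1998) column of Table 8 (ppm).
REF_GS98 = {
    "C": 334.0, "N": 84.0, "O": 683.0,
    "Mg": 38.0, "Si": 35.0, "S": 22.0, "Fe": 32.0,
}


def gas_to_dust_ratio(gas_ppm, ref_ppm):
    """Gas mass over 'missing' mass, both per 1e6 hydrogen nuclei."""
    gas_mass = MASS_AMU["H"] * 1.0e6 + sum(
        MASS_AMU[element] * ppm for element, ppm in gas_ppm.items()
    )
    # Elements more abundant in the gas than in the reference contribute no dust.
    dust_mass = sum(
        MASS_AMU[element] * max(0.0, ref_ppm[element] - gas_ppm[element])
        for element in ref_ppm
    )
    return gas_mass / dust_mass


if __name__ == "__main__":
    print(f"R_G/D (model 26, GS98 reference) ~ {gas_to_dust_ratio(GAS_MODEL_26, REF_GS98):.0f}")
```

With these inputs the ratio comes out near 150. The result is dominated by the O and Fe terms, which is why the choice of reference abundance set changes RG/D so strongly.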
For all the models, the RG/D determined from comparing in situ dust measurements with the CHISM gas mass flux is lower than that determined by assuming solar abundances and using the gas phase abundances we determine to find RG/D. This suggests that somehow the dust flowing into the heliosphere is concentrated relative to the gas, compared to the overall LIC sightline towards ε CMa. The lower solar abundances of Grevesse et al. (2007) result in lower required depletions, and produce stronger disagreements with RG/D determined from in situ data. We do not understand this result, which we have found previously (Frisch et al. 1999).

Footnote 2: This argument assumes that the ISM reference abundances, in this case solar abundances, represent the sum of the atoms in the gas plus the dust (Frisch et al. 1999).

Table 9. Gas-to-Dust Mass Ratios from Models and In Situ Observations

            Model(a)
Source      14    26    27    28    29    30    42
GS98        137   149   196   149   197   150   138
Lodders     158   174   238   174   239   175   160
Grevesse    194   217   321   217   323   218   196
In Situ     115   116   123   116   125   116   107

(a) The source of the comparison solar abundances is listed in column 1 (Table 8). The in situ dust flux is from Landgraf et al. (2000), corrected downwards by 20% as recommended by Altobelli et al. (2004) to account for side-wall impacts.

Table 10. Major Heat Sources in LIC Gas(a)

Source(b)      Fraction of Heating
H i            0.657
He i           0.248
dust           0.055
He ii          0.016
cosmic rays    0.010

(a) Results for model 26. Other models are qualitatively the same, though there are some quantitative variations.
(b) For lines with ion names, the source here denotes the ion that is photoionized. Dust heating comes from photoelectric ejection by photons of the background FUV radiation field. Cosmic ray heating comes from electron impact ionization of the gas and direct heating of the electrons in the LIC plasma by the cosmic ray electrons.
Since RG/D is sensitive to the mass of Fe in the dust grains (Frisch & Slavin 2003), we suggest that this difference may indicate inhomogeneous mixing of the gas and silicate dust over the ∼ 0.64 pc extent of the LIC.

5.5. Heating and Cooling Rates

The heating and cooling rates for Model 26 are listed in Tables 10 and 11. The primary heat sources are photoelectrons from the ionization of H0 and He0, with dust and cosmic ray heating contributing less than 7% of the heating. The dominant source of cooling is the [C ii] 157.6 µm fine-structure line, making up 43% of the total. This is more than twice the contribution of any other coolant. Nearly all the cooling is due to optical and infrared forbidden lines, with many lines contributing at the ∼ 1% level. H recombination, free-free emission and dust, through the capture of electrons onto grain surfaces, also contribute at about a 2% level. The importance of C ii as both a constraint on the C abundance in our models and a major coolant means that any model that aims to reduce the abundance of C to a solar level faces severe difficulties. The models with LIC temperatures in the THe0 = 6300 ± 340 K range indicated by the in situ He0 data all require supersolar abundances of C. The total heating/cooling rate for the LIC at the Sun for this model is 3.55 × 10^−26 ergs cm−3 s−1.

Table 11. Major Coolants in LIC Gas(a)

Ion/Line             Fraction of Cooling
[C ii] 157.6 µm      0.428
[S ii] 6731 Å        0.145
Fe ii (total)        0.074
[Si ii] 34.8 µm      0.065
[Ne ii] 12.8 µm      0.035
[O i] 63.2 µm        0.028
H recomb.            0.024
dust                 0.024
[Ne iii] 15.6 µm     0.020
[N ii] 6584 Å        0.018
net free-free        0.018
[O i] 6300 Å         0.017
[O ii] 3727 Å        0.011
[Ar ii] 6.98 µm      0.011

(a) Results for model 26. Other models show similar results.

5.6. Radiation Field

Recently it has been proposed that a significant portion of the SXRB can be attributed to charge-transfer (a.k.a.
charge exchange) between the solar wind ions (e.g., O+7 and O+8) and interstellar neutrals (Cravens 2000; Snowden et al. 2004; Wargelin et al. 2004; Smith et al. 2005; Koutroumpa et al. 2006). While it seems at present that some fraction of the low energy X-rays is from this mechanism, it is unclear how large that fraction is. We note that basing the properties of the local hot plasma in the galactic plane on SXRB emission at energies E > 0.3 keV is problematic. Bellm & Vaillancourt (2005) have compared the Wisconsin B and Be band data with the ROSAT R12 data, and concluded that the observed anticorrelation between R12 and N(H i) indicates that more than 34% of the SXRB generated in the Galactic disk must come from the Local Bubble. They also concluded that a heavily depleted plasma with log T ∼ 5.8 is consistent both with the X-ray spectral data of McCammon et al. (2002) and Sanders et al. (2001), and with the upper limits set on the EUV emission by CHIPS (Hurwitz et al. 2004). When the Robertson & Cravens (2003) models of SXRB production by charge-transfer with the solar wind are considered, only half of the SXRB in the plane is required to arise from a hot local plasma. We also note that the atomic physics for the calculation of the low energy part of the emission is still quite uncertain (V. Kharchenko, private communication). At this point we take the simple approach of ignoring the charge-transfer emission, though we plan to consider its possible impact in future work by reducing the assumed SXRB flux from hot gas. As noted previously, a lower SXRB flux due to a lower pressure in the hot gas does not necessarily have any impact on our calculated flux from the evaporative cloud boundary.

5.7. LIC Pressure

The strength of the interstellar magnetic field in the LIC is unknown, though modeling of its effects on the heliosphere sets some constraints. Our best models (26 and 28) presented here have a thermal pressure of ∼ 2100 cm−3 K for the LIC.
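For reference, the field strength at which the magnetic pressure B²/8π equals a given thermal pressure can be worked out in a few lines. This is a minimal sketch in Gaussian (cgs) units, assuming the pressure is quoted as P/k in cm−3 K; the function names are illustrative only.

```python
import math

K_BOLTZMANN_ERG_PER_K = 1.380649e-16  # Boltzmann constant in erg/K (cgs)


def equipartition_field_microgauss(pressure_over_k):
    """Field (in microgauss) whose magnetic pressure B^2 / (8 pi) equals P.

    pressure_over_k is the thermal pressure P/k in cm^-3 K.
    """
    pressure = pressure_over_k * K_BOLTZMANN_ERG_PER_K  # erg cm^-3
    field_gauss = math.sqrt(8.0 * math.pi * pressure)
    return field_gauss * 1.0e6


def total_pressure_over_k(pressure_over_k, n_equal_components=3):
    """Total P/k if several pressure components each equal the thermal one."""
    return n_equal_components * pressure_over_k


if __name__ == "__main__":
    p_thermal = 2100.0  # cm^-3 K, thermal pressure of the best models
    print(f"B ~ {equipartition_field_microgauss(p_thermal):.1f} microgauss")
    print(f"P_total/k ~ {total_pressure_over_k(p_thermal):.0f} cm^-3 K")
```

For P/k ≈ 2100 cm−3 K this gives a field of about 2.7 µG, and three equal pressure components (thermal, magnetic, cosmic ray) sum to about 6300 cm−3 K.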
If the thermal and magnetic pressures are equal, this indicates a magnetic field strength B ∼ 2.7 µG, in agreement with field strengths for these models. As noted in §2.3, the main effect of the field strength in the models is to regulate the pressure in the evaporative cloud boundary, which in turn affects the flux of diffuse EUV radiation incident on the cloud. The amount of EUV flux helps determine the temperature in the cloud, which is how the observational constraints fix the magnetic field strength in the context of our modeling. Thus we do not explicitly fix the magnetic field strength with the goal of achieving equipartition, and indeed some of our successful models have lower or higher field strengths. It is probably coincidental that the field strength required to match the in situ He0 temperature for our best models is also close to the equipartition field strength, but it is at least encouraging that this field strength is consistent with our photoionization models. We note that if thermal, cosmic ray, and magnetic pressures are approximately equal, the LIC has a pressure of ∼ 6300 cm−3 K.

5.8. Comparisons with other LISM Sightlines

There have been a number of efforts to understand the LISM ionization and abundances (Frisch et al. 1986; Cheng & Bruhweiler 1990; Lallement & Bertin 1992; Vallerga 1996; Lallement & Ferlet 1997; Holberg et al. 1999; Kimura et al. 2003). The studies that attempt to derive gas phase elemental abundances find a range of results, generally fairly consistent with ours. A point of particular interest is carbon, which is surprisingly overabundant in our results. As an example, Kimura et al. (2003) find (based on four sightlines and excluding the ε CMa and β CMa sightlines) a subsolar C abundance, in contradiction with our results.
Results for thirteen sightlines from Redfield & Linsky (2004) with velocity components consistent with the LIC velocity vector show that N(C ii)/N(O i) > 1 for 8 of them, especially those at lower column density, indicating a solar or supersolar C abundance. For our best models N(C ii)/N(O i) ≈ 2.

Our series of studies is unique in that we model the radiation field incident on the cloud, include radiative transfer effects, and calculate the thermal equilibrium within the cloud. The ionization varies through the cloud, as do the temperature and density (slightly), and we compare observations within the heliosphere with the physical conditions at that point in the cloud rather than basing the model on line-of-sight averages.

Our present results indicate that n(N ii)/n(N i) ∼ 0.32 − 0.50 at the solar location, with N becoming more ionized as the sightline approaches the cloud surface. The column density ratio is thus higher, ranging from 0.38 to 0.62. Observed values of N(N ii)/N(N i) toward other nearby stars are 0.58 (+0.56, −0.77) toward Capella (Wood et al. 2002), 1.29 ± 0.23 toward HZ43 (Kruk et al. 2002), 1.91 (+0.87, −0.69) towards WD1634-573 (Lehner et al. 2003), and 1.13 ± 0.24 towards η UMa (Frisch et al. in preparation). The total H i column density towards each of these stars is greater than the N(H i) ∼ 4 × 10^17 cm−2 found for the best models here. The nearest of these stars, Capella, has an ionization comparable to that of the LIC. The two high-latitude stars HZ43 and η UMa appear to sample low opacity regions where the ionization is larger than at the Sun, as does the WD1634-573 sightline that appears to cross the nearby diffuse H ii region seen towards λ Sco (York 1983). As we have noted, the ε CMa line of sight is special because that star is the dominant source of stellar EUV photons for the LIC.
Thus for sightlines at a large angle from the ε CMa sightline, if the H i column between points along the sightline and ε CMa is small, the apparently high column points are subject to a strong EUV field. Such geometry dependent ionization effects can be important for non-spherical clouds subject to a strongly spatially variable ionizing radiation source.

The variation in the fractional ionization of the CLIC gas has a direct impact on our understanding of the distribution and physical properties of low column density clouds for several reasons. (1) Abundances of elements with FIP < 13.6 eV must always be calculated with respect to N(H i) + N(H ii) for very low column density clouds. (2) Cloud geometry affects the opacity of observed sightlines so that the opacity to ionizing radiation is not directly traced by the observed value of N(H i). For lines of sight other than that towards ε CMa, this could require more complex radiative transfer models in which the difference between the line of sight toward the star and that toward one of the primary sources of ionizing flux, ε CMa, at each point is taken into account.

6. Conclusions

There are many uncertainties regarding the detailed properties of the ionizing interstellar radiation field incident on the LIC. The data we have on the LIC, both from absorption line studies and in situ measurements by spacecraft in the heliosphere, provide us with strong constraints on the ionization and composition of the LIC and particularly the CHISM. By exploring a range of models for the ISRF we find that while a fairly broad range of radiation fields can produce photoionization consistent with the data, other outputs from the models fall within a relatively narrow range of values. Our results for the models explored in this paper, in which we require our models to be consistent with the LIC component of the absorption lines observed towards ε CMa, include:

1.
For a range of assumptions regarding the H i column density of the LIC, N(H i) = (3.0−4.5) × 10^17 cm−2, and temperature of the hot gas of the Local Bubble, log Th = 5.9, 6.0 and 6.1, we are able to find model parameters that allow a match of the model results with the best observed quantities, n(He0), T(He0) and N(Mg ii)/N(Mg i). For these models we assume that the cloud is evaporating because of thermal conduction between the hot Local Bubble gas and the warm LIC gas and include the emission from the cloud boundary.

2. For the best models in terms of fits to data, the required input parameters are: initial (i.e. at the outer edge of the cloud) total H density, n(H) ≈ 0.21−0.23 cm−3; and cloud magnetic field, B0 ≤ 3.8 µG.

3. If we assume that the magnetic field configuration reduces thermal conductivity at the boundary enough to prevent evaporation and ignore any radiation from the cloud boundary, we find that for most cases the radiation field does not cause sufficient heating to maintain the LIC at the temperature observed, T = 6300 ± 340 K. One set of parameter choices, though, yields a successful model. These parameters are N(H i) = 4.5 × 10^17 cm−2 and log Th = 6.1.

4. Despite the wide range of possible input parameters, the output values for quantities important for shaping the heliosphere are confined to a fairly small range: n(H0) = 0.19 − 0.20 cm−3, and ne = 0.05 − 0.08 cm−3.

5. A H filtration factor of fH = 0.55 ± 0.03 yields good agreement between the radiative transfer model predictions for n(H i) in the CHISM, and n(H i) at the termination shock as found from observations of PUIs, the H i Lyα glow, and the solar wind slow-down in the outer heliosphere. This filtration value is also consistent with heliosphere models of the ionization of interstellar H atoms traversing the heliosheath regions.

6. Elements with ionization potentials 13.6 − 25 eV, e.g. H, He, N, O, Ne, and Ar, are partially ionized with ionization fractions of ∼ 0.2 − 0.7.

7.
By requiring that the models match the column densities derived from absorption line data we are able to determine the necessary elemental abundances for several elements. We find that the abundances of N and O may be somewhat subsolar. Sulfur is roughly solar, and C is substantially supersolar. Mg, Si and Fe are all subsolar by factors of 3 − 15. The depletions of Fe, Mg, Si and O in the LIC are consistent with a dust population consisting of amorphous silicate olivines MgFeSiO4, though other compositions for the dust are possible as well. We conclude that any carbonaceous dust in the LIC must have been destroyed, while silicate dust has persisted. Except for the gas-to-dust mass ratio, these results are in better agreement with the lower solar abundances of Grevesse et al. (2007). However, we note that the O and Ne abundances of Lodders (2003) are in better agreement with other astronomical data such as the X-ray absorption edges.

8. The gas-to-dust mass ratio derived from missing mass in the gas phase for our best models depends strongly on the assumed reference abundance set and ranges from 137 to 323. Our two best models, nos. 26 and 28, give a range of 149−217. For these same models RG/D = 115−125 based on the observed flux of dust into the heliosphere. The discrepancy between these values is minimized, in fact leading to consistency within the errors, if one assumes an abundance set such as that of GS98, which has large abundances of the metals. The GS98 abundances lead to substantial O depletion, however, which is not easily explained, and they conflict with the S abundances found for models 26 and 28.

9. These models also show that the densities of the trace ionization species Ca ii and Na i are extremely sensitive to density and ionization. Therefore the ratios N(Ca ii)/N(Na i), N(Na i)/N(H i), and N(Na i)/N(H) are, by themselves, inadequate diagnostics of warm low density diffuse gas.

Acknowledgements.
We would like to thank George Gloeckler for sharing data with us prior to publication, and Alan Cummings for pointing out that the ACR isotopic data indicate that the LIC abundances are solar. We also thank the International Space Science Institute in Bern, Switzerland for hosting the working group on “Interstellar Hydrogen in the Heliosphere.” This research was supported by NASA Solar and Heliospheric Program grants NNG05GD36G and NNG06GE33G to the University of Chicago, and by the NASA grant NNG05EC85C to SWRI.

References

Adams, T. F. & Frisch, P. C. 1977, ApJ, 212, 300
Ajello, J. M. 1978, ApJ, 222, 1068
Altobelli, N., Krüger, H., Moissl, R., Landgraf, M., & Grün, E. 2004, Planet. Space Sci., 52, 1287
Altun, Z., Yumak, A., Badnell, N. R., Loch, S. D., & Pindzola, M. S. 2006, A&A, 447, 1165
Anders, E. & Grevesse, N. 1989, Geochim. Cosmochim. Acta, 53, 197
André, M. K., Oliveira, C. M., Howk, J. C., et al. 2003, ApJ, 591, 1000
Bellm, E. C. & Vaillancourt, J. E. 2005, ApJ, 622, 959
Bertaux, J. L. & Blamont, J. E. 1971, A&A, 11, 200
Bloch, J. J., Jahoda, K., Juda, M., et al. 1986, ApJ, 308, L59
Bzowski, M. 2003, A&A, 408, 1155
Bzowski, M., Gloeckler, G., Tarnopolski, S., Izmodenov, V., & Moebius, E. 2007, A&A, in press,
Cartledge, S. I. B., Lauroesch, J. T., Meyer, D. M., & Sofia, U. J. 2004, ApJ, 613,
Cheng, K. & Bruhweiler, F. C. 1990, ApJ, 364, 573
Cowie, L. L. & McKee, C. F. 1977, ApJ, 211, 135
Cravens, T. E. 2000, ApJ, 532, L153
Cummings, A. C., Stone, E. C., & Steenberg, C. D. 2002, ApJ, 578, 194
Dupuis, J., Vennes, S., Bowyer, S., Pradhan, A. K., & Thejll, P. 1995, ApJ, 455,
Ebel, D. S. 2000, J. Geophys. Res., 105, 10363
Egger, R. J. & Aschenbach, B. 1995, A&A, 294, L25
Ferland, G. J., Korista, K. T., Verner, D. A., et al. 1998, PASP, 110, 761
Field, G. B. & Steigman, G. 1971, ApJ, 166, 59
Frisch, P. & York, D. G.
1986, in The Galaxy and the Solar System (University of Arizona Press), 83–100
Frisch, P. C. 1981, Nature, 293, 377
Frisch, P. C. 2007, “Composition of Matter”, Space Sciences Series of ISSI, publ. Springer, 27, 00
Frisch, P. C., Dorschner, J. M., Geiss, J., et al. 1999, ApJ, 525, 492
Frisch, P. C., Grodnicki, L., & Welty, D. E. 2002, ApJ, 574, 834
Frisch, P. C. & Slavin, J. D. 2003, ApJ, 594, 844
Frisch, P. C. & Slavin, J. D. 2005, Advances in Space Research, 35, 2048
Frisch, P. C. & Slavin, J. D. 2006, Short Term Variations in the Galactic Environment of the Sun, in Solar Journey: The Significance of Our Galactic Environment for the Heliosphere and Earth, Ed. P. Frisch (Springer), 133–193
Frisch, P. C., York, D. G., & Fowler, J. R. 1986, in ESA Special Publication, Vol. 263, New Insights in Astrophysics. Eight Years of UV Astronomy with IUE, ed. E. J. Rolfe, 491–492
Gloeckler, G. & Fisk, L. 2007, “Composition of Matter”, Space Sciences Series of ISSI, publ. Springer, 27, 00
Gloeckler, G. & Geiss, J. 2004, Advances in Space Research, 34, 53
Gloeckler, G. & Geiss, J. 2007, A&A, this volume
Gloeckler, G. & Geiss, J. 2007, Space Science Reviews, 116
Gondhalekar, P. M., Phillips, A. P., & Wilson, R. 1980, A&A, 85, 272
Grevesse, N., Asplund, M., & Sauval, A. J. 2007, Space Science Reviews, 105
Grevesse, N. & Sauval, A. J. 1998, Space Science Reviews, 85, 161
Gry, C. & Jenkins, E. B. 2001, A&A, 367, 617
Gurnett, D. A. & Kurth, W. S. 2005, Science, 309, 2025
Hébrard, G., Mallouris, C., Ferlet, R., et al. 1999, A&A, 350, 643
Henry, R. C. 2002, ApJ, 570, 697
Holberg, J. B., Bruhweiler, F. C., Barstow, M. A., & Dobbie, P. D. 1999, ApJ, 517, 841
Holweger, H. 2001, in AIP Conf. Proc. 598: Joint SOHO/ACE workshop “Solar and Galactic Composition”, 23–+
Hurwitz, M., Sasseen, T. P., & Sirk, M. M. 2004, ArXiv Astrophysics e-prints
Izmodenov, V., Malama, Y., Gloeckler, G., & Geiss, J.
2004, A&A, 414, L29
Juett, A. M., Schulz, N. S., Chakrabarty, D., & Gorczyca, T. W. 2006, ApJ, 648,
Kimura, H., Mann, I., & Jessberger, E. K. 2003, ApJ, 582, 846
Koutroumpa, D., Lallement, R., Kharchenko, V., et al. 2006, A&A, 460, 289
Kruk, J. W., Howk, J. C., André, M., et al. 2002, ApJS, 140, 19
Lallement, R. & Bertin, P. 1992, A&A, 266, 479
Lallement, R., Bertin, P., Ferlet, R., Vidal-Madjar, A., & Bertaux, J. L. 1994, A&A, 286, 898
Lallement, R. & Ferlet, R. 1997, A&A, 324, 1105
Lallement, R., Vidal-Madjar, A., & Ferlet, R. 1986, A&A, 168, 225
Lallement, R., Welsh, B. Y., Vergely, J. L., Crifo, F., & Sfeir, D. 2003, A&A, 411, 447
Landgraf, M., Baggaley, W. J., Grün, E., Krüger, H., & Linkert, G. 2000, J. Geophys. Res., 105, 10343
Landsman, W. B., Henry, R. C., Moos, H. W., & Linsky, J. L. 1984, ApJ, 285,
Lehner, N., Jenkins, E., Gry, C., et al. 2003, ApJ, 595, 858
Leske, R. A. 2000, in AIP Conf. Proc. 516: 26th International Cosmic Ray Conference, ICRC XXVI, ed. B. L. Dingus, D. B. Kieda, & M. H. Salamon, 274–+
Leske, R. A., Mewaldt, R. A., Christian, E. R., et al. 2000, in AIP Conf. Proc. 528: Acceleration and Transport of Energetic Particles Observed in the Heliosphere, ed. R. A. Mewaldt, J. R. Jokipii, M. A. Lee, E. Möbius, & T. H. Zurbuchen, 293–+
Linsky, J. L. & Wood, B. E. 1996, ApJ, 463, 254
Lodders, K. 2003, ApJ, 591, 1220
Möbius, E., Bzowski, M., Chalov, S., et al. 2004, A&A, 426, 897
Müller, H.-R., Florinski, V., Heerikhuisen, J., et al. 2007, A&A, in press, submitted
Müller, H.-R. & Zank, G. P. 2004a, J. Geophys. Res., 7104
Müller, H.-R. & Zank, G. P. 2004b, Journal of Geophysical Research (Space Physics), 7104
Marschall, L. A. & Hobbs, L. M. 1972, ApJ, 173, 43
McCammon, D., Almy, R., Apodaca, E., et al. 2002, ApJ, 576, 188
McCammon, D., Burrows, D. N., Sanders, W. T., & Kraushaar, W. L. 1983, ApJ, 269, 107
McClintock, W., Henry, R. C., Linsky, J. L., & Moos, H. W. 1978, ApJ, 225, 465
McClintock, W., Linsky, J. L., Henry, R. C., & Moos, H. W.
1975, ApJ, 202, 733
Meyer, D. M., Cardelli, J. A., & Sofia, U. J. 1997, ApJ, 490, L103
Mitchell, J. J., Cairns, I. H., & Robinson, P. A. 2004, Journal of Geophysical Research (Space Physics), 109, 6108
Oliveira, C. M., Hébrard, G., Howk, J. C., et al. 2003, ApJ, 587, 235
Parravano, A., Hollenbach, D. J., & McKee, C. F. 2003, ApJ, 584, 797
Pryor, W., Gangopadhyay, P., Sandel, W., et al. 2007, A&A, in press
Quémerais, E., Lallement, R., Bertaux, J.-L., et al. 2006a, A&A, 455, 1135
Quémerais, E., Lallement, R., Ferron, S., et al. 2006b, Journal of Geophysical Research (Space Physics), 111, 9114
Raymond, J. C. & Smith, B. W. 1977, ApJS, 35, 419
Redfield, S. & Linsky, J. L. 2004, ApJ, 602, 776
Richardson, J. D., Liu, Y., Wang, C., & McComas, D. J. 2007, A&A, in press
Richardson, J. D., Wang, C., & Burlaga, L. F. 2004, Advances in Space Research, 34, 150
Ripken, H. W. & Fahr, H. J. 1983, A&A, 122, 181
Robertson, I. P. & Cravens, T. E. 2003, Journal of Geophysical Research (Space Physics), 108, 6
Rucinski, D., Cummings, A. C., Gloeckler, G., et al. 1996, Space Science Reviews, 78, 73
Sanders, W. T., Edgar, R. J., Kraushaar, W. L., McCammon, D., & Morgenthaler, J. P. 2001, ApJ, 554, 694
Savage, B. D. & Sembach, K. R. 1996, ApJ, 470, 893
Sembach, K. R., Howk, J. C., Ryans, R. S. I., & Keenan, F. P. 2000, ApJ, 528,
Simpson, J. P., Rubin, R. H., Colgan, S. W. J., Erickson, E. F., & Haas, M. R. 2004, ApJ, 611, 338
Slavin, J. D. 1989, ApJ, 346, 718
Slavin, J. D. & Frisch, P. C. 2002, ApJ, 565, 364
Slavin, J. D. & Frisch, P. C. 2006, ApJ, 651, L37
Smith, R. K., Edgar, R. J., Plucinsky, P. P., et al. 2005, ApJ, 623, 225
Snowden, S. L., Collier, M. R., & Kuntz, K. D. 2004, ApJ, 610, 1182
Snowden, S. L., Cox, D. P., McCammon, D., & Sanders, W. T. 1990, ApJ, 354,
Snowden, S. L., Egger, R., Finkbeiner, D. P., Freyberg, M. J., & Plucinsky, P. P. 1998, ApJ, 493, 715
Snowden, S. L., Egger, R., Freyberg, M. J., et al. 1997, ApJ, 485, 125
Snowden, S. L., McCammon, D., & Verter, F.
1993, ApJ, 409, L21 Sofia, U. J., Cardelli, J. A., Guerin, K. P., & Meyer, D. M. 1997, ApJ, 482, L105 Stokes, G. M. 1978, ApJS, 36, 115 Takei, Y., Fujimoto, R., Mitsuda, K., & Onaka, T. 2002, ApJ, 581, 307 Thomas, G. E. & Krassa, R. F. 1971, A&A, 11, 218 Vallerga, J. 1996, Space Sci. Rev., 78, 277 Vallerga, J. 1998, ApJ, 497, 921 Wargelin, B. J., Markevitch, M., Juda, M., et al. 2004, ApJ, 607, 596 Weller, C. S. & Meier, R. R. 1981, ApJ, 246, 386 Welty, D. E., Hobbs, L. M., Lauroesch, J. T., et al. 1999, ApJS, 124, 465 Witte, M. 2004, A&A, 426, 835 Witte, M., Banaszkiewicz, M., & Rosenbauer, H. 1996, Space Science Reviews, 78, 289 Wood, B. E., Linsky, J. L., & Zank, G. P. 2000a, ApJ, 537, 304 Wood, B. E., Linsky, J. L., & Zank, G. P. 2000b, ApJ, 537, 304 Wood, B. E., Redfield, S., Linsky, J. L., Müller, H.-R., & Zank, G. P. 2005, ApJS, 159, 118 Wood, B. E., Redfield, S., Linsky, J. L., & Sahu, M. S. 2002, ApJ, 581, 1168 York, D. G. 1974, ApJ, 193, L127 York, D. G. 1983, ApJ, 264, 172

List of Objects: ‘ε CMa’, ‘Sirius’

Introduction Photoionization Model Constraints and Assumptions Astronomical Constraints – The LIC towards ε CMa In situ Constraints – He^0, Pickup Ions, and Anomalous Cosmic Rays Interstellar Radiation Field at the Cloud Surface Photoionization Models Model Results Discussion Heliosphere Boundary Conditions Hydrogen Filtration Factor Gas-Phase Abundances Gas-to-Dust Mass Ratio Heating and Cooling Rates Radiation Field LIC Pressure Comparisons with other LISM Sightlines Conclusions

ABSTRACT The boundary conditions of the heliosphere are set by the ionization, density and composition of inflowing interstellar matter. Constraining the properties of the Local Interstellar Cloud (LIC) at the heliosphere requires radiative transfer ionization models. We model the background interstellar radiation field using observed stellar FUV and EUV emission and the diffuse soft X-ray background.
We also model the emission from the boundary between the LIC and the hot Local Bubble (LB) plasma, assuming that the cloud is evaporating because of thermal conduction. We create a grid of models covering a plausible range of LIC and LB properties, and use the modeled radiation field as input to radiative transfer/thermal equilibrium calculations using the Cloudy code. Data from in situ observations of He^0, pickup ions and anomalous cosmic rays in the heliosphere, and absorption line measurements towards epsilon CMa were used to constrain the input parameters. A restricted range of assumed LIC HI column densities and LB plasma temperatures produces models that match all the observational constraints. The relative weakness of the constraints on N(HI) and T_h contrasts with the narrow limits predicted for the H^0 and electron density in the LIC at the Sun, n(H^0) = 0.19 - 0.20 cm^-3, and n(e) = 0.07 +/- 0.01 cm^-3. Derived abundances are mostly typical for low density gas, with sub-solar Mg, Si and Fe, possibly subsolar O and N, and S about solar; however C is supersolar. The interstellar gas at the Sun is warm, low density, and partially ionized, with n(H) = 0.23 - 0.27 cm^-3, T = 6300 K, X(H^+) ~ 0.2, and X(He^+) ~ 0.4. These results appear to be robust since acceptable models are found for substantially different input radiation fields. Our results favor low values for the reference solar abundances for the LIC composition. <|endoftext|><|startoftext|> Introduction Model Hamiltonian Single-orbital model. Spin-selective hybridization. Description of the employed simplifications Interaction Hamiltonian in excitonic representation Numerical estimates of the energy parameters Collective spin-flip states. negative g2DEG-factor Secular equation Spectrum of the localized states Delocalized impurity-related excitations. Positive g2DEG-factor.
Pinning of the QHF spin Skyrmionic states created by magnetic impurities Phase diagram of QHF ground state at g2DEG*>0 Discussion acknowledgments EXCITONIC REPRESENTATION SPIN OPERATORS References ABSTRACT A theory of collective states in a magnetically quantized two-dimensional electron gas (2DEG) with half-filled Landau level (quantized Hall ferromagnet) in the presence of magnetic 3d impurities is developed. The spectrum of bound and delocalized spin-excitons as well as the renormalization of Zeeman splitting of the impurity 3d levels due to the indirect exchange interaction with the 2DEG are studied for the specific case of n-type GaAs doped with Mn where the Landé g-factors of impurity and 2DEG have opposite signs. If the sign of the 2DEG g-factor is changed due to external influences, then impurity related transitions to new ground state phases, presenting various spin-flip and skyrmion-like textures, are possible. Conditions for existence of these phases are discussed. PACS: 73.43.Lp, 73.21.Fg, 72.15.Rn <|endoftext|><|startoftext|>
E.P.J. van den Heuvel and S.-C. Yoon, Astronomical Institute “Anton Pannekoek” & Center for High Energy Astrophysics, University of Amsterdam, The Netherlands, and Kavli Institute for Theoretical Physics, University of California, Santa Barbara

Introduction

About a year after the discovery of the first optical afterglow of a Gamma-Ray Burst (GRB) by van Paradijs et al. (1997), two of van Paradijs’ students discovered the first supernova associated with a long-duration GRB: SN 1998bw/GRB980425 (Galama, Vreeswijk et al. 1998). This supernova appeared to be highly peculiar and energetic. It is of class Ic, which means that it has no H or He in its spectrum. Its outflow velocities of > 30000 km/s were very much larger than the 10000 km/s seen in “ordinary” Type Ic supernovae and the total kinetic energy in SN1998bw was > 10^52 ergs: at least an order of magnitude larger than in other supernovae.
Theoretical modeling by Iwamoto et al. (1998) showed that the exploding star must have been a Carbon-Oxygen star with a mass in the range 6 to 13 M⊙, which had a collapsing core > 3 M⊙. The latter is too large to leave a neutron star, implying that this was the first-ever observed birth event of a stellar-mass black hole (Iwamoto et al. 1998). The discovery of SN1998bw was a beautiful confirmation of the “collapsar” (“hypernova”) model proposed by Woosley (1993). According to this model the collapse of the rapidly rotating core of a massive star to a black hole will leave behind a rapidly rotating torus of extremely hot nuclear matter around the black hole. Internal friction in this Keplerian torus causes its matter to spiral in towards the black hole within a few minutes, generating so much heat in this process that part of the matter is blown away in directions perpendicular to the plane of the torus with relativistic velocities. Woosley speculated that these relativistic “jets” of matter might produce a GRB. SN 1998bw appeared to confirm the predictions of Woosley's “collapsar” (“hypernova”) model. Although GRB980425 was, as a GRB, intrinsically quite faint and nearby (z=0.0085), which at first cast some doubt on the idea that genuine long-duration GRBs would in general be the birth events of stellar black holes, the discovery of the association of the really “cosmological” gamma-ray burst GRB 030329 (z=0.17) with a supernova with a spectrum and lightcurve almost identical to those of SN1998bw (e.g. Hjorth et al. 2003) confirmed beyond any reasonable doubt the association of long GRBs (abbreviated further as LGRB) with the death events of very massive stars and the formation of black holes.
Indeed, while the lightcurves of the optical transients (OTs) associated with LGRBs are often dominated by the radiation from the relativistic outflow of the GRB, numerous LGRBs have shown late-time “bumps” consistent with the presence of underlying supernovae (e.g. Bloom et al. 1999; Galama et al. 1999; Levan et al. 2005). For a review see Woosley and Bloom (2006). These discoveries have given strong credence to Woosley's (1993) model as the “standard” model for the production of the LGRBs, and this model has been worked out in more detail by Woosley and collaborators (e.g. MacFadyen and Woosley 1999; Woosley and Heger, 2006). To distinguish these very energetic and peculiar Ic “supernovae” associated with LGRBs from the more ordinary Ibc supernovae, we will in this paper call them “hypernovae”. In order to finish with a pure CO-core of mass > 6 M⊙, a star must have started out on the main sequence with a mass > 30 M⊙, which implies that the LGRBs are associated with the most massive stars. Here we will discuss further evidence that indeed links the LGRBs with such stars, and examine under which circumstances a star could lose its entire H- and He-rich envelope before collapsing to a black hole. It appears that the removal of the envelope by a binary companion might be an attractive possibility.

2 Host Galaxy Characteristics: further evidence for an association of the LGRBs with the most massive stars.

In a very important recent paper, Fruchter et al. (2006) reported that the environments of LGRBs are strikingly different from those of the “ordinary” core collapse supernovae of types Ib,c and II. Using Hubble Space Telescope imaging of the host galaxies of LGRBs and core-collapse supernovae they found that the GRBs are far more concentrated on the very brightest regions of their host galaxies than are the supernovae. Furthermore, they found that the host galaxies of the GRBs are significantly fainter and more irregular than the hosts of the supernovae.
Theoretical work (Fryer, 2004, 2006) shows that stars which started out on the main sequence with masses between 8 and 20 M⊙ leave neutron stars as remnants, while the cores of stars more massive than about 20 M⊙ collapse to black holes. Figure 1, after Fryer (2006), shows that this happens irrespective of initial metallicity, although the black holes produced at lower metallicity tend to be much more massive than those from higher metallicity stars. In view of the slope of the IMF, some 75 per cent of the deaths of stars > 8 M⊙ arise from the mass range 8-20 M⊙, and only some 25 per cent from masses > 20 M⊙. Therefore, the bulk of the core collapse supernovae will be neutron-star forming events.

Fig. 1. Mass of collapsed remnant as a function of initial main-sequence progenitor mass from the analysis by Fryer (2006), for both the Limongi & Chieffi (2006) and Woosley et al. (2002) stellar progenitors. The lines are derived from the Woosley et al. (2002) progenitors: dotted line refers to solar metallicity, solid line refers to very low metallicity. The points are derived from the Limongi and Chieffi (2006) models: circle: solar, square: 0.2 solar, triangle: zero metallicity. Around 20 solar masses the outcome depends sensitively on the stellar evolution code used. Credit: C.L. Fryer (2006)

It thus appears that the neutron-star forming events follow the normal light distribution of their host galaxies, whereas the LGRBs are concentrated strongly on the brightest parts of these galaxies. Another striking difference is that while half of the hosts of the “normal” core collapse supernovae are Grand Design (GD) spiral galaxies, only one out of the 42 hosts of the LGRBs is a GD spiral, the other 41 being smaller and more irregular galaxies. [In the case of the one GD spiral it is still very well possible that the real host is a small SMC- or LMC-like satellite of this spiral galaxy, which at this distance cannot be separately recognized].
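The 75/25 split between neutron-star and black-hole forming deaths quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a single power-law IMF with a Salpeter slope of 2.35 and an upper mass limit of 100 M⊙, neither of which is stated in the text, and the function name is hypothetical.

```python
def fraction_between(m_lo, m_hi, m_min=8.0, m_max=100.0, slope=2.35):
    """Number fraction of stars in [m_min, m_max] whose mass lies in [m_lo, m_hi].

    Assumes dN/dM proportional to M**(-slope); the Salpeter slope 2.35 and the
    100 solar-mass upper limit are assumptions, not values from the text.
    """
    p = 1.0 - slope  # exponent of the integrated power law

    def count(a, b):
        # Integral of M**(-slope) from a to b, up to a constant factor.
        return (b ** p - a ** p) / p

    return count(m_lo, m_hi) / count(m_min, m_max)


ns_fraction = fraction_between(8.0, 20.0)    # neutron-star forming range
bh_fraction = fraction_between(20.0, 100.0)  # black-hole forming range
print(f"8-20 Msun: {ns_fraction:.2f}, >20 Msun: {bh_fraction:.2f}")
```

Under these assumptions the neutron-star forming range comes out a little under three quarters of all core-collapse events, in line with the "some 75 per cent" quoted above; a different slope or upper mass limit shifts the split by a few per cent.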
The brightest patches of the irregular and small host galaxies of LGRBs are “clumps” of massive stars. This follows from the fact that these hosts are generally found to be very blue (Fruchter et al. 1999; Sokolov et al. 2001) and have strong emission lines (Bloom et al. 1998; Vreeswijk et al. 2001), suggesting a significant abundance of young massive stars.

Long Gamma-Ray Burst Progenitors: Boundary Conditions and Binary Models

At the large redshifts of the GRB hosts it is impossible to distinguish the stellar content of the bright emission line spots (the entire HST image of a host is often smaller than an arcsec), but nearby small irregular starforming (“starburst”) galaxies serve as a good example of what is going on in these small GRB hosts. A nearby example of such a galaxy is NGC 3125 which was studied by Hadfield and Crowther (2006). These authors find that the bright spots of this galaxy consist of large concentrations of O- and WR-type stars, which number of order 10 000 in this galaxy. The galaxy has a metallicity like that of the SMC/LMC (between 0.2 and 0.5 solar) and its brightest clump has at least four dense star clusters of > 200 000 solar masses, each with some 600 O-stars. A few of the hosts of relatively nearby LGRBs associated with hypernovae show similar characteristics. The host of SN1998bw is an LMC-size starforming galaxy; the host of GRB060218 is SMC size; the host of GRB030329 at z=0.17 is undetectable, indicating that its size must be smaller than that of the SMC, and the host of GRB970228 at z=0.67 is not larger than the LMC. Recently Wolf and Podsiadlowski (2006), statistically studying part of the host galaxy sample of Fruchter et al. (2006), concluded that the typical LGRB host galaxy is of LMC size.
They found, on the basis of the metallicity-luminosity relation for starforming galaxies, that LGRB models that require a sharp metallicity cut-off below 0.5 solar metallicity are effectively ruled out as they would require fainter host galaxies than are observed. They therefore conclude that metallicities up to 0.5 solar must be allowed by models for LGRBs/hypernovae. As, however, in these irregular galaxies the metallicity may vary wildly from place to place, it is not clear to us whether or not the LGRBs might arise from areas in the hosts of much lower metallicity, while the average metallicity of the host might still be up to of order 0.5 solar.

3 Possible reasons why small “starburst-like” galaxies are the prime sources of LGRBs

These reasons can be divided into two broad categories: (1) Metallicity-related, (2) Starburst-related.

As to Category (1): the wind mass-loss rates from massive stars are known to be metallicity-related: Mokiem et al. (2006) find from observations of O- and B-supergiants in the Local Group galaxies that the wind mass-loss rates scale roughly as Ṁ_w ∝ Z^0.78, where Z is the abundance of the elements heavier than helium. This implies that at lower metallicities, such as in the SMC and LMC (0.2 and 0.5 solar, respectively) massive stars lose (much) less mass during their evolution than in our Galaxy. Therefore, they are more likely to finish as a black hole. Indeed, one observes that in the LMC half of the four known persistent High Mass X-ray Binaries (HMXB) harbour a black hole while in our Galaxy only one out of the over 20 known persistent HMXBs harbours a black hole (Cygnus X-1). It thus appears that at low Z, black-hole production is more efficient. In addition, a requirement for producing a “hypernova” is that at the time of the core collapse, the star is still rotating sufficiently rapidly to enable the formation of a disk or torus around the black hole (MacFadyen and Woosley 1999).
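To make the metallicity scaling of the wind mass-loss rate quoted above concrete, the following minimal sketch evaluates Z^0.78 at the SMC and LMC metallicities given in the text. It only evaluates the quoted power law; the function name is hypothetical, and the result is a rate relative to a solar-metallicity star of otherwise identical properties.

```python
def relative_mass_loss(z_over_solar, exponent=0.78):
    """Wind mass-loss rate relative to solar metallicity, assuming Mdot ~ Z**0.78."""
    return z_over_solar ** exponent


# Metallicities (in solar units) quoted in the text for the SMC and LMC.
smc_rate = relative_mass_loss(0.2)
lmc_rate = relative_mass_loss(0.5)
print(f"SMC: {smc_rate:.2f} x solar rate, LMC: {lmc_rate:.2f} x solar rate")
```

With this scaling a star at SMC metallicity loses mass at a bit less than 30 per cent of the rate of its solar-metallicity counterpart, and one at LMC metallicity at a bit less than 60 per cent.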
Lower wind mass-loss rates also imply lower angular momentum loss rates, which will increase the probability of still having a sufficiently rapidly rotating stellar core at the time of the collapse.

As to Category (2): It is well-known that during a starburst massive dense star clusters form with many hundreds, if not thousands, of massive OB stars. For example, many such massive young globular clusters are observed in the pair of Antennae Galaxies. In massive young globular clusters a variety of dynamical interactions take place between massive stars, massive binaries and stellar remnants (black holes, neutron stars) ranging from direct collisions to companion exchanges in binary systems, and to the formation of so-called Intermediate Mass Black Holes (IMBHs) with masses of order 100 to 1000 solar masses (Portegies Zwart et al., 2002, 2004, 2006). These can be unique events, which do not occur in any other stellar environment. Kulkarni (2006) suggested that LGRBs might be related to such unique events that can occur only in starburst galaxies. This interesting idea merits being worked out further, but at present not much more can be said about it. For this reason we will here only concentrate on the possible relation between LGRBs and metallicity.

In order to make a hypernova such as the ones observed to coincide with the LGRBs, the two following conditions should be fulfilled:
(1) The star must have lost its H- and He-rich outer layers.
(2) At the time of core collapse, the core should have specific angular momentum in the range
$$J_{\rm CO\text{-}core} = (3 - 20) \times 10^{16}\ {\rm cm^2\,s^{-1}} \qquad (1)$$
In order to fulfill these two conditions, two possible scenarios have been proposed:
(i) Completely-mixed single-star evolution of a rapidly-rotating low-metallicity star (Yoon and Langer 2005; Woosley and Heger 2006).
(ii) Binary mass exchange, where the star achieves and maintains its rapid rotation due to tidal synchronization in a close binary (Izzard et al.
2004; Podsiadlowski et al. 2004).
We now separately discuss these two possible scenarios.

4 Completely mixed single-star models of low metallicity

In this case the rapid rotation of the star keeps it completely mixed by meridional circulation during its entire H-burning evolution. The low metallicity causes the wind mass- and angular-momentum-loss rates to be small such that the star keeps rotating rapidly until the end. Because of the complete mixing, by the end of hydrogen burning the star has become a complete helium star (the weak wind has by that time carried off the thin hydrogen envelope that still surrounded the helium core). Yoon and Langer (2005) calculated such an evolution for a star which started out with M = 40 M⊙ and Z = 10^-5 and found that it evolves into a rapidly rotating pure helium star of 32 M⊙, which after 600 years of C-burning undergoes core-collapse to a black hole with sufficient angular momentum to make a hypernova. They find that this type of evolution follows if the star starts out with an equatorial rotation velocity of 0.5 times the critical one. Later calculations by these authors suggest that up to Z = 0.2 solar the stars still follow this evolutionary path. Woosley and Heger find that it would still work up to Z = 0.33 solar. For higher Z this single star model no longer works. If the conclusion of Wolf and Podsiadlowski (2006) mentioned in section 2 were to hold strictly, i.e. if models should work up to Z = 0.5 solar, these single star models would be ruled out. However, as mentioned at the end of section 2, due to the patchy distribution of metallicity in irregular starburst galaxies, there could easily be patches with SMC-like (Z = 0.2) metallicities in the irregular hosts, and therefore these completely mixed single star models certainly cannot be ruled out.
In the calculations of Yoon and Langer (2005) these stars still have a helium-rich envelope, which would lead to a Type Ib supernova, but later calculated models (Yoon, Langer and Norman 2006) and also some of the Woosley and Heger (2006) models lose this envelope by wind such that they would produce a Type Ic supernova.

5 Binary Models; can LGRBs be the formation events of Black-Hole X-ray Binaries?

5.1 Introduction

The first ones to consider binary models for making LGRBs were Fryer and Woosley (1998). Their model was, however, not a core-collapse model, but one in which an already existing black hole in an X-ray binary spiraled down into the helium core of its massive companion, as a result of a Common-Envelope phase. Although such models are interesting, we will not consider them here and will concentrate only on “hypernova” models in which the LGRB coincides with the core-collapse event in which a black hole is formed. Izzard et al. (2004) and Podsiadlowski et al. (2004) were the first to consider the role that binary systems might play in producing such “hypernova” events. At present some twenty close X-ray binaries are known that consist of a black hole and a low-mass companion star (see McClintock and Remillard, 2006). The black hole in such systems typically has a mass between 3 and 20 M⊙, and the companion is a Roche-lobe filling star with a mass < 2 M⊙. The orbital periods are in general less than a few days, and in many cases less than 0.5 day. In the system of X-ray-Nova Sco 1994 (J1655-40) the F-type companion of the 7 M⊙ black hole has an overabundance of alpha-type elements such as S, Mg and Si of more than one order of magnitude (Israelian et al. 1999). This is just what one expects if the outer layers of the star of which the core collapsed to the black hole were ejected in a supernova-like event and polluted the outer layers of the F-type companion. It thus appears that in this black-hole X-ray binary a hypernova-like event took place. Podsiadlowski et al.
(2004) propose that in all of these low-mass black hole X-ray binaries the formation event of the black hole produced a LGRB. The formation of these BH-LMXBs requires a preceding Common-Envelope (CE) phase of an initially wide binary system consisting of the massive progenitor star of the black hole together with a distant low-mass companion star (e.g. see van den Heuvel and Habets 1984; Brown et al. 1996; Nelemans and van den Heuvel, 2001). During this CE phase the low-mass companion spiraled down deeply into the envelope of the massive companion resulting in a very close binary system consisting of the helium core of the massive star together with its low-mass main-sequence companion (< 2 M⊙). Izzard et al. (2004) and Podsiadlowski et al. (2004) suggested that tidal forces in this close binary keep the helium star in synchronous (=rapid) rotation, allowing it to have sufficient angular momentum at the time of its core collapse to produce a hypernova. These authors, however, did not calculate the timescales on which tidal synchronization in such binaries can be achieved. In order to see whether such a model can work, one has to calculate these timescales as well as the timescales on which the rotation of the contracting stellar core is synchronized with the outer envelope of the star. We consider these two problems here.

5.2 Timescales for synchronization of helium stars in close binaries with a main-sequence companion.

We consider helium stars of 8 and 16 M⊙, which are probably representative for the progenitors of the black holes in LGRBs. Helium-burning helium stars with such masses are almost completely convective. In 8 and 16 M⊙ helium stars the convective cores have radii of about 60 and 70 per cent, respectively, of the stellar radii, and occupy most of the stellar mass (Paczynski 1971).
According to Zahn (1975, 1977) the tidal synchronization timescale for a star with a convective core and a radiative envelope is given by:
$$\frac{1}{t_{\rm sync}} = 5 \cdot 2^{5/3} \left(\frac{g_s}{R}\right)^{1/2} \frac{M R^2}{I}\, q^2 (1+q)^{5/6}\, E_2 \left(\frac{R}{a}\right)^{17/2} \qquad (2)$$
where q = M2/M is the mass ratio of the companion (M2) and of the star to be synchronized (M), and g_s, R and I are the surface gravity, radius and moment of inertia, respectively, of the latter star, a is the orbital radius and E2 is the tidal torque constant for stars with a radiative envelope and a convective core. E2 is proportional to (Rc/R)^6, where Rc is the radius of the convective core (Zahn, 1975, 1977). Zahn (1975) calculated the values of E2 for main-sequence stars of various masses. For such stars in the mass range 7 to 15 M⊙ he found E2 to be around 10^-4. In order to correct for the much larger relative radius of the convective cores in helium stars, one has to multiply the E2-values for main-sequence stars of similar masses with (RcHe/Rcms)^6, where Rcms is the relative radius of the convective core of the main-sequence star, and RcHe is the one of the helium star. To this end we used for the 8 M⊙ helium star (Rc = 0.7R) the E2 value of Zahn's 10 M⊙ main-sequence star (Rc = 0.27R) and for the 16 M⊙ helium star (Rc = 0.8R) we used the E2 value of Zahn's 15 M⊙ main-sequence star (Rc = 0.30R). This yields E2 = 4.4 × 10^-4 for the 8 M⊙ helium star and E2 = 1.7 × 10^-2 for the 16 M⊙ helium star. In order to get the shortest possible orbital periods, we now assume that after the CE phase the low-mass main-sequence companion of the helium star fills its Roche lobe. We then find for the 8 M⊙ helium star that with Roche-lobe-filling companions of 1, 2 and 4 M⊙ the orbital periods are 8.78, 10.45 and 12.43 hours, respectively; using equation (2) we then find that with these three companion masses the tidal synchronization timescales of these three systems are 1800, 1400 and 1130 years, respectively.
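The orbital periods quoted above for Roche-lobe-filling companions can be approximated with a short calculation. The sketch below is not the authors' computation: it assumes Paczyński's approximation for the Roche-lobe radius, R_L/a = 0.462 (M2/(M + M2))^(1/3), and a main-sequence radius R = (M2/M⊙)^0.5 R⊙. Both are illustrative assumptions, and the function name is hypothetical. With this Roche-lobe formula the total mass cancels, so the period depends only on the mean density of the companion and not on the mass of the helium star.

```python
import math

G = 6.674e-8      # gravitational constant, cgs
M_SUN = 1.989e33  # solar mass, g
R_SUN = 6.957e10  # solar radius, cm


def roche_filling_period_hours(m_comp, m_helium, radius_exponent=0.5):
    """Orbital period (hours) when the main-sequence companion fills its Roche lobe.

    Masses are in solar units. Assumptions (not from the text): Paczynski's
    Roche-lobe approximation and a companion radius of m_comp**radius_exponent
    solar radii.
    """
    radius = R_SUN * m_comp ** radius_exponent
    m_total = m_comp + m_helium
    # Separation at which the companion radius equals its Roche-lobe radius.
    separation = radius / (0.462 * (m_comp / m_total) ** (1.0 / 3.0))
    # Kepler's third law.
    period_s = 2.0 * math.pi * math.sqrt(separation ** 3 / (G * M_SUN * m_total))
    return period_s / 3600.0


periods_8 = [roche_filling_period_hours(m, 8.0) for m in (1.0, 2.0, 4.0)]
periods_16 = [roche_filling_period_hours(m, 16.0) for m in (1.0, 2.0, 4.0)]
print([round(p, 1) for p in periods_8])
```

Under these assumptions the periods land within about one per cent of the 8.78, 10.45 and 12.43 hours quoted in the text, and they are identical for an 8 and a 16 M⊙ helium star.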
For a 16 M⊙ helium star the orbital periods with these three main-sequence companion masses are exactly the same and the tidal synchronization timescales are 440, 400 and 370 years, respectively. The lifetimes of helium stars of 8 and 16 M⊙ are of order 5 × 10^5 yrs (Paczynski 1971). Thus one expects, as already assumed by Izzard et al. (2004) and Podsiadlowski et al. (2004), that these helium stars will be fully synchronized with their orbital motion throughout their core-helium-burning evolution. After the end of helium burning, could the contracting Carbon-Oxygen core of the helium star keep the angular momentum it obtained as part of a synchronized helium star and maintain that angular momentum until core collapse? As we will now show, it is unlikely that it will be able to overcome this barrier.

Fig. 2. Solid curve: specific angular momentum as a function of mass in a synchronized 8 solar mass helium star with a 0.8 solar mass Roche-lobe filling main-sequence companion. Dotted curve: specific angular momentum distribution required for the formation of a hypernova in case the mass interior to M_r collapses to a Schwarzschild black hole; dash-dotted curve: the same for the case of a Kerr black hole.

5.3 Timescales for core-envelope coupling

The solid curve in Figure 2 shows the specific angular momentum distribution in a synchronized helium star of 8 M⊙ in a close binary with a 0.8 M⊙ Roche-lobe filling companion (Porb = 7.17 h), compared with the minimum specific angular momentum required to form an accretion disk around a Schwarzschild and a Kerr black hole, as a function of the black hole mass.
One observes from this figure that if the inner part of the helium star can maintain its specific angular momentum also when it becomes a contracting CO-core (which then will spin much faster than its helium envelope) then indeed the inner parts of such helium stars would be able to produce a hypernova/GRB if the black hole is of the Kerr type. However, whether the contracting CO-core can maintain the specific angular momentum which it had as a helium star depends on the timescale of core-envelope coupling. It is expected that this coupling in a convective differentially rotating star will be due to magnetic fields generated in this star, and Spruit (2002) has derived the order-of-magnitude timescale for this coupling. Yoon (2006) calculated the evolution of rotating helium stars with masses between 8 and 40 M⊙ using Spruit’s (2002) mechanism for core-envelope coupling. He found that the inner 3 M⊙ of the CO-cores of these stars at the moment of core collapse have retained a fraction f of the initial specific angular momentum which they had as a helium star in solid-body rotation: for MHe = 8-16 M⊙: f = 0.2; 20 M⊙: f = 0.4; 25 M⊙: f = 0.6; 30 M⊙: f = 0.65; 40 M⊙: f = 0.75. Using these values for 8-16 M⊙ stars in Figure 2 one sees that the specific angular momentum in the central parts of the 8 M⊙ helium star (the solid curve) moves downwards by a factor 5 and thus falls below the Kerr as well as the Schwarzschild curves. The same holds for the 16 M⊙ helium star. This means that while a helium star in a close binary with a Roche-lobe filling low-mass main-sequence star has achieved tidal synchronization during core-helium burning, still its core at the time of its collapse will be unable to produce a hypernova/LGRB. We thus see that the progenitors of the black holes in the Black-Hole X-ray Binaries with low-mass companion stars in all likelihood did not produce a hypernova/LGRB.
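The test applied here amounts to multiplying the specific angular momentum of the synchronized helium star by the retained fraction f and comparing the result with the requirement for a hypernova. The sketch below does this against the window of equation (1) rather than the curves of Figure 2, which cannot be read off the text. As illustrative inputs it uses the values 3.7 × 10^17 and 6.0 × 10^17 cm^2/s that the paper quotes in section 5.4 for Roche-lobe-filling helium stars of 8 and 16 M⊙ with compact companions. The function names and the dictionary layout are hypothetical.

```python
# Window of equation (1) for the CO-core specific angular momentum, cm^2/s.
J_MIN = 3.0e16
J_MAX = 20.0e16

# Retained fractions f from Yoon (2006) as listed in the text,
# keyed by helium star mass in solar units (f = 0.2 applies for 8-16 Msun).
RETAINED_FRACTION = {8: 0.2, 16: 0.2, 20: 0.4, 25: 0.6, 30: 0.65, 40: 0.75}


def core_j_at_collapse(j_synchronized, helium_star_mass):
    """Specific angular momentum left in the core at collapse."""
    return RETAINED_FRACTION[helium_star_mass] * j_synchronized


def in_hypernova_window(j_core):
    """True if j_core lies inside the range of equation (1)."""
    return J_MIN <= j_core <= J_MAX


# Synchronized values quoted in section 5.4 for helium stars with compact companions.
j_core_8 = core_j_at_collapse(3.7e17, 8)
j_core_16 = core_j_at_collapse(6.0e17, 16)
print(j_core_8, in_hypernova_window(j_core_8))
print(j_core_16, in_hypernova_window(j_core_16))
```

For these short-period systems both cores stay inside the window even after losing 80 per cent of their angular momentum, whereas for the wider systems with main-sequence companions considered in this section the same factor-of-five reduction drops the core below the required curves of Figure 2.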
5.4 Timescales for synchronization of helium stars in close binaries with a compact companion

Such binaries will form by the spiral-in of a neutron-star or black-hole companion of a massive star in a wide High-Mass X-ray Binary (HMXB). Recently, with INTEGRAL such a wide system was discovered, consisting of a blue supergiant and a compact star in a 330 day orbit (Sidoli et al. 2006). [In HMXBs with orbital periods shorter than about 100 days, the compact star is expected to spiral into the core of its companion such that no binary will be left (e.g. Taam 1996)]. Presently three close X-ray binaries consisting of a helium star (Wolf-Rayet star) and a compact object are known: Cygnus X-3 (Porb = 4.8 h; van Kerkwijk et al. 1992), and the extragalactic sources IC10 X-1 (Porb = 34.8 h, Prestwich et al. 2007; ATel 955) and NGC 300 X-1 (Porb = 32.8 h; Carpano et al. 2007). The shortest possible orbital periods of helium star plus compact star binaries will occur if the helium star fills its Roche lobe. For helium stars of 8 M⊙ and 16 M⊙ these shortest possible orbital periods are 2.046 and 2.466 hours, respectively, independent of the mass of the compact companion. Using equation (2) one finds that the synchronization timescales in these systems are extremely short, of the order of years to decades at most, such that they will remain synchronized throughout their core-helium-burning evolution. The specific angular momentum is here 3.7 × 10^17 and 6.0 × 10^17 cgs, respectively. As mentioned above, the cores of these stars can maintain some 20 per cent of this until core collapse. Equation (1) shows that this is sufficient to make a LGRB/hypernova. Thus the post-in-spiral remnants of HMXBs are suitable for producing Long GRBs.

Some example progenitor HMXBs that might produce a LGRB: We use Webbink's (1984) equation to calculate the ratio of the final and initial orbital radius in the case of Common-Envelope evolution (e.g. see also van den Heuvel 1994).
We will assume that the product αλ = 1, where α is the efficiency parameter for the ejection of the envelope, and λ is a parameter characterizing the density structure of the star. Our first example is Cygnus X-1, for which we adopt a mass of 35 M⊙, with a 14 M⊙ helium core for the supergiant and a mass of 15 M⊙ for the black hole (e.g. Gies and Bolton, 1982, 1986). The initial orbital period of 5.6 days of this system then results in a final orbital period of 2.4 hours for the 14 M⊙ helium star plus the 15 M⊙ black hole. In this case the helium star will be very close to filling its Roche lobe, so we expect the final product of the Cygnus X-1 system to be able to produce a hypernova/LGRB when the core of the helium star collapses to a black hole. A second example is the system of 4U 1223-62/Wray 977, which consists of a neutron star and a blue hypergiant (B1.5Ia0) in an eccentric orbit with P = 41.5 days (e.g. see Kaper et al. 2006). The hypergiant is likely to have a mass of 35 M⊙, so we again assume here a helium core of 14 M⊙. For the neutron star we assume a mass of 1.8 M⊙ (like in the system of Vela X-1, which also is a very massive X-ray binary). Assuming the same values for α and λ as in the first case, we find that the final orbital period after spiral-in is 2.1 hours, such that again the helium star just fits inside its Roche lobe. So also here the core of the helium star at the time of collapse will have enough angular momentum to make a hypernova/LGRB.

6 Discussion and Conclusions

We saw in section 4 that completely rotationally mixed single star evolution at relatively low metallicities (Z ≤ 0.33 solar) may well provide a viable model for the production of LGRBs/hypernovae.
As to binary models: the results from section 5 show that, assuming Zahn's (1975, 1977) model for the tidal synchronization of helium stars in close binaries, massive helium stars with main-sequence companions will be quickly synchronized, within a few centuries to millennia, with their orbital motion. However, we find that as a consequence of efficient core-envelope coupling in the post-helium burning phase it is unlikely that these stars by the time of core collapse will have sufficient core angular momentum to produce a hypernova/GRB. On the other hand, if the companion of the helium star is a compact object and the helium star is close to filling its Roche lobe (implying a very short orbital period, of the order of a few hours) we find that by the time of core collapse the core can still have sufficient angular momentum to produce a hypernova/GRB. The fact that we already know two potential progenitors of close helium star plus compact star companion binaries among the HMXBs within 3.5 kpc distance from the Sun implies that there must be several dozen such progenitor systems in our galaxy. Assuming a lifetime of some 50000 years for the HMXB phase, and 25 such systems in the Galaxy, one would expect one hypernova/LGRB from such systems every 2000 years. This is about 5 per cent of the SN rate in our galaxy. Assuming that the GRBs are beamed within a cone of opening half-angle 5 degrees (Frail et al. 2001), we would expect to observe one LGRB from such binary systems per 2 million years from a galaxy like our own.

We note that although this binary model appears viable, it remains puzzling why LGRBs have such a strong preference for the small irregular starburst galaxies. A possible explanation might be that at low metallicity a much larger fraction of the massive stars collapses to black holes.
In such galaxies one would already expect most of the persistent “standard” HMXBs (that is: the ones with massive blue supergiant donor stars) to harbour black holes, while then also the donor stars in such systems are likely to collapse to black holes. This would imply that, if indeed the LGRBs originate from binary systems, a considerable fraction of the hypernovae/LGRBs will be the formation events of close double black hole systems. We note that Tutukov and Cherepaschuk (2004) have also proposed that LGRBs are later evolutionary products of HMXBs. They assumed (but did not calculate) that the helium star plus compact star remnants from such systems would be synchronized and also assumed that the collapsing cores would have retained the angular momentum from the time of the synchronized helium star. We have shown here quantitatively that this is indeed the case.

This research was supported in part by the National Science Foundation under Grant No. PHY99-07949. The first author thanks the Mount Stromlo Observatory for its hospitality during the conference and the Netherlands Research School for Astronomy (NOVA) for financial support for participation in this meeting.

References:
Bloom, J.S., Djorgovski, S.G., Kulkarni, S.R. and Frail, D.A., 1998, Ap.J. 507, L25-L28.
Bloom, J.S. et al., 1999, Nature 401, 253-456.
Brown, G.E., Weingartner, J.C. and Wijers, R.A.M.J., 1996, Ap.J. 463, 297-304.
Carpano, S., Pollock, A.M.T., Prestwich, A., Crowther, P., Wilms, J., Yungelson, L. and Ehle, M., 2007, astro-ph/0703270 (accepted as a Letter in Astron. & Ap.).
Fryer, C.L. (editor), 2004, “Stellar Collapse”, Kluwer Acad. Publishers, Dordrecht, 406 pp.
Fryer, C.L., 2006, New Astron. Rev. 50, 492.
Fryer, C.L. and Woosley, S.E., 1998, Ap.J. 502, L9-L12.
Frail, D.A. et al., 2001, Ap.J. L55.
Galama, T.J., Vreeswijk, P.M., et al., 1998, Nature 395,
Galama, T.J. et al., 2000, Ap.J. 536, 185-194.
Fruchter, A.S. et al., 1999, Ap.J. 519, 13-16.
Fruchter, A.S., Levan, A.J., Strolger, L., Vreeswijk, P.M., Thorsett, S.E., Bersier, D., Burud, I., and 26 co-authors, 2006, Nature 441, 463.
Hadfield, L.J. and Crowther, P.A., 2006, MNRAS 368, 1822-1832.
Hjorth, J. et al., 2003, Nature 423, 847-850.
Israelian, G., Rebolo, R., Basri, G., Casares, J. and Martin, E.L., 1999, Nature 401, 142-144.
Iwamoto, K. et al., 1998, Nature 395, 672.
Izzard, R.G., Ramirez-Ruiz, E. and Tout, C.A., 2004, MNRAS 348, 1215.
Kaper, L., van der Meer, A., van Kerkwijk, M.H. and van den Heuvel, E.P.J., 2006, Astron. Ap. 457, 595-
Kulkarni, S.R., 2006, talk presented at Kavli Institute for Theoretical Physics, March 2006.
Levan, A. et al., 2005, Ap.J. 624, 880-888.
Limongi, M. and Chieffi, A., 2006, Ap.J. 647, 483.
MacFadyen, A.I. and Woosley, S.E., 1999, Ap.J. 524,
McClintock, J.E. and Remillard, R.A., 2006, in “Compact Stellar X-ray Sources” (editors W.H.G. Lewin and M. van der Klis), Cambridge Univ. Press, p. 157-213.
Mokiem, R., de Koter, A., et al., 2006, astro-ph/0606403.
Nelemans, G. and van den Heuvel, E.P.J., 2001, Astron. Ap. 376, 950.
Paczynski, B., 1971, Acta Astron. 21, 1-14.
Podsiadlowski, P., Mazzali, P.A., Nomoto, K., Lazzati, D. and Cappellaro, E., 2004, Ap.J. 607, L17-L20.
Portegies Zwart, S.F. and McMillan, S.L.W., 2002, Ap.J. 576, 899.
Portegies Zwart, S.F., Baumgardt, H., Hut, P., Makino, J. and McMillan, S.L.W., 2004, Nature 428, 724.
Portegies Zwart, S.F., Baumgardt, H., McMillan, S.L.W., Makino, J., Hut, P. and Ebisuzaki, T., 2006, Ap.J. 641, 319.
Prestwich, A. et al., 2007, ATel Nr. 955.
Sidoli, L., Paizis, A. and Mereghetti, S., 2006, Astro-ph/10890S.
Sokolov, V.V. et al., 2001, Astron. Ap. 372, 438-455.
Spruit, H.C., 2002, Astron. Ap. 331, 923-932.
Taam, R.E., 1996, in “Compact Stars in Binaries” (editors J. van Paradijs, E.P.J. van den Heuvel and E. Kuulkers), Kluwer Acad. Publishers, Dordrecht, p. 3-15.
Tutukov, A.V. and Cherepaschuk, A.M., 2004, Astronomy Reports 48(1), 39-44.
Van den Heuvel, E.P.J., 1994, in “Interacting Binaries” (eds. H. Nussbaumer and A. Orr), Springer, Heidelberg, p. 263ff.
Van den Heuvel, E.P.J. and Habets, G.M.H.J., 1984, Nature 309, 598-600.
Van Paradijs, J., Groot, P.J., Galama, T., Kouveliotou, C. et al., 1997, Nature 386, 686-689.
Vreeswijk, P.M. et al., 2001, Ap.J. 546, 672-680.
Webbink, R.F., 1984, Ap.J. 277, 355-360.
Van Kerkwijk, M.H., Charles, P.A., Geballe, T.R., King, D.L., et al., 1992, Nature 355, 703.
Wolf, C. and Podsiadlowski, P., 2006, astro-ph/0606725v3.
Woosley, S.E., 1993, Ap.J. 405, 273.
Woosley, S.E., Heger, A. and Weaver, T.A., 2002, Rev. Mod. Phys. 74, 1015.
Woosley, S.E. and Bloom, J.S., 2006, Ann. Rev. Astron. Ap. 44, 507-556.
Woosley, S.E. and Heger, A., 2006, Ap.J. 637, 914-921.
Yoon, S.-C. and Langer, N., 2005, Astron. Ap. 443, 643-648.
Yoon, S.-C., Langer, N. and Norman, C., 2006, Astron. Ap. 460, 199-208.
Zahn, J.P., 1975, Astron. Ap. 41, 329.
Zahn, J.P., 1977, Astron. Ap. 57, 383-394.

ABSTRACT
The observed association of Long Gamma-Ray Bursts (LGRBs) with peculiar Type Ic supernovae gives support to Woosley's collapsar/hypernova model, in which the GRB is produced by the collapse of the rapidly rotating core of a massive star to a black hole.
The association of LGRBs with small star-forming galaxies suggests low metallicity to be a condition for a massive star to evolve to the collapsar stage. Both completely-mixed single star models and binary star models are possible. In binary models the progenitor of the GRB is a massive helium star with a close companion. We find that tidal synchronization during core-helium burning is reached on a short timescale (less than a few millennia). However, the strong core-envelope coupling in the subsequent evolutionary stages is likely to rule out helium stars with main-sequence companions as progenitors of hypernovae/GRBs. On the other hand, helium stars in close binaries with a neutron-star or black-hole companion can, despite the strong core-envelope coupling in the post-helium burning phase, retain sufficient core angular momentum to produce a hypernova/GRB.
<|endoftext|><|startoftext|>
Multiscale model of electronic behavior and localization in stretched dry DNA

Ryan L. Barnett1,∗, Paul Maragakis2,†, Ari Turner1, Maria Fyta1, and Efthimios Kaxiras1,3
1 Department of Physics, Harvard University, Cambridge, MA 02138
2 Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138
3 School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138

When the DNA double helix is subjected to external forces it can stretch elastically to elongations reaching 100% of its natural length. These distortions, imposed at the mesoscopic or macroscopic scales, have a dramatic effect on electronic properties at the atomic scale and on electrical transport along DNA. Accordingly, a multiscale approach is necessary to capture the electronic behavior of the stretched DNA helix. To construct such a model, we begin with accurate density-functional-theory calculations for electronic states in DNA bases and base pairs in various relative configurations encountered in the equilibrium and stretched forms.
These results are complemented by semi-empirical quantum mechanical calculations for the states of a small size [18 base pair poly(CG)-poly(CG)] dry, neutral DNA sequence, using previously published models for stretched DNA. The calculated electronic states are then used to parametrize an effective tight-binding model that can describe electron hopping in the presence of environmental effects, such as the presence of stray water molecules on the backbone or structural features of the substrate. These effects introduce disorder in the model hamiltonian which leads to electron localization. The localization length is smaller by several orders of magnitude in stretched DNA relative to that in the unstretched structure.

I. INTRODUCTION

Soon after Watson and Crick's discovery of the DNA double-helix structure [1], Eley and Spivey [2] introduced the notion of efficient charge transport along the stacked π orbitals of the bases. The mechanism of charge transport has been the subject of numerous studies in the intervening years, with renewed interest fuelled recently by both biological and technological considerations. Over a decade ago, Barton and co-workers observed distance-independent charge transfer between DNA-intercalated transition-metal complexes [3] and argued that it would be relevant for biology and biotechnology. More recent electron transport experiments on DNA have yielded widely varying results, showing alternatively insulating behavior [4, 5, 6, 7, 8], semiconducting behavior [9], Ohmic conductivity [10, 11, 12, 13], and proximity-induced superconductivity [14]. The large number of relevant variables endemic to such experiments, like the DNA-electrode contact, and the rich variety of structures that DNA can assume, are the causes of variability in the experimental measurements (for a recent review of transport theory and experiments see Ref. [15]).
Specifically, there is a large diversity of DNA forms in terms of composition, length, and structure. Early experiments suggested that DNA stretched substantially beyond its natural length (also referred to as “overstretched DNA”) can undergo a transition to an elongated structure up to twice the length of relaxed DNA [16]. This was also confirmed by recent single molecule stretching experiments [17, 18, 19], which showed that the molecule can be reversibly stretched by up to 90% of its natural length. Such important deformations of the double helix may occur in biological environments. Stretching of DNA is also related to cellular processes, such as transcription and replication. For example, proteins often induce important local distortions in the double helix while they diffuse along the molecule in search of their target sequences. The electronic and transport properties of DNA are directly influenced by its different conformations as well as by environmental factors, such as counterions, impurities or temperature. A full account of these effects based on a realistic, atomic scale description of the structure and the electronic properties challenges the capabilities of theoretical models.

∗Present address: Department of Physics, California Institute of Technology, Pasadena, CA 91125.
†Present address: D.E. Shaw Group, 120 West Forty-Fifth St., New York, NY 10036.

Theoretical efforts to understand the electronic behavior and transport in DNA can be divided into two general categories:

(i) Model calculations that use effective hamiltonians and master equations to describe the dynamics of electrons and holes in DNA (see, for instance, Refs. [20, 21, 22, 23]). Recent results [24] have led to considerable insights concerning the sequence-independent delocalization of electronic states in DNA. The main limitation of such approaches lies in the difficulty of determining accurate values for the parameters in the effective hamiltonians.
(ii) Ab initio calculations that can provide an accurate and detailed description of the electronic features [25, 26, 27]. These approaches are typically limited to a small number of atoms due to computational costs, and cannot readily handle the full complexity of DNA molecules in various conformations.

In particular, stretching of DNA can induce a very significant deviation from the B form which is stable under normal conditions in aqueous solution. Such structural distortions are bound to have a profound effect on the electronic behavior. A realistic description of these effects makes it necessary to handle both the atomic scale features and the overall state of the macromolecule.

In the present work, we address the problem of DNA stretching effects on the electronic states and the electron localization by providing a bridge between the two extremes of the length scale; a similar methodology was recently used to study hole transfer in DNA [28]. Theoretically, there are different ways of pulling the opposite ends of the DNA strands, leading to different stretched DNA forms, which are determined largely by base pair reorientations. Here, we use the poly(CG)-poly(CG) structures obtained in the pioneering study of Lebrun and Lavery [29] as the representative structure for stretching effects. This study modeled the adiabatic elongation of selected DNA molecules in two modes of stretching, corresponding to pulling on opposite 3'-3' ends or 5'-5' ends of the molecule: In the 3'-3' stretching mode, the DNA helix is unwound leading to a ribbon-like structure, while in the 5'-5' stretching mode the DNA helix contracts.
We begin with a set of detailed calculations for the electronic structure of DNA bases (A, T, C, G) and representative base pairs (AT-AT, CG-CG, AT-CG, CG-GC) in various relative configurations, as they are likely to appear in the stretched forms. These calculations are based on density-functional theory [30, 31] and serve to set the stage for more extensive calculations which employ successive levels of approximations necessary to handle the computational demands. Specifically, we extract the salient features of electronic structure of the individual DNA bases and base pairs from the ab initio calculations; these are compared to an efficient and realistic semi-empirical model [32], in order to establish the validity of the latter approach. At this intermediate scale, we consider an 18 base pair poly(CG)-poly(CG) DNA sequence which has been stretched by 30%, 60% and 90% relative to the natural length of the unstretched B form. The atomic structure of these forms has been established by Lebrun and Lavery [29], using empirical interatomic potentials. We next use the information from this approximate description to build an effective hamiltonian for the electronic behavior at much larger scales. This allows us to describe electron localization, due to the combined effects of stretching and environmental factors, over mesoscopic to macroscopic length scales. The essence of the approach and the different scales involved are shown schematically in Fig. 1. We emphasize that we address here issues related only to dry and neutral DNA structures, where the negatively charged groups on the backbone are passivated by protons, conditions that are relevant to the experiments we consider for comparison to our theoretical results; water molecules or counterions (such as Na+) are not considered in our calculations.

II. THEORETICAL METHODS

A. Ab initio calculations

As our first step toward establishing the electronic behavior of dry, neutral DNA, we study the nature of electronic states in individual bases and in base pairs. For these calculations we used three different implementations of density-functional theory [30]: a method that uses atomic-like orbitals as the basis [33], one that uses plane waves [34] and a third that uses a real-space grid [35]. In all three approaches, we used the same exchange-correlation functional in the local-density approximation [31], for consistency and simplicity. More elaborate approximations to exchange-correlation effects, such as the generalized gradient approximation [36], do not provide any improvement in describing the physics of these weakly interacting units. In each method we used pseudopotentials to represent the atomic cores, of the Troullier-Martins type [37] in SIESTA, the Vanderbilt ultrasoft type [38] in VASP and the Hamann-Schlüter-Chiang type [39] in HARES, with computational parameters (number of orbitals in basis, plane-wave kinetic energy cutoff and grid spacing) that ensure a high level of convergence. These calculations provide a thorough check on the consistency of various computational schemes to reproduce the electronic features of interest. The results are in excellent agreement across the three approaches. Since in these calculations there are no adjustable parameters, we refer to them in the following as ab initio results.

B. Construction of semi-empirical model

The stretched forms contain a large number of atoms, typically beyond what can be efficiently treated with the ab initio methods used for the DNA bases and base pairs. Accordingly, for the electronic structure calculations of these structures we use an efficient semi-empirical quantum-mechanical approach which employs a minimal basis set [32]. The consistency of this approach is then verified against the ab initio calculations.
Within the semi-empirical scheme, the electronic eigenfunctions are expressed as

$$|\psi^{(n)}\rangle = \sum_{\nu} c^{(n)}_{\nu} |\varphi_{\nu}\rangle \qquad (1)$$

where the basis set $|\varphi_{\nu}\rangle$ includes the s and p atomic orbitals for each atom in the system. The coefficients $c^{(n)}_{\nu}$ are numerical constants, with $|c^{(n)}_{\nu}|^2$ giving the weight of orbital $|\varphi_{\nu}\rangle$ in the electronic wavefunction. This method uses a second order expansion in the electronic density to obtain the total energy and takes into account self-consistently charge transfer effects which are important for biological systems. The method gives results for the band gaps that are in excellent agreement with those of the ab initio approaches described above (see Refs. [5, 40]).

The highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO, respectively, also referred to collectively as “frontier states” in the following) are extended over the entire structure in Bloch-like wave functions. In order to describe electron hopping and localization, we need to express these in terms of a basis of Wannier-like states that are localized on the individual bases. To this end, we construct maximally localized states on single base pairs by taking linear combinations of the HOMO and LUMO states from the wavefunctions of Eq. (1). The maximally localized states will then be used to calculate the hopping parameters in the effective 1D hamiltonian. Using the extended electronic states $|\psi^{(n)}\rangle$ of the frontier states, with corresponding energies $\varepsilon^{(n)}$, we define the maximally localized states $|\tilde{\psi}^{(i)}\rangle$ through the unitary transformation

$$|\tilde{\psi}^{(i)}\rangle = \sum_{n} \langle\psi^{(n)}|\tilde{\psi}^{(i)}\rangle \, |\psi^{(n)}\rangle \qquad (2)$$

which minimizes the sum of the variances

$$\zeta = \sum_{i} \left[ \langle\tilde{\psi}^{(i)}|\hat{z}^2|\tilde{\psi}^{(i)}\rangle - \langle\tilde{\psi}^{(i)}|\hat{z}|\tilde{\psi}^{(i)}\rangle^2 \right] \qquad (3)$$

under the constraint $\langle\tilde{\psi}^{(i)}|\tilde{\psi}^{(j)}\rangle = \delta_{ij}$, where z is the position along the helical axis. Similar and more general methodologies have been developed in the past for obtaining maximally localized states from extended ones [41, 42]. Due to the invariance of the trace, the first term in Eq. (3) is independent of the unitary transformation and the problem reduces to maximizing the sum of squared expectation values in the second term on the right-hand side, with the same orthonormality constraint. Carrying out the minimization, we arrive at the equation

$$\langle\tilde{\psi}^{(n)}|\hat{z}|\tilde{\psi}^{(m)}\rangle \, (z_n - z_m) = 0 \qquad (4)$$

where

$$z_n = \langle\tilde{\psi}^{(n)}|\hat{z}|\tilde{\psi}^{(n)}\rangle. \qquad (5)$$

By inspection, we see that ζ is maximized when $z_n = z_m$ for all m and n, corresponding to maximally delocalized states. On the other hand, ζ is minimized when the states $|\tilde{\psi}^{(n)}\rangle$ are the eigenfunctions of the position operator $\hat{z}$ within the HOMO or LUMO subspace. Therefore, the problem is further reduced to constructing and diagonalizing the matrix

$$M_{nm} = \langle\psi^{(n)}|\hat{z}|\psi^{(m)}\rangle \qquad (6)$$

which has the eigenvectors $\langle\psi^{(n)}|\tilde{\psi}^{(i)}\rangle$ that provide the desired transformation given in Eq. (2). The eigenvalues $z_n$ are the positions of the localized states. To evaluate the matrix elements we use the approximation

$$\langle\psi^{(n)}|\hat{z}|\psi^{(m)}\rangle = \sum_{\mu\nu} c^{(n)*}_{\mu} c^{(m)}_{\nu} \langle\varphi_{\mu}|\hat{z}|\varphi_{\nu}\rangle \approx \sum_{\mu\nu} c^{(n)*}_{\mu} c^{(m)}_{\nu} S_{\mu\nu} z_{\mu\nu} \qquad (7)$$

where $S_{\mu\nu} = \langle\varphi_{\mu}|\varphi_{\nu}\rangle$ is the overlap matrix between the two atomic orbitals and $z_{\mu\nu} = (z_{\mu} + z_{\nu})/2$ is the average z-value for the atoms located at sites given by the labels μ and ν. Once the localized states are constructed, the hopping parameters can be computed as

$$t_{ij} = \langle\tilde{\psi}^{(i)}|H|\tilde{\psi}^{(j)}\rangle = \sum_{n} \varepsilon^{(n)} \langle\tilde{\psi}^{(i)}|\psi^{(n)}\rangle \langle\psi^{(n)}|\tilde{\psi}^{(j)}\rangle \qquad (8)$$

recalling that the quantities $\langle\psi^{(n)}|\tilde{\psi}^{(i)}\rangle$ are determined from the transformation described above.

Having defined the maximally localized states in terms of the electronic wavefunctions from the all-atom calculations, we next produce an effective tight-binding hamiltonian, which allows us to study electron hopping along the DNA double helix. This approach has also been used in a recent study on functionalized carbon nanotubes [43]. In our effective hamiltonian, we consider hopping between first and second neighbors along the helix, and denote the hopping matrix elements according to the scheme shown in Fig. 2 for the HOMO state of the poly(CG)-poly(CG) structure (all other frontier states involve exactly the same type of hopping matrix elements):

$$H = \varepsilon \sum_{n} c^{\dagger}_{n} c_{n} + t_1 \sum_{n\ \mathrm{even}} \left( c^{\dagger}_{n} c_{n+1} + c^{\dagger}_{n+1} c_{n} \right) + t_2 \sum_{n\ \mathrm{odd}} \left( c^{\dagger}_{n} c_{n+1} + c^{\dagger}_{n+1} c_{n} \right) + t_3 \sum_{n} \left( c^{\dagger}_{n} c_{n+2} + c^{\dagger}_{n+2} c_{n} \right) \qquad (9)$$

where n represents the nth base pair along the helical axis and we have neglected spin indices because they are unimportant for our analysis. Note that there is a difference between hopping elements connecting even and odd sites to their neighbors (t1 and t2 terms in the effective hamiltonian of Eq. (9)), due to the asymmetry in the structure illustrated in Fig. 2. Performing a Fourier transform on the electron creation and annihilation operators (with N the number of sites)

$$c_k = \frac{1}{\sqrt{N}} \sum_{n} e^{-ikn} c_n \qquad (10)$$

gives a hamiltonian which has coupling between momenta k and k + π/a. By doubling the unit cell (and reducing the Brillouin Zone by a factor of two), this can finally be diagonalized to obtain the eigenvalues

$$E^{\pm}_{k} = \varepsilon + 2 t_3 \cos(2k) \pm \sqrt{t_1^2 + t_2^2 + 2 t_1 t_2 \cos(2k)} \qquad (11)$$

with the momentum sum carried out over the reduced Brillouin Zone. With these expressions for the band structure energies, the density of states (DOS)

$$g(\omega) = \sum_{k,n} \delta\!\left(\omega - E^{(n)}_{k}\right) \qquad (12)$$

can be readily obtained. These quantities are essential in describing electron localization along the DNA double helix under different conditions.

C. Disorder and localization length

In order to quantify the amount of localization that is expected in stretched DNA forms, we add a term to the hamiltonian in Eq. (9) of the form

$$H_{\mathrm{dis}} = \sum_{n} U_n \, c^{\dagger}_{n} c_{n} \qquad (13)$$

which is meant to emulate disorder arising from a variety of sources such as interaction of the DNA bases with stray water molecules and ions, or interaction with the substrate. The $U_n$ are uncorrelated random energy variations chosen according to a Gaussian distribution of zero mean and width γ,

$$P(U) = \frac{1}{\sqrt{2\pi}\,\gamma} \, e^{-U^2/2\gamma^2}. \qquad (14)$$

Once the disorder hamiltonian is constructed with a specific set of random on-site energies, by direct diagonalization we find the eigenstates $|\Psi^{(i)}\rangle$ of $H + H_{\mathrm{dis}}$ (we use capital symbols to denote the new wavefunctions from the hamiltonian that includes the disorder term) and then calculate the localization length defined as

$$L = \sqrt{ \langle\Psi^{(i)}|\hat{n}^2|\Psi^{(i)}\rangle - \langle\Psi^{(i)}|\hat{n}|\Psi^{(i)}\rangle^2 } \qquad (15)$$

where

$$\hat{n} = \sum_{n} n \, c^{\dagger}_{n} c_{n}. \qquad (16)$$

For a single-hopping model with weak disorder, the localization length scales as $L \sim (t/\gamma)^2$ for electrons near the middle of the band [44], with t the hopping matrix element which determines the band width. The more complicated effective hamiltonian considered here is not amenable to simple analytic treatment.

III. RESULTS AND DISCUSSION

We begin our discussion with an overview of electronic states in single bases and isolated base pairs. The structure of the base pairs is shown in Fig. 3 with the atoms in each base labeled for future reference. These calculations will set the stage for a proper interpretation of the behavior in the stretched and unstretched dry, neutral DNA helix.

A. Frontier states

The frontier states in the base pairs are related to only one component of the pair for both AT and CG. This is shown in Fig. 4. Specifically, the HOMO state of the AT pair is exactly the same as the HOMO state of the isolated A, and the LUMO state of AT the same as that of the isolated T. Similarly, the HOMO state of CG is identified with that of the isolated G and the LUMO state with that of the isolated C. Thus, the purines (A or G) give rise to the HOMO state, while the pyrimidines (T or C) are responsible for the LUMO states of each pair. It is clear from the same figure that essentially all atomic pz orbitals which belong to a purine or pyrimidine contribute to the respective HOMO or LUMO π state of the base pair. This is in agreement with calculations on the optical absorption spectra of DNA bases and base pairs [45]. A closer inspection of Fig.
4 shows that the molecular frontier states of both AT and CG can be identified as similar contributions (up to sign changes) from specific groups of carbon and nitrogen atoms. Specifically, in the purines (A and G) three distinct groups of atoms are mainly involved in forming the HOMO orbital and include atoms (C8-N7), (C2-N3) and (N1-C6-C5-C4-N9), respectively. In the pyrimidines (T and C) the main groups involved in forming the LUMO orbital are two, (C4-C5-N1) and (N3-N7-C6). In both base pairs the atoms that are less involved in the frontier molecular states are the carbon atoms that form a double bond with an oxygen atom, such as C2 of A and C and the four-fold bonded C7 atom of A.

The frontier states are very little affected when the two components of the base pair are separated along the direction in which they are hydrogen-bonded. To demonstrate this, we show in Fig. 5 the change in the eigenvalues of the frontier states in AT and CG as a function of the distance between the two atoms that are bonded to the two backbones (we call this the backbone distance). For both base pairs the nitrogen atoms labeled N1 and N9 are the ones attached to the backbone (see Fig. 3). In order to obtain realistic structures, for each value of the backbone distance we hold the atoms of each base that are bonded to the backbone fixed and allow all other atoms to relax fully. These calculations were performed with the SIESTA code [33] and the relaxed configurations were used as input to calculate the electronic structure with the other two methodologies [34, 35]. In Fig. 5 we show complete results from the SIESTA calculations and selected results from one of the other two approaches. The results of Fig.
5 show clearly that only in the region where the backbone distance becomes significantly smaller than the equilibrium value does interaction between the two bases shift the eigenvalues of the electronic states appreciably, but even then the shifts are relatively small for the frontier states. It is also noteworthy that the band gap of the AT pair is significantly larger (∼ 3 eV) than that of the CG pair (∼ 2 eV) and that the frontier states of CG lie within the band gap of the AT pair. This observation is important because it indicates that in an arbitrary sequence of base pairs, the frontier states will be associated with those of the CG pairs. This statement is verified by calculations of electronic states in the AT-AT, CG-CG and AT-CG base pair combinations, to which we turn next.

For more detailed comparisons, we collect in Table I the eigenvalues of the frontier states for the DNA pairs and the pair combinations, at different equilibrium configurations in the three relevant variables, the backbone distance, the axial distance and the rotation angle. Some results on the CG-GC base pair combination are also shown, to allow for comparison to the poly(C)-poly(G) sequence.

                     min       HOMO     LUMO    gap
Backbone distance
  AT                 8.67 Å    −1.63    1.60    3.23
  CG                 8.73 Å    −0.80    1.31    2.11
Axial distance
  AT-AT              3.67 Å    −1.33    1.37    2.70
  CG-CG              3.52 Å    −0.46    0.95    1.41
  AT-CG              3.36 Å    −0.71    1.00    1.71
Rotation angle
  AT-AT              36°       −1.48    1.58    3.06
                     108°      −1.45    1.68    3.13
                     180°      −1.55    1.63    3.18
  CG-CG              36°       −0.52    1.22    1.74
                     108°      −0.64    1.54    2.18
                     180°      −0.94    1.60    2.54
  CG-GC              36°       −0.86    1.51    2.37
                     108°      −0.66    1.43    2.09
                     180°      −0.60    1.12    1.72
  AT-CG              36°       −0.73    1.38    2.11
                     108°      −0.59    1.27    1.86
                     180°      −0.81    1.25    2.06

TABLE I: Eigenvalues (in eV) of the frontier states for the DNA base pairs and the base-pair combinations, at the equilibrium configurations for the backbone distance, the axial distance (at zero relative angle of rotation) and the angle of rotation (at the equilibrium axial distance). The column labeled “min” gives the values of the distances and the angle at the equilibrium configurations. Due to symmetry the values for the minima at rotation angles larger than 180° are similar to those given here and are not shown.

When two base pairs are stacked on top of each other, there are two degrees of freedom for motion of one relative to the other: a separation along the helical axis, which we will call axial distance, and a relative rotation around the helical axis. We take the helical axis to be that which corresponds to stacking of successive base pairs in the B form of the DNA double helix. According to the notation of Fig. 3, the helical axis for both base pairs is normal to the line connecting atoms C4 and C6 and is closer (about one third of their distance) to the purine atom C6. For each configuration we fix the atoms that are bonded to the backbone at a given relative position and allow all other atoms to relax, as was done in the calculations involving the backbone distance discussed above. In Fig. 6 we show the behavior of electronic eigenvalues as a function of the axial distance and the rotation angle. As above, the eigenvalues show little dependence on these two variables, except for rather small values of the axial distance which correspond to unphysically small separation between the two base pairs.

             HOMO     LUMO
ε (eV)       3.12     −0.09
t1 (meV)     14.0     −0.29
t2 (meV)     2.60     0.04
t3 (meV)     0.09     0.26

TABLE II: Parameters for the on-site (ε) and hopping matrix elements (ti, i = 1, 2, 3), for the HOMO and LUMO states of unstretched poly(CG)-poly(CG) DNA.

What is also remarkable in the above results is that in the AT-CG combination, the frontier states are clearly identified with those corresponding to the CG pair exclusively, which has the smaller band gap (see Fig. 6).
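As a numerical cross-check of the effective model, the HOMO parameters of Table II can be inserted into the hamiltonian of Eq. (9) and compared with the band energies of Eq. (11), and a disorder term of the form of Eq. (13) can then be added to evaluate the spread of Eq. (15). The following is a minimal sketch under stated assumptions, not the authors' code: the chain length (200 sites), the two disorder widths, the use of a ring for the clean chain and open ends for the disordered chain, and all function names are illustrative choices, and γ is taken as the standard deviation of the Gaussian.

```python
import numpy as np

# HOMO parameters of unstretched poly(CG)-poly(CG) from Table II, in eV
EPS, T1, T2, T3 = 3.12, 14.0e-3, 2.60e-3, 0.09e-3


def hamiltonian(n_sites, onsite=None, periodic=False,
                eps=EPS, t1=T1, t2=T2, t3=T3):
    """Matrix of Eq. (9); `onsite` adds the disorder term of Eq. (13)."""
    h = eps * np.eye(n_sites)
    for n in range(n_sites):
        # t1 links even n to n+1, t2 links odd n to n+1, t3 links n to n+2
        for step, t in ((1, t1 if n % 2 == 0 else t2), (2, t3)):
            m = n + step
            if m >= n_sites:
                if not periodic:
                    continue
                m -= n_sites
            h[n, m] += t
            h[m, n] += t
    if onsite is not None:
        h = h + np.diag(onsite)
    return h


def band_energies(n_sites, eps=EPS, t1=T1, t2=T2, t3=T3):
    """Sorted E_k^± of Eq. (11) on the k-grid of a ring with n_sites sites."""
    k = 2.0 * np.pi * np.arange(n_sites // 2) / n_sites
    centre = eps + 2.0 * t3 * np.cos(2.0 * k)
    root = np.sqrt(t1 ** 2 + t2 ** 2 + 2.0 * t1 * t2 * np.cos(2.0 * k))
    return np.sort(np.concatenate([centre - root, centre + root]))


def localization_lengths(h):
    """Eq. (15) for every eigenstate, in units of the base-pair index."""
    _, states = np.linalg.eigh(h)
    prob = np.abs(states) ** 2      # column i holds |Psi^(i)_n|^2
    n = np.arange(h.shape[0])
    mean_n = n @ prob
    mean_n2 = (n ** 2) @ prob
    return np.sqrt(np.maximum(mean_n2 - mean_n ** 2, 0.0))


N = 200  # illustrative chain length (even)
numeric = np.linalg.eigvalsh(hamiltonian(N, periodic=True))
print("Eq. (9) matches Eq. (11):",
      np.allclose(numeric, band_energies(N), rtol=0.0, atol=1e-10))

rng = np.random.default_rng(0)
for gamma in (1e-4, 1e-1):  # hypothetical disorder widths in eV
    u = rng.normal(0.0, gamma, N)
    lengths = localization_lengths(hamiltonian(N, onsite=u))
    print(f"gamma = {gamma:g} eV, mean L = {lengths.mean():.2f} sites")
```

Open ends are used for the disordered chain so that the position operator of Eq. (16) is well defined; on a ring the variance would depend on where the site index wraps around. With disorder much stronger than every hopping element the eigenstates collapse onto single sites, while for weak disorder they spread over a large part of the finite chain.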
Moreover, we note that the band gap of the poly(C)-poly(G) sequence, as calculated by the semi-empirical method based on a minimal atomic orbital basis [32], is in excellent agreement with the value obtained from the SIESTA calculation (2.0 eV and 2.1 eV, respectively). The band gap is expected to be significantly smaller in the case of wet DNA and in the presence of counterions, as shown in Ref. [46], for a Z-DNA helix. The band gaps between all three ab initio methods are identical within the accuracy of these methods. The nature of electronic wavefunctions obtained by the different methods is also in good qualitative agreement. Accordingly, in the rest of this paper we focus our attention on electron localization in the dry, neutral poly(CG)-poly(CG) sequence, and employ the results of the semi-empirical electronic structure method.

B. Hopping electrons

In Fig. 7, we show the unstretched and the three stretched forms of the poly(CG)-poly(CG) sequences at 30%, 60%, 90% elongation, along with the features of the frontier states. For visualization purposes, we represent the calculated wavefunction magnitude of the frontier states by blue (HOMO) and red (LUMO) spheres, centered at the sites where the atomic orbitals are located. The radius of the sphere centered on a particular atom is proportional to the magnitude of the dominant coefficient |c_ν^{(n)}|^2 at this site (see Eq. (1)), which is essentially proportional to the local electronic density. It is evident from this figure that the nature of the orbitals themselves, represented by the radii of the colored spheres, does not change much in the different stretched DNA forms, but the overlap between orbitals at neighboring bases is affected greatly by the amount of stretching. For the poly(CG)-poly(CG) sequence shown, the HOMO orbitals are always associated with the G sites for all the stretching modes, while the LUMO orbitals are related to the C sites.
However, as the DNA becomes more elongated, the orbitals overlap even less and become localized for high stretching modes. The elongation to the overstretched form is achieved by changing the dihedral angle configuration of the DNA backbone, which leaves the local part of the orbitals essentially intact. Note how the orbitals rotate and spread out as the structure is being overstretched, following the rotation of bases.

We now turn to a discussion of the results for the hopping matrix elements of Eq. (9). Our discussion here is relevant to what happens when the occupation of a frontier state is changed from complete filling (for the HOMO) or complete depletion (for the LUMO), that is, the physics of small amounts of hole or electron doping. In Table II we give the values for ε, t1, t2, t3 (see Fig. 2) for the two frontier states of the unstretched poly(CG)-poly(CG) DNA form. The hopping matrix elements for the HOMO state involve only the G sites; those for the LUMO state involve only the C sites. As a consistency check, we have also calculated matrix elements for farther neighbors and found those to be much smaller in magnitude.

We have calculated the values of t1, t2, t3 by repeating the same procedure as above for the stretched forms of the poly(CG)-poly(CG) DNA sequence. We note that if t2 = t3 = 0 electrons will not be able to migrate along the DNA molecule even if t1 is quite large, because at least one of the other two hops is necessary for migration (see Fig. 2). From this simple picture, it is evident that the conductivity will be determined by which matrix element dominates. Quantitatively, the “bottleneck” hopping matrix element is given by

\[ t = \max\left( \min(|t_1|, |t_2|), |t_3| \right). \qquad (17) \]

In Fig. 8 we show the value of the “bottleneck” hopping matrix element calculated as a function of stretching.
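As a small numerical check of Eq. (17), the sketch below evaluates the bottleneck element for the unstretched values listed in Table II. The function and variable names are illustrative and not part of the original work, and the rule used in `dominant_process` to pick which hop limits transport is an assumed reading of the "dominant hopping process" label, not a definition taken from the paper.

```python
def bottleneck_hopping(t1, t2, t3):
    """Eq. (17): t = max(min(|t1|, |t2|), |t3|); all inputs in the same units."""
    return max(min(abs(t1), abs(t2)), abs(t3))


def dominant_process(t1, t2, t3):
    """Index (1, 2 or 3) of the hop whose magnitude equals the bottleneck.

    Assumption: the 'dominant' process is read as the hop that attains
    the value returned by Eq. (17).
    """
    via_12 = min(abs(t1), abs(t2))
    if abs(t3) >= via_12:
        return 3
    return 1 if abs(t1) <= abs(t2) else 2


# Hopping elements from Table II, in meV, unstretched poly(CG)-poly(CG).
TABLE_II_MEV = {
    "HOMO": (14.0, 2.60, 0.09),
    "LUMO": (-0.29, 0.04, 0.26),
}

for state, hops in TABLE_II_MEV.items():
    t = bottleneck_hopping(*hops)
    print(f"{state}: bottleneck = {t:.2f} meV via process ({dominant_process(*hops)})")
```

With the Table II inputs this evaluates to 2.60 meV through process (2) for the HOMO and 0.26 meV through process (3) for the LUMO. Setting t2 = t3 = 0 returns zero regardless of t1, which matches the remark that a large t1 alone does not allow migration.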
Figure 8 indicates that the hopping conductivity will decrease dramatically, by several orders of magnitude, upon stretching the molecule and that the hopping will decrease more from stretching in the 3’-3’ mode than in the 5’-5’ mode. This is due to the conformational changes induced by the different stretching modes, described earlier.

C. Localization length

The significant drop in the hopping matrix elements upon stretching, as described in the previous section, is indicative of electron localization in the presence of a weak amount of disorder. To investigate this possibility in detail, we focus on effects of stretching in the 3’-3’ mode. The evolution of the density of HOMO states upon stretching is shown in Fig. 9; similar behavior is observed for the LUMO states. The dramatic narrowing of the DOS width (equivalent to reduced dispersion in a band-structure picture) is strongly suggestive of electron localization [47], in this case induced by stretching. This localization length is controlled by the hopping elements t, since ε is the same at each site.

For a more quantitative description, we show in Fig. 9 the localization length L_i for each eigenstate for a 1500 base-pair DNA strand under different amounts of stretching. The value of L_i for each state is obtained from Eq. (15), with disorder strength γ = 0.3 meV, which determines the width of the gaussian given in Eq. (14). This disorder strength is much smaller than the band width of the unstretched DNA, but becomes comparable to the band width as the molecule is stretched. The magnitude of such variations in on-site energies is consistent with those produced by the dipole potential terms, for instance, due to the presence of a stray water molecule situated on the substrate roughly 15 Å away from the DNA bases. We find that changing the value of γ by an order of magnitude (either smaller or larger) does not affect the qualitative picture presented here.
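The qualitative link between a shrinking hopping element and a shrinking localization length can be illustrated with a toy model. The sketch below is not the procedure of Eqs. (14) and (15); it assumes a one-dimensional nearest-neighbor chain with a single hopping element t, gaussian on-site disorder of standard deviation γ, and a standard transfer-matrix (Lyapunov exponent) estimate of the localization length at the band center. All names and the choice of chain length are hypothetical.

```python
import math
import random


def localization_length(t, gamma, energy=0.0, n_sites=200_000, seed=0):
    """Transfer-matrix estimate of the localization length, in units of sites.

    Toy model (an assumption, not the paper's Eq. (15)): a chain obeying
        eps_n * psi_n + t * (psi_{n+1} + psi_{n-1}) = E * psi_n,
    with eps_n drawn from a gaussian of standard deviation `gamma`.
    `t`, `gamma` and `energy` share the same units (e.g. meV).
    """
    rng = random.Random(seed)
    psi_prev, psi = 0.0, 1.0
    log_growth = 0.0
    for _ in range(n_sites):
        eps = rng.gauss(0.0, gamma)
        # One step of the recursion psi_{n+1} = ((E - eps_n) / t) psi_n - psi_{n-1}.
        psi_prev, psi = psi, ((energy - eps) / t) * psi - psi_prev
        # Renormalize at every step to avoid overflow, accumulating the log growth.
        norm = math.hypot(psi_prev, psi)
        log_growth += math.log(norm)
        psi_prev /= norm
        psi /= norm
    lyapunov = log_growth / n_sites  # inverse localization length
    return 1.0 / lyapunov


if __name__ == "__main__":
    gamma = 0.3  # meV, the disorder strength quoted in the text
    for t in (3.0, 0.3):  # illustrative hopping values in meV, not fitted data
        print(f"t = {t} meV -> L ~ {localization_length(t, gamma):.1f} sites")
```

In this toy setting the estimated length drops sharply once t becomes comparable to γ, which is the trend described above; the absolute numbers depend on the assumed model and should not be compared directly with Fig. 9.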
Note that the localization length is not a strict function of the energy, as it depends on the disorder near where a given state happens to be localized. As the molecule is stretched, the localization length dramatically decreases until, for 60% stretching, the eigenstates are completely localized on single base pairs.

The charge localization length as a function of DNA stretching has been recently studied in the experiment of Heim et al. [48]. This study focuses on λ-DNA which has an irregular sequence of base pairs, and can be compared to our theoretical results for poly(CG)-poly(CG) recalling that the frontier states even for a random sequence are associated with those of the CG base-pairs. In the experiment, ropes of λ-DNA on a substrate are overstretched by a receding meniscus technique. The DNA ropes in this experimental setup are slightly positively charged, corresponding to a depletion of a few electrons per 1000 base pairs. We suggest that this situation is approximated by the structures of dry and neutral DNA that we considered above. Electrons were injected into the DNA and the resulting localization length was measured by an electron force microscope. For the unstretched DNA, the charge was found to delocalize across the entire molecule, extending over a length of several microns. On the other hand, the charge injected into the overstretched DNA is localized, extending over a few hundred nanometers only. This is qualitatively consistent with the picture that emerges from our theoretical analysis, and is even in reasonable quantitative agreement: the degree of localization in experiment, measured by the ratio of length scales going from unstretched to stretched DNA structures, is approximately two orders of magnitude, while the same quantity in our calculations, going from unstretched to 60% stretched DNA is ∼ 10^3.

IV.
SUMMARY

We have described and implemented a multiscale method to derive effective hamiltonian models that are able to capture the dynamics of conduction and valence electrons in stretched DNA, starting from ab initio, all-atom quantum mechanical calculations. The ab initio simulations revealed that the frontier states in the base pairs are related to only one component of the pair. The purines were found to be associated with the HOMO states while the pyrimidines with the LUMO states. In the AT-CG combination the frontier states are identified with those of the CG pair. For all combinations of bases and base pairs studied here, the nature of these states was not affected by separation of the bases or base pairs along different directions or rotation along the helical axis.

Turning to the next length scale and the semi-empirical calculations, we have calculated the “bottleneck” matrix elements for electron hopping along the DNA molecule, as a function of stretching. These show a significant decrease with elongation of DNA, which is stronger for stretching in the 3’-3’ mode than in the 5’-5’ mode. We were able to show quantitatively that stretching of DNA dramatically narrows the DOS width of frontier states. A small amount of disorder produced by environmental factors will naturally lead to localization of the electrons along the DNA. Our estimate for the degree of localization, based on a reasonable (and quite small) amount of disorder in the on-site energies for the electron states, is in very good agreement with recent experimental observations. This provides direct validation for the consistency and completeness of the multiscale method presented here.

Acknowledgements: The authors are grateful to Richard Lavery for providing the overstretched structures. MF acknowledges support by Harvard’s Nanoscale Science and Engineering Center, funded by the National Science Foundation, Award Number PHY-0117795.

[1] J. D. WATSON and F. H. C.
CRICK, Nature 171 (1953) 737 . [2] D. D. ELEY and D. I. SPIVEY, Trans. Faraday Soc. 58 (1961) 411. [3] C. J. MURPHY, M. R. ARKIN, Y. JENKINS, N. D. GHATLIA, S. H. BOSSMANN, N. J. TURRO, and J. K. BARTON, Science 262 (1993) 1025. [4] E. BRAUN, Y. EICHEN, U. SIVAN, and G. BEN- YOSEPH, Nature 391 (1998) 775. [5] P. DE PABLO, F. MORENO-HERRERO, J. COLCHERO, J. HERRERO, P. HERRERO, A. BARO, P. ORDEJÓN, J. SOLER, and E. ARTA- CHO, Phys. Rev. Lett. 85 (2000) 4992. [6] A. J. STORM, J. VAN NOORT, S. DE VRIES, and C. DEKKER, Appl. Phys. Lett. 79 (2001) 3881. [7] Y. ZHANG, R. H. AUSTIN, J. KRAEFT, E. C. COX, and N. P. ONG, Phys. Rev. Lett. 89 (2002) 198102. [8] Y. BABA, T. SEKIGUCHI, I. SHIMOYAMA, N. HI- RAO, and K. .G. NATH, Phys. Rev. B 74 (2006) 205433. [9] D. PORATH, A. BEZRYADIN, S. DE VRIES, and C. DEKKER, Nature 403 (2000) 635. [10] H.-W. FINK and C. SCHONENBERGER, Nature 398 (1999) 407. [11] L. CAI, H. TABATA, and T. KAWAI, Appl. Phys. Lett. 77 (2000) 3105. [12] P. TRAN, B. ALAVI, and G. GRUNER, Phys. Rev. Lett. 85 (2000) 1564. [13] H. COHEN, C. NOGUES, R. NAAMAN, and D. PO- RATH, Proc. Natl. Acad. Sci., 102 (2005) 11589. [14] A. KASUMOV, M. KOCIAK, S. GUERON, B. REULET, V. VOLKOV, D. KLINOV, and H. BOUCHIAT, Science 291 (2001) 280. [15] R. G. ENDRES, D. L. COX, and R. R. P. SINGH, Rev. Mod. Phys. 76 (2004) 195. [16] M.H.F. WILKINS, R.G. GOSLING, and W.E. SEEDS, Nature (London) 167 (1951) 759. [17] S. B. SMITH, Y. J. CUI, and C. BUSTAMANTE, Sci- ence 271 (1996) 795. [18] P. CLUZEL, A. LEBRUN, C. HELLER, R. LAVERY, J. L. VIOVY, D. CHATENAY, and F. CARON, Science 271 (1996) 792. [19] T. R. STRICK, J. F. ALLEMAND, D. BENSIMON, and V. CROQUETTE, Annu. Rev. Biophys. Biomolec. Struct. 29 (2000) 523. [20] Y. YAMADA, Int. J. Mod. Phys. B, 18 (2004) 1697. [21] K. IGUCHI, Int. J. Mod. Phys. B, 18 (2004) 1845. [22] J. Y. YI, Phys. Rev. B 68 (2003) 193103. [23] C. M. CHANG, A. H. C. NETO, and A. R. BISHOP, Chem. Phys. 303 (2004) 189. [24] R. A. CAETANO and P. A. SCHULZ, Phys. 
Rev. Lett., 95 (2005) 126601. [25] R. N. BARNETT, C. L. CLEVELAND, A. JOY, U. LANDMAND, G. B. SCHUSTER, Science, 294 (2001) 567. [26] E. ARTACHO, M. MACHADO, S. SÁNCHEZ- PORTAL, P. ORDEJÓN, J. M. SOLER, Molec. Phys. 101 (2003) 1587. [27] S. S. ALEXANDRE, E. ARTACHO, J. M. SOLER, Phys. Rev. Lett. 91 (2003) 108105. [28] K. SENTHILKUMAR et al, J. Am. Chem. Soc. 127 (2005) 14894. [29] A. LEBRUN and R. LAVERY, Nucl. Ac. Res. 24 (1996) 2260. [30] P. HOHENBERG and W. KOHN, Phys. Rev. 136 (1964) B864; W. KOHN and L. J. SHAM, Phys. Rev. 140 (1965) A1133. [31] J. P. PERDEW and A. ZUNGER, Phys. Rev. B 23 (1981) 5048. [32] M. ELSTNER, D. POREZAG, G. JUNGNICKEL, J. EL- SNER, M. HAUGK, T. FRAUENHEIM, S. SUHAI, and G. SEIFERT, Phys. Rev. B 58 (1998) 7260. [33] J. M. SOLER, E. ARTACHO, J. D. GALE, A. GARCÍA, J. JUNGUERA, P. ORDEJÓN, and D. SÁNCHEZ- PORTAL, J. Phys.: Condens. Matter, 14 (2002) 2745. [34] G. KRESSE and J. FURTHMÜLLER, Phys. Rev. B, 54 (1996) 11169. [35] U. V. WAGHMARE, H. KIM, I. J. PARK, N. MO- DINE, P. MARAGAKIS, E. KAXIRAS, Computer Phys. Comm., 137, (2001) 341. [36] J. P. PERDEW and Y. WANG, Phys. Rev. B 45 (1992) 13244; J. P. PERDEW, K. BURKE, and M. ERNZER- HOF, Phys. Rev. Lett. 77 (1996) 3865. [37] N. TROUILLER and J. L. MARTINS, Phys. Rev. B, 43 (1991) 8861. [38] D. VANDERBILT, Phys. Rev. B, 41 (1990) 7892. [39] D. R. HAMANN, M. SCHLUTER and C. CHIANG, Phys. Rev. Lett. 43 (1979) 1494. [40] P. MARAGAKIS, R. L. BARNETT, E. KAXIRAS, M. ELSTNER, and T. FRAUENHEIM, Phys. Rev. B 66 (2002) 241104(R). [41] N. MARZARI and D. VANDERBILT, Phys. Rev. B 56 (1997) 12847. [42] C. SGIAROVELLO,M. PERESSI, and R. RESTA, Phys. Rev. B 64 (2001) 115202 [43] Y-S. LEE, M. BUONGIORNO NARDELLI, and N. MARZARI, Phys. Rev. Lett. 95 (2005) 076804. [44] D. J. THOULESS, J. Phys. C 5 (1972) 77. [45] D. VARSANO, R.I. FELICE, M. A. .L. MARQUES, and A. RUBIO, J. Phys. Chem. B 110 (2006) 7129. [46] F. L. GERVASIO, P. CARLONI, M. PARRINELLO, Phys. Rev. Lett. 89 (2002) 108102. [47] P. W. 
ANDERSON, Phys. Rev. 109 (1958) 1492. [48] T. HEIM, T. MELIN, D. DERESMES, D. VUILLAUME, Appl. Phys. Lett. 85 (2004) 2637. [49] R. R. SINDEN, in “DNA Structure and Function”, (Academic Press, London, 1994).

[Figure 1 panel labels: ab-initio density functional theory; semi-empirical electronic structure; effective tight-binding hamiltonian. Size labels as extracted: 1130 atoms, 94000, 1500 b.]

FIG. 1: Schematic illustration of the different scales included in the current multiscale model: The two pictures on the left are atomistic systems simulated with different computational approaches (ab initio density functional theory and semi-empirical electronic structure, respectively). The picture on the right represents a rope composed of DNA molecules, as in experiments [48], which is treated by an effective tight-binding hamiltonian constructed from the atomistic scale calculations.

FIG. 2: Schematic depiction of electron hopping in poly(CG)-poly(CG) DNA for the HOMO state. The hopping matrix elements ti are denoted by the indices (i) = (1), (2), (3). Electrons are localized on the G bases. For the LUMO state, the hopping is similar with electrons localized on the C bases.

FIG. 3: The DNA base pairs AT (top) and CG (bottom), with the atoms labeled. The purines (A, G) are on the right, the pyrimidines (T, C) on the left. Atom labeling follows standard notation convention [49]. All rotations were performed with respect to the helical axis denoted by the black circle (see text).

FIG. 4: The frontier states in the base pairs and their identification with corresponding orbitals in the isolated bases. The middle figure in each panel shows the total charge density on the plane of the base pair, with higher values of the charge density in red and lower values in blue. The figure on the left shows the HOMO state and the figure on the right shows the LUMO state, where red and blue isosurfaces correspond to positive and negative values of the wavefunctions. The labels on the left denote the type of bases and base pairs.
[Figure 5: panels AT and CG; horizontal axis: backbone distance (Å), 7 to 11; curve labels: T-LUMO, A-HOMO, C-LUMO, G-HOMO.]

FIG. 5: Eigenvalues of states in the AT and CG base pairs as a function of backbone distance. In each case three states are included above and below the band gap. Lines are results from SIESTA calculations, points are results from HARES calculations (see text). The frontier orbitals in both pairs are related to one component of the pair as indicated by the labels. The equilibrium backbone distance is denoted by a vertical dashed line.

[Figure 6: panels AT-AT, CG-CG and AT-CG; horizontal axes: axial distance (Å), 2.9 to 5.4, and angle (deg.), 0 to 360; curve labels: T-LUMO, A-HOMO, C-LUMO, G-HOMO.]

FIG. 6: Eigenvalues of states in the AT-AT, CG-CG and AT-CG base pair combinations as a function of the distance along the helical axis (at zero angle of rotation) and the rotation angle around the helical axis (at the equilibrium axial distance). Lines are results from SIESTA calculations, points are results from VASP calculations (see text). In each case three states are included above and below the band gap. The values of the distance or the rotation angle that correspond to equilibrium configurations are indicated by vertical dashed lines (there are five almost equivalent local minima in rotation). As in Fig. 5, frontier orbitals are identified as the corresponding orbital of one base only.

[Figure 7: columns labeled 3’-3’ and 5’-5’.]

FIG. 7: The DNA structures for the unstretched (top) and the different amounts of stretching in the 3’-3’ and the 5’-5’ modes with features of the frontier orbitals described by the blue (HOMO) and red (LUMO) spheres (see text for details). For both modes the amount of stretching is (a) 30%, (b) 60%, and (c) 90% relative to the unstretched structure, which is the B-DNA form. The 3’-5’ orientations of the poly(CG)-poly(CG) sequence are shown in the left panel at 90% stretching, where the structure is easier to visualize.
[Figure 8: horizontal axis: % stretching, 0 to 90; curves: HOMO 3’-3’, HOMO 5’-5’, LUMO 3’-3’, LUMO 5’-5’; dominant-process labels (2), (2), (2), (3).]

FIG. 8: The frontier state “bottleneck” hopping matrix elements as given by Eq. (17) for the different types (3’-3’ or 5’-5’) and amounts of stretching of poly(CG)-poly(CG) DNA. At each value of stretching, the dominant hopping process is indicated in parentheses.

[Figure 9: horizontal axis: energy (meV), −20 to 20.]

FIG. 9: (bottom) The density of electronic states for the HOMO state stretched in the 3’-3’ mode. For comparison, the on-site energy parameter, ε, has been set to zero. (top) The localization length L_i, defined in Eq. (15), is computed for each eigenstate with disorder strength γ = 0.3 meV.

ABSTRACT

When the DNA double helix is subjected to external forces it can stretch elastically to elongations reaching 100% of its natural length. These distortions, imposed at the mesoscopic or macroscopic scales, have a dramatic effect on electronic properties at the atomic scale and on electrical transport along DNA. Accordingly, a multiscale approach is necessary to capture the electronic behavior of the stretched DNA helix. To construct such a model, we begin with accurate density-functional-theory calculations for electronic states in DNA bases and base pairs in various relative configurations encountered in the equilibrium and stretched forms. These results are complemented by semi-empirical quantum mechanical calculations for the states of a small size [18 base pair poly(CG)-poly(CG)] dry, neutral DNA sequence, using previously published models for stretched DNA. The calculated electronic states are then used to parametrize an effective tight-binding model that can describe electron hopping in the presence of environmental effects, such as the presence of stray water molecules on the backbone or structural features of the substrate. These effects introduce disorder in the model hamiltonian which leads to electron localization.
The localization length is smaller by several orders of magnitude in stretched DNA relative to that in the unstretched structure. <|endoftext|><|startoftext|> Introduction

Both from a biregular and a birational standpoint, the geometry of algebraic varieties is often studied in terms of their fibrations. Given a smooth complex projective variety X, there are two parallel theories able to detect on X fibrations whose fibers are varieties of negative Kodaira dimension. The first one, initiated by Mori, associates a morphism X → S, whose general positive dimensional fiber is a Fano variety, to any K_X-negative extremal ray of the cone of effective curves of X. The second theory, introduced in works of Campana and of Kollár, Miyaoka and Mori, produces a rational map X ⇢ S, with proper and rationally chain connected fibers within its domain of definition, from any family of rational curves on X, the most studied case being the one in which the family of all rational curves of X is considered.

The purpose of this paper is to study extension properties of fibrations of the types described above, when these fibrations are defined on a smooth subvariety Y of X with ample normal bundle: our goal is to determine additional conditions that guarantee that given fiber structures on Y extend to analogous structures on X, and to compare the corresponding fibrations. Following the terminology of [13], we will refer to such a Y as an “ample subvariety” of X. We remark that, in the special case of codimension one, this setting is more general than the classical setting of ample divisors.

Our first result is an extension property for rationally connected fibrations. The precise formulation requires some additional notation, and is given in Theorems 3.1 and 3.6.

1991 Mathematics Subject Classification. Primary: 14D06, 14J10. Secondary: 14C05, 14J40, 14N30. Key words and phrases.
Ample subvarieties, rationally connected fibrations, families of rational curves, special varieties, extension of maps, Mori contractions. All authors acknowledge support by MIUR National Research Project “Geometry on Algebraic Varieties” (Cofin 2004). The research of the second author was partially supported by NSF grants DMS 0111298 and DMS 0548325. The third author acknowledges partial support by the University of Milan (FIRST 2003). http://arxiv.org/abs/0704.0661v3

2 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI

Roughly speaking, Theorem 3.1 says that if Y is a submanifold with ample normal bundle, the inclusion inducing a surjection N1(X) → N1(Y), and V is a family of rational curves on X inducing a covering family V_Y on Y, then there is a commutative diagram

\[
\begin{array}{ccc}
Y & \hookrightarrow & X \\
{\scriptstyle \pi}\,\downarrow & & \downarrow\,{\scriptstyle \varphi} \\
Y/\!/V_Y & \xrightarrow{\ \delta\ } & X/\!/V,
\end{array}
\]

where φ is the rational map associated to the family V, π is the map associated to V_Y, and δ is a surjective morphism of normal varieties; here X//V and Y//V_Y denote the respective rational quotients.

With the second theorem, we determine sufficient conditions to ensure that the morphism δ in the above diagram is generically finite; this is one of the core results of the paper (see Theorem 3.6).

Theorem A. Assuming that the normal bundle of Y in X is ample and that the inclusion of Y induces a surjection N1(X) ↠ N1(Y), the morphism δ is generically finite if one of the following two conditions is satisfied:
(a) V is an unsplit family; or
(b) codim_X Y < dim Y − dim Y//V_Y.

When the rational quotient Y//V_Y is one-dimensional, it turns out from a general fact that π is a morphism (see Proposition 3.12); using this fact, the previous result can be improved in this case, and we obtain a commutative diagram as the one above where now the vertical arrows are morphisms and δ is a finite morphism between smooth curves (see Corollary 3.13).
The situation when Y is the zero scheme of a regular section of an ample vector bundle on X and π is the MRC-fibration over a base of positive geometric genus was also treated in [25] (see Remark 3.10).

Next, we address the problem of extending extremal Mori contractions of fiber-type. Using the previous result, we prove the following theorem (see Theorem 4.1).

Theorem B. Let Y be a submanifold of a projective manifold X with ample normal bundle, and assume that the inclusion induces an isomorphism N1(X) ≅ N1(Y). Let π : Y → Z be an extremal Mori contraction of fiber-type, and let W be an irreducible covering family of rational curves on Y that are contracted by π. If these curves do not break in X under deformation (that is, if the family they generate in X is unsplit), then π extends to an extremal Mori contraction φ : X → S, and there is a commutative diagram

\[
\begin{array}{ccc}
Y & \hookrightarrow & X \\
{\scriptstyle \pi}\,\downarrow & & \downarrow\,{\scriptstyle \varphi} \\
Z & \xrightarrow{\ \delta\ } & S,
\end{array}
\]

where δ is a finite surjective morphism.

In the special case when Y is defined by a regular section of an ample vector bundle on X, related results were obtained in [1, 9, 25] (see Remark 4.3).

AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 3

In the last section of the paper, the above results are finally applied to classify, under suitable conditions, projective manifolds containing “ample subvarieties” that admit a structure of projective bundles or quadric fibrations. Beginning with classical results on hyperplane sections, evidence has accumulated that projective manifolds are, so to speak, at least as “special” as their ample divisors (cf. [26]). The study of projective bundles and quadric fibrations embedded in projective manifolds as zero schemes of regular sections of ample vector bundles was already undertaken in [18, 19, 20, 8, 25], where classification results were obtained when the base of the fibration, if positive dimensional, has positive geometric genus.
Under the additional assumption of a polarization on the ambient variety inducing a relatively linear polarization on the fibration, classification results were obtained in [7, 1, 21].

With some dimensional restrictions on the fiber structure of the subvariety in terms of its codimension, we extend such results in two ways. In our first generalization, we consider fibrations over curves and drop the hypothesis that the subvariety is defined by a regular section of an ample vector bundle, only assuming that the normal bundle is ample. In this case, since the positivity condition is local near Y, we additionally require that the inclusion induces an isomorphism on the Picard groups. In the second generalization, we allow the base of the projective fibration to have arbitrary dimension. In either case, we do not put any additional restriction on the base of the fibration and do not require a priori any global polarization (such a polarization will turn out to exist a posteriori). We state here the extension property that we obtain in the case of projective bundles.

Theorem C. Let Y be a submanifold of a projective manifold X with ample normal bundle, of codimension

\[ \operatorname{codim}_X Y < \dim Y - \dim Z. \qquad (1) \]

Assume that Y admits a projective bundle structure π : Y → Z over a smooth projective variety Z, and that one of the following two situations occurs:
(a) Z is a curve and the inclusion of Y in X induces an isomorphism Pic(X) ≅ Pic(Y);
(b) Y is the zero scheme of a regular section of an ample vector bundle on X.
Then π extends to a projective bundle structure π̃ : X → Z on X, giving a commutative diagram

\[
\begin{array}{ccc}
Y & \hookrightarrow & X \\
 & {\scriptstyle \pi}\searrow & \downarrow\,{\scriptstyle \tilde{\pi}} \\
 & & Z
\end{array}
\]

and the fibers of π are linearly embedded in the fibers of π̃.

A similar statement is also proven in the case of quadric fibrations with irreducible reduced fibers and relative Picard number one. For the full result, see Theorem 5.8.
The proofs of the above results rely on various properties concerning families of rational curves on projective manifolds and their associated rational quotients, which are collected in Section 2. General facts from deformation theory of rational curves needed in the sequel are recalled in this section, and several new notions are introduced and studied. Among other things, we study numerical properties of families of rational curves, introducing in particular the notion of numerically covering family (see Definition 2.18), which turns out to be very useful in our context. Appropriate notions for the extension and restriction of families of rational curves are also introduced in relation to given submanifolds of the ambient variety.

Acknowledgements. We would like to thank C. Araujo, L. Bădescu, C. Casagrande and P. Ionescu for useful discussions, and the referee for many precious remarks and suggestions. We are grateful to the Istituto Nazionale di Alta Matematica and to the University of Milan (FIRST 2003) for making this collaboration possible.

1. Conventions and basic notation

We work over the complex numbers, and use standard notation in algebraic geometry, although tensor product between line bundles is often denoted additively. For n ≥ 1, we denote by Q^n the smooth quadric hypersurface of P^{n+1}. We use the word general to mean that the choice is made outside a properly contained (Zariski) closed subset, and the word very general to mean that the choice is made outside a countable union of such subsets. For any projective algebraic variety X, we denote by N^1(X) := (Pic(X)/≡) ⊗ R the Néron–Severi space of X, and by N_1(X) := (Z_1(X)/≡) ⊗ R the space generated by numerical classes of curves on X. The dimension of these spaces, which is equal to the Picard number of X, is denoted by ρ(X).
For a morphism of algebraic varieties X → S, we set N_1(X/S) := (Z_1(X/S)/≡) ⊗ R, and we denote by ρ(X/S) the dimension of this space. The numerical class of a line bundle L (resp., of a curve C) on X is denoted by [L] (resp., [C]). We denote by NE(X) the closure of the cone in N_1(X) spanned by numerical classes of effective curves, and by Nef(X) its dual cone, namely, the cone in N^1(X) spanned by classes of nef divisors. In the case X is a projective variety with Pic(X) ≅ Z, we will denote by O_X(1) the ample generator of this group.

2. Families of rational curves on varieties

Throughout this section we consider morphisms from the projective line P^1 to a smooth projective variety X; for convenience, we fix once and for all two distinct points 0 and ∞ in P^1. We will use notation as in [15], some of which we recall below. For general facts on the theory of deformation of rational curves on varieties, we refer to [15, 6].

Let Hom(P^1, X) be the scheme parameterizing morphisms from P^1 to X. We will denote a morphism f : P^1 → X by [f] whenever we think of it as a point of Hom(P^1, X). If Σ ⊆ P^1 and Θ ⊆ X are closed subschemes, then we denote by Hom(P^1, X; Σ → Θ) the subscheme of Hom(P^1, X) parameterizing those morphisms f such that f(Σ) ⊆ Θ (the image f(Σ) of Σ is here intended scheme-theoretically). In the particular case Σ = {0} and Θ = {x} for some x ∈ X, we also use the notation Hom(P^1, X; 0 ↦ x). Note that, for any closed subscheme j : Σ ↪ P^1, there is a natural morphism j^* : Hom(P^1, X) → Hom(Σ, X) defined by j^*([f]) := [f ◦ j].

Definition 2.1. A morphism f : P^1 → X is free if it is non-constant and f^*T_X is nef.

We will use the following well-known properties (see [15, Theorem II.1.7]).

Proposition 2.2. Let f : P^1 → X be a non-constant morphism, and let x = f(0).
Let m_0 ⊂ O_{P^1} be the maximal ideal sheaf of 0 ∈ P^1.
(a) The Zariski tangent space of Hom(P^1, X) at [f] is isomorphic to H^0(P^1, f^*T_X), and Hom(P^1, X) is smooth at [f] if h^1(P^1, f^*T_X) = 0.
(b) Similarly, the Zariski tangent space of Hom(P^1, X; 0 ↦ x) at [f] is isomorphic to H^0(P^1, f^*T_X ⊗ m_0), and Hom(P^1, X; 0 ↦ x) is smooth at [f] if h^1(P^1, f^*T_X ⊗ m_0) = 0.
In particular, if f is free, then Hom(P^1, X) and Hom(P^1, X; 0 ↦ x) are smooth at [f].

For closed subschemes Θ ⊆ X and Σ ⊆ P^1, and any subset S ⊆ Hom(P^1, X), we denote S(Σ → Θ) := S ∩ Hom(P^1, X; Σ → Θ), and, in a similar way, we define S(0 ↦ x) for any point x ∈ X.

Definition 2.3. Given a subset S ⊆ Hom(P^1, X), the image of the universal map P^1 × S → X is called the locus of S, and is denoted by Locus(S). Analogous definitions are given for the loci of S(Σ → Θ) and S(0 ↦ x), which are respectively denoted by Locus(S; Σ → Θ) and Locus(S; 0 ↦ x).

We now restrict our attention to the open subscheme Hom_bir(P^1, X) of Hom(P^1, X) parameterizing morphisms that are birational to their images. The previous notation is adapted to Hom_bir(P^1, X) in the obvious way.

Definition 2.4. Any union V = ∪_{α∈A} V_α of irreducible components V_α of Hom_bir(P^1, X) is said to be a family of (parameterized) rational curves on X. If V consists of only one irreducible component of Hom_bir(P^1, X), then V is said to be an irreducible family. If Locus(V_α) is dense in X for every α ∈ A, then V is said to be a covering family.

Remark 2.5. The scheme Hom_bir(P^1, X) has at most countably many irreducible components. In particular, the same holds for every family of rational curves on X.

Proposition 2.6. Let V be an irreducible family of rational curves on X. Then the locus Locus(V) of V is dense in X if and only if f is free for a general [f] ∈ V.

Proof. Locus(V) is dense in X if and only if the differential of the universal map P^1 × V → X has rank equal to dim X at a general point (p, [f]) ∈ P^1 × V.
By [15, Proposition II.3.10], this occurs if and only if f is free for a general [f] ∈ V. □

Definition 2.7. Given any closed subset S ⊆ Hombir(P1,X), we denote by 〈S〉 the union of all irreducible components of Hombir(P1,X) that contain at least one irreducible component of S, and by 〉S〈 the union of all irreducible components of Hombir(P1,X) that are irreducible components of S. We call 〈S〉 the minimal family generated by S, and 〉S〈 the maximal family contained in S. Analogous notation and definitions will be adopted for S ⊆ Hom(P1,X).

6 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI

Definition 2.8. Let V be a family of rational curves on X. We say that a curve C ⊂ X is a V-curve if C = f(P1) for some [f] ∈ V. A V-chain of length ℓ is the union of ℓ distinct curves fi(P1) ⊂ X (1 ≤ i ≤ ℓ) parameterized by elements [fi] ∈ V such that fi+1(0) = fi(∞) for every 1 ≤ i ≤ ℓ − 1.

Consider a family V = ∪Vα of rational curves on X. Let Ṽα be the irreducible components of the normalization Hom^n_bir(P1,X) of Hombir(P1,X) that map to Vα, and let 𝒱 be the closure in Chow1(X) of the image of ∪Ṽα via the natural morphism

(2) Hom^n_bir(P1,X) → Chow1(X)

(see [15, Comment II.2.7]). By [15, Theorem I.5.10], 𝒱 is proper. Associated to 𝒱 (hence to V), one can define a proper proalgebraic relation on X, as explained in [15, Section IV.4] (see in particular [15, Example IV.4.10]). We denote this relation by RCV. We have that (x, y) ∈ RCV for two very general points x, y ∈ X if and only if there is a V-chain containing x and y.

Definition 2.9. X is said to be RCV-connected if there is only one RCV-equivalence class. We say that X is V-chain connected if every two points of X lie on a V-chain. More generally, a closed subset T ⊂ X is said to be V-chain connected if any two points of T lie on a V-chain supported on T.

Clearly if X is V-chain connected, then it is RCV-connected.

Definition 2.10.
A family of rational curves V is said to be an unsplit family if it is irreducible and, after normalization, its image in Chow1(X) via the map (2) is a proper scheme.

Remark 2.11. If V is an irreducible family of rational curves on X, and there is an ample vector bundle E on X such that det E · C < 2 rk E for a V-curve C, then V is unsplit. Indeed, if Σ_{i=1}^{k} [Ci] is any degeneration of the cycle [C], then det E · C = Σ_{i=1}^{k} det E · Ci ≥ k rk E by the ampleness of E. Thus k = 1 and [C1] lies in the image of the normalization of V in Chow1(X) via the map (2).

Proposition 2.12. If X is RCV-connected by a family V = ∪Vα with each Vα unsplit, then it is V-chain connected and N1(X) is generated by numerical classes of V-curves. In particular, if V is unsplit, then X has Picard number ρ(X) = 1.

Proof. Fix ℓ ≫ 0 such that every two very general points of X are connected by a V-chain of length ℓ, and let S ⊂ Chow1(X) be the subset parameterizing connected 1-cycles with ℓ components (counting multiplicities) supported on V-curves. Since each component Vα of V is unsplit, it follows that S is proper. By the choice of ℓ, there is an irreducible component T of S such that u(2) : U ×T U → X × X is dominant, where U → T is the universal family and u : U → X is the cycle map. Note that T is a closed subvariety of Chow1(X), and in particular it is proper. Therefore the image of u(2) is proper, and thus u(2) is surjective. This implies that X is V-chain connected. The second assertion follows from [15, Proposition IV.3.13.3]. □

If V = Hombir(P1,X), then it is a result of Campana [5] and Kollár–Miyaoka–Mori [16] that one can “pass to the quotient”. The natural generalization of this result when V is an arbitrary family of rational curves was later given by Kollár (see [15, Theorem IV.4.17]), and can be stated as follows.

Theorem 2.13. Let V be a family of rational curves on X.
Then there is an open set X◦ ⊆ X, a normal projective variety X//V, and a dominant morphism X◦ → X//V with connected fibers and proper over the image, whose very general fibers are RCV-equivalence classes in X.

Proof. We refer the reader to [15, Theorem IV.4.17] for the existence of a proper morphism X◦ → Y◦ onto a variety Y◦ with connected fibers and whose very general fibers are RCV-equivalence classes in X. Since X◦ is normal and the fibers are connected, Y◦ is normal. It follows by construction that Y◦ is quasi-projective. Then we can take X//V to be the normalization of the closure of Y◦ in some projective embedding. □

Definition 2.14. With the notation as in the previous theorem, the resulting rational map X ⇢ X//V is called the RCV-fibration of X, and X//V is the RCV-quotient. We also say that X//V (resp., X ⇢ X//V) is the rational quotient (resp., the rational fibration) defined by V. We denote by dim RCV := dim X − dim X//V the dimension of a very general RCV-equivalence class (which is the same as the dimension of a general fiber of X ⇢ X//V).

Remark 2.15. We have dim RCV > 0 if and only if V is a covering family. We would like to stress that X//V is well defined only up to birational equivalence; throughout the paper, we will often make suitable choices of rational quotients. We remark that, after possibly shrinking X◦ further, one can always take a smooth projective model for the rational quotient X//V.

Remark 2.16. A case of particular interest is when V = Hombir(P1,X). In this case X//V is called the MRC-quotient (maximal rational quotient) of X. It follows from a result of Graber, Harris and Starr [12] that in this situation X//V is not uniruled.

Every connected subset S ⊆ Hombir(P1,X) determines in a natural way a numerical class [S] ∈ N1(X), by taking the class of the curve f(P1) for an arbitrary [f] ∈ S. More generally, we give the following definition.

Definition 2.17.
For any subset S ⊆ Hombir(P1,X), we denote

R≥0[S] := Σ_{[f]∈S} R≥0[f(P1)] ⊂ N1(X).

We call R≥0[S] the cone numerically spanned by S. If S is connected, then we call [S] the numerical class of S.

Definition 2.18. A family of rational curves V = ∪α∈A Vα is said to be numerically covering if there is a subset A′ ⊆ A such that

R≥0[V] = Σ_{α∈A′} R≥0[Vα]

and Locus(Vα) is dense in X for every α ∈ A′. If V is a numerically covering family, then we denote by Vcov the subfamily of V consisting of all the covering irreducible components of V.

Remark 2.19. Note that by definition R≥0[Vcov] = R≥0[V] for every numerically covering family V of rational curves on X. Clearly, in the notation of the definition, we have ∪α∈A′ Vα ⊆ Vcov.

Proposition 2.20. Let V = ∪α∈A Vα and W = ∪β∈B Wβ be two numerically covering families of rational curves on X, and suppose that R≥0[V] ⊆ R≥0[W]. Then the RCW-fibration ψ : X ⇢ X//W factors through the RCV-quotient of X, i.e., we have a commutative diagram

            X
      φ ⇣       ⇣ ψ
    X//V  ⇢  X//W.

In particular, if R≥0[V] = R≥0[W], then V and W define the same rational fibration.

Proof. Consider a resolution of the indeterminacies of ψ, that is, a birational morphism σ : X′ → X and a morphism ψ′ : X′ → X//W such that ψ′ = ψ ◦ σ, and let E ⊂ X′ be the exceptional locus of σ. Note that the locus of indeterminacies of ψ is contained in X \ X◦, and hence, by Theorem 2.13, it does not dominate X//W, since it does not meet the very general RCV-equivalence class. Therefore we can assume that ψ′(E) is a proper closed subset of X//W.

Lemma 2.21. If C ⊂ X is a V-curve not contained in σ(E), then its proper transform C′ ⊂ X′ is mapped to a point in X//W by ψ′.

Proof of the lemma. Let B′ ⊂ B be the subset parameterizing the irreducible components of W that are covering, so that Wcov = ∪β∈B′ Wβ. Since R≥0[V] ⊆ R≥0[W] = R≥0[Wcov], we can pick general elements [gβ] ∈ Wβ for β ∈ B′, and numbers λβ ≥ 0, such that, denoting Γβ = gβ(P1), we have [C] = Σ_{β∈B′} λβ[Γβ] in N1(X).
By the definition of ψ, Theorem 2.13, and the fact that the Wβ are covering families for every β ∈ B′, we can assume that Γβ ∩ σ(E) = ∅ for every β. Let C′ and Γ′β be the proper transforms of C and Γβ on X′, and let D ⊂ X′ be the pull-back, via ψ′, of an ample divisor on X//W. Then ψ′∗[Γ′β] = 0, so we have σ∗D · Γβ = D · Γ′β = 0 for all β, and

D · C′ ≤ σ∗D · C = Σ_{β∈B′} λβ(σ∗D · Γβ) = 0.

Since D is the pull-back of an ample divisor on X//W, this implies that ψ′∗[C′] = 0. □

We can now conclude the proof of the proposition. We consider a very general fiber G of φ, and fix two very general points x, y ∈ G that are not contained in σ(E). By the generality of the choices, we can assume that both x and y are outside the locus of indeterminacies of ψ, hence ψ(x), ψ(y) ∉ ψ′(E). Furthermore, we can assume that x and y are connected by a chain of V-curves. By Lemma 2.21, every irreducible component of this chain that is not contained in σ(E) is mapped to a point by ψ. Therefore, since ψ(x) ∉ ψ′(E), we deduce that this chain is in fact disjoint from σ(E). Thus we can apply the lemma to each one of its components. Since the chain is connected, this implies ψ(x) = ψ(y). In view of the generality of the choice of x and y in G, we conclude that there exists a natural rational map X//V ⇢ X//W commuting with the respective projections. □

Definition 2.22. Let V ⊆ Hombir(P1,X) be any family of rational curves, and let a > 0 be an integer. We denote by ‖V‖ the largest family of rational curves on X such that R≥0[‖V‖] = R≥0[V]. We call ‖V‖ the family numerically generated by V.

Remark 2.23. Quite obviously, V is a subfamily of ‖V‖, and if V is a numerically covering family then so is ‖V‖.

The previous proposition implies the following useful property.

Corollary 2.24. Let V be a numerically covering family of rational curves on X. Then the families V, Vcov, and ‖V‖ define the same rational fibration.

Proof.
Since R≥0[‖V‖] = R≥0[V] = R≥0[Vcov], the assertion follows from Proposition 2.20. □

An important theorem, due to Kollár, Miyaoka and Mori, says that a smooth projective manifold is maximally rationally chain connected if and only if it is maximally rationally connected (see [15, Theorem IV.3.10]). The proof of this property can be adapted to the situation in which an arbitrary family of rational curves V is considered. More precisely, one can prove that if X is V-chain connected, then every two very general points of X are connected by a ‖V‖-curve. In the sequel, we will need the following slightly different version of this property (the proofs of the two properties are almost the same).

Proposition 2.25. Let V be a family of rational curves on X, and assume that X is V-chain connected. Let y ∈ X be any point that is connected by a V-chain to a very general point of X. Then Locus(‖V‖, 0 ↦ y) is dense in X.

Proof. Fix a very general point x ∈ X, and let C = C1 + · · · + Cn be a V-chain connecting x to y, with Ci = fi(P1). Let p0 = x, pn = y, and pi = fi(∞) = fi+1(0) for 1 ≤ i ≤ n − 1. From here, we follow step by step the proof of [15, Complement IV.3.10.1], proving inductively that there is a free rational curve gi : P1 → X which connects pi−1 to pi; the resulting chain of free rational curves is then smoothed into a free rational curve h : P1 → X connecting x to y. The construction shows that each gi+1 is obtained by smoothing a comb on fi+1 with teeth assembled out of deformations of gi (see [15, Definition II.7.7]). It follows that h is a ‖V‖-curve. □

Consider now a submanifold Y of X. The inclusion i : Y ↪ X naturally induces an injective morphism i∗ : Hom(P1, Y) ↪ Hom(P1,X), which is defined by i∗([g]) := [i ◦ g]. Similar notation will be used for the injection Hombir(P1, Y) ↪ Hombir(P1,X). Statements analogous to the following two propositions also hold for Hombir( ) in place of Hom( ).
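For orientation, the notion of freeness from Definition 2.1 and the smoothness criteria of Proposition 2.2 can be checked by hand in the simplest case. The following computation is an illustrative sketch added here, not part of the original argument; it assumes X = P^n and f the parameterization of a line, and uses only the standard splitting of the tangent bundle of projective space along a line.

```latex
% Illustrative example (assumption: X = P^n, f : P^1 -> P^n parameterizes a line, x = f(0)).
% Standard splitting of the restricted tangent bundle:
\[
  f^{*}T_{\mathbb{P}^n} \;\cong\; \mathcal{O}_{\mathbb{P}^1}(2)\,\oplus\,\mathcal{O}_{\mathbb{P}^1}(1)^{\oplus (n-1)} .
\]
% All summands have non-negative degree, so f^*T is nef and f is free (Definition 2.1).
% Tangent space of Hom(P^1, P^n) at [f], as in Proposition 2.2(a):
\[
  h^{0}\bigl(\mathbb{P}^1, f^{*}T_{\mathbb{P}^n}\bigr) = 3 + 2(n-1) = 2n+1,
  \qquad
  h^{1}\bigl(\mathbb{P}^1, f^{*}T_{\mathbb{P}^n}\bigr) = 0 .
\]
% Twisting by the ideal sheaf m_0 of the point 0 lowers each degree by one:
\[
  f^{*}T_{\mathbb{P}^n}\otimes \mathfrak{m}_0 \;\cong\; \mathcal{O}_{\mathbb{P}^1}(1)\,\oplus\,\mathcal{O}_{\mathbb{P}^1}^{\oplus (n-1)},
  \qquad
  h^{0} = 2 + (n-1) = n+1, \quad h^{1} = 0 .
\]
% Hence Hom(P^1, P^n) is smooth of dimension 2n+1 at [f], and
% Hom(P^1, P^n; 0 -> x) is smooth of dimension n+1 at [f] (Proposition 2.2(b)).
```

Both numbers agree with a direct parameter count: a degree-one map is given by n + 1 linear forms in two variables up to a common scalar, which gives 2(n + 1) − 1 = 2n + 1 parameters, and fixing the image of 0 imposes n conditions.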
Proposition 2.26. With the above notation, assume that the normal bundle of Y in X is ample. Then, for every [g] ∈ Hom(P1, Y) with g free, the morphism f := i ◦ g is free on X, and the schemes Hom(P1,X) and Hom(P1,X; 0 ↦ g(0)) are smooth at [f].

Proof. The bundle g∗TY is nef, since [g] is free. Thus, by the ampleness of NY/X and the exact sequence

0 → g∗TY → f∗TX → g∗NY/X → 0,

we conclude that f is free. Then the last two assertions follow from Proposition 2.2. □

We will need the following generalization of Proposition 2.2.

Proposition 2.27. Let Y be a submanifold of a smooth projective variety X and let i : Y ↪ X be the inclusion. Let g : P1 → Y be a free morphism, and let f := i ◦ g. Then the Zariski tangent space of Hom(P1,X; {0} → Y) at [f] sits naturally in an exact sequence

0 → H0(P1, g∗TY) → T[f]Hom(P1,X; {0} → Y) → H0(P1, g∗NY/X ⊗ m0).

Moreover, if the normal bundle of Y in X is ample, then the sequence completes to a short exact sequence

0 → H0(P1, g∗TY) → T[f]Hom(P1,X; {0} → Y) → H0(P1, g∗NY/X ⊗ m0) → 0

and Hom(P1,X; {0} → Y) is smooth at [f].

Proof. The natural maps f∗TX → g∗NY/X → g∗NY/X|{0}, passing to cohomology and taking into account the isomorphism T[f]Hom(P1,X) ≅ H0(P1, f∗TX) given by Proposition 2.2, yield a natural homomorphism

r : T[f]Hom(P1,X) → (g∗NY/X)|{0}.

We have a Cartesian square

    Hom(P1,X; {0} → Y)  --->  Hom(P1,X)
            ↓                      ↓
       Hom({0}, Y)      --->  Hom({0},X).

Note that there are natural identifications Hom({0}, Y) = Y and Hom({0},X) = X. The Zariski tangent space of Hom(P1,X; {0} → Y) at [f], viewed as a subspace of T[f]Hom(P1,X), is contained in ker(r). This simply follows from the fact that the morphism Hom(P1,X; {0} → Y) → Hom({0},X) factors through Hom({0}, Y), by taking tangent spaces and recalling the above natural identifications.
Now, the freeness of g implies that H1(P1, g∗TY) = 0, hence we get the exact sequence

0 → H0(P1, g∗TY) → T[f]Hom(P1,X) → H0(P1, g∗NY/X) → 0.

This sequence restricts to an exact sequence

(3) 0 → H0(P1, g∗TY) → ker(r) → H0(P1, g∗NY/X ⊗ m0) → 0,

and the first assertion follows by observing that T[f]Hom(P1,X; {0} → Y), viewed as a subspace of ker(r), contains the image of H0(P1, g∗TY).

Suppose now that NY/X is ample. By Proposition 2.26, Hom(P1,X; 0 ↦ g(0)) is smooth, hence of dimension h0(P1, f∗TX ⊗ m0), at [f]. Let V be the irreducible component of Hom(P1,X) containing [f]. Note that

T[f]Hom(P1,X; {0} → Y) = T[f]V({0} → Y).

Moreover, there is a dominant morphism V({0} → Y) → Y, defined by [h] ↦ h(0), whose fiber over a point y ∈ Y is V(0 ↦ y). This implies that dim V({0} → Y) = dim V(0 ↦ y) + dim Y for a general y ∈ Y, and therefore

(4) dim V({0} → Y) = h0(P1, f∗TX ⊗ m0) + dim Y

by Proposition 2.2(b). On the other hand, we have

h0(P1, g∗NY/X ⊗ m0) = h0(P1, f∗TX ⊗ m0) − h0(P1, g∗TY ⊗ m0)
                     = h0(P1, f∗TX ⊗ m0) − h0(P1, g∗TY) + h0(g∗TY|{0}).

Therefore, by (3) and h0(g∗TY|{0}) = dim Y, we get

(5) dim ker(r) = h0(P1, f∗TX ⊗ m0) + dim Y.

Then, comparing (5) with (4), we conclude at once that T[f]Hom(P1,X; {0} → Y) = ker(r) and that V({0} → Y) is smooth at [f]. □

Definition 2.28. If W ⊆ Hombir(P1, Y) is a family of rational curves on Y, then we call 〈i∗(W)〉 ⊆ Hombir(P1,X) the extension of W to X. Conversely, for every family of rational curves V on X, we call 〉i−1∗(V)〈 ⊆ Hombir(P1, Y) the restriction of V to Y.

Remark 2.29. If V is an irreducible family on X, then 〉i−1∗(V)〈 need not be irreducible. In fact, the example of the restriction to a smooth quadric Q2 of the family of lines in P3 shows that, in general, different elements in 〉i−1∗(V)〈 may even define linearly independent numerical classes in N1(Y). In particular, if V is an unsplit family on X, then its restriction to Y may not be an unsplit family.
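The quadric example just mentioned can be made explicit. The following is a sketch added for illustration; it assumes the standard identification of a smooth quadric surface with a product of two projective lines, and the names ℓ1, ℓ2 for the classes of the two rulings are introduced here and do not appear in the text.

```latex
% Illustrative sketch (assumptions: Y = Q^2 \cong P^1 x P^1 embedded in X = P^3,
% V = the family of lines in P^3; \ell_1, \ell_2 are hypothetical names for the two rulings).
\[
  N_1(Q^2) \;=\; \mathbb{R}[\ell_1]\,\oplus\,\mathbb{R}[\ell_2],
  \qquad
  \ell_1^{2} = \ell_2^{2} = 0, \quad \ell_1\cdot\ell_2 = 1 .
\]
% The lines of P^3 contained in Q^2 are exactly the members of the two rulings, so the
% restriction of V to Q^2 has two components, with linearly independent classes
% [\ell_1] and [\ell_2]. Both classes have the same image in X:
\[
  i_{*}[\ell_1] \;=\; i_{*}[\ell_2] \;=\; [\text{line}] \;\in\; N_1(\mathbb{P}^3) \cong \mathbb{R} .
\]
```

In particular the restriction is not irreducible, so it cannot be an unsplit family in the sense of Definition 2.10, even though the family of lines in P3 is unsplit.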
Although we do not have examples, it seems likely that i−1∗(V) might fail to be a family on Y even if V is a family on X.

We close this section with the following “relative” version of Proposition 2.12. The proof, which can be found within the proof of [4, Lemma 1.4.5], uses a “non-breaking lemma” due to Wiśniewski [28].

Proposition 2.30. Let V be an unsplit family on X. Let Y ⊂ X be a subvariety, and assume that Locus(V, {0} → Y) is dense in X. Then, for every curve Γ ⊂ X, we have

[Γ] = a[ΓY] + b[C] in N1(X),

where C is a V-curve, ΓY is a curve contained in Y, and a ≥ 0.

3. Extension of rationally connected fibrations

In this section we study the relationship between rationally connected fibrations of a projective manifold X and those of an ample submanifold Y ⊂ X. Our main interest is in the case when X carries an irreducible family of rational curves V whose restriction VY to Y contains a covering component. Given this situation, the goal is to compare the associated rationally connected fibrations X ⇢ X//V and Y ⇢ Y//VY. However, for technical reasons that will be evident in the proof of Theorem 3.6, it is convenient to consider from the beginning a more general setting, allowing V to be reducible. The conditions given in items (i) and (ii) in the following two theorems capture the essential properties of V needed in our arguments.

Theorem 3.1. Let X be a smooth projective variety, and assume that Y ⊂ X is a smooth subvariety with ample normal bundle. Let i : Y ↪ X be the inclusion, and suppose that the induced map i∗ : N1(X) → N1(Y) is surjective. Let V = ∪α∈A Vα be a family of rational curves on X, and assume that there is a subset B ⊆ A such that
(i) R≥0[V] = Σ_{β∈B} R≥0[Vβ], and
(ii) Locus(i−1∗(Vβ)) is dense in Y for every β ∈ B.
Let VY := 〉i−1∗(V)〈 be the restriction of V to Y.
Then both V and VY are numerically covering families (respectively, on X and on Y), and, for suitable choices of the rational quotients, there is a commutative diagram

         Y    --i-->    X
       π ⇣              ⇣ φ
     Y//VY   --δ-->   X//V,

where π and φ are the projections to the respective rational quotients and δ is a surjective morphism.

Remark 3.2. If one assumes that V is irreducible, then the hypothesis that i∗ : N1(X) → N1(Y) be surjective is unnecessary.

Remark 3.3. An analogous property has been independently observed to hold in the case V = Hombir(P1,X) by A.J. de Jong and J. Starr.

The proof of Theorem 3.1 is based upon the following two lemmas.

Lemma 3.4. With the same notation and assumptions as in Theorem 3.1, Locus(Vβ; {0} → Y) is dense in X for every β ∈ B.

Proof. To keep the notation light, we suppose throughout the proof that V is irreducible, so that Vβ = V. By hypothesis, Locus(VY) is dense in Y. Fix a general element [g] of an irreducible component of VY with dense locus in Y, and let f := i ◦ g. Note that [f] ∈ V({0} → Y). Since the chosen component of VY has dense locus in Y, we can assume that y := f(0) is a general point of Y. We know that g is free by Proposition 2.6. Thus, since V is an irreducible component of Hom(P1,X) containing [f], Proposition 2.27 applies to say that V({0} → Y) is smooth at [f], with Zariski tangent space

T[f]V({0} → Y) ≅ H0(P1, g∗TY) ⊕ H0(P1, g∗NY/X ⊗ m0).

We can view the right hand side as a vector subspace Λ ⊆ H0(P1, f∗TX) via the isomorphism (3) given in the proof of Proposition 2.27. It follows from the ampleness of g∗NY/X that f∗TX is spanned by the sections in Λ at every point q ∈ P1 \ {0}. By [15, Proposition I.2.19], we conclude that the differential of the universal map

P1 × V({0} → Y) → X

has rank equal to dim X at (q, [f]) for every q ∈ P1 \ {0}. Since V({0} → Y) is smooth at [f], this implies that its locus in X is dense. □

Lemma 3.5.
With the same notation and assumptions as in Theorem 3.1, for every β ∈ B the family Vβ is covering (in X) and every irreducible component of i−1∗(Vβ) with dense locus in Y is a family (on Y).

Proof. By the hypothesis (ii) of Theorem 3.1, we can fix an arbitrary irreducible component T of i−1∗(Vβ) with dense locus in Y. Let [g] ∈ T be a general element. We know by Proposition 2.6 that g is free. Then by Proposition 2.26, f := i ◦ g is free and Hom(P1,X) is smooth at [f], and thus, in particular, Vβ is a covering family by Proposition 2.6 again. Moreover, we deduce by the smoothness of Hom(P1,X) at [f] that for every irreducible one-parameter family [gt] in Hombir(P1, Y) specializing to [g], the corresponding family [ft] := [i ◦ gt] is contained in V. Therefore [gt] ∈ i−1∗(V), and in fact [gt] ∈ T by the generality of the choice of [g] in the component T. We conclude that T is an irreducible component of Hombir(P1, Y), and hence, in particular, of VY. □

Proof of Theorem 3.1. It follows from Lemma 3.5 and the hypothesis (i) that V is a numerically covering family on X and similarly, using the fact that i∗ : N1(Y) → N1(X) is injective, it follows that VY is a numerically covering family on Y.

Let φ : X ⇢ X//V and π : Y ⇢ Y//VY be the rational quotients, as in Theorem 2.13. Let X◦ ⊆ X, S◦ ⊆ X//V, Y◦ ⊆ Y, and Z◦ ⊆ Y//VY be open subsets such that the maps φ and π restrict to proper surjective morphisms φ◦ : X◦ → S◦ and π◦ : Y◦ → Z◦.

Let G be a very general fiber of φ◦, and let x be a general point of G. By Lemma 3.4, we can assume that x ∈ Locus(V; {0} → Y), and hence we can find a point y ∈ Y such that x ∈ Locus(V; 0 ↦ y) by Theorem 2.13. In particular, x and y are V-chain connected. This implies that y ∈ G, and therefore that G ∩ Y ≠ ∅. Since G is a general fiber of φ◦, this means that φ◦(Y ∩ X◦) = S◦. Moreover, by the generality of G, we can in fact assume that G has non-empty intersection with a very general fiber of π◦.
On the other hand, recalling that π is defined by the restriction VY of the family V defining φ, we see that if F is a very general fiber of π◦ meeting G, then necessarily F ⊆ G. In conclusion, after possibly further shrinking Z◦ (and consequently Y◦), we may assume that there is a commutative diagram

        Y◦   ------->   X◦
     π◦ ↓               ↓ φ◦
        Z◦   --δ◦-->    S◦,

where δ◦ is a dominant morphism.

We need to show that, after possibly changing the birational model for Y//VY, the map δ◦ extends to a surjective morphism δ : Y//VY → X//V. We first take the normalization Z of a projective compactification of Z◦. Note that δ◦ extends to a rational map Z ⇢ X//V, which is defined by some linear system H on Z. Then we take Z′ to be the normalization of the blow-up of the base scheme of H. Since the proper transform of H on Z′ is base-point free, the rational map δ◦ lifts to a well defined morphism δ : Z′ → X//V. This is a morphism of projective varieties, and δ◦ is dominant, so δ is surjective. Moreover, the morphism Z′ → Z is an isomorphism over Z◦, and thus we can identify the latter with a subset of Z′. Then we take Z′ as the VY-quotient Y//VY of Y. This proves the theorem. □

Theorem 3.1 can be viewed as a general property of rationally connected fibrations. Aiming also at applications in extension problems of specific fibrations, which will be addressed in the following sections, we are interested in determining sufficient conditions to ensure that the map δ, whose existence was proven in the previous theorem, is generically finite. This is the content of the next theorem, which is the main result of this section.

Theorem 3.6. Let X be a smooth projective variety, and assume that Y ⊂ X is a smooth subvariety with ample normal bundle. Let i : Y ↪ X be the inclusion, and suppose that the induced map i∗ : N1(X) → N1(Y) is surjective.
Let V = ∪α∈A Vα be a family of rational curves on X, and assume that there is a subset B ⊆ A such that
(i) R≥0[V] = Σ_{β∈B} R≥0[Vβ], and
(ii) Locus(i−1∗(Vβ)) is dense in Y for every β ∈ B.
Let VY := 〉i−1∗(V)〈 be the restriction of V to Y, and let

         Y    --i-->    X
       π ⇣              ⇣ φ
     Y//VY   --δ-->   X//V

be the commutative diagram given by Theorem 3.1. Suppose that one of the following two conditions holds:
(a) V is an unsplit family; or
(b) codimX Y < dim RCVY.
Then the morphism δ is generically finite.

Proof. We first prove that δ is generically finite when the condition (a) of the statement is satisfied. If Y//VY is a point, then the statement is obvious, so we can suppose that Y//VY is positive dimensional. We suppose by contradiction that δ is not generically finite.

Keeping the notation introduced in the proof of Theorem 3.1, let G be a very general fiber of φ◦, and consider YG := Y ∩ G. Note that G is smooth and rationally connected by the restriction VG of V to G in the sense of Definition 2.28. So, since we are assuming that V is unsplit, every irreducible component of VG is an unsplit family on G, and hence N1(G) is generated by classes of VG-curves by Proposition 2.12. Since such curves are all numerically equivalent in X, it follows that the image of the restriction map Pic(X) → Pic(G) has rank 1.

Consider the inclusions jX : G ↪ X, jY : YG ↪ Y and iG : YG ↪ G. In order to reach a contradiction, we consider the commutative diagram

      Pic(X)  --j∗X-->  Pic(G)
     i∗ ↓                ↓ i∗G
      Pic(Y)  --j∗Y-->  Pic(YG).

The plan is to give a lower bound for the rank of the map Pic(X) → Pic(YG) by factoring it through Pic(Y).

Note that Y◦G := Y◦ ∩ G is a non-empty open subset of YG. Since we are supposing that δ◦ is not generically finite, we have dim π◦(Y◦G) ≥ 1. Therefore we can find a curve Γ ⊆ YG such that Γ ∩ Y◦G ≠ ∅ and dim π◦(Γ ∩ Y◦G) ≥ 1.
Since Locus(VY) is dense in Y, every fiber of π◦ has positive dimension and, since these fibers are proper, we can fix a curve C ⊂ YG lying inside a fiber of π◦|Y◦G.

Now, fix a projective model Y//VY for the rational quotient of Y containing Z◦ as an open subset, and let L ⊂ Y//VY be a general hyperplane section passing through a point in π◦(Γ ∩ Y◦G) but not containing the point π◦(C). Let L◦ := L|Z◦ be the restriction of L to Z◦, and let D ⊂ Y be the closure (in Y) of (π◦)−1(L◦). If U ⊂ Z◦ \ L◦ is an open neighborhood of π◦(C), then (π◦)−1(U) is an open neighborhood of C in Y◦ (hence in Y) and is disjoint from (π◦)−1(L◦); this shows that D ∩ C = ∅.

Let then DG and HG be the restrictions of D and of a general hyperplane section H of Y to YG. By construction, we have DG · Γ > 0 and DG · C = 0, whereas HG is ample; in particular, DG and HG induce numerically independent elements of Pic(YG). Since both of them are restrictions of divisors on Y, this implies that rk Im(j∗Y) ≥ 2. Therefore, observing that the cokernel of i∗ : Pic(X) → Pic(Y) is torsion due to the surjectivity of N1(X) → N1(Y), we conclude that rk Im(j∗Y ◦ i∗) ≥ 2. On the other hand, we have rk Im(i∗G ◦ j∗X) ≤ 1, and therefore we have a contradiction by the commutativity of the above diagram. This proves that δ is generically finite if the condition (a) of the statement of the theorem is satisfied.

It remains to prove that δ is generically finite under the hypothesis (b). Let G be a very general (smooth and connected) fiber of φ◦, and let F be a very general fiber of π among those that are contained in G. Choosing G sufficiently general, we can also ensure that F is general among the fibers of π◦. Then, to conclude the proof of the theorem, we need to show that

(6) dim G = dim F + codimX Y.

Note indeed that, since dim G = dim RCV and dim F = dim RCVY, (6) implies that δ is generically finite.

In order to prove (6), we fix a very general point y ∈ F.
By Lemma 3.4, Locus(V, {0} → Y) is dense in X. Moreover, we know that Locus(V, 0 ↦ y) is contained in G by Theorem 2.13. Thus Locus(V, 0 ↦ y) sweeps a dense subset of G, if we let y vary in F. Therefore, by our choices of F and y, we can assume that Locus(V, 0 ↦ y) contains a very general point of G. Note that any point of G is connected by a VG-chain to the point y, where VG denotes the restriction of V to G in the sense of Definition 2.28. Therefore Proposition 2.25 applied to G and VG implies that Locus(‖VG‖, 0 ↦ y) is dense in G. Note that the two families ‖V‖(0 ↦ y) and ‖VG‖(0 ↦ y) are the same since G is an RC‖V‖-equivalence class. In particular, we have

dim G = dim Locus(‖V‖, 0 ↦ y).

At this point the idea is to replace V with ‖V‖, so that, by the above argument, we can directly assume, without loss of generality, that the two sets Locus(V, 0 ↦ y) and G have the same dimension (and, in fact, that the first set is dense in the second one). In order to make this step, we need to show, first of all, that ‖V‖ satisfies hypotheses analogous to (i) and (ii) imposed on V in the statement of the theorem, and moreover that replacing V with ‖V‖ does not affect the quotient maps φ and π and the respective rational quotients, which implies condition (b) for ‖V‖.

Lemma 3.7. The family ‖V‖ satisfies the hypotheses (i) and (ii) imposed on V in Theorem 3.6. Moreover, the families VY = 〉i−1∗(V)〈 and ‖V‖Y := 〉i−1∗(‖V‖)〈 define the same rational fibration on Y. In particular, codimX Y < dim RC‖V‖Y.

Proof of the lemma. First recall that VY is a numerically covering family by Theorem 3.1, hence so is ‖VY‖. We compare ‖V‖ with W := 〈i∗(‖VY‖cov)〉. Note that ‖VY‖cov is non-empty and W is a subfamily of ‖V‖. We claim that

(7) R≥0[‖V‖] = R≥0[W] in N1(X).

By the definition of ‖V‖, this is equivalent to R≥0[V] = R≥0[W]. The inclusion R≥0[V] ⊇ R≥0[W] is obvious, so we need to show that the reverse inclusion holds.
In fact, by the hypothesis (i) on V, it suffices to show that R≥0[Vβ] ⊆ R≥0[W] for every β ∈ B. Since Locus(i−1∗(Vβ)) is dense in Y, at least one of the irreducible components of 〉i−1∗(Vβ)〈 is a covering family on Y, hence a subfamily of ‖VY‖cov. This implies that R≥0[Vβ] ⊆ R≥0[W], hence (7) is proven. On the other hand, the restriction to Y of any irreducible component of W is a union of irreducible components of ‖VY‖cov, which are covering by Definition 2.18. Therefore their loci are dense in Y. Combining this with (7) proves the first assertion.

We are currently assuming that V satisfies the condition (b) in Theorem 3.6. Therefore we get the desired inequality codimX Y < dim RC‖V‖Y as soon as we show that the families VY and ‖V‖Y define the same rational quotient of Y, because then dim RCVY = dim RC‖V‖Y. By using Proposition 2.20 it is enough to show that

(8) R≥0[‖V‖Y] = R≥0[VY] in N1(Y).

The inclusion R≥0[‖V‖Y] ⊇ R≥0[VY] is obvious. To prove the other, note that the embedding of Y in X induces an inclusion

ι : N1(Y) ↪ N1(X),

which follows from the hypothesis that i∗ : N1(X) → N1(Y) is surjective made in Theorem 3.6. Observe that

R≥0[‖V‖Y] ⊆ R≥0[‖V‖] = R≥0[V] = Σ_{β∈B} R≥0[Vβ] = ι(R≥0[VY]) in N1(X).

Since ι is injective, the inclusion holds in N1(Y), before taking images. This proves equality (8) and completes the proof of the lemma. □

We now come back to the proof of the theorem. By the previous lemma, we are allowed to replace V with ‖V‖. Therefore we can directly assume without loss of generality that

(9) dim Locus(V, 0 ↦ y) = dim G.

Then, in order to show (6) and hence to conclude the proof, it suffices to show that

dim Locus(V, 0 ↦ y) = dim X − dim Y//VY.

For short, let d = dim X and k = dim Y//VY.
Since y is a general point of Y, we can assume that VY(0 ↦ y) contains a sufficiently general point [g] of a covering component of VY, so that g is free and C := g(P1) is a smooth curve (the smoothness of C is not essential for the proof, but simplifies the notation).

We observe that NY/X|F is an ample vector bundle over F, and that rk NY/X|F < dim F by condition (b), and therefore H1((NY/X|F)∗) = 0 by the Le Potier vanishing theorem [23]. This implies that the short exact sequence

0 → NF/Y → NF/X → NY/X|F → 0

splits since NF/Y ≅ O_F^{⊕k}, and therefore we get a surjection NF/X ↠ NF/Y. Restricting to C and composing with the natural surjections TX|C ↠ NC/X and NC/X ↠ NF/X|C, we obtain a chain of surjections

TX|C ↠ NC/X ↠ NF/X|C ↠ NF/Y|C.

Since g is free, so is f := i ◦ g by Proposition 2.26, and therefore TX|C is nef. Writing TX|C ≅ ⊕i OP1(bi), we have bi ≥ 0, and thus the existence of a quotient of TX|C isomorphic to O^{⊕k} implies that

(10) #{bi | bi = 0} ≥ k.

Note that V(0 ↦ y) is smooth at [f] by Proposition 2.2. Let u : P1 × V(0 ↦ y) → X be the universal map, and let q ∈ P1 be a point different from 0. By [15, Proposition II.3.10] and (10), we have

rk du(q, [f]) ≤ d − k.

Since V(0 ↦ y) is smooth at [f], we obtain dim Locus(V, 0 ↦ y) ≤ d − k. On the other hand, we have dim G ≥ d − k by Theorem 3.1. Thus by (9) we conclude that dim Locus(V, 0 ↦ y) = d − k, which proves (6). This concludes the proof of the theorem.

Remark 3.8. It is interesting to compare the result of Theorem 3.6 when the condition in case (b) is satisfied with a result of Sommese [26, Proposition III] (see also [3, Theorem (5.3.1)] for a more general version) in which the case of an ample divisor endowed with a fibration is considered. Indeed, one notices that if codimX Y = 1, then the inequality in (b) reduces exactly to the hypothesis imposed in the theorem of Sommese.

Remark 3.9.
Hartshorne’s conjecture [13, Conjecture 4.5] on the intersection of ample submanifolds with complementary dimensions, if true, would imply that the morphism δ in Theorem 3.6 is in fact an isomorphism whenever the condition given in (b) is satisfied. Indeed, let G be a general fiber of φ. Then Y ∩ G is a disjoint union of deg δ fibers Fj of π, and by using standard exact sequences one easily sees that NY∩G/G ≅ NY/X|Y∩G. In particular, we have NFj/G ≅ NY∩G/G|Fj ≅ NY/X|Fj, which is ample. Note that condition (b) implies that 2 dim Fj > dim G. Then, assuming that deg δ ≥ 2, Hartshorne’s conjecture would give the contradiction F1 ∩ F2 ≠ ∅; therefore δ can only be birational. In fact, modulo suitable choices of the rational quotients, one could even assume that δ is an isomorphism. We recall that this conjecture of Hartshorne is known to be true, for instance, for homogeneous spaces [24]; we will use this fact in the proof of Theorem 5.8.

Remark 3.10. The previous results are related to a result of Occhetta [25, Proposition 4], where an analogous extension property is obtained under more restrictive hypotheses. In the setting of [25], Y is supposed to be the zero locus of a regular section of an ample vector bundle on X, π is the MRC-fibration of Y, and the rational quotient of Y is assumed to have positive geometric genus.

In applications, it might be useful to start with families of rational curves on Y, rather than only considering restrictions of families on X. The following elementary property will be useful. Let X and Y be as above, with the inclusion inducing a surjection N^1(X) → N^1(Y), and consider a numerically covering family W of rational curves on Y. Let then V = 〈i_*(W)〉 be the family on X obtained as the extension of W, and let VY = 〉i^*(V)〈 be its restriction back to Y. Note that VY is numerically covering since W is so.
We clearly have W ⊆ VY, but in general the inclusion may be strict (one can take as an example the case of a quadric Y = Q2 in X = P3, taking W to be the family of lines on Y corresponding to one of the two rulings). By this inclusion and Theorem 3.1, we have a commutative diagram

[Commutative diagram: the inclusion i : Y ↪ X, the rational quotient maps τ : Y ⇢ Y//W, π : Y ⇢ Y//VY and φ : X ⇢ X//V, and the maps γ : Y//W → Y//VY and δ : Y//VY → X//V.]

for a suitable choice of the rational quotients, and some surjective morphisms γ and δ.

Proposition 3.11. Keeping the notation introduced above, both families W and VY define the same rational quotient (i.e., Y//W = Y//VY and τ = π) if either one of the following two situations occurs: (a) dim Y//W = dim X//V, or (b) i^* : N^1(X) → N^1(Y) is surjective.

Proof. We observe that the general fiber of γ is rationally connected, since it is dominated by the general fiber of π, and in particular it is connected. Then the conclusion in case (a) follows immediately from the commutativity of the above diagram. We now focus on case (b), and thus assume that N^1(X) → N^1(Y) is surjective. By duality, we have an injection N_1(Y) ↪ N_1(X) and, by the construction of VY, we deduce that R≥0[VY] = R≥0[W]. Then the assertion follows from Proposition 2.20. □

The case when the rational quotient is one-dimensional is particularly well suited because of the following well-known property. We are grateful to the referee for suggesting this proof, which simplifies our original argument.

Proposition 3.12. Let X be a normal projective variety, let φ◦ : X◦ → C be a dominant morphism from a non-empty open subset X◦ ⊆ X to a smooth projective curve C, and assume that φ◦ is proper over its image. Then φ◦ extends to a (surjective) morphism φ : X → C.

Proof. We consider a projection C → P1. Composing with φ◦, we have a rational map X ⇢ P1 which is defined by some linear pencil.
Because of the properness of φ◦, this pencil is free, hence the map X → P1 is regular and we can lift it via Stein factorization to a regular morphism X → C which extends φ◦. □

Coming back to the extension of rationally connected fibrations, we immediately obtain the following result.

Corollary 3.13. With the same notation and assumptions as in Theorem 3.6, suppose furthermore that dim Y//VY = 1. Then the RCVY-fibration π of Y and the RCV-fibration φ of X are surjective morphisms fitting in a commutative diagram

[Commutative diagram: i : Y ↪ X, π : Y → Y//VY, φ : X → X//V, δ : Y//VY → X//V.]

where δ is a finite morphism of smooth curves.

Proof. It follows from Theorems 3.1 and 3.6, and Proposition 3.12. □

4. Extension of Mori contractions

In this section we consider the problem of extending Mori contractions. Let Y be a smooth projective variety with Picard number ρ := ρ(Y) ≥ 2, and suppose that R ⊂ N_1(Y) is a KY-negative extremal ray of NE(Y) such that the associated Mori contraction π : Y → Z is of fiber type, namely, that dim Z < dim Y. We fix a rational curve C in a general fiber of π passing through a general point of Y, whose numerical class [C] generates R. Then we fix a normalization morphism f : P1 → C ⊂ Y, and let W = 〈[f]〉 ⊂ Hombir(P1, Y) be the minimal family of rational curves on Y generated by [f]. Note that, by construction, R≥0[W] = R≥0[C] = R in N_1(Y), and W is a covering family on Y, since C passes through a general point of Y. Moreover, π can be considered as the RCW-fibration of Y.

Theorem 4.1. With the above notation, assume that Y is a smooth subvariety, with ample normal bundle, of a smooth projective variety X, such that the inclusion i : Y ↪ X induces an isomorphism i^* : N^1(X) → N^1(Y). Let V := 〈i_*(W)〉 ⊂ Hombir(P1, X) be the extension of the family W to X, and assume that V is an unsplit family on X. Then R≥0[V] is a KX-negative extremal ray of X.
Moreover, denoting by φ : X → S the Mori contraction of this ray, there is a finite surjective morphism δ : Z → S giving a commutative diagram

[Commutative diagram: i : Y ↪ X, π : Y → Z, φ : X → S, δ : Z → S.]

Proof. By the hypothesis and duality, the inclusion i : Y ↪ X induces a natural isomorphism ι : N_1(Y) ≅ N_1(X). Let RX := ι(R). This is a ray in N_1(X), and is contained in NE(X), because of the inclusion ι(NE(Y)) ⊆ NE(X). Moreover, if C is a curve as above, then the adjunction formula gives KX · C = (KY − det NY/X) · C < 0, and thus RX is a KX-negative ray. The main point here is to prove that RX is an extremal ray of NE(X). We start by observing two things. First of all, we have RX = R≥0[V] in N_1(X), and we can consider φ as the RCV-fibration of X. Moreover, if VY := 〉i^*(V)〈 is the restriction of V to Y, then we have R≥0[VY] = ι^{-1}(R≥0[V]) = R≥0[W] in N_1(Y), by the isomorphism ι : N_1(Y) ≅ N_1(X). Therefore VY and W define the same rational quotient, by Proposition 2.20. Note that V satisfies the conditions (i) and (ii) of Theorem 3.1. Then, by Lemma 3.4 and the fact that V is unsplit, we have (11) Locus(V, {0} → Y) = X. Indeed the lemma implies that Locus(V, {0} → Y) is dense in X, and the unsplitness of V implies that such locus is proper, hence closed in X. Let D be any divisor on X whose restriction D|Y is a good supporting divisor for R. Note that D · C = 0. Let Γ be any curve on X. Then, by Proposition 2.30, there is a curve ΓY ⊂ Y such that [Γ] = a[ΓY] + b[C] in N_1(X), with a ≥ 0. This implies that D · Γ = aD · ΓY + bD · C = aD|Y · ΓY ≥ 0, since D · C = 0 and D|Y is nef. This proves that D is nef. Since R is an extremal ray of NE(Y), by duality there corresponds to it an extremal face of Nef(Y) of maximal dimension ρ − 1, and therefore we can find ρ − 1 good supporting divisors of R whose numerical classes are linearly independent in N^1(Y).
By the previous argument, this implies that such good supporting divisors, up to numerical equivalence, extend to divisors on X that are nef and trivial on RX, and whose numerical classes are linearly independent in N^1(X). Since there are ρ − 1 of them, and ρ = ρ(X) by the isomorphism N^1(X) ≅ N^1(Y), this implies that RX is an extremal ray of NE(X). So, let φ : X → S be the Mori contraction of RX. By taking the Stein factorization of the restricted morphism φ|Y : Y → S, we obtain a commutative diagram

[Commutative diagram: i : Y ↪ X, ψ : Y → T, φ : X → S, δ : T → S.]

where T is normal, ψ is surjective with connected fibers, and δ is finite. We observe that an irreducible curve on Y is contracted by ψ if and only if it is numerically proportional (in X, and hence in Y) to C, and therefore if and only if it is contracted by π. This implies that T = Z and ψ = π. Moreover, by (11), we see that Y meets every fiber of φ, and therefore δ is surjective. This completes the proof of the theorem. □

Remark 4.2. Theorem 4.1 can be generalized, with minor changes, to the case of contractions of KY-negative extremal faces of NE(Y).

Remark 4.3. A result analogous to Theorem 4.1 is obtained in [25, Proposition 5] in a more restrictive context (in the setting of [25], Y is assumed to be the zero locus of a regular section of an ample vector bundle on X, with dim X ≥ 4, and δ is proven to be an isomorphism). Let us note that the proof of [25, Proposition 5] makes use in an essential way of the fact that the divisor D (in our and in his notation as well) is an adjoint divisor, i.e., a divisor of the form D = KX + A for some ample line bundle A on X. This circumstance seems not to be true in general. The gap was pointed out by Paltin Ionescu. Our arguments fill that gap. A similar discussion is given in [2] in the case Y is an ample divisor on X. In connection with Theorem 4.1 we should also mention [1, Theorem 3.4] and [9, Lemma (1.4)].
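
The adjunction step used in the proof of Theorem 4.1 to see that RX is KX-negative can be written out as follows. This is only a sketch: it assumes, as in the setting of the theorem, that C is a curve in Y whose class spans the KY-negative ray R, and it uses that the determinant of the ample bundle NY/X has positive degree on C.

```latex
\begin{align*}
K_Y &= (K_X + \det N_{Y/X})|_Y
  && \text{(adjunction for } Y \subset X\text{)} \\
K_X \cdot C &= K_Y \cdot C - \det N_{Y/X} \cdot C \\
  &< 0
  && \text{since } K_Y \cdot C < 0 \text{ and } \det N_{Y/X} \cdot C > 0 .
\end{align*}
```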
We close this section with the following elementary property on extensions of relative polarizations, which applies in particular to the setting of Theorem 4.1.

Proposition 4.4. Let

[Commutative diagram: i : Y → X, π : Y → Z, φ : X → S, and a morphism Z → S.]

be a commutative diagram of morphisms of projective varieties, and assume that i is an embedding, that π is not a finite morphism, and that ρ(X/S) = 1. Let M be a line bundle on X whose restriction M|Y to Y is π-ample. Then there exists an ample line bundle H on X such that H|F ≅ M|F for every fiber F of π.

Proof. Since the numerical class of a curve in a positive dimensional fiber of π spans N_1(X/S), it follows from the relative Kleiman criterion of ampleness (see, e.g., [17, Theorem 1.4]) that M is φ-ample. Then the line bundle H := M + φ^*(kA) is ample if A is an ample line bundle on S and k ≫ 0 (see, e.g., [22, Proposition 1.7.10]), and H|F ≅ M|F for every fiber of π. □

5. Projective bundles and quadric fibrations as ample subvarieties

The results obtained in the previous sections are here applied to classify, under suitable conditions, projective manifolds containing “ample subvarieties” that admit a structure of projective bundles or quadric fibrations. Here we will adopt the following definitions of projective bundles and quadric fibrations. We remark that there are in fact several slightly different notions of quadric fibrations in the literature, and the one adopted here is a little more restrictive than others (see Remark 5.2 below).

Definition 5.1. A surjective morphism π : Y → Z between smooth projective varieties is said to be a Pm-bundle (resp., a Qm-fibration) if π is an extremal Mori contraction and there is a line bundle L on Y such that every fiber F of π is mapped isomorphically to Pm (resp., is embedded as an irreducible and reduced quadric hypersurface of Pm+1) via the complete linear system |L|F|.

Remark 5.2.
With the above definition, a morphism π : Y → Z is a Pm-bundle if and only if Y ≅ PZ(F) for some vector bundle F of rank m + 1 on Z; we can take as L the tautological line bundle of F on Y. On the other hand, not all scrolls (resp., quadric fibrations) in the adjunction theoretic sense (cf. [3, Sections 3.3, 14.1, 14.2]) are Pm-bundles (resp., Qm-fibrations) in our sense. In fact, our definition of Qm-fibration does not include conic bundles with singular fibers, since we require all fibers to be irreducible; moreover, the assumption that π is an extremal Mori contraction implies that ρ(Y/Z) = 1, which excludes (as we will see in the proof of Lemma 5.3) those fibrations with all fibers isomorphic to Q2 for which the fundamental group of the base Z acts with trivial monodromy. Note that the general fiber of a Qm-fibration is isomorphic to Qm.

Lemma 5.3. Let π : Y → Z be either a Pm-bundle or a Qm-fibration, and let W be the family of rational curves on Y generated by the lines in the fibers of π. Then W is an irreducible family.

Proof. The assertion is clear in all cases except when π is a Q2-fibration. In this case, a straightforward extension of the arguments in the proof of [8, Proposition (1.3.1)] shows two things. First of all, ρ(Y/Z) ≠ 1 if and only if all fibers of π are isomorphic to Q2 and the fundamental group of Z acts with trivial monodromy. Moreover, in all remaining cases, the lines in the fibers generate an irreducible family. The fact that the base is one-dimensional in the setting considered in [8] is not essential in the arguments. □

Definition 5.4. An embedding Pm ↪ Pn is said to be linear if the image of Pm is a linear subspace of Pn. Similarly, an embedding of irreducible varieties F ⊂ G, with F isomorphic to a quadric of Pm+1 and G isomorphic either to Pn or to a quadric in Pn+1, is said to be linear if OG(1)|F ≅ OF(1).
For the remainder of this section, we consider the following setting. Let X be a smooth projective variety, and assume that Y ⊂ X is a positive dimensional submanifold with ample normal bundle NY/X. An easy application of a well-known theorem of Kobayashi and Ochiai gives us the following property, in which the case when Y is a projective space or a smooth quadric of dimension ≥ 3 is considered.

Proposition 5.5. With the above notation, assume that the inclusion i : Y ↪ X induces an isomorphism i^* : Pic(X) → Pic(Y), and denote n := dim X. If Y ≅ Pm for some m ≥ 1, then X ≅ Pn and Y is linearly embedded in X. If Y ≅ Qm for some m ≥ 3, then either X ≅ Pn or X ≅ Qn, and Y is linearly embedded in X.

Proof. Suppose first that Y ≅ Pm. By the assumption, we have ρ(X) = 1, and the ample generator OX(1) of Pic(X) satisfies OX(1)|Y ≅ OY(1). Observe that d := deg NY/X ≥ rk NY/X = n − m. By adjunction, we have OX(−KX) ≅ OX(m + 1 + d), and therefore X is a Fano manifold of index ≥ n + 1. Then the assertion follows from Kobayashi and Ochiai’s theorem (see, e.g., [3, (3.1.6)]). The argument for the case when Y ≅ Qm with m ≥ 3 is analogous. □

Remark 5.6. The case when Y ≅ Q2 is harder to deal with under such weak hypotheses. However, if we additionally assume the existence of an ample line bundle H on X such that H|Y ≅ OP1×P1(1, 1), then this case cannot occur. If at the same time we relax the hypothesis on the Picard groups, and only assume that i^* : Pic(X) → Pic(Y) is injective, then it is easy to see that necessarily ρ(X) = 1, either X ≅ Pn or X ≅ Qn, and Y is linearly embedded in X. Indeed, to exclude the case ρ(X) = 2, it suffices to apply Theorem 4.1 with the family W being either one of the two rulings of Q2. In this way one obtains two distinct Mori contractions of X whose relative dimensions, added together, exceed the dimension of X, but this is impossible.
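
The index computation in the proof of Proposition 5.5 (case Y ≅ Pm) can be spelled out as below. This is a sketch under the hypotheses of that proposition; the line ℓ ⊂ Y and the integers a_i are notation introduced here for illustration only, and the splitting on ℓ uses that an ample bundle on a rational curve is a sum of line bundles of positive degree.

```latex
\begin{align*}
N_{Y/X}|_{\ell} &\cong \bigoplus_{i=1}^{n-m} \mathcal{O}_{\mathbb{P}^1}(a_i),
  \qquad a_i \ge 1
  && \text{(} \ell \subset Y \cong \mathbb{P}^m \text{ a line, } N_{Y/X} \text{ ample)} \\
d = \deg N_{Y/X} &= \sum_{i=1}^{n-m} a_i \;\ge\; n - m \\
\mathcal{O}_X(-K_X)|_Y &\cong \mathcal{O}_Y(-K_Y) \otimes \det N_{Y/X}
  \cong \mathcal{O}_Y(m + 1 + d)
  && \text{(adjunction)} \\
-K_X &\cong \mathcal{O}_X(m + 1 + d), \qquad m + 1 + d \;\ge\; n + 1
  && \text{(since } \operatorname{Pic}(X) \cong \operatorname{Pic}(Y)\text{)} .
\end{align*}
```

Since the theorem of Kobayashi and Ochiai bounds the index of an n-dimensional Fano manifold by n + 1, with equality only for Pn, the last inequality forces d = n − m, that is, a_i = 1 for every i.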
Let us now recall the following extension of the Lefschetz hyperplane theorem, essentially due to Sommese [27, Proposition 1.16] (see also [19, Theorem 1.1]), which will be used in the proof of case (b) of Theorem 5.8 below.

Proposition 5.7. Let X be a projective manifold, and suppose that Y ⊂ X is a submanifold defined by a regular section of an ample vector bundle on X. Suppose that dim Y ≥ 3. Then the inclusion i : Y ↪ X induces an isomorphism i^* : Pic(X) → Pic(Y).

We now address the case when Y is a Pm-bundle or a Qm-fibration over a smooth projective variety. The following theorem (which includes Proposition 5.5 as a very special case) is the main result of this section.

Theorem 5.8. Let Y be a submanifold of a projective manifold X with ample normal bundle. Assume that Y admits either a Pm-bundle or a Qm-fibration structure π : Y → Z over a smooth projective variety Z (see Definition 5.1), and that Y has codimension (12) codimX Y < dim Y − dim Z. Further suppose that one of the following two situations occurs:
(a) Z is a curve and the inclusion i : Y ↪ X induces an isomorphism Pic(X) ≅ Pic(Y); or
(b) Y is the zero scheme of a regular section of an ample vector bundle E on X.
Then π extends to a morphism π̃ : X → Z, that is, there is a commutative diagram

[Commutative diagram: i : Y ↪ X, π : Y → Z, π̃ : X → Z, with π̃ ◦ i = π.]

Moreover, letting n := m + codimX Y, the following holds:
• if π is a Pm-bundle, then π̃ is a Pn-bundle; and
• if π is a Qm-fibration, then π̃ is either a Pn-bundle or a Qn-fibration.
In either case, the fibers of π are linearly embedded in the fibers of π̃.

Proof. For clarity of exposition, we discuss case (a) and case (b) separately, as certain steps require different arguments.

Case (a). Let W be the family of rational curves on Y generated by the lines in the fibers of π. Note that W is an irreducible family by Lemma 5.3, and that π is the RCW-fibration. By Propositions 2.6 and 2.26, Hombir(P1, X) is smooth at the generic point of i_*(W).
Let V = 〈i_*(W)〉 (note that V is irreducible). By Proposition 2.20, the rational fibration defined by the restriction VY of V to Y coincides with π. Then, by Corollary 3.13, we obtain the commutative diagram

[Commutative diagram: i : Y ↪ X, π : Y → Z, φ : X → S, δ : Z → S.]

where φ is the RCV-fibration and δ is a finite morphism of smooth projective curves. By definition, there is a line bundle L on Y such that L|F ≅ OF(1) for every fiber F of π. Since L extends to a line bundle on X and ρ(X/S) = 1, Proposition 4.4 implies that there exists an ample line bundle H on X such that H|F ≅ OF(1) for every F. If C is a line in a fiber of π and we set b = n if Y is a Pm-bundle and b = n − 1 if Y is a Qm-fibration, then we have (KX + bH) · C = (KY + bH|Y) · C − det NY/X · C < 0. It thus follows from the classification of polarized manifolds with large nef-value due to Fujita and Ionescu (see [11, Section 1] or [14, Section 1]) that φ : X → S is either a Pn-bundle or a fibration with fibers isomorphic to hyperquadrics in Pn+1 and relative Picard number 1, and in fact, since dim X ≥ 3 and dim S = 1, in the second case X is a Qn-fibration. To conclude, observing that the general fiber of φ is a homogeneous space, we deduce from Remark 3.9 that δ is in fact an isomorphism, and therefore we get a diagram as in the statement by setting π̃ := δ^{-1} ◦ φ.

Case (b). The first step is to prove the existence of a diagram as in the statement. We will use different arguments according to the codimension of Y. For short, let r := rk E. Note that codimX Y = r = n − m. If r = 1, then (12) implies that π has relative dimension m ≥ 2. Therefore in this case we can apply a general result of Sommese [26, Proposition III], which says that π extends to a morphism π̃ : X → Z. We now assume that r ≥ 2. Note that in this case m ≥ 3 by (12). To extend π in this situation, we will use the results from the previous sections.
As in the proof of case (a), let W be the irreducible family of rational curves on Y generated by the lines in the fibers of π, and let V be the unique irreducible component of Hombir(P1, X) containing the generic point of i_*(W). By Proposition 2.20, the rational fibration defined by the restriction VY of V to Y coincides with the one defined by W, that is, with π. Then, by Theorems 3.1 and 3.6, we obtain a commutative diagram

[Commutative diagram: i : Y ↪ X, π : Y → Z, the rational map φ : X ⇢ S, and the rational map δ : Z ⇢ S.]

where φ is the rational quotient defined by V and δ is a dominant and generically finite map. Let G be a general fiber of φ, let F be one of the fibers of π that is contained in G, and fix a line C ⊆ F (we can assume that G and F are smooth). Note that dim G = n.

Lemma 5.9. Assuming r ≥ 2, the family V is unsplit.

Proof of the lemma. Since the section of E defining Y in X restricts to a regular section of E|G whose zero scheme is F, we can apply Proposition 5.7 to this setting. This implies that the inclusion of F in G induces an isomorphism Pic(G) ≅ Pic(F). Note that ρ(F) = 1, since F is either a projective space or a quadric of dimension m ≥ 3. Therefore we have ρ(G) = 1. Let L be a line bundle on Y such that L|F ≅ OF(1), and let L̃ be the extension of L to X. We observe that L̃|G is ample and −KG ≡ aL̃|G for some integer a, which is easily seen to be positive. Then, since L̃|G · C = 1, we see that G is a Fano manifold of index a. In particular, we have −KG · C = a ≤ n + 1. Note that, if X◦ ⊆ X is as in Theorem 2.13, then we can assume that G is a fiber of the morphism X◦ → Z, and hence KX|G = KX◦|G = KG. Therefore det E · C = KY · C − KX · C = KF · C − KG · C ≤ −m + n + 1 = r + 1. Since we are assuming that r ≥ 2, this implies that det E · C < 2r. We conclude that the family V is unsplit by Remark 2.11. □

We come back to the proof of the theorem, still assuming for the moment that r ≥ 2. Since V is unsplit by Lemma 5.9, we can apply Theorem 4.1.
This implies that, in the above diagram, both φ and δ are morphisms and δ is finite, and moreover ρ(X/S) = 1. Note in particular that rk Im(Pic(X) → Pic(G)) = 1. We have dim G = n. We set b = n if F ≅ Pm, and b = n − 1 if F ≅ Qm. Denoting by C a line in a general fiber of π, we obtain (KG + bH|G) · C = (KF + bH|F) · C − det NF/G · C < 0, since NF/G ≅ NY/X|F is ample of rank n − m. By applying again [11, 14] and taking into account that ρ(X/S) = 1, we deduce that either (G, H|G) ≅ (Pn, OPn(1)) or (G, H|G) ≅ (Qn, OQn(1)), with the second case occurring only if F ≅ Qm. Since the general fiber G of π̃ is a homogeneous space, we deduce from Remark 3.9 that δ is in fact an isomorphism (see footnote 1). We then define π̃ := δ^{-1} ◦ φ. At this point we have a diagram as in the statement for all values of r. We claim that π̃ is equidimensional with irreducible and reduced fibers. To see this, let Gz := π̃^{-1}(z) for an arbitrary z ∈ Z, and let ∆ be any component of Gz. Note that ∆ ∩ Y is contained in the fiber Fz := π^{-1}(z) of π, which is irreducible and reduced of dimension m; in particular, we have dim(∆ ∩ Y) ≤ m. Since dim ∆ ≥ n and ∆ ∩ Y is defined by a section of E|∆, we have dim(∆ ∩ Y) ≥ dim ∆ − r. We conclude that dim ∆ = n so that π̃ is equidimensional, and Fz ⊂ ∆. Recalling that Fz is irreducible and reduced, and cut out on Y by Gz (scheme theoretically), we see that Gz is also irreducible and reduced. Since the fibers of π̃ are irreducible and reduced, we can apply the semi-continuity of the ∆-genus [10, Section 5 and (2.1)]. We conclude that π̃ is either a Pn-bundle or a Qn-fibration over Z. □

We note that all cases in Theorem 5.8 are effective.

Remark 5.10. We already listed in the introduction a series of works related to Theorem 5.8. The classical setting when Y is an ample divisor is widely studied in the literature; for a survey and complete references we refer to [3, Chapter 5] and [2].
In the case when Y has codimension r ≥ 2, we are not aware of other results of this type in which the base of the fibration is allowed to be arbitrary. References [1] M. Andreatta and G. Occhetta, Ample vector bundles with sections vanishing on special varieties. Internat. J. Math. 10 (1999), 677–696. [2] M.C. Beltrametti and P. Ionescu, A view on extending morphisms from ample divisors. In preparation. [3] M.C. Beltrametti and A.J. Sommese, The Adjunction Theory of Complex Projective Varieties. Expo- sitions in Mathematics, 16, W. de Gruyter, Berlin, (1995). [4] M.C. Beltrametti, A.J. Sommese and J.A. Wísniewski, Results on varieties with many lines and their applications to adjunction theory. Complex Algebraic Varieties (K. Hulek et al. eds.), Proceedings, Bayreuth, 1990. 16–38, Lecture Notes in Math., 1507, Springer-Verlag, Berlin, 1992. [5] F. Campana, Connexité rationnelle des variétés de Fano. Ann. Sci. Ec. Norm. Sup. 25 (1992), 539– [6] O. Debarre, Higher-Dimensional Algebraic Geometry. Universitext, Springer-Verlag, Berlin, 2001. [7] T. de Fernex, Ample vector bundles with sections vanishing along conic fibrations over curves. Collect. Math. 49 (1998), 67–79. [8] T. de Fernex, Ample vector bundles and intrinsic quadric fibrations over irrational curves.Matematiche (Catania) 55 (2000), 205–222. [9] T. de Fernex and A. Lanteri, Ample vector bundles and Del Pezzo manifolds. Kodai Math. J. 22 (1999), 83–98. [10] T. Fujita, On the structure of polarized varieties with ∆-genera zero. J. Fac. Sci. Univ. Tokyo Sect. IA Math. 22 (1975), 103–115. [11] T. Fujita, On polarized manifolds whose adjoint bundles are not semipositive. Algebraic Geometry, Sendai 1985, Adv. Stud. Pure Math. 10 (1987), 167–178. 1Alternatively, one can deduce this fact from [27, Proposition 1.16]. 
Indeed this result, applied to G ∩ Y once this is thought of as a subvariety of G defined by a regular section of E|G, gives the isomorphism H^0(G ∩ Y, Z) ≅ H^0(G, Z), and this implies that G ∩ Y is connected. [12] T. Graber, J. Harris and J.M. Starr, Families of rationally connected varieties. J. Amer. Math. Soc. 16 (2003), 57–67. [13] R. Hartshorne, Ample Subvarieties of Algebraic Varieties. Lecture Notes in Math. 156, Springer-Verlag, Berlin, 1970. [14] P. Ionescu, Generalized adjunction and applications. Math. Proc. Camb. Philos. Soc. 99 (1986), 457– [15] J. Kollár, Rational Curves on Algebraic Varieties. Ergeb. Math. Grenzgeb. (3) 32, Springer-Verlag, Berlin, 1996. [16] J. Kollár, Y. Miyaoka and S. Mori, Rationally connected varieties. J. Algebraic Geom. 1 (1992), 429–448. [17] J. Kollár and S. Mori, Birational Geometry of Algebraic Varieties. Cambridge Tracts in Mathematics, Cambridge University Press, Cambridge, 1998. [18] A. Lanteri and H. Maeda, Ample vector bundles with sections vanishing on projective spaces or quadrics. Internat. J. Math. 6 (1995), 587–600. [19] A. Lanteri and H. Maeda, Ample vector bundle characterizations of projective bundles and quadric fibrations over curves. In Andreatta et al. (eds.), Proceedings of the international conference: Higher Dimensional Complex Varieties, Trento, Italy, June 15–24, 1994. 247–259, W. de Gruyter, Berlin, 1996. [20] A. Lanteri and H. Maeda, Geometrically ruled surfaces as zero loci of ample vector bundles. Forum Math. 9 (1997), 1–15. [21] A. Lanteri and H. Maeda, Special varieties in adjunction theory and ample vector bundles. Math. Proc. Camb. Philos. Soc. 130 (2001), 61–75. [22] R. Lazarsfeld, Positivity in Algebraic Geometry. I – Classical Setting: Line Bundles and Linear Series. Ergeb. Math. Grenzgeb. (3) 48, Springer-Verlag, Berlin, 2004. [23] J.
Le Potier, Annullation de la cohomologie à valeurs dans un fibré vectoriel holomorphe positif de rang quelconque. Math. Ann. 218 (1975), 35–53. [24] M. Lübke, Beweis einer Vermutung von Hartshorne für den Fall homogener Mannigfaltigkeiten. J. Reine Angew. Math. 316 (1980), 215–220. [25] G. Occhetta, Extending rationally connected fibrations. Forum Math. 18 (2006), 853–867. [26] A.J. Sommese, On manifolds that cannot be ample divisors. Math. Ann. 221 (1976), 55–72. [27] A.J. Sommese, Submanifolds of Abelian varieties. Math. Ann. 233 (1978), 229–256. [28] J.A. Wísniewski, On a conjecture of Mukai, Manuscripta Math. 68 (1990), 135–141. Dipartimento di Matematica, Università di Genova, Via Dodecaneso 35, I-16146 Gen- ova, Italy E-mail address: beltrame@dima.unige.it Department of Mathematics, University of Utah, 155 South 1400 East, Salt Lake City, UT 84112, USA E-mail address: defernex@math.utah.edu Dipartimento di Matematica “F. Enriques”, Università di Milano, Via C. Saldini 50, I-20133 Milano, Italy E-mail address: lanteri@mat.unimi.it Introduction 1. Conventions and basic notation 2. Families of rational curves on varieties 3. Extension of rationally connected fibrations 4. Extension of Mori contractions 5. Projective bundles and quadric fibrations as ample subvarieties References ABSTRACT Under some positivity assumptions, extension properties of rationally connected fibrations from a submanifold to its ambient variety are studied. Given a family of rational curves on a complex projective manifold X inducing a covering family on a submanifold Y with ample normal bundle in X, the main results relate, under suitable conditions, the associated rational connected fiber structures on X and on Y. Applications of these results include an extension theorem for Mori contractions of fiber type and a classification theorem in the case Y has a structure of projective bundle or quadric fibration. 
<|endoftext|><|startoftext|> Introduction

Let G∞ be a semisimple real Lie group with unitary dual Ĝ∞. The goal of this note is to produce new upper bounds for the multiplicities with which representations π ∈ Ĝ∞ of cohomological type appear in certain spaces of cusp forms on G∞. More precisely, we suppose that G∞ := G(R ⊗_Q F) for some connected semisimple linear algebraic group G over a number field F. Let K∞ be a maximal compact subgroup of G∞. We fix an embedding G ↪ GL_N for some N, and for any ideal q of OF, we let G(q) denote the intersection of G∞ with the congruence subgroup of GL_N(OF) of full level q. We also fix an arithmetic lattice Γ in G∞ (i.e. a subgroup commensurable with the congruence subgroups G(q)) and write Γ(q) := Γ ∩ G(q). For any π ∈ Ĝ∞, let m(π, Γ(q)) denote the multiplicity with which π occurs in the decomposition of the regular representation of G∞ on L^2_cusp(Γ(q)\G∞). Let V(q) denote the volume of the arithmetic quotient Γ(q)\G∞. In terms of this notation, we may state our main results.

Theorem 1.1. Let p be a prime ideal in OF. Let π ∈ Ĝ∞ be of cohomological type. Suppose either that G∞ does not admit discrete series, or, if G∞ admits discrete series, that π contributes to cohomology in degrees other than dim(G∞/K∞). Then m(π, Γ(p^k)) ≪ V(p^k)^{1−1/dim(G∞)} as k → ∞, with the implied constant depending on Γ and p.

Theorem 1.2. Let p be a prime ideal in OF. Let W be a finite-dimensional representation of G∞, and let V_{W,k} denote the local system on Γ(p^k)\G∞ induced by W (assuming that k is taken large enough for Γ(p^k) to be torsion free). Let n ≥ 0, and if G∞ admits discrete series, then suppose furthermore that n ≠ dim(G∞/K∞). Then dim H^n(Γ(p^k)\G∞, V_{W,k}) ≪ V(p^k)^{1−1/dim(G∞)} as k → ∞, with the implied constant depending on Γ and p.

These two theorems are evidently closely related, in light of the results of [5], which show that H^n(Γ(p^k)\G∞, V_{W,k}) may be computed in terms of automorphic forms.
http://arxiv.org/abs/0704.0662v1

In the remainder of this introduction, we discuss the relation of Theorem 1.1 to prior results in this direction before briefly describing the main ingredients in the proof of the two theorems. DeGeorge and Wallach [3] established general upper bounds for m(π, Γ) in the case where Γ is cocompact. In particular (ibid, Corollary 3.2), they showed that

m(π, Γ) ≤ ( ∫_B |φ(g)|^2 dg )^{-1} · vol(Γ\G∞), (1.1)

where φ(g) = 〈π(g)v, v〉 is a matrix coefficient, and B is the preimage in Γ\G∞ of a ball in Γ\G∞/K∞ of radius equal to the injectivity radius of Γ\G∞/K∞. Suppose, however, that π is not a discrete series. In particular, the corresponding matrix coefficients of π are then not square integrable. If Γ(q) denotes the mod q congruence subgroup of Γ, then inj.rad(Γ(q)\G∞) → ∞ as N_{F/Q}(q) → ∞, and thus, the formula of DeGeorge–Wallach implies that

lim_{N_{F/Q}(q)→∞} V(q)^{-1} · m(π, Γ(q)) = 0. (1.2)

For non-cocompact Γ, an analogous result was established by Savin [10]. It is natural to try to improve this result so as to obtain an estimate on the rate of decay in (1.2) as N_{F/Q}(q) → ∞. If π is non-tempered, then (1.1) itself implies an estimate of the form

m(π, Γ(q)) ≪ V(q)^{1−µ} (1.3)

for some µ > 0. (See [9], Lemma 1 and displayed equation (6).) In fact Sarnak and Xue in [9] have conjectured an inequality of the following form (in the case of cocompact Γ):

Conjecture 1.3 (Sarnak–Xue). For π ∈ Ĝ∞ fixed, m(π, Γ(q)) ≪ V(q)^{(2/p(π))+ε}, for all ε > 0, where p(π) is the infimum over p ≥ 2 such that the K-finite matrix coefficients of π are in L^p(G∞).

Sarnak and Xue proved their conjecture for arithmetic lattices in SL2(R) and SL2(C), obtaining partial results in the direction of this conjecture for SU(2, 1). Note, however, that their conjecture is nontrivial only for non-tempered representations, since for tempered representations, p(π) = 2. In particular, in the tempered but non-discrete series case, Conjecture 1.3 is weaker than the known result (1.2).
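
To make the comparison concrete, the exponents can be lined up as below. This is only an illustrative sketch: the choice G∞ = SL2(C), which has real dimension 6 and admits no discrete series, is an example picked here and not one singled out in the text, and dim(G∞) is read as the real dimension.

```latex
\begin{align*}
\pi \text{ tempered} \;&\Longrightarrow\; p(\pi) = 2
  \;\Longrightarrow\;
  V(\mathfrak{q})^{(2/p(\pi)) + \epsilon} = V(\mathfrak{q})^{1 + \epsilon}
  && \text{(Conjecture 1.3)} \\
\text{(1.2)} \;&\Longrightarrow\;
  m(\pi, \Gamma(\mathfrak{q})) = o\bigl(V(\mathfrak{q})\bigr)
  && \text{(already stronger)} \\
\dim G_\infty = 6 \;&\Longrightarrow\;
  m(\pi, \Gamma(\mathfrak{p}^k)) \ll V(\mathfrak{p}^k)^{1 - 1/6}
  = V(\mathfrak{p}^k)^{5/6}
  && \text{(Theorem 1.1)} .
\end{align*}
```

In this example Theorem 1.1 would give a power saving of the form (1.3) with µ = 1/6 along the tower Γ(p^k), for π of cohomological type.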
In Theorem 1.1, we restrict our attention to congruence covers of the form $\Gamma(\mathfrak{p}^k)$ for the fixed prime $\mathfrak{p}$. For such covers we obtain a quantitative improvement of (1.2) even in the case of tempered representations (at least for those of cohomological type; note that non-discrete series tempered representations of cohomological type exist precisely when G∞ admits no discrete series; see [1], Thm. 5.1, p. 101). For such representations, our result provides the first general bound of the form (1.3) for any µ > 0.

As we already noted, our two main theorems are closely related. Indeed, Theorem 1.1 is an easy corollary of Theorem 1.2 (see the end of Section 3 below), and most of our efforts will be concentrated on establishing the latter result.

When studying the Betti numbers of arithmetic quotients of symmetric spaces, it is natural to try to use tools such as Euler characteristics and the Lefschetz trace formula. When applied to analyzing contributions from the discrete series, such methods tend to be very powerful; for example, the $(\mathfrak{g}, \mathfrak{k})$-cohomology of a discrete series representation is concentrated in a single dimension [1], and so no cancellations occur when taking alternating sums. However, in other situations, these methods can be useless. For example, if π is tempered but not discrete series, then the Euler characteristic of its $(\mathfrak{g}, \mathfrak{k})$-cohomology vanishes [1]. Similarly, in situations where the symmetric space is a real manifold of odd dimension n, Poincaré duality leads to cancellations in the natural sum $\sum_k (-1)^k \dim(H^k)$. One is thus forced to find different techniques.

The proof of Theorem 1.2 takes as input the inequality (1.2) of [3] and [10] and a spectral sequence from [4], proceeding via a bootstrapping argument relying on non-commutative Iwasawa theory.

Acknowledgments. The second author would like to thank Peter Sarnak for a very stimulating conversation on the subject of this note.

2 Iwasawa Theory

Let $G \subseteq \mathrm{GL}_N(\mathbb{Z}_p)$ be an analytic pro-p group.
Let $G_k = G \cap (1 + p^k M_N(\mathbb{Z}_p)) \subseteq \mathrm{GL}_N(\mathbb{Z}_p)$. The subgroups $G_k$ form a fundamental set of open neighbourhoods of the identity in G, and moreover, there exists a constant c such that $[G : G_k] = c \cdot p^{dk}$ for all sufficiently large k, where d = dim(G).

Fix a finite extension E of $\mathbb{Q}_p$ with ring of integers $\mathcal{O}_E$. Write $\Lambda = \mathcal{O}_E[[G]]$ and $\Lambda_E = E \otimes_{\mathcal{O}_E} \Lambda$. The module theory of Λ falls under the rubric of Iwasawa theory. A fundamental result of Lazard [7] states that Λ is Noetherian; the same is thus true of the ring $\Lambda_E$. The rings Λ and $\Lambda_E$ are non-commutative domains admitting a common field of fractions, which we will denote by $\mathcal{L}$. Thus, $\mathcal{L}$ is a division ring which contains Λ and $\Lambda_E$ and is flat over each of them (on both sides). If M is a finitely generated left Λ-module (resp. $\Lambda_E$-module), then $\mathcal{L} \otimes_\Lambda M$ (resp. $\mathcal{L} \otimes_{\Lambda_E} M$) is a finite-dimensional left $\mathcal{L}$-vector space; we define the rank of M to be the $\mathcal{L}$-dimension of this vector space. Note that rank is additive in short exact sequences of finitely generated Λ-modules (resp. $\Lambda_E$-modules), by virtue of the flatness of $\mathcal{L}$ over Λ and $\Lambda_E$.

Recall that a continuous representation of G on an E-Banach space V is called admissible if its topological E-dual V′ (which is naturally a $\Lambda_E$-module) is finitely generated over $\Lambda_E$. (See [11]; a key point is that since $\Lambda_E$ is Noetherian, the category of admissible continuous G-representations is abelian. Indeed, passing to topological duals yields an anti-equivalence with the abelian category of finitely generated $\Lambda_E$-modules.) We define the corank of an admissible G-representation to be the rank of the finitely generated $\Lambda_E$-module V′.

A coadmissible G-representation V is not determined by the collection of subspaces of invariants $V^{G_k}$ (k ≥ 1). However, the following result (Theorem 1.10 of Harris [6]) shows that its corank is so determined.

Theorem 2.1 (Harris). Let V be an E-Banach space equipped with an admissible continuous G-representation and let d = dim(G).
Then as k → ∞,

$$ \dim_E V^{G_k} = r \cdot [G : G_k] + O(p^{(d-1)k}) = r \cdot c \cdot p^{dk} + O(p^{(d-1)k}), $$

where r is the corank of V and c depends only on G.

Using this result, we may obtain bounds on the dimensions of the continuous cohomology groups $H^i(G_k, V)$ in terms of k for admissible continuous G-representations V. (Let us remark that the continuous $G_k$-cohomology on the category of admissible continuous G-representations may also be computed as the right derived functors of the functor of $G_k$-invariants; see Prop. 1.1.3 of [4].)

Lemma 2.2. Let V be an admissible continuous G-representation. For each i ≥ 1,

$$ \dim_E H^i(G_k, V) \ll p^{(d-1)k} $$

as k → ∞.

Proof. Let C := C(G,E) denote the Banach space of continuous E-valued functions on G, equipped with the right regular G-action. The module C has corank one (indeed, it is cofree: its topological dual is free of rank one over $\Lambda_E$). Moreover, C is injective in the abelian category of admissible G-representations and is therefore acyclic. If V is an admissible continuous G-representation, then there exists an exact sequence

$$ 0 \longrightarrow V \longrightarrow C^n \longrightarrow W \longrightarrow 0 $$

of admissible continuous G-representations for some integer n ≥ 0. Since C is acyclic, from the long exact sequence of cohomology we obtain the following:

$$ 0 \longrightarrow V^{G_k} \longrightarrow (C^{G_k})^n \longrightarrow W^{G_k} \longrightarrow H^1(G_k, V) \longrightarrow 0, \tag{2.1} $$

$$ H^i(G_k, V) \simeq H^{i-1}(G_k, W), \quad i \ge 2. \tag{2.2} $$

The lemma for i = 1 follows from a consideration of (2.1), taking into account Theorem 2.1 and the fact that the corank of W is equal to n minus the corank of V (since corank is additive in short exact sequences). We now proceed by induction on i. Assume the result for i ≤ m and all admissible continuous representations, in particular for W. The result for i = m + 1 then follows directly from the isomorphism (2.2). This completes the proof.

3 Cohomology of Arithmetic Quotients of Symmetric Spaces

We now return to the situation considered in the introduction and use the notation introduced there.
In particular, we fix a connected semisimple linear group G over F , an embedding G →֒ GLN over F , an arithmetic lattice Γ of the associated real group G∞, and a prime p of F . If we write G := lim Γ/Γ(pk), then G is a compact open subgroup of the p-adic Lie group G(Fp) (where Fp denotes the completion of F at p); alternatively, we may define G to be the closure of Γ in G(Fp). If we replace Γ by Γ(p k) for some sufficiently large value of k (i.e. discarding finitely many initial terms in the descending sequence of lattices Γ(pk)), then G will be pro-p and hence, will be an analytic pro-p-group. Note that Γ is a dense subgroup of G. Let e and f denote respectively the ramification and inertial indices of p in F (so that [Fp : Qp] = ef). For each k ≥ 0, write Gk to denote the closure of Γ(pek) in G. Alternatively, if we consider the embedding G(Fp) →֒ GLN (Fp) →֒ GLefN (Qp), then Gk = G ∩ (1 + p kMefN (Zp)); thus, our notation is compatible with that of the preceding section. We let d denote the dimension of G; note that d = [F :Q] · dim(G∞). For each k ≥ 0, we write Yk := Γ(p ek)\G∞/K∞. There is a natural action of G on Yk through its quotient G/Gk (∼= Γ/Γ(p ek)), which is compatible with the projections Yk → Yk′ for 0 ≤ k ′ ≤ k. Fix a finite-dimensional representation W of G over E, and let W0 denote a G-invariant OE- lattice in W . Let Vk denote the local system of free finite rank OE-modules on Yk associated to W0, and denote by Vk the pull-back of V0 to Yk for any k ≥ 0. If 0 ≤ k ′ ≤ k, then the sheaf Vk on Yk is naturally isomorphic to the pull-back of the sheaf Vk′ on Yk′ under the projection Yk → Yk′ . In particular the sheaf Vk is G/Gk-equivariant. Recall the following definitions from from [4], p. 21: H̃n(V) := lim Hj(Yk,Vk/p s), H̃n(V)E := E ⊗OE H̃ n(V). 
Each $\tilde H^n(V)$ is a p-adically complete $\mathcal{O}_E$-module, equipped with a left G-action in a natural way, and hence each $\tilde H^n(V)_E$ has a natural structure of E-Banach space and is equipped with a continuous left G-action. In fact, they are admissible continuous representations of G ([4], Thm. 2.1.5 (i)), and in particular, Theorem 2.1 and Lemma 2.2 apply to them. (Note that the results of [4] are stated in the adèlic language. We leave it to the reader to make the easy translation to the more classical language we are using in this paper.)

The following result, which is Theorem 2.1.5 (ii) of [4], p. 22, is a “control theorem” relating the $G_k$-invariants in $\tilde H^j(V)_E$ to the classical cohomology classes $H^j(Y_k, V_k) \otimes E$.

Theorem 3.1. Fix an integer k. There is a spectral sequence

$$ E_2^{i,j}(Y_k) = H^i(G_k, \tilde H^j(V)_E) \Longrightarrow H^{i+j}(Y_k, V_k)_E. $$

One should view this spectral sequence as a version of the Hochschild-Serre spectral sequence “compatible in the G-tower.”

Theorem 3.2. For any n ≥ 0, if $r_n$ denotes the corank of $\tilde H^n(V)$, then

$$ \dim_E H^n(Y_k, V_k)_E = r_n \cdot c \cdot p^{dk} + O(p^{(d-1)k}) $$

as k → ∞. (Here c denotes the constant appearing in the statement of Theorem 2.1; it depends only on G.)

Proof. For each i, j ≥ 0 and l ≥ 2, let $E_l^{i,j}(Y_k)$ denote the terms in the spectral sequence of Theorem 3.1. Since $\tilde H^j(V)_E$ is admissible, Lemma 2.2 implies that, for each i ≥ 1,

$$ \dim_E H^i(G_k, \tilde H^j(V)_E) \ll p^{(d-1)k}, $$

and thus

$$ \dim_E E_l^{i,j}(Y_k) \ll p^{(d-1)k} $$

as k → ∞ (since $E_l^{i,j}(Y_k)$ is a subquotient of $E_2^{i,j}(Y_k) := H^i(G_k, \tilde H^j(V)_E)$ for l ≥ 2). Theorem 2.1 shows that

$$ \dim_E E_2^{0,n}(Y_k) = \dim_E \bigl(\tilde H^n(V)_E\bigr)^{G_k} = r_n \cdot c \cdot p^{dk} + O(p^{(d-1)k}). $$

On the other hand, since the spectral sequence of Theorem 3.1 is an upper right quadrant spectral sequence, $E_\infty^{0,n}$ is obtained by taking finitely many successive kernels of differentials $d_l \colon E_l^{0,n} \to E_l^{l,\,n-l+1}$, whose targets all have dimension $\ll p^{(d-1)k}$ by the first part of our argument. Thus,

$$ \dim_E E_\infty^{0,n}(Y_k) = r_n \cdot c \cdot p^{dk} + O(p^{(d-1)k}). $$
Since $H^n(Y_k, V_k)_E$ admits a finite length filtration whose associated graded pieces are isomorphic to $E_\infty^{i,j}(Y_k)$ for i + j = n, we conclude that

$$ \dim_E H^n(Y_k, V_k)_E = r_n \cdot c \cdot p^{dk} + O(p^{(d-1)k}), $$

as claimed.

The following lemma quantifies the precise relationship between multiplicities and the dimensions of cohomology groups that we will require to deduce Theorem 1.1 from Theorem 1.2.

Lemma 3.3. Fix a cohomological degree n, and let S denote the set of isomorphism classes [π] of π ∈ Ĝ∞ that contribute to cohomology with coefficients in V in degree n. Then

$$ \sum_{[\pi] \in S} m(\pi, G_k) \asymp \dim_E H^n_{\mathrm{cusp}}(Y_k, V_k)_E. $$

Proof. Since the set S is finite, there is an integer d ≥ 1 so that $1 \le \dim H^n(\mathfrak{g}, \mathfrak{k}; \pi \otimes W) \le d$ for each isomorphism class [π] ∈ S. This implies that

$$ \sum_{[\pi] \in S} m(\pi, G_k) \le \dim_E H^n_{\mathrm{cusp}}(Y_k, V_k)_E \le d \sum_{[\pi] \in S} m(\pi, G_k) $$

for each k ≥ 0.

We can now prove our main result.

Theorem 3.4. Let n ≥ 0, and suppose that either G∞ does not admit discrete series or else that n ≠ dim(G∞/K∞). Then

$$ \dim_E H^n(Y_k, V_k)_E \ll p^{(d-1)k} $$

as k → ∞, for all n ≥ 0.

Proof. In the case when G∞ admits discrete series, recall that these contribute to cohomology only in the dimension dim(G∞/K∞) ([1], Thm. 5.1, p. 101). Thus, under the assumptions of the theorem, there is no contribution from the discrete series to $H^n_{\mathrm{cusp}}(Y_k, V_k)$. The inequality (1.2) of [3] and [10], together with Lemma 3.3 and the main result of [8] (which states that $\dim H^n(Y_k, V_k)/H^n_{\mathrm{cusp}}(Y_k, V_k) = o(p^{dk})$), thus shows that $\dim_E H^n(Y_k, V_k)_E = o(p^{dk})$ as k → ∞, for all n ≥ 0. From Theorem 3.2, we then infer that each $\tilde H^n(V)$ has corank 0. Another application of the same theorem now gives our result.

Note that $V(\mathfrak{p}^{ek}) \sim [G : G_k] \sim c \cdot p^{dk}$; thus Theorem 3.4 implies Theorem 1.2 since

$$ \dim(G) \le \dim(G_\infty). \tag{3.1} $$

Theorem 1.2 and Lemma 3.3 together imply Theorem 1.1.

Remark 3.5. We have equality in (3.1) precisely when $\mathfrak{p}$ is the unique prime lying over p in $\mathcal{O}_F$.
If there is more than one prime lying over p, then dim(G) is strictly less than dim(G∞), and we obtain a corresponding improvement in the bounds of Theorems 1.1 and 1.2, namely (in the notation of their statements), that m(π,Γ(pk)) and dimE H i(Y (pk),VW,k)E ≪ V (p n)1−1/ dim(G) (where, as we noted above, dim(G) = [F :Q] · dim(G∞), with e and f being the ramification and inertial index of p respectively). Example/Question 3.6. Let F/Q be an imaginary quadratic field, and let G = SL2/F . The corresponding symmetric space G∞/K∞ = SL2(C)/SU(2) is a real hyperbolic three space H, and the quotients Y are commensurable with the Bianchi manifolds H/PGL2(OK). Choose a local system V0 associated to some finite-dimensional representation W of G∞ = GL2(C) and a congruence subgroup Γ. Assume that p = pp splits in OF , and apply Theorem 3.4 to the p-power tower. We obtain the inequality H1cusp(Yk,Vk) ≪ p as k → ∞. It is natural to ask how tight this inequality is. The main result of Calegari–Dunfield [2] shows that there exists at least one (F,Γ, p) for which H1cusp(Yk,C) = 0 for all k. On the other hand, if there exists at least one newform on Γ(pk) for some k, then a consideration of the associated oldforms shows that H1cusp(Yk,Vk) ≫ p as k → ∞. Are there situations in which this lower bound gives the true rate of growth? Remark 3.7. Our results are most interesting in the case when G∞ does not admit any discrete series, since, as we noted in the introduction, in this case (and only in this case), G∞ admits (non-discrete series) tempered representations of cohomological type. On the other hand, Theorem 3.2 does have a consequence in the case when G∞ admits discrete series which may be of some interest. Recall the following result from [10] (established in [3] in the cocompact case): if π ∈ Ĝ∞ lies in the discrete series, then m(Γ(πk), π) = d(π)V (pk) + o(V (pk)) (3.2) as k → ∞. 
Fix a finite-dimensional representation W of G∞, and let Ĝ∞(W )d denote the subset of Ĝ∞ consisting of discrete series representations that contribute to cohomology with coefficients in W . Summing over all π ∈ Ĝ∞(W )d, we obtain the formula |Ĝ∞(W )d| π∈ bG∞(W )d m(Γ(πk), π) = d(π)V (pk) + o(V (pk)). (3.3) (A result first proved in [8].) The following result provides an improvement in the error term of (3.3). Theorem 3.8. There exists µ > 0 such that |Ĝ∞(W )d| π∈ bG∞(W )d m(Γ(πk), π) = d(π)V (pk) +O(V (pk)1−µ). Proof. Let n = dim(G∞/K∞). As already noted, it follows from [1], (Thm. 5.1, p. 101), that all non-discrete series contributions to Hncusp(Yk,Vk) are non-tempered. The same result shows that each discrete series has one-dimensional (g, k)-cohomology in dimension n. As we recalled in the introduction, the multiplicity of any non-tempered representations is bounded by V (pk)1−µ for some µ > 0 [9], and thus, Theorem 3.2 and (the proof of) Lemma 3.3 show that |Ĝ∞(W )d| π∈ bG∞(W )d m(Γ(πk), π) = C · V (pk) +O(V (pk)1−µ). Comparing this formula with (3.3) yields the theorem. Question 3.9. Does the result of Theorem 3.8 hold term-by-term? That is, does (3.2) admit an improvement of the form m(Γ(πk), π) = d(π)V (pk) +O(V (pk)1−µ) for some µ > 0? References [1] Borel, A.; Wallach, N. Continuous cohomology, discrete subgroups, and representations of reductive groups. Annals of Mathematics Studies, 94. Princeton University Press, Princeton, N.J.; University of Tokyo Press, Tokyo, 1980. xvii+388 pp. [2] Calegari, F.; Dunfield, N. Automorphic forms and rational homology 3-spheres. Geom. Topol. 10 (2006), 295–329. [3] DeGeorge, D.; Wallach, N. Limit formulas for multiplicities in L2(Γ\G). Ann. Math. (2) 107 (1978), no. 1, 133–150. [4] Emerton, M. On the interpolation of systems of eigenvalues attached to automorphic Hecke eigenforms. Invent. Math. 164 (2006), no. 1, 1–84. [5] Franke, J. Harmonic Analysis in Weighted L2-spaces. Ann. Sci. cole Norm. Sup. (4) 31 (1998), no. 
2, 181–279. [6] Harris, M. Correction to: “p-adic representations arising from descent on abelian varieties” [Compositio Math. 39 (1979), no. 2, 177–245]. Compositio Math. 121 (2000), no. 1, 105–108. [7] Lazard, M. Groupes analytiques p-adiques, Publ. Math. IHES 26 (1965). [8] Rohlfs, J.; Speh, B. On limit multiplicities of representations with cohomology in the cuspidal spectrum. Duke Math. J. 55 (1987), no. 1, 199–211. [9] Sarnak, P.; Xue, X. Bounds for multiplicities of automorphic representations. Duke Math. J. 64 (1991), no. 1, 207–227. [10] Savin, G. Limit multiplicities of cusp forms. Invent. Math. 95 (1989), no. 1, 149–159. [11] Schneider, P.; Teitelbaum, J. Banach space representations and Iwasawa theory, Israel. J. Math. 127 (2002), 359–380. Introduction Iwasawa Theory Cohomology of Arithmetic Quotients of Symmetric Spaces ABSTRACT Let $\Goo$ be a semisimple real Lie group with unitary dual $\Ghat$. The goal of this note is to produce new upper bounds for the multiplicities with which representations $\pi \in \Ghat$ of cohomological type appear in certain spaces of cusp forms on $\Goo$. <|endoftext|><|startoftext|> Decoherence of Quantum-Enhanced Timing Accuracy Mankei Tsang Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 (Dated: August 6, 2021) Quantum enhancement of optical pulse timing accuracy is investigated in the Heisenberg picture. Effects of optical loss, group-velocity dispersion, and Kerr nonlinearity on the position and momentum of an optical pulse are studied via Heisenberg equations of motion. Using the developed formalism, the impact of decoherence by optical loss on the use of adiabatic soliton control for beating the timing standard quantum limit [Tsang, Phys. Rev. Lett. 97, 023902 (2006)] is analyzed theoretically and numerically. 
The analysis shows that an appreciable enhancement can be achieved using current technology, despite an increase in timing jitter mainly due to the Gordon-Haus effect. The decoherence effect of optical loss on the transmission of quantum-enhanced timing information is also studied, in order to identify situations in which the enhancement is able to survive.

PACS numbers: 42.50.Dv, 42.65.Tg, 42.81.Dp

I. INTRODUCTION

It has been suggested that the use of correlated photons is able to enhance the position accuracy of an optical pulse beyond the standard quantum limit, and the enhancement can be useful for positioning and clock synchronization applications [1]. Generation of two photons with the requisite correlation has been demonstrated experimentally by Kuzucu et al. [2], but in practice it is more desirable to produce as many correlated photons as possible in order to obtain a higher accuracy. To achieve quantum enhancement for a large number of photons, a scheme of adiabatically manipulating optical fiber solitons has recently been proposed [3], opening up a viable route of applying quantum enhancement to practical situations. The analysis in Ref. [3] assumes that the optical fibers are lossless, so the Heisenberg limit [4] can be reached in principle. In reality, however, the quantum noise associated with optical loss increases the soliton timing jitter and limits the achievable enhancement. Compared with the use of solitons for quadrature squeezing [5], the adiabatic soliton control scheme potentially suffers more from decoherence, because the soliton must propagate for a longer distance to satisfy the adiabatic approximation. The effect of loss on a similar scheme of soliton momentum squeezing has been studied by Fini and Hagelstein [6], although they did not study the timing jitter evolution relevant to the scheme in Ref. [3], and did not take into account possible departure from the adiabatic approximation.
In this paper, the decoherence effect of optical loss on the timing accuracy enhancement scheme proposed in Ref. [3] is investigated in depth, in order to evaluate the performance of the scheme in practice. Instead of approaching the problem in the Schrödinger picture like prior work [1, 3, 6, 7], this paper primarily utilizes Heisenberg equations of motion, since they are able to account for dissipation and fluctuation in a more elegant way. For simplicity, scalar solitons, as opposed to the vector solitons studied in Ref. [3], are considered here. The theoretical and numerical analyses show that, despite an increase in timing jitter due to quantum noise and deviation from the adiabatic approximation, an appreciable enhancement can still be achieved using a realistic setup.

The developed formalism is also used to study the propagation of an optical pulse with quantum-enhanced timing accuracy in a lossy, dispersive, and nonlinear medium, such as an optical fiber, in order to identify situations in which the enhancement can still survive. The effect of loss on many correlated photons sent in as many channels has been investigated by Giovannetti et al. [1], but their analysis focuses on a relatively small number of correlated photons and does not include the effects of dispersion and nonlinearity.

This paper is organized as follows: Section II defines the general theoretical framework, and derives the standard quantum limits and Heisenberg limits on the variances of the pulse position and momentum operators. Section III studies the evolution of such operators in the presence of loss, group-velocity dispersion, and Kerr nonlinearity, and determines the effect of dissipation and fluctuation on the position and momentum uncertainties. Section IV investigates theoretically and numerically the impact of optical loss on the adiabatic soliton control scheme using realistic parameters, while Sec.
V studies the decoherence effect on the transmission of the quantum-enhanced timing information in various linear and nonlinear systems.

II. THEORETICAL FRAMEWORK

A. Definition of pulse position and momentum operators

The positive-frequency electric field of a waveguide mode at a certain longitudinal position can be defined as [8]

$$ \hat E^{(+)}(t) = i \int d\omega \left( \frac{\hbar \omega \eta}{4\pi \varepsilon_0 c n^2 S} \right)^{1/2} \hat c(\omega) e^{-i\omega t}, \tag{1} $$

where n is the refractive index, η is the real part of n, S is the transverse area of the waveguide mode, and ĉ(ω) is the photon annihilation operator. The annihilation operator is related to the corresponding creation operator via the commutator [8],

$$ [\hat c(\omega), \hat c^\dagger(\omega')] = \delta(\omega - \omega'). \tag{2} $$

For a pulse with a slowly-varying envelope compared with the optical frequency, the coefficient in front of the annihilation operator can be assumed to be independent of frequency and can be evaluated at the carrier frequency ω0, so that the electric field is proportional to the temporal envelope annihilation operator Â(t),

http://arxiv.org/abs/0704.0663v1

$$ \hat E^{(+)}(t) \propto \hat A(t) e^{-i\omega_0 t}, \tag{3} $$

$$ \hat A(t) \equiv \frac{1}{\sqrt{2\pi}} \int d\omega \, \hat a(\omega) e^{-i\omega t}, \tag{4} $$

$$ \hat a(\omega) \equiv \hat c(\omega + \omega_0). \tag{5} $$

The temporal envelope operator Â(t) and the spectral operator â(ω) evidently also satisfy the following commutation relations with their corresponding creation operators,

$$ [\hat A(t), \hat A^\dagger(t')] = \delta(t - t'), \tag{6} $$

$$ [\hat a(\omega), \hat a^\dagger(\omega')] = \delta(\omega - \omega'). \tag{7} $$

The total photon number operator can be defined as

$$ \hat N \equiv \int dt \, \hat A^\dagger(t) \hat A(t), \tag{8} $$

and the pulse center position operator as [9]

$$ \hat T \equiv \frac{1}{N} \int dt \, t \, \hat A^\dagger(t) \hat A(t), \tag{9} $$

where $N \equiv \langle \hat N \rangle$ is the average photon number. This definition uses 1/N as the normalization coefficient, instead of the inverse photon number operator $\hat N^{-1}$ used by Lai and Haus [10], in order to express the position operator in terms of normally ordered optical field operators that are easier to handle, as well as to avoid the potential problem of applying $\hat N^{-1}$ on the vacuum state.
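As a rough classical illustration of Eqs. (8) and (9), the sketch below replaces the operator Â(t) by a sampled mean-field envelope A(t) and evaluates the photon number and pulse center by numerical integration. The function names, the Gaussian test envelope, and its parameters (`n_mean`, `t0`, `width`) are hypothetical choices made for this example only; they are not taken from the paper.

```python
import math


def pulse_moments(envelope, t_min, t_max, steps=20000):
    """Return (N, T) for a classical envelope A(t).

    N approximates the integral of |A(t)|^2 dt and T approximates
    (1/N) * integral of t |A(t)|^2 dt, the classical analogues of
    Eqs. (8) and (9). A midpoint rule on a uniform grid is used.
    """
    dt = (t_max - t_min) / steps
    n_photons = 0.0
    first_moment = 0.0
    for k in range(steps):
        t = t_min + (k + 0.5) * dt  # midpoint of the k-th cell
        intensity = abs(envelope(t)) ** 2
        n_photons += intensity * dt
        first_moment += t * intensity * dt
    return n_photons, first_moment / n_photons


def gaussian_envelope(t, n_mean=1.0e6, t0=0.3, width=1.0):
    # Hypothetical test pulse, normalised so that integral |A|^2 dt = n_mean.
    norm = math.sqrt(n_mean / (width * math.sqrt(math.pi)))
    return norm * math.exp(-((t - t0) ** 2) / (2.0 * width ** 2))


n_photons, center = pulse_moments(gaussian_envelope, -20.0, 20.0)
print(n_photons, center)
```

For this test pulse the computed center coincides with the offset `t0` of the intensity profile, which is the sense in which the position operator tracks the center of the pulse intensity.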
As long as the photon-number fluctuation is small, the position operator naturally corresponds to the measurement of the center position of the pulse intensity profile. An average longitudinal momentum operator can be similarly defined, Ω̂ ≡ 1 dω ω â†(ω)â(ω) dt †(t) Â(t). (11) If the quantum state is close to a large-photon-number coher- ent state,  can be approximated as +δ Â, with O(δ Â)≪ O(Â). Equations (9) and (11) then become the approximate position and momentum operators defined by Haus and Lai for solitons in a linearized approach [11]. The linearized ex- pressions also describe how they can be accurately measured in practice using balanced homodyne detection [11]. For simplicity, we shall hereafter assume that = 0 and = 0 [9]. In the systems considered in this paper, these two quantities remain constant throughout propagation, if t is regarded as the retarded time in the moving frame of the optical pulse. The commutator between the position and momentum op- erators is [T̂ ,Ω̂] = . (12) By the Heisenberg uncertainty principle, [T̂ ,Ω̂] . (13) B. Derivation of standard quantum limits The standard quantum limits and Heisenberg limits on should be expressed in terms of the pulse width ∆t, defined as dt t2†(t)Â(t) , (14) and the bandwidth ∆ω , dω ω2â†(ω)â(ω) dt †(t) Â(t) . (15) To calculate the standard quantum limit on the position uncer- tainty, consider the expansion dω ω â†â dω ′ ω ′â′†â′ , (16) where we have written â= â(ω) and â′ = â(ω ′) as shorthands. Rearranging the operators, dω ω2â†â dω ′ ωω ′â†â′†ââ′ . (17) The first term on the right-hand side of Eq. (17) is proportional to ∆ω2, while the second term contains a normally ordered cross-spectral density. To derive the standard quantum limit, we shall assume that the cross-spectral density satisfies the factorization condition: â†â′†ââ′ â†â â′†â′ . (18) This condition is always satisfied by any pure or mixed state with only one excited optical mode, such as a coherent state [12, 13]. 
The second term on the right-hand side of Eq. (17) becomes dω ′ ωω ′ â†â′†ââ′ , (19) which is assumed to be zero, as per the convention of this paper. Thus, the variance of Ω̂ is , (20) where the subscript “coh” denotes statistics of coherent fields [12, 13] given by Eq. (18). By virtue of the Heisenberg uncer- tainty principle given by Eq. (13), the standard quantum limit on the position variance is hence 4N∆ω2 . (21) This limit is applicable to any pure or mixed state, and is con- sistent with the one suggested by Giovannetti et al. for Fock states [1]. A very similar derivation of the limit for Fock states and coherent states is also performed by Vaughan et al. [9]. Owing to Fourier duality of position and momentum in the slowly-varying envelope regime, the standard quantum limit on the momentum can be derived in the same way. The vari- ance of T̂ , assuming coherent-field statistics, is , (22) and the standard quantum limit on the momentum variance is 4N∆t2 . (23) C. Derivation of Heisenberg limits To derive the Heisenberg limit on the position uncertainty, one needs an absolute upper bound on the momentum uncer- tainty . Consider the following non-negative quantity proportional to the coherence bandwidth squared, dω ′ (ω −ω ′)2 â†â′†ââ′ ≥ 0. (24) This quantity is non-negative because (ω − ω ′)2 is non- negative and â†â′†ââ′ is also non-negative [13]. It can be rewritten as dω ′ (ω −ω ′)2 â†â′†ââ′ dω ′ (ω −ω ′)2 â†ââ′†â′ , (25) and expanded as dω ′ (ω2 +ω ′2 − 2ωω ′)â†ââ′†â′ dω ω2â†â − 2N2 ≥ 0. (26) Here we shall approximate N̂ with N, and neglect any photon- number fluctuation. This approximation is exact for Fock states, and acceptable for any quantum state with a small photon-number fluctuation, such as a large-photon-number coherent state. We then obtain the following approximate in- equality, ≤ ∆ω2. (27) With the Heisenberg uncertainty principle given by Eq. (13) and the upper bound on given by Eq. 
(27), one can then obtain the Heisenberg limit on the uncertainty of T̂ : 4N2∆ω2 . (28) Equation (28) is again consistent with the Heisenberg limit suggested by Giovannetti et al. [1], although the derivation here shows that it is not only valid for Fock states but also correct to the first order for any quantum state with a small photon-number fluctuation. The Heisenberg limit on is similar, 4N2∆t2 . (29) A more exact derivation of the Heisenberg limits is given in Appendix A, where the inverse photon-number operator N̂−1 is used instead of 1/N in the definitions of T̂ , Ω̂, ∆ω , and ∆t. The difference between the approximate Heisenberg limits derived here and the exact Heisenberg limits in Appendix A is negligible for small photon-number fluctuations. III. OPTICAL PULSE PROPAGATION IN THE HEISENBERG PICTURE The classical nonlinear Schrödinger equation that describes the propagation of pulses in a lossy, dispersive, and nonlinear medium, such as an optical fiber, is given by [14] −κ |A|2A− A, (30) where t is the retarded time coordinate in the frame of the moving pulse, β is the group-velocity dispersion coefficient, κ is the normalized Kerr coefficient, and α is the loss coeffi- cient, all of which may depend on z. The phenomenological quantized version that preserves the commutator between  and † is [5] ∂ 2 −κ†ÂÂ− Â+ iŝ. (31)  ≡ Â(z, t) is the pulse envelope annihilation operator in the Heisenberg picture, and ŝ is the Langevin noise operator, sat- isfying the commutation relation [ŝ(z, t), ŝ†(z′, t ′)] = αδ (z− z′)δ (t − t ′). (32) Rewriting the position and momentum operators in Eqs. (9) and (11) in the Heisenberg picture as T̂ (z) and Ω̂(z) in terms of Â(z, t), differenting them with respect to z, and using Eq. (31), their equations of motion can be derived, dT̂ (z) = β (z)Ω̂(z)+ ŜT (z), (33) dΩ̂(z) = ŜΩ(z), (34) where ŜT and ŜΩ are position and momentum noise operators defined as ŜT (z)≡ dt tŝ†(z, t)Â(z, t)+H. c., (35) ŜΩ(z)≡ dt ŝ†(z, t) Â(z, t)+H. 
c., (36) and H. c. denotes Hermitian conjugate. If the noise reservoir is assumed to be in the vacuum state, the noise operators have the following statistical properties, as shown in Appendix B, ŜT (z) ŜΩ(z) = 0, (37) ŜT (z)ŜT (z α(z)∆t2(z) δ (z− z′), (38) ŜΩ(z)ŜΩ(z α(z)∆ω2(z) δ (z− z′), (39) ŜT (z)ŜΩ(z ′)+ ŜΩ(z)ŜT (z α(z)C(z) δ (z− z′), (40) where C(z) is the pulse chirp factor, defined as C(z)≡ dt †(z, t) Â(z, t) The average position 〈T̂ (z)〉 and average momentum 〈Ω̂(z)〉 are constant and assumed to be zero throughout propagation. The variance of Ω̂ is then Ω̂2(z) Ω̂2(0) α(z′)∆ω2(z′) N(z′) , (42) while the variance of T̂ is more complicated due to the pres- ence of dispersion, T̂ 2(z) T̂ 2(0) T̂ (0)Ω̂(0)+ Ω̂(0)T̂ (0) dz′β (z′) Ω̂2(0) dz′β (z′) α(z′)∆t2(z′) N(z′) dz′β (z′) α(z′′)C(z′′) N(z′′) dz′β (z′) dz′′β (z′′) ∫ z′′ dz′′′ α(z′′′)∆ω2(z′′′) N(z′′′) . (43) Equation (43) is the central result of this paper. It is similar to that derived by Haus for optical solitons using a linearized approach [15], but Eq. (43) is valid for arbitrary loss, arbi- trary dispersion profile β (z), and arbitrary evolution of pulse width ∆t(z), chirp C(z), and bandwidth ∆ω(z), so that it is able to describe the effect of loss on the quantum enhance- ment scheme proposed in Ref. [3]. The first term on the right- hand side of Eq. (43) is the initial quantum fluctuation, while the second and third term on the right-hand side describe the quantum dispersion effect [16]. In an ideal scenario described in Ref. [3], T̂ 2(z) remains constant if the net dispersion ′β (z′) is zero and quantum dispersion is compensated. With loss, however, noise introduces a diffusive jitter given by the fourth term on the right-hand side of Eq. 
(43), T̂ 2(z) α(z′)∆t2(z′) N(z′) , (44) a less well-known chirp-induced jitter given by the fifth term, T̂ 2(z) dz′β (z′) α(z′′)C(z′′) N(z′′) , (45) and also the Gordon-Haus timing jitter [17] given by the sixth term, T̂ 2(z) dz′β (z′) dz′′β (z′′) ∫ z′′ dz′′′ α(z′′′)∆ω2(z′′′) N(z′′′) . (46) In most cases considered here, N ≫ 1, ≪ ∆t2, ≪ ∆ω2, so one can use the classical nonlinear Schrödinger equation, Eq. (30), to predict the evolution of ∆t(z), C(z), and ∆ω(z) accurately. The evolution of T̂ 2(z) can subsequently be calculated analytically or numerically us- ing Eq. (43) and the classical evolution of ∆t(z), C(z), and ∆ω(z), analogous to the linearized approach [11, 15]. It is worth noting that the chirp-induced jitter, Eq. (45), de- pends on the cross-correlation between the position and mo- mentum noise in Eq. (40), so it can be positive as well as neg- ative, but the sum of the three sources of jitter must obviously remain positive. IV. EFFECT OF LOSS ON ADIABATIC SOLITON CONTROL A. Review of the ideal case Consider the scheme proposed in Ref. [3] and depicted in Fig. 1. Assume that the dispersion coefficient of the first fiber β (z) is negative and its magnitude increases along the fiber slowly compared with the soliton period. The classical soliton solution of Eq. (30), assuming adiabatic change in parameters FIG. 1: (Color online) Schematic (not-to-scale) of the adiabatic soli- ton control scheme. An optical pulse is coupled into a dispersion- increasing fiber of length L with a negative dispersion coefficient β , followed by a much shorter dispersion-compensating fiber of length L′ with a positive dispersion coefficient β ′. β (z) and N(z), is [18] A(z, t) = A0(z)sech dz′|A0(z′)|2 , (47) A0(z) = 2τ(z) , τ(z) = 2|β (z)| κN(z) . (48) The adiabatic approximation is satisfied when β (z) dβ (z)/dz dN(z)/dz ≪ Λ, (49) where Λ is the soliton period, Λ(z)≡ τ2(z) |β (z)| . 
(50)

The root-mean-square pulse width ∆t(z) and bandwidth ∆ω(z) then become

$$\Delta t(z) = \frac{\pi}{2\sqrt{3}}\,\tau(z) = \frac{\pi|\beta(z)|}{\sqrt{3}\,\kappa N(z)}, \tag{51}$$

$$\Delta\omega(z) = \frac{1}{\sqrt{3}\,\tau(z)} = \frac{\kappa N(z)}{2\sqrt{3}\,|\beta(z)|}. \tag{52}$$

The bandwidth ∆ω(z) is thus reduced in the first fiber. If the second fiber has a positive dispersion coefficient β′ so that the net dispersion is zero ($\int_0^L dz\,\beta(z) + \beta' L' = 0$), the quantum dispersion effect given by the second and third terms on the right-hand side of Eq. (43) can be eliminated. Furthermore, if β′ has a very large magnitude compared with β(z) so that the second fiber can be very short compared with the first fiber, the effective nonlinearity experienced by the pulse in the second fiber can be neglected, and ∆ω(z) remains essentially constant in the second fiber. In the lossless case, the final timing jitter ⟨T̂²(L+L′)⟩ is therefore the same as the input ⟨T̂²(0)⟩, but ∆ω(L+L′) has been reduced and the standard quantum limit ⟨T̂²(L+L′)⟩_SQL, Eq. (21), is raised. Provided that the initial timing jitter of a laser pulse obeys the coherent-field statistics given by Eq. (22), the final timing jitter is

$$\langle\hat{T}^2(L+L')\rangle = \langle\hat{T}^2(0)\rangle = \frac{\Delta t^2(0)}{N} = \frac{\pi^2\beta^2(0)}{3\kappa^2 N^3}, \tag{53}$$

while the final standard quantum limit is

$$\langle\hat{T}^2(L+L')\rangle_{\mathrm{SQL}} = \frac{1}{4N\Delta\omega^2(L+L')} = \frac{3\beta^2(L)}{\kappa^2 N^3}. \tag{54}$$

A timing jitter squeezing ratio, analogous to the squeezing ratio defined by Haus and Lai [11], can be defined as

$$R \equiv \frac{\langle\hat{T}^2(L+L')\rangle}{\langle\hat{T}^2(L+L')\rangle_{\mathrm{SQL}}} = \frac{\pi^2}{9}\,\frac{\beta^2(0)}{\beta^2(L)}. \tag{55}$$

The factor of π²/9 arises because the initial jitter for a sech pulse shape is slightly higher than the standard quantum limit given by Eq. (21) in terms of the bandwidth. As long as |β(L)| at the end of the first fiber is significantly larger than the initial value, the timing jitter becomes lower than the raised standard quantum limit, R becomes smaller than 1, and quantum enhancement of position accuracy is accomplished. This semiclassical analysis is valid in all practical cases, where N ≫ 1, R ≫ 1/N, ⟨T̂²⟩ ≪ ∆t², ⟨Ω̂²⟩ ≪ ∆ω², and is consistent with the analysis of exact quantum soliton theory in Ref. [3].
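The sech-pulse relations used above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes an envelope A(t) ∝ sech(t/τ), uses the known result that the Fourier transform of sech(t/τ) is proportional to sech(πτω/2), and evaluates the root-mean-square widths on a uniform grid. The function names, the grid parameters, and the dispersion ratio passed to `ideal_squeezing_ratio` are illustrative assumptions.

```python
import numpy as np


def rms_width(x, density):
    """Root-mean-square width of a non-negative density sampled on a uniform grid."""
    weights = density / density.sum()
    mean = (x * weights).sum()
    return np.sqrt(((x - mean) ** 2 * weights).sum())


def sech_widths(tau, span=40.0, n=400001):
    """Return (dt, dw) for an envelope A(t) proportional to sech(t / tau).

    The grid extent `span` and size `n` are arbitrary choices for this sketch.
    """
    t = np.linspace(-span * tau, span * tau, n)
    intensity = 1.0 / np.cosh(t / tau) ** 2
    # The Fourier transform of sech(t / tau) is proportional to sech(pi * tau * w / 2).
    w = np.linspace(-span / tau, span / tau, n)
    spectrum = 1.0 / np.cosh(np.pi * tau * w / 2.0) ** 2
    return rms_width(t, intensity), rms_width(w, spectrum)


def ideal_squeezing_ratio(beta_0, beta_L):
    """Lossless squeezing ratio R = (pi^2 / 9) * beta(0)^2 / beta(L)^2."""
    return (np.pi ** 2 / 9.0) * (beta_0 / beta_L) ** 2


if __name__ == "__main__":
    dt, dw = sech_widths(tau=1.0)
    print("dt / tau          =", dt)
    print("dw * tau          =", dw)
    print("4 dt^2 dw^2       =", 4.0 * dt ** 2 * dw ** 2)
    # Hypothetical example: the dispersion magnitude triples along the first fiber.
    print("ideal R (ratio 3) =", ideal_squeezing_ratio(1.0, 3.0))
```

The product 4∆t²∆ω² returned by the sketch is the same π²/9 prefactor that appears in the squeezing ratio, which is why a sech pulse starts slightly above the standard quantum limit.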
R is related to the quantum enhancement factor γ defined in Ref. [3] by R = 1/γ². The semiclassical analysis is no longer valid when R is close to the Heisenberg limit 1/N, but as the next sections will show, owing to decoherence effects, it is extremely difficult for the enhancement to get anywhere close to the Heisenberg limit.

B. Numerical analysis of a realistic case

To investigate the impact of noise and the validity of the adiabatic approximation in practice, a numerical evaluation of ∆t(z), C(z), ∆ω(z), and ⟨T̂²(z)⟩, using Eqs. (30) and (43) and realistic parameters, is necessary. β(z) is assumed to have the following profile used in Ref. [19],

$$\beta(z) = \frac{-12.75\ \mathrm{ps^2/km}}{1 + (L - z)/L_\beta}. \tag{56}$$

Lβ = 1 km is used here instead of the Lβ = 1/12 km used in Ref. [19], in order to satisfy the adiabatic approximation for a longer pulse in this example. Other fiber parameters are α = 0.4 dB/km, n₂ = 2.6×10⁻¹⁶ cm²/W, A_eff = 30 µm² [19], λ₀ = 1550 nm, ω₀ = 2πc/λ₀, so that κ = ħω₀(ω₀n₂/cA_eff). L is assumed to be 2 km. A dispersion-compensating fiber with β′ = 127.5 ps²/km, α = 0.4 dB/km, n₂ = 2.7×10⁻¹⁶ cm²/W, A_eff = 15 µm² [20], and L′ = 110 m is used in the numerical analysis as the second fiber. The classical nonlinear Schrödinger equation, Eq. (30), is numerically solved using the Fourier split-step method [14]. An initial sech soliton pulse with τ(0) = 1 ps, N(0) = 1.9×10⁷, and an initial energy of 2.4 pJ is assumed.

Figure 2 plots the numerical evolution of pulse intensity and spectrum in the two fibers. As expected, the bandwidth is narrowed in the first fiber and remains approximately constant in the second (z > 2000 m), owing to the latter's relatively short length. Figure 3 plots the evolution of pulse width ∆t(z), chirp C(z), and bandwidth ∆ω(z), compared with the adiabatic approximation, Eqs. (51) and (52). The adiabatic approximation is evidently not exact, and the pulse acquires a chirp due to excess dispersion in the first fiber, leading to slight refocusing in the second fiber. The bandwidth is reduced by a factor of 2.2, as opposed to the ideal factor of 3.6.

FIG. 2: (Color online) Numerical evolution of pulse intensity (top) and spectrum (bottom). The denser plots for z > 2000 m indicate pulse propagation in the second fiber. The color codes are in the same arbitrary units as the heights of the plots. [Axes: |A(z, t)|² versus t (ps) and z (m); |a(z, ω)|² versus ω (THz) and z (m).]

Figure 4 plots the evolution of the diffusive jitter given by Eq. (44), the chirp-induced jitter given by Eq. (45), and the Gordon-Haus jitter given by Eq. (46). It can be seen that although the Gordon-Haus jitter increases much more quickly than the other jitter components in the first fiber, the former drops abruptly in the second fiber (z > 2000 m) due to the opposite dispersion. This kind of Gordon-Haus jitter reduction by dispersion management is well known [21]. The chirp-induced jitter component drops below zero in the second fiber, but as noted before, the total noise jitter remains positive. With the subscripts D, C, and GH labeling the diffusive, chirp-induced, and Gordon-Haus components, the final jitter values are numerically determined to be ⟨T̂²(L+L′)⟩_D = 0.71⟨T̂²(0)⟩, ⟨T̂²(L+L′)⟩_C = −0.93⟨T̂²(0)⟩, and ⟨T̂²(L+L′)⟩_GH = 1.42⟨T̂²(0)⟩, resulting in a total jitter of

$$\langle\hat{T}^2(L+L')\rangle = \langle\hat{T}^2(0)\rangle + \langle\hat{T}^2(L+L')\rangle_{D} + \langle\hat{T}^2(L+L')\rangle_{C} + \langle\hat{T}^2(L+L')\rangle_{GH} = 2.19\,\langle\hat{T}^2(0)\rangle. \tag{57}$$

FIG. 3: (Color online) Evolution of pulse width ∆t(z) (top), chirp C(z) (center), and bandwidth ∆ω (bottom), compared with the adiabatic approximation (dash). Plots of ∆t and ∆ω are normalized with respect to their initial values, respectively. [Horizontal axis: z (m), 0 to 2000.]

The final squeezing ratio is hence

$$R = \frac{\langle\hat{T}^2(L+L')\rangle}{\langle\hat{T}^2(L+L')\rangle_{\mathrm{SQL}}} = \frac{\pi^2}{9}\,\frac{\langle\hat{T}^2(L+L')\rangle}{\langle\hat{T}^2(0)\rangle}\,\frac{N(L+L')}{N(0)}\,\frac{\Delta\omega^2(L+L')}{\Delta\omega^2(0)} = 0.42 = -3.8\ \mathrm{dB}.$$
(58) Despite taking into account the increased timing jitter and the non-ideal bandwidth narrowing, a squeezing ratio of −3.8 dB is predicted by the numerical analysis, suggesting that one should be able to observe the quantum enhancement experi- mentally using current technology. C. Potential improvements As shown in the previous section, the Gordon-Haus effect contributes the largest amount of noise in the soliton control scheme, despite its partial reduction by dispersion manage- ment. Its magnitude at the end of the first fiber can be esti- mated roughly as T̂ 2(L) T̂ 2(L) (αL). (59) As the length of the first fiber must be at least a few times longer than the soliton period Λ for the adiabatic approxima- 0 200 400 600 800 1000 1200 1400 1600 1800 2000 Diffusive Jitter 0 200 400 600 800 1000 1200 1400 1600 1800 2000 Chirp-Induced Jitter 0 200 400 600 800 1000 1200 1400 1600 1800 2000 z (m) Gordon-Haus Jitter FIG. 4: (Color online) Evolution of diffusive jitter (top), chirp- induced jitter (center), and Gordon-Haus jitter (bottom). All plots are normalized with respect to the initial jitter T̂ 2(0) tion to hold and for the bandwidth to be significantly reduced, L/Λ is approximately fixed, and the Gordon-Haus jitter can be reduced only if a figure of merit, FOM ≡ 1 , (60) is enhanced. Since this is a rough order-of-magnitude esti- mate, a representative value of Λ, say at z = L, can be used. The figure of merit suggests that the performance of the soli- ton control scheme can be improved by reducing the pulse width, increasing the overall dispersion coefficient, or reduc- ing the loss coefficient. Reducing the pulse width is the most convenient way of obtaining better enhancement, as the adiabatic bandwidth re- duction can be achieved over a shorter distance with less loss of photons. For example, using τ(0) = 500 fs, L = 1 km, Lβ = 0.3 km, L ′ = 44 m, and otherwise the same parame- ters as in Sec. 
IV B, the squeezing ratio becomes −6.0 dB, while using τ(0) = 200 fs, L = 500 m, Lβ = 1/12 km, and L′ = 16.2 m gives a squeezing ratio of −7.3 dB. The shorter pulse width, however, significantly enhances higher-order dis- persive and nonlinear effects. Raman scattering, in particular, contributes additional quantum noise because of coupling to optical phonons [22]. It is beyond the scope of this paper to investigate these higher-order effects, so a more conservative pulse width of 1 ps is used in the preceding section. A larger overall dispersion coefficient, on the other hand, means that more photons or a higher nonlinearity are required for a soli- ton to form, so the Raman effect may also become more sig- nificant with a larger dispersion coefficient. The Raman effect can be reduced by cooling the fiber and reducing the number of thermal phonons [22], if it becomes a significant problem. Further advance in optical fiber technology should be able to increase the figure of merit by reducing loss, since the spe- cialty fibers assumed in Sec. IV B have a higher loss than usual transmission fibers by a factor of two. Using α = 0.2 dB/km instead of 0.4 dB/km in Sec. IV B, for instance, reduces the squeezing ratio to −4.7 dB. Spectral filtering or frequency- dependent gain [23] provides another way of controlling the Gordon-Haus effect, although it adds another level of com- plexity to the experimental setup, and it is beyond the scope of this paper to investigate how the frequency-dependentdissi- pation or amplification might help the quantum enhancement scheme. Finally, the design of the setup assumed in Sec. IV B is not fully optimized, and further optimization of parame- ters, fiber dispersion profiles, and bandwidth narrowing strat- egy should be able to improve the enhancement. V. 
EFFECT OF LOSS ON THE TRANSMISSION OF QUANTUM-ENHANCED TIMING INFORMATION

Provided that quantum enhancement of pulse position accuracy can be achieved, the information still needs to be transmitted through unavoidably lossy channels. It is hence an important question to ask how loss affects the quantum-enhanced information in optical information transmission systems. Equation (43) governs the general evolution of the timing jitter under the effects of loss, dispersion, and nonlinearity, but in order to estimate the relative magnitude of the decoherence effects and gain more insight into the decoherence processes, in this section Eq. (43) is explicitly solved for various systems and compared with the standard quantum limit.

A. Linear non-dispersive systems

Without dispersion, the timing jitter increases only due to the diffusive component, Eq. (44). An analytic expression for ⟨T̂²(z)⟩ can then be derived from Eq. (43), as ∆t(z) and ∆ω(z) remain constant,

$$\langle\hat{T}^2(z)\rangle = \langle\hat{T}^2(0)\rangle + \frac{\Delta t^2(0)}{N(z)}\left(1 - e^{-\alpha z}\right). \tag{61}$$

If the initial variance obeys coherent-field statistics, that is, ⟨T̂²(0)⟩ = ∆t²(0)/N(0) according to Eq. (22), the subsequent jitter is

$$\langle\hat{T}^2(z)\rangle = \frac{\Delta t^2(0)}{N(z)}, \tag{62}$$

and obeys the same coherent-field statistics but for the reduced photon number N(z). This is consistent with intuition. On the other hand, in the high loss limit (αz ≫ 1), the term ∆t²(0)/N(z) is likely to dominate over the initial jitter ⟨T̂²(0)⟩, so in most cases the position of a significantly attenuated pulse relaxes to coherent-field statistics independent of its initial fluctuation. This justifies the assumption in Sec. IV that a laser pulse exiting a laser cavity has such statistics, regardless of the quantum properties of the pulse inside the cavity.

Equation (61) can be renormalized as

$$R(z) \equiv \frac{\langle\hat{T}^2(z)\rangle}{\langle\hat{T}^2(z)\rangle_{\mathrm{SQL}}} = R(0)\,e^{-\alpha z} + 4\Delta t^2(0)\Delta\omega^2(0)\left(1 - e^{-\alpha z}\right). \tag{63}$$

When ⟨T̂²⟩ ≪ ∆t² and ⟨Ω̂²⟩ ≪ ∆ω², classical theory predicts that 4∆t²∆ω² ≈ 1.
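The relaxation described by Eq. (63) is easy to tabulate. The sketch below is illustrative only: it evaluates R(z) = R(0)e^{−αz} + 4∆t²∆ω²(1 − e^{−αz}) for a chosen time-bandwidth factor, with the loss converted from dB/km to an exponential coefficient. The function names are assumptions, and the example pairs the −3.8 dB squeezing ratio of Sec. IV B with a 0.2 dB/km transmission loss purely for illustration.

```python
import numpy as np


def db_per_km_to_inverse_km(alpha_db):
    """Convert a power loss in dB/km to the exponential coefficient alpha in 1/km."""
    return alpha_db * np.log(10.0) / 10.0


def squeezing_ratio(z, r0, alpha, tbp=1.0):
    """Eq. (63): R(z) for a linear non-dispersive lossy channel.

    z     : propagation distance (same length unit as 1/alpha)
    r0    : initial squeezing ratio R(0)
    alpha : loss coefficient
    tbp   : the factor 4 * dt^2 * dw^2 (about 1 in the classical regime)
    """
    decay = np.exp(-alpha * z)
    return r0 * decay + tbp * (1.0 - decay)


if __name__ == "__main__":
    alpha = db_per_km_to_inverse_km(0.2)  # illustrative transmission-fiber loss
    for z_km in (0.0, 1.0, 5.0, 20.0, 100.0):
        r = squeezing_ratio(z_km, r0=0.42, alpha=alpha)
        print(f"z = {z_km:6.1f} km   R = {r:.3f}   ({10.0 * np.log10(r):+.2f} dB)")
```

With tbp = 1 the ratio rises monotonically from R(0) toward 1, that is, toward the standard quantum limit for the attenuated pulse.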
Equation (63) then suggests that the relative increase in timing jitter is independent of the initial squeezing ratio R(0). This is nevertheless not true in general, as ∆t may depend on both ∆ω and R when the classical theory fails. In Appendix C, the exact dependence of ∆t on ∆ω and R is calculated for a specific multiphoton state with a Gaussian pulse shape called the jointly Gaussian state. The expression 4∆t²∆ω² is given by

$$4\Delta t^2\Delta\omega^2 = \frac{(1 - 1/N)^2}{1 - 1/(NR)}, \tag{64}$$

which results in the following exact expression for an initial jointly Gaussian state,

$$R(z) = R(0)\,e^{-\alpha z} + \frac{(1 - 1/N)^2}{1 - 1/(NR)}\left(1 - e^{-\alpha z}\right). \tag{65}$$

For a large photon number (N ≫ 1) and moderate enhancement (1 ≥ R ≫ 1/N), 4∆t²∆ω² ≈ 1, as classical theory would predict for a Gaussian pulse. In this regime, the quantum-enhanced information is just as sensitive to loss as standard-quantum-limited information. When R gets close to the Heisenberg limit 1/N, however, ∆t∆ω approaches infinity. This is because maximal coincident-frequency correlations are required to achieve the Heisenberg limit [1], but heuristically speaking, if the photons have exactly the same momentum, they must have infinite uncertainties in their relative positions, leading to an infinite pulse width ∆t. Owing to the abrupt increase in 4∆t²∆ω² when R approaches 1/N, the quantum enhancement becomes much more sensitive to loss. In the Heisenberg limit of R → 1/N, ∆t → ∞, any loss completely destroys the timing accuracy and leads to an infinite jitter, according to Eq. (65).

B. Linear dispersive systems

If the system is lossy, dispersive, but linear, it is not difficult to show that

$$\Delta t^2(z) = \Delta t^2(0) + C(0)\int_0^z dz'\,\beta(z') + \Delta\omega^2(0)\left[\int_0^z dz'\,\beta(z')\right]^2, \tag{66}$$

$$C(z) = C(0) + 2\Delta\omega^2(0)\int_0^z dz'\,\beta(z'), \tag{67}$$

$$\Delta\omega^2(z) = \Delta\omega^2(0). \tag{68}$$

The following result can then be obtained from Eq. (43) after some algebra,

$$\langle\hat{T}^2(z)\rangle = \langle\hat{T}^2(0)\rangle + \langle\hat{T}(0)\hat{\Omega}(0) + \hat{\Omega}(0)\hat{T}(0)\rangle\int_0^z dz'\,\beta(z') + \langle\hat{\Omega}^2(0)\rangle\left[\int_0^z dz'\,\beta(z')\right]^2 + \frac{\Delta t^2(z)}{N(z)}\left(1 - e^{-\alpha z}\right). \tag{69}$$
This result is similar to that in the previous section, except for the presence of quantum dispersion and the dispersive spread of the pulse width ∆t(z) that leads to increased jitter. With initially coherent-field statistics, ⟨T̂²(0)⟩ and ⟨Ω̂²(0)⟩ are given by Eqs. (22) and (20), respectively, while by similar arguments, the coherent-field statistics for ⟨T̂Ω̂ + Ω̂T̂⟩ are

$$\langle\hat{T}\hat{\Omega} + \hat{\Omega}\hat{T}\rangle = \frac{C}{N}. \tag{70}$$

This leads to the following position variance for a pulse with initially coherent-field statistics,

$$\langle\hat{T}^2(z)\rangle = \frac{\Delta t^2(z)}{N(z)}, \tag{71}$$

which still maintains the coherent-field statistics for the dispersed pulse width and the reduced photon number. In the high loss limit (αz ≫ 1), the coherent-field statistics are again approached regardless of the initial conditions.

For an initial jointly Gaussian quantum state, on the other hand, the normalized version of Eq. (69) is

$$R(z) = \left[R(0) + \frac{4\zeta^2}{R(0)}\right]e^{-\alpha z} + \left[4\Delta t^2(0)\Delta\omega^2(0) + 4\zeta^2\right]\left(1 - e^{-\alpha z}\right), \tag{72}$$

where ζ is the normalized effective propagation distance,

$$\zeta \equiv \Delta\omega^2(0)\int_0^z dz'\,\beta(z'), \tag{73}$$

and 4∆t²(0)∆ω²(0) is given by Eq. (64) evaluated at z = 0. As long as the loss is moderate so that e^{−αz} ≫ 1 − e^{−αz}, quantum dispersion, given by the term proportional to ζ²/R(0), becomes the dominant effect and overwhelms the initial enhancement when ζ exceeds R(0)/2.

If the net dispersion $\int_0^z dz'\,\beta(z')$ is zero, both quantum and classical dispersion are eliminated, and the jitter growth becomes identical to that in a non-dispersive and linear system given by Eq. (61).

C. Soliton-like systems

The previous sections show that coherent-field statistics are maintained in a linear system, but as Sec. IV clearly shows, non-trivial statistics can arise from the quantum dynamics of a nonlinear system. The complex evolution of ∆t(z), C(z), and ∆ω(z) in general prevents one from solving Eq. (43) explicitly, except for special cases such as solitons.
If the dispersion is constant and the pulse propagates in the fiber as a soliton, C(z) is zero, while ∆t(z) and ∆ω(z) can be regarded as constant if ≪∆t2 and ≪∆ω2 through- out propagation. Equation (43) can then be solved explicitly, T̂ 2(z) T̂ 2(0) Ω̂2(0) β 2z2 + ∆t2(0) (eαz − 1) 2β 2∆ω2(0) eαz − 1 , (74) where T̂ (0)Ω̂(0)+ Ω̂(0)T̂ (0) is asssumed to be zero for simplicity. If 4 T̂ 2(0) Ω̂2(0) = (π2/9)[1/N2(0)] is also assumed for a soliton pulse for simplicity, Eq. (74) can be normalized to give R(z) = R(0)e−αz + e−αz + (eαz − 1) eαz − 1 . (75) In the low loss regime with αΛ ≪ 1 and αz ≪ 1, Eq. (75) can be further simplified, R(z)≈ R(0)+ (αz). (76) Quantum dispersion is again the dominant effect in this regime, while decoherence effects are much smaller, by a fac- tor of αz approximately. Even if the net dispersion is zero and quantum dispersion is compensated, the Gordon-Haus effect cannot be completely eliminated by dispersion management in the presence of non- linearity and may become significant, as the numerical anal- ysis in Sec. IV B shows. An order-of-magnitude estimate of Gordon-Haus jitter can be performed by considering soliton propagation in a constant negative dispersion fiber, just as in the previous case, followed by a dispersion-compensating fiber of length L′ with positive dispersion coefficient β ′. If L′ is short, the effective nonlinearity experienced by the pulse in the second fiber can be neglected, and ∆ω(z) can be re- garded as constant. Assuming that β L+β ′L′ = 0, the integral in Eq. (46) can be solved to give the Gordon-Haus jitter, T̂ 2(L+L′) ≈ α∆ω 6N(0) β 2L2(L+L′) ≈ α∆ω 6N(0) β 2L3. (77) The normalized contribution to the squeezing ratio is therefore T̂ 2(L+L′) T̂ 2(L+L′) (αL). (78) Compared with the Gordon-Haus jitter at the end of the first fiber given by the last term of Eq. (76), dispersion manage- ment cuts the jitter by half, but the expression maintains its functional dependence on the parameters of the first fiber. 
This estimate also justifies the use of Eq. (59) to estimate the Gordon-Haus jitter at the end of the two fibers in Sec. IV C. To minimize the impact of Gordon-Haus jitter on the quantum- enhanced timing accuracy in a dispersion-managed soliton system, the condition L3 ≪ 54 R (79) is required. VI. CONCLUSION In conclusion, the decoherence effect by optical loss on adiabatic soliton control and on the transmission of quantum- enhanced timing information has been extensively studied. It is found that an appreciable enhancement can still be achieved by the soliton scheme using current technology, despite an in- crease of timing jitter due to the presence of loss. It is also found that the quantum-enhanced timing accuracy should be much lower than the Heisenberg-limited accuracy to avoid in- creased sensitivity to photon loss during transmission, and the net dispersion in the transmission system should be minimized in order to reduce quantum dispersion and the Gordon-Haus effect. Although the most important pulse propagation effects have been considered in this analysis, higher-order effects, such as third-order dispersion, self-steepening, and Raman scattering [14] might provide further adverse impact on the quantum en- hancement if the optical pulse is ultrashort. In particular, the inelastic Raman scattering process is expected to be a signifi- cant source of decoherence for ultrashort pulses [22]. It is be- yond the scope of this paper to investigate these higher-order effects, but they should be of minor importance for picosec- ond pulses and the propagation distances considered in this paper. Finally, it is worth noting that while this paper focuses on optical pulses, the developed formalism is equally valid for describing the transverse position and momentum of optical beams [24] and the center-of-mass variables of Bose-Einstein condensates [9]. 
Decoherence by loss of particles in those systems can be studied using the formalism developed in this paper and parameters specific to those systems. VII. ACKNOWLEDGMENTS This work is financially supported by the DARPA Center for Optofluidic Integration and the National Science Founda- tion through the Center for the Science and Engineering of Materials (DMR-0520565). APPENDIX A: DERIVATION OF EXACT HEISENBERG LIMITS An exact Heisenberg limit can be derived if the inverse photon-number operator N̂−1 is used instead of 1/N in the definitions of T̂ , Ω̂, ∆t, and ∆ω in Eqs. (9), (11), (14), and (15), just as in Refs. [9] and [10], T̂ ′ ≡ N̂−1 dt t†Â, (A1) Ω̂′ ≡ N̂−1 dω ω â†â, (A2) ∆t ′ ≡ dt t2† , (A3) ∆ω ′ ≡ dω ω2â†â . (A4) These operators are well defined as long as the quantum state has zero vacuum-state component (〈0|ρ̂|0〉 = 0). Starting from the Heisenberg uncertainty principle for T̂ ′ and Ω̂′, T̂ ′2 , (A5) and the inequality dω ′ (ω −ω ′)2â†â′†ââ′ ≥ 0, (A6) one can obtain the exact inequality ≤ ∆ω ′2, (A7) and the exact Heisenberg limit for the new position operator, T̂ ′2 4∆ω ′2 . (A8) The difference between Eqs. (28) and (A8) is negligible for small photon-number fluctuations. The exact Heisenberg limit is similar. APPENDIX B: NOISE STATISTICS In this section the expression ŜT (z)ŜT (z in Eq. (38) is calculated. The derivations of ŜΩ(z)ŜΩ(z in Eq. (39) and ŜT (z)ŜΩ(z ′)+ ŜΩ(z)ŜT (z in Eq. (40) are similar. Substi- tuting Eq. (35) into Eq. (38) gives ŜT (z)ŜT (z dt ′ tt ′ ŝ†Âŝ′†Â′ ŝ†ÂÂ′†ŝ′ †ŝÂ′†ŝ′ †ŝŝ′†Â′ , (B1) where N = N(z), N′ = N(z′), ŝ = ŝ(z, t),  = Â(z, t), ŝ′ = ŝ(z′, t ′), and Â′ = Â(z′, t ′). If the noise reservoir is in the vac- uum state, ŝ|0reservoir〉= 〈0reservoir|ŝ† = 0, so only the last term in Eq. (B1) is non-zero, ŜT (z)ŜT (z dt ′ tt ′ †ŝŝ′†Â′ dt ′ tt ′ †Â′ αδ (z− z′)δ (t − t ′)+ †ŝ′† ŝÂ′ δ (z− z′) dt ′ tt ′ †ŝ′†ŝÂ′ . (B2) The first term on the right-hand side of Eq. 
(B2) is the desired result, while the second term can be rewritten as dt ′ tt ′ †ŝ′†ŝÂ′ dt ′ tt ′ †, ŝ′† ŝ, Â′ . (B3) If the system is linear, the commutator between ŝ and  is always zero [13], but because ŝ does not commute with † and  is coupled to † by the nonlinear term in Eq. (31), ŝ may fail to commute with Â. That said, it can be argued that the optical field operator must always commute with future noise operators due to causality and the infinitesimally short memory of ŝ, †, ŝ′† = 0 if z < z′, (B4) ŝ, Â′ = 0 if z > z′, (B5) so Eq. (B3) can be non-zero only at z = z′. The commutator between ŝ and  at z = z′ due to the parametric coupling of  and † can be estimated by a perturbative technique. Consider an integral form of Eq. (31) with the nonlinear term and the Langevin noise term only, Â(z+∆z) = Â(z)+ ∫ z+∆z iκ†(z′)Â(z′)Â(z′)+ ŝ(z′) and †(z′) given by the Hermitian conjugate of Eq. (B6), †(z′) = †(z)+ −iκ†(z′′)†(z′′)Â(z′′)+ ŝ†(z′′) The commutator between ŝ and  at z+∆z becomes [ŝ(z+∆z), Â(z+∆z)] ∫ z+∆z ŝ(z+∆z), †(z′) Â(z′)Â(z′). (B8) ŝ(z+∆z) commutes with Â(z′) because z+∆z > z′, while it fails to commute with †(z′) because †(z′) given by Eq. (B7) depends explicitly on ŝ†. Thus, in the leading order of ∆z, [ŝ(z+∆z), Â(z+∆z)] ∫ z+∆z ŝ(z+∆z), ŝ†(z′′) Â(z′)Â(z′), (B9) which approaches 0 in the limit of ∆z → 0. Hence ŝ commutes with  at z= z′, and the position noise is given only by the first term on the right-hand side of Eq. (B2), resulting in Eq. (38). APPENDIX C: THE JOINTLY GAUSSIAN STATE A Fock state can be expressed as [13, 25] dω1 . . . dωN φ(ω1, . . . ,ωN)|ω1, . . . ,ωN〉, dt1 . . . dtN ψ(t1, . . . , tN)|t1, . . . , tN〉, (C1) where the spectral and temporal eigenstates are given by |ω1, . . . ,ωN〉 ≡ â†(ω1) . . . â†(ωN)|0〉, (C2) |t1, . . . , tN〉 ≡ †(t1) . . .  †(tN)|0〉. (C3) Theses states are eigenstates of the following operators rele- vant to our purpose, Ω̂|ω1, . . . ,ωN〉= |ω1, . . . ,ωN〉, T̂ |t1, . . . , tN〉= |t1, . . . , tN〉, (C5) dω ω2â†â|ω1, . . 
. ,ωN〉= |ω1, . . . ,ωN〉, dt t2†Â|t1, . . . , tN〉= |t1, . . . , tN〉. (C7) φ(ω1, . . . ,ωN) is the spectral multiphoton probability ampli- tude, and it is related to the temporal probability amplitude ψ(t1, . . . , tN) by the N-dimensional Fourier transform in the slowly-varying envelope regime. Both amplitudes should also satisfy normalization and boson symmetry. To study temporal quantum enhancement, it is convenient to define the probabil- ity amplitude as a jointly Gaussian function [25], φ(ω1, . . . ,ωN) =C exp , (C8) ψ(t1, . . . , tN) =C′ exp −N2B2 , (C9) where B and b are arbitrary and real constants, and C and C′ are normalization constants. Explicit expressions for , ∆ω2, and ∆t2 can be obtained using Eqs. (C4)-(C7) and Appendix B of Ref. [25], = B2, (C10) 4N2B2 , (C11) ∆ω2 = B2 + b2, (C12) ∆t2 = 4N2B2 . (C13) In the limit of b → 0, reaches the Heisenberg limit, 4N2∆ω2 , (C14) and the quantum state can be written as a state of photons with maximal coincident-frequency correlations, |N〉 ∝ dω exp |ω , . . . ,ω〉. (C15) On the other hand, when B2 = b2/N, is at the standard quantum limit, 4N∆ω2 , (C16) the quantum state has only one excited Gaussian mode [25], |N〉 ∝ dω1 . . . |ω1, . . . ,ωN〉 dω exp â†(ω) |0〉, (C17) and therefore also satisfies the coherent-field statistics [12, 13]. These limits and the corresponding quantum states are consistent with those suggested in Ref. [1]. With Eqs. (C12) and (C13), the pulse width ∆t can be determined explicitly in terms of ∆ω and the squeezing ratio R = ∆ω2/(NB2), ∆t2 = (1− 1/N)2 1− 1/(NR) . (C18) [1] V. Giovannetti, S. Lloyd, and L. Maccone, Nature (London) 412, 417 (2001); V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. A 65, 022309 (2002). [2] V. Giovannetti, L. Maccone, J. H. Shapiro, and F. N. C. Wong, Phys. Rev. Lett. 88, 183602 (2002); O. Kuzucu, M. Fiorentino, M. A. Albota, F. N. C. Wong, and F. X. Kärtner, Phys. Rev. Lett. 94, 083601 (2005). [3] M. Tsang, Phys. Rev. Lett. 97, 023902 (2006). [4] V. 
Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 96, 010401 (2006); V. Giovannetti, S. Lloyd, and L. Maccone, Sci- ence 306, 1330 (2004). [5] S. J. Carter, P. D. Drummond, M. D. Reid, and R. M. Shelby, Phys. Rev. Lett. 58, 1841 (1987); P. D. Drummond and S. J. Carter, J. Opt. Soc. Am. B 4, 1565 (1987); M. J. Potasek and B. Yurke, Phys. Rev. A 35, 3974 (1987); M. J. Potasek and B. Yurke, ibid. 38, 1335 (1988); H. A. Haus, Electromagnetic Noise and Quantum Optical Measurements (Springer, Berlin, 2000). [6] J. M. Fini and P. L. Hagelstein, Phys. Rev. A 66, 033818 (2002). [7] P. L. Hagelstein, Phys. Rev. A 54, 2426 (1996); J. M. Fini, P. L. Hagelstein, and H. A. Haus, ibid. 60, 2442 (1999). [8] B. Huttner and S. M. Barnett, Phys. Rev. A 46, 4306 (1992); R. Matloob, R. Loudon, S. M. Barnett, and J. Jeffers, ibid. 52, 4823 (1995). [9] T. Vaughan, P. Drummond, and G. Leuchs, Phys. Rev. A 75, 033617 (2007). [10] Y. Lai and H. A. Haus, Phys. Rev. A 40, 844 (1989). [11] H. A. Haus and Y. Lai, J. Opt. Soc. Am. B 7, 386 (1990). [12] U. M. Titulaer and R. J. Glauber, Phys. Rev. 140, 676 (1965); U. M. Titulaer and R. J. Glauber, ibid. 145 1041 (1966). [13] L. Mandel and E. Wolf, Optical Coherence and Quantum Op- tics (Cambridge University Press, Cambridge, 1995). [14] G. P. Agrawal, Nonlinear Fiber Optics (Academic Press, San Diego, 2001). [15] H. A. Haus, J. Opt. Soc. Am. B 8, 1122 (1991). [16] Y. Lai and H. A. Haus, Phys. Rev. A 40, 854 (1989). [17] J. P. Gordon and H. A. Haus, Opt. Lett. 11, 665 (1986). [18] H. H. Kuehl, Opt. Lett. 5, 709 (1988). [19] V. A. Bogatyrev, M. M. Bubnov, E. M. Dianov, A. S. Kurkov, P. V. Mamyshev, A. M. Prokhorov, S. D. Rumyantsev, V. A. Semenov, S. L. Semenov, A. A. Sysoliatin, S. V. Chernikov, A. N. Gur’yanov, G. G. Devyatykh, and S. I. Miroshnichenko, J. Lightwave Technol. 9, 561 (1991). [20] L. Grüner-Nielsen, M. Wandel, P. Kristensen, C. Jørgensen, L. V. Jørgensen, B. Edvold, B. Pálsdóttir, and D. Jakobsen, J. Lightwave Technol. 
23, 3566 (2005). [21] N. J. Smith, W. Forysiak, and N. J. Doran, Electron. Lett. 32, 2085 (1996). [22] F. X. Kärtner, D. J. Dougherty, H. A. Haus, and E. P. Ippen, J. Opt. Soc. Am. B 11, 1267 (1994); J. F. Corney and P. D. Drummond, ibid. 18, 153 (2001). [23] A. Mecozzi, J. D. Moores, H. A. Haus, and Y. Lai, Opt. Lett. 16, 1841 (1991); Y. Kodama and A. Hasegawa, Opt. Lett. 17, 31 (1992). [24] S. M. Barnett, C. Fabre, and A. Maı̂tre, Eur. Phys. J. D 22, 513 (2003); N. Treps, U. Andersen, B. Buchler, P. K. Lam, A. Maı̂tre, H.-A. Bachor, and C. Fabre, Phys. Rev. Lett. 88, 203601 (2002); N. Treps, N. Grosse, W. P. Bowen, C. Fabre, H.-A. Ba- chor, P. K. Lam, Science 301, 940 (2003). [25] M. Tsang, “Relationship between resolution enhancement and multiphoton absorption rate in quantum lithography,” e-print quant-ph/0607114 (to appear in Phys. Rev. A). http://arxiv.org/abs/quant-ph/0607114 ABSTRACT Quantum enhancement of optical pulse timing accuracy is investigated in the Heisenberg picture. Effects of optical loss, group-velocity dispersion, and Kerr nonlinearity on the position and momentum of an optical pulse are studied via Heisenberg equations of motion. Using the developed formalism, the impact of decoherence by optical loss on the use of adiabatic soliton control for beating the timing standard quantum limit [Tsang, Phys. Rev. Lett. 97, 023902 (2006)] is analyzed theoretically and numerically. The analysis shows that an appreciable enhancement can be achieved using current technology, despite an increase in timing jitter mainly due to the Gordon-Haus effect. The decoherence effect of optical loss on the transmission of quantum-enhanced timing information is also studied, in order to identify situations in which the enhancement is able to survive. <|endoftext|><|startoftext|> Stock market return distributions: from past to present S. Drożdż1,2, M. Forczek1, J. Kwapień1, P. Oświȩcimka1, R. 
Rak2 1Institute of Nuclear Physics, Polish Academy of Sciences, PL–31-342 Kraków, Poland 2 Institute of Physics, University of Rzeszów, PL–35-959 Rzeszów, Poland Abstract We show that recent stock market fluctuations are characterized by the cumu- lative distributions whose tails on short, minute time scales exhibit power scaling with the scaling index α > 3 and this index tends to increase quickly with de- creasing sampling frequency. Our study is based on high-frequency recordings of the S&P500, DAX and WIG20 indices over the interval May 2004 - May 2006. Our findings suggest that dynamics of the contemporary market may differ from the one observed in the past. This effect indicates a constantly increasing efficiency of world markets. Key words: Financial markets, Inverse cubic power law, q-Gaussian distributions, Multifractality PACS: 89.20.-a, 89.65.Gh, 89.75.-k The so-called financial stylized facts are among the central issues of econo- physics research. Much effort has been devoted on both the empirical and the theoretical level to such phenomena like fat-tailed distributions of financial fluctuations, persistent correlations in volatility, multifractal properties of re- turns etc. Specifically, the interest in the return distributions can be traced back to an early work of Mandelbrot [1] in which he proposed a Lévy process as the one governing the logarithmic price fluctuations. Much later this issue was revisited in [2] based on data with much better statistics and a new model of exponentially-truncated Lévy flights was introduced. Then, in an extensive systematic study of the largest American stock markets [3] the distribution tails for both the prices and the indices were shown to be power-law with the scaling exponent α ≃ 3. 
The most striking outcome of that study was that, despite the fact that the tails were well outside the Lévy-stable regime (α ≤ 2), they were apparently stable under time aggregation up to several days for indices and up to a month for stocks. The existence of return distributions with scaling tails was also reported in other markets, e.g. London [4], Frankfurt [5], Paris [6], Oslo [7], Tokyo [3], and Hong Kong [3,8], but sometimes with a slightly different value of the scaling index. This empirical property of price and index returns led to the formulation of the so-called inverse cubic power law [3], which was soon followed by an attempt at formulating its theoretical foundation [6] (see also [4]).

Preprint submitted to Elsevier, 11 October 2018. http://arxiv.org/abs/0704.0664v1

A subsequent related study [9] revealed that, contrary to the earlier outcomes of [3], the tail shape of the return distributions might no longer be so stable along the time axis. After comparison of the results obtained from the American stock market data in the years 1994-95 and in 1998-99, it turned out that in the more recent data the scaling tails with α ≃ 3 for individual companies are preserved only up to time scales (sampling intervals) ∆t of less than one hour instead of one month. This earlier crossover for the 1998-99 data can easily be seen in Fig. 1. This result was obtained by extending our previous analysis [9] over a set of the 1000 largest American companies [10] in order to enable a more direct comparison with the outcomes of ref. [3], which were based on the same number of stocks. The inverse cubic scaling is still evident in Fig. 1 for short time scales up to ∆t = 4 min, but the scaling index starts rising already for data with ∆t = 16 min, and for longer time scales the tail behaviour is clearly governed by the Central Limit Theorem.
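The kind of measurement discussed above, a tail index read off from the cumulative distribution of normalized returns at several sampling intervals, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names, the choice of the top 10% of absolute returns as the tail, and the synthetic Pareto sample used as a stand-in for market data are all assumptions.

```python
import numpy as np


def normalized_returns(prices, step):
    """Log-returns sampled every `step` ticks, standardized to zero mean and unit variance."""
    logp = np.log(np.asarray(prices, dtype=float))
    r = logp[step::step] - logp[:-step:step]
    return (r - r.mean()) / r.std()


def tail_index(x, tail_fraction=0.1):
    """Estimate alpha in P(|X| > x) ~ x^(-alpha) by log-log regression on the empirical tail.

    `tail_fraction` (an arbitrary choice here) sets how many of the largest
    absolute values enter the fit.
    """
    a = np.sort(np.abs(np.asarray(x, dtype=float)))
    n = a.size
    k = max(int(n * tail_fraction), 10)
    tail = a[-k:]
    # Rank-based empirical complementary c.d.f.: k/n at the smallest tail point, 1/n at the largest.
    ccdf = np.arange(k, 0, -1) / n
    slope, _ = np.polyfit(np.log(tail), np.log(ccdf), 1)
    return -slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for real data: classical Pareto samples with P(X > x) = x^(-3), x >= 1.
    sample = rng.pareto(3.0, 200_000) + 1.0
    print("estimated alpha (Pareto, alpha = 3):", tail_index(sample))
    # Synthetic random-walk "price" with Gaussian increments, aggregated at two time scales.
    prices = np.exp(np.cumsum(1e-3 * rng.standard_normal(100_000)))
    for step in (1, 16):
        print("step", step, "alpha:", tail_index(normalized_returns(prices, step)))
```

A log-log regression of this kind is sensitive to where the tail is taken to begin, so in practice the fitted range should be reported together with the estimate.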
The difference between the results for the same market but for different time intervals might suggest that the scaling behaviour is not stable and depends on some crucial factors such as, for example, the speed of information processing, which constantly increases from past to present. This possibility can be examined further by considering even more recent data from the American market. First, we look at the S&P500 index, which was already the subject of an analysis in [3]. Our data is a time series of 1 min returns covering the period May 2004 - May 2006 (in [3] the period was 1984-1996). The c.d.f. for this data is presented in Fig. 2(a) for several time scales up to 120 min. The most interesting feature is the lack of inverse cubic scaling even for the shortest one-minute returns: in this case the actual scaling index is slightly above 4, and it systematically increases with decreasing sampling frequency (see Table 1). We leave open the question of what underlies the evident absence of the α ≃ 3 type of scaling: either the dynamics of S&P500 returns has changed so significantly since the first half of the 1990s that the inverse cubic scaling no longer exists, or it still exists but is restricted to time scales shorter than 1 minute.

That our observation is more universal and can be made for other markets as well can be inferred from Fig. 2(b) and Fig. 2(c), which present the c.d.f. for the German index DAX and for the Polish index WIG20, respectively, for the same period of time. While the returns of DAX did not exactly comply with the inverse cubic scaling even in the period 1998-99 [9], those of WIG20 indeed used to display this kind of behaviour in the past, as documented in [11]. However, nowadays WIG20 also develops much thinner tails, with α > 4 for ∆t = 1 min (Table 1).

[Figure 1. Horizontal axis: abs. norm. returns. Legend: 1 min, 4 min, 16 min, 60 min, 120 min, 240 min, 1 day, 2 days, 4 days, 8 days (1000 stocks); reference lines: α = 3.0, Gaussian.]

Fig. 1. Cumulative distributions of normalized stock returns averaged over 1000 highly-capitalized American companies in the time interval Dec 1997 - Dec 1999 for several different time scales from 1 min to 8 days. The Gaussian distribution and the inverse cubic scaling are also shown for comparison. The best-fit power index α calculated by means of log-log regression assumes the following values: 3.08 ± 0.05 (∆t = 1 min), 3.34 ± 0.05 (4 min), 4.00 ± 0.04 (16 min), 4.60 ± 0.05 (60 min), 4.95 ± 0.06 (120 min), 4.81 ± 0.15 (240 min), 5.90 ± 0.08 (1 day), 7.18 ± 0.28 (2 days), 9.17 ± 0.22 (4 days), and 8.32 ± 0.40 (8 days).

∆t        1 min         4 min         16 min        32 min        60 min
S&P500    4.12 ± 0.12   4.21 ± 0.08   5.18 ± 0.21   5.53 ± 0.20   6.10 ± 0.35
DAX       3.56 ± 0.12   3.76 ± 0.05   4.44 ± 0.16   5.14 ± 0.26   5.16 ± 0.66
WIG20     4.28 ± 0.16   5.24 ± 0.27   5.81 ± 0.78   5.61 ± 0.45   6.30 ± 0.72

Table 1. Values of the scaling index α obtained with a log-log regression fit for three different market indices (S&P500, DAX, and WIG20) from the interval May 2004 - May 2006.

Recent studies [11,12] showed that over a wide range of returns the financial return distributions can be approximated by a family of q-Gaussians [13] with the parameter q depending on the sampling interval ∆t. The q-Gaussian distributions follow naturally from nonextensive statistical mechanics [13,12], and their c.d.f. can be written as [11]

$$ P(X > x) = N_q \left( \frac{\sqrt{\pi}\,\Gamma\!\left(\tfrac{1}{2}(3-q)\beta\right)}{2\,\Gamma(\beta)\sqrt{B_q(q-1)}} \pm (x-\bar{\mu}_q)\; {}_2F_1(\alpha,\beta;\gamma;\delta) \right), \qquad (1) $$

where $\alpha = 1/2$, $\beta = 1/(q-1)$, $\gamma = 3/2$, $\delta = -B_q(q-1)(\bar{\mu}_q - x)^2$,

$$ {}_2F_1(\alpha,\beta;\gamma;\delta) = \sum_{k=0}^{\infty} \frac{\delta^k (\alpha)_k (\beta)_k}{k!\,(\gamma)_k} $$

is the Gauss hypergeometric function, and $N_q$, $\bar{\mu}_q$ are, respectively, the normalization factor and the q-mean of the q-Gaussian p.d.f. [13].

Fig. 2. Cumulative distributions of index returns for the American S&P500 (top row), the German DAX (middle row) and the Polish WIG20 (bottom row) recorded over the interval May 2004 - May 2006. Data points for sampling intervals from 1 min to 120 min are denoted by different symbols. (Left column) Distributions for normalized returns are shown together with the Gaussian distribution and lines corresponding to the inverse cubic scaling and, approximately, the actual scaling. (Right column) Experimental distributions are best-fitted by q-Gaussians with a free parameter q (fitted values displayed) separately for negative and positive returns. Note the small asymmetry between left and right tails.

Fig. 2(d)-2(f) exhibit cumulative distributions of returns for the same three indices with the corresponding best fits in terms of Eq. (1). The theoretical curves are in satisfactory agreement with the data for all the considered values of ∆t. It is noteworthy that, consistently with the left-hand side panels of Fig. 2, the largest values of q are well below q = 3/2, which corresponds to the inverse cubic scaling, and decrease with decreasing sampling frequency towards the classic Gaussian distribution with q = 1.

Fig. 3. Singularity spectra for 1 min index returns: S&P500 (top), DAX (middle), and WIG20 (bottom). In each case, the actual data (solid line) is accompanied by its randomized version averaged over 10 independent realizations (full symbols).

Finally, we look at the singularity spectra f(α) of the time series under study (for numerical details of the method see e.g. [15]). Two cases are considered: the original data, comprising the full variety of nonlinear temporal correlations (solid lines in Fig. 3), and the randomized data, in which all the correlations are removed by shuffling the data points (full symbols in Fig. 3). Since both the nonlinear dependencies and the fat-tailed distributions can be potential sources of multifractality, the data shuffling can give some information on how rich the multiscaling behaviour due to each of these sources is [14,15].
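Two ingredients of the discussion above are easy to make concrete. First, a q-Gaussian p.d.f. of the standard form $[1 + B_q(q-1)(x-\bar{\mu}_q)^2]^{-1/(q-1)}$ decays as $|x|^{-2/(q-1)}$, so its c.d.f. tail exponent is $2/(q-1) - 1$, which equals 3 exactly at q = 3/2. Second, the randomized series is obtained by a random permutation, which keeps the distribution of values and destroys their temporal ordering. The sketch below illustrates both; the function names are hypothetical and this is not the authors' code.

```python
import random


def tail_index_from_q(q):
    """C.d.f. tail exponent of a q-Gaussian with 1 < q < 3: 2/(q-1) - 1."""
    return 2.0 / (q - 1.0) - 1.0


def q_from_tail_index(alpha):
    """Inverse relation: the q whose q-Gaussian has c.d.f. tail exponent alpha."""
    return (alpha + 3.0) / (alpha + 1.0)


def shuffled_surrogate(series, seed=None):
    """Return a randomly permuted copy of `series`.

    The set of values (hence the return distribution) is unchanged,
    while all temporal correlations are removed.
    """
    rng = random.Random(seed)
    out = list(series)
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    print(tail_index_from_q(1.5))    # inverse cubic law
    print(q_from_tail_index(4.12))   # q matching a tail exponent of 4.12
    print(shuffled_surrogate([0.1, -0.3, 0.2, 0.0, -0.1], seed=7))
```

Averaging a statistic over several surrogates generated with different seeds mirrors the averaging over independent realizations mentioned in the caption of Fig. 3.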
In the present context the most interesting feature is that the Gaussian distribution of uncorrelated data is associated with a monofractal f(α) spectrum. Fig. 3 shows the singularity spectra for S&P500, DAX and WIG20 (top to bottom) together with their randomized-data counterparts. In all three cases, the original data clearly represent multifractal processes, with DAX and S&P500 showing richer multifractality than WIG20. This picture changes completely if we look at the randomized data: we cannot detect any significant trace of multifractality (f(α) is almost point-like). This is in agreement with the observation that the distribution tails for the contemporary data tend to be thinner than before. This result also gives an additional argument in favour of the statement that the principal (here even the unique) source of the multifractal properties of the stock market data is the nonlinear correlations.

In this paper we have shown that the c.d.f. for the most recent stock market data, represented by the index returns, develops tails whose scaling index rises above the value of 3 even for short, minute sampling intervals. This means that contemporary market dynamics differs significantly from the one observed 20 or even 10 years ago and described by the inverse cubic power law [3]. That these changes are a continuous process rather than a sudden transition we infer from the existence of intermediate stages in which the inverse cubic scaling was observed up to medium time scales of tens of minutes but was absent in daily data [9,11]. This effect suggests a scenario of constantly increasing market efficiency due to an acceleration of information processing in the world markets [9,16]. The related compression of the range of potential time correlations between consecutive returns manifests itself in the present analysis as a faster convergence towards a Gaussian distribution for aggregated returns.
Such a faster convergence indicates weaker time correlations between returns. A further argument in favour of an increasing stock market efficiency comes from the autocorrelation analysis. The autocorrelation functions calculated explicitly from the 1 min returns for the three indices considered above drop down to the noise level already for time lags as small as 1-2 min. This is to be compared to ∼ 5 min in the period 1998-99 [9], and to ∼ 20 min, which according to ref. [3] was characteristic for the period 1994-95.

References
[1] B. Mandelbrot, J. Business 36 (1963) 294
[2] R.N. Mantegna, H.E. Stanley, Nature 376 (1995) 46
[3] P. Gopikrishnan, V. Plerou, L.A. Nunes Amaral, M. Meyer, H.E. Stanley, Phys. Rev. E 60 (1999) 5305; V. Plerou, P. Gopikrishnan, L.A.N. Amaral, M. Meyer, H.E. Stanley, Phys. Rev. E 60 (1999) 6519
[4] J.D. Farmer, F. Lillo, Quant. Finance 4 (2004) C7; J.D. Farmer, L. Gillemot, F. Lillo, S. Mike, A. Sen, Quant. Finance 4 (2004) 383
[5] T. Lux, Appl. Financial Economics 6 (1996) 463
[6] X. Gabaix, P. Gopikrishnan, V. Plerou, H.E. Stanley, Nature 423 (2003) 267
[7] J.A. Skjeltorp, Physica A 283 (2000) 486
[8] Z.F. Huang, Physica A 287 (2000) 405
[9] S. Drożdż, J. Kwapień, F. Grümmer, F. Ruf, J. Speth, Acta Phys. Pol. B 34 (2003) 4293, cond-mat/0208240
[10] See http://www.taq.com
[11] R. Rak, S. Drożdż, J. Kwapień, Physica A 374 (2007) 315
[12] C. Tsallis, C. Anteneodo, L. Borland, R. Osorio, Physica A 324 (2003) 89
[13] C. Tsallis, R.S. Mendes, A.R. Plastino, Physica A 261 (1998) 534
[14] K. Matia, Y. Ashkenazy, H.E. Stanley, Europhys. Lett. 61 (2003) 422
[15] J. Kwapień, P. Oświȩcimka, S. Drożdż, Physica A 350 (2005) 466
[16] J. Kwapień, S. Drożdż, J.
Speth, Physica A 337 (2004) 231
<|endoftext|><|startoftext|>
http://arxiv.org/abs/0704.0665v2

1 Introduction

In this paper, we study the Cauchy problem for the Hartree equation

$$ \begin{cases} i u_t + \Delta u = f(u), & \text{in } \mathbb{R}^n \times \mathbb{R},\ n \ge 5, \\ u(0) = \varphi(x), & \text{in } \mathbb{R}^n. \end{cases} \qquad (1.1) $$

Here $f(u) = (V * |u|^2)\,u$ is a nonlinear function of Hartree type for $V(x) = |x|^{-\gamma}$, $0 < \gamma < n$, where $*$ denotes the convolution in $\mathbb{R}^n$. In practice, we use the integral formulation of (1.1),

$$ u(t) = U(t)\varphi - i \int_0^t U(t-s) f(u(s))\,ds, \qquad (1.2) $$

where $U(t) = e^{it\Delta}$.

If the solution u of (1.1) has sufficient smoothness and decay at infinity, it satisfies two conservation laws:

$$ M(u(t)) = \|u(t)\|_{L^2} = M(\varphi), \qquad E(u(t)) = \frac{1}{2}\|\nabla u(t)\|_{L^2}^2 + \frac{1}{4}\iint \frac{1}{|x-y|^{\gamma}}\,|u(t,x)|^2 |u(t,y)|^2\,dx\,dy = E(\varphi). \qquad (1.3) $$

As explained in [6], the energy is also conserved for the energy solutions $u \in C_t^0(\mathbb{R}, H^1)$.

From the viewpoint of the fractional integral, we rewrite the equation (1.1), up to a constant factor, as

$$ i u_t + \Delta u = \left( (-\Delta)^{-\frac{n-\gamma}{2}} |u|^2 \right) u. $$

For dimension n ≥ 5, the exponent γ = 4 is the unique exponent which is energy critical, in the sense that the natural scale transformation

$$ u_\lambda(t,x) = \lambda^{\frac{n-2}{2}}\, u(\lambda^2 t, \lambda x) $$

leaves the energy invariant; in other words, the energy E(u) is a dimensionless quantity.

The Cauchy problem of the Hartree equation has been intensively studied ([4-10], [15, 16, 18, 19]).
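The claim that γ = 4 is the energy-critical exponent can be checked by a direct change of variables. The computation below is a worked sketch under the assumption that the scale transformation is $u_\lambda(t,x) = \lambda^{(n-2)/2} u(\lambda^2 t, \lambda x)$ and that the energy consists of the kinetic term and the Hartree potential term with $V(x) = |x|^{-\gamma}$; the substitution variables $x' = \lambda x$, $y' = \lambda y$ are introduced here for illustration.

```latex
\begin{align*}
\int_{\mathbb{R}^n} |\nabla u_\lambda(t,x)|^2 \,dx
  &= \lambda^{n} \int_{\mathbb{R}^n} |(\nabla u)(\lambda^2 t, \lambda x)|^2 \,dx
   = \int_{\mathbb{R}^n} |\nabla u(\lambda^2 t, x')|^2 \,dx', \\[4pt]
\iint \frac{|u_\lambda(t,x)|^2\,|u_\lambda(t,y)|^2}{|x-y|^{\gamma}} \,dx\,dy
  &= \lambda^{2(n-2)} \cdot \lambda^{\gamma} \cdot \lambda^{-2n}
     \iint \frac{|u(\lambda^2 t,x')|^2\,|u(\lambda^2 t,y')|^2}{|x'-y'|^{\gamma}} \,dx'\,dy' \\
  &= \lambda^{\gamma-4}
     \iint \frac{|u(\lambda^2 t,x')|^2\,|u(\lambda^2 t,y')|^2}{|x'-y'|^{\gamma}} \,dx'\,dy'.
\end{align*}
```

The kinetic term is invariant for every γ, while the potential term picks up the factor $\lambda^{\gamma-4}$, so both parts of the energy are invariant exactly when γ = 4. The restriction 0 < γ < n then forces n ≥ 5.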
With regard to the global well-posedness and scattering results, they all dealt with the Ḣ1-subcritical case 2 < γ < min(4, n) in the energy space or some weighted spaces. In [16], we obtained the small data scattering result for the Ḣ1-critical case in the energy space. For the large initial data for the Ḣ1-critical case γ = 4, n ≥ 5 in the energy space , the argument in [16] can not yield the global well-posedness, even with the conservation of the energy (1.3), because the time of existence given by the local theory depends on the profile of the data as well as on the energy. Concerning the Ḣ1-subcritical case 2 < γ < min(4, n) , using the method of Morawetz and Strauss [17], J. Ginibre and G. Velo [6] developed the scattering the- ory in the energy space, where they exploited the properties of ∆ and obtain the usual Morawetz estimate ∣u(t, x) |x− y|γ ∇|u(t, y)|2dydxdt . CE(u). Later, K. Nakanishi [18] exploited the properties of i∂t + ∆ and used a certain related Sobolev-type inequality to obtain a new Morawetz estimate |t|1+ν |u(t, x)| (|t|+ |x|)2+ν dxdt ≤ C(E, ν), for any ν > 0, which was independent of the nonlinearity. In this paper, we deal with the Cauchy problem of the Hartree equation with the large data for the Ḣ1-critical case γ = 4, n ≥ 5 . Inspired by the approach of Bourgain [1] and Tao [22] in the case of the Ḣ1-critical Schrödinger equation with the local nonlinear term, we obtain the global well-posedness and scattering results for the Hartree equation for the large radial data in Ḣ1. The new ingredient is that we take advantage of the following localized estimate for the first time |x|≤A|I|1/2 |u(t, x)|2∆ dxdt = (n− 3) |x|≤A|I|1/2 |u(t, x)|2 dxdt ≤ A|I|1/2C(E) to rule out the possibility of energy concentration, instead of the classical Morawetz estimate ∇V (x− y)|u(x)|2|u(y)|2dydxdt . C(E) due to the nonlinear term . Our main result is the following global well-posedness result in the energy space. Theorem 1.1. Let n ≥ 5, and ϕ ∈ Ḣ1 be radial. 
then there exists a unique global solution u ∈ C0t (Ḣ (iut +∆u)(t, x) = uV ∗ |u|2 (t, x), in Rn × R, u(0) = ϕ(x), in Rn. (1.4) where V (x) = |x|−4 and on each compact time interval [t−, t+], we have x ([t−,t+]×Rn) ). (1.5) As the right hand side of (1.5) is independent of t−, t+, we can obtain the global spacetime estimate. As a direct consequence of the global L6tL x estimate, we have scattering, asymptotic completeness, and uniform regularity. Corollary 1.1. Let ϕ be radial and have finite energy. Then there exists finite energy solutions u±(t, x) to the free Schrödinger equation iut +∆u = 0 such that ∥u±(t)− u(t) → 0 as t → ±∞. Furthermore, the maps ϕ 7→ u±(0) are homeomorphisms from Ḣ 1(Rn) to Ḣ1(Rn). Fi- nally, if ϕ ∈ Hs for some s > 1, then u(t) ∈ Hs for all time t, and one has the uniform bounds ∥u(t) ≤ C(E(ϕ), s) The paper is organized as follows. In Section 2, we introduce notations and the basic estimates; In Section 3, we derive the local mass conservation and Morawetz inequality; In Section 4, we discuss the local theory for (1.4); In Section 5, we obtain the perturbation theory; Finally, we prove the main theorem in Section 6. 2 Notations and basic estimates We will often use the notations a . b and a = O(b) to denote the estimate a ≤ Cb for some C. The derivative operator ∇ refers to the space variable only. We also occasionally use subscripts to denote the spatial derivatives and use the summation convention over repeated indices. We define 〈a, b〉 = Re(ab), ∂ = (∂t,∇), D = (− ,∇); For 1 ≤ p ≤ ∞, we denote by p′ the dual exponent, that is, 1 For any time interval I, we use L x(I×R n) to denote the mixed spacetime Lebesgue x(I×R Lr(Rn) with the usual modifications when q = ∞. When q = r, we abbreviate L x by L We use U(t) = eit∆ to denote the free group generated by the free Schrödinger equation iut +∆u = 0. It can commute with derivatives, and obeys the inequality ∥eit∆f Lp(Rn) . |t| −n( 1 (2.1) for t 6= 0, 2 ≤ p ≤ ∞. 
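For orientation, the decay inequality (2.1) is read here as the standard dispersive estimate for the free Schrödinger group; this reading is an assumption, since the displayed formula is only partly legible. Under that assumption it follows from two endpoint bounds by interpolation, as sketched below (the interpolation parameter θ is introduced for illustration).

```latex
\begin{align*}
&\text{Assumed form of (2.1):} &
\| e^{it\Delta} f \|_{L^p(\mathbb{R}^n)}
  &\lesssim |t|^{-n\left(\frac12 - \frac1p\right)} \| f \|_{L^{p'}(\mathbb{R}^n)},
  \qquad t \neq 0,\ 2 \le p \le \infty. \\[4pt]
&\text{Endpoint } p = 2: &
\| e^{it\Delta} f \|_{L^2} &= \| f \|_{L^2}
  \quad \text{(unitarity)}. \\
&\text{Endpoint } p = \infty: &
\| e^{it\Delta} f \|_{L^\infty} &\le (4\pi |t|)^{-n/2} \| f \|_{L^1},
  \quad \text{since }
  e^{it\Delta} f(x) = (4\pi i t)^{-n/2} \int_{\mathbb{R}^n} e^{\frac{i|x-y|^2}{4t}} f(y)\,dy. \\[4pt]
&\text{Interpolation:} &
\theta &= 1 - \tfrac{2}{p}
  \ \Longrightarrow\
  \tfrac{1}{p} = \tfrac{1-\theta}{2}, \quad
  \tfrac{1}{p'} = \tfrac{1-\theta}{2} + \theta, \quad
  \tfrac{n}{2}\,\theta = n\left(\tfrac12 - \tfrac1p\right).
\end{align*}
```

The Riesz-Thorin theorem applied with this θ gives the stated power of |t| for every 2 ≤ p ≤ ∞.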
We say that a pair (q, r) is admissible if 2 ≤ r ≤ ∞, n = 1; < ∞, n = 2; , n ≥ 3. For a spacetime slab I ×Rn, we define the Strichartz norm Ṡ0(I) by Ṡ0(I) := sup (q,r) admissible x(I×R and define Ṡ1(I) by Ṡ1(I) Ṡ0(I) When n ≥ 3, the spaces Ṡ0(I), ‖ · ‖ Ṡ0(I) Ṡ1(I), ‖ · ‖ Ṡ1(I) are Banach spaces, respectively. Based on the above notations, we have the following Strichartz inequalities Lemma 2.1. [11], [21] Let u be an Ṡ0 solution to the Schrödinger equation (1.1). Then ∥u(t0) ∥f(u) x (I×R for any t0 ∈ I and any admissible pairs (q, r). The implicit constant is independent of the choice of interval I. From Sobolev embedding, we have Lemma 2.2. For any function u on I × Rn, we have L∞t L L∞t L where all spacetime norms are on I × Rn. For convenience, we introduce two abbreviated notations. For a time interval I, we denote x (I×R W (I) x (I×R Lemma 2.3. Let f(u)(t, x) = uV ∗ |u|2 (t, x), where V (x) = |x|−4. For any time interval I and t0 ∈ I, we have ei(t−s)∆f(u)(s, x)ds Ṡ1(I) W (I) Proof: By Strichartz estimates, Hardy-Littlewood-Sobolev inequality and Hölder inequality, we have ei(t−s)∆f(u)(s, x)ds Ṡ1(I) . ‖∇f(u)(t, x)‖ x (I×R . ‖∇uV ∗ |u|2‖ x (I×Rn) + ‖uV ∗ (u∇u)‖ x (I×Rn) . ‖∇u‖ x (I×Rn) ‖V ∗ |u|2‖ x (I×Rn) + ‖u‖ x (I×Rn) ‖V ∗ (u∇u)‖ x (I×R W (I) 3 Local mass conservation and Morawetz inequality In this section, we will prove two useful estimates. One is a local mass conservation estimate and the other is a Morawetz inequality, which appears in Morawetz identity. The local mass conservation estimate is used to control the flow of mass through a region of space, and the Morawetz inequality is used to prevent concentration. 3.1 Local mass conservation We recall a local mass conservation law that has appeared in [1], [13] and [22]. For completeness, we give the sketch of the proof. Let χ be a bump function supported on the ball B(0, 1) that equals 1 on the ball B(0, 1/2). Observe that if u is a finite energy solution of (1.4), then ∣u(t, x) = −2∇ · Im(u∇u(t, x)). 
We define Mass(u(t), B(x0, R)) := (x− x0 u(t, x) Differentiating the above quantity with respect to time, we obtain by the integration by parts ∂tMass(u(t), B(x0, R)) = (x− x0 ∣u(t, x) (x− x0 ∇ · Im(u∇u)dx (x− x0 (x− x0 Im(u∇u)dx ∥∇u(t) Mass(u(t), B(x0, R)) hence, we have Mass(u(t1), B(x0, R)) 1/2 −Mass(u(t2), B(x0, R)) ∣t1 − t2 ∣. (3.1) This implies that if the local mass Mass(u(t), B(x0, R)) is large for some time t, then it can also be shown to be similarly large for nearly time t, by increasing the radius R if necessary to reduce the rate of change of the mass. On the other hand, from Sobolev and Hölder inequalities, we have Mass(u(t), B(x0, R)) ≤ (x− x0 . (3.2) This gives the control of mass in small volumes. 3.2 A Morawetz inequality To prevent the concentration of the energy, we need a Morawetz estimate. The Morawetz estimate is based on some integral identity derived by variation of the lagrangian. We define ℓ(u) by 2ℓ(u) = 〈iut, u〉+ |∇u| |u|2(V ∗ |u|2) ℓ(u) is the lagrangian density associated to the equation (1.1). From the definition of the variation of the functional ℓ, we have δvℓ(u) : = lim ℓ(u+ ǫv)− ℓ(u) = 〈iut +∆u− u(V ∗ |u| 2), v〉 + ∂ · 〈Du, v〉. Using this identity together with h = , q = Re(D·h) = and Mu = h ·Du+qu, we obtain the following formula: 〈iut +∆u− u(V ∗ |u| 2),Mu〉 =− ∂ · 〈Du,Mu〉 +D · hℓ(u) + 〈Du, ∂hαDαu〉 − D · ∂q − (V ∗ ∇|u|2) · |u|2. As a consequence of the above dilation identity, we have the following Morawetz estimate, which plays an important role in our proof. Proposition 3.1 (Morawetz estimate). Let u be a solution to (1.4) on a spacetime slab I × Rn. Then for any A ≥ 1, we have |x|≤A|I|1/2 dxdt− ∇V (x− y)|u(x)|2|u(y)|2dydxdt . A|I|1/2E, where Ω = (x, y) ∈ Rn ×Rn; |x| ≤ A|I|1/2; |y| ≤ A|I|1/2 Remark 3.1. Since ∇V (x− y) = 4 |x||y| − x · y |x− y|6 we have ∇V (x− y)|u(x)|2|u(y)|2dydxdt ≥ 0. Proof: We define V a0 (t) = a(x)|u(t, x)|2dx, then Ma0 (t) =: ∂tV 0 (t) = 2Im ajujudx. 
0 (t) = −2Im ajjutudx− 4Im ajujutdx △△a|u|2dx+ 4Re ajkujukdx − 2Re ∇a(x)∇V (x− y)|u(y)|2|u(x)|2dxdy △△a|u|2dx+ 4Re ajkujukdx ∇a(x)−∇a(y) ∇V (x− y)|u(y)|2|u(x)|2dxdy where we use the symmetry of a(x) and V (x). Let R > 0 and let η be a bump function adapted to the ball |x| ≤ R which equals 1 on the ball |x| ≤ R/2. We set a(x) := |x|η(x). For |x| ≤ R/2, we have ; ajk = ; △a = ; −△△a = (n− 1)(n − 3) and for R/2 ≤ |x| ≤ R, we have bounds aj = O(1); ajk = O(R −1); △△a = O(R−3). Thus we have 0 (t) = (n− 1)(n − 3) |x|≤R/2 dx+ 4 |x|≤R/2 |∇u|2 − |∂ru| ∇V (x− y)|u(x)|2|u(y)|2dydx |x|∼R ( |u|2 |∇u|2 aj(x)− aj(y) ) xj − yj |x− y|γ+2 |u(x)|2|u(y)|2dydx where γ = 4, (x, y) ∈ Rn × Rn; |x| ≤ R/2, |y| ≤ R/2 (x, y) ∈ Rn × Rn; |x| ∼ R (x, y) ∈ Rn × Rn; |y| ∼ R Meanwhile |x|∼R ( |u|2 |∇u|2 dx . R−1E, aj(x)− aj(y) ) xj − yj |x− y|γ+2 |u(x)|2|u(y)|2dydx Ω2: |x−y|≤R/4 aj(x)− aj(y) ) xj − yj |x− y|γ+2 |u(x)|2|u(y)|2dydx Ω2: |x−y|≥R/4 aj(x)− aj(y) ) xj − yj |x− y|γ+2 |u(x)|2|u(y)|2dydx . R−1 |x− y|γ |u(x)|2|u(y)|2dydx . R−1E. Moreover, from Sobolev and Hölder inequalities, we have Ma0 (t) . |x|.R |u||∇u| . ‖u‖ ‖∇u‖L2x |x|.R . RE. So if we integrate by parts on a time interval I and take R = 2A|I|1/2, we obtain |x|≤A|I|1/2 dxdt− ∇V (x− y)|u(x)|2|u(y)|2dydxdt . A|I|1/2E for n ≥ 4. The proof is completed. 4 Local theory In this section, we develop a local well-posedness and blow-up criterion for the Ḣ1-critical Hartree equation. First, we have Proposition 4.1 (Local well-posedness). Let u(t0) ∈ Ḣ 1, and I be a compact time interval that contains t0 such that ∥U(t− t0)u(t0) for a sufficiently small absolute constant η > 0. Then there exists a unique strong solution to (1.4) on I × Rn such that ∥u(t0) Proof: The proof of this proposition is standard and based on the contraction mapping arguments. 
We define the solution map to be Φ(u)(t) := U(t− t0)u(t0)− i U(t− s)f(u(s))ds, then Φ is a map from B = {u : ‖u‖X(I) ≤ 2η, ‖u‖W (I) ≤ 2C‖u(t0)‖Ḣ1} with the metric ‖u‖B = ‖u‖X(I) + ‖u‖W (I) onto itself because ‖Φ(u)‖X(I) ≤ ‖U(t− t0)u(t0)‖X(I) + C‖u‖ X(I)‖u‖W (I) ≤ η + 8Cη 2‖u(t0)‖Ḣ1 ≤ 2η; ‖Φ(u)‖W (I) ≤ C‖u(t0)‖Ḣ1+C‖u‖ X(I)‖u‖W (I) ≤ C‖u(t0)‖Ḣ1+8Cη 2‖u(t0)‖Ḣ1 ≤ 2C‖u(t0)‖Ḣ1 . It suffices to prove Φ is a contraction map. Let u, v ∈ B, then ‖Φ(u)− Φ(v)‖W (I) ≤ U(t− s)(V ∗ (ū− v̄)u)u(s, x)ds W (I) U(t− s)(V ∗ v̄(u− v))u(s, x)ds W (I) U(t− s)(V ∗ v̄v)(u− v)(s, x)ds W (I) By Lemma 2.3, we have ‖Φ(u)− Φ(v)‖W (I) ≤ ‖u− v‖X(I) ‖u‖W (I)‖u‖X(I) + ‖u‖W (I)‖v‖X(I) + ‖v‖W (I)‖v‖X(I) + ‖u− v‖W (I) ‖u‖X(I)‖u‖X(I) + ‖u‖X(I)‖v‖X(I) + ‖v‖X(I)‖v‖X(I) ≤ 12Cη‖u(t0)‖Ḣ1‖u− v‖X(I) + 12η 2‖u− v‖W (I) (‖u− v‖X(I) + ‖u− v‖W (I)) In the same way, we have ‖Φ(u)− Φ(v)‖X(I) ≤ 12Cη‖u(t0)‖Ḣ1‖u− v‖X(I) + 12η 2‖u− v‖W (I) (‖u− v‖X(I) + ‖u− v‖W (I)) as long as η is chosen sufficiently small. Then the contraction mapping theorem implies the existence of the unique solution to (1.4) on I. Next, we give the blow-up criterion of the solutions for (1.4). The usual form is similar to those in [2], [12], which is in the form of a maximal interval of existence. For convenience, we obtain Proposition 4.2 (Blow-up criterion). Let ϕ ∈ Ḣ1, and let u be a strong solution to (1.4) on the slab [0, T ) × Rn such that X([0,T )) Then there exists δ > 0 such that the solution u extends to a strong solution to (1.4) on the slab [0, T + δ] × Rn. Proof: By the absolute continuity of integrals, there exists a t0 ∈ [0, T ), such that ‖u‖X([t0,T )) ≤ η/4, then by Lemma 2.3, we have ‖u‖W ([t0,T )) . ‖u(t0)‖Ḣ1 + ‖u‖ X([t0,T )) ‖u‖W ([t0,T )), therefore ‖u‖W ([t0,T )) . ‖u(t0)‖Ḣ1 . 
Now we write U(t− t0)u(t0) = u(t) + i U(t− s)(V ∗ |u|2)u(s, x)ds, ‖U(t−t0)u(t0)‖X([t0,T )) ≤ ‖u‖X([t0,T ))+C‖u‖ X([t0,T )) ‖u‖W ([t0,T )) ≤ +Cη2‖u(t0)‖Ḣ1 ≤ By the absolute continuity of integrals again, there exists a δ, such that ‖U(t− t0)u(t0)‖X([t0,T+δ)) ≤ η. Thus we may apply Proposition 4.1 on the interval [t0, T + δ] to complete the proof. In other words, this lemma asserts that if [t0, T ∗) is the maximal interval of existence and T ∗ < ∞, then ‖u‖X([t0,T ∗)) = ∞. 5 Perturbation result In this section, we obtain the perturbation for Hartree equation, which shows that the solution can not be large if the linear part of the solution is not large. This is an analogue of Lemma 3.2 in [22], and later, Killip, Visan and Zhang [13] gave the similar perturbation result for the Schrödinger equation with the quadric potentials. Lemma 5.1 (Perturbation lemma). Let u be a solution to (1.4) on I = [t1, t2] such that where η is sufficiently small constant depending on the norm of the initial data, then Ṡ1(I) where uk(t) = U(t− tk)u(tk) for k = 1, 2. Proof: From Strichartz estimate and Lemma 2.3, we obtain Ṡ1(I) ∥u(t1) W (I) ∥u(t1) Ṡ1(I) ∥u(t1) Ṡ1(I) If η is sufficiently small, we have the first claim Ṡ1(I) As for the second claim, we give the proof for k = 1, the case k = 2 is similar. Using Strichartz estimate and Lemma 2.3 again, we have ∥u− u1 Ṡ1(I) . η2, therefore, the second claim follows by the triangle inequality and choosing η sufficiently small. 6 Global well-posedness In this section, we give the proof of Theorem 1.1. The new ingredient is that we first take advantage of the the estimate of the term |x|≤A|I|1/2 dxdt in the localized Morawetz identity to rule out the possibility of energy concentration, which is indepen- dent of the nonlinear term. For the Schrödinger equation, Tao [22] used the classical Morawetz estimate, which depends on the nonlinearity, to prevent the concentration. For readability, we first take some constants C1 = 6n; C2 = 3; C3 = 18n. 
(6.1) which come from several constraints in the rest of this section. All implicit constants in this section are permitted to depend on the dimension n and the energy. Fix E, [t−, t+], u. We may assume that the energy is large, E > c > 0, otherwise the claim follows from the small energy theory [16]. From the boundedness of energy and Sobolev embedding, we can obtain ∥u(t) ∥u(t) . 1 (6.2) for all t ∈ [t−, t+]. Assume that the solution u already exists on [t−, t+]. By Lemma 4.2, it suffices to obtain a priori estimate X([t−,t+]) ≤ O(1), (6.3) where O(1) is independent of t−, t+. We may assume that X([t−,t+]) ≥ 2η, otherwise it is trivial. We divide [t−, t+] into J subintervals Ij = [tj , tj+1] for some J ≥ 2 such that X(Ij) ≤ η, (6.4) where η is a small constant depending on the dimension n and the energy. As a conse- quence, it suffices to estimate the number J . Now let u± = U(t − t±)u(t±). By Sobolev embedding and Strichartz estimates, we X([t−,t+]) . 1. (6.5) We adapt the following definition of Tao [22]. Definition 6.1. We call Ij exceptional if X(Ij) > ηC3 for at least one sign ±. Otherwise, we call Ij unexceptional. From (6.5), we obtain the upper bound on the number of exceptional intervals, O(η−6C3). We may assume that there exist unexceptional intervals, otherwise the claim would follow from this bound and (6.4). Therefore, it suffices to compute the number of unexceptional intervals. We first prove the existence of a bubble of mass concentration in each unexceptional interval. Proposition 6.1 (Existence of a bubble). Let Ij be an unexceptional interval. Then there exists xj ∈ R n such that Mass(u(t), B(xj , η −C1 |Ij | 1/2)) & ηC1 |Ij| for all t ∈ Ij . Proof: By time translation invariance and scale invariance, we may assume that Ij = [0, 1]. We subdivide Ij further into [0, ] and [1 , 1]. By (6.4) and the pigeonhole principle and time reflection symmetry if necessary, we may assume that X([ 1 Thus by Lemma 5.1, we have X([ 1 . 
(6.6) By Duhamel formula, we have ) = U(t− t−)u(t−)− i U(t− s)f(u(s))ds U(t− s)f(u(s))ds. (6.7) Since [0, 1] is unexceptional interval, we have ∥U(t− t−)u(t−) X([ 1 ∥u−(t) X([ 1 ≤ ηC3 . On the other hand, by (6.4), Lemma 2.2 , Lemma 2.3 and Lemma 5.1, we have U(t− s)f(u(s))ds X([ 1 X([ 1 W ([ 1 Ṡ1([ 1 . η2. Thus the triangle inequality implies that U(t− s)f(u(s))ds X([ 1 provided η is chosen sufficiently small. Hence, if we define v(t) := U(t− s)f(u(s))ds, then we have X([ 1 η. (6.8) Next, we estimate the upper bound on v. We have by (6.7) and the triangle inequality Ṡ1([ 1 Ṡ1([ 1 ∥U(t− t−)u(t−) Ṡ1([ 1 U(t− s)f(u(s))ds Ṡ1([ 1 X([0, 1 W ([0, 1 X([0, 1 Ṡ1([0, 1 (6.9) where we use Strichartz estimate, (6.4) and Lemma 5.1. We shall need some additional regularity control on v. For any h ∈ Rn, let u(h) denote the translation of u by h, i.e. u(h)(t, x) = u(t, x− h). Lemma 6.1. Let χ be a bump function supported on the ball B(0, 1) of total mass one, and define vav(t, x) = χ(y)v(t, x+ ηC2y)dy, then we have ∥v − vav X([ 1 . ηC2 . Proof: By the chain rule, Hölder inequality and Sobolev embedding, we have ∥∇f(u)(s) ∥(V ∗ |u|2)∇u ∥u(V ∗ ∇|u|2) ∥V ∗ |u|2 ∥V ∗ ∇|u|2 ∥|u|2 ∥∇|u|2 it follows by (2.1) L∞t L ,1]×Rn) ≤ sup t∈[ 1 |t− s|2 ∥∇f(u)(s) ds . 1. From (6.9) and interpolation, we have L∞t L ,1]×Rn) L∞t L ,1]×Rn) L∞t L ,1]×Rn) From the fundamental theorem of calculus, we have ∥v − v(h) L∞t L ,1]×Rn) . |h|. This implies ∥v − vav L∞t L ,1]×Rn) ∥v(t, x+ ηC2y)− v(x) L∞t L ,1]×Rn) χ(y)|ηC2y|dy . ηC2 . Hence from Hölder inequality, we obtain ∥v − vav X([ 1 ∥v − vav L∞t L ,1]×Rn) . ηC2 . This completes the proof of Lemma. Now we return to the proof of Proposition 6.1. By Lemma 6.1 and (6.8), we have X([ 1 & η. 
(6.10) On the other hand, by Hölder inequality, Young inequalities and (6.9), we have 2(3n−8) ,1]×Rn) L∞t L ,1]×Rn) L∞t L ,1]×Rn) Interpolating with (6.10) gives L∞t,x([ ,1]×Rn)) X([ 1 − 3n−8 2(3n−8) ,1]×Rn) Thus there exists (sj, xj) ∈ [ , 1] × Rn such that ∣vav(sj, xj) ∣ & η Hence, by Cauchy-Schwarz inequality, we have ∣vav(sj , xj) χ(y)v(sj , xj + η C2y)dy = η−nC2 x− xj )v(sj , x)dx . η−nC2η C2Mass(v(sj), B(xj , η C2))1/2, that is Mass(v(sj), B(xj , η C2)) & η3n−6+nC2 & ηC1 . (6.11) Observe that (3.1) also holds for v. If we take R = η−C1 and choose η sufficiently small, we have Mass(v(t), B(xj , η −C1)) & Mass(v(sj), B(xj , η −C1))1/2 − & (Mass(v(sj), B(xj , η C2))1/2 − ηC1)2 & ηC1 (6.12) for all t ∈ [0, 1]. The last step is to show that this mass concentration holds for u. We first show mass concentration for u at time 0. Since [0, 1] is unexceptional interval, by the pigeonhole principle, there is a τj ∈ [0, 1] such that ∥u−(τj) . ηC3 , and so by Hölder inequality, Mass(u−(τj), B(xj , η −C1)) . (x− xj ∥u−(τj) C1+2C3 . η2C1 . From (3.1), we have Mass(u−(0), B(xj , η −C1)) . η2C1 . (6.13) Recall that u(0) = u−(0) − iv(0). Combing (6.12) and (6.13) with the triangle inequality, we obtain Mass(u(0), B(xj , η −C1)) & ηC1 . (6.14) Using (3.1) again, we obtain the result. Next, we use the radial assumption to show that the bubble of mass concentration must occur at the spatial origin. In the forthcoming paper, we shall use the interaction Morawetz estimate with the frequency localized L2 almost-conservation law to rule out the possibility of the energy concentration at any place and deal with the non-radial data. The corresponding results for the Schrödinger equation with local nonlinearity, please see [3], [20] and [23]. Corollary 6.1 (Bubble at the origin). Let Ij be an unexceptional interval. Then Mass(u(t), B(0, η−3C1 |Ij| 1/2)) & ηC1 |Ij | for all t ∈ Ij . 
Proof: If xj in Proposition 6.1 is within η−3C1 |Ij | 1/2 of the origin, then the result follows immediately. Otherwise by the radial assumption, there would be at least ((η−3C1 |Ij | 1/2)n−1 (η−C1 |Ij|1/2)n−1 η−2(n−1)C1 many distinct balls each containing at least ηC1 |Ij | amount of mass. By Hölder inequality, this implies η−2(n−1)C1 × ηC1 |Ij | . (η−3C1−η−C1 )|Ij | 1/2≤|x|≤(η−3C1+η−C1 )|Ij | |u(t, x)|2dx (η−3C1−η−C1 )|Ij | 1/2≤|x|≤(η−3C1+η−C1 )|Ij | η−3C1 |Ij | × η−C1 |Ij | that is 2n2−9n+4 Because 2n2 − 9n+ 4 > 0 for n ≥ 5, this contradicts the boundedness on the energy of (6.2). This completes the proof. Next, we use Proposition 3.1 to show that if there are many unexceptional intervals, they must form a cascade and must concentrate at some time t∗. Corollary 6.2. Assume that the solution u is spherically symmetric. For any interval I ⊆ [t−, t+] and I be a union of consecutive unexceptional intervals Ij. Then . η−13C1 and moreover, there exists a j such that ∣ & η26C1 Proof: For any unexceptional interval Ij , from Hölder inequality and Corollary 6.1, we have ∣ . Mass u(t), B(0, η−3C1 |Ij | η−3C1 |Ij |1/2 2η−3C1 |Ij |1/2 2 |u(t, x)|2 η−3C1 |Ij| |x|≤2η−3C1 |Ij | |u(t, x)|2 therefore |x|≤2η−3C1 |Ij| |u(t, x)|2 dx & η10C1 We integrate this over each unexceptional interval Ij and sum over j, η10C1 |x|≤2η−3C1 |Ij| |u(t, x)|2 |x|≤2η−3C1 |I|1/2 |u(t, x)|2 |x|≤2η−3C1 |I|1/2 |u(t, x)|2 . η−3C1 |I|1/2. The second claim follows from the first and the fact that )−1/2 This completes the proof. Proposition 6.2 (Interval cascade). Let I be an interval tiled by finitely many intervals I1, · · · , IN . Suppose that for any continuous family Ij : j ∈ J of the unexceptional intervals, there exists j∗ ∈ J such that ∣ ≥ a ∣ (6.15) for some small a > 0. Then there exist K ≥ log(N)/ log(2a−1) distinct indices j1, · · · , jK such that ∣ ≥ 2 ∣ ≥ · · · ≥ 2K−1 and for any t∗ ∈ IjK , dist(Ijk , t∗) . hold for 1 ≤ k ≤ K. 
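The lower bound K ≥ log(N)/log(2a^{-1}) in Proposition 6.2 rests on a geometric-series count. The sketch below checks that count numerically under the assumption that every generation removes at most 2a^{-1} − 1 intervals from each gap and leaves at most 2a^{-1} gaps; the function names are hypothetical and the code only illustrates the arithmetic, not the interval construction itself.

```python
import math


def max_intervals(a, generations):
    """Largest number of intervals that `generations` iterations can label.

    With m = 2/a, generation i (0-based) labels at most (m - 1) * m**i
    intervals, so the total is the geometric sum m**generations - 1.
    """
    m = 2.0 / a
    return sum((m - 1.0) * m ** i for i in range(generations))


def min_generations(n_intervals, a):
    """Smallest integer K satisfying K >= log(N) / log(2/a)."""
    return math.ceil(math.log(n_intervals) / math.log(2.0 / a))


if __name__ == "__main__":
    a = 0.25  # illustrative value; in the paper a is a small power of eta
    for K in range(1, 5):
        print(K, max_intervals(a, K), (2.0 / a) ** K)
    print(min_generations(100, a))
```

Because the sum telescopes to (2a^{-1})^K − 1, any family of N intervals needs at least log(N)/log(2a^{-1}) generations, which is the claimed bound on K.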
Figure 1: Iteration process in Proposition 6.2.

Proof: Here we use the algorithm of [1] and [22] to assign a generation to each $I_j$. By hypothesis, $I$ contains at least one interval of length at least $a|I|$. All intervals of length larger than $a|I|/2$ belong to the first generation. By comparing total measure, we see that there are at most $2a^{-1}-1$ intervals in the first generation. Removing these intervals from $I$ leaves at most $2a^{-1}$ gaps, which are tiled by intervals $I_j$. By (6.15) and a contradiction argument, no gap has length larger than $|I|/2$. We now apply this argument recursively to all gaps generated by the previous iteration until every $I_j$ has been labeled with a generation number. Within each gap, one iteration of the algorithm removes at most $2a^{-1}-1$ intervals and produces at most $2a^{-1}$ new gaps. Suppose that there are $N$ consecutive unexceptional intervals initially and that the algorithm terminates after $K$ iterations. Then
$$N \le (2a^{-1}-1) + (2a^{-1}-1)\,2a^{-1} + \cdots + (2a^{-1}-1)(2a^{-1})^{K-1} \le (2a^{-1})^{K},$$
which gives the claim $K \ge \log(N)/\log(2a^{-1})$. Let $I^{(K)}$ be the interval obtained after $K-1$ iterations and let $I_{j_K}$ be any interval in $I^{(K)}$. For $1 \le i \le K-1$, let $I^{(i)}$ be the $(i-1)$-generation gap which contains $I_{j_K}$, and let $I_{j_i}$ be any $i$th-generation interval contained in $I^{(i)}$ (see Figure 1). By construction, every $k$th-generation interval in $I^{(k)}$ has length larger than $a|I^{(k)}|/2$, so for any $t_* \in I_{j_K}$ we have
$$\mathrm{dist}(t_*, I_{j_k}) \le |I^{(k)}| \le 2a^{-1}|I_{j_k}|$$
for all $1 \le k \le K$.

Proposition 6.3 (Energy non-evacuation). Let $I_{j_1}, \cdots, I_{j_K}$ be a disjoint family of unexceptional intervals obeying
$$|I_{j_1}| \ge 2|I_{j_2}| \ge \cdots \ge 2^{K-1}|I_{j_K}| \qquad (6.16)$$
and such that, for any $t_* \in I_{j_K}$, $\mathrm{dist}(I_{j_k}, t_*) \lesssim \eta^{-26C_1}|I_{j_k}|$ holds for $1 \le k \le K$. Then $K \le \eta^{-100C_1}$.

Proof: By Corollary 6.1, $\mathrm{Mass}(u(t), B(0, \eta^{-3C_1}|I_{j_k}|^{1/2})) \gtrsim \eta^{C_1}|I_{j_k}|$ for all $t \in I_{j_k}$. By (3.1), we have
$$\mathrm{Mass}(u(t_*), B(0, \eta^{-27C_1}|I_{j_k}|^{1/2})) \gtrsim \Big( (\eta^{C_1}|I_{j_k}|)^{1/2} - \frac{\mathrm{dist}(t_*, I_{j_k})}{\eta^{-27C_1}|I_{j_k}|^{1/2}} \Big)^{2} \gtrsim \eta^{C_1}|I_{j_k}|.$$
On the other hand, from (3.2), we have Mass(u(t∗), B(0, 2η C1 |Ijk | 1/2)) . η2C1 |Ijk |. Define A(k) = x : ηC1 |Ijk | 1/2 ≤ |x| ≤ η−27C1 |Ijk | then we have ∣u(t∗, x) dx & Mass(u(t), B(0, η−27C1 |Ijk | 1/2))−Mass(u(t), B(0, 2ηC1 |Ijk | 1/2)) & ηC1 |Ijk |. By Hölder inequality, we have ∣u(t∗, x) n−2 dx & ηC1 |Ijk | η−27C1 |Ijk | )− 2n & η95C1 Choosing M = −56C1 log η, then we obtain by (6.16) η−27C1 |IjM+1 | 1/2 ≤ ηC1 |Ij1 | η−27C1 |Ij2M+1 | 1/2 ≤ ηC1 |IjM+1 | · · · Hence the annuli A(k) associated to k = 1,M +1, 2M +1, · · · , are disjoint. The number of such annuli is O(K/M). Therefore from (6.2), we obtain η95C1 . ∣u(t∗, x) n−2 dx . 1. That is K . Mη−95C1 . η−100C1 . We now return to the proof of Theorem 1.1. As explained at the beginning of this section, it suffices to bound the number of the unexceptional intervals. Note that the number of exceptional interval is at most O(η−6C3). We first bound the number N of unexceptional intervals that can occur consecutively. Let us denote the union of these consecutive unexceptional intervals by I. By Corol- lary 6.2, the hypotheses of Proposition 6.2 are satisfied with a = η26C1 and so we can find a cascade of K intervals and they satisfied the hypotheses of Proposition 6.3. The bound on K implies the bound on N , namely, N . (2a−26C1)K ≈ (2η−26C1)η −100C1 At last, since there are at most O(η−6C3) exceptional intervals, the total number of intervals is J . η−6C3 + η−6C3N . eη −200C1 This completes the proof of Theorem 1.1. Acknowledgements: The authors were partly supported by the NNSF of China. G. Xu wish to thank Xiaoyi Zhang for providing the paper [13] and some discussions. References [1] J. Bourgain, Scattering in the energy space and below for 3D NLS. J. Anal. Math. 75(1998), 267-297. [2] T. Cazenave, Semilinear Schrödinger equations. Courant Lecture Notes in Mathe- matics, vol. 10. New York: New York University Courant Institute of Mathematical Sciences, 2003. [3] J. Colliander, M. Keel, G. Staffilani, H. 
Takaoka, and T. Tao, Global well-posedness and scattering for the energy-critical nonlinear Schrödinger equation in $\mathbb{R}^3$. To appear in Ann. of Math.
[4] J. Ginibre and T. Ozawa, Long range scattering for nonlinear Schrödinger and Hartree equations in space dimension n ≥ 2. Comm. Math. Phys., 151(1993), 619-
[5] J. Ginibre and G. Velo, On a class of nonlinear Schrödinger equations with nonlocal interactions. Math. Z., 170(1980), 109-136.
[6] J. Ginibre and G. Velo, Scattering theory in the energy space for a class of Hartree equations. Nonlinear wave equations (Providence, RI, 1998), 29-60, Contemp. Math., 263, Amer. Math. Soc., Providence, RI, 2000.
[7] J. Ginibre and G. Velo, Long range scattering and modified wave operators for some Hartree type equations. Rev. Math. Phys., 12:3(2000), 361-429.
[8] J. Ginibre and G. Velo, Long range scattering and modified wave operators for some Hartree type equations II. Ann. Henri Poincaré, 1:4(2000), 753-800.
[9] J. Ginibre and G. Velo, Long range scattering and modified wave operators for some Hartree type equations. III: Gevrey spaces and low dimensions. J. Differ. Equations, 175:2(2001), 415-501.
[10] N. Hayashi and Y. Tsutsumi, Scattering theory for the Hartree equations. Ann. Inst. H. Poincaré Phys. Theorique, 61(1987), 187-213.
[11] M. Keel and T. Tao, Endpoint Strichartz estimates. Amer. J. Math., 120:5(1998), 955-980.
[12] C. E. Kenig and F. Merle, Global well-posedness, scattering and blow-up for the energy-critical, focusing, non-linear Schrödinger equation in the radial case. Invent. Math., 166(2006), 645-675.
[13] R. Killip, M. Visan and X. Zhang, Energy-critical NLS with quadratic potentials. arXiv:math.AP/0611394.
[14] K. Kurata and T. Ogawa, Remarks on blowing-up of solutions for some nonlinear Schrödinger equations. Tokyo J. Math., 13:2(1990), 399-419.
[15] C. Miao, $H^m$-modified wave operator for nonlinear Hartree equation in the space dimensions n ≥ 2. Acta Mathematica Sinica, 13:2(1997), 247-268.
[16] C. Miao, G. Xu and L. Zhao, The Cauchy problem of the Hartree equation. Preprint.
[17] C. Morawetz and W. A. Strauss, Decay and scattering of solutions of a nonlinear relativistic wave equation. Comm. Pure Appl. Math., 25(1972), 1-31.
[18] K. Nakanishi, Energy scattering for Hartree equations. Math. Res. Lett., 6(1999), 107-118.
[19] H. Nawa and T. Ozawa, Nonlinear scattering with nonlocal interactions. Comm. Math. Phys., 146(1992), 259-275.
[20] E. Ryckman and M. Visan, Global well-posedness and scattering for the defocusing energy-critical nonlinear Schrödinger equation in $\mathbb{R}^{1+4}$. Amer. J. Math., 129(2007), 1-60.
[21] R. S. Strichartz, Restriction of Fourier transform to quadratic surfaces and decay of solutions of wave equations. Duke Math. J., 44(1977), 705-714.
[22] T. Tao, Global well-posedness and scattering for the higher-dimensional energy-critical nonlinear Schrödinger equation for radial data. New York Journal of Mathematics, 11(2005), 57-80.
[23] M. Visan, The defocusing energy-critical nonlinear Schrödinger equation in higher dimensions. To appear in Duke Math. J.
[24] M. I. Weinstein, Nonlinear Schrödinger equations and sharp interpolation estimates. Comm. Math. Phys., 87(1983), 567-576.

Introduction Notations and basic estimates Local mass conservation and Morawetz inequality Local mass conservation A Morawetz inequality Local theory Perturbation result Global well-posedness References

ABSTRACT We consider the defocusing, $\dot{H}^1$-critical Hartree equation for radial data in all dimensions $n\geq 5$. We prove global well-posedness and scattering in the energy space. The new ingredient in this paper is that we first take advantage of the term $\displaystyle - \int_{I}\int_{|x|\leq A|I|^{1/2}}|u|^{2}\Delta \Big(\frac{1}{|x|}\Big)dxdt$ in the localized Morawetz identity to rule out the possibility of energy concentration, instead of using the classical Morawetz estimate, which depends on the nonlinearity.
<|endoftext|><|startoftext|> Introduction The model COSMOLOGICAL CONSEQUENCES Special case with =0 General case with =0 Conclusions and Discussions Acknowledgments References ABSTRACT We investigate the effect of the bulk content in the general Gauss-Bonnet braneworld on the evolution of the universe. We find that the Gauss-Bonnet term and the combination of the dark radiation and the matter content of the bulk play a crucial role in the evolution of the universe. We show that our model can describe the super-acceleration of our universe with the equation of state of the effective dark energy in agreement with observations. <|endoftext|><|startoftext|> Introduction The theory of free probability and free entropy was developed by Voiculescu beginning in the 1990s. It has played a crucial role in the recent study of finite von Neumann algebras (see [1], [3], [4], [5], [6], [7], [8], [11], [14], [15], [23], [24], [25]). The analogue of free entropy dimension in the C∗ algebra context, the notion of topological free entropy dimension of an n-tuple of elements in a unital C∗ algebra, was also introduced by Voiculescu in [26]. There, Voiculescu also discussed some of its properties, including subadditivity and change of variables. In this paper, we add one basic property to the list: the topological free entropy dimension of one variable. More specifically, suppose x is a self-adjoint element in a unital C∗ algebra A and σ(x) is the spectrum of x in A. Then the topological free entropy dimension of x is equal to $1 - \frac{1}{n}$, where n is the cardinality of the set σ(x) (see Theorem 4.1). In [26], Voiculescu showed that (i) if x1, . . . , xn is a family of free semicircular elements in a unital C∗ algebra with a tracial state, then δtop(x1, . . . , xn) = n, where δtop(x1, . . . , xn) is the topological free entropy dimension of x1, . . . , xn; (ii) if x1, . . .
, xn is the universal n-tuple of self-adjoint contractions, then δtop(x1, . . . , xn) = n. Apart from these two cases, very little is known about the values of the topological free entropy dimension in other C∗ algebras. Using the inequality between topological free entropy dimension and Voiculescu’s free dimension capacity, we obtain an upper bound on the topological free entropy dimension for a unital C∗ algebra with a unique tracial state (see Theorem 5.1). A lower bound on the topological free entropy dimension is also obtained for an infinite dimensional simple unital C∗ algebra with a unique tracial state (see Theorem 5.2). As a corollary, the topological free entropy dimension of any family of self-adjoint generators of an irrational rotation C∗ algebra, a UHF algebra, or $C^*_{red}(F_2) \otimes_{min} C^*_{red}(F_2)$ is equal to 1 (see Theorems 5.3, 5.4, 5.5). For these C∗ algebras, the value of the topological free entropy dimension is independent of the choice of generators. (Footnote 1: The second author is supported by an NSF grant.)

The rest of the paper is devoted to studying another invariant associated to an n-tuple of elements in a C∗ algebra. This invariant, called topological free orbit dimension, is an analogue of free orbit dimension in finite von Neumann algebras (see [11]). We show that the topological free orbit dimension of a self-adjoint element x in a unital C∗ algebra is equal to, according to some measurement, the packing dimension of the spectrum of x (see Theorem 7.1).

The organization of the paper is as follows. In Section 2, we recall the definition of topological free entropy dimension. Some technical lemmas are proved in Section 3. In Section 4, we compute the topological free entropy dimension of one self-adjoint element in a unital C∗ algebra. In Section 5, we study the relationship between the topological free entropy dimension and the free dimension capacity of a unital C∗ algebra.
Then we show that the topological free entropy dimension of any family of generators of an infinite dimensional simple unital C∗ algebra with a unique tracial state is always greater than or equal to 1. The concept of topological free orbit dimension of an n-tuple of elements in a C∗ algebra is introduced in Section 6. Its value for one variable is computed in Section 7.

2. Definitions and preliminaries

In this section, we recall Voiculescu’s definition of the topological free entropy dimension of an n-tuple of elements in a unital C∗ algebra.

2.1. A covering of a set in a metric space. Suppose (X, d) is a metric space and K is a subset of X. A family of balls in X is called a covering of K if the union of these balls covers K and the centers of these balls lie in K.

2.2. Covering numbers in the complex matrix algebra $(M_k(\mathbb{C}))^n$. Let $M_k(\mathbb{C})$ be the k × k full matrix algebra with entries in $\mathbb{C}$, and let $\tau_k$ be the normalized trace on $M_k(\mathbb{C})$, i.e., $\tau_k = \frac{1}{k}\mathrm{Tr}$, where Tr is the usual trace on $M_k(\mathbb{C})$. Let U(k) denote the group of all unitary matrices in $M_k(\mathbb{C})$. Let $M_k(\mathbb{C})^n$ denote the direct sum of n copies of $M_k(\mathbb{C})$. Let $M_k^{s.a.}(\mathbb{C})$ be the real subspace of $M_k(\mathbb{C})$ consisting of all self-adjoint matrices of $M_k(\mathbb{C})$. Let $(M_k^{s.a.}(\mathbb{C}))^n$ be the direct sum of n copies of $M_k^{s.a.}(\mathbb{C})$. Let ‖ · ‖ be the operator norm on $M_k(\mathbb{C})^n$ defined by
$$\|(A_1, \ldots, A_n)\| = \max\{\|A_1\|, \ldots, \|A_n\|\}$$
for all $(A_1, \ldots, A_n)$ in $M_k(\mathbb{C})^n$. Let ‖ · ‖2 denote the trace norm induced by $\tau_k$ on $M_k(\mathbb{C})^n$, i.e.,
$$\|(A_1, \ldots, A_n)\|_2 = \sqrt{\tau_k(A_1^*A_1) + \cdots + \tau_k(A_n^*A_n)}$$
for all $(A_1, \ldots, A_n)$ in $M_k(\mathbb{C})^n$. For every ω > 0, we define the ω-‖ · ‖-ball Ball(B1, . . . , Bn; ω, ‖ · ‖) centered at (B1, . . . , Bn) in $M_k(\mathbb{C})^n$ to be the subset of $M_k(\mathbb{C})^n$ consisting of all (A1, . . . , An) in $M_k(\mathbb{C})^n$ such that ‖(A1, . . . , An) − (B1, . . . , Bn)‖ < ω.

Definition 2.1. Suppose that Σ is a subset of $M_k(\mathbb{C})^n$. We define the covering number ν∞(Σ, ω) to be the minimal number of ω-‖ · ‖-balls that constitute a covering of Σ in $M_k(\mathbb{C})^n$.

For every ω > 0, we define the ω-‖·‖2-ball Ball(B1, . . .
, Bn;ω, ‖·‖2) centered at (B1, . . . , Bn) in Mk(C)n to be the subset of Mk(C)n consisting of all (A1, . . . , An) in Mk(C)n such that ‖(A1, . . . , An)− (B1, . . . , Bn)‖2 < ω. Definition 2.2. Suppose that Σ is a subset of Mk(C)n. We define the covering number ν2(Σ, ω) to be the minimal number of ω-‖ · ‖2-balls that consist a covering of Σ in Mk(C)n. 2.3. Noncommutative polynomials. In this article, we always assume that A is a unital C∗-algebra. Let x1, . . . , xn, y1, . . . , ym be self-adjoint elements inA. Let C〈X1, . . . , Xn, Y1, . . . , Ym〉 be the unital noncommutative polynomials in the indeterminates X1, . . . , Xn, Y1, . . . , Ym. Let {Pr}∞r=1 be the collection of all noncommutative polynomials in C〈X1, . . . , Xn, Y1, . . . , Ym〉 with rational complex coefficients. (Here “rational complex coefficients” means that the real and imaginary parts of all coefficients of Pr are rational numbers). Remark 2.1. We alsways assume that 1 ∈ C〈X1, . . . , Xn, Y1, . . . , Ym〉. 2.4. Voiculescu’s Norm-microstates Space. For all integers r, k ≥ 1, real numbers R, ǫ > 0 and noncommutative polynomials P1, . . . , Pr, we define (top) R (x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) to be the subset of (Ms.ak (C))n+m consisting of all these (A1, . . . , An, B1, . . . , Bm) ∈ (Ms.ak (C))n+m satisfying max{‖A1‖, . . . , ‖An‖, ‖B1‖, . . . , ‖Bm‖} ≤ R |‖Pj(A1, . . . , An, B1, . . . , Bm)‖ − ‖Pj(x1, . . . , xn, y1, . . . , ym)‖| ≤ ǫ, ∀ 1 ≤ j ≤ r. Remark 2.2. In the definition of norm-microstates space, we use the following assumption. If Pj(x1, . . . , xn, y1, . . . , ym) = α0 · IA + 1≤i1,...,is≤n+m αi1···iszi1 · · · zis where z1, . . . , zn+m denotes x1, . . . , xn, y1, . . . , ym and α0, αi1···is are in C, then Pj(A1, . . . , An, B1, . . . , Bm) = α0 · Ik + 1≤i1,...,is≤n+m αi1···isZi1 · · ·Zis where Z1, . . . , Zn+m denotes A1, . . . , An, B1, . . . , Bm and Ik is the identity matrix in Mk(C). Remark 2.3. 
In the original definition of norm-microstates space in [26], the parameter R was not introduced. Note the following observation: Let R > max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖}. When r is large enough so that {X1, . . . , Xn, Y1, . . . , Ym} ⊂ {P1, . . . , Pr} and 0 < ǫ < R−max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖}, we have (top) R (x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) = Γtop(x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) for all k ≥ 1, where Γ(top)(x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) is the norm-microstates space defined in [26]. Thus our definition agrees with the one in [26] for large R, r and small ǫ. In the later sections, we need to construct the ultraproduct of some matrix algebras, it will be convenient for us to include the parameter “R” in the definition of norm-microstate space. Define the norm-microstates space of x1, . . . , xn in the presence of y1, . . . , ym, denoted by (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr) as the projection of Γ (top) R (x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) onto the space (Ms.ak (C))n via the mapping (A1, . . . , An, B1, . . . , Bm) → (A1, . . . , An). 2.5. Voiculescu’s topological free entropy dimension (see [26]). Define (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω) to be the covering number of the set Γ (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr) by ω-‖ · ‖-balls in the metric space (Ms.ak (C))n equipped with operator norm. Definition 2.3. Define δtop(x1, . . . , xn : y1, . . . , ym;ω) = sup ǫ>0,r∈N lim sup log(ν∞(Γ (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω)) −k2 log ω . The topological entropy dimension of x1, . . . , xn in the presence of y1, . . . , ym is defined by δtop(x1, . . . , xn : y1, . . . , ym) = lim sup δtop(x1, . . . , xn : y1, . . . , ym;ω). Remark 2.4. Let M > max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖} be some positive number. 
By Remark 2.3, we know δtop(x1, . . . , xn : y1, . . . , ym) = lim sup ǫ>0,r∈N lim sup log(ν∞(Γ (top) M (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω)) −k2 logω . 2.6. C∗ algebra ultraproduct and von Neumann algebra ultraproduct. Suppose {Mkm(C)}∞m=1 is a sequence of complex matrix algebras where km goes to infinity when m approaches infinity. Let γ be a free ultrafilter in β(N)\N. We can introduce a unital C∗ algebra m=1Mkm(C) as follows: Mkm(C) = {(Ym)∞m=1 | ∀ m ≥ 1, Ym ∈ Mkm(C) and sup ‖Ym‖ <∞}. We can also introduce the norm closed two sided ideals I∞ and I2 as follows. I∞ = {(Ym)∞m=1 ∈ Mkm(C) | lim ‖Ym‖ = 0} I2 = {(Ym)∞m=1 ∈ Mkm(C) | lim ‖Ym‖2 = 0} Definition 2.4. The C∗ algebra ultraproduct of {Mkm(C)}∞m=1 along the ultrfilter γ, denoted m=1Mkm(C), is defined to be the quotient algebra of m=1Mkm(C) by the ideal I∞. The image of (Ym) m=1 ∈ m=1Mkm(C) in the quotient algebra is denoted by [(Ym)m]. Definition 2.5. The von Neumann algebra ultraproduct of {Mkm(C)}∞m=1 along the ultrfilter γ, also denoted by m=1Mkm(C) if no confusion arises, is defined to be the quotient algebra of m=1Mkm(C) by the ideal I2. The image of (Ym)∞m=1 ∈ m=1Mkm(C) in the quotient algebra is denoted by [(Ym)m]. Remark 2.5. The von Neumann algebra ultraproduct m=1Mkm(C) is a finite factor (see [16]). 2.7. Topological free entropy dimension of elements in a non-unital C∗ algebra. Topological free entropy dimension can also be defined for n-tuple of elements in a non-unital C∗ algebra. Suppose that A is a non-unital C∗-algebra. Let x1, . . . , xn, y1, . . . , ym be self-adjoint elements in A. Let C〈X1, . . . , Xn, Y1, . . . , Ym〉 ⊖ C be the noncommutative polynomials in the indeterminates X1, . . . , Xn, Y1, . . . , Ym without constant terms. Let {Pr}∞r=1 be the collection of all noncommutative polynomials in C〈X1, . . . , Xn, Y1, . . . , Ym〉 ⊖ C with rational complex coefficients. Then norm-mocrostate space (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . 
, Pr) can be defined similarly as in Section 2.4, and so the topological free entropy dimension δtop(x1, . . . , xn : y1, . . . , ym) can also be defined as in Section 2.5. In this paper, we focus on the case when A is a unital C∗ algebra.

3. Some technical lemmas

3.1. Suppose x is a self-adjoint element in a unital C∗ algebra A. Let σ(x) be the spectrum of x in A.

Theorem 3.1. Let R > ‖x‖. For any ω > 0, we have the following.
(1) There are some integer n ≥ 1 and distinct real numbers λ1, λ2, · · · , λn in σ(x) satisfying (i) |λi − λj| ≥ ω for all 1 ≤ i ≠ j ≤ n; and (ii) for any λ in σ(x), there is some λj with 1 ≤ j ≤ n such that |λ − λj| ≤ ω.
(2) There are some r0 > 0 and ε0 > 0 such that the following holds: when r > r0 and ε < ε0, for any A in $\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r)$, there are positive integers 1 ≤ k1, . . . , kn ≤ k with k1 + k2 + · · · + kn = k and some unitary matrix U in $M_k(\mathbb{C})$ satisfying
$$\big\| U^*AU - \mathrm{diag}(\lambda_1 I_{k_1}, \lambda_2 I_{k_2}, \ldots, \lambda_n I_{k_n}) \big\| \le 2\omega,$$
where $I_{k_j}$ is the $k_j \times k_j$ identity matrix in $M_{k_j}(\mathbb{C})$ for 1 ≤ j ≤ n.

Proof. The proof of part (1) is trivial. We will only prove part (2). Assume that the result in (2) does not hold. Then there is some ω > 0 so that the following holds: for all m ≥ 1, there are $k_m \ge 1$ and some $A_m$ in $\Gamma^{(top)}_R(x; k_m, \frac{1}{m}, P_1, \ldots, P_m)$ such that
$$\big\| U^*A_mU - \mathrm{diag}(\lambda_1 I_{s_1}, \lambda_2 I_{s_2}, \ldots, \lambda_n I_{s_n}) \big\| > 2\omega \qquad (∗)$$
for every 1 ≤ s1, . . . , sn ≤ $k_m$ with s1 + · · · + sn = $k_m$ and every unitary matrix U in $M_{k_m}(\mathbb{C})$. Let γ be a free ultrafilter in β(N)\N. Let $B = \prod_{m=1}^{\gamma} M_{k_m}(\mathbb{C})$ be the C∗ algebra ultraproduct of $\{M_{k_m}(\mathbb{C})\}_{m=1}^{\infty}$ along the ultrafilter γ, i.e., $\prod_{m=1}^{\gamma} M_{k_m}(\mathbb{C})$ is the quotient algebra of the C∗ algebra $\prod_{m=1}^{\infty} M_{k_m}(\mathbb{C})$ by $I_\infty$, the 0-ideal of the norm ‖ · ‖, where
$$I_\infty = \Big\{ (A_m)_{m=1}^{\infty} \in \prod_{m=1}^{\infty} M_{k_m}(\mathbb{C}) \ \Big|\ \lim_{m\to\gamma} \|A_m\| = 0 \Big\}.$$
Let $a = [(U^*A_mU)_{m=1}^{\infty}]$ be a self-adjoint element in B.
By mapping x to a, there is a unital ∗-isomorphism from the C∗ subalgebra generated by {IA, x} in A onto the C∗ subalgebra generated by {IB, a} in B. Thus σ(x) = σ(a). It is not hard to see that Hausdorff-dist(σ(U∗AmU), σ(a)) → 0 as m goes to γ, which contradicts with the results in part (1) and (∗). 3.2. In this subsection, we will use the following notation. (i) Let n,m be some positive integers with n ≥ m. (ii) Let δ, θ be some positive numbers. (iii) Let {λ1, λ2, . . . , λm} ∪ {λm+1, . . . , λn} be a family of real numbers such that |λi − λj | ≥ θ for all 1 ≤ i < j ≤ m. (iv) Let k be a positive integer such that k − (n−m) is divided by m. We let k − n +m (v) We let B = diag(λm+1, . . . , λn) be a diagonal matrix in Mn−m(C) and A = diag(λ1It, λ2It, . . . , λmIt, B) be a block-diagonal matrix in Mk(C), where It is the identity matrix in Mt(C). (vi) We let A be defined as above and Ω(A) = {U∗AU | U is in U(k)}. (vii) Assume that {eij}ki,j=1 is a canonical basis of Mk(C). We let V1 = span{eij | |λ[ i ]+1 − λ[ j ]+1| ≥ θ, with 1 ≤ i, j < mt}; and V2 = Mk(C)⊖ V1, where [ i ], or [ j ], denotes the largest integer ≤ [ i ], or [ j ] respectively. Lemma 3.1. We follow the notation as above. Suppose ‖U1AU∗1 − U2AU∗2‖2 ≤ δ for some unitary matrices U1 and U2 in U(k). Then the following hold. (1) There exists some S ∈ V2 such that ‖S‖2 ≤ 1 and ‖U1 − U2S‖2 ≤ (2) If n = m, then there is a unitary matrix W in V2 such that ‖U1 − U2W‖2 ≤ Proof. Assume that U∗2U1 = U11 U12 · · · U1,m+1 U21 U22 · · · U2,m+1 · · · · · · · · · · · · Um+1,1 Um+1,2 · · · Um+1,m+1 where Ui,j is a t × t matrix, Ui,m+1 a t × (n − m) matrix, Um+1,j a (n − m) × t matrix for 1 ≤ i, j ≤ m and Um+1,m+1 is a (n−m)× (n−m) matrix. (1) Let U11 0 · · · 0 U1,m+1 0 U22 · · · 0 U2,m+1 · · · · · · . . . 
· · · · · · 0 0 · · · Um,m Um,m+1 Um+1,1 Um+1,2 · · · Um+1,m Um+1,m+1 It is easy to see that S is in V2, ‖S‖2 ≤ 1 and δ2 ≥ ‖U1AU∗1 − U2AU∗2‖22 = Tr((U∗2U1A− AU∗2U1)(U∗2U1A− AU∗2U1)∗) 1≤i 6=j≤m Tr(|λi − λj|2UijU∗ij) 1≤i 6=j≤m Tr(UijU Hence ‖U1 − U2S‖22 = ‖U∗2U1 − S‖22 = 1≤i 6=j≤m Tr(UijU ij) ≤ It follows that ‖U1 − U2S‖2 ≤ (2) If n = m, then V2 = Mt(C)⊕Mt(C)⊕ · · · ⊕Mt(C). By the construction of S, we can assume S =WH is a polar decomposition of S in V2 for some unitary matrix W and positive matrix H in V2. Again by the construction of S, we know that ‖S‖ ≤ 1, whence ‖H‖ ≤ 1. From the proven fact that ‖U∗2U1 − S‖2 ≤ δθ , we know that ‖H2 − I‖2 = ‖S∗S − I‖2 ≤ ‖H − I‖2 ≤ ‖H2 − I‖2 ≤ It follows that ‖U1 − U2W‖2 ≤ ‖U1 − U2WH‖2 + ‖U2WH − U2W‖2 = ‖U1 − U2S‖2 + ‖H − I‖2 ≤ Lemma 3.2. We have the following results. (1) For every U ∈ U(k), let Σ(U) = {W ∈ U(k) | ∃ S ∈ V2 such that ‖S‖2 ≤ 1 and ‖W − US‖2 ≤ Then the volume of Σ(U) is bounded above by µ(Σ(U)) ≤ (C1 · 4δ/θ)k )2mt2+4m(n−m)t+2(n−m)2 where µ is the normalized Haar measure on the unitary group U(k) and C,C1 are some constants independent of k, δ, θ. (2) When n = m, for every U ∈ U(k), let Σ̃(U) = {W ∈ U(k) | ∃ a unitary matrix W1 in V2 such that ‖W − UW1‖2 ≤ µ(Σ̃(U)) ≤ (C1 · 8δ/θ)k Proof. (1) By computing the covering number of the set {S | S ∈ V2, such that ‖S‖2 ≤ 1} by δ/θ-‖ · ‖2-balls in Mk(C), we know ν2({S | S ∈ V2, ‖S‖2 ≤ 1}, )real dimension of of V2 )2mt2+4m(n−m)t+2(n−m)2 where C is a universal constant. Thus the covering number of the set Σ(U) by the 4δ/θ-‖·‖2-balls in Mk(C) is bounded by ν2(Σ(U), ) ≤ ν2({S | S ∈ V2, ‖S‖2 ≤ 1}, )2mt2+4m(n−m)t+2(n−m)2 But the ball of radius 4δ/θ in U(k) has the volume bounded by µ(ball of radius 4δ/θ) ≤ (C1 · 4δ/θ)k where C1 is a universal constant. Thus µ(Σ(U)) ≤ (C1 · 4δ/θ)k )2mt2+4m(n−m)t+2(n−m)2 (2) A slight adaption of the proof of part (1) gives us the proof of part (2). � Lemma 3.3. Let Ω(A) be defined as in (vi) at the beginning of this subsection. 
(1) The covering number of Ω(A) by the 1 δ-‖ · ‖2-balls in Mk(C) is bounded below by ν2(Ω(A), δ) ≥ (C1 · 4δ/θ)−k )−(2mt2+4m(n−m)t+2(n−m)2) (2) If n = m, then ν2(Ω(A), δ) ≥ (C1 · 8δ/θ)−k )−mt2 Proof. (1) For every U ∈ U(k), define Σ(U) = {W ∈ U(k) | ∃ S = S∗ ∈ V1, such that ‖S‖2 ≤ 1, ‖W − US‖2 ≤ By preceding lemma, we have µ(Σ(U)) ≤ (C1 · 4δ/θ)k )mt2+4m(n−m)t+2(n−m)2 A “parking” (or exhausting) argument will show the existence of a family of unitary elements {Ui}Ni=1 ⊂ U(k) such that N ≥ (C1 · 4δ/θ)−k )−(mt2+4m(n−m)t+2(n−m)2) Ui is not contained in ∪i−1j=1 Σ(Uj), ∀ i = 1, . . . , N. From the definition of each Σ(Uj), it follows that ‖Ui − UjS‖2 > , ∀ S ∈ V2, with ‖S‖2 ≤ 1, ∀1 ≤ j < i ≤ N. By Lemma 3.1, we know that ‖UiAU∗i − UjAU∗j ‖2 > δ, ∀1 ≤ j < i ≤ N, which implies that ν2(Ω(A), δ) ≥ N ≥ (C1 · 4δ/θ)−k )−(mt2+4m(n−m)t+2(n−m)2) (2) is similar as (1). � 3.3. We have following theorem. Theorem 3.2. Let n ≥ m, δ, θ > 0 and {λ1, λ2, . . . , λm} ∪ {λm+1, . . . , λn} be a family of real numbers such that |λi − λj| ≥ θ for all 1 ≤ i < j ≤ m. Let k be a positive integer such that k − (n−m) is divided by m and k − n +m B = diag(λm+1, . . . , λn) be a diagonal matrix in Mn−m(C) and A = diag(λ1It, λ2It, . . . , λmIt, B) be a block-diagonal matrix in Mk(C), where It is the identity matrix in Mt(C). We let Ω(A) = {U∗AU | U is in U(k)}. Then the covering number of Ω(A) by the 1 δ-‖ · ‖-balls in Mk(C) is bounded below by ν∞(Ω(A), δ) ≥ (C1 · 4δ/θ)−k )−(2mt2+4m(n−m)t+2(n−m)2) where C,C1 are some universal constants. When n = m, we have ν∞(Ω(A), δ) ≥ (C1 · 8δ/θ)−k )−mt2 Proof. Note that ν∞(Ω(A), ) ≥ ν2(Ω(A), ), ∀ δ > 0. The result follows directly from preceding lemma. � 3.4. The following proposition, whose proof is skipped, is an easy extension of Lemma 3.3. Proposition 3.1. Let m, k be some positive integers and θ, δ be some positive numbers. Let T1, T2, . . . , Tm+1 is a partition of the set {1, 2, . . . , k}, i.e. ∪m+1i=1 Ti = {1, 2, . . . 
, k} and Ti∩Tj = ∅ for 1 ≤ i 6= j ≤ m+ 1. Let λ1, . . . , λk be some real numbers such that, if 1 ≤ j1 6= j2 ≤ m then |λi1 − λi2 | > θ, ∀ i1 ∈ Tj1, i2 ∈ Tj2 . Let A = diag(λ1, λ2, . . . , λk) be a self-adjoint matrix in Mk(C) and Ω(A) = {U∗AU | U ∈ U(k)} be a subset of Mk(C). Let sj be the cardinality of the set Tj for 1 ≤ j ≤ m+ 1. Then the covering number of Ω(A) by the 1 δ-‖ · ‖2-balls in Mk(C) is bounded below by ν2(Ω(A), δ) ≥ (C1 · 4δ/θ)−k )−2s2 −···−2s2m+1−4(s1+···+sm)sm+1 where C,C1 are some universal constants. 4. Topological free entropy dimension of one variable Suppose x is a self-adjoint element of a unital C∗ algebra A. In this section, we are going to compute the topological entropy dimension of x. 4.1. Upperbound. Proposition 4.1. Suppose x in A is a self-adjoint element with the spectrum σ(x). Then δtop(x) ≤ 1− where n is the cardinality of σ(x). Here we assume that 1 Proof. By [26], we know that the inequality always holds when n is infinity. We need only to show that δtop(x) ≤ 1− when n <∞. Assume that λ1, . . . , λn are in the spectrum of x in A. Let R > ‖x‖. By Theorem 3.1, for every ω > 0, there are r0 > 0 and ǫ0 > 0 such that, for all r > r0, ǫ < ǫ0, A ∈ Γ(top)R (x; k, ǫ, P1, . . . , Pr), there are some 1 ≤ k1, . . . , kn ≤ k, with k1 + · · · + kn = k and a unitary matrix U in Mk(C) satisfying λ1Ik1 0 · · · 0 0 λ2Ik2 · · · 0 0 0 · · · λnIkn ≤ 2ω. (∗∗) Ω(k1, . . . , kn) = λ1Ik1 0 · · · 0 0 0 λ2Ik2 · · · 0 0 0 0 · · · λn−1Ikn−1 0 0 0 · · · 0 λnIkn U∗ | U is in Uk By Corollary 12 in [21] or Theorem 3 in [2], the covering number of Ω(k1, . . . , kn−1, kn) by ω-‖ · ‖-balls in Mk(C) is upperbounded by ν∞(Ω(k1, . . . , kn−1, kn), ω) ≤ where C2 is a constant which does not depend on k, k1, . . . , kn (may depend on n and ‖x‖). Let I be the set consisting of all these (k1, . . . , kn) in Zn such that 1 ≤ k1, . . . , kn ≤ k and k1 + · · ·+ kn = k. Then the cardinality of the set I is equal to (k − 1)! (n− 1)!(k − n)! . 
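The cardinality quoted above is the number of ways to write k as an ordered sum of n positive integers. A small Python check, offered as an illustration only; the helper names `compositions` and `count_formula` are hypothetical and introduced just for this sketch, and the brute-force enumeration is meant for small k and n.

```python
from itertools import product
from math import factorial


def compositions(k, n):
    """All n-tuples (k_1, ..., k_n) of integers >= 1 with k_1 + ... + k_n = k."""
    return [c for c in product(range(1, k + 1), repeat=n) if sum(c) == k]


def count_formula(k, n):
    """The closed form (k-1)! / ((n-1)! (k-n)!) quoted in the text."""
    return factorial(k - 1) // (factorial(n - 1) * factorial(k - n))


k, n = 6, 3
tuples = compositions(k, n)
print(len(tuples), count_formula(k, n))  # 10 10

# Cauchy-Schwarz: for every such tuple, k_1^2 + ... + k_n^2 >= k^2 / n.
print(all(n * sum(x * x for x in c) >= k * k for c in tuples))  # True
```

The count grows only polynomially in k for fixed n, so it does not affect the exponential rate that determines the dimension.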
Note that k2i ≥ k2/n for all 1 ≤ k1, . . . , kn ≤ k with k1 + · · ·+ kn = k; and by (∗∗) (top) R (x; k, ǫ, P1, . . . , Pr) is contained in 2ω-neighborhood of the set (k1,...,kn)∈I Ω(k1, . . . , kn). It follows that the covering number of the set (top) R (x; k, ǫ, P1, . . . , Pr) by 3ω-‖ · ‖-balls in Mk(C) is upperbounded by (top) R (x; k, ǫ, P1, . . . , Pr), 3ω) ≤ (k − 1)! (n− 1)!(k − n)! · )k2−k2/n δtop(x) ≤ lim sup lim sup (k−1)! (n−1)!(k−n)! )k2−k2/n −k2 log(3ω) = 1− 4.2. Lower-bound. We follow the notation from last subsection. Proposition 4.2. Suppose that x is a self-adjoint element with the finite spectrum σ(x) in A. δtop(x) ≥ 1− where n is the cardinality of the set σ(x). Proof. Suppose that λ1, . . . , λn are distinct spectrum of x. There is some positive number θ such that |λi − λj| > θ, ∀ 1 ≤ i 6= j ≤ n. Assume k = nt for some positive integer t. Let Ak = diag(λ1It, . . . , λnIt) be a diagonal matrix in Mk(C) where It is the t× t identity matrix. It is easy to see that, for all R > ‖x‖, r ≥ 1 and ǫ > 0, we have Ak ∈ Γ(top)R (x; k, ǫ, P1, . . . , Pr). For any ω > 0, applying Theorem 3.2 for n = m and δ = 1 ω, we have (top) R (x; k, ǫ, P1, . . . , Pr), ω) ≥ (C1 · 8δ/θ)−k )−mt2 = (16C1ω/θ) −k2 · )−mt2 Note that k = nt = mt and θ is some fixed number. A quick computation shows that δtop(x) ≥ 1− Proposition 4.3. Suppose that x is a self-adjoint element in A with infinite spectrum. Then δtop(x) ≥ 1. Proof. For any 0 < θ < 1, there are λ1, . . . , λm in the spectrum of x, σ(x), satisfying (i) |λi − λj | ≥ θ; and (ii) for any λ in σ(x), there is some λj with |λ − λj| ≤ θ. By functional calculus, for any R > ‖x‖, r ≥ 1 and ǫ > 0, there are some positive integer n ≥ m and real numbers λm+1, . . . , λn in σ(x) satisfying: for every t ≥ 1 the matrix A = diag(λ1It, λ2It, . . . , λmIt, λm+1, . . . , λn) is in (top) R (x; k, ǫ, P1, . . . , Pr), where we assume that k = mt + n−m. For any ω > 0, let δ = 1 ω. By Theorem 3.2, we know (top) R (x; k, ǫ, P1, . . . 
, Pr), ω) ≥ (C1 · 4δ/θ)−k )−(2mt2+4m(n−m)t+2(n−m)2) lim sup log(ν∞(Γ (top) R (x; k, ǫ, P1, . . . , Pr), ω)) −k2 logω ≥ log(4C1 )− log θ + 1 + log(2C) + log θ log ω Then, δtop(x) ≥ 1− When θ goes to 0, m goes to infinity as σ(x) has infinitely many elements. Therefore, δtop(x) ≥ 1. 4.3. Topological free entropy dimension in one variable case. By Proposition 4.1, Proposition 4.2 and Proposition 4.3, we have the following result. Theorem 4.1. Suppose x is a self-adjoint element in a unital C∗ algebra A. Then δtop(x) = 1− where n is the cardinality of the set σ(x) and σ(x) is the set of spectrum of x in A. Here we assume that 1 5. Topological free entropy dimension of n-tuple in unital C∗ algebras 5.1. An equivalent definition of topological free entropy dimension. Suppose that A is a unital C∗ algebra and x1, . . . , xn, y1, . . . , ym are self-adjoint elements in A. For every R, ǫ > 0 and positive integers r, k, let (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr) be Voiculescu’s norm-microstate space defined in section 2.4. Define (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω) to be the covering number of the set Γ (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr) by ω-‖·‖2-balls in the metric space (Ms.ak (C))n equipped with trace norm (see Definition 2.2). Definition 5.1. Define δ̃top(x1, . . . , xn : y1, . . . , ym;ω) = sup ǫ>0,r∈N lim sup log(ν2(Γ (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω)) −k2 logω δ̃top(x1, . . . , xn : y1, . . . , ym) = lim sup δ̃top(x1, . . . , xn : y1, . . . , ym;ω) The following proposition was pointed out by Voiculescu in [26]. For the sake of complete- ness, we also include a proof here. Proposition 5.1. Suppose that A is a unital C∗ algebra and x1, . . . , xn, y1, . . . , ym are self- adjoint elements in A. Then δ̃top(x1, . . . , xn : y1, . . . , ym) = δtop(x1, . . . , xn : y1, . . . , ym), where δtop(x1, . . . , xn : y1, . . . 
, ym) is the topological free entropy dimension of x1, . . . , xn in presence of y1, . . . , ym. Proof. This is an easy consequence of Lemma 1 in [21]. Let λ be the Lebesgue measure on (Ms.ak (C))n. Let, for every ω > 0, B∞(ω) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)‖ ≤ ω} B2(ω) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)‖2 ≤ ω} It follows from the results in [21] or Theorem 8 in [2] that, for some M1,M2 independent of k, ω such that λ(B∞(1)) ≤ λ(B∞(ω/4)) λ(B2(2 nω)) ≤ λ(B2(1)). (5.1.1) For every ω > 0 and any subset set K of (Ms.ak (C))n , let K(ω, ‖ · ‖) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)− (D1, . . . , Dn)‖ ≤ ω for some (D1, . . . , Dn) ∈ K} K(ω, ‖ · ‖2) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)− (D1, . . . , Dn)‖2 ≤ ω for some (D1, . . . , Dn) ∈ K} Note the following fact: ‖(A1, . . . , An)‖2 ≤ n‖(A1, . . . , An)‖, ∀ (A1, . . . , An) ∈ (Ms.ak (C))n. It follows from Lemma 1 in [21] that ν∞(K,ω) ≤ λ(K(ω, ‖ · ‖)) λ(B∞(ω/4)) nω, ‖ · ‖2)) λ(B2(2 ≤ ν2(K( nω, ‖ · ‖2), 2 Combining with the equalities (5.1.1), we get ν∞(K,ω) ≤ λ(K(ω, ‖ · ‖)) λ(B∞(ω/4)) λ(K(ω, ‖ · ‖)) λ(B∞(1)) nω, ‖ · ‖2)) λ(B∞(1)) nω, ‖ · ‖2)) λ(B2(1)) ≤ λ(K( nω, ‖ · ‖2)) λ(B2(2 ≤ ν2(K( nω, ‖·‖2), 2 nω) ≤ ν2(K, Therefore, we have ν2(K, nω) ≤ ν∞(K,ω) ≤ λ(B2(1)) λ(B∞(1)) ν2(K, It is a well-known fact (for example see Theorem 8 in [2]) that λ(B2(1)) λ(B∞(1)) ≤ Cnk23 for some universal constant C3 > 0. Hence ν2(K, nω) ≤ ν∞(K,ω) ≤ nM1C3 ν2(K, Let K be Γ (top) R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr). By the definitions of δ̃top and δtop, we δ̃top(x1, . . . , xn : y1, . . . , ym) = δtop(x1, . . . , xn : y1, . . . , ym). 5.2. Upper-bound of topological free entropy dimension in a unital C∗ algebra. Let us recall Voiculescu’s definition of free dimension capacity in [26]. Definition 5.2. Suppose that A is a unital C∗ algebra with a family of self-adjoint generators x1, . . . , xn. 
Suppose that TS(A) is the set consisting of all tracial states of A. If TS(A) ≠ ∅, define Voiculescu’s free dimension capacity κδ(x1, …, xn) of x1, …, xn as follows:

$$\kappa\delta(x_1, \ldots, x_n) = \sup_{\tau\in TS(A)} \delta_0(x_1, \ldots, x_n : \tau),$$

where δ0(x1, …, xn : τ) is Voiculescu’s (von Neumann algebra) free entropy dimension of x1, …, xn in 〈A, τ〉.

The relationship between the topological free entropy dimension of a unital C∗ algebra with a unique tracial state and its free dimension capacity is indicated by the following result.

Theorem 5.1. Suppose that A is a unital C∗ algebra with a family of self-adjoint generators x1, …, xn. Suppose that TS(A) is the set consisting of all tracial states of A. If TS(A) is a set with a single element, then δtop(x1, …, xn) ≤ κδ(x1, …, xn).

To prove the preceding theorem, we need the following lemma.

Sublemma 5.2.1. Suppose that A is a unital C∗ algebra with a family of self-adjoint generators x1, …, xn. Suppose that TS(A) ≠ ∅ is the set consisting of all tracial states of A. Let R > max{‖x1‖, …, ‖xn‖} be some positive number. Then for any m ≥ 1, there is some rm ∈ N such that

$$\Gamma^{(top)}_R(x_1, \ldots, x_n; k, \tfrac{1}{r_m}, P_1, \ldots, P_{r_m}) \subseteq \bigcup_{\tau\in TS(A)} \Gamma_R(x_1, \ldots, x_n; k, m, \tfrac{1}{m}; \tau), \quad \forall\, k \ge 1,$$

where $\Gamma_R(x_1, \ldots, x_n; k, m, \tfrac{1}{m}; \tau)$ is the microstate space of x1, …, xn with respect to τ (see [23]).

Proof of Sublemma 5.2.1: We will prove the result by contradiction. Suppose, to the contrary, that there is some m0 ≥ 1 so that the following holds: for any r ∈ N, there are some kr ≥ 1 and some

$$(A^{(r)}_1, A^{(r)}_2, \ldots, A^{(r)}_n) \in \Gamma^{(top)}_R(x_1, \ldots, x_n; k_r, \tfrac{1}{r}, P_1, \ldots, P_r)$$

satisfying

$$(A^{(r)}_1, A^{(r)}_2, \ldots, A^{(r)}_n) \notin \bigcup_{\tau\in TS(A)} \Gamma_R(x_1, \ldots, x_n; k_r, m_0, \tfrac{1}{m_0}; \tau). \qquad (5.2.1)$$

Let α be a free ultrafilter in β(N) \ N. Let $N = \prod_{r=1}^{\alpha} M_{k_r}(\mathbb{C})$ be the von Neumann algebra ultraproduct of $\{M_{k_r}(\mathbb{C})\}_{r=1}^{\infty}$ along the ultrafilter α, i.e. 
$\prod_{r=1}^{\alpha} M_{k_r}(\mathbb{C})$ is the quotient algebra of the C∗ algebra $\prod_{r=1}^{\infty} M_{k_r}(\mathbb{C})$ by I2, the 0-ideal of the trace τα, where $\tau_\alpha((A_r)_{r=1}^{\infty}) = \lim_{r\to\alpha} \frac{Tr(A_r)}{k_r}$. Let, for each 1 ≤ j ≤ n, $a_j = [(A^{(r)}_j)_{r=1}^{\infty}]$ be a self-adjoint element in N. By mapping xj to aj, there is a unital ∗-homomorphism ψ from the C∗ algebra A onto the C∗ subalgebra generated by {a1, …, an} in N. Let τ0 be the tracial state on A which is induced by τα on ψ(A), i.e. τ0(x) = τα(ψ(x)) for all x ∈ A. It follows that, when r is large enough,

$$(A^{(r)}_1, A^{(r)}_2, \ldots, A^{(r)}_n) \in \Gamma_R(x_1, \ldots, x_n; k_r, m_0, \tfrac{1}{m_0}; \tau_0),$$

which contradicts (5.2.1). This completes the proof. □

Proof of Theorem 5.1: Let R > max{‖x1‖, …, ‖xn‖}. Let τ be the unique trace of A. By Sublemma 5.2.1, for any m ≥ 1, there is r ∈ N such that

$$\Gamma^{(top)}_R(x_1, \ldots, x_n; k, \tfrac{1}{r}, P_1, \ldots, P_r) \subseteq \Gamma_R(x_1, \ldots, x_n; k, m, \tfrac{1}{m}; \tau), \quad \forall\, k \ge 1.$$

Therefore, for any 1 > ω > 0, we have

$$\nu_2(\Gamma^{(top)}_R(x_1, \ldots, x_n; k, \tfrac{1}{r}, P_1, \ldots, P_r), \omega) \le \nu_2(\Gamma_R(x_1, \ldots, x_n; k, m, \tfrac{1}{m}; \tau), \omega), \quad \forall\, k \ge 1.$$

Now it is easy to check that δ̃top(x1, …, xn) ≤ δ0(x1, …, xn : τ) = κδ(x1, …, xn). By Proposition 5.1, we know that δtop(x1, …, xn) ≤ κδ(x1, …, xn).

Remark 5.1. Combining Theorem 5.1 with the results in [11] or [14], we can compute an upper bound for the topological free entropy dimension of a large class of unital C∗ algebras. For example, δtop(x1, …, xn) ≤ 1 if x1, …, xn is a family of self-adjoint operators that generates an irrational rotation algebra A.

5.3. Lower-bound of topological free entropy dimension in a unital C∗ algebra. In this subsection, we assume that A is a finitely generated, infinite dimensional, unital simple C∗ algebra with a unique tracial state τ. Assume that x1, …, xn is a family of self-adjoint generators of A. Let H be the Hilbert space L2(A, τ). Without loss of generality, we may assume that A is faithfully represented on the Hilbert space H. Let M be the von Neumann algebra generated by A on H. 
It is not hard to see that M is a diffuse von Neumann algebra with a tracial state τ. For each positive integer m, there is a family of mutually orthogonal projections p1, …, pm in M such that τ(pj) = 1/m for 1 ≤ j ≤ m. Let

$$y_m = 1\cdot p_1 + 2\cdot p_2 + \cdots + m\cdot p_m = \sum_{j=1}^{m} j\cdot p_j \in M.$$

Let $\{P_r(x_1, \ldots, x_n)\}_{r=1}^{\infty}$ be defined as in section 2.3. Thus $\{P_r(x_1, \ldots, x_n)\}_{r=1}^{\infty}$ is dense in M with respect to the strong operator topology. Hence, for each m ≥ 1, there is some self-adjoint element Prm(x1, …, xn) in A such that ‖ym − Prm(x1, …, xn)‖2 ≤ …, where $\|a\|_2 = \sqrt{\tau(a^*a)}$ for all a ∈ M.

Lemma 5.1. Let A be a finitely generated, infinite dimensional, unital simple C∗ algebra with a unique tracial state τ. Assume that x1, …, xn is a family of self-adjoint generators of A. Let H, M be defined as above. For each m ≥ 1, let ym and Prm(x1, …, xn) be chosen as above. Then δtop(x1, …, xn) ≥ δtop(Prm(x1, …, xn) : x1, …, xn).

Proof. Let R > max{‖Prm(x1, …, xn)‖, ‖x1‖, …, ‖xn‖}. There exists a positive constant D > 1 such that

$$\|P_{r_m}(A_1, \ldots, A_n) - P_{r_m}(B_1, \ldots, B_n)\| \le D\,\|(A_1, \ldots, A_n) - (B_1, \ldots, B_n)\|$$

for all A1, …, An, B1, …, Bn in Mk(C) satisfying 0 ≤ ‖A1‖, …, ‖An‖, ‖B1‖, …, ‖Bn‖ ≤ R. Then it is not hard to verify that, for ω > 0,

$$\nu_\infty(\Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n) : x_1, \ldots, x_n; k, \epsilon, P_1, \ldots, P_r), \omega) \le \nu_\infty(\Gamma^{(top)}_R(x_1, \ldots, x_n; k, \epsilon, P_1, \ldots, P_r), \dots),$$

for each r ≥ rm and ε < ω/4. By the definition of δtop and Remark 2.3, we have δtop(Prm(x1, …, xn) : x1, …, xn) ≤ δtop(x1, …, xn).

Definition 5.3. Suppose A is a unital C∗ algebra and x1, …, xn is a family of self-adjoint elements of A that generates A as a C∗ algebra. If for any R > max{‖x1‖, …, ‖xn‖, ‖y1‖, …, ‖ym‖}, r > 0, ε > 0, there is a sequence of positive integers k1 < k2 < ··· such that

$$\Gamma^{(top)}_R(x_1, \ldots, x_n : y_1, \ldots, y_m; k_s, \epsilon, P_1, \ldots, P_r) \ne \emptyset, \quad \forall\, s \ge 1,$$

then A is said to have the approximation property.

Lemma 5.2. 
Let A be a finitely generated, infinite dimensional, unital simple C∗ algebra with a unique tracial state τ. Assume that A has the approximation property. Assume that x1, …, xn is a family of self-adjoint generators of A. Let H, M be defined as above. Let m be a positive integer. Let ym and Prm(x1, …, xn) be chosen as above. Let R > max{‖Prm(x1, …, xn)‖, ‖x1‖, …, ‖xn‖}. Then there is some positive integer r > rm so that the following holds: for all k ≥ 1, if

$$(B, A_1, \ldots, A_n) \in \Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n), x_1, \ldots, x_n; k, \tfrac{1}{r}, P_1, \ldots, P_r),$$

then there are some 1 ≤ k1, …, km ≤ k with 1m − … for each 1 ≤ j ≤ m and k1 + ··· + km = k, and a unitary matrix U in U(k) satisfying

$$\left\| B - U \begin{pmatrix} 1\cdot I_{k_1} & 0 & \cdots & 0 \\ 0 & 2\cdot I_{k_2} & \cdots & 0 \\ \cdots & \cdots & \ddots & \cdots \\ 0 & 0 & \cdots & m\cdot I_{k_m} \end{pmatrix} U^* \right\|_2 \le \dots$$

Proof. We will prove the result by contradiction. Assume, to the contrary, that for all r ≥ rm there are some kr ≥ 1 and some

$$(B^{(r)}, A^{(r)}_1, \ldots, A^{(r)}_n) \in \Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n), x_1, \ldots, x_n; k_r, \tfrac{1}{r}, P_1, \ldots, P_r)$$

satisfying

$$\left\| B^{(r)} - U \begin{pmatrix} 1\cdot I_{s_1} & 0 & \cdots & 0 \\ 0 & 2\cdot I_{s_2} & \cdots & 0 \\ \cdots & \cdots & \ddots & \cdots \\ 0 & 0 & \cdots & m\cdot I_{s_m} \end{pmatrix} U^* \right\|_2 > \dots, \qquad (5.3.1)$$

for all 1 ≤ s1, …, sm ≤ kr with 1m − … for each 1 ≤ j ≤ m and s1 + ··· + sm = kr, and all unitary matrices U in U(kr). Let α be a free ultrafilter in β(N) \ N. Let $N = \prod_{r=1}^{\alpha} M_{k_r}(\mathbb{C})$ be the von Neumann algebra ultraproduct of $\{M_{k_r}(\mathbb{C})\}_{r=1}^{\infty}$ along the ultrafilter α, i.e. $\prod_{r=1}^{\alpha} M_{k_r}(\mathbb{C})$ is the quotient of the C∗ algebra $\prod_{r=1}^{\infty} M_{k_r}(\mathbb{C})$ by I2, the 0-ideal of the trace τα, where $\tau_\alpha((A_r)_{r=1}^{\infty}) = \lim_{r\to\alpha} \frac{Tr(A_r)}{k_r}$. Let, for each 1 ≤ j ≤ n, $a_j = [(A^{(r)}_j)_{r=1}^{\infty}]$ be a self-adjoint element in N. By mapping xj to aj, there is a unital ∗-homomorphism ψ from the C∗ algebra A onto the C∗ subalgebra generated by {a1, …, an} in N. Since A is a simple C∗ algebra and ψ(IA) = IN, ψ is in fact a ∗-isomorphism. Since A has a unique trace τ, ψ induces a ∗-isomorphism (still denoted by ψ) from M onto the von Neumann subalgebra generated by a1, …, an in N. Therefore, ‖ym − Prm(x1, . . . 
, xn)‖2 = ‖ψ(ym) − Prm(a1, …, an)‖2,τα ≤ …. This contradicts the definition of ym and inequality (5.3.1). □

The following lemma is well-known (for example, see Lemma 4.1 in [23]).

Lemma 5.3. Suppose A and B are self-adjoint matrices in $M_k^{s.a.}(\mathbb{C})$ with eigenvalues λ1 ≤ λ2 ≤ ··· ≤ λk and µ1 ≤ µ2 ≤ ··· ≤ µk, respectively. Then

$$\sum_{j=1}^{k} |\lambda_j - \mu_j|^2 \le Tr((A - UBU^*)^2),$$

where U is any unitary matrix in U(k).

Lemma 5.4. Let r, m be positive integers with 4 < m < r. Suppose k1, …, km is a family of positive integers such that … for all 1 ≤ j ≤ m and k1 + ··· + km = k. If A is a self-adjoint matrix in Mk(C) such that, for some unitary matrix U in U(k),

$$\left\| A - U \begin{pmatrix} 1\cdot I_{k_1} & 0 & \cdots & 0 \\ 0 & 2\cdot I_{k_2} & \cdots & 0 \\ \cdots & \cdots & \ddots & \cdots \\ 0 & 0 & \cdots & m\cdot I_{k_m} \end{pmatrix} U^* \right\|_2 \le \dots,$$

then, for any ω > 0, we have $\nu_2(\Omega(A), \omega) \ge (8C_1\omega)^{-k^2}\dots$ for some constants C1, C > 1 independent of k, ω, where Ω(A) = {W∗AW | W ∈ U(k)}.

Proof. Suppose that λ1 ≤ λ2 ≤ ··· ≤ λk are the eigenvalues of A. For each 1 ≤ j ≤ m, let

$$T_j = \Big\{ i \in \mathbb{N} \ \Big|\ \Big(\sum_{t=0}^{j-1} k_t\Big) + 1 \le i \le \sum_{t=0}^{j} k_t \ \text{ and } \ |\lambda_i - j| \le \dots \Big\},$$

$$\hat T_j = \Big\{ \Big(\sum_{t=0}^{j-1} k_t\Big) + 1,\ \Big(\sum_{t=0}^{j-1} k_t\Big) + 2,\ \cdots,\ \sum_{t=0}^{j} k_t \Big\} \setminus T_j,$$

where we assume that k0 = 0. Let B = diag(1 · Ik1, ···, m · Ikm) be a diagonal matrix in Mk(C). By Lemma 5.3, we have

$$\dots \ge Tr((A - UBU^*)^2) \ge \sum_{i\in\hat T_j} |\lambda_i - j|^2 \ge \dots\ \mathrm{card}(\hat T_j),$$

where card(T̂j) is the cardinality of the set T̂j. Thus card(T̂j) ≤ …, for 1 ≤ j ≤ m. Let sj = card(Tj) for 1 ≤ j ≤ m, whence … ≥ kj ≥ sj = kj − card(T̂j) ≥ kj − …, for all 1 ≤ j ≤ m. Let $T_{m+1} = \{1, 2, \ldots, k\} \setminus (\bigcup_{j=1}^{m} T_j)$ and let sm+1 be the cardinality of the set Tm+1. Thus $s_{m+1} = k - s_1 - \cdots - s_m = \sum_{j=1}^{m} \mathrm{card}(\hat T_j) \le \dots$ It is not hard to see that T1, …, Tm+1 is a partition of the set {1, 2, …, k}. Moreover, if 1 ≤ j1 ≠ j2 ≤ m, then for any i1 ∈ Tj1 and i2 ∈ Tj2 we have |λi1 − λi2| ≥ |j2 − j1| − |λi2 − j2| − |λi1 − j1| ≥ 1 − … Applying Proposition 3.1 for such T1, . . . 
, Tm, Tm+1, θ = 1/2 and ω = δ/2, we have ν2(Ω(A), ω) ≥ (8C1ω)−k )−2s2 −···−2s2m−2s −4(s1+···+sm)sm+1 ≥ (8C1ω)−k )−2(k2 +···+k2m+( )2+2k· 4k ≥ (8C1ω)−k )−2(( k )2+···+( k )2+ 16k ≥ (8C1ω)−k for some constants C, C1 > 1 independent of k, ω.

Lemma 5.5. Let A be a finitely generated, infinite dimensional, simple unital C∗ algebra with a unique tracial state τ. Assume that A has the approximation property. Assume that x1, …, xn is a family of self-adjoint generators of A. Let H, M be defined as above. Let m be a positive integer. Let ym and Prm(x1, …, xn) be chosen as above. Let R > max{‖Prm(x1, …, xn)‖, ‖x1‖, …, ‖xn‖}. When r is large enough and ε is small enough, for any ω > 0, we have

$$\nu_2(\Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n) : x_1, \ldots, x_n; k, \epsilon, P_1, \ldots, P_r), \omega) \ge (8C_1\omega)^{-k^2}\dots$$

Proof. By Lemma 5.2, when r is large enough and ε is small enough, the following holds: for all k ≥ 1, if $(B, A_1, \ldots, A_n) \in \Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n), x_1, \ldots, x_n; k, \epsilon, P_1, \ldots, P_r)$, then there are some 1 ≤ k1, …, km ≤ k with 1m − … for each 1 ≤ j ≤ m and k1 + ··· + km = k, and a unitary matrix U in U(k) satisfying

$$\left\| B - U \begin{pmatrix} 1\cdot I_{k_1} & 0 & \cdots & 0 \\ 0 & 2\cdot I_{k_2} & \cdots & 0 \\ \cdots & \cdots & \ddots & \cdots \\ 0 & 0 & \cdots & m\cdot I_{k_m} \end{pmatrix} U^* \right\|_2 \le \dots$$

Combining with Lemma 5.4, we know that if $B \in \Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n) : x_1, \ldots, x_n; k, \epsilon, P_1, \ldots, P_r)$ then, for any ω > 0, $\nu_2(\Omega(B), \omega) \ge (8C_1\omega)^{-k^2}\dots$, where Ω(B) = {W∗BW | W ∈ U(k)}. Note that $\Omega(B) \subset \Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n) : x_1, \ldots, x_n; k, r, \epsilon)$. It follows that, for any ω > 0, ν2(Γ(top)R(Prm(x1, …, xn) : x1, …, xn; k, r, ε), ω) ≥ (8C1ω)−k −56k2

Now we have the following result.

Theorem 5.2. Let A be a finitely generated, infinite dimensional, simple unital C∗ algebra with a unique tracial state τ. Assume that x1, …, xn is a family of self-adjoint generators of A. If A has the approximation property, then δtop(x1, …, xn) ≥ 1.

Proof. Let H be the Hilbert space L2(A, τ). 
Without loss of generality, we may assume that A is faithfully represented on the Hilbert space H. Let M be the von Neumann algebra generated by A on H. It is not hard to see that M is a diffuse von Neumann algebra with a tracial state τ. For each positive integer m, there is a family of mutually orthogonal projections p1, …, pm in M such that τ(pj) = 1/m for 1 ≤ j ≤ m. Let

$$y_m = 1\cdot p_1 + 2\cdot p_2 + \cdots + m\cdot p_m = \sum_{j=1}^{m} j\cdot p_j.$$

Let $\{P_r(x_1, \ldots, x_n)\}_{r=1}^{\infty}$ be defined as in section 2.3. Thus $\{P_r(x_1, \ldots, x_n)\}_{r=1}^{\infty}$ is dense in M with respect to the strong operator topology. Hence, for each m ≥ 1, there is some self-adjoint element Prm(x1, …, xn) in A such that ‖ym − Prm(x1, …, xn)‖2 ≤ … By Lemma 5.5, for any ω > 0, when r is large enough and ε is small enough, we have, for some constants C1, C > 1 independent of k, ω,

$$\nu_2(\Gamma^{(top)}_R(P_{r_m}(x_1, \ldots, x_n) : x_1, \ldots, x_n; k, \epsilon, P_1, \ldots, P_r), \omega) \ge (8C_1\omega)^{-k^2}\dots$$

Therefore, δ̃top(Prm(x1, …, xn) : x1, …, xn) ≥ 1 − … By Proposition 5.1, we get δtop(Prm(x1, …, xn) : x1, …, xn) ≥ 1 − … By Lemma 5.1, δtop(x1, …, xn) ≥ 1 − … Since m is an arbitrary positive integer, we obtain δtop(x1, …, xn) ≥ 1.

5.4. Values of topological free entropy dimensions in some unital C∗ algebras. In this subsection, we compute the values of topological free entropy dimensions in some unital C∗ algebras by using the results of the preceding subsection.

Theorem 5.3. Let Aθ be an irrational rotation C∗ algebra. Then δtop(x1, …, xn) = 1, where x1, …, xn is a family of self-adjoint operators that generates Aθ.

Proof. Note that Aθ is an infinite dimensional, unital simple C∗ algebra with a unique tracial state τ. By [24] or [11] and Theorem 5.1, we know that δtop(x1, …, xn) ≤ 1. It follows from [18] that Aθ has the approximation property. Therefore δtop(x1, …, xn) ≥ 1. Hence δtop(x1, …, xn) = 1.

Theorem 5.4. Let A be a UHF algebra (uniformly hyperfinite C∗ algebra). Then δtop(x1, . . . 
, xn) = 1 where x1, …, xn is a family of self-adjoint operators that generates A.

Proof. By [17], we know that A is generated by two self-adjoint elements. It is not hard to see that A is an infinite dimensional, unital simple C∗ algebra with a unique tracial state τ. By [24] or [11] and Theorem 5.1, we know that δtop(x1, …, xn) ≤ 1. It is easy to check that A has the approximation property. Therefore δtop(x1, …, xn) ≥ 1. Hence δtop(x1, …, xn) = 1.

Recall that for any sequence $(A_m)_{m=1}^{\infty}$ of C∗ algebras, we can introduce two C∗ algebras

$$\prod_m A_m = \{(a_m)_{m=1}^{\infty} \mid a_m \in A_m,\ \sup_m \|a_m\| < \infty\},$$

$$\sum_m A_m = \{(a_m)_{m=1}^{\infty} \mid a_m \in A_m,\ \lim_{m\to\infty} \|a_m\| = 0\}.$$

The norm in the quotient C∗ algebra $\prod_m A_m / \sum_m A_m$ is given by

$$\|\rho((a_m)_{m=1}^{\infty})\| = \limsup_{m\to\infty} \|a_m\|,$$

where ρ is the quotient map from $\prod_m A_m$ onto $\prod_m A_m / \sum_m A_m$. If A is an exact C∗ algebra, then the sequence

$$0 \to A \otimes_{min} \sum_m M_m(\mathbb{C}) \to A \otimes_{min} \prod_m M_m(\mathbb{C}) \to A \otimes_{min} \Big(\prod_m M_m(\mathbb{C}) \Big/ \sum_m M_m(\mathbb{C})\Big) \to 0$$

is exact. Therefore, we have the following natural identification:

$$A \otimes_{min} \Big(\prod_m M_m(\mathbb{C}) \Big/ \sum_m M_m(\mathbb{C})\Big) = \Big(A \otimes_{min} \prod_m M_m(\mathbb{C})\Big) \Big/ \Big(A \otimes_{min} \sum_m M_m(\mathbb{C})\Big).$$

On the other hand, we have the natural embedding $A \otimes_{min} \prod_m M_m(\mathbb{C}) \subseteq \prod_m M_m(A)$ and the identification $A \otimes_{min} \sum_m M_m(\mathbb{C}) = \sum_m M_m(A)$. Thus we have, for any exact C∗ algebra A, a natural embedding

$$\psi : A \otimes_{min} \Big(\prod_m M_m(\mathbb{C}) \Big/ \sum_m M_m(\mathbb{C})\Big) \subseteq \prod_m M_m(A) \Big/ \sum_m M_m(A).$$

Lemma 5.6. Suppose that A and B are unital C∗ algebras and ρ is a unital embedding $\rho : A \to \prod_m M_m(B) / \sum_m M_m(B)$. Suppose that x1, …, xn is a family of elements in A. Suppose r is a positive integer and $\{P_j(x_1, \ldots, x_n)\}_{j=1}^{r}$ is a family of noncommutative polynomials of x1, …, xn. Then there are some k ∈ N and $a^{(k)}_1, \ldots, a^{(k)}_n$ in Mk(B) so that

$$\big|\, \|P_j(a^{(k)}_1, \ldots, a^{(k)}_n)\| - \|P_j(x_1, \ldots, x_n)\| \,\big| \le \dots, \quad \forall\, 1 \le j \le r.$$

Proof. We may assume that $\rho(x_i) = [(x^{(m)}_i)_m] \in \prod_m M_m(B) / \sum_m M_m(B)$ for all 1 ≤ i ≤ n. By the definition of $\prod_m M_m(B) / \sum_m M_m(B)$, there are some positive integers m1 ≤ m2 such that

$$\Big|\, \Big(\sup_{m_1 \le l \le m_2} \|P_j(x^{(l)}_1, \ldots, x^{(l)}_n)\|\Big) - \|P_j(x_1, \ldots, x_n)\| \,\Big| \le \dots, \quad \forall\, 1 \le j \le r.$$

Let $k = \sum_{j=m_1}^{m_2} j$ and $a^{(k)}_i = \oplus_{l=m_1}^{m_2} x^{(l)}_i \in M_k(B)$ for all 1 ≤ i ≤ n. Then, it is not hard to check that |‖Pj(a(k)1 , . . . 
, a(k)n )‖ − ‖Pj(x1, . . . , xn)‖| ≤ , ∀ 1 ≤ j ≤ r.

Theorem 5.5. Let p ≥ 2 be a positive integer and Fp be the free group on p generators. Let C∗red(Fp) ⊗min C∗red(Fp) be the minimal tensor product of two reduced C∗ algebras of the free group Fp. Then δtop(x1, …, xn) = 1, where x1, …, xn is any family of self-adjoint generators of C∗red(Fp) ⊗min C∗red(Fp).

Proof. Note that C∗red(Fp) ⊗min C∗red(Fp) is an infinite dimensional, unital simple C∗ algebra with a unique tracial state. By the result from [5] or [11], Theorem 5.1 and Theorem 5.2, to show δtop(x1, …, xn) = 1 we need only show that C∗red(Fp) ⊗min C∗red(Fp) has the approximation property. Therefore, it suffices to show the following: Let R > max{‖x1‖, …, ‖xn‖}. For any r ≥ 1, there is some k ∈ N so that

$$\Gamma^{(top)}_R(x_1, \ldots, x_n; k, \tfrac{1}{r}, P_1, \ldots, P_r) \ne \emptyset.$$

By the result from [9], we know there is a unital embedding

$$\varphi_1 : C^*_{red}(F_p) \to \prod_m M_m(\mathbb{C}) \Big/ \sum_m M_m(\mathbb{C}),$$

which induces a unital embedding

$$\varphi_2 : C^*_{red}(F_p) \otimes_{min} C^*_{red}(F_p) \to C^*_{red}(F_p) \otimes_{min} \Big(\prod_m M_m(\mathbb{C}) \Big/ \sum_m M_m(\mathbb{C})\Big).$$

Note that C∗red(Fp) is an exact C∗ algebra. From the explanation preceding the theorem it follows that there is a unital embedding

$$\varphi_3 : C^*_{red}(F_p) \otimes_{min} C^*_{red}(F_p) \to \prod_m M_m(C^*_{red}(F_p)) \Big/ \sum_m M_m(C^*_{red}(F_p)).$$

By Lemma 5.6, for a family of elements x1, …, xn in C∗red(Fp) ⊗min C∗red(Fp) and r ≥ 1, there are some m ∈ N and some $a^{(m)}_1, \ldots, a^{(m)}_n$ in Mm(C∗red(Fp)) so that $\max\{\|a^{(m)}_1\|, \ldots, \|a^{(m)}_n\|\} < R$ and

$$\big|\, \|P_j(a^{(m)}_1, \ldots, a^{(m)}_n)\| - \|P_j(x_1, \ldots, x_n)\| \,\big| \le \dots, \quad \forall\, 0 \le j \le r.$$

On the other hand, by the existence of the embedding $\varphi_1 : C^*_{red}(F_p) \to \prod_{m'} M_{m'}(\mathbb{C}) / \sum_{m'} M_{m'}(\mathbb{C})$, it follows that there is a unital embedding

$$\varphi_4 : M_m(C^*_{red}(F_p)) = M_m(\mathbb{C}) \otimes_{min} C^*_{red}(F_p) \to M_m(\mathbb{C}) \otimes_{min} \Big(\prod_{m'} M_{m'}(\mathbb{C}) \Big/ \sum_{m'} M_{m'}(\mathbb{C})\Big),$$

and

$$M_m(\mathbb{C}) \otimes_{min} \Big(\prod_{m'} M_{m'}(\mathbb{C}) \Big/ \sum_{m'} M_{m'}(\mathbb{C})\Big) = \prod_{m'} M_{m'm}(\mathbb{C}) \Big/ \sum_{m'} M_{m'm}(\mathbb{C}).$$

Hence for such $a^{(m)}_1, \ldots, a^{(m)}_n$ in Mm(C∗red(Fp)) and r ≥ 1, by Lemma 5.6, there are some k ∈ N and A1, …, An in Mk(C) so that max{‖A1‖, …, ‖An‖} < R and

$$\big|\, \|P_j(a^{(m)}_1, \ldots, a^{(m)}_n)\| - \|P_j(A_1, \ldots, A_n)\| \,\big| \le \dots, \quad \forall\, 0 \le j \le r.$$

Altogether, we have |‖Pj(x1, . . . 
, xn)‖ − ‖Pj(A1, . . . , An)‖| ≤ , ∀ 0 ≤ j ≤ r, which implies that C∗red(Fp) ⊗min C∗red(Fp) has the approximation property. Hence δtop(x1, …, xn) = 1 for any family of self-adjoint elements x1, …, xn that generates C∗red(Fp) ⊗min C∗red(Fp). □

Theorem 5.6. Suppose that K is the C∗ algebra consisting of all compact operators on a separable Hilbert space H. Suppose A = C ⊕ K is the unitization of K. If x1, …, xn is a family of self-adjoint elements that generate A as a C∗ algebra, then δtop(x1, …, xn) = 0.

Proof. By [17], we know that the unital C∗ algebra A is generated by two self-adjoint elements in A. Note that A has a unique trace τ, which is defined by τ((λ, x)) = λ for all (λ, x) ∈ A. By Theorem 5.1, it is not hard to see that δtop(x1, …, xn) = 0, where x1, …, xn is a family of self-adjoint generators of A.

6. Topological free orbit dimension of C∗ algebras

Assume that A is a unital C∗-algebra. Let x1, …, xn, y1, …, ym be self-adjoint elements in A. Let C〈X1, …, Xn, Y1, …, Ym〉 be the noncommutative polynomials in the indeterminates X1, …, Xn, Y1, …, Ym. Let $\{P_r\}_{r=1}^{\infty}$ be the collection of all noncommutative polynomials in C〈X1, …, Xn, Y1, …, Ym〉 with rational coefficients.

6.1. Unitary orbits of balls in $M_k(\mathbb{C})^n$. We let Mk(C) be the k × k full matrix algebra with entries in C, and U(k) be the group of all unitary matrices in Mk(C). Let $M_k(\mathbb{C})^n$ denote the direct sum of n copies of Mk(C). Let $M_k^{s.a.}(\mathbb{C})$ be the subset of Mk(C) consisting of all self-adjoint matrices of Mk(C). Let $(M_k^{s.a.}(\mathbb{C}))^n$ be the direct sum of n copies of $M_k^{s.a.}(\mathbb{C})$. For every ω > 0, we define the ω-orbit-‖·‖-ball U(B1, …, Bn; ω) centered at (B1, …, Bn) in $M_k(\mathbb{C})^n$ to be the subset of $M_k(\mathbb{C})^n$ consisting of all (A1, …, An) in $M_k(\mathbb{C})^n$ such that there exists some unitary matrix W in U(k) satisfying

$$\|(A_1, \ldots, A_n) - (WB_1W^*, \ldots, WB_nW^*)\| < \omega.$$

6.2. Norm-microstate space. 
For all integers r, k ≥ 1, real numbers R, ε > 0 and noncommutative polynomials P1, …, Pr, we let $\Gamma^{(top)}_R(x_1, \ldots, x_n : y_1, \ldots, y_m; k, \epsilon, P_1, \ldots, P_r)$ be as defined in section 2.4.

6.3. Topological free orbit dimension.

Definition 6.1. For ω > 0, we define the covering number

$$o_\infty(\Gamma^{(top)}_R(x_1, \ldots, x_n : y_1, \ldots, y_p; k, \epsilon, P_1, \ldots, P_r), \omega)$$

to be the minimal number of ω-orbit-‖·‖-balls that cover $\Gamma^{(top)}_R(x_1, \ldots, x_n : y_1, \ldots, y_p; k, \epsilon, P_1, \ldots, P_r)$ with the centers of these ω-orbit-‖·‖-balls in $\Gamma^{(top)}_R(x_1, \ldots, x_n : y_1, \ldots, y_p; k, \epsilon, P_1, \ldots, P_r)$. For each function f : N × N × R+ → R, we define

$$k_f(x_1, \ldots, x_n : y_1, \ldots, y_p; \omega, R) = \inf_{r\in\mathbb{N},\, \epsilon>0}\ \limsup_{k\to\infty} f(o_\infty(\Gamma^{(top)}_R(x_1, \ldots, x_n : y_1, \ldots, y_p; k, \epsilon, P_1, \ldots, P_r), \omega), k, \omega),$$

$$k_f(x_1, \ldots, x_n : y_1, \ldots, y_p; \omega) = \sup_{R>0} k_f(x_1, \ldots, x_n : y_1, \ldots, y_p; \omega, R),$$

$$k_f(x_1, \ldots, x_n : y_1, \ldots, y_p) = \limsup_{\omega\to 0^+} k_f(x_1, \ldots, x_n : y_1, \ldots, y_p; \omega),$$

where kf(x1, …, xn : y1, …, yp) is called the topological f(·)-free-orbit-dimension of x1, …, xn in the presence of y1, …, yp.

6.4. Topological free entropy dimension and topological free orbit dimension. The following result follows directly from the definitions of topological free entropy dimension and topological free orbit dimension of an n-tuple of self-adjoint elements in a C∗ algebra.

Theorem 6.1. Suppose that A is a unital C∗ algebra and x1, …, xn is a family of self-adjoint elements of A. Let f : N × N × R+ → R be defined by $f(s, k, \omega) = \frac{\log s}{-k^2\log\omega}$ for s, k ∈ N, ω > 0. Then δtop(x1, …, xn) ≤ kf(x1, …, xn) + 1.

7. Topological free orbit dimension of one variable

We recall the packing number of a set in a metric space as follows.

Definition 7.1. Suppose that X is a metric space with a metric d.
(i) The packing number of a set K by ω-balls in X, denoted by P(K, ω), is the maximal cardinality of the subsets F of K such that, for all a, b in F, either a = b or d(a, b) ≥ ω. 
(ii) The packing dimension of the set K in X, denoted by d(K), is defined by

$$d(K) = \limsup_{\omega\to 0^+} \frac{\log(P(K, \omega))}{-\log\omega}.$$

7.1. Upper-bound of the topological free orbit dimension of one variable. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A and σ(x) is the spectrum of x in A. For any ω > 0, let m = P(σ(x), ω) be the packing number of σ(x) in R. Thus there exists a family of elements λ1, …, λm in σ(x) such that (i) |λi − λj| ≥ ω for all 1 ≤ i ≠ j ≤ m; and (ii) for any λ in σ(x), there is some λj with 1 ≤ j ≤ m satisfying |λ − λj| ≤ ω.

Lemma 7.1. For any given R > ‖x‖, when r is large enough and ε is small enough, we have

$$\limsup_{k\to\infty} \frac{\log o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), 3\omega)}{\log k} \le m - 1.$$

Proof. By Theorem 3.1, there exist some r0 ≥ 1 and ε0 > 0 such that the following holds: when r > r0 and ε < ε0, for any A in $\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r)$, there are positive integers 1 ≤ k1, …, km ≤ k with k1 + ··· + km = k and some unitary matrix U in Mk(C) satisfying

$$\left\| U^*AU - \begin{pmatrix} \lambda_1 I_{k_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{k_2} & \cdots & 0 \\ \cdots & \cdots & \ddots & \cdots \\ 0 & 0 & \cdots & \lambda_m I_{k_m} \end{pmatrix} \right\| \le 2\omega,$$

where Ikj is the kj × kj identity matrix for 1 ≤ j ≤ m. Let

$$\Omega(k_1, \ldots, k_m) = \left\{ U^* \begin{pmatrix} \lambda_1 I_{k_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{k_2} & \cdots & 0 \\ \cdots & \cdots & \ddots & \cdots \\ 0 & 0 & \cdots & \lambda_m I_{k_m} \end{pmatrix} U \ \middle|\ U \text{ is in } U(k) \right\}.$$

Let J be the set consisting of all (k1, …, km) ∈ N^m with k1 + ··· + km = k. Then the cardinality of the set J is equal to $\frac{(k-1)!}{(m-1)!\,(k-m)!}$. Moreover, $\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r)$ is contained in the 2ω-neighborhood of the set $\bigcup_{(k_1,\ldots,k_m)\in J} \Omega(k_1, \ldots, k_m)$. It follows that

$$o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), 3\omega) \le o_\infty\Big(\bigcup_{(k_1,\ldots,k_m)\in J} \Omega(k_1, \ldots, k_m), \omega\Big) \le |J| = \frac{(k-1)!}{(m-1)!\,(k-m)!}.$$

Therefore,

$$\limsup_{k\to\infty} \frac{\log o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), 3\omega)}{\log k} = \limsup_{k\to\infty} \frac{\log o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), \omega)}{\log k} \le \limsup_{k\to\infty} \frac{\log\frac{(k-1)!}{(m-1)!\,(k-m)!}}{\log k} = m - 1.$$

7.2. Lower-bound. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A and σ(x) is the spectrum of x in A.

Lemma 7.2. 
We have

$$\limsup_{k\to\infty} \frac{\log o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), \dots)}{\log k} \ge m - 1.$$

Proof. For any ω > 0, let m = P(σ(x), ω) be the packing number of σ(x) in R. Thus there exists a family of elements λ1, …, λm in σ(x) such that (i) |λi − λj| ≥ ω for all 1 ≤ i ≠ j ≤ m; and (ii) for any λ in σ(x), there is some λj with 1 ≤ j ≤ m satisfying |λ − λj| ≤ ω. For any R > ‖x‖, r ≥ 1 and ε > 0, by functional calculus, there are λm+1, …, λn in σ(x) such that for every 1 ≤ t1, …, tm ≤ k − n with 2nt1 + ··· + 2ntm = k − n, the matrix

$$A = \mathrm{diag}(\lambda_1 I_{2nt_1}, \lambda_2 I_{2nt_2}, \ldots, \lambda_m I_{2nt_m}, \lambda_1, \ldots, \lambda_m, \ldots, \lambda_n) \ \text{ is in } \ \Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), \qquad (7.2.1)$$

where we assume that 2n | (k − n). Let J be the set consisting of all (t1, …, tm) ∈ N^m with 2nt1 + ··· + 2ntm = k − n. Then the cardinality of the set J is equal to

$$\frac{\left(\frac{k-n}{2n} - 1\right)!}{\left(\frac{k-n}{2n} - m\right)!\,(m-1)!}.$$

By Weyl’s theorem in [27] on the distance between unitary orbits of two self-adjoint matrices, for any two distinct elements (s1, …, sm) and (t1, …, tm) in J and any W in U(k), we have ‖A1 − WA2W∗‖ ≥ ω, where

$$A_1 = \mathrm{diag}(\lambda_1 I_{2nt_1}, \lambda_2 I_{2nt_2}, \ldots, \lambda_m I_{2nt_m}, \lambda_1, \ldots, \lambda_m, \ldots, \lambda_n),$$

$$A_2 = \mathrm{diag}(\lambda_1 I_{2ns_1}, \lambda_2 I_{2ns_2}, \ldots, \lambda_m I_{2ns_m}, \lambda_1, \ldots, \lambda_m, \ldots, \lambda_n)$$

are two diagonal self-adjoint matrices in Mk(C). Combining with (7.2.1), we have

$$o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), \dots) \ge |J| \ge \frac{\left(\frac{k-n}{2n} - 1\right)!}{\left(\frac{k-n}{2n} - m\right)!\,(m-1)!}.$$

Hence

$$\limsup_{k\to\infty} \frac{\log o_\infty(\Gamma^{(top)}_R(x; k, \epsilon, P_1, \ldots, P_r), \dots)}{\log k} \ge \limsup_{k\to\infty} \frac{\log\frac{\left(\frac{k-n}{2n} - 1\right)!}{\left(\frac{k-n}{2n} - m\right)!\,(m-1)!}}{\log k} = m - 1.$$

7.3. Topological free orbit dimension of one self-adjoint element.

Theorem 7.1. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A and σ(x) is the spectrum of x in A. Let d(σ(x)) be the packing dimension of the set σ(x) in R. Let f : N × N × R+ → R be defined by f(s, k, ω) = log s log k − log ω for s, k ∈ N, ω > 0. Then kf(x) = d(σ(x)).

Proof. The result follows directly from Lemma 7.1, Lemma 7.2 and Definition 7.1. □

Theorem 7.2. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A. 
Let f : N × N × R+ → R be defined by $f(s, k, \omega) = \frac{\log s}{-k^2\log\omega}$ for s, k ∈ N, ω > 0. Then kf(x) = 0.

Proof. The result follows directly from Lemma 7.1 and Definition 7.1. □

References

[1] N. Brown, K. Dykema, K. Jung, “Free entropy dimension in amalgamated free products,” arXiv: math.OA/0609080, http://arxiv.org/abs/math/0609080.
[2] M. Dostál, D. Hadwin, “An alternative to free entropy for free group factors,” International Workshop on Operator Algebra and Operator Theory (Linfen, 2001). Acta Math. Sin. (Engl. Ser.) 19 (2003), no. 3, 419–472.
[3] K. Dykema, “Two applications of free entropy,” Math. Ann. 308 (1997), no. 3, 547–558.
[4] L. Ge, “Applications of free entropy to finite von Neumann algebras,” Amer. J. Math. 119 (1997), no. 2, 467–485.
[5] L. Ge, “Applications of free entropy to finite von Neumann algebras, II,” Ann. of Math. (2) 147 (1998), no. 1, 143–157.
[6] L. Ge, S. Popa, “On some decomposition properties for factors of type II1,” Duke Math. J. 94 (1998), no. 1, 79–101.
[7] L. Ge, J. Shen, “Free entropy and property T factors,” Proc. Natl. Acad. Sci. USA 97 (2000), no. 18, 9881–9885 (electronic).
[8] L. Ge, J. Shen, “On free entropy dimension of finite von Neumann algebras,” Geom. Funct. Anal. 12 (2002), no. 3, 546–566.
[9] U. Haagerup, S. Thorbjørnsen, “A new application of random matrices: Ext(C∗red(F2)) is not a group,” Ann. of Math. (2) 162 (2005), no. 2, 711–775.
[10] D. Hadwin, “Free entropy and approximate equivalence in von Neumann algebras,” Operator algebras and operator theory (Shanghai, 1997), 111–131, Contemp. Math., 228, Amer. Math. Soc., Providence, RI, 1998.
[11] D. Hadwin, J. Shen, “Free orbit dimension of finite von Neumann algebras,” J. Funct. Anal. 249 (2007), 75–91.
[12] K. Jung, “The free entropy dimension of hyperfinite von Neumann algebras,” Trans. Amer. Math. Soc. 355 (2003), no. 12, 5053–5089 (electronic).
[13] K. Jung, “A free entropy dimension lemma,” Pacific J. Math. 211 (2003), no. 2, 265–271.
[14] K. 
Jung, “Strongly 1-bounded von Neumann algebras,” arXiv: math.OA/0510576.
[15] K. Jung, D. Shlyakhtenko, “All generating sets of all property T von Neumann algebras have free entropy dimension ≤ 1,” arXiv: math.OA/0603669.
[16] D. McDuff, “Central sequences and the hyperfinite factor,” Proc. London Math. Soc. (3) 21 (1970), 443–461.
[17] C. Olsen, W. Zame, “Some C∗ algebras with a single generator,” Trans. Amer. Math. Soc. 215 (1976), 205–217.
[18] M. Pimsner, D. Voiculescu, “Imbedding the irrational rotation C∗-algebra into an AF-algebra,” J. Operator Theory 4 (1980), no. 2, 201–210.
[19] M. Stefan, “Indecomposability of free group factors over nonprime subfactors and abelian subalgebras,” Pacific J. Math. 219 (2005), no. 2, 365–390.
[20] M. Stefan, “The primality of subfactors of finite index in the interpolated free group factors,” Proc. Amer. Math. Soc. 126 (1998), no. 8, 2299–2307.
[21] S. Szarek, “Metric entropy of homogeneous spaces,” Quantum probability, 395–410, Banach Center Publ., 43, Polish Acad. Sci., Warsaw, 1998.
[22] D. Voiculescu, “Circular and semicircular systems and free product factors,” Operator algebras, unitary representations, enveloping algebras, and invariant theory (Paris, 1989), 45–60, Progr. Math., 92, Birkhäuser Boston, MA, 1990.
[23] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure in free probability theory II,” Invent. Math. 118 (1994), 411–440.
[24] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure in free probability theory III: The absence of Cartan subalgebras,” Geom. Funct. Anal. 6 (1996), 172–199.
[25] D. Voiculescu, “Free entropy dimension ≤ 1 for some generators of property T factors of type II1,” J. Reine Angew. Math. 514 (1999), 113–118.
[26] D. Voiculescu, “The topological version of free entropy,” Lett. Math. Phys. 62 (2002), no. 1, 71–82.
[27] H. 
Weyl, “Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung),” (German) Math. Ann. 71 (1912), no. 4, 441–479.

1. Introduction
2. Definitions and preliminary
3. Some technical lemmas
4. Topological free entropy dimension of one variable
5. Topological free entropy dimension of n-tuple in unital C* algebras
6. Topological free orbit dimension of C* algebras
7. Topological free orbit dimension of one variable
References

ABSTRACT The notion of topological free entropy dimension of $n$-tuples of elements in a unital C$^*$ algebra was introduced by Voiculescu. In this paper, we compute the topological free entropy dimension of one self-adjoint element and the topological orbit dimension of one self-adjoint element in a unital C$^*$ algebra. Moreover, we calculate the values of topological free entropy dimensions of families of generators of some unital C$^*$ algebras (for example, irrational rotation C$^*$ algebras or the minimal tensor product of two reduced C$^*$ algebras of free groups).
<|endoftext|><|startoftext|>
Introduction

A hot deconfined medium favors the dissociation of J/ψ, since sufficiently hard gluons can overcome the large energy gap between the J/ψ and a continuum state of cc̄ [1]. Models based on perturbative QCD have shown that a dense partonic system can be produced in central Au-Au collisions at RHIC and LHC energies [2-6] and then evolve toward thermal equilibrium and likely chemical equilibrium [7-11]. Such parton plasmas will soon be searched for in experiments at the Brookhaven National Laboratory Relativistic Heavy Ion Collider (RHIC). J/ψ suppression has been taken as a thermometer to identify the evolution history of a parton plasma by showing the transverse momentum dependence of the survival probability in the central rapidity region [12].
Charmonium melting inside a hot medium, which leads to J/ψ suppression, was proposed by Matsui and Satz as a probe of the existence of the quark-gluon plasma [13]. Before the formation of charmonium is complete, a pre-resonant cc̄ expands from the collision point. Dominance of the color-octet plus collinear-gluon configuration in the pre-resonance state [14] may account for the equal suppression of ψ′ and J/ψ production in proton-nucleus collisions [15]. The growth of the color-octet configuration and its interaction with nucleons along its trajectory in a nucleus are essential ingredients in explaining measured J/ψ production cross sections. In addition, the importance of color-octet configurations has been verified in pp̄ collisions at center-of-mass energy $\sqrt{s} = 1.8$ TeV with the CDF detector at Fermilab [16]. Theoretically, color-octet production at short distances and its evolution into physical resonances have been well formulated in nonrelativistic QCD [17]. At the collision energies of RHIC and LHC we can reasonably expect considerable contributions from the color-octet mechanism.

The evolution of ultrarelativistic nucleus-nucleus collisions, e.g. central Au-Au collisions at both RHIC and LHC energies, has been divided into three stages in Refs. [9,18,19]: (a) an initial collision where a parton gas is produced; (b) a prethermal stage where elastic scatterings among partons lead to local momentum isotropy [20]; (c) a thermal stage where parton numbers increase until freeze-out. The term “partonic system” refers to the assembly of partons in the prethermal and thermal stages. The term “parton plasma” denotes only the assembly of partons in the thermal stage. The cc̄ pairs are produced in the initial collision and in the prethermal and thermal stages, but they are dissociated only in the latter two stages. In order to understand and make predictions for J/ψ yields in RHIC and LHC experiments, the following physical processes are taken into account.
(a) In the initial collision, cc̄ pairs are produced in hard and semihard scatterings between partons from the incoming nuclei by 2 → 2 processes, which start at order $\alpha_s^3$ through the partonic channels $ab \to c\bar{c}[^{2S+1}L_J]x$. In the prethermal and thermal stages, cc̄ pairs can also be produced in 2 → 1 collisions, which start at order $\alpha_s^2$ via the partonic channels $ab \to c\bar{c}[^{2S+1}L_J]$, since partons in the deconfined medium have large transverse momenta.

(b) The cc̄ produced at short distance is in a color-singlet $(c\bar{c})_1$ or color-octet $(c\bar{c})_8$ configuration, which has a certain probability to evolve nonperturbatively into a color-singlet state. This J/ψ production process is formulated in nonrelativistic QCD.

(c) Since the color octet to singlet transition of $(c\bar{c})_8$ takes time, gluons in the partonic medium can couple to the color-octet state and interrupt this transition. In general, dissociation cross sections for $g + c\bar{c} \to (c\bar{c})_8$ depend on the pair size, so the expansion of the cc̄ from a collision point to the full J/ψ size has to be taken into account.

A physical resonance formed by a cc̄ pair may be one of J/ψ, $\chi_{cJ}$, ψ′ and others. Since the radiative transition from a higher charmonium state to the J/ψ takes a much longer time, the transition of such a state with nonzero $p_T$ takes place outside the partonic system. Since the Fermilab Tevatron experiments have been able to separate direct J/ψ's from those produced in radiative $\chi_{cJ}$ decays [16], in this work we assume that direct J/ψ production can also be extracted in heavy ion measurements. If $\chi_{cJ}$ and ψ′ were included, their own suppression factors in the deconfined medium would enter prompt J/ψ production, and the suppression of the J/ψ itself could not be identified from prompt J/ψ production data. Therefore, no contributions from higher charmonium states are taken into account in this work.
The purpose of this work is to study the dependence of the J/ψ survival probability and number distributions produced in central Au-Au collisions at RHIC and LHC energies on the transverse momentum and on the rapidity, both of which will be measured in RHIC experiments [21]. The J/ψ number distributions corresponding to production of cc̄ in the initial collision are given in Section 2. Since nuclear shadowing has been shown to influence J/ψ production in proton-nucleus collisions [22], the nuclear modification of parton distributions is considered. The J/ψ number distributions due to cc̄ production in the prethermal and thermal stages are given in Sections 3 and 4. Section 5 contains dissociation cross sections for gluon-$(c\bar{c})_1$ and gluon-$(c\bar{c})_8$. Numerical results for nucleon-cc̄ cross sections, J/ψ number distributions and four ratios including the survival probability are presented in Section 6. Conclusions are summarized in the final section.

2. Initial production of cc̄

Intrinsic transverse momenta of partons inside a nucleon result in the production of J/ψ with typical momenta comparable to the QCD scale via 2 → 1 partonic scattering processes [23]. Since we want to study J/ψ production with $p_T > 2$ GeV, contributions from 2 → 1 partonic reactions are not considered in the initial nucleon-nucleon collision. The effect of intrinsic transverse momentum smearing is rather modest for large transverse momentum J/ψ data from the Tevatron [24]. Upon omission of the intrinsic transverse momentum, the differential cross section for J/ψ production in a nucleon-nucleon collision resulting only from 2 → 2 partonic processes is given as

$$\frac{d\sigma}{dy\,dy_x\,dp_\perp} = 2p_\perp \sum_{abx} x_a x_b f_{a/N}(x_a) f_{b/N}(x_b) \left[ \frac{d\sigma}{d\hat{t}}\big(ab \to c\bar{c}[^{2S+1}L_J^{(1)}]x \to J/\psi\big) + \frac{d\sigma}{d\hat{t}}\big(ab \to c\bar{c}[^{2S+1}L_J^{(8)}]x \to J/\psi\big) \right] \qquad (1)$$

where the summation $\sum_{abx}$ is over partons labeled by a, b, x, the superscript (1) covers all possible color-singlet states and (8) all possible color-octet states.
Here $d\sigma/d\hat{t}$ denotes the partonic differential cross section for producing a $c\bar{c}[^{2S+1}L_J]$ that evolves into a J/ψ, with spectroscopic notation for quantum numbers and superscripts for singlet and octet [23, 25], and $f_{a/N}$ is the parton distribution function of the species a in a free nucleon. The longitudinal momentum fractions carried by the initial partons, $x_a$ and $x_b$, are related to the rapidities of the cc̄ and of x, $y$ and $y_x$, by

$$x_a = \frac{1}{\sqrt{s}}\left(m_\perp e^{y} + p_\perp e^{y_x}\right), \qquad x_b = \frac{1}{\sqrt{s}}\left(m_\perp e^{-y} + p_\perp e^{-y_x}\right)$$

where $\sqrt{s}$, $p_\perp$ and $m_\perp$ are the center-of-mass energy of the nucleon-nucleon collision and the transverse momentum and transverse mass of the J/ψ. The conditions $x_a < 1$ and $x_b < 1$ restrict $y_x$ to the region

$$-\ln\frac{\sqrt{s} - m_\perp e^{-y}}{p_\perp} < y_x < \ln\frac{\sqrt{s} - m_\perp e^{y}}{p_\perp}$$

These 2 → 2 processes at order $\alpha_s^3$, $gg \to c\bar{c}[^{2S+1}L_J]g$, $q\bar{q} \to c\bar{c}[^{2S+1}L_J]g$, $gq \to c\bar{c}[^{2S+1}L_J]q$ and $g\bar{q} \to c\bar{c}[^{2S+1}L_J]\bar{q}$, start in the initial nucleus-nucleus collisions and proceed with the expansion of the heavy pair. While a cc̄ propagates inside a prethermal or thermal partonic system, gluons hit it and excite it to continuum states. Let $\sigma_{gc\bar{c}[1^3S_1^{(1)}]}$ be the cross section for $g + (c\bar{c})[1^3S_1^{(1)}] \to (c\bar{c})_8$, $\sigma_{gc\bar{c}[S^{(8)}]}$ that for $g + (c\bar{c})[S^{(8)}] \to (c\bar{c})_8$, and $\sigma_{gc\bar{c}[P^{(8)}]}$ that for $g + (c\bar{c})[P^{(8)}] \to (c\bar{c})_8$. The cross sections are calculated in Section 5. The probability for dissociation of a small-size cc̄ into a free state depends on the relative velocity between the gluon and the cc̄, $v_{rel}$, and on the gluon number densities in the prethermal and thermal stages, $n_g(x)$ and $n_g(\tau)$, respectively. Here the variables $x$ and $\tau$ denote the space-time coordinates and the proper time, respectively. In the prethermal stage, parton distributions depend on the correlation between momentum and space-time coordinates [18, 19]. The dependence of the gluon number density $n_g(x)$ on $x$ characterizes the partonic system out of equilibrium. In the thermal stage, thermal parton distributions can be approximated by Jüttner distributions where the temperature and parton fugacities depend only on the proper time [9, 18, 20].
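The relations between the momentum fractions and the rapidities given earlier in this section can be checked numerically. The following is a minimal sketch, assuming a massless recoiling parton x whose transverse momentum balances that of the pair; the function names and the sample numbers in the usage note are illustrative and are not taken from the paper.

```python
import math


def momentum_fractions(sqrt_s, m_perp, p_perp, y, y_x):
    """Return (x_a, x_b) for ab -> (c cbar) x with a massless parton x."""
    x_a = (m_perp * math.exp(y) + p_perp * math.exp(y_x)) / sqrt_s
    x_b = (m_perp * math.exp(-y) + p_perp * math.exp(-y_x)) / sqrt_s
    return x_a, x_b


def y_x_window(sqrt_s, m_perp, p_perp, y):
    """Return the (lower, upper) bounds on y_x implied by x_a < 1 and x_b < 1."""
    lower = -math.log((sqrt_s - m_perp * math.exp(-y)) / p_perp)
    upper = math.log((sqrt_s - m_perp * math.exp(y)) / p_perp)
    return lower, upper
```

For instance, with the hypothetical inputs `sqrt_s = 200.0`, `m_perp = 5.0`, `p_perp = 4.0` (all in GeV) and `y = 0.0`, evaluating `momentum_fractions` at the upper edge of `y_x_window` gives $x_a = 1$, which is the boundary of the allowed region.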
As a consequence, the gluon number density is a function of τ only. Including cc̄ suppression in the partonic system, the finally-formed number distribution of J/ψ resulting from cc̄ pairs produced in the initial central A+B collision is given by

dN2→2ini dyd2p⊥ s−m⊥e s−m⊥e xafa/A(xa, m ⊥, ~r)xbfb/B(xb, m ⊥,−~r) (ab→ cc̄[3S(1)1 ]x → J/ψ) exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[13S(1) (k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[13S(1)1 ] (k · u) >the θ(d− VT∆t)] (ab→ cc̄[3S(8)1 ]x → J/ψ) exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)] (ab→ cc̄[1S(8)0 ]x → J/ψ) exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)] (ab→ cc̄[3P (8)J ]x → J/ψ) exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[P (8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]}

where $f_{a/A}$ is the parton distribution function of a nucleus,

$$f_{a/A}(x, Q^2, \vec{r}) = T_A(\vec{r})\, S_{a/A}(x, \vec{r})\, f_{a/N}(x, Q^2) \qquad (3)$$

with the thickness function $T_A$ and the nuclear parton shadowing factor $S_{a/A}$. Here, $R_A$ is the nuclear radius. The symbols $\langle \cdots \rangle_{pre}$ and $\langle \cdots \rangle_{the}$ denote averages over gluon distributions in the prethermal and thermal stages, respectively. In the course of a nucleus-nucleus collision, a deconfined partonic gas is produced from scatterings among primary partons at $\tau_0$, then reaches thermalization at $\tau_{iso}$ and finally freezes out at $\tau_f$. Here, $d$ is the shortest distance which a cc̄ travels from a production point $\vec{r}$ to the surface of the partonic medium with transverse velocity $V_T$ [12]. Suppose a cc̄ is produced at a proper time $\tau$ and a spatial rapidity $\eta$. The time $\Delta t$ for the partonic system to evolve to another proper time $\tau'$ is

$$\Delta t = \frac{(V_\parallel \sinh\eta - \cosh\eta)\,\tau + \sqrt{(\sinh\eta - V_\parallel \cosh\eta)^2 \tau^2 + (1 - V_\parallel^2)\,\tau'^2}}{1 - V_\parallel^2} \qquad (4)$$

where $V_\parallel$ is the longitudinal component of the cc̄ velocity.
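The expression for ∆t can be checked by moving the pair forward in time and recomputing its proper time. The sketch below assumes that the pair starts at $(t, z) = (\tau\cosh\eta, \tau\sinh\eta)$ and then moves with constant longitudinal velocity $V_\parallel$, in units with c = 1; the function names are illustrative only.

```python
import math


def delta_t(tau, eta, tau_prime, v_par):
    """Time for a pair created at (tau, eta) to reach proper time tau_prime."""
    linear = (v_par * math.sinh(eta) - math.cosh(eta)) * tau
    under_root = ((math.sinh(eta) - v_par * math.cosh(eta)) ** 2 * tau ** 2
                  + (1.0 - v_par ** 2) * tau_prime ** 2)
    return (linear + math.sqrt(under_root)) / (1.0 - v_par ** 2)


def proper_time_after(tau, eta, dt, v_par):
    """Proper time of the pair after it has travelled for a time dt."""
    t = tau * math.cosh(eta) + dt          # laboratory time
    z = tau * math.sinh(eta) + v_par * dt  # longitudinal position
    return math.sqrt(t * t - z * z)
```

Feeding the result of `delta_t` back into `proper_time_after` should return the requested `tau_prime`, and `delta_t` should vanish when `tau_prime` equals `tau`.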
The step function θ ensures that medium interactions with the cc̄ cease once the pair has escaped from the partonic medium.

3. Production of cc̄ in the prethermal stage

To order $\alpha_s^2$, a cc̄ in a color-singlet state is produced only through gluon fusion $gg \to c\bar{c}[^{2S+1}L_J^{(1)}]$. For the $c\bar{c}[^3S_1^{(1)}]$, this fusion does not occur. In contrast, color-octet states result from both channels $gg \to c\bar{c}[^{2S+1}L_J^{(8)}]$ and $q\bar{q} \to c\bar{c}[^{2S+1}L_J^{(8)}]$. Nevertheless, the number densities of quarks and antiquarks are so small that they are neglected in estimating the production of cc̄ in the prethermal stage, where gluons dominate the partonic system. The four-momenta of the two initial partons and the final cc̄ are denoted by $k_1 = (\omega_1, \vec{k}_1)$, $k_2 = (\omega_2, \vec{k}_2)$ and $p = (E, \vec{p}) = (m_\perp \cosh y, \vec{p}_\perp, m_\perp \sinh y)$. The differential production rate for $gg \to c\bar{c}[^{2S+1}L_J^{(8)}] \to J/\psi$ in the prethermal stage is

d3A2→1pre 8(2π)5 δ(4)(k1 + k2 − p) g2Gfg(k1, x)fg(k2, x) | M(gg → cc̄[2S+1L(8)J ] → J/ψ) |2

where $g_G$ is the degeneracy factor for gluons and $f_g(k, x)$ is the correlated phase-space distribution function given in Ref. [18]. The squared amplitudes $|M|^2$ for cc̄ in color singlet and color octet are calculated individually in Refs. [23, 25]. To order $\alpha_s^2$, the allowed color-octet states are $^1S_0^{(8)}$ and $^3P_{0,2}^{(8)}$ through the gluon fusion channel.
Taking into account the suppression of cc̄ in the prethermal and thermal stages, the finally-formed number distribution of J/ψ resulting from cc̄ pairs produced through 2 → 1 processes in the prethermal stage is given by

dN2→1pre dyd2p⊥ 16(2π)5 ∫ τiso τdτdηdφk1dyk1 g2Gfg(k1, x)fg(k2, x) {| M(gg → cc̄[1S(8)0 ] → J/ψ) |2 exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)] + | M(gg → cc̄[3P (8)J ] → J/ψ) |2 exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[P (8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]}

where $\phi_{ki}$ is the angle between $\vec{k}_{\perp i}$ and $\vec{p}_\perp$ for $i = 1, 2$ and $m_c$ is the charm quark mass. The kinematic variables $k_{\perp 1}$, $k_{\perp 2}$, $\phi_{k2}$ and $y_{k2}$ follow from four-momentum conservation with massless gluons and a pair of invariant mass $2m_c$:

$$k_{\perp 1} = \frac{2m_c^2}{m_\perp \cosh(y - y_{k1}) - p_\perp \cos\phi_{k1}}$$

$$k_{\perp 2} = \sqrt{p_\perp^2 + k_{\perp 1}^2 - 2p_\perp k_{\perp 1} \cos\phi_{k1}}$$

$$\sin\phi_{k2} = -\frac{k_{\perp 1}}{k_{\perp 2}} \sin\phi_{k1}$$

$$\sinh y_{k2} = \frac{1}{k_{\perp 2}}\left(m_\perp \sinh y - k_{\perp 1} \sinh y_{k1}\right)$$

To order $\alpha_s^3$, the differential production rate gets contributions from the processes $gg \to c\bar{c}[^{2S+1}L_J^{(1,8)}]g$ in the prethermal stage,

d3A2→2pre 16(2π)8 δ(4)(k1 + k2 − p− px) g2Gfg(k1, x)fg(k2, x){ | M(gg → cc̄[2S+1L(1)J ]x → J/ψ) |2 | M(gg → cc̄[2S+1L(8)J ]x → J/ψ) |2} (7)

where $p_x = (E_x, \vec{p}_x) = (p_{\perp x} \cosh y_x, p_{\perp x} \cos\phi_x, p_{\perp x} \sin\phi_x, p_{\perp x} \sinh y_x)$ is the four-momentum of the massless parton x.
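The 2 → 1 kinematic relations of this section follow from four-momentum conservation, $k_1 + k_2 = p$, with massless gluons. The following sketch assumes a pair of invariant mass $2m_c$, so that $m_\perp^2 = 4m_c^2 + p_\perp^2$, and measures azimuthal angles from $\vec{p}_\perp$; the function name and the sample inputs in the usage note are hypothetical.

```python
import math


def two_to_one_kinematics(m_c, p_perp, y, phi_k1, y_k1):
    """Solve g g -> c cbar for the second gluon, given the pair and gluon 1.

    Returns (k_perp1, k_perp2, sin_phi_k2, y_k2).
    """
    m_perp = math.sqrt(4.0 * m_c ** 2 + p_perp ** 2)
    # Masslessness of gluon 2: (p - k1)^2 = 0 fixes the magnitude k_perp1.
    k_perp1 = 2.0 * m_c ** 2 / (m_perp * math.cosh(y - y_k1)
                                - p_perp * math.cos(phi_k1))
    # Transverse momentum balance: k_perp2 = |p_perp - k_perp1| as vectors.
    k_perp2 = math.sqrt(p_perp ** 2 + k_perp1 ** 2
                        - 2.0 * p_perp * k_perp1 * math.cos(phi_k1))
    sin_phi_k2 = -k_perp1 * math.sin(phi_k1) / k_perp2
    # Longitudinal momentum balance gives the rapidity of gluon 2.
    y_k2 = math.asinh((m_perp * math.sinh(y) - k_perp1 * math.sinh(y_k1))
                      / k_perp2)
    return k_perp1, k_perp2, sin_phi_k2, y_k2
```

A quick consistency check is energy conservation, $k_{\perp 1}\cosh y_{k1} + k_{\perp 2}\cosh y_{k2} = m_\perp \cosh y$, which is not imposed explicitly above and therefore tests the solution, for example with the illustrative inputs `m_c = 1.5`, `p_perp = 4.0`, `y = 0.5`, `phi_k1 = 1.0`, `y_k1 = 0.2`.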
Taking into account the suppression of cc̄ in the prethermal and thermal stages, the finally-formed number distribution of J/ψ resulting from cc̄ pairs produced through 2 → 2 processes in the prethermal stage is given by

dN2→2pre dyd2p⊥ 16(2π)8 ∫ τiso τdτdηp⊥xdp⊥xdφxdyxdφk1dyk1 2k2⊥1 g2Gfg(k1, x)fg(k2, x) {| M(gg → cc̄[3S(1)1 ]x → J/ψ) |2 exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[13S(1)1 ] (k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[13S(1)1 ] (k · u) >the θ(d− VT∆t)] + | M(gg → cc̄[3S(8)1 ]x → J/ψ) |2 exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)] + | M(gg → cc̄[1S(8)0 ]x → J/ψ) |2 exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)] + | M(gg → cc̄[3P (8)J ]x → J/ψ) |2 exp[− ∫ τiso dτ ′ng(x ′) < vrelσgcc̄[P (8)](k · u) >pre θ(d− VT∆t) dτ ′ng(τ ′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]}

where $\hat{s} = (k_1 + k_2)^2$ and some kinematic variables are given by

$$k_{\perp 1} = \frac{4m_c^2 + 2m_\perp p_{\perp x} \cosh(y - y_x) - 2p_\perp p_{\perp x} \cos\phi_x}{2\left[m_\perp \cosh(y - y_{k1}) + p_{\perp x} \cosh(y_x - y_{k1}) - p_\perp \cos\phi_{k1} - p_{\perp x} \cos(\phi_x - \phi_{k1})\right]}$$

$$k_{\perp 2}^2 = m_\perp^2 + p_{\perp x}^2 + 2m_\perp p_{\perp x} \cosh(y - y_x) + k_{\perp 1}^2 - 2k_{\perp 1}\left[m_\perp \cosh(y - y_{k1}) + p_{\perp x} \cosh(y_x - y_{k1})\right]$$

$$\sinh y_{k2} = \frac{1}{k_{\perp 2}}\left[m_\perp \sinh y + p_{\perp x} \sinh y_x - k_{\perp 1} \sinh y_{k1}\right]$$

The J/ψ number distribution resulting from cc̄ pairs produced in the prethermal stage becomes

$$\frac{dN_{pre}}{dy\,d^2p_\perp} = \frac{dN^{2\to 1}_{pre}}{dy\,d^2p_\perp} + \frac{dN^{2\to 2}_{pre}}{dy\,d^2p_\perp} \qquad (9)$$

4. Production of cc̄ in the thermal stage

In the thermal stage, parton distributions are approximated by thermal phase-space distributions $f_i(k; T, \lambda_i)$ in which the temperature $T$ and the nonequilibrium fugacities $\lambda_i$ are functions of the proper time τ [9, 18]. While the partonic system evolves, quark and antiquark number densities increase.
To order $\alpha_s^2$, both $gg \to c\bar{c}[^{2S+1}L_J^{(8)}] \to J/\psi$ and $q\bar{q} \to c\bar{c}[^{2S+1}L_J^{(8)}] \to J/\psi$ contribute to the J/ψ number distribution in the thermal stage

dN2→1the dyd2p⊥ 16(2π)5 τdτdηdφk1dyk1 g2Gfg(k1;T, λg)fg(k2;T, λg) | M(gg → cc̄[1S 0 ] → J/ψ) |2 exp[− dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)] g2Gfg(k1;T, λg)fg(k2;T, λg) | M(gg → cc̄[3P J ] → J/ψ) |2 exp[− dτ ′ng(τ ′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)] +gqgq̄fq(k1;T, λq)fq̄(k2;T, λq̄) | M(qq̄ → cc̄[3S(8)1 ] → J/ψ) |2 exp[− dτ ′ng(τ ′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]}

where $g_q$ and $g_{\bar{q}}$ are the degeneracy factors for quarks and antiquarks, respectively. In the channel of quark-antiquark annihilation, only the squared amplitude for $^3S_1^{(8)}$ does not vanish. All lowest-order 2 → 2 reactions $gg \to c\bar{c}[^{2S+1}L_J]g$, $q\bar{q} \to c\bar{c}[^{2S+1}L_J]g$, $gq \to c\bar{c}[^{2S+1}L_J]q$ and $g\bar{q} \to c\bar{c}[^{2S+1}L_J]\bar{q}$ contribute to the J/ψ number distribution in the thermal stage

dN2→2the dyd2p⊥ 16(2π)8 τdτdηp⊥xdp⊥xdφxdyxdφk1dyk1 2k2⊥1 fa(k1;T, λa)fb(k2;T, λb)gagb | M(ab→ cc̄[2S+1L(1)J ]x → J/ψ) |2 exp[− dτ ′ng(τ ′) < vrelσgcc̄[12S+1L(1) (k · u) >the θ(d− VT∆t)] | M(ab→ cc̄[2S+1L(8)J ]x → J/ψ) |2 exp[− dτ ′ng(τ ′) < vrelσgcc̄[L(8)](k · u) >the θ(d− VT∆t)]}

The J/ψ number distribution resulting from cc̄ pairs produced in the thermal stage becomes

$$\frac{dN_{the}}{dy\,d^2p_\perp} = \frac{dN^{2\to 1}_{the}}{dy\,d^2p_\perp} + \frac{dN^{2\to 2}_{the}}{dy\,d^2p_\perp} \qquad (12)$$

5. Gluon-cc̄ dissociation cross sections

A dissociation cross section of a full-size J/ψ induced by a gluon is given in Refs. [1, 26]. Since an initially-created cc̄ has a radius of about r0 = and proceeds by expanding to a full-size object, the dissociation cross section of cc̄ by a gluon has a size dependence. By dissociation we mean the breakup of the cc̄ into free states via the process $g + c\bar{c} \to (c\bar{c})_8$. Cross sections are calculated with chromoelectric dipole coupling between the gluon and the cc̄, following the procedure for gluon-J/ψ dissociation in Ref. [26].
The wave function of an expanding cc̄ is needed for this purpose, but it has not been investigated in the partonic medium even though some attempts have been made in studies of the color transparency phenomenon [27]. We proceed with the construction of wave functions in a simple one-gluon-exchange potential model. In a parton plasma, the internal motion of the J/ψ is obtained [12] from the attractive Coulomb potential, $V_1 = -g_s^2/3\pi r$. The quantum-mechanical interpretation of the cc̄ radius is $\sqrt{\langle r^2 \rangle}$, the square root of the radius-square expectation value of the relative-motion wave function. For the 1S color singlet, its wave function in momentum space normalized to the radius of $c\bar{c}[1^3S_1^{(1)}]$ is

[~rψ1s](~k) = 32 πa2.50 (1 + (ka0)2)3

where the variable $a_0 = \sqrt{\langle r^2 \rangle / 3}$ is the Bohr radius for a full-size J/ψ. The velocity-square expectation value of the J/ψ wave function is $\langle v^2 \rangle = 0.428$. The radius of the cc̄ is then assumed to expand according to $\sqrt{\langle r^2 \rangle} = \sqrt{\langle v^2 \rangle}\, t + r_0$. The gluon-$c\bar{c}[1^3S_1^{(1)}]$ dissociation cross section is

gcc̄[13S 128gs 2m2.5c a 0 − ǫ0)1.5Q0 9[mca 0 − ǫ0) + 1]6

where $Q^0$ is the gluon energy, $\epsilon_0$ the binding energy of the J/ψ and $g_s$ the strong coupling constant.

While the cc̄ is in a color-octet state, it is not a bound state but rather a scattering state. Its relative-motion wave function is determined by the repulsive potential $V_8 = g_s^2/24\pi r$. The radial part of the S wave function is

$$S_R(r) + iS_I(r) = e^{iqr} F(1 + i\eta, 2, -2iqr) \qquad (15)$$

and the radial part of the P wave function is

$$P_R(r) + iP_I(r) = qr\, e^{iqr} F(2 + i\eta, 4, -2iqr) \qquad (16)$$

with $q = m_c \sqrt{\langle v^2 \rangle}$ and $\eta = g_s^2 / 24\pi\sqrt{\langle v^2 \rangle}$. The function $F$ is the confluent hypergeometric function. Wave functions in momentum space are obtained by performing a Fourier transform of the wave functions in coordinate space. The normalization constants of the momentum-space wave functions, $C_S$ and $C_P$, are determined by fitting the cc̄ radius.
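The linear growth of the pair radius and the relation between the root-mean-square radius and the Bohr radius used above can be sketched as follows. This is only an illustration: it works in units with c = 1 (times in fm/c, lengths in fm), takes $\langle v^2 \rangle = 0.428$ from the text and the full-size radius 0.348 fm quoted in Subsection 6.1, and leaves the initial radius `r0` as a free parameter because its value is not reproduced here. The function names are hypothetical.

```python
import math

V2 = 0.428         # <v^2> of the J/psi wave function, as quoted in the text
R_JPSI_FM = 0.348  # full-size J/psi radius in fm, as quoted in Subsection 6.1


def pair_radius(t, r0):
    """Root-mean-square pair radius after time t, starting from r0."""
    return math.sqrt(V2) * t + r0


def bohr_radius(rms_radius):
    """Bohr radius a0 of a 1S Coulomb state with the given rms radius."""
    return rms_radius / math.sqrt(3.0)


def time_to_full_size(r0, r_full=R_JPSI_FM):
    """Time needed for the pair to expand from r0 to r_full (zero if larger)."""
    return max(0.0, (r_full - r0) / math.sqrt(V2))
```

For any chosen `r0` below the full size, `pair_radius(time_to_full_size(r0), r0)` returns the full-size radius, and `bohr_radius` satisfies $3a_0^2 = \langle r^2 \rangle$ as expected for a 1S hydrogen-like state.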
Dissociation cross sections of the S-wave and P-wave color-octet states by a gluon are

σgcc̄[S(8)] = g2s (mcQ 0)1.5 dr1dr2r mcQ0r1)j1( mcQ0r2)[SR(r1)SR(r2) + SI(r1)SI(r2)]

σgcc̄[P (8)] = g2s (mcQ 0)1.5 dr1dr2 mcQ0r1)j0( mcQ0r2) + 2j2( mcQ0r1)j2( mcQ0r2)] [PR(r1)PR(r2) + PI(r1)PI(r2)]

where $j_0$, $j_1$ and $j_2$ are spherical Bessel functions. The parameter $b$ is determined so that the square root of the $r^2$ expectation value of the relative wave function in Eq. (15) or (16) is the color-octet radius. The relations $b = 1.435\sqrt{\langle r^2 \rangle}$ for S-wave and $b = 1.3\sqrt{\langle r^2 \rangle}$ for P-wave hold approximately for a color-octet size less than the normal hadron size.

6. Numerical results and discussions

Results on five aspects are presented in the following subsections. The first is the nucleon-cc̄ dissociation cross sections, shown in the next subsection. The second, in Subsection 6.2, is the J/ψ number distributions versus transverse momentum at $y = 0$ and versus rapidity at $p_T = 4$ GeV, with the nuclear effect on parton distributions and cc̄ dissociation in the partonic system. The third, in Subsection 6.3, is the definition and calculation of four ratios including the survival probability, with $y = 0$ or $p_T = 4$ GeV at both RHIC and LHC energies. The fourth, in Subsection 6.4, is the J/ψ number distributions without the nuclear effect on parton distributions and without cc̄ dissociation in the partonic system. The fifth concerns some uncertainties in the above results.

6.1. Nucleon-cc̄ dissociation cross sections

In the parton model of the nucleon, the gluon is a dominant ingredient. Whereas the cross section for cc̄ dissociated directly by a real gluon is of order $\alpha_s$, the cross section for quark-cc̄ dissociation through a virtual gluon is of order $\alpha_s^2$. With the gluon-$(c\bar{c})_1$ cross section given in the last section, the nucleon-$c\bar{c}[1^3S_1^{(1)}]$ cross section driven mainly by the gluon ingredient becomes

$$\sigma_{Nc\bar{c}[1^3S_1^{(1)}]} = \int_{x_{\min}} dx\, f_{g/N}(x, Q^2)\, \sigma_{gc\bar{c}[1^3S_1^{(1)}]} \qquad (19)$$

where x min = with pN being the proton momentum in the rest frame of the J/ψ.
The gluon distribution function $f_{g/N}$ is the Glück-Reya-Vogt (GRV) result at leading order in Ref. [28]. The cross section is drawn in Fig. 1 to show the energy and renormalization-scale dependence when the cc̄ radius is the J/ψ radius in the attractive Coulomb potential, $r_{J/\psi} = 0.348$ fm. In Fig. 1, gluon field operators are renormalized at three scales 2ǫ0, Q 0, respectively. The coupling constant has the value αs = corresponding to the scale $\epsilon_0$ while it varies for the other two scales. Values of the cross section at $\sqrt{s} = 10$ GeV are a little lower than the nucleon-J/ψ dissociation cross section obtained by subtracting the quasi-elastic cross section in Ref. [29] from the total cross section given in Ref. [30]. In high-temperature hadronic matter or in the J/ψ photoproduction reaction, a typical value of the center-of-mass energy for nucleon-J/ψ (or preresonance) dissociation is around $\sqrt{s} = 6$ GeV [1]. At this energy, Fig. 2 is drawn to show the size dependence of $\sigma_{Nc\bar{c}[1^3S_1^{(1)}]}$, with $f_{g/N}(x, Q^2)$ at $Q^2 = \epsilon_0^2$.

For the S-wave color octet the nucleon-$c\bar{c}[S^{(8)}]$ cross section is

$$\sigma_{Nc\bar{c}[S^{(8)}]} = \int_{x_{\min}} dx\, f_{g/N}(x, (q + Q^0)^2)\, \sigma_{gc\bar{c}[S^{(8)}]} \qquad (20)$$

where x min = . For the P-wave color octet the nucleon-$c\bar{c}[P^{(8)}]$ cross section is

$$\sigma_{Nc\bar{c}[P^{(8)}]} = \int_{x_{\min}} dx\, f_{g/N}(x, (q + Q^0)^2)\, \sigma_{gc\bar{c}[P^{(8)}]} \qquad (21)$$

Since the gluon momentum in a confining medium is bigger than the QCD scale [14], the lowest value of $x$ is set by $\Lambda_{QCD} = 0.2322$ GeV used in the leading-order GRV parton distribution functions. The dependences of $\sigma_{Nc\bar{c}[S^{(8)}]}$ and $\sigma_{Nc\bar{c}[P^{(8)}]}$ on the center-of-mass energy $\sqrt{s}$ are depicted in Fig. 3 for a $(c\bar{c})_8$ with the full size of the J/ψ. The dot-dashed line is obtained with the nucleon-J/ψ cross section given by Eq. (24) in Ref. [1], where another gluon distribution function evaluated at $Q^2 = \epsilon_0^2$ is used.
While the $(c\bar{c})_8$ has small momentum in a nucleus, the cross section for nucleon-$(c\bar{c})_8$ production is lower than the absorption cross section determined by Gerschel and Hüfner [31] or the two-gluon exchange result [32]. It was proposed by Kharzeev and Satz that the color octet plus a gluon configuration is a dominant component produced in proton-nucleus collisions [14]. In fact, the present cross section for a nucleon and a bare $(c\bar{c})_8$ is one part of the nucleon-$g(c\bar{c})_8$ cross section. A $(c\bar{c})_8$ pair produced at a collision point expands, before becoming a color singlet, to a size which may be larger or smaller than the full size of the J/ψ. We therefore show in Fig. 4 the dependence of the nucleon-$(c\bar{c})_8$ cross section on the color-octet pair radius.

We do not address proton-nucleus collisions in terms of nucleon-cc̄ cross sections [33, 34], since only the gluon-cc̄ cross sections are needed to study cc̄ suppression in the prethermal and thermal stages. In proton-nucleus collisions, once a color-octet $(c\bar{c})_8$ pair is produced, it picks up a collinear gluon to form a colorless configuration [14]. However, in central Au-Au collisions at RHIC and LHC energies, the accompanying gluon scatters with other hard gluons in the dense partonic system and is driven away. Therefore the bare cc̄ is the object that we want to study in the partonic system.

6.2. J/ψ number distributions with suppression

J/ψ number distributions versus transverse momentum at $y = 0$ and versus rapidity at $p_T = 4$ GeV for central Au-Au collisions at the RHIC energy $\sqrt{s} = 200A$ GeV are calculated for the initial collision and for the prethermal and thermal stages. Initial production of cc̄ is calculated with GRV parton distribution functions at the renormalization scale $\mu = \sqrt{p_\perp^2 + 4m_c^2}$. The evolution of the color-octet states $^1S_0$ and $^3P_J$ toward the J/ψ is specified by the nonperturbative matrix elements $\langle O_8^{J/\psi}(^3S_1) \rangle$, $\langle O_8^{J/\psi}(^1S_0) \rangle$ and $\langle O_8^{J/\psi}(^3P_0) \rangle$ in nonrelativistic QCD [17].
In the nonperturbative evolution, a gluon from the partonic system hits the color octet and prevents it from color neutralizing via $g + (c\bar{c})_8 \to (c\bar{c})_8$. This medium effect has been expressed by the exponentials in Eqs. (2), (6), (8), (10) and (11). The nonperturbative matrix elements are therefore assumed to be unchanged, while the medium effect is factorized into the exponential forms. Values of these matrix elements are well determined by fitting the CDF measurements for pp̄ collisions at $\sqrt{s} = 1.8$ TeV in Ref. [35],

$$\langle O_8^{J/\psi}(^3S_1) \rangle = (1.12 \pm 0.14) \times 10^{-2}\ \mathrm{GeV}^3 \qquad (22)$$

$$\langle O_8^{J/\psi}(^1S_0) \rangle + \langle O_8^{J/\psi}(^3P_0) \rangle = (3.90 \pm 1.14) \times 10^{-2}\ \mathrm{GeV}^3 \qquad (23)$$

In pp̄ collisions, differential cross sections of direct J/ψ production depend on the combination of $\langle O_8^{J/\psi}(^1S_0) \rangle$ and $\langle O_8^{J/\psi}(^3P_0) \rangle$. However, since the dissociation cross section for the S-wave color-octet state is different from that for the P-wave color-octet state, the two matrix elements no longer enter only through this combination. In the calculations, the values are taken as follows,

$$\langle O_8^{J/\psi}(^1S_0) \rangle = 4 \times 10^{-2}\ \mathrm{GeV}^3, \qquad \langle O_8^{J/\psi}(^3P_0) \rangle = -\ \times 10^{-2}\ \mathrm{GeV}^3 \qquad (24)$$

The value of $\langle O_8^{J/\psi}(^3P_0) \rangle$ is positive at tree level and negative after renormalization [36]. Eq. (23) is still satisfied by the values in Eq. (24). These values of the nonperturbative matrix elements are supposed to be universal for any center-of-mass energy $\sqrt{s}$.

Various contributions to the J/ψ number distributions, including the initial collision, the prethermal and thermal stages, and 2 → 1 and 2 → 2 collisions, are drawn separately in Figs. 5 and 6. The dashed curve resulting from cc̄ production in the initial collision is obtained by calculating Eq. (2), where the nuclear parton shadowing factor is taken from Ref. [4] throughout this subsection. The upper and lower dot-dashed curves resulting from cc̄ production in the prethermal stage are obtained by calculating Eq. (6) for 2 → 1 collisions and Eq. (8) for 2 → 2 collisions, respectively. The upper and lower dotted curves resulting from cc̄ production in the thermal stage are obtained by calculating Eq.
(10) for 2 → 1 collisions and Eq. (11) for 2 → 2 collisions, respectively. To exclude the effect of intrinsic transverse momentum smearing, only the region $p_T > 2$ GeV is considered. Consequently, no 2 → 1 collisions contribute in the initial collision. The J/ψ number distribution resulting from the initial collision, shown by the dashed line, has a plateau similar to that in proton-proton collisions [37]. Both Figs. 5 and 6 show that cc̄ pairs produced in the thermal stage can be neglected compared to the initial production, but the contributions from the prethermal stage are important in the transverse momentum region 2 GeV < $p_T$ < 8 GeV and the rapidity region 0 < y < 1.2. Production of cc̄ in the prethermal and thermal stages produces a bulge in the J/ψ number distribution shown by the solid line in this rapidity region. Nevertheless, the dot-dashed and dotted lines fall rapidly as the rapidity gets large. This bulge characterizes the formation of a deconfined medium, because the average momentum in the medium is limited but large enough to produce extra cc̄ and thus J/ψ. The 2 → 1 collisions in the partonic system make bigger contributions than the 2 → 2 collisions.

Figs. 7 and 8 each contain two sets of lines showing the contributions from the color-singlet and color-octet pairs produced at short distance. Each set has a dashed line obtained from Eq. (2) for the initial collision, a dot-dashed line from Eq. (9) for the prethermal stage and a dotted line from Eq. (12) for the thermal stage. A line in the upper (lower) set, for the color-octet (color-singlet) contributions, stems from the terms for $(c\bar{c})_8$ ($(c\bar{c})_1$) states. The color-octet states dominate the production of J/ψ at RHIC energy. However, the ratio of color-octet to color-singlet contributions shown by the two solid lines at $p_T = 6$ GeV is reduced from about 70 at the CDF collider energy $\sqrt{s} = 1.8$ TeV to about 40 at RHIC energy.
The contributions of the color-singlet and color-octet states have similar dependence on transverse momentum and rapidity.

Figs. 9 and 10 show the transverse momentum and rapidity dependence of J/ψ number distributions for central Au-Au collisions at the LHC energy $\sqrt{s} = 5.5A$ TeV. A prominent feature is that the J/ψ number produced in the thermal stage is comparable to that from the prethermal stage. Compared to the initial production, cc̄ and J/ψ produced through 2 → 2 reactions may be neglected. A bulge is observed on the plateau in the rapidity region 0 < y < 1.5. Such a bulge can be taken as a signature for the existence of a parton plasma at the LHC energy. Figs. 11 and 12 depict the contributions from the color singlet and color octet at LHC energy. The ratio of color-octet to color-singlet contributions shown by the two solid lines at $p_T = 6$ GeV reaches about 150. This indicates that the color-octet states become more important with the increase of $\sqrt{s}$.

6.3. Ratios including J/ψ survival probability

Nuclear shadowing results in a modification of gluon distribution functions inside a nucleus [38], and this nuclear effect is represented by the shadowing factor $S_{a/A}$ in Eq. (3). If $S_{a/A} = 1$ (no shadowing), $dN^{2\to 2}_{ini}/dy\,d^2p_\perp$ is proportional to the product of the atomic masses of the two colliding nuclei. If $S_{a/A} \neq 1$ and depends on the longitudinal momentum fraction $x$, the production of cc̄ is reduced in the shadowing region and enhanced in the anti-shadowing region. Irrespective of interactions of the J/ψ with the partonic system, the J/ψ number distribution produced in the initial central A+B collision is obtained by setting all exponentials equal to 1 in Eq.
(2),

dN2→20 dyd2p⊥ (Sa/A) = 2 s−m⊥e s−m⊥e xafa/A(xa, m ⊥, ~r)xbfb/B(xb, m ⊥,−~r) (ab→ cc̄[3S(1)1 ]x → J/ψ) (ab→ cc̄[3S(8)1 ]x → J/ψ) (ab→ cc̄[1S(8)0 ]x → J/ψ) (ab→ cc̄[3P (8)J ]x → J/ψ)}

To characterize the influence of nuclear parton shadowing on the J/ψ production from the initial collision, a ratio is defined as

$$R_{ini} = \frac{dN^{2\to 2}_0}{dy\,d^2p_\perp}(S_{a/A} \neq 1) \Big/ \frac{dN^{2\to 2}_0}{dy\,d^2p_\perp}(S_{a/A} = 1) \qquad (26)$$

Here the $S_{a/A} \neq 1$ from Ref. [4] applies throughout this subsection.

The initially produced J/ψ originates from the cc̄ pairs produced in the initial collision. Its dependence on the transverse momentum and rapidity is obtained by calculating Eq. (25). Some cc̄ pairs produced in the initial collision may be dissociated by gluons from the partonic system. As a consequence, the J/ψ number is reduced. The survival probability for the cc̄ evolving into a J/ψ is defined as the ratio

$$S_{plasma} = \frac{dN^{2\to 2}_{ini}}{dy\,d^2p_\perp}(S_{a/A} \neq 1) \Big/ \frac{dN^{2\to 2}_0}{dy\,d^2p_\perp}(S_{a/A} \neq 1) \qquad (27)$$

We have calculated the J/ψ number distributions produced in the prethermal and thermal stages in Subsection 6.2. This J/ψ yield may be bigger than the reduction of the initially produced J/ψ due to cc̄ dissociation by gluons in the partonic system. The partonic system has two roles: one is to produce cc̄ pairs and the other is to dissociate cc̄ pairs. To see these roles, a ratio is defined by

$$R_{plasma} = \left( \frac{dN^{2\to 2}_{ini}}{dy\,d^2p_\perp}(S_{a/A} \neq 1) + \frac{dN_{pre}}{dy\,d^2p_\perp} + \frac{dN_{the}}{dy\,d^2p_\perp} \right) \Big/ \frac{dN^{2\to 2}_0}{dy\,d^2p_\perp}(S_{a/A} \neq 1) \qquad (28)$$

To understand the nuclear effect on parton distributions and the roles of the partonic system, we need to compare J/ψ production in the central A+B collision with that in the nucleon-nucleon collision. To this end, a ratio is defined as

$$R = \left( \frac{dN^{2\to 2}_{ini}}{dy\,d^2p_\perp}(S_{a/A} \neq 1) + \frac{dN_{pre}}{dy\,d^2p_\perp} + \frac{dN_{the}}{dy\,d^2p_\perp} \right) \Big/ \frac{dN^{2\to 2}_0}{dy\,d^2p_\perp}(S_{a/A} = 1) \qquad (29)$$

which is also written as

$$R = R_{ini} R_{plasma} \qquad (30)$$

The ratios $R_{ini}$, $S_{plasma}$, $R_{plasma}$ and $R$ versus transverse momentum and rapidity are depicted as dashed, dotted, dot-dashed and solid lines, respectively, in Figs.
13 and 14 for the RHIC energy and Figs. 15 and 16 for the LHC energy. In contrast to $R_{ini} < 1$, the value of $R_{plasma}$ is larger than 1 for all transverse momenta in Fig. 13, for 0 < y < 1.5 in Fig. 14 and for 0.5 < y < 1.5 in Fig. 16. This results in prominent bulges on the solid lines of $R$ in Figs. 14 and 16. In contrast, the survival probability $S_{plasma}$ shown by the dotted lines has no such bulge. Therefore, the bulges are present in Figs. 6 and 10 where the J/ψ yield resulting from cc̄ pairs produced in the partonic system exceeds the reduction of the initially produced J/ψ. We conclude that in the rapidity region 0 < y < 1.5 a bulge observed in the ratio $R$ is an indicator for the existence of the partonic system. Where $R_{ini} < 1$ and $S_{plasma} < 1$, J/ψ suppression arises from the nuclear parton shadowing found in HIJING and from cc̄ dissociation in the partonic system.

6.4. J/ψ number distributions with no suppression

In Subsection 6.2, J/ψ number distributions were presented with both the cc̄ reduction due to nuclear parton shadowing in the initial collision and the cc̄ dissociation in the partonic system taken into account. In this subsection, the suppression, including both the reduction and the dissociation, is omitted in the calculations of J/ψ number distributions by setting to 1 all exponentials in Eqs. (2), (6), (8), (10) and (11). Figs. 17-20 depict these distributions versus transverse momentum at $y = 0$ and versus rapidity at $p_T = 4$ GeV at both RHIC and LHC energies.

We are now ready to explain the dip within y < 1 in Fig. 10. This dip disappears in Fig. 20, where suppression is not considered. Since $R_{ini}$, shown by the dashed line in Fig. 16, is flat in rapidity for y < 2 and $S_{plasma}$, shown by the dotted line, rises steeply in 0.5 < y < 1, the dip phenomenon is solely due to the cc̄ dissociation in the partonic system.
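As a cross-check on how the four ratios of Subsection 6.3 relate to one another, they can be evaluated from the five yields entering Eqs. (26)-(29) at a fixed $(y, p_\perp)$ point. The sketch below is only bookkeeping; the function and argument names are illustrative, and the numbers in the usage note are placeholders rather than results from the paper.

```python
def jpsi_ratios(n0_no_shadow, n0_shadow, n_ini, n_pre, n_the):
    """Return (R_ini, S_plasma, R_plasma, R) from yields at one (y, p_perp).

    n0_no_shadow: initial yield without dissociation, S_{a/A} = 1
    n0_shadow:    initial yield without dissociation, S_{a/A} != 1
    n_ini:        initial yield with dissociation,    S_{a/A} != 1
    n_pre, n_the: yields from the prethermal and thermal stages
    """
    r_ini = n0_shadow / n0_no_shadow
    s_plasma = n_ini / n0_shadow
    r_plasma = (n_ini + n_pre + n_the) / n0_shadow
    r_total = (n_ini + n_pre + n_the) / n0_no_shadow
    return r_ini, s_plasma, r_plasma, r_total
```

With placeholder yields such as `jpsi_ratios(10.0, 8.0, 4.0, 5.0, 1.0)`, the product of the first and third returned values equals the fourth, which is the content of Eq. (30), and $R_{plasma}$ can exceed 1 even though $S_{plasma} < 1$ when production in the partonic system outweighs dissociation.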
Such a dip phenomenon is not obvious but still can be observed in the prethermal and thermal stages when the J/ψ number distributions with suppression are compared to those without suppression. The comparison is indicated in Fig. 21 for the prethermal stage and Fig. 22 for the thermal stage. The solid and dot-dashed lines for no suppression begin to fall from y = 0.5 to y = 1, but change to rising as shown by the dashed and dotted lines when the cc̄ dissociation is switched on. This change occurs because of the steep rise of Splasma.

A relatively weak dependence of Splasma on pT is shown by the dotted line in Fig. 15. The dip phenomenon thus cannot be observed in the pT dependence of J/ψ number distributions. Referring back to Eqs. (2), (6), (8), (10) and (11), the exponentials there depend sensitively on the rapidity in 0 < y < 1.5. The dip is more obvious in the color singlet channel as shown by the lower solid, dashed, dot-dashed and dotted lines in Fig. 12. This is so because the cross section for the $1^3S_1$-state dissociation has a narrower peak with respect to the incident gluon energy [12] than the color octet states.

6.5. Uncertainties

Since gluon shadowing in nuclei has not been studied experimentally, theoretical estimates of the nuclear gluon shadowing factor involve uncertainties. The nuclear parton shadowing factor found in HIJING [4] is a result of the assumption that there is no Q-dependence of the shadowing factor and the shadowing effect for gluons and quarks is the same. Nevertheless, the shadowing factor has been shown by Eskola et al. to evolve with momentum Q [39]. The difference between the latter and the former indicates uncertainty. The ratio Rini defined in Eq. (26) is calculated with Eskola et al.’s parametrization [39] and results are depicted in Fig. 23 showing momentum dependence at y = 0 and Fig. 24 showing rapidity dependence at pT = 4 GeV. Compared to the dashed lines in Figs.
13-16, the change of Rini to values greater than 1 at RHIC energy is prominent. This implies that the anti-shadowing effect of Eskola et al.’s parametrization is quite important at RHIC energy. Measurements of Rini in RHIC experiments are needed to confirm this nuclear enhancement [21].

The ratio Rini is always flat within the rapidity region −1.5 < y < 1.5 for parametrizations given in HIJING and by Eskola et al., and the flatness seems to be independent of parametrizations. If the partonic system does not come into being, the ratio R is flat, too, since R = Rini. If the partonic system dissociates cc̄ pairs, the solid curve of R undergoes bulging, dipping and then bulging from y = −1.5 to y = 1.5. Any such twist of R in −1.5 < y < 1.5 observed in experiments is nontrivial, because only a deconfined medium generates it.

Upon inclusion of uncertainties on the formation and evolution of parton plasma arising from other factors, for instance, the dependence on the coupling constant αs [40] and transverse flow [41], the J/ψ number distribution and the four ratios including survival probability will change. In the partonic system considered here gluons dominate the evolution and gluon-cc̄ interactions break the pairs. In a system where quarks and antiquarks are abundant, interactions between quarks (antiquarks) and cc̄ may account for a suppression of J/ψ [42]. Additional suppression caused by energy loss of the initial state has not been considered since there is a controversy on the influence of the energy loss [43, 44]. Some uncertainties are expected to be fixed by upcoming experiments at RHIC.

7. Conclusions

We have studied J/ψ production through both color-singlet and color-octet cc̄ channels in various stages of central Au-Au collisions at both RHIC and LHC energies. In addition to the scattering processes ab → cc̄[2S+1LJ ]x, contributions of the reactions ab → cc̄[2S+1LJ ] are also calculated in the prethermal stage and thermal stage.
The effect of the medium on an expanding cc̄ involves a gluon interacting with the cc̄ to prevent it from a transition into a color singlet. Cross sections for g + cc̄ → (cc̄)8 are calculated with internal wave functions of (cc̄)1 in an attractive potential and (cc̄)8 in a repulsive potential. Furthermore, nucleon-cc̄ cross sections for color singlet, S- and P-wave color octets as a function of $\sqrt{s}$ or cc̄ radius are evaluated by assuming that the nucleon dominantly contains gluons.

Momentum and rapidity dependence of the J/ψ number distribution with various contributions are calculated for central Au-Au collisions at both RHIC and LHC energies. Color octet contributions are one order of magnitude larger than the color singlet contributions. Yields of cc̄ are large in the prethermal stage at RHIC energy and through the 2 → 1 collisions ab → cc̄[2S+1LJ ] at LHC energy. Since the partonic system offers fairly large amounts of cc̄, a bulge in 0 < y < 1.5 at RHIC energy and 0.5 < y < 1.5 at LHC energy can be observed in the rapidity dependence of the J/ψ number distribution and the ratio R of J/ψ number distributions for Au-Au collisions to nucleon-nucleon collisions. Such a bulge is a signature for the existence of a deconfined partonic medium. We suggest that RHIC and LHC experiments measure J/ψ number distributions and the ratio R in the rapidity region 0 < y < 3 to observe a bulge.

When the yield of cc̄ from the medium is larger than the reduced amount of initial production in the medium, the ratio Rplasma is larger than 1. The competition between production and suppression determines the values of Rplasma, which relies on the evolution of parton number density and temperature of the partonic system [12]. A dip in the rapidity dependence of the J/ψ number distributions at LHC energy may exist and this amounts to a suppression effect of cc̄ in the partonic system. So far, we have obtained results and conclusions for positive rapidity.
It is stressed that the same contents for negative rapidity can be obtained from the positive region by symmetry. Acknowledgements I thank the [Department of Energy’s] Institute for Nuclear Theory at the University of Washington for its hospitality and the Department of Energy for partial support during the completion of this work. I thank the Nuclear Theory Group at LBNL Berkeley for their hospitality during my visit. I also thank X.-N. Wang, C.-Y. Wong and M. Asakawa for discussions, K. J. Eskola for offering Fortran codes of nuclear parton shadowing factors, H. J. Weber for careful reading through the manuscript. This work was also supported in part by the project KJ951-A1-410 of the Chinese Academy of Sciences and the Education Bureau of Chinese Academy of Sciences. References [1]D. Kharzeev and H. Satz, Phys. Lett. B334(1994)155. [2]R. C. Hwa and K. Kajantie, Phys. Rev. Lett. 56(1986)696; J. P. Blaizot and A. H. Mueller, Nucl. Phys. B289(1987)847. [3]K. Kajantie, P. V. Landshoff, and J. Lindfors, Phys. Rev. Lett. 59 (1987)2517; K. J. Eskola, K. Kajantie, and J. Lindfors, Nucl. Phys. B323(1989)37; Phys. Lett. B214(1991)613. [4]X.-N. Wang and M. Gyulassy, Phys. Rev. D44(1991)3501; Comput. Phys. Commun. 83(1994)307. X.-N. Wang, Phys. Rep. 280(1997)287. [5]K. Geiger and B. Müller, Nucl. Phys. B369(1992)600; K. Geiger, Phys. Rev. D47(1993)133. [6]H. J. Moehring and J. Ranft, Z. Phys. C52(1991)643; P. Aurenche et al., Phys. Rev. D45(1992)92; P. Aurenche et al., Comput. Phys. Commun. 83(1994)107. [7]E. Shuryak, Phys. Rev. Lett. 68(1992)3270; L. Xiong and E.Shuryak, Phys. Rev. C49(1994)2203. [8]K. Geiger and J. I. Kapusta, Phys. Rev. D47(1993)4905. [9]T. S. Biró, E. van Doorn, B. Müller, M. H. Thoma, and X.-N. Wang, Phys. Rev. C48(1993)1275. [10]J. Alam, S. Raha and B. Sinha, Phys. Rev. Lett. 73(1994)1895. [11]H. Heiselberg and X.-N. Wang, Phys. Rev. C53(1996)1892. [12]X.-M. Xu, D. Kharzeev, H. Satz and X.-N. Wang, Phys. Rev. C53 (1996)3051. [13]T. Matsui and H. 
Satz, Phys. Lett. B178(1986)416. [14]D. Kharzeev and H. Satz, Phys. Lett. B366(1996)316. [15]D. M. Alde et al., Phys. Rev. Lett. 66(1991)133; D. M. Alde et al., Phys. Rev. Lett. 66(1991)2285; L. Antoniazzi et al., Phys. Rev. Lett. 70(1993)383; M. H. Schub et al., Phys. Rev. D52(1995)1307; T. Alexopoulos et al., Phys. Rev. D55(1997)3927 [16]F. Abe et al., CDF Collaboration, Phys. Rev. Lett. 79(1997)572,578 [17]W. E. Caswell and G. P. Lepage, Phys. Lett. B167(1986)437; G. P. Lepage, L. Magnea, C. Nakhleh, U. Magnea and K. Hornbostel, Phys. Rev. D46(1992)4052; G. T. Bodwin, E. Braaten and G. P. Lepage, Phys. Rev. D51(1995)1125. [18]P. Lévai, B. Müller and X.-N. Wang, Phys. Rev. C51(1995)3326. [19]Z. Lin and M. Gyulassy, Phys. Rev. C51(1995)2177. [20]K. J. Eskola and X.-N. Wang, Phys. Rev. D49(1994)1284. [21]Y. Akiba, in Proc. of Charmonium Production in Relativistic Nuclear Collisions, INT, Seattle, 1998, eds. B. Jacak and X.-N. Wang (World Scientific, Singapore,1998); M. Rosati, in Proc. of Charmonium Production in Relativistic Nuclear Collisions, INT, Seattle, 1998, eds. B. Jacak and X.-N. Wang (World Scientific, Singapore,1998) [22]S. Gupta and H. Satz, Z. Phys. C55(1992)391. R. C. Hwa and L. Leśniak, Phys. Lett. B295(1992)11. R. Vogt, S. J. Brodsky and P. Hoyer, Nucl Phys. B360(1991)67; K. Boreskov, A. Capella, A. Kaidalov and J. Tran Thanh Van, Phys. Rev. D47(1993)919. M. A. Braun, C. Pajares, C. A. Salgado, N. Armesto and A. Capella, Nucl. Phys. B509(1998)357. [23]P. Cho and A. K. Leibovich, Phys. Rev. D53(1996)150,6203. [24]K. Sridhar, A. D. Martin and W. J. Stirling, Phys. Lett. B438(1998)211. [25]R. Baier and R. Rückl, Z. Phys. C19(1983)251; R. Gastmans, W. Troost and T. T. Wu, Nucl. Phys. B291(1987)731. [26]M. E. Peskin, Nucl. Phys. B156(1979)365; G. Bhanot and M. E. Peskin, Nucl. Phys. B156(1979)391. [27]B. Z. Kopeliovich and B. G. Zakharov, Phys. Rev. D44(1991)3466; L. Frankfurt, G. A. Miller and M. Strikman, Phys. Lett. B304(1993)1; L. Gerland, L. 
Frankfurt, M. Strikman, H. Stöcker and W. Greiner, Phys. Rev. Lett. 81(1998)762. P. Jain, B. Pire and J. P. Ralston, Phys. Rep. 271(1996)67. [28]M. Glück, E. Reya and A. Vogt, Z. Phys. C67(1995)433. [29]R. L. Anderson, SLAC-Pub 1741(1976). [30]J. Hüfner and B. Z. Kopeliovich, Phys. Lett. B426(1998)154. [31]C. Gerschel and J. Hüfner, Z. Phys. C56(1992)171. [32]J. Dolej̆si and J. Hüfner, Z. Phys. C54(1992)489. C. W. Wong, Phys. Rev. D54(1996)R4199. [33]C.-Y. Wong and C. W. Wong, Phys. Rev. D57(1998)1838. [34]W. Cassing and E. L. Bratkovskaya, Nucl. Phys. A623(1997)570. [35]M. Beneke and M. Krämer, Phys. Rev. D55(1997)R5269. [36]J. Amundson, S. Fleming and I. Maksymyk, Phys. Rev. D56(1997)5844; T. Mehen, Phys. Rev. D55(1997)4338. [37]R. Gavai et al., Int. J. Mod. Phys. A10(1995)3043. [38]A. H. Mueller and J. Qiu, Nucl. Phys. B268(1986)427; K. J. Eskola, J. Qiu and X.-N. Wang, Phys. Rev. Lett. 72(1994)36; M. Arneodo, Phys. Rep. 240(1994)301. [39]K. J. Eskola, V. J. Kolhinen and C. A. Salgado, JYFL-8/98, US-FT/14-98, hep-ph/9807297. K. J. Eskola, V. J. Kolhinen and P. V. Ruuskanen, CERN-TH/97-345, JYFL-2/98, hep-ph/9802350. [40]S. M. H. Wong, Phys. Rev. C56(1997)1075. [41]D. K. Srivastava, M. G. Mustafa and B. Müller, Phys. Rev. C56(1997)1064. [42]R. Wittmann and U. Heinz, Z. Phys. C59(1993)77. [43]S. Gavin and J. Milana, Phys. Rev. Lett. 68(1992)1834; E. Quack and T. Kodama, Phys. Lett. B302(1993)495; R. C. Hwa, J. Pĭsút and N. Pĭsútová, Phys. Rev. C56(1997)432. [44]S. J. Brodsky and P. Hoyer, Phys. Lett. B298(1993)165. http://arxiv.org/abs/hep-ph/9807297 http://arxiv.org/abs/hep-ph/9802350 Figure 1: Solid, dashed and dot-dashed lines are nucleon-cc̄[13S 1 ] cross sections for fg/N (x,Q 2) evaluated at Q2 = ǫ20, 2ǫ 0, (Q 0)2, respectively. The (cc̄)1 has the same size as Figure 2: Cross section for nucleon-cc̄[13S 1 ] at s = 6 GeV as a function of the cc̄ radius is calculated with fg/N (x,Q 2) evaluated at Q2 = ǫ20. 
Figure 3: The solid and dashed lines are cross sections for nucleon-cc̄[S(8)] and nucleon- cc̄[P (8)] collisions as a function of s, respectively. The dot-dashed line is the nucleon-J/ψ cross section calculated with Eq. (24) in Ref. [1]. The corresponding (cc̄)8 and J/ψ have the same radius. Figure 4: The solid and dashed lines show radius dependence of cross sections for nucleon- (cc̄)8[S (8)] and nucleon-(cc̄)8[P (8)] collisions, respectively. Figure 5: J/ψ number distributions versus transverse momentum at y = 0 and RHIC energy with suppression. The dashed curve corresponds to cc̄ production in the initial collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The solid curve is the sum of all contributions. Figure 6: The same as Fig. 5, except for rapidity distribution at pT = 4 GeV. Figure 7: J/ψ number distributions versus transverse momentum at y = 0 and RHIC energy with suppression. The upper and lower dashed (dot-dashed, dotted and solid) lines correspond to cc̄ in color octet and color singlet, respectively, produced in the initial collision(prethermal stage, thermal stage and the all three stages). Figure 8: The same as Fig. 7, except for rapidity distribution at pT = 4 GeV. Figure 9: J/ψ number distributions versus transverse momentum at y = 0 and LHC energy with suppression. The dashed curve corresponds to cc̄ productions in the initial collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The solid curve is the sum of all contributions. Figure 10: The same as Fig. 9, except for rapidity distribution at pT = 4 GeV. Figure 11: J/ψ number distributions versus transverse momentum at y = 0 and LHC energy with suppression. 
The upper and lower dashed (dot-dashed, dotted and solid) lines correspond to cc̄ in color octet and color singlet, respectively, produced in the initial collision (prethermal stage, thermal stage and all three stages). Figure 12: The same as Fig. 11, except for rapidity distribution at pT = 4 GeV. Figure 13: Ratios versus transverse momentum at y = 0 and RHIC energy. The solid, dashed, dot-dashed and dotted lines are R, Rini, Rplasma and Splasma, respectively. Figure 14: The same as Fig. 13, except for rapidity distribution at pT = 4 GeV Figure 15: Ratios versus transverse momentum at y = 0 and LHC energy. The solid, dashed, dot-dashed and dotted lines are R, Rini, Rplasma and Splasma, respectively. Figure 16: The same as Fig. 15, except for rapidity distribution at pT = 4 GeV. Figure 17: J/ψ number distributions versus transverse momentum at y = 0 and RHIC energy without suppression. The dashed curve corresponds to cc̄ productions in the initial collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The solid curve is the sum of all contributions. Figure 18: The same as Fig. 17, except for rapidity distribution at pT = 4 GeV. Figure 19: J/ψ number distributions versus transverse momentum at y = 0 and LHC energy without suppression. The dashed curve corresponds to cc̄ productions in the initial collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The solid curve is the sum of all contributions. Figure 20: The same as Fig. 19, except for rapidity distribution at pT = 4 GeV. Figure 21: J/ψ number distributions versus rapidity at pT = 4 GeV in the prethermal stage of LHC energy. The dashed and solid lines individually correspond to cc̄ produced through 2 → 1 reactions with and without suppression. 
The dotted and dot-dashed lines correspond to cc̄ produced through 2 → 2 reactions with and without suppression, respectively.

Figure 22: The same as Fig. 21, except for the thermal stage.

Figure 23: Ratio Rini versus transverse momentum at y = 0 is calculated with Eskola’s parametrization.

Figure 24: The same as Fig. 23, except for rapidity distribution at pT = 4 GeV.

ABSTRACT

Any color singlet or octet cc̄ pair is created at short distances and then expands to a full size of J/ψ. Such a dynamical evolution process is included here in calculations for the J/ψ number distribution as a function of transverse momentum and rapidity in central Au-Au collisions at both RHIC and LHC energies. The cc̄ pairs are produced in the initial collision and in the partonic system during the prethermal and thermal stages through the partonic channels ab → cc̄[2S+1LJ ] and ab → cc̄[2S+1LJ ]x, and then they dissociate in the latter two stages. Dissociation of cc̄ in the medium occurs via two reactions: (a) color singlet cc̄ plus a gluon turns to color octet cc̄, (b) color octet cc̄ plus a gluon persists as color octet. There are modest yields of cc̄ in the prethermal stage at RHIC energy and through the reactions ab → cc̄[2S+1LJ ] at LHC energy for partons with large average momentum in the prethermal stage at both collider energies and in the thermal stage at LHC energy. Production from the partonic system competes with the suppression of the initial yield in the deconfined medium. Consequently, a bulge within -1.5

1 Introduction.
2 Toy model of the weak coupling limit.
2.1 Dilations of contractive semigroups.
2.2 “Toy quadratic noises”.
2.3 Weak coupling limit for Friedrichs operators.
3 Completely positive maps and semigroups.
3.1 Completely positive maps.
3.2 Completely positive semigroups.
3.3 Classical Markov semigroups.
3.4 Invariant c.p. semigroups.
3.5 Detailed Balance Condition.
4 Bosonic reservoirs.
4.1 Second quantization.
4.2 Coupling to a bosonic reservoir.
4.3 Thermal reservoirs.
5 Quantum Langevin dynamics.
5.1 Linear noises.
5.2 Quadratic noises.
5.3 Total energy operator.
6 Weak coupling limit for Pauli-Fierz operators.
6.1 Reduced weak coupling limit.
6.2 Energy of the reservoir in the weak coupling limit.
6.3 Extended weak coupling limit.

The paper is in final form and no version of it will be published elsewhere. http://arxiv.org/abs/0704.0669v2

J. DEREZIŃSKI AND W. DE ROECK

1. Introduction. Physicists often describe quantum systems by completely positive (c.p.) semigroups [Haa, AL, Al2]. It is generally believed that this approach is only a phenomenological approximation to a more fundamental description. One usually assumes that on the fundamental level the dynamics of quantum systems is unitary, more precisely, is of the form $t \mapsto e^{itH} \cdot e^{-itH}$ for some self-adjoint $H$.

One of the justifications for the use of c.p. semigroups in quantum physics is based on the so-called weak coupling limit for the reduced dynamics [VH, Da1], which we will call the reduced weak coupling limit.
One assumes that a small system is coupled to a large reservoir and the dynamics of the full system is unitary. The interaction between the small system and the reservoir is multiplied by a small coupling constant λ. Often the reservoir is described by a free Bose gas.

The basic steps of the reduced weak coupling limit are:
• Reduce the dynamics to the small system.
• Rescale time as $t/\lambda^2$.
• Subtract the dynamics of the small system.
• Consider the weak coupling λ → 0.

In the limit one obtains a dynamics given by a c.p. Markov semigroup.

Another possible justification of c.p. dynamics goes as follows. One considers the tensor product of the small system and an appropriate bosonic reservoir. On this enlarged space one constructs a certain unitary dynamics whose reduction to the small system is a c.p. semigroup. We will call it a quantum Langevin dynamics. Another name used in this context in the literature is a quantum stochastic dynamics. Its construction has a long history; let us mention [AFLe, HP, Fr, Maa].

Thus one can obtain a Markov c.p. semigroup by reducing a single unitary dynamics, without invoking a family of dynamics and taking its limit. However, the generator of a quantum Langevin dynamics equals i[Z, ·] where Z is a self-adjoint operator that does not look like a physically realistic Hamiltonian. In particular, it is unbounded from both below and above. Thus one can question the physical relevance of this construction.

REDUCED AND EXTENDED WEAK COUPLING LIMIT

It turns out, however, that one can extend the weak coupling limit in such a way that it involves not only the small system, but also the reservoir. As a result of this approach one can obtain a quantum Langevin dynamics. One can argue that this approach gives a physical justification of quantum Langevin dynamics. The above idea was first implemented in [AFL] by Accardi, Frigerio and Lu under the name of the stochastic limit (see also [ALV]).
Recently we presented our version of this approach, under the name of the extended weak coupling limit [DD1, DD2], which we believe is simpler and more natural than that of [AFL].

The basic steps of the extended weak coupling limit are:
• Introduce the so-called asymptotic space: the tensor product of the space of the small system and of the asymptotic reservoir.
• Introduce an identification operator that maps the asymptotic reservoir into the physical reservoir and rescales its energy by $\lambda^2$ around the Bohr frequencies.
• Rescale time as $t/\lambda^2$.
• Subtract the “fast degrees of freedom”.
• Consider the weak coupling λ → 0.

In the limit one obtains a quantum Langevin dynamics on the asymptotic space. Note that the asymptotic reservoir is given by a bosonic Fock space (just as the physical reservoir). Its states are however different: they correspond only to those physical bosons whose energies differ from the Bohr frequencies by at most $O(\lambda^2)$. Only such bosons survive the weak coupling limit.

Let us mention yet another scheme of deriving quantum Langevin equations that has received attention in the literature, namely the ‘repeated interaction models’ where the reservoir is continuously refreshed, see [AtP, AtJ].

In this article we review various aspects of the weak coupling limit, reduced and, especially, extended, mostly following our papers [DD1, DD2]. We also describe some background material, especially related to completely positive semigroups, quantum Langevin dynamics and the Detailed Balance Condition.

The plan of our article is as follows. In Section 2 we describe both kinds of the weak coupling limit on a class of toy examples, the so-called Friedrichs Hamiltonians and their dilations. They are less relevant physically than the main model treated in our article, the one based on Pauli-Fierz operators. Nevertheless, they illustrate some of the main ideas of this limit in a simple and mathematically instructive context. This section is based on [DD1].
In Section 3 we recall some facts about completely positive maps and semigroups, sketching proofs of the Stinespring dilation theorem [St] and of the so-called Lindblad form of the generator of a c.p. semigroup [Li, GKS]. In particular, we discuss the freedom of choosing various terms in the Lindblad form. This question, which we have not seen discussed in the literature, is relevant for the construction of quantum Langevin dynamics and the weak coupling limit.

C.p. semigroups that arise in the weak coupling limit have an additional property: they commute with the unitary dynamics generated by the Hamiltonian K of the small system; for brevity we say that they are K-invariant. If in addition the reservoir is thermal, they satisfy another special property, the so-called Detailed Balance Condition (DBC) [DF1, Ag, FKGV, Al1]. We devote a large part of Sect. 3 to an analysis of the K-invariance and the DBC. We show that the generator of a c.p. semigroup with these properties has some features that curiously resemble the Tomita-Takesaki theory and the KMS condition.

Let us note that in our article the DBC is considered jointly with the K-invariance, because c.p. semigroups obtained in the weak coupling limit always have the latter property.

In Section 4 we describe the terminology and notation that we use to describe second-quantized bosonic reservoirs interacting with a small system. In particular, we introduce Pauli-Fierz operators [DJ1], used often (also under other names) in the physics literature to describe physically realistic systems.

In Subsect. 4.3 we discuss thermal reservoirs. In our definition of a thermal reservoir at inverse temperature β one needs to check a simple condition for the interaction without explicitly invoking the concept of a KMS state on an operator algebra, or of a thermal Araki-Woods representation of the CCR [DF1, DJ1]. Of course, this condition is closely related to the KMS property.
In Section 5 we describe a construction of quantum Langevin dynamics. We include a discussion of the so-called quadratic noises, even though they are still not used in our version of the extended weak coupling limit. (See however [Go] for some partial results in the context of the formalism of [AFL].)

In Section 6 we describe the two kinds of the weak coupling limit for Pauli-Fierz operators: reduced and extended. Most of this section is based on [DD2].

2. Toy model of the weak coupling limit. This section is somewhat independent of the remaining part of the article. It explains the (reduced and extended) weak coupling limit in the setting of contractive semigroups on a Hilbert space and their unitary dilations. It gives us an opportunity to explain some of the main ideas of the weak coupling limit in a relatively simple setting. It is based on [DD1].

It is possible to construct physically interesting models based on the material of this section (e.g. by considering quadratic Hamiltonians obtained by second quantization). We will not discuss this possibility further, since in the remaining part of the article we will analyze more interesting and more realistic models.

2.1. Dilations of contractive semigroups. First let us recall the well known concept of a unitary dilation of a contractive semigroup.

Let $\mathcal{K}$ be a Hilbert space and $e^{-it\Upsilon}$ a strongly continuous contractive semigroup on $\mathcal{K}$. This implies that $-i\Upsilon$ is dissipative:

$$-i\Upsilon + i\Upsilon^* \le 0.$$

Let $\mathcal{Z}$ be a Hilbert space containing $\mathcal{K}$, $I_{\mathcal{K}}$ the embedding of $\mathcal{K}$ in $\mathcal{Z}$ and $U_t$ a unitary group on $\mathcal{Z}$. We say that $(\mathcal{Z}, I_{\mathcal{K}}, U_t)$ is a dilation of $e^{-it\Upsilon}$ iff

$$I_{\mathcal{K}}^* U_t I_{\mathcal{K}} = e^{-it\Upsilon}, \quad t \ge 0.$$

It is well known that every weakly continuous contractive semigroup possesses a unitary dilation (which is unique up to unitary equivalence if we additionally demand its minimality). The original and well known construction of a unitary dilation is due to Foias and Nagy and can be found in [NF] (see also [EL]).
Below we present another construction, which looks different from that of Foias-Nagy. Its main idea is to view the generator of a unitary dilation as a kind of a singular Friedrichs operator. (See the next section, where Friedrichs operators are introduced.) Such a definition is well adapted to the extended weak coupling limit. The construction that we present seems to be less known in the mathematics literature than that of Foias-Nagy. Nevertheless, similar constructions are scattered in the literature, especially in physics papers.

Let $\mathfrak{h}$ be an auxiliary space and $\nu \in B(\mathcal{K}, \mathfrak{h})$ satisfy

$$\frac{1}{i}(\Upsilon - \Upsilon^*) = -\nu^*\nu. \tag{2.1}$$

Note that such $\mathfrak{h}$ and $\nu$ always exist. One of the possible choices is to take $\mathfrak{h} := \mathcal{K}$ and $\nu := \sqrt{i(\Upsilon - \Upsilon^*)}$.

If $\phi$ is a vector, then $|\phi)$ will denote the operator $\mathbb{C} \ni \lambda \mapsto |\phi)\lambda := \lambda\phi$. Similarly, $(\phi|$ will denote its adjoint: $f \mapsto (\phi|f := (\phi|f) \in \mathbb{C}$. We will use a similar notation also for unbounded functionals. For instance, $(1|$ will denote the (unbounded) linear functional on $L^2(\mathbb{R})$ given by

$$(1|f = \int f(x)\,dx \tag{2.2}$$

with the domain $L^2(\mathbb{R}) \cap L^1(\mathbb{R})$. $|1)$ will denote the hermitian conjugate of $(1|$ in the sense of sesquilinear forms: if $f \in L^2(\mathbb{R}) \cap L^1(\mathbb{R})$, then $(f|1) := \int \bar{f}(x)\,dx$.

Let $Z_R$ be the operator of multiplication on $L^2(\mathbb{R})$ by the variable $x$. Introduce the Hilbert spaces $\mathcal{Z}_R := \mathfrak{h} \otimes L^2(\mathbb{R})$ and $\mathcal{Z} := \mathcal{K} \oplus \mathcal{Z}_R$. Clearly, $\mathcal{K}$ is contained in $\mathcal{Z}$, so we have the obvious embedding $I_{\mathcal{K}} : \mathcal{K} \to \mathcal{Z}$. We also have the embedding $I_R : \mathcal{Z}_R \to \mathcal{Z}$.

For $t \ge 0$, consider the following sesquilinear form on $\mathcal{K} \oplus (\mathfrak{h} \otimes (L^2(\mathbb{R}) \cap L^1(\mathbb{R})))$:

$$\begin{aligned}
U_t = {} & I_R e^{-itZ_R} I_R^* + I_{\mathcal{K}} e^{-it\Upsilon} I_{\mathcal{K}}^* \\
& - i(2\pi)^{-1/2} I_{\mathcal{K}} \int_0^t du\, e^{-i(t-u)\Upsilon} \nu^* \otimes (1| e^{-iuZ_R} I_R^* \\
& - i(2\pi)^{-1/2} I_R \int_0^t du\, e^{-i(t-u)Z_R} \nu \otimes |1) e^{-iu\Upsilon} I_{\mathcal{K}}^* \\
& - (2\pi)^{-1} I_R \int_{0 \le u_1, u_2,\ u_1 + u_2 \le t} du_1\, du_2\, e^{-iu_2 Z_R} \nu \otimes |1) e^{-i(t-u_2-u_1)\Upsilon} \nu^* \otimes (1| e^{-iu_1 Z_R} I_R^*.
\end{aligned} \tag{2.3}$$

By a straightforward computation we obtain [DD1]

Theorem 2.1. The form $U_t$ extends to a strongly continuous unitary group and

$$I_{\mathcal{K}}^* U_t I_{\mathcal{K}} = e^{-it\Upsilon}, \quad t \ge 0.$$

Thus $(\mathcal{Z}, I_{\mathcal{K}}, U_t)$ is a dilation of $e^{-it\Upsilon}$.

Let $-iZ$ denote the generator of $U_t$, so that $U_t = e^{-itZ}$.
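The dilation property of Theorem 2.1 can be illustrated numerically in the simplest case. The following is a minimal sketch under assumptions that are not part of the text: the small space is one-dimensional with the made-up value Υ = ω − iγ/2, the auxiliary space is one-dimensional, and L2(R) is replaced by a cut-off, discretized interval [−r, r], so that the singular generator becomes a finite Hermitian matrix with a real diagonal entry (Υ + Υ∗)/2 on the small space and a constant coupling to the multiplication operator. All names (`omega`, `gamma`, `r`, `n_cells`, `compressed_evolution`) are hypothetical.

```python
import numpy as np

# Hypothetical toy data: one-dimensional small space and auxiliary space.
omega, gamma = 1.0, 1.0
upsilon = omega - 0.5j * gamma                      # -i*Upsilon is dissipative
# nu is fixed by nu^* nu = i (Upsilon - Upsilon^*), which equals gamma here.
nu = np.sqrt((1j * (upsilon - np.conj(upsilon))).real)

# Cut-off, discretized stand-in for L^2(R): midpoints of n_cells cells in [-r, r].
r, n_cells = 50.0, 2000
dx = 2.0 * r / n_cells
x = -r + dx * (np.arange(n_cells) + 0.5)

# In the orthonormal basis of normalized cell indicator functions, the
# functional (1| has components sqrt(dx), so the coupling (2*pi)^(-1/2) nu (1|
# becomes the constant row below.
coupling = nu * np.sqrt(dx / (2.0 * np.pi))
z_matrix = np.zeros((n_cells + 1, n_cells + 1))
z_matrix[0, 0] = upsilon.real                        # (Upsilon + Upsilon^*) / 2
z_matrix[0, 1:] = coupling
z_matrix[1:, 0] = coupling
z_matrix[1:, 1:] = np.diag(x)                        # multiplication by x

energies, vectors = np.linalg.eigh(z_matrix)

def compressed_evolution(t):
    """Compression of exp(-i t Z) to the small space for the discretized model."""
    weights = np.abs(vectors[0, :]) ** 2
    return np.sum(weights * np.exp(-1j * t * energies))

times = [0.5, 1.0, 2.0]
errors = [abs(compressed_evolution(t) - np.exp(-1j * t * upsilon)) for t in times]
```

In this sketch the compressed unitary evolution is expected to agree with the contraction e^{−itΥ} only approximately, up to errors controlled by the cut-off r and the cell size; the symmetric interval [−r, r] is a deliberate choice of the sketch.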
$Z$ is a self-adjoint operator with a number of interesting properties. It is not easy to describe it with a well-defined formula. Formally it is given by the sesquilinear form

$$Z = \begin{bmatrix} \frac{1}{2}(\Upsilon + \Upsilon^*) & (2\pi)^{-1/2} \nu^* \otimes (1| \\ (2\pi)^{-1/2} \nu \otimes |1) & Z_R \end{bmatrix}. \tag{2.4}$$

Note that (2.4) looks like a special case of a Friedrichs operator (see Subsection 2.3 and [DF2]). As it stands, however, (2.4) does not define a unique self-adjoint operator. Nevertheless, we will sometimes use the expression (2.4) when referring to $Z$.

Note that it is possible to give a compact formula for the resolvent of $Z$ (which is another possible method of defining $Z$). For $z \in \mathbb{C}_+$,

$$\begin{aligned}
(z - Z)^{-1} := {} & I_R (z - Z_R)^{-1} I_R^* + I_{\mathcal{K}} (z - \Upsilon)^{-1} I_{\mathcal{K}}^* \\
& + (2\pi)^{-1/2} I_{\mathcal{K}} (z - \Upsilon)^{-1} \nu^* \otimes (1| (z - Z_R)^{-1} I_R^* \\
& + (2\pi)^{-1/2} I_R (z - Z_R)^{-1} \nu \otimes |1) (z - \Upsilon)^{-1} I_{\mathcal{K}}^* \\
& + (2\pi)^{-1} I_R (z - Z_R)^{-1} \nu \otimes |1) (z - \Upsilon)^{-1} \nu^* \otimes (1| (z - Z_R)^{-1} I_R^*.
\end{aligned}$$

Yet another approach that allows one to define $Z$ involves a “cut-off procedure”. In fact, $Z$ is the norm resolvent limit for $r \to \infty$ of the following regularized operators:

$$Z_r := \begin{bmatrix} \frac{1}{2}(\Upsilon + \Upsilon^*) & (2\pi)^{-1/2} \nu^* \otimes (1| 1_{[-r,r]}(Z_R) \\ (2\pi)^{-1/2} \nu \otimes 1_{[-r,r]}(Z_R) |1) & 1_{[-r,r]}(Z_R) Z_R \end{bmatrix}.$$

Note that it is important to remove the cut-off in a symmetric way. If we replace $[-r, r]$ with $[-r, ar]$ we usually obtain a different operator. The convergence of $Z_r$ to $Z$ is the reason why we can treat (2.4) as the formal expression for $Z$.

Next, let us mention a certain invariance property of $Z$. For $\lambda \in \mathbb{R}$, introduce the following unitary operator on $\mathcal{Z}$:

$$j_\lambda u = u, \quad u \in \mathcal{K}; \qquad j_\lambda g(y) := \lambda^{-1} g(\lambda^{-2} y), \quad g \in \mathcal{Z}_R.$$

Note that

$$j_\lambda^* Z_R j_\lambda = \lambda^2 Z_R, \qquad j_\lambda^* |1) = \lambda |1).$$

Therefore, the operator $Z$ is invariant with respect to the following scaling:

$$Z = \lambda^{-2} j_\lambda^* \begin{bmatrix} \frac{\lambda^2}{2}(\Upsilon + \Upsilon^*) & \lambda (2\pi)^{-1/2} \nu^* \otimes (1| \\ \lambda (2\pi)^{-1/2} \nu \otimes |1) & Z_R \end{bmatrix} j_\lambda. \tag{2.5}$$

Equation (2.5) will play an important role in the extended weak coupling limit.

Note that in the weak coupling limit it is convenient to use the representation of $Z_R$ as a multiplication operator. Another natural possibility is to represent it as the differentiation operator. Let us describe this alternative version of the dilation.
The (unitary) Fourier transformation on $\mathfrak{h} \otimes L^2(\mathbb{R})$ will be denoted as follows:

$$\mathcal{F}f(\tau) := (2\pi)^{-1/2} \int f(x) e^{-i\tau x}\,dx. \tag{2.6}$$

We will use $\tau$ as the generic variable after the application of $\mathcal{F}$. The operator $Z$ transformed by $1_{\mathcal{K}} \oplus \mathcal{F}$ will be denoted

$$\hat{Z} := (1_{\mathcal{K}} \oplus \mathcal{F}) Z (1_{\mathcal{K}} \oplus \mathcal{F}^*). \tag{2.7}$$

Introduce

$$D_\tau := \frac{1}{i}\partial_\tau. \tag{2.8}$$

Let $(\delta_0|$ have the meaning of an (unbounded) linear functional on $L^2(\mathbb{R})$ with the domain, say, the first Sobolev space $H^1(\mathbb{R})$, such that

$$(\delta_0|f) := f(0). \tag{2.9}$$

Similarly, let $|\delta_0)$ be its hermitian adjoint in the sense of forms. By applying the Fourier transform to (2.4), we can write

$$\hat{Z} = \begin{bmatrix} \frac{1}{2}(\Upsilon + \Upsilon^*) & \nu^* \otimes (\delta_0| \\ \nu \otimes |\delta_0) & D_\tau \end{bmatrix}. \tag{2.10}$$

Clearly, $e^{-it\hat{Z}}$ is also a dilation of $e^{-it\Upsilon}$.

The operator $\hat{Z}$ (or $Z$) and the unitary group it generates have a number of curious and confusing properties. Let us describe one of them. Consider the space $\mathcal{D} := \mathcal{K} \oplus (\mathfrak{h} \otimes H^1(\mathbb{R}))$. Clearly, it is a dense subspace of $\mathcal{Z}$. Let us define the following quadratic form on $\mathcal{D}$:

$$\hat{Z}^+ := \begin{bmatrix} \Upsilon & \nu^* \otimes (\delta_0| \\ \nu \otimes |\delta_0) & D_\tau \end{bmatrix}. \tag{2.11}$$

Then, for $\psi, \psi' \in \mathcal{D}$,

$$\lim_{t \downarrow 0} \frac{1}{t} (\psi|(e^{-it\hat{Z}} - 1)\psi') = -i(\psi|\hat{Z}^+\psi'). \tag{2.12}$$

One could think that $\hat{Z}^+ = \hat{Z}$. But $\hat{Z}^+$ is in general non-self-adjoint, which is incompatible with the fact that $e^{-it\hat{Z}}$ is a unitary group. To explain this paradox we notice that $(\psi|e^{-it\hat{Z}}\psi')$ is in general not differentiable at zero: its right and left derivatives exist but are different. Hence $\mathcal{D}$ is not contained in the domain of $\hat{Z}$. We will call $\hat{Z}^+$ the false form of the generator of $e^{-it\hat{Z}}$.

In order to make an even closer contact with the usual form of the quantum Langevin equation [HP, At, Fa, Bar, Me], define the cocycle unitary

$$\hat{W}(t) := e^{itD_\tau} e^{-it\hat{Z}}. \tag{2.13}$$

Then for $t > 0$, or for $t = 0$ and the right derivative, we have the “toy Langevin (stochastic) equation” which holds in the sense of quadratic forms on $\mathcal{D}$,

$$i\frac{d}{dt}\hat{W}(t) = (\Upsilon + \nu \otimes |\delta_t))\hat{W}(t) + \nu^* \otimes (\delta_t|. \tag{2.14}$$

2.2. “Toy quadratic noises”. The formula for $\hat{Z}$ or for $\hat{Z}^+$ has one interesting feature: it involves a perturbation that is localized just at $\tau = 0$.
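One can ask whether one can consider other dilations with more general perturbations localized at $\tau=0$. In this subsection we will describe such dilations. This construction will not be needed in the present version of the weak coupling limit. We believe it is an interesting “toy version” of “quadratic noises”, which we will discuss in Subsect. 5.2. We also expect to extend the results of [DD1] to “toy quadratic noises”.

Clearly, for any unitary operator $U$ on $\mathfrak h\otimes L^2(\mathbb R)$, $(1_{\mathcal K}\oplus U)\,e^{-itZ}\,(1_{\mathcal K}\oplus U^*)$ is a dilation of $e^{-it\Upsilon}$. Let us choose a special $U$, which will lead to a perturbation localized at $\tau=0$. Let $S$ be a unitary operator on $\mathfrak h$. For $\psi\in\mathfrak h\otimes L^2(\mathbb R)\simeq L^2(\mathbb R,\mathfrak h)$ we set
\[
\gamma(S)\psi(\tau) := \begin{cases} S\psi(\tau), & \tau>0, \\ \psi(\tau), & \tau\le0. \end{cases} \tag{2.15}
\]
Then $\gamma(S)$ is a unitary operator on $\mathfrak h\otimes L^2(\mathbb R)$. Set
\[
\hat Z_S := (1_{\mathcal K}\oplus\gamma(S)^*)\,\hat Z\,(1_{\mathcal K}\oplus\gamma(S)).
\]
Clearly, $e^{-it\hat Z_S}$ is a dilation of $e^{-it\Upsilon}$. It is awkward to write down a formula for $\hat Z_S$ in the matrix form, even just formally. It is more natural to write down the “false form of $\hat Z_S$”:
\[
\hat Z_S^+ := (1_{\mathcal K}\oplus\gamma(S)^*)\,\hat Z^+\,(1_{\mathcal K}\oplus\gamma(S)) = \begin{bmatrix} \Upsilon & \nu^*S\otimes(\delta_0| \\ \nu\otimes|\delta_0) & D_\tau + i(1-S)\otimes|\delta_0)(\delta_0| \end{bmatrix}.
\]
For $\psi,\psi'\in\mathcal D$ we have
\[
\lim_{t\downarrow0}\frac1t\big(\psi\big|(e^{-it\hat Z_S}-1)\psi'\big) = -i(\psi|\hat Z_S^+\psi'). \tag{2.16}
\]
Again, as in (2.14), one can extend this formula to derivatives at $t>0$. Let
\[
\hat W_S(t) := e^{itD_\tau}e^{-it\hat Z_S}; \tag{2.17}
\]
then, in the sense of quadratic forms on $\mathcal D$,
\[
i\frac{d}{dt}\hat W_S(t) = \big(\Upsilon+\nu\otimes|\delta_t)\big)\hat W_S(t) + \nu^*S\otimes(\delta_t| + i(1-S)\otimes|\delta_t)(\delta_t|. \tag{2.18}
\]

2.3. Weak coupling limit for Friedrichs operators. Let $\mathcal H := \mathcal K\oplus\mathcal H_{\mathrm R}$ be a Hilbert space, where $\mathcal K$ is finite dimensional. Let $I_{\mathcal K}$ be the embedding of $\mathcal K$ in $\mathcal H$. Let $K$ be a self-adjoint operator on $\mathcal K$ and $H_{\mathrm R}$ be a self-adjoint operator on $\mathcal H_{\mathrm R}$. Let $V$ be a linear operator from $\mathcal K$ to $\mathcal H_{\mathrm R}$. The following class of operators will be called Friedrichs operators:
\[
H_\lambda := \begin{bmatrix} K & \lambda V^* \\ \lambda V & H_{\mathrm R} \end{bmatrix}.
\]
Assume that $\int_0^\infty\|V^*e^{-itH_{\mathrm R}}V\|\,dt<\infty$.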
Then we can define the following operator, sometimes called the Level Shift Operator, since it describes the shift of eigenvalues of $H_\lambda$ in perturbation theory at the 2nd order in $\lambda$:
\[
\Upsilon := -i\sum_{k\in\mathrm{sp}K}\int_0^\infty 1_k(K)\,V^*e^{-it(H_{\mathrm R}-k)}V\,1_k(K)\,dt, \tag{2.19}
\]
where $1_k(K)$ denotes the spectral projection of $K$ onto the eigenvalue $k$ and $\mathrm{sp}K$ denotes the spectrum of $K$. Note that $\Upsilon K = K\Upsilon$.

The following theorem is a special case of a result of Davies [Da1, Da2, Da3], see also [DD1]:

Theorem 2.2 (Reduced weak coupling limit for Friedrichs operators).
\[
\lim_{\lambda\searrow0} e^{itK/\lambda^2}\,I_{\mathcal K}^*\,e^{-itH_\lambda/\lambda^2}\,I_{\mathcal K} = e^{-it\Upsilon}.
\]

In order to study the extended weak coupling limit for Friedrichs operators we need to make additional assumptions. They are perhaps a little complicated to state, but they are satisfied in many concrete situations.

Assumption 2.3. We suppose that for any $k\in\mathrm{sp}K$ there exist an open $I_k\subset\mathbb R$ and a Hilbert space $\mathfrak h_k$ such that $k\in I_k$,
\[
\mathrm{Ran}\,1_{I_k}(H_{\mathrm R}) \simeq \mathfrak h_k\otimes L^2(I_k,dx),
\]
$1_{I_k}(H_{\mathrm R})H_{\mathrm R}$ is the multiplication operator by the variable $x\in I_k$ and
\[
1_{I_k}(H_{\mathrm R})V \simeq \int_{I_k}^{\oplus} v(x)\,dx.
\]
We assume that the $I_k$ are disjoint for distinct $k$ and that the measurable function $I_k\ni x\mapsto v(x)\in B(\mathcal K,\mathfrak h_k)$ is continuous at $k$.

In other words, we assume that the reservoir Hamiltonian $H_{\mathrm R}$ and the interaction $V$ are “nice” around the spectrum of $K$. In fact, in the extended weak coupling limit only a vicinity of $\mathrm{sp}K$ matters. We set $\mathfrak h := \oplus_k\mathfrak h_k$, $\mathcal Z_{\mathrm R} := \mathfrak h\otimes L^2(\mathbb R)$ and $\mathcal Z := \mathcal K\oplus\mathcal Z_{\mathrm R}$. $\mathcal Z_{\mathrm R}$ and $\mathcal Z$ are the so-called asymptotic spaces, which are in general different from the physical spaces $\mathcal H_{\mathrm R}$ and $\mathcal H$.

Next, let us describe the asymptotic dynamics. Let $\nu:\mathcal K\to\mathfrak h$ be defined as
\[
\nu := (2\pi)^{\frac12}\sum_{k\in\mathrm{sp}K} v(k)\,1_k(K).
\]
Note that it satisfies (2.1) with $\Upsilon$ defined by (2.19). This follows by extending the integration in (2.19) to $\mathbb R$ and using the inverse Fourier transform. As before, we set $Z_{\mathrm R}$ to be the multiplication by $x$ on $L^2(\mathbb R)$ and we define $e^{-itZ}$ by (2.3), so that $(\mathcal Z, I_{\mathcal K}, e^{-itZ})$ is a dilation of $e^{-it\Upsilon}$.

Finally, we need an identification operator that maps the asymptotic space into the physical space.
This is the least canonical part of the construction. In fact, there is some arbitrariness in its definition for the frequencies away from $\mathrm{sp}K$. For $\lambda>0$, we define the family of partial isometries $J_{\lambda,k}: L^2(\mathbb R,\mathfrak h_k)\to L^2(I_k,\mathfrak h_k)\subset\mathcal H$:
\[
(J_{\lambda,k}g_k)(y) = \begin{cases} \frac1\lambda\, g_k\!\left(\frac{y-k}{\lambda^2}\right), & \text{if } y\in I_k; \\ 0, & \text{if } y\in\mathbb R\setminus I_k. \end{cases}
\]
We set $J_\lambda:\mathcal Z\to\mathcal H$, defined for $g=(g_k)\in\mathcal Z_{\mathrm R}$ by
\[
J_\lambda g := \sum_k J_{\lambda,k}g_k,
\]
and on $\mathcal K$ equal to the identity. Note that the $J_\lambda$ are partial isometries and
\[
\mathrm{s}\text{-}\lim_{\lambda\searrow0} J_\lambda^*J_\lambda = 1.
\]
The following result is proven in [DD1]:

Theorem 2.4 (Extended weak coupling limit for Friedrichs operators).
\[
\mathrm{s}^*\text{-}\lim_{\lambda\searrow0} J_\lambda^*\,e^{i\lambda^{-2}tH_0}\,e^{-i\lambda^{-2}(t-t_0)H_\lambda}\,e^{-i\lambda^{-2}t_0H_0}\,J_\lambda = e^{itZ_{\mathrm R}}\,e^{-i(t-t_0)Z}\,e^{-it_0Z_{\mathrm R}}.
\]
Here we used the strong* limit: $\mathrm{s}^*\text{-}\lim_{\lambda\searrow0}A_\lambda = A$ means that for any vector $\psi$ we have $\lim_{\lambda\searrow0}A_\lambda\psi = A\psi$ and $\lim_{\lambda\searrow0}A_\lambda^*\psi = A^*\psi$.

Note that in the extended weak coupling limit for Friedrichs operators the asymptotic space is a direct sum of parts belonging to various eigenvalues of $K$ that “do not talk to one another”, that is, they have independent asymptotic dynamics.

3. Completely positive maps and semigroups. This section presents basic material about completely positive maps and semigroups. In particular, we describe a construction of the Stinespring dilation [St] and of the so-called Lindblad form of the generator of a c.p. semigroup [Li, GKS]. These beautiful classic results are described in many places in the literature. Nevertheless, some of their aspects, mostly concerning the freedom of choice of various terms in the Lindblad form, are difficult to find in the literature. Therefore, we describe this material at length, including sketches of proofs.

In Subsect. 3.3 we recall the usual concept of a (classical) Markov semigroup (on a finite state space). When discussing c.p. (quantum) Markov semigroups, it is useful to compare them with their classical analogs, which are usually much simpler.

In Subsect. 3.4 we discuss c.p. semigroups invariant with respect to a certain unitary dynamics. Such c.p.
semigroups arise in the weak coupling limit; therefore, one can argue that they are “more physical than others”.

Finally, in Subsect. 3.5 we analyze the Detailed Balance Condition, which singles out c.p. dynamics obtained from a thermal reservoir.

3.1. Completely positive maps. Let $\mathcal K_1,\mathcal K_2$ be Hilbert spaces. We say that a map $\Xi: B(\mathcal K_1)\to B(\mathcal K_2)$ is positive iff $A\ge0$ implies $\Xi(A)\ge0$. We say that $\Xi$ is Markov iff $\Xi(1)=1$. We say that a map $\Xi$ is $n$-positive iff
\[
\Xi\otimes\mathrm{id}: B(\mathcal K_1\otimes\mathbb C^n)\to B(\mathcal K_2\otimes\mathbb C^n)
\]
is positive ($\mathrm{id}$ denotes the identity). We say that $\Xi$ is completely positive, or c.p. for short, iff it is $n$-positive for any $n$.

Let $\mathfrak h$ be a Hilbert space and $\nu\in B(\mathcal K_2,\mathcal K_1\otimes\mathfrak h)$. It is easy to see that
\[
\Xi(A) := \nu^*\,A\otimes1\,\nu \tag{3.1}
\]
is c.p. The following theorem says that the above representation of a c.p. map is universal. Part 2) says that this representation is unique up to a unitary isomorphism.

Theorem 3.1 (Stinespring). Assume that $\mathcal K_1,\mathcal K_2$ are finite dimensional.
1) If $\Xi$ is c.p. from $B(\mathcal K_1)$ to $B(\mathcal K_2)$, then there exist a Hilbert space $\mathfrak h$ and $\nu\in B(\mathcal K_2,\mathcal K_1\otimes\mathfrak h)$ such that (3.1) is true and
\[
\{(\phi|\otimes1_{\mathfrak h}\,\nu\psi \ :\ \phi\in\mathcal K_1,\ \psi\in\mathcal K_2\} = \mathfrak h. \tag{3.2}
\]
2) If in addition $\mathfrak h'$ and $\nu'$ also satisfy the above properties, then there exists a unique unitary operator $U$ from $\mathfrak h$ to $\mathfrak h'$ such that
\[
\nu' = 1_{\mathcal K_1}\otimes U\,\nu.
\]

The right hand side of (3.1) is called a Stinespring dilation of the c.p. map $\Xi$. If the condition (3.2) holds, then it is called minimal.

Remark 3.2. If we choose a basis in $\mathfrak h$, so that we identify $\mathfrak h$ with $\mathbb C^n$, then we can identify $\nu$ with $\nu_1,\dots,\nu_n\in B(\mathcal K_2,\mathcal K_1)$. Then we can rewrite (3.1) as
\[
\Xi(A) = \sum_{j=1}^n\nu_j^*A\nu_j. \tag{3.3}
\]
In the literature, (3.3) is called a Kraus decomposition, even though the work of Stinespring is much earlier than that of Kraus. Note that physically the space $\mathfrak h$ can be interpreted as a part of the reservoir that directly interacts with the small system.

Proof of Theorem 3.1. Let us prove 1).
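We equip the algebraic tensor product $\mathcal H_0 := B(\mathcal K_1)\otimes\mathcal K_2$ with the following scalar product: for
\[
\tilde v = \sum_i X_i\otimes v_i,\qquad \tilde w = \sum_i Y_i\otimes w_i\ \in\mathcal H_0
\]
we set
\[
(\tilde v|\tilde w) = \sum_{i,j}(v_i|\Xi(X_i^*Y_j)w_j).
\]
By the complete positivity, it is positive semidefinite. Next we note that there exists a unique linear map $\pi_0: B(\mathcal K_1)\to B(\mathcal H_0)$ satisfying
\[
\pi_0(A)\tilde v := \sum_i AX_i\otimes v_i.
\]
We check that
\[
(\pi_0(A)\tilde v|\pi_0(A)\tilde v)\le\|A\|^2(\tilde v|\tilde v),\qquad \pi_0(AB)=\pi_0(A)\pi_0(B),\qquad \pi_0(A^*)=\pi_0(A)^*.
\]
Let $\mathcal N$ be the set of $\tilde v\in\mathcal H_0$ with $(\tilde v|\tilde v)=0$. Then the completion of $\mathcal H := \mathcal H_0/\mathcal N$ is a Hilbert space. There exists a nondegenerate $*$-representation $\pi$ of $B(\mathcal K_1)$ in $B(\mathcal H)$ such that
\[
\pi(A)(\tilde v+\mathcal N) = \pi_0(A)\tilde v+\mathcal N.
\]
Using the fact that all our spaces are finite dimensional, we see that for some Hilbert space $\mathfrak h$ we can identify $\mathcal H$ with $\mathcal K_1\otimes\mathfrak h$ and $\pi(A) = A\otimes1$. We set
\[
\nu v := 1\otimes v+\mathcal N.
\]
We check that
\[
\Xi(A) = \nu^*\,A\otimes1\,\nu.
\]
This ends the proof of the existence of the Stinespring dilation.

Let us now prove 2). If $\mathfrak h',\nu'$ is another pair that gives a Stinespring dilation, we check that
\[
\Big\|\sum_i X_i\otimes1_{\mathfrak h}\,\nu v_i\Big\| = \Big\|\sum_i X_i\otimes1_{\mathfrak h'}\,\nu' v_i\Big\|.
\]
Therefore, there exists a unitary $U_0:\mathcal K_1\otimes\mathfrak h\to\mathcal K_1\otimes\mathfrak h'$ such that $U_0\nu = \nu'$. We check that $U_0\,A\otimes1_{\mathfrak h} = A\otimes1_{\mathfrak h'}\,U_0$. Therefore, there exists a unitary $U:\mathfrak h\to\mathfrak h'$ such that $U_0 = 1\otimes U$.

We will need the following inequality for c.p. maps:

Theorem 3.3 (Kadison-Schwarz inequality for c.p. maps). If $\Xi$ is 2-positive and $\Xi(1)$ is invertible, then
\[
\Xi(A)^*\,\Xi(1)^{-1}\,\Xi(A)\le\Xi(A^*A). \tag{3.4}
\]
Proof. Let $z\in\mathbb C$. Then
\[
\begin{bmatrix} A^*A & zA^* \\ \bar zA & |z|^2 \end{bmatrix}\ge0
\quad\text{implies}\quad
\begin{bmatrix} \Xi(A^*A) & z\,\Xi(A^*) \\ \bar z\,\Xi(A) & |z|^2\,\Xi(1) \end{bmatrix}\ge0.
\]
Hence, for $\phi,\psi\in\mathcal K$,
\[
(\phi|\Xi(A^*A)\phi) + 2\,\mathrm{Re}\,\bar z\,(\psi|\Xi(1)^{-1/2}\Xi(A)\phi) + |z|^2(\psi|\psi)\ge0. \tag{3.5}
\]
Therefore,
\[
(\phi|\Xi(A^*A)\phi)\,(\psi|\psi)\ge|(\psi|\Xi(1)^{-1/2}\Xi(A)\phi)|^2, \tag{3.6}
\]
which implies (3.4).

3.2. Completely positive semigroups. Let $\mathcal K$ be a finite dimensional Hilbert space. Let us consider a c.p. semigroup on $B(\mathcal K)$. We will always assume the semigroup to be continuous, so that it can be written as $e^{tM}$ for a bounded operator $M$ on $B(\mathcal K)$. We will call $e^{tM}$ Markov if it preserves the identity. C.p. Markov semigroups appear in the literature under various names.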
Among them let us mention quantum Markov semigroups and quantum dynamical semigroups.

If $M_1,M_2$ are the generators of (Markov) c.p. semigroups and $c_1,c_2\ge0$, then $c_1M_1+c_2M_2$ is the generator of a (Markov) c.p. semigroup. This follows by the Trotter formula.

Here are two classes of examples of c.p. semigroups:
1) Let $\Upsilon = \Theta+i\Delta$ be an operator on $\mathcal K$, with $\Theta,\Delta$ self-adjoint. Then
\[
M(A) := i\Upsilon A - iA\Upsilon^* = i[\Theta,A]-[\Delta,A]_+
\]
is the generator of a c.p. semigroup and
\[
e^{tM}(A) = e^{it\Upsilon}Ae^{-it\Upsilon^*}.
\]
2) Let $\Xi$ be a c.p. map on $B(\mathcal K)$. Then it is the generator of a c.p. semigroup and
\[
e^{t\Xi}(A) = \sum_{j=0}^\infty\frac{t^j}{j!}\,\Xi^j(A).
\]

Let $\Theta$, $\Delta$ be self-adjoint operators on $\mathcal K$. Let $\mathfrak h$ be an auxiliary Hilbert space and $\nu\in B(\mathcal K,\mathcal K\otimes\mathfrak h)$. Then it follows from what we wrote above that
\[
M(A) = i[\Theta,A]-[\Delta,A]_+ + \nu^*\,A\otimes1\,\nu,\qquad A\in B(\mathcal K), \tag{3.7}
\]
is the generator of a c.p. semigroup. $e^{tM}$ is Markov iff $2\Delta = \nu^*\nu$.

The following theorem gives a complete characterization of generators of c.p. semigroups on a finite dimensional space [Li, GKS].

Theorem 3.4 (Lindblad, Gorini-Kossakowski-Sudarshan).
1) Let $e^{tM}$ be a c.p. semigroup on $B(\mathcal K)$ for a finite dimensional Hilbert space $\mathcal K$. Then there exist self-adjoint operators $\Theta$, $\Delta$ on $\mathcal K$, an auxiliary Hilbert space $\mathfrak h$ and an operator $\nu\in B(\mathcal K,\mathcal K\otimes\mathfrak h)$ such that $M$ can be written in the form (3.7) and
\[
\{(\phi|\otimes1\,\nu\psi\ :\ \phi,\psi\in\mathcal K\} = \mathfrak h. \tag{3.8}
\]
2) We can always choose $\Theta$ and $\nu$ so that
\[
\mathrm{Tr}\,\Theta = 0,\qquad \mathrm{Tr}\,\nu = 0.
\]
(Above, we take the trace of $\nu$ on the space $\mathcal K$, obtaining a vector in $\mathfrak h$.) If this is the case, then $\Theta$ and $\Delta$ are determined uniquely, and $\nu$ is determined uniquely up to unitary equivalence.

We will say that a c.p. semigroup is purely dissipative if $\Theta = 0$. We will call (3.7) a Lindblad form of $M$. We will say that it is minimal iff (3.8) holds.

Remark 3.5. If we identify $\mathfrak h$ with $\mathbb C^n$, then we can write
\[
\nu^*\,A\otimes1\,\nu = \sum_{j=1}^n\nu_j^*A\nu_j.
\]
Then $\mathrm{Tr}\,\nu = 0$ means $\mathrm{Tr}\,\nu_j = 0$, $j=1,\dots,n$.

Proof of Theorem 3.4. Let us prove 1). The unitary group on $\mathcal K$, denoted $U(\mathcal K)$, is compact.
Therefore, there exists the Haar measure on $U(\mathcal K)$, which we denote $dU$. Note that
\[
\int UXU^*\,dU = \mathrm{Tr}\,X.
\]
Define
\[
i\Theta-\Delta_0 := \int M(U^*)U\,dU,
\]
where $\Theta$ and $\Delta_0$ are self-adjoint. Let us show that
\[
\int M(XU^*)U\,dU = (i\Theta-\Delta_0)X. \tag{3.9}
\]
First check this identity for unitary $X$, which follows by the invariance of the measure $dU$. But every operator is a linear combination of unitaries, so (3.9) follows in general.

We can apply the Kadison-Schwarz inequality to the semigroup $e^{tM}$:
\[
e^{tM}(X)^*\,e^{tM}(1)^{-1}\,e^{tM}(X)\le e^{tM}(X^*X). \tag{3.10}
\]
Differentiating (3.10) at $t=0$ yields
\[
M(X^*X)+X^*M(1)X-M(X^*)X-X^*M(X)\ge0. \tag{3.11}
\]
Replacing $X$ with $UX$, where $U$ is unitary, we obtain
\[
M(X^*X)+X^*U^*M(1)UX-M(X^*U^*)UX-X^*U^*M(UX)\ge0. \tag{3.12}
\]
Integrating (3.12) over $U(\mathcal K)$ we get
\[
M(X^*X)+X^*X\,\mathrm{Tr}\,M(1)-(i\Theta-\Delta_0)X^*X-X^*X(-i\Theta-\Delta_0)\ge0. \tag{3.13}
\]
Define
\[
\Delta_1 := \Delta_0+\tfrac12\mathrm{Tr}\,M(1),\qquad \Xi(A) := M(A)-(i\Theta-\Delta_1)A-A(-i\Theta-\Delta_1).
\]
Using (3.13) we see that $\Xi$ is positive. A straightforward extension of the above argument shows that $\Xi$ is also completely positive. Hence, by Theorem 3.1 1), it can be written as
\[
\Xi(A) = \nu_1^*\,A\otimes1\,\nu_1,
\]
for some auxiliary Hilbert space $\mathfrak h$ and a map $\nu_1:\mathcal K\to\mathcal K\otimes\mathfrak h$.

Finally, let us prove 2). The operator $\Theta$ has trace zero, because
\[
i\,\mathrm{Tr}\,\Theta-\mathrm{Tr}\,\Delta_0 = \iint U_1M(U^*)UU_1^*\,dU\,dU_1 = \iint U_2UM(U^*)U_2^*\,dU\,dU_2 = -i\,\mathrm{Tr}\,\Theta-\mathrm{Tr}\,\Delta_0.
\]
Let $w$ be an arbitrary vector in $\mathfrak h$ and
\[
\Delta := \Delta_1+\nu_1^*\,1\otimes|w)+\tfrac12(w|w),\qquad \nu := \nu_1+1\otimes|w).
\]
Then the same generator of a c.p. semigroup can be written in two Lindblad forms:
\[
(i\Theta-\Delta_1)A+A(i\Theta-\Delta_1)^*+\nu_1^*\,A\otimes1\,\nu_1 = (i\Theta-\Delta)A+A(i\Theta-\Delta)^*+\nu^*\,A\otimes1\,\nu.
\]
In particular, choosing $w := -\mathrm{Tr}\,\nu_1$, we can make sure that $\mathrm{Tr}\,\nu = 0$.

3.3. Classical Markov semigroups. It is instructive to compare c.p. Markov semigroups with usual (classical) Markov semigroups.

Consider the space $\mathbb C^n$. For $u=(u_1,\dots,u_n)\in\mathbb C^n$ we will write $u\ge0$ iff $u_1,\dots,u_n\ge0$. We define $1 := (1,\dots,1)$. We say that a linear map $T$ is pointwise positive iff $u\ge0$ implies $Tu\ge0$. We say that it is Markov iff $T1 = 1$.
A one-parameter semigroup $\mathbb R_+\ni t\mapsto T_t\in B(\mathbb C^n)$ will be called a (classical) Markov semigroup iff $T_t$ is pointwise positive and Markov for any $t\ge0$.

Every continuous one-parameter semigroup on $\mathbb C^n$ is of the form $\mathbb R_+\ni t\mapsto e^{tm}$ for some $n\times n$ matrix $m$. Clearly, the transformations $e^{tm}$ are pointwise positive for any $t\ge0$ iff $m_{ij}\ge0$, $i\ne j$. They are Markov for any $t\ge0$ iff in addition $\sum_j m_{ij} = 0$.

Markov c.p. semigroups often lead to classical Markov semigroups, as described in the following easy fact:

Theorem 3.6. Let $P_1,\dots,P_n\in B(\mathcal K)$ satisfy $P_j^* = P_j$ and $P_jP_k = \delta_{jk}P_j$. Let $\mathcal P$ be the (commutative) $*$-algebra generated by $P_1,\dots,P_n$. Clearly, $\mathcal P$ is naturally isomorphic to $\mathbb C^n$. Let $e^{tM}$ be a Markov c.p. semigroup on $B(\mathcal K)$ that preserves the algebra $\mathcal P$. Then $e^{tM}\big|_{\mathcal P}$ is a classical Markov semigroup.

Conversely, from a classical Markov semigroup one can construct c.p. Markov semigroups:

Theorem 3.7. Let $e^{tm}$ be a classical Markov semigroup on $\mathbb C^n$. Let $e_1,\dots,e_n$ denote the canonical basis of $\mathbb C^n$ and $E_{ij} := |e_i)(e_j|$. Let $\theta_1,\dots,\theta_n$ be real numbers and set $\Theta := \theta_1E_{11}+\cdots+\theta_nE_{nn}$. For $A\in B(\mathbb C^n)$ define
\[
M(A) := i[\Theta,A]+\frac12\sum_j m_{jj}[E_{jj},A]_+ + \sum_{i\ne j}m_{ij}E_{ij}AE_{ji}. \tag{3.14}
\]
Then $M$ is the generator of a Markov c.p. semigroup on $B(\mathbb C^n)$. The algebra $\mathcal P$ generated by $E_{11},\dots,E_{nn}$ is preserved by $e^{tM}$ and naturally isomorphic to $\mathbb C^n$. Under this identification, $M$ restricted to $\mathcal P$ equals $m$.

3.4. Invariant c.p. semigroups. Let $K$ be a self-adjoint operator on $\mathcal K$. Let $M$ be the generator of a c.p. semigroup on $B(\mathcal K)$. We say that $M$ is $K$-invariant iff
\[
M(A) = e^{-itK}M(e^{itK}Ae^{-itK})e^{itK},\qquad t\in\mathbb R. \tag{3.15}
\]
We will see later on that c.p. semigroups obtained in the weak coupling limit are always $K$-invariant, where $K$ is the Hamiltonian of the small system.

Note that $M$ can be split in a canonical way into $M = i[\Theta,\cdot]+M_{\mathrm d}$, where $M_{\mathrm d}$ is its purely dissipative part. $M$ is $K$-invariant iff $[\Theta,K] = 0$ and $M_{\mathrm d}$ is $K$-invariant.
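The construction of Theorem 3.7 and the definition (3.15) can be checked numerically on a small example. The following is a minimal sketch, not part of the original text. It assumes the form $M(A) = i[\Theta,A]+\frac12\sum_j m_{jj}[E_{jj},A]_+ + \sum_{i\ne j}m_{ij}E_{ij}AE_{ji}$, with the sign of the anticommutator term fixed by the requirement $M(1)=0$. The 3-state generator `m`, the numbers `theta` and `k_eigs`, and all helper names are hypothetical choices made only for illustration.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(terms, n):
    """Sum of coefficient * matrix over a list of (coefficient, matrix) pairs."""
    out = [[0j] * n for _ in range(n)]
    for c, a in terms:
        for i in range(n):
            for j in range(n):
                out[i][j] += c * a[i][j]
    return out

def unit(n, i, j):
    """Matrix unit E_ij = |e_i)(e_j|."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(n)] for r in range(n)]

def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def generator(m, theta, A):
    """M(A) = i[Theta,A] + (1/2) sum_j m_jj [E_jj,A]_+ + sum_{i!=j} m_ij E_ij A E_ji."""
    n = len(m)
    Theta = diag(theta)
    terms = [(1j, matmul(Theta, A)), (-1j, matmul(A, Theta))]
    for j in range(n):
        Ejj = unit(n, j, j)
        terms.append((0.5 * m[j][j], matmul(Ejj, A)))
        terms.append((0.5 * m[j][j], matmul(A, Ejj)))
        for i in range(n):
            if i != j:
                terms.append((m[i][j], matmul(matmul(unit(n, i, j), A), unit(n, j, i))))
    return lincomb(terms, n)

def conjugate(k_eigs, t, A):
    """e^{itK} A e^{-itK} for the diagonal operator K = diag(k_eigs)."""
    n = len(A)
    return [[cmath.exp(1j * t * (k_eigs[i] - k_eigs[j])) * A[i][j] for j in range(n)]
            for i in range(n)]

def max_abs_diff(a, b):
    """Largest entrywise distance between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical classical generator: off-diagonal entries >= 0, every row sums to 0.
m = [[-3.0, 1.0, 2.0],
     [0.5, -0.5, 0.0],
     [1.0, 4.0, -5.0]]
theta = [0.3, -1.2, 0.7]
n = len(m)

# Markov property: M(1) = 0.
markov_defect = max_abs_diff(generator(m, theta, diag([1.0] * n)), diag([0.0] * n))

# Restriction to the diagonal algebra: M(E_kk) = sum_i m_ik E_ii.
restriction_defect = max(
    max_abs_diff(generator(m, theta, unit(n, k, k)), diag([m[i][k] for i in range(n)]))
    for k in range(n))

# K-invariance (3.15) for a diagonal K, tested on one arbitrary matrix A and time t.
k_eigs, t = [0.0, 1.0, 2.5], 0.37
A = [[1 + 2j, 0.5, -1j], [2.0, -1.0, 0.25 + 1j], [0.1j, 3.0, 0.7]]
invariance_defect = max_abs_diff(
    generator(m, theta, A),
    conjugate(k_eigs, -t, generator(m, theta, conjugate(k_eigs, t, A))))

print(markov_defect, restriction_defect, invariance_defect)
```

All three printed defects should vanish up to rounding. This is a numerical sanity check on one example, not a proof of the statements.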
Thus in what follows it is enough to restrict ourselves to the purely dissipative case.

The following two theorems extend Theorems 3.6 and 3.7.

Theorem 3.8. Consider the set-up of Theorem 3.6. Suppose in addition that $K$ is a self-adjoint operator on $\mathcal K$ with the eigenvalues $k_1,\dots,k_n$ and $P_j = 1_{k_j}(K)$. Let $M$ be $K$-invariant. Then the algebra $\mathcal P$ is preserved by $e^{tM}$ (and hence the conclusion of Theorem 3.6 holds).

Theorem 3.9. Consider the set-up of Theorem 3.7. If $k_1,\dots,k_n$ are real and $K := k_1E_{11}+\cdots+k_nE_{nn}$, then $M$ is $K$-invariant.

The following theorem describes the $K$-invariance on the level of a Lindblad form. We restrict ourselves to the Markov case.

Theorem 3.10. Let $\nu\in B(\mathcal K,\mathcal K\otimes\mathfrak h)$ and let $Y$ be a self-adjoint operator on $\mathfrak h$ such that
\[
M(A) = -\tfrac12[\nu^*\nu,A]_+ + \nu^*\,A\otimes1\,\nu, \tag{3.16}
\]
\[
\nu K = (K\otimes1+1\otimes Y)\nu. \tag{3.17}
\]
Then $M$ is the generator of a $K$-invariant purely dissipative Markov c.p. semigroup.

Proof. We check that $\nu^*\nu$ commutes with $K$. Then it is enough to verify that $A\mapsto\nu^*\,A\otimes1\,\nu$ is $K$-invariant.

There exists a partial converse of Theorem 3.10.

Theorem 3.11. Let $M$ be the generator of a $K$-invariant purely dissipative Markov c.p. semigroup. Let $\mathfrak h,\nu$ realize its minimal Lindblad form (3.16). Then there exists a self-adjoint operator $Y$ on $\mathfrak h$ such that (3.17) is true.

Proof. By the uniqueness part of Theorem 3.1, there exists a unique unitary operator $U_t$ on $\mathfrak h$ such that
\[
e^{itK}\otimes U_t\ \nu\ e^{-itK} = \nu.
\]
We easily check that $U_t$ is a continuous 1-parameter unitary group, so that $U_t$ can be written as $e^{itY}$ for some self-adjoint $Y$.

Note that Theorems 3.10 and 3.11 have a clear physical meaning. The operator $\nu$ is responsible for “quantum jumps”. The operator $Y$ describes the energy of the reservoir (or actually of the part of the reservoir “directly seen” by the interaction). The equation (3.17) describes the energetic balance in each quantum jump.

3.5. Detailed Balance Condition. In the literature the name Detailed Balance Condition (DBC) is given to several related but non-equivalent concepts.
In this subsection we discuss some of the versions of the DBC relevant in the weak coupling limit.

Some of the definitions of the DBC (both for classical and quantum systems) involve the time reversal [Ag, Ma, MaSt]. In the weak coupling limit one does not need to introduce the time reversal, hence we will only discuss versions of the DBC that do not involve this operation. (See however [DM] for a discussion of time-reversal in semigroups obtained in the weak coupling limit.)

Let us first recall the definition of the classical Detailed Balance Condition. Let $p=(p_1,\dots,p_n)\in\mathbb C^n$ be a vector with $p_1,\dots,p_n>0$. Introduce the scalar product on $\mathbb C^n$:
\[
(u|u')_p := \sum_j\bar u_ju'_jp_j. \tag{3.18}
\]
Let $e^{tm}$ be a classical Markov semigroup on $\mathbb C^n$. We say that $m$ satisfies the Detailed Balance Condition for $p$ iff $m$ is self-adjoint for $(\cdot|\cdot)_p$.

Let us now consider the quantum case. Let $\rho$ be a nondegenerate density matrix. As usual, we assume that $\mathcal K$ is finite dimensional. On $B(\mathcal K)$ we introduce the scalar product
\[
(A|B)_\rho := \mathrm{Tr}\,\rho^{1/2}A^*\rho^{1/2}B. \tag{3.19}
\]
Let $M$ be the generator of a c.p. semigroup on $B(\mathcal K)$. Recall that it can be uniquely represented as
\[
M = i[\Theta,\cdot]+M_{\mathrm d},
\]
where $M_{\mathrm d}$ is its purely dissipative part and $i[\Theta,\cdot]$ its Hamiltonian part. We say that $M$ satisfies the Detailed Balance Condition (or DBC) for $\rho$ iff $M_{\mathrm d}$ is self-adjoint and $i[\Theta,\cdot]$ is anti-self-adjoint for $(\cdot|\cdot)_\rho$.

Note that $M$ satisfies the DBC for $\rho$ iff $[\Theta,\rho]=0$ and $M_{\mathrm d}$ satisfies the DBC for $\rho$. Therefore, in our further analysis we will often restrict ourselves to the purely dissipative case.

We believe that in the quantum finite dimensional case the above definition of the DBC is the most natural. It was used e.g. in [DF1] under the name of the standard Detailed Balance Condition. A similar but different definition of the DBC can be found in [FKGV, Al1]. Its only difference is the replacement of the scalar product $(\cdot|\cdot)_\rho$ given in (3.19) with
\[
\mathrm{Tr}\,\rho A^*B. \tag{3.20}
\]
Note that if $M$ is $K$-invariant and $\rho$ is a function of $K$, then both definitions are equivalent.

The weak coupling limit applied to a small system with a Hamiltonian $K$ interacting with a thermal reservoir at some fixed inverse temperature $\beta$ always yields a Markov c.p. semigroup that is $K$-invariant and satisfies the DBC for $\rho = e^{-\beta K}/\mathrm{Tr}\,e^{-\beta K}$; see e.g. [LeSp, DF1] and Subsect. 4.3.

There exists a close relationship between the classical and quantum DBC.

Theorem 3.12. Consider the set-up of Theorem 3.6. Let $\rho$ be a density matrix on $\mathcal K$ with the eigenvalues $p_1,\dots,p_n$ and let $P_j$ equal the spectral projection of $\rho$ for the eigenvalue $p_j$. If $M$ satisfies the DBC for $\rho$, either in the sense of (3.19) or in the sense of (3.20), then the classical Markov semigroup $e^{tM}\big|_{\mathcal P}$ satisfies the DBC for $p=(p_1,\dots,p_n)$.

Theorem 3.13. Consider the set-up of Theorem 3.7. Let $e^{tm}$ satisfy the classical DBC for $p=(p_1,\dots,p_n)$. Then $M$ defined by (3.14) satisfies both quantum versions of the DBC for $\rho := p_1E_{11}+\cdots+p_nE_{nn}$.

The following theorem describes the DBC for $K$-invariant generators on the level of their Lindblad form. It is an extension of Theorem 3.10. (Note that (3.21), (3.22) are identical to (3.16), (3.17) of Theorem 3.10.)

Theorem 3.14. Let $\nu\in B(\mathcal K,\mathcal K\otimes\mathfrak h)$ and let $Y$ be a self-adjoint operator on $\mathfrak h$ such that
\[
M(A) = -\tfrac12[\nu^*\nu,A]_+ + \nu^*\,A\otimes1\,\nu, \tag{3.21}
\]
\[
\nu K = (K\otimes1+1\otimes Y)\,\nu, \tag{3.22}
\]
\[
\mathrm{Tr}_{\mathfrak h}\,\nu A\nu^* = \nu^*\,A\otimes e^{-\beta Y}\,\nu. \tag{3.23}
\]
Then $M$ is the generator of a $K$-invariant purely dissipative Markov c.p. semigroup satisfying the DBC for $\rho := e^{-\beta K}/\mathrm{Tr}\,e^{-\beta K}$.

Proof. It follows from (3.22) that $\nu^*\nu$ commutes with $e^{-\beta K/2}$. Hence $[\nu^*\nu,\cdot]_+$ is self-adjoint for $(\cdot|\cdot)_\rho$.

If $M$ is a map on $B(\mathcal K)$, then $M^{*\rho}$ will denote the adjoint for this scalar product. Let $M_1(A) = \nu^*\,A\otimes1\,\nu$. We compute:
\[
\begin{aligned}
M_1^{*\rho}(A) &= \mathrm{Tr}_{\mathfrak h}\ e^{\beta K/2}\otimes1\ \nu\,e^{-\beta K/2}Ae^{-\beta K/2}\,\nu^*\ e^{\beta K/2}\otimes1 \\
&= e^{\beta K/2}\nu^*\,\big(e^{-\beta K/2}Ae^{-\beta K/2}\otimes e^{-\beta Y}\big)\,\nu\,e^{\beta K/2} && (3.24) \\
&= \nu^*\,A\otimes1\,\nu = M_1(A). && (3.25)
\end{aligned}
\]
In (3.24) and (3.25) we used (3.23) and (3.22) respectively.
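The relationship between the classical and the quantum DBC stated in Theorem 3.13 can be illustrated numerically. The sketch below is an illustration only and not part of the original text. It assumes that the purely dissipative part of (3.14) acts entrywise as $(M_{\mathrm d}A)_{ab} = \frac12(m_{aa}+m_{bb})A_{ab}+\delta_{ab}\sum_{j\ne a}m_{aj}A_{jj}$, which follows from $E_{ij}AE_{ji} = A_{jj}E_{ii}$ once the anticommutator sign is fixed by $M(1)=0$. The vector `p`, the symmetric rates `s`, the test matrices and all function names are hypothetical.

```python
import math

def dissipative_part(m, A):
    """Entrywise form of the purely dissipative generator built from m:
    (M_d A)_ab = (m_aa + m_bb)/2 * A_ab + delta_ab * sum_{j != a} m_aj * A_jj."""
    n = len(m)
    out = [[0.5 * (m[a][a] + m[b][b]) * A[a][b] for b in range(n)] for a in range(n)]
    for a in range(n):
        out[a][a] += sum(m[a][j] * A[j][j] for j in range(n) if j != a)
    return out

def inner_319(p, A, B):
    """(A|B)_rho = Tr rho^{1/2} A^* rho^{1/2} B for rho = diag(p), as in (3.19)."""
    n = len(p)
    return sum(math.sqrt(p[a] * p[b]) * A[b][a].conjugate() * B[b][a]
               for a in range(n) for b in range(n))

def inner_320(p, A, B):
    """Tr rho A^* B for rho = diag(p), as in (3.20)."""
    n = len(p)
    return sum(p[a] * A[b][a].conjugate() * B[b][a]
               for a in range(n) for b in range(n))

def symmetry_defect(inner, p, m, A, B):
    """|(A | M_d B) - (M_d A | B)|; zero iff M_d is symmetric on the pair (A, B)."""
    return abs(inner(p, A, dissipative_part(m, B)) - inner(p, dissipative_part(m, A), B))

# Hypothetical data: weights p and symmetric rates s; then m_ij = s_ij / p_i
# satisfies the classical DBC p_i m_ij = p_j m_ji by construction.
p = [0.5, 0.3, 0.2]
s = [[0.0, 0.3, 0.1], [0.3, 0.0, 0.6], [0.1, 0.6, 0.0]]
n = len(p)
m = [[s[i][j] / p[i] for j in range(n)] for i in range(n)]
for i in range(n):
    m[i][i] = -sum(m[i][j] for j in range(n) if j != i)

classical_defect = max(abs(p[i] * m[i][j] - p[j] * m[j][i])
                       for i in range(n) for j in range(n))

A = [[1 + 2j, 0.5, -1j], [2.0, -1.0, 0.25 + 1j], [0.1j, 3.0, 0.7]]
B = [[0.3, 1j, 2.0], [-1.5, 0.4 - 2j, 1.0], [0.6, -0.2j, 1.1]]
defect_319 = symmetry_defect(inner_319, p, m, A, B)
defect_320 = symmetry_defect(inner_320, p, m, A, B)

# Counterexample: a cyclic generator violates the classical DBC for uniform p.
m_cyclic = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
uniform = [1.0 / 3.0] * 3
E00 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
E11 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
cyclic_defect = symmetry_defect(inner_319, uniform, m_cyclic, E00, E11)

print(classical_defect, defect_319, defect_320, cyclic_defect)
```

The first three printed values should vanish up to rounding, while the last one is visibly nonzero, so the check does distinguish generators with and without detailed balance. The symmetry is tested on a single pair of matrices only.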
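It is possible to replace the condition (3.23) with a different condition (3.26). Note that whereas (3.23) is quadratic in $\nu$, (3.26) is linear in $\nu$.

Theorem 3.15. Suppose that $\epsilon$ is an antiunitary operator on $\mathfrak h$ such that
\[
(\phi\otimes w|\nu\psi) = (\nu\phi|\psi\otimes e^{-\beta Y/2}\epsilon w),\qquad \phi,\psi\in\mathcal K,\ w\in\mathfrak h. \tag{3.26}
\]
Then (3.23) holds.

Proof. It is sufficient to assume that $A = |\psi)(\psi|$ for some $\psi\in\mathcal K$. Let $\phi\in\mathcal K$. Let $\{w_i\mid i\in I\}$ be an orthonormal basis in $\mathfrak h$. Then
\[
\begin{aligned}
(\phi|\mathrm{Tr}_{\mathfrak h}(\nu A\nu^*)\phi) &= \sum_{i\in I}(\phi\otimes w_i|\nu\psi)(\nu\psi|\phi\otimes w_i) \\
&= \sum_{i\in I}(\nu\phi|\psi\otimes e^{-\beta Y/2}\epsilon w_i)(\psi\otimes e^{-\beta Y/2}\epsilon w_i|\nu\phi) \\
&= \big(\nu\phi\big|\ |\psi)(\psi|\otimes e^{-\beta Y}\ \nu\phi\big) = (\phi|\nu^*\,A\otimes e^{-\beta Y}\,\nu\phi).
\end{aligned}
\]

There exists an extension of Theorem 3.11 to the Detailed Balance Condition. It can be viewed as a partial converse of Theorems 3.14 and 3.15:

Theorem 3.16. Let $M$ be the generator of a $K$-invariant purely dissipative Markov c.p. semigroup satisfying the DBC for $e^{-\beta K}/\mathrm{Tr}\,e^{-\beta K}$. Let $\mathfrak h,\nu$ realize its minimal Lindblad form (3.21). Let a self-adjoint operator $Y$ on $\mathfrak h$ satisfy (3.22). Then (3.23) is true and there exists a unique antiunitary operator $\epsilon$ on $\mathfrak h$ such that (3.26) holds. Besides, $\epsilon Y\epsilon = -Y$ and $\epsilon^2 = 1$.

Proof. Step 1. By the proof of Theorem 3.14, the DBC for $e^{-\beta K}/\mathrm{Tr}\,e^{-\beta K}$ together with (3.22) imply (3.23).

Step 2. The next step is to prove that (3.23) and (3.8) imply the existence of an antiunitary $\epsilon$ on $\mathfrak h$ satisfying (3.26). Identify $\mathfrak h$ with $\mathbb C^n$, so that we have a complex conjugation $w\mapsto\bar w$ in $\mathfrak h$. We can assume that $Y$ is diagonal, so that $\overline{Yw} = Y\bar w$, $w\in\mathfrak h$. Define $\nu^\star$ by
\[
(\phi\otimes w|\nu\psi) = (\nu^\star\phi|\psi\otimes\bar w),\qquad \phi,\psi\in\mathcal K,\ w\in\mathfrak h. \tag{3.27}
\]
(Note that $\star$ is a different star from $*$ denoting the Hermitian conjugation, see [DF1].) We can rewrite (3.23) as
\[
\nu^{\star*}\,A\otimes1\,\nu^\star = \nu^*\ 1\otimes e^{-\beta Y/2}\ (A\otimes1)\ 1\otimes e^{-\beta Y/2}\ \nu. \tag{3.28}
\]
(3.28) defines a c.p. map. By the uniqueness part of Theorem 3.1 and (3.8), we obtain the existence of a unitary map $U$ on $\mathfrak h$ such that
\[
\nu^\star = 1\otimes Ue^{-\beta Y/2}\ \nu.
\]
Now we set $\epsilon w = U^*\bar w$.

Step 3. We apply (3.26) twice:
\[
(\phi\otimes w|\nu\psi) = (\nu\phi|\psi\otimes e^{-\beta Y/2}\epsilon w) = (\phi\otimes(e^{-\beta Y/2}\epsilon)^2w|\nu\psi).
\]
Using (3.8) we obtain $w = (e^{-\beta Y/2}\epsilon)^2w$.

Step 4.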
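Finally, applying (3.26) together with (3.22) twice we obtain
\[
(\phi\otimes w|\nu\psi) = (\nu e^{-\beta K/2}\phi|e^{\beta K/2}\psi\otimes\epsilon w) = (\phi\otimes\epsilon^2w|\nu\psi).
\]
Thus with the help of (3.8) we get $w = \epsilon^2w$.

Note that the above results show that for c.p. Markov semigroups that are $K$-invariant and satisfy the DBC for $e^{-\beta K}/\mathrm{Tr}\,e^{-\beta K}$ we naturally obtain a certain algebraic structure on the “restricted reservoir” $\mathfrak h$ that closely resembles the famous Tomita-Takesaki theory. The properties of $e^{-\beta Y}$ and $\epsilon$ are parallel to those of the modular operator and the modular conjugation, the basic objects of the Tomita-Takesaki formalism. (See also Subsection 4.3.)

4. Bosonic reservoirs. In this section we recall basic terminology related to second quantization, see e.g. [De0]. We also introduce Pauli-Fierz operators, a class of models (known in the literature under various names) that are often used to describe realistic physical systems, see e.g. [DJ1, DJP].

4.1. Second quantization. Let $\mathcal H_{\mathrm R}$ be a Hilbert space describing 1-particle states. The corresponding bosonic Fock space is defined as
\[
\Gamma_{\mathrm s}(\mathcal H_{\mathrm R}) := \bigoplus_{n=0}^\infty\otimes_{\mathrm s}^n\mathcal H_{\mathrm R}.
\]
The vacuum vector is $\Omega = 1\in\otimes_{\mathrm s}^0\mathcal H_{\mathrm R} = \mathbb C$.

If $z\in\mathcal H_{\mathrm R}$, then
\[
a(z)\Psi := \sqrt n\,(z|\otimes1^{(n-1)\otimes}\,\Psi\ \in\otimes_{\mathrm s}^{n-1}\mathcal H_{\mathrm R},\qquad \Psi\in\otimes_{\mathrm s}^n\mathcal H_{\mathrm R},
\]
is called the annihilation operator of $z$ and $a^*(z) := a(z)^*$ is the corresponding creation operator. They are closable operators on $\Gamma_{\mathrm s}(\mathcal H_{\mathrm R})$.

For an operator $q$ on $\mathcal H_{\mathrm R}$ we define the operator $\Gamma(q)$ on $\Gamma_{\mathrm s}(\mathcal H_{\mathrm R})$ by
\[
\Gamma(q)\Big|_{\otimes_{\mathrm s}^n\mathcal H_{\mathrm R}} = q\otimes\cdots\otimes q. \tag{4.1}
\]
For an operator $h$ on $\mathcal H_{\mathrm R}$ we define the operator $d\Gamma(h)$ on $\Gamma_{\mathrm s}(\mathcal H_{\mathrm R})$ by
\[
d\Gamma(h)\Big|_{\otimes_{\mathrm s}^n\mathcal H_{\mathrm R}} = h\otimes1^{(n-1)\otimes}+\cdots+1^{(n-1)\otimes}\otimes h.
\]
Note the identity $\Gamma(e^{ith}) = e^{it\,d\Gamma(h)}$.

4.2. Coupling to a bosonic reservoir. Let $\mathcal K$ be a finite dimensional Hilbert space. We imagine that it describes a small quantum system interacting with a bosonic reservoir described by the Fock space $\Gamma_{\mathrm s}(\mathcal H_{\mathrm R})$. The coupled system is described by the Hilbert space $\mathcal H := \mathcal K\otimes\Gamma_{\mathrm s}(\mathcal H_{\mathrm R})$.

Let $V\in B(\mathcal K,\mathcal K\otimes\mathcal H_{\mathrm R})$. For $\Psi\in\mathcal K\otimes\otimes_{\mathrm s}^n\mathcal H_{\mathrm R}$ we set
\[
a(V)\Psi := \sqrt n\,V^*\otimes1^{(n-1)\otimes}\,\Psi\ \in\mathcal K\otimes\otimes_{\mathrm s}^{n-1}\mathcal H_{\mathrm R}.
\]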
$a(V)$ is called the annihilation operator of $V$ and $a^*(V) := a(V)^*$ the corresponding creation operator. They are closable operators on $\mathcal K\otimes\Gamma_{\mathrm s}(\mathcal H_{\mathrm R})$. Note in particular that if $V$ is written in the form $\sum_jV_j\otimes|b_j)$ (which is always possible), then
\[
a^*(V) = \sum_jV_j\otimes a^*(b_j),\qquad a(V) = \sum_jV_j^*\otimes a(b_j),
\]
where $a^*(b_j)$, $a(b_j)$ are the usual creation/annihilation operators introduced in the previous subsection.

The following class of operators plays the central role in our article:
\[
H_\lambda = K\otimes1+1\otimes d\Gamma(H_{\mathrm R})+\lambda(a^*(V)+a(V)). \tag{4.2}
\]
Here $K$ is a self-adjoint operator describing the free dynamics of the small system, $d\Gamma(H_{\mathrm R})$ describes the free dynamics of the reservoir and $a^*(V)$, $a(V)$, for some $V\in B(\mathcal K,\mathcal K\otimes\mathcal H_{\mathrm R})$, describe the interaction. Operators of the form (4.2) will be called Pauli-Fierz operators.

Note that operators of the form (4.2) or similar are very common in the physics literature and are believed to give an approximate description of realistic physical systems in many circumstances (e.g. an atom interacting with radiation in the dipole approximation), see e.g. [DJ1].

4.3. Thermal reservoirs. In this subsection we will discuss thermal reservoirs. We fix a positive number $\beta$ having the interpretation of the inverse temperature.

Recall that the free Hamiltonian is $H_0 := K\otimes1+1\otimes d\Gamma(H_{\mathrm R})$. To have a simpler formula for the Gibbs state of the small system, we assume that $\mathrm{Tr}\,e^{-\beta K} = 1$. We set
\[
\tau_t(C) := e^{itH_0}Ce^{-itH_0},\qquad \omega_\beta(C) := \mathrm{Tr}\ e^{-\beta K}\otimes|\Omega)(\Omega|\ C,\qquad C\in B(\mathcal H).
\]

Theorem 4.1. The following are equivalent:
1) For any $D_1,D_1',D_2,D_2'\in B(\mathcal K)$ and
\[
B_j := D_j\otimes1\ (a^*(V)+a(V))\ D_j'\otimes1,\qquad j=1,2,
\]
and for any $t\in\mathbb R$ we have
\[
\omega_\beta(\tau_t(B_1)B_2) = \omega_\beta(B_2\tau_{t+i\beta}(B_1)). \tag{4.3}
\]
2) For any function $f$ on the spectrum of $H_{\mathrm R}$ and $A\in B(\mathcal K)$, we have
\[
\mathrm{Tr}_{\mathcal H_{\mathrm R}}\ 1\otimes\bar f(-H_{\mathrm R})\ VAV^* = V^*\,A\otimes e^{-\beta H_{\mathrm R}}f(H_{\mathrm R})\,V. \tag{4.4}
\]

Proof. The left hand side of (4.3) equals
\[
\mathrm{Tr}\ e^{-\beta K+itK}D_1V^*\big(D_1'e^{-itK}D_2\otimes e^{-itH_{\mathrm R}}\big)VD_2'.
\]
The right hand side of (4.3) equals
\[
\mathrm{Tr}\ D_2V^*\big(D_2'e^{-\beta K+itK}D_1\otimes e^{(-\beta+it)H_{\mathrm R}}\big)VD_1'e^{-itK}.
\]
Now we set $A_1 := D_2'e^{-\beta K+itK}D_1$, $A_2 := D_1'e^{-itK}D_2$, and use the cyclicity of the trace. We obtain
\[
\mathrm{Tr}\ A_2\otimes e^{-itH_{\mathrm R}}\ VA_1V^* = \mathrm{Tr}\ A_2V^*\,A_1\otimes e^{-\beta H_{\mathrm R}+itH_{\mathrm R}}\,V.
\]
By the Fourier transformation we get
\[
\mathrm{Tr}\ A_2\otimes\bar f(-H_{\mathrm R})\ VA_1V^* = \mathrm{Tr}\ A_2V^*\,A_1\otimes e^{-\beta H_{\mathrm R}}f(H_{\mathrm R})\,V.
\]
This implies (4.4).

We will say that the reservoir is thermal at the inverse temperature $\beta$ iff the conditions of Theorem 4.1 are true.

(4.3) is just the $\beta$-KMS condition for the state $\omega_\beta$, the dynamics $\tau$ and appropriate operators. Note that (4.3) is satisfied for Pauli-Fierz semi-Liouvilleans constructed with the help of the Araki-Woods representations of the CCR, where we use the terminology of [DJP, De0]. Theorem 4.1 describes a substitute of the KMS condition without invoking explicitly operator algebras.

The KMS condition is closely related to the Tomita-Takesaki theory. One of the objects introduced in this theory is the modular conjugation. It turns out that the set-up of Theorem 4.1 is sufficient to introduce a substitute for the modular conjugation without talking about operator algebras. Define
\[
\mathcal H_{\tilde{\mathrm R}} := \{(\phi|\otimes f(H_{\mathrm R})\,V\psi\ :\ \phi,\psi\in\mathcal K,\ f\in C_{\mathrm c}(\mathbb R)\}^{\mathrm{cl}}
\]
(cl denotes the closure). Clearly, $\mathcal H_{\tilde{\mathrm R}}$ is a subspace of $\mathcal H_{\mathrm R}$ invariant with respect to the 1-particle reservoir Liouvillean $H_{\mathrm R}$. It describes the part of $\mathcal H_{\mathrm R}$ that is coupled to the small system. Let $H_{\tilde{\mathrm R}}$ denote the operator $H_{\mathrm R}$ restricted to the space $\mathcal H_{\tilde{\mathrm R}}$.

Theorem 4.2. Suppose that the reservoir is thermal at inverse temperature $\beta$. Then there exists a unique antiunitary operator $\epsilon_{\tilde{\mathrm R}}$ on $\mathcal H_{\tilde{\mathrm R}}$ such that
\[
(\phi\otimes w|V\psi) = (V\phi|\psi\otimes e^{-\beta H_{\tilde{\mathrm R}}/2}\epsilon_{\tilde{\mathrm R}}w). \tag{4.5}
\]
It satisfies $\epsilon_{\tilde{\mathrm R}}^2 = 1$ and $\epsilon_{\tilde{\mathrm R}}H_{\tilde{\mathrm R}}\epsilon_{\tilde{\mathrm R}} = -H_{\tilde{\mathrm R}}$.

Proof. For $f\in C_{\mathrm c}(\mathbb R)$, $\phi,\psi\in\mathcal K$, we set
\[
\epsilon_{\tilde{\mathrm R}}\ (\phi|\otimes e^{-\beta H_{\tilde{\mathrm R}}/2}f(H_{\tilde{\mathrm R}})\,V\psi := (\psi|\otimes\bar f(-H_{\tilde{\mathrm R}})\,V\phi.
\]
(4.4) implies that $\epsilon_{\tilde{\mathrm R}}$ is a well defined antiunitary map.

5. Quantum Langevin dynamics. Suppose that we are given a c.p. Markov semigroup $e^{tM}$ on $B(\mathcal K)$. We will describe a certain class of self-adjoint operators $Z$ on a larger Hilbert space such that $e^{-itZ}\cdot e^{itZ}$ is a dilation of $e^{tM}$.
We will use the name quantum Langevin (or stochastic) dynamics for e−itZ · eitZ . The unitary group e−itZ will be called a Langevin (or stochastic) Schrödinger dynamics. In Subsection 5.1 we will restrict ourselves to a subclass of quantum Langevin dy- namics involving only the so-called linear noises. Actually, at present our results on the extended weak coupling limit are limited only to them. In Subsection 5.2 we will describe a more general class of quantum Langevin dynamics, which also involve quadratic noises. Our construction involving quadratic noises is related to the operator-theoretic approach of Chebotarev [Ch, ChR], and especially of Gregoratti [Gr]. We expect that our approach to the extended weak coupling limit can be improved to cover also this larger class. Within the approach of [AFL] there exist partial results in this direction [Go]. The history of the discovery of quantum Langevin dynamics is quite involved. The construction can be traced back to [AFLe], and especially [HP] where the quantum stochastic calculus was introduced. But apparently only in [Fr] and [Maa] it was inde- pendently realized that this leads to a dilation of Markov c.p. semigroups. Let us also mention [At, Me, Fa] for more recent presentations of the quantum stochastic calculus. 5.1. Linear noises. Apart from a c.p. Markov semigroup etM let us fix some additional data. More precisely, we fix an operator Υ, an auxiliary Hilbert space h and an operator ν from K to K ⊗ h such that −iΥ + iΥ∗ = −ν∗ν and M is given by M(A) = −i(ΥA−AΥ∗) + ν∗ A⊗1 ν, A ∈ B(K). In other words, we fix a concrete Lindblad form of M . Introduce the Hilbert space ZR := h ⊗ L2(R). The enlarged Hilbert space is Z := K ⊗ Γs(ZR). Let ZR be the operator of multiplication by the variable x on L 2(R). Let (1|, |1) be defined as in (2.2). 22 J. DEREZIŃSKI AND W. DE ROECK We choose a basis (bj) in h, so that we can write νj ⊗ |bj). 
(5.1) (Note that at the end the construction will not depend on the choice of a basis). Set ν+j = νj , ν j = ν For t ≥ 0 we define the quadratic form Ut := e −itdΓ(ZR) t≥tn≥···≥t1≥0 dtn · · · dt1 ×(2π)−n2 j1,...,jn ǫ1,...,ǫn∈{+,−} ×(−i)ne−i(t−tn)Υνǫnjn e −i(tn−tn−1)Υ · · · νǫ1j1 e −i(t1−0)Υ k=1,...,n: ǫk=+ a∗(eitkZRbjk ⊗ |1)) k′=1,...,n: ǫk′=− a(eitk′ZRbjk′ ⊗ |1)); U−t := U We will denote by IK the embedding of K ≃ K ⊗ Ω in Z. Theorem 5.1. Ut extends to a strongly continuous unitary group on Z such that I∗KUtIK = e −itΥ, I∗KUt A⊗ 1 U−tIK = etM (A). Thus Ut is a unitary dilation of e −itΥ, and Ut · U∗t is a dilation of etM . As every strongly continuous unitary group, Ut can be written as e −itZ for a certain self-adjoint operator Z. Note that formally (and also rigorously with an appropriate regularization) (Υ + Υ∗) + dΓ(ZR) +(2π)− 2 a∗ (ν ⊗ |1)) + (2π)− 12 a (ν ⊗ |1)) . Thus Z has the form of a Pauli-Fierz operator with a rather singular interaction. Let us present an alternative variation of the above construction, which is actually closer to what can be found in the literature. Let F be the Fourier transformation on ZR = h ⊗ L2(R) defined as in (2.6). The operator Z transformed by 1K⊗Γ(F) will be denoted by Ẑ := 1K⊗Γ(F) Z 1K⊗Γ(F∗). (5.2) It equals (Υ + Υ∗)⊗1 + 1⊗dΓ(Dτ ) +a (ν ⊗ |δ0)) + a∗ (ν ⊗ |δ0)) , REDUCED AND EXTENDED WEAK COUPLING LIMIT 23 where δ0, Dτ are defined as in (2.8), (2.9). Similarly to the operator of Section 2.1 denoted with the same symbol, the operator Ẑ (as well as Z) has a number of intriguing properties. Let us describe one of them. Let D0 := h ⊗ H1(R). (Recall that H1(R) is the first Sobolev space). Let Γs(D0), denote the corresponding algebraic Fock space and D1 := K ⊗ Γs(D0). Introduce the (non-self-adjoint) sesquilinear form Ẑ+ = Υ⊗1 + 1⊗dΓ(Dτ ) (5.3) +a (ν ⊗ |δ0)) + a∗ (ν ⊗ |δ0)) . Let ψ, ψ′ ∈ D1. Then (ψ|(e−itẐ − 1)ψ′) = −i(ψ|Ẑ+ψ′). 
(5.4)

Thus it seems that $\hat{Z}^+ = \hat{Z}$, which is true only if $\Upsilon$ is self-adjoint and hence there are no off-diagonal terms in $Z$. Clearly, the explanation of the above paradox is similar to that in Subsect. 2.1: $(\psi|e^{-it\hat{Z}}\psi')$ is not differentiable at zero. This is related to the fact that $\psi$, $\psi'$ do not belong to ${\rm Dom}\,Z$. Thus $\hat{Z}^+$ can again be called a false form.

In the literature, the Langevin Schrödinger dynamics $e^{-it\hat{Z}}$ is usually introduced through the so-called Langevin (or stochastic) Schrödinger equation satisfied by
$$\hat{W}(t) := e^{it\,d\Gamma(D_\tau)}e^{-it\hat{Z}}. \tag{5.5}$$
To write this equation, recall the decomposition (5.1). Then, in the sense of quadratic forms on $\mathcal{D}_1$, we have
$$i\frac{d}{dt}\hat{W}(t) = \Bigl(\Upsilon\otimes 1 + a^*\bigl(\nu\otimes|\delta_t)\bigr)\Bigr)\hat{W}(t) + \sum_j \nu_j^*\,\hat{W}(t)\,a\bigl(b_j\otimes|\delta_t)\bigr). \tag{5.6}$$
Note that $a(\nu\otimes|\delta_0))$ and $a^*(\nu\otimes|\delta_0))$ appearing in $\hat{Z}$ and $\hat{Z}^+$ are quantum analogs of a classical white noise. They are "localized" at $\tau = 0$. Besides, they are (formally) given by a linear expression in terms of creation/annihilation operators. Therefore, they are often called linear quantum noises.

5.2. Quadratic noises. This subsection is outside of the main line of this article. It is closely related to Subsect. 2.2. It is not needed for the description of the weak coupling limit, as given in the next section.

Clearly, $\Psi\in\mathcal{K}\otimes\otimes_{\rm s}^n\bigl(\mathfrak{h}\otimes L^2(\mathbb{R})\bigr)\simeq\mathcal{K}\otimes\otimes_{\rm s}^n L^2(\mathbb{R},\mathfrak{h})$ can be identified with a function $\Psi(\tau_1,\dots,\tau_n)$ with values in $\mathcal{K}\otimes(\otimes^n\mathfrak{h})$ and the arguments satisfying $\tau_1<\cdots<\tau_n$. Let $S$ be a unitary operator on $\mathcal{K}\otimes\mathfrak{h}$. Let $S^{(j)}$ be this operator acting on $\mathcal{K}\otimes\otimes^n\mathfrak{h}$, where it is applied to the $j$'th "leg" of the tensor product $\otimes^n\mathfrak{h}$. We define an operator $\Lambda(S)$ on $\mathcal{K}\otimes\otimes_{\rm s}^n L^2(\mathbb{R},\mathfrak{h})$ as follows: if $\tau_1<\cdots<\tau_k<0<\tau_{k+1}<\cdots<\tau_n$, then
$$(\Lambda(S)\Psi)(\tau_1,\dots,\tau_n) := S^{(k+1)}\cdots S^{(n)}\Psi(\tau_1,\dots,\tau_n).$$
Clearly, $\Lambda(S)$ is a unitary operator. If $\mathcal{K} = \mathbb{C}$, then it coincides with $\Gamma(\gamma(S))$, where $\gamma(S)$ was defined in (2.15) and $\Gamma$ is the functor of the second quantization defined in (4.1).

Introduce the operator $\hat{Z}_{S,0}$ on $\mathcal{K}\otimes\Gamma_{\rm s}(\mathcal{Z}_R)$ by
$$\hat{Z}_{S,0} := \Upsilon + \Lambda(S)^*\;1\otimes d\Gamma(D_\tau)\;\Lambda(S). \tag{5.7}$$
The operator (5.7) is very singular and contains a "delta interaction at $\tau = 0$".

Let us now define the dynamics $\hat{U}_{S,t}$ that generalizes $\hat{U}_t$. Let $S_{ij}\in B(\mathcal{K})$ be defined by
$$S = \sum_{i,j} S_{ij}\otimes|b_i)(b_j|. \tag{5.8}$$
Set
$$\nu_{S,j}^+ = \nu_j, \qquad \nu_{S,j}^- = \sum_i \nu_i^*S_{ij}.$$
Then we introduce the quadratic form
$$\hat{U}_{S,t} := \sum_{n=0}^{\infty}\int_{t\ge t_n\ge\cdots\ge t_1\ge 0} dt_n\cdots dt_1 \sum_{j_1,\dots,j_n}\ \sum_{\epsilon_1,\dots,\epsilon_n\in\{+,-\}} (-i)^n \prod_{k=1,\dots,n:\ \epsilon_k=+} a^*\bigl(b_{j_k}\otimes|\delta_{t_k-t})\bigr)\; e^{-i(t-t_n)\hat{Z}_{S,0}}\,\nu_{S,j_n}^{\epsilon_n}\otimes 1\; e^{-i(t_n-t_{n-1})\hat{Z}_{S,0}}\cdots \nu_{S,j_1}^{\epsilon_1}\otimes 1\; e^{-i(t_1-0)\hat{Z}_{S,0}} \prod_{k'=1,\dots,n:\ \epsilon_{k'}=-} a\bigl(b_{j_{k'}}\otimes|\delta_{t_{k'}})\bigr);$$
$$\hat{U}_{S,-t} := \hat{U}_{S,t}^*.$$
One can check that $\hat{U}_{S,t}$ extends to a strongly continuous unitary group. Therefore, one can define a self-adjoint operator $\hat{Z}_S$ such that $\hat{U}_{S,t} = e^{-it\hat{Z}_S}$. It satisfies
$$I_{\mathcal{K}}^*\hat{U}_{S,t}I_{\mathcal{K}} = e^{-it\Upsilon}, \qquad I_{\mathcal{K}}^*\hat{U}_{S,t}\,(A\otimes 1)\,\hat{U}_{S,-t}I_{\mathcal{K}} = e^{tM}(A).$$
It is awkward to write a formula for $\hat{Z}_S$ in terms of creation/annihilation operators, even formally. There exists, however, an alternative formalism that is commonly used in the literature to define the group $e^{-it\hat{Z}_S}$. Let $\psi,\psi'\in\mathcal{D}_1$. Introduce the cocycle
$$\hat{W}_S(t) := e^{it\,d\Gamma(D_\tau)}e^{-it\hat{Z}_S}. \tag{5.9}$$
Then, in the sense of a quadratic form on $\mathcal{D}_1$, the cocycle satisfies the differential equation
$$i\frac{d}{dt}\hat{W}_S(t) = \Bigl(\Upsilon\otimes 1 + a^*\bigl(\nu\otimes|\delta_t)\bigr)\Bigr)\hat{W}_S(t) \tag{5.10}$$
$$\qquad + \sum_{i,j} i(1-S_{ij})\otimes a^*\bigl(b_i\otimes|\delta_t)\bigr)\,\hat{W}_S(t)\,a\bigl(b_j\otimes|\delta_t)\bigr) \tag{5.11}$$
$$\qquad + \sum_j \nu_{S,j}^-\,\hat{W}_S(t)\,a\bigl(b_j\otimes|\delta_t)\bigr). \tag{5.12}$$
This formula is the quantum Langevin (stochastic) equation for the cocycle $\hat{W}_S(t)$ in the sense of [HP, Fa, Pa, At, Maa, Fr, Bar, Me], which includes all three kinds of noises. In the literature, the dilation $e^{-it\hat{Z}_S}$ is usually introduced through a version of (5.12).

5.3. Total energy operator. Let us analyze the impact of the invariance of a c.p. semigroup on its quantum Langevin dynamics.

Suppose now that $K$ is a self-adjoint operator on $\mathcal{K}$ and $Y$ a self-adjoint operator on $\mathfrak{h}$. Assume that they satisfy
$$\nu K = (K\otimes 1 + 1\otimes Y)\nu, \qquad [\Upsilon+\Upsilon^*, K] = 0. \tag{5.13}$$
This implies in particular that $M$ is $K$-invariant. Define the self-adjoint operator on $\mathcal{Z}$
$$E := K\otimes 1 + 1\otimes d\Gamma(Y\otimes 1).$$
(5.14)

Then it is easy to see that the quantum Langevin dynamics commutes with this operator:
$$[E, e^{-itZ}] = 0. \tag{5.15}$$
$E$ will be called the total energy operator, which is a name suggested by the physical interpretation that we attach to $E$.

Next we discuss the implications of the DBC of a c.p. semigroup on its quantum Langevin dynamics. We set
$$\sigma_t(C) := e^{itE}Ce^{-itE}, \qquad \omega_\beta(C) := {\rm Tr}\,\bigl(e^{-\beta K}\otimes|\Omega)(\Omega|\bigr)\,C\,/\,{\rm Tr}\,e^{-\beta K}, \qquad C\in B(\mathcal{Z}).$$
We will see that the DBC for $e^{-\beta K}/{\rm Tr}\,e^{-\beta K}$ is related to a version of the $\beta$-KMS condition for the dynamics $\sigma_t$ and the state $\omega_\beta$.

Theorem 5.2. Assume (5.14). Then the following statements are equivalent:

1) For any $D_1, D_1', D_2, D_2'\in B(\mathcal{K})$, $f_1, f_2\in L^2(\mathbb{R})$ and
$$B_j := D_j\otimes 1\; a^*\bigl(\nu\otimes|f_j)\bigr) + a\bigl(\nu\otimes|f_j)\bigr)\; D_j'\otimes 1, \qquad j = 1, 2,$$
and for any $t\in\mathbb{R}$ we have
$$\omega_\beta\bigl(\sigma_t(B_1)B_2\bigr) = \omega_\beta\bigl(B_2\,\sigma_{t+i\beta}(B_1)\bigr). \tag{5.16}$$

2)
$${\rm Tr}_{\mathfrak{h}}\,\nu A\nu^* = \nu^*\,A\otimes e^{-\beta Y}\,\nu. \tag{5.17}$$
(This implies in particular that $M$ satisfies the DBC for $e^{-\beta K}/{\rm Tr}\,e^{-\beta K}$.)

6. Weak coupling limit for Pauli-Fierz operators. In this section we describe the main results of this article. They are devoted to a rather large class of Pauli-Fierz operators in the weak coupling limit. In the first subsection we recall the well known results about the reduced dynamics, which go back to Davies [Da1, Da2, Da3]. In the second subsection we describe our results that include the reservoir [DD2]. They are inspired by [AFL]. Finally, we discuss the case of thermal reservoirs.

6.1. Reduced weak coupling limit. We consider a Pauli-Fierz operator
$$H_\lambda = K\otimes 1 + 1\otimes d\Gamma(H_R) + \lambda\bigl(a^*(V) + a(V)\bigr).$$
We assume that $\mathcal{K}$ is finite dimensional and for any $A\in B(\mathcal{K})$ we have
$$\int\|V^*\,A\otimes 1\,e^{-itH_0}V\|\,dt < \infty.$$
The following theorem is essentially a special case of a result of Davies [Da1, Da2, Da3], see also [DD2].

Theorem 6.1 (Reduced weak coupling limit for Pauli-Fierz operators). There exists a $K$-invariant Markov c.p.
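semigroup $e^{tM}$ on $B(\mathcal{K})$ such that
$$\lim_{\lambda\searrow 0} e^{-itK/\lambda^2}\,I_{\mathcal{K}}^*\,e^{itH_\lambda/\lambda^2}\,(A\otimes 1)\,e^{-itH_\lambda/\lambda^2}\,I_{\mathcal{K}}\,e^{itK/\lambda^2} = e^{tM}(A),$$
and a contractive semigroup $e^{-it\Upsilon}$ on $\mathcal{K}$ such that $[\Upsilon, K] = 0$ and
$$\lim_{\lambda\searrow 0} e^{itK/\lambda^2}\,I_{\mathcal{K}}^*\,e^{-itH_\lambda/\lambda^2}\,I_{\mathcal{K}} = e^{-it\Upsilon}.$$
If the reservoir is at inverse temperature $\beta$, then $M$ satisfies the DBC for the state $e^{-\beta K}/{\rm Tr}\,e^{-\beta K}$.

The operator $\Upsilon\in B(\mathcal{K})$ arising in the weak coupling limit equals
$$\Upsilon := -i\sum_\omega\sum_{k-k'=\omega}\int_0^\infty 1_k(K)\,V^*\,1_{k'}(K)\,e^{-it(H_R-\omega)}\,V\,1_k(K)\,dt.$$
In order to write an explicit formula for $M$ it is convenient to introduce an additional assumption, which will in any case be useful later on in the extended weak coupling limit.

Assumption 6.2. Suppose that for any $\omega\in{\rm sp}K - {\rm sp}K$ there exist an open $I_\omega\subset\mathbb{R}$ and a Hilbert space $\mathfrak{h}_\omega$ such that $\omega\in I_\omega$ and
$${\rm Ran}\,1_{I_\omega}(H_R) \simeq \mathfrak{h}_\omega\otimes L^2(I_\omega, dx),$$
$1_{I_\omega}(H_R)H_R$ is the multiplication operator by the variable $x\in I_\omega$ and, for $\psi\in\mathcal{K}$,
$$1_{I_\omega}(H_R)V\psi \simeq \int_{I_\omega}^{\oplus} v(x)\psi\,dx.$$
Assume that $I_\omega$ are disjoint for distinct $\omega$ and $x\mapsto v(x)\in B(\mathcal{K},\mathcal{K}\otimes\mathfrak{h}_\omega)$ is continuous at $\omega$.

Thus we assume that the reservoir 1-body Hamiltonian $H_R$ and the interaction $V$ are well behaved around the Bohr frequencies, that is, the differences of eigenvalues of $K$.

Let $\mathfrak{h} := \oplus_\omega\mathfrak{h}_\omega$. We define $\nu_\omega\in B(\mathcal{K},\mathcal{K}\otimes\mathfrak{h}_\omega)$ by
$$\nu_\omega := (2\pi)^{1/2}\sum_{\omega=k-k'} 1_k(K)\,v(\omega)\,1_{k'}(K)$$
and $\nu\in B(\mathcal{K},\mathcal{K}\otimes\mathfrak{h})$ by
$$\nu := \sum_\omega\nu_\omega.$$
Note that
$$i\Upsilon - i\Upsilon^* = \sum_\omega\sum_{k-k'=\omega}\int_{-\infty}^{\infty} 1_k(K)\,V^*\,1_{k'}(K)\,e^{-it(H_R-\omega)}\,V\,1_k(K)\,dt = 2\pi\sum_\omega\sum_{k-k'=\omega} 1_k(K)\,v^*(\omega)\,1_{k'}(K)\,v(\omega)\,1_k(K) = \nu^*\nu.$$
The generator of a c.p. Markov semigroup that arises in the reduced weak coupling limit, sometimes called the Davies generator, is
$$M(A) = -i(\Upsilon A - A\Upsilon^*) + \nu^*\,(A\otimes 1)\,\nu = -i\bigl[\tfrac12(\Upsilon+\Upsilon^*), A\bigr] - \tfrac12[A, \nu^*\nu]_+ + \nu^*\,(A\otimes 1)\,\nu, \qquad A\in B(\mathcal{K}). \tag{6.1}$$

6.2. Energy of the reservoir in the weak coupling limit. Introduce the operator $Y$ on $\mathfrak{h}$ by setting
$$Y = \omega \ \text{ on } \mathfrak{h}_\omega. \tag{6.2}$$
The operator $Y$ has the interpretation of the asymptotic energy of the restricted reservoir.

Theorem 6.3. 1) The operator $\nu$ constructed in the weak coupling limit satisfies
$$\nu K = (K\otimes 1 + 1\otimes Y)\nu. \tag{6.3}$$
This implies in particular that $M$ is $K$-invariant.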
2) If the reservoir is at inverse temperature $\beta$, then $\nu$ satisfies
$${\rm Tr}_{\mathfrak{h}}\,\nu A\nu^* = \nu^*\,A\otimes e^{-\beta Y}\,\nu. \tag{6.4}$$
This implies in particular that $M$ satisfies the DBC for $e^{-\beta K}/{\rm Tr}\,e^{-\beta K}$.

6.3. Extended weak coupling limit. Recall that given $(\Upsilon, \nu, \mathfrak{h})$ we can define the space $\mathcal{Z}_R$ and the Langevin Schrödinger dynamics $e^{-itZ}$ on the space $\mathcal{Z} := \mathcal{K}\otimes\Gamma_{\rm s}(\mathcal{Z}_R)$, as in Subsect. 5.1.

For $\lambda > 0$, we define the family of partial isometries $J_{\lambda,\omega} : \mathfrak{h}_\omega\otimes L^2(\mathbb{R})\to\mathfrak{h}_\omega\otimes L^2(I_\omega)\subset\mathcal{H}_R$:
$$(J_{\lambda,\omega}g_\omega)(y) = \begin{cases} \lambda^{-1}\,g_\omega\bigl(\tfrac{y-\omega}{\lambda^2}\bigr), & \text{if } y\in I_\omega;\\ 0, & \text{if } y\in\mathbb{R}\setminus I_\omega.\end{cases}$$
We set $J_\lambda : \mathcal{Z}_R\to\mathcal{H}_R$, defined for $g = (g_\omega)$ by
$$J_\lambda g := \sum_\omega J_{\lambda,\omega}g_\omega.$$
Note that $J_\lambda$ are partial isometries and ${\rm s-}\lim_{\lambda\searrow 0}J_\lambda^*J_\lambda = 1$.

Set $Z_0 := d\Gamma(Z_R)$. The following theorem [DD2] was inspired by [AFL]:

Theorem 6.4 (Extended weak coupling limit for Pauli-Fierz operators).
$${\rm s^*-}\lim_{\lambda\searrow 0}\ \Gamma(J_\lambda^*)\,e^{i\lambda^{-2}tH_0}\,e^{-i\lambda^{-2}(t-t_0)H_\lambda}\,e^{-i\lambda^{-2}t_0H_0}\,\Gamma(J_\lambda) = e^{itZ_0}\,e^{-i(t-t_0)Z}\,e^{-it_0Z_0}.$$

The extended weak coupling limit can be used to describe interesting physical properties of non-equilibrium quantum systems, see e.g. [DM]. The following corollary, which generalizes the results of [Du], describes the asymptotics of correlation functions for observables of the form $\Gamma(J_\lambda)A\Gamma(J_\lambda^*)$, where $A$ are observables on the asymptotic space.

Corollary 6.5 (Asymptotics of correlation functions). Suppose that $A_\ell,\dots,A_1\in B(\mathcal{Z})$ and $t, t_\ell,\dots,t_1, t_0\in\mathbb{R}$. Then
$${\rm s^*-}\lim_{\lambda\searrow 0}\ I_{\mathcal{K}}^*\,e^{i\lambda^{-2}tH_0}\,e^{-i\lambda^{-2}(t-t_\ell)H_\lambda}\,e^{-i\lambda^{-2}t_\ell H_0}\,\Gamma(J_\lambda)A_\ell\Gamma(J_\lambda^*)\cdots\Gamma(J_\lambda)A_1\Gamma(J_\lambda^*)\,e^{i\lambda^{-2}t_1H_0}\,e^{-i\lambda^{-2}(t_1-t_0)H_\lambda}\,e^{-i\lambda^{-2}t_0H_0}\,I_{\mathcal{K}}$$
$$= I_{\mathcal{K}}^*\,e^{itZ_0}\,e^{-i(t-t_\ell)Z}\,e^{-it_\ell Z_0}\,A_\ell\cdots A_1\,e^{it_1Z_0}\,e^{-i(t_1-t_0)Z}\,e^{-it_0Z_0}\,I_{\mathcal{K}}.$$

The following corollary is interesting since it describes how reservoir Hamiltonians converge to operators whose dynamics under the quantum Langevin dynamics $U_{-t}\cdot U_t$ is well-studied, see e.g. [Bar].

Corollary 6.6 (Asymptotic reservoir energies). Consider the operator $Y : \mathfrak{h}\to\mathfrak{h}$ defined in (6.2). The operator $E := K\otimes 1 + 1\otimes d\Gamma(Y\otimes 1)$ plays the role of "asymptotic total energy operator", i.e.
$$[E, e^{itZ}] = 0. \tag{6.5}$$
Besides, for $\kappa_1,\dots,\kappa_\ell\in\mathbb{R}$,
$${\rm s^*-}\lim_{\lambda\searrow 0}\ I_{\mathcal{K}}^*\,e^{i\lambda^{-2}tH_0}\,e^{-i\lambda^{-2}(t-t_\ell)H_\lambda}\,e^{-i\lambda^{-2}t_\ell H_0}\,e^{i\kappa_\ell d\Gamma(H_R)}\cdots e^{i\kappa_1 d\Gamma(H_R)}\,e^{i\lambda^{-2}t_1H_0}\,e^{-i\lambda^{-2}(t_1-t_0)H_\lambda}\,e^{-i\lambda^{-2}t_0H_0}\,I_{\mathcal{K}}$$
$$= I_{\mathcal{K}}^*\,e^{itZ_0}\,e^{-i(t-t_\ell)Z}\,e^{-it_\ell Z_0}\,e^{i\kappa_\ell d\Gamma(Y\otimes 1)}\cdots e^{i\kappa_1 d\Gamma(Y\otimes 1)}\,e^{it_1Z_0}\,e^{-i(t_1-t_0)Z}\,e^{-it_0Z_0}\,I_{\mathcal{K}}.$$

References
[AFLe] Accardi, L., Frigerio, A., Lewis, J.T.: Quantum stochastic processes, Publ. RIMS 18 (1982) 97-133
[AFL] L. Accardi, A. Frigerio, Y.G. Lu: Weak coupling limit as a quantum functional central limit theorem, Comm. Math. Phys. 131, 537–570 (1990).
[ALV] L. Accardi and Y. G. Lu and I. V. Volovich: Quantum Theory and Its Stochastic Limit, Springer, New York, 2002
[Ag] G. S. Agarwal: Open quantum Markovian systems and microreversibility, Z. Physik, 258, (1973) 409–422
[Al1] R. Alicki: On the detailed balance condition for non-Hamiltonian systems, Rep. Math. Phys., 10, (1976) 249-258
[Al2] R. Alicki: Invitation to quantum dynamical semigroups, eds P. Garbaczewski and R. Olkiewicz, "Dynamics of Dissipation", Lecture Notes in Physics, Springer, 2002
[AL] Alicki, R., Lendi, K.: Quantum dynamical semigroups and applications, Lecture Notes in Physics no 286, Springer 1991
[At] S. Attal: Quantum noises, "Quantum Open Systems II: The Markovian approach", eds S. Attal and A. Joye and C.-A. Pillet, Lecture Notes in Mathematics 1881, Springer, 2006
[AtP] S. Attal and Y. Pautrat: From repeated to continuous quantum interactions: to appear in Annales Henri Poincaré 7 2006
[AtJ] S. Attal and A. Joye: The Langevin Equation for a Quantum Heat Bath: math-ph/0612055
[Bar] A. Barchielli: Continual Measurements in Quantum Mechanics, Open Quantum Systems III. Recent developments eds S. Attal and A. Joye and C.-A. Pillet, Lecture Notes in Mathematics 1882, Springer 2006, pp 207-292
[Ch] A. M. Chebotarev: Symmetric form of the Hudson-Parthasarathy stochastic equation, Mat. Zametki [Math. Notes] 60 (1996) 726-750
[ChR] A. M. Chebotarev and G. V.
Ryzhakov: On the Strong Resolvent Convergence of the Schrodinger Evolution to Quantum Stochastics, Mathematical Notes 74 (2003) 717-733 [Da1] E. B. Davies: Markovian master equations, Comm. Math. Phys. 39, 91 (1974). [Da2] Davies, E. B.: Markovian master equations II. Math. Ann. 219, 147 (1976). [Da3] Davies, E. B.: One parameter semigroups, Academic Press 1980 [De0] J. Dereziński: Introduction to Representations of Canonical Commutation and Anticom- mutation Relations, Large Coulomb Systems, Lecture Notes in Physics 695, eds J. Derezinski and H. Siedentop, Springer, 2006 [DD1] J. Dereziński, W. De Roeck: Extended weak coupling limit for Friedrichs Hamiltonians, Journ. Math. Phys. 48 (2007), 012103 [DD2] J. Dereziński, W. De Roeck: Extended weak coupling limit for Pauli-Fierz operators, to appear in Commun. Math. Phys., preprint math-ph/0610054 [DF1] J. Dereziński, R. Früboes: Fermi Golden Rule and open quantum systems, ”Open Quan- tum Systems III Recent Developments” Lecture Notes in Mathematics 1882 eds S. Attal, A. Joye, C.-A. Pillet 2006, pp 67-116 [DF2] J. Dereziński, R. Früboes: Renormalization of Friedrichs Hamiltonians, Reports on Math. Phys. 50, 433–438 (2002) [DJ1] J. Dereziński and V. Jaksic: Spectral theory of Pauli-Fierz operators, J. Func. Anal. 180 (2001) ”243–327”, [DJP] Dereziński, J., Jakšić, V., Pillet, C. A.: Perturbation theory of W ∗-dynamics, Liouvilleans and KMS-states, to appear in Rev. Math. Phys http://arxiv.org/abs/math-ph/0612055 http://arxiv.org/abs/math-ph/0610054 30 J. DEREZIŃSKI AND W. DE ROECK [DM] W. De Roeck and C. Maes: Fluctuations of the dissipated heat in a quantum stochastic model, Rev. Math. Phys. 18 (2006) ”619–653”, [Du] R. Dümcke: Convergence of multitime correlation functions in the weak and singular cou- pling limits, J. Math. Phys. 24 (19983) 311-315 [EL] D.E. Evans and J.T. Lewis: Dilations of irreversible evolutions in algebraic quantum theory, ed. 
Dublin Institute for Advanced Studies, 1977 [Fa] Fagnola,F.: Quantum stochastic differential equations and dilation of completely positive semigroups, ”Open Quantum Systems III Recent Developments” Lecture Notes in Mathe- matics 1882 eds S. Attal, A. Joye, C.-A. Pillet 2006, pp 183-220 [Fr] Frigerio, A.: Covariant Markov dilations of quantum dynamical semigroups, Publ. RIMS Kyoto Univ. 21 (1985) 657-675 [FKGV] A. Frigerio and A. Kossakowski and V. Gorini and M. Verri: Quantum detailed balance and KMS condition, CMP, 57 (1977) 97–110 [GKS] Gorini, V., Kossakowski, A., Sudarshan, E.C.G. Journ. Math. Phys. 17 (1976) 821 [Go] J. Gough: Quantum Flows as Markovian Limit of Emission, Absorption and Scattering Interactions, CMP 254 (2005) 489–512 [Gr] Gregoratti, M.: The Hamiltonina operator associated with some quantum stochastic evo- lutions, Comm. Math. Phys. 222 (2001) 181-200; Erratum, Comm. Math. Phys. 264 (2006) 563-564 [Haa] Haake, F.: Statistical treatment of open systems by generalized master equation. Springer Tracts in Modern Physics 66, Springer-Verlag, Berlin, 1973. [HP] R. L. Hudson, K. R. Parthasaraty: Quantum Ito’s formula and stochastic evolutions, Comm. Math. Phys. 93 no. 3, 301–323 (1984). [Ku] B. Kümmerer, W. Schröder: A new construction of unitary dilations: singular coupling to white noise, in Quantum Probabilty and Applications, eds L. Accardi and W. von Walden- fels, (1984) [LeSp] Lebowitz, J., Spohn, H.: Irreversible thermodynamics for quantum systems weakly cou- pled to thermal reservoirs. Adv. Chem. Phys. 39, 109 (1978). [Li] G. Lindblad: On the generators of quantum dynamical semigroups, Comm. Math. Phys. 48 (1976) 119-130 [Maa] Maasen, H.: Quantum Markov processes on Fock space described by integral kernels, in: Quantum probability and applications II, LNM 1136 Springer Berlin 1985, pp 361-374 [Ma] Majewski, W. A.: Journ. Math. Phys. The detailed balance condition in quantum statistical mechanics 25 (1984) 614 [MaSt] Majewski, W. 
A., Streater, R. F.: Detailed balance and quantum dynamical maps Journ. Phys. A: Math. Gen. 31 (1998) 7981-7995 [Me] Meyer, P.-A.: Quantum probability for probabilists, 2nd edition, L.N.M. 1538, Springer, Berlin 1995 [NF] B. Sz. Nagy and C. Foias: Harmonic Analysis of Operators in Hilbert Space, North-Holland, New York (1970) [Pa] Parthasarathy, K.R.: An introduction to quantum stochastic calculus, Birkhäuser, Basel- Boston-Berlin 1992 [St] W. F. Stinespring: Positive functions on C-algebras, Proc. Amer. Math. Soc., 6, (1955) 211-216. [VH] L. Van Hove: Quantum-mechanical perturbations giving rise to a statistical transport equation. Physica 21, 517 (1955). Introduction. Toy model of the weak coupling limit. Dilations of contractive semigroups. ``Toy quadratic noises''. Weak coupling limit for Friedrichs operators. Completely positive maps and semigroups. Completely positive maps. Completely positive semigroups. Classical Markov semigroups. Invariant c.p semigroups. Detailed Balance Condition. Bosonic reservoirs. Second quantization. Coupling to a bosonic reservoir. Thermal reservoirs. Quantum Langevin dynamics. Linear noises. Quadratic noises. Total energy operator. Weak coupling limit for Pauli-Fierz operators. Reduced weak coupling limit. Energy of the reservoir in the weak coupling limit. Extended weak coupling limit. ABSTRACT We give an extended review of recent work on the extended weak coupling limit. Background material on completely positive semigroups and their unitary dilations is given, as well as a particularly easy construction of `quadratic noises'. 
<|endoftext|><|startoftext|> arXiv:0704.0670v1 [nucl-ex] 5 Apr 2007 Typeset with jpsj2.cls Letter Complete Set of Polarization Transfer Observables for the 12C(p, n) Reaction at 296 MeV and 0◦ Masanori Dozono∗, Tomotsugu Wakasa, Ema Ihara, Shun Asaji, Kunihiro Fujita1, Kichiji Hatanaka1, Takashi Ishida2, Takaaki Kaneda1, Hiroaki Matsubara1, Yuji Nagasue, Tetsuo Noro, Yasuhiro Sakemi3, Yohei Shimizu1, Hidemitsu Takeda, Yuji Tameshige1, Atsushi Tamii1 and Yukiko Yamada Department of Physics, Kyushu University, Fukuoka 812-8581 Research Center for Nuclear Physics, Osaka University, Osaka 567-0047 Laboratory of Nuclear Science, Tohoku University, Sendai 982-0826 Cyclotron and Radioisotope Center, Tohoku University, Sendai 980-8578 A complete set of polarization transfer observables has been measured for the 12C(p, n) reaction at Tp = 296 MeV and θlab = 0 ◦. The total spin transfer Σ(0◦) and the observable f1 deduced from the measured polarization transfer observables indicate that the spin–dipole resonance at Ex ≃ 7 MeV has greater 2 − strength than 1− strength, which is consistent with recent experimental and theoretical studies. The results also indicate a predominance of the spin-flip and unnatural-parity transition strength in the continuum. The exchange tensor interaction at a large momentum transfer of Q ≃ 3.6 fm−1 is discussed. KEYWORDS: complete set of polarization transfer observables, spin–dipole resonance, exchange tensor interaction The charge exchange reaction at intermediate ener- gies (T & 100 MeV/A) is one of the best probes to study spin–isospin excitations in nuclei, such as spin– dipole (SD) excitations characterized by ∆L = 1, ∆S = 1, and ∆Jπ = 0−, 1−, and 2−. In previous (p, n) and (n, p) experiments on 12C,1, 2 spin–dipole resonances (SDRs) were found at Ex ≃ 4 and 7 MeV. Analysis of the angular distributions of the SDRs at Ex ≃ 4 and 7 MeV indicate that they consist of mainly 2− and 1− compo- nents, respectively. 
However, a recent $^{12}{\rm C}(\vec{d},{}^{2}{\rm He})^{12}{\rm B}$ experiment$^{3)}$ suggested that the SDR at $E_x\simeq 7$ MeV has more $2^-$ components than $1^-$ components. This suggestion is supported by a $^{12}{\rm C}(^{12}{\rm C},{}^{12}{\rm N})^{12}{\rm B}$ experiment$^{4)}$ and by theoretical calculations including tensor correlations.$^{5)}$ Thus the spin-parity assignment of the SDR at $E_x\simeq 7$ MeV for the $A = 12$ system is still controversial.

A complete set of polarization transfer (PT) observables at $0^\circ$ is a powerful tool for investigating the spin-parity $J^\pi$ of an excited state. The total spin transfer $\Sigma(0^\circ)$ deduced from such a set gives information on the transferred spin $\Delta S$, which is independent of theoretical models.$^{6)}$ Furthermore, information can be obtained on the parity from the observable $f_1$.$^{7)}$ On the other hand, each PT observable is sensitive to the effective nucleon–nucleon ($NN$) interaction. The PT observables for $\Delta J^\pi = 1^+$ transitions have been used to study the exchange tensor interaction at large momentum transfers.$^{8,9)}$

In this Letter, we present measurements of a complete set of PT observables for the $^{12}{\rm C}(p,n)$ reaction at $T_p = 296$ MeV and $\theta_{\rm lab} = 0^\circ$. We have deduced the total spin transfer $\Sigma$ and the observable $f_1$ using the measured PT observables in order to investigate the spin-parity structure in both the SDR and continuum regions. We also compare the PT observables for the $^{12}{\rm C}(p,n)^{12}{\rm N}({\rm g.s.}; 1^+)$ reaction with distorted-wave impulse approximation (DWIA) calculations employing the effective $NN$ interaction in order to assess the effective tensor interaction at a large exchange momentum transfer of $Q\simeq 3.6$ fm$^{-1}$.

∗E-mail address: dozono@kutl.kyushu-u.ac.jp

Measurements were carried out at the neutron time-of-flight facility$^{10)}$ at the Research Center for Nuclear Physics (RCNP), Osaka University. The proton beam energy was 296 MeV and the typical current and polarization were 500 nA and 0.70, respectively.
The neutron energy and polarization were measured by the neutron detector/polarimeter NPOL3.$^{11)}$ We used a natural carbon (98.9% $^{12}{\rm C}$) target with a thickness of 89 mg/cm$^2$. The measured cross sections were normalized to the $0^\circ$ $^{7}{\rm Li}(p,n)^{7}{\rm Be}$(g.s. + 0.43 MeV) reaction, which has a center of mass (c.m.) cross section of $\sigma_{\rm c.m.}(0^\circ) = 27.0\pm0.8$ mb/sr at this incident energy.$^{12)}$ The systematic uncertainties of the data were estimated to be 4–6%.

Asymmetries of the $^{1}{\rm H}(\vec{n},p)n$ and $^{12}{\rm C}(\vec{n},p)X$ reactions in NPOL3 were used to deduce the neutron polarization. The effective analyzing power $A_{y;{\rm eff}}$ of NPOL3 was calibrated by using polarized neutrons from the $^{12}{\rm C}(\vec{p},\vec{n})^{12}{\rm N}({\rm g.s.}; 1^+)$ reaction at 296 MeV and $0^\circ$. A detailed description of the calibration can be found in Ref. 11. The resulting $A_{y;{\rm eff}}$ was $0.151\pm0.007\pm0.004$, where the first and second uncertainties are statistical and systematic, respectively.

J. Phys. Soc. Jpn. Letter

Fig. 1. Double differential cross section (top panel) and a complete set of polarization transfer observables (bottom three panels) for the $^{12}{\rm C}(p,n)$ reaction at $T_p = 296$ MeV and $\theta_{\rm lab} = 0^\circ$. The error bars represent statistical uncertainties only.

Figure 1 shows the double differential cross section and a complete set of PT observables $D_{ii}$ ($i = S$, $N$, and $L$) at $0^\circ$ as a function of excitation energy $E_x$. The laboratory coordinates at $0^\circ$ are defined so that the normal ($\hat{N}$) direction is the same as $\hat{N}$ at finite angles (normal to the reaction plane), the longitudinal ($\hat{L}$) direction is along the momentum transfer, and the sideways ($\hat{S}$) direction is given by $\hat{S} = \hat{N}\times\hat{L}$. The data of the cross section in Fig. 1 have been sorted into 0.25-MeV bins, while the data of $D_{ii}(0^\circ)$ have been sorted into 1-MeV bins to reduce statistical fluctuations. A high energy resolution of 500 keV full width at half maximum (FWHM) was realized by NPOL3, which enabled us to observe clearly two SDR peaks at $E_x\simeq 4$ and 7 MeV.
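The effect of sorting the $D_{ii}(0^\circ)$ data into wider bins can be illustrated with a minimal sketch. It assumes that adjacent fine bins are merged by an inverse-variance weighted mean, which is a common choice but is not stated in the text; the function name `rebin_weighted` and all input numbers are hypothetical.

```python
import math

def rebin_weighted(values, errors, factor):
    """Merge `factor` adjacent bins with inverse-variance weights.

    Returns a list of (merged_value, merged_error) pairs. A trailing
    group with fewer than `factor` bins is dropped. This is an assumed
    merging rule, not the procedure documented in the Letter.
    """
    merged = []
    for start in range(0, len(values) - factor + 1, factor):
        chunk_values = values[start:start + factor]
        chunk_errors = errors[start:start + factor]
        weights = [1.0 / err ** 2 for err in chunk_errors]
        total_weight = sum(weights)
        mean = sum(w * v for w, v in zip(weights, chunk_values)) / total_weight
        merged.append((mean, 1.0 / math.sqrt(total_weight)))
    return merged

# Hypothetical input: eight 0.25-MeV bins with equal statistical errors.
fine_values = [-0.20, -0.24, -0.18, -0.22, -0.50, -0.58, -0.54, -0.56]
fine_errors = [0.04] * 8

# Merging four 0.25-MeV bins gives one 1-MeV bin.
coarse = rebin_weighted(fine_values, fine_errors, 4)
for value, error in coarse:
    print(f"{value:+.3f} +/- {error:.3f}")
```

With equal uncertainties, merging four bins halves the statistical error, which is the motivation for using 1-MeV bins for the polarization observables while keeping 0.25-MeV bins for the cross section.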
It should be noted that the $D_{NN}(0^\circ)$ value should be equal to the corresponding $D_{SS}(0^\circ)$ value because the $\hat{N}$ direction is identical to the $\hat{S}$ direction at $0^\circ$. The experimental $D_{NN}(0^\circ)$ and $D_{SS}(0^\circ)$ values are consistent with each other within statistical uncertainties over the entire range of $E_x$, demonstrating the reliability of our measurements.

Figure 2 shows the total spin transfer $\Sigma(0^\circ)$ and the observable $f_1$ defined as
$$\Sigma(0^\circ) = \frac{3 - [2D_{NN}(0^\circ) + D_{LL}(0^\circ)]}{4},$$
$$f_1 = \frac{1 - 2D_{NN}(0^\circ) + D_{LL}(0^\circ)}{2[1 + D_{LL}(0^\circ)]},$$
as a function of excitation energy $E_x$. The $\Sigma(0^\circ)$ value is either 0 or 1 depending on whether $\Delta S = 0$ or $\Delta S = 1$, which is independent of theoretical models.$^{6)}$ The $f_1$ value is either 0 or 1 depending on the natural-parity or unnatural-parity transition if a single $\Delta J^\pi$ transition is dominant.$^{7)}$ The $\Sigma(0^\circ)$ and $f_1$ values of the spin-flip unnatural-parity $1^+$ and $2^-$ states at $E_x = 0$ and 4 MeV, respectively, are almost unity, which is consistent with theoretical predictions. The continuum $\Sigma(0^\circ)$ values are almost independent of $E_x$ and take values larger than 0.88 up to $E_x = 50$ MeV, indicating the predominance of the spin-flip strength. The solid line in the top panel of Fig. 2 represents the free $NN$ values of $\Sigma(0^\circ)$ for the corresponding kinematical condition.$^{13)}$ Enhancement of $\Sigma(0^\circ)$ relative to the free $NN$ values means enhancement of the $\Delta S = 1$ response relative to the $\Delta S = 0$ response in nuclei at small momentum transfers, which is consistent with previous studies of $(p,p')$ scattering.$^{14,15)}$ The large values of $f_1\ge 0.72$ up to $E_x = 50$ MeV indicate a predominance of the unnatural-parity transition strength in the continuum, consistent with the $^{90}{\rm Zr}(p,n)$ result at 295 MeV.$^{7)}$

Fig. 2. Total spin transfer $\Sigma$ (top panel) and observable $f_1$ (bottom panel) for the $^{12}{\rm C}(p,n)$ reaction at $T_p = 296$ MeV and $\theta_{\rm lab} = 0^\circ$. The error bars represent statistical uncertainties only. The solid line shows the values of $\Sigma$ for free $NN$ scattering.

The top panel of Fig.
3 shows the spin-flip ($\sigma\Sigma$) and non-spin-flip ($\sigma(1-\Sigma)$) cross sections as filled and open circles, respectively, as functions of $E_x$. The bottom panel shows the unnatural-parity dominant ($\sigma f_1$) and natural-parity dominant ($\sigma(1-f_1)$) components of the cross section as filled and open circles, respectively. The solid lines are the results of peak fitting of the spectra with Gaussian peaks and a continuum. The continuum was assumed to be the quasi-free scattering contribution, and its shape was given by the formula given in Ref. 16. It should be noted that the spin-flip unnatural-parity $1^+$ and $2^-$ states at $E_x = 0$ and 4 MeV, respectively, form peaks only in the $\sigma\Sigma$ and $\sigma f_1$ spectra. It is found that the prominent peak at $E_x\simeq 7$ MeV is the spin-flip unnatural-parity component with a $J^\pi$ value estimated to be $2^-$ because the $D_{ii}(0^\circ)$ values are consistent with the theoretical prediction for $J^\pi = 2^-$.$^{17)}$ In the $\sigma(1-f_1)$ spectrum, possible evidence for SD $1^-$ peaks is seen at $E_x\simeq 7$, 10, and 14 MeV.

The top and bottom panels of Fig. 4 show theoretical calculations for the unnatural-parity and natural-parity SD strengths, respectively.$^{5)}$ Experimentally extracted peaks in the $\sigma f_1$ and $\sigma(1-f_1)$ spectra are also shown. Concentration of the SD $2^-$ strength at three peaks at $E_x\simeq 4$, 8, and 13 MeV has been predicted. Our data agree with this prediction qualitatively, but give slightly different excitation energies of $E_x\simeq 4$, 7, and 11 MeV. On the other hand, the SD $1^-$ strength has been predicted to be quenched and fragmented due to tensor correlations.$^{5)}$ The experimental results are spread over a wide region of $E_x\simeq 5$–16 MeV and exhibit similar cross sections, which supports fragmentation of the SD $1^-$ strength.

Fig. 3. Cross sections separated by $\Sigma$ (top panel) and $f_1$ (bottom panel) for the $^{12}{\rm C}(p,n)$ reaction at $T_p = 296$ MeV and $\theta_{\rm lab} = 0^\circ$. The solid lines show peak fitting of the spectra with Gaussian peaks and a continuum.
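The way $\Sigma(0^\circ)$ and $f_1$ follow from the measured observables, and how they split a cross section into the components plotted in Fig. 3, can be sketched as follows. The formulas are $\Sigma(0^\circ) = \{3 - [2D_{NN}(0^\circ) + D_{LL}(0^\circ)]\}/4$ and $f_1 = [1 - 2D_{NN}(0^\circ) + D_{LL}(0^\circ)]/\{2[1 + D_{LL}(0^\circ)]\}$; the input values are the ground-state numbers quoted in Table I, while the function names and the unit cross section are hypothetical choices made for illustration.

```python
def total_spin_transfer(d_nn, d_ll):
    """Sigma(0 deg) = [3 - (2*D_NN + D_LL)] / 4, using D_SS = D_NN at 0 deg."""
    return (3.0 - (2.0 * d_nn + d_ll)) / 4.0

def f1_observable(d_nn, d_ll):
    """f1 = (1 - 2*D_NN + D_LL) / (2 * (1 + D_LL))."""
    return (1.0 - 2.0 * d_nn + d_ll) / (2.0 * (1.0 + d_ll))

def split_cross_section(sigma, weight):
    """Return (sigma * weight, sigma * (1 - weight))."""
    return sigma * weight, sigma * (1.0 - weight)

# Ground-state 1+ transition, "This work" row of Table I.
d_nn, d_ll = -0.216, -0.554
big_sigma = total_spin_transfer(d_nn, d_ll)
f1 = f1_observable(d_nn, d_ll)

# Hypothetical unit cross section, only to show the decomposition.
spin_flip, non_spin_flip = split_cross_section(1.0, big_sigma)
unnatural, natural = split_cross_section(1.0, f1)

print(f"Sigma(0 deg) = {big_sigma:.4f}, f1 = {f1:.4f}")
print(f"spin-flip fraction = {spin_flip:.4f}, unnatural-parity fraction = {unnatural:.4f}")
```

Both quantities come out within about 0.02 of unity for the ground-state transition, in line with the statement that the $1^+$ state is a spin-flip unnatural-parity transition. For the idealized values $D_{NN} = D_{LL} = -1/3$ both functions return exactly 1.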
Effective tensor interactions at $q\simeq 1$–3 fm$^{-1}$ have mainly been studied using high spin stretched states.$^{18,19)}$ The present $D_{ii}(0^\circ)$ data can give information on the exchange tensor interaction at an extremely large exchange momentum transfer of $Q\simeq 3.6$ fm$^{-1}$. In the Kerman–McManus–Thaler (KMT) representation,$^{20)}$ the $NN$ scattering amplitude is represented as
$$M(q) = A + \tfrac{1}{3}(B + E + F)\,\sigma_1\cdot\sigma_2 + C(\sigma_1 + \sigma_2)\cdot\hat{n} + \tfrac{1}{3}(E - B)S_{12}(\hat{q}) + \tfrac{1}{3}(F - B)S_{12}(\hat{Q}),$$
where $S_{12}$ is the tensor operator, $\hat{q}$ and $\hat{Q}$ are the unit vectors along the direct and exchange momentum transfers, respectively, and $\hat{n} = \hat{Q}\times\hat{q}$. In a plane-wave impulse approximation (PWIA), the PT observables for the Gamow–Teller (GT) transition at $0^\circ$ are simply expressed using parameters $A$–$F$ as$^{17)}$
$$D_{NN}(0^\circ) = D_{SS}(0^\circ) = \frac{-F^2}{2B^2 + F^2},$$
$$D_{LL}(0^\circ) = \frac{-2B^2 + F^2}{2B^2 + F^2}.$$
If there is no exchange tensor $S_{12}(\hat{Q})$ interaction (i.e., $F = B$), then $D_{ii}(0^\circ) = -1/3$.

The measured PT observables $D_{ii}(0^\circ)$ for the GT $^{12}{\rm C}(\vec{p},\vec{n})^{12}{\rm N}({\rm g.s.}; 1^+)$ transition are listed in Table I, where the listed uncertainties are statistical only. The present $D_{NN}(0^\circ)$ and $D_{SS}(0^\circ)$ values are consistent with each other, as expected, and the present $D_{NN}(0^\circ)$ value agrees with the previously measured $D_{NN}(0^\circ)$ value at the same energy.$^{9)}$ The experimental values deviate from $-1/3$, which indicates that there are contributions from both the exchange tensor interaction at $Q\simeq 3.6$ fm$^{-1}$ and nuclear distortion effects.

Fig. 4. SD strengths for unnatural-parity (top panel) and natural-parity (bottom panel) taken from Ref. 5. The solid lines represent peaks obtained by fitting $\sigma f_1$ (top panel) and $\sigma(1-f_1)$ (bottom panel) spectra.
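The PWIA statement that $F = B$ gives $D_{ii}(0^\circ) = -1/3$ can be checked with a short sketch. It assumes the PWIA expressions $D_{NN}(0^\circ) = D_{SS}(0^\circ) = -F^2/(2B^2 + F^2)$ and $D_{LL}(0^\circ) = (F^2 - 2B^2)/(2B^2 + F^2)$, and it treats $B$ and $F$ as real numbers, which is a simplification made here. The function name and the trial amplitude ratios are hypothetical.

```python
def pwia_gt_observables(b, f):
    """PWIA polarization transfer for a GT transition at 0 degrees.

    b, f: amplitudes B and F, taken as real numbers for this sketch.
    Returns (D_NN, D_SS, D_LL).
    """
    denom = 2.0 * b * b + f * f
    d_nn = -f * f / denom
    d_ll = (f * f - 2.0 * b * b) / denom
    return d_nn, d_nn, d_ll

# No exchange tensor interaction: F = B gives D_ii = -1/3.
print(pwia_gt_observables(1.0, 1.0))

# Hypothetical F/B ratios, to show how D_LL responds more strongly than D_NN.
for ratio in (0.6, 0.8, 1.0, 1.2):
    d_nn, d_ss, d_ll = pwia_gt_observables(1.0, ratio)
    print(f"F/B = {ratio:.1f}: D_NN = {d_nn:+.3f}, D_LL = {d_ll:+.3f}, sum = {d_nn + d_ss + d_ll:+.3f}")
```

In this approximation the sum $D_{NN} + D_{SS} + D_{LL}$ equals $-1$ for any ratio $F/B$, so $\Sigma(0^\circ) = 1$ as expected for a pure spin-flip transition. Distortion effects, which the text treats with DWIA, are outside the scope of this sketch.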
In order to assess these effects quantitatively, we performed microscopic DWIA calculations using the computer code dw81.$^{21)}$ The transition amplitudes were calculated from the Cohen–Kurath wave functions$^{22)}$ assuming Woods–Saxon radial dependence.$^{23)}$ Distorted waves were generated using the optical model potential (OMP) for proton elastic scattering data on $^{12}{\rm C}$ at 318 MeV.$^{24)}$ We used the effective $NN$ interaction parameterized by Franey and Love (FL) at 270 or 325 MeV.$^{25)}$ First, we examined the sensitivity of the DWIA results to the OMPs by using two different parameter sets.$^{24,26)}$ The OMP dependence of $D_{ii}(0^\circ)$ was found to be less than 0.01. This insensitivity allows us to use $D_{ii}(0^\circ)$ as a probe to study the effective $NN$ interaction. Table I shows the DWIA results for $D_{ii}(0^\circ)$ with the $NN$ interaction at 270 and 325 MeV. It is found that the $D_{ii}(0^\circ)$ values, and $D_{LL}(0^\circ)$ in particular, are sensitive to the choice of the $NN$ interaction. These differences are mainly due to the exchange tensor interaction $S_{12}(Q)$ at $Q\simeq 3.6$ fm$^{-1}$. The real part of $S_{12}(Q)$ for the FL 325 MeV interaction is about twice as large as that for the FL 270 MeV interaction at $Q\simeq 3.6$ fm$^{-1}$ (see Fig. 3 of Ref. 9). The experimental $D_{ii}(0^\circ)$ values support the DWIA results with the FL 270 MeV interaction, which indicates that the exchange tensor part of the FL 270 MeV interaction has an appropriate strength at $Q\simeq 3.6$ fm$^{-1}$. This conclusion has already been reported for $D_{NN}(0^\circ)$ data;$^{9)}$ however, the present data make the conclusion more rigorous because of the high sensitivity of $D_{LL}(0^\circ)$ to the exchange tensor interaction.

Table I. PT observables $D_{ii}(0^\circ)$ for the GT $^{12}{\rm C}(\vec{p},\vec{n})^{12}{\rm N}({\rm g.s.}; 1^+)$ transition at 296 MeV and $0^\circ$ compared with theoretical calculations.

              D_NN(0°)          D_SS(0°)          D_LL(0°)
This work     −0.216 ± 0.019    −0.210 ± 0.039    −0.554 ± 0.023
Ref. 9        −0.215 ± 0.019    –                 –
FL 270 MeV    −0.225            −0.225            −0.550
FL 325 MeV    −0.191            −0.191            −0.619

In summary, a complete set of PT observables for the $^{12}{\rm C}(p,n)$ reaction at $T_p = 296$ MeV and $\theta_{\rm lab} = 0^\circ$ has been measured. The total spin transfer $\Sigma(0^\circ)$ and the observable $f_1$ are deduced in order to study the spin-parity structure in both the SDR and continuum regions. The $\Sigma(0^\circ)$ and $f_1$ values show that the SDR at $E_x\simeq 7$ MeV has greater $2^-$ strength than $1^-$ strength, which agrees with the recent theoretical prediction. In the continuum up to $E_x\simeq 50$ MeV, a predominance of the spin-flip and unnatural-parity transition strength is also found. We have compared the PT observables of the $^{12}{\rm C}(p,n)^{12}{\rm N}({\rm g.s.}; 1^+)$ reaction with DWIA calculations employing the FL interaction. The exchange tensor interaction of the FL 270 MeV interaction is found to be more appropriate at $Q\simeq 3.6$ fm$^{-1}$ than that of the FL 325 MeV interaction. Thus a complete set of PT observables provides rigorous information not only on the spin-parity structure in nuclei but also on the effective $NN$ interaction.

Acknowledgment

We are grateful to the RCNP cyclotron crew for providing a good quality beam for our experiments. We also thank H. Tanabe for his help during the experiments. This work was supported in part by the Grants-in-Aid for Scientific Research Nos. 14702005 and 16654064 of the Ministry of Education, Culture, Sports, Science, and Technology of Japan.

1) X. Yang, L. Wang, J. Rapaport, C. D. Goodman, C. Foster, Y. Wang, W. Unkelbach, E. Sugarbaker, D. Marchlenski, S. de Lucia, B. Luther, J. L. Ullmann, A. G. Ling, B. K. Park, D. S. Sorenson, L. Rybarcyk, T. N. Taddeucci, C. R. Howell and W. Tornow: Phys. Rev. C 48 (1993) 1158.
2) B. D. Anderson, L. A. C. Garcia, D. J. Millener, D. M. Manley, A. R. Baldwin, A. Fazely, R. Madey, N. Tamimi, J. W. Watson and C. C. Foster: Phys. Rev. C 54 (1996) 237.
3) H. Okamura, T. Uesaka, K. Suda, H. Kumasaka, R. Suzuki, A. Tamii, N. Sakamoto and H. Sakai: Phys. Rev. C 66 (2002) 054602.
4) T. Ichihara, M. Ishihara, H. Ohnuma, T. Niizeki, T. Yamamoto, K.
Katoh, T. Yamashita, Y. Fuchi, S. Kubono, M. H. Tanaka, H.Okamura, S. Ishida and T.Uesaka: Nucl.Phys.A 577 (1994) 5) T. Suzuki and H. Sagawa: Nucl. Phys. A 637 (1998) 547. 6) T. Suzuki: Prog. Theor. Phys. 103 (2000) 859. 7) T. Wakasa, H. Sakai, H. Okamura, H. Otsu, T. Nonaka, T. Ohnishi, K. Yako, K. Sekiguchi, S. Fujita, T. Uesaka, Y. Satou, S. Ishida, N. Sakamoto, M. B. Greenfield and K. Hatanaka: J. Phys. Soc. Jpn. 73 (2004) 1611. 8) D.J.Mercer, T.N.Taddeucci, L.J.Rybarcyk, X.Y.Chen, D.L. Prout, R. C. Byrd, J. B. McClelland, W. C. Sailor, S. DeLucia, B. Luther, D. G. Marchlenski, E. Sugarbaker, E. Gülmez, C. A. Whitten, Jr., C. D. Goodman and J. Rapaport: Phys. Rev. Lett. 71 (1993) 684. 9) T. Wakasa, H. Sakai, H. Okamura, H. Otsu, S. Ishida, N. Sakamoto, T. Uesaka, Y. Satou, M. B. Greenfield, N. Koori, A. Okihana and K. Hatanaka: Phys. Rev. C 51 (1995) R2871. 10) H. Sakai, H. Okamura, H. Otsu, T. Wakasa, S. Ishida, N. Sakamoto, T. Uesaka, Y. Satou, S. Fujita and K. Hatanaka: Nucl. Instrum. Methods Phys. Res., Sect. A 369 (1996) 120. 11) T. Wakasa, Y. Hagihara, M. Sasano, S. Asaji, K. Fujita, K. Hatanaka, T. Ishida, T. Kawabata, H. Kuboki, Y. Maeda, T. Noro, T. Saito, H. Sakai, Y. Sakemi, K. Sekiguchi, Y. Shimizu, A. Tamii, Y. Tameshige and K. Yako: Nucl. Instrum. Methods Phys. Res., Sect. A 547 (2005) 569. 12) T. N. Taddeucci, W. P. Alford, M. Barlett, R. C. Byrd, T. A. Carey, D. E. Ciskowski, C. C. Foster, C. Gaarde, C. D. Good- man, C. A. Goulding, E. Gülmez, W. Huang, D. J. Horen, J. Larsen, D. Marchlenski, J. B. McClelland, D. Prout, J. Rapa- port, L. J. Rybarcyk, W. C. Sailor, E. Sugarbaker and C. A. Whitten, Jr.: Phys. Rev. C 41 (1990) 2548. 13) R.A.Arndt, W.J.Briscoe, R.L.Workman and I. I. Strakovsky: computer code said http://gwdac.phys.gwu.edu. 14) C.Glashausser, K. Jones, F.T.Baker, L.Bimbot, H.Esbensen, R. W. Fergerson, A. Green, S. Nanda and R. D. Smith: Phys. Rev. Lett. 58 (1987) 2404. 15) F. T. Baker, L. Bimbot, B. Castel, R. W. Fergerson, C. 
Clashausser, A. Green, O. Hausser, K. Hicks, K. Jones, C. A. Miller, S. K. Nanda, R. D. Smith, M. Vetterli, J. Wambach, R. Abegg, D. Beatty, V. Cupps, C. Djalari, R. Henderson, K. P. Jackson, R. Jeppeson, J. Lisantti, M. Morlet, R. Sawafta, W. Unkelbach, A. Willis and S. Yen: Phys. Lett. B 237 (1990) 16) A. Erell, J. Alster, J. Lichtenstadt, M. A. Moinester, J. D. Bowman, M. D. Cooper, F. Irom, H. S. Matis, E. Piasetzky and U. Sennhause: Phys. Rev. C 34 (1986) 1822. 17) J. M. Moss: Phys. Rev. C 26 (1982) 727. 18) Edward J. Stephenson and Jeffrey A. Tostevin, in Spin and Isospin in Nuclear Interactions, Proceedings of the Interna- tional Conference, Telluride, Colorado, 11–15 March 1991, edited by Scott W.Wissink, Charles D.Goodman, and George E. Walker (Plenum, New York, 1991), p.281; N. M. Hintz, A. Sethi, and A. M. Lallena, ibid., p.287. 19) N. M. Hintz, A. M. Lallena and A. Sethi: Phys. Rev. C 45 (1992) 1098. 20) A. K. Kerman, H. McManus and R. M. Thaler: Ann. Phys. (N.Y.) 8 (1959) 551. 21) Program dwba70, R. Schaeffer and J. Raynal (unpublished); Extended version dw81 by J. R. Comfort (unpublished). 22) S. Cohen and D. Kurath: Nucl. Phys. 73 (1965) 1. 23) B. L. Clausen, R. J. Peterson and R. A. Lindgren: Phys. Rev. C 38 (1988) 589. 24) F. T. Baker, D. Beatty, L. Bimbot, V. Cupps, C. Djalali, R. W. Fergerson, C. Glashausser, G. Graw, A. Green, K. Jones, M. Morlet, S. K. Nanda, A. Sethi, B. H. Storm, W. Unkelbach and A. Willis: Phys. Rev. C 48 (1993) 1106. 25) M. A. Franey and W. G. Love: Phys. Rev. C 31 (1985) 488. 26) H. O. Meyer, P. Schwandt, H. P. Gubler, W. P. Lee, W. T. H. van Oers, R. Abegg, D. A. Hutcheon, C. A. Miller, R. Helmer, K. P. Jackson, C. Broude and W. Bauhoff: Phys. Rev. C 31 (1985) 1569. ABSTRACT A complete set of polarization transfer observables has been measured for the $^{12}{\rm C}(p,n)$ reaction at $T_p=296 {\rm MeV}$ and $\theta_{\rm lab}=0^{\circ}$. 
The total spin transfer $\Sigma(0^{\circ})$ and the observable $f_1$ deduced from the measured polarization transfer observables indicate that the spin--dipole resonance at $E_x \simeq 7 {\rm MeV}$ has greater $2^-$ strength than $1^-$ strength, which is consistent with recent experimental and theoretical studies. The results also indicate a predominance of the spin-flip and unnatural-parity transition strength in the continuum. The exchange tensor interaction at a large momentum transfer of $Q \simeq 3.6 {\rm fm}^{-1}$ is discussed. <|endoftext|><|startoftext|> Introduction and problem statement Related work Assumptions The results Example: nonparametric regression Discussion and future work Appendix References ABSTRACT The problem of statistical learning is to construct a predictor of a random variable $Y$ as a function of a related random variable $X$ on the basis of an i.i.d. training sample from the joint distribution of $(X,Y)$. Allowable predictors are drawn from some specified class, and the goal is to approach asymptotically the performance (expected loss) of the best predictor in the class. We consider the setting in which one has perfect observation of the $X$-part of the sample, while the $Y$-part has to be communicated at some finite bit rate. The encoding of the $Y$-values is allowed to depend on the $X$-values. Under suitable regularity conditions on the admissible predictors, the underlying family of probability distributions and the loss function, we give an information-theoretic characterization of achievable predictor performance in terms of conditional distortion-rate functions. The ideas are illustrated on the example of nonparametric regression in Gaussian noise. 
<|endoftext|><|startoftext|> Hamiltonian formalism in Friedmann cosmology and its quantization Jie Ren1,∗ Xin-He Meng2,3, and Liu Zhao2 Theoretical Physics Division, Chern Institute of Mathematics, Nankai University, Tianjin 300071, China Department of physics, Nankai University, Tianjin 300071, China and BK21 Division of Advanced Research and Education in physics, Hanyang University, Seoul 133-791, Korea (Dated: October 15, 2018) We propose a Hamiltonian formalism for a generalized Friedmann-Roberson-Walker cosmology model in the presence of both a variable equation of state (EOS) parameter w(a) and a variable cosmological constant Λ(a), where a is the scale factor. This Hamiltonian system containing 1 degree of freedom and without constraint, gives Friedmann equations as the equation of motion, which describes a mechanical system with a variable mass object moving in a potential field. After an appropriate transformation of the scale factor, this system can be further simplified to an object with constant mass moving in an effective potential field. In this framework, the Λ cold dark matter model as the current standard model of cosmology corresponds to a harmonic oscillator. We further generalize this formalism to take into account the bulk viscosity and other cases. The Hamiltonian can be quantized straightforwardly, but this is different from the approach of the Wheeler-DeWitt equation in quantum cosmology. PACS numbers: 98.80.Jk,45.20.Jj,03.50.-z I. INTRODUCTION Since the current accelerating expansion of our Uni- verse was discovered [1] around 1998 and 1999, theoret- ical physicists have devoted increasingly more attention to the Friedmann-Roberson-Walker (FRW) model as a standard framework in cosmology study. The Λ cold dark matter (ΛCDM) model as the standard model of cosmol- ogy so far fits well with observational data whereas it has had some serious theoretical problems. 
To make a comparison to the $\Lambda$CDM model, physicists have built many cosmological models that yield effective Friedmann equations with a variable cosmological constant. To quantize the Friedmann equations, the commonly used theory is the Wheeler-DeWitt equation [2], which has been studied and applied widely in quantum cosmology [3]. Starting from the Hilbert-Einstein action with the Robertson-Walker (RW) metric, the corresponding Hamiltonian $H$ can be obtained. The Friedmann equation then plays the role of the constraint $H = 0$, which leads to the Wheeler-DeWitt equation. In the present work, we consider the Friedmann equations as basic equations and find a Hamiltonian system that gives the Friedmann equations as classical equations of motion without constraint.

Many Ansätze for the variable cosmological constant have been studied in the literature, such as Refs. [4, 5, 6, 7]. Moreover, some models motivated by string theory give an effective cosmological term when reduced to the FRW framework. We assume that the equation of state (EOS) parameter $w \equiv p/\rho$ can also be variable, which means that the contents of the Universe, except the cosmological term, are generalized to a nonperfect fluid, with the perfect fluid as a special case. In observational cosmology, the redshift $z$ is regarded as an observable quantity and is related to the scale factor $a$ by $z \equiv a_0/a - 1$. Therefore, we investigate the general case in which both the EOS parameter $w$ and the cosmological constant $\Lambda$ can be functions of the scale factor $a$, and we take the bulk viscosity into account.

*Electronic address: jrenphysics@hotmail.com

As an extension of the problem, we construct a Hamiltonian formalism for a system described by the following equation:
$$\ddot{q} = f_1(q)\dot{q}^2 + \eta\dot{q} + f_2(q),$$
where $f_1(q)$ and $f_2(q)$ are arbitrary functions, and $\eta$ is constant. It can also be regarded as a generalization of the damped harmonic oscillator.
The corresponding Hamiltonian describes an object with variable mass moving in a potential field. After an appropriate canonical transformation, this system can be further simplified to an object with constant mass moving in an effective potential field. Thus, different models in the FRW framework are characterized by their effective potentials. This is a general formalism and it can be applied to many cosmological models; for example, the $\Lambda$CDM model corresponds to a harmonic oscillator. Since the quantization of the Friedmann equations can provide insight into quantum cosmology as a glimpse of quantum gravity, we also make some remarks on the quantum case, which provides a correspondence between cosmology and quantum mechanics.

The paper is organized as follows. In Sec. II we present a generalized FRW model and the corresponding Hamiltonian to describe the Friedmann equations. Then we find a canonical transformation to further simplify the problem, and give some examples and special cases. In Sec. III we show that our framework can also be applied in the dissipative case with bulk viscosity. In Sec. IV we turn our attention to the relation to the observable quantities and review some issues of the Bianchi identity. In Sec. V we make some remarks on quantum cosmology from our approach. In the last section we present the conclusion and discuss some future subjects.

http://arxiv.org/abs/0704.0672v3

II. HAMILTONIAN FORMALISM

A. Hamiltonian description of the Friedmann equations

We consider the RW metric in the flat space geometry ($k = 0$), the case favored by current cosmic observational data:
$$ds^2 = -dt^2 + a(t)^2(dr^2 + r^2 d\Omega^2), \tag{1}$$
where $a(t)$ is the scale factor. The energy-momentum tensor for the cosmic fluid can be written as
$$\tilde{T}_{\mu\nu} = (\rho + p)U_\mu U_\nu + (p - \rho_\Lambda)g_{\mu\nu}, \tag{2}$$
where $\rho_\Lambda = \Lambda/(8\pi G)$ is the energy density of the cosmological constant.
Thus, Einstein's equation $R_{\mu\nu} - \frac{1}{2}g_{\mu\nu}R = 8\pi G\tilde{T}_{\mu\nu}$ contains two independent equations:
$$\frac{\dot{a}^2}{a^2} = \frac{8\pi G}{3}\rho + \frac{\Lambda}{3}, \tag{3a}$$
$$\frac{\ddot{a}}{a} = -\frac{4\pi G}{3}(\rho + 3p) + \frac{\Lambda}{3}. \tag{3b}$$
The EOS of the matter (the cosmic fluid except the cosmological constant) is commonly assumed to be
$$p = (\gamma - 1)\rho. \tag{4}$$
Cosmologists usually call Eq. (3a) the Friedmann equation and Eq. (3b) the acceleration equation, whereas for simplicity we refer to both Eqs. (3a) and (3b) as the Friedmann equations here. For generality, we assume that both $\gamma$ and $\Lambda$ are functions of the scale factor $a$, and we call this the generalized FRW model. Combining the Friedmann equations with the EOS, we obtain
$$\frac{\ddot{a}}{a} = -\frac{3\gamma(a) - 2}{2}\,\frac{\dot{a}^2}{a^2} + \frac{\gamma(a)\Lambda(a)}{2}, \tag{5}$$
which determines the evolution of the scale factor. We regard Eq. (5) as the basic starting point; the present framework therefore applies whenever the dynamical equation for the scale factor can be written in that form. If the Newton constant $G$ is constant and the cosmological constant $\Lambda$ is variable, the energy-momentum tensor for the matter cannot be individually conserved [5, 6], which implies an interaction between the matter and the vacuum energy. In the following, we assume $G$ to be constant until Sec. IV.

Our aim is to find a Hamiltonian description of Eq. (5) as the classical equation of motion. We start from the Lagrangian
$$L(q, \dot{q}) = \frac{1}{2}M(q)\dot{q}^2 - V(q), \tag{6}$$
and the corresponding Hamiltonian is
$$H(q, p) = \frac{p^2}{2M(q)} + V(q), \tag{7}$$
with the canonical Poisson bracket $\{q, p\} = 1$. One can check that the equation of motion for Eq. (6) or (7) is
$$\frac{\ddot{q}}{q} = -\frac{1}{2}\,\frac{\partial \ln M}{\partial \ln q}\,\frac{\dot{q}^2}{q^2} - \frac{1}{Mq}\,\frac{\partial V}{\partial q}. \tag{8}$$
This equation has the same form as Eq. (5). Therefore, by comparing Eq. (5) with Eq. (8), we can take $a$ as the generalized coordinate and solve for the functions $M(a)$ and $V(a)$. Then the Lagrangian $L = \frac{1}{2}M(a)\dot{a}^2 - V(a)$ with
$$M = \exp\left(\int \frac{3\gamma - 2}{a}\,da\right), \qquad V = -\frac{1}{2}\int M\gamma\Lambda a\,da, \tag{9}$$
gives Eq. (5) as the equation of motion. For specified functions $\gamma = \gamma(a)$ and $\Lambda = \Lambda(a)$, the above integrals can be evaluated to give $M(a)$ and $V(a)$ explicitly.
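As a quick consistency check of Eq. (9), the following sketch takes the simplest case of constant $\gamma \neq 0$ and constant $\Lambda$, for which the integrals give $M = a^{3\gamma-2}$ and $V = -\Lambda a^{3\gamma}/6$. These closed forms are worked out here for illustration; the use of SymPy and all variable and function names are assumptions of this sketch rather than part of the formalism. It confirms that the Euler-Lagrange equation of $L = \frac{1}{2}M\dot{a}^2 - V$ reproduces Eq. (5).

```python
import sympy as sp

# Scale factor a, its time derivative adot, and constant model parameters.
# (Names are illustrative; gamma and Lambda are taken constant and positive.)
a, adot = sp.symbols("a adot", positive=True)
gamma, Lam = sp.symbols("gamma Lambda", positive=True)

# Closed forms of Eq. (9) for constant gamma != 0 and constant Lambda.
M = a**(3*gamma - 2)
V = -Lam*a**(3*gamma)/6


def eq9_residuals():
    """Residuals of the two defining relations behind Eq. (9):
    d(ln M)/da = (3*gamma - 2)/a  and  dV/da = -M*gamma*Lambda*a/2."""
    r_mass = sp.simplify(sp.diff(M, a)/M - (3*gamma - 2)/a)
    r_pot = sp.simplify(sp.diff(V, a) + M*gamma*Lam*a/2)
    return r_mass, r_pot


def eq5_residual():
    """Acceleration from the Euler-Lagrange equation of
    L = M*adot**2/2 - V, minus the acceleration given by Eq. (5)."""
    addot_lagrangian = -sp.diff(M, a)/(2*M)*adot**2 - sp.diff(V, a)/M
    addot_eq5 = -(3*gamma - 2)/2*adot**2/a + gamma*Lam*a/2
    return sp.simplify(addot_lagrangian - addot_eq5)


if __name__ == "__main__":
    print(eq9_residuals(), eq5_residual())
```

All three residuals vanishing means that, for this special case, the variable-mass Lagrangian and Eq. (5) describe the same dynamics. For $\gamma = 1$ the forms reduce to $M = a$ and $V = -\Lambda a^3/6$.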
Now we can see that the generalized FRW model essentially corresponds to an object with variable mass $M(a)$ moving in a potential field $V(a)$. In the following, we will show that this picture can be further simplified to an object with constant mass moving in an effective potential field $\tilde{V}(\phi)$, after an appropriate transformation of the scale factor.

B. Canonical transformation

The above problem can be generalized to the Hamiltonian description of the nonlinear equation
$$\ddot{q} = f_1(q)\dot{q}^2 + f_2(q), \tag{10}$$
where $f_1(q)$ and $f_2(q)$ are two specified functions. This equation can be derived from the Lagrangian $L = \frac{1}{2}M(q)\dot{q}^2 - V(q)$ with
$$M = \exp\left(-2\int f_1(q)\,dq\right), \qquad V = -\int M f_2(q)\,dq. \tag{11}$$
We define a new variable $\phi$ as (see Appendix)
$$\phi = \int \exp\left(-\int f_1(q)\,dq\right) dq. \tag{12}$$
This transformation eliminates the $\dot{q}^2$ term and gives the equation for the variable $\phi$ as
$$\ddot{\phi} = \left[f_2(q)\exp\left(-\int f_1(q)\,dq\right)\right]_{q\to\phi}, \tag{13}$$
where $q \to \phi$ denotes using Eq. (12) to change the variable $q$ to $\phi$. Since there is no $\dot{\phi}^2$ term in Eq. (13), this can be regarded as a partial linearization. Therefore, the system of Eq. (10) transformed to Eq. (13) can be described by the Lagrangian
$$L(\phi, \dot{\phi}) = \frac{1}{2}\dot{\phi}^2 - \tilde{V}(\phi), \tag{14}$$
with the potential
$$\tilde{V}(\phi) = -\int \left[f_2(q)\exp\left(-\int f_1(q)\,dq\right)\right]_{q\to\phi} d\phi = -\left[\int f_2(q)\exp\left(-2\int f_1(q)\,dq\right) dq\right]_{q\to\phi}. \tag{15}$$
The simplification of the problem by Eq. (12) is essentially the canonical transformation
$$q \to \phi, \qquad p_q \to p_\phi, \qquad H(q, p_q) \to H(\phi, p_\phi), \tag{16}$$
where $p_q = M(q)\dot{q}$, $p_\phi = \dot{\phi}$, and $H(\phi, p_\phi) = \frac{1}{2}p_\phi^2 + \tilde{V}(\phi)$. Therefore, the classical and quantum properties of different models are characterized by the effective potentials. For Eq. (5) as a special case, the new variable $\phi$ is given by
$$\phi = \int \exp\left(\int \frac{3\gamma - 2}{2a}\,da\right) da. \tag{17}$$

C. Some examples

We give some special cases of the above general framework to show some applications. In the simple case where both $\gamma$ and $\Lambda$ are constant, the integrations in Eq. (17) can be evaluated as
$$\phi = \frac{2}{3\gamma}\,a^{3\gamma/2}, \qquad \gamma \neq 0, \tag{18a}$$
$$\phi = \ln a, \qquad \gamma = 0. \tag{18b}$$
Now we consider $\gamma \neq 0$ as an example. The special case $\gamma = 1$ corresponds to the $\Lambda$CDM model.
The equation for $\phi$ can be obtained as $\ddot{\phi} - \frac{3}{4}\gamma^2\Lambda\phi = 0$, and the corresponding Lagrangian is
$$L = \frac{1}{2}\dot{\phi}^2 + \frac{3}{8}\gamma^2\Lambda\phi^2. \tag{19}$$
We can see that the simplest model in cosmology corresponds to a harmonic oscillator after linearization. In particular, this is an upside-down harmonic oscillator for the asymptotic de Sitter Universe.

We can add the curvature effect to the $\Lambda$CDM model, which is described by the special case $m = 2$ of the following equation:
$$\frac{\ddot{a}}{a} = -\frac{3\gamma - 2}{2}\,\frac{\dot{a}^2}{a^2} + \frac{\gamma\Lambda}{2} + [\ldots]\,a^{-m}. \tag{20}$$
Here the parameters $\gamma$, $\Lambda$, and $m$ are all constants. This equation has the same form as Eq. (10). By defining $\phi$ as in Eq. (12) and using Eq. (15), we obtain the effective potential as
$$\tilde{V}(\phi) = -\frac{3}{8}\gamma^2\Lambda\phi^2 + \frac{[\ldots]}{3\gamma - m}\left(\frac{3\gamma}{2}\phi\right)^{2 - 2m/3\gamma}, \tag{21}$$
for $\gamma \neq 0$ and $m \neq 3\gamma$.

Another example is the Friedmann equations during the inflation era. In the study of inflation, we usually use the conformal time $\tau$ instead of the comoving cosmic time $t$. Here we assume that a constant term $-p_0$ is in the EOS during inflation. Thus the Friedmann equations combined with the EOS $p = -\rho - p_0$ yield
$$\frac{a''}{a} = 2\,\frac{a'^2}{a^2} + \frac{\kappa^2}{2}\,a^2 p_0, \tag{22}$$
where the prime denotes a derivative with respect to $\tau$, and $\kappa^2 = 8\pi G$. By defining $\phi = -1/a$, the equation for $\phi$ is $\phi''\phi + (\kappa^2/2)p_0 = 0$. The effective potential is thus
$$\tilde{V}(\phi) = \frac{\kappa^2}{2}\,p_0 \ln|\phi|. \tag{23}$$
Moreover, if we add the curvature term in this case, it corresponds to a $\phi^2$ potential.

III. BULK VISCOSITY

We assume that the cosmic fluid possesses some dissipation effects. Since the shear tensor $\sigma_{\mu\nu} = 0$ for the RW metric, the shear viscosity does not contribute to the evolution in Friedmann cosmology. The energy-momentum tensor for a nonperfect fluid with bulk viscosity on the right-hand side of Einstein's equation is given by [8, 9]
$$T_{\mu\nu} = \rho U_\mu U_\nu + (p - \zeta_0\theta)h_{\mu\nu}, \tag{24}$$
where $h_{\mu\nu} \equiv g_{\mu\nu} + U_\mu U_\nu$ is the projection operator, $\theta \equiv U^\alpha{}_{;\alpha} = 3\dot{a}/a$ is the scalar expansion, and $\zeta_0$ is the bulk viscosity coefficient. Consequently, Eq. (5) should be modified as
$$\frac{\ddot{a}}{a} = -\frac{3\gamma(a) - 2}{2}\,\frac{\dot{a}^2}{a^2} + 12\pi G\zeta_0\,\frac{\dot{a}}{a} + \frac{\gamma(a)\Lambda(a)}{2}, \tag{25}$$
where both $\gamma$ and $\Lambda$ can be functions of $a$ for generality, and $\zeta_0$ is constant.
We also find a Hamiltonian
$$H(a, p_a, t) = \frac{p_a^2}{2M(a, t)} + V(a, t), \tag{26}$$
with the Poisson bracket $\{a, p_a\} = 1$, that gives Eq. (25) as the classical equation of motion. The functions in this Hamiltonian are given by
$$M = \exp\left(\int \frac{3\gamma - 2}{a}\,da - 12\pi G\zeta_0 t\right), \tag{27a}$$
$$V = -\frac{1}{2}\int M\gamma\Lambda a\,da. \tag{27b}$$
Although a dissipative system cannot in general be described by a conservative Hamiltonian, one can directly check that the classical equation of motion for the Hamiltonian Eq. (26) is Eq. (25). As a special case, the equation for a damped harmonic oscillator can be derived from the Caldirola-Kanai (CK) Hamiltonian [10].

The above problem can be generalized to construct a Hamiltonian system for the equation
$$\ddot{q} = f_1(q)\dot{q}^2 + \eta\dot{q} + f_2(q), \tag{28}$$
where $\eta$ is constant. It can be derived from the Hamiltonian $H(q, p, t) = \frac{1}{2}M(q, t)^{-1}p^2 + V(q, t)$ with
$$M = \exp\left(-2\int f_1(q)\,dq - \eta t\right), \qquad V = -\int M f_2(q)\,dq. \tag{29}$$
Similarly, by using the new variable $\phi$ defined by Eq. (12), the equation for $\phi$ is
$$\ddot{\phi} = \eta\dot{\phi} + \left[f_2(q)\exp\left(-\int f_1(q)\,dq\right)\right]_{q\to\phi}. \tag{30}$$
Now we consider the very special case that both $\gamma$ and $\Lambda$ are constant; then $\phi$ defined by Eq. (18a) satisfies
$$\ddot{\phi} - 12\pi G\zeta_0\dot{\phi} - \frac{3}{4}\gamma^2\Lambda\phi = 0, \tag{31}$$
which describes a damped harmonic oscillator.

The damped harmonic oscillator
$$M\ddot{q} = -\eta\dot{q}(t) - \frac{\partial V(q)}{\partial q}, \tag{32}$$
has been studied in quantum mechanics. The CK Hamiltonian
$$H = \frac{1}{2M}e^{-\eta t/M}p^2 + \frac{1}{2}M\omega^2 e^{\eta t/M}q^2, \tag{33}$$
with the commutation relation $[q, p] = i\hbar$, yields the dissipation equation (32) through the Heisenberg equation [10]. Our work can be regarded as a generalization to the case of variable mass. It is the variable mass that generates the nonlinear term in the equation of motion describing the generalized FRW model.

In our previous work [9], we proposed the EOS
$$p = (\gamma - 1)\rho - \frac{2}{\sqrt{3}\,\kappa T_1}\sqrt{\rho} - \frac{2}{\kappa^2 T_2^2}, \tag{34}$$
where the parameters $\gamma$, $T_1$, and $T_2$ are constants. Combining the Friedmann equations with this more practical EOS, we obtain the dynamical evolution equation for the scale factor as
$$\frac{\ddot{a}}{a} = -\frac{3\gamma - 2}{2}\,\frac{\dot{a}^2}{a^2} + \frac{1}{T_1}\,\frac{\dot{a}}{a} + \frac{1}{T_2^2}. \tag{35}$$
This model has a rich variety of properties; for instance, we have found a scalar field model that is equivalent to the above EOS. For related works on the modified EOS, see Refs. [9, 11, 12, 13]. The present work can also be regarded as a generalization of the EOS to $\gamma = \gamma(a)$ and $T_2 = T_2(a)$, and the corresponding Hamiltonian formalism for this system can be constructed similarly.

IV. RELATIONS TO THE OBSERVABLE QUANTITIES

The observations of the supernovae (SNe) Ia have provided direct evidence for the accelerating expansion of our current Universe [1]. A bridge between cosmological theory and observational data is the $H$-$z$ relation, where $H \equiv \dot{a}/a$ is the Hubble parameter and $z$ is the redshift. For example, the $\Lambda$CDM model can be described mainly by $H^2(z) = H_0^2[\Omega_m(1+z)^3 + 1 - \Omega_m]$, where $\Omega_m$ is the matter density parameter. This model fits the observational data well and provides the cosmological constant as the simplest candidate for dark energy. In a sense, different cosmological models are characterized by their corresponding $H$-$z$ relations.

There is also a systematic way to construct the Hamiltonian starting from the general model
$$H^2 = f(a), \tag{36}$$
where $f(a)$ is a specific function of the scale factor $a$, according to the model. By differentiating Eq. (36), we find that it is a solution of the following equation:
$$\frac{\ddot{a}}{a} = -\frac{3\gamma - 2}{2}\,\frac{\dot{a}^2}{a^2} + \frac{3\gamma f(a)}{2} + \frac{a f'(a)}{2}, \tag{37}$$
which has the same form as Eq. (5) or (10). The corresponding coefficients are given by
$$f_1(a) = -\frac{3\gamma - 2}{2a}, \qquad f_2(a) = \frac{3\gamma a f(a)}{2} + \frac{a^2 f'(a)}{2}. \tag{38}$$
Then by applying Eq. (11) we can obtain the corresponding Hamiltonian. Therefore, even if the EOS for a cosmological model is not explicitly linear in $\rho$, the Hamiltonian formalism of the present work can still be applied, provided the effective Friedmann equation $H^2 = f(a)$ can be written down for that model. Many approaches such as modified gravity [14] can be reduced to effective Friedmann equations in the form $H^2 = f(a)$.
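The construction of Eqs. (36)-(38) can be checked symbolically. The sketch below is illustrative only: it assumes SymPy, a constant $\gamma$, and function names chosen here. It builds $f_1 = -(3\gamma-2)/(2a)$ and $f_2 = 3\gamma a f/2 + a^2 f'/2$ from a given $f(a)$ and confirms that any solution of $\dot{a}^2 = a^2 f(a)$ satisfies $\ddot{a} = f_1\dot{a}^2 + f_2$, using the identity $\ddot{a} = d(\dot{a}^2/2)/da$. The $\Lambda$CDM form of $f(a)$ is the $H$-$z$ relation quoted in this section rewritten with $1 + z = 1/a$ (that is, $a_0 = 1$).

```python
import sympy as sp

# Scale factor and constant parameters (names chosen for this sketch).
a = sp.symbols("a", positive=True)
gamma, H0, Om = sp.symbols("gamma H_0 Omega_m", positive=True)


def coefficients(f, g):
    """Coefficients f1, f2 built from H^2 = f(a) for a constant EOS parameter g."""
    f1 = -(3*g - 2)/(2*a)
    f2 = 3*g*a*f/2 + a**2*sp.diff(f, a)/2
    return f1, f2


def residual(f, g):
    """addot - (f1*adot^2 + f2) evaluated on solutions of adot^2 = a^2 f(a)."""
    adot2 = a**2*f                   # H^2 = f(a)  =>  adot^2 = a^2 f(a)
    addot = sp.diff(adot2, a)/2      # addot = d(adot^2/2)/da
    f1, f2 = coefficients(f, g)
    return sp.simplify(addot - (f1*adot2 + f2))


# LambdaCDM example: H^2 = H0^2 [Omega_m a^-3 + 1 - Omega_m].
f_lcdm = H0**2*(Om*a**-3 + 1 - Om)

if __name__ == "__main__":
    print(residual(f_lcdm, gamma))
    print(residual(sp.Function("f")(a), gamma))
```

The residual vanishes for an arbitrary $f(a)$ and for any constant $\gamma$, which reflects that $\gamma$ acts here only as a bookkeeping parameter: the $\gamma$-dependent pieces of $f_1\dot{a}^2$ and $f_2$ cancel once $H^2 = f(a)$ is imposed.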
Since the $\Lambda$CDM model fits the SNe Ia data well, reasonable cosmological models should reduce to Friedmann cosmology in an effective way and give the right $H$-$z$ relation, in order to make a comparison with the $\Lambda$CDM model. In our case, the Friedmann equations in terms of the Hubble parameter can be written as
$$\dot{H} = -\frac{3\gamma}{2}H^2 + \tilde{\Lambda}(a), \tag{39}$$
where, by comparison with Eq. (5), $\tilde{\Lambda}(a) = \gamma\Lambda(a)/2$. Here $\gamma$ is assumed to be constant for simplicity. This equation is linear in $H^2$ and the effective term $\tilde{\Lambda}(a)$ is an inhomogeneous term. The solution in terms of $H(z)$ with the initial condition $H(0) = H_0$ is
$$H(z)^2 = H_0^2(1+z)^{3\gamma}\left[1 - \frac{2}{H_0^2}\int_0^z \tilde{\Lambda}(z')(1+z')^{-3\gamma-1}\,dz'\right]. \tag{40}$$
In the power-law $\Lambda$CDM model, the contributions of different components are separated in $H^2$, such as a constant for the cosmological constant and a $(1+z)^2$ factor for the curvature term. In the general case, however, the contribution of the matter cannot be separated out of the above solution. This problem is related to the conservation law of the matter, which has been investigated in Refs. [5, 6].

The Bianchi identity for the energy-momentum tensor Eq. (2) gives
$$\dot{\rho}_\Lambda + \dot{\rho} + 3H(\rho + p) = 0, \tag{41}$$
which implies that energy transfer will exist between the matter and the vacuum energy. An intuitive idea has been proposed that if both $G$ and $\Lambda$ are variable, the ordinary energy-momentum tensor can be individually conserved, i.e., $\dot{\rho} + 3H(\rho + p) = 0$ [6]. This is achieved by combining the Bianchi identity for the variable $G$ and $\Lambda$ model,
$$\frac{d}{dt}\left[G(\rho_\Lambda + \rho)\right] + 3GH(\rho + p) = 0, \tag{42}$$
with the following constraint:
$$(\rho + \rho_\Lambda)\dot{G} + G\dot{\rho}_\Lambda = 0. \tag{43}$$
The authors of Ref. [6] assume that both the Newton constant $G$ and the cosmological constant $\Lambda$ are functions of a scale parameter $\mu$ and apply the renormalization group approach to cosmology. If $G(\mu)$ evolves by a logarithmic law and $\rho_\Lambda(\mu)$ evolves quadratically with $\mu$, then this picture can explain the evolution of the Universe, and at the same time the variable $G$ can explain the flat rotation curves of the galaxies without introducing the dark matter hypothesis.

V.
REMARKS ON QUANTUM COSMOLOGY

We have obtained a classical Hamiltonian formalism of the Friedmann equations. Generally, once a Hamiltonian is obtained, the system can be quantized straightforwardly by replacing the Poisson bracket with the commutation relation $[q, p] = i$. However, we need to take into account the ambiguity in the ordering of the noncommuting operators $q$ and $p$. For simplicity, we ignore the ordering ambiguity here. In terms of the new variable $\phi$, the corresponding Schrödinger equation can be written as
$$H(\phi, \hat{p}_\phi)\Psi(\phi) = E\Psi(\phi), \tag{44}$$
where $\hat{p}_\phi = -i\partial_\phi$. To make a comparison between our approach and the Wheeler-DeWitt equation, we take only the $\Lambda$CDM model, a very special case, as an illustrative example. The corresponding Hamiltonian for Eq. (19) in the case $\gamma = 1$ is
$$H = \frac{p^2}{2a} - \frac{1}{6}\Lambda a^3, \tag{45}$$
where $p = a\dot{a}$. In the approach of the Wheeler-DeWitt equation, $H = 0$ is a constraint [2, 15], thus the quantization gives $(\partial_a^2 + a^4)\Psi(a) = 0$. This is an anharmonic oscillator with zero energy eigenvalue. In our case, the Hamiltonian is nonzero and proportional to the matter energy density, as we show in the following. The solution of Eq. (39) with $\tilde{\Lambda}(a) = \Lambda/2$ is
$$\frac{\dot{a}^2}{a^2} = \left(H_0^2 - \frac{\Lambda}{3}\right)a^{-3} + \frac{\Lambda}{3} = H_0^2\left[\Omega_m a^{-3} + 1 - \Omega_m\right], \tag{46}$$
where $\Omega_m \equiv 1 - \Lambda/(3H_0^2)$. Therefore, the Hamiltonian can be calculated as
$$H = \frac{a^3}{2}\left(\frac{\dot{a}^2}{a^2} - \frac{\Lambda}{3}\right) = \frac{1}{2}H_0^2\Omega_m. \tag{47}$$
After the canonical transformation of Eq. (16), the Schrödinger equation in terms of $\phi$ becomes
$$\left(-\frac{1}{2}\frac{\partial^2}{\partial\phi^2} - \frac{3}{8}\Lambda\phi^2\right)\Psi(\phi) = E\Psi(\phi). \tag{48}$$
Thus, for the asymptotic de Sitter Universe, the $\Lambda$CDM model corresponds to an upside-down harmonic oscillator in our formalism. Such an oscillator also appears in the matrix description of de Sitter gravity [16].

We can transform the de Sitter Universe to the dual anti-de Sitter Universe by employing the scale factor duality [17], for which it has been found that $a \to a^{-1}$ gives $H \to -H$, among other consequences. The duality for Eq. (5) is given by
$$a \to a^{-1}, \qquad \gamma \to -\gamma, \qquad \Lambda \to -\Lambda, \qquad \phi \to -\phi. \tag{49}$$
It can be checked easily that Eq. (5) is invariant under these transformations.
If we use the dual scale factor $a^{-1}$, the corresponding potential becomes $\tilde{V}(\phi) = +\frac{3}{8}\gamma^2\Lambda\phi^2$.

In fact, quantization in de Sitter spacetime was at one time one of the major difficulties of string theory (though this picture has changed somewhat since the Kachru-Kallosh-Linde-Trivedi theory appeared). Quantizing de Sitter cosmology appears to be no different, since the time variable used is the same, and it is known that there is no global timelike coordinate in de Sitter spacetime. Some quantum effects of a scalar field in a de Sitter background can be found in Ref. [18].

VI. CONCLUSION AND DISCUSSION

We have proposed a systematic scheme to describe the Friedmann equations through a Hamiltonian formalism. The generalized FRW model with both a variable EOS parameter and a variable cosmological constant admits a Hamiltonian description without constraint. After an appropriate canonical transformation, the system can be significantly simplified to an object moving in an effective potential field. The bulk viscosity can also be taken into account by a time-dependent Hamiltonian. Some examples are given explicitly, such as the $\Lambda$CDM model, the curvature term effect, and the inflation period. The quantization of the system provides a new approach to quantum cosmology, which is an intriguing topic in theoretical physics research.

We now discuss some possible future developments of our work. As we have claimed, the formalism in this work can be applied to a large variety of cosmological models. By solving the Schrödinger equation $H(\phi, \hat{p}_\phi)\Psi = E\Psi$, the cosmological wave function can be obtained for a specific model. Here we consider the curvature effect, for example, which is described by the potential Eq. (21) with parameters $\Lambda = 0$, $\gamma = 1$, and $m = 2$. The corresponding Schrödinger equation can be solved in terms of the biconfluent Heun equation (BHE) [19].
We can also start from the effective Lagrangian and study the observational effects when we modify the potential. We believe that our formalism gives a new perspective on the study of quantum cosmology.

ACKNOWLEDGMENTS

J.R. thanks Prof. M.L. Ge for helpful discussions on Hamiltonian systems. X.H.M. is supported by NSFC under No. 10675062 and the BK21 Foundation. L.Z. is supported by NSFC under No. 90403014.

APPENDIX A: MATHEMATICAL NOTES

A more general correspondence between a Hamiltonian and its equation of motion is given in Ref. [19]. The equation of motion of the Hamiltonian
$$H(q, p, t) = P_0(q, t)p^2 + P_1(q, t)p + P_2(q, t) \tag{A1}$$
is given by
$$\ddot{q} = \frac{1}{2}\frac{\partial \ln P_0}{\partial q}\left(\dot{q}^2 - P_1^2\right) + \frac{\partial \ln P_0}{\partial t}\left(\dot{q} - P_1\right) + P_1\frac{\partial P_1}{\partial q} + \frac{\partial P_1}{\partial t} - 2P_0\frac{\partial P_2}{\partial q}. \tag{A2}$$
From the mathematical point of view, Eq. (28) can be further generalized to the following equation:
$$\ddot{q} = F_1(q, t)\dot{q}^2 + F_2(q, t)\dot{q} + F_3(q, t); \tag{A3}$$
however, here the coefficients $F_1$ and $F_2$ are not completely independent. Comparing with Eq. (A2), we can see that the condition $2\partial_t F_1(q, t) = \partial_q F_2(q, t)$ must be satisfied for consistency. In the present work, $f_1(q)$ and $\eta$ satisfy this condition.

We now explain why we choose the transformation as in Eq. (12). Starting from the equation
$$\ddot{q} = f_1(q)\dot{q}^2 + \eta\dot{q} + f_2(q), \tag{A4}$$
we expect that after an appropriate change of variable $\phi(q)$, the above equation can be transformed into
$$\ddot{\phi} = \eta\dot{\phi} + g(\phi). \tag{A5}$$
By differentiating $\phi(q)$, we obtain $\dot{\phi} = \phi'\dot{q}$ and $\ddot{\phi} = \phi''\dot{q}^2 + \phi'\ddot{q}$, where the prime denotes a derivative with respect to $q$. Substituting $\phi$, $\dot{\phi}$, and $\ddot{\phi}$ into Eq. (A5), we obtain
$$\ddot{q} = -\frac{\phi''}{\phi'}\dot{q}^2 + \eta\dot{q} + \frac{g(\phi)}{\phi'}. \tag{A6}$$
Comparing with Eq. (A4), we see that defining $-\phi''/\phi' = f_1(q)$, which is solved by the form of Eq. (12), eliminates the $\dot{q}^2$ term.

[1] A.G. Riess et al., Astron. J. 116, 1009 (1998); N. Bahcall, J.P. Ostriker, S. Perlmutter, and P.J. Steinhardt, Science 284, 1481 (1999); D.N. Spergel et al., astro-ph/0603449; A.G. Riess et al., astro-ph/0611572.
[2] B.S. DeWitt, Phys. Rev. 160, 1113 (1967).
[3] E.M. Barboza Jr. and N.A.
Lemos, Gen. Rel. Grav. 38, 1609 (2006); G.A. Monerat, E.V. Corrêa Silva, G. Oliveira-Neto, L.G. Ferreira Filho, and N.A. Lemos, Phys. Rev. D 73, 044022 (2006); Braz. J. Phys. 35, 1106 (2005); M.P. Dąbrowski, C. Kiefer, and B. Sandhöfer, Phys. Rev. D 74, 044022 (2006); N. Pinto-Neto, E. Sergio Santini, and F.T. Falciano, Phys. Lett. A 344, 131 (2005); V. Husain and O. Winkler, Phys. Rev. D 69, 084016 (2004); C. Wang, Class. Quant. Grav. 20, 3151 (2003); A.M. Khvedelidze and Yu.G. Palii, Class. Quant. Grav. 18, 1767 (2001); S.A. Gogilidze, A.M. Khvedelidze, V.V. Papoyan, Yu.G. Palii, and V.N. Pervushin, Grav. Cosmol. 3, 17 (1997); A.M. Khvedelidze, V.V. Papoyan, Yu.G. Palii, and V.N. Pervushin, Phys. Lett. B 402, 263 (1997); H.C. Rosu and J. Socorro, Phys. Lett. A 223, 28 (1996); N.A. Lemos, J. Math. Phys. 37, 1449 (1996); L.A. Glinka, gr-qc/0612079; V.V. Kuzmichev, gr-qc/0002029.
[4] J.M. Overduin and F.I. Cooperstock, Phys. Rev. D 58, 043506 (1998); R.G. Vishwakarma, Class. Quant. Grav. 18, 1159 (2001).
[5] J. Solà and H. Štefančić, Mod. Phys. Lett. A 21, 479 (2006); Phys. Lett. B 624, 147 (2005); B. Guberina, R. Horvat, and H. Nikolić, Phys. Lett. B 636, 80 (2006).
[6] I.L. Shapiro, J. Solà, and H. Štefančić, JCAP 0501, 012 (2005).
[7] P. Wang and X.H. Meng, Class. Quant. Grav. 22, 283 (2005).
[8] I. Brevik, Phys. Rev. D 65, 127302 (2002); I. Brevik and O. Gorbunova, Gen. Rel. Grav. 37, 2039 (2005).
[9] J. Ren and X.H. Meng, Phys. Lett. B 633, 1 (2006); 636, 5 (2006); astro-ph/0605010, to appear in IJMPD; X.H. Meng, J. Ren, and M.G. Hu, Commun. Theor. Phys. 47, 379 (2007).
[10] P. Caldirola, Nuovo Cimento 18, 393 (1941); E. Kanai, Prog. Theor. Phys. 3, 440 (1948); L.H. Yu and C.P. Sun, Phys. Rev. A 49, 592 (1994); C.P. Sun and L.H. Yu, Phys. Rev. A 51, 1845 (1995).
[11] R. Holman and S. Naidu, astro-ph/0408102; E.
Babichev, V. Dokuchaev, and Y. Eroshenko, Class. Quant. Grav. 22, 143 (2005).
[12] S. Capozziello, S. Nojiri, and S.D. Odintsov, Phys. Lett. B 634, 93 (2006); Phys. Lett. B 632, 597 (2006); S. Capozziello, V.F. Cardone, E. Elizalde, S. Nojiri, and S.D. Odintsov, Phys. Rev. D 73, 043512 (2006); S. Nojiri and S.D. Odintsov, Gen. Rel. Grav. 38, 1285 (2006); Phys. Rev. D 72, 023003 (2005); I. Brevik, O.G. Gorbunova, and A.V. Timoshkin, gr-qc/0702089.
[13] I.L. Shapiro and J. Solà, JHEP 0202, 006 (2002); astro-ph/0401015.
[14] X.H. Meng and P. Wang, Class. Quant. Grav. 20, 4949 (2003); 21, 951 (2004); 22, 23 (2005); ibid, Phys. Lett. B 584, 1 (2004) for example.
[15] A. Vilenkin, Phys. Rev. D 50, 2581 (1994).
[16] Y.H. Gao, hep-th/0107067.
[17] G. Veneziano, Phys. Lett. B 265, 287 (1991); M.C. Bento and O. Bertolami, Class. Quant. Grav. 12, 1919 (1995).
[18] V.K. Onemli and R.P. Woodard, Phys. Rev. D 70, 107301 (2004); E.O. Kahya and V.K. Onemli, gr-qc/0612026.
[19] S.Yu. Slavyanov and W. Lay, Special Functions: A Unified Theory Based on Singularities (Oxford University Press, New York, 2000).

ABSTRACT We propose a Hamiltonian formalism for a generalized Friedmann-Robertson-Walker cosmology model in the presence of both a variable equation of state (EOS) parameter $w(a)$ and a variable cosmological constant $\Lambda(a)$, where $a$ is the scale factor. This Hamiltonian system, which contains one degree of freedom and no constraint, gives the Friedmann equations as its equations of motion and describes a mechanical system with a variable-mass object moving in a potential field. After an appropriate transformation of the scale factor, this system can be further simplified to an object with constant mass moving in an effective potential field.
In this framework, the $\Lambda$ cold dark matter model as the current standard model of cosmology corresponds to a harmonic oscillator. We further generalize this formalism to take into account the bulk viscosity and other cases. The Hamiltonian can be quantized straightforwardly, but this is different from the approach of the Wheeler-DeWitt equation in quantum cosmology. <|endoftext|><|startoftext|> Introduction Information Theoretic Definitions SSR Model Describing SSR Using a Single PDF, fQ() fQ() as the PDF of the Average Transfer Function Mutual Information in Terms of fQ() Entropy of the random variable, Q Examples of the PDF fQ() Large N SSR: Literature Review and Outline of This Paper A General Expression for the SSR Channel Capacity for Large N A Sufficient Condition for Optimality Optimizing the Signal Distribution Example: Uniform Noise Gaussian Noise Optimizing the Noise Distribution Example: Uniform Signal Consequences of Optimizing the Large N Channel Capacity Optimal Fisher Information The Optimal PDF fQ() Output Entropy at Channel Capacity The Optimal Output PMF is Beta-Binomial Analytical Expression for the Mutual Information A Note on the Output Entropy Channel Capacity for Large N and `Matched' Signal and Noise Improvements to Previous Large N Approximations SSR for Large N and =1 Uniform Signal and Noise Gaussian Signal and Noise Acknowledgments Derivations Mutual Information for Large N and Arbitrary Conditional Output Entropy Output Distribution and entropy Mutual Information Proof that fS(x) is a PDF H(y|X) for large N and =1 References ABSTRACT Suprathreshold stochastic resonance (SSR) is a form of noise enhanced signal transmission that occurs in a parallel array of independently noisy identical threshold nonlinearities, including model neurons. Unlike most forms of stochastic resonance, the output response to suprathreshold random input signals of arbitrary magnitude is improved by the presence of even small amounts of noise. 
In this paper the information transmission performance of SSR in the limit of a large array size is considered. Using a relationship between Shannon's mutual information and Fisher information, a sufficient condition for optimality, i.e. channel capacity, is derived. It is shown that capacity is achieved when the signal distribution is Jeffrey's prior, as formed from the noise distribution, or when the noise distribution depends on the signal distribution via a cosine relationship. These results provide theoretical verification and justification for previous work in both computational neuroscience and electronics. <|endoftext|><|startoftext|> Draft version October 31, 2018 Preprint typeset using LATEX style emulateapj v. 11/26/04 THREE DIFFERENT TYPES OF GALAXY ALIGNMENT WITHIN DARK MATTER HALOS A. Faltenbacher , Cheng Li , Shude Mao , Frank C. van den Bosch , Xiaohu Yang , Y.P. Jing , Anna Pasquali and H.J. Mo Draft version October 31, 2018 ABSTRACT Using a large galaxy group catalogue based on the Sloan Digital Sky Survey Data Release 4 we measure three different types of intrinsic galaxy alignment within groups: halo alignment between the orientation of the brightest group galaxies (BGG) and the distribution of its satellite galaxies, radial alignment between the orientation of a satellite galaxy and the direction towards its BGG, and direct alignment between the orientation of the BGG and that of its satellites. In agreement with previous studies we find that satellite galaxies are preferentially located along the major axis. In addition, on scales r < 0.7Rvir we find that red satellites are preferentially aligned radially with the direction to the BGG. The orientations of blue satellites, however, are perfectly consistent with being isotropic. Finally, on scales r < 0.1Rvir, we find a weak but significant indication for direct alignment between satellites and BGGs. We briefly discuss the implications for weak lensing measurements. 
Subject headings: galaxies: clusters: general — galaxies: kinematics and dynamics — surveys
1. INTRODUCTION
A precise assessment of galaxy alignments is important for two main reasons: it contains information regarding the impact of environment on the formation and evolution of galaxies, and it can be an important source of contamination for weak lensing measurements. In theory, the large-scale tidal field is expected to induce large-scale correlations between galaxy spins and galaxy shapes (e.g., Pen et al. 2000; Croft & Metzler 2000; Heavens et al. 2000; Catelan et al. 2001; Crittenden et al. 2001; Porciani et al. 2002b; Jing 2002). In addition, the preferred accretion of new material along filaments tends to cause alignment with the large-scale filamentary structure in which dark matter halos and galaxies are embedded (e.g., Jing 2002; Faltenbacher et al. 2005; Bailin & Steinmetz 2005). On small scales, however, inside virialized dark matter haloes, any primordial alignment is likely to have been significantly weakened due to non-linear effects such as violent relaxation and (impulsive) encounters (e.g., Porciani et al. 2002a). On the other hand, tidal forces from the host halo may also induce new alignments, similar to the tidal locking mechanism that affects the Earth-Moon system (e.g., Ciotti & Dutta 1994; Usami & Fujimoto 1997; Fleck & Kuhn 2003). Observationally, the search for galaxy alignments has a rich and often confusing history.
To some extent this owes to the fact that numerous different forms of alignment have been discussed in the literature: the alignment between neighbouring clusters (Binggeli 1982; West 1989; Plionis 1994), between brightest cluster galaxies (BCGs) and their parent clusters (Carter & Metcalfe 1980; Binggeli 1982; Struble 1990), between the orientation of satellite galaxies and the orientation of the cluster (Dekel 1985; Plionis et al. 2003), and between the orientation of satellite galaxies and the orientation of the BCG (Struble 1990). Obviously, several of these alignments are correlated with each other, but independent measurements are difficult to compare since they are often based on very different data sets.
1 Shanghai Astronomical Observatory, Nandan Road 80, Shanghai 200030, China
2 University of Manchester, Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, UK
3 Max-Planck-Institute for Astronomy, Königstuhl 17, D-69117 Heidelberg, Germany
4 Department of Astronomy, University of Massachusetts, Amherst MA 01003-9305
With large galaxy redshift surveys, such as the two-degree Field Galaxy Redshift Survey (2dFGRS; Colless et al. 2001) and the Sloan Digital Sky Survey (SDSS; York et al. 2000), it has become possible to investigate alignments using large and homogeneous samples. This has resulted in robust detections of various alignments: Brainerd (2005), Yang et al. (2006) and Azzaro et al. (2007) all found that satellite galaxies are preferentially distributed along the major axes of their host galaxies, Trujillo et al. (2006) found that spiral galaxies located on the shells of large voids have rotation axes that lie preferentially on the void surface, and Pereira & Kuhn (2005) and Agustsson & Brainerd (2006b) noticed that satellite galaxies tend to be preferentially oriented towards the galaxy at the center of the halo.
In this Letter we use a large galaxy group catalogue constructed from the SDSS to study galaxy alignments on small scales within dark matter haloes that span a wide range in masses. The unique aspect of this study is that we investigate three different types of alignment using exactly the same data set consisting of over 60000 galaxies. In addition, by using a carefully selected galaxy group catalogue, we can discriminate between central galaxies and satellites, and study their mutual alignment. The latter is particularly important for galaxy-galaxy lensing, where it can be a significant source of contamination. Finally, exploiting the large number of galaxies in our sample, we also investigate how the alignment signal depends on the colors of the galaxies. Throughout we adopt Ωm = 0.3 and ΩΛ = 0.7 and a Hubble parameter h = H0/(100 km s^−1 Mpc^−1).
2. DATA & METHODOLOGY
Fig. 1.— Illustration of the three angles θ, φ and ξ, which are used to test for halo alignment, radial alignment and direct alignment, respectively. The three angles are not independent: if ordered by size α ≥ β ≥ γ then α = min[β + γ, 180◦ − β − γ].
We apply our analysis to the SDSS galaxy group catalogue of Yang et al. (2007, in prep.). This catalogue is constructed using the halo-based group finder of Yang et al. (2005) and applied to the New York University Value Added Galaxy Catalog (NYU-VAGC)^5 that is based on the SDSS Data Release Four (DR4; Adelman-McCarthy et al. 2006). This group finder uses the general properties of CDM halos (i.e. virial radius, velocity dispersion, etc.) to determine the memberships of groups (cf. Weinmann et al. 2006). In this study we only use those groups with redshifts in the range 0.01 ≤ z ≤ 0.2 and with halo masses between 5 × 10^12 h^−1 M⊙ and 5 × 10^14 h^−1 M⊙. In addition, we only focus on group members with $^{0.1}M_r - 5\log h \le -19$.
Throughout this paper all magnitudes are k+e corrected to z = 0.1 following Blanton et al. (2003). Using the method of Li et al. (2006) we split our galaxies into three color bins. In short, we divide the full NYU-VAGC sample into 282 subsamples according to the r-band luminosity, and fit the $^{0.1}(g-r)$ color distribution for each subsample with a double Gaussian. Galaxies in between the centers of the two Gaussians are classified as ‘green’, while those with higher and lower values of the $^{0.1}(g-r)$ color are classified as ‘red’ and ‘blue’, respectively. The final sample, on which our analysis is based, consists of 18576 groups with a total of 60724 galaxies, of which 29780 are red, 20604 are green, and 10340 are blue.
In what follows, we use these groups to examine (i) halo alignment between the orientation of the brightest group galaxies (BGG) and the distribution of its satellite galaxies, (ii) radial alignment between the orientation of a satellite galaxy and the direction towards its BGG, and (iii) direct alignment between the orientation of the BGG and that of its satellites. In particular, we define the angles θ, φ and ξ as illustrated in Fig. 1, and investigate whether their distributions are consistent with isotropy, or whether they indicate a preferred alignment. Following Brainerd (2005) and Yang et al. (2006), the orientation of each galaxy is defined by the major axis position angle (PA) of its 25 mag arcsec^−2 isophote in the r-band.
For each satellite galaxy we compute its projected distance, r, to the BGG, normalized by the virial radius, Rvir, of its group (as derived from the group mass). For each of 5 radial bins, equally spaced in r/Rvir, we then compute 〈θ〉, 〈φ〉 and 〈ξ〉, where 〈.〉 indicates the average over all BGG-satellite pairs in a given radial bin. Next we construct 100 random samples in which the positions of the galaxies are kept fixed, but their PAs are randomized. For each of these random samples we compute 〈θ〉, 〈φ〉 and 〈ξ〉 as a function of r/Rvir, which we use to compute the significance of any detected alignment signal.
5 http://wassup.physics.nyu.edu/vagc/
Fig. 2.— Mean angle, θ, between the PA of the BGG and the line connecting the BGG with a satellite galaxy, as a function of r/Rvir. Different line styles indicate (sub)samples determined according to the satellites’ color. The shaded areas mark the parameter space between the 16th and 84th percentiles of the distributions obtained from the 100 random samples. A signal outside this shaded region means that it is inconsistent with no alignment (i.e., with isotropy) at more than 68 percent confidence.
3. RESULTS
3.1. Halo alignment
Fig. 2 shows the results thus obtained for the angle θ between the orientation of the BGG and the line connecting the BGG with the satellite galaxy. Clearly, for all four samples shown (all, red, green and blue, where the color refers to that of the satellite galaxy, not that of the BGG) we obtain 〈θ〉 < 45◦ in all 5 radial bins and at high significance^6. This indicates that satellite galaxies are preferentially distributed along the major axis of the BGG, in good agreement with the findings of Brainerd (2005), Yang et al. (2006) and Azzaro et al. (2007), but opposite to the old Holmberg (1969) effect. Note that there is a clear indication that the distribution of red satellites is more strongly aligned with the orientation of the BGG than that of blue satellites, again in good agreement with previous studies (cf. Yang et al. 2006; Azzaro et al. 2007).
3.2. Radial alignment
Hawley & Peebles (1975) were the first to report a possible detection of radial alignment in the Coma cluster, which has subsequently been confirmed by Thompson (1976) and Djorgovski (1983).
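The randomization test described in Section 2 (fixed positions, shuffled PAs, 16th to 84th percentile band) can be sketched as follows. This is a minimal illustration for a single radial bin, not the authors' pipeline: the function names, the use of uniformly drawn PAs in [0°, 180°), and the folding of angle differences into [0°, 90°] are assumptions made for the example.

```python
import numpy as np

def fold_to_quadrant(angle_deg):
    """Fold an angle difference (degrees) into [0, 90], since PAs are axial."""
    a = np.abs(np.asarray(angle_deg, dtype=float)) % 180.0
    return np.where(a > 90.0, 180.0 - a, a)

def mean_alignment_angle(pa_deg, direction_deg):
    """Mean folded angle between position angles and reference directions."""
    diff = np.asarray(pa_deg, dtype=float) - np.asarray(direction_deg, dtype=float)
    return float(fold_to_quadrant(diff).mean())

def isotropic_band(direction_deg, n_random=100, seed=0):
    """16th/84th percentiles of the mean angle when PAs are randomized.

    Positions (and hence directions) stay fixed; only the PAs are redrawn
    (assumed uniform in [0, 180) degrees for this sketch).
    """
    rng = np.random.default_rng(seed)
    direction_deg = np.asarray(direction_deg, dtype=float)
    means = [
        mean_alignment_angle(rng.uniform(0.0, 180.0, direction_deg.size), direction_deg)
        for _ in range(n_random)
    ]
    lo, hi = np.percentile(means, [16, 84])
    return float(lo), float(hi)
```

A measured mean angle that falls below the lower edge of this band would then indicate alignment at more than 68 percent confidence, in the sense used for the shaded regions of Fig. 2.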
However, in a more systematic study based on the 2dFGRS, Bernstein & Norberg (2002) were unable to detect any significant radial alignment of satellite galaxies around isolated host galaxies. On the other hand, using a very similar selection of hosts and satellites, but applied to the SDSS, Agustsson & Brainerd (2006b) found significant evidence for radial alignment on scales ≲ 70 h^−1 kpc. In addition, Pereira & Kuhn (2005) found a statistically robust tendency toward radial alignment in a large sample of 85 X-ray selected clusters.
6 More than 99 percent, except for the 0.3Rvir bin for the blue and the 0.9Rvir bin for the green satellites.
Fig. 3.— Same as Fig. 2, but for the angle φ (see Fig. 1).
Fig. 3 shows the results obtained from our group catalogue. It shows, as a function of r/Rvir, the mean angle φ between the PA of the satellite and the line connecting the satellite with its BGG. As in Fig. 2 results are shown for all four different samples, together with the 16th and 84th percentiles obtained from the random samples. There is a clear and very significant indication that the major axes of red satellites point towards the BGG (i.e., 〈φ〉 < 45◦), at least for projected radii r ≲ 0.7Rvir. The signal for the green satellites is significantly weaker, but still reveals a preference for radial alignment on small scales: in fact, for the 3 radial bins with r ≤ 0.5Rvir the null hypothesis of no radial alignment can be rejected at more than the 95 percent confidence level. In contrast, for the blue galaxies the data are perfectly consistent with no radial alignment. Since the 2dFGRS is more biased towards blue galaxies than the SDSS, this may at least partially explain why Bernstein & Norberg (2002) were unable to detect significant radial alignment.
3.3. Direct alignment
The search for direct alignment has mainly been restricted to galaxy clusters (e.g., Plionis et al. 2003; Strazzullo et al. 2005; Torlina et al.
2007), mostly resulting in no or very weak indications for alignment between the orientations of BCG and satellite galaxies. Agustsson & Brainerd (2006b) extended the search for direct alignment to a sample of 4289 host-satellite pairs selected from the SDSS DR4, finding a weak but significant signal on scales ≲ 35 h^−1 kpc. On larger scales, however, no significant alignment was found, in agreement with Mandelbaum et al. (2006).
Fig. 4.— Same as Fig. 2, but for the angle ξ (see Fig. 1).
Fig. 4 displays our results for the direct alignment, based on the angle ξ between the orientations of a satellite galaxy and that of its BGG. With the exception of the central bin (r/Rvir = 0.1) the null hypothesis of a random distribution cannot be rejected at more than the 1σ confidence level. Our study, based on over 40000 BGG-satellite pairs, therefore agrees with Agustsson & Brainerd (2006b) that there is a weak indication for direct alignment, but only on relatively small scales: for the average group mass in our sample, M = 3.6 × 10^13 h^−1 M⊙, a radius of r = 0.1Rvir corresponds to 70 h^−1 kpc. However, at least for the red satellites there is a systematic trend towards angles < 45◦ which may be caused by the group tidal field (cf. Lee et al. 2005).
3.4. Dependence on selection criteria
The sample used above is based on galaxies with $^{0.1}M_r - 5\log h \le -19$. Typically, including fainter galaxies improves the number statistics but not necessarily the signal-to-noise since the PAs of fainter galaxies carry larger errors. To test the sensitivity of our results, we repeated the above analysis using magnitude limits of −17, −18, and −20. This resulted in alignment signals that were only marginally different. We have also tested the sensitivity of our results to the range of group masses considered. Changing the lower limit to 10^12 h^−1 M⊙ or 10^13 h^−1 M⊙, or imposing no upper mass limit, all yield very similar alignment signals.
These tests assure that our selection criteria lead to representative results.
4. DISCUSSION
The origin of the halo alignment described in § 3.1 has been studied by Agustsson & Brainerd (2006a) and Kang et al. (2007) using semi-analytical models of galaxy formation combined with large N-body simulations. Since dark matter haloes are in general flattened, and satellite galaxies are a reasonably fair tracer of the dark matter mass distribution, 〈θ〉 will be smaller than 45◦ as long as the BGG is aligned with its dark matter halo. In particular, Kang et al. (2007) were able to accurately reproduce the data of Yang et al. (2006) under the assumption that the minor axis of the BGG is perfectly aligned with the spin axis of its dark matter halo. Kang et al. (2007) also showed that the color dependence of the halo alignment has a natural explanation in the framework of hierarchical structure formation: red satellites are typically associated with subhaloes that were more massive at their time of accretion. Since the orientation of a halo is correlated with the direction along which it accreted most of its matter (e.g., Wang et al. 2005; Libeskind et al. 2005), red satellites are a more accurate tracer of the halo orientation than blue satellites.
The origin of the radial alignment is less clear. One possibility is that it reflects a left-over from large-scale alignments introduced by the large-scale tidal field and the preferred accretion of matter along filaments. Such alignment, however, is unlikely to survive for more than a few orbits within the halo of the BGG, so that the observed alignment must be mainly due to the satellite galaxies that were accreted most recently. Since these satellites typically reside at relatively large halo-centric radii, this picture predicts a stronger radial alignment at larger radii, clearly opposite to what we find. A more likely explanation, therefore, is that radial alignment has been created locally by the group tidal field.
As shown by Ciotti & Dutta (1994), the timescale on which a prolate galaxy can adjust its orientation to the tidal field of a cluster is much shorter than the Hubble time, but longer than its intrinsic dynamical time. Consequently, prolate galaxies have a tendency to orient themselves towards the cluster center. The fact that the observed signal increases towards the group center supports this interpretation. In particular, satellites that were accreted early not only are more likely to be red, they also are more likely to reside at small group-centric radii and to have relatively low group-centric velocities (e.g., Mathews et al. 2004). This will enhance their tendency to align themselves along the gradient in the cluster’s gravitational potential, and they may well be the major contributors to the pronounced signal on small scales. In the case of disk galaxies, the conservation of intrinsic angular momentum prevents the disk from re-adjusting to the tidal field, which may explain why blue satellites show no sign of radial alignment. Finally, the tidal field of the parent halo also results in tidal stripping, and the tidal debris may influence the inferred orientation of the satellite galaxy (cf. Johnston et al. 2001; Fardal et al. 2006). Detailed studies are required to investigate the interplay between intrinsic satellite orientations and the group’s tidal field.
In order to understand the direct alignment results, first realize that the angles θ, φ and ξ are not independent (see Fig. 1). However, the equation given in the caption of Fig. 1 applies only to individual BGG-satellite pairs, not to the mean angles. Our results indicate that satellite galaxies are more likely to be aligned ‘radially’ with the direction towards the BGG, than ‘directly’ with the orientation of the BGG.
Since there is no clear theoretical prediction for direct alignment, at least not one that can survive for several orbital periods in a dark matter halo, while radial alignment can be understood as originating from the halo’s tidal field, we consider the relative weakness of direct alignment to be consistent with expectations.
In recent years galaxy-galaxy (GG) lensing has emerged as a primary tool for constraining the masses of dark matter halos around galaxies (e.g., Brainerd 2004). If satellite galaxies are falsely identified as sources lensed by the BGG, which is likely to happen in the absence of redshift information, the radial alignment detected here will dilute the tangential GG lensing signal induced by the dark matter halo associated with the BGG, thus resulting in an underestimate of the halo mass. In agreement with Agustsson & Brainerd (2006b), our findings therefore emphasize the importance of an accurate rejection of satellite galaxies to achieve precision constraints on dark matter halo masses from GG lensing measurements. Similarly, the weak but significant detection of direct alignment may contaminate the cosmic shear measurements. Since we only detected a weak signal on small scales, one can easily avoid this contamination by simply removing or down-weighting close pairs of galaxies in projection (King & Schneider 2002; Heymans & Heavens 2003).
ACKNOWLEDGMENTS
This work is supported by NSFC (10533030, 0742961001, 0742951001) and the Knowledge Innovation Program of the Chinese Academy of Sciences, grant KJCX2-YW-T05. AF and CL are supported by the Joint Program in Astrophysical Cosmology of the Max Planck Institute for Astrophysics and the Shanghai Astrophysical Observatory. YPJ is partially supported by Shanghai Key Projects in Basic research (04JC14079 and 05XD14019).
REFERENCES
Adelman-McCarthy, J. K., et al. 2006, ApJS, 162, 38
Agustsson, I. & Brainerd, T. G. 2006a, ApJ, 650, 550
—. 2006b, ApJ, 644, L25
Azzaro, M., Patiri, S.
G., Prada, F., & Zentner, A. R. 2007, MNRAS, 376, L43 Bailin, J. & Steinmetz, M. 2005, ApJ, 627, 647 Bernstein, G. M. & Norberg, P. 2002, AJ, 124, 733 Binggeli, B. 1982, A&A, 107, 338 Blanton, M. R., et al. 2003, ApJ, 592, 819 Brainerd, T. G. 2004, in AIP Conf. Proc. 743: The New Cosmology: Conference on Strings and Cosmology, ed. R. E. Allen, D. V. Nanopoulos, & C. N. Pope, 129–156 Brainerd, T. G. 2005, ApJ, 628, L101 Carter, D. & Metcalfe, N. 1980, MNRAS, 191, 325 Catelan, P., Kamionkowski, M., & Blandford, R. D. 2001, MNRAS, 320, L7 Ciotti, L. & Dutta, S. N. 1994, MNRAS, 270, 390 Colless, M., et al. 2001, MNRAS, 328, 1039 Crittenden, R. G., Natarajan, P., Pen, U.-L., & Theuns, T. 2001, ApJ, 559, 552 Croft, R. A. C. & Metzler, C. A. 2000, ApJ, 545, 561 Dekel, A. 1985, ApJ, 298, 461 Djorgovski, S. 1983, ApJ, 274, L7 Faltenbacher, A., Allgood, B., Gottlöber, S., Yepes, G., & Hoffman, Y. 2005, MNRAS, 362, 1099 Fardal, M. A., Babul, A., Geehan, J. J., & Guhathakurta, P. 2006, MNRAS, 366, 1012 Fleck, J.-J. & Kuhn, J. R. 2003, ApJ, 592, 147 Hawley, D. L. & Peebles, P. J. E. 1975, AJ, 80, 477 Heavens, A., Refregier, A., & Heymans, C. 2000, MNRAS, 319, 649 Heymans, C. & Heavens, A. 2003, MNRAS, 339, 711 Holmberg, E. 1969, Arkiv for Astronomi, 5, 305 Jing, Y. P. 2002, MNRAS, 335, L89 Johnston, K. V., Sackett, P. D., & Bullock, J. S. 2001, ApJ, 557, Kang, X., van den Bosch, F. C., Yang, X., Mao, S., Mo, H. J., Li, C., & Jing, Y. P. 2007, MNRAS, in press (astro-ph/0701130) King, L. & Schneider, P. 2002, A&A, 396, 411 Lee, J., Kang, X., & Jing, Y. P. 2005, ApJ, 629, L5 Li, C., Kauffmann, G., Jing, Y. P., White, S. D. M., Börner, G., & Cheng, F. Z. 2006, MNRAS, 368, 21 Libeskind, N. I., Frenk, C. S., Cole, S., Helly, J. C., Jenkins, A., Navarro, J. F., & Power, C. 2005, MNRAS, 363, 146 Mandelbaum, R., Hirata, C. M., Ishak, M., Seljak, U., & Brinkmann, J. 2006, MNRAS, 367, 611 Mathews, W. G., Chomiuk, L., Brighenti, F., & Buote, D. A. 
2004, ApJ, 616, 745 Pen, U.-L., Lee, J., & Seljak, U. 2000, ApJ, 543, L107 Pereira, M. J. & Kuhn, J. R. 2005, ApJ, 627, L21 Plionis, M. 1994, ApJS, 95, 401 Plionis, M., Benoist, C., Maurogordato, S., Ferrari, C., & Basilakos, S. 2003, ApJ, 594, 144 Porciani, C., Dekel, A., & Hoffman, Y. 2002a, MNRAS, 332, 325 —. 2002b, MNRAS, 332, 339 Strazzullo, V., Paolillo, M., Longo, G., Puddu, E., Djorgovski, S. G., De Carvalho, R. R., & Gal, R. R. 2005, MNRAS, 359, Struble, M. F. 1990, AJ, 99, 743 Thompson, L. A. 1976, ApJ, 209, 22 Torlina, L., De Propris, R., & West, M. J. 2007, ArXiv Astrophysics e-prints Trujillo, I., Carretero, C., & Patiri, S. G. 2006, ApJ, 640, L111 Usami, M. & Fujimoto, M. 1997, ApJ, 487, 489 Wang, H. Y., Jing, Y. P., Mao, S., & Kang, X. 2005, MNRAS, 364, Weinmann, S. M., van den Bosch, F. C., Yang, X., & Mo, H. J. 2006, MNRAS, 366, 2 West, M. J. 1989, ApJ, 344, 535 Yang, X., Mo, H. J., van den Bosch, F. C., & Jing, Y. P. 2005, MNRAS, 356, 1293 Yang, X., van den Bosch, F. C., Mo, H. J., Mao, S., Kang, X., Weinmann, S. M., Guo, Y., & Jing, Y. P. 2006, MNRAS, 369, York, D. G., et al. 2000, AJ, 120, 1579 ABSTRACT Using a large galaxy group catalogue based on the Sloan Digital Sky Survey Data Release 4 we measure three different types of intrinsic galaxy alignment within groups: halo alignment between the orientation of the brightest group galaxies (BGG) and the distribution of its satellite galaxies, radial alignment between the orientation of a satellite galaxy and the direction towards its BGG, and direct alignment between the orientation of the BGG and that of its satellites. In agreement with previous studies we find that satellite galaxies are preferentially located along the major axis. In addition, on scales r < 0.7 Rvir we find that red satellites are preferentially aligned radially with the direction to the BGG.
The orientations of blue satellites, however, are perfectly consistent with being isotropic. Finally, on scales r < 0.1 Rvir, we find a weak but significant indication for direct alignment between satellites and BGGs. We briefly discuss the implications for weak lensing measurements. <|endoftext|><|startoftext|> Proto-Neutron Star Winds, Magnetar Birth, and Gamma-Ray Bursts
Brian D. Metzger∗,†, Todd A. Thompson∗∗ and Eliot Quataert∗
∗Astronomy Department and Theoretical Astrophysics Center, 601 Campbell Hall, Berkeley, CA 94720; bmetzger@astro.berkeley.edu, eliot@astro.berkeley.edu
†Department of Physics, 366 LeConte Hall, University of California, Berkeley, CA 94720
∗∗Department of Astrophysical Sciences, Peyton Hall-Ivy Lane, Princeton University, Princeton, NJ 08544; thomp@astro.princeton.edu
Abstract. We begin by reviewing the theory of thermal, neutrino-driven proto-neutron star (PNS) winds. Including the effects of magnetic fields and rotation, we then derive the mass and energy loss from magnetically-driven PNS winds for both relativistic and non-relativistic outflows, including important multi-dimensional considerations. With these simple analytic scalings we argue that proto-magnetars born with ∼ millisecond rotation periods produce relativistic winds just a few seconds after core collapse with luminosities, timescales, mass-loading, and internal shock efficiencies favorable for producing long-duration gamma-ray bursts.
Keywords: neutron stars, stellar winds, supernovae, gamma ray bursts, magnetic fields
PACS: 97.60.Bw, 97.60.Gb; 97.10.Me
1. NEUTRINO-DRIVEN PNS WINDS
After a successful core-collapse supernova (SN), a hot proto-neutron star (PNS) cools and deleptonizes, releasing the majority of its gravitational binding energy ($\sim 3\times10^{53}$ ergs) in neutrinos.
With initial core temperature $T > 10$ MeV, a PNS is born optically thick to neutrinos of all flavors because the relevant neutrino-matter cross sections scale as $\sigma_{\nu n} \propto \epsilon_\nu^2 \propto T^2$, where $\epsilon_\nu$ is a typical neutrino energy. Indeed, because neutrinos are trapped, a PNS’s neutrino luminosity $L_\nu$ remains substantial and quasi-thermal for a time after bounce $\tau_{\rm KH} \sim 10-100$ s, as roughly verified by the 19 neutrinos detected from SN1987A 20 years ago [1],[2]. Although this Kelvin-Helmholtz (KH) cooling epoch is short compared to the time required for the shock, once successful and moving outward at $\sim 10^4$ km/s, to traverse the progenitor stellar mantle, $\tau_{\rm KH}$ is still significantly longer than the time over which the initial explosion must be successful. While the specific shock launching mechanism is presently unknown, it must occur in a time $t < 1$ s $\ll \tau_{\rm KH}$ after bounce for the PNS to avoid accreting too much matter.
Thus, even after the SN shock has cleared a cavity of relatively low density material around the PNS, $L_\nu$ remains substantial. Detailed PNS cooling calculations [3] show that the electron neutrino (antineutrino) luminosity $L_{\nu_e(\bar\nu_e)}$ is $\sim 10^{52}$ erg/s at $t \sim 1$ s and declines as $\propto t^{-1}$ until $t \simeq \tau_{\rm KH}$, after which $L_{\nu_e(\bar\nu_e)}$ decreases exponentially as the PNS becomes optically thin. This persistent neutrino flux $F_{\nu_e(\bar\nu_e)}$ continues to heat the PNS atmosphere, primarily through electron neutrino (antineutrino) absorption on nuclei ($\nu_e + n \rightarrow p + e^-$ and $\bar\nu_e + p \rightarrow n + e^+$). Because the inverse, pair capture rates dominate the cooling, which declines rapidly with temperature ($\dot q^- \propto T^6$) and hence with spherical radius $r$, a region of significant net positive heating ($\dot q \equiv \dot q^+ - \dot q^- > 0$) develops above the neutrinosphere radius $R_\nu$. This heating drives mass-loss from the PNS in the form of a thermally-driven wind [4].
To estimate the dependence of the resultant mass-loss rate ($\dot M_{\rm th}$) on the PNS properties explicitly, consider that in steady state the change in gravitational potential required for a unit mass element to escape the PNS ($GM/R_\nu$) must be provided by the total heating it receives accelerating outwards from the PNS surface:
$$\frac{GM}{R_\nu} \approx \int_{R_\nu}^{\infty} \frac{\dot q}{v_r}\,dr, \qquad (1)$$
where $M$ is the PNS mass, $v_r$ is the outward wind velocity, and $\dot q$ is per unit mass. Because $\dot q$ is quickly dominated by heating from neutrino absorption, which scales as $\dot q^+ \propto F_{\nu_e}\sigma_{n\nu} \propto L_{\nu_e}\epsilon_{\nu_e}^2/4\pi r^2$, we see that equation (1) implies that
$$\dot M_{\rm th} \propto \frac{L_{\nu_e}\epsilon_{\nu_e}^2 R_\nu}{GM}\int_{R_\nu}^{\infty}\rho\,dr \approx \frac{L_{\nu_e}\epsilon_{\nu_e}^2 R_\nu}{GM}\,\rho_\nu H_\nu, \qquad (2)$$
where we have used $\dot M_{\rm th} = 4\pi\rho r^2 v_r$ for a spherical wind, $\rho$ is the mass density, $H$ is the PNS’s density scale height, $\epsilon_{\nu_e}$ crudely defines a mean electron neutrino or antineutrino energy, and a subscript “$\nu$” denotes evaluation near $R_\nu$. Neglecting rotational support and assuming that the thermal pressure $P$ is dominated by photons and relativistic pairs (which also becomes an excellent approximation as the density plummets abruptly above the PNS surface), we have that $H_\nu \sim P_\nu/\rho_\nu g_\nu \propto T_\nu^4 R_\nu^2/M\rho_\nu$, where $g_\nu \propto M/R_\nu^2$ is the PNS surface gravity and $T_\nu \propto (L_{\nu_e}\epsilon_{\nu_e}^2/R_\nu^2)^{1/6}$ is the PNS surface temperature. $T_\nu$ is set by the balance between heating and cooling at the PNS surface ($T_\nu^6 \propto \dot q^- = \dot q^+ \propto L_{\nu_e}\epsilon_{\nu_e}^2/R_\nu^2$). Inserting these results into equation (2) and including the correct normalization from the relevant weak cross sections, one finds the expression for $\dot M_{\rm th}$ first obtained by ref [4]:
$$\dot M_{\rm th} \approx 10^{-4}\,L_{52}^{5/3}\,\epsilon_{10}^{10/3}\,R_{10}^{5/3}\,M_{1.4}^{-2}\ M_\odot/{\rm s}, \qquad (3)$$
where $L_{\nu_e} \equiv L_{52}\times10^{52}$ erg/s, $\epsilon_{\nu_e} \equiv 10\,\epsilon_{10}$ MeV, $R_\nu \equiv 10\,R_{10}$ km, and $M \equiv 1.4\,M_{1.4}\,M_\odot$.
Endowed with an enormous gravitational binding energy and a means, through this neutrino-driven outflow, for communicating a fraction of this energy to the outgoing shock, a newly-born PNS seems capable of affecting the properties of the SN that we observe. However, a purely thermal, neutrino-driven PNS wind is only accelerated to an asymptotic speed of order the surface sound speed: $v^\infty_{\rm th} \sim c_{s,\nu} \approx (2kT_\nu/m_p)^{1/2} \approx 0.1\,L_{52}^{1/12}\,\epsilon_{10}^{1/6}\,R_{10}^{-1/6}\,c$.
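The step from the scale-height argument to the power-law dependences of the mass-loss rate and sound speed can be written out explicitly. The following is only a sketch of the exponent bookkeeping, assuming the proportionalities quoted in the text (pressure dominated by photons and relativistic pairs, so $P \propto T^4$); it does not determine the numerical prefactor, which comes from the weak cross sections.

```latex
\begin{align*}
H_\nu &\propto \frac{P_\nu}{\rho_\nu g_\nu} \propto \frac{T_\nu^4 R_\nu^2}{M\rho_\nu},
\qquad
T_\nu^4 \propto \left(\frac{L_{\nu_e}\epsilon_{\nu_e}^2}{R_\nu^2}\right)^{2/3}, \\
\dot M_{\rm th} &\propto \frac{L_{\nu_e}\epsilon_{\nu_e}^2 R_\nu}{M}\,\rho_\nu H_\nu
\propto \frac{L_{\nu_e}\epsilon_{\nu_e}^2 R_\nu^3}{M^2}
\left(L_{\nu_e}\epsilon_{\nu_e}^2\right)^{2/3} R_\nu^{-4/3}
= L_{\nu_e}^{5/3}\,\epsilon_{\nu_e}^{10/3}\,R_\nu^{5/3}\,M^{-2}, \\
c_{s,\nu} &\propto T_\nu^{1/2}
\propto \left(\frac{L_{\nu_e}\epsilon_{\nu_e}^2}{R_\nu^2}\right)^{1/12}
= L_{\nu_e}^{1/12}\,\epsilon_{\nu_e}^{1/6}\,R_\nu^{-1/6}.
\end{align*}
```

Combining the two, the wind power $\dot M_{\rm th} c_{s,\nu}^2/2$ divided by the neutrino luminosity scales as $L_{\nu_e}^{5/6}\epsilon_{\nu_e}^{11/3}R_\nu^{4/3}M^{-2}$.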
Thus, the efficiency $\eta$ relating the wind power $\dot E_{\rm th} \approx \dot M_{\rm th}(v^{\infty}_{\rm th})^2/2$ to the total neutrino luminosity ($L_\nu \sim 6L_{\nu_e}$) is quite low:
$$\eta \equiv \frac{\dot E_{\rm th}}{L_\nu} \sim 10^{-5}\,L_{52}^{5/6}\,\epsilon_{10}^{11/3}\,R_{10}^{4/3}\,M_{1.4}^{-2}. \qquad (4)$$
In particular, although neutrino energy deposited in a similar manner may be responsible for initiating the SN explosion itself at early times (i.e., the neutrino SN mechanism [5]), $\eta$ drops rapidly as the PNS cools. Quasi-spherical winds of this type are therefore not expected to affect the SN's nucleosynthesis or morphology (although the wind itself is considered a promising r-process source [4]).

2. MAGNETICALLY-DRIVEN PNS WINDS

Some PNSs may possess a more readily extractable form of energy in rotation. A PNS born with a period $P = P_{\rm ms}$ ms is endowed with a rotational energy $E_{\rm rot} \simeq 2\times10^{52}\,P_{\rm ms}^{-2}R_{10}^{2}M_{1.4}$ ergs, which, for $P < 4$ ms, exceeds the energy of a typical SN shock ($\sim 10^{51}$ ergs). Given a mass-loss rate $\dot M$ and torquing lever arm $\omega_\tau$, a wind extracts angular momentum $J$ from the PNS at a rate $\dot J \simeq \Omega\omega_\tau^2\dot M$, where $\Omega = 2\pi/P$ is the PNS rotation rate. With the PNS's radius $R_\nu$ as a lever arm and the modest thermally-driven mass-loss rate given by equation (3), the timescale for removal of the PNS's rotational energy, $\tau_J \equiv J/\dot J \sim MR_\nu^2/\dot M\omega_\tau^2 \sim M/\dot M_{\rm th}$, is much longer than $\tau_{\rm KH}$. However, if the PNS is rapidly rotating and possesses a dynamically-important poloidal magnetic field $B_p$ (either through flux-freezing or generated via dynamo action [6]), then both $\dot M$ and $\omega_\tau$ can be substantially increased; this reduces $\tau_J$, allowing efficient extraction of $E_{\rm rot}$. For magnetized winds $\omega_\tau$ is the Alfvén radius $\omega_A$, defined as the cylindrical radius where $\rho v_r^2/2$ first exceeds $B_p^2/8\pi$ [7]. The magnetosphere of a PNS is most likely dominated by its dipole component, with a total (positive-definite) surface magnetic flux given by $\Phi_B = 2\pi B_\nu R_\nu^2$, where $B_\nu$ is the polar surface field. To estimate $\omega_A$ for magnetized PNS outflows, recognize that mass and angular momentum are primarily extracted from a PNS along open magnetic flux.
For an axisymmetric dipole rotator this represents only a fraction $\approx 2(\pi\theta_{\rm LCFL}^2)/4\pi \simeq R_\nu/2\omega_Y$ of $\Phi_B$, where $\theta_{\rm LCFL} \approx (R_\nu/\omega_Y)^{1/2}$ is the latitude (measured from the pole) at the PNS surface of the last closed field line (LCFL), $\omega_Y$ is the radius where the LCFL intersects the equator (the "Y point"), and we have assumed that $\omega_Y \gg R_\nu$ ($\theta_{\rm LCFL} \ll 1$). Plasma necessarily threads a PNS's closed magnetosphere and cannot be forced to corotate superluminally; thus $\omega_Y$ cannot exceed the light cylinder radius $\omega_L \equiv c/\Omega = 48\,P_{\rm ms}$ km, making it useful to write the PNS magnetosphere's total open magnetic flux as $\Phi_{B,{\rm open}} \approx \pi B_\nu R_\nu^2 (R_\nu/\omega_L)(\omega_Y/\omega_L)^{-1}$. Now, the overall latitudinal structure of a PNS magnetosphere (i.e., the allocation of open and closed magnetic flux, and the value of $\omega_Y/\omega_L$) is primarily dominated by the dipolar closed zone. However, recent numerical simulations [8] show that where the field is open it behaves as a "split monopole". In this case the poloidal field scales as $B_p \sim \Phi_{B,{\rm open}}/r^2 \approx 0.2\,B_\nu P_{\rm ms}^{-1}R_{10}(\omega_Y/\omega_L)^{-1}(R_\nu/r)^2$, rather than the dipole scaling $\propto (R_\nu/r)^3$. The constant of proportionality is chosen to assure that $B_p(R_\nu) \to B_\nu$ in the limit of a vanishing closed zone ($\omega_L, \omega_Y \to R_\nu$) and is in agreement with numerical results (see eq. [28] of ref. [8]).

2.1. Non-Relativistic Winds and Asymmetric Supernovae

Non-relativistic (NR) magnetically-driven winds reach an equipartition between kinetic and magnetic energy outside $\omega_A$ such that the kinetic energy flux at $\omega_A$ ($\dot M v_r(\omega_A)^2/2$) carries a sizeable fraction of the rotational energy loss extracted by the wind's surface torque, $\dot E_{\rm rot} = \dot J\Omega = \dot M\Omega^2\omega_A^2$; thus, we have that $v_r(\omega_A) \sim \Omega\omega_A$.
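The fiducial numbers quoted in this section (the rotational energy, the light-cylinder radius, and the open fraction of the dipole flux) can be checked with a few lines of Python. This is a rough sketch rather than the authors' calculation: the moment of inertia $I = 0.4\,MR^2$ of a uniform sphere is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

M_SUN_G = 1.989e33   # solar mass in grams
C_KM_S = 2.998e5     # speed of light in km/s


def rotational_energy_erg(p_ms, r_km=10.0, m_msun=1.4, inertia_factor=0.4):
    """E_rot = I * Omega^2 / 2, with I = inertia_factor * M * R^2.

    inertia_factor = 0.4 (uniform sphere) is an assumption of this sketch.
    """
    omega = 2.0 * math.pi / (p_ms * 1.0e-3)                  # rad/s
    inertia = inertia_factor * m_msun * M_SUN_G * (r_km * 1.0e5) ** 2
    return 0.5 * inertia * omega ** 2


def light_cylinder_km(p_ms):
    """Light-cylinder radius c / Omega in km."""
    omega = 2.0 * math.pi / (p_ms * 1.0e-3)
    return C_KM_S / omega


def open_flux_fraction(r_nu_km, omega_y_km):
    """Open fraction of the dipole flux, R_nu / (2 * omega_Y), valid for omega_Y >> R_nu."""
    return r_nu_km / (2.0 * omega_y_km)


if __name__ == "__main__":
    for p_ms in (1.0, 3.0, 4.0):
        print(p_ms, rotational_energy_erg(p_ms), light_cylinder_km(p_ms))
```

With these assumptions a 1 ms rotator carries roughly $2\times10^{52}$ ergs and has a light cylinder near 48 km, in line with the scalings in the text; at $P = 4$ ms the rotational energy is still slightly above $10^{51}$ ergs.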
Combining this with the modified monopole scaling for Bp motivated above and mass conservation ṀΩ ≡ ρr 2vr (ṀΩ is the mass flux per solid angle) we find that: ωA/Rν ≃B ms Ṁ Ω,−4R 10 (ωY/ωL) −1, (5) where ṀΩ ≡ ṀΩ,−4×10 −4M⊙s −1sr−1, Bν ≡B15×10 15 G, and we have concentrated on the open magnetic flux that emerges nearest the closed zone (polar latitude ≈ θLCFL) and which thereby dominates the spin-down torque. From equation (5) we see that winds from rapidly rotating PNSs with surface magnetic fields typical of Galactic “magnetars” (Bν ∼ 10 14 − 1015 G) possess enhanced lever arms for extracting rotational energy [9]. Fur- thermore, their total outflow power ĖNRmag ≈ Ėrot ≈ 2πθ LCFLṀΩΩ 2ω2A ≈ 1049B −13/3 ms Ṁ Ω,−4R 10 (ωY/ωL) −3 ergs/s dominates thermal acceleration (ĖNRmag > Ėth) for B15 > 0.4P 23/24 23/12 −11/3 1.4 (ωY/ωL) 9/4. This condi- tion becomes easier to satisfy as the PNS cools, allowing magnetized winds to dominate later stages of the KH epoch for PNSs with even relatively modest Bν and Ω. NR magnetically-driven winds, in addition to being more powerful than spherical, thermally-driven outflows, are efficiently hoop-stress collimated along the PNS rotation axis [8]. The power they deposit along the poles may produce asymmetry in SN ejecta distinct from the shock-launching process itself. Strong magnetic fields and rapid rotation can also increase the out- flow’s power through enhanced mass-loss because ĖNRmag ∝ Ṁ Ω . When the PNS’s hydrostatic atmosphere is forced to co-rotate to the outflow’s sonic radius ωs = (GM sin[θLCFL]/Ω 2)1/3 then ṀΩ is enhanced by a factor φcf ∼ exp[(vφ,ν/cs,ν) 2] over Ṁth/4π due to centrifugal (“cf”) slinging [9], where vφ,ν ≈ RνΩsin[θLCFL] ≈ RνΩ Rν/ωY is the PNS rotation speed at the base of the open flux. Using our estimate for cs,ν from § 1, we see that enhanced mass loss becomes important for Pms < Pcf,ms ≡ L −1/18 10 (ωY/ωL) (i.e., only for PNSs with considerable rotational energy Erot > 10 52 ergs). 
Fully enhanced mass loss (ṀΩ = Ṁthφcf/4π) requires ωA > ωs, which in turn requires that B15 > Bcf,15 ≡ P −13/4 10 Ṁ Ω,−4M 1.4 (ωY/ωL) 5/4 ≃ 0.3P 7/4ms L 1.4 R −29/12 10 exp[0.5(P/Pcf) −3](ωY/ωL) 5/4, where we have taken Ṁth from § 1. For cases with Bν < Bcf but P < Pcf , ṀΩ lies somewhere between Ṁth/4π and φcfṀth/4π (see [10] for numerical results). Millisecond proto-magnetars generally attain φcf , except perhaps at early times when the PNS is quite hot. 2.2. Relativistic Winds and Gamma-Ray Bursts As the PNS cools, eventually ωA → ωL and the PNS outflow becomes relativistic (REL). This transition occurs after τKH for most PNSs (they become pulsars), but rapidly rotating proto-magnetar winds become relativistic during the KH epoch itself. Similar to normal pulsars, PNSs of this type lose energy at the force-free, “vacuum dipole” rate: ĖRELmag ≈ 6×10 49B215P 10(ωY/ωL) −2 ergs/s (again modulo corrections for excess open magnetic flux ĖRELmag ∝ Φ B,open ∝ (ωY/ωL) −2 [8]), which gives a familiar spin-down timescale τJ =Erot/Ė mag ≈ 300B 10 M1.4(ωY/ωL) s. On the other hand, the mass loading on a PNS’s open magnetic flux is set by neutrino heating, a process totally different from the way that matter is extracted from a normal pulsar’s surface. In fact, a proto-magnetar outflow’s energy-to-mass ratio σ is given by ĖRELmag 2πθ2LCFLṀΩc ≈ 3B215P −10/3 1.4 exp From equation (6) we see that because a PNS’s mass-loss rate drops so precipitously as it cools, σ ∝ L−5/3νe ǫ −10/3 rises rapidly with time, easily reaching ∼ 10− 1000 during the KH epoch for typical magnetar parameters [9],[10]. Detailed evolution calculations indicate that Erot is extracted roughly uniformly in log(σ) [10]. To conclude with a concrete example, consider a proto-magnetar with Bν = 10 G and Pms = 3 at t= 10 seconds after core collapse. From the cooling calculations of ref [3] we have L52(10 s)≈ 0.1 and ǫ10(10 s)≈ 1 (see Figs. 
[14] and [18]) and so, under the conservative estimate that $\omega_Y = \omega_L$, equation (6) gives $\sigma \approx 500$. Because $\sigma$ represents the potential Lorentz factor of the outflow (assuming efficient conversion of magnetic to kinetic energy), we observe that millisecond proto-magnetar birth provides the right mass-loading to explain gamma-ray bursts (GRBs). Further, the power at $t = 10$ s is still $\dot E^{\rm REL}_{\rm mag} \approx 10^{50}$ erg/s with a spin-down time $\tau_J \approx 30$ s, both reasonable values to explain the typical luminosities and durations, respectively, of long-duration GRBs. Lastly, because $\sigma$ rises so rapidly with time as the PNS cools, in the context of GRB internal shock models a cooling proto-magnetar outflow's kinetic-to-$\gamma$-ray efficiency can be quite high; our calculations indicate that values of 10-50% are plausible. We conclude that magnetar birth accompanied by rapid rotation (but requiring less angular momentum than collapsar models) represents a viable long-duration GRB central engine.

REFERENCES
1. Bionta, R. M., Blewitt, G., Bratton, C. B., Caspere, D., & Ciocio, A. 1987, Physical Review Letters, 58, 1494
2. Hirata, K. S., et al. 1988, Phys. Rev. D, 38, 448
3. Pons, J. A., Reddy, S., Prakash, M., Lattimer, J. M., & Miralles, J. A. 1999, ApJ, 513, 780
4. Qian, Y.-Z., & Woosley, S. E. 1996, ApJ, 471, 331
5. Bethe, H. A., & Wilson, J. R. 1985, ApJ, 295, 14
6. Thompson, C., & Duncan, R. C. 1993, ApJ, 408, 194
7. Weber, E. J., & Davis, L. J. 1967, ApJ, 148, 217
8. Bucciantini, N., Thompson, T. A., Arons, J., Quataert, E., & Del Zanna, L. 2006, MNRAS, 368, 1717
9. Thompson, T. A., Chang, P., & Quataert, E. 2004, ApJ, 611, 380
10. Metzger, B. D., Thompson, T. A., & Quataert, E. 2007, ApJ in press

ABSTRACT
We begin by reviewing the theory of thermal, neutrino-driven proto-neutron star (PNS) winds.
Including the effects of magnetic fields and rotation, we then derive the mass and energy loss from magnetically-driven PNS winds for both relativistic and non-relativistic outflows, including important multi-dimensional considerations. With these simple analytic scalings we argue that proto-magnetars born with ~millisecond rotation periods produce relativistic winds just a few seconds after core collapse with luminosities, timescales, mass-loading, and internal shock efficiencies favorable for producing long-duration gamma-ray bursts. <|endoftext|><|startoftext|> Introduction
Current theories for the formation of massive stars stress the importance of the dense cluster environment in which most of them, if not all, form (Bonnell et al. 2007). Dynamical interactions at the centers of massive star forming regions lead to captures forming binary systems, ejections, mass segregation, and possibly coalescence. A remarkable byproduct of the dynamical interactions in dense clusters of massive stars is the relatively large abundance of runaway O-type stars, which amount to almost 10% of the known O-type stars in the solar vicinity (see Maíz-Apellániz et al. (2004) for a recent census). Runaway stars, characterized by their large spatial velocities, can form either by dynamical ejection from a dense cluster (Poveda et al. 1967, Leonard & Duncan 1988, 1990) or by the supernova explosion of a member of a close massive binary (Blaauw 1961, van Rensbergen et al. 1996, de Donder et al. 1997). Evidence for actual examples resulting from both mechanisms exists (Hoogerwerf et al. 2000, 2001), and both are a consequence of the special conditions in which massive star formation takes place. On the one hand, the high stellar density of the parental cluster facilitates the dynamical ejection scenario. On the other hand, the supernova scenario is favored by the high frequency of binaries with high mass ratios among massive stars (Garmany et al. 1982, Preibisch et al.
1999), which may be a consequence of dynamical capture followed by accretion and orbital evolution (Bate et al. 2003).

Send offprint requests to: F. Comerón
⋆ Based on observations collected at the Centro Astronómico Hispano-Alemán (CAHA) at Calar Alto, operated jointly by the Max-Planck Institut für Astronomie and the Instituto de Astrofísica de Andalucía (CSIC).

Cygnus OB2, the most massive OB association of the solar neighbourhood (Knödlseder 2000, 2003, Comerón et al. 2002, and references therein), should be the source of numerous runaway stars given its rich content in massive stars, which includes the massive multiple system Cyg OB2 8 near its center. Unfortunately, few studies to date have addressed its possible runaway population, with the exception of the recent radial velocity survey of Kiminki et al. (2007), in which no runaway candidate has been identified so far. Comerón et al. (1994, 1998) pointed out the existence of large-scale kinematical peculiarities in the Cygnus region, most likely related to the presence of Cygnus OB2, as shown by Hipparcos proper motions. Although they interpreted their results in terms of triggered star formation (Elmegreen 1998), at least some of the stars that they identified as moving away from Cygnus OB2 might be actual runaways formed by either of the two mechanisms listed above. In this paper we report the identification of a very high mass runaway star, BD+43° 3654, very probably ejected from Cygnus OB2. The star had already been identified as a likely runaway by van Buren & McCray (1988) based on the existence of an apparent bow shock in IRAS images, caused by the interaction of its stellar wind with the local interstellar medium. Here we present the first spectroscopic observations of the star, which show it to be a very early Of-type supergiant.
We also present proper motion data and higher resolution MSX images leading to a more detailed analysis, which strongly supports an origin at the core of Cygnus OB2.
http://arxiv.org/abs/0704.0676v1
F. Comerón and A. Pasquali: A very massive runaway star from Cygnus OB2

Fig. 1. Spectrum of BD+43° 3654 showing the main absorption lines used for spectral classification and the prominent emission of NIII and HeII. The prominent, unlabeled features are Hγ, Hδ, and Hε. Interstellar absorption features (IS) are indicated by dotted lines. The locations where one might expect to detect NIV and SiIV transitions are also indicated.

2. Observations
The spectrum presented here was obtained in the course of a project aimed at producing spectral classifications of previously unknown, photometrically selected OB stars in the surroundings of Cygnus OB2. The photometry in the BRJHK_S bands was taken from the Naval Observatory Merged Astrometric Dataset (NOMAD) catalog (Zacharias et al. 2004), which combines astrometry and photometry from the Hipparcos, Tycho-2, UCAC2, USNO-B1.0, and 2MASS catalogs. The spectroscopic observations were carried out with the 2.2m telescope at the German-Spanish Astronomical Center on Calar Alto (Spain) using the CAFOS visible imager and spectrograph. A 1.5″ slit combined with the B-100 grism, providing a resolution λ/∆λ = 800 in the blue part of the visible spectrum, was used. The exposure time was 900 s. The spectrum was reduced, extracted, and wavelength calibrated using standard IRAF (footnote 1) tasks under the ONEDSPEC package, and it was divided by a sixth-degree polynomial fit to the continuum in order to remove the steep slope due to the strong extinction towards the star.

3. Results
3.1. Stellar classification, properties, and kinematics
Although the identification of BD+43° 3654 as a likely runaway star dates back to van Buren & McCray (1988), no spectral classification is available in that work.
Subsequent papers by van Buren et al. (1995) and Noriega-Crespo et al. (1997) refer to the star as an unspecified B-type star but do not report dedicated observations, and no other spectroscopic classification appears to be available in the literature apart from a generic classification as 'OB reddened' in the LS catalog (Hardorp et al. 1964). The spectrum presented here is thus the first one allowing an accurate spectral classification of BD+43° 3654 and an estimate of its physical parameters.

Footnote 1: IRAF is distributed by NOAO, which is operated by the Association of Universities for Research in Astronomy, Inc., under contract to the National Science Foundation.

Fig. 2. Comparison between the position of BD+43° 3654 and the evolutionary tracks for very massive stars of Meynet et al. (1994).

The most obvious spectroscopic feature of BD+43° 3654 is the presence of intense emission in the NIII and HeII lines, and possibly also in NIV and SiIV, clearly indicating that it is an Of star. HeII lines are also prominent in absorption, which, together with the absence of HeI lines, indicates a spectral type earlier than O5. Interstellar absorption features (CaII and diffuse interstellar bands) are also strong due to the high extinction towards the star. Based on comparison with the atlas of Walborn & Kirkpatrick (1990), we classify the star as O4If. Using intrinsic colors of early-type stars from Tokunaga (2000) and the 2MASS HK_S photometry from the NOMAD catalog, we estimate a K-band extinction A_K = 0.57 mag. A summary of previous distance determinations to Cygnus OB2 has been presented by Hanson (2003). We adopt her favored distance modulus DM = 10.8, corresponding to a distance of 1450 pc, with an estimated uncertainty of ±0.4 mag based on the results of the previous determinations summarized in that work.
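The conversion between the adopted distance modulus and the distance uses the standard relation $d = 10^{DM/5 + 1}$ pc. The snippet below is a minimal sketch of that arithmetic; the function name is chosen here for illustration, and the ±0.4 mag range is simply propagated to the two extreme distances.

```python
def distance_pc(distance_modulus):
    """Distance in parsecs from a distance modulus DM = 5 log10(d / 10 pc)."""
    return 10.0 ** (distance_modulus / 5.0 + 1.0)


DM = 10.8       # adopted distance modulus (Hanson 2003, as quoted in the text)
DM_ERR = 0.4    # quoted uncertainty in magnitudes

d_best = distance_pc(DM)
d_near = distance_pc(DM - DM_ERR)
d_far = distance_pc(DM + DM_ERR)

if __name__ == "__main__":
    print(f"d = {d_best:.0f} pc (range {d_near:.0f} to {d_far:.0f} pc)")
```

DM = 10.8 corresponds to about 1445 pc, which the text rounds to 1450 pc; the ±0.4 mag uncertainty spans roughly 1200 to 1740 pc, so the distance is the dominant source of error in the absolute magnitude.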
Assuming that BD+43° 3654 is approximately at the same distance from the Sun as Cygnus OB2, we derive its absolute magnitude as
$$M_V = K_S - A_K + (V - K_S)_0 - DM = -6.3 \pm 0.5 \qquad (1)$$
where the 0.5 mag uncertainty includes as the dominant source the quoted uncertainty in the distance modulus, together with the contribution of the error in the derivation of the extinction. We estimate the latter to be 0.2 mag based on the quality of the fit of a reddened O4-type spectral energy distribution to the measured BRJHK_S photometry. The contribution of the errors in the broad-band photometry is negligible compared to those two sources. Different calibrations of the stellar parameters of O stars can be found in the literature to estimate the mass and the age of BD+43° 3654. These calibrations are based on different treatments of the stellar model atmospheres, depending on whether non-LTE conditions, line-blanketing effects, and stellar winds are taken into account. For an O4 supergiant in the Milky Way (i.e. of solar metallicity), Martins et al. (2005) provide an effective temperature $T_{\rm eff} = 40702$ K; Repolust et al. (2004) estimate a cooler $T_{\rm eff} = 39000$ K, while Vacca et al. (1996) give $T_{\rm eff} = 47690$ K. We have adopted the average of those three calibrations, 42464 K, as the temperature for BD+43° 3654, considering as the uncertainty the range of temperatures spanned by those models. This uncertainty is larger than the temperature difference between the subtypes O4 and O5, and between types O4I and O4V, for any given calibration (see e.g. Martins et al. 2005). The same is true for the effects of metallicity, which are hardly noticeable even when metal abundances change by a factor of 10. This is clearly shown in Fig. 15 of Heap et al. (2006), where the temperature-spectral type relationships from different calibrations involving both galactic and Small Magellanic Cloud O stars are compared.
Therefore, plausible uncertainties in either our spectral classification or our assumption of solar metallicity for BD+43° 3654 do not significantly alter the size of the error bars in Fig. 2. The absolute magnitudes $M_V$ obtained by all three models are very similar, with an average of $M_V = -6.36$ and individual determinations deviating by less than 0.05 mag from that value. This is remarkably close to the value that we obtain from the photometry of BD+43° 3654 and the assumption that its distance modulus is the same as that obtained by Hanson (2003) for Cygnus OB2, thus supporting our choice of that distance for the star. The position of BD+43° 3654 on the Hertzsprung-Russell (HR) diagram is shown in Fig. 2, together with the evolutionary tracks computed by Meynet et al. (1994) for solar metallicity and with enhanced stellar mass loss. These evolutionary tracks are preferable because they better reproduce the low luminosity observed for some Wolf-Rayet stars, the surface chemical composition of WC and WO stars, and the ratio of blue to red supergiants in the star clusters of the Magellanic Clouds. The tracks plotted in Fig. 2 refer to a stellar mass of the progenitor on the main sequence of $M_i = 20, 25, 40\ M_\odot$ (in grey) and $M_i = 60, 85\ M_\odot$ (in black). The comparison between the observed properties of BD+43° 3654 and the tracks allows us to estimate an initial mass $M_i \simeq (70 \pm 15)\ M_\odot$ and an approximate age of 1.6 Myr. The tracks do not take into account stellar rotation, which many studies in the past decade have found to greatly affect mixing and mass loss, and to be an important ingredient for stellar evolution (Meynet & Maeder 1997, Langer et al. 1998, Heger & Langer 2000, Meynet & Maeder 2000, and references therein). As shown by Meynet & Maeder (2000), for an initial rotational velocity of 200-300 km s$^{-1}$ and solar metallicity the tracks become brighter by a few tenths of a magnitude and the lifetime in the H-burning phase increases by 20-30%.
Given the observational errors on BD+43° 3654, these changes do not significantly affect our estimates of the initial mass and age of the star. Proper motions for BD+43° 3654 are available from the NOMAD catalog, based on measurements by the Hipparcos satellite in the Tycho catalog further refined with previous ground-based observations. The values listed are $\mu_\alpha\cos\delta = (-0.4 \pm 0.7)$ mas yr$^{-1}$ and $\mu_\delta = (+1.3 \pm 1.0)$ mas yr$^{-1}$. The corresponding values expressed in galactic coordinates, which are more convenient to derive the spatial velocity of the star with respect to its local interstellar medium, are $\mu_l\cos b = (+0.8 \pm 0.9)$ mas yr$^{-1}$, $\mu_b = (+1.1 \pm 0.8)$ mas yr$^{-1}$.

Fig. 3. Image obtained in the Midcourse Space Experiment (MSX) galactic plane survey in the D medium-infrared band (13.5 µm - 15.9 µm). The position of BD+43° 3654 is marked with a grey circle. Galactic North is up and the direction of growing galactic longitude is to the left.

Assuming that the local interstellar medium in the surroundings of BD+43° 3654 moves in a circular orbit around the galactic center, its proper motion $(\mu_l\cos b)_0$, $(\mu_b)_0$ can be described by the first-order approximation to the local galactic velocity field; see e.g. Scheffler & Elsässer (1987):
$$(\mu_l\cos b)_0 = 0.211\left[A\cos 2l\cos b + B\cos b + \frac{U}{D}\sin l - \frac{V}{D}\cos l\right] \qquad (2a)$$
$$(\mu_b)_0 = 0.211\left[-A\sin 2l\sin b\cos b + \frac{U}{D}\cos l\sin b + \frac{V}{D}\sin l\sin b - \frac{W}{D}\cos b\right] \qquad (2b)$$
where $A$ and $B$ are the Oort constants in units of km s$^{-1}$ kpc$^{-1}$; $U$, $V$, and $W$ are the components of the solar peculiar motion in the directions toward the galactic center, the direction of circular galactic rotation, and the North galactic pole respectively, in km s$^{-1}$; and $D$ is the distance to the Sun in kpc. We have adopted $A = -B = 12.5$ km s$^{-1}$ kpc$^{-1}$, corresponding to a flat rotation curve with an angular velocity of 25 km s$^{-1}$ kpc$^{-1}$, and $(U, V, W) = (7, 14, 7)$ km s$^{-1}$.
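The evaluation of Eqs. (2a) and (2b), and of the residual proper motion, position angle, and tangential velocity that the text derives from them, can be sketched as follows. This is an illustrative reconstruction, not the authors' code. The galactic coordinates of the star (l ≈ 82.41°, b ≈ +2.33°) are not quoted in the text and are an assumption of this sketch, as are the function names; the factor 0.211 in the equations is 1/4.74, the usual conversion between km s$^{-1}$ kpc$^{-1}$ and mas yr$^{-1}$.

```python
import math

K = 4.74047  # km/s per (mas/yr * kpc); 1/K is the 0.211 factor of Eqs. (2a)-(2b)


def ism_proper_motion(l_deg, b_deg, d_kpc, A=12.5, B=-12.5, U=7.0, V=14.0, W=7.0):
    """Expected proper motion (mas/yr) of gas on a circular galactic orbit.

    First-order field with Oort constants A, B (km/s/kpc) and solar
    peculiar motion U, V, W (km/s); defaults are the values adopted in the text.
    """
    l, b = math.radians(l_deg), math.radians(b_deg)
    mu_l = (A * math.cos(2 * l) * math.cos(b) + B * math.cos(b)
            + (U / d_kpc) * math.sin(l) - (V / d_kpc) * math.cos(l)) / K
    mu_b = (-A * math.sin(2 * l) * math.sin(b) * math.cos(b)
            + (U / d_kpc) * math.cos(l) * math.sin(b)
            + (V / d_kpc) * math.sin(l) * math.sin(b)
            - (W / d_kpc) * math.cos(b)) / K
    return mu_l, mu_b


# Assumed approximate galactic coordinates of BD+43 3654 (not given in the text).
L_DEG, B_DEG, D_KPC = 82.41, 2.33, 1.45
MU_L_OBS, MU_B_OBS = 0.8, 1.1          # observed mu_l cos b and mu_b, mas/yr

mu_l0, mu_b0 = ism_proper_motion(L_DEG, B_DEG, D_KPC)
dmu_l, dmu_b = MU_L_OBS - mu_l0, MU_B_OBS - mu_b0

# Position angle from galactic North, and velocity on the plane of the sky.
pa_deg = math.degrees(math.atan2(dmu_l, dmu_b))
v_tan = K * math.hypot(dmu_l, dmu_b) * D_KPC

if __name__ == "__main__":
    print(f"residual: {dmu_l:.2f}, {dmu_b:.2f} mas/yr; PA = {pa_deg:.1f} deg; v = {v_tan:.1f} km/s")
```

With these inputs the sketch gives a residual motion of about (5.2, 2.1) mas yr$^{-1}$, a position angle near 69°, and a tangential velocity near 39 km s$^{-1}$, which agree with the published values to within the rounding of the inputs and well within the quoted uncertainties.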
The proper motion of BD+43° 3654 with respect to its local interstellar medium is then
$$\Delta\mu_l\cos b = \mu_l\cos b - (\mu_l\cos b)_0 = (5.3 \pm 1.1)\ {\rm mas\ yr^{-1}} \qquad (3a)$$
$$\Delta\mu_b = \mu_b - (\mu_b)_0 = (2.0 \pm 0.9)\ {\rm mas\ yr^{-1}} \qquad (3b)$$
where the uncertainty allows for an error of 2 km s$^{-1}$ kpc$^{-1}$ in each of $A$ and $B$, and 2 km s$^{-1}$ in each of $U$, $V$, and $W$. The position angle $\theta$ of the residual proper motion with respect to the North galactic pole, counted as positive in the direction of increasing galactic longitude, is then
$$\theta = \tan^{-1}\left(\frac{\Delta\mu_l\cos b}{\Delta\mu_b}\right) = 69.3^\circ \pm 9.4^\circ \qquad (4)$$
The component of the spatial velocity on the plane of the sky that we derive from the residual proper motion at the adopted distance of 1450 pc is $(39.8 \pm 9.8)$ km s$^{-1}$, which is several times the sound speed in a warm neutral interstellar medium at a temperature of $\sim 8000$ K, as expected from the fact that a clear bow shock is observed ahead of the star in the direction of its motion.

3.2. The bow shock
The original identification of a possible bow shock associated with BD+43° 3654 was reported by van Buren & McCray (1988) based on 60 µm IRAS maps, and further details were given by van Buren et al. (1995) and Noriega-Crespo et al. (1997). While indeed suggestive of a bow shock, the resolution of the IRAS 60 µm images presented by van Buren et al. (1995) is not sufficient to accurately determine the shape of the bow shock and its position with respect to the star. Much improved images of the region around BD+43° 3654 have been provided by the Midcourse Space Experiment (MSX) satellite (Price et al. 2001). The BD+43° 3654 bow shock appears in them as a neat, well-defined arc-shaped nebula in the D (13.5-15.9 µm; see Fig. 3) and E (18.2-25.1 µm) bands, and is absent in the A (6.8-10.8 µm) and C (11.1-13.2 µm) bands.
The position of the apsis of the bow shock with respect to BD+43° 3654 can be well determined from those images: it is located 3.4 arcmin from the star in a direction that forms an angle of 62.5° ± 10° with the direction towards the north galactic pole. This position angle is in very good agreement with the position angle of the residual velocity vector of the star (Eq. (4)). The position of the bow shock with respect to the star allows us to estimate the density of the interstellar medium through which BD+43° 3654 is moving. The apsis of the bow shock is approximately located at the stagnation radius, which is the distance from the star where the ram pressure of the interstellar gas equals that of the stellar wind, given by (e.g. Wilkin 1996):
$$R_0 = \sqrt{\frac{\dot M_w v_w}{4\pi\rho_a v_*^2}} \qquad (5)$$
where $\dot M_w$ and $v_w$ are respectively the mass loss rate and terminal wind velocity of the star, $\rho_a$ is the ambient gas density, and $v_*$ is the spatial velocity of the star. The distance given in Eq. (5) assumes that the bow shock is bound by shock fronts on both sides. In reality, the non-zero cooling time of the shocked stellar wind builds up a thick layer of low-density, high-temperature gas between the reverse shock on the stellar wind and the bow shock. The existence of this thick layer moves the apsis of the bow shock to a greater distance from the star than that given by Eq. (5). This expression actually gives the position of the reverse shock ahead of the star, as shown in numerical simulations by Comerón & Kaper (1998), and the actual position of the bow shock is normally $\sim 1.5R_0$, the precise distance depending on the quantities entering the right-hand side of Eq. (5) and the cooling curve of the stellar wind gas. Concerning the stellar wind, we have adopted $\dot M_w = 10^{-5}\ M_\odot$ yr$^{-1}$ and $v_w = 2300$ km s$^{-1}$ as typical values derived by Markova et al. (2004) and Repolust et al. (2004) for the O4I stars in their samples.
Finally, we use $v_* = 39.8$ km s$^{-1}$ as derived in the previous Section, assuming for simplicity that most of the velocity of BD+43° 3654 is on the plane of the sky and that there are no projection effects on the position of the bow shock. Introducing these values in Eq. (5), we obtain a number density of the local interstellar gas $n_H \simeq 6$ cm$^{-3}$. It must be kept in mind that this is only a rough estimate of the density, mainly due to the large uncertainties in the values adopted for the different quantities entering Eq. (5) and the assumption that the residual motion of the star is in the plane of the sky. In particular, we note that Bouret et al. (2005) find mass loss rates smaller by a factor of $\sim 3$ for the galactic O4If+ star HD 190429A when taking into account wind clumping with respect to the homogeneous wind case, which may imply an overestimate of $n_H$ by a similar factor due to our adopted values. In any case, the estimated density clearly indicates that the star is moving in a tenuous medium whose density matches well that typical of the warm HI gas in the vicinity of the galactic midplane (e.g. Dickey & Lockman 1990).

4. Discussion: the origin of BD+43° 3654
The spectral type and estimated mass of BD+43° 3654 place it among the three most massive runaway stars known to date. The only other two comparable stars are ζ Pup and λ Cep (spectral types O4I(n)f and O6I(n)fp, respectively; Maíz-Apellániz et al. 2004), whose masses (65-70 $M_\odot$, as estimated by Hoogerwerf et al. (2001) from evolutionary models by Vanbeveren et al. (1998)) are similar to the one that we estimate for BD+43° 3654. Although currently placed near the boundary separating Cygnus OB1 and OB9 (to the extent that this boundary may be real; see Schneider et al. (2007)), the proper motion of BD+43° 3654 points away from the core of Cygnus OB2, which is approximately marked by the location of the multiple system of O stars Cyg OB2 8A-D.
Other early O-type stars, most notably Cyg OB2 22A (O3If*), Cyg OB2 22B (O6V((f))), and Cyg OB2 9 (O5If+), are also within a few arcminutes of that location. The position angle of BD+43° 3654 with respect to this system is 58.84°, very similar to the position angle of its residual proper motion vector (Sect. 3.1) and of the axis of the bow shock (Sect. 3.2). In view of the high density of very massive OB stars found in the central regions of Cygnus OB2 (Massey & Thompson 1991), we thus consider it very likely that BD+43° 3654 was formed there and subsequently expelled. Assuming that BD+43° 3654 was born in the close vicinity of Cyg OB2 8, from which it is currently separated by an angular distance $\delta = 2.67^\circ$, the travel time to its current position is
$$\tau = \frac{\delta}{\sqrt{(\Delta\mu_l\cos b)^2 + (\Delta\mu_b)^2}} = 1.7 \pm 0.4\ {\rm Myr},$$
which is close to the age of the star inferred from the evolutionary tracks and the position in the H-R diagram (Sect. 3.1). The coincidence between the age and the travel time supports dynamical ejection early in its life as the cause of its runaway velocity, since there would have been no time for a hypothetical massive companion to evolve, go through different mass transfer episodes (Vanbeveren et al. 1998), and then explode as a supernova. The spatial velocities of the other two massive runaways noted above are probably higher, unless the radial velocity of BD+43° 3654 exceeds the projected velocity on the plane of the sky: Hoogerwerf et al. (2001) measure a velocity of 62.4 km s$^{-1}$ for ζ Pup, and 74.0 km s$^{-1}$ for λ Cep. High spatial velocities may be the signature of an origin by supernova ejection, since high ejection velocities by dynamical interaction become increasingly unlikely as the mass of the ejected star increases. The observed mass-velocity relationship for runaway stars (Gies & Bolton 1986) clearly shows this trend. Hoogerwerf et al.
(2001) favor a supernova scenario for λ Cep on the basis of the difference between its age and that of the likeliest parental association. The birthplace of ζ Pup is more uncertain according to Hoogerwerf et al. (2001), but van Rensbergen et al. (1996) also favor a supernova scenario. We note however that the velocities of all three stars are well below the upper limit for the ejection of very massive stars in encounters with massive binaries (Leonard 1991). We thus consider the similarity between the estimated age of BD+43° 3654 and its kinematic age, if it was born near the center of Cygnus OB2, as the strongest argument in support of a dynamical ejection, possibly from an original cluster that also contained Cyg OB2 8, 9, and 22. BD+43° 3654 is the first runaway star from Cygnus OB2 identified thus far, but most likely it is not the only one in such a rich association. If the fraction of runaways among O-type stars is the same for Cygnus OB2 as for the more nearby population of O stars, we estimate that about ten more Cygnus OB2 runaways may remain to be discovered, with the potential to provide new information on their formation environments and on the mechanisms leading to runaway ejection.

Acknowledgements. It is as always a pleasure to acknowledge the support of the staff of the Calar Alto observatory during the execution of our observations. We also thank the referee, Dr. Dave van Buren, for his detailed and constructive comments. This research has made use of the SIMBAD database operated at CDS, Strasbourg, France.
It also makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation, as well as of data products from the Midcourse Space Experiment (MSX). Processing of the MSX data was funded by the Ballistic Missile Defense Organization with additional support from NASA Office of Space Science. This research has also made use of the NASA/ IPAC Infrared Science Archive, which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. References Bate, M.R., Bonnell, I.A., Bromm, V., 2003, MNRAS, 336, 705. Blaauw, A., 1961, Bull. Astron. Inst. Netherlands, 15, 265. Bonnell, I.A., Larson, R.B., Zinnecker, H., 2007, in ”Protostars and Planets V”, eds. B. Reipurth, D. Jewitt, K. Keil, Univ. of Arizona Press. Bouret, J.-C., Lanz, T., Hillier, D.J., 2005, A&A, 438, 301. Comerón, F., Kaper, L., 1998, A&A, 338, 273. Comerón, F., Torra, J., 1994, ApJ, 423, 652. Comerón, F., Torra, J., Gómez, A.E., 1998, A&A, 330, 975. Comerón, F., Pasquali, A., Rodighiero, G., Stanishev, V., De Filippis, E., López Martı́, B., Gálvez Ortiz, M.C., Stankov, A., Gredel, R., 2002, A&A, 389, de Donder, E., Vanbeveren, D., van Bever, J., 1997, A&A, 318, 812. Dickey, J.M., Lockman, F.J., 1990, ARA&A, 28, 215. Elmegreen, B.G., 1998, in Origins, eds. C.E. Woodward, J.M. Shull, H.A. Thronson, ASP Conf. Ser. 148. Garmany, C.D., , 1980, Conti, P.S., Massey, P., 1982, ApJ, 242, 1063. Gies, D.R., Bolton, C.T., 1986, ApJS, 61, 419. Hanson, M.M., 2003, ApJ, 597, 957. Hardorp, J., Theile, I., Voigt, H.H., 1964, Luminous Stars in the Northern Milky Way (LS), Vol. 3., Hamburger Sternwarte and Warner & Swasey Obs. Heap, S.R., Lanz, T., Hubeny, I., 2006, ApJ, 638, 409. 
Heger, A., Langer, N., 2000, ApJ, 544, 1016. Hoogerwerf, R., de Bruijne, J.H.J., de Zeeuw, P.T., 2000, ApJ, 544, L133. Hoogerwerf, R., de Bruijne, J.H.J., de Zeeuw, P.T., 2001, A&A, 365, 49. Kiminki, D.C., Kobulnicky, H.A., Kinemuchi, K., Irwin, J.S., Fryer, C.L., Berrington, R.C., Uzpen, B., Monson, A.J., Pierce, M.J., Woosley, S.E., 2007, ApJ, in press. Knödlseder, J., 2000, A&A, 360, 539. Knödlseder, J., 2003, in A Massive Star Odyssey: From Main Sequence to Supernova, eds. K. van der Hucht, A. Herrero, C. Esteban, ASP Conf. Ser. Langer, N., Heger, A., Fliegner, J., 1998, in Fundamental Stellar Properties: The Interaction between Observation and Theory, IAU Symp. 189, eds. T.R. Bedding, A.J. Booth, J. Davis, Kluwer Acad. Publ. Leonard, P.J.T., 1991, AJ, 101, 562. Leonard, P.J.T., Duncan, M.J., 1988, AJ, 96, 222. Leonard, P.J.T., Duncan, M.J., 1990, AJ, 99, 608. Maı́z Apellániz, J., Walborn, N.R., Galué, H.Á., Wei, L.H., 2004, ApJS, 151, Markova, N., Puls, J., Repolust, T., Markov, H., 2004, A&A, 413, 693. Martins, F., Schaerer, D., Hillier, D.J., 2005, A&A, 436, 1049. Massey, P., Thompson, A.B., 1991, AJ, 101, 1408. Meynet, G., Maeder, A., 1997, A&A, 321, 465. Meynet, G., Maeder, A., 2000, AR&A, 38, 143. Meynet, G., Maeder, A., Schaller, D., Charbonnel, C., 1994, A&AS, 103, 97. Noriega-Crespo, A., van Buren, D., Dgani, R., 1997, AJ, 113, 780. Poveda, A., Ruiz, J., Allen, C., 1967, Bol. Obs. Tonantzintla y Tacubaya, 4, 86. Preibisch, T., Balega, Y., Hofman, K.-H., Weigelt, G., Zinnecker, H., 1999, New Astr., 4, 531. Price, S.D., Egan, M.P., Carey, S.J., Mizuno, D.R., Kuchar, T.A., 2001, AJ, 121, 2819. Repolust, T., Puls, J., Herrero, A., 2004, A&A, 415, 349. Scheffler, H., Elsässer, H., 1987, Physics of the Galaxy and the Interstellar Medium, Springer Verlag. Schneider, N., Simon, R., Bontemps, S., Motte, F., 2007, A&A, submitted. Tokunaga, A.T., 2000, in Allen’s Astrophysical Quantities, ed. A. Cox, AIP Press. 
Vacca, W.D., Garmany, C.D., Shull, J.M., 1996, ApJ, 460, 914. Vanbeveren, D., De Loore, C., Van Rensbergen, W., 1998, A&A Rev., 9, 63. van Buren, D., McCray, R., 1988, ApJ, 329, L93. van Buren, D., Noriega-Crespo, A., Dgani, R., 1995, AJ, 110, 2914. van Rensbergen, W., Vanbeveren, D., de Loore, C., 1996, A&A, 305, 825. Walborn, N.R., Fitzpatrick, E.L., 1990, PASP, 102, 379. Wilkin, F.P., 1996, ApJ, 459, L31. Zacharias, N., Monet, D.G., Levine, S.E., Urban, S.E., Gaume, R., Wycoff, G.L., 2004, AAS, 205, 4815.

ABSTRACT Aims: We analyze the available information on the star BD+43 3654 to investigate the possibility that it may have had its origin in the massive OB association Cygnus OB2. Methods: We present new spectroscopic observations allowing a reliable spectral classification of the star, and discuss existing MSX observations of its associated bow shock and astrometric information not previously studied. Results: Our observations reveal that BD+43 3654 is a very early and luminous star of spectral type O4If, with an estimated mass of (70 ± 15) solar masses and an age of about 1.6 Myr. The high spatial resolution of the MSX observations allows us to determine its direction of motion in the plane of the sky by means of the symmetry axis of the well-defined bow shock, which matches well the orientation expected from the proper motion. Tracing back its path across the sky we find that BD+43 3654 was located near the central, densest region of Cygnus OB2 at a time in the past similar to its estimated age.
Conclusions: BD+43 3654 turns out to be one of the three most massive runaway stars known, and it most likely formed in the central region of Cygnus OB2. A runaway formation mechanism by means of dynamical ejection is consistent with our results. <|endoftext|><|startoftext|> Introduction The star HD 56126 is observed in the post-AGB phase of its evolution. While undergoing this short-lived stage (according to Blöcker [1], this phase lasts for $\Delta T \approx 10^3$–$10^4$ years), the star passes to the stage of a planetary nebula, and therefore post-AGB stars are also commonly referred to as “protoplanetary nebulae” (PPNe). In the Hertzsprung–Russell diagram post-AGB stars move at almost constant luminosity leftward of the AGB and become increasingly hotter. In these descendants of AGB stars the change in the energy source is accompanied by a change of the stellar structure, ejection of the envelope, and mixing of matter, so that they can be used to trace the physical and chemical parameters of the interstellar matter. The main task of our research is to reveal the chemical composition anomalies that are due to the nuclear synthesis of chemical elements in the interiors of low- and intermediate-mass stars (less than 8–9 M⊙) and the subsequent dredge-up of the products of synthesis to the surface layers of stellar atmospheres. We use our high-precision spectroscopic data not only to study the chemical composition, but also to perform a detailed analysis of the velocity field in the atmospheres of these stars, which constitutes a separate astrophysical problem. In addition, the high-quality observational data allowed us to produce an atlas of the spectrum of a typical post-AGB star over a wide wavelength interval. For this purpose, we chose the supergiant star HD 56126 (Sp = F5Iab), which is the optical component of the IR source IRAS 07134+1005 with a double-peaked spectral energy distribution (SED) typical of PPN.
The star HD 56126 is located outside the galactic plane; its galactic coordinates are l = 206.75°, b = +9.99°. Note that HD 56126 is a generally recognized canonical object in the phase of transition from the asymptotic giant branch to a planetary nebula. In addition to the anomalous SED mentioned above, which is due to the circumstellar dust, this star exhibits other highly conspicuous features specific to this class of objects [2]: the optical component of the PPN is an F5Iab-type supergiant at a high galactic latitude; the central star is surrounded by an extended nebula, which, according to HST observations [3], has the largest angular size, β > 4′′, among PPN objects of this type; the optical spectrum exhibits a variable complex emission–absorption profile of Hα and shows spectral features that are indicative of current mass outflow. Based on their high-resolution spectroscopy (R = 860000, FWHM = 0.35 km/s) of HD 56126, Crawford and Barlow [4] revealed the multicomponent structure of the K I and C2 features, which is indicative of repeated episodes of mass ejection from the star.

⋆ The full version of the Atlas is available in electronic form from: http://www.sao.ru/hq/ssl/Atlas/Atlas.html
http://arxiv.org/abs/0704.0677v1
2 Klochkova et al.: Spectroscopy of HD56126

Subsequent studies of HD 56126 and of the associated IR source revealed a number of properties that allowed the object to acquire the canonical status in its class. First, an analysis of the spectra obtained with the echelle spectrograph of the 6-m telescope allowed Klochkova [5] to conclude that HD 56126 is a metal-poor star with [Fe/H]⊙ = −1.0 and a high excess of carbon and s-process elements. Second, IRAS 07134+1005 was found to belong to the group of PPNe whose IR spectra exhibit an emission feature at λ = 21 µm.
Objects of this small subgroup were found to exhibit a correlation between the presence of this 21 µm feature and the manifestation of products of stellar nucleosynthesis in the outer atmospheric layers: overabundance of carbon and heavy metals of the s-process. This so far unexplained correlation has been found independently by Klochkova [6] and a group of other authors [7]. Thus HD 56126 possesses the complete set of features peculiar to the entire family of PPNe, and this fact determines the importance of the detailed spectroscopy of this object and of the preparation of an atlas of its optical spectrum over a wide spectral region. This task is facilitated by HD 56126 being the brightest (B = 9.11 mag, V = 8.27 mag) star among carbon-rich PPNe and hence the most accessible star for high-resolution spectroscopy among the objects of this type.

Section 2 gives a brief description of the methods of observation and data reduction employed in this paper. Section 3 presents the peculiar features of the spectrum of HD 56126, and Section 4 describes the field of radial velocities Vr in the atmosphere and envelope of the star. We also briefly discuss the radial-velocity variability and the variability of selected spectral-line profiles. Section 5 describes the spectroscopic atlas and the identification of spectral features, and compares the spectrum of HD 56126 with that of the standard supergiant α Per (Sp = F5Iab).

2. Observations and reduction of spectra

We performed spectroscopic observations of HD 56126 and α Per with the 6-m telescope of the Special Astrophysical Observatory. We obtained all spectra with the NES [8, 9] and Lynx [10, 11] echelle spectrographs operating in the Nasmyth focus. With a 2048×2048 CCD and an image slicer [12], the NES spectrograph yields spectra with a resolution of R ≈ 60000, whereas the Lynx spectrograph equipped with a 1K×1K CCD yields a resolution of R ≈ 25000. Table 1 gives the dates of observations and the spectral regions recorded.
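The resolving powers quoted above translate into velocity and wavelength resolution through the standard relations Δv = c/R and Δλ = λ/R. The short Python sketch below applies these relations to the NES and Lynx values; the function names and the 5500 Å reference wavelength are illustrative choices, not taken from the paper. The same relation gives about 0.35 km/s for R = 860000, in line with the FWHM quoted for the spectra of Crawford and Barlow [4].

```python
# Speed of light in km/s (exact SI value).
C_KM_S = 299_792.458


def velocity_resolution(resolving_power: float) -> float:
    """Velocity width c/R (km/s) of one resolution element."""
    if resolving_power <= 0:
        raise ValueError("resolving power must be positive")
    return C_KM_S / resolving_power


def wavelength_resolution(resolving_power: float, wavelength_angstrom: float) -> float:
    """Wavelength width lambda/R (Angstrom) of one resolution element."""
    if resolving_power <= 0:
        raise ValueError("resolving power must be positive")
    return wavelength_angstrom / resolving_power


if __name__ == "__main__":
    # Resolving powers as quoted in the text; 5500 A is an arbitrary
    # reference wavelength chosen only for illustration.
    for name, r in (("NES", 60_000), ("Lynx", 25_000)):
        dv = velocity_resolution(r)
        dlam = wavelength_resolution(r, 5500.0)
        print(f"{name}: R = {r}, dv = {dv:.2f} km/s, dlambda = {dlam:.3f} A at 5500 A")
```

A resolution element of several km/s is wider than the quoted single-line accuracy of about 1 km/s; this is not a contradiction, since the position of a line core can be measured to a fraction of the resolution element.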
We use the modified ECHELLE context of MIDAS to extract data from two-dimensional echelle spectra (see [13] for details). Cosmic-ray hits were removed via median averaging of two successive spectra. Wavelength calibration was performed using Th-Ar lamps. We use the DECH20 code [14] to perform spectrophotometric and position measurements. In particular, we determine the radial velocities from individual lines and their components by superimposing the direct and mirror-reflected profiles. We determine the position zero point for each spectrogram by referring it to the positions of ionospheric emission features of the night sky and to those of telluric absorptions, which show up against the spectrum of the object. The accuracy of single-line velocity measurements in the spectra obtained is better than 1.0 and 1.5 km/s for the NES and Lynx spectrographs, respectively.

3. Peculiarities of the optical spectrum of HD 56126

Optical spectra of PPNe differ from the spectra of classical supergiants by the anomalous profiles of spectral lines (H I, Na I, He I), and primarily by the anomalous Hα profiles. Hα lines in the spectra of typical PPNe have complex emission and absorption profiles with asymmetric cores, P Cyg- or inverse P Cyg-type profiles, and profiles with two emission components. PPNe often exhibit a combination of several such features. Emission in Hα may be due to mass outflow and/or pulsations, and hence a sporadic stellar wind should be observed in many PPNe. The Doppler shift of the core is usually smaller than the escape velocity, i.e., we have evidence only for motions at the wind base. The spectra of individual objects owe the great variety of their profiles to the differences in the dynamical processes in their extended atmospheres: spherically symmetric outflow with constant or height-dependent velocity, mass infall onto the photosphere, and pulsations. A two-component emission profile is indicative of a nonspherical envelope, e.g., the presence of a circumstellar disk.
The peculiarity of the optical spectra of PPNe often shows up not only in specific H I profiles, but also in the distortions of the spectral features of the F–K-type supergiant due to chemical composition anomalies and the presence of molecular features along with atomic and ion lines.

HD 56126 exhibits all these spectral peculiarities that distinguish a PPN from a normal supergiant of the same spectral type. As is evident from Fig. 1, the Hα line has a complex profile with absorption and emission components, which are absent in the spectrum of the comparison star α Per. Figure 1 also shows well-defined photospheric wings of the Hα line in the spectrum of HD 56126. These wings are almost as extended as in the spectrum of α Per. Figure 2, which shows all the data now available, demonstrates date-to-date variations of the central part of the Hα profile.

Table 1. Log of observations and results of Vr measurements. Column 4 gives the mean Vr averaged over weak lines (r → 1). For Fe II(42), Hα and the D lines of Na I we give the velocities inferred from the positions of the strongest line components. The numbers in parentheses give the velocities inferred from weaker components. Slanted font in column 5 indicates the velocities inferred from the IR oxygen triplet O I λ 7773 Å. A colon indicates an uncertain value.

Date      Spectrograph  Δλ, Å      Vr: r→1  Fe II(42)  Hβ   Hα         D Na I    C2    Interstellar
(1)       (2)           (3)        (4)      (5)        (6)  (7)        (8)       (9)   (10)  (11)  (12)
HD 56126
12.01.93  Lynx          5560–8790  88.8     91         –    78 (100:)  77        –     –     –     –
10.03.93  Lynx          5560–8790  89.0     93         –    71 (43:)   75:       –     –     –     –
04.03.99  Lynx          5050–6640  85.9     77         –    76 (43:)   78        77.1  –     –     –
20.11.02  NES           4560–5995  89.6     95 (80:)   89   –          75 (89)   77.2  12.0  23.5  30.8
21.02.03  NES           5150–6660  88.8     96:        –    88 (112:)  75 (89)   77.1  12    24    31
12.04.03  NES           5270–6760  88.4     –          –    82 (103:)  75 (89:)  –     13    23    30.5
14.11.03  NES           4518–6000  85.3     96 (87:)   97   –          75 (87:)  76.9  12.5  –     –
10.01.04  NES           5270–6760  86.7     –          –    54:        76 (86:)  –     13.0  23.5  31
09.03.04  NES           5275–6767  89.8     –          –    58 (74:)   76 (89)   –     13    24    31
12.11.05  NES           4010–5460  82.5     97 (77:)   98   –          –         77.5  –     –     –
α Per
04.03.99  Lynx          5050–6620  −1.2     −1         –    −2         –         –     –
02.08.01  NES           3500–5000  −1.8     −1:        −2   –          –         –     –
11.11.05  NES           4010–5460  −2.0     −2         −2   –          –         –     –
12.11.05  NES           4560–6010  −1.9     −2         −2   –          –         –     –

Figure 1. Fragment of the atlas containing the Hα profile in the spectra of HD 56126 (top) and α Per (bottom). The y-axis gives the residual intensities; the continuum level is set equal to 100.

Earlier, Oudmaijer and Bakker [15] performed spectral monitoring of HD 56126 and also found Hα to be highly variable over a two-month time scale. Hα-line variability can be naturally explained in the case of post-AGB stars with signs of binarity (e.g., in the case of HR 4049 [16]); however, it also shows up in post-AGB objects that exhibit no regular radial-velocity or light variations (the case of HD 133656 [17]). Photometric variability would allow us (as in the case of RV Tau-type stars) to invoke the mechanism where a shock wave stimulates mass outflow. Based on an extensive set of good-quality spectroscopic observations of HD 56126, Barthès et al.
[18] found that not only the profile of Hα but also that of Hβ is variable. The above authors analyzed the variations of the profiles of both these lines and concluded that no periodic component is present that could be associated with the radial-velocity and photometric variations of the star.

Figure 2. Variations of the Hα-line profile in the spectra of HD 56126 taken on different days (x-axis: wavelength, Å; dates labeled in the figure: 12.01.93, 10.03.93, 04.03.99, 21.02.03, 12.04.03, 10.01.04, 09.03.04). The y-axis shows the residual intensities; the continuum level of the lower spectrum is set to 100 and each successive spectrum is shifted by 100 units with respect to the previous one.

The profiles of strong Fe II lines (first and foremost, those of the members of the 42nd multiplet), Ba II, and other elements in the spectrum of HD 56126 are also variable. However, whereas either the blue or red wing may have lower slopes in the absorption core of Hα, nonhydrogen absorptions preserve their asymmetry pattern unchanged: the blue wing is always more extended than the red wing. The profile of the Fe II(42) λ 5169 Å line in Fig. 3 is a typical example.

Figure 3. Same as Fig. 1, but for the spectral region containing the 5165 Å band of the Swan system of the C2 molecule and the Fe II(42) 5169 Å line.

Figures 3 and 4 show fragments of the spectra that illustrate the differences between line intensities of HD 56126 and α Per. Fe II absorptions in the spectrum of the former star are much weaker than in that of the latter star, and the ratio of the central depths of the same line in the spectra of the two stars depends on the line intensity: it increases from 1.5 to 4 as one passes from weak to strong lines. Fe I lines are also depressed, on the average, by 0.1.
On the contrary, C I absorptions as well as those of Y II, Zr II, and other s-process products are deeper by 0.1–0.2 in the spectra of HD 56126 compared to the corresponding features in the spectrum of α Per.

Let us now analyze the molecular component of the spectrum of HD 56126. Bakker et al. [19] were the first to identify the Swan absorption bands of the C2 molecule and of the red system of the CN molecule in the spectrum of the star. Later, Bakker et al. [20] used high-resolution spectra with R = 50000 to perform a detailed analysis of molecular bands in the spectra of HD 56126 and 16 other PPNe selected based on the presence of the carbon molecules C2, CN, and CH+ in their shells. Judging by the velocity corresponding to the position of these bands, the molecular spectrum forms in a limited region of the shell close to the star [20]. Our spectra exhibit several bands of the Swan system (see Fig. 3). Here it is pertinent to recall that the spectra of several PPNe were found to exhibit emission Swan bands [19, 6]. However, the spectra of HD 56126 taken in different years show no signs of emission in these bands. The D lines of Na I show no signs of emission either. This fact is consistent with the rather simple elliptical shape of the nebula surrounding HD 56126. Emissions in the Swan bands or in the Na I D lines appear to show up only in the spectra of PPNe with bright circumstellar nebulae with well-defined asymmetry. The spectroscopy of the following PPNe corroborates this hypothesis: IRAS 04296+3429 [21], IRAS 23304+6147 [22], AFGL 2688 [23], IRAS 08005−2356 [24], IRAS 20056+1834 [25], and IRAS 20508+2011 [26]. On HST images [3], the nebulae surrounding these PPNe are usually asymmetric and have a bipolar structure. Note also that most of the objects listed above belong to type “1” according to the classification of Trammell et al. [27], i.e., they are PPNe with polarized optical radiation.
Figure 5 shows the behavior of the D2 Na I line profile in the spectrum of HD 56126, and here, as in Table 1, to reveal the fine structure of lines, we analyze only the spectra taken with the highest resolution. The positions of the three short-wavelength components remain constant within the errors. This stability confirms that the components form in the interstellar medium. The wavelength shift of the deepest component agrees with that of the Swan bands (columns 8 and 9 in Table 1), and this fact indicates that the component in question forms in the circumstellar shell. Finally, the longest-wavelength component forms in the photosphere: its temporal behavior agrees with that of other photospheric absorptions (columns 4 and 8 in Table 1).

Figure 4. Same as Fig. 1, but for the spectral region with the Ba II λ 5853 Å line.

4. Radial velocities pattern

Much attention has been paid to the radial-velocity variations of HD 56126 and to the differences between the radial velocities inferred from lines of different types. Bujarrabal et al. [28] used CO millimeter-wave observations to find Vr = 86.1 km/s, which is natural to adopt as the systemic radial velocity of HD 56126. Based on an extensive collection of spectrograms with high temporal resolution and high S/N ratio, Oudmaijer and Bakker [15] analyzed the behavior of Vr and concluded that it is variable on a time scale of several months with a small amplitude (Vr = 84–87 ± 2 km/s). The above authors demonstrated the absence of variations on time scales ranging from several minutes to several hours. The variability of the radial velocity of HD 56126 also showed up when the radial-velocity measurements made with the 6-m telescope were compared to published data [5]. Lèbre et al. [29] performed a detailed spectroscopic monitoring of HD 56126.
Fourier analysis of the available radial-velocity and photometric data led the above researchers to conclude that the dynamical state of the atmosphere of HD 56126 is similar to that of the atmospheres of RV Tau-type variables. The above authors interpreted the variability of Hα in terms of shock propagation. Later, Lèbre et al. [30] analyzed the variability of two lines, Hα and Hβ. They obtained additional spectroscopic data and determined the period of radial pulsations to be P = 36.8 days. Barthès et al. [18] analyzed all the available reliable radial-velocity measurements made for HD 56126 (89 measurements over eight years) and concluded that the radial velocity of this star varies with a half-amplitude of 2.7 km/s and a main period of P = 36.8 ± 0.2 d. The period of photometric variability is the same and the photometric amplitude is very small, 0.02 mag. However, the above authors found the variability pattern of the star to differ significantly from the pulsations observed in RV Tau-type stars. Judging by its temperature [5], HD 56126 has evolved beyond the stage of RV Tau-type stars. The photometric and radial-velocity variations of HD 56126 may be due to first-overtone radial pulsations driven by shocks that generate complex asynchronous motions in the upper hydrogen layers of the star.

Figure 5. Spectral fragment with the D2 Na I line on different observing dates (x-axis: wavelength, Å; dates labeled in the figure: 20.11.02, 21.02.03, 14.11.03, 12.04.03, 10.01.04, 09.03.04).

Table 1 presents the radial-velocity data we obtained for HD 56126. Given that a velocity gradient is very likely in the upper layers of the star’s atmosphere, we report here the Vr values for individual lines and groups of lines. As is evident from Table 1, velocity variations inferred from weak absorptions (their residual intensities approach 1) are within the variability limits established by Barthès et al. [18].
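As a rough illustration of the comparison just described, the sketch below summarizes the weak-line velocities of HD 56126. It assumes the values in column 4 of Table 1 as they are printed in this text and takes the CO velocity of 86.1 km/s [28] as the systemic velocity; the helper name `summarize` is hypothetical. The summary is purely descriptive: ten sparsely sampled epochs cannot replace the period analysis of Barthès et al. [18], and each value carries the measurement accuracy of 1.0–1.5 km/s quoted in Section 2.

```python
from statistics import mean

# Systemic velocity adopted in the text from CO observations [28], km/s.
V_SYS = 86.1

# Weak-line (r -> 1) velocities of HD 56126 in km/s, transcribed from
# column 4 of Table 1 as printed in this text (assumed to be extracted
# correctly). Keys are observation dates in dd.mm.yy form.
WEAK_LINE_VR = {
    "12.01.93": 88.8,
    "10.03.93": 89.0,
    "04.03.99": 85.9,
    "20.11.02": 89.6,
    "21.02.03": 88.8,
    "12.04.03": 88.4,
    "14.11.03": 85.3,
    "10.01.04": 86.7,
    "09.03.04": 89.8,
    "12.11.05": 82.5,
}


def summarize(velocities, v_sys):
    """Return mean, extremes, and largest offset from the systemic velocity."""
    values = list(velocities)
    if not values:
        raise ValueError("need at least one velocity")
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "max_abs_offset": max(abs(v - v_sys) for v in values),
    }


if __name__ == "__main__":
    stats = summarize(WEAK_LINE_VR.values(), V_SYS)
    for key, value in stats.items():
        print(f"{key}: {value:.2f} km/s")
```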
The positions of the circumstellar Na I D lines and Swan bands of the C2 molecule agree well with each other and match the expansion velocity of the shell, Vexp ≈ 11 km/s. Note that the position of the wind component of the Hα line is inconsistent with those of the wind components of other lines, whereas the variations of the Hβ profile are synchronized with those of the Fe II(42) lines.

When intercomparing our data and comparing them with those of other authors, one must check the zero points of the corresponding radial-velocity systems. We used interstellar and circumstellar lines to check the radial-velocity zero points. Three blueshifted interstellar components of the Na I lines in the spectrum of HD 56126, which are barely visible in Fig. 5, yield the Vr values listed in columns 10–12 of Table 1. The fourth weak component with Vr ≈ 46 km/s is barely visible in at least three of our spectra. For each component, all our Vr estimates agree with each other and with those of Bakker et al. [19] within the quoted errors. Furthermore, as is evident from Fig. 5, the blend consisting of the three main components has sharp boundaries, which allow the velocity of this entire feature to be confidently measured. Its mean velocity as inferred from our data is equal to Vr = 20.3 ± 0.3 km/s and agrees with the velocity of 20 ± 2 km/s measured by Lèbre et al. [29] from lower-resolution spectra. Crawford and Barlow [4] showed that when observed with superhigh resolution, the circumstellar C2 and K I features exhibit components that are about 1 km/s apart. These components yield the same set of velocity values, but have different intensities. This effect may cause minor systematic differences (also on the order of 1 km/s) between the velocities inferred from circumstellar atomic and molecular lines in lower-resolution spectra.
Our measurements reveal no variations of these velocities with time, and their mean values, 77.2 ± 0.5 and 75.4 ± 0.3 km/s for C2 and Na I, respectively, do not disagree systematically with the radial velocities obtained by Lèbre et al. [29], Bakker et al. [19, 20], and Crawford and Barlow [4]: 77.3–77.6 and 75.3–76.8 km/s for C2 and Na I, K I, respectively.

However, when comparing our Vr values with published data, one must take into account not only methodological effects, but also the spectroscopic peculiarity of the object itself. Line profiles in the spectrum of HD 56126 are asymmetric and their shape varies both with time and with line intensity. We plan to undertake a detailed analysis of the velocity field at different depths in the atmosphere and in the circumstellar envelope of HD 56126 in a separate paper.

As for the possible binarity of HD 56126, it has been neither confirmed nor definitively ruled out. In this connection, of certain interest is a comment by Barthès et al. [18], who pointed out a weak trend in the star’s radial velocity over several years of observations. This trend may be indicative of a secondary component in the system with an orbital period longer than 16 years. Our eight spectra, which we took on more recent dates, fail to clarify the situation. It would therefore be important to follow the behavior of Vr over a period of several years, taking one to two spectra every month on a regular basis.

The radial-velocity variability of HD 56126 is not a unique phenomenon. Some candidate PPNe also demonstrate radial-velocity variations on a time scale of several hundred days, which may be indicative of the binary nature of these objects. Evidence for orbital motion has been obtained for several optically bright objects with IR excess.
For example, the authors of [31, 32] and [33] proved the binary nature, determined the orbital elements, and proposed models for the high-latitude supergiants 89 Her and HR 4049, respectively. Van Winckel et al. [34] showed HR 4049, HD 44179, and HD 52961 to be spectroscopic binaries with orbital periods of about one to two years. The above authors concluded that all extremely metal-deficient PPNe studied so far (HR 4049, HD 44179, HD 52961, HD 46703, and BD+39°4926) are binary stars. The observed correlation between the binarity and the presence of a hot dust envelope indicates that binarity promotes the formation of an envelope. Bakker et al. [16] use high-resolution spectra of HR 4049 to analyze the variations of the complex emission–absorption profiles of the Na D lines and Hα over the orbital period. Individual components of these lines may form under different conditions: in the atmosphere of the main star, in the disk where both components of the binary are immersed, or in the interstellar medium. For such binaries, the determination of the systemic velocity from radio spectroscopy is of fundamental importance.

The nature of the companions of suspected binary post-AGB stars remains unclear, because we see no direct manifestations of these companions either in the continuum or in spectral lines (all known binary post-AGB objects belong to type SB1). These companions may be either very hot objects or main-sequence stars of very low luminosity. For example, Bakker et al. [16] believe that the secondary companion in HR 4049 is a cold (Te = 3500 K) MS star with a mass of M = 0.56 M⊙, although it may also be a white dwarf, as in the case of Ba stars [35].

Unfortunately, because of the short history of PPN studies we are so far unable to draw any definitive conclusions concerning the cause of the radial-velocity variability of a representative sample of these objects.
Moreover, the observed pattern of radial-velocity variations is often complicated by differential motions in the extended atmospheres of these objects. A detailed analysis of radial velocities based on data taken with high spectral and temporal resolution for selected PPNe (the brightest ones) reveals differences in the behavior of radial velocities inferred from lines of different excitation, which form at different depths in the atmosphere of the star. For example, Bakker et al. [36] analyzed the spectrum of the IRAS source identified with the peculiar supergiant HD 101584 and found eight categories of spectral lines with fundamentally different temporal behavior of profiles, half-widths, and shifts (and hence of Vr values). In particular, the highest-excitation absorption features, which form near the star’s photosphere, exhibit radial-velocity variations due to the orbital motion in the binary system. At the same time, low-excitation lines with P Cyg-type profiles form in the stellar-wind region and are indicative of mass outflow. The systemic velocity has been confidently determined from radio emissions of CO and OH molecules.

5. Spectral atlas

Table 2. Spectra used to create this atlas

            α Per                     HD 56126
Δλ, Å       Date      Spectrograph    Date      Spectrograph
4010–5460   11.11.05  NES             12.11.05  NES
5460–6010   12.11.05  NES             9.03.04   NES
6010–6640   4.03.99   Lynx            9.03.04   NES
6640–8790   –         –               10.03.93  Lynx

Figure 6. Same as Fig. 1, but for the spectral fragment containing the Si II λ 6347 Å line.

Our comparative atlas of the spectra of HD 56126 and α Per includes 94 plots representing 40 Å-long spectral fragments. Some of them are shown in Figs. 1, 3, 4 and 6 to illustrate the differences between the intensities and profiles of lines in the spectra of two stars of similar temperature and luminosity. The full version of the atlas is available at http://www.sao.ru/hq/ssl/Atlas/Atlas.html.
In the wavelength interval 4010–6640 Å the atlas gives the complete spectra of both objects. However, at longer wavelengths, up to 8790 Å, part of the spectrum has been lost in gaps between echelle orders and the remaining portions are crowded with telluric lines. The atlas therefore gives only the most informative fragments for this part of the spectrum of HD 56126. The spectrum of HD 56126 is variable: line profiles, differential line shifts, and radial velocities vary from date to date. We therefore performed no averaging of any kind, and different spectral intervals are represented in the atlas by the different spectra indicated in Table 2. For each wavelength interval, we selected from among the available spectra the one with the highest resolution and signal-to-noise ratio.

We supplement the graphic data in the atlas with tables. In Table 3, column 1 gives the results of the identification of spectral features; column 2, the laboratory wavelengths used to measure the radial velocities; columns 3 and 5, the central residual line intensities “r”; and columns 4 and 6, the heliocentric radial velocities Vr measured from the line cores.

To identify atomic and molecular lines in the spectrum of HD 56126, we use the atlases and tables of the solar spectrum [37, 38, 39, 40], the Moore multiplet tables [41, 42], and the electronic tables accompanying the paper by Bakker et al. [19]. We also use the VALD database [43]. We supplement the standard identification criteria (wavelength, relative line intensity, specific line profile) with two additional ones. One of these new criteria uses the chemical composition anomalies of HD 56126 mentioned above and the spectrum of the comparison star. The second new criterion can be applied only to sufficiently strong lines (r < 0.5), which in some of our spectra exhibit a sharp variation of radial velocity with line depth. Several rather strong lines remained unidentified in the spectrum of HD 56126.
Some of them can be seen in the fragments of the atlas presented here: the λ 6550 Å line in Fig. 1, the λ 5845 and 5852 Å lines in Fig. 4, and the λ 6347 Å line in Fig. 6. Compared to the lines in the spectrum of αPer, those in the spectrum of HD56126 are less blended, because they are narrower and, in addition, many of them are weaker owing to the low metallicity of the star. However, by no means all absorptions can be used for reliable measurement of radial velocities. Table 3 lists about one and a half thousand identifications for both stars and only 940 Vr measurements for HD56126, obtained mostly from lines with minimal blending or from lines with the strongest difference in intensity between the spectra of the two stars.

Conclusions

We use numerous high-resolution observations (R = 25000 and 60000) made with the echelle spectrographs of the 6-m telescope to perform a detailed analysis of the optical spectrum of the post-AGB star HD56126, identified with the IR source IRAS 07134+1005. We identified numerous absorptions of neutral atoms and ions in the wavelength interval from 4010 to 8790 Å and measured their depths and the corresponding radial velocities. We identified absorption bands of the C2, CN, and CH molecules, as well as diffuse interstellar bands (DIBs). In addition to the known variability of the Hα profile, we found variations in the profiles of a number of Fe II and Ba II lines. We produced an atlas of the spectra of HD56126 and its comparison star αPer.

An analysis of the radial velocities determined from all spectra of our collection leads us to conclude that:

– the accuracy of our radial-velocity data for HD56126 allows them to be combined with the most accurate of earlier published measurements;
– we found that the behavior of Vr values differs for lines of different excitation, which form at different depths in the stellar atmosphere.
The half-amplitude of the variations of radial velocities measured from weak absorptions (r → 1) is 2–3 km/s;
– we confirm the stability of the expansion velocity of the circumstellar envelope of HD56126 as measured from C2 and Na I lines;
– we reveal the complex and variable shape of the profiles of strong lines (not only hydrogen lines, but also absorption features of Fe II, Y II, Ba II, and other elements), which form in the expanding atmosphere (wind base) of the star. To study the kinematic state of the atmosphere, one needs measurements of radial velocities for individual features of these profiles;
– we demonstrate the necessity of high and even superhigh spectral resolution for studying stellar and circumstellar lines, respectively, in the spectrum of HD56126.

Acknowledgments

This work is supported by the Russian Foundation for Basic Research (project code 05–07–90087), the Presidium of the Russian Academy of Sciences (program “Origin and Evolution of Stars and Galaxies”), and the Branch of Physical Sciences of the Russian Academy of Sciences (program “Extended Objects in the Universe”). This publication is based on work supported by Award No. RUP1–2687–NA–05 of the U.S. Civilian Research and Development Foundation (CRDF). This work makes use of the SIMBAD, NASA ADS, and VALD astronomical databases.

References

1. T. Blöcker, Astrophys. J. 299, 755 (1995).
2. S. Kwok, Annu. Rev. Astron. & Astrophys. 31, 63 (1993).
3. T. Ueta, M. Meixner, and M. Bobrowsky, Astrophys. J. 528, 861 (2000).
4. I.A. Crawford and M.J. Barlow, MNRAS 311, 370 (2000).
5. V.G. Klochkova, MNRAS 272, 710 (1995).
6. V.G. Klochkova, Bull. Spec. Astrophys. Observ. 44, 5 (1998).
7. L. Decin, H. van Winckel, C. Waelkens, and E. Bakker, Astron. & Astrophys. 332, 928 (1998).
8. V.E. Panchuk, V.G. Klochkova, I.D. Najdenov, Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 135 (1999).
9. V.E. Panchuk, N.E.
Piskunov, V.G. Klochkova, et al., Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 169 (2002).
10. V.G. Klochkova, S.V. Ermakov, V.E. Panchuk, et al., Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 137 (1999).
11. V.E. Panchuk, V.G. Klochkova, I.D. Najdenov, et al., Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 139 (1999).
12. V.E. Panchuk, M.V. Yushkin, I.D. Najdenov, Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 179 (2003).
13. M.V. Yushkin, V.G. Klochkova, Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 206 (2005).
14. G.A. Galazutdinov, Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 192 (1992).
15. R. Oudmaijer and E.J. Bakker, MNRAS 271, 615 (1994).
16. E.J. Bakker, D.L. Lambert, H. van Winckel, et al., Astron. & Astrophys. 336, 263 (1998).
17. H. van Winckel, R. Oudmaijer, and N.R. Trams, Astron. & Astrophys. 312, 553 (1996).
18. D. Barthès, A. Lèbre, D. Gillet, and N. Mauron, Astron. & Astrophys. 359, 168 (2000).
19. E.J. Bakker, L.B.F.M. Waters, H.J.G.L.M. Lamers, et al., Astron. & Astrophys. 310, 893 (1996).
20. E.J. Bakker, E.F. Dishoeck, L.B.F.M. Waters, and T. Schoenmaker, Astron. & Astrophys. 323, 469 (1997).
21. V.G. Klochkova, R. Szczerba, V.E. Panchuk, and K. Volk, Astron. & Astrophys. 345, 905 (1998).
22. V.G. Klochkova, R. Szczerba, V.E. Panchuk, Astron. Lett. 26, 88 (2000).
23. V.G. Klochkova, R. Szczerba, V.E. Panchuk, Astron. Lett. 26, 439 (2000).
24. V.G. Klochkova, E.L. Chentsov, Astron. Rep. 48, 301 (2004).
25. N. Kameswara Rao, A. Goswami, and D.L. Lambert, MNRAS 334, 129 (2002).
26. V.G. Klochkova, V.E. Panchuk, N.S. Tavolganskaya, G. Zhao, Astron. Rep. 83, 265 (2006).
27. S. Trammell, H.L. Dinerstein, and R.W. Goodrich, Astrophys. J. 108, 984 (1994).
28. V. Bujarrabal, J. Alcolea, and P.
Planesas, Astron. & Astrophys. 257, 701 (1992).
29. A. Lèbre, N. Mauron, D. Gillet, and D. Barthès, Astron. & Astrophys. 310, 923 (1996).
30. A. Lèbre, A.B. Fokin, D. Barthès, D. Gillet, and N. Mauron, Astrophys. Space Sci. 265, 105 (2001).
31. A.A. Ferro, PASP 96, 641 (1984).
32. L.B.F.M. Waters, C. Waelkens, M. Mayor, and N.R. Trams, Astron. & Astrophys. 269, 242 (1993).
33. C. Waelkens, H.J.G.L.M. Lamers, L.B.F.M. Waters, et al., Astron. & Astrophys. 242, 433 (1991).
34. H. van Winckel, C. Waelkens, and L.B.F.M. Waters, Astron. & Astrophys. 293, L25 (1995).
35. R.D. McClure, MNRAS 96, 117 (1984).
36. E.J. Bakker, H.J.G.L.M. Lamers, L.B.F.M. Waters, et al., Astron. & Astrophys. 307, 869 (1996).
37. R.L. Kurucz, I. Furenlid, and J.T.L. Brault, Nat. Solar Observ. Atlas, New Mexico: National Solar Observatory (1984).
38. L. Wallace, K. Hinkle, and W. Livingston, Nat. Solar Obs. Techn. Rep. No. 00–001, Tucson (2000).
39. A.K. Pierce and J.B. Breckinridge, Contr. Kitt Peak Nat. Obs. No. 559 (1973).
40. J.W. Swensson, W.S. Benedict, L. Delbouille, and G. Roland, “The solar spectrum from λ 7498 to λ 12016. A table of measures and identifications”, Mem. Soc. Roy. Sci. Liege, Vol. hors ser. No. 5 (1970).
41. C.E. Moore, Contrib. Princeton University Observ. No. 20, part I (1945).
42. C.E. Moore, Contrib. Princeton University Observ. No. 20, part II (1945).
43. N.E. Piskunov, F. Kupka, and T.A. Ryabchikova, A&AS 112, 525 (1995).

Table 3. List of lines identified in the spectra of HD56126 and αPer. Columns 3 and 5 list the central residual intensities of the lines (the continuum level is set to 1), and columns 4 and 6, the heliocentric velocities Vr.
αPer HD56126 Element λ Å r Vr r Vr TiII(11) 4012.39 0.13: −2.2 0.05 81.7 FeI(557) 4013.64 FeI(485) 4013.82 0.53: ScII(8) 4014.53 0.32: −3.5 0.49 CeII(157) 4014.90 0.56 NiII(12) 4015.47 0.50: 0.64 84.8: CeII(256) 4015.88 0.74: 83.0: FeI(560) 4016.42 0.74: −1.0 FeI(279) 4017.10 FeI(527) 4017.15 0.50: −1.1 0.85: NiI(171) 4017.47 0.7: 0.9: MnI(5) 4018.10 0.47: FeI(560) 4018.27 ZrII(54) 4018.38 0.53 82.5 NdII(19) 4018.83 0.89: VII(201) 4019.05 0.86: −3.0: FeI(556) 4020.07 NdII(19) 4020.87 0.80 82.4: CoI(16) 4020.90 0.86: −2.3 NdII(36) 4021.33 0.79 82.0: FeI(120) 4021.61 FeI(278) 4021.87 0.45: −3.9 FeI(654) 4022.74 NdII 4023.01 0.75: 81.2: VII(32) 4023.38 0.47: −2.5: 0.56 82.5 FeI(277) 4024.10 ZrII(54) 4024.45 0.33: 85.1: TiI(12) 4024.58 0.37: −1.5: TiII(11) 4025.13 0.36: −2.3 0.42: 83.5: TiII(87) 4028.34 0.29: −1.7 0.28: 84.8: FeI(556) 4029.63 0.43: −0.6: ZrII(41) 4029.68 0.28 83.4: NdII(32) 4030.47 0.60: FeI(560) 4030.50 MnI(2) 4030.76 0.18: 0.63 81.0: CeII(108) 4031.34 0.52 FeII(151) 4031.44 LaII(40) 4031.68 0.42 84.6: MnI 4031.78 0.46: −2.0 FeI(44) 4032.63 0.52: FeII(126) 4032.95 0.63 MnI(2) 4033.06 0.26: −3.3: ZrII(42) 4034.08 0.52: 83.7 MnI(2) 4034.48 0.39: −2.6: Klochkova et al.: Spectroscopy of HD56126 13 αPer HD56126 Element λ Å r Vr r Vr ZrII(70) 4034.84 0.85: 83.1: VII(32) 4035.60 0.35: 0.61 85.1 MnI(5) 4035.72 VII(9) 4036.76 0.68: −2.0 0.81 82.2: GdII(49) 4037.39 0.90: 0.83 84.1: CeII(218) 4037.67 0.81: 82.7: GdII(49) 4037.90 0.85: 0.81: NdII(31) 4038.12 0.83: VII(32) 4039.56 0.82: −2.0 0.89: FeI 4040.09 0.76: ZrII(54) 4040.24 0.58 83.5 FeI(655) 4040.64 0.52: CeII(138) 4040.76 0.52 82.6: NdII(30) 4040.80 MnI(5) 4041.35 0.48: −2.5 0.84: CeII(140) 4042.58 0.60: SmII(4) 4042.72 0.59 SmII(9) 4042.90 0.53 83.1: FeI(276) 4043.90 0.56 FeII(172) 4044.01 0.89 81.2: FeI(359) 4044.61 0.60 −3.3: GdII(49) 4045.15 0.72: ZrII(30) 4045.63 FeI(43) 4045.81 0.12 −4.6 0.22: FeI(557) 4046.06 VII(177) 4046.27 CeII(81) 4046.34 0.72 80.5 ZrII(43) 4048.68 0.29 84.2 MnI(5) 4048.75 
0.39 −4.3 ZrII(43) 4050.32 0.66 0.34 82.6: VII(32) 4051.04 NdII(66) 4051.15 0.69 0.75 FeI(700) 4051.31 CrII(19) 4051.97 0.54 0.64: TiII(87) 4053.82 0.33 0.39 83.2 FeI(698) 4054.82 0.51 CeII(82) 4054.99 0.70 81.2: MnI(5) 4055.54 0.68 −3.1 TiII(11) 4056.19 0.60 −0.6 0.79 84.7 FeII(212) 4057.46 0.78 85: MgI(16) 4057.50 0.48 −4.9 CoI(16) 4058.22 0.75 −4.2: 0.89 FeI(120) 4058.76 0.66 0.91 MnI(5) 4058.92 FeI(767) 4059.79 0.82 −3.3: NdII(63) 4059.96 0.88 NdII(10) 4061.08 0.75 −2.4 0.66 81.0: FeII(189) 4061.79 0.84: CeII(34) 4062.23 0.79 82.3 14 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeI(359) 4062.44 0.56 −1.0 FeI(43) 4063.59 0.19 −4.0: 0.40 82.9 TiII(106) 4064.37 0.67 −1.5: 0.82 81.1: VII(215) 4065.09 0.74 0.78 FeI(698) 4065.38 NiII(11) 4067.03 0.38 0.59 83.8 CeII(22) 4067.28 0.64 86: FeI(559) 4067.98 0.54 −3.1 0.87 83.0 CeII(82) 4068.84 0.75 81.8 FeI(558) 4070.78 0.60 ZrII(54) 4071.09 0.59 82.3: FeI(43) 4071.74 0.23 −3.8 0.46 82.8 CrII(26) 4072.56 0.68 −3.1 0.77 85.5: CeII(109) 4072.92 0.78 83.8: CeII(4) 4073.48 0.61 83.4: FeI(558) 4073.76 0.64 GdII(44) 4073.76 0.70: FeI(524) 4074.79 0.61 −2.8 0.95: NdII(62) 4075.12 0.85: CeII(57) 4075.70 SmII(51) 4075.85 0.53 −2.3: 0.51 FeI(558) 4076.63 LaII(11) 4076.71 0.37 −3.3: 0.74: ZrII(54) 4077.05 0.53: CrII(19) 4077.50 SrII(1) 4077.71 0.09 −2.8 0.10 DyII 4077.96 CeII(19) 4078.32 0.67: FeI(217) 4078.35 0.56: −0.7: MnI(5) 4079.3: 0.58 0.95: FeI(359) 4079.84 0.69 −2.1: FeI(558) 4080.21 0.74 CeII(44) 4080.44 0.72 82.4: CrII(165) 4081.21 0.80: 0.74 82.4: CrII(165) 4082.29 0.77 CeII(60) 4083.23 0.61 82.2 MnI(5) 4083.63 0.52 −1.8: FeI(698) 4084.49 0.62 −1.9: 0.88 CeII(172) 4085.23 0.51 0.70 85.6 FeI(559) 4085.30 VII(214) 4085.67 0.61 ZrII(54) 4085.68 0.61 84.8: CrII(26) 4086.13 0.70 −3.3: 0.85 84.0: LaII(10) 4086.71 0.62 −2.1: 0.47 83.0 FeII(28) 4087.28 0.65 0.72 84.6: FeII(39) 4088.75 0.74 −2.9: CrII(19) 4088.90 0.76 ZrII(29) 4090.52 0.70 −3.7 0.43 82.4 CoI(29) 4092.39 0.62: Klochkova et al.: 
Spectroscopy of HD56126 15 αPer HD56126 Element λ Å r Vr r Vr HfII(6) 4093.16 0.84: −3.2: 0.65: 82.5 CeII(160) 4093.96 0.83: 82.0: CaI(25) 4094.93 −1.1: FeI(217) 4095.98 ZrII(15) 4096.63 0.50 83.6 FeI(558) 4097.08 −3.2: Hδ 4101.74 0.08 −2.0 0.06 96.0: SiI(2) 4102.94 −2.9: DyII 4103.31 81.2: FeI(356) 4104.12 −1.6: CeII(156) 4105.00 0.67: CeII(160) 4106.13 0.79: FeI(217) 4106.26 0.64: CeII(139) 4106.88 0.78: SmII(50) 4107.39 0.63: 85.1: FeI(354) 4107.49 0.47 −3.4 CaI(39) 4108.53 0.76: FeI(558) 4109.06 0.67: NdII(17) 4109.07 0.72: 83.3 NdII(10) 4109.46 0.61: 82.0 MgII(21) 4109.54 FeI(357) 4109.80 0.51: ZrII(30) 4110.05 0.62: 82.4 CeII(29) 4110.39 0.74: 83.6: CrII(18) 4110.99 0.52 −3.5 0.64 83.0: CeII 4111.39 0.75: FeII(188) 4111.90 0.75: 0.86 82.3 FeI(695) 4112.32 0.80: CrII(18) 4112.55 0.91 82.5: FeI(1103) 4112.96 0.62: CrII(18) 4113.22 0.86 83.6 CeII(137) 4113.73 0.86 0.80 81.8: FeI(357) 4114.45 0.69 −2.7 0.95: 82.5: KII(2) 4114.99 CeII(22) 4115.37 0.79 81.6 CrII(181) 4116.66 0.86 CeII(35) 4117.01 0.79 82.4: CeII(77) 4117.29 0.83: 82: FeI(700) 4117.85 CeII(11) 4118.15 0.83: 0.69 81.7 FeI(801) 4118.54 0.38 SmII(51) 4118.55 0.72 CeII(89) 4119.01 0.83 80.7: FeII(21) 4119.51 0.65: −1.6 0.77: CeII(22) 4119.79 0.64 85: FeI(423) 4120.21 0.72: CeII(112) 4120.83 0.87 0.71 81.0 CoI(28) 4121.32 0.67: −1.6: 0.95: FeI(356) 4121.81 0.70: −2.3 0.92 16 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeII(28) 4122.66 0.37 −4.6 0.50 82.5 LaII(41) 4123.23 0.59 0.46 83.8: FeI(217) 4123.75 0.62 CeII(60) 4123.87 0.62 81.5 FeII(22) 4124.78 YII(14) 4124.91 0.52 0.46 FeI(1103) 4125.62 FeI(354) 4125.88 0.71: 0.95: FeI(695) 4126.18 0.69: −3.8: 0.97 85: CeII(4) 4127.37 0.65 82.4 FeI(357) 4127.61 FeI(558) 4127.80 0.42: 0.68: SiII(8) 4128.07 0.41: −1.5: 0.50 84.2 FeII(27) 4128.74 0.49 0.63 82.7 CeII(227) 4129.18 0.61: 0.75 EuII(1) 4129.72 0.60 0.85 81.1: BaII(4) 4130.65 0.44 CeII(209) 4130.71 0.56 FeI(43) 4132.06 0.26 −1.5 0.58 83.4: FeI(357) 4132.90 0.60 −4.5: 
CeII(4) 4133.80 0.61 −2.0 0.58 82.0 FeI(357) 4134.68 0.46 0.83 83.6 NdII 4135.33 0.83 0.73 CeII(188) 4135.44 FeI(726) 4137.00 0.61 −3.0 CeII(2) 4137.65 0.67 0.57: 80.7: FeII(150) 4138.21 0.78: FeII(39) 4138.40 0.70 0.73: FeI(18) 4139.93 0.82 −4.1 FeI(695) 4140.41 0.86 −2.1 LaII(40) 4141.73 0.72 81.2 HfII(87) 4141.84 0.77: CeII(10) 4142.40 0.68 0.69 81.8 VII(226) 4142.90 0.79 FeI(523) 4143.42 0.41: 0.78 82.7 FeI(43) 4143.87 0.29 −2.7 0.59 82.1 CeII(3) 4144.49 0.86 −1.8: 0.76 83.6 CeII(9) 4145.00 0.80 0.70 81.7 CrII(162) 4145.76 0.82 CeII(203) 4146.23 0.68 82.2 FeI(42) 4147.67 0.56 −3.7: 0.93 84.0: ZrII(41) 4149.20 0.37 0.37 FeI(694) 4149.36 CeII 4149.94 0.72 0.61 81.0: FeI(695) 4150.25 0.76 ZrII(42) 4150.97 0.67 −3.2 0.43 82.8 CeII(2) 4151.97 0.55 82.1 FeI(18) 4152.17 0.45 FeI(695) 4153.90 0.49 −0.2: 0.85 82.3 Klochkova et al.: Spectroscopy of HD56126 17 αPer HD56126 Element λ Å r Vr r Vr FeI(355) 4154.50 0.43 0.88 83.2 FeI(694) 4154.81 0.88 83.5 ZrII(29) 4156.25 0.43 −4.2: 0.38 80.9: FeI(354) 4156.80 0.46 FeI(695) 4157.78 0.58 −3.2 FeI(695) 4158.80 HfII(41) 4158.90 0.64 CeII(246) 4159.03 0.80 80.6: ZrII(42) 4161.21 0.35 84.2: TiII(21) 4161.52 0.35 0.41 81.2: SrII(3) 4161.80 0.49 TiII(105) 4163.64 0.33 −2.4 0.38 82.6 CeII(10) 4165.59 0.72 0.64 83.7 BaII(4) 4166.00 0.75 82.0 MgI(15) 4167.27 0.53 −2.1 0.79 83.2 CeII(29) 4167.80 0.74 0.85 82.7 CeII(173) 4169.88 0.76 0.71 FeI(482) 4170.91 0.52 TiII(105) 4171.90 0.41 84.3 FeI(650) 4171.91 0.28 FeI(689) 4172.64 0.85 81.8: FeI(19) 4172.75 0.46 FeII(27) 4173.46 0.22 −1.2 0.34 84.4 TiII(105) 4174.07 0.49 −2.2 0.74 84.2: FeI(19) 4174.91 0.63 −2.9 FeI(354) 4175.64 0.53 −2.0 0.78 83.2 FeI(689) 4176.57 0.60 −0.6 0.82 FeI(18) 4177.59 0.22 FeII(21) 4177.68 0.29: 83.0: FeII(28) 4178.85 0.31 −1.5 0.38 84.5 CrII(26) 4179.43 0.49 0.60 83.9: ZrII(99) 4179.81 0.49 82.4 FeI(354) 4181.75 0.38 −1.0: 0.78 83.4 FeI(476) 4182.39 0.71: −2.1 VII(37) 4183.45 0.60 0.82 81.1: TiII(21) 4184.31 0.49 0.62 82.0: FeI(355) 4184.89 0.59 −1.8 0.87 85.1: 
CeII(124) 4185.33 0.93: 0.84 82.0 CeII(1) 4186.61 0.43 85.8: FeI(152) 4187.04 0.41 −2.4 0.71 82.8: FeII(152) 4187.80 0.36 −4.5 0.69 82.9 FeI(1116) 4188.73 0.68 −1.4 0.96 83.6 PrII(8) 4189.52 0.90: 83.1 FeI(940) 4189.56 0.84 TiII(21) 4190.29 0.67 −2.4 0.89 82.3 VII(25) 4190.40 CeII(169) 4190.63 0.87 80.8: GdII(34) 4191.07 83.2: FeI(152) 4191.43 0.38 18 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr ZrII(108) 4191.50 0.57 LaII(78) 4192.35 0.87: CeII(79) 4193.10 0.87 0.72 84.0: CeII(85) 4193.87 0.89 0.84 81.5 FeI(693) 4195.33 0.50 0.87 83.7: CrII(161) 4195.41 FeI(693) 4196.21 0.58 LaII(41) 4196.55 0.60: CeII(136) 4197.67 0.81 CeII(209) 4198.00 0.79 FeI(152) 4198.30 0.30 0.68 CeII(207) 4198.43 CeII(7) 4198.67 0.64 81.1: FeI(522) 4199.09 0.40 FeII(141) 4199.09 NdII(15) 4199.10 YII(5) 4199.27 0.60 FeI(3) 4199.99 0.85 FeI(689) 4200.93 0.73 −2.6: FeI(42) 4202.03 0.28 0.58 82.9 VII(25) 4202.34 0.80 82.6 CeII(186) 4202.94 0.78 0.71 82.6 LaII(53) 4204.03 0.52 −2.5: 0.80 81.9: YII(1) 4204.75 0.50 80.8 VII(37) 4205.07 0.41 −3.0 0.65 MnII(2) 4205.39 0.51 0.79: ZrII(133) 4205.91 0.84 81.1 HfII(74) 4206.59 0.85 83.7: FeI(3) 4206.70 0.67 −1.2 MnII(2) 4207.23 0.66 −3.2: CrII(26) 4207.35 0.92 83.1 FeI(689) 4208.61 0.66: ZrII(41) 4208.98 0.56 −2.3 CrII(162) 4209.02 0.42 82.6: CeII(3) 4209.41 VII(25) 4209.74 0.73 0.88 83.5: FeI(152) 4210.35 0.47 −1.0 ZrII(97) 4210.62 0.64 82.3 ZrII(15) 4211.89 0.57 −2.6 0.39 83.5 CeII(169) 4213.04 0.88 FeI(355) 4213.65 0.71 −2.5 0.95: CeII(203) 4214.03 0.84 82.5 FeI(274) 4215.43 SrII(1) 4215.52 0.15 −0.6 0.16 96.9 FeI(3) 4216.18 0.51 −2.9 CdII(49) 4217.20 0.88 82.5: FeI(693) 4217.55 0.61 −2.6 0.75 83.3 FeI(800) 4219.36 0.51 −2.1 0.85 83.7 CaII(16) 4220.13 0.88: Klochkova et al.: Spectroscopy of HD56126 19 αPer HD56126 Element λ Å r Vr r Vr NdII(32) 4220.26 0.68 FeI(482) 4220.35 FeII(152) 4222.21 0.48 ZrII(80) 4222.41 CeII(36) 4222.60 0.52: FeI(689) 4224.17 ZrII(29) 4224.28 0.50: 0.66 81.0 FeI(689) 4224.52 CrII(162) 4224.85 
0.83 83.6 VII(37) 4225.22 PrII(8) 4225.33 0.73: FeI(693) 4225.45 0.44 FeI(521) 4225.95 0.74: CaI(2) 4226.72 0.21 −3.6 0.43 80.7: FeI(693) 4227.43 0.32 −4.7 0.59: NdII(19) 4227.72 0.73: 82.6 NdII(36) 4228.20 CI 4228.32 0.86 0.77 81.9 SmII(4) 4229.70 0.76 0.92: FeI(41) 4229.76 LaII(83) 4230.95 0.92 83.2: NiI(136) 4231.03 0.91 ZrII(99) 4231.64 0.52 83.5 HfII(72) 4232.43 0.86: 0.78 82.2 FeII(27) 4233.17 0.23 0.34 86.5 FeI(152) 4233.60 0.40: NdII(20) 4234.20 0.83 81.9 VII(24) 4234.22 0.83 MnI(23) 4235.14 MnI(23) 4235.29 0.73 VII(5) 4235.74 0.46 FeI(152) 4235.94 0.31 ZrII(110) 4236.54 0.84: 0.64 83.0 LaII(41) 4238.38 0.63 82.3 FeI(693) 4238.81 0.52 −2.6 0.79 81.6 FeI(18) 4239.85 0.51 −3.8 0.67 CeII(2) 4239.91 FeI(764) 4240.38 CaI(38) 4240.45 0.77 CrII(31) 4242.37 0.47 −1.2 0.53 83.3 NiII(9) 4244.80 0.84 0.88 81.0: FeI(352) 4245.26 0.61 0.94 FeI(691) 4245.35 HfII(72) 4245.84 0.72 CeII(158) 4245.98 FeI(906) 4246.09 0.69 ScII(7) 4246.83 0.25 −2.1 0.39 88.7: NdII(14) 4246.88 FeI(693) 4247.42 0.47 −2.7 20 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeI(482) 4248.23 0.72 CeII(1) 4248.68 0.82 −2.8: 0.69 81.8 FeI(152) 4250.12 0.41 −1.7 0.69 82.6 FeI(42) 4250.79 0.33 −2.2 0.58 81.8 GdII(15) 4251.74 0.84 0.80 82.1 CrII(31) 4252.63 0.64 −3.0 0.74 82.4 CeII(77) 4253.36 0.81 82.5 CrI(1) 4254.34 0.35 −1.3 0.68 82.8 CeII(81) 4255.78 0.76 83.0 CeII(172) 4256.16 0.83: NdII(59) 4256.24 0.81 −3.1 0.84: CeII(123) 4257.12 0.92 81.0: ZrII(15) 4258.05 FeII(28) 4258.15 0.34 −0.5 0.36 FeI(3) 4258.32 FeI(476) 4260.13 FeI(152) 4260.47 0.31 −1.5 0.57 83.2 CeII(19) 4261.16 0.95 81.2 CrII(31) 4261.92 0.49 −2.2 0.55 83.5 SmII(37) 4262.68 0.94 81.5: CeII(254) 4263.43 0.78 FeI(692) 4264.21 0.84 −2.3 CeII(239) 4264.37 0.91 81.2 FeI(993) 4264.74 0.90 YII(71) 4264.88 0.78 ZrII(98) 4264.92 FeI(993) 4265.26 0.87 MnI(23) 4265.92 0.91 −2.5 ZrII(8) 4266.72 0.86 82.9 FeI(273) 4266.97 0.78 −3.8 FeI(482) 4267.83 0.73 −1.9 0.93 84: CrII(192) 4268.93 0.75: CI(16) 4269.02 0.68 
CrII(31) 4269.29 0.65 0.69: CeII(204) 4270.19 0.90 −2.8 0.77 81.5 CeII(21) 4270.72 0.82 81.0: FeI(152) 4271.16 0.40 −2.7 0.70 82.0 FeI(42) 4271.76 0.25 −1.3 0.44 82.8 FeII(27) 4273.32 0.44 0.43: ZrII(28) 4273.52 FeI(478) 4273.88 0.90: CrI(1) 4274.79 0.37 −2.1 0.72 82.7 CrII(31) 4275.56 0.54 −1.7 0.57 83.6 FeI(597) 4276.68 0.87 ZrII(40) 4277.37 0.84 0.70 82.4 FeII(32) 4278.16 0.59 −1.6 0.71 82.8 VII(225) 4278.89 0.90 0.83 82.4: SmII(27) 4279.68 0.92 82.7: FeI(351) 4279.87 0.82 Klochkova et al.: Spectroscopy of HD56126 21 αPer HD56126 Element λ Å r Vr r Vr CeII(225) 4280.14 0.88 82.2: GdII(15) 4280.48 0.82 0.89 82.4 SmII(46) 4280.79 0.84: SmII 4281.01 CrII(17) 4281.03 0.79 MnI(23) 4281.10 ZrII(182) 4282.21 0.58 FeI(71) 4282.40 0.41 −1.7 CaI(5) 4283.01 0.58 −2.3 0.92 81.7 CrII(31) 4284.20 0.60 −1.2 0.69 83.1 MnII(6) 4284.43 0.82: 83.0: NdII(10) 4284.51 CeII(11) 4285.37 0.82 82.8 FeI(597) 4285.44 0.71 −2.3 TiI(44) 4286.01 0.84 FeI(414) 4286.47 0.84 −1.9 ZrII(69) 4286.51 0.63 82.6 LaII(75) 4286.97 0.76 −3.3 0.75 81.7 TiII(20) 4287.88 0.36 0.1 0.51 82.9 CeII(135) 4289.45 0.79 82.0: CrI(1) 4289.72 0.33: TiII(41) 4290.21 0.24 −2.0: 0.36 85.0: TiI(44) 4290.94 0.79 −3.5: FeI(3) 4291.46 0.76 −1.4 MnII(6) 4292.25 0.82 CeII(205) 4292.77 0.88 83.0: ZrII(110) 4293.14 0.91 0.63 82.0 TiII(20) 4294.10 0.28 −2.2 0.37 83.9 FeI(41) 4294.12 ScII(15) 4294.78 0.55 −2.3 0.73 82.5 LaII(53) 4296.05 0.73 0.65 83.1: FeII(28) 4296.57 0.35 −0.4 0.37 CeII(2) 4296.68 PrII(7) 4297.76 0.92 84.3: FeII(520) 4298.04 0.78 FeI(152) 4299.23 0.31 0.60 CeII(47) 4299.36 TiII(41) 4300.05 0.24 −0.7 0.31 93.0: TiI(44) 4301.09 0.74: ZrII(109) 4301.81 TiII(41) 4301.92 0.31 −0.8 0.40 81.5 CaI(5) 4302.53 0.46 −3.0: 0.79 83.2 FeII(27) 4303.17 0.34 −1.7 0.41 84.2: NdII(10) 4303.59 FeI(414) 4304.55 0.86 −3.4 FeI(476) 4305.45 0.49 83.5: ScII(15) 4305.71 0.37 CeII(1) 4306.72 0.81 0.75 82.3 CaI(5) 4307.75 22 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr TiII(41) 4307.89 0.21 −3.7: 0.40 82.4 
FeI(42) 4307.90 ZrII(88) 4308.94 0.70 82.4 FeI(849) 4309.03 0.72: YII(5) 4309.63 0.38 0.40 84.8: CeII(126) 4309.74 CeII(133) 4310.70 0.91 82.1 ZrII(99) 4312.23 0.84 82.9 TiII(41) 4312.86 0.33 −2.3 0.41 83.5 ScII(15) 4314.09 0.22 0.36: FeII(32) 4314.30 TiII(41) 4314.98 0.25 −1.1 0.43 83.8 GdII(43) 4316.05 0.97: 0.95 82.0: TiII(94) 4316.80 0.58 −1.6 0.69 82.7 ZrII(40) 4317.32 0.82: 0.56 81.9 CaI(5) 4318.65 0.56 −0.1 0.90 83.4: ScII(15) 4320.74 0.22 0.38 TiII(41) 4320.95 FeI 4321.79 0.87 −1.9 LaII(25) 4322.50 0.88 −0.9 0.75 82.7 ZrII(141) 4323.62 0.91 81.0 FeI(70) 4325.00 ScII(15) 4325.01 0.33 −2.6 0.46 81.9: FeI(42) 4325.76 0.25 −2.4 0.43 81.4: BaII(7) 4326.74 0.73 0.90 MnII(6) 4326.76 FeI(761) 4327.10 GdII(15) 4327.13 0.75 0.92 82.5 FeI(597) 4327.91 0.83 −1.6 0.89 83.0 SmII(15) 4329.03 0.89 −2.8 0.92 TiII(94) 4330.24 0.51 0.64 CeII(82) 4330.45 TiII(41) 4330.70 0.44 −3.5 0.59 81.8: NiI(52) 4331.65 0.77 −2.7 VII(23) 4331.79 0.87 ZrII(132) 4333.28 0.59: 82.1 LaII(24) 4333.76 0.59 −0.7 0.53: 82.4 LaII(77) 4334.96 0.80: 81.7: CaII(89) 4336.26 0.69: FeI(41) 4337.05 0.36: TiII(94) 4337.33 CeII(82) 4337.78 0.38: TiII(20) 4337.92 0.23: −3.7: 0.33 NdII(68) 4338.70 0.49: 0.52 82.5: Hγ 4340.47 0.09 −2.1 0.08 97.0: TiII(32) 4341.36 0.30: FeI(645) 4343.26 0.61: TiII(20) 4344.29 0.30: 0.47: 82.1 CeII(251) 4345.96 0.80: Klochkova et al.: Spectroscopy of HD56126 23 αPer HD56126 Element λ Å r Vr r Vr FeI(598) 4346.56 0.74: FeI(828) 4347.84 0.78 FeI(414) 4348.94 0.85: −1.8: CeII(59) 4349.79 0.85 0.78 81.2 VII(36) 4349.97 TiII(94) 4350.84 0.52 0.71 83.2 FeII(27) 4351.77 0.23 −1.1 0.36: 85.6: MgI(14) 4351.91 FeI(71) 4352.73 0.53 −2.5 CeII(220) 4352.73 0.76 FeII(213) 4354.36 LaII(58) 4354.40 0.70 83.3: ScII(14) 4354.61 0.58 −5.7: CaI(37) 4355.19 0.81 −3.3: FeII 4357.57 0.90 83.8: NdII(10) 4358.17 0.80 81.6 NdII(57) 4358.70 YII(5) 4358.72 0.58 −5.6 0.48 83.7 NiI(86) 4359.63 GdII(67) 4359.64 0.57 ZrII(79) 4359.74 0.43 82.8 FeII 4361.25 0.93 −1.7 0.92: CeII(157) 4361.66 0.92: SmII(45) 
4362.04 NiII(9) 4362.09 0.80 −2.0 0.86 82.5 LaII(133) 4363.05 0.93 −1.9 0.95: MoII(3) 4363.64 0.92: 81.5: GdII(33) 4364.14 0.91: 81.9: YII(70) 4364.17 0.96 CeII(135) 4364.66 0.88 −1.9 0.74 81.4 LaII(53) 4344.66 FeI(415) 4365.90 0.92 FeI(414) 4367.58 TiII(104) 4367.66 0.41 −0.6: 0.56 82.6 FeI(41) 4367.91 CeII(227) 4368.23 0.82: 0.79 83.0: FeII 4368.26 NdII(11) 4368.63 0.93: 0.91: 82.3: FeII(28) 4369.40 0.53 0.67 83.1 FeI(518) 4369.77 0.66: 0.90: ZrII(79) 4370.96 0.73: 0.48 81.9 CI(14) 4371.37 0.69: 0.70 83.2: FeII(33) 4372.22 0.92 0.92 FeI(214) 4373.57 0.83 CeII(202) 4373.82 0.83 81.5: ScII(14) 4374.46 0.30 0.43 TiII(93) 4374.82 0.28 0.44: 81.6: YII(13) 4374.94 0.29: 94.7: NdII(8) 4375.04 24 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeI(2) 4375.93 0.49 −2.1 0.78 81.1 FeI(471) 4376.78 0.91 −1.5 MoII(3) 4377.77 0.94: −3.1: 0.92 82.3: LaII(77) 4378.10 0.93 VI(22) 4379.23 0.82 −1.2: ZrII(88) 4379.78 0.70 −3.3 CeII(155) 4380.06 0.43 83.0 CdII(68) 4380.64 0.91 0.98: CeII(2) 4382.17 0.87 −2.9 0.73: 81.2: FeI(799) 4382.78 0.82 −1.7: ZrII(97) 4383.10 0.85: FeI(41) 4383.54 0.25 −2.2 0.40 82.6 FeII(32) 4384.32 0.50 0.61 83.9 ScII(14) 4384.81 0.46 0.74 FeII(27) 4385.38 0.36 −2.2 0.44 84.0 NdII(50) 4385.66 CeII(57) 4386.84 TiII(104) 4386.85 0.51 0.57 81.0: FeI(476) 4387.90 0.79 CeII(5) 4388.01 0.87 81.0: FeI(830) 4388.41 0.72 −2.2 ZrII(140) 4388.50 0.90 80.0: FeI(2) 4389.25 0.90 −3.1 VI(22) 4389.99 0.87 −2.0: MgII(10) 4390.56 0.76 FeI(414) 4390.96 TiII(61) 4391.02 0.48 0.67 82.2 CeII(81) 4391.66 0.76 0.66 82.1 TiII(51) 4394.04 0.44 −2.6 0.54 83.2 TiII(19) 4395.03 0.25 −1.5 0.36 87.7 TiII(61) 4395.84 0.48 −2.5 0.58 83.0 YII(5) 4398.02 0.49 0.44 84.1 TiII(61) 4398.29 CeII(81) 4398.79 0.89 80.4: CeII(81) 4399.20 0.75 82.4 TiII(51) 4399.77 0.34 −2.3 0.44 83.1 ScII(14) 4400.40 0.36 0.50 83.3 NdII(10) 4400.83 0.80 FeII(828) 4401.29 0.55 ZrII(68) 4401.35 0.84 83.6: ZrII(79) 4403.35 0.76 0.60 81.9 FeI(41) 4404.75 0.28 −1.6 0.49 82.5 VI(22) 4406.65 0.90 
−3.3 GdII(103) 4406.67 0.94 81.9: CeII(64) 4407.28 0.90 81.1: FeI(68) 4407.70 0.56 −2.6 0.79 81.5 FeI(68) 4408.42 0.58 −2.2 0.92 83.0: PrII(4) 4408.84 0.85 82.0 TiII(61) 4409.24 0.50 0.77: Klochkova et al.: Spectroscopy of HD56126 25 αPer HD56126 Element λ Å r Vr r Vr TiII(61) 4409.52 0.75: NiI(88) 4410.52 CeII(33) 4410.64 0.79: TiII(115) 4411.08 0.58 −2.8 0.62 82.3 TiII(61) 4411.93 0.64 −1.1 0.78 84.2: NdII(9) 4412.27 0.93 FeII(32) 4413.59 0.69 0.3: 0.79 84.5: PrII(26) 4413.77 ZrII(79) 4414.54 0.82: 0.55 82.5 FeI(41) 4415.12 0.29 −0.7: 0.57 83.4 ScII(14) 4415.56 0.38: 0.54 82.3 FeII(27) 4416.82 0.38 −2.0 0.43 83.4 TiII(40) 4417.72 0.34 −2.2 0.41 82.9 TiII(51) 4418.34 0.48 −2.1 0.59 82.2 CeII(2) 4418.78 0.85: 0.70 81.6 SmII(32) 4420.53 0.87 ScII(14) 4420.67 0.79 SmII(37) 4421.13 0.94 0.88 82.3 TiII(93) 4421.94 0.58 −1.9 0.68 82.9 FeI(350) 4422.57 0.55 −1.4 YII(5) 4422.59 0.46 82.7 FeI(412) 4423.14 0.76 TiII(61) 4423.22 0.92 82.9 CeII(21) 4423.68 0.89 81.2 FeI(830) 4423.84 0.88 SmII(45) 4424.34 0.82 −4.2: 0.83 81.4 CaI(4) 4425.44 0.62 −1.5 0.93 81.3 FeI(2) 4427.31 0.46 0.87: 81.7: TiII(61) 4427.90 0.82 −2.2: 0.79 82.5 CeII(19) 4429.27 0.86 −3.1 0.73 83.0 LaII(38) 4429.92 0.68 0.60 81.3 FeI(68) 4430.62 0.63 −2.9 0.95 81.3: ScII(14) 4431.37 0.75 −2.7 0.92 81.3 TiII(51) 4432.10 0.79 0.94 83.6 LaII(11) 4432.95 0.94 FeI(830) 4433.22 0.77 −2.0 GdII(82) 4433.64 0.93 SmII(41) 4433.89 0.76 0.84 84.5: SmII(36) 4434.32 0.86 80.7 CaI(4) 4434.96 0.49 0.0 0.89 82.2 FeI(2) 4435.15 EuII(4) 4435.58 0.89 83.9: CaI(4) 4435.68 0.57 GdII(117) 4436.23 0.89 0.95 FeI(516) 4436.93 0.88 −1.4 CeII(169) 4437.61 0.95: −1.0: 0.93 81.7 GdII(67) 4438.13 0.97: GdII(44) 4438.27 FeI(828) 4438.35 0.91 −2.7 26 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeII(32) 4439.16 0.95 −3.8 0.95 83.0: FeI 4439.89 0.95 ZrII(79) 4440.45 0.81 −1.3: 0.57 83.4 CeII(238) 4440.88 0.89 0.89: TiII(40) 4441.73 0.55 −2.0 0.69 82.8 FeI(68) 4442.34 0.52 −0.9 ZrII(53) 4442.50 0.72 80.8: 
ZrII(88) 4443.00 0.53 0.43 83.1 FeI(350) 4443.20 TiII(19) 4443.80 0.29 −1.5 0.42 86.8 LaII(133) 4443.94 TiII(31) 4444.56 0.46 −2.9 0.56 83.3 ZrII(96) 4445.88 0.87 84.8 NdII(49) 4446.39 0.86 −2.3: 0.84 80.3 FeI(68) 4447.72 0.56 −1.6 0.89 83.1 CeII(202) 4449.33 0.81 0.74 82.4 FeII(222) 4449.66 0.88 0.85 82.0: TiII(19) 4450.48 0.34 −1.4 0.47 84.7 FeII 4451.55 0.73 82.9 MnI(22) 4451.59 0.71 −2.9: NdII(6) 4451.98 0.93: SmII(26) 4452.73 0.91 0.91 81.0: TiI(113) 4453.32 0.87 VII(199) 4453.35 FeII(350) 4454.39 SmII(49) 4454.63 CaI(4) 4454.78 0.41 −1.5 0.52 84.4 FeII 4455.26 0.89 LaII(53) 4455.79 0.87 CaI(4) 4455.89 0.63 −3.2 NdII(50) 4456.39 0.90 82.9 CaI(4) 4456.62 0.78 0.91 83.8: ZrII(79) 4457.42 0.62 82.4 TiI(113) 4457.44 0.75 FeI(992) 4458.08 MnI(28) 4458.25 0.78 SmII(7) 4458.52 0.93 81.4 FeI(68) 4459.12 0.46 −1.8 0.85 82.1 CeII(2) 4460.21 0.77 −0.7 0.67 82.0 ZrII(67) 4461.22 0.52 82.9: FeI(2) 4461.65 0.43 FeI(471) 4462.00 NdII(54) 4462.41 0.85 NdII(50) 4462.98 0.88 0.82 82.1 CeII(20) 4463.41 0.83 81.2 TiII(40) 4464.45 0.39 0.52 83.4: MnI(22) 4464.68 HfII(72) 4466.41 CI 4466.48 0.76 84.3 Klochkova et al.: Spectroscopy of HD56126 27 αPer HD56126 Element λ Å r Vr r Vr FeI(350) 4466.55 0.52 −2.1 SmII(53) 4467.34 0.88 1.2 0.82 TiII(31) 4468.49 0.29 −1.8 0.41 86.9 TiII(18) 4469.16 0.73 82.7 FeI(830) 4469.37 0.46 TiII(40) 4470.85 0.50 −1.7 0.62 84.0: CeII(8) 4471.24 0.68 81.6: FeI(595) 4472.72 FeII(37) 4472.92 0.52 0.64 82.9 FeII(17) 4474.19 0.96 0.98 81.0: VII(199) 4475.70 0.98: FeI(350) 4476.02 0.52 −1.1 0.88 84.0 YI(14) 4477.46 0.91 −1.8 CI 4477.47 0.85 82.5 CI 4478.59 SmII 4478.66 0.85 −1.2 0.81: 81.5 GdII(15) 4478.80 CI 4478.83 0.85: CeII(203) 4479.36 0.77 81.7: CeII(124) 4479.43 FeI(828) 4479.61 0.80: FeI(515) 4480.14 0.82 MgII(4) 4481.22 0.28 −2.2 0.26 82.5 ZrII(131) 4482.04 FeI(2) 4482.17 0.48 0.86: FeI(68) 4482.26 TiI(113) 4482.74 0.85 −2.0: GdII(62) 4483.33 0.97 0.93 84.4 CeII(3) 4483.90 0.79 81.7 FeI(828) 4484.23 0.73 4484.80 0.92 ZrII(79) 4485.44 0.83 84.4 
FeI(830) 4485.68 0.81 −3.6 HfII(23) 4486.14 0.97 83.4 CeII(57) 4486.91 0.85 −1.0 0.74 81.7 TiII(115) 4488.33 0.52 −3.9 0.60 82.7 FeII(37) 4489.17 0.42 −2.7 0.53 83.2 FeI(2) 4489.74 0.73: FeI(973) 4490.77 0.83 −3.0: 0.97 FeII(37) 4491.40 0.42 −1.5 0.51 83.2 TiII(18) 4493.52 0.67 −1.5 0.81 81.7 ZrII(130) 4494.41 0.54 84.0: FeI(68) 4494.56 0.47 −3.6 CeII(154) 4495.39 ZrII(79) 4495.44 0.82 0.76 82.7 TiII(40) 4495.46 FeII(147) 4495.52 TiI(146) 4496.15 0.88 TiI(8) 4496.25 0.90 82.5: 28 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr CrI(10) 4496.86 ZrII(40) 4496.96 0.56 0.45 84.3 CeII(19) 4497.84 0.90 0.89 82.9 MnI(22) 4498.90 0.88 CrI(150) 4500.28 TiII(18) 4500.32 0.77 0.88 TiII(31) 4501.27 0.30 −2.1 0.44 86.0 MnI(22) 4502.22 0.90: −1.6: CrII(16) 4504.52 FeI(555) 4504.83 0.93 NdII(7) 4506.58 0.93 TiII(30) 4506.74 0.83 −2.5 GdII(13) 4506.93 CrII(16) 4507.19 FeII(38) 4508.28 0.37 −2.3 0.46 83.8 CrII(191) 4511.82 0.90 −0.8 TiI(42) 4512.74 0.90 −1.7 0.95 FeII(37) 4515.35 0.37 −2.3 0.46 83.6 LaII 4516.38 0.92: CrII(191) 4516.56 0.94 FeI(472) 4517.53 0.88 −1.7 TiI(42) 4518.03 TiII(18) 4518.30 0.60 −1.3: 0.77 VII(212) 4518.38 SmII(49) 4519.63 0.93: 0.87 80.5 FeII(37) 4520.22 0.38 −2.5 0.46 83.3 GdII(44) 4521.30 0.97: FeII(38) 4522.63 0.30 −2.8 0.41 84.1: TiI(42) 4522.80 CeII(2) 4523.08 BaII(3) 4524.94 0.72 81.5 LaII(50) 4526.12 0.82 80.8 FeI(969) 4526.45 0.73 −1.1 CeII(108) 4527.35 0.79: −2.0: 0.72 82.3 VII(56) 4528.50 0.61 FeI(68) 4528.61 0.36 −2.8 TiII(82) 4529.48 0.46 −0.1: 0.63 83.3 FeI(39) 4531.15 0.57 0.95 81.4 TiI(42) 4533.24 0.91 81.6: TiII(50) 4533.96 0.23 0.34 FeII(37) 4534.16 TiI(42) 4534.78 0.78 −1.9 0.95 SmII(45) 4537.95 0.94 0.92 CrII(39) 4539.62 0.69 CeII(108) 4539.77 0.66 FeII(38) 4541.52 0.46 −2.9 0.55 82.5 TiII(60) 4544.02 0.64 −2.9 0.79 81.8 TiII(30) 4545.14 0.57 −2.4 0.74 81.3: CrI(10) 4545.96 0.83 −2.8 Klochkova et al.: Spectroscopy of HD56126 29 αPer HD56126 Element λ Å r Vr r Vr FeI(755) 4547.85 0.79 −1.9 0.97 FeII(38) 
4549.47 TiII(82) 4549.62 0.15 0.28 90.5: CeII(229) 4551.30 0.93: 0.86 80.0: TiII(30) 4552.30 0.61: 0.86 82.2 BaII(1) 4554.03 0.29 −2.1 0.31 88.7 CrII(44) 4554.99 0.58 −2.2 0.64 83.0 FeII(37) 4555.89 0.33 0.45 84.0 CrII(44) 4558.65 0.38 −2.6 0.45 82.1 CeII(8) 4560.28 0.78 82.2 CeII(2) 4560.96 0.90 −1.8: 0.83 82.2 CeII(1) 4562.36 0.81 −2.0 0.69 81.9 TiII(50) 4563.76 0.32 −2.2 0.41 84.7 ZrII(116) 4565.41 0.82 84.2: CrII(39) 4565.77 0.60: 0.71 83.0 TiII(60) 4568.32 0.73 −1.1 0.88 82.6 HfII(86) 4570.70 0.89 80.3: MgI(1) 4571.10 0.82 −1.1 TiII(82) 4571.98 0.27 −1.5 0.36 91.3: ZrII(139) 4574.48 0.69 84.6: LaII(23) 4574.87 0.83: 0.78 80.6: FeII(38) 4576.34 0.47 −2.1 0.54 83.0 FeII(26) 4580.06 0.59 0.70 82.7 CeII(7) 4582.50 0.75: FeII(37) 4582.83 0.51 −2.2 0.58 82.6 TiII(39) 4583.41 FeII(38) 4583.84 0.28 −2.3 0.41 87.5 CrII(44) 4588.20 0.45 −2.0 0.49 83.1 CrII(44) 4589.94 0.43 −1.8 0.53 83.4 CrII(44) 4592.05 0.58 −1.7: 0.65 82.8 CeII(6) 4593.92 0.74: 0.72 81.1 FeI(820) 4596.06 0.87 80.1: NdII(51) 4597.01 0.93 82.9 VII(56) 4600.19 0.67 0.90 82.8 FeII(43) 4601.36 0.84: −2.8 0.91 83.6 ZrII(138) 4601.97 0.86: −0.4: 0.89 82.6 FeI(39) 4602.94 0.63 −2.2 0.94 81.8 CeII(6) 4606.40 0.87: 0.83 82.1 TiII(39) 4609.27 0.85 −2.8 0.94 82.5 FeI(826) 4611.28 0.73 −2.4 ZrII(67) 4613.95 0.87 −1.2: 0.69 82.6 CrII(44) 4616.62 0.61 −2.0 0.68 83.3 CrII(44) 4618.82 0.49 −2.2 0.55 82.3 LaII(76) 4619.87 0.85 82.3 FeII(38) 4620.51 0.55 −1.7 0.63 83.4 CeII(27) 4624.90 0.77 83.7: FeI(554) 4625.05 0.76 CeII(1) 4628.16 0.83 −1.9 0.67 82.6 ZrII(139) 4629.07 30 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeII(37) 4629.33 0.38 −2.7 0.44 82.7: CrII(34) 4634.07 0.54 −2.1 0.59 83.0 FeII(186) 4635.31 0.79 −1.6 0.79 82.9 TiII(38) 4636.33 0.81 −2.4 0.91 82.9 FeI(822) 4638.02 0.80: 0.95 82.1 SmII(36) 4642.24 0.96 −2.7 0.93 82.5 LaII(8) 4645.28 0.95: 0.93 80.9: CrI(21) 4646.17 0.71 −1.4 0.95 FeI(409) 4647.44 0.72 −2.6 0.91 80.8: FeII(25) 4648.93 0.91 82.4: CrI(21) 4651.29 0.86 −1.7 
CrI(21) 4652.16 0.78 −1.7 CeII(154) 4654.29 0.77 83.4: LaII(75) 4655.49 0.82 82.0 FeII(43) 4656.98 0.43 0.64 TiII(59) 4657.20 ZrII(129) 4661.78 0.74 82.3: LaII(8) 4662.51 0.80 82.3: FeII(44) 4663.71 0.69 −1.6 0.77 83.4 FeII(37) 4666.75 0.51 −2.0 0.61 83.1 LaII(76) 4668.91 0.91 82.1 FeII(25) 4670.19 ScII(24) 4670.40 0.43 0.60 82.1: LaII(80) 4671.82 0.92 82.7: CeII(18) 4680.13 0.92 81.6 YII(12) 4682.34 0.73 0.61 81.4 CeII(228) 4684.61 0.89 81.1: ZrII(129) 4685.19 0.86 0.73 82.7 LaII(23) 4691.17 0.95 82.4: FeI(409) 4691.42 0.75 −1.2: LaII(75) 4692.50 0.91 81.4 TiII(59) 4698.67 0.93 81.9 MgI(11) 4702.99 0.52 −2.2 ZrII(138) 4703.03 0.63 81.9 NdII(3) 4706.55 0.92 −1.1: 0.89 82.2 TiII(49) 4708.66 0.59 0.76 82.5 C2 (1;0)R1(16) 4712.96 0.92: C2 (1;0)R2(15) 4713.12 0.92: 77.2: C2 (1;0)R1(15) 4714.38 0.91 77.6 C2 (1;0)R2(14) 4714.54 0.91: 78.0: C2 (1;0)R3(13) 4714.72 0.93: 77.7 NdII(49) 4715.60 0.91: C2 (1;0)R3(12) 4716.15 0.96 78.4: LaII(52) 4716.44 0.92 82.2: C2 (1;0)R1(13) 4717.08 0.93 77.4 C2 (1;0)R2(12) 4717.29 0.94 78.1 C2 (1;0)R3(11) 4717.48 78.0 C2 (1;0)R1(12) 4718.38 0.92 78.2 C2 (1;0)R2(11) 4718.60 0.93 77.4 Klochkova et al.: Spectroscopy of HD56126 31 αPer HD56126 Element λ Å r Vr r Vr C2 (1;0)R3(10) 4718.84 0.95 77.8 C2 (1;0)R1(11) 4719.61 0.87: 78.3: C2 (1;0)R1(10) 4720.81 0.93 77.7 C2 (1;0)R2(09) 4721.09 0.94 78.7 C2 (1;0)R3(08) 4721.36 0.95 77.4 C2 (1;0)R1(09) 4721.94 0.95 78.3 C2 (1;0)R2(08) 4722.27 0.91 78.8: C2 (1;0)R3(07) 4722.53 0.94 77.8 C2 (1;0)R1(08) 4723.04 0.93 78.0 C2 (1;0)R2(07) 4723.44 0.90 77.2 C2 (1;0)R3(06) 4723.74 0.94 77.6 C2 (1;0)R1(07) 4724.08 0.93 77.7 C2 (1;0)R3(05) 4724.83 0.92 77.9 C2 (1;0)R1(06) 4725.07 0.88: 78.3: C2 (1;0)R2(05) 4725.57 0.91 79.6 C2 (1;0)R1(05) 4725.99 0.86: 78.5: C2 (1;0)R2(04) 4726.60 0.89 77.5 C2 (1;0)R1(02) 4728.47 0.77 77.9: C2 (1;0)P1(34) 4730.77 0.90 78.0 FeII(43) 4731.47 0.51 −3.1 0.59 81.3 C2 (1;0)P2(03) 4732.81 0.89 78.3: C2 (1;0)P2(04) 4733.40 0.84 78.6: C2 (1;0)P2(05) 4733.93 0.83 FeI(554) 4736.77 0.63 −0.3 
0.73 84.0: LaII(8) 4740.27 0.84 0.78 81.9 LaII(75) 4743.08 0.92 0.85 82.7 PrII(3) 4744.93 0.95 82.2 CeII 4747.14 0.97 0.92 82.4 LaII(65) 4748.73 0.93 −0.9: 0.83 81.5 FeI(634) 4757.58 0.91 −0.8 CeII 4757.84 0.94 81.4 MnI(21) 4761.53 0.85 −2.5: ZrII(107) 4761.67 0.63 82.1 CI(6) 4762.31 CI(6) 4762.54 0.63: TiII 4763.90 0.58 −2.4 0.75 81.8: TiII(48) 4764.53 0.64 −0.5: 0.83 MnI(21) 4766.43 0.76 CI(6) 4766.68 0.83 82.2 CI(6) 4770.03 0.88 −3.6 0.76 82.3 CI(6) 4771.75 0.54 81.9 CeII(17) 4773.94 0.93 −3.9 0.90 80.6 CI(6) 4775.91 0.84 −2.2 0.71 82.0 TiII(92) 4779.99 0.52 −2.7 0.60 82.3 MnI(16) 4783.42 0.70 −2.0 0.96 82.4 YII(22) 4786.58 0.59 82.5 TiII(17) 4798.53 0.64 −3.0 0.81 82.8 LaII(37) 4804.04 0.96 −1.1: 0.91 82.0 TiII(92) 4805.09 0.45 −1.5 0.53 83.2 32 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr LaII(37) 4809.00 0.89 82.4 NdII(3) 4811.35 0.93 0.91 82.0 CrII(30) 4812.35 0.73 −2.6 0.81 82.5 CI(5) 4812.92 0.95 83.0: ZrII(66) 4816.47 0.97 0.87 82.5 CI(5) 4817.37 0.95: −1.5: 0.90 81.7 NdII(47) 4820.34 0.91 81.9 YII(22) 4823.31 0.53 82.6 MnI(16) 4823.51 CrII(30) 4824.14 0.47 −2.9 0.54 82.0 NdII(3) 4825.48 0.82: 0.86 82.0 CI(5) 4826.80 0.95 −2.1 0.91 82.7: NiI(111) 4831.18 0.83 −1.9 0.95 81.4 FeII(30) 4833.19 0.85 −1.8 0.94 82.9 CrII(30) 4836.23 0.67 0.78 82.2 LaII(37) 4840.02 0.92 81.4: ZrII(138) 4841.98 0.92 82.6 SmII(26) 4844.21 0.95 82.2: CeII(17) 4846.57 0.96 81.1: CrII(30) 4848.25 0.52 −2.0 0.59 82.9 YII(22) 4854.87 −2.1: 83.0 FeI(318) 4859.74 0.80 Hβ 4861.33 0.11 −1.7 0.13 98.2 CrII(30) 4864.32 −1.7: 83.3: TiII(29) 4865.62 −2.6: 81.7: FeI(318) 4871.32 0.50: −2.6: 0.75 81.2 FeI(318) 4872.14 0.55: −3.1: 0.83 81.9 TiII(114) 4874.01 0.65: −2.2: 0.70 82.5 CrII(30) 4876.40 0.61 CrII(30) 4876.48 0.51 CaI(35) 4878.14 FeI(318) 4878.22 0.56 0.90 81.0: YII(12) 4881.44 0.92 81.1 CeII 4882.46 0.85 81.3 YII(22) 4883.69 0.51 −2.8 0.39 84.8 CrII(30) 4884.60 0.75 −0.9: 0.84 83.2 FeI(318) 4890.76 0.53 −2.2 0.78 80.8 FeI(318) 4891.49 0.49 −2.2 0.74 82.7 
FeII(36) 4893.82 0.79 −1.6 0.86 ZrII(107) 4894.43 0.89 82.0 BaII(3) 4899.94 YII(22) 4900.12 0.46 0.38 FeI(318) 4903.31 0.66 −1.9 0.87 81.0 ZrII(145) 4908.67 0.97 80.8: TiII(114) 4911.19 0.59 −1.1: 0.63 83.7: ZrII(3) 4911.66 0.82 81.8: FeI(318) 4918.99 0.52 −2.8 0.81 82.9 FeI(318) 4920.50 0.42 −1.1 0.64 85.0: LaII(7) 4920.98 0.63 81.9: Klochkova et al.: Spectroscopy of HD56126 33 αPer HD56126 Element λ Å r Vr r Vr LaII(7) 4921.80 0.79 0.67 82.3 FeII(42) 4923.92 0.29 −1.8 0.55: 78.0: 0.39 93.9 CI(13) 4932.05 0.81 −3.4 0.64 81.3 BaII(1) 4934.08 0.32 −2.9 0.35 85.5 FeI(687) 4946.39 0.79 −1.9 LaII(36) 4946.47 0.92 80.8 FeII(168) 4953.98 0.93 −1.6: 0.94 83.3: FeI(318) 4957.30 FeI(318) 4957.60 0.31 0.65 81.8: NdII(1) 4959.13 0.92 −1.1 0.92 81.4 NdII(22) 4961.40 0.96 0.93 82.4 SrI(4) 4962.29 0.78 83.3 FeI(687) 4966.09 0.71 −2.3 0.94 83.2 OI(14) 4967.88 0.95 FeI(1067) 4967.90 0.83 −1.9 OI(14) 4968.79 0.96: 82.1: LaII(37) 4970.39 0.85 81.7 CeII 4971.48 0.88 82.2 FeI(984) 4973.11 0.82 −2.0 TiII(71) 4981.35 0.94 80.8 TiII(38) 4981.74 0.72 −1.2 YII(20) 4982.13 0.69 82.4 LaII(22) 4986.82 0.93 0.86 82.8 NdII 4989.96 0.95 −2.0: 0.90 81.4: FeII(36) 4993.35 0.65 −1.1: 0.75 82.9 FeI(16) 4994.14 0.73 −2.4: 0.93 82.0: LaII(37) 4999.46 0.78 82.6 TiI(38) 4999.49 0.75 −2.1 TiII(71) 5005.17 0.78: 0.92 82.2 FeI(984) 5005.72 FeI(318) 5006.12 0.54: 0.88 82.1 TiII(113) 5010.21 0.77 −2.9 0.87 82.8 BaII(10) 5013.00 0.92 80.1 TiII(71) 5013.69 0.64 −2.7 0.81 82.0 CI 5017.09 0.91: 82.8: FeII(42) 5018.44 0.28 −2.2 0.55: 77.0: 0.34 96.4 CaII(15) 5019.98 0.72 −3.0: 0.82 82.8: TiI(38) 5020.03 FeI(965) 5022.24 0.74: CeII(16) 5022.87 0.90 81.3: CI 5023.85 0.94: 0.87 82.7 TiI(38) 5024.85 0.90: CI 5024.92 0.94 81.8 ScII(23) 5031.02 0.53 −2.6 0.63 82.5 CI(4) 5039.07 0.76 82.3 CI 5040.13 0.90 82.4: SiII(5) 5041.03 0.77 80.5: 34 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr CeII(16) 5044.01 0.89 81.9 FeI(318) 5044.22 0.86 FeI(114) 5049.82 0.64 −2.1 0.93 81.8 FeI(16) 5051.64 0.62 
−0.9: CI(12) 5052.17 0.74: 0.54 81.7 CI 5053.52 0.95 83.6 SiII(5) 5055.98 0.83 −1.1: 0.75 83.3: SiII(5) 5056.31 FeI(1094) 5065.02 0.66 −1.7: 0.93: 81.3: FeI(383) 5068.77 0.68 0.95: TiII(113) 5069.09 0.93 83.0: FeI(1089) 5272.08 TiII(113) 5072.29 0.63 0.79 82.4 FeI(1094) 5074.75 0.71 −1.7 0.94 81.6 CeII(15) 5079.68 0.80 81.6 FeI(16) 5079.75 0.74 NiI(143) 5080.54 0.74 −2.1 0.94 81.4 FeI(16) 5083.35 0.72 −2.1 0.97 YII(20) 5087.42 0.62 −2.4 0.51 84.3 FeI(1090) 5090.78 0.82 −2.0 0.94 82.9: NdII(48) 5092.80 0.94 −1.2 0.92 82.0 C2 (0;0)R1(33) 5095.15 0.98 77.5: C2 (0;0)R1(32) 5098.13 0.98 77.3: C2 (0;0)R3(30) 5098.30 0.98: 78.3: FeII 5100.74 0.70 0.81 82.3: C2 (0;0)R1(29) 5106.36 0.95 77.5 FeI(16) 5107.45 LaII(164) 5107.54 0.57 0.90 83.0: FeI(36) 5107.65 FeI(1) 5110.42 0.68 −2.1 ZrII(95) 5112.27 0.89 −1.9 0.62 82.7 LaII(36) 5114.55 0.91: 0.82 81.6: C2 (0;0)R1(25) 5116.66 0.93 77.7 C2 (0;0)R3(23) 5116.89 0.95 77.6 CeII 5117.18 0.97 0.90 81.4: YII(20) 5119.12 0.88 −2.1 0.69 80.9: FeII(35) 5120.34 0.80 −1.0: 0.91 82.7 C2 (0;0)R1(23) 5121.44 0.93 77.3 C2 (0;0)R3(21) 5121.69 0.94 77.7 LaII(36) 5123.00 YII(21) 5123.22 0.70 0.53 C2 (0;0)R1(22) 5123.79 77.3: C2 (0;0)R3(20) 5124.04 77.3: C2 (0;0)R1(21) 5125.98 0.91 77.5 C2 (0;0)R3(19) 5126.26 0.93 77.5 C2 (0;0)R3(20) 5128.19 0.90 77.3 C2 (0;0)R3(18) 5128.49 0.93 77.9 TiII(86) 5129.16 0.50 C2 (0;0)R1(19) 5130.27 0.89 77.9 Klochkova et al.: Spectroscopy of HD56126 35 αPer HD56126 Element λ Å r Vr r Vr C2 (0;0)R1(18) 5132.36 0.86 77.7 FeII(35) 5132.66 0.77 −2.6: 82.2: FeI(1092) 5133.69 0.64 −2.0 0.88 82.9 C2 (0;0)R1(17) 5134.32 0.89 77.1 C2 (0;0)R3(15) 5134.67 0.91 77.4 C2 (0;0)R1(16) 5136.27 0.89 77.6 C2 (0;0)R2(15) 5136.44 0.94 77.7 C2 (0;0)R3(14) 5136.66 0.89 77.7 C2 (0;0)R1(15) 5138.11 0.89 77.3 C2 (0;0)R2(14) 5138.32 0.93 77.1 C2 (0;0)R3(13) 5138.51 0.90 77.6 C2 (0;0)R1(14) 5139.93 0.88 77.5 C2 (0;0)R2(13) 5140.14 0.92 77.8 C2 (0;0)R3(12) 5140.38 0.89 77.6 C2 (0;0)R1(13) 5141.65 0.87 77.2 C2 (0;0)R2(12) 5141.90 0.89 77.1 C2 
(0;0)R3(11) 5142.11 0.89 77.7 C2 (0;0)R1(12) 5143.33 0.86 77.6 C2 (0;0)R2(11) 5143.60 0.89 77.4 C2 (0;0)R3(10) 5143.86 0.88 77.7 C2 (0;0)R1(11) 5144.92 0.85 77.5 C2 (0;0)R2(10) 5145.23 0.87 77.6 C2 (0;0)R3(09) 5145.48 0.87 77.5 FeII(35) 5146.11 0.73: 0.86 83.1: C2 (0;0)R1(10) 5146.46 0.83 77.5 C2 (0;0)R2(09) 5146.81 0.88 77.6 C2 (0;0)R3(08) 5146.12 0.88 77.2 C2 (0;0)R1(09) 5147.93 0.84 77.4 C2 (0;0)R2(08) 5148.33 0.83 77.0 C2 (0;0)R3(07) 5148.61 0.84 77.3 C2 (0;0)R1(08) 5149.33 0.84 77.8 C2 (0;0)R2(07) 5149.79 0.85 77.1 C2 (0;0)R3(06) 5150.14 0.86 77.6 FeI(16) 5150.85 0.70 −0.9: C2 (0;0)R2(06) 5151.17 0.83 77.3 C2 (0;0)R2(05) 5152.52 0.81 77.0 C2 (0;0)R2(04) 5153.77 0.74 77.2 TiII(70) 5154.07 0.51 0.57: C2 (0;0)R2(03) 5154.99 0.82 77.4 C2 (0;0)R1(02) 5156.11 0.77 77.2 C2 (0;0)R2(01) 5157.16 0.86 77.8 C2 (0;0)P2(04) 5161.98 0.75 76.6 FeI(1089) 5162.27 0.66 −1.6 C2 (0;0)P2(05) 5162.58 0.66 77.3 C2 (0;0)P2(07) 5163.13 0.75 77.4 C2 (0;0)P1(14) 5165.03 0.56 78.0 C2 (0;0) head 5165.24 0.72: MgI(2) 5167.32 0.30 0.55 FeI(37) 5167.49 36 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeII(42) 5169.03 0.27 −2.7 0.56: 77.0: 0.33 97.4 FeI(36) 5171.60 0.56 −1.7 0.88 82.6 MgI(2) 5172.68 0.37 −1.4 0.51 82.7 NdII 5179.78 0.94 81.6 FeI(1166) 5180.07 0.92 MgI(2) 5183.60 0.32 −2.1 0.45 81.6 TiII(86) 5185.91 0.54 −2.1 0.69 82.0 CeII(46) 5187.46 0.93: 0.81 81.4 TiII(70) 5188.69 0.41 0.57 CaI(49) 5188.85 FeI(383) 5191.45 0.56 FeII(52) 5191.58 0.58 82.8: FeI(383) 5192.35 0.57 0.83: NdII(75) 5192.61 0.81: YII(28) 5196.43 0.75 81.8 FeII(49) 5197.58 0.46 −2.4 0.59 83.3 YII(20) 5200.41 0.70 −2.8: 0.61 83.7: FeI(66) 5202.34 0.67 0.94 81.4 YII(20) 5205.73 0.55 84.8: CrI(7) 5206.04 0.42 TiII(103) 5211.54 0.72 −2.7 0.83 82.3 NdII(44) 5212.37 0.96 −2.0: 0.95 81.1 FeII 5216.85 0.94 82.2 PrII(35) 5220.11 0.94 82.0 TiII(70) 5226.54 FeI(383) 5226.87 0.37: −1.9: 0.60 83.6: FeI(37) 5227.19 0.79 81.6 FeI(1091) 5228.38 0.89 NdII(46) 5228.43 FeI(553) 5229.85 0.74 −1.3 0.95 
82.8 FeII(49) 5234.63 0.45 −2.3 0.57 83.8 CrII(43) 5237.32 0.60 −1.9 0.68 82.7: ScII(26) 5239.82 0.66 −1.8 0.76 82.4 CrII(23) 5246.77 0.80: 0.91 83.1 FeII(49) 5254.93 0.62 −1.3 0.74 82.0 NdII(43) 5255.51 0.91 81.6: FeII(41) 5256.93 0.80 −1.7 0.89 83.2 FeII 5260.26 0.92: 0.90 81.9: TiII(70) 5262.11 0.33: 0.81 CaI(22) 5262.24 FeII(48) 5264.81 0.60 −2.0 0.56 87.4: FeI(383) 5266.55 0.58 −1.8 0.88 82.3 TiII(103) 5268.62 0.73: 0.86: 83.0: FeI(15) 5269.54 0.42 −2.0 0.70 82.1 FeII(185) 5272.39 0.86 0.91 83.3: CeII(15) 5274.23 0.92: 0.80 81.9 CrII(43) 5274.98 0.60: 0.71 82.9 FeII(49) 5276.00 0.42 −3.0 0.56 84.8 Klochkova et al.: Spectroscopy of HD56126 37 αPer HD56126 Element λ Å r Vr r Vr FeI(383) 5281.79 0.70 −2.1 0.95 81.3 FeII(41) 5284.10 0.53: 0.68 83.1 YII(20) 5289.82 0.95 −2.3 0.87 81.2 LaII(6) 5290.83 0.96 −1.8: 0.93 81.1: NdII(75) 5293.16 0.90 −1.4 0.84 82.2 HfII(49) 5298.06 0.94 82.7: CrII(24) 5305.86 0.76 −1.5 0.84 82.9 CrII(43) 5308.43 0.79 −2.1 0.85 82.7 CrII(43) 5310.69 0.84: 0.92 82.7 CrII(43) 5313.58 0.70 −1.6 0.78 82.9 FeII(49,48) 5316.66 0.34 −1.3: 0.48 88.0: ScII(22) 5318.35 0.85 −1.4 0.95 83.2: NdII(75) 5319.82 0.92: 0.85 81.5 YII(20) 5320.78 0.94 82.0 FeI(553) 5324.18 0.58 −2.4 0.81 82.6 FeII(49) 5325.56 0.65 −2.5 0.73 82.6 FeI(15) 5328.04 0.37: 0.74 82.4 OI(12) 5328.69 0.93 83.3 CrII(43) 5334.87 0.72 −2.1 0.80 82.9 TiII(69) 5336.79 0.55 −2.2 0.71 81.9 FeII(48) 5337.73 0.71 CrII(43) 5337.79 0.81 FeI(37) 5341.02 0.61 −1.6 0.94 81.5 ZrII(115) 5350.09 ZrII(115) 5350.35 0.87: 0.64 FeI(1062) 5353.38 0.79: CeII(15) 5353.53 0.79 81.3 FeII(48) 5362.86 0.51 −1.9 0.63 83.5 FeI(1146) 5364.87 0.72 0.93 82.6 FeI(1146) 5367.47 0.70 −2.1 0.90 82.4 CrII(29) 5369.35 0.96 82.2 FeI(1146) 5369.96 0.65 −2.1 0.88 82.9 FeI(15) 5371.49 0.46 −2.2 0.82 81.8 NdII(79) 5371.92 0.92 81.5: LaII(95) 5377.05 0.94 82.3 CI(11) 5380.34 0.85: −1.9: 0.68 81.9 TiII(69) 5381.03 0.63 −2.8 0.78 81.8 FeI(1146) 5383.37 0.63 −2.2 0.86 82.6 FeI(553) 5393.17 0.71 CeII(24) 5393.39 0.81 FeI(553) 
5393.17 0.71 CeII(24) 5393.39 0.81 FeI(15) 5397.13 0.52 −2.1 0.89 81.6 YII(35) 5402.78 0.85 −2.1 0.59 81.9 FeI(1145) 5404.14 0.59 0.87 82.2 FeI(15) 5405.77 0.50 −1.9 0.89 81.4 CeII(23) 5409.22 0.90 81.4 FeI(1165) 5410.91 0.70 −1.5 0.91 82.8: FeII(48) 5414.07 0.74 −2.2 0.84 82.5 38 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr FeI(1165) 5415.20 0.64 −2.1 0.87 82.2 ZrII(94) 5418.01 0.96 80.6: TiII(69) 5418.77 0.67 −2.0 0.82 82.4 CrII(23) 5420.93 0.79 0.90 83.0 FeI(1146) 5424.07 0.60 −1.8 0.84 82.7 FeII(49) 5425.25 0.67 −2.0 0.75 82.7 FeII 5427.80 0.94 −1.4 0.96 81.4: FeI(15) 5434.52 0.56 −1.8 0.91 81.8 OI(11) 5435.18 0.97 OI(11) 5435.78 0.97 82.4 OI(11) 5436.86 0.95: NdII(76) 5442.29 0.97 81.0: FeI(1163) 5445.04 0.74 −1.9 0.95 82.1 FeI(15) 5446.92 0.48 0.89 81.0: NdII 5451.12 0.97: 0.96 81.5: CeII(24) 5468.38 0.96 0.92 89.5 CeII(24) 5472.30 0.90 89.0 YII(27) 5473.39 0.87: 0.66 90.0 ZrII(115) 5477.79 0.89 89.7 CrII(50) 5478.37 0.76 −1.7 0.83 89.3 YII(27) 5480.74 0.83: 0.66 90.0 NdII(79) 5485.71 0.97 −1.3: 0.92 90.3: TiII(68) 5490.68 0.82 −1.2: 0.93 90.4: YII(27) 5497.41 0.62 0.51 90.5 FeI(15) 5497.52 FeI(15) 5501.47 0.72 CrII(50) 5502.08 0.80 −2.7: 0.87 90.2 FeI(15) 5506.79 0.67 −0.9: 0.96 90.3: CrII(50) 5508.62 0.82 −2.9: 0.89 89.0: YII(19) 5509.90 0.80 −1.4 0.65 90.9: CrII(23) 5510.71 0.83 −1.9 0.90 90.7: YII(27) 5521.56 0.92 −2.1 0.69 90.3 ScII(31) 5526.81 0.55 −2.1 0.60 90.4 MgI(9) 5528.41 0.58 −2.4 0.76 88.7 FeII(55) 5534.84 0.57 −1.8 0.63 90.0 FeI(926) 5543.19 0.90 −2.1 0.97 89.8 FeI(1062) 5543.94 0.89 −2.4 0.96 89.0: YII(27) 5544.61 0.89: 0.65 91.0 CI 5545.07 0.92: 0.82: 88.0: YII(27) 5546.01 0.90: 0.66 90.1 CI 5547.27 0.95 89.5 CI 5551.03 0.95 87.8: CI 5551.59 0.92 88.5 FeI(1183) 5554.90 0.84 −2.7 0.95 90.8: FeI(686) 5569.62 0.71 −1.8 0.93 89.0 FeI(686) 5572.84 0.63 0.89 90.0 FeI(686) 5586.76 0.61 −1.4 0.84 89.2 CaI(21) 5588.76 0.65 −2.0 0.90 88.8 CaI(21) 5594.47 0.66: 0.89 88.4 Klochkova et al.: Spectroscopy of HD56126 39 αPer 
HD56126 Element λ Å r Vr r Vr CaI(21) 5601.28 0.80 −2.9 0.97 88.7 CeII(26) 5610.26 0.97: YII(19) 5610.36 0.93 FeI(686) 5615.64 0.57 0.82 89.5 NdII(86) 5620.65 0.93 88.5 FeI(686) 5624.54 0.75 −2.4 0.93 90.2: FeII(57) 5627.49 0.87 −1.6 0.93 90.3 CI 5629.93 0.98: −1.5: 0.96 88.5 FeI(1314) 5633.95 0.89 −2.0 FeI(1087) 5638.27 0.87 −2.3 ScII(29) 5640.98 0.73 0.82 90.6 ScII(29) 5657.87 0.56: 0.72 90.3 ScII(29) 5658.34 0.83: 88.5 FeI(1087) 5662.52 YII(38) 5662.94 0.66: 0.42 90.6 ScII(29) 5667.15 0.77 −1.3 0.87 90.0 CI 5668.95 0.72 90.4 ScII(29) 5669.03 0.72 −2.9 LaII(95) 5671.54 0.96 91.2: NaI(6) 5682.64 0.79: 0.96 88.8: ScII(29) 5684.19 0.70: 0.83 90.5 NaI(6) 5688.21 0.72 −0.9: 0.92 91.2: NdII(79) 5688.54 0.91 89.6: CI 5693.11 0.98: 0.94 89.5 MgI(8) 5711.09 0.85 −1.6 0.98: NiI(231) 5715.09 0.90 −1.6 CI 5720.78 0.98 89.5: YII(34) 5728.89 0.94 −1.9 0.75 90.3 FeI(1087) 5731.77 0.92 −2.1 0.98 FeII(57) 5732.72 0.94 0.97 90.0 FeI(1180) 5752.04 0.93 −2.2 FeI(1107) 5763.00 0.80 −3.0: 0.96: 89.3: LaII(70) 5769.06 0.96 −1.2: 0.86 88.9 SiI(17) 5772.15 0.91 −1.8 0.98: FeI(1087) 5775.08 0.93 −1.7 YII(34) 5781.69 0.84 90.5 CI 5793.12 0.89 0.87 88.9 CI 5794.46 0.97 90.6: LaII(4) 5797.59 0.87 89.9 SiI(9) 5797.86 0.90 CI 5800.59 0.95: 0.91 89.2 NdII(79) 5804.02 0.93 90.5 CI 5805.19 0.94: 0.94 90.5 LaII(4) 5805.78 0.94 −2.7: 0.86 89.9 FeI(1180) 5806.73 0.93 −2.2 0.98 90.2 LaII(4) 5808.31 0.98: 0.97 91.5: FeI(982) 5809.22 0.95 −1.3 0.98: VII(99) 5819.93 0.95 −1.4 0.98: 90.0: FeII(164) 5823.15 0.96 −2.5: 0.97 89.6: 40 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr NdII(86) 5842.39 0.99: 0.96 88.3: FeI(1178) 5852.22 0.96 −1.8 BaII(2) 5853.68 0.63 −2.1 0.53 90.8 CaI(47) 5857.45 0.72 0.94 88.9 FeI(1180) 5862.36 0.85 −1.9 0.96 88.8 LaII(62) 5863.70 0.98 0.97 89.8 LaII(35) 5880.63 0.96 90.7: NaI(1) 5889.95 0.45: −1.2: 0.21 13.0 0.05 +2.2 0.32 24.0 0.37 31.4 0.10 75.7 0.40 89.0 NaI(1) 5895.92 0.50: −0.9: 0.38 13.2 0.06 +2.3 0.51 24.7 0.48 32.0 0.16 76.3 0.45 89.8 
FeI(982) 5934.66 0.89 −2.2 SiI(16) 5948.54 0.83 −3.0 0.97: SiII(4) 5957.56 0.93 88.0: CeII(80) 5975.83 0.94 89.9 SiII(4) 5978.93 0.88 87.5 FeI(1175) 5983.68 0.89 −0.7 0.95: 89.0: FeI(1260) 5984.82 0.86 −1.9 0.96: 88.0: FeII(46) 5991.37 0.75 −1.7 0.84 90.3 CI 6001.13 0.98: 0.92 89.5 CI 6002.98 0.95 89.7: FeI(959) 6003.02 0.88 −2.4 CI 6006.03 0.96: 0.86 89.5 CI 6010.68 0.98 −1.2 0.92 88.8 CI 6012.24 0.98: −1.5: 0.94 90.3: CI 6013.32 0.80 87.7: MnI(27) 6013.49 0.89: CI 6016.45 0.93 89.2: MnI(27) 6016.64 0.91: −2.0: MnI(27) 6021.80 0.90 −2.0: FeI(1178) 6024.07 0.82 −2.2 0.94 88.8 ZrII(136) 6028.64 0.97: 0.91 89.1 CeII(30) 6034.20 0.98 −1.7 0.96 90.8: CeII(30) 6043.40 0.99 −1.5: 0.94 89.7 FeI(207) 6065.49 0.81 −1.4 0.98: FeII(46) 6084.10 0.83 −1.5 0.90 ZrII(106) 6106.47 0.97: 0.93 90.9 FeII(46) 6113.32 0.88 −1.6 0.93 88.6 ZrII(93) 6114.78 0.98 −1.5: 0.92 92.2: CI 6120.82 0.99: 0.97 89.8 CaI(3) 6122.22 0.69 −0.9 0.93 89.5 FeI(169) 6136.62 0.75 0.95 89.0: FeI(207) 6137.70 0.78 −2.0 0.96 89.2: Klochkova et al.: Spectroscopy of HD56126 41 αPer HD56126 Element λ Å r Vr r Vr BaII(2) 6141.72 0.56 −1.4 0.37 92.3 FeII(74) 6147.74 0.74 −1.4 0.77 90.2 FeII(74) 6149.25 0.75 −1.5 0.77 90.0 SiI(29) 6155.98 0.89 OI(10) 6155.98 0.95: 0.82 87.0: OI(10) 6156.17 0.95: 0.82 87.1 OI(10) 6158.18 0.79 88.0 FeI(1260) 6170.51 0.90 −1.5 0.97 89.7 FeII(200) 6175.14 0.95 90.2 LuII(2) 6221.88 0.98: 0.90 89.9: FeII(74) 6238.39 0.74 −1.7 0.76 90.0 SeII(28) 6245.62 0.80: −0.5: 0.90 90.0: FeII(74) 6247.56 0.66 −2.2 0.70 89.9: FeII 6248.90 0.94 −1.4 0.94 89.3 FeI(169) 6252.56 0.81 −1.9 0.97 89.5 LaII(33) 6262.30 0.96 −0.6: 0.89 90.2 FeII 6317.99 0.81 −1.0: 0.86 88.8: FeI(168) 6318.03 SiII(2) 6347.10 0.65 −2.0 0.51 89.7 FeII(40) 6369.46 0.84 −1.6 0.91 90.5 SiII(2) 6371.36 0.72 −1.8 0.57 89.5 FeII 6383.72 0.91 −1.6 0.92 89.8 FeII 6385.46 0.93 −2.0 0.94 88.4: LaII(33) 6390.49 0.97 0.89 90.3 FeI(168) 6393.61 0.79 −2.5 0.96 90.5: CI 6397.98 0.98: 0.94 89.2 FeII(74) 6416.92 0.75 −1.0 0.80 90.2 FeI(62) 6430.85 
0.83 −1.6 0.97 90.6: FeII(40) 6432.68 0.75 −1.4 0.81 90.0 CaI(18) 6439.08 0.69 −1.6 0.89 90.0 FeII 6442.95 0.95 −1.2 0.95 88.9 FeII(199) 6446.41 0.95 −1.2 0.95 91.1: OI(9) 6453.60 0.97: 0.94 87.9 OI(9) 6454.45 0.98: −2.0: 0.94 88.0 FeII(74) 6456.39 0.60 −0.2: 0.66 91.0: TiII(91) 6491.57 0.78: 0.91 90.7: BaII(2) 6496.91 0.55 0.52 FeII(40) 6516.08 0.66 −2.0 0.77 90.2 LaII(33) 6526.99 0.91 90.7: MgII(23) 6545.97 0.93 89.8 FeI(268) 6546.25 0.92: Hα 6562.81 0.19 −2.1 0.32 58.0 0.40: 74.0: 0.89 82.8: CI(22) 6587.61 0.89 −1.7 0.72 89.5 SeII(19) 6604.59 0.89 0.94 91.1: TiII(91) 6606.95 0.95 −2.2: 0.96 89.5 CI 6611.35 0.96 90.6: YII 6613.75 0.91 0.73 90.0 42 Klochkova et al.: Spectroscopy of HD56126 αPer HD56126 Element λ Å r Vr r Vr CI 6654.61 0.94 89.5: CI 6655.51 0.91 90.5 YII(26) 6795.42 0.88 90.0 CI 7087.83 0.90 89.3 CI 7100.12 0.91 89.7 SiI(23) 7415.95 0.95: 87.5: SiI(23) 7423.50 0.92 91.0: EuII(8) 7426.57 0.93: 85.0: CI 7476.18 0.89: 86.0: CI 7483.44 0.90 85.0: LaII(1) 7483.48 FeI(1077) 7511.03 0.97: 87.0 KI(1) 7664.91 0.83 77.5 OI(1) 7771.94 0.34 93.5 OI(1) 7774.17 0.35 94.8: OI(1) 7775.39 0.42: CI 7860.89 0.91: 88.0: MgII(8) 7877.05 0.89 90.5: YII(32) 7881.90 0.89 90.5: MgII(8) 7896.37 0.80 88.7: H(P27) 8306.12 0.91: 91.0: H(P25) 8323.43 0.91: 91.0: H(P20) 8392.40 0.64 87.0: H(P19) 8413.32 0.62 88.0: H(P18) 8437.96 0.60 92.0: OI(4) 8446.5: 0.26 94.0: H(P17) 8467.25 0.58 93.0: CaII(2) 8542.11 0.40 85.0: H(P15) 8545.38 0.53 94.0: NI(8) 8567.74 0.90 89.0: NI(8) 8594.01 90.0: H(P14) 8598.39 0.49 93.0: NI(8) 8629.24 0.78: 88.0: NI(1) 8703.25 0.85: 90.0: NI(1) 8711.70 0.83 87.0: NI(1) 8718.83 0.84 89.0: H(P12) 8750.47 0.43 98.0: Introduction Observations and reduction of spectra Peculiarities of the optical spectrum of HD56126 Radial velocities pattern Spectral atlas ABSTRACT We studied in detail the optical spectrum of the post-AGB star HD56126 (IRAS07134+1005). 
We use high resolution spectra (R=25000 and 60000) obtained with the echelle spectrographs of the 6-m telescope. About one and a half thousand absorptions of neutral atoms and ions, absorption bands of C_2, CN, and CH molecules, and interstellar bands (DIBs) are identified in the 4010 to 8790 AA wavelength region, and the depths and radial velocities of these spectral features are measured. Differences are revealed between the variations of the radial velocities measured from spectral features of different excitation. In addition to the well-known variability of the Halpha profile, we found variations in the profiles of a number of FeII, YII, and BaII lines. We also produce an atlas of the spectrum of HD56126 and its comparison staralpha Per. The full version of the atlas is available in electronic form from Web-address: http://www.sao.ru/hq/ssl/Atlas/Atlas.html <|endoftext|><|startoftext|> Generation of Large Number-Path Entanglement Using Linear Optics and Feed-Forward Hugo Cable∗ and Jonathan P. Dowling Horace C. Hearne Jr. Institute for Theoretical Physics, Department of Physics and Astronomy, Louisiana State University, Baton Rouge LA70803. (Dated: April 4, 2007) We show how an idealised measurement procedure can condense photons from two modes into one, and how, by feeding forward the results of the measurement, it is possible to generate efficiently superpositions of components for which only one mode is populated, commonly called “N00N states”. For the basic procedure, sources of number states leak onto a beam splitter, and the output ports are monitored by photodetectors. We find that detecting a fixed fraction of the input at one output port suffices to direct the remainder to the same port with high probability, however large the initial state. When instead photons are detected at both ports, Schrödinger cat states are produced. 
We describe a circuit for making the components of such a state orthogonal, and another for subsequent conversion to a N00N state. Our approach scales exponentially better than existing proposals. Important applications include quantum imaging and metrology.

The fundamental limits to optical detection for metrology and imaging are quantum mechanical [1]. Of particular interest for reaching such quantum limits are path-entangled states of photons of the form $|N0\rangle + e^{i\phi}|0N\rangle$, in a basis of photon-number states, commonly referred to as “N00N” states. A variety of applications have been suggested [2]. For lithography [3] and microscopy [4], N00N state light would be used together with multi-photon absorbers to achieve enhanced resolution. This is because the de Broglie wavelength for an N-photon state is a factor 1/N smaller than the wavelength associated with the single photon, and the absorption rate scales linearly with the incident intensity, rather than as the Nth power. Regarding applications to precision metrology, whereby an interferometric setup is used to measure small phase shifts, N00N states achieve the Heisenberg limit, for which the phase uncertainty scales as $1/N$ [5, 6, 7], and entanglement is a fundamental requirement for achieving this limit. It has been rigorously demonstrated that the cost of improving sensitivity (without using entanglement) is higher intensities or longer coherence times [8]. Classically the shot-noise limit applies, attained for example by laser light, for which the uncertainty scales as $1/\sqrt{N}$, already a restraint in applications such as magnetometry [9] and gyroscopy [10].

However, building a source of N00N states beyond two photons is challenging. Three, four and six photon experiments have been reported [11, 12, 13], but only in the first two references were N00N states generated. In theory a source could be made using a nonlinear crystal [14].
However, the required optical nonlinearity is not readily available. An alternative is a non-deterministic approach using linear optics, wherein the desired state is generated on condition of a specific outcome at photodetectors. A variety of schemes have been suggested which typically rely on conditional destructive interference [15, 16, 17]. However, so far none of these scales efficiently, that is, they all share the feature that exponentially decreasing success probabilities outweigh the possible gains. Noting that quantum algorithms, exhibiting polynomial and exponential speedups over their classical counterparts, may be implemented scalably in a linear-optics approach [18], we expect that it should be possible to do better. In this Article we address this challenge by adapting a conceptually simple measurement procedure. Our method is as follows. First, we aim to minimise the negative effects of back-action in a sequence of detections, whereby earlier measurements affect the outcomes of later ones. Next, we engineer output states that closely approximate the ideal case. Finally, we exploit feed-forward, for which circuits are actively switched in response to previous photodetections. Feed-forward is an essential ingredient of linear-optics based quantum computing, but is not used in previous proposals for engineering N00N states.

∗ Electronic address: hcable@lsu.edu

We begin by considering the thought experiment depicted in Fig. 1(a). Here, cavity modes labeled A and B are assumed to start with a well-defined photon number N. They are each coupled to an external mode by a weakly transmissive mirror, and these modes are combined at a 50:50 beam splitter, and then subject to partial photodetection. The beam splitter acts to make the origin of the photons indistinguishable.
FIG. 1: Two measurement-based procedures, inducing relative phase correlations between the principal modes, each one with N photons at the start. In (a), population leaks from cavity modes A and B into external modes, which are combined at a beam splitter. Photons are detected one at a time by photodetectors $D_l$ and $D_r$. In (b), all modes are propagating. Beam splitters couple some fraction f from the principal modes one and two into ancillae modes three and four, which are initially the vacuum. The ancillae are subsequently combined at a beam splitter, and subjected to number-resolving photodetection at $D_l$ and $D_r$.

When a photon is registered at the left or right photodetector, labeled $D_l$ or $D_r$, the transformation is given by the Kraus operators $\hat{L} = (\hat{a} - \hat{b})/\sqrt{2}$ or $\hat{R} = (\hat{a} + \hat{b})/\sqrt{2}$ respectively (where $\hat{a}$ and $\hat{b}$ are the annihilation operators for modes A and B). To obtain the corresponding probabilities it is necessary to normalise by the total photon number prior to detection. We suppose now that a string of detections occurs only at $D_r$, say (by adjusting the path length difference of the cavities between detections, with a phase shifter, the same state for the cavity modes can be obtained in every case). After this, the detectors are removed and the system evolves to a final state with all the remaining population at the output ports. Denoting by $|\psi_{AB}\rangle$ the state of modes A and B after r initial detections, we find that

$$|\psi_{AB}\rangle = \frac{\hat{R}^r |N\rangle|N\rangle}{\sqrt{\langle N|\langle N| (\hat{R}^\dagger)^r \hat{R}^r |N\rangle|N\rangle}},$$

normalising to unity. The probability $P_{cond}$ of finding all the remaining photons “condensed” at the right output port (and none at the left output port) is as follows,

$$P_{cond} = \frac{\langle\psi_{AB}| (\hat{R}^\dagger)^S \hat{R}^S |\psi_{AB}\rangle}{S!} = \frac{\left[\sum_{k=0}^{r} {}^{r}C_{k}\; {}^{S}C_{N-k}\right]^2}{2^S \sum_{k=0}^{r} \left({}^{r}C_{k}\right)^2\, {}^{S}C_{N-k}},$$

where $S \equiv 2N - r$ denotes the total remaining photon number, C denotes a binomial coefficient, and we assume that $r < N$.
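The probability defined above can be evaluated exactly for modest N. The following is a minimal sketch, not the authors' code: it assumes the closed form $P_{cond} = [\sum_k {}^rC_k\,{}^SC_{N-k}]^2 / (2^S \sum_k ({}^rC_k)^2\,{}^SC_{N-k})$, which follows from expanding $\hat{R}^r|N\rangle|N\rangle$ in the Fock basis of modes A and B, and the function name `p_cond` is illustrative.

```python
from fractions import Fraction
from math import comb


def p_cond(n: int, r: int) -> Fraction:
    """Probability that all S = 2n - r remaining photons exit at the right port.

    Assumes both cavity modes start with n photons and that the first r
    detections all occurred at the right detector.
    """
    if not 0 <= r <= 2 * n:
        raise ValueError("r must satisfy 0 <= r <= 2n")
    s = 2 * n - r
    # Only terms with 0 <= k <= r and 0 <= n - k <= s contribute.
    ks = range(max(0, r - n), min(r, n) + 1)
    numerator = sum(comb(r, k) * comb(s, n - k) for k in ks) ** 2
    denominator = 2**s * sum(comb(r, k) ** 2 * comb(s, n - k) for k in ks)
    return Fraction(numerator, denominator)


if __name__ == "__main__":
    # Measure one quarter and one third of the 2n input photons.
    for n in (12, 24, 48, 96):
        quarter = float(p_cond(n, n // 2))
        third = float(p_cond(n, (2 * n) // 3))
        print(f"N={n:3d}  r=2N/4: {quarter:.4f}  r=2N/3: {third:.4f}")
```

With no detections (`r = 0`) and one photon per mode, the function returns 1/2, the familiar two-photon bunching result at a 50:50 beam splitter, which gives a quick sanity check of the expression.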
Evaluating $P_{cond}$ numerically for initial states of increasing size, we find that its value is determined asymptotically by the proportion of the input that is measured. For example, setting r to either one quarter or one third of 2N suffices for $P_{cond} > 0.6$ or $P_{cond} > 0.7$, respectively.

We have found for our thought experiment that later detections tend to strongly reinforce earlier ones. Hence, the effect of measurement back-action here is useful for state engineering, and in what follows we adapt the measurement process for N00N state generation. We divide our analysis into three stages. First, we translate our thought experiment into a mathematically equivalent procedure based purely on linear optics, and consider the general case for which photons are detected at both photodetectors. The localisation phenomena resulting from this measurement process have been studied extensively in the context of the debate over the existence of absolute optical coherence in common quantum-optical experiments [19]. It has been demonstrated that well-defined correlations in the relative optical phase evolve, for the remaining population, and play a central role in the ongoing dynamics. Hence, in the second stage of our analysis, we investigate simple procedures for manipulating phase correlations, and relate states with well-defined correlations in the relative phase to N00N states. Finally, we identify a method based on feed-forward to enable N00N states to be generated efficiently.

First, we translate our thought experiment into a mathematically equivalent procedure, based on linear optics, as depicted in Fig. 1(b). We label this optical circuit Circuit I. Here all modes are propagating, and a source is assumed to supply dual Fock states $|N\rangle|N\rangle$ to the principal modes one and two. Beam splitters of reflectance f couple modes one and two to ancillae modes three and four, which are combined at a 50:50 beam splitter.
They are then measured by number-resolving photodetectors labeled $D_l$ and $D_r$, where on average a fraction f of the input photons are registered. We now consider the state $|\psi_{l,r}\rangle$ generated in modes one and two after l photons are registered at $D_l$ and r at $D_r$. Following Ref. [20], it is convenient to adopt a representation in terms of coherent states, which are of the form $|\alpha\rangle \equiv \big||\alpha|\exp(i\theta)\big\rangle \propto \sum_k \sqrt{|\alpha|^{2k}/k!}\,\exp(ik\theta)\,|k\rangle$ in a basis of Fock states. It has been shown that,

$$|\psi_{l,r}\rangle \propto \int d\theta_{av}\, d\Delta\, \exp(-iS\theta_{av}) \left[ G(\Delta-\Delta_0) + \exp(i\sigma)\, G(\Delta+\Delta_0) \right] |\alpha_1\rangle|\alpha_2\rangle, \tag{1}$$

where $S \equiv 2N - l - r$, $\alpha_j = |\alpha_j|\exp(i\theta_j)$, $\theta_{av} = (\theta_1 + \theta_2)/2$ and $\Delta \equiv \theta_2 - \theta_1$. The superposition phase $\sigma$ takes the value $l\pi$, and hence the measurement record must be known exactly. The scalar function $G(X)$ is given to good approximation by the Gaussian expression $\exp\left[-(l+r)X^2/4\right]$. The total photon number is equal to S. There are well-defined correlations in the relative optical phase parameter $\Delta$ at values plus and minus $\Delta_0$, determined only by the ratio of l to r. These correlations are multi-valued whenever photons are registered at both photodetectors, and a Schrödinger cat state is generated. We can see that cats are generated as a result of the symmetry of the setup. Specifically, $\hat{L}$ and $\hat{R}$ are invariant under an exchange of the labeling of the modes, a transformation which reverses the sign of the relative phase. The generation of cat states therefore also requires precise phase stability between the modes. Turning to the source, we see that the state of the input can be a mixture of the form $\sum_N P_N |N\rangle|N\rangle\langle N|\langle N|$, since $\Delta_0$ is independent of N, and standard linear optical elements obey a superselection rule for the photon number. Several two-mode squeezing processes strongly suppress relative number fluctuations, and hence might serve as practical sources of light described by these mixed states.
For the second stage in our analysis, we identify the outputs of Circuit I as examples of quantum reference frames: reference frames for a classically defined parameter composed of finite quantum resources. Quantum reference frames are subject to depletion and degradation as they are used, and are currently of interest for protocols in the field of quantum information, in which they are regarded as a resource [21]. By making an analogy to classical phase references we can now identify simple ways in which states of the form Eq. (1) can be manipulated. For the current purposes we can assume that a large number of detections have been performed and define,

$$|\psi_\infty(\Delta_0)\rangle \propto \int d\theta\, \exp(-iS\theta)\, |\alpha\rangle|\alpha\exp(i\Delta_0)\rangle, \tag{2}$$

where $\alpha = |\alpha|e^{i\theta}$, for a state with a total photon number S and a relative phase of $\Delta_0$ (assumed to be normalized). Relative phase correlations between more than two modes are transitive and are transformed additively by phase shifters. A phase reference can be extended to additional modes by combining it with the vacuum at a beam splitter. As an example, a 50:50 beam splitter, which we denote here by $U_{bs}$, beating light in a Fock state with S photons against the vacuum yields $U_{bs}|S\rangle|0\rangle \propto |\psi_\infty(0)\rangle$ and $U_{bs}|0\rangle|S\rangle \propto |\psi_\infty(\pi)\rangle$. Therefore, we see that a simple circuit, consisting only of a beam splitter and a phase shifter, can convert a cat state generated by Circuit I to a N00N state, whenever the relative phase correlations differ by $\pi$. This happens when $l = r$ and $\Delta_0 = \pi/2$. We label this circuit Circuit III (anticipating an intermediate process modifying the cat states for the general case).

Before proceeding to the final stage of our analysis, we consider a simple N00N-state generator that attempts to convert every cat state generated by Circuit I using Circuit III. This method might be expected to yield close approximations to N00N states whenever the relative phases of the cat state are close to plus and minus $\pi/2$.
The situation is summarized in Fig. 2(a). To measure the quality of the output state we adopt the fidelity, denoting it by F. For the measurement-induced condensation, considered at the start, F takes the same value as Pcond. For schemes generating N00N states, it is necessary to account for the phase of the superposition, and we define $F = \max_\phi \left| \left[ \langle 0,S| + \exp(-i\phi) \langle S,0| \right] |\psi_{output}\rangle \right|^2 / 2$, where S is the total photon number of the state |ψoutput〉. Evaluating F for our N00N-state generator, when Circuit I generates a cat state with relative phase components at ±∆0 and total photon number S, we find to first approximation that $F \sim \cos^{2S}[(\Delta_0 - \pi/2)/2]$. As with other proposals, this scheme in fact scales exponentially poorly whenever the relative phase correlations are less than π apart, as is typically the case. Inspecting the overlap for different relative phase components, as in Eq. (2) with total photon number S, we find that $\langle\psi_\infty(\Delta_1)|\psi_\infty(\Delta_2)\rangle \propto \cos^{S}[(\Delta_2 - \Delta_1)/2]$. The poor scaling can be attributed to the non-orthogonality of the cat state components.
FIG. 2: (a) and (b) illustrate complete N00N state generators in outline. Circuit I produces Schrödinger cat states with two relative phase components non-deterministically (green). The generators are terminated by a fixed unitary (yellow) circuit consisting of beam splitters and phase shifters. In (b) the cat states are corrected, using an additional measurement process conditioned on the previous detection outcomes.
We now proceed to the final stage of our analysis. Our previous N00N-state generator is effective when Circuit I generates cat states with components which are orthogonal. However, this occurs with low probability. It is not possible to improve the situation with any combination of (idealised) beam splitters and phase shifters, since these implement unitary transformations.
Hence, we now devise a circuit, labeled Circuit II, to make input cat components orthogonal, using additional processes of measurement and feed-forward. This more sophisticated scenario is depicted in Fig. 2(b). To identify a suitable circuit, we investigate how a beam splitter transforms phase references, starting with two classical fields. Here the field in each mode is represented by a complex number, with the square amplitude corresponding to the intensity, and the phase to the optical one. A 50:50 beam splitter, configured so as not to impart additional phase shifts to the modes, outputs two classical fields described by the sum and difference of the values for the inputs (altering both the square amplitudes and the phases). If the input has a relative phase of 0 or π, and equal intensities for each mode, the population is transferred entirely into one mode. On the other hand, if the input has a relative phase of plus or minus π/2, and equal intensities in each mode, the relative phase and intensities are preserved. Moving to the quantum case, we consider the action of the beam splitter for a state, defined as in Eq. (2), with a relative phase of ∆0, a total photon number S, and an intensity S/2 in each mode. Computing the final state explicitly, we find a scenario similar to the classical case,
$$U_{bs}|\psi_\infty\rangle \propto \int d\theta\, \exp(-iS\theta)\, \big|\sqrt{I_1} \exp(i\theta)\big\rangle \big|\sqrt{I_2} \exp[i(\theta \pm \pi/2)]\big\rangle. \qquad (3)$$
This has an intensity $S I_1/(I_1 + I_2) = S[1 - \cos(\Delta_0)]/2$ in mode one and $S I_2/(I_1 + I_2) = S[1 + \cos(\Delta_0)]/2$ in mode two, and a relative phase of plus π/2 when 0 < ∆0 ≤ π/2 and of minus π/2 when −π/2 ≤ ∆0 < 0 (we consider cases for which the intensity is increased in favour of mode two). The symmetry of the beam splitter transformation makes it useful for altering the cats generated by Circuit I, so that the relative phases differ by π. However, it creates a difference in the intensities between the modes.
To correct this, we propose beating mode two against the vacuum, so as to to move the difference of the intensities to an ancillary mode, which can be re- moved by a photodetection. This method depends criti- cally on feeding forward the result of the detections per- formed by Circuit I, so that a variable beam splitter can be set according to the value of ∆0. A variable beam splitter can be implemented with 50:50 beam splitters and variable phase shifters. The cost of correction is a decrease in the total photon number, which varies non- deterministically. As can be seen from Eq. (3), a fraction of cos (∆0) of the photons are lost on average. Overall, Circuits I through III constitute a complete N00N -state generator. Additional mathematical analysis is given in the supplementary online text. Runs for which Circuit I fails to generate a Schrödinger cat state, or too many photons are lost in the detection process are discarded. The fidelities at the output are, on average, 0.87, 0.94 or 0.98, when a fraction of one third, one half, or two third respectively of the input photons are detected by Circuit I. Higher fidelities are possible when the photon number at the input is small. If allowance is made for sufficient input photons to be detected by Circuit I, and a further half to be detected in Circuit II, the probability of failure is not too large. Finally, we suggest some possibilities for experimen- tal implementation. For the source, we propose an op- tical parametric oscillator setup for which the two mode squeezed output of an optical parametric amplifier is en- hanced by a cavity [22]. Note, however, that the cur- rent purposes require twin beams of a much lower inten- sity than is typical in many experiments, and that the beams must be rendered frequency degenerate. Tech- niques of feed-forward and photodetection are being de- veloped with a view to quantum information technolo- gies [23, 24]. 
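The classical beam-splitter picture used above to motivate Circuit II can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the 50:50 beam splitter acts on complex field amplitudes as the difference (mode one) and sum (mode two) of the inputs, divided by √2, and the function names are illustrative only. It returns the two output intensities and the output relative phase for equal-intensity inputs with relative phase ∆0.

```python
import cmath
import math


def beam_splitter_5050(a1, a2):
    """Classical 50:50 beam splitter with no extra phase shifts (assumed convention):
    mode one carries the difference and mode two the sum of the input fields."""
    s = 1.0 / math.sqrt(2.0)
    return s * (a1 - a2), s * (a1 + a2)


def classical_outputs(total_intensity, delta0):
    """Send two fields of intensity S/2 each, with relative phase delta0, through
    the beam splitter; return (I1, I2, relative phase of mode two w.r.t. mode one)."""
    a1 = math.sqrt(total_intensity / 2.0)
    a2 = math.sqrt(total_intensity / 2.0) * cmath.exp(1j * delta0)
    b1, b2 = beam_splitter_5050(a1, a2)
    # b2 * conj(b1) has the phase arg(b2) - arg(b1) and avoids dividing by zero.
    rel_phase = cmath.phase(b2 * b1.conjugate())
    return abs(b1) ** 2, abs(b2) ** 2, rel_phase


if __name__ == "__main__":
    S = 10.0
    for delta0 in (math.pi / 3, -math.pi / 3, math.pi / 2):
        i1, i2, phase = classical_outputs(S, delta0)
        # i2 - i1 is the intensity difference that the correction step must remove.
        print(f"delta0={delta0:+.3f}  I1={i1:.3f}  I2={i2:.3f}  "
              f"rel_phase={phase:+.3f}  excess={i2 - i1:.3f}")
```

Under these assumptions the intensities come out as S[1 − cos(∆0)]/2 and S[1 + cos(∆0)]/2, and the excess in mode two equals S cos(∆0), which matches the average loss fraction quoted for the correction step.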
For the source, an important problem is imperfect correlation between the modes. If, for exam- ple, two independent lasers of equal intensity provide the input, the scheme generates the intended relative phase correlations, but no entanglement [25]. Photodetectors are subject to loss and dark counts. Losses will act to degrade the source, reducing the relative number corre- lation and increasing the uncertainty in the total photon number. Dark counts are more problematic, mixing over the phase for the superposition in Eq. (1). An alter- native suggestion is using trapped bosonic atoms. One possibility might be to work in a regime for which the atomic wave-packets are much longer than the typical scattering length, as proposed in Ref. [26]. Another is to use Bose-Einstein condensates, for which a variety of coherent operations have been demonstrated. Number- resolved condensates might be obtained from the Mott Insulator phase, while relative-number squeezing can be achieved by different techniques. In conclusion, we have proposed for the first time a linear-optics based scheme that generates large N00N states efficiently, the photon number at the output scal- ing with that of the source — all the while maintaining high fidelities, high success probabilities and a fixed num- ber of circuit components. As well as being of immediate interest for a range of applications, our results have con- nections with other topics. For example, the scaling we derive for our measurement-induced condensation pro- cedure is of relevance to the study of the interference of light from independent sources and localizing relative op- tical phase, phenomena with analogs in different physical systems [27]. We have left as an open question the extent to which the scaling can be attributed to Bose statistics. Regarding our N00N -state generators, the creation of macroscopic entangled states is of interest for exploring the quantum-classical transition. 
Finally, our study of Schrödinger cat states may have application to quantum computing, where Schrödinger cat states, defined for one mode only, have been proposed to encode qubits, which may be manipulated using standard experimental tech- niques [28]. Acknowledgements The authors would like to acknowledge support from the Hearne Institute, the Army Research Office, and the Dis- ruptive Technologies Office. H. C. would like to thank Terry Rudolph, Ryan Glasser, Sonja Daffer and Yuan Liang Lim for helpful discussions. [1] Giovannetti, V., Lloyd, S. & Maccone, L. Quantum- enhanced measurements: beating the standard quantum limit. Science 306, 1330 (2004). [2] Kapale, K. T., Didomenico, L. D., Lee, H., Kok, P. & Dowling, J. P. Quantum interferometric sensors. Con- cepts of Physics II, 225 (2005). [3] Boto, A. N. et al. Quantum interferometric optical lithog- raphy: exploiting entanglement to beat the diffraction limit. Phys. Rev. Lett. 85, 2733 (2000). [4] Teich, M. C. & Saleh, B. E. A. Microscopy with quantum- entangled photons. Českloslovenský časopis pro fyziku 47, 3 (1997). English translation. [5] Bollinger, J. J., Itano, W. M., Wineland, D. J. & Heinzen, D. J. Optimal frequency measurements with maximally correlated states. Phys. Rev. A 54, R4649 (1996). [6] Ou, Z. Y. Fundamental quantum limit in precision phase measurement. Phys. Rev. A 55, 2598 (1997). [7] Boixo, S., Flammia, S. T., Caves, C. M. & Geremia, J. M. Generalized limits for single-parameter quantum estimation. Phys. Rev. Lett. 98, 090401 (2007). [8] Giovannetti, V., Lloyd, S. & Maccone, L. Quantum metrology. Phys. Rev. Lett. 96, 010401 (2006). [9] Kominis, I. K., Kornack, T. W., Allred, J. C. & Romalis, M. V. A subfemtotesla multichannel atomic magnetome- ter. Nature 422, 596 (2003). [10] Dowling, J. P. Correlated input-port, matter-wave inter- ferometer: quantum-noise limits to the atom-laser gyro- scope. Phys. Rev. A 57, 4736 (1998). [11] Mitchell, M. W., Lundeen, J. S. & Steinberg, A. M. 
Super-resolving phase measurements with a multi-photon entangled state. Nature 429, 161 (2004). [12] Walther, P. et al. De broglie wavelength of a non-local four-photon state. Nature 429, 158 (2004). [13] Resch, K. J. et al. Time-reversal and super-resolving phase measurements. quant-ph/0511214 (2005). [14] Sanders, B. C. Quantum dynamics of the nonlinear rota- tor and the effects of continual spin measurement. Phys. Rev. A 40, 2417 (1989). [15] Fiurášek, J. Conditional generation of n-photon entan- gled states of light. Phys. Rev. A 65, 053818 (2002). [16] Zou, X., Pahlke, K. & Mathis, W. Generation of entan- gled photon states by using linear optical elements. Phys. Rev. A 66, 014102 (2002). [17] Kok, P., Lee, H. & Dowling, J. P. Creation of large- photon-number path entanglement conditioned on pho- todetection. Phys. Rev. A 65, 052104 (2002). [18] Knill, E., Laflamme, R. & Milburn, G. J. A scheme for efficient quantum computation with linear optics. Nature 409, 46 (2001). [19] Mølmer, K. Optical coherence: A convenient fiction. Phys. Rev. A 55, 3195 (1997). [20] Sanders, B. C., Bartlett, S. D., Rudolph, T. & Knight, P. L. Photon-number superselection and the entangled coherent-state representation. Phys. Rev. A 68, 042329 (2003). [21] Bartlett, S. D., Rudolph, T. & Spekkens, R. W. Refer- ence frames, superselection rules, and quantum informa- tion. quant-ph/0610030v2 (2006). [22] Zhang, Y., Kasai, K. & Watanabe, M. Investigation of the photon-number statistics of twin beams by direct de- tection. Opt. Lett. 27, 1244 (2002). [23] Kok, P. et al. Linear optical quantum computing with photonic qubits. Rev. Mod. Phys. 79, 135 (2007). [24] Prevedel, R. et al. High-speed linear optics quantum computing using active feed-forward. Nature 445, 65 (2007). [25] Cable, H., Knight, P. L. & Rudolph, T. Measurement- induced localization of relative degrees of freedom. Phys. Rev. A 71, 042107 (2005). [26] Popescu, S. KLM quantum computation with bosonic atoms. 
quant-ph/0610043 (2006). [27] Rau, A. V., Dunningham, J. A. & Burnett, K. Measurement-induced relative-position localization through entanglement. Science 301, 1081 (2003). [28] Jeong, H. & Ralph, T. C. Schrodinger cat states for quantum information processing. quant-ph/0509137 (2005).
Supplementary Material: Methods
In these supplementary notes, we provide further analysis of our N00N-state generator, consisting of Circuits I, II and III, as depicted in outline in Fig. 2(b). First, we specify notation for beam splitters, phase shifters and states with well-defined relative phase correlations. For the lossless beam splitter, we choose a notation which makes explicit the “rotation” performed by such a device. A beam splitter with transmittance τ and reflectance (1 − τ) acts to transform the annihilation operators ôj for modes labeled j, according to the relations,
$$\begin{pmatrix} \hat{o}_1 \\ \hat{o}_2 \end{pmatrix} \longrightarrow \begin{pmatrix} \cos(\gamma) & -\sin(\gamma) \\ \sin(\gamma) & \cos(\gamma) \end{pmatrix} \begin{pmatrix} \hat{o}_1 \\ \hat{o}_2 \end{pmatrix},$$
with angular parameter γ, where $\tau = \cos^2(\gamma)$ and 0 ≤ γ ≤ π/2. We denote this transformation Ubs(γ), and we include, where necessary, phase shifts of χ at the input port and −χ at the output port of the first mode, so that
$$U_{bs}(\gamma, \chi) \equiv \exp\left[ \gamma \exp(i\chi)\, \hat{o}_1 \hat{o}_2^\dagger - \gamma \exp(-i\chi)\, \hat{o}_1^\dagger \hat{o}_2 \right].$$
For example, $U_{bs}(\arccos(\sqrt{\tau}), \pi/2)$ corresponds to a symmetric beam splitter. We denote a phase shift transformation on mode j, $\exp(i \hat{o}_j^\dagger \hat{o}_j \chi)$, by Ups(χ). For a state defined, as in Eq. (2), with a total photon number S and relative phase ∆0, it is helpful to incorporate a phase factor exp(−iS∆0/2) into the normalisation (making the definition symmetric between the modes). We then adopt the following notation for a normalised Schrödinger cat state,
$$|\psi_{cat}(\Delta_0, \Lambda)\rangle \propto |\psi_\infty(\Delta_0)\rangle + \exp(i\Lambda)\, |\psi_\infty(-\Delta_0)\rangle,$$
having components with relative phases plus and minus ∆0, and a phase Λ for the superposition (with the overall normalisation constant assumed positive). Next, we elaborate on the sequence of operations performed by our N00N-state generator.
We assume the final state should have at least P photons, and that the correlations in the relative phase are ideal. For the first step, Circuit I, depicted in Fig. 1(b), implements the transformation, |l, r〉〈l, r| 3,4 Ubs conditioned on the detection of l photons in mode 3 and r photons in mode 4. A dual Fock state from the source evolves to a cat state according to,
$$\text{Circuit I}: \; |N\rangle_1 |N\rangle_2 |0\rangle_3 |0\rangle_4 \longrightarrow |\psi_{cat}(\Delta_0, l\pi)\rangle_{1,2}.$$
This cat state has relative phase components with values plus and minus $\Delta_0 \equiv 2 \arccos\sqrt{r/(l + r)}$, and total photon number 2N − l − r. Runs for which l or r are zero must be discarded. It has been shown that values for the relative phase are generated with approximately equal frequency across the range [25], and hence these failure events do not affect the scaling of the generator.
Next, Circuit II acts to transform the relative phase correlations, to plus and minus π/2, in every case. When r ≥ l, the relative phase correlations lie in the range [−π/2, π/2], and a 50:50 beam splitter acting on the principal modes corrects the relative phase correlations, while increasing the intensity in mode two (and decreasing it in mode one). To achieve the same outcome when l < r, we suppose that a phase shift of π is applied in advance (on either mode). This transforms the cat state generated by Circuit I as,
$$U_{ps}(\pi)\, |\psi_{cat}(\Delta_0(l, r), l\pi)\rangle \propto |\psi_{cat}(\Delta_0(r, l), r\pi)\rangle.$$
Next, a beam splitter, with transmittance $[1 - \cos(\Delta_0)] / [1 + \cos(\Delta_0)]$, transfers the difference of the intensities to the ancillary mode five. A circuit for implementing the variable beam splitter is given by the relation,
$$U_{bs}(\gamma)_{2,5} \equiv U_{bs}(\pi/4, \pi/2)_{2,5}\, U_{ps}(\gamma)_5\, U_{ps}(-\gamma)_2\, U_{bs}(\pi/4, -\pi/2)_{2,5}.$$
A photodetector measures Q photons in mode 5. Overall, Circuit II implements the transformation,
$$|Q\rangle\langle Q|_5\; U_{bs}\{\arccos[\tan(\Delta_0/2)]\}_{2,5}\; U_{bs}(\pi/4)_{1,2}.$$
The cat state evolves as,
$$\text{Circuit II}: \; |\psi_{cat}(\Delta_0, l\pi)\rangle \longrightarrow |\psi_{cat}[\pi/2,\, l\pi + (2N - l - r - Q)\pi/2]\rangle.$$
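The feed-forward step of Circuit II can be summarised as a small lookup from the detection record (l, r) to the settings of the variable beam splitter. The sketch below is illustrative only and assumes the relations read from the text: ∆0 = 2 arccos √(r/(l + r)), transmittance τ = [1 − cos(∆0)]/[1 + cos(∆0)], and angular parameter γ = arccos[tan(∆0/2)] with τ = cos²(γ). The function names are hypothetical, and the case that needs the advance π phase shift is rejected here rather than handled.

```python
import math


def relative_phase(l, r):
    """Relative phase Delta_0 of the cat-state components after Circuit I registers
    l and r photons (assumed form: 2 * arccos(sqrt(r / (l + r))))."""
    if l <= 0 or r <= 0:
        raise ValueError("runs with l = 0 or r = 0 are discarded")
    return 2.0 * math.acos(math.sqrt(r / (l + r)))


def correction_settings(l, r):
    """Return (delta0, transmittance, gamma) for the variable beam splitter that
    moves the excess intensity of mode two into the ancillary mode."""
    if r < l:
        # The text applies a phase shift of pi in advance for the other ordering;
        # that branch is deliberately left out of this sketch.
        raise ValueError("this sketch only covers r >= l")
    delta0 = relative_phase(l, r)
    c = math.cos(delta0)
    transmittance = (1.0 - c) / (1.0 + c)
    # Clamp against rounding: tan(delta0 / 2) can exceed 1 by one ulp when l == r.
    gamma = math.acos(min(1.0, math.tan(delta0 / 2.0)))
    return delta0, transmittance, gamma


if __name__ == "__main__":
    for l, r in ((1, 3), (2, 2), (3, 5)):
        delta0, tau, gamma = correction_settings(l, r)
        print(f"(l, r)=({l}, {r})  delta0={delta0:.4f}  tau={tau:.4f}  gamma={gamma:.4f}")
```

A useful sanity check on these assumed relations is that the two descriptions of the beam splitter agree, since tan²(∆0/2) = [1 − cos(∆0)]/[1 + cos(∆0)]; for l = r the transmittance is 1 and no photons are diverted.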
We derived a full probability distribution for the outcomes of the photodetection performed by Circuit II,
$$\mathrm{Prob}(Q = 0, \cdots, S-1) = \frac{1}{1 + (-1)^l \cos^S(\Delta_0)}\; {}^{S}C_{Q}\, [1 - \cos(\Delta_0)]^{S-Q} \cos^{Q}(\Delta_0),$$
$$\mathrm{Prob}(Q = S) = \frac{\left[1 + (-1)^l\right] \cos^S(\Delta_0)}{1 + (-1)^l \cos^S(\Delta_0)},$$
where S = 2N − l − r is the total photon number prior to detection, and C denotes a binomial coefficient. This probability distribution is approximately binomial, and the expected number of detections is S cos(∆0). If too many photons are lost in Circuits I and II the run must be aborted. The probability of this can be made small by taking P ≃ N(1 − f). In principle, excess photons can be removed by an additional process, similar to Circuit Finally, Circuit III implements the unitary transformation, $U_{bs}(\pi/4, \pi)_{1,2}\, U_{ps}(\pi/2)_2$. The corrected cat state evolves as,
$$\text{Circuit III}: \; |\psi_{cat}(\pi/2,\, l\pi + (2N - l - r - Q)\pi/2)\rangle \longrightarrow |P, 0\rangle + (-1)^l |0, P\rangle,$$
yielding the desired N00N state, with P = 2N − l − r − Q. It may be noted that the superposition phase for the N00N state at the output depends on the measurement record at the photodetectors. When l < r, the additional phase shift in Circuit II causes this phase to be rπ rather than lπ.
Next, we estimate the fidelities of the states produced by our N00N-state generator, and clarify its behaviour for large photon number. To do this, we first compute the fidelity for one component of a cat state generated by Circuit I, which we denote by |ψG(∆0)〉. We assume, as in Eq. (1), that the function G(X), describing the localisation of the relative phase, assumes its Gaussian asymptotic form. Note that the rate of localisation is faster when phase correlations evolve at more than one value. We assume that the state at the input is the dual Fock state |N〉1|N〉2, and that a total of D = l + r detections have occurred. Then, 〈ψG(∆0)|ψ∞ (∆0)〉 ∫ π/2 d∆cosS ∫ π/2 ∫ π/2 d∆d∆′ cosS ∆′2 +∆2 where S = 2N − D is the total photon number. This result was derived assuming that S and D are not small.
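The photodetection distribution quoted earlier in these notes for Circuit II can be tabulated directly. The sketch below is an illustration under an explicit assumption: the distribution is read as Prob(Q) = C(S, Q) [1 − cos ∆0]^(S−Q) cos^Q(∆0) / [1 + (−1)^l cos^S(∆0)] for Q < S, and Prob(Q = S) = [1 + (−1)^l] cos^S(∆0) / [1 + (−1)^l cos^S(∆0)], which is the form that makes the probabilities sum to one. The function names are hypothetical.

```python
import math


def detection_distribution(total_photons, delta0, l):
    """Probabilities for registering Q = 0, ..., S photons in the ancillary mode,
    under the assumed reading of the Circuit II distribution described above."""
    c = math.cos(delta0)
    sign = -1.0 if l % 2 else 1.0            # (-1)^l
    norm = 1.0 + sign * c ** total_photons
    probs = [
        math.comb(total_photons, q) * (1.0 - c) ** (total_photons - q) * c ** q / norm
        for q in range(total_photons)
    ]
    probs.append((1.0 + sign) * c ** total_photons / norm)   # the Q = S term
    return probs


def expected_detections(probs):
    """Mean number of photons removed by the ancillary photodetector."""
    return sum(q * p for q, p in enumerate(probs))


if __name__ == "__main__":
    S, delta0 = 20, math.pi / 3
    for l in (2, 3):
        probs = detection_distribution(S, delta0, l)
        print(f"l={l}  total={sum(probs):.12f}  mean={expected_detections(probs):.6f}  "
              f"S*cos(delta0)={S * math.cos(delta0):.6f}")
```

With this reading the distribution is binomial apart from the normalisation factor and the Q = S term, so the mean stays close to S cos(∆0) whenever cos^S(∆0) is small.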
The value for the fidelity depends only on the ratio of detections to input photons. For example, when D/2N is one half, F = 0.94, and when D/2N is two thirds, F = 0.98. To verify this result, we computed numerically exact values for the fidelity, 〈ψl,r|ψcat (∆0(l, r), lπ)〉 for a range of states |ψl,r〉 generated by Circuit I. For input state |3〉1|3〉2, the fidelity is 0.94 for (l, r) = (1, 2) and (2, 1), and anomalously it is 1 for (l, r) = (1, 1). For input state |5〉1|5〉2 the values are 0.94 and 0.96 when l + r = 5, and range from 0.96 to 1 when l + r = 6, while for input state |15〉1|15〉2 the values range from 0.92 to 0.96 when l+ r = 15, and from 0.96 to 0.99 when l + r = 20. Finally, we performed a complete numerical simulation of the N00N -state generator, to verify that Circuits I, II and III work together as predicted. In particular, it was necessary to check that Circuits II and Circuit III func- tion as expected when the relative phase correlations for the cat states are not perfectly well-defined. The results are shown in Fig. 3. Each point in the plots corresponds to a particular choice of input state, and measurement by Circuit I. The height corresponds to the expected pho- ton number for the output N00N state, and the color to its fidelity. Averages are taken over all possible out- comes to the third photodetection performed by Circuit II. For comparison, the mesh shows the predictions of the preceding analysis. Good agreement is seen between these analytical predictions and the numerical results. However, inspection of individual outcomes in Circuit II reveals that the high fidelities are not maintained in ev- ery case. Roughly speaking, improbable outcomes were often found to have low fidelity. 0.88 0.9 0.92 0.94 0.96 0.98 1 Fidelity FIG. 3: Fidelities (color) and photon number (vertical axis) are displayed for outputs of our N00N states generator. Each point corresponds to a possible outcome to Circuit I, for which D photons are detected. 
Input states |N〉|N〉2 are considered for N up to 15. Going from left to right, D/2N is one third, one half and two thirds. ABSTRACT We show how an idealised measurement procedure can condense photons from two modes into one, and how, by feeding forward the results of the measurement, it is possible to generate efficiently superpositions of components for which only one mode is populated, commonly called ``N00N states''. For the basic procedure, sources of number states leak onto a beam splitter, and the output ports are monitored by photodetectors. We find that detecting a fixed fraction of the input at one output port suffices to direct the remainder to the same port with high probability, however large the initial state. When instead photons are detected at both ports, Schr\"{o}dinger cat states are produced. We describe a circuit for making the components of such a state orthogonal, and another for subsequent conversion to a N00N state. Our approach scales exponentially better than existing proposals. Important applications include quantum imaging and metrology. <|endoftext|><|startoftext|> Introduction We are interested in a finite branch local solution to the sixth Painlevé equation around a fixed singular point. We show that every such solution is in fact an algebraic branch solution (see Definition 1.1 for the terminology). In particular a global solution is an algebraic solution if and only if it is finitely many-valued globally. Although the problem under study is local in nature, our solution to it relies on an effective combination of some global technologies and some local tools. 
The former includes the algebraic geometry of the sixth Painlevé equation, Riemann- Hilbert correspondence, geometry and dynamics on cubic surfaces, Kleinian singularities and their minimal resolutions [15, 16, 17, 18, 20], while the latter includes the power geometry of algebraic differential equation [5, 6, 7], which is a method of constructing formal solutions by means of Newton polygons, and the theory of nonlinear differential equations of “regular singular type” [10, 11], which discusses the convergence of formal solutions. Let us describe our main results in more detail. First we recall that the sixth Painlevé equation PVI(κ) is a Hamiltonian system of nonlinear differential equations ∂H(κ) = −∂H(κ) , (1) ∗Mathematics Subject Classification: 34M55, 37F10. †E-mail address: iwasaki@math.kyushu-u.ac.jp http://arxiv.org/abs/0704.0679v1 with time variable z ∈ Z := P1 − {0, 1,∞} and unknown functions q = q(z) and p = p(z), depending on complex parameters κ = (κ0, κ1, κ2, κ3, κ4) in the 4-dimensional affine space K := { κ = (κ0, κ1, κ2, κ3, κ4) ∈ C5 : 2κ0 + κ1 + κ2 + κ3 + κ4 = 1 }, (2) where the Hamiltonian H(κ) = H(q, p, z; κ) is given by z(z − 1)H(κ) = (q0q1qz)p2 − {κ1q1qz + (κ2 − 1)q0q1 + κ3q0qz}p+ κ0(κ0 + κ4)qz, with qν := q − ν for ν ∈ {0, 1, z}. Each of the points 0, 1, ∞ is called a fixed singular point. It is well known that equation (1) has the analytic Painlevé property, that is, any meromor- phic solution germ at a base point z ∈ Z can be continued meromorphically along any path in Z emanating from z. Thus a solution can branch only around a fixed singular point. We are interested in finite branch solutions around it, by which we mean the following. 
Definition 1.1 A finite branch solution to equation (1), say, around z = 0 is a local solution (q(z), p(z)) on a punctured disk D× = D − {0} centered at z = 0 such that its lift (q̃(z̃), p̃(z̃)) along some finite branched covering $\varphi : (\tilde{D}, \tilde{0}) \to (D, 0)$, $\tilde{z} \mapsto z = \tilde{z}^n$, around z = 0 is a single-valued meromorphic function on D̃× = D̃ − {0̃}. Such a solution is said to be an algebraic branch solution if it can be represented by a convergent Puiseux-Laurent expansion
$$q(z) = \sum_{i \in \mathbb{Z}} a_i z^{i/n}, \qquad p(z) = \sum_{i \in \mathbb{Z}} b_i z^{i/n}, \qquad (3)$$
with ai = bi = 0 for all sufficiently small i ≪ 0, namely, if the lift (q̃(z̃), p̃(z̃)) is a single-valued meromorphic function on D̃ with at most a pole at the origin z̃ = 0̃.
Problem 1.2 Is any finite branch solution to PVI(κ) an algebraic branch solution?
In this article we settle this problem in the affirmative, as stated in the following.
Theorem 1.3 Any finite branch solution to Painlevé VI around a fixed singular point is an algebraic branch solution. In particular a global solution is an algebraic solution if and only if it is finitely many-valued globally. These results are valid for all parameters κ ∈ K.
It is an interesting problem to consider algebraic solutions to Painlevé VI. Many algebraic solutions have been constructed in [1, 2, 3, 8, 13, 14, 24, 25], but a complete classification seems to be outstanding. We hope that Theorem 1.3 will play an important part in discussing this issue. The following remark explains what Theorem 1.3 signifies and why it is remarkable.
Remark 1.4 Logically, according to Definition 1.1, a finite branch solution (q(z), p(z)) around z = 0 may have a very transcendental singularity at z = 0, to the effect that its lift (q̃(z̃), p̃(z̃)) may have infinitely many poles in D̃× accumulating to the origin z̃ = 0̃, or even if such an accumulation phenomenon does not occur, it may have an essential singularity at z̃ = 0̃.
Rather surprisingly, however, Theorem 1.3 excludes the possibility for a finite branch solution to admit such transcendental phenomena. This result becomes more intriguing if we recall that wild behaviors of a generic solution to Painlevé VI have been observed in [9, 12, 22, 31, 32] and examples of solutions with infintely many poles accumulating to z̃ = 0 are given in [12, 31]; such a distribution of poles may be expected for a generic solution, though it is not rigorously verified yet to the author’s knowledge. Thus we can think that a finite branch solution is quite distinguished from generic solutions, necessarily being an algebraic branch solution. I II III {all algebraic branch solutions} i→֒ {all finite branch solutions} j→֒ {all solutions} {some algebraic branch solutions} i →֒ {all finite branch solutions} I ′ II Figure 1: Main idea for the proof of Theorem 1.3 The main idea for the proof of Theorem 1.3 is presented in Figure 1. We have natural inclusions i and j in the top line of Figure 1 and we wish to show that the injection i is in fact a surjection. Our strategy consists of the “upper bound part” and the “lower bound part”. (1) Upper bound part: In this part we investigate the inclusion j : II →֒ III in Figure 1, considering how the locus of finite branch solutions is included in the moduli space of all solutions. In other words, we make a confinement of the locus II in the entire space III. What we shall really do is not an upper bound estimation of this locus but rather a pinpoint identification of it. This is the main part of the article and we use the algebraic geometry of Painlevé VI, Riemann-Hilbert correspondence, geometry and dynamics on cubic surfaces, and minimal resolutions of Kleinian singularities [15, 16, 17, 18, 19, 20]. (2) Lower bound part: In this part we fill in the diagram of Figure 1 by adding the bottom line to the top one. 
We try to construct as many algebraic branch solutions as possible in order to make the set I ′ as large as possible. The construction is based on the power geometry technique developed in [5, 6, 7] and the convergence arguments in [10, 11]. We are done if the set I ′ is large enough to show that the injection i′ : I ′ →֒ II is in fact a surjection. This does not mean that we verify the equality I ′ = II directly. (If such a direct approach were feasible, then our problem would not be difficult from the beginning!) Instead, we prove it very indirectly based on the following idea. (3) Key trick: Suppose that a component A of I ′ injects into a component B of II. If the cardinalities of A and B are finite and the same, then the injection i′ : A →֒ B is in fact a surjection. If A and B are biholomorphic to C and the injection i′ : A →֒ B is holomorphic, then it must be a surjection because any holomorphic injection C →֒ C is a surjection (use Casorati-Weierstrass or Picard’s little theorem). The same argument holds true if C is replaced by C×, since any holomorphic injection C× →֒ C× is a surjection (lift it to the universal covering C →֒ C). These tricks enable us to identify the component A ⊂ I ′ with the component B ⊂ II. We show that each component involved is either of the three types mentioned above. Then we make this kind of argument componentwise to get an identification I ′ = II, which leads to the desired coincidence I ′ = I = II. In view of the way in which Theorem 1.3 is established, the power geometry technique provides us with an efficient method of identifying all finite branch solutions (up to Bäcklund transformations), which have now turned out to be algebraic branch solutions, by determining the leading terms of their Puiseux-Laurent expansions. In some sense this article is a counterpart of the previous paper [20] where an ergodic study of Painlevé VI is developed (see also the survey [21]). Put z1 = 0, z2 = 1, z3 = ∞. 
For each γ2 γ3 Figure 2: Three basic loops γ1, γ2, γ3 in Z = P 1 − {0, 1,∞} {i, j, k} = {1, 2, 3}, let γi be a loop in Z surrounding zi once anti-clockwise and leaving zj and zk outside as in Figure 2. Then the fundamental group π1(Z, z) is represented as π1(Z, z) = 〈 γ1, γ2, γ3 | γ1γ2γ3 = 1 〉. (4) A loop γ ∈ π1(Z, z) is said to be elementary if it is conjugate to γmi for some i ∈ {1, 2, 3} and m ∈ Z; otherwise, it is said to be non-elementary. The main theme of [20] is the dynamics of the nonlinear monodromy of PVI(κ) along a given loop γ. It is shown there that, along every non-elementary loop, the nonlinear monodromy is chaotic and the number of its periodic points grows exponentially as the period tends to infinity. On the other hand, it is Liouville integrable along an elementary loop, in the sense that it preserves a Lagrangian fibration. Now we notice that from the dynamical point of view the main problem of this article is nothing other than discussing the periodic points of the nonlinear monodromy along the basic loop γi, which is of course an elementary loop. In view of its integrable character, one may doubt if there is something very deep with this issue. As Theorem 1.3 and Remark 1.4 show, however, this issue is actually quite interesting from the function-theoretical point of view. The plan of this article is as follows. In §2 the phase space of Painlevé VI is introduced as a moduli space of stable parabolic connections. In §3 the Riemann-Hilbert correspondence from the moduli space to an affine cubic surface is formulated and its character as an analytic minimal resolution of Kleininan singularities is stated. In §4 the dynamical system on the cubic surface representing the nonlinear monodromy of Painlevé VI is formulated and some preliminary properties of it are given. In §5 we briefly review Bäcklund transformations and their relation to the Riemann-Hilbert correspondence. 
In §6 fixed points and periodic points of the dynamical system are discussed. A stratification of the parameter space K is also introduced in order to describe the singularities of the cubic surfaces. In §7 a case-by-case study of fixed points and periodic points is made according to the stratification, thereby a pinpoint identification of finite branch solutions is made on each stratum. In §8 power geometry of algebraic differential equations is applied to Painlevé VI in order to construct as many algebraic branch solutions as possible. In §9 we consider the inclusion of those solutions constructed in §8 into the moduli space of all finite branch solutions. After some preliminaries on Riccati solutions, we show that this inclusion is in fact a surjection, thereby complete the proof of Theorem 1.3. singularities t1 = 0 t2 = z t3 = 1 t4 = ∞ first exponent −λ1 −λ2 −λ3 −λ4 second exponent λ1 λ2 λ3 λ4 − 1 difference κ1 κ2 κ3 κ4 Table 1: Riemann scheme: κi is the difference of the second exponent from the first. 2 Phase Space Equation (1) is only a fragmentary appearance of a more intrinsic object constructed algebro- geometrically [16, 17, 18]. We review this construction following the expositions of [20, 21]. The sixth Painlevé dynamical system PVI(κ) is formulated as a holomorphic, uniform, transversal foliation on a fibration of certain smooth quasi-projective rational surfaces πκ : M(κ) → Z := P1 − {0, 1,∞}, whose fiber Mz(κ) := π−1κ (z) over z ∈ Z, called the space of initial conditions at time z, is realized as a moduli space of stable parabolic connections. The total space M(κ) is called the phase space of PVI(κ). In this formulation, the uniformity of the Painlevé foliation, in other words, the geometric Painlevé property of it is a natural consequence of a solution to the Riemann-Hilbert problem (see Theorem 3.5), especially of the properness of the Riemann- Hilbert correspondence [16]. 
Then equation (1) is just a coordinate expression of the foliation on an affine open subset of $M(κ)$, and the analytic Painlevé property for equation (1) is an immediate consequence of the geometric Painlevé property for the foliation and the algebraicity of the phase space $M(κ)$. Moreover there exists a natural compactification $M_z(κ) \hookrightarrow \overline{M}_z(κ)$ of the moduli space $M_z(κ)$ into a moduli space $\overline{M}_z(κ)$ of stable parabolic phi-connections.

Here we include a brief sketch of the terminology used in the last paragraph. A stable parabolic connection is a Fuchsian connection equipped with a parabolic structure on a (rank 2) vector bundle over $\mathbb{P}^1$ having a Riemann scheme as in Table 1, where the parabolic structure corresponds to the first exponents, which satisfies a sort of stability condition in geometric invariant theory. Here the parameter $κ_i$ stands for the difference of the second exponent from the first one at the regular singular point $t_i$. On the other hand, a stable parabolic phi-connection is a variant of stable parabolic connection allowing a "matrix-valued Planck constant" called a phi-operator $φ$ such that the generalized Leibniz rule

$$∇(fs) = df ⊗ φ(s) + f∇(s)$$

is satisfied, where the key point is that the field $φ$ may be degenerate or semi-classical. Then the moduli space $M_z(κ)$ can be compactified by adding some semi-classical objects, that is, some stable parabolic phi-connections with degenerate phi-operator $φ$. There is the following characterization of our moduli spaces (see Figure 3).

Theorem 2.1 ([16, 17, 18])
(1) The compactified moduli space $\overline{M}_z(κ)$ is isomorphic to an 8-point blow-up of the Hirzebruch surface $Σ_2 → \mathbb{P}^1$ of degree 2.
(2) $\overline{M}_z(κ)$ has a unique effective anti-canonical divisor $Y_z(κ)$, which is given by

$$Y_z(κ) = 2E_0 + E_1 + E_2 + E_3 + E_4, \tag{5}$$

where $E_0$ is the strict transform of the section at infinity and $E_i$ ($i = 1, 2, 3, 4$) is the strict transform of the fiber over the point $t_i ∈ \mathbb{P}^1$ of the Hirzebruch surface $Σ_2 → \mathbb{P}^1$.
(3) The support of the divisor $Y_z(κ)$ is exactly the locus where the phi-operator $φ$ is degenerate, with the coefficients of formula (5) being the ranks of degeneracy of $φ$. In particular, $M_z(κ) = \overline{M}_z(κ) - Y_z(κ)$.

[Figure 3: Nonlinear monodromy $γ_* : M_z(κ) \circlearrowleft$ along a loop $γ ∈ π_1(Z, z)$]

This theorem implies that $M_z(κ)$ is a moduli-theoretical realization of the space of initial conditions for $P_{VI}(κ)$ constructed "by hand" in [26], that $\overline{M}_z(κ)$ is a generalized Halphen surface of type $D_4^{(1)}$ in [30], and that $(\overline{M}_z(κ), Y_z(κ))$ is an Okamoto-Painlevé pair of type $\tilde{D}_4$ in [28]. Since the Painlevé foliation has the geometric Painlevé property [16], each loop $γ ∈ π_1(Z, z)$ admits global horizontal lifts along the foliation and induces an automorphism

$$γ_* : M_z(κ) → M_z(κ), \quad Q ↦ Q', \tag{6}$$

called the nonlinear monodromy along the loop $γ$ (see Figure 3). Note that a fixed point or a periodic point of the map $γ_* : M_z(κ) \circlearrowleft$ can be identified with a solution germ at $z$ which is single-valued or finitely many-valued along the loop $γ$, respectively.

3 Riemann-Hilbert Correspondence

Generally speaking, a Riemann-Hilbert correspondence is the map from a moduli space of flat connections to a moduli space of monodromy representations, sending a connection to its monodromy.

[Figure 4: Four loops $C_1$, $C_2$, $C_3$, $C_4$ in $\mathbb{P}^1 - \{0, z, 1, ∞\}$]

In our situation an appropriate Riemann-Hilbert correspondence

$$RH_{z,κ} : M_z(κ) → R_z(a), \quad Q ↦ ρ, \tag{7}$$

is formulated in [16, 17, 18].
For each $a = (a_1, a_2, a_3, a_4) ∈ A := \mathbb{C}^4$, let $R_z(a)$ denote the moduli space of Jordan equivalence classes of linear monodromy representations

$$ρ : π_1(\mathbb{P}^1 - \{0, z, 1, ∞\}, *) → SL_2(\mathbb{C}),$$

with the prescribed local monodromy data $\mathrm{Tr}\, ρ(C_i) = a_i$ ($i = 1, 2, 3, 4$), where $C_i$ is a loop as in Figure 4. Any stable parabolic connection $Q ∈ M_z(κ)$, restricted to $\mathbb{P}^1 - \{0, z, 1, ∞\}$, induces a flat connection and determines the Jordan equivalence class $ρ ∈ R_z(a)$ of its monodromy representations, where the correspondence of parameters $κ ↦ a$ is described as follows. If

$$b_i = \begin{cases} \exp(\sqrt{-1}\, π κ_i) & (i = 0, 1, 2, 3), \\ -\exp(\sqrt{-1}\, π κ_4) & (i = 4), \end{cases} \tag{8}$$

then $b = (b_0, b_1, b_2, b_3, b_4)$ belongs to the multiplicative space

$$B := \{\, b = (b_0, b_1, b_2, b_3, b_4) ∈ (\mathbb{C}^×)^5 : b_0^2 b_1 b_2 b_3 b_4 = 1 \,\}.$$

The Riemann scheme in Table 1 then implies that the monodromy matrix $ρ(C_i)$ has an eigenvalue $b_i$ for each $i = 1, 2, 3, 4$. Since $ρ(C_i) ∈ SL_2(\mathbb{C})$, its trace $a_i = \mathrm{Tr}\, ρ(C_i)$ is given by

$$a_i = b_i + b_i^{-1} \quad (i = 1, 2, 3, 4). \tag{9}$$

Given any $θ = (θ_1, θ_2, θ_3, θ_4) ∈ Θ := \mathbb{C}^4_θ$, consider the affine cubic surface

$$S(θ) = \{\, x ∈ \mathbb{C}^3_x : f(x, θ) := x_1 x_2 x_3 + x_1^2 + x_2^2 + x_3^2 - θ_1 x_1 - θ_2 x_2 - θ_3 x_3 + θ_4 = 0 \,\}.$$

Then there exists an isomorphism of affine algebraic surfaces

$$R_z(a) → S(θ), \quad ρ ↦ x = (x_1, x_2, x_3), \quad \text{with } x_i = \mathrm{Tr}\, ρ(C_j C_k) \text{ for } \{i, j, k\} = \{1, 2, 3\},$$

where the correspondence of parameters $a ↦ θ$ is given by

$$θ_i = \begin{cases} a_i a_4 + a_j a_k & (\{i, j, k\} = \{1, 2, 3\}), \\ a_1 a_2 a_3 a_4 + a_1^2 + a_2^2 + a_3^2 + a_4^2 - 4 & (i = 4). \end{cases} \tag{10}$$

[Figure 5: Dynkin diagram and Cartan matrix of type $D_4^{(1)}$, with nodes $w_0, w_1, w_2, w_3, w_4$ and

$$C = (c_{ij}) = \begin{pmatrix} -2 & 1 & 1 & 1 & 1 \\ 1 & -2 & 0 & 0 & 0 \\ 1 & 0 & -2 & 0 & 0 \\ 1 & 0 & 0 & -2 & 0 \\ 1 & 0 & 0 & 0 & -2 \end{pmatrix}$$]

The composition of the sequence $κ ↦ b ↦ a ↦ θ$ of the three maps (8), (9) and (10) is referred to as the Riemann-Hilbert correspondence in the parameter level [16] and is denoted by

$$rh : K → Θ. \tag{11}$$

Then the Riemann-Hilbert correspondence (7) is reformulated as a holomorphic map

$$RH_{z,κ} : M_z(κ) → S(θ) \quad \text{with } θ = rh(κ). \tag{12}$$

The map (11) admits a remarkable affine Weyl group structure [16, 19], from which the Bäcklund transformations of Painlevé VI emerge [15].
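The chain of parameter maps (8), (9), (10) is easy to evaluate numerically, which can help when checking special values of $θ$. The following is a minimal Python sketch, not part of the cited constructions; the function names `kappa_to_b`, `b_to_a`, `a_to_theta`, `rh` and `f` are hypothetical labels chosen here, and the sketch does not check that $κ$ satisfies the linear relation defining $K$.

```python
import cmath

def kappa_to_b(kappa):
    """Map (8): kappa = (k0, k1, k2, k3, k4) -> b = (b0, b1, b2, b3, b4)."""
    b = [cmath.exp(1j * cmath.pi * k) for k in kappa]
    b[4] = -b[4]  # the entry with index 4 carries an extra minus sign
    return b

def b_to_a(b):
    """Map (9): a_i = b_i + 1/b_i for i = 1, 2, 3, 4 (b0 is not used)."""
    return [bi + 1 / bi for bi in b[1:]]

def a_to_theta(a):
    """Map (10): a = (a1, a2, a3, a4) -> theta = (theta1, ..., theta4)."""
    a1, a2, a3, a4 = a
    return [
        a1 * a4 + a2 * a3,
        a2 * a4 + a3 * a1,
        a3 * a4 + a1 * a2,
        a1 * a2 * a3 * a4 + a1**2 + a2**2 + a3**2 + a4**2 - 4,
    ]

def rh(kappa):
    """Composite kappa -> b -> a -> theta, i.e. the map (11)."""
    return a_to_theta(b_to_a(kappa_to_b(kappa)))

def f(x, theta):
    """Defining polynomial of the affine cubic surface S(theta)."""
    x1, x2, x3 = x
    t1, t2, t3, t4 = theta
    return (x1 * x2 * x3 + x1**2 + x2**2 + x3**2
            - t1 * x1 - t2 * x2 - t3 * x3 + t4)
```

For instance, `rh((0, 0, 0, 0, 1))` evaluates to $(8, 8, 8, 28)$ up to rounding, and the point $x = (2, 2, 2)$ then satisfies $f(x, θ) = 0$.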
In view of formula (2) the affine space $K$ can be identified with the linear space $\mathbb{C}^4$ by the forgetful isomorphism

$$K → \mathbb{C}^4, \quad κ = (κ_0, κ_1, κ_2, κ_3, κ_4) ↦ (κ_1, κ_2, κ_3, κ_4),$$

where the latter space $\mathbb{C}^4$ is equipped with the standard (complex) Euclidean inner product. For each $i ∈ \{0, 1, 2, 3, 4\}$, let $w_i : K → K$, $κ ↦ κ'$, be the orthogonal reflection in the hyperplane $\{κ ∈ K : κ_i = 0\}$, which is explicitly represented as

$$κ_j' = κ_j + κ_i c_{ij} \quad (i, j ∈ \{0, 1, 2, 3, 4\}), \tag{13}$$

where $C = (c_{ij})$ is the Cartan matrix of type $D_4^{(1)}$ given in Figure 5. Then the group generated by $w_0, w_1, w_2, w_3, w_4$ is an affine Weyl group of type $D_4^{(1)}$,

$$W(D_4^{(1)}) = \langle w_0, w_1, w_2, w_3, w_4 \rangle \curvearrowright K,$$

corresponding to the Dynkin diagram in Figure 5. The reflecting hyperplanes of all reflections in the group $W(D_4^{(1)})$ are given by the affine linear relations

$$κ_i = m, \qquad κ_1 ± κ_2 ± κ_3 ± κ_4 = 2m + 1 \quad (i ∈ \{1, 2, 3, 4\},\ m ∈ \mathbb{Z}),$$

where the signs $±$ may be chosen arbitrarily. Let Wall be the union of all these hyperplanes. Then the affine Weyl group structure on (11) is stated as follows [16] (see Figure 6).

Lemma 3.1 In terms of $b ∈ B$, the discriminant $Δ(θ)$ of the cubic surfaces $S(θ)$ factors as

$$Δ(θ) = \prod_{l=1}^{4} (b_l - b_l^{-1})^2 \prod_{ε ∈ \{±1\}^4} (b^ε - 1), \tag{14}$$

where we put $b^ε = b_1^{ε_1} b_2^{ε_2} b_3^{ε_3} b_4^{ε_4}$ for each quadruple sign $ε = (ε_1, ε_2, ε_3, ε_4) ∈ \{±1\}^4$. The Riemann-Hilbert correspondence in the parameter level (11) is a branched $W(D_4^{(1)})$-covering ramifying along Wall and mapping Wall onto the discriminant locus $Δ(θ) = 0$ in $Θ$.

[Figure 6: Riemann-Hilbert correspondence in the parameter level, mapping Wall in $K$-space onto the locus $Δ(θ) = 0$ in $Θ$-space]

The singularity structure of the cubic surfaces $S(θ)$ can be described in terms of the stratification of $K$ by proper Dynkin subdiagrams, which we now define.

Definition 3.2 Let $\mathcal{I}$ be the set of all proper subsets of $\{0, 1, 2, 3, 4\}$ including the empty set $∅$. For each element $I ∈ \mathcal{I}$, we put

    $\overline{K}_I$ = the $W(D_4^{(1)})$-translates of the set $\{κ ∈ K : κ_i = 0\ (i ∈ I)\}$,
    $D_I$ = the Dynkin subdiagram of $D_4^{(1)}$ that has nodes • exactly in $I$.
Let $K_I$ be the set obtained from $\overline{K}_I$ by removing the sets $\overline{K}_J$ with $\#J = \#I + 1$. Then it turns out that we have either $K_I = K_{I'}$ or $K_I ∩ K_{I'} = ∅$ for any distinct subsets $I, I' ∈ \mathcal{I}$ (see Remark 3.3). So we can think of the stratification of $K$ by the subsets $K_I$ ($I ∈ \mathcal{I}$), called the $W(D_4^{(1)})$-stratification, where each $K_I$ is referred to as a $W(D_4^{(1)})$-stratum. For example, if $I = ∅$, one has the big open stratum $K_∅ = K - \mathrm{Wall}$. Other examples of $W(D_4^{(1)})$-strata are given in Figure 7. The diagram $D_I$ encodes not only its underlying abstract Dynkin type but also the inclusion pattern $D_I \hookrightarrow D_4^{(1)}$, a kind of marking. The abstract Dynkin type of $D_I$ is denoted by $\mathrm{Dynk}(I)$. All the feasible abstract Dynkin types are tabulated in Table 2. There is a mistake in the definition of $K_I$ in [16, Definition 9.3] and [21], which is now corrected in Definition 3.2. (As for [16], correction may be possible before it is published.)

[Figure 7: Examples of $W(D_4^{(1)})$-strata: type $D_4$ with $I = \{0, 1, 2, 3\}$, type $A_1^{⊕4}$ with $I = \{1, 2, 3, 4\}$, type $A_3$ with $I = \{0, 1, 2\}$]

Table 2: Feasible abstract Dynkin types

    number of nodes         4           3           2           1      0
    abstract Dynkin type    D_4         A_3         A_2         A_1    ∅
                            A_1^{⊕4}    A_1^{⊕3}    A_1^{⊕2}    -      -

Remark 3.3 Let $I$ and $I'$ be distinct elements of $\mathcal{I}$. If $\mathrm{Dynk}(I) ≠ \mathrm{Dynk}(I')$, then $K_I ∩ K_{I'} = ∅$. On the other hand, if $K_I = K_{I'}$ then $\mathrm{Dynk}(I) = \mathrm{Dynk}(I')$ must be of abstract type $A_1$ or $A_2$.
(1) There is a unique $W(D_4^{(1)})$-stratum of abstract type $∅$, or $A_1$, or $A_2$, or $A_1^{⊕4}$.
(2) There are six $W(D_4^{(1)})$-strata of abstract type $A_1^{⊕2}$ or $A_3$.
(3) There are four $W(D_4^{(1)})$-strata of abstract type $A_1^{⊕3}$ or $D_4$.

Example 3.4 We consider the $W(D_4^{(1)})$-strata of abstract types $A_1^{⊕4}$ and $D_4$.
(1) The unique $W(D_4^{(1)})$-stratum of abstract type $A_1^{⊕4}$ exactly corresponds to the value $θ = (0, 0, 0, -4)$. A parameter $κ ∈ K$ lies in this stratum if and only if either
    (a) $κ_1, κ_2, κ_3, κ_4 ∈ \mathbb{Z}$, $κ_1 + κ_2 + κ_3 + κ_4 ∈ 2\mathbb{Z}$; or
    (b) $κ_1, κ_2, κ_3, κ_4 ∈ \mathbb{Z} + 1/2$.
(2) The four $W(D_4^{(1)})$-strata of abstract type $D_4$ exactly correspond to the values $θ = (8ε_1, 8ε_2, 8ε_3, 28)$, where $ε = (ε_1, ε_2, ε_3) ∈ \{±1\}^3$ ranges over all triple signs such that $ε_1 ε_2 ε_3 = 1$. A parameter $κ ∈ K$ lies in the union of these $W(D_4^{(1)})$-strata if and only if $κ_1, κ_2, κ_3, κ_4 ∈ \mathbb{Z}$, $κ_1 + κ_2 + κ_3 + κ_4 ∈ 2\mathbb{Z} + 1$.

With this stratification, we have a very neat solution to the Riemann-Hilbert problem.

Theorem 3.5 ([16, 17, 18]) Given any $κ ∈ K$, put $θ = rh(κ) ∈ Θ$. Then,
(1) if $κ ∈ K_I$ then $S(θ)$ has Kleinian singularities of Dynkin type $D_I$,
(2) the Riemann-Hilbert correspondence (12) is a proper surjective map that is an analytic minimal resolution of Kleinian singularities.

If $κ ∈ K - \mathrm{Wall}$ then the surface $S(θ)$ is smooth and $RH_{z,κ}$ is a biholomorphism, while if $κ ∈ \mathrm{Wall}$, it is not a biholomorphism but only gives a resolution of singularities (proper and surjective, but not injective). For example, see Figure 8 for the case $κ = (0, 0, 0, 0, 1)$ where a singularity of type $D_4$ occurs. In the latter case, however, if we take a standard algebraic minimal resolution of Kleinian singularities as constructed by Brieskorn [4] and others,

$$φ : \tilde{S}(θ) → S(θ), \tag{15}$$

then we can lift the Riemann-Hilbert correspondence (12) to have a commutative diagram

$$\begin{array}{ccc} M_z(κ) & \xrightarrow{\ \widetilde{RH}_{z,κ}\ } & \tilde{S}(θ) \\ \| & & \downarrow φ \\ M_z(κ) & \xrightarrow{\ RH_{z,κ}\ } & S(θ). \end{array} \tag{16}$$

[Figure 8: Resolution of singularities by the Riemann-Hilbert correspondence $RH_{z,κ}$ from the moduli space $M_z(κ)$, with exceptional set $E_z(κ)$, to the cubic surface with a $D_4$ singularity]

The lifted Riemann-Hilbert correspondence $\widetilde{RH}_{z,κ}$ is a biholomorphism and hence gives a strict conjugacy between the nonlinear monodromy (6) of $P_{VI}(κ)$ and a certain automorphism

$$\tilde{g} : \tilde{S}(θ) → \tilde{S}(θ). \tag{17}$$

This latter map will be described explicitly in Section 4 (see Theorem 4.1). The singularity structure of the affine cubic surface $S(θ)$ is closely related to the Riccati solutions to $P_{VI}(κ)$ [16], where a Riccati solution is a particular solution that arises from the Riccati equation associated to a Gauss hypergeometric equation.
Let $E_z(κ) ⊂ M_z(κ)$ be the exceptional set of the resolution of singularities by the Riemann-Hilbert correspondence (12). Similarly, let $E(θ) ⊂ \tilde{S}(θ)$ be the exceptional set of the algebraic resolution of singularities (15).

Theorem 3.6 ([16, 29]) Equation $P_{VI}(κ)$ admits Riccati solutions if and only if $κ ∈ \mathrm{Wall}$. All Riccati solution germs at time $z ∈ Z$ are parametrized by the exceptional set $E_z(κ) ⊂ M_z(κ)$, which precisely corresponds to the exceptional set $E(θ) ⊂ \tilde{S}(θ)$ through the lifted Riemann-Hilbert correspondence (16).

For this reason we may refer to $E_z(κ)$ and $M_z^{\circ}(κ) := M_z(κ) - E_z(κ)$ as the Riccati locus and the non-Riccati locus of $M_z(κ)$ respectively. They are invariant under the action of the nonlinear monodromy (6). Corresponding to them, let $\mathrm{Sing}(θ)$ and $S^{\circ}(θ) := S(θ) - \mathrm{Sing}(θ)$ be the singular locus and the smooth locus of the cubic surface $S(θ)$ respectively.

Remark 3.7 Two remarks are in order at this stage.
(1) By Theorem 3.5 the Riemann-Hilbert correspondence (12) restricts to a biholomorphism

$$RH_{z,κ}^{\circ} : M_z^{\circ}(κ) → S^{\circ}(θ) \tag{18}$$

between the non-Riccati locus of $M_z(κ)$ and the smooth locus of $S(θ)$, while it collapses the Riccati locus $E_z(κ)$ to the singular locus $\mathrm{Sing}(θ)$. In order to resolve this degeneracy and obtain an isomorphism, we had to take the lifted Riemann-Hilbert correspondence (16), which induces an isomorphism between the exceptional sets $E_z(κ)$ and $E(θ)$.
(2) For the Riccati solutions the main problem of this article is trivial; if a Riccati solution is a finite branch solution around a fixed singular point, then it is an algebraic branch solution, because the Riccati solution is (essentially) the logarithmic derivative of a Gauss hypergeometric function. Thus we may restrict our attention to the non-Riccati locus.

4 Dynamics on Cubic Surface

We shall describe the strict conjugacy (17) of the nonlinear monodromy (6).
For a cyclic permutation $(i, j, k)$ of $(1, 2, 3)$ we define an isomorphism $g_i : S(θ) → S(θ')$, $(x, θ) ↦ (x', θ')$, by

$$g_i : (x_i', x_j', x_k', θ_i', θ_j', θ_k') = (θ_j - x_j - x_k x_i,\ x_i,\ x_k,\ θ_j,\ θ_i,\ θ_k). \tag{19}$$

Through the resolution of singularities (15), the map $g_i$ is uniquely lifted to an isomorphism $\tilde{g}_i : \tilde{S}(θ) → \tilde{S}(θ')$ ($i = 1, 2, 3$). We remark that the square $g_i^2$ is an automorphism of $S(θ)$ with $\tilde{g}_i^2$ being its lift to $\tilde{S}(θ)$.

Theorem 4.1 ([16]) For each $i ∈ \{1, 2, 3\}$ the nonlinear monodromy $γ_{i*} : M_z(κ) \circlearrowleft$ along the $i$-th basic loop $γ_i$ is strictly conjugated to the automorphism $\tilde{g}_i^2 : \tilde{S}(θ) \circlearrowleft$ via the lifted Riemann-Hilbert correspondence (16). More generally, if $γ ∈ π_1(Z, z)$ is represented by $γ = γ_{i_1}^{ε_1} γ_{i_2}^{ε_2} \cdots γ_{i_n}^{ε_n}$ with $(i_1, \ldots, i_n) ∈ \{1, 2, 3\}^n$ and $(ε_1, \ldots, ε_n) ∈ \{±1\}^n$, then the map (17) is given by $\tilde{g} = \tilde{g}_{i_1}^{2ε_1} \tilde{g}_{i_2}^{2ε_2} \cdots \tilde{g}_{i_n}^{2ε_n}$.

Let $\widetilde{\mathrm{Fix}}_j(θ)$ be the set of all fixed points of the transformation $\tilde{g}_j^2 : \tilde{S}(θ) \circlearrowleft$. Moreover, for any integer $n > 1$, let $\widetilde{\mathrm{Per}}_j(θ; n)$ be the set of all periodic points of prime period $n$ of the transformation $\tilde{g}_j^2 : \tilde{S}(θ) \circlearrowleft$. Theorem 4.1 then implies that all single-valued solution germs and all $n$-branch solution germs to $P_{VI}(κ)$ around the fixed singular point $z_j$ are parametrized by the sets $\widetilde{\mathrm{Fix}}_j(θ)$ and $\widetilde{\mathrm{Per}}_j(θ; n)$ respectively. By Remark 3.7, considering $\widetilde{\mathrm{Fix}}_j(θ)$ and $\widetilde{\mathrm{Per}}_j(θ; n)$ upstairs is the same thing as considering $\mathrm{Fix}_j(θ)$ and $\mathrm{Per}_j(θ; n)$ downstairs, except for the exceptional locus upstairs and the singular locus downstairs. Here $\mathrm{Fix}_j(θ)$ and $\mathrm{Per}_j(θ; n)$ denote the set of all fixed points and the set of all periodic points of prime period $n$ of the transformation $g_j^2 : S(θ) \circlearrowleft$ downstairs. In order to make the situation more transparent, we begin by investigating simultaneous fixed points of $g_1^2, g_2^2, g_3^2$ downstairs.

Theorem 4.2 If $\mathrm{Fix}(θ)$ is the set of all simultaneous fixed points of $g_1^2, g_2^2, g_3^2 : S(θ) \circlearrowleft$, then

$$\mathrm{Fix}(θ) = \mathrm{Sing}(θ). \tag{20}$$

Proof.
A point $x ∈ S(θ)$ is a singular point of the surface $S(θ)$ if and only if its gradient vector field $y(x, θ) = (y_1(x, θ), y_2(x, θ), y_3(x, θ))$ vanishes at the point $x$, where

$$y_i(x, θ) := \frac{∂f}{∂x_i}(x, θ) = 2x_i + x_j x_k - θ_i. \tag{21}$$

On the other hand, an inspection of formula (19) readily shows that $x ∈ S(θ)$ is a simultaneous fixed point of $g_1^2, g_2^2, g_3^2$ if and only if $x$ is a common root of the equations

$$f(x, θ) = y_1(x, θ) = y_2(x, θ) = y_3(x, θ) = 0. \tag{22}$$

Then the equality (20) immediately follows from these observations. ✷

As is announced in [16], this theorem yields a characterization of the rational solutions.

Corollary 4.3 Any single-valued global solution to $P_{VI}(κ)$ is a rational Riccati solution.

Proof. If a single-valued solution $Q ∈ M_z(κ)$ belongs to the non-Riccati locus, $Q ∈ M_z^{\circ}(κ)$, then the Riemann-Hilbert correspondence (18) sends $Q$ to a smooth point $x ∈ S^{\circ}(θ)$. Since the single-valued solution $Q$ is a simultaneous fixed point of the nonlinear monodromies $γ_{1*}, γ_{2*}, γ_{3*}$, the corresponding point $x$ must lie in $\mathrm{Fix}(θ)$. Then Theorem 4.2 implies that $x ∈ \mathrm{Sing}(θ)$, which contradicts the fact that $x ∈ S^{\circ}(θ)$. Hence any single-valued solution is a Riccati solution. Since any Riccati solution is (essentially) the logarithmic derivative of a Gauss hypergeometric function, any single-valued Riccati solution must be a rational solution. ✷

All the rational solutions to Painlevé VI are classified in [25]. We come back to our discussion downstairs and give a simple characterization of the sets $\mathrm{Fix}_j(θ)$ and $\mathrm{Per}_j(θ; n)$.

Lemma 4.4 Let $x = (x_1, x_2, x_3) ∈ S(θ)$ be any point and let $n$ be any integer $> 1$.
(1) $x ∈ \mathrm{Fix}_j(θ)$ if and only if $x$ is a root of the equations

$$f(x, θ) = y_j(x, θ) = y_k(x, θ) = 0. \tag{23}$$

(2) $x ∈ \mathrm{Per}_j(θ; n)$ if and only if there exists an integer $0 < m < n$ coprime to $n$ such that

$$f(x, θ) = 0, \qquad x_i = 2\cos(πm/n). \tag{24}$$

Proof. We put $(x', θ') = g_j(x, θ)$ and $y' = y(x', θ')$. Then formula (19) yields $y_i' = y_i - x_j y_k$, $y_j' = -y_k$, $y_k' = y_j - x_i y_k$.
(25) For each integer $n ∈ \mathbb{Z}$ we write $(x^{(n)}, θ^{(n)}) = g_j^n(x, θ)$ and $y^{(n)} = y(x^{(n)}, θ^{(n)})$. From formulas (19) and (25), we can easily obtain three recurrence relations

$$y_j^{(n+2)} + x_i\, y_j^{(n+1)} + y_j^{(n)} = 0, \tag{26}$$
$$x_j^{(n+2)} - x_j^{(n)} = y_j^{(n+2)}, \tag{27}$$
$$x_k^{(n+1)} = x_j^{(n)}. \tag{28}$$

The characteristic equation of the recurrence relation (26) is the quadratic equation

$$λ^2 + x_i λ + 1 = 0, \tag{29}$$

the roots of which are denoted by $α$ and $β = α^{-1}$. Since $αβ = 1$, we may and shall assume that $|α| ≥ 1 ≥ |β| > 0$ in the sequel. The discussion is divided into two cases.

Case $x_i ∈ \mathbb{C} - \{±2\}$: In this case, the roots $α$ and $β$ are distinct and different from $±1$, and the recurrence relation (26) is solved as

$$y_j^{(n)} = \frac{β^n(α y_j + y_k) - α^n(β y_j + y_k)}{α - β}.$$

Then it follows from (27) and (28) that the sequences $x_j^{(n)}$ and $x_k^{(n)}$ are determined as

$$x_j^{(2n)} = x_k^{(2n+1)} = p\, α^{2n} + q\, β^{2n} + r_1, \qquad x_j^{(2n+1)} = x_k^{(2n+2)} = p\, α^{2n+1} + q\, β^{2n+1} + r_2, \tag{30}$$

where the constants $p$, $q$, $r_1$ and $r_2$ are given by

$$p = -\frac{α^2(β y_j + y_k)}{(α - β)(α^2 - 1)}, \quad q = \frac{β^2(α y_j + y_k)}{(α - β)(β^2 - 1)}, \quad r_1 = x_j - p - q, \quad r_2 = x_j' - αp - βq.$$

Notice that $p = q = 0$ if and only if $x$ satisfies equations (23). Indeed, the condition $p = q = 0$ is equivalent to $α y_j + y_k = β y_j + y_k = 0$, which is equivalent to the condition $y_j = y_k = 0$, because the roots $α$ and $β$ are distinct. Now we assume that $x$ is a root of equations (23). Then (30) implies that the sequence $x^{(n)}$ is periodic of period two, that is, $x$ is a fixed point of $g_j^2$. Next we assume that $x$ is not a root of equations (23). If $x$ is a periodic point of $g_j^2$ of prime period $n ≥ 1$, then (30) yields

$$x_j^{(2n)} - x_j = (α^{2n} - 1)(p - q β^{2n}) = 0, \qquad x_k^{(2n+2)} - x_k^{(2)} = (α^{2n} - 1)(pα - q β^{2n+1}) = 0.$$

Here it cannot happen that $p - q β^{2n} = pα - q β^{2n+1} = 0$. Indeed, otherwise, we have $p = q β^{2n}$ and $q(1 - β^2) = 0$. Since at least one of $p$ and $q$ is nonzero, we have $β ∈ \{±1\}$ and hence $x_i ∈ \{±2\}$, which contradicts the assumption that $x_i ∉ \{±2\}$. Therefore, $α^{2n} = 1$, that is, $α$ is a primitive $2n$-th root of unity. Note that $n ≥ 2$ since $α ∉ \{±1\}$.
Thus there is an integer $0 < m < n$ coprime to $n$ such that $α = \exp(πim/n)$ and so $x_i = α + α^{-1} = 2\cos(πm/n)$, which leads to condition (24). Conversely, if condition (24) is satisfied, then it is easy to see that $x$ is a periodic point of $g_j^2$ of prime period $n$.

Case $x_i ∈ \{±2\}$: In this case we have $x_i = -2ε$ for some sign $ε ∈ \{±1\}$ and hence equation (29) has a double root $α = β = ε$. Then the recurrence equation (26) is solved as

$$y_j^{(n)} = ε^n \{\, y_j - n(y_j + ε y_k) \,\}.$$

If the sequence $x^{(n)}$ is periodic, then so is the sequence $y_j^{(n)}$. This is the case if and only if $y_j + ε y_k = 0$. Conversely, if this condition is satisfied, then we have $y_j^{(n)} = ε^n y_j$. Substituting this equation into (27) yields

$$x_j^{(2n)} = x_k^{(2n+1)} = x_j + n y_j, \qquad x_j^{(2n+1)} = x_k^{(2n+2)} = x_j' + ε n y_j.$$

Hence the sequence $x^{(2n)}$ is periodic if and only if $y_j = y_k = 0$, namely, if and only if $x$ is a root of (23). In this case $x$ is a fixed point of $g_j^2$. ✷

In order to give the relation between the fixed points upstairs and those downstairs, we put

$$\mathrm{Fix}_j^{\circ}(θ) := \mathrm{Fix}_j(θ) - \mathrm{Sing}(θ), \qquad \widetilde{\mathrm{Fix}}_j^{\circ}(θ) := \widetilde{\mathrm{Fix}}_j(θ) - E(θ), \qquad \widetilde{\mathrm{Fix}}_j^{e}(θ) := \widetilde{\mathrm{Fix}}_j(θ) ∩ E(θ).$$

For the periodic points of prime period $n > 1$, we define $\mathrm{Per}_j^{\circ}(θ; n)$, $\widetilde{\mathrm{Per}}_j^{\circ}(θ; n)$ and $\widetilde{\mathrm{Per}}_j^{e}(θ; n)$ in a similar manner. Then there exist direct sum decompositions

$$\widetilde{\mathrm{Fix}}_j(θ) = \widetilde{\mathrm{Fix}}_j^{\circ}(θ) ⨿ \widetilde{\mathrm{Fix}}_j^{e}(θ), \qquad \widetilde{\mathrm{Per}}_j(θ; n) = \widetilde{\mathrm{Per}}_j^{\circ}(θ; n) ⨿ \widetilde{\mathrm{Per}}_j^{e}(θ; n),$$

where the exceptional components $\widetilde{\mathrm{Fix}}_j^{e}(θ)$ and $\widetilde{\mathrm{Per}}_j^{e}(θ; n)$ parametrize the single-valued Riccati solutions and the $n$-branched Riccati solutions around the fixed singular point $z_j$ respectively.

Lemma 4.5 The minimal resolution (15) induces an isomorphism

$$φ : \widetilde{\mathrm{Fix}}_j^{\circ}(θ) → \mathrm{Fix}_j^{\circ}(θ). \tag{32}$$

For any $n > 1$ we have $\mathrm{Per}_j(θ; n) ∩ \mathrm{Sing}(θ) = ∅$, that is, $\mathrm{Per}_j^{\circ}(θ; n) = \mathrm{Per}_j(θ; n)$, and the minimal resolution (15) induces an isomorphism

$$φ : \widetilde{\mathrm{Per}}_j^{\circ}(θ; n) → \mathrm{Per}_j(θ; n) \quad (n > 1). \tag{33}$$

Proof. The isomorphism (32) is trivial from the definition. The assertion $\mathrm{Per}_j(θ; n) ∩ \mathrm{Sing}(θ) = ∅$ follows from (20). Then the isomorphism (33) is again trivial from the definition.
✷

The fixed point set and the periodic point set, upstairs or downstairs, will be investigated more closely in §6. For this purpose it is convenient to consider the symmetric group $S_4$ of degree 4 acting on $K$ by permuting the entries $κ_1, κ_2, κ_3, κ_4$ of $κ ∈ K$ and fixing $κ_0$. Through the Riemann-Hilbert correspondence in the parameter level, $rh : K → Θ$, the action $S_4 \curvearrowright K$ induces an action of $S_3 ⋉ \mathrm{Kl}$ on $Θ$, where $\mathrm{Kl}$ is Klein's 4-group realized as the group of even triple signs,

$$\mathrm{Kl} = \{\, ε = (ε_1, ε_2, ε_3) ∈ \{±1\}^3 : ε_1 ε_2 ε_3 = 1 \,\},$$

acting on $Θ$ by the sign changes $(θ_1, θ_2, θ_3, θ_4) ↦ (ε_1 θ_1, ε_2 θ_2, ε_3 θ_3, θ_4)$, while $S_3$ acts on $Θ$ by permuting the entries $θ_1, θ_2, θ_3$ of $θ ∈ Θ$ and fixing $θ_4$. This construction defines an isomorphism of groups

$$S_4 ≅ S_3 ⋉ \mathrm{Kl}, \quad σ ↦ (τ, ε), \tag{34}$$

with respect to which the map $rh : K → Θ$ becomes $S_4$-equivariant. Viewed as a subgroup of $S_4$, Klein's 4-group is the permutation group $\mathrm{Kl} = \{1, (14)(23), (24)(31), (34)(12)\}$. Let $σ ∈ S_4$ act on $x = (x_1, x_2, x_3)$ in the same manner as it does on $(θ_1, θ_2, θ_3)$. Then the polynomial $f(x, θ)$ is $σ$-invariant and hence $σ$ induces an isomorphism of algebraic surfaces, $σ : S(θ) → S(σ(θ))$. As for the action $g_j^2 : S(θ) \circlearrowleft$, we have the commutative diagram

$$\begin{array}{ccc} S(θ) & \xrightarrow{\ g_j^2\ } & S(θ) \\ σ \downarrow & & \downarrow σ \\ S(σ(θ)) & \xrightarrow{\ g_{τ(j)}^2\ } & S(σ(θ)), \end{array} \tag{35}$$

for any element $σ ∈ S_4$ with $τ ∈ S_3$ determined by (34). It induces isomorphisms

$$σ : \mathrm{Fix}_j(θ) → \mathrm{Fix}_{τ(j)}(σ(θ)), \qquad σ : \mathrm{Per}_j(θ; n) → \mathrm{Per}_{τ(j)}(σ(θ); n),$$

which, via the minimal resolution (15), lift up to isomorphisms

$$\tilde{σ} : \widetilde{\mathrm{Fix}}_j(θ) → \widetilde{\mathrm{Fix}}_{τ(j)}(σ(θ)), \qquad \tilde{σ} : \widetilde{\mathrm{Per}}_j(θ; n) → \widetilde{\mathrm{Per}}_{τ(j)}(σ(θ); n).$$

[Figure 9: $\widetilde{W}(D_4^{(1)})$-strata $(A_3)_i$ and $(A_1^{⊕2})_i$]

The action of the symmetric group $S_4$ on $K$ mentioned above is just induced from its action on the index set $\{0, 1, 2, 3, 4\}$ fixing the element 0, namely, from the realization of $S_4$ as the automorphism group of the Dynkin diagram $D_4^{(1)}$.
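As a numerical aside on the map (19) and on Lemma 4.4 (2), the periodicity of $g_j^2$ can be observed directly. The sketch below is only an illustration under stated assumptions: the helper names `g`, `g_squared`, `f`, `sample_point` and `prime_period` are hypothetical, the sample coordinates are arbitrary, and $θ_4$ is chosen so that the sample point lies on $S(θ)$. For $g_1$ the coordinate left fixed is $x_3$, so $x_3 = 2\cos(πm/n)$ plays the role of $x_i$ in (24).

```python
import math

def g(i, x, theta):
    """Map (19) for g_i, with (i, j, k) a cyclic permutation of (1, 2, 3).

    x = (x1, x2, x3) and theta = (theta1, ..., theta4); indices are
    1-based in the text and shifted to 0-based here.
    """
    i0, j0, k0 = i - 1, i % 3, (i + 1) % 3
    xn, tn = list(x), list(theta)
    xn[i0] = theta[j0] - x[j0] - x[k0] * x[i0]
    xn[j0] = x[i0]
    tn[i0], tn[j0] = theta[j0], theta[i0]  # theta_i and theta_j are swapped
    return xn, tn

def g_squared(i, x, theta):
    """The automorphism g_i^2 of S(theta)."""
    x1, t1 = g(i, x, theta)
    return g(i, x1, t1)

def f(x, theta):
    """Defining polynomial of the affine cubic surface S(theta)."""
    x1, x2, x3 = x
    t1, t2, t3, t4 = theta
    return (x1 * x2 * x3 + x1**2 + x2**2 + x3**2
            - t1 * x1 - t2 * x2 - t3 * x3 + t4)

def sample_point(n, m):
    """An arbitrary point with x3 = 2 cos(pi m / n), and a theta with f = 0."""
    x = [0.3, -1.1, 2 * math.cos(math.pi * m / n)]
    t = [0.7, -0.4, 1.3]
    t4 = -f(x, t + [0.0])  # solve f(x, theta) = 0 for theta4
    return x, t + [t4]

def prime_period(i, x, theta, max_n=50, tol=1e-9):
    """Smallest n >= 1 with (g_i^2)^n (x) = x up to tol, or None."""
    y, t = x, theta
    for n in range(1, max_n + 1):
        y, t = g_squared(i, y, t)
        if max(abs(a - b) for a, b in zip(y, x)) < tol:
            return n
    return None
```

With `x, theta = sample_point(5, 2)`, the call `prime_period(1, x, theta)` returns 5, in line with condition (24) for $n = 5$, $m = 2$.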
By taking the semi-direct product by the symmetric group $S_4$ or by Klein's 4-group $\mathrm{Kl}$, we can enlarge the affine Weyl group $W(D_4^{(1)})$ to the affine Weyl group of type $F_4^{(1)}$ or to the extended affine Weyl group of type $D_4^{(1)}$:

$$W(F_4^{(1)}) = S_4 ⋉ W(D_4^{(1)}) ⊃ \widetilde{W}(D_4^{(1)}) = \mathrm{Kl} ⋉ W(D_4^{(1)}).$$

Definition 4.6 Replacing the group $W(D_4^{(1)})$ with $W(F_4^{(1)})$ in Definition 3.2, we can define a coarser stratification of $K$ than the $W(D_4^{(1)})$-stratification, called the $W(F_4^{(1)})$-stratification. Moreover, replacing $W(D_4^{(1)})$ with $\widetilde{W}(D_4^{(1)})$, we can also think of a stratification of $K$ intermediate between these two stratifications, called the $\widetilde{W}(D_4^{(1)})$-stratification.

The following is the classification of the $W(F_4^{(1)})$-strata and $\widetilde{W}(D_4^{(1)})$-strata.

Lemma 4.7 For each abstract Dynkin type $*$ in Table 2, there is a unique $W(F_4^{(1)})$-stratum of type $*$. As for the $\widetilde{W}(D_4^{(1)})$-strata, we have the following classification (see also Figure 9).
(1) For $* ∈ \{D_4, A_1^{⊕4}, A_1^{⊕3}, A_2, A_1, ∅\}$, there is a unique $\widetilde{W}(D_4^{(1)})$-stratum of abstract type $*$ and this unique stratum is denoted by the same symbol $*$.
(2) For $* ∈ \{A_3, A_1^{⊕2}\}$, there are exactly three $\widetilde{W}(D_4^{(1)})$-strata of abstract type $*$;
    (a) for $* = A_3$, the stratum $(A_3)_i$ represented by $I = \{0, j, k\}$ with $\{i, j, k\} = \{1, 2, 3\}$;
    (b) for $* = A_1^{⊕2}$, the stratum $(A_1^{⊕2})_i$ represented by $I = \{j, k\}$ with $\{i, j, k\} = \{1, 2, 3\}$.

[Figure 10: Adjacency relations among $\widetilde{W}(D_4^{(1)})$-strata, including $(A_1^{⊕2})_i$ and $(A_3)_i$ ($i = 1, 2, 3$)]

If something about the transformation $g_j^2$ is discussed for a fixed index $j$, the relevant stratification is the $\widetilde{W}(D_4^{(1)})$-stratification. Namely we may discuss the issue on each $\widetilde{W}(D_4^{(1)})$-stratum, choosing any representative of each $\widetilde{W}(D_4^{(1)})$-orbit, since in the commutative diagram (35) we have $τ(j) = j$ and hence $g_{τ(j)}^2 = g_j^2$ for every $σ ∈ \mathrm{Kl}$ (see also Remark 5.1). For two $\widetilde{W}(D_4^{(1)})$-strata, say $*$ and $**$, we write $* → **$ if the stratum $**$ lies on the boundary of the stratum $*$. All the possible adjacency relations $* → **$ are depicted in Figure 10.
Note that there are no adjacency relations between $(A_1^{⊕2})_i$ and $(A_3)_j$ for any distinct $i, j ∈ \{1, 2, 3\}$.

5 Bäcklund Transformations

In this section we briefly discuss Bäcklund transformations, especially their characterization in terms of the Riemann-Hilbert correspondence [15, 16]. This topic is included here in order to confirm that our problem may be treated modulo Bäcklund transformations. For each $σ ∈ S_4$ we define the isomorphism of affine cubic surfaces

$$σ : S(θ) → S(σ(θ)), \quad (x_1, x_2, x_3) ↦ (ε_{τ(1)} x_{τ(1)},\ ε_{τ(2)} x_{τ(2)},\ ε_{τ(3)} x_{τ(3)}),$$

where $σ ∈ S_4$ is identified with $(τ, ε) ∈ S_3 ⋉ \mathrm{Kl}$ via the isomorphism (34). Consider the natural homomorphism $W(F_4^{(1)}) = S_4 ⋉ W(D_4^{(1)}) → S_4$, $w ↦ σ$. Since the Riemann-Hilbert correspondence (12) is an analytic minimal resolution of singularities, for each $w ∈ W(F_4^{(1)})$, there exists an analytic isomorphism $w : M_z(κ) → M_z(w(κ))$ such that the diagram

$$\begin{array}{ccc} M_z(κ) & \xrightarrow{\ w\ } & M_z(w(κ)) \\ RH_{z,κ} \downarrow & & \downarrow RH_{z,w(κ)} \\ S(θ) & \xrightarrow{\ σ\ } & S(σ(θ)) \end{array} \tag{36}$$

is commutative, for any fixed $κ ∈ K$ with $θ = rh(κ) ∈ Θ$. The commutative diagram (36) characterizes the Bäcklund transformations of Painlevé VI. Namely the map $w : M_z(κ) → M_z(w(κ))$ turns out to be algebraic and there are suitable affine coordinates on $M_z(κ)$ and $M_z(w(κ))$ in terms of which the map $w$ can be represented by the usual formula for Bäcklund transformations known as birational canonical transformations [27] (see [15, 16] for the precise statement). In other words the Riemann-Hilbert correspondence is equivariant under the Bäcklund transformations and so is our main problem.

Remark 5.1 The $S_4$-factor of $W(F_4^{(1)}) = S_4 ⋉ W(D_4^{(1)})$, or more strictly the $S_3$-factor of $S_4 = S_3 ⋉ \mathrm{Kl}$, permutes the three fixed singular points $0$, $1$ and $∞$, while they are fixed by $\widetilde{W}(D_4^{(1)}) = \mathrm{Kl} ⋉ W(D_4^{(1)})$.
Hence we may consider our problem only around the origin $z = 0$ and, upon restricting our attention to $z = 0$, we may discuss it modulo the Bäcklund action of $\widetilde{W}(D_4^{(1)})$.

6 Fixed Points and Periodic Points

We shall more closely investigate the fixed point set $\mathrm{Fix}_j(θ)$, or rather its subset $\mathrm{Fix}_j^{\circ}(θ)$ of smooth fixed points, by solving the system of equations (23). In view of (21) the last two equations in (23) are expressed as a linear system for the unknowns $(x_j, x_k)$,

$$2x_j + x_i x_k = θ_j, \qquad x_i x_j + 2x_k = θ_k. \tag{37}$$

If its determinant $4 - x_i^2$ is nonzero, then system (37) is uniquely solved as

$$x_j = \frac{2θ_j - x_i θ_k}{4 - x_i^2}, \qquad x_k = \frac{2θ_k - x_i θ_j}{4 - x_i^2}. \tag{38}$$

Substituting (38) into the equation $f(x, θ) = 0$ yields a quartic equation for the unknown $x_i$,

$$x_i^4 - θ_i x_i^3 + (θ_4 - 4)x_i^2 + (4θ_i - θ_j θ_k)x_i + θ_j^2 + θ_k^2 - 4θ_4 = 0. \tag{39}$$

Conversely, if $x_i$ is a root of equation (39) with nonzero $x_i^2 - 4$, then substituting this into formula (38) yields a root of system (23). The four roots of the quartic equation (39) are given by

$$F(b_i, b_4; b_j, b_k), \quad F(b_i, b_4^{-1}; b_j, b_k), \quad F(b_j, b_k; b_i, b_4), \quad F(b_j, b_k^{-1}; b_i, b_4),$$

counted with multiplicities, where $F(b_i, b_4; b_j, b_k)$ is defined by

$$F(b_i, b_4; b_j, b_k) = b_i b_4 + b_i^{-1} b_4^{-1}. \tag{40}$$

We pick the root $x_i = F(b_i, b_4; b_j, b_k)$. Note that $F(b_i, b_4; b_j, b_k)^2 - 4$ is nonzero precisely when $b_i^2 b_4^2 ≠ 1$. If this is the case, then substituting $x_i = F(b_i, b_4; b_j, b_k)$ into formula (38) yields $x_j = G(b_i, b_4; b_j, b_k)$ and $x_k = G(b_i, b_4; b_k, b_j)$, where $G(b_i, b_4; b_j, b_k)$ is defined by

$$G(b_i, b_4; b_j, b_k) = \frac{(b_i + b_4)(b_j + b_k)(b_j b_k + 1)}{2(b_i b_4 + 1)\, b_j b_k} + \frac{(b_i - b_4)(b_j - b_k)(b_j b_k - 1)}{2(b_i b_4 - 1)\, b_j b_k}. \tag{41}$$

Therefore, if $P(b_i, b_4; b_j, b_k)$ denotes the point defined by

$$x_i = F(b_i, b_4; b_j, b_k), \quad x_j = G(b_i, b_4; b_j, b_k), \quad x_k = G(b_i, b_4; b_k, b_j),$$

then $x = P(b_i, b_4; b_j, b_k)$ gives a root of system (23) with nonzero $x_i^2 - 4$ provided that $b_i^2 b_4^2 ≠ 1$.
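Formulas (38) to (41) can be checked in exact rational arithmetic. The following Python sketch does this for $(i, j, k) = (1, 2, 3)$ and one arbitrary rational choice of $b_1, b_2, b_3, b_4$; it is a sanity check under stated assumptions, not a proof. The function names are hypothetical, and the two fractions in (41) are joined by a plus sign, which is the reading under which the checks below succeed.

```python
from fractions import Fraction

def theta_from_b(b1, b2, b3, b4):
    """theta via a_i = b_i + 1/b_i and the parameter map (10)."""
    a1, a2, a3, a4 = (b + 1 / b for b in (b1, b2, b3, b4))
    return (a1 * a4 + a2 * a3,
            a2 * a4 + a3 * a1,
            a3 * a4 + a1 * a2,
            a1 * a2 * a3 * a4 + a1**2 + a2**2 + a3**2 + a4**2 - 4)

def F(bi, b4):
    """Formula (40); it depends only on the first two arguments."""
    return bi * b4 + 1 / (bi * b4)

def G(bi, b4, bj, bk):
    """Formula (41), with the two fractions added (assumed sign)."""
    return ((bi + b4) * (bj + bk) * (bj * bk + 1) / (2 * (bi * b4 + 1) * bj * bk)
            + (bi - b4) * (bj - bk) * (bj * bk - 1) / (2 * (bi * b4 - 1) * bj * bk))

def fixed_point(bi, b4, bj, bk):
    """The point P(b_i, b_4; b_j, b_k) as (x_i, x_j, x_k)."""
    return (F(bi, b4), G(bi, b4, bj, bk), G(bi, b4, bk, bj))

def f(x, theta):
    """Defining polynomial of S(theta)."""
    x1, x2, x3 = x
    t1, t2, t3, t4 = theta
    return (x1 * x2 * x3 + x1**2 + x2**2 + x3**2
            - t1 * x1 - t2 * x2 - t3 * x3 + t4)

def grad(x, theta):
    """Gradient (21): y_i = 2 x_i + x_j x_k - theta_i."""
    x1, x2, x3 = x
    t1, t2, t3, _ = theta
    return (2 * x1 + x2 * x3 - t1, 2 * x2 + x3 * x1 - t2, 2 * x3 + x1 * x2 - t3)

def quartic(xi, theta):
    """Left-hand side of (39) for (i, j, k) = (1, 2, 3)."""
    ti, tj, tk, t4 = theta
    return (xi**4 - ti * xi**3 + (t4 - 4) * xi**2
            + (4 * ti - tj * tk) * xi + tj**2 + tk**2 - 4 * t4)

# Arbitrary rational sample values (an assumption made for illustration).
b1, b2, b3, b4 = Fraction(2), Fraction(5), Fraction(7), Fraction(3)
theta = theta_from_b(b1, b2, b3, b4)
x = fixed_point(b1, b4, b2, b3)
```

For this sample, `f(x, theta)`, `grad(x, theta)[1]`, `grad(x, theta)[2]` and `quartic(x[0], theta)` all vanish exactly, while `grad(x, theta)[0]` is nonzero, so the point is a smooth fixed point of $g_2^2$ in the notation above.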
At this root, $y_i(x, θ)$ admits the following factorization:

$$y_i(x, θ) = \frac{(b_i - b_i^{-1})(b_4 - b_4^{-1})}{(b_i^2 b_4^2 - 1)^2} \prod_{(ε_j, ε_k) ∈ \{±1\}^2} (b_i b_j^{ε_j} b_k^{ε_k} b_4 - 1)$$
$$\phantom{y_i(x, θ)} = (b_i b_4 - b_i^{-1} b_4^{-1})^{-2}\, \{F(b_i, b_4; b_j, b_k) - F(b_i, b_4^{-1}; b_j, b_k)\}\, \{F(b_i, b_4; b_j, b_k) - F(b_j, b_k; b_i, b_4)\}\, \{F(b_i, b_4; b_j, b_k) - F(b_j, b_k^{-1}; b_i, b_4)\}.$$

Hence $P(b_i, b_4; b_j, b_k)$ is a smooth point of $S(θ)$ if and only if $F(b_i, b_4; b_j, b_k)$ is a simple root of equation (39). In terms of $κ ∈ K$, the existence and smoothness conditions for $P(b_i, b_4; b_j, b_k)$ are given by $κ_i + κ_4 ∉ \mathbb{Z}$ and $κ_i ∉ \mathbb{Z}$, $κ_4 ∉ \mathbb{Z}$, $κ_i + κ_4 ± κ_j ± κ_k ∉ 2\mathbb{Z} + 1$, respectively.

Table 3: Smooth fixed points $x ∈ \mathrm{Fix}_j^{\circ}(θ)$ with nonzero $x_i^2 - 4$

    label   fixed point                      existence          smoothness condition
    1       P(b_i, b_4; b_j, b_k)            κ_i + κ_4 ∉ Z      κ_i ∉ Z, κ_4 ∉ Z, κ_i + κ_4 ± κ_j ± κ_k ∉ 2Z + 1
    2       P(b_i, b_4^{-1}; b_j, b_k)       κ_i - κ_4 ∉ Z      κ_i ∉ Z, κ_4 ∉ Z, κ_i - κ_4 ± κ_j ± κ_k ∉ 2Z + 1
    3       P(b_j, b_k; b_i, b_4)            κ_j + κ_k ∉ Z      κ_j ∉ Z, κ_k ∉ Z, κ_j + κ_k ± κ_i ± κ_4 ∉ 2Z + 1
    4       P(b_j, b_k^{-1}; b_i, b_4)       κ_j - κ_k ∉ Z      κ_j ∉ Z, κ_k ∉ Z, κ_j - κ_k ± κ_i ± κ_4 ∉ 2Z + 1

Lemma 6.1 The smooth fixed points $x ∈ \mathrm{Fix}_j^{\circ}(θ)$ with nonzero $x_i^2 - 4$ are precisely those points in Table 3 which satisfy the existence and smoothness conditions mentioned there.

The fixed points in Table 3 are closely related to the configuration of lines on the affine cubic surface $S(θ)$ or on its compactification $\overline{S}(θ)$ by the standard embedding

$$S(θ) \hookrightarrow \overline{S}(θ) ⊂ \mathbb{P}^3, \quad x = (x_1, x_2, x_3) ↦ [1 : x_1 : x_2 : x_3],$$

where the projective cubic surface $\overline{S}(θ)$ is defined by the homogeneous equation

$$F(X, θ) := X_1 X_2 X_3 + X_0(X_1^2 + X_2^2 + X_3^2) - X_0^2(θ_1 X_1 + θ_2 X_2 + θ_3 X_3) + θ_4 X_0^3 = 0.$$

It is obtained from the affine surface $S(θ)$ by adding three lines at infinity,

$$L_i = \{\, X ∈ \mathbb{P}^3 : X_0 = X_i = 0 \,\} \quad (i = 1, 2, 3),$$

whose union $L = L_1 ∪ L_2 ∪ L_3$ is called the tritangent lines at infinity. It is well known that a smooth projective cubic surface has exactly 27 lines on it. We describe them in the current situation [20].
Let $L_i(b_i, b_4; b_j, b_k)$ be the line in $\mathbb{P}^3$ defined by

$$X_i = (b_i b_4 + b_i^{-1} b_4^{-1}) X_0, \qquad X_j + (b_i b_4) X_k = \{\, b_i(b_k + b_k^{-1}) + b_4(b_j + b_j^{-1}) \,\} X_0. \tag{43}$$

Table 4: Eight lines intersecting the line $L_i$ at infinity, divided into four pairs

    1    L^+_{i1} = L_i(b_i, b_4; b_j, b_k)         L^-_{i1} = L_i(b_i^{-1}, b_4^{-1}; b_j, b_k)
    2    L^+_{i2} = L_i(b_i, b_4^{-1}; b_j, b_k)    L^-_{i2} = L_i(b_i^{-1}, b_4; b_j, b_k)
    3    L^+_{i3} = L_i(b_j, b_k; b_i, b_4)         L^-_{i3} = L_i(b_j^{-1}, b_k^{-1}; b_i, b_4)
    4    L^+_{i4} = L_i(b_j, b_k^{-1}; b_i, b_4)    L^-_{i4} = L_i(b_j^{-1}, b_k; b_i, b_4)

[Figure 11: The 27 lines on a smooth cubic surface viewed from the tritangent lines at infinity]

For each $i ∈ \{1, 2, 3\}$ the eight lines in Table 4 are the only lines on $\overline{S}(θ)$ that intersect the $i$-th line $L_i$ at infinity, but they do not intersect the remaining two lines $L_j$ and $L_k$ at infinity. These lines are divided into four pairs as in Table 4. The surface $\overline{S}(θ)$ is always smooth at infinity [20] and hence, if $κ ∈ K - \mathrm{Wall}$, then $\overline{S}(θ)$ is smooth everywhere. In this case, the two lines in the same pair intersect, while two lines from different pairs do not. The intersection point of the $μ$-th pair is exactly the $μ$-th fixed point in Table 3. See Figure 11 for a total image of these situations. Caution: for a pair of distinct indices $i$ and $j$, the intersection relations between $L^±_{iμ}$ and $L^±_{jν}$ are not depicted in Figure 11. We also remark that in some degenerate cases the lines $L^+_{iμ}$ and $L^-_{iμ}$ may meet in a point on the line $L_i$ at infinity.

Next we consider the case where the determinant $4 - x_i^2$ of system (37) vanishes. In other words we ask when the fixed point set $\mathrm{Fix}_j(θ)$ contains points $x$ such that $x_i ∈ \{±2\}$.

Lemma 6.2 $\mathrm{Fix}_j(θ)$ contains a point $x$ such that $x_i = 2δ$ with $δ ∈ \{±1\}$ if and only if either
(1) $b_i b_4 = b_i b_4^{-1} = δ$; or
(2) $b_j b_k = b_j b_k^{-1} = δ$; or
(3) $b_i b_4^{ε_4} = b_j b_k^{ε_k} = δ$ for some double sign $(ε_k, ε_4) ∈ \{±1\}^2$.
If this is the case, then $θ_k = δθ_j$ and all such points $x$ are exactly those points on the line $ℓ_j^δ := \{\, x_i = 2δ,\ x_j + δx_k = θ_j/2 \,\}$.
(44) In particular $ℓ_j^δ ⊂ \mathrm{Fix}_j(θ)$ precisely when $x_i = 2δ$ is a multiple root of the quartic equation (39).

Table 5: The roots of quartic equation (39) and the components of $\mathrm{Fix}_j(θ)$

    x_i ∈ {±2}    multiplicity    component               remark
    no            simple          smooth point            intersection point of L^±_{iμ}
    no            multiple        singular point          Riccati locus
    yes           multiple        line ℓ^+_j or ℓ^-_j     line contains singular points
    yes           simple          empty                   L^±_{iμ} intersects at infinity

Proof. If $x_i = 2δ$ with $δ ∈ \{±1\}$ then system (37) is linearly dependent, so that $θ_k - δθ_j = 0$. However, since

$$θ_k - δθ_j = (b_i b_j)^{-1}(b_i b_4 - δ)(b_i b_4^{-1} - δ)(b_j b_k - δ)(b_j b_k^{-1} - δ),$$

we have either $b_i b_4^{ε_4} = δ$ for some sign $ε_4 ∈ \{±1\}$ or $b_j b_k^{ε_k} = δ$ for some sign $ε_k ∈ \{±1\}$. Taking the equation $x_j + δx_k = θ_j/2$ into account, we observe that $f(x, θ)$ factors as

$$f(x, θ) = \begin{cases} -(2 b_i b_j)^{-2}(b_i b_4^{-ε_4} - δ)^2 (b_j b_k - δ)^2 (b_j b_k^{-1} - δ)^2 & (\text{if } b_i b_4^{ε_4} = δ), \\ -(2 b_i b_j)^{-2}(b_j b_k^{-ε_k} - δ)^2 (b_i b_4 - δ)^2 (b_i b_4^{-1} - δ)^2 & (\text{if } b_j b_k^{ε_k} = δ). \end{cases}$$

If $b_i b_4^{ε_4} = δ$ then the equation $f(x, θ) = 0$ yields either $b_i b_4^{-ε_4} = δ$ or $b_j b_k^{ε_k} = δ$ for some sign $ε_k ∈ \{±1\}$; the former case falls into case (1) while the latter falls into case (3). In a similar manner the other case $b_j b_k^{ε_k} = δ$ falls into case (2) or case (3). Next, if $\mathrm{Fix}_j(θ)$ contains the line $ℓ_j^δ$, then what we have just proved implies that

    $F(b_i, b_4; b_j, b_k) = F(b_i, b_4^{-1}; b_j, b_k) = 2δ$ if condition (1) is satisfied;
    $F(b_j, b_k; b_i, b_4) = F(b_j, b_k^{-1}; b_i, b_4) = 2δ$ if condition (2) is satisfied;
    $F(b_i, b_4^{ε_4}; b_j, b_k) = F(b_j, b_k^{ε_k}; b_i, b_4) = 2δ$ if condition (3) is satisfied.

Hence $x_i = 2δ$ is a multiple root of the quartic equation (39). Conversely, if $x_i = 2δ$ is a multiple root of (39), then we can trace the argument backwards to conclude that the system (23) admits the line solution $ℓ_j^δ$, that is, $\mathrm{Fix}_j(θ)$ contains $ℓ_j^δ$. ✷

Summarizing the arguments so far yields a classification of the irreducible components of the algebraic set $\mathrm{Fix}_j(θ)$ in terms of certain roots of quartic equation (39).
Theorem 6.3 Any irreducible component of $\mathrm{Fix}_j(\theta)$ is either a single point or a single affine line; the former is called a point component and the latter a line component. The irreducible components of $\mathrm{Fix}_j(\theta)$ are in one-to-one correspondence with those roots of quartic equation (39) which are not simple roots with $x_i \in \{\pm 2\}$.
(1) A simple root with $x_i \notin \{\pm 2\}$ corresponds to a point component that is a smooth point of the surface $S(\theta)$ and is given in Table 3.
(2) A multiple root with $x_i \notin \{\pm 2\}$ corresponds to a point component that is a singular point of the surface $S(\theta)$ and is associated with Riccati solutions.
(3) A multiple root with $x_i \in \{\pm 2\}$ corresponds to a line component, either $\ell^+_j$ or $\ell^-_j$.
(4) A simple root with $x_i \in \{\pm 2\}$ corresponds to no component of $\mathrm{Fix}_j(\theta)$.

A summary of Theorem 6.3 is given in Table 5 and the following remark may be helpful.

Remark 6.4 The assertions (3) and (4) of Theorem 6.3 may be well understood through the degeneration of the line configuration on the projective surface $\overline{S}(\theta)$ as the parameter $\theta = \mathrm{rh}(\kappa)$ tends to a special position. For a generic value of $\theta$ the lines $L^{\pm}_{i\mu}$ intersect in a single (smooth) point on the affine part $S(\theta)$ of $\overline{S}(\theta)$. If the parameter $\theta$ tends to a special position so that a corresponding root $x_i$ of quartic equation (39) approaches $\{\pm 2\}$, then the two lines $L^{\pm}_{i\mu}$ become “parallel” and eventually either coincide completely or meet in a point at infinity. The former case falls into assertion (3) and the latter case falls into assertion (4).

Let us investigate more closely the case where $\mathrm{Fix}_j(\theta)$ contains line components.

Lemma 6.5 Let $\theta = \mathrm{rh}(\kappa)$ with $\kappa \in K$ and $(i, j, k)$ be any cyclic permutation of $(1, 2, 3)$.
(1) Fixj(θ) contains either ℓ j or ℓ j but not both of them if and only if κ lies in a W̃ (D stratum appearing in the following adjacency diagram (see also Figure 10) : (A⊕21 )i −−−→ A⊕31y (A3)i −−−→ D4 (2) Fixj(θ) contains both ℓ j and ℓ j if and only if θ = (0, 0, 0,−4), that is, precisely when κ is in the W̃ (D 4 )-stratum of type A 1 . In this case one has Fixj(θ) = ℓ j ∐ ℓ−j . Proof. Lemma 6.2 implies that Fixj(θ) contains at least one of ℓ j and ℓ j if and only if either (a) bib4 = bib 4 ∈ {±1}; or (b) bjbk = bjb−1k ∈ {±1}; or (c) bib 4 = bjb k ∈ {±1} for some double sign (εk, ε4) ∈ {±1}2. This property is invariant under the action of W̃ (D(1)4 ) = Kl⋉W (D on K. Using this action we can reduce conditions (a) and (c) to condition (b). First, observe that the permutation (i, j)(k4) ∈ Kl induces the map (b0, bi, bj , bk, b4) 7→ (b0, bj , bi,−b4,−bk), which reduces condition (a) to condition (b). Next, formula (13) implies that the reflection wi induces the multiplicative transformation wi : B → B, b 7→ b′, where b′j = −bjbciji (i = 4, j = 0), i (otherwise). Applying w4 or wk if necessary, we may assume from the beginning that ε4 = 1 and εk = −1 in condition (c). Then using w0 there yields b 0bib4 = bjb k ∈ {±1}. But since b20bib4bjbk = 1, we have bjbk = bjb k ∈ {±1}, that is, condition (b). Note that condition (b) means κj, κk ∈ Z. On the other hand, the extended affine Weyl group W̃ (D 4 ) contains shifts (κ0, κi, κj, κk, κ4) 7→ (κ0, κi − 1, κj + 1, κk, κ4), (κ0, κi, κj, κk, κ4) 7→ (κ0, κi, κj, κk + 1, κ4 − 1). Repeated applications of these operations and their inverses can shift κj and κk independently by arbitrary integers. Thus the condition κj, κk ∈ Z can further be reduced to κj = κk = 0. Thus we have shown that if Fixj(θ) contains at least one of ℓ j and ℓ j , then κ must lie in the W̃ (D 4 )-stratum of type (A 1 )i or on its boundary strata of types (A3)i, A 1 , D4, A Moreover, it is easy to see that the converse is also true. 
For a sign $\delta \in \{\pm 1\}$ the conditions (1), (2), (3) in Lemma 6.2 are denoted by $(1^{\delta})$, $(2^{\delta})$, $(3^{\delta})$, respectively. Now we assume that $\mathrm{Fix}_j(\theta)$ contains both $\ell^+_j$ and $\ell^-_j$. Then there exists a pair of conditions, one from $\{(1^+), (2^+), (3^+)\}$ and the other from $\{(1^-), (2^-), (3^-)\}$, that are valid at the same time. Such a pair can be consistent only if it is either $(1^+) + (2^-)$; or $(2^+) + (1^-)$; or $(3^+) + (3^-)$, where if the sign for $(3^+)$ is $(\varepsilon_k, \varepsilon_4)$ then the sign for $(3^-)$ must be its antipode $(-\varepsilon_k, -\varepsilon_4)$. The first and second pairs lead to $b_1^2 = b_2^2 = b_3^2 = b_4^2 = 1$ and to $b_1 b_2 b_3 b_4 = -1$, while the third pair yields $b_1^2 = b_2^2 = b_3^2 = b_4^2 = -1$. These are nothing but the conditions (a) and (b) in Example 3.4 (1). Therefore $\kappa$ must lie in the stratum of type $A_1^{\oplus 4}$. Combining this with the discussion in the last paragraph establishes the assertion (1), as well as a large part of the assertion (2).

The only thing yet to be proved is the assertion that if $\mathrm{Fix}_j(\theta)$ contains both $\ell^+_j$ and $\ell^-_j$, then $\mathrm{Fix}_j(\theta) = \ell^+_j \amalg \ell^-_j$. For this, the last part of Lemma 6.2 implies that both $x_i = 2$ and $x_i = -2$ are multiple roots of the quartic equation (39), so that there are no other roots of the equation (39). Thus $\mathrm{Fix}_j(\theta)$ has no elements other than those in $\ell^+_j \amalg \ell^-_j$. ✷

Now we turn our attention to periodic points and investigate the set $\widetilde{\mathrm{Per}}{}^{\circ}_j(\theta; n)$ of periodic points of prime period $n > 1$ on the non-Riccati locus.

Lemma 6.6 For any integer $n > 1$ the set $\widetilde{\mathrm{Per}}{}^{\circ}_j(\theta; n)$ is biholomorphic to the disjoint union of $\varphi(n)$ copies of $\mathbb{C}^{\times}$, where $\varphi(n)$ denotes the number of integers $0 < m < n$ coprime to $n$.

Proof. By Lemma 4.5 we can identify $\widetilde{\mathrm{Per}}{}^{\circ}_j(\theta; n)$ with $\mathrm{Per}_j(\theta; n)$ and hence may work downstairs. For any integer $0 < m < n$ coprime to $n$, we consider the projective curve $\overline{C}_m$ in $\mathbb{P}^3$ defined by

$$\{4\cos^2(\pi m/n) - 2\theta_i \cos(\pi m/n) + \theta_4\}\,X_0^2 - X_0(\theta_j X_j + \theta_k X_k) + X_j^2 + X_k^2 + 2\cos(\pi m/n)\,X_j X_k = 0, \tag{47}$$

$$X_i - 2\cos(\pi m/n)\,X_0 = 0, \tag{48}$$

where (47) is obtained from $F(X, \theta) = 0$ by substituting (48) and factoring $X_0$ out of it.
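The passage from the cubic to the conic (47) is a one-line substitution, and it can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes that the homogeneous cubic has the form $F(X, \theta) = X_i X_j X_k + X_0(X_i^2 + X_j^2 + X_k^2) - X_0^2(\theta_i X_i + \theta_j X_j + \theta_k X_k) + \theta_4 X_0^3$ (this form is not restated in the present section), it writes `c` for $2\cos(\pi m/n)$ and treats it as an arbitrary rational number because the identity is polynomial in `c`, and all function names are hypothetical. It also counts the integers $0 < m < n$ coprime to $n$.

```python
from fractions import Fraction
from math import gcd


def cubic_F(X0, Xi, Xj, Xk, theta):
    """Homogeneous cubic F(X, theta); its exact form is an ASSUMPTION of this sketch."""
    ti, tj, tk, t4 = theta
    return (Xi * Xj * Xk
            + X0 * (Xi**2 + Xj**2 + Xk**2)
            - X0**2 * (ti * Xi + tj * Xj + tk * Xk)
            + t4 * X0**3)


def conic_47(X0, Xj, Xk, c, theta):
    """Left-hand side of (47), with c standing for 2*cos(pi*m/n)."""
    ti, tj, tk, t4 = theta
    return ((c**2 - ti * c + t4) * X0**2
            - X0 * (tj * Xj + tk * Xk)
            + Xj**2 + Xk**2 + c * Xj * Xk)


def substitution_holds(c, theta, points):
    """Check F(X0, c*X0, Xj, Xk) == X0 * (47) at the given sample points."""
    return all(
        cubic_F(X0, c * X0, Xj, Xk, theta) == X0 * conic_47(X0, Xj, Xk, c, theta)
        for (X0, Xj, Xk) in points
    )


def count_coprime(n):
    """Number of integers 0 < m < n coprime to n (the phi(n) of Lemma 6.6)."""
    return sum(1 for m in range(1, n) if gcd(m, n) == 1)


# Arbitrary rational test data (not taken from the paper).
theta = (Fraction(1, 2), Fraction(-3), Fraction(5, 7), Fraction(2))
points = [(Fraction(1), Fraction(2), Fraction(-1)),
          (Fraction(3, 4), Fraction(-5), Fraction(1, 3)),
          (Fraction(0), Fraction(1), Fraction(4))]
print(substitution_holds(Fraction(6, 5), theta, points))
print([count_coprime(n) for n in (2, 3, 4, 5, 6)])
```

Exact rational arithmetic is used so that the comparison is an identity check rather than a floating-point approximation.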
It follows from $-2 < 2\cos(\pi m/n) < 2$ that $\overline{C}_m$ is an irreducible smooth conic curve. By equations (24) of Lemma 4.4 the closure $\overline{\mathrm{Per}}_j(\theta; n)$ of $\mathrm{Per}_j(\theta; n)$ in $\overline{S}(\theta)$ is the union of these $\varphi(n)$ curves $\overline{C}_m$. The curve $\overline{C}_m$ intersects the lines $L = L_i \cup L_j \cup L_k$ at infinity in the two points

$$P^{\pm}_m : [X_0 : X_i : X_j : X_k] = [0 : 0 : -1 : \exp(\pm\pi\sqrt{-1}\,m/n)] \in L_i.$$

If $C_m := \overline{C}_m - \{P^+_m, P^-_m\}$, then $C_m$ is biholomorphic to $\mathbb{C}^{\times}$, since $\overline{C}_m \cong \mathbb{P}^1$. So $\mathrm{Per}_j(\theta; n)$ is the disjoint union of the $\varphi(n)$ curves $C_m$ with $0 < m < n$, $(m, n) = 1$, and hence biholomorphic to the disjoint union of $\varphi(n)$ copies of $\mathbb{C}^{\times}$. ✷

7 Case-by-Case Study

We make case-by-case studies of $\widetilde{\mathrm{Fix}}_j(\theta)$ and of the Riccati periodic points of prime period $n$ according to the adjacency diagram in Figure 10. Now we need to introduce some notation. Recall that we have the resolution of singularities (15) which restricts to an isomorphism $\varphi : \widetilde{S}^{\circ}(\theta) \to S^{\circ}(\theta)$ and that the smooth fixed points $\mathrm{Fix}^{\circ}_j(\theta)$ in $S^{\circ}(\theta)$ are listed in Table 3. For each $P \in \mathrm{Fix}^{\circ}_j(\theta)$ let $\widetilde{P} \in \widetilde{\mathrm{Fix}}{}^{\circ}_j(\theta)$ denote its lift through the isomorphism $\varphi$. For example, $\widetilde{P}(b_i, b_4; b_j, b_k)$ denotes the lift of $P(b_i, b_4; b_j, b_k)$. If $\{\cdots\}$ is a set of expressions $\widetilde{P}(b_i, b_4^{\pm 1}; b_j, b_k)$, $\widetilde{P}(b_j, b_k^{\pm 1}; b_i, b_4)$, then we denote by $\{\{\cdots\}\}$ its subset obtained by discarding those expressions which do not satisfy either the existence condition or the smoothness condition of Table 3. An example is given in (49) below.

Example 7.1 ($\emptyset$) Consider the $\widetilde{W}(D_4^{(1)})$-stratum of type $\emptyset$, namely, the big open $K - \mathrm{Wall}$. We have

$$\widetilde{\mathrm{Fix}}_j(\theta) = \{\{\, \widetilde{P}(b_i, b_4; b_j, b_k),\; \widetilde{P}(b_i, b_4^{-1}; b_j, b_k),\; \widetilde{P}(b_j, b_k; b_i, b_4),\; \widetilde{P}(b_j, b_k^{-1}; b_i, b_4) \,\}\}. \tag{49}$$

Here we have only to care about the existence condition, as we are in the big open where the smoothness condition is fulfilled by hypothesis. If a finer stratification of $K$ attached to the $W(F_4^{(1)})$-action on $K$ is introduced, then a more precise description of (49) is feasible, detecting how many and which elements are there in (49), but the details are omitted.
We only remark that F̃ixj(θ) consists of four distinct points in the most generic case where none of κi±κ4 and κj±κk are integers. As for the periodic points, since there is no Riccati locus, we have P̃erj(θ;n) = P̃er j (θ;n), P̃er j(θ;n) = ∅, (n > 1). Example 7.2 (A1) Consider the W̃ (D 4 )-stratum of type A1. We may assume that κ0 = 0 so that b0 = 1 and bibjbkb4 = 1. Note that none of b i , b j , b 4 equals 1. We claim that b k 6= 1. Otherwise, we would have κj + κk ∈ Z. Applying a shift as in (46) to κ repeatedly, we may assume that κj + κk = 0 while keeping the condition κ0 = 0. Then the transformation w0wj sends κ to κ′ with κ′j = 0 and κ k = κj + κk = 0, so that one has κ ∈ K{j,k}, namely, κ lies in the closure of the stratum of type (A⊕21 )i. This contradicts the assumption that we are in the stratum of type A1. In this case, the surface S(θ) has a unique singular point of type A1 at (xi, xj , xk) = (bib4 + b 4 , bjb4 + b 4 , bkb4 + b Blow up S(θ) at this point to obtain a minimal resolution (15). Write the blowing-up as (xi, xj, xk) = (uiuj + bib4 + b 4 , uj + bjb4 + b 4 , ukuj + bkb4 + b in terms of coordinates (ui, uj, uk). The exceptional set e is the irreducible quadratic curve uj = bibjbk + (1 + b j )bkui + (1 + b j )biuk + bibjbk(u i + u k) + (1 + b k)bjuiuk = 0, which can be paramatrized as uj = 0 and b2i b k − 1)3t bj{bk(b2i − 1)(b2j − 1) + bi(b2jb2k − 1)t}{(b2k − 1)(b24 − 1) + bibkb24(b2jb2k − 1)t} {b2jbk(b2i − 1)(b2k − 1)− bi(b2jb2k − 1)t}{bk(b2j − 1)(1− b24) + bib24(b2jb2k − 1)t} bj{bk(b2i − 1)(b2j − 1) + bi(b2jb2k − 1)t}{(b2k − 1)(b24 − 1) + bibkb24(b2jb2k − 1)t} In terms of this parametrization, the lifted transformation g̃2j acts on the exceptional curve e ≃ P1 by the multiplication t 7→ b2jb2kt. Since b2jb2k 6= 1, the set F̃ix j(θ) consists of the two p0 p+ S̃(θ) t = 0 t = ∞ s = t = 0 s = ∞ t = ∞ S̃(θ) Figure 12: Surface of types A1 (left) and A2 (right) points, say p and q, corresponding to t = 0 and t = ∞ (see Figure 12, left). 
On the other hand, the possible candidates for the smooth fixed points F̃ix j (θ) are only the points of labels 2 and 4 in Table 3, since those of labels 1 and 3 do not satisfy the smoothness condition. Thus, F̃ixj(θ) = {{P̃ (bi, b−14 ; bj , bk), P̃ (bj , b−1k ; bi, b4)}} ∐ {p, q}. (50) As for the Riccati periodic points P̃er j(θ;n), the discussion above implies that for any n > 1, j(θ;n) = e (if bjbk is a primitive 2n-th root of unity), ∅ (otherwise). Example 7.3 (A2) Consider the W̃ (D 4 )-stratum of type A2. We may assume that κ0 = κi = 0 so that b0 = bi = 1. Then the surface S(θ) has a unique singular point of type A2 at (xi, xj , xk) = (b4 + b 4 , bjb4 + b 4 , bkb4 + b Blow up S(θ) at this point to obtain a minimal resolution (15). Write the blowing-up as (xi, xj , xk) = (uiuj + b4 + b 4 , uj + bjb4 + b 4 , ukuj + bkb4 + b in terms of coordinates (ui, uj, uk). The exceptional set e is the union of two lines e+ : uj = bkui + uk + bjbk = 0, e − : uj = b k ui + uk + b k = 0, intersecting in a point. These lines are parametrized as e+ : (ui, uj, uk) = k − 1 bj(1− b2k) + (b2jb2k − 1)s bk(1− b2j ) + bjbk(1− b2jb2k)s bj(1− b2k) + (b2jb2k − 1)s e− : (ui, uj, uk) = k − 1 bj(1− b2k) + (b2jb2k − 1)t bk(1− b2j ) + b−1j b−1k (1− b2jb2k)t bj(1− b2k) + (b2jb2k − 1)t with the intersection point corresponding to s = t = 0. In terms of these parametrizations, the lifted transformation g̃2j acts on e + and e− by the multiplications s 7→ b−2j b−2k s and t 7→ b2jb2kt, ℓ̃+j C ei e4 S̃(θ) Figure 13: Surface of type (A⊕21 )i which are rewritten as s 7→ b24s and t 7→ b−24 t, since bjbkb4 = 1. Note that b24 6= 1, for otherwise κ would be in the closure of the W̃ (D 4 )-stratum of type (A3)i. So g̃ j has exactly two fixed points p0 and p+ on e + corresponding to s = 0 and s = ∞. Similarly g̃2j has exactly two fixed points p0 and p− on e − corresponding to t = 0 and t = ∞, where p0 is the intersection point of e+ and e− (see Figure 12, right). Thus we have F̃ix j(θ) = {p0, p+, p−}. 
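The count of three fixed points can be illustrated with a small sketch. It assumes only what the text states: the lifted transformation acts on each of the two exceptional lines by a multiplication of the parameter, and the two lines share the point $s = t = 0$. The labels and function names are hypothetical, and the multipliers passed in are arbitrary nonzero numbers standing for $b_4^2$ and $b_4^{-2}$.

```python
def scaling_fixed_points(multiplier):
    """Fixed points of t -> multiplier * t on the projective line.

    Returns None when the map is the identity (every point is fixed),
    otherwise the two parameter values 0 and 'inf'.
    """
    if multiplier == 0:
        raise ValueError("a multiplication on P^1 needs a nonzero multiplier")
    return None if multiplier == 1 else (0, "inf")


def chain_fixed_points(mult_plus, mult_minus):
    """Fixed points on two lines e+ and e- glued at s = t = 0.

    Both multipliers must differ from 1, as in the text where b4^2 != 1.
    """
    if scaling_fixed_points(mult_plus) is None or scaling_fixed_points(mult_minus) is None:
        raise ValueError("a multiplier equal to 1 fixes a whole line; excluded here")
    on_plus = {"p0", "p+"}    # s = 0 and s = inf on e+
    on_minus = {"p0", "p-"}   # t = 0 and t = inf on e-
    return on_plus | on_minus  # the shared point p0 is counted once


b4_squared = complex(0.3, 0.8)  # any value different from 0 and 1
print(sorted(chain_fixed_points(b4_squared, 1 / b4_squared)))
```

The union is taken on labels, so the intersection point of the two lines is not counted twice.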
Next we consider the smooth fixed point of g̃2j on S̃(θ). Since we are assuming that κi = κj + κk + κ4 = 1, the points of labels 1, 2, 3 in Table 3 do not satisfy the smoothness condition and that of label 4 is the only smooth fixed point. Thus F̃ix j (θ) = {P̃ (bj , b−1k ; bi, b4)} and hence F̃ixj(θ) = {P̃ (bj , b−1k ; bi, b4), p0, p+, p−}. (51) In the remaining cases presented below, Fixj(θ) contains at least one line component. Example 7.4 (A ) First we consider F̃ixj(θ) and P̃er j(θ) on the W̃ (D 4 )-stratum of type (A⊕21 )i. We may assume that κj = κk = 0 so that bj = bk = 1. Since our stratum is not of type (A3)i nor of type D4, we have (bib4−1)(bib−14 −1) 6= 0 or equivalently bi+b−1i 6= b4+b−14 . In this case Fixj(θ) contains the line ℓ j but does not the line ℓ j and the surface S(θ) has two singular points of type A1 at (xi, xj , xk) = (2, bi + b i , b4 + b 4 ) and (xi, xj , xk) = (2, b4 + b 4 , bi + b We denote the former singularity by qi and the latter by q4 respectively; both singularities lie on the line ℓ+j . Blow up S(θ) at these points to obtain a minimal resolution as in (15). Let ℓ̃+j be the strict transform of ℓ+j , and let ei and e4 be the exceptional curves over qi and q4 respectively. Moreover let pi be the intersection point of ℓ̃ j and ei. Similarly let p4 be the intersection point of ℓ̃+j and e4 (see Figure 13). Then the blowing-up at the point qi is represented as (xi, xj, xk) = (uiuj + 2, uj + bi + b i , ukuj + b4 + b in terms of coordinates (ui, uj, uk) around (0, 0, 0). The strict transform ℓ̃ j and the exceptional curve ei are given by ui = uk + 1 = 0 and uj = (bib4)(u i + u k) + (b i + 1)b4(uiuk) + bi(b 4 + 1)ui + 2(bib4)uk + (bib4) = 0. The exceptional curve ei admits a parametrization (bib4 − 1)(bib−14 − 1) (t+ bi)(bit + 1) , uj = 0, uk = − bi(t+ b4)(b4t+ 1) b4(t+ bi)(bit+ 1) , (52) where the intersection point pi has coordinates (ui, uj, uk) = (0, 0,−1), which corresponds to t = ∞. 
The lifted transformation g̃2j acts on ei as a Möbius transformation fixing pi. Some computations show that in terms of the variable t this transformation is just the shift t 7→ t+ (bi + b−1i )− (b4 + b−14 ). and hence a parabolic transformation. Thus g̃2j has no periodic points on ei other than the fixed point pi. By symmetry, g̃ j also acts on e4 as a parabolic Möbius transformation fixing p4 only. Summarizing the arguments, we conclude that on the W̃ (D 4 )-stratum of type (A 1 )i, F̃ixj(θ) = ℓ̃ j ∐ { P̃ (bi, b4; bj , bk), P̃ (bi, b−14 ; bj, bk) }, P̃er j(θ;n) = ∅ (n > 1). (53) Next we consider F̃ixi(θ) on the W̃ (D 4 )-stratum of type (A 1 )i. Some calculations show that there are parametrizations of ei and e4 such that g̃ i acts on ei and e4 as the multiplications t 7→ b24t and t 7→ b2i t respectively. (Modify (52) to get such parametrization.) Since b24 6= 1 and b2i 6= 1, the transformation g̃2i has exactly two fixed points, say pii and qii, on ei, and exactly two fixed points, say pi4 and qi4, on e4. There are no smooth fixed points F̃ix i (θ), because the smoothness condition of Table 3 with (i, j, k) replaced by (k, i, j) is not satisfied for any labels there. Thus we have F̃ixi(θ) = F̃ix i (θ) = {pii, qii, pi4, qi4} and F̃ix i (θ) = ∅. By symmetry there is a similar characterization of F̃ixk(θ). By permuting the indices (i, j, k), we have F̃ixj(θ) = F̃ix j(θ) = {four points}, F̃ix j(θ) = ∅, (54) on the W̃ (D 4 )-strata of types (A 1 )j and (A 1 )k. A slightly further consideration yields j(θ;n) =   ej ∐ e4 (if bj and b4 are primitive 2n-th roots of unity), ej (if b4 is a primitive 2n-th root of unity, but bj is not), e4 (if bj is a primitive 2n-th root of unity, but b4 is not), ∅ (otherwise). on the stratum (A⊕21 )j and a similar characterization of it on the stratum (A 1 )k. Example 7.5 (A3) First we consider F̃ixj(θ) and P̃er j(θ) on the W̃ (D 4 )-stratum of type (A3)i. 
We may assume that κ0 = κj = κk = 0 and κi+κ4 = 1 so that bj = bk = 1 and bib4 = 1. But we have bib 4 6∈ {±1}, since our stratum is not of type D4. In this case Fixj(θ) contains the line ℓ+j but does not the line ℓ j . The surface S(θ) has only one singular point of type A3 at (xi, xj , xk) = (2, b4 + b 4 , b4 + b 4 ), which lies on the line ℓ j . Blow up the singular point. This blowing-up is expressed as (xi, xj , xk) = (uiuj + 2, uj + b4 + b 4 , ukuj + b4 + b 4 ) in terms of coordinates (ui, uj, uk) around (0, 0, 0). The strict transform of the surface S(θ) is given by uj = b4uiujuk + b4u i ++b4u k + (b 4 + 1)uiuk + (b 4 + 1)ui + 2b4uk + b4 = 0, which has yet one singular point, say q. The exceptional curve consists of two line components uj = ui+b4uk+b4 = 0 and uj = b4ui+uk+1 = 0, whose intersection point (ui, uj, uk) = (0, 0,−1) is exactly the singular point q. The strict transform of ℓ+j is now given by ui = uk+1 = 0, which ℓ̃+jej ek pj p0 pk S̃(θ) Figure 14: Surface of type (A3)i also passes through q. Blow up again the singular point q. Let e0 be the exceptional curve and let ej, ek, ℓ̃ j be the strict transforms of the lines uj = ui+ b4uk+ b4 = 0, uj = b4ui+uk+1 = 0, ui = uk + 1 = 0 respectively. If we express this blowing-up as (ui, uj, uk) = (vi, vivj , vivk − 1), then the exceptional curve e0 is given by vi = b4 − b4vj + (b24 + 1)vk + b4v2k = 0; ej is given by vj = 1 + b4vk = 0; and ek is given by vj = b4 + vk = 0. The intersection point of e0 and ej is (vi, vj , vk) = (0, 0,−bi) and that of e0 and ek is (vi, vj , vk) = (0, 0,−b4). If ej is parametrized as (vi, vj , vk) = ((t+bi) −1, 0,−bi), then the transformation g̃2j acts on ej as the shift t 7→ t+b4−bi. Similarly, if ek is parametrized as (vi, vj, vk) = ((t + b4) −1, 0,−b4), then g̃2j acts on ek as the shift t 7→ t + b4 − bi. Hence g̃2j acts on ej and ek as parabolic Möbius transformations fixing only pj and qj . 
Then g̃ j acts on e0 as the identity, because it also fixes the intersection point p0 of e0 and ℓ̃ j . Summarizing the arguments we see that on the stratum of type (A3)i, F̃ixj(θ) = ℓ̃ e0 ∐ { P̃ (bi, b−14 ; bj , bk) }, P̃er j(θ;n) = ∅ (n > 1), (55) where ℓ̃+j ∪ e0 indicates that the curves ℓ̃ j and e0 meet in the point p0. Next we consider F̃ixi(θ) and P̃er i (θ;n) on the W̃ (D 4 )-stratum of type (A3)i. If we take a parametrization of e0 such that t = 0 and t = ∞ correspond to the points pj and pk respectively, then a simple check shows that the transformation g̃2i on ej is expressed as t 7→ b−24 t. There is a parametrization of ej such that t = 0 corresponds to pj and g̃ i is given by t 7→ b24t. Since b24 6= 1, the transformation g̃2i has exactly two fixed points on ej , one of which is just pj and the other is denoted by pij . Similarly, there is a parametrization of ek such that t = 0 corresponds to pk and g̃ i is given by t 7→ b−24 t, and hence g̃2i has exactly two fixed points on ek, one of which is just pk and the other is denoted by pik. There are no smooth fixed points F̃ix i (θ), because the smoothness condition of Table 3 with (i, j, k) replaced by (k, i, j) is not satisfied for any labels there. So we have F̃ixi(θ) = F̃ix i (θ) = {pj, pij , pk, pik} and F̃ix i (θ) = ∅ on the W̃ (D 4 )-stratum of type (A3)i. By symmetry there is a similar characterization of F̃ixk(θ) on the same stratum. By permuting the indices (i, j, k), we have F̃ixj(θ) = F̃ix j(θ) = {four points}, F̃ix j(θ) = ∅, (56) ej ek pj pk r0 r∞ S̃(θ) Figure 15: Surface of type A⊕31 on the W̃ (D 4 )-strata of types (A3)j and (A3)k. A slightly further consideration yields j(θ;n) = ei (if b4 is a primitive 2n-th root of unity), ∅ (otherwise), on the stratum (A3)j and a similar characterization of it on the stratum (A3)k. Example 7.6 (A ) Consider the W̃ (D 4 )-stratum of type A 1 . We may assume that κi = κj = κk = 0 so that bi = bj = bk = 1. 
But we have $b_4 \notin \{\pm 1\}$ since our stratum is neither of type $D_4$ nor of type $A_1^{\oplus 4}$. In this case the surface $S(\theta)$ has three singular points of type $A_1$ at

$$(x_i, x_j, x_k) = (b_4 + b_4^{-1}, 2, 2), \quad (2, b_4 + b_4^{-1}, 2), \quad (2, 2, b_4 + b_4^{-1}),$$

which are called $q_i$, $q_j$, $q_k$ respectively. Note that the two points $q_j$ and $q_k$ lie on the line $\ell^+_j$ but $q_i$ does not lie on the union $\ell^+_j \amalg \ell^-_j$. The minimal resolution (15) is obtained by blowing up these three points (see Figure 15).

First, consider the blowing-up at $q_k$ and represent it by $(x_i, x_j, x_k) = (u_i u_j + 2,\; u_j + 2,\; u_k u_j + b_4 + b_4^{-1})$. Then the strict transform $\widetilde{\ell}^+_j$ of the line $\ell^+_j$ is given by $u_i = u_k + 1 = 0$, while the exceptional curve $e_k$ is given by $u_j = 0$ and $b_4(u_i + u_k + 1)^2 + (b_4 - 1)^2 u_i = 0$. The curves $\widetilde{\ell}^+_j$ and $e_k$ intersect in the point $(u_i, u_j, u_k) = (0, 0, -1)$; this point is called $p_k$. If we parametrize the curve $e_k$ as

$$u_i = -\frac{b_4}{(b_4 - 1)^2 t^2}, \qquad u_j = 0, \qquad u_k = -\frac{\{(b_4 - 1)t + 1\}\{(b_4 - 1)t - b_4\}}{(b_4 - 1)^2 t^2} \qquad (t \in \mathbb{P}^1),$$

where $t = \infty$ corresponds to the point $p_k$, then the lifted transformation $\widetilde{g}_j^2$ induces the shift $t \mapsto t + 1$ and hence acts on $e_k$ as a parabolic Möbius transformation fixing $p_k$ only. In a similar manner $\widetilde{g}_j^2$ acts on the exceptional curve $e_j$ over $q_j$ as a parabolic Möbius transformation fixing only the intersection point $p_j$ of $\widetilde{\ell}^+_j$ and $e_j$.

Next we consider the blowing-up at $q_i$ and represent it by $(x_i, x_j, x_k) = (u_i u_j + b_4 + b_4^{-1},\; u_j + 2,\; u_k u_j + 2)$. Then the exceptional curve $e_i$ is given by $u_j = 0$ and $b_4(u_i + u_k + 1)^2 + (b_4 - 1)^2 u_k = 0$, which can be parametrized as

$$u_i = -\frac{(b_4 + 1)^2 t}{(b_4 t + 1)^2}, \qquad u_j = 0, \qquad u_k = -\frac{b_4 (t - 1)^2}{(b_4 t + 1)^2} \qquad (t \in \mathbb{P}^1).$$

Figure 16: Surface of type $D_4$

In terms of this parametrization, the transformation $\widetilde{g}_j^2$ restricts to the map $t \mapsto b_4^2 t$ on the exceptional curve $e_i$. Let $r_0$ and $r_\infty$ be the points on $e_i$ corresponding to $t = 0$ and $t = \infty$ respectively. Since $b_4^2 \neq 1$, the map $\widetilde{g}_j^2$ acts on $e_i$ as a Möbius transformation with exactly two fixed points $r_0$ and $r_\infty$.
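The two behaviours met in this example, a shift with a single fixed point and a multiplication with two, can be told apart mechanically. The following sketch is a generic illustration with hypothetical helper names, not a computation from the paper. It classifies a Möbius transformation $t \mapsto (at + b)/(ct + d)$ by the discriminant of its fixed-point equation $ct^2 + (d - a)t - b = 0$, and it computes the order of a multiplier written as $\exp(2\pi\sqrt{-1}\,r)$ with $r$ rational, which is the prime period of every point other than $0$ and $\infty$ under the corresponding multiplication.

```python
from fractions import Fraction


def classify_moebius(a, b, c, d):
    """Classify t -> (a*t + b)/(c*t + d) by its number of fixed points on P^1."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("the matrix must be invertible")
    if b == 0 and c == 0 and a == d:
        return "identity"
    # Discriminant of c*t^2 + (d - a)*t - b = 0, which equals trace^2 - 4*det.
    disc = (a + d) ** 2 - 4 * det
    return "parabolic (one fixed point)" if disc == 0 else "two fixed points"


def multiplier_order(r):
    """Order of exp(2*pi*i*r) for a rational r; 1 means the multiplier equals 1."""
    return (Fraction(r) % 1).denominator


print(classify_moebius(1, 1, 0, 1))               # the shift t -> t + 1
print(classify_moebius(Fraction(9, 4), 0, 0, 1))  # the multiplication t -> (9/4) t
print(multiplier_order(Fraction(2, 5)))           # multiplier exp(2*pi*i*2/5)
```

Inputs should be exact (integers or `Fraction` values); with floating-point entries the test `disc == 0` is unreliable.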
Hence the set F̃ixj(θ) contains the line component ℓ̃ j and the Riccati compo- nent {r0, r∞}, but has no smooth-point component, since F (bi, b4; bj , bk) = F (bi, b−14 ; bj, bk) = b4 + b 4 6∈ {±2} is a double root of the quartic equation (39) (see Theorem 6.3). Thus we have F̃ixj(θ) = ℓ̃ j ∐ {r0, r∞}. (57) As for the Riccati periodic points, since g̃2j acts on ei as t 7→ b24t, we have for any n > 0, j(θ;n) = ei (if b4 is a primitive 2n-th root of unity), ∅ (otherwise). Example 7.7 (D4) Consider the W̃ (D 4 )-stratum of type D4, say, the W (D 4 )-stratum with value θ = (8, 8, 8, 28). In this case the surface S(θ) has only one singular point of type D4 at (xi, xj, xk) = (2, 2, 2). The minimal resolution (15) is obtained by successive blowing-ups: Blow up the singular point. If we express the blowing-up as (xi, xj , xk) = (uiuj + 2, uj + 2, ukuj + 2) in terms of coordinates (ui, uj, uk), then the strict transform of the surface S(θ) is represented as uiujuk + (ui + uk + 1) 2 = 0. The exceptional curve e is given by uj = ui + uk + 1 = 0. The strict transforms of ℓ+i and ℓ j are given by ui+1 = uk = 0 and ui = uk+1 = 0, while the strict transform of ℓ+k is at infinity and not expressible in terms of the coordinates (ui, uj, uk). The blow-up surface has three singularities, all of which are of type A1 and located at the points in which the exceptional curve e intersects the strict transforms of ℓ+i , ℓ j , ℓ k . The lifts of the transformations g2i , g j , g k fix the curve e pointwise, since they fix the three singular points on it. Again blow up these points. Then we obtain a minimal resolution (15) of the surface S(θ) as depicted in Figure 16, where ei, ej , ek are the exceptional curves over the singular points and e++−e+−+ e−−− e−++ S̃(θ) P Figure 17: Surface of type A⊕41 e0, ℓ̃ i , ℓ̃ j , ℓ̃ k are the strict transforms of e, ℓ i , ℓ j , ℓ k , respectively. 
Being the strict transform of e, the exceptional curve e0 is fixed pointwise by the lifts g̃ i , g̃ j , g̃ k of g i , g j , g k, and hence carries rational solutions. Moreover the lift g̃2j fixes ej pointwise. This can be seen without computation. Since g̃2j is area-preserving and fixes ℓ̃ j pointwise, it has derivative 1 at pj along the curve ej . So the Möbius transformation on ej induced by g̃ j is either identity or a map of parabolic type. But the latter is impossible because it has at least two fixed points at pj and qj (see Figure 16). Hence g̃ j acts on ej as the identity. Next we shall observe that g̃ acts on ei as a parabolic Möbius transformation fixing qi only. If we express the blowing-up at (ui, uj, uk) = (−1, 0, 0) as (ui, uj, uk) = (vivk − 1, vjvk, vk), then the exceptional curve ei is given by vk = vj − (vi + 1)2 = 0. Parametrize ei as (vi, vj , vk) = (−(t + 1)/t, t−2, 0), where qi corresponds to t = ∞. Then g̃2j acts on ei by the shift t 7→ t + 1. Similarly g̃2j acts on ek as a parabolic transformation fixing qk only. By symmetry, g̃ i and g̃ k act on ej as parabolic transformations fixing qj only. Notice that the exceptional curve e0 carries rational Riccati solutions, while ej − {qj} carries Riccati solutions of infinte period. Thus we have F̃ixj(θ) = ℓ̃ e0, P̃er j(θ;n) = ∅ (n > 1). (58) Example 7.8 (A ) Consider the W̃ (D 4 )-stratum of type A 1 , where θ = (0, 0, 0,−4) and Fixj(θ) = ℓ j ∐ℓ−j . In this case the surface S(θ) has four singularities of type A1 at (xi, xj, xk) = (2εi, 2εj, 2εk) ∈ {±2}3 with εiεjεk = −1. Blow up at these points to obtain a minimal resolution as in (15). Let eεiεjεk be the exceptional line over (xi, xj, xk) = (2εi, 2εj, 2εk) and ℓ̃ j be the strict transform of ℓ j . Moreover let p εiεjεk denote the intersection point of the lines eεiεjεk and ℓ̃ j (see Figure 17). Then the lifted transformation g̃ j : S̃(θ) acts on the exceptional line eεiεjεk ∼= P1 as a Möbius transformation. 
It is a parabolic transformation with the only fixed point pεiεjεk . Let us check this for (εi, εj, εk) = (−1,−1,−1). The blowing-up of C3 at (xi, xj, xk) = (−2,−2,−2) is described by xi = uiuj − 2, xj = uj − 2, xk = ujuk − 2, in terms of coordinates (ui, uj, uk) around (0, 0, 0). Then the exceptional line e −−− is represented by the equations uj = 0 and (ui − uk)2 − 2(ui + uk) + 1 = 0 and hence it is parametrized as , uj = 0, uk = where the fixed point p−−− corresponds to t = ∞. Then we can check that g̃2j acts on the line e−−− as the translation t 7→ t+4, as desired. Thus the only fixed points of g̃2j on the exceptional set E(θ) are the four points pεiεjεk with εiεjεk = −1 and there are no periodic points, so that F̃ixj(θ) = ℓ̃ j ∐ ℓ̃−j , P̃er j(θ;n) = ∅ (n > 1). (59) 8 Power Geometry We apply the method of power geometry [5, 6, 7] to construct as many algebraic branch solutions to PVI(κ) as possible around each fixed singular point. Basically we can follow the arguments of [7]. However, while the attention of [7] is restricted to generic parameters, we require a thorough treatment of all parameters, where much ampler varieties of patterns are present. Moreover, the way in [7] of representing the parameters of Painlevé VI is not convenient for our purpose. So we have to redevelop the necessary arguments on power geometry from scratch. In view of Remark 5.1, it is sufficient to work around the origin z = 0. In order to apply the method in [5, 6, 7], we reduce the system (1) into a single second-order equation. If (q, p) = (q(z), p(z)) is a solution to system (1) such that q 6≡ 0, 1, z, ∞, then we solve the first equation of system (1) with respect to p = p(z) to obtain z(z − 1)q′ + κ1q1qz + (κ2 − 1)q0q1 + κ3q0qz 2q0q1qz . 
(60)

Substituting this into the second equation yields the single second-order equation

$$q'' = \frac{1}{2}\left(\frac{1}{q_0} + \frac{1}{q_1} + \frac{1}{q_z}\right)(q')^2 - \left(\frac{1}{z} + \frac{1}{z - 1} + \frac{1}{q_z}\right) q' + \frac{q_0 q_1 q_z}{2 z^2 (z - 1)^2} \left\{ \kappa_4^2 - \kappa_1^2 \frac{z}{q_0^2} + \kappa_3^2 \frac{z - 1}{q_1^2} + (1 - \kappa_2^2) \frac{z(z - 1)}{q_z^2} \right\}. \tag{61}$$

Multiply equation (61) by $2z^2(z - 1)^2 q_0 q_1 q_z$ and move its right-hand side to the left to obtain

$$P(z, q) = 0, \tag{62}$$

where $P(z, q)$ is a polynomial of $(z, q, q', q'')$, that is, a differential sum of $(z, q)$, whose explicit formula is omitted here but can be found in [7]. Therefore system (1) is equivalent to equation (62) together with (60) except for the possible solutions such that $q \equiv 0, 1, z, \infty$. A simple check shows that the Newton polygon of equation (62) is given as in Figure 18, where there are four patterns according as the parameters $\kappa_1$ and $\kappa_4$ are zero or not.

First we search for a holomorphic solution germ $q = q(z)$ to equation (62) around $z = 0$. We have only to construct formal power series solutions of the form

$$q = c z^r + (\text{higher order terms}), \qquad (r, c) \in \mathbb{Z} \times \mathbb{C}^{\times}, \tag{63}$$

since any formal power series solution to equation (62) is convergent [10, 11]. Then it follows from (60) that the associated formal Laurent series for $p = p(z)$ is also convergent.

Figure 18: Newton polygon for Painlevé VI. Four panels, with labelled points: Case $\kappa_1 \neq 0$, $\kappa_4 \neq 0$: (3, 0), (0, 3), (0, 6), (3, 3). Case $\kappa_1 = 0$, $\kappa_4 \neq 0$: (0, 3), (0, 6), (3, 3), (3, 2), (1, 2). Case $\kappa_1 \neq 0$, $\kappa_4 = 0$: (3, 0), (0, 3), (3, 3), (0, 4), (2, 4). Case $\kappa_1 = 0$, $\kappa_4 = 0$: (0, 3), (3, 3), (0, 4), (2, 4), (3, 2), (1, 2).

Figure 19: Newton polygons for Lemma 8.1 (left) and Lemma 8.2 (right). Labelled points: (4, 0), (0, 3), (0, 6), (2, 1), (6, 0) on the left; (3, 0), (0, 1), (0, 6), (3, 3), (1, 0) on the right.

In order to construct formal solutions (63), we consider the truncations along the edges $\Gamma_1$ and $\Gamma_0$ of the Newton polygons in Figure 18. We see that $\Gamma_1$ and $\Gamma_0$ have outer normal vectors $(p_1, p_2) = (-1, -1)$ and $(p_1, p_2) = (-1, 0)$, whose slopes are $p_2/p_1 = 1$ and $p_2/p_1 = 0$ respectively. Thus the edges $\Gamma_1$ and $\Gamma_0$ correspond to the exponents $r = 1$ and $r = 0$ respectively.
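The correspondence between edges and exponents can be made concrete. The sketch below assumes the usual power-geometry convention that a differential monomial $z^a q^b (q')^c (q'')^d$ is represented by the point $(a - c - 2d,\; b + c + d)$; this convention and the helper names are assumptions of the sketch. The exponent tuples listed are those of the monomials in the truncations $P_1$ and $P_0$ quoted in the text immediately after this point. The sketch checks that these points lie on lines orthogonal to the stated normal vectors and recovers $r = p_2/p_1$.

```python
from fractions import Fraction


def monomial_point(a, b, c, d):
    """Point of z^a q^b (q')^c (q'')^d; the convention is an ASSUMPTION of this sketch."""
    return (a - c - 2 * d, b + c + d)


def lies_on_one_edge(monomials, normal):
    """True if all monomial points have the same inner product with the normal."""
    p1, p2 = normal
    values = {p1 * x + p2 * y
              for (x, y) in (monomial_point(*m) for m in monomials)}
    return len(values) == 1


def exponent_from_normal(normal):
    """Exponent r = p2/p1 attached to an edge with outer normal (p1, p2)."""
    p1, p2 = normal
    return Fraction(p2, p1)


# Exponents (a, b, c, d) of the monomials of P_1 and P_0, coefficients dropped.
GAMMA1 = [(1, 2, 1, 0), (2, 1, 2, 0), (2, 2, 0, 1), (1, 2, 0, 0),
          (3, 0, 2, 0), (3, 1, 0, 1), (2, 1, 0, 0), (3, 0, 0, 0)]
GAMMA0 = [(1, 2, 1, 0), (2, 1, 2, 0), (2, 2, 0, 1), (0, 4, 0, 0),
          (1, 3, 1, 0), (2, 2, 2, 0), (2, 3, 0, 1), (0, 5, 0, 0), (0, 6, 0, 0)]

print(lies_on_one_edge(GAMMA1, (-1, -1)), exponent_from_normal((-1, -1)))
print(lies_on_one_edge(GAMMA0, (-1, 0)), exponent_from_normal((-1, 0)))
```

Under this convention the monomials of $P_1$ fall on the segment from (0, 3) to (3, 0), and those of $P_0$ on the vertical segment from (0, 3) to (0, 6).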
The truncation of P = P (z, q) along the edge Γ1 is given by P1 = −2zq2q′ + 2z2q(q′)2 − 2z2q2q′′ + (κ21 − κ22 + 1)zq2 − z3(q′)2 + 2z3qq′′ −2κ21z2q + κ21z3. Substituting q = cz into equation P1 = 0 yields c = κ1/(κ1 + εκ2) with any sign ε ∈ {±1}. Similarly, the truncation of P = P (z, q) along the edge Γ0 is given by P0 = −2zq2q′ + 2z2q(q′)2 − 2z2q2q′′ + (κ23 − κ24)q4 + 2zq3q′ − 3z2q2(q′)2 +2z2q3q′′ + 2κ24q 5 − κ24q6. Substituting q = c into equation P0 = 0 yields c = (κ4 + εκ3)/κ4 with any sign ε ∈ {±1}. Lemma 8.1 If κ1 + κ2 6∈ Z, then there exists a holomorphic solution around the origin z = 0, κ1 + κ2 + κ1κ2 ak,+(κ) z k, p = κ0(κ0 + κ4) bk,+(κ) z k, (64) depending holomorphically on κ ∈ K with κ1 + κ2 6∈ Z. Similarly, if κ1 − κ2 6∈ Z, then there exists a meromorphic solution around the origin z = 0, κ1 − κ2 + κ1κ2 ak,−(κ) z k, p = κ1 − κ2 bk,−(κ) z k, (65) depending holomorphically on κ ∈ K with κ1 − κ2 6∈ Z. Proof. Substituting q = κ1z(κ1 + εκ2) −1 + κ1κ2Q with ε ∈ {±1} into equation (62) yields (κ1 + εκ2) z, κ1z(κ1 + εκ2) −1 + κ1κ2Q = κ21κ 2 p(z, Q), where p(z;Q) is a differential sum of (z, Q) with coefficients in C[κ] whose Newton polygon is given as in Figure 19 (left). The vertex (2, 1) carries the linear differential expression LεQ = 2ε(κ1 + εκ2)4x2{x2Q′′ − xQ′ − (κ1 + εκ2 + 1)(κ1 + εκ2 − 1)Q}, while the vertex (4, 0) carries the monomial (κ1 + εκ2) 2{(κ1 + εκ2)2 + κ23 − κ24 − 1}x4. The corresponding characteristic polynomial is given by vε(k) = 2ε(κ1 + εκ2) 4(k − 1− κ1 − εκ2)(k − 1 + κ1 + εκ2). Hence 1 + |κ1 + εκ2| is the unique critical value of the problem. If it is not an integer, then the coefficients ak,ε(κ) of the expansions (64) and (65) are determined uniquely and recursively. By substituting the resulting power series q = q(z) into equation (60), the Laurent series for p = p(z) is uniquely determined as in (64) and (65). As is mentioned earlier, the formal solutions (64) and (65) so obtained are convergent. ✷ Lemma 8.2 Assume that κ4 is nonzero. 
If κ4 + κ3 6∈ Z, then there exists a holomorphic solution germ around the origin z = 0, κ4 + κ3 ak,+(κ) z k, p = −κ4κ0 bk,+(κ) z k, (66) depending holomorphically on κ ∈ K with κ4 6= 0 and κ4 + κ3 6∈ Z. Similarly, if κ4 − κ3 6∈ Z then there exists a holomorphic solution germ around the origin z = 0, κ4 − κ3 ak,−(κ) z k, p = −κ4(κ0 + κ4) bk,−(κ) z k, (67) depending holomorphically on κ ∈ K with κ4 6= 0 and κ4 − κ3 6∈ Z. Proof. Substituting q = κ−14 (κ4 + εκ3) + κ 4 κ3Q into equation (62) yields z, κ−14 (κ4 + εκ3) + κ 4 κ3Q = κ23 p(z;Q), where p(z;Q) is a differential sum of (z, Q) with coefficients in C[κ]. The vertex (0, 1) carries the linear differential expression LεQ = 2(κ4 + εκ3)2{x2Q′′ + xQ′ − (κ4 + εκ3)2Q}, while the vertex (1, 0) carries the monomial (κ4 + εκ3) 2{1 + κ21 − κ22 − (κ4 + εκ3)2}x. The corresponding characteristic polynomial is given by vε(k) = 2(κ4 + εκ3) 2{k − (κ4 + εκ3)}{k + (κ4 + εκ3)}. Hence |κ4 + εκ3| is the unique critical value of the problem. If it is not an integer, then the coefficients ak,ε(κ) of expansions (66) and (67) are determined uniquely and recursively. Then substituting the resulting series for q = q(z) into equation (60) yields the Laurent series for p = p(z) as in (66) and (67). The formal solutions so obtained are convergent. ✷ The solutions in Lemmas 8.1 and 8.2 are essentially constructed in [7, 23]. We construct more particular solutions for the parameters on various strata of higher codimensions. 0 (4, 0) (0, 3) (0, 6) (2, 1) Figure 20: Newton polygon for Lemma 8.3 Lemma 8.3 (A and A ) If κ1 = κ2 = 0, then there exists a 1-parameter family of holo- morphic solution around the origin z = 0, t+ (1− t)(1− z)κ4 + t(1− t)κ0(κ0 + κ3) ak(t; κ) z p = κ0(κ0 + κ4) bk(t; κ) z depending on t ∈ C, where a2(t; κ) = 2, b0(t; κ) = 1 and the remaining coefficients ak(t; κ), k ≥ 3, and bk(t; κ), k ≥ 1, are polynomials of (t, κ3, κ4) determined uniquely and recursively. Proof. We put R(z; t) = tz{t+(1−t)(1−z)κ4}−1. 
Substituting q = R(z; t)+t(1−t)κ0(κ0+κ3)Q into equation (62) and multiplying the result by {t+ (1− t)(1− z)κ4}4 yield p(z, Q; t) := {t+ (1− t)(1− z)κ4}4P (z, R(z; t) + t(1− t)κ0(κ0 + κ3)Q) = 2t2(1− t)2κ0(κ0 + κ3){LQ+ g(z, Q; t) + h(z)} = 0, where LQ = z2{z2Q′′ − zQ′ + Q} and h(z) = −2z4(1 − z)2κ4+1. The Newton polygon of (69) is given as in Figure 20, where the terms LQ and h(z) correspond to the vertex (2, 1) and the horizontal infinite edge emanating from the vertex (4, 0) respectively, and the remaining term g(z, Q; t) corresponds to the remaining part of the polygon. Since the characteristic equation of LQ is (k − 1)2 = 0 having the unique root k = 1, the coefficients ak(t; κ), k ≥ 2, in (68) are determined uniquely and recursively. Here the leading coefficient a2(t; κ) is found to be a2(t; κ) = 2 by substituting Q = a2(t; κ)z 2 into the truncation LQ − 2z4 = 0 of equation (69) along the edge connecting the vertices (2, 1) and (4, 0). Substituting the resulting series q = q(z) into (60) we have p = p(z) as in (68). ✷ (0, 6) (0, 1) (2, 0) Figure 21: Newton polygon for Lemma 8.4 Lemma 8.4 (A3) If κ0 = κ1 = κ2 = 0, κ3 + κ4 = 1, then there exists a 1-parameter family of holomorphic solutions around the origin z = 0, 1− (1− z)κ4 + tκ3z + tκ3κ4 ak(t; κ) z k, p = −tκ34 bk(t; κ) z k, (70) depending on t ∈ C, where a2(t; κ) = tκ3, b1(t; κ) = 1 and the coeffcients ak(t; κ), k ≥ 2, and bk(t; κ), k ≥ 1, are polynomials of (t, κ4) determined uniquely and recursively. Proof. Put R(z) = z{1− (1− z)κ4}−1. Substituting q = R(z) + tκ3z + tκ3κ4Q into (62) yields P (z, R(z) + tκ3z + tκ3κ4Q) = tκ 4R(x) 4 p(z, Q; t), where p(z, Q; t) is a differential sum of (z, Q) with coefficients in C[t, κ4] whose Newton polygon is given as in Figure 21. Especially the vertex (0, 1) carries the linear differential expression LQ = 2(z2Q′′ + zQ′ −Q), whose characteristic polynomial is 2(k − 1)(k + 1), while the vertex (2, 0) carries the monomial −6tκ3z2. 
Since the critical values k = ±1 are smaller than 2, the coefficients ak(t; κ), k ≥ 2, in (70) are determined uniquely and recursively, where the leading coefficient a2(t; κ) is found to be tκ3. The rest of the proof is similar to that in Lemma 8.3. ✷ Lemma 8.5 (D4) If κ0 = κ1 = κ2 = κ3 = 0 and κ4 = 1, then there exists a 1-parameter family of holomorphic solution germs around the origin z = 0, q = 1 + t ak(t) z k, p = (1− z) log(1− z) + t bk(t) z k, (71) depending on a parameter t ∈ C, where a1(t) = b1(t) = 1 and the remaining coeffieicents ak(t) and bk(t), k ≥ 2, are polynomials of t determined uniquely and recursively. 0 (3, 0) (0, 2) (0, 6) (1, 1) (6, 0) 0 (2, 1)(0, 1) (0, 4) (2, 3) (1, 0) Figure 22: Newton polygons for Lemma 8.5 (left) and Lemma 8.6 (right) Proof. Substituting q = 1+ tz + tQ into equation (62) yields P (z, 1+ tQ) = t2p(z, Q; t), where p(z, Q; t) is a differential sum of (z, Q) with coefficients in C[t] whose Newton polygon is given as in Figure 22 (left). Consider the edge connecting (1, 1) and (3, 0). The vertex (1, 1) carries the differential monomial LQ = 2z3Q′′, whose characteristic polynomial is k(k − 1), while the vertex (3, 0) carries the monomial −4tz3. Put a1(t) = 1. Since the critical values k = 0, 1 are smaller than 2, the coefficients ak(t), k ≥ 2, in (71) are determined uniquely and recursively. Substituting the resulting series q = q(z) into (60) we have p = p(z) as in (71). Here the term z/{(1 − z) log(1 − z)} is singled out from p = p(z), because putting t = 0 yields the special solution q ≡ 1 and p = z/{(1− z) log(1− z)} (see also Lemma 9.4). ✷ Lemma 8.6 (A ) Let κ0 = 1/2 and κ1 = κ2 = κ3 = κ4 = 0. Then there exists a 1-parameter family of holomorphic solutions around the origin z = 0 depending on a parameter t ∈ C, q = tz + t(1− t) ak(t) z k, p = bk(t) z k, (72) where the coefficients ak(t) and bk(t) are polynomials of t beginning with a2(t) = 1/2 and b0(t) = 1/4. 
Moreover there is another 1-parameter family of solutions around z = 0, ck(t) z k, p = t dk(t) z k, (73) depending on t ∈ C, where ck(t) and dk(t) are polynomials of t beginning with c1(t) = 1/2 and d0(t) = −1/2. For t = 0, formula (73) represents the solution such that q ≡ ∞ and p ≡ 0. Proof. We only derive (73), as (72) is derived in a similar manner. Substituting q = t−1 + t−1(t−1)Q into (62) yields P (z, t−1+t−1(t−1)Q) = t−4(t−1)2z−1p(z, Q; t), where p(z, Q; t) is a differential sum of (z, Q) with coefficients in C[t] whose Newton polygon is given as in Figure 22 (right). Consider the edge connecting (0, 1) and (1, 0). The vertex (0, 1) carries the differential sum LQ = −2(zQ′ + z2Q′′), whose characteristic polynomial is −2k2, while the vertex (1, 0) carries the monomial z. Since the critical value k = 0 is smaller than 1, the coefficients ck(t), k ≥ 1, in (73) are determined uniquely and recursively, where the leading term is c1(t) = 1/2. Substituting the resulting series q = q(z) into (60) we have p = p(z) as in (73). ✷ So far we have considered the edges Γ1 and Γ0 of the Newton polygon in Figure 18 and constructed meromorphic solutions around the origin z = 0. Now let us consider the vertex (0, 3) of the polygon, which gives rise to algebraic branch solutions around z = 0. The truncation of the differential sum P = P (z, q) at the vertex (0, 3) is given by P3 = −2zq2q′ + 2z2q(q′)2 − 2z2q2q′′. Its charactersistic polynomial χ(r) is defined by substituting q = zr into P3(z, q) and dividing the result by q3 = z3r. In the present situation we see that χ(r) is identically zero; χ(r) ≡ 0. The normal cone of the vertex (0, 3) is U3 = { (p1, p2) ∈ R2 : p1 < 0, 0 < r = p2/p1 < 1 }. Thus the truncated solutions at the vertex (0, 3) are q = tzr for an arbitrary 0 < r < 1 and t ∈ C×. The Fréchet derivative with respect to q at the truncated solution q = tzr is given by L3Q = −2t2z2r{z2Q′′ + (1− 2r)zQ′ + r2Q}. 
The corresponding characteristic equation is given by v3(k) := −2t2(k − r)2 = 0, which has the only root k = r. Let n be any integer greater than 1. In order to search for an algebraic n-branch solution around z = 0, we take r = m/n for any integer 0 < m < n coprime to n, and consider a formal Puiseux series solution of the form q = tzm/n + ν=m+1 aν(t) z Since the characteristic equation v3(k) = 0 has no roots such that k > m/n, the coefficients aν = aν(t) can be determined uniquely and recursively for any given initial coefficient t ∈ C×. The convergence of the formal solution and its holomorphic dependence on parameters follow easily if we rewrite the equation (62) in terms of the new independent variable ζ = z1/n and apply the convergence arguments in [10, 11]. Thus we have established the following lemma. Lemma 8.7 For any integer n > 1, there exist ϕ(n) mutually disjoint 1-parameter families of n-branch solution germs to PVI(κ) around the origin z = 0, q(z) = tzm/n + ν=m+1 aν(n,m, t; κ) z p(z) = m+ n(κ1 + κ2 − 1) z−m/n + ν=−m+1 bν(n,m, t; κ) z where the discrete parameter m ranges over all integers 0 < m < n coprime to n and the continuous parameter t takes any value of the punctured complex line C×. Remark 8.8 The family (74) contains no Riccati solutions, even if κ ∈ Wall. This will be shown in the proof of Lemma 9.14. 9 Injection Implies Surjection We establish Theorem 1.3 based on the main idea described in §1. Due to the S3-symmetry permuting the three fixed singular points, it suffices to work around z = 0 (see Remark 5.1). As a preliminary we begin by constructing some Riccati solutions to equation (1). Assume that κ0 = 0 so that κ1 + κ2 + κ3 + κ4 = 1. Then the second equation of system (1) has the null solution p(z) ≡ 0. Substituting this into the first equation yields the Riccati equation z(z − 1)q′ + κ1q1qz + (κ2 − 1)q0q1 + κ3q0qz = 0. 
If κ4 is nonzero, then the change of dependent variable z(1 − z) log{(1− z)−κ4f} transfers the Riccati equation to the Gauss hypergeometric equation z(1− z)f ′′ + {(1− κ3 − κ4)− (κ2 − κ4 + 1)z}f ′ + κ2κ4f = 0. (75) Next assume that κ0 = κ1 = 0 so that κ2 + κ3 + κ4 = 1. In this case there is another type of Riccati solution to the system (1). The first equation of the system (1) has the null solution q(z) ≡ 0. Substituting this into the second equation yields the Riccati equation z(z − 1)p′ + zp2 + (κ2 − 1 + κ3z)p = 0. Then change of independent variable p = (z − 1) d log g takes it to the linear equation z(1 − z)g′′ + {(1− κ2)− (κ3 + 1)z}g′ = 0. (76) Lemma 9.1 (A1) Assume that κ0 = 0, κ1 + κ2 + κ3 + κ4 = 1, κ1 + κ2 6∈ Z, κ3 + κ4 6∈ Z and κ1κ4 6= 0. Then system (1) has two single-valued Riccati solutions around the origin z = 0, (a) q = κ1 + κ2 +O(κ1z 2), p ≡ 0, (b) q = κ3 + κ4 +O(κ3z), p ≡ 0. Proof. The solutions (a) and (b) are obtained from two linearly independent solutions 2F1(κ2,−κ4, 1− κ3 − κ4; z), zκ3+κ42F1(κ2 + κ3 + κ4, κ3, κ3 + κ4 + 1; z), of equation (75) repsectively. They also come from solutions (64) and (66) respectively. ✷ Lemma 9.2 (A2) Assume that κ0 = κ1 = 0, κ2 + κ3 + κ4 = 1, κ2 6∈ Z, κ3 6= 1 and κ4 6= 0. Then system (1) has three single-valued Riccati solutions around the origin z = 0, (a) q ≡ 0, p ≡ 0, (b) q = κ3 + κ4 +O(z), p ≡ 0, (c) q ≡ 0, p = −κ2(κ3 − 1) κ2 + 1 +O(z). Proof. The solutions (a) and (b) just come from (a) and (b) of Lemma 9.1 respectively, while solution (c) is obtained from the solution zκ22F1(κ2, κ2 + κ3, κ2 + 1; z) of equation (76). ✷ Lemma 9.3 (A3) Assume that κ0 = κ1 = κ2 = 0, κ3 + κ4 = 1 and κ4 6= 0. Then system (1) admits a 1-parameter family of single-valued Riccati solutions around the origin z = 0, q(z; s) = s0 + s1(1− z)κ4 , p(z; s) ≡ 0, (s = [s0 : s1] ∈ P1). (77) Proof. 
The hypergeometric equation (75) becomes (1 − z)f ′′ − κ3f ′ = 0, whose nontrivial solutions are given by f(z) = s0 + s1(1− z)κ4 with (s0, s1) ∈ C2 − {(0, 0)}. The corresponding Riccati solutions are the 1-parameter family of single-valued solutions q = q(z; s) as in (77). ✷ Lemma 9.4 (D4) Assume that κ0 = κ1 = κ2 = κ3 = 0 and κ4 = 1. (1) System (1) admits a 1-parameter family of rational Riccati solutions q(z; s) = s0 + s1(1− z) , p(z; s) ≡ 0, (s = [s0 : s1] ∈ P1). (78) (2) System (1) admits a 1-parameter family of single-valued Riccati solutions around z = 0, q(z; t) ≡ 1, p(z; t) = t0z (1− z){t0 log(1− z) + t1} , (t = [t0 : t1] ∈ P1). (79) Proof. In this case (77) gives the 1-parameter family of rational solutions (78). Moreover the first equation of system (1) admits a constant solution q(z) ≡ 1. Substituting this into the second equation yelds the Riccati equation z(z−1)p′+(1− z)p2+ p = 0. Change of dependent variable p = −zf ′/f takes this into the linear equation (1 − z)f ′′ − f ′ = 0, whose nontrivial solutions are given by f = t0 log(1 − z) + t1 with (t0, t1) ∈ C2 − {(0, 0)}. Thus the Riccati equation has the 1-parameter family of single-valued solutions p = p(z; t) as in (79). ✷ Now we proceed to the proof of Theorem 1.3. From now on we fix the indices as (i, j, k) = (3, 1, 2) in accordance with the choice of indices in §8. First we treat the fixed point case. Lemma 9.5 The set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0. Proof. Case-by-case check based on the “injection-implies-surjection” principle described in §1. Example 9.6 (∅) We combine the results of Example 7.1, Lemmas 8.1 and 8.2. A key ob- servation is that Lemmas 8.1 and 8.2 give us as many meromorphic solutions around z = 0 as the cardinality of the set F̃ixj(θ) in (49). 
For example, if P̃ (bi, b4; bj , bk) ∈ F̃ixj(θ), then the existing and smoothness conditions for it (see Table 3) makes it possible to apply Lemma 8.2 to conclude that the meromorphic solution (66) exists corresponding to the fixed point P̃ (bi, b4; bj, bk). Thus the set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0. Example 9.7 (A1) We combine the results of Example 7.2, Lemmas 8.1, 8.2 and 9.1. First we notice that the two single-valued Riccati solutions in Lemma 9.1 correspond to the two Riccati fixed points F̃ix j(θ) = {p, q} in (50). On the other hand, for the same reason as in Example 9.6, formulas (65) and (67) in Lemmas 8.1 and 8.2 give us as many meromorphic solutions around z = 0 as the cardinality of smooth fixed points F̃ix j(θ) = {{P̃ (bi, b−14 ; bj , bk), P̃ (bj , b−1k ; bi, b4)}} in (50). Thus the set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0. Example 9.8 (A2) We combine the results of Example 7.3, Lemmas 8.2 and 9.2. As (51) shows, the set F̃ixj(θ) consists of the four points p0, p+, p− and P̃ (bj , b k ; bi, b4). On the other hand, we have the three single-valued Riccati solutions of Lemma 9.2 and one non-Riccati holomorphic solution (67). Clearly, the three Riccati solutions correspond to the points p0, p+ and p−, while the non-Riccati solution corresponds to the remaining point P̃ (bj , b k ; bi, b4). Thus any single-valued solution around z = 0 is a meromorphic solution. Example 9.9 (A ) We combine the results of Example 7.4, Lemmas 8.2 and 8.3. First we consider the W̃ (D 4 )-stratum of type (A 1 )i. The C-parameter family (68) of holomorphic solutions injects into the line ℓ̃+j ≃ C in (53), so that we have an injection C →֒ C. Since this injection is holomorphic, it must be a surjection. Thus the line ℓ̃+j is exhausted by the family (68). Moreover we have the two holomorphic solutions (66) and (67), which do not lie in the family (68). 
Thus they must correspond to the points P̃ (bi, b4; bj, bk) and P̃ (bi, b 4 ; bj , bk) in (53). So on the stratum of type (A⊕21 )i the set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0. Next we consider the W̃ (D 4 )-strata of types (A 1 )j and (A 1 )k. On these strata the equality F̃ixj(θ) = F̃ix j(θ) in (54) implies that any single-valued solution around z = 0 is a Riccati and hence meromorphic solution. Example 9.10 (A3) We combine the results of Example 7.5, Lemmas 8.2, 8.4 and 9.3. First we consider the W̃ (D 4 )-stratum of type (A 1 )i. The P 1-family (77) of single-valued Riccati solutions exactly corresponds to the exceptional curve e0 ≃ P1 in (55). So the C-family (70) of holomorphic solutions must inject into the line ℓ̃+j ≃ C in (55). Since this injection C →֒ C is holomorphic, it must be a surjection. Hence ℓ̃+j is exhausted by the family (70). Moreover there is the holomorphic solution (67), which must correspond to the point P̃ (bi, b 4 ; bj, bk) in (55). Thus on the stratum of type (A⊕21 )i the set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0. Next we consider the W̃ (D 4 )-strata of types (A 1 )j and (A 1 )k. On these strata the equality F̃ixj(θ) = F̃ix j(θ) in (56) implies that any single-valued solution around z = 0 is a Riccati and hence meromorphic solution. Example 9.11 (A ) We combine the results of Example 7.6 and Lemma 8.3. As (57) shows, F̃ixj(θ) has only one line component ℓ̃ j ≃ C. Hence the C-family (68) of holomorphic solu- tions must inject into this line, so that we have an inclusion C →֒ C. Since this injection is holomorphic, it must be a surjection. Thus ℓ̃+j is exhausted by the family (68). Example 9.12 (D4) We combine the results of Example 7.7, Lemmas 8.5 and 9.4. The P family (78) of rational Riccati solutions corresponds to the exceptional curve e0 in (58), while the P1-family (79) of single-valued Riccati solutions corresponds to the exceptional curve ej there. 
Hence the C-family (71) of holomorphic solutions, which is different from (78) and (79), must inject into the line ℓ̃+j ≃ C in (58). Since this injection C ↪ C is holomorphic, it must be a surjection. Thus ℓ̃+j is exhausted by the family (71). Example 9.13 (A1⊕4) We combine the results of Example 7.8 and Lemma 8.6. In view of F̃ixj(θ) = ℓ̃+j ∐ ℓ̃−j, the C-family (72) of holomorphic solutions injects into the line ℓ̃εj ≃ C for some sign ε ∈ {±1}. So we have an injection C ↪ C. Since this injection is holomorphic, it must be a surjection, so that ℓ̃εj is exhausted by the family (72). Then the other C-family (73) of holomorphic solutions injects into the remaining line ℓ̃−εj ≃ C. So we have another injection C ↪ C. Since this injection is holomorphic, it must be a surjection. Hence F̃ixj(θ) = ℓ̃+j ∐ ℓ̃−j is exhausted by the families (72) and (73). The proof of Lemma 9.5 is now complete. ✷ Finally we treat the periodic point case using the “injection-implies-surjection” principle. Lemma 9.14 For any n > 1 the set P̃erj(θ;n) is exhausted by algebraic n-branch solutions around z = 0. Proof. We combine Lemmas 6.6 and 8.7. First we consider the generic case where κ ∈ K − Wall, namely, where θ = rh(κ) is such that ∆(θ) ≠ 0. In this case there is no Riccati locus and hence P̃erj(θ;n) = P̃er j (θ;n), which is biholomorphic to the disjoint union of ϕ(n) copies of C× by Lemma 6.6. On the other hand, by Lemma 8.7, there are ϕ(n) mutually disjoint C×-parameter families of algebraic n-branch solutions around z = 0 as in (74). Number these families from 1 to ϕ(n). The first family injects into a (unique) connected component (≃ C×) of P̃erj(θ;n), which we call the first component, and we have an injection C× ↪ C×. Since this injection is holomorphic, it must be a surjection and hence the first component is exhausted by the first family. Consider the second family of solutions and the corresponding second component of P̃erj(θ;n).
Notice that the second component is different from the first one, because the first component is already occupied by the first family and so it cannot contain the second family. For the same reason as above, the second component is exhausted by the second family. Since the families and the components have the same cardinality ϕ(n), we can repeat this argument to conclude that P̃erj(θ;n) is exhausted by the ϕ(n) families of algebraic n-branch solutions. Next we consider the case where κ ∈ Wall, namely, where the Riccati part P̃er j(θ;n) may appear. Since the lemma is trivial for the Riccati part, we have only to consider the non-Riccati part P̃er j(θ;n). The argument proceeds just in the same manner as in the last paragraph, once we show that the family of solutions in (74) contains no Riccati solutions (see Remark 8.8). To see this, we consider the family S̃ → Θ of surfaces S̃(θ) parametrized by θ ∈ Θ and put j (n) = j (θ;n), E = E(θ), where E(θ) is the exceptional set in S̃(θ). (Precisely speaking, the parameter space Θ should be replaced by a finite covering of it to get a simultaneous minimal resolution.) Then P̃er j (n) and E are closed subsets of S̃ which are disjoint by Lemma 4.5. Now we look at the family of solutions in (74). It depends continuously on κ ∈ K. Take any point κ∗ ∈ Wall and let K − Wall ∋ κ → κ∗. For any κ ∈ K −Wall, the family at κ is contained in P̃er j(θ;n) with θ = rh(κ) and hence in P̃er j(n). Taking the limit κ → κ∗, we see that the family at κ∗ is contained in P̃er j (n), hence in P̃er ∗;n) with θ∗ = rh(κ∗). Since P̃er ∗;n) is disjoint from E(θ∗), the family at κ∗ contains no Riccati solutions. Therefore the proof is complete. ✷ Now the local statement of Theorem 1.3 around a fixed singular point, say z = 0, is an immediate consequence of Lemmas 9.5 and 9.14. At the same time all the finite branch solutions around z = 0 have been classified up to Bäcklund transformations. 
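The counting used in Lemmas 8.7 and 9.14 can be made concrete with a small sketch. The code below is only an illustration, not part of the proof: it lists the leading exponents m/n (0 < m < n, m coprime to n) that label the families in (74) and checks that their number equals Euler's ϕ(n). The function names `branch_exponents` and `euler_phi` are hypothetical helpers introduced here.

```python
from fractions import Fraction
from math import gcd


def branch_exponents(n):
    """Return the leading exponents m/n with 0 < m < n and gcd(m, n) = 1."""
    if n <= 1:
        raise ValueError("Lemma 8.7 assumes an integer n > 1")
    return [Fraction(m, n) for m in range(1, n) if gcd(m, n) == 1]


def euler_phi(n):
    """Euler's totient by trial division, used only as an independent cross-check."""
    result, rest, prime = n, n, 2
    while prime * prime <= rest:
        if rest % prime == 0:
            while rest % prime == 0:
                rest //= prime
            result -= result // prime
        prime += 1
    if rest > 1:
        result -= result // rest
    return result


if __name__ == "__main__":
    for n in (2, 3, 4, 6, 12):
        exponents = branch_exponents(n)
        # One C^x-parameter family of n-branch solutions per listed exponent.
        print(n, euler_phi(n), [str(e) for e in exponents])
```

For each n the number of listed exponents agrees with ϕ(n), which is the equality of cardinalities between families and connected components that the argument relies on.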
The global statement about algebraic solutions follows readily from the local statements around z = 0, 1, ∞, together with the analytic Painlevé property on Z = P1 − {0, 1,∞}. The proof of Theorem 1.3 is complete. References [1] F.V. Andreev and A.V. Kitaev, Transformations RS24(3) of the ranks ≤ 4 and algebraic solutions of the sixth Painlevé equation, Comm. Math. Phys. 228 (2002), 151–176. [2] P. Boalch, From Klein to Painlevé via Fourier, Laplace and Jimbo, Proc. London Math. Soc. (3) 90 (2005), 167–208. [3] P. Boalch, The fifty-two icosahedral solutions to Painlevé VI, J. Reine Angew. Math. 596 (2006), 183–214. [4] E. Brieskorn, Über die Auslösung gewisser Singularitäten von holomorphen Abbildungen, Math. Ann. 166 (1966), 76–102. [5] A.D. Bruno, Power asymptotics of solutions to an ordinary differential equation, Dokl. Math., 68 (2003), no. 2, 199-203. [6] A.D. Bruno, Power-logarithmic expansion of solutions to an ordinary differential equation, Dokl. Math., 68 (2003), no. 2, 221-226. [7] A.D. Bruno and I.V. Goryuchkina, Expansions of solutions of the sixth Painlevé equation, Dokl. Math., 69 (2004) no. 2, 268–272. [8] B. Dubrovin and M. Mazzocco, Monodromy of certain Painlevé-VI transcendents and reflection groups, Invent. Math. 141 (2000), no. 1, 55–147. [9] R. Garnier, Étude de l’intégrale générale de l’équation VI de M. Painlevé dans le voisinage de ses singularités transcendentes, Ann. Sci. École Norm. Sup. 34 (1917), 239–353. [10] R. Gérard, Une classe d’équations différentielles non lineaires à singularité régulière, Funk- cial. Ekvac. 29 (1986), no. 1, 55–76. [11] R. Gérard and Y. Sibuya, Etude de certains systèmes de Pfaff avec singularité, Lecture Notes in Math. 712, Springer, Berlin, 1979, 131–288. [12] D. Guzzetti, The elliptic representation of the general Painlevé VI equation, Comm. Pure Appl. Math. 55 (2002), 1280–1363. [13] N. Hitchin, Poncelet polygons and the Painlevé equations, Geometry and analysis, Bombay, 1992, Tata Inst. Fund. 
Res., Bombay (1995), 151–185. [14] N. Hitchin, A lecture on the octahedron, Bull. London Math. Soc. 35 (2003), no. 5, 577– [15] M. Inaba, K. Iwasaki and M.-H. Saito, Bäcklund transformations of the sixth Painlevé equation in terms of Riemann-Hilbert correspondence, Internat. Math. Res. Notices 2004:1 (2004), 1–30. [16] M. Inaba, K. Iwasaki and M.-H. Saito, Dynamics of the sixth Painlevé equation, to appear in: Théorie asymptotique et équations de Painlevé (Angers, juin 2004), M. Loday and E. Delabaere (Éd.), Séminaires et Congrès, Soc. Math. France. [17] M. Inaba, K. Iwasaki and M.-H. Saito, Moduli of stable parabolic connections, Riemann- Hilbert correspondence and geometry of Painlevé equation of type V I. Part I, Publ. Res. Inst. Math. Sci. 42 (2006), no. 4, 987–1089. [18] M. Inaba, K. Iwasaki and M.-H. Saito, Moduli of stable parabolic connections, Riemann- Hilbert correspondence and geometry of Painlevé equation of type V I. Part II, Adv. Stud. Pure Math. 45 (2006), 387–432. [19] K. Iwasaki, An area-preserving action of the modular group on cubic surfaces and the Painlevé VI equation, Comm. Math. Phys., 242 (2003), no. 1-2, 185–219. [20] K. Iwasaki and T. Uehara, An ergodic study of Painlevé VI, Math. Ann. (in press). Online First DOI: 10.1007/s00208-006-0077-8. arXiv: math.AG/0604582. [21] K. Iwasaki and T. Uehara, Chaos in the sixth Painlevé equation, RIMS Kôkyûroku Bessatsu B2 (2007), 73–88. [22] M. Jimbo, Monodromy problem and the boundary condition for some Painlevé equations, Publ. Res. Inst. Math. Sci., 18 (1982), no.3, 1137–1161. [23] K. Kaneko, Painlevé VI transcendents which are meromorphic at a fixed singularity, Proc. Japan Acad. 82, Ser. A (2006), no. 5, 71–76. [24] A.V. Kitaev, Grothendieck’s dessins d’enfants, their deformations, and algebraic solutions of the sixth Painlevé and Gauss hypergeometric equations, Algebra i Analiz 17 (2005), no. 1, 224–275. [25] M. Mazzocco, Rational solutions of the Painlevé VI equation, J. Phys. A: Math. Gen. 
34 (2001), 2281–2294.
[26] K. Okamoto, Sur les feuilletages associés aux équations du second ordre à points critiques fixes de P. Painlevé, Espaces des conditions initiales, Japan. J. Math. 5 (1979), 1–79.
[27] K. Okamoto, Study of the Painlevé equations I, sixth Painlevé equation PVI, Ann. Mat. Pura Appl. (4) 146 (1987), 337–381.
[28] M.-H. Saito, T. Takebe and H. Terajima, Deformation of Okamoto-Painlevé pairs and Painlevé equations, J. Algebraic Geom. 11 (2002), no. 2, 311–362.
[29] M.-H. Saito and H. Terajima, Nodal curves and Riccati solutions of Painlevé equations, J. Math. Kyoto Univ. 44 (2004), no. 3, 529–568.
[30] H. Sakai, Rational surfaces associated with affine root systems and geometry of the Painlevé equations, Comm. Math. Phys. 220 (2001), 165–229.
[31] S. Shimomura, A family of solutions of a nonlinear ordinary differential equation and its application to Painlevé equations (III), (V) and (VI), J. Math. Soc. Japan 39 (1987), no. 4, 649–662.
[32] K. Takano, Reduction for Painlevé equations at the fixed singular points of the first kind, Funkcial. Ekvac. 29 (1986), no. 1, 99–119.

Introduction
Phase Space
Riemann-Hilbert Correspondence
Dynamics on Cubic Surface
Bäcklund Transformations
Fixed Points and Periodic Points
Case-by-Case Study
Power Geometry
Injection Implies Surjection

ABSTRACT Every finite branch solution to the sixth Painlevé equation around a fixed singular point is an algebraic branch solution. In particular, a global solution is an algebraic solution if and only if it is finitely many-valued globally. The proof of this result relies on the algebraic geometry of Painlevé VI, the Riemann-Hilbert correspondence, geometry and dynamics on cubic surfaces, resolutions of Kleinian singularities, and power geometry of algebraic differential equations. In the course of the proof we are also able to classify all finite branch solutions up to Bäcklund transformations.
<|endoftext|><|startoftext|> September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv

Modern Physics Letters A
© World Scientific Publishing Company

An Inverse f(R) Gravitation for Cosmic Speed up, and Dark Energy Equivalent

Sohrab Rahvar
Department of Physics, Sharif University of Technology, P.O.Box 11365-9161, Tehran, Iran.∗

Yousef Sobouti
Institute for Advanced Studies in Basic Sciences, P.O.Box 45195-1159, Zanjan, Iran

To explain the cosmic speed up, brought to light by the recent SNIa and CMB observations, we propose the following: a) In a spacetime endowed with a FRW metric, we choose an empirical scale factor that best explains the observations. b) We assume a modified gravity, generated by an unspecified field lagrangian, f(R). c) We use the adopted empirical scale factor to work backwards and obtain f(R), hence the term ‘Inverse f(R)’. d) Next we consider classic GR and a conventional FRW universe that, in addition to its known baryonic content, possesses a hypothetical ‘Dark Energy’ component. We compare the two scenarios, and find the density, the pressure, and the equation of state of the Dark Energy required to make up for the differences between the conventional and the modified GR models.

Keywords: Cosmology; Dark Energy; Modified Gravity.

PACS Nos.: 95.36.+x, 98.80.Jk, 98.80.Es

∗rahvar@sharif.edu
http://arxiv.org/abs/0704.0680v2

As cosmological standard candles, supernovae type Ia (SNIa) appear dimmer than what one expects from a Cold Dark Matter (CDM) model of the universe [1,2,3]. This observation and other evidence from the Cosmic Microwave Background (CMB) measurements indicate that the universe is in an accelerating phase of its expansion [4,5,6]. A conventional CDM scenario does not explain this speed up.
Some authors have stipulated a dark energy component to make up for whatever dynamical effects the known energy-momentum content of the model leaves unaccounted for [7–21]. Others have entertained alternatives to Einstein’s gravitation [22–30]. Yet a third school has resorted to inhomogeneous FRW universes to explain the dilemma [31]. Since all these approaches attempt to answer the same question, all should be equivalent, and there should be a way to translate one language into the other.

Here, we are concerned with the ‘dark energy’ and ‘alternative gravitation’ scenarios. We propose to begin with a Friedmann-Robertson-Walker (FRW) universe, to choose its scale factor, a(t), in a way that best explains the available observations, and to work out the dynamics of the spacetime. Next, we write down the field equations for a modified f(R) gravitation and, knowing a(t), solve for f(R). Finally, we attribute whatever deviations from the conventional FRW results there are to a dark energy field, and obtain its density, pressure, equation of state, etc.

The choice of the scale factor

In a FRW metric, $ds^2 = -dt^2 + a(t)^2 dx^2$, single-term scale factors of the form $a \propto t^{\beta}$ lead to constant deceleration parameters, $q = -\ddot{a}a/\dot{a}^2 = (1-\beta)/\beta$, and do not serve the purpose. We propose the following two-term ansatz

$$a(t) = \frac{1}{1+p}\left(\frac{t}{t_0}\right)^{2/3}\left[1 + p\left(\frac{t}{t_0}\right)^{2\alpha/3}\right], \qquad (1)$$

where $t_0$ is the age of the universe, and $\alpha$ and $p$ are the free parameters of the model, to be adjusted to ensure compatibility of the emerging results with observations. The factor $(1+p)^{-1}$ is introduced to have $a(t_0) = 1$. By letting either $\alpha$ or $p$ tend to zero, one recovers the standard CDM universe. Hereafter, for economy in writing, we will use the time parameter $\tau = p(t/t_0)^{2\alpha/3}$ instead of the conventional time $t$. From Eq.
(1) one finds

$$q = \frac{1}{2}\,[1-(1+\alpha)(2\alpha-1)\tau]\,[1+\tau]\,[1+(1+\alpha)\tau]^{-2}, \qquad (2)$$

$$\mathcal{H} = t_0 H = \frac{2}{3}\left(\frac{\tau}{p}\right)^{-3/2\alpha}[1+(1+\alpha)\tau]\,[1+\tau]^{-1}, \qquad (3)$$

$$\mathcal{R} = t_0^{2}R = 6\mathcal{H}^{2}(1-q) = \frac{4}{3}\left(\frac{\tau}{p}\right)^{-3/\alpha}[1+(2+5\alpha+2\alpha^{2})\tau+(1+5\alpha+4\alpha^{2})\tau^{2}]\,[1+\tau]^{-2}. \qquad (4)$$

For $\alpha > 1/2$, q can become negative and remain nonsingular throughout. Transition from a decelerated phase of expansion to an accelerated one takes place at $\tau_{\rm trans} = [(\alpha+1)(2\alpha-1)]^{-1}$, or $t_{\rm trans} = [(\alpha+1)(2\alpha-1)p]^{-3/2\alpha}\,t_0 < t_0$. For $-1 < \alpha < 1/2$, q is always positive and nonsingular. For $\alpha < -1$, q can become negative but also singular. The last two possibilities are discarded. For all values of $\alpha$ and $p$,

$$\lim_{t\to 0} q(t) = \frac{1}{2}, \qquad \lim_{t\to\infty} q(t) = \frac{1-2\alpha}{2(1+\alpha)}. \qquad (5)$$

In Fig. (1) we have plotted q(t) versus the redshift, $z = a(t_0)/a(t) - 1$, for several values of $\alpha$ and $p = 1/3$.

Fig. 1. Plot of q(t) versus the redshift, $z = a(t_0)/a(t) - 1$, for $\alpha$ = 0, 1, 2, 3, 4 and $p = 1/3$. The case $\alpha = 0$ gives the classic value, $q = 1/2$. As $\alpha$ increases from 1 to 4, the transition to the accelerated phase of expansion moves from later to earlier epochs, from smaller z’s to larger ones.

As to H and R, both remain positive for all times. Both tend to ∞ as τ → 0 and decrease monotonically to 0 as τ → ∞. They exhibit a normal behavior in the neighborhood of $\tau_{\rm trans} = [(1+\alpha)(2\alpha-1)]^{-1}$. Equation (3), written for the present epoch, reveals a constraint on $\alpha$ and $p$ that should be observed in the final design of the model. Thus,

$$\frac{1+(1+\alpha)p}{1+p} = \frac{3}{2}\,H_0 t_0. \qquad (6)$$

Fig. 2. The distance modulus of the SNIa Gold sample versus redshift (black circles), and the plot of Eq. (7) with the scale factor of Eq. (1) (solid line). Parameters of the model are α = 2 and p = 1/3.
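The kinematic relations above lend themselves to a short numerical sketch. The block below assumes the scale factor has the form $a(t) = (1+p)^{-1}(t/t_0)^{2/3}[1 + p(t/t_0)^{2\alpha/3}]$, from which the expressions for $q$ and $t_0 H$ in terms of $\tau = p(t/t_0)^{2\alpha/3}$ follow by differentiation; the function names are hypothetical, and the numbers it produces depend on that assumed reading and need not coincide with rounded values quoted in the text.

```python
def scale_factor(tau, alpha, p):
    """a(t) for the assumed two-term ansatz, written in terms of tau = p (t/t0)^(2 alpha/3)."""
    t_over_t0 = (tau / p) ** (3.0 / (2.0 * alpha))
    return t_over_t0 ** (2.0 / 3.0) * (1.0 + tau) / (1.0 + p)


def hubble_t0(tau, alpha, p):
    """Dimensionless Hubble parameter t0*H as a function of tau."""
    t0_over_t = (tau / p) ** (-3.0 / (2.0 * alpha))
    return (2.0 / 3.0) * t0_over_t * (1.0 + (1.0 + alpha) * tau) / (1.0 + tau)


def deceleration(tau, alpha):
    """Deceleration parameter q = -a''a/a'^2 as a function of tau."""
    numerator = (1.0 - (1.0 + alpha) * (2.0 * alpha - 1.0) * tau) * (1.0 + tau)
    return 0.5 * numerator / (1.0 + (1.0 + alpha) * tau) ** 2


def transition(alpha, p):
    """Return (tau, t/t0, z) at which q changes sign; meaningful for alpha > 1/2."""
    tau_tr = 1.0 / ((alpha + 1.0) * (2.0 * alpha - 1.0))
    t_tr = (tau_tr / p) ** (3.0 / (2.0 * alpha))
    z_tr = 1.0 / scale_factor(tau_tr, alpha, p) - 1.0
    return tau_tr, t_tr, z_tr


if __name__ == "__main__":
    alpha, p = 2.0, 1.0 / 3.0
    print("H0*t0 =", hubble_t0(p, alpha, p))      # present epoch corresponds to tau = p
    print("q today =", deceleration(p, alpha))
    print("transition (tau, t/t0, z) =", transition(alpha, p))
```

Evaluating `hubble_t0` at τ = p gives the present-epoch constraint that ties α and p to $H_0 t_0$, so the same helper can be used to scan parameter pairs before fitting the supernova data.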
Empirical values of α & p

The distance modulus (corrected for the reddening) and the (dimensionless) luminosity distance, $D_L(z;\alpha,p)$, of supernovae are related as

$$\mu = m - M = 5\log D_L(z;\alpha,p) + 25, \qquad D_L(z;\alpha,p) = (1+z)\int_0^{z} dz'\,H(z';\alpha,p)^{-1}. \qquad (7)$$

In Fig. (2), the observed distance moduli of 157 SNIa of the Gold sample are plotted versus redshift. The solid curve is the plot of Eq. (7), in which, in compliance with the constraint of Eq. (6), we have chosen

$$\alpha = 2, \qquad p = 1/3. \qquad (8)$$

The fit to the data points is adequate for our purpose, though the parameters can be refined to optimize the fit. These numbers give $t_{\rm trans} = 0.43\,t_0$ and $z_{\rm trans} = 1.14$. On adopting $H_0 \approx 70$ km s$^{-1}$ Mpc$^{-1}$, one obtains an age of $t_0 \approx 12.4$ Gyr for the universe.

An inverse f(R) way out

We begin with a modified field equation generated by an, as yet, unspecified field lagrangian, f(R),

Rµν − Rgµν = (f −RF )gµν (∇µ∇ν − gµν∇λ∇λ)F − T (M)µν , (9)

where $F(R) = df(R)/dR$ and $\kappa = 8\pi G$. For a universe of FRW type, filled with a perfect fluid of density $\rho_m$ and pressure $p_m$, Eq. (9) and the equation of continuity reduce to

$$3H\dot{F} + 3H^{2}F - \tfrac{1}{2}(RF - f) - \kappa\rho_m = 0, \qquad (10)$$

$$\ddot{F} - H\dot{F} + 2\dot{H}F + \kappa(\rho_m + p_m) = 0, \qquad (11)$$

$$\dot{\rho}_m + 3H(\rho_m + p_m) = 0. \qquad (12)$$

We further neglect the pressure and integrate Eq. (12) to obtain $\rho_m(t) = \rho_0\,a(t)^{-3}$. Next we substitute for $\rho_m$ in Eq. (11), assume $\alpha = 2$, change the time variable to $\tau = p(t/t_0)^{4/3}$, and find

(1 + τ)3τ2F ′′ − (1 + τ)2(1 + 5τ)τF ′ (1 + τ)(1 + τ + 3τ2)F + (13)

where the prime now stands for d/dτ, and we have, arbitrarily, put the dimensionless constant $(3/4)^{3}(1+p)^{3}t_0^{2}\kappa\rho_0$, which appears in the course of the mathematical manipulations, equal to one. We will shortly discuss the numerical solution of Eq. (13). Some general remarks on its asymptotic behavior, however, are instructive. As τ → 0 we have F (τ) = τ2 + . . . + c1 τ −0.44P1(τ) + c2 τ 1.7P2(τ), τ → 0.
where P1 and P2 are calculable polynomials in τ that begin with the term 1; c1 and c2 are constants of integration to be obtained from boundary conditions. The exponents, −0.44 and 1.7, are approximate solutions of the indicial equation, s2− 5 = 0. As τ → 0, F diverges to ∞ or converges to 0 on account of one or the other term. This feature makes the solutions sensitive to CDM-type boundary conditions of the form F(τinitial) = 1 and F′(τinitial) = 0. Presently we have no basis, observational or otherwise, to make an intelligent guess as to what the appropriate boundary conditions are. For the sake of argument, however, we have adopted c1 = c2 = 0, and kept only the proper solution of Eq. (13). With F(τ) known, Eq. (10) becomes an algebraic equation to calculate f(τ). The numerically calculated solutions of F(τ), f(τ) and $\mathcal{R} = t_0^2 R(\tau)$ are plotted in Fig. (3). Elimination of τ in favor of R provides F(R) and f(R).

It is instructive to examine the asymptotic behavior of F(R) and f(R) analytically. In the limit of small and large τ’s, corresponding to large and small R’s, one finds

F (R) = 6pR−2/3 + · · · R → ∞ (15) R → 0. (16)

Fig. 3. F(τ), proper solution of Eq. (13) (×100), dot-dashed line; f(τ), dashed line; and R(τ), solid line; α = 2, p = 1/3.

Note that F(R) is a dimensionless scalar, as it should be. With F(R) = df/dR known, it is a matter of simple integration to obtain f(R). Thus,

f(R) = 6pR−2/3 + · · · , R → ∞, (17) , R → 0. (18)

At early epochs (large R’s), the spacetime approaches the conventional FRW universe with the classical GR, while at later times we have a universe with positive acceleration. Another point about this action is that for R → 0 on solar system scales, f(0) = 0 and f′(0) = 0. In this case we recover the standard GR equation and f(R) evades the solar system test.
Recently this type of model has been studied by introducing an action of the form f(R) = R + f1(R), where for R = 0 in the solar system, f(0) = 0, and for larger R’s, on cosmological scales and inside the large scale structures, the action reduces to f(R) = R − Λ 32,33. In the remaining range of R, integration is done numerically and the results are plotted in Fig. (4).

Fig. 4. Numerical plot of f(R) versus R. τ is eliminated between f(τ) and R(τ); α = 2, p = 1/3.

Dark Energy equivalent

Instead of the modified gravitation considered above, let us assume a classic FRW universe. That is, let f(R) = R and F = 1. Let this universe, however, have a ’Dark Energy’ component, in addition to its conventional baryonic content. The counterparts of Eqs. (10) and (11) will be

3H2 − κ (ρde + ρm) = 0, (19)
2Ḣ + κ (ρde + ρm) + κ (pde + pm) = 0. (20)

Subtracting Eq. (10) from (19), and Eq. (11) from (20), gives

κρde = 3H2(1 − F) − 3HḞ + (RF − f), (21)
κ(ρde + pde) = F̈ − HḞ − 2Ḣ(1 − F), or
κpde = F̈ + 2HḞ − H2(1 − 2q)(1 − F) − (RF − f). (22)

The equation of state for the dark energy is obtained by eliminating τ, implicit in F and H, between Eqs. (21) and (22). This is done numerically, and w = pde/ρde as a function of the redshift is plotted in Fig. (5).

Fig. 5. Equation of State of the Dark Energy: Numerical plot of w = pde/ρde versus the redshift

Concluding remarks

The algorithm of Fig. 6 summarizes the path we have followed in this letter. We have resorted to the SNIa observations to design an empirical FRW metric that allows the model universe to transit from a phase of decelerated expansion at early epochs to an accelerated one at later times.
The spacetime is almost a CDM model and the gravity is almost the classic GR at very early times, but evolves away in the course of time. Next we have maintained that the so-designed spacetime is deducible from a modified, non-Hilbert-Einstein field lagrangian, f(R). Knowing the metric, we have solved the modified field equations retroactively for the sought-after f(R). Finally, we have compared our results with those of a conventional FRW model and have attributed the differences between the two to a dark energy component. Eventually, we have extracted the density, the pressure, and the equation of state of this stipulated energy.

We note that our choice of the scale factor and the adjustment of its free parameters, to comply with the available cosmological observations, are by no means unique. The goal is simply to demonstrate that the use of the observations at the outset, to deduce the rudiments of what seems reasonable, facilitates access to possible formal underlying theories, the action-based f(R) gravity in our case. With the availability of more extensive and more accurate data in the future, one may come back and revise the model. See also 34 for a similar emphasis. One of us (YS)35 has followed the same path to propose a modified gravitation for galactic environments and to explain the flat rotation curves and the Tully-Fisher relation in spiral galaxies without recourse to hypothetical dark matter.

Fig. 6. Algorithm of inverse f(R): Choose a scale factor; solve the field equations first for F = df/dR and next for f as functions of the time parameter τ; eliminate τ between R(τ) and f(τ) to arrive at f(R).

References

1. A. G. Riess et al., Astron. J. 116, 1009 (1998).
2. S.
Perlmutter et al., Astrophys. J. 517, 565 (1999).
3. A. G. Riess et al., Astrophys. J. 607, 665 (2004).
4. C. L. Bennett et al., Astrophys. J. Suppl. 148, 1 (2003).
5. H. V. Peiris et al., Astrophys. J. Suppl. 148, 213 (2003).
6. D. N. Spergel, L. Verde, H. V. Peiris et al., Astrophys. J. Suppl. 148, 175 (2003).
7. E. J. Copeland, M. Sami, S. Tsujikawa, Int. J. Mod. Phys. D 15, 1753 (2006).
8. C. Wetterich, Nucl. Phys. B 302, 668 (1988).
9. P. J. E. Peebles and B. Ratra, Astrophys. J. 325, L17 (1988).
10. B. Ratra and P. J. E. Peebles, Phys. Rev. D 37, 3406 (1988).
11. J. A. Frieman, C. T. Hill, A. Stebbins, and I. Waga, Phys. Rev. Lett. 75, 2077 (1995).
12. M. S. Turner and M. White, Phys. Rev. D 56, R4439 (1997).
13. R. R. Caldwell, R. Dave, and P. J. Steinhardt, Phys. Rev. Lett. 80, 1582 (1998).
14. V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 9, 373 (2000).
15. A. R. Liddle and R. J. Scherrer, Phys. Rev. D 59, 023509 (1999).
16. I. Zlatev, L. Wang, and P. J. Steinhardt, Phys. Rev. Lett. 82, 896 (1999).
17. P. J. Steinhardt, L. Wang, and I. Zlatev, Phys. Rev. D 59, 123504 (1999).
18. D. F. Torres, Phys. Rev. D 66, 043522 (2002).
19. S. Arbabi Bidgoli, M. S. Movahed, and S. Rahvar, Int. J. Mod. Phys. D 15, 1455 (2006).
20. M. S. Movahed and S. Rahvar, Phys. Rev. D 73, 083518 (2006).
21. S. Rahvar and M. S. Movahed, Phys. Rev. D 75, 023512 (2007).
22. T. Clifton and J. Barrow, Phys. Rev. D 72, 103005 (2005).
23. S. Nojiri and S. D. Odintsov, Phys. Rev. D 68, 123512 (2003).
24. S. Nojiri and S. D. Odintsov, Phys. Lett. B 562, 147 (2003).
25. C. Deffayet, G. R. Dvali, and G. Gabadadze, Phys. Rev. D 65, 044023 (2002).
26. K. Freese and M. Lewis, Phys. Lett. B 540, 1 (2002).
27. M. Ahmed, S. Dodelson, P. B. Greene, and R. Sorkin, Phys. Rev. D 69, 103523 (2004).
28. G. R. Dvali, G. Gabadadze, and M. Porrati, Phys. Lett. B 484, 112 (2000).
29. S. Baghram, M. Farhang, and S. Rahvar, Phys. Rev.
D 75, 044024 (2007).
30. M. S. Movahed, S. Baghram and S. Rahvar, Phys. Rev. D 76, 044008 (2007).
31. M. N. Celerier, New Advances in Physics 1, 29 (2007).
32. A. A. Starobinsky, JETP Lett. 86, 157 (2007).
33. W. Hu, I. Sawicki, Phys. Rev. D 76, 064004 (2007).
34. A. Shafieloo, U. Alam, V. Sahni and A. A. Starobinsky, MNRAS 366, 1081 (2006).
35. Y. Sobouti, A&A 464, 921 (2007).

ABSTRACT

To explain the cosmic speed up, brought to light by the recent SNIa and CMB observations, we propose the following: a) In a spacetime endowed with a FRW metric, we choose an empirical scale factor that best explains the observations. b) We assume a modified gravity, generated by an unspecified field lagrangian, $f(R)$. c) We use the adopted empirical scale factor to work back retroactively to obtain $f(R)$, hence the term `Inverse $f(R)$'. d) Next we consider the classic GR and a conventional FRW universe that, in addition to its known baryonic content, possesses a hypothetical `Dark Energy' component. We compare the two scenarios, and find the density, the pressure, and the equation of state of the Dark Energy required to make up for the differences between the conventional and the modified GR models.

<|endoftext|><|startoftext|>

Introduction II Experiment III Imaging local dynamics in a drying paint film IV Time resolved correlation V Conclusions Acknowledgments References

ABSTRACT

We report on a new type of experiment that enables us to monitor spatially and temporally heterogeneous dynamic properties in complex fluids. Our approach is based on the analysis of near-field speckles produced by light diffusely reflected from the superficial volume of a strongly scattering medium. By periodic modulation of an incident speckle beam we obtain pixel-wise ensemble averages of the structure function coefficient, a measure of the dynamic activity. To illustrate the application of our approach we follow the different stages in the drying process of a colloidal thin film.
We show that we can access ensemble averaged dynamic properties on length scales as small as ten micrometers over the full field of view. <|endoftext|><|startoftext|>

1. Introduction

Along the Asymptotic Giant Branch (AGB) phase, many stars exhibit maser amplification in different molecular lines. In oxygen-rich stars, [O]/[C]>1, O-bearing compounds are mainly formed and maser emission is present in SiO, H2O and OH. The study of these different molecules provides information on the overall envelope, from the inner layers dominated by the stellar pulsation (SiO masers) to the outermost regions where the circumstellar material is expanding at constant velocity (OH masers). Very long baseline interferometry is a unique technique to study the compact and very bright SiO emission and, therefore, it is particularly helpful in understanding the different and complex processes occurring in these inner regions of the envelope. On the other hand, the current pumping models, either collisional or radiative, do not reproduce some observed characteristics of the SiO masers, for example their relative location in the envelope. To test these models and better constrain the physical parameters of these inner shells, it is very useful to perform simultaneous observations of several maser transitions. For this reason, we have carried out multi-line and multi-epoch observations in a sample of AGB stars. We present in this paper the latest results for the Mira-type variable R Leo.

Send offprint requests to: R. Soria-Ruiz, e-mail: soria@jive.nl

2. VLBA observations

The observations were made with the NRAO1 Very Long Baseline Array (VLBA) on 2002 December 7. Nearly simultaneous observations of different 43 GHz and 86 GHz 28SiO and 29SiO maser transitions were performed in the variable star R Leo. The 86 GHz lines were observed in between the two 43 GHz scans and, therefore, we can assume for our purposes that the observations were simultaneous.
The data correlation was done at the VLBA correlator located in Socorro (New Mexico). Left and right circular polarizations (LCP & RCP) were measured for the 28SiO v=1 and v=2 J=1–0 lines, whereas only the LCP was observed in the other transitions. Since no significant difference was found between the maps, less than 5%, the final image is the average of both polarizations. Standard procedures for spectral line VLBI data reduction were followed in the calibration and production of the maps.

The amplitude calibration was done using the system temperatures and antenna gain corrections for the 86 GHz and 29SiO data, and the template method for the other 43 GHz data. The phase errors were removed in a two-step process: first, the single-band delay corrections were derived from the continuum calibrators, OJ287 and 3C273. Second, the fringe-rates were estimated by selecting a bright and simple-structured channel; the corrections found were subsequently applied to the maser source. The maps were produced using the CLEAN deconvolution algorithm.

1 The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.

http://arxiv.org/abs/0704.0682v1

R. Soria-Ruiz et al.: Mapping the circumstellar SiO maser emission in R Leo

Table 1. Observed maser transitions and results of the fits.

Species  Transition  Restoring beam (mas2)  R̄ (mas)   ΔR (mas)  Center (Xc,Yc)
28SiO    v=1 J=1–0   0.78×0.50              29.24      6.42      (26.9,–21.3)
28SiO    v=1 J=2–1   0.50×0.50              33.84      4.20      (–35.2,–4.7)
28SiO    v=2 J=1–0   0.50×0.50              25.92      6.88      (25.5,–16.3)
29SiO    v=0 J=1–0   0.78×0.22              one spot   -         -
29SiO    v=0 J=2–1   non-det.               -          -         -

3. Results and Data Fits

The results are presented as follows (Figs.
1–2): for each observed line, we show the integrated emission map in Jy beam−1 km s−1 units (center panel), the spectrum of the cross-correlated emission for the different maser components (numbered panels), the total power spectrum (AC) of one of the VLBA antennas and of the emission in the map (XC) (upper-left panel), and the ratio of these two quantities (upper-right panel). We have also estimated the size of the total masing regions by fitting our data to rings. Only those components with SNR≥6 have been included in the fits. The results derived from the calculations are summarized in Table 1: characteristic ring radius (R̄), ring width (ΔR) and center of the ring (Xc,Yc) (see details on the fitting process in Soria-Ruiz et al., 2005). In particular, the angular sizes derived for the 28SiO v=1 and v=2 J=1–0 regions (Table 1) are compatible with previous observations performed in R Leo by Cotton et al. (2004). Among the six transitions observed, only the 29SiO v=0 J=2–1 has not been detected. A more detailed description of the maps is given in the subsequent section.

4. Relative spatial distribution and pumping mechanisms

Our maps show that the spatial distribution of the v=1 and v=2 maser spots is similar, although not all the components appear in both transitions (Fig. 1). Concerning the relative location of the 43 GHz 28SiO maser layers, the v=2 emission is produced in a region of the envelope closer to the star, assuming that the centroids of all the emissions are coincident. This is also consistent with previously reported results in other oxygen-rich envelopes (see e.g. Desmurs et al., 2000; Cotton et al., 2004; Soria-Ruiz et al., 2004, 2005). In contrast to the 43 GHz regions, this first map of the 28SiO v=1 J=2–1 emission in R Leo reveals that the components of this maser line are situated in a significantly outer region of the envelope, with a very different spot distribution.
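The shell ordering described above can be read off the ring fits of Table 1. The following Python sketch is only an illustration: it assumes the inner and outer ring radii are R̄ ∓ ΔR/2 (our reading of the ring definition in the caption of Fig. 1), and the variable and function names are ours.

```python
# Ring fits from Table 1 (28SiO lines only): (mean radius, ring width) in mas.
RING_FITS = {
    "28SiO v=1 J=1-0": (29.24, 6.42),
    "28SiO v=1 J=2-1": (33.84, 4.20),
    "28SiO v=2 J=1-0": (25.92, 6.88),
}


def ring_edges(mean_radius, width):
    """Return (inner, outer) radius, assuming edges at mean -/+ width / 2."""
    half = 0.5 * width
    return mean_radius - half, mean_radius + half


def order_by_radius(fits):
    """Transition names sorted from the innermost to the outermost mean radius."""
    return sorted(fits, key=lambda name: fits[name][0])


if __name__ == "__main__":
    for name in order_by_radius(RING_FITS):
        r_in, r_out = ring_edges(*RING_FITS[name])
        print(f"{name}: R_in = {r_in:.2f} mas, R_out = {r_out:.2f} mas")
```

Sorted by mean radius, the v=2 J=1–0 ring comes first and the v=1 J=2–1 ring last. The rings were fitted with different centers, so this comparison inherits the assumption made in the text that the centroids of the emissions coincide.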
Since the J=2–1 emission has been imaged in only a very few sources, this result in R Leo is particularly important to test the proposed SiO maser mechanisms. Finally, the 29SiO v=0 J=1–0 map consists of a single maser spot, which makes it difficult to derive any spatial information. The total power and recovered emission are shown in Figure 2.

Current pumping models, either radiative (Bujarrabal, 1994a,b) or collisional (Humphreys et al., 2002), predict that the different rotational maser lines within the same vibrational state are produced under similar conditions and are therefore expected to be located in the same region of the envelope. As previously mentioned, we find a contradiction between these theoretical predictions and our observational results. This discrepancy has also been observed in other oxygen-rich stars: IRC+10011 (Soria-Ruiz et al., 2005) and TX Cam (Soria-Ruiz et al., 2006). Further calculations of the excitation of the SiO molecule in AGB stars have shown that the conditions under which the different maser transitions occur change drastically when the line overlap between infrared lines of H2O and 28SiO is introduced in the pumping models (Bujarrabal et al., 1996; Soria-Ruiz et al., 2004); such a mechanism could explain the lack of coincidence between the spots of different J-transitions within a vibrational state.

Nevertheless, although these new maps support the relevance of line overlaps in the SiO maser pumping in O-rich shells, we think that similar studies should be performed in a larger number of evolved stars. In particular, it would be necessary to have data on all types of long-period variable stars, namely, Mira-type, semiregular and irregular variables, as well as supergiant stars.

Acknowledgements. This work has been financially supported by the Spanish DGI (MCYT) under projects AYA2000-0927 and AYA2003-7584. All plots have been made using the GILDAS software package (http://www.iram.fr/IRAMFR/GILDAS).
References

Bujarrabal, V. 1994a, A&A, 285, 953
Bujarrabal, V. 1994b, A&A, 285, 971
Bujarrabal, V., Alcolea, J., Sánchez Contreras, C., & Colomer, F. 1996, A&A, 314, 883
Bujarrabal, V., Gómez-González, J., & Planesas, P. 1989, A&A, 219, 256
Cotton, W. D., Mennesson, B., Diamond, P. J., et al. 2004, A&A, 414, 275
Desmurs, J.-F., Bujarrabal, V., Colomer, F., & Alcolea, J. 2000, A&A, 360, 189
Humphreys, E. M. L., Gray, M. D., Yates, J. A., et al. 2002, A&A, 386, 256
Soria-Ruiz, R., Alcolea, J., Colomer, F., et al. 2004, A&A, 426,
Soria-Ruiz, R., Colomer, F., Alcolea, J., et al. 2005, A&A, 432,
Soria-Ruiz, R., Colomer, F., Alcolea, J., et al. 2006, Proceedings of the 8th EVN Symposium

Fig. 1. The 28SiO v=1 (upper panel) and v=2 (lower panel) J=1–0 maser emission in R Leo. Each figure shows the integrated intensity map in Jy beam−1 km s−1 units, the spectra of the individual maser components, the total power spectrum and the emission in the map, and their ratio. For some maser components, the intensity has been divided by a factor of 2 or 3 to ease the comparison with the other spectra. The vertical dashed lines indicate the systemic velocity of the source, VLSR = –0.5 km s−1 (Bujarrabal et al., 1989). Circles represent the fits for the masing regions (dashed: mean radius R̄; continuous: Rout and Rin, defined as R̄ ± ½ΔR) (see Section 3). The peak intensity is 28.45 Jy beam−1 km s−1 (v=1) and 53.5 Jy beam−1 km s−1 (v=2), and the contour shown is equivalent to the 5σ level, with σ = 0.4 Jy beam−1 km s−1 (v=1) and σ = 0.7 Jy beam−1 km s−1 (v=2).

Fig. 2. Same as Figure 1 for the 28SiO v=1 J=2–1 (upper panel) and 29SiO v=0 J=1–0 (lower panel) maser emission in R Leo.
The peak intensity is 6.03 Jy beam−1 km s−1 (28SiO) and 0.26 Jy beam−1 km s−1 (29SiO), and the contour shown is equivalent to the 5σ level, with σ = 0.08 Jy beam−1 km s−1 (28SiO) and σ = 0.01 Jy beam−1 km s−1 (29SiO).

ABSTRACT

The study of the innermost circumstellar layers around AGB stars is crucial to understand how these envelopes are formed and evolve. The SiO maser emission occurs at a few stellar radii from the central star, providing direct information on the stellar pulsation and on the chemical and physical properties of these regions. Our data also shed light on several aspects of the SiO maser pumping theory that are not well understood yet. We aim to determine the relative spatial distribution of the 43 GHz and 86 GHz SiO maser lines in the oxygen-rich evolved star R Leo. We have imaged with milliarcsecond resolution, by means of Very Long Baseline Interferometry, the 43 GHz (28SiO v=1, 2 J=1-0 and 29SiO v=0 J=1-0) and 86 GHz (28SiO v=1 J=2-1 and 29SiO v=0 J=2-1) masing regions. We confirm previous results obtained in other oxygen-rich envelopes. In particular, when comparing the 43 GHz emitting regions, the 28SiO v=2 transition is produced in an inner layer, closer to the central star. On the other hand, the 86 GHz line arises in a clearly farther shell. We have also mapped for the first time the 29SiO v=0 J=1-0 emission in R Leo. The already reported discrepancy between the observed distributions of the different maser lines and the theoretical predictions is also found in R Leo.

<|endoftext|><|startoftext|>

Partially disordered state near ferromagnetic transition in MnSi

S. V. Maleyev∗ and S. V.
Grigoriev
Petersburg Nuclear Physics Institute, Gatchina, Leningrad District 188300, Russia
(Dated: November 4, 2018)

The polarized neutron scattering in helimagnetic MnSi at low T reveals the existence of a partially disordered chiral state at ambient pressure in a magnetic field applied along the 〈111〉 axis, below the first order transition to the non-chiral ferromagnetic state. This unexpected phenomenon is explained by the analysis of the spin-wave spectrum. We demonstrate that the square of the spin-wave gap becomes negative under a magnetic field applied along 〈111〉 and 〈110〉 but not along the 〈100〉 direction. It is a result of competition between the spin-wave interaction and cubic anisotropy. This negative sign means an instability of the spin-wave spectrum for the helix and leads to a destruction of the helical order, giving rise to the partially disordered state below the first order ferromagnetic transition.

PACS numbers: 75.25.+z, 61.12.Ex

Non-centrosymmetric cubic helimagnets such as MnSi, FeGe and FeCoSi have been the subject of intensive experimental and theoretical studies for the last several decades. Their single-handed helical structure was explained by Dzyaloshinskii1. The full set of interactions responsible for the observed helical structure (Bak-Jensen model) was established later in2,3, in agreement with existing experimental data (see for example4 and references therein).

The renaissance in this field began with the discovery of the quantum phase transition to a disordered (partially ordered) state in MnSi at high pressure5,6. The following properties of this state attract the main attention: i) non-Fermi-liquid conductivity, ii) a spherical neutron scattering surface with weak maxima along the 〈110〉 axes7,8, whereas at ambient pressure Bragg reflections are observed along 〈111〉4. These features and the structure of the partially ordered state were discussed in several theoretical papers (see9,10,11,12 and references therein).
It should be noted also that a spherical scattering surface with maxima along 〈111〉 was observed at ambient pressure just above the critical temperature Tc ≃ 29 K. This experiment was explained using the Bak-Jensen model13.

These studies overshadowed an important problem: the evolution of the helix structure in an external magnetic field H. In particular, simple phenomenological14 and microscopic15 theories predict a smooth second order transition from the conical to the ferromagnetic state. The spin component of the cone parallel to the applied field is proportional to the magnetization and increases as (H/HC) up to its saturated value. This prediction is in agreement with experiment6. The perpendicular, rotating spin components fade away with the field and, according to the plain theory, the helical Bragg reflections must decrease as $(H_C^2 - H^2)/H_C^2$, where HC is the critical field for the ferromagnetic transition. This, however, contradicts the experimental facts if H ‖ [111] (see16 and Fig.1). The experiment shows that the transition is of the first order and that the Bragg intensity does not follow the law given above.

In this paper we demonstrate that, although from a general expression for the ground state energy one can expect a second order transition at the critical field HC, this is true for H ‖ [100] only; the situation changes if H ‖ [111] or H ‖ [110]. In the latter cases the spin-wave spectrum is unstable in the field range H1 < H < HC due to the cubic anisotropy, as the square of the spin-wave gap becomes negative, and the helical long-range order decays in this field interval. The ferromagnetic state occurs above HC. Hence we have a region below HC where the partially disordered state coexists with an almost saturated magnetization. This is the reason why this state has not been noticed in the earlier macroscopic measurements5,6.

Let us consider the conical helix with the lattice spin S_R = S^ζ_R ζ̂_R + S^η_R η̂_R + S^ξ_R ξ̂_R, where ζ̂R = ĉ sinα + (Ae ik·R + c.h.)
cosα; η̂R = i(Ae ik·R − c.h.); ξ̂R = ĉ cosα− (Aeik·R + c.h.) sinα, whereA = (â−ib̂)/2 and the unit vectors ζ̂, η̂ and ξ̂ form the right-handed frame. If α = 0 we have a plain helix15. The spin operators are given by the well known expressions: S = S−(a+a)R, SηR = −i(S/2)1/2[aR−a −(a+a2)R/(2S)] and S = (S/2)1/2[aR + a − (a+a2)R/(2S)]. http://arxiv.org/abs/0704.0683v1 Similar to in15, we use the following Hamiltonian {−JqSq · S−q + 2iDq[Sq × S−q] l=x,y,z Sq,lS−q,lq l }+K l=x,y,z Sq,lSp,lSh,lSf ,l +N1/2H · S0, where the first term is the ferromagnetic exchange interaction, the second term is the Dzyaloshinskii interaction (DI) responsible for the helix structure1. The following terms are the anisotropic exchange, the cubic anisotropy and the Zeeman energy. They determine the orientation of the helix vector k in the magnetic field (see15,16 and17). The following hierarchy of interactions holds2,3: J0 >> D0/a >> F0/a 2 ∼ K where a is the lattice constant. Replacing SR → Sζ̂R one gets the classical energy15 Ecl = S 2L(k) −SD0(k · [â× b̂]) cos2α+ SH‖ sinα+ Ecub where A = S(J0 − Jk)/k2 is the spin-wave stiffness at q >> k, L(k) = k̂2l (â l + b̂ l ) and H‖ is the field component parallel to k. Using (3) and (4) and taking into account that the single ion contribution Ecub does not depend on k we get kl = k(D0/|D0|)[ĉl − SF0(â2l + b̂2l )/(2A)] Ecl = −(SAk2/2)[1− SF0L(ĉ)/(2A)] cos2 α+H‖ sinα+ Ecub, where k = S|D0/A| and L(ĉ) is a cubic invariant. For D0 > 0 or D0 < 0 we have the right or the left helix, respectively3. For F0 < (>)0 the helix vector k is oriented along the 〈111〉 (〈100〉) axes as the invariant L has two extrema 2/3(0)3. Neglecting Ecub in Eq.(4) one obtains the conical state with sinα = −H‖/HC if H‖ < HC where HC = Ak2(1 − SF0L/(2A)] ≃ Ak2 and the ferromagnetic state for H‖ > HC15,19. It can be shown that Ecub gives a negligible contribution to HC and to sinα. 
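To make the field dependence of the conical state concrete, the following Python sketch evaluates sin α = −H‖/HC below HC and the corresponding helical Bragg fraction cos²α = (HC² − H²)/HC² expected from the plain theory. The function names are illustrative, and Ecub is neglected, as in the text.

```python
import math


def cone_angle_sin(h_par, h_c):
    """sin(alpha) of the cone: -H_par / H_C below H_C, saturated (|sin| = 1) above."""
    if h_c <= 0.0:
        raise ValueError("the critical field H_C must be positive")
    if abs(h_par) < h_c:
        return -h_par / h_c
    return -math.copysign(1.0, h_par)


def helical_bragg_fraction(h_par, h_c):
    """cos^2(alpha) = (H_C^2 - H^2) / H_C^2 below H_C, zero in the ferromagnetic state."""
    s = cone_angle_sin(h_par, h_c)
    return 1.0 - s * s


if __name__ == "__main__":
    h_c = 0.6  # tesla, of the order of the value quoted for MnSi
    for h in (0.0, 0.2, 0.4, 0.6, 0.8):
        print(f"H = {h:.1f} T: relative helical Bragg intensity = "
              f"{helical_bragg_fraction(h, h_c):.3f}")
```

This is the smooth, second order behavior of the plain theory; the point of the paper is that for H ‖ [111] the measured intensity departs from it.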
The principal parameters of the magnetic structure for MnSi are: A ≃ 52meV Å k ≃ 0.038Å and HC ≃ 0.6T ≃ Ak2 in agreement with the theory (see15). Another energy F0k2 ∼ 0.01meV ≃ 0.1T was estimated from the anisotropy of the critical neutron scattering13 and from the reorientation of the helix axis in the magnetic field16. The value of K will be estimated below. The classical energy depends on H‖ only. The general expression for the ground state energy, derived in 15. It contains yet another term of purely quantum nature, which depends on the spin-wave gap ∆ as a parameter. As shown in15 if H⊥ > ∆ 2 the helix wave vector k is directed along the magnetic field. For MnSi ∆ ≃ 12µeV ≃ 0.1T ≪ HC16. In a linear spin-wave theory the gap appears due to the cubic anisotropy but it equals to zero at H‖ = 0 20. There is yet another contribution to the gap, which is a result of the spin-wave interaction. We begin with the former. For evaluation of ∆2 one has to consider the uniform part of the bilinear Hamiltonian, which is given by H0 = E0a 0 a0 +B0(a 0 + a 0 )/2, (5) and ∆20 = E 0 −B20 . If one neglects the cubic anisotropy at H‖ < HC then one has E0 = B0 = (HC/2) cos2 α and the gap is zero15,18. Taking into account Hcub, one obtains after simple but rather tedious calculations E0 = (HC cos 2 α)/2 + E1 + E2;B0 = (HC cos 2 α)/2− E2 E1 = −4Λ[(1− L) sin4 α+ 3L sin2 α cos2 α + (3/8)(2− L) cos4 α]; E2 = (3Λ/4)[4L sin 2 α+ (2− L) cos2 α]; ∆2Cub = (HC cos 2 α+ E1)(E1 + 2E2), where Λ = S3K. These expressions hold at H‖ < HC and, indeed, ∆ Cub = 0 in zero field only For H ≥ HC we have cosα = 0 and if without cubic anisotropy E0 = H−HC , B0 = 0 and the gap ∆ = H−HC15, while if taking it into account the cubic anisotropy we get ∆2Cub = [H −HC − 4Λ(1− L)][H −HC + Λ(10L− 4)]. 
(7) For the two principal directions L111 = 2/3 and L001 = 0 and we obtain: ∆2Cub,[1,1,1] = (H −HC − 4Λ/3)(H −HC + 8Λ/3); ∆2Cub,[1,0,0] = (H −HC − 4Λ) Thus, one comes to the important conclusion: ∆2 Cub,[1,1,1] is negative at H = HC for both signs of K 21. This circumstance is decisive for the stability of the system if one takes into account that the contribution to ∆2, stem from the spin-wave interaction, is proportional to cos4 α and disappears at H close to HC . Let’s consider the spin wave interaction to the gap ∆2Int. As the DI breaks the total spin conservation law it must lead to the spin-wave gap. However, in cubic crystals this interaction is very soft [see Eq.(2)] and the gap appears as a result of the spin-wave interaction only similar to the case of pseudo-dipolar interaction in antiferromagnets22,23. In the one-loop approximation it consists of both the Hartree-Fock (HF) part evaluated in15 at H = 0 and the second order contribution from three-point spin-wave interaction, which appears due to the helical structure. It was ignored in15. The diagrams for both contributions are shown in Fig.2 where lines correspond to Green functions Gq = − < Taq, a+q > and Fq = F+q = − < Taq, a−q >, which in ω-representation are given by15: Gq(iω) = Eq + iω (iω)2 − ǫ2q ; Fq(iω) = − (iω)2 − ǫ2q , (9) where Eq = S(Mo −Mq) +Bq;Bq = (S/2)(Mq − Jq) cos2 α;Mq = J+q + 2Dq(k · ĉ), J±q = (Jq+k ± Jq−k)/2 and the spin-wave energy ǫq = (E q − B2q)1/2. Although at q ≪ 1/a these expressions give the same result as obtained in15: Eq = Aq 2 + Bq; Bq = (Ak 2/2) cos2 α and ǫq = Aq(k 2 cos2 α + q2)1/2. However for the present consideration we need them for all q as the formulae for ∆2Int contains the sums, which saturate at q ∼ 1/a. The contribution of the forth-point interaction to ∆2 was analyzed in15 at H = 0 and T = 0. 
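Eqs. (7) and (8) make the sign argument above easy to check numerically. The sketch below, with function names of our own, evaluates ∆²Cub for H ≥ HC, taking H, HC and Λ in the same field units; the magnitude |Λ| = 0.05 used in the check is only an example value.

```python
def gap_sq_cubic_above_hc(h, h_c, lam, big_l):
    """Cubic-anisotropy part of the squared gap for H >= H_C, Eq. (7):
    [H - H_C - 4 Lambda (1 - L)] [H - H_C + Lambda (10 L - 4)]."""
    if h < h_c:
        raise ValueError("Eq. (7) applies only for H >= H_C")
    x = h - h_c
    return (x - 4.0 * lam * (1.0 - big_l)) * (x + lam * (10.0 * big_l - 4.0))


L_111 = 2.0 / 3.0  # cubic invariant for the field along [111]
L_100 = 0.0        # cubic invariant for the field along [100]

if __name__ == "__main__":
    h_c = 0.565
    for lam in (-0.05, 0.05):
        print("Lambda =", lam,
              "| [111]:", gap_sq_cubic_above_hc(h_c, h_c, lam, L_111),
              "| [100]:", gap_sq_cubic_above_hc(h_c, h_c, lam, L_100))
```

At H = HC the [111] value equals −32Λ²/9, negative for either sign of Λ, while the [100] value is the perfect square 16Λ².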
At an arbitrary H the gap consists of two terms V1 = (1/4N) (M1 +M2 −M1−3 −M2−3)a+1 a 2 a3a4; V2 = (1/4N) (M1 − J1)[2(a+a)1(a+a)−1 sin2 α − (a+a2)1(a1 + a+−1) cos2 α], where 1 = q1 etc. At small momenta we have in parentheses −2A(3 · 4)/S and Ak2/S for V1 and V2, respectively. Now one has to consider the V1 interaction and the second part of V2 interaction together. They give the principal contribution to the gap ∆2Int ∆2Int = SAk2 cos4 α (Mq − Jq) = (Ak2)2 cos4 α , (11) where in r.h.s we take into account that J+q = Jq = 0 and according to Eq.(4) (k · ĉ) = Ak2/(SD0). This contribution is T -independent. The contribution of the first term in V2 is more complicated one. Its T -independent part is proportional to (ka) and may be neglected [for MnSi (ka)2 ≈ 0.03]. The T -dependent contribution consists of two parts. The excitations with q ≫ k have a quadratic dispersion ǫq ≈ Aq215 and at T ≫ Ak2 they are responsible for the first part, which has the form ∆22,1 = (1/2)(Ak 2)2ζ(3/2)(ka)3[T/(2πAk2)]3/2 sin2 α cos2 α. In spite of small factor (ka)3 it may be important at sufficiently high T . The second part is not so trivial. According to15 (cf. also11) at q . k and H < HC the spin-wave spectrum becomes strongly anisotropic due to umklapp processes connecting excitations with q and q± k. In zero field taking into account the gap we have ǫq = [A 2(k2q2‖ + 3q ⊥/8) + ∆ 2]1/2. (12) As the field increases this anisotropy becomes weaker since the term in the expression for ǫq appears to be proportional to q2⊥. In the ferromagnetic state (H > HC) this anisotropy vanishes. In a very weak field we can use Eq.(12) and then the second part is given by ∆22,2 = (1/π)Ak 3/2 ln(Ak2/∆) sin2 α cos4 α. (13) This expression diverges if ∆ → 0. Similar divergences were discussed in25. However, we can neglect this contribution due to the small factor (ka)3 and decreasing of logarithm when H increases. 
The three-point interaction is given by V3 = V− cosα+V+ sinα cosα where V± = (2S/N) +a)−q(a−q± a+q ) and C(−)q = [J q +Dq(q · ĉ)] ∼ (A/S)(q · k)(qa)2 C(+)q = [(Jq − J (+)q )/2−Dq(k · ĉ)] ≃ −Ak2/(2S), where the r.h.s. expressions are derived from Eq.(4) at q ≪ 1/a. The corresponding contributions to ∆2 were evaluated as in22. The T -independent term is proportional to (ka)2 and the second one is equal to −2∆2,2/3. Both of them are small and can be neglected. Finally we have ∆2 = ∆2Int +∆ Cub. (15) The field dependence of the ratio ∆2(H)/∆2(0) for the three directions 〈111〉, 〈100〉 and 〈110〉 are shown in Fig.3 with HC = 0.565T, ∆(0) = 0.1 T in agreement with the experiment. The cubic anisotropy Λ was chosen to be equal to −0.05 T. In case of H ‖ [100] ∆2 remains positive and the spin-wave spectrum is stable at all H . In this case at H > HC along with ferromagnetic spin configuration the spin-wave components of the lattice spins has to remain rotating [see Eq.(1)] as was discussed in15. For 〈111〉 direction ∆2 is negative at H > H1 ≃ 0.72HC and the spin-wave spectrum becomes unstable. Hence the long-range helical order demolishes and the corresponding Bragg peaks disappear and the scattering has to be spread around them. Such decrease of the intensity along [111] is shown in Fig.1 in qualitative agreement with the theory. However, more detailed measurements are needed. A similar instability has to be along [110] also contrary to the expectations for H ‖ [100]. The width of this scattering may be estimated from the condition ǫ2q = 0, as for larger q the spin-wave spectrum can not feel the disorder. Near H1 we have ǫ q ∼ (Aqk)2 + ∆2 and the inverse correlation length of the disorder κ = k(|∆|/Ak2) ≪ k. Close to HC ǫq ∼ Aq215 and κ = k(|∆|/Ak2)1/2. This disordered state have a strong chirality, which is demonstrated by a constant polarization of the scattered neutrons in the whole field range below HC (see inset in Fig.1). 
This polarization is determined as P = (I− − I+)/(I− + I+) and, according to the general theory (Ref. 26), the ratio P/P0, where P0 is the initial neutron polarization, is the chirality at given q. A strong drop of the polarization at HC is a signature of the first order transition to the uniform ferromagnetic state with weak chiral fluctuations.

In conclusion, we have thoroughly analyzed the field behavior of the gap in the spin-wave spectrum of cubic helimagnets. It is shown that if the field is applied along either the [111] or the [110] direction, a partially disordered state has to appear at H1 < HC. We demonstrated that this state appears when the square of the spin-wave gap becomes negative and the spin-wave spectrum becomes unstable. We presented the first observation in MnSi of this partially disordered chiral state in the magnetic field.

The work is supported in part by the RFBR (projects No 05-02-19889, 06-02-16702 and 07-02-01318), the Russian State Programs "Quantum Macrophysics" and "Strongly correlated electrons in Semiconductors, Metals, Superconductors and magnetic Materials", and the Russian State Program "Neutron research of solids".

∗ Electronic address: maleyev@sm8283.spb.edu
1 I.E. Dzyaloshinskii, Zh. Eksp. Teor. Fiz. 46, 1420 (1964) [Sov. Phys. JETP 19, 960 (1964)].
2 O. Nakanishi, A. Yanase, A. Hasegawa, M. Kataoka, Solid State Commun. 35, 995 (1980).
3 P. Bak, M. Jensen, J. Phys. C 13, L881 (1980).
4 M. Ishida, Y. Endoh, S. Mitsuda, Y. Ishikawa, M. Tanaka, J. Phys. Soc. Jpn. 54, 2975 (1975).
5 C. Pfleiderer, G.J. MacMillan, S.R. Julian, G.G. Lonzarich, Phys. Rev. B 55, 8330 (1997).
6 K. Koyama, T. Goto, T. Kanomata, R. Note, Phys. Rev. B 62, 986 (2000).
7 C. Pfleiderer, S.R. Julian, G.G. Lonzarich, Nature (London) 414, 427 (2001).
8 C. Pfleiderer, D. Reznik, L. Pintschovius, H. v. Löhneysen, M. Garst, A. Rosch, Nature (London) 427, 227 (2004).
9 S. Tewari, D. Belitz, T.R. Kirkpatrick, Phys. Rev. Lett. 96, 047207 (2006).
10 U.K. Rössler, A.N. Bogdanov, C. Pfleiderer, Nature (London) 442, 797 (2006).
11 D. Belitz, T.R. Kirkpatrick, Phys. Rev. B 73, 054431 (2006).
12 B. Binz, A. Vishwanath, Phys. Rev. B 74, 214408 (2006).
13 S.V. Grigoriev, S.V. Maleyev, A.I. Okorokov, Yu.O. Chetverikov, R. Georgii, P. Böni, D. Lamago, H. Eckerlebe, K. Pranzas, Phys. Rev. B 72, 134420 (2005).
14 M.L. Plumer, M.B. Walker, J. Phys. C: Solid State Phys. 14, 4689 (1981).
15 S.V. Maleyev, Phys. Rev. B 73, 174402 (2006).
16 S.V. Grigoriev, S.V. Maleyev, A.I. Okorokov, Yu.O. Chetverikov, P. Böni, R. Georgii, D. Lamago, H. Eckerlebe, K. Pranzas, Phys. Rev. B 74, 214414 (2006).
17 S.V. Grigoriev, S.V. Maleyev, A.I. Okorokov, Yu.O. Chetverikov, H. Eckerlebe, J. Phys.: Condens. Matter 19, 145286 (2007).
18 There are misprints in Eqs. (13) and (21,22) in Ref. 15: the F terms have to have a negative sign.
19 Due to demagnetization, HC has an additional term depending on the sample form.
20 Eq. (53) in Ref. 15 is erroneous.
21 The same is true in the [1, 1, 1] direction, where L = 1/2.
22 D. Petitgrand, S.V. Maleyev, Ph. Bourges, A.S. Ivanov, Phys. Rev. B 59, 1079 (1999).
23 Results presented below were obtained as in Ref. 22 using a modification of Belyaev's technique (Ref. 24) adjusted to the non-Hermitian spin-wave interaction in the Dyson-Maleyev representation.
24 A.A. Abrikosov, L.P. Gor'kov, I.E. Dzyaloshinskii, Quantum Field Theoretical Methods in Statistical Physics (Pergamon, New York, 1965).
25 T.R. Kirkpatrick, D. Belitz, Phys. Rev. Lett. 97, 267205 (2006).
26 S.V. Maleyev, Phys. Usp. 45, 569 (2002); Physica B 350, 26 (2004).

FIG. 1: The intensity of the Bragg reflection in MnSi at T = 15 K as a function of the field at H ‖ [111]. The full line is the theoretical prediction (see the text). Inset: the spin chirality as a function of the field measured by polarized neutrons (see text). [Horizontal axes: H (mT).]

FIG. 2: Hartree-Fock (a) and three-point (b) diagrams for the spin-wave gap.

FIG. 3: The magnetic field dependence of the ratio ∆2(H)/∆2(0). Parameters are given in the text. [Curves: H ‖ (100), H ‖ (111), H ‖ (110).]

ABSTRACT The polarized neutron scattering in helimagnetic MnSi at low $T$ reveals the existence of a partially disordered chiral state at ambient pressure in a magnetic field applied along the $\langle 111\rangle$ axis, below the first order transition to the non-chiral ferromagnetic state. This unexpected phenomenon is explained by the analysis of the spin-wave spectrum. We demonstrate that the square of the spin-wave gap becomes negative under a magnetic field applied along $\langle 111\rangle$ and $\langle 110\rangle$ but not along the $\langle 100\rangle$ direction. It is a result of the competition between the spin-wave interaction and the cubic anisotropy. This negative sign means an instability of the spin-wave spectrum for the helix and leads to a destruction of the helical order, giving rise to the partially disordered state below the first order ferromagnetic transition.
<|endoftext|><|startoftext|>
Fluctuations in glassy systems

Claudio Chamon1 and Leticia F. Cugliandolo2
1 Physics Department, Boston University, 590 Commonwealth Avenue, Boston, MA 02215, USA
2 Laboratoire de Physique Théorique et Hautes Énergies, Jussieu, 5ème étage, Tour 25, 4 Place Jussieu, 75252 Paris Cedex 05, France
chamon@bu.edu, leticia@lpt.ens.fr

Abstract. We summarize a theoretical framework based on global time-reparametrization invariance that explains the origin of dynamic fluctuations in glassy systems. We introduce the main ideas without going into much technical detail. We describe a number of consequences arising from this scenario that can be tested numerically and experimentally, distinguishing those that can also be explained by other mechanisms from the ones that, we believe, are special to our proposal. We support our claims by presenting some numerical checks performed on the 3d Edwards-Anderson spin-glass.
Finally, we discuss to what extent these ideas apply to super-cooled liquids, which have been studied in much more detail to date.

http://arxiv.org/abs/0704.0684v1

Contents
1 Why glasses? vs. universality in glassy dynamics
2 Time reparametrization invariance
2.1 Mean-field models – dynamic equations
2.2 Structural glasses: the p ≥ 3 cases
2.3 Short-ranged models – dynamic action
2.4 Turning a nuisance into something useful - symmetry as a guideline
2.5 The spherical p = 2 case or mean-field domain growth
2.6 Quantum problems
3 Consequences and tests
3.1 Two-time correlation length
3.2 Scaling of the pdf of local two-time functions
3.3 Effective action for local ages
3.4 Two-time scaling of local functions
3.5 Multi-time scaling
3.6 Local fluctuation-dissipation relation
3.7 Infinite susceptibilities
3.8 Conclusions
4 Discussion

1. Why glasses? vs. universality in glassy dynamics

It is common to encounter in nature systems that resist equilibration with their environments and display what is called glassiness. The name is derived from what we normally know as glasses, an irregular array of silicon and oxygen atoms without crystalline order, much as a liquid, but as hard as a solid.
The molecular diffusion within glasses is extremely sluggish, slowing down by over 10 orders of magnitude as the temperature is slightly decreased near the operationally defined glass transition temperature. Hence, the term glassiness became generically associated with very slow dynamics [1].

So, why glasses? The answer to this question has been the focus of much research effort for a long time. It is certainly a rather difficult question, and a number of ideas have been put forward to try to understand how material systems become glassy. It is not even clear whether in many systems there is a thermodynamic phase one can call glass, or whether there is simply a dynamical crossover at low enough temperatures. Do we need to fully answer why glasses? before we can really further our understanding of glassy dynamics? We take the point of view that, by starting from the fact that glassy systems exist (as nature presents us with concrete examples), we can then attempt to characterize whatever possible universal properties there are in glassy dynamics.

To make this statement clearer, let us turn to a question that Anderson poses in the introduction of his Concepts in solids text [2]: why solids? This question, again, is a rather complex one, and it cannot be disentangled from the question why glasses? if we focus on why a regular array of atoms, as opposed to a random packing, forms in the first place. Even if one assumes that a crystalline structure forms at low temperatures, a detailed quantitative analysis of the energetics remains to be done so as to determine if the packing is hexagonal, cubic, body or face centered, etc. Nonetheless, if one starts from the existence (as observed in nature) of a state with broken translational symmetry, one can construct theories of lattice vibrations (and quantize them) and of electronic band structure (and distinguish between insulators, metals and semiconductors). Solid state physics starts from the solid.
The approach we review in this paper relies on a similar philosophy: we do not claim any theory of the glass transition, and we do not attempt to answer why glasses? with this particular approach. Our theory does not allow us to make non-universal predictions, such as what the glass transition temperature is (if one can be defined) for a certain material, or whether the material displays glassy behavior at all. We aim at understanding if there is a set of principles, guided by symmetry considerations, that can be used to understand certain universal aspects of glassy dynamics once the glass state is presented. For example, glasses age [3]. We thus expect that there exists a unified approach to describe aging phenomena, and we seek some guiding principles, based on dynamical symmetries, that could allow us to understand universal properties in the aging regime, including the scaling of spatial heterogeneities.

We propose that the symmetry that captures the universal aging dynamics of glassy systems is the invariance of an effective dynamical action under uniform reparametrizations of the time scales [4]-[9]. This type of invariance has been known since the early attempts to understand the mean-field equilibrium dynamics of spin-glasses [10, 11], and it was later encountered in the better formulated out-of-equilibrium dynamic formalism applied to the same systems [12, 13]. The invariance means that in the asymptotic regime of very long times a family of solutions to the equations of motion is found. This ‘annoyance’, we claim, actually has a physical meaning and implications for the fluctuating dynamics of real glasses. In order to make this statement concrete, it is necessary to show that the invariance also exists in finite dimensional glassy models. With this aim we showed that global time reparametrization invariance emerges in the long-time action of short-range spin-glasses assuming causality and a separation of time scales [4]-[6].
Basically, the second assumption amounts to starting from a glass state, where one may claim there is a separation between, roughly, a time regime where relaxation is fast and another where relaxation is slow. The invariance can then be used to describe dynamic fluctuations in spin-glasses and, we conjectured, in other glassy systems as well.

Physically, the emergence of time reparametrization invariance can be thought of in the following way. The out of equilibrium relaxation of glassy systems is well characterized by two-time functions, either correlations or linear responses. In the slow and aging regime they depend on two times and time-translation invariance is lost. The proper measure of ‘time’ inside the system is the value of the correlation itself, and not the ‘wall clock’ in the laboratory. For instance, in a spin-glass the proper measure of sample age is the spin-spin overlap, while in particle systems it is the incoherent correlation function. Age measures can fluctuate from point to point in the sample, which we called heterogeneous aging, with younger and older pieces (lower and higher values of the correlation) coexisting at the same values of the two laboratory times. The fact that the effective dynamical action becomes invariant under global time reparametrizations, $t \to h(t)$, everywhere in the sample, means that the action weights the fluctuations of the proper ages, $C(\vec r; t_1, t_2)$, directly, and the times $t_1$ and $t_2$ in the action are just integrated over as dummy variables. To draw an analogy, in theories of quantum gravity the space-time variables $X^\mu(\tau, \sigma)$ are the proper variables, and the action is invariant under conformal transformations of the world-sheet parameters $\tau$ and $\sigma$.

So what does global time-reparametrization invariance concretely teach us about observables in glasses? So far we discussed a global symmetry or invariance with respect to uniform time-reparametrizations.
By looking at spatially heterogeneous reparametrizations, we can predict the behavior of local correlations and linear susceptibilities and the relations between them. For example, we predict that, after a convenient normalization that we explain in the main text, the triangular relation between the local coarse-grained correlations, $C(\vec r; t_1, t_2)$, $C(\vec r; t_2, t_3)$ and $C(\vec r; t_1, t_3)$, as a function of the intermediate time $t_2$, $t_3 < t_2 < t_1$, at all spatial points, $\vec r$, should be identical to the global triangular relation, in the asymptotic limit of very long absolute times and delays between them and very large coarse-graining linear length [9]. Different sites can be retarded or advanced with respect to the global behaviour but they should all have the same overall type of decay. Similarly, the relation between local susceptibilities and their associated correlations should be identical all over the sample [5, 6], leading to a uniform effective temperature [14].

The purpose of this article is to describe our current understanding of dynamic fluctuations (heterogeneities) in the non-equilibrium relaxation of glassy systems arising from the time-reparametrization invariance scenario [4]-[9],[15]. We illustrate it by presenting critical tests.

The structure of this review is the following. In Sect. 2 we explain the origin of the time-reparametrization invariance scenario. We do not present detailed derivations that were already published but we aim at highlighting the main ideas behind the scenes. In Sect. 3 we list several measurable consequences of the theory. We discuss how they have been examined numerically in different glassy models. We discuss here which consequences could also be explained by other approaches and which, we believe, are unique to our scenario. Finally, in Sect. 4 we discuss the scenario.
Since this research project is not closed yet, we present some proposals for numerical and experimental tests as well as some ideas about analytic calculations that could help us understand the limitations of our proposal.

2. Time reparametrization invariance

In this Section we explain how global time-reparametrization invariance develops asymptotically in the aging regime of glassy models. For the sake of simplicity we focus on the classical formalism and at the end of this section we mention the modifications introduced by quantum fluctuations.

2.1. Mean-field models – dynamic equations

Schematic models of spin-glasses, structural glasses and ferromagnetic clean coarsening are encoded in the family of p-spin models defined by [16]

$$H = -\sum_{i_1 i_2 \dots i_p} J_{i_1 i_2 \dots i_p}\, s_{i_1} s_{i_2} \cdots s_{i_p} \qquad (1)$$

with quenched disordered exchanges distributed with the Gaussian law $P(J_{i_1 i_2 \dots i_p}) \propto \exp\!\left[-\tfrac{1}{2}\, J^2_{i_1 i_2 \dots i_p} \big/ \tfrac{p!}{2N^{p-1}}\right]$. The dynamic variables $s_i$, $i = 1, \dots, N$, are of Ising type, $s_i = \pm 1$, or satisfy a global spherical constraint $\sum_{i=1}^{N} s_i^2 = N$. $p$ is an integer parameter: with $p = 2$ and Ising variables one mimics spin-glasses, with $p > 2$ and Ising or spherical spins the phenomenology of structural glasses is recovered, and with $p = 2$ and spherical spins one describes ferromagnetic domain-growth in clean systems. The sum runs over all p-uplets; for this reason these models are ‘mean-field’ in the sense that the saddle-point evaluation of the partition function or the dynamic generating functional is exact in the thermodynamic limit, $N \to \infty$. The Hamiltonian (1) also represents the potential energy of a particle with position $\vec s = (s_1, \dots, s_N)$ on an infinite dimensional hypercube ($s_i = \pm 1$) or hypersphere ($\sum_{i=1}^{N} s_i^2 = N$).

Dynamics is introduced with a Langevin equation that represents the coupling of the spins to an equilibrated thermal environment. Ising spins are then replaced by soft variables by introducing a double-well potential energy, $\sum_{i=1}^{N} V(s_i)$, with $V(s_i) = a(s_i^2 - 1)^2$.
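To make the definition (1) concrete, the following is a minimal sketch that evaluates the p-spin energy for a small system. It is not taken from the references: the restriction of the sum to ordered index sets $i_1 < \dots < i_p$, the coupling variance $p!/(2N^{p-1})$, and the helper names `sample_couplings`, `energy` and `random_spherical` are assumptions made for illustration only.

```python
import itertools
import math

import numpy as np


def sample_couplings(n, p, rng):
    """Draw one Gaussian coupling per ordered index set i1 < ... < ip.

    Assumption: zero mean and variance p! / (2 n^(p-1)).
    """
    sigma = math.sqrt(math.factorial(p) / (2.0 * n ** (p - 1)))
    return {idx: rng.normal(0.0, sigma)
            for idx in itertools.combinations(range(n), p)}


def energy(couplings, s):
    """H = - sum over index sets of J * s_{i1} * ... * s_{ip}."""
    return -sum(J * np.prod(s[list(idx)]) for idx, J in couplings.items())


def random_spherical(n, rng):
    """Random configuration obeying the spherical constraint sum_i s_i^2 = n."""
    x = rng.normal(size=n)
    return x * math.sqrt(n) / np.linalg.norm(x)
```

The same `energy` function accepts Ising configurations (entries ±1) or spherical ones, which mirrors the two choices of dynamic variables discussed above. The cost grows as the number of index sets, so this is only practical for very small N.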
The initial condition is usually chosen to be random, thus mimicking a rapid quench from infinite temperature to the working temperature $T$. In the limit $N \to \infty$ exact Schwinger-Dyson equations couple the global correlation and instantaneous linear response

$$N C(t, t_w) = \sum_{i=1}^{N} s_i(t) s_i(t_w) , \qquad N R(t, t_w) = \sum_{i=1}^{N} \frac{\delta s_i(t)}{\delta h_i(t_w)} \qquad (2)$$

(with $\sum_{i=1}^{N} s_i(t) = 0$ for all $t$). The field $h_i$ couples linearly to the spin variables and, in general, we are interested in perturbing fields that are uncorrelated with the equilibrium states of the systems. It is not necessary to average over thermal noise or quenched disorder since these quantities do not fluctuate in the out of equilibrium regime reached when the infinite volume limit ($N \to \infty$) has been taken at the outset. Here and in what follows times are measured from an origin that corresponds to the quench to the working temperature. Our study then applies in the order of limits in which $N \to \infty$ is taken before the long-time limit. In this limit the exact causal Schwinger-Dyson equations for spherical models at times $t \geq t_w$ read [12, 19]

$$(\partial_t - z_t)\, C(t, t_w) = \int_0^{t} dt'\, \Sigma(t, t')\, C(t', t_w) + \int_0^{t_w} dt'\, D(t, t')\, R(t_w, t') , \qquad (4)$$

$$(\partial_t - z_t)\, R(t, t_w) = \int_{t_w}^{t} dt'\, \Sigma(t, t')\, R(t', t_w) + \delta(t - t_w) , \qquad (5)$$

where the vertex, $D$, and self-energy, $\Sigma$, are functions of $C$ and $R$:

$$D(t, t_w) = \frac{p}{2}\, C^{p-1}(t, t_w) , \qquad \Sigma(t, t_w) = \frac{p(p-1)}{2}\, C^{p-2}(t, t_w)\, R(t, t_w) . \qquad (6)$$

The Lagrange multiplier $z_t$ is fixed by requiring $C(t, t) = 1$. For Ising problems the soft spin constraint can be treated in the mode-coupling approximation [17] or else one can apply a $T - T_c$ expansion, taking advantage of the fact that the phase transition is of second order for $p = 2$ (Sherrington-Kirkpatrick or SK model) [13]. In such fully-connected models all higher order correlations and linear responses factorize and can be written as functions of these two-time functions.
Self-consistent approximate treatments of interacting particle models with realistic potentials, like the mode-coupling approach, yield similar equations with the addition of a wave-vector dependence that in the present context can also be taken into account by considering models of d-dimensional random manifolds embedded in $N \to \infty$ dimensional spaces [18, 19].

2.2. Structural glasses: the p ≥ 3 cases

We now focus on the $p \geq 3$ cases that mimic the structural glass problem. We mention at the end of this subsection the Ising $p = 2$ (SK) spin-glass case that is not conceptually different but just technically more involved. In Sect. 2.5 we discuss the $p = 2$ spherical problem that yields a mean-field description of coarsening phenomena and is rather different from the point of view of time-reparametrization transformations.

Equations (4) and (5) are causal and can be solved numerically by constructing $C(t, t_w)$ and $R(t, t_w)$ from the initial instant $t = t_w = 0$. An analytic solution is possible in the asymptotic limit, as we discuss below. We first present the main features of $C$ and $R$ and we later explain how these are obtained from the asymptotic analytic solution.

Equations (4) and (5) have a dynamic transition at a critical temperature $T_d(p)$. At $T > T_d$ the dynamics occurs in equilibrium, and close to $T_d$ the decay of the correlations slows down as in super-cooled liquids, with the α relaxation time diverging as a power law of $T - T_d$ [16, 20]. Below $T_d$ eqs. (4) and (5) admit a unique solution [12, 13] that is no longer stationary. The behaviour of the low-temperature correlation and susceptibility is sketched in Fig. 1. The low-temperature solution presents a separation of time-scales.
In the long waiting-time limit the self-correlation and the integrated linear response or susceptibility, $\chi(t, t_w) \equiv \int_{t_w}^{t} dt'\, R(t, t')$, can be written as

$$C(t, t_w) = C_{st}(t - t_w) + C_{ag}(t, t_w) , \qquad (7)$$

$$\chi(t, t_w) = \chi_{st}(t - t_w) + \chi_{ag}(t, t_w) . \qquad (8)$$

Figure 1. Sketch of the relaxation of the self-correlation and susceptibility in the glassy regime. The separation of time-scales is clear in the figure. The Edwards-Anderson parameter, $q_{ea}$, the corresponding susceptibility $\chi_{ea}$ and the asymptotic value $\lim_{t\to\infty} \chi(t, t_w)$ are indicated with horizontal lines. [Panels: rapid and stationary ($C_{st}$) versus aging and slow ($C_{ag}$); rapid and stationary ($\chi_{st}$) versus aging and slow ($\chi_{ag}$).]

The first terms in the right-hand-side describe the stationary regime at short time-differences, in which the correlation and susceptibility relatively rapidly approach a plateau at $\lim_{t - t_w \to \infty} \lim_{t_w \to \infty} C(t, t_w) = q_{ea}$ and $\lim_{t - t_w \to \infty} \lim_{t_w \to \infty} \chi(t, t_w) = (1 - q_{ea})/T \equiv \chi_{ea}$. The second terms are the aging relaxation of $C$ towards zero (in the absence of an external field), and the aging response of the system towards the asymptotic value $\chi_{ea} + q_{ea}/T_{eff}$, with $T_{eff}$ a parameter with the interpretation of an effective temperature [14]. The stationary and aging relaxation are fast and slow in the sense that

$$\partial_t C_{st}(t, t_w) \sim C_{st}(t, t_w) , \qquad C > q_{ea} , \qquad (9)$$

$$\partial_t C_{ag}(t, t_w) \ll C_{ag}(t, t_w) , \qquad C < q_{ea} , \qquad (10)$$

$$\partial_t \chi_{st}(t, t_w) \sim \chi_{st}(t, t_w) , \qquad \chi < \chi_{ea} , \qquad (11)$$

$$\partial_t \chi_{ag}(t, t_w) \ll \chi_{ag}(t, t_w) , \qquad \chi > \chi_{ea} . \qquad (12)$$

The aging self-correlation and susceptibility scale as

$$C_{ag}(t, t_w) \approx q_{ea}\, f_c\!\left(\frac{R(t)}{R(t_w)}\right) , \qquad \chi_{ag}(t, t_w) \approx q_{ea}\, f_\chi\!\left(\frac{R(t)}{R(t_w)}\right) . \qquad (13)$$

The scaling functions satisfy the limit conditions $f_c(1) = 1$, $f_c(\infty) = 0$, $f_\chi(1) = 0$ and $f_\chi(\infty) = 1/T_{eff}$. Using mathematical properties of monotonic two-time functions one can show that such a scaling holds asymptotically in each of the (two) time-scales of the evolution [13].
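The additive form of the correlation and its aging scaling can be illustrated with a toy two-time function. This is only a sketch under stated assumptions, not the solution of the Schwinger-Dyson equations: the plateau value `Q_EA`, the exponential stationary part `c_st`, the power-law scaling function `f_c` and the linear clock `clock` are all hypothetical choices.

```python
import numpy as np

Q_EA = 0.6  # hypothetical plateau (Edwards-Anderson parameter)
A = 0.5     # hypothetical exponent of the aging scaling function


def c_st(tau):
    """Assumed stationary part: fast decay from 1 - q_ea to 0."""
    return (1.0 - Q_EA) * np.exp(-tau)


def f_c(x):
    """Assumed scaling function with f_c(1) = 1 and f_c(inf) = 0."""
    return x ** (-A)


def clock(t):
    """Assumed clock R(t): a simple power of time."""
    return t


def corr(t, tw):
    """C(t, tw) = C_st(t - tw) + q_ea * f_c(R(t) / R(tw))."""
    return c_st(t - tw) + Q_EA * f_c(clock(t) / clock(tw))
```

With these choices `corr(tw, tw)` equals one, a finite time-difference at very long `tw` lands on the plateau, and at fixed ratio `t / tw` the value no longer depends on `tw`, which is the aging scaling regime.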
While in a system undergoing finite dimensional coarsening $R(t)$ has a natural interpretation as the typical domain radius, in mean-field models there is no immediate understanding of the ‘clock’ $R(t)$ that, in a sense, sets the macroscopic time-scale. The numerical solution suggests that $R(t)$ is just a power of time.

In the asymptotic limit in which the additive separation of time-scales with the scaling form (13) holds, it is convenient to use a parametric description of the dynamics in which times do not appear explicitly. More precisely, the approach to the asymptotic scaling, and $f_c$ and $f_\chi$, can be put to the test by constructing ‘triangular relations’ between correlations and susceptibilities, respectively. For generic three long times $t_1 \geq t_2 \geq t_3 \gg t_0$ one computes the correlations $C(t_\mu, t_\nu)$, $\mu > \nu = 1, 2, 3$. If the times are such that the ratios $R(t_\mu)/R(t_\nu)$ remain finite in the asymptotic limit, that is to say $C(t_\mu, t_\nu) = C_{ag}(t_\mu, t_\nu)$, one has

$$C(t_1, t_3) = q_{ea}\, f_c\!\left( f_c^{-1}\!\left[C(t_1, t_2)/q_{ea}\right]\; f_c^{-1}\!\left[C(t_2, t_3)/q_{ea}\right] \right) . \qquad (14)$$

If, instead, $t_1 = t_2 + \tau$ with $\tau > 0$ finite, and $R(t_2)/R(t_3)$ finite in such a way that $C(t_1, t_2) = q_{ea} + C_{st}(t_1 - t_2)$ and $C(t_2, t_3) = C_{ag}(t_2, t_3)$, one has

$$C(t_1, t_3) = \min\left[\, C(t_1, t_2),\, C(t_2, t_3) \,\right] \qquad (15)$$

asymptotically. This form goes under the name of dynamic ultrametricity. In the opposite case, $t_3 = t_2 - \tau$ and $R(t_1)/R(t_2)$ finite, dynamic ultrametricity also holds. These relations follow immediately from the additive separation of time-scales (7) and the scaling (13), but they can be shown without assuming dynamic scaling, just by using the monotonicity properties of temporal correlations [13].

The simplest way to see these relations at work is to display $C(t_1, t_2)$ against $C(t_2, t_3)$, for a chosen value of $C(t_1, t_3) < q_{ea}$, in a parametric plot in which $t_2$ varies from $t_3$ to $t_1$. In the asymptotic limit $t_3 \to \infty$ the construction reaches a stable master curve as displayed in Fig. 2 (left).
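The triangular relation between aging correlations can be checked numerically, and the check also shows that it does not depend on the choice of clock. The sketch below is illustrative only: the value `Q_EA`, the power-law scaling function `f_c` with exponent `a`, and the function names are assumptions, not results from the references.

```python
import numpy as np

Q_EA = 0.6  # hypothetical plateau value


def f_c(x, a=0.5):
    """Assumed scaling function, monotonic with f_c(1) = 1."""
    return x ** (-a)


def f_c_inv(y, a=0.5):
    """Inverse of the assumed scaling function."""
    return y ** (-1.0 / a)


def c_ag(t, tw, clock):
    """Aging correlation q_ea * f_c(R(t) / R(tw)) for a given clock R."""
    return Q_EA * f_c(clock(t) / clock(tw))


def triangle(c12, c23):
    """C(t1, t3) predicted from C(t1, t2) and C(t2, t3) in the aging regime."""
    return Q_EA * f_c(f_c_inv(c12 / Q_EA) * f_c_inv(c23 / Q_EA))


def ultrametric(c12, c23):
    """C(t1, t3) when one of the two correlations is above the plateau."""
    return min(c12, c23)
```

For example, with `clock = lambda t: t ** 0.3` or `clock = lambda t: np.log(1.0 + t)` the value `triangle(c_ag(t1, t2, clock), c_ag(t2, t3, clock))` agrees with `c_ag(t1, t3, clock)`. The clock only changes where a given triplet of times sits on the master curve, not the curve itself.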
The vertical and horizontal parts correspond to $t_2$ such that $C(t_1, t_2) > q_{ea}$ and $C(t_2, t_3) > q_{ea}$, respectively, and dynamic ultrametricity holds. The curved part is for $t_2$ such that all correlations are in the aging regime, and its functional form is fully determined by $f_c$. The ‘clock’ $R$ yields the speed at which the data-point moves on the parametric curve. A similar construction can be done for the susceptibilities.

The stationary correlation, $C_{st}$, and susceptibility, $\chi_{st}$, are linked by the equilibrium fluctuation dissipation theorem (fdt), $\chi_{st} = (1 - C_{st})/T$. In the aging regime, instead, there is a non-trivial relation between $\chi$ and $C$: $\chi_{ag} = (q_{ea} - C_{ag})/T_{eff}$, which yields $f_\chi(x) = (1 - f_c(x))/T_{eff}$. This relation is also better appreciated if shown in a parametric construction in which times do not appear explicitly. In the long waiting-time limit the plot of $\chi(t, t_w)$ against $C(t, t_w)$, with $t$ the parameter running from $t_w$ to infinity, approaches a broken line with slopes $-1/T$ (for $C > q_{ea}$) and $-1/T_{eff}$ (for $C < q_{ea}$). Again, the ‘clock’ $R$ yields the speed at which this curve is constructed upon increasing $t$.

Figure 2. Sketch of the parametric representation of the correlation and susceptibility. Left: triangular relation between the correlation functions in the asymptotic limit $t_3 \to \infty$. The three curves correspond to different $t_1$'s such that $C(t_1, t_3)$ takes three values. The breaking points lie at $q_{ea}$. The arrow indicates the sense of the evolution when $t_2$ increases from $t_3$ to $t_1$. Right: susceptibility, $\chi(t_2, t_3)$, against correlation, $C(t_2, t_3)$, at fixed $t_3$ using $t_2 \geq t_3$ as a parameter in the long $t_3$ limit. The breaking point at $(q_{ea}, \chi_{ea})$ separates the stationary regime where the equilibrium fdt is satisfied from the aging regime where it is modified. The slopes are $-1/T$ and $-1/T_{eff}$, respectively. The arrow also indicates here the sense of the evolution when $t_2$ increases from $t_3$.

An analytic solution to the Schwinger-Dyson equations was derived in the limit of long waiting-time in which the separation of time-scales, that is to say the plateaus in $C$ and $\chi$, are fully established. In the aging regime, one uses the fact that the time-derivatives of the correlation and linear susceptibility are negligible with respect to all terms in the right-hand-side of the equations and can thus be dropped. Furthermore, one approximates the integrals by separating the contributions from the stationary and aging regimes [19]. Equation (5) becomes

$$\mu_\infty R_{ag}(t, t_w) \sim \frac{p(p-1)}{2} \int_{t_w}^{t} dt'\, C_{ag}^{p-2}(t, t')\, R_{ag}(t, t')\, R_{ag}(t', t_w) \qquad (16)$$

($\mu_\infty$ is a constant with contributions from $\lim_{t\to\infty} z_t$ and border terms in the integrals). The companion eq. (4) takes a similar form within the same approximation. Now, the surprise is that the approximate equations are invariant under the transformation

$$t \to h_t \equiv h(t) , \qquad C_{ag}(t, t_w) \to C_{ag}(h_t, h_{t_w}) , \qquad R_{ag}(t, t_w) \to \dot h_{t_w}\, R_{ag}(h_t, h_{t_w}) , \qquad (17)$$

with $h_t$ positive and monotonic and $\dot h_{t_w} \equiv dh_{t_w}/dt_w$.

While the functions $f_c$ and $f_\chi$ and their fd relation are fixed by the remaining approximate equation (16) and its companion, the time-reparametrization invariance does not allow one to compute, analytically, the clock $R(t)$. This problem is similar to the velocity selection problem present, for instance, in the Fisher differential equation describing front propagation and the like [21]. The exact Schwinger-Dyson equations do have a unique solution with a special function $R(t)$ that is selected by the short-time difference effect of the time-derivatives. However, as time increases and time-differences increase too, the effect of the time-derivatives diminishes. In the approximate analytic solution we take advantage of this fact to solve the equations asymptotically, but we introduce in this way a symmetry that does not allow us to fix $R(t)$. We obtain, instead, a family of solutions parametrized by $h_t$. It is important to note that the parametric constructions in Fig. 2 are independent of the clock and thus are fully determined by the approximate treatment.

The case $p = 2$ with Ising spins, or Sherrington-Kirkpatrick model, has a more complicated scaling form with a sequence of two-time scales leading to dynamic ultrametricity for all $C < q_{ea}$ asymptotically [13]. This behaviour is technically more involved but, as far as the symmetry properties are concerned, it is similar to the case treated above. The full dynamic equations have a unique solution but the approximate ones acquire time-reparametrization invariance. We shall not discuss these cases further in the rest of this review.

2.3. Short-ranged models – dynamic action

In a series of papers [4]-[9] we claimed that the time-reparametrization invariance thus far introduced via the asymptotic solution to the dynamic equations in mean-field glassy problems is indeed an asymptotic property of the dynamics of glassy systems, mean-field and finite dimensional. The separation of time-scales stationary-aging has been observed in a variety of glassy systems with numerical simulations and experiments [3, 22]. The slowness of the decay in the aging regime, eqs. (10) and (12), and a weak long-term memory of the kind (13) are the hallmark of glassy relaxation. The idea is then that time-reparametrization invariance is the symmetry associated with the dominant dynamic fluctuations in these systems.

To pursue this idea one first has to prove that the symmetry of the saddle-point equations is also a symmetry of the action in the dynamic generating functional, not only of fully-connected spin models of the mean-field type but also of finite dimensional glassy systems. In [4] we derived and studied the symmetry properties of the dynamic action – the so-called Martin-Siggia-Rose action associated to Langevin stochastic dynamics – of the disorder averaged soft-spin 3d Edwards-Anderson (ea) model of spin-glasses.
($H = \sum_{\langle ij \rangle} J_{ij} s_i s_j$ with $J_{ij}$ Gaussian random variables with zero mean and taking non-zero value only on nearest neighbours on the d-dimensional lattice and $s_i = \pm 1$.) In our analysis we took a number of steps that we briefly recall here.

First, we introduced four fluctuating two-time fields defined on the lattice sites,

$Q_i^{ab}(t,t_w)$ , with i = 1, . . . , N and a, b = 0, 1 . (18)

Their thermal averages are the expected values of the local two-time self-correlation (a = b = 0), the retarded linear response (a = 0, b = 1), the advanced linear response (a = 1, b = 0), and a fourth observable (a = b = 1) that vanishes if causality is preserved.

Second, we assumed that a separation of time-scales fast-slow, of the type described in the previous subsection, applies to these fluctuating fields too.

Third, we determined the long-time action by using a Renormalization Group (RG) scheme in the time variables. This allowed us to write the full action as a sum of two contributions: one from the fast regime holding at short time-differences, another one from the slow regime valid at long time-differences. The coupling between these two vanishes asymptotically.

Fourth, we analyzed the surviving terms in the action, which are just the slow contribution. Using advanced and retarded scaling dimensions that are just the labels a and b of the fields, one finds that the global time-reparametrization,

$t \to h_t \equiv h(t)$ , $\tilde Q_i^{ab}(t,t_w) = (\dot h_t)^a (\dot h_{t_w})^b \, Q_i^{ab}(h_t,h_{t_w})$ , (19)

leaves all surviving terms unmodified. This step is concisely carried out as follows. Take a generic term in the action. Under (19) it transforms as

$\int \prod_{\nu=1}^{N} dt_\nu \cdots \to \int \prod_{\nu=1}^{N} dt_\nu \, (\dot h_{t_\nu})^{\Delta_\nu} \cdots$

where N is the number of time integrals, the dots represent a product of the fields $Q_i^{ab}$, and $\Delta_\nu$ is the total power of the factors $\dot h_{t_\nu}$ arising from the transformation of the fields. Interestingly enough, one finds that all the $\Delta_\nu$ equal one.
Thus, with a simple change of variables one absorbs each factor $\dot h_{t_\nu}$ in the corresponding integration variable and

$\int \prod_{\nu=1}^{N} dt_\nu \cdots \to \int \prod_{\nu=1}^{N} dh_\nu \cdots$ (20)

Note that in order to prove invariance under the simpler and more common scale transformation, $h_t = \lambda t$, it is enough to have $\sum_{\nu=1}^{N} \Delta_\nu = N$. Scale invariance is included in the larger global time-reparametrization symmetry, which is clearly the more restrictive requirement.

Finally, we showed that the measure in the functional integral is also global time-reparametrization invariant, completing the program. We refer the reader to Refs. [4] and [6] for the technical details leading to these results.

In the disordered 3d ea model studied we carried out the disorder average. The presence of quenched disorder gave us an analytic control of the theory but this does not necessarily mean that such a symmetry develops only for the long time regime of models with quenched disorder. It appears that the essential assumptions are causality and unitarity, and a separation of time scales that takes the action to a non-trivial asymptotic state (some glassy state); one then expects the symmetry to exist also for systems without quenched disorder that are glassy nevertheless.

In order to check the development of time-reparametrization invariance in problems of particles in interaction in finite dimensions one should first obtain the relevant action to work with. A good candidate for a starting point is the Dean-Kawasaki stochastic equation for the evolution of the local density [23]. The idea is then to write its dynamic generating functional and study the symmetry properties of the effective action assuming that a separation of time-scales exists. We are currently carrying out this study.

2.4.
Turning a nuisance into something useful - symmetry as a guideline

The global time-reparametrization invariance implies that the action describing the long time slow dynamics of a spin-glass is basically a “geometric” random surface theory, with the Q’s themselves as the natural coordinates. The original two times parametrize the surface. Physical quantities, such as the bulk integrated response $\chi(t_1,t_2) = \int dt' \, R(t',t_2)$ and correlation C(t1, t2), have scaling dimension zero under t → h(t), as do their local counterparts. The emergence of this gauge-like symmetry may appear at first as a nuisance that relates too many solutions for just one problem. However, it may provide a simple way to understand spatial fluctuations in systems that possess this global time-reparametrization symmetry.

Here, a simple analogy with the problem of a ferromagnet may elucidate the point we want to make. In a ferromagnet the action is invariant under uniform rotations of the magnetization vector $\vec m$. In the ordered phase, rotation symmetry is spontaneously broken, and a certain magnetization direction $\vec m_0$ is picked. Typically, a vanishingly small pinning field selects this direction. The low action excitations are the spin waves, fluctuations of the uniform, symmetry broken, state. These spin waves can be described in terms of smooth spatially fluctuating rotations around the uniform magnetization state. The spin waves, generated by using slowly varying local rotations, are the Goldstone modes of the ferromagnet.

Similarly, in the aging regime of the glassy systems, the action has a global symmetry under uniform time-reparametrizations, t → h(t), with the fields transforming as in eq. (19). The probability weight of having certain local two-time correlation and response, the observables, should be independent of this reparametrization.
After coarse-graining over a linear length ℓ the non-vanishing fluctuating fields are the local coarse-grained correlation (a = b = 0) and linear response (a = 0, b = 1),

$C(\vec r; t, t_w) = \frac{1}{V_{\vec r}} \sum_{j \in V_{\vec r}} s_j(t) s_j(t_w)$ , $R(\vec r; t, t_w) = \frac{1}{V_{\vec r}} \sum_{j \in V_{\vec r}} \frac{\delta s_j(t)}{\delta h_j(t_w)}$ ,

with the sum carried over the spins in the volume $V_{\vec r} = \ell^d$ centered at $\vec r$, and a is the lattice spacing. The transformation (17) is now restated as

$t \to h_t \equiv h(t)$ , $C_{ag}(\vec r; t, t_w) \to C_{ag}(\vec r; h_t, h_{t_w})$ , $R_{ag}(\vec r; t, t_w) \to \dot h_{t_w} R_{ag}(\vec r; h_t, h_{t_w})$ , (21)

and it is an asymptotic symmetry of the action for the slow coarse-grained degrees of freedom. Indeed, the symmetry breaking terms, which have their origin in the short-time dynamics and short-time difference dynamics, are not identical to zero but become vanishingly small asymptotically. The particular scaling function R(t) selected by the system is determined by matching the fast and the slow dynamics. It depends on several details: the existence of external forcing, the nature of the microscopic interactions, etc. In other words, the fast modes which are absent in the slow dynamics act as symmetry breaking fields for the slow modes.

In analogy with the spin-wave fluctuations in magnetic systems, which are dictated by the rotational symmetry, we proposed that the smooth fluctuations in the glassy phase can be obtained by studying the slowly varying, position dependent reparametrizations $t \to h(\vec r, t)$ around the one reparametrization R(t) selected by the short-time dynamics. In other words, we basically proposed that there are Goldstone modes for the glassy action which can be written as slowly varying, spatially inhomogeneous time reparametrizations. This suggests that the slow part of the coarse-grained local correlations and susceptibilities should scale as

$C_{ag}(\vec r; t, t_w) \approx q_{ea} \, f_c\!\left( \frac{h(\vec r, t)}{h(\vec r, t_w)} \right)$ , $\chi_{ag}(\vec r; t, t_w) \sim f_\chi\!\left( \frac{h(\vec r, t)}{h(\vec r, t_w)} \right)$ , (22)

with fc and fχ the same functions describing the global correlation and susceptibility, respectively [eqs.
(13) and (23)] and the same function $h(\vec r, t)$ scaling the two-time correlation and susceptibility on each site $\vec r$ [4, 5, 6]. The sum rules $C_{ag}(t,t_w) = V^{-1} \int d^dr \, C_{ag}(\vec r; t, t_w)$ and $\chi_{ag}(t,t_w) = V^{-1} \int d^dr \, \chi_{ag}(\vec r; t, t_w)$ apply.

The reason for this proposal is that the global reparametrization invariance in time of the dynamic action in this two-time regime leads to low action excitations (Goldstone modes) for smoothly varying spatial fluctuations in the reparametrization of time, but not in the external form of the scaling functions. As in a sigma model (for example, to describe the ferromagnet), the external functions fc and fχ fix the manifold of states, and the local time reparametrizations correspond to fluctuations restricted to this fixed manifold of states (in the ferromagnet, the tilting of direction but not the magnitude of the magnetization vector).

2.5. The spherical p = 2 case or mean-field domain growth

The spherical SK model (p = 2) can be solved exactly by analyzing the Langevin equation in the basis of eigenvectors of the random matrix $J_{ij}$ [24]. While the correlation has a very similar behaviour to the one of the p ≥ 3 cases (see Fig. 1-left), the susceptibility is quite different. One can mention that the stationary and aging regimes in the linear response are not so sharply separated in this case. If one uses an additive separation as in (8) the aging contribution to the integrated linear response, χag(t, tw), vanishes asymptotically. More precisely, χag(t, tw) ∼ t . (23) Importantly enough, even though the inequalities (10) and (12) are still valid, a careful inspection of all terms in the Schwinger-Dyson equations shows that they are of the same order asymptotically. Moreover, the stationary contribution to the equations in the aging regime is not just a constant: the corrections associated to the asymptotic approach to the plateau in the correlation cannot be neglected.
As a result one cannot simply drop the time derivatives and the Schwinger-Dyson equations in the aging regime are not time-reparametrization invariant but just scale invariant, that is to say, they are unchanged by the transformation t → h(t) = λt, with C and R transforming as in (17) and λ a positive constant. A similar mechanism, though even harder to prove, applies to the dynamic equations for the correlation and linear response of the non-conserved dynamics of the O(N) ferromagnetic model in the large N limit [8].

In line with what we explained above, the effective action for the slow degrees of freedom of the p = 2 spherical model and, more generally, the d-dimensional ferromagnetic O(N) model in the large N limit, are not invariant under global time-reparametrizations but only under global rescaling of time, t → λt. This marks an important difference between models with a finite aging response and these quasi quadratic models with a vanishing aging response [8]. This result is important for a number of reasons: first, it suggests that the susceptibility, or even the effective temperature, might be intimately related to the symmetry properties of the dynamics and consequently of the fluctuations; second, it suggests that the mechanism for fluctuations in coarsening systems might be different from the one of other glassy problems with finite and well-defined effective temperatures. In order to justify the latter statement it remains to be checked whether the reduction of time-reparametrization invariance to just scale invariance also holds in finite dimensional non-field coarsening.

2.6. Quantum problems

In the case of quantum models one introduces the effect of dissipation by coupling the system to an environment represented, typically, by an infinite ensemble of quantum harmonic oscillators.
One then uses the Schwinger-Keldysh formalism to write a generating functional and from it, in the fully-connected or infinite dimensional cases, one derives Schwinger-Dyson equations similar to the ones above. The asymptotic analysis of these equations follows the same steps as in the classical limit [25, 17] (at least in the case of a weak coupling to the bath [26]) and the time-reparametrization invariance also applies.

The appearance of an asymptotic invariance under time-reparametrizations in the mean-field dynamic equations was related to the reparametrization invariance of the replica treatment of the statics of the same models [10, 27]. The latter remains rather abstract. Brézin and de Dominicis [27] studied the consequences of twisting the reparametrizations in the replica approach. Interestingly enough, this can be simply done in a dynamic treatment either by applying shear forces or by applying heat-baths with different inherent dynamics to different parts of the system. More precisely, using a model with open boundary conditions one could apply a thermal bath with a characteristic time-scale on one end and a different thermal bath with a different characteristic time-scale on the opposite end and see how a time-reparametrization ‘flow’ establishes in the model.

3. Consequences and tests

In this Section we discuss how one can put these ideas to the test by presenting a number of consequences of global time-reparametrization invariance that are directly measurable numerically and experimentally. The properties that we discuss explicitly are:

1. A growing dynamical correlation length.
2. Scaling of the pdf of local two-time functions.
3. Functional form of the pdf of local two-time functions.
4. Triangular relations between two-time functions.
5. Scaling relations for general multi-time functions.
6. Local fluctuation-dissipation relations.
7. Infinite susceptibilities.
In considering these predictions, we shall separate them into two distinct classes. The first class contains predictions that are consistent with other theoretical scenarios as well as with the presence of reparametrization invariance. Hence, while reparametrization invariance leads to these predictions, this class alone cannot be used to argue for the role of the symmetry over other mechanisms. The second class, on the other hand, contains predictions that are not natural within other frameworks, and to the date of this report have no obvious explanation within other frameworks. Properties 1 and 2 belong to the first class, properties 3-7 to the second. We shall present these properties in detail below.

3.1. Two-time correlation length

In equilibrium statistical models one defines the static correlation length, ξeq, from the spatial decay of the correlation between the fluctuations of the order parameter measured at two space points,

$\langle [\phi(\vec r) - \langle \phi(\vec r) \rangle] [\phi(\vec r') - \langle \phi(\vec r') \rangle] \rangle \big|_{|\vec r - \vec r'| = \Delta} \sim \Delta^{-d+2-\eta} \, e^{-\Delta/\xi_{eq}}$ ,

where the angular brackets denote an average over the Gibbs-Boltzmann measure. ξeq depends on temperature and, in second order phase transitions, it diverges at Tc leaving only an algebraically decaying correlation.

A dynamic equilibrium correlation length characterizes the spatial decay of equal-time correlations in the equilibrium relaxation of ‘usual’ systems. Similarly to the static case one defines ξ(t) via

$\langle [\phi(\vec r, t) - \langle \phi(\vec r, t) \rangle] [\phi(\vec r', t) - \langle \phi(\vec r', t) \rangle] \rangle \big|_{|\vec r - \vec r'| = \Delta} \sim e^{-\Delta/\xi(t)}$ ;

the angular brackets indicate here an average over thermal histories and, for simplicity, we omitted the algebraic correction to the exponential decay. The average over thermal noise can be traded for an integration over the reference space-point $\vec r$ and one then obtains ξ(t) from

$V^{-1} \int d^dr \; \delta\phi(\vec r, t) \, \delta\phi(\vec r', t) \big|_{|\vec r - \vec r'| = \Delta} \sim e^{-\Delta/\xi(t)}$

where $\delta\phi(\vec r, t) \equiv \phi(\vec r, t) - V^{-1} \int d^dr'' \, \phi(\vec r'', t)$. The correlation length, ξ(t), depends now on temperature and total time.
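As an illustration of the spatially averaged definition above, the following minimal sketch estimates ξ from the exponential decay of the connected correlation. It is not taken from the original work: the one-dimensional synthetic field, the helper names `synthetic_field` and `correlation_length`, and the fitting window are all assumptions made for the example, chosen because the field has a known correlation length that the estimate can be compared with.

```python
import numpy as np

def correlation_length(field, max_lag):
    """Estimate xi from the decay exp(-Delta/xi) of the connected correlation."""
    dphi = field - field.mean()                  # delta phi = phi - spatial average
    lags = np.arange(max_lag + 1)
    # Spatial average over the reference point r (periodic wrap via np.roll).
    corr = np.array([np.mean(dphi * np.roll(dphi, -lag)) for lag in lags])
    # Fit log(corr) = const - Delta/xi; only valid while corr stays positive.
    slope, _ = np.polyfit(lags, np.log(corr), 1)
    return -1.0 / slope, corr

def synthetic_field(n_sites, xi, seed=0):
    """Hypothetical 1d Gaussian field with correlation exp(-Delta/xi) (AR(1))."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-1.0 / xi)
    noise = rng.standard_normal(n_sites) * np.sqrt(1.0 - rho**2)
    phi = np.empty(n_sites)
    phi[0] = rng.standard_normal()
    for i in range(1, n_sites):
        phi[i] = rho * phi[i - 1] + noise[i]
    return phi

xi_true = 5.0
phi = synthetic_field(200_000, xi_true)
xi_est, corr = correlation_length(phi, max_lag=10)
print(f"xi_true = {xi_true}, xi_est = {xi_est:.2f}")
```

The fit is restricted to short distances because the logarithm is only defined while the measured correlation remains positive; with real data the algebraic prefactor omitted in the text would also have to be considered.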
In systems with slow dynamics in which the order parameter is a two-time entity, a two-time correlation length can be defined in analogy to what we described in the previous paragraph:

$\langle [\phi(\vec r, t)\phi(\vec r, t_w) - \langle \phi(\vec r, t)\phi(\vec r, t_w) \rangle] \times [\phi(\vec r', t)\phi(\vec r', t_w) - \langle \phi(\vec r', t)\phi(\vec r', t_w) \rangle] \rangle \big|_{|\vec r - \vec r'| = \Delta} \sim e^{-\Delta/\xi(t,t_w)}$ . (24)

Once again, by trading the thermal average for a spatial average, $\langle \phi(\vec r, t)\phi(\vec r, t_w) \rangle$ becomes the global correlation C(t, tw) and ξ(t, tw) is derived from the four-point correlation

$C_4(\Delta; t, t_w) \equiv V^{-1} \int d^dr \; \delta[\phi(\vec r, t)\phi(\vec r, t_w)] \, \delta[\phi(\vec r', t)\phi(\vec r', t_w)] \big|_{|\vec r - \vec r'| = \Delta}$

with $\delta[\phi(\vec r, t)\phi(\vec r, t_w)] \equiv \phi(\vec r, t)\phi(\vec r, t_w) - C(t, t_w)$. The quantity C4(∆; t, tw) measures the probability that a fluctuation of the two-time composite field $\phi(\vec r, t)\phi(\vec r, t_w)$ with respect to its global average C(t, tw) in the spatial position $\vec r$ affects a fluctuation of the same composite field at a different site $\vec r'$ located at a distance ∆ from $\vec r$, averaged over the whole ensemble of reference points $\vec r$ in the sample.

Figure 3. The correlation length in the 3d EA model at T = 0.6 < Tc and L = 100. (a) As a function of t − tw; (b) as a function of 1 − C; (c) in the scaling form t against 1 − C. These results are taken from [9]. The vertical arrow in panels (b) and (c) indicates the value of qea.

The numerical analysis of C4 in the low temperature out of equilibrium dynamics of the 3d ea model [6, 9] (see Fig. 3), soft sphere [28] and Lennard-Jones [29] mixtures yields

$\xi(t,t_w) \sim \begin{cases} \xi_{st}(t - t_w) & C > q_{ea} \\ \xi_{ag}(t,t_w) & C < q_{ea} \end{cases}$ with $\xi_{ag}(t,t_w) \sim t_w^{a} \, g(C)$ and a a small power (26)

(a logarithmic growth is also possible). g(C) is a monotonically decreasing function of C. This is a two-time monotonically growing function even at time-lags that are longer than the waiting-time dependent α relaxation time.
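The four-point correlation defined above can be evaluated directly from two spin configurations. The sketch below is only an illustration under stated assumptions: the configurations are synthetic (spins flipped independently between tw and t, so no spatial correlation is expected), distances are measured along the lattice axes only, and the function name `four_point_correlation` is hypothetical.

```python
import numpy as np

def four_point_correlation(s_t, s_tw, max_dist):
    """C4(Delta; t, tw) for two spin configurations on a periodic cubic lattice."""
    q = s_t * s_tw                  # two-time composite field phi(r,t) phi(r,tw)
    dq = q - q.mean()               # subtract the global correlation C(t, tw)
    c4 = np.empty(max_dist + 1)
    for dist in range(max_dist + 1):
        # Average over reference points r and over the lattice axes.
        c4[dist] = np.mean([np.mean(dq * np.roll(dq, -dist, axis=ax))
                            for ax in range(q.ndim)])
    return q.mean(), c4

rng = np.random.default_rng(1)
L = 16
s_tw = rng.choice([-1, 1], size=(L, L, L))      # configuration at tw
flip = rng.random((L, L, L)) < 0.3              # independent flips (assumption)
s_t = np.where(flip, -s_tw, s_tw)               # configuration at t
C, c4 = four_point_correlation(s_t, s_tw, max_dist=4)
print("global correlation:", C)
print("C4(Delta):", c4)
```

For ±1 spins the composite field squares to one, so the value at ∆ = 0 equals 1 − C² identically; this gives a simple consistency check. In a glassy sample ξ(t, tw) would then be read off from the decay of C4 with ∆, as in eq. (24).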
Note the difference with what is observed in the super-cooled phase, where a number of numerical and experimental measurements point at a dynamic correlation length that reaches a maximum at the α relaxation time and later diminishes to zero [30].

The divergence has a clear interpretation within the global time-reparametrization scenario: it is due to the generation and development of the zero mode. The global reparametrization invariance symmetry develops only in the limit of very long times, so that at intermediate times the irrelevant terms that scale down to zero are still manifest. These irrelevant, symmetry breaking terms give a finite length scale (or a finite ‘mass’) to the soft reparametrization modes. Because they are irrelevant, the correlation length increases or, equivalently, the mass decreases asymptotically.

Even though the numerically accessible times are sufficiently long so as to see the separation of time-scales in the relaxation of the global correlation, the correlation length is still very short. In numerical simulations ξ reaches values of the order of 4 lattice spacings or inter-particle distances in the spin-glass problem or soft sphere and Lennard-Jones system, respectively. ξ just increases by, say, a factor 4 when the waiting-time is increased by nearly 4 orders of magnitude. The expected limit ξ(t, tw) → ∞ is thus far from being attained.

As we already mentioned in the introduction to this Section, the growing length scale is consistent with the development of time-reparametrization invariance, but it is also consistent with other mechanisms. Hence, it alone cannot be used to argue unambiguously in favor of the symmetry-based approach.
For example, theories based on the mode-coupling approach or random first order scenario [33] and its refinement including the effect of entropic droplets [34], dynamical criticality controlled by a zero-temperature critical point [35] and frustration limited domains [36] are used to justify the growth of a dynamic length-scale, at least in the super-cooled liquid. Hence, the existence of a growing length scale belongs to the first class of predictions we mentioned in the introduction to this section.

Finally, let us note that the fluctuations in the susceptibility, or in multi-time correlations, see eq. (35), and associated susceptibilities, can be used to derive other correlation lengths. It would be interesting to check whether all these behave in the same manner.

3.2. Scaling of the pdf of local two-time functions

The most direct way of testing the mere existence of local fluctuations is to measure the probability distribution function (pdf) of local coarse-grained correlators, $C(\vec r; t, t_w)$, and linear susceptibilities, $\chi(\vec r; t, t_w)$, at different pairs of times, t and tw. In such a measurement one is forced to use finite coarse-graining lengths. ℓ then becomes a parameter that has to be taken into account in the scaling analysis of the results.

In Ref. [9] we showed that, quite generally, the pdf of local coarse-grained correlators can be scaled onto universal curves as long as the global correlation, C(t, tw), is the same, and the ratio of the coarse graining length over the dynamical correlation length, ℓ/ξ(t, tw), is held fixed (see [31] for a similar discussion applied to the super-cooled liquid). Such scaling can be easily understood as follows. At fixed temperature the pdf ρ[Cr; t, tw, ℓ, L] depends on four parameters: two times, t and tw, and two lengths, the coarse-graining length, ℓ, and the size of the system, L.
As in the aging regime C(t, tw) is a monotonic function of the two times and $\xi(t,t_w) \sim t_w^{a} \, g(C)$, one can trade the two times for C and ξ in complete generality. The next step is a scaling assumption: that in the long times limit the pdfs depend on the coarse graining length ℓ, the total size L and the scale ξ only through the ratios ℓ/ξ and ξ/L. This last step, we should stress, is really a scaling assumption, and not a trivial requirement from dimensional analysis. The lengths ℓ, L and ξ are already dimensionless as they are measured in units of the lattice spacing. The end result from the rewriting of the parameters in terms of the global correlation and the scaling hypothesis is that the pdfs characterizing the heterogeneous constant temperature aging of the system can be written as

ρ[Cr; C(t, tw), ℓ/ξ(t, tw), ξ(t, tw)/L] . (27)

Figure 4. pdf of local coarse-grained correlations Cr at different times t and tw in the 3d Edwards-Anderson model with L = 100 at T = 0.6 < Tc. The waiting-times are given in the key and the global correlation is fixed to C = 0.4 < qea. (a) The coarse-graining boxes have linear size ℓ = 9 in all cases. The curves do not collapse; a slow drift with increasing tw is clear in the figure. (b) Variable coarse-graining length ℓ chosen so as to hold ℓ/ξ approximately constant. The collapse improves considerably with respect to panel (a). These results are taken from [9].

In Fig. 4 we show the scaling of the distribution of local coarse-grained correlations in the 3d ea model and the effect of the scaling variable ℓ/ξ (the size of the system, L, is sufficiently large so that ξ/L vanishes in practice).
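A measurement of the pdf of local coarse-grained correlations can be sketched as follows. This is an illustrative example only, under simplifying assumptions: the spin configurations are synthetic with independent flips, the coarse-graining uses non-overlapping boxes (rather than a box centered on every site), and the function name `coarse_grained_correlation` is hypothetical. In an actual test of eq. (27), ℓ would be varied with tw so as to hold ℓ/ξ(t, tw) fixed at constant global correlation.

```python
import numpy as np

def coarse_grained_correlation(s_t, s_tw, ell):
    """Local correlations C_r averaged over non-overlapping boxes of size ell."""
    L = s_t.shape[0]
    if L % ell != 0:
        raise ValueError("ell must divide the linear size L")
    q = (s_t * s_tw).astype(float)              # site overlap s_j(t) s_j(tw)
    n = L // ell
    # Split each axis into (box index, position in box); average inside boxes.
    boxes = q.reshape(n, ell, n, ell, n, ell)
    return boxes.mean(axis=(1, 3, 5)).ravel()

rng = np.random.default_rng(2)
L = 16
s_tw = rng.choice([-1, 1], size=(L, L, L))
s_t = np.where(rng.random((L, L, L)) < 0.3, -s_tw, s_tw)
global_c = np.mean(s_t * s_tw)

pdfs = {}
for ell in (2, 4, 8):
    c_r = coarse_grained_correlation(s_t, s_tw, ell)
    hist, edges = np.histogram(c_r, bins=20, range=(-1, 1), density=True)
    pdfs[ell] = (c_r, hist, edges)
    print(f"ell={ell}: mean={c_r.mean():.3f}, variance={c_r.var():.4f}")
```

With equal-size boxes the mean of the local values reproduces the global correlation, which is the discrete version of the sum rule, while the width of the distribution shrinks as ℓ grows.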
It is noteworthy that a reasonable scaling with the global correlation held fixed and not taking into account the effect of the second scaling variable has been already achieved, approximately, in the Edwards-Anderson model [5, 6], as well as in the kinetically constrained models studied in [7] and Lennard-Jones systems [32]. This is justified by the fact that ξ varies very slowly with tw. However, it is clear that at long though finite times one has to hold the ratio ℓ/ξ constant to obtain a full collapse in all cases.

The scaling variable ℓ/ξ allows one to study the change in the functional form of the pdfs upon modifying the coarse-graining volume. Indeed, the pdfs should crossover from a non-trivial form to a simple Gaussian when ℓ goes through the value ξ. In summary, for finite ξ one identifies three ℓ-dependent regimes with different functional forms of the pdfs:

• ℓ ≪ ξ. For finite ξ this means ℓ of the order of the lattice spacing, ℓ ∼ a. In this case the pdfs do not have any particular structure.
• ℓ ∼ ξ. For finite but large ξ this case is non-trivial and indeed the one that is accessed with numerical simulations and experiments. One finds that the statistics is non-Gaussian for all C. The skewness decreases from zero at C = 1 to reach a minimum and then increases again at small values of C. As regards the functional form, one finds that a Gumbel-like functional form, characterized by a real parameter that depends on C and ℓ/ξ, describes the data rather well for, say, qea/2 ≲ C ≲ qea.
• ℓ ≫ ξ. In this limit one matches the central-limit theorem conditions and the statistics becomes Gaussian for all C.

The analysis of the ℓ-dependent pdfs thus provides an independent way to estimate ξ.

3.3. Effective action for local ages

The argument leading to eq. (27) is a scaling hypothesis and it does not rely on time-reparametrization invariance.
Indeed, there exist models in which the scaling (28) for the pdf of local coarse-grained correlations is found, e.g. the O(N) model in the large N limit [8], and global time-reparametrization does not apply. The implications of time-reparametrization invariance appear later, as a prediction for the functional form of the asymptotic limit

$\rho_\infty(C_r; C) \equiv \lim_{t, t_w \to \infty, \; C(t,t_w) = C} \rho[C_r; C(t,t_w), \ell/\xi(t,t_w), \xi(t,t_w)/L]$ . (28)

In order to study the functional form that ρ∞ can take let us now use the symmetry argument to analyse the statistics of the fluctuations of the local correlations. So far we have not yet determined how much the $h(\vec r, t)$ vary in space and time. To this end we need to derive an effective action for these functions that will tell us how costly it is to deviate from the average clock R(t). Ideally, one would like to derive this action from the microscopic one. This should be possible in quasi mean-field models such as the p-spin model with Kac long (but finite) range interactions [37].

For the moment we have just proposed the simplest action that serves our purposes in the ideal limit in which the zero mode is fully developed, ξ → ∞, and the local quantities are measured in the infinite coarse-graining volume limit with ℓ/ξ → 0 [7]. Otherwise the parameter ℓ should be taken into account. To start with we worked with the more convenient transformed variable $h(\vec r, t) \equiv e^{-\varphi(\vec r, t)}$ that implies

$C_{ag}(\vec r; t, t_w) \approx q_{ea} \, f_c\!\left( \frac{h(\vec r, t)}{h(\vec r, t_w)} \right) = q_{ea} \, f_c\!\left( e^{-\int_{t_w}^{t} dt' \, \partial_{t'} \varphi(\vec r, t')} \right)$ (29)

and we searched for the simplest action that satisfies the constraints due to the symmetries. These are:

i. The action must be invariant under a global time reparametrization t → h(t).
ii. If our interest is in short-ranged problems, the action must be written using local terms. The action can thus contain products evaluated at a single time and point in space of terms such as $\varphi(\vec r, t)$, $\partial_t \varphi(\vec r, t)$, $\nabla \varphi(\vec r, t)$, $\nabla \partial_t \varphi(\vec r, t)$, and similar derivatives.
iii. The scaling form in eq.
(29) is invariant under $\varphi(\vec r, t) \to \varphi(\vec r, t) + \Phi(\vec r)$, with $\Phi(\vec r)$ independent of time. Thus, the action must also have this symmetry.
iv. The action must be positive definite.

These requirements largely restrict the possible actions. The one with the smallest number of spatial derivatives (most relevant terms) is

$S[\varphi] = K \int d^dr \int dt \; \frac{(\nabla \partial_t \varphi(\vec r, t))^2}{\partial_t \varphi(\vec r, t)}$ , (30)

with K a stiffness. A term $M \, \partial_t \varphi(\vec r, t)$ is also allowed by symmetry but since its space-time integral is constant we drop it. The action solely depends on the time derivatives $\partial_t \varphi(\vec r, t)$ and it is simple to check that it satisfies all the four constraints enumerated above (the last requirement follows from the fact that $h(\vec r, t)$ are monotonically increasing functions of time) [7].

Due to the simple form (30) the $\partial_t \varphi(\vec r, t)$ are uncorrelated at any two different times t1 and t2. Thus the expression $\Delta\varphi_{\vec r}|_{t_w}^{t} \equiv \int_{t_w}^{t} dt' \, \partial_{t'} \varphi(\vec r, t')$ entering the exponential in the scaling form in eq. (29) is a sum of uncorrelated random variables in time. One can interpret such an expression as the displacement of a random walker with position dependent velocities. Alternatively, one can think of the space-dependent differences $\Delta\varphi_{\vec r}|_{t_w}^{t}$ as the net space-dependent height (labeled by t) of a stack of spatially fluctuating layers of thickness $dt \, \partial_t \varphi(\vec r, t)$. The action for the fluctuating surfaces of each layer is given by eq. (30).

The statistics of the $\Delta\varphi_{\vec r}|_{t_w}^{t}$ are completely determined as follows. The action (30) transforms into one of a Gaussian surface after the introduction of a ‘proper’ time τ ≡ ln R(t), and the change of variables $\psi^2(\vec r, \tau) = \partial_\tau \varphi(\vec r, \tau)$. Indeed,

$C_{ag}(\vec r; t, t_w) \approx q_{ea} \, f_c\!\left( e^{-\int_{\ln R(t_w)}^{\ln R(t)} d\tau' \, \psi^2(\vec r, \tau')} \right)$ , (31)

$S[\psi] = K \int d^dr \int d\tau' \, [\nabla \psi(\vec r, \tau')]^2$ . (32)

Due to the Gaussian statistics of the ψ, it is simple to show that connected N-point correlations of $\Delta\varphi_{\vec r_1}|_{t_w}^{t}$ satisfy

$\langle \Delta\varphi_{\vec r_1}|_{t_w}^{t} \, \Delta\varphi_{\vec r_2}|_{t_w}^{t} \cdots \Delta\varphi_{\vec r_N}|_{t_w}^{t} \rangle_c = [\tau(t) - \tau(t_w)] \; F(\vec r_1, \vec r_2, \ldots, \vec r_N)$ , (33)

where the function F can be obtained from Wick’s theorem, summing over all graphs that visit all sites (connected) with two lines (because of ψ²) for each vertex i corresponding to a position $\vec r_i$. The reparametrized times appear only in the prefactor τ(t) − τ(tw) = ln[R(t)/R(tw)]. The probabilistic features of the fluctuations of local correlations $C(\vec r; t, t_w)$ depend on times only through R(t)/R(tw), and hence only through the global correlation itself, C(t, tw). In consequence, the action (32) implies the scaling (28). The fact that the time-dependencies of the statistical properties of the two-time local coarse-grained correlations are fully determined by the global correlation is a very welcome property of action (30) since it was not obvious a priori.

Having the forms in eqs. (31) and (32) allows us to make some quantitative predictions about the form of the pdfs. With some algebraic manipulations one shows the following generic features [7]:

• The distribution is non-Gaussian for all C.
• For C ∼ qea the pdf is negatively skewed and once put into normal form it is very close to the distribution of the global equilibrium magnetization in the 2d xy model [38]. It can then be approximately described by a generalized Gumbel form with real parameter.
• In the opposite limit C ∼ 0 the pdf is positively skewed and it does not take any recognizable form.

If one is interested in testing the action further one can simply use eq. (32) to generate, numerically, the $\psi(\vec r, \tau)$, construct the $C(\vec r; t, t_w)$ from these functions, and then compare the pdfs thus obtained to the ones measured, say, in a numerical simulation of a given problem. Note that the scaling function fc also plays a role in the functional form of the pdf of local correlations. The same argument applies to the susceptibilities.

The local coarse-grained correlations are, by construction, sums of correlated random variables (unless ℓ ≫ ξ).
With numerical simulations of the 3d ea model [9] and kinetically facilitated lattice gases [7] we found that the pdfs of correlations coarse-grained over finite lengths ℓ have a functional form that resembles a generalized Gumbel distribution characterized by a continuous parameter that depends on ℓ/ξ and the value of the global correlation, C, when C ∼ qea (see also [38]). This fact is consistent with the discussion above and also with the observation of Bertin and Clusel that Gumbel-like pdfs with real parameter characterize the statistics of sums of random variables with particular correlations between the elements [39]. The fact that we obtain Gumbel-like distributions then means that the correlations between the terms in the sum are of the form needed to get this type of pdf.

In short, the time-reparametrization scenario predicts, in its simplest setting, that eqs. (31) and (32) fully characterise the fluctuation of the local correlations in the large times and coarse-graining volume limits.

3.4. Two-time scaling of local functions

As we argued in Sect. 2.4, the global time-reparametrization invariance suggests that the slow part of the coarse-grained local correlations and susceptibilities should scale as in eq. (22) in the ideal asymptotic limit a ≪ ℓ ≪ ξ. In practice the ideal limit is not reached and one is forced to work with finite correlation lengths and thus finite coarse-graining lengths too. The finite ℓ will then play a role and has to be taken into account. We now present some tests of eq. (22) that are based on the parametric representation of the dynamics explained in Sect. 2 and take into account the finite value of ℓ.

Let us then imagine that we compute three local coarse-grained two-time correlations, $C_{\vec r}$, at three space points $\vec r_1$, $\vec r_2$ and $\vec r_3$, using a given coarse-graining length, a ≪ ℓ, and that we obtain functional forms that are characterized by eq.
(22) with, say, $h(\vec r_1, t) = \ln(t/t_0)$, $h(\vec r_2, t) = t/t_0$, and $h(\vec r_3, t) = e^{\ln^2(t/t_0)}$, in the aging regime. In Fig. 5 we sketch the decay of these correlations for the same tw as a function of time-delay. The plateau is at the same height since qea as well as the full stationary decay are not expected to fluctuate. The external function fc is the same in all curves. It is clear that the decay of the three correlations follows a different pace, the one at $\vec r_3$ is the fastest while the relaxation at $\vec r_2$ is the slowest.

Figure 5. Left: sketch of the decay of the correlation with the same stationary decay to qea (shown with a horizontal dashed line) and three choices of the scaling function $h(\vec r_1, t) = \ln(t/t_0)$, $h(\vec r_2, t) = t/t_0$, and $h(\vec r_3, t) = e^{\ln^a(t/t_0)}$. The waiting-time is the same in all curves. Right: the integrated linear response against the correlation. With a solid line, the parametric plot for fixed and long tw, using t as a parameter that increases from tw at C = 1 to ∞ at C = 0. With symbols, the three pairs (Cj(t, tw), χj(t, tw)) for the same tw, a fixed value of t and hj(t) as in the left panel.

The simplest way to put the proposal (22) to the test is to analyze the implications of eq. (22) on local triangular relations. In Sect. 2 we showed that two-time functions with a separation of time-scales as in eq. (8) and an aging scaling as in eq. (13) are related in a parametric way in which times disappear, see the sketch in Fig. 2-left. Equation (22) implies that the local (fluctuating) two-time functions should verify the same relation

$C_{ag}(\vec r; t_1, t_3) = q_{ea} \, f_c\!\left\{ f_c^{-1}[C_{ag}(\vec r; t_1, t_2)/q_{ea}] \; f_c^{-1}[C_{ag}(\vec r; t_2, t_3)/q_{ea}] \right\}$ . (34)

This is a result of the fact that time-reparametrization invariance restricts the fluctuations to appear only in the local functions $h(\vec r, t)$ while the function fc is locked to be the global one everywhere in the sample.
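The triangular relation (34) can be checked numerically for the three local clocks quoted above. The sketch below is an illustration under explicit assumptions: it takes the power-law form fc(x) = x^(-b), with a hypothetical exponent b = 0.5, a plateau value qea = 0.8 and t0 = 1, and picks three times in the aging regime. None of these numerical choices is a result of the analysis; they only serve to show that the relation holds for any clock h, since h(t1)/h(t3) = [h(t1)/h(t2)] [h(t2)/h(t3)].

```python
import numpy as np

Q_EA = 0.8   # plateau value (illustrative choice)
B = 0.5      # hypothetical exponent in the assumed form f_c(x) = x**(-B)

def f_c(x):
    return x ** (-B)

def f_c_inv(y):
    return y ** (-1.0 / B)

def c_ag(h, t, t_w):
    """Aging correlation q_ea * f_c(h(t)/h(t_w)) for a given local clock h."""
    return Q_EA * f_c(h(t) / h(t_w))

T0 = 1.0
clocks = {
    "r1": lambda t: np.log(t / T0),
    "r2": lambda t: t / T0,
    "r3": lambda t: np.exp(np.log(t / T0) ** 2),
}

t1, t2, t3 = 9e6, 8e5, 5e4    # t1 > t2 > t3, arbitrary illustrative values
results = {}
for site, h in clocks.items():
    c12, c23, c13 = c_ag(h, t1, t2), c_ag(h, t2, t3), c_ag(h, t1, t3)
    # Right-hand side of eq. (34), built only from C12 and C23.
    rhs = Q_EA * f_c(f_c_inv(c12 / Q_EA) * f_c_inv(c23 / Q_EA))
    results[site] = (c12, c23, c13, rhs)
    print(f"{site}: C12={c12:.3g} C23={c23:.3g} C13={c13:.3g} rhs={rhs:.3g}")
```

For this power-law choice the relation reduces to the hyperbolic form C12 = qea C13 / C23 on every site, whatever the local clock, which is the sense in which the local curves are 'parallel' to the global one.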
A pictorial inspection of this relation should take into account the fact that while the stationary decay is not expected to fluctuate, the full aging relaxation and, in particular, the minimal value of the local two-time functions, $C(\vec r; t_1, t_3)$, are indeed fluctuating quantities. The parametric construction on different spatial regions should yield ‘parallel translated’ curves with respect to the global one, as displayed in Fig. 2-left. Fluctuations in the function $f_c$ would yield different functional forms in the curved part of the parametric construction. A more quantitative analysis can be done by using the knowledge of $f_c$ that can be extracted from the global correlation decay. Indeed, if $f_c$ is known, the parametric plot $f_c^{-1}(C_{\vec r 12}/q_{ea}) / f_c^{-1}(C_{\vec r 13}/q_{ea})$ against $f_c^{-1}(C_{\vec r 23}/q_{ea}) / f_c^{-1}(C_{\vec r 13}/q_{ea})$ should yield a master curve identical to the global one with different sites just being advanced or retarded with respect to the global value. This is another way of stating that the sample ages in a heterogeneous manner, with some regions being younger (others older) than the global average. (For simplicity we used a short-hand notation, $C_{\vec r \mu\nu} = C(\vec r; t_\mu, t_\nu)$ with μ, ν = 1, 2, 3.) If the time-reparametrization mode is indeed flat the local values should lie all along this master curve in the aging regime.

Figure 6. The triangular relation in the 3d EA model. Upper left panel: the thick (black) line represents the global $C(t_1, t_2)$ against $C(t_2, t_3)$ using $t_2$ as a parameter varying between $t_3 = 5 \times 10^4$ MCs and $t_1 = 9 \times 10^6$ MCs, $C(t_1, t_3) \sim 0.35$ and $q_{ea} \sim 0.8$. The curved part is well represented by the hyperbolic form $C(t_1, t_2) \sim q_{ea} C(t_1, t_3)/C(t_2, t_3) \sim 0.79 \times 0.35/C(t_2, t_3)$ that corresponds to $f_c(x) \sim x^{-b}$. With different points joined with thin lines we show the triangular relations between the local coarse-grained correlations on five randomly chosen sites on the lattice (ℓ = 30). Upper right panel and lower left and right panels: 2d projection of the joint probability density of $C(r; t_1, t_2)\, C(t_1, t_3)/C(r; t_1, t_3)$ and $C(r; t_2, t_3)\, C(t_1, t_3)/C(r; t_1, t_3)$ at three fixed values of the intermediate time, $t_2 = 1.5 \times 10^5$ MCs, $8 \times 10^5$ MCs and $5 \times 10^6$ MCs, respectively, and ℓ = 10. The global $C(t_1, t_2)$ against $C(t_2, t_3)$ using $t_2$ as a parameter is shown with a thick green line. The green points indicate the location of $C(t_1, t_2)$ and $C(t_2, t_3)$ for the chosen $t_2$’s. Each point in the scatter plot corresponds to a site, r. The lines indicate the boundary surrounding 25%, 50% and 75% of the probability density. The cloud extends mostly along the global relation as predicted by time-reparametrization invariance. These results are taken from [9].

The conclusions drawn above apply in the strict a ≪ ℓ ≪ ξ limit. In simulations and experiments ξ is finite and even rather short. Thus, ℓ is forced to also be a rather small parameter, in which case ‘finite size’ fluctuations in $f_c$ are also expected to exist. The claim is that the latter should scale down to zero faster (in ℓ) than the fluctuations that are related to the zero mode. We have tested these claims in the non-equilibrium dynamics of the 3d EA spin-glass [9]. The results are shown in Fig. 6. In the upper left panel we show the global triangular relation (thick black line) as well as the local one on four chosen sites. The separation of time-scales is clear in the plot. The aging part is rather well described by $f_c(x) \sim x^{-b}$ and the local curves are quite parallel indeed. In the remaining panels in Fig. 6 we show the 2d projection of the joint probability density of the site fluctuations in the local coarse-grained correlations at different chosen times $t_2$: $t_2 = 1.5 \times 10^5$ MCs (upper right), $t_2 = 8 \times 10^5$ MCs (lower left), and $t_2 = 5 \times 10^6$ MCs (lower right).
Taking advantage of the fact that $f_c(x) \sim x^{-b}$ we use a very convenient normalization in which we multiply the horizontal and vertical axes by $[C_{13}/C_{\vec r 13}]^{1/2}$. Global time-reparametrization invariance, expressed in eq. (34), implies that the data points should spread along the global curve indicated with a thick green line in the figure. Some sites could be advanced, others retarded, with respect to the global value (shown with a point on the green curve) but all should lie mainly on the same master curve. This is indeed quite well reproduced by the simulation data in the three cases, $C(t_1, t_2)$ close to $C(t_1, t_3)$ (upper right panel), $C(t_1, t_2)$ close to $q_{ea}$ (lower right panel), and $C(t_1, t_2)$ far from both (lower left panel). Most of the data points tend to follow the master curve though some fall away from it. The reason for this is that eq. (34) should be strictly satisfied only in the very large coarse-graining volume limit (ℓ ≫ a) with ℓ/ξ ≪ 1 while we are here using ℓ = 10a ∼ ξ, see the discussion in Sect. 3.1. The triangular relation can be used to test the fluctuations of the local susceptibilities too. Indeed, if the separation of time scales (8) and the scaling (13) apply to the global susceptibility, the local ones, after the convenient normalization by the maximum value, should follow another master curve, identical to the global one. Note that time-reparametrization invariance as we discuss it here implies that both local correlations and susceptibilities should be fluctuating quantities. Finally, notice that, in contrast to the growing correlation length scale, there is no blatantly obvious explanation of these triangular relations within other theoretical scenarios. These relations are perhaps the most direct consequence of the time reparametrization symmetry arguments, and so this prediction falls within the second class we discussed in the introduction to this section.

3.5. Multi-time scaling

In general multi-time correlations are non-trivially related to two-time ones. One can take as an example a generic coarse-grained connected four-time correlation. If this function is monotonic with respect to all times, and the two-point correlations scale as in (13) for all pairs $(t_\mu, t_\nu)$ with μ, ν = 1, 2, 3, 4, the four-time correlation should behave as

$$C(\vec r; t_1, t_2, t_3, t_4) = g\!\left( \frac{h(\vec r, t_1)}{h(\vec r, t_2)}, \frac{h(\vec r, t_2)}{h(\vec r, t_3)}, \frac{h(\vec r, t_3)}{h(\vec r, t_4)} \right)$$

with the same external function g for all $\vec r$, in the asymptotic limit in which all times are widely separated and the corresponding two-time correlations fall below $q_{ea}$. Parametric constructions could be envisaged to test this relation.

3.6. Local fluctuation-dissipation relation

The asymptotic relation between the global correlation and susceptibility

$$\lim_{t_w \to \infty,\; C(t, t_w) = C} \chi(t, t_w) = \hat\chi(C) \qquad (36)$$

was first obtained in mean-field disordered models [12, 13] and later observed in simulations of many realistic systems (spins and particles in interaction on finite dimensional spaces). $-(d_C \hat\chi(C))^{-1}$ defines an effective temperature [14]. In the aging regime, that is to say for C < $q_{ea}$, three behaviours have been observed in mean-field systems: in structural glass models χ̂(C) is linear in C (solid line in the right panel in Fig. 5); in spin-glass models χ̂(C) is a non-linear function of C; in coarsening systems χ̂(C) is a constant equal to $(1 - q_{ea})/T$. The scaling in eq. (22) implies that the parametric construction ‘local susceptibility against local correlation’ should fall on the master curve for the global quantities but can be advanced or retarded with respect to the global value; again in the theoretical limit a ≪ ℓ ≪ ξ. This behaviour is sketched in Fig. 5-right for the three sites whose correlations are displayed in the left panel. The restricted relation between local susceptibility and local correlation in eq.
(22) arises from the fact that the fluctuations are due to local reparametrizations alone and not to changes in the external functions $f_c$ and $f_\chi$ (much as in transverse vs. longitudinal fluctuations in a non-linear σ-model). An important property of the interpretation of the fluctuation dissipation relation in terms of effective temperatures is that one expects all observables evolving in the same time-scale to equilibrate and hence have the same value of the effective temperature [14] in an asymptotic regime with slow dynamics and small entropy production. Within the time-reparametrization invariance approach the local effective temperature, defined from the slope of the χ̂(C) plot, is automatically the same in the whole sample within a correlation scale, just because the functions $f_c$ and $f_\chi$ do not fluctuate. In Fig. 7 we show the joint pdf of local correlations and susceptibilities of the 3d EA spin-glass in its glassy phase; the accord with the analytic prediction is very satisfactory [5, 6], with the additional spreading away from the master curve ascribed to the fact that ℓ is finite and not very different from ξ.

Figure 7. (a) The joint pdf $\rho(C_r, \chi_r)$ at two times $(t_w, t)$ such that the global correlation is $C(t, t_w) = 0.7 < q_{ea}$ in the 3d EA model. (b) Projection of three contour levels. The crosses are the parametric construction χ̂(C) for several values of the total time t larger than $t_w$. The dotted straight line is the FDT at the working temperature T. These results are taken from [6].

3.7. Infinite susceptibilities

Zero modes are intimately related to infinite susceptibilities. Indeed, systems with continuous symmetries are sensitive to arbitrarily weak perturbations.
In the present context the approximate global time-reparametrization invariance implies that one can easily change the ‘clock’ R(t) characterizing the scaling of the global correlation and linear response by applying infinitely weak perturbations that couple to the zero mode. An illustration of this property is the fact that the aging relaxation dynamics of glassy systems is rendered stationary by a weak perturbing force that does not derive from a potential while the χ̂(C) relation in the slow regime is not modified [40]. One could envisage more refined tests such as applying a perturbation that imposes different scalings on two macroscopic borders of the system and seeing how the time-reparametrization wave develops in the bulk.

3.8. Conclusions

In conclusion, the global time-reparametrization invariance scenario gives a mechanism for the divergence of the correlation length ξ, though others have also been proposed in the literature. There are a number of predictions, such as the parametric relations between local coarse-grained correlations measured at different times and the local fluctuation-dissipation relations, that, to our knowledge, are not explained by other scenarios. As regards the easier-to-measure pdfs of local correlations, the framework is not only consistent with the scaling (28), which also arises from simple scaling assumptions, but it also justifies the non-Gaussian and Gumbel-like functional form of the pdfs that follows from the proposed effective action for local ‘ages’. Moreover, systems with global time-reparametrization invariance should have as important fluctuations in the local susceptibilities as in the local coarse-grained responses.

4. Discussion

We presented a summary of studies of dynamical fluctuations in glassy systems that are based on the idea that, in the long time regime, a global time-reparametrization invariance emerges in the effective action describing the aging dynamics.
We discussed how this symmetry concretely appears in mean-field spin models, and how it can be shown to emerge at the level of the action for short-ranged spin glass models with quenched disorder. Two assumptions are used to prove the global time-reparametrization invariance of the action for the short-range spin glass model: i) causality and unitarity, and ii) a separation of time scales between a fast (or stationary) and a slow (or aging) relaxation, where in the latter time translation invariance is broken. That the dynamical action is symmetric under uniform, i.e., spatially independent, reparametrizations of time variables (t → h(t)) suggests that the dynamic fluctuations that cost little action should be describable in terms of position dependent, long wavelength, reparametrizations of the form t → $h(\vec r, t)$. These should be the Goldstone modes associated with breaking the time-reparametrization invariance symmetry. We presented predictions of this theoretical framework and tests that we performed in the 3d Edwards-Anderson model to falsify these predictions. Among the consequences of our theoretical framework are those listed in Sect. 3. Some of them find an explanation within other theories as well; others are particularly related to the time reparametrization invariance scenario. For example, a correlation length that grows in time is associated with the asymptotic approach to the long-time regime in which the symmetry is fully developed, and the long wavelength modes eventually become massless. The existence of a growing length is also predicted by other models. The functional form of the triangular relations relating local two-time correlations between three different times is, instead, particular to our framework. In this review we showed tests of the predictions of the global time-reparametrization invariance scenario performed on a finite dimensional spin-glass model [4]-[9].
In the past we also studied the dynamics of kinetically facilitated models, without quenched disorder, along the same lines [7]. We believe that, if aging dynamics is a universal property of glassy systems, then these ideas should also apply to interacting particle systems without quenched disorder. The rationale is that the two assumptions leading to the global time-reparametrization invariance of the dynamical action, namely causality and unitarity, and a separation of time scales, should also hold for other glassy systems. The former assumption we can take as a fact. The second is, in a way, the assumption that a glassy phase exists, even though we say nothing as to why it does. As we stated in the introduction, we do not aim at the question why glasses?, but instead we focus on the possible universal dynamical properties once the glassy state is presented by nature. Some of the consequences of the symmetry have already been tested numerically in Lennard-Jones systems [29, 32] but there is still much room for more detailed studies, including the analysis of local triangular relations between correlations and susceptibilities and tests of their joint behaviour. We thus propose that the asymptotic global time-reparametrization invariance, and the associated low action long wavelength local reparametrizations, constitute the mechanism by which dynamic fluctuations, that is to say heterogeneities, are generated in the glassy state. Moreover, this mechanism may also apply, in an approximate form, to the super-cooled liquid regime. It should just be an approximation because in super-cooled liquids the symmetry is not fully developed and there is then a cut-off setting the limit of the spatial and temporal extent of the heterogeneities, in sharp contrast to the low temperature glassy regime in which the symmetry is realized asymptotically and fluctuations of all sizes exist.
The growth and divergence of the (two-time) dynamic correlation length defined from the study of the space correlation of the two-time order parameter is a manifestation of the growth and divergence of these fluctuations in the glassy state; in contrast, such growth is interrupted in the super-cooled liquid. To better understand the distinction between the glassy state with its asymptotic symmetry and the super-cooled state without the fully developed symmetry, consider the phenomenology of dynamic fluctuations as a function of temperature. Dynamic heterogeneities in the super-cooled liquid phase have been identified numerically and experimentally [52]. These are in a number of ways more important than what is observed in a simple liquid or a solid. In the super-cooled liquid phase, while the full relaxation is stationary, there is still a time-scale separation with the correlations decaying as a function of time-difference first rapidly to a temperature-independent plateau and then slowly towards zero. The latter is the structural or α-relaxation. The α-relaxation time, $t_\alpha$, is finite in the super-cooled liquid regime and it increases upon decreasing temperature. The global parametrization chosen by the symmetry breaking terms in the slow regime is $R(t) \propto e^{-t/t_\alpha}$ and $C(t, t_w) = q_{ea} f_c(e^{-(t - t_w)/t_\alpha})$ in this case. The mode-coupling approach to super-cooled liquids is based on approximate dynamic equations for the relevant correlators of realistic systems that are very similar to the p-spin Schwinger-Dyson equations in the high temperature phase [16, 18]. In these equations the correlators are already expressed as functions of the time-difference, τ ≡ t − $t_w$. Close to the critical temperature the separation of time-scales develops in the mode-coupling equations. The approximate analysis of the α relaxation predicted by these equations also relies on dropping the τ derivatives and approximating the integrals by assuming a sharp time-scale separation.
The remaining asymptotic (large τ) equations are invariant under reparametrizations of τ. We then expect the local coarse-grained correlations and integrated linear responses in the super-cooled liquid to be, to a first approximation, stationary (after a sufficiently long waiting-time that goes beyond the equilibration time) but with different finite structural relaxation times, fluctuating about the value that characterizes the decay of the global correlations. This is consistent with the experimental observation that dynamic heterogeneities in supercooled liquids seem to have a lifetime of the order of the relaxation time. Deviations from stationarity are not completely excluded for finite ℓ but they should be less important than in the aging low-temperature regime. There is, however, an important difference with respect to the aging regime, in which the equilibration time diverges and local relaxation times or, better stated, local ages can fluctuate without limit when $t_w$ → ∞. At high temperatures one does not expect to find heterogeneities with arbitrarily long relaxation time. Furthermore, heterogeneities have a finite spatial extent and one can then suppress them by using sufficiently large coarse-graining volumes. The correlation length is stationary, ξ(τ), and it saturates in the limit of long-time differences, τ → ∞. The saturation value, though, increases for decreasing temperature. From a theoretical point of view, this picture is, in a sense, similar to the one that describes the equilibrium paramagnetic phase in the O(N) model, just above the ordering transition temperature. When lowering the temperature the size and life-time of the heterogeneities increase. A p-spin or mode-coupling approach predicts that their typical size and thus $\lim_{\tau \to \infty} \xi(\tau)$ diverge at the mode coupling transition temperature [33]. In real systems the divergence at $T_c$ is rendered smooth and ξ does not strictly diverge asymptotically.
At still lower temperatures the bulk quantities age and we expect then to observe heterogeneous aging dynamics of the kind described in this review, with a two-time dependent correlation length for the local fluctuations. The heterogeneities age as well, in a ‘dynamic’ way. By this we mean that if a region looks older than another one when observed on a given time-window, it can reverse its status and look younger than the same other region when observed on a different time-window. The infinite susceptibility with respect to perturbations that couple to the zero mode is illustrated by the fact that the clock of the bulk quantities that is selected dynamically is very easy to modify with external perturbations. A small force that does not derive from a potential and is applied on every spin in the model renders an aging p-spin model stationary [40] while the model maintains a separation of time-scales in which the fast scale follows the temperature of the bath, T, while the slow scale is controlled by an effective temperature, $T_{eff} > T$. In this case, the aging system selects a time-reparametrization R(t) = t while in the perturbed model $R(t) = e^{-t/t_\alpha}$. Similarly, the aging of a Lennard-Jones mixture is stopped by a homogeneous shear [41]. A different way to modify the time-reparametrization that characterizes the decay of the correlations is by using complex thermal baths [42]. The picture that we have described applies to long times but not so long as to enter the activated regime that we still do not know how to characterize theoretically, not even at the bulk level. This regime corresponds to times that scale with the system size.
The success of mean-field models, or the mode-coupling approach, in describing the bulk dynamics of glassy systems, at least not too close to the crossover glass temperature and at a qualitative level, allows us to claim that these extremely long time scales are unrealistic if not too close to the glass transition $T_g$ even as far as dynamic fluctuations are concerned. The ideas discussed in this paper should not only apply to systems that relax in a non-equilibrium manner as glasses but also to systems with slow dynamics and a separation of time-scales that are kept out of equilibrium with a (weak) external forcing. Recently, there has been much interest in the appearance of shear localization, in the form of shear bands, in the rheology of complex fluids. Along the lines here described it would be very interesting to analyze the fluctuations in the local reparametrizations in the fluidized shear band and the ‘jammed’ glassy band. The analytic treatment of mean-field quantum glassy systems follows similar steps to the ones presented here. The Schwinger-Keldysh approach replaces the Martin-Siggia-Rose one but these formalisms are very similar indeed. The analytic solution to the dynamic equations in the limit in which the coupling to the environment is weak also uses the fact that the dynamics in the aging regime is very slow. The approximate equations then become time-reparametrization invariant. One can then expect that similar dynamic fluctuations arise in glassy problems in which quantum fluctuations are important. Moreover the proof of global time-reparametrization invariance for spin-glasses has been presented directly in the quantum formalism. Novel experimental techniques may be apt to study dynamic heterogeneities in glasses when quantum fluctuations are important. We expect to find similar fluctuations using finite size systems and examining the behaviour of the mesoscopic run-to-run fluctuations of the global correlations.
These may be easier to measure experimentally using mesoscopic systems. Importantly enough, global time reparametrization invariance does not develop in all models with slow and aging correlation functions. The O(N) ferromagnetic model in the large N limit is a case in which global time-reparametrization invariance is reduced to just scale invariance [8]. Last, but not least, the approach based on reparametrization invariance suggests that it may be possible to search for universality in glassiness. A Ginzburg-Landau theory for phase transitions captures universal properties that are independent of the details of the material. It is symmetry that defines the universality classes. For example, one requires rotational invariance of the Ginzburg-Landau action when describing ferromagnets. Time reparametrization invariance may be the underlying symmetry that must be satisfied by the Ginzburg-Landau action of all glasses. What would determine if a system is glassy or not? We are tempted to say that the answer is whether or not the symmetry is generated at long times. Knowing how to describe the universal behavior may tell us all the common properties of glasses, but surely it will not allow us to make non-universal predictions, such as what is the glass transition temperature for a certain material, or whether the material displays glassy behavior at all. This quest for universality is a very interesting theoretical scenario that needs to be confronted. We have tried to state as clearly as possible the implications of our proposal. The full project is not yet complete since several questions about its limitations remain open. A number of issues that should be addressed are:

(i) From a phenomenological point of view, to perform strong tests of the global time-reparametrization invariance scenario in molecular dynamics simulations [28, 44, 45] of realistic glassy systems and experiments [46]-[51]. More precisely, the triangular relations between local coarse-grained correlations and the local fluctuation dissipation relations should be analysed to give support to or else falsify this conjecture.

(ii) From an analytic point of view, to derive the effective action for local reparametrizations for glassy models with and without quenched disorder. We are currently working on this project in collaboration with S. Franz. One idea is to study the p-spin disordered model with Kac long-range interaction. Another one is to study the symmetries of the dynamic action associated with the Dean-Kawasaki equation for the evolution of the density of a system of particles in interaction.

(iv) From a mixed analytic and numerical point of view, to analyse fluctuations in models with global aging dynamics of different type. To this end, one can study dynamic fluctuations in simple coarsening systems in finite dimensions. In these cases the morphology of domains can be characterized in great detail [43]. We could, in principle, understand the fluctuations in the local correlations and linear responses from a microscopic point of view. Whether these are similar or different to the ones in other glassy problems is still to be established and the outcome of this study could clarify the relevance of the value of the effective temperature in determining the characteristics of the dynamic fluctuations.

(v) Along the same lines as the above, the analysis of fluctuations of elastic lines in the presence of impurities could help us understand the coarsening phenomenon but also the role of diffusion that is superimposed in these cases on the aging dynamics [15].

These are just a few questions posed by this proposal that are still not answered.

Acknowledgments

We thank our collaborators J. J. Arenzon, C. Aron, A. J. Bray, S. Bustingorry, H. E. Castillo, P. Charbonneau, D. Domı́nguez, J. L. Iguain, L. D. C. Jaubert, M. P. Kennett, M. Picco, D. R. Reichman, M. Sellitto, A. Sicilia and H. Yoshino.
We also wish to especially thank L. Berthier, G. Biroli, J-P Bouchaud, D. S. Dean, G. Fabricius, T. Grigera, E. Fradkin, S. Franz, J. Kurchan, H. Makse, D. Stariolo and L. Valluzzi for very helpful discussions. We acknowledge financial support from NSF-CNRS INT-0128922, NSF DMR-0305482, DMR 0403997 and PICS 3172. LFC is a member of Institut Universitaire de France. LFC thanks the Newton Institute at the University of Cambridge, ICTP at Trieste, and Universidad Nacional de Mar del Plata, Argentina, CC the LPTHE at Jussieu, Paris, France, and LFC and CC the Aspen Center for Physics for hospitality where part of this work has been carried out.

[1] M. D. Ediger, C. A. Angell, and S. R. Nagel, J. Phys. Chem. 100, 13200 (1996). Glassy Materials and disordered solids, K. Binder and W. Kob (World Scientific, 2005). [2] P. W. Anderson, Concepts in solids (World Scientific, 1997). [3] Several reviews and books summarize the aging properties of different types of glasses. Aging in polymer glasses is described in L. C. D. Struik, Physical aging in amorphous polymers and other materials (Elsevier, Houston, 1978). Aging in spin-glasses is reviewed in E. Vincent et al, Slow dynamics and aging in spin-glasses, cond-mat/9607224. Aging in soft glassy materials is summarized in L. Cipelletti and L. Ramos, J. Phys. C 17, R253 (2005), or Viasnoff and Lequeux. Aging in orientational glasses in F. Alberici-Kious, J-P Bouchaud, L. F. Cugliandolo, P. Doussineau and A. Levelut, Phys. Rev. B 62, 14766 (2000). [4] C. Chamon, M. P. Kennett, H. E. Castillo, and L. F. Cugliandolo, Phys. Rev. Lett. 89, 217201 (2002). [5] H. E. Castillo, C. Chamon, L. F. Cugliandolo, and M. P. Kennett, Phys. Rev. Lett. 88, 237201 (2002). [6] H. E. Castillo, C. Chamon, L. F. Cugliandolo, J. L. Iguain, and M. P. Kennett, Phys. Rev. B 68, 134442 (2003). [7] C. Chamon, P. Charbonneau, L. F. Cugliandolo, D. R. Reichman, and M. Sellitto, J. Chem. Phys. 121, 10120 (2004). [8] C. Chamon, L. F.
Cugliandolo, H. Yoshino, J. Stat. Mech (2006) P01006. [9] L. D. C. Jaubert, C. Chamon, L. F. Cugliandolo, and M. Picco, cond-mat/0701116, to appear in JSTAT. [10] H. Sompolinsky, Phys. Rev. Lett. 47, 935 (1981). [11] S. L. Ginzburg, Zh. Eksp. Teor. Fiz. 90, 754 (1986) [Sov. Phys. JETP 63, 439 (1986)]. L. B. Ioffe, Phys. Rev. B 38, 5181 (1988). [12] L. F. Cugliandolo and J. Kurchan, Phys. Rev. Lett. 71, 173 (1993). [13] L. F. Cugliandolo and J. Kurchan, J. Phys. A 27, 5749 (1994). [14] L. F. Cugliandolo, J. Kurchan, and L. Peliti, Phys. Rev. E 55, 3898 (1997). [15] S. Bustingorry, J. L. Iguain, C. Chamon, L. F. Cugliandolo, and D. Domı́nguez, Europhys. Lett. 76, 856 (2006). [16] T. R. Kirkpatrick and D. Thirumalai, Phys. Rev. Lett. 58, 2091 (1987); Phys. Rev. B 36, 5388 (1987). T. R. Kirkpatrick and P. Wolynes, Phys. Rev. B 36, 8552 (1987). [17] C. Chamon and M. P. Kennett, Phys. Rev. Lett. 86, 1622 (2001). [18] J-P Bouchaud, L. F. Cugliandolo, J. Kurchan, and M. Mézard, Physica A 226, 243 (1996). [19] L. F. Cugliandolo, Lecture notes in Slow Relaxation and non equilibrium dynamics in condensed matter, Les Houches Session 77 July 2002, J-L Barrat, J Dalibard, J Kurchan, M V Feigel’man eds. (Springer-Verlag, 2003); cond-mat/0210312. [20] W. Götze and L. Sjögren, Rep. Prog. Phys. 55, 241 (1992). W. Götze, Condensed Matter Physics 1, 873 (1998). [21] R. A. Fisher, Ann. Eugenetics, 7, 355 (1937). [22] V. Viasnoff and F. Lequeux, Phys. Rev. Lett. 89, 065701 (2002). [23] D. S. Dean, J. Phys. A 29, L613 (1996). K. Kawazaki and T. Koga, Physica A 201, 115 (1993). [24] L. F. Cugliandolo and D. S. Dean, J. Phys. A 28, 4213 (1996). [25] L. F. Cugliandolo and G. S. Lozano, Phys. Rev. Lett. 80, 4979 (1998). [26] L. F. Cugliandolo, D. R. Grempel, G. L. Lozano, H. Lozza, and C. A. da Silva Santos, Phys. Rev. B 66, 014444 (2002). [27] C. De Dominicis and E. Brézin, Eur. Phys. J. 
B 30, 71 (2002). [28] G. Parisi, J. Phys. Chem. B 103, 4128 (1999). [29] A. Parsaeian and H. E. Castillo, cond-mat/0610789. [30] See e.g. N. Lačević, F. W. Starr, T. B. Schrøder, and S. C. Glotzer, J. Chem. Phys. 119, 7372 (2003) and references therein. [31] L. Berthier, Phys. Rev. E 69, 020201(R) (2004). [32] H. E. Castillo and A. Parsaeian, cond-mat/0610857. [33] S. Franz and G. Parisi, J. Phys. I 5, 1401 (1995); Phys. Rev. Lett. 79, 2486 (1997). C. Donati, S. Franz, G. Parisi, and S. C. Glotzer, Phil. Mag. B 79, 1827 (1999). G. Biroli and J-P Bouchaud, Europhys. Lett. 67, 21 (2004). G. Biroli, J-P Bouchaud, K. Miyazaki, and D. R. Reichman, cond-mat/0605733. [34] T. R. Kirkpatrick and P. Wolynes, Phys. Rev. B 36, 8552 (1987). T. R. Kirkpatrick, D. Thirumalai and P. G. Wolynes, Phys. Rev. A 40, 1045 (1989). P. G. Wolynes, Jour. Res. NIST 102, 187 (1997). J-P Bouchaud and G. Biroli, J. Chem. Phys. 121, 7347 (2004). [35] J. P. Garrahan and D. Chandler, Proc. Natl. Acad. Sci. USA 100, 9710 (2003). S. Whitelam, L. Berthier, and J. P. Garrahan, Phys. Rev. Lett. 92, 185705 (2004). A. C. Pan, J. P. Garrahan, and D. Chandler, Phys. Rev. E 72, 041106 (2005). R. L. Jack, L. Berthier, and J. P. Garrahan, Phys. Rev. E 72, 016103 (2005). R. L. Jack and J. P. Garrahan, J. Chem. Phys. 123, 164508 (2005). [36] G. Tarjus, S. A. Kivelson, Z. Nussinov, and P. Viot, The frustration-based approach of supercooled liquids and the glass transition: a review and critical assessment, cond-mat/0509127 and references therein. [37] C. Chamon, L. F. Cugliandolo and S. Franz, in preparation. [38] S. Bramwell, P. C. W. Holdsworth and J-F Pinton, Nature 396, 552 (1998). [39] E. Bertin and M. Clusel, J. Phys. A 39, 7607 (2006). [40] L. F. Cugliandolo, J. Kurchan, P. Le Doussal, and L. Peliti, Phys. Rev. Lett. 78, 350 (1997). L. Berthier, J.-L. Barrat, and J. Kurchan, Phys.
Rev. E 61, 5464 (2000). [41] L. Berthier and J.-L. Barrat, Phys. Rev. Lett. 89, 095702 (2002); J. Chem. Phys. 116, 6228 (2002). [42] L. F. Cugliandolo and J. Kurchan, Physica A 263 242 (1999). [43] J. J. Arenzon, A. J. Bray, L. F. Cugliandolo, and A. Sicilia, cond-mat/0608270, to appear in Phys. Rev. Lett. [44] L. Valluzzi et al., in preparation. [45] K. Vollmayr-Lee, W. Kob, K. Binder, and A. Zippelius, J. Chem. Phys. 116, 5158 (2002). [46] L. Buisson, L. Bellon, and S. Ciliberto, cond-mat/0210490, to appear in Proceedings of “III Workshop on Non-Equilibrium Phenomena” (Pisa 2002). [47] W. K. Kegel and A. V. Blaaderen, Science 287, 290 (2000). [48] K. S. Sinnathamby, H. Oukris, N. E. Israeloff, Phys. Rev. Lett. 95, 67205 (2005). Crider and N. E. Israeloff, Nano Letters 6, 887 (2006). [49] L. Cipelletti, H. Bissig, V. Trappe, P. Ballestat, and S. Mazoyer, J. Phys.: Condens. Matter 15, S257 (2003); A. Duri and L. Cipeletti, cond-mat/0606051 [50] R. E. Courtland and E. R. Weeks, J. Phys.: Condens. Matter 15, S359 (2003). G. C. Cianci, R. E. Courtland, E. R. Weeks, cond-mat/0512698. E. R. Weeks, J. C. Crocker, D. A. Weitz, cond-mat/0610195. [51] P. Wang, C. Song, and H. A. Makse, Nature Physics 2, 526 (2006). [52] H. Sillescu, J. Non-Crystal. Solids 243, 81 (1999); M. D. Ediger, Annu. Rev. Phys. Chem. 51, 99 (2000). http://arxiv.org/abs/cond-mat/0610789 http://arxiv.org/abs/cond-mat/0610857 http://arxiv.org/abs/cond-mat/0605733 http://arxiv.org/abs/cond-mat/0509127 http://arxiv.org/abs/cond-mat/0608270 http://arxiv.org/abs/cond-mat/0210490 http://arxiv.org/abs/cond-mat/0606051 http://arxiv.org/abs/cond-mat/0512698 http://arxiv.org/abs/cond-mat/0610195 Why glasses? vs. 
ABSTRACT

We summarize a theoretical framework based on global time-reparametrization invariance that explains the origin of dynamic fluctuations in glassy systems. We introduce the main ideas without going into much technical detail. We describe a number of consequences arising from this scenario that can be tested numerically and experimentally, distinguishing those that can also be explained by other mechanisms from the ones that, we believe, are special to our proposal. We support our claims by presenting some numerical checks performed on the 3d Edwards-Anderson spin-glass. Finally, we discuss to what extent these ideas apply to super-cooled liquids, which have been studied in much more detail to date.

<|endoftext|><|startoftext|> A generalization of Chebyshev polynomials and non rooted posets

Masaya Tomie tomie@math.tsukuba.ac.jp

In this paper we give a generalization of Chebyshev polynomials and use it to describe the Möbius function of the generalized subword order derived from the poset $\{a_1, \dots, a_s, c \mid a_i < c \text{ for } i = 1, \dots, s\}$, which gives an affirmative answer to the conjecture of Björner, Sagan and Vatter (cf. [5], [10]).

1 INTRODUCTION

Björner was the first to determine the Möbius functions of factor orders and subword orders. To determine the Möbius functions, he used involutions, shellability, and generating functions.
[2][3][4] Björner and Stanley found an interesting relation among the subword order derived from a two-point set $\{a, b\}$, symmetric groups and composition orders [6]. Factor orders, subword orders, and generalized subword orders were studied in the context of Möbius functions derived from word orders. In [10] Sagan and Vatter described the Möbius function of the generalized subword order derived from the positive integers in two ways, namely by a sign-reversing involution and by discrete Morse theory. More generally, they gave a combinatorial description of the Möbius functions derived from rooted forests. In [5][10] they also stated a very interesting conjecture relating a non-rooted forest, the poset $P_2$ of Notation 1, to Chebyshev polynomials.

Conjecture 1 ([5][10]) Put $P := \{a, b, c \mid a < c,\ b < c\}$, and consider the poset $P^*$ consisting of finite words over $P$ with its generalized subword order. Let $\mu$ be the Möbius function of $P^*$. Suppose $0 \le i \le j$. Then $\mu(a^i, c^j)$ is the coefficient of $X^{j-i}$ in $T_{i+j}(X)$.

Here $\{T_n(X) \mid n \in \mathbb{N}\}$ denotes the Chebyshev polynomials of the first kind. The sequence $\{T_n(X) \mid n \in \mathbb{N}\}$ is a system of orthogonal polynomials and induces a special case of hypergeometric functions as a generalization of a binomial series. These polynomials are also an example of best approximation polynomials. Chebyshev polynomials appear not only in analysis but also in combinatorics: in permutation pattern avoidance [7], and in the Chebyshev posets and Chebyshev transformations defined by Hetyei, which are related to cd-indices, f-vectors and h-vectors respectively. In this paper we give a natural generalization of Chebyshev polynomials in the following way.

Definition 1 (generalized Chebyshev polynomials) We define the polynomial $T^s_k(X)$ for $s, k \in \mathbb{N}$ as follows:
(1) $T^s_0(X) = 1$, $T^s_1(X) = (s-1)X$,
(2) $T^s_{k+2}(X) + T^s_k(X) = sX \cdot T^s_{k+1}(X)$.
The $T^2_n(X)$ are the Chebyshev polynomials of the first kind. Notice that $\deg(T^s_k(X)) = k$.
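As a quick sanity check of Definition 1, the recurrence can be evaluated directly. The following is a minimal sketch, not part of the original paper; the function name `generalized_chebyshev` and the coefficient-list representation (lowest degree first) are assumptions made for illustration.

```python
def generalized_chebyshev(s, k):
    """Return the coefficients of T^s_k(X), lowest degree first.

    Uses the initial values T^s_0 = 1, T^s_1 = (s-1) X and the
    recurrence T^s_{j+2} = s X T^s_{j+1} - T^s_j from Definition 1.
    """
    prev, curr = [1], [0, s - 1]
    if k == 0:
        return prev
    for _ in range(k - 1):
        # Multiplying by s X shifts the coefficients up by one degree.
        nxt = [0] + [s * c for c in curr]
        # Subtract the polynomial two steps back.
        for i, c in enumerate(prev):
            nxt[i] -= c
        prev, curr = curr, nxt
    return curr


# s = 2 recovers the Chebyshev polynomials of the first kind.
print(generalized_chebyshev(2, 2))  # [-1, 0, 2], i.e. 2X^2 - 1
print(generalized_chebyshev(2, 3))  # [0, -3, 0, 4], i.e. 4X^3 - 3X
print(generalized_chebyshev(3, 3))  # [0, -5, 0, 18], i.e. 18X^3 - 5X
```

For $s = 2$ this gives $T^2_2(X) = 2X^2 - 1$ and $T^2_3(X) = 4X^3 - 3X$, the familiar Chebyshev polynomials of the first kind. Read through Conjecture 1 with $i = j = 1$, the constant coefficient $-1$ of $T_2(X)$ would be the value $\mu(a, c)$, which agrees with the fact that the interval $[a, c]$ in $P^*$ consists of $a$ and $c$ only.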
Then, using generalized Chebyshev polynomials, we generalize the conjecture as follows.

Theorem 1 Let $P_s$ be the poset of Notation 1 and let $\mu$ be the Möbius function of $P_s^*$. Then for $0 \le m \le n$, $\mu(a_1^m, c^n)$ is the coefficient of $X^{n-m}$ in $T^s_{m+n}(X)$.

2 PRELIMINARIES

In this section, we give some basic definitions and notation used in this paper. For the basic definitions of posets and Möbius functions, see [12], and for the definitions of subword orders and generalized subword orders, see [2] [3] [4] [5] [10]. First we recall the notion of a path in a poset $P$ [12].

Definition 2 ([12]) Let $P$ be a poset. A sequence $(\theta_1, \dots, \theta_r)$ with $\theta_i \in P$ and $\theta_1 < \dots < \theta_r$ is called a path of length $r - 1$. We denote it by $(\theta_1 \to \dots \to \theta_r)_p$.

Proposition 1 ([12]) Let $P$ be a locally finite poset, let $\mu$ be the Möbius function of $P$, and let $C_k$ be the number of paths of $P$ of length $k$ from $u$ to $v$. Then we have $\mu(u, v) = \sum_k (-1)^k C_k$ for all $u \le v \in P$.

Definition 3 ([10]) Let $P^*$ be the poset with the subword order derived from a poset $P$. We take $p_1 \cdots p_k$ and $q_1 \cdots q_l$ from $P^*$. If $p_1 \cdots p_k \le q_1 \cdots q_l$ in the subword order, we call $S(j_1, \dots, j_k)$ $(j_1 < \dots < j_k)$ an embedding of $p_1 \cdots p_k$ into $q_1 \cdots q_l$ if $p_i \le q_{j_i}$ for $1 \le i \le k$. An embedding $S(j_1, \dots, j_k)$ is called the right most embedding of $p_1 \cdots p_k$ into $q_1 \cdots q_l$ if for any embedding $S'(j'_1, \dots, j'_k)$ we have $j'_i \le j_i$ for all $1 \le i \le k$.

Notation 1 In this paper we fix a poset $P_s$ for $s \in \mathbb{N}$ as follows: $P_s := \{a_1, \dots, a_s, c \mid a_i < c \text{ for } i = 1, \dots, s\}$.

Definition 4 We define the following. Let $P_s^*$ be the poset with the generalized subword order derived from the poset $P_s$ of Notation 1 and let $X$ be a set of paths of $P_s^*$. Put $\mathrm{Mob}(X) := \sum_{k \ge 1} (-1)^k C_k$, where $C_k$ is the number of paths in $X$ whose length is $k$.
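Also we define

$\{a^k c^l\} := \{p_1 \cdots p_{k+l} \mid \sharp(\{p_1, \dots, p_{k+l}\} \cap \{a_1, \dots, a_s\}) = k,\ \sharp(\{p_1, \dots, p_{k+l}\} \cap \{c\}) = l\}$ for $k, l \in \mathbb{N} \cup \{0\}$,

$\langle p_1 \cdots p_k, \{a^l c^m\} \rangle := \{q_1 \cdots q_{l+m} \in \{a^l c^m\} \mid p_1 \cdots p_k \le q_1 \cdots q_{l+m}\}$ for $k, l \in \mathbb{N} \cup \{0\}$, and

$\mathrm{Pat}\{p_1 \cdots p_k, q_1 \cdots q_l\} := \{(p_1 \cdots p_k \to \theta_1 \to \cdots \to \theta_r \to q_1 \cdots q_l)_p \mid p_1 \cdots p_k < \theta_1 < \cdots < \theta_r < q_1 \cdots q_l,\ |\theta_i| = l\}$

respectively. Here $|\theta|$ is the number of letters of $\theta$, and in $\{a^k c^l\}$ the letters are counted with multiplicity, so that $\{a^k c^l\}$ is the set of words with $k$ letters from $\{a_1, \dots, a_s\}$ and $l$ letters $c$.

Definition 5 (generalized Chebyshev polynomial) We define the polynomial $T^s_k(X)$ for $s, k \in \mathbb{N}$ as follows:
(1) $T^s_0(X) = 1$, $T^s_1(X) = (s-1)X$,
(2) $T^s_{k+2}(X) + T^s_k(X) = sX \cdot T^s_{k+1}(X)$.
The $\{T^2_n(X) \mid n \in \mathbb{N}\}$ are the Chebyshev polynomials of the first kind. Notice that $\deg(T^s_k(X)) = k$. Here we give a simple expression for the generalized Chebyshev polynomials.

Proposition 2 For $s, n \in \mathbb{N}$, we have

$$T^s_n(X) = \sum_{\substack{m \le n \\ n-m\ \mathrm{even}}} (-1)^{(n-m)/2} \left\{ \binom{(n+m)/2}{(n-m)/2} s^m - \binom{(n+m)/2 - 1}{(n-m)/2} s^{m-1} \right\} X^m .$$

PROOF We show the above formula by induction. It is easy to see $T^s_0(X) = 1$, $T^s_1(X) = (s-1)X$. Now we have

$$
\begin{aligned}
T^s_n + T^s_{n+2}
&= \sum_{\substack{m \le n \\ n-m\ \mathrm{even}}} (-1)^{(n-m)/2} \left\{ \binom{(n+m)/2}{(n-m)/2} s^m - \binom{(n+m)/2 - 1}{(n-m)/2} s^{m-1} \right\} X^m \\
&\quad + \sum_{\substack{m \le n+2 \\ n+2-m\ \mathrm{even}}} (-1)^{(n+2-m)/2} \left\{ \binom{(n+2+m)/2}{(n+2-m)/2} s^m - \binom{(n+2+m)/2 - 1}{(n+2-m)/2} s^{m-1} \right\} X^m \\
&= \left\{ \binom{n+2}{0} s^{n+2} - \binom{n+1}{0} s^{n+1} \right\} X^{n+2} \\
&\quad + \sum_{\substack{m \le n \\ n-m\ \mathrm{even}}} (-1)^{(n-m)/2 + 1} \left\{ \binom{(n+m)/2}{(n-m)/2 + 1} s^m - \binom{(n+m)/2 - 1}{(n-m)/2 + 1} s^{m-1} \right\} X^m \\
&= s^{n+1}(s-1) X^{n+2} \\
&\quad + \sum_{\substack{m \le n-1 \\ n-1-m\ \mathrm{even}}} (-1)^{(n-m-1)/2 + 1} \left\{ \binom{(n+m+1)/2}{(n-m-1)/2 + 1} s^{m+1} - \binom{(n+m+1)/2 - 1}{(n-m-1)/2 + 1} s^{m} \right\} X^{m+1} \\
&= sX \left[ s^n (s-1) X^{n+1} + \sum_{\substack{m \le n-1 \\ n-1-m\ \mathrm{even}}} (-1)^{(n+1-m)/2} \left\{ \binom{(n+1+m)/2}{(n+1-m)/2} s^m - \binom{(n+1+m)/2 - 1}{(n+1-m)/2} s^{m-1} \right\} X^m \right] \\
&= sX \cdot T^s_{n+1} .
\end{aligned}
$$

Hence we obtain the desired result. ✷

3 MAIN RESULTS

In this section, we give a proof of Theorem 1.

Lemma 1 Let $P$ be a finite poset and take an element $x \in P$. We put: $P_{\le x} := \{y \mid y \le x\}$, $\hat{P}_{\le x} := \{(\theta_1 \to \cdots \to \theta_{r-1} \to x)_p \mid \theta_i \in P\}$, $P_{\ge x} := \{y \mid y \ge x\}$, $\hat{P}_{\ge x} := \{(x \to \theta_1 \to \cdots \to \theta_{r-1})_p \mid \theta_i \in P\}$ and $P_x := \{(\cdots \to \tau_r \to x \to \sigma_1 \to \cdots) \mid \tau_i \le x,\ \sigma_i \ge x\}$.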
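Note that the path $(x)$ belongs to $\hat{P}_{\le x}$, $\hat{P}_{\ge x}$ and $P_x$. Then we have $\mathrm{Mob}\,P_x = \mathrm{Mob}\,\hat{P}_{\le x} \cdot \mathrm{Mob}\,\hat{P}_{\ge x}$.

PROOF Notice that a path which passes through $x$ splits into two paths, one ending at $x$ and the other starting from $x$. From this we obtain the desired result. ✷

Lemma 2 For $m, n, p, q \in \mathbb{N} \cup \{0\}$ such that $0 \le m \le n$, $0 \le p \le m$, $0 \le q \le n$, we take $p_1 \cdots p_m,\ \tilde{p}_1 \cdots \tilde{p}_m \in \{a^{m-p} c^p\}$. Then we have $\sharp \langle p_1 \cdots p_m, \{a^{n-q} c^q\} \rangle = \sharp \langle \tilde{p}_1 \cdots \tilde{p}_m, \{a^{n-q} c^q\} \rangle$.

PROOF

Claim 1 We have
$$\sharp \langle p_1 \cdots \overbrace{a_x}^{i\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle = \sharp \langle p_1 \cdots \overbrace{a_y}^{i\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle .$$

(Proof of Claim 1) Take any $q_1 \cdots q_n \in \langle p_1 \cdots \overbrace{a_x}^{i\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle$ and consider the right most embedding into $q_1 \cdots q_n$. Notice that the right most embedding is unique. Let $S(j_1, j_2, \dots, j_m)$ be the right most embedding of $p_1 \cdots \overbrace{a_x}^{i\text{-th}} \cdots p_m$ into $q_1 \cdots q_n$. Now we define the map $\Phi$ as follows.

$$\Phi : \langle p_1 \cdots \overbrace{a_x}^{i\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle \longrightarrow \langle p_1 \cdots \overbrace{a_y}^{i\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle ,$$
$$\Phi(q_1 q_2 \cdots \overbrace{a_x}^{j_i\text{-th}} a_{x_1} \cdots a_{x_t} \overbrace{p_{i+1}}^{j_{i+1}\text{-th}} \cdots q_n) = q_1 q_2 \cdots \overbrace{a_y}^{j_i\text{-th}} a_{y_1} \cdots a_{y_t} \overbrace{p_{i+1}}^{j_{i+1}\text{-th}} \cdots q_n .$$

Here we put $a_{y_k} = a_{x_k + y - x}$, with indices read cyclically, $a_{k+s} = a_k$. It is easy to see that the right most embedding of $p_1 \cdots \overbrace{a_y}^{i\text{-th}} \cdots p_m$ into $\Phi(q_1 \cdots q_n)$ is $S(j_1, j_2, \dots, j_m)$. By the construction of $\Phi$, we can easily define the inverse map of $\Phi$. Hence the claim is proved.

Claim 2 We have
$$\sharp \langle p_1 \cdots \overbrace{a_x}^{i\text{-th}}\, \overbrace{c}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle = \sharp \langle p_1 \cdots \overbrace{c}^{i\text{-th}}\, \overbrace{a_x}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle .$$

(Proof of Claim 2) Take any $q_1 \cdots q_n \in \langle p_1 \cdots \overbrace{a_x}^{i\text{-th}}\, \overbrace{c}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle$ and let $S(j_1, j_2, \dots, j_m)$ be the right most embedding of $p_1 \cdots \overbrace{a_x}^{i\text{-th}}\, \overbrace{c}^{(i+1)\text{-th}} \cdots p_m$ into $q_1 \cdots q_n$. Now we define the map $\Phi$ as follows.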
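$$\Phi : \langle p_1 \cdots \overbrace{a_x}^{i\text{-th}}\, \overbrace{c}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle \longrightarrow \langle p_1 \cdots \overbrace{c}^{i\text{-th}}\, \overbrace{a_x}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle ,$$
$$\Phi(q_1 \cdots \underbrace{\overbrace{q_{j_i}}^{=a_x} \cdots}\ \underbrace{\overbrace{q_{j_{i+1}}}^{=c} \cdots}\ q_{j_{i+2}} \cdots q_n) = q_1 \cdots \underbrace{\overbrace{q_{j_{i+1}}}^{=c} \cdots}\ \underbrace{\overbrace{q_{j_i}}^{=a_x} \cdots}\ q_{j_{i+2}} \cdots q_n .$$

Here the right most embedding of $p_1 \cdots \overbrace{c}^{i\text{-th}}\, \overbrace{a_x}^{(i+1)\text{-th}} \cdots p_m$ into $q_1 \cdots \underbrace{\overbrace{q_{j_{i+1}}}^{=c} \cdots}\ \underbrace{\overbrace{q_{j_i}}^{=a_x} \cdots}\ q_{j_{i+2}} \cdots q_n$ is $S(j_1, \dots, j_i,\ j_{i+2} + j_i - j_{i+1},\ j_{i+2}, \dots, j_m)$. By the construction, the elements of $\langle p_1 \cdots \overbrace{a_x}^{i\text{-th}}\, \overbrace{c}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle$ whose right most embedding is $S(j_1, j_2, \dots, j_m)$ are in one-to-one correspondence with the elements of $\langle p_1 \cdots \overbrace{c}^{i\text{-th}}\, \overbrace{a_x}^{(i+1)\text{-th}} \cdots p_m, \{a^{n-q} c^q\} \rangle$ whose right most embedding is $S(j_1, \dots, j_i,\ j_{i+2} + j_i - j_{i+1},\ j_{i+2}, \dots, j_m)$. Hence $\Phi$ is a bijection, and Claim 2 follows. By these claims we obtain the desired result. ✷

From Lemma 2, if $p_1 \cdots p_m \in \{a^{m-p} c^p\}$, then
$$\sharp \langle p_1 \cdots p_m, \{a^{n-q} c^q\} \rangle = \sharp \langle \underbrace{a_1 \cdots a_1}_{(m-p)\ \text{times}} \underbrace{c \cdots c}_{p\ \text{times}}, \{a^{n-q} c^q\} \rangle .$$
So we denote this number by $M((m, p), (n, q))$ for all $0 \le m \le n$, $0 \le p \le m$, $0 \le q \le n$.

Lemma 3 Let $k, l \in \mathbb{N} \cup \{0\}$, $0 \le k \le l$ and $p_1 \cdots p_l \in \{a^{l-k} c^k\}$. Then we have $[p_1 \cdots p_l, c^l] \simeq B_{l-k}$, where $B_{l-k}$ is the Boolean algebra of rank $l - k$.

Lemma 4 For $m, n, p, k \in \mathbb{N} \cup \{0\}$ such that $0 \le m \le n$, $0 \le p \le m$, we take $p_1 \cdots p_m \in \{a^{m-p} c^p\}$. Then the number of paths of length $k$ in $\mathrm{Pat}\{p_1 \cdots p_m, c^n\}$ equals the number of paths of length $k$ in $\mathrm{Pat}\{\underbrace{a_1 \cdots a_1}_{(m-p)\ \text{times}} \underbrace{c \cdots c}_{p\ \text{times}}, c^n\}$.

PROOF Notice that if we take $q_1 \cdots q_n \in \{a^{n-q} c^q\}$, then the number of length $l$ paths from $q_1 \cdots q_n$ to $c^n$ equals the number of length $l$ paths from $\underbrace{a_1 \cdots a_1}_{(n-q)\ \text{times}} \underbrace{c \cdots c}_{q\ \text{times}}$ to $c^n$. Hence we have
$$\sharp \{p_1 \cdots p_m \to \theta_1 \to \cdots \to \theta_k = c^n \mid |\theta_i| = n \text{ for } i = 1, \dots, k\} = \sum_{p \le r \le n} M((m, p), (n, r))\ \sharp \{\underbrace{a_1 \cdots a_1}_{(n-r)\ \text{times}} \underbrace{c \cdots c}_{r\ \text{times}} \to \tau_1 \to \cdots \to \tau_{k-1} = c^n\} .$$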
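(By Lemma 3) Hence we obtain the desired result.

Lemma 5 For $m, n, p \in \mathbb{N} \cup \{0\}$ such that $0 \le m \le n$, $0 \le p \le m$, we take $p_1 \cdots p_m \in \{a^{m-p} c^p\}$. Then we have
$$\mathrm{Mob}\,\mathrm{Pat}\{p_1 \cdots p_m, c^n\} = \begin{cases} -\sum_{i=0}^{n-p} (-1)^{n-p-i} M((m, p), (n, p+i)) & \text{if } m < n, \\ (-1)^{n-m} & \text{if } m = n. \end{cases}$$

PROOF From Lemma 4 we have $\mathrm{Mob}\,\mathrm{Pat}\{p_1 \cdots p_m, c^n\} = \mathrm{Mob}\,\mathrm{Pat}\{\underbrace{a_1 \cdots a_1}_{(m-p)\ \text{times}} \underbrace{c \cdots c}_{p\ \text{times}}, c^n\}$. Then we have
$$
\begin{aligned}
\mathrm{Mob}\,\mathrm{Pat}\{\underbrace{a_1 \cdots a_1}_{(m-p)\ \text{times}} \underbrace{c \cdots c}_{p\ \text{times}}, c^n\}
&= (-1) \sum_{i=0}^{n-p} M((m, p), (n, p+i))\ \mu(\underbrace{a_1 \cdots a_1}_{(n-p-i)\ \text{times}} \underbrace{c \cdots c}_{(p+i)\ \text{times}}, c^n) \quad \text{(by Proposition 1)} \\
&= (-1) \sum_{i=0}^{n-p} M((m, p), (n, p+i)) (-1)^{n-p-i} .
\end{aligned}
$$
Hence we obtain the desired result. ✷

Lemma 6 For $m, n, p, q \in \mathbb{N} \cup \{0\}$ such that $1 \le m \le n$, $1 \le p \le m$, $1 \le q \le n$, $1 \le p \le q$, we have
$$M((m, p), (n, q)) = \sum_{i=0}^{n-m} M((m-1, p-1), (n-1-i, q-1)) \cdot s^i .$$

PROOF By Lemma 2, the left hand side equals
$$
\begin{aligned}
&\sharp \{x_1 \cdots x_n \in \{a^{n-q} c^q\} \mid a_1^{m-p} c^p \le x_1 \cdots x_n\} \\
&= \sharp \{x_1 \cdots x_n \in \{a^{n-q} c^q\} \mid a_1^{m-p} c^p \le x_1 \cdots x_n,\ x_n = c\} \\
&\quad + \sharp \{x_1 \cdots x_n \in \{a^{n-q} c^q\} \mid a_1^{m-p} c^p \le x_1 \cdots x_n,\ x_{n-1} = c,\ x_n \ne c\} + \cdots \\
&\quad + \sharp \{x_1 \cdots x_n \in \{a^{n-q} c^q\} \mid a_1^{m-p} c^p \le x_1 \cdots x_n,\ x_{n-i} = c,\ x_{n-i+k} \ne c\ (1 \le k \le i)\} + \cdots \\
&\quad + \sharp \{x_1 \cdots x_n \in \{a^{n-q} c^q\} \mid a_1^{m-p} c^p \le x_1 \cdots x_n,\ x_m = c,\ x_{m+k} \ne c\ (1 \le k \le n-m)\} \\
&= \sharp \{A \in \{a^{n-q} c^{q-1}\} \mid a_1^{m-p} c^{p-1} \le A\} + \sharp \{A \in \{a^{n-q-1} c^{q-1}\} \mid a_1^{m-p} c^{p-1} \le A\} \cdot s^1 + \cdots \\
&\quad + \sharp \{A \in \{a^{n-q-i} c^{q-1}\} \mid a_1^{m-p} c^{p-1} \le A\} \cdot s^i + \cdots + \sharp \{A \in \{a^{m-q} c^{q-1}\} \mid a_1^{m-p} c^{p-1} \le A\} \cdot s^{n-m} \\
&= M((m-1, p-1), (n-1, q-1)) \cdot s^0 + M((m-1, p-1), (n-2, q-1)) \cdot s^1 + \cdots \\
&\quad + M((m-1, p-1), (n-1-i, q-1)) \cdot s^i + \cdots + M((m-1, p-1), (m-1, q-1)) \cdot s^{n-m} .
\end{aligned}
$$
(In case $n - q - i < 0$, we regard $\sharp \{A \in \{a^{n-q-i} c^{q-1}\} \mid a_1^{m-p} c^{p-1} \le A\}$ as 0.) Hence we obtain the desired result. ✷

Lemma 7 For $m, n, i \in \mathbb{N} \cup \{0\}$ such that $i \le m \le n$, $i \le p \le q$, $i \le p \le m$, $i \le q \le n$, we have
$$M((m, p), (n, q)) = \sum_{k=0}^{n-m} M((m-i, p-i), (n-i-k, q-i)) \cdot s^k \cdot \binom{i+k-1}{k} .$$

PROOF The case $i = 1$ is Lemma 6. We show the above formula by induction. We suppose this lemma holds for $i - 1$.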
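Now we see
$$
\begin{aligned}
M((m, p), (n, q))
&= \sum_{k=0}^{n-m} M((m-i+1, p-i+1), (n-i-k+1, q-i+1)) \cdot s^k \binom{i+k-2}{k} \\
&= \sum_{k=0}^{n-m} \Big( \sum_{l=0}^{n-m-k} M((m-i, p-i), (n-i-k-l, q-i)) \cdot s^l \Big) \cdot s^k \binom{i+k-2}{k} \\
&= \sum_{k,l=0}^{n-m} M((m-i, p-i), (n-i-(k+l), q-i)) \cdot s^{k+l} \binom{i+k-2}{k} \\
&= \sum_{k+l=0}^{n-i} M((m-i, p-i), (n-i-(k+l), q-i)) \cdot s^{k+l} \binom{i+k-2}{k} \\
&= \sum_{\alpha=0}^{n-i} M((m-i, p-i), (n-i-\alpha, q-i)) \Big( \sum_{j=0}^{\alpha} \binom{i+j-2}{j} \Big) \cdot s^{\alpha} ,
\end{aligned}
$$
where the second equality is Lemma 6. Now we remark the following formula:
$$\sum_{x=0}^{\alpha} \binom{i+x}{x} = \binom{i+\alpha+1}{\alpha} .$$
Hence we have
$$\sum_{\alpha=0}^{n-i} M((m-i, p-i), (n-i-\alpha, q-i)) \Big( \sum_{j=0}^{\alpha} \binom{i+j-2}{j} \Big) \cdot s^{\alpha} = \sum_{\alpha=0}^{n-i} M((m-i, p-i), (n-i-\alpha, q-i)) \cdot s^{\alpha} \binom{i+\alpha-1}{\alpha} .$$

Lemma 8 For $m, n, p, q \in \mathbb{N}$, $1 \le m \le n$, $1 \le p \le q$, $1 \le p \le m$, $1 \le q \le n$, we have
$$M((m, p), (n, q)) = \sum_{k=0}^{n-m} M((m-p, 0), (n-p-k, q-p)) \cdot s^k \cdot \binom{p+k-1}{k} .$$

Lemma 9 For $1 \le \alpha \le \beta$, we have
$$\sum_{i=0}^{\beta} M((\alpha, 0), (\beta, i)) \cdot (-1)^i = 0 .$$

PROOF We give a combinatorial proof. We put $\hat{M}_i := \langle \underbrace{a_1 \cdots a_1}_{\alpha\ \text{times}}, \{a^{\beta-i} c^i\} \rangle$, $\hat{M} := \bigcup_{0 \le i \le \beta} \hat{M}_i$, $\hat{M}_{ev} := \bigcup_{0 \le i \le \beta,\ i\ \mathrm{even}} \hat{M}_i$ and $\hat{M}_{odd} := \bigcup_{0 \le i \le \beta,\ i\ \mathrm{odd}} \hat{M}_i$. Then we have $\sum_{i=0}^{\beta} M((\alpha, 0), (\beta, i)) \cdot (-1)^i = \sharp \hat{M}_{ev} - \sharp \hat{M}_{odd}$. We consider the map $\Psi : \hat{M} \longrightarrow \hat{M}$ defined as follows:
$$\Psi(\cdots a_1 a_{x_1} \cdots a_{x_t}) = \cdots c\, a_{x_1} \cdots a_{x_t}, \qquad \Psi(\cdots c\, a_{x_1} \cdots a_{x_t}) = \cdots a_1 a_{x_1} \cdots a_{x_t},$$
$$\Psi(\cdots a_1) = \cdots c, \qquad \Psi(\cdots c) = \cdots a_1 .$$
Here $\Psi$ replaces the right most occurrence of $a_1$ or $c$ in each element by $c$ or $a_1$, respectively. Since $\alpha \ne 0$, each element of $\hat{M}$ contains $a_1$ or $c$, so the map $\Psi$ is well-defined. Clearly $\Psi^{-1} = \Psi$, $\Psi(\hat{M}_{ev}) = \hat{M}_{odd}$ and $\Psi(\hat{M}_{odd}) = \hat{M}_{ev}$. Hence $\Psi$ is a bijection and $\sharp \hat{M}_{ev} = \sharp \hat{M}_{odd}$. Hence we obtain the desired result. ✷

Lemma 10 For $m, n, p \in \mathbb{N} \cup \{0\}$, $0 \le m < n$, $0 \le p < m$, we have
$$p_1 \cdots p_m \in \{a^{m-p} c^p\} \Longrightarrow \mathrm{Mob}\,\mathrm{Pat}\{p_1 \cdots p_m, c^n\} = 0 .$$

PROOF We have
$$
\begin{aligned}
\mathrm{Mob}\,\mathrm{Pat}\{p_1 \cdots p_m, c^n\}
&= -\sum_{i=0}^{n-p} M((m, p), (n, p+i)) \cdot (-1)^{n-p-i} \\
&= -\sum_{i=0}^{n-p} (-1)^{n-p-i} \sum_{k=0}^{n-m} M((m-p, 0), (n-p-k, i)) \cdot s^k \binom{p+k-1}{k} \\
&= (-1)^{n-p-1} \sum_{i=0}^{n-p} \sum_{k=0}^{n-m} M((m-p, 0), (n-p-k, i)) (-1)^i \cdot s^k \binom{p+k-1}{k} \\
&= (-1)^{n-p-1} \sum_{k=0}^{n-m} \underbrace{\Big\{ \sum_{i=0}^{n-p-k} M((m-p, 0), (n-p-k, i)) (-1)^i \Big\}}_{=\,0\ \text{(by Lemma 9)}} \cdot s^k \binom{p+k-1}{k} \\
&= 0 .
\end{aligned}
$$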
Hence we obtain the desired result. ✷

Lemma 11 For $m, n \in \mathbb{N}$, $1 \le m \le n$, we put
$$P := \{a_1^m \to \tau_1 \to \cdots \to c^m \to \theta^{k_1}_1 \to \cdots \to c^{k_1} \to \theta^{k_2}_1 \to \cdots \to c^{k_2} \to \cdots \to c^{k_r} \mid m < k_1 < \cdots < k_r = n,\ |\tau_i| = m,\ |\theta^{k_i}_j| = k_i\},$$
and take $p_1 \cdots p_m \in \{a^m c^0\}$. Then we have $\mu(p_1 \cdots p_m, c^n) = \mu(a_1^m, c^n) = \mathrm{Mob}(P) = (-1)^m \mu(c^m, c^n)$.

PROOF We have $\mu(p_1 \cdots p_m, c^n) = \mathrm{Mob}(\{(p_1 \cdots p_m \to \theta_1 \to \cdots \to \theta_r \to c^n) \mid p_1 \cdots p_m < \theta_1 < \cdots < \theta_r < c^n\})$. Now we put
$$X_{l_1 l_2} := \{(p_1 \cdots p_m \to \cdots \to \theta_r \to \tau_1 \to \cdots \to \tau_s \to c^{l_2} \to \sigma^{k_1}_1 \to \cdots \to c^{k_1} \to \sigma^{k_2}_1 \to \cdots \to c^{k_2} \to \cdots \to c^n) \mid |\theta_r| = l_1,\ \theta_r \ne c^{l_1},\ |\tau_1|, \dots, |\tau_s| = l_2,\ |\sigma^{k_i}_j| = k_i\} .$$
Then we have $\{(p_1 \cdots p_m \to \theta_1 \to \cdots \to \theta_r \to c^n) \mid p_1 \cdots p_m < \theta_1 < \cdots < \theta_r < c^n\}$ … $m \le l_1$ …

<|startoftext|> Introduction

Complex systems may also [1] emerge from a large number of interdependent and interacting elements. Networks have proven to be effective models of natural or man-made complex systems, where the elements are represented by the nodes and their interactions by the links. Typical well known examples include communication and transportation networks, social networks, and biological networks [2, 3, 4, 5]. Although the statistical analysis of the underlying topological structure has been very fruitful [2, 3, 4, 5], it is limited by the fact that in real networks the links may have different capacities, intensities, flows of information or strengths. For example, weighted links can be used for the Internet, to represent the amount of data exchanged between two hosts in the network. For scientific collaboration networks the weight depends on the number of coauthored papers between two authors. For airport networks, it is either the number of available seats on direct flight connections between airports i and j or the actual number of passengers that travel from airport i to j. For neural networks the weight is the number of junctions between neurons, and for transportation networks it is the Euclidean distance between two destinations.
The diversification of the links is described in terms of weights on the links. Therefore, the statistical analysis has to be extended from graphs to weighted complex networks. If all links are of equal weight, the statistical parameters used for unweighted graphs are sufficient for the statistical characterization of the network. Therefore, the statistical parameters of the weighted graphs should reduce to the corresponding parameters of the conventional graphs if all weights are put equal to unity.

Complex graphs are characterized by three main statistical parameters, namely the degree distribution, the average path length and the clustering coefficient. We shall briefly mention the definitions for clarity and for a better understanding of the proposed extensions of these parameters for weighted graphs.

The structure of a network with N nodes is represented by an N×N binary matrix $A = \{a_{ij}\}$, known as the adjacency matrix, whose element $a_{ij}$ equals 1 when there is a link joining node i to node j and 0 otherwise (i, j = 1, 2, …, N). In the case of undirected networks with no loops, the adjacency matrix is symmetric ($a_{ij} = a_{ji}$) and all elements of the main diagonal equal 0 ($a_{ii} = 0$). The degree $k_i$ of a node i is defined as the number of its neighbours, i.e. the number of links incident to node i:

$$k_i = \sum_{j \in \Pi(i)} a_{ij} \qquad (1)$$

where $a_{ij}$ are the elements of the adjacency matrix A and $\Pi(i)$ is the neighbourhood of node i. The degree distribution is the probability that some node has k connections to other nodes, and it is usually described by a power law $P(k) \sim k^{-\gamma}$, with $2 \le \gamma \le 3$.

The characteristic path length of a network is defined as the average of the shortest path lengths between any two nodes:

$$L = \frac{1}{N(N-1)} \sum_{i \ne j} d_{ij} \qquad (2)$$

where $d_{ij}$ is the shortest path length between i and j, defined as the minimum number of links traversed to get from node i to node j.
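Formulas (1) and (2) can be sketched in a few lines for a small unweighted graph. This is an illustrative sketch only: the function names and the example path graph are assumptions, and the graph is assumed to be connected so that every shortest path length is finite.

```python
from collections import deque


def degrees(A):
    """Node degrees: row sums of the adjacency matrix, as in formula (1)."""
    return [sum(row) for row in A]


def characteristic_path_length(A):
    """Average shortest path length over ordered pairs i != j, formula (2).

    Shortest paths are found by breadth-first search from every node.
    Raises ValueError if some node is unreachable (assumption: the
    graph must be connected for the average to be finite).
    """
    n = len(A)
    total = 0
    for src in range(n):
        dist = [None] * n
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if A[u][v] and dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if any(d is None for d in dist):
            raise ValueError("graph is not connected")
        total += sum(dist)
    return total / (n * (n - 1))


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degrees(A))                      # [1, 2, 2, 1]
print(characteristic_path_length(A))   # 20 / 12, about 1.667
```

Breadth-first search is enough here because every link counts as one step; the weighted generalizations discussed later in the text would need a weighted shortest-path method instead.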
In many real networks it is found that the existence of a link between nodes i and j and between nodes i and k enhances the probability that node j will also be connected to node k. This tendency of the neighbours of any node i to connect to each other is called clustering and is quantified by the clustering coefficient $C_i$, which is the ratio of the number of triangles in which node i participates to the maximum possible number of such triangles:

$$C_i = \frac{2 n_i}{k_i (k_i - 1)} = \frac{\sum_{j,k} a_{ij} a_{jk} a_{ki}}{k_i (k_i - 1)}, \qquad k_i \ne 0, 1 \qquad (3)$$

where $n_i = \sum_{j<k} a_{ij} a_{jk} a_{ki}$ is the actual number of triangles in which node i participates, i.e. the actual number of links between the neighbours of node i, and $k_i (k_i - 1)/2$ is the maximum possible number of such links, attained when the subgraph of neighbours of node i is completely connected. The clustering coefficient $C_i$ equals 1 if node i is the center of a fully interconnected cluster, and equals 0 if the neighbours of node i are not connected to each other. In order to characterize the network as a whole, we usually consider the average clustering coefficient C over all the nodes. We may also consider the average clustering coefficient C(k) over the nodes of degree k.

Studies of real complex networks have shown that their connection topology is neither completely random nor completely regular, but lies between these extreme cases. Many real networks share features of both extreme cases. For example, the short average path length, typical of random networks, comes along with a large clustering coefficient, typical of regular lattices. The coexistence of these attributes defines a distinct class of networks, interpolating between regular lattices and random networks, known today as small world networks [3, 4, 5, 6]. Another class of networks emerges when the degree distribution is a power law (scale free) distribution, which signifies the presence of a non negligible number of highly connected nodes, known as hubs.
These nodes, with very large degree k compared to the average degree $\langle k \rangle$, are critical for the network's robustness and vulnerability. These networks are known today as scale free networks [2, 3, 4, 7].

The purpose of this paper is to assess the statistical characterization of weighted networks in terms of proper generalizations of the relevant parameters, namely average path length, degree distribution and clustering coefficient. After reviewing the definitions of the weighted average path length, weighted degree distribution and weighted clustering coefficient in section 2, we compare them in section 3. Although the degree distribution and the average path length admit straightforward generalizations, for the clustering coefficient several different definitions have been proposed. In order to elucidate the significance of the different definitions of the weighted clustering coefficient, we study their dependence on the weights of the connections in section 4, where we introduce the relative perturbation norm as an index to assess the weight distribution. This study revealed new interesting statistical regularities in terms of the relative perturbation norm, useful for the statistical characterization of weighted graphs.

2. Statistical parameters of weighted networks

The weights of the links between nodes are described by an N×N matrix $W = \{w_{ij}\}$. The weight $w_{ij}$ is 0 if the nodes i and j are not linked. We will consider the case of symmetric positive weights ($w_{ij} = w_{ji} \ge 0$), with no loops ($w_{ii} = 0$). In order to compare different networks or different kinds of weights, we usually normalize the weights to the interval [0, 1] by dividing all weights by the maximum weight. The normalized weights are $w_{ij} / \max(w_{ij})$. The statistical parameters for weighted networks are defined as follows.
The node degree $k_i = \sum_{j \in \Pi(i)} a_{ij}$, which is the number of links attached to node i, is extended directly to the strength or weighted degree, which is the sum of the weights of all links attached to node i:

$$s_i = \sum_{j \in \Pi(i)} w_{ij} \qquad (4)$$

The strength of a node takes into account both the connectivity and the weights of the links. The degree distribution is also extended for weighted networks to the strength distribution P(s), which is the probability that some node's strength equals s. Recent studies indicate a power law $P(s) \sim s^{-a}$ [8, 9, 10].

There are two different generalizations of the characteristic path length in the literature, applicable to transportation and communication networks. In the case of transportation networks the weighted shortest path length $d_{ij}$ between i and j is defined as the smallest sum of the weights of the links over all possible paths $\gamma(i, j)$ from node i to node j [11, 12]:

$$d_{ij} = \min_{\gamma(i,j)} \sum_{(m,n) \in \gamma(i,j)} w_{mn} \qquad (5)$$

The weight describes physical distances and/or costs usually involved in transportation networks. The capacity/intensity/strength/efficiency of the connection is inversely proportional to the weight. However, this definition is not suitable for communication networks, where the efficiency of the communication channel between two nodes is proportional to the weight. The shortest path length in the case of communication networks is defined as:

$$d_{ij} = \min_{\gamma(i,j)} \sum_{(m,n) \in \gamma(i,j)} \frac{1}{w_{mn}} \qquad (6)$$

To our knowledge, the latter definition has been used by Latora and Marchiori [13, 14] for the definition of the "efficiency" of the network, as inversely proportional to the shortest path length $d_{ij}$. The weighted characteristic path length for both cases is the average of all shortest path lengths, and it is calculated by formula (2).

We found in the literature six proposals for the definition of the weighted clustering coefficient, which we shall review.

Zhang et al. (2005) [15] definition:

$$C^Z_{w,i} = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ki}}{\left( \sum_j w_{ij} \right)^2 - \sum_j w_{ij}^2} \qquad (7)$$

The weights in this definition are normalized.
The idea of the generalization is to substitute the weights for the elements of the adjacency matrix in the numerator of formula (3); for the denominator, the upper limit of the numerator is taken in order to normalize the coefficient between 0 and 1. The definition originated from gene co-expression networks. As shown by Kalna et al. (2006) [16], an alternative formula that may apply for this definition is

$$C^Z_{w,i} = \frac{\sum_j \sum_k w_{ij} w_{jk} w_{ki}}{\sum_j \sum_{k \ne j} w_{ij} w_{ik}}$$

Lopez-Fernandez et al. (2004) [17] definition:

$$C^L_{w,i} = \sum_{j,k \in \Pi(i)} \frac{w_{jk}}{k_i (k_i - 1)} \qquad (8)$$

The weights in this definition are not normalized. The idea of the generalization is to replace the number of links that exist between the neighbours of node i in formula (3) by the weight of the link between the neighbours j and k. The definition originated from an affiliation network for committers (or modules) of free, open source software projects.

Onnela et al. (2005) [18] definition:

$$C^O_{w,i} = \frac{\sum_{j,k} (w_{ij} w_{jk} w_{ki})^{1/3}}{k_i (k_i - 1)} \qquad (9)$$

The weights in this definition are normalized. The quantity $I(g) = (w_{ij} w_{jk} w_{ki})^{1/3}$ is called the "intensity" of the triangle ijk. The concept behind this generalization is to replace the total number of triangles in which node i participates by the sum of the intensities of these triangles, the intensity being the geometric mean of the links' weights.

Barrat et al. (2004) [8] definition:

$$C^B_{w,i} = \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{jk} a_{ki} \qquad (10)$$

The weights in this definition are not normalized. The idea of the generalization is to replace the elements of the adjacency matrix in formula (3) by the average of the weights of the links between node i and its neighbours j and k, with the normalization factor $s_i (k_i - 1)$ ensuring that $0 \le C^B_{w,i} \le 1$. This definition was used for airport and scientific collaboration networks.

Serrano et al. (2006) [19] definition:

$$C^S_{w,i} = \frac{\sum_{j,k} w_{ij} w_{ik} a_{kj}}{s_i^2 (1 - Y_i)} \qquad (11)$$

where $Y_i = \sum_j \left( \frac{w_{ij}}{s_i} \right)^2$ has been named "disparity". The weights in this definition are not normalized.
This formula is used for the generalization of the average clustering coefficient with degree k, which has a probabilistic interpretation just as the unweighted clustering coefficient.

Holme et al. [20] definition:

$$C^H_{w,i} = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ki}}{\max(w_{ij}) \sum_j \sum_{k \ne j} w_{ij} w_{ik}} \qquad (12)$$

The only difference between formulas (7) and (12) is that (12) is divided by $\max(w_{ij})$. We shall not discuss this definition in the comparison because the essence of the comparison is already addressed by definition (7).

The Li et al. (2005) [21] definition of the weighted clustering coefficient is another version of the Lopez-Fernandez proposal (8).

3. The relation between the different weighted clustering coefficients

1. All definitions reduce to the clustering coefficient (3) when the weights $w_{ij}$ are replaced by the adjacency matrix elements.

2. All weighted clustering coefficients reduce to 0 when there are no links between the neighbours of node i, that is when $a_{jk} = w_{jk} = 0$.

3. In the other extreme, when all neighbours of node i are connected to each other, the weighted clustering coefficients can take the value 1 under the following conditions. Formulas (7) and (8) take the value 1 if the weights between the neighbours of node i are 1, independently of the weights of the other links. Formula (9) takes the value 1 if and only if all the weights are equal to 1. Formulas (10) and (11) take the value 1 for all fully connected graphs, independently of all the weights. These calculations are presented in Appendix A.

4. We calculated the values of the weighted clustering coefficients of node i participating in a fully connected triangle. Formulas (7) and (8) take the value $w_{jk}$ of the weight of the link between the neighbours j and k of node i. Formula (9) becomes equal to the intensity of the triangle, $C^O_{w,i} = (w_{ij} w_{jk} w_{ki})^{1/3}$, for all nodes of the triangle. Formulas (10) and (11) take the value 1 for all fully connected graphs, independently of all weights. These calculations are presented in Appendix B.

4. The dependence of the weighted clustering coefficients on the weights

In order to understand the meaning of the different definitions (7), (8), (9), (10) and (11) of the weighted clustering coefficient, we examine their dependence on the weights, without altering the topology of the graph. We simply evaluate these definitions for different distributions of weights, substituting the nonzero elements of the adjacency matrix A by weights normalized between 0 and 1. A way to distinguish and compare different weight distributions over the same graph is in terms of the relative perturbation norm $\|A - W\| / \|A\|$, which gives the percentage of the perturbation of the adjacency matrix introduced by the weights. For simplicity, we considered the L2 norm.

We now examine the dependence of the weighted clustering coefficients on the relative perturbation norm for several different weight distributions as well as for different graphs. We have examined many networks, from 20 up to 300 nodes, with different topologies, generated by the network software PAJEK [22]. The weights examined are randomly generated numbers following uniform or normal distributions with several parameter values, chosen so that the percentages of the perturbations range from 0% to 90% in steps of 10%. All simulations gave the same results; figs. 3 and 4 show the typical trends for the random and scale free networks of figs. 1 and 2. It is worth emphasizing that in all cases the same trends appear, demonstrating a clear dependence on the relative perturbation norm only and no dependence on the values of the weights on specific links.

Figure 1. The random network (Erdos-Renyi model) examined consists of 100 nodes and was generated by the network software PAJEK [22]. The clustering coefficient for the unweighted network is 0.3615.

Figure 2.
The scale-free network (Barabasi-Albert extended model) examined consists of 100 nodes and was generated by the network software PAJEK [22]. The clustering coefficient for the unweighted network is 0.6561.

Figure 3. The values of all five weighted clustering coefficients, Zhang et al. ($C^{Z}_{w,i}$), Lopez-Fernandez et al. ($C^{L}_{w,i}$), Onnela et al. ($C^{O}_{w,i}$), Barrat et al. ($C^{B}_{w,i}$) and Serrano et al. ($C^{S}_{w,i}$), in terms of the relative perturbation norm for the random network (Erdos-Renyi model) with 100 nodes. (A) The weights are randomly generated numbers following the uniform distribution. (B) The weights are randomly generated numbers following the normal distribution.

Figure 4. The values of all five weighted clustering coefficients, Zhang et al. ($C^{Z}_{w,i}$), Lopez-Fernandez et al. ($C^{L}_{w,i}$), Onnela et al. ($C^{O}_{w,i}$), Barrat et al. ($C^{B}_{w,i}$) and Serrano et al. ($C^{S}_{w,i}$), in terms of the relative perturbation norm for the scale-free network (Barabasi-Albert extended model) with 100 nodes. (A) The weights are randomly generated numbers following the uniform distribution. (B) The weights are randomly generated numbers following the normal distribution.

In all cases we observe a clear dependence of the values of all five weighted clustering coefficients on the relative perturbation norm of the weighted network. This demonstrates, first of all, that the relative perturbation norm is a reliable index of the weight distribution. The Zhang et al. (7), Lopez-Fernandez et al. (8) and Onnela et al. (9) weighted clustering coefficients follow the same trend, decaying smoothly as the relative perturbation norm increases. More specifically, the trends of Zhang et al. (7) and Lopez-Fernandez et al. (8) almost coincide, while the trend of Onnela et al. (9) varies slightly from the other two. The weighted clustering coefficients of Barrat et al. (10) and Serrano et al.
(11) do not change (variations appear only after the first two decimal digits), regardless of the size of the network or the distribution of the weights. As mentioned in section 3, these coefficients are independent of the weights when the graph is completely connected. We notice here, however, that the weighted clustering coefficients (10) and (11) are practically independent of the weights for every graph examined.

5. Concluding remarks

1. The clear dependence of the values of all five weighted clustering coefficients on the relative perturbation norm shows that the proposed relative perturbation norm is a reliable index of the weight distribution. The decaying trend of the weighted clustering coefficients of Zhang et al. (7), Lopez-Fernandez et al. (8) and Onnela et al. (9) with increasing relative perturbation norm is quite natural: the clustering decreases almost linearly as the weights “decrease”.

2. We presented in Appendices A and B the calculations demonstrating that all definitions reduce to the clustering coefficient (3) when the weights $w_{ij}$ are replaced by the adjacency matrix elements. The values of the weighted clustering coefficients of a node i participating in a fully connected triangle are presented for completeness, because we did not find them in the literature.

3. The results presented in Figures 3 and 4 were needed in order to gain a minimal understanding of the statistical analysis of weighted networks before proceeding to applications on real networks.

Acknowledgements

We would like to thank Prof. Kandylis D. from the Medical School of Aristotle University of Thessaloniki, who showed us the significance of weighted networks in cognitive processes. We also thank Drs. Serrano M. A., Boguñá M. and Pastor-Satorras R., who pointed out their work to us.

APPENDIX A.
Calculations on the weighted clustering coefficient

The definitions (7)-(11) reduce to the clustering coefficient (3) when the weights $w_{ij}$ are replaced by the adjacency matrix elements.

1. Zhang et al. (2005)

$$C^{Z}_{w,i} = \frac{\sum_{j}\sum_{k} w_{ij} w_{jk} w_{ki}}{\left(\sum_{j} w_{ij}\right)^{2} - \sum_{j} w_{ij}^{2}}$$

The proof is presented by the authors. For example, for a fully connected network with four nodes

$$C^{Z}_{w,1} = \frac{\sum_{j \neq 1}\sum_{k \neq 1} w_{1j} w_{jk} w_{k1}}{\left(\sum_{j \neq 1} w_{1j}\right)^{2} - \sum_{j \neq 1} w_{1j}^{2}} = \frac{w_{12}w_{23}w_{31} + w_{12}w_{24}w_{41} + w_{13}w_{32}w_{21} + w_{13}w_{34}w_{41} + w_{14}w_{42}w_{21} + w_{14}w_{43}w_{31}}{(w_{12} + w_{13} + w_{14})^{2} - w_{12}^{2} - w_{13}^{2} - w_{14}^{2}}$$

$$= \frac{w_{12}w_{23}w_{31} + w_{13}w_{34}w_{41} + w_{12}w_{24}w_{41}}{w_{12}w_{13} + w_{13}w_{14} + w_{12}w_{14}}$$

so $C^{Z}_{w,1} = 1$ when $w_{23} = w_{34} = w_{24} = 1$.

2. Lopez-Fernandez et al. (2004)

$$C^{L}_{w,i} = \frac{\sum_{j,k \in \Pi(i)} w_{jk}}{k_i (k_i - 1)}$$

This formula can be expressed as

$$C^{L}_{w,i} = \frac{\sum_{j}\sum_{k} w_{jk} a_{ij} a_{ik}}{k_i (k_i - 1)}$$

It is obvious that the formula reduces to the unweighted (3) when the $w_{jk}$ are substituted by $a_{jk}$.

3. Onnela et al. (2005)

$$C^{O}_{w,i} = \frac{\sum_{j}\sum_{k} (w_{ij} w_{jk} w_{ki})^{1/3}}{k_i (k_i - 1)}$$

reduces to the unweighted definition (3) when the $w_{jk}$ are substituted by $a_{jk}$. Indeed, $(a_{ij})^{1/3} = a_{ij}$, hence $(w_{ij} w_{jk} w_{ki})^{1/3} = (a_{ij} a_{jk} a_{ki})^{1/3} = a_{ij} a_{jk} a_{ki}$ and

$$C^{O}_{w,i} = \frac{\sum_{j}\sum_{k} (a_{ij} a_{jk} a_{ki})^{1/3}}{k_i (k_i - 1)} = \frac{\sum_{j}\sum_{k} a_{ij} a_{jk} a_{ki}}{k_i (k_i - 1)}$$

4. Barrat et al. (2004)

$$C^{B}_{w,i} = \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2} a_{ij} a_{jk} a_{ki}$$

reduces to the unweighted definition (3) when $w_{ij}$ and $w_{ik}$ are substituted by the adjacency matrix elements. In that case $s_i = \sum_{j \in \Pi(i)} w_{ij} = \sum_{j \in \Pi(i)} a_{ij} = k_i$ and $a_{ij}^{2} = a_{ij}$, so

$$C^{B}_{w,i} = \frac{1}{k_i (k_i - 1)} \sum_{j,k} \frac{a_{ij} + a_{ik}}{2} a_{ij} a_{jk} a_{ki} = \frac{1}{k_i (k_i - 1)} \sum_{j,k} \frac{a_{ij} a_{jk} a_{ki} + a_{ij} a_{jk} a_{ki}}{2} = \frac{\sum_{j,k} a_{ij} a_{jk} a_{ki}}{k_i (k_i - 1)}$$

5. Serrano et al.
(2006) The formula can be expressed as

$$C^{S}_{w,i} = \frac{\sum_{j}\sum_{k} w_{ij} w_{ik} a_{kj}}{s_i^{2} (1 - Y_i)} = \frac{\sum_{j}\sum_{k} w_{ij} w_{ik} a_{kj}}{s_i^{2} \left(1 - \sum_{j} (w_{ij}/s_i)^{2}\right)} = \frac{\sum_{j}\sum_{k} w_{ij} w_{ik} a_{kj}}{\left(\sum_{j} w_{ij}\right)^{2} - \sum_{j} w_{ij}^{2}}$$

It is obvious that the formula reduces to the unweighted (3) when the weights $w_{ij}$ are substituted by $a_{ij}$.

APPENDIX B. The values of the weighted clustering coefficients of a node i participating in a fully connected triangle

We calculate the weighted clustering coefficient of node 1.

1. Zhang et al. (2005)

$$C^{Z}_{w,1} = \frac{\sum_{j \neq 1}\sum_{k \neq 1} w_{1j} w_{jk} w_{k1}}{\left(\sum_{j \neq 1} w_{1j}\right)^{2} - \sum_{j \neq 1} w_{1j}^{2}} = \frac{\sum_{j \neq 1} w_{1j} (w_{j2} w_{21} + w_{j3} w_{31})}{(w_{12} + w_{13})^{2} - w_{12}^{2} - w_{13}^{2}} = \frac{w_{12}w_{23}w_{31} + w_{13}w_{32}w_{21}}{2 w_{12} w_{13}} = \frac{2 w_{12}w_{23}w_{31}}{2 w_{12} w_{13}} = w_{23}$$

2. Lopez-Fernandez et al. (2004)

$$C^{L}_{w,1} = \frac{\sum_{j \neq 1}\sum_{k \neq 1} w_{jk}}{k_1 (k_1 - 1)} = \frac{w_{23} + w_{32}}{2(2 - 1)} = w_{23}$$

3. Onnela et al. (2005)

$$C^{O}_{w,1} = \frac{\sum_{j}\sum_{k} (w_{1j} w_{jk} w_{k1})^{1/3}}{k_1 (k_1 - 1)} = \frac{1}{2(2 - 1)} \sum_{j} \left[ (w_{1j} w_{j2} w_{21})^{1/3} + (w_{1j} w_{j3} w_{31})^{1/3} \right] = \frac{1}{2} \left[ (w_{12} w_{23} w_{31})^{1/3} + (w_{13} w_{32} w_{21})^{1/3} \right] = (w_{12} w_{23} w_{31})^{1/3} \le 1$$

By symmetry, $C^{O}_{w,1} = C^{O}_{w,2} = C^{O}_{w,3} = (w_{12} w_{23} w_{31})^{1/3} \le 1$.

4. Barrat et al. (2004)

$$C^{B}_{w,i} = \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2} a_{ij} a_{jk} a_{ki}$$

Degree of node 1: $k_1 = \sum_{j \in \Pi(1)} a_{1j} = 2$. Strength of node 1: $s_1 = \sum_{j \in \Pi(1)} w_{1j} = w_{12} + w_{13}$.

$$C^{B}_{w,1} = \frac{1}{s_1 (k_1 - 1)} \sum_{j,k} \frac{w_{1j} + w_{1k}}{2} a_{1j} a_{jk} a_{k1} = \frac{1}{s_1 (2 - 1)} \sum_{j} \left[ \frac{w_{1j} + w_{12}}{2} a_{1j} a_{j2} a_{21} + \frac{w_{1j} + w_{13}}{2} a_{1j} a_{j3} a_{31} \right]$$

$$= \frac{1}{s_1} \left[ \frac{w_{13} + w_{12}}{2} a_{13} a_{32} a_{21} + \frac{w_{12} + w_{13}}{2} a_{12} a_{23} a_{31} \right] = \frac{w_{12} + w_{13}}{w_{12} + w_{13}} a_{12} a_{23} a_{31} = 1$$

since $a_{12} = a_{23} = a_{31} = 1$.

We also prove that the Barrat et al. definition of the weighted clustering coefficient is independent of all weights for all fully connected networks.
Label the neighbours of node i as $1, 2, \ldots, k_i$. Then

$$C^{B}_{w,i} = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{jh} a_{hi} = \frac{1}{s_i (k_i - 1)} \sum_{j=1}^{k_i} \sum_{h=1}^{k_i} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{jh} a_{hi}$$

For a fully connected network, $a_{jh} = 1$ for all $j \neq h$ and $a_{jj} = 0$, so

$$C^{B}_{w,i} = \frac{1}{s_i (k_i - 1)} \sum_{j=1}^{k_i} \sum_{h \neq j} \frac{w_{ij} + w_{ih}}{2} = \frac{1}{s_i (k_i - 1)} \sum_{j=1}^{k_i} \left[ \frac{w_{i1} + w_{i2} + \cdots + w_{ik_i}}{2} + \frac{(k_i - 2) w_{ij}}{2} \right]$$

$$= \frac{1}{s_i (k_i - 1)} \left[ k_i \frac{w_{i1} + w_{i2} + \cdots + w_{ik_i}}{2} + (k_i - 2) \frac{w_{i1} + w_{i2} + \cdots + w_{ik_i}}{2} \right] = \frac{1}{s_i (k_i - 1)} \cdot \frac{(2k_i - 2)\, s_i}{2} = 1$$

since $s_i = w_{i1} + w_{i2} + \cdots + w_{ik_i}$.

5. Serrano et al. (2006)

$$C^{S}_{w,1} = \frac{\sum_{j \neq 1}\sum_{k \neq 1} w_{1j} a_{jk} w_{k1}}{\left(\sum_{j \neq 1} w_{1j}\right)^{2} - \sum_{j \neq 1} w_{1j}^{2}} = \frac{\sum_{j \neq 1} w_{1j} (a_{j2} w_{21} + a_{j3} w_{31})}{(w_{12} + w_{13})^{2} - w_{12}^{2} - w_{13}^{2}} = \frac{w_{12} a_{23} w_{31} + w_{13} a_{32} w_{21}}{2 w_{12} w_{13}} = \frac{2 w_{12} a_{23} w_{31}}{2 w_{12} w_{13}} = 1$$

References [1] Prigogine I., (1980). From being to becoming, Freeman, New York. [2] Barabasi, A.-L. (2002). Linked: The New Science of Networks, Perseus, Cambridge, MA. [3] Dorogovtsev S.N., Mendes J.F.F. (2002). Evolution of networks, Advances in Physics 51, 1079. [4] Newman, M.E.J. (2003). The structure and function of complex networks, SIAM Review 45, 167-256. [5] Watts, D.J. (2003). Six Degrees: The Science of a Connected Age, Norton, New York. [6] Watts, D.J., Strogatz, S.H. (1998). Collective dynamics of ‘small-world’ networks, Nature 393, 440–442.
[7] Barabasi, A.-L., Albert, R. (1999). Emergence of scaling in random networks, Science 286, 509-512. [8] Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A. (2004). The architecture of complex weighted networks, Proceedings of the National Academy of Sciences (USA) 101, 3747-3752. [9] Barthelemy, M., Barrat, A., Pastor-Satorras, R., Vespignani, A. (2005). Characterization and modeling of weighted networks, Physica A 346, 34–43. [10] Meiss, M., Menczer, F., Vespignani, A. (2005). On the Lack of Typical Behavior in the Global Web Traffic Network, Proceedings of the 14th International World Wide Web Conference, Japan, ACM 1595930469/05/0005. [11] Thadakamalla, H.P., Kumara, S.R.T., Albert, R. (2006). Search in weighted complex networks, cond-mat/0511476v2. [12] Xu, X.-J., Wu, Z.-X., Wang, Y.-H. (2005). Properties of weighted complex networks, cond-mat/0504294v3. [13] Latora, V., Marchiori, M. (2001). Efficient behavior of small-world networks, Physical Review Letters 87, 198701. [14] Latora, V., Marchiori, M. (2003). Economic small-world behavior in weighted networks, The European Physical Journal B 32, 249-263. [15] Zhang, B., Horvath, S. (2005). A general framework for weighted gene co-expression network analysis, Statistical Applications in Genetics and Molecular Biology, 4. [16] Kalna, G., Higham, D.J. (2006). Clustering coefficients for weighted networks, University of Strathclyde Mathematics Research report 3. [17] Lopez-Fernandez, L., Robles, G., Gonzalez-Barahona, J.M. (2004). Applying social network analysis to the information in CVS repositories. In Proc. of the 1st Intl. Workshop on Mining Software Repositories (MSR2004), 101-105. [18] Onnela, J.-P., Saramäki, J., Kertész, J., Kaski, K. (2005). Intensity and coherence of motifs in weighted complex networks, Physical Review E, 71 (6), 065103. [19] Serrano M. A., Boguñá M., Pastor-Satorras R. (2006).
Correlations in weighted networks, Physical Review E 74, 055101 (R). [20] Holme P., Park S.M., Kim B.J., Edling C.R. (2004). Korean university life in a network perspective: Dynamics of a large affiliation network, cond-mat/0411634v1. [21] Li, M., Fan, Y., Chen, J., Gao, L., Di, Z., Wu J. (2005). Weighted networks of scientific communication: the measurement and topological role of weight, Physica A 350, 643–656. [22] Batagelj V., Mrvar A.: Pajek. http://vlado.fmf.uni-lj.si/pub/networks/pajek/ ABSTRACT The purpose of this paper is to assess the statistical characterization of weighted networks in terms of the generalization of the relevant parameters, namely average path length, degree distribution and clustering coefficient. Although the degree distribution and the average path length admit straightforward generalizations, several different definitions of the clustering coefficient have been proposed in the literature. We examined the different definitions and identified the similarities and differences between them. In order to elucidate the significance of the different definitions of the weighted clustering coefficient, we studied their dependence on the weights of the connections. For this purpose, we introduce the relative perturbation norm of the weights as an index to assess the weight distribution. This study revealed interesting new statistical regularities, expressed in terms of the relative perturbation norm, that are useful for the statistical characterization of weighted graphs. <|endoftext|><|startoftext|> Introduction Mathematical setting of the problem A priori estimates Determining modes Determining nodes Hausdorff and fractal dimension of attractor Conclusions Bibliography ABSTRACT This paper is devoted to describing the finite-dimensionality of a two-dimensional micropolar fluid flow with periodic boundary conditions. We define the notions of determining modes and nodes and estimate their number; we also estimate the dimension of the global attractor.
Finally we compare our results with analogous results for the Navier-Stokes equations. <|endoftext|><|startoftext|> ∗supported by an NSF Graduate Research Fellowship, and NSF grant DMS-0605166; levine(at)math.berkeley.edu †partially supported by NSF grant DMS-0605166; peres(at)stat.berkeley.edu Keywords: abelian sandpile, asymptotic shape, discrete Laplacian, divisible sandpile, growth model, internal diffusion limited aggregation, rotor-router model 2000 Mathematics Subject Classifications: Primary 60G50; Secondary 35R35

Introduction

Rotor-router walk is a deterministic analogue of random walk, first studied by Priezzhev et al. [18] under the name “Eulerian walkers.” At each site in the integer lattice $\mathbb{Z}^2$ is a rotor pointing north, south, east or west. A particle starts at the origin; during each time step, the rotor at the particle’s current location is rotated clockwise by 90 degrees, and the particle takes a step in the direction of the newly rotated rotor. In rotor-router aggregation, introduced by Jim Propp, we start with n particles at the origin; each particle in turn performs rotor-router walk until it reaches a site not occupied by any other particles. Let $A_n$ denote the resulting region of n occupied sites. For example, if all rotors initially point north, the sequence will begin $A_1 = \{(0, 0)\}$, $A_2 = \{(0, 0), (1, 0)\}$, $A_3 = \{(0, 0), (1, 0), (0, -1)\}$. The region $A_{1,000,000}$ is pictured in Figure 1. In higher dimensions, the model can be defined analogously by repeatedly cycling the rotors through an ordering of the 2d cardinal directions in $\mathbb{Z}^d$.

Figure 1: Rotor-router aggregate of one million particles in $\mathbb{Z}^2$. Each site is colored according to the direction of its rotor.

Jim Propp observed from simulations in two dimensions that the regions $A_n$ are extraordinarily close to circular, and asked why this was so [7, 19].
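The aggregation rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `rotor_router_aggregate`, the `DIRECTIONS` table and the `initial` parameter are hypothetical, and the sketch assumes the two-dimensional model with rotors stored as indices into a clockwise list of directions.

```python
# Rotor directions in clockwise order: north, east, south, west.
DIRECTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def rotor_router_aggregate(n, initial=0):
    """Return the set A_n of occupied sites in Z^2 after n particles.

    `initial` is the index into DIRECTIONS of every rotor's starting
    direction (0 = north); this uniform start is an assumption of the sketch.
    """
    rotors = {}      # site -> current rotor index (absent means `initial`)
    occupied = set()
    for _ in range(n):
        x = (0, 0)
        # Walk until the particle reaches a site not occupied by another particle.
        while x in occupied:
            r = (rotors.get(x, initial) + 1) % 4   # rotate clockwise by 90 degrees
            rotors[x] = r
            dx, dy = DIRECTIONS[r]                 # step along the new rotor
            x = (x[0] + dx, x[1] + dy)
        occupied.add(x)
    return occupied


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, sorted(rotor_router_aggregate(n)))
```

With all rotors initially pointing north, the first three regions produced by this sketch can be compared directly with the sets $A_1$, $A_2$, $A_3$ listed in the text.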
Despite the impressive empirical evidence for circularity, the best result known until now [14] says only that if $A_n$ is rescaled to have unit volume, the volume of the symmetric difference of $A_n$ with a ball of unit volume tends to zero as a power of n, as $n \uparrow \infty$. The main outline of the argument is summarized in [15]. Fey and Redig [5] also show that $A_n$ contains a diamond. In particular, these results do not rule out the possibility of “holes” in $A_n$ far from the boundary or of long tendrils extending far beyond the boundary of the ball, provided the volume of these features is negligible compared to n.

Our main result is the following, which rules out the possibility of holes far from the boundary or of long tendrils in the rotor-router shape. For $r \ge 0$ let $B_r = \{x \in \mathbb{Z}^d : |x| < r\}$.

Theorem 1.1. Let $A_n$ be the region formed by rotor-router aggregation in $\mathbb{Z}^d$ starting from n particles at the origin and any initial rotor state. There exist constants $c, c'$ depending only on d, such that $B_{r - c \log r} \subset A_n \subset B_{r(1 + c' r^{-1/d} \log r)}$, where $r = (n/\omega_d)^{1/d}$, and $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.

We remark that the same result holds when the rotors evolve according to stacks of bounded discrepancy; see the remark following Lemma 5.1.

Internal diffusion limited aggregation (“internal DLA”) is an analogous growth model defined using random walks instead of rotor-router walks. Starting with n particles at the origin, each particle in turn performs simple random walk until it reaches an unoccupied site. Lawler, Bramson and Griffeath [10] showed that for internal DLA in $\mathbb{Z}^d$, the occupied region $A_n$, rescaled by a factor of $n^{1/d}$, converges with probability one to a Euclidean ball in $\mathbb{R}^d$ as $n \to \infty$. Lawler [11] estimated the rate of convergence. By way of comparison with Theorem 1.1, if $I_n$ is the internal DLA region formed from n particles started at the origin, the best known bounds [11] are (up to logarithmic factors) $B_{r - r^{1/3}} \subset I_n \subset B_{r + r^{1/3}}$ for all sufficiently large n, with probability one.
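For comparison with the deterministic model, internal DLA as described above can be simulated with the same loop, replacing the rotor by a uniformly random step. The sketch below is illustrative only; the function name `internal_dla` and the `seed` parameter are assumptions introduced here to make runs reproducible.

```python
import random

# The four lattice directions in Z^2.
STEPS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def internal_dla(n, seed=0):
    """Return the internal DLA region I_n in Z^2 formed by n particles.

    Each particle starts at the origin and performs simple random walk
    until it reaches an unoccupied site, which it then occupies.
    """
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n):
        x = (0, 0)
        while x in occupied:
            dx, dy = rng.choice(STEPS)   # one step of simple random walk
            x = (x[0] + dx, x[1] + dy)
        occupied.add(x)
    return occupied


if __name__ == "__main__":
    region = internal_dla(500)
    print(len(region), max(abs(x) + abs(y) for x, y in region))
```

Unlike the rotor-router aggregate, the region returned here depends on the random seed, which is why the bounds quoted for $I_n$ hold only with probability one and for sufficiently large n.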
We also study another model which is slightly more difficult to define, but much easier to analyze. In the divisible sandpile, each site $x \in \mathbb{Z}^d$ starts with a quantity of “mass” $\nu_0(x) \in \mathbb{R}_{\ge 0}$. A site topples by keeping up to mass 1 for itself, and distributing the excess (if any) equally among its neighbors. Thus if x has mass $m > 1$, then each of the 2d neighboring sites gains mass $(m-1)/2d$ when we topple x, and x is left with mass 1; if $m \le 1$, then no mass moves when we topple x.

Figure 2: Classical abelian sandpile aggregate of one million particles in $\mathbb{Z}^2$. Colors represent the number of grains at each site.

Note that individual topplings do not commute; however, the divisible sandpile is “abelian” in the following sense.

Proposition 1.2. Let $x_1, x_2, \ldots \in \mathbb{Z}^d$ be a sequence with the property that for any $x \in \mathbb{Z}^d$ there are infinitely many terms $x_k = x$. Let $u_k(x)$ = total mass emitted by x after toppling $x_1, \ldots, x_k$; $\nu_k(x)$ = amount of mass present at x after toppling $x_1, \ldots, x_k$. Then $u_k \uparrow u$ and $\nu_k \to \nu \le 1$. Moreover, the limits u and ν are independent of the sequence $\{x_k\}$.

The abelian property can be generalized as follows: after performing some topplings, we can add some additional mass and then continue toppling. The resulting limits u and ν will be the same as in the case when all mass was initially present. For a further generalization, see [16].

The limiting function u in Proposition 1.2 is the odometer function for the divisible sandpile. This function plays a central role in our analysis. The limit ν represents the final mass distribution. Sites $x \in \mathbb{Z}^d$ with $\nu(x) = 1$ are called fully occupied. Proposition 1.2 is proved in section 3, along with the following.

Theorem 1.3. For $m \ge 0$ let $D_m \subset \mathbb{Z}^d$ be the domain of fully occupied sites for the divisible sandpile formed from a pile of mass m at the origin. There exist constants $c, c'$ depending only on d, such that $B_{r - c} \subset D_m \subset B_{r + c'}$, where $r = (m/\omega_d)^{1/d}$ and $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.
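The toppling rule and the abelian property can be explored numerically. The following Python sketch works in $\mathbb{Z}^2$ and approximates the limit by toppling until no site exceeds mass $1 + \text{tol}$; the function names, the tolerance, and the two sweep orders (`reverse=False/True`) are assumptions made for illustration, not part of the paper.

```python
import math
from collections import defaultdict

NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def divisible_sandpile(mass, tol=1e-9, reverse=False):
    """Topple a pile of `mass` at the origin of Z^2 until every site has
    mass <= 1 + tol. Returns (final mass, odometer) as plain dicts.

    `reverse` only changes the order in which unstable sites are toppled,
    so two runs can be compared to illustrate the abelian property.
    """
    mu = defaultdict(float)
    odometer = defaultdict(float)
    mu[(0, 0)] = float(mass)
    while True:
        unstable = sorted((x for x, m in mu.items() if m > 1.0 + tol),
                          reverse=reverse)
        if not unstable:
            return dict(mu), dict(odometer)
        for x in unstable:
            excess = mu[x] - 1.0
            if excess <= 0.0:
                continue
            mu[x] = 1.0                      # keep mass 1 ...
            odometer[x] += excess            # ... and emit the excess,
            for dx, dy in NEIGHBORS:         # split equally among 2d = 4 neighbors
                mu[(x[0] + dx, x[1] + dy)] += excess / 4.0


def fully_occupied(mu, tol=1e-6):
    """Sites whose final mass is (numerically) equal to 1."""
    return {x for x, m in mu.items() if m >= 1.0 - tol}


def radii(domain):
    """Inner radius (smallest |x| over nearby sites outside the domain) and
    outer radius (largest |x| over sites inside it)."""
    outer = max(math.hypot(*x) for x in domain)
    bound = int(outer) + 2
    inner = min(math.hypot(i, j)
                for i in range(-bound, bound + 1)
                for j in range(-bound, bound + 1)
                if (i, j) not in domain)
    return inner, outer


if __name__ == "__main__":
    m = 200.0
    mu, _ = divisible_sandpile(m)
    print("r =", math.sqrt(m / math.pi), "radii =", radii(fully_occupied(mu)))
```

Running the sketch with both sweep orders and comparing the returned masses gives a quick empirical check of Proposition 1.2, and the printed radii can be compared with $r = (m/\pi)^{1/2}$ from Theorem 1.3.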
The divisible sandpile is similar to the “oil game” studied by Van den Heuvel [22]. In the terminology of [5], it also corresponds to the $h \to -\infty$ limit of the classical abelian sandpile (defined below), that is, the abelian sandpile started from the initial condition in which every site has a very deep “hole.”

In the classical abelian sandpile model [1], each site in $\mathbb{Z}^d$ has an integer number of grains of sand; if a site has at least 2d grains, it topples, sending one grain to each neighbor. If n grains of sand are started at the origin in $\mathbb{Z}^d$, write $S_n$ for the set of sites that are visited during the toppling process; in particular, although a site may be empty in the final state, we include it in $S_n$ if it was occupied at any time during the evolution to the final state.

Until now the best known constraints on the shape of $S_n$ in two dimensions were due to Le Borgne and Rossin [12], who proved that $\{x \in \mathbb{Z}^2 : |x_1| + |x_2| \le \sqrt{n/12} - 1\} \subset S_n \subset \{x \in \mathbb{Z}^2 : |x_1|, |x_2| \le \sqrt{n}/2\}$. Fey and Redig [5] proved analogous bounds in higher dimensions, and extended these bounds to arbitrary values of the height parameter h. This parameter is discussed in section 4.

The methods used to prove the near-perfect circularity of the divisible sandpile shape in Theorem 1.3 can be used to give constraints on the shape of the classical abelian sandpile, improving on the bounds of [5] and [12].

Theorem 1.4. Let $S_n$ be the set of sites that are visited by the classical abelian sandpile model in $\mathbb{Z}^d$, starting from n particles at the origin. Write $n = \omega_d r^d$. Then for any $\epsilon > 0$ we have $B_{c_1 r - c_2} \subset S_n \subset B_{c_1' r + c_2'}$, where $c_1 = (2d - 1)^{-1/d}$, $c_1' = (d - \epsilon)^{-1/d}$. The constant $c_2$ depends only on d, while $c_2'$ depends only on d and ε.

Figure 3: Known bounds on the shape of the classical abelian sandpile in $\mathbb{Z}^2$. The inner diamond and outer square are due to Le Borgne and Rossin [12]; the inner and outer circles are those in Theorem 1.4.
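A small simulation makes the definition of $S_n$ concrete. The Python sketch below implements the classical model in $\mathbb{Z}^2$ (threshold $2d = 4$) and records both the visited sites and the sites that topple; the function name `abelian_sandpile` and the trick of toppling a site several times at once (which relies on the abelian property) are choices made for this illustration.

```python
from collections import defaultdict

NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def abelian_sandpile(n):
    """Stabilize n grains placed at the origin of Z^2.

    Returns (height, visited, topples):
      height  -- final number of grains at each site,
      visited -- the set S_n of sites that ever held a grain,
      topples -- number of topplings at each site that toppled.
    """
    height = defaultdict(int)
    topples = defaultdict(int)
    height[(0, 0)] = n
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x = stack.pop()
        if height[x] < 4:
            continue
        t = height[x] // 4           # topple t times at once (order does not matter)
        height[x] -= 4 * t
        topples[x] += t
        for dx, dy in NEIGHBORS:
            y = (x[0] + dx, x[1] + dy)
            height[y] += t           # each toppling sends one grain to each neighbor
            visited.add(y)
            if height[y] >= 4:
                stack.append(y)
    return dict(height), visited, dict(topples)


if __name__ == "__main__":
    height, visited, topples = abelian_sandpile(1000)
    print(len(visited), len(topples), max(abs(x) + abs(y) for x, y in visited))
```

The set of toppled sites returned here is the set $T_n$ used later in the paper, and the sketch makes it easy to observe the inclusion of $T_n$ in $S_n$ for small n.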
Note that Theorem 1.4 does not settle the question of the asymptotic shape of $S_n$, and indeed it is not clear from simulations whether the asymptotic shape in two dimensions is a disc or perhaps a polygon (Figure 2). To our knowledge even the existence of an asymptotic shape is not known.

The rest of the paper is organized as follows. In section 2, we derive the basic Green’s function estimates that are used in the proofs of Theorems 1.1, 1.3 and 1.4. In section 3 we prove Proposition 1.2 and Theorem 1.3 for the divisible sandpile. In section 4 we adapt the methods of the previous section to prove Theorem 1.4 for the classical abelian sandpile model. Section 5 is devoted to the proof of Theorem 1.1.

2 Basic Estimate

Write $(X_k)_{k \ge 0}$ for simple random walk in $\mathbb{Z}^d$, and for $d \ge 3$ denote by $g(x) = \mathbb{E}_o \#\{k \mid X_k = x\}$ the expected number of visits to x by simple random walk started at the origin. This is the discrete harmonic Green’s function in $\mathbb{Z}^d$; it satisfies $\Delta g(x) = 0$ for $x \neq o$, and $\Delta g(o) = -1$, where Δ is the discrete Laplacian $\Delta g(x) = \frac{1}{2d} \sum_{y \sim x} g(y) - g(x)$. The sum is over the 2d lattice neighbors y of x. In dimension d = 2, simple random walk is recurrent, so the expectation defining g(x) is infinite. Here we define the potential kernel

$g(x) = \lim_{n \to \infty} g_n(x) - g_n(o)$ (1)

where $g_n(x) = \mathbb{E}_o \#\{k \le n \mid X_k = x\}$. The limit defining g(x) in (1) is finite [9, 20], and g(x) has Laplacian $\Delta g(x) = 0$ for $x \neq o$, and $\Delta g(o) = -1$. Note that (1) is the negative of the usual definition of the potential kernel; we have chosen this sign convention so that g has the same Laplacian in dimension two as in higher dimensions.

Fix a real number $m > 0$ and consider the function on $\mathbb{Z}^d$

$\tilde{\gamma}_d(x) = |x|^2 + m g(x)$. (2)

Let r be such that $m = \omega_d r^d$, and let

$\gamma_d(x) = \tilde{\gamma}_d(x) - \tilde{\gamma}_d(\lfloor r \rfloor e_1)$ (3)

where $e_1$ is the first standard basis vector in $\mathbb{Z}^d$. The function $\gamma_d$ plays a central role in our analysis. To see where it comes from, recall the divisible sandpile odometer function of Proposition 1.2: u(x) = total mass emitted from x.
Let $D_m \subset \mathbb{Z}^d$ be the domain of fully occupied sites for the divisible sandpile formed from a pile of mass m at the origin. For $x \in D_m$, since each neighbor y of x emits an equal amount of mass to each of its 2d neighbors, we have

$\Delta u(x) = \frac{1}{2d} \sum_{y \sim x} u(y) - u(x)$ = mass received by x − mass emitted by x $= 1 - m\delta_{ox}$.

Moreover, u = 0 on $\partial D_m$. By construction, the function $\gamma_d$ obeys the same Laplacian condition: $\Delta \gamma_d = 1 - m\delta_o$; and as we will see shortly, $\gamma_d \approx 0$ on $\partial B_r$. Since we expect the domain $D_m$ to be close to the ball $B_r$, we should expect that $u \approx \gamma_d$. In fact, we will first show that u is close to $\gamma_d$, and then use this to conclude that $D_m$ is close to $B_r$.

We will use the following estimates for the Green’s function [6, 21]; see also [9, Theorems 1.5.4 and 1.6.2].

$$g(x) = \begin{cases} -\frac{2}{\pi} \log |x| + \kappa + O(|x|^{-2}), & d = 2 \\ a_d |x|^{2-d} + O(|x|^{-d}), & d \ge 3. \end{cases} \qquad (4)$$

Here $a_d = \frac{2}{(d-2)\omega_d}$, where $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$, and κ is a constant whose value we will not need to know. For $x \in \mathbb{Z}^d$ we use |x| to denote the Euclidean norm of x. Here and throughout the paper, constants in error terms denoted O(·) depend only on d.

We will need an estimate for $\gamma_d$ near the boundary of the ball $B_r$. We first consider dimension d = 2. From (4) we have

$\tilde{\gamma}_2(x) = \phi(x) + \kappa m + O(m|x|^{-2})$, (5)

where $\phi(x) = |x|^2 - \frac{2m}{\pi} \log |x|$. In the Taylor expansion of φ about |x| = r

$\phi(x) = \phi(r) - \phi'(r)(r - |x|) + \frac{1}{2}\phi''(t)(r - |x|)^2$ (6)

the linear term vanishes, leaving

$\gamma_2(x) = \left(1 + \frac{r^2}{t^2}\right)(r - |x|)^2 + O(m|x|^{-2})$ (7)

for some t between |x| and r.

In dimensions $d \ge 3$, from (4) we have $\tilde{\gamma}_d(x) = |x|^2 + a_d m |x|^{2-d} + O(m|x|^{-d})$. Setting $\phi(x) = |x|^2 + a_d m |x|^{2-d}$, the linear term in the Taylor expansion (6) of φ about |x| = r again vanishes, yielding

$\gamma_d(x) = \left(1 + (d-1)(r/t)^d\right)(r - |x|)^2 + O(m|x|^{-d})$

for t between |x| and r. Together with (7), this yields the following estimates in all dimensions $d \ge 2$.

Lemma 2.1. Let $\gamma_d$ be given by (3). For all $x \in \mathbb{Z}^d$ we have

$\gamma_d(x) \ge (r - |x|)^2 + O\left(\frac{r^d}{|x|^d}\right)$. (8)

Lemma 2.2. Let $\gamma_d$ be given by (3). Then uniformly in r, $\gamma_d(x) = O(1)$ for $x \in B_{r+1} - B_{r-1}$.
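The claim that the linear term of the Taylor expansion vanishes can be checked by a short computation. The derivation below is a sketch under stated assumptions: it takes the leading constants of the Green's function to be $-2/\pi$ in dimension two and $a_d = 2/((d-2)\omega_d)$ in higher dimensions, uses $m = \omega_d r^d$, and writes $s = |x|$ for the radial variable (a symbol introduced here).

```latex
% Assumptions: g(x) = -(2/\pi)\log|x| + \kappa + O(|x|^{-2}) for d = 2,
% a_d = 2/((d-2)\omega_d) for d >= 3, m = \omega_d r^d, and s = |x|.
\begin{align*}
d = 2:\quad
  & \phi(s) = s^2 - \tfrac{2m}{\pi}\log s, \\
  & \phi'(s) = 2s - \tfrac{2m}{\pi s},
    \qquad \phi'(r) = 2r - \tfrac{2\pi r^2}{\pi r} = 0, \\
  & \tfrac{1}{2}\phi''(t) = 1 + \tfrac{m}{\pi t^2} = 1 + (r/t)^2; \\[4pt]
d \ge 3:\quad
  & \phi(s) = s^2 + a_d\, m\, s^{2-d}, \\
  & \phi'(s) = 2s - (d-2)\,a_d\, m\, s^{1-d} = 2s - 2 r^d s^{1-d},
    \qquad \phi'(r) = 0, \\
  & \tfrac{1}{2}\phi''(t) = 1 + (d-1)(r/t)^d .
\end{align*}
```

In both cases the coefficient of $(r - |x|)^2$ is at least 1, which is the source of the lower bound $(r - |x|)^2$ in Lemma 2.1.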
The following lemma is useful for x near the origin, where the error term in (8) blows up.

Lemma 2.3. Let $\gamma_d$ be given by (3). Then for sufficiently large r, we have $\gamma_d(x) > \frac{r^2}{4}$ for all $x \in B_{r/3}$.

Proof. Since $\gamma_d(x) - |x|^2$ is superharmonic, it attains its minimum in $B_{r/3}$ at a point z on the boundary. Thus for any $x \in B_{r/3}$, $\gamma_d(x) - |x|^2 \ge \gamma_d(z) - |z|^2$, hence by Lemma 2.1 $\gamma_d(x) \ge (2r/3)^2 - (r/3)^2 + O(1) > \frac{r^2}{4}$.

Lemmas 2.1 and 2.3 together imply the following.

Lemma 2.4. Let $\gamma_d$ be given by (3). There is a constant a depending only on d, such that $\gamma_d \ge -a$ everywhere.

3 Divisible Sandpile

Let $\mu_0$ be a nonnegative function on $\mathbb{Z}^d$ with finite support. We start with mass $\mu_0(y)$ at each site y. The operation of toppling a vertex x yields the mass distribution $T_x \mu_0 = \mu_0 + \alpha(x)\Delta\delta_x$, where $\alpha(x) = \max(\mu_0(x) - 1, 0)$ and Δ is the discrete Laplacian on $\mathbb{Z}^d$. Thus if $\mu_0(x) \le 1$ then $T_x \mu_0 = \mu_0$ and no mass topples; if $\mu_0(x) > 1$ then the mass in excess of 1 is distributed equally among the neighbors of x.

Let $x_1, x_2, \ldots \in \mathbb{Z}^d$ be a sequence with the property that for any $x \in \mathbb{Z}^d$ there are infinitely many terms $x_k = x$. Let $\mu_k(y) = T_{x_k} \cdots T_{x_1} \mu_0(y)$ be the amount of mass present at y after toppling the sites $x_1, \ldots, x_k$ in succession. The total mass emitted from y during this process is

$u_k(y) := \sum_{j \le k : x_j = y} \mu_{j-1}(y) - \mu_j(y) = \sum_{j \le k : x_j = y} \alpha_j(y)$ (9)

where $\alpha_j(y) = \max(\mu_j(y) - 1, 0)$.

Lemma 3.1. As $k \uparrow \infty$ the functions $u_k$ and $\mu_k$ tend to limits $u_k \uparrow u$ and $\mu_k \to \mu$. Moreover, these limits satisfy $\mu = \mu_0 + \Delta u \le 1$.

Proof. Write $M = \sum_x \mu_0(x)$ for the total starting mass, and let $B \subset \mathbb{Z}^d$ be a ball centered at the origin containing all points within $L^1$-distance M of the support of $\mu_0$. Note that if $\mu_k(x) > 0$ and $\mu_0(x) = 0$, then x must have received its mass from a neighbor, so $\mu_k(y) \ge 1$ for some $y \sim x$. Since $\sum_x \mu_k(x) = M$, it follows that $\mu_k$ is supported on B. Let R be the radius of B, and consider the quadratic weight $Q_k = \sum_{x \in \mathbb{Z}^d} \mu_k(x)|x|^2 \le MR^2$.
Since $\mu_k(x_k) - \mu_{k-1}(x_k) = -\alpha_k(x_k)$ and for $y \sim x_k$ we have $\mu_k(y) - \mu_{k-1}(y) = \frac{1}{2d}\alpha_k(x_k)$, we obtain

$Q_k - Q_{k-1} = \alpha_k(x_k) \left( \frac{1}{2d} \sum_{y \sim x_k} |y|^2 - |x_k|^2 \right) = \alpha_k(x_k)$.

Summing over k we obtain from (9) $Q_k = Q_0 + \sum_{x \in \mathbb{Z}^d} u_k(x)$. Fixing x, the sequence $u_k(x)$ is thus increasing and bounded above, hence convergent.

Given neighboring vertices $x \sim y$, since y emits an equal amount of mass to each of its 2d neighbors, it emits mass $u_k(y)/2d$ to x up to time k. Thus x receives a total mass of $\frac{1}{2d} \sum_{y \sim x} u_k(y)$ from its neighbors up to time k. Comparing the amount of mass present at x before and after toppling, we obtain $\mu_k(x) = \mu_0(x) + \Delta u_k(x)$. Since $u_k \uparrow u$ we infer that $\mu_k \to \mu := \mu_0 + \Delta u$. Note that if $x_k = x$, then $\mu_k(x) \le 1$. Since for each $x \in \mathbb{Z}^d$ this holds for infinitely many values of k, the limit satisfies $\mu \le 1$.

A function s on $\mathbb{Z}^d$ is superharmonic if $\Delta s \le 0$. Given a function γ on $\mathbb{Z}^d$ the least superharmonic majorant of γ is the function $s(x) = \inf\{f(x) \mid f \text{ is superharmonic and } f \ge \gamma\}$. The study of the least superharmonic majorant is a classical topic in analysis and PDE; see, for example, [8]. Note that if f is superharmonic and $f \ge \gamma$, then

$f(x) \ge \frac{1}{2d} \sum_{y \sim x} f(y) \ge \frac{1}{2d} \sum_{y \sim x} s(y)$.

Taking the infimum on the left side we obtain that s is superharmonic.

Lemma 3.2. The limit u in Lemma 3.1 is given by $u = s + \gamma$, where $\gamma(x) = |x|^2 + \sum_{y \in \mathbb{Z}^d} g(x - y)\mu_0(y)$ and s is the least superharmonic majorant of −γ.

Proof. By Lemma 3.1 we have $\Delta u = \mu - \mu_0 \le 1 - \mu_0$. Since $\Delta\gamma = 1 - \mu_0$, the difference $u - \gamma$ is superharmonic. As u is nonnegative, it follows that $u - \gamma \ge s$. For the reverse inequality, note that $s + \gamma - u$ is superharmonic on the domain $D = \{x \mid \mu(x) = 1\}$ of fully occupied sites and is nonnegative outside D, hence nonnegative inside D as well.

As a corollary of Lemmas 3.1 and 3.2, we obtain the abelian property of the divisible sandpile, Proposition 1.2, which was stated in the introduction.

We now turn to the case of a point source mass m started at the origin: $\mu_0 = m\delta_o$.
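The two bookkeeping identities used in the proof of Lemma 3.1, namely $\mu_k = \mu_0 + \Delta u_k$ and $Q_k = Q_0 + \sum_x u_k(x)$, hold after any finite sequence of topplings and can be checked numerically. The Python sketch below does this in $\mathbb{Z}^2$; the function names and the round-robin toppling order over a fixed list of sites are assumptions made for the illustration.

```python
from collections import defaultdict

NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def topple_sequence(mu0, sites, rounds):
    """Topple the listed sites in order, `rounds` times over.

    Returns (mu, u): the mass distribution and the total mass emitted
    from each site after all topplings, as plain dicts.
    """
    mu = defaultdict(float, mu0)
    u = defaultdict(float)
    for _ in range(rounds):
        for x in sites:
            excess = max(mu[x] - 1.0, 0.0)   # mass in excess of 1 at x
            mu[x] -= excess
            u[x] += excess
            for dx, dy in NEIGHBORS:
                mu[(x[0] + dx, x[1] + dy)] += excess / 4.0
    return dict(mu), dict(u)


def laplacian(f, x):
    """Discrete Laplacian on Z^2: average of f over the 4 neighbors minus f(x)."""
    avg = sum(f.get((x[0] + dx, x[1] + dy), 0.0) for dx, dy in NEIGHBORS) / 4.0
    return avg - f.get(x, 0.0)


def quadratic_weight(mu):
    """Q = sum over sites of mu(x) |x|^2."""
    return sum(m * (x[0] ** 2 + x[1] ** 2) for x, m in mu.items())


if __name__ == "__main__":
    mu0 = {(0, 0): 10.0, (2, 1): 3.0}
    sites = [(i, j) for i in range(-6, 7) for j in range(-6, 7)]
    mu, u = topple_sequence(mu0, sites, rounds=20)
    worst = max(abs(mu.get(x, 0.0) - mu0.get(x, 0.0) - laplacian(u, x))
                for x in [(i, j) for i in range(-8, 9) for j in range(-8, 9)])
    print(worst, quadratic_weight(mu) - quadratic_weight(mu0) - sum(u.values()))
```

Both printed quantities should be zero up to floating-point rounding, whatever starting distribution and toppling order are chosen.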
More general starting distributions are treated in [16], where we identify the scaling limit of the divisible sandpile model and show that it coincides with that of internal DLA and of the rotor-router model. In the case of a point source of mass m, the natural question is to identify the shape of the resulting domain $D_m$ of fully occupied sites, i.e. sites x for which $\mu(x) = 1$. According to Theorem 3.3, $D_m$ is extremely close to a ball of volume m; in fact, the error in the radius is a constant independent of m. As before, for $r \ge 0$ we write $B_r = \{x \in \mathbb{Z}^d : |x| < r\}$ for the lattice ball of radius r centered at the origin.

Theorem 3.3. For $m \ge 0$ let $D_m \subset \mathbb{Z}^d$ be the domain of fully occupied sites for the divisible sandpile formed from a pile of size m at the origin. There exist constants $c, c'$ depending only on d, such that $B_{r - c} \subset D_m \subset B_{r + c'}$, where $r = (m/\omega_d)^{1/d}$ and $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.

The idea of the proof is to use Lemma 3.2 along with the basic estimates on γ, Lemmas 2.1 and 2.2, to obtain estimates on the odometer function u(x) = total mass emitted from x. We will need the following simple observation.

Lemma 3.4. For every point $x \in D_m - \{o\}$ there is a path $x = x_0 \sim x_1 \sim \ldots \sim x_k = o$ in $D_m$ with $u(x_{i+1}) \ge u(x_i) + 1$.

Proof. If $x_i \in D_m - \{o\}$, let $x_{i+1}$ be a neighbor of $x_i$ maximizing $u(x_{i+1})$. Then $x_{i+1} \in D_m$ and

$u(x_{i+1}) \ge \frac{1}{2d} \sum_{y \sim x_i} u(y) = u(x_i) + \Delta u(x_i) = u(x_i) + 1$,

where in the last step we have used the fact that $x_i \in D_m$.

Proof of Theorem 3.3. We first treat the inner estimate. Let $\gamma_d$ be given by (3). By Lemma 3.2 the function $u - \gamma_d$ is superharmonic, so its minimum in the ball $B_r$ is attained on the boundary. Since $u \ge 0$, we have by Lemma 2.2 $u(x) - \gamma_d(x) \ge -C$ for $x \in \partial B_r$, for a constant C depending only on d. Hence by Lemma 2.1,

$u(x) \ge (r - |x|)^2 - C' r^d/|x|^d, \quad x \in B_r$, (10)

for a constant $C'$ depending only on d. It follows that there is a constant c, depending only on d, such that $u(x) > 0$ whenever $r/3 \le |x| < r - c$. Thus $B_{r-c} - B_{r/3} \subset D_m$.
For $x \in B_{r/3}$, by Lemma 2.3 we have $u(x) \ge r^2/4 - C > 0$, hence $B_{r/3} \subset D_m$.

For the outer estimate, note that $u - \gamma_d$ is harmonic on $D_m$. By Lemma 2.4 we have $\gamma_d \ge -a$ everywhere, where a depends only on d. Since u vanishes on $\partial D_m$ it follows that $u - \gamma_d \le a$ on $D_m$. Now for any $x \in D_m$ with $r - 1 < |x| \le r$, we have by Lemma 2.2 $u(x) \le \gamma_d(x) + a \le c'$ for a constant $c'$ depending only on d. Lemma 3.4 now implies that $D_m \subset B_{r + c' + 1}$.

4 Classical Sandpile

We consider a generalization of the classical abelian sandpile, proposed by Fey and Redig [5]. Each site in $\mathbb{Z}^d$ begins with a “hole” of depth H. Thus, each site absorbs the first H grains it receives, and thereafter functions normally, toppling once for each additional 2d grains it receives. If H is negative, we can interpret this as saying that every site starts with h = −H grains of sand already present. Aggregation is only well-defined in the regime $h \le 2d - 2$, since for $h = 2d - 1$ the addition of a single grain already causes every site in $\mathbb{Z}^d$ to topple infinitely often.

Let $S_{n,H}$ be the set of sites that are visited if n particles start at the origin in $\mathbb{Z}^d$. Fey and Redig [5, Theorem 4.7] prove that lim sup $\# (S_{n,H} \,\triangle\, B_{H^{-1/d} r}) = 0$, where $n = \omega_d r^d$, and △ denotes symmetric difference. The following theorem strengthens this result.

Theorem 4.1. Fix an integer $H \ge 2 - 2d$. Let $S_n = S_{n,H}$ be the set of sites that are visited by the classical abelian sandpile model in $\mathbb{Z}^d$, starting from n particles at the origin, if every lattice site begins with a hole of depth H. Write $n = \omega_d r^d$. Then $B_{c_1 r - c_2} \subset S_{n,H}$, where $c_1 = (2d - 1 + H)^{-1/d}$ and $c_2$ is a constant depending only on d. Moreover if $H \ge 1 - d$, then for any $\epsilon > 0$ we have $S_{n,H} \subset B_{c_1' r + c_2'}$, where $c_1' = (d - \epsilon + H)^{-1/d}$ and $c_2'$ is independent of n but may depend on d, H and ε.

Note that the ratio $c_1/c_1' \uparrow 1$ as $H \uparrow \infty$. Thus, the classical abelian sandpile run from an initial state in which each lattice site starts with a deep hole yields a shape very close to a ball.
Intuitively, one can think of the classical sandpile with deep holes as approximating the divisible sandpile, whose limiting shape is a ball by Theorem 3.3. Following this intuition, we can adapt the proof of Theorem 3.3 to prove Theorem 4.1; just one additional averaging trick is needed, which we explain below.

Consider the odometer function for the abelian sandpile $u(x)$ = total number of grains emitted from $x$. Let $T_n = \{x \mid u(x) > 0\}$ be the set of sites which topple at least once. Then
$$T_n \subset S_n \subset T_n \cup \partial T_n.$$
In the final state, each site which has toppled retains between $0$ and $2d - 1$ grains, in addition to the $H$ that it absorbed. Hence
$$H \le \Delta u(x) + n\delta_{ox} \le 2d - 1 + H, \qquad x \in T_n. \tag{11}$$
We can improve the lower bound by averaging over a small box. For $x \in \mathbb{Z}^d$ let
$$Q_k(x) = \{y \in \mathbb{Z}^d : \|x - y\|_\infty \le k\}$$
be the box of side length $2k + 1$ centered at $x$, and let
$$u^{(k)}(x) = (2k+1)^{-d} \sum_{y \in Q_k(x)} u(y).$$
Write $T_n^{(k)} = \{x \mid Q_k(x) \subset T_n\}$. Le Borgne and Rossin [12] observe that if $T$ is a set of sites all of which topple, the number of grains remaining in $T$ is at least the number of edges internal to $T$: indeed, for each internal edge, the endpoint that topples last sends the other a grain which never moves again. Since the box $Q_k(x)$ has $2dk(2k+1)^{d-1}$ internal edges, we have
$$\Delta u^{(k)}(x) \ge \frac{2k}{2k+1}\, d + H - \frac{n}{(2k+1)^d}\, 1_{Q_k(o)}(x), \qquad x \in T_n^{(k)}. \tag{12}$$
The following lemma is analogous to Lemma 3.4.

Lemma 4.2. For every point $x \in T_n$ adjacent to $\partial T_n$ there is a path $x = x_0 \sim x_1 \sim \ldots \sim x_m = o$ in $T_n$ with $u(x_{i+1}) \ge u(x_i) + 1$.

Proof. By (11) we have
$$\frac{1}{2d}\sum_{y \sim x_i} u(y) \ge u(x_i).$$
Since $u(x_{i-1}) < u(x_i)$, some term $u(y)$ in the sum above must exceed $u(x_i)$. Let $x_{i+1} = y$.

Proof of Theorem 4.1. Let
$$\tilde\xi_d(x) = (2d - 1 + H)|x|^2 + n g(x),$$
and let
$$\xi_d(x) = \tilde\xi_d(x) - \tilde\xi_d(\lfloor c_1 r \rfloor e_1).$$
Taking $m = n/(2d - 1 + H)$ in Lemma 2.2, we have
$$u(x) - \xi_d(x) \ge -\xi_d(x) \ge -C(2d - 1 + H), \qquad x \in \partial B_{c_1 r} \tag{13}$$
for a constant $C$ depending only on $d$. By (11), $u - \xi_d$ is superharmonic, so $u - \xi_d \ge -C(2d - 1 + H)$ in all of $B_{c_1 r}$.
Hence by Lemma 2.1 we have for $x \in B_{c_1 r}$
$$u(x) \ge (2d - 1 + H)\left((c_1 r - |x|)^2 - C'(c_1 r)^d/|x|^d\right), \tag{14}$$
where $C'$ depends only on $d$. It follows that $u$ is positive on $B_{c_1 r - c_2} - B_{c_1 r/3}$ for a suitable constant $c_2$ depending only on $d$. For $x \in B_{c_1 r/3}$, by Lemma 2.3 we have $u(x) > (2d - 1 + H)(c_1^2 r^2/4 - C) > 0$. Thus $B_{c_1 r - c_2} \subset T_n \subset S_n$.

For the outer estimate, let
$$\hat\psi_d(x) = (d - \epsilon + H)|x|^2 + n g(x).$$
Choose $k$ large enough so that $\frac{2k}{2k+1}\, d \ge d - \epsilon$, and define
$$\tilde\psi_d(x) = (2k+1)^{-d} \sum_{y \in Q_k(x)} \hat\psi_d(y).$$
Finally, let
$$\psi_d(x) = \tilde\psi_d(x) - \tilde\psi_d(\lfloor c_1' r \rfloor e_1).$$
By (12), $u^{(k)} - \psi_d$ is subharmonic on $T_n^{(k)}$. Taking $m = n/(d - \epsilon + H)$ in Lemma 2.4, there is a constant $a$ depending only on $d$, such that $\psi_d \ge -a(d + H)$ everywhere. Since $u^{(k)} \le (2d + H)^{(d+1)k}$ on $\partial T_n^{(k)}$ it follows that $u^{(k)} - \psi_d \le a(d + H) + (2d + H)^{(d+1)k}$ on $T_n^{(k)}$. Now for any $x \in S_n$ with $c_1' r - 1 < |x| \le c_1' r$ we have by Lemma 2.2
$$u^{(k)}(x) \le \psi_d(x) + a(d + H) + (2d + H)^{(d+1)k} \le \tilde c_2$$
for a constant $\tilde c_2$ depending only on $d$, $H$ and $\epsilon$. Then $u(x) \le c_2' := (2k+1)^d \tilde c_2$. Lemma 4.2 now implies that $T_n \subset B_{c_1' r + c_2'}$, and hence
$$S_n \subset T_n \cup \partial T_n \subset B_{c_1' r + c_2' + 1}.$$

We remark that the crude bound of $(2d + H)^{(d+1)k}$ used in the proof of the outer estimate can be improved to a bound of order $k^2 H$, and the final factor of $(2k+1)^d$ can be replaced by a constant factor independent of $k$ and $H$, using the fact that a nonnegative function on $\mathbb{Z}^d$ with bounded Laplacian cannot grow faster than quadratically; see [16].

5 Rotor-Router Model

Given a function $f$ on $\mathbb{Z}^d$, for a directed edge $(x, y)$ write
$$\nabla f(x, y) = f(y) - f(x).$$
Given a function $s$ on directed edges in $\mathbb{Z}^d$, write
$$\operatorname{div} s(x) = \frac{1}{2d}\sum_{y \sim x} s(x, y).$$
The discrete Laplacian of $f$ is then given by
$$\Delta f(x) = \operatorname{div} \nabla f(x) = \frac{1}{2d}\sum_{y \sim x} f(y) - f(x).$$

5.1 Inner Estimate

Fixing $n \ge 1$, consider the odometer function for rotor-router aggregation $u(x)$ = total number of exits from $x$ by the first $n$ particles. We learned the idea of using the odometer function to study the rotor-router shape from Matt Cook [2].

Lemma 5.1.
For a directed edge $(x, y)$ in $\mathbb{Z}^d$, denote by $\kappa(x, y)$ the net number of crossings from $x$ to $y$ performed by the first $n$ particles in rotor-router aggregation. Then
$$\nabla u(x, y) = -2d\,\kappa(x, y) + R(x, y) \tag{15}$$
for some edge function $R$ which satisfies $|R(x, y)| \le 4d - 2$ for all edges $(x, y)$.

Remark. In the more general setting of rotor stacks of bounded discrepancy, the $4d - 2$ will be replaced by a different constant here.

Proof. Writing $N(x, y)$ for the number of particles routed from $x$ to $y$, we have
$$\frac{u(x) - 2d + 1}{2d} \le N(x, y) \le \frac{u(x) + 2d - 1}{2d},$$
hence
$$|\nabla u(x, y) + 2d\,\kappa(x, y)| = |u(y) - u(x) + 2dN(x, y) - 2dN(y, x)| \le 4d - 2.$$

In what follows, $C_0, C_1, \ldots$ denote constants depending only on $d$.

Lemma 5.2. Let $\Omega \subset \mathbb{Z}^d - \{o\}$ with $2 \le \#\Omega < \infty$. Then
$$\sum_{y \in \Omega} |y|^{1-d} \le C_0 \operatorname{Diam}(\Omega).$$

Proof. For each positive integer $k$, let $S_k = \{y \in \mathbb{Z}^d : k \le |y| < k + 1\}$. Then
$$\sum_{y \in S_k} |y|^{1-d} \le k^{1-d}\, \#S_k \le C_0'$$
for a constant $C_0'$ depending only on $d$. Since $\Omega$ can intersect at most $\operatorname{Diam}(\Omega) + 1 \le 2\operatorname{Diam}(\Omega)$ distinct sets $S_k$, taking $C_0 = 2C_0'$ the proof is complete.

Lemma 5.3. Let $G = G_{B_r}$ be the Green's function for simple random walk in $\mathbb{Z}^d$ stopped on exiting $B_r$. For any $\rho \ge 1$ and $x \in B_r$,
$$\sum_{y \in B_r,\ |x - y| \le \rho}\ \sum_{z \sim y} |G(x, y) - G(x, z)| \le C_1 \rho. \tag{16}$$

Proof. Let $(X_t)_{t \ge 0}$ denote simple random walk in $\mathbb{Z}^d$, and let $T$ be the first exit time from $B_r$. For fixed $y$, the function
$$A(x) = g(x - y) - E_x g(X_T - y) \tag{17}$$
has Laplacian $\Delta A(x) = -\delta_{xy}$ in $B_r$ and vanishes on $\partial B_r$, hence $A(x) = G(x, y)$. Let $x, y \in B_r$ and $z \sim y$. From (4) we have
$$|g(x - y) - g(x - z)| \le \frac{C_2}{|x - y|^{d-1}}, \qquad y, z \ne x.$$
Using the triangle inequality together with (17), we obtain
$$|G(x, y) - G(x, z)| \le |g(x - y) - g(x - z)| + E_x|g(X_T - y) - g(X_T - z)| \le \frac{C_2}{|x - y|^{d-1}} + \sum_{w \in \partial B_r} H_x(w) \frac{C_2}{|w - y|^{d-1}},$$
where $H_x(w) = P_x(X_T = w)$. Write $D = \{y \in B_r : |x - y| \le \rho\}$. Then
$$\sum_{y \in D,\ y \ne x}\ \sum_{z \sim y,\ z \ne x} |G(x, y) - G(x, z)| \le C_3 \rho + C_2 \sum_{w \in \partial B_r} H_x(w) \sum_{y \in D} |w - y|^{1-d}. \tag{18}$$

Figure 4: Diagram for the Proof of Lemma 5.4.

Taking $\Omega = w - D$ in Lemma 5.2, the inner sum on the right is at most $C_0 \operatorname{Diam}(D) \le 2C_0\rho$, so the right side of (18) is bounded above by $C_1\rho$ for a suitable $C_1$.
Finally, the terms in which $y$ or $z$ coincides with $x$ make a negligible contribution to the sum in (16), since for $y \sim x \in \mathbb{Z}^d$
$$|G(x, x) - G(x, y)| \le |g(o) - g(x - y)| + E_x|g(X_T - x) - g(X_T - y)| \le C_4.$$

Lemma 5.4. Let $H_1, H_2$ be linear half-spaces in $\mathbb{Z}^d$, not necessarily parallel to the coordinate axes. Let $T_i$ be the first hitting time of $H_i$. If $x \notin H_1 \cup H_2$, then
$$P_x(T_1 > T_2) \le \frac{5}{2}\,\frac{h_1 + 1}{h_2}\left(1 + \frac{1}{2h_2}\right)^2$$
where $h_i$ is the distance from $x$ to $H_i$.

Proof. If one of $H_1, H_2$ contains the other, the result is vacuous. Otherwise, let $\tilde H_i$ be the half-space shifted parallel to $H_i^c$ by distance $2h_2$ in the direction of $x$, and let $\tilde T_i$ be the first hitting time of $H_i \cup \tilde H_i$. Let $(X_t)_{t \ge 0}$ denote simple random walk in $\mathbb{Z}^d$, and write $M_t$ for the (signed) distance from $X_t$ to the hyperplane defining the boundary of $H_1$, with $M_0 = h_1$. Then $M_t$ is a martingale with bounded increments. Since $E_x \tilde T_1 < \infty$, we obtain from optional stopping
$$h_1 = E_x M_{\tilde T_1} \ge 2h_2\, P_x(X_{\tilde T_1} \in \tilde H_1) - P_x(X_{\tilde T_1} \in H_1),$$
hence
$$P_x(X_{\tilde T_1} \in \tilde H_1) \le \frac{h_1 + 1}{2h_2}. \tag{19}$$
Likewise, $dM_t^2 - t$ is a martingale with bounded increments, giving
$$E_x \tilde T_1 \le d\, E_x M_{\tilde T_1}^2 \le d(2h_2 + 1)^2\, P_x(X_{\tilde T_1} \in \tilde H_1) \le d(h_1 + 1)(2h_2 + 1)\left(1 + \frac{1}{2h_2}\right). \tag{20}$$
Let $T = \min(\tilde T_1, \tilde T_2)$. Denoting by $D_t$ the distance from $X_t$ to the hyperplane defining the boundary of $H_2$, the quantity
$$N_t = \frac{d}{2}\left(D_t^2 + (2h_2 - D_t)^2\right) - t$$
is a martingale. Writing $p = P_x(T = \tilde T_2)$ we have
$$dh_2^2 = E N_0 = E N_T \ge p\,\frac{d}{2}(2h_2)^2 + (1 - p)\,dh_2^2 - E_x T = (1 + p)\,dh_2^2 - E_x T,$$
hence by (20)
$$p \le \frac{E_x T}{dh_2^2} \le 2\,\frac{h_1 + 1}{h_2}\left(1 + \frac{1}{2h_2}\right)^2.$$
Finally by (19)
$$P_x(T_1 > T_2) \le p + P_x(X_{\tilde T_1} \in \tilde H_1) \le \frac{5}{2}\,\frac{h_1 + 1}{h_2}\left(1 + \frac{1}{2h_2}\right)^2.$$

Lemma 5.5. Let $x \in B_r$ and let $\rho = r + 1 - |x|$. Let
$$S_k^* = \{y \in B_r : 2^k\rho < |x - y| \le 2^{k+1}\rho\}. \tag{21}$$
Let $\tau_k$ be the first hitting time of $S_k^*$, and $T$ the first exit time from $B_r$. Then
$$P_x(\tau_k < T) \le C_2 2^{-k}.$$

Figure 5: Diagram for the proof of Lemma 5.5.

Proof. Let $H$ be the outer half-space tangent to $B_r$ at the point $z \in \partial B_r$ closest to $x$. Let $Q$ be the cube of side length $2^k\rho/\sqrt{d}$ centered at $x$. Then $Q$ is disjoint from $S_k^*$, hence
$$P_x(\tau_k < T) \le P_x(T_{\partial Q} < T) \le P_x(T_{\partial Q} < T_H)$$
where $T_{\partial Q}$ and $T_H$ are the first hitting times of $\partial Q$ and $H$. Let $H_1, \ldots, H_{2d}$ be the outer half-spaces defining the faces of $Q$, so that $Q = H_1^c \cap \ldots \cap H_{2d}^c$. By Lemma 5.4 we have
$$P_x(T_{\partial Q} < T_H) \le \sum_{i=1}^{2d} P_x(T_{H_i} < T_H) \le \frac{5}{2}\sum_{i=1}^{2d} \frac{\operatorname{dist}(x, H) + 1}{\operatorname{dist}(x, H_i)}\left(1 + \frac{1}{2\operatorname{dist}(x, H_i)}\right)^2.$$
Since $\operatorname{dist}(x, H) = |x - z| \le \rho$ and $\operatorname{dist}(x, H_i) = 2^{k-1}\rho/\sqrt{d}$, and $\rho \ge 1$, taking $C_2 = 20\, d^{3/2}(1 + \sqrt{d})^2$ completes the proof.

Lemma 5.6. Let $G = G_{B_r}$ be the Green's function for random walk stopped on exiting $B_r$. Let $x \in B_r$ and let $\rho = r + 1 - |x|$. Then
$$\sum_{y \in B_r}\ \sum_{z \sim y} |G(x, y) - G(x, z)| \le C_3\, \rho \log\frac{r}{\rho}.$$

Proof. Let $S_k^*$ be given by (21), and let
$$W = \{w \in \partial(S_k^* \cup \partial S_k^*) : |w - x| < 2^k\rho\}$$
be the portion of the boundary of the enlarged spherical shell $S_k^* \cup \partial S_k^*$ lying closer to $x$. Let $\tau_W$ be the first hitting time of $W$, and $T$ the first exit time from $B_r$. For $w \in W$ let
$$H_x(w) = P_x(X_{\tau_W \wedge T} = w).$$
For any $y \in S_k^*$ and $z \sim y$, simple random walk started at $x$ must hit $W$ before hitting either $y$ or $z$, hence
$$|G(x, y) - G(x, z)| \le \sum_{w \in W} H_x(w)\,|G(w, y) - G(w, z)|.$$
For any $y \in S_k^*$ and any $w \in W$ we have
$$|y - w| \le |y - x| + |w - x| \le 3 \cdot 2^k\rho.$$
Lemma 5.3 yields
$$\sum_{y \in S_k^*}\ \sum_{z \sim y} |G(x, y) - G(x, z)| \le 3C_1 2^k\rho \sum_{w \in W} H_x(w).$$
By Lemma 5.5 we have $\sum_{w \in W} H_x(w) \le C_2 2^{-k}$, so the above sum is at most $3C_1C_2\rho$. Since the union of shells $S_0^*, S_1^*, \ldots, S_{\lceil \log_2(r/\rho) \rceil}^*$ covers all of $B_r$ except for those points $y$ within distance $\rho$ of $x$, and $\sum_{|y - x| \le \rho} \sum_{z \sim y} |G(x, y) - G(x, z)| \le C_1\rho$ by Lemma 5.3, the result follows.

Proof of Theorem 1.1, Inner Estimate. Let $\kappa$ and $R$ be defined as in Lemma 5.1. Since the net number of particles to enter a site $x \ne o$ is at most one, we have $2d \operatorname{div}\kappa(x) \ge -1$. Likewise $2d \operatorname{div}\kappa(o) = n - 1$. Taking the divergence in (15), we obtain
$$\Delta u(x) \le 1 + \operatorname{div} R(x), \qquad x \ne o; \tag{22}$$
$$\Delta u(o) = 1 - n + \operatorname{div} R(o). \tag{23}$$
Let $T$ be the first exit time from $B_r$, and define
$$f(x) = E_x u(X_T) - E_x T + n\, E_x \#\{j < T \mid X_j = o\}.$$
Then $\Delta f(x) = 1$ for $x \in B_r - \{o\}$ and $\Delta f(o) = 1 - n$. Moreover $f \ge 0$ on $\partial B_r$. It follows from Lemma 2.2 with $m = n$ that $f \ge \gamma - C_4$ on $B_r$ for a suitable constant $C_4$.
We have
$$u(x) - E_x u(X_T) = \sum_{k \ge 0} E_x\left(u(X_{k \wedge T}) - u(X_{(k+1) \wedge T})\right).$$
Each summand on the right side is zero on the event $\{T \le k\}$, hence
$$E_x\left(u(X_{k \wedge T}) - u(X_{(k+1) \wedge T}) \mid \mathcal{F}_{k \wedge T}\right) = -\Delta u(X_k)\, 1_{\{T > k\}}.$$
Taking expectations and using (22) and (23), we obtain
$$u(x) - E_x u(X_T) \ge \sum_{k \ge 0} E_x\left[1_{\{T > k\}}\left(n 1_{\{X_k = o\}} - 1 - \operatorname{div} R(X_k)\right)\right] = n\, E_x \#\{k < T \mid X_k = o\} - E_x T - \sum_{k \ge 0} E_x\left[1_{\{T > k\}} \operatorname{div} R(X_k)\right],$$
hence
$$u(x) - f(x) \ge -\frac{1}{2d}\sum_{k \ge 0} E_x\left[1_{\{T > k\}} \sum_{z \sim X_k} R(X_k, z)\right]. \tag{24}$$
Since random walk exits $B_r$ with probability at least $\frac{1}{2d}$ every time it reaches a site adjacent to the boundary $\partial B_r$, the expected time spent adjacent to the boundary before time $T$ is at most $2d$. Since $|R| \le 4d$, the terms in (24) with $z \in \partial B_r$ contribute at most $16d^3$ to the sum. Thus
$$u(x) - f(x) \ge -\frac{1}{2d}\sum_{k \ge 0} E_x\left[\sum_{y, z \in B_r,\ y \sim z} 1_{\{T > k\} \cap \{X_k = y\}} R(y, z)\right] - 8d^2.$$
For $y \in B_r$ we have $\{X_k = y\} \cap \{T > k\} = \{X_{k \wedge T} = y\}$, hence
$$u(x) - f(x) \ge -\frac{1}{2d}\sum_{k \ge 0}\ \sum_{y, z \in B_r,\ y \sim z} P_x(X_{k \wedge T} = y)\, R(y, z) - 8d^2. \tag{25}$$
Write $p_k(y) = P_x(X_{k \wedge T} = y)$. Note that since $\nabla u$ and $\kappa$ are antisymmetric, $R$ is antisymmetric. Thus
$$\sum_{y, z \in B_r,\ y \sim z} p_k(y) R(y, z) = -\sum_{y, z \in B_r,\ y \sim z} p_k(z) R(y, z) = \sum_{y, z \in B_r,\ y \sim z} \frac{p_k(y) - p_k(z)}{2}\, R(y, z).$$
Summing over $k$ and using the fact that $|R| \le 4d$, we conclude from (25)
$$u(x) \ge f(x) - \sum_{y, z \in B_r,\ y \sim z} |G(x, y) - G(x, z)| - 8d^2,$$
where $G = G_{B_r}$ is the Green's function for simple random walk stopped on exiting $B_r$. By Lemma 5.6 we obtain
$$u(x) \ge f(x) - C_3(r + 1 - |x|) \log\frac{r}{r + 1 - |x|} - 8d^2.$$
Using the fact that $f \ge \gamma - C_4$, we obtain from Lemma 2.1
$$u(x) \ge (r - |x|)^2 - C_3(r + 1 - |x|) \log\frac{r}{r + 1 - |x|} + O\left(\frac{r^d}{|x|^d}\right).$$
The right side is positive provided $r/3 \le |x| < r - C_5 \log r$. For $x \in B_{r/3}$, by Lemma 2.3 we have $u(x) > r^2/4 - C_3 r \log\frac{3}{2} > 0$, hence $B_{r - C_5 \log r} \subset A_n$.

5.2 Outer Estimate

The following result is due to Holroyd and Propp (unpublished); we include a proof for the sake of completeness. Notice that the bound in (26) does not depend on the number of particles.

Proposition 5.7. Let $\Gamma$ be a finite connected graph, and let $Y \subset Z$ be subsets of the vertex set of $\Gamma$. Let $s$ be a nonnegative integer-valued function on the vertices of $\Gamma$.
Let $H_w(s, Y)$ be the expected number of particles stopping in $Y$ if $s(x)$ particles start at each vertex $x$ and perform independent simple random walks stopped on first hitting $Z$. Let $H_r(s, Y)$ be the number of particles stopping in $Y$ if $s(x)$ particles start at each vertex $x$ and perform rotor-router walks stopped on first hitting $Z$. Let $H(x) = H_w(1_x, Y)$. Then
$$|H_r(s, Y) - H_w(s, Y)| \le \sum_{u \notin Z}\ \sum_{v \sim u} |H(u) - H(v)| \tag{26}$$
independent of $s$ and the initial positions of the rotors.

Proof. For each vertex $u \notin Z$, arbitrarily choose a neighbor $\eta(u)$. Order the neighbors $\eta(u) = v_1, v_2, \ldots, v_d$ of $u$ so that the rotor at $u$ points to $v_{i+1}$ immediately after pointing to $v_i$ (indices mod $d$). We assign weight $w(u, \eta(u)) = 0$ to a rotor pointing from $u$ to $\eta(u)$, and weight $w(u, v_i) = H(u) - H(v_i) + w(u, v_{i-1})$ to a rotor pointing from $u$ to $v_i$. These assignments are consistent since $H$ is a harmonic function: $\sum_i (H(u) - H(v_i)) = 0$. We also assign weight $H(u)$ to a particle located at $u$. The sum of rotor and particle weights in any configuration is invariant under the operation of routing a particle and rotating the corresponding rotor. Initially, the sum of all particle weights is $H_w(s, Y)$. After all particles have stopped, the sum of the particle weights is $H_r(s, Y)$. Their difference is thus at most the change in rotor weights, which is bounded above by the sum in (26).

Figure 6: Diagram for the proof of Lemma 5.8.

For $\rho \in \mathbb{Z}$ let
$$S_\rho = \{x \in \mathbb{Z}^d : \rho \le |x| < \rho + 1\}. \tag{27}$$
Then
$$B_\rho = \{x \in \mathbb{Z}^d : |x| < \rho\} = S_0 \cup \ldots \cup S_{\rho - 1}.$$
Note that for simple random walk started in $B_\rho$, the first exit time of $B_\rho$ and first hitting time of $S_\rho$ coincide. Our next result is a modification of Lemma 5(b) of [10].

Lemma 5.8. Fix $\rho \ge 1$ and $y \in S_\rho$. For $x \in B_\rho$ let $H(x) = P_x(X_T = y)$, where $T$ is the first hitting time of $S_\rho$. Then
$$H(x) \le \frac{J}{|x - y|^{d-1}}$$
for a constant $J$ depending only on $d$.

Proof.
We induct on the distance $|x - y|$, assuming the result holds for all $x'$ with $|x' - y| \le \frac{1}{2}|x - y|$; the base case can be made trivial by choosing $J$ sufficiently large. By Lemma 5(b) of [10], we can choose $J$ large enough so that the result holds provided $|y| - |x| \ge 2^{-d-3}|x - y|$. Otherwise, let $H_1$ be the outer half-space tangent to $S_\rho$ at the point of $S_\rho$ closest to $x$, and let $H_2$ be the inner half-space tangent to the ball $\tilde S$ of radius $\frac{1}{2}|x - y|$ about $y$, at the point of $\tilde S$ closest to $x$. By Lemma 5.4 applied to these half-spaces, the probability that random walk started at $x$ reaches $\tilde S$ before hitting $S_\rho$ is at most $2^{1-d}$. Writing $\tilde T$ for the first hitting time of $\tilde S \cup S_\rho$, we have
$$H(x) \le \sum_{x' \in \tilde S} P_x(X_{\tilde T} = x')\, H(x') \le 2^{1-d} J \cdot \left(\frac{|x - y|}{2}\right)^{1-d} = \frac{J}{|x - y|^{d-1}},$$
where we have used the inductive hypothesis to bound $H(x')$.

The lazy random walk in $\mathbb{Z}^d$ stays in place with probability $\frac{1}{2}$, and moves to each of the $2d$ neighbors with probability $\frac{1}{4d}$. We will need the following standard result, which can be derived e.g. from the estimates in [17], section II.12; we include a proof for the sake of completeness.

Lemma 5.9. Given $u \sim v \in \mathbb{Z}^d$, lazy random walks started at $u$ and $v$ can be coupled with probability $1 - C/R$ before either reaches distance $R$ from $u$, where $C$ depends only on $d$.

Proof. Let $i$ be the coordinate such that $u_i \ne v_i$. To define a step of the coupling, choose one of the $d$ coordinates uniformly at random. If the chosen coordinate is different from $i$, let the two walks take the same lazy step so that they still agree in this coordinate. If the chosen coordinate is $i$, let one walk take a step while the other stays in place. With probability $\frac{1}{2}$ the walks will then be coupled. Otherwise, they are located at points $u', v'$ with $|u' - v'| = 2$. Moreover, $P\left(|u - u'| \ge R/(4\sqrt{d})\right) < C'/R$ for a constant $C'$ depending only on $d$. From now on, whenever coordinate $i$ is chosen, let the two walks take lazy steps in opposite directions. Let
$$H_1 = \left\{x \,\Big|\, x_i = \frac{u_i' + v_i'}{2}\right\}$$
be the hyperplane bisecting the segment $[u', v']$.
Since the steps of one walk are reflections in $H_1$ of the steps of the other, the walks couple when they hit $H_1$. Let $Q$ be the cube of side length $R/\sqrt{d} + 2$ centered at $u$, and let $H_2$ be a hyperplane defining one of the faces of $Q$. By Lemma 5.4 with $h_1 = 1$ and $h_2 = R/4\sqrt{d}$, the probability that one of the walks exits $Q$ before the walks couple is at most
$$2d \cdot \frac{5}{2}\,\frac{1 + 1}{R/4\sqrt{d}}\left(1 + \frac{2\sqrt{d}}{R}\right)^2 \le \frac{40\, d^{3/2}}{R}\left(1 + \frac{2\sqrt{d}}{R}\right)^2.$$

Lemma 5.10. With $H$ defined as in Lemma 5.8, we have
$$\sum_{u \in B_\rho}\ \sum_{v \sim u} |H(u) - H(v)| \le J' \log\rho$$
for a constant $J'$ depending only on $d$.

Proof. Given $u \in B_\rho$ and $v \sim u$, by Lemma 5.9, lazy random walks started at $u$ and $v$ can be coupled with probability $1 - 2C/|u - y|$ before either reaches distance $|u - y|/2$ from $u$. If the walks reach this distance without coupling, by Lemma 5.8 each still has probability at most $J/|u - y|^{d-1}$ of exiting $B_\rho$ at $y$. By the strong Markov property it follows that
$$|H(u) - H(v)| \le \frac{2CJ}{|u - y|^d}.$$
Summing in spherical shells about $y$, and using that the shell at distance $t$ from $y$ contains at most a constant multiple of $t^{d-1}$ sites, we obtain
$$\sum_{u \in B_\rho}\ \sum_{v \sim u} |H(u) - H(v)| \le J' \log\rho.$$

We remark that Lemma 5.10 could also be inferred from Lemma 5.8 using [9, Thm. 1.7.1] in a ball of radius $|u - y|/2$ about $u$.

To prove the outer estimate of Theorem 1.1, we will make use of the abelian property of rotor-router aggregation. Fix a finite set $\Gamma \subset \mathbb{Z}^d$ containing the origin. Starting with $n$ particles at the origin, at each time step, choose a site $x \in \Gamma$ with more than one particle, rotate the rotor at $x$, and move one particle from $x$ to the neighbor the rotor points to. After a finite number of such choices, each site in $\Gamma$ will have at most one particle, and all particles that exited $\Gamma$ will be on the boundary $\partial\Gamma$. The abelian property says that the final configuration of particles and the final configuration of rotors do not depend on the choices. For a proof, see [4, Prop. 4.1].

In our application, we will fix $\rho \ge r$ and stop each particle in rotor-router aggregation either when it reaches an unoccupied site or when it reaches the spherical shell $S_\rho$. Let $N_\rho$ be the number of particles that reach $S_\rho$ during this process.
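The aggregation process itself can be sketched in a few lines for $d = 2$. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name, the counterclockwise rotor order and the choice that every rotor initially points in direction index 3 are all hypothetical. Particles are released one at a time; by the abelian property described above, moving them in any other order gives the same final occupied set and rotor configuration.

```python
def rotor_router_aggregation(n):
    """Rotor-router aggregation of n particles in Z^2 (illustrative sketch).

    Each particle starts at the origin and walks until it finds an
    unoccupied site. At an occupied site the rotor is first rotated to the
    next direction, then the particle moves where the rotor points.
    Returns (occupied, rotor, odometer).
    """
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # assumed rotor order
    initial = 3                                        # assumed initial rotor index
    occupied = set()
    rotor = {}      # site -> index of the direction the rotor points to
    odometer = {}   # site -> total number of exits from that site
    for _ in range(n):
        x = (0, 0)
        while x in occupied:
            r = (rotor.get(x, initial) + 1) % 4
            rotor[x] = r
            odometer[x] = odometer.get(x, 0) + 1
            x = (x[0] + directions[r][0], x[1] + directions[r][1])
        occupied.add(x)
    return occupied, rotor, odometer


occupied, rotor, odometer = rotor_router_aggregation(1000)
```

The dictionary `odometer` is the function $u$ of Section 5.1, and with the conventions assumed here each rotor index equals `(initial + odometer[x]) % 4`, so the rotor directions record the odometer modulo $2d$.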
Note that at some sites in $S_\rho$, more than one particle may have stopped. If we let each of these extra particles in turn continue performing rotor-router walk, stopping either when it reaches an unoccupied site or when it hits the larger shell $S_{\rho+h}$, then by the abelian property, the number of particles that reach $S_{\rho+h}$ will be $N_{\rho+h}$. We will show that when $h$ is of order $r^{1-1/d}$, a constant fraction of the particles that reach $S_\rho$ find unoccupied sites before reaching $S_{\rho+h}$.

Proof of Theorem 1.1, Outer Estimate. Fix integers $\rho \ge r$ and $h \ge 1$. In the setting of Proposition 5.7, let $\Gamma$ be the lattice ball $B_{\rho+h+1}$, and let $Z = S_{\rho+h}$. Fix $y \in S_{\rho+h}$ and let $Y = \{y\}$. For $x \in S_\rho$, let $s(x)$ be the number of particles stopped at $x$ if each particle in rotor-router aggregation is stopped either when it reaches an unoccupied site or when it reaches $S_\rho$. Write
$$H(x) = P_x(X_T = y)$$
where $T$ is the first hitting time of $S_{\rho+h}$. By Lemma 5.8 we have
$$H_w(s, y) = \sum_{x \in S_\rho} s(x) H(x) \le \frac{J N_\rho}{h^{d-1}} \tag{29}$$
where
$$N_\rho = \sum_{x \in S_\rho} s(x)$$
is the number of particles that ever visit the shell $S_\rho$. By Lemma 5.10 the sum in (26) is at most $J' \log h$, hence from Proposition 5.7 and (29) we have
$$H_r(s, y) \le \frac{J N_\rho}{h^{d-1}} + J' \log h. \tag{30}$$
Let $\rho(0) = r$, and define $\rho(i)$ inductively by
$$\rho(i + 1) = \min\left\{\rho(i) + N_{\rho(i)}^{2/(2d-1)},\ \min\{\rho > \rho(i) \mid N_\rho \le N_{\rho(i)}/2\}\right\}. \tag{31}$$
Fixing $h < \rho(i + 1) - \rho(i)$, we have
$$h^{d-1} \log h \le N_{\rho(i)}^{(2d-2)/(2d-1)} \log N_{\rho(i)} \le N_{\rho(i)};$$
so (30) with $\rho = \rho(i)$ simplifies to
$$H_r(s, y) \le \frac{C N_{\rho(i)}}{h^{d-1}} \tag{32}$$
where $C = J + J'$. Since all particles that visit $S_{\rho(i)+h}$ during rotor-router aggregation must pass through $S_{\rho(i)}$, we have by the abelian property
$$N_{\rho(i)+h} \le \sum_{y \in S_{\rho(i)+h}} H_r(s, y). \tag{33}$$
Let $M_k = \#(A_n \cap S_k)$. There are at most $M_{\rho(i)+h}$ nonzero terms in the sum on the right side of (33), and each term is bounded above by (32), hence
$$M_{\rho(i)+h} \ge N_{\rho(i)+h}\,\frac{h^{d-1}}{C N_{\rho(i)}} \ge \frac{h^{d-1}}{2C},$$
where the second inequality follows from $N_{\rho(i)+h} \ge N_{\rho(i)}/2$. Summing over $h$, we obtain
$$\sum_{\rho = \rho(i)+1}^{\rho(i+1)-1} M_\rho \ge \frac{1}{2dC}\left(\rho(i + 1) - \rho(i) - 1\right)^d. \tag{34}$$
The left side is at most $N_{\rho(i)}$, hence
$$\rho(i + 1) - \rho(i) \le (2dC N_{\rho(i)})^{1/d} \le N_{\rho(i)}^{2/(2d-1)}$$
provided $N_{\rho(i)} \ge C' := (2dC)^{2d-1}$.
Thus the minimum in (31) is not attained by its first argument. It follows that $N_{\rho(i+1)} \le N_{\rho(i)}/2$, hence $N_{\rho(a \log r)} < C'$ for a sufficiently large constant $a$.

By the inner estimate, since the ball $B_{r - c\log r}$ is entirely occupied, we have
$$\sum_{\rho \ge r} M_\rho \le \omega_d r^d - \omega_d(r - c\log r)^d \le c\,d\,\omega_d r^{d-1} \log r.$$
Write $x_i = \rho(i + 1) - \rho(i) - 1$; by (34) we have
$$\sum_{i=0}^{a\log r} x_i^d \le c\,d\,\omega_d r^{d-1} \log r.$$
By Jensen's inequality, subject to this constraint, $\sum x_i$ is maximized when all $x_i$ are equal, in which case $x_i \le C'' r^{1-1/d}$ and
$$\rho(a\log r) = r + \sum x_i \le r + C'' r^{1-1/d} \log r. \tag{35}$$
Since $N_{\rho(a\log r)} < C'$ we have $N_{\rho(a\log r) + C'} = 0$; that is, no particles reach the shell $S_{\rho(a\log r) + C'}$. Taking $c' = C' + C''$, we obtain from (35)
$$A_n \subset B_{r(1 + c' r^{-1/d}\log r)}.$$

Figure 7: Image of the rotor-router aggregate of one million particles under the map $z \mapsto 1/z^2$. The colors represent the rotor directions. The white disc in the center is the image of the complement of the occupied region.

6 Concluding Remarks

A number of intriguing questions remain unanswered. Although we have shown that the asymptotic shape of the rotor-router model is a ball, the near perfect circularity found in Figure 1 remains a mystery. In particular, we do not know whether an analogue of Theorem 1.3 holds for the rotor-router model, with constant error in the radius as the number of particles grows.

Equally mysterious are the patterns in the rotor directions evident in Figure 1. The rotor directions can be viewed as values of the odometer function mod $2d$, but our control of the odometer is not fine enough to provide useful information about the patterns. If the rescaled occupied region $\sqrt{\pi/n}\,A_n$ is viewed as a subset of the complex plane, it appears that the monochromatic regions visible in Figure 1, in which all rotors point in the same direction, occur near points of the form $(1 + 2z)^{-1/2}$, where $z = a + bi$ is a Gaussian integer (i.e. $a, b \in \mathbb{Z}$). We do not even have a heuristic explanation for this phenomenon.
Figure 7 shows the image of $A_{1,000,000}$ under the map $z \mapsto 1/z^2$; the monochromatic patches in the transformed region occur at lattice points.

László Lovász (personal communication) has asked whether the occupied region $A_n$ is simply connected, i.e. whether its complement is connected. While Theorem 1.1 shows that $A_n$ cannot have any holes far from the boundary, we cannot answer his question at present.

A final question is whether our methods could be adapted to internal DLA to show that if $n = \omega_d r^d$, then with high probability $B_{r - c\log r} \subset I_n$, where $I_n$ is the internal DLA cluster of $n$ particles. The current best bound is due to Lawler [11], who proves that with high probability $B_{r - r^{1/3}(\log r)^2} \subset I_n$.

Acknowledgments

The authors thank Jim Propp for bringing the rotor-router model to our attention and for many fruitful discussions. We also had useful discussions with Oded Schramm, Scott Sheffield and Misha Sodin. We thank Wilfried Huss for pointing out an error in an earlier draft. Yelena Shvets helped draw some of the figures.

References

[1] P. Bak, C. Tang and K. Wiesenfeld, Self-organized criticality: an explanation of the 1/f noise, Phys. Rev. Lett. 59, no. 4 (1987), 381–384.
[2] M. Cook, The tent metaphor, available at http://paradise.caltech.edu/~cook/Warehouse/ForPropp/.
[3] J. N. Cooper and J. Spencer, Simulating a random walk with constant error, Combin. Probab. Comput. 15 (2006) 815–822. http://www.arxiv.org/abs/math.CO/0402323.
[4] P. Diaconis and W. Fulton, A growth model, a game, an algebra, Lagrange inversion, and characteristic classes, Rend. Sem. Mat. Univ. Pol. Torino 49 (1991) no. 1, 95–119.
[5] A. Fey and F. Redig, Limiting shapes for deterministic centrally seeded growth models, J. Stat. Phys. 130 (2008), no. 3, 579–597. http://arxiv.org/abs/math.PR/0702450.
[6] Y. Fukai and K. Uchiyama, Potential kernel for two-dimensional random walk. Ann. Probab. 24 (1996), no. 4, 1979–1992.
[7] M. Kleber, Goldbug variations, Math. Intelligencer 27 (2005), no. 1, 55–63.
[8] P. Koosis, La plus petite majorante surharmonique et son rapport avec l’existence des fonctions entières de type exponentiel jouant le rôle de multiplicateurs, Ann. Inst. Fourier (Grenoble) 33 (1983), fasc. 1, 67–
[9] G. Lawler, Intersections of Random Walks, Birkhäuser, 1996.
[10] G. Lawler, M. Bramson and D. Griffeath, Internal diffusion limited aggregation, Ann. Probab. 20, no. 4 (1992), 2117–2140.
[11] G. Lawler, Subdiffusive fluctuations for internal diffusion limited aggregation, Ann. Probab. 23 (1995) no. 1, 71–86.
[12] Y. Le Borgne and D. Rossin, On the identity of the sandpile group, Discrete Math. 256 (2002) 775–790.
[13] L. Levine, The rotor-router model, Harvard University senior thesis (2002), http://arxiv.org/abs/math/0409407.
[14] L. Levine and Y. Peres, Spherical asymptotics for the rotor-router model in Zd, Indiana Univ. Math. J. 57 (2008), no. 1, 431–450. http://arxiv.org/abs/math/0503251
[15] L. Levine and Y. Peres, The rotor-router shape is spherical, Math. Intelligencer 27 (2005), no. 3, 9–11.
[16] L. Levine and Y. Peres, Scaling limits for internal aggregation models with multiple sources. http://arxiv.org/abs/0712.3378.
[17] T. Lindvall, Lectures on the Coupling Method, Wiley, 1992.
[18] V. B. Priezzhev, D. Dhar, A. Dhar, and S. Krishnamurthy, Eulerian walkers as a model of self-organised criticality, Phys. Rev. Lett. 77 (1996) 5079–82.
[19] J. Propp, Three lectures on quasirandomness, available at http://faculty.uml.edu/jpropp/berkeley.html.
[20] F. Spitzer, Principles of Random Walk, Springer, 1976.
[21] K. Uchiyama, Green’s functions for random walks on ZN, Proc. London Math. Soc. 77 (1998), no. 1, 215–240.
[22] J. Van den Heuvel, Algorithmic aspects of a chip-firing game, Combin. Probab. Comput. 10, no. 6 (2001), 505–529.

ABSTRACT The rotor-router model is a deterministic analogue of random walk. It can be used to define a deterministic growth model analogous to internal DLA. We prove that the asymptotic shape of this model is a Euclidean ball, in a sense which is stronger than our earlier work. For the shape consisting of $n=\omega_d r^d$ sites, where $\omega_d$ is the volume of the unit ball in $\R^d$, we show that the inradius of the set of occupied sites is at least $r-O(\log r)$, while the outradius is at most $r+O(r^\alpha)$ for any $\alpha > 1-1/d$. For a related model, the divisible sandpile, we show that the domain of occupied sites is a Euclidean ball with error in the radius a constant independent of the total mass. For the classical abelian sandpile model in two dimensions, with $n=\pi r^2$ particles, we show that the inradius is at least $r/\sqrt{3}$, and the outradius is at most $(r+o(r))/\sqrt{2}$. This improves on bounds of Le Borgne and Rossin. Similar bounds apply in higher dimensions. <|endoftext|><|startoftext|> Introduction The kilohertz quasi-periodic oscillations (kHz QPOs) were first discovered in Sco X-1, a luminous Z source among the neutron star (NS) low-mass X-ray binaries (LMXBs) (e.g. van der Klis et al. 1996), and they have now been detected in more than twenty sources (e.g. van der Klis 2000, 2006, for reviews).
Usually, these kHz QPOs appear in pairs, the upper kHz QPO frequency (ν2, hereafter the upper-frequency) and the lower kHz QPO frequency (ν1, hereafter the lower-frequency). They have been found in three classes of sources, i.e. accretion powered millisecond pulsars, bright Z sources and less luminous Atoll sources (e.g., Hasinger & van der Klis 1989). The kHz QPO peak separation, ∆ν = ν2 − ν1, in a given source generally decreases with frequency, except for the recently detected kHz QPOs in Cir X-1, in which the peak separation increases with frequency (Boutloukos et al. 2006). In addition, the variable peak separations are not equal to the NS spin frequencies. However, the averaged peak separation is found to be close either to the spin frequency or to half of it (e.g., van der Klis 2006; Linares et al. 2005).

Accepted for publication in Advances of Space Research http://arxiv.org/abs/0704.0689v1

The above observations offer strong evidence against the simple beat-frequency model, in which the lower-frequency is the beat between the upper-frequency ν2 and the NS spin frequency νs (e.g. Strohmayer et al. 1996; Zhang et al. 1997; Miller et al. 1998), i.e. ν1 = ν2 − νs. Furthermore, following the discovery of pairs of 30–450 Hz QPOs from a few black-hole candidates with frequency ratios of 3:2 (e.g., van der Klis 2006), Abramowicz et al. (2003) reported that the ratios of twin kHz QPOs in Sco X-1 tend to cluster around a value of about 3:2, and they argued that this is a promising link with the black hole high-frequency QPOs (e.g. van der Klis 2006).

For all Z and Atoll sources taken together, the plot of the upper-frequency versus the lower-frequency can be fitted by a power-law function (e.g., Zhang et al. 2006a), and also roughly fitted by a linear function (Belloni et al. 2005). For an individual kHz QPO source, for instance Sco X-1, the kHz QPOs can be well fitted by a power-law function (e.g. Psaltis et al. 1998; Yin et al. 2005).
In this paper, to investigate the twin kHz QPO correlation for individual Z or Atoll sources, we fitted the data with a power-law and a linear function for four typical Z sources and four typical Atoll sources. Both fittings are compared by χ2-tests in section 2, where comparisons with the models are also discussed. The conclusions and consequences are given in section 3.

2 Correlations between twin kHz QPOs

Until now, twin kHz QPOs have been detected in 21 LMXBs, including 2 accretion powered millisecond X-ray pulsars, 8 Z sources and 11 Atoll sources, as listed in Tab. 1. In Fig. 1 and Fig. 2, we plotted twin kHz QPO data for the Z sources and Atoll sources, showing the correlations of ν1 vs. ν2, ∆ν vs. ν2 and ν2/ν1 vs. ν2, where the power-law and linear fitting lines for the eight Z and Atoll sources are presented. The results of the fittings and χ2-tests are listed in Tab. 2.

2.1 A power law fitting

The power-law function is chosen as

$\nu_1 = a\,(\nu_2/1000\ \mathrm{Hz})^{b}\ \mathrm{Hz}$ (1)

to fit the twin kHz QPO data points of all Atoll (Z) sources, as well as of 4 individual Atoll (Z) sources, separately. We note that the same function was applied to the fitting of the kHz QPOs of Sco X-1 by Psaltis et al. (1998) with a smaller set of kHz QPO data points. The fitting results for the normalization coefficient a, the power-law index b and χ2/d.o.f. for the various cases are listed in Tab. 2, and correspond to the fitting curves presented in Fig. 1. We find that the power-law index for the fitting of all Z sources (see Tab. 2) is 1.87, clearly bigger than that of the fitting for all Atoll sources (1.61). For the individual cases, the power-law index for a Z source is generally bigger than that for an Atoll source, except GX 17+2.

2.2 A linear fitting

For the same data sets, the linear fitting function is chosen as

$\nu_2 = A\,\nu_1 + B\ \mathrm{Hz}$, (2)

which was exploited by Belloni et al. (2005) to discuss the kHz QPO fitting in Sco X-1, 4U 1608-52, 4U 1636-53, 4U 1728-34 and 4U 1820-30.
By means of the χ2-tests, as shown in Tab. 2, we find that the linear fitting agrees well with the data in some cases, and that there is not much systematic difference between the linear slope parameters of the Atoll sources and those of the Z sources.

2.3 Comparison between the power-law and the linear correlation

As a comparison between models and the data, we remark that the relativistic precession model (e.g. Stella & Vietri 1999) and the Alfvén wave oscillation model (e.g. Zhang 2004; Li & Zhang 2005) both lead approximately to power-law relations, whereas the beat-frequency model (e.g. Miller et al. 1998) and the 3:2 resonance model (e.g. Abramowicz et al. 2003; this model has been successfully applied to black hole candidates) predict linear relations between twin kHz QPO frequencies in the lowest approximation (Abramowicz et al. 2005).

In Tab. 2, we can see that the χ2/d.o.f. of the power-law relation is usually less than that of the linear one for the same source, except for the two Atoll sources 4U 0614+09 and 4U 1636-53. Moreover, a linear function cannot reproduce the tendency of the combined Z-source data to first increase and then decrease, as shown in Fig. 2b, whereas a power-law function fits it well, as shown in Fig. 1b. These results suggest that a power-law correlation is better than a linear one.

2.4 Testing the constant peak separation ∆ν = 300 Hz

Since the discovery of kHz QPOs, it has been known that the peak separation for Sco X-1 (van der Klis et al. 1997; Méndez & van der Klis 2000) is not constant, and the same is true for the other Z sources, e.g., GX 17+2 (Homan et al. 2002) and Cir X-1 (Boutloukos 2006).
As for the Atoll sources, the peak separation of 4U 1728–34 (Migliari, van der Klis, & Fender 2003; Méndez & van der Klis 1999) is always significantly lower than the burst oscillation frequency, and the peak separation of 4U 1636–53 (Jonker, Méndez, & van der Klis 2002b; Méndez, van der Klis, & van Paradijs 1998) varies between values lower and higher than half the spin frequency. In addition, 4U 1608-52 (Méndez et al. 1998) and 4U 1735-44 (Ford et al. 1998) are also found to have varying peak separations.

In Fig. 1b and Fig. 2b, the peak separations in all Z sources decrease (increase) systematically with the upper frequency if the upper frequency is larger (less) than ∼700 Hz (e.g. van der Klis 2000, 2006; Boutloukos et al. 2006). This trend of first increasing and then decreasing with frequency is not clearly seen in the kHz QPO data of all Atoll sources, as shown in Fig. 1e and Fig. 2e, perhaps because fewer data are available at low kHz QPO frequencies in Atoll sources. From Fig. 1 and Fig. 2, we find that the peak separations of each source are scattered over a wide range of frequency. Therefore, a constant peak separation, i.e. ∆ν = 300 Hz, cannot fit these data.

Fig. 3 shows the results of χ2-tests against a general constant peak separation of the twin kHz QPOs in the 8 individual sources. The minimum χ2/d.o.f. values are all ≫ 1, which means that no constant peak separation model can fit these data.

2.5 Testing the constant peak ratio ν2/ν1 = 3/2

From Fig. 1c (Fig. 1f) or Fig. 2c (Fig. 2f), we find that the twin frequency ratios are distributed over a wide range from 1.2 to 4.2, with an average value of 1.73 (1.50) for all known Z (Atoll) data. Clearly, a constant ratio ν2/ν1 = 3/2, which may be applicable to some black hole QPO sources, is not consistent with the observed NS/LMXB data. In the ν2/ν1 vs. ν2 plots of Fig. 1 and Fig.
2, the frequency ratios systematically decrease with the upper frequency for both Z and Atoll sources. The incompatibility with a single 3:2 ratio has also been studied in detail by Belloni et al. (2005) in several sources, who showed that the distribution of QPO frequencies in Sco X-1, 4U 1608–52, 4U 1636–53, 4U 1728–34, and 4U 1820–30 is multi-peaked, with the peaks occurring at different ν2/ν1 ratios.

3 Conclusions

In this paper, the updated data sets of twin kHz QPO frequencies simultaneously detected in NS LMXBs are analyzed, and power-law and linear fits are studied for the individual Z/Atoll sources and for all Z/Atoll sources, respectively. Our main conclusions are as follows.

(1) From Fig. 1 and Fig. 2, a simple constant peak separation model, such as the beat-frequency model (e.g. Strohmayer et al. 1996; Zhang et al. 1997; Miller et al. 1998), or a constant peak ratio assumption, as in a naive extrapolation of the observed resonant frequency ratio from the black hole sources to the neutron stars (e.g. Abramowicz et al. 2003), cannot fit the observed data. That is, simple constant peak separation and constant peak ratio models are generally inconsistent with the data. The peak separations in all Z sources tend to increase (decrease) with the upper frequency if the upper frequency is less (larger) than ∼700 Hz. This tendency does not appear in the Atoll sources, because fewer data are available at low kHz QPO frequencies. Statistically, the twin frequency ratios tend to decrease with the upper frequency in both Z and Atoll sources.

(2) Our results show that the index of the fitted power-law relation of a Z source is generally larger than that of an Atoll source, except for GX 17+2. In terms of models, this difference in index between Z and Atoll sources might be related to differences in their luminosity or magnetic field. However, the linear correlations do not show any systematic differences between Z and Atoll sources.
(3) The power-law fit is somewhat better than the linear one for most of the sources, because the χ2/d.o.f. value of the power-law correlation is generally smaller than that of the linear one, and a linear correlation cannot reproduce the first increasing and then decreasing trend of the peak separations in all Z data. As a comparison with model predictions, we mention the Relativistic Precession model (e.g. Stella & Vietri 1999) and the Alfvén Wave Oscillation model (Zhang 2004), since both models give an approximate power-law correlation between the twin kHz QPOs; however, neither of them can distinguish the influence of the luminosity of Z and Atoll sources. In summary, if future data continue to support the conclusions obtained in this paper, they will place meaningful constraints on the models for explaining kHz QPOs.

4 Acknowledgements

We are grateful to T. Belloni, M. Méndez, D. Psaltis and J. Homan for providing the QPO data, and thank C.M. Zhang for discussions. We greatly appreciate the helpful comments of the anonymous reviewers.

5 References

Abramowicz, M.A., Bulik, T., Bursa, M., et al. Evidence for a 2:3 resonance in Sco X-1 kHz QPOs. A&A 404, L21–L24. 2003.
Abramowicz, M.A., Barret, D., Bursa, M., et al. AN, 326, 864–866. 2005.
Belloni, T., Psaltis, D., & van der Klis, M. A Unified Description of the Timing Features of Accreting X-Ray Binaries. ApJ 572, 392–406. 2002.
Belloni, T., Méndez, M., & Homan, J. The distribution of kHz QPO frequencies in bright low mass X-ray binaries. A&A 437, 209–216. 2005.
Boutloukos, S., van der Klis, M., Altamirano, D., et al. Discovery of twin kHz QPOs in the peculiar X-ray binary Circinus X-1. ApJ in press 2006. (astro-ph/0608089)
Di Salvo, T., Méndez, M., & van der Klis, M. On the correlated spectral and timing properties of 4U 1636-53: An atoll source at high accretion rates. A&A 406, 177–192. 2003.
Hasinger, G., & van der Klis, M.
Two patterns of correlated X-ray timing and spectral behaviour in low-mass X-ray binaries. A&A 225, 79–96. 1989.
Homan, J., van der Klis, M., Jonker, P.G., et al. RXTE Observations of the Neutron Star Low-Mass X-Ray Binary GX 17+2: Correlated X-Ray Spectral and Timing Behavior. ApJ 568, 878–900. 2002.
Jonker, P.G., van der Klis, M., Wijnands, et al. The Power Spectral Properties of the Z Source GX 340+0. ApJ 537, 374–386. 2000.
Jonker, P.G., van der Klis, M., Homan, J., et al. Low- and high-frequency variability as a function of spectral properties in the bright X-ray binary GX 5-1. MNRAS 333, 665–678. 2002a.
Jonker, P.G., Méndez, M., & van der Klis, M. Kilohertz quasi-periodic oscillations difference frequency exceeds inferred spin frequency in 4U 1636-53. MNRAS 336, L1–L5. 2002b.
Li, X.D., & Zhang, C.M. A Model for Twin Kilohertz Quasi-periodic Oscillations in Neutron Star Low-Mass X-Ray Binaries. ApJ 635, L57–L60. 2005.
Linares, M., van der Klis, M., Altamirano, D. et al. Discovery of Kilohertz Quasi-periodic Oscillations and Shifted Frequency Correlations in the Accreting Millisecond Pulsar XTE J1807-294. ApJ 634, 1250–1260. 2005.
Markwardt, C.B., Strohmayer, T.E., & Swank, J.H. Observation of Kilohertz Quasi-periodic Oscillations from the Atoll Source 4U 1702-429 by the Rossi X-Ray Timing Explorer. ApJ 512, L125–L129. 1999.
Méndez, M., van der Klis, M., & van Paradijs, J. Difference Frequency of Kilohertz QPOs Not Equal to Half the Burst Oscillation Frequency in 4U 1636-53. ApJ 506, L117–L119. 1998.
Méndez, M., van der Klis, M., Wijnands, R., et al. Kilohertz Quasi-periodic Oscillation Peak Separation Is Not Constant in the Atoll Source 4U 1608-52. ApJ 505, L23–L26. 1998.
Méndez, M., & van der Klis, M. Precise Measurements of the Kilohertz Quasi-periodic Oscillations in 4U 1728-34. ApJ 517, L51–L54. 1999.
Méndez, M., van der Klis, M.
The harmonic and sideband structure of the kilohertz quasi-periodic oscillations in Sco X-1. MNRAS 318, 938–942. 2000.
Migliari, S., van der Klis, M., & Fender, R. Evidence of a decrease of kHz quasi-periodic oscillation peak separation towards low frequencies in 4U 1728-34 (GX 354-0). MNRAS 345, L35–L39. 2003.
Miller, M.C., Lamb, F.K., & Psaltis, D. Sonic-Point Model of Kilohertz Quasi-periodic Brightness Oscillations in Low-Mass X-Ray Binaries. ApJ 508, 791–830. 1998.
O’Neill, P.M., Kuulkers, E., Sood, R. K., et al. The X-ray fast-time variability of Sco X-2 (GX 349+2) with RXTE. MNRAS 336, 217–232. 2002.
Psaltis, D., Méndez, M., Wijnands, R., et al. The Beat-Frequency Interpretation of Kilohertz Quasi-periodic Oscillations in Neutron Star Low-Mass X-Ray Binaries. ApJ 501, L95–L99. 1998.
Psaltis, D., Wijnands, R., Homan, J., Jonker, et al. On the Magnetospheric Beat-Frequency and Lense-Thirring Interpretations of the Horizontal-Branch Oscillation in the Z Sources. ApJ 520, 763–775. 1999a.
Psaltis, D., Belloni, T. & van der Klis, M. Correlations in Quasi-periodic Oscillation and Noise Frequencies among Neutron Star and Black Hole X-Ray Binaries. ApJ 520, 262–270. 1999b.
Stella, L., Vietri, M. & Morsink, S. Correlations in the Quasi-periodic Oscillation Frequencies of Low-Mass X-Ray Binaries and the Relativistic Precession Model. ApJ 524, L63–L66. 1999.
Strohmayer, T., Zhang, W., Smale, A., et al. Millisecond X-Ray Variability from an Accreting Neutron Star System. ApJ 469, L9–L12. 1996.
van der Klis, M., Swank, J.H., Zhang, W., et al. Discovery of Submillisecond Quasi-periodic Oscillations in the X-Ray Flux of Scorpius X-1. ApJ 469, L1–L4. 1996.
van der Klis, M., Wijnands, R., Horne, D. et al. Kilohertz Quasi-Periodic Oscillation Peak Separation Is Not Constant in Scorpius X-1. ApJ 481, L97–L100. 1997.
van der Klis, M. Millisecond Oscillations in X-ray Binaries. ARA&A 38, 717–760. 2000.
van der Klis, M. Rapid X-Ray Variability.
in Compact stellar X-ray sources, W.H.G. Lewin & M. van der Klis (eds.), Cambridge University Press, p.39. 2006. (astro-ph/0410551)
van Straaten, S., Ford, E.C., van der Klis, M., et al. Relations between Timing Features and Colors in the X-Ray Binary 4U 0614+09. ApJ 540, 1049–1061. 2000.
van Straaten, S., van der Klis, M., Di Salvo, T., et al. A Multi-Lorentzian Timing Study of the Atoll Sources 4U 0614+09 and 4U 1728-34. ApJ 568, 912–930. 2002.
van Straaten, S., van der Klis, M., & Méndez, M. The Atoll Source States of 4U 1608-52. ApJ 596, 1155–1176. 2003.
Wijnands, R., van der Klis, M., Homan, J., et al. Quasi-periodic X-ray brightness fluctuations in an accreting millisecond pulsar. Nature 424, 44–47. 2003.
Yin, H.X., Zhang, C.M., Zhao, Y.H., et al. A Study on the Correlations between the Twin kHz QPO frequencies in Sco X-1. ChJAA 5, 595–600. 2005.
Zhang, C.M. The MHD Alfven wave oscillation model of kHz Quasi Periodic Oscillations of Accreting X-ray binaries. A&A 423, 401–404. 2004.
Zhang, C.M., Yin, H.X., Zhao, Y.H., et al. The correlations between the twin kHz quasi-periodic oscillation frequencies of low-mass X-ray binaries. MNRAS 366, 1373–1377. 2006a.
Zhang, F., Qu, J.L., Zhang, C.M., et al. Timing Features of the Accretion-driven Millisecond X-Ray Pulsar XTE J1807-294 in the 2003 March Outburst. ApJ 646, 1116–1124. 2006b.
Zhang, W., Strohmayer, T.E., & Swank, J.H. Neutron Star Masses and Radii as Inferred from Kilohertz Quasi-periodic Oscillations. ApJ 482, L167–L170. 1997.

[Figure 1: panels a–c for Z sources (legend: Sco X-1, GX 5-1, GX 340+0, GX 17+2, other Z data, all Z data); panels d–f for Atoll sources (legend: 4U 1728-34, 4U 0614+09, 4U 1608-52, 4U 1636-53, other data, all Atoll data); frequency axes in Hz, 0 to 1400.]

Fig. 1. Plots of (a, d) ν1 vs. ν2, (b, e) ∆ν vs. ν2 and (c, f) ν2/ν1 vs. ν2 for Z sources and Atoll sources.
Power-law fitting lines and the two reference lines (ν2/ν1 = 3/2, and ∆ν = 300 Hz) are also shown.

[Figure 2: panels a–c for Z sources (legend: Sco X-1, GX 5-1, GX 17+2, GX 340+0, other Z data, all Z data); panels d–f for Atoll sources (legend: 4U 1728-34, 4U 0614+09, 4U 1608-52, 4U 1636-53, other Atoll data, all Atoll data); frequency axes in Hz, 0 to 1400.]

Fig. 2. Plots of (a, d) ν1 vs. ν2, (b, e) ∆ν vs. ν2 and (c, f) ν2/ν1 vs. ν2 for Z sources and Atoll sources. Linear fitting lines and the two reference lines (ν2/ν1 = 3/2, and ∆ν = 300 Hz) are also shown.

[Figure 3: curves for GX 5-1, GX 17+2, GX 340+0, Sco X-1, 4U 1728-34, 4U 1636-53, 4U 1608-52 and 4U 0614+09; horizontal axis in Hz, 200 to 400.]

Fig. 3. χ2-tests for a constant peak separation of the 8 individual sources listed in Tab. 2.

Table 1 List of LMXBs with the simultaneously detected twin kHz QPOs.

Source              ν1 (1) (Hz)   ν2 (2) (Hz)   ∆ν (3) (Hz)   ν2/ν1 (4)   References
Millisecond pulsar (2)
XTE J1807-294       127-360       353-587       179-247       1.51-2.78   1,2
SAX J1808.4-3658    499           694           195           1.39        3
Z source (8)
Cir X-1             56-226        229-505       173-340       2.23-4.19   4
Sco X-1             544-852       844-1086      223-312       1.26-1.57   B,M,K
GX 340+0            197-565       535-840       275-413       1.49-2.72   B,K,P,5
XTE J1701-462       620           909           289           1.47        6
GX 349+2            712-715       978-985       266-270       1.37-1.38   B,K,7
GX 5-1              156-634       478-880       232-363       1.38-3.06   B,K,P,8
GX 17+2             475-830       759-1078      233-308       1.28-1.60   B,K,P,9
Cyg X-2             532           856.6         324           1.61        B,K,P
Atoll source (11)
4U 0614+09          153-823       449-1162      238-382       1.38-2.93   B,K,P,10,11
4U 1608-52          476-876       802-1099      224-327       1.26-1.69   M,B,K,12
4U 1636-53          644-921       971-1192      217-329       1.24-1.51   B,K,P,13,14
4U 1702-43          722           1055          333           1.46        K,P,15
4U 1705-44          776           1074          298           1.38        B,K,P
4U 1728-34          308-894       582-1183      271-359       1.31-1.89   B,K,P,11,16
KS 1731-260         903           1169          266           1.29        B,K,P
4U 1735-44          640-728       982-1026      296-341       1.41-1.53   B,K,P
4U 1820-30          790           1064          273           1.35        B,K,P
4U 1915-05          224-707       514-1055      290-353       1.49-2.3    B,K,P
XTE J2123-058       849-871       1110-1140     261-270       1.31-1.31   B,K,P

(1): the range of ν1; (2): the range
of ν2; (3): the range of ∆ν; (4): the range of ν2/ν1.
K: van der Klis 2000, van der Klis 2006; M: Méndez et al. 1998ab, Méndez & van der Klis 1999, 2000; B: Belloni et al. 2002, Belloni et al. 2005; P: Psaltis et al. 1999ab.
1: Linares et al. 2005; 2: Zhang et al. 2006b; 3: Wijnands et al. 2003; 4: Boutloukos et al. 2006; 5: Jonker et al. 2000; 6: Homan 2006 (personal communication); 7: O’Neill et al. 2002; 8: Jonker et al. 2002a; 9: Homan et al. 2002; 10: van Straaten et al. 2002; 11: van Straaten et al. 2000; 12: van Straaten et al. 2003; 13: Di Salvo et al. 2003; 14: Jonker et al. 2002b; 15: Markwardt et al. 1999; 16: Migliari et al. 2003.

Table 2 List of the results of fittings and χ2-tests.

               Power law: ν1 = a(ν2/(1000 Hz))^b Hz             Linear: ν2 = Aν1 + B Hz
Source*        a                b             χ2/d.o.f.    A                B                χ2/d.o.f.
Z source
Sco X-1        721.95 ± 0.69    1.85 ± 0.01   33.9/87      0.765 ± 0.007    445.84 ± 4.73    54.8/87
GX 340+0       763.85 ± 38.03   2.12 ± 0.15   21.3/17      0.884 ± 0.067    374.96 ± 24.41   27.2/17
GX 5-1         833.02 ± 26.08   2.26 ± 0.10   29.7/27      0.828 ± 0.039    379.54 ± 15.10   43.7/27
GX 17+2        723.40 ± 5.53    1.56 ± 0.06   11.2/19      0.906 ± 0.038    341.13 ± 24.27   12.0/19
All Z          725.38 ± 2.50    1.87 ± 0.02   205.8/169    0.924 ± 0.012    336.22 ± 6.81    380.1/169
Atoll source
4U 0614+09     673.74 ± 4.77    1.49 ± 0.06   80.1/40      1.002 ± 0.033    320.75 ± 20.37   68.4/40
4U 1608-52     717.36 ± 4.89    1.81 ± 0.08   7.8/16       0.781 ± 0.036    436.60 ± 25.12   9.1/16
4U 1636-53     685.16 ± 15.44   1.72 ± 0.16   10.7/11      0.737 ± 0.064    505.18 ± 53.51   10.1/11
4U 1728-34     667.86 ± 5.59    1.51 ± 0.07   20.9/23      0.997 ± 0.046    329.33 ± 32.09   25.5/23
All Atoll      683.48 ± 3.01    1.61 ± 0.04   293.3/109    0.912 ± 0.020    371.08 ± 13.91   308.7/109

ABSTRACT The recently updated data of the twin kilohertz quasi-periodic oscillations (kHz QPOs) in the neutron star low-mass X-ray
binaries are analyzed. The power-law fitting $\nu_{1}=a(\nu_{2}/1000)^{b}$ and linear fitting $\nu_{2}=A\nu_{1}+B$ are applied, individually, to the data points of four Z sources (GX 17+2, GX 340+0, GX 5-1 and Sco X-1) and four Atoll sources (4U 0614+09, 4U 1608-52, 4U 1636-53 and 4U 1728-34). The $\chi^{2}$-tests show that the power-law correlation and linear correlation both can fit data well. Moreover, the comparisons between the data and the theoretical models for kHz QPOs are discussed. <|endoftext|><|startoftext|>
1 Introduction
2 Fuzzball solutions on T^4
2.1 Chiral null models
2.2 The IIA F1-NS5 system
2.3 Dualizing further to the D1-D5 system
3 Fuzzball solutions on K3
3.1 Heterotic chiral model in 10 dimensions
3.2 Compactification on T^4
3.3 String-string duality to P-NS5 (IIA) on K3
3.4 T-duality to F1-NS5 (IIB) on K3
3.5 S-duality to D1-D5 on K3
4 D1-D5 fuzzball solutions
5 Vevs for the fuzzball solutions
5.1 Holographic relations for vevs
5.2 Application to the fuzzball solutions
6 Properties of fuzzball solutions
6.1 Dual field theory
6.2 Correspondence between geometries and ground states
6.3 Matching with the holographic vevs
6.4 A simple example
6.5 Selection rules for curve frequencies
6.6 Fuzzballs with no transverse excitations
7 Implications for the fuzzball program
A Conventions
A.1 Field equations
A.2 Duality rules
B Reduction of type IIB solutions on K3
B.1 S-duality in 6 dimensions
B.2 Basis change matrices
C Properties of spherical harmonics
D Interpretation of winding modes
E Density of ground states with fixed R charges

1 Introduction

Over the last few years an interesting new proposal for the gravitational nature of black hole microstates has emerged [1, 2, 3]; see also [4, 5, 6], and [7]. According to this proposal there should exist non-singular horizon-free geometries associated with the black hole microstates. These so-called fuzzball geometries should asymptotically approach the original black hole geometry and should generically differ from each other around the horizon scale. In this scenario the black hole provides only an average statistical description of the physics and thus longstanding issues such as the information loss paradox would be resolved. The underlying physics of the black hole would not be conceptually different from that of a star, with the temperature and entropy being of statistical origin. Given the importance of understanding black hole physics and its implications for quantum gravity, this proposal should be developed, explored and tested where possible.

Many issues need to be addressed to implement the fuzzball proposal at a quantitative and precise level. The proposal requires the existence of exponential numbers of horizon-free non-singular solutions for each black hole.
So the most basic of questions is whether one can find such a number of solutions with the required properties and moreover what precisely are the required properties for any given black hole. Moreover one would like to show quantitatively how black hole properties emerge upon coarse-graining; for this one needs to know the precise relationship between geometries and microstates.

Much of the recent work on this proposal has been focused on constructing fuzzball geometries for certain supersymmetric black holes with macroscopic horizons; for a summary of progress in this direction see [8]. The method of construction here uses crucially supersymmetry and known classifications of supersymmetric solutions: one looks for non-singular horizon-free supersymmetric solutions with the correct charges to match those of the black hole. This method however has a number of limitations. One is that the supersymmetric classifications are not sufficiently restrictive for cases of interest and thus one needs a specific ansatz to make progress. To date many of the fuzzball geometries constructed are rather atypical; for example, they have angular momenta much larger than those of a typical black hole microstate. Whilst families of typical geometries are presumably contained in the supersymmetric classification, finding an ansatz to construct families rather than isolated examples is not easy.

Another key issue is that one does not know precisely what is the relationship between a given geometry and the black hole microstates. This in turn means that one does not know whether one has constructed the correct geometries to describe the black hole. Nor does one know whether one has enough geometries to account for the black hole entropy upon geometric quantization, using the methods of [9]. For example, in cases where the dual theory has distinct Higgs and Coulomb branches, one needs to determine whether a given fuzzball geometry describes Higgs or Coulomb branch physics.
More importantly, one would like to see explicitly how black hole properties emerge upon coarse graining; to understand how to do such a computation properly the precise relation between the fuzzball geometries and microstates is crucial.

To address this issue, we have advocated and developed the use of AdS/CFT methods. That is, the supersymmetric black holes of interest admit a dual CFT description and the fuzzball geometries therefore have a decoupling limit which is asymptotically AdS. One can therefore use well-developed techniques of AdS/CFT, in particular Kaluza-Klein holography [10], to extract field theory data from the geometry and diagnose precisely what the geometry describes.

It is worth emphasizing at this point that the AdS/CFT correspondence both motivates and supports the fuzzball picture. The gravity/gauge theory dictionary relates a given asymptotically AdS geometry to either a deformation of the CFT or the CFT in a non-trivial vacuum characterized by the expectation values of gauge invariant operators. Conversely, one expects that for any stable state of the CFT (such as the BPS states) there exists an asymptotically AdS solution, whose asymptotics encode the vevs of gauge invariant operators in that state. If the field theory is in a pure state, there is no entropy and one does not expect the corresponding geometry to have a horizon, and hence entropy. AdS/CFT thus implies that the field theory in a given pure stable (black hole) state should have a geometric dual with no horizon; there is however no guarantee that the geometry should be well-described by supergravity alone, i.e. weakly curved everywhere (see footnote 1).

In our recent papers [11, 12], we have discussed in some detail the case of the D1-D5 system, for which (some) fuzzball geometries were constructed in [1]. Since this is a 2-charge system, there is no macroscopic horizon: the naive geometry is singular, with the horizon believed to form on taking into account α′ corrections.
Whilst this is not a macroscopic black hole system, there are a number of reasons to explore this case fully before moving on to supersymmetric macroscopic black holes.

Footnote 1: There are additional subtleties in low dimensional quantum field theories due to the strong infrared fluctuations. More properly one should view a given fuzzball geometry as dual to a wavefunction on the Higgs branch of the field theory, but it seems in any case likely that such wavefunctions would be localized around specific regions in the large N limit and thus that this issue does not play a key role at infinite N.

Firstly, one can obtain all fuzzball geometries in this system by dualities from known solitonic solutions of F1-P systems. Thus one should be able to account for all the entropy, and show how the average black hole description emerges. Moreover, the dual description of this black hole is the simplest and best understood: the black hole entropy arises from the degeneracy of the Ramond ground states of the dual (4, 4) CFT. This is an ideal system in which to address the question of what is the precise correspondence between geometries and microstates, and moreover how the properties of given microstates determine and characterize the fuzzball geometries.

In the original work of [1], only a subset of the 2-charge fuzzball geometries were constructed using dualities from F1-P solutions. Recall that the D1-D5 system on T^4 is related by dualities to the type II F1-P system, also on T^4, whilst the D1-D5 system on K3 is related to the heterotic F1-P system on T^4; the exact duality chains needed will be reviewed in sections 2 and 3. Now the solution for a fundamental string carrying momentum in type II is characterized by 12 arbitrary curves, eight associated with transverse bosonic excitations and four associated with the bosonization of eight fermionic excitations on the string [13].
The corresponding heterotic string solution is characterized by 24 arbitrary curves, eight associated with transverse bosonic excitations and 16 associated with charge waves on the string. In the work of [1], the duality chain was carried out for type II F1-P solutions on T^4 for which only bosonic excitations in the transverse R^4 are excited. That is, the solutions are characterized by only four arbitrary curves; in the dual D1-D5 solutions these four curves characterize the blow-up of the branes, which in the naive solutions are sitting at the origin of the transverse R^4, into a supertube.

In this paper we carry out the dualities for generic F1-P solutions in both the T^4 and K3 cases, to obtain generic 2-charge fuzzball solutions with internal excitations. Note that partial results for the T^4 case were previously given in the appendix of [3]; we will comment on the relation between our solutions and theirs in section 2. The general solutions are then characterized by arbitrary curves capturing excitations along the compact manifold M^4, along with the four curves describing the blow-up in R^4. They describe a bound state of D1 and D5-branes, wrapped on the compact manifold M^4, blown up into a rotating supertube in R^4 and with excitations along the part of the D5-branes wrapping the M^4.

The duality chain that uses string-string duality from heterotic on T^4 to type II on K3 provides a route for obtaining fuzzball solutions that has not been fully explored. One of the results in this paper is to make explicit all steps in this duality route. In particular, we work out the reduction of type IIB on K3 and show how S-duality acts in six dimensions. These results may be useful in obtaining fuzzball solutions with more charges.
In our previous work [11, 12], we made a precise proposal for the relationship between the 2-charge fuzzball geometries characterized by four curves F^i(v) and superpositions of R ground states: a given geometry characterized by F^i(v) is dual to a specific superposition of R vacua with the superposition determined by the Fourier coefficients of the curves F^i(v). In particular, note that only geometries associated with circular curves are dual to a single R ground state (in the usual basis, where the states are eigenstates of the R-charge). This proposal has a straightforward extension to generic 2-charge geometries, which we will spell out in section 6, and the extended proposal passes all kinematical and accessible dynamical tests, just as in [11, 12].

In particular, we extract one point functions for chiral primaries from the asymptotically AdS region of the fuzzball solutions. We find that chiral primaries associated with the middle cohomology of M^4 acquire vevs when there are both internal and transverse excitations; these vevs hence characterize the internal excitations. Moreover, there are selection rules for these vevs, in that the internal and transverse curves must have common frequencies.

These properties of the holographic vevs follow directly from the proposed dual superpositions of ground states. The vevs in these ground states can be derived from three point functions between chiral primaries at the conformal point. Selection rules for the latter, namely charge conservation and conservation of the number of operators associated with each middle cohomology cycle, lead to precisely the features of the vevs found holographically. To test the actual values of the kinematically allowed vevs would require information about the three point functions of all chiral primaries which is not currently known and is inaccessible in supergravity.
However, as in [12], these vevs are reproduced surprisingly well by simple approximations for the three point functions, which follow from treating the operators as harmonic oscillators. This suggests that the structure of the chiral ring may simplify considerably in the large N limit, and it would be interesting to explore this question further.

An interesting feature of the solutions is that they collapse to the naive geometry when there are internal but no transverse excitations. One can understand this as follows. Geometries with only internal excitations are dual to superpositions of R ground states built from operators associated with the middle cohomology of M^4. Such operators account for a finite fraction of the entropy, but have zero R charges with respect to the SO(4) R symmetry group. This means that they can only be characterized by the vevs of SO(4) singlet operators but the only such operators visible in supergravity are kinematically prevented from acquiring vevs. Thus it is consistent that in supergravity one could not distinguish between such solutions: one would need to go beyond supergravity to resolve them (by, for instance, considering vevs of singlet operators dual to string states).

This brings us to a recurring question in the fuzzball program: can it be implemented consistently within supergravity? As already mentioned, rigorously testing the proposed correspondence between geometries and superpositions of microstates requires information beyond supergravity. Furthermore, the geometric duals of superpositions with very small or zero R charges are not well-described in supergravity. Even if one has geometries which are smooth supergravity geometries, these may not be distinguishable from each other within supergravity: for example, their vevs may differ only by terms of order 1/N, which cannot be reliably computed in supergravity.
The question of whether the fuzzball program can be implemented in supergravity could first be phrased in the following way. Can one find a complete basis of fuzzball geometries, each of which is well-described everywhere by supergravity, which are distinguishable from each other within supergravity and which together span the black hole microstates? On general grounds one would expect this not to be possible since many of the microstates carry small quantum numbers. We quantify this discussion in the last section of this paper in the context of both 2-charge and 3-charge systems.

To make progress within supergravity, however, it would suffice to sample the black hole microstates in a controlled way. That is, one could try to find a basis of geometries which are well-described and distinguishable in supergravity and which span the black hole microstates but for which each basis element is assigned a measure. In this approach, one would deal with the fact that many geometries are too similar to be distinguished in supergravity by picking representative geometries with appropriate measures. In constructing such a representative basis, the detailed matching between geometries and black hole microstates would be crucial, to correctly assign measures and to show that the basis indeed spans all the black hole microstates.

The plan of this paper is as follows. In section 2 we determine the fuzzball geometries for D1-D5 on T^4 from dualizing type II F1-P solutions whilst in section 3 we obtain fuzzball geometries for D1-D5 on K3 from dualizing heterotic F1-P solutions. The resulting solutions are of the same form and are summarized in section 4; readers interested only in the solutions may skip sections 2 and 3. In section 5 we extract from the asymptotically AdS regions the dual field theory data, one point functions for chiral primaries.
In section 6 we discuss the correspondence between geometries and R vacua, extending the proposal of [11, 12] and using the holographic vevs to test this proposal. In section 7 we discuss more generally the implications of our results for the fuzzball proposal. Finally there are a number of appendices. In appendix A we state our conventions for the field equations and duality rules, in appendix B we discuss in detail the reduction of type IIB on K3, and appendix C summarizes relevant properties of spherical harmonics. In appendix D we discuss fundamental string solutions with winding along the torus, and the corresponding duals in the D1-D5 system. In appendix E we derive the density of ground states with fixed R charges.

2 Fuzzball solutions on $T^4$

In this section we will obtain general 2-charge solutions for the D1-D5 system on $T^4$ from type II F1-P solutions.

2.1 Chiral null models

Let us begin with a general chiral null model of ten-dimensional supergravity, written in the form
\begin{equation}
\begin{aligned}
ds^2 &= H^{-1}(x,v)\, dv \left( -du + K(x,v)\, dv + 2 A_I(x,v)\, dx^I \right) + dx_I dx^I ; \\
e^{-2\Phi} &= H(x,v); \qquad B^{(2)}_{uv} = (H(x,v)^{-1} - 1); \qquad B^{(2)}_{vI} = H(x,v)^{-1} A_I(x,v).
\end{aligned}
\tag{2.1}
\end{equation}
The conventions for the supergravity field equations are given in appendix A.1. The above is a solution of the equations of motion provided that the defining functions are harmonic in the transverse directions, labeled by $x^I$:
\begin{equation}
\Box H(x,v) = \Box K(x,v) = \Box A_I(x,v) = \left( \partial_I A^I(x,v) - \partial_v H(x,v) \right) = 0.
\tag{2.2}
\end{equation}
Solutions of these equations appropriate for describing solitonic fundamental strings carrying momentum were given in [14, 15]:
\begin{equation}
H = 1 + \frac{Q}{|x - F(v)|^6}, \qquad
A_I = - \frac{Q \dot F_I(v)}{|x - F(v)|^6}, \qquad
K = \frac{Q^2 \dot F(v)^2}{Q\, |x - F(v)|^6},
\tag{2.3}
\end{equation}
where $F^I(v)$ is an arbitrary null curve describing the transverse location of the string, and $\dot F^I$ denotes $\partial_v F^I(v)$. More general solutions appropriate for describing solitonic strings with fermionic condensates were discussed in [13].
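As a quick consistency check on the power of $|x - F(v)|$ appearing in (2.3), the following worked step evaluates the flat Laplacian on a radial power law in $d$ transverse dimensions, and verifies the remaining condition in (2.2). This is a standard computation added for illustration, not a derivation taken from the text; it assumes $H = 1 + Q/|x-F(v)|^6$ and holds away from the source at $x = F(v)$.

```latex
% Flat Laplacian on a radial power in d dimensions, with r = |x - F(v)| and r \neq 0:
\begin{aligned}
\Box\, r^{-n} &= \frac{1}{r^{d-1}}\, \partial_r \left( r^{d-1}\, \partial_r\, r^{-n} \right)
              = n\,(n + 2 - d)\, r^{-n-2}, \\
d = 8:\quad & n = 6 \;\Rightarrow\; \Box\, r^{-6} = 0, \\
d = 4:\quad & n = 2 \;\Rightarrow\; \Box\, r^{-2} = 0 . \\[4pt]
% Remaining condition of (2.2), using \partial_v f(x - F(v)) = -\dot F^I \partial_I f:
\partial_I A^I - \partial_v H
  &= -Q\, \dot F^I\, \partial_I\, r^{-6} \;-\; Q \left( -\dot F^I\, \partial_I\, r^{-6} \right) = 0 .
\end{aligned}
```

The same formula shows why harmonic functions on the non-compact $R^4$, obtained after smearing over four compact directions, fall off as $|x - F(v)|^{-2}$ rather than $|x - F(v)|^{-6}$.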
Here we will dualise without using the explicit forms of the functions; the resulting dual supergravity solutions are thus applicable for all choices of harmonic functions.

The F1-P solutions described by such chiral null models can be dualised to give corresponding solutions for the D1-D5 system as follows. Compactify four of the transverse directions on a torus, such that $x^i$ with $i = 1, \cdots, 4$ are coordinates on $R^4$ and $x^\rho$ with $\rho = 5, \cdots, 8$ are coordinates on $T^4$. Then let $v = (t - y)$ and $u = (t + y)$, with the coordinate $y$ being periodic with length $L_y \equiv 2\pi R_y$, and smear all harmonic functions over both this circle and over the $T^4$, so that they satisfy
\begin{equation}
\Box_{R^4} H(x) = \Box_{R^4} K(x) = \Box_{R^4} A_I(x) = 0, \qquad \partial_i A^i = 0.
\tag{2.4}
\end{equation}
Thus the harmonic functions appropriate for describing strings with only bosonic condensates are
\begin{equation}
\begin{aligned}
H &= 1 + \frac{Q}{L} \int_0^L \frac{dv}{|x - F(v)|^2}; &
A_i &= - \frac{Q}{L} \int_0^L \frac{dv\, \dot F_i(v)}{|x - F(v)|^2}; \\
A_\rho &= - \frac{Q}{L} \int_0^L \frac{dv\, \dot F_\rho(v)}{|x - F(v)|^2}; &
K &= \frac{Q}{L} \int_0^L \frac{dv\, (\dot F_i(v)^2 + \dot F_\rho(v)^2)}{|x - F(v)|^2}.
\end{aligned}
\tag{2.5}
\end{equation}
Here $|x - F(v)|^2$ denotes $\sum_i (x_i - F_i(v))^2$. Note that neither $\dot F_i(v)$ nor $\dot F_\rho(v)$ has zero modes; the asymptotic expansions of $A_I$ at large $|x|$ therefore begin at order $1/|x|^3$. Closure of the curve in $R^4$ automatically implies that $\dot F_i(v)$ has no zero modes. The question of whether $\dot F_\rho(v)$ has zero modes is more subtle: since the torus coordinate $x^\rho$ is periodic, the curve $F_\rho(v)$ could have winding modes. As we will discuss in appendix D, however, such winding modes are possible only when the worldsheet theory is deformed by constant B fields. The corresponding supergravity solutions, and those obtained from them by dualities, should thus not be included in describing BPS states in the original 2-charge systems.

The appropriate chain of dualities to the D1-D5 system is
\begin{equation}
\begin{pmatrix} P_y \\ F1_y \end{pmatrix}
\xrightarrow{\;S\;}
\begin{pmatrix} P_y \\ D1_y \end{pmatrix}
\xrightarrow{\;T5678\;}
\begin{pmatrix} P_y \\ D5_{y5678} \end{pmatrix}
\xrightarrow{\;S\;}
\begin{pmatrix} P_y \\ NS5_{y5678} \end{pmatrix}
\xrightarrow{\;T_y\;}
\begin{pmatrix} F1_y \\ NS5_{y5678} \end{pmatrix}
\tag{2.6}
\end{equation}
to map to the type IIA NS5-F1 system. The subsequent dualities
\begin{equation}
\begin{pmatrix} F1_y \\ NS5_{y5678} \end{pmatrix}
\xrightarrow{\;T_8\;}
\begin{pmatrix} F1_y \\ NS5_{y5678} \end{pmatrix}
\xrightarrow{\;S\;}
\begin{pmatrix} D1_y \\ D5_{y5678} \end{pmatrix}
\tag{2.7}
\end{equation}
result in a D1-D5 system. Here the subscripts of $Dp_{a_1 \cdots a_p}$ denote the spatial directions wrapped by the brane.
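In carrying out these dualities we use the rules reviewed in appendix A.2. We will give details of the intermediate solution in the type IIA NS5-F1 system since it differs from that obtained in [3].

2.2 The IIA F1-NS5 system

By dualizing the chiral null model from the F1-P system in IIB to F1-NS5 in IIA one obtains the solution
\begin{equation}
\begin{aligned}
ds^2 &= \tilde K^{-1} \left[ -(dt - A_i dx^i)^2 + (dy - B_i dx^i)^2 \right] + H dx_i dx^i + dx_\rho dx^\rho, \\
e^{2\Phi} &= \tilde K^{-1} H, \qquad B^{(2)}_{ty} = \tilde K^{-1} - 1, \\
B^{(2)}_{\bar\mu i} &= \tilde K^{-1} B^{\bar\mu}_i, \qquad B^{(2)}_{ij} = -c_{ij} + 2 \tilde K^{-1} A_{[i} B_{j]}, \\
C^{(1)}_\rho &= H^{-1} A_\rho, \qquad C^{(3)}_{ty\rho} = (H \tilde K)^{-1} A_\rho, \qquad C^{(3)}_{\bar\mu i \rho} = (H \tilde K)^{-1} B^{\bar\mu}_i A_\rho, \\
C^{(3)}_{ij\rho} &= (\lambda_\rho)_{ij} + 2 (H \tilde K)^{-1} A_\rho A_{[i} B_{j]}, \qquad C^{(3)}_{\rho\sigma\tau} = \epsilon_{\rho\sigma\tau\pi} H^{-1} A^\pi,
\end{aligned}
\tag{2.8}
\end{equation}
where
\begin{equation}
\begin{aligned}
\tilde K &= 1 + K - H^{-1} A_\rho A_\rho, \qquad dc = - *_4 dH, \qquad dB = - *_4 dA, \\
d\lambda_\rho &= *_4 dA_\rho, \qquad B^{\bar\mu}_i = (-B_i, A_i),
\end{aligned}
\tag{2.9}
\end{equation}
with $\bar\mu = (t, y)$. Here the transverse and torus directions are denoted by $(i, j)$ and $(\rho, \sigma)$ respectively, $*_4$ denotes the Hodge dual in the flat metric on $R^4$, and $\epsilon_{\rho\sigma\tau\pi}$ denotes the Hodge dual in the flat $T^4$ metric. The defining functions satisfy the equations given in (2.4). The RR field strengths corresponding to the above potentials are
\begin{equation}
\begin{aligned}
F^{(2)}_{i\rho} &= \partial_i (H^{-1} A_\rho), \qquad F^{(4)}_{tyi\rho} = \tilde K^{-1} \partial_i (H^{-1} A_\rho), \\
F^{(4)}_{\bar\mu i j \rho} &= 2 \tilde K^{-1} B^{\bar\mu}_{[i} \partial_{j]} (H^{-1} A_\rho), \qquad F^{(4)}_{i\rho\sigma\tau} = \epsilon_{\rho\sigma\tau\pi} \partial_i (H^{-1} A^\pi), \\
F^{(4)}_{ijk\rho} &= \tilde K^{-1} \left( 6 A_{[i} B_j \partial_{k]} (H^{-1} A_\rho) + H \epsilon_{ijkl} \partial^l (H^{-1} A_\rho) \right).
\end{aligned}
\tag{2.10}
\end{equation}
Thus the solution describes NS5-branes wrapping the $y$ circle and the $T^4$, bound to fundamental strings delocalized on the $T^4$ and wrapping the $y$ circle, with additional excitations on the $T^4$. These excitations break the $T^4$ symmetry by singling out a direction within the torus, and source multipole moments of the RR fluxes; the solution however has no net D-brane charges.

Now let us briefly comment on the relation between this solution and that presented in appendix B of [3]$^2$. The NS-NS sector fields agree, but the RR fields are different; in [3] they are given as 1, 3 and 5-form potentials. The relation of these potentials to field strengths (and the corresponding field equations) is not given in [3].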
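As reviewed in appendix A.2, in the presence of both electric and magnetic sources it is rather natural to use the so-called democratic formalism of supergravity [16], in which one includes p-form field strengths with $p > 5$ along with constraints relating higher and lower form field strengths. Any solution written in the democratic formalism can be rewritten in terms of the standard formalism, appropriately eliminating the higher form field strengths. If one interprets the RR forms of [3] in this way, one does not however obtain a supergravity solution in the democratic formalism; the Hodge duality constraints between higher and lower form field strengths are not satisfied. Furthermore, one would not obtain from the RR fields of [3] the solution written here in the standard formalism, after eliminating the higher forms.

$^2$ We thank Samir Mathur for discussions on this issue.

2.3 Dualizing further to the D1-D5 system

The final steps in the duality chain are T-duality along a torus direction, followed by S-duality. When T-dualizing further along a torus direction to a F1-NS5 solution in IIB, the excitations along the torus mean that the dual solution depends explicitly on the chosen T-duality cycle in the torus. We will discuss the physical interpretation of the distinguished direction in section 4. In the following the T-duality is taken along the $x^8$ direction, resulting in the following D1-D5 system:
\begin{equation}
\begin{aligned}
ds^2 &= \frac{f_1^{1/2}}{f_5^{1/2} \tilde f_1} \left[ -(dt - A_i dx^i)^2 + (dy - B_i dx^i)^2 \right] + f_1^{1/2} f_5^{1/2} dx_i dx^i + f_1^{1/2} f_5^{-1/2} dx_\rho dx^\rho, \\
e^{2\Phi} &= \frac{f_1^2}{f_5 \tilde f_1}, \qquad B^{(2)}_{ty} = \frac{\mathcal{A}}{f_5 \tilde f_1}, \qquad B^{(2)}_{\bar\mu i} = \frac{\mathcal{A} B^{\bar\mu}_i}{f_5 \tilde f_1}, \\
B^{(2)}_{ij} &= \lambda_{ij} + \frac{2 \mathcal{A} A_{[i} B_{j]}}{f_5 \tilde f_1}, \qquad B^{(2)}_{\alpha\beta} = -\epsilon_{\alpha\beta\gamma} f_5^{-1} A^\gamma, \qquad B^{(2)}_{\alpha 8} = f_5^{-1} A_\alpha, \\
C^{(0)} &= -f_1^{-1} \mathcal{A}, \qquad C^{(2)}_{ty} = 1 - \tilde f_1^{-1}, \qquad C^{(2)}_{\bar\mu i} = -\tilde f_1^{-1} B^{\bar\mu}_i, \\
C^{(2)}_{ij} &= c_{ij} - 2 \tilde f_1^{-1} A_{[i} B_{j]}, \qquad C^{(4)}_{tyij} = \lambda_{ij} + \frac{\mathcal{A}}{f_5 \tilde f_1} (c_{ij} + 2 A_{[i} B_{j]}), \\
C^{(4)}_{\bar\mu ijk} &= \frac{3 \mathcal{A}}{f_5 \tilde f_1} B^{\bar\mu}_{[i} c_{jk]}, \qquad C^{(4)}_{ty\alpha\beta} = -\epsilon_{\alpha\beta\gamma} f_5^{-1} A^\gamma, \qquad C^{(4)}_{ty\alpha 8} = f_5^{-1} A_\alpha, \\
C^{(4)}_{\alpha\beta\gamma 8} &= \epsilon_{\alpha\beta\gamma} f_5^{-1} \mathcal{A}, \qquad C^{(4)}_{ij\alpha 8} = (\lambda_\alpha)_{ij} + f_5^{-1} A_\alpha c_{ij}, \qquad C^{(4)}_{ij\alpha\beta} = -\epsilon_{\alpha\beta\gamma} (\lambda^\gamma_{ij} + f_5^{-1} A^\gamma c_{ij}),
\end{aligned}
\tag{2.11}
\end{equation}
where
\begin{equation}
\begin{aligned}
f_5 &\equiv H, \qquad \tilde f_1 = 1 + K - H^{-1} (A_\alpha A_\alpha + \mathcal{A}^2), \qquad f_1 = \tilde f_1 + H^{-1} \mathcal{A}^2, \\
dc &= - *_4 dH, \qquad dB = - *_4 dA, \qquad B^{\bar\mu}_i = (-B_i, A_i), \\
d\lambda_\alpha &= *_4 dA_\alpha, \qquad d\lambda = *_4 d\mathcal{A}.
\end{aligned}
\tag{2.12}
\end{equation}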
Here $\bar\mu = (t, y)$ and we denote $A_8$ as $\mathcal{A}$, with the remaining $A_\rho$ being denoted by $A_\alpha$, where the index $\alpha$ runs over only 5, 6, 7. The Hodge dual over these coordinates is denoted by $\epsilon_{\alpha\beta\gamma}$. Explicit expressions for these defining harmonic functions in terms of variables of the D1-D5 system will be given in section 4.

The forms with components along the torus directions can be written more compactly as follows. Introduce a basis of self-dual and anti-self-dual 2-forms on the torus such that
\begin{equation}
\omega^{\alpha_\pm} = \frac{1}{\sqrt{2}} \left( dx^{4+\alpha_\pm} \wedge dx^8 \pm *_{T^4} (dx^{4+\alpha_\pm} \wedge dx^8) \right),
\tag{2.13}
\end{equation}
with $\alpha_\pm = 1, 2, 3$. These forms are normalized such that
\begin{equation}
\int_{T^4} \omega^{\alpha_\pm} \wedge \omega^{\beta_\pm} = \pm (2\pi)^4 V \delta^{\alpha_\pm \beta_\pm},
\tag{2.14}
\end{equation}
where $(2\pi)^4 V$ is the volume of the torus. Then the potentials wrapping the torus directions can be expressed as
\begin{equation}
\begin{aligned}
B^{(2)}_{\rho\sigma} &= C^{(4)}_{ty\rho\sigma} = \sqrt{2}\, f_5^{-1} A^{\alpha_-} \omega^{\alpha_-}_{\rho\sigma}, \\
C^{(4)}_{ij\rho\sigma} &= \sqrt{2} \left( (\lambda_{ij})^{\alpha_-} + f_5^{-1} A^{\alpha_-} c_{ij} \right) \omega^{\alpha_-}_{\rho\sigma}, \qquad
C^{(4)}_{\rho\sigma\tau\pi} = \epsilon_{\rho\sigma\tau\pi} f_5^{-1} \mathcal{A},
\end{aligned}
\tag{2.15}
\end{equation}
with $\epsilon_{\rho\sigma\tau\pi}$ being the Hodge dual in the flat metric on $T^4$. Note that these fields are expanded only in the anti-self-dual two-forms, with neither the self-dual two-forms nor the odd-dimensional forms on the torus being switched on anywhere in the solution. As we will discuss later, this means the corresponding six-dimensional solution can be described in chiral $N = 4b$ six-dimensional supergravity. The components of forms associated with the odd cohomology of $T^4$ reduce to gauge fields in six dimensions which are contained in the full $N = 8$ six-dimensional supergravity, but not its truncation to $N = 4b$.

3 Fuzzball solutions on K3

In this section we will obtain general 2-charge solutions for the D1-D5 system on K3 from F1-P solutions of the heterotic string.
3.1 Heterotic chiral model in 10 dimensions

The chiral model for the charged heterotic F1-P system in 10 dimensions is:
\begin{equation}
\begin{aligned}
ds^2 &= H^{-1} \left( -du\, dv + (K - 2\alpha' H^{-1} N^{(c)} N^{(c)})\, dv^2 + 2 A_I dx^I dv \right) + dx_I dx^I, \\
\hat B^{(2)}_{uv} &= (H^{-1} - 1), \qquad \hat B^{(2)}_{vI} = H^{-1} A_I, \\
\hat\Phi &= -\tfrac{1}{2} \ln H, \qquad \hat V^{(c)}_v = H^{-1} N^{(c)},
\end{aligned}
\tag{3.1}
\end{equation}
where $I = 1, \cdots, 8$ labels the transverse directions and $\hat V^{(c)}_m$ are Abelian gauge fields, with $(c) = 1, \cdots, 16$ labeling the elements of the Cartan of the gauge group. The fields are denoted with hats to distinguish them from the six-dimensional fields used in the next subsection. The equations of motion for the heterotic string are given in appendix A.1; here again the defining functions satisfy
\begin{equation}
\Box H(x,v) = \Box K(x,v) = \Box A_I(x,v) = \left( \partial_I A^I(x,v) - \partial_v H(x,v) \right) = \Box N^{(c)} = 0.
\tag{3.2}
\end{equation}
For the solution to correspond to a solitonic charged heterotic string, one takes the following solutions
\begin{equation}
\begin{aligned}
H &= 1 + \frac{Q}{|x - F(v)|^6}, \qquad
A_I = - \frac{Q \dot F_I(v)}{|x - F(v)|^6}, \qquad
N^{(c)} = \frac{q^{(c)}(v)}{|x - F(v)|^6}, \\
K &= \frac{Q^2 \dot F(v)^2 + 2\alpha' q^{(c)} q^{(c)}(v)}{Q\, |x - F(v)|^6},
\end{aligned}
\tag{3.3}
\end{equation}
where $F^I(v)$ is an arbitrary null curve in $R^8$, $q^{(c)}(v)$ is an arbitrary charge wave and $\dot F_I(v)$ denotes $\partial_v F_I(v)$. Such solutions were first discussed in [14, 15], although the above has a more generic charge wave, lying in $U(1)^{16}$ rather than $U(1)$. In what follows it will be convenient to set $\alpha' = 1/4$.

These solutions can be related by a duality chain to fuzzball solutions in the D1-D5 system compactified on K3. The chain of dualities is the following:
\begin{equation}
\begin{pmatrix} P_y \\ F1_y \end{pmatrix}_{\mathrm{Het},\, T^4}
\longrightarrow
\begin{pmatrix} P_y \\ NS5_{ty,K3} \end{pmatrix}
\xrightarrow{\;T_y\;}
\begin{pmatrix} F1_y \\ NS5_{ty,K3} \end{pmatrix}
\xrightarrow{\;S\;}
\begin{pmatrix} D1_y \\ D5_{ty,K3} \end{pmatrix}
\tag{3.4}
\end{equation}
The first step in the duality is string-string duality between the heterotic theory on $T^4$ and type IIA on K3. Again the subscripts of $Dp_{a_1 \cdots a_p}$ denote the spatial directions wrapped by the brane. To use this chain of dualities on the charged solitonic strings given above, the solutions must be smeared over the $T^4$ and over $v$, so that the harmonic functions satisfy
\begin{equation}
\Box_{R^4} H = \Box_{R^4} K = \Box_{R^4} A_I = \Box_{R^4} N^{(c)} = \partial_i A^i = 0,
\tag{3.5}
\end{equation}
where $i = 1, \cdots, 4$ labels the transverse $R^4$ directions.
Note that although the chain of dualities is shorter than in the previous case there are various subtleties associated with it, related to the K3 compactification, which will be discussed below. 3.2 Compactification on T 4 Compactification of the heterotic theory on T 4 is straightforward, see [18, 19] and the review [20]. The 10-dimensional metric is reduced as Ĝmn = gMN +GρσV (1) ρ (1) σ (1) ρ M Gρσ (1) σ N Gρσ Gρσ  , (3.6) where V (1) ρ M with ρ = 1, · · · 4, are KK gauge fields. (Recall that the ten-dimensional quan- tities are denoted with hats to distinguish them from six-dimensional quantities.) The reduced theory contains the following bosonic fields: the graviton gMN , the six-dimensional dilaton Φ6, 24 Abelian gauge fields V M ≡ (V (1) ρ M , V M ρ, V (3) (c) M ), a two form BMN and an O(4, 20) matrix of scalars M . Note that the index (a), (b) for the SO(4, 20) vector runs from (1, · · · , 24). These six-dimensional fields are related to the ten-dimensional fields as Φ6 = Φ̂− ln detGρσ ; M ρ = B̂ Mρ + B̂ (1) σ V̂ (c)ρ V (3) (c) M ; (3.7) (3) (c) M = V̂ M − V̂ (1) ρ HMNP = 3(∂[M B̂ L(a)(b)F (V ) with the metric gMN and V (1) ρ M defined in (3.6). The matrix L is given by 0 −I20  , (3.8) where In denotes the n× n identity matrix. The scalar moduli are defined via M = ΩT1 G−1 −G−1C −G−1V T −CTG−1 G+ CTG−1C + V TV CTG−1V T + V T −V G−1 V G−1C + V I16 + V G−1V T Ω1, (3.9) where G ≡ [Ĝρσ ], C ≡ [12 V̂ σ + B̂ ρσ ] and V ≡ [V̂ (c)ρ ] are defined in terms of the components of the 10-dimensional fields along the torus. The constant O(4, 20) matrix Ω1 is given by I4 I4 0 −I4 I4 0 . (3.10) This matrix arises in (3.9) as follows. In [18, 20] the matrix L was chosen to be off-diagonal, but for our purposes it is useful for L to be diagonal. An off-diagonal choice is associated with an off-diagonal intersection matrix for the self-dual and anti-self-dual forms of K3, but this is an unnatural choice for our solutions, in which only anti-self-dual forms are active. 
Thus relative to the conventions of [18, 20] we take L → ΩT1 LΩ1, which induces M → ΩT1MΩ1 and F → ΩT1 F . The definitions of this and other constant matrices used throughout the paper are summarized in appendix B.2. These fields satisfy the equations of motion following from the action −ge−2Φ6 [R+ 4(∂Φ6)2 − H23 − F (V ) MN (LML)(a)(b)F (V ) (b)MN tr(∂MML∂ MML)], (3.11) where α′ has been set to 1/4 and κ26 = κ 10/V4 with V4 the volume of the torus. The reduction of the heterotic solution to six dimensions is then ds2 = H−1 −dudv + K −H−1(1 (N (c))2 + (Aρ) dv2 + 2Aidx + dxidx Buv = (H−1 − 1), Bvi = H−1Ai, Φ6 = −12 lnH (3.12) V (a)v = 2H−1Aρ,H −1N (c) , M = I24, where i = 1, · · · , 4 runs over the transverse R4 directions and ρ = 5, · · · , 8 runs over the internal directions of the T 4. Thus the six-dimensional solution has only one non-trivial scalar field, the dilaton, with all other scalar fields being constant. 3.3 String-string duality to P-NS5 (IIA) on K3 Given the six-dimensional heterotic solution, the corresponding IIA solution in six dimen- sions can be obtained as follows. Compactification of type IIA on K3 leads to the following six-dimensional theory [17]: 6 [R′ + 4(∂Φ′6) 2 − 1 tr(∂MM ′L∂MM ′L)] (3.13) F ′(V ) MN (LM ′L)(a)(b)F ′(V ) (b)MN B′2 ∧ F ′2(V )(a) ∧ F ′2(V )(b)L(a)(b). The field content is the same as for the heterotic theory in (3.11); note that in contrast to (3.7) there is no Chern-Simons term in the definition of the 3-form field strength, that is, H ′MNP = 3∂[MB The rules for string-string duality are [17]: Φ′6 = −Φ6, g′MN = e−2Φ6gMN , M ′ =M, V M = V H ′3 = e −2Φ6 ∗6 H3; (3.14) these transform the equations of motion derived from (3.11) into ones derived from the action (3.13). 
Acting with this string-string duality on the heterotic solutions (3.12) yields, dropping the primes on IIA fields: ds2 = −dudv + (K −H−1((N (c))2/2 + (Aρ)2))dv2 + 2Aidxidv +Hdxidxi, Hvij = −ǫijkl∂kAl, Hijk = ǫijkl∂lH, Φ6 = lnH, (3.15) V (a)v = 2H−1Aρ,H −1N (c) , M = I24, with ǫijkl denoting the dual in the flat R 4 metric. This describes NS5-branes on type IIA, wrapped on K3 and on the circle direction y, carrying momentum along the circle direction. 3.4 T-duality to F1-NS5 (IIB) on K3 The next step in the duality chain is T-duality on the circle direction y to give an NS5-F1 solution of type IIB on K3. It is most convenient to carry out this step directly in six dimensions, using the results of [22] on T-duality of type II theories on K3× S1. Recall that type IIB compactified on K3 gives d = 6, N = 4b supergravity coupled to 21 tensor multiplets, constructed by Romans in [23]. The bosonic field content of this theory is the graviton gMN , 5 self-dual and 21 anti-self dual tensor fields and an O(5,21) matrix of scalars M which can be written in terms of a vielbein M−1 = V TV . Following the notation of [30] the bosonic field equations may be written as RMN = 2P PQ +HrMPQH ∇MPnrM = QMnmPmrM +QMrsPnsM + HnMNPHrMNP , (3.16) along with Hodge duality conditions on the 3-forms ∗6Hn3 = Hn3 , ∗6Hr3 = −Hr3 , (3.17) In these equations (m,n) are SO(5) vector indices running from 1 to 5 whilst (r, s) are SO(21) vector indices running from 6 to 26. The 3-form field strengths are given by Hn = GAV nA ; H r = GAV rA, (3.18) where A ≡ {n, r} = 1, · · · , 26; GA = dbA are closed and the vielbein on the coset space SO(5, 21)/(SO(5) × SO(21)) satisfies V T ηV = η, V =  , η = 0 −I21  . (3.19) The associated connection is dV V −1 = 2P rn Qrs  , (3.20) where Qmn and Qrs are antisymmetric and the off-diagonal block matrices Pms and P rn are transposed to each other. 
Note also that there is a freedom in choosing the vielbein; SO(5) × SO(21) transformations acting on H3 and V as V → OV, H3 → OH3, (3.21) leave G3 and M−1 unchanged. Note that the field equations (3.16) can also be derived from the SO(5, 21) invariant Einstein frame pseudo-action [21] tr(∂M−1∂M)− 1 GAMNPM−1ABG , (3.22) with the Hodge duality conditions (3.17) being imposed independently. Now let us consider the T-duality relating a six-dimensional IIB solution to a six- dimensional IIA solution of (3.13); the corresponding rules were derived in [22]. Given that the six-dimensional IIA supergravity has only an SO(4, 20) symmetry, relating IIB to IIA requires explicitly breaking the SO(5, 21) symmetry of the IIB action down to SO(4, 20). That is, one defines a conformal frame in which only an SO(4, 20) subgroup is manifest and in which the action reads R+ 4(∂Φ)2 + tr(∂M−1∂M) ∂l(a)M−1 (a)(b) ∂l(b) GAMNPM−1ABG . (3.23) The SO(5, 21) matrix M−1 has now been split up into the dilaton Φ, an SO(4, 20) vector l(a) and an SO(4, 20) matrix M−1 (a)(b) , and we have chosen the parametrization M−1AB = Ω e−2Φ + lTM−1l + 1 e2Φl4 −1 e2Φl2 (lTM−1)(b) + e2Φl2(lTL)(b) e2Φl2 e2Φ −e2Φ(lTL)(b) (M−1l)(a) + e2Φl2(Ll)(a) −e2Φ(Ll)(a) M−1(a)(b) + e 2Φ(Ll)(a)(l TL)(b) (3.24) where l2 = l(a)l(b)L(a)(b), L(a)(b) was defined in (3.8) and Ω3 is a constant matrix defined in appendix B.2. The fields Φ, l(a) and M−1 and half of the 3-forms can now be related to the IIA fields of section 3.3 by the following T-duality rules (given in terms of the 2-form potentials bA) [22]: g̃yy = g yy , b̃ yM + b̃ g−1yy gyM , (3.25) g̃yM = g yy ByM , b̃ MN + b̃ g−1yy (BMN + 2(gy[MBN ]y)), g̃MN = gMN − g−1yy (gyMgyN −ByMByN ), l̃(a) = V (a)y , Φ̃ = Φ− 1 log |gyy|, M̃−1(a)(b) =M (a)(b) (a)+1 M − g y gyM ), (1 ≤ (a) ≤ 24), Here y is the T-duality circle, the six-dimensional index M excludes y and IIB fields are denoted by tildes to distinguish them from IIA fields. 
The other half of the tensor fields, that is (b̃1yM − b̃26yM ), (b̃1MN − b̃26MN ), b̃ (a)+1 MN , b̃ (a)+1 , can then be determined using the Hodge duality constraints (3.17). We now have all the ingredients to obtain the T-dual of the IIA solution (3.15) along y ≡ 1 (u−v). The IIA solution is expressed in terms of harmonic functions which also depend on the null coordinate v, and thus one needs to smear the solutions before dualizing. Note that it is the harmonic functions (H,K,AI , N (c)) which must be smeared over v, rather than the six-dimensional fields given in (3.15), since it is the former that satisfy linear equations and can therefore be superimposed. The Einstein frame metric and three forms are given by ds2 = [−(dt−Aidxi)2 + (dy −Bidxi)2)] + HK̃dxidx GAtyi = ∂i , GAµ̄ij = −2∂[i , (3.26) GAijk = ǫijkl∂ lnA + 6∂[i AjBk] where (H +K + 1, 04) , n −2Aρ,− 2N (c),H −K − 1 , (3.27) K̃ = 1 +K −H−1(1 (N (c))2 + (Aρ) 2), dB = − ∗4 dA, Bµi = (−Bi, Ai). Recall that n = 1, · · · , 5 and r = 6, · · · , 26 and ∗4 denotes the dual on flat R4; µ̄ = (t, y). The SO(4, 20) scalars are given by , l(a) = 2H−1Aρ,H −1N (c) , M = I24. (3.28) The SO(5, 21) scalar matrix M−1 = V TV in (3.24) can then conveniently be expressed in terms of the vielbein V = ΩT3 H−1K̃ 0 0 H3K̃)−1(A2ρ + (N (c))2) HK̃−1 − HK̃−1l(b) l(a) 0 I24 Ω3. (3.29) 3.5 S-duality to D1-D5 on K3 One further step in the duality chain is required to obtain the D1-D5 solution in type IIB, namely S duality. However, in the previous section the type II solutions have been given in six rather than ten dimensions. To carry out S duality one needs to specify the relationship between six and ten dimensional fields. Whilst the ten-dimensional SL(2, R) symmetry is part of the six-dimensional symmetry group, its embedding into the full six-dimensional symmetry group is only defined once one specifies the uplift to ten dimensions. 
The details of the dimensional reduction are given in appendix B, with the six-dimensional S duality rules being given in (B.16); the S duality leaves the Einstein frame metric invariant, and acts as a constant rotation and similarity transformation on the three forms GA and the matrix of scalars M respectively. The S-dual solution is thus ds2 = f5f̃1 [−(dt−Aidxi)2 + (dy −Bidxi)2)] + f5f̃1dxidx i, (3.30) GAtyi = ∂i f5f̃1 , GAµ̄ij = −2∂[i f5f̃1 GAijk = ǫijkl∂ lmA + 6∂[i f5f̃1 AjBk] (f5 + F1) , (3.31) (f5 − F1),−2Aα,− 2N (c), 2A5 ((f5 − F1),−2Aα− , 2A) . Here the index α = 6, 7, 8. Note that the specific reduction used here, see appendix B, distinguished A5 from the other Aρ and N (c). A different embedding would single out a different harmonic function, and hence a different vector, and it is thus convenient to introduce (A,Aα−) to denote the choice of splitting more abstractly. Also as in (2.12) it is convenient to introduce the following combinations of harmonic functions: f5 = H, f̃1 = 1 +K −H−1(A2 +Aα−Aα−), (3.32) F1 = 1 +K, f1 = f̃1 +H −1A2. The vielbein of scalars is given by V = ΩT4 f−11 f̃1 0 0 0 0 f̃−11 f1 −GAF1 ( f1f̃1) −1A −GA kγ −FA 0 f−15 f1 0 0 FA 0 −1 f−15 F (k 1 −Fkγ 0 0 f−15 k γ 0 I22 Ω4, (3.33) where to simplify notation quantities (F,G) are defined as F = (f1f5) −1/2, G = (f1f̃1f −1/2. (3.34) We also define the 22-dimensional vector kγ as kγ = (03, 2Aα−). (3.35) Here γ = 1, · · · , b2 where the second Betti number is b2 = 22 for K3. Using the reduction formulae (B.13) and (B.14), the six-dimensional solution (3.30), (3.33) can be lifted to ten dimensions, resulting in a solution with an analogous form to the T 4 case (2.11). We will thus summarize the solution for both cases in the following section. 4 D1-D5 fuzzball solutions In this section we will summarize the D1-D5 fuzzball solutions with internal excitations, for both the K3 and T 4 cases. 
In both cases the solutions can be written as ds2 = [−(dt−Aidxi)2 + (dy −Bidxi)2] + f1/21 f 5 dxidx i + f e2Φ = f5f̃1 f5f̃1 µ̄i = ABµ̄i f5f̃1 , (4.1) ij = λij + 2AA[iBj] f5f̃1 , B(2)ρσ = f γωγρσ, C (0) = −f−11 A, ty = 1− f̃−11 , C µ̄i = −f̃ i , C ij = cij − 2f̃ 1 A[iBj], tyij = λij + f5f̃1 (cij + 2A[iBj]), C µ̄ijk = f5f̃1 cjk], tyρσ = f γωγρσ, C ijρσ = (λ ij + f γcij)ω ρσ, C ρστπ = f 5 Aǫρστπ, where we introduce a basis of self-dual and anti-self-dual 2-forms ωγ ≡ (ωα+ , ωα−) with γ = 1, · · · , b2 on the compact manifold M4. For both T 4 and K3 the self-dual forms are labeled by α+ = 1, 2, 3 whilst the anti-self-dual forms are labeled by α− = 1, 2, 3 for T 4 and α− = 1, · · · 19 for K3. The intersections and normalizations of these forms are defined in (2.13), (2.14) and (B.4). The solutions are expressed in terms of the following combinations of harmonic functions (H,K,Ai,A,Aα−) f5 = H; f̃1 = 1 +K −H−1(A2 +Aα−Aα−); f1 = f̃1 +H−1A2; kγ = (03, 2Aα−); dB = − ∗4 dA; dc = − ∗4 df5; (4.2) dλγ = ∗4dkγ ; dλ = ∗4dA; Bµ̄i = (−Bi, Ai), where µ̄ = (t, y) and the Hodge dual ∗4 is defined over (flat) R4, with the Hodge dual in the Ricci flat metric on the compact manifold being denoted by ǫρστπ. The constant term in C ty is chosen so that the potential vanishes at asymptotically flat infinity. The corresponding RR field strengths are i = −∂i f−11 A tyi = (f1f̃1f f25∂if̃1 + f5A∂iA−A2∂if5 µ̄ij = (f 5 f1f̃1) (f5∂j]f̃1 + f5A∂j]A−A2∂j]f5) + 2f̃1f25∂[iB ijk = −ǫijkl(∂ lf5 − f−11 A∂lA)− 6f 1 ∂[i(AjBk]) (4.3) +(f25f1f̃1) 6A[iBj(f5∂k]f̃1 + f5A∂k]A−A2∂k]f5) iρσ = f 1 A∂i(f γ)ωγρσ, iρστπ = ǫρστπ∂i(f 5 A), F tyijk = ǫijklf̃ 1 f5∂ l(f−15 A), µ̄ijkl = −ǫijklf5f̃ m(f−15 A), tyiρσ = f̃ 1 ∂i(k γ/f5)ω ρσ, F µ̄ijρσ = 2f̃ ∂j](f γ)ωγρσ, ijkρσ = 6f̃−11 A[iBj∂k](f γ) + ǫijklf5∂ l(f−15 k ωγρσ. It has been explicitly checked that this is a solution of the ten-dimensional field equations for any choices of harmonic functions (H,K,Ai,A,Aα−) with ∂iAi = 0. 
Note that in the case of K3 one needs the identity (B.15) for the harmonic forms to check the components of the Einstein equation along K3.

We are interested in solutions for which the defining harmonic functions are given by
\begin{equation}
\begin{aligned}
H &= 1 + \frac{Q_5}{L} \int_0^L \frac{dv}{|x - F(v)|^2}; &
A_i &= - \frac{Q_5}{L} \int_0^L \frac{dv\, \dot F_i(v)}{|x - F(v)|^2}, \\
\mathcal{A} &= - \frac{Q_5}{L} \int_0^L \frac{dv\, \dot{\mathcal{F}}(v)}{|x - F(v)|^2}; &
A^{\alpha_-} &= - \frac{Q_5}{L} \int_0^L \frac{dv\, \dot{\mathcal{F}}^{\alpha_-}(v)}{|x - F(v)|^2}, \\
K &= \frac{Q_5}{L} \int_0^L \frac{dv\, (\dot F(v)^2 + \dot{\mathcal{F}}(v)^2 + \dot{\mathcal{F}}^{\alpha_-}(v)^2)}{|x - F(v)|^2}.
\end{aligned}
\tag{4.4}
\end{equation}
In these expressions $Q_5$ is the 5-brane charge and $L$ is the length of the defining curve in the D1-D5 system, given by
\begin{equation}
L = 2\pi Q_5 / R,
\tag{4.5}
\end{equation}
where $R$ is the radius of the $y$ circle. Note that $Q_5$ has dimensions of length squared and is related to the integral charge via
\begin{equation}
Q_5 = \alpha' n_5
\tag{4.6}
\end{equation}
(where $g_s$ has been set to one). Assuming that the curves $(\dot{\mathcal{F}}(v), \dot{\mathcal{F}}^{\alpha_-}(v))$ do not have zero modes, the D1-brane charge $Q_1$ is given by
\begin{equation}
Q_1 = \frac{Q_5}{L} \int_0^L dv\, (\dot F(v)^2 + \dot{\mathcal{F}}(v)^2 + \dot{\mathcal{F}}^{\alpha_-}(v)^2),
\tag{4.7}
\end{equation}
and the corresponding integral charge is given by
\begin{equation}
Q_1 = \frac{n_1 (\alpha')^3}{V},
\tag{4.8}
\end{equation}
where $(2\pi)^4 V$ is the volume of the compact manifold. The mapping of the parameters from the original F1-P systems to the D1-D5 systems was discussed in [1] and is unchanged here. The fact that the solutions take exactly the same form, regardless of whether the compact manifold is $T^4$ or K3, is unsurprising given that only zero modes of the compact manifold are excited.

The solutions defined in terms of the harmonic functions (4.4) describe the complete set of two-charge fuzzballs for the D1-D5 system on K3. In the case of $T^4$, these describe fuzzballs with only bosonic excitations; the most general solution would include fermionic excitations and thus more general harmonic functions of the type discussed in [13]. Solutions involving harmonic functions with disconnected sources would be appropriate for describing Coulomb branch physics. Note that, whilst the solutions obtained by dualities from supersymmetric F1-P solutions are guaranteed to be supersymmetric, one would need to check supersymmetry explicitly for solutions involving other choices of harmonic functions.
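To make the smeared harmonic functions concrete, the following minimal sketch evaluates $H$ for a circular profile $F(v) = a(\cos(2\pi v/L), \sin(2\pi v/L), 0, 0)$ by numerical quadrature and compares it with the elementary closed form of the angular integral. It also evaluates the D1-brane charge integral for the same profile. The circular profile, the function names and the normalization $Q_5/L$ of the smeared integrals are assumptions made for illustration; they are not taken from the text.

```python
import math

def smeared_inverse_square(x, a, n_steps=2000):
    """Average of 1/|x - F(v)|^2 over one period of the (assumed) circular
    profile F(v) = a (cos(2 pi v/L), sin(2 pi v/L), 0, 0), using the midpoint
    rule. The average (1/L) * integral over [0, L] does not depend on L."""
    total = 0.0
    for k in range(n_steps):
        theta = 2.0 * math.pi * (k + 0.5) / n_steps
        f = (a * math.cos(theta), a * math.sin(theta), 0.0, 0.0)
        total += 1.0 / sum((xi - fi) ** 2 for xi, fi in zip(x, f))
    return total / n_steps

def closed_form(x, a):
    """Closed form of the same average, from the elementary integral
    (1/2pi) * int dtheta / (A - B cos theta) = 1/sqrt(A^2 - B^2)."""
    r2 = sum(xi ** 2 for xi in x)
    rho2 = x[0] ** 2 + x[1] ** 2
    return 1.0 / math.sqrt((r2 + a * a) ** 2 - 4.0 * a * a * rho2)

def harmonic_H(x, a, Q5):
    """H = 1 + (Q5/L) * int_0^L dv / |x - F(v)|^2, assuming the Q5/L normalization."""
    return 1.0 + Q5 * smeared_inverse_square(x, a)

def d1_charge_circle(a, Q5, R):
    """Q1 = (Q5/L) * int_0^L dv |dF/dv|^2 with L = 2 pi Q5 / R; for the circle
    the integrand is the constant (2 pi a / L)^2."""
    L = 2.0 * math.pi * Q5 / R
    return Q5 * (2.0 * math.pi * a / L) ** 2
```

Under these assumptions the charge integral gives $Q_1 = a^2 R^2 / Q_5$, so the radius of the circle is $a = \sqrt{Q_1 Q_5}/R$.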
In the final solutions one of the harmonic functions, $\mathcal{A}$, describing internal excitations is singled out from the others. In the original F1-P system, the solutions pick out a direction in the internal space. For the type II system on $T^4$, the choice of $A_\rho$ singles out a direction in the torus, whilst in the heterotic solution the choice of $(A_\rho, N^{(c)})$ singles out a direction in the 20-dimensional internal space. Both duality chains, however, also distinguish directions in the internal space. In the $T^4$ case one had to choose a direction in the torus, whilst in the K3 case the choice is implicitly made when one uplifts type IIB solutions from six to ten dimensions. In particular, the uplift splits the 21 anti-self-dual six-dimensional 3-forms into 19 + 1 + 1 associated with the ten-dimensional $(F^{(5)}, F^{(3)}, H^{(3)})$ respectively. When there are no internal excitations, the final solutions must be independent of the choice of direction made in the duality chains, but this does not remain true when the original solution breaks the rotational symmetry in the internal space. $\mathcal{A}$ is the component of the original vector along the direction distinguished in the duality chain, whilst $A^{\alpha_-}$ are the components orthogonal to this direction.

When there are no excitations along the direction picked out by the duality, i.e. $\mathcal{A} = 0$, the solution considerably simplifies, becoming
\begin{equation}
\begin{aligned}
ds^2 &= \frac{1}{(f_1 f_5)^{1/2}} \left[ -(dt - A_i dx^i)^2 + (dy - B_i dx^i)^2 \right] + f_1^{1/2} f_5^{1/2} dx_i dx^i + f_1^{1/2} f_5^{-1/2} ds^2_{M^4}, \\
e^{2\Phi} &= \frac{f_1}{f_5}, \qquad B^{(2)}_{\rho\sigma} = f_5^{-1} k^\gamma \omega^\gamma_{\rho\sigma}, \qquad C^{(2)}_{ty} = 1 - f_1^{-1}, \qquad C^{(2)}_{\bar\mu i} = -f_1^{-1} B^{\bar\mu}_i, \\
C^{(2)}_{ij} &= c_{ij} - 2 f_1^{-1} A_{[i} B_{j]}, \qquad C^{(4)}_{ty\rho\sigma} = f_5^{-1} k^\gamma \omega^\gamma_{\rho\sigma}, \qquad C^{(4)}_{ij\rho\sigma} = (\lambda^\gamma_{ij} + f_5^{-1} k^\gamma c_{ij})\, \omega^\gamma_{\rho\sigma}.
\end{aligned}
\end{equation}
In this solution the internal excitations induce fluxes of the NS 3-form and RR 5-form along anti-self-dual cycles in the compact manifold (but no net 3-form or 5-form charges). By contrast the excitations parallel to the duality direction induce a field strength for the RR axion, NS 3-form field strength in the non-compact directions and RR 5-form field strength along the compact manifold (but again no net charges).
Let us also comment on the $M^4$ moduli in our solutions. The solutions are expressed in terms of a Ricci flat metric on $M^4$ and anti-self-dual harmonic two forms. The forms satisfy
\begin{equation}
\omega^\gamma_{\rho\sigma}\, \omega^{\delta\, \rho\sigma} = D^{\epsilon}{}_{\delta}\, d_{\gamma\epsilon} \equiv \delta_{\gamma\delta},
\tag{4.9}
\end{equation}
where the intersection matrix $d_{\gamma\epsilon}$ and the matrix $D^{\epsilon}{}_{\delta}$ relating the basis of forms and dual forms are defined in (B.4) and (B.6) respectively. The latter condition on $D^{\epsilon}{}_{\delta}$ arose from the duality chain, and followed from the fact that in the original F1-P solutions the internal manifold had a flat square metric. Thus, the final solutions are expressed at a specific point in the moduli space of $M^4$ because the original F1-P solutions have specific fixed moduli. It is straightforward to extend the solutions to general moduli: one needs to change
\begin{equation}
\tilde f_1 = 1 + K - H^{-1} (\mathcal{A}^2 + A^{\alpha_-} A^{\alpha_-}) \;\to\; 1 + K - H^{-1} \left( \mathcal{A}^2 + \tfrac{1}{2} k^\gamma k^\delta D^{\epsilon}{}_{\delta}\, d_{\gamma\epsilon} \right),
\tag{4.10}
\end{equation}
with $k^\gamma$ as defined in (4.2), to obtain the solution for more general $D^{\epsilon}{}_{\delta}$.

Given a generic fuzzball solution, one would like to check whether the geometry is indeed smooth and horizon-free. For the fuzzballs with no internal excitations this question was discussed in [3], the conclusion being that the solutions are non-singular unless the defining curve $F^i(v)$ is non-generic and self-intersects. In the appendix of [3], the smoothness of fuzzballs with internal excitations was also discussed. However, their D1-D5 solutions were incomplete: only the metric was given, and this was effectively given in the form (3.30) rather than (4.1). Nonetheless, their conclusion remains unchanged: following the same discussion as in [3] one can show that a generic fuzzball solution with internal excitations is non-singular provided that the defining curve $F^i(v)$ does not self-intersect and $\dot F_i(v)$ only has isolated zeroes. In particular, if there are no transverse excitations, $F^i(v) = 0$, the solution will be singular, as discussed in section 6.6.

One can show that there are no horizons as follows. The harmonic function $f_5$ is clearly positive definite, by its definition.
The functions $(f_1, \tilde f_1)$ are also positive definite, since they can be rewritten as a sum of positive terms as
\begin{equation}
\begin{aligned}
f_5 \tilde f_1 &= \left( 1 + \frac{Q_5}{L} \int_0^L \frac{dv}{|x - F|^2} \right) \left( 1 + \frac{Q_5}{L} \int_0^L \frac{dv\, \dot F^2}{|x - F|^2} \right)
+ \frac{Q_5}{L} \int_0^L \frac{dv \left( (\dot{\mathcal{F}}(v))^2 + (\dot{\mathcal{F}}^{\alpha_-}(v))^2 \right)}{|x - F|^2} \\
&\quad + \frac{1}{2} \left( \frac{Q_5}{L} \right)^2 \int_0^L \int_0^L dv\, dv'\, \frac{(\dot{\mathcal{F}}(v) - \dot{\mathcal{F}}(v'))^2 + (\dot{\mathcal{F}}^{\alpha_-}(v) - \dot{\mathcal{F}}^{\alpha_-}(v'))^2}{|x - F(v)|^2\, |x - F(v')|^2},
\end{aligned}
\tag{4.11}
\end{equation}
and a corresponding expression for $f_5 f_1$. Note that in the decoupling limit only the terms proportional to $Q_5^2$ remain, and these are also manifestly positive definite. Given that the defining functions have no zeroes anywhere, the geometry therefore has no horizons.

Now let us consider the conserved charges. From the asymptotics one can see that the fuzzball solutions have the same mass and D1-brane, D5-brane charges as the naive solution; the latter are given in (4.6) and (4.8) whilst the ADM mass is
\begin{equation}
M = \frac{\Omega_3 L_y}{\kappa_6^2} (Q_1 + Q_5),
\tag{4.12}
\end{equation}
where $L_y = 2\pi R$, $\Omega_3 = 2\pi^2$ is the volume of a unit 3-sphere, and $2\kappa_6^2 = (2\kappa^2)/(V (2\pi)^4)$ with $2\kappa^2 = (2\pi)^7 (\alpha')^4$ in our conventions. The fuzzball solutions have in addition angular momenta, given, up to overall normalization, by
\begin{equation}
J^{ij} \propto \int_0^L dv\, (F^i \dot F^j - F^j \dot F^i).
\tag{4.13}
\end{equation}
These are the only charges; the fields $F^{(1)}$ and $F^{(5)}$ fall off too quickly at infinity for the corresponding charges to be non-zero.

One can compute from the harmonic expansions of the fields dipole and more generally multipole moments of the charge distributions. A generic solution breaks completely the SO(4) rotational invariance in $R^4$, and this symmetry breaking is captured by these multipole moments. However, the multipole moments computed at asymptotically flat infinity do not have a direct interpretation in the dual field theory. In contrast, the asymptotics of the solutions in the decoupling limit do give field theory information: one-point functions of chiral primaries are expressed in terms of the asymptotic expansions (and hence multipole moments) near the $AdS_3 \times S^3$ boundary. Thus it is more useful to compute in detail the latter, as we shall do in the next section.
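The positivity argument and the angular momentum integral of this section can be illustrated numerically. The sketch below is not the authors' computation: it assumes a hypothetical test profile (a circle of radius $a$ in the $(x^1, x^2)$ plane plus a single internal mode $\mathcal{F}(v) = b\cos(4\pi v/L)$, with $A^{\alpha_-} = 0$) and the normalization $Q_5/L$ for the smeared integrals. It evaluates $f_5 \tilde f_1 = H(1+K) - \mathcal{A}^2$ directly and as a sum of manifestly positive terms, and evaluates $\int_0^L dv\,(F^1 \dot F^2 - F^2 \dot F^1)$ without any overall prefactor.

```python
import math

def samples(a, b, L, n):
    """Midpoint samples of the assumed profile on [0, L]. Each entry is
    (F, |dF/dv|^2, dFint/dv) for the transverse circle of radius a and the
    internal mode Fint(v) = b cos(4 pi v / L)."""
    w0 = 2.0 * math.pi / L
    out = []
    for k in range(n):
        th = 2.0 * math.pi * (k + 0.5) / n
        F = (a * math.cos(th), a * math.sin(th), 0.0, 0.0)
        out.append((F, (a * w0) ** 2, -2.0 * b * w0 * math.sin(2.0 * th)))
    return out

def _weights(x, pts):
    # w(v) = 1/|x - F(v)|^2 at each sample point
    return [1.0 / sum((xi - fi) ** 2 for xi, fi in zip(x, p[0])) for p in pts]

def f5_f1tilde(x, pts, Q5):
    """Direct evaluation of f5 * f1tilde = H (1 + K) - A^2."""
    w, n = _weights(x, pts), len(pts)
    H = 1.0 + Q5 * sum(w) / n
    K = Q5 * sum(wi * (p[1] + p[2] ** 2) for wi, p in zip(w, pts)) / n
    A = -Q5 * sum(wi * p[2] for wi, p in zip(w, pts)) / n
    return H * (1.0 + K) - A * A

def f5_f1tilde_positive_form(x, pts, Q5):
    """Same quantity written as a sum of non-negative terms."""
    w, n = _weights(x, pts), len(pts)
    one = Q5 * sum(w) / n
    trans = Q5 * sum(wi * p[1] for wi, p in zip(w, pts)) / n
    internal = Q5 * sum(wi * p[2] ** 2 for wi, p in zip(w, pts)) / n
    double = 0.5 * (Q5 / n) ** 2 * sum(
        wi * wj * (pi[2] - pj[2]) ** 2
        for wi, pi in zip(w, pts) for wj, pj in zip(w, pts))
    return (1.0 + one) * (1.0 + trans) + internal + double

def angular_momentum_integral(a, L, n=200):
    """int_0^L dv (F^1 dF^2/dv - F^2 dF^1/dv) for the circle, prefactor omitted."""
    w0, dv, total = 2.0 * math.pi / L, L / n, 0.0
    for k in range(n):
        th = 2.0 * math.pi * (k + 0.5) / n
        F1, F2 = a * math.cos(th), a * math.sin(th)
        dF1, dF2 = -a * w0 * math.sin(th), a * w0 * math.cos(th)
        total += (F1 * dF2 - F2 * dF1) * dv
    return total
```

For the circle the angular momentum integral equals $2\pi a^2$, independent of $L$, and the two evaluations of $f_5 \tilde f_1$ agree because the rearrangement is an algebraic identity that also holds for the discretized sums.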
5 Vevs for the fuzzball solutions

In the decoupling limit all of the fuzzball solutions are asymptotic to $AdS_3 \times S^3 \times M^4$, where $M^4$ is $T^4$ or K3. Therefore one can use AdS/CFT methods to extract holographic data from the geometries. In particular, the asymptotics of the six-dimensional solutions near the $AdS_3 \times S^3$ boundary encode the vevs of chiral primary operators in the dual field theory.

The precise relationship between asymptotics and vevs is however rather subtle. A systematic method for extracting vevs from asymptotically $AdS \times X$ solutions (with $X$ an arbitrary compact manifold) was only recently constructed, in [10], building on earlier work [24, 25, 26, 27, 28]; see also the review [29]. This method of Kaluza-Klein holography was then applied to the case of asymptotically $AdS_3 \times S^3$ solutions of $d = 6$, $N = 4b$ supergravity coupled to $n_t$ tensor multiplets in [11, 12], and in what follows we will make use of many of the results derived there.

For fuzzball solutions on K3, the relevant solution of six-dimensional $N = 4b$ supergravity coupled to 21 tensor multiplets was given explicitly in (3.30). For the case of $T^4$, we obtained the solution in ten dimensions, but there is a corresponding six-dimensional solution of $N = 4b$ supergravity coupled to 5 tensor multiplets. This solution is of exactly the same form as the K3 solution given in (3.30), but with the index $\alpha_- = 1, 2, 3$. Thus in what follows we will analyze both cases simultaneously. As mentioned earlier, the $T^4$ solution reduces to a solution of $d = 6$, $N = 4b$ supergravity rather than a solution of $d = 6$, $N = 8$ supergravity because forms associated with the odd cohomology of $T^4$ (and hence six-dimensional vectors) are not present in our solutions.
5.1 Holographic relations for vevs Consider an AdS3 × S3 solution of the six-dimensional field equations (3.16), such that ds26 = (−dt2 + dy2 + dz2) + dΩ23 ; (5.1) G5 = H5 ≡ go5 = Q1Q5(rdr ∧ dt ∧ dy + dΩ3), with the vielbein being diagonal and all other three forms (both self-dual and anti-self dual) vanishing. In what follows it is convenient to absorb the curvature radius Q1Q5 into an overall prefactor in the action, and work with the unit radius AdS3 × S3. Now express the perturbations of the six-dimensional supergravity fields relative to the AdS3×S3 background gMN = g MN + hMN ; G A = goA + gA; (5.2) V nA = δ A + φ nrδrA + φnrφmrδmA ; V rA = δ A + φ nrδnA + φnrφnsδsA. These fluctuations can then be expanded in spherical harmonics as follows: hµν = hIµν(x)Y I(y), (5.3) hµa = (hIvµ (x)Y a (y) + h (s)µ(x)DaY I(y)), h(ab) = (ρIt(x)Y It (y) + ρIv (x)DaY b (y) + ρ (s)(x)D(aDb)Y I(y)), haa = πI(x)Y I(y), gAµνρ = 3D[µb (x)Y I(y), gAµνa = (b(A)Iµν (x)DaY I(y) + 2D[µZ (A)Iv (x)Y Iva (y)); gAµab = (A)I(x)ǫabcD cY I(y) + 2Z(A)Ivµ D[bY gAabc = (−ǫabcΛIU (A)I(x)Y I(y)); φmr = φ(mr)I(x)Y I(y), Here (µ, ν) are AdS indices and (a, b) are S3 indices, with x denoting AdS coordinates and y denoting sphere coordinates. The subscript (ab) denotes symmetrization of indices a and b with the trace removed. Relevant properties of the spherical harmonics are reviewed in appendix C. We will often use a notation where we replace the index I by the degree of the harmonic k or by a pair of indices (k, I) where k is the degree of the harmonic and I now parametrizes their degeneracy, and similarly for Iv, It. Imposing the de Donder gauge condition DAhaM = 0 on the metric fluctuations re- moves the fields with subscripts (s, v). In deriving the spectrum and computing correlation functions, this is therefore a convenient choice. 
The de Donder gauge choice is however not always a convenient choice for the asymptotic expansion of solutions; indeed the natural coordinate choice in our application takes us outside de Donder gauge. As discussed in [10] this issue is straightforwardly dealt with by working with gauge invariant combinations of the fluctuations. Next let us briefly review the linearized spectrum derived in [30], focusing on fields dual to chiral primaries. Consider first the scalars. It is useful to introduce the following combinations which diagonalize the linearized equations of motion: 4(k + 1) (5r)k I + 2(k + 2)U I ), (5.4) σkI = 12(k + 1) (6(k + 2)Û I − π̂ The fields s(r)k and σk correspond to scalar chiral primaries, with the masses of the scalar fields being s(r)k = k(k − 2), (5.5) The index r spans 6 · · · 5 + nt with nt = 5, 21 respectively for T 4 and K3. Note also that k ≥ 1 for s(r)k; k ≥ 2 for σk. The hats (Û (5)kI , π̂kI ) denote the following. As discussed in [10], the equations of motion for the gauge invariant fields are precisely the same as those in de Donder gauge, provided one replaces all fields with the corresponding gauge invariant field. The hat thus denotes the appropriate gauge invariant field, which reduces to the de Donder gauge field when one sets to zero all fields with subscripts (s, v). For our purposes we will need these gauge invariant quantities only to leading order in the fluctuations, with the appropriate combinations being I = πI2 +Λ 2ρI2(s); (5.6) 2 = U 2 − 12ρ 2(s); ĥ0µν = h h1±αµ h Next consider the vector fields. It is useful to introduce the following combinations which diagonalize the equations of motion: h±µIv = (C±µIv −A (C±µIv +A ). (5.7) For general k the equations of motion are Proca-Chern-Simons equations which couple (A±µ , C µ ) via a first order constraint [30]. 
The three dynamical fields at each degree k have masses (k − 1, k + 1, k + 3), corresponding to dual operators of dimensions (k, k + 2, k + 4) respectively; the operators of dimension k are vector chiral primaries. The lowest dimension operators are the R symmetry currents, which couple to the k = 1 A±αµ bulk fields. The latter satisfy the Chern-Simons equation Fµν(A ±α) = 0, (5.8) where Fµν(A ±α) is the curvature of the connection and the index α = 1, 2, 3 is an SU(2) adjoint index. We will here only discuss the vevs of these vector chiral primaries. Finally there is a tower of KK gravitons with m2 = k(k + 2) but only the massless graviton, dual to the stress energy tensor, will play a role here. Note that it is the com- bination Ĥµν = ĥ µν + π 0goµν which satisfies the Einstein equation; moreover one needs the appropriate gauge covariant combination ĥ0µν given in (5.6). Let us denote by (O ) the chiral primary operators dual to the fields (s I , σ respectively. The vevs of the scalar operators with dimension two or less can then be expressed in terms of the coefficients in the asymptotic expansion as i ]1; I ]2; (5.9) 2[σ2I ]2 − 2aIij i ]1[s Here [ψ]n denotes the coefficient of the z n term in the asymptotic expansion of the field ψ. The coefficient aIij refers to the triple overlap between spherical harmonics, defined in (C.5). Note that dimension one scalar spherical harmonics have degeneracy four, and are thus labeled by i = 1, · · · 4. Now consider the stress energy tensor and the R symmetry currents. The three dimen- sional metric and the Chern-Simons gauge fields admit the following asymptotic expansions ds23 = g(0)µ̄ν̄ + z g(2)µ̄ν̄ + log(z 2)h(2)µ̄ν̄ + (log(z 2))2h̃(2)µ̄ν̄ + · · · dxµ̄dxν̄ ; A±α = A±α + z2A±α + · · · (5.10) The vevs of the R symmetry currents J±αu are then given in terms of terms in the asymptotic expansion of A±αµ as J±αµ̄ g(0)µ̄ν̄ ± ǫµ̄ν̄ A±αν̄ . 
(5.11) The vev of the stress energy tensor Tµ̄ν̄ is given by 〈Tµ̄ν̄〉 = g(2)µ̄ν̄ + Rg(0)µ̄ν̄ + 8 1g(0)µ̄ν̄ + (5.12) where parentheses denote the symmetrized traceless combination of indices. This summarizes the expressions for the vevs of chiral primaries with dimension two or less which were derived in [12]. Note that these operators correspond to supergravity fields which are at the bottom of each Kaluza-Klein tower. The supergravity solution of course also captures the vevs of operators dual to the other fields in each tower. Expressions for these vevs were not derived in [12], the obstruction being the non-linear terms: in general the vev of a dimension p operator will include contributions from terms involving up to p supergravity fields. Computing these in turn requires the field equations (along with gauge invariant combinations, KK reduction maps etc) up to pth order in the fluctuations. Now (apart from the stress energy tensor) none of the operators whose vevs are given above is an SO(4) (R symmetry) singlet. For later purposes it will be useful to review which other operators are SO(4) singlets. The computation of the linearized spectrum in [30] picks out the following as SO(4) singlets: τ0 ≡ 1 π0; t(r)0 ≡ 1 φ5(r)0, (5.13) along with φ0i(r) with i = 1, · · · , 4. Recall ψ0 denotes the projection of the field ψ onto the degree zero harmonic. The fields (τ0, t(r)0) are dual to operators of dimension four, whilst the fields φ0i(r) are dual to dimension two (marginal) operators. The former lie in the same tower as (σ2, s(r)2) respectively, whilst the latter are in the same tower as s(r)1. In total there are (nt + 1) SO(4) singlet irrelevant operators and 4nt SO(4) singlet marginal operators, where nt = 5, 21 for T 4 and K3 respectively. Consider the SO(4) singlet marginal operators dual to the supergravity fields φi(r). These operators have been discussed previously in the context of marginal deformations of the CFT, see the review [32] and references therein. 
Suppose one introduces a free field realization for the T 4 theory, with bosonic and fermionic fields (xiI(z), ψ I(z)) where I = 1, · · · , N . Then some of the marginal operators can be explicitly realized in the untwisted sector as bosonic bilinears ∂xiI(z)∂̄x I(z̄); (5.14) there are sixteen such operators, in correspondence with sixteen of the supergravity fields. The remaining four marginal operators are realized in the twisted sector, and are associated with deformation from the orbifold point. 5.2 Application to the fuzzball solutions The six-dimensional metric of (3.30) in the decoupling limit manifestly asymptotes to ds2 = (−dt2 + dy2) + + dΩ23 . (5.15) where dv(Ḟ (v)2 + Ḟ(v)2 + Ḟα−(v)2). (5.16) Note that the vielbein (3.33) is asymptotically constant V o = ΩT4 I2 0 0 0 Q1/Q5 0 0 Q5/Q1 0 0 0 0 I22 Ω4, (5.17) but it does not asymptote to the identity matrix. Thus one needs the constant SO(5, 21) transformation V → V (V o)−1, G3 → V oG3. (5.18) to bring the background into the form assumed in (5.1). The fields are expanded about the background values, by expanding the harmonic func- tions defining the solution in spherical harmonics as f5kIY k (θ3) , K = f1kIY k (θ3) , (5.19) k≥1,I (AkI)iY k (θ3) , A = k≥1,I (AkI)Y Ik (θ3) Aα− = k≥1,I Y Ik (θ3) The polar coordinates here are denoted by (r, θ3) and Y k (θ3) are (normalized) spherical harmonics of degree k on S3 with I labeling the degeneracy. Note that the restriction k ≥ 1 in the last three lines is due to the vanishing zero mode, see [12]. As in [12], the coefficients in the expansion can be expressed as f5kI = L(k + 1) dv(CIi1···ikF i1 · · ·F ik), (5.20) f1kI = L(k + 1)Q1 Ḟ 2 + Ḟ2 + (Ḟα−)2 CIi1···ikF i1 · · ·F ik , (AkI)i = − L(k + 1) dvḞiC i1···ikF i1 · · ·F ik , (AkI) = − Q1L(k + 1) dvḞCIi1···ikF i1 · · ·F ik , Aα−kI = − Q1L(k + 1) dvḞα−CIi1···ikF i1 · · ·F ik . 
Here the CIi1···ik are orthogonal symmetric traceless rank k tensors on R 4 which are in one- to-one correspondence with the (normalized) spherical harmonics Y Ik (θ3) of degree k on S Fixing the center of mass of the whole system implies that (f11i + f 1i) = 0. (5.21) The leading term in the asymptotic expansion of the transverse gauge field Ai can be written in terms of degree one vector harmonics as (A1j)iY (aα−Y α−1 + a α+Y α+1 ), (5.22) where (Y α−1 , Y 1 ) with α = 1, 2, 3 form a basis for the k = 1 vector harmonics and we have defined aα± = e±αij(A1j)i, (5.23) where the spherical harmonic triple overlap e±αij is defined in C.6. The dual field is given by B = − (aα−Y α−1 − aα+Y 1 ). (5.24) Now given these asymptotic expansions of the harmonic functions one can proceed to expand all the supergravity fields, and extract the appropriate combinations required for computing the vevs defined in (5.9), (5.11) and (5.12). Since the details of the computation are very similar to those in [12], we will simply summarize the results as follows. Firstly the vevs of the stress energy tensor and of the R symmetry currents are the same as in [12], namely 〈Tµ̄ν̄〉 = 0; (5.25) aα±(dy ± dt). (5.26) The vanishing of the stress energy tensor is as anticipated, since these solutions should be dual to R vacua. As in [12], however, the cancellation is very non-trivial. The vevs of the scalar operators dual to the fields (s I , σ I ) are also unchanged from [12]: 2f51i); (5.27) 6(f12I − f52I)); 2(−(f12I + f52I) + 8aα−aβ+fIαβ). The internal excitations of the new fuzzball solutions are therefore captured by the vevs of operators dual to the fields s I with r > 6: (5+nt)1 2(A1i); (6+α−)1 2Aα−1i ; (5.28) (5+nt)2 6(A2I); (6+α−)2 6Aα−2I . Here nt = 5, 21 for T 4 and K3 respectively, with α− = 1, · · · , b2− with b2− = 3, 19 respec- tively. Thus each curve (F(v),Fα− (v)) induces corresponding vevs of operators associated with the middle cohomology of M4. 
Note the sign difference for the vevs of operators which are related to the distinguished harmonic function $\mathcal F(v)$.

6 Properties of fuzzball solutions

In this section we will discuss various properties of the fuzzball solutions, including the interpretation of the vevs computed in the previous section.

6.1 Dual field theory

Let us start by briefly reviewing aspects of the dual CFT and the ground states of the R sector; a more detailed review of the issues relevant here is contained in [12]. Consider the dual CFT at the orbifold point; there is a family of chiral primaries in the NS sector associated with the cohomology of the internal manifold, $T^4$ or K3. For our discussions only the chiral primaries associated with the even cohomology are relevant; let these be labeled as $O^{(p,q)}_n$ where $n$ is the twist and $(p, q)$ labels the associated cohomology class. The degeneracy of the operators associated with the (1, 1) cohomology is $h^{1,1}$. The complete set of chiral primaries associated with the even cohomology is then built from products of the form

$$\prod_l \big(O^{(p_l,q_l)}_{n_l}\big)^{m_l}, \qquad \sum_l n_l m_l = N, \tag{6.1}$$

where symmetrization over the $N$ copies of the CFT is implicit. The correspondence between (scalar) supergravity fields and chiral primaries is$^{3}$

$$\sigma_n \leftrightarrow O^{(2,2)}_{(n-1)},\quad n \ge 2; \qquad s^{(6)}_n \leftrightarrow O^{(0,0)}_{(n+1)},\quad s^{(6+\tilde\alpha)}_n \leftrightarrow O^{(1,1)}_{(n)\tilde\alpha},\quad \tilde\alpha = 1,\cdots,h^{1,1},\quad n \ge 1. \tag{6.2}$$

Spectral flow maps these chiral primaries in the NS sector to R ground states, where

$$h^R = h^{NS} - j_3^{NS} + \frac{c}{24}; \qquad j_3^R = j_3^{NS} - \frac{c}{12}, \tag{6.3}$$

where $c$ is the central charge. Each of the operators in (6.1) is mapped by spectral flow to a (ground state) operator of definite R-charge

$$\prod_l \big(O^{(p_l,q_l)}_{n_l}\big)^{m_l} \;\to\; \prod_l \big(O^{R(p_l,q_l)}_{n_l}\big)^{m_l}, \tag{6.4}$$

$$j_3^R = \tfrac{1}{2}\sum_l (p_l - 1)\,m_l, \qquad \bar j_3^R = \tfrac{1}{2}\sum_l (q_l - 1)\,m_l.$$

Note that R operators which are obtained from spectral flow of those associated with the (1, 1) cohomology have zero R charge.
3As discussed in [12], the dictionary between (σn, s n ) and (O (2,2) (n−1) (0,0) (n+1) ) may be more complicated, since their quantum numbers are indistinguishable, but this subtlety will not play a role here. 6.2 Correspondence between geometries and ground states In [11, 12] we discussed the correspondence between fuzzball geometries characterized by a curve F i(v) and R ground states (6.4) with (pl, ql) = 1± 1. The latter are related to chiral primaries in the NS sector built from the cohomology common to both T 4 and K3, namely the (0, 0), (2, 0), (0, 2) and (2, 2) cohomology. The following proposal was made in [11, 12] for the precise correspondence between geometries and ground states; see also [33]. Given a curve F i(v) we construct the corre- sponding coherent state in the FP system and then find which Fock states in this coherent state have excitation number NL equal to nw, where n is the momentum and w is the wind- ing. Applying a map between FP oscillators and R operators then yields the superposition of R ground states that is proposed to be dual to the D1-D5 geometry. This proposal can be straightforwardly extended to the new geometries, which are char- acterized by the curve F i(v) along with h1,1 additional functions (F(v),Fα− (v)). Consider first the T 4 system, for which the four additional functions are F ρ(v). Then the eight functions F I(v) ≡ (F i(v), F ρ(v)) can be expanded in harmonics as F I(v) = (αIne −inσ+ + (αIn) ∗einσ ), (6.5) where σ+ = v/wR9. The corresponding coherent state in the FP system is , (6.6) where is a coherent state of the left moving oscillator âIn, satisfying â = αIn Contained in this coherent state are Fock states, such that (âInI ) mI |0〉 , N = nImI . (6.7) Now retain only the terms in the coherent state involving these Fock states, and map the FP oscillators to CFT R operators via the dictionary (â1n ± iâ2n) ↔ OR(±1+1),(±1+1)n ; (6.8) (â3n ± iâ4n) ↔ OR(±1+1),(∓1+1)n ; âρn ↔ O R(1,1) (ρ−4)n. 
The dictionary for the case of K3 is analogous. Here one has four curves $F^i(v)$ describing the transverse oscillations and twenty curves $\mathcal F^{\tilde\alpha}(v)$ describing the internal excitations. The oscillators associated with the former are mapped to operators associated with the universal cohomology as in (6.8), whilst the oscillators associated with the latter are mapped to operators associated with the (1, 1) cohomology as

$$\hat a^{\tilde\alpha}_n \leftrightarrow O^{R(1,1)}_{\tilde\alpha n}. \tag{6.9}$$

This completely defines the proposed superposition of R ground states to which a given geometry corresponds. Note that below we will suggest that a slight refinement of this dictionary may be necessary, taking into account that one of the internal curves is distinguished by the duality chain. For the distinguished curve the mapping may include a negative sign, namely $\hat a_n \leftrightarrow -O^{R(1,1)}_n$; this mapping would explain the relative sign between the vevs found in (5.28) associated with the distinguished curve $\mathcal F$ and the remaining curves $\mathcal F^{\alpha_-}$ respectively.

Note that there is a direct correspondence between the frequency of the harmonic on the curve and the twist label of the CFT operator. The latter is strictly positive, $n \ge 1$, and thus in the dictionary (6.8) there are no candidate CFT operators to correspond to winding modes of the curves $(\mathcal F(v), \mathcal F^{\alpha_-}(v))$. In the case of $T^4$ such candidates might be provided by the additional chiral primaries associated with the extra $T^4$ in the target space of the sigma model, discussed in [34]. However the latter is related to the degeneracy of the right-moving ground states in the dual F1-P system, rather than to winding modes. For K3 all chiral primaries have been included (except for the additional primaries which appear at specific points in the K3 moduli space). Thus one confirms that winding modes of the curves $(\mathcal F(v), \mathcal F^{\alpha_-}(v))$ should not be included in constructing geometries dual to the R ground states.
As discussed in appendix D these winding modes may describe geometric duals of states in deformations of the CFT.

6.3 Matching with the holographic vevs

In this section we will see how the general structure of the vevs given in (5.28) can be reproduced using the proposed dictionary. The holographic vevs take the form

$$\big\langle O^{(1,1)}_{\tilde\alpha kI}\big\rangle \propto \int_0^L dv\,\dot{\mathcal F}^{\tilde\alpha}\, C^I_{i_1\cdots i_k} F^{i_1}\cdots F^{i_k}. \tag{6.10}$$

Thus the vevs of the operators $O^{(1,1)}_{\tilde\alpha kI}$ are zero unless the curve $\mathcal F^{\tilde\alpha}(v)$ is non-vanishing and at least one of the $F^i(v)$ is non-vanishing. Moreover, the dimension one operators will not acquire a vev unless the transverse and internal curves have excitations with the same frequency. Analogous selection rules for frequencies of curve harmonics apply for the vevs of higher dimension operators.

These properties of the vevs follow directly from the proposed superpositions, along with selection rules for three point functions of chiral primaries. The superposition dual to a given set of curves is built from the R ground states

$$O^{RI}|0\rangle = \prod_l \big(O^{R(p_l,q_l)}_{n_l}\big)^{m_l}|0\rangle, \tag{6.11}$$

with $\sum_l n_l m_l = N$ and $I$ labeling the degeneracy of the ground states. So this superposition can be denoted abstractly as $|\Psi) = \sum_I a_I O^{RI}|0\rangle$ with certain coefficients $a_I$. In particular, if the curve $\mathcal F^{\tilde\alpha}(v) = 0$ the superposition does not contain any R ground states built from $O^{R(1,1)}_{\tilde\alpha n}$ operators. Moreover, if there are no transverse excitations, the superposition will contain only states with zero R charge.

Now consider evaluating the vev of a dimension $k$ operator $O^{(1,1)}_{\tilde\alpha k}$ in such a superposition. This is determined by three point functions between this operator and the chiral primary operators occurring in the superposition. More explicitly, the operator vev is related to three point functions via

$$\big(\Psi_{NS}\big|O^{(1,1)}_{\tilde\alpha k}\big|\Psi_{NS}\big) = \sum_{I,J} a_I^{*}a_J\,\big\langle (O^I)^{\dagger}(\infty)\, O^{(1,1)}_{\tilde\alpha k}(\mu)\, O^J(0)\big\rangle. \tag{6.12}$$

Here $O^I$ is the NS sector operator which flows to $O^{RI}$ in the R sector and $|\Psi_{NS})$ is the flow of the superposition back to the NS sector, namely $\sum_I a_I O^I|0\rangle$.
The quantity $\mu$ is a mass scale. Note we are evaluating the relevant three point function in the NS sector, and have hence flowed the ground states back to NS sector chiral primaries. We would get the same answer by flowing the operator whose vev we wish to compute, $O^{(1,1)}_{\tilde\alpha k}$, into the Ramond sector and computing the three point function there. Recall that the R charges of these operators are related by the spectral flow formula (6.3) as $j_3^{NS} = j_3^R + \tfrac{1}{2}N$. In particular, NS sector chiral primaries built only from operators associated with the middle cohomology all have the same R charges, namely $\tfrac{1}{2}N$.

There are two basic selection rules for the three point functions (6.12). Firstly, as usual one has to impose conservation of the R charges. Secondly, a basic property of such three point functions is that they are only non-zero when the total number of operators $O^{(1,1)}_{\tilde\alpha}$ with a given index $\tilde\alpha$ in the correlation function is even$^{4}$. From a supergravity perspective one can see this selection rule arising as follows. One computes $n$-point correlation functions using $n$-point couplings in the three dimensional supergravity action, with the latter following from the reduction of the ten-dimensional action on $S^3 \times M^4$. Since a (1, 1) form integrates to zero over $M^4$, the three dimensional action only contains terms with an even number of fields $s^{\tilde\alpha}$ associated with a given (1, 1) cycle $\tilde\alpha$ on $M^4$. Therefore non-zero $n$-point functions must contain an even number of operators $O^{(1,1)}_{\tilde\alpha}$, and so do corresponding multi-particle 3-point functions obtained by taking coincident limits. Expressed in terms of cohomology, allowed three point functions contain an even number of $(1, 1)_{\tilde\alpha}$ cycles labeled by $\tilde\alpha$.

$^{4}$Note that this selection rule was used for the computation of three point functions of single particle operators in the orbifold CFT in [35].
Thus in single particle correlators one can have processes such as $O^{(0,0)} + O^{(1,1)}_{\tilde\alpha} \to O^{(1,1)}_{\tilde\alpha}$ and $O^{(1,1)}_{\tilde\alpha} + O^{(1,1)}_{\tilde\alpha} \to O^{(2,2)}$, but processes such as $O^{(0,0)} + O^{(1,1)}_{\tilde\alpha} \to O^{(0,0)}$ which involve an odd number of $\tilde\alpha$ cycles are kinematically forbidden.

This kinematical selection rule for (1, 1) cycle conservation immediately explains why the operator $O^{(1,1)}_{\tilde\alpha k}$ can only acquire a vev when the curve $\mathcal F^{\tilde\alpha}(v)$ is non-vanishing: only then does the ground state superposition contain operators $O^{R(1,1)}_{\tilde\alpha}$ such that the selection rule can be satisfied.

One can also easily see why the operator only acquires a vev if there are transverse excitations as well. All Ramond ground states associated with the middle cohomology have zero R charge, with the corresponding chiral primaries in the NS sector having the same charge $j_3^{NS} = \tfrac{1}{2}N$. Thus a superposition involving only $O^{(1,1)}$ operators has a definite R charge, and a charged operator cannot acquire a vev. Including transverse excitations means that the superposition of Ramond ground states contains charged operators, associated with the universal cohomology, and does not have definite R charge. Therefore a charged operator can acquire a vev.

Thus, to summarize, the proposed map between curves and superpositions of R ground states indeed reproduces the principal features of the holographic vevs. Using basic selection rules for three point functions we have explained why the operators $O^{(1,1)}_{\tilde\alpha}$ acquire vevs only when the curve $\mathcal F^{\tilde\alpha}(v)$ is non-zero and when there are excitations in $R^4$. We will see below that using reasonable assumptions for the three point functions we can also reproduce the selection rules for vevs relating to frequencies on the curves. Before discussing the general case, however, it will be instructive to consider a particular example.
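The (1, 1) cycle selection rule used above can be phrased as a simple parity check. The sketch below is only an illustration: the tuple encoding of an operator as `(p, q, alpha)` and the function name `cycle_rule_allowed` are assumptions introduced here, and the check covers the cycle parity rule alone, not R charge conservation.

```python
from collections import Counter

def cycle_rule_allowed(operators):
    """Return True if every (1,1) label alpha appears an even number of
    times among the operators of a correlator.

    operators: list of (p, q, alpha) tuples; alpha is None unless
    (p, q) == (1, 1). This encoding is a hypothetical convenience.
    """
    counts = Counter(alpha for (p, q, alpha) in operators if (p, q) == (1, 1))
    return all(c % 2 == 0 for c in counts.values())

# The three single particle processes quoted in the text, written as the
# full list of operators entering the three point function.
process_a = [(0, 0, None), (1, 1, 7), (1, 1, 7)]     # O(0,0) + O(1,1)a -> O(1,1)a
process_b = [(1, 1, 7), (1, 1, 7), (2, 2, None)]     # O(1,1)a + O(1,1)a -> O(2,2)
process_c = [(0, 0, None), (1, 1, 7), (0, 0, None)]  # O(0,0) + O(1,1)a -> O(0,0)
process_d = [(1, 1, 7), (1, 1, 8), (2, 2, None)]     # two different cycles

for name, proc in [("a", process_a), ("b", process_b),
                   ("c", process_c), ("d", process_d)]:
    print(name, cycle_rule_allowed(proc))
```

The last process is included to stress that the parity is counted separately for each cycle label: two (1, 1) operators with different labels do not satisfy the rule.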
6.4 A simple example Consider a fuzzball geometry characterized by a circular curve in the transverse R4 and one additional internal curve, with only one harmonic of the same frequency: F 1(v) = cos(2πn ); F 2(v) = sin(2πn ); F(v) = µB cos(2πn ), (6.13) where µ = Q1Q5/R and the D1-brane charge constraint (5.16) enforces (A2 + 1 B2) = 1. (6.14) The corresponding dual superposition of R ground states is then given by |Ψ) = Cl(OR(2,2)n )l(O R(1,1) −l |0〉 , (6.15) − l)!l! with the operators being orthonormal in the large N limit. In the case that either A or B are zero the superposition manifestly collapses to a single term. In the general case, this superposition gives the following for the expectation values of the R charges: Ψ|jR3 |Ψ Ψ|j̄R3 |Ψ C2l l; (6.16) N/n−1 − 1)! − (l + 1))! A2(l+1)( −(l+1)) = Evaluating (5.26) for (6.13) gives A2(dy ± dt), (6.17) and thus the integrated R charges defined in our conventions as 〈j3〉 = ; 〈j̄3〉 = , (6.18) agree with those of the superposition of R ground states. The kinematical properties also match between the geometry and the proposed super- position. In particular, when B 6= 0 the SO(2) symmetry in the 1-2 plane is broken: the harmonic functions (K,A) depend explicitly on the angle φ in this plane. The asymptotic expansions of these functions involve charged harmonics, and therefore charged operators acquire vevs characterizing the symmetry breaking. More explicitly, the relevant terms in (5.20) are f1kI ∝ dv(A2 +B2 sin2( ))CIi1···ikF i1 · · ·F ik ; (6.19) AkI ∝ dvB sin( ))CIi1···ikF i1 · · ·F ik . Now the symmetric tensor of rank k and SO(2) charge in the 1-2 plane of ±m behaves as ((F 1)2 + (F 2)2)k−m(F 1 ± iF 2)m = (µA )ke±2πinm L . (6.20) Note that m is related to (j3, j̄3) via m = j3 + j̄3. Thus, when B 6= 0, harmonics in the expansion of f1 with charges |m| = 2 are excited, and terms with |m| = 1 are excited in the expansion of A. 
Following (6.10) the latter implies that the dimension k operators O(1,1) 1(km) only acquire vevs when their SO(2) charge m in the 1-2 plane is ±1. In particular using (5.28) the vevs of the dimension one operators are 〈O(1,1) 1(1±1)〉 = ∓i µAB, (6.21) where the normalized degree one symmetric traceless tensors are 2(F 1 ± iF 2). These properties are implied by the superposition (6.15). The latter is a superposition of states with different R charge, and therefore it does break the SO(2) symmetry, with the symmetry breaking being characterized by the vevs of charged operators. Moreover following (6.12) the vev of O(1,1) 1(km) is given by C∗l Cl′〈(O(2,2)n )l(O (1,1) −l|O(1,1) 1(km) (µ)|(O(2,2)n )l (O(1,1)1n ) −l′〉. (6.22) For the dimension one operators, charge conservation reduces this to C∗l±1Cl〈(O(2,2)n )l±1(O (1,1) ∓1−l|O(1,1) 1(1±1)(µ)|(O (2,2) l(O(1,1)1n ) −l〉. (6.23) Thus there are contributions only from neighboring terms in the superposition. Computing the actual values of these vevs is beyond current technology: one would need to know three point functions for single and multiple particle chiral primaries at the conformal point. However, as in [12], the behavior of the vevs as functions of the curve radii (A,B) can be captured by remarkably simple approximations for the correlators, motivated by harmonic oscillators. Suppose one treats the operators as harmonic oscillators, with the operator O(1,1) 1(11) destroying one O(1,1)1n and creating one O (2,2) n . For harmonic oscillators such that [â, â†] = 1 the normalized state with p quanta is given by |p〉 = (â†)p/ p!|0〉 and therefore â†|p〉 = p+ 1|p+ 1〉. Using harmonic oscillator algebra for the operators gives 〈(O(2,2)n )l+1(O (1,1) −1−l|O(1,1) 1(11) (µ)|(O(2,2)n )l(O (1,1) −l〉 ≈ µ − l)(l + 1). (6.24) Then the corresponding vev in the superposition |Ψ) is 〈O(1,1) 1(11) 〉Ψ = µ N/n−1 c∗l+1cl − l)(l + 1) = µN AB, (6.25) which has exactly the structure of (6.21). 
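The nearest neighbor sum in (6.25) can be checked numerically. The sketch below assumes binomial coefficients $c_l = \sqrt{\binom{M}{l}}\,a^l b^{M-l}$ with $a^2 + b^2 = 1$, where `M` stands for $N/n$ and `a`, `b` stand for the two curve radii with any numerical factors from the charge constraint (6.14) absorbed into them; these names and this normalization are assumptions made for the illustration. Under these assumptions the harmonic oscillator matrix elements give a sum equal to $M a b$, and the mean value of $l$ in the superposition is $M a^2$.

```python
import math

def coefficients(M, a, b):
    """Assumed binomial coefficients c_l = sqrt(C(M, l)) a^l b^(M-l)."""
    return [math.sqrt(math.comb(M, l)) * a**l * b**(M - l) for l in range(M + 1)]

def oscillator_vev(M, a, b):
    """Sum over neighboring terms of c_{l+1} c_l sqrt((M - l)(l + 1)),
    i.e. the harmonic oscillator estimate of the one point function
    (overall scale mu omitted)."""
    c = coefficients(M, a, b)
    return sum(c[l + 1] * c[l] * math.sqrt((M - l) * (l + 1)) for l in range(M))

def mean_l(M, a, b):
    """Expectation value of l in the superposition, sum_l c_l^2 l."""
    c = coefficients(M, a, b)
    return sum(c[l] ** 2 * l for l in range(M + 1))

M, a, b = 12, 0.6, 0.8  # a^2 + b^2 = 1
norm = sum(x * x for x in coefficients(M, a, b))
print(norm, oscillator_vev(M, a, b), M * a * b, mean_l(M, a, b), M * a * a)
```

The closed form follows from $\binom{M}{l}(M-l) = M\binom{M-1}{l}$, which turns the sum into $M a b\,(a^2+b^2)^{M-1}$; the product of the two radii is the structure seen in (6.21).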
Given that such simple approximations (and factorizations) of the correlators reproduce the structure of the vevs so well, it would be interesting to explore whether this relates to simplifications in the structure of the chiral ring in the large $N$ limit.

Next consider the vevs of dimension $k$ operators. Charge conservation and (1, 1) cycle conservation in (6.22) imply that only operators with $m$ odd can acquire a vev. To reproduce the holographic result, that vevs are non-zero only when $m = \pm 1$, requires the assumption that only nearest neighbor terms in the superposition contribute to one point functions. This would follow from a stronger selection rule for (1, 1) cycle conservation, that the number of (1, 1) cycles in the in and out states differ by at most one. In particular, multi-particle processes such as $(O^{(1,1)}_{\tilde\alpha n})^3 + O^{(1,1)}_{\tilde\alpha n} \to (O^{(2,2)}_n)^3$ would be forbidden. The selection rules for holographic vevs suggest that there is indeed such cycle conservation, and it would be interesting to explore this issue further.

Let us now return to the comment made below (6.9), that one may need to include a minus sign in the dictionary for the distinguished curve. Such a minus sign would introduce factors of $(-1)^{N/n-l}$ into the superposition (6.15), and thence an overall sign in the vevs of the associated operators $O^{(1,1)}_{1(kI)}$. This would naturally account for the relative sign difference between the vevs associated with the distinguished curve and those associated with the remaining curves. Without knowing the exact three point functions, and hence the vevs, one cannot conclude that such a minus sign is needed. However such a sign change for oscillators associated with the direction distinguished by the duality would not be surprising. Recall that under T-duality of closed strings right moving oscillators associated with the duality direction switch sign, whilst the left moving oscillators and oscillators associated with orthogonal directions do not.
6.5 Selection rules for curve frequencies Selection rules for charge and (1, 1) cycles are sufficient to reproduce the general structure of the vevs. In the particular example discussed above, these rules also implied the selection rules for the curve frequencies: operators acquire vevs only when the transverse and internal curves have related frequencies. Here we will note how, with reasonable assumptions, one can motivate the selection rules for frequencies in the general case. Consider the computation of the vev of a dimension one operator O(1,1)α̃1 for a general superposition |Ψ) using (6.12). Using the selection rules for charge and (1, 1) cycles, the contributions to (6.12) involve only certain pairs of operators (OI ,OJ ). Their SO(2) charges must differ by (±1/2,±1/2) and they must differ by an odd number of O(1,1)α̃ operators. Now let us make the further assumption that there are contributions to (6.12) only from pairs of operators (OI ,OJ ) which differ by only one term, the relevant operators taking the form OJ = O(p,q)n OJ̃ , (6.26) with OJ̃ being the same for in and out states, but the single operator O(p,q)n differing between in and out states. Thus we are assuming that the relevant three point functions factorize, with the non-trivial part of the correlator arising from a single particle process. This is indeed the structure of the three point functions arising in our example. Only nearest neighbor terms in the superposition contribute in the computation of the vev of the dimension one operator in (6.23). Moreover the m = ±1 charge selection rule for the vevs of higher dimension operators immediately follows from restricting to nearest neighbor terms in the three-point functions. Note further that this factorization structure is present in the orbifold CFT computation of the three point functions. The operator O(1,1)α̃1 ≡ O (1,1) α̃1 I is the identity operator in (N −1) copies of the CFT and thus only acts non-trivially in one copy of the CFT. 
Consider the case of the vev of the operator with SO(2) charges (1/2, 1/2); it would take the form

$\sum_{I,J,\tilde I} a^*_I a_J N_{\tilde I} \Big( \langle (O^{(2,2)}_n)^\dagger(\infty)\, O^{(1,1)}_{\tilde\alpha 1}(\mu)\, O^{(1,1)}_{\tilde\alpha n}(0) \rangle + \langle (O^{(1,1)}_{\tilde\alpha n})^\dagger(\infty)\, O^{(1,1)}_{\tilde\alpha 1}(\mu)\, O^{(0,0)}_n(0) \rangle \Big)$, (6.27)

where $N_{\tilde I}$ is the norm of $O_{\tilde I}$. Analogous expressions would hold for the dimension one operators with other charge assignments. Such a factorization would immediately explain the frequency selection rule found in the holographic vevs obtained from supergravity (6.10). The superposition contains operators of the form (6.26) with both (p, q) = (1, 1) and (p, q) ≠ (1, 1) only when the internal curve and the transverse curves share a frequency. Extending these arguments to vevs of higher dimension operators would be straightforward, and would imply selection rules for curve frequencies.

6.6 Fuzzballs with no transverse excitations

Consider the case where the fuzzball geometry has only internal excitations, $F^i(v) = 0$. Then the corresponding dual superposition of ground states can involve only states built from the operators $O^{R(1,1)}_{\alpha n}$. Any such state will be a zero eigenstate of both $j^R_3$ and $\bar j^R_3$. Furthermore, such ground states associated with the middle cohomology account for a finite fraction of the entropy of the D1-D5 system. In the case of K3 the total entropy behaves as

$S = 2\pi \sqrt{c/6}$, (6.28)

with c = 24N. The ground states associated with the middle cohomology account for a central charge c = 20N. In the case of $T^4$ the entropy behaves as (6.28) with c = 12N. The states associated with the universal cohomology account for c = 4N, the odd cohomology accounts for another c = 4N and the middle cohomology accounts for the final c = 4N.

Now let us consider the properties of the corresponding fuzzball geometry. When there are no transverse excitations and no winding modes of the internal curves, the SO(4) symmetry in $R^4$ is unbroken, and the defining harmonic functions (4.4) reduce to

$H = 1 + \frac{Q_5}{r^2}$; $K = \frac{Q_1}{r^2}$; (6.29)

with $A_i = 0$ and where $Q_1$ is defined in (5.16).
The solutions manifestly all collapse to the standard (singular) D1-D5 solution and so, whilst one would need an exponential number of geometries (upon quantization) to account for dual ground states built from operators associated with the middle cohomology, one has only one singular geometry. Therefore the relevant fuzzball solutions are not visible in supergravity: one needs to take into account higher order corrections.

One can understand this from several perspectives. Firstly, as discussed above, R ground states associated with the middle cohomology have zero R charge; they do not break the SO(4) symmetry. A geometry which is asymptotically AdS3 × S3 and for which the SO(4) symmetry is exact can be characterized by the vevs of SO(4) singlet operators. The only such operators in supergravity are the stress energy tensor and the scalar operators listed in (5.13). Since the vev of the stress energy tensor must be zero for the D1-D5 ground states, the geometry would have to be distinguished by the vevs of the singlet operators given in (5.13).

Our results imply that these operators do not acquire vevs, and therefore within supergravity (without higher order corrections) geometries dual to different R ground states associated with the middle cohomology cannot be distinguished. The reason is the following. The SO(4) singlet operators dual to supergravity fields are related to chiral primaries by the action of supercharge raising operators; they are the top components of the multiplets. Thus these SO(4) singlet operators cannot acquire vevs in states built from the chiral primaries. SO(4) singlet operators associated with stringy excitations would be needed to characterize the different ground states.

A heuristic argument based on the supertube picture also indicates that geometries dual to these ground states are not to be found in the supergravity approximation.
The geometries with transverse excitations in $R^4$ can be viewed as a bound state of D1-D5 branes, blown up by their angular momentum in the $R^4$. Indeed, the characteristic size of the fuzzball geometry is directly related to this angular momentum. The simplest example, related to a circular supertube, is to take a geometry characterized by a circular curve; this is obtained by setting B = 0 in (6.13). The characteristic scale of the geometry is

$r_c \sim g_s \mu / n$, (6.30)

where $g_s$ is the string coupling and µ has dimensions of length, whilst the (dimensionless) angular momentum behaves as $j^{12} = N/n$, and thus $r_c \sim g_s \mu (j^{12}/N)$. Hence the size of the D1-D5 bound state increases linearly with the angular momentum. A general fuzzball geometry will of course not be as symmetric, but nonetheless the characteristic scale averaged over the $R^4$ is still related to the total angular momentum.

In our previous paper [12] we noted that fuzzball geometries dual to vacua for which the R charge is very small are not well described by supergravity. Here we have found that this implies that an exponential number of geometries dual to a finite fraction of the Ramond ground states, with strictly zero R charge, cannot be described at all in the supergravity approximation.

7 Implications for the fuzzball program

In this section we will consider the implications of our results for the fuzzball program, focusing in particular on whether one can find a set of smooth weakly curved supergravity geometries which span the black hole microstates. We have seen in the previous sections that the geometric duals of superpositions of R vacua with small or zero R charge are not well-described in supergravity. The natural basis for R ground states (6.4) uses states of definite R-charges, and it is therefore straightforward to work out the density of ground states with given R-charges, $d_{N,j_3,\bar j_3}$, with the total number of ground states being given by $d_N = \sum_{j_3,\bar j_3} d_{N,j_3,\bar j_3}$.
This computation is discussed in appendix E with the resulting density in the large N limit being dN,j1,j2 4(N + 1− j)31/4 2π(2N − j)√ N + 1− j cosh2( N+1−j ) cosh N+1−j ) , (7.1) where j1 = (j3 + j̄3) and j 2 = (j3 − j̄3) and j = |j1| + |j2|. The key feature is that the number of states with zero R charge differs from the total number of R ground states given in (E.16) only by a polynomial factor: dN,0,0 ∼= dN/N. (7.2) The geometries dual to such ground states are unlikely to be well-described in supergravity, and therefore the basis of black hole microstates labeled by R charges is not a good basis for the geometric duals. This argument reinforces the discussion of [12], where we showed in detail that the geometric duals of specific states (in this basis) must be characterized by very small vevs which cannot be reliably distinguished in supergravity; they are comparable in magnitude to higher order corrections. The geometries that are smooth in supergravity correspond to specific superpositions of the R charge eigenstates, for which some vevs are atypically large. The natural basis for the field theory description of the microstates is thus not the natural basis for the geometric duals. This issue is likely to persist in other black hole systems. For example, the microstates of the D1-D5-P system are also most naturally described as (j3, j̄3) eigenstates, with a relation analogous to (7.2) holding, so the number of states with zero R-charge is suppressed only polynomially compared to the total number of black hole microstates. Just as in the 2-charge system discussed here, the geometric duals are related to supertubes whose radii depend on the R-charges. States or superpositions of states which have small or zero R-charges are unlikely to be well-described by supergravity solutions. Thus a given smooth supergravity geometry should be described by a specific superposition of the black hole microstates. 
Identifying the specific superpositions for known 3-charge geometries is an open and important question.

The issue is whether there exist enough such geometries, well-described and distinguishable in supergravity, to span the entire set of black hole microstates. It seems unlikely that a basis exists which simultaneously satisfies all three requirements. Firstly, on general grounds microstates with small quantum numbers will not be well-described in supergravity. Even when considering superpositions that are well described by supergravity, to span the entire basis one will have to include superpositions which can only be distinguished by these small vevs. That is, in choosing a basis of geometries for which some vevs are sufficiently large for the supergravity description to be valid, one will find that some of these geometries cannot be distinguished among themselves in supergravity.

We have already seen several examples of this problem in the 2-charge system. Let us parameterize the curves as

$F^i(v) = \mu \sum_{n>0} \big( \alpha^i_n e^{2\pi i n v/L} + (\alpha^i_n)^* e^{-2\pi i n v/L} \big)$; (7.3)
$F^{\tilde\beta}(v) = \mu \sum_{n>0} \big( \alpha^{\tilde\beta}_n e^{2\pi i n v/L} + (\alpha^{\tilde\beta}_n)^* e^{-2\pi i n v/L} \big)$,

where $\mu = \sqrt{Q_1 Q_5}/R$ and $\tilde\beta$ runs from 1 to $h^{1,1}(M^4)$. The D1-brane charge constraint (4.7) limits the total amplitude of these curves as

$\sum_{n>0} n^2 \big( |\alpha^i_n|^2 + |\alpha^{\tilde\beta}_n|^2 \big) = 1$. (7.4)

Thus in general increasing the amplitude in one mode, to make certain quantum numbers large, decreases the amplitudes in the others. Moreover, the amplitude in a given mode is bounded via $|\alpha_n|^2 \le 1/n^2$, and is thus intrinsically very small for high frequency modes, which sample vacua with large twist labels in the CFT. Note also that the vevs of R-charges are given in terms of

$j^{ij} = iN \sum_{n>0} n \big( \alpha^i_n (\alpha^j_n)^* - \alpha^j_n (\alpha^i_n)^* \big)$. (7.5)

As we have seen, to be describable in supergravity, geometries must have transverse $R^4$ excitations, and thus some large R-charges, requiring $j^{ij} \gg 1$.
Combining (7.5) and (7.4) one sees that this restricts the amplitudes of the internal excitations, and thus the sampling of the black hole microstates associated with the middle cohomology of $M^4$.

Another way to understand the limitations of supergravity is to go back to the F1-P system, where the corresponding state is the coherent state $|\{\alpha^i_n\}, \{\alpha^\rho_m\})$. These states form a complete basis of states, so we know that there is an F1-P geometry corresponding to every 1/2 BPS state. However, only when all $\alpha^i_n$, $\alpha^\rho_m$ are large are the geometries well-described and distinguishable within supergravity. Indeed, the amplitudes $\alpha^i_n$, $\alpha^\rho_m$ are also the root mean deviations of the distribution around the mean (which is described by the classical curve), so only for large $\alpha^i_n$, $\alpha^\rho_m$ is the classical string that sources the supergravity solution a good approximation of the quantum state. Putting it differently, when some of the amplitudes are small, the difference between the solutions for different amplitudes is comparable with the error in the solutions due to the approximation of the source by a classical string, so one cannot reliably distinguish them within this approximation.

If one could not find a basis of distinguishable supergravity geometries spanning the microstates, one might ask whether a sufficiently representative basis exists. That is, suppose one chooses a single representative of the indistinguishable geometries, and assigns a measure to this geometry. Then is the corresponding basis of weighted geometries sufficiently representative to obtain the black hole properties? In the 2-charge system, the now complete set of fuzzball geometries, along with the precise mapping between these geometries and R vacua, allows these questions to be addressed at a quantitative level, and we hope to return to this issue elsewhere.
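The trade-off between transverse and internal amplitudes described in this section can be illustrated with a short numerical sketch. It assumes that the constraint (7.4) reads sum_n n^2 (|alpha^i_n|^2 + |alpha^beta_n|^2) = 1 and that (7.5) reads j^{ij} = i N sum_n n (alpha^i_n (alpha^j_n)^* - alpha^j_n (alpha^i_n)^*). These normalizations, the single-mode circular profile, and all function names are illustrative assumptions rather than expressions taken verbatim from the text.

```python
import math

def circular_profile(n, transverse_fraction):
    """Single-mode amplitudes saturating the assumed form of (7.4).

    A fraction `transverse_fraction` of the amplitude budget goes into a
    circular curve in the 1-2 plane of R^4; the remainder goes into one
    internal direction. All amplitudes sit in the single mode n.
    """
    a = math.sqrt(transverse_fraction / 2.0) / n
    alpha1 = complex(a, 0.0)
    alpha2 = complex(0.0, a)  # relative phase chosen so that j12 > 0
    alpha_int = complex(math.sqrt(1.0 - transverse_fraction) / n, 0.0)
    return alpha1, alpha2, alpha_int

def constraint(n, amplitudes):
    """Left-hand side of the assumed constraint (7.4) for one mode n."""
    return sum(n**2 * abs(a)**2 for a in amplitudes)

def j12(N, n, alpha1, alpha2):
    """Assumed form of (7.5) restricted to a single mode n."""
    value = 1j * N * n * (alpha1 * alpha2.conjugate()
                          - alpha2 * alpha1.conjugate())
    return value.real

def max_internal_amplitude_sq(N, n, j_min):
    """Largest internal |alpha_n|^2 compatible with j12 >= j_min."""
    transverse_fraction = min(1.0, j_min * n / N)
    return max(0.0, 1.0 - transverse_fraction) / n**2

if __name__ == "__main__":
    N = 1200
    for n in (1, 3, 10):
        a1, a2, a_int = circular_profile(n, 0.25)
        print(n, constraint(n, (a1, a2, a_int)), j12(N, n, a1, a2),
              max_internal_amplitude_sq(N, n, 100.0))
```

Under these assumptions the purely circular curve gives j12 = N/n, matching the value quoted for the circular supertube, and demanding a larger j12 at fixed N and n leaves a smaller amplitude budget for the internal mode.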
Acknowledgments

The authors are supported by NWO, KS via the Vernieuwingsimpuls grant “Quantum gravity and particle physics” and IK, MMT via the Vidi grant “Holography, duality and time dependence in string theory”. This work was also supported in part by the EU contract MRTN-CT-2004-512194. KS and MMT would like to thank both the 2006 Simons Workshop and the theoretical physics group at the University of Crete, where some of this work was completed. The computer algebra package GRTensor was used to verify that our solutions satisfy the supergravity field equations.

A Conventions

The following table summarises the indices used throughout the paper. In some cases an index is used more than once, with different meanings, in separate sections of the paper.

Index        Range              Usage
(m, n)       0, · · · , 9       10d sugra fields
(M, N)       0, · · · , 5       6d sugra fields
(µ, ν)       0, 1, 2            3d fields
(a, b)       1, 2, 3            S3 indices
(i, j)       1, 2, 3, 4         R4 indices
(ρ, σ)       1, 2, 3, 4         M4 indices
(µ̄, ν̄)       0, 1               2d fields
(α, β)       1, 2, 3            SU(2) vector index
(γ, δ)       1, · · · , b2      H2(M4)
(α̃, β̃)       1, · · · , h1,1    H1,1(M4)
(I, J)       1, · · · , 8       SO(8) vector
((c), (d))   1, · · · , 16      heterotic vector fields
((a), (b))   1, · · · , 24      SO(4, 20) vector
(A, B)       1, · · · , 26      SO(5, 21) vector
(m, n)       1, · · · , 5       SO(5) vector
(r, s)       6, · · · , (nt + 1)  SO(nt) vector

A.1 Field equations

The equations of motion for IIA supergravity are: e−2Φ(Rmn + 2∇m∇nΦ− H(3)mpqH (3)pq F (2)mpF 2 · 3!F mpqrF (4)pqr (F (2))2 + (F (4))2) = 0, (A.1) 4∇2Φ− 4(∇Φ)2 +R− 1 (H(3))2 = 0, dH(3) = 0, dF (2) = 0, ∇mF (2)mn − H(3)pqrF (4)npqr = 0, ∇m(e−2ΦH(3)mnp)− F (2)qr F (4)qrnp − 1 2 · (4!)2 ǫ npm1···m4n1···n4F m1···m4F n1···n4 = 0, dF (4) = H(3) ∧ F (2), ∇mF (4)mnpq − 3! · 4!ǫ npqm1···m3n1···n4H m1···m3F n1···n4 = 0.
The corresponding equations for type IIB are: e−2Φ(Rmn + 2∇m∇nΦ− H(3)mpqH (3)pq F (1)m F F (3)mpqF (3)pq 4 · 4!F mpqrsFn (5)pqrs Gmn((F (1))2 + (F (3))2) = 0, 4∇2Φ− 4(∇Φ)2 +R− 1 (H(3))2 = 0, dH(3) = 0, ∇m(e−2ΦH(3)mnp)− F (1)m F (3)mnp − F (3)mqrF (5)mqrnp = 0, (A.2) dF (1) = 0, ∇mF (1)m + H(3)pqrF (3)pqr = 0, dF (3) = H(3) ∧ F (1), ∇mF (3)mnp + H(3)mqrF (5)mqrnp = 0, dF (5) = d(∗F (5)) = H(3) ∧ F (3), where the Hodge dual of a p-form ωp in d dimensions is given by (∗ωp)i1···id−p = ǫi1···id−pj1···jpω j1···jp p , (A.3) with ǫ01···d−1 = √−g. The RR field strengths are defined as F (p+1) = dC(p) −H(3) ∧C(p−2). (A.4) The equations of motion for the heterotic theory are: 4∇2Φ− 4 (∇Φ)2 +R− 1 (H(3))2 − α′(F (c))2 = 0, e−2ΦH(3)mnr Rmn + 2∇m∇nΦ− 1 H(3)mrsH(3)nrs − 2α′F (c)mrF (c)nr = 0, e−2ΦF (c)mn e−2ΦH(3)nrsF (c)rs = 0. mn with (c) = 1, · · · 16 are the field strengths of Abelian gauge fields V (c)m ; we consider here only supergravity backgrounds with Abelian gauge fields. This restriction means that the gauge field part of the Chern-Simons form in H3, H(3) = dB(2) − 2α′ω3(V ) + · · · , (A.5) does not play a role in the supergravity solutions, nor does the Lorentz Chern-Simons term denoted by the ellipses. A.2 Duality rules The T-duality rules for RR fields were derived in [36] by reducing type IIA and type IIB supergravities on a circle and relating the respective RR potentials in the 9-dimensional theory. However, for calculations involving magnetic sources, it is more convenient to work with T-duality rules for RR field strengths, since potentials can only be defined locally. In the following we will rederive the T-duality rules in terms of RR field strengths. It is slightly easier although not necessary to use the democratic formalism of IIA and IIB supergravity introduced in [16]. In this formalism one includes p-form field strengths for p > 5 with Hodge dualities relating higher and lower-form field strengths being imposed in the field equations. 
This formalism is natural when both magnetic and electric sources are present; moreover there is no need for Chern-Simons terms in the field equations. The RR part of the (pseudo)-action is simply SRR = − 2κ210 (F (q))2, (A.6) where q = 2, 4, 6, 8 is even in the IIA case and q = 1, 3, 5, 7, 9 is odd in the IIB case. The field strengths are defined as F (q) = dC(q−1) − H(3) ∧ C(q−3) for q ≥ 3 and Fq = dC(q−1) for q < 3. The Hodge duality relation between higher and lower form field strengths in our conventions is ∗F (q) = (−1)⌊ ⌋F (10−q), (A.7) where ⌊n⌋ denotes the largest integer less or equal to n. Now to compactify on a circle the ten-dimensional metric can be parameterized as ds2 = e2ψ(dy −Aµdxµ)2 + ĝµνdxµdxν , (A.8) where y denotes the compact direction, and 9-dimensional quantities will be denoted as hatted. An economic way to derive the T-duality rules for the field strengths is the following. Choose the vielbein to be ey = eψ(dy −Aµdxµ); eµ = êµ, (A.9) where µ denotes a tangent space index, and êµ is the 9-dimensional vielbein. Now reduce the field strengths (in the tangent frame) as F̂ (q)µ = F (q)µ , F̂ (q−1)µ = F (q)µ y. (A.10) The corresponding 9-dimensional action for the field strengths is given by SRR = − 2κ210 eψF̂ 2q . (A.11) Since ψIIA = −ψIIB under T-duality, one can read from this action the transformation rules for field strengths in 10d: (q+1) y = e , (A.12) F̃ (q+1)µ = eψF (q+2)µ Here q even defines IIB fields in terms of IIA fields and q odd defines IIA in terms of IIB. Note that the field strengths on both sides are in the tangent frame. Given the T-duality rules for NSNS fields eψ̃ = e−ψ, õ = B yµ , B̃ ym = Am, (A.13) B̃(2)mn = B mn + 2A[mB , Φ̃ = Φ− ψ, with the metric gmn invariant, one can easily convert (A.12) back into F (q)m1...mq = F (q+1) m1...mqy − q(−1)qB(2) (q−1) m2...mq] + q(q − 1)B(2) (q−1) m3...mq]y F (q)m1...mq−1y = F (q−1) m1...mq−1 − (q − 1)(−1)qA[m1F (q−1) m2...mq−1]y . 
(A.14) Strictly speaking, this gives the duality rules in the democratic formalism. However we can obtain the usual rules by simply dropping the (p > 5)-form field strengths as long as we make sure to self-dualise F (5) in each IIB solution. The S duality rules for type IIB are τ̃ = −1 , B̃(2) = C(2), C̃(2) = −B(2), F̃ (5) = F (5), G̃mn = |τ |Gmn, (A.15) where τ = C(0) + ie−Φ. B Reduction of type IIB solutions on K3 The reduction of type IIB on K3 is very similar to the reduction of type IIA, which was discussed in some detail in [37]. In the following we will use the reduction of the NS-NS sector fields given in [37], and derive the reduction of the type IIB RR fields. Let us first review the reduction of the NS-NS sector. Starting from the ten-dimensional action SNS = 2κ210 e−2Φ̂(R̂+ 4(∂Φ̂)2 − 1 Ĥ23 ) , (B.1) where ten-dimensional fields are denoted by hats, the corresponding six-dimensional field equations can be derived from the action [37] −ge−2Φ R+ 4(∂Φ)2 − 1 H23 + tr(∂M−1∂M) , (B.2) where the six-dimensional fields are defined as follows. Firstly the 10-dimensional 2-form potential is reduced as B̂(2)(x, y) = B2(x) + b γ(x)ω 2 (y), (B.3) where (x, y) are six-dimensional and K3 coordinates respectively and the two forms ω 2 with γ = 1, · · · 22 span the cohomology H2(K3,R). The 2-forms ωγ2 transform under an O(3, 19) symmetry, with a metric defined by the 22-dimensional intersection matrix dγδ = (2π)4V 2 ∧ ω 2, (B.4) where (2π)4V is the volume of K3. A natural choice for dγδ is dγδ = 0 −I19  , (B.5) corresponding to a diagonal basis for the 3 self-dual and 19 anti-self dual two forms of K3. Furthermore, there is a matrix Dδ γ defined by the action of the Hodge operator ∗K34 ω 2 = ω γ , (B.6) which is dependent on the K3 metric and satisfies ǫ = δ δdǫζD γ = dδζ . 
(B.7) The SO(4, 20) matrix of scalars M−1 (a)(b) was derived in [37] to be M−1 = ΩT2 e−ρ + bγbδdγǫD eρb4 1 eρb2 1 eρb2bγdγδ + b γdγǫD eρb2 eρ eρbγdγδ eρb2bγdγδ + b γdγǫD ρbγdγδ e ρbǫdǫγb ζdζδ + dγǫD Ω2, (B.8) with b2 ≡ bγbδdγδ. Here ρ is the breathing mode of K3, e−ρ = 1(2π)4V ∗41. The six- dimensional dilaton is related to the 10-dimensional dilaton via Φ = Φ̂ + ρ/2. The dimensional reduction of the NS sector makes manifest only an SO(4, 20) subgroup of the full SO(5, 21) symmetry. Including the reduction of the RR sector should thus give the equations of motion following from the six-dimensional string frame action, which for IIB was given in (3.23) R+ 4(∂Φ)2 + tr(∂M−1∂M) ∂l(a)M−1 (a)(b) ∂l(b) GAMNPM−1ABG and in which only an SO(4, 20) subgroup of the total SO(5, 21) symmetry is manifest; recall that M−1AB here is an SO(5, 21) matrix, with M (a)(b) being SO(4, 20). Note that the six- dimensional coupling is related to the ten-dimensional coupling via (2π)4V (2κ26) = 2κ where (2π)4V is the volume of K3. Following the same steps as [37] the RR potentials can be reduced as Ĉ(0)(x, y) = C0(x), Ĉ (2)(x, y) = C2(x) + c (0,2) 2 (y), (B.9) Ĉ(4)(x, y) = C4(x) + c (2,4) (x) ∧ ωγ2 (y) + c(0,4)(x)(e ρ ∗K3 1)(y), where ∗K3 denotes the Hodge dual in the K3 metric and the corresponding field strengths F̂ (1)(x, y) = F1(x), (B.10) F̂ (3)(x, y) = dC2(x)− C0(x)H3(x) + (0,2) (x)− C0(x)dbγ(x) ω2(y) ≡ F3 +Kγ1 ∧ ω Ĥ(3)(x, y) = dB2(x) + db γ(x) ∧ ωγ2 (y) ≡ H3 + db γ ∧ ωγ2 , F̂ (5)(x, y) = dC4(x)− C2(x) ∧H3(x) + (2,4) (x)− C2(x)dbγ(x)− cγ(0,2)(x)H3(x) ∧ ωγ2 (y) dc(0,4)(x)− cγ0,2(x)dbδ(x)dγδ ∧ (eρ(x) ∗K3 1)(y) ≡ F5 +Kγ3 ∧ ω 2 + F̃1 ∧ eρ ∗K3 1. The reduction of the potentials thus gives two three form field strengths H3 and F3, 3 self- dual and 19 anti-self dual three form field strengths K 3 and 46 scalars b γ , c (0,2) , c(0,4) and C0. 
After splitting the three forms H3 and F3 into their self-dual and anti-self-dual parts, we obtain 5 self-dual and 21 anti-self dual tensors in total, as described in [38]. It is then straightforward to obtain the map relating six and ten-dimensional fields by inserting the expressions (B.9) and (B.10) into the ten-dimensional field equations (A.2). The additional RR scalars are contained in l(a) = ΩT2 c̃(0,4) (0,2) , (B.11) with Ω2 defined in the appendix B.2 and the shifted fields defined as (0,2) (0,2) − C0bγ , (B.12) c̃(0,4) = c(0,4) − bγcδ(0,2)dγδ + b2C0. The fields Φ, l(a) and the SO(4, 20) matrix M−1 given in (B.8) can be recombined into the SO(5, 21) matrix M−1 = V TV , with the latter conveniently expressed in terms of the vielbein V = ΩT4 e−Φ 0 0 0 0 −eΦ(C0c(0,4) − 12c (0,2) ) eΦ −eΦc̃(0,4) −eΦC0 eΦc̃γ(0,2)dγδ e−ρ/2C0 0 e −ρ/2 0 0 eρ/2c(0,4) 0 eρ/2b2 eρ/2 eρ/2bγdγδ Ṽδγc (0,2) 0 Ṽδγb γ 0 Ṽγδ Ω4. (B.13) Here the SO(3, 19) vielbein Ṽαβ is defined by dαβD γ = ṼαβṼβγ , c (0,2) (0,2) (0,2) dγδ and the matrix Ω4 is defined in the appendix B.2. The six-dimensional tensor fields are related to the ten-dimensional fields as H13 = (1 + ∗6)H3, Hα++13 = − (Ṽ K3) α+ , (B.14) H53 = − e−ρ/2 (1 + ∗6)F3, H63 = − e−ρ/2 (1− ∗6)F3, 3 = − (Ṽ K3) α− , H263 = (1− ∗6)H3. Here α+ = 1, 2, 3 and α− = 4, · · · 22, labeling the self dual and anti-self dual forms respec- tively. Note that using formulas (B.13) and (B.14) to lift a six-dimensional solution to ten dimensions requires a specific choice of six-dimensional vielbein. The solutions we find have D = dγδ ; this implies the identity 2 )ρσ(ω α−β− , (B.15) where (ρ, τ) are K3 coordinates and gρτ is the K3 metric. As discussed in [39], a choice of ǫ fixes the complex structure completely and implies (ω 2 )ρσ(ω ρσ = Dǫ δdγǫ. Varying this identity with respect to the metric results in (B.15). 
B.1 S-duality in 6 dimensions Given the map between 10-dimensional and 6-dimensional fields, we can now obtain the action of S-duality on 6-dimensional fields as part of the SO(5, 21) symmetry: G3 → OSG3, M−1 → OSM−1OTS , (B.16) where (OS)ij = 0 0 −1 0 I3 0 1 0 0 , (OS)rs = 0 0 1 0 I19 0 −1 0 0 , (B.17) Moreover one can perform an SO(5) × SO(21) transformation to bring the vielbein of the S-dual solution back to the form used by the 10-dimensional lift. Including this transfor- mation, H3 and V transform as H3 → OGH3, V → OGV OTS , (B.18) (OG)ij = C0 0 −eΦ̂ 0 I3 0 eΦ̂ 0 C0 , (OG)rs = C0 0 −eΦ̂ 0 I19 0 eΦ̂ 0 C0 , (B.19) where τ = C0 + ie −Φ̂, Φ̂ = Φ− ρ/2 is the 10-dimensional dilaton and the fields C0 and eΦ̂ are the original ones taken before the S-duality. B.2 Basis change matrices In defining six-dimensional supergravities there are implicit choices of constant SO(p, q) matrices. When discussing the compactification from the ten to six dimensions, the most convenient choices for these matrices are certain off-diagonal forms, see for example [15, 17, 19, 20, 21, 22]. When one is interested in specific solutions of the six-dimensional supergravity equations, such as AdS3 × S3 solutions, and deriving the spectrum in such backgrounds, it is rather more convenient to use diagonal choices for these matrices, see for example [30, 31]. In this paper we both compactify from ten to six dimensions, and expand six-dimensional solutions about a given background. We therefore find it most convenient to use diagonal choices for the constant matrices. To use previous results on compactification and T-duality, we need to apply certain similarity transformations. For the most part these may be implicitly written in terms of basis change matrices, so that compactification and duality formulas remain as simple as possible. 
Thus let us define matrices Ω1 and Ω2 for SO(4, 20), and Ω3 and Ω4 for SO(5, 21) via: (vρ −wρ) (vρ +wρ) , ΩT3 (v − w) (v + w) ,(B.20) (v − w) (v + w) , ΩT4 (v1 − w1) (v2 − w2) (v2 + w2) (v1 + w1) where ρ = 1, · · · 4, (c) = 1, · · · 16, (a) = 1, · · · 24, α = 1, 2, 3 and α− = 1, · · · 19. These satisfy the conditions: 0 −I4 0 −I4 0 0 0 0 −I16 ΩT1 = 0 −I20  , (B.21) σ1 0 0 0 I3 0 0 0 −I19 ΩT2 = 0 −I20 σ1 0 0 0 I4 0 0 0 −I20 ΩT3 = 0 −I21 σ1 0 0 0 0 σ1 0 0 0 0 I3 0 0 0 o −I19 ΩT4 = 0 −I21 Here σ1 is the Pauli matrix C Properties of spherical harmonics Scalar, vector and tensor spherical harmonics satisfy the following equations �Y I = −ΛkY I , (C.1) �Y Iva = (1− Λk)Y Iva , DaY Iva = 0, �Y It = (2− Λk)Y It(ab), D aY It k(ab) where Λk = k(k+2) and the tensor harmonic is traceless. It will often be useful to explicitly indicate the degree k of the harmonic; we will do this by an additional subscript k, e.g. degree k spherical harmonics will also be denoted by Y Ik , etc. � denotes the d’Alambertian along the three sphere. The vector spherical harmonics are the direct sum of two irreducible representations of SU(2)L × SU(2)R which are characterized by ǫabcD bY cIv± = ±(k + 1)Y Iv±a ≡ λkY Iv±a . (C.2) The degeneracy of the degree k representation is dk,ǫ = (k + 1) 2 − ǫ, (C.3) where ǫ = 0, 1, 2 respectively for scalar, vector and tensor harmonics. For degree one vector harmonics Iv is an adjoint index of SU(2) and will be denoted by α. We use normalized spherical harmonics such that Y I1Y J1 = Ω3δ I1J1 ; Y aIvY Jva = Ω3δ IvJv ; Y (ab)ItY Jt = Ω3δ ItJt, (C.4) where Ω3 = 2π 2 is the volume of a unit 3-sphere. We define the following triple integrals as Y IY JY K = Ω3aIJK ; (C.5) (Y α±1 ) 1 = Ω3e αij ; (C.6) D Interpretation of winding modes In the fundamental string supergravity solutions (2.1) the null curves describing the motion of the string along a torus direction xρ (whose periodicity is 2πRρ) could have winding modes such that Fρ(v) = wρRρv/Ry, with wρ integral. 
Consider now the correspondence with quantum string states. Such winding modes are not consistent with both supersymmetry and momentum and winding quantization for a string propagating in flat space, with no B field. Recall that the zero modes of a worldsheet compact boson field can be written as X(σ+, σ−) = x+ + nR)σ+ + − nR)σ− ≡ x+ w̃σ+ + wσ−, (D.1) where R is the radius and (p, n) are the quantized momentum and winding respectively; note that we define σ± = (τ ± σ). BPS left-moving states with no right-moving excitations have w = 0 and hence α′p = −nR2. However the latter condition has no solutions at generic radius and so states with winding along the torus directions cannot be BPS. Therefore winding modes should not be included to describe the F1-P states and corresponding dual D1-D5 ground states of interest here. Now consider switching on constant B ρv ≡ bρ on the worldsheet. The constant B field shifts the momentum charges, and thus there are BPS left-moving states with winding around the torus directions. To be more precise, following the discussion of [12], one can describe a string with left-moving excitations using a null lightcone gauge. The relevant terms in the worldsheet fields are then V = wvσ−; U = wuσ− + w̃uσ+ + a−n e −inσ− ; (D.2) XI = δIρw Iσ− + −inσ− , where winding modes are included only along torus directions, labeled by ρ. The L0 con- straint implies wvwu = (wρ)2 + 2 |n|aInaI−n ≡ (wv)2|∂VXI |20, (D.3) where |A|0 denotes the projection onto the zero mode. The momentum and winding charges are given by dσ(∂τX m +B(2)mn∂σX n); Wm = dσ∂σX m, (D.4) respectively, where α′ = 2. Requiring no winding in the time direction and no momentum along the xρ directions imposes w̃u = wu + wv and wρ = bρw v . The conserved momentum and winding charges are then PM = 1 (1 + |∂VXI |20 + b2ρ), (|∂VXI |20 − b2ρ), 0 ; WM = wv(0, 1, 0, bρ). (D.5) Note that the integral quantized momentum charge py along the y direction is therefore py = Ry(w u − (wv)−1(wρ)2). 
(D.6) Now consider the solitonic string supergravity solution (2.1) with defining curves F I(v) where F ρ(v) = bρv + F̄ ρ(v), with F̄ ρ(v) having no zero mode. The ADM charges of this solitonic string were computed in [15], and are given by PMADM = kQ (1 + |∂vF I |20), |∂vF I |20, 0, bρ , (D.7) where the effective Newton constant is k = Ω3Ly/2κ 6. When bρ = 0 these charges match the worldsheet charges (D.5) provided that wv = 2kQ as in [15] but when bρ 6= 0 they do not quite agree with the worldsheet charges. The reason is that in the supergravity solution ρv approaches zero at infinity, but to match with the constant B ρv background on the worldsheet, B ρv should approach bρ at infinity. This can be achieved via a constant gauge transformation Aρ → Aρ − bρ, combined with a coordinate shift u→ u+ 2bρxρ. The ADM charges of this shifted background indeed exactly match the worldsheet charges (D.5). The harmonic functions Aρ then take the form Aρ = −bρH − |x− F |2 , (D.8) where in the latter expression |x−F |2 denotes i−F i(v))2; the harmonic function has been smeared over the T 4 and the y circle. Note that when F i(v) = 0 the supergravity solution collapses to ds2 = H−1dv(−du +Kdv) + dxIdxI ; K = (1 + Q|∂vF ρ|20 ), (D.9) e−2Φ = H ≡ (1 + Q ); B(2)uv = (H−1 − 1); B(2)vρ = −bρ. This is the naive SO(4) invariant F1-P solution, with an additional constant B field. Finally let us note that one can similarly switch on winding modes for the curves q(c)(v) charac- terizing the charge waves in the heterotic solution (3.1) by including constant A v on the worldsheet. Now let us consider solutions in the D1-D5 system, and the interpretation of including winding modes of the internal curves. In particular, it is interesting to note that the general SO(4) invariant solutions include harmonic functions A = ao + ; Aα− = aα−o + , (D.10) in addition to the harmonic functions (H,K) given in (6.29). 
The non-constant terms in these harmonic functions are related to the winding modes of the internal curves, with the quantities aα̃ = (a, aα−) being given by a = −Q5 dvḞ(v); aα− = −Q5 dvḞα−(v). (D.11) Following the duality chain, these constants are given by aα̃ = −Q5bα̃ where for the T 4 case bα̃ ≡ B(2)ρv = bρ and for the K3 case bα̃ ≡ (B(2)ρv = bρ, A(c)v = b(c)). The constant terms (ao, a o ) are related to the boundary conditions at asymptotically flat infinity, as we will discuss below. When these functions (A,Aα−) are non-zero, the geometry generically differs from the naive D1-D5 geometry. The functions (f1, f̃1) appearing in the metric behave as f̃1 = 1 + − (1 + Q5 (ao + )2 + (aα−o + f1 = 1 + − (1 + Q5 (aα−o + . (D.12) In the decoupling limit these functions become f̃1 → r−2(Q1 −Q−15 (a2 + aα−aα−)) ≡ ; f1 → r−2(Q1 −Q−15 (aα−aα−)) ≡ , (D.13) and thus (ao, a o ) drop out. Note that q̃1 corresponds to the conserved momentum charge in the F1-P system (D.6). Substituting the decoupling region functions into (4.1), one finds that the near horizon region of the solution is AdS3×S3×M4, supported by both F (3) and H(3) flux: ds2 = (−dt2 + dy2) + q1Q5( + dΩ23) + ds2M4 ; (D.14) e2Φ = Q5q̃1 tyr = − = 2q−11 q̃1Q5; tyr = 2aQ 1 r, H = −2a. The field strengths F (1) and F (5) vanish, but there are non-vanishing potentials: B(2)ρσ = 2Q−15 a α−ωα−ρσ , C (0) = −q−11 a, C(4)ρστπ = Q 5 aǫρστπ; (D.15) tyαβ = a(1 + q̃ 2)ǫαβ, C αβρσ = 2 2ǫαβa α−ωα−ρσ , C tyρσ = 2Q−15 a α−ωα−ρσ , where ǫ is a 2-form such that dǫ is the volume form of the unit 3-sphere. The conserved charges therefore include Chern-Simons terms; using the equations of motion (A.2) one finds that they are given by D5 : Q5 = (F (3) +H(3)C(0)); D1 : q̃1 = S3×M4 (∗F (3) +H(3) ∧ C(4)); (D.16) D3 : aα− = S3×ωα− B(2) ∧ (F (3) +H(3)C(0)); NS5 : a = −1 H(3), where we drop terms which do not contribute to the charges. 
The curvature radius of the AdS3 × S3 is l = (q1Q5)1/4, and the three-dimensional Newton constant is 8πV4Ω3 (q1Q5) 3/4, (D.17) with the volume of M4 being (2π)4V and 2κ210 = (2π) 7(α′)4. Then using [40, 41] the central charge of the dual CFT is (α)′4 q̃1Q5 ≡ 6ñ1n5 (D.18) where the integral charges (ñ1, n5) are given by Q5 = α ′n5; q̃1 = (α′)3ñ1 . (D.19) Now consider the relation between this system and the F1-P system discussed previously. The conserved charges here are (Q5, q̃1, a, a α−), which correspond to the winding, momen- tum along the y circle and winding along the internal manifold in the original system. The fact that (a, aα−) measure NS5-brane and D3-brane charges in the final system is consistent with the duality chains from the F1-P systems: applying the standard duality rules along the chains given in (2.6),(2.7) and (3.4), one indeed finds that the original winding charges become NS5-brane and D3-brane charges. Finally let us comment on the constant terms in the harmonic functions, (ao, a These clearly determine the behavior of the solution at asymptotically flat infinity: the B field and RR potentials at infinity depend on them. Now consider how these constant terms can be described in the CFT. In the context of the pure D1-D5 system it was noted in [12] that (infinitesimal) constant terms in the harmonic functions (f1, f5) can be reinstated by making (infinitesimal) irrelevant deformations of the CFT by SO(4) singlet operators. See also [42] for a related discussion in the context of the AdS5/CFT4 correspondence. It seems probable that a similar interpretation would hold here: the (nt − 1) parameters (ao, a o ) (where nt = 5, 21 for T 4 and K3 respectively) would be related to the parameters of deformations of the CFT by irrelevant SO(4) singlet operators. In total taking into account these (nt − 1) zero modes, plus the two constant terms in the (f1, f5) harmonic functions, one gets (nt+1) parameters. 
This agrees exactly with the count of the number of irrelevant SO(4) singlet operators5. How to describe these deformations in the field theory beyond the infinitesimal level is not known, however. E Density of ground states with fixed R charges In this appendix we will derive an asymptotic formula for the number of R ground states with given R charges. Our derivation follows closely that of [43] for the density of fundamental string states with a given mass and angular momentum. In fact, we will consider the case of K3, so the relevant counting is precisely that of the density of left moving heterotic string states with a given excitation level N and (commuting) angular momenta (j12, j34) in the 5Such deformations may also be related to the attractor flow of moduli; this idea is currently being developed by Kyriakos Papadodimas and collaborators. transverse R4. For this purpose we can consider the following Hamiltonian (a)=1 + λ1j 1 + λ2j 2, (E.1) where (λ1, λ2) are Lagrange multipliers and j1 = j12 = −i n−1(α1−nα n −α2−nα1n); j2 = j34 = −i n−1(α3−nα n−α4−nα3n). (E.2) Here the oscillators satisfy the standard commutation relations, namely n , α nδn+mδ (a)(b). In [43] the partition function was computed in the case λ2 = 0, and thus the partition function of interest here can be computed by generalizing their results. The first step is to diagonalize the Hamiltonian by introducing combinations a12n = (α1n + iα n); b (α1n − iα2n) (E.3) and analogously (a34n , b n ). Then the Hamiltonian takes the form (a)=5 n + (n− λ1)(a12n )†a12n + (n+ λ1)(b12n )†b12n (E.4) +(n− λ2)(a34n )†a34n + (n+ λ2)(b34n )†b34n The partition function Z = Tr(e−βH) is then (1− wn)−20(1− c1wn)−1(1− c−11 wn)−1(1− c2wn)−1(1− c (E.5) with w = e−β and c1 = e βλ1 , c2 = e βλ2 . To estimate the asymptotic density of states, one as usual expresses the partition function in terms of modular functions and then uses the modular transformation properties. 
Here one needs the Jacobi theta function θ1(z|τ) = 2f(q2)q1/4 sin(πz) (1− 2q2n cos(2πz) + q4n), (E.6) f(q2) = (1− q2n), q = eiπτ , (E.7) and the modular transformation property | − 1 ) = eiπ/4 τeiπz 2/τθ1(z|τ) (E.8) Rewriting the partition function in terms of the modular functions, applying this modular transformation and then taking the high temperature limit results in Z(β, λ1, λ2) = Cβ 12e4π 2/β λ1λ2 sin(πλ1) sin(πλ2) , (E.9) with C a constant. From this expression one can extract the density of states with level N and angular momenta (j1, j2) by expanding Z(w, k1, k2) = dN,j1,j2w Neik1j 1+ik2j , (E.10) where k1 = −iβλ1 and k2 = −iβλ2, and projecting out the dN,j1,j2 . Integrating over (k1, k2) can be done exactly, since dkeiky sinh(πk/β) cosh2(βy/2) , (E.11) resulting in the following contour integral over a circle around w = 0 for dN,j1,j2 : dN,j1,j2 = C β14e4π 2/β 1 cosh2(βj1/2) cosh2(βj2/2) . (E.12) Assuming N is large the integral can be approximated by a saddle point evaluation, with the saddle point defined by the solution of = N + 1− j1 tanh(1 j1β)− j2 tanh(1 j2β). (E.13) For small angular momenta, which is the case of primary interest here, the solution is β ∼= 2π/ N + 1. For ( ∣) = O(N) the stationary point is at N + 1− |j1| − |j2| . (E.14) Note that ∣ ≤ N . This latter stationary point is equally applicable to small angular momenta, and thus one can write the asymptotic density of states as dN,j1,j2 4(N + 1− j)31/4 2π(2N − j)√ N + 1− j cosh2( N+1−j ) cosh N+1−j ) , (E.15) where j = ∣. The constant of proportionality is fixed by the state with j1 = N , j2 = 0 being unique. Note that the commuting generators (j3, j̄3) of (SU(2)L, SU(2)R) respectively are related to the rotations in the 1-2 and 3-4 planes via j3 = (j1 + j2) and j̄3 = (j1 − j2). 
The total number of states at level N is dN ∼= N27/4 exp(4π N), (E.16) and thus the density of states with zero angular momenta differs from the total number of states only by a factor of 1/N ; the exponential growth with N is the same. References [1] O. Lunin and S. D. Mathur, “AdS/CFT duality and the black hole information para- dox,” Nucl. Phys. B 623, 342 (2002) [arXiv:hep-th/0109154]. [2] O. Lunin, S. D. Mathur and A. Saxena, “What is the gravity dual of a chiral primary?,” Nucl. Phys. B 655 (2003) 185 [arXiv:hep-th/0211292]. [3] O. Lunin, J. Maldacena and L. Maoz, “Gravity solutions for the D1-D5 system with angular momentum,” arXiv:hep-th/0212210. [4] O. Lunin and S. D. Mathur, “Metric of the multiply wound rotating string,” Nucl. Phys. B 610, 49 (2001) [arXiv:hep-th/0105136]. [5] V. Balasubramanian, J. de Boer, E. Keski-Vakkuri and S. F. Ross, “Supersymmetric conical defects: Towards a string theoretic description of black hole formation,” Phys. Rev. D 64, 064011 (2001) [arXiv:hep-th/0011217]. [6] J. M. Maldacena and L. Maoz, “De-singularization by rotation,” JHEP 0212 (2002) 055 [arXiv:hep-th/0012025]. [7] S. D. Mathur, “The fuzzball proposal for black holes: An elementary review,” arXiv:hep-th/0502050. [8] I. Bena and N. P. Warner, “Black holes, black rings and their microstates,” arXiv:hep- th/0701216. [9] V. S. Rychkov, “D1-D5 black hole microstate counting from supergravity,” JHEP 0601 (2006) 063 [arXiv:hep-th/0512053]. [10] K. Skenderis and M. Taylor, “Kaluza-Klein holography,” JHEP 0605, 057 (2006) [arXiv:hep-th/0603016]. [11] K. Skenderis and M. Taylor, “Fuzzball solutions for black holes and D1-brane-D5-brane microstates,” Phys. Rev. Lett. 98, 071601 (2007) [arXiv:hep-th/0609154]. [12] I. Kanitscheider, K. Skenderis and M. Taylor, “Holographic anatomy of fuzzballs,” JHEP 0704 (2007) 023 [arXiv:hep-th/0611171]. [13] M. Taylor, “General 2 charge geometries,” JHEP 0603 (2006) 009 [arXiv:hep- th/0507223]. [14] C. G. Callan, J. M. Maldacena and A. W. 
Peet, “Extremal Black Holes As Fundamental Strings,” Nucl. Phys. B 475, 645 (1996) [arXiv:hep-th/9510134]. [15] A. Dabholkar, J. P. Gauntlett, J. A. Harvey and D. Waldram, “Strings as Solitons and Black Holes as Strings,” Nucl. Phys. B 474, 85 (1996) [arXiv:hep-th/9511053]. [16] E. Bergshoeff, R. Kallosh, T. Ortin, D. Roest and A. Van Proeyen, “New formulations of D = 10 supersymmetry and D8 - O8 domain walls,” Class. Quant. Grav. 18, 3359 (2001) [arXiv:hep-th/0103233]. [17] A. Sen, “String String Duality Conjecture In Six-Dimensions And Charged Solitonic Strings,” Nucl. Phys. B 450 (1995) 103 [arXiv:hep-th/9504027]. [18] A. Sen, “Strong - weak coupling duality in four-dimensional string theory,” Int. J. Mod. Phys. A 9 (1994) 3707 [arXiv:hep-th/9402002]. [19] J. Maharana and J. H. Schwarz, “Noncompact symmetries in string theory,” Nucl. Phys. B 390, 3 (1993) [arXiv:hep-th/9207016]. [20] D. Youm, “Black holes and solitons in string theory,” Phys. Rept. 316, 1 (1999) [arXiv:hep-th/9710046]. [21] E. Bergshoeff, H. J. Boonstra and T. Ortin, “S Duality And Dyonic P-Brane Solutions In Type II String Theory,” Phys. Rev. D 53 (1996) 7206 [arXiv:hep-th/9508091]. [22] K. Behrndt, E. Bergshoeff and B. Janssen, “Type II Duality Symmetries in Six Di- mensions,” Nucl. Phys. B 467 (1996) 100 [arXiv:hep-th/9512152]. [23] L. J. Romans, “Selfduality For Interacting Fields: Covariant Field Equations For Six- Dimensional Chiral Supergravities,” Nucl. Phys. B 276 (1986) 71. [24] S. de Haro, S. N. Solodukhin and K. Skenderis, “Holographic reconstruction of space- time and renormalization in the AdS/CFT correspondence,” Commun. Math. Phys. 217, 595 (2001) [arXiv:hep-th/0002230]. [25] M. Bianchi, D. Z. Freedman and K. Skenderis, “How to go with an RG flow,” JHEP 0108, 041 (2001) [arXiv:hep-th/0105276]. [26] M. Bianchi, D. Z. Freedman and K. Skenderis, “Holographic renormalization,” Nucl. Phys. B 631, 159 (2002) [arXiv:hep-th/0112119]. [27] I. Papadimitriou and K. 
Skenderis, “AdS / CFT correspondence and geometry,” arXiv:hep-th/0404176. [28] I. Papadimitriou and K. Skenderis, “Correlation functions in holographic RG flows,” JHEP 0410, 075 (2004) [arXiv:hep-th/0407071]. [29] K. Skenderis, “Lecture notes on holographic renormalization,” Class. Quant. Grav. 19 (2002) 5849 [arXiv:hep-th/0209067]. [30] S. Deger, A. Kaya, E. Sezgin and P. Sundell, “Spectrum of D = 6, N = 4b supergravity on AdS(3) x S(3),” Nucl. Phys. B 536, 110 (1998) [arXiv:hep-th/9804166]. [31] G. Arutyunov, A. Pankiewicz and S. Theisen, “Cubic couplings in D = 6 N = 4b su- pergravity on AdS(3) x S(3),” Phys. Rev. D 63 (2001) 044024 [arXiv:hep-th/0007061]. [32] J. R. David, G. Mandal and S. R. Wadia, “Microscopic formulation of black holes in string theory,” Phys. Rept. 369, 549 (2002) [arXiv:hep-th/0203048]. [33] L. F. Alday, J. de Boer and I. Messamah, “The gravitational description of coarse grained microstates,” JHEP 0612, 063 (2006) [arXiv:hep-th/0607222]. [34] F. Larsen and E. J. Martinec, “U(1) charges and moduli in the D1-D5 system,” JHEP 9906, 019 (1999) [arXiv:hep-th/9905064]. [35] A. Jevicki, M. Mihailescu and S. Ramgoolam, “Gravity from CFT on S**N(X): Sym- metries and interactions,” Nucl. Phys. B 577, 47 (2000) [arXiv:hep-th/9907144]. [36] E. Bergshoeff, C. M. Hull and T. Ortin, “Duality in the type II superstring effective action,” Nucl. Phys. B 451, 547 (1995) [arXiv:hep-th/9504081]. [37] M. J. Duff, J. T. Liu and R. Minasian, “Eleven-dimensional origin of string / string duality: A one-loop test,” Nucl. Phys. B 452, 261 (1995) [arXiv:hep-th/9506126]. [38] P. K. Townsend, “A New Anomaly Free Chiral Supergravity Theory From Compacti- fication On K3,” Phys. Lett. B 139 (1984) 283. [39] P. S. Aspinwall, “K3 surfaces and string duality,” arXiv:hep-th/9611137. [40] J. D. Brown and M. Henneaux, “Central Charges in the Canonical Realization of Asymptotic Symmetries: An Example from Three-Dimensional Gravity,” Commun. Math. Phys. 104 (1986) 207. [41] M. 
Henningson and K. Skenderis, “The holographic Weyl anomaly,” JHEP 9807, 023 (1998) [arXiv:hep-th/9806087]. [42] K. Skenderis and M. Taylor, “Holographic Coulomb branch vevs,” JHEP 0608, 001 (2006) [arXiv:hep-th/0604169]. [43] J. G. Russo and L. Susskind, “Asymptotic level density in heterotic string theory and rotating black holes,” Nucl. Phys. B 437, 611 (1995) [arXiv:hep-th/9405117]. ABSTRACT We construct general 2-charge D1-D5 horizon-free non-singular solutions of IIB supergravity on T^4 and K3 describing fuzzballs with excitations in the internal manifold; these excitations are characterized by arbitrary curves. The solutions are obtained via dualities from F1-P solutions of heterotic and type IIB on T^4 for the K3 and T^4 cases, respectively. We compute the holographic data encoded in these solutions, and show that the internal excitations are captured by vevs of chiral primaries associated with the middle cohomology of T^4 or K3. We argue that each geometry is dual to a specific superposition of R ground states determined in terms of the Fourier coefficients of the curves defining the supergravity solution. We compute vevs of chiral primaries associated with the middle cohomology and show that they indeed acquire vevs in the superpositions corresponding to fuzzballs with internal excitations, in accordance with the holographic results. We also address the question of whether the fuzzball program can be implemented consistently within supergravity. <|endoftext|><|startoftext|> Introduction While the emergence and learning of human languages has been simulated since decades on computers [1], and while a later economics Nobel laureate also contributed to linguistics long ago [2], the competition between existing languages of adults is a more recent research trend, where physicists have tried to play a major role. 
It follows the principle of survival of the fittest, as known from Darwinian evolution in biology, and indeed many of the techniques have been borrowed from simulational biology [3]. This emphasis from physics on the competition of existing languages for adult humans started with Abrams and Strogatz [4] and was then followed by at least six groups independently [5, 6, 7, 8, 9, 10]. More recently, of course, reviews [11, 3] and conferences brought them together, and others followed them [12, 13, 14].

Today about 7000 different languages (as defined by linguists) are spoken, and about every ten days one of them dies out. On the other hand, the split of Latin into different languages spoken from Portugal to Romania is well documented. In statistical physics, we can describe and explain the pressure which air molecules of a known density and temperature exert on the walls. But we cannot predict where one given molecule will be one second from now. Similarly, the application of statistical physics tools to linguistics may describe the ensemble of the seven thousand presently existing languages, but not the extinction of one given language in one given region on Earth.

http://arxiv.org/abs/0704.0691v1

[Figure 1 plot. Title: "Distribution of language sizes from Grimes, Ethnologue, and 550 exp[-0.05{ln(size/7000)}**2]"; axis label: size s.]
Figure 1: Empirical variation of the number N_s of languages spoken by s people each. For better presentation, the language sizes s are binned in powers of two. Data from Ethnologue [16], as plotted in [17]. The parabola corresponds to a log-normal distribution; we see deviations from it for the smallest sizes [18].

Figure 1 shows how many languages exist today, as a function of the number of speakers of that language. A statistical theory of language competition thus first of all should try to reproduce such results, in order to validate the model. If it fails to describe this fact, why should one trust it at all?
Or as stated by linguist Yang on page 216 of [15]: “It is time for the ancient field of linguistics to join the quantitative world of modern science.”

This review starts with our own model for numerous languages in section 2, followed by a review of the alternative model of Viviane de Oliveira and coworkers [10]. Then we review more briefly the many other models which at present do not allow the simulation of thousands of different languages.

2 Schulze Model

2.1 Definition

Our own simulations, also called the Schulze model, characterise each language (or grammar) by F independent features, each of which can take one of Q different values; the binary case Q = 2 allows the storage in bit-strings. Three basic mechanisms connected with probabilities p, q and r are common to all variants:

i) With probability p at each iteration and for each feature, this feature is changed (or mutated, in biological language). This change is random or not, depending on process ii).

ii) With probability q the mutation/change under i) is not random but instead transfers the value of this feature from another person in the population. This transfer is called diffusion by linguists. With probability 1 − q, the change is random.

iii) With probability (1 − x)^2 r (also (1 − x^2) r has been used instead) somebody discards the mother language and takes over the whole language (all F features) from another person in the population. Here x is the fraction of people speaking the old language. This flight is called shift by linguists.

Several variants are possible: One can use one joint population where everybody can meet everybody for transfer and shift; or we put people onto a square L × L lattice or more complicated network, where diffusion and/or shift are possible only from a randomly selected neighbour. People may migrate on this lattice, which physicists would call diffusion. The population can be fixed, meaning that at every iteration all adults are replaced by their children.
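The three mechanisms i) to iii) can be sketched in a few lines of Python for the simplest variant: one joint population of fixed size, no lattice. This is only a minimal illustration, not the code used for the published results. The function names, the synchronous update (all children are computed from the parent generation) and the default parameter values are assumptions made here; a donor is drawn from the whole population, so a person may occasionally pick himself or herself.

```python
import random
from collections import Counter

def schulze_step(pop, Q, p, q, r, rng):
    """One iteration for a fixed, fully mixed population.

    pop is a list of languages, each a tuple of F feature values in 0..Q-1.
    Assumption of this sketch: children are built from the parent generation.
    """
    n = len(pop)
    counts = Counter(pop)                      # speakers per language
    new_pop = []
    for lang in pop:
        feats = list(lang)
        for k in range(len(feats)):
            if rng.random() < p:               # i) this feature changes
                if rng.random() < q:           # ii) transfer from a random person
                    feats[k] = rng.choice(pop)[k]
                else:                          # random change (may redraw old value)
                    feats[k] = rng.randrange(Q)
        x = counts[lang] / n                   # fraction speaking the old language
        if rng.random() < (1.0 - x) ** 2 * r:  # iii) shift: adopt a whole language
            feats = list(rng.choice(pop))
        new_pop.append(tuple(feats))
    return new_pop

def largest_fraction(pop):
    """Fraction of the population speaking the most widespread language."""
    return max(Counter(pop).values()) / len(pop)

def run(n=200, F=8, Q=2, p=0.001, q=0.9, r=0.9, steps=50, seed=1):
    """Start with everybody speaking one language and iterate."""
    rng = random.Random(seed)
    pop = [(0,) * F for _ in range(n)]
    for _ in range(steps):
        pop = schulze_step(pop, Q, p, q, r, rng)
    return pop

if __name__ == "__main__":
    print("largest language fraction:", largest_fraction(run()))
```

Replacing the factor (1.0 - x) ** 2 by (1.0 - x ** 2) gives the other shift probability mentioned under iii).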
Alternatively, the population can grow by a suitable birth and death process; in this case the shifting probability can also include a factor proportional to the population. If one dislikes having three free parameters p, q, r, one may set q = r = 1 without much loss in results. For the number F of features, values from 8 to 64 were used in simulations. Real languages contain many thousands of words for everyday use, and thus one should identify one feature with an independent grammatical element (like the order of subject, object and verb in a sentence) rather than with a word. F for real languages was estimated as about 30 [19] or about 40 to 50 [15] such choices, and the World Atlas of Language Structures [20] lists 138 features with up to Q = 9 values. These grammar sizes thus correspond roughly to what has been simulated. According to [21] the rate of change in normal linguistic typological features, i.e., excluding a few extraordinarily unstable ones, is 16 % per 1000 years.

2.2 Results

If we start with everybody speaking one language (or with just one Eve), then at low p this language still dominates and is spoken by more than half of the total population, with the remaining people speaking a minor and short-lived variant of this dominating language. At high p, on the other hand, the whole population soon fragments into many languages, roughly such that everybody selects nearly randomly one of the Q^F possible languages. This corresponds to the biblical story of the Tower of Babel. We thus have dominance for small p and fragmentation for large p, with a first-order phase transition or jump at some threshold value which depends on the other parameters and details of the model.

If instead we start with everybody speaking a randomly selected language, then for high p this situation remains. For low p, however, after some time one language by random accidents happens to grow to a sufficiently large size such that it then grows rapidly to be spoken by more than half of the population.
Thus a transition from initial fragmentation to final dominance happens in Fig.2. The threshold value is different from the one for the opposite direction from dominance to fragmentation: we have hysteresis as is common for first-order transitions, Fig.3.

Empirically, this transition to one dominating language was observed on the American continent; within the last five centuries, two thirds of the native Brazilian languages have died out. And in the last half century we observed the rise of English in physics research publications. While 85 years ago, physicist Bose sent his paper from India to Einstein in order to have it translated from English to German (which led to Bose-Einstein condensation), after World War II physics research was usually published in English, first in Japan, since the 1960's in (West) Germany, a decade later in France, since the 1990's in Russia; finally, China has witnessed a surge in physics papers written in English since 2000.

The time needed to go from fragmentation to dominance increases roughly logarithmically with population size, at least in the binary case Q = 2 without lattice. Thus a mathematical solution for an infinite population might never get this transition. In other words, proper models should be agent-based [22], with individuals acting on their own; one should not average over the whole population, using differential equations for the concentrations. Such simulations have been standard in computational physics for half a century (Monte Carlo and Molecular Dynamics), while mean field approximations average over many individuals and can give somewhat or completely wrong results. (The transition from fragmentation to dominance may require a shift probability (1 − x)^2 r instead of (1 − x^2) r.)

[Figure 2 plot. Title: "Size of largest language for 100M, 10M, 1 M, 0.1 M, 10 K, 1 K people, 0.48 mutations per person, q=0.94".]
Figure 2: Variation with time of the number of people speaking the most widespread language, for various population sizes. The larger the population is, the longer is the time until the transition from fragmentation to dominance takes place. Q = 2, F = 8, p = 0.06, q = 0.94; from [11].

The language size distribution to be compared with Fig.1 is shown in Fig.4a. To get it, we looked at non-equilibrium results and introduced random multiplicative noise, since otherwise the language sizes were too small and their distribution too irregular. Fig.4b avoids these tricks and instead places the people on a directed scale-free network, discussed below.

[Figure 3 plot. Title: "Critical mutation rate, no transfer, initially one (top) or initially many random languages (bottom)"; axis label: population in millions.]
Figure 3: Dependence of the mutation threshold for the phase transition on the population size; upper data from dominance to fragmentation, lower data from fragmentation to dominance. Above the curves we arrive at fragmentation, below at dominance. From [11].

No lattice or other spatial structure was employed in the above simulations. On a lattice one can look at language geography [23, 24, 25]. North and South of the Alps, different languages are spoken, and the same separation is made by the English Channel. Genetic and linguistic boundaries in Europe mostly coincide, and about two thirds of them agree with natural boundaries like a mountain chain or sea [26]. We simulated this effect on a lattice [27] with contact only between nearest neighbours and a horizontal barrier separating the upper from the lower half. The shift from a small to a large language happens across the barrier only with a small crossing probability c. For c = 0 one thus has two completely separated halves of the lattice, and trivially the languages which evolve as dominating are different on both sides of the border.
With c = 1 the border has no effect, and only one language dominates. Fig.5 shows how often for small but finite c two separate dominating languages may coexist; quite small values of c already suffice, particularly for large lattices, to unify the two regions into only one with the same language dominating on both sides.

Fig.4b employed a directed Barabási-Albert scale-free network. These networks are grown from a small fully connected core such that each new network member selects m already existing members as teachers. The more people have selected a certain teacher before, the higher is the probability that this teacher will again be selected. Information only flows from the teacher to the person who selected this teacher, not in the opposite direction [31].

[Figure 4 plots. Top panel: "Two runs of 200 million; fragmented start, mutation rate 0.0032 per bit-string, t < 5000 or 6000". Bottom panel: "N = 1 million (+), 3 M (x), 10 M (*), 30 M (sq.)". Axis label: number of speakers.]
Figure 4: Language size distribution N_s. Top: without lattice, with random multiplicative noise, not in equilibrium [17]. Bottom: On scale-free network in equilibrium [30].

[Figure 5 plot. Title: "100 samples, L = 50 (right line), 100 (+,x), 200 (left line)"; axis label: crossing rate.]
Figure 5: Fraction of cases when a semi-permeable barrier allowed two different languages to dominate on its two sides in the Schulze model.

3 Viviane Model

3.1 Definition

The model of Viviane de Oliveira et al [10] has become known as the Viviane model (following the Brazilian tradition of how to call people). It simulates the colonisation of an uninhabited continent by people. Each site j of an L × L square lattice can later be populated by c_j people; this carrying capacity c_j is selected randomly between 1 and some m ∼ 10^2. Initially only one site i is occupied by c_i people.
Then at each time step, one randomly selected empty lattice neighbour j of occupied sites becomes occupied with probability c_j/m by c_j people. Thus after some time the whole lattice becomes occupied and the simulation stops. In contrast to the Schulze model, the Viviane model is a growth process and not one eventually fluctuating about some equilibrium.

Languages have no internal structure and are simply numbered 1, 2, 3, ..., with 1 being the number of the language spoken on the originally occupied site. All people within one lattice site speak the same language. First, if a new site has been colonized the language spoken there is taken from one of the occupied neighbours k, with probability proportional to the fitness F_k of that neighbour site k. This fitness is the total number of people speaking anywhere in the lattice the language of k, except that it is bounded from above by a maximum fitness M_k fixed randomly between 1 and some M_max ∼ 10^3. Second, mutations (language change) are made with probability α/F_j on the freshly occupied site j only, from the selected language of neighbour k. A mutation means that a new language is created which gets a new number not used previously.

In this way, the flight (shift) from small languages and the mutations, which were two separate processes in the Schulze model, are combined into one process; and this process also is a transfer (diffusion) process which in the Schulze model was dealt with separately. Thus we have here only one free parameter α, the mutation coefficient, instead of three parameters p, q, r in the Schulze model.

Variants allow mutations also later, after a site is occupied. Or a language is characterised by a string of F bits (Q = 2 in the Schulze notation) and only different bit-strings count as different languages [28]. Or the capacities c_j are not homogeneously distributed between 1 and m but more often small than large, with a frequency proportional to 1/c_j, as long as it is not larger than the maximum m.
A lot of computer time is then saved if, after the occupation of a new site, one selects two of its occupied neighbours and takes the language from the one with the bigger fitness; if only one neighbour is occupied then its language is taken over.

3.2 Results

In contrast to the Schulze model, the Viviane model gives languages spoken by 10^9 people if a sufficiently large lattice is used. The language size distribution N_s has a maximum at moderately small language sizes s. However, instead of a round parabola as in Fig.1, the log-log plot of N_s versus s gives two straight lines meeting at the maximum, that means one power law for small s (where N_s increases with s) and another power law for large s (where N_s decreases with s).

Crucial progress was made by Paulo Murilo de Oliveira (not the same family as Viviane de Oliveira), who introduced the above-mentioned modifications [28]: Languages (grammars) are characterized by bit-strings (Q = 2) of length F ≃ 13 and count as different only when their bit-strings differ; the carrying capacities c are selected with a probability ∝ 1/c, and the newly colonized site gets the fitter language of two previously occupied neighbours.

[Figure 6 plot. Title: "L = 20,000; 7500 languages; 5940 million people"; axis label: number of speakers.]
Figure 6: Variation of the number N_s of languages spoken by s people each in the Viviane model, as modified and published in [28].

Now the distribution is roughly log-normal, Fig.6, with enhancement for very small sizes; the total population and the total number of languages can be made close to the present reality, the maximum of the parabola in a double-logarithmic plot (with binning by factors of two in s) is near s ≃ 10^4, while the largest language is spoken by 10^9 people, similar to Mandarin Chinese.
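The colonisation rule of section 3.1 (the unmodified Viviane model) can be sketched as follows. This is a small illustration under stated assumptions, not the authors' program: the function and variable names are invented here, the first site is placed at the lattice centre, the maximum fitness is drawn once per language, and the mutation probability α/F_j is evaluated with the capped fitness of the language just inherited. The lattice is kept small so that the sketch runs quickly.

```python
import random

def viviane(L=30, m=100, M_max=1000, alpha=0.3, seed=0):
    """Colonise an L x L lattice; returns (language map, speakers per language)."""
    rng = random.Random(seed)
    cap = [[rng.randint(1, m) for _ in range(L)] for _ in range(L)]  # capacities c_j
    lang = [[0] * L for _ in range(L)]       # 0 means "site still empty"
    speakers, max_fit = {}, {}               # per language: people, fitness cap
    frontier, in_frontier = [], set()        # empty neighbours of occupied sites

    def neighbours(i, j):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < L and 0 <= j + dj < L:
                yield i + di, j + dj

    def fitness(lg):
        # total number of speakers, bounded from above by the maximum fitness
        return min(speakers[lg], max_fit[lg])

    def new_language():
        lg = len(speakers) + 1               # languages are numbered 1, 2, 3, ...
        speakers[lg] = 0
        max_fit[lg] = rng.randint(1, M_max)  # assumption: one cap per language
        return lg

    def occupy(i, j, lg):
        lang[i][j] = lg
        speakers[lg] += cap[i][j]
        for a, b in neighbours(i, j):
            if lang[a][b] == 0 and (a, b) not in in_frontier:
                in_frontier.add((a, b))
                frontier.append((a, b))

    occupy(L // 2, L // 2, new_language())   # assumption: start at the centre
    while frontier:
        idx = rng.randrange(len(frontier))
        i, j = frontier[idx]
        if rng.random() >= cap[i][j] / m:    # occupation succeeds with prob. c_j/m
            continue
        frontier[idx] = frontier[-1]         # remove the site from the frontier
        frontier.pop()
        in_frontier.discard((i, j))
        occ = [lang[a][b] for a, b in neighbours(i, j) if lang[a][b] != 0]
        lg = rng.choices(occ, weights=[fitness(k) for k in occ])[0]
        if rng.random() < alpha / fitness(lg):   # mutation creates a new language
            lg = new_language()
        occupy(i, j, lg)
    return lang, speakers

if __name__ == "__main__":
    lang_map, speakers = viviane()
    print(len(speakers), "languages,", sum(speakers.values()), "people")
```

Binning the values of speakers in powers of two gives the size distribution N_s of such a run; reproducing the published curves would need far larger lattices than this sketch uses.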
Fig.7 shows for both the modified Viviane model and the Schulze lattice model that languages in general are less similar to each other if they are widely separated geographically, in agreement with reality [24, 25]. Note the difference in scales: One lattice constant (distance between nearest neighbours) corresponds to about one kilometer in the Viviane model and 1000 kilometers in the Schulze model, if Fig.7 is compared to reality [24].

Also the classification of different languages into one family, like the Indo-European languages, has been simulated with moderate success. Following the history of the mutations during the colonisation, a language tree can be constructed in the unmodified version (Fig.11.15 in [11]). One can imagine that its root is Latin, splitting up into Romanian, Italian, Spanish and French, with Spanish then splitting into Castellan, Galego and Catalan, and Catalan mutating further into Mallorquin. (Many small branches were omitted for clarity.) More quantitative information is obtained from the modified bit-string version [28]. The mutated language on a newly occupied site starts a new family if it differs in two or more bits from the bit-string characterising the historically first language of the old family.

[Figure 7 plots. Top panel: "Modified Viviane model, line = random differences"; axis label: geographical distance in 1000. Bottom panel: "Schulze model, Q=5, F=9, p=0.5, q=r=0.9, s=0.5"; axis label: geographical distance.]
Figure 7: Language differences (in arbitrary units [24]) as a function of geographical distance in the Viviane model (top) and the Schulze model (bottom). The horizontal line corresponds to completely uncorrelated languages. In the bottom part, + and x correspond to start with fragmentation (+) and with dominance (x). From [24, 30].
The size distribution of language families in Fig.8 agrees in its central part with the empirically observed [29] exponent −0.525 and is independent of the length F = 8, 16, 32 or 64 of the bit-strings for the Viviane model and independent of the population size for the Schulze model.

[Figure 8 plots. Top panel: "100 * L=10,000; 8(+), 16(x), 32(*), 64(sq.) bits"; axis label: number of languages. Bottom panel: "N = 1 million (+), 3 M (x), 10 M(*), 30 M(sq.)"; axis label: number of families.]
Figure 8: Number of families as a function of the number of different languages in this family [30]. Top: Modified Viviane model for various lengths of the bit-strings. Bottom: Schulze model with Q = 5, F = 8 on directed Barabási-Albert scale-free network, p = 0.5, q = 0.59, r = 0.9 for various population sizes.

4 Other Models

Years before physicists invaded en masse the field of linguistics, Nettle [33] already wrote down a differential equation for the number L of languages,

dL/dt = 70/t − L/20,

where time t is measured in millennia. For long times, only one language (mathematically: zero languages) will remain; however that time lies far in the future. A more detailed splitting mechanism was introduced by Novotny and Drozd [34] for the emergence of new languages from one mother language, and gave a log-normal distribution of language sizes, in agreement with reality except presumably at the smallest sizes [18]. In the same spirit of looking at languages as a whole, ignoring the individuals, are the very recent models of Tuncay [14], who coupled a splitting mechanism with random multiplicative noise in the size of the growing population, plus an extinction probability, and found a power-law decay or a log-normal size distribution for the simulated languages, depending on parameters. He also checked the lifetimes of the simulated languages. An “early” attempt to apply the Ising model of statistical physics to linguistics [36] had little success.
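As a numerical aside on Nettle's equation dL/dt = 70/t − L/20 quoted in this section, the following sketch integrates it with a standard fourth-order Runge-Kutta step. The start time t0 = 1 millennium and the initial value L0 = 1 are assumptions made here for illustration only; they are not taken from [33]. Their influence decays like exp(−t/20), so the long-time behaviour does not depend on them. Setting dL/dt ≈ 0 suggests the quasi-steady estimate L ≈ 1400/t, which would put the decay to a single language at roughly t ≈ 1400 millennia.

```python
def nettle_rhs(t, L):
    """Right-hand side of dL/dt = 70/t - L/20 (t in millennia)."""
    return 70.0 / t - L / 20.0

def integrate_nettle(t_end, t0=1.0, L0=1.0, dt=0.05):
    """Fourth-order Runge-Kutta integration from t0 to t_end.

    t0 and L0 are illustrative assumptions, not values from the source.
    """
    t, L = t0, L0
    while t < t_end - 1e-12:
        h = min(dt, t_end - t)               # do not step past t_end
        k1 = nettle_rhs(t, L)
        k2 = nettle_rhs(t + h / 2, L + h * k1 / 2)
        k3 = nettle_rhs(t + h / 2, L + h * k2 / 2)
        k4 = nettle_rhs(t + h, L + h * k3)
        L += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return L

if __name__ == "__main__":
    for t_end in (100.0, 500.0, 1400.0, 2000.0):
        L = integrate_nettle(t_end)
        print(f"t = {t_end:6.0f} millennia: L = {L:.3f}, 1400/t = {1400.0 / t_end:.3f}")
```

The comparison column 1400/t is only the leading long-time term; at earlier times the numerical solution lies somewhat above it.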
Numerous coupled differential equations were studied by scientists coming from theoretical chemistry, mathematics and computer science [35] for the purpose of language learning by children. They have been applied [3] also to the competition of up to 8000 languages of adults, but since the original authors have to our knowledge not followed this re-interpretation of their learning model, we refer to [35, 3] for details and results.

It was the population dynamics of Abrams and Strogatz [4] which started the avalanche of physics papers on language competition. They assume two languages X and Y, spoken by the fractions x and y = 1 − x of a fixed population, with a time dependence

$dx/dt = y x^a s - x y^a (1 - s)$,

with a status or prestige variable s which is close to one if X has a high prestige and close to zero for low prestige of X. The neutral case is s = 1/2. The exponent a = 1.31 was fitted to some empirical data of how minority languages decay in size. If a is replaced by unity we arrive at the logistic equation of Verhulst from the 19th century, which was applied to languages by Shen in 1997.

[Figure 9. Panel title: "Abrams-Strogatz model for a = 1.31 and s = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 (bottom to top); unsymmetric".]

Figure 9: Abrams-Strogatz model: Fraction of people speaking language X, versus time, for an initial concentration of only ten percent and various prestige values s of this language. For s ≥ 0.7 this language finally wins over the other language Y. From [3].

Fig.9 shows the resulting x(t) if X is spoken initially by a minority of ten percent only. Then for low, neutral or slightly higher status s of its language X, the fraction decays further towards zero, but for a higher status like s = 0.7 it finally wins over and is spoken by everybody (not shown).
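The behaviour described for Fig.9 can be reproduced with a few lines of code. This is a minimal sketch, assuming explicit Euler integration of dx/dt = y x^a s − x y^a (1 − s); the step size and time horizon are choices made here for illustration and are not taken from [3, 4].

```python
def abrams_strogatz(s, x0=0.1, a=1.31, t_end=200.0, dt=0.01):
    """Fraction x of speakers of language X after time t_end.

    s  : status (prestige) of language X, between 0 and 1
    x0 : initial fraction of X speakers (ten percent, as in Fig.9)
    dt, t_end : illustrative integration parameters (assumptions)
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        y = 1.0 - x
        x += dt * (y * x**a * s - x * y**a * (1.0 - s))
        x = min(1.0, max(0.0, x))  # keep the fraction inside [0, 1]
    return x


for s in (0.3, 0.5, 0.6, 0.7):
    print(s, abrams_strogatz(s))
```

With x0 = 0.1 the initial derivative is negative for s = 0.6 and positive for s = 0.7, in line with the statement that the minority language wins only for s ≥ 0.7.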
This may correspond to the influence of a colonial power; indeed in France today most people speak French as a result of the Roman conquest of more than two millennia ago, and in Brazil many of the native languages have become extinct in the last five centuries since the Portuguese arrived there.

This Abrams-Strogatz approach was soon generalized to a lattice by Patriarca and Leppänen [4], and later to populations with bilingual speakers [5, 38], coexistence of the two languages [8] as well as simulations based on individuals [13]. Such agent-based simulations were also made by Kosmidis et al. [6], who gave each person a string of 20 bits. The first 10 belonged to one language, the last 10 to another language. In this way they were able to simulate people speaking, more or less correctly, one or two languages. One could also interpret their model as one for English, which is a mixture of German (Anglo-Saxon) and French words, due to the conquest of England by the Normans in 1066.

Finally, Schwämmle [9] also used bit-strings, but to describe biological ageing through the Penna model. The child can learn the language from the mother, the father, or both, thus also allowing for bilinguals. This model made it possible to simulate the fact that languages are learned more easily in youth than in old age. In this way it builds a bridge between language competition and language learning [35].

5 How physics may inform linguistics: prospects for future research

As the research described above has progressed, a larger design has become apparent, which consists of an empirical side looking for quantitative distributions involving languages [29, 24, 37] and a development of models simulating similar quantitative distributions.
The hope is that as more and more quantifi- able relations in and among languages are discovered and simulation models are developed which can adequately replicate these distributions, the simu- lation models will of necessity become more and more adequate as models of actual languages, and could therefore be employed for purposes beyond the ones for which they were designed. For instance, the revised Viviane model, which was designed to capture the distribution of speaker popula- tions and the population of languages within families [30], could potentially be employed for investigating absolute rates of language change, an issue with which linguists are very much concerned [39], inasmuch as knowledge of how fast languages change could provide us with a way to date prehis- toric events involving people speaking given reconstructed languages. Thus, a strand of research where linguists and physicists can and will continue to cooperate is the search for quantifiable distributions on the one hand and the fine-tuning of models which can adequately simulate an increasing range of such distributions. Apart from some exceptions [5, 6, 38], most work on language compe- tition has assumed monolingual speakers. Since most of the world’s pop- ulation is bi- or multilingual this is clearly not adequate. Language shift will normally involve transitional bilingualism, or bilingualism may persist for centuries without the majority language necessarily replacing minority languages. Diglossia, i.e. the use of different languages for different pur- poses may help sustain bilingualism. Current models can be extended to investigate under which conditions bilingualism may persist or get reduced to monolingualism. 
Different kinds of situations can be modelled, such as the replacement of certain, but not all, languages within the domain of the Roman empire, the development of so-called linguistic areas, where several languages share a number of features (e.g., the Balkans, India, Mesoamerica), multilingualism caused by linguistic exogamy (the northwest Amazon region), the shift from one to another lingua franca with retention of minority languages (Mayan immigrants in urban United States shifting from Spanish to English but retaining Mayan languages), etc., and the models may be applied to situations where prehistoric interaction has left linguistic traces but where the nature of the interaction is unknown (e.g., the sharing of linguistic features around the entire coast of the Pacific Ocean). In the Appendix we provide a preview of the extension of the Schulze model to bilingualism.

An area where physicists may wish to try out their hands more is that of language change. Simulations may help linguists come to terms with realities that are accessible through empirical research only in small fragments. Languages develop and change through the interaction of multitudes of agents using large lexical inventories and complex grammars. The kinds of regularities that linguists can identify, such as the regularity of sound changes or directed paths of grammaticalization (roughly, the process whereby separate words become part of the morphology), are mostly accessible only to a retrospective view, through the comparison of language stages dozens or hundreds of years apart. What lies between is a flux whose behavior is not easy to understand. 19th century historical linguists, with their focus on regular sound changes, lived in a universe of clean equations such as Latin p = English f (as in pater = father).
The advent of 20th century sociolinguistics, with its focus on the social mechanisms behind sound changes, complicated the picture, much like the picture gets complicated when one moves from clean Newtonian physics to modern statistical physics, which tries to model the actual behavior and interaction of entities. Nevertheless, unlike physicists, linguists working on the way that languages change have taken little recourse to simulations that might help them understand the complexity of how language change or shift percolates within a community. For instance, a leading sociolinguist has argued that “networks constituted chiefly of strong ties function as a mechanism to support minority languages, resisting institutional pressures to language shift, but when these networks weaken, language shift is likely to take place” (p. 558 in [40]). This hypothesis is based on just a few case studies, and such case studies are, on the one hand, extremely costly and, on the other, cannot even begin to cover the multitude of different situations that actually obtain. In addition to the strength of network ties, other important parameters are presumably the size of the group speaking the minority language, geography, prestige of one as opposed to the other language, economic gain involved in shifting language, age- and gender-determined mobility, and maybe more. The behaviors of such parameters can be investigated in simulations (e.g., for geography see [11, 32]).

Finally, more work needs to be done towards the integration of the modelling of language competition by physicists reviewed here and the modelling of language evolution by computational linguists [41, 42, 43, 44, 45]. While physicists have been adept in modelling the interaction among agents but have operated with languages represented only by numbers or bit-strings, computational linguists offer elaborate grammar models.
With more complex models of the interior structure of languages carried by agents, research need not be limited to a focus on language competition, but could be extended to issues of language structure itself.

6 Appendix: Bilingualism

Several authors studied the possibility that people speak more than one language [5, 6, 38], and we do the same here for the Schulze model on the square lattice, with F = 8 features of Q = 3 different values, using only interactions to nearest neighbours [27]. For this purpose we modify the switch process. Before, languages spoken by a fraction x of the four neighbours were dropped in favour of the language of a randomly selected neighbour with probability $(1 - x)^2 r$ (r = 0.9). Now we do this at lattice site i only if none of the four neighbours of i speaks the mother language of site i: x = 0; then with probability r we replace the mother language of i by the mother language of a randomly selected neighbour. Otherwise, for x > 0, with the above probability $(1 - x)^2 r$, site i learns as an additional “foreign” language a randomly selected (foreign or mother) language of a randomly selected neighbour. If, in the latter case x > 0, site i has already learned a foreign language before, then this old foreign language is replaced by the new foreign language.

[Figure 10. Panel a: "Leading mother(+,x) and foreign(*,sq.) languages". Panel b: "Leaders with (+,x) and without (*,sq.) bilinguals".]

Figure 10: Largest and second-largest languages in the Schulze lattice at p = 0.01, q = r = 0.9 in an 8000 × 8000 lattice without migration. Part a includes bilinguals, part b compares only the mother languages with and without bilinguals.
[Figure 11. Panel a: "As 10a but with forgetting and migration d=0.01". Panel b: "100 x 100 migration: d = 1 (left) to 0.0001 (right)".]

Figure 11: Part a: As Fig.10a, but including a migration probability d of 1 % and a site-dependent forgetting probability between zero and 5 %, on a 6000 × 6000 lattice. Part b shows the drastic speed-up of dominance if the migration probability d is enhanced: d = 1, 0.5, 0.2, 0.1, . . . , 0.0005, 0.0002, 0.0001.

These are the learning and replacement events if everybody can speak at most two languages. If instead the number of languages for each person is restricted by an overall upper limit, then for x > 0 the last-learned foreign language is replaced by the newly selected neighbour language. If this upper limit is set equal to one, we go back to the model of monolingual speakers. In all cases, x is the fraction of neighbours of i speaking as their mother tongue the mother language of site i.

For language diffusion we took q = 0.9 throughout, and for language change mostly p = 0.01. Thus one iteration may correspond to about one human generation. We start with a fragmented distribution of mother languages and no foreign languages, except that one particular language is spoken initially by ten percent of the people, randomly spread over the lattice. Then we check if this “lingua franca” finally (after at most $10^5$ iterations) is spoken by about everybody: transition from fragmentation to dominance of the initially favoured language. (If another language dominated we count this case as fragmentation.)

For ten 50 × 50 lattices, this transition happened up to p = 0.04 if bilinguality is allowed, while for monolinguals it happens up to the larger rate of change p = 0.09: bilinguality makes dominance of one language less stable against continuous changes; see also [38]. For Q = 5 instead of 3 this limit shifts from 0.04 to 0.05, while for Q = 3, F = 16 it is about 0.03.
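The modified switch process for speakers of at most two languages can be sketched for a single lattice site as follows. The dictionary layout, the function name and the handling of a neighbour language that coincides with the site's own mother language (ignored here) are assumptions made for illustration; the text does not specify them, and this is not the authors' program.

```python
import random


def switch_step(site, neighbours, r=0.9, rng=random):
    """Apply one switch/learning event to `site` (modified in place).

    site, neighbours[k]: dicts {"mother": language, "foreign": language or None}
    (hypothetical data layout). x is the fraction of neighbours whose mother
    tongue equals the mother language of the site.
    """
    x = sum(n["mother"] == site["mother"] for n in neighbours) / len(neighbours)
    if x == 0:
        # No neighbour shares the mother language: replace it with probability r.
        if rng.random() < r:
            site["mother"] = rng.choice(neighbours)["mother"]
    elif rng.random() < (1.0 - x) ** 2 * r:
        # Learn a randomly selected (mother or foreign) language of a random neighbour.
        donor = rng.choice(neighbours)
        known = [donor["mother"]]
        if donor["foreign"] is not None:
            known.append(donor["foreign"])
        new_language = rng.choice(known)
        # Assumption: learning one's own mother tongue as "foreign" is ignored.
        if new_language != site["mother"]:
            site["foreign"] = new_language  # replaces any old foreign language
    return site


site = {"mother": "A", "foreign": None}
neighbours = [{"mother": "A", "foreign": None}, {"mother": "B", "foreign": "C"},
              {"mother": "B", "foreign": None}, {"mother": "A", "foreign": "B"}]
print(switch_step(site, neighbours))
```

Languages are plain labels here; in the Schulze model each would be a list of F features with Q possible values, which does not change the logic of the step.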
Fig.10 shows for 8000 × 8000 lattices the time dependence of the fraction of people speaking the largest and the second-largest language, separately for mother tongue and foreign languages; the comparison for the case without bilinguals is restricted to the mother language and shows that dominance is reached faster without bilinguals.

In these simulations, after a short time everybody speaks two languages, and if up to ten languages are allowed, then again after a short time everybody speaks ten languages. This is nice but unrealistic. In order to take into account that people forget again foreign languages which were learned but not used, or give up learning a foreign language when the need for it dissipates, we assume that at every time step each speaker (more precisely, each lattice site) may give up the last-learned foreign language if none of the neighbours at that time speaks this language. This forgetting happens with a probability between zero and five percent, fixed for each site randomly and independently at the beginning.

In addition we included migration via exchanging locations: a speaker or family exchanges residence with a randomly selected neighbour, and both carry their languages with them. This happens at each iteration with a probability d; physicists call d the diffusion constant. Fig.11 shows that appreciable migration can drastically speed up the growth of the lingua franca from having an initial advantage of being spoken by ten percent of the population to being dominant.

References

[1] http://www.isrl.uiuc.edu/amag/langev/; W.S.Y. Wang, J.W. Minett, Trans. Philological Soc. 103, 121 (2005).
[2] R. Selten and J. Pool, p. 64 in: R. Selten (ed.), Game Equilibrium Models IV, Berlin-Heidelberg: Springer 1992.
[3] D. Stauffer, S. Moss de Oliveira, P.M.C. de Oliveira, J.S. Sá Martins, Biology, Sociology, Geology by Computational Physicists. Amsterdam: Elsevier 2006.
[4] D. Abrams and S.H. Strogatz, Nature 424, 900 (2003); M. Patriarca and T.
Leppänen, Physica A 338, 296 (2004). [5] J. Mira, J., A. Paredes, Europhys. Lett. 69 (2005) 1031. [6] K. Kosmidis, J.M. Halley, P. Argyrakis, Physica A, 353 (2005) 595; K. Kosmidis, A. Kalampokis, P. Argyrakis, Physica A 366, 495 and 320, 808 (2006). [7] C. Schulze, D. Stauffer, Int. J. Mod. Phys. C 16, 781 and Physics of Life Reviews 2, 89 (2005). [8] J.P. Pinasco, L. Romanelli, Physica A 361, 355 (2006). [9] V. Schwämmle, Int. J. Mod. Phys. C 16, 1519 (2005) and 17, 103 (2006). [10] V.M. de Oliveira, M.A.F. Gomes and I.R. Tsang, Physica A 361, 361 and 368, 257 (2006). [11] C. Schulze and D. Stauffer, p.311 in: B.K. Chakrabarti, A. Chakraborti, and A. Chatterjee (Hgg.), Econophysics and Sociophysics: Trends and Perspectives. Weinheim: Wiley-VCH 2006. For shorter reviews see http://www.isrl.uiuc.edu/amag/langev/ AIP Conference Proceedings 779 (2005) 49; Comput. Sci. Engin. 8 (May/June 2006) 86. [12] T. Tȩsileanu, H. Meyer-Ortmanns, Int. J. Mod. Phys. C 17, 259 (2006). [13] D. Stauffer, X. Castelló, V.M. Egúıluz, M. San Miguel, Physica A 274, 835 (2007); X. Castelló, V.M. Egúıluz, M. San Miguel, New J. Phys. 8, article 308 (2006). [14] Ç. Tuncay, Int. J. Mod. Phys. C 18, No. 5 (2007) , e-prints physics 0610110, 0612137, 0703144 on arXiv.org [15] C. Yang, The Infinite Gift - How Children Learn and Unlearn the Lan- guages of the World, Scribner, New York 2006. [16] B.F. Grimes, Ethnologue: Languages of the World (14th edn. 2000). Dallas, TX: Summer Institute of Linguistics; and www.ethnologue.org [17] D. Stauffer, C. Schulze, F.W.S. Lima, S. Wichmann and S. Solomon, Physica A 371, 719 (2006). [18] W.J. Sutherland, Nature 423, 276 (2003). [19] E.J. Briscoe, Language 76, 245 (2000). [20] M. Haspelmath, M. Dryer, D. Gil, and B. Comrie (eds.). The World Atlas of Language Structures. Oxford: Oxford University Press 2005. [21] S. Wichmann and E.W. Holman. 2007ms. Assessing Temporal Stability for Linguistic Typological Features. 
Manuscript to be submitted [22] F.C.Billari, T. Fent, A. Prskawetz and J. Scheffran (Hgg.) Agent-based computational modelling, Heidelberg: Physica-Verlag 2006. [23] Goebl, H., Mitt.Österr. Geogr. Ges. 146 (2004) 247 (in German). [24] E.W. Holman, C. Schulze, D. Stauffer und S. Wichmann, conditionally accepted by Linguistic Typology. [25] L.L. Cavalli-Sforza and W.S.Y. Wang, Language 62, 38 (1986). [26] G. Barbujani and R.R. Sokal, Proc. Natl. Acad. USA 87, 1816 (1990). http://arxiv.org/abs/physics/0610110 [27] C. Schulze and D. Stauffer, Physica A, in press (2007). [28] P.M.C. de Oliveira, D. Stauffer, F.W.S. Lima, A.O. Sousa, C. Schulze and S. Moss de Oliveira, Physica A 376, 609 (2007). [29] S. Wichmann, J. Linguistics 41, 117 (2005). [30] D. Stauffer, C. Schulze and S. Wichmann, in: Beiträge zur Experimen- talphysik, Didaktik und computergestützten Physik, S. Kolling (ed.): Logos-Verlag, Berlin 2007 (Festschrift Patt) [31] R. Albert and L.A. Barabási, Rev. Mod. Phys. 74, 47 (2002). [32] C. Schulze and D. Stauffer, Adv. Complex Syst. 9, 183 (2006). [33] D. Nettle, Lingua 108, 95 (1999) and Proc. Natl. Acad. Sci. USA 96, 3325 (1999). [34] V. Novotny and P. Drozd, Proc. Roy. Soc. London B267, 947 (2000). [35] M.A. Nowak, N.L. Komarova and P. Niyogi, Nature 417, 611 (2000); N.L. Komarova, J. Theor. Biology 230, 227 (2004). [36] N. Prévost, The physics of language: toward a phase Transition of lan- guage change. PhD Thesis, Simon Fraser University, Vancouver 2003. [37] S.Wichmann and E.W. Holman. Submitted. Pairwise Comparisons of Typological Profiles. For the proceedings of the conference Rara & Raris- sima – Collecting and interpreting unusual characteristics of human lan- guages, Leipzig (Germany), 29 March - 1 April 2006; e-print 07040071 at arXiv.org [38] X. Castelló, V.M. Egúıluz, and M. San Miguel, New J.Phys. 8, 308 (2006) [39] J. Nichols, Diversity and stability in languages. In: B.D. Joseph, and R. D. 
Janda (eds.), The Handbook of Historical Linguistics, pp. 283-310. Malden/Oxford/Melbourne/Berlin: Blackwell Publishing 2003.
[40] L. Milroy, Social networks. In: J.K. Chambers, P. Trudgill and N. Schilling-Estes (eds.), The Handbook of Language Variation and Change. Blackwell 2002.
[41] A. Cangelosi and D. Parisi (eds.), Simulating the Evolution of Language. Berlin: Springer-Verlag 2002.
[42] E.J. Briscoe (ed.), Linguistic Evolution through Language Acquisition: Formal and Computational Models. Cambridge: Cambridge University Press 2002.
[43] M. Christiansen and S. Kirby (eds.), Language Evolution. Oxford: Oxford University Press 2003.
[44] B. de Boer, Computer modelling as a tool for understanding language evolution. In: N. Gontier, J. P. van Bendegem and D. Aerts (eds.), Evolutionary Epistemology, Language and Culture: A Non-Adaptationist Systems Theoretical Approach, 381-406. Dordrecht: Springer 2006.
[45] P. Niyogi, The Computational Nature of Language Learning and Evolution. Cambridge & London: The MIT Press 2006.

ABSTRACT

Simulations of physicists for the competition between adult languages since 2003 are reviewed. How many languages are spoken by how many people? How many languages are contained in various language families? How do language similarities decay with geographical distance, and what effects do natural boundaries have? New simulations of bilinguality are given in an appendix.
<|endoftext|><|startoftext|> Introduction A consistent theoretical model of the high critical temperature superconductivity in cuprates is to be able to accommodate both the normal and superconducting states under incorporation of the essential features of these systems (see, e.g., [1] for a review): strong antiferromagnetic (AFM) superexchange interaction inside the CuO2 planes, occurrence of two relatively isolated energy bands around the Fermi level, able to develop dx2−y2 pairing: one stemming from single particle copper dx2−y2 states and the second one from singlet doubly occupied states generated [2] by crystal field interaction; hopping conduction for an extremely low density of the free charge carriers. The p-d model [3], while incorporating all these features, is too cumbersome and cell-cluster perturbation theory [4, 5] providing a hierarchy of the various interaction terms was used to derive simpler models from it. Extreme limit cases of this reduction procedure are various effective one-band t-J models (see, e.g., [6, 7] and references therein) which, while unveiling the role played by the AFM exchange interaction in the occurrence of the d-wave pairing, address exclusively the superconducting state. The reduction of the p-d model to an effective two-band Hubbard model considered by Plakida et al. [8], corroborated with the use of the equation of motion technique for thermodynamic Green functions (GF) [9], provided the simplest approach to the description of both the normal [8, 10] and the superconducting states [11, 12, 13] within a frame securing rigorous fulfilment of the Pauli exclusion principle for fermionic states. The Green function technique rests on the Hubbard operator algebra. 
Its rigorous implementation onto a system characterized by specific symmetry properties (translation invariant two-dimensional spin lattice, spin reversal invariance of the observables) results either in characteristic invariance properties of several correlation functions, or in the occurrence of some exactly vanishing correlation functions. The use of these results allows rigorous derivation and simplification of the expressions of the frequency matrix and of the generalized mean field approximation (GMFA) Green functions of the model. The obtained expressions contain higher order boson-boson correlation functions (CFs). For the CFs involving singlets (normal singlet hopping CFs and anomalous exchange pairing CFs), an approximation procedure which avoids the usual decoupling schemes and, yet, secures the correlation order reduction to GMFA-GF expressions, under the identification and elimination of exponentially small quantities, is described. The organization of the paper is as follows. Sec. 2 summarizes essentials of the two-band Hubbard model and GMFA-GF equations. Sec. 3 describes the invariance properties following from the translation invariance of the underlying spin lattice. Sec. 4 derives invariance properties and constraints following from the invariance of the macroscopic properties of the system under spin reversal. On the basis of the results of Sec. 3 and 4, rigorous derivation of the frequency matrix in the (r, ω)-representation is done in Sec. 5. The derivation of GMFA-GF expressions for the boson-boson correlation functions involving singlets is discussed in Sec. 6. Collecting together the results of sections 5 and 6, expressions of the frequency Mean field Green functions of Hubbard model of superconductivity 3 matrix and of the GMFA Green function matrix are derived in the (q, ω)-representation in sections 7 and 8 respectively. 
These results explicitly incorporate both hole-doping and electron-doping features of the cuprate systems through the singlet hopping and superconducting pairing terms. The paper ends with conclusions in section 9.

2. Mean field approximation

The Hamiltonian of the effective two-band singlet-hole Hubbard model [8] is written in the form

$$H = E_1 \sum_{i,\sigma} X_i^{\sigma\sigma} + E_2 \sum_i X_i^{22} + \sum_{i,\sigma} \left[ \mathcal{K}_{11}\, \tau_{1,i}^{\sigma 0,0\sigma} + \mathcal{K}_{22}\, \tau_{1,i}^{2\sigma,\sigma 2} + \mathcal{K}_{21} \cdot 2\sigma \left( \tau_{1,i}^{2\bar\sigma,0\sigma} + \tau_{1,i}^{\sigma 0,\bar\sigma 2} \right) \right] \qquad (1)$$

The summation label i runs over the sites of an infinite two-dimensional (2D) square array the lattice constants of which, $a_x = a_y$, are defined by the underlying single crystal structure. The spin projection values in the sums over σ are σ = ±1/2, σ̄ = −σ. The Hubbard operators (HOs) $X_i^{\alpha\beta} = |i\alpha\rangle\langle i\beta|$ are defined for the four states of the model at each lattice site i: |0〉 (vacuum), |σ〉 = |↑〉 and |σ̄〉 = |↓〉 (single particle spin states inside the hole subband), and |2〉 = |↑↓〉 (singlet state in the singlet subband). The multiplication rule $X_i^{\alpha\beta} X_i^{\gamma\eta} = \delta_{\beta\gamma} X_i^{\alpha\eta}$ holds. The HOs may be fermionic (single spin state creation/annihilation in a subband) or bosonic (singlet creation/annihilation, spin or charge densities, particle numbers). For a pair of fermionic HOs, the anticommutator rule $\{X_i^{\alpha\beta}, X_j^{\gamma\eta}\} = \delta_{ij}(\delta_{\beta\gamma} X_i^{\alpha\eta} + \delta_{\eta\alpha} X_i^{\gamma\beta})$ holds whereas, if one or both HOs are bosonic, the commutation rule $[X_i^{\alpha\beta}, X_j^{\gamma\eta}] = \delta_{ij}(\delta_{\beta\gamma} X_i^{\alpha\eta} - \delta_{\eta\alpha} X_i^{\gamma\beta})$ holds. At each lattice site i, the constraint of no double occupancy of any quantum state |iα〉 is rigorously fulfilled due to the completeness relation $X_i^{00} + \sum_\sigma X_i^{\sigma\sigma} + X_i^{22} = 1$.

In (1), $E_1 = \tilde\varepsilon_d - \mu$ denotes the hole subband energy for the renormalized energy $\tilde\varepsilon_d$ of a d-hole and the chemical potential µ. The energy parameter of the singlet subband is $E_2 = 2E_1 + \Delta$, where $\Delta \approx \Delta_{pd} = \varepsilon_p - \varepsilon_d$ is an effective Coulomb energy $U_{\mathrm{eff}}$ corresponding to the difference between the two energy levels of the model. In the description of the hopping processes, the label 1 points to the hole subband and 2 to the singlet subband.
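The single-site part of the Hubbard operator algebra stated above can be checked directly with 4 × 4 matrices. The sketch below is an illustration added here, with the state labels and helper names chosen as assumptions; it represents $X^{\alpha\beta} = |\alpha\rangle\langle\beta|$ on one site only and therefore does not capture the sign conventions between fermionic operators on different sites.

```python
STATES = ("0", "up", "down", "2")  # vacuum, spin up, spin down, singlet
DIM = len(STATES)
ZERO = [[0] * DIM for _ in range(DIM)]


def hubbard(alpha, beta):
    """Matrix of the single-site Hubbard operator X^{alpha beta} = |alpha><beta|."""
    a, b = STATES.index(alpha), STATES.index(beta)
    return [[1 if (row == a and col == b) else 0 for col in range(DIM)]
            for row in range(DIM)]


def matmul(A, B):
    """Plain matrix product of two DIM x DIM nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(DIM)) for c in range(DIM)]
            for r in range(DIM)]


def multiplication_rule_holds():
    """Check X^{ab} X^{cd} = delta_{bc} X^{ad} for all label combinations."""
    for a in STATES:
        for b in STATES:
            for c in STATES:
                for d in STATES:
                    expected = hubbard(a, d) if b == c else ZERO
                    if matmul(hubbard(a, b), hubbard(c, d)) != expected:
                        return False
    return True


def completeness_holds():
    """Check that the diagonal operators X^{aa} sum to the identity."""
    total = [[sum(hubbard(s, s)[r][c] for s in STATES) for c in range(DIM)]
             for r in range(DIM)]
    identity = [[1 if r == c else 0 for c in range(DIM)] for r in range(DIM)]
    return total == identity


print(multiplication_rule_holds(), completeness_holds())
```

The on-site commutator and anticommutator rules follow from the multiplication rule by adding or subtracting the two operator orderings.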
The hopping energy parameter $\mathcal{K}_{ab} = 2 t_{pd} K_{ab}$ depends on $t_{pd}$, the hopping p-d integral, and on energy band dependent form factors, $K_{ab}$. Inband ($\mathcal{K}_{11}$, $\mathcal{K}_{22}$) and interband ($\mathcal{K}_{21} = \mathcal{K}_{12}$) processes are present. The Hubbard 1-forms

$$\tau_{1,i}^{\alpha\beta,\gamma\eta} = \sum_{m \neq i} \nu_{im} X_i^{\alpha\beta} X_m^{\gamma\eta} \qquad (2)$$

incorporate the overall effects of specific hopping processes (through the labels (αβ, γη) of the pair of Hubbard operators) involving the lattice site i and its neighbouring sites. Up to three coordination spheres around the reference site i do contribute [4, 5] to the sum (2), each being characterized by a small specific value of the overlap coefficients $\nu_{ij}$ ($\nu_1$ for the nearest neighbour (nn), $\nu_2$ for the next nearest neighbour (nnn), $\nu_3$ for the third coordination spheres).

The quasi-particle spectrum and superconducting pairing for the Hamiltonian (1) are obtained [11, 12] from the two-time 4 × 4 GF matrix (in Zubarev notation [9])

$$\tilde G_{ij\sigma}(t - t') = \langle\langle \hat X_{i\sigma}(t) \,|\, \hat X^{\dagger}_{j\sigma}(t') \rangle\rangle = -i\theta(t - t') \langle \{ \hat X_{i\sigma}(t), \hat X^{\dagger}_{j\sigma}(t') \} \rangle, \qquad (3)$$

where 〈· · ·〉 denotes the statistical average over the Gibbs grand canonical ensemble. The GF (3) is defined for the four-component Nambu column operator

$$\hat X_{i\sigma} = (X_i^{\sigma 2} \;\; X_i^{0\bar\sigma} \;\; X_i^{2\bar\sigma} \;\; X_i^{\sigma 0})^{\top} \qquad (4)$$

where the superscript ⊤ denotes the transposition. In (3), $\hat X^{\dagger}_{j\sigma} = (X_j^{2\sigma} \;\; X_j^{\bar\sigma 0} \;\; X_j^{\bar\sigma 2} \;\; X_j^{0\sigma})$ is the adjoint operator of $\hat X_{j\sigma}$.

The GF matrix in (r, ω)-representation is related to the expression (3) of the GF matrix in (r, t)-representation by the non-unitary Fourier transform,

$$\tilde G_{ij\sigma}(t - t') = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \tilde G_{ij\sigma}(\omega)\, e^{-i\omega(t - t')}\, d\omega . \qquad (5)$$

The energy spectrum of the translation invariant spin lattice of (1) is solved in the reciprocal space. The GF matrix in this (q, ω)-representation is related to the GF matrix in (r, ω)-representation by the non-unitary discrete Fourier transform

$$\tilde G_{ij\sigma}(\omega) = \frac{1}{N} \sum_{\mathbf q} e^{-i\mathbf q \cdot (\mathbf r_j - \mathbf r_i)}\, \tilde G_{\sigma}(\mathbf q, \omega). \qquad (6)$$

For an elemental GF of labels (αβ, γη), we use the notation $\langle\langle X_i^{\alpha\beta}(t) | X_j^{\gamma\eta}(t') \rangle\rangle$ in the (r, t)-representation and, similarly, $\langle\langle X_i^{\alpha\beta} | X_j^{\gamma\eta} \rangle\rangle_{\omega}$ (assuming Hubbard operators at t = 0), in the (r, ω)-representation.
In the (q, ω)-representation, it is convenient to use the notation Gαβ,γη(q, ω). We shall consider henceforth the GMFA-GF, G̃0σ(q, ω). Its derivation involves: (i) Differentiation of the GF (3) with respect to t and use of the equations of motion for the Heisenberg operators X i (t). (ii) Derivation of an algebraic equation for G̃ijσ(ω), Eq. (5). (iii) Elimination of the contribution of the inelastic processes to the commutator Ẑiσ = [X̂iσ, H ] entering the equation of motion of G̃ijσ(ω). (iv) Transformation to (q, ω)-representation of the obtained equation of G̃0ijσ(ω) by means of the Fourier transform (6). This finally yields G̃0σ(q, ω) = χ̃ χ̃ω − Ãσ(q) χ̃ , (7) χ̃ = 〈{X̂iσ, X̂ iσ}〉, (8) Ãσ(q) = eiq (rj−ri) Ãijσ, rij = rj − ri , (9) Ãijσ = 〈{[X̂iσ, H ], X̂ jσ}〉 . (10) The matrix Ãijσ is Hermitian. Mean field Green functions of Hubbard model of superconductivity 5 3. Translation invariance of the spin lattice Four consequences follow from the translation invariance of the spin lattice. • The definition of the Hubbard 1-form (2) over a translation invariant spin lattice results in the identity (which secures the hermiticity of the Hamiltonian H): αβ,γη 1,i = −τ γη,αβ 1,i . (11) • The Green function (3) of the model Hamiltonian (1) depends only on the distance rij = |rj − ri| between the position vectors at the lattice sites i and j [9]. • The one-site statistical averages are independent on the site label i, 〈X i 〉 = 〈X (∀ i, j). For this reason, the site label in the one-site averages will be omitted. • The two-site statistical averages 〈X j 〉 remain invariant under the interchange of the site labels i and j, j 〉 = 〈X i 〉, i 6= j (12) 4. Spin reversal invariance The energy spectrum of the system described by the Hamiltonian (1) does not depend on the specific values σ = ±1/2 of the spin projection. 
As a consequence, the definition of the GF (3) either in terms of the σ-Nambu operator (4) or the σ̄-Nambu operator X̂iσ̄ = (X ⊤ (13) has to result in mathematically equivalent descriptions of the observables. This means, however, that the mathematical structures of the frequency matrices Ãijσ, Eq. (10), and Ãijσ̄ = 〈{[X̂iσ̄, H ], X̂ jσ̄}〉 emerging from the σ̄-Nambu operator (13), have to be related to each other. The identification of the existing relationships is constructive: we calculate and compare the corresponding matrix elements of Ãijσ and Ãijσ̄. The multiplication rules and the commutation/anticommutation relations satisfied by the Hubbard operators result in the following general expression of the elemental anticommutators entering their definitions: i , H ], X j } = δijC λµ,νϕ i + (1− δij)νijT λµ,νϕ ij , (14) with one-site contributions given by λµ,νϕ i = δνµ δµσ)E1 + δµ2E2 i +K11τ 0ϕ,σ0 1,i −K22τ 2ϕ,σ2 1,i +K21 ·2σ(τ 2ϕ,0σ̄ 1,i +τ 0ϕ,2σ̄ 1,i ) +δλ2(−E2X i +K22 σϕ,2σ 1,i +K21 σ̄ϕ,σ0 1,i )− −δλ0(K11 σϕ,0σ 1,i +K21 σϕ,σ̄2 1,i ) Mean field Green functions of Hubbard model of superconductivity 6 δλσ)E1 + δλ2E2 i +K11τ ν0,0σ 1,i −K22τ ν2,2σ 1,i +K21 · 2σ(τ ν2,σ̄0 1,i +τ ν0,σ̄2 1,i ) +δµ2(E2X i +K22 νσ,σ2 1,i +K21 νσ̄,0σ 1,i )− −δµ0(K11 νσ,σ0 1,i +K21 νσ,2σ̄ 1,i ) δϕ0(K11τ νµ,σ0 1,i + 2σK21τ νµ,2σ̄ 1,i )− δϕ2(K22τ νµ,σ2 1,i − 2σK21τ νµ,0σ̄ 1,i ) δλ0(K11τ νµ,0σ 1,i + 2σK21τ νµ,σ̄2 1,i )− δλ2(K22τ νµ,2σ 1,i − 2σK21τ νµ,σ̄0 1,i ) δν0(K11τ λϕ,0σ 1,i + 2σK21τ λϕ,σ̄2 1,i )− δν2(K22τ λϕ,2σ 1,i − 2σK21τ λϕ,σ̄0 1,i ) δµ0(K11τ λϕ,σ0 1,i + 2σK21τ λϕ,2σ̄ 1,i )− δµ2(K22τ λϕ,σ2 1,i − 2σK21τ λϕ,0σ̄ 1,i ) and two-site contributions given by λµ,νϕ ij = δνµ δµσ)(K11X j −K22X j )+(−δµ0K11+δµ2K22) Xλσi X δλσ)(−K11X j +K22X j )+(δλ0K11−δλ2K22) δν0K11X j − δν2K22X +K21 ·2σ j +δϕ2X j +δν,−λ(X δϕ0K11X j − δϕ2K22X +K21 ·2σ j + δν2X j + δϕ,−µ(X δλ0K11X j −δλ2K22X j +K21 ·2σ(δµ0X j +δµ2X δµ0K11X j −δµ2K22X j +K21 ·2σ(δλ0X j +δλ2X 2σ(δλ0δν2X j −δλ2δν0X j −δµ0δϕ2X j +δµ2δϕ0X The 
comparison of the results obtained from (14) for the corresponding matrix elements of Ãijσ and Ãijσ̄ and the use of the translation invariance properties (11) and (12) result in four distinct kinds of relationships: • Under the spin reversal σ → σ̄, the following invariance properties hold for the normal one-site statistical averages: 〈Xσσi 〉 = 〈X i 〉 (15) Mean field Green functions of Hubbard model of superconductivity 7 σ2,2σ 1,i 〉 = 〈τ σ̄2,2σ̄ 1,i 〉, 〈τ 0σ̄,σ̄0 1,i 〉 = 〈τ 0σ,σ0 1,i 〉 (16) σ2,σ̄0 1,i 〉 = 2σ̄〈τ σ̄2,σ0 1,i 〉 (17) • The identity 〈C σ2,0σ 0σ̄,σ̄2 i 〉 = 0 holds, therefrom we get for the one-site anomalous averages, 〈X02i 〉 = 0 (18) 0σ̄,σ̄2 1,i 〉 = −〈τ 0σ,σ2 1,i 〉 (19) 0σ̄,0σ 1,i 〉 = 〈τ σ2,σ̄2 1,i 〉 (20) The first two equations imply that the contributions of the one-site terms 〈X02i 〉 0σ̄,σ̄2 1,i 〉 to the superconducting pairing vanish identically irrespective of the model details (like, e.g., the relationship between the lattice constants ax and ay). For a rectangular spin lattice (ax 6= ay), Eq. (20) points to the occurrence of a small non-vanishing one-site contribution to the superconducting pairing originating equally in both energy subbands. However, over the square spin lattice (1) (ax = ay), each term of (20) vanishes for d-wave pairing due to the symmetry in the reciprocal space [12]. • Under the spin reversal σ → σ̄, the following invariance properties hold for the two-site statistical averages: 〈Xσσi X j 〉 = 〈X j 〉, 〈X j 〉 = 〈X j 〉 (21) 〈X22i X j 〉 = 〈X j 〉, 〈X j 〉 = 〈X j 〉 (22) 〈X02i X j 〉 = 〈X j 〉. (23) • The operator of the number of particles at site i within the singlet subband, Ni, is the sum of spin σ and σ̄ components, Ni = niσ + niσ̄, niσ = X i , niσ̄ = X i . (24) Similar relationships hold for the number of particles at site i within the hole subband, Nhi , Nhi = n iσ + n iσ̄, n iσ = X i , n iσ̄ = X i . (25) Due to the completeness relation, Ni +N i = 2, niσ + n iσ = niσ̄ + n iσ̄ = 1. 
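(26)

These equalities simply reflect the fact that, at a given lattice site $i$, there is a single spin state of predefined spin projection, whereas the total number of spin states equals two. Therefore, the operator $N_i$, Eq. (24), provides a unique characterization of the occupied states within the model [8, 12, 10].

5. Frequency matrix in (r, ω)-representation

A straightforward consequence of the results established in section 4 is the simplest general expression of the frequency matrix $\tilde{A}_{ij\sigma}$, Eq. (10):

$$ \tilde{A}_{ij\sigma} = \delta_{ij} \begin{pmatrix} \hat{c}_{\sigma} & \hat{0} \\ \hat{0} & -(\hat{c}_{\bar\sigma})^{\top} \end{pmatrix} + (1-\delta_{ij}) \begin{pmatrix} \hat{D}_{ij\sigma} & \hat{\Delta}_{ij\sigma} \\ (\hat{\Delta}_{ij\sigma})^{\dagger} & -(\hat{D}_{ij\bar\sigma})^{\top} \end{pmatrix}. \tag{27} $$

The one-site $2\times 2$ matrix $\hat{c}_{\sigma}$ is Hermitian; its elements do not depend on the particular lattice site $i$,

$$ \hat{c}_{\sigma} = \begin{pmatrix} (E_1+\Delta)\chi_2 + a_{22} & 2\sigma a_{21} \\ 2\sigma a_{21}^{*} & E_1\chi_1 + a_{22} \end{pmatrix}, \tag{28} $$

and are expressed in terms of the spin reversal invariant quantities

$$ \chi_2 = \langle n_{i\sigma}\rangle = \langle n_{i\bar\sigma}\rangle \tag{29} $$
$$ \chi_1 = \langle n^{h}_{i\sigma}\rangle = \langle n^{h}_{i\bar\sigma}\rangle = 1-\chi_2 \tag{30} $$
$$ a_{22} = K_{11}\langle\tau_{1}^{0\bar\sigma,\bar\sigma 0}\rangle - K_{22}\langle\tau_{1}^{\sigma 2,2\sigma}\rangle \tag{31} $$
$$ a_{21} = (K_{11}-K_{22})\cdot 2\sigma\langle\tau_{1}^{\sigma 2,\bar\sigma 0}\rangle + K_{21}\left(\langle\tau_{1}^{0\bar\sigma,\bar\sigma 0}\rangle - \langle\tau_{1}^{\sigma 2,2\sigma}\rangle\right). $$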
(32) The normal hopping 2× 2 matrix D̂ijσ is symmetric, D̂ijσ = d22ij 2σd 2σd21ij d Due to the constraints (21)–(22), the charge-spin correlations entering the matrix elements of (33) get exactly decoupled from each other, such that d22ij = K22(χ ij + χ ij)−K11χ d11ij = K11[χ ij + (χ1 − χ2)νij + χ ij]−K22χ d21ij = K21[(χ ij − χ2νij) + χ ij ]−K21χ with the three spin reversal invariant weighted boson-boson correlation functions representing respectively charge-charge (c), spin-spin (S), and singlet-hopping (s-h) correlations: χcij = νij〈NiNj〉/4, (34) χSij = νij〈SiSj〉 (35) χs−hij = νij〈X j 〉 (36) In (35), Si = (S i , S i ), with S i = (X i )/2 and S i = X The anomalous hopping 2× 2 matrix ∆̂ijσ has a very special form namely, ∆̂ijσ = −K21 · 2σ (K11 +K22) (K11 +K22) K21 · 2σ ij (37) where the spin reversal invariant weighted boson-boson pairing (pair) correlation function is given by ij = νij〈X i Nj〉 = 2νij〈X j )〉 = (38) = − νij〈N i 〉 = −2νij〈(X i 〉. (39) Mean field Green functions of Hubbard model of superconductivity 9 In Eqs. (38) and (39), the derivation of the second expression from the first one makes use of the spin reversal invariance property (23). To get a workable expression of the frequency matrix, approximations have to be derived for the boson-boson statistical averages entering the two-site hopping matrix elements. In the next section we show that the method of reference [12], yielding the pairing correlation function 〈X02i Nj〉 in terms of GMFA Green functions within an approach able to identify and rule out exponentially small terms, can be extended to the singlet hopping correlations 〈X02i X j 〉 as well. 6. Hopping processes involving singlets The right approach to the reduction of the order of correlation of the boson-boson statistical averages 〈X02i X j 〉 = 〈X i 〉 goes differently for the hole-doped and electron-doped cuprates. 
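For orientation, the size of the thermodynamic factor that controls the reduction procedure of this section can be checked numerically. The sketch below is an illustration only: it assumes the order-of-magnitude gap ∆ ≈ 3 eV used in the estimates of this section and the CODATA value of the Boltzmann constant; the function names are ours and are not taken from [12].

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K (CODATA 2018)


def beta_delta(delta_ev: float, temperature_k: float) -> float:
    """Dimensionless ratio beta*Delta = Delta / (k_B * T)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return delta_ev / (K_B_EV_PER_K * temperature_k)


def boltzmann_factor(delta_ev: float, temperature_k: float) -> float:
    """Thermodynamic weight exp(-beta*Delta) attached to the remote subband."""
    return math.exp(-beta_delta(delta_ev, temperature_k))


if __name__ == "__main__":
    delta = 3.0  # eV, assumed order of magnitude of the gap
    for t in (100.0, 200.0, 300.0):
        print(f"T = {t:5.0f} K   beta*Delta = {beta_delta(delta, t):7.1f}   "
              f"exp(-beta*Delta) = {boltzmann_factor(delta, t):.3e}")
```

With these assumptions β∆ stays above 100 for every temperature up to 300 K, so terms weighted by this factor lie far below any realistic numerical accuracy.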
• Reduction of the correlation order for hole-doped cuprates In these systems, the Fermi level (the zero point energy) stays in the singlet subband. We get the estimates E2 ≃ −∆, E2 − ∆ ≃ −2∆, E2 + ∆ ≃ 0. With ∆ ∼ 3eV , β∆ ∼ 3.5 · 104T−1. Therefore, at T . 300K, the quantities containing the factor eβE2 ≃ e−β∆ . e−100 < 10−44 are negligible. We start with the following form of the spectral theorem [9] 〈X02i X j 〉 = 1 + e−βω 〈〈X02i |X j 〉〉ω+iε − 〈〈X j 〉〉ω−iε , (40) written for anticommutator retarded (ω + iε), respectively advanced (ω − iε) Green functions. Their equation of motion in the (r, ω)-representation is (ω−E2)〈〈X j 〉〉ω ≃ 2〈X j 〉+K21 0σ̄,0σ 1,i |X j 〉〉ω−〈〈τ σ2,σ̄2 1,i |X j 〉〉ω where, for the sake of simplicity, the labels ±iε, ε = 0+, describing respectively the retarded and the advanced Green functions have been omitted. In Eq. (41), the higher order r.h.s. contributions coming from the inband hopping terms have been dropped off. Replacing (41) in (40), we get 〈X02i X j 〉 ≃ K21 1 + e−βω ω − E2 + iε 0σ̄,0σ 1,i |X j 〉〉ω+iε−〈〈τ σ2,σ̄2 1,i |X j 〉〉ω+iε To evaluate the imaginary part, we use the identity [9] ω − E2 + iε ω −E2 − iπδ(ω − E2). The integrals over the δ-function yield (finite) GF real parts at ω = E2, multiplied by a thermodynamic factor ∼ e−β∆ ≪ 1. The imaginary part of the hole subband GF Mean field Green functions of Hubbard model of superconductivity 10 0σ̄,0σ 1,i |X j 〉〉ω+iε shows a δ-like maximum at ω = E2−∆, where (ω−E2) −1 ≃ ∆−1 and the thermodynamic factor reaches a value ∼ e−2∆. The only non-negligible contribution to the principal part integral comes from the singlet subband GF 〈〈τ σ2,σ̄2 1,i |X j 〉〉ω+iε the imaginary part of which shows a δ-like maximum at ω = E2 +∆ ≃ 0. This allows us to approximate (ω − E2) −1 ≈ ∆−1 within the integral over the singlet subband GF to get 〈X02i X j 〉 ≃ (1− δij) 2σ̄〈τ σ2,σ̄2 1,i X j 〉 (42) Replacing this result in Eq. 
(38) and using (2) we get ij ≃ (1− δij) K21νij 4νij ·2σ̄〈X m6=(i,j) 2σ〈Xσ2i X m Nj〉 Omitting the three-site terms, we get the two-site approximation of the superconducting pairing originating in the singlet subband, ij ≃ (1− δij) 4K21ν · 2σ̄〈Xσ2i X j 〉, (44) which reproduces the well-known two-site exchange term of the t-J model. For the singlet hopping correlation function, (42) yields the two-site approximation χs−hij ≃ (1− δij) 2K21ν · 2σ̄〈Xσ2i X j 〉 (45) • Reduction of the correlation order for electron-doped cuprates The Fermi level (the zero point energy) stays now in the hole subband. We have the estimates E2 ≃ ∆, E2 +∆ ≃ 2∆, E2 −∆ ≃ 0. It is convenient now to start with the alternative form of the spectral theorem [9] i 〉 = eβω + 1 〈〈X02i |X j 〉〉ω+iε − 〈〈X j 〉〉ω−iε , (46) with the retarded and advanced GFs following from the same equation (41). Exponentially small quantities result from the δ-term of (ω − E2 + iε) −1 and from the singlet subband GF 〈〈τ σ2,σ̄2 1,i |X j 〉〉ω+iε. The hole subband GF 〈〈τ 0σ̄,0σ 1,i |X j 〉〉ω+iε, yields the non-negligible contribution i 〉 ≃ (1− δij) 2σ̄〈X 0σ̄,0σ 1,i 〉 (47) Replacing in (39) and omitting the three-site terms, we get the two-site approximation of the superconducting pairing originating in the hole subband, ij ≃ (1− δij) 4K21ν · 2σ〈X0σ̄i X j 〉 (48) Finally, the two-site approximation of the singlet-hopping correlation function is 〈X02i X j 〉 ≃ (1− δij) 2K21ν · 2σ̄〈X0σ̄i X j 〉. (49) In conclusion, the GMFA superconducting pairing is a second order effect. The lowest order contribution to it originates in interband hopping correlating annihilation Mean field Green functions of Hubbard model of superconductivity 11 (or creation) of pairs of spins at neighbouring lattice sites i and j within that energy subband which crosses the Fermi level. Similarly, the singlet hopping is a second order effect as well. 
It mainly proceeds by interband i ⇄ j single particle jumps from the upper energy subband to the lower energy subband. 7. Frequency matrix in (q, ω)-representation The calculation of the matrix elements of Ãσ(q) from Eq. (9) asks for three essentially different kinds of Fourier transforms, namely, • The averages of the Hubbard 1-forms entering Eqs. (31) and (32) result in sums of products of q-space averages and geometrical form factors: λµ,νϕ 1,i 〉 = 〈XλµXνϕ〉qγα(q) (50) for label sets {(λµ, νϕ)} ∈ {(0σ̄, σ̄0); (σ2, 2σ); (σ2, σ̄0)}. The quantity 〈XλµXνϕ〉q denotes the average of the q-space image of the product of Hubbard operators of labels λµ and νϕ respectively, 〈XλµXνϕ〉q = 1 + e−βω Gλµ,νϕ(q, ω + iε)−Gλµ,νϕ(q, ω − iε) Finally, in Eq. (50), γα(q) denote the nn (α = 1), nnn (α = 2), and third neighbour (α = 3) geometrical form factors, γ1(q) = 2[cos(qxax) + cos(qyay)], γ2(q) = 4 cos(qxax) cos(qyay), γ3(q) = 2[cos(2qxax) + cos(2qyay)]. • For the two-site weighted singlet hopping (36) and the superconducting pairing (38), the Fourier transforms result in convolutions of specific averages and geometrical form factors. The results are as follows: − Singlet hopping χs−h(q) = ν2α · Ξkγα(q− k) (52) where Ξk = 2σ〈X σ2X σ̄0〉k, while Ξk = 2σ〈X 0σ̄X2σ〉k for hole-doped and electron- doped cuprates respectively, with averages defined in (51). − Superconducting pairing χpair(q) = ν2α · Πkγα(q− k) (53) where Πk = 2σ̄〈X σ2X σ̄2〉k, while Πk = 2σ〈X 0σ̄X0σ〉k for hole-doped and electron- doped cuprates respectively, with averages defined in (51). • The charge-charge and spin-spin correlation functions (34) and (35) are treated approximately following [8, 10]: Mean field Green functions of Hubbard model of superconductivity 12 – The order of the charge-charge correlation function 〈NiNj〉 is lowered using a Hubbard type I approximation decoupling procedure 〈NiNj〉 ≃ 〈Ni〉〈Nj〉 = 2χ2. – The spin-spin correlation function 〈SiSj〉 is kept undecoupled, but treated phenomenologically. Eq. 
(2) implies the occurrence of up to three non-vanishing spin-spin correlation functions: nn, χS1 = 〈SiSi±ax/y〉, nnn, χ 2 = 〈SiSi±ax±ay〉, and χS3 = 〈SiSi±2ax/y〉. These are site independent quantities. Using the above results, we get from (9) and (27) the mathematical structure of the frequency matrix Ãσ(q) as follows, Ãσ(q) = Êσ(q) Φ̂σ(q) (Φ̂σ(q)) † −(Êσ̄(q)) . (54) The normal 2× 2 matrix contributions to Ãσ(q) show the characteristic σ-dependence, Êσ(q) = c22 2σc21 2σc∗21 c11 ; −(Êσ̄(q)) −c22 2σc 2σc21 −c11 with the σ-independent terms cab carrying normal one-site and two-site matrix elements, c22 ≡ c22(q) = (E1 +∆)χ2 + a22 + d22(q) c11 ≡ c11(q) = E1χ1 + a22 + d11(q) c21 ≡ c21(q) = a21 + d21(q) dab(q) = Kab ναγα(q)[χ α + (−1) a+bχaχb] + s−h(q) The one-site terms are defined by Eqs. (31)–(32) and (50). The exchange energy parameters are given by Jab = 4KabK21/∆, {ab} ∈ {22, 11, 21}, (56) while the singlet hopping contribution χs−h(q) is given by Eq. (52). The anomalous 2× 2 matrix contributions to Ãσ(q), obtained from (37), show the characteristic σ-dependence, Φ̂σ(q) = −2σξ1b ξ2b −ξ2b 2σξ1b ; (Φ̂σ(q)) −2σξ1b ∗ −ξ2b ∗ 2σξ1b with ξ1 = J21, ξ2 = (J11 + J22)/2, whereas b ≡ b(q) is a shorthand notation for the pairing matrix element (53). Remark 1 The spin reversal σ → σ̄ symmetry properties of the elemental Green functions entering the matrix GF (3) are identical to those established for the underlying frequency matrix Ãσ(q). Mean field Green functions of Hubbard model of superconductivity 13 8. GMFA Green function From Eqs. (15) and (18) it follows that the matrix χ̃, Eq. (8), is diagonal and spin reversal invariant, with two nonvanishing matrix elements, χ̂ 0̂ 0̂ χ̂ , χ̂ = , 0̂ = , (58) where χ2 and χ1 are given by Eqs. (29) and (30) respectively. 
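Replacing in (7) the expressions (58) of the matrix $\tilde\chi$ and (54) of the frequency matrix $\tilde{A}_{\sigma}(\mathbf{q})$, we get a structure of the GMFA-GF matrix obeying the general symmetry properties established in [11],

$$ \tilde{G}^{0}_{\sigma}(\mathbf{q},\omega) = \begin{pmatrix} \hat{G}^{0}_{\sigma}(\mathbf{q},\omega) & \hat{F}^{0}_{\sigma}(\mathbf{q},\omega) \\ (\hat{F}^{0}_{\sigma}(\mathbf{q},\omega))^{\dagger} & -(\hat{G}^{0}_{\bar\sigma}(\mathbf{q},-\omega))^{\top} \end{pmatrix}, \tag{59} $$

where the argument $\omega$ carries, in fact, the complex value $\omega + i\varepsilon$, $\varepsilon = 0^{+}$. (Hence the elemental GFs containing the argument $\omega$ point to retarded GFs, while those containing the argument $-\omega$ point to advanced GFs.)

The normal $2\times 2$ matrix $\hat{G}^{0}_{\sigma}(\mathbf{q},\omega)$ shows the characteristic σ-dependence,

$$ \hat{G}^{0}_{\sigma}(\mathbf{q},\omega) = \frac{1}{D(\mathbf{q},\omega)} \begin{pmatrix} g_{22}(\mathbf{q},\omega) & 2\sigma g_{21}(\mathbf{q},\omega) \\ 2\sigma g^{*}_{21}(\mathbf{q},\omega) & g_{11}(\mathbf{q},\omega) \end{pmatrix}, \tag{60} $$

with the σ-independent components $g_{ab}(\mathbf{q},\omega)$ found from

$$ g_{ab}(\mathbf{q},\omega) = A_{ab}\omega^{3} + B_{ab}\omega^{2} + C_{ab}\omega + D_{ab}, \qquad \{ab\} \in \{22, 11, 21\}. $$

Here the coefficients $A_{ab}$ are given respectively by $A_{22} = \chi_2$, $A_{11} = \chi_1$, $A_{21} = 0$, while $B_{ab}$, $C_{ab}$, $D_{ab}$ are q-dependent coefficients:

$$ B_{22}(\mathbf{q}) = c_{22}, \qquad B_{11}(\mathbf{q}) = c_{11}, \qquad B_{21}(\mathbf{q}) = c_{21} $$
$$ C_{22}(\mathbf{q}) = -\left[\chi_2\left(c_{11}^{2} + \xi_1^{2}|b|^{2}\right) + \chi_1\left(|c_{21}|^{2} + \xi_2^{2}|b|^{2}\right)\right]/\chi_1^{2} $$
$$ C_{11}(\mathbf{q}) = -\left[\chi_1\left(c_{22}^{2} + \xi_1^{2}|b|^{2}\right) + \chi_2\left(|c_{21}|^{2} + \xi_2^{2}|b|^{2}\right)\right]/\chi_2^{2} $$
$$ C_{21}(\mathbf{q}) = \left[c_{21}\left(\chi_2 c_{11} + \chi_1 c_{22}\right) - \xi_1\xi_2|b|^{2}\right]/(\chi_1\chi_2) $$
$$ D_{22}(\mathbf{q}) = -\left[c_{11}\left(c_{22}c_{11} - |c_{21}|^{2}\right) + \left(c_{22}\xi_1^{2} + c_{11}\xi_2^{2} + 2\Re(c_{21})\xi_1\xi_2\right)|b|^{2}\right]/\chi_1^{2} $$
$$ D_{11}(\mathbf{q}) = -\left[c_{22}\left(c_{22}c_{11} - |c_{21}|^{2}\right) + \left(c_{11}\xi_1^{2} + c_{22}\xi_2^{2} + 2\Re(c_{21})\xi_1\xi_2\right)|b|^{2}\right]/\chi_2^{2} $$
$$ D_{21}(\mathbf{q}) = \left\{c_{21}\left(c_{22}c_{11} - |c_{21}|^{2}\right) - \left[c_{21}^{*}\xi_1^{2} + c_{21}\xi_2^{2} + (c_{22}+c_{11})\xi_1\xi_2\right]|b|^{2}\right\}/(\chi_1\chi_2) $$

The anomalous $2\times 2$ matrix $\hat{F}^{0}_{\sigma}(\mathbf{q},\omega)$ shows the characteristic σ-dependence,

$$ \hat{F}^{0}_{\sigma}(\mathbf{q},\omega) = \frac{1}{D(\mathbf{q},\omega)} \begin{pmatrix} 2\sigma f_{22}(\mathbf{q},\omega) & f_{21}(\mathbf{q},\omega) \\ -f_{21}(\mathbf{q},-\omega) & 2\sigma f_{11}(\mathbf{q},\omega) \end{pmatrix}, \tag{61} $$

with the elemental GFs $f_{ab}(\mathbf{q},\omega)$ given by

$$ f_{aa}(\mathbf{q},\omega) = \left(P_{aa}\omega^{2} + R_{aa}\right) b, \quad \{aa\} \in \{22, 11\}, \qquad f_{21}(\mathbf{q},\omega) = \left(P_{21}\omega^{2} + Q_{21}\omega + R_{21}\right) b. $$

Here, $P_{22} = -\xi_1$, $P_{11} = \xi_1$, and $P_{21} = -\xi_2$ are q-independent, while

$$ R_{22}(\mathbf{q}) = \left[\left(c_{11}^{2} + c_{21}^{2}\right)\xi_1 + 2c_{11}c_{21}\xi_2 + \xi_1\left(\xi_1^{2} - \xi_2^{2}\right)|b|^{2}\right]/\chi_1^{2} $$
$$ R_{11}(\mathbf{q}) = -\left[\left(c_{22}^{2} + c_{21}^{*2}\right)\xi_1 + 2c_{22}c_{21}^{*}\xi_2 + \xi_1\left(\xi_1^{2} - \xi_2^{2}\right)|b|^{2}\right]/\chi_2^{2} $$
$$ R_{21}(\mathbf{q}) = \left[\left(c_{11}c_{21}^{*} + c_{22}c_{21}\right)\xi_1 + \left(c_{22}c_{11} + |c_{21}|^{2}\right)\xi_2 - \xi_2\left(\xi_1^{2} - \xi_2^{2}\right)|b|^{2}\right]/(\chi_1\chi_2) $$
$$ Q_{21}(\mathbf{q}) = \left[\left(\chi_2 c_{21} - \chi_1 c_{21}^{*}\right)\xi_1 + \left(\chi_2 c_{11} - \chi_1 c_{22}\right)\xi_2\right]/(\chi_1\chi_2). $$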
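All elemental GFs listed above share the denominator D(q, ω), so their poles are the zeros of D. As a hedged numerical illustration, the sketch below assumes that D is the monic biquadratic (ω² − uω + v)(ω² + uω + v) with real u(q) and v(q), and returns its four roots together with a flag telling whether they are real. The function names and the sample values of u and v are ours, chosen only to exercise the formulas; they are not model parameters.

```python
import cmath


def gmfa_denominator(omega: complex, u: float, v: float) -> complex:
    """D(omega) = (omega^2 - u*omega + v) * (omega^2 + u*omega + v)."""
    return (omega**2 - u * omega + v) * (omega**2 + u * omega + v)


def gmfa_spectrum(u: float, v: float):
    """Return the roots (Omega1, Omega2, -Omega2, -Omega1) and a reality flag.

    Omega_{1,2} = u/2 +/- sqrt((u/2)^2 - v) solve the first quadratic factor;
    the second factor contributes their negatives.
    """
    disc = (u / 2.0) ** 2 - v
    root = cmath.sqrt(disc)      # complex square root, also valid for disc < 0
    omega1 = u / 2.0 + root
    omega2 = u / 2.0 - root
    is_real = disc >= 0.0        # a real spectrum requires (u/2)^2 >= v
    return (omega1, omega2, -omega2, -omega1), is_real


if __name__ == "__main__":
    for u, v in ((3.0, 2.0), (1.0, 1.0)):   # illustrative values only
        roots, is_real = gmfa_spectrum(u, v)
        residual = max(abs(gmfa_denominator(w, u, v)) for w in roots)
        print(f"u={u}, v={v}: roots={roots}, real={is_real}, max|D|={residual:.1e}")
```

For u = 3 and v = 2 the sketch gives Ω1 = 2 and Ω2 = 1, while u = 1 and v = 1 yields a complex-conjugate pair. The second case shows why the reality of the spectrum has to be checked separately from the sign of v.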
The denominator $D(\mathbf{q},\omega)$ occurring in Eqs. (60) and (61), which is proportional to the determinant of the matrix $\tilde\chi\omega - \tilde{A}_{\sigma}(\mathbf{q})$ in (7), shows the following monic bi-quadratic dependence on $\omega$:

$$ D(\mathbf{q},\omega) = (\omega^{2} - u\omega + v)(\omega^{2} + u\omega + v), \tag{62} $$

where $v = v(\mathbf{q})$ and $u = u(\mathbf{q})$ are found respectively from

$$ v^{2} = \frac{1}{\chi_1^{2}\chi_2^{2}}\Big\{\big[(c_{22}c_{11} - |c_{21}|^{2}) - (\xi_1^{2} - \xi_2^{2})|b|^{2}\big]^{2} + |b|^{2}\big([(c_{22}+c_{11}) + 2\Re(c_{21})]^{2}\xi_1^{2} - 4(c_{22}+c_{11})\Re(c_{21})\,\xi_1(\xi_1 - \xi_2) - 4|c_{21}|^{2}(\xi_1^{2} - \xi_2^{2})\big)\Big\} \tag{63} $$

$$ u^{2} - 2v = \frac{1}{\chi_1^{2}}\left(c_{11}^{2} + \xi_1^{2}|b|^{2}\right) + \frac{1}{\chi_2^{2}}\left(c_{22}^{2} + \xi_1^{2}|b|^{2}\right) + \frac{2}{\chi_1\chi_2}\left(|c_{21}|^{2} + \xi_2^{2}|b|^{2}\right). \tag{64} $$

A necessary consistency condition to be satisfied by the parameters of the model at any vector $\mathbf{q}$ inside the Brillouin zone is $v^{2}(\mathbf{q}) \ge 0$.

Remark 2 The zeros of the determinant of the GMFA-GF,

$$ D(\mathbf{q},\omega) = 0, \tag{65} $$

provide the GMFA energy spectrum of the system. At every wave vector $\mathbf{q}$ inside the Brillouin zone, this yields for the superconducting state the energy eigenvalue set

$$ \{\Omega_1(\mathbf{q}),\ \Omega_2(\mathbf{q}),\ -\Omega_2(\mathbf{q}),\ -\Omega_1(\mathbf{q})\}, \qquad \Omega_{1,2}(\mathbf{q}) = (u/2) \pm \sqrt{(u/2)^{2} - v}. \tag{66} $$

In the normal state ($b = 0$), Eqs. (63) and (64) reduce respectively to

$$ v_0 = (c_{22}/\chi_2)(c_{11}/\chi_1) - |c_{21}|^{2}/(\chi_1\chi_2), \qquad u_0 = (c_{22}/\chi_2) + (c_{11}/\chi_1), $$

such that the energy spectrum is given by the roots of the second order equation $\omega^{2} - u_0\omega + v_0 = 0$ solved in [8].

Finally, if we assume a pure Hubbard model (i.e., energy band independent hopping parameters, $K_{11} = K_{22} = K_{21} \equiv t$, [10]), then a significant simplification of the equations derived in the last two sections is obtained. The normal $2\times 2$ matrix $\hat{E}_{\sigma}(\mathbf{q})$ becomes symmetric and so does the normal GMFA-GF $\hat{G}^{0}_{\sigma}(\mathbf{q},\omega)$. Moreover, there is a single exchange energy parameter in (57), $\xi_1 = \xi_2 \equiv J = 4t^{2}/\Delta$, which simplifies the anomalous $2\times 2$ frequency matrix to

$$ \hat{\Phi}_{\sigma}(\mathbf{q}) = \begin{pmatrix} -2\sigma & 1 \\ -1 & 2\sigma \end{pmatrix} J b, $$

such that the quantities $u$ and $v$ in the expression (62) of the GF determinant reduce to

$$ v^{2} = \left[(c_{22}c_{11} - c_{21}^{2})^{2} + (c_{22} + c_{11} + 2c_{21})^{2}J^{2}|b|^{2}\right]/(\chi_1^{2}\chi_2^{2}) \tag{67} $$
$$ u^{2} - 2v = \left[\chi_2^{2}c_{11}^{2} + \chi_1^{2}c_{22}^{2} + 2\chi_1\chi_2 c_{21}^{2} + J^{2}|b|^{2}\right]/(\chi_1^{2}\chi_2^{2}). \tag{68} $$

A non-negative value $v \ge 0$ always follows from Eq. (67); however, establishing that the solutions (66) are real requires an investigation of the domain of variation of the adjustable parameters of the model.

9. Conclusions

The two-band Hubbard model of high-Tc superconductivity in cuprates [8, 12] uses the Hubbard operator algebra on a physical system characterized by specific invariance symmetries with respect to translations and spin reversal. In the present paper we have shown that the system symmetries result either in invariance properties or in the exact vanishing of several characteristic statistical averages. The vanishing of the one-site anomalous matrix elements is shown to be a property embedded in the Hubbard operator algebra. Another consequence worth mentioning, which follows from the spin reversal invariance properties of the two-site statistical averages, is the exact decoupling from each other of the charge and spin correlations entering the matrix elements of the frequency matrix.

The use of these results allowed a rigorous derivation and simplification of the expression of the frequency matrix of the generalized mean field approximation (GMFA) Green function (GF) matrix of the model. For the higher order boson-boson averages $\langle X_i^{02} X_j^{20}\rangle$ and $\langle X_i^{02} N_j\rangle$, which enter respectively the normal singlet hopping and anomalous exchange pairing contributions to the frequency matrix, an approximation procedure resulting in GMFA-GF expressions was described. The procedure avoids the current decoupling schemes [14, 15]. Its principle, first formulated in [12], consists in the identification and elimination of exponentially small contributions to the spectral theorem representations of these statistical averages. A point worth noting is that the proper identification of exponentially small quantities requires different starting expressions of the spectral theorem for the hole-doped and electron-doped cuprates.
The results of the reduction procedure may be summarized as follows:

• The singlet hopping is a second order effect which may be described as interband i ⇄ j single particle jumps from the upper to the lower energy subband.

• The GMFA superconducting pairing is a second order effect, the lowest order contribution to which originates in interband hopping correlating the annihilation (creation) of spin pairs at neighbouring lattice sites i and j within that energy subband which crosses the Fermi level.

The derivation of the most general and simplest possible expressions of the frequency matrix and of the GMFA-GF matrix in the (q, ω)-representation enables reliable numerical investigation of the consequences of varying the adjustable parameters of the model (the degree of hole/electron doping, the energy gap ∆, the hopping parameters).

Another open question of the GF approach to the solution of the present model is the use of the Hubbard operator algebra to obtain a rigorous derivation and simplification of the Dyson equation of the complete Green function. As shown previously in [12], the self-energy corrections induce a spin fluctuation d-wave pairing originating in the kinematic interaction in the second order. These investigations are underway and results will be reported in a forthcoming paper.

Acknowledgments

The authors would like to express their gratitude to Prof. N.M. Plakida for useful advice and critical reading of the manuscript. Partial financial support was secured by the Romanian Authority for Scientific Research (Project 11404/31.10.2005 - SIMFAP).

References

[1] Damascelli A, Hussain Z and Shen Z-X 2003 Rev. Mod. Phys. 75 473
[2] Zhang F C and Rice T M 1988 Phys. Rev. B 37 3759
[3] Emery V J 1987 Phys. Rev. Lett. 58 2794; Varma C M, Schmitt-Rink S and Abrahams E 1987 Solid State Commun. 62 681
[4] Feiner L J, Jefferson J H and Raimondi R 1996 Phys. Rev. B 53 8751
[5] Yushankhai V Yu, Oudovenko V S and Hayn R 1997 Phys. Rev. B 55 15562
[6] Plakida N M and Oudovenko V S 1999 Phys. Rev. B 59 11949
[7] Plakida N M 2001 JETP Lett. 74 36
[8] Plakida N M, Hayn R and Richard J-L 1995 Phys. Rev. B 51 16599
[9] Zubarev D N 1960 Sov. Phys. Usp. 3 320
[10] Plakida N M and Oudovenko V S 2007 JETP 104 230
[11] Plakida N M 1997 Physica C 282–287 1737
[12] Plakida N M, Anton L, Adam S and Adam Gh 2003 ZhETF 124 367; English transl.: 2003 JETP 97 331
[13] Plakida N M 2006 Fiz. Nizkikh Temp. 32 483
[14] Roth L M 1969 Phys. Rev. 184 451
[15] Beenen J and Edwards D M 1995 Phys. Rev. B 52 13636; Avella A, Mancini F, Villani D and Matsumoto H 1997 Physica C 282–287 1757; Di Matteo T, Mancini F, Matsumoto H and Oudovenko V S 1997 Physica B 230–232 915; Stanescu T D, Martin I and Phillips Ph 2000 Phys. Rev. B 62 4300

ABSTRACT

The Green function (GF) equation of motion technique for solving the effective two-band Hubbard model of high-T_c superconductivity in cuprates [N.M. Plakida et al., Phys. Rev. B, v. 51, 16599 (1995); JETP, v. 97, 331 (2003)] rests on the Hubbard operator (HO) algebra. We show that, if we take into account the invariance to translations and spin reversal, the HO algebra results in invariance properties of several specific correlation functions. The use of these properties allows rigorous derivation and simplification of the expressions of the frequency matrix (FM) and of the generalized mean field approximation (GMFA) Green functions (GFs) of the model.
For the normal singlet hopping and anomalous exchange pairing correlation functions which enter the FM and GMFA-GFs, an approximation procedure based on the identification and elimination of exponentially small quantities is described. It secures the reduction of the correlation order to GMFA-GF expressions.

<|endoftext|><|startoftext|>

Geometry Effects at Atomic-Size Aluminium Contacts

U. Schwingenschlögl ∗, C. Schuster

Institut für Physik, Universität Augsburg, 86135 Augsburg, Germany

Abstract

We present electronic structure calculations for aluminium nanocontacts. Addressing the neck of the contact, we compare characteristic geometries to investigate the effects of the local aluminium coordination on the electronic states. We find that the Al 3pz states are very sensitive to modifications of the orbital overlap, which has serious consequences for the transport properties. Stretching of the contact shifts states towards the Fermi energy, leaving the system unstable towards ferromagnetic ordering. Owing to the spatial restriction, hybridization is locally suppressed at nanocontacts and the charge neutrality is violated. We discuss the influence of mechanical stress by means of quantitative results for the charge transfer.

Key words: density functional theory, electronic structure, stretched nanocontact, hybridization, charge neutrality

PACS: 71.20.-b, 73.20.-r, 73.20.At, 73.40.-c, 73.63.Rt

Atomic-size contacts can be prepared by means of scanning tunneling microscopy [1] or break junction techniques [2]. In each case, piezoelectric elements are used to stretch a wire with a precision of a few picometers until finally a single atom configuration is reached. Such contacts have attracted great attention over the last couple of years, in particular concerning their electrical transport properties. Since transport is restricted to a small number of atomic orbitals at the contact, conductance across nanocontacts strongly depends on the local electronic structure.
An atomic-size constriction accommodates only a small number of conducting channels, which is determined by the number of valence orbitals of the contact atom. The transmission of each channel likewise is fixed by the local atomic environment. For a review on the quantum properties of atomic-size conductors see Agraït et al. [3].

∗ Corresponding author. Fax: 49-821-598-3262. Email address: Udo.Schwingenschloegl@physik.uni-augsburg.de (U. Schwingenschlögl).
Preprint submitted to Elsevier, 1 November 2018. http://arxiv.org/abs/0704.0693v1

From the theoretical point of view, the electronic structure and conductance of nanocontacts and nanowires have been studied by ab initio band structure calculations. For aluminium contacts, investigations of the electronic states have been reported in [4,5,6,7], and the conductance has been addressed in [8,9,10,11,12,13,14,15,16]. In these studies various geometries have been used, which are assumed to model the local atomic structure of the contact in an adequate way. The breakage of an aluminium contact has been simulated by means of molecular dynamics calculations in [17,18,19], i.e. on the basis of realistic structural arrangements. However, despite such a large number of investigations, the literature lacks a satisfactory discussion of the interrelations between the details of the crystal structure and the local electronic states at the nanocontact. In the present letter we will deal with this point by comparing characteristic contact geometries, including stretched configurations.

In a previous work [20] we have demonstrated that hybridization between Al 3s and 3p states is strongly suppressed at aluminium nanocontacts due to directed bonds at the neck of the contact. We therefore expect the system to be very sensitive to the modifications of the orbital overlap that come along with the specific contact geometry.
As a consequence, structural details are important for the electrical transport, since hybridization effects can play a critical role for transport properties of atomic-size contacts and interfaces [21,22,23]. In particular, stretching of the nanocontact alters the chemical bonding and thus may lead to unexpected electronic features. We will show that it is mandatory to account for the very details of the contact geometry in order to obtain adequate results from electronic structure calculations.

The band structure results presented subsequently are obtained within density functional theory and the generalized gradient approximation. We use the WIEN2k program package, a state-of-the-art full-potential code based on a mixed LAPW and APW+lo basis [24]. In our calculations the charge density is represented by ≈150000 plane waves and the exchange-correlation potential is parametrized according to the Perdew-Burke-Ernzerhof scheme. Moreover, the mesh for the Brillouin zone integration comprises between 75 and 102 points in the irreducible wedge. While Al 1s, 2s, and 2p orbitals are treated as core states, the valence states comprise Al 3s and 3p orbitals. The radius of the aluminium muffin-tin spheres amounts to 2.6 Bohr radii.

Our calculations rely on two characteristic contact geometries, which we introduce in the following. On the one hand, we address a configuration where a single Al atom is connected to planar Al units on both sides, each consisting of seven atoms in a hexagonal arrangement with fcc [111] orientation. The central sites of these planar units lie on top of the contact atom, thus giving rise to linear σ-type Al-Al bonds along the z-axis. For this reason, we call the first geometry under consideration the linear contact configuration. The finite Al units are connected to Al ab-planes of infinite extension, which enables us to apply periodic boundary conditions.
We note that the contact Al site in this linear geometry, due to its two nearest neighbours, shares the essential structural features of atoms in a monostrand nanowire [20]. On the other hand, we study an Al atom sandwiched between two pyramidal Al electrodes in fcc [001] orientation, which we call the pyramidal contact configuration. To be specific, the contact Al site has four crystallographically equivalent nearest neighbours on both sides, which prohibits σ-type Al-Al bonding via the 3pz orbitals along the z-axis. Whereas the second pyramidal layer comprises nine atoms, the third layer off the contact extends infinitely on account of periodic boundary conditions.

For both contact configurations, a convenient choice for the bond lengths and bond angles is given by the bulk (fcc) aluminium values, i.e. by interatomic distances of 2.86 Å. Mechanical stress can increase this bond length at the nanocontact, which we simulate by interatomic distances of 3.95 Å for the linear and 3.62 Å for the pyramidal contact configuration. In both cases, only the bond lengths between the contact Al site and its nearest neighbours are changed with respect to the fcc setup. Structural relaxation of the electrodes due to the elongated contact bonds plays a minor role.

For bulk aluminium it is well established that the formal Al 3s²3p¹ electronic configuration is strongly modified by hybridization effects, giving rise to a prototypical sp-hybrid system. However, the situation changes dramatically when covalent bonding is no longer isotropic but restricted to specific directions, as for an atomic-size contact. Partial Al 3s, 3pz, and 3px densities of states (DOS) as calculated for the Al site at the neck of our linear contact configuration are shown in figure 1. By symmetry, px and py states are degenerate. While most of the occupied states are of 3s type, the 3p states dominate at energies above the Fermi level.
Because hardly any contribution of 3s and 3p states is found at energies dominated by the other states, respectively, evolution of hybrid orbitals and an interpretation in terms of sp-hybrid states is precluded. Most 3p electrons occupy the pz orbital, which is oriented along the principal axis of the contact and therefore mediates σ-type orbital overlap. Because neither 3s nor 3px states give rise to significant contributions to the DOS at the Fermi energy, chemical bonding is well characterized in terms of directed 3pz bonds.

When the Al-Al bond length is stretched from 2.86 Å to 3.95 Å at the nanocontact, the central Al site decouples from its neighbours. Its electronic states hence become more atom-like, which is clearly visible for the 3s states in figure 1. Smaller band widths and sharper DOS structures likewise are obvious for the 3px states. Finally, for the 3pz symmetry component we observe a shift of states from lower energies to the Fermi level and from higher energies to a new structure at about 2 eV. The DOS at the Fermi energy increases significantly.

[Figure 1: panels Al 3s, Al 3pz, Al 3px; curves for d = 2.86 Å and d = 3.95 Å; horizontal axis E − EF (eV).]
Fig. 1. Partial Al 3s, 3pz, and 3px densities of states for the contact aluminium site in the linear contact configuration.

configuration    bond length (Å)    charge transfer (electrons)
linear           2.86               0.51
                 2.97               0.54
                 3.08               0.57
                 3.33               0.59
                 3.58               0.62
                 3.95               0.63
pyramidal        2.86               0.22
                 3.22               0.53
                 3.62               0.61
                 4.04               0.66

Table 1. Net charge transfer off the contact aluminium site.

Turning to the occupation of the valence orbitals, the decoupling of the contact Al site comes along with a reduction of charge, amounting to 0.13 electrons. In general, the px/py atomic orbital does not mediate chemical bonding due to the spatial restriction of the crystal structure. Its occupation thus is strongly reduced and we cannot expect local charge neutrality.
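The charge depletion caused by stretching can be read off as the difference of the charge transfers at two bond lengths. The short sketch below does this bookkeeping for the values listed in table 1. The dictionary layout and the function name are our own, and since the table entries are rounded to two decimals the differences are only accurate to about 0.01 electrons.

```python
# Net charge transfer off the contact Al site (electrons), keyed by the
# Al-Al bond length in Angstrom. Values are copied from table 1.
CHARGE_TRANSFER = {
    "linear": {2.86: 0.51, 2.97: 0.54, 3.08: 0.57,
               3.33: 0.59, 3.58: 0.62, 3.95: 0.63},
    "pyramidal": {2.86: 0.22, 3.22: 0.53, 3.62: 0.61, 4.04: 0.66},
}


def stretch_summary(geometry: str, d_bulk: float, d_stretched: float):
    """Return (relative elongation, extra charge transfer) on stretching."""
    table = CHARGE_TRANSFER[geometry]
    elongation = d_stretched / d_bulk - 1.0
    extra_transfer = round(table[d_stretched] - table[d_bulk], 2)
    return elongation, extra_transfer


if __name__ == "__main__":
    for geometry, d_max in (("linear", 3.95), ("pyramidal", 3.62)):
        elongation, extra = stretch_summary(geometry, 2.86, d_max)
        print(f"{geometry:9s}: bond elongated by {100 * elongation:.0f} %, "
              f"additional charge transfer {extra:.2f} electrons")
```

For the linear contact this gives 0.12 electrons between 2.86 Å and 3.95 Å, which agrees within rounding with the 0.13 electrons quoted in the text. For the pyramidal contact between 2.86 Å and 3.62 Å it gives 0.39 electrons.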
Calculated values for the net charge transfer off the contact Al site, as compared to bulk aluminium, are given in table 1 for bond lengths between 2.86 Å and 3.95 Å.

Figure 2 shows partial Al 3s, 3pz, and 3px densities of states for the pyramidal contact configuration. As compared to the linear case, fundamental differences are found. For an Al-Al bond length of 2.86 Å we now have a finite Al 3s DOS at the Fermi energy. The same is true for the 3pz states, whereas the 3px DOS almost vanishes. Again, px and py states are degenerate by symmetry. While most of the occupied states still are of 3s type, contributions of the 3p states are larger than for the linear contact configuration. Chemical bonding therefore is more isotropic, which is reflected by larger band widths. As expected from the contact geometry, σ-type bonding via the 3pz orbital is significantly reduced. Due to increased hybridization between the 3s and the three 3p orbitals, all these states can participate in the electrical transport. Figure 3 illustrates the differences in the electronic structure at the linear and pyramidal contact by means of electron density maps. In each case, the density map covers the plane of the central Al site, where the principal axis of the contact runs from left to right.

[Figure 2: panels Al 3s, Al 3pz, Al 3px; curves for d = 2.86 Å and d = 3.62 Å; horizontal axis E − EF (eV).]
Fig. 2. Partial Al 3s, 3pz, and 3px densities of states for the contact aluminium site in the pyramidal contact configuration.

[Figure 3: panels "linear" and "pyramidal".]
Fig. 3. Electron density maps for the neck of the linear and pyramidal contact.

Stretching of the contact by increasing the Al-Al bond length from 2.86 Å to 3.62 Å has similar effects as for the linear contact configuration. The decoupling of the central Al site from the pyramidal electrodes results in smaller band widths and sharper DOS structures for the 3s, 3pz, as well as 3px states, see figure 2.
In addition, the 3pz and 3px states shift to higher energies, giving rise to pronounced DOS structures near 2 eV. Due to the reduced band width, the 3s and 3pz DOS disappears almost completely in the vicinity of the Fermi energy. As concerns the 3pz states, the pyramidal geometry thus shows the opposite behaviour to the linear geometry. Since the Al 3pz states mediate the main part of the orbital overlap across the nanocontact, they are very sensitive to changes in the crystal structure. Structural rearrangement during the breakage of an aluminium nanocontact or nanowire consequently should have serious effects on the electrical transport, such as modulation of the conductance.

Elongated bonds again are accompanied by a decline of charge at the contact. However, the additional net charge transfer of 0.39 electrons off the central Al site is larger than the 0.13 electrons found in the linear case. This traces back to the fact that reduction of hybridization on stretching is more efficient for the pyramidal geometry, since for the linear geometry hardly any hybridization is left right from the beginning. Accordingly, at the bulk bond length the net charge transfer is much smaller in the pyramidal case (0.22 versus 0.51 electrons), see table 1.

[Figure 4: panel Al 3pz; horizontal axis E − EF (eV).]
Fig. 4. Spin majority and minority Al 3pz densities of states for the contact aluminium site in the stretched linear contact configuration (d = 3.95 Å).

We next show that the remarkable increase of the Al 3pz DOS at the Fermi energy on stretching the linear contact configuration leads to an instability towards ferromagnetic ordering. A spontaneous magnetization of simple metal nanowires has been predicted by Zabala et al. [4]. These authors have demonstrated the instability in explicit calculations for an aluminium nanowire, using a stabilized jellium model. For Al-Al bond lengths of d = 3.95 Å at the neck of our contact, we hence have performed spin polarized electronic structure calculations, yielding a stable ferromagnetic solution.
Figure 4 displays the corresponding spin majority and minority densities of states for the contact Al site. The local spin splitting of about 0.2 eV is connected to an energy gain of 5.5 mRyd. Moreover, the magnetic moment amounts to 0.1 µB, which is significantly smaller than the 0.68 µB reported by Zabala et al. [4] for an infinite nanowire. However, Delin et al. [6] have shown for Pd nanowires that the spontaneous magnetization decreases rapidly for short chains. Only the linear contact configuration is subject to a magnetic instability on stretching. For the pyramidal geometry, of course, magnetism cannot be expected, compare the spin degenerate DOS curves in figure 2. In conclusion, we have studied the electronic structure of aluminium nanocontacts by means of band structure calculations within density functional theory. Taking into account the details of the crystal structure, we have discussed the electronic features of prototypical contact geometries (linear versus pyramidal). Our calculations result in two largely different scenarios. In particular, the Al 3pz states are strongly affected by modifications of the chemical bonding. If σ-type bonding via the 3pz orbitals is dominant because of direct orbital overlap across the contact, sp-type hybridization is almost completely suppressed and only Al 3pz states remain at the Fermi energy [20]. Otherwise, if the bonding is more isotropic for geometrical reasons, the Al 3s states likewise have to be taken into account. The different behaviour of the linear and pyramidal geometries becomes even more pronounced when the contact is stretched. Whereas in the linear case the Al 3pz DOS at the Fermi energy increases, which even yields a ferromagnetic instability, it vanishes in the pyramidal case, leaving the system insulating. As a consequence, the structural details of the contact are expected to strongly influence the electrical transport.
Because of structural rearrangements, they are particularly relevant for the breakage of a nanocontact or nanowire.
Acknowledgements
We thank U. Eckern and P. Schwab for helpful discussions and the Deutsche Forschungsgemeinschaft for financial support (SFB 484).
References
[1] Pascual J.I., Méndez J., Gómez-Herrero J., Baró A.M., and García N., Phys. Rev. Lett. 71 (1993) 1852.
[2] Scheer E., Joyez P., Esteve D., Urbina C., and Devoret M.H., Phys. Rev. Lett. 78 (1997) 3535.
[3] Agraït N., Levi-Yeyati A., and van Ruitenbeck J.-M., Phys. Rep. 377 (2003) 81.
[4] Zabala N., Puska M.J., Ayuela A., Raebiger H., and Nieminen R.M., J. Magn. Magn. Mater. 249 (2002) 193.
[5] Ribeiro F.J., and Cohen M.L., Phys. Rev. B 68 (2003) 035423.
[6] Delin A., Tosatti E., and Weht R., Phys. Rev. Lett. 92 (2004) 057201.
[7] Delin A. and Tosatti E., J. Phys.: Condens. Matter 16 (2004) 8061.
[8] Levy-Yeyati A., Martín-Rodero A., and Flores F., Phys. Rev. B 56 (1997) 10369.
[9] Cuevas J.C., Levi-Yeyati A., and Martín-Rodero A., Phys. Rev. Lett. 80 (1998) 1066.
[10] Cuevas J.C., Levi-Yeyati A., Martín-Rodero A., Bollinger G.R., Untiedt C., and Agraït N., Phys. Rev. Lett. 81 (1998) 2990.
[11] Kobayashi N., Brandbyge M., and Tsukada M., Phys. Rev. B 62 (2000) 8430.
[12] Palacios J.J., Pérez-Jiménez A.J., Louis E., SanFabián E., and Vergés J.A., Phys. Rev. B 66 (2002) 035322.
[13] Thygesen K.S., and Jacobsen K.W., Phys. Rev. Lett. 91 (2003) 146801.
[14] Lee H.-W., Sim H.-S., Kim D.-H., and Chang K.J., Phys. Rev. B 68 (2003) 075424.
[15] Okano S, Shiraishi K., and Oshiyama A., Phys. Rev. B 69 (2004) 045401.
[16] Sasaki T., Egami Y., Ono T., and Hirose K., Nanotechnology 15 (2004) 1882.
[17] Hasmy A., Medina E., and Serena P.A., Phys. Rev. Lett. 86 (2001) 5574.
[18] Hasmy A., Pérez-Jiménez A.J., Palacios J.J., García-Mochales P., Costa-Krämer J.L., Díaz M., Medina E., and Serena P.A., Phys. Rev. B 72 (2005) 245405.
[19] Pauly F., Dreher M., Viljas J.K., Häfner M., Cuevas J.C., and Nielaba P., arXiv:cond-mat/0607129.
[20] Schwingenschlögl U. and Schuster C., Chem. Phys. Lett. 432 (2006) 245.
[21] Schmitt T., Augustsson A., Nordgren J., Duda L.-C., Höwing J., Gustafsson T., Schwingenschlögl U., and Eyert V., Appl. Phys. Lett. 86 (2005) 064101.
[22] Schwingenschlögl U. and Schuster C., Chem. Phys. Lett. 435 (2007) 100.
[23] Schwingenschlögl U. and Schuster C., Europhys. Lett. 37 (2007) 37007.
[24] Blaha P., Schwarz K., Madsen G., Kvasicka D., and Luitz J., WIEN2k: An augmented plane wave + local orbitals program for calculating crystal properties, Vienna University of Technology, 2001.
ABSTRACT We present electronic structure calculations for aluminium nanocontacts. Addressing the neck of the contact, we compare characteristic geometries to investigate the effects of the local aluminium coordination on the electronic states. We find that the Al 3pz states are very sensitive to modifications of the orbital overlap, which has serious consequences for the transport properties. Stretching of the contact shifts states towards the Fermi energy, leaving the system unstable against ferromagnetic ordering. By spatial restriction, hybridization is locally suppressed at nanocontacts and the charge neutrality is violated. We discuss the influence of mechanical stress by means of quantitative results for the charge transfer. <|endoftext|><|startoftext|> Introduction
Measurements of current - voltage (I-V) characteristics are accompanied by heat emission and selfheating. The selfheating can dramatically modify the resulting I-V curve. A thermal hysteresis of the I-V curve and a dependence of the I-V curve on the scanning velocity of the current are signs of selfheating. Eliminating the selfheating is very important for transport measurements of high-Tc superconductors because of their heightened temperature sensitivity.
By reducing the cross section S of a bulk sample, one can measure the I-V curve over a fixed range of current density at smaller values of the measuring current. The selfheating decreases as well. In the case of the non-tunneling break junction (BJ) technique, a significant reduction of S is achieved by the formation of a microcrack in a bulk sample. The non-tunneling BJ of high-Tc superconductors represents two massive polycrystalline banks connected by a narrow bottleneck (Figure 1a). The bottleneck is constituted by granules and intergranular boundaries which are weak links (Figure 1b). The current density in the bottleneck is much larger than that in the banks. If the bias current I is less than the critical current Ic of the bulk sample then the weak links in the banks have zero resistance. For small transport currents, (i) Ic and the I-V curve of the BJ are determined by the weak links in the bottleneck only, (ii) the selfheating effect is negligible.
http://arxiv.org/abs/0704.0694v1 Current - voltage characteristics of break junctions of high-Tc superconductors
Figure 1. a) Break junction of polycrystalline sample. The crack 1 and the bottleneck 2 are displayed. b) Granules in the bottleneck. Filled circles mark weak links that are intergranular boundaries. Dotted lines are the main paths for transport current. c) Simplified circuit for the network (Sec. 3.1).
The experimental I-V curves of BJs of high-Tc superconductors have rich peculiarities reflecting the physical mechanisms of charge transport through weak links. They have been the topic of many investigations [1, 2, 3, 4, 5, 6, 7]. Here we analyze the earlier works of our group [4, 5, 6, 7] and new experimental data (Section 2). A model for the description of the I-V curves is suggested in Section 3. The peculiarities observed on the experimental I-V curves of BJs are explained in Section 4. The parameters of weak links in the investigated samples are also estimated in Section 4.
2. Experiment
La1.85Sr0.15CuO4 (LSCO), Y0.75Lu0.25Ba2Cu3O7−δ (YBCO) and Bi1.8Pb0.3Sr1.9Ca2Cu3Ox (BSCCO) were synthesized by the standard ceramic technology. The composite 67 vol.% YBa2Cu3O7−δ + 33 vol.% Ag (YBCO+Ag) was prepared from YBa2Cu3O7−δ powder and ultra-dispersed Ag [8]. The initial components were mixed and pressed. Then the composite was synthesized at 925 °C for 8 h. The critical temperatures Tc are 38 K for LSCO, 112 K for BSCCO, 93.5 K for YBCO and YBCO+Ag. Samples with a typical size of 2 mm x 2 mm x 10 mm were sawn out from the synthesized pellets. Then the samples were glued to sapphire substrates. Sapphire was chosen due to its high thermal conductivity at low temperatures. The central part of the samples was polished down to obtain a cross-sectional area S ≈ 0.2 x 1 mm². For such a value of S, the critical current Ic of YBCO and BSCCO has a typical value of about 2 A at 4.2 K (current density ≈ 1000 A/cm²). A further controllable decrease in S is very difficult due to the inevitable mechanical stresses breaking the sample. In order to obtain a contact of the break junction type, the sample with the above value of S was bent together with the substrate with the help of the screws of spring-loaded current contacts. This led to the emergence of a microcrack in the part of the sample between the potential contacts. As a result, either a tunnelling contact (no bottleneck, the resistance R > 10 Ohm at room temperature) or a metal contact (R < 10 Ohm) was formed. Only the metal contacts were selected for investigation. The drop of Ic(4.2 K) when the sample was cracked shows that the values of S decreased by ≈30 times for LSCO and ≈100 times for YBCO and BSCCO. For YBCO+Ag, Ic(77.4 K) decreases by ∼500 times. The I-V curves were measured by the standard four probe technique under bias current.
A typical V(I) dependence of a BJ has a hysteretic peculiarity which decreases as the temperature increases. There is also an excess current on the I-V curves. The I-V curves of LSCO and BSCCO BJs exhibit an arch-like structure at low temperature. The I-V curves of the BJs investigated are independent of the scanning velocity of the bias current. Thus, the experimental conditions ensure that the hysteretic peculiarity on these I-V curves is not caused by selfheating.
3. Model
3.1. I-V curve of network
A polycrystalline high-Tc superconductor is considered to be a network of weak links. The I-V curve of a network is determined by the I-V curves of the individual weak links and their mutual disposition. Let us first consider the influence of the mutual disposition of weak links on the I-V curve. For a bulk high-Tc superconductor the I-V curve resembles that of a typical single weak link [9]. However, the I-V curves of BJs are usually distorted in comparison with that of a single weak link. This is because only a finite number of weak links remains in the bottleneck of the BJ (Figure 1 a, b). So the contribution of different weak links to the resulting I-V curve is stronger in a BJ than in a large network. The characteristics of a chaotic network are difficult to calculate [9]. To simplify the calculation of the resulting I-V curve of a BJ we consider an equivalent network: the simple parallel connection of a few chains of series-connected weak links (Figure 1 c). Indeed there are percolation clusters [10] in a network that are paths for current (Figure 1 b). Each percolating cluster in the considered network is taken to be a chain of series-connected weak links. The V(I) dependence of the series-connected weak links is determined as $V(I) = \sum_i V_i(I)$, where the sum is over all weak links in the chain and $V_i(I)$ is the I-V curve of each weak link. The weak links and their I-V curves may be different.
It is convenient to replace here the sum over all weak links with a sum over a few typical weak links, each multiplied by a weighting coefficient $P_i$. The relation for the series-connected weak links becomes:
$$V(I) = N_V \sum_i P_i V_i(I), \qquad (1)$$
where $N_V$ is the number of typical weak links, $P_i$ shows the share of the $i$th weak link in the resulting I-V curve of the chain, $\sum_i P_i = 1$. The parallel connection of chains is considered further. If the current $I$ flows through the network then the current $I_j$ through the $j$th chain equals $I P_{\parallel j}/N_\parallel$ and $\sum_j I_j = I$. Here $N_\parallel$ is the number of parallel chains in the network, $P_{\parallel j}$ is the weighting coefficient determined by the resistance of the $j$th chain, $\frac{1}{N_\parallel}\sum_j P_{\parallel j} = 1$. An addition (a subtraction) of chains in parallel connection smears (draws down) the I-V curve of the network to higher (lower) currents. This is similar to the modification of the I-V curve due to an increase (a decrease) of the cross section of the sample. For the sake of simplicity the difference between parallel chains can be neglected and only the typical chain considered. Then the expression for the I-V curve of a network of weak links follows:
$$V(I) = N_V \sum_i P_i V_i\!\left(\frac{I}{N_\parallel}\right), \qquad (2)$$
where the sum is over the typical weak links with weighting coefficients $P_i$, $N_V$ is the number of series-connected weak links in the typical chain in the network, $I/N_\parallel = I_i$ is the current through the $i$th weak link of the typical chain.
3.2. I-V curve of a typical weak link
Metallic intergranular boundaries were revealed in the polycrystalline YBCO synthesized by the standard ceramic technology [11]. The excess current and other peculiarities on the I-V curves of the studied samples are characteristic of superconductor/normal-metal/superconductor (SNS) junctions [12]. These facts verify that the intergranular boundaries in the high-Tc superconductors investigated are metallic. Therefore networks of SNS junctions are realized in the samples.
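Eq. (2) of Sec. 3.1 can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' fitting code: each typical weak link is represented by a callable returning its voltage for a given current, and the two toy link curves (`ohmic_link`, `threshold_link`) and all numerical values are hypothetical placeholders rather than the KGN expression used in the paper.

```python
from typing import Callable, Sequence


def network_voltage(current: float,
                    links: Sequence[Callable[[float], float]],
                    weights: Sequence[float],
                    n_series: int,
                    n_parallel: int) -> float:
    """Evaluate Eq. (2): V(I) = N_V * sum_i P_i * V_i(I / N_par)."""
    if len(links) != len(weights):
        raise ValueError("one weight P_i is needed per typical weak link")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the weights P_i must sum to 1")
    # Each chain of the simplified network carries the same share of the current.
    chain_current = current / n_parallel
    return n_series * sum(p * v(chain_current) for p, v in zip(weights, links))


def ohmic_link(resistance: float) -> Callable[[float], float]:
    """Hypothetical toy link: a plain resistor, V = R * I."""
    return lambda i: resistance * i


def threshold_link(i_c: float, resistance: float) -> Callable[[float], float]:
    """Hypothetical toy link: zero voltage below i_c, linear above it."""
    def v(i: float) -> float:
        if abs(i) <= i_c:
            return 0.0
        sign = 1.0 if i > 0 else -1.0
        return sign * resistance * (abs(i) - i_c)
    return v


if __name__ == "__main__":
    links = [ohmic_link(2.0), threshold_link(1.0, 4.0)]
    for total_current in (2.5, 10.0):
        v = network_voltage(total_current, links, [0.75, 0.25],
                            n_series=4, n_parallel=5)
        print(total_current, v)
```

In a real fit the toy callables would be replaced by interpolated I-V arrays of the individual SNS junctions; the structure of the sum stays the same.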
Among the theories developed for SNS structures, only the Kümmel - Gunsenheimer - Nicolsky (KGN) theory [13] predicts the hysteretic peculiarity on the I-V curve of a weak link. The KGN theory considers the multiple Andreev reflections of quasiparticles. According to the KGN model, the hysteretic peculiarity reflects a part of the I-V curve with a negative differential resistance which can be observed under bias voltage [12, 13]. The KGN approach was used earlier for the description of experimental I-V curves of low-Tc [14, 15] and high-Tc weak links [16, 14]. The approach based on consideration of the phase slip in nanowires [17] may alternatively be employed to compute the hysteretic I-V curve. The model [17] is valid at T ≈ Tc while the KGN model is appropriate in the temperature range T < Tc. We use the simplified version [14] of KGN to describe the I-V curves of individual weak links.

Table 1. Parameters of superconductors
Sample     ∆0 [meV]   m∗/me   kF [Å⁻¹]
BSCCO      25         6.5     0.61
LSCO       9          5       0.35
YBCO       17.5       5       0.65
YBCO+Ag    17.5       5       0.65

According to [14] the expression for the current density of an SNS junction is given by:
j(V ) = em∗2d2 2π3~5 −∆+neV ∆2 − E2 1− C 2|E| E2 −∆2
with $C = \frac{\pi}{2}\left(1 - \frac{d m^{*} \Delta}{2\hbar^{2} k_F}\right)$ for $C > 1$ and $C = 1$ otherwise, $E_1 = -\Delta + neV$ for $-\Delta + neV \geq \Delta$ and $E_1 = \Delta$ otherwise. Here A is the cross section area and d is the thickness of the normal layer with the inelastic mean free path l and resistance RN, e is the charge and m∗ is the effective mass of the electron, ∆ is the energy gap of the superconductor, n is the number of Andreev reflections which a quasiparticle with energy E undergoes before it moves out of the normal layer. One should calculate a few I(V) dependencies by Eq.(3) for different parameters to simulate the I-V curve of the network by Eq.(2). Almost all parameters in Eq.(3) may vary between different weak links.
Indeed there are some distribution functions of the parameters of the intergranular boundaries (d, A, RN) or the parameters of the superconducting crystallites (∆, the angle of orientation) in the SNS network.
4. Current - voltage characteristics
Figures 2-5 show the experimental I-V curves of BJs (circles) and the calculated I-V curves of SNS networks (solid lines). The right scale of the V-axis of all graphs is given in units of eV/∆ to correlate the position of the peculiarities on the I-V curve with the value of the energy gap. The parameters of the superconductors are presented in Table 1. The mean values of the energy gap ∆0 at T = 0 known for high-Tc superconductors were used. The parameters kF and m∗ were estimated by the Kresin-Wolf model [18]. For the fitting we have calculated the I(V) dependencies of different SNS junctions by Eq.(3) to describe different parts of the experimental I-V curve. The parameters varied were d and RN. Then we have substituted the arrays of I-V values into Eq.(2). Most experimental I-V curves are satisfactorily described when the sum in Eq.(2) contains at least two terms. The first term describes the hysteretic peculiarity, the second one describes the initial part of the I-V curve. The fitting is illustrated in detail in Figure 2, where curve 1 is calculated for d = 78 Å and curve 2 is calculated for d = 400 Å.
Figure 2. I-V curve of YBCO+Ag break junction at T = 77.4 K. Experiment (circles) and computed curves (solid lines). Arrows display the jumps of voltage drop. Curve 1 that is N‖I1(V1) fits the hysteretic peculiarity. Curve 2 that is N‖I2(V2) fits the initial part of I-V curve. Curve 3 is the dependence V(I) = NV [P1V1(I/N‖) + P2V2(I/N‖)].
Figure 3. I-V curve of BSCCO break junction at T = 4.2 K. Experiment (circles) and computed curve (solid line). Arrows display the jumps of voltage drop.
The main fitting parameters are d/l, NV, N‖A, RN/N‖, P1,2.
The parameter P1 is the weighting coefficient for the stronger (with the thinner d) typical weak link, P2 = 1 − P1. Some parameters used are presented in Table 2. Values of l are estimated from the experimental resistivity data (2, 3, 1.6, 3.6 mOhm cm at 150 K for bulk BSCCO, LSCO, YBCO, YBCO+Ag respectively) and the data of works [19, 20]. The value of l for Ag at 77 K is known to be ∼ 0.1 cm. But it is more realistic to use a much smaller value for the composite. Table 2 shows the possible different values of l and the corresponding values of d for YBCO+Ag. The number of parallel paths N‖ is estimated by assuming A ≃ 10⁻¹¹ cm² for the weak links in polycrystalline high-Tc superconductors. Such a choice of A is reasonable because the cross section area of a weak link should be much smaller than D² (Figure 1b), where D ∼ 10⁻⁴ cm is the grain size of high-Tc superconductors. This rough estimation of N‖ is influenced by the form of the percolation clusters in the sample [10] and by imperfections of the weak links. Figures 2-5 demonstrate that the hysteretic peculiarity on the experimental I-V curves results from the region of negative differential resistance. This region arises because the number of Andreev reflections decreases when the voltage increases.
Figure 4. Temperature evolution of I-V curve of LSCO break junction. Experiment (circles) and computed curves (solid lines). The I-V curves at 11.05 K, 23.6 K, 32.85 K are shifted up by 0.2 V, 0.4 V, 0.6 V respectively.
Figure 5. Temperature evolution of I-V curve of YBCO break junction. Experiment (circles) and computed curves (solid lines). The I-V curves at 21.65 K, 41.1 K, 61.95 K are shifted up by 10 mV, 20 mV, 30 mV respectively.

Table 2. Parameters of SNS junctions in the networks
Sample     l [Å]     d1 [Å]   d2 [Å]   P1      NV   N‖
BSCCO      72∗       3.5      -        1       1    1
LSCO       50∗       4.8      20       0.905   15   20
YBCO       90∗       2        20       0.333   3    1
YBCO+Ag    1000∗∗    78       400      0.75    4    5
           100∗∗     7.8      40       0.75    4    5
∗ value at T = 4.2 K
∗∗ value at T = 77.4 K

The experimental I-V curves for LSCO and YBCO at different temperatures and the corresponding computed curves are presented in figures 4 and 5. We account for a decrease of l and ∆ to compute the I-V curves at higher temperatures (for LSCO l(11.05 K) = 50 Å, ∆(11.05 K) = 0.93 meV, l(23.6 K) = 50 Å, ∆(23.6 K) = 0.69 meV, l(32.85 K) = 47 Å, ∆(32.85 K) = 0.46 meV; for YBCO l(21.65 K) = 81 Å, ∆(21.65 K) = 17.3 meV, l(41.1 K) = 70 Å, ∆(41.1 K) = 16.6 meV, l(61.95 K) = 60 Å, ∆(61.95 K) = 13.3 meV). The agreement between the computed curves and the experimental I-V curves becomes less satisfactory when T approaches Tc. Possibly, this discrepancy is due to the influence of other thermoactivated mechanisms. As the simulation curves demonstrate (Figs. 3 and 4), the arch-like peculiarity on the experimental I-V curves of LSCO and BSCCO is one of the arches of the subharmonic gap structure [13]. By using Eq.(2) we account for the arch-like peculiarity at voltages ≫ ∆/e for LSCO (Figure 4), which would otherwise seem to contradict the KGN model prediction of the subharmonic gap structure at V ≤ 2∆/e [13]. We have also used Eq.(2) to estimate the number of resistive weak links in the sample of the composite 92.5 vol. % YBCO + 7.5 vol. % BaPbO3 [16]. The I-V curve of this composite was described earlier by KGN based approaches [16, 14]. We obtained NV = 13, N‖ ≈ 4000, and the total number of resistive weak links is 52000. The small number NV is evidence that only the short narrowest part of the bulk sample is resistive.
5. Conclusion
We have measured the I-V characteristics of break junctions of polycrystalline high-Tc superconductors. The peculiarities that are typical for SNS junctions are revealed on the I-V curves.
The expression for the I-V curve of a network of weak links (Eq.(2)) was suggested to describe the experimental data. Eq.(2) determines the relation between the I-V curve of the network and the I-V characteristics of typical weak links. The I-V curves of SNS junctions forming the network in the polycrystalline high-Tc superconductors are described by the Kümmel - Gunsenheimer - Nicolsky approach [13, 14]. The multiple Andreev reflections are found to be responsible for the hysteretic and arch-like peculiarities on the I-V curves. The shift of the subharmonic gap structure to higher voltages is explained by the connection of a few SNS junctions in series. We believe that the expression suggested (Eq.(2)) allows one to estimate the number of junctions with nonlinear I-V curves and R > 0 in various simulated networks.
Acknowledgements
We are thankful to R. Kümmel and Yu.S. Gokhfeld for fruitful discussions. This work is supported by the program of the President of the Russian Federation for support of young scientists (grant MK 7414.2006.2), the program of the presidium of the Russian Academy of Sciences "Quantum macrophysics" 3.4, the program of the Siberian Division of the Russian Academy of Sciences 3.4, and the Lavrent'ev competition of young scientist projects (project 52).
References
[1] Zimmermann U, Abens S, Dikin D, Keck K, Wolf T 1996 Physica B 218 205
[2] Svistunov V M, Tarenkov V Yu, Dyachenko A I, Hatta E 2000 JETP Lett. 71 289
[3] Gonnelli R S, Calzolari A, Daghero D, Ummarino G A, Stepanov V A, Giunchi G, Ceresara S, Ripamonti G 2001 Phys. Rev. Lett. 87 097001
[4] Petrov M I, Balaev D A, Gokhfeld D M, Shaikhutdinov K A, Aleksandrov K S 2002 Phys. Solid State 44 1229
[5] Petrov M I, Balaev D A, Gokhfeld D M, Shaikhutdinov K A 2003 Phys. Solid State 45 1219
[6] Petrov M I, Gokhfeld D M, Balaev D A, Shaihutdinov K A, Kümmel R 2004 Physica C 408 620
[7] Gokhfeld D M, Balaev D A, Shaykhutdinov K A, Popkov S I, Petrov M I 2006 Physics of Metals and Metallography 101 (Suppl. 1) S27 (Preprint cond-mat/0410112)
[8] Mamalis A G, Ovchinnikov S G, Petrov M I, Balaev D A, Shaihutdinov K A, Gohfeld D M, Kharlamova S A, Vottea I N 2001 Physica C 364-365 174
[9] Haslinger R, Joynt R 2000 Phys. Rev. B 61 4206
[10] Stauffer D 1979 Physics Reports 54 1
[11] Petrov M I, Balaev D A, Gokhfeld D M 2007 Phys. Solid State 49 619
[12] Likharev K K 1979 Rev. Mod. Phys. 51 101
[13] Kümmel R, Gunsenheimer U, Nicolsky R 1990 Phys. Rev. B 42 3992
[14] Gokhfeld D M 2007 Supercond. Sci. Technol. 20 62 (Preprint cond-mat/0609541)
[15] Gokhfeld D M 2007 Physica C (materials of M2S-HTSC, Preprint cond-mat/0605427)
[16] Petrov M I, Balaev D A, Gohfeld D M, Ospishchev S V, Shaihutdinov K A, Aleksandrov K S 1999 Physica C 314 51
[17] Michotte S, Matefi-Tempfli S, Piraux L, Vodolazov D Y, Peeters F M 2004 Phys. Rev. B 69 094512
[18] Kresin V Z, Wolf S A 1990 Phys. Rev. B 41 4278
[19] Larbalestier D, Gurevich A, Feldmann D M, Polyanskii A 2001 Nature 414 368
[20] Gorkov L P, Kopnin N B 1988 Usp. Fiz. Nauk 156 117 (Sov. Phys. Usp. 31 850)
ABSTRACT The current-voltage ($I$-$V$) characteristics of break junctions of polycrystalline La$_{1.85}$Sr$_{0.15}$CuO$_4$, Y$_{0.75}$Lu$_{0.25}$Ba$_2$Cu$_3$O$_{7-\delta}$, Bi$_{1.8}$Pb$_{0.3}$Sr$_{1.9}$Ca$_2$Cu$_3$O$_x$ and composite YBa$_2$Cu$_3$O$_{7-\delta}$ + Ag are investigated. The experimental $I$-$V$ curves exhibit the specific peculiarities of superconductor/normal-metal/superconductor junctions.
A relation between the $I$-$V$ characteristic of a network of weak links and the $I$-$V$ dependencies of typical weak links is suggested to describe the experimental data. The $I$-$V$ curves of typical weak links are calculated by the K\"{u}mmel - Gunsenheimer - Nicolsky model considering the multiple Andreev reflections. <|endoftext|><|startoftext|> ABSTRACT Photon splitting due to vacuum polarization in a laser field is considered. Using an operator technique, we derive the amplitudes for arbitrary strength, spectral content and polarization of the laser field. The case of a monochromatic circularly polarized laser field is studied in detail and the amplitudes are obtained as three-fold integrals. The asymptotic behavior of the amplitudes for various limits of interest is investigated also in the case of a linearly polarized laser field. Using the obtained results, the possibility of experimental observation of the process is discussed. <|endoftext|><|startoftext|> Introduction:
The group-VB transition metals V, Nb, and Ta crystallize in the body-centered cubic (BCC) structure at ambient pressure and temperature conditions. These metals are of great use due to their high thermal, mechanical and chemical stabilities. Recently these metals have been the subject of numerous experimental and theoretical studies [1-7] in the Mbar pressure region. Nb is known to have the highest superconducting transition temperature (Tc) among elemental solids at ambient pressure [1]. Ta is used in high-pressure experiments as a pressure standard.
Experimentally it is known that V, Nb and Ta remain stable in the BCC structure at least up to a pressure of 150 GPa [2,4]. However, the linear response phonon calculations for V by Suzuki and Otani, using the full potential linear muffin-tin orbital (FP-LMTO) method, showed a softening of the transverse acoustic (TA) phonon mode at ~120 GPa, which eventually becomes imaginary at pressures higher than 130 GPa [5]. The subsequent work of Landa et al., using the exact muffin-tin orbital (EMTO) method, also showed an anomalous pressure behavior of the C44 elastic constant. They found that C44 starts softening above 50 GPa and drops to zero at about 180 GPa. In the pressure range of 180-270 GPa the value of C44 is negative. Above this range, C44 recovers the usual pressure behavior. These findings indicate the development of a structural instability in the BCC lattice. However, these authors attributed this behavior to Fermi-surface nesting, and the possibility of a structural phase transition was not explored [6]. In this paper we report our ab-initio density functional theory based results for V, Nb and Ta under high pressure. For V, a BCC to rhombohedral structural phase transition is predicted at around 60 GPa, and V remains in the rhombohedral structure at least up to 332 GPa. However, at 434 GPa the BCC structure re-appears in our calculations. This behavior is not found in Nb and Ta. But lattice expansion in Nb results in a new minimum around 111° which is lower in energy relative to BCC. By analogy, we expect that such features will appear in Ta at even higher lattice expansions.
Computational method:
The cubic structures, namely the simple cubic (SC), body-centered cubic (BCC) and face-centered cubic (FCC) crystal structures, are related to a rhombohedral structure. The simple cube can be viewed as a rhombohedron with angle (αrhom) equal to 90°.
The primitive cells of BCC and FCC are rhombohedra with angle αrhom equal to 109.47° and 60° respectively [8]. Thus it is possible to generate all these cubic structures from a rhombohedron with an atom at (0,0,0) by varying the angle. Here, we have performed total energy calculations taking a rhombohedral unit cell for V, Nb and Ta as a function of αrhom in the range of 54°-112° at several volumes. All the calculations for V were done using the pseudopotential based PWSCF computer code [9]. The total energy calculations are based on density functional theory and the phonon calculations on density functional perturbation theory [10]. An ultrasoft pseudopotential for V was used with a 40 Ry plane wave energy cut off and a 400 Ry cut off in the expansion of the augmentation charges. Phonon dynamical matrices were computed on a uniform 6x6x6 grid of q-points in the BZ of the BCC structure. This leads to a total of 16 q-points in the irreducible BZ. Using Fourier interpolation, we then constructed and diagonalized the dynamical matrices on a denser grid (24x24x24). The calculations for Ta and Nb were performed using the VASP computer program with PAW pseudopotentials, taking an energy cut off of 400 eV for the plane wave expansion [11-13]. Results were cross checked with both programs at a few volumes for V and Nb. The s and p semicore levels of the respective elements were included in the valence states. For the exchange-correlation term the generalized gradient approximation of Perdew-Burke-Ernzerhof (GGA-PBE) [14] was used. An 18x18x18 k-point mesh for the Brillouin zone (BZ) integration was used for all the total energy calculations. The energy convergence with respect to the computational parameters was carefully examined.
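The relation between the cubic lattices and the rhombohedral angle quoted above can be checked with a few lines of code. The sketch below is only an illustration and not part of the calculations reported here: the function names and the choice of placing the three-fold axis along z are assumptions made for this example. It computes the angle between the standard primitive vectors of SC, BCC and FCC, and builds three equal-length vectors with a prescribed mutual angle αrhom.

```python
import math

# Standard primitive vectors in units of the cubic lattice constant.
PRIMITIVE = {
    "SC": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    "BCC": [(-0.5, 0.5, 0.5), (0.5, -0.5, 0.5), (0.5, 0.5, -0.5)],
    "FCC": [(0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)],
}


def angle_deg(u, v):
    """Angle between two 3-vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


def rhombohedral_angle(vectors):
    """Common angle between the three cell vectors (they must all agree)."""
    a1, a2, a3 = vectors
    angles = [angle_deg(a1, a2), angle_deg(a2, a3), angle_deg(a3, a1)]
    if max(angles) - min(angles) > 1e-9:
        raise ValueError("cell is not rhombohedral")
    return angles[0]


def rhombohedral_cell(a, alpha_deg):
    """Three vectors of length a with mutual angle alpha (< 120 degrees).

    The three-fold axis is placed along z (an arbitrary choice); the
    z-component c and in-plane radius s follow from
    c^2 - s^2 / 2 = cos(alpha) with c^2 + s^2 = 1.
    """
    cos_alpha = math.cos(math.radians(alpha_deg))
    c = math.sqrt((1.0 + 2.0 * cos_alpha) / 3.0)
    s = math.sqrt(2.0 * (1.0 - cos_alpha) / 3.0)
    return [
        (a * s * math.cos(phi), a * s * math.sin(phi), a * c)
        for phi in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
    ]


if __name__ == "__main__":
    for name, vectors in PRIMITIVE.items():
        print(name, round(rhombohedral_angle(vectors), 2))
```

Scanning `alpha_deg` in `rhombohedral_cell` at fixed cell volume corresponds to the kind of angle sweep described in the text; keeping the volume fixed would additionally require rescaling `a` for each angle, which this sketch does not do.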
Results and discussion:
The zero-pressure and zero-temperature results for the lattice constant a0, the bulk modulus B0, and the first pressure derivative of the bulk modulus B' for V, Ta, and Nb are obtained by fitting the total energy versus lattice constant data to a fourth order polynomial. The calculated equilibrium lattice constants are in good agreement with the experimental values (within an error of 1%). The bulk modulus and its first pressure derivative are also in good agreement with the existing experimental data (see Table-1 for comparison). The calculated pressure-volume relations match well with the existing experimental data, as shown in Fig.1. This justifies the use of the respective pseudopotentials and other computational parameters.

Table-1: The experimental data shown in brackets with the symbol (*) are from Takemura [2] and those with the symbol (+) are from Dewaele et al. [7].
Element   Lattice constant a0 (Å)   Bulk modulus B0 (GPa)   B'
V         2.998 (3.00)*             182 (165,188)*          3.70 (3.5,4.04)*
Nb        3.309 (3.30)*             172 (153,168)*          3.40 (3.9,2.2)*
Ta        3.320 (3.30)+             208 (194,)+             3.13 (3.25)+

Fig.2 depicts our calculated total energy as a function of the angle αrhom for V at the equilibrium volume. SC is at the absolute maximum and FCC is at a local maximum, indicating the unstable nature of these structures under a rhombohedral distortion. BCC is at the absolute minimum of the total energy curve, as expected. However, there are two local minima around the angles 56° and 68° (see the inset of Fig.2). To the best of our knowledge, nobody has previously shown that the FCC and SC structures of V are mechanically unstable. The energy variation in the range of angles 106°-111° is about 4.0 mRy per atom (see Fig.3a). Similar calculations were performed at many volumes covering pressures of more than 400 GPa. Figs.3(a, b) show the total energy variation as a function of the angle αrhom in the range of 106°-112° at several pressures. The reason for showing the energy variation only in this range of angles is two fold.
Firstly, BCC has the lowest energy among all the structures, and secondly, no extra features appear around FCC except that the local minima become more pronounced and shift by a few degrees. In particular, at 112 GPa the minimum around 68° shifts to 70° and the minimum around 56° shifts to 57°. At this pressure the 70° minimum is lower in energy by 7 mRy per atom relative to the 57° minimum and by around 12 mRy per atom relative to FCC. However, SC remains at the absolute maximum of the total energy curve up to 400 GPa, ruling out the BCC to SC transition predicted earlier [15]. Below 60 GPa the behavior of the energy curves remains unchanged. However, at 60 GPa a new minimum appears at an angle of 110.50° (say, Min1), leading to a structural phase transition. The BCC energy is now higher by 0.05 mRy per atom relative to Min1. A further increase in pressure leads to the development of another minimum at an angle of 108.50° (say, Min2), and BCC then lies on a local maximum. At 112 GPa the energy difference between Min1 and Min2 is about -0.057 mRy per atom and that between Min1 and BCC is -0.10 mRy per atom. Thus BCC-V becomes mechanically unstable toward a rhombohedral distortion, which can lead to softening of the C44 elastic constant, since C44 is related to trigonal shear. In fact, Landa et al. [6] predicted C44 softening in the same pressure region. It should be noted that in our work the BCC to rhombohedral structural phase transition occurs at a much lower pressure (~60 GPa). These energy changes persist even with a denser mesh for the BZ integration (e.g., 28x28x28 k-points) and with larger plane-wave energy cutoffs (e.g., 65 Ry). This confirms that the feature is real and not an artifact of the calculation. In fact, as these calculations were performed with the same rhombohedral cell for all values of the angle, inaccuracies due to computational parameters will be the same.
At 112 GPa the calculations were also repeated with VASP using PAW pseudopotentials [11-13], and the results were unchanged. In our calculations the frequency of the TA mode becomes imaginary at some q-values along the Γ-Η direction of the BZ (Fig. 3c) at 112 GPa, which is lower than the ~130 GPa reported by Suzuki and Otani [5]. This mismatch in pressure may be due to differences in the computational techniques and in the approximations used for the exchange-correlation term. The total energy behavior between angles 106°–112° at 160 GPa is similar to that at 112 GPa, except that Min2 now becomes lower in energy than Min1 by 0.20 mRy per atom, leading to an iso-structural phase transition. Min2 is now lower in energy by 0.14 mRy per atom relative to BCC. A further increase in pressure leads to the disappearance of Min1, and the energy difference between Min2 and Min1 at 240 GPa is around -0.55 mRy per atom. However, the energy difference between BCC and Min1 is 0.14 mRy per atom. At 332 GPa, Min2 is lower in energy by 0.019 mRy per atom relative to the energy at 109.47° (BCC). A further increase in pressure leads to the disappearance of Min2 as well, and at 434 GPa there is only one minimum, located at 109.47°. In our phonon calculations at 160 GPa the TA mode frequency becomes positive again, and a further increase in pressure leads to the usual behavior, i.e., hardening with pressure [Fig. 3(c)]. The calculations for pressures higher than 300 GPa were checked with the VASP code using PAW pseudopotentials including the nonlinear core corrections, which are supposed to take care of the core-core overlap at higher pressures. Our results remain unchanged. We have also performed total energy calculations for the hexagonal close-packed (HCP) lattice at the optimized c/a axial ratio (1.829) at two pressures, ambient and 332 GPa, and its energy is always found to be higher than that of BCC.
The energy difference between BCC and HCP increases with pressure (from 19.6 mRy/atom at ambient pressure to 40 mRy/atom at 332 GPa), which rules out the BCC-HCP transition. It should be noted that Ding et al. claimed the observation of a BCC to rhombohedral transition near 69 GPa [17], which supports our predictions. Similar calculations were repeated for Nb and Ta, and the total energy variations with αrhom at several pressures are shown in Figs. 4(a,b). No new features like those in V appear in the total energy curves under pressure, and thus no phonon calculations were attempted. These results are in agreement with earlier elastic constant and phonon calculations [3,6,16]. The only known anomaly in C44 for these two elements is the change of its slope with pressure, which occurs at 40 GPa for Nb [6] and in the 100 GPa region for Ta [16]. However, lattice expansion in Nb results in a new minimum around 111°, which is lower in energy than BCC by 11 meV per atom at 6.7% lattice expansion. By analogy, we expect that such features will appear in Ta at even larger lattice expansions. Conclusion: In conclusion, the high-pressure structural behavior of the BCC metals of group VB has been investigated using density functional theory. Calculations for V show the onset of a phonon softening and a rhombohedral instability at 60 GPa, which may lead to the C44 elastic constant softening predicted earlier [6] in the same pressure region. Thus, we predict a BCC to rhombohedral structural phase transition (αrhom=110.5°) at around 60 GPa in V. V remains in the rhombohedral structure at least up to 330 GPa; however, its angle changes to 108.50° at 160 GPa, resulting in an iso-structural phase transition. Finally, at around 434 GPa, it transforms to the BCC structure again. This behavior is not found in Nb and Ta. References: [1] V. V. Struzhkin, Y. A. Timofeev, R. J. Hemley and H. K. Mao, Phys. Rev. Lett. 79, 4262 (1997). [2] K.
Takemura, ‘Proc. Int. Conf. on High Pressure Science and Technology, AIRAPT-17’ (Honolulu, 1999), (Sci. Technol. High Pressure vol 1) ed. M. H. Manghnani, W. J. Nellis and M. F. Nicol (India: University Press) page 443 (2000).
[3] J. S. Tse, Z. Li, K. Uehara, Y. Ma and R. Ahuja, Phys. Rev. B 69, 132101 (2004).
[4] H. Cynn and C. S. Yoo, Phys. Rev. B 59, 8526 (1999).
[5] N. Suzuki and M. Otani, J. Phys.: Condens. Matter 14, 10869 (2002).
[6] A. Landa, J. Klepeis, P. Soderlind, I. Naumov, O. Velikokhatnyi, L. Vitos and A. Ruban, J. Phys.: Condens. Matter 18, 5079 (2006).
[7] A. Dewaele, P. Loubeyre and M. Mezouar, Phys. Rev. B 70, 094112 (2004).
[8] C. Kittel, ‘Introduction to Solid State Physics’ Fifth Edition (1993).
[9] Website, http://www.pwscf.org
[10] S. Baroni, S. de Gironcoli, A. D. Corso and P. Giannozzi, Rev. Mod. Phys. 73, 515 (2001).
[11] G. Kresse and J. Furthmuller, Phys. Rev. B 54, 11169 (1996).
[12] G. Kresse and J. Hafner, Phys. Rev. B 47, 558 (1993).
[13] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999).
[14] J. P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996).
[15] C. Nirmala Louis and K. Iyakutti, Phys. Rev. B 67, 094509 (2003).
[16] O. Gulseren and R. E. Cohen, Phys. Rev. B 65, 064103 (2002).
[17] Ding et al., Phys. Rev. Lett. 98, 085502 (2007).
Fig. 1: The calculated P-V relations (pressure in GPa) are shown with solid lines. The experimental data for V and Nb are from Ref. [2]. The experimental data for Ta shown by the symbol (Ο) are from Ref. [4] and the others are from Ref. [7]. The V0’s are the respective theoretical and experimental equilibrium volumes.
Fig. 2: Total energy variation with αrhom (in degrees) for V at P ~ 0 GPa, at a volume of 13.48 Å3 per atom. The BCC and SC angles are marked, and the inset shows the range 50°–75°.
Fig. 3(a,b): The total energy variation with angle αrhom (106°–112°) at different pressures for V. All the volumes are in units of Å3 per atom. Panel (a): P ~ 0 GPa (Vol = 13.48), P = 42 GPa (Vol = 11.36), P = 60 GPa (Vol = 10.37), P = 112 GPa (Vol = 9.64). Panel (b): P = 160 GPa (Vol = 8.89), P = 240 GPa (Vol = 8.02), P = 332 GPa (Vol = 7.41), P = 434 GPa (Vol = 6.67).
Fig. 3c: The pressure variation of the phonon mode frequencies of V along the Γ-Η direction of the BZ, at P = 60, 112, 160 and 332 GPa.
Fig. 4 (a,b): The calculated total energy variation with αrhom (106°–112°) at several volumes for Nb and Ta. The volumes given in the figures are in units of Å3 per atom.
ABSTRACT Results of first-principles calculations are presented for the group-VB metals V, Nb and Ta up to a couple of megabars of pressure. A unique structural phase transition sequence BCC --> (at 60 GPa) rhombohedral (angle = 110.5 degrees) --> (at ~160 GPa) rhombohedral (angle = 108.5 degrees) --> (at ~430 GPa) BCC is predicted in V. We also find that BCC-V becomes mechanically and vibrationally unstable at around 112 GPa pressure.
Similar transitions are absent in Nb and Ta. <|endoftext|><|startoftext|> Introduction The Standard Model (SM) of electroweak interactions has been successful. It can explain all experimental results except for neutrino oscillation phenomena. Masses of quarks and leptons are generated through the Yukawa interaction after electroweak symmetry breaking. However, no principle has been established to determine the flavor structure of the Yukawa couplings, and the origin of the fermion mass hierarchy remains unknown. There have been many attempts to explain the flavor structure of Yukawa couplings. A promising approach is the idea of flavor symmetry. In models based on the Froggatt–Nielsen (FN) mechanism[1], a U(1) global symmetry is imposed as a flavor symmetry, in which the vacuum expectation value (VEV) of an iso-singlet scalar field (FN field) gives a power-like structure of Yukawa couplings due to the U(1) charge assignment for the relevant fields. Extensions to more complicated flavor symmetries have also been studied, e.g., non-Abelian global symmetries such as U(2)[2] and discrete symmetries such as S3[3], A4[4], D5[5], etc. They have several distinct patterns for the symmetry breaking, and the difference in VEVs for each scalar field gives a hierarchical structure of the Yukawa matrix. In most of such models, only the orders of magnitude of the Yukawa matrix elements are estimated, so that O(1) uncertainties exist in coefficients of the coupling constants between the scalar and matter fields. In this framework, CP violation (CPV) comes from complex phases in these coefficients.
∗ E-mail: kanemu@sci.u-toyama.ac.jp
† E-mail: matsuda@mail.tsinghua.edu.cn
‡ E-mail: toshi@mpi-hd.mpg.de
§ E-mail: petcov@sissa.it
¶ E-mail: shindou@sissa.it
‖ E-mail: takasugi@het.phys.sci.osaka-u.ac.jp
∗∗ E-mail: ko2@het.phys.sci.osaka-u.ac.jp, Address after April 2007, Theory Group, KEK
http://arxiv.org/abs/0704.0697v1
On the other hand, some kinds of textures, such as a democratic structure[6], have been investigated for the Yukawa matrix. In the model with the democratic structure, all the elements of the Yukawa matrix are assumed to have the same value at leading order, and the mass hierarchy and flavor mixing are obtained by diagonalizing these Yukawa matrices. CPV is supposed to appear as a consequence of the complex nature of the small terms that break the democratic structure. In any case the CP violating phase comes from complex phases of free parameters, so that it is not predictable. Since both the flavor mixing and the CP violating phase are determined by the Yukawa matrices, it would be natural to consider that they arise through the same mechanism, one that is relevant to the Yukawa interaction. In the scenario of spontaneous CPV[7], the phase is deduced from the relative complex phase between VEVs of the scalar fields. Combining the spontaneous CPV scenario with the idea of flavor symmetry, one can obtain a non-zero complex phase in the Yukawa matrix from the VEVs of scalar fields. This idea has been developed in several flavor models, e.g., a model with three U(1) scalar fields[8], spontaneous CPV in non-Abelian flavor symmetry[9], an SO(10) model with complex VEVs of the Higgs field[10], etc. In this paper, we introduce a simple model where the FN mechanism works with a democratic Yukawa structure between quarks and FN fields, and show how CPV can be obtained. As mentioned in [8], this type of model requires at least two FN fields for a successful prediction of the physical CP violating phase. This paper is organized as follows. In Sec.II, we study the generation of CPV for quarks based on the FN mechanism with the democratic ansatz. In Sec.III, simple models with two FN fields are discussed. We present analytic expressions for the CKM parameters, and numerical evaluations are also shown. Conclusions are given in Sec.IV.
2 CP violation in democratic models The democratic ansatz for the flavor structure of the Yukawa matrix has been implemented in Refs. [6, 13, 14, 15]. In this framework, the Yukawa matrices for the up- and down-type quarks are simply written as
$$ Y_{u,d} \propto \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} . \qquad (1) $$
This flavor structure can be constructed by models with the S3R × S3L permutation symmetry[13]. The symmetry can be realized in the geometrical origin of the brane-world scenario[14], and also in the strong dynamics[15]. Two of the three eigenvalues of the matrices in Eq. (1) are zero, and no CP violating phase appears in the S3R × S3L limit (footnote 1). It is clear that in order to explain the experimental data this permutation symmetry must be broken by some small effects. When the small breaking terms for the permutation and CP symmetries are introduced by hand, the mass splitting between the 1st- and 2nd-generation quarks, the mixing angles and the CP violating phase are explained.
Footnote 1: While keeping the S3R × S3L symmetry, the complex Yukawa matrices Yu,d in Eq. (1) can be re-expressed, for example, by an appropriate unitary transformation as
$$ Y_{u,d} \propto \begin{pmatrix} \omega & \omega^2 & 1 \\ \omega^2 & 1 & \omega \\ 1 & \omega & \omega^2 \end{pmatrix} , \qquad (2) $$
where $\omega = e^{i2\pi/3}$ is a cube root of unity. However, the complex phase in this matrix is unphysical, because it is rephased out by the redefinition of quark fields.
The FN mechanism is a simple idea to generate the mass hierarchy of quarks and leptons. In the simplest FN model[1], an iso-singlet scalar field, Θ, is introduced with the U(1)FN flavor symmetry in order to discriminate the fermion flavor by the U(1)FN charge. The U(1)FN charge assigned to Θ is taken to be fΘ = −1 without loss of generality.
Under the U(1)FN symmetry, non-renormalizable interactions relevant to the quark mass matrix can be written as
$$ \mathcal{L}_{FN} = -(C_U)_{ij}\,\bar{U}_i Q_j\cdot H_u \left(\frac{\Theta}{\Lambda}\right)^{f_{U^c_i}+f_{Q_j}+f_{H_u}} - (C_D)_{ij}\,\bar{D}_i Q_j\cdot H_d \left(\frac{\Theta}{\Lambda}\right)^{f_{D^c_i}+f_{Q_j}+f_{H_d}} , \qquad (3) $$
where Hu and Hd are iso-doublet fields (the Higgs fields) with hypercharge −1/2 and +1/2, respectively (footnote 2), Qi is the left-handed quark doublet, Ui and Di are the right-handed up- and down-type quarks in the i-th generation, and CU and CD are coupling constants of order one. The U(1)FN charge of the field X is denoted by fX. The cut-off scale is given by Λ, which describes the mass scale of new physics dynamics. The coefficients (CU)ij and (CD)ij are generally complex numbers. After U(1)FN is broken by the VEV of Θ,
$$ \langle\Theta\rangle = \lambda\Lambda , \qquad (4) $$
where λ is a small dimensionless parameter, the quark Yukawa matrices are obtained as
$$ (Y_{U,D})_{ij} = (C_{U,D})_{ij}\,\lambda^{f_{U^c_i,D^c_i}+f_{Q_j}+f_{H_{u,d}}} . \qquad (5) $$
With the assignment of U(1)FN charges[16] as
$$ (f_{Q_1}, f_{Q_2}, f_{Q_3}) = (f_{U^c_1}, f_{U^c_2}, f_{U^c_3}) = (3, 2, 0) , \quad (f_{D^c_1}, f_{D^c_2}, f_{D^c_3}) = (2, 1, 1) , \quad (f_{H_u}, f_{H_d}) = (0, 0) , \qquad (6) $$
the observed quark mass hierarchy and the CKM mixings can be derived by assuming λ to be close to the Cabibbo angle, sin θc = 0.22. At leading order, the induced masses for quarks are mu ∼ λ^6〈Hu〉, mc ∼ λ^4〈Hu〉, mt ∼ 〈Hu〉, md ∼ λ^5〈Hd〉, ms ∼ λ^3〈Hd〉 and mb ∼ λ〈Hd〉, and the CKM matrix is given by
$$ U_{CKM} \sim \begin{pmatrix} 1 & \lambda & \lambda^3 \\ \lambda & 1 & \lambda^2 \\ \lambda^3 & \lambda^2 & 1 \end{pmatrix} . \qquad (7) $$
Now we consider the possibility of spontaneous CPV due to the complex phase of 〈Θ〉. We assume that CU and CD in Eq. (3) have the democratic structure, i.e.,
$$ C_{U,D} = \alpha_{U,D} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} . \qquad (8) $$
There is no CPV in the model with only one FN field. Although complex phases may be obtained in the mass matrices by introducing a complex VEV of Θ, such phases are rotated away by the phase redefinition of quark fields. Hence the model should have at least two FN fields Θ1,2 in order to obtain a non-vanishing CP violating phase.
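The single-field statements above can be illustrated numerically. The sketch below is our own illustration, not taken from the paper: it builds the exponent matrices from the charge assignment of Eq. (6), forms the Yukawa matrices of Eq. (5) with democratic coefficients (αU,D set to 1), and checks that a complex phase of 〈Θ〉 can be removed by rephasing the quark fields. The phase value `theta` and all function and variable names are hypothetical.

```python
import numpy as np

# U(1)_FN charges from Eq. (6); the Higgs charges are zero.
f_Q = np.array([3, 2, 0])
f_Uc = np.array([3, 2, 0])
f_Dc = np.array([2, 1, 1])

lam = 0.22    # close to the Cabibbo angle
theta = 0.7   # hypothetical phase of <Theta>/Lambda, chosen only for illustration

# Exponents n_ij = f_{right-handed, i} + f_{Q, j}
n_U = f_Uc[:, None] + f_Q[None, :]
n_D = f_Dc[:, None] + f_Q[None, :]

# Democratic coefficients (all ones) times (<Theta>/Lambda)**n_ij
vev = lam * np.exp(1j * theta)
Y_U = vev ** n_U
Y_D = vev ** n_D

# With a single FN field and democratic coefficients, Y is an outer product.
rank_U = np.linalg.matrix_rank(Y_U, tol=1e-12)
rank_D = np.linalg.matrix_rank(Y_D, tol=1e-12)


def rephase(Y, f_right, f_left, phase):
    """Absorb the phase exp(i*phase*n_ij) into right- and left-handed fields."""
    right = np.exp(-1j * phase * f_right)[:, None]
    left = np.exp(-1j * phase * f_left)[None, :]
    return Y * right * left


if __name__ == "__main__":
    print(n_U)
    print(n_D)
    print(rank_U, rank_D)
    print(np.abs(rephase(Y_U, f_Uc, f_Q, theta).imag).max())
```

The diagonal exponents (6, 4, 0) and (5, 3, 1) reproduce the leading-order mass hierarchy quoted above, and the rephased matrices are real, which is the sense in which a single complex VEV carries no physical CP phase.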
We start from the following Lagrangian with two FN fields,
$$ \mathcal{L}_{FN2} = -\bar{U}_i (C_U)_{ij} Q_j\cdot H_u \sum_{n_1,n_2} \left(\frac{\Theta_1}{\Lambda}\right)^{n_1} \left(\frac{\Theta_2}{\Lambda}\right)^{n_2} - \bar{D}_i (C_D)_{ij} Q_j\cdot H_d \sum_{n_1,n_2} \left(\frac{\Theta_1}{\Lambda}\right)^{n_1} \left(\frac{\Theta_2}{\Lambda}\right)^{n_2} , \qquad (9) $$
where $n_1$ and $n_2$ run from zero to $n^{U,D}_{ij} \equiv f_{U^c_i,D^c_i}+f_{Q_j}+f_{H_{u,d}}$, subject to the constraint $n_1 + n_2 = n^{U,D}_{ij}$. After the U(1)FN symmetry is broken, the Yukawa couplings in the SM are given by
$$ (Y_{U,D})_{ij} = (C_{U,D})_{ij}\,\lambda^{n^{U,D}_{ij}} \sum_{k=0}^{n^{U,D}_{ij}} R^k , \qquad (10) $$
where
$$ \lambda = \frac{\langle\Theta_1\rangle}{\Lambda} , \quad R = \frac{\langle\Theta_2\rangle}{\langle\Theta_1\rangle} \equiv |R|e^{i\alpha} . \qquad (11) $$
Therefore, a physical CP violating phase can be obtained from the relative phase α between 〈Θ1〉 and 〈Θ2〉.
Footnote 2: In the SM, $H_d = i\sigma_2 H_u^*$ is satisfied.
3 Examples for the model with two Froggatt-Nielsen fields In this section, we show how to generate CPV from two FN fields by considering simple models. In order to generate the quark mass hierarchy, we employ the U(1)FN charge assignment for matter fields given in Eq. (6). In general, the U(1)FN charges for the FN fields Θ1 and Θ2 can be different from each other. Here we assume for simplicity that both have the same U(1)FN charge: (fΘ1, fΘ2) = (−1, −1).
(a) The simplest toy model First of all, we discuss the naive model defined in Eq. (9). The mass matrices Mu,d for up- and down-type quarks are given by
$$ (M_u)_{ij} = \alpha_U\langle H_u\rangle\,\lambda^{n^U_{ij}} A_{n^U_{ij}} , \quad (M_d)_{ij} = \alpha_D\langle H_d\rangle\,\lambda^{n^D_{ij}} A_{n^D_{ij}} , \qquad (12) $$
where $A_n = \sum_{k=0}^{n} R^k$. This model predicts $m_s^2/m_b^2 = O(\lambda^6)$ and |Vus| = O(λ). Only one of the two experimental values can be adjusted. Furthermore, each mass matrix gives one massless eigenstate because of the conditions det Mu = det Md = 0 (footnote 3). In order to avoid these difficulties, the following possibilities can be considered: (i) introducing an additional symmetry, (ii) throwing away the democratic ansatz given in Eq. (8), etc. In the next model, we explore the possibility of keeping the democratic structure for CU,D.
(b) The extended models with the Z2 symmetry We now try to construct a more realistic model. The U(1)FN charges are assigned again as in Eq. (6). In order to obtain the observed value of $m_s^2/m_b^2$ by setting λ ∼ sin θc, we impose the Z2 symmetry under the transformation Θ1 → Θ1 and Θ2 → −Θ2.
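The massless eigenstate of the toy model in (a) can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it reads the toy-model mass matrix entries as λ^n A_n with A_n the geometric sum of R^k from k = 0 to n, sets the overall factors αU〈Hu〉 and αD〈Hd〉 to 1, and uses arbitrary sample values for λ and R. All names are hypothetical.

```python
import numpy as np

f_Q = np.array([3, 2, 0])
f_Uc = np.array([3, 2, 0])
f_Dc = np.array([2, 1, 1])


def A(n, R):
    """Geometric sum A_n = sum_{k=0}^{n} R**k."""
    return sum(R ** k for k in range(int(n) + 1))


def toy_mass_matrix(f_right, f_left, lam, R):
    """Entries lam**n_ij * A_{n_ij}, with n_ij = f_right[i] + f_left[j].

    Assumption: overall factors alpha * <H> are set to 1.
    """
    n = np.add.outer(f_right, f_left)
    M = np.empty(n.shape, dtype=complex)
    for (i, j), n_ij in np.ndenumerate(n):
        M[i, j] = lam ** int(n_ij) * A(n_ij, R)
    return M


# Sample values chosen only for illustration.
lam = 0.22
R = 1.2 * np.exp(0.4j)

M_u = toy_mass_matrix(f_Uc, f_Q, lam, R)
M_d = toy_mass_matrix(f_Dc, f_Q, lam, R)

if __name__ == "__main__":
    for name, M in [("M_u", M_u), ("M_d", M_d)]:
        singular_values = np.linalg.svd(M, compute_uv=False)
        print(name, abs(np.linalg.det(M)), singular_values)
```

Each entry can be rewritten as (a^(n+1) − b^(n+1))/(a − b) with a = λ and b = λR, so each matrix is a sum of two rank-one matrices; this is why the determinant vanishes and the lightest quark stays massless for any λ and R in this reading of the model.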
For the Z2 parity assignment of the quark fields, there are many choices. If we consider a scenario associated with a grand unified theory (GUT), it would be natural that the Z2 parity for $Q_i$ and $U^c_i$ is common and that for Hu is set to be + for the prediction of a large top-quark mass. Because Hd always couples to $D^c_i$, we can set the Z2 parity for Hd to be + without loss of generality. Then, there are 64 possibilities for the parity assignment of the quarks. However, it turns out that most of them cannot give correct values of $m_c^2/m_t^2$ and $m_s^2/m_b^2$. Consequently, it is enough to discuss only the following sets of Z2 parity assignments:
• Type I-a, $((Q_1, U^c_1), (Q_2, U^c_2), (Q_3, U^c_3)) = (+,+,−)$, $(D^c_1, D^c_2, D^c_3) = (+,+,−)$. (13)
• Type I-b, $((Q_1, U^c_1), (Q_2, U^c_2), (Q_3, U^c_3)) = (+,+,−)$, $(D^c_1, D^c_2, D^c_3) = (−,+,−)$. (14)
• Type II-a, $((Q_1, U^c_1), (Q_2, U^c_2), (Q_3, U^c_3)) = (+,−,+)$, $(D^c_1, D^c_2, D^c_3) = (+,−,+)$. (15)
• Type II-b, $((Q_1, U^c_1), (Q_2, U^c_2), (Q_3, U^c_3)) = (+,−,+)$, $(D^c_1, D^c_2, D^c_3) = (−,−,+)$. (16)
Footnote 3: Even when the u- and d-quarks are massless at the electroweak scale, their finite masses would be generated at lower energy scales due to the strong dynamics[17].
Type I-a gives the mass matrices as
$$ M_u(\text{I-a}) = \alpha_U\langle H_u\rangle \begin{pmatrix} B_6\lambda^6 & B_4\lambda^5 & RB_2\lambda^3 \\ B_4\lambda^5 & B_4\lambda^4 & R\lambda^2 \\ RB_2\lambda^3 & R\lambda^2 & 1 \end{pmatrix} , \quad M_d(\text{I-a}) = \alpha_D\langle H_d\rangle \begin{pmatrix} B_4\lambda^5 & B_4\lambda^4 & R\lambda^2 \\ B_4\lambda^4 & B_2\lambda^3 & R\lambda \\ RB_2\lambda^4 & RB_2\lambda^3 & \lambda \end{pmatrix} , \qquad (17) $$
where $B_{2n} = \sum_{k=0}^{n} R^{2k}$. Diagonalizing the above matrices, we obtain the mass ratios $m_c^2/m_t^2$ and $m_s^2/m_b^2$, the CKM mixing angles (absolute values of CKM matrix elements), and the Kobayashi-Maskawa phase $\phi_3 \equiv \arg(V^*_{ub}V_{ud}/V^*_{cb}V_{cd})$ at leading order as
$$ \frac{m_c^2}{m_t^2} = |1+R^4|^2\lambda^8 , \quad \frac{m_s^2}{m_b^2} = \frac{|1-R^4|^2}{(1+|R|^2)^2}\lambda^4 , $$
|Vus| = |R|8 + |R|−8 − 2(2 cos2 4α− 1) |Vub| = |R|||R| − |R|−1| (|R|+ |R|−1) |R|4 + |R|−4 + 2 cos 4α |Vcb| = |R|2 + |R|−2 + 2 cos 4α |R|+ |R|−1 λ2 , (18) φ1 = arg |R|4 + 1 + (|R|2 + 1) cos 4α− i(|R|2 − 1) sin 4α φ2 = arg (1− |R|2) − |R|4 + 2i sin 4α φ3 = arg (1− |R|2) |R|4 − + (|R|2 − 1) cos 4α+ i(|R|2 + 1) sin 4α
Let us discuss appropriate values of |R|, cos 4α and sin 4α. We first expect that λ ∼ sin θc.
Then, |R| > 1 is needed to obtain a reasonable value of |Vub|. However, |R| cannot be much greater than unity, because $m_c^2/m_t^2$ would then exceed the experimentally acceptable value. In addition, cos 4α < 0 is necessary for |R| > 1 to explain the data on |Vcb|. Finally, sin 4α < 0 is required for φ1 to be in the first quadrant. In this case, however, it turns out that φ3 cannot be in the first quadrant simultaneously. For numerical evaluation, we take $|R| = \sqrt{3/2}$, cos 4α = −3/4 and $\sin 4\alpha = -\sqrt{7}/4$. This parameter set determines λ = 0.25 from the experimental value of |Vus| (= 0.22). The matrices Mu,d(I-a) are diagonalized, and we obtain |Vub| = 0.0028, |Vcb| = 0.032, (19) which are in excellent agreement with the CKM mixing angles at the GUT scale, |Vcb| = 0.029–0.039 and |Vub| = 0.0024–0.0038, which are evaluated by the renormalization group method from the experimental data at low energies[18]. However, the quark mass ratios are predicted as mc/mt = 0.0061, ms/mb = 0.073, (20) which are about twice as large as the expected values at the GUT scale, mc/mt ∼ 0.0032 ± 0.0007 and ms/mb = 0.036 ± 0.005. Finally, we find φ1 = 12°, φ2 = 31°, φ3 = 137°. (21) Although the predictions cannot explain all the data simultaneously, it is remarkable that this model can reproduce most of them to a considerable extent. We also find that for Type I-b, the mixing angles, the mass ratios between the 2nd and 3rd generations, and the CKM phase are completely the same at leading order as in Eq. (18). For Type II-a, the mass matrices are
$$ M_u(\text{II-a}) = \alpha_U\langle H_u\rangle \begin{pmatrix} B_6\lambda^6 & RB_4\lambda^5 & B_2\lambda^3 \\ RB_4\lambda^5 & B_4\lambda^4 & R\lambda^2 \\ B_2\lambda^3 & R\lambda^2 & 1 \end{pmatrix} , \quad M_d(\text{II-a}) = \alpha_D\langle H_d\rangle \begin{pmatrix} B_4\lambda^5 & RB_2\lambda^4 & B_2\lambda^2 \\ RB_2\lambda^4 & B_2\lambda^3 & R\lambda \\ B_4\lambda^4 & RB_2\lambda^3 & \lambda \end{pmatrix} . \qquad (22) $$
The resulting mass ratios, the CKM parameters and the phases φ1, φ2 and φ3 are the same as those in Type I except for |Vus| and |Vub|, which are |Vus| = |R|8 + |R|−8 − 2(2 cos2 4α− 1) |Vub| = |R|2||R| − |R|−1| (|R|+ |R|−1) |R|4 + |R|−4 + 2 cos 4α λ3 . (23) These expressions differ from those in Type I by the multiplicative factor |R|.
In this case, we take $|R| = \sqrt{3/2}$ and cos 4α = −1/2 in order to compensate for the effect of the extra factor |R| in |Vus| in comparison with that in Type I. In addition, we take $\sin 4\alpha = -\sqrt{3}/2$ and λ = 0.23. We obtain the following results: |Vus| = 0.22, |Vub| = 0.0038, |Vcb| = 0.035, mc/mt = 0.0060, ms/mb = 0.059, φ1 = 18°, φ2 = 38°, φ3 = 123°. (24) Although ms/mb becomes smaller than that of Type I, it is still too large to be phenomenologically acceptable. Moreover, mc/mt remains two times greater than the expected value, and φ3 is in the 2nd quadrant. Type II-b gives almost the same results for the CKM parameters and the mass ratios between the 2nd and 3rd generations.
4 Conclusion We have studied the possibility of incorporating CPV by using the FN mechanism in the context of democratic flavor FN couplings. We have considered models with two FN fields, in which the relative phase of their VEVs acts as the origin of the CP violating phase at low energies. In the scenario with the Z2 symmetry, the relationship among the ratios of quark masses, the absolute values of CKM matrix elements and the CP violating phase has been examined in several of the simplest models. We have found that the predictions are in good agreement with most of the data. However, the CP violating phase φ3 is predicted to be around 130°, so that the models we have examined are not acceptable. It may be an oversimplification to assume flavor-blind couplings. We expect that a small modification of the democratic assumption would cure this phenomenological problem. We have demonstrated how to introduce CPV into the FN model and have shown that the scenario with two FN fields would be promising. An application of our scenario to the lepton sector including neutrinos is under way and will appear in our future publications. If this is successfully achieved, we would obtain a model which gives a unified description of CP phases for quarks and leptons. Acknowledgements. S.
K. was supported, in part, by the Grant-in-Aid of the Ministry of Education, Culture, Sports, Science and Technology, Government of Japan, Grant Nos. 17043008 and 18034004. S. P. and T. S. were supported in part by the Italian MIUR and INFN under the programs “Fisica Astroparticellare”.
References
[1] C. D. Froggatt and H. B. Nielsen, Nucl. Phys. B 147 (1979) 277.
[2] A. Pomarol and D. Tommasini, Nucl. Phys. B 466 (1996) 3; R. Barbieri, G. R. Dvali and L. J. Hall, Phys. Lett. B 377 (1996) 76; R. Barbieri and L. J. Hall, Nuovo Cim. A 110 (1997) 1; R. Barbieri, L. Giusti, L. J. Hall and A. Romanino, Nucl. Phys. B 550 (1999) 32; R. Barbieri, L. J. Hall, S. Raby and A. Romanino, Nucl. Phys. B 493 (1997) 3; R. Barbieri, L. J. Hall and A. Romanino, Phys. Lett. B 401 (1997) 47.
[3] S. Pakvasa and H. Sugawara, Phys. Lett. B 73 (1978) 61; H. Harari, H. Haut and J. Weyers, Phys. Lett. B 78 (1978) 459; E. Derman, Phys. Rev. D 19 (1979) 317; E. Ma, Phys. Rev. D 43 (1991) 2761; Phys. Rev. D 61 (2000) 033012; L. J. Hall and H. Murayama, Phys. Rev. Lett. 75 (1995) 3985.
[4] D. Wyler, Phys. Rev. D 19 (1979) 3369; E. Ma and G. Rajasekaran, Phys. Rev. D 64 (2001) 113012; K. S. Babu, E. Ma and J. W. F. Valle, Phys. Lett. B 552 (2003) 207.
[5] C. Hagedorn, M. Lindner and F. Plentinger, Phys. Rev. D 74 (2006) 025007.
[6] H. Fritzsch, Nucl. Phys. B 155 (1979) 189; Y. Koide, Phys. Rev. D 28 (1983) 252; Phys. Rev. D 39 (1989) 1391; H. Fritzsch and Z. Z. Xing, Phys. Lett. B 372 (1996) 265; Phys. Lett. B 440 (1998) 313; Phys. Rev. D 61 (2000) 073016; Phys. Lett. B 598 (2004) 237.
[7] T. D. Lee, Phys. Rept. 9 (1974) 143 and references therein.
[8] Y. Nir and R. Rattazzi, Phys. Lett. B 382 (1996) 363.
[9] G. G. Ross, L. Velasco-Sevilla and O. Vives, Nucl. Phys. B 692 (2004) 50; N. Sahu and S. Uma Sankar, arXiv:hep-ph/0501069; J. Ferrandis, arXiv:hep-ph/0510051.
[10] W. Grimus and H. Kuhbock, arXiv:hep-ph/0607197; arXiv:hep-ph/0612132.
[11] N. Cabibbo, Phys. Rev. Lett. 10 (1963) 531.
[12] M.
Kobayashi and T. Maskawa, Prog. Theor. Phys. 49 (1973) 652.
[13] H. Harari, H. Haut and J. Weyers, Phys. Lett. B 78 (1978) 459; M. Tanimoto, Phys. Lett. B 483 (2000) 417; M. Fukugita, M. Tanimoto and T. Yanagida, Phys. Rev. D 57 (1998) 4429; M. Tanimoto, T. Watari and T. Yanagida, Phys. Lett. B 461 (1999) 345.
[14] T. Watari and T. Yanagida, Phys. Lett. B 544 (2002) 167; A. Soddu and N. K. Tran, Phys. Rev. D 69 (2004) 015010.
[15] T. Kobayashi, H. Shirano and H. Terao, arXiv:hep-ph/0412299; Q. Shafi and Z. Tavartkiladze, Phys. Lett. B 594 (2004) 177; T. Kobayashi, Y. Omura and H. Terao, Phys. Rev. D 74 (2006) 053005.
[16] See e.g. M. Leurer, Y. Nir and N. Seiberg, Nucl. Phys. B 420 (1994) 468; J. K. Elwood, N. Irges and P. Ramond, Phys. Rev. Lett. 81 (1998) 5064; M. Bando and T. Kugo, Prog. Theor. Phys. 101 (1999) 1313; M. Bando, T. Kugo and K. Yoshioka, Prog. Theor. Phys. 104 (2000) 211.
[17] D. R. Nelson, G. T. Fleming and G. W. Kilcup, Phys. Rev. Lett. 90, 021601 (2003); C. Aubin et al. [MILC Collaboration], Phys. Rev. D 70, 114501 (2004).
[18] H. Fusaoka and Y. Koide, Phys. Rev. D 57 (1998) 3986; C. R. Das and M. K. Parida, Eur. Phys. J. C 20 (2001) 121.
ABSTRACT We study how to incorporate CP violation in the Froggatt-Nielsen (FN) mechanism. To this end, we introduce non-renormalizable interactions with a flavor democratic structure to the fermion mass generation sector. It is found that at least two iso-singlet scalar fields with an imposed discrete symmetry are necessary to generate CP violation due to the appearance of the relative phase between their vacuum expectation values.
In the simplest model, ratios of quark masses and the Cabibbo-Kobayashi-Maskawa (CKM) matrix including the CP violating phase are determined by the CKM element |Vus| and the ratio of two vacuum expectation values R = |R|e^{iα} (a magnitude and a phase). It is demonstrated how the angles φi (i = 1–3) of the unitarity triangle and the CKM off-diagonal elements |Vub| and |Vcb| are predicted as a function of |Vus|, |R| and α. Although the predicted value of the CP violating phase does not agree with the experimental data within the simplest model, the basic idea of our scenario would be promising to construct a more realistic model of flavor and CP violation. <|endoftext|><|startoftext|>
1 Introduction
1.1 The adiabatic piston
1.2 Physical motivation for the results
2 Background Averaging Material
2.1 The averaging framework
2.2 Some classical averaging results
2.3 A proof of Anosov’s theorem
2.4 Moral
3 Results for piston systems in one dimension
3.1 Statement of results
3.2 Heuristic derivation of the averaged equation for the hard core piston
3.3 Proof of the main result for the hard core piston
3.4 Proof of the main result for the soft core piston
3.5 Appendix to Section 3.4
4 The periodic oscillation of an adiabatic piston in two or three dimensions
4.1 Statement of the main result
4.2 Preparatory material concerning a two-dimensional gas container with only one gas particle on each side
4.3 Proof of the main result for two-dimensional gas containers with only one gas particle on each side
4.4 Generalization to a full proof of Theorem 4.1.1
4.5 Inducing maps on subspaces
4.6 Derivative bounds for the billiard map in three dimensions
Bibliography
List of Figures
1.1 A gas container D in d = 2 dimensions separated by an adiabatic piston.
1.2 An effective potential.
2.1 A schematic of the phase space M. Note that although the level set Mc = {h = c} is depicted as a torus, it need not be a torus. It could be any compact, co-dimension m submanifold.
3.1 The piston system with n1 = 3 and n2 = 4. Note that the gas particles do not interact with each other, but only with the piston and the walls.
4.1 A gas container D ⊂ R^2 separated by a piston.
4.2 A choice of coordinates on phase space.
4.3 An analysis of the divergences of orbits when ε > 0 and the left gas particle collides with the moving piston to the right of Q0. Note that the dimensions are distorted for visual clarity, but that εL and εL/γ are both o(γ) as ε → 0.
Chapter 1 Introduction What can be rigorously understood about the nonequilibrium dynamics of chaotic, many particle systems? Although much progress has been made in understanding the infinite time behavior of such systems, our understanding on finite time scales is still far from complete.
Systems of many particles contain a large number of degrees of freedom, and it is often impractical or impossible to keep track of their full dynamics. However, if one is only interested in the evolution of macroscopic quantities, then these variables form a small subset of all of the variables. The evolution of these quantities does not itself form a closed dynamical system, because it depends on events happening in all of the (very large) phase space. We must therefore develop techniques for describing the evolution of just a few variables in phase space. Such descriptions are valid on limited time scales because a large amount of information about the dynamics of the full system is lost. However, the time scales of validity can often be long enough to enable a good prediction of the observable dynamics.

Averaging techniques help to describe the evolution of certain variables in some physical systems, especially when the system has components that move on different time scales. The primary results of this thesis involve applying averaging techniques to chaotic microscopic models of gas particles separated by an adiabatic piston for the purposes of justifying and understanding macroscopic laws.

This thesis is organized as follows. In Section 1.1 we briefly introduce the adiabatic piston problem and our results. In Section 1.2 we review the physical motivations for our results. The following three chapters may each be read independently. Chapter 2 presents an introduction to averaging theory and the proofs of a number of averaging theorems for smooth systems that motivate our later proofs for the piston problem. Chapter 3 contains our results for piston systems in one dimension, and Chapter 4 contains our results for the piston system in dimensions two and three.

1.1 The adiabatic piston

Consider the following simple model of an adiabatic piston separating two gas containers: A massive piston of mass $M \gg 1$ divides a container in $\mathbb{R}^d$, d = 1, 2, or 3, into two halves. The piston has no internal degrees of freedom and can only move along one axis of the container. On either side of the piston there are a finite number of ideal, unit mass, point gas particles that interact with the walls of the container and with the piston via elastic collisions. When M = ∞, the piston remains fixed in place, and each gas particle performs billiard motion at a constant energy in its sub-container. We make an ergodicity assumption on the behavior of the gas particles when the piston is fixed. Then we study the motions of the piston when the number of gas particles is fixed, the total energy of the system is bounded, but M is very large.

Heuristically, after some time, one expects the system to approach a steady state, where the energy of the system is equidistributed amongst the particles and the piston. However, even if we could show that the full system is ergodic, an abstract ergodic theorem says nothing about the time scale required to reach such a steady state. Because the piston will move much slower than a typical gas particle, it is natural to try to determine the intermediate behavior of the piston by averaging techniques. By averaging over the motion of the gas particles on a time scale chosen short enough that the piston is nearly fixed, but long enough that the ergodic behavior of individual gas particles is observable, we will show that the system does not approach the expected steady state on the time scale $M^{1/2}$. Instead, the piston oscillates periodically, and there is no net energy transfer between the gas particles.

The results of this thesis follow earlier work by Neishtadt and Sinai [Sin99, NS04].
They determined that for a wide variety of Hamiltonians for the gas particles, the averaged behavior of the piston is periodic oscillation, with the piston moving inside an effective potential well whose shape depends on the initial position of the piston and the gas particles’ Hamiltonians. They pointed out that an averaging theorem due to Anosov [Ano60, LM88], proved for smooth systems, should extend to this case. The main result of the present work, Theorem 4.1.1, is that Anosov’s theorem does extend to the particular gas particle Hamiltonian described above. Thus, if we examine the actual motions of the piston with respect to the slow time $\tau = t/M^{1/2}$, then, as M → ∞, in probability (with respect to Liouville measure) most initial conditions give rise to orbits whose actual motion is accurately described by the averaged behavior for 0 ≤ τ ≤ 1, i.e. for $0 \le t \le M^{1/2}$.

A recent study involving some similar ideas by Chernov and Dolgopyat [CD06a] considered the motion inside a two-dimensional domain of a single heavy, large gas particle (a disk) of mass $M \gg 1$ and a single unit mass point particle. They assumed that for each fixed location of the heavy particle, the light particle moves inside a dispersing (Sinai) billiard domain. By averaging over the strongly hyperbolic motions of the light particle, they showed that under an appropriate scaling of space and time the limiting process of the heavy particle’s velocity is a (time-inhomogeneous) Brownian motion on a time scale $O(M^{1/2})$. It is not clear whether a similar result holds for the piston problem, even for gas containers with good hyperbolic properties, such as the Bunimovich stadium. In such a container the motion of a gas particle when the piston is fixed is only nonuniformly hyperbolic because it can experience many collisions with the flat walls of the container immediately preceding and following a collision with the piston.
The present work provides a weak law of large numbers, and it is an open problem to describe the sizes of the deviations for the piston problem [CD06b]. Although our result does not yield concrete information on the sizes of the deviations, it is general in that it imposes very few conditions on the shape of the gas container. Most studies of billiard systems impose strict conditions on the shape of the boundary, generally involving the sign of the curvature and how the corners are put together. The proofs in this work require no such restrictions. In particular, the gas container can have cusps as corners and need satisfy no hyperbolicity conditions.

If the piston divides a container in $\mathbb{R}^2$ or $\mathbb{R}^3$ with axial symmetry, such as a rectangle or a cylinder, then our ergodicity assumption on the behavior of the gas particles when the piston is fixed does not hold. In this case, the interactions of the gas particles with the piston and the ends of the container are completely specified by their motions along the normal axis of the container. Thus, this system projects onto a system inside an interval consisting of a massive point particle, the piston, which interacts with the gas particles on either side of it. These gas particles make elastic collisions with the walls at the ends of the container and with the piston, but they do not interact with each other. For such one-dimensional containers, the effects of the gas particles are quasi-periodic and can be essentially decoupled, and we recover a strong law of large numbers with a uniform rate, reminiscent of classical averaging over just one fast variable in $S^1$: The convergence of the actual motions to the averaged behavior is uniform over all initial conditions, with the size of the deviations being no larger than $O(M^{-1/2})$ on the time scale $M^{1/2}$. See Theorem 3.1.1. Gorelyshev and Neishtadt [GN06] independently obtained this result.
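The one-dimensional model just described is simple enough to simulate directly. The following is a minimal event-driven sketch, not the method used in the proofs: it assumes a unit interval [0, 1], one unit-mass particle on each side of the piston, and illustrative initial data. The function names `elastic` and `simulate_piston_1d` and all default values are hypothetical choices made for this example.

```python
import math

def elastic(v, V, M):
    """Velocities after an elastic collision between a unit mass (velocity v)
    and a mass M (velocity V); conserves momentum and kinetic energy."""
    v_new = ((1.0 - M) * v + 2.0 * M * V) / (1.0 + M)
    V_new = ((M - 1.0) * V + 2.0 * v) / (1.0 + M)
    return v_new, V_new

def simulate_piston_1d(M, t_end, Q=0.4, V=0.0, x1=0.2, v1=1.0, x2=0.7, v2=-1.0):
    """Assumed toy setup: walls at 0 and 1, piston of mass M at Q, one particle
    at x1 < Q and one at x2 > Q. Returns the state (Q, V, x1, v1, x2, v2) at t_end."""
    t = 0.0
    while True:
        # Times until each collision that can occur with the current velocities.
        events = []
        if v1 < 0.0:
            events.append((-x1 / v1, "wall1"))
        if v1 > V:
            events.append(((Q - x1) / (v1 - V), "piston1"))
        if v2 > 0.0:
            events.append(((1.0 - x2) / v2, "wall2"))
        if v2 < V:
            events.append(((x2 - Q) / (V - v2), "piston2"))
        dt, kind = min(events) if events else (math.inf, "none")
        if t + dt >= t_end:
            dt, kind = t_end - t, "none"
        # Free flight up to the next event (or up to the final time).
        x1 += v1 * dt
        x2 += v2 * dt
        Q += V * dt
        t += dt
        if kind == "none":
            return Q, V, x1, v1, x2, v2
        if kind == "wall1":
            x1, v1 = 0.0, -v1
        elif kind == "wall2":
            x2, v2 = 1.0, -v2
        elif kind == "piston1":
            x1 = Q
            v1, V = elastic(v1, V, M)
        else:  # "piston2"
            x2 = Q
            v2, V = elastic(v2, V, M)

if __name__ == "__main__":
    M = 1.0e4
    # Run up to slow time tau = 0.5, i.e. t = 0.5 * M^{1/2}.
    print(simulate_piston_1d(M, 0.5 * math.sqrt(M)))
```

Under these assumptions the total energy $\frac12 (v_1^2 + v_2^2 + M V^2)$ is conserved up to rounding, which gives a basic consistency check on the collision rule. Comparing the simulated piston position with the averaged motion for increasing M would be one way to observe the $O(M^{-1/2})$ deviations numerically.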

For systems in d = 1 dimension, we also investigate the behavior of the system when the interactions of the gas particles with the walls and the piston have been smoothed, so that Anosov’s theorem applies directly. Let δ ≥ 0 be a parameter of smoothing, so that δ = 0 corresponds to the hard core setting above. Then the averaged behavior of the piston is still a periodic oscillation, which depends smoothly on δ. We show that the deviations of the actual motions of the piston from the averaged behavior are again not more than $O(M^{-1/2})$ on the time scale $M^{1/2}$. The size of the deviations is bounded uniformly, both over initial conditions and over the amount of smoothing; see Theorem 3.1.2.

Our results for a single heavy piston separating two gas containers generalize to the case of N heavy pistons separating N + 1 gas containers. Here the averaged behavior of the pistons has them moving like an N-dimensional particle inside an effective potential well. Compare Section 3.1.3.

The systems under consideration in this work are simple models of an adiabatic piston. The general adiabatic piston problem [Cal63], well-known from physics, consists of the following: An insulating piston separates two gas containers, and initially the piston is fixed in place, and the gas in each container is in a separate thermal equilibrium. At some time, the piston is no longer externally constrained and is free to move. One hopes to show that eventually the system will come to a full thermal equilibrium, where each gas has the same pressure and temperature. Whether the system will evolve to thermal equilibrium and the interim behavior of the piston are mechanical problems, not adequately described by thermodynamics [Gru99], that have recently generated much interest within the physics and mathematics communities following Lieb’s address [Lie99]. One expects that the system will evolve in at least two stages.
First, the system relaxes deterministically toward a mechanical equilibrium, where the pressures on either side of the piston are equal. In the second, much longer, stage, the piston drifts stochastically in the direction of the hotter gas, and the temperatures of the gases equilibrate. See for example [GPL03, CL02, Che04] and the references therein. Previously, rigorous results have been limited mainly to models where the effects of gas particles recolliding with the piston can be neglected, either by restricting to extremely short time scales [LSC02, CLS02] or to infinite gas containers [Che04].

1.2 Physical motivation for the results

In this section, we briefly review the physical motivations for our results on the adiabatic piston. Consider a massive, insulating piston of mass M that separates a gas container D in $\mathbb{R}^d$, d = 1, 2, or 3. See Figure 1.1. Denote the location of the piston by Q and its velocity by dQ/dt = V. If Q is fixed, then the piston divides D into two subdomains, $D_1(Q) = D_1$ on the left and $D_2(Q) = D_2$ on the right. By $|D_i|$ we denote the area (when d = 2, or length, when d = 1, or volume, when d = 3) of $D_i$. Define
\[ \ell := \frac{\partial |D_1(Q)|}{\partial Q} = -\frac{\partial |D_2(Q)|}{\partial Q}, \]
so that ℓ is the piston’s cross-sectional length (when d = 2, or area, when d = 3). If d = 1, then ℓ = 1. By $E_i$ we denote the total energy of the gas inside $D_i$.

[Figure 1.1: A gas container D in d = 2 dimensions separated by an adiabatic piston. Labels: subdomains $D_1(Q)$ and $D_2(Q)$ with gas energies $E_1$ and $E_2$; $D = D_1(Q) \sqcup D_2(Q)$; piston velocity $V = \varepsilon W$; piston mass $M = \varepsilon^{-2} \gg 1$; piston cross-section ℓ.]

We are interested in the dynamics of the piston when the system’s total energy is bounded and M → ∞. When M = ∞, the piston remains fixed in place, and each energy $E_i$ remains constant. When M is large but finite, $MV^2/2$ is bounded, and so $V = O(M^{-1/2})$. It is natural to define $\varepsilon = M^{-1/2}$ and $W = V/\varepsilon$, so that W is of order 1 as ε → 0. This is equivalent to scaling time by ε, and so we introduce the slow time τ = εt.
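
The equivalence between rescaling the velocity and rescaling time can be checked in a few lines. The worked computation below is only a sketch using the definitions $\varepsilon = M^{-1/2}$ (so $M = \varepsilon^{-2}$), $V = \varepsilon W$ as labeled in Figure 1.1, and $\tau = \varepsilon t$; it introduces no new assumptions.

```latex
\begin{align*}
\frac{dQ}{d\tau} &= \frac{1}{\varepsilon}\,\frac{dQ}{dt} = \frac{V}{\varepsilon} = W, \\
M\,\frac{dV}{dt} &= \varepsilon^{-2}\,\frac{d(\varepsilon W)}{dt}
                  = \varepsilon^{-1}\,\frac{dW}{dt} = \frac{dW}{d\tau}, \\
\tfrac{1}{2} M V^{2} &= \tfrac{1}{2}\,\varepsilon^{-2}\,(\varepsilon W)^{2} = \tfrac{1}{2} W^{2}.
\end{align*}
```

In the slow time the piston therefore behaves like a unit-mass particle with position Q and velocity W, and its kinetic energy is $W^2/2$.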

If we let $P_i$ denote the pressure of the gas inside $D_i$, then heuristically the dynamics of the piston should be governed by the following differential equations:
\[ \frac{dQ}{dt} = V, \quad M\frac{dV}{dt} = P_1\ell - P_2\ell, \qquad \text{i.e.} \qquad \frac{dQ}{d\tau} = W, \quad \frac{dW}{d\tau} = P_1\ell - P_2\ell. \tag{1.1} \]
To find differential equations for the energies of the gases, note that in a short amount of time dt, the change in energy should come entirely from the work done on a gas, i.e. the force applied to the gas times the distance the piston has moved, because the piston is adiabatic. Thus, one expects that
\[ \frac{dE_1}{dt} = -V P_1\ell, \quad \frac{dE_2}{dt} = +V P_2\ell, \qquad \text{i.e.} \qquad \frac{dE_1}{d\tau} = -W P_1\ell, \quad \frac{dE_2}{d\tau} = +W P_2\ell. \tag{1.2} \]
To obtain a closed system of differential equations, it is necessary to insert an expression for the pressures. $P_i\ell$ should be the average force from the gas particles in $D_i$ experienced by the piston when it is held fixed in place. Whether such an expression, depending only on $E_i$ and $D_i(Q)$, exists and is the same for (almost) every initial condition of the gas particles depends strongly on the microscopic model of the gas particle dynamics. Sinai and Neishtadt [Sin99, NS04] pointed out that for many microscopic models where the pressures are well defined, the solutions of Equations (1.1) and (1.2) have the piston moving according to a model-dependent effective Hamiltonian.

Because the pressure of an ideal gas in d dimensions is proportional to the energy density, with the constant of proportionality 2/d, we choose to insert
\[ P_i = \frac{2E_i}{d\,|D_i|}. \]
Later, we will make assumptions on the microscopic gas particle dynamics to justify this substitution. However, if we accept this definition of the pressure, we obtain the following ordinary differential equations for the four macroscopic variables of the system:
\[ \frac{dQ}{d\tau} = W, \qquad \frac{dW}{d\tau} = \frac{2E_1\ell}{d\,|D_1(Q)|} - \frac{2E_2\ell}{d\,|D_2(Q)|}, \qquad \frac{dE_1}{d\tau} = -\frac{2WE_1\ell}{d\,|D_1(Q)|}, \qquad \frac{dE_2}{d\tau} = +\frac{2WE_2\ell}{d\,|D_2(Q)|}. \tag{1.3} \]
For these equations, one can see the effective Hamiltonian as follows.
Since
\[ \frac{d\ln(E_i)}{d\tau} = -\frac{2}{d}\,\frac{d\ln(|D_i(Q)|)}{d\tau}, \]
we have
\[ E_i(\tau) = E_i(0)\left(\frac{|D_i(Q(0))|}{|D_i(Q(\tau))|}\right)^{2/d}. \]
Hence
\[ \frac{d^2Q(\tau)}{d\tau^2} = \frac{2\ell}{d}\,\frac{E_1(0)\,|D_1(Q(0))|^{2/d}}{|D_1(Q(\tau))|^{1+2/d}} - \frac{2\ell}{d}\,\frac{E_2(0)\,|D_2(Q(0))|^{2/d}}{|D_2(Q(\tau))|^{1+2/d}}, \]
and so (Q, W) behave as if they were the coordinates of a Hamiltonian system describing a particle undergoing motion inside a potential well.

[Figure 1.2: An effective potential. The minimum of the potential well is marked $P_1 = P_2$.]

The effective Hamiltonian may be expressed as
\[ \frac{1}{2}W^2 + \frac{E_1(0)\,|D_1(Q(0))|^{2/d}}{|D_1(Q)|^{2/d}} + \frac{E_2(0)\,|D_2(Q(0))|^{2/d}}{|D_2(Q)|^{2/d}}. \tag{1.4} \]
The question is, do the solutions of Equation (1.3) give an accurate description of the actual motions of the macroscopic variables when M tends to infinity? The main result of this thesis, Theorem 4.1.1, is that, for an appropriately defined system, the answer to this question is affirmative for $0 \le t \le M^{1/2}$, at least for most initial conditions of the microscopic variables. Observe that one should not expect the description to be accurate on time scales much longer than $O(M^{1/2}) = O(\varepsilon^{-1})$. The reason for this is that, presumably, there are corrections of size O(ε) in Equation (1.3) that we are neglecting. For τ = εt > O(1), these corrections should become significant. Such higher order corrections for the adiabatic piston were studied by Crosignani et al. [CDPS96].

Chapter 2 Background Averaging Material

In this chapter, we present a number of well-known classical averaging results for smooth systems, as well as a proof of Anosov’s averaging theorem, which is the first general multi-phase averaging result. All of these theorems are at least 45 years old. However, we present them here because our proofs of the classical results are at least slightly novel, and the ideas in them lend themselves well to certain higher-dimensional generalizations. In particular, they are fairly close to the ideas in the proof we give for our piston results in one dimension. The proof of Anosov’s theorem is a new and unpublished proof due mainly to Dolgopyat, with some further simplifications made.
The ideas in this proof underlie the ideas we will use to prove the weak law of large numbers for our piston system in dimensions two and three.

We begin by giving a discussion of a framework for general averaging theory and some averaging results. A number of classical averaging theorems are then proved, followed by the proof of Anosov’s theorem.

2.1 The averaging framework

In this section, consider a family of ordinary differential equations
\[ \frac{dz}{dt} = Z(z, \varepsilon) \tag{2.1} \]
on a smooth, finite-dimensional Riemannian manifold M, which is indexed by the real parameter $\varepsilon \in [0, \varepsilon_0]$. Assume

• Regularity: the functions Z and ∂Z/∂ε are both $C^1$ on $M \times [0, \varepsilon_0]$.

We denote the flow generated by Z(·, ε) by $z_\varepsilon(t, z) = z_\varepsilon(t)$. We will usually suppress the dependence on the initial condition $z = z_\varepsilon(0, z)$. Think of $z_\varepsilon(\cdot)$ as being a random variable whose domain is the space of initial conditions for the differential equation (2.1) and whose range is the space of continuous paths (depending on the parameter t) in M.

• Existence of smooth integrals: $z_0(t)$ has m independent $C^2$ first integrals $h = (h_1, \dots, h_m) : M \to \mathbb{R}^m$.

Then h is conserved by $z_0(t)$, and at every point the linear operator ∂h/∂z has full rank. It follows from the implicit function theorem that each level set $M_c := \{h = c\}$ is a smooth submanifold of co-dimension m, which is invariant under $z_0(t)$. Further, assume that there exists an open ball $U \subset \mathbb{R}^m$ satisfying:

• Compactness: for all $c \in U$, $M_c$ is compact.

• Preservation of smooth measures: for all $c \in U$, $z_0(t)|_{M_c}$ preserves a smooth measure $\mu_c$ that varies smoothly with c, i.e. there exists a $C^1$ function $g : M \to \mathbb{R}_{>0}$ such that $g|_{M_c}$ is the density of $\mu_c$ with respect to the restriction of Riemannian volume.

Set
\[ h_\varepsilon(t, z) = h_\varepsilon(t) := h(z_\varepsilon(t)). \]
Again, think of $h_\varepsilon(\cdot)$ as being a random variable that takes initial conditions $z \in M$ to continuous paths (depending on the parameter t) in U. Since $dh_0/dt \equiv 0$, Hadamard’s Lemma allows us to write
\[ \frac{dh_\varepsilon}{dt} = \varepsilon H(z_\varepsilon, \varepsilon) \]
for some $C^1$ function $H : M \times [0, \varepsilon_0] \to U$.
Observe that
\[ \frac{dh_\varepsilon}{dt}(t) = Dh(z_\varepsilon(t))\,Z(z_\varepsilon(t), \varepsilon) = Dh(z_\varepsilon(t))\big( Z(z_\varepsilon(t), \varepsilon) - Z(z_\varepsilon(t), 0) \big), \]
so that
\[ H(z, 0) = \mathcal{L}_{\frac{\partial Z}{\partial \varepsilon}|_{\varepsilon=0}}\, h. \]
Here $\mathcal{L}$ denotes the Lie derivative. Define the averaged vector field $\bar H$ by
\[ \bar H(h) = \int_{M_h} H(z, 0)\, d\mu_h(z). \tag{2.2} \]
Then $\bar H$ is $C^1$. Fix a compact set $V \subset U$, and introduce the slow time τ = εt. Let $\bar h(\tau, z) = \bar h(\tau)$ be the random variable that is the solution of
\[ \frac{d\bar h}{d\tau} = \bar H(\bar h), \qquad \bar h(0) = h_\varepsilon(0). \]
We only consider the dynamics in a compact subset of phase space, so for initial conditions $z \in h^{-1}U$, define the stopping time
\[ T_\varepsilon(z) = T_\varepsilon = \inf\{\tau \ge 0 : \bar h(\tau) \notin V \text{ or } h_\varepsilon(\tau/\varepsilon) \notin V\}. \]
Heuristically, think of the phase space M as being a fiber bundle whose base is the open set U and whose fibers are the compact sets $M_h$. See Figure 2.1. Then the vector field Z(·, 0) is perpendicular to the base, so its orbits $z_0(t)$ flow only along the fibers. Now when 0 < ε ≪ 1, the vector field Z(·, ε) acquires a component of size O(ε) along the base, and so its orbits $z_\varepsilon(t)$ have a small drift along the base, which we can follow by observing the evolution of $h_\varepsilon(t)$. Because of this, we refer to h as consisting of the slow variables. Other variables, used to complete h to a parameterization of (a piece of) phase space, are called fast variables. Note that $h_\varepsilon(t)$ depends on all the dimensions of phase space, and so it is not the flow of a vector field on the m-dimensional space U. However, because the motion along each fiber is relatively fast compared to the motion across fibers, we hope to be able to average over the fast motions and obtain a vector field on U that gives a good description of $h_\varepsilon(t)$ over a relatively long time interval, independent of where the solution $z_\varepsilon(t)$ started on $M_{h_\varepsilon(0)}$. Because our averaged vector field, as defined by Equation (2.2), only accounts for deviations of size O(ε), we cannot expect this time interval to be longer than size O(1/ε). In terms of the slow time τ = εt, this length becomes O(1).
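
The heuristic picture can be made concrete with a toy example. The sketch below is an illustration only, under assumptions that are not taken from the text: one slow variable h, one fast angle ϕ rotating at unit speed, and the hypothetical choice $H(h, \varphi) = -h\,(1 + \cos 4\pi\varphi)$, whose average over the circle is $\bar H(h) = -h$. It integrates the full system with a classical Runge-Kutta scheme and records the largest gap between $h_\varepsilon(\tau/\varepsilon)$ and $\bar h(\tau) = h_0 e^{-\tau}$ for 0 ≤ τ ≤ 1.

```python
import math

def slow_drift_error(eps, dt=0.01, h0=1.0):
    """Integrate dh/dt = eps*H(h, phi), dphi/dt = 1 with RK4 on 0 <= t <= 1/eps.
    Returns sup |h_eps(tau/eps) - hbar(tau)| over the grid, where hbar solves
    the averaged equation dhbar/dtau = -hbar, i.e. hbar(tau) = h0*exp(-tau)."""
    def H(h, phi):
        # Hypothetical slow vector field; its average over phi in [0, 1) is -h.
        return -h * (1.0 + math.cos(4.0 * math.pi * phi))

    def rhs(h, phi):
        return eps * H(h, phi), 1.0

    n_steps = int(round(1.0 / (eps * dt)))
    h, phi, worst = h0, 0.0, 0.0   # phi is kept on the real line; H is 1-periodic in phi
    for k in range(n_steps):
        k1 = rhs(h, phi)
        k2 = rhs(h + 0.5 * dt * k1[0], phi + 0.5 * dt * k1[1])
        k3 = rhs(h + 0.5 * dt * k2[0], phi + 0.5 * dt * k2[1])
        k4 = rhs(h + dt * k3[0], phi + dt * k3[1])
        h += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        phi += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        tau = eps * (k + 1) * dt
        worst = max(worst, abs(h - h0 * math.exp(-tau)))
    return worst

if __name__ == "__main__":
    for eps in (0.02, 0.01, 0.005):
        print(eps, slow_drift_error(eps))
```

For this particular choice the full system can be solved in closed form, $h_\varepsilon(t) = h_0 \exp\big(-\varepsilon\,(t + \sin(4\pi t)/(4\pi))\big)$, so the gap is of size about $\varepsilon/(4\pi)$ and shrinks linearly with ε on the time scale 1/ε.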
In other words, the goal of the first-order averaging method described above should be to show that, in some sense,
\[ \sup_{0 \le \tau \le 1 \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| \to 0 \]
as ε → 0. This is often referred to as the averaging principle.

Note that the assumptions of regularity, existence of smooth integrals, compactness, and preservation of smooth measures above are not sufficient for the averaging principle to hold in any form. As an example of just one possible obstruction, the level sets $M_c$ could separate into two completely disjoint sets, $M_c = M_c^+ \sqcup M_c^-$. If this were the case, then it would be implausible that the solutions of the averaged vector field defined by averaging over all of $M_c$ would accurately describe $h_\varepsilon(t, z)$, independent of whether $z \in M_c^+$ or $z \in M_c^-$.

Some averaging results

So far, we are in a general averaging setting. Frequently, one also assumes that the invariant submanifolds, $M_h$, are tori, and that there exists a choice of coordinates z = (h, ϕ) on M in which the differential equation (2.1) takes the form
\[ \frac{dh}{dt} = \varepsilon H(h, \varphi, \varepsilon), \qquad \frac{d\varphi}{dt} = \Phi(h, \varphi, \varepsilon). \]

[Figure 2.1: A schematic of the phase space M = {(h, ϕ)}. The base is $h \in U \subset \mathbb{R}^m$ (“slow variables”), the fibers carry ϕ (“fast variables”), and the vector fields Z(·, 0) and Z(·, ε) are indicated. Note that although the level set $M_c = \{h = c\}$ is depicted as a torus, it need not be a torus. It could be any compact, co-dimension m submanifold.]

Then if $\varphi \in S^1$ and the differential equation for the fast variable is regular, i.e. Φ(h, ϕ, 0) is bounded away from zero for $h \in U$,
\[ \sup_{\text{initial conditions s.t. } h_\varepsilon(0) \in V} \; \sup_{0 \le \tau \le 1 \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| = O(\varepsilon) \quad \text{as } \varepsilon \to 0. \]
See for example Chapter 5 in [SV85], Chapter 3 in [LM88], or Theorem 2.2.3 in the following section.

When the differential equation for the fast variable is not regular, or when there is more than one fast variable, the typical averaging result becomes much weaker than the uniform convergence above. For example, consider the case when $\varphi \in \mathbb{T}^n$, n > 1, and the unperturbed motion is quasi-periodic, i.e. Φ(h, ϕ, 0) = Ω(h).
Also assume that $H \in C^{n+2}$ and that Ω is nonvanishing and satisfies a nondegeneracy condition on U (for example, $\Omega : U \to \mathbb{T}^n$ is a submersion). Let P denote Riemannian volume on M. Neishtadt [LM88, Nei76] showed that in this situation, for each fixed δ > 0,
\[ P\Big( \sup_{0 \le \tau \le 1 \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| \ge \delta \Big) = O(\sqrt{\varepsilon}/\delta), \]
and that this result is optimal. Thus, the averaged equation only describes the actual motions of the slow variables in probability on the time scale 1/ε as ε → 0.

Neishtadt’s result was motivated by a general averaging theorem for smooth systems due to Anosov. This theorem requires none of the additional assumptions in the averaging results above. Under the conditions of regularity, existence of smooth integrals, compactness, and preservation of smooth measures, as well as

• Ergodicity: for Lebesgue almost every $c \in U$, $(z_0(\cdot), \mu_c)$ is ergodic,

Anosov showed that $\sup_{0 \le \tau \le 1 \wedge T_\varepsilon} | h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) | \to 0$ in probability (w.r.t. Riemannian volume on initial conditions) as ε → 0, i.e.

Theorem 2.1.1 (Anosov’s averaging theorem [Ano60]). For each T > 0 and for each fixed δ > 0,
\[ P\Big( \sup_{0 \le \tau \le T \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| \ge \delta \Big) \to 0 \]
as ε → 0.

We present a recent proof of this theorem in Section 2.3 below.

If we consider $h_\varepsilon(\cdot)$ and $\bar h(\cdot)$ to be random variables, Anosov’s theorem is a version of the weak law of large numbers. In general, we can do no better: There is no general strong law in this setting. There exists a simple example due to Neishtadt (which comes from the equations for the motion of a pendulum with linear drag being driven by a constant torque) where for no initial condition in a positive measure set do we have convergence of $h_\varepsilon(t)$ to $\bar h(\varepsilon t)$ on the time scale 1/ε as ε → 0 [Kif04b]. Here, the phase space is $\mathbb{R} \times S^1$, and the unperturbed motion is (uniquely) ergodic on all but one fiber.

2.2 Some classical averaging results

In this section we present some simple, well-known averaging results. See for example Chapter 5 in [SV85] or Chapter 3 in [LM88].
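
2.2.1 Averaging for time-periodic vector fields

Consider a family of time dependent ordinary differential equations
\[ \frac{dh}{dt} = \varepsilon H(h, t, \varepsilon), \tag{2.3} \]
indexed by the real parameter ε ≥ 0, where $h \in \mathbb{R}^m$. Fix $V \subset\subset U \subset \mathbb{R}^m$, and suppose

• Regularity: $H \in C^1(U \times \mathbb{R} \times [0, \infty))$.

• Periodicity: There exists T > 0 such that for each $h \in U$, H(h, t, 0) is T-periodic in time.

Then
\[ \frac{dh}{dt} = \varepsilon H(h, t, 0) + O(\varepsilon^2). \]
Let $h_\varepsilon(t)$ denote the solution of Equation (2.3). We seek a time independent vector field whose solutions approximate $h_\varepsilon(t)$, at least for a long length of time. It is natural to define the averaged vector field $\bar H$ by
\[ \bar H(h) = \frac{1}{T} \int_0^T H(h, s, 0)\, ds. \]
Then $\bar H \in C^1(U)$. Let $\bar h(\tau)$ be the solution of
\[ \frac{d\bar h}{d\tau} = \bar H(\bar h), \qquad \bar h(0) = h_\varepsilon(0). \]
It is reasonable to hope that $\bar h(\varepsilon t)$ and $h_\varepsilon(t)$ are close together for $0 \le t \le \varepsilon^{-1}$. We only consider the dynamics in a compact subset of phase space, so for initial conditions in U, we define the stopping time
\[ T_\varepsilon = \inf\{\tau \ge 0 : \bar h(\tau) \notin V \text{ or } h_\varepsilon(\tau/\varepsilon) \notin V\}. \]

Theorem 2.2.1 (Time-periodic averaging). For each T > 0,
\[ \sup_{h_\varepsilon(0) \in V} \; \sup_{0 \le \tau \le T \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| = O(\varepsilon) \quad \text{as } \varepsilon \to 0. \]

Proof. We divide our proof into three essential steps.

Step 1: Reduction using Gronwall’s Inequality. Now, $\bar h(\tau)$ satisfies the integral equation
\[ \bar h(\tau) - \bar h(0) = \int_0^\tau \bar H(\bar h(\sigma))\, d\sigma, \]
while $h_\varepsilon(\tau/\varepsilon)$ satisfies
\begin{align*}
h_\varepsilon(\tau/\varepsilon) - h_\varepsilon(0) &= \varepsilon \int_0^{\tau/\varepsilon} H(h_\varepsilon(s), s, \varepsilon)\, ds \\
&= O(\varepsilon) + \varepsilon \int_0^{\tau/\varepsilon} H(h_\varepsilon(s), s, 0)\, ds \\
&= O(\varepsilon) + \varepsilon \int_0^{\tau/\varepsilon} \big( H(h_\varepsilon(s), s, 0) - \bar H(h_\varepsilon(s)) \big)\, ds + \int_0^\tau \bar H(h_\varepsilon(\sigma/\varepsilon))\, d\sigma
\end{align*}
for $0 \le \tau \le T \wedge T_\varepsilon$. Define
\[ e_\varepsilon(\tau) = \varepsilon \int_0^{\tau/\varepsilon} \big( H(h_\varepsilon(s), s, 0) - \bar H(h_\varepsilon(s)) \big)\, ds. \]
It follows from Gronwall’s Inequality that
\[ \sup_{0 \le \tau \le T \wedge T_\varepsilon} \big| \bar h(\tau) - h_\varepsilon(\tau/\varepsilon) \big| \le \Big( O(\varepsilon) + \sup_{0 \le \tau \le T \wedge T_\varepsilon} |e_\varepsilon(\tau)| \Big)\, e^{\mathrm{Lip}(\bar H|_V)\, T}. \]

Step 2: A sequence of times adapted for ergodization. Ergodization refers to the convergence along an orbit of a function’s time average to its space average. We define a sequence of times $t_k$ for k ≥ 0 by $t_k = kT$. This sequence of times is motivated by the fact that
\[ \frac{1}{t_{k+1} - t_k} \int_{t_k}^{t_{k+1}} H(h_0(s), s, 0)\, ds = \bar H(h_0). \]
Note that $h_0(t)$ is independent of time. Thus,
\[ \sup_{0 \le \tau \le T \wedge T_\varepsilon} |e_\varepsilon(\tau)| \le O(\varepsilon) + \varepsilon \sum_{t_{k+1} \le \frac{T \wedge T_\varepsilon}{\varepsilon}} \left| \int_{t_k}^{t_{k+1}} \big( H(h_\varepsilon(s), s, 0) - \bar H(h_\varepsilon(s)) \big)\, ds \right|. \tag{2.4} \]

Step 3: Control of individual terms by comparison with solutions of the ε = 0 equation. The sum in Equation (2.4) has no more than O(1/ε) terms, and so it suffices to show that each term $\int_{t_k}^{t_{k+1}} H(h_\varepsilon(s), s, 0) - \bar H(h_\varepsilon(s))\, ds$ is no larger than O(ε). We can accomplish this by comparing the motions of $h_\varepsilon(t)$ for $t_k \le t \le t_{k+1}$ with $h_{k,\varepsilon}(t)$, which is defined to be the solution of the ε = 0 ordinary differential equation satisfying $h_{k,\varepsilon}(t_k) = h_\varepsilon(t_k)$, i.e. $h_{k,\varepsilon}(t) \equiv h_\varepsilon(t_k)$.

Lemma 2.2.2. If $t_{k+1} \le \frac{T \wedge T_\varepsilon}{\varepsilon}$, then $\sup_{t_k \le t \le t_{k+1}} |h_{k,\varepsilon}(t) - h_\varepsilon(t)| = O(\varepsilon)$.

Proof. $dh_\varepsilon/dt = O(\varepsilon)$.

Using that H and $\bar H$ are Lipschitz continuous, we conclude that
\begin{align*}
\int_{t_k}^{t_{k+1}} \big( H(h_\varepsilon(s), s, 0) - \bar H(h_\varepsilon(s)) \big)\, ds
&= \int_{t_k}^{t_{k+1}} \big( H(h_\varepsilon(s), s, 0) - H(h_{k,\varepsilon}(s), s, 0) \big)\, ds \\
&\quad + \int_{t_k}^{t_{k+1}} \big( H(h_{k,\varepsilon}(s), s, 0) - \bar H(h_{k,\varepsilon}(s)) \big)\, ds \\
&\quad + \int_{t_k}^{t_{k+1}} \big( \bar H(h_{k,\varepsilon}(s)) - \bar H(h_\varepsilon(s)) \big)\, ds \\
&= O(\varepsilon) + 0 + O(\varepsilon) = O(\varepsilon).
\end{align*}
Thus we see that $\sup_{0 \le \tau \le T \wedge T_\varepsilon} | h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) | \le O(\varepsilon)$, independent of the initial condition $h_\varepsilon(0) \in V$.

Remark 2.2.1. Note that the O(ε) control in Theorem 2.2.1 on a time scale $t = O(\varepsilon^{-1})$ is generally optimal. For example, take H(h, t, ε) = cos(t) + ε.

2.2.2 Averaging for vector fields with one regular fast variable

For $h \in \mathbb{R}^m$ and $\varphi \in S^1 = [0, 1]/0 \sim 1$, consider the family of ordinary differential equations
\[ \frac{dh}{dt} = \varepsilon H(h, \varphi, \varepsilon), \qquad \frac{d\varphi}{dt} = \Phi(h, \varphi, \varepsilon), \tag{2.5} \]
indexed by the real parameter ε ≥ 0. With z = (h, ϕ), we write this family of differential equations as dz/dt = Z(z, ε). Fix $V \subset\subset U \subset \mathbb{R}^m$, and suppose

• Regularity: $Z \in C^1(U \times S^1 \times [0, \infty))$.

• Regular fast variable: Φ(h, ϕ, 0) is bounded away from 0 for $h \in U$, i.e.
\[ \inf_{(h, \varphi) \in U \times S^1} |\Phi(h, \varphi, 0)| > 0. \]
Without loss of generality, we assume that Φ(h, ϕ, 0) > 0.

Let $z_\varepsilon(t) = (h_\varepsilon(t), \varphi_\varepsilon(t))$ denote the solution of Equation (2.5). Then $z_0(t)$ leaves invariant the circles $M_c = \{h = c\}$ in phase space. In fact, $z_0(t)$ preserves a uniquely ergodic invariant probability measure on $M_c$, whose density is given by
\[ d\mu_c = \frac{1}{K_c}\, \frac{d\varphi}{\Phi(c, \varphi, 0)}, \]
where $K_c = \int_0^1 \frac{d\varphi}{\Phi(c, \varphi, 0)}$ is a normalization constant.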
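
The invariance of this density can be sanity-checked numerically. The sketch below is illustrative only: the fast vector field $\Phi(\varphi) = 2 + \cos 2\pi\varphi$ and the observable $f(\varphi) = \sin^2 2\pi\varphi$ are hypothetical choices, with the slow variable frozen. It compares the time average of f over one revolution of $d\varphi/dt = \Phi(\varphi)$ with the space average of f against $d\mu = d\varphi/(K\Phi)$, where K is also the time needed for one revolution.

```python
import math

def Phi(phi):
    # Hypothetical fast vector field, bounded away from zero.
    return 2.0 + math.cos(2.0 * math.pi * phi)

def f(phi):
    # Hypothetical observable on the circle.
    return math.sin(2.0 * math.pi * phi) ** 2

def midpoint(g, n=2000):
    """Midpoint rule on [0, 1]; very accurate for smooth 1-periodic integrands."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

# Normalization constant K = int_0^1 dphi / Phi(phi), equal to the revolution time.
K = midpoint(lambda p: 1.0 / Phi(p))
# Space average of f with respect to dmu = dphi / (K * Phi).
space_avg = midpoint(lambda p: f(p) / Phi(p)) / K

def time_average(n_steps=4000):
    """RK4 for dphi/dt = Phi(phi) together with dI/dt = f(phi) on 0 <= t <= K.
    Returns (phi(K), I(K)/K): the final angle and the time average of f."""
    dt = K / n_steps
    phi, integral = 0.0, 0.0
    for _ in range(n_steps):
        k1p, k1i = Phi(phi), f(phi)
        p2 = phi + 0.5 * dt * k1p
        k2p, k2i = Phi(p2), f(p2)
        p3 = phi + 0.5 * dt * k2p
        k3p, k3i = Phi(p3), f(p3)
        p4 = phi + dt * k3p
        k4p, k4i = Phi(p4), f(p4)
        phi += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        integral += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
    return phi, integral / K

if __name__ == "__main__":
    phi_end, time_avg = time_average()
    print(K, phi_end, time_avg, space_avg)
```

For this choice of Φ the constant has the closed form $K = 1/\sqrt{3}$, and the weighted average of f is $2\sqrt{3} - 3 \approx 0.464$, which differs from the unweighted average 1/2. The orbit spends more time where Φ is small, and the density $1/(K\Phi)$ accounts for exactly that.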
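The averaged vector field $\bar H$ is defined by averaging H(h, ϕ, 0) over ϕ:
\[ \bar H(h) = \int_0^1 H(h, \varphi, 0)\, d\mu_h(\varphi) = \frac{1}{K_h} \int_0^1 \frac{H(h, \varphi, 0)}{\Phi(h, \varphi, 0)}\, d\varphi. \]
Then $\bar H \in C^1(U)$. Let $\bar h(\tau)$ be the solution of
\[ \frac{d\bar h}{d\tau} = \bar H(\bar h), \qquad \bar h(0) = h_\varepsilon(0). \]
For initial conditions in $U \times S^1$, we have the usual stopping time $T_\varepsilon = \inf\{\tau \ge 0 : \bar h(\tau) \notin V \text{ or } h_\varepsilon(\tau/\varepsilon) \notin V\}$.

Theorem 2.2.3 (Averaging over one regular fast variable). For each T > 0,
\[ \sup_{\text{initial conditions s.t. } h_\varepsilon(0) \in V} \; \sup_{0 \le \tau \le T \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| = O(\varepsilon) \quad \text{as } \varepsilon \to 0. \]

Remark 2.2.2. This result encompasses Theorem 2.2.1 for time-periodic averaging. For example, if T = 1, simply take ϕ = t mod 1 and Φ(h, ϕ, ε) = 1.

Remark 2.2.3. Many of the proofs of the above theorem of which we are aware hinge on considering ϕ as a time-like variable. For example, one could write
\[ \frac{dh}{d\varphi} = \varepsilon\, \frac{H(h, \varphi, 0)}{\Phi(h, \varphi, 0)} + O(\varepsilon^2), \]
and this looks very similar to the time-periodic situation considered previously. However, it does take some work to justify such arguments rigorously, and the traditional proofs do not easily generalize to averaging over multiple fast variables. Our proof essentially uses ϕ to mark off time, and it will immediately generalize to a specific instance of multiphase averaging.

Proof. Again, we have three steps.

Step 1: Reduction using Gronwall’s Inequality. Now
\[ \bar h(\tau) - \bar h(0) = \int_0^\tau \bar H(\bar h(\sigma))\, d\sigma, \]
and
\begin{align*}
h_\varepsilon(\tau/\varepsilon) - h_\varepsilon(0) &= \varepsilon \int_0^{\tau/\varepsilon} H(z_\varepsilon(s), \varepsilon)\, ds \\
&= O(\varepsilon) + \varepsilon \int_0^{\tau/\varepsilon} H(z_\varepsilon(s), 0)\, ds \\
&= O(\varepsilon) + \varepsilon \int_0^{\tau/\varepsilon} \big( H(z_\varepsilon(s), 0) - \bar H(h_\varepsilon(s)) \big)\, ds + \int_0^\tau \bar H(h_\varepsilon(\sigma/\varepsilon))\, d\sigma
\end{align*}
for $0 \le \tau \le T \wedge T_\varepsilon$. Define
\[ e_\varepsilon(\tau) = \varepsilon \int_0^{\tau/\varepsilon} \big( H(z_\varepsilon(s), 0) - \bar H(h_\varepsilon(s)) \big)\, ds. \]
It follows from Gronwall’s Inequality that
\[ \sup_{0 \le \tau \le T \wedge T_\varepsilon} \big| \bar h(\tau) - h_\varepsilon(\tau/\varepsilon) \big| \le \Big( O(\varepsilon) + \sup_{0 \le \tau \le T \wedge T_\varepsilon} |e_\varepsilon(\tau)| \Big)\, e^{\mathrm{Lip}(\bar H|_V)\, T}. \]

Step 2: A sequence of times adapted for ergodization. Now for each initial condition in our phase space and for each fixed ε, we define a sequence of times $t_{k,\varepsilon}$ and a sequence of solutions $z_{k,\varepsilon}(t)$ inductively as follows: $t_{0,\varepsilon} = 0$ and $z_{0,\varepsilon}(t) = z_0(t)$. For k > 0, $t_{k,\varepsilon} = \inf\{t > t_{k-1,\varepsilon} : \varphi_{k-1,\varepsilon}(t) = \varphi_\varepsilon(0)\}$, and $z_{k,\varepsilon}(t)$ is defined as the solution of
\[ \frac{dz_{k,\varepsilon}}{dt} = Z(z_{k,\varepsilon}, 0) = (0, \Phi(z_{k,\varepsilon}, 0)), \qquad z_{k,\varepsilon}(t_{k,\varepsilon}) = z_\varepsilon(t_{k,\varepsilon}). \]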
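This sequence of times is motivated by the fact that
\[ \frac{1}{t_{k+1,\varepsilon} - t_{k,\varepsilon}} \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} H(z_{k,\varepsilon}(s), 0)\, ds = \bar H(h_{k,\varepsilon}). \]
Recall that $h_{k,\varepsilon}(t)$ is independent of time. The elements of this sequence of times are approximately uniformly spaced, i.e. if we fix ω > 0 such that $z \in V \times S^1 \Rightarrow 1/\omega < \Phi(z, 0) < \omega$, then if $t_{k+1,\varepsilon} \le (T \wedge T_\varepsilon)/\varepsilon$, we have $1/\omega < t_{k+1,\varepsilon} - t_{k,\varepsilon} < \omega$. Thus,
\[ \sup_{0 \le \tau \le T \wedge T_\varepsilon} |e_\varepsilon(\tau)| \le O(\varepsilon) + \varepsilon \sum_{t_{k+1,\varepsilon} \le \frac{T \wedge T_\varepsilon}{\varepsilon}} \left| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big( H(z_\varepsilon(s), 0) - \bar H(h_\varepsilon(s)) \big)\, ds \right|, \]
where the sum in this equation has no more than O(1/ε) terms.

Step 3: Control of individual terms by comparison with solutions along fibers. It suffices to show that each term $\int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} H(z_\varepsilon(s), 0) - \bar H(h_\varepsilon(s))\, ds$ is no larger than O(ε). We can accomplish this by comparing the motions of $z_\varepsilon(t)$ for $t_{k,\varepsilon} \le t \le t_{k+1,\varepsilon}$ with $z_{k,\varepsilon}(t)$.

Lemma 2.2.4. If $t_{k+1,\varepsilon} \le \frac{T \wedge T_\varepsilon}{\varepsilon}$, then $\sup_{t_{k,\varepsilon} \le t \le t_{k+1,\varepsilon}} |z_{k,\varepsilon}(t) - z_\varepsilon(t)| = O(\varepsilon)$.

Proof. Without loss of generality, we take k = 0, so that $z_{k,\varepsilon}(t) = z_0(t)$. Since $h_0(t) = h_\varepsilon(0)$ and $dh_\varepsilon/dt = O(\varepsilon)$, $\sup_{t_{0,\varepsilon} \le t \le t_{1,\varepsilon}} |h_0(t) - h_\varepsilon(t)| = O(\varepsilon)$. Now
\[ \varphi_\varepsilon(t) - \varphi_\varepsilon(0) = \int_0^t \Phi(h_\varepsilon(s), \varphi_\varepsilon(s), \varepsilon)\, ds, \]
and because Φ is Lipschitz, we find that
\[ |\varphi_\varepsilon(t) - \varphi_0(t)| \le O(\varepsilon) + \mathrm{Lip}(\Phi) \int_0^t |\varphi_\varepsilon(s) - \varphi_0(s)|\, ds \]
for 0 ≤ t ≤ ω. The result follows from Gronwall’s Inequality.

Using that H and $\bar H$ are Lipschitz continuous, we conclude that
\begin{align*}
\int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big( H(z_\varepsilon(s), 0) - \bar H(h_\varepsilon(s)) \big)\, ds
&= \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big( H(z_\varepsilon(s), 0) - H(z_{k,\varepsilon}(s), 0) \big)\, ds \\
&\quad + \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big( H(z_{k,\varepsilon}(s), 0) - \bar H(h_{k,\varepsilon}(s)) \big)\, ds \\
&\quad + \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big( \bar H(h_{k,\varepsilon}(s)) - \bar H(h_\varepsilon(s)) \big)\, ds \\
&= O(\varepsilon) + 0 + O(\varepsilon) = O(\varepsilon).
\end{align*}
Thus we see that $\sup_{0 \le \tau \le T \wedge T_\varepsilon} | h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) | = O(\varepsilon)$, independent of the initial condition $(h_\varepsilon(0), \varphi_\varepsilon(0)) \in V \times S^1$.

2.2.3 Multiphase averaging for vector fields with separable, regular fast variables

As explained in Section 2.1, when the differential equation for the fast variable is not regular, or when there is more than one fast variable, the typical averaging result becomes much weaker than the uniform convergence in Theorems 2.2.1 and 2.2.3 above.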
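Nonetheless, if the differential equations under consideration satisfy some very specific hypotheses, the proof in the previous section immediately generalizes to yield uniform convergence.

For $h \in \mathbb{R}^m$ and $\varphi = (\varphi^1, \dots, \varphi^n) \in \mathbb{T}^n = ([0, 1]/0 \sim 1)^n$, consider the family of ordinary differential equations
\[ \frac{dh}{dt} = \varepsilon H(h, \varphi, \varepsilon), \qquad \frac{d\varphi}{dt} = \Phi(h, \varphi, \varepsilon), \tag{2.6} \]
indexed by the real parameter ε ≥ 0. We also write z = (h, ϕ) and dz/dt = Z(z, ε). Fix $V \subset\subset U \subset \mathbb{R}^m$, and suppose

• Regularity: $Z \in C^1(U \times \mathbb{T}^n \times [0, \infty))$.

• Separable fast variables: H(h, ϕ, 0) and Φ(h, ϕ, 0) have the following specific forms:

– There exist $C^1$ functions $H_j(h, \varphi^j)$ such that $H(h, \varphi, 0) = \sum_{j=1}^n H_j(h, \varphi^j)$. This can be thought of as saying that, to first order in ε, each fast variable affects the slow variables independently of the other fast variables.

– The components $\Phi^j$ of Φ satisfy $\Phi^j(h, \varphi, 0) = \Phi^j(h, \varphi^j, 0)$, i.e. the unperturbed motion has each fast variable moving independently of the other fast variables. Note that this assumption is satisfied if the unperturbed motion is quasi-periodic, i.e. Φ(h, ϕ, 0) = Ω(h).

• Regular fast variables: For each j,
\[ \inf_{(h, \varphi^j) \in U \times S^1} \big| \Phi^j(h, \varphi^j, 0) \big| > 0. \]

Let $z_\varepsilon(t) = (h_\varepsilon(t), \varphi_\varepsilon(t))$ denote the solution of Equation (2.6). Then $z_0(t)$ leaves invariant the tori $M_c = \{h = c\}$ in phase space. In fact, $z_0(t)$ preserves a (not necessarily ergodic) invariant probability measure on $M_c$, whose density is given by
\[ d\mu_c = \prod_{j=1}^n \frac{d\varphi^j}{K_c^j\, |\Phi^j(c, \varphi^j, 0)|}, \]
where $K_c^j = \int_0^1 \frac{d\varphi^j}{|\Phi^j(c, \varphi^j, 0)|}$. The averaged vector field $\bar H$ is defined by
\[ \bar H(h) = \int_{\mathbb{T}^n} H(h, \varphi, 0)\, d\mu_h(\varphi) = \sum_{j=1}^n \int_{\mathbb{T}^n} H_j(h, \varphi^j)\, d\mu_h(\varphi) = \sum_{j=1}^n \frac{1}{K_h^j} \int_0^1 \frac{H_j(h, \varphi^j)}{|\Phi^j(h, \varphi^j, 0)|}\, d\varphi^j =: \sum_{j=1}^n \bar H_j(h). \]
Let $\bar h(\tau)$ be the solution of
\[ \frac{d\bar h}{d\tau} = \bar H(\bar h), \qquad \bar h(0) = h_\varepsilon(0), \]
and define the stopping time $T_\varepsilon = \inf\{\tau \ge 0 : \bar h(\tau) \notin V \text{ or } h_\varepsilon(\tau/\varepsilon) \notin V\}$.

Theorem 2.2.5 (Averaging over multiple separable, regular fast variables). For each T > 0,
\[ \sup_{\text{initial conditions s.t. } h_\varepsilon(0) \in V} \; \sup_{0 \le \tau \le T \wedge T_\varepsilon} \big| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \big| = O(\varepsilon) \quad \text{as } \varepsilon \to 0. \]

Proof. The proof is essentially the same as the proof of Theorem 2.2.3.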
As before, we need only show that sup_{0≤τ≤T∧Tε} |eε(τ)| = O(ε), where
\[ e_\varepsilon(\tau) = \varepsilon \int_0^{\tau/\varepsilon} H(z_\varepsilon(s),0)-\bar H(h_\varepsilon(s))\,ds. \]
But by our separability assumptions, it suffices to show that for each j,
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} |e_{j,\varepsilon}(\tau)| = O(\varepsilon), \]
where ej,ε(τ) is defined by
\[ e_{j,\varepsilon}(\tau) = \varepsilon \int_0^{\tau/\varepsilon} H_j(h_\varepsilon(s),\varphi^j_\varepsilon(s))-\bar H_j(h_\varepsilon(s))\,ds. \]
Thus, we have effectively separated the effects of each fast variable, and now the proof can be completed by essentially following steps 2 and 3 in the proof of Theorem 2.2.3.

2.3 A proof of Anosov’s theorem

Anosov’s original proof of Theorem 2.1.1 from 1960 may be found in [Ano60]. An exposition of the theorem and Anosov’s proof in English may be found in [LM88]. Recently, Kifer [Kif04a] proved necessary and sufficient conditions for the averaging principle to hold in a sense that is averaged with respect to initial conditions. He also showed explicitly that his conditions are met in the setting of Anosov’s theorem. The proof of Anosov’s theorem given here is mainly due to Dolgopyat [Dol05], although some further simplifications have been made.

Proof of Anosov’s theorem. We begin by showing that without loss of generality we may take Tε = ∞. This is just for convenience, and not an essential part of the proof. To accomplish this, let ψ(h) be a smooth bump function satisfying

• ψ(h) = 1 if h ∈ V,
• ψ(h) > 0 if h ∈ interior(Ṽ),
• ψ(h) = 0 if h ∉ Ṽ,

where Ṽ is a compact set chosen such that V ⊂⊂ interior(Ṽ) ⊂⊂ U. Next, set Z̃(z, ε) = ψ(h(z))Z(z, ε). Because the bump function was chosen to depend only on the slow variables, our assumption about preservation of measures is still satisfied; on each fiber, Z̃(z, 0) is a scalar multiple of Z(z, 0). Furthermore, the flow of Z̃(·, 0)|Mh is ergodic for almost every h ∈ Ṽ. Then it would suffice to prove our theorem for the vector fields Z̃(z, ε) with the set Ṽ replacing V. We assume that this reduction has been made, although we will not use it until Step 5 below.

Step 1: Reduction using Gronwall’s Inequality.
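Observe that h̄(τ) satisfies the integral equation
\[ \bar h(\tau)-\bar h(0) = \int_0^\tau \bar H(\bar h(\sigma))\,d\sigma, \]
while hε(τ/ε) satisfies
\[ \begin{aligned} h_\varepsilon(\tau/\varepsilon)-h_\varepsilon(0) &= \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s),\varepsilon)\,ds \\ &= O(\varepsilon) + \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s),0)\,ds \\ &= O(\varepsilon) + \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s),0)-\bar H(h_\varepsilon(s))\,ds + \int_0^\tau \bar H(h_\varepsilon(\sigma/\varepsilon))\,d\sigma \end{aligned} \]
for 0 ≤ τ ≤ T ∧ Tε. Here we have used the fact that h^{-1}V × [0, ε0] is compact to achieve uniformity over all initial conditions in the size of the O(ε) term above. We use this fact repeatedly in what follows. In particular, H, H̄, and Z are uniformly bounded and have uniform Lipschitz constants on the domains of interest. Define
\[ e_\varepsilon(\tau) = \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s),0)-\bar H(h_\varepsilon(s))\,ds. \]
It follows from Gronwall’s Inequality that
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} |\bar h(\tau)-h_\varepsilon(\tau/\varepsilon)| \le \Big( O(\varepsilon) + \sup_{0\le\tau\le T\wedge T_\varepsilon} |e_\varepsilon(\tau)| \Big)\, e^{\mathrm{Lip}(\bar H|_V)\,T}. \tag{2.7} \]

Step 2: Introduction of a time scale for ergodization. Choose a real-valued function L(ε) such that L(ε) → ∞ and L(ε) = o(log ε^{-1}) as ε → 0. Think of L(ε) as being a time scale which grows as ε → 0 so that ergodization, i.e. the convergence along an orbit of a function’s time average to a space average, can take place. However, L(ε) does not grow too fast, so that on this time scale zε(t) essentially stays on one fiber, where we have our ergodicity assumption. Set tk,ε = kL(ε), so that
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} |e_\varepsilon(\tau)| \le O(\varepsilon L(\varepsilon)) + \varepsilon \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)}-1} \left| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} H(z_\varepsilon(s),0)-\bar H(h_\varepsilon(s))\,ds \right|. \tag{2.8} \]

Step 3: A splitting for using the triangle inequality. Now we let zk,ε(s) be the solution of
\[ \frac{dz_{k,\varepsilon}}{dt} = Z(z_{k,\varepsilon},0), \qquad z_{k,\varepsilon}(t_{k,\varepsilon}) = z_\varepsilon(t_{k,\varepsilon}). \]
Set hk,ε(t) = h(zk,ε(t)). Observe that hk,ε(t) is independent of t. We break up the integral of H(zε(s), 0) − H̄(hε(s)) over [tk,ε, tk+1,ε] into three parts:
\[ \begin{aligned} \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} H(z_\varepsilon(s),0)-\bar H(h_\varepsilon(s))\,ds &= \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} H(z_\varepsilon(s),0)-H(z_{k,\varepsilon}(s),0)\,ds \\ &\quad + \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} H(z_{k,\varepsilon}(s),0)-\bar H(h_{k,\varepsilon}(s))\,ds \\ &\quad + \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \bar H(h_{k,\varepsilon}(s))-\bar H(h_\varepsilon(s))\,ds \\ &=: I_{k,\varepsilon} + II_{k,\varepsilon} + III_{k,\varepsilon}. \end{aligned} \]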
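The term IIk,ε represents an “ergodicity term” that can be controlled by our assumptions on the ergodicity of the flow z0(t), while the terms Ik,ε and IIIk,ε represent “continuity terms” that can be controlled using the following control on the drift from solutions along fibers.

Step 4: Control of drift from solutions along fibers.

Lemma 2.3.1. If 0 < tk+1,ε ≤ (T ∧ Tε)/ε, then
\[ \sup_{t_{k,\varepsilon}\le t\le t_{k+1,\varepsilon}} |z_{k,\varepsilon}(t)-z_\varepsilon(t)| \le O\big(\varepsilon L(\varepsilon)\,e^{\mathrm{Lip}(Z)L(\varepsilon)}\big). \]

Proof. Without loss of generality we may set k = 0, so that zk,ε(t) = z0(t). Then for 0 ≤ t ≤ L(ε),
\[ \begin{aligned} |z_0(t)-z_\varepsilon(t)| &= \left| \int_0^t Z(z_0(s),0)-Z(z_\varepsilon(s),\varepsilon)\,ds \right| \\ &\le \mathrm{Lip}(Z)\int_0^t |\varepsilon| + |z_0(s)-z_\varepsilon(s)|\,ds \\ &= O(\varepsilon L(\varepsilon)) + \mathrm{Lip}(Z)\int_0^t |z_0(s)-z_\varepsilon(s)|\,ds. \end{aligned} \]
The result follows from Gronwall’s Inequality.

From Lemma 2.3.1 we find that Ik,ε, IIIk,ε = O(εL(ε)^2 e^{Lip(Z)L(ε)}).

Step 5: Use of ergodicity along fibers to control IIk,ε. From Equations (2.7) and (2.8) and the triangle inequality, we already know that
\[ \begin{aligned} \sup_{0\le\tau\le T\wedge T_\varepsilon} |\bar h(\tau)-h_\varepsilon(\tau/\varepsilon)| &\le O(\varepsilon) + O(\varepsilon L(\varepsilon)) + \varepsilon\,\frac{T}{\varepsilon L(\varepsilon)}\,O\big(\varepsilon L(\varepsilon)^2 e^{\mathrm{Lip}(Z)L(\varepsilon)}\big) + O\Big( \varepsilon \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)}-1} |II_{k,\varepsilon}| \Big) \\ &= O\big(\varepsilon L(\varepsilon) e^{\mathrm{Lip}(Z)L(\varepsilon)}\big) + O\Big( \varepsilon \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)}-1} |II_{k,\varepsilon}| \Big). \end{aligned} \tag{2.9} \]
Fix δ > 0. Recalling that Tε = ∞, it suffices to show that
\[ P\Big( \varepsilon \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)}-1} |II_{k,\varepsilon}| \ge \delta \Big) \to 0 \]
as ε → 0. For initial conditions z ∈ M and for 0 ≤ k ≤ T/(εL(ε)) − 1, define
\[ B_{k,\varepsilon} = \Big\{ z : \frac{1}{L(\varepsilon)} |II_{k,\varepsilon}| > \frac{\delta}{2T} \Big\}, \qquad B_{z,\varepsilon} = \{ k : z \in B_{k,\varepsilon} \}. \]
Think of these sets as describing “bad ergodization.” For example, roughly speaking, z ∈ Bk,ε if the orbit zε(t) starting at z spends the time between tk,ε and tk+1,ε in a region of phase space where the function H(·, 0) is “poorly ergodized” on the time scale L(ε) by the flow z0(t) (as measured by the parameter δ/2T). As IIk,ε is clearly never larger than O(L(ε)), it follows that
\[ \varepsilon \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)}-1} |II_{k,\varepsilon}| \le \frac{\delta}{2} + O\big(\varepsilon L(\varepsilon)\,\#(B_{z,\varepsilon})\big). \]
Therefore it suffices to show that
\[ P\Big( \#(B_{z,\varepsilon}) \ge \frac{\mathrm{const}}{\varepsilon L(\varepsilon)} \Big) \to 0 \]
as ε → 0. By Chebyshev’s Inequality, we need only show that
\[ E\big(\varepsilon L(\varepsilon)\,\#(B_{z,\varepsilon})\big) = \varepsilon L(\varepsilon) \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)}-1} P(B_{k,\varepsilon}) \]
tends to 0 with ε. In order to estimate the size of P(Bk,ε), it is convenient to introduce a new measure P^f that is uniformly equivalent to the restriction of Riemannian volume P to h^{-1}V.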
Here the f stands for “factor,” and P^f is defined by dP^f = dh · dµh, where dh represents integration with respect to the uniform measure on V. Observe that B0,ε = zε(tk,ε)Bk,ε. In words, the initial conditions giving rise to orbits that are “bad” on the time interval [tk,ε, tk+1,ε], moved forward by time tk,ε, are precisely the initial conditions giving rise to orbits that are “bad” on the time interval [t0,ε, t1,ε]. Because the flow z0(·) preserves the measure P^f, we expect P^f(B0,ε) and P^f(Bk,ε) to have roughly the same size. This is made precise by the following lemma.

Lemma 2.3.2. There exists a constant K such that for each Borel set B ⊂ M and each t ∈ [−T/ε, T/ε], P^f(zε(t)B) ≤ e^{KT} P^f(B).

Proof. Assume that P^f(B) > 0, and set γ(t) = ln(P^f(zε(t)B)/P^f(B)). Then γ(0) = 0, and
\[ \frac{d\gamma}{dt}(t) = \frac{\frac{d}{dt}\int_{z_\varepsilon(t)B} \tilde f(z)\,dz}{\int_{z_\varepsilon(t)B} \tilde f(z)\,dz} = \frac{\int_{z_\varepsilon(t)B} \mathrm{div}_{P^f} Z(z,\varepsilon)\,dz}{\int_{z_\varepsilon(t)B} \tilde f(z)\,dz}, \]
where f̃ > 0 is the C^1 density of P^f with respect to Riemannian volume on h^{-1}V, dz represents integration with respect to that volume, and div_{P^f} Z(z, ε) = div_z(f̃(z)Z(z, ε)). Because z0(t) preserves P^f, div_{P^f} Z(z, 0) ≡ 0. By Hadamard’s Lemma, it follows that div_{P^f} Z(z, ε) = O(ε) on the compact set h^{-1}V. Hence dγ(t)/dt = O(ε), and the result follows.

Returning to our proof of Anosov’s theorem, it suffices to show that
\[ P^f(B_{0,\varepsilon}) = \int_V dh \cdot \mu_h \Big\{ z_0 : \frac{1}{L(\varepsilon)} \Big| \int_0^{L(\varepsilon)} H(z_0(s),0)-\bar H(h_0(0))\,ds \Big| > \frac{\delta}{2T} \Big\} \]
tends to 0 with ε. By our ergodicity assumption, for almost every h,
\[ \mu_h \Big\{ z_0 : \frac{1}{L(\varepsilon)} \Big| \int_0^{L(\varepsilon)} H(z_0(s),0)-\bar H(h_0(0))\,ds \Big| > \frac{\delta}{2T} \Big\} \to 0 \quad \text{as } \varepsilon\to 0. \]
Finally, an application of the Bounded Convergence Theorem finishes the proof.

2.4 Moral

From the proofs of the theorems in this chapter, it should be apparent that there are at least two key steps necessary for proving a version of the averaging principle in the setting presented in Section 2.1.

The first step is estimating the continuity between the ε = 0 and the ε > 0 solutions of dz/dt = Z(z, ε). In particular, on some relatively long timescale L = L(ε) ≪ ε^{-1}, we need to show that sup_{0≤t≤L} |z0(t) − zε(t)| → 0 as ε → 0.
As long as L is sub-logarithmic in ε^{-1}, such estimates for smooth systems can be made using Gronwall’s Inequality.

The second step is estimating the rate of ergodization of H(·, 0) by z0(t), i.e. estimating how fast
\[ \frac{1}{L}\int_0^L H(z_0(s),0)\,ds \to \bar H(h_0) \]
(generally as L → ∞). Note that the estimates in this step compete with those in the first step in that, if L is small we obtain better continuity, but if L is large we usually obtain better ergodization. Also, we do not need the full force of the assumption of ergodicity of (z0(t), µh) on the fibers Mh. We only need z0(t) to ergodize the specific function H(·, 0). Compare the proof of Theorem 2.2.5.

Note that in the setting of Anosov’s theorem, uniform ergodization leads to uniform convergence in the averaging principle. Returning to the proof of Theorem 2.1.1 above, suppose that
\[ \frac{1}{L(\varepsilon)}\int_0^{L(\varepsilon)} H(z_0(s),0)\,ds \to \bar H(h_0) \]
uniformly over all initial conditions as L(ε) → ∞. Then for all ε sufficiently small and each k, Bk,ε = ∅, and hence for all ε sufficiently small and each z, #(Bz,ε) = 0. From Equation (2.9), it follows that sup_{0≤τ≤T∧Tε} |h̄(τ) − hε(τ/ε)| → 0 as ε → 0, uniformly over all initial conditions z ∈ h^{-1}V. However, uniform convergence in Birkhoff’s Ergodic Theorem is extremely rare and usually comes about because of unique ergodicity, so it is unreasonable to expect this sort of uniform convergence in most situations where Anosov’s theorem applies.

Chapter 3
Results for piston systems in one dimension

In this chapter, we present our results for piston systems in one dimension. These results may also be found in [Wri06].

3.1 Statement of results

3.1.1 The hard core piston problem

Consider the system of n1 + n2 + 1 point particles moving inside the unit interval indicated in Figure 3.1. One distinguished particle, the piston, has position Q and mass M.
To the left of the piston there are n1 > 0 particles with positions q1,j and masses m1,j, 1 ≤ j ≤ n1, and to the right there are n2 > 0 particles with positions q2,j and masses m2,j, 1 ≤ j ≤ n2. These gas particles do not interact with each other, but they interact with the piston and with walls located at the end points of the unit interval via elastic collisions. We denote the velocities by dQ/dt = V and dqi,j/dt = vi,j. There is a standard method for transforming this system into a billiard system consisting of a point particle moving inside an (n1 + n2 + 1)-dimensional polytope [CM06a], but we will not use this in what follows.

We are interested in the dynamics of this system when the numbers and masses of the gas particles are fixed, the total energy is bounded, and the mass of the piston tends to infinity. When M = ∞, the piston remains at rest, and each gas particle performs periodic motion. More interesting are the motions of the system when M is very large but finite. Because the total energy of the system is bounded, MV^2/2 ≤ const, and so V = O(M^{-1/2}). Set ε = M^{-1/2}, and let W = V/ε, so that dQ/dt = εW with W = O(1).

[Figure 3.1: The piston system with n1 = 3 and n2 = 4. Note that the gas particles do not interact with each other, but only with the piston and the walls.]

When ε = 0, the system has n1 + n2 + 2 independent first integrals (conserved quantities), which we take to be Q, W, and si,j = |vi,j|, the speeds of the gas particles. We refer to these variables as the slow variables because they should change slowly with time when ε is small, and we denote them by
\[ h = (Q, W, s_{1,1}, s_{1,2}, \cdots, s_{1,n_1}, s_{2,1}, s_{2,2}, \cdots, s_{2,n_2}) \in \mathbb{R}^{n_1+n_2+2}. \]
We will often abbreviate by writing h = (Q, W, s1,j, s2,j). Let hε(t, z) = hε(t) denote the dynamics of these variables in time for a fixed value of ε, where z represents the dependence on the initial condition in phase space. We usually suppress the initial condition in our notation.
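The hard core system just described can be simulated event by event, since every particle moves at constant velocity between collisions. The following is a minimal sketch for the case n1 = n2 = 1, not the method used in the proofs; the function names `elastic_collision` and `simulate_piston`, the state layout, and the sample parameters are assumptions made for illustration. The collision rule is the standard one-dimensional elastic collision law.

```python
import math


def elastic_collision(m, M, v, V):
    """Post-collision velocities of a gas particle (m, v) and the piston (M, V)."""
    total = m + M
    v_new = ((m - M) * v + 2.0 * M * V) / total
    V_new = (2.0 * m * v + (M - m) * V) / total
    return v_new, V_new


def simulate_piston(state, masses, t_end):
    """Event-driven evolution of (q1, v1, Q, V, q2, v2) up to time t_end.

    Hypothetical helper: one gas particle on each side of the piston,
    walls at 0 and 1, all collisions elastic.
    """
    q1, v1, Q, V, q2, v2 = state
    m1, m2, M = masses
    t = 0.0
    while True:
        # Collect the times until each possible next collision.
        events = []
        if v1 < 0.0:
            events.append((-q1 / v1, "wall1"))
        if v1 > V:
            events.append(((Q - q1) / (v1 - V), "piston1"))
        if v2 > 0.0:
            events.append(((1.0 - q2) / v2, "wall2"))
        if v2 < V:
            events.append(((q2 - Q) / (V - v2), "piston2"))
        dt, kind = min(events) if events else (math.inf, None)
        if t + dt >= t_end:
            # No further collision before t_end: free flight to the end.
            dt = t_end - t
            return (q1 + v1 * dt, v1, Q + V * dt, V, q2 + v2 * dt, v2)
        # Free flight up to the collision, then apply the collision rule.
        q1 += v1 * dt
        Q += V * dt
        q2 += v2 * dt
        t += dt
        if kind == "wall1":
            q1, v1 = 0.0, -v1
        elif kind == "piston1":
            q1 = Q
            v1, V = elastic_collision(m1, M, v1, V)
        elif kind == "wall2":
            q2, v2 = 1.0, -v2
        else:
            q2 = Q
            v2, V = elastic_collision(m2, M, v2, V)


def kinetic_energy(state, masses):
    """Total kinetic energy, which elastic collisions conserve."""
    q1, v1, Q, V, q2, v2 = state
    m1, m2, M = masses
    return 0.5 * m1 * v1 * v1 + 0.5 * m2 * v2 * v2 + 0.5 * M * V * V
```

For example, with M = 10^4 (so ε = 0.01) one may call `simulate_piston((0.2, 1.0, 0.5, 0.0, 0.7, -0.8), (1.0, 1.0, 1.0e4), 50.0)` and read off the slow variables Q, W = V/ε, |v1| and |v2| from the returned state. Simultaneous collisions are not treated specially in this sketch.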
Think of hε(·) as a random variable which, given an initial condition in the 2(n1 + n2 + 1)-dimensional phase space, produces a piecewise continuous path in R^{n1+n2+2}. These paths are the projection of the actual motions in our phase space onto a lower dimensional space. The goal of averaging is to find a vector field on R^{n1+n2+2} whose orbits approximate hε(t).

The averaged equation

Sinai [Sin99] derived
\[ \frac{d}{d\tau} \begin{bmatrix} Q \\ W \\ s_{1,j} \\ s_{2,j} \end{bmatrix} = \bar H(h) := \begin{bmatrix} W \\ \dfrac{\sum_{j=1}^{n_1} m_{1,j}s_{1,j}^2}{Q} - \dfrac{\sum_{j=1}^{n_2} m_{2,j}s_{2,j}^2}{1-Q} \\ -\dfrac{s_{1,j}W}{Q} \\ +\dfrac{s_{2,j}W}{1-Q} \end{bmatrix} \tag{3.1} \]
as the averaged equation (with respect to the slow time τ = εt) for the slow variables. We provide a heuristic derivation in Section 3.2. Sinai solved this equation as follows: From
\[ \frac{d\ln(s_{1,j})}{d\tau} = -\frac{d\ln(Q)}{d\tau}, \]
s1,j(τ) = s1,j(0)Q(0)/Q(τ). Similarly, s2,j(τ) = s2,j(0)(1 − Q(0))/(1 − Q(τ)). Hence
\[ \frac{d^2Q}{d\tau^2} = \frac{\sum_{j=1}^{n_1} m_{1,j}s_{1,j}(0)^2 Q(0)^2}{Q^3} - \frac{\sum_{j=1}^{n_2} m_{2,j}s_{2,j}(0)^2 (1-Q(0))^2}{(1-Q)^3}, \]
and so (Q, W) behave as if they were the coordinates of a Hamiltonian system describing a particle undergoing periodic motion inside a potential well. If we let
\[ E_i = \sum_{j=1}^{n_i} \frac{m_{i,j}}{2}\,s_{i,j}^2 \]
be the kinetic energy of the gas particles on one side of the piston, the effective Hamiltonian may be expressed as
\[ \frac{1}{2}W^2 + \frac{E_1(0)Q(0)^2}{Q^2} + \frac{E_2(0)(1-Q(0))^2}{(1-Q)^2}. \tag{3.2} \]
Hence, the solutions to the averaged equation are periodic for all initial conditions under consideration.

Main result in the hard core setting

The solutions of the averaged equation approximate the motions of the slow variables, hε(t), on a time scale O(1/ε) as ε → 0. Precisely, let h̄(τ, z) = h̄(τ) be the solution of
\[ \frac{d\bar h}{d\tau} = \bar H(\bar h), \qquad \bar h(0) = h_\varepsilon(0). \]
Again, think of h̄(·) as being a random variable that takes an initial condition in our phase space and produces a path in R^{n1+n2+2}.

Next, fix a compact set V ⊂ R^{n1+n2+2} such that h ∈ V ⇒ Q ⊂⊂ (0, 1), W ⊂⊂ R, and si,j ⊂⊂ (0,∞) for each i and j (see footnote 1). For the remainder of this discussion we will restrict our attention to the dynamics of the system while the slow variables remain in the set V. To this end, we define the stopping time Tε(z) = Tε := inf{τ ≥ 0 : h̄(τ) ∉ V or hε(τ/ε) ∉ V}.

Theorem 3.1.1.
For each T > 0,
\[ \sup_{\text{initial conditions s.t. } h_\varepsilon(0)\in V} \; \sup_{0\le\tau\le T\wedge T_\varepsilon} |h_\varepsilon(\tau/\varepsilon)-\bar h(\tau)| = O(\varepsilon) \quad \text{as } \varepsilon = M^{-1/2}\to 0. \]
This result was independently obtained by Gorelyshev and Neishtadt [GN06].

Note that the stopping time does not unduly restrict the result. Given any c such that h = c ⇒ Q ∈ (0, 1), si,j ∈ (0,∞), then by an appropriate choice of the compact set V we may ensure that, for all ε sufficiently small and all initial conditions in our phase space with hε(0) = c, Tε ≥ T. We do this by choosing V ∋ c such that the distance between ∂V and the periodic orbit h̄(τ) with h̄(0) = c is positive. Call this distance d. Then Tε can only occur before T if hε(τ/ε) has deviated by at least d from h̄(τ) for some τ ∈ [0, T). Since the size of the deviations tends to zero uniformly with ε, this is impossible for all small ε.

3.1.2 The soft core piston problem

In this section, we consider the same system of one piston and gas particles inside the unit interval considered in Section 3.1.1, but now the interactions of the gas particles with the walls and with the piston are smooth. Let κ : R → R be a C^2 function satisfying

• κ(x) = 0 if x ≥ 1,
• κ′(x) < 0 if x < 1.

Let δ > 0 be a parameter of smoothing, and set κδ(x) = κ(x/δ).

Footnote 1: We have introduced this notation for convenience. For example, h ∈ V ⇒ Q ⊂⊂ (0, 1) means that there exists a compact set A ⊂ (0, 1) such that h ∈ V ⇒ Q ∈ A, and similarly for the other variables.

Then consider the Hamiltonian system obtained by having the gas particles interact with the piston and the walls via the potential
\[ \sum_{j=1}^{n_1} \big[ \kappa_\delta(q_{1,j}) + \kappa_\delta(Q-q_{1,j}) \big] + \sum_{j=1}^{n_2} \big[ \kappa_\delta(q_{2,j}-Q) + \kappa_\delta(1-q_{2,j}) \big]. \]
As before, we set ε = M^{-1/2} and W = V/ε. If we let
\[ \begin{aligned} E_{1,j} &= \tfrac{1}{2} m_{1,j}v_{1,j}^2 + \kappa_\delta(q_{1,j}) + \kappa_\delta(Q-q_{1,j}), \quad 1\le j\le n_1, \\ E_{2,j} &= \tfrac{1}{2} m_{2,j}v_{2,j}^2 + \kappa_\delta(q_{2,j}-Q) + \kappa_\delta(1-q_{2,j}), \quad 1\le j\le n_2, \end{aligned} \tag{3.3} \]
then Ei,j may be thought of as the energy associated with a gas particle, and
\[ \frac{W^2}{2} + \sum_{j=1}^{n_1} E_{1,j} + \sum_{j=1}^{n_2} E_{2,j} \]
is the conserved energy.
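As a numerical illustration of this conservation law, the sketch below integrates the soft core system with n1 = n2 = 1 by the velocity Verlet scheme and evaluates the total energy. It is a sketch under stated assumptions: the profile κ(x) = (1 − x)^3 for x < 1 (and 0 otherwise) is a hypothetical choice that satisfies the two conditions above and is C^2, the forces are obtained by differentiating the interaction potential, and all function names and parameter values are ours rather than the source's. The code works with the physical velocity V, so that MV^2/2 plays the role of W^2/2.

```python
def kappa(x):
    """Hypothetical C^2 profile: kappa(x) = (1 - x)^3 for x < 1, and 0 otherwise."""
    return (1.0 - x) ** 3 if x < 1.0 else 0.0


def kappa_prime(x):
    """Derivative of the hypothetical profile; negative for x < 1."""
    return -3.0 * (1.0 - x) ** 2 if x < 1.0 else 0.0


def potential(x1, Q, x2, delta):
    """Interaction potential with kappa_delta(x) = kappa(x / delta)."""
    return (kappa(x1 / delta) + kappa((Q - x1) / delta)
            + kappa((x2 - Q) / delta) + kappa((1.0 - x2) / delta))


def forces(x1, Q, x2, delta):
    """Forces on the left particle, the piston and the right particle."""
    def d(y):
        # Derivative of kappa_delta at y.
        return kappa_prime(y / delta) / delta
    f1 = -d(x1) + d(Q - x1)
    fQ = -d(Q - x1) + d(x2 - Q)
    f2 = -d(x2 - Q) + d(1.0 - x2)
    return f1, fQ, f2


def total_energy(state, masses, delta):
    """Kinetic plus potential energy of the three particle system."""
    x1, Q, x2, v1, V, v2 = state
    m1, M, m2 = masses
    kinetic = 0.5 * m1 * v1 * v1 + 0.5 * M * V * V + 0.5 * m2 * v2 * v2
    return kinetic + potential(x1, Q, x2, delta)


def verlet(state, masses, delta, dt, n_steps):
    """Velocity Verlet integration of Newton's equations for the soft core system."""
    x1, Q, x2, v1, V, v2 = state
    m1, M, m2 = masses
    f1, fQ, f2 = forces(x1, Q, x2, delta)
    for _ in range(n_steps):
        # Half kick, drift, recompute forces, half kick.
        v1 += 0.5 * dt * f1 / m1
        V += 0.5 * dt * fQ / M
        v2 += 0.5 * dt * f2 / m2
        x1 += dt * v1
        Q += dt * V
        x2 += dt * v2
        f1, fQ, f2 = forces(x1, Q, x2, delta)
        v1 += 0.5 * dt * f1 / m1
        V += 0.5 * dt * fQ / M
        v2 += 0.5 * dt * f2 / m2
    return (x1, Q, x2, v1, V, v2)
```

With gas particle energies below the barrier height κ(0) = 1, the particles should remain ordered as 0 < x1 < Q < x2 < 1, and the total energy should drift only by the small discretization error of the integrator.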
When ε = 0, the Hamiltonian system admits n1 + n2 + 2 independent first integrals, which we choose this time as h = (Q, W, E1,j, E2,j). While discussing the soft core dynamics we use the energies Ei,j rather than the variables si,j = √(2Ei,j/mi,j), which we used for the hard core dynamics, for convenience.

For comparison with the hard core results, we formally consider the dynamics described by setting δ = 0 to be the hard core dynamics described in Section 3.1.1. This is reasonable because we will only consider gas particle energies below the barrier height κ(0). Then for any ε, δ ≥ 0, h^δ_ε(t) denotes the actual time evolution of the slow variables. While discussing the soft core dynamics we often use δ as a superscript to specify the dynamics for a certain value of δ. We usually suppress the dependence on δ, unless it is needed for clarity.

Main result in the soft core setting

We have already seen that when δ = 0, there is an appropriate averaged vector field H̄^0 whose solutions approximate the actual motions of the slow variables, h^0_ε(t). We will show that when δ > 0, there is also an appropriate averaged vector field H̄^δ whose solutions still approximate the actual motions of the slow variables, h^δ_ε(t). We delay the derivation of H̄^δ until Section 3.4.1.

Fix a compact set V ⊂ R^{n1+n2+2} such that h ∈ V ⇒ Q ⊂⊂ (0, 1), W ⊂⊂ R, and Ei,j ⊂⊂ (0, κ(0)) for each i and j. For each ε, δ ≥ 0 we define the functions h̄^δ(·) and T^δ_ε on our phase space by letting h̄^δ(τ) be the solution of
\[ \frac{d\bar h^\delta}{d\tau} = \bar H^\delta(\bar h^\delta), \qquad \bar h^\delta(0) = h^\delta_\varepsilon(0), \tag{3.4} \]
and setting T^δ_ε = inf{τ ≥ 0 : h̄^δ(τ) ∉ V or h^δ_ε(τ/ε) ∉ V}.

Theorem 3.1.2. There exists δ0 > 0 such that the averaged vector field H̄^δ(h) is C^1 on the domain {(δ, h) : 0 ≤ δ ≤ δ0, h ∈ V}. Furthermore, for each T > 0,
\[ \sup_{0\le\delta\le\delta_0} \; \sup_{\text{initial conditions s.t. } h^\delta_\varepsilon(0)\in V} \; \sup_{0\le\tau\le T\wedge T^\delta_\varepsilon} |h^\delta_\varepsilon(\tau/\varepsilon)-\bar h^\delta(\tau)| = O(\varepsilon) \quad \text{as } \varepsilon = M^{-1/2}\to 0. \]
As in Section 3.1.1, for any fixed c there exists a suitable choice of the compact set V such that for all sufficiently small ε and δ, T^δ_ε ≥ T whenever h^δ_ε(0) = c.

As we will see, for each fixed δ > 0, Anosov’s theorem 2.1.1 applies to the soft core system and yields a weak law of large numbers, and Theorem 2.2.5 applies and yields a strong law of large numbers with a uniform rate of convergence. However, neither of these theorems yields the uniformity over δ in the result above.

3.1.3 Applications and generalizations

Relationship between the hard core and the soft core piston

It is not a priori clear that we can compare the motions of the slow variables on the time scale 1/ε for δ > 0 versus δ = 0, i.e. compare the motions of the soft core piston with the motions of the hard core piston on a relatively long time scale. It is impossible to compare the motions of the fast-moving gas particles on this time scale as ε → 0. As we see in Section 3.4, the frequency with which a gas particle hits the piston changes by an amount O(δ) when we smooth the interaction. Thus, on the time scale 1/ε, the number of collisions is altered by roughly O(δ/ε), and this number diverges if δ is held fixed while ε → 0.

Similarly, one might expect that it is impossible to compare the motions of the soft and hard core pistons as ε → 0 without letting δ → 0 with ε. However, from Gronwall’s Inequality it follows that if h̄^δ(0) = h̄^0(0), then
\[ \sup_{0\le\tau\le T\wedge T^\delta_\varepsilon\wedge T^0_\varepsilon} |\bar h^\delta(\tau)-\bar h^0(\tau)| = O(\delta). \]
From the triangle inequality and Theorems 3.1.1 and 3.1.2 we obtain the following corollary, which allows us to compare the motions of the hard core and the soft core piston.

Corollary 3.1.3. As ε = M^{-1/2}, δ → 0,
\[ \sup_{\text{initial conditions s.t. } h^\delta_\varepsilon(0)=c=h^0_\varepsilon(0)} \; \sup_{0\le t\le (T\wedge T^\delta_\varepsilon\wedge T^0_\varepsilon)/\varepsilon} |h^\delta_\varepsilon(t)-h^0_\varepsilon(t)| = O(\varepsilon)+O(\delta). \]
This shows that, provided the slow variables have the same initial conditions,
\[ \sup_{0\le t\le 1/\varepsilon} |h^\delta_\varepsilon(t)-h^0_\varepsilon(t)| = O(\varepsilon)+O(\delta). \]
Thus the motions of the slow variables converge on the time scale 1/ε as ε, δ → 0, and it is immaterial in which order we let these parameters tend to zero.

The adiabatic piston problem

We comment on what Theorem 3.1.1 says about the adiabatic piston problem. The initial conditions of the adiabatic piston problem require that W(0) = 0. Although our system is so simple that a proper thermodynamical pressure is not defined, we can define the pressure of a gas to be the average force received from the gas particles by the piston when it is held fixed, i.e.
\[ P_1 = \sum_{j=1}^{n_1} 2m_{1,j}s_{1,j}\,\frac{s_{1,j}}{2Q} = \frac{2E_1}{Q} \qquad \text{and} \qquad P_2 = \frac{2E_2}{1-Q}. \]
Then if P1(0) > P2(0), the initial condition for our averaged equation (3.1) has the motion of the piston starting at the left turning point of a periodic orbit determined by the effective potential well. Up to errors not much bigger than M^{-1/2}, we see the piston oscillate periodically on the time scale M^{1/2}. If P1(0) < P2(0), the motion of the piston starts at a right turning point. However, if P1(0) = P2(0), then the motion of the piston starts at the bottom of the effective potential well. In this case of mechanical equilibrium, h̄(τ) = h̄(0), and we conclude that, up to errors not much bigger than M^{-1/2}, we see no motion of the piston on the time scale M^{1/2}. A much longer time scale is required to see if the temperatures equilibrate.

Generalizations

A simple generalization of Theorem 3.1.1, proved by similar techniques, follows. The system consists of N − 1 pistons, that is, heavy point particles, located inside the unit interval at positions Q1 < Q2 < . . . < QN−1. Walls are located at Q0 ≡ 0 and QN ≡ 1, and the piston at position Qi has mass Mi. Then the pistons divide the unit interval into N chambers. Inside the ith chamber, there are ni ≥ 1 gas particles whose locations and masses will be denoted by xi,j and mi,j, respectively, where 1 ≤ j ≤ ni.
All of the particles are point particles, and the gas particles interact with the pistons and with the walls via elastic collisions. However, the gas particles do not directly interact with each other. We scale the piston masses as Mi = M̂i/ε^2 with M̂i constant, define Wi by dQi/dt = εWi, and let Ei be the kinetic energy of the gas particles in the ith chamber. Then we can find an appropriate averaged equation whose solutions have the pistons moving like an (N − 1)-dimensional particle inside a potential well with an effective Hamiltonian
\[ \frac{1}{2}\sum_{i=1}^{N-1} \hat M_i W_i^2 + \sum_{i=1}^{N} \frac{E_i(0)\,(Q_i(0)-Q_{i-1}(0))^2}{(Q_i-Q_{i-1})^2}. \]
If we write the slow variables as h = (Qi, Wi, |vi,j|) and fix a compact set V such that h ∈ V ⇒ Qi+1 − Qi ⊂⊂ (0, 1), Wi ⊂⊂ R, and |vi,j| ⊂⊂ (0,∞), then the convergence of the actual motions of the slow variables to the averaged solutions is exactly the same as the convergence given in Theorem 3.1.1.

Remark 3.1.1. The inverse quadratic potential between adjacent pistons in the effective Hamiltonian above is also referred to as the Calogero-Moser-Sutherland potential. It has also been observed as the effective potential created between two adjacent tagged particles in a one-dimensional Rayleigh gas by the insertion of one very light particle in between the tagged particles [BTT07].

3.2 Heuristic derivation of the averaged equation for the hard core piston

We present here a heuristic derivation of Sinai’s averaged equation (3.1) that is found in [Dol05].

First, we examine interparticle collisions when ε > 0. When a particle on the left, say the one at position q1,j, collides with the piston, s1,j and W instantaneously change according to the laws of elastic collisions:
\[ \begin{bmatrix} v_{1,j}^+ \\ V^+ \end{bmatrix} = \frac{1}{m_{1,j}+M} \begin{bmatrix} m_{1,j}-M & 2M \\ 2m_{1,j} & M-m_{1,j} \end{bmatrix} \begin{bmatrix} v_{1,j}^- \\ V^- \end{bmatrix}. \tag{3.5} \]
If the speed of the left gas particle is bounded away from zero, and W = M^{1/2}V is also bounded, it follows that for all ε sufficiently small, any collision will have v^-_{1,j} > 0 and v^+_{1,j} < 0.
In this case, when we translate Equation (3.5) into our new coordinates, we find that
\[ \begin{bmatrix} s_{1,j}^+ \\ W^+ \end{bmatrix} = \frac{1}{1+\varepsilon^2 m_{1,j}} \begin{bmatrix} 1-\varepsilon^2 m_{1,j} & -2\varepsilon \\ 2\varepsilon m_{1,j} & 1-\varepsilon^2 m_{1,j} \end{bmatrix} \begin{bmatrix} s_{1,j}^- \\ W^- \end{bmatrix}, \tag{3.6} \]
so that
\[ \Delta s_{1,j} = s_{1,j}^+ - s_{1,j}^- = -2\varepsilon W^- + O(\varepsilon^2), \qquad \Delta W = W^+ - W^- = +2\varepsilon m_{1,j}s_{1,j}^- + O(\varepsilon^2). \]
The situation is analogous when particles on the right collide with the piston. For all ε sufficiently small, s2,j and W instantaneously change by
\[ \Delta W = W^+ - W^- = -2\varepsilon m_{2,j}s_{2,j}^- + O(\varepsilon^2), \qquad \Delta s_{2,j} = s_{2,j}^+ - s_{2,j}^- = +2\varepsilon W^- + O(\varepsilon^2). \]
We defer discussing the rare events in which multiple gas particles collide with the piston simultaneously, although we will see that they can be handled appropriately.

Let ∆t be a length of time long enough such that the piston experiences many collisions with the gas particles, but short enough such that the slow variables change very little in this time interval. From each collision with the particle at position q1,j, W changes by an amount +2εm1,js1,j + O(ε^2), and the frequency of these collisions is approximately s1,j/(2Q). Arguing similarly for collisions with the other particles, we guess that
\[ \frac{\Delta W}{\Delta t} = \varepsilon \sum_{j=1}^{n_1} 2m_{1,j}s_{1,j}\,\frac{s_{1,j}}{2Q} - \varepsilon \sum_{j=1}^{n_2} 2m_{2,j}s_{2,j}\,\frac{s_{2,j}}{2(1-Q)} + O(\varepsilon^2). \]
Note that not only does the position of the piston change slowly in time, but its velocity also changes slowly, i.e. the piston has inertia. With τ = εt as the slow time, a reasonable guess for the averaged equation for W is
\[ \frac{dW}{d\tau} = \frac{\sum_{j=1}^{n_1} m_{1,j}s_{1,j}^2}{Q} - \frac{\sum_{j=1}^{n_2} m_{2,j}s_{2,j}^2}{1-Q}. \]
Similar arguments for the other slow variables lead to the averaged equation (3.1).

3.3 Proof of the main result for the hard core piston

3.3.1 Proof of Theorem 3.1.1 with only one gas particle on each side

We specialize to the case when there is only one gas particle on either side of the piston, i.e. we assume that n1 = n2 = 1. We then denote q1,1 by q1, m2,1 by m2, etc. This allows the proof’s major ideas to be clearly expressed, without substantially limiting their applicability. At the end of this section, we outline the simple generalizations needed to make the proof apply in the general case.
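Before turning to the proof, it may help to see the averaged equation (3.1) integrated numerically in this special case n1 = n2 = 1. The following is a minimal sketch using a classical fourth order Runge-Kutta step; the function names, the step count and the sample initial condition are assumptions made for illustration. Along solutions, s1·Q and s2·(1 − Q) should stay constant, and so should W^2/2 + m1 s1^2/2 + m2 s2^2/2, which coincides with the effective Hamiltonian (3.2) once Sinai's solution for s1 and s2 is substituted.

```python
def averaged_rhs(h, m1=1.0, m2=1.0):
    """Right-hand side of the averaged equation (3.1) with n1 = n2 = 1."""
    Q, W, s1, s2 = h
    return (
        W,
        m1 * s1 * s1 / Q - m2 * s2 * s2 / (1.0 - Q),
        -s1 * W / Q,
        s2 * W / (1.0 - Q),
    )


def rk4_step(f, h, dt):
    """One classical Runge-Kutta step for dh/dtau = f(h)."""
    k1 = f(h)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(h, k1)))
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(h, k2)))
    k4 = f(tuple(x + dt * k for x, k in zip(h, k3)))
    return tuple(
        x + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for x, a, b, c, d in zip(h, k1, k2, k3, k4)
    )


def solve_averaged(h0, tau_end, n_steps=2000):
    """Integrate the averaged equation from slow time 0 to tau_end."""
    h = tuple(h0)
    dt = tau_end / n_steps
    for _ in range(n_steps):
        h = rk4_step(averaged_rhs, h, dt)
    return h


def effective_energy(h, m1=1.0, m2=1.0):
    """W^2/2 plus the gas kinetic energies, conserved along averaged solutions."""
    Q, W, s1, s2 = h
    return 0.5 * W * W + 0.5 * m1 * s1 * s1 + 0.5 * m2 * s2 * s2
```

For instance, `solve_averaged((0.5, 0.0, 1.0, 0.8), 2.0)` starts the piston at rest with a higher pressure on the left, so Q initially increases and then oscillates inside the effective potential well.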
A choice of coordinates on the phase space for a three particle system

As part of our proof, we choose a set of coordinates on our six-dimensional phase space such that, in these coordinates, the ε = 0 dynamics are smooth. Complete the slow variables h = (Q, W, s1, s2) to a full set of coordinates by adding the coordinates ϕi ∈ [0, 1]/ 0 ∼ 1 = S^1, i = 1, 2, defined as follows:
\[ \varphi_1 = \varphi_1(q_1,v_1,Q) = \begin{cases} \dfrac{q_1}{2Q} & \text{if } v_1>0 \\[1ex] 1-\dfrac{q_1}{2Q} & \text{if } v_1<0 \end{cases} \qquad \varphi_2 = \varphi_2(q_2,v_2,Q) = \begin{cases} \dfrac{1-q_2}{2(1-Q)} & \text{if } v_2<0 \\[1ex] 1-\dfrac{1-q_2}{2(1-Q)} & \text{if } v_2>0 \end{cases} \]
When ε = 0, these coordinates are simply the angle variable portion of action-angle coordinates for an integrable Hamiltonian system. They are defined such that collisions occur between the piston and the gas particles precisely when ϕ1 or ϕ2 = 1/2. Then z = (h, ϕ1, ϕ2) represents a choice of coordinates on our phase space, which is homeomorphic to (a subset of R^4) × T^2. We abuse notation and also let h(z) represent the projection onto the first four coordinates of z.

Now we describe the dynamics of our system in these coordinates. When ϕ1, ϕ2 ≠ 1/2,
\[ \frac{d\varphi_1}{dt} = \begin{cases} \dfrac{s_1}{2Q} - \dfrac{\varepsilon W}{Q}\,\varphi_1 & \text{if } 0\le\varphi_1<1/2 \\[1ex] \dfrac{s_1}{2Q} + \dfrac{\varepsilon W}{Q}\,(1-\varphi_1) & \text{if } 1/2<\varphi_1\le 1 \end{cases} \qquad \frac{d\varphi_2}{dt} = \begin{cases} \dfrac{s_2}{2(1-Q)} + \dfrac{\varepsilon W}{1-Q}\,\varphi_2 & \text{if } 0\le\varphi_2<1/2 \\[1ex] \dfrac{s_2}{2(1-Q)} - \dfrac{\varepsilon W}{1-Q}\,(1-\varphi_2) & \text{if } 1/2<\varphi_2\le 1 \end{cases} \]
Hence between interparticle collisions, the dynamics are smooth and are described by
\[ \frac{dQ}{dt} = \varepsilon W, \quad \frac{dW}{dt} = 0, \quad \frac{ds_1}{dt} = 0, \quad \frac{ds_2}{dt} = 0, \quad \frac{d\varphi_1}{dt} = \frac{s_1}{2Q} + O(\varepsilon), \quad \frac{d\varphi_2}{dt} = \frac{s_2}{2(1-Q)} + O(\varepsilon). \tag{3.7} \]
When ϕ1 reaches 1/2, while ϕ2 ≠ 1/2, the coordinates Q, s2, ϕ1, and ϕ2 are instantaneously unchanged, while s1 and W instantaneously jump, as described by Equation (3.6). As an aside, it is curious that s1^+ + εW^+ = s1^- − εW^-, so that dϕ1/dt is continuous as ϕ1 crosses 1/2. However, the collision induces discontinuous jumps of size O(ε^2) in dQ/dt and dϕ2/dt. Denote the linear transformation in Equation (3.6) with j = 1 by A1,ε. Then
\[ A_{1,\varepsilon} = \begin{bmatrix} 1 & -2\varepsilon \\ 2\varepsilon m_1 & 1 \end{bmatrix} + O(\varepsilon^2). \]
The situation is analogous when ϕ2 reaches 1/2, while ϕ1 ≠ 1/2. Then W and s2 are instantaneously transformed by a linear transformation
\[ A_{2,\varepsilon} = \begin{bmatrix} 1 & -2\varepsilon m_2 \\ 2\varepsilon & 1 \end{bmatrix} + O(\varepsilon^2). \]
We also account for the possibility of all three particles colliding simultaneously. There is no completely satisfactory way to do this, as the dynamics have an essential singularity near {ϕ1 = ϕ2 = 1/2}. Furthermore, such three particle collisions occur with probability zero with respect to the invariant measure discussed below. However, the two 3 × 3 matrices
\[ \begin{bmatrix} A_{1,\varepsilon} & 0 \\ 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 0 & A_{2,\varepsilon} \end{bmatrix} \]
have a commutator of size O(ε^2). We will see that an error this small makes no difference to us as ε → 0, and so when ϕ1 = ϕ2 = 1/2, we pretend that the left particle collides with the piston instantaneously before the right particle does. Precisely, we transform the variables s1, W, and s2 by
\[ \begin{bmatrix} s_1^+ \\ W^+ \\ s_2^+ \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & A_{2,\varepsilon} \end{bmatrix} \begin{bmatrix} A_{1,\varepsilon} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} s_1^- \\ W^- \\ s_2^- \end{bmatrix}. \]
We find that
\[ \begin{aligned} \Delta s_1 &= s_1^+ - s_1^- = -2\varepsilon W^- + O(\varepsilon^2), \\ \Delta W &= W^+ - W^- = +2\varepsilon m_1 s_1^- - 2\varepsilon m_2 s_2^- + O(\varepsilon^2), \\ \Delta s_2 &= s_2^+ - s_2^- = +2\varepsilon W^- + O(\varepsilon^2). \end{aligned} \]
The above rules define a flow on the phase space, which we denote by zε(t). We denote its components by Qε(t), Wε(t), s1,ε(t), etc. When ε > 0, the flow is not continuous, and for definiteness we take zε(t) to be left continuous in t.

Because our system comes from a Hamiltonian system, it preserves Liouville measure. In our coordinates, this measure has a density proportional to Q(1 − Q). That this measure is preserved also follows from the fact that the ordinary differential equation (3.7) preserves this measure, and the matrices A1,ε, A2,ε have determinant 1. Also note that the set {ϕ1 = ϕ2 = 1/2} has co-dimension two, and so ∪_t zε(t){ϕ1 = ϕ2 = 1/2} has co-dimension one, which shows that only a measure zero set of initial conditions will give rise to three particle collisions.

Argument for uniform convergence

Step 1: Reduction using Gronwall’s Inequality. Define H(z) by
\[ H(z) = \begin{bmatrix} W \\ 2m_1 s_1\,\delta_{\varphi_1=1/2} - 2m_2 s_2\,\delta_{\varphi_2=1/2} \\ -2W\,\delta_{\varphi_1=1/2} \\ 2W\,\delta_{\varphi_2=1/2} \end{bmatrix}. \]
Here we make use of Dirac delta functions. All integrals involving these delta functions may be replaced by sums. We explicitly deal with any ambiguities arising from collisions occurring at the limits of integration.

Lemma 3.3.1.
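For 0 ≤ t ≤ (T ∧ Tε)/ε,
\[ h_\varepsilon(t)-h_\varepsilon(0) = \varepsilon\int_0^t H(z_\varepsilon(s))\,ds + O(\varepsilon), \]
where any ambiguity about changes due to collisions occurring precisely at times 0 and t is absorbed in the O(ε) term.

Proof. There are four components to verify. The first component requires that Qε(t) − Qε(0) = ε ∫_0^t Wε(s)ds + O(ε). This is trivially true because Qε(t) − Qε(0) = ε ∫_0^t Wε(s)ds. The second component states that
\[ W_\varepsilon(t)-W_\varepsilon(0) = \varepsilon\int_0^t 2m_1 s_{1,\varepsilon}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - 2m_2 s_{2,\varepsilon}(s)\,\delta_{\varphi_{2,\varepsilon}(s)=1/2}\,ds + O(\varepsilon). \tag{3.8} \]
Let rk and qj be the times in (0, t) such that ϕ1,ε(rk) = 1/2 and ϕ2,ε(qj) = 1/2, respectively. Then
\[ W_\varepsilon(t)-W_\varepsilon(0) = \sum_{r_k} \Delta W_\varepsilon(r_k) + \sum_{q_j} \Delta W_\varepsilon(q_j) + O(\varepsilon). \]
Observe that there exists ω > 0 such that for all sufficiently small ε and all h ∈ V, 1/ω < dϕi/dt < ω. Thus the number of collisions in a time interval grows no faster than linearly in the length of that time interval. Because t ≤ T/ε, it follows that
\[ W_\varepsilon(t)-W_\varepsilon(0) = \varepsilon\sum_{r_k} 2m_1 s_{1,\varepsilon}(r_k) - \varepsilon\sum_{q_j} 2m_2 s_{2,\varepsilon}(q_j) + O(\varepsilon), \]
and Equation (3.8) is verified. Note that because V is compact, there is uniformity over all initial conditions in the size of the O(ε) terms above. The third and fourth components are handled similarly.

Next, h̄(τ) satisfies the integral equation
\[ \bar h(\tau)-\bar h(0) = \int_0^\tau \bar H(\bar h(\sigma))\,d\sigma, \]
while hε(τ/ε) satisfies
\[ \begin{aligned} h_\varepsilon(\tau/\varepsilon)-h_\varepsilon(0) &= O(\varepsilon) + \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s))\,ds \\ &= O(\varepsilon) + \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s))-\bar H(h_\varepsilon(s))\,ds + \int_0^\tau \bar H(h_\varepsilon(\sigma/\varepsilon))\,d\sigma \end{aligned} \]
for 0 ≤ τ ≤ T ∧ Tε. Define
\[ e_\varepsilon(\tau) = \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s))-\bar H(h_\varepsilon(s))\,ds. \]
It follows from Gronwall’s Inequality that
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} |\bar h(\tau)-h_\varepsilon(\tau/\varepsilon)| \le \Big( O(\varepsilon) + \sup_{0\le\tau\le T\wedge T_\varepsilon} |e_\varepsilon(\tau)| \Big)\, e^{\mathrm{Lip}(\bar H|_V)\,T}. \tag{3.9} \]
Gronwall’s Inequality is usually stated for continuous paths, but the standard proof (found in [SV85]) still works for paths that are merely integrable, and |h̄(τ) − hε(τ/ε)| is piecewise smooth.

Step 2: A splitting according to particles.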
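Now
\[ H(z)-\bar H(h) = \begin{bmatrix} 0 \\ 2m_1 s_1\,\delta_{\varphi_1=1/2} - m_1 s_1^2/Q \\ -2W\,\delta_{\varphi_1=1/2} + s_1 W/Q \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ -2m_2 s_2\,\delta_{\varphi_2=1/2} + m_2 s_2^2/(1-Q) \\ 0 \\ 2W\,\delta_{\varphi_2=1/2} - s_2 W/(1-Q) \end{bmatrix}, \]
and so, in order to show that sup_{0≤τ≤T∧Tε} |eε(τ)| = O(ε), it suffices to show that
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} \left| \int_0^{\tau/\varepsilon} s_{1,\varepsilon}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{s_{1,\varepsilon}(s)^2}{2Q_\varepsilon(s)}\,ds \right| = O(1), \]
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} \left| \int_0^{\tau/\varepsilon} W_\varepsilon(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{W_\varepsilon(s)s_{1,\varepsilon}(s)}{2Q_\varepsilon(s)}\,ds \right| = O(1), \]
as well as two analogous claims about terms involving ϕ2,ε. Thus we have effectively separated the effects of the different gas particles, so that we can deal with each particle separately. We will only show that
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} \left| \int_0^{\tau/\varepsilon} s_{1,\varepsilon}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{s_{1,\varepsilon}(s)^2}{2Q_\varepsilon(s)}\,ds \right| = O(1). \]
The other three terms can be handled similarly.

Step 3: A sequence of times adapted for ergodization. Ergodization refers to the convergence along an orbit of a function’s time average to its space average. For example, because of the splitting according to particles above, one can easily check that (1/t) ∫_0^t H(z0(s))ds = H̄(h0) + O(1/t), even when z0(·) restricted to the invariant tori M_{h0} is not ergodic. In this step, for each initial condition zε(0) in our phase space, we define a sequence of times tk,ε inductively as follows: t0,ε = inf{t ≥ 0 : ϕ1,ε(t) = 0}, tk+1,ε = inf{t > tk,ε : ϕ1,ε(t) = 0}. This sequence is chosen because δ_{ϕ1,0(s)=1/2} is “ergodized” as time passes from tk,0 to tk+1,0. If ε is sufficiently small and tk+1,ε ≤ (T ∧ Tε)/ε, then the spacings between these times are uniformly of order 1, i.e. 1/ω < tk+1,ε − tk,ε < ω. Thus,
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} \left| \int_0^{\tau/\varepsilon} s_{1,\varepsilon}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{s_{1,\varepsilon}(s)^2}{2Q_\varepsilon(s)}\,ds \right| \le O(1) + \sum_{t_{k+1,\varepsilon}\le (T\wedge T_\varepsilon)/\varepsilon} \left| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} s_{1,\varepsilon}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{s_{1,\varepsilon}(s)^2}{2Q_\varepsilon(s)}\,ds \right|. \tag{3.10} \]

Step 4: Control of individual terms by comparison with solutions along fibers. The sum in Equation (3.10) has no more than O(1/ε) terms, and so it suffices to show that each term is no larger than O(ε). We can accomplish this by comparing the motions of zε(t) for tk,ε ≤ t ≤ tk+1,ε with the solution of the ε = 0 version of Equation (3.7) that, at time tk,ε, is located at zε(tk,ε).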
Since each term in the sum has the same form, without loss of generality we will only examine the first term and suppose that t0,ε = 0, i.e. that ϕ1,ε(0) = 0.

Lemma 3.3.2. If t1,ε ≤ (T ∧ Tε)/ε, then sup0≤t≤t1,ε |z0(t) − zε(t)| = O(ε).

Proof. To check that sup0≤t≤t1,ε |h0(t) − hε(t)| = O(ε), first note that h0(t) = h0(0) = hε(0). Then dQε/dt = O(ε), so that Q0(t) − Qε(t) = O(εt). Furthermore, the other slow variables change by O(ε) at collisions, while the number of collisions in the time interval [0, t1,ε] is O(1). It remains to show that sup0≤t≤t1,ε |ϕi,0(t) − ϕi,ε(t)| = O(ε). Using what we know about the divergence of the slow variables,
\[ \varphi_{1,0}(t) - \varphi_{1,\varepsilon}(t) = \int_0^t \Bigl( \frac{s_{1,0}(s)}{2Q_0(s)} - \frac{s_{1,\varepsilon}(s)}{2Q_\varepsilon(s)} + O(\varepsilon) \Bigr)\,ds = \int_0^t O(\varepsilon)\,ds = O(\varepsilon) \]
for 0 ≤ t ≤ t1,ε. Showing that sup0≤t≤t1,ε |ϕ2,0(t) − ϕ2,ε(t)| = O(ε) is similar.

From Lemma 3.3.2, t1,ε = t1,0 + O(ε) = 2Q0/s1,0 + O(ε). We conclude that
\[ \int_0^{t_{1,\varepsilon}} \Bigl( s_{1,\varepsilon}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{s_{1,\varepsilon}(s)^2}{2Q_\varepsilon(s)} \Bigr)\,ds = O(\varepsilon) + \int_0^{t_{1,\varepsilon}} \Bigl( s_{1,0}(s)\,\delta_{\varphi_{1,\varepsilon}(s)=1/2} - \frac{s_{1,0}(s)^2}{2Q_0(s)} \Bigr)\,ds = O(\varepsilon) + s_{1,0} - t_{1,\varepsilon}\,\frac{s_{1,0}^2}{2Q_0} = O(\varepsilon). \]
It follows that sup0≤τ≤T∧Tε |hε(τ/ε) − h̄(τ)| = O(ε), independent of the initial condition in h−1V.

3.3.2 Extension to multiple gas particles

When n1, n2 > 1, only minor modifications are necessary to generalize the proof above. We start by extending the slow variables h to a full set of coordinates on phase space by defining the angle variables ϕi,j ∈ [0, 1]/(0 ∼ 1) = S1 for 1 ≤ i ≤ 2, 1 ≤ j ≤ ni:
\[ \varphi_{1,j} = \varphi_{1,j}(q_{1,j}, v_{1,j}, Q) = \begin{cases} \dfrac{q_{1,j}}{2Q} & \text{if } v_{1,j} > 0, \\[1ex] 1 - \dfrac{q_{1,j}}{2Q} & \text{if } v_{1,j} < 0, \end{cases} \qquad \varphi_{2,j} = \varphi_{2,j}(q_{2,j}, v_{2,j}, Q) = \begin{cases} \dfrac{1-q_{2,j}}{2(1-Q)} & \text{if } v_{2,j} < 0, \\[1ex] 1 - \dfrac{1-q_{2,j}}{2(1-Q)} & \text{if } v_{2,j} > 0. \end{cases} \]
Then dϕ1,j/dt = s1,j(2Q)^{−1} + O(ε), dϕ2,j/dt = s2,j(2(1 − Q))^{−1} + O(ε), and z = (h, ϕ1,j, ϕ2,j) represents a choice of coordinates on our phase space, which is homeomorphic to (a subset of R^{n1+n2+2}) × T^{n1+n2}. In these coordinates, the dynamical system yields a discontinuous flow zε(t) on phase space. The flow preserves Liouville measure, which in our coordinates has a density proportional to Q^{n1}(1 − Q)^{n2}.
As in Section 3.3.1, one can show that the measure of initial conditions leading to multiple particle collisions is zero. Next, define H(z) by
\[ H(z) = \begin{pmatrix} W \\ \sum_{j=1}^{n_1} 2m_1 s_{1,j}\,\delta_{\varphi_{1,j}=1/2} - \sum_{j=1}^{n_2} 2m_2 s_{2,j}\,\delta_{\varphi_{2,j}=1/2} \\ -2W\delta_{\varphi_{1,j}=1/2} \\ 2W\delta_{\varphi_{2,j}=1/2} \end{pmatrix}. \]
For 0 ≤ t ≤ (T ∧ Tε)/ε, $h_\varepsilon(t) - h_\varepsilon(0) = \varepsilon\int_0^t H(z_\varepsilon(s))\,ds + O(\varepsilon)$. From here, the rest of the proof follows the same arguments made in Section 3.3.1.

3.4 Proof of the main result for the soft core piston

For the remainder of this chapter, we consider the family of Hamiltonian systems introduced in Section 3.1.2, which are parameterized by ε, δ ≥ 0. For simplicity, we specialize to n1 = n2 = 1. As in Section 3.3, the generalization to n1, n2 > 1 is not difficult. The Hamiltonian dynamics are given by the following ordinary differential equation:
\[ \begin{aligned} \frac{dQ}{dt} &= \varepsilon W, & \frac{dW}{dt} &= \varepsilon\bigl( -\kappa_\delta'(Q - x_1) + \kappa_\delta'(x_2 - Q) \bigr), \\ \frac{dx_1}{dt} &= v_1, & m_1\frac{dv_1}{dt} &= -\kappa_\delta'(x_1) + \kappa_\delta'(Q - x_1), \\ \frac{dx_2}{dt} &= v_2, & m_2\frac{dv_2}{dt} &= -\kappa_\delta'(x_2 - Q) + \kappa_\delta'(1 - x_2). \end{aligned} \tag{3.11} \]
Recalling the particle energies defined by Equation (3.3), we find that
\[ \frac{dE_1}{dt} = \varepsilon W \kappa_\delta'(Q - x_1), \qquad \frac{dE_2}{dt} = -\varepsilon W \kappa_\delta'(x_2 - Q). \]
For the compact set V introduced in Section 3.1.2, fix a small positive number E and an open set U ⊂ R4 such that V ⊂ U and h ∈ U ⇒ Q ∈ (E , 1−E), W ⊂⊂ R, and E < E1, E2 < κ(0) − E . We only consider the dynamics for 0 < δ < E/2 and h ∈ U . Define
\[ U_1(q_1) = U_1(q_1, Q, \delta) := \kappa_\delta(q_1) + \kappa_\delta(Q - q_1), \qquad U_2(q_2) = U_2(q_2, Q, \delta) := \kappa_\delta(q_2 - Q) + \kappa_\delta(1 - q_2). \]
Then the energies Ei satisfy Ei = mi vi²/2 + Ui(xi). Let T1 = T1(Q,E1, δ) and T2 = T2(Q,E2, δ) denote the periods of the motions of the left and right gas particles, respectively, when ε = 0.

Lemma 3.4.1. For i = 1, 2, Ti ∈ C1{(Q,Ei, δ) : Q ∈ (E , 1 − E), Ei ∈ (E , κ(0) − E), 0 ≤ δ < E/2}. Furthermore,
\[ T_1(Q, E_1, \delta) = \Bigl( \frac{2m_1}{E_1} \Bigr)^{1/2} Q + O(\delta), \qquad T_2(Q, E_2, \delta) = \Bigl( \frac{2m_2}{E_2} \Bigr)^{1/2} (1 - Q) + O(\delta). \]
The proof of this lemma is mostly computational, and so we delay it until Section 3.5. Note especially that the periods can be suitably defined such that their regularity extends to δ = 0.

In this section, and in Section 3.5 below, we adopt the following convention on the use of the O notation.
All use of the O notation will explicitly contain the dependence on ε and δ as ε, δ → 0. For example, if a function f(h, ε, δ) = O(ε), then there exists δ′, ε′ > 0 such that sup0<ε≤ε′, 0<δ≤δ′, h∈V |f(h, ε, δ)/ε| <∞. When ε = 0, (Ei − Ui(xi)). Define a = a(Ei, δ) by κδ(a) = κ(a/δ) = Ei, so that a(E1, δ) is a turning point for the left gas particle. Then a = δκ −1(Ei), where κ−1 is defined as follows: κ : [0, 1] → [0, κ(0)] takes 0 to κ(0) and 1 to 0. Furthermore, κ ∈ C2([0, 1]), κ′ ≤ 0, and κ′(x) < 0 if x < 1. By monotonicity, κ−1 : [0, κ(0)] → [0, 1] exists and takes 0 to 1 and κ(0) to 0. Also, by the Implicit Function Theorem, κ−1 ∈ C2((0, κ(0)]), (κ−1)′(y) < 0 for y > 0, and (κ−1)′(y) → −∞ as y → 0+. Because we only consider energies Ei ∈ (E , κ(0)− E), it follows that a(Ei, δ) is a C2 function for the domains of interest. 3.4.1 Derivation of the averaged equation As we previously pointed out, for each fixed δ > 0, Anosov’s theorem 2.1.1 and Theorem 2.2.5 apply directly to the family of ordinary differential equations in Equation (3.11), provided that δ is sufficiently small. The invariant fibers Mh of the ε = 0 flow are tori described by a fixed value of the four slow variables and {(Q,W, q1, v1, q2, v2) : E1 = m1v21/2 + U1(q1, Q, δ), E2 = m2v22/2 + U2(q2, Q, δ)}. If we use (q1, q2) as local coordinates on Mh, which is valid except when v1 or v2 = 0, the invariant measure µh of the unperturbed flow has the density dq1dq2 (E1 − U1(q1)) T2 (E2 − U2(q2)) The restricted flow is ergodic for almost every h. See Corollary 3.5.1 in Section −κ′δ(Q− q1) + κ′δ(q2 −Q) Wκ′δ(Q− q1) −Wκ′δ(q2 −Q) κ′δ(Q− q1)dµh = ∫ Q−a κ′δ(Q− q1) (E1 − U1(q1)) ∫ Q−a κ′δ(Q− q1) E1 − κδ(Q− q1) E1 − u 8m1E1 Similarly, κ′δ(q2 −Q)dµh = − 8m2E2 It follows that the averaged vector field is H̄δ(h) = 8m1E1 8m2E2 8m1E1 8m2E2 where from Lemma 3.4.1 we see that H̄ ·(·) ∈ C1({(δ, h) : 0 ≤ δ < E/2, h ∈ V}). 
H̄0(h) agrees with the averaged vector field for the hard core system from Equation (3.1), once we account for the change of coordinates Ei = mi si²/2.

Remark 3.4.1. An argument due to Neishtadt and Sinai [NS04] shows that the solutions to the averaged equation (3.4) are periodic. This argument also shows that, as in the case δ = 0, the limiting dynamics of (Q,W) are effectively Hamiltonian, with the shape of the Hamiltonian depending on δ, Q(0), and the initial energies of the gas particles. The argument depends heavily on the observation that the phase integrals
\[ I_i(Q, E_i, \delta) = \iint_{m_i v^2/2 + U_i(x, Q, \delta) \le E_i} dx\,dv \]
are adiabatic invariants, i.e. they are integrals of the solutions to the averaged equation. Thus the four-dimensional phase space of the averaged equation is foliated by invariant two-dimensional submanifolds, and one can think of the effective Hamiltonians for the piston as living on these submanifolds.

3.4.2 Proof of Theorem 3.1.2

The following arguments are motivated by our proof in Section 3.3, although the details are more involved as we show that the rate of convergence is independent of all small δ.

A choice of coordinates on phase space

We wish to describe the dynamics in a coordinate system inspired by the one used in Section 3.3.1. For each fixed δ ∈ (0, δ0], this change of coordinates will be C1 in all variables on the domain of interest. However, it is an exercise in analysis to show this, and so we delay the proofs of the following two lemmas until Section 3.5. We introduce the angular coordinates ϕi ∈ [0, 1]/(0 ∼ 1) = S1 defined by
\[ \varphi_1 = \varphi_1(q_1, v_1, Q) = \begin{cases} 0 & \text{if } q_1 = a, \\[0.5ex] \dfrac{1}{T_1} \displaystyle\int_a^{q_1} \sqrt{\dfrac{m_1/2}{E_1 - U_1(s)}}\,ds & \text{if } v_1 > 0, \\[1.5ex] 1/2 & \text{if } q_1 = Q - a, \\[0.5ex] 1 - \dfrac{1}{T_1} \displaystyle\int_a^{q_1} \sqrt{\dfrac{m_1/2}{E_1 - U_1(s)}}\,ds & \text{if } v_1 < 0, \end{cases} \qquad \varphi_2 = \varphi_2(q_2, v_2, Q) = \begin{cases} 0 & \text{if } q_2 = 1 - a, \\[0.5ex] \dfrac{1}{T_2} \displaystyle\int_{q_2}^{1-a} \sqrt{\dfrac{m_2/2}{E_2 - U_2(s)}}\,ds & \text{if } v_2 < 0, \\[1.5ex] 1/2 & \text{if } q_2 = Q + a, \\[0.5ex] 1 - \dfrac{1}{T_2} \displaystyle\int_{q_2}^{1-a} \sqrt{\dfrac{m_2/2}{E_2 - U_2(s)}}\,ds & \text{if } v_2 > 0. \end{cases} \tag{3.12} \]
Then z = (h, ϕ1, ϕ2) is a choice of coordinates on h−1U . As before, we will abuse notation and let h(z) denote the projection onto the first four coordinates of z.
There is a fixed value of δ0 in the statement of Theorem 3.1.2. However, for the purposes of our proof, it will be convenient to progressively choose δ0 smaller when needed. At the end of the proof, we will have only shrunk δ0 a finite number of times, and this final value will satisfy the requirements of the theorem. Our first requirement on δ0 is that it is smaller than E/2.

Lemma 3.4.2. If δ0 > 0 is sufficiently small, then for each δ ∈ (0, δ0] the ordinary differential equation (3.11) in the coordinates z takes the form
\[ \frac{dz}{dt} = Z^\delta(z, \varepsilon), \tag{3.13} \]
where Zδ ∈ C1(h−1U × [0,∞)). When z ∈ h−1U ,
\[ Z^\delta(z, \varepsilon) = \begin{pmatrix} \varepsilon W \\ \varepsilon\bigl( -\kappa_\delta'(Q - q_1(z)) + \kappa_\delta'(q_2(z) - Q) \bigr) \\ \varepsilon W \kappa_\delta'(Q - q_1(z)) \\ -\varepsilon W \kappa_\delta'(q_2(z) - Q) \\ 1/T_1(Q, E_1, \delta) + O(\varepsilon) \\ 1/T_2(Q, E_2, \delta) + O(\varepsilon) \end{pmatrix}. \tag{3.14} \]
Recall that, by our conventions, the O(ε) terms in Equation (3.14) have a size that can be bounded independent of all δ sufficiently small.

Denote the flow determined by Zδ(·, ε) by zδε(t), and its components by Qδε(t), Wδε(t), Eδ1,ε(t), etc. Also, set hδε(t) = h(zδε(t)). From Equation (3.14),
\[ H^\delta(z, \varepsilon) := \frac{1}{\varepsilon}\frac{dh^\delta_\varepsilon}{dt} = \begin{pmatrix} W \\ -\kappa_\delta'(Q - q_1(z)) + \kappa_\delta'(q_2(z) - Q) \\ W \kappa_\delta'(Q - q_1(z)) \\ -W \kappa_\delta'(q_2(z) - Q) \end{pmatrix}. \tag{3.15} \]
In particular, Hδ(z, ε) = Hδ(z, 0). Before proceeding, we need one final technical lemma.

Lemma 3.4.3. If δ0 > 0 is chosen sufficiently small, there exists a constant K such that for all δ ∈ (0, δ0], κ′δ(|Q − xi(z)|) = 0 unless ϕi ∈ [1/2 − Kδ, 1/2 + Kδ].

Argument for uniform convergence

We start by proving the following lemma, which essentially says that an orbit zδε(t) only spends a fraction O(δ) of its time in a region of phase space where |Hδ(zδε(t), ε)| = |Hδ(zδε(t), 0)| is of size O(δ−1).

Lemma 3.4.4. For 0 ≤ T′ ≤ T ≤ (T ∧ Tδε)/ε,
\[ \int_{T'}^{T} \bigl| H^\delta(z^\delta_\varepsilon(s), 0) \bigr|\,ds = O(1 \vee (T - T')). \]

Proof. Without loss of generality, T′ = 0. From Lemmas 3.4.1 and 3.4.2 it follows that if we choose δ0 sufficiently small, then there exists ω > 0 such that for all sufficiently small ε and all δ ∈ (0, δ0], h ∈ V ⇒ 1/ω < dϕδi,ε/dt < ω. Define the set B = [1/2 − Kδ, 1/2 + Kδ], where K comes from Lemma 3.4.3.
Then we find a crude bound on Qδε(s)− q1(zδε(s)) ∣ ds using that dϕδ1,ε ≥ 1/ω if ϕδ1,ε ∈ B ≤ ω if ϕδ1,ε ∈ Bc. This yields Qδε(s)− q1(zδε(s)) ∣ ds ≤ const 1ϕδ1,ε(s)∈Bds ≤ const 2Kωδ + 1−2Kδ T + 2Kωδ = O(1 ∨ T ). Similarly, ∣κ′δ(q2(z ε(s))−Qδε(s)) ∣ ds = O(1∨ T ), and so ∣Hδ(zδε(s), 0) ∣ ds = O(1 ∨ T ). We now follow steps one through four from Section 3.3.1, making modifications where necessary. Step 1: Reduction using Gronwall’s Inequality. Now hδε(τ/ε) satisfies hδε(τ/ε)− hδε(0) = ε ∫ τ/ε Hδ(zδε(s), 0)ds. Define eδε(τ) = ε ∫ τ/ε Hδ(zδε(s), 0)− H̄δ(hδε(s))ds. It follows from Gronwall’s Inequality and the fact that H̄ ·(·) ∈ C1({(δ, h) : 0 ≤ δ ≤ δ0, h ∈ V}) that 0≤τ≤T∧T δε ∣hδε(τ/ε)− h̄δ(τ) 0≤τ≤T∧T δε ∣eδε(τ) eLip(H̄ δ|V)T 0≤τ≤T∧T δε ∣eδε(τ) (3.16) Step 2: A splitting according to particles. Next, Hδ(z, 0)− H̄δ(h) −κ′δ(Q− q1(z))− 8m1E1 Wκ′δ(Q− q1(z)) +W 8m1E1 κ′δ(q2(z)−Q) + 8m2E2 −Wκ′δ(q2(z)−Q)−W 8m2E2 and so, in order to show that sup0≤τ≤T∧T δε ∣eδε(τ) ∣ = O(ε), it suffices to show that for i = 1, 2, 0≤τ≤T∧T δε ∫ τ/ε ∣Qδε(s)− xi(zδε(s)) i,ε(s) Ti(Qδε(s), E i,ε(s), δ) = O(1), 0≤τ≤T∧T δε ∫ τ/ε Wε(s)κ ∣Qδε(s)− xi(zδε(s)) +Wε(s) i,ε(s) Ti(Qδε(s), E i,ε(s), δ) = O(1). We only demonstrate that 0≤τ≤T∧T δε ∫ τ/ε Qδε(s)− q1(zδε(s)) 1,ε(s) T1(Qδε(s), E 1,ε(s), δ) = O(1). The other three terms are handled similarly. Step 3: A sequence of times adapted for ergodization. Define the se- quence of times tδk,ε inductively by t 0,ε = inf{t ≥ 0 : ϕδ1,ε(t) = 0}, tδk+1,ε = inf{t > tδk,ε : ϕ 1,ε(t) = 0}. If ε and δ are sufficiently small and tδk+1,ε ≤ (T ∧ T δε )/ε, then it follows from Lemma 3.4.2 and the discussion in the proof of Lemma 3.4.4 that 1/ω < tδk+1,ε − tδk,ε < ω. From Lemmas 3.4.2 and 3.4.4 it follows that 0≤τ≤T∧T δε ∫ τ/ε Qδε(s)− q1(zδε(s)) 1,ε(s) T1(Qδε(s), E 1,ε(s), δ) ≤ O(1) + k+1,ε k+1,ε Qδε(s)− q1(zδε(s)) 1,ε(s) T1(Qδε(s), E 1,ε(s), δ) (3.17) Step 4: Control of individual terms by comparison with solutions along fibers. 
As before, it suffices to show that each term in the sum in Equation (3.17) is no larger than O(ε). Without loss of generality we will only examine the first term and suppose that tδ0,ε = 0, i.e. that ϕ 1,ε(0) = 0. Lemma 3.4.5. If tδ1,ε ≤ T∧T δε , then sup0≤t≤tδ1,ε ∣zδ0(t)− zδε(t) ∣ = O(ε). Proof. By Lemma 3.4.4, hδ0(t) − hδε(t) = hδε(0) − hδε(t) = −ε Hδ(zδε(s), 0)ds = O(ε(1 ∨ t)) for t ≥ 0. Using what we know about the divergence of the slow variables, we find that ϕδ1,0(t)− ϕδ1,ε(t) = 0(s), E 0(s), δ) T1(Qδε(s), E ε(s), δ) +O(ε)ds O(ε)ds = O(ε) for 0 ≤ t ≤ tδ1,ε. Lemmas 3.4.1 and 3.4.2 ensure the desired uniformity in the sizes of the orders of magnitudes. Showing that sup0≤t≤tδ1,ε ∣ϕδ2,0(t)− ϕδ2,ε(t) ∣ = O(ε) is similar. From Lemma 3.4.5 we find that t1,ε = t1,0+O(ε) = T1(Qδ0, Eδ0 , δ)+O(ε). Hence ∫ tδ1,ε 1,ε(s) T1(Qδε(s), E 1,ε(s), δ) ds = O(ε) + ∫ tδ1,0 1,0, δ) = O(ε) + But when q1(z ε) < Q ε − a, Eδ1,ε(s)− κδ Qδε(s)− q1(zδε(s)) ε(s)) Qδε(s)− q1(zδε(s)) and so ∫ tδ1,ε Qδε(s)− q1(zδε(s)) ds = − 1,ε(0)− 1,ε(t = O(ε)− Hence, ∫ tδ1,ε Qδε(s)− q1(zδε(s)) 1,ε(s) T1(Qδε(s), E 1,ε(s), δ) ds = O(ε), as desired. 3.5 Appendix to Section 3.4 Proof of Lemma 3.4.1: Proof. For 0 < δ < E/2, T1 = T1(Q,E1, δ) = 2 ∫ Q−a E1 − U1(s) T2 = T2(Q,E2, δ) = 2 ∫ 1−a E2 − U2(s) We only consider the claims about T1, and for convenience we take m1 = 2. Then T1(Q,E1, δ) = 2 ∫ Q−a E1 − U1(s) ∫ Q/2 E1 − κδ(s) Q/2− δ√ E1 − κδ(s) 2Q− 4δ√ κ−1(E1) E1 − κ(s) Define F (E) := κ−1(E) E − κ(s) −(κ−1)′(u)√ E − u Notice that (κ−1)′(u) diverges as u → 0+, while (E − u)−1/2 diverges as u → E−, but both functions are still integrable on [0, E]. It follows that F (E) is well defined. Then it suffices to show that F : [E , κ(0)− E ] → R is C1. Write F (E) = ∫ E/2 −(κ−1)′(u)√ E − u −(κ−1)′(u)√ E − u := F1(E) + F2(E). A standard application of the Dominated Convergence Theorem allows us to dif- ferentiate inside the integral and conclude that F1 ∈ C∞([E , κ(0)− E ]), with F ′1(E) = ∫ E/2 (κ−1)′(u) 2(E − u)3/2du. 
To examine F2, we make the substitution v = E − u to find that F2(E) = ∫ E−E/2 −(κ−1)′(E − v)√ Using the fact that (κ−1)′ ∈ C1([E/2, κ(0)]) and the Dominated Convergence The- orem, we find that F2 is differentiable, with F ′2(E) = −(κ−1)′(E/2) E − E/2 ∫ E−E/2 −(κ−1)′′(E − v)√ Another application of the Dominated Convergence Theorem shows that F ′2 is continuous, and so F2 ∈ C1([E , κ(0)− E ]). T1(Q,E1, δ) = −E−1/21 + F1(E1) + F2(E1) has the desired regularity. For future reference, we note that +O(δ). (3.18) Corollary 3.5.1. For all δ sufficiently small, the flow zδ0(t) restricted to the in- variant tori Mc = {h = c} is ergodic (with respect to the invariant Lebesgue measure) for almost every c ∈ U . Proof. The flow is ergodic whenever the periods T1 and T2 are irrationally related. Fix δ sufficiently small such that ∂T1 = −Q/E3/21 + O(δ) < 0. Next, consider Q, W , and E2 fixed, so that T2 is constant. Because T1 ∈ C1, it follows that, as we let E1 vary, /∈ Q for almost every E1. The result follows from Fubini’s Theorem. Proof of Lemma 3.4.2: Proof. For the duration of this proof, we consider the dynamics for a small, fixed value of δ > 0, which we generally suppress in our notation. For convenience, we take m1 = 2. Let ψ denote the map taking (Q,W, q1, v1, q2, v2) to (Q,W,E1, E2, ϕ1, ϕ2). We claim that ψ is a C1 change of coordinates on the domain of interest. Since E1 = v21 + κδ(q1) + κδ(Q− q1), E1 is a C2 function of q1, v1, and Q. A similar statement holds for E2. The angular coordinates ϕi(xi, vi, Q) are defined by Equation (3.12). We only consider ϕ1, as the statements for ϕ2 are similar. Then ϕ1(q1, v1, Q) is clearly C1 whenever q1 6= a,Q−a. The apparent difficulties in regularity at the turning points are only a result of how the definition of ϕ1 is presented in Equation (3.12). 
Recall that the angle variables are actually defined by integrating the elapsed time along orbits, and our previous definition expressed ϕ1 in a manner which emphasized the dependence on q1. In fact, whenever |v1| < ϕ1(q1, v1, Q) = (κ−1δ ) ′(E1 − v2)dv if q1 < δ (κ−1δ ) ′(E1 − v2)dv if q1 > Q− δ. (3.19) Here E1 is implicitly considered to be a function of q1, v1, and Q. One can verify that Dψ is non-degenerate on the domain of interest, and so ψ is indeed a C1 change of coordinates. Next observe that dϕ1,0/dt = 1/T1, so Hadamard’s Lemma implies that dϕ1,ε +O(εf(δ)). It remains to show that, in fact, we may take f(δ) = 1. It is easy to verify this whenever q1 ≤ Q−δ because dE1/dt = 0 there. We only perform the more difficult verification when q1 > Q− δ. When q1 > Q−δ, |v1| < E1 and E1 = v 1+κδ(Q−q1). From Equation (3.19) we find that T1(Q,E1, δ) (κ−1)′(E1 − v2)dv. (3.20) To find dϕ1/dt, we consider ϕ1 as a function of v1, Q, and E1, so that Then, using Equations (3.18) and (3.20), we compute (κ−1δ ) ′(E1 − v21) κ′δ(Q− q1) 1/2− ϕ1 (εW ) = εW 1/2− ϕ1 1/2− ϕ1 (κ−1)′′(E1 − v2)dv (εWκ′δ(Q− q1)). Using that κ′δ(Q− q1) = κ′(κ−1(E1 − v21))/δ = (δ(κ−1)′(E1 − v21))−1, we find that 1/2− ϕ1 (κ−1)′(E1 − v21) (κ−1)′′(E1 − v2)dv But here 1/2− ϕ1 is O(δ). See the proof of Lemma 3.4.3 below. Thus the claims about dϕ1/dt will be proven, provided we can uniformly bound (κ−1)′(E1 − v21) (κ−1)′′(E1 − v2)dv. Note that the apparent divergence of the integral as |v1| → E1 is entirely due to the fact that our expression for ϕ1 from Equation (3.20) requires |v1| < E1. If we make the substitution u = E1− v2 and let e = E1− v21 , then it suffices to show E≤E1≤κ(0)−E 0 0, we only consider the dynamics on the invariant subset of phase space defined by Mε = {(Q, V, qi,j, vi,j) ∈ R2d(n1+n2)+2 : Q ∈ [0, 1], qi,j ∈ Di(Q), Emin ≤ V 2 + E1 + E2 ≤ Emax}. Let Pε denote the probability measure obtained by restricting the invariant Liou- ville measure to Mε. 
Define the stopping time Tε(z) = Tε = inf{τ ≥ 0 : h̄(τ) ∉ V or hε(τ/ε) ∉ V}.

Theorem 4.1.1. If D is a gas container in d = 2 or 3 dimensions satisfying the assumptions in Subsection 4.1.1 above, then for each T > 0,
\[ \sup_{0\le\tau\le T\wedge T_\varepsilon} \bigl| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \bigr| \to 0 \quad \text{in probability as } \varepsilon = M^{-1/2} \to 0, \]
i.e. for each fixed δ > 0,
\[ P_\varepsilon\Bigl( \sup_{0\le\tau\le T\wedge T_\varepsilon} \bigl| h_\varepsilon(\tau/\varepsilon) - \bar h(\tau) \bigr| \ge \delta \Bigr) \to 0 \quad \text{as } \varepsilon = M^{-1/2} \to 0. \]

Remark 4.1.1. It should be noted that the stopping time in the above result is not unduly restrictive. If the initial pressures of the two gases are not too mismatched, then the solution to the averaged equation is a periodic orbit, with the effective potential well keeping the piston away from the walls. Thus, if the actual motions follow the averaged solution closely for 0 ≤ τ ≤ T ∧ Tε, and the averaged solution stays in V, it follows that Tε > T.

Remark 4.1.2. The techniques of this work should immediately generalize to prove the analogue of Theorem 4.1.1 above in the nonphysical dimensions d > 3, although we do not pursue this here.

Remark 4.1.3. As in Subsection 3.1.3, Theorem 4.1.1 can be easily generalized to cover a system of N − 1 pistons that divide N gas containers, so long as, for almost every fixed location of the pistons, the billiard flow of a single gas particle on an energy surface in any of the N subcontainers is ergodic (with respect to the invariant Liouville measure). The effective Hamiltonian for the pistons has them moving like an (N − 1)-dimensional particle inside a potential well.

4.2 Preparatory material concerning a two-dimensional gas container with only one gas particle on each side

Our results and techniques of proof are essentially independent of the dimension and the fixed number of gas particles on either side of the piston. Thus, we focus on the case when d = 2 and there is only one gas particle on either side.

[Figure 4.2: A choice of coordinates on phase space. Labels in the figure: subcontainers D1 and D2, piston of length ℓ, piston velocity V = εW, piston mass M = ε^{−2} ≫ 1.]
Later, in Section 4.4, we will indicate the simple modifications that generalize our proof to the general situation. For clarity, in this section and the next, we denote q1,1 by q1, v2,1 by v2, etc. We decompose the gas particle coordinates according to whether they are perpendicular to or parallel to the piston’s face, for example q1 = (q1⊥, q1∥). See Figure 4.2.

The Hamiltonian dynamics define a flow on our phase space. We denote this flow by zε(t, z) = zε(t), where z = zε(0, z). One should think of zε(·) as being a random variable that takes initial conditions in phase space to paths in phase space. Then hε(t) = h(zε(t)). By the change of coordinates W = V/ε, we may identify all of the Mε defined in Section 4.1 with the space
\[ \mathcal{M} = \{ (Q, W, q_1, v_1, q_2, v_2) \in \mathbb{R}^{10} : Q \in [0,1],\ q_1 \in D_1(Q),\ q_2 \in D_2(Q),\ E_{\min} \le W^2/2 + E_1 + E_2 \le E_{\max} \}, \]
and all of the Pε with the probability measure P on M, which has the density
\[ dP = \mathrm{const}\; dQ\, dW\, dq_1^\perp dq_1^\parallel\, dv_1^\perp dv_1^\parallel\, dq_2^\perp dq_2^\parallel\, dv_2^\perp dv_2^\parallel . \]
(Throughout this work we will use const to represent generic constants that are independent of ε.) We will assume that these identifications have been made, so that we may consider zε(·) as a family of measure preserving flows on the same space that all preserve the same probability measure. We denote the components of zε(t) by Qε(t), q⊥1,ε(t), etc.

The set {z ∈ M : q1⊥ = Q = q2⊥} has co-dimension two, and so $\cup_t z_\varepsilon(t)\{q_1^\perp = Q = q_2^\perp\}$ has co-dimension one, which shows that only a measure zero set of initial conditions will give rise to three particle collisions. We ignore this and other measure zero events, such as gas particles hitting singularities of the billiard flow, in what follows.

Now we present some background material, as well as some lemmas that will assist us in our proof of Theorem 4.1.1. We begin by studying the billiard flow of a gas particle when the piston is infinitely massive. Next we examine collisions between the gas particles and the piston when the piston has a large, but finite, mass.
Then we present a heuristic derivation of the averaged equation that is suggestive of our proof. Finally we prove a lemma that allows us to disregard the possibility that a gas particle will move nearly parallel to the piston’s face, a situation that is clearly bad for having the motions of the piston follow the solutions of the averaged equation.

4.2.1 Billiard flows and maps in two dimensions

In this section, we study the billiard flows of the gas particles when M = ∞ and the slow variables are held fixed at a specific value h ∈ V. We will only study the motions of the left gas particle, as similar definitions and results hold for the motions of the right gas particle. Thus we wish to study the billiard flow of a point particle moving inside the domain D1 at a constant speed √(2E1). The results of this section that are stated without proof can be found in [CM06a].

Let T D1 denote the tangent bundle to D1. The billiard flow takes place in the three-dimensional space M1h = M1 = {(q1, v1) ∈ T D1 : q1 ∈ D1, |v1| = √(2E1)}/∼. Here the quotient means that when q1 ∈ ∂D1, we identify velocity vectors pointing outside of D1 with those pointing inside D1 by reflecting through the tangent line to ∂D1 at q1, so that the angle of incidence with the unit normal vector to ∂D1 equals the angle of reflection. Note that most of the quantities defined in this subsection depend on the fixed value of h. We will usually suppress this dependence, although, when necessary, we will indicate it by a subscript h. We denote the resulting flow by y(t, y) = y(t), where y(0, y) = y. As the billiard flow comes from a Hamiltonian system, it preserves Liouville measure restricted to the energy surface. We denote the resulting probability measure by µ. This measure has the density dµ = dq1 dv1/(2π√(2E1) |D1|).
Here dq1 represents area on R2, and dv1 represents length on the circle {v1 ∈ R2 : |v1| = √(2E1)}.

There is a standard cross-section to the billiard flow, the collision cross-section Ω = {(q1, v1) ∈ T D1 : q1 ∈ ∂D1, |v1| = √(2E1)}/∼. It is customary to parameterize Ω by {x = (r, ϕ) : r ∈ ∂D1, ϕ ∈ [−π/2, +π/2]}, where r is arc length and ϕ represents the angle between the outgoing velocity vector and the inward pointing normal vector to ∂D1. It follows that Ω may be realized as the disjoint union of a finite number of rectangles and cylinders. The cylinders correspond to fixed scatterers with smooth boundary placed inside the gas container. If F : Ω → Ω is the collision map, i.e. the return map to the collision cross-section, then F preserves the projected probability measure ν, which has the density dν = cos ϕ dϕ dr/(2 |∂D1|). Here |∂D1| is the length of ∂D1.

We suppose that the flow is ergodic, and so F is an invertible, ergodic measure preserving transformation. Because ∂D1 is piecewise C3, F is piecewise C2, although it does have discontinuities and unbounded derivatives near discontinuities corresponding to grazing collisions. Because of our assumptions on D1, the free flight times and the curvature of ∂D1 are uniformly bounded. It follows that if x ∉ ∂Ω ∪ F−1(∂Ω), then F is differentiable at x, and
\[ \| DF(x) \| \le \frac{\mathrm{const}}{\cos\varphi(Fx)}, \tag{4.2} \]
where ϕ(Fx) is the value of the ϕ coordinate at the image of x.

Following the ideas in Section 4.5, we induce F on the subspace Ω̂ of Ω corresponding to collisions with the (immobile) piston. We denote the induced map by F̂ and the induced measure by ν̂. We parameterize Ω̂ by {(r, ϕ) : 0 ≤ r ≤ ℓ, ϕ ∈ [−π/2, +π/2]}. As νΩ̂ = ℓ/ |∂D1|, it follows that ν̂ has the density dν̂ = cos ϕ dϕ dr/(2ℓ).

For x ∈ Ω, define ζx to be the free flight time, i.e. the time it takes the billiard particle traveling at speed √(2E1) to travel from x to Fx. If x ∉ ∂Ω ∪ F−1(∂Ω), ‖Dζ(x)‖ ≤ const / cos ϕ(Fx).
(4.3) Santaló’s formula [San76, Che97] tells us that
\[ E_\nu \zeta = \frac{\pi |D_1|}{|v_1|\,|\partial D_1|}. \tag{4.4} \]
If ζ̂ : Ω̂ → R is the free flight time between collisions with the piston, then it follows from Proposition 4.5.1 that
\[ E_{\hat\nu} \hat\zeta = \frac{\pi |D_1|}{|v_1|\,\ell}. \tag{4.5} \]
The expected value of |v1⊥| when the left gas particle collides with the (immobile) piston is given by
\[ E_{\hat\nu} \bigl| v_1^\perp \bigr| = E_{\hat\nu} \sqrt{2E_1}\cos\varphi = \frac{\sqrt{2E_1}}{2} \int_{-\pi/2}^{+\pi/2} \cos^2\varphi \,d\varphi = \sqrt{2E_1}\,\frac{\pi}{4}. \tag{4.6} \]
We wish to compute $\lim_{t\to\infty} \frac{1}{t}\int_0^t |2v_1^\perp(s)|\,\delta_{q_1^\perp(s)=Q}\,ds$, the time average of the change in momentum of the left gas particle when it collides with the piston. If this limit exists and is equal for almost every initial condition of the left gas particle, then it makes sense to define the pressure inside D1 to be this quantity divided by ℓ. Because the collisions are hard-core, we cannot directly apply Birkhoff’s Ergodic Theorem to compute this limit. However, we can compute this limit by using the map F̂.

Lemma 4.2.1. If the billiard flow y(t) is ergodic, then for µ-a.e. y ∈ M1,
\[ \lim_{t\to\infty} \frac{1}{t} \int_0^t \bigl| v_1^\perp(s) \bigr|\,\delta_{q_1^\perp(s)=Q}\,ds = \frac{E_1 \ell}{2\,|D_1(Q)|}. \]

Proof. Because the billiard flow may be viewed as a suspension flow over the collision cross-section with ζ as the height function, it suffices to show that the convergence takes place for ν̂-a.e. x ∈ Ω̂. For an initial condition x ∈ Ω̂, define N̂t(x) = N̂t = #{s ∈ (0, t] : y(s, x) ∈ Ω̂}. By the Poincaré Recurrence Theorem, N̂t → ∞ as t → ∞, ν̂-a.e. But
\[ \frac{\hat N_t}{\sum_{n=0}^{\hat N_t} \hat\zeta(\hat F^n x)} \cdot \frac{1}{\hat N_t} \sum_{n=1}^{\hat N_t} \bigl| v_1^\perp \bigr|(\hat F^n x) \;\le\; \frac{1}{t} \int_0^t \bigl| v_1^\perp(s) \bigr|\,\delta_{q_1^\perp(s)=Q}\,ds \;\le\; \frac{\hat N_t}{\sum_{n=0}^{\hat N_t - 1} \hat\zeta(\hat F^n x)} \cdot \frac{1}{\hat N_t} \sum_{n=1}^{\hat N_t} \bigl| v_1^\perp \bigr|(\hat F^n x), \]
and so the result follows from Birkhoff’s Ergodic Theorem and Equations (4.5) and (4.6).

Corollary 4.2.2. If the billiard flow y(t) is ergodic, then for each δ > 0,
\[ \mu\Bigl\{ y \in \mathcal{M}^1 : \Bigl| \frac{1}{t} \int_0^t \bigl| v_1^\perp(s) \bigr|\,\delta_{q_1^\perp(s)=Q}\,ds - \frac{E_1 \ell}{2\,|D_1(Q)|} \Bigr| \ge \delta \Bigr\} \to 0 \quad \text{as } t \to \infty. \]

4.2.2 Analysis of collisions

In this section, we return to studying our piston system when ε > 0. We will examine what happens when a particle collides with the piston. For convenience, we will only examine in detail collisions between the piston and the left gas particle. Collisions with the right gas particle can be handled similarly.
When the left gas particle collides with the piston, v1⊥ and V instantaneously change according to the laws of elastic collisions:
\[ \begin{pmatrix} v_1^{\perp +} \\ V^+ \end{pmatrix} = \frac{1}{1+M} \begin{pmatrix} 1-M & 2M \\ 2 & M-1 \end{pmatrix} \begin{pmatrix} v_1^{\perp -} \\ V^- \end{pmatrix}. \]
In our coordinates, this becomes
\[ \begin{pmatrix} v_1^{\perp +} \\ W^+ \end{pmatrix} = \frac{1}{1+\varepsilon^2} \begin{pmatrix} \varepsilon^2 - 1 & 2\varepsilon \\ 2\varepsilon & 1-\varepsilon^2 \end{pmatrix} \begin{pmatrix} v_1^{\perp -} \\ W^- \end{pmatrix}. \tag{4.7} \]
Recalling that v1, W = O(1), we find that to first order in ε,
\[ v_1^{\perp +} = -v_1^{\perp -} + O(\varepsilon), \qquad W^+ = W^- + O(\varepsilon). \tag{4.8} \]
Observe that a collision can only take place if v1⊥− > εW−. In particular, v1⊥− > −ε√(2Emax). Thus, either v1⊥− > 0 or v1⊥− = O(ε). By expanding Equation (4.7) to second order in ε, it follows that
\[ E_1^+ - E_1^- = -2\varepsilon W \bigl| v_1^\perp \bigr| + O(\varepsilon^2), \qquad W^+ - W^- = +2\varepsilon \bigl| v_1^\perp \bigr| + O(\varepsilon^2). \tag{4.9} \]
Note that it is immaterial whether we use the pre-collision or post-collision values of W and |v1⊥| on the right hand side of Equation (4.9), because any ambiguity can be absorbed into the O(ε2) term.

It is convenient for us to define a “clean collision” between the piston and the left gas particle:

Definition 4.2.1. The left gas particle experiences a clean collision with the piston if and only if v1⊥− > 0 and v1⊥+ < −ε√(2Emax).

In particular, after a clean collision, the left gas particle will escape from the piston, i.e. the left gas particle will have to move into the region q1⊥ ≤ 0 before it can experience another collision with the piston. It follows that there exists a constant C1 > 0, which depends on the set V, such that for all ε sufficiently small, so long as Q ≥ Qmin and |v1⊥| > εC1 when q1⊥ ∈ [Qmin, Q], then the left gas particle will experience only clean collisions with the piston, and the time between these collisions will be greater than 2Qmin/√(2Emax). (Note that when we write expressions such as q1⊥ ∈ [Qmin, Q], we implicitly mean that q1 is positioned inside the “tube” discussed at the beginning of Section 4.1.) One can verify that C1 = 5√(2Emax) would work. Similarly, we can define clean collisions between the right gas particle and the piston.
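The expansions quoted in Equations (4.8) and (4.9) can be checked by hand. The sketch below assumes that Equation (4.7) is the standard elastic collision law for a gas particle of unit mass and a piston of mass M = ε^{−2}, written in the variable W = V/ε, and that E1 = |v1|²/2.

```latex
% Assumed form of Equation (4.7) (unit-mass gas particle, piston mass M = \varepsilon^{-2}):
\[
  v_1^{\perp +} = \frac{(\varepsilon^2 - 1)\,v_1^{\perp -} + 2\varepsilon W^-}{1+\varepsilon^2},
  \qquad
  W^+ = \frac{2\varepsilon\,v_1^{\perp -} + (1-\varepsilon^2)\,W^-}{1+\varepsilon^2}.
\]
% Expand with 1/(1+\varepsilon^2) = 1 - \varepsilon^2 + O(\varepsilon^4):
\[
  v_1^{\perp +} = -v_1^{\perp -} + 2\varepsilon W^- + O(\varepsilon^2),
  \qquad
  W^+ = W^- + 2\varepsilon\,v_1^{\perp -} + O(\varepsilon^2).
\]
% Only the perpendicular velocity component changes at a collision, so
\[
  E_1^+ - E_1^- = \tfrac12\Bigl[ \bigl(v_1^{\perp +}\bigr)^2 - \bigl(v_1^{\perp -}\bigr)^2 \Bigr]
  = -2\varepsilon W^- v_1^{\perp -} + O(\varepsilon^2).
\]
```

Since either v1⊥− > 0 or v1⊥− = O(ε), replacing v1⊥− by |v1⊥| changes these expressions only at order ε², which recovers the form stated in Equation (4.9). As a consistency check, the O(ε) changes in E1 and in W²/2 cancel, as they must for an elastic collision.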
We assume that C1 was chosen sufficiently large such that for all ε sufficiently small, so long as Q ≤ Qmax and |v2⊥| > εC1 when q2⊥ ∈ [Q, Qmax], then the right gas particle will experience only clean collisions with the piston.

Now we define three more stopping times, which are functions of the initial conditions in phase space.
T′ε = inf{τ ≥ 0 : Qmin ≤ q⊥1,ε(τ/ε) ≤ Qε(τ/ε) ≤ Qmax and |v⊥1,ε(τ/ε)| ≤ C1ε},
T′′ε = inf{τ ≥ 0 : Qmin ≤ Qε(τ/ε) ≤ q⊥2,ε(τ/ε) ≤ Qmax and |v⊥2,ε(τ/ε)| ≤ C1ε},
T̃ε = T ∧ Tε ∧ T′ε ∧ T′′ε.
Define H(z) by
\[ H(z) = \begin{pmatrix} W \\ 2\bigl| v_1^\perp \bigr|\,\delta_{q_1^\perp = Q} - 2\bigl| v_2^\perp \bigr|\,\delta_{q_2^\perp = Q} \\ -2W \bigl| v_1^\perp \bigr|\,\delta_{q_1^\perp = Q} \\ 2W \bigl| v_2^\perp \bigr|\,\delta_{q_2^\perp = Q} \end{pmatrix}. \]
Here we make use of Dirac delta functions. All integrals involving these delta functions may be replaced by sums. The following lemma is an immediate consequence of Equation (4.9) and the above discussion:

Lemma 4.2.3. If 0 ≤ t1 ≤ t2 ≤ T̃ε/ε, the piston experiences O((t2 − t1) ∨ 1) collisions with gas particles in the time interval [t1, t2], all of which are clean collisions. Furthermore,
\[ h_\varepsilon(t_2) - h_\varepsilon(t_1) = O(\varepsilon) + \varepsilon \int_{t_1}^{t_2} H(z_\varepsilon(s))\,ds. \]
Here any ambiguities arising from collisions occurring at the limits of integration can be absorbed into the O(ε) term.

4.2.3 Another heuristic derivation of the averaged equation

The following heuristic derivation of Equation (4.1) when d = 2 was suggested in [Dol05]. Let ∆t be a length of time long enough such that the piston experiences many collisions with the gas particles, but short enough such that the slow variables change very little, in this time interval. From each collision with the left gas particle, Equation (4.9) states that W changes by an amount +2ε|v1⊥| + O(ε2), and from Equation (4.6) the average change in W at these collisions should be approximately επ√(2E1)/2 + O(ε2). From Equation (4.5) the frequency of these collisions is approximately √(2E1) ℓ/(π |D1|). Arguing similarly for collisions with the other particle, we guess that
\[ \frac{\Delta W}{\Delta t} = \varepsilon\,\frac{E_1 \ell}{|D_1(Q)|} - \varepsilon\,\frac{E_2 \ell}{|D_2(Q)|} + O(\varepsilon^2). \]
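The arithmetic behind this guess can be written out explicitly. The sketch below takes the average value √(2E1) π/4 of |v1⊥| at a collision from Equation (4.6) and the collision frequency √(2E1) ℓ/(π|D1|) from Equation (4.5) as given, and treats successive collisions as if the average change per collision and the frequency simply multiply, which is a heuristic assumption rather than a proved statement.

```latex
% Heuristic: (average change per collision) x (collision frequency), left particle only.
\[
  \underbrace{2\varepsilon \cdot \frac{\pi\sqrt{2E_1}}{4}}_{\text{average change in } W}
  \times
  \underbrace{\frac{\sqrt{2E_1}\,\ell}{\pi\,|D_1(Q)|}}_{\text{collisions per unit time}}
  = \varepsilon\,\frac{E_1 \ell}{|D_1(Q)|} .
\]
% The same bookkeeping for E_1, whose change per collision is -2\varepsilon W |v_1^\perp| by (4.9):
\[
  -2\varepsilon W \cdot \frac{\pi\sqrt{2E_1}}{4}
  \times \frac{\sqrt{2E_1}\,\ell}{\pi\,|D_1(Q)|}
  = -\varepsilon\,\frac{W E_1 \ell}{|D_1(Q)|} .
\]
```

The right particle contributes the analogous terms with D2 and E2 and the opposite signs, since its collisions push the piston in the other direction.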
With τ = εt as the slow time, a reasonable guess for the averaged equation for W is
\[ \frac{dW}{d\tau} = \frac{E_1 \ell}{|D_1(Q)|} - \frac{E_2 \ell}{|D_2(Q)|}. \]
Similar arguments for the other slow variables lead to the averaged equation (4.1), and this explains why we used Pi = Ei/ |Di| for the pressure of a 2-dimensional gas in Section 1.2.

There is a similar heuristic derivation of the averaged equation in d > 2 dimensions. Compare the analogues of Equations (4.5) and (4.6) in Subsection 4.4.2.

4.2.4 A priori estimate on the size of a set of bad initial conditions

In this section, we give an a priori estimate on the size of a set of initial conditions that should not give rise to orbits for which sup0≤τ≤T∧Tε |hε(τ/ε) − h̄(τ)| is small. In particular, when proving Theorem 4.1.1, it is convenient to focus on orbits that only contain clean collisions with the piston. Thus, we show that P{T̃ε < T ∧ Tε} vanishes as ε → 0. At first, this result may seem surprising, since P{T′ε ∧ T′′ε = 0} = O(ε), and one would expect $\cup_{t=0}^{T/\varepsilon} z_\varepsilon(-t)\{T'_\varepsilon \wedge T''_\varepsilon = 0\}$ to have a size of order 1. However, the rate at which orbits escape from {T′ε ∧ T′′ε = 0} is very small, and so we can prove the following:

Lemma 4.2.4. P{T̃ε < T ∧ Tε} = O(ε).

In some sense, this lemma states that the probability of having a gas particle move nearly parallel to the piston’s face within the time interval [0, T/ε], when one would expect the other gas particle to force the piston to move on a macroscopic scale, vanishes as ε → 0. Thus, one can hope to control the occurrence of the “nondiffusive fluctuations” of the piston described in [CD06a] on a time scale O(ε−1).

Proof. As the left and the right gas particles can be handled similarly, it suffices to show that P{T′ε < T} = O(ε). Define Bε = {z ∈ M : Qmin ≤ q1⊥ ≤ Q ≤ Qmax and |v1⊥| ≤ C1ε}.
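Then $\{T'_\varepsilon < T\} \subset \bigcup_{t=0}^{T/\varepsilon} z_\varepsilon(-t)B_\varepsilon$, and if $\gamma = Q_{\min}/\sqrt{8E_{\max}}$,
\[
\begin{aligned}
P\Big(\bigcup_{t=0}^{T/\varepsilon} z_\varepsilon(-t)B_\varepsilon\Big) &= P\Big(\bigcup_{t=0}^{T/\varepsilon} z_\varepsilon(t)B_\varepsilon\Big) = P\Big(B_\varepsilon \cup \bigcup_{t=0}^{T/\varepsilon} \big((z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon\big)\Big)\\
&\le PB_\varepsilon + P\Big(\bigcup_{k=0}^{T/(\varepsilon\gamma)} z_\varepsilon(k\gamma)\Big[\bigcup_{t=0}^{\gamma} (z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon\Big]\Big)\\
&\le PB_\varepsilon + \Big(\frac{T}{\varepsilon\gamma} + 1\Big) P\Big(\bigcup_{t=0}^{\gamma} (z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon\Big).
\end{aligned}
\]
Now $PB_\varepsilon = O(\varepsilon)$, so if we can show that $P\big(\bigcup_{t=0}^{\gamma}(z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon\big) = O(\varepsilon^2)$, then it will follow that $P\{T'_\varepsilon < T\} = O(\varepsilon)$.

If $z \in \bigcup_{t=0}^{\gamma}(z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon$, it is still true that $|v_1^\perp| = O(\varepsilon)$. This is because $|v_1^\perp|$ changes by at most $O(\varepsilon)$ at the collisions, and if a collision forces $|v_1^\perp| > C_1\varepsilon$, then the gas particle must escape to the region $q_1^\perp \le 0$ before $v_1^\perp$ can change again, and this will take time greater than $\gamma$. Furthermore, if $z \in \bigcup_{t=0}^{\gamma}(z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon$, then at least one of the following four possibilities must hold:
• $|q_1^\perp - Q_{\min}| \le O(\varepsilon)$,
• $|Q - Q_{\min}| \le O(\varepsilon)$,
• $|Q - Q_{\max}| \le O(\varepsilon)$,
• $|Q - q_1^\perp| \le O(\varepsilon)$.
It follows that $P\big(\bigcup_{t=0}^{\gamma}(z_\varepsilon(t)B_\varepsilon)\setminus B_\varepsilon\big) = O(\varepsilon^2)$. For example,
\[
\int_{\mathcal{M}} 1_{\{|v_1^\perp| \le O(\varepsilon),\ |q_1^\perp - Q_{\min}| \le O(\varepsilon)\}}\,dP = \mathrm{const} \int_{\{E_{\min} \le W^2/2 + v_1^2/2 + v_2^2/2 \le E_{\max}\}} 1_{\{|v_1^\perp| \le O(\varepsilon)\}}\,dW\,dv_1\,dv_2 \times \int_{\{Q \in [0,1],\ q_1 \in D_1,\ q_2 \in D_2\}} 1_{\{|q_1^\perp - Q_{\min}| \le O(\varepsilon)\}}\,dQ\,dq_1\,dq_2 = O(\varepsilon^2).
\]

4.3 Proof of the main result for two-dimensional gas containers with only one gas particle on each side

As in Section 4.2, we continue with the case when $d = 2$ and there is only one gas particle on either side of the piston.

4.3.1 Main steps in the proof of convergence in probability

By Lemma 4.2.4, it suffices to show that $\sup_{0 \le \tau \le \tilde T_\varepsilon} |h_\varepsilon(\tau/\varepsilon) - \bar h(\tau)| \to 0$ in probability as $\varepsilon = M^{-1/2} \to 0$. Several of the ideas in the steps below were inspired by a recent proof of Anosov's averaging theorem for smooth systems that is due to Dolgopyat [Dol05].

Step 1: Reduction using Gronwall's Inequality. Observe that $\bar h(\tau)$ satisfies the integral equation
\[
\bar h(\tau) - \bar h(0) = \int_0^\tau \bar H(\bar h(\sigma))\,d\sigma,
\]
while from Lemma 4.2.3,
\[
\begin{aligned}
h_\varepsilon(\tau/\varepsilon) - h_\varepsilon(0) &= O(\varepsilon) + \varepsilon\int_0^{\tau/\varepsilon} H(z_\varepsilon(s))\,ds\\
&= O(\varepsilon) + \varepsilon\int_0^{\tau/\varepsilon} \big(H(z_\varepsilon(s)) - \bar H(h_\varepsilon(s))\big)\,ds + \int_0^\tau \bar H(h_\varepsilon(\sigma/\varepsilon))\,d\sigma
\end{aligned}
\]
for $0 \le \tau \le \tilde T_\varepsilon$. Define
\[
e_\varepsilon(\tau) = \varepsilon\int_0^{\tau/\varepsilon} \big(H(z_\varepsilon(s)) - \bar H(h_\varepsilon(s))\big)\,ds .
\]
It follows from Gronwall's Inequality that
\[
\sup_{0 \le \tau \le \tilde T_\varepsilon} |h_\varepsilon(\tau/\varepsilon) - \bar h(\tau)| \le \Big(O(\varepsilon) + \sup_{0 \le \tau \le \tilde T_\varepsilon} |e_\varepsilon(\tau)|\Big)\, e^{\mathrm{Lip}(\bar H|_{\mathcal{V}})\,T}. \tag{4.10}
\]
Gronwall's Inequality is usually stated for continuous paths, but the standard proof (found in [SV85]) still works for paths that are merely integrable, and $|h_\varepsilon(\tau/\varepsilon) - \bar h(\tau)|$ is piecewise smooth.

Step 2: Introduction of a time scale for ergodization. Let $L(\varepsilon)$ be a real valued function such that $L(\varepsilon) \to \infty$, but $L(\varepsilon) \ll \log \varepsilon^{-1}$, as $\varepsilon \to 0$. In Section 4.3.2 we will place precise restrictions on the growth rate of $L(\varepsilon)$. Think of $L(\varepsilon)$ as a time scale that grows as $\varepsilon \to 0$ so that ergodization, i.e. the convergence along an orbit of a function's time average to a space average, can take place. However, $L(\varepsilon)$ does not grow too fast, so that on this time scale $z_\varepsilon(t)$ essentially stays on the submanifold $\{h = h_\varepsilon(0)\}$, where we have our ergodicity assumption. Set $t_{k,\varepsilon} = kL(\varepsilon)$, so that
\[
\sup_{0 \le \tau \le \tilde T_\varepsilon} |e_\varepsilon(\tau)| \le O(\varepsilon L(\varepsilon)) + \varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \Big| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(H(z_\varepsilon(s)) - \bar H(h_\varepsilon(s))\big)\,ds \Big| . \tag{4.11}
\]

Step 3: A splitting according to particles. Now $H(z) - \bar H(h(z))$ divides into two pieces, each of which depends on only one gas particle when the piston is held fixed:
\[
H(z) - \bar H(h(z)) = \begin{bmatrix} 0 \\ 2|v_1^\perp|\,\delta_{q_1^\perp = Q} - \frac{E_1\ell}{|D_1(Q)|} \\ -2W|v_1^\perp|\,\delta_{q_1^\perp = Q} + \frac{WE_1\ell}{|D_1(Q)|} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{E_2\ell}{|D_2(Q)|} - 2|v_2^\perp|\,\delta_{q_2^\perp = Q} \\ 0 \\ -\frac{WE_2\ell}{|D_2(Q)|} + 2W|v_2^\perp|\,\delta_{q_2^\perp = Q} \end{bmatrix}.
\]
We will only deal with the piece depending on the left gas particle, as the right particle can be handled similarly. Define
\[
G(z) = |v_1^\perp|\,\delta_{q_1^\perp = Q}, \qquad \bar G(h) = \frac{E_1\ell}{2|D_1(Q)|}. \tag{4.12}
\]
Returning to Equation (4.11), we see that in order to prove Theorem 4.1.1, it suffices to show that both
\[
\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \Big| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds \Big| \quad\text{and}\quad \varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \Big| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} W_\varepsilon(s)\big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds \Big|
\]
converge to 0 in probability as $\varepsilon \to 0$.

Step 4: A splitting for using the triangle inequality. Now we let $z_{k,\varepsilon}(s)$ be the orbit of the $\varepsilon = 0$ Hamiltonian vector field satisfying $z_{k,\varepsilon}(t_{k,\varepsilon}) = z_\varepsilon(t_{k,\varepsilon})$. Set $h_{k,\varepsilon}(t) = h(z_{k,\varepsilon}(t))$. Observe that $h_{k,\varepsilon}(t)$ is independent of $t$. We emphasize that so long as $0 \le t \le \tilde T_\varepsilon/\varepsilon$, the times between collisions of a specific gas particle and the piston are uniformly bounded away from 0, as explained before Lemma 4.2.3.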
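It follows that, so long as $t_{k+1,\varepsilon} \le \tilde T_\varepsilon/\varepsilon$,
\[
\sup_{t_{k,\varepsilon} \le t \le t_{k+1,\varepsilon}} |h_{k,\varepsilon}(t) - h_\varepsilon(t)| = O(\varepsilon L(\varepsilon)). \tag{4.13}
\]
This is because the slow variables change by at most $O(\varepsilon)$ at collisions, and $dQ_\varepsilon/dt = O(\varepsilon)$. Also,
\[
\int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} W_\varepsilon(s)\big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds = O(\varepsilon L(\varepsilon)^2) + W_{k,\varepsilon}(t_{k,\varepsilon}) \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds,
\]
and so
\[
\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \Big| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} W_\varepsilon(s)\big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds \Big| \le O(\varepsilon L(\varepsilon)) + \varepsilon\,\mathrm{const} \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \Big| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds \Big| .
\]
Thus, in order to prove Theorem 4.1.1, it suffices to show that
\[
\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \Big| \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(G(z_\varepsilon(s)) - \bar G(h_\varepsilon(s))\big)\,ds \Big| \le \varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} \big(|I_{k,\varepsilon}| + |II_{k,\varepsilon}| + |III_{k,\varepsilon}|\big)
\]
converges to 0 in probability as $\varepsilon \to 0$, where
\[
\begin{aligned}
I_{k,\varepsilon} &= \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(G(z_\varepsilon(s)) - G(z_{k,\varepsilon}(s))\big)\,ds,\\
II_{k,\varepsilon} &= \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(G(z_{k,\varepsilon}(s)) - \bar G(h_{k,\varepsilon}(s))\big)\,ds,\\
III_{k,\varepsilon} &= \int_{t_{k,\varepsilon}}^{t_{k+1,\varepsilon}} \big(\bar G(h_{k,\varepsilon}(s)) - \bar G(h_\varepsilon(s))\big)\,ds .
\end{aligned}
\]
The term $II_{k,\varepsilon}$ represents an "ergodicity term" that can be controlled by our assumptions on the ergodicity of the flow $z_0(t)$, while the terms $I_{k,\varepsilon}$ and $III_{k,\varepsilon}$ represent "continuity terms" that can be controlled by controlling the drift of $z_\varepsilon(t)$ from $z_{k,\varepsilon}(t)$ for $t_{k,\varepsilon} \le t \le t_{k+1,\varepsilon}$.

Step 5: Control of drift from the $\varepsilon = 0$ orbits. Now $\bar G$ is uniformly Lipschitz on the compact set $\mathcal{V}$, and so it follows from Equation (4.13) that $III_{k,\varepsilon} = O(\varepsilon L(\varepsilon)^2)$. Thus,
\[
\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} |III_{k,\varepsilon}| = O(\varepsilon L(\varepsilon)) \to 0 \quad\text{as } \varepsilon \to 0.
\]
Next, we show that for fixed $\delta > 0$,
\[
P\Big(\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} |I_{k,\varepsilon}| \ge \delta\Big) \to 0 \quad\text{as } \varepsilon \to 0.
\]
For initial conditions $z \in \mathcal{M}$ and for integers $k \in [0, T/(\varepsilon L(\varepsilon)) - 1]$ define
\[
A_{k,\varepsilon} = \Big\{ z : \frac{1}{L(\varepsilon)} |I_{k,\varepsilon}| > \frac{\delta}{2T} \text{ and } k \le \frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1 \Big\}, \qquad A_{z,\varepsilon} = \{k : z \in A_{k,\varepsilon}\}.
\]
Think of these sets as describing "poor continuity" between solutions of the $\varepsilon = 0$ and the $\varepsilon > 0$ Hamiltonian vector fields. For example, roughly speaking, $z \in A_{k,\varepsilon}$ if the orbit $z_\varepsilon(t)$ starting at $z$ does not closely follow $z_{k,\varepsilon}(t)$ for $t_{k,\varepsilon} \le t \le t_{k+1,\varepsilon}$. One can easily check that $|I_{k,\varepsilon}| \le O(L(\varepsilon))$ for $k \le \tilde T_\varepsilon/(\varepsilon L(\varepsilon)) - 1$, and so it follows that
\[
\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} |I_{k,\varepsilon}| \le \frac{\delta}{2} + O\big(\varepsilon L(\varepsilon)\,\#(A_{z,\varepsilon})\big).
\]
Therefore it suffices to show that $P\big(\#(A_{z,\varepsilon}) \ge \delta(\mathrm{const}\,\varepsilon L(\varepsilon))^{-1}\big) \to 0$ as $\varepsilon \to 0$. By Chebyshev's Inequality, we need only show that
\[
E_P\big(\varepsilon L(\varepsilon)\,\#(A_{z,\varepsilon})\big) = \varepsilon L(\varepsilon) \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)} - 1} P(A_{k,\varepsilon})
\]
tends to 0 with $\varepsilon$. Observe that $z_\varepsilon(t_{k,\varepsilon})A_{k,\varepsilon} \subset A_{0,\varepsilon}$.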
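In words, the initial conditions giving rise to orbits that are "bad" on the time interval $[t_{k,\varepsilon}, t_{k+1,\varepsilon}]$, moved forward by time $t_{k,\varepsilon}$, are initial conditions giving rise to orbits which are "bad" on the time interval $[t_{0,\varepsilon}, t_{1,\varepsilon}]$. Because the flow $z_\varepsilon(\cdot)$ preserves the measure, we find that
\[
\varepsilon L(\varepsilon) \sum_{k=0}^{\frac{T}{\varepsilon L(\varepsilon)} - 1} P(A_{k,\varepsilon}) \le \mathrm{const}\,P(A_{0,\varepsilon}).
\]
To estimate $P(A_{0,\varepsilon})$, it is convenient to use a different probability measure, which is uniformly equivalent to $P$ on the set $\{z \in \mathcal{M} : h(z) \in \mathcal{V}\} \supset \{\tilde T_\varepsilon \ge \varepsilon L(\varepsilon)\}$. We denote this new probability measure by $P^f$, where the $f$ stands for "factor." If we choose coordinates on $\mathcal{M}$ by using $h$ and the billiard coordinates on the two gas particles, then $P^f$ is defined on $\mathcal{M}$ by $dP^f = dh\,d\mu_h^1\,d\mu_h^2$, where $dh$ represents the uniform measure on $\mathcal{V} \subset \mathbb{R}^4$, and the factor measure $d\mu_h^i$ represents the invariant billiard measure of the $i$th gas particle coordinates for a fixed value of the slow variables. One can verify that $1_{\{h(z) \in \mathcal{V}\}}\,dP \le \mathrm{const}\,dP^f$, but that $P^f$ is not invariant under the flow $z_\varepsilon(\cdot)$ when $\varepsilon > 0$.

We abuse notation, and consider $\mu_h^1$ to be a measure on the left particle's initial billiard coordinates once $h$ and the initial coordinates of the right gas particle are fixed. In this context, $\mu_h^1$ is simply the measure $\mu$ from Subsection 4.2.1. Then
\[
P^f(A_{0,\varepsilon}) = \int dh\,d\mu_h^2 \cdot \mu_h^1\Big\{ \frac{1}{L(\varepsilon)} \Big| \int_0^{L(\varepsilon)} \big(G(z_\varepsilon(s)) - G(z_0(s))\big)\,ds \Big| > \frac{\delta}{2T} \text{ and } \varepsilon L(\varepsilon) \le \tilde T_\varepsilon \Big\},
\]
and we must show that the last term tends to 0 with $\varepsilon$. By the Bounded Convergence Theorem, it suffices to show that for almost every $h \in \mathcal{V}$ and initial condition for the right gas particle,
\[
\mu_h^1\Big\{ \frac{1}{L(\varepsilon)} \Big| \int_0^{L(\varepsilon)} \big(G(z_\varepsilon(s)) - G(z_0(s))\big)\,ds \Big| > \frac{\delta}{2T} \text{ and } \varepsilon L(\varepsilon) \le \tilde T_\varepsilon \Big\} \to 0 \quad\text{as } \varepsilon \to 0. \tag{4.14}
\]
Note that if $G$ were a smooth function and $z_\varepsilon(\cdot)$ were the flow of a smooth family of vector fields $Z(z, \varepsilon)$ that depended smoothly on $\varepsilon$, then from Gronwall's Inequality, it would follow that $\sup_{0 \le t \le L(\varepsilon)} |z_\varepsilon(t) - z_0(t)| \le O(\varepsilon L(\varepsilon) e^{\mathrm{Lip}(Z)L(\varepsilon)})$. If this were the case, then $L(\varepsilon)^{-1} \int_0^{L(\varepsilon)} \big(G(z_\varepsilon(s)) - G(z_0(s))\big)\,ds = O(\varepsilon L(\varepsilon) e^{\mathrm{Lip}(Z)L(\varepsilon)})$, which would tend to 0 with $\varepsilon$.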
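To see why the requirement L(ε) ≪ log ε⁻¹ matters in this smooth Gronwall bound, the following sketch evaluates ε L e^{cL} for two growth rates. It is only an illustration: the choice L(ε) = log log ε⁻¹ is used here as one example of sublogarithmic growth, the constant c stands in for Lip(Z), and all function names are hypothetical.

```python
import math


def gronwall_bound(eps, L, lip):
    """Value of eps * L * exp(lip * L), the smooth Gronwall-type bound."""
    return eps * L * math.exp(lip * L)


def bound_sublog(eps, lip=3.0):
    """Bound with the illustrative choice L(eps) = log(log(1/eps))."""
    L = math.log(math.log(1.0 / eps))
    return gronwall_bound(eps, L, lip)


def bound_log(eps, lip=3.0):
    """Bound with L(eps) = log(1/eps), which grows too fast when lip >= 1."""
    L = math.log(1.0 / eps)
    return gronwall_bound(eps, L, lip)


if __name__ == "__main__":
    for p in (2, 4, 8, 16):
        eps = 10.0 ** (-p)
        print(f"eps=1e-{p}: sublog={bound_sublog(eps):.3e}, log={bound_log(eps):.3e}")
```

With L = log log ε⁻¹ the factor e^{cL} equals (log ε⁻¹)^c, so the bound is ε (log ε⁻¹)^c log log ε⁻¹ and tends to 0 for every fixed c. With L = log ε⁻¹ the bound is ε^{1−c} log ε⁻¹, which diverges once c ≥ 1. For billiards neither G nor the flow is smooth, so this computation is only a guide to the choice of time scale.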
Thus, we need a Gronwall-type inequality for billiard flows. We obtain the appropriate estimates in Section 4.3.2.

Step 6: Use of ergodicity along fibers to control $II_{k,\varepsilon}$. All that remains to be shown is that for fixed $\delta > 0$,
\[
P\Big(\varepsilon \sum_{k=0}^{\frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1} |II_{k,\varepsilon}| \ge \delta\Big) \to 0 \quad\text{as } \varepsilon \to 0.
\]
For initial conditions $z \in \mathcal{M}$ and for integers $k \in [0, T/(\varepsilon L(\varepsilon)) - 1]$ define
\[
B_{k,\varepsilon} = \Big\{ z : \frac{1}{L(\varepsilon)} |II_{k,\varepsilon}| > \frac{\delta}{2T} \text{ and } k \le \frac{\tilde T_\varepsilon}{\varepsilon L(\varepsilon)} - 1 \Big\}, \qquad B_{z,\varepsilon} = \{k : z \in B_{k,\varepsilon}\}.
\]
Think of these sets as describing "bad ergodization." For example, roughly speaking, $z \in B_{k,\varepsilon}$ if the orbit $z_\varepsilon(t)$ starting at $z$ spends the time between $t_{k,\varepsilon}$ and $t_{k+1,\varepsilon}$ in a region of phase space where the function $G(\cdot)$ is "poorly ergodized" on the time scale $L(\varepsilon)$ by the flow $z_0(t)$ (as measured by the parameter $\delta/2T$). Note that $G(z) = |v_1^\perp|\,\delta_{q_1^\perp = Q}$ is not really a function, but that we may still speak of the convergence of $t^{-1}\int_0^t G(z_0(s))\,ds$ as $t \to \infty$. As we showed in Lemma 4.2.1, the limit is $\bar G(h_0)$ for almost every initial condition.

Proceeding as in Step 5 above, we find that it suffices to show that for almost every $h \in \mathcal{V}$,
\[
\mu_h^1\Big\{ \Big| \frac{1}{t}\int_0^t G(z_0(s))\,ds - \bar G(h_0(0)) \Big| \ge \frac{\delta}{2T} \Big\} \to 0 \quad\text{as } t \to \infty.
\]
But this is simply a question of examining billiard flows, and it follows immediately from Corollary 4.2.2 and our Main Assumption.

4.3.2 A Gronwall-type inequality for billiards

We begin by presenting a general version of Gronwall's Inequality for billiard maps. Then we will show how these results imply the convergence required in Equation (4.14).

Some inequalities for the collision map

In this section, we consider the value of the slow variables to be fixed at $h_0 \in \mathcal{V}$. We will use the notation and results presented in Section 4.2.1, but because the value of the slow variables is fixed, we will omit it in our notation. Let $\rho$, $\gamma$, and $\lambda$ satisfy $0 < \rho \ll \gamma \ll 1 \ll \lambda < \infty$. Eventually, these quantities will be chosen to depend explicitly on $\varepsilon$, but for now they are fixed. Recall that the phase space $\Omega$ for the collision map $F$ is a finite union of disjoint rectangles and cylinders.
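Let $d(\cdot, \cdot)$ be the Euclidean metric on connected components of $\Omega$. If $x$ and $x'$ belong to different components, then we set $d(x, x') = \infty$. The invariant measure $\nu$ satisfies $\nu < \mathrm{const} \cdot (\text{Lebesgue measure})$. For $A \subset \Omega$ and $a > 0$, let $N_a(A) = \{x \in \Omega : d(x, A) < a\}$ be the $a$-neighborhood of $A$. For $x \in \Omega$ let $x_k(x) = x_k = F^k x$, $k \ge 0$, be its forward orbit. Suppose $x \notin C_{\gamma,\lambda}$, where
\[
C_{\gamma,\lambda} = \Big(\bigcup_{k=0}^{\lambda} F^{-k} N_\gamma(\partial\Omega)\Big) \cup \Big(\bigcup_{k=0}^{\lambda} F^{-k} N_\gamma(F^{-1}N_\gamma(\partial\Omega))\Big).
\]
Thus for $0 \le k \le \lambda$, $x_k$ is well defined, and from Equation (4.2) it satisfies
\[
d(x', x_k) \le \gamma \ \Rightarrow\ d(Fx', x_{k+1}) \le \frac{\mathrm{const}}{\gamma}\, d(x', x_k). \tag{4.15}
\]
Next, we consider any $\rho$-pseudo-orbit $x'_k$ obtained from $x$ by adding on an error of size $\le \rho$ at each application of the map, i.e. $d(x'_0, x_0) \le \rho$, and for $k \ge 1$, $d(x'_k, Fx'_{k-1}) \le \rho$. Provided $d(x_j, x'_j) < \gamma$ for each $j < k$, it follows that
\[
d(x_k, x'_k) \le \rho \sum_{j=0}^{k} \Big(\frac{\mathrm{const}}{\gamma}\Big)^j \le \mathrm{const}\,\rho \Big(\frac{\mathrm{const}}{\gamma}\Big)^k. \tag{4.16}
\]
In particular, if $\rho$, $\gamma$, and $\lambda$ were chosen such that
\[
\mathrm{const}\,\rho \Big(\frac{\mathrm{const}}{\gamma}\Big)^\lambda < \gamma, \tag{4.17}
\]
then Equation (4.16) will hold for each $k \le \lambda$. We assume that Equation (4.17) is true. Then we can also control the differences in elapsed flight times using Equation (4.3):
\[
|\zeta x_k - \zeta x'_k| \le \frac{\mathrm{const}\,\rho}{\gamma} \Big(\frac{\mathrm{const}}{\gamma}\Big)^k. \tag{4.18}
\]
It remains to estimate the size $\nu C_{\gamma,\lambda}$ of the set of $x$ for which the above estimates do not hold. Using Lemma 4.3.1 below,
\[
\nu C_{\gamma,\lambda} \le (\lambda + 1)\big(\nu N_\gamma(\partial\Omega) + \nu N_\gamma(F^{-1}N_\gamma(\partial\Omega))\big) \le O(\lambda(\gamma + \gamma^{1/3})) = O(\lambda\gamma^{1/3}). \tag{4.19}
\]

Lemma 4.3.1. As $\gamma \to 0$, $\nu N_\gamma(F^{-1}N_\gamma(\partial\Omega)) = O(\gamma^{1/3})$.

This estimate is not necessarily the best possible. For example, for dispersing billiard tables, where the curvature of the boundary is positive, one can show that $\nu N_\gamma(F^{-1}N_\gamma(\partial\Omega)) = O(\gamma)$. However, the estimate in Lemma 4.3.1 is general and sufficient for our needs.

Proof. First, we note that it is equivalent to estimate $\nu N_\gamma(FN_\gamma(\partial\Omega))$, as $F$ has the measure-preserving involution $I(r, \varphi) = (r, -\varphi)$, i.e. $F^{-1} = I \circ F \circ I$ [CM06b]. Fix $\alpha \in (0, 1/2)$, and cover $N_\gamma(\partial\Omega)$ with $O(\gamma^{-1})$ starlike sets, each of diameter no greater than $O(\gamma)$. For example, these sets could be squares of side length $\gamma$. Enumerate the sets as $\{A_i\}$.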
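Set $\mathcal{G} = \{i : FA_i \cap N_{\gamma^\alpha}(\partial\Omega) = \emptyset\}$. If $i \in \mathcal{G}$, $F|_{A_i}$ is a diffeomorphism satisfying $\|DF|_{A_i}\| \le O(\gamma^{-\alpha})$. See Equation (4.2). Thus $\mathrm{diameter}(FA_i) \le O(\gamma^{1-\alpha})$, and so $\mathrm{diameter}(N_\gamma(FA_i)) \le O(\gamma^{1-\alpha})$. Hence $\nu N_\gamma(FA_i) \le O(\gamma^{2(1-\alpha)})$, and $\nu N_\gamma(\bigcup_{i \in \mathcal{G}} FA_i) \le O(\gamma^{1-2\alpha})$.

If $i \notin \mathcal{G}$, $A_i \cap F^{-1}(N_{\gamma^\alpha}(\partial\Omega)) \ne \emptyset$. Thus $A_i$ might be cut into many pieces by $F^{-1}(\partial\Omega)$, but each of these pieces must be mapped near $\partial\Omega$. In fact, $FA_i \subset N_{O(\gamma^\alpha)}(\partial\Omega)$. This is because outside $F^{-1}(N_{\gamma^\alpha}(\partial\Omega))$, $\|DF\| \le O(\gamma^{-\alpha})$, and so points in $FA_i$ are no more than a distance $O(\gamma/\gamma^\alpha)$ away from $N_{\gamma^\alpha}(\partial\Omega)$, and $\gamma < \gamma^{1-\alpha} < \gamma^\alpha$. It follows that $N_\gamma(FA_i) \subset N_{O(\gamma^\alpha)}(\partial\Omega)$, and $\nu N_{O(\gamma^\alpha)}(\partial\Omega) = O(\gamma^\alpha)$.

Thus $\nu N_\gamma(F^{-1}N_\gamma(\partial\Omega)) = O(\gamma^{1-2\alpha} + \gamma^\alpha)$, and we obtain the lemma by taking $\alpha = 1/3$.

Application to a perturbed billiard flow

Returning to the end of Step 5 in Section 4.3.1, let the initial conditions of the slow variables be fixed at $h_0 = (Q_0, W_0, E_{1,0}, E_{2,0}) \in \mathcal{V}$ throughout the remainder of this section. We can assume that the billiard dynamics of the left gas particle in $D_1(Q_0)$ are ergodic. Also, fix a particular value of the initial conditions for the right gas particle for the remainder of this section. Then $z_\varepsilon(t)$ and $\tilde T_\varepsilon$ may be thought of as random variables depending on the left gas particle's initial conditions $y \in \mathcal{M}_1$. Now if $h_\varepsilon(t) = (Q_\varepsilon(t), W_\varepsilon(t), E_{1,\varepsilon}(t), E_{2,\varepsilon}(t))$ denotes the actual motions of the slow variables when $\varepsilon > 0$, it follows from Equation (4.13) that, provided $\varepsilon L(\varepsilon) \le \tilde T_\varepsilon$,
\[
\sup_{0 \le t \le L(\varepsilon)} |h_0 - h_\varepsilon(t)| = O(\varepsilon L(\varepsilon)). \tag{4.20}
\]
Furthermore, we only need to show that
\[
\mu\Big\{ y \in \mathcal{M}_1 : \frac{1}{L(\varepsilon)} \Big| \int_0^{L(\varepsilon)} \big(G(z_\varepsilon(s)) - G(z_0(s))\big)\,ds \Big| > \frac{\delta}{2T} \text{ and } \varepsilon L(\varepsilon) \le \tilde T_\varepsilon \Big\} \to 0 \tag{4.21}
\]
as $\varepsilon \to 0$, where $G$ is defined in Equation (4.12).

For definiteness, we take the following quantities from Subsection 4.3.2 to depend on $\varepsilon$ as follows:
\[
L(\varepsilon) = L = \log\log\frac{1}{\varepsilon}, \qquad \gamma(\varepsilon) = \gamma = e^{-L}, \qquad \lambda(\varepsilon) = \lambda = \frac{2}{E_\nu\zeta}\,L, \qquad \rho(\varepsilon) = \rho = \mathrm{const}\,\frac{\varepsilon L}{\gamma}. \tag{4.22}
\]
The constant in the choice of $\rho$ and $\rho$'s dependence on $\varepsilon$ will be explained in the proof of Lemma 4.3.3, which is at the end of this subsection. The other choices may be explained as follows.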
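We wish to use continuity estimates for the billiard map to produce continuity estimates for the flow on the time scale $L$. As the divergence of orbits should be exponentially fast, we choose $L$ to grow sublogarithmically in $\varepsilon^{-1}$. Since from Equation (4.4) the expected flight time between collisions with $\partial D_1(Q_0)$ when $\varepsilon = 0$ is $E_\nu\zeta = \pi|D_1(Q_0)|/(\sqrt{2E_{1,0}}\,|\partial D_1(Q_0)|)$, we expect to see roughly $\lambda/2$ collisions on this time scale. Considering $\lambda$ collisions gives us some margin for error. Furthermore, we will want orbits to keep a certain distance, $\gamma$, away from the billiard discontinuities. Here $\gamma \to 0$ as $\varepsilon \to 0$, but $\gamma$ is very large compared to the possible drift $O(\varepsilon L)$ of the slow variables on the time scale $L$. In fact, for each $C, m, n > 0$,
\[
[\ldots] = O(\varepsilon e^{\mathrm{const}\,L^2}) \to 0 \quad\text{as } \varepsilon \to 0. \tag{4.23}
\]
Let $X : \mathcal{M}_1 \to \Omega$ be the map taking $y \in \mathcal{M}_1$ to $x = X(y) \in \Omega$, the location of the billiard orbit of $y$ in the collision cross-section that corresponds to the most recent time in the past that the orbit was in the collision cross-section. We consider the set of initial conditions
\[
\mathcal{E}_\varepsilon = X^{-1}\Big( (\Omega \setminus C_{\gamma,\lambda}) \cap \Big\{ x \in \Omega : \sum_{k=0}^{\lambda} \zeta(F^k x) > L \Big\} \Big).
\]
Now from Equations (4.19) and (4.22), $\nu C_{\gamma,\lambda} \to 0$ as $\varepsilon \to 0$. Furthermore, by the ergodicity of $F$,
\[
\nu\Big\{ x \in \Omega : \sum_{k=0}^{\lambda} \zeta(F^k x) \le L \Big\} = \nu\Big\{ x \in \Omega : \lambda^{-1} \sum_{k=0}^{\lambda} \zeta(F^k x) \le E_\nu\zeta/2 \Big\} \to 0
\]
as $\varepsilon \to 0$. But because the free flight time is bounded above, $\mu X^{-1} \le \mathrm{const} \cdot \nu$, and so $\mu\mathcal{E}_\varepsilon \to 1$ as $\varepsilon \to 0$. Hence, the convergence in Equation (4.21) and the conclusion of the proof in Section 4.3.1 follow from the lemma below and Equation (4.23).

Lemma 4.3.2 (Analysis of deviations along good orbits). As $\varepsilon \to 0$,
\[
\sup_{y \in \mathcal{E}_\varepsilon \cap \{\varepsilon L \le \tilde T_\varepsilon\}} \frac{1}{L} \Big| \int_0^L \big(G(z_\varepsilon(s)) - G(z_0(s))\big)\,ds \Big| = O\Big(\rho \Big(\frac{\mathrm{const}}{\gamma}\Big)^\lambda\Big) + O(L^{-1}) \to 0.
\]
Proof. Fix a particular value of $y \in \mathcal{E}_\varepsilon \cap \{\varepsilon L \le \tilde T_\varepsilon\}$. For convenience, suppose that $y = X(y) = x \in \Omega$. Let $y_0(t)$ denote the time evolution of the billiard coordinates for the left gas particle when $\varepsilon = 0$.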
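Then there is some $N \le \lambda$ such that the orbit $x_k = F^k x = (r_k, \varphi_k)$ for $0 \le k \le N$ corresponds to all of the instances (in order) when $y_0(t)$ enters the collision cross-section $\Omega = \Omega_{h_0}$ corresponding to collisions with $\partial D_1(Q_0)$ for $0 \le t \le L$. We write $\Omega_{h_0}$ to emphasize that in this subsection we are only considering the collision cross-section corresponding to the billiard dynamics in the domain $D_1(Q_0)$ at the energy level $E_{1,0}$. In particular, $F$ will always refer to the return map on $\Omega_{h_0}$. Also, define an increasing sequence of times $t_k$ corresponding to the actual times $y_0(t)$ enters the collision cross-section, i.e.
\[
t_0 = 0, \qquad t_k = t_{k-1} + \zeta x_{k-1} \ \text{ for } k > 0.
\]
Then $x_k = y_0(t_k)$. Furthermore, define inductively
\[
\begin{aligned}
N_1 &= \inf\{k > 0 : t_k \text{ corresponds to a collision with the piston}\},\\
N_j &= \inf\{k > N_{j-1} : t_k \text{ corresponds to a collision with the piston}\}.
\end{aligned}
\]
Next, let $y_\varepsilon(t)$ denote the time evolution of the billiard coordinates for the left gas particle when $\varepsilon > 0$. We will construct a pseudo-orbit $x'_{k,\varepsilon} = (r'_{k,\varepsilon}, \varphi'_{k,\varepsilon})$ of points in $\Omega_{h_0}$ that essentially track the collisions (in order) of the left gas particle with the boundary under the dynamics of $y_\varepsilon(t)$ for $0 \le t \le L$.

First, define an increasing sequence of times $t'_{k,\varepsilon}$ corresponding to the actual times $y_\varepsilon(t)$ experiences a collision with the boundary of the gas container or the moving piston. Define
\[
\begin{aligned}
N'_\varepsilon &= \sup\{k \ge 0 : t'_{k,\varepsilon} \le L\},\\
N'_{1,\varepsilon} &= \inf\{k > 0 : t'_{k,\varepsilon} \text{ corresponds to a collision with the piston}\},\\
N'_{j,\varepsilon} &= \inf\{k > N'_{j-1,\varepsilon} : t'_{k,\varepsilon} \text{ corresponds to a collision with the piston}\}.
\end{aligned}
\]
Because $L \le \tilde T_\varepsilon(y)/\varepsilon$, we know that as long as $N'_{j+1,\varepsilon} \le N'_\varepsilon$, then $N'_{j+1,\varepsilon} - N'_{j,\varepsilon} \ge 2$. See the discussion in Subsection 4.2.2. Then we define $x'_{k,\varepsilon} \in \Omega_{h_0}$ by
\[
x'_{k,\varepsilon} = \begin{cases} y_\varepsilon(t'_{k,\varepsilon}) & \text{if } k \notin \{N'_{j,\varepsilon}\},\\ F^{-1}x'_{k+1,\varepsilon} & \text{if } k \in \{N'_{j,\varepsilon}\}. \end{cases}
\]

Lemma 4.3.3. Provided $\varepsilon$ is sufficiently small, the following hold for each $k \in [0, N \wedge N'_\varepsilon)$. Furthermore, the requisite smallness of $\varepsilon$ and the sizes of the constants in these estimates may be chosen independent of the initial condition $y \in \mathcal{E}_\varepsilon \cap \{\varepsilon L \le \tilde T_\varepsilon\}$ and of $k$:
(a) $x'_{k,\varepsilon}$ is well defined. In particular, if $k \notin \{N'_{j,\varepsilon}\}$, $y_\varepsilon(t'_{k,\varepsilon})$ corresponds to a collision point on $\partial D_1(Q_0)$, and not to a collision point on a piece of $\partial D$ to the right of $Q_0$.
(b) If $k > 0$ and $k \notin \{N'_{j,\varepsilon}\}$, then $x'_{k,\varepsilon} = Fx'_{k-1,\varepsilon}$.
(c) If $k > 0$ and $k \in \{N'_{j,\varepsilon}\}$, then $d(x'_{k,\varepsilon}, Fx'_{k-1,\varepsilon}) \le \rho$ and the $\varphi$ coordinate of $y_\varepsilon(t'_{k,\varepsilon})$ satisfies $\varphi(y_\varepsilon(t'_{k,\varepsilon})) = \varphi'_{k,\varepsilon} + O(\varepsilon)$.
(d) $d(x_k, x'_{k,\varepsilon}) \le \mathrm{const}\,\rho(\mathrm{const}/\gamma)^k$.
(e) $k = N'_{j,\varepsilon}$ if and only if $k = N_j$.
(f) If $k > 0$, $t'_{k,\varepsilon} - t'_{k-1,\varepsilon} = t_k - t_{k-1} + O(\rho(\mathrm{const}/\gamma)^k)$.

We defer the proof of Lemma 4.3.3 until the end of this subsection. Assuming that $\varepsilon$ is sufficiently small for the conclusions of Lemma 4.3.3 to be valid, we continue with the proof of Lemma 4.3.2.

Set $M = N \wedge N'_\varepsilon - 1$. Note that $M \le \lambda \sim L$. From (f) in Lemma 4.3.3 and Equations (4.22) and (4.23), we see that
\[
|t_M - t'_{M,\varepsilon}| \le \sum_{k=1}^{M} \big| t'_{k,\varepsilon} - t'_{k-1,\varepsilon} - (t_k - t_{k-1}) \big| = O\Big(\rho\Big(\frac{\mathrm{const}}{\gamma}\Big)^\lambda\Big) \to 0 \quad\text{as } \varepsilon \to 0.
\]
Because the flight times $t'_{k,\varepsilon} - t'_{k-1,\varepsilon}$ and $t_k - t_{k-1}$ are uniformly bounded above, it follows from the definitions of $N$ and $N'_\varepsilon$ that $t_M, t'_{M,\varepsilon} \ge L - \mathrm{const}$. But from Subsection 4.2.2, the times between the collisions of the left gas particle with the piston are uniformly bounded away from zero. Using (c) and Equation (4.20), it follows that
\[
\begin{aligned}
\frac{1}{L}\Big| \int_0^L \big(G(z_\varepsilon(s)) - G(z_0(s))\big)\,ds \Big| &= O(L^{-1}) + \frac{1}{L}\Big| \sum_{k \in \{N_j : N_j \le M\}} \Big( \sqrt{2E_{1,0}}\cos\varphi_k - \sqrt{2E_{1,\varepsilon}(t'_{k,\varepsilon})}\cos(\varphi'_{k,\varepsilon} + O(\varepsilon)) \Big) \Big|\\
&= O(L^{-1}) + \frac{1}{L}\Big| \sum_{k \in \{N_j : N_j \le M\}} \Big( \sqrt{2E_{1,0}}\cos\varphi_k - \sqrt{2E_{1,0}}\cos\varphi'_{k,\varepsilon} + O(\varepsilon L) \Big) \Big|\\
&= O(L^{-1}) + O(\varepsilon L^2) + \frac{\sqrt{2E_{1,0}}}{L} \sum_{k \in \{N_j : N_j \le M\}} \big| \cos\varphi_k - \cos\varphi'_{k,\varepsilon} \big| .
\end{aligned}
\]
But using (d),
\[
\frac{1}{L} \sum_{k \in \{N_j : N_j \le M\}} \big| \cos\varphi_k - \cos\varphi'_{k,\varepsilon} \big| \le \frac{1}{L} \sum_{k=0}^{M} O\big(\rho(\mathrm{const}/\gamma)^k\big) = O\big(\rho(\mathrm{const}/\gamma)^\lambda\big).
\]
Since $\varepsilon L^2 = O(\rho(\mathrm{const}/\gamma)^\lambda)$, this finishes the proof of Lemma 4.3.2.

Proof of Lemma 4.3.3. The proof is by induction. We take $\varepsilon$ to be so small that Equation (4.17) is satisfied. This is possible by Equation (4.23). It is trivial to verify (a)-(f) for $k = 0$.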
So let $0 < l < N \wedge N'_\varepsilon$, and suppose that (a)-(f) have been verified for all $k < l$. We have three cases to consider:

Case 1: $l - 1$ and $l \notin \{N'_{j,\varepsilon}\}$: In this case, verifying (a)-(f) for $k = l$ is a relatively straightforward application of the machinery developed in Subsection 4.3.2, because for $t'_{l-1,\varepsilon} \le t \le t'_{l,\varepsilon}$, $y_\varepsilon(t)$ traces out the billiard orbit between $x'_{l-1,\varepsilon}$ and $x'_{l,\varepsilon}$ corresponding to free flight in the domain $D_1(Q_0)$. We make only two remarks.

First, as long as $\varepsilon$ is sufficiently small, it really is true that $x'_{l,\varepsilon} = y_\varepsilon(t'_{l,\varepsilon})$ corresponds to a true collision point on $\partial D_1(Q_0)$. Indeed, if this were not the case, then it must be that $Q_\varepsilon(t'_{l,\varepsilon}) > Q_0$, and $y_\varepsilon(t'_{l,\varepsilon})$ would have to correspond to a collision with the side of the "tube" to the right of $Q_0$. But then $x'_{l,\varepsilon} = Fx'_{l-1,\varepsilon} \in \Omega_{h_0}$ would correspond to a collision with an immobile piston at $Q_0$ and would satisfy $d(x_k, x'_{k,\varepsilon}) \le \mathrm{const}\,\rho(\mathrm{const}/\gamma)^k \le \mathrm{const}\,\rho(\mathrm{const}/\gamma)^\lambda = o(\gamma)$, using Equations (4.16) and (4.23). But $x_k \notin N_\gamma(\partial\Omega_{h_0})$, and so it follows that when the trajectory of $y_\varepsilon(t)$ crosses the plane $\{Q = Q_0\}$, it is at least a distance $\sim \gamma$ away from the boundary of the face of the piston, and its velocity vector is pointed no closer than $\sim \gamma$ to being parallel to the piston's face. As $Q_\varepsilon(t'_{l,\varepsilon}) - Q_0 = O(\varepsilon L) = o(\gamma)$, and it is geometrically impossible (for small $\varepsilon$) to construct a right triangle whose sides $s_1, s_2$ satisfy $|s_1| \gtrsim \gamma$, $|s_2| \le O(\varepsilon L)$, with the measure of the acute angle adjacent to $s_1$ being greater than $\sim \gamma$, we have a contradiction. After crossing the plane $\{Q = Q_0\}$, $y_\varepsilon(t)$ must experience its next collision with the face of the piston, which violates the fact that $l \notin \{N'_{j,\varepsilon}\}$.

Second, $t'_{l,\varepsilon} - t'_{l-1,\varepsilon} = \zeta x'_{l-1,\varepsilon} + O(\varepsilon L)$, because $v_{1,\varepsilon} = v_{1,0} + O(\varepsilon L)$. See Equation (4.20). From Equation (4.18), $|\zeta x_{l-1} - \zeta x'_{l-1,\varepsilon}| \le O((\rho/\gamma)(\mathrm{const}/\gamma)^{l-1})$. As $t_l - t_{l-1} = \zeta x_{l-1}$ and $\varepsilon L = O((\rho/\gamma)(\mathrm{const}/\gamma)^{l-1})$, we obtain (f).
Case 2: There exists $i$ such that $l = N'_{i,\varepsilon}$: For definiteness, we suppose that $Q_\varepsilon(t'_{l,\varepsilon}) \ge Q_0$, so that the left gas particle collides with the piston to the right of $Q_0$. The case when $Q_\varepsilon(t'_{l,\varepsilon}) \le Q_0$ can be handled similarly.

We know that $x_{l-1}, x_l, x_{l+1} \notin N_\gamma(\partial\Omega_{h_0}) \cup N_\gamma(F^{-1}N_\gamma(\partial\Omega_{h_0}))$. Using the inductive hypothesis and Equation (4.16), we can define $x''_{l,\varepsilon} = Fx'_{l-1,\varepsilon}$, $x''_{l+1,\varepsilon} = F^2x'_{l-1,\varepsilon}$, and $d(x_l, x''_{l,\varepsilon}) \le \mathrm{const}\,\rho(\mathrm{const}/\gamma)^l$, $d(x_{l+1}, x''_{l+1,\varepsilon}) \le \mathrm{const}\,\rho(\mathrm{const}/\gamma)^{l+1}$. In particular, $x''_{l,\varepsilon}$ and $x''_{l+1,\varepsilon}$ are both a distance $\sim \gamma$ away from $\partial\Omega_{h_0}$. Furthermore, when the left gas particle collides with the moving piston, it follows from Equation (4.8) that the difference between its angle of incidence and its angle of reflection is $O(\varepsilon)$. Referring to Figure 4.3, this means that $\varphi'_{l,\varepsilon} = \varphi''_{l,\varepsilon} + O(\varepsilon)$. Geometric arguments similar to the one given in Case 1 above show that the $y_\varepsilon$-trajectory of the left gas particle has precisely one collision with the piston and no other collisions with the sides of the gas container when the gas particle traverses the region $Q_0 \le Q \le Q_\varepsilon(t'_{l,\varepsilon})$. Note that $x'_{l,\varepsilon}$ was defined to be the point in the collision cross-section $\Omega_{h_0}$ corresponding to the return of the $y_\varepsilon$-trajectory into the region $Q \le Q_0$. See Figure 4.3. From this figure, it is also evident that $d(r'_{l,\varepsilon}, r''_{l,\varepsilon}) \le O(\varepsilon L/\gamma)$. Thus $d(x''_{l,\varepsilon}, x'_{l,\varepsilon}) = O(\varepsilon L/\gamma)$, and this explains the choice of $\rho(\varepsilon)$ in Equation (4.22).

From the above discussion and the machinery of Subsection 4.3.2, (a)-(e) now follow readily for both $k = l$ and $k = l + 1$. Furthermore, property (f) follows in much the same manner as it did in Case 1 above. However, one should note that $t'_{l,\varepsilon} - t'_{l-1,\varepsilon} = \zeta x'_{l-1,\varepsilon} + O(\varepsilon L) + O(\varepsilon L/\gamma)$ and $t'_{l+1,\varepsilon} - t'_{l,\varepsilon} = \zeta x'_{l,\varepsilon} + O(\varepsilon L) + O(\varepsilon L/\gamma)$, because of the extra distance $O(\varepsilon L/\gamma)$ that the gas particle travels to the right of $Q_0$. But $\varepsilon L/\gamma = O((\rho/\gamma)(\mathrm{const}/\gamma)^{l-1})$, and so property (f) follows.
Case 3: There exists $i$ such that $l - 1 = N'_{i,\varepsilon}$: As mentioned above, the inductive step in this case follows immediately from our analysis in Case 2.

Figure 4.3: An analysis of the divergences of orbits when $\varepsilon > 0$ and the left gas particle collides with the moving piston to the right of $Q_0$. Note that the dimensions are distorted for visual clarity, but that $\varepsilon L$ and $\varepsilon L/\gamma$ are both $o(\gamma)$ as $\varepsilon \to 0$. Furthermore, $\varphi''_{l,\varepsilon} \in (-\pi/2 + \gamma/2, \pi/2 - \gamma/2)$ and $\varphi'_{l,\varepsilon} = \varphi''_{l,\varepsilon} + O(\varepsilon)$, and so $r'_{l,\varepsilon} = r''_{l,\varepsilon} + O(\varepsilon L/\gamma)$. In particular, the $y_\varepsilon$-trajectory of the left gas particle has precisely one collision with the piston and no other collisions with the sides of the gas container when the gas particle traverses the region $Q_0 \le Q \le Q_\varepsilon(t'_{l,\varepsilon})$.

4.4 Generalization to a full proof of Theorem 4.1.1

It remains to generalize the proof in Sections 4.2 and 4.3 to the cases when $n_1, n_2 \ge 1$ and $d = 3$.

4.4.1 Multiple gas particles on each side of the piston

When $d = 2$, but $n_1, n_2 \ge 1$, only minor modifications are necessary to generalize the proof above. As in Subsection 4.2.2, one defines a stopping time $\tilde T_\varepsilon$ satisfying $P\{\tilde T_\varepsilon < T \wedge T_\varepsilon\} = O(\varepsilon)$ such that for $0 \le t \le \tilde T_\varepsilon/\varepsilon$, gas particles will only experience clean collisions with the piston.

Next, define $H(z)$ by
\[
H(z) = \begin{bmatrix} W \\ 2\sum_{j=1}^{n_1} |v^\perp_{1,j}|\,\delta_{q^\perp_{1,j} = Q} - 2\sum_{j=1}^{n_2} |v^\perp_{2,j}|\,\delta_{q^\perp_{2,j} = Q} \\ -2W\sum_{j=1}^{n_1} |v^\perp_{1,j}|\,\delta_{q^\perp_{1,j} = Q} \\ 2W\sum_{j=1}^{n_2} |v^\perp_{2,j}|\,\delta_{q^\perp_{2,j} = Q} \end{bmatrix}.
\]
It follows that for $0 \le t \le \tilde T_\varepsilon/\varepsilon$, $h_\varepsilon(t) - h_\varepsilon(0) = O(\varepsilon) + \varepsilon\int_0^t H(z_\varepsilon(s))\,ds$. From here, the rest of the proof follows the same steps made in Subsection 4.3.1. We note that at Step 3, we find that $H(z) - \bar H(h(z))$ divides into $n_1 + n_2$ pieces, each of which depends on only one gas particle when the piston is held fixed.

4.4.2 Three dimensions

The proof of Theorem 4.1.1 in $d = 3$ dimensions is essentially the same as the proof in two dimensions given above. The principal differences are due to differences in the geometry of billiards. We indicate the necessary modifications.
In analogy with Section 4.2.1, we briefly summarize the necessary facts for the billiard flows of the gas particles when $M = \infty$ and the slow variables are held fixed at a specific value $h \in \mathcal{V}$. As before, we will only consider the motions of one gas particle moving in $D_1$. Thus we consider the billiard flow of a point particle moving inside the domain $D_1$ at a constant speed $\sqrt{2E_1}$. Unless otherwise noted, we use the notation from Section 4.2.1.

The billiard flow takes place in the five-dimensional space $\mathcal{M}_1 = \{(q_1, v_1) \in T D_1 : q_1 \in D_1, |v_1| = \sqrt{2E_1}\}/\sim$. Here the quotient means that when $q_1 \in \partial D_1$, we identify velocity vectors pointing outside of $D_1$ with those pointing inside $D_1$ by reflecting orthogonally through the tangent plane to $\partial D_1$ at $q_1$. The billiard flow preserves Liouville measure restricted to the energy surface. This measure has the density $d\mu = dq_1\,dv_1/(8\pi E_1 |D_1|)$. Here $dq_1$ represents volume on $\mathbb{R}^3$, and $dv_1$ represents area on the sphere $\{v_1 \in \mathbb{R}^3 : |v_1| = \sqrt{2E_1}\}$.

The collision cross-section $\Omega = \{(q_1, v_1) \in T D_1 : q_1 \in \partial D_1, |v_1| = \sqrt{2E_1}\}/\sim$ is properly thought of as a fiber bundle, whose base consists of the smooth pieces of $\partial D_1$ and whose fibers are the set of outgoing velocity vectors at $q_1 \in \partial D_1$. This and other facts about higher-dimensional billiards, with emphasis on the dispersing case, can be found in [BCST03]. For our purposes, $\Omega$ can be parameterized as follows. We decompose $\partial D_1$ into a finite union $\cup_j \Gamma_j$ of pieces, each of which is diffeomorphic via coordinates $r$ to a compact, connected subset of $\mathbb{R}^2$ with a piecewise $C^3$ boundary. The $\Gamma_j$ are nonoverlapping, except possibly on their boundaries. Next, if $(q_1, v_1) \in \Omega$ and $v_1$ is the outward going velocity vector, let $\hat v = v_1/|v_1|$. Then $\Omega$ can be parameterized by $\{x = (r, \hat v)\}$. It follows that $\Omega$ is diffeomorphic to $\cup_j \Gamma_j \times S^2_+$, where $S^2_+$ is the upper unit hemisphere, and by $\partial\Omega$ we mean the subset diffeomorphic to $(\cup_j \partial\Gamma_j \times S^2_+) \cup (\cup_j \Gamma_j \times \partial S^2_+)$.
If $x \in \Omega$, we let $\varphi \in [0, \pi/2]$ represent the angle between the outgoing velocity vector and the inward pointing normal vector $n$ to $\partial D_1$, i.e. $\cos\varphi = \langle \hat v, n \rangle$. Note that we no longer allow $\varphi$ to take on negative values. The return map $F : \Omega \to \Omega$ preserves the projected probability measure $\nu$, which has the density $d\nu = \cos\varphi\,d\hat v\,dr/(\pi|\partial D_1|)$. Here $|\partial D_1|$ is the area of $\partial D_1$.

$F$ is an invertible, measure preserving transformation that is piecewise $C^2$. Because of our assumptions on $D_1$, the free flight times and the curvature of $\partial D_1$ are uniformly bounded. The bound on $\|DF(x)\|$ given in Equation (4.2) is still true. A proof of this fact for general three-dimensional billiard tables with finite horizon does not seem to have made it into the literature, although see [BCST03] for the case of dispersing billiards. For completeness, we provide a sketch of a proof for general billiard tables in Section 4.6.

We suppose that the billiard flow is ergodic, so that $F$ is ergodic. Again, we induce $F$ on the subspace $\hat\Omega$ of $\Omega$ corresponding to collisions with the (immobile) piston to obtain the induced map $\hat F : \hat\Omega \to \hat\Omega$ that preserves the induced measure $\hat\nu$.

The free flight time $\zeta : \Omega \to \mathbb{R}$ again satisfies the derivative bound given in Equation (4.3). The generalized Santaló's formula [Che97] yields
\[
E_\nu\zeta = \frac{4|D_1|}{|v_1|\,|\partial D_1|}.
\]
If $\hat\zeta : \hat\Omega \to \mathbb{R}$ is the free flight time between collisions with the piston, then it follows from Proposition 4.5.1 that
\[
E_{\hat\nu}\hat\zeta = \frac{4|D_1|}{|v_1|\,\ell}.
\]
The expected value of $|v_1^\perp|$ when the left gas particle collides with the (immobile) piston is given by
\[
E_{\hat\nu}|v_1^\perp| = E_{\hat\nu}\sqrt{2E_1}\cos\varphi = \frac{\sqrt{2E_1}}{\pi} \int_{S^2_+} \cos^2\varphi\,d\hat v_1 = \frac{2}{3}\sqrt{2E_1}.
\]
As a consequence, we obtain

Lemma 4.4.1. For $\mu$-a.e. $y \in \mathcal{M}_1$,
\[
\lim_{t \to \infty} \frac{1}{t} \int_0^t |v_1^\perp(s)|\,\delta_{q_1^\perp(s) = Q}\,ds = \frac{E_1\ell}{3|D_1(Q)|}.
\]
Compare the proof of Lemma 4.2.1.

With these differences in mind, the rest of the proof of Theorem 4.1.1 when $d = 3$ proceeds in the same manner as indicated in Sections 4.2, 4.3 and 4.4.1 above. The only notable difference occurs in the proof of the Gronwall-type inequality for billiards.
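Due to dimensional considerations, if one follows the proof of Lemma 4.3.1 for a three-dimensional billiard table, one finds that $\nu N_\gamma(F^{-1}N_\gamma(\partial\Omega)) = O(\gamma^{1-4\alpha} + \gamma^\alpha)$. The optimal value of $\alpha$ is $1/5$, and so $\nu N_\gamma(F^{-1}N_\gamma(\partial\Omega)) = O(\gamma^{1/5})$ as $\gamma \to 0$. Hence $\nu C_{\gamma,\lambda} = O(\lambda\gamma^{1/5})$, which is a slightly worse estimate than the one in Equation (4.19). However, it is still sufficient for all of the arguments in Section 4.3.2, and this finishes the proof.

4.5 Inducing maps on subspaces

Here we present some well-known facts on inducing measure preserving transformations on subspaces. Let $F : (\Omega, \mathcal{B}, \nu) \to (\Omega, \mathcal{B}, \nu)$ be an invertible, ergodic, measure preserving transformation of the probability space $\Omega$ endowed with the $\sigma$-algebra $\mathcal{B}$ and the probability measure $\nu$. Let $\hat\Omega \in \mathcal{B}$ satisfy $0 < \nu\hat\Omega < 1$. Define $R : \hat\Omega \to \mathbb{N}$ to be the first return time to $\hat\Omega$, i.e. $R\omega = \inf\{n \in \mathbb{N} : F^n\omega \in \hat\Omega\}$. Then if $\hat\nu := \nu(\cdot \cap \hat\Omega)/\nu\hat\Omega$ and $\hat{\mathcal{B}} := \{B \cap \hat\Omega : B \in \mathcal{B}\}$, the map $\hat F : (\hat\Omega, \hat{\mathcal{B}}, \hat\nu) \to (\hat\Omega, \hat{\mathcal{B}}, \hat\nu)$ defined by $\hat F\omega = F^{R\omega}\omega$ is also an invertible, ergodic, measure preserving transformation [Pet83]. Furthermore
\[
E_{\hat\nu}R = \int_{\hat\Omega} R\,d\hat\nu = (\nu\hat\Omega)^{-1}.
\]
This last fact is a consequence of the following proposition:

Proposition 4.5.1. If $\zeta : \Omega \to \mathbb{R}_{\ge 0}$ is in $L^1(\nu)$, then $\hat\zeta = \sum_{n=0}^{R-1} \zeta \circ F^n$ is in $L^1(\hat\nu)$, and
\[
E_{\hat\nu}\hat\zeta = \frac{1}{\nu\hat\Omega}\,E_\nu\zeta.
\]
Proof.
\[
\nu\hat\Omega \int_{\hat\Omega} \sum_{n=0}^{R-1} \zeta \circ F^n\,d\hat\nu = \int_{\hat\Omega} \sum_{n=0}^{R-1} \zeta \circ F^n\,d\nu = \sum_{k=1}^{\infty} \sum_{n=0}^{k-1} \int_{\hat\Omega \cap \{R = k\}} \zeta \circ F^n\,d\nu = \sum_{k=1}^{\infty} \sum_{n=0}^{k-1} \int_{F^n(\hat\Omega \cap \{R = k\})} \zeta\,d\nu = \int_\Omega \zeta\,d\nu,
\]
because $\{F^n(\hat\Omega \cap \{R = k\}) : 0 \le n < k < \infty\}$ is a partition of $\Omega$.

4.6 Derivative bounds for the billiard map in three dimensions

Returning to Section 4.4.2, we need to show that for a billiard table $D_1 \subset \mathbb{R}^3$ with a piecewise $C^3$ boundary and the free flight time uniformly bounded above, the billiard map $F$ satisfies the following: If $x_0 \notin \partial\Omega \cup F^{-1}(\partial\Omega)$, then
\[
\|DF(x_0)\| \le \frac{\mathrm{const}}{\cos\varphi(Fx_0)}.
\]
Fix $x_0 = (r_0, \hat v_0) \in \Omega$, and let $x_1 = (r_1, \hat v_1) = Fx_0$. Let $\Sigma$ be the plane that perpendicularly bisects the straight line between $r_0$ and $r_1$, and let $r_{1/2}$ denote the point of intersection. We consider $\Sigma$ as a "transparent" wall, so that in a neighborhood of $x_0$, we can write $F = F_2 \circ F_1$.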
Here, $F_1$ is like a billiard map in that it takes points (i.e. directed velocity vectors with a base) near $x_0$ to points with a base on $\Sigma$ and a direction pointing near $r_1$. ($F_1$ would be a billiard map if we reflected the image velocity vectors orthogonally through $\Sigma$.) $F_2$ is a billiard map that takes points in the image of $F_1$ and maps them near $x_1$. Let $x_{1/2} = F_1x_0 = F_2^{-1}x_1$. Then
\[
\|DF(x_0)\| \le \|DF_1(x_0)\|\,\|DF_2(x_{1/2})\| .
\]
It is easy to verify that $\|DF_1(x_0)\| \le \mathrm{const}$, with the constant depending only on the curvature of $\partial D_1$ at $r_0$. In other words, the constant may be chosen independent of $x_0$. Similarly, $\|DF_2^{-1}(x_1)\| \le \mathrm{const}$. Because billiard maps preserve a probability measure with a density proportional to $\cos\varphi$, $\det DF_2^{-1}(x_1) = \cos\varphi_1/\cos\varphi_{1/2} = \cos\varphi_1$. As $\Omega$ is 4-dimensional, it follows from Cramer's Rule for the inversion of linear transformations that
\[
\|DF_2(x_{1/2})\| \le \frac{\mathrm{const}\,\|DF_2^{-1}(x_1)\|^3}{\det DF_2^{-1}(x_1)} \le \frac{\mathrm{const}}{\cos\varphi_1},
\]
and we are done.

Bibliography

[Ano60] D. V. Anosov. Averaging in systems of ordinary differential equations with rapidly oscillating solutions. Izv. Akad. Nauk SSSR Ser. Mat., 24:721–742, 1960.
[BCST03] Péter Bálint, Nikolai Chernov, Domokos Szász, and Imre Péter Tóth. Geometry of multi-dimensional dispersing billiards. Astérisque, (286):xviii, 119–150, 2003. Geometric methods in dynamics. I.
[BR98] Leonid A. Bunimovich and Jan Rehacek. On the ergodicity of many-dimensional focusing billiards. Ann. Inst. H. Poincaré Phys. Théor., 68(4):421–448, 1998. Classical and quantum chaos.
[BTT07] P. Bálint, B. Tóth, and I. P. Tóth. On the zero mass limit of tagged particle diffusion in the 1-d Rayleigh gas. Submitted to the Journal of Statistical Physics, 2007.
[Bun79] L. A. Bunimovich. On the ergodic properties of nowhere dispersing billiards. Comm. Math. Phys., 65(3):295–312, 1979.
[Cal63] H. B. Callen. Thermodynamics. Wiley, New York, 1963. Appendix C.
[CD06a] N. Chernov and D. Dolgopyat. Brownian Brownian motion - I. Memoirs of the American Mathematical Society, to appear, 2006.
[CD06b] N. Chernov and D. Dolgopyat. Hyperbolic billiards and statistical physics. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006.
[CDPS96] B. Crosignani, P. Di Porto, and M. Segev. Approach to thermal equilibrium in a system with adiabatic constraints. Am. J. Phys., 64(5):610–613, 1996.
[Che97] N. Chernov. Entropy, Lyapunov exponents, and mean free path for billiards. J. Statist. Phys., 88(1-2):1–29, 1997.
[Che04] N. Chernov. On a slow drift of a massive piston in an ideal gas that remains at mechanical equilibrium. Math. Phys. Electron. J., 10:Paper 2, 18 pp. (electronic), 2004.
[CL02] N. Chernov and J. L. Lebowitz. Dynamics of a massive piston in an ideal gas: oscillatory motion and approach to equilibrium. J. Statist. Phys., 109(3-4):507–527, 2002. Special issue dedicated to J. Robert Dorfman on the occasion of his sixty-fifth birthday.
[CLS02] N. Chernov, J. L. Lebowitz, and Ya. Sinai. Scaling dynamics of a massive piston in a cube filled with ideal gas: exact results. J. Statist. Phys., 109(3-4):529–548, 2002. Special issue dedicated to J. Robert Dorfman on the occasion of his sixty-fifth birthday.
[CM06a] N. Chernov and R. Markarian. Chaotic Billiards. Number 127 in Mathematical Surveys and Monographs. American Mathematical Society, 2006.
[CM06b] N. Chernov and R. Markarian. Dispersing billiards with cusps: slow decay of correlations. preprint, 2006.
[Dol05] Dmitry Dolgopyat. Introduction to averaging. Available online at http://www.math.umd.edu/~dmitry, 2005.
[GN06] I.V. Gorelyshev and A.I. Neishtadt. On the adiabatic perturbation theory for systems with impacts. Prikl. Mat. Mekh., 70(1):6–19, 2006. English translation in Journal of Applied Mathematics and Mechanics 70 (2006) 417.
[GPL03] Christian Gruber, Séverine Pache, and Annick Lesne. Two-time-scale relaxation towards thermal equilibrium of the enigmatic piston. J. Statist. Phys., 112(5-6):1177–1206, 2003.
[Gru99] Ch. Gruber.
Thermodynamics of systems with internal adiabatic constraints: time evolution of the adiabatic piston. Eur. J. Phys., 20:259–266, 1999.
[Kif04a] Yuri Kifer. Averaging principle for fully coupled dynamical systems and large deviations. Ergodic Theory Dynam. Systems, 24(3):847–871, 2004.
[Kif04b] Yuri Kifer. Some recent advances in averaging. In Modern Dynamical Systems and Applications, pages 385–403. Cambridge Univ. Press, Cambridge, 2004.
[Lie99] Elliott H. Lieb. Some problems in statistical mechanics that I would like to see solved. Phys. A, 263(1-4):491–499, 1999. STATPHYS 20 (Paris, 1998).
[LM88] P. Lochak and C. Meunier. Multiphase Averaging for Classical Systems. Springer-Verlag, New York, 1988.
[LSC02] J. Lebowitz, Ya. G. Sinai, and N. Chernov. Dynamics of a massive piston immersed in an ideal gas. Uspekhi Mat. Nauk, 57(6(348)):3–86, 2002. English translation in Russian Math. Surveys 57 (2002), no. 6, 1045–1125.
[Nei76] A. I. Neishtadt. Averaging in multi-frequency systems II. Doklady Akad. Nauk. SSSR Mechanics, 226(6):1295–1298, 1976. English translation in Soviet Phys. Doklady 21 (1976), no. 2, 80–82.
[NS04] A. I. Neishtadt and Ya. G. Sinai. Adiabatic piston as a dynamical system. J. Statist. Phys., 116(1-4):815–820, 2004.
[Pet83] Karl Petersen. Ergodic Theory. Cambridge University Press, Cambridge, 1983.
[San76] L. A. Santaló. Integral Geometry and Geometric Probability. Addison Wesley, Reading, Mass., 1976.
[Sin70] Ya. G. Sinaĭ. Dynamical systems with elastic reflections. Ergodic properties of dispersing billiards. Uspehi Mat. Nauk, 25(2 (152)):141–192, 1970.
[Sin99] Ya. G. Sinai. Dynamics of a massive particle surrounded by a finite number of light particles. Teoret. Mat. Fiz., 121(1):110–116, 1999. English translation in Theoret. and Math. Phys. 121 (1999), no. 1, 1351–1357.
[SV85] J. A. Sanders and F. Verhulst. Averaging Methods in Nonlinear Dynamical Systems. Springer-Verlag, New York, 1985.
[Vor97] Ya. B.
Vorobets. Ergodicity of billiards in polygons. Mat. Sb., 188(3):65–112, 1997.
[Wri06] Paul Wright. A simple piston problem in one dimension. Nonlinearity, 19:2365–2389, 2006.
[Wri07] Paul Wright. The periodic oscillation of an adiabatic piston in two or three dimensions. Comm. Math. Phys., 2007. To appear; available online at http://www.cims.nyu.edu/~paulrite.

ABSTRACT

We study a heavy piston of mass $M$ that moves in one dimension. The piston separates two gas chambers, each of which contains finitely many ideal, unit mass gas particles moving in $d$ dimensions, where $d \geq 1$. Using averaging techniques, we prove that the actual motions of the piston converge in probability to the predicted averaged behavior on the time scale $M^{1/2}$ when $M$ tends to infinity while the total energy of the system is bounded and the number of gas particles is fixed. Neishtadt and Sinai previously pointed out that an averaging theorem due to Anosov should extend to this situation.
When $d=1$, the gas particles move in just one dimension, and we prove that the rate of convergence of the actual motions of the piston to its averaged behavior is $\mathcal{O}(M^{-1/2})$ on the time scale $M^{1/2}$. The convergence is uniform over all initial conditions in a compact set. We also investigate the piston system when the particle interactions have been smoothed. The convergence to the averaged behavior again takes place uniformly, both over initial conditions and over the amount of smoothing. In addition, we prove generalizations of our results to $N$ pistons separating $N+1$ gas chambers. We also provide a general discussion of averaging theory and the proofs of a number of previously known averaging results. In particular, we include a new proof of Anosov's averaging theorem for smooth systems that is primarily due to Dolgopyat. <|endoftext|><|startoftext|> Introduction While the thermal decomposition of cyclanes has been the subject of several papers, there are only a few studies of the reactions of polycyclanes, and the corresponding kinetic parameters are still very uncertain. The geometry and the enthalpy of formation of norbornane, or bicyclo[2.2.1]heptane (C7H12), a bridged bicyclic alkane (Figure 1), have been studied previously in order to relate the structure of the molecule to its strain.1,2 A ring strain energy of 17.2 kcal mol⁻¹ has been estimated for norbornane2 (the ring strain energy is defined as the difference between the experimental gas-phase enthalpy of formation and the gas-phase enthalpy estimated using the group additivity method proposed by Benson3 for the estimation of thermochemical data). Baldwin et al.4 studied the thermal isomerization of 3-butenyl-cyclopropane to norbornane. They observed that 3-butenyl-cyclopropane led to norbornane and many other hydrocarbons when heated at 688 K.
It was also established that the formation of norbornane occurs through the scission of the C2–C3 bond of 3-butenyl-cyclopropane and might involve the initial generation of a diradical (Figure 2).

Figure 1

Figure 2

O’Neal and Benson studied the kinetics of pyrolysis of some non-bridged bicyclic alkanes (e.g. bicyclo[2.2.0]hexane, bicyclo[3.2.0]heptane) from the point of view of diradical intermediates.5 Diradical mechanism estimates were found to be consistent with experimental Arrhenius parameters for a large number of hydrocarbons. Reaction channels for the fate of diradicals were proposed by Tsang in the case of cyclopentane and cyclohexane.6,7 Direct studies of trimethylene and tetramethylene diradicals have been performed by Pedersen et al. using femtosecond laser techniques with mass spectrometry in a molecular beam.8 More recently, Sirjean et al. performed quantum calculations on the gas-phase unimolecular decomposition of cyclobutane, cyclopentane and cyclohexane using a diradical mechanism.9 The theoretical approach used for the calculation was validated by comparing calculated results with available experimental data.

Several papers about the pyrolysis of tricyclo[5.2.1.02,6]decane, a tricyclic alkane, have been published.10-14 Indeed, this hydrocarbon is a component of synthetic fuels used in aeronautics. A comprehensive primary mechanism of the pyrolysis of this species has been developed in our laboratory.14 The reactions of unimolecular initiation of this polycyclic compound have been detailed, and the reactions of diradicals (decompositions by β-scission, internal disproportionations) have been taken into account in a systematic way.

The first purpose of this article is to present new experimental results on the pyrolysis of norbornane (solid at room temperature) dissolved in benzene.
In line with previous work on hydrocarbons,14-17 experiments were performed in a jet stirred reactor operated at temperatures between 873 and 973 K, residence times between 1 and 4 s, a pressure of 106 kPa and high dilution. Conversions ranging from 0.04 to 22.6% were obtained. Particular attention was paid to the analysis of the reaction products: the formation of 25 major and minor products was observed.

The second objective of this paper is to describe the reactions involved in the mechanism of the pyrolysis of the norbornane – benzene binary mixture. These reactions include all the possible channels of unimolecular initiation of norbornane using a diradical approach, as in the case of tricyclo[5.2.1.02,6]decane.14 Cross-coupling reactions due to the presence of benzene in the reactor feed have also been reviewed, although it will be shown later in this paper that benzene is not very reactive under the operating conditions of this study.

Experimental Section

The apparatus (Figure 3) used for the experimental study of the thermal decomposition of norbornane (dissolved in benzene) has already been described in two papers about the pyrolysis of tricyclo[5.2.1.02,6]decane14 and n-dodecane,15 respectively. Its main features are recalled below, and the specific features linked to the dissolution of norbornane in benzene are discussed.

Figure 3

Experiments were performed in a continuous quartz jet stirred reactor operated at constant temperature (inner volume of about 90 cm³). This reactor was designed to be perfectly stirred for residence times ranging from 0.5 to 5 s.18,19 The reactor was heated by Thermocoax heating resistors coiled around the vessel. The temperature inside the reactor was measured with a type K thermocouple located at the level of the injection cross at the center of the vessel.
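As a rough illustration of what these operating conditions imply for the feed, the sketch below estimates the total molar flow needed to reach a given residence time. It assumes ideal-gas behaviour and that the residence time is the reactor volume divided by the volumetric flow at reactor temperature and pressure. The source does not state this formula, and the function name `molar_flow_for_residence_time` is hypothetical.

```python
R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def molar_flow_for_residence_time(volume_cm3, temperature_k, pressure_kpa, tau_s):
    """Total molar flow (mol/s) giving residence time tau_s in a stirred reactor.

    Assumption (not stated in the source): tau = V / Q, where Q is the
    volumetric flow at reactor temperature and pressure, and the gas is ideal.
    """
    volume_m3 = volume_cm3 * 1e-6
    q_m3_per_s = volume_m3 / tau_s      # volumetric flow at reactor conditions
    pressure_pa = pressure_kpa * 1e3
    return pressure_pa * q_m3_per_s / (R_GAS * temperature_k)  # n = P Q / (R T)


# Conditions quoted in the text: about 90 cm3, 106 kPa, 873-973 K, 1-4 s.
for temperature in (873.0, 973.0):
    for tau in (1.0, 4.0):
        flow = molar_flow_for_residence_time(90.0, temperature, 106.0, tau)
        print(f"T = {temperature:.0f} K, tau = {tau:.0f} s: {flow * 1e3:.3f} mmol/s")
```

Under these assumptions the required flow scales inversely with both temperature and residence time, so the longest residence time at the highest temperature corresponds to the smallest feed rate.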
Before entering the reactor, the reactants were preheated to the reaction temperature to avoid the formation of temperature gradients in the gas phase due to the endothermic nature of the pyrolysis reaction. The residence time of the reactants inside the annular preheater was very short, i.e. about 1% of the residence time inside the reactor. The pressure inside the reactor was set to 106 kPa and was controlled with a control valve located downstream of the product analysis devices.

Unlike most hydrocarbons of similar molecular weight, norbornane (C7H12) is solid at room temperature. The melting point of pure norbornane at atmospheric pressure is 360 K20 and its boiling point is 381 K.21 In order to study the pyrolysis of norbornane with the same apparatus as that used for n-dodecane and tricyclo[5.2.1.02,6]decane,14,15 solid norbornane was dissolved in a solvent. Benzene was chosen since it is a good solvent for many hydrocarbons and because, as an aromatic compound, it is very unreactive at low temperature. The norbornane used for the experiments was provided by Aldrich (mass fraction purity greater than 0.98) and the benzene by Fluka (mass fraction purity greater than 0.99). The liquid reactant (20 wt% norbornane, 80 wt% benzene) was stored in a pressurized glass vessel. Before the experiments, nitrogen bubbling and vacuum pumping were performed in order to remove traces of oxygen dissolved in the hydrocarbon mixture. The flow rate of the liquid reactant was set by a mass flow controller; the liquid was then mixed with the carrier gas (helium, 99.995% pure) and evaporated in a single-pass heat exchanger, the temperature of which was set above the boiling point of the diluted hydrocarbon mixture. The molar composition of the mixture at the inlet of the reactor was 0.7% norbornane, 3.6% benzene and 95.7% helium.

Products leaving the reactor were analyzed by gas chromatography. The analyses were performed in two steps.
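As a quick consistency check on the feed described above, the following sketch converts the liquid composition (20 wt% norbornane, 80 wt% benzene) into gas-phase mole percentages at the stated helium dilution of 95.7%. The helper names are hypothetical, and the atomic masses are standard rounded values rather than figures taken from the paper.

```python
# Standard atomic masses (g/mol), rounded; not taken from the source.
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def molar_mass(n_carbon, n_hydrogen):
    """Molar mass (g/mol) of a hydrocarbon CxHy."""
    return n_carbon * ATOMIC_MASS["C"] + n_hydrogen * ATOMIC_MASS["H"]


def inlet_mole_percents(wt_norbornane, wt_benzene, helium_mole_percent):
    """Return (norbornane, benzene) mole percentages in the diluted gas feed."""
    n_norbornane = wt_norbornane / molar_mass(7, 12)   # norbornane, C7H12
    n_benzene = wt_benzene / molar_mass(6, 6)          # benzene, C6H6
    x_norbornane = n_norbornane / (n_norbornane + n_benzene)
    hydrocarbon_percent = 100.0 - helium_mole_percent  # share not taken by helium
    return (hydrocarbon_percent * x_norbornane,
            hydrocarbon_percent * (1.0 - x_norbornane))


norbornane_pct, benzene_pct = inlet_mole_percents(20.0, 80.0, 95.7)
print(f"norbornane: {norbornane_pct:.2f} mol%, benzene: {benzene_pct:.2f} mol%")
```

Rounded to one decimal place, the result agrees with the inlet composition of 0.7% norbornane and 3.6% benzene quoted in the text.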
Light species (those that are gaseous at room temperature, such as hydrogen and hydrocarbons containing fewer than 5 carbon atoms) were analyzed on-line by two gas chromatographs. The first chromatograph was fitted with a carbosphere packed column and with both a flame ionization detector (FID) for the detection of methane and C2 hydrocarbons and a thermal conductivity detector (TCD) for the detection of hydrogen. Argon was chosen as carrier and reference gas in order to detect hydrogen with better sensitivity. A first analysis was performed at a constant oven temperature of 303 K to separate the hydrogen peak from the helium peak (helium being the experimental carrier gas). A second analysis was performed at a constant oven temperature of 473 K to separate the hydrocarbons. Retention times (in min) were: methane: 2.4, acetylene: 5.3, ethylene: 7.3 and ethane: 9.6. The second chromatograph used for the analysis of light species was equipped with a FID for hydrocarbon detection and a Haysep D packed column. This column gave a good separation of hydrocarbons from methane to the C5 hydrocarbons. In particular, the peaks corresponding to species such as allene and propene, or 1-butene, 2-butene, 1,3-butadiene and 1-butyne, were well defined. Retention times (in min) for the species whose formation was observed during the study were: methane: 2.6, ethane: 15.5, propene: 61.2, allene: 70.9, propyne: 73.5, 1-butene: 106.6, 1,3-butadiene: 107.7 and 1,3-cyclopentadiene: 147.8. Species identification and calibration were performed with gaseous standard mixtures provided by Air Liquide and Messer.

Heavy species (hydrocarbons containing more than 5 carbon atoms, which are liquid or solid at room temperature) were condensed in a trap connected to the outlet of the reactor and maintained at liquid nitrogen temperature for a set period of time. After this accumulation period, the trap was disconnected, and a solvent (acetone) and a known amount of internal standard (n-octane) were added.
When the trap had warmed to a temperature close to 273 K, the mixture was poured into a sampling bottle and then analyzed by gas chromatography. A first analysis was performed with a gas chromatograph fitted with a capillary HP-1 column and a FID for the separation and detection of hydrocarbons. To obtain a good separation of the reaction products, the oven temperature program was: 313 K held for 30 min, ramp of 5 K min⁻¹, then 453 K held for 62 min. Retention times of the main products of the reaction (in min) were: 1,3-cyclopentadiene: 3.7, benzene: 4.9, norbornane: 7.3, toluene: 7.9, styrene: 17.4, indene: 38.8, naphthalene: 46.3, biphenyl: 53.3. Calibration was performed with prepared solutions containing small, known amounts of hydrocarbons and of n-octane (internal standard). A second analysis, for the identification of the reaction products, was performed with a gas chromatography-mass spectrometry system operating under the same conditions as the gas chromatograph used for quantification (same column, same carrier gas, same carrier gas flow rate, same oven temperature profile). This procedure gave the same chromatograms with both chromatographs, so that the peaks could be compared directly. The products separated by the HP-1 column were identified by comparing the mass spectra of the detected peaks with the mass spectra in the NBS 75K library, which was provided by Agilent with the GC-MS apparatus.

The consistency between the different chromatographic analyses was verified using products that were present on two chromatograms (such as 1,3-cyclopentadiene, methane and ethane). In each case the relative variation between the mole fractions obtained from the different analyses was less than 5%. The repeatability of experimental results has been studied.
Calculated maximum uncertainties in the experimental mole fractions were ±5% for species analyzed on-line and ±8% for heavy species condensed in the trap.

Carbon to hydrogen ratios (C/H ratios) in the products have been calculated. For this calculation all species were taken into account except norbornane (this species has no influence on the value of the C/H ratio), benzene, toluene, styrene, naphthalene and biphenyl (the last four species come mainly from benzene). An average value of 0.60 (± 0.02) was obtained. This value is slightly above the theoretical value (7/12 = 0.58), but it should be kept in mind that a rigorous distinction between products from norbornane and products from benzene (C/H ratio of 1) is not possible.

Experimental Results

Norbornane – benzene binary mixture. The evolution of the conversion of norbornane with residence time is shown in Figure 4. Because the difference between the mass flow rates of norbornane entering and leaving the reactor could not be determined accurately enough at such low conversions, the conversions presented on this graph were deduced from the reaction products, excluding aromatic and polyaromatic compounds (toluene, styrene, indene, naphthalene and biphenyl, produced in very small quantities). Under the conditions of the study, these last products were probably derived from benzene. At higher conversions, aromatic and polyaromatic compounds could be formed through secondary reactions from small unsaturated hydrocarbons and from 1,3-cyclopentadiene. No change in the mass flow rate of benzene was observed between the inlet and the outlet of the reactor under the operating conditions of our study: benzene appeared to be very stable, with very low conversion.

Figure 4

Twenty-five products of the thermal decomposition of the norbornane – benzene binary mixture have been analyzed.
These products are (by increasing molecular weight): hydrogen, methane, acetylene, ethylene, ethane, allene, propyne, propene, 1-butene, 1,3-butadiene, 1,3-cyclopentadiene, 1,3-cyclohexadiene, 1,4-cyclohexadiene, 5-methyl-1,3-cyclopentadiene, 1,3,5-hexatriene, toluene, 3-ethyl-cyclopentene, ethenyl-cyclopentane, 4-methyl-cyclohexene, methylene-cyclohexane, styrene, indene, naphthalene and biphenyl. An unidentified minor product (molecular weight of 94 g mol⁻¹ according to mass spectrometry) was detected between toluene and styrene.

It is worth noting that several species having the same molecular weight as norbornane were formed in small amounts: 3-ethyl-cyclopentene, ethenyl-cyclopentane, 4-methyl-cyclohexene and methylene-cyclohexane (Figure 5). The evolution of the mole fractions of these four products with residence time is shown in Figure 6. The possible channels of formation of these particular species, which were observed even at very low conversion (less than 0.5%), will be discussed later in this paper. The formation of very small quantities of aromatic and polyaromatic compounds such as toluene, styrene, indene, naphthalene and biphenyl was also observed. These species probably come from reactions of benzene or from cross-coupling reactions of the norbornane – benzene binary mixture. The presence of benzene in the reactor feed masked the possible formation of small quantities of this species as a specific product of norbornane.

Figure 5

Figure 6

Figure 7 displays the distribution of the products in terms of selectivity (here the selectivity of a product is defined as the ratio of the mole fraction of that product to the sum of the mole fractions of all products) at a temperature of 973 K and a residence time of 1 s. This figure shows that the three main products of the reaction are hydrogen, ethylene and 1,3-cyclopentadiene, which are formed in similar quantities.
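The selectivity defined above is simple to compute from a set of measured product mole fractions. The sketch below shows one way to do it; the function name `selectivities` is hypothetical and the mole fractions are made-up illustrative numbers, not data from this study.

```python
def selectivities(mole_fractions):
    """Selectivity of each product: its mole fraction over the sum for all products."""
    total = sum(mole_fractions.values())
    if total <= 0:
        raise ValueError("the sum of the product mole fractions must be positive")
    return {name: x / total for name, x in mole_fractions.items()}


# Hypothetical product mole fractions, for illustration only.
example = {
    "hydrogen": 3.0e-4,
    "ethylene": 3.0e-4,
    "1,3-cyclopentadiene": 2.0e-4,
    "methane": 2.0e-4,
}

result = selectivities(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

By construction the selectivities sum to one, so they describe the product distribution independently of the overall conversion.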
Figure 8 shows the evolution with residence time of the mole fractions of these three main products, as well as those of methane, propene, 1,3-butadiene, toluene and biphenyl.

Figure 7

Figure 8

The primary products of the thermal decomposition of norbornane dissolved in benzene were determined from a study of the selectivities performed at a temperature of 953 K (corresponding to a maximum conversion of 15%). A species is probably a primary product if the extrapolation to the origin of its selectivity versus residence time gives a nonzero value (corresponding to a nonzero initial rate of production). According to this study, 15 species seem to be primary products: hydrogen, methane, ethylene, ethane, propene, 1-butene, 1,3-butadiene, 1,3-cyclopentadiene, 1,3-cyclohexadiene, toluene, 3-ethyl-cyclopentene, 4-methyl-cyclohexene, methylene-cyclohexane, ethenyl-cyclopentane and biphenyl. The selectivities at the origin of these products are given in Table 1. The species with the highest selectivities at the origin are hydrogen (0.346), ethylene (0.314) and 1,3-cyclopentadiene (0.184). Unlike toluene and biphenyl (Figure 9a), the other aromatic and polyaromatic compounds (styrene, indene and naphthalene) do not seem to be primary products (the extrapolations to the origin of their selectivities versus residence time are close to zero; Figure 9b). This can be explained by the fact that toluene and biphenyl can be obtained directly from phenyl radicals derived from benzene (by combination of a phenyl radical with a methyl radical for toluene; by self-combination of phenyl radicals or by ipso addition of phenyl to benzene for biphenyl), whereas styrene, indene and naphthalene cannot be formed directly from primary radicals generated by the decomposition of norbornane and benzene. Amongst the species having the same molecular weight as norbornane, 3-ethyl-cyclopentene has the largest selectivity at the origin.

Table 1

Figure 9

Influence of benzene on the kinetics of the reaction.
A short study of the thermal decomposition of pure benzene was performed in order to determine whether benzene plays a role in the kinetics of the pyrolysis of the norbornane – benzene binary mixture. Experiments were performed at temperatures between 913 and 973 K, at residence times ranging from 1 to 4 s and at a pressure of 106 kPa. The molar composition of the flow entering the reactor was 96.4% helium and 3.6% benzene (the mole fraction of benzene at the inlet of the reactor was set to the same value as in the study of the norbornane – benzene binary mixture, to allow direct comparison). Figure 10 shows the evolution of the conversion of benzene with residence time at temperatures between 913 and 973 K (the conversions were deduced from the products of the reaction). Under these operating conditions benzene appeared to be very stable. A maximum benzene conversion of 8×10⁻² % was obtained at a temperature of 973 K and a residence time of 1 s. For comparison, the conversion of norbornane (dissolved in benzene) was 9.7% under the same conditions.

Figure 10

The only product of the reaction that was detected was biphenyl. Hydrogen is probably another product of the reaction22-25 but it was not detected (the TCD is known to be a much less sensitive detector than the FID). The formation of toluene, styrene, indene and naphthalene was not observed. These species, which were observed in the case of the norbornane – benzene binary mixture, were therefore probably formed by cross-coupling reactions or by specific reactions of norbornane.

The mole fractions of biphenyl obtained in the two studies (pure benzene and norbornane – benzene binary mixture) have been compared. The two graphs of Figure 11 display the evolution of the mole fraction of biphenyl with residence time (Figure 11a) and with temperature (Figure 11b).
This figure shows that the mole fraction of biphenyl is always slightly larger in the case of the norbornane – benzene binary mixture than in the case of pure benzene (apart from the experiment leading to the lowest conversion, performed at a temperature of 913 K and a residence time of 1 s; Figure 11b). It can also be observed that the difference between the mole fractions obtained in the two studies increases with conversion. While interactions between the two hydrocarbons do exist during the thermal decomposition of norbornane dissolved in benzene, the conditions of our study (temperatures no higher than 973 K) are such that the presence of benzene has a negligible influence on the reactions of norbornane. In a recent paper26 El Balkali et al. showed that, in the case of the oxidation of an equimolar n-heptane – benzene binary mixture, the presence of benzene had very little influence on the reactivity at low temperature.

Figure 11

Comparison with the pyrolysis of tricyclo[5.2.1.02,6]decane. Experimental results obtained during this study have been compared with previous ones obtained with tricyclo[5.2.1.02,6]decane,14 a tricyclic alkane whose structure contains that of norbornane (Figure 12). This tricyclic alkane can be regarded as a norbornane structure sharing two adjacent carbon atoms with a cyclopentane structure.

Figure 12

Figure 13 displays the conversions of norbornane (dissolved in benzene) and tricyclo[5.2.1.02,6]decane obtained at temperatures ranging from 873 to 973 K, at a residence time of 1 s and at a pressure of 106 kPa. The mole percentages of norbornane and tricyclo[5.2.1.02,6]decane at the inlet of the reactor were both set to 0.7%. This figure shows that the reactivities of the two polycyclic alkanes are very similar.
Figure 13

The thermal decomposition of tricyclo[5.2.1.02,6]decane leads to the formation of large amounts of hydrogen, ethylene, propene, 1,3-cyclopentadiene and cyclopentene.14 While hydrogen, ethylene and 1,3-cyclopentadiene were also amongst the main products of the thermal decomposition of norbornane, propene appeared to be a minor product and the formation of cyclopentene was not observed. An analysis of the kinetic model of the pyrolysis of tricyclo[5.2.1.02,6]decane14 shows that cyclopentene and allyl radicals (precursors of propene) mainly come from the cyclopentane part of the structure of tricyclo[5.2.1.02,6]decane.

Discussion

Most of the reactions involved in the pyrolysis of polycyclanes are still poorly known and the related kinetic parameters are still very uncertain. We describe here the reactions involved in the mechanism of the thermal decomposition of the norbornane – benzene binary mixture and discuss the possible channels of formation of the reaction products.

Unimolecular initiations by bond scission of norbornane. Fate of diradicals. Unlike linear and branched alkanes, for which two free radicals are obtained directly, unimolecular initiations of polycyclic alkanes by breaking of a C–C bond lead to the formation of diradicals (species with two radical centers). The norbornane molecule (a bicyclic alkane) has three different types of C–C bond. The unimolecular initiations can therefore lead to the formation of the three diradicals BR1, BR2 and BR3 shown in Figure 14. According to O’Neal and Benson5 the activation energies of these reactions are given by the expression $E_1 = \Delta H(\text{C–C}) - \Delta E_{TC} + E_{-1}$, where $E_1$ is the activation energy of the ring-opening reaction, $\Delta H(\text{C–C})$ is the bond energy of the broken C–C bond, $\Delta E_{TC}$ is the difference in ring strain energy between the reactant and the product (i.e. the strain released on ring opening), and $E_{-1}$ is the activation energy of the reverse reaction of closure of the diradical.
If it is considered that the ring strain energy of norbornane2 is equal to 17.2 kcal mol⁻¹, that diradicals BR1 and BR2 have the same ring strain energy as cyclopentane (6.3 kcal mol⁻¹) and that BR3 has the same ring strain energy as cyclohexane (0 kcal mol⁻¹), the terms $\Delta E_{TC}$ are equal to 10.9 kcal mol⁻¹ for BR1 and BR2 and to 17.2 kcal mol⁻¹ for BR3. If we take the sum $\Delta H(\text{C–C}) + E_{-1}$ to be about 87 kcal mol⁻¹ (as was observed for cyclopropane, cyclobutane, cyclopentane and cyclohexane14), we obtain activation energies of 76.1 kcal mol⁻¹ for BR1 and BR2 and of 69.8 kcal mol⁻¹ for BR3.

Figure 14

In previous studies of the pyrolysis of cyclanes and polycyclanes6,7,9,14 it has been shown that diradicals can react in three ways:
(1) by combination to give back the initial (poly)cyclane; this reaction is the reverse of a unimolecular initiation by C–C bond scission.
(2) by internal disproportionation through a (poly)cyclic transition state; an unsaturated molecule is then obtained.
(3) by decomposition by β–scission; the products of the reaction depend on the position of the two radical centers. In most cases, a smaller diradical and a molecule are obtained.

Figure 15

Kinetic parameters of these reactions have been estimated for cyclobutane, cyclopentane and cyclohexane from quantum calculations by Sirjean et al.9 This study showed that the easiest reaction is the reverse reaction by combination of the diradical formed by the unimolecular initiation (this is why cyclanes and polycyclanes are more stable than linear and branched alkanes). Apart from this last reaction, internal disproportionation is much easier than decomposition by β–scission (except in the particular case of cyclobutane, in which the β–scission is easier because the broken C–C bond is in the β position relative to both radical centers).
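The activation energies estimated above for the ring opening of norbornane follow from simple arithmetic on the O’Neal and Benson expression. The short sketch below reproduces them; the function name `ring_opening_activation_energy` is hypothetical, and the default of 87 kcal/mol for the sum of the bond energy and the ring-closure barrier is the approximation stated in the text, not an independent value.

```python
def ring_opening_activation_energy(strain_reactant, strain_diradical,
                                   bond_plus_closure=87.0):
    """Estimate E1 = [dH(C-C) + E-1] - dE_TC, all in kcal/mol.

    dE_TC is taken as the ring strain released on opening, i.e. the strain of
    the reactant minus the strain assumed to remain in the diradical.
    """
    delta_e_tc = strain_reactant - strain_diradical
    return bond_plus_closure - delta_e_tc


NORBORNANE_STRAIN = 17.2    # kcal/mol, value quoted in the text
CYCLOPENTANE_STRAIN = 6.3   # kcal/mol, strain assigned to BR1 and BR2
CYCLOHEXANE_STRAIN = 0.0    # kcal/mol, strain assigned to BR3

estimates = {
    "BR1": ring_opening_activation_energy(NORBORNANE_STRAIN, CYCLOPENTANE_STRAIN),
    "BR2": ring_opening_activation_energy(NORBORNANE_STRAIN, CYCLOPENTANE_STRAIN),
    "BR3": ring_opening_activation_energy(NORBORNANE_STRAIN, CYCLOHEXANE_STRAIN),
}

for name, energy in estimates.items():
    print(f"{name}: {energy:.1f} kcal/mol")
```

The lower estimate for BR3 reflects the assumption that opening to a cyclohexane-like diradical releases all of the ring strain of norbornane.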
It is worth noting that in the early stage of the reaction, (poly)cyclanes mainly lead to the formation of molecular species through diradicals and do not lead directly to the formation of free radicals. Thus at very low conversion the concentration of radicals is very low and the primary molecular products formed from the reactant mainly react by unimolecular initiations to form new diradicals or free radicals.

Figure 15 displays the possible internal disproportionations and decompositions by β–scission of the diradicals BR1, BR2 and BR3 of Figure 14. For example, BR2 can react by three different β–scission reactions (to form two new diradicals of the same size and a smaller diradical with ethylene) and by three internal disproportionations through bicyclic transition states (to form three unsaturated molecular species). The new diradicals obtained by β–scission from BR1, BR2 and BR3 react in turn by combination, disproportionation and β–scission. Molecular species obtained through internal disproportionation (which are the main products obtained from diradicals) react by unimolecular initiations to form diradicals and/or free radicals.

During the experiments, the formation of small amounts of 3-ethyl-cyclopentene, ethenyl-cyclopentane, 4-methyl-cyclohexene and methylene-cyclohexane (corresponding to MA4, MA2, MA6 and MA5 in the scheme of Figure 15) was observed in the early stage of the reaction. These species can be obtained through internal disproportionations of diradicals generated by the unimolecular initiations of norbornane and/or through metatheses of radicals involved in the transfer and propagation reactions of the decomposition of the norbornyl radicals (but this last source of formation is in competition with more probable reactions of β–scission and isomerization).
Thus 3-ethyl-cyclopentene, ethenyl-cyclopentane, 4-methyl-cyclohexene and methylene-cyclohexane likely come from the initiation step of norbornane (this is consistent with the observation of these species at very low conversion). The formation of 1-methylene,3-methyl-cyclopentane (MA1 on Figure 15) was not observed under the conditions of our study. This is probably because the diradical BR1 is more likely to terminate by combination to give back norbornane. Unlike 3-ethyl-cyclopentene and ethenyl-cyclopentane, 4-ethyl-cyclopentene (MA3 on Figure 15) was not observed. This may be explained by the fact that the bicyclic transition state which connects diradical BR2 and MA3 is more strained than the transition states which connect BR2 with MA2 and MA4. The study of the selectivities of the products of the reaction showed that MA2, MA4, MA5 and MA6 appear to be primary products (extrapolation of their selectivities to the origin gave non-zero values). Among these four species, MA4 (3-ethyl-cyclopentene) has the highest selectivity at the origin (Table 1), which also means that it has the highest initial rate of formation. This suggests that the unimolecular initiation of norbornane to diradical BR2, followed by the internal disproportionation to MA4, is the easiest path of the initiation step. Possible unimolecular initiations of 3-ethyl-cyclopentene (the main initiation product from norbornane) are given as an example (Figure 16). The breaking of the two vinylic C–C bonds of MA4 is not shown on the scheme of Figure 16 because their bond dissociation energy is much higher than those of alkylic and allylic C–C bonds.
Figure 16
Unimolecular initiations by breaking of C–H bonds can also be considered, but these reactions are more difficult than unimolecular initiations by breaking of C–C bonds.
These reactions can lead to the formation of the three norbornyl radicals shown on Figure 17, but this channel is negligible compared to the metathesis reactions of radicals with norbornane (see below).
Figure 17
Transfer and propagation reactions of norbornyl radicals. The norbornane molecule has three different types of carbon atom. Metatheses of hydrogen atoms and radicals with norbornane therefore lead to the formation of three norbornyl radicals (Figure 18).
Figure 18
Figure 19
The three norbornyl radicals can decompose by β–scission to form six cyclic radicals (Figure 19). These six new radicals can then react by decomposition by β–scission, by isomerization and by metathesis (H abstraction) with molecules. Figure 20 displays the reactions of β–scission, metathesis and isomerization of radical R6 from Figure 19. Only isomerizations involving allylic hydrogen atoms have been taken into account on this scheme. The metatheses of radical R6 with molecules lead to the formation of 3-ethyl-cyclopentene (MA4), but these reactions are less probable than the reactions of β–scission and isomerization. Estimations of the activation energies of β–scission reactions opening the ring of cyclopentyl and cyclohexyl radicals9,27 showed that the values used for linear and branched alkyl radicals28-30 cannot be used systematically for cycloalkyl radicals.14 Norbornyl radicals have a more complex structure than cyclopentyl and cyclohexyl radicals, and the activation energies of their ring-opening β–scission reactions are very uncertain. Moreover, for some (poly)cyclic radicals (e.g. cyclopentyl radicals) the activation energy of the β–scission of C–C bonds may be much higher than the value used for linear and branched alkyl radicals28-30, and the β–scission of C–H bonds may become competitive with the β–scission of C–C bonds.
Figure 20
Uncertainties in the kinetic parameters of the reactions involved in the transfer and propagation steps of the norbornyl radicals make the discussion difficult at this stage of the study. Nevertheless, it can be noted that some radicals (like the radicals R6 and R7 of Figure 19) can lead to the formation of ethylene and cyclic C5 radicals, which are precursors of 1,3-cyclopentadiene.
Figure 21
Reactions of benzene. Benzene is known to be a very stable hydrocarbon. At low temperature, the primary reactions of the pyrolysis of benzene are rather simple.31 The only unimolecular initiation is the breaking of a C–H bond (bond energy of 110.9 kcal.mol-1). Breaking of a ring C–C bond of benzene is very difficult because of the high energy of this type of bond (120.8 kcal.mol-1). Bimolecular initiation (the reverse of a termination by disproportionation) consists of the transfer of a hydrogen atom from one molecule of benzene to another. This step leads to the formation of a phenyl radical and a 2,4-cyclohexadien-1-yl radical. The activation energy of this bimolecular initiation (close to the enthalpy of the reaction: 94.4 kcal.mol-1) is rather high. Under the conditions of our study, unimolecular and bimolecular initiations of benzene are probably negligible compared to the unimolecular initiations of norbornane, which have lower activation energies (part of the strain energy of norbornane is recovered during ring opening3). The initiation step mainly leads to the formation of hydrogen atoms and phenyl radicals (Figure 21). Hydrogen atoms can react by metathesis with benzene to form a hydrogen molecule and a phenyl radical, by self-combination to form a hydrogen molecule (this termolecular step is negligible) and by addition to benzene to give the 2,4-cyclohexadien-1-yl radical. Phenyl radicals can react by self-combination or by addition to benzene (followed by the loss of a hydrogen atom) to form biphenyl.
Sivaramakrishnan et al.32 observed the formation of acetylene and diacetylene from the decomposition of the phenyl radical by C–C bond β–scission under extreme conditions (high-temperature, high-pressure benzene pyrolysis behind reflected shock waves). The 2,4-cyclohexadien-1-yl radical can react by C–H bond β–scission to give benzene; it can decompose by C–C bond β–scission to give 1,3,5-hexatrien-1-yl radicals and then acetylene and 1,3-butadien-1-yl radicals; and it can lead to 1,3-cyclohexadiene and 1,4-cyclohexadiene by metathesis with a molecule. However, under the conditions of the present study of benzene pyrolysis, i.e. at low temperature (below 973 K) and close to atmospheric pressure, decomposition of phenyl and 2,4-cyclohexadien-1-yl radicals does not occur, and the formation of acetylene and unsaturated C4 hydrocarbons was not observed. This is in agreement with Brooks et al., who observed mainly hydrogen and biphenyl when studying the pyrolysis of benzene at temperatures between 873 and 1036 K in a static reactor.25 According to Brioukov et al.31, who analyzed the experimental results obtained by Brooks et al.25, the decomposition of benzene is dominated by the unimolecular initiation generating a hydrogen atom and a phenyl radical, followed by a short propagation chain composed of the metathesis of a hydrogen atom with benzene and the ipso addition of a phenyl radical to benzene leading to biphenyl and a hydrogen atom.
Cross-coupling reactions of the benzene – norbornane binary mixture. Comparison between the experimental results obtained for the pyrolysis of pure benzene and for that of norbornane dissolved in benzene showed that interactions between the two hydrocarbons are weak. There are very few possible cross-coupling reactions (mainly metatheses and terminations) because the primary mechanism of pyrolysis of benzene generates few species.
Bimolecular initiations involving norbornane and benzene molecules lead to the formation of the three norbornyl radicals and of the 2,4-cyclohexadien-1-yl radical (Figure 22). The activation energies of these reactions (between 80.5 and 89.4 kcal.mol-1 depending on the transferred hydrogen atom) are slightly higher than those of the unimolecular initiation of norbornane but lower than those of the unimolecular initiation of benzene and of the bimolecular initiation between two molecules of benzene. Hydrogen atoms and radicals deriving from benzene (phenyl radicals and 2,4-cyclohexadien-1-yl radicals) can react by metathesis with norbornane to form the three norbornyl radicals (hydrogen, benzene, and 1,3-cyclohexadiene and 1,4-cyclohexadiene are obtained, respectively). In the same way, radicals deriving from norbornane (more numerous than in the case of benzene) can react by metathesis with benzene to give phenyl radicals (Figure 22). Terminations between radicals deriving from the two hydrocarbons can explain the formation of some products such as toluene: this species can be obtained from the combination of a phenyl radical (deriving from benzene) and a methyl radical (deriving from norbornane and not from benzene at such low temperature).
Figure 22
Conclusions
New experimental results on the thermal decomposition of norbornane dissolved in benzene have been obtained in a jet stirred reactor. Particular attention was paid to the identification and quantification of the products of the reaction. The formation of 25 species, both major and minor, was observed during the experiments. The main products were hydrogen, ethylene and 1,3-cyclopentadiene. The detection of minor species having the same molecular weight as norbornane gave interesting information about the unimolecular initiations of norbornane.
The study of the selectivities of the products of the reaction showed that 15 species were probably primary products, and the values of the selectivities extrapolated to the origin suggest that the easiest initiation of norbornane leads to the formation of 3-ethyl-cyclopentene. The use of benzene as a solvent for norbornane appeared to be a good choice because benzene was very unreactive under the operating conditions of our study: firstly, interactions between the two hydrocarbons remained weak and the reactivity of norbornane was little affected by the presence of benzene; secondly, the unimolecular initiations of norbornane (the most probable initiation steps) were not masked by initiations involving benzene, and the formation of molecular products from the diradicals generated through the initiations could be observed. The reactions that occur during the pyrolysis of norbornane and during the pyrolysis of benzene have been reviewed and described. Cross-coupling reactions in the norbornane – benzene binary mixture are rather limited because the decomposition of benzene generates only a few species at low temperature. The kinetics of the unimolecular initiations by breaking of C-C bonds that are part of a ring structure, of the reactions of the diradicals (termination by combination, termination by disproportionation and decomposition by β-scission) and of the decompositions by β-scission leading to the opening of a ring (e.g. β-scission of the norbornyl radicals) require further investigation to obtain reliable estimates of the kinetic parameters of these specific and sensitive reactions.
ACKNOWLEDGMENT
This work was supported by MBDA-France and the CNRS. We are grateful to E. Daniau, M. Bouchez, and F. Falempin for helpful discussions.
REFERENCES
1. Doms, L.; Van den Enden, L.; Geise, H. J.; Van Alsenoy, C. J. Am. Chem. Soc. 1983, 105, 158-162. 2. Verevkin, S. P.; Emel’yanenko, V. N. J. Phys. Chem. A 2004, 108, 6575-6580. 3. Benson, S.
W. Thermochemical Kinetics, 2nd ed; John Wiley: New York, 1976. 4. Baldwin, J. E.; Burrell, R. C.; Shukla, R. Org. Lett. 2002, 4, 3305-3307. 5. O’Neal, H. E.; Benson, S. W. Int. J. Chem. Kinet. 1970, 2, 423-456. 6. Tsang, W. Int. J. Chem. Kinet. 1978, 10, 599-617. 7. Tsang, W. Int. J. Chem. Kinet. 1978, 10, 1119-1138. 8. Pedersen, S.; Herek, J. L.; Zewail, A. H. Science 1994, 266, 1359-1364. 9. Sirjean, B.; Glaude, P. A.; Ruiz-Lopez, M. F.; Fournet R. J. Phys. Chem. A 2006, 110, 12693- 12704. 10. Striebich, R. C.; Lawrence, J. J. Anal. Appl. Pyrol. 2003, 70, 339-352. 11. Rao, P. N.; Kunzru, D. J. Anal. Appl. Pyrol. 2006, 76, 154-160. 12. Nakra, S.; Green, R. J.; Anderson, S. L., Combust. Flame 2006, 144, 662-674. 13. Davidson, D. F.; Horning, D. C.; Oelschlaeger, M. A.; Hanson, R. K.; 37th Joint Propulsion Conference, Salt Lake City, UT, 2001; AIAA-01-3707. 14. Herbinet, O.; Sirjean, B.; Bounaceur, R.; Fournet, R.; Battin-Leclerc, F.; Scacchi, G.; Marquaire, P. M. J. Phys. Chem. A 2006, 110, 11298-11314. 15. Herbinet, O.; Marquaire, P. M.; Battin-Leclerc, F.; Fournet, R. J. Anal. Appl. Pyrol. doi:10.1016/j.jaap.2006.10.010. 16. Chambon, M.; Marquaire, P. M.; Come, G. M. C1 Mol. Chem. 1987, 2, 47-59. 17. Ziegler, I.; Fournet, R.; Marquaire, P. M. J. Anal. Appl. Pyrol., 2005, 73, 107-115. 18. Matras, D.; Villermaux, J. Chem. Eng. Sci. 1973, 28, 129-137. 19. David, R.; Matras, D. Can. J. Chem. Eng. 1975, 53, 297-300. 20. Burwell, R. L.; Shim, B. K. C.; Rowlinson, H. C., J. Am. Chem. Soc. 1957, 79, 5142-5148. 21. Desty, D. H.; Whyman, B. H. F. Anal. Chem. 1957, 29, 320-329. 22. Mead, F. C.; Burk, R. E. Ind. Eng. Chem. 1935, 27, 299-301. 23. Hou, K. C.; Palmer, H. B. J. Phys. Chem. 1965, 69, 863-868. 24. Louw, R.; Lucas, H. J. Recueil des Travaux Chimiques des Pays-Bas 1973, 922, 55-71. 25. Brooks, C. T.; Peacock, S. J.; Reuben, B. G. J. Chem. Soc. Faraday Trans. I 1979, 75, 652-62. 26. El Balkali, A.; Ribaucour, M.; Saylam, A.; Vanhove, G.; Thersen, E.; Pauwels, J. F. 
Fuel 2006, 85, 881-895. 27. Handford-Styring, S. M.; Walker, R. W. J. Chem. Soc., Faraday Trans. 1995, 91, 1431-1438. 28. Buda, F.; Bounaceur, R.; Warth, V.; Glaude, P. A.; Fournet, R.; Battin-Leclerc, F. Combust. Flame 2005, 142, 170-186. 29. Warth, V.; Stef, N.; Glaude, P. A.; Battin-Leclerc, F.; Scacchi G.; Côme, G. M. Combust. Flame 1998, 114, 81-102. 30. Dahm, K. D.; Virk, P. S.; Bounaceur, R.; Battin-Leclerc, F.; Marquaire, P. M.; Fournet, R.; Daniau, E.; Bouchez, M. J. Anal. Appl. Pyrol. 2004, 71, 865-881. 31. Brioukov, M. G.; Park, J.; Lin, M. C. Int. J. Chem. Kinet. 1999, 31, 577-582. 32. Sivaramakrishnan, R.; Brezinsky, K.; Vasudevan, H.; Tranter, R. S. Combust. Sci. and Tech. 2006, 178, 285-305.

Table 1. Values of selectivities at origin of primary products of the reaction of pyrolysis of norbornane in benzene at a temperature of 953 K.

Species                  Selectivities to origin
hydrogen                 0.346
methane                  0.011
ethylene                 0.314
ethane                   0.004
propene                  0.020
1-butene                 0.006
1,3-butadiene            0.035
1,3-cyclopentadiene      0.184
1,3-cyclohexadiene       0.013
toluene                  0.004
3-ethyl-cyclopentene     0.028
4-methyl-cyclohexene     0.009
methylene-cyclohexane    0.007
ethenyl-cyclopentane     0.004
biphenyl                 0.012
Total                    0.997 (theoretical value: 1)

Figure 1. Structure of norbornane (bicyclo[2.2.1]heptane).
Figure 2.
Isomerization of 3-butenyl-cyclopropane to norbornane through a diradical intermediate as proposed by Baldwin et al.4
[Figure 3 labels: Nitrogen; Vacuum; Liquid hydrocarbon supply; Reactor pressure control valve; Liquid hydrocarbon; Jet Stirred Reactor; Annular preheating zone; Controlled Evaporator and Mixer; Pressurized vessel containing the mixture of benzene and norbornane; Gas mass flow controller; Liquid mass flow controller; Gaseous product sampling line; Inert gas supply; Liquid nitrogen bath; Ice bath; On-line GC analysis]
Figure 3. Experimental apparatus flow sheet.
[Figure 4 axis: Residence time (s), 0 to 4]
Figure 4. Evolution of the conversion of norbornane with residence time (873 K, 893 K, 913 K, 933 K, 953 K, 973 K).
Figure 5. Structure of products having the same molecular weight as norbornane. (a) 3-ethyl-cyclopentene, (b) ethenyl-cyclopentane, (c) methylene-cyclohexane and (d) 4-methyl-cyclohexene.
[Figure 6 axis: Residence time (s), 0 to 4]
Figure 6. Evolution of the mole fractions of products having the same molecular weight as norbornane. (a) 3-ethyl-cyclopentene, (b) ethenyl-cyclopentane, (c) methylene-cyclohexane, (d) 4-methyl-cyclohexene (873 K, 893 K, 913 K, 933 K, 953 K, 973 K).
[Figure 7 labels: hydrogen, ethane, acetylene, ethylene, ethane, propene, allene, propyne, 1-butene, 1,3-butadiene, 1,3-cyclopentadiene, 1,3,5-hexatriene, 1,4-cyclohexadiene, ethyl-1,3-cyclopentadiene, 1,3-cyclohexadiene, toluene, styrene, indene, naphthalene, biphenyl, 3-ethyl-cyclopentene, ethenyl-cyclopentane, ethyl-cyclohexene, ethylene-cyclohexane, unidentified product; T = 973 K, τ = 1 s]
Figure 7.
Distribution of the products of the reaction at a temperature of 973 K and at a residence time of 1 s.
[Figure 8 axis: Residence time (s), 0 to 4]
Figure 8. Evolution of the mole fractions of some products of the reaction with residence time. (a) hydrogen, (b) ethylene, (c) 1,3-cyclopentadiene, (d) methane, (e) propene, (f) 1,3-butadiene, (g) toluene, (h) biphenyl (873 K, 893 K, 913 K, 933 K, 953 K, 973 K).
[Figure 9 axis: Residence time (s), 0 to 4; curves: (a) biphenyl, toluene; (b) styrene, naphthalene, indene]
Figure 9. Evolution of the selectivities of (a) biphenyl and toluene and (b) styrene, indene and naphthalene with residence time (at a temperature of 953 K).
[Figure 10 axis: Residence time (s), 0 to 4; curves: 973 K, 953 K, 933 K, 913 K]
Figure 10. Evolution of the conversion of benzene with residence time.
[Figure 11 panels: (a) Residence time (s): 0.99, 1.96, 2.96, 3.92; T = 933 K; (b) Temperature (K): 913, 933, 953, 973; τ = 1 s; series: benzene, benzene - norbornane]
Figure 11. Comparison of mole fractions of biphenyl obtained in the two studies. Evolution with residence time (a) and with temperature (b).
Figure 12. Structure of the tricyclo[5.2.1.02,6]decane.
[Figure 13 axis: Temperature (K), 873 to 973; series: tricyclo[5.2.1.02,6]decane, norbornane in benzene]
Figure 13. Comparison of the conversions of norbornane (dissolved in benzene) and tricyclo[5.2.1.02,6]decane. Experiments were performed at a residence time of 1 s, at a pressure of 106 kPa and with a mole percentage of reactant at the inlet of the reactor of 0.7%.
[Figure 14 labels: norbornane, BR1, BR3]
Figure 14. Reactions of unimolecular initiation of norbornane by breaking of C-C bonds.
[Figure 15 labels: BR3, BR4, BR5, BR7, MA2, MA3, MA5, MA6, + C2H4; β-scission, disproportionnation]
Figure 15. Reactions of β-scission and of disproportionnation of diradicals BR1, BR2 and BR3 of Figure 14.
[Figure 16 labels: + C2H5, + CH3]
Figure 16.
Reactions of unimolecular initiation of 3-ethyl-cyclopentene (MA4 of Figure 15).
Figure 17. Reactions of unimolecular initiation of norbornane by breaking of C-H bonds.
[Figure 18 labels: norbornane, R1, R2, R3]
Figure 18. The three norbornyl radicals obtained by metatheses of hydrogen atoms or radicals R on norbornane.
Figure 19. Reaction of decomposition by β-scission of the three norbornyl radicals.
[Figure 20 labels: R16, R17; β-scission, isomerization, metathesis (+RH)]
Figure 20. Reactions of the radical R6 of Figure 19.
[Figure 21 reaction labels: unimolecular initiation; bimolecular initiation; metathesis of H atoms with benzene; combination of two H atoms, H + H (+ M) → H2 (+ M); combination of two phenyl radicals; addition of H atoms to benzene; addition of phenyl radicals to benzene]
Figure 21. Primary reactions of the pyrolysis of benzene.
[Figure 22 reaction labels: bimolecular initiation; metathesis of phenyl radical with norbornane; metatheses of radicals deriving from norbornane with benzene; combination/disproportionation of radicals deriving from norbornane (Rn) and benzene (Rb)]
Figure 22. Cross-coupling reactions of norbornane and benzene.
ABSTRACT
The thermal decomposition of norbornane (dissolved in benzene) has been studied in a jet stirred reactor at temperatures between 873 and 973 K, at residence times ranging from 1 to 4 s and at atmospheric pressure, leading to conversions from 0.04 to 22.6%. 25 reaction products were identified and quantified by gas chromatography, amongst which the main ones are hydrogen, ethylene and 1,3-cyclopentadiene. A mechanistic investigation of the thermal decomposition of the norbornane - benzene binary mixture has been performed. The reactions involved in the mechanism have been reviewed: unimolecular initiations by C-C bond scission of norbornane, fate of the generated diradicals, reactions of transfer and propagation of norbornyl radicals, reactions of benzene and cross-coupling reactions.
<|endoftext|><|startoftext|> Introduction Many important gas-phase or heterogeneous industrial processes such as combustion, partial oxidation, cracking or pyrolysis exhibit a complex chemical scheme involving hundreds of species and several thousand reactions. In these thermal processes, cyclic hydrocarbons, particularly cycloalkanes, represent an important class of compounds. These molecules are produced during the gas-phase processes, though they can also be present in the reactants in large amounts; for example, a commercial jet fuel contains about 26% of naphthenes and condensed naphthenes, while in a commercial diesel fuel this percentage reaches 40% [1]. Modeling their reactivity is currently an important challenge in the formulation of new fuels that are less polluting and usable with new types of engine combustion such as “Homogeneous Charge Compression Ignition” (HCCI) [2]. During combustion or pyrolysis processes, cycloalkanes can lead to the formation of a) toxic compounds or soot precursors such as benzene (by successive dehydrogenations) and b) linear unsaturated species such as buta-1,3-diene or acrolein (by the opening of the ring). Several experimental and modeling studies have been carried out on the gas-phase oxidation of cycloalkanes [3-9]. However, the modeling of the combustion of cycloalkanes remains difficult due to the lack of both kinetic data for elementary reactions and thermodynamic data (∆H°f, S°, C°p) for the relevant species. A number of experimental and theoretical works have been reported on the decomposition of small cycloalkanes, such as cyclopropane [10-16]. Ring opening in this molecule is a well-known process and the rate of dissociation leading to propene formation has been extensively measured and calculated. No fewer than 37 estimations of the rate constant are available in the NIST chemical kinetics database [17].
Thermal decomposition of cyclobutane has been experimentally studied too, and rate constants for the ring opening leading to the formation of two ethylene molecules have been measured [15-16, 18-19]. Theoretical studies on cyclobutane have mainly focused on the reverse reaction, namely the cycloaddition of two ethylene molecules [20-23], since it is a prototype reaction for the Woodward-Hoffmann rules and illustrates the usefulness of orbital symmetry considerations. Moreover, Pedersen et al. [24] have shown the validity of the biradical hypothesis by direct femtosecond studies of the transition-state structures. These studies have highlighted the fact that the cycloaddition of two ethylene molecules can proceed through two different routes: one involves a tetramethylene biradical intermediate, while the other is a concerted reaction that leads directly to cyclobutane. However, the latter reaction has been shown to have a high activation energy due to steric effects (see for instance ref [20] and references therein) and to be much less favorable than the biradical process. Even though the ring opening of cyclopropane and cyclobutane is interesting from a theoretical point of view, it is mainly larger cycloalkanes such as cyclopentane or cyclohexane that are found in usual fuels. In spite of that, the unimolecular initiation of these molecules has been little investigated. Ring opening of cyclopentane and cyclohexane has been experimentally studied by Tsang [25-26], who has reported the main routes of decomposition of these molecules. The mechanism and initial rates of decomposition were determined from single-pulse shock-tube experiments. For cyclohexane, only a reaction pathway leading to the formation of 1-hexene has been considered, whereas for cyclopentane, the processes leading to either 1-pentene or to cyclopropane + ethylene have both been investigated, in accordance with the products experimentally detected.
These results have been confirmed by further experimental studies by Kalra et al. [27] and Brown et al. [28]. In addition, Tsang [25] has shown that the experimentally obtained global rate parameters are consistent with a biradical mechanism for ring opening (Scheme 1):
Scheme 1. Biradical mechanism for cyclopentane ring opening [25]
Tsang connected the global rate parameters for the decomposition of cyclopentane into 1-pentene (k1) or into cyclopropane + ethylene (k2) to the rate parameters of the elementary reactions shown in Scheme 1. The equilibrium constant Keq = kb/kc has been estimated by the group additivity method proposed by Benson [29], using some analogies with reactions of linear alkanes. From these assumptions, the rate constants of the elementary reactions have been estimated. The same procedure has been applied to the rate of decomposition of cyclohexane into 1-hexene. Even if the rate parameters obtained for the elementary reactions are consistent with the formation of a biradical, no transition state has been characterized to validate the suggested mechanism. Moreover, the route leading to ethylene and trimethylene through C-C bond cleavage is rather ambiguous, since other pathways are considered by Tsang, for instance the direct decomposition of the n-pentyl biradical into cyclopropane and ethylene (Scheme 1). All of these studies showed that biradicals are central to the understanding of reaction mechanisms, as well as to the prediction of reaction products and rates, in the ring opening of cycloalkanes. Our aim in the present work is to analyze the ring opening of cyclobutane, cyclopentane and cyclohexane by means of high-level quantum chemical calculations in order to obtain accurate rate constants for elementary reactions. Comparison of the computed rate constants with available experimental data allows us to validate the theoretical approach.
We explore several plausible pathways that could be involved in the decomposition of the biradicals, and we discuss the evolution of the ring strain energy (RSE) in going from the reactants to the transition state for ring opening.
2. Computational method
Quantum chemical computations have been performed on an IBM SP4 computer with the Gaussian03 (G03) software package [30]. The high-level composite method CBS-QB3 [31] has been used. Analysis of vibrational frequencies confirmed that all transition structures (TS) have one imaginary frequency. Intrinsic Reaction Coordinate (IRC) calculations have been systematically performed at the B3LYP/6-31G(d) [32-33] level to ensure that the computed TSs connect the desired reactants and products. Only singlet biradical states have been considered, and their study at the composite CBS-QB3 level requires two specific comments. First, at this level, geometry optimization of the systems is performed by density functional theory (DFT) using an unrestricted B3LYP method and a CBSB7 basis set. It is worth noting that the use of a single-determinant wavefunction to describe open-shell singlet biradicals can be questionable. However, previous studies have shown that the geometries obtained in this way compare well with those obtained at more refined computational levels [34-38]. Second, in CBS-QB3 calculations, a correction for spin contamination in open-shell species is added to the total energy. It includes a term of the form $\Delta E = -0.00954\,(\langle S^2 \rangle - S^2_{th})$, where $\langle S^2 \rangle$ denotes the expectation value of the $\hat{S}^2$ operator and $S^2_{th}$ the corresponding theoretical value (e.g. 0 for a singlet state). This correction was derived for systems displaying a small spin contamination. However, because of strong singlet-triplet mixing in the unrestricted wavefunction for biradicals, the $\langle S^2 \rangle$ values are close to 1, and this leads to a systematic error in the CBS-QB3 energy correction of about 6 kcal.mol-1.
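The size of this systematic error can be checked with a one-line conversion. The sketch below assumes that the 0.00954 coefficient of the CBS-QB3 spin-contamination term is expressed in hartree and uses the conversion 1 hartree = 627.5095 kcal.mol-1; the function name is illustrative only.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree (assumed conversion factor)
CBS_QB3_SPIN_COEFF = -0.00954  # hartree, coefficient of the spin term (assumed unit)


def cbs_qb3_spin_correction(s2, s2_theory=0.0):
    """Spin-contamination term in kcal/mol for a given <S^2> value."""
    return CBS_QB3_SPIN_COEFF * (s2 - s2_theory) * HARTREE_TO_KCAL


if __name__ == "__main__":
    # Singlet biradical described by a broken-symmetry wavefunction:
    # <S^2> close to 1, theoretical singlet value 0.
    print(round(cbs_qb3_spin_correction(1.0), 2))
```

For an ⟨S²⟩ value of 1 and a theoretical singlet value of 0, the term is close to -6 kcal.mol-1, which matches the magnitude quoted above.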
Several papers have pointed out this limitation [39-41], and some authors have proposed removing the spin-contamination correction in this case. In the present work, we have preferred to use an empirical parameter specifically designed to handle spin contamination in singlet biradicals, so that singlet-triplet gaps for hydrocarbon biradicals are correctly described at the CBS-QB3 level. All energy values for singlet biradical species are therefore corrected by an expression of the form: $\Delta E_{spin} = -0.031\,(\langle S^2 \rangle - S^2_{th})$ (1), with $S^2_{th} = 1$.
3. Thermochemical data
Thermochemical data (∆Hf°, S°, Cp°) for all the species involved in this study have been derived from energy and frequency calculations and are collected in Table 1. In the CBS-QB3 method, harmonic frequencies, computed at the B3LYP/CBSB7 level of theory, are scaled by a factor of 0.99. Explicit treatment of the internal rotors has been performed with the hinderedRotor option of Gaussian03, in accordance with the work of Ayala and Schlegel [42]. The results were systematically analyzed to ensure that internal rotors were correctly treated. Thus, for biradicals, a practical correction was introduced to take into account the symmetry number of 2 for each CH2(•) terminal group. This symmetry is not automatically recognized by Gaussian in the case of a radical group. Moreover, it must be stressed that in transition states, the constrained torsions of the cyclic structure have been treated as harmonic oscillators and the free alkyl groups as hindered rotors. Enthalpies of formation (∆Hf°) of the species involved in this study have been calculated using isodesmic reactions [43], except for cyclanes and 1-alkenes, for which more accurate experimental enthalpies of formation [44] can be found in the literature. Because the total number and types of bonds are conserved, errors cancel on the two sides of the reaction and very good results can be obtained.
Several isodesmic reactions have been considered for the calculation of ∆Hf° in order to obtain an average value. However, results appear to be strongly dependent on the accuracy of the experimental data used for species involved in isodesmic reactions, especially for biradicals.

Table 1. Ideal gas phase thermodynamic properties obtained by CBS-QB3 calculation. ∆Hf°(298 K) is expressed in kcal.mol-1 and S°(298 K) and Cp°(T) are given in cal.mol-1.K-1.
[Species column: structural drawings not recoverable; rows are listed in their original order. P.G. = point group.]

∆Hf°(298 K)  S°(298 K)  Cp°(300 K)  Cp°(400 K)  Cp°(500 K)  Cp°(600 K)  Cp°(800 K)  Cp°(1000 K)  Cp°(1500 K)  P.G.
 12.74       56.78      13.28       18.03       22.25       25.74       31.04       34.92        40.98        D3h
  6.79       63.19      16.97       23.47       29.37       34.33       41.94       47.48        55.94        D2d
-18.27       70.21      20.99       29.17       36.67       43.01       52.81       59.92        70.74        D5h
-29.43       71.55      25.45       35.27       44.28       51.96       63.89       72.55        85.66        D3d
-22.99       74.09      25.61       35.33       44.30       51.95       63.84       72.49        85.60        D2
  0.82       78.27      24.52       32.14       38.82       44.39       53.03       59.41        69.39        C1
 -4.40       84.87      29.11       39.91       47.71       54.90       65.51       73.16        84.94        C1
 -5.09       79.85      25.99       32.74       38.83       44.05       52.37       56.64        68.53        C1
 -9.90       91.75      31.26       39.41       46.75       53.05       63.11       70.69        82.62        C1
 26.31       66.32      18.06       22.98       27.35       31.02       36.58       40.52        46.70        C2h
 25.41       78.56      23.57       29.58       34.92       39.42       46.46       51.70        59.91        C2
 67.43       79.01      22.95       27.80       32.16       35.90       41.92       46.54        53.94        C1
 68.13       77.33      23.59       28.36       32.50       36.10       41.93       46.46        53.82        C2h
 62.84       87.11      27.87       34.29       40.04       44.93       52.73       58.64        68.05        C1
 62.46       87.12      28.37       34.50       40.11       44.95       52.71       58.62        68.03        C1
 62.79       89.54      28.15       34.26       39.89       44.75       52.52       58.44        67.90        C1
 58.50       98.74      31.50       39.48       46.76       52.98       62.83       70.22        81.87        C1
 59.41       94.29      33.61       41.17       48.03       53.96       63.47       70.70        82.15        C1
 59.08       92.96      34.02       41.97       49.00       54.97       64.47       71.58        82.74        C1

Hence, for these systems, ∆Hf° has been obtained from the prototype reaction: •R1-•R2 + 2 R3H → HR1-R2H + 2 •R3 (2), where •R1-•R2 represents the biradical and •R3 = •H, •CH3, •C2H5 or n-•C3H7. The computed entropy of cyclopentane has been corrected in order to take into account the experimental symmetry of the molecule (D5h).
Indeed, CBS-QB3 calculations lead to a non-planar geometry with C1 symmetry. In addition, a low frequency of 22 cm-1 can be associated with a puckering motion of the ring. According to Benson [29], this pseudo-rotation is so fast that cyclopentane can be treated as dynamically flat (with a symmetry number σ = 10). Thus, we corrected the entropy of cyclopentane by R ln 10 (equation 3):

Sc-C5H10 = Sc-C5H10(CBS-QB3) – R ln 10 (3)

The computed value, 70.0 cal.mol-1.K-1, is in good agreement with the experimental value [29].

To our knowledge, no experimental enthalpy of formation for the biradicals •C4H8•, •C5H10• or •C6H12• has been reported. However, we can compare our values with those obtained from an estimation based on bond dissociation energy (BDE) according to reaction (4):

CnH2n+2 = •CnH2n• + 2 H• (4)

where CnH2n+2 represents a linear alkane and •CnH2n• is the corresponding biradical. ∆H°r for reaction (4) corresponds to twice the BDE of a C-H bond, and the enthalpy of formation of •CnH2n• can therefore be calculated from equation (5):

∆H°f (•CnH2n•, 298K) = 2 BDE + ∆H°f (CnH2n+2, 298K) - 2 ∆H°f (H•, 298K) (5)

This calculation rests on the assumption that no interaction exists between the two radical centers. Table 2 compares the values estimated using equation (5) with the CBS-QB3 results for the most stable conformation of the biradicals •C4H8•, •C5H10• and •C6H12•. The BDE value for a primary C−H bond in a linear alkane has been taken equal to 100.9 kcal.mol-1, as proposed by Tsang [45] (it corresponds to C-H bond dissociation in propane to give the n-propyl radical). Experimental enthalpies of formation for the molecules used in equation (5) come from the NIST WebBook [44]. As shown in Table 2, very good agreement is obtained between CBS-QB3 calculations using isodesmic reactions and values estimated from the BDE and equation (5).
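Equation (5) is simple enough to check by hand. The short sketch below recomputes the estimated enthalpies of formation from the 100.9 kcal.mol-1 BDE quoted above. The enthalpies of formation of the linear alkanes and of H• are not listed in the text; the values used here are assumed, rounded gas-phase data of the kind tabulated in the NIST WebBook, so the output should be read as an illustration only.

```python
# Sketch of equation (5): dHf(biradical) = 2*BDE + dHf(alkane) - 2*dHf(H).
# BDE is the primary C-H value quoted in the text [45].
# The enthalpies of formation below (kcal/mol, 298 K, gas phase) are
# ASSUMED rounded literature values; they are not given in the paper.
BDE_PRIMARY_CH = 100.9
DHF_H_ATOM = 52.10  # assumed

DHF_ALKANE = {  # assumed
    "n-butane": -30.03,
    "n-pentane": -35.08,
    "n-hexane": -39.96,
}


def biradical_dhf(dhf_alkane, bde=BDE_PRIMARY_CH, dhf_h=DHF_H_ATOM):
    """Estimate dHf(298 K) of the alpha,omega-biradical from equation (5)."""
    return 2.0 * bde + dhf_alkane - 2.0 * dhf_h


estimates = {name: biradical_dhf(dhf) for name, dhf in DHF_ALKANE.items()}
for name, value in estimates.items():
    print(f"{name:10s} -> biradical dHf = {value:6.2f} kcal/mol")
```

With these assumed inputs the three estimates fall close to the equation (5) column of Table 2, which suggests that the table was built from very similar reference data.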
This result corroborates the consistency of the electronic calculation scheme for the biradicals, in particular the use of a broken-symmetry method to optimize their geometry, as discussed in Section 2.

Table 2. Comparison of ∆H°f (in kcal.mol-1, at 298 K) estimated from equation (5) and from theoretical calculations.

Biradical   ∆H°f from equation (5)   This work
•C4H8•      67.57                    67.43
•C5H10•     62.52                    62.46
•C6H12•     57.64                    58.50

4. Kinetic calculations

The rate constants involved in the mechanisms were calculated using TST [47]:

$k_{uni} = rpd \, \kappa(T) \, \frac{k_b T}{h} \exp\left(\frac{\Delta S^{\neq}}{R}\right) \exp\left(-\frac{\Delta H^{\neq}}{RT}\right)$ (6)

where ∆S≠ and ∆H≠ are, respectively, the entropy and enthalpy of activation and rpd is the reaction path degeneracy. For reactions involving H-transfer, a transmission coefficient, κ(T), has been calculated in order to take the tunneling effect into account. We used an approximation to κ provided by Skodje and Truhlar [47]:

$\kappa(T) = \frac{\beta\pi/\alpha}{\sin(\beta\pi/\alpha)} - \frac{\beta}{\alpha-\beta} \exp\left[(\beta-\alpha)(\Delta V^{\neq} - V)\right]$ for $\beta \le \alpha$
$\kappa(T) = \frac{\beta}{\beta-\alpha} \left\{ \exp\left[(\beta-\alpha)(\Delta V^{\neq} - V)\right] - 1 \right\}$ for $\alpha \le \beta$ (7)

where $\alpha = 2\pi / (h \, \mathrm{Im}(\nu^{\neq}))$ and $\beta = 1/(k_B T)$. In equation (7), ν≠ is the imaginary frequency associated with the reaction coordinate, ∆V≠ is the zero-point-including potential energy difference between the TS structure and the reactants, and V is 0 for an exoergic reaction and the zero-point-including potential energy difference between reactants and products for an endoergic reaction.

The calculation of ∆H≠ was performed by taking into account the electronic energies of reactants and TSs but also the enthalpy of reaction (Scheme 2).

Scheme 2. Calculation of enthalpy of reaction: ∆H≠ (reactant, TS and product levels).

Thus, ∆H≠ (reactant → product) is calculated from the following relation:

$\Delta H^{\neq}(R \to P) = \frac{\Delta H_{1}^{\neq}(elec) + \Delta H_{-1}^{\neq}(elec) + \Delta H_r}{2}$ (8)

and ∆H≠ (product → reactant) by:

$\Delta H^{\neq}(P \to R) = \frac{\Delta H_{1}^{\neq}(elec) + \Delta H_{-1}^{\neq}(elec) - \Delta H_r}{2}$ (9)

where ∆H1≠(elec) and ∆H−1≠(elec) are, respectively, the enthalpies of activation for the direct and reverse reactions, calculated from electronic energies, ZPVE and thermal correction energy.
∆Hr corresponds to the enthalpy of reaction estimated using isodesmic reactions [48]. The use of equations (8) and (9) ensures the consistency between kinetic and thermodynamic data and improves the activation enthalpy results, since the values obtained for ∆Hr are more accurate than direct electronic calculations. The kinetic data are obtained by fitting equation (6) in the temperature range 600-2000 K with the following modified Arrhenius form:

k = A T^b exp(-E/RT) (10)

5. Ring opening mechanisms

The ring opening mechanisms for cyclobutane, cyclopentane and cyclohexane are now discussed. In all schemes presented below, electronic free enthalpies are reported in kcal.mol-1 and are relative to the reference cycloalkane at P = 1 atm. The value in bold corresponds to the free enthalpy at T = 298 K, while the value in italics corresponds to T = 1000 K.

5.1 Cyclobutane

Scheme 3 presents the global mechanism obtained for the ring opening of cyclobutane. In this scheme, we only considered the conformers (molecules, biradicals, TSs) of lowest free enthalpy. In particular, we do not account for the two conformations of the biradical (2) (gauche and trans); only the gauche conformer is considered, since its free enthalpy is the lowest. This point will be discussed in more detail below.

Scheme 3. Mechanism obtained for the ring opening of cyclobutane and for conformers of lowest energies. (s): singlet state; ∆G°(T) in kcal.mol-1 (bold: T = 298 K, italic: T = 1000 K).

The ring opening of cyclobutane involves a low free enthalpy of activation, ∆G≠ = 60.7 kcal.mol-1 at 298 K, which corresponds to an activation enthalpy (∆H≠) of 62.7 kcal.mol-1. This value is lower than that involved in the dissociation of a C-C bond in the corresponding linear alkane (∆H≠ ≈ 86.3 kcal.mol-1) [49].
This result will be discussed below, but we can anticipate that it is mainly due to the high ring strain energy of the cyclobutane molecule, which is partially released in the transition state.

Following ring opening, three decomposition routes for tetramethylene have been investigated. The route leading to the formation of two ethylene molecules is the most favorable one, as already mentioned in the literature [15-23]. The process leading to 1-butene represents another possible pathway, although this elementary reaction involves a highly strained cyclic transition state for H-transfer (four-membered ring) and displays a high activation energy (∆H≠ = 17.6 kcal.mol-1 vs 2.8 kcal.mol-1 for the β-scission leading to C2H4, at 298 K). This pathway, however, might provide a non-negligible contribution to the total process. Indeed, C4H8 has been identified as a minor product in experimental studies of cyclobutane decomposition [18]. The third route consists of the abstraction of two hydrogen atoms to form buta-1,3-diene and H2. This reaction step has a high activation barrier (∆H≠ = 40 kcal.mol-1 at T = 298 K) and should not compete with the previous mechanisms.

A concerted reaction leading directly from cyclobutane to two ethylene molecules has been envisaged, but no consistent TS could be obtained. As mentioned previously, several studies [20-23] have been performed on the reverse reaction, i.e. the cyclic addition of two ethylene molecules. The authors found that a two-step biradical-type reaction is expected to be favored (by about 24 kcal.mol-1) over concerted pathways. Sakai [20] estimated activation energies of 77.6 kcal.mol-1 and 58.8 kcal.mol-1 for the concerted and stepwise processes, respectively, at the MP2//CAS/6-311+G(d,p) level.
However, it is important to underline that this author reported a second-order saddle point structure (two negative eigenvalues) for the concerted mechanism, which therefore does not correspond to a transition state.

Table 3 gives the kinetic parameters of the modified Arrhenius form (equation 10) for all the elementary processes involved in Scheme 3. Unimolecular initiation of cycloalkanes is unimportant at low temperatures in thermal processes, and we only give kinetic parameters for T > 600 K in all the tables.

Table 3. Rate parameters for the unimolecular initiation of cyclobutane at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Scheme 3.

                  k1-2     k2-1     k2-C2H4   k2-4    k2-5
log A (s-1)       18.53    12.21    7.32      5.57    2.23
n                 -0.797   -0.305   1.443     2.171   2.995
Ea (kcal.mol-1)   64.85    1.98     3.03      16.44   37.61

To the best of our knowledge, no previous quantum calculations have been performed to estimate rate parameters for the elementary reactions involved in the thermal decomposition of cyclobutane. In 1971, Beadle et al. [19] experimentally studied the pyrolysis of cyclobutane and reported rate parameters for the ring opening of c-C4H8 (k1-2 and k2-1) and for the decomposition of the biradical into C2H4 (k2-C2H4). Their estimations are based on tabulated thermochemistry and additivity methods. The A factors and activation energies proposed by Beadle et al. are as follows: 3.63 × 10^15 s-1 and 63.34 kcal.mol-1 for k1-2, 2 × 10^12 s-1 and 6.6 kcal.mol-1 for k2-1, and 1.17 × 10^13 s-1 and 8.25 kcal.mol-1 for k2-C2H4. In spite of the considerable uncertainty in that study, as mentioned by the authors, their results are in good agreement with ours. Thus, at 800 K, the ratio between our value and that proposed by Beadle et al. is 1.7, 2.0 and 0.7 for k1-2, k2-1 and k2-C2H4, respectively. Moreover, it is worth noting that our CBS-QB3 quantum calculations provide an accurate energy for the cycloaddition of two ethylene molecules (formation of biradical 2).
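The comparison with Beadle et al. at 800 K can be reproduced directly from the numbers quoted above. The sketch below evaluates the modified Arrhenius form of equation (10) with the Table 3 parameters and divides by the simple Arrhenius expressions of Beadle et al.; the value of the gas constant and the function and variable names are choices made for this illustration.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K); assumed value for this sketch


def k_mod_arrhenius(log10_a, n, ea, temp):
    """Equation (10): k = A * T**n * exp(-Ea / RT), with Ea in kcal/mol."""
    return 10.0 ** log10_a * temp ** n * math.exp(-ea / (R * temp))


# (log10 A, n, Ea) taken from Table 3
THIS_WORK = {
    "k1-2": (18.53, -0.797, 64.85),
    "k2-1": (12.21, -0.305, 1.98),
    "k2-C2H4": (7.32, 1.443, 3.03),
}
# (A in s-1, Ea in kcal/mol) quoted from Beadle et al. [19]; no T**n term
BEADLE = {
    "k1-2": (3.63e15, 63.34),
    "k2-1": (2.0e12, 6.6),
    "k2-C2H4": (1.17e13, 8.25),
}

T = 800.0
ratios = {}
for name, (log_a, n, ea) in THIS_WORK.items():
    a_ref, ea_ref = BEADLE[name]
    k_ref = k_mod_arrhenius(math.log10(a_ref), 0.0, ea_ref, T)
    ratios[name] = k_mod_arrhenius(log_a, n, ea, T) / k_ref
    print(f"{name:8s} this work / Beadle et al. = {ratios[name]:.2f}")
```

The three ratios come out close to the 1.7, 2.0 and 0.7 quoted in the text; small differences are expected from rounding of the tabulated parameters.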
The free enthalpy of activation obtained is equal to 56.3 kcal.mol-1 at 298 K, which corresponds to an activation energy ∆H≠ = 47.7 kcal.mol-1. This last value can be compared with the activation energy of 56.8 kcal.mol-1 reported by Sakai [20] at the MP2//CAS/6-311+G(d,p) level, although this author obtained a second-order saddle point.

As mentioned previously, gauche/trans interconversion of (2) has been neglected in Scheme 3. However, the β-scission reaction involves a low ∆G≠ (5.1 kcal.mol-1 at 298 K) that could be of the same order of magnitude as the rotational barrier around the central C-C bond. Accordingly, it is interesting to consider the two conformations of the biradical for this particular but important pathway. The detailed mechanism is presented in Scheme 4.

Scheme 4. Detailed mechanism of C2H4 formation from the ring opening of cyclobutane, obtained by considering the different conformers of the C4 biradicals: gauche (2) and trans (3). (s): singlet state; ∆G°(T) in kcal.mol-1 (bold: T = 298 K, italic: T = 1000 K).

Cyclobutane ring opening leads to the gauche biradical (2). The latter can decompose into two C2H4 molecules or rotate to give the trans conformation (3), which, in turn, can also decompose into two C2H4 molecules. The kinetic parameters for all processes involved in Scheme 4 are collected in Table 4.

Table 4. Rate parameters for the unimolecular initiation of cyclobutane at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Scheme 4.

                k1-2     k2-1     k2-3    k3-2    k2-C2H4   k3-C2H4
log A (s-1)     18.53    12.21    11.30   11.27   7.63      7.32
n               -0.797   -0.305   0.461   0.545   1.453     1.521
Ea (kcal/mol)   64.85    1.98     4.29    3.26    4.79      2.07

Tables 3 and 4 give the kinetic parameters involved in the thermal decomposition of cyclobutane. In order to validate our results, we compare the global rate constant for the process: Cyclobutane → C2H4 + C2H4, measured experimentally by Barnard et al. [15] and Lewis et al.
[16], with that derived from our computations (Schemes 3 and 4). We consider the quasi-stationary state approximation (QSSA) for biradicals. For Scheme 3, QSSA applied to the gauche biradical leads to the simple expression:

$k_g^{scheme3} = \frac{k_{1-2} \, k_{2-C_2H_4}}{k_{2-1} + k_{2-C_2H_4}}$ (11)

For Scheme 4, QSSA applied to both the gauche and trans biradicals leads to a more complex expression:

$k_g^{scheme4} = \frac{k_{1-2} \, K}{k_{2-1} + K}$ (12)

with $K = k_{2-C_2H_4} + \frac{k_{2-3} \, k_{3-C_2H_4}}{k_{3-2} + k_{3-C_2H_4}}$.

The fit of the global rate constants obtained from relations (11) and (12) leads to the rate parameters presented in Table 5.

Table 5. Rate parameters for the global reaction cC4H8 → 2 C2H4, at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Schemes 3 and 4.

                $k_g^{scheme3}$   $k_g^{scheme4}$
log A (s-1)     20.29             21.52
n               -1.259            -1.606
Ea (kcal/mol)   67.69             68.16

Figure 1 compares our results ($k_g^{scheme3}$ and $k_g^{scheme4}$) with the absolute values measured directly by Barnard et al. [15] and by Lewis et al. [16] for the thermal decomposition of cyclobutane into two ethylene molecules.

Figure 1. Comparison between calculated rate constant and experimental data for the global reaction cC4H8 → C2H4 + C2H4

As shown in Figure 1, our results are close to the experimental values, validating the consistency of the theoretical approach. Computed rate constants are always slightly lower than those obtained by Lewis et al. [16] (maximum factor 2.4) or by Barnard et al. [15] (maximum factor 4.3). It is interesting to note that the difference between $k_g^{scheme3}$ and $k_g^{scheme4}$ decreases with temperature (20% at 600 K and 8% at 2000 K), which is consistent with rotational hindrance. Although these differences are small, the rate constant obtained by explicit consideration of the two conformations of the biradical (trans and gauche) is closer to the experimental results than the rate constant calculated from Scheme 3.

5.2 Cyclopentane

Scheme 5 shows the global mechanism obtained for the ring opening of cyclopentane.
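Before turning to the details of Scheme 5, the QSSA treatment of cyclobutane in Section 5.1 can be sketched numerically. The code below applies the steady-state condition to the gauche biradical alone (Scheme 3, Table 3) and to both the gauche and trans biradicals (Scheme 4, Table 4). The closed form used for Scheme 4 is derived here from the steady-state equations and is only assumed to be equivalent to the authors' expression; the gas constant value and all names are illustrative.

```python
import math

R = 1.987e-3  # kcal/(mol K); assumed value for this sketch


def k_arr(log10_a, n, ea, temp):
    """Modified Arrhenius form, equation (10)."""
    return 10.0 ** log10_a * temp ** n * math.exp(-ea / (R * temp))


TABLE_3 = {  # Scheme 3: (log10 A, n, Ea in kcal/mol)
    "1-2": (18.53, -0.797, 64.85),
    "2-1": (12.21, -0.305, 1.98),
    "2-C2H4": (7.32, 1.443, 3.03),
}
TABLE_4 = {  # Scheme 4
    "1-2": (18.53, -0.797, 64.85),
    "2-1": (12.21, -0.305, 1.98),
    "2-3": (11.30, 0.461, 4.29),
    "3-2": (11.27, 0.545, 3.26),
    "2-C2H4": (7.63, 1.453, 4.79),
    "3-C2H4": (7.32, 1.521, 2.07),
}


def k_global_scheme3(temp):
    """QSSA on the gauche biradical only."""
    k = {name: k_arr(*p, temp) for name, p in TABLE_3.items()}
    return k["1-2"] * k["2-C2H4"] / (k["2-1"] + k["2-C2H4"])


def k_global_scheme4(temp):
    """QSSA on the gauche (2) and trans (3) biradicals."""
    k = {name: k_arr(*p, temp) for name, p in TABLE_4.items()}
    # Effective loss of biradical 2 to ethylene, directly or through 3.
    loss = k["2-C2H4"] + k["2-3"] * k["3-C2H4"] / (k["3-2"] + k["3-C2H4"])
    return k["1-2"] * loss / (k["2-1"] + loss)


gaps = {}
for temp in (600.0, 2000.0):
    k3, k4 = k_global_scheme3(temp), k_global_scheme4(temp)
    gaps[temp] = (k4 - k3) / k4
    print(f"T = {temp:6.0f} K  k3 = {k3:.3e}  k4 = {k4:.3e}  gap = {gaps[temp]:.1%}")
```

Under these assumptions, the Scheme 4 rate constant is the larger of the two and the relative gap shrinks from roughly a fifth at 600 K to less than a tenth at 2000 K, in line with the trend described for cyclobutane.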
As for cyclobutane, the global mechanism does not take into account the different conformations of the biradical •C5H10•, and we only consider the conformation with the lowest free enthalpy. Owing to the weaker ring strain energy, the free enthalpy of activation for the ring opening of cyclopentane (∆G≠ = 80.5 kcal.mol-1 at 298 K) is higher than that obtained for cyclobutane (∆G≠ = 60.7 kcal.mol-1 at 298 K).

Scheme 5. Mechanism obtained for the ring opening of cyclopentane and for conformers of lowest energies. (s): singlet state; ∆G°(T) in kcal.mol-1 (bold: T = 298 K, italic: T = 1000 K).

Four routes of decomposition have been investigated for the biradical 4, the most favorable one being the formation of 1-pentene by H-transfer. This last reaction exhibits an activation energy much lower than the equivalent process in cyclobutane (∆H≠ = 6 kcal.mol-1 vs 17.6 kcal.mol-1 for cC4H8 at 298 K). This noticeable difference can be ascribed to the different strain energies of the corresponding transition states. Conversely, the β-scission process leading to ethylene and cyclopropane is much more difficult than that leading to ethylene in cyclobutane. In this last case, the presence of two radical centers in (•C4H8•) in β position weakens the inner C-C σ-bond. Another interesting point concerns the dehydrogenation reaction of biradical 4 leading to penta-1,4-diene (6) and H2. At low temperature (298 K), this reaction is competitive with β-scission (reaction 4 → 7), but it becomes unimportant at high temperature (1000 K) because of the small change of entropy between the biradical and the TS. Finally, the reaction of biradical (4) to yield ethyl-cyclopropane (8) is unlikely to occur at either room or high temperature, because it involves a highly strained cyclic transition state for H-abstraction and always displays a quite high activation energy. For completeness, it must be noted that formation of a trimethylene biradical from 4 has been envisaged.
However, the singlet state for this system could not be obtained, geometry optimizations leading systematically to the formation of cyclopropane (only the triplet state could be optimized). This may be due to the monodeterminantal character of the methodology used for the geometry optimizations, but it has been shown that ring closure of trimethylene to form cyclopropane occurs very fast anyway [50].

Table 6 gives the kinetic parameters of the modified Arrhenius form (equation 10) for all the elementary processes involved in Scheme 5. As mentioned previously, only a few studies have been devoted to the kinetics of the thermal decomposition of cyclopentane [25, 27]. Tsang [25] measured the ring opening of cyclopentane by comparative-rate-pulse shock tube experiments and suggested rate expressions for k1-4, k4-1, k4-5 and k4-7 over the temperature range 1000 K – 1200 K. The estimations were based on experimental results obtained for the thermal decomposition of cyclopentane and on the thermochemical methods proposed by Benson [29].

Table 6. Rate parameters for the unimolecular initiation of cyclopentane at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Scheme 5.

                k1-4     k4-1    k4-5    k4-6    k4-7    k4-8
log A (s-1)     18.11    9.89    6.77    0.51    9.78    -3.07
n               -0.466   0.311   1.480   3.015   1.1     4.157
Ea (kcal/mol)   85.18    1.7     7.76    17.78   26.16   32.43

However, the analysis did not take the transition state into account, and Tsang used a constant value of 0.16 for the ratio k4-5/k4-1, in accordance with the disproportionation to combination ratio for n-propyl radicals. At 1100 K, the ratios obtained between our values and those proposed by Tsang for k1-4, k4-1, k4-5 and k4-7 are, respectively, 0.5, 1.5, 1.6 and 1.7. Despite uncertainties in both experimental and theoretical calculations, the agreement is therefore quite satisfying.
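As a quick consistency check on Table 6, the ratio k4-5/k4-1 implied by the computed parameters can be evaluated in the middle of Tsang's temperature range and set against his constant value of 0.16. The sketch below does this with the modified Arrhenius form of equation (10); the gas constant value and the names are choices made for this illustration.

```python
import math

R = 1.987e-3  # kcal/(mol K); assumed value for this sketch


def k_arr(log10_a, n, ea, temp):
    """Modified Arrhenius form, equation (10), with Ea in kcal/mol."""
    return 10.0 ** log10_a * temp ** n * math.exp(-ea / (R * temp))


TABLE_6 = {  # (log10 A, n, Ea) for the elementary steps of Scheme 5
    "k1-4": (18.11, -0.466, 85.18),
    "k4-1": (9.89, 0.311, 1.7),
    "k4-5": (6.77, 1.480, 7.76),
    "k4-6": (0.51, 3.015, 17.78),
    "k4-7": (9.78, 1.1, 26.16),
    "k4-8": (-3.07, 4.157, 32.43),
}

T = 1100.0
rates = {name: k_arr(*params, T) for name, params in TABLE_6.items()}
for name, value in rates.items():
    print(f"{name} ({T:.0f} K) = {value:.3e} s-1")

# Disproportionation (1-pentene) versus recombination (ring closure)
ratio_45_41 = rates["k4-5"] / rates["k4-1"]
print(f"k4-5 / k4-1 = {ratio_45_41:.2f}")
```

At 1100 K the computed ratio is about 0.17, so the constant ratio used by Tsang is close to what the Table 6 parameters give at that temperature; because k4-5 has the higher activation energy, the computed ratio increases with temperature.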
It is also possible to compare some computed rate parameters for elementary reactions in Scheme 5 with values obtained using semi-empirical relations, as proposed by O’Neal [51]. Accordingly, the activation energy involved in reaction 4 → 6, corresponding to an H-transfer (disproportionation), can be estimated by the following relationship [51]:

Ea = ED + ESE (13)

where ED represents the activation energy involved in the disproportionation of two alkyl free radicals and ESE represents the strain energy of the cyclic transition state. The latter was taken to be 6.3 kcal.mol-1, since the transition state exhibits a five-membered ring [29]. An activation energy of 1 kcal.mol-1 can reasonably be assumed for disproportionation [49]. From our calculations, the activation enthalpy of reaction 4 → 6 is estimated at 6 kcal.mol-1 at 298 K, in good agreement with the semi-empirical estimation. On the other hand, our quantum calculations give an activation energy equal to 26.6 kcal.mol-1 at 298 K for the β-scission of biradical 4 (reaction 4 → 7). This value can be compared with the mean activation energy for the β-scission of a C-C bond in an alkyl free radical (28.7 kcal.mol-1 [49]).

In Scheme 5, biradical 4 represents the most stable conformer of the biradical, although it does not lead directly to the formation of 1-pentene. Since some activation energies, such as that of reaction 4 → 5 (∆H≠ = 6.7 kcal.mol-1 at 298 K), have values close to the rotational barrier heights, the role of rotational hindrance has been examined in the case of cyclopentane. Scheme 6 contains the detailed mechanism.

Scheme 6. Detailed mechanism of 1-pentene formation from the ring opening of cyclopentane, obtained by considering different conformers of C5 biradicals. (s): singlet state; ∆G°(T) in kcal.mol-1 (bold: T = 298 K, italic: T = 1000 K).
Two conformations of the biradical •C5H10• are involved in Scheme 6, but only one (biradical 3) leads to the formation of 1-pentene by H-abstraction. This result has been validated by IRC calculations. Geometries of TS3-5 and biradicals (2) and (3) are presented in Figure 2. As shown, the distance between carbon atom 2 and hydrogen atom 11 is larger in biradical 2 than in biradical 3 (compare structures a and c), which renders H-abstraction more favorable for the latter species. This result can be explained by a gauche interaction in biradical 2 involving carbon atoms 4 and 1. In biradical 3, carbon atoms 1, 3, 4 and 5 are in the same plane and no gauche interaction between carbon atoms 1 and 4 is possible.

Figure 2. Geometries relative to biradical 2 (a), TS3-5 (b) and biradical 3 (c). Distances labelled in the figure: 2.819 Å and 2.645 Å.

It is worth noting the different role played by rotational hindrance in cyclopentane and cyclobutane. Indeed, in cyclobutane (Scheme 4), the rotational barrier is of the same order of magnitude as the activation energy for β-scission, and both conformers may lead to the formation of ethylene molecules. In cyclopentane, the rotational barrier is lower than the activation energy for 1-pentene formation, and only one conformation (biradical 3) allows the latter process. Table 7 summarizes the rate parameters for the mechanism shown in Scheme 6.

Table 7. Rate parameters for the unimolecular initiation of cyclopentane at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Scheme 6.

                k1-2     k2-1    k2-3    k3-2    k3-5
log A (s-1)     18.11    10.74   10.97   10.54   6.89
n               -0.466   0.207   0.569   0.602   1.494
Ea (kcal/mol)   84.76    1.7     2.68    2.97    7.45

In order to validate the rate parameters involved in Schemes 5 and 6 and to quantify the effect of the rotational barriers between biradicals 2 and 3, we have compared the global rate constants obtained experimentally by Tsang [25] for the following reactions:

cyclopentane → 1-pentene, k1 = 10^16.1 exp(-84840 (cal.mol-1)/RT),
cyclopentane → cyclopropane + ethylene, k2 = 10^16.25 exp(-95060 (cal.mol-1)/RT),

with the global rate constants estimated by QSSA performed on biradical 4 (Scheme 5) and biradicals 2 and 3 (Scheme 6). The rate expressions are:

$k_{g,1-5}^{scheme5} = \frac{k_{1-4} \, k_{4-5}}{k_{4-1} + k_{4-5}}$ (14)

$k_{g,1-5}^{scheme6} = \frac{k_{1-2} \, k_{2-3} \, k_{3-5}}{k_{2-1} k_{3-2} + k_{2-1} k_{3-5} + k_{2-3} k_{3-5}}$ (15)

$k_{g,1-7}^{scheme5} = \frac{k_{1-4} \, k_{4-7}}{k_{4-1} + k_{4-7}}$ (16)

and the kinetic parameters are given in Table 8.

Table 8. Rate parameters for the global reactions cC5H10 → 1-pentene and cC5H10 → cC3H6 + C2H4 at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Schemes 5 and 6.

                $k_{g,1-5}^{scheme5}$   $k_{g,1-7}^{scheme5}$   $k_{g,1-5}^{scheme6}$
log A (s-1)     20.39                   24.33                   20.06
n               -0.970                  -1.542                  -0.878
Ea (kcal/mol)   92.86                   112.49                  92.23

The global rate constants estimated by the QSSA approach are compared with those proposed by Tsang in Figure 3. In the range 1000 K – 1200 K, the values calculated from our global rate constants for the formation of 1-pentene ($k_{g,1-5}^{scheme5}$ and $k_{g,1-5}^{scheme6}$) are in good agreement with those obtained by Tsang, our values being lower by a factor of 1.2 to 2.

Figure 3. Comparison between calculated rate constant and experimental data for the global reactions c-C5H10 → 1-pentene and cC5H10 → c-C3H6 + C2H4

A similar result has been obtained in the case of the reaction leading to cyclopropane and ethylene, with values underestimated by a factor of 0.7 to 2.5 with respect to Tsang's estimations. Concerning the influence of rotational hindrance on the formation of 1-pentene, $k_{g,1-5}^{scheme5}$ is always found to be lower than $k_{g,1-5}^{scheme6}$ over the temperature range (28% at 600 K and 8% at 2000 K). As for cyclobutane, calculations performed by considering the different conformations of the biradical •C5H10• give results closer to experiment. Moreover, the effect of rotational hindrance is slightly greater in the case of cyclopentane than cyclobutane.
As said above, this can be explained by the fact that biradical •C5H10• must necessarily rotate in order to yield 1-pentene, whereas in cyclobutane both the gauche and trans conformations of the biradical can decompose into two ethylene molecules.

5.3 Cyclohexane

Only a few studies have been performed on the unimolecular initiation mechanism of cyclohexane [26, 28]. Scheme 7 shows our results for the ring opening of c-C6H12 and subsequent reactions. As before, in this scheme, we only consider the lowest free enthalpy conformers of the biradicals. Table 9 summarizes the computed rate parameters.

Scheme 7. Mechanism obtained for the ring opening of cyclohexane and for conformers of lowest energies. (s): singlet state; ∆G°(T) in kcal.mol-1 (bold: T = 298 K, italic: T = 1000 K).

Table 9. Rate parameters for the unimolecular initiation of cyclohexane at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Scheme 7.

                k1-5     k5-1    k2-5     k5-2    k5-6    k5-7    k5-8    k5-c-C3H6
log A (s-1)     21.32    9.91    20.11    10.38   2.46    -1.33   10.40   5.23
n               -0.972   0.136   -0.785   0.137   2.569   3.800   0.994   2.185
Ea (kcal/mol)   92.63    2.09    85.77    2.13    1.42    17.22   25.75   44.25

In our study, the chair and boat conformations of cyclohexane have been taken into account. As mentioned by Dixon et al. [52], the boat structure (C2v symmetry) is a transition state that connects the chair structure with a twist boat (D2 symmetry) conformation. We did not locate the transition state corresponding to the C2v boat structure, but we have obtained a D2 twist boat conformation that does correspond to a characterized energy minimum. Hereafter, this twist boat conformation will simply be referred to as the “boat conformation” (symmetry number of 4, D2 symmetry).
At high temperature, the concentration of this boat conformation cannot be neglected, and the equilibrium constant Keq, corresponding to the reaction c-C6H12 (chair) ⇌ c-C6H12 (boat), has been fitted in the 600 - 2000 K temperature range using the relation:

$K_{eq} = \exp\left(-\frac{\Delta_r H^{\circ}}{RT} + \frac{\Delta_r S^{\circ}}{R}\right) = \exp\left(-\frac{3265}{T} + 1.271\right)$ (17)

CBS-QB3 calculations performed on the chair and boat conformations of cyclohexane, together with isodesmic reactions, have been used to calculate ∆rH° and ∆rS°. The mean values of ∆rH° and ∆rS° are, respectively, 6.5 kcal.mol-1 and 2.5 cal.mol-1.K-1. From equation (17), Keq is equal to 0.047 at 753 K, which corresponds to 95.5% of cyclohexane molecules in the chair conformation. This result is in very good agreement with that of Walker and Gulati [53], who reported a slightly larger value (99.5%). Another estimation of the equilibrium constant has been reported by Eliel and Wilen [54] from experimental values obtained at 1073 K. They found that 25% of the twist boat structure is present in the mixture, which leads to Keq = 0.33. From equation (17), we obtain Keq = 0.17 at 1073 K, which is consistent with the estimation made by Eliel and Wilen. Thus, it appears that, in the range of temperature considered in our study, the chair/boat conformation ratio must be taken into account in kinetic calculations.

Two different TSs have been found for the ring opening of cyclohexane, depending on its initial boat or chair structure (Figure 4), the TS corresponding to the former conformation being 2.4 kcal.mol-1 lower in free energy. Since the activation energy for ring opening is much higher than the energy involved in the boat/chair conformation change, we conclude that only the lowest TS should be taken into account in the kinetic scheme.
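The chair/boat populations quoted above follow directly from the fitted form of equation (17), read here as Keq = exp(-3265/T + 1.271) with T in K. The sketch below evaluates this expression at the two temperatures discussed in the text; the function names are illustrative.

```python
import math


def k_eq(temp):
    """Fitted chair -> twist-boat equilibrium constant, equation (17)."""
    return math.exp(-3265.0 / temp + 1.271)


def chair_fraction(temp):
    """Fraction of cyclohexane in the chair form, 1 / (1 + Keq)."""
    return 1.0 / (1.0 + k_eq(temp))


for temp in (753.0, 1073.0):
    print(
        f"T = {temp:6.0f} K  Keq = {k_eq(temp):.3f}  "
        f"chair = {chair_fraction(temp):.1%}  boat = {1.0 - chair_fraction(temp):.1%}"
    )
```

The two fitted coefficients correspond to ∆rH°/R and ∆rS°/R, so multiplying them by R ≈ 1.987 cal.mol-1.K-1 recovers roughly 6.5 kcal.mol-1 and 2.5 cal.mol-1.K-1, the mean values given in the text.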
Figure 4: TS obtained from the ring opening of cyclohexane with the boat conformation (a) and the chair conformation (b)

The free enthalpy of activation obtained for the ring opening of cyclohexane (∆G≠ = 85.5 kcal.mol-1 at T = 298 K) is higher than that obtained for cyclobutane (∆G≠ = 60.7 kcal.mol-1) or cyclopentane (∆G≠ = 80.5 kcal.mol-1) at the same temperature, and close to the value found for a linear alkane. This result will be discussed in detail below, but it is interesting to point out that it is consistent with an “unstrained structure” in cyclohexane. The main decomposition route for the biradical •C6H12• (biradical 5) is the formation of 1-hexene, as previously mentioned by Tsang [26]. Indeed, this process involves a slightly constrained transition state for H-abstraction (six-membered ring) with a low activation energy, ∆H≠ = 1.8 kcal.mol-1 (vs 17.6 kcal.mol-1 for c-C4H8 and 6 kcal.mol-1 for c-C5H10 at 298 K). The value of ∆H≠ = 1.8 kcal.mol-1 can be compared with the semi-empirical estimation of Ea given by equation (13), which corresponds to a disproportionation process. Taking 1 kcal.mol-1 for ED (disproportionation of two alkyl radicals) and 0.7 kcal.mol-1 for ESE (ring strain energy of a six-membered ring) gives Ea = 1.7 kcal.mol-1, in very good agreement with our calculation. Owing to this very low activation energy, other routes for •C6H12• decomposition are unlikely.

An interesting remark can be made about the β-scission leading to the formation of •C4H8• and ethylene (reaction 5 → 8). In the experimental study performed by Tsang [26], no cyclobutane was detected, which could be explained by a very low reaction rate for •C4H8• formation compared to 1-hexene. Analyzing the free enthalpies of activation at T = 1000 K in Scheme 7 shows that the ratio between the disproportionation leading to 1-hexene (reaction 5 → 6) and the β-scission (reaction 5 → 8) is about 120. A comparable value is obtained in cyclopentane decomposition (Scheme 5).
However, in the case of the thermal decomposition of cyclopentane, cyclopropane is detected experimentally [25] (though it represents a minor product), whereas for cyclohexane, no cyclobutane is observed at all. This discrepancy can be explained by the very fast decomposition of the biradical •C4H8• into two ethylene molecules, compared to the cyclization reaction, as shown in Scheme 3. This assumption cannot be verified using the experimental data reported by Tsang [26], since ethylene formed by initiation would represent a minor part of the total concentration of this molecule, which is principally obtained in propagation reactions (decomposition of 1-hexene and of the cyclohexyl radical). The activation energy of the β-scission of biradical 5 (reaction 5 → 8) is equal to 25.2 kcal.mol-1 at 298 K, which is consistent with the β-scission of a C-C bond in an alkyl free radical, with a corresponding activation energy of 28.7 kcal.mol-1 [50].

Let us now consider the effect of the conformers of •C6H12•, which was ignored in Scheme 7. The activation energy involved in the formation of 1-hexene from the biradical •C6H12• is quite low (∆H≠ = 1 kcal.mol-1 at 298 K) and might be competitive with rotational barriers. In accordance with this assumption, we developed a detailed mechanism for the formation of 1-hexene involving the different conformations of the biradical (Scheme 8).

Scheme 8. Detailed mechanism of 1-hexene formation from the ring opening of cyclohexane, obtained by considering different conformers of C6 biradicals. (s): singlet state; ∆G°(T) in kcal.mol-1 (bold: T = 298 K, italic: T = 1000 K).

Table 10. Rate parameters for the unimolecular initiation of cyclohexane at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Scheme 8.
                k1-3     k3-1    k2-3     k3-2    k3-6    k3-4    k4-3    k4-6
log A (s-1)     20.68    11.08   20.16    11.23   2.17    10.11   11.40   4.17
n               -0.799   0.117   -0.810   0.073   2.923   0.720   0.359   2.295
Ea (kcal/mol)   92.44    0.70    85.99    0.81    0.77    2.50    2.98    0.20

Table 10 summarizes the rate parameters for the elementary reactions in Scheme 8. In order to compare the global rate constant obtained by Tsang for the thermal decomposition of cyclohexane into 1-hexene [26] with that obtained from Schemes 7 and 8, we applied the quasi-stationary-state approximation (QSSA) to biradical 5 (Scheme 7) and to biradicals 3 and 4 (Scheme 8). Scheme 8 leads to the following expressions for the reaction c-C6H12 → 1-C6H12:

$k_{chair}^{scheme8} = \frac{k_{3-6} \, k_{1-3}}{C} + \frac{k_{4-6} \, k_{3-4} \, k_{1-3}}{C \, (k_{4-3} + k_{4-6})}$ (18)

$k_{boat}^{scheme8} = \frac{k_{3-6} \, k_{2-3}}{C} + \frac{k_{4-6} \, k_{3-4} \, k_{2-3}}{C \, (k_{4-3} + k_{4-6})}$ (19)

with

$C = \frac{k_{3-4} \, k_{4-6}}{k_{4-3} + k_{4-6}} + k_{3-1} + k_{3-2} + k_{3-6}$ (20)

At a given temperature, the global rate constant can be calculated from the rate constants given by equations (18) and (19) and from the equilibrium constant, which gives the ratio of boat and chair conformations. Thus, we obtained the following expression for the global rate constant:

$k_{g(1-6)}^{scheme8} = \frac{k_{chair}^{scheme8} + K_{eq} \, k_{boat}^{scheme8}}{1 + K_{eq}}$ (21)

where Keq is the equilibrium constant obtained from equation (17). For the more general Scheme 7, QSSA applied to biradical 5 leads to the following relations:

$k_{chair}^{scheme7} = \frac{k_{1-5} \, k_{5-6}}{k_{5-1} + k_{5-6}}$ and $k_{boat}^{scheme7} = \frac{k_{2-5} \, k_{5-6}}{k_{5-2} + k_{5-6}}$ (22)

By analogy with equation (21), the global rate constant for Scheme 7 can then be calculated by using the equilibrium constant:

$k_{g(1-6)}^{scheme7} = \frac{k_{chair}^{scheme7}}{1 + K_{eq}} + \frac{K_{eq} \, k_{boat}^{scheme7}}{1 + K_{eq}}$ (23)

Rate parameters obtained by fitting $k_{g(1-6)}^{scheme7}$ and $k_{g(1-6)}^{scheme8}$ for temperatures ranging from 600 K to 2000 K are presented in Table 11.

Table 11. Rate parameters for the global reaction cC6H12 → 1-hexene at P = 1 atm, 600 ≤ T (K) ≤ 2000, related to Schemes 7 and 8.
                $k_{g(1-6)}^{scheme7}$   $k_{g(1-6)}^{scheme8}$
log A (s-1)     20.45                    20.29
n               -0.685                   -0.639
Ea (kcal/mol)   93.01                    94.52

Figure 5 shows the comparison of the values calculated from $k_{g(1-6)}^{scheme7}$ and $k_{g(1-6)}^{scheme8}$ (Table 11) with those obtained from the rate constant proposed by Tsang [26], for temperatures between 950 K and 1100 K, i.e. in the range of validity of Tsang’s study.

Figure 5: comparison between calculated rate constant and experimental data [26] for the global reaction c-C6H12 → 1-hexene.

The agreement between $k_{g(1-6)}^{scheme8}$ and the rate constant proposed by Tsang is rather satisfactory, since the ratio $k_{g(1-6)}^{scheme8} / k_{Tsang}$ lies between 2 and 2.6. Another point concerns the difference obtained between the rate constants $k_{g(1-6)}^{scheme7}$ and $k_{g(1-6)}^{scheme8}$ shown in Figure 6. By neglecting rotational hindrance (Scheme 7), the global rate constant $k_{g(1-6)}^{scheme7}$ is overestimated by a factor of 2 compared to $k_{g(1-6)}^{scheme8}$ in the temperature range 950 – 1100 K. This difference can be explained by the low activation energy involved in the formation of 1-hexene compared to the rotational barrier, but also by entropic effects due to the difference between the entropy of the “linear” biradical (biradical 5) and those of biradicals 3 and 4 (Table 1). This result shows that, by considering only the lowest energy biradical conformation, the global rate constant is largely overestimated.

5.4 Ring strain energies

The results obtained for the ring opening of cycloalkanes into biradicals show an increase of the enthalpy of activation going from cyclobutane to cyclohexane. These differences are mainly due to the change of ring strain energy when going from the cycloalkane to the TS. To discuss the results obtained, we first calculated the ring strain energy of the cycloalkanes by using the usual definition of RSE [55] (equation 24):

RSE = ∆HRS = Hcyclo – n HCH2 (24)

where Hcyclo and HCH2 represent, respectively, the electronic enthalpies of the cycloalkane and of a CH2 fragment in a strain-free alkane.
n represents the number of CH2 fragments in the cyclic species. HCH2 has been calculated as the difference between the enthalpies of n-hexane and n-pentane. Hcyclo and HCH2 have been obtained by considering the electronic energy, zero-point energy and thermal corrections to the enthalpy given at the CBS-QB3 level of theory at 298 K. RSE can also be obtained by using enthalpies calculated from isodesmic reactions. Thus, equation 24 can be rewritten as:

$$\mathrm{RSE} = \Delta H_{RS} = \Delta_f H^\circ_{\text{cyclo}} - n\,\Delta_f H^\circ_{\mathrm{CH_2}} \tag{25}$$

where ∆fH°cyclo and ∆fH°CH2 correspond, respectively, to the enthalpy of formation of the cycloalkane obtained from isodesmic reactions and to the enthalpy of formation of a CH2 group, obtained as the difference between the enthalpies of formation of n-hexane and n-pentane. The values obtained for cyclobutane, cyclopentane, cyclohexane (twist boat) and cyclohexane (chair) are summarized in Table 12. The RSE obtained at the CBS-QB3 level from direct electronic energies and those obtained from isodesmic reactions are in very good agreement with the values proposed by Cohen [56], which are based on the group additivity method.

Table 12. Ring strain energies (in kcal.mol-1) of cycloalkanes as calculated at the CBS-QB3 level of theory and those proposed by Cohen [56], at T = 298 K.

| Cycloalkane | Cyclobutane | Cyclopentane | Cyclohexane (chair) | Cyclohexane (twist boat) |
|---|---|---|---|---|
| RSE from electronic energies | 27.0 | 7.4 | 1.1 | 7.5 |
| RSE from isodesmic reactions | 26.8 | 7.5 | 1.0 | 7.5 |
| RSE from Cohen [56] | 26.8 | 7.1 | 0.7 | / |

It is now interesting to estimate the part of the RSE that remains in the transition states during ring opening. This can be done by comparing the enthalpies of activation obtained for the ring opening of cycloalkanes with those obtained for the dissociation of strain-free linear alkanes, and by assuming that the difference obtained between the cyclic species and the corresponding linear one is only due to the ring strain energy. If no difference is observed, one can conclude that no remaining RSE is contained in the TS.
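The comparison described above reduces to simple arithmetic, which can be sketched as follows. This is only an illustrative sketch: the function name `remaining_strain` is hypothetical, and the numerical inputs are the isodesmic values reported in Table 14 of this section (enthalpy of activation of the cycloalkane, RSE, and BDE of the corresponding linear alkane, all in kcal.mol-1).

```python
def remaining_strain(dh_act_cyclo, rse, bde_linear):
    """Estimate the ring strain energy still present in the transition state.

    The activation enthalpy expected for a fully relaxed TS is taken as the
    BDE of the strain-free linear alkane minus the RSE of the ring; whatever
    the computed activation enthalpy exceeds this by is attributed to strain
    remaining in the TS. All quantities are in kcal/mol.
    """
    dh_act_estimated = bde_linear - rse
    return dh_act_cyclo - dh_act_estimated


# (activation enthalpy, RSE, BDE of linear alkane), isodesmic values of Table 14
species = {
    "cyclobutane": (61.7, 26.8, 87.0),
    "cyclopentane": (82.4, 7.5, 87.9),
    "cyclohexane (chair)": (89.5, 1.0, 88.9),
    "cyclohexane (twist boat)": (83.0, 7.5, 88.9),
}

for name, (dh_act, rse, bde) in species.items():
    print(f"{name}: remaining RSE = {remaining_strain(dh_act, rse, bde):.1f} kcal/mol")
```

With these inputs the function reproduces the last column of Table 14 after rounding to one decimal place.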
For a linear alkane, the bond dissociation energy (BDE) can be equated with the activation enthalpy, since the recombination of the two free radicals is barrierless. The BDE between two secondary carbon atoms has been estimated by the CBS-QB3 method for n-butane, n-pentane and n-hexane by means of reaction (26):

CnH2n+2 → x •C3H7 + y •C2H5 (26)

where CnH2n+2 represents a linear alkane; x and y depend on the value of n. The BDE corresponds to the enthalpy of reaction (26) and can be expressed by the following equation:

$$\mathrm{BDE} = \Delta_r H^\circ = x\,H_{\mathrm{C_3H_7}} + y\,H_{\mathrm{C_2H_5}} - H_{\mathrm{C}_n\mathrm{H}_{2n+2}} \approx \Delta H^{\neq} \tag{27}$$

where HC3H7 and HC2H5 represent, respectively, the electronic enthalpies of the n-propyl and ethyl radicals, and HCnH2n+2 the enthalpy of n-butane, n-pentane or n-hexane. The BDE in equation (27) can be calculated using either electronic enthalpies or enthalpies estimated from isodesmic reactions. Table 13 gives the results obtained in the two cases and the experimental values proposed by Luo [57].

Table 13: BDE (in kcal.mol-1) between two secondary carbon atoms for unstrained linear alkanes, at 298 K.

| (C-C) | CBS-QB3, electronic energies | CBS-QB3, isodesmic reactions | Experimental BDE [57] |
|---|---|---|---|
| n-Butane | 88.8 | 87.0 | 86.8 |
| n-Pentane | 89.6 | 87.9 | 87.3 |
| n-Hexane | 90.5 | 88.9 | 87.5 |

The BDE calculated from direct electronic calculations are higher than the experimental ones [57]. On the other hand, the values deduced from isodesmic calculations are closer to these recommended values, which shows the better accuracy obtained with isodesmic reactions. If we consider that the BDE can be equated with the enthalpy of reaction in the case of a linear unstrained alkane, we can compare the activation energies obtained for the opening of the cycloalkanes with those obtained by removing the RSE from the BDE of the corresponding linear strain-free alkane. Table 14 shows this comparison from isodesmic calculations.

Table 14.
Comparison of ∆H≠ (in kcal.mol-1) obtained for the ring opening of cycloalkanes with those estimated from the linear unstrained alkanes, at T = 298 K, using isodesmic reactions.

| Species | ∆H≠ for cycloalkane | Ring strain energy (RSE) | ∆H≠ calculated from the corresponding linear alkane | Remaining RSE |
|---|---|---|---|---|
| Cyclobutane | 61.7 | 26.8 | 87.0 – 26.8 = 60.2 | 1.5 |
| Cyclopentane | 82.4 | 7.5 | 87.9 – 7.5 = 80.4 | 2.0 |
| Cyclohexane (chair) | 89.5 | 1.0 | 88.9 – 1.0 = 87.9 | 1.6 |
| Cyclohexane (twist boat) | 83.0 | 7.5 | 88.9 – 7.5 = 81.4 | 1.6 |

The remaining RSE shown in Table 14 allows us to conclude that almost all of the RSE is removed in the transition state of the cycloalkanes, except, perhaps, for cyclohexane (chair), where isodesmic calculations show a slight increase of the ring strain energy in the transition state. Figure 6 shows the three transition states involved in the ring opening of the cycloalkanes studied.

Figure 6: TS involved in the ring opening of cyclobutane (a), cyclopentane (b) and cyclohexane (c). [Distances labeled in the figure: c) 3.533 Å, b) 3.329 Å, a) 2.789 Å.]

Dudev et al. [55] have shown, from ab initio calculations, that the ring strain energy of cycloalkanes can be explained by contributions of ring bond angles, bond lengths or dihedral angles, this last parameter reflecting nonbonded interactions. Thus, the ring strain energy remaining in the TS of cyclohexane can be explained by the fact that the six-membered ring is forced to adopt an energetically unfavorable gauche conformation, whereas n-hexane can exist in a strain-free trans-trans-trans conformation. Moreover, they showed that the effect of ring bond angles on the RSE decreases when the size of the cycle increases, while the opposite effect is obtained for nonbonded interactions. In our study, the formation of the TS from cyclobutane involves an increase of the ring bond angles, which removes a large part of the RSE. Indeed, in the TS of Figure 6a, ∠C1C2C3 = ∠C2C3C4 = 108.4° vs 88.6° in cyclobutane.
For the TS involved in the ring opening of cyclopentane (Figure 6b), the large increase of the bond length between C1 and C2 (3.329 Å in the TS vs 1.545 Å in cyclopentane), associated with an increase of the ring bond angle (∠C1C3C5 = ∠C2C4C5 = 114.8° in the TS vs 104.8° in the molecule), may explain a large part of the removal of RSE in the TS. In cyclohexane, the low value of RSE is maintained in the transition state, which can be explained by the gauche interactions remaining in the TS.

6. Conclusions

While much work has been carried out on the thermal reactions of aliphatic hydrocarbons, the high-temperature reactions of cyclic hydrocarbons have not been explored extensively. In this study, the ring opening of the most representative cyclic alkanes, i.e. cyclobutane, cyclopentane and cyclohexane, has been extensively explored by means of quantum chemistry. All the possible elementary reactions have been investigated starting from the biradicals yielded by the initiation steps. The thermochemical properties of all the species have been calculated with the high-level CBS-QB3 method. The anharmonic contribution of hindered rotors has been taken into account, and isodesmic reactions have been systematically used for the evaluation of the enthalpies in order to minimize systematic errors. The enthalpies of formation of the biradicals have been compared to data obtained with a semi-empirical method and show very good agreement. For all the elementary reactions, transition state theory was used to calculate the rate constants. Three-parameter Arrhenius expressions have been derived in the temperature range 600 to 2000 K at atmospheric pressure. The tunneling effect has been taken into account in the case of internal H transfers. By applying the quasi-stationary-state approximation to the biradicals, rate constants have been calculated for the global reaction leading directly from the cyclic alkane to the molecular products.
These values have been compared with the few data available in the literature and showed rather good agreement. The main reaction routes are the decomposition to two ethylene molecules in the case of cyclobutane and the internal disproportionation of the biradicals yielding 1-pentene and 1-hexene in the case of cyclopentane and cyclohexane, respectively. An important fact highlighted in this work is the role of internal rotation hindrance in the fate of the biradicals. Whereas the energy barriers between conformers are usually low in comparison to the reaction barriers, all the energies are close in this case, and taking the rotations between the conformers into account changes the global rate constant, especially for the largest biradicals. The analysis of the variation of the ring strain energy has also shown that the larger part is removed when going from the cycloalkanes to the transition states. These latter structures are close to being unconstrained, with between 1 and 2 kcal.mol-1 remaining.

ACKNOWLEDGMENT

The Centre Informatique National de l’Enseignement Supérieur (CINES) is gratefully acknowledged for allocation of computational resources.

SUPPORTING INFORMATION AVAILABLE

The full list of authors in ref 30, the structural parameters for all the species investigated in this study, and the frequencies, energies and zero-point energies. This material is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES

(1) Guibet, J.C. Fuels and Engines, Institut français du Pétrole Publications, Eds.; Technip: Paris, 1999; Vol 1, pp 55-56. (2) Bounaceur, R.; Glaude, P.A.; Fournet, R.; Battin-Leclerc, F.; Jay, S.; Pires Da Cruz, A. Int. J. of Vehicles Design, in press. (3) Slutsky, V.; Kazakov, O.; Severin, E.; Bespalov, E.; Tsyganov, S. Combust. Flame 1993, 94, 108. (4) Zeppieri, S.; Brezinsky, K.; Glassman, I. Combust. Flame 1997, 108, 266. (5) Voisin, D.; Marchal, A.; Reuillon, M.; Boetner, J.C.; Cathonnet, M. Combust. Sci.
Technol. 1998, 138, 137. (6) Simon, V. ; Simon, Y.; Scacchi, G. ; Barronet, F. Can. J. Chem. 1999, 77, 1177. (7) El Bakali, A.; Braun-Unkhoff, M.; Dagaut, P.; Frank, P.; Cathonnet, M. Proc. Combust. Inst. 2000, 28, 1631. (8) Ristori, A.; Dagaut, P. ; El Bakali, A. ; Cathonnet, M. Combust. Sci. Technol. 2001, 165, 197. (9) Lemaire, O.; Ribaucour, M. ; Carlier, M. ; Minetti, R. Combust. Flame 2001, 127, 1971. (10) Dubnikova, F. ; Lifshitz, A. J. Phys. Chem. A 1998, 102, 3299. (11) Hohm, U. ; Kerl, K. Ber Bunsenges phys. Chem. 1990, 94, 1414. (12) Hidaka, V.; Oki, T. Chem. Phys. Lett. 1987, 141, 212. (13) Rickborn, S.F.; Rogers, D.S.; Ring, M.A.; O’Neal, H.E. J.Phys. Chem. 1986, 90, 408-414. (14) Lewis, D.K.; Bosch, H.W.; Hossenlop, J.M. J. Phys. Chem. 1982, 86, 803. (15) Barnard, J.A.; Cocks, A.T.; Lee, R.K-Y. J. Chem. Soc. Faraday Trans. 1 1974, 70, 1782. (16) Lewis, D.K.; Bergmann, J.; Manjoney, R.; Paddock, R.; Kaira, B.L. J. Phys. Chem. 1984, 88, 4112. (17) NIST Chemical kinetics Database; http://kinetics.nist.gov (18) Butler, J.N.; Ogawa, R.B. J. Am. Chem. Soc. 1963, 85, 3346. (19) Beadle, P.C.; Golden, D.M.; King, K.D.; Benson, S.W. J. Am. Chem. Soc. 1972, 94, 2943. (20) Sakai, S. Int. J. Quantum Chem. 2002, 90, 549. (21) Bernardi, F.; Bottoni, A.; Robb, A.R.; Schlegel, H.B.; Tonachini, G. J. Am. Chem. Soc. 1985, 107, 2260. (22) Bernardi, F.; Bottoni, A.; Olivucci, M.; Robb, A.R.; Schlegel, H.B.; Tonachini, G. J. Am. Chem. Soc. 1988, 110, 5993. (23) Doubleday, C., Jr.; J. Am. Chem. Soc. 1993, 115, 11968. (24) Pedersen, S; Herek, J.L.; Zewail, A.H. Science 1994, 266, 1359. (25) Tsang, W. Int. J. Chem. Kinet. 1978, 10, 599. (26) Tsang, W. Int. J. Chem. Kinet. 1978, 10, 1119. (27) Kalra, B.L.; Feinstein, S.A.; Lewis, K. Can. J. Chem. 1979, 57, 1324. (28) Brown, T.C.; King, K.D., Nguyen,T.T. J. Phys. Chem. 1986, 90, 419. (29) Benson, S.W. Thermochemical Kinetics, 2nd Ed.; Wiley: New York, 1976. (30) Frisch, M. J.; et al. 
Gaussian03, revision B05; Gaussian, Inc.: Wallingford, CT, 2004. (31) Montgomery, J.A.; Frisch, M.J.; Ochterski, J.W.; Petersson, G.A. J. Chem. Phys. 1999, 110, 2822. (32) Becke, A.D. J. Phys. Chem. 1993, 98, 5648. (33) Lee, L.; Yang, W.; Parr, R.G. Phys Rev. B 1998, 37, 785. (34) Gräfenstein, J.; Hjerpe A.M.; Kraka E.; Cremer D. J. Phys. Chem. A 2000, 104, 1748. (35) Cremer, D.; Filatov, M.; Polo, V. ; Kraka, E.; Shaik, S. Int. J. Mol. Sci. 2002, 3, 604. (36) Schreiner, P.R.; Prall, M. J. Am. Chem. Soc., 1999, 121, 8615. (37) Goldstein, E.; Beno, B.; Houk, K.N. J. Am. Chem. Soc., 1996, 118, 6036. (38) Kutawa, K.T. ; Valin, L.C. ; Converse, A.D. J. Phys. Chem. A 2005, 109, 10710. (39) Wijaya, C.D.; Sumathi, R.; Green, W.H.Jr. J. Phys. Chem. A 2003, 107, 4908. (40) Coote, M.L.; Wood, G.P.F., Radom, L. J. Phys. Chem. A 2002, 106, 12124. (41) Wood, G.P.F.; Henry, D.J.; Radom, L. J. Phys. Chem. A 2003, 107, 7985. (42) Ayala, P.Y.; Schlegel, H.B. J. Chem. Phys., 1998, 108, 2314. (43) Irikura, K.K.; Frurip, D.J.; in Irikura, K.K.; Frurip, D.J (Eds), Computational Thermochemistry, ACS Symposium series 677, Washington DC, 1998, pp 13-14. (44) NIST Chemistry WebBook : http://webbook.nist.gov/chemistry (45) Tsang, W., in: Simões, J.A.M.; Greenberg A.; Liebman, J.F. (Eds), Energetics of organic free radicals, vol. 4, Blackie A&P, Glasgow, 1996, pp 22-58. (46) Cramer, J.C. Essentials of Computational Chemistry, 2nd Ed., Wiley: Chichester, 2004, p. 527. (47) Skodje, R. T.; Truhlar, D.G. J. Phys. Chem., 1981, 85, 624. (48) Lee, J.; Bozelli, J.W. J. Phys. Chem. A, 2003, 107, 3778. (49) Bounaceur, R.; Buda, F.; Conraud, V.; Glaude, P.A.; Fournet, R.; Battin-Leclerc, F. Comb. & Flame, 2005, 142, 170. (50) Bettinger, H.F.; Rienstra-Kiracofe, J.C.; Hoffman, B.C.; Schaefer III, H.F.; Baldwin, J.E.; Schleyer, R. Chem. Commun., 1999, 1515 (51) Brocard, J.C.; Baronnet, F.; O’Neal, H.E. Comb. & Flame, 1983, 52, 25.
(52) Dixon, D.A.; Komornicki, A. J. Phys. Chem. 1990, 94, 5630. (53) Gulati, S.K.; Walker, R.W. J. Chem. Soc., Faraday Trans., 1989, 85, 1799. (54) Eliel, E.L.; Wilen, S.H. Stereochemistry of Organic Compounds, Wiley-Interscience, New York, 1994. (55) Dudev, T.; Lim, C. J. Am. Chem. Soc., 1998, 120, 4450. (56) Cohen, N. J. Phys. Chem. Ref. Data, 1996, 25, 1411. (57) Luo, Y.R. Handbook of Bond Dissociation Energies in Organic Compounds, CRC Press LLC 2003, pp 96-97.

ABSTRACT

This work reports a theoretical study of the gas-phase unimolecular decomposition of cyclobutane, cyclopentane and cyclohexane by means of quantum chemical calculations. A biradical mechanism has been envisaged for each cycloalkane, and the main routes for the decomposition of the biradicals formed have been investigated at the CBS-QB3 level of theory. Thermochemical data ($\Delta H^0_f$, $S^0$, $C^0_p$) for all the involved species have been obtained by means of isodesmic reactions. The contribution of hindered rotors has also been included. Activation barriers of each reaction have been analyzed to assess the energetically most favorable pathways for the decomposition of biradicals. Rate constants have been derived for all elementary reactions using transition state theory at 1 atm and temperatures ranging from 600 to 2000 K. Global rate constants for the decomposition of the cyclic alkanes into molecular products have been calculated. Comparison between calculated and experimental results validated the theoretical approach. An important result is that the rotational barriers between the conformers, which are usually neglected, are of importance in the decomposition rate of the largest biradicals. Ring strain energies (RSE) in transition states for ring opening have been estimated and show that the main part of the RSE contained in the cyclic reactants is removed upon the activation process.

<|endoftext|><|startoftext|>
I. Introduction
I.1. The Yb breakthrough
I.2. Strategy on the matrix host
II.
Temperature profile of an ytterbium-doped crystal under diode pumping
II.1. Theoretical aspects
II.1.1. The steady-state heat equation
II.1.2. A review of analytical solutions of the steady state heat equation
II.1.3. What is special about ytterbium-doped materials? The influence of absorption saturation in the temperature distribution.
II.1.4. Determining the absolute temperature: the boundary conditions
II.2. Experimental absolute temperature mapping and heat transfer measurements using an infrared camera
II.2.1. Introduction
II.2.2. Experimental setup for direct temperature mapping
II.2.3. Results and measurements of heat transfer coefficients
III. Thermal lensing effects: theory
III.1. Introduction
III.2. Stress and strain calculations
III.3. How can we take into account the photoelastic effect?
III.4. Simplified account of photoelastic effect in isotropic crystals
III.5. A consequence of strain-induced birefringence: depolarization losses.
III.6. Thermally-induced optical phase shift.
III.6.1. Expression of the optical path
III.6.2. The thermal lens focal length
III.7. Discussion about the use of the “dn/dT” coefficient
III.8. A novel definition for thermo-optic coefficient based on experimentally measurable parameters.
III.9. The aberrations of the thermal lens.
IV. Thermal lensing techniques
IV.1. Introduction
IV.2. Geometrical methods
IV.3. Methods based on the properties of cavity eigenmodes
IV.4. Methods based on wavefront measurements
IV.4.1. Classical interferometric techniques
IV.4.2. Shearing interferometric techniques
IV.4.3. Methods based on Shack-Hartmann wavefront sensing
IV.4.4. Other techniques
IV.5. Conclusion
V. Thermal lensing measurements in ytterbium-doped materials: the evidence of a non radiative path
V.1. The thermal load in Yb-doped materials
V.2. Evidence of nonradiative effects in Yb-doped materials: the example of Yb:YAG
V.3.
Laser wavelength dependence on the thermal load in Yb-doped broadband materials: the example of Yb:Y2SiO5
V.4. The influence of the mean fluorescence wavelength on the thermal load: an illustration with Yb:KGW
V.5. Conclusion
VI. Conclusion
Appendix: Calculation of the photoelastic constants $C_{r,\theta}$ and $C'_{r,\theta}$ using plane strain and plane stress approximations.

I. Introduction

I.1. The Yb3+ breakthrough

Diode-pumped solid-state laser (DPSSL) technology has become a very intense field of research in physics [1,2]. The replacement of flash-lamp pumping by direct laser-diode pumping of solid-state materials has brought a very important breakthrough in laser technology, in particular for high-power lasers [3-4]. In fact, the better match between the pump wavelength and the material’s absorption spectrum offered by the narrow emission of laser diodes (compared to the broad emission of flash-lamps) has led to a significant gain in efficiency and subsequently in simplicity, compactness, reliability and cost. This progress has substantial implications for laser applications such as fundamental and applied research, laser processing and medical applications.

[Figure 1 diagram: energy-level scheme of Nd3+ (levels 4I9/2, 4I11/2, 4I13/2, 4I15/2, 4F3/2, 4F5/2, 4G5/2, 4G7/2, 4G9/2) and Yb3+ (levels 2F7/2, 2F5/2) on an energy axis from 0 to 20000, with transitions labeled 0.8 µm, 0.9 µm, 0.94 µm, 0.98 µm and ~1 µm, and the lines responsible for parasitic effects.]

Figure 1: Energy levels of Yb and Nd ions. Typical laser transition lines are represented for both pump absorption and laser emission. Lines of high excited states are also represented, including the lines involved in the deleterious effects (up-conversion, excited-state absorption or concentration quenching).

In the realm of high-average-power DPSSL, two rare-earth ions dominate: neodymium and ytterbium [5-6].
Indeed, they can be efficiently pumped at 808 nm with InGaAsP/GaAs or AlGaAs/GaAs diodes for neodymium, and between 900 and 980 nm with InGaAs/GaAs diodes for ytterbium (fig. 1). In both cases the standard laser emission is around 1 µm, corresponding to the transition between the 4F3/2 and 4I11/2 levels for Nd3+ and between the 2F5/2 and 2F7/2 levels for Yb3+. At the beginning of high-power-laser development, Nd-doped materials were preferred to Yb-doped ones, mainly because of their four-level nature and their many absorption lines, which are more convenient as far as flash-lamp pumping is concerned. However, it has seemed clear for more than a decade now that Yb-doped materials are better suited for very efficient and very-high-average-power diode-pumped lasers. The main reason for this is the very simple electronic level structure of the Yb3+ ion, which consists of two manifolds, as shown in figure 1. This singular property avoids most of the parasitic effects, such as upconversion, cross relaxation or excited-state absorption, which are present in Nd-doped materials [7] because of the existence of higher excited-state levels (4G9/2 for the 1-µm laser emission). These deleterious effects have two main consequences. First, they increase the thermal load and subsequently the thermal problems [8], because the main de-excitation paths of the highly excited levels are non-radiative (as represented in fig. 1). Secondly, they also alter the gain, because they can induce a strong depopulation of the 4F3/2 level involved in the laser population inversion. Another advantage of Yb-doped materials compared to their neodymium-doped counterparts is the very low quantum defect (again due to the two-manifold electronic structure). In fact, when pumped at 980 nm the quantum defect of ytterbium is around 5 %, compared to 30 % for neodymium (in YAG).
This is a real benefit for reducing the thermal problems and thus for attaining very high average powers. As an example of comparison between Nd- and Yb-doped materials, we summarize in table 1 the different parameters for the same well-known matrix host: YAG (Y3Al5O12) [9-13].

Table 1: Comparison between Nd:YAG and Yb:YAG. For Yb:YAG, the two values given for the absorption line and the quantum defect refer to the 968 nm and 942 nm absorption lines, respectively.

| Crystal | Nd:YAG | Yb:YAG |
|---|---|---|
| Emission line: wavelength | 1064 nm | 1031 nm |
| Emission line: cross section | 28 × 10^-20 cm2 | 2.1 × 10^-20 cm2 |
| Emission line: broadness (FWHM) | 0.8 nm | 9 nm |
| Lifetime | 230 µs | 951 µs |
| Saturation fluence | 0.67 J/cm2 | 9.2 J/cm2 |
| Maximum doping rate | 2 % | 100 % |
| Absorption line: wavelength | 808 nm | 968 nm / 942 nm |
| Absorption line: cross section | 67 × 10^-20 cm2 | 0.7 × 10^-20 cm2 / 0.75 × 10^-20 cm2 |
| Absorption line: broadness (FWHM) | 2 nm | 4 nm / 18 nm |
| Quantum defect | 32 % | 6.5 % / 9.5 % |

I.2. Strategy on the matrix host

Another advantage of Yb-doped over Nd-doped materials is the longer lifetime, which may allow a better storage of the pump energy; and last but not least is the generally broader bandwidth of the emission lines. This last advantage leads to a potential for femtosecond pulse generation which, in the current state of the art, has never been demonstrated with neodymium. The emergence of ytterbium-based lasers has allowed crucial progress in ultrashort-pulse laser technology. These materials have been the key to the development of the latest generation of “ultrafast” lasers: the all-solid-state femtosecond lasers [14-36]. Applications for such lasers are abundant and attract great interest in the scientific community. However, Yb-doping brings several drawbacks or difficulties. The first one is the very strong influence of the matrix on the spectral properties. Indeed, as the two levels 2F5/2 and 2F7/2 are split into manifolds by the Stark effect due to the crystalline electric field of the host matrix, the ion environment strongly shapes the spectrum. Put simply, the spectral broadening can be directly related to the level of disorder of the matrix [37-48].
On the one hand, if the matrix is relatively simple and well-ordered, the spectra reveal relatively narrow and intense lines (which are a strong disadvantage for short pulse generation). However, a simple matrix structure generally implies a high thermal conductivity, which is a key point for developing high-power lasers. An example of such an Yb-doped material is given in figure 2 with Yb:YAG. On the other hand, if the disorder of the host matrix is high, the spectrum will be broad and suitable for very short pulse generation, but at the expense of thermal conductivity. An example of such an Yb-doped material is given in figure 3 with Yb:SYS. The numerous advantages of the Yb3+ ion have led to a strong interest in many host matrices, but in general favouring either short-pulse or high-power applications. Table 2 illustrates this diversity of already studied Yb-doped host matrices and their principal properties.

Table 2: Comparison between Yb-doped crystals

| Material (name and formula) | Emission wavelength (nm) | Emission cross section (10^-20 cm2) | Emission broadness (nm) | Lifetime (ms) | Usual absorption wavelength (nm) | Thermal conductivity, undoped (W/m/K) |
|---|---|---|---|---|---|---|
| Yb:YAG, Yb:Y3Al5O12 | 1031 | 2.1 | 9 | 0.951 | 942, 968 | 11 |
| Yb:GGG, Yb:Gd3Ga5O12 | 1025 | 2 | 10 | 0.8 | 971 | 8 |
| Yb:Y2O3 | 1076 | 0.4 | 14.5 | 0.82 | 979 | 13.6 |
| Yb:Sc2O3 | 1041 | 1.44 | 11.6 | 0.8 | 979 | 16.5 |
| Yb:CaF2 | 1045 | 0.25 | 70 | 2.4 | 979 | 9.7 |
| Yb:YVO4 | 1020 | 0.9 | 40 | 0.25 | 985 | 5.1 |
| Yb:LSO, Yb:Lu2SiO5 | 1040 | 0.6 | 35 | 0.95 | 978 | 5.3 |
| Yb:YSO, Yb:Y2SiO5 | 1042 | 0.6 | 40 | 0.67 | 978 | 4.4 |
| Yb:YLF, Yb:YLiF4 | 1030 | 0.81 | 14 | 2.21 | 940 | 4.3 |
| Yb:KGW, Yb:KGd(WO4)2 | 1023 | 2.8 | 20 | 0.3 | 981 | 3.3 |
| Yb:KYW, Yb:KY(WO4)2 | 1025 | 3 | 24 | 0.3 | 981 | 3.3 |
| Yb:SYS, Yb:SrY4(SiO4)3O | 1065 | 0.44 | 73 | 0.82 | 980 | 2 |
| Yb:GdCOB, Yb:Ca4GdO(BO3)3 | 1044 | 0.35 | 44 | 2.6 | 976 | 2.1 |
| Yb:BOYS, Yb:Sr3Y(BO3)3 | 1060 | 0.3 | 60 | 1.1 | 975 | 1.8 |
| Yb:glass (phosphate glass) | 1020 | 0.05 | 35 | 1.3 | 975 | 0.8 |

Another drawback of the Yb-doped materials is due to the quasi-3-level structure of these lasers.
As is apparent from the spectra of figures 2 and 3, there is an overlap between the emission and the absorption bands, which leads to strong re-absorption effects and to a reduction of the effective emission bandwidth. Moreover, since the splitting due to the Stark effect is relatively small (between 200 and 1000 cm-1), the high-energy levels within the 2F7/2 manifold (corresponding to the different possible lower levels of the laser transition) are somewhat populated at thermal equilibrium. This implies two deleterious effects when the temperature increases: first, a reduction of the laser population inversion; second, an increase of the reabsorption at the laser wavelength. Special care concerning the thermal load and thermal management is therefore necessary to develop efficient lasers based on ytterbium-doped materials, especially in the high-power regime.

[Figure 2 plot: absorption and emission cross sections of Yb:YAG versus wavelength (nm), from 950 to 1090 nm.]

Figure 2: Absorption and emission spectra of Yb:YAG.
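The thermal population of the upper Stark sublevels of the 2F7/2 manifold mentioned above can be estimated with a Boltzmann distribution. The following is a minimal sketch under stated assumptions: the function name `boltzmann_fractions` is hypothetical, all sublevels are assumed to have the same degeneracy, and the sublevel energies used in the example (0, 200, 600 and 1000 cm-1) are illustrative values chosen inside the 200 to 1000 cm-1 range quoted in the text, not data for a specific crystal.

```python
import math

# Boltzmann constant divided by hc, in cm^-1 per kelvin
K_B_OVER_HC = 0.695035


def boltzmann_fractions(levels_cm1, temperature_k):
    """Fractional thermal populations of the sublevels of one manifold.

    levels_cm1: sublevel energies in cm^-1, measured from the lowest one.
    temperature_k: temperature in kelvin.
    Equal degeneracy is assumed for every sublevel.
    """
    kt = K_B_OVER_HC * temperature_k  # thermal energy in cm^-1
    weights = [math.exp(-energy / kt) for energy in levels_cm1]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical Stark sublevels of the ground manifold (cm^-1)
levels = [0.0, 200.0, 600.0, 1000.0]

for temperature in (300.0, 400.0):
    fractions = boltzmann_fractions(levels, temperature)
    print(temperature, [f"{100 * f:.2f} %" for f in fractions])
```

In this picture, heating the crystal transfers population from the lowest sublevel to the upper ones, which is the origin of the increased reabsorption at the laser wavelength.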
[Figure 3 plot: absorption and emission cross sections of Yb:SYS versus wavelength (nm), from 850 to 1150 nm; the emission band is labeled 73 nm.]

Figure 3: Absorption and emission spectra of Yb:SYS.

As a first conclusion, ytterbium-doped crystals are particularly suitable for directly diode-pumped, solid-state high-power and/or femtosecond lasers. The emergence of new ytterbium-doped laser crystals has allowed crucial progress in DPSSL technology. Nevertheless, special care must be taken concerning the thermal properties and thermal effects in Yb-doped materials because of their strong influence on the laser performance. In this paper, we therefore review different studies of thermal effects in ytterbium-doped laser crystals.

II. Temperature profile of an ytterbium-doped crystal under diode pumping

In this part we review how to calculate and measure the temperature distribution in an end-pumped laser crystal. We explain in which cases it is possible to obtain an analytical expression (otherwise a finite-element analysis is necessary), and how these well-known results have to be corrected when dealing with ytterbium-doped crystals, because absorption saturation cannot be ignored in this case. Then, we investigate the role of the thermal contact at the boundaries, which is an essential parameter for the knowledge of the temperature.
This will be illustrated, in the last part of this section, by experimental absolute temperature maps obtained with an infrared imaging camera.

II.1. Theoretical aspects

II.1.1. The steady-state heat equation

A study of thermal effects in crystals first requires the calculation of the temperature field at any point of the crystal. One has to solve the heat equation:

$$\rho\, c_p\, \frac{\partial T(x,y,z,t)}{\partial t} - K_c\, \nabla^2 T(x,y,z,t) = Q_{th}(x,y,z,t) \tag{II.1.1}$$

with:
- T = T(x,y,z,t): temperature in K;
- ρ: density in kg.m-3;
- cp: specific heat in J.kg-1.K-1;
- Kc: thermal conductivity in W.m-1.K-1;
- Qth: thermal power (or thermal load) per unit volume in W.m-3.

The specific heat affects the temperature variation in the pulsed regime or in the transient regime: we will ignore it in the following work since we will only consider CW lasers. The thermal conductivity governs the temperature gradient inside the crystal, and is of crucial importance for the magnitude of the thermal lens. The heat transfer coefficient H arises when writing the boundary conditions, and therefore influences only the absolute value of the temperature inside the crystal. We will discuss at the end of this part how to measure it, and some ways to improve its value. In order to obtain analytical expressions, some assumptions have to be made. We will assume the following: (1) the pump profile is axisymmetric; end-pumping by a fiber-coupled diode is a good example of such a profile; (2) the thermal conductivity Kc is a scalar quantity, not a tensor; this means that we restrict our discussion to glasses and cubic crystals [49] (in practice, however, the anisotropy of the thermal conductivity tensor is often weak); (3) the cooling is isotropic in the z-plane, which means that the crystal mount does not favour one given direction of cooling; (4) finally, we assume that the thermal conductivity is not significantly dependent on temperature, so that it can be considered as a constant.
This approximation is very realistic in YAG around room temperature, following the study of Brown et al. [50], and we assume that it is also true in other crystals. However, this approximation is no longer valid at cryogenic temperatures. The heat equation, including all these assumptions, becomes:

$$\frac{1}{r}\frac{\partial}{\partial r}\left(r\,\frac{\partial T}{\partial r}\right) + \frac{\partial^2 T}{\partial z^2} = -\frac{Q_{th}(r,z)}{K_c} \tag{II.1.2}$$

where r is the radial coordinate of a point inside the crystal, measured with respect to the symmetry axis of the pump distribution. To simplify the equations a bit further, the crystal will be considered as a cylinder whose axis coincides with the pump symmetry axis, with a radius r0 and a length L (see figure 4).

Figure 4: Geometry of the crystal, taken for all calculations. z = 0 is the input face.

II.1.2. A review of analytical solutions of the steady state heat equation

An analytical expression of the temperature distribution inside a crystal can be obtained only for a limited number of simple systems. For more complex geometries, one has to use finite-element codes. We present in this subsection a brief review of the cases where such an analytical treatment is feasible. The first study of thermal effects in crystals was provided by Koechner [51] in the early seventies. He considered flash-pumped Nd:YAG rods, within which the thermal load is uniform:

$$Q_{th} = \frac{P_{th}}{\pi r_0^2 L} \tag{II.1.3}$$

where Pth = ηh Pabs is the thermal power (in W) dissipated into the rod, Pabs is the absorbed pump power and ηh is the fractional thermal load. The solution reads:

$$T(r) = T(r_0) + \frac{\eta_h P_{abs}}{4\pi K_c L}\left(1 - \frac{r^2}{r_0^2}\right) \tag{II.1.4}$$

where T(r0) is the temperature at the edge of the crystal, which will be estimated later from the boundary conditions. It is useful to write the temperature shift between the centre and the edge of the rod:

$$\Delta T = T(0) - T(r_0) = \frac{\eta_h P_{abs}}{4\pi K_c L} \tag{II.1.5}$$

We note that ∆T is independent of the radius r0 of the crystal, but scales inversely with L.
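The uniform-heating result (II.1.4)-(II.1.5) can be evaluated numerically with a few lines of code. This is a minimal sketch, assuming the standard parabolic solution for a uniformly heated, edge-cooled rod, T(r) − T(r0) = ηh Pabs (1 − r²/r0²) / (4π Kc L); the function name and the numerical values in the example (10 W absorbed, ηh = 0.1, a 5 mm long rod) are hypothetical, while Kc = 11 W/m/K is the undoped YAG value listed in Table 2.

```python
import math


def uniform_rod_temperature_rise(r, r0, length, p_abs, eta_h, k_c):
    """Temperature rise T(r) - T(r0) in a uniformly heated rod (SI units).

    r, r0: radial position and rod radius (m); length: rod length (m);
    p_abs: absorbed pump power (W); eta_h: fractional thermal load;
    k_c: thermal conductivity (W/m/K).
    """
    prefactor = eta_h * p_abs / (4.0 * math.pi * k_c * length)
    return prefactor * (1.0 - (r / r0) ** 2)


# Hypothetical example: centre-to-edge shift for two different rod radii
for radius in (1e-3, 3e-3):
    delta_t = uniform_rod_temperature_rise(
        0.0, r0=radius, length=5e-3, p_abs=10.0, eta_h=0.1, k_c=11.0
    )
    print(f"r0 = {radius * 1e3:.0f} mm: T(0) - T(r0) = {delta_t:.3f} K")
```

The two printed values are identical, which illustrates that the centre-to-edge shift does not depend on r0; doubling the length at fixed absorbed power halves it.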
The previous results cannot be applied to end-pumping configurations, because in these cases the thermal load is localized within a small volume inside the crystal. In the vast majority of practical circumstances, the pump beam profile is axisymmetric and can be described by a super-gaussian shape. The general solution of the heat equation for a super-gaussian beam of any order has been derived by Schmid et al. [52]. The situation is even simpler in most cases: indeed, the pump is often either a near-diffraction-limited Gaussian beam (laser or single-mode diode), or a “top-hat” beam (that is, a super-gaussian profile of infinite order). The latter description corresponds quite well to fiber-coupled diode laser array pumping. The solution of the heat equation in the specific case of Gaussian-beam pumping is treated in [53] and [54]. The case of the top-hat shape has been derived by Chen et al. [55]. Hereafter we give the solution for a top-hat beam profile. Assuming that the thermal load in each slice at coordinate z is a disk of radius wp(z), the temperature shift with respect to the edge temperature is:

\[ T(r,z) - T(r_0,z) = \frac{\eta_h P_{abs}}{4 \pi K_c} \, \frac{\alpha_{NS} e^{-\alpha_{NS} z}}{1 - e^{-\alpha_{NS} L}} \times \begin{cases} \ln\left( \dfrac{r_0^2}{w_p^2(z)} \right) + 1 - \dfrac{r^2}{w_p^2(z)} & \text{if } r \le w_p(z) \\[2ex] \ln\left( \dfrac{r_0^2}{r^2} \right) & \text{if } r > w_p(z) \end{cases} \tag{II.1.6} \]

where:
- αNS is the (non-saturated) linear absorption coefficient; absorption saturation is therefore not taken into account in this formula;
- the axial heat flux (along z) is ignored, which means in other words that ∂²T/∂z² is neglected in the heat equation. We will see in the next section why the axial heat flux can be neglected. The formula is therefore not valid for thin disks;
- as a consequence of the latter point, the temperature can be computed inside each “slice” of thickness dz of the crystal, as if the surrounding slices did not exist.

The temperature gradient inside the pumped volume is of particular interest because the laser beam is usually (and preferably) smaller than the pump volume. One obtains:

\[ \Delta T(r,z) = T(r=0,z) - T(r,z) = \frac{\eta_h P_{abs}}{4 \pi K_c} \, \frac{\alpha_{NS} e^{-\alpha_{NS} z}}{1 - e^{-\alpha_{NS} L}} \, \frac{r^2}{w_p^2(z)} \tag{II.1.7} \]
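The top-hat solution can be evaluated slice by slice with a few lines of code. The sketch below is only an illustration under stated assumptions: it uses the unsaturated, purely radial solution (parabolic inside the pumped disk, logarithmic outside), takes the pump radius as a constant w_p within the slice, and uses a hypothetical function name and argument order.

```python
import math


def tophat_temperature_rise(r, z, r0, w_p, p_abs, eta_h, k_c, alpha_ns, length):
    """T(r, z) - T(r0, z) in K for top-hat pumping without absorption saturation.

    Lengths in m, p_abs in W, k_c in W/m/K, alpha_ns in 1/m.
    The axial heat flux is neglected, so each slice is treated independently.
    """
    # Heat deposited per unit length in the slice at z, divided by 4*pi*K_c.
    prefactor = (
        eta_h * p_abs / (4.0 * math.pi * k_c)
        * alpha_ns * math.exp(-alpha_ns * z)
        / (1.0 - math.exp(-alpha_ns * length))
    )
    if r <= w_p:
        # Inside the pumped disk: parabolic dependence on r.
        return prefactor * (math.log(r0 ** 2 / w_p ** 2) + 1.0 - r ** 2 / w_p ** 2)
    # Outside the pumped disk: logarithmic decay towards the edge.
    return prefactor * math.log(r0 ** 2 / r ** 2)
```

The two branches join continuously at r = w_p, the shift vanishes at r = r0, and the difference between the on-axis value and any point inside the pumped disk does not involve r0.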
The temperature shift turns out to be independent of the crystal radius r0 and of the crystal length L, which was not the case in Koechner’s simple model (equation II.1.4). This makes sense since the important parameter here is the absorption length Labs = 1/αNS, and not the whole length L of the crystal. In figure 5 we plot the normalized temperature distribution for a typical ratio wp/r0 = 0.1.

[Figure 5: horizontal axis r/r0, from -1 to 1; vertical axis T(r) - T(r0).]

Figure 5: Normalized temperature distribution (in a plane perpendicular to the propagation axis) for a crystal pumped by a top-hat-profile fiber-coupled laser diode (with wp/r0 = 0.1).

II.1.3. What is special about ytterbium-doped materials? The influence of absorption saturation on the temperature distribution.

An ytterbium-doped material, especially when pumped at the zero-line wavelength (i.e. around 980 nm), has much in common with a saturable absorber. The absorption rate due to the pump is counterbalanced by the spontaneous emission rate, but also (and this is far from negligible) by the stimulated emission rate at the pump wavelength. It is essential to take absorption saturation effects into account; otherwise the absorbed pump power can be dramatically overestimated. As a result, the absorbed pump power is lower under nonlasing than under lasing conditions, since lasing provides (hopefully) a very efficient path to carry the excited population back to the fundamental level.

It is noteworthy that most Finite Element Analysis (FEA) codes (primarily designed for 4-level laser systems in which absorption saturation is not a problem) basically assume an exponential decay of the pump power inside the crystal. This can lead to large errors, as we illustrate below. Let Pp(z) be the pump power through a plane in the crystal at coordinate z.
The thermal load density generated in a disk of radius wp(z) and thickness dz is:

\[ Q_{th}(z) = -\frac{\eta_h}{\pi w_p^2(z)} \frac{dP_p(z)}{dz} \tag{II.1.8} \]

where -dPp(z) represents the absorbed pump power in the thin slice of thickness dz. The temperature field is:

\[ T(r,z) - T(r_0,z) = -\frac{\eta_h}{4 \pi K_c} \frac{dP_p(z)}{dz} \times \begin{cases} \ln\left( \dfrac{r_0^2}{w_p^2(z)} \right) + 1 - \dfrac{r^2}{w_p^2(z)} & \text{if } r \le w_p(z) \\[2ex] \ln\left( \dfrac{r_0^2}{r^2} \right) & \text{if } r > w_p(z) \end{cases} \tag{II.1.9} \]

Inside the pumped volume the temperature shift reads as follows:

\[ T(0,z) - T(r,z) = -\frac{\eta_h}{4 \pi K_c} \frac{dP_p(z)}{dz} \frac{r^2}{w_p^2(z)} \tag{II.1.10} \]

Absorption saturation is taken into account, under nonlasing conditions, by the following equation for the pump irradiance Ip (which is the pump power divided by the pump spot area):

\[ \frac{dI_p}{dz} = -\frac{\alpha_{NS} I_p}{1 + I_p / I_{psat}} \tag{II.1.11} \]

where αNS is the absorption coefficient in the non-saturated regime. The pump saturation irradiance Ipsat is calculated from the spectroscopic properties of the material:

\[ I_{psat} = \frac{h c}{\lambda_p \left[ \sigma_{abs}(\lambda_p) + \sigma_{em}(\lambda_p) \right] \tau} \tag{II.1.12} \]

where σabs is the absorption cross section, σem is the emission cross section, λp the pump wavelength, and τ the radiative lifetime. The pump power Pp(z) obeys the following equation for a top-hat beam profile (the equivalent formulation for a gaussian pump profile can be found in [56]):

\[ \frac{dP_p(z)}{dz} = -\frac{\alpha_{NS} P_p(z)}{1 + \dfrac{P_p(z)}{\pi w_p^2(z) I_{psat}}} \tag{II.1.13} \]

A practical way to study absorption saturation issues, and to check the assumptions made so far, is to perform fluorescence imaging experiments in a pumped crystal. Using a crystal with one polished edge surface (in practice, a side not facing the radiator), one can make an optical image of the fluorescence, under lasing or nonlasing conditions, with a CCD camera and an interference filter at a long wavelength (at 1064 nm for instance). The filter is required to completely eliminate the scattered light at the pump wavelength, as well as to prevent detection of fluorescence photons which could have experienced reabsorption.
This simple experiment makes it possible to visualize absorption saturation (the fluorescence intensity, integrated along the depth of focus of the imaging system, does not decay exponentially) and also to determine the optimum location of the pump spot inside the crystal (figure 6). The experiments we performed with different Yb-doped materials taught us that the optimum focus (the one for which the measured laser efficiency was the highest) was always located at about one third of the whole crystal length from the input face. This parameter is taken into consideration in the following. The low brightness of the diode pump beam (compared to the brightness of the laser beam) makes the effective Rayleigh distance of the pump beam considerably shorter than the crystal length. For this reason, the divergence of the pump beam inside the crystal must also be considered, in order to correctly account for saturation issues. Here we describe the pump radius evolution by a relation of the type:

\[ w_p(z) = w_{p0} \sqrt{1 + \left( \frac{M^2 \lambda_p (z - z_0)}{\pi n \, w_{p0}^2} \right)^2} \tag{II.1.14} \]

where wp0 is the pump beam waist radius, z0 the waist location and n the refractive index of the crystal. The M2 factor is determined experimentally. In our case we used a 200µm-diameter core fiber-coupled diode (HLU15F200-980 from LIMO GmbH), whose M2 factor was measured to be around 80. Figure 6 shows experimental data and theoretical predictions for a 15%-at. doped Yb:GdCOB crystal [57]. The theoretical profiles are computed assuming that: 1) the pump volume has a top-hat profile, and 2) the imaging objective has a very low numerical aperture, so that the rate of spontaneous photons detected by one pixel can be calculated by integrating the fluorescence yield over one vertical line underneath. The good match between theory and experiment shows, incidentally, that the “top hat” hypothesis for the pump beam profile is well justified.
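To make the role of saturation and divergence concrete, the following sketch integrates the pump power along z with a fourth-order Runge-Kutta scheme. It is an illustration under explicit assumptions, not the code used for the figures: the pump power is taken to obey dPp/dz = -αNS Pp / (1 + Pp / (π wp² Ipsat)), the pump radius is assumed to follow the usual M2 hyperbolic law around a waist at z0, and the function names as well as the refractive index value n = 1.7 are hypothetical.

```python
import math


def pump_radius(z, w0, z0, m2, wavelength, n):
    """Pump radius (m) at position z, assuming an M^2-scaled hyperbolic law."""
    z_r = math.pi * n * w0 ** 2 / (m2 * wavelength)  # effective Rayleigh distance
    return w0 * math.sqrt(1.0 + ((z - z0) / z_r) ** 2)


def dp_dz(z, p, alpha_ns, i_sat, w0, z0, m2, wavelength, n):
    """Saturated absorption of a top-hat pump beam (nonlasing conditions)."""
    area = math.pi * pump_radius(z, w0, z0, m2, wavelength, n) ** 2
    return -alpha_ns * p / (1.0 + p / (i_sat * area))


def propagate_pump(p_inc, length, alpha_ns, i_sat, w0, z0, m2, wavelength, n,
                   steps=2000):
    """Return a list of (z, P_p(z)) pairs from z = 0 to z = length (RK4)."""
    h = length / steps
    z, p = 0.0, p_inc
    profile = [(z, p)]
    args = (alpha_ns, i_sat, w0, z0, m2, wavelength, n)
    for _ in range(steps):
        k1 = dp_dz(z, p, *args)
        k2 = dp_dz(z + h / 2, p + h * k1 / 2, *args)
        k3 = dp_dz(z + h / 2, p + h * k2 / 2, *args)
        k4 = dp_dz(z + h, p + h * k3, *args)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        z += h
        profile.append((z, p))
    return profile


# Assumed, illustrative parameters (SI units): 3 mm crystal, waist at L/3,
# alpha_NS = 7.4 cm^-1, I_psat = 4.1 kW/cm^2, M^2 = 80, 100 um waist radius.
PARAMS = dict(alpha_ns=740.0, w0=100e-6, z0=1e-3, m2=80.0,
              wavelength=980e-9, n=1.7)
SATURATED = propagate_pump(13.7, 3e-3, i_sat=4.1e7, **PARAMS)
UNSATURATED = propagate_pump(13.7, 3e-3, i_sat=math.inf, **PARAMS)
```

Setting i_sat to infinity recovers the exponential decay assumed by most FEA codes, so comparing the two profiles gives a direct measure of how much absorbed power such a code would overestimate for a given crystal.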
[Figure 6, panels: without absorption saturation, at low power (Pinc = 1 W, Pabs = 200 mW); with absorption saturation, at high power (Pinc = 13.7 W, Pabs = 6 W, Ipsat = 4.1 kW/cm2). Each panel shows experiment and simulation, together with the theoretical and experimental profiles along the symmetry axis.]

Figure 6: Fluorescence detected at 1064 nm on a crystal pumped at 980 nm at low power (top) and at high power (bottom), through the optically-polished top face. The influence of absorption saturation is clearly visible: at low pump power, the fluorescence yield is higher at the pump waist location, as expected provided that both absorption coefficient and absorption saturation are weak; on the contrary, when absorption saturation becomes non negligible, the amount of fluorescence photons is minimum at the pump waist. Theory and experiments agree very well, except near the exit face of the crystal, a discrepancy which could be related to the fact that far from the waist, the pump beam is no longer “top hat”.

[Figure 7, panels (a) and (b): horizontal axis is the thickness z (mm), from 0 to 3 mm; the pump waist location is marked and an arrow indicates decreasing pump saturation intensity.]

Figure 7: Evolution of pump power (Fig. 7a) and temperature difference T(0) - T(r0) (Fig. 7b) versus crystal thickness z. The pump saturation intensity values are: Ipsat = ∞, 50, 20, 10 and 5 kW/cm2 (for these curves ηh = 0.065 and Kc = 2 W.m-1.K-1, corresponding to the parameters of Yb:GdCOB).
Figure 8: Temperature distribution under nonlasing conditions, plotted versus thickness z (mm) and radius r (mm). The pump beam divergence inside the crystal is taken into account (M2 = 80). In Fig. 8a the saturation of absorption is ignored; in Fig. 8b pump absorption saturation is taken into account (Ipsat = 4.1 kW/cm2). The parameters used are from Yb:GdCOB.

Equations (II.1.13) and (II.1.14) can be solved numerically and injected in (II.1.9) to obtain the temperature distribution. Figure 7 shows the evolution of pump power (fig. 7a) and temperature (fig. 7b) at the center of the rod versus crystal thickness, for various values of the pump saturation intensity. Here we assumed that the pump beam waist was located at z0 = L/3, which is experimentally well verified as long as the laser output is optimized (see figure 6 and the text above). In the absence of saturation (Ipsat infinite), both the pump power and the temperature experience an exponential decay as expected; but for lower values of the pump saturation intensity, the temperature reaches a local minimum at the pump beam waist. Figure 8 shows a 3D view of the temperature distribution without saturation (fig. 8a) and in the presence of strong saturation (fig. 8b) corresponding to Ipsat = 4.1 kW/cm2, which is the value for Yb:GdCOB. It appears in the latter case that the region where the pump density is the strongest (near the pump beam waist) is not the region where the temperature is the highest (near the faces of the crystal). Pump beam divergence appears to be an important parameter: in this example, it makes the temperature higher at the exit face than at the entrance face of the crystal.

In the presence of laser extraction, the pump intensity evolution through the crystal is given by:

\[ \frac{dI_p}{dz} = -\alpha_{NS} I_p \, \frac{1 + \dfrac{I}{I_{lsat}} - \dfrac{\alpha_{NS}^{l}}{\alpha_{NS}} \dfrac{\lambda_l}{\lambda_p} \dfrac{I}{I_{psat}}}{1 + \dfrac{I_p}{I_{psat}} + \dfrac{I}{I_{lsat}}} \tag{II.1.15} \]

where I is the intracavity laser intensity, λl the laser wavelength, N the doping ion density, and

\[ \alpha_{NS}^{l} = \sigma_{abs}(\lambda_l) N \tag{II.1.16} \]

\[ I_{lsat} = \frac{h c}{\lambda_l \left[ \sigma_{abs}(\lambda_l) + \sigma_{em}(\lambda_l) \right] \tau} \tag{II.1.17} \]

are the non-saturated absorption coefficient at the laser wavelength, and the laser saturation intensity, respectively.
When the intracavity laser intensity I largely exceeds Ilsat, and if reabsorption at the laser wavelength is small, one can show that (II.1.15) simply becomes:

\[ \frac{dI_p}{dz} = -\alpha_{NS} I_p \tag{II.1.18} \]

which means that the ground manifold is repopulated so that absorption is no longer saturated. In real cases, as a matter of fact, the absorbed pump power under lasing conditions is intermediate between the non-saturated regime and the saturated (nonlasing) regime: as a first approximation, saturation effects can be ignored only if the laser extraction is efficient.

II.1.4. Determining the absolute temperature: the boundary conditions

In this subsection we deal with the boundary problem. So far we have established expressions for the temperature gradient, but these say nothing about the absolute temperature inside the crystal. Let us assume that the four edge faces of the crystal are in “contact” with a radiator, which will in most cases be a piece of cooled copper. The first boundary condition expresses the continuity of the thermal flux across these contacts:

\[ K_{crystal} \left. \frac{\partial T}{\partial n} \right|_{\text{inside the crystal}} = K_{copper} \left. \frac{\partial T}{\partial n} \right|_{\text{inside copper}} \tag{II.1.19} \]

where K is the thermal conductivity, n is the surface normal vector, and ∂/∂n the normal derivative. Common metals (copper or indium, the latter being used as an intermediate contacting material) have thermal conductivities that are one to two orders of magnitude higher than the usual conductivities of laser crystals: about 400 W.m-1.K-1 for copper and 82 W.m-1.K-1 for indium. This means that the temperature gradients inside these metals will always be negligible, so that we consider in the following that the temperature inside the radiator is uniform and is noted Tc.

Let us now turn to the second boundary condition. In many papers and FEA codes, the temperature at the edge of the crystal is set equal to Tc:

\[ T(r_0) = T_c \tag{II.1.20} \]

This is actually true only for an ideal contact [58].
But even for flat and polished surfaces pressed against one another, this relation is far from reality [59]. The most realistic condition is, surprisingly, a Newton-type law of cooling, even though we are dealing with conduction here:

\[ j_q = -K_c \left. \frac{\partial T}{\partial n} \right|_{r = r_0} = H \left( T(r_0) - T_c \right) \tag{II.1.21} \]

where jq is the heat flux density and H is the heat transfer coefficient or surface conductance (W.cm-2.K-1). H is of course infinite for an ideal thermal contact. Carslaw et al. [58] have shown that the temperature gap between the edge of the rod and the mount originates from the presence of a thin oxide (or air or grease) layer, which acts as a very large thermal resistance. The heat transfer coefficient is usually difficult to measure, and values are not easily found in the literature: we present in the next section a simple and accurate method to perform this measurement.

What about the end faces, which are in contact with air most of the time? The heat can flow out of the crystal through the two end faces by both convection in free air and thermal radiation. Cousins [60] calculated the equivalent H coefficient for the two processes and showed that both coefficients are of the order of 10-3 W.cm-2.K-1. Since the measured H coefficients for conduction are typically in the range 1-10 W.cm-2.K-1, this justifies the assumption of a purely radial heat flux made in the previous subsection.

Using (II.1.9) and (II.1.21), one can calculate the temperature gap between the radiator and the edge of the crystal:

\[ T(r_0,z) - T_c = -\frac{\eta_h}{2 \pi r_0 H} \frac{dP_p(z)}{dz} \tag{II.1.22} \]

The parameter of interest here is the normal derivative of the temperature at the interface. This explains why the quality of the thermal contact has a tremendous impact on side- or edge-pumped slabs or rods, since in these configurations the temperature distribution is described by a formula of the type II.1.4, that is, a parabolic dependence.
In contrast, in end-pumped configurations, where the temperature profile is described by equation II.1.6, the thermal gradient at the periphery is smaller and the requirement of a good thermal contact can be relaxed. It is also interesting to know the maximum temperature reached inside the crystal. In order to obtain an easy-to-handle scaling formula, we make the strong assumption that absorption saturation is absent, and we ignore the divergence of the pump beam inside the crystal. We have:

\[ T_{max} = T_c + \eta_h P_{inc} \alpha_{NS} \left[ \frac{1}{2 \pi r_0 H} + \frac{1}{4 \pi K_c} \left( \ln\left( \frac{r_0^2}{w_p^2} \right) + 1 \right) \right] \tag{II.1.23} \]

To conclude this section, we list some consequences of these last two equations, as a set of recipes to reduce the temperature Tmax:
• Increase wp: obvious and efficient, but at the expense of laser efficiency.
• Increase H. As shown in the next experimental section, increasing H does not affect the temperature gradient, and will not help to reduce the thermal lens magnitude. All we can get is a uniform decrease of temperature. However, reducing the absolute temperature is actually more interesting in an Yb-doped crystal than in an Nd-doped material for example, by virtue of reabsorption losses that are highly temperature-dependent. A better contact can also help to reduce fracture risks, but this is not directly linked to a decrease of the temperature either: it is because a good contact can induce radial components in the stress tensor at the periphery, or because it decreases the density of high-spatial-frequency alterations of the surface, which are the ultimate causes of crack-induced propagating fractures [61-63].
• Decrease Tc: if the radiator temperature is decreased but still remains around room temperature (that is, with standard thermoelectric or water-flow cooling), the effect is the same as increasing H: we only act on the temperature pedestal, not on the gradient.
However, if the mount is cooled far below room temperature (at cryogenic temperatures for instance), the thermal conductivity of the crystals significantly increases, which is highly beneficial for the thermal gradient. This approach has been successfully applied to reduce the thermal lens in high-energy femtosecond laser chains [64] or in Nd:YAG rods [65].
• Decrease the crystal size? The crystal size has no influence on the temperature gradient. To understand its influence on the maximum temperature, Tmax is plotted versus r0 for different values of H in figure 9. We observe that when the radius of the crystal exceeds roughly 10 times the pumped area radius wp, the temperature becomes nearly independent of r0. In practice, it is possible to reduce the absolute temperature using small crystals, provided they are really small (see for example [66]). It is practically very difficult to cut and polish crystals whose size is smaller than 2 mm: one can then conclude that the transverse section of a crystal is not a parameter that can be used efficiently. Besides, the effect of a bad thermal contact is visible only for crystals whose size is on the order of the pump spot size. As illustrated by figure 9, a small crystal with bad cooling (for example r0 = 0.5 mm and H = 0.1 W.cm-2.K-1) is far worse than a “reasonably” sized crystal with correct cooling (r0 = 2 mm; H = 1 W.cm-2.K-1), since the temperature difference between the two configurations reaches 200 °C.
• Add an axial component to the heat flux: it does not appear in equation II.1.23 because that equation has been derived with the assumption of a purely radial heat flux. However, one can add a large axial heat flux either by putting a “transparent radiator” in front of the input face (this is the principle of composite bondings [67]) or by using thin disks (i.e. L ≪ r0) that are very efficiently cooled through the face in contact with the radiator [68].
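The recipes above can be explored numerically with a short sketch. It assumes the scaling law obtained by combining the unsaturated top-hat solution at r = 0, z = 0 with the Newton-type boundary condition, namely Tmax = Tc + ηh Pinc αNS [1/(2π r0 H) + (ln(r0²/wp²) + 1)/(4π Kc)]. The function name is hypothetical; the default parameter values are those quoted for figure 9, converted to SI units.

```python
import math


def max_temperature(r0, h, t_c=15.0, eta_h=0.065, p_inc=15.0,
                    alpha_ns=740.0, k_c=2.1, w_p=100e-6):
    """Estimated temperature (deg C) at the centre of the input face.

    r0 : crystal radius (m)
    h  : heat transfer coefficient (W/m^2/K); 1 W/cm^2/K = 1e4 W/m^2/K
    Absorption saturation and pump divergence are neglected.
    """
    heat_per_length = eta_h * p_inc * alpha_ns  # W/m deposited at z = 0
    contact_gap = heat_per_length / (2.0 * math.pi * r0 * h)
    bulk_rise = heat_per_length / (4.0 * math.pi * k_c) * (
        math.log(r0 ** 2 / w_p ** 2) + 1.0
    )
    return t_c + contact_gap + bulk_rise


# Two configurations discussed in the text.
SMALL_BAD_CONTACT = max_temperature(r0=0.5e-3, h=0.1e4)
LARGE_GOOD_CONTACT = max_temperature(r0=2e-3, h=1.0e4)
PERFECT_CONTACT = max_temperature(r0=2e-3, h=math.inf)
```

Under these assumptions the contact term scales as 1/(r0 H) while the bulk term grows only logarithmically with r0, which is why a poor contact penalizes small crystals much more than large ones.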
[Figure 9: horizontal axis is the crystal radius (mm), from 0.5 to 5 mm; curves for H = ∞ (perfect contact), H = 1 W/cm2/K and H = 0.1 W/cm2/K.]

Figure 9: Maximum temperature at the center of the input face of the crystal T(r=0, z=0) versus the crystal radius r0. The absorption saturation is neglected, as well as the divergence of the pump beam inside the crystal. The parameters are: Tc = 15°C, ηh = 6.5 %, αNS = 7.4 cm-1, Kc = 2.1 W/m/K (values for GdCOB), Pinc = 15 W, wp = 100 µm.

II.2. Experimental absolute temperature mapping and heat transfer measurements using an infrared camera

II.2.1. Introduction

As described in the previous section, the temperatures obtained by solving the heat equation are only relative temperature distributions, expressed with respect to the rod surface temperature. The latter depends on the boundary conditions and is therefore very difficult to predict. Direct temperature mapping is consequently a helpful measurement for understanding pump-induced thermal effects. Moreover, we have shown that one of the crucial parameters for uniformly decreasing the temperature inside the crystal (which can be useful to reduce fracture risks, see above) is the thermal contact between the crystal and its surrounding mount. Consequently, quantitative, experimentally measured information such as the heat transfer coefficient H is of practical importance for high power laser development. We report here on a very simple experimental setup, based on an infrared camera, that can perform a spatially resolved analysis of the absolute temperature on the entrance face of the crystal, where the temperature generally reaches a very high value (in any case higher than at the beam waist, as explained in subsection II.1). We can also experimentally measure the heat transfer coefficient H between the crystal and its surroundings for different types of commonly used thermal contacts.
We first describe the experimental setup that allows such measurements, and illustrate it with the well-known Yb:YAG crystal [8].

II.2.2. Experimental setup for direct temperature mapping

The experimental setup is presented in figure 10. A fiber-coupled laser diode was focussed inside an Yb:YAG laser crystal; the infrared emission of the entrance face of the crystal was observed with an infrared camera. A zinc selenide (ZnSe) plate was used as a dichroic mirror: it was High Reflectivity (HR) coated for 960-1080 nm on one face (at 45° angle of incidence) to direct the pump beam into the crystal, and also coated for High Transmission (HT) in the 8-12 µm spectral range on both faces to let the thermal radiation reach the camera.

Figure 10: Experimental setup for absolute temperature measurements.

A germanium objective (focal length 50 mm, N.A. 0.7, aberration-corrected for infinite conjugation) was placed close to the ZnSe plate to create the intermediate thermal image with high spatial resolution. The camera was an AGEMA 570 (Flir Systems Inc.) consisting of 240x320 microbolometers working at room temperature. The measured noise equivalent temperature difference (NETD) of the camera is 0.2 °C. The numerical aperture of the whole imaging system in the object plane being around 1, a theoretical spatial resolution of about 10 µm could be achieved; however, the resolution is here limited to 60 µm by the size of the pixels of our camera. The crystal used here was a 2-mm long, 4x4 mm2 square cross section, 8-at. % doped Yb:YAG crystal. It was AR-coated on its faces (the lateral ones are polished). Its thermal conductivity, which is lower than that of an undoped YAG crystal, was measured to be 7 W.m-1.K-1 (11 W.m-1.K-1 for the undoped crystal). The pump source was a high power fiber-coupled diode array (HLU15F200-980 from LIMO GmbH) emitting 13.5 W at 968 nm. The fiber had a core diameter of 200 µm and a numerical aperture of 0.22.
The output face was imaged onto the crystal to a 270-µm-diameter spot via two doublets. The crystal absorbed 5.4 W of pump power in this case. The crystal was clamped in a copper block by its four side faces. In addition, on the top surface of the crystal, a frictionless copper finger allowed us to apply a well-controlled pressure on the crystal by means of a set of known weights placed on the finger. The heat is finally evacuated from the copper block by a flow of circulating water.

The key issue in infrared absolute temperature measurements is the correct calibration of the system. Indeed, neither the crystal nor the copper mount has an infrared luminance equal to that of a blackbody at the same temperature. The signal V detected by one pixel for a portion of crystal (or copper mount) at temperature T is:

\[ V(T) = G \int S_r(\lambda) \, Tr_{opt} \left[ \varepsilon(T) \frac{dL_T^{BB}}{d\lambda} + L_r + L_t \right] d\lambda \tag{II.2.1} \]

where G is the geometric extent; Sr(λ) is the spectral sensitivity; Tropt is the overall transmission coefficient of the ZnSe plate, germanium objective and camera optics; dLTBB/dλ is the spectral luminance of a blackbody at temperature T; ε(T) is the emissivity; Lr denotes the infrared luminance of the camera itself (and its close surroundings) which is reflected back into it by the germanium objective and by the polished surface of the crystal; Lt is the luminance transmitted through the crystal: it is zero in the 8-12 µm range since the crystal is highly opaque in this spectral region. Lr is nonzero and makes polished objects look brighter than blackbodies: ignoring Lr leads to an overestimation of the temperature around room temperature. Conversely, the emissivity is less than one and makes objects radiate less than a blackbody. Since the parameters ε and Lr depend dramatically on the surface quality and flatness, all the visible parts of the heat sink were covered with matt black paint. Moreover, the evaluation of all those parameters is not straightforward.
We calibrated the whole system as follows: the crystal and the copper mount were heated together to a set of known temperatures using a thermoelectric (Peltier) element, and the temperature reported by the camera was compared with the known temperature to derive the appropriate correction. This careful calibration allows a rigorous and absolute measurement of the temperature, with a spatial resolution fine enough to study the thermal behaviour on the crystal’s input face with sufficient accuracy.

II.2.3. Results and measurement of heat transfer coefficients

Figure 11: The temperature map obtained when the crystal is clamped by its four edge faces by bare contact with copper without thermal joint (a), with heat sink grease (b), with a thin graphite layer (c), and with pressured and non-pressured indium (d).

Figure 11 shows the temperature map obtained when the crystal is clamped by its four edge faces by bare contact with copper without thermal joint (a), with a thin graphite layer (b), with indium (c) and with heat sink grease (d). Figure 12 is an enlargement of the two extreme cases, namely the bare contact and the heat sink grease contact, with a transverse profile (y = 0) that shows the temperature evolution along the crystal lateral dimension.

Figure 12: Temperature mapping of the crystal (front view) and lateral profile at y=0 for two different types of thermal contact (direct copper-crystal contact on the left, with grease on the right).

In the “bare contact” case, a clear gap is noticeable between the temperature of the mount and the temperature at the edge of the crystal. The temperature distribution is parabolic inside the pumped region and then experiences a logarithmic decay out to the edge of the crystal, in good agreement with the theory described in the previous section for fibre-coupled diode pumping (see equation II.1.9 and figure 5).
As already mentioned, the quality of the heat transfer at the interface between the crystal and its mount has an influence on the value of the temperature but not on the thermal gradient. We consequently studied the thermal contact in more detail. The heat transfer coefficient H is defined by equation (II.1.22), where the thermal gradient is considered normal to the surface. Our system provides a space-resolved temperature map of the crystal, with a spatial resolution far below the crystal size: it therefore allows the measurement of H. By performing a linear fit of the temperature versus position on the points that are closest to the crystal edge, the heat flux can be determined; by applying equation (II.1.22), one can then infer the value of H. We found for instance a value of 0.25 W.cm-2.K-1 in the case of bare contact. We estimate that the uncertainty on H is about 15%. The order of magnitude obtained is consistent with the values quoted by Carslaw [58] and Koechner [69]. The hot spot that can be noticed in figure 12 reveals the poor contact between the polished face of the crystal and the copper surface. The heat transfer is primarily a question of how much of the two surfaces is actually in contact; we checked experimentally that the temperature inside the crystal did not depend on the applied pressure: we did not observe any noticeable variation of the temperature when changing the applied pressure in the absence of a thermal joint between the crystal and the copper mount. We summarize in table 3 the results obtained for the different thermal joints used in our set of experiments, namely graphite layer, indium foil and heat sink grease (CT40-5 from Circuitworks®).
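The edge-fit procedure described above can be sketched as follows. This is a hedged illustration, not the processing code used for the measurements: it assumes that the heat flux density at the edge is q = -Kc dT/dx, with dT/dx obtained from an ordinary least-squares line through the pixels closest to the edge, and that H = q / (Te - Tm), where Te is the fitted temperature at the edge and Tm the mount temperature. All function and variable names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def heat_transfer_coefficient(xs, temps, x_edge, t_mount, k_c):
    """Estimate H (W/m^2/K) from a temperature profile near the crystal edge.

    xs      : pixel positions (m), increasing towards the edge
    temps   : measured temperatures at those positions (deg C)
    x_edge  : position of the crystal edge (m)
    t_mount : temperature of the mount close to the crystal (deg C)
    k_c     : thermal conductivity of the crystal (W/m/K)
    """
    slope, intercept = linear_fit(xs, temps)
    t_edge = slope * x_edge + intercept   # extrapolated edge temperature
    flux = -k_c * slope                   # W/m^2, positive towards the mount
    return flux / (t_edge - t_mount)
```

With 60 µm pixels only a handful of points lie in the region where the profile is close to linear, so the choice of the fitting window is likely to be a significant contribution to the uncertainty on H.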
A graphite layer (around 0.5 mm thick) does not significantly modify either the maximum temperature or the heat transfer coefficient, but the contact was much more uniform than in the case of bare contact: in particular, no hot spot appeared any more and the contact was largely independent of the applied pressure. This is not the case with indium foil. For this experiment the crystal was wrapped in a 1-mm thick indium foil. Since indium is a soft material, the quality of the contact depends greatly on the applied pressure. The temperature at the center of the pumped region decreases by 7°C when the pressure is increased from 1.5 kg/cm2 to 22 kg/cm2, as shown in figure 13 (note that in this case the H coefficient is measured across the surface where the pressure is applied and is therefore an “effective” heat transfer coefficient that takes into account the transfer from crystal to indium and then from indium to copper).

Figure 13: Evolution of the heat transfer coefficient H (squares) and maximum temperature (triangles) versus applied pressure for indium-wrapped crystals.

The most dramatic change in heat transfer coefficient is obtained with heat sink grease (see table 3). The temperature gap drops down to about 1°C and H reaches 2 W.cm-2.K-1. The thermal contact is here independent of the applied pressure. This better heat transfer coefficient is achieved even though the thermal grease has a much lower thermal conductivity than indium (0.62 W/m.K for CircuitWorks® CT40-5 thermal grease vs. 82 W/m.K for pure indium). This illustrates, in conjunction with the data on the variation of H with the applied pressure, the idea that achieving a good H is first and foremost a question of decreasing the thermal resistance at the interface (eliminating air gaps, maximizing the surface contact…).

Table 3: Table of measured H coefficients for different contacts.
Tmax is the temperature at the center of the pumped region; Te is the temperature at the edge of the crystal (averaged over the 4 sides if not symmetrical), and Tm is the copper mount temperature near the crystal.

Contact                                       H (W.cm-2.K-1)   Tmax (°C)   Te (°C)   Te-Tm (°C)
Bare                                          0.25             49.8        33.5      10.7
Graphite layer                                0.28             46.5        30.5      8.7
Indium foil (applied pressure: 22 kg/cm2)     0.9              40.0        25.1      4.9
Heat sink grease                              2.0              37.0        21.6      1.5

III. Thermal lensing effects: theory

III.1. Introduction

The previous chapter was dedicated to the calculation and measurement of the temperature distribution, which is the first essential step in the study of thermal effects. The appearance of thermal gradients puts the crystal under stress. The presence of inhomogeneous temperature, stress, and strain distributions is responsible for many effects that are deleterious to laser action: the most radical effect is fracture, observed when the hoop (tangential) stress at the periphery of the crystal exceeds the so-called tensile strength. More subtle effects arise from the stress-induced modification of the optical indices of refraction: alteration of the stability domains of the cavity, depolarization, losses and degradation in beam quality, all four phenomena being largely intermixed. In this paper we designate by “thermal lensing” effects all the phenomena resulting in a phase change of a beam passing through a pumped crystal; in other words we do not restrict this expression to an ideal spherical thin thermal lens, but also include its aberrations and its polarization-dependent aberrations. This chapter presents a general and synthetic overview of these effects, and points out how they are related to each other. We base our discussion on simple analytical scaling relationships, and we discuss the validity of these formulas.
In this review, we come back to well-established theories that have been presented many times in the past [60, 69, 70], but we also bring what are, to our knowledge, some new insights on points of practical interest. In particular, we will point out several inaccuracies generally reported about the values of the photoelastic constants in YAG, which result from an incorrect use of Hooke's law; we also present what is (still to our knowledge) the first derivation of the photoelastic constants that have to be used for end-pumped crystals, in other words when the calculation is made using the plane stress rather than the plane strain approximation.

We will finally point out that the use of the dn/dT coefficient (temperature derivative of the refractive index) is very confusing. On the one hand, the classical formula (which reveals three contributions: the “dn/dT” part, the bulging of the end faces, and the photoelastic effect), which has been used for decades, is correct provided that the dn/dT appearing in this expression is understood as a partial derivative taken at constant strain. On the other hand, the experimentalist measures a quantity which is closer to a partial derivative at constant stress, and the two partial derivatives are obviously not equal. The measured dn/dT is therefore not the correct parameter for estimating the thermal lens focal length. This subtlety means in particular that one cannot, in general, use a value of dn/dT readily found in handbooks to estimate the magnitude of the thermal lens of an operating laser, because the experimental conditions in the two cases are mutually inconsistent. We will see however that when dn/dT is large and positive, the difference can be ignored. We will conclude this review with a synthetic diagram showing all the thermal effects and how they are connected.
Given that thermo-optical properties pertain more to the crystal host than to the doping ion, this section is more general than the others and is not restricted to ytterbium-doped materials.

III.2. Stress and strain calculations

Once the temperature field has been computed, the next step is to calculate the stress and strain distributions inside the crystal. They are obtained from Hooke's law, called “generalized” because it includes the thermal expansion term [49]:

$$\varepsilon_{ij} = S_{ijkl}\,\sigma_{kl} + \alpha_{T,ij}\,\Delta T$$ (III.1)

where i, j, k, l = 1, 2, 3 and the Einstein summation convention is used. ∆T is the temperature shift with respect to equilibrium (no strain), (Sijkl) is the compliance tensor, (σkl) is the stress tensor, (εij) is the strain tensor, and (αT,ij) is the thermal expansion tensor. Analytical formulations of the thermal stress and strain distributions in end-pumped lasers require a large number of approximations, thoroughly discussed in Cousins's reference paper published in 1992 [60]. In order to obtain an analytical solution to the stress problem, an additional approximation is required, which consists in treating the problem in two dimensions. This is either the plane strain approximation (valid for long and thin rods) or the plane stress approximation (valid for thin disks). Interestingly, Cousins [60] pointed out that the plane stress approximation remains valid (within approximately 10%) for aspect ratios up to L/2r0 = 1.5, provided that the stresses are considered as mean values integrated along the whole thickness of the rod. In the previous section, when we derived the temperature distribution, we considered the crystal as a stack of thin slices, so that the temperature could be calculated in a single thin slice as if the surrounding material did not exist.
It is not possible to use this approach for the stress distribution, because a given slice is under the mechanical influence of the slices located on both sides, and cannot be considered independent. This is why any attempt to take into account absorption saturation effects and pump divergence (as far as thermal stresses are concerned) inevitably requires a finite element analysis. As far as diode end-pumping is concerned, the plane stress approximation is thus the most meaningful approximation that can be made. However, the exact calculation remains possible for a given crystal (using FEA codes) provided that all the compliances and thermal expansion coefficients are known. To the best of our knowledge, these coefficients have so far been measured for a very restricted number of laser crystals: the data are readily found for YAG, sapphire, YLF and Y2O3 [71]; other data are available in the Handbook of Optics [71], but for crystals which are not commonly used in laser applications. Data for other materials are almost nonexistent. The analytical solution of the generalized Hooke law (eq. III.1) under the plane stress approximation can be found, for example, in [60]. In that paper, the discussion was restricted to isotropic materials (from a mechanical rather than an optical point of view, that is, when the compliances can be reduced to only two parameters, the Young modulus and the Poisson ratio), absorption saturation was not considered, and the divergence of the pump beam was not taken into account either.

Once the stress distribution has been calculated, it is possible to study thermal fracture issues. It is generally accepted that fracture occurs when the maximum hoop (tangential) stress σmax at the periphery of the crystal exceeds the tensile strength σTS. The latter depends both on the fracture toughness of the material and on its surface flatness. These aspects have been studied in detail by Marion [61-63].
Data about the fracture toughness of materials can be readily found in the literature for YAG, fluoroapatites, sapphire, yttrium orthosilicate (YSO) and some phosphate glasses [39, 63, 71, 72]. For a qualitative discussion of fracture issues in Yb-doped materials, the reader is invited to refer to a previous publication [73].

III.3. How can we take into account the photoelastic effect?

We now consider how the temperature, stress and strain fields inside the crystal alter the phase of the cavity beam, all these effects being referred to as “thermal lensing” phenomena. The appearance of stresses in the crystal causes the linear optical indicatrix (related to the linear indices of refraction) to change its shape, its size and its orientation. This photoelastic effect is accounted for by the 4th-rank elasto-optical tensor (pijkl):

$$\Delta B_{ij} = p_{ijkl}\,\varepsilon_{kl}$$ (III.2)

where (Bij) is the 2nd-rank dielectric impermeability tensor. This expression is obviously valid in the linear optical regime only, and when the piezoelectric effect is neglected [49].

The complete computation of thermal effects in a given material requires that we know everything about the tensors (Sijkl), (pijkl) and (αT,ij). The minimum number of independent terms of each tensor depends on the crystal symmetry, as discussed by Nye [49]. For instance, let us consider crystals like KGW or KYW [74], GdCOB [75], YCOB [76] or YSO [43], which are of particular interest for ytterbium doping. These crystals belong to the monoclinic crystal system: this means that the compliances can be “reduced” to 13 independent parameters, and the elasto-optical tensor, once the redundant coefficients have been identified, has 20 independent coefficients [49]. Adding the 3 thermal expansion coefficients, this means that we need to know no fewer than 13 + 20 + 3 = 36 coefficients before being able to draw the new index ellipsoid at a given point of the crystal.
Obviously these parameters are not known (for any monoclinic crystal, in fact, to the best of our knowledge), which means that a rigorous calculation, even with an FEA code, is simply not possible. This simple remark highlights the importance of experimental measurements of thermal effects in such crystals, and shows the interest as well as the inherent limitations of a simple analytical model.

III.4. Simplified account of photoelastic effect in isotropic crystals

Now that we are aware of these difficulties, we focus our discussion on a simpler case study, actually the only case where analytical expressions are obtainable:

- We consider isotropic crystals only, and more particularly the widespread YAG crystal, for which all the parameters mentioned above are well known. Generally, cubic crystals belonging to the point groups $\bar{4}3m$, $432$ and $m\bar{3}m$ (like YAG) require 3 independent elastic coefficients; however, the remarkably isotropic mechanical properties of YAG allow us to use only two mechanical coefficients, the Young modulus and the Poisson ratio. In the end, only 6 coefficients are needed for YAG.
- The plane stress approximation is used.
- The pump profile is still axisymmetric.
- We consider what occurs inside the pump volume, that is for r < wp.
- The pump divergence inside the crystal is neglected.
- Temperature, stress, strain and index are considered integrated along the whole thickness of the rod. For a physical quantity A(r,z), we denote by

$$\langle A(r)\rangle = \int_0^L A(r,z)\,dz$$

the integrated value of A(r,z) along the rod.

According to the Neumann-Curie theorem [49], under these assumptions the principal axes of all the tensors involved (stress, strain, index ellipsoid) are radial and tangential. The notations used in the following are depicted in figure 14.

Figure 14: Orientation of the index ellipsoid (principal indices nr and nθ) in an isotropic crystal under thermal stress.
The shifts of the principal indices nr and nθ are related to the diagonal coefficients of the optical indicatrix by:

$$\Delta n_{r,\theta} = -\frac{n_0^3}{2}\,\Delta B_{r,\theta}$$ (III.3)

We can also write the index variation as a function of the strains as follows:

$$\Delta n_{r,\theta} = -\frac{n_0^3}{2}\,\Delta B_{r,\theta} = \sum_{j=r,\theta,z} \frac{\partial n_{r,\theta}}{\partial \varepsilon_j}\,\varepsilon_j$$ (III.4)

The six coefficients ∂ni/∂εj (i = r, θ; j = r, θ, z) can be calculated from the pijkl coefficients by an appropriate change of coordinates. The complete solutions for the strains εr, εθ and εz can be found, for YAG, in many papers and textbooks [60, 69] and are also given in the appendix. Inside the pump volume, it can be readily shown that stresses and strains have a parabolic dependence on r, like the temperature distribution. Since the shifts of the refractive indices are linear combinations of the strains, the radial and tangential index distributions must also be parabolic. The (integrated) shift of refractive index may be written as:

$$\langle \Delta n_{r,\theta}(r)\rangle = \sum_{j=r,\theta,z} \int_0^L \frac{\partial n_{r,\theta}}{\partial \varepsilon_j}\,\varepsilon_j(r,z)\,dz = \langle \Delta n_{r,\theta}(0)\rangle - \frac{\eta_h P_{abs}\, n_0^3\, \alpha_T\, C_{r,\theta}}{2\pi K_c w_p^2}\; r^2$$ (III.5)

where Cr and Cθ (or C’r and C’θ) are constants which will be called, following the pioneering work of Koechner, the “photoelastic constants”. Their calculation and expressions are given in the appendix. We would like to make two important clarifications about these coefficients (which also justify the presence of this appendix in this review):

♦ W. Koechner published incorrect values of these coefficients in his reference book [69] because the temperature term in Hooke's law was omitted. This omission was first highlighted by Cousins [60], but the expressions of the photoelastic constants remained uncorrected in the following editions of the book, and are still used in this form in many papers.

♦ Secondly, the derivation of the photoelastic constants requires turning the 3D problem into a 2D problem, as discussed in the previous section. Only the plane strain case was considered by Koechner. However, we saw that the plane stress case is closer to reality in end-pumped rods.
Here we denote by Cr and Cθ the photoelastic constants valid for long and thin rods (the “Koechner case”, that is when the plane strain approximation is valid), and by C’r and C’θ the photoelastic constants derived within the framework of the plane stress approximation. Since we are only interested in end-pumping, we only consider the C’r,θ constants in the following. The above relations are derived under the assumption that the pump beam radius is constant through the crystal thickness; however, we do not have to assume a particular absorption regime, so that they remain valid in the presence of absorption saturation, under lasing as well as nonlasing conditions. As we will see in the next section, the most interesting quantity for the laser scientist is the index shift between the center and the edge of the pumped zone, since it yields the photoelastic contribution to the global thermal lens. From (II.1.10), and using the bracket notation for z-integrated values, we can write (III.5) in the form:

$$\langle \Delta n_{r,\theta}(0)\rangle - \langle \Delta n_{r,\theta}(r)\rangle = 2\, n_0^3\, \alpha_T\, C'_{r,\theta}\,\bigl(\langle T(0)\rangle - \langle T(r)\rangle\bigr)$$ (III.6)

III.5. A consequence of strain-induced birefringence: depolarization losses.

The birefringence of a crystal subjected to thermal stress has two main consequences for a light beam passing through it: both its state of polarization and its phase are altered. Before examining in detail the effect on phase (i.e. thermal lensing effects), let us first examine the influence on polarization. We keep the restrictions set out in the previous subsection, noting that what follows extends readily to uniaxial crystals provided that the optical axis is parallel to the propagation axis. In these cases, if no polarizing element is added to the cavity, the laser output is not polarized, and stress-induced birefringence has no net effect.

Figure 15: Depolarization of a polarized beam (polarized along x at the input, depolarized at the output) passing through an isotropic crystal under thermal stress.
For many applications, however, a polarized output is desirable: the situation is depicted schematically in figure 15. An incident beam (polarized along the x direction) will have its polarization modified differently for every single ray: for a ray crossing the (Ox) or (Oy) axis, for instance, the polarization is not modified; for all the other rays the polarization becomes elliptical, with principal axes that are radial and tangential.

At every round trip in the laser cavity, the beam meets a polarizing element (such as a plate at Brewster angle) and this depolarizing element, leading to the so-called “depolarization losses”. Another effect, a consequence of the latter, is a modification of the beam spatial profile: since the beam is not altered along the (Ox) and (Oy) directions and suffers losses elsewhere, it tends to take the shape of a cross; this aspect is discussed in particular by Koechner [69].

In biaxial crystals (or uniaxial crystals with the optical axis normal to the direction of propagation), the output is naturally polarized along the crystallophysical (not to be confused with crystallographic) axis along which the emission cross section is the highest. In this case, stress-induced birefringence does not generally twist the index ellipsoid enough to significantly modify the polarization state.

Finally, clever solutions have been devised to compensate for depolarization losses in isotropic crystals: for example the use of two rods with a Faraday rotator in between [78, 79], or even a simple quarter waveplate [70]. This last technique is however limited to a few configurations [80]. In the following, we present results obtained with Yb-doped crystals which are either isotropic (and naturally not polarized) or naturally birefringent, so that this problem was not encountered.

III.6. Thermally-induced optical phase shift.

III.6.1.
Expression of the optical path

We now present this derivation in some detail, because it reveals a trivial yet fundamental difference between these results and what is currently reported in the literature. The derivation largely reproduces Cousins's work [60].

Figure 16: Notations used for the calculation of the optical path.

Consider a crystal of length L and index n0 at temperature Tc (temperature of the heat sink) and in the absence of strain. We consider the optical path of a straight ray (running parallel to the crystal axis z) between a plane z = 0 (taken at the entrance face of the crystal) and a plane z = L+d (see figure 16). In the absence of temperature and stress fields (pump off), the optical path is:

$$\delta_{off} = n_0(T_c)\times L + d$$ (III.7)

With the pump on, the optical path depends on the lateral shift r of the ray with respect to the crystal axis, but also on the direction of polarization. As a consequence there will be two distinct thermal lens focal lengths, one for a (virtual) radially polarized beam and another for a tangentially polarized beam, a phenomenon commonly called “bifocussing”. We may write the optical path with the pump beam “on” as (see figure 16):

$$\delta_{on}^{r,\theta}(r) = \int_0^{L+\Delta L(r)} n_{r,\theta}(T,\varepsilon_j)\,dz + d - \Delta L(r)$$ (III.8)

where ∆L(r) is the change in crystal length due to the inner compression, responsible for the bulging of the end faces.

Assuming small variations in the refractive index, one may expand nr,θ(T, εj) in a Taylor series and discard the second- and higher-order terms:

$$\delta_{on}^{r,\theta}(r) = \int_0^{L+\Delta L(r)} \left[ n_0(T_c) + \left(\frac{\partial n_{r,\theta}}{\partial T}\right)_{\varepsilon}\bigl(T(r,z)-T_c\bigr) + \sum_{j=r,\theta,z}\frac{\partial n_{r,\theta}}{\partial \varepsilon_j}\,\varepsilon_j(r,z) \right] dz + d - \Delta L(r)$$ (III.9)

Note that the temperature derivative of the refractive index appearing in this equation is a partial derivative calculated at constant strain. As we will discuss in the next subsection, this is not the usual dn/dT parameter.
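The rod length change ∆L(r) can be written as a function of the axial strain εz:

$$\Delta L(r) = \int_0^L \varepsilon_z(r,z)\,dz = \langle \varepsilon_z(r)\rangle$$ (III.10)

From the strain-stress relationships given in the appendix, it can be easily shown that the axial strain, under the plane stress approximation, satisfies:

$$\langle \varepsilon_z(r)\rangle - \langle \varepsilon_z(0)\rangle = (1+\nu)\,\alpha_T\,\bigl(\langle T(r)\rangle - \langle T(0)\rangle\bigr)$$ (III.11)

Given that the first-order terms in the integral appearing in (III.9) are much smaller than n0(Tc), and given that ∆L « L, we can write for the first-order terms $\int_0^{L+\Delta L}(\ldots)\,dz \approx \int_0^{L}(\ldots)\,dz$. The relative optical path is then:

$$\delta_{rel}^{r,\theta}(r) = \delta_{on}^{r,\theta}(r) - \delta_{off} = \left(\frac{\partial n_{r,\theta}}{\partial T}\right)_{\varepsilon}\langle T(r)-T_c\rangle + \bigl(n_0-1\bigr)\langle \varepsilon_z(r)\rangle + \langle \Delta n_{r,\theta}(r)\rangle$$ (III.12)

III.6.2. The thermal lens focal length

The thermal lens is related to the optical path difference (OPD, or ∆ in the following) between an on-axis central ray (r = 0) and an outer parallel ray passing inside the pumped region, at a radius r < wp.

We define:

$$\Delta_{r,\theta}(r) = \delta_{rel}^{r,\theta}(0) - \delta_{rel}^{r,\theta}(r)$$ (III.13)

From (III.6), (III.11) and (III.12), the optical path difference is:

$$\Delta_{r,\theta}(r) = \left[\left(\frac{\partial n_{r,\theta}}{\partial T}\right)_{\varepsilon} + (n_0-1)(1+\nu)\,\alpha_T + 2\,n_0^3\,\alpha_T\,C'_{r,\theta}\right]\bigl(\langle T(0)\rangle - \langle T(r)\rangle\bigr)$$ (III.14)

We assume that the thermal derivative of the refractive index is the same for the radial and tangential indices, so we can write:

$$\Delta_{r,\theta}(r) = \left[\left(\frac{\partial n}{\partial T}\right)_{\varepsilon} + (n_0-1)(1+\nu)\,\alpha_T + 2\,n_0^3\,\alpha_T\,C'_{r,\theta}\right]\bigl(\langle T(0)\rangle - \langle T(r)\rangle\bigr) = \chi_{r,\theta}\,\bigl(\langle T(0)\rangle - \langle T(r)\rangle\bigr)$$ (III.15)

where χr,θ is usually called the “thermo-optic” coefficient. We remind the reader that this expression is valid only under the restrictive conditions presented in detail in section III.4.

Given that the integrated temperature shift is (from eq. II.1.11):

$$\langle T(0)\rangle - \langle T(r)\rangle = \int_0^L \bigl(T(0,z) - T(r,z)\bigr)\,dz = \frac{\eta_h P_{abs}}{4\pi K_c w_p^2}\; r^2$$ (III.16)

it appears that the optical path difference also follows a quadratic dependence in r.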
This means that in the paraxial approximation, the pumped crystal acts as a thin lens whose focal length is given by:

$$f_{r,\theta} = \frac{r^2}{2\,\Delta_{r,\theta}(r)}$$ (III.17)

In the following, the difference between fr and fθ (responsible for bifocussing) will be neglected (which is realistic if the photoelastic effect is negligible, or if the laser beam is not polarized).

The thermal lens dioptric power is thus:

$$D_{th} = \frac{1}{f_{th}} = \frac{\eta_h P_{abs}}{2\pi K_c w_p^2}\;\chi$$ (III.18)

where χ is a polarization-averaged thermo-optic coefficient. Note that this formula still holds in Yb-doped materials with strong absorption saturation, since no assumption was made concerning absorption. If the pump profile is Gaussian (e.g. end-pumping by another laser), one can show [70] that the thermal lens dioptric power is twice as large as (III.18), meaning that the thermal load is more sharply peaked around the center of the beam than in the case of a uniform “top hat” pump energy deposition.

To conclude, let us give some orders of magnitude. To be correct, the previous derivation has to be performed within the paraxial approximation, which means fth >> L. Taking some typical values (ηh ∼ 0.1, χ ∼ 10-5 K-1, Kc ∼ 5 W/m/K, wp = 100 µm, and L = 3 mm), it appears that the paraxial condition is met provided that Pabs << 100 W. This covers most practical cases.

III.7. Discussion about the use of the “dn/dT” coefficient.

Let us start this discussion with the expression of the thermo-optic coefficient:

$$\chi_{r,\theta} = \left(\frac{\partial n}{\partial T}\right)_{\varepsilon} + (n_0-1)(1+\nu)\,\alpha_T + 2\,n_0^3\,\alpha_T\,C'_{r,\theta}$$ (III.19)

In this subsection, we would like to point out how misleading this expression can be if, to evaluate the magnitude of a thermal lens, one carelessly uses the so-called dn/dT instead of the partial derivative at constant strain. The error is especially important when all terms except the dn/dT are discarded for the sake of simplification. To our knowledge, this discussion has never been published so far.
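The order-of-magnitude estimate given at the end of section III.6.2 can be checked with a short numerical sketch. It assumes the top-hat expression D_th = ηh Pabs χ / (2π Kc wp²) for the dioptric power and uses the typical values quoted in the text; the function and variable names are illustrative only and do not come from the original work.

```python
import math


def thermal_lens_power(p_abs, eta_h, chi, k_c, w_p):
    """Dioptric power (1/m) of the thermal lens for a top-hat pump.

    Assumed expression: D_th = eta_h * P_abs * chi / (2 * pi * K_c * w_p**2).
    """
    return eta_h * p_abs * chi / (2.0 * math.pi * k_c * w_p ** 2)


def paraxial_power_limit(length, eta_h, chi, k_c, w_p):
    """Absorbed power (W) at which the focal length equals the crystal length.

    The paraxial condition f_th >> L requires P_abs to stay well below this.
    """
    return 2.0 * math.pi * k_c * w_p ** 2 / (eta_h * chi * length)


# Typical values quoted in the text (SI units).
ETA_H = 0.1      # fractional thermal load
CHI = 1e-5       # thermo-optic coefficient, 1/K
K_C = 5.0        # thermal conductivity, W/(m.K)
W_P = 100e-6     # pump radius, m
LENGTH = 3e-3    # crystal length, m

if __name__ == "__main__":
    for p_abs in (1.0, 10.0, 50.0):
        f_th = 1.0 / thermal_lens_power(p_abs, ETA_H, CHI, K_C, W_P)
        print(f"P_abs = {p_abs:5.1f} W  ->  f_th = {f_th * 1e3:7.1f} mm")
    limit = paraxial_power_limit(LENGTH, ETA_H, CHI, K_C, W_P)
    print(f"f_th = L is reached at P_abs = {limit:.0f} W")
```

With these inputs the focal length equals the crystal length at an absorbed power of roughly 100 W, which is where the condition Pabs << 100 W comes from. For a Gaussian pump the text indicates a dioptric power twice as large, so the corresponding limit would be halved.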
The three contributions appearing in (III.19) may be understood as follows:

♦ The term (n0 − 1)(1 + ν)αT is clearly related to the bulging of the end faces, and is the direct consequence of the inner compression of the crystal, which causes the optical path to increase (if αT > 0). It is strictly true only for an infinitely thin crystal, since the plane stress approximation is used to derive it. Cousins [60] reports that for a rod whose length-to-diameter ratio is 1.5, this term overestimates the actual bulging by around 35%. In general it can be taken as an upper limit for end-face bulging in DPSSLs.

♦ The term 2 n0³ αT C’r,θ accounts for the photoelastic effect only, as already discussed in subsections III.3 and III.4. It explains bifocussing, depolarization and polarization-dependent astigmatism.

♦ As for the first term of (III.19), it represents the partial derivative of the refractive index at constant strain, which is the thermo-optic coefficient of a virtual, perfectly rigid crystal. It is noteworthy that it is actually not the usual dn/dT parameter that one can measure and find easily in the literature. The “usual techniques” for measuring dn/dT are based either on geometrical optics (e.g. measurement of the minimum-deviation angle of a prism cut in the material under study [72]) or on interferometric techniques (e.g. measuring the displacement of fringe patterns [81]). In all cases, the sample is put into an oven and exposed to different well-known temperatures. The crystal is free to expand, and the temperature rise in the material is uniform, which is incidentally an essential condition for meaningful measurements. In this case, obviously, the crystal experiences thermal expansion, a phenomenon which causes the index to change (to decrease, in general) by virtue of a pure photoelastic effect.
In all these practical circumstances, the strain tensor is related directly to the temperature shift by the thermal expansion tensor; in other words, the stress terms in equation III.1 are zero. The coefficient measured experimentally can then be regarded as a partial derivative at constant stress:

$$\left(\frac{dn}{dT}\right)_{measured} = \left(\frac{\partial n}{\partial T}\right)_{\sigma}$$ (III.20)

On the other hand, the reality experienced by the laser crystal while optically pumped is radically different. We know that in all cases (transverse as well as end pumping, thin disks as well as long and thin rods) the pumped area inside the crystal is under compression (negative stresses and strains), which is true as soon as the thermal expansion coefficient is positive. This can be explained qualitatively by saying that the central region of the crystal, although hotter than the edges, is prevented from expanding freely by the cooler outer parts of the rod, which expand less; the central region therefore ends up under compression. As a result, we see that if we use the measured dn/dT instead of the partial derivative at constant strain in eq. (III.19), the photoelastic contribution to the thermal lens, already fully taken into account by the term 2 n0³ αT C’r,θ (which accounts for thermal expansion, if any), is partially cancelled by the photoelastic (thermal expansion) term hidden in the measured dn/dT.

In a first attempt to correct for this, we thus have to evaluate precisely the thermal expansion contribution to the measured dn/dT. This can be done in a simple way by considering the Clausius-Mossotti model for the refractive index, which reads:

$$\frac{n^2-1}{n^2+2} = \frac{4\pi}{3}\,\frac{\rho N_a}{M}\,\alpha_e$$ (III.21)

where ρ is the specific mass (density), M the molecular weight, Na the Avogadro number, and αe the polarizability.
This expression is, strictly speaking, valid for isotropic ionic crystals (for covalent crystals the local field correction is smaller, and atomic polarizabilities lose their meaning due to the very nature of covalent bonding). The modification of polarizability with temperature results from the change in the thermal occupancies and spectra of the energy levels. Tsay et al. [82] have developed a two-oscillator model in which they consider the contributions of both electronic and lattice vibration terms. The discussion of the different origins of dn/dT is beyond the scope of this review and will not be presented in detail here.

We can differentiate (III.21), assuming that changes in density originate only from isotropic thermal expansion. This is consistent with the experimental procedure used to measure this coefficient.

$$\left(\frac{dn}{dT}\right)_{measured} = \left(\frac{\partial}{\partial T}\, n\bigl\{\rho(T),\,\alpha_e(\rho(T),T)\bigr\}\right)_{\sigma} = \frac{\partial n}{\partial \rho}\frac{\partial \rho}{\partial T} + \frac{\partial n}{\partial \alpha_e}\left[\frac{\partial \alpha_e}{\partial \rho}\frac{\partial \rho}{\partial T} + \left(\frac{\partial \alpha_e}{\partial T}\right)_{\rho}\right]$$ (III.22)

From (III.21) and the isotropic expansion assumption (∂ρ/∂T = −3αT ρ) we obtain:

$$\left(\frac{dn}{dT}\right)_{measured} = \frac{(n^2-1)(n^2+2)}{6n}\left[-3\alpha_T - 3\alpha_T\,\frac{\rho}{\alpha_e}\frac{\partial \alpha_e}{\partial \rho} + \frac{1}{\alpha_e}\left(\frac{\partial \alpha_e}{\partial T}\right)_{\rho}\right]$$ (III.23)

This expression brings to light three contributions to the measured dn/dT:

- a pure effect of thermal expansion:

$$-\alpha_T\,\frac{(n^2-1)(n^2+2)}{2n}$$ (III.24)

- the influence of thermal expansion on polarizability:

$$-\alpha_T\,\frac{(n^2-1)(n^2+2)}{2n}\left[\frac{\rho}{\alpha_e}\frac{\partial \alpha_e}{\partial \rho}\right]$$ (III.25)

- a thermal expansion-independent contribution, which can be identified with the partial derivative at constant strain:

$$\frac{(n^2-1)(n^2+2)}{6n}\,\frac{1}{\alpha_e}\left(\frac{\partial \alpha_e}{\partial T}\right)_{\rho}$$ (III.26)

Since the second and third terms do not break down easily into a set of available physical material parameters, no general formula based on the measured dn/dT can be derived for the last partial derivative. We see also that thermal expansion appears in both (III.24) and (III.25), so that we cannot straightforwardly dissociate “pure” thermal expansion from strain-related polarizability effects.
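As a hedged numerical illustration of the pure thermal expansion contribution, the sketch below differentiates the Clausius-Mossotti relation at fixed polarizability in two ways: analytically, assuming the form −αT (n² − 1)(n² + 2) / (2n), and by finite differences on the density. The input values n = 1.82 and αT = 7×10-6 K-1 are assumed, YAG-like figures chosen for illustration; they are not taken from the tables of this review.

```python
import math


def index_from_density(rho, k):
    """Refractive index from Clausius-Mossotti, (n^2 - 1)/(n^2 + 2) = k * rho.

    k lumps together the polarizability and the constant prefactors.
    """
    x = k * rho
    return math.sqrt((1.0 + 2.0 * x) / (1.0 - x))


def expansion_term(n, alpha_t):
    """Analytic thermal-expansion part of dn/dT at fixed polarizability (1/K)."""
    return -alpha_t * (n ** 2 - 1.0) * (n ** 2 + 2.0) / (2.0 * n)


def expansion_term_numeric(n, alpha_t, d_temp=1.0):
    """Same quantity by central finite differences.

    Isotropic expansion is assumed: density scales as exp(-3 * alpha_t * dT).
    """
    rho = 1.0  # arbitrary reference density, cancels out with k
    k = (n ** 2 - 1.0) / ((n ** 2 + 2.0) * rho)
    n_hot = index_from_density(rho * math.exp(-3.0 * alpha_t * d_temp), k)
    n_cold = index_from_density(rho * math.exp(3.0 * alpha_t * d_temp), k)
    return (n_hot - n_cold) / (2.0 * d_temp)


if __name__ == "__main__":
    n0, alpha_t = 1.82, 7e-6  # assumed illustrative values
    print(f"analytic : {expansion_term(n0, alpha_t):.3e} 1/K")
    print(f"numeric  : {expansion_term_numeric(n0, alpha_t):.3e} 1/K")
```

With these assumed inputs the expansion term is of the order of −2×10-5 K-1, larger in magnitude than typical measured dn/dT values for YAG, which is why removing it from the measured coefficient changes the result so strongly.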
This formulation raises some questions (rather than answers) about the interpretation of some experimental results, in particular those obtained with materials whose measured dn/dT is negative. It is often reported that in such materials (e.g. LiCAF [72, 83], FAP [39], YLF [84]), the thermal lens is weak or even divergent (as observed in YLF crystals) because the negative dn/dT counterbalances (or even exceeds) the other, positive terms in the expression of the thermo-optic coefficient. However, because of the lack of data about these materials, the photoelastic term is simply assumed to be positive, or is evaluated from other materials whose properties are believed to be similar (Woods et al. [72] use data from CaF2 to evaluate the photoelastic constants of LiCAF; Payne et al. [39] used data from LG-750 phosphate glass to approximate those of FAP). In some cases (e.g. thermal lensing measurements in Nd:YLF crystals [84]), discrepancies are reported between theory and experiment, which do not occur with YAG, which has a positive measured dn/dT.

A large and negative dn/dT coefficient means that thermal expansion dominates the polarizability contribution for an unstressed crystal; it is observed that such behaviour is generally associated with a large thermal expansion coefficient. This also means that the photoelastic term, proportional to αT, can be expected to be larger in magnitude. In contrast, we can say nothing about the sign of this term, which requires knowledge of all the pijkl coefficients of the crystal. This means that there is no obvious relationship between the sign and magnitude of the measured dn/dT and the sign and magnitude of the thermal lens. To go further, the only term which is truly always positive is the end-face bulging term. The temperature dependence of the polarizability (eq. III.26) is mostly positive too [82].
We can then infer that negative thermal lensing is more likely to be explained by negative photoelastic terms, and/or possibly by a negative polarizability term (eq. III.26). In conclusion, crystals with a negative measured dn/dT have to be considered very carefully as far as simulations are concerned: the photoelastic terms must not be neglected. However, it remains that photoelastic contributions, whether calculated or measured, tend to be small in many crystals. This means that the crude approximation made by replacing χ by the measured dn/dT (plus the end bulging term) is closer to reality the larger and more positive dn/dT is.

III.8. A novel definition for thermo-optic coefficient based on experimentally measurable parameters.

In the preceding subsection, we saw that there are many problems related to the definition of the thermo-optic coefficient, which stem above all from the misuse of the parameter (dn/dT) in a context where it is not relevant.

Actually, a better way to describe the phenomena is to start from measurable data that are relevant as far as solid-state lasers are concerned. The partial derivative at constant strain is a formal parameter which has no real physical meaning, since it is impossible to prevent the crystal from undergoing any strain, compression or thermal expansion. Furthermore, photoelastic and polarizability effects are so strongly intermingled that one cannot easily imagine an experiment that could clearly separate the effect of one from the other.
That is why we propose to base the thermal characterization of solid-state lasers for high power applications on measurable data, and to separate the thermo-optic coefficient χ into three truly independent contributions, as follows:

$$\chi = \chi_n + \chi_{bulging} + \chi_{birefringence}$$ (III.27)

where

$$\chi_n = \left(\frac{\partial n}{\partial T}\right)_{\varepsilon} + n_0^3\,\alpha_T\,\bigl(C'_r + C'_\theta\bigr)$$ (III.28)

$$\chi_{bulging} = (n_0-1)(1+\nu)\,\alpha_T$$ (III.29)

$$\chi_{birefringence} = \pm\, n_0^3\,\alpha_T\,\bigl(C'_r - C'_\theta\bigr)$$ (III.30)

χbulging accounts for the curvature of the end faces, and is measurable by performing, for example, interferometric or wavefront measurements on a probe beam reflected on each side of the crystal, as done by Baer et al. [85] or Kleine et al. [86]. The expression given in (III.29) applies to the ideal thin disk model, but its real value can be computed for any particular geometry, quite easily with a finite element code, since this is a pure thermomechanical problem for which data are more readily accessible or measurable.

χbirefringence accounts for strain-induced birefringence (“+” for radial polarization, “-” for tangential). It can be measured separately by performing measurements of polarization-dependent astigmatism, for instance with a wavefront measurement method sensitive to aberrations (see next section).

Finally, the term χn, which accounts for all the refractive index variations with temperature, is not rigorously calculable for all the reasons explained above, but its exact value can be deduced for each material and for each pumping configuration from the separate measurements of χ (global thermo-optic coefficient), χbirefringence and χbulging.

To conclude, let us give some orders of magnitude of the different terms in the widespread YAG crystal. The measured value of the global thermo-optic coefficient (see next section of this article for details) for this material under diode pumping is 10×10-6 K-1.
The different contributions calculated from tabulated data are:

$$\left(\frac{dn}{dT}\right)_{measured} = 9\times10^{-6}\ \mathrm{K^{-1}}$$

$$\left(\frac{dn}{dT}\right)_{measured} + \alpha_T\,\frac{(n^2-1)(n^2+2)}{2n} = 31.5\times10^{-6}\ \mathrm{K^{-1}}$$

$$\chi_{bulging} = (n_0-1)(1+\nu)\,\alpha_T = 7.2\times10^{-6}\ \mathrm{K^{-1}}$$

$$2\,n_0^3\,\alpha_T\,C'_r = 0.27\times10^{-6}\ \mathrm{K^{-1}} \qquad 2\,n_0^3\,\alpha_T\,C'_\theta = 0.93\times10^{-6}\ \mathrm{K^{-1}}$$

(see the appendix for the details of the calculation of the photoelastic constants within the plane stress approximation). The photoelastic terms are small, but the bulging term is far from negligible. Weber et al. [87] have shown that in Nd:YAG, bulging represents 30% of the global thermal lens. For information, we also calculated a “corrected” dn/dT, obtained after subtraction of the inappropriate thermal expansion term (III.24): this coefficient turns out to be very large here.

In diode-end-pumped Nd:YLF and Nd:YVO4, Baer et al. [85] and Kleine et al. [86] have shown experimentally that the bulging term represents half of the total thermo-optic coefficient. A finite-element analysis performed by Peng et al. [88] leads to a similar conclusion for a Nd:YVO4 crystal.

III.9. The aberrations of the thermal lens.

To conclude this chapter, we introduce the higher-order distortions of the thermal lens, i.e. the thermal lens aberrations. For a perfectly parabolic distortion of the wavefront (equivalently, a pure thermal lens), the thermal distortion can be easily compensated by adding a divergent lens of opposite power in the laser cavity or by adjusting the distances between the cavity elements. If aberrations are present, however, the compensation is very difficult and requires complex systems [89]. If left uncorrected, these aberrations lead to a degradation in beam quality (brightness), and also to losses due to diffraction of the high spatial frequencies of the beam [90]. Aberrations are present when the wavefront distortions induced by the absorbed pump beam are not perfectly parabolic.
This occurs when the longitudinal pump beam does not have a true top-hat profile, for example in the case of a Gaussian pump beam profile [91]. Moreover, if the laser beam is larger than the pumped area, the aberrations also become important, as shown in figure 17. The rays far from the pumped area are almost not deviated. This is the signature of spherical aberration (which is sometimes referred to as a "thermal lens varying with radius r").

Figure 17: When the laser beam is larger than the pumped area, spherical aberration is observed. [Diagram label: pumped area.]

In general, aberrations affect the laser modes of the cavity in a way that degrades the beam quality. This degradation can be evaluated by the M2 factor or the Strehl ratio, for example [92]. We give here just one example of the influence of aberrations on beam quality, based on results given by Clarkson [70]. Let us consider that only 3rd-order spherical aberration is present. In this case, the optical path difference Δ(r) can be written as:

$$\Delta(r) = -C_4\, r^4 \qquad \text{(III.31)}$$

If the initial M2 factor of the laser beam is $M_i^2$, the M2 factor with the added aberration is:

$$M_f^2 = \sqrt{\left(M_i^2\right)^2 + \left(M_q^2\right)^2} \qquad \text{(III.32)}$$

with

$$M_q^2 = \frac{8\pi\, C_4\, w_L^4}{\sqrt{2}\,\lambda} \qquad \text{(III.33)}$$

where λ is the wavelength and wL the laser beam waist. One can show that C4 = 0 for a top-hat pump profile, provided that wL < wp. For a Gaussian pump-beam profile (beam waist wp), however, one obtains [70] an expression of the form:

$$M_q^2 \propto \left(\frac{w_L}{w_p}\right)^4 \qquad \text{(III.34)}$$

In that case, the ratio wL/wp appears to the fourth power, which means a strong increase of M2 even for a small mode mismatch.

Another classical aberration to be considered is thermal astigmatism. In this case, the analytical solution is not as simple as for 3rd-order spherical aberration, and we will only take a qualitative approach, answering this simple question: under which conditions does the thermal lens exhibit astigmatism?
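Before turning to astigmatism, the spherical-aberration relations can be made concrete with a short numerical sketch. It assumes the forms $M_f^2 = \sqrt{(M_i^2)^2 + (M_q^2)^2}$ and $M_q^2 = 8\pi C_4 w_L^4 / (\sqrt{2}\lambda)$ for (III.32) and (III.33); the function names and the value of C4 are hypothetical and chosen only for illustration.

```python
import math

def m2_quartic(c4, w_l, wavelength):
    """Aberration contribution M_q^2 = 8*pi*|C4|*w_L^4 / (sqrt(2)*lambda).

    c4 is the quartic coefficient of Delta(r) = -C4 r^4 (in m^-3),
    w_l the laser beam waist (m), wavelength in m. Assumed form of (III.33).
    """
    return 8.0 * math.pi * abs(c4) * w_l**4 / (math.sqrt(2.0) * wavelength)

def m2_final(m2_initial, m2_q):
    """Quadratic sum of the initial M^2 and the aberration term, as in (III.32)."""
    return math.hypot(m2_initial, m2_q)

# Hypothetical numbers: C4 = 5e7 m^-3, lambda = 1030 nm, diffraction-limited input.
C4 = 5.0e7
LAMBDA = 1.03e-6
for w_l in (150e-6, 200e-6, 250e-6):
    mq = m2_quartic(C4, w_l, LAMBDA)
    print(f"w_L = {w_l * 1e6:.0f} um  M_q^2 = {mq:.2f}  M_f^2 = {m2_final(1.0, mq):.2f}")
```

At fixed C4, doubling the laser waist multiplies the aberration term by 16, which is the fourth-power sensitivity discussed in the text.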
In practice it will occur whenever:
- the thermal conductivity is anisotropic;
- the thermal expansion tensor $\alpha_T^{ij}$ does not reduce to a scalar quantity;
- the cooling is inhomogeneous;
- the laser beam is polarized, even for an isotropic material. This is the so-called polarization-dependent astigmatism, which can be used as a probe to evaluate the strain-induced birefringence (see figure 18).

Figure 18: Illustration of polarization-dependent astigmatism. Here a vertically polarized beam strikes the crystal from the left. One ray sees the radial index of refraction nr while the other sees the tangential index of refraction nθ. If, for the sake of simplicity, the indices are considered constant over the whole crystal length L, the astigmatism is (nr − nθ)L. [Diagram labels: crystal, (nr − 1)L, (nθ − 1)L.]

To conclude this part, we present a schematic diagram (fig. 19) summing up the different thermal effects arising in a solid-state laser medium, and how they are related to each other.

Figure 19: Summary of the thermal effects in solid-state lasers. The observable consequences are presented in full rectangles. The aberrations can be split into two classes: those that do not depend on the polarization (lined rectangles) and those that depend on the polarization (dotted rectangles), which come from the strain-induced birefringence. [Diagram boxes: pumping; temperature rise ΔT; modification of the crystal lattice (thermal expansion, deformations, stress, elastic tensor); modification of polarizability; modification of refraction indices; photoelastic effect; induced birefringence; depolarization; curvature (bulging) of both faces of the crystal; thermal lens; focusing (for an ideal lens), with changes in the stability range of the laser cavity and in the size and divergence of the laser beam; aberrations of the thermal lens (spherical aberration and/or astigmatism, polarization-dependent astigmatism), leading to beam quality degradation and potentially to losses; fracture if σ > σTS.]
(*): Two types of losses are induced by aberrations: the diffraction losses (associated with the degradation in beam quality) and the losses induced by the possible presence of a diaphragm in the cavity to prevent the oscillation of higher-order laser modes.

IV. Thermal lensing techniques

IV.1. Introduction

The first evidence of thermal effects in lasers was reported in 1965 by Gordon, Leite and Whinnery [93], working at Bell Labs on He-Ne lasers for Raman spectroscopy applications. The use of liquids to Q-switch the laser led to the observation of unexpected effects such as relaxation oscillations or mode jumps. The exceptionally long time constant of this phenomenon (several seconds) led to the conclusion that a thermal lens was at the origin of the observed effects. This lens was created by the small absorption occurring in the liquids. The first application proposed by the authors was then to use it to measure very small absorption coefficients, down to $10^{-4}\ \mathrm{cm^{-1}}$. As a matter of fact, the so-called "thermal lens method" nowadays allows the measurement of absorption coefficients lower than $10^{-7}\ \mathrm{cm^{-1}}$ [94]. Since 1965, the panel of photothermal methods available to physicists has grown in diversity [95].

Because of the complexity of simulating thermal effects in lasers (see previous subsections), experimental determination is often the only accurate method. Since the 1970s, numerous attempts have been made to measure the thermal lens in solid-state laser media. They can be classified into three categories: the geometrical methods based on the deflection of a beam, the methods based on the properties of cavity modes, and the methods based on wavefront measurements.

IV.2. Geometrical methods

These are probably the simplest methods to measure thermal lenses.
They can be separated into two sub-groups: one can exploit either the defocusing or the deflection of a beam passing through the pumped medium.

Figure 20: Example of thermal lens measurement based on the displacement of the focal point. [Diagram labels: crystal, power meter.]

The principle of the first category of methods is very simple. Considering a probe beam going through the crystal, measuring the axial shift of the focal point position gives the thermal lens directly, using simple geometrical optics. When the rod is relatively large, one can use a collimated beam of comparable size and directly measure the position of this focal point, by a Z-scan measurement for example, as shown in figure 20 (a small aperture is translated longitudinally in order to find the maximum of probe-beam transmission [96]). This method relies on the assumption that the lens is perfect (without aberrations) and is therefore especially suited to transverse pumping with large-size materials. In fact, the method is not easily applicable to end-pumped lasers because of the very large depth of focus associated with a very small probe beam of low solid angle, and it obviously does not yield the thermal lens aberrations. Moreover, for small beam diameters (10-100 µm), this method needs to be generalized using Gaussian beam optics. Hu and Whinnery [97] described a simple method to evaluate the thermal lens with a probe beam whose size is comparable to the cavity beam in end-pumping schemes. This last method is described in figure 21: it can be based either on the measurement of the divergence of the probe beam [97], or on the measurement of the beam diameter in an appropriate plane.

Figure 21: Example of measurement based on focal point displacement for a tightly focused probe beam, according to Hu and Whinnery's method.

To conclude on the methods based on the focal point displacement, their main drawback is their low accuracy.
The relative precision on the focal length measurement is only 20-30%.

We can use geometrical optics in another way, by measuring the deflection of a probe beam. Instead of using a beam covering the whole pumped area, we can measure the deflection of a small beam slightly off-axis, as shown in figure 22.

Figure 22: Example of measurement based on deflection measurement [98]. [Diagram labels: Gaussian beam, laser crystal, detector; beam paths with and without thermal lens.]

The whole surface can be scanned in order to measure not only the focal length but also the aberrations due to the thermal effects [98]. However, this method is complex and can only be used for large transversally-pumped crystals (even if the resolution can be finer than the probe beam size [99]). Moreover, this method can be considered as a point-to-point version of the simpler Shack-Hartmann technique described later.

IV.3. Methods based on the properties of cavity eigenmodes

After the development of end-pumped lasers (particularly thanks to the increasing performance of laser diodes), alternative methods appeared. These less straightforward methods are based on the properties of the laser cavity eigenmodes, in particular on the fact that the thermal lens affects the stability zones of a laser cavity. All these methods are based on the theory of paraxial beam propagation in cavities presented by Kogelnik and Li [100] and can be formalized using ABCD matrices. An example of the influence of the thermal lens on the stability of a cavity is given in figure 23. As a direct consequence, these methods consider an ideal (aberration-free) lens and do not give any information on the thermally-induced aberrations. Nevertheless, these methods remain easy to implement since there is no probe beam (the laser itself is used to measure the thermal lens). On the other hand, it is impossible to measure the thermal lens in the absence of laser extraction (with pumping but no laser emission).
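The effect of the thermal lens on cavity stability can be sketched with ABCD matrices. The example below assumes a plano-plano cavity containing a single thin lens of focal length f, at distances d1 and d2 from the two flat mirrors; the helper names and numerical values are illustrative and are not taken from [101]. The cavity is counted as stable when the round-trip half-trace satisfies |(A + D)/2| < 1.

```python
import math

def matmul(a, b):
    """Product of two 2x2 ray-transfer matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def free_space(d):
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    # f = math.inf represents the absence of a thermal lens.
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def half_trace(d1, d2, f):
    """(A + D)/2 for the round trip mirror 1 -> lens -> mirror 2 -> lens -> mirror 1.

    Flat mirrors are identity matrices, so only propagation and the lens appear.
    """
    elements = [free_space(d1), thin_lens(f), free_space(2.0 * d2),
                thin_lens(f), free_space(d1)]
    m = ((1.0, 0.0), (0.0, 1.0))
    for element in elements:      # each new element multiplies from the left
        m = matmul(element, m)
    return 0.5 * (m[0][0] + m[1][1])

def is_stable(d1, d2, f):
    return abs(half_trace(d1, d2, f)) < 1.0

# Hypothetical cavity: lens in the middle of a 20 cm plano-plano resonator.
for f in (math.inf, 0.5, 0.04):
    print(f"f = {f} m  (A+D)/2 = {half_trace(0.1, 0.1, f):+.3f}  stable: {is_stable(0.1, 0.1, f)}")
```

Without a lens the half-trace sits exactly on the stability boundary, so the cavity is not strictly stable; a moderate positive thermal lens brings it inside the stable zone, while a very strong lens pushes it out again. Locating these boundaries experimentally is what the cavity-based methods exploit.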
Here are three different examples of these methods, based respectively on the work of Frauchiger et al., Neuenschwander et al. and Ozygus et al.

Figure 23: Example of the influence of the thermal lens on the stability of a plano-plano cavity [101]. [Diagram labels: no thermal lens, unstable cavity; thermal lens, stable cavity.]

- Frauchiger et al. [102] measured the divergence of the output laser beam in a diode-pumped Nd:YAG laser and retrieved the thermal lens by a paraxial calculation. This example illustrates the limitation of the techniques that use the properties of cavity modes: they are very sensitive to the beam quality. The measured divergence is directly proportional to the M2 factor of the beam. If the laser mode is not perfectly TEM00, the reliability of the method is strongly affected.
- Neuenschwander et al. [101] used a plano-plano laser cavity stabilised by the thermal lens (figure 23). By adding two extra-cavity lenses and measuring the beam diameter at different longitudinal positions, they found the waist in the cavity and therefore the thermal lens. This method is interesting because it allows the simultaneous measurement of the M2 factor, which reduces the limitations due to the beam quality. In that case, the limitation comes from the precision on the distances between the different optical components, which are not always perfectly known.
- Ozygus et al. [103] proposed an alternative method that uses the frequencies of different transverse modes. In fact, the frequencies of the different transverse modes (in a plano-concave laser cavity) depend not only on the length of the cavity but also on the radius of curvature of the mirrors (considering the flat mirror and the crystal as another concave mirror).
If two modes can be made to lase simultaneously in the cavity, the measurement of the beat frequency with an optical spectrum analyser allows the thermal lens to be retrieved. An upgraded technique based on the same effect was presented later by Ozygus et al. in 1997 [104]. In this version, one of the mirrors is translated to find the positions of spectral degeneracy (when two eigenmodes have the same frequency). The advantage of this method is that it can be used to measure long-focal-length thermal lenses. It allows the measurement of focal lengths as long as 5 [...]

In conclusion, the methods based on the properties of the cavity modes yield a relative precision on the thermal focal length of about 15 % for a TEM00 beam, degrading to 60 % for a non-diffraction-limited beam [101].

IV.4. Methods based on wavefront measurements

In this part, we will distinguish three wavefront measurement techniques: "classical" interferometry (based on fringe measurements with Michelson, Fizeau, … interferometers), shearing interferometry, and Shack-Hartmann sensing.

IV.4.1. Classical interferometric techniques

As far as "classical" interferometry is concerned, one can consider equal-thickness fringes between the parallel end faces of the rod (this is the so-called Fizeau interferometer); one can also insert the rod under study in one arm of a Michelson-type [105] or Mach-Zehnder-type interferometer [106], for example. The first type of method is simpler since it does not require a second interferometrically adjusted arm. These methods are particularly well suited to large amplifier rods but, on the other hand, they are not convenient for end pumping. Indeed, even if nothing fundamentally prevents the use of this method for small spots, in practice the number of fringes is then too small to obtain an exploitable interferogram.
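The fringe-count limitation can be estimated from the sag of an ideal thin lens, Δ = h²/(2f), where h is the probe-beam radius and f the thermal focal length. The sketch below is only an order-of-magnitude check under the paraxial approximation; the function name and the list of beam radii are hypothetical.

```python
def edge_opd(beam_radius, focal_length):
    """Optical path difference between beam centre and edge for an ideal lens (m)."""
    return beam_radius**2 / (2.0 * focal_length)

WAVELENGTH = 670e-9    # probe wavelength (m)
FOCAL_LENGTH = 0.05    # thermal focal length (m)

for radius in (100e-6, 500e-6, 1e-3):
    waves = edge_opd(radius, FOCAL_LENGTH) / WAVELENGTH
    print(f"h = {radius * 1e6:6.0f} um  ->  {waves:6.2f} waves between centre and edge")
```

Because the path difference grows as h², a probe beam a few times larger than a typical end-pumped spot is needed before even one full fringe appears across the aperture.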
As an example, if we consider a 200-µm diameter probe beam, a focal length of 5 cm only induces an optical path difference between the centre and the edge of the beam of Δ = h²/2f ≈ λ/6 (with h the radius of the beam, f the focal length, and λ the wavelength of the probe beam, here λ = 670 nm). In this case, of course, no fringes are visible. To overcome this problem, one can choose a probe beam larger than the pumped area; under these conditions, however, as shown in figure 17, some spherical aberration will be present, which requires an additional numerical model to fit the data and retrieve the thermal lens, as done by Pfistner et al. [107]. The weakness of this method lies precisely in the retrieval algorithm, since the whole interferogram consists of only a few fringes. Precision is therefore a weak point: in the best case it is evaluated at λ/4. This work [107] was also carried out on YAG, GSGG and YLF crystals.

IV.4.2. Shearing interferometric techniques

A classical solution to obtain information on the phase "between two fringes" is phase-shift interferometry. This technique has been used by Khizhnyak et al. [108] with longitudinally-pumped Nd:YAG lasers. This method is easy to implement since it is based on commercial products, but it remains little used because of its high cost. Lateral shearing interferometer methods are generally preferred.

Methods based on lateral shearing interferometers are particularly well suited to end pumping. The principle is the following: the beam is duplicated into several replicas, typically 2 [109-110], 3 [111] or 4 [112-113] (as presented in figures 24 and 25 with a tri-wave lateral shearing interferometer setup).

Figure 24: Example of tri-wave lateral shearing interferometer (courtesy of J.C. Chanteloup).

Figure 25: Example of tri-wave lateral shearing interferometer setup (courtesy of J.C. Chanteloup).
The replicas are slightly shifted from each other in the lateral direction, which provides an interferogram in which the separation of the fringes (for 2 waves) or dots (for 3 or 4 waves) gives information on the derivative of the wavefront. For example, in the absence of wavefront distortion, the lines are straight for the 2-wave shearing interferometer, while the dots form a homogeneous honeycomb for the 3-wave shearing interferometer and perfect squares for the 4-wave shearing interferometer. This technique is more sensitive than classical interferometry since the sensitivity is tunable by adjusting the shearing distance: the larger the shift, the more precise the technique. The precision of this method is excellent, since it is of the order of λ/50 [109] and can even reach λ/200 [111]. The 3- (or 4-) wave method, "trilateral shearing interferometry" [111], has the advantage over 2-wave shearing interferometry of mapping the wavefront in two dimensions in a single acquisition. This method is simple to implement (figure 25) since the splitting of the beam can be readily realized with a 3D grating. Moreover, the use of a grating makes this method totally achromatic, which allows its use for broadband lasers such as femtosecond lasers. In 1998, Chanteloup et al. [112] reported on the wavefront distortions of a terawatt-class femtosecond laser system with an accuracy of λ/50. This method was also used to characterize thermal lensing in ytterbium-doped materials, namely in Yb:YAB by J.L. Blows et al. [110], who used the 2-wave shearing interferometer technique to measure thermal lenses, and from them thermal conductivities and fractional thermal loadings. The experimental setup used is reproduced in figure 26.

Figure 26: Setup for measuring thermal lensing in a diode-pumped Yb:YAB crystal with lateral shearing interferometry, from [110] (courtesy of J.
Dawes, Centre for Lasers and Applications, Sydney).

The use of a probe beam at 530 nm allowed the thermal lens to be evaluated under lasing and nonlasing conditions (that is, with a shutter inside the cavity), which is an essential requirement for measuring radiative quantum efficiencies.

IV.4.3. Methods based on Shack-Hartmann wavefront sensing

Although it can also be seen as a multiple-beam interferometric method, we separate this method from the previously mentioned ones since it can be understood easily with geometrical optics.

In 1900, Hartmann proposed to use a drilled plate [115] to measure wavefronts. The principle is simple: since light rays run perpendicular to the wavefront, one can retrieve the local wavefront slope as soon as the direction of the ray is measured. The Hartmann plate is made of small apertures which split the beam into regularly spaced diffraction patterns, and behind which a detector is located (typically a CCD camera nowadays). In 1971, Roland Shack and Ben Platt improved the Hartmann setup by replacing the array of holes by an array of microlenses (figures 27 and 28).

Figure 27: Example of intensity pattern observed in the focal plane of the Shack-Hartmann microlenses. [Diagram label: single unit.]

Figure 28: Shack-Hartmann wavefront sensor setup. [Diagram labels: incident wavefront, rays propagation, microlens array, focal length.]

Figure 29: Shack-Hartmann sampling. [Diagram labels: input wavefront, measured local slopes, reconstructed wavefront.]

The small lateral shift of the centroid of each microlens diffraction pattern (see figure 28) is directly proportional to the average local slope of the wavefront over the aperture of the microlens. The displacement vectors thus provide 2D information.
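The relation between centroid shifts and local slopes can be sketched in a few lines. The example below is a simplified illustration, not the reconstruction used in any cited work: it assumes unit magnification between crystal and microlens array, a purely parabolic wavefront, and the sign convention that a converging lens of focal length f produces a slope of −x/f at transverse position x. All names and numerical values are hypothetical.

```python
def local_slopes(centroid_shifts, microlens_focal_length):
    """Convert centroid shifts (dx, dy) in metres into wavefront slopes (rad)."""
    return [(dx / microlens_focal_length, dy / microlens_focal_length)
            for dx, dy in centroid_shifts]

def fit_dioptric_power(positions, slopes):
    """Least-squares fit of slope = -D * position, returning D in m^-1.

    positions are the (x, y) centres of the microlenses; only the defocus
    term is fitted, so aberrations are ignored in this sketch.
    """
    num = sum(x * sx + y * sy for (x, y), (sx, sy) in zip(positions, slopes))
    den = sum(x * x + y * y for x, y in positions)
    return -num / den

# Synthetic test data: 7 x 7 microlenses, 150 um pitch, 5 mm focal length,
# and a thermal lens of focal length 0.5 m (dioptric power 2 m^-1).
PITCH, F_ML, F_TH = 150e-6, 5e-3, 0.5
positions = [(i * PITCH, j * PITCH) for i in range(-3, 4) for j in range(-3, 4)]
shifts = [(-x / F_TH * F_ML, -y / F_TH * F_ML) for x, y in positions]

power = fit_dioptric_power(positions, local_slopes(shifts, F_ML))
print(f"fitted dioptric power: {power:.3f} m^-1")
```

In a real measurement the slopes would be projected onto a full set of modes (Zernike polynomials, for instance) so that aberrations are recovered together with the focal power.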
The standard sensitivity of such systems is typically λ/100 RMS for commercial products, which establishes this technique as a competitor of lateral shearing interferometry. One of the advantages of this method, compared to interferometric ones, is its insensitivity to mechanical vibrations and thermal fluctuations. On the other hand, the principal limitation of this kind of sensor is the discrete sampling, which limits the transverse resolution (figure 29) and thus prevents information from being obtained about high-spatial-frequency phase distortions. Nevertheless, it is noteworthy that this point is not a problem for thermal lensing and aberration measurements because, by virtue of the general heat equation (II.1.1), temperature variations in a crystal are smooth even if the thermal load exhibits sharp variations.

In 1998, Armstrong [116] reported the measurement of the thermal lens in transversally-pumped Nd:YAG and Nd:YAP rods. More recently, Ito et al. [117] and Pittman et al. [118] reported the use of a Hartmann-Shack sensor to measure the thermal lens in Ti:sapphire rods of terawatt-class femtosecond lasers. In 2001, Nishimura et al. reported temporal changes of thermal lens effects in high-power pulsed Yb:glass lasers using a Shack-Hartmann wavefront sensor [120]. Here the sensor was used for its ability to yield real-time (100 Hz sampling rate) estimates of the Zernike coefficients of the aberrations, which made it possible to measure characteristic thermal relaxation times.

Our group recently reported [73] an adaptation of Armstrong's work to measure thermal lensing in various diode-pumped Yb-doped crystals, under lasing or nonlasing conditions. The setup appears in figure 30.
The probe beam was a laser diode at 670 nm coupled into a single-mode fiber, chosen for its high spatial coherence (an essential feature to correctly define the "reference wavefront") and its low temporal coherence (necessary to avoid coherent cross talk, that is, interference between two neighbouring microlens diffraction patterns). After collimation by a microscope objective, the probe beam was focused onto the crystal and superimposed with the pump beam. The crystal was then imaged onto the microlens array using a magnifying relay imaging system. An uncoated glass plate was inserted in the pump beam path to reflect the probe beam towards the sensor. A selective interference filter at 670 nm was added in front of the sensor to eliminate any unwanted signal at the pump or laser wavelengths. A "reference wavefront" was recorded with the pump diode turned off; it includes all static aberrations of the optical elements and of the cold crystal itself. It was then subtracted from the wavefront measured with the pump on. Thus, only phase distortions originating in thermal effects were recorded. The phase front was then reconstructed by projection onto the set of orthogonal Zernike polynomials [70].

Figure 30: Experimental setup used to measure thermal lensing in diode-pumped Yb-doped crystals with a Shack-Hartmann wavefront sensor, from [73]. [Diagram labels: fiber-coupled pump laser diode, fiber-coupled probe, laser cavity, crystal, glass plate, interference filter @ 670 nm, Shack-Hartmann sensor.]

IV.4.4. Other techniques

There exist some more marginal and less used techniques to measure thermal lensing. For example, it is possible to achieve phase reconstruction based on Fourier optics. One can show that it is possible to retrieve the phase by knowing the intensity profile in different planes linked by Fourier transformation.
For instance, a uniform intensity over a circular aperture and an Airy pattern appearing in two Fourier-related planes imply that the wavefront propagating from one plane to the other is purely spherical. Nevertheless, the general inverse problem is far from obvious. Grossard et al. [114] proposed a technique for measuring thermal lensing aberrations in a diode-pumped Nd:YVO4 crystal using intensity profiles in 3 planes and a complex phase retrieval algorithm derived from Gerchberg and Saxton's work.

IV.5. Conclusion

In conclusion, we have reviewed the main methods used to measure the thermal effects in lasers experimentally, emphasizing the advantages and the limitations of each method. A summary of this review is presented in table 4.

Methods
Based on focal point displacement | no | Very difficult | yes | yes | yes | yes | yes | Very easy / very cheap
Based on cavity properties | no | yes | Only with probe | Almost impossible | yes | yes | Low, ≈ 15 to 60 % | Difficult in general / cheap
"Classical" interferometry | yes | Very difficult | yes in theory | no | no | no | yes | σΔ ∼ λ/4 for interferometry without phase shift | Difficult / cheap (except phase-shift interferometry)
Lateral shearing interferometry | yes | yes | yes | yes | yes | yes | yes for the 3-4-wave shearing | high, σΔ ∼ λ/50 to λ/200 | medium / expensive
Phase reconstruction from intensity profiles at different positions | yes | yes | yes | yes | yes | yes | no | unknown | Difficult / no commercial product
Hartmann-Shack | yes | yes | yes | yes | yes | yes | yes | high, σΔ ∼ λ/100 | easy / expensive

Table 4: Summary of the different techniques used to measure thermal lensing. σΔ denotes the reported RMS deviation on phase shift.

V. Thermal lensing measurements in ytterbium-doped materials: the evidence of a nonradiative path

We will conclude this review by giving some examples of experimental thermal lensing measurements in diode-pumped Yb-doped crystals.
Most of the examples are taken from our previous publications [73,121]: the reader interested in the technical details of the experiments, as well as in the underlying theoretical considerations, is invited to refer to these works.

V.1. The thermal load in Yb-doped materials

What are the heat sources in a laser medium? Following T. Y. Fan and using the same notations [8], the fractional thermal load ηh (that is, the fraction of the absorbed pump power converted into heat) can be written:

$$\eta_h = 1 - \eta_p\left[(1 - \eta_l)\,\eta_r\,\frac{\lambda_p}{\lambda_f} + \eta_l\,\frac{\lambda_p}{\lambda_l}\right] \qquad \text{(V.1)}$$

where:
- λp, λl, λf are the pump wavelength, the observed laser wavelength, and the mean fluorescence wavelength, respectively;
- ηp is the pump quantum efficiency, which is the fraction of absorbed pump photons contributing to inversion. Non-unity pump quantum efficiency accounts for residual absorption of the undoped crystal, or can be related to the presence of nonradiative sites.
- ηr is the radiative quantum efficiency for the upper manifold: it represents the fraction of excited atoms that decay by a radiative path (in the absence of stimulated emission). Non-unity radiative quantum efficiency can be related to multiphonon relaxation (although this is very unlikely, since a large number of phonons are necessary to bridge the 10 000 cm^-1 gap separating the excited and ground manifolds) or, more probably, to concentration quenching. The latter phenomenon corresponds to the trapping of the energy by a color center, an impurity, or a lattice defect (Yb2+, rare-earth impurities or hydroxyl groups have been suggested as possible "dark sites" [122-124]) after several transfers of excitation between neighbouring ions.
- ηl is the laser extraction efficiency, defined as the fraction of excited ions that are extracted by stimulated emission. An expression of the laser extraction efficiency can be derived by writing the stimulated, spontaneous and nonradiative relaxation rates [73].
In most cases an approximate relation can be used by neglecting reabsorption at the laser wavelength. In the latter case we obtain the simplified expression:

$$\eta_l \approx [\ldots] \qquad \text{(V.2)}$$

where I is the intracavity laser intensity, and $\sigma_{em}(\lambda_l)$ the emission cross section at the laser wavelength. As shown by eq. (V.2), the laser extraction efficiency tends towards 1 for intracavity laser intensities that exceed the laser saturation intensity. Generally, cw oscillators based on Yb-doped materials work with high-reflectivity output couplers: as a consequence the intracavity laser intensity is very high, at least one order of magnitude higher than the laser saturation intensity, so that ηl is typically close to unity in an operating Yb laser. In this case, the thermal load becomes nearly independent of the radiative quantum efficiency, and is given only by the quantum defect. Nevertheless, the quantum efficiency directly affects the excited-state population and is therefore of crucial importance for Q-switched lasers or low-repetition-rate amplifiers. Incidentally, relation (V.2) also illustrates that the performance of an Yb-based cw oscillator becomes nearly independent of the emission cross section at the laser wavelength, provided that the pump intensity is far higher than the pump saturation intensity.

V.2. Evidence of nonradiative effects in Yb-doped materials: the example of Yb:YAG

During the past decade, it has been assumed and claimed many times that nonradiative effects did not exist in Yb-doped materials, by virtue of the very simple electronic structure, which prevents deleterious effects that are well known with other ions, such as cross relaxation, excited-state absorption or upconversion. Nevertheless, all the measurements performed in the past few years contradict this statement. Blows et al. have demonstrated clear evidence of a nonradiative path in Yb:YAB [110], Ramirez et al.
[125] in Yb:MgO:LiNbO3, and as for YAG, non-unity quantum efficiencies have been reported by Barnes et al. [126], Patel et al. [127], Ramirez et al. [125] and Chenais et al. [121]. This recent work has also shown the existence of nonradiative effects in Yb:GGG, Yb:YSO, Yb:KGW, Yb:YCOB and Yb:GdCOB.

An example of a quantum efficiency measurement by the thermal lens method is shown in figure 31 for an Yb:YAG crystal. A simple qualitative explanation is given in figure 32. The clear difference between the thermal lens dioptric power under lasing and nonlasing conditions can be modelled using eq. (V.1) and eq. (V.2), provided that the laser power is measured simultaneously. Given some additional approximations, detailed in [121], one can retrieve both the radiative quantum efficiency and the thermo-optic coefficient χ. The results recently reported for different crystals are summarized in table 5.

Figure 31: Thermal lens dioptric power (here, aberrations of the thermal lens were negligible) under lasing and nonlasing conditions. On the right (same graph), the measured laser power, useful to compute the laser extraction efficiency (from [121]). [Plot: thermal lens dioptric power and laser power versus absorbed pump power (0 to 5 W) for an 8-at.% Yb:YAG crystal; curves: TL nonlasing (experiment and model), TL lasing (experiment and model), laser power.]

Figure 32: Simple qualitative explanation of the observed difference between thermal lens dioptric power under lasing and nonlasing conditions. When the laser is on, stimulated emission short-circuits the nonradiative path, causing the thermal load to be lower. [Diagram: energy levels 2F5/2 and 2F7/2 under nonlasing and lasing conditions, with fluorescence, laser emission and nonradiative (NR) decay.]

It is seen that for a crystal like YAG, many different techniques have been used (for the details of each method, refer to the cited publications), and that the reported quantum efficiencies are widely scattered.
This dispersion tends to support the conjecture of concentration quenching as the nonradiative source in Yb-doped materials. This conjecture has been supported by observations in highly doped Yb:YAG samples, where the quenching was attributed to cooperative processes between two Yb3+ ions towards Yb2+ impurities [122]. Owing to the intrinsic nature of concentration quenching and the major role played by impurities, it is clear that the radiative quantum efficiency is a parameter that pertains to a single given sample, characterized by its doping concentration, its growth technique and the associated environment (in particular the nature of the crucible), and of course the degree of purity of the compounds.

Table 5: Reported values of measured thermo-optic coefficients and radiative quantum efficiencies in the literature for different Yb-doped materials.

Crystal | Mean fluorescence wavelength λf (nm) | Thermo-optic coefficient (×10^-6 K^-1), measured under diode-pumping conditions | Radiative quantum efficiency ηr (measured) | Method used (for quantum efficiency measurement) | Reference
Yb:YAG | 1007 | 10.0 (from [121]) | 0.70 | Thermal lensing (Shack-Hartmann) | [121]
 | | | 0.874 | Photometric | [126]
 | | | 0.835 | Calorimetric | [126]
 | | | 0.97 | Lifetime | [127]
 | | | 0.85 | Direct temperature measurement | [125]
Yb:GGG | 1013 | 31 | 0.90 | Thermal lensing (Shack-Hartmann) | [121]
Yb:GdCOB | 1011 | 6.5 | 0.71 | idem | [121]
Yb:YCOB | 1035 | 17 | 0.90 | idem | [121]
Yb:KGW | 993 | 7.5 | 0.96 | idem | [121]
Yb:YSO | 1001 | 15 | 0.89 | idem | [121]
Yb:YAB | 996 | Not reported | 0.88 | Thermal lensing (lateral shearing) | [110]

V.3. Laser wavelength dependence of the thermal load in Yb-doped broadband materials: the example of Yb:Y2SiO5

We saw that the fractional thermal load (eq. V.1) depends on the operating wavelength of the laser. In most practical circumstances, however, this dependence is hidden by the fact that the laser extraction efficiency also depends strongly on the laser wavelength, since it is linked to the emission cross section.
The Yb:Y2SiO5 (Yb:YSO [128]) crystal exhibits two maxima of comparable amplitude in its emission spectrum, at 1042 and 1058 nm respectively. In addition, the output is naturally linearly polarized along the crystallophysic axis Y for both wavelengths. It is thus a good candidate for clearly demonstrating the influence of the laser wavelength on thermal lensing. To perform the experiment, we added an SF6 dispersive prism cut at Brewster angle in the collimated arm of the three-mirror cavity shown in figure 30, so that identical laser efficiencies were achieved at both wavelengths (2.1 W were obtained for 8.5 W of absorbed pump power). The results are shown in figure 33. The thermal lens is clearly weaker when the laser oscillates at 1042 nm, as expected since the quantum defect is lower at this wavelength. The theoretical curve for laser oscillation at 1058 nm was obtained from the 1042-nm curve simply by changing the wavelength and the emission cross section in eq. (V.1) and (V.2), without any adjustable parameter (see [121]). This simple experiment shows the value of multiwavelength thermal lensing measurements in broadband materials. Indeed, we have considered here a simple formulation of the fractional thermal load (given by eq. V.1), which gives satisfactory results; but it is possible, from the work of Patel et al. [127] for instance, to derive a more accurate expression of the fractional thermal load, which takes into account the probability of excitation transfer to a neighbouring ion [129]. If measurements are performed versus absorbed power at more than two laser wavelengths, for which the laser extraction efficiency is also known, it becomes possible to infer other spectroscopic parameters (such as the transfer probability to a neighbouring ion) involved in the expression of the thermal load.
Figure 33: Thermal lens dioptric power at 1042 and 1058 nm (left axis) and laser power (right axis) versus absorbed pump power (0 to 8 W) in a 5-at.% Yb:YSO crystal (experiment and model). The effect of the laser wavelength on the quantum defect is clearly visible for this material (from [121]).

V.4. The influence of the mean fluorescence wavelength on the thermal load: an illustration with Yb:KGW

The fact that the mean fluorescence wavelength affects the fractional thermal load is a very interesting feature of broadband Yb-doped materials. There are even some materials whose mean fluorescence wavelength is below the tail of the absorption spectrum. More precisely, according to eq. (V.1), if a pump wavelength can be found that satisfies

$$\lambda_f < \eta_p\,\eta_r\,\lambda_p \qquad \text{(V.3)}$$

then the thermal load will be negative (in the absence of laser extraction) and “radiative cooling” will be achieved. Such radiative cooling was reported in Yb-doped ZBLANP glass in 1995 [130] and in a KGW crystal by Bowman et al. in 2002 [131]. The latter author has also proposed using this phenomenon to realize a thermal-load-free (radiation-balanced) laser [132]. The key idea is to adjust the laser intensity so that the spontaneous emission rate (the source of cooling, provided the pump wavelength is long enough) exactly balances the stimulated emission rate (the source of heating). We show here how the mean fluorescence wavelength plays a key role in the interpretation of the results obtained when measuring the thermal lens in a Yb:KGW crystal. KGW is now a well-known crystal, suited for ultrafast laser applications [32-33, 17, 46, 74]. When used with wavevector k//c, the polarization-averaged mean fluorescence wavelength is 993 nm, while the observed laser wavelength is 1030 nm and the pump wavelength is tuned to the zero-line absorption peak, i.e. 979 nm.
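The wavelength arguments above can be checked numerically with a minimal sketch. It assumes that eq. (V.1), which is not reproduced in this section, has the form η_h = 1 − η_p[(1 − η_l) η_r λ_p/λ_f + η_l λ_p/λ_l]; this form reduces to condition (V.3) when there is no laser extraction. The function names, the limiting value η_l = 1 and the out-of-band pump wavelength of 1040 nm are illustrative assumptions, not values taken from the measurements.

```python
def fractional_thermal_load(lambda_p, lambda_f, lambda_l=None,
                            eta_r=1.0, eta_l=0.0, eta_p=1.0):
    """Fraction of the absorbed pump power converted into heat.

    Assumed form of eq. (V.1):
        eta_h = 1 - eta_p * [(1 - eta_l) * eta_r * lambda_p / lambda_f
                             + eta_l * lambda_p / lambda_l]
    Wavelengths only need to share the same unit (nm here).
    """
    if eta_l > 0.0 and lambda_l is None:
        raise ValueError("lambda_l is required when eta_l > 0")
    # Share of the excitation that leaves the crystal as fluorescence.
    fluorescence = (1.0 - eta_l) * eta_r * lambda_p / lambda_f
    # Share of the excitation extracted by stimulated emission.
    laser = eta_l * lambda_p / lambda_l if eta_l > 0.0 else 0.0
    return 1.0 - eta_p * (fluorescence + laser)


def cooling_pump_threshold(lambda_f, eta_r=1.0, eta_p=1.0):
    """Pump wavelength above which condition (V.3) holds (no laser extraction)."""
    return lambda_f / (eta_p * eta_r)


# Yb:KGW wavelengths quoted in the text (nm); eta_r = 1 as in the simple picture.
KGW_PUMP, KGW_FLUO, KGW_LASER = 979.0, 993.0, 1030.0

load_nonlasing = fractional_thermal_load(KGW_PUMP, KGW_FLUO)
# Limiting case (assumption): every excitation is extracted by the laser.
load_lasing = fractional_thermal_load(KGW_PUMP, KGW_FLUO, KGW_LASER, eta_l=1.0)

print(f"Yb:KGW, nonlasing: {load_nonlasing:.4f}")
print(f"Yb:KGW, lasing (eta_l = 1): {load_lasing:.4f}")
print(f"Cooling threshold for eta_r = 0.96: "
      f"{cooling_pump_threshold(KGW_FLUO, eta_r=0.96):.1f} nm")
```

With η_r = 1 the load is larger under lasing than without lasing, because the laser photon at 1030 nm carries away less energy than the mean fluorescence photon at 993 nm. The threshold function only restates (V.3): radiative cooling requires a pump wavelength longer than λ_f/(η_p η_r).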
Results are shown in figure 34, where it can be seen that, unlike the results reported above, thermal lensing is actually stronger under lasing conditions. A simple explanation is given by the schematic picture in figure 35. The simple model suggested above fits the experimental data well and yields a high quantum efficiency for this sample (0.96), consistent with the fact that tungstate crystals are grown by the flux method, which is known to introduce fewer impurities during growth than the Czochralski technique.

Figure 34: Thermal lensing measurements in a 5-at.% Yb:KGW crystal: thermal lens dioptric power under lasing and nonlasing conditions (experiment and model) and laser power, versus absorbed pump power (0 to 6 W) (from [121]).

Figure 35: Qualitative explanation of the paradoxical behaviour of the Yb:KGW crystal (energy-level diagram between the 2F5/2 and 2F7/2 manifolds, with fluorescence at 993 nm and laser emission at 1030 nm). Under nonlasing conditions (left), the thermal defect is low due to a low mean fluorescence wavelength (993 nm). When laser extraction at 1030 nm becomes the dominant de-excitation channel, the quantum defect is higher. This simple picture assumes that ηr = 1.

V.5. Conclusion

To conclude this final part of the paper, we have shown some examples of thermal lensing measurements, which allowed us to highlight several points of interest pertaining to Yb-doped laser media:
- All measurements show a difference between thermal lensing dioptric power with and without laser action: this proves that in all the crystals under investigation (here YAG, GGG, YSO, GdCOB, YCOB, KGW, YAB) there is a nonradiative return path for the excited-state population. This is all the more detrimental for laser performance in the pulsed regime, since in cw oscillators the laser extraction efficiency can be large enough to mask this effect.
- Since Yb-doped materials have broader spectra than their Nd-doped counterparts, thermal lensing has two specific properties, which have been illustrated by two experiments: 1) the thermal load depends on the laser oscillation wavelength: the shorter the laser wavelength, the lower the quantum defect; 2) the mean fluorescence wavelength λf plays a key role. Materials exhibiting a low λf will be less sensitive to heating under nonlasing conditions, and in KGW we saw that the thermal lens was even stronger under lasing action.

The peculiarities of Yb-doped materials are not limited to these two facts. Chenais et al. [121] have reported the appearance of a roll-off in the thermal lens dioptric power at high pumping densities, under nonlasing conditions, in several different materials. Furthermore, several groups ([110], [124]) have observed a green luminescence that can be related to cooperative luminescence. The detailed interpretation of these phenomena and their connection to the thermal load are still works in progress.

VI. Conclusion

Ytterbium-doped materials have brought new prospects and deep changes to the field of high-power solid-state lasers. Combined with new and clever pumping concepts (fiber lasers, thin-disk lasers, spinning-disk lasers...), they are now well-established competitors of Nd-doped materials for high-power applications. This paper has attempted to review recently published work dealing with thermal effects in solid-state lasers, with a particular focus on the special case of Yb-doped crystals. In the first section we presented the general properties of ytterbium-doped media, and pointed out the crucial role of the host matrix in the properties of the laser material. Part II was devoted to a detailed presentation of the temperature distribution in a diode-pumped Yb-doped crystal: how to calculate it and how to measure it.
We pointed out the importance of boundary conditions, and gave some practical information about the role of the thermal contacts in the temperature profile. We showed that pump absorption saturation effects and pump beam divergence inside the crystal can easily be included, by exploiting the fact that the heat transfer coefficient towards the end faces is far smaller than towards the edge faces. In the third part of this review we focussed on the thermo-optical properties and made a fairly detailed study of the so-called thermal lensing phenomenon. A comprehensive understanding of this aspect requires a good knowledge of both the thermo-mechanical and thermo-optical properties of the materials under consideration. This led us to point out several inaccuracies in previous works, concerning the calculation of photoelastic constants for isotropic materials and, more generally, the misuse of the dn/dT parameter to estimate the magnitude of the thermal lens. We have shown that the partial derivative of the index with respect to temperature, taken at constant strain, is the most appropriate figure, because dn/dT is classically measured under experimental conditions that are not consistent with the usual situation of a diode-pumped crystal under mechanical stress. We proposed an alternative way to split the thermo-optic coefficient into three truly independent terms, and concluded with a schematic diagram of thermal effects showing how all the different apparent consequences (lensing, depolarization, strain-induced birefringence, astigmatism, fracture...) are connected to each other. Given the high complexity of these thermo-optical phenomena, and the impossibility of precise calculations as long as not all the properties of a crystal are known (which is the case for the majority of laser crystals), we then focussed our attention on thermal lensing measurement techniques, which was the topic of part IV.
We presented in that section a review of what are, to our knowledge, the main techniques that can be employed to measure thermal lensing in end-pumped laser media, and discussed their relative advantages and drawbacks. Finally, we presented some examples of thermal lensing measurements that have been reported recently in Yb-doped crystals. All these measurements agree in finding non-unity radiative quantum efficiencies for the Yb-doped materials under investigation. This non-ideal behaviour, presumably related to concentration quenching, contradicts the common assumption that Yb-doped materials are totally free of deleterious nonradiative effects. We concluded this review by giving some examples of the influence of the laser operating wavelength on the thermal load, as well as the influence of the mean fluorescence wavelength.

Acknowledgements: We wish to acknowledge all the people involved in the realization of this review: first, Romain Gaumé (Stanford University) for fruitful discussions about issues related to dn/dT and the characterization of materials; Bruno Viana, Gerard Aka and Daniel Vivien from the Laboratoire de Chimie Appliquée de l’Etat Solide (Ecole Nationale Supérieure de Chimie Paris, France), Alain Brenier and Georges Boulon from the Laboratoire de Physico-Chimie des Materiaux Luminescents (Lyon, France) and finally Bernard Ferrand from CEA-LETI (Grenoble, France), for growing the crystals used in most of the experimental results reported here. We gratefully acknowledge the Ecole Supérieure d’Optique (Orsay, France) for loaning us the Shack-Hartmann wavefront sensor as well as the infrared camera. The experiments related to our previous works were carried out under the auspices of the CNRS and the Délégation Générale de l’Armement (DGA). We also thank Jean-Christophe Chanteloup and Judith Dawes for allowing us to reproduce some figures taken from their works.

VII.
Appendix: Calculation of the photoelastic constants Cr,θ and C’r,θ using plane strain and plane stress approximations.

The Cr,θ constants are useful parameters to account for photoelastic effects in the simplest cases (optically isotropic crystals or glasses, and parabolic dependence of strain inside the crystal). When Koechner first published the derivation of these constants in 1970, the term accounting for thermal dilatation in the generalized Hooke law was omitted [51]. In 1992, Cousins pointed out the mistake but did not correct, at that time, the expressions of the constants Cr,θ, whose derivation nevertheless requires the use of the generalized Hooke law. In the latest edition of Koechner’s reference book (Solid State Laser Engineering, Springer Verlag), the faulty equations were actually removed or corrected, but the formulation of the photoelastic constants Cr,θ has remained unchanged since the first edition. Because the generalized Hooke law is used to infer an expression of the constants, their expression depends on which assumption has been made about stresses and strains. In Koechner’s calculation, the plane strain approximation was made. In the case of end-pumping, however, it is well known that the plane stress approximation is closer to reality. In this appendix we derive the expression of the photoelastic constants within the framework of the two approximations.

VII.1. Basic equations

We restrict our discussion to cubic crystals. In this case, only the elasto-optical coefficients p11, p12, p44 are nonzero. These coefficients are given in the [100] system linked to the crystal. In optically isotropic materials, the principal axes of strain are given by the cylindrical coordinate system axes (radial, tangential, axial). After a change of coordinates one obtains [77], for the strain-induced changes ΔB of the indicatrix coefficients:

$$\Delta B_r = \tfrac{1}{2}\left(p_{11}+p_{12}+2p_{44}\right)\varepsilon_r + \tfrac{1}{6}\left(p_{11}+5p_{12}-2p_{44}\right)\varepsilon_\theta + \tfrac{1}{3}\left(p_{11}+2p_{12}-2p_{44}\right)\varepsilon_z \qquad \text{(A.1)}$$

for the radial index, and:

$$\Delta B_\theta = \tfrac{1}{6}\left(p_{11}+5p_{12}-2p_{44}\right)\varepsilon_r + \tfrac{1}{2}\left(p_{11}+p_{12}+2p_{44}\right)\varepsilon_\theta + \tfrac{1}{3}\left(p_{11}+2p_{12}-2p_{44}\right)\varepsilon_z \qquad \text{(A.2)}$$

for the tangential index.
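As stated by Shoji et al. [119], these expressions are valid only for propagation along the [111] axis. For YAG we have: p11 = −0.029, p12 = +0.0091, p44 = −0.0615. The strains are related to the stresses by the generalized Hooke law, which in our case (whatever the approximation we make) reads:

$$\begin{aligned}
\varepsilon_r &= \frac{1}{E}\left[\sigma_r - \nu\left(\sigma_\theta + \sigma_z\right)\right] + \alpha\,(T - T_c)\\
\varepsilon_\theta &= \frac{1}{E}\left[\sigma_\theta - \nu\left(\sigma_r + \sigma_z\right)\right] + \alpha\,(T - T_c)\\
\varepsilon_z &= \frac{1}{E}\left[\sigma_z - \nu\left(\sigma_r + \sigma_\theta\right)\right] + \alpha\,(T - T_c)
\end{aligned} \qquad \text{(A.3)}$$

where E is the Young modulus, ν the Poisson ratio and α the thermal expansion coefficient.

VII.2. Derivation of Cr,θ using the plane strain approximation (Koechner’s case)

We consider a homogeneously pumped rod. Only the radial dependence of stresses and temperature is considered, since the additional constants represent a constant phase shift which does not affect the phase profile. We define $S = \dfrac{\alpha E}{16 K_c (1-\nu)}$, with $K_c$ the thermal conductivity, and Q the heat power dissipated per unit volume (set by $\eta_h$ and the absorbed pump power $P_{abs}$). The stress and temperature fields, within the plane strain approximation, are given by Koechner [69]:

$$\sigma_r(r) = Q S\, r^2 + \text{cte}, \qquad \sigma_\theta(r) = 3\,Q S\, r^2 + \text{cte}, \qquad \sigma_z(r) = 4\,Q S\, r^2 + \text{cte} \qquad \text{(A.4)}$$

$$T(r) = -\frac{Q}{4 K_c}\, r^2 + \text{cte} \qquad \text{(A.5)}$$

Omitting the constant terms, and noting that $\alpha Q/(4K_c) = 4(1-\nu)\,QS/E$, the generalized Hooke law yields:

$$\varepsilon_z = \frac{1}{E}\left[\sigma_z - \nu\left(\sigma_r + \sigma_\theta\right)\right] + \alpha T = \left[4(1-\nu) - 4(1-\nu)\right]\frac{Q S\, r^2}{E} = 0 \qquad \text{(A.6)}$$

which satisfies the plane strain condition. We also have:

$$\varepsilon_r = \frac{1}{E}\left[\sigma_r - \nu\left(\sigma_\theta + \sigma_z\right)\right] + \alpha T = \left[(1-7\nu) - 4(1-\nu)\right]\frac{Q S\, r^2}{E} = -3(1+\nu)\,\frac{Q S\, r^2}{E} \qquad \text{(A.7)}$$

$$\varepsilon_\theta = \frac{1}{E}\left[\sigma_\theta - \nu\left(\sigma_r + \sigma_z\right)\right] + \alpha T = \left[(3-5\nu) - 4(1-\nu)\right]\frac{Q S\, r^2}{E} = -(1+\nu)\,\frac{Q S\, r^2}{E} \qquad \text{(A.8)}$$

The index variations are related to the strains by:

$$\Delta n_{r,\theta} = -\tfrac{1}{2}\, n_0^3\, \Delta B_{r,\theta} \qquad \text{(A.9)}$$

and the $\Delta B_{r,\theta}$ are given by (A.1) and (A.2). Following Koechner, we can write the index variations in the form:

$$\Delta n_{r,\theta} = -\tfrac{1}{2}\, n_0^3\, \frac{\alpha Q}{K_c}\, C_{r,\theta}\, r^2 \qquad \text{(A.10)}$$

with:

$$C_r = \frac{(1+\nu)\left(5p_{11} + 7p_{12} + 8p_{44}\right)}{48\,(\nu-1)} \qquad \text{(A.11)}$$

$$C_\theta = \frac{(1+\nu)\left(p_{11} + 3p_{12}\right)}{16\,(\nu-1)} \qquad \text{(A.12)}$$

For YAG, we find Cr = +0.020 and Cθ = +1.77 × 10^-4. The values computed by Koechner are Cr = 0.017 and Cθ = −0.0025 (with the same values taken for pmn and the same Poisson ratio: ν = 0.25). The error made by Koechner is about 20%.

These coefficients have been extensively used for the purpose of evaluating depolarization losses in Nd:YAG. In this latter case, the relevant parameter is [60]:

$$C_B = \frac{C_\theta - C_r}{2} \qquad \text{(A.13)}$$

With (A.11) and (A.12), we find:

$$C_B = \frac{1+\nu}{48\,(1-\nu)}\left(p_{11} - p_{12} + 4p_{44}\right) \qquad \text{(A.14)}$$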
which is coincidentally the same formula as that obtained by Koechner from incorrect expressions of Cr and Cθ.

VII.3. Derivation of C’r,θ using the plane stress approximation (end pumping case)

We consider a thin disk, pumped by a top-hat pump beam profile of radius wp equal to the rod radius. This is also true in the conditions defined in Section III.4, provided that we are only concerned with the region r < wp and that integrated values of parameters along the crystal length are considered. Let us define $S' = \dfrac{\alpha E}{16 K_c}$, and let Q’ denote the heat source term associated with the length-integrated quantities. The temperature distribution is given by:

$$\langle T(r) \rangle = -\frac{Q'}{4 K_c}\, r^2 + \text{cte} \qquad \text{(A.15)}$$

where we use Cousins’ notation (see Part III.4), meaning that the bracketed quantity is integrated along the rod length L, and is then homogeneous to the said quantity times a length. We have:

$$\langle \sigma_r(r) \rangle = Q' S'\, r^2 + \text{cte}, \qquad \langle \sigma_\theta(r) \rangle = 3\,Q' S'\, r^2 + \text{cte}, \qquad \langle \sigma_z(r) \rangle = 0 \ \text{(plane stress)} \qquad \text{(A.16)}$$

From (A.3) we obtain:

$$\langle \varepsilon_r \rangle = -3(1+\nu)\,\frac{Q' S'\, r^2}{E}, \qquad \langle \varepsilon_\theta \rangle = -(1+\nu)\,\frac{Q' S'\, r^2}{E}, \qquad \langle \varepsilon_z \rangle = -4(1+\nu)\,\frac{Q' S'\, r^2}{E} \qquad \text{(A.17)}$$

By analogy with (A.10), the index shift can be written in the form:

$$\langle \Delta n_{r,\theta} \rangle = -\tfrac{1}{2}\, n_0^3\, \frac{\alpha Q'}{K_c}\, C'_{r,\theta}\, r^2$$

which yields, using (A.1), (A.2), and (A.9):

$$C'_r = -\frac{(1+\nu)\left(9p_{11} + 15p_{12}\right)}{48} \qquad \text{(A.18)}$$

$$C'_\theta = -\frac{(1+\nu)\left(7p_{11} + 17p_{12} - 8p_{44}\right)}{48} \qquad \text{(A.19)}$$

For YAG: C’r = +0.0032 and C’θ = −0.011. The optical indicatrix does not deform in the same way as in the case of the long thin rod: here the tangential index becomes greater than n0. An interesting feature is the stress-induced birefringence term, defined previously (eq. A.13):

$$C'_B = \frac{1+\nu}{48}\left(p_{11} - p_{12} + 4p_{44}\right) \qquad \text{(A.20)}$$

This expression differs from (A.14) only by a factor of 1−ν (= 0.75 for YAG, as for most materials). It is of special interest to calculate the average value of the photoelastic constants:

$$\frac{C'_r + C'_\theta}{2} = -0.0039$$

which gives an order of magnitude of the role of the photoelastic effect in the thermal lens. Here, in the end-pumping case, the thermal stresses yield a divergent contribution to the thermal lens, whereas it was convergent in the case of side-pumping.
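The numerical values quoted in this appendix can be reproduced with a short sketch. It assumes the closed forms C_r = (1+ν)(5p11 + 7p12 + 8p44)/(48(ν−1)) and C_θ = (1+ν)(p11 + 3p12)/(16(ν−1)) for plane strain, C'_r = −(1+ν)(9p11 + 15p12)/48 and C'_θ = −(1+ν)(7p11 + 17p12 − 8p44)/48 for plane stress, and C_B = (C_θ − C_r)/2. These forms are inferred from the YAG values given in the text and should be checked against the original typeset equations. The function names are illustrative.

```python
def plane_strain_constants(p11, p12, p44, nu):
    """Photoelastic constants (C_r, C_theta) for a long rod (plane strain).

    Assumed closed forms, inferred from the YAG values quoted in the text.
    """
    c_r = (1 + nu) * (5 * p11 + 7 * p12 + 8 * p44) / (48 * (nu - 1))
    c_theta = (1 + nu) * (p11 + 3 * p12) / (16 * (nu - 1))
    return c_r, c_theta


def plane_stress_constants(p11, p12, p44, nu):
    """Photoelastic constants (C'_r, C'_theta) for a thin disk (plane stress).

    Assumed closed forms, inferred from the YAG values quoted in the text.
    """
    c_r = -(1 + nu) * (9 * p11 + 15 * p12) / 48
    c_theta = -(1 + nu) * (7 * p11 + 17 * p12 - 8 * p44) / 48
    return c_r, c_theta


def birefringence_constant(c_r, c_theta):
    """Stress-induced birefringence parameter, taken as (C_theta - C_r) / 2."""
    return (c_theta - c_r) / 2


# YAG elasto-optical coefficients and Poisson ratio quoted in the appendix.
YAG = dict(p11=-0.029, p12=0.0091, p44=-0.0615, nu=0.25)

c_r, c_theta = plane_strain_constants(**YAG)
cp_r, cp_theta = plane_stress_constants(**YAG)
c_b = birefringence_constant(c_r, c_theta)
cp_b = birefringence_constant(cp_r, cp_theta)

print(f"plane strain: C_r = {c_r:.4f}, C_theta = {c_theta:.2e}, C_B = {c_b:.4f}")
print(f"plane stress: C'_r = {cp_r:.4f}, C'_theta = {cp_theta:.4f}, C'_B = {cp_b:.4f}")
print(f"C'_B / C_B = {cp_b / c_b:.2f}")
```

Under these assumed forms the plane strain values come out close to +0.020 and +1.77 × 10^-4, and the ratio C'_B/C_B equals 1 − ν = 0.75. The plane stress tangential constant evaluates to about −0.0116, slightly larger in magnitude than the rounded −0.011 quoted above.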
See subsections 6 to 8 of Section III for a detailed discussion about the use of these parameters.

References:
1. T.Y. Fan: “Diode-pumped Solid-State Lasers”, in Laser Sources and Applications, Proc. of the 47th Scottish Universities Summer School in Physics, St Andrews, edited by A. Miller and D. Finlayson, SUSSP Publications & Institute of Physics Publishing, 1996.
2. D. Hanna and W. Clarkson: “A review of diode-pumped lasers”, in Advances in Lasers and Applications, Proc. of the 52nd Scottish Universities Summer School in Physics, St Andrews, edited by D. Finlayson and B. Sinclair, SUSSP Publications & Institute of Physics Publishing, 1999.
3. R.J. Keyes, T.M. Quist: “Injection luminescent pumping of CaF2:U3+ with GaAs diode lasers”, Appl. Phys. Lett. 4, 50-52 (1964).
4. W. Streifer, D. Scifres, G. Harnagel, D. Welch, J. Berger, M. Sakamoto: “Advances in laser diode pumps”, IEEE J. Quant. Elec., Vol. 24, pp. 883-894 (1988).
5. D. Brown: “Ultrahigh-average power diode-pumped Nd:YAG and Yb:YAG lasers”, IEEE J. Quant. Elec., Vol. 33, No. 5, pp. 861-873 (1997).
6. W. Krupke: “Ytterbium Solid-State Lasers – The first decade”, IEEE Journal On Sel. Topics in Quant. Elec. 6, 1287-1296 (2000).
7. P. Hardman, W. Clarkson, G. Friel, M. Pollnau, D. Hanna: “energy-transfer upconversion and thermal lensing in high-power end-pumped Nd:YLF laser crystals”, IEEE J. Quant. Elec., Vol. 35, No. 4, pp. 647-655 (1999).
8. T.Y. Fan: “Heat generation in Nd:YAG and Yb:YAG”, IEEE J. Quant. Elec., Vol. 29, No. 6, pp. 1457-1459 (1993).
9. D. Sumida, A. Betin, H. Bruesselbach, R. Byren, S. Matthews, R. Reeder, M. Mangir: “Diode-pumped Yb:YAG catches up with Nd:YAG”, Laser Focus World, June 1999, pp. 63-70.
10. L. Johnson, J. Geusic, L. Van Uitert: “Coherent oscillations from Tm3+, Ho3+, Yb3+ and Er3+ ions in yttrium aluminium garnet”, Appl. Phys. Lett., Vol. 7, No. 5, pp. 127-129 (1965).
11. R. Wynne, J. Daneu, T. Y.
Fan: “thermal coefficients of the expansion and refractive index in YAG”, Appl. Opt., vol. 38, no. 15, pp. 3282-3284 (1999).
12. P. Lacovara, H. Choi, C. Wang, R. Aggarwal, T. Fan: “room-temperature diode-pumped Yb:YAG laser”, Opt. Lett., vol. 16, No. 14, pp. 1089-1091 (1991).
13. G.A. Slack and D.W. Oliver: “Thermal conductivity of garnets and phonon scattering by rare-earth ions”, Phys. Rev. B 4, 592-609 (1971).
14. F. Krausz, M.E. Fermann, T. Brabec, P. F. Curley, M. Hofer, M. H. Ober, C. Spielmann, E. Wintner, A. J. Schmidt: “Femtosecond Solid-State Lasers”, IEEE J. Quant. Elec. 28, 2097-2121 (1992).
15. U. Keller: “Ultrafast all-solid-state laser technology”, Appl. Phys. B 58, 347-364 (1994).
16. J. Aus der Au, D. Kopf, F. Morier-Genoud, M. Moser, U. Keller: “60-fs pulses from a diode-pumped Nd:glass laser”, Opt. Lett. 22, 307 (1997).
17. F. Brunner, T. Südmeyer, E. Innerhofer, F. Morier-Genoud, R. Paschotta, V. E. Kisel, V. G. Shcherbitsky, N. V. Kuleshov, J. Gao, K. Contag, A. Giesen, U. Keller: “240-fs pulses with 22-W average power from a mode-locked thin-disk Yb:KY(WO4)2 laser”, Opt. Lett. 27, 1162 (2002).
18. A. Courjaud, R. Maleck-Rassoul, N. Deguil, C. Hönninger, and F. Salin: “Diode pumped multikilohertz femtosecond amplifier”, in OSA Trends in Optics and Photonics, Advanced Solid-State Lasers, 68, 121-123 (2002).
19. F. Druon, G.J. Valentine, S. Chénais, P. Raybaut, F. Balembois, P. Georges, A. Brun, A.J. Kemp, W. Sibbett, S. Mohr, D. Kopf, D.J.L. Birkin, D. Burns, A. Courjaud, C. Hönninger, F. Salin, R. Gaumé, A. Aron, G. Aka, B. Viana, C. Clerc, H. Bernas: “Diode-pumped femtosecond oscillator using ultra-broad Yb-doped crystals and modelocked using low-temperature grown or ion implanted saturable-absorber mirrors”, Euro. Phys. J. 20, 177-182 (2003).
20. F. Druon, F. Balembois, P. Georges, A. Brun, A. Courjaud, C. Hönninger, F. Salin, A. Aron, F. Mougel, G. Aka, D.
Vivien: “Generation of 90-fs pulses from a mode-locked diode-pumped Yb3+:Ca4GdO(BO3)3 laser”, Opt. Lett. 25, 423-425 (2000).
21. F. Druon, S. Chénais, P. Raybaut, F. Balembois, P. Georges, R. Gaumé, G. Aka, B. Viana, S. Mohr, D. Kopf: “Diode-pumped Yb:Sr3Y(BO3)3 femtosecond laser”, Opt. Lett. 27, 197-199 (2002).
22. F. Druon, S. Chénais, P. Raybaut, F. Balembois, P. Georges, R. Gaumé, P. H. Haumesser, B. Viana, D. Vivien, S. Dhellemmes, V. Ortiz, C. Larat: “Apatite-structure crystal, Yb3+:SrY4(SiO4)3O, for the development of diode-pumped femtosecond lasers”, Opt. Lett. 27, 1914-1916 (2002).
23. F. Druon, F. Balembois, P. Georges: “Laser crystals for the production of ultra-short laser pulses”, Ann. Chim. Mat. 28, 47-72 (2003).
24. C. Hönninger, F. Morier-Genoud, M. Moser, U. Keller, L. R. Brovelli, and C. Harder: “Efficient and tunable diode-pumped femtosecond Yb:glass lasers”, Opt. Lett. 23, 126-128 (1998).
25. C. Hönninger, R. Paschotta, M. Graf, F. Morier-Genoud, G. Zhang, M. Moser, S. Biswal, J. Nees, A. Braun, G. Mourou, I. Johannsen, A. Giesen, W. Seeber, and U. Keller: “Ultrafast ytterbium-doped bulk laser amplifiers”, Appl. Phys. B 69, 3 (1999).
26. E. Innerhofer, T. Südmeyer, F. Brunner, R. Häring, A. Aschwanden, R. Paschotta, C. Hönninger, M. Kumkar, U. Keller: “60-W average power in 810-fs pulses from a thin-disk Yb:YAG laser”, Opt. Lett. 28, 367-369 (2003).
27. Junji Kawanaka, Koichi Yamakawa, Hajime Nishioka, Ken-ichi Ueda: “30-mJ, diode-pumped, chirped-pulse Yb:YLF regenerative amplifier”, Opt. Lett. 28, 2121-2123 (2003).
28. Peter Klopp, Valentin Petrov, Uwe Griebner, Klaus Petermann, Volker Peters, Götz Erbert: “Highly efficient mode-locked Yb:Sc2O3 laser”, Opt. Lett. 29, 391-393 (2004).
29. P. Klopp, V. Petrov, U. Griebner, and G. Erbert: “Passively mode-locked Yb:KYW laser pumped by a tapered diode laser”, Opt. Express 10, 108-113 (2002). http://www.opticsexpress.org/abstract.cfm?URI=OPEX-10-2-108
30.
Peter Klopp, Valentin Petrov, Uwe Griebner, Klaus Petermann, Volker Peters, Götz Erbert: “Highly efficient mode-locked Yb:Sc2O3 laser”, Opt. Lett. 29, 391-393 (2004).
31. H. Liu, S. Biswal, J. Paye, J. Nees, G. Mourou, C. Hönninger, U. Keller: “Directly diode-pumped millijoule subpicosecond Yb:glass regenerative amplifier”, Opt. Lett. 24, 917-919 (1999).
32. H. Liu, J. Nees, G. Mourou: “Diode-pumped Kerr-lens mode-locked Yb:KY(WO4)2 laser”, Opt. Lett. 26, 1723-1725 (2001).
33. Hsiao-Hua Liu, John Nees, Gérard Mourou: “Directly diode-pumped Yb:KY(WO4)2 regenerative amplifiers”, Opt. Lett. 27, 722-724 (2002).
34. P. Raybaut, F. Druon, F. Balembois, P. Georges, R. Gaumé, G. Aka, B. Viana: “Directly diode-pumped oscillators and regenerative amplifiers for ultra-short pulse generation”, Invited paper CThFF5, CLEO 2004.
35. Pierre Raybaut, Frederic Druon, Francois Balembois, Patrick Georges, Romain Gaumé, Bruno Viana, Daniel Vivien: “Directly diode-pumped Yb3+:SrY4(SiO4)3O regenerative amplifier”, Opt. Lett. 28, 2195-2196 (2003).
36. A. Shirakawa, K. Takaichi, H. Yagi, J.-F. Bisson, J. Lu, M. Musha, K. Ueda, T. Yanagitani, T. S. Petrov, and A. A. Kaminskii: “Diode-pumped mode-locked Yb3+:Y2O3 ceramic laser”, Opt. Express 11, 2911-2916 (2003).
37. L. DeLoach, S. Payne, L. Chase, L. Smith, W. Kway, W. Krupke: “Evaluation of absorption and emission properties of Yb3+-doped crystals for laser applications”, IEEE J. Quant. Elec., vol. 29, no. 4, pp. 1179-1191 (1993).
38. S.A. Payne, L.D. DeLoach, L.K. Smith, W.L. Kway, J.B. Tassano, W.F. Krupke, B.H.T. Chai, G. Loutts: “Ytterbium-doped apatite-structure crystals: a new class of laser materials”, J. Appl. Phys. 76, 497-503 (1994).
39. S. Payne, L. Smith, L. DeLoach, W. Kway, J. Tassano, W. Krupke: “Laser, optical, and thermomechanical properties of Yb-doped fluorapatite”, IEEE J. Quant. Elec., vol. 30, No. 1, pp. 170-179 (1994).
40. F. Mougel, G. Aka, A. Kahn-Harari, H. Hubert, J.M. Benitez, D.
Vivien: “Infrared laser performance and self-frequency doubling of Nd3+:Ca4GdO(BO3)3 (Nd:GdCOB)”, Opt. Mat. 8, pp. 161-173 (1997).
41. F. Druon, F. Augé, F. Balembois, P. Georges, A. Brun, A. Aron, F. Mougel, G. Aka, D. Vivien: “Efficient, tunable, zero-line diode-pumped, continuous-wave Yb3+:Ca4LnO(BO3)3 (Ln = Gd, Y) lasers at room temperature and application to miniature lasers”, J. Opt. Soc. Am. B, Vol. 17, No. 1 (2000).
42. P.-H. Haumesser, R. Gaumé, B. Viana, D. Vivien: “Determination of laser parameters of ytterbium-doped oxide crystalline materials”, J. Opt. Soc. Am. B, Vol. 19, No. 10, pp. 2365-2375 (2002).
43. R. Gaumé, P.-H. Haumesser, B. Viana, G. Aka, D. Vivien, E. Scheer, P. Bourdon, B. Ferrand, M. Jacquet, N. Lenain: “Spectroscopic properties and laser performance of Yb3+:Y2SiO5, a new infrared laser material”, OSA TOPS vol. 34, Advanced Solid State Lasers, p. 469, 2000.
44. S. Chénais, F. Druon, F. Balembois, P. Georges, R. Gaumé, B. Viana, D. Vivien, A. Brenier, G. Boulon: “Diode-pumped Yb:GGG laser: comparison with Yb:YAG”, Optical Materials 22, pp. 99-106 (2003).
45. N. V. Kuleshov, A. A. Lagatsky, V. G. Shcherbitsky, V. P. Mikhailov, E. Heumann, T. Jensen, A. Diening, G. Huber: “CW laser performance of Yb and Er, Yb doped tungstates”, Appl. Phys. B 64, pp. 409-413 (1997).
46. A.A. Lagatsky, N.V. Kuleshov, V.P. Mikhailov: “diode-pumped CW lasing of Yb:KYW and Yb:KGW”, Opt. Comm. 165, pp. 71-75 (1999).
47. A. Lucca, M. Jacquemet, F. Druon, F. Balembois, P. Georges, P. Camy, J.L. Doualan, R. Moncorgé: “high power diode-pumped CW laser operation of Yb:CaF2”, post-deadline CPD-A1, CLEO (2004).
48. S. Jiang, M. Myers, D. Rhonehouse, U. Griebner, R. Koch, H. Schonnagel: “Ytterbium doped phosphate laser glasses”, Proceedings of SPIE “Solid State Lasers VI”, Vol. 2986 (1997).
49. J.F. Nye: Physical properties of crystals, Clarendon Press, Oxford, 1985.
50. D.
Brown: “Ultrahigh-average power diode-pumped Nd:YAG and Yb:YAG lasers”, IEEE J. Quant. Elec., Vol. 33, No. 5, pp. 861-873 (1997).
51. W. Koechner: “Absorbed Pump Power, Thermal Profile and Stresses in a cw Pumped Nd:YAG Crystal”, Appl. Opt., Vol. 9, No. 6, pp. 1429-1434 (June 1970).
52. M. Schmid, T. Graf, and H. P. Weber: “Analytical model of the temperature distribution and the thermally induced birefringence in laser rods with cylindrically symmetric heating”, J. Opt. Soc. Amer. B, vol. 17, no. 8, pp. 1398–1404 (2000).
53. U. Farrukh, A. Buoncristiani, C. Byvik: “an analysis of the temperature distribution in finite solid-state laser rods”, IEEE J. Quant. Elec., vol. 24, no. 11, pp. 2253-2263 (1988).
54. M. Innocenzi, H. Yura, C. Fincher, R. Fields: “Thermal modeling of continuous-wave end-pumped solid-state lasers”, Appl. Phys. Lett. 56 (19), pp. 1831-1833 (7 May 1990).
55. Y. Chen, T. Huang, C. Kao, C. Wang, S. Wang: “optimization in scaling fiber-coupled laser-diode end-pumped lasers to higher power: influence on thermal effect”, IEEE J. Quant. Elec., Vol. 33, No. 8, pp. 1424-1429 (1997).
56. F. Sanchez, M. Brunel, K. Aït-Ameur: “Pump-saturation effects in end-pumped solid-state lasers”, J. Opt. Soc. Am. B, Vol. 15, No. 9, pp. 2390-2394 (1998).
57. F. Augé, F. Druon, F. Balembois, P. Georges, A. Brun, F. Mougel, G. P. Aka, and D. Vivien: “Theoretical and experimental investigations of a diode-pumped quasi-three-level laser: The Yb3+-doped Ca4GdO(BO3)3 (Yb:GdCOB) laser”, IEEE J. Quantum Electron., vol. 36, pp. 598–606 (May 2000).
58. H.S. Carslaw, J.C. Jaeger: Conduction of heat in solids, 2nd edition, Clarendon Press, Oxford, 1986.
59. Jacobs and Starr, Rev. Sci. Instr. 10, 140 (1941).
60. A. Cousins: “temperature and thermal stress scaling in finite-length end-pumped laser rods”, IEEE J. Quant. Elec., vol. 28, no. 4, pp. 1057-1069 (1992).
61. J. Marion: “strengthened solid-state laser materials”, Appl. Phys. Lett. 47 (7), pp.
94-96 (1985).
62. J. Marion: “Fracture of solid-state laser slabs”, J. Appl. Phys. 60 (1), pp. 69-77 (1986).
63. J. Marion: “appropriate use of the strength parameter in solid-state slab laser design”, J. Appl. Phys. 62 (5), pp. 1595-1604 (1987).
64. S. Ferré: “Caractérisation expérimentale et simulation des effets thermiques d’une chaîne laser ultra-intense à base de saphir dopé au titane”, PhD thesis, Ecole Polytechnique, 2002.
65. H. Glur, R. Lavi, T. Graf: “Reduction of thermally induced lenses in Nd:YAG rods with low temperatures”, IEEE J. of Quant. Elec., Vol. 40, No. 5, pp. 499-504 (2004).
66. D. Kopf, K. Weingarten, G. Zhang, M. Moser, M. Emanuel, R. Beach, J. Skidmore, U. Keller: “High average power diode-pumped femtosecond Cr:LiSAF lasers”, Appl. Phys. B 65, pp. 235-243 (1997).
67. M. Tsunekane, N. Taguchi, H. Inaba: “reduction of thermal effects in a diode-end-pumped, composite Nd:YAG rod with a sapphire end”, Applied Optics, vol. 37, no. 15, pp. 3290-3294 (1998).
68. A. Giesen, H. Hügel, A. Voss, K. Wittig, U. Brauch, H. Opower: “scalable concept for diode-pumped high-power solid-state lasers”, Appl. Phys. B 58, 365-372 (1994).
69. W. Koechner: Solid State laser engineering, 5th edition, Springer, 1999.
70. W.A. Clarkson: “thermal effects and their mitigation in end-pumped solid-state lasers”, J. Phys. D: Appl. Phys. 34, pp. 2381-2395 (2001).
71. Handbook of Optics, 2nd ed., vol. II (devices, measurements and properties), edited by M. Bass, E. Stryland, D. Williams, W. Wolfe, McGraw-Hill, Inc., 2001.
72. B. Woods, S. Payne, J. Marion, R. Hughes, L. Davis: “thermomechanical and thermo-optical properties of the LiCaAlF6:Cr3+ laser material”, J. Opt. Soc. Am. B, Vol. 8, No. 5, pp. 970-977 (1991).
73. S. Chenais, F. Balembois, F. Druon, G. Lucas-Leclin, P. Georges: “thermal lensing in Diode-pumped Ytterbium Lasers – Part I: Theoretical analysis and wavefront measurements”, IEEE J. Quant. Elec., Vol. 40, No. 9, pp. 1217-1234 (2004).
74.
A.A. Lagatsky, N.V. Kuleshov, V.P. Mikhailov: “diode-pumped CW lasing of Yb:KYW and Yb:KGW”, Opt. Comm. 165, pp. 71-75 (1999).
75. F. Augé, F. Druon, F. Balembois, P. Georges, A. Brun, F. Mougel, G. P. Aka, D. Vivien: “Theoretical and experimental investigations of a diode-pumped quasi-three-level laser: the Yb3+-doped Ca4GdO(BO3)3 (Yb:GdCOB) laser”, IEEE J. Quant. Elec., Vol. 36, No. 5, pp. 598-606 (2000).
76. A. Aron, G. Aka, B. Viana, A. Kahn-Harari, D. Vivien, F. Druon, F. Balembois, P. Georges, A. Brun, N. Lenain, M. Jacquet: “Spectroscopic properties and laser performances of Yb:YCOB and potential of the Yb:LaCOB material”, Opt. Mat. 16, pp. 181-188 (2001).
77. W. Koechner, D.K. Rice: “effect of birefringence on the performance of linearly polarized YAG:Nd lasers”, IEEE J. Quant. Elec., vol. 6, pp. 557-566 (1970).
78. W.C. Scott, M. De Wit, Appl. Phys. Lett., vol. 18, pp. 3-4 (1971).
79. Q. Lu, N. Kugler, H. Weber, S. Dong, N. Muller and U. Wittrock, Opt. Quant. Electron., vol. 28, pp. 57-69 (1996).
80. R. Fluck, M. Hermann, L. Hackel: “energetic and thermal performance of high-gain diode-side pumped Nd:YAG rods”, Appl. Phys. B 70, pp. 491-498 (2000).
81. S. De Nicola, A. Finizio, G. Pierattini and G. Carbonara: “Interferometric method for concurrent measurement of the thermo-optic coefficients of quartz retarders”, Pure Appl. Opt. 3, 209-213 (1994).
82. Y.-F. Tsay, B. Bendow, S. Mitra: “theory of the temperature derivative of the refractive index in transparent crystals”, Phys. Rev. B, Vol. 8, No. 6, pp. 2688-2696 (1973).
83. J. Eichenholz, M. Richardson: “measurement of thermal lensing in Cr3+-doped colquiriites”, IEEE J. Quant. Elec., vol. 34, no. 5, pp. 910-919 (1998).
84. C. Pfistner, R. Weber, H. Weber, S. Merazzi, R. Gruber: “Thermal beam distortions in end-pumped Nd:YAG, Nd:GSGG, and Nd:YLF Rods”, IEEE J. Quant. Elec., vol. 30, no. 7, pp. 1605-1615 (1994).
85. T. Baer, W. Nighan, M.
Keierstead : “modeling of end-pumped, solid-state lasers”, in Conference on Lasers and Electro-Optics, Vol. 11 of 1993 OSA Technical Digest Series (Optical society of America, Washington D.C., 1993), p. 638. 86. K. Kleine, L. Gonzalez, R. Bhatia, L. Marshall,, D. Matthews : “High brightness Nd:YVO4 laser for nonlinear optics”, Advanced Solid State Lasers, M. Fejer, H. Injeyan and U. Keller eds., Vol. 26 of OSA Trends in Optics and Photonics Series (Optical Society of America, Washington D.C., 1999), pp. 157-158. 87. R. Weber, B. Neuenschwender, M. Macdonald, M.B. Roos, H.P. Weber : “Cooling schemes for longitudinally diode-laser pumped Nd:YAG rods”, IEEE J. Quant. Elec. 34, pp. 1046-1053 (1998). 88. X. Peng, A. Asundi, Y. Chen, Z. Xiong : “study of the mechanical properties of Nd:YVO4 crystal by use of laser interferometry and finite-element analysis”, Appl. Opt., Vol. 40, No. 9, pp. 1396-1403 (2001). 89. A. Brignon, J.-P. Huignard, M. H. Garrett, and I. Mnushkina, “Spatial beam cleanup of a Nd:YAG laser operating at 1.06 µm with two-wave mixing in Rh:BaTiO3,” Appl. Opt. 36, 7788-7793 (1997). 90. W. Xie, S. Tam, Y. Lam, Y. Kwon : “Diffraction losses of high power solid state lasers”, Opt. Comm. 189, pp. 337-343 (2001). 91. J. Frauchiger, P. Albers, H. Weber : "modeling of thermal lensing and higher order ring mode oscillation in end-pumped CW Nd:YAG lasers“, IEEE J. Quant. Elec. , vol. 28, no. 4, pp. 1046-1056 (1992). 92. F. Druon, G. Chériaux, J. Faure, G. Vdovin, J.C. Chanteloup, J. Nees, M. Nantel, A. Maksimchuk, G. Mourou, "Wave-front correction of femtosecond terawatt lasers using deformable mirrors", Optics Letters, Vol. 23 No 1, pp. 1043-45, 1998 111 /115 93. J. Gordon, R. Leite, R. Moore, S. Porto, J. Whinnery : “long-transient effects in lasers with inserted liquid samples”, J. Appl. Phys. Vol. 36, no. 1, pp. 3-8 (1965). 94. D. Fournier, A.C. Boccara, N. Amer, R. Gerlach : « sensitive in situ trace-gas detection by photothermal deflection », Appl. Phys. 
Lett. 37 (6), pp. 519-521 (1980). 95. S. Bialkowski : “Photothermal Spectroscopy Methods for Chemical Analysis” Volume 134 Chemical Analysis: A Series of Monographs on Analytical Chemistry and Its Applications, , J. D. Winefordner, Series Editor, John Wiley & Sons, Inc. 1996 96. D. Burnham : “simple measurement of thermal lensing effects in laser rods”, Appl. Opt. vol. 9, no. 7, pp.1727-1728 (1970). 97. C. Hu, J.R. Whinnery : “New thermooptical measurement method and a comparison with other methods”, Appl. Opt. Vol. 12, No. 1, pp. 72-79 (1973). 98. R. Paugstadt, M. Bas : “method for temporally and spatially resolved thermal-lensing measurements”, Appl. Opt. , vol. 33, no. 6, pp. 954-959 (1994). 99. R. Misra, P. Banerjee : “theoretical and experimental studies of pump-induced probe deflection in a thermal medium”, Appl. Opt. , vol. 34, no. 18, pp.3358-3365 (1995). 100. H. Kogelnik, T. Li : “laser beams and resonators”, Appl. Opt. vol. 5, pp. 1550 (octobre 1966). 101. B. Neuenschwander, R. Weber, H. Weber : “determination of the thermal lens in solid-state lasers with stable cavities”, IEEE. J. Quant. Elec. , vol. 31, no. 6, pp. 1082-1087 (1995). 102. J. Frauchiger, P. Albers, H. Weber : "modeling of thermal lensing and higher order ring mode oscillation in end-pumped CW Nd:YAG lasers“, IEEE J. Quant. Elec. , vol. 28, no. 4, pp. 1046-1056 (1992). 103. B. Ozygus, J. Erhard : “thermal lens determination of end-pumped solid-state lasers with transverse beat frequencies”, Appl. Phys. Lett. 67 (10), pp. 1361-1362 (1995). 112 /115 104. B. Ozygus, Q. Zhang : “thermal lens determination of end-pumped solid-state lasers using primary degeneration modes”, Appl. Phys. Lett. 71 (18), pp. 2590-2592 (1997). 105. A. Cabezas, L. Komai, R. Treat : “dynamic measurements of phae shifts in laser amplifiers”, Appl. Opt., Vol. 5, No. 4, pp. 647-651 (1966). 106. H. Welling, C. Bickart : “spatial and temporal variation of the optical path length in flash- pumped laser rods”, J. Opt. Soc. 
Am. Vol. 56, No. 5, pp. 611-618 (1966). 107. C. Pfistner, R. Weber, H. Weber, S. Merazzi, R. Gruber : “ Thermal beam distortions in end-pumped Nd:YAG, Nd:GSGG, and Nd:YLF Rods ”, IEEE. J. Quant. Elec. vol. 30, no. 7, pp. 1605-1615 (1994). 108. A. Khizhnyak, G. Galich, M. Lopiitchouk : “characteristics of thermal lens induced in active rod of cw Nd:YAG laser”, Semiconductor Physics ; Quantum Electronics & Optoelectronics (SQO), Vol. 2, No. 1, pp. 147-152 (1999). 109. J. Blows, J. Dawes, T. Omatsu : “thermal lensing measurements in line-focus end-pumped neodymium yttrium aluminium garnet using holographic lateral shearing interferometry”, J. Appl. Phys. , Vol. 83, No. 6, pp. 2901-2906 (1998). 110. J.L. Blows, P. Dekker, P. Wang , J.M. Dawes , T. Omatsu, "Thermal lensing measurements and thermal conductivity of Yb:YAB" Appl. Phys. B 76, 289-292 (2003) 111. J. Primot : “three-wave lateral shearing interferometer”, Appl. Opt. , Vol 32, No. 31, pp.6242-6249 (1993). J. Primot : “three-wave lateral shearing interferometer”, Appl. Opt. , Vol 32, No. 31, pp.6242-6249 (1993). 112. J-C. Chanteloup: « Multiple-wave lateral shearing interferometry for wave-front sensing », Opt. Lett., Vol. 23, No. 8, pp. 621-623 (1998) 113 /115 113. J-C. Chanteloup, F. Druon, M. Nantel, A. Maksimchuk, G. Mourou : « single-shot wave- front measurements of high-intensity ultrashort laser pulses with a three-wave interferometer », Applied Optics, Vol. 44, No. 9, pp. 1559-1571 (2005) 114. L. Grossard, A. Desfarges-Berthelemot, B. Colombeau, C. Froehly : “iterative reconstruction of thermally induced phase distortion in a Nd3+:YVO4 laser”, J. Opt. A : Pure Appl. Opt. 4 pp. 1-7 (2002). 115. J. Hartmann : “Bemerkungen uber den Bau und die Justirung von Spectrographen”, Zeitung Instrumentenkd., 20 p.47 (1900). 116. D. Armstrong, J. Mansell, D. Neal : “how to avoid beam distortion in solid-state laser design”, Laser Focus World, décembre 1998, pp. 129-132. 117. S. Ito, H. Nagaoka, T. Miura, K. 
Kobayashi, A. Endo, K. Torizuka : “ measurement of thermal lensing in a power amplifier of a terawatt Ti:sapphire laser”, Appl. Phys. B 74, pp. 343- 347 (2002). 118. M. Pittman, S. Ferré, J.P. Rouseau, L. Notebaert, J.P. Chambaret, G. Chériaux : « design and characterization of a near-difraction-limited femtosecond 100-TW 10-Hz high-intensity laser system », Appl. Phys. B 74, pp. 529-535 (2002). 119. I. Shoji, T. Taira : “Drastic reduction of depolarization resulting from thermally induced birefringence by use of a (110)-cut YAG crystal”, OSA TOPS Advanced Solid State Lasers Vol. 68, pp.521-525, 2002. 120. A. Nashimura, K. Akaoka, A. Ohzu, T. Usami, “Temporal changes of thermal lens effects on highly-pumped Ytterbium glass by wavefront measurements”, Journal of Nuclear Science and Technology, Vol. 38, No. 12, p. 1043-1047 (2001) 121. S. Chénais, F. Balembois, F . Druon, G. Lucas-Leclin, P. Georges, "Thermal Lensing in Diode-Pumped Ytterbium Lasers - Part II: evaluation of quantum efficiencies and thermo-optic coefficients." IEEE J. Quantum Electronics Vol. 40 No 9 September, 1235-1243, 2004 114 /115 122. D. F. de Sousa, N. Martynyuk, V. Peters, K. Lunstedt, K. Rademaker, K. Petermann, and S. Basun, “Quenching behavior of highly doped Yb:YAG and YbAG,” in Conf. Lasers Electro- Optics Europe, Tech. Dig., Conf. Ed., 2003, CG1-3. 123. H. Yin, P. Deng, and F. Gan, “Defects in Yb:YAG,” J. Appl. Phys., vol. 83, no. 8, pp. 3825–3828, 1998. 124. P. Yang, P. Deng, and Z. Yin, “Concentration quenching in Yb:YAG,” J. Lumin., pp. 51–54, 125. M. O. Ramirez, D. Jaque, L. E. Bausa, J. A. S. Garcia, and J. G. Solé, “Thermal loading in highly efficient diode pumped ytterbium doped lithium niobate lasers,” presented at the Conf. Lasers and Electro-Optics Europe 2003 (CLEO Europe), Münich, Germany, 2003. 126. N. Barnes, B. Walsh : “Quantum efficiency measurements of Nd:YAG, Yb:YAG, and Tm:YAG”, OSA TOPS Advanced Solid State Lasers Vol. 68, pp.284-287, 2002. 127. F. Patel, E. Honea, J. 
ABSTRACT A review of theoretical and experimental studies of thermal effects in solid-state lasers is presented, with a special focus on diode-pumped ytterbium-doped materials. A large part of this review, however, provides general information applicable to any kind of solid-state laser. Our aim here is not to list the techniques that have been used to minimize thermal effects, but instead to give an overview of the underlying theory and of the state-of-the-art tools at the disposal of the laser scientist for measuring thermal effects. After a presentation of some general properties of Yb-doped materials, we address the issue of evaluating the temperature map in Yb-doped laser crystals, both theoretically and experimentally.
This is the first step before studying the complex problem of thermal lensing (part III). We will focus on some newly discussed aspects, such as the definition of the thermo-optic coefficient: we will highlight some misleading interpretations of thermal lensing experiments that arise from using the dn/dT parameter in a context where it is not relevant. Part IV will be devoted to the state of the art in experimental techniques used to measure thermal lensing. Finally, in part V, we will give some concrete examples in Yb-doped materials and point out their peculiarities. <|endoftext|><|startoftext|> Introduction Numerical simulations Field scattered by a dipole along the interface Analytical model Summary Acknowledgments APPENDIX References ABSTRACT We present a numerical study and analytical model of the optical near-field diffracted in the vicinity of subwavelength grooves milled in silver surfaces. The Green's tensor approach permits computation of the phase and amplitude dependence of the diffracted wave as a function of the groove geometry. It is shown that the field diffracted along the interface by the groove is equivalent to the field obtained when the groove is replaced by an oscillating dipolar line source. An analytic expression is derived from the Green's function formalism that reproduces well both the asymptotic surface plasmon polariton (SPP) wave and the transient surface wave in the near-zone close to the groove. The agreement between this model and the full simulation is very good, showing that the transient "near-zone" regime does not depend on the precise shape of the groove. Finally, it is shown that a composite diffractive evanescent wave model that includes the asymptotic SPP can describe the wavelength evolution in this transient near-zone. Such a semi-analytical model may be useful for the design and optimization of more elaborate photonic circuits whose behavior will in large part be controlled by surface waves.
<|endoftext|><|startoftext|> Introduction

(a) Equipped with the $L^2$-Wasserstein distance $d_W$ (cf. (2.1)), the space $\mathcal{P}(M)$ of probability measures on a Euclidean or Riemannian space $M$ is itself a rich object of geometric interest. Due to the fundamental works of Y. Brenier, R. McCann, F. Otto, C. Villani and many others (see e.g. [Bre91, McC97, CEMS01, Ott01, OV00, Vil03]) there are well understood and powerful concepts of geodesics, exponential maps, tangent spaces $T_\mu\mathcal{P}(M)$ and gradients $Du(\mu)$ of functions on this space. In a certain sense, $\mathcal{P}(M)$ can be regarded as an infinite dimensional Riemannian manifold, or at least as an infinite dimensional Alexandrov space with nonnegative lower curvature bound if the base manifold $(M,d)$ has nonnegative sectional curvature.

A central role is played by the relative entropy $\mathrm{Ent} : \mathcal{P}(M) \to \mathbb{R} \cup \{+\infty\}$ with respect to the Riemannian volume measure $dx$ on $M$,
\[ \mathrm{Ent}(\mu) = \begin{cases} \int_M \rho \log \rho \, dx, & \text{if } d\mu(x) \ll dx \text{ with } \rho(x) = \frac{d\mu(x)}{dx}, \\ +\infty, & \text{else.} \end{cases} \]
The relative entropy as a function on the geodesic space $(\mathcal{P}(M), d_W)$ is $K$-convex for a given number $K \in \mathbb{R}$ if and only if the Ricci curvature of the underlying manifold $M$ is bounded from below by $K$ [vRS05, Stu06]. The gradient flow for the relative entropy in the geodesic space $(\mathcal{P}(M), d_W)$ is given by the heat equation $\frac{\partial}{\partial t}\mu = \Delta\mu$ on $M$ [JKO98]. More generally, a large class of evolution equations can be treated as gradient flows for suitable free energy functionals $S : \mathcal{P}(M) \to \mathbb{R}$ [Vil03].

What has been missing until now is a natural 'Riemannian volume measure' $\mathbb{P}$ on $\mathcal{P}(M)$. The basic requirement will be an integration by parts formula for the gradient. This will imply the closability of the pre-Dirichlet form
\[ \mathcal{E}(u, v) = \int_{\mathcal{P}(M)} \langle Du(\mu), Dv(\mu) \rangle_{T_\mu} \, d\mathbb{P}(\mu) \]
in $L^2(\mathcal{P}(M), \mathbb{P})$, which in turn will be the key tool in order to develop an analytic and stochastic calculus on $\mathcal{P}(M)$. In particular, it will allow us to construct a kind of Laplacian and a kind of Brownian motion on $\mathcal{P}(M)$.
Among others, we intend to use the powerful machinery of Dirichlet forms to study stochastically perturbed gradient flows on $\mathcal{P}(M)$ which, on the level of the underlying spaces $M$, will lead to a new concept of SPDEs (preserving probability by construction).

Instead of constructing a 'uniform distribution' $\mathbb{P}$ on $\mathcal{P}(M)$, for various reasons, we prefer to construct a probability measure $\mathbb{P}^\beta$ on $\mathcal{P}(M)$ formally given as
\[ d\mathbb{P}^\beta(\mu) = \frac{1}{Z_\beta} e^{-\beta \cdot \mathrm{Ent}(\mu)} \, d\mathbb{P}(\mu) \tag{1.1} \]
for $\beta > 0$ and some normalization constant $Z_\beta$. (In the language of statistical mechanics, $\beta$ is the 'inverse temperature' and $Z_\beta$ the 'partition function' whereas the entropy plays the role of a Hamiltonian.)

(b) One of the basic results of this paper is the rigorous construction of such an entropic measure $\mathbb{P}^\beta$ in the one-dimensional case, i.e. $M = S^1$ or $M = [0,1]$. We will essentially make use of the representation of probability measures by their inverse distribution functions $g_\mu$. It allows us to transfer the problem of constructing a measure $\mathbb{P}^\beta_0$ (or $\mathbb{P}^\beta$) on the space of probability measures $\mathcal{P}([0,1])$ (or $\mathcal{P}(S^1)$, resp.) into the problem of constructing a measure $\mathbb{Q}^\beta_0$ (or $\mathbb{Q}^\beta$) on the space $\mathcal{G}_0$ (or $\mathcal{G}$, resp.) of nondecreasing functions from $[0,1]$ (or $S^1$, resp.) into itself.

In terms of the measure $\mathbb{Q}^\beta_0$ on $\mathcal{G}_0$, for instance, the formal characterization (1.1) then reads as follows:
\[ d\mathbb{Q}^\beta_0(g) = \frac{1}{Z_\beta} e^{-\beta \cdot S(g)} \, d\mathbb{Q}_0(g). \tag{1.2} \]
Here $\mathbb{Q}_0$ denotes some 'uniform distribution' on $\mathcal{G}_0 \subset L^2([0,1])$ and $S : \mathcal{G}_0 \to [0,\infty]$ is the entropy functional
\[ S(g) := \mathrm{Ent}(g_*\mathrm{Leb}) = -\int_0^1 \log g'(t) \, dt. \]
This representation is reminiscent of Feynman's heuristic picture of the Wiener measure, now with the energy $H(g) = \frac{1}{2}\int_0^1 g'(t)^2 \, dt$ of a path replaced by its entropy. $\mathbb{Q}^\beta_0$ will turn out to be (the law of) the Dirichlet process or normalized Gamma process.

(c) The key result here is the quasi-invariance (in other words, a change of variable formula) for the measure $\mathbb{P}^\beta$ (or $\mathbb{P}^\beta_0$) under push-forwards $\mu \mapsto h_*\mu$ by means of smooth diffeomorphisms $h$ of $S^1$ (or $[0,1]$, resp.).
This is equivalent to the quasi-invariance of the measure $\mathbb{Q}^\beta$ under translations $g \mapsto h \circ g$ of the semigroup $\mathcal{G}$ by smooth $h \in \mathcal{G}$. The density
\[ \frac{d\mathbb{P}^\beta(h_*\mu)}{d\mathbb{P}^\beta(\mu)} = X^\beta_h(\mu) \cdot Y^0_h(\mu) \]
consists of two terms. The first one,
\[ X^\beta_h(\mu) = \exp\Big( \beta \int_{S^1} \log h'(t) \, d\mu(t) \Big), \]
can be interpreted as $\exp(-\beta\,\mathrm{Ent}(h_*\mu)) / \exp(-\beta\,\mathrm{Ent}(\mu))$ in accordance with our formal interpretation (1.1). The second one,
\[ Y^0_h(\mu) = \prod_{I \in \mathrm{gaps}(\mu)} \frac{\sqrt{h'(I_-) \cdot h'(I_+)}}{|h(I)|/|I|}, \]
can be interpreted as the change of variable formula for the (non-existing) measure $\mathbb{P}$. Here $\mathrm{gaps}(\mu)$ denotes the set of intervals $I = \,]I_-, I_+[\, \subset S^1$ of maximal length with $\mu(I) = 0$. Note that $\mathbb{P}^\beta$ is concentrated on the set of $\mu$ which have no atoms and no absolutely continuous parts and whose supports have Lebesgue measure 0.

(d) The tangent space at a given point $\mu$ in $\mathcal{P} = \mathcal{P}(S^1)$ (or in $\mathcal{P}_0 = \mathcal{P}([0,1])$) will be an appropriate completion of the space $C^\infty(S^1, \mathbb{R})$ (or $C^\infty([0,1], \mathbb{R})$, resp.). The action of a tangent vector $\varphi$ on $\mu$ ('exponential map') is given by the push forward $\varphi_*\mu$. This leads to the notion of the directional derivative
\[ D_\varphi u(\mu) = \lim_{t \to 0} \frac{1}{t} \big[ u((\mathrm{Id} + t\varphi)_*\mu) - u(\mu) \big] \]
for functions $u : \mathcal{P} \to \mathbb{R}$. The quasi-invariance of the measure $\mathbb{P}^\beta$ implies an integration by parts formula (and thus the closability)
\[ D^*_\varphi u = -D_\varphi u - V_\varphi \cdot u \]
with drift $V_\varphi = \lim_{t \to 0} \frac{1}{t} \big( Y^\beta_{\mathrm{Id} + t\varphi} - 1 \big)$, where $Y^\beta_h := X^\beta_h \cdot Y^0_h$ denotes the density above.

The subsequent construction will strongly depend on the choice of the norm on the tangent spaces $T_\mu\mathcal{P}$. Basically, we will encounter two important cases.

(e) Choosing $T_\mu\mathcal{P} = H^s(S^1, \mathrm{Leb})$ for some $s > 1/2$, independent of $\mu$, leads to a regular, local, recurrent Dirichlet form $\mathcal{E}$ on $L^2(\mathcal{P}, \mathbb{P}^\beta)$ by
\[ \mathcal{E}(u, u) = \int_{\mathcal{P}} \sum_{k=1}^\infty |D_{\varphi_k} u(\mu)|^2 \, d\mathbb{P}^\beta(\mu), \]
where $\{\varphi_k\}_{k \in \mathbb{N}}$ denotes some complete orthonormal system in the Sobolev space $H^s(S^1)$. According to the theory of Dirichlet forms on locally compact spaces [FOT94], this form is associated with a continuous Markov process on $\mathcal{P}(S^1)$ which is reversible with respect to the measure $\mathbb{P}^\beta$. Its generator is given by
\[ \frac{1}{2} \sum_k D_{\varphi_k} D_{\varphi_k} + \frac{1}{2} \sum_k V_{\varphi_k} \cdot D_{\varphi_k}. \tag{1.3} \]
This process $(g_t)_{t \ge 0}$ is closely related to the stochastic processes on the diffeomorphism group of $S^1$ and to the 'Brownian motion' on the homeomorphism group of $S^1$, studied by Airault, Fang, Malliavin, Ren, Thalmaier and others [AMT04, AM06, AR02, Fan02, Fan04, Mal99]. These are processes with generator $\frac{1}{2} \sum_k D_{\varphi_k} D_{\varphi_k}$. Hence, one advantage of our approach is to identify a probability measure $\mathbb{P}^\beta$ such that these processes, after adding a suitable drift, are reversible. Moreover, previous approaches are restricted to $s \ge 3/2$ whereas our construction applies to all cases $s > 1/2$.

(f) Choosing $T_\mu\mathcal{G} = L^2([0,1], \mu)$ leads to the Wasserstein Dirichlet form
\[ \mathcal{E}(u, u) = \int_{\mathcal{P}_0} \|Du(\mu)\|^2_{L^2(\mu)} \, d\mathbb{P}^\beta_0(\mu) \]
on $L^2(\mathcal{P}_0, \mathbb{P}^\beta_0)$. Its square field operator is the squared norm of the Wasserstein gradient and its intrinsic distance (which governs the short time asymptotics of the process) coincides with the $L^2$-Wasserstein metric. The associated continuous Markov process $(\mu_t)_{t \ge 0}$ on $\mathcal{P}([0,1])$, which we shall call Wasserstein diffusion, is reversible w.r.t. the entropic measure $\mathbb{P}^\beta_0$. It can be regarded as a stochastic perturbation of the Neumann heat flow on $\mathcal{P}([0,1])$ with small time Gaussian behaviour measured in terms of kinetic energy.

2 Spaces of Probability Measures and Monotone Maps

The goal of this paper is to study stochastic dynamics on spaces $\mathcal{P}(M)$ in case $M$ is the unit interval $[0,1]$ or the unit circle $S^1$.

2.1 The Spaces $\mathcal{P}_0 = \mathcal{P}([0,1])$ and $\mathcal{G}_0$

Let us collect some basic facts for the space $\mathcal{P}_0 = \mathcal{P}([0,1])$ of probability measures on the unit interval $[0,1]$, the proofs of which can be found in the monograph [Vil03]. Equipped with the $L^2$-Wasserstein distance $d_W$, it is a compact metric space. Recall that
\[ d_W(\mu, \nu) := \Big( \inf_\gamma \iint_{[0,1]^2} |x - y|^2 \, \gamma(dx, dy) \Big)^{1/2}, \tag{2.1} \]
where the infimum is taken over all probability measures $\gamma \in \mathcal{P}([0,1]^2)$ having marginals $\mu$ and $\nu$ (i.e. $\gamma(A \times M) = \mu(A)$ and $\gamma(M \times B) = \nu(B)$ for all $A, B \subset M$).
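For measures on the unit interval, the infimum in (2.1) does not have to be computed by optimizing over couplings: by the isometry (2.3) stated later in this subsection, it equals the $L^2$-distance of the inverse distribution functions. The following is a minimal Python sketch of that fact for two empirical measures with the same number of equally weighted atoms, where the inverse distribution function is a step function through the sorted atoms. The function name `wasserstein2_empirical` is a hypothetical helper introduced only for this illustration.

```python
import math


def wasserstein2_empirical(xs, ys):
    """L2-Wasserstein distance between two empirical measures on the line.

    Assumption (ours): both measures have the same number n of atoms, each
    of mass 1/n. Their inverse distribution functions are then step
    functions taking the i-th smallest atom on [i/n, (i+1)/n), so the
    L2-distance of the two step functions is a finite sum.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    n = len(xs_sorted)
    squared = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / n
    return math.sqrt(squared)


if __name__ == "__main__":
    # Two Dirac masses: the distance is the distance of the two points.
    print(wasserstein2_empirical([0.2], [0.7]))
    # The order in which atoms are listed does not matter.
    print(wasserstein2_empirical([0.5, 0.0], [1.0, 0.5]))
```

Sorting plays the role of the monotone rearrangement; for measures with unequal numbers of atoms one would have to integrate the two step functions over a common refinement of $[0,1]$ instead.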
Let $\mathcal{G}_0$ denote the space of all right continuous nondecreasing maps $g : [0,1[\, \to [0,1]$ equipped with the $L^2$-distance
\[ \|g_1 - g_2\|_{L^2} = \Big( \int_0^1 |g_1(t) - g_2(t)|^2 \, dt \Big)^{1/2}. \]
Moreover, for notational convenience each $g \in \mathcal{G}_0$ is extended to the full interval $[0,1]$ by $g(1) := 1$.

The map $\chi : \mathcal{G}_0 \to \mathcal{P}_0$, $g \mapsto g_*\mathrm{Leb}$ (= push forward of the Lebesgue measure on $[0,1]$ under the map $g$) establishes an isometry between $(\mathcal{G}_0, \|\cdot\|_{L^2})$ and $(\mathcal{P}_0, d_W)$. The inverse map $\chi^{-1} : \mathcal{P}_0 \to \mathcal{G}_0$, $\mu \mapsto g_\mu$ assigns to each probability measure $\mu \in \mathcal{P}_0$ its inverse distribution function defined by
\[ g_\mu(t) := \inf\{ s \in [0,1] : \mu[0,s] > t \} \tag{2.2} \]
with $\inf \emptyset := 1$. In particular, for all $\mu, \nu \in \mathcal{P}_0$
\[ d_W(\mu, \nu) = \|g_\mu - g_\nu\|_{L^2}. \tag{2.3} \]
For each $g \in \mathcal{G}_0$ the generalized inverse $g^{-1} \in \mathcal{G}_0$ is defined by $g^{-1}(t) = \inf\{ s \ge 0 : g(s) > t \}$. Obviously,
\[ \|g_1 - g_2\|_{L^1} = \|g_1^{-1} - g_2^{-1}\|_{L^1} \tag{2.4} \]
(being simply the area between the graphs) and $(g^{-1})^{-1} = g$. Moreover, $g^{-1}(g(t)) = t$ for all $t$ provided $g^{-1}$ is continuous. (Note that under the measure $\mathbb{Q}^\beta_0$ to be constructed below the latter will be satisfied for a.e. $g \in \mathcal{G}_0$.)

On $\mathcal{G}_0$, there exist various canonical topologies: the $L^2$-topology of $\mathcal{G}_0$ regarded as subset of $L^2([0,1], \mathbb{R})$; the image of the weak topology on $\mathcal{P}_0$ under the map $\chi^{-1} : \mu \mapsto g_\mu$ (= inverse distribution function); the image of the weak topology on $\mathcal{P}_0$ under the map $\mu \mapsto g_\mu^{-1}$ (= distribution function). All these, and several other, topologies coincide.

Proposition 2.1. For each sequence $(g_n)_n \subset \mathcal{G}_0$, each $g \in \mathcal{G}_0$ and each $p \in [1,\infty[$ the following are equivalent:
(i) $g_n(t) \to g(t)$ for each $t \in [0,1]$ in which $g$ is continuous;
(ii) $g_n \to g$ in $L^p([0,1])$;
(iii) $g_n^{-1} \to g^{-1}$ in $L^p([0,1])$;
(iv) $\mu_{g_n} \to \mu_g$ weakly;
(v) $\mu_{g_n} \to \mu_g$ in $d_W$.
In particular, $\mathcal{G}_0$ is compact.

Let us briefly sketch the main arguments of the proof. Since all the functions $g_n$ and $g_n^{-1}$ are bounded, properties (ii) and (iii) obviously are independent of $p$. The equivalence of (ii) and (iii) for $p = 1$ was already stated in (2.4) and the equivalence between (ii) for $p = 2$ and (v) was stated in (2.3).
The equivalence of (iv) and (v) is the well known fact that the Wasserstein distance metrizes the weak topology. Another well known characterization of weak convergence states that (iv) is equivalent to (i'): $g_n^{-1}(t) \to g^{-1}(t)$ for each $t \in [0,1]$ in which $g^{-1}$ is continuous. Finally, (i') $\Leftrightarrow$ (i) according to the equivalence (ii) $\Leftrightarrow$ (iii), which allows one to pass from convergence of distribution functions $g_n^{-1}$ to convergence of inverse distribution functions $g_n$. The last assertion follows from the compactness of $\mathcal{P}_0$ in the weak topology.

2.2 The Spaces $\mathcal{G}$, $\mathcal{G}_1$ and $\mathcal{P} = \mathcal{P}(S^1)$

Throughout this paper, $S^1 = \mathbb{R}/\mathbb{Z}$ will always denote the circle of length 1. It inherits the group operation $+$ from $\mathbb{R}$ with neutral element 0. For each $x, y \in S^1$ the positively oriented segment from $x$ to $y$ will be denoted by $[x,y]$ and its length by $|[x,y]|$. If no ambiguity is possible, the latter will also be denoted by $y - x$. In contrast to that, $|x - y|$ will denote the $S^1$-distance between $x$ and $y$. Hence, in particular, $|[y,x]| = 1 - |[x,y]|$ and $|x - y| = \min\{ |[y,x]|, |[x,y]| \}$. A family of points $t_1, \ldots, t_N \in S^1$ is called an 'ordered family' if $\sum_{i=1}^N |[t_i, t_{i+1}]| = 1$ with $t_{N+1} := t_1$ (or in other words if all the open segments $]t_i, t_{i+1}[$ are disjoint).

Let
\[ \mathcal{G}(\mathbb{R}) = \{ g : \mathbb{R} \to \mathbb{R} \text{ right continuous nondecreasing with } g(x+1) = g(x) + 1 \text{ for all } x \in \mathbb{R} \}. \]
Due to the required equivariance with respect to the group action of $\mathbb{Z}$, each map $g \in \mathcal{G}(\mathbb{R})$ induces uniquely a map $\pi(g) : S^1 \to S^1$. Put $\mathcal{G} := \pi(\mathcal{G}(\mathbb{R}))$. The monotonicity of the functions in $\mathcal{G}(\mathbb{R})$ induces also a kind of monotonicity of maps in $\mathcal{G}$: each continuous $g \in \mathcal{G}$ will be order preserving and homotopic to the identity map. In the sequel, however, we often will have to deal with discontinuous $g \in \mathcal{G}$. The elements $g \in \mathcal{G}$ will be called monotone maps of $S^1$. $\mathcal{G}$ is a compact subspace of the $L^2$-space of maps from $S^1$ to $S^1$ with metric
\[ \|g_1 - g_2\|_{L^2} = \Big( \int_{S^1} |g_1(t) - g_2(t)|^2 \, dt \Big)^{1/2}. \]
With the composition $\circ$ of maps, $\mathcal{G}$ is a semigroup. Its neutral element $e$ is the identity map.
Of particular interest in the sequel will be the semigroup $\mathcal{G}_1 = \mathcal{G}/S^1$ where functions $g, h \in \mathcal{G}$ will be identified if $g(\cdot) = h(\cdot + a)$ for some $a \in S^1$.

Proposition 2.2. The map $\chi : \mathcal{G}_1 \to \mathcal{P}$, $g \mapsto g_*\mathrm{Leb}$ (= push forward of the Lebesgue measure on $S^1$ under the map $g$) and its inverse $\chi^{-1} : \mathcal{P} \to \mathcal{G}_1$, $\mu \mapsto g_\mu$ (with $g_\mu$ as defined in (2.2)) establish an isometry between the space $\mathcal{G}_1$ equipped with the induced $L^2$-distance
\[ \|g_1 - g_2\|_{\mathcal{G}_1} = \Big( \inf_{s \in S^1} \int_{S^1} |g_1(t) - g_2(t+s)|^2 \, dt \Big)^{1/2} \]
and the space $\mathcal{P}$ of probability measures on $S^1$ equipped with the $L^2$-Wasserstein distance. In particular, $\mathcal{G}_1$ is compact.

Proof. The bijectivity of $\chi$ and $\chi^{-1}$ is clear. It remains to prove that
\[ d_W(\mu, \nu) = \|g_\mu - g_\nu\|_{\mathcal{G}_1} \tag{2.5} \]
for all $\mu, \nu \in \mathcal{P}$. Obviously, it suffices to prove this for all absolutely continuous $\mu, \nu$ (or equivalently for strictly increasing $g_\mu, g_\nu$) since the latter are dense in $\mathcal{P}$ (or in $\mathcal{G}_1$, resp.). For such a pair of measures, there exists a map $F : S^1 \to S^1$ ('transport map') which minimizes the transportation costs [Vil03]. Fix any point in $S^1$, say 0, and put $s = F(0)$. Then the map $F$ is a transport map for the mass $\mu$ on the segment $]0,1[$ onto the mass $\nu$ on the segment $]s, s+1[$. Since these segments are isometric to the interval $]0,1[$, the results from the previous subsection imply that the minimal cost for such a transport is given by $\int_0^1 |g_\mu(t) - g_\nu(t+s)|^2 \, dt$. Varying over $s$ finally proves the claim.

3 Dirichlet Process and Entropic Measure

3.1 Gibbsean Interpretation and Heuristic Derivation of the Entropic Measure

One of the basic results of this paper is the rigorous construction of a measure $\mathbb{P}^\beta$ formally given as (1.1) in the one-dimensional case, i.e. $M = S^1$ or $M = [0,1]$. We will essentially make use of the isometries $\chi : \mathcal{G}_1 \to \mathcal{P} = \mathcal{P}(S^1)$, $g \mapsto g_*\mathrm{Leb}$ and $\chi : \mathcal{G}_0 \to \mathcal{P}_0 = \mathcal{P}([0,1])$. They allow us to transfer the problem of constructing measures $\mathbb{P}^\beta$ (or $\mathbb{P}^\beta_0$) on spaces of probability measures $\mathcal{P}$ (or $\mathcal{P}_0$, resp.) into the problem of constructing measures $\mathbb{Q}^\beta$ (or $\mathbb{Q}^\beta_0$) on spaces of functions $\mathcal{G}_1$ (or $\mathcal{G}_0$, resp.).
In terms of the measure $\mathbb{Q}^\beta_0$ on $\mathcal{G}_0$, for instance, the formal characterization (1.1) then reads as follows:
\[ \mathbb{Q}^\beta_0(dg) = \frac{1}{Z_\beta} e^{-\beta \cdot S(g)} \, \mathbb{Q}_0(dg). \tag{3.1} \]
Here $\mathbb{Q}_0$ denotes some 'uniform distribution' on $\mathcal{G}_0 \subset L^2([0,1])$ and $S : \mathcal{G}_0 \to [0,\infty]$ is the entropy functional $S(g) := \mathrm{Ent}(g_*\mathrm{Leb})$. If $g$ is absolutely continuous then $S(g)$ can be expressed explicitly as
\[ S(g) = -\int_0^1 \log g'(t) \, dt. \]
The representation (3.1) is reminiscent of Feynman's heuristic picture of the Wiener measure. Let us briefly recall the latter and try to use it as a guideline for our construction of the measure $\mathbb{Q}^\beta_0$.

According to this heuristic picture, the Wiener measure $\mathbf{P}^\beta$ with diffusion constant $\sigma^2 = 1/\beta$ should be interpreted (and could be constructed) as
\[ \mathbf{P}^\beta(dg) = \frac{1}{Z_\beta} e^{-\beta \cdot H(g)} \, \mathbf{P}(dg) \tag{3.2} \]
with the energy functional $H(g) = \frac{1}{2} \int_0^1 g'(t)^2 \, dt$. Here $\mathbf{P}(dg)$ is assumed to be the 'uniform distribution' on the space $\mathcal{G}^*$ of all continuous paths $g : [0,1] \to \mathbb{R}$ with $g(0) = 0$. Even if such a uniform distribution existed, typically almost all paths $g$ would have infinite energy. Nevertheless, one can overcome this difficulty as follows. Given any finite partition $\{ 0 = t_0 < t_1 < \cdots < t_N = 1 \}$ of $[0,1]$, one should replace the energy $H(g)$ of the path $g$ by the energy of the piecewise linear interpolation of $g$,
\[ H_N(g) = \inf\{ H(\tilde g) : \tilde g \in \mathcal{G}^*, \ \tilde g(t_i) = g(t_i) \ \forall i \} = \sum_{i=1}^N \frac{|g(t_i) - g(t_{i-1})|^2}{2(t_i - t_{i-1})}. \]
Then (3.2) leads to the following explicit representation for the finite dimensional distributions:
\[ \mathbf{P}^\beta(g_{t_1} \in dx_1, \ldots, g_{t_N} \in dx_N) = \frac{1}{Z_{\beta,N}} \exp\Big( -\frac{\beta}{2} \sum_{i=1}^N \frac{|x_i - x_{i-1}|^2}{t_i - t_{i-1}} \Big) \, p_N(dx_1, \ldots, dx_N) \tag{3.3} \]
with $x_0 := 0$. Here $p_N(dx_1, \ldots, dx_N) = \mathbf{P}(g_{t_1} \in dx_1, \ldots, g_{t_N} \in dx_N)$ should be a 'uniform distribution' on $\mathbb{R}^N$ and $Z_{\beta,N}$ a normalization constant. Choosing $p_N$ to be the $N$-dimensional Lebesgue measure makes the RHS of (3.3) a projective family of probability measures. According to Kolmogorov's extension theorem this family has a unique projective limit, the Wiener measure $\mathbf{P}^\beta$ on $\mathcal{G}^*$ with diffusion constant $\sigma^2 = 1/\beta$.
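To make the discretization step concrete, here is a small Python sketch, under our own naming, of the two objects used in this heuristic: the discretized energy $H_N$ of a path pinned at the partition points, and a sampler for the finite dimensional distributions (3.3) with $p_N$ taken to be Lebesgue measure. With that choice the density in (3.3) factorizes into independent centered Gaussian increments with variance $(t_i - t_{i-1})/\beta$, which is what the sampler draws. The function names `discrete_energy` and `sample_wiener_marginals` are hypothetical and serve only this illustration.

```python
import math
import random


def discrete_energy(ts, xs):
    """Energy H_N of the piecewise linear path through the points (t_i, x_i).

    ts is a partition 0 = t_0 < ... < t_N = 1, xs the values at these times.
    """
    return sum(
        (xs[i] - xs[i - 1]) ** 2 / (2.0 * (ts[i] - ts[i - 1]))
        for i in range(1, len(ts))
    )


def sample_wiener_marginals(ts, beta, rng):
    """Sample (g_{t_0}, ..., g_{t_N}) with g(0) = 0 from (3.3).

    Assumes p_N is Lebesgue measure, so the increments are independent
    Gaussians with variance (t_i - t_{i-1}) / beta.
    """
    xs = [0.0]
    for i in range(1, len(ts)):
        std = math.sqrt((ts[i] - ts[i - 1]) / beta)
        xs.append(xs[-1] + rng.gauss(0.0, std))
    return xs


if __name__ == "__main__":
    partition = [0.0, 0.25, 1.0]
    # For the straight line g(t) = t the discretized energy equals
    # H(g) = 1/2 for every partition.
    print(discrete_energy(partition, partition))
    print(sample_wiener_marginals(partition, beta=4.0, rng=random.Random(0)))
```

Refining the partition never decreases `discrete_energy` for a fixed path, which reflects the fact that $H_N$ is the infimum of $H$ over all paths with the prescribed values.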
Now let us try to follow this procedure with the entropy functional $S(g)$ replacing the energy functional $H(g)$. Given any finite partition $\{ 0 = t_0 < t_1 < \cdots < t_N < t_{N+1} = 1 \}$ of $[0,1]$, we will replace the entropy $S(g)$ of the path $g$ by the entropy of the piecewise linear interpolation
\[ S_N(g) = \inf\{ S(\tilde g) : \tilde g \in \mathcal{G}_0, \ \tilde g(t_i) = g(t_i) \ \forall i \} = -\sum_{i=1}^{N+1} \log\Big( \frac{g(t_i) - g(t_{i-1})}{t_i - t_{i-1}} \Big) \cdot (t_i - t_{i-1}). \]
This leads to the following expression for the finite dimensional distributions:
\[ \mathbb{Q}^\beta_0(g_{t_1} \in dx_1, \ldots, g_{t_N} \in dx_N) = \frac{1}{Z_{\beta,N}} \exp\Big( \beta \sum_{i=1}^{N+1} \log\Big( \frac{x_i - x_{i-1}}{t_i - t_{i-1}} \Big) \cdot (t_i - t_{i-1}) \Big) \, q_N(dx_1 \ldots dx_N) \tag{3.4} \]
where $q_N(dx_1, \ldots, dx_N) = \mathbb{Q}_0(g_{t_1} \in dx_1, \ldots, g_{t_N} \in dx_N)$ is a 'uniform distribution' on the simplex $\Sigma_N = \{ (x_1, \ldots, x_N) \in [0,1]^N : 0 < x_1 < x_2 < \ldots < x_N < 1 \}$ and $x_0 := 0$, $x_{N+1} := 1$.

What is a 'canonical' candidate for $q_N$? A natural requirement will be the invariance property
\[ q_N(dx_1, \ldots, dx_N) = \big[ (\Xi_{x_{i-1}, x_{i+k}})_* \, q_k(dx_i, \ldots, dx_{i+k-1}) \big] \, dq_{N-k}(dx_1, \ldots, dx_{i-1}, dx_{i+k}, \ldots, dx_N) \tag{3.5} \]
for all $1 \le k \le N$ and all $1 \le i \le N - k + 1$ with the convention $x_0 = 0$, $x_{N+1} = 1$ and the rescaling map $\Xi_{a,b} : \,]0,1[^k \to \mathbb{R}^k$, $y_j \mapsto y_j(b - a) + a$ for $j = 1, \ldots, k$.

If the $q_N$, $N \in \mathbb{N}$, were probability measures then the invariance property would admit the following interpretation: under $q_N$, the distribution of the $(N-k)$-tuple $(x_1, \ldots, x_{i-1}, x_{i+k}, \ldots, x_N)$ is nothing but $q_{N-k}$; and under $q_N$, the distribution of the $k$-tuple $(x_i, \ldots, x_{i+k-1})$ of points in the interval $]x_{i-1}, x_{i+k}[$ coincides, after rescaling of this interval, with $q_k$. Unfortunately, no family of probability measures $q_N$, $N \in \mathbb{N}$, with property (3.5) exists. However, there is a family of measures with this property.

By iteration of the invariance property (3.5), the choice of the measure $q_1$ on the interval $\Sigma_1 = \,]0,1[$ will determine all the measures $q_N$, $N \in \mathbb{N}$. Moreover, applying (3.5) for $N = 2$, $k = 1$ and both choices of $i$ yields
\[ \big[ (\Xi_{x_1, 1})_* \, q_1(dx_2) \big] \, dq_1(dx_1) = \big[ (\Xi_{0, x_2})_* \, q_1(dx_1) \big] \, dq_1(dx_2) \tag{3.6} \]
for all $0 < x_1 < x_2 < 1$.
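This reflects the intuitive requirement that there should be no difference whether we first choose randomly $x_1 \in \,]0,1[$ and then $x_2 \in \,]x_1, 1[$ or the other way round, first $x_2 \in \,]0,1[$ and then $x_1 \in \,]0, x_2[$.

Lemma 3.1. A family of measures $q_N$, $N \in \mathbb{N}$, with continuous densities satisfies property (3.5) if and only if
\[ q_N(dx_1, \ldots, dx_N) = C^N \, \frac{dx_1 \ldots dx_N}{x_1 \cdot (x_2 - x_1) \cdot \ldots \cdot (x_N - x_{N-1}) \cdot (1 - x_N)} \tag{3.7} \]
for some constant $C \in \mathbb{R}_+$.

Proof. If $q_1(dx) = \rho(x) \, dx$ then (3.6) is equivalent to
\[ \rho(y) \cdot \rho\Big( \frac{x}{y} \Big) \cdot \frac{1}{y} = \rho(x) \cdot \rho\Big( \frac{y - x}{1 - x} \Big) \cdot \frac{1}{1 - x} \]
for all $0 < x < y < 1$. For continuous $\rho$ this implies that there exists a constant $C \in \mathbb{R}_+$ such that $\rho(x) = \frac{C}{x(1-x)}$ for all $0 < x < 1$. Inserting this iteratively into (3.5) yields the claim.

Let us come back to our attempt to give a meaning to the heuristic formula (3.1). Combining (3.4) with the choice (3.7) of the measure $q_N$ finally yields
\[ \mathbb{Q}^\beta_0(g_{t_1} \in dx_1, \ldots, g_{t_N} \in dx_N) = \frac{1}{Z_{\beta,N}} \prod_{i=1}^{N+1} (x_i - x_{i-1})^{\beta(t_i - t_{i-1})} \, \frac{dx_1 \ldots dx_N}{x_1 \cdot (x_2 - x_1) \cdot \ldots \cdot (1 - x_N)} \tag{3.8} \]
with appropriate normalization constants $Z_{\beta,N}$. Now the RHS of this formula indeed turns out to define a consistent family of probability measures. Hence, by Kolmogorov's extension theorem it admits a projective limit $\mathbb{Q}^\beta_0$ on the space $\mathcal{G}_0$. The push forward of this measure under the canonical identification $\chi : \mathcal{G}_0 \to \mathcal{P}_0$, $g \mapsto g_*\mathrm{Leb}$ will be the entropic measure $\mathbb{P}^\beta_0$ which we were looking for. The details of the rigorous construction of this measure as well as various properties of it will be presented in the following sections.

3.2 The Measures $\mathbb{Q}^\beta$ and $\mathbb{P}^\beta$

The basic object to be studied in this section is the probability measure $\mathbb{Q}^\beta$ on the space $\mathcal{G}$.

Proposition 3.2. For each real number $\beta > 0$ there exists a unique probability measure $\mathbb{Q}^\beta$ on $\mathcal{G}$, called Dirichlet process, with the property that for each $N \in \mathbb{N}$ and for each ordered family of points $t_1, t_2, \ldots, t_N \in S^1$
\[ \mathbb{Q}^\beta(g_{t_1} \in dx_1, \ldots, g_{t_N} \in dx_N) = \frac{\Gamma(\beta)}{\prod_{i=1}^N \Gamma(\beta(t_{i+1} - t_i))} \prod_{i=1}^N (x_{i+1} - x_i)^{\beta(t_{i+1} - t_i) - 1} \, dx_1 \ldots dx_N. \tag{3.9} \]
The precise meaning of (3.9) is that for all bounded measurable $u : (S^1)^N \to \mathbb{R}$
\[ \int_{\mathcal{G}} u(g_{t_1}, \ldots, g_{t_N}) \, d\mathbb{Q}^\beta(g) = \frac{\Gamma(\beta)}{\prod_{i=1}^N \Gamma(\beta \cdot |[t_i, t_{i+1}]|)} \int_{\Sigma_N} u(x_1, \ldots, x_N) \prod_{i=1}^N |[x_i, x_{i+1}]|^{\beta \cdot |[t_i, t_{i+1}]| - 1} \, dx_1 \ldots dx_N \]
with $\Sigma_N = \{ (x_1, \ldots, x_N) \in (S^1)^N : \sum_{i=1}^N |[x_i, x_{i+1}]| = 1 \}$ and $x_{N+1} := x_1$, $t_{N+1} := t_1$. In particular, with $N = 1$ this means
\[ \int_{\mathcal{G}} u(g_t) \, d\mathbb{Q}^\beta(g) = \int_{S^1} u(x) \, dx \]
for each $t \in S^1$.

Proof. It suffices to prove that (3.9) defines a consistent family of finite dimensional distributions. The existence of $\mathbb{Q}^\beta$ (as a 'projective limit') then follows from Kolmogorov's extension theorem. The required consistency means that
\[ \frac{\Gamma(\beta)}{\prod_{i=1}^N \Gamma(\beta \cdot |[t_i, t_{i+1}]|)} \int_{\Sigma_N} \prod_{i=1}^N |[x_i, x_{i+1}]|^{\beta \cdot |[t_i, t_{i+1}]| - 1} \, u(x_1, \ldots, x_N) \, dx_1 \ldots dx_N \]
\[ = \frac{\Gamma(\beta)}{\Gamma(\beta \cdot |[t_1, t_2]|) \cdot \ldots \cdot \Gamma(\beta \cdot |[t_{k-1}, t_{k+1}]|) \cdot \ldots \cdot \Gamma(\beta \cdot |[t_N, t_1]|)} \int_{\Sigma_{N-1}} |[x_1, x_2]|^{\beta \cdot |[t_1, t_2]| - 1} \cdot \ldots \cdot |[x_{k-1}, x_{k+1}]|^{\beta \cdot |[t_{k-1}, t_{k+1}]| - 1} \cdot \ldots \cdot |[x_N, x_1]|^{\beta \cdot |[t_N, t_1]| - 1} \, v(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_N) \, dx_1 \ldots dx_{k-1} \, dx_{k+1} \ldots dx_N \]
whenever $u(x_1, \ldots, x_N) = v(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_N)$ for all $(x_1, \ldots, x_N) \in \Sigma_N$. The latter is an immediate consequence of the well-known fact (Euler's beta integral) that
\[ \int_{[x_{k-1}, x_{k+1}]} |[x_{k-1}, x_k]|^{\beta \cdot |[t_{k-1}, t_k]| - 1} \cdot |[x_k, x_{k+1}]|^{\beta \cdot |[t_k, t_{k+1}]| - 1} \, dx_k = \frac{\Gamma(\beta \cdot |[t_{k-1}, t_k]|) \, \Gamma(\beta \cdot |[t_k, t_{k+1}]|)}{\Gamma(\beta \cdot |[t_{k-1}, t_{k+1}]|)} \, |[x_{k-1}, x_{k+1}]|^{\beta \cdot |[t_{k-1}, t_{k+1}]| - 1}. \]

For $s \in S^1$ let $\hat\theta_s : \mathcal{G} \to \mathcal{G}$, $g \mapsto g \circ \theta_s$ be the isomorphism of $\mathcal{G}$ induced by the rotation $\theta_s : S^1 \to S^1$, $t \mapsto t + s$. Obviously, the measure $\mathbb{Q}^\beta$ on $\mathcal{G}$ is invariant under each of the maps $\hat\theta_s$. Hence, $\mathbb{Q}^\beta$ induces a probability measure $\mathbb{Q}^\beta_1$ on the quotient space $\mathcal{G}_1 = \mathcal{G}/S^1$. Recall the definition of the map $\chi : \mathcal{G} \to \mathcal{P}$, $g \mapsto g_*\mathrm{Leb}$. Since $(g \circ \theta_s)_*\mathrm{Leb} = g_*\mathrm{Leb}$ this canonically extends to a map $\chi : \mathcal{G}_1 \to \mathcal{P}$. (As mentioned before, the latter is even an isometry.)

Definition 3.3. The entropic measure $\mathbb{P}^\beta$ on $\mathcal{P}$ is defined as the push forward of the Dirichlet process $\mathbb{Q}^\beta$ on $\mathcal{G}$ (or equivalently, of the measure $\mathbb{Q}^\beta_1$ on $\mathcal{G}_1$) under the map $\chi$. That is, for all bounded measurable $u : \mathcal{P} \to \mathbb{R}$
\[ \int_{\mathcal{P}} u(\mu) \, d\mathbb{P}^\beta(\mu) = \int_{\mathcal{G}} u(g_*\mathrm{Leb}) \, d\mathbb{Q}^\beta(g). \]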
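The finite dimensional distributions (3.9) can be sampled directly, which may help in reading the formula. In the coordinates "position of the first point plus the segment lengths", the density in (3.9) says that $g_{t_1}$ is uniform on $S^1$ and that the lengths $|[x_i, x_{i+1}]|$ form a Dirichlet vector with parameters $\beta \cdot |[t_i, t_{i+1}]|$. The Python sketch below uses this reading together with the standard construction of a Dirichlet vector from independent Gamma variables. The function name `sample_circle_marginals`, and the convention that points of $S^1$ are represented by numbers in $[0,1)$ listed in increasing order, are our own assumptions for the illustration.

```python
import random


def sample_circle_marginals(ts, beta, rng):
    """Sample (g_{t_1}, ..., g_{t_N}) on the circle R/Z according to (3.9).

    ts: distinct representatives in [0, 1) listed in increasing order, so
        that they form an ordered family on the circle.
    Returns the sampled points and the lengths of the segments between them.
    """
    n = len(ts)
    x1 = rng.random()  # the first point is uniformly distributed (case N = 1)
    if n == 1:
        return [x1], [1.0]
    # Lengths of the positively oriented segments [t_i, t_{i+1}], t_{N+1} = t_1.
    t_gaps = [(ts[(i + 1) % n] - ts[i]) % 1.0 for i in range(n)]
    # Dirichlet vector via normalized independent Gamma variables.
    weights = [rng.gammavariate(beta * gap, 1.0) for gap in t_gaps]
    total = sum(weights)
    x_gaps = [w / total for w in weights]
    xs = [x1]
    for gap in x_gaps[:-1]:
        xs.append((xs[-1] + gap) % 1.0)
    return xs, x_gaps


if __name__ == "__main__":
    points, gaps = sample_circle_marginals([0.0, 0.25, 0.5], beta=2.0,
                                           rng=random.Random(1))
    print(points, gaps)
```

The uniform first point reflects the rotation invariance of the measure, and the segment lengths always sum to one, which is the defining constraint of $\Sigma_N$.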
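3.3 The Measures $Q^\beta_0$ and $P^\beta_0$

The subspaces $\{g \in \mathcal{G} : g(0) = 0\}$ and $\{g \in \mathcal{G}_0 : g(0) = 0\}$ can obviously be identified. Conditioning the probability measure $Q^\beta$ onto this event thus will define a probability measure $Q^\beta_0$ on $\mathcal{G}_0$. However, we prefer to give the direct construction of $Q^\beta_0$.

Proposition 3.4. For each real number $\beta > 0$ there exists a unique probability measure $Q^\beta_0$ on $\mathcal{G}_0$, called Dirichlet process, with the property that for each $N \in \mathbb{N}$ and each family $0 = t_0 < t_1 < t_2 < \ldots < t_N < t_{N+1} = 1$
$$Q^\beta_0(g_{t_1}\in dx_1,\dots,g_{t_N}\in dx_N) = \frac{\Gamma(\beta)}{\prod_{i=0}^N \Gamma(\beta\cdot(t_{i+1}-t_i))}\,\prod_{i=0}^N (x_{i+1}-x_i)^{\beta\cdot(t_{i+1}-t_i)-1}\,dx_1\dots dx_N. \tag{3.10}$$
The precise meaning of (3.10) is that for all bounded measurable $u : [0,1]^N \to \mathbb{R}$
$$\int_{\mathcal{G}_0} u(g_{t_1},\dots,g_{t_N})\,dQ^\beta_0(g) = \frac{\Gamma(\beta)}{\prod_{i=0}^N \Gamma(\beta\cdot(t_{i+1}-t_i))}\int_{\Sigma_N} u(x_1,\dots,x_N)\,\prod_{i=0}^N (x_{i+1}-x_i)^{\beta\cdot(t_{i+1}-t_i)-1}\,dx_1\dots dx_N$$
with $\Sigma_N = \{(x_1,\dots,x_N)\in[0,1]^N : 0 < x_1 < x_2 < \ldots < x_N < 1\}$ and $x_0 := 0$, $x_{N+1} := 1$.

Remark 3.5. According to these explicit formulae, it is easy to calculate the moments of the Dirichlet process. For instance,
$$\mathbb{E}^\beta_0(g_t) := \int_{\mathcal{G}_0} g_t\,dQ^\beta_0(g) = t$$
and
$$\mathrm{Var}^\beta_0(g_t) := \int_{\mathcal{G}_0} (g_t-t)^2\,dQ^\beta_0(g) = \frac{1}{1+\beta}\,t(1-t)$$
for all $\beta > 0$ and all $t \in [0,1]$.

Definition 3.6. The entropic measure $P^\beta_0$ on $\mathcal{P}_0 = \mathcal{P}([0,1])$ is defined as the push forward of the Dirichlet process $Q^\beta_0$ on $\mathcal{G}_0$ under the map $\chi$. That is, for all bounded measurable $u : \mathcal{P}_0 \to \mathbb{R}$
$$\int_{\mathcal{P}_0} u(\mu)\,dP^\beta_0(\mu) = \int_{\mathcal{G}_0} u(g_*\mathrm{Leb})\,dQ^\beta_0(g).$$

Remark 3.7. (i) According to the above construction, $Q^\beta_0(\,\cdot\,) = Q^\beta(\,\cdot\,|\,g(0) = 0)$ and
$$\int_{\mathcal{G}_0} u(g)\,dQ^\beta_0(g) = \int_{\mathcal{G}} u(g-g(0))\,dQ^\beta(g), \qquad \int_{\mathcal{G}} u(g)\,dQ^\beta(g) = \int_{S^1}\int_{\mathcal{G}_0} u(g+x)\,dQ^\beta_0(g)\,dx.$$
(ii) Analogously, the entropic measures on the sphere and on the interval are linked as follows:
$$\int_{\mathcal{P}} u(\mu)\,dP^\beta(\mu) = \int_{S^1}\int_{\mathcal{P}_0} u((\theta_x)_*\mu)\,dP^\beta_0(\mu)\,dx$$
or briefly
$$dP^\beta = \int_{S^1} (\hat\theta_x)_*\,dP^\beta_0\,dx$$
where $\theta_x : S^1 \to S^1$, $y \mapsto x+y$ and $\hat\theta_x : \mathcal{P}\to\mathcal{P}$, $\mu \mapsto (\theta_x)_*\mu$.

We would like to emphasize, however, that $P^\beta \neq P^\beta_0$. For instance, consider $u(\mu) := \int f\,d\mu$ for some $f : S^1 \to \mathbb{R}$ (which may be identified with $f : [0,1]\to\mathbb{R}$). Then
$$\int_{\mathcal{P}(S^1)} u(\mu)\,dP^\beta(\mu) = \int_{S^1} f(x)\,dx$$
whereas
$$\int_{\mathcal{P}([0,1])} u(\mu)\,dP^\beta_0(\mu) = \int_{[0,1]} f(x)\,\rho_\beta(x)\,dx$$
with
$$\rho_\beta(x) = \int_0^1 \frac{\Gamma(\beta)}{\Gamma(\beta t)\,\Gamma(\beta(1-t))}\,x^{\beta t-1}(1-x)^{\beta(1-t)-1}\,dt.$$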
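The moment formulas of Remark 3.5 can be checked by simulation. The sketch below assumes only what (3.10) gives for $N = 1$, namely that $g_t$ is Beta-distributed with parameters $\beta t$ and $\beta(1-t)$ under $Q^\beta_0$. The function names, the sample size, the seed and the values of $\beta$ and $t$ are illustrative choices, not taken from the text.

```python
import random


def sample_marginal(beta, t, rng):
    """Draw g_t under Q^beta_0: by (3.10) with N = 1 this is Beta(beta*t, beta*(1-t))."""
    return rng.betavariate(beta * t, beta * (1.0 - t))


def empirical_moments(beta, t, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[g_t] and E[(g_t - t)^2]."""
    rng = random.Random(seed)
    draws = [sample_marginal(beta, t, rng) for _ in range(n_samples)]
    mean = sum(draws) / n_samples
    spread = sum((x - t) ** 2 for x in draws) / n_samples
    return mean, spread


if __name__ == "__main__":
    beta, t = 2.0, 0.3  # illustrative values
    mean, spread = empirical_moments(beta, t)
    print("mean:    ", mean, "predicted", t)
    print("variance:", spread, "predicted", t * (1.0 - t) / (1.0 + beta))
```

The estimates should agree with $t$ and $t(1-t)/(1+\beta)$ up to Monte Carlo error, which reflects the mean $a/(a+b)$ and variance $ab/((a+b)^2(a+b+1))$ of a Beta$(a,b)$ law with $a = \beta t$, $b = \beta(1-t)$.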
According to the last remark, it suffices to study in detail one of the four measures $Q^\beta$, $Q^\beta_0$, $P^\beta$, and $P^\beta_0$. We will concentrate in the rest of this chapter on the measure $Q^\beta_0$, which seems to admit the easiest interpretations.

3.4 The Dirichlet Process as Normalized Gamma Process

We start by recalling some basic facts about real valued Gamma processes. For $\alpha > 0$ denote by $G(\alpha)$ the absolutely continuous probability measure on $\mathbb{R}_+$ with density $\frac{1}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-x}$.

Definition 3.8. A real valued Markov process $(\gamma_t)_{t\ge 0}$ starting in zero is called standard Gamma process if its increments $\gamma_t - \gamma_s$ are independent and distributed according to $G(t-s)$ for $0 \le s < t$.

Without loss of generality we may assume that almost surely the function $t \mapsto \gamma_t$ is right continuous and nondecreasing. Alternatively, the Gamma process may be defined as the unique pure jump Lévy process with Lévy measure $\Lambda(dx) = 1_{\{x>0\}}\,\frac{e^{-x}}{x}\,dx$. The connection between pure jump Lévy processes and Poisson point processes gives rise to several other equivalent representations of the Gamma process [Kin93, Ber99]. For instance, let $\Pi = \{p = (p_x,p_y)\in\mathbb{R}^2\}$ be the Poisson point process on $\mathbb{R}_+\times\mathbb{R}_+$ with intensity measure $dx\times\Lambda(dy)$, with $\Lambda$ as above. Then a Gamma process is obtained by
$$\gamma_t := \sum_{p\in\Pi:\,p_x\le t} p_y. \tag{3.11}$$
For $\beta > 0$ the process $\gamma_{t\cdot\beta}$ is a Lévy process with Lévy measure $\Lambda_\beta(dx) = \beta\cdot 1_{\{x>0\}}\,\frac{e^{-x}}{x}\,dx$. Its increments are distributed according to
$$P(\gamma_{\beta\cdot t}-\gamma_{\beta\cdot s}\in dx) = \frac{1}{\Gamma(\beta\cdot(t-s))}\,x^{\beta\cdot(t-s)-1}e^{-x}\,dx.$$

Proposition 3.9. For each $\beta > 0$, the law of the process $\big(\frac{\gamma_{t\cdot\beta}}{\gamma_\beta}\big)_{t\in[0,1]}$ is the Dirichlet process $Q^\beta_0$.

Proof. This well-known fact is easily obtained from Lukacs' characterization of the Gamma distribution [ÉY04].

3.5 Support Properties

Proposition 3.10. (i) For each $\beta > 0$, the measure $Q^\beta_0$ has full support on $\mathcal{G}_0$.

(ii) $Q^\beta_0$-almost surely the function $t \mapsto g(t)$ is strictly increasing but increases only by jumps (that is, the jump heights add up to 1 and the jump locations are dense in $[0,1]$).
(iii) For each fixed $t_0 \in [0,1]$, $Q^\beta_0$-almost surely the function $t \mapsto g(t)$ is continuous at $t_0$.

Proof. (i) Let $g \in \mathcal{G}_0 \subset L^2([0,1],dx)$ and $\epsilon > 0$. We have to show $Q^\beta_0(B_\epsilon(g)) > 0$ where $B_\epsilon(g) = \{h \in \mathcal{G}_0 : \|h-g\|_{L^2([0,1])} < \epsilon\}$. For this, choose finitely many points $t_i \in [0,1]$ together with $\delta_i > 0$ such that the set $S := \{f \in \mathcal{G}_0 : |f(t_i)-g(t_i)| \le \delta_i\ \forall i\}$ is contained in $B_\epsilon(g)$. Clearly, from (3.10), $Q^\beta_0(S) > 0$, which proves the claim.

(ii) (3.10) implies that $Q^\beta_0$-almost surely $g(s) < g(t)$ for each given pair $s < t$. Varying over all such rational pairs $s < t$, it follows that a.e. $g$ is strictly increasing on $[0,1]$. In terms of the probabilistic representation of Proposition 3.9, it is obvious that $g$ increases only by jumps.

(iii) This also follows easily from the representation as a normalized gamma process (Proposition 3.9).

Restating the previous property (ii) in terms of the entropic measure yields that $P^\beta_0$-a.e. $\mu \in \mathcal{P}_0$ is 'Cantor like'. More precisely,

Corollary 3.11. $P^\beta_0$-almost surely the measure $\mu \in \mathcal{P}_0$ has no absolutely continuous part and no discrete part. The topological support of $\mu$ has Lebesgue measure 0. Moreover,
$$\mathrm{Ent}(\mu) = +\infty. \tag{3.12}$$

Proof. The assertion on the entropy of $\mu$ is an immediate consequence of the statement on the support of $\mu$. The second claim follows from the fact that the jump heights of $g$ add up to 1.

In terms of the measure $Q^\beta_0$, the last assertion of the corollary states that $S(g) = +\infty$ for $Q^\beta_0$-a.e. $g \in \mathcal{G}_0$.

3.6 Scaling and Invariance Properties

The Dirichlet process $Q^\beta_0$ on $\mathcal{G}_0$ has the following Markov property: the distribution of $g|_{[s,t]}$ depends on $g_{[0,1]\setminus[s,t]}$ only via $g(s)$, $g(t)$. The Dirichlet process $Q^\beta_0$ on $\mathcal{G}_0$ also has a remarkable self-similarity property: if we restrict the functions $g$ to a given interval $[s,t]$ and then linearly rescale their domain and range in order to make them again elements of $\mathcal{G}_0$, then this new process is distributed according to $Q^{\beta'}_0$ with $\beta' = \beta\cdot|t-s|$.

Proposition 3.12. For each $\beta > 0$ and each $s,t \in [0,1]$, $s < t$,
$$Q^\beta_0\big(g|_{[s,t]}\in\,\cdot\;\big|\;g_{[0,1]\setminus[s,t]}\big) = Q^\beta_0\big(g|_{[s,t]}\in\,\cdot\;\big|\;g(s),g(t)\big) \tag{3.13}$$
and
$$(\Xi_{s,t})_*Q^\beta_0 = Q^{\beta\cdot|t-s|}_0 \tag{3.14}$$
where $\Xi_{s,t} : \mathcal{G}_0\to\mathcal{G}_0$ with $\Xi_{s,t}(g)(r) = \frac{g((1-r)s+rt)-g(s)}{g(t)-g(s)}$ for $r \in [0,1]$.

Proof. Both properties follow immediately from the representation in Proposition 3.9.

Corollary 3.13. The probability measures $Q^\beta_0$, $\beta > 0$, on $\mathcal{G}_0$ are uniquely characterized by the self-similarity property (3.14) and the distributions of $g_{1/2}$:
$$Q^\beta_0(g_{1/2}\in dx) = \frac{\Gamma(\beta)}{\Gamma(\beta/2)^2}\cdot[x(1-x)]^{\beta/2-1}\,dx.$$

Proposition 3.14. (i) For $\beta \to 0$ the measures $Q^\beta_0$ weakly converge to a measure $Q^0_0$ defined as the uniform distribution on the set $\{1_{[t,1]} : t \in\, ]0,1]\} \subset \mathcal{G}_0$. Analogously, the measures $Q^\beta$ weakly converge for $\beta \to 0$ to a measure $Q^0$ defined as the uniform distribution on the set of constant maps $\{t : t \in S^1\} \subset \mathcal{G}$.

(ii) For $\beta \to \infty$ the measures $Q^\beta_0$ (or $Q^\beta$) weakly converge to the Dirac mass $\delta_e$ on the identity map $e$ of $[0,1]$ (or $S^1$, resp.).

Proof. (i) Since the space $\mathcal{G}_0$ (equipped with the $L^2$-topology) is compact, so is $\mathcal{P}(\mathcal{G}_0)$ (equipped with the weak topology). Hence the family $Q^\beta_0$, $\beta > 0$, is pre-compact. Let $Q^0_0$ denote the limit of any converging subsequence of $Q^\beta_0$ for $\beta \to 0$. According to the formula for the one-dimensional distributions, for each $t \in\, ]0,1[$
$$Q^\beta_0(g_t\in dx) = \frac{\Gamma(\beta)}{\Gamma(\beta t)\,\Gamma(\beta(1-t))}\cdot x^{\beta t-1}(1-x)^{\beta(1-t)-1}\,dx \;\longrightarrow\; (1-t)\,\delta_{\{0\}}(dx) + t\,\delta_{\{1\}}(dx)$$
as $\beta \to 0$. Hence, $Q^0_0$ is the uniform distribution on the set $\{1_{[t,1]} : t \in\, ]0,1]\} \subset \mathcal{G}_0$.

(ii) Similarly, $Q^\beta_0(g_t \in dx) \to \delta_t(dx)$ as $\beta \to \infty$. Hence, $\delta_e$ with $e : t \mapsto t$ will be the unique accumulation point of $Q^\beta_0$ for $\beta \to \infty$.

Restating the previous results in terms of the entropic measures yields that, for $\beta \to 0$, the entropic measures $P^\beta_0$ converge weakly to the uniform distribution $P^0_0$ on the set $\{(1-t)\delta_{\{0\}} + t\delta_{\{1\}} : t \in [0,1]\} \subset \mathcal{P}_0$, and the measures $P^\beta$ converge weakly to the uniform distribution $P^0$ on the set $\{\delta_{\{t\}} : t \in S^1\} \subset \mathcal{P}$, whereas for $\beta \to \infty$ both $P^\beta_0$ and $P^\beta$ will converge to $\delta_{\mathrm{Leb}}$, the Dirac mass on the uniform distribution of $[0,1]$ or $S^1$, resp.
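The two limiting regimes of Proposition 3.14 can be illustrated by simulation. The sketch below uses the normalized Gamma representation of Proposition 3.9 to sample paths on a grid and estimates the mean squared distance of $g$ from the identity map. By the variance formula $t(1-t)/(1+\beta)$ of Remark 3.5, this quantity should be close to $\frac{1}{6(1+\beta)}$, which tends to $1/6$ (the value for a uniformly placed single jump) as $\beta \to 0$ and to $0$ as $\beta \to \infty$. The grid size, the number of paths, the seed and all function names are illustrative assumptions.

```python
import random


def sample_path(beta, n_grid, rng):
    """Values g(i/n_grid), i = 0..n_grid, of the normalized Gamma path gamma_{beta t} / gamma_beta."""
    while True:
        # independent increments over each grid cell, each Gamma(beta / n_grid) distributed
        increments = [rng.gammavariate(beta / n_grid, 1.0) for _ in range(n_grid)]
        total = sum(increments)
        if total > 0.0:  # guard against floating-point underflow for very small shapes
            break
    path, running = [0.0], 0.0
    for inc in increments:
        running += inc
        path.append(running / total)
    return path


def mean_square_distance_to_identity(beta, n_grid=20, n_paths=4000, seed=1):
    """Monte Carlo estimate of the grid average of E[(g_t - t)^2]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        path = sample_path(beta, n_grid, rng)
        acc += sum((g - i / n_grid) ** 2 for i, g in enumerate(path)) / n_grid
    return acc / n_paths


def predicted(beta, n_grid=20):
    """Grid average of t(1-t)/(1+beta), i.e. the same quantity computed from Remark 3.5."""
    grid_sum = sum((i / n_grid) * (1.0 - i / n_grid) for i in range(n_grid + 1))
    return grid_sum / n_grid / (1.0 + beta)


if __name__ == "__main__":
    for beta in (0.5, 5.0, 50.0):  # illustrative values
        print(beta, mean_square_distance_to_identity(beta), predicted(beta))
```

Small $\beta$ produces paths dominated by one large jump, while large $\beta$ produces paths that hug the diagonal, in line with the two limits stated above.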
The assertions of Proposition 3.12 imply the following Markov property and self-similarity property of the entropic measure.

Proposition 3.15. For each $x,y \in [0,1]$, $x < y$,
$$P^\beta_0\big(\mu|_{[x,y]}\in\,\cdot\;\big|\;\mu|_{[0,1]\setminus[x,y]}\big) = P^\beta_0\big(\mu|_{[x,y]}\in\,\cdot\;\big|\;\mu([x,y])\big)$$
and
$$P^\beta_0\big(\mu|_{[x,y]}\in\,\cdot\;\big|\;\mu([x,y]) = \alpha\big) = P^{\beta\cdot\alpha}_0(\mu_{x,y}\in\,\cdot\,)$$
with $\mu_{x,y} \in \mathcal{P}_0$ ('rescaling of $\mu|_{[x,y]}$') defined by $\mu_{x,y}(A) = \frac{1}{\mu([x,y])}\,\mu(x+(y-x)\cdot A)$ for $A \subset [0,1]$.

3.7 Dirichlet Processes on General Measurable Spaces

Recall Ferguson's notion of a Dirichlet process on a general measurable space $M$ with parameter measure $m$ on $M$. This is a probability measure $Q^m_{\mathcal{P}(M)}$ on $\mathcal{P}(M)$, uniquely defined by the fact that for any finite measurable partition $M = \dot\bigcup_{i=1}^{N+1} M_i$ and $\sigma_i := m(M_i)$,
$$Q^m_{\mathcal{P}(M)}\big(\mu : \mu(M_1)\in dx_1,\dots,\mu(M_N)\in dx_N\big) = \frac{\Gamma(m(M))}{\prod_{i=1}^{N+1}\Gamma(\sigma_i)}\,x_1^{\sigma_1-1}\cdots x_N^{\sigma_N-1}\Big(1-\sum_{i=1}^N x_i\Big)^{\sigma_{N+1}-1}\,dx_1\cdots dx_N.$$
If a map $h : M \to M$ leaves the parameter measure $m$ invariant then obviously the induced map $\hat h : \mathcal{P}(M)\to\mathcal{P}(M)$, $\mu \mapsto h_*\mu$ leaves the Dirichlet process $Q^m_{\mathcal{P}(M)}$ invariant.

In the particular case $M = [0,1]$ and $m = \beta\cdot\mathrm{Leb}$, the Dirichlet process $Q^m_{\mathcal{P}([0,1])}$ can be obtained as push forward of the measure $Q^\beta_0$ (introduced before) under the isomorphism $\zeta : \mathcal{G}_0 \to \mathcal{P}([0,1])$ which assigns to each $g$ the induced Lebesgue-Stieltjes measure $dg$ (the inverse $\zeta^{-1}$ assigns to each probability measure its distribution function):
$$Q^m_{\mathcal{P}([0,1])} = \zeta_*Q^\beta_0. \tag{3.15}$$
Note that the support properties of the measure $Q^m_{\mathcal{P}([0,1])}$ are completely different from those of the measure $P^\beta_0$. In particular, $Q^m_{\mathcal{P}([0,1])}$-almost every $\mu \in \mathcal{P}([0,1])$ is discrete and has full topological support, cf. Corollary 3.11. The invariance properties of $Q^m_{\mathcal{P}([0,1])}$ under push forwards by means of measure preserving transformations of $[0,1]$ seem to have no intrinsic interpretation in terms of $Q^\beta_0$.

4 The Change of Variable Formula for the Dirichlet Process and for the Entropic Measure

Our main result in this chapter will be a change of variable formula for the Dirichlet process.
To motivate this formula, let us first present a heuristic derivation based on the formal representation (3.1).

4.1 Heuristic Approaches to Change of Variable Formulae

Let us have a look at the change of variable formula for the Wiener measure. On a formal level, it easily follows from Feynman's heuristic interpretation
$$dP^\beta(g) = \frac{1}{Z_\beta}\,e^{-\frac{\beta}{2}\int_0^1 g'(t)^2\,dt}\,dP(g)$$
with the (non-existing) 'uniform distribution' $P$. Assuming that the latter is 'translation invariant' (i.e. invariant under additive changes of variables, at least in 'smooth' directions $h$) we immediately obtain
$$dP^\beta(h+g) = \frac{1}{Z_\beta}\,e^{-\frac{\beta}{2}\int_0^1 (h+g)'(t)^2\,dt}\,dP(h+g)$$
$$= e^{-\frac{\beta}{2}\int_0^1 h'(t)^2\,dt-\beta\int_0^1 h'(t)g'(t)\,dt}\cdot\frac{1}{Z_\beta}\,e^{-\frac{\beta}{2}\int_0^1 g'(t)^2\,dt}\,dP(g)$$
$$= e^{-\frac{\beta}{2}\int_0^1 h'(t)^2\,dt-\beta\int_0^1 h'(t)\,dg(t)}\,dP^\beta(g).$$
If we interpret $\int_0^1 h'(t)\,dg(t)$ as the Ito integral of $h'$ with respect to the Brownian path $g$ then this is indeed the famous Cameron-Martin-Girsanov-Maruyama theorem.

In the case of the entropic measure, the starting point for a similar argument is the heuristic interpretation
$$dQ^\beta_0(g) = \frac{1}{Z_\beta}\,e^{\beta\int_0^1 \log g'(t)\,dt}\,dQ_0(g),$$
again with a (non-existing) 'uniform distribution' $Q_0$ on $\mathcal{G}_0$. The natural concept of 'change of variables', of course, will be based on the semigroup structure of the underlying space $\mathcal{G}_0$; that is, we will study transformations of $\mathcal{G}_0$ of the form $g \mapsto h\circ g$ for some (smooth) element $h \in \mathcal{G}_0$. It turns out that $Q_0$ should not be assumed to be invariant under translations but merely quasi-invariant:
$$dQ_0(h\circ g) = Y^0_h(g)\,dQ_0(g)$$
with some density $Y^0_h$. This immediately implies the following change of variable formula for $Q^\beta_0$:
$$dQ^\beta_0(h\circ g) = \frac{1}{Z_\beta}\,e^{\beta\int_0^1 \log(h\circ g)'(t)\,dt}\,dQ_0(h\circ g)$$
$$= \frac{1}{Z_\beta}\,e^{\beta\int_0^1 \log h'(g(t))\,dt}\cdot e^{\beta\int_0^1 \log g'(t)\,dt}\cdot Y^0_h(g)\,dQ_0(g)$$
$$= e^{\beta\int_0^1 \log h'(g(t))\,dt}\cdot Y^0_h(g)\,dQ^\beta_0(g).$$
This is the heuristic derivation of the change of variables formula. Its rigorous derivation (and the identification of the density $Y^0_h$) is the main result of this chapter.
4.2 The Change of Variables Formula on the Sphere

For $g,h \in \mathcal{G}$ with $h \in C^2$ we put
$$Y^0_h(g) := \prod_{a\in J_g}\frac{\sqrt{h'(g(a-))\cdot h'(g(a+))}}{\frac{\delta(h\circ g)}{\delta g}(a)}, \tag{4.1}$$
where $J_g \subset S^1$ denotes the set of jump locations of $g$ and
$$\frac{\delta(h\circ g)}{\delta g}(a) := \frac{h(g(a+))-h(g(a-))}{g(a+)-g(a-)}.$$
To simplify notation, here and in the sequel (if no ambiguity seems possible), we write $y-x$ instead of $|[x,y]|$ to denote the length of the positively oriented segment from $x$ to $y$ in $S^1$. We will see below that the infinite product in the definition of $Y^0_h(g)$ converges for $Q^\beta$-a.e. $g \in \mathcal{G}$. Moreover, for $\beta > 0$ we put
$$X^\beta_h(g) := \exp\Big(\beta\int_0^1 \log h'(g(s))\,ds\Big), \qquad Y^\beta_h(g) := X^\beta_h(g)\cdot Y^0_h(g). \tag{4.2}$$

Theorem 4.1. Each $C^2$-diffeomorphism $h \in \mathcal{G}$ induces a bijective map $\tau_h : \mathcal{G}\to\mathcal{G}$, $g \mapsto h\circ g$ which leaves the measure $Q^\beta$ quasi-invariant:
$$dQ^\beta(h\circ g) = Y^\beta_h(g)\,dQ^\beta(g).$$
In other words, the push forward of $Q^\beta$ under the map $\tau_h^{-1} = \tau_{h^{-1}}$ is absolutely continuous w.r.t. $Q^\beta$ with density $Y^\beta_h$:
$$\frac{d(\tau_{h^{-1}})_*Q^\beta(g)}{dQ^\beta(g)} = Y^\beta_h(g).$$
The function $Y^\beta_h$ is bounded from above and below (away from 0) on $\mathcal{G}$.

By means of the canonical isometry $\chi : \mathcal{G}\to\mathcal{P}$, $g \mapsto g_*\mathrm{Leb}$, Theorem 4.1 immediately implies

Corollary 4.2. For each $C^2$-diffeomorphism $h \in \mathcal{G}$ the entropic measure $P^\beta$ is quasi-invariant under the transformation $\mu \mapsto h_*\mu$ of the space $\mathcal{P}$:
$$dP^\beta(h_*\mu) = Y^\beta_h(\chi^{-1}(\mu))\,dP^\beta(\mu).$$
The density $Y^\beta_h(\chi^{-1}(\mu))$ introduced in (4.2) can be expressed as follows:
$$Y^\beta_h(\chi^{-1}(\mu)) = \exp\Big(\beta\int_{S^1}\log h'(s)\,\mu(ds)\Big)\cdot\prod_{I\in\mathrm{gaps}(\mu)}\frac{\sqrt{h'(I_-)\cdot h'(I_+)}}{|h(I)|/|I|}$$
where $\mathrm{gaps}(\mu)$ denotes the set of segments $I = \,]I_-,I_+[\, \subset S^1$ of maximal length with $\mu(I) = 0$ and $|I|$ denotes the length of such a segment.

4.3 The Change of Variables Formula on the Interval

From the representation of $Q^\beta$ as a product of $Q^\beta_0$ and $\mathrm{Leb}$ (see Remark 3.7) and the change of variable formulae for $Q^\beta$ and $\mathrm{Leb}$, one can deduce a change of variable formula for $Q^\beta_0$ similar to that of Theorem 4.1 but containing an additional factor $\frac{1}{h'(0)}$. In this case, one has to restrict to translations by means of $C^2$-diffeomorphisms $h \in \mathcal{G}$ with $h(0) = 0$.
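To make the density of Theorem 4.1 concrete, the following sketch evaluates $Y^\beta_h(g)$ for a step map $g$ on the circle with finitely many jumps. It reads (4.1) as the product over jumps of $\sqrt{h'(g(a-))\,h'(g(a+))}$ divided by the difference quotient $\frac{h(g(a+))-h(g(a-))}{g(a+)-g(a-)}$, and (4.2) as $Y^\beta_h = X^\beta_h\cdot Y^0_h$ with $X^\beta_h(g) = \exp(\beta\int_0^1\log h'(g(s))\,ds)$. This is an illustration only: $Q^\beta$-almost every $g$ has infinitely many jumps, and the family of diffeomorphisms $h(x) = x + c\,\sin(2\pi x)/(2\pi)$ as well as all function names are assumptions made for the example.

```python
import math


def make_diffeo(c):
    """Circle diffeomorphism h(x) = x + c*sin(2*pi*x)/(2*pi) for |c| < 1, returned as (h, h')."""
    def h(x):
        return x + c * math.sin(2.0 * math.pi * x) / (2.0 * math.pi)

    def dh(x):
        return 1.0 + c * math.cos(2.0 * math.pi * x)

    return h, dh


def compose(outer, inner):
    """Pair (h1 o h2, (h1 o h2)') built from the pairs (h1, h1') and (h2, h2')."""
    h1, dh1 = outer
    h2, dh2 = inner
    return (lambda x: h1(h2(x))), (lambda x: dh1(h2(x)) * dh2(x))


def density(diffeo, times, values, beta):
    """Y^beta_h(g) for the step map g = values[j] on [times[j], times[j+1]) (indices mod n).

    Both lists are assumed strictly increasing with entries in [0, 1).
    """
    h, dh = diffeo
    n = len(times)
    log_x, log_y0 = 0.0, 0.0
    for j in range(n):
        # length of the arc on which g equals values[j] (the last arc wraps around 1)
        t_next = times[(j + 1) % n] + (1.0 if j == n - 1 else 0.0)
        log_x += math.log(dh(values[j])) * (t_next - times[j])
        # jump at times[j]: from the previous value (lifted by -1 for j = 0) to values[j]
        left = values[j - 1] - (1.0 if j == 0 else 0.0)
        right = values[j]
        quotient = (h(right) - h(left)) / (right - left)
        log_y0 += 0.5 * math.log(dh(left) * dh(right)) - math.log(quotient)
    return math.exp(beta * log_x + log_y0)


if __name__ == "__main__":
    times, values, beta = [0.1, 0.45, 0.8], [0.2, 0.5, 0.9], 1.5  # illustrative data
    print(density(make_diffeo(0.4), times, values, beta))
```

A useful consistency check is the cocycle relation $Y^\beta_{h_1\circ h_2}(g) = Y^\beta_{h_1}(h_2\circ g)\cdot Y^\beta_{h_2}(g)$, which any density satisfying $dQ^\beta(h\circ g) = Y^\beta_h(g)\,dQ^\beta(g)$ must fulfil and which the formula above satisfies term by term.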
More generally, one might be interested in translations of $\mathcal{G}_0$ by means of $C^2$-diffeomorphisms $h \in \mathcal{G}_0$. In contrast to the previous situation, it now may happen that $h'(0) \neq h'(1)$.

For $g \in \mathcal{G}_0$ and a $C^2$-isomorphism $h : [0,1]\to[0,1]$ we put
$$Y^\beta_{h,0}(g) := X^\beta_h(g)\cdot Y_{h,0}(g) \tag{4.3}$$
with
$$Y_{h,0}(g) = \frac{1}{\sqrt{h'(0)\cdot h'(1)}}\cdot Y^0_h(g)$$
and $X^\beta_h(g)$ and $Y^0_h(g)$ defined as before in (4.1), (4.2). Note that here and in the sequel by a $C^2$-isomorphism $h \in \mathcal{G}_0$ we understand an increasing homeomorphism $h : [0,1]\to[0,1]$ such that $h$ and $h^{-1}$ are bounded in $C^2([0,1])$, which in particular implies $h' > 0$.

Theorem 4.3. Each translation $\tau_h : \mathcal{G}_0\to\mathcal{G}_0$, $g \mapsto h\circ g$ by means of a $C^2$-isomorphism $h \in \mathcal{G}_0$ leaves the measure $Q^\beta_0$ quasi-invariant:
$$dQ^\beta_0(h\circ g) = Y^\beta_{h,0}(g)\,dQ^\beta_0(g)$$
or, in other words,
$$\frac{d(\tau_{h^{-1}})_*Q^\beta_0(g)}{dQ^\beta_0(g)} = Y^\beta_{h,0}(g).$$
The function $Y^\beta_{h,0}$ is bounded from above and below (away from 0) on $\mathcal{G}_0$.

Corollary 4.4. For each $C^2$-isomorphism $h \in \mathcal{G}_0$ the entropic measure $P^\beta_0$ is quasi-invariant under the transformation $\mu \mapsto h_*\mu$ of the space $\mathcal{P}_0$:
$$\frac{dP^\beta_0(h_*\mu)}{dP^\beta_0(\mu)} = \exp\Big(\beta\int_0^1 \log h'(s)\,\mu(ds)\Big)\cdot\frac{1}{\sqrt{h'(0)\cdot h'(1)}}\cdot\prod_{I\in\mathrm{gaps}(\mu)}\frac{\sqrt{h'(I_-)\cdot h'(I_+)}}{|h(I)|/|I|}$$
where $\mathrm{gaps}(\mu)$ denotes the set of intervals $I = \,]I_-,I_+[\, \subset [0,1]$ of maximal length with $\mu(I) = 0$ and $|I|$ denotes the length of such an interval.

Remark 4.5. Theorem 4.3 seems to be unrelated to the quasi-invariance of the measure $Q^m_{\mathcal{P}([0,1])}$ under the transformation $dg \to h\cdot dg/\langle h,dg\rangle$ shown in [Han02]. Nor is it in any way implied by the quasi-invariance formula for the general measure valued gamma process as in [TVY01] with respect to a similar transformation. In our present case the latter would correspond to the mapping $d\gamma \to h\cdot d\gamma$ of the (measure valued) Gamma process $d\gamma$.

4.4 Proofs for the Sphere Case

Lemma 4.6. For each $C^2$-diffeomorphism $h \in \mathcal{G}$
$$X^\beta_h(g) = \lim_{k\to\infty}\prod_{i=0}^{k-1}\left[\frac{h(g(t_{i+1}))-h(g(t_i))}{g(t_{i+1})-g(t_i)}\right]^{\beta(t_{i+1}-t_i)}. \tag{4.4}$$
Here $t_i = \frac{i}{k}$ for $i = 0,1,\dots,k-1$ and $t_k = 0$. Thus $t_{i+1}-t_i := |[t_i,t_{i+1}]| = \frac{1}{k}$ for all $i$.

Proof. Without restriction, we may assume $\beta = 1$.
According to Taylor’s formula h (g(ti+1)) = h (g(ti)) + h ′ (g(ti)) · (g(ti+1)− g(ti)) + 12h ′′(γi) · (g(ti+1)− g(ti))2 for some γi ∈ [g(ti), g(ti+1)]. Hence, h (g(ti+1))− h (g(ti)) g(ti+1)− g(ti) ]ti+1−ti = lim h′ (g(ti)) + h′′(γi) · (g(ti+1)− g(ti)) ]ti+1−ti = lim log h′ (g(ti)) + log 1 + 1 h′′(γi) h′ (g(ti)) (g(ti+1)− g(ti)) · (ti+1 − ti) = exp log h′ (g(ti)) · (ti+1 − ti) = exp log h′ (g(t)) dt = X1h(g). Here (⋆) follows from the fact that 1 + 1 h′′(γi) h′ (g(ti)) · (g(ti+1)− g(ti)) = h (g(ti+1))− h (g(ti)) g(ti+1)− g(ti) h′ (g(ti)) = h′(ηi) · h′ (g(ti)) ≥ ε > 0 for some ηi ∈ [g(ti), g(ti+1)] and some ε > 0, independent of i and k. Thus 1 + 1 h′′(γi) h′ (g(ti)) · (g(ti+1)− g(ti)) · (ti+1 − ti) ≤ C1 · h′′(γi) h′ (g(ti)) · (g(ti+1)− g(ti)) · (ti+1 − ti) ≤ C2 · (g(ti+1)− g(ti)) · (ti+1 − ti) ≤ C3 · 1k . Lemma 4.7. For each C3-diffeomorphism h ∈ G Y 0h (g) := lim h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) (4.5) where ti = for i = 0, 1, . . . , k − 1 and tk = 0. Proof. Let h and g be given. Depending on some ε > 0 let us choose l ∈ N large enough (to be specified in the sequel) and let a1, . . . , al denote the l largest jumps of g. Put J g = Jg\{a1, . . . , al} and for simplicity al+1 := a1. For k very large (compared with l) and j = 1, . . . , l let kj denote the index i ∈ {0, 1, . . . , k − 1}, for which aj ∈ [ti, ti+1[. Then again by Taylor’s formula kj+1−1 i=kj+1 h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) kj+1−1 i=kj+1 1 + 1 h′′ (g(ti)) h′ (g(ti)) · (g(ti+1)− g(ti)) + 16 h′′′(ηi) h′ (g(ti)) · (g(ti+1)− g(ti))2 ≤ exp kj+1−1 i=kj+1 log h′ (g(ti)) + · (g(ti+1)− g(ti)) ≤ eε/l · exp kj+1−1 i=kj+1 log h′ (g(ti)) · (g(ti+1)− g(ti)) provided l and k are chosen so large that |g(ti+1)− g(ti)| ≤ C1 · l for all i ∈ {0, . . . , k − 1} \ {k1, . . . 
, kl}, where C1 = sup |h′′′(x)| 6·h′(y) On the other hand, g(tkj+1) g(tkj+1) ) = exp ∫ g(tkj+1 ) g(tkj+1) log h′ (s) ds = exp kj+1−1 i=kj+1 log h′ (g(ti)) · (g(ti+1)− g(ti)) + log h′ (γi) · 12 (g(ti+1)− g(ti)) ≥ e−ε/l · exp kj+1−1 i=kj+1 log h′ (g(ti)) · (g(ti+1)− g(ti)) provided l and k are chosen so large that |g(ti+1)− g(ti)| ≤ C2 · l for all i ∈ {0, 1, . . . , k − 1} \ {k1, . . . , kl}, where C2 = sup log h′ Therefore, i∈{0,1,...,k−1}\{k1,...,kl} h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) ≤ e2ε · g(tkj+1) g(tkj+1) ) = (I). In order to derive the corresponding lower estimate, we can proceed as before in (1a) and (2) (replacing ε by −ε and ≤ by ≥ and vice versa). To proceed as in (1b) we have to argue as follows kj+1−1 i=kj+1 log h′ (g(ti))− εl · (g(ti+1)− g(ti)) ≥ e−ε/l · exp kj+1−1 i=kj+1 (1− ε) · log h′ (g(ti)) · (g(ti+1)− g(ti)) provided l and k are chosen so large that log (1 + C3 · (g(ti+1)− g(ti))) ≥ (1− ε) · C3 · (g(ti+1)− g(ti)) for all i ∈ {0, 1, . . . , k − 1} \ {k1, . . . , kl}, where C3 = sup log h′ Thus we obtain the following lower estimate i∈{0,1,...,k−1}\{k1,...,kl} h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) ≥ e−2ε · g(tkj+1) g(tkj+1) ≥ e−2ε · C−ε/23 · g(tkj+1) g(tkj+1) ) = (II), since g(tkj+1) g(tkj+1) = exp log h′ g(tkj+1) − log h′ g(tkj+1) ≤ exp g(tkj+1)− g(tkj+1) ≤ exp where C3 = sup ∣(log h′) Now for fixed l as k → ∞ the bound (I) converges to (I′) = e2ε · h′ (g(aj+1−)) h′ (g(aj+)) and the bound (II) to (II′) = e−2ε · C−ε/23 · h′ (g(aj+1−)) h′ (g(aj+)) Finally, it remains to consider i∈{k1,...,kl} h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) = (III). Again for fixed l and k → ∞ this obviously converges to (III′) = h′ (g(aj−)) · δ(h ◦ g) Putting together these estimates and letting l → ∞, we obtain the claim. Lemma 4.8. (i) For all g, h ∈ G with h ∈ C2 strictly increasing, the infinite product in the definition of Y 0h (g) converges. There exists a constant C = C(β, h) such that ∀g ∈ G ≤ Y β (g) ≤ C. 
(ii) If hn → h in C2 then Y 0hn(g) → Y h (g). (iii) Let Y 0h,k,X denote the sequences used in Lemma 4.6 and 4.7 to approximate Y 0h ,X Then there exists a constant C = C(β, h) such that ∀g ∈ G, ∀k ∈ N ≤ Y βh,k(g) ≤ C. Proof. (i) Put C = sup |(log h′)′|. Given g ∈ G and ǫ > 0, we choose k large enough such that a∈Jg(k) |g(a+)−g(a−)| ≤ ǫ where Jg(k) = Jg \{a1, a2, . . . , ak} denotes the ’set of small jumps’ of g. Here we enumerate the jump locations a1, a2, . . . ∈ Jg according to the size of the respective jumps. Then with suitable ξa ∈ [g(a−), g(a+)] a∈Jg(k) h′(g(a−)) h′(g(a+)) δ(h◦g) a∈Jg(k) log h′(g(a−)) + 1 log h′(g(a−)) − log h′(ξ(a)) a∈Jg(k) |C · (g(a+)− g(a−))| = C · ǫ. Hence, the infinite sum h′(g(a−)) h′(g(a+)) δ(h◦g) = lim a∈Jg(k) h′(g(a−)) h′(g(a+)) δ(h◦g) is absolutely convergent and thus also infinite product in the definition of Y 0h (g) converges. The same arguments immediately yield ∣log Y 0h (g) log h′(g(a−)) + 1 log h′(g(a−)) − log h′(ξ(a)) ≤ C. (4.6) (ii) In order to prove the convergence Y 0hn(g) → Y h (g), for given g ∈ G we split the product over all jumps into a finite product over the big jumps and an infinite product over all small jumps. Obviously, the finite products will converge (for any choice of k) a∈{a1,...,ak} h′n(g(a−)) h′n(g(a+)) δ(hn◦g) a∈{a1,...,ak} h′(g(a−)) h′(g(a+)) δ(h◦g) as n → ∞ provided hn → h in C2. Now let C = supn supx |(log h′n)′(x)| and choose k as before. Then uniformly in n a∈Jg\{a1,...,ak} h′n(g(a−)) h′n(g(a+)) δ(hn◦g) ≤ C · ǫ. (iii) Let C1 = sup |h′(x)| and C2 = sup ∣(log h′) ∣. Then for all g and k: Xh,k(g) = h′(ηi) ti+1−ti ≤ C1 Y 0h,k(g) = h′ (g(ti)) h′(γi) = exp log h′ (ζi) · (g(ti)− γi) ≤ exp |g(ti)− γi| ≤ exp(C2) (with suitable γi, ηi ∈ [g(ti), g(ti+1)] and ζi ∈ [g(ti), γi]). Analogously, the lower estimates follow. Proof of Theorem 4.1. In order to prove the equality of the two measures under consideration, it suffices to prove that all of their finite dimensional distributions coincide. 
That is, for each m ∈ N, each ordered family t1, . . . , tm of points in S 1 and each bounded continuous u : (S1)m −→ R one has to verify that h−1 (g(t1)) , h −1 (g(t2)) , . . . , h −1 (g(tm)) dQβ(g) u (g(t1), g(t2), . . . , g(tm)) · Y βh (g) dQ β(g). Without restriction, we may restrict ourselves to equidistant partitions, i.e. ti = for i = 1, . . . ,m. Let us fix m ∈ N, u and h. For simplicity, we first assume that h is C3. Then by Lemmas 4.6 - 4.8 and Lebesgue’s theorem , . . . , g (1) · Y βh (g) dQ , . . . , g (1) · lim (g) dQβ(g) = lim , . . . , g (1) dQβ(g) = lim [Γ(β/km)]km u(xk, x2k, . . . , xmk) h′(xi) · [h(xi+1)− h(xi)] dx1 . . . dxmk = lim [Γ(β/km)]km u(xk, x2k, . . . , xmk) · [h(xi+1)− h(xi)] dh(x1) . . . dh(xmk) = lim [Γ(β/km)]km h−1(yk), h −1(y2k), . . . , h −1(ymk) [yi+1 − yi] dy1 . . . dymk , h−1 , . . . , h−1 (g (1)) dQβ(g). Now we treat the general case h ∈ C2. We choose a sequence of C3-functions hn ∈ G with hn → h in C2. Then h−1 (g(t1)) , h −1 (g(t2)) , . . . , h −1 (g(tm)) dQβ(g) = lim h−1n (g(t1)) , h n (g(t2)) , . . . , h n (g(tm)) dQβ(g) = lim u (g(t1), g(t2), . . . , g(tm)) · Y βhn(g) dQ u (g(t1), g(t2), . . . , g(tm)) · Y βh (g) dQ β(g). For the last equality, we have used the dominated convergence Y (g) → Y βh (g) (due to Lemma 4.8). 4.5 Proof for the Interval Case The proof of Theorem 4.3 uses completely analogous arguments as in the previous section. To simplify notation, for h ∈ C1([0, 1]), k ∈ N let Xh,k, Y 0h,k : G0 → R be defined by Xh,k(g) := h (g(ti+1))− h (g(ti)) g(ti+1)− g(ti) ]ti+1−ti Y 0h,k(g) := g(t1)− g(t0) h (g(t1))− h (g(t0)) ] k−1 h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) where ti = with i = 0, 1, . . . , k. Similar to the proof of theorem 4.1 the measure Q 0 satisfies the following finite dimensional quasi-invariance formula. For any u : [0, 1]m−1 → R, m, l ∈ N and C1-isomorphism h : [0, 1] → [0, 1] h−1 (g(t1)) , h −1 (g(t2)) , . . . , h −1 (g(tm−1)) 0 (g) u (g(t1), g(t2), . . . 
, g(tm−1)) ·Xβh,l·m(g) · Y h,l·m(g) dQ 0 (g), where ti = , i = 1, · · · ,m−1. The passage to the limit for letting first l and then m to infinity is based on the following assertions. Lemma 4.9. (i) For each C2-isomorphism h ∈ G0 and g ∈ G0 Xh(g) = lim Xh,k(g). (ii) For each C3-isomorphism h ∈ G0 and g ∈ G0 Y 0h,k(g) = h′(g(a+)) · h′(g(a−)) δ(h◦g) h′(g(0)) · h′(g(1−)) 1 if g(1−) = g(1) h′(g(1−)) δ(h◦g) else, where Jg ⊂]0, 1[ is the set of jump locations of g on ]0, 1[. In particular, Y 0h,k(g) = Yh,0(g) for Q 0 -a.e.g. (iii) For all g ∈ G0 and C2-isomorphism h ∈ G0, the infinite product in the definition of Yh,0(g) converges. There exists a constant C = C(β, h) such that ∀g ∈ G0 ≤ Y β (g) ≤ C. (iv) If hn → h in C2([0, 1], [0, 1]) with h as above, then Y 0,hn(g) → Y0,h(g). (v) For each C3-isomorphism h ∈ G0 there exists a constant C = C(β, h) such that ∀g ∈ G, ∀k ∈ N ≤ Xβh,k(g) · Y h,k(g) ≤ C. Proof. The proofs of (i) and (iii)-(iv) carry over from their respective counterparts on the sphere, lemmas 4.6 and 4.8 above. We sketch the proof of statement (ii) which needs most modification. For ε > 0 choose l ∈ N large enough and let a2, . . . , al−1 denote the l − 2 largest jumps of g on ]0, 1[. For k very large (compared with l) we may assume that a2, . . . , al−2 ∈] 2k , 1 − [. Put a1 := , al := 1 − 1k . For j = 1, . . . , l let kj denote the index i ∈ {1, . . . , k − 1}, for which aj ∈ [ti, ti+1[. In particular, k1 = 1 and kl = k−1. 
Then using the same arguments as in lemma 4.7 one obtains, for k and l sufficiently large, the two sided bounds (I) = e2ε · g(tkj+1) g(tkj+1) i∈{1,...,k−1}\{k1,...,kl} h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) ≥ e−2ε · C−ε/23 · g(tkj+1) g(tkj+1) ) = (II) For fixed l and k → ∞ the bounds (I) and (II) converge to (I′) = e2ε h′(g(a2−)) h′(g(0)) h′ (g(aj+1−)) h′ (g(aj+)) h′(g(1−)) h′(g(al−1+)) (II′) = e−2ε · C−ε/23 · h′(g(a2−)) h′(g(0)) h′ (g(aj+1−)) h′ (g(aj+)) h′(g(1−)) h′(g(al−1+)) It remains to consider the three remaining terms (III) = i∈{k2,...,kl−1} h′ (g(ti)) · g(ti+1)− g(ti) h (g(ti+1))− h (g(ti)) which for fixed l and k → ∞ converges to (III′) = h′ (g(aj−)) · δ(h ◦ g) (IV) = )− g(0) − h (g(0)) )− g( 1 converging by right continuity of g to (IV′) = h′(g(0)) (V) = k − 1 g(1) − g(k−1 h (g(1)) − h g(k−1 which tends, also for k → ∞, to (V′) = 1 if g continuous in 1 δ(h◦g) (1) 1 h′(g(1−)) else. Combining these estimates and letting l → ∞, we obtain the first claim. The second claim in statement (ii) follows from the fact that g is continuous in t = 1 Q 0 -almost surely. 5 The Integration by Parts Formula In order to construct Dirichlet forms and Markov processes on G, we will consider it as an infinite dimensional manifold. For each g ∈ G, the tangent space TgG will be an appropriate completion of the space C∞(S1,R). The whole construction will strongly depend on the choice of the norm on the tangent spaces TgG. Basically, we will encounter two important cases: • in Chapter 6 we will study the case TgG = Hs(S1,Leb) for some s > 1/2, independent of g; this approach is closely related to the construction of stochastic processes on the diffeomorphism group of S1 and Malliavin’s Brownian motion on the homeomorphism group on S1, cf. [Mal99]. 
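• in Chapters 7-9 we will assume $T_g\mathcal{G} = L^2(S^1, g_*\mathrm{Leb})$; in terms of the dynamics on the space $\mathcal{P}(S^1)$ of probability measures, this will lead to a Dirichlet form and a stochastic process associated with the Wasserstein gradient and with intrinsic metric given by the Wasserstein distance.

In this chapter, we develop the basic tools for the differential calculus on $\mathcal{G}$. The main result will be an integration by parts formula. These results will be independent of the choice of the norm on the tangent space.

5.1 The Drift Term

For each $\varphi \in C^\infty(S^1,\mathbb{R})$, the flow generated by $\varphi$ is the map $e_\varphi : \mathbb{R}\times S^1 \to S^1$ where for each $x \in S^1$ the function $e_\varphi(\,\cdot\,,x) : \mathbb{R}\to S^1$, $t \mapsto e_\varphi(t,x)$ denotes the unique solution to the ODE
$$\frac{dx_t}{dt} = \varphi(x_t) \tag{5.1}$$
with initial condition $x_0 = x$. Since $e_\varphi(t,x) = e_{t\varphi}(1,x)$ for all $\varphi$, $t$, $x$ under consideration, we may simplify notation and write $e_{t\varphi}(x)$ instead of $e_\varphi(t,x)$. Obviously, for each $\varphi \in C^\infty(S^1,\mathbb{R})$ the family $e_{t\varphi}$, $t \in \mathbb{R}$, is a group of orientation preserving $C^\infty$-diffeomorphisms of $S^1$. (In particular, $e_0$ is the identity map $e$ on $S^1$, $e_{t\varphi}\circ e_{s\varphi} = e_{(t+s)\varphi}$ for all $s,t \in \mathbb{R}$ and $(e_\varphi)^{-1} = e_{-\varphi}$.) Since $\frac{\partial}{\partial t}e_{t\varphi}(x)\big|_{t=0} = \varphi(x)$ we obtain as a linearization for small $t$
$$e_{t\varphi}(x) \approx x + t\varphi(x). \tag{5.2}$$
More precisely,
$$|e_{t\varphi}(x)-(x+t\varphi(x))| \le C\cdot t^2 \quad\text{as well as}\quad \Big|\frac{\partial}{\partial x}e_{t\varphi}(x)-(1+t\varphi'(x))\Big| \le C\cdot t^2$$
uniformly in $x$ and $|t| \le 1$.

For $\varphi \in C^\infty(S^1,\mathbb{R})$ and $\beta > 0$ we define functions $V^\beta_\varphi : \mathcal{G}\to\mathbb{R}$ by
$$V^\beta_\varphi(g) := V^0_\varphi(g) + \beta\int_0^1 \varphi'(g(x))\,dx$$
where
$$V^0_\varphi(g) := \sum_{a\in J_g}\left[\frac{\varphi'(g(a+))+\varphi'(g(a-))}{2} - \frac{\varphi(g(a+))-\varphi(g(a-))}{g(a+)-g(a-)}\right]. \tag{5.3}$$

Lemma 5.1. (i) The sum in (5.3) is absolutely convergent. More precisely,
$$|V^0_\varphi(g)| \le \sum_{a\in J_g}\left|\frac{\varphi'(g(a+))+\varphi'(g(a-))}{2} - \frac{\varphi(g(a+))-\varphi(g(a-))}{g(a+)-g(a-)}\right| \le \frac12\int_{S^1}|\varphi''(x)|\,dx$$
and
$$|V^\beta_\varphi(g)| \le (1/2+\beta)\cdot\int_{S^1}|\varphi''(x)|\,dx.$$
(ii) For each $\beta \ge 0$
$$V^\beta_\varphi(g) = \frac{\partial}{\partial t}Y^\beta_{e_{t\varphi}}(g)\Big|_{t=0} = \frac{\partial}{\partial t}Y^\beta_{e+t\varphi}(g)\Big|_{t=0}. \tag{5.4}$$

Proof. (i) According to Taylor's formula, for each $a \in J_g$
$$\frac{\varphi'(g(a+))+\varphi'(g(a-))}{2} - \frac{\delta(\varphi\circ g)}{\delta g}(a) = \frac{1}{2(g(a+)-g(a-))}\int_{g(a-)}^{g(a+)}\int_{g(a-)}^{g(a+)}\mathrm{sgn}(y-x)\cdot\varphi''(y)\,dy\,dx.$$
Hence,
$$\sum_{a\in J_g}\left|\frac{\varphi'(g(a+))+\varphi'(g(a-))}{2} - \frac{\delta(\varphi\circ g)}{\delta g}(a)\right| \le \sum_{a\in J_g}\frac{1}{2(g(a+)-g(a-))}\int_{g(a-)}^{g(a+)}\int_{g(a-)}^{g(a+)}|\varphi''(y)|\,dy\,dx$$
$$= \frac12\sum_{a\in J_g}\int_{g(a-)}^{g(a+)}|\varphi''(y)|\,dy \le \frac12\int_{S^1}|\varphi''(y)|\,dy.$$
Finally,
$$\Big|\int_0^1 \varphi'(g(x))\,dx\Big| \le \sup_y|\varphi'(y)| \le \int_{S^1}|\varphi''(y)|\,dy.$$

(ii) Let us first consider the case $\beta = 0$. Then
$$\frac{\partial}{\partial t}\log Y^0_{e_{t\varphi}}(g)\Big|_{t=0} = \frac{\partial}{\partial t}\sum_{a\in J_g}\left[\frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a+)) + \frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a-)) - \log\frac{\delta(e_{t\varphi}\circ g)}{\delta g}(a)\right]\Big|_{t=0}$$
$$= \sum_{a\in J_g}\frac{\partial}{\partial t}\left[\frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a+)) + \frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a-)) - \log\frac{\delta(e_{t\varphi}\circ g)}{\delta g}(a)\right]\Big|_{t=0}.$$
In order to justify that we may interchange differentiation and summation, we decompose (as we did several times before) the infinite sum over all jumps in $J_g$ into a finite sum over big jumps $a_1,\dots,a_k$ and an infinite sum over small jumps in $J_g(k) = J_g\setminus\{a_1,\dots,a_k\}$. Of course, the finite sum poses no problem. We are going to prove that the contribution of the small jumps is arbitrarily small. Recall from Lemma 4.8 that
$$\left|\sum_{a\in J_g(k)}\left[\frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a+)) + \frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a-)) - \log\frac{\delta(e_{t\varphi}\circ g)}{\delta g}(a)\right]\right| \le C_t\cdot\sum_{a\in J_g(k)}[g(a+)-g(a-)]$$
where $C_t := \sup_x\big|\frac{\partial}{\partial x}\log\big(\frac{\partial}{\partial x}e_{t\varphi}\big)(x)\big|$. Now $C_t \le C\cdot|t|$ for all $|t| \le 1$ and an appropriate constant $C$. Thus for any given $\epsilon > 0$
$$\left|\frac{1}{t}\sum_{a\in J_g(k)}\left[\frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a+)) + \frac12\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(a-)) - \log\frac{\delta(e_{t\varphi}\circ g)}{\delta g}(a)\right]\right| \le \epsilon$$
provided $k$ is chosen large enough (i.e. such that $C\cdot\sum_{a\in J_g(k)}|g(a+)-g(a-)| \le \epsilon$). This justifies the above interchange of differentiation and summation.

Now for each $x \in S^1$
$$\frac{\partial}{\partial t}\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(x)\Big|_{t=0} = \varphi'(x)$$
since the linearization of $e_{t\varphi}$ for small $t$ yields
$$e_{t\varphi}(x) \approx x+t\varphi(x), \qquad \frac{\partial}{\partial x}e_{t\varphi}(x) \approx 1+t\varphi'(x).$$
Similarly, for small $t$ we obtain
$$\frac{\delta(e_{t\varphi}\circ g)}{\delta g}(a) \approx 1 + t\cdot\frac{\delta(\varphi\circ g)}{\delta g}(a)$$
and thus
$$\frac{\partial}{\partial t}\log\frac{\delta(e_{t\varphi}\circ g)}{\delta g}(a)\Big|_{t=0} = \frac{\delta(\varphi\circ g)}{\delta g}(a).$$
Therefore,
$$\frac{\partial}{\partial t}\log Y^0_{e_{t\varphi}}(g)\Big|_{t=0} = V^0_\varphi(g).$$
On the other hand, obviously
$$\frac{\partial}{\partial t}\log Y^0_{e_{t\varphi}}(g)\Big|_{t=0} = \frac{\partial}{\partial t}Y^0_{e_{t\varphi}}(g)\Big|_{t=0}$$
since $Y^0_{e_0}(g) = 1$.

Finally, we have to consider the derivative of $X_{e_{t\varphi}}$. Based on the previous arguments, and using the fact that $\frac{\partial}{\partial t}\log\big(\frac{\partial}{\partial x}e_{t\varphi}\big)(x)$ is uniformly bounded in $t \in [-1,1]$ and $x \in S^1$, we immediately see
$$\frac{\partial}{\partial t}\log X_{e_{t\varphi}}(g)\Big|_{t=0} = \frac{\partial}{\partial t}\int_0^1\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(y))\,dy\,\Big|_{t=0} = \int_0^1\frac{\partial}{\partial t}\log\Big(\frac{\partial}{\partial x}e_{t\varphi}\Big)(g(y))\Big|_{t=0}\,dy = \int_0^1\varphi'(g(y))\,dy.$$
Again $X_{e_0}(g) = 1$. Therefore,
$$\frac{\partial}{\partial t}X^\beta_{e_{t\varphi}}(g)\Big|_{t=0} = \beta\cdot\int_0^1\varphi'(g(y))\,dy$$
and thus
$$\frac{\partial}{\partial t}Y^\beta_{e_{t\varphi}}(g)\Big|_{t=0} = V^\beta_\varphi(g).$$
This proves the first identity in (5.4). The proof of the second one, $V^\beta_\varphi(g) = \frac{\partial}{\partial t}Y^\beta_{e+t\varphi}(g)\big|_{t=0}$, is similar (even slightly easier).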
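The second-order accuracy of the linearization (5.2), on which the proof above relies, is easy to observe numerically. The sketch below integrates the flow equation (5.1) with a classical fourth-order Runge-Kutta scheme for one illustrative vector field, $\varphi(x) = \sin(2\pi x)/(2\pi)$ on $S^1 = \mathbb{R}/\mathbb{Z}$, and measures $\sup_x |e_{t\varphi}(x)-(x+t\varphi(x))|$. The vector field, the step counts and the function names are assumptions made for this example.

```python
import math


def phi(x):
    """Illustrative smooth vector field on S^1 = R/Z (an assumption, not from the text)."""
    return math.sin(2.0 * math.pi * x) / (2.0 * math.pi)


def flow(t, x, steps=200):
    """Approximate e_{t phi}(x): the time-t solution of dx/dt = phi(x) started at x (RK4)."""
    dt = t / steps
    for _ in range(steps):
        k1 = phi(x)
        k2 = phi(x + 0.5 * dt * k1)
        k3 = phi(x + 0.5 * dt * k2)
        k4 = phi(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def linearization_error(t, n_points=400):
    """Maximum over a grid of |e_{t phi}(x) - (x + t phi(x))|, computed in the lift to R."""
    grid = (i / n_points for i in range(n_points))
    return max(abs(flow(t, x) - (x + t * phi(x))) for x in grid)


if __name__ == "__main__":
    for t in (0.04, 0.02, 0.01):  # illustrative time steps
        err = linearization_error(t)
        print(t, err, err / t ** 2)
```

Halving $t$ should divide the error by roughly four, and the ratio of the error to $t^2$ should stay bounded, consistent with the estimate $|e_{t\varphi}(x)-(x+t\varphi(x))| \le C\cdot t^2$.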
5.2 Directional Derivatives For functions u : G → R we will define the directional derivative along ϕ ∈ C∞(S1,R) by Dϕu(g) := lim [u(etϕ ◦ g)− u(g)] (5.5) provided this limit exists. In particular, this will be the case for the following ’cylinder functions’. Definition 5.2. We say that u : G → R belongs to the class Sk(G) if it can be written as u(g) = U(g(x1), . . . , g(xm)) (5.6) for some m ∈ N, some x1, . . . , xm ∈ S1 and some Ck-function U : (S1)m → R. It should be mentioned that functions u ∈ Sk(G) are in general not continuous on G. Lemma 5.3. The directional derivative exists for all u ∈ S1(G). In particular, for u as above Dϕu(g) = lim [u(g + t · ϕ ◦ g) − u(g)] ∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi)) with ∂iU := U . Moreover, Dϕ : S k(G) → Sk−1(G) for all k ∈ N ∪ {∞} and ‖Dϕu‖L2(Qβ) ≤ m · ‖∇U‖∞ · ‖ϕ‖L2(S1). Proof. The first claim follows from Dϕu(g) = U(etϕ(g(x1)), . . . , etϕ(g(xm))) ∂iU(etϕ(g(x1)), . . . , etϕ(g(xm))) · etϕ(g(xi)) ∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi)) U(g(x1) + tϕ(g(x1)), . . . , g(xm) + tϕ(g(xm))) = lim [u(g + t · ϕ ◦ g)− u(g)] . For the second claim, ‖Dϕu‖2L2(Qβ) = ∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi)) dQβ(g) (∂iU) 2(g(x1), . . . , g(xm)) · ϕ2(g(xi)) dQβ(g) ≤ ‖∇U‖2∞ · ϕ2(g(xi)) dQ = m · ‖∇U‖2∞ · ϕ2(y) dy. 5.3 Integration by Parts Formula on P(S1) For ϕ ∈ C∞(S1,R) let D∗ϕ denote the operator in L2(G,Qβ) adjoint to Dϕ with domain S1(G). Proposition 5.4. Dom(D∗ϕ) ⊃ S1(G) and for all u ∈ S1(G) D∗ϕu = −Dϕu− V βϕ · u. (5.7) Proof. Let u, v ∈ S1(G). Then Dϕu · v dQβ = lim [u(etϕ ◦ g) − u(g)] · v(g) dQβ(g) = lim u(g) · v(e−tϕ ◦ g) · Y βe−tϕ − u(g) · v(g) dQβ(g) = lim u(g) · [v(e−tϕ ◦ g)− v(g)] dQβ(g) + lim u(g) · v(g) · Y βe−tϕ − 1 dQβ(g) + lim u(g) · [v(e−tϕ ◦ g) − v(g)] · Y βe−tϕ − 1 dQβ(g) u ·Dϕv dQβ(g)− u · v · V βϕ dQβ(g) + 0. To justify the last equality, note that according to Lemma 4.8 | log Y βetϕ | ≤ C · |t| for |t| ≤ 1. Hence, the claim follows with dominated convergence and Lemma 5.4. Corollary 5.5. 
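The operator $(D_\varphi,\mathfrak{S}^1(\mathcal{G}))$ is closable in $L^2(\mathbb{Q}^\beta)$. Its closure will be denoted by $(D_\varphi,\mathrm{Dom}(D_\varphi))$. In other words, $\mathrm{Dom}(D_\varphi)$ is the closure (or completion) of $\mathfrak{S}^1(\mathcal{G})$ with respect to the norm
\[ u \mapsto \Big(\int_{\mathcal{G}} \big[u^2 + (D_\varphi u)^2\big]\, d\mathbb{Q}^\beta\Big)^{1/2}. \]
Of course, the space $\mathrm{Dom}(D_\varphi)$ will depend on $\beta$, but we assume $\beta > 0$ to be fixed for the sequel.

Remark 5.6. The bilinear form
\[ \mathcal{E}_\varphi(u,v) := \int_{\mathcal{G}} D_\varphi u\cdot D_\varphi v\, d\mathbb{Q}^\beta, \qquad \mathrm{Dom}(\mathcal{E}_\varphi) := \mathrm{Dom}(D_\varphi) \tag{5.8} \]
is a Dirichlet form on $L^2(\mathcal{G},\mathbb{Q}^\beta)$ with form core $\mathfrak{S}^\infty(\mathcal{G})$. Its generator $(L_\varphi,\mathrm{Dom}(L_\varphi))$ is the Friedrichs extension of the symmetric operator $(-D^*_\varphi\circ D_\varphi,\mathfrak{S}^2(\mathcal{G}))$.

5.4 Derivatives and Integration by Parts Formula on P([0, 1])

Now let us look at flows on $[0,1]$. To do so, let a function $\varphi \in C^\infty([0,1],\mathbb{R})$ with $\varphi(0) = \varphi(1) = 0$ be given. (Note that each such function can be regarded as $\varphi \in C^\infty(S^1,\mathbb{R})$ with $\varphi(0) = 0$.) The flow equation (5.1) now defines a flow $e_{t\varphi}$, $t \in \mathbb{R}$, of order preserving $C^\infty$ diffeomorphisms of $[0,1]$. In particular, $e_{t\varphi}(0) = 0$ and $e_{t\varphi}(1) = 1$ for all $t \in \mathbb{R}$. Lemma 5.1 together with Theorem 4.3 immediately yields

Lemma 5.7. For $\varphi \in C^\infty([0,1],\mathbb{R})$ with $\varphi(0) = \varphi(1) = 0$ and each $\beta \ge 0$,
\[ \frac{\partial}{\partial t}\, Y^\beta_{e_{t\varphi},0}(g)\Big|_{t=0} = V^\beta_\varphi(g) - \frac{\varphi'(0)+\varphi'(1)}{2} =: V^\beta_{\varphi,0}(g). \tag{5.9} \]

For functions $u : \mathcal{G}_0 \to \mathbb{R}$ we define the directional derivative along $\varphi \in C^\infty([0,1],\mathbb{R})$ with $\varphi(0) = \varphi(1) = 0$ as before by
\[ D_\varphi u(g) := \lim_{t\to 0}\frac{1}{t}\,\big[u(e_{t\varphi}\circ g)-u(g)\big] \tag{5.10} \]
provided this limit exists. We will consider three classes of 'cylinder functions' for which the existence of this limit is guaranteed.

Definition 5.8. (i) We say that a function $u : \mathcal{G}_0 \to \mathbb{R}$ belongs to the class $\mathfrak{C}^k(\mathcal{G}_0)$ (for $k \in \mathbb{N}\cup\{0,\infty\}$) if it can be written as
\[ u(g) = U\Big(\int_0^1 \vec f(t)\,g(t)\,dt\Big) \tag{5.11} \]
for some $m \in \mathbb{N}$, some $\vec f = (f_1,\ldots,f_m)$ with $f_i \in L^2([0,1],\mathrm{Leb})$ and some $C^k$-function $U : \mathbb{R}^m \to \mathbb{R}$. Here and in the sequel, we write
\[ \int_0^1 \vec f(t)\,g(t)\,dt = \Big(\int_0^1 f_1(t)g(t)\,dt,\ \ldots,\ \int_0^1 f_m(t)g(t)\,dt\Big). \]
(ii) We say that $u : \mathcal{G}_0 \to \mathbb{R}$ belongs to the class $\mathfrak{S}^k(\mathcal{G}_0)$ if it can be written as
\[ u(g) = U(g(x_1),\ldots,g(x_m)) \tag{5.12} \]
for some $m \in \mathbb{N}$, some $x_1,\ldots,x_m \in [0,1]$ and some $C^k$-function $U : \mathbb{R}^m \to \mathbb{R}$.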
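(iii) We say that $u : \mathcal{G}_0 \to \mathbb{R}$ belongs to the class $\mathfrak{Z}^k(\mathcal{G}_0)$ if it can be written as
\[ u(g) = U\Big(\int_0^1 \vec\alpha(g_s)\,ds\Big) \tag{5.13} \]
with $U$ as above, $\vec\alpha = (\alpha_1,\ldots,\alpha_m) \in C^k([0,1],\mathbb{R}^m)$ and
\[ \int_0^1 \vec\alpha(g_s)\,ds = \Big(\int_0^1 \alpha_1(g_s)\,ds,\ \ldots,\ \int_0^1 \alpha_m(g_s)\,ds\Big). \]

Remark 5.9. For each $\varphi \in C^\infty(S^1,\mathbb{R})$ with $\varphi(0) = 0$ (which can be regarded as $\varphi \in C^\infty([0,1],\mathbb{R})$ with $\varphi(0) = \varphi(1) = 0$), the definitions of $D_\varphi$ in (5.5) and (5.10) are consistent in the following sense. Each cylinder function $u \in \mathfrak{S}^1(\mathcal{G}_0)$ defines by $v(g) := u(g - g_0)$ ($\forall g \in \mathcal{G}$) a cylinder function $v \in \mathfrak{S}^1(\mathcal{G})$ with $D_\varphi v = D_\varphi u$ on $\mathcal{G}_0$. Conversely, each cylinder function $v \in \mathfrak{S}^1(\mathcal{G})$ defines by $u(g) := v(g)$ ($\forall g \in \mathcal{G}_0$) a cylinder function $u \in \mathfrak{S}^1(\mathcal{G}_0)$ with $D_\varphi v = D_\varphi u$ on $\mathcal{G}_0$.

Lemma 5.10. (i) The directional derivative $D_\varphi u(g)$ exists for all $u \in \mathfrak{C}^1(\mathcal{G}_0)\cup\mathfrak{S}^1(\mathcal{G}_0)\cup\mathfrak{Z}^1(\mathcal{G}_0)$ (in each point $g \in \mathcal{G}_0$ and in each direction $\varphi \in C^\infty([0,1],\mathbb{R})$ with $\varphi(0) = \varphi(1) = 0$) and
\[ D_\varphi u(g) = \lim_{t\to 0}\frac{1}{t}\,\big[u(g+t\cdot\varphi\circ g)-u(g)\big]. \]
Moreover,
\[ D_\varphi u(g) = \sum_{i=1}^m \partial_i U\Big(\int_0^1 \vec f(t)g(t)\,dt\Big)\cdot\int_0^1 f_i(t)\,\varphi(g(t))\,dt \]
for each $u \in \mathfrak{C}^1(\mathcal{G}_0)$ as in (5.11),
\[ D_\varphi u(g) = \sum_{i=1}^m \partial_i U(g(x_1),\ldots,g(x_m))\cdot\varphi(g(x_i)) \]
for each $u \in \mathfrak{S}^1(\mathcal{G}_0)$ as in (5.12), and
\[ D_\varphi u(g) = \sum_{i=1}^m \partial_i U\Big(\int_0^1 \vec\alpha(g_s)\,ds\Big)\cdot\int_0^1 \alpha_i'(g_s)\,\varphi(g_s)\,ds \]
for each $u \in \mathfrak{Z}^1(\mathcal{G}_0)$ as in (5.13).

(ii) For $\varphi \in C^\infty([0,1],\mathbb{R})$ with $\varphi(0) = \varphi(1) = 0$ let $D^*_{\varphi,0}$ denote the operator in $L^2(\mathcal{G}_0,\mathbb{Q}^\beta_0)$ adjoint to $D_\varphi$. Then for all $u \in \mathfrak{C}^1(\mathcal{G}_0)\cup\mathfrak{S}^1(\mathcal{G}_0)\cup\mathfrak{Z}^1(\mathcal{G}_0)$
\[ D^*_{\varphi,0}u = -D_\varphi u - V^\beta_{\varphi,0}\cdot u. \tag{5.14} \]

Proof. See the proof of the analogous results in Lemma 5.3 and Proposition 5.4.

Remark 5.11. The operators $(D_\varphi,\mathfrak{C}^1(\mathcal{G}_0))$, $(D_\varphi,\mathfrak{S}^1(\mathcal{G}_0))$ and $(D_\varphi,\mathfrak{Z}^1(\mathcal{G}_0))$ are closable in $L^2(\mathcal{G}_0,\mathbb{Q}^\beta_0)$. The closures of $(D_\varphi,\mathfrak{C}^1(\mathcal{G}_0))$, $(D_\varphi,\mathfrak{Z}^1(\mathcal{G}_0))$ and $(D_\varphi,\mathfrak{S}^1(\mathcal{G}_0))$ coincide. They will be denoted by $(D_\varphi,\mathrm{Dom}(D_\varphi))$. See (proof of) Corollary 6.11.

6 Dirichlet Form and Stochastic Dynamics on G

At each point $g \in \mathcal{G}$, the directional derivative $D_\varphi u(g)$ of any 'nice' function $u$ on $\mathcal{G}$ defines a linear form $\varphi \mapsto D_\varphi u(g)$ on $C^\infty(S^1)$. If we specify a pre-Hilbert norm $\|\cdot\|_g$ on $C^\infty(S^1)$ for which this linear form is continuous, then there exists a unique element $Du(g) \in T_g\mathcal{G}$ with $D_\varphi u(g) = \langle Du(g),\varphi\rangle_g$ for all $\varphi \in C^\infty(S^1)$. Here $T_g\mathcal{G}$ denotes the completion of $C^\infty(S^1)$ w.r.t.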
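the norm $\|\cdot\|_g$. The canonical choice of a Dirichlet form on $\mathcal{G}$ will then be (the closure of)
\[ \mathcal{E}(u,v) = \int_{\mathcal{G}} \langle Du(g),Dv(g)\rangle_g\, d\mathbb{Q}^\beta(g), \qquad u,v \in \mathfrak{S}^1(\mathcal{G}). \tag{6.1} \]
Given such a Dirichlet form, there is a straightforward procedure to construct an operator ('generalized Laplacian') and a Markov process ('generalized Brownian motion'). Different choices of $\|\cdot\|_g$ will in general lead to completely different Dirichlet forms, operators and Markov processes. We will discuss two choices in detail: in this chapter we choose $\|\cdot\|_g$ (independent of $g$) to be the Sobolev norm $\|\cdot\|_{H^s}$ for some $s > 1/2$; in the remaining chapters, $\|\cdot\|_g$ will always be the $L^2$-norm $\varphi \mapsto (\int \varphi(g_t)^2\,dt)^{1/2}$ of $L^2(S^1,g_*\mathrm{Leb})$.

For the sequel, fix once and for all the number $\beta > 0$ and drop it from the notation, i.e. $\mathbb{Q} := \mathbb{Q}^\beta$, $V_\varphi := V^\beta_\varphi$ etc.

6.1 The Dirichlet Form on G

Let $(\psi_k)_{k\in\mathbb{N}}$ denote the standard Fourier basis of $L^2(S^1)$. That is,
\[ \psi_{2k}(x) = 2\cdot\sin(2\pi kx), \qquad \psi_{2k+1}(x) = 2\cdot\cos(2\pi kx) \]
for $k = 1,2,\ldots$ and $\psi_1(x) = 1$. It constitutes a complete orthonormal system in $L^2(S^1)$: each $\varphi \in L^2(S^1)$ can uniquely be written as $\varphi(x) = \sum_{k=1}^\infty c_k\cdot\psi_k(x)$ with Fourier coefficients of $\varphi$ given by $c_k := \int_{S^1}\varphi(y)\psi_k(y)\,dy$. In terms of these Fourier coefficients we define for each $s \ge 0$ the norm
\[ \|\varphi\|_{H^s} := \Big(c_1^2 + \sum_{k=1}^\infty k^{2s}\cdot\big(c_{2k}^2 + c_{2k+1}^2\big)\Big)^{1/2} \tag{6.2} \]
on $C^\infty(S^1)$. The Sobolev space $H^s(S^1)$ is the completion of $C^\infty(S^1)$ with respect to the norm $\|\cdot\|_{H^s}$. It has a complete orthonormal system consisting of smooth functions $(\varphi_k)_{k\in\mathbb{N}}$. For instance, one may choose
\[ \varphi_{2k}(x) = 2\cdot k^{-s}\cdot\sin(2\pi kx), \qquad \varphi_{2k+1}(x) = 2\cdot k^{-s}\cdot\cos(2\pi kx) \tag{6.3} \]
for $k = 1,2,\ldots$ and $\varphi_1(x) = 1$.

A linear form $A : C^\infty(S^1) \to \mathbb{R}$ is continuous w.r.t. $\|\cdot\|_{H^s}$ (and thus can be represented as $A(\varphi) = \langle\psi,\varphi\rangle_{H^s}$ for some $\psi \in H^s(S^1)$ with $\|\psi\|_{H^s} = \|A\|_{H^s}$) if and only if
\[ \|A\|_{H^s} := \Big(|A(\psi_1)|^2 + \sum_{k=1}^\infty k^{-2s}\cdot\big(|A(\psi_{2k})|^2 + |A(\psi_{2k+1})|^2\big)\Big)^{1/2} < \infty. \tag{6.4} \]
(The weight $k^{-2s}$ is the one dual to the weight $k^{2s}$ in (6.2); it is also what makes (6.4) agree with the expression $\sum_k |A(\varphi_k)|^2$ for the family (6.3).)

Proposition 6.1. Fix a number $s > 1/2$. Then for each cylinder function $u \in \mathfrak{S}^1(\mathcal{G})$ and each $g \in \mathcal{G}$, the directional derivative defines a continuous linear form $\varphi \mapsto D_\varphi u(g)$ on $C^\infty(S^1) \subset H^s(S^1)$.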
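There exists a unique tangent vector $Du(g) \in H^s(S^1)$ such that $D_\varphi u(g) = \langle Du(g),\varphi\rangle_{H^s}$ for all $\varphi \in C^\infty(S^1)$. In terms of the family $\Phi = (\varphi_k)_{k\in\mathbb{N}}$ from (6.3),
\[ Du(g) = \sum_{k=1}^\infty D_{\varphi_k}u(g)\cdot\varphi_k(\cdot), \qquad \|Du(g)\|^2_{H^s} = \sum_{k=1}^\infty |D_{\varphi_k}u(g)|^2. \tag{6.5} \]

Proof. It remains to prove that the RHS of (6.5) is finite for each $u$ and $g$ under consideration. According to Lemma 5.3, for any $u \in \mathfrak{S}^1(\mathcal{G})$ represented as in (5.6),
\[ \sum_{k=1}^\infty |D_{\varphi_k}u(g)|^2 = \sum_{k=1}^\infty \Big|\sum_{i=1}^m \partial_i U(g(x_1),\ldots,g(x_m))\cdot\varphi_k(g(x_i))\Big|^2 \le m\cdot\|\nabla U\|^2_\infty\cdot\Big\|\sum_{k=1}^\infty \varphi_k^2\Big\|_\infty = m\cdot\|\nabla U\|^2_\infty\cdot\Big(1 + 4\sum_{k=1}^\infty k^{-2s}\Big). \]
And, indeed, the latter is finite for each $s > 1/2$.

For the sequel, let us now fix a number $s > 1/2$ and define
\[ \mathcal{E}(u,v) = \int_{\mathcal{G}} \langle Du(g),Dv(g)\rangle_{H^s}\, d\mathbb{Q}(g) \tag{6.6} \]
for $u,v \in \mathfrak{S}^1(\mathcal{G})$. Equivalently, in terms of the family $\Phi = (\varphi_k)_{k\in\mathbb{N}}$ from (6.3),
\[ \mathcal{E}(u,v) = \sum_{k=1}^\infty \int_{\mathcal{G}} D_{\varphi_k}u(g)\cdot D_{\varphi_k}v(g)\, d\mathbb{Q}(g). \tag{6.7} \]

Theorem 6.2. (i) $(\mathcal{E},\mathfrak{S}^1(\mathcal{G}))$ is closable. Its closure $(\mathcal{E},\mathrm{Dom}(\mathcal{E}))$ is a regular Dirichlet form on $L^2(\mathcal{G},\mathbb{Q})$ which is strongly local and recurrent (hence, in particular, conservative).

(ii) For $u \in \mathfrak{S}^1(\mathcal{G})$ with representation (5.6),
\[ \mathcal{E}(u,u) = \sum_{k=1}^\infty \int_{\mathcal{G}} \Big|\sum_{i=1}^m \partial_i U(g(x_1),\ldots,g(x_m))\cdot\varphi_k(g(x_i))\Big|^2 d\mathbb{Q}(g). \]
The generator of the Dirichlet form is the Friedrichs extension of the operator $L$ given on $\mathfrak{S}^2(\mathcal{G})$ by
\begin{align*}
Lu(g) &= \sum_{k=1}^\infty \sum_{i,j=1}^m \partial_i\partial_j U(g(x_1),\ldots,g(x_m))\,\varphi_k(g(x_i))\,\varphi_k(g(x_j)) \\
&\quad + \sum_{k=1}^\infty \sum_{i=1}^m \partial_i U(g(x_1),\ldots,g(x_m))\,\big[\varphi_k'(g(x_i)) + V_{\varphi_k}(g)\big]\,\varphi_k(g(x_i)).
\end{align*}

(iii) $\mathfrak{Z}^1(\mathcal{G})$ is a core for $\mathrm{Dom}(\mathcal{E})$ (i.e. it is contained in the latter as a dense subset). For $u \in \mathfrak{Z}^1(\mathcal{G})$ with representation (5.13),
\[ \mathcal{E}(u,u) = \sum_{k=1}^\infty \int_{\mathcal{G}} \Big|\sum_{i=1}^m \partial_i U\Big(\int \vec\alpha(g_t)\,dt\Big)\cdot\int \alpha_i'(g_t)\,\varphi_k(g_t)\,dt\Big|^2 d\mathbb{Q}(g). \]
The generator of the Dirichlet form is the Friedrichs extension of the operator $L$ given on $\mathfrak{Z}^2(\mathcal{G})$ by
\begin{align*}
Lu(g) &= \sum_{k=1}^\infty \sum_{i,j=1}^m \partial_i\partial_j U\Big(\int \vec\alpha(g_t)\,dt\Big)\cdot\int \alpha_i'(g_t)\varphi_k(g_t)\,dt\cdot\int \alpha_j'(g_t)\varphi_k(g_t)\,dt \\
&\quad + \sum_{k=1}^\infty \sum_{i=1}^m \partial_i U\Big(\int \vec\alpha(g_t)\,dt\Big)\cdot\Big\{V_{\varphi_k}(g)\int \alpha_i'(g_t)\varphi_k(g_t)\,dt + \int \big[\alpha_i''(g_t)\varphi_k^2(g_t) + \alpha_i'(g_t)\varphi_k'(g_t)\varphi_k(g_t)\big]\,dt\Big\}.
\end{align*}

(iv) The intrinsic metric $\rho$ can be estimated from below in terms of the $L^2$-metric:
\[ \rho(g,h) \ge \frac{1}{\sqrt{C}}\,\|g-h\|_{L^2}. \]

Remark 6.3. All assertions of the above Theorem remain valid for any $\mathcal{E}$ defined as in (6.7) with any choice of a sequence $\Phi = (\varphi_k)_{k\in\mathbb{N}}$ of smooth functions on $S^1$ with $C := \big\|\sum_{k} \varphi_k^2\big\|_\infty < \infty$.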
(6.8) (This condition is satisfied for the sequence from (6.3) if and only if s > 1/2.) The proof of the Theorem will make use of the following Lemma 6.4. (i) Dom(E) contains all functions u which can be represented as u(g) = U(‖g − f1‖L2 , . . . , ‖g − fm‖L2) (6.9) with some m ∈ N, some f1, . . . , fm ∈ G and some U ∈ C1(Rm,R). For each u as above, each ϕ ∈ C∞(S1) and Q-a.e. g ∈ G Dϕu(g) = ∂iU(‖g − f1‖L2 , . . . , ‖g − fm‖L2) · sign(g(t)− fi(t)) |g(t) − fi(t)| ‖g − fi‖L2 ϕ(g(t))dt where sign(z) := +1 for z ∈ S1 with |[0, z]| ≤ 1/2 and sign(z) := −1 for z ∈ S1 with |[z, 0]| < 1/2. (ii) Moreover, Dom(E) contains all functions u which can be represented as u(g) = U(gǫ1(x1), . . . , gǫm(xm)) (6.10) with some m ∈ N, some x1, . . . , xm ∈ S1, some ǫ1, . . . , ǫm ∈ ]0, 1[ and some U ∈ C1((S1)m,R). Here gǫ(x) := ∫ x+ǫ g(t)dt ∈ S1 for x ∈ S1 and 0 < ǫ < 1. More precisely, gǫ(x) := π( ∫ x+ǫ π−1g(t)dt) where π : G(R) → G (cf. section 2.2) denotes the projection and π−1 : G → G(R) the canonical lift with π−1(g)(t) ∈ [g(x), g(x) + 1] ⊂ R for t ∈ [x, x+ 1] ⊂ R. For each u as above, each ϕ ∈ C∞(S1) and each g ∈ G Dϕu(g) = ∂iU(gǫ1(x1), . . . , gǫm(xm)) · ∫ xi+ǫi ϕ(g(t))dt. (iii) The set of all u of the form (6.10) is dense in Dom(E). Proof. (i) Let us first prove that for each f ∈ G, the map u(g) = ‖g − f‖L2 lies in Dom(E). For n ∈ N, let πn : G → G be the map which replaces each g by the piecewise constant map: πn(g)(t) := g( ) for t ∈ [ i Then by right continuity πn(g) → g as n→ ∞ and thus |g( i )− f( i )|2 −→ |g(t) − f(t)|2dt. Therefore, for each g ∈ G as n→ ∞ un(g) := Un(g(0), g( ), . . . , g( n − 1 )) −→ u(g) (6.11) where Un(x1, . . . , xn) := i=0 dn(xi+1 − f( in)) and dn is a smooth approximation of the distance function x 7→ |x| on S1 (which itself is non-differentiable at x = 0 and x = 1 ) with |d′n| ≤ 1 and dn(x) → |x| as n→ ∞. Obviously, un ∈ S1(G). By dominated convergence, (6.11) also implies that un → u in L2(G,Q). 
Hence, u ∈ Dom(E) if (and only if) we can prove that E(un) <∞. E(un) = ∂iUn(g(0), g( ), . . . , g( n − 1 )) · ϕ(g( i − 1 dQ(g) ϕ2k(g( i − 1 )) dQ(g) = ‖ϕk‖2L2 <∞, uniformly in n ∈ N. This proves the claim for the function u(g) = ‖g − f‖L2 . From this, the general claim follows immediately: if vn, n ∈ N, is a sequence of S1(G) approx- imations of g 7→ ‖g − 0‖L2 then un(g) := U(vn(g − f1), . . . , vn(g − fm)) defines a sequence of S1(G) approximations of u(g) = U(‖g − f1‖L2 , . . . , ‖g − fm‖L2). (ii) Again it suffices to treat the particular case m = 1 and U = id, that is, u(g) = gǫ(x) for some x ∈ S1 and some 0 < ǫ < 1. Let g̃ ∈ G(R) be the lifting of g and recall that u(g) = π(1 ∫ x+ǫ g̃(t)dt). Define un ∈ S1(G) for n ∈ N by un(g) = π( 1n i=0 g̃(x + ǫ)). Right continuity of g̃ implies un → u as n → ∞ pointwise on G and thus also in L2(G,Q). To see the boundedness of E(un) note that Dϕun(g) = 1n i=0 ϕ(g(x + ǫ)). Thus E(un) ≤ ϕ2k(g(x + ǫ))dQ(g) = ‖ϕk‖2L2 <∞. (iii) We have to prove that each u ∈ S1(G) can be approximated in the norm (‖.‖2 + E(.))1/2 by functions un of type (6.10). Again it suffices to treat the particular case u(g) = g(x) for some x ∈ S1. Choose un(g) = g1/n(x). Then by right continuity of g, un → u pointwise on G and thus also in L2(G,Q). Moreover, Dϕun(g) = n ∫ x+1/n ϕ(g(t))dt (for all ϕ and g) and therefore E(un) ≤ ∫ x+1/n ϕ2k(g(t))dtdQ(g) = ‖ϕk‖2L2 <∞. Proof of the Theorem. (a) The sum E of closable bilinear forms with common domain S1(G) is closable, provided it is still finite on this domain. The latter will follow by means of Lemma 5.3 which implies for all u ∈ S1(G) with representation (5.11) E(u, u) = ∂iU(g(x1), . . . , g(xm)) · ϕk(g(xi)) dQ(g) ≤ m · ‖∇U‖2∞ · ‖ϕk‖2L2(S1) <∞. Hence, indeed E is finite on S1(G). (b) The Markov property for E follows from that of the Eϕk(u, v) = Dϕku ·Dϕkv dQ. (c) According to the previous Lemma, the class of continuous functions of type (6.10) is dense in Dom(E). 
Moreover, the class of finite energy functions of type (6.9) is dense in C(G) (with the L2 topology of G ⊂ L2(S1), cf. Proposition 2.1). Therefore, the Dirichlet form E is regular. (e) The estimate for the intrinsic metric is an immediate consequence of the following estimate for the norm of the gradient of the function u(g) = ‖g − f‖L2 (which holds for each f ∈ G uniformly in g ∈ G): ‖Du(g)‖2 = sign(g(t)− fi(t)) |g(t) − fi(t)| ‖g − fi‖L2 ϕk(g(t))dt ϕ2k(g(t))dt ≤ ‖ ϕ2k‖∞ =: C. (f) The locality is an immediate consequence of the previous estimate: Given functions u, v ∈ Dom(E) with disjoint supports, one has to prove that E(u, v) = 0. Without restriction, one may assume that supp[u] ⊂ Br(g) and supp[v] ⊂ Br(h) with ‖g − h‖L2 > 2r + 2δ. (The general case will follow by a simple covering argument.) Without restriction, u, v can be assumed to be bounded. Then |u| ≤ Cwδ,g and |v| ≤ Cwδ,h for some constant C where wδ,g(f) = (r + δ − ‖f − g‖L2) ∧ 1 Given un ∈ S1(G) with un → u in Dom(E) put un = (un ∧ wδ,g) ∨ (−wδ,g). Then un → u in Dom(E). Analogously, vn → v in Dom(E) for vn = (vn ∧ wδ,h) ∨ (−wδ,h). But obviously, E(un, vn) = 0 since un · vn = 0. Hence, E(u, v) = 0. (g) In order to prove that Z1(G) is contained in Dom(E) it suffices to prove that each u ∈ Z1(G) of the form u(g) = α(gt)dt can be approximated in Dom(E) by un ∈ S1(G). Given u as above with α ∈ C1(S1,R) put un(g) = 1n i=1 α(gi/n). Then un ∈ S1(G), un → u on G and Dϕun(g) = α′(gi/n)ϕ(gi/n) → α′(gt)ϕ(gt)dt = Dϕu(g). Moreover, E(un, un) = α′(gi/n)ϕ(gi/n) dQ(g) ≤ C · α′(gi/n) 2 dQ(g) = C · α′(t)2dt uniformly in n ∈ N. Hence, u ∈ Dom(E) and E(u, u) = lim E(un, un) = α′(gt)ϕk(gt)dt dQ(g). (h) The set Z1(G) is dense in Dom(E) since according to assertion (ii) of the previous Lemma already the subset of all u of the form (6.10) is dense in Dom(E). 
Finally, one easily verifies that $\mathfrak{Z}^2(\mathcal{G})$ is dense in $\mathfrak{Z}^1(\mathcal{G})$ and (using the integration by parts formula) that $L$ is a symmetric operator on $\mathfrak{Z}^2(\mathcal{G})$ with the given representation.

Corollary 6.5. There exists a strong Markov process $(g_t)_{t\ge 0}$ on $\mathcal{G}$, associated with the Dirichlet form $\mathcal{E}$. It has continuous trajectories and it is reversible w.r.t. the measure $\mathbb{Q}$. Its generator has the form
\[ \frac{1}{2}\sum_{k} D_{\varphi_k}D_{\varphi_k} + \frac{1}{2}\sum_{k} V_{\varphi_k}\cdot D_{\varphi_k} \]
with $\{\varphi_k\}_{k\in\mathbb{N}}$ being the Fourier basis of $H^s(S^1)$.

Remark 6.6. This process $(g_t)_{t\ge 0}$ is closely related to the stochastic processes on the diffeomorphism group of $S^1$ and to the 'Brownian motion' on the homeomorphism group of $S^1$, studied by Airault, Fang, Malliavin, Ren, Thalmaier and others [AMT04, AM06, AR02, Fan02, Fan04, Mal99]. These are processes with generator $\frac{1}{2}\sum_k D_{\varphi_k}D_{\varphi_k}$. For instance, in the case $s = 3/2$ our process from the previous Corollary may be regarded as 'Brownian motion plus drift'. All the previous approaches are restricted to $s \ge 3/2$. The main improvements of our approach are:
• identification of a probability measure $\mathbb{Q}$ such that these processes, after adding a suitable drift, are reversible;
• construction of such processes in all cases $s > 1/2$.

6.2 Finite Dimensional Noise Approximations

In the previous section, we have seen the construction of the diffusion process on $\mathcal{G}$ under minimal assumptions. However, the construction of the process is rather abstract. In this section, we try to construct explicitly a diffusion process associated with the generator of the Dirichlet form $\mathcal{E}$ from Theorem 6.2. Here we do not aim for greatest generality.

Let a finite family $\Phi = (\varphi_k)_{k=1,\ldots,n}$ of smooth functions on $S^1$ be given and let $(W_t)_{t\ge 0}$ with $W_t = (W^1_t,\ldots,W^n_t)$ be an $n$-dimensional Brownian motion, defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$. For each $x \in S^1$ we define a stochastic process $(\eta_t(x))_{t\ge 0}$ with values in $S^1$ as the strong solution of the Ito differential equation
\[ d\eta_t(x) = \sum_{k=1}^n \varphi_k(\eta_t(x))\, dW^k_t + \frac{1}{2}\sum_{k=1}^n \varphi_k'(\eta_t(x))\,\varphi_k(\eta_t(x))\, dt \tag{6.12} \]
with initial condition $\eta_0(x) = x$.
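The Ito equation (6.12) can be explored numerically. The following is a minimal sketch, not the construction used in the paper: it discretizes (6.12) with the Euler-Maruyama scheme (a swapped-in, standard approximation) for a truncated family of the type (6.3). The function names `make_family` and `euler_maruyama`, the truncation level, the step count and the normalization constant copied from (6.3) as printed are all assumptions made for illustration. All starting points are driven by the same Brownian increments, as in the definition of $\eta_t(x)$.

```python
import math
import random

def make_family(n_pairs, s):
    """Hypothetical helper: truncated family of type (6.3).

    Returns a list of pairs (phi, dphi) of a function and its derivative:
    the constant 1, followed by k^-s weighted sines and cosines.
    The prefactor 2.0 is taken from (6.3) as printed (an assumption).
    """
    fam = [(lambda x: 1.0, lambda x: 0.0)]
    for k in range(1, n_pairs + 1):
        a = 2.0 * k ** (-s)
        w = 2.0 * math.pi * k
        fam.append((lambda x, a=a, w=w: a * math.sin(w * x),
                    lambda x, a=a, w=w: a * w * math.cos(w * x)))
        fam.append((lambda x, a=a, w=w: a * math.cos(w * x),
                    lambda x, a=a, w=w: -a * w * math.sin(w * x)))
    return fam

def euler_maruyama(points, family, t_end, n_steps, rng):
    """Euler-Maruyama approximation of (6.12) for several starting points.

    Every point sees the same noise increments dW^k. The returned values
    are lifted to the real line; reduce them mod 1 to obtain points of S^1.
    """
    dt = t_end / n_steps
    eta = list(points)
    for _ in range(n_steps):
        # One Gaussian increment per driving Brownian motion W^k.
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in family]
        new_eta = []
        for x in eta:
            diffusion = sum(phi(x) * dw for (phi, _), dw in zip(family, dW))
            # Ito correction term (1/2) * sum_k phi_k'(x) * phi_k(x).
            drift = 0.5 * sum(dphi(x) * phi(x) for phi, dphi in family)
            new_eta.append(x + diffusion + drift * dt)
        eta = new_eta
    return eta

family = make_family(n_pairs=3, s=1.0)
path_end = euler_maruyama([0.1, 0.35, 0.6], family, t_end=0.1,
                          n_steps=200, rng=random.Random(0))
```

For this particular trigonometric family the Ito correction cancels pairwise, since $\varphi_{2k}'\varphi_{2k} + \varphi_{2k+1}'\varphi_{2k+1} = 0$, so the drift in the sketch is zero up to rounding; for a general family it does not vanish.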
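Equation (6.12) can be rewritten in Stratonovich form as follows:
\[ d\eta_t(x) = \sum_{k=1}^n \varphi_k(\eta_t(x)) \diamond dW^k_t. \tag{6.13} \]
Obviously, for every $t$ and for $\mathbb{P}$-a.e. $\omega \in \Omega$, the function $x \mapsto \eta_t(x,\omega)$ is an element of the semigroup $\mathcal{G}$. (Indeed, it is a $C^\infty$-diffeomorphism.) Thus (6.13) may also be interpreted as a Stratonovich SDE on the semigroup $\mathcal{G}$:
\[ d\eta_t = \sum_{k=1}^n \varphi_k(\eta_t) \diamond dW^k_t, \qquad \eta_0 = e. \tag{6.14} \]
This process on $\mathcal{G}$ is right invariant: if $g_t$ denotes the solution to (6.14) with initial condition $g_0 = g$ for some $g \in \mathcal{G}$, then $g_t = \eta_t\circ g$. One easily verifies that the generator of this process $(g_t)_{t\ge 0}$ is given on $\mathfrak{S}^2(\mathcal{G})$ by $\frac{1}{2}\sum_{k=1}^n D_{\varphi_k}D_{\varphi_k}$. What we aim for, however, is a process with generator
\[ -\frac{1}{2}\sum_{k=1}^n D^*_{\varphi_k}D_{\varphi_k} = \frac{1}{2}\sum_{k=1}^n D_{\varphi_k}D_{\varphi_k} + \frac{1}{2}\sum_{k=1}^n V_{\varphi_k}\cdot D_{\varphi_k}. \]
Define a new probability measure $\mathbb{P}^g$ on $(\Omega,\mathcal{F})$, given on $\mathcal{F}_t$ by
\[ d\mathbb{P}^g = \exp\Big(\frac{1}{2}\sum_{k=1}^n \int_0^t V_{\varphi_k}(\eta_s\circ g)\, dW^k_s - \frac{1}{8}\sum_{k=1}^n \int_0^t |V_{\varphi_k}(\eta_s\circ g)|^2\, ds\Big)\, d\mathbb{P} \tag{6.15} \]
and a semigroup $(P_t)_{t\ge 0}$ acting on bounded measurable functions $u$ on $\mathcal{G}$ as follows:
\[ P_t u(g) = \int_\Omega u(\eta_t(g(\cdot),\omega))\, d\mathbb{P}^g(\omega). \]

Proposition 6.7. $(P_t)_{t\ge 0}$ is a strongly continuous Markov semigroup on $\mathcal{G}$. Its generator is an extension of the operator $\frac{1}{2}L = -\frac{1}{2}\sum_{k=1}^n D^*_{\varphi_k}D_{\varphi_k}$ with domain $\mathfrak{S}^2(\mathcal{G})$. That is, for all $u \in \mathfrak{S}^2(\mathcal{G})$ and all $g \in \mathcal{G}$
\[ \lim_{t\to 0}\frac{1}{t}\,\big(P_t u(g) - u(g)\big) = \frac{1}{2}Lu(g). \tag{6.16} \]

Proof. The strong continuity follows easily from the fact that $\eta_t(x,\cdot) \to x$ a.s. as $t \to 0$, which implies by dominated convergence
\[ P_t u(g) = \int u(\eta_t\circ g)\, d\mathbb{P}^g \to u(g) \]
for each continuous $u : \mathcal{G} \to \mathbb{R}$. Now we aim for identifying the generator. According to Girsanov's theorem, under the measure $\mathbb{P}^g$ the processes
\[ \tilde W^k_t = W^k_t - \frac{1}{2}\int_0^t V_{\varphi_k}(\eta_s\circ g)\, ds \]
for $k = 1,\ldots,n$ define $n$ independent Brownian motions. In terms of these driving processes, (6.12) can be reformulated as
\[ dg_t(x) = \sum_{k=1}^n \varphi_k(g_t(x))\, d\tilde W^k_t + \frac{1}{2}\sum_{k=1}^n \big[\varphi_k'(g_t(x)) + V_{\varphi_k}(g_t)\big]\,\varphi_k(g_t(x))\, dt \tag{6.17} \]
(recall that $g_s = \eta_s\circ g$). The chain rule applied to a smooth function $U$ on $(S^1)^m$, therefore, yields
\begin{align*}
dU(g_t(y_1),\ldots,g_t(y_m)) &= \sum_{i=1}^m \frac{\partial}{\partial x_i}U(g_t(y_1),\ldots,g_t(y_m))\, dg_t(y_i) + \frac{1}{2}\sum_{i,j=1}^m \frac{\partial^2}{\partial x_i\partial x_j}U(g_t(y_1),\ldots,g_t(y_m))\, d\langle g_\cdot(y_i),g_\cdot(y_j)\rangle_t \\
&= \sum_{k=1}^n\sum_{i=1}^m \frac{\partial}{\partial x_i}U(g_t(y_1),\ldots,g_t(y_m))\,\varphi_k(g_t(y_i))\, d\tilde W^k_t \\
&\quad + \frac{1}{2}\sum_{k=1}^n\sum_{i=1}^m \frac{\partial}{\partial x_i}U(g_t(y_1),\ldots,g_t(y_m))\,\big[\varphi_k'(g_t(y_i)) + V_{\varphi_k}(g_t)\big]\,\varphi_k(g_t(y_i))\, dt \\
&\quad + \frac{1}{2}\sum_{k=1}^n\sum_{i,j=1}^m \frac{\partial^2}{\partial x_i\partial x_j}U(g_t(y_1),\ldots,g_t(y_m))\,\varphi_k(g_t(y_i))\,\varphi_k(g_t(y_j))\, dt.
\end{align*}
Hence, for a cylinder function of the form $u(g) = U(g(y_1),\ldots,g(y_m))$ we obtain
\begin{align*}
\lim_{t\to 0}\frac{1}{t}\big(P_t u(g) - u(g)\big) &= \lim_{t\to 0}\frac{1}{t}\int \big[U(g_t(y_1),\ldots,g_t(y_m)) - U(g_0(y_1),\ldots,g_0(y_m))\big]\, d\mathbb{P}^g \\
&= \lim_{t\to 0}\frac{1}{t}\int\!\!\int_0^t \Big[\frac{1}{2}\sum_{k=1}^n\sum_{i=1}^m \frac{\partial}{\partial x_i}U(g_s(y_1),\ldots,g_s(y_m))\,\big[\varphi_k'(g_s(y_i)) + V_{\varphi_k}(g_s)\big]\,\varphi_k(g_s(y_i)) \\
&\qquad\qquad + \frac{1}{2}\sum_{k=1}^n\sum_{i,j=1}^m \frac{\partial^2}{\partial x_i\partial x_j}U(g_s(y_1),\ldots,g_s(y_m))\,\varphi_k(g_s(y_i))\,\varphi_k(g_s(y_j))\Big]\, ds\, d\mathbb{P}^g \\
&\overset{(*)}{=} \frac{1}{2}\sum_{k=1}^n\sum_{i=1}^m \frac{\partial}{\partial x_i}U(g(y_1),\ldots,g(y_m))\,\big[\varphi_k'(g(y_i)) + V_{\varphi_k}(g)\big]\,\varphi_k(g(y_i)) \\
&\qquad + \frac{1}{2}\sum_{k=1}^n\sum_{i,j=1}^m \frac{\partial^2}{\partial x_i\partial x_j}U(g(y_1),\ldots,g(y_m))\,\varphi_k(g(y_i))\,\varphi_k(g(y_j)) \\
&= \frac{1}{2}\sum_{k=1}^n \big[D_{\varphi_k}D_{\varphi_k}u(g) + V_{\varphi_k}(g)\cdot D_{\varphi_k}u(g)\big] = -\frac{1}{2}\sum_{k=1}^n D^*_{\varphi_k}D_{\varphi_k}u(g).
\end{align*}
In order to justify $(*)$, we have to verify continuity in $s$ in all the expressions preceding $(*)$. The only term for which this is not obvious is $V_{\varphi_k}(g_s)$. But $g_s = \eta_s\circ g$ with a function $\eta_s(x,\omega)$ which is continuous in $x$ and in $s$. Thus $V_{\varphi_k}(\eta_s(\cdot,\omega)\circ g)$ is continuous in $s$.

Remark 6.8. All the previous arguments in principle also apply to infinite families $(\varphi_k)_{k=1,2,\ldots}$, provided they have sufficiently good integrability properties. For instance, the family (6.3) with $s > 5$ will do the job. There are three key steps which require a careful verification:
• the solvability of the Ito equation (6.12) and the fact that the solutions are homeomorphisms of $S^1$; here $s \ge 3$ suffices, cf. [Mal99];
• the boundedness of the quadratic variation of the drift to justify Girsanov's transformation in (6.15); for $s > 5$ this will be satisfied since Lemma 5.1 implies (uniformly in $g$) $\sum_k |V_{\varphi_k}(g)|^2 \le (\beta+1)^2\sum_k \int |\varphi_k''(x)|^2\,dx \le 4(\beta+1)^2\sum_k k^{4-2s}$;
• the finiteness of the generator and Ito's chain rule for $C^2$-cylinder functions; here $s > 3$ will be sufficient.

Remark 6.9. Another completely different approximation of the process $(g_t)_{t\ge 0}$ in terms of finite dimensional SDEs is obtained as follows. For $N \in \mathbb{N}$, let $\mathfrak{S}^1_N$ denote the set of cylinder functions $u : \mathcal{G} \to \mathbb{R}$ which can be represented as $u(g) = U(g(1/N),g(2/N),\ldots,g(1))$ for some $U \in C^1((S^1)^N)$. Denote the closure of $(\mathcal{E},\mathfrak{S}^1_N)$ by $(\mathcal{E}_N,\mathrm{Dom}(\mathcal{E}_N))$.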
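It is the image of the Dirichlet form $(E^N,\mathrm{Dom}(E^N))$ on $\Sigma_N \subset (S^1)^N$ given by
\[ E^N(U) = \sum_{i,j=1}^N \int_{\Sigma_N} \partial_i U(x)\,\partial_j U(x)\, a_{ij}(x)\,\rho(x)\, dx \tag{6.18} \]
with
\[ a_{ij}(x) = \sum_{k} \varphi_k(x_i)\,\varphi_k(x_j), \qquad \rho(x) = \frac{\Gamma(\beta)}{\Gamma(\beta/N)^N}\prod_{i=1}^N (x_{i+1}-x_i)^{\beta/N-1} \]
and (as before) $\Sigma_N = \{(x_1,\ldots,x_N) \in (S^1)^N : \sum_{i=1}^N |[x_i,x_{i+1}]| = 1\}$. That is, $\mathcal{E}_N(u) = E^N(U)$ for cylinder functions $u \in \mathfrak{S}^1_N$ as above. Let $(X_t,\mathbb{P}_x)_{t\ge 0,\,x\in\Sigma_N}$ be the Markov process on $\Sigma_N$ associated with $E^N$. Then the semigroup associated with $\mathcal{E}_N$ is given by
\[ T^N_t u(g) = \mathbb{E}_{(g(1/N),\ldots,g(1))}\big[U(X_t)\big]. \]
Now let $(g_t,\mathbb{P}_g)_{t\ge 0,\,g\in\mathcal{G}}$ and $(T_t)_{t\ge 0}$ denote the Markov process and the $L^2$-semigroup associated with $\mathcal{E}$. Then as $N \to \infty$
\[ T^{2^N}_t \to T_t \quad\text{strongly in } L^2, \]
since $\mathcal{E}_{2^N} \searrow \mathcal{E}$ in the sense of quadratic forms, [RS80], Theorem S.16. (Note that $\bigcup_{N\in\mathbb{N}} \mathfrak{S}^1_{2^N}$ is dense in $\mathrm{Dom}(\mathcal{E})$.)

6.3 Dirichlet Form and Stochastic Dynamics on G1 and P

In order to define the derivative of a function $u : \mathcal{G}_1 \to \mathbb{R}$ we regard it as a function $\tilde u$ on $\mathcal{G}$ with the property $\tilde u(g) = \tilde u(g\circ\theta_z)$ for all $z \in S^1$. This implies that $D_\varphi\tilde u(g) = (D_\varphi\tilde u)(g\circ\theta_z)$ whenever one of these expressions is well-defined. In other words, $D_\varphi\tilde u$ defines a function on $\mathcal{G}_1$ which will be denoted by $D_\varphi u$ and called the directional derivative of $u$ along $\varphi$.

Corollary 6.10. (i) Under assumption (6.8), with the notations from above,
\[ \mathcal{E}(u,u) = \sum_{k=1}^\infty \int_{\mathcal{G}_1} |D_{\varphi_k}u|^2\, d\mathbb{Q} \]
defines a regular, strongly local, recurrent Dirichlet form on $L^2(\mathcal{G}_1,\mathbb{Q})$.
(ii) The Markov process on $\mathcal{G}$ analyzed in the previous section extends to a (continuous, reversible) Markov process on $\mathcal{G}_1$.

In order to see the second claim, let $g,\tilde g \in \mathcal{G}$ with $\tilde g = g\circ\theta_z$ for some $z \in S^1$. Then obviously, $\tilde g_t(\cdot,\omega) = \eta_t(\tilde g(\cdot),\omega) = \eta_t(g(\cdot+z),\omega) = g_t(\cdot,\omega)\circ\theta_z$. Moreover, $\mathbb{P}^{\tilde g} = \mathbb{P}^g$ since $V_\varphi(g\circ\theta_z) = V_\varphi(g)$ for all $\varphi$ under consideration and all $z \in S^1$.

The objects considered previously (derivative, Dirichlet form and Markov process on $\mathcal{G}_1$) have canonical counterparts on $\mathcal{P}$. The key to these new objects is the bijective map $\chi : \mathcal{G}_1 \to \mathcal{P}$. The flow generated by a smooth 'tangent vector' $\varphi : S^1 \to \mathbb{R}$ through the point $\mu \in \mathcal{P}$ will be given by $((e_{t\varphi})_*\mu)_{t\in\mathbb{R}}$.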
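The push-forward flow $((e_{t\varphi})_*\mu)_{t\in\mathbb{R}}$ can be made concrete for an empirical measure. The sketch below is an illustration under stated assumptions, not part of the paper: it takes $e_{t\varphi}$ to be the time-$t$ solution map of the ODE $\dot y = \varphi(y)$ (the flow equation referred to as (5.1)), integrates it with a classical Runge-Kutta scheme, and moves the atoms of a measure with equal weights. The tangent vector `phi`, the helper names `flow` and `push_forward`, and the atom positions are hypothetical choices.

```python
import math

def flow(phi, x, t, n_steps=200):
    """Approximate e_{t phi}(x): solve y' = phi(y), y(0) = x, up to time t (RK4)."""
    h = t / n_steps
    y = x
    for _ in range(n_steps):
        k1 = phi(y)
        k2 = phi(y + 0.5 * h * k1)
        k3 = phi(y + 0.5 * h * k2)
        k4 = phi(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

def push_forward(phi, atoms, t):
    """Atoms of (e_{t phi})_* mu for the equal-weight empirical measure on `atoms`."""
    return [flow(phi, x, t) % 1.0 for x in atoms]

def phi(x):
    # Hypothetical smooth tangent vector on S^1 = R/Z, chosen for illustration.
    return math.sin(2.0 * math.pi * x) / (2.0 * math.pi)

mu_0 = [0.1, 0.25, 0.5, 0.8]          # atoms of the initial measure
mu_t = push_forward(phi, mu_0, 0.5)   # atoms after flowing for time t = 0.5
```

Because each $e_{t\varphi}$ is an order preserving diffeomorphism, the atoms keep their cyclic order, and zeros of $\varphi$ (here $x = 0$ and $x = 1/2$) are fixed points of the flow. For small $t$ the flow agrees with the linearization $x + t\varphi(x)$ up to an error of order $t^2$.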
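In these terms, the directional derivative of a function $u : \mathcal{P} \to \mathbb{R}$ at the point $\mu \in \mathcal{P}$ in direction $\varphi \in C^\infty(S^1,\mathbb{R})$ can be expressed as
\[ D_\varphi u(\mu) = \lim_{t\to 0}\frac{1}{t}\,\big[u((e_{t\varphi})_*\mu) - u(\mu)\big], \]
provided this limit exists. The adjoint operator to $D_\varphi$ in $L^2(\mathcal{P},\mathbb{P})$ is given (on a suitable dense subspace) by
\[ D^*_\varphi u(\mu) = -D_\varphi u(\mu) - V_\varphi(\chi^{-1}(\mu))\cdot u(\mu). \]
The drift term can be represented as
\[ V_\varphi(\chi^{-1}(\mu)) = \beta\int \varphi'(s)\,\mu(ds) + \sum_{I\in\mathrm{gaps}(\mu)} \Big[\frac{\varphi'(I_-)+\varphi'(I_+)}{2} - \frac{\varphi(I_+)-\varphi(I_-)}{|I|}\Big]. \]
Given a sequence $\Phi = (\varphi_k)_{k\in\mathbb{N}}$ of smooth functions on $S^1$ satisfying (6.8), we obtain a (regular, strongly local, recurrent) Dirichlet form $\mathcal{E}$ on $L^2(\mathcal{P},\mathbb{P})$ by
\[ \mathcal{E}(u,u) = \sum_{k} \int_{\mathcal{P}} |D_{\varphi_k}u(\mu)|^2\, d\mathbb{P}(\mu). \tag{6.19} \]
It is the image of the Dirichlet form defined in (6.7) under the map $\chi$. The generator of $\mathcal{E}$ is given on an appropriate dense subspace of $L^2(\mathcal{P},\mathbb{P})$ by
\[ L = -\sum_{k} D^*_{\varphi_k}D_{\varphi_k}. \tag{6.20} \]
For $\mathbb{P}$-a.e. $\mu_0 \in \mathcal{P}$, the associated Markov process $(\mu_t)_{t\ge 0}$ on $\mathcal{P}$ starting in $\mu_0$ is given as
\[ \mu_t(\omega) = g_t(\omega)_*\mathrm{Leb} \]
where $(g_t)_{t\ge 0}$ is the process on $\mathcal{G}$, starting in $g_0 := \chi^{-1}(\mu_0)$. (As mentioned before, $(g_t)_{t\ge 0}$ admits a more direct construction provided we restrict ourselves to a finite sequence $\Phi = (\varphi_k)_{k=1,\ldots,n}$.)

6.4 Dirichlet Form and Stochastic Dynamics on G0 and P0

For $s > 0$ and $\varphi : [0,1] \to \mathbb{R}$ let the Sobolev norm $\|\varphi\|_{H^s}$ be defined as in (6.2) and let $H^s_0([0,1])$ denote the closure of $C^\infty_c(]0,1[)$, the space of smooth $\varphi : [0,1] \to \mathbb{R}$ with compact support in $]0,1[$. If $s \ge 1/2$ (which is the only case we are interested in), $H^s_0([0,1])$ can be identified with $\{\varphi \in H^s([0,1]) : \varphi(0) = \varphi(1) = 0\}$ or equivalently with $\{\varphi \in H^s(S^1) : \varphi(0) = 0\}$.

For the sequel, fix $s > 1/2$ and a complete orthonormal basis $\Phi = \{\varphi_k\}_{k\in\mathbb{N}}$ of $H^s_0([0,1])$ with $C := \big\|\sum_k \varphi_k^2\big\|_\infty < \infty$, and define
\[ \mathcal{E}_0(u,u) = \sum_{k} \int_{\mathcal{G}_0} |D_{\varphi_k,0}u(g)|^2\, d\mathbb{Q}_0(g). \]

Corollary 6.11. $(\mathcal{E}_0,\mathfrak{S}^1(\mathcal{G}_0))$, $(\mathcal{E}_0,\mathfrak{Z}^1(\mathcal{G}_0))$ and $(\mathcal{E}_0,\mathfrak{C}^1(\mathcal{G}_0))$ are closable. Their closures coincide and define a regular, strongly local, recurrent Dirichlet form $(\mathcal{E}_0,\mathrm{Dom}(\mathcal{E}_0))$ on $L^2(\mathcal{G}_0,\mathbb{Q}_0)$.

Proof. For the closability (and the equivalence of the respective closures) of $(\mathcal{E}_0,\mathfrak{S}^1(\mathcal{G}_0))$ and $(\mathcal{E}_0,\mathfrak{Z}^1(\mathcal{G}_0))$, see the proof of Theorem 6.2.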
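Also all the assertions on the closure are deduced in the same manner. For the closability of $(\mathcal{E}_0,\mathfrak{C}^1(\mathcal{G}_0))$ (and the equivalence of its closure with the previously defined closures), see the proof of Theorem 7.8 below.

As explained in the previous subsection, these objects (invariant measure, derivative, Dirichlet form and Markov process) on $\mathcal{G}_0$ have canonical counterparts on $\mathcal{P}_0$ defined by means of the bijective map $\chi : \mathcal{G}_0 \to \mathcal{P}_0$.

7 The Canonical Dirichlet Form on the Wasserstein Space

7.1 Tangent Spaces and Gradients

The aim of this chapter is to construct a canonical Dirichlet form on the $L^2$-Wasserstein space $\mathcal{P}_0$. Due to the isometry $\chi : \mathcal{G}_0 \to \mathcal{P}_0$ this is equivalent to constructing a canonical Dirichlet form on the metric space $(\mathcal{G}_0,\|\cdot\|_{L^2})$. This can be realized in two geometric settings which seem to be completely different:

• Like in the preceding two chapters, $\mathcal{G}_0$ can be considered as a group, with composition of functions as group operation. The tangent space $T_g\mathcal{G}_0$ is the closure (w.r.t. some norm) of the space of smooth functions $\varphi : [0,1] \to \mathbb{R}$ with $\varphi(0) = \varphi(1) = 0$. Such a function $\varphi$ induces a flow on $\mathcal{G}_0$ by $(g,t) \mapsto e_{t\varphi}\circ g \approx g + t\,\varphi\circ g$ and it defines a directional derivative by $D_\varphi u(g) = \lim_{t\to 0}\frac{1}{t}[u(e_{t\varphi}\circ g) - u(g)]$ for $u : \mathcal{G}_0 \to \mathbb{R}$. The norm on $T_g\mathcal{G}_0$ we now choose to be $\|\varphi\|_{T_g} := (\int_0^1 \varphi(g_s)^2\,ds)^{1/2}$. That is, $T_g\mathcal{G}_0 := L^2([0,1],g_*\mathrm{Leb})$. For given $u$ and $g$ as above, a gradient $Du(g) \in T_g\mathcal{G}_0$ exists with $D_\varphi u(g) = \langle Du(g),\varphi\rangle_{T_g}$ ($\forall\varphi \in T_g$) if and only if $\sup_\varphi \frac{D_\varphi u(g)}{\|\varphi\circ g\|_{L^2}} < \infty$.

• Alternatively, we can regard $\mathcal{G}_0$ as a closed subset of the space $L^2([0,1],\mathrm{Leb})$. The linear structure of the latter (with the pointwise addition of functions as group operation) suggests choosing as tangent space $\mathbb{T}_g\mathcal{G}_0 := L^2([0,1],\mathrm{Leb})$. An element $f \in \mathbb{T}_g\mathcal{G}_0$ induces a flow by $(g,t) \mapsto g + tf$ and it defines a directional derivative ('Frechet derivative') by $\mathbb{D}_f u(g) = \lim_{t\to 0}\frac{1}{t}[u(g+tf) - u(g)]$ for $u : \mathcal{G}_0 \to \mathbb{R}$, provided $u$ extends to a neighborhood of $\mathcal{G}_0$ in $L^2([0,1],\mathrm{Leb})$ or the flow (induced by $f$) stays within $\mathcal{G}_0$. A gradient $\mathbb{D}u(g) \in \mathbb{T}_g\mathcal{G}_0$ exists with $\mathbb{D}_f u(g) = \langle\mathbb{D}u(g),f\rangle_{L^2}$ ($\forall f \in L^2$) if and only if $\sup_f \frac{\mathbb{D}_f u(g)}{\|f\|_{L^2}} < \infty$. In this case, $\mathbb{D}u(g)$ is the usual $L^2$-gradient.

Fortunately, both geometric settings lead to the same result.

Lemma 7.1. (i) For each $g \in \mathcal{G}_0$, the map $\iota_g : \varphi \mapsto \varphi\circ g$ defines an isometric embedding of $T_g\mathcal{G}_0 = L^2([0,1],g_*\mathrm{Leb})$ into $\mathbb{T}_g\mathcal{G}_0 = L^2([0,1],\mathrm{Leb})$. For each (smooth) cylinder function $u : \mathcal{G}_0 \to \mathbb{R}$
\[ D_\varphi u(g) = \mathbb{D}_{\varphi\circ g}u(g). \]
If $\mathbb{D}u(g) \in L^2(\mathrm{Leb})$ exists then $Du(g) \in L^2(g_*\mathrm{Leb})$ also exists.
(ii) For $\mathbb{Q}_0$-a.e. $g \in \mathcal{G}_0$, the above map $\iota_g : T_g\mathcal{G}_0 \to \mathbb{T}_g\mathcal{G}_0$ is even bijective. For each $u$ as above
\[ Du(g) = \mathbb{D}u(g)\circ g^{-1} \qquad\text{and}\qquad \|Du(g)\|_{T_g} = \|\mathbb{D}u(g)\|_{\mathbb{T}_g}. \]

Proof. (i) is obvious, (ii) follows from the fact that for $\mathbb{Q}_0$-a.e. $g \in \mathcal{G}_0$ the generalized inverse $g^{-1}$ is continuous and thus $g^{-1}(g_t) = t$ for all $t$ (see sections 3.5 and 2.1). Hence, the map $\iota_g : T_g\mathcal{G}_0 \to \mathbb{T}_g\mathcal{G}_0$ is surjective: for each $f \in \mathbb{T}_g\mathcal{G}_0$
\[ \iota_g(f\circ g^{-1}) = f\circ g^{-1}\circ g = f. \]

Example 7.2. (i) For each $u \in \mathfrak{Z}^1(\mathcal{G}_0)$ of the form $u(g) = U(\int \vec\alpha(g_t)\,dt)$ with $U \in C^1(\mathbb{R}^m,\mathbb{R})$ and $\vec\alpha = (\alpha_1,\ldots,\alpha_m) \in C^1([0,1],\mathbb{R}^m)$, the gradients $Du(g) \in T_g\mathcal{G}_0 = L^2([0,1],g_*\mathrm{Leb})$ and $\mathbb{D}u(g) \in \mathbb{T}_g\mathcal{G}_0 = L^2([0,1],\mathrm{Leb})$ exist:
\[ Du(g) = \sum_{i=1}^m \partial_i U\Big(\int \vec\alpha(g_t)\,dt\Big)\cdot\alpha_i'(\cdot), \qquad \mathbb{D}u(g) = \sum_{i=1}^m \partial_i U\Big(\int \vec\alpha(g_t)\,dt\Big)\cdot\alpha_i'(g(\cdot)) \]
and their norms coincide:
\[ \|Du(g)\|^2_{T_g} = \|\mathbb{D}u(g)\|^2_{\mathbb{T}_g} = \int_0^1 \Big|\sum_{i=1}^m \partial_i U\Big(\int \vec\alpha(g_t)\,dt\Big)\cdot\alpha_i'(g(s))\Big|^2 ds. \]
(ii) For each $u \in \mathfrak{C}^1(\mathcal{G}_0)$ of the form $u(g) = U(\int \vec f(t)g(t)\,dt)$ with $U \in C^1(\mathbb{R}^m,\mathbb{R})$ and $\vec f = (f_1,\ldots,f_m) \in L^2([0,1],\mathbb{R}^m)$, the gradient
\[ \mathbb{D}u(g) = \sum_{i=1}^m \partial_i U\Big(\int \vec f(t)g(t)\,dt\Big)\cdot f_i(\cdot) \in L^2([0,1],\mathrm{Leb}) \]
exists and
\[ \|\mathbb{D}u(g)\|^2_{\mathbb{T}_g} = \int_0^1 \Big|\sum_{i=1}^m \partial_i U\Big(\int \vec f(t)g(t)\,dt\Big)\cdot f_i(s)\Big|^2 ds. \]

For $u \in \mathfrak{C}^1(\mathcal{G}_0)\cup\mathfrak{Z}^1(\mathcal{G}_0)$, the gradient $\mathbb{D}u$ can be regarded as a map $\mathcal{G}_0\times[0,1] \to \mathbb{R}$, $(g,t) \mapsto \mathbb{D}u(g)(t)$. More precisely,
\[ \mathbb{D} : \mathfrak{C}^1(\mathcal{G}_0)\cup\mathfrak{Z}^1(\mathcal{G}_0) \to L^2(\mathcal{G}_0\times[0,1],\mathbb{Q}_0\otimes\mathrm{Leb}). \]

Proposition 7.3. The operator $\mathbb{D} : \mathfrak{Z}^1(\mathcal{G}_0) \to L^2(\mathcal{G}_0\times[0,1],\mathbb{Q}_0\otimes\mathrm{Leb})$ is closable in $L^2(\mathcal{G}_0,\mathbb{Q}_0)$.

Proof. Let $W \in L^2(\mathcal{G}_0\times[0,1],\mathbb{Q}_0\otimes\mathrm{Leb})$ be of the form $W(g,t) = w(g)\cdot\varphi(g_t)$ with some $w \in \mathfrak{Z}^1(\mathcal{G}_0)$ and some $\varphi \in C^\infty([0,1])$ satisfying $\varphi(0) = \varphi(1) = 0$.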
Then according to the integration by parts formula for each u ∈ Z1(G0) with u(g) = U( ~α(gs)ds) G0×[0,1] Du ·W d(Q0 ⊗ Leb) = ~α(gs)ds)α i(gt)w(g)ϕ(gt)dtdQ0(g) Dϕu(g)w(g) dQ0(g) = u(g)D∗ϕw(g) dQ0(g). To prove the closability of D, consider a sequence (un)n in Z 1(G0) with un → 0 in L2(Q0) and Dun → V in L2(Q0 ⊗ Leb). Then V ·W d(Q0 ⊗ Leb) = lim Dun ·W d(Q0 ⊗ Leb) = lim ϕw dQ0 = 0 (7.1) for all W as above. The linear hull of the latter is dense in L2(Q0 ⊗ Leb). Hence, (7.1) implies V = 0 which proves the closability of D. The closure of (D,Z1(G0)) will be denoted by (D,Dom(D). Note that a priori it is not clear whether D coincides with D on C1(G0). (See, however, Theorem 7.8 below.) 7.2 The Dirichlet Form Definition 7.4. For u, v ∈ Z1(G0) ∪ C1(G0) we define the ’Wasserstein Dirichlet integral’ E(u, v) = 〈Du(g),Dv(g)〉L2 dQ0(g). (7.2) Theorem 7.5. (i) (E,Z1(G0)) is closable. Its closure (E,Dom(E)) is a regular, recurrent Dirichlet form on L2(G0,Q0). Dom(E) = Dom(D) and for all u, v ∈ Dom(D) E(u, v) = G0×[0,1] Du · Dv d(Q0 ⊗ Leb). (ii) The set Z∞0 (G0) of all cylinder functions u ∈ Z∞(G0) of the form u(g) = U( ~α(gs)ds) with U ∈ C∞(Rm,R) and ~α = (α1, . . . , αm) ∈ C∞([0, 1],Rm) satisfying α′i(0) = α′i(1) = 0 is a core for (E,Dom(E)). (iii) The generator (L,Dom(L) of (E,Dom(E)) is the Friedrichs extension of the operator (L,Z∞0 (G0) given by Lu(g) = − D∗αiui(g) i,j=1 ∂i∂jU ~α(gs)ds α′i(gs)α j(gs)ds + ~α(gs)ds · V β where ui(g) := ∂iU( ~α(gs)ds) and V (g) denotes the drift term defined in section 5.1 with ϕ = α′i; β > 0 is the parameter of the entropic measure fixed throughout the whole chapter. (iv) The Dirichlet form (E,Dom(E)) has a square field operator given by Γ(u, v) := 〈Du,Dv〉L2(Leb) ∈ L1(G0,Q0) with Dom(Γ) = Dom(E) ∩ L∞(G0,Q0). That is, for all u, v, w ∈ Dom(E) ∩ L∞(G0,Q0) w · Γ(u, v) dQ0 = E(u, vw) + E(uw, v) − E(uv,w). (7.3) Proof. (a) The closability of the form (E,Z1(G0)) follows immediately from the previous Propo- sition 7.3. 
Alternatively, we can deduce it from assertion (iii) which we are going to prove first. (b) Our first claim is that E(u,w) = − u · Lw dQ0 for all u,w ∈ Z∞0 (G0). Let u(g) = ~α(gs)ds) and w(g) = W ( ~γ(gs)ds) with U,W ∈ C∞(Rm,R) and ~α = (α1, . . . , αm), ~γ = (γ1, . . . , γm) ∈ C∞([0, 1],Rm) satisfying α′i(0) = α′i(1) = γ′i(0) = γ′i(1) = 0. Observe that 〈Du(g),Dw(g)〉L2 = i,j=1 ~α(gs)ds) · ∂jW ( ~γ(gs)ds) · α′i(gs)γ j(gs)ds ui(g) ·Dα′ w(g). Hence, according to the integration by parts formula from Proposition 5.10 E(u,w) = 〈Du(g),Dw(g)〉L2 dQ0(g) ui(g) ·Dα′ w(g) dQ0(g) ui(g) · w(g) dQ0(g) Lu(g) · w(g) dQ0(g). This proves our first claim. In particular, (L,Z∞0 (G0)) is a symmetric operator. Therefore, the form (E,Z∞0 (G0)) is closable and its generator coincides with the Friedrichs extension of L. (c) Now let us prove that Z∞0 (G0) is dense in Z1(G0). That is, let us prove that each function u ∈ Z1(G0) can be approximated by functions uǫ ∈ Z∞0 (G0). For simplicity, assume that u is of the form u(g) = U( α(gs)ds) with U ∈ C1(R) and α ∈ C1([0, 1]). (That is, for simplicity, m = 1.) Let Uǫ ∈ C∞(R) for ǫ > 0 be smooth approximations of U with ‖U − Uǫ‖∞ + ‖U ′ − U ′ǫ‖∞ → 0 as ǫ → 0 and let αǫ ∈ C∞(R) with α′ǫ(0) = α′ǫ(1) = 0 be smooth approximations of α with ‖α−αǫ‖∞ → 0 and α′ǫ(t) → α′(t) for all t ∈]0, 1[ as ǫ → 0. Moreover, assume that supǫ ‖α′‖∞ < Define uǫ ∈ Z∞0 (G0) as uǫ(g) = Uǫ( αǫ(gs)ds). Then uǫ → u in L2(G0,Q0) by dominated convergence relative Q0. Since U ′ǫ( αǫ(g(s))ds) [0,1] α′ǫ(gs) 2ds ≤ C, (α′ǫ) 2(g(s)) ǫ→0−→ α′(gs)2 ∀s ∈ [0, 1] \ {g = 0} ∩ {g = 1} [0, 1] \ {g = 0} ∩ {g = 1} =]0, 1[ for Q0-almost all g ∈ G0 one finds by dominated convergence in L2([0, 1],Leb), for Q0-almost all g ∈ G0 U ′ǫ( αǫ(gs)ds) [0,1] α′ǫ(gs) ǫ→0−→ α(gs)ds) [0,1] α′(gs) Hence also with E(uǫ, uǫ) = U ′ǫ( αǫ(gs)ds) α′ǫ(gs) 2dsQ0(dg) ǫ→0−→ α(gs)ds) α′(gs) 2dsQ0(dg) by dominated convergence in L2(G0,Q0). 
In particular, {uǫ}ǫ constitutes a Cauchy sequence relative to the norm ‖v‖2 E,1 := ‖v‖2L2(G,Q) + E(v, v). In fact, since the sequence uǫ is uniformly bounded w.r.t. to ‖.‖E,1, by weak compactness there is a weakly converging subsequence in (Dom(E), ‖.‖E,1). Since the associated norms converge, the convergence is actually strong in (Dom(E), ‖.‖E,1). Moreover, since uǫ → u in L2(G0,Q0), this limit is unique. Hence the entire sequence converges to u ∈ (Dom(E), ‖.‖E,1), such that in particular E(u, u) = limǫ→0E(uǫ, uǫ). This proves our second claim. In particular, it implies that also (E,Z1(G0)) is closable and that the closures of Z∞0 (G0) and Z1(G0) coincide. (d) Obviously, (E,Dom(E)) has the Markovian property. Hence, it is a Dirichlet form. Since the constant functions belong to Dom(E), the form is recurrent. Finally, the set Z1(G0) is dense in (C(G0), ‖.‖∞) according to the theorem of Stone-Weierstrass since it separates the points in the compact metric space G0. Hence, (E,Dom(E)) is regular. (e) According to Leibniz’ rule, (7.3) holds true for all u, v, w ∈ Z1(G0). Arbitrary u, v, w ∈ Dom(E)∩L∞(G0,Q0) can be approximated in (E(.) + ‖.‖2)1/2 by un, vn, wn ∈ Z1(G0) which are uniformly bounded on G0. Then unvn → uv, unwn → uw and vnwn → vw in (E(.) + ‖.‖2)1/2. Moreover, we may assume that wn → w Q0-a.e. on G0 and thus |wΓ(u, v) − wnΓ(un, vn)| dQ0 ≤ |w−wn|Γ(u, v)dQ0 + |wn| · |Γ(u, v)−Γ(un, vn)|dQ0 → 0 by dominated convergence. Hence, (7.3) carries over from Z1(G0) to Dom(E) ∩L∞(G0,Q0). Lemma 7.6. For each f ∈ G0 the function u : g 7→ 〈f, g〉L2 belongs to Dom(E). Proof. (a) For f, g ∈ G0 put µf = f∗Leb and µg = g∗Leb. Recall that by Kantorovich duality ‖f − g‖2L2 = d2W (µf , µg) = sup ϕdµf + = sup ϕ(ft)dt+ ψ(gt)dt where the supϕ,ψ is taken over all (smooth, bounded) ϕ ∈ L1([0, 1], µf ), ψ ∈ L1([0, 1], µg) satisfying ϕ(x) + ψ(y) ≤ 1 |x − y|2 for µf -a.e. x and µg-a.e. y in [0, 1]. Replacing ϕ(x) by |x|2/2 − ϕ(x) (and ψ(y) by . . .) 
this can be restated as 〈f, g〉L2 = inf ϕ(ft)dt+ ψ(gt)dt (7.4) where the infϕ,ψ now is taken over all (smooth, bounded) ϕ ∈ L1([0, 1], µf ), ψ ∈ L1([0, 1], µg) satisfying ϕ(x) + ψ(y) ≥ 〈x, y〉 for µf -a.e. x and µg-a.e. y in [0, 1]. If g is strictly increasing then ψ can be chosen as ψ′ = f ◦ g−1, cf. [Vil03], sect. 2.1 and 2.2. (b) Now fix a countable dense set {gn}n∈N of strictly increasing functions in G0 and an arbitrary function f ∈ G0. Let (ϕn, ψn) denote a minimizing pair for (f, gn) in (7.4) and define un : G0 → R un(g) := min i=1,...,n ϕ(fi(t))dt+ ψi(g(t))dt Note that ψ′i = f ◦ g i and thus un(gi) = 〈f, gi〉 for all i = 1, . . . , n. Therefore, |un(g)− un(g̃)| ≤ max |ψi(g(t))dt − ψi(g̃(t))|dt ≤ max ‖ψ′i‖∞ · |g(t)− g̃(t)|dt ≤ ‖g − g̃‖L1 for all g, g̃ ∈ G0. Hence, un → u pointwise on G0 and in L2(G0,Q0) where u(g) := 〈f, g〉. (c) The function un is in the class Z 0(G0): un(g) = Un ~α(gt)dt with Un(x1, . . . , xn) = min{c1 + x1, . . . , cn + xn}, ci = ϕi(f(t))dt and αi = ψi. The function Un can be easily approximated by C1 functions in order to verify that un ∈ Dom(E) and Dun(g) = 1Ai(g) · ψ′i(g(.)) with a suitable disjoint decomposition G0 = ∪iAi. (More precisely, Ai denotes the set of all g ∈ G0 satisfying ϕ(fi(t))dt + ψi(g(t))dt < ϕ(fj(t))dt + ψj(g(t))dt for all j < i and ϕ(fi(t))dt+ ψi(g(t))dt ≤ ϕ(fi(t))dt+ ψi(g(t))dt for all j > i.) Thus ‖Dun(g)‖2 = 1Ai(g) · ψ′i(g(t)) E(un) ≤ max ‖ψ′i ◦ g‖2L2dQ0(g). In particular, since |ψ′i| ≤ 1, E(un) ≤ 1 and thus u ∈ Dom(E). Lemma 7.7. For all u ∈ Z1(G0) and all w ∈ C1(G0) ∩Dom(E) E(u,w) = 〈Du(g),Dw(g)〉L2dQ0(g) (7.5) (with Du(g) and Dw(g) given explicitly as in Example 7.2). Proof. Recall that for u ∈ Z∞0 (G0) of the form u(g) = U( ~α(gt)dt) Lu(g) = ui(g) with ui(g) = ∂iU( ~α(gt)dt). Hence, for w ∈ C1(G0) of the form w(g) =W (〈~h, g〉) E(u,w) = − Lu(g)w(g) dQ0(g) ui(g)w(g) dQ0(g) = ui(g)Dα′ ui(g)w(g) dQ0(g) i,j=1 ~α(gt)dt) · ∂jW ( ~h(t)g(t)dt) · α′i(g(t))hj(t)dt dQ0(g) 〈Du(g),Dw(g)〉dQ0(g). 
This proves the claim provided u ∈ Z∞0 (G0). By density this extends to all u ∈ Z1(G0). Theorem 7.8. (i) (E,C1(G0)) is closable and its closure coincides with (E,Dom(E)). Similarly, (D,C1(G0)) is closable and its closure coincides with (D,Dom(D)). (ii) For all u,w ∈ Z1(G0) ∪ C1(G0) Γ(u,w)(g) = 〈Du(g),Dw(g)〉L2 , (7.6) in particular, E(u,w) = 〈Du(g),Dw(g)〉L2dQ0(g) (with Du(g) and Dw(g) given explicitly as in Example 7.2). (iii) For each f ∈ G0 the function uf : g 7→ ‖f − g‖L2 belongs to Dom(E) and Γ(uf , uf ) ≤ 1 Q0-a.e. on G0. (iv) (E,Dom(E)) is strongly local. Proof. (a) Claim: For each f ∈ L2([0, 1],Leb) the function uf : g 7→ 〈f, g〉L2 belongs to Dom(E) and E(uf , uf ) = ‖f‖2L2 . Indeed, if f ∈ L2 ∩ C1 then f = c0 + c1f1 + c2f2 with f1, f2 ∈ G0 and c0, c1, c2 ∈ R. Hence, uf ∈ Dom(E) according to Lemma 7.6 and E(uf , uf ) = ‖Duf‖2dQ0 = ‖f‖2 according to Lemma 7.7. Finally, each f ∈ L2 can be approximated by fn ∈ L2 ∩ C1 with ‖f − fn‖ → 0. Hence, uf ∈ Dom(E) and E(uf , uf ) = ‖f‖2. (b) Claim: C1(G0) ⊂ Dom(E). Let u ∈ C1(G0) be given with u(g) = U(〈~f , g〉), U ∈ C1(Rm,R), ~f = (f1, . . . , fm) ∈ L2([0, 1],Rm). For each i = 1, . . . ,m let (wi,n)n∈N be an approximating sequence in (Z 1(G0), (E + ‖.‖2)1/2) for wi : g 7→ 〈fi, g〉. Put un(g) = U(w1,n(g), . . . , wm,n(g)). Then un ∈ Z1(G0), un → u pointwise on G0 and in L2(G0,Q0). Moreover, E(un, un) = ∂iU(w1,n(g), . . . , wm,n(g))Dwi,n(g)‖2L2 dQ0(g) ∂iU(〈f1, g〉, . . . , 〈fm, g〉)Dwi(g)‖2L2 dQ0(g) ‖Du(g)‖2dQ0(g). Hence, u ∈ Dom(E) and E(u, u) = ‖Du(g)‖2dQ0(g). (c) Assertion (ii) then follows via polarization and bi-linearity. Assertion (iii) is an immediate consequence of assertion (ii). Assertion (iii) allows to prove the locality of the Dirichlet form (E,Dom(E)) in the same manner as in the proof of Theorem 6.2. (d) Claim: C1(G0) is dense in Dom(E). We have to prove that each u ∈ Z1(G0) can be approximated by un ∈ C1(G0). As usual, it suffices to treat the particular case u(g) = α(gt)dt for some α ∈ C1([0, 1]). 
Put Un(x1, . . . , xn) = i=1 α(xi) and fn,i(t) = n · 1[ i−1 (t). Then un(g) := Un(〈fn,1, g〉, . . . 〈fn,n, g〉) = defines a sequence in C1(G0) with un(g) → u(g) pointwise on G0 and in L2(G0,Q0). Moreover, Dun(g) = · 1[ i−1 [(.) (7.7) and therefore E(un) = dQ0(g) −→ α′(gt) 2dtdQ0(g) = E(u). (7.8) Thus (un)n is Cauchy in Dom(E) and un → u in Dom(E). 7.3 Rademacher Property and Intrinsic Metric We say that a function u : G0 → R is 1-Lipschitz if |u(g)− u(h)| ≤ ‖g − h‖L2 (∀g, h ∈ G0). Theorem 7.9. Every 1-Lipschitz function u on G0 belongs to Dom(E) and Γ(u, u) ≤ 1 Q0-a.e. on G0. Before proving the theorem in full generality, let us first consider the following particular case. Lemma 7.10. Given n ∈ N, let {h1, . . . , hn} be a orthonormal system in L2([0, 1],Leb) and let U be a 1-Lipschitz function on Rn. Then the function u(g) = U(〈h1, g〉, . . . , 〈hn, g〉) belongs to Dom(E) and Γ(u, u) ≤ 1 Q0-a.e. on G0. Proof. Let us first assume that in addition U is C1. Then according to Theorem 7.8, u is in Dom(E) and Du(g) = i=1 ∂iU(〈~h, g〉) · hi. Thus Γ(u, u)(g) = ‖Du(g)‖L2 = |∂iU(〈~h, g〉)|2 ≤ 1. In the case of a general 1-Lipschitz continuous U on Rn we choose an approximating sequence of 1-Lipschitz functions Uk, k ∈ N, in C1(Rn) with Uk → U uniformly on Rn and put uk(g) = Uk((〈~h, g〉) for g ∈ G0. Then uk → u pointwise and in L2(G0,Q0). Hence, u ∈ Dom(E) and Γ(u, u) ≤ 1 Q0-a.e. on G0. Proof of Theorem 7.9. Every 1-Lipschitz function u on G0 can be extended to a 1-Lipschitz function ũ on L2([0, 1],Leb) (’Kirszbraun extension’). Hence, without restriction, assume that u is a 1-Lipschitz function on L2([0, 1],Leb). Choose a complete orthonormal system {hi}i∈N of the separable Hilbert space L2([0, 1],Leb) and define for each n ∈ N the function Un : Rn → R Un(x1, . . . , xn) = u for x = (x1, . . . , xn) ∈ Rn. This function Un is 1-Lipschitz on Rn: |Un(x)− Un(y)| ≤ xihi − ≤ |x− y|. Hence, according to the previous Lemma the function un(g) = Un(〈h1, g〉, . . . 
, 〈hn, g〉) belongs to Dom(E) and Γ(un, un) ≤ 1 Q0-a.e. on G0. Note that
$$u_n(g) = u\Big(\sum_{i=1}^n \langle h_i, g\rangle\, h_i\Big)$$
for each g ∈ L2([0, 1],Leb). Therefore, un → u on L2([0, 1],Leb) since $\sum_{i=1}^n \langle h_i, g\rangle h_i \to g$ in L2([0, 1],Leb) and since u is continuous on L2([0, 1],Leb). Thus, finally, u ∈ Dom(E) and Γ(u, u) ≤ 1 Q0-a.e. on G0.

Our next goal is the converse to the previous Theorem.

Theorem 7.11. Every continuous function u ∈ Dom(E) with Γ(u, u) ≤ 1 Q0-a.e. on G0 is 1-Lipschitz on G0.

Lemma 7.12. For each u ∈ C1(G0) ∪ Z1(G0) and all g0, g1 ∈ G0
$$u(g_1) - u(g_0) = \int_0^1 \langle Du((1-t)g_0 + t g_1),\, g_1 - g_0\rangle_{L^2}\, dt. \qquad (7.9)$$

Proof. Put gt = (1− t)g0 + tg1 and consider the C1 function η : [0, 1] → R defined by ηt = u(gt). Then
$$\dot\eta_t = D_{g_1-g_0}u(g_t) = \langle Du(g_t),\, g_1 - g_0\rangle_{L^2}$$
and thus
$$\eta_1 - \eta_0 = \int_0^1 \dot\eta_t\, dt = \int_0^1 \langle Du(g_t),\, g_1 - g_0\rangle_{L^2}\, dt.$$

Lemma 7.13. Let g0, g1 ∈ G0 ∩ C3 and put gt = (1− t)g0 + tg1. Then for each u ∈ Dom(E) and each bounded measurable Ψ : G0 → R
$$\int_{G_0} [u(g_1\circ h) - u(g_0\circ h)]\,\Psi(h)\, dQ_0(h) = \int_0^1\!\!\int_{G_0} \langle Du(g_t\circ h),\, (g_1-g_0)\circ h\rangle_{L^2}\,\Psi(h)\, dQ_0(h)\, dt. \qquad (7.10)$$

Proof. Given g0, g1, Ψ and u ∈ Dom(E) as above, choose an approximating sequence (un) in Z1(G0) ∪ C1(G0) with un → u in Dom(E) as n → ∞. According to the previous Lemma, for each n
$$\int_{G_0} [u_n(g_1\circ h) - u_n(g_0\circ h)]\,\Psi(h)\, dQ_0(h) = \int_0^1\!\!\int_{G_0} \langle Du_n(g_t\circ h),\, (g_1-g_0)\circ h\rangle_{L^2}\,\Psi(h)\, dQ_0(h)\, dt. \qquad (7.11)$$
By assumption un → u in L2(G0,Q0) and Dun → Du in L2(G0 × [0, 1],Q0 ⊗ Leb) as n → ∞. Using the quasi-invariance of Q0 (Theorem 4.3) this implies
$$\int_{G_0} |u(g_t\circ h) - u_n(g_t\circ h)|\,\Psi(h)\, dQ_0(h) = \int_{G_0} |u(h) - u_n(h)|\,\Psi(g_t^{-1}\circ h)\cdot Y(h)\, dQ_0(h) \to 0$$
as n → ∞ as well as
$$\int_{G_0} \|Du(g_t\circ h) - Du_n(g_t\circ h)\|^2_{L^2}\,\Psi(h)\, dQ_0(h) = \int_{G_0} \|Du(h) - Du_n(h)\|^2_{L^2}\,\Psi(g_t^{-1}\circ h)\cdot Y(h)\, dQ_0(h) \to 0.$$
Hence, we may pass to the limit n → ∞ in (7.11) which yields the claim.

Proof of Theorem 7.11. Let a continuous u ∈ Dom(E) be given with Γ(u, u) ≤ 1 Q0-a.e. on G0. We want to prove that u(g1) − u(g0) ≤ ‖g1 − g0‖L2 for all g0, g1 ∈ G0. By density of G0 ∩ C3 in G0 and by continuity of u it suffices to prove the claim for g0, g1 ∈ G0 ∩ C3.
Choose a sequence of bounded measurable Ψk : G0 → R+ such that the probability measures Ψk dQ0 on G0 converge weakly to δe, the Dirac mass at the identity map e ∈ G0. Then, according to the previous Lemma and the assumption ‖Du‖ ≤ 1,
$$\begin{aligned}
\int_{G_0} [u(g_1\circ h) - u(g_0\circ h)]\,\Psi_k(h)\, dQ_0(h)
&= \int_0^1\!\!\int_{G_0} \langle Du(g_t\circ h),\, (g_1-g_0)\circ h\rangle_{L^2}\,\Psi_k(h)\, dQ_0(h)\, dt \\
&\le \int_0^1\!\!\int_{G_0} \|Du(g_t\circ h)\|_{L^2}\cdot \|(g_1-g_0)\circ h\|_{L^2}\cdot \Psi_k(h)\, dQ_0(h)\, dt \\
&\le \int_{G_0} \|(g_1-g_0)\circ h\|_{L^2}\cdot \Psi_k(h)\, dQ_0(h).
\end{aligned}$$
Now the integrands on both sides, h ↦ u(g1 ◦ h) − u(g0 ◦ h) as well as h ↦ ‖(g1 − g0) ◦ h‖L2, are continuous in h ∈ G0. Hence, as k → ∞, by weak convergence Ψk dQ0 → δe we obtain u(g1) − u(g0) ≤ ‖g1 − g0‖L2.

Corollary 7.14. The intrinsic metric for the Dirichlet form (E,Dom(E)) is the L2-metric:
$$\|g_1 - g_0\|_{L^2} = \sup\{u(g_1) - u(g_0) : u \in C(G_0)\cap \mathrm{Dom}(E),\ \Gamma(u,u) \le 1\ Q_0\text{-a.e. on } G_0\}$$
for all g0, g1 ∈ G0.

7.4 Finite Dimensional Noise Approximations

The goal of this section is to present representations – and finite dimensional approximations – of the Dirichlet form
$$E(u, v) = \int_{G_0} \langle Du(g), Dv(g)\rangle_{L^2}\, dQ_0(g)$$
in terms of globally defined vector fields. If (ϕi)i∈N is a complete orthonormal system in Tg = L2([0, 1], g∗Leb) for a given g ∈ G0 then obviously
$$\langle Du(g), Dv(g)\rangle_{L^2} = \sum_{i=1}^{\infty} D_{\varphi_i}u(g)\, D_{\varphi_i}v(g). \qquad (7.12)$$
Unfortunately, however, there exists no family (ϕi)i∈N which is simultaneously orthonormal in all Tg = L2([0, 1], g∗Leb), g ∈ G0. For a general family, the representation (7.12) should be replaced by
$$\langle Du(g), Dv(g)\rangle_{L^2} = \sum_{i,j=1}^{\infty} D_{\varphi_i}u(g)\cdot a_{ij}(g)\cdot D_{\varphi_j}v(g) \qquad (7.13)$$
where a(g) = (aij(g))i,j∈N is the ’generalized inverse’ to Φ(g) = (Φij(g))i,j∈N with
$$\Phi_{ij}(g) := \langle \varphi_i, \varphi_j\rangle_{T_g} = \int_0^1 \varphi_i(g_t)\,\varphi_j(g_t)\, dt.$$
In order to make these concepts rigorous, we have to introduce some notation. For fixed n ∈ N let S+(n) ⊂ Rn×n denote the set of symmetric nonnegative definite real (n×n)-matrices. For each A ∈ S+(n) a unique element A−1 ∈ S+(n), called generalized inverse to A, is defined by
$$A^{-1}x := \begin{cases} 0 & \text{if } x \in \mathrm{Ker}(A), \\ y & \text{if } x \in \mathrm{Ran}(A) \text{ with } x = Ay. \end{cases}$$
This definition makes sense since (by the symmetry of A) we have an orthogonal decomposition Rn = Ker(A) ⊕ Ran(A).
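The definition of the generalized inverse can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: it restricts to symmetric 2×2 matrices (so that the eigen-decomposition has a closed form), and the helper names `eigh2`, `spectral_map`, `generalized_inverse`, `inverse_sqrt` and `matmul` are hypothetical, not taken from the paper. It builds the generalized inverse by inverting A on Ran(A) and mapping Ker(A) to zero, and then checks on a rank-one example that A⁻¹·A is the projection onto Ran(A) and that the square of the inverse square root gives back A⁻¹.

```python
import math

def eigh2(A):
    """Eigenpairs (lam, unit vector) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = A
    if b == 0.0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        x, y = lam - d, b  # (x, y) solves (A - lam*I) v = 0 whenever b != 0
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs

def spectral_map(A, f, tol=1e-12):
    """Apply f to the positive eigenvalues of A and send Ker(A) to zero."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eigh2(A):
        if lam > tol:
            for i in range(2):
                for j in range(2):
                    out[i][j] += f(lam) * v[i] * v[j]
    return out

def generalized_inverse(A):
    """Generalized inverse: inverts A on Ran(A), vanishes on Ker(A)."""
    return spectral_map(A, lambda lam: 1.0 / lam)

def inverse_sqrt(A):
    """Generalized inverse of the nonnegative square root of A."""
    return spectral_map(A, lambda lam: 1.0 / math.sqrt(lam))

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 1.0], [1.0, 1.0]]      # rank one: Ker(A) is spanned by (1, -1)
A_inv = generalized_inverse(A)
projection = matmul(A_inv, A)     # projection onto Ran(A) = span{(1, 1)}
xi = inverse_sqrt(A)
```

For an invertible matrix the same functions return the ordinary inverse, since every eigenvalue then exceeds the tolerance.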
Obviously, A−1 ·A = A · A−1 = πA where πA denotes the projection onto Ran(A). Moreover, for each A ∈ S+(n) there exists a unique element A1/2 ∈ S+(n), called nonnegative square root of A, satisfying A1/2 · A1/2 = A. Let Ψ(n) denote the map A 7→ A−1, regarded as a map from S+(n) ⊂ Rn×n to Rn×n, with ij (A) = (A −1)ij for i, j = 1, . . . , n. Similarly, put Ξ(n) : S+(n) → S+(n), A 7→ (A1/2)−1 = (A−1)1/2. Note that Ψ(n)(A) = Ξ(n)(A) · Ξ(n)(A) for all A ∈ S+(n). The maps Ψ(n) and Ξ(n) are smooth on the subset of positive definite matrices A ∈ S+(n) but unfortunately not on the whole set S+(n). However, they can be approximated from below (in the sense of quadratic forms) by smooth maps: there exists a sequence of C∞ maps Ξ(n,l) : Rn×n → Rn×n with ξ · Ξ(n,k)(A) · ξ ≤ ξ · Ξ(n,l)(A) · ξ for all A ∈ S+(n), ξ ∈ Rn and all k, l ∈ N with k ≤ l and (n,l) ij (A) → Ξ ij (A) = (A −1/2)ij for all A ∈ S+(n), i, j ∈ {1, . . . , n} as l → ∞. Put Ψ(n,l)(A) = Ξ(n,l)(A) ·Ξ(n,l)(A) for A ∈ Rn×n. Then the sequence (Ψ(n,l))l∈N approximates Ψ (n) from below in the sense of quadratic forms. Now let us choose a family {ϕi}i∈N of smooth functions ϕi : [0, 1] → R which is total in C0([0, 1]) w.r.t. uniform convergence (i.e. its linear hull is dense). Put Φij(g) := 〈ϕi, ϕj〉Tg = ϕi(gx)ϕj(gx)dx (n,l) ij (g) = Ψ (n,l) ij (Φ(g)) , σ (n,l) ij (g) = Ξ (n,l) ij (Φ(g)). Note that the maps g 7→ a(n,l)ij (g) and g 7→ σ (n,l) ij (g) (for each choice of n, l, i, j) belong to the class Z∞(G0). Moreover, put ij (g) = Ψ ij (Φ(g)) . Then obviously the orthogonal projection πn onto the linear span of {ϕ1, . . . , ϕn} ⊂ Tg = L2([0, 1], g∗Leb) is given by πnu = i,j=1 ij (g) · 〈u, ϕi〉Tg · ϕj 〈πnu, πnv〉Tg = i,j=1 〈u, ϕi〉Tg · a ij (g) · 〈v, ϕj〉Tg for all u, v ∈ Tg. Theorem 7.15. (i) For each n, l ∈ N the form (E(n,l),Z1(G0)) with (n,l)(u, v) = i,j=1 Dϕiu(g) · a (n,l) ij (g) ·Dϕjv(g) dQ0(g) is closable. 
Its closure is a Dirichlet form with generator being the Friedrichs extension of the symmetric operator (L(n,l),Z2(G0)) given by (n,l) = i,j=1 (n,l) ij ·DϕiDϕj + i,j=1 (n,l) ij + a (n,l) ij · V Dϕj . (7.14) (ii) As l → ∞ (n,l) ր E(n) where (n)(u, v) = i,j=1 Dϕiu(g) · a ij (g) ·Dϕjv(g) dQ0(g). for u, v ∈ Z1(G0). Hence, in particular, E(n) is a Dirichlet form. (iii) As n→ ∞ (n) ր E (which provides an alternative proof for the closability of the form (E,Z1(G0))). Proof. (i) The function a (n,l) i,j on G0 is a cylinder function in the class Z1(G0). The integration by parts formula for the Dϕi , therefore, implies that for all u, v ∈ Z2(G0) (n,l)(u, v) = Dϕiu(g)Dϕjv(g)a (n,l) ij (g)dQ0(g) u(g) ·D∗ϕi (n,l) ij Dϕjv (g) dQ0(g) = − u(g) · L(n,l)v(g) dQ0(g). (n,l) = − i,j=1 (n,l) ij Dϕj Hence, (E(n,l),Z2(G0)) is closable and the generator of its closure is the Friedrichs extension of (L(n,l),Z2(G0)). (ii) The monotone convergence E(n,l) ր E(n) of the quadratic forms is an immediate consequence of the fact that a(n,l)(g) ր a(n)(g) (in the sense of symmetric matrices) for each g ∈ G0 which in turn follows from the defining properties of the approximations Ψ(n,l) of the generalized inverse Ψ(n). The limit of an increasing sequence of Dirichlet forms is itself again a Dirichlet form provided it is densely defined which in our case is guaranteed since it is finite on Z2(G0). (iii) Obviously, the En, n ∈ N constitute an increasing sequence of Dirichlet forms with En ≤ E for all n. Moreover, Z1(G0) is a core for all the forms under consideration. Hence, it suffices to prove that for each u ∈ Z1(G0) and each ǫ > 0 there exists an n ∈ N such that (n)(u, u)− E(u, u) To simplify notation, assume that u is of the form u(g) = U( α(gt)dt) for some U ∈ C1c (R) and some α ∈ C1([0, 1]). By assumption, the set {ϕi, i ∈ N} is total in C0([0, 1]) w.r.t. uniform convergence. Hence, for each δ > 0 there exist n ∈ N and ϕ ∈ span(ϕ1, . . . 
, ϕn) with ‖α′−ϕ‖sup ≤ δ which implies 〈α′, ϕ〉Tg ‖ϕ‖Tg ≥ ‖ϕ‖Tg − δ ≥ ‖α′‖Tg − 2δ. E(u, u) ≥ E(n)(u, u) ≥ α(gt)dt) 2 · 〈α′, ϕ〉2Tg · ‖ϕ‖2Tg dQ0(g) α(gt)dt) ‖α′‖Tg − 2δ dQ0(g) α(gt)dt) 1 + δ ‖α′‖2Tg − 4δ dQ0(g) 1 + δ E(u, u)− 4δ‖U ′‖2sup. Hence, for δ sufficiently small, E(u, u) and E(n)(u, u) are arbitrarily close to each other. Remark 7.16. For any given g0 ∈ G0, let (gt)t≥0 with gt : (x, ω) 7→ gxt (ω) be the solution to the dgxt = i,j=1 (n,l) ij (gt) · ϕj(g t ) dW i,j=1 (n,l) ij (gt) · ϕj(g t ) · ϕ′i(g t ) + V i,j=1 k,m=1 (n,l) ij (Φ(gt)) · 〈(ϕkϕm) ′, ϕi〉Tg · ϕj(gxt )dt where ∂kmΨ (n,l) ij for (k,m) ∈ {1, . . . , n}2 denotes the 1st order partial derivative of the function (n,l) ij : R n×n → R with respect to the coordinate xkm. Then the generator of the process coincides on Z2(G0) with the operator 12L (n,l) from (7.14), the generator of the Dirichlet form E(n,l). Let us briefly comment on the various terms in the SDE from above: • The first one, i,j=1 σ (n,l) ij (gt) · ϕj(gxt ) dW it is the diffusion term, written in Ito form; • the second one, 1 i,j=1 a (n,l) ij (gt) ·ϕj(gxt ) ·ϕ′i(gxt )dt is a drift which comes from the trans- formation between Stratonovich and Ito form (it would disappear if we wrote the diffusion term in Stratonovich form). • The next one, 1 i,j=1 a (n,l) ij (gt) ·ϕj(gxt ) · V ϕi(gt)dt is a drift which arises from our change of variable formula. Actually, since V βϕi(g) = β ϕ′i(g(y))dy + ϕ′i(g(a+)) + ϕ i(g(a−)) − ϕi(g(a+)) − ϕi(g(a−)) g(a+)− g(a−) it consists of two parts, one originates in the logarithmic derivative of the entropy of the g’s (which finally will force the process to evolve as a stochastic perturbation of the heat equation), the other one is created by the jumps of the g’s. • The last term, 1 i,j=1 k,m=1 ∂kmΨ (n,l) ij (Φ(gt)) · 〈(ϕkϕm)′, ϕi〉Tg · ϕj(gxt )dt involves the derivative of the diffusion matrix. It arises from the fact that the generator is originally given in divergence form. 
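The remark that the second drift term stems from the transformation between Stratonovich and Ito form can be illustrated on a toy example. The sketch below does not simulate the SDE of Remark 7.16; it assumes the scalar equation dX = X ∘ dW (Stratonovich), whose solution is X_t = X_0 exp(W_t) and whose Ito form is dX = X dW + ½ X dt. The function name `simulate`, the step count and the seed are choices made only for this illustration. An Euler-Maruyama scheme that includes the correction drift follows the exact solution closely, while the scheme without it drifts off by a factor of about exp(−T/2).

```python
import math
import random

def simulate(n_steps=20000, T=1.0, seed=0):
    """Euler-Maruyama for the toy SDE dX = X o dW, with and without the Ito drift."""
    rng = random.Random(seed)
    dt = T / n_steps
    x_ito = 1.0     # Ito form with correction drift: dX = X dW + 0.5 X dt
    x_naive = 1.0   # correction drift dropped: dX = X dW
    w = 0.0         # the driving Brownian path W_t
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_ito += x_ito * dw + 0.5 * x_ito * dt
        x_naive += x_naive * dw
        w += dw
    exact = math.exp(w)  # Stratonovich solution X_T = exp(W_T) for X_0 = 1
    return x_ito, x_naive, exact

x_ito, x_naive, exact = simulate()
print("with correction drift   :", x_ito / exact)
print("without correction drift:", x_naive / exact)
```

The first ratio should be close to 1 and the second close to exp(−1/2), which is the size of the drift that would disappear if the diffusion term were written in Stratonovich form.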
7.5 The Wasserstein Diffusion (µt) on P0 The objects considered previously – derivative, Dirichlet form and Markov process on G0 – have canonical counterparts on P0. The key to these objects is the bijective map χ : G0 → P0, g 7→ g∗Leb. We denote by Zk(P0) the set of all (’cylinder’) functions u : P0 → R which can be written as u(µ) = U α1dµ, . . . , (7.15) with some m ∈ N, some U ∈ Ck(Rm) and some ~α = (α1, . . . , αm) ∈ Ck([0, 1],Rm) . The subset of u ∈ Zk(P0) with α′i(0) = α′i(1) = 0 for all i = 1, . . . ,m will be denoted by Zk0(P0). For u ∈ Z1(P0) represented as above we define its gradient Du(µ) ∈ L2([0, 1], µ) by Du(µ) = ~αdµ) · α′i(.) with norm ‖Du(µ)‖L2(µ) = ~αdµ) · α′i The tangent space at a given point µ ∈ P0 can be identified with L2([0, 1], µ). The action of a tangent vector ϕ ∈ L2([0, 1], µ) on µ (’exponential map’) is given by the push forward ϕ∗µ. Theorem 7.17. (i) The image of the Dirichlet form defined in (7.2) under the map χ is the regular, strongly local, recurrent Wasserstein Dirichlet form E on L2(P0,P0) defined on its core Z1(P0) by E(u, v) = 〈Du(µ),Dv(µ)〉2L2(µ)dP0(µ). (7.16) The Dirichlet form has a square field operator, defined on Dom(E) ∩ L∞, and given on Z1(P0) Γ(u, v)(µ) = 〈Du(µ),Dv(µ)〉2L2(µ). The intrinsic metric for the Dirichlet form is the L2-Wasserstein distance dW . More precisely, a continuous function u : P0 → R is 1-Lipschitz w.r.t. the L2-Wasserstein distance if and only if it belongs to Dom(E) and Γ(u, u)(µ) ≤ 1 for P0-a.e. µ ∈ P0. (ii) The generator of the Dirichlet form is the Friedrichs extension of the symmetric operator (L,Z20(P0) on L2(P0,P0) given as L = L1 + L2 + β · L3 with L1u(µ) = i,j=1 ∂i∂jU( ~αdµ) · L2u(µ) = ~αdµ) · I∈gaps(µ) α′′i (I−) + α i (I+) i(I+)− α′i(I−) i (0) + α i (1) L3u(µ) = ~αdµ) · α′′i dµ. Recall that gaps(µ) denotes the set of intervals I = ]I−, I+[⊂ [0, 1] of maximal length with µ(I) = 0 and |I| denotes the length of such an interval. (iii) For P0-a.e. 
µ0 ∈ P0, the associated Markov process (µt)t≥0 on P0 starting in µ0, called Wasserstein diffusion, with generator 1 L is given as µt(ω) = gt(ω)∗Leb where (gt)t≥0 is the Markov process on G0 associated with the Dirichlet form of Theorem 7.5, starting in g0 := χ −1(µ0). For each u ∈ Z20(P0) the process u(µt)− u(µ0)− Lu(µs)ds is a martingale whenever the distribution of µ0 is chosen to be absolutely continuous w.r.t. the entropic measure P0. Its quadratic variation process is Γ(u, u)(µs)ds. Remark 7.18. L1 is the second order part (’diffusion part’) of the generator L, L2 and L3 are first order operators (’drift parts’). The operator L1 describes the diffusion on P0 in all directions of the respective tangent spaces. This means that the process (µt) at each time t ≥ 0 experiences the full ’tangential’ L2([0, 1], µt)-noise. L3 is the generator of the deterministic semigroup (’Neumann heat flow’) (Ht)t≥0 on L 2(P0,P0) given by Htu(µ) = u(htµ). Here ht is the heat kernel on [0, 1] with reflecting (’Neumann’) boundary conditions and htµ(.) = ht(., y)µ(dy). Indeed, for each u ∈ Z10(P0) given as u(g) = U( ~αdµ) we obtain Htu(µ) = ~α(x)ht(x, y)µ(dy)dx and thus ∂tHtu(µ) = ∂iU(htµ) · ∂t αi(x)ht(x, y)µ(dy)dx ∂iU(htµ) · αi(x)h t (x, y)µ(dy)dx ∂iU(htµ) · α′′i (x)ht(x, y)µ(dy)dx = L3Htu(µ). Note that L depends on β only via the drift term L3 and L → L3 as β → ∞. The following statement, which in the finite dimensional case is known as Varadhan’s formula, exhibits another close relationship between (µt) and the geometry of (P([0, 1]), dW ). The Gaus- sian short time asymptotics of the process (µt)t≥0 are governed by the L 2-Wasserstein distance. Corollary 7.19. For measurable sets A,B ∈ P0 with positive P0-measure, let dW (A,B) = inf{dW (ν, ν̃) | ν ∈ A, ν̃ ∈ B} and pt(A,B) = pt(ν, dν̃)P0(dν) where pt(ν, dν̃) denotes the transition semigroup for the process (µt)t≥0. t log pt(A,B) = − dW (A,B) . (7.17) Proof. This type of result is known as Varadhan’s formula. 
Its respective form for (E,Dom(E)) on L2(P0,P0) holds true by the very general results of [HR03] for conservative symmetric diffusions, and the identification of the intrinsic metric as dW in our previous Theorem.

Due to the sample path continuity of (µt) the Wasserstein diffusion is equivalently characterized by the following martingale problem. Here we use the notation $\langle\alpha,\mu_t\rangle = \int_0^1 \alpha(x)\,\mu_t(dx)$.

Corollary 7.20. For each α ∈ C2([0, 1]) with α′(0) = α′(1) = 0 the process Mt = 〈α, µt〉 − 〈α′′, µs〉ds I∈gaps(µs) α′′(I−) + α ′′(I+) ′(I+)− α′(I−) ′′(0) + α′′(1) is a continuous martingale with quadratic variation process
$$[M]_t = \int_0^t \langle(\alpha')^2,\mu_s\rangle\, ds.$$

Remark 7.21. For illustration one may compare Corollary 7.20 for (µt) in the case β = 1 to the respective martingale problems for four other well-known measure-valued processes, say on the real line, namely the so-called super-Brownian motion or Dawson-Watanabe process $(\mu^{DW}_t)$, the Fleming-Viot process $(\mu^{FV}_t)$, both of which we can consider with the Laplacian as drift, the Dobrushin-Doob process $(\mu^{DD}_t)$, which is the empirical measure of independent Brownian motions with locally finite Poissonian starting distribution, cf. [AKR98], and finally simply the empirical measure process of a single Brownian motion $(\mu^{BM}_t = \delta_{X_t})$. For each i ∈ {DW, FV, DD, BM} and sufficiently regular α : R → R the process
$$M^i_t := \langle\alpha,\mu^i_t\rangle - \frac{1}{2}\int_0^t \langle\alpha'',\mu^i_s\rangle\, ds$$
is a continuous martingale with quadratic variation process
$$[M^{DW}]_t = \int_0^t \langle\alpha^2,\mu^{DW}_s\rangle\, ds, \qquad [M^{FV}]_t = \int_0^t \big[\langle\alpha^2,\mu^{FV}_s\rangle - (\langle\alpha,\mu^{FV}_s\rangle)^2\big]\, ds,$$
$$[M^{DD}]_t = \int_0^t \langle(\alpha')^2,\mu^{DD}_s\rangle\, ds, \qquad [M^{BM}]_t = \int_0^t \langle(\alpha')^2,\mu^{BM}_s\rangle\, ds.$$
In view of Corollary 7.19 the apparent similarity of $\mu^{DD}$ and $\mu^{BM}$ to the Wasserstein diffusion µ is no surprise. However, the effective state spaces of $\mu^{DD}$, $\mu^{BM}$ and µt are as different as their invariant measures.

References
[AKR98] S. Albeverio, Yu. G. Kondratiev, and M. Röckner. Analysis and geometry on configuration spaces. J. Funct. Anal., 154(2):444–500, 1998.
[AM06] Hélène Airault and Paul Malliavin.
Quasi-invariance of Brownian measures on the group of circle homeomorphisms and infinite-dimensional Riemannian geometry. J. Funct. Anal., 241(1):99–142, 2006.
[AMT04] Hélène Airault, Paul Malliavin, and Anton Thalmaier. Canonical Brownian motion on the space of univalent functions and resolution of Beltrami equations by a continuity method along stochastic flows. J. Math. Pures Appl. (9), 83(8):955–1018, 2004.
[AR02] Hélène Airault and Jiagang Ren. Modulus of continuity of the canonic Brownian motion “on the group of diffeomorphisms of the circle”. J. Funct. Anal., 196(2):395–426, 2002.
[Ber99] Jean Bertoin. Subordinators: examples and applications. In Lectures on probability theory and statistics (Saint-Flour, 1997), volume 1717 of Lecture Notes in Math., pages 1–91. Springer, Berlin, 1999.
[Bre91] Yann Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math., 44(4):375–417, 1991.
[CEMS01] Dario Cordero-Erausquin, Robert J. McCann, and Michael Schmuckenschläger. A Riemannian interpolation inequality à la Borell, Brascamp and Lieb. Invent. Math., 146(2):219–257, 2001.
[Daw93] Donald A. Dawson. Measure-valued Markov processes. In École d’Été de Probabilités de Saint-Flour XXI—1991, volume 1541 of Lecture Notes in Math., pages 1–260. Springer, Berlin, 1993.
[DZ06] Arnaud Debussche and Lorenzo Zambotti. Conservative Stochastic Cahn-Hilliard equation with reflection. 2006. Preprint.
[ÉY04] Michel Émery and Marc Yor. A parallel between Brownian bridges and gamma bridges. Publ. Res. Inst. Math. Sci., 40(3):669–688, 2004.
[Fan02] Shizan Fang. Canonical Brownian motion on the diffeomorphism group of the circle. J. Funct. Anal., 196(1):162–179, 2002.
[Fan04] Shizan Fang. Solving stochastic differential equations on Homeo(S1). J. Funct. Anal., 216(1):22–46, 2004.
[FOT94] Masatoshi Fukushima, Yōichi Oshima, and Masayoshi Takeda. Dirichlet forms and symmetric Markov processes. Walter de Gruyter & Co., Berlin, 1994.
[Han02] Kenji Handa. Quasi-invariance and reversibility in the Fleming-Viot process. Probab. Theory Related Fields, 122(4):545–566, 2002.
[HR03] Masanori Hino and José A. Ramírez. Small-time Gaussian behavior of symmetric diffusion semigroups. Ann. Probab., 31(3):1254–1295, 2003.
[JKO98] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal., 29(1):1–17 (electronic), 1998.
[Kin93] J. F. C. Kingman. Poisson processes, volume 3 of Oxford Studies in Probability. The Clarendon Press, Oxford University Press, New York, 1993. Oxford Science Publications.
[Mal99] Paul Malliavin. The canonic diffusion above the diffeomorphism group of the circle. C. R. Acad. Sci. Paris Sér. I Math., 329(4):325–329, 1999.
[McC97] Robert J. McCann. A convexity principle for interacting gases. Adv. Math., 128(1):153–179, 1997.
[NP92] D. Nualart and É. Pardoux. White noise driven quasilinear SPDEs with reflection. Probab. Theory Related Fields, 93(1):77–89, 1992.
[Ott01] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.
[OV00] F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal., 173(2):361–400, 2000.
[RS80] Michael Reed and Barry Simon. Functional Analysis I. Academic Press, 1980.
[vRS05] Max-K. von Renesse and Karl-Theodor Sturm. Transport inequalities, gradient estimates, entropy, and Ricci curvature. Comm. Pure Appl. Math., 58(7):923–940, 2005.
[Sch97] Alexander Schied. Geometric aspects of Fleming-Viot and Dawson-Watanabe processes. Ann. Probab., 25(3):1160–1179, 1997.
[Sta03] Wilhelm Stannat. On transition semigroups of (A,Ψ)-superprocesses with immigration. Ann. Probab., 31(3):1377–1412, 2003.
[Stu06] Karl-Theodor Sturm. On the geometry of metric measure spaces. I. Acta Math., 196(1):65–131, 2006.
[TVY01] Natalia Tsilevich, Anatoly Vershik, and Marc Yor. An infinite-dimensional analogue of the Lebesgue measure and distinguished properties of the gamma process. J. Funct. Anal., 185(1):274–296, 2001. [Vil03] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003. Introduction Spaces of Probability Measures and Monotone Maps The Spaces P0=P([0,1]) and G0 The Spaces G, G1 and P=P(S1) Dirichlet Process and Entropic Measure Gibbsean Interpretation and Heuristic Derivation of the Entropic Measure The Measures Q and P The Measures Q0 and P0 The Dirichlet Process as Normalized Gamma Process Support Properties Scaling and Invariance Properties Dirichlet Processes on General Measurable Spaces The Change of Variable Formula for the Dirichlet Process and for the Entropic Measure Heuristic Approaches to Change of Variable Formulae The Change of Variables Formula on the Sphere The Change of Variables Formula on the Interval Proofs for the Sphere Case Proof for the Interval Case The Integration by Parts Formula The Drift Term Directional Derivatives Integration by Parts Formula on P(S1) Derivatives and Integration by Parts Formula on P([0,1]) Dirichlet Form and Stochastic Dynamics on on G The Dirichlet Form on G Finite Dimensional Noise Approximations Dirichlet Form and Stochastic Dynamics on G1 and P Dirichlet Form and Stochastic Dynamics on G0 and P0 The Canonical Dirichlet Form on the Wasserstein Space Tangent Spaces and Gradients The Dirichlet Form Rademacher Property and Intrinsic Metric Finite Dimensional Noise Approximations The Wasserstein Diffusion (t) on P0 ABSTRACT We construct a new random probability measure on the sphere and on the unit interval which in both cases has a Gibbs structure with the relative entropy functional as Hamiltonian. 
It satisfies a quasi-invariance formula with respect to the action of smooth diffeomorphisms of the sphere and the interval respectively. The associated integration by parts formula is used to construct two classes of diffusion processes on probability measures (on the sphere or the unit interval) by Dirichlet form methods. The first one is closely related to Malliavin's Brownian motion on the homeomorphism group. The second one is a probability valued stochastic perturbation of the heat flow, whose intrinsic metric is the quadratic Wasserstein distance. It may be regarded as the canonical diffusion process on the Wasserstein space. <|endoftext|><|startoftext|> Introduction

The eikonal approximation [1–3] has a long history of successful results in describing scattering processes like nucleon-nucleus scattering, heavy-ion collisions, and electroinduced nucleon-knockout reactions. The latter class of reactions, usually denoted as A(e, e′p), provides access to a wide range of nuclear phenomena like short- and long-range correlations, relativistic effects, the transition from hadronic to partonic degrees of freedom, and medium modifications of nucleon properties. The interpretation of A(e, e′p) data heavily relies on an accurate description of the effect of the final-state interactions (FSI), i.e., the interactions of the ejected proton with the residual nucleus such as rescattering and/or absorption. The eikonal approximation has been widely used to treat these distortions, either in combination with optical potentials [4–7], or with Glauber theory, its multiple-scattering extension [8–15].

Email address: Jan.Ryckebusch@UGent.be (J. Ryckebusch). Preprint submitted to Elsevier 6 August 2021 http://arxiv.org/abs/0704.0705v1

The eikonal scattering wave functions are derived by linearizing the continuum wave equation for the ejected proton.
Hence, the solution is only valid to first order in 1/k, with k the proton’s momentum, and the eikonal approximation is suited for the description of reactions at sufficiently high energies. To extend the applicability to lower energies, Wallace [16] has developed systematic corrections to the eikonal scattering amplitude. Several authors have investigated the effect of higher-order eikonal corrections in elastic nuclear scattering by protons, antiprotons, and α particles [17,18], heavy-ion collisions [19–22], and inclusive electron-nucleus scattering [23]. The aim of this Letter is to determine the influence of higher-order eikonal corrections on A(e, e′p) observables. For this purpose, we extend the relativistic optical model eikonal approximation (ROMEA) A(e, e′p) framework of Ref. [7]. Our formalism builds upon the work of Baker [24], where an eikonal approximation for potential scattering was derived to second order in 1/k. Here, this work is extended to include the effect of the spin-orbit potential.

The outline of this Letter is as follows. In Section 2, the second-order eikonal correction to the ROMEA model is derived. Section 3 presents the results of the A(e, e′p) numerical calculations. We look into how the second-order eikonal correction affects more inclusive quantities like the nuclear transparency, as well as truly exclusive observables such as the induced normal polarization $P_n$, the left-right asymmetry $A_{LT}$, and the differential cross section. Finally, in Section 4, we state our conclusions.

2 Formalism

For the description of the A(e, e′p) reaction, we adopt the impulse approximation (IA) and the independent-nucleon picture. Within this approach, the basic quantity to be computed is the transition matrix element [25]
$$\langle J^\mu\rangle = \int d\vec r\; \overline{\Psi}^{(-)}_{\vec k, m_s}(\vec r)\, \hat J^\mu(\vec r)\, e^{i\vec q\cdot\vec r}\, \phi_{\alpha_1}(\vec r). \qquad (1)$$
Here, $\phi_{\alpha_1}$ and $\Psi^{(-)}_{\vec k, m_s}$ are the relativistic bound-state and scattering wave functions, with $\alpha_1$ the quantum numbers of the struck proton and $\vec k$ and $m_s$ the momentum and spin of the ejected proton. The relativistic bound-state wave function is obtained in the Hartree approximation to the σ − ω model [26] with the W1 parametrization for the different field strengths [27]. The scattering wave function $\Psi^{(-)}_{\vec k, m_s}$ appears with incoming boundary conditions and is related to $\Psi^{(+)}_{\vec k, m_s}$ by time reversal. Furthermore, $\hat J^\mu$ is the relativistic one-body current operator. Throughout this Letter, we use the Coulomb gauge and the CC2 form of $\hat J^\mu$ [28].

We now turn our attention to the determination of the scattering wave function $\Psi^{(+)}_{\vec k, m_s}$. We start by considering the Dirac equation for a proton with relativistic energy $E = \sqrt{k^2 + M_N^2}$ and spin state $|\tfrac{1}{2} m_s\rangle$ subject to Lorentz scalar and vector potentials $V_s(r)$ and $V_v(r)$. The Dirac equation for the four-component spinor $\Psi^{(+)}_{\vec k, m_s}(\vec r)$ is converted to a Schrödinger-like equation for the upper component $u^{(+)}_{\vec k, m_s}(\vec r)$ [7, 29]
$$\left[\frac{\hat{\vec p}^{\,2}}{2M_N} + V_c(r) + V_{so}(r)\,(\vec\sigma\cdot\vec L - i\,\vec r\cdot\hat{\vec p})\right] u^{(+)}_{\vec k, m_s}(\vec r) = \frac{k^2}{2M_N}\, u^{(+)}_{\vec k, m_s}(\vec r). \qquad (2)$$
The central $V_c(r)$ and spin-orbit $V_{so}(r)$ potentials are defined in terms of the scalar and vector ones, $V_s(r)$ and $V_v(r)$. The lower component $w^{(+)}_{\vec k, m_s}(\vec r)$ is related to the upper one through
$$w^{(+)}_{\vec k, m_s}(\vec r) = \frac{1}{E + M_N + V_s(r) - V_v(r)}\; \vec\sigma\cdot\hat{\vec p}\; u^{(+)}_{\vec k, m_s}(\vec r). \qquad (3)$$
When solving Eq. (2) in the eikonal approximation, a standard procedure is to replace the momentum operator $\hat{\vec p}$ by the asymptotic momentum $\vec k$ in the spin-orbit ($V_{so}(r)\,\vec\sigma\cdot\vec L$) and Darwin ($V_{so}(r)\,(-i\,\vec r\cdot\hat{\vec p})$) terms, as well as in the lower component (3). In the literature, this is usually referred to as the effective momentum approximation (EMA) [30]. For the upper component, one puts forward a solution of the form
$$u^{(+)}_{\vec k, m_s}(\vec r) \equiv N\, \eta(\vec r)\, e^{i\vec k\cdot\vec r}\, \chi_{\frac{1}{2} m_s}, \qquad (4)$$
i.e., a plane wave modulated by an eikonal factor $\eta(\vec r)$. Here, N is a normalization factor.
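To give a feeling for how an eikonal factor modulates the plane wave, the following sketch evaluates the textbook first-order eikonal phase exp(−i (M_N/k) ∫ V dz′), integrated along the proton direction from −∞ to z, for a toy Gaussian potential. Everything specific here is an assumption made for illustration only: the Gaussian shape and its parameters, the use of ħ = c = 1 with ħc ≈ 197.327 MeV fm for unit conversion, the neglect of spin-orbit and Darwin terms, and the function names `gaussian_potential` and `eikonal_factor`. It is not the optical potential nor the second-order factor derived in this Letter.

```python
import cmath
import math

HBARC = 197.327   # MeV fm, converts MeV * fm into a dimensionless phase
M_N = 938.272     # proton mass in MeV

def gaussian_potential(b, z, v0, a):
    """Toy central potential V(b, z) = v0 * exp(-(b^2 + z^2) / a^2); v0 in MeV, may be complex."""
    return v0 * math.exp(-(b * b + z * z) / (a * a))

def eikonal_factor(b, z, k, v0, a, n=4000):
    """First-order eikonal factor exp(-i (M_N/k) * int_{-inf}^{z} V(b, z') dz').

    b, z, a are in fm, k in MeV/c; the integral uses the trapezoidal rule.
    """
    z_min = -10.0 * a  # the Gaussian tail below this point is negligible
    h = (z - z_min) / n
    total = 0.5 * (gaussian_potential(b, z_min, v0, a) + gaussian_potential(b, z, v0, a))
    for i in range(1, n):
        total += gaussian_potential(b, z_min + i * h, v0, a)
    integral = total * h  # MeV fm
    return cmath.exp(-1j * (M_N / k) * integral / HBARC)

# A real potential only shifts the phase; an absorptive (negative imaginary) part damps the wave.
eta_real = eikonal_factor(1.0, 2.0, 1000.0, -30.0, 2.0)
eta_absorptive = eikonal_factor(1.0, 2.0, 1000.0, complex(-30.0, -10.0), 2.0)
print(abs(eta_real), abs(eta_absorptive))
```

In this toy setting the modulus of the factor stays at one for a real potential and drops below one once an absorptive imaginary part is present, which is the mechanism by which optical potentials account for loss of flux.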
In the ROMEA approach [7, 29], which adopts the first-order eikonal approx- imation, Eq. (2) is linearized in ~̂p leading to a solution for the eikonal factor of the form ηROMEA(~r) = ηROMEA(~b, z) = exp dz ′ Vopt(~b, z  , (5) where ~r ≡ (~b, z), the z axis lies along the momentum ~k of the proton, and Vopt(~b, z) = Vc(~b, z) + Vso(~b, z) (~σ · ~b × ~k − ikz). Despite the fact that it is written as an exponential phase, the solution (5) is only valid up to first order in Vopt/k. In what follows, we will derive an expression for the eikonal factor η(~r) that is valid up to order Vopt/k 2. The momentum dependence in the spin-orbit and Darwin terms makes that these terms are retained up to order Vso/k, while central terms are included up to order Vc/k 2. Note that the expansion is not expressed in terms of the Lorentz scalar and vector potentials Vs and Vv. Looking for a solution of the form (4) for the Schrödinger-like equation (2), Baker arrived at the following equation for the eikonal factor (see Eq. (14) of Ref. [24]): η(~b, z) = 1− i dz ′ Vopt(~b, z ′) η(~b, z ′) + Vopt(~b, z) η(~b, z) dz ′ (z − z ′) Vopt(~b, z ′) η(~b, z ′) . (6) Note that, apart from dropping contributions of order Vopt/k 3 and higher, no additional assumptions were made when deriving Eq. (6). In Ref. [24], Eq. (6) was subsequently solved for spherically symmetric potentials. The spin-orbit and Darwin terms, however, break the spherical symmetry and a novel method to solve Eq. (6) is needed. To that purpose, we assume that the derivative of the function η is of higher order in 1/k than η itself (as is true for the ROMEA solution (5)). This allows us to drop the ∂η/∂b contribution in the last term of Eq. (6), as it is of order Vopt/k 3 or higher: dz ′ (z − z ′) Vopt(~b, z ′) η(~b, z ′) dz ′ (z − z ′) Vc(~b, z ′) + Vso(~b, z ′) (~σ ·~b× ~k − ikz ′) η(~b, z ′) . (7) Spherical symmetry implies that z ′ ∂Vc(~b, z ′)/∂b = b ∂Vc(~b, z ′)/∂z ′. Hence, the z ′ ∂Vc/∂b term in Eq. 
(7) can be written as dz ′ b ∂Vc(~b, z η(~b, z ′) dz ′ b Vc(~b, z ′) η(~b, z ′) 2 + b Vc(~b, z) η(~b, z) . (8) In the first step, we made use of the fact that the derivative ∂η/∂z ′ is of higher order to turn the integrand into an exact differential. A similar reasoning, followed by integration by parts, leads to dz ′ (z − z ′) ∂Vso(~b, z (−ikz ′) η(~b, z ′) 2 + b Vso(~b, z) η(~b, z ′) , (9) for the Darwin term of Eq. (7). Inserting the expressions of Eqs. (8) and (9), Eq. (6) adopts the form η(~b, z) = dz ′ Vopt(~b, z ′) η(~b, z ′)− 1 + b Vc(~b, z) η(~b, z) 1 + b ∂Vc(~b, z η(~b, z ′) Vso(~b, z) (~σ ·~b× ~k − ikz) η(~b, z) 1 + b dz ′ (z − z ′) Vso(~b, z ′)~σ ·~b× ~k η(~b, z ′) 2 + b Vso(~b, z) η(~b, z ′) . (10) We look for a solution of the form η(~b, z) = f(~b, z) exp dz ′ Vopt(~b, z ′) f(~b, z ′) = f(~b, z) exp i S(~b, z) , (11) which should reduce to the ROMEA result of Eq. (5) when terms of higher order than Vopt/k are neglected. Accordingly, the function f(~b, z) should be of the form f = 1+O(Vopt/k 2). Substituting (11) into Eq. (10) and multiplying by e−i S( ~b,z) yields f(~b, z) = 1− 1 + b Vc(~b, z) f(~b, z) 1 + b ∂Vc(~b, z f(~b, z ′) Vso(~b, z) (~σ ·~b× ~k − ikz) f(~b, z) 1 + b dz ′ (z − z ′) Vso(~b, z ′)~σ ·~b× ~k f(~b, z ′) 2 + b Vso(~b, z) f(~b, z ′) . (12) In deriving this equation, we set ei S( ~b,z ′) e−i S( ~b,z) equal to 1, since higher-order terms are neglected. The difficulty in solving for f(~b, z) is that Eq. (12) is an integral equation. An expression for f(~b, z) can, however, be readily obtained by adding (1−f) terms, which introduce only higher-order terms, to the right- hand side of Eq. (12). This is permitted since we seek for a solution up to order Vopt/k 2. With this manipulation, the function f becomes f(~b, z) = 1− 1 + b Vc(~b, z) + 1 + b ∂Vc(~b, z Vso(~b, z) (~σ ·~b× ~k − ikz) 1 + b dz ′ (z − z ′) Vso(~b, z ′)~σ ·~b× ~k 2 + b Vso(~b, z) . (13) The eikonal factor of Eq. 
(11) with $f(\vec{b}, z)$ given by (13), is a solution of the integral equation (6) to order $V_{opt}/k^2$ and reduces to the ROMEA result (5) when truncated at order $V_{opt}/k$. Furthermore, it can easily be verified that the derivative of η is of higher order in $V_{opt}/k$ than η itself. Henceforth, calculations performed with the eikonal factor of Eqs. (11) and (13) are referred to as the second-order relativistic optical model eikonal approximation (SOROMEA).

3 Results

One way to quantify the overall effect of FSI in A(e, e′p) processes is via the nuclear transparency. The measurements are commonly performed under quasielastic conditions [31–36]. We obtain the theoretical transparencies by adopting similar expressions and cuts as in the experiments. Hence, the nuclear transparency is defined as [37]

$$T = \frac{\sum_\alpha \int_{\Delta^3 p_m} d\vec{p}_m\, S^{\alpha}(\vec{p}_m, E_m, \vec{k})}{c_A \sum_\alpha \int_{\Delta^3 p_m} d\vec{p}_m\, S^{\alpha}_{PWIA}(\vec{p}_m, E_m)} . \qquad (14)$$

Here, $S^\alpha$ is the reduced cross section for knockout from the shell α,

$$S^{\alpha}(\vec{p}_m, E_m, \vec{k}) = \frac{\dfrac{d^5\sigma^{\alpha}}{d\Omega_p\, d\epsilon'\, d\Omega_{\epsilon'}}(e, e'p)}{K\, \sigma_{ep}} , \qquad (15)$$

where $\vec{p}_m$ and $E_m$ are the missing momentum and energy, K is a kinematical factor and $\sigma_{ep}$ is the off-shell electron-proton cross section. $S^{\alpha}_{PWIA}$ is the reduced cross section within the plane-wave impulse approximation (PWIA) in the nonrelativistic limit. Further, $\sum_\alpha$ extends over all occupied shells α in the target nucleus. The phase-space volume in the missing momentum $\Delta^3 p_m$ is defined by the cut $|p_m| \leq 300$ MeV/c. The A-dependent factor $c_A$ corrects in a phenomenological way for the effect of short-range correlations. We introduce $c_A$ in the denominator of Eq. (14) because the data have undergone a rescaling with $c_A = 0.9$ ($^{12}$C) and 0.82 ($^{56}$Fe).

Transparencies have been computed for the nuclei $^{12}$C and $^{56}$Fe at planar and constant $(\vec{q}, \omega)$ kinematics compatible with the phase space covered in the experiments. For the optical potential, the EDAD1 parametrization of Ref. [38] was used. In Fig. 1 the ROMEA and SOROMEA results are displayed as a function of the four-momentum transfer $Q^2$ and compared to the data.
Not surprisingly, at high $Q^2$, the ROMEA and SOROMEA predictions practically coincide and the role of the second-order eikonal effects grows with decreasing $Q^2$. At $Q^2 = 1.7$ (GeV/c)$^2$, the ROMEA and SOROMEA transparencies agree to within 1%, while at $Q^2 = 0.3$ (GeV/c)$^2$, the difference has risen to 3% for $^{56}$Fe and 5% for $^{12}$C. The enhancement of the nuclear transparency due to the second-order eikonal corrections is modest, even for values of the four-momentum transfer as low as $Q^2 = 0.2$ (GeV/c)$^2$. Both the ROMEA and the SOROMEA predictions tend to slightly underestimate the measurements. The second-order corrections move the predictions somewhat closer to the $Q^2 = 0.34$ (GeV/c)$^2$ data point.

As the nuclear transparency involves integrations over missing momenta and energies, it may hide subtleties in the theoretical treatment of the FSI mechanisms. Next, we focus on highly exclusive A(e, e′p) quantities and quantify the role of second-order eikonal effects. An observable that is particularly well suited to study FSI effects is the induced normal polarization

$$P_n = \frac{d^5\sigma(\sigma_n = \uparrow) - d^5\sigma(\sigma_n = \downarrow)}{d^5\sigma(\sigma_n = \uparrow) + d^5\sigma(\sigma_n = \downarrow)} , \qquad (16)$$

where $\sigma_n$ denotes the spin orientation of the ejectile in the direction orthogonal to the reaction plane. Indeed, in the one-photon exchange approximation, $P_n$ vanishes in the absence of FSI.

Fig. 2 shows the missing momentum dependence of the induced normal polarization for the kinematics of Ref. [39], corresponding to $Q^2 \approx 0.5$ (GeV/c)$^2$. The calculations are performed with the energy-dependent A-independent (EDAI) potential of Ref. [38]. The ROMEA results are in line with the relativistic distorted-wave impulse approximation (RDWIA) calculations of Ref. [40]. The RDWIA framework was implemented by the Madrid-Sevilla group [41] and relies on a partial-wave expansion of the exact scattering wave function. It is similar to the (SO)ROMEA approach in that both models compute the effect of the FSI with the aid of proton-nucleus optical potentials.
Further, the overall agreement with the data is excellent. The second-order eikonal corrections are most pronounced for the 1s1/2 level. For missing momenta $p_m > 125$ MeV/c, they reduce the magnitude of $P_n$ for the 1s1/2 state by roughly 20%, thereby resulting in a marginally better agreement with the highest $p_m$ data point. For 1p3/2 knockout, on the other hand, the effect of the second-order eikonal corrections is smaller than 5%.

The inclusion of the second-order eikonal effects is particularly visible at high missing momentum, a region where other mechanisms also become important. The qualitative behavior of the meson-exchange and ∆-isobar currents, for instance, is alike [42]. At low missing momenta ($p_m \leq 200$ MeV/c), the induced normal polarization $P_n$ is relatively insensitive to the two-body currents, whereas at higher missing momenta, sizable contributions from the meson-exchange and isobar currents are predicted. The influence of the meson and isobar degrees of freedom is also stronger for knockout from the 1s1/2 shell than for 1p3/2 knockout.

Fig. 2 also shows calculations that neglect the spin-orbit part $V_{so}(\vec{b}, z)\, \vec{\sigma}\cdot\vec{b}\times\vec{k}$. They illustrate that the spin-orbit distortion is the largest source of $P_n$. Hence, a correct inclusion of this term is essential. Moreover, $P_n$ proves to be rather sensitive to the choice of optical potential [40].

Another A(e, e′p) observable which has been the subject of many investigations is the left-right asymmetry

$$A_{LT} = \frac{d^5\sigma(\phi = 0^\circ) - d^5\sigma(\phi = 180^\circ)}{d^5\sigma(\phi = 0^\circ) + d^5\sigma(\phi = 180^\circ)} . \qquad (17)$$

The subscript LT indicates that this quantity is closely related to the longitudinal-transverse response function. Fig. 3 presents the $A_{LT}$ predictions for the removal of 1p-shell protons in $^{16}$O in the kinematics of Refs. [43, 44]. The FSI shift the dip in $A_{LT}$, which is located at $p_m \approx 400$ MeV/c in the relativistic PWIA (RPWIA), to lower values of the missing momentum. This shift is essential to describe the data at $p_m \approx 350$ MeV/c.
The exact $p_m$ location and height of the ripple, however, are affected by many ingredients of the calculations, such as the current operator, bound-state wave function, and parametrization of the optical potential [44]. As can be inferred from Fig. 3, the second-order eikonal corrections affect the height, but not the position, of the ripple. We also show the results of our SOROMEA calculations within the so-called noSV approximation. In this approximation, the dynamical enhancement of the lower component of the scattering wave (3) due to the $V_s(r) - V_v(r)$ term is omitted. As such, the SOROMEA-noSV calculations make the same set of assumptions as the EMAf-noSV predictions by the Madrid-Sevilla group. The EMAf-noSV approach is an RDWIA calculation which adopts the EMA in combination with the noSV approximation. The second-order eikonal corrections clearly increase the height of the oscillation in $A_{LT}$ and bring the eikonal noSV calculations into excellent agreement with the corresponding partial-wave prediction EMAf-noSV. Finally, the comparison between the SOROMEA and the SOROMEA-noSV calculations demonstrates that the dynamical enhancement plays a significant role in the description of the $A_{LT}$ data.

In Fig. 4, $^{16}$O(e, e′p) cross-section results are displayed for the kinematics of Fig. 3. The spectroscopic factors, which normalize the calculations to the data, were determined by performing a $\chi^2$ fit to the data and are summarized in Table 1. The RDWIA spectroscopic factors are 5–10% higher than the (SO)ROMEA ones. The second-order eikonal corrections hardly affect the values of the extracted spectroscopic factors. Both our (SO)ROMEA calculations and the RDWIA predictions of the Madrid-Sevilla group represent the data very well over the entire $p_m$ range. For missing momenta $|p_m| \leq 250$ MeV/c, the (SO)ROMEA and RDWIA results are in excellent agreement.
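The $\chi^2$ normalization mentioned above can be illustrated with a minimal sketch. It assumes a single multiplicative factor per shell and uncorrelated Gaussian uncertainties, neither of which is spelled out in the text. The function name `fit_spectroscopic_factor` and all numbers are hypothetical; they are not the data of Ref. [43].

```python
def fit_spectroscopic_factor(data, errors, theory):
    """Scale S that minimises chi2 = sum(((d - S*t) / e)**2).

    Assumption: uncorrelated Gaussian errors. Setting d(chi2)/dS = 0 gives
    S = sum(d*t/e^2) / sum(t^2/e^2).
    """
    num = sum(d * t / e**2 for d, t, e in zip(data, theory, errors))
    den = sum(t * t / e**2 for t, e in zip(theory, errors))
    scale = num / den
    chi2 = sum(((d - scale * t) / e) ** 2
               for d, t, e in zip(data, theory, errors))
    return scale, chi2


# Hypothetical unnormalised model cross sections and pseudo-data
# (arbitrary units), one entry per missing-momentum bin.
theory = [4.0, 3.0, 1.5, 0.6, 0.2]
data = [3.3, 2.5, 1.2, 0.52, 0.15]
errors = [0.2, 0.15, 0.1, 0.05, 0.03]

scale, chi2 = fit_spectroscopic_factor(data, errors, theory)
print(f"S = {scale:.3f}, chi2 = {chi2:.2f}")
```

Because the model is linear in the single parameter S, the minimum is available in closed form and no iterative minimiser is needed.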
The impact of the second-order eikonal corrections on the computed differential cross sections is almost negligible for $p_m$ below the Fermi momentum, but can be as large as 30% at high $p_m$. The inclusion of the second-order effects improves the agreement with the RDWIA calculations at these high missing momenta. Results for the effective response functions $R_L$, $R_T$, $R_{LT}$, and $R_{TT}$ are not shown, but the effect of the second-order eikonal corrections is similar to the effect on the differential cross section.

          RPWIA   ROMEA   SOROMEA   RDWIA
1p3/2     0.55    0.84    0.83      0.92
1p1/2     0.47    0.75    0.74      0.78

Table 1. The spectroscopic factors for the $^{16}$O(e, e′p) reaction of Ref. [43], as obtained with a $\chi^2$ procedure.

4 Conclusions

We have developed a formalism to account for second-order corrections in the eikonal approximation. Our model is relativistic and includes both the central and spin-orbit parts of the optical potentials. The formalism has been applied to A(e, e′p) processes. Our numerical calculations show that the effect of the second-order eikonal corrections on A(e, e′p) observables is rather limited for $Q^2 \geq 0.2$ (GeV/c)$^2$. The nuclear transparency calculations confirm the expected energy dependence of the eikonal corrections: the effect decreases with increasing $Q^2$. Concerning the $p_m$ dependence of the A(e, e′p) observables, the effect of the second-order eikonal corrections is minor except at high missing momenta. In this high-$p_m$ region, the eikonal corrections change the observables by up to roughly 30%, thereby bringing the calculations closer to the data and/or the RDWIA calculations. The robustness of the first-order eikonal approximation, which emerges from this study, can be invoked to explain the success of the Glauber approach to A(e, e′p) down to relatively low kinetic energies of 200 MeV.

Acknowledgements

This work was supported by the Fund for Scientific Research, Flanders (FWO).

References

[1] G.P. McCauley, G.E. Brown, Proc. Phys. Soc. London 71 (1958) 893.
[2] R.J. Glauber, in: W.E. Brittin, et al. (Eds.), Lectures in Theoretical Physics, Interscience, New York, 1959.
[3] C.J. Joachain, Quantum Collision Theory (Elsevier, Amsterdam, 1975).
[4] W.R. Greenberg, G.A. Miller, Phys. Rev. C 49 (1994) 2747.
[5] A. Bianconi, M. Radici, Phys. Lett. B 363 (1995) 24.
[6] H. Ito, S.E. Koonin, R. Seki, Phys. Rev. C 56 (1997) 3231.
[7] D. Debruyne, J. Ryckebusch, W. Van Nespen, S. Janssen, Phys. Rev. C 62 (2000) 024611.
[8] L.L. Frankfurt, E. Moniz, M. Sargsyan, M.I. Strikman, Phys. Rev. C 51 (1995) 3435.
[9] N.N. Nikolaev, A. Szczurek, J. Speth, J. Wambach, B.G. Zakharov, V.R. Zoller, Nucl. Phys. A 582 (1995) 665.
[10] S. Jeschonnek, T.W. Donnelly, Phys. Rev. C 59 (1999) 2676.
[11] A. Kohama, K. Yazaki, R. Seki, Nucl. Phys. A 662 (2000) 175.
[12] O. Benhar, N. Nikolaev, J. Speth, A. Usmani, B. Zakharov, Nucl. Phys. A 673 (2000) 241.
[13] C. Ciofi degli Atti, L.P. Kaptari, D. Treleani, Phys. Rev. C 63 (2001) 044601.
[14] M. Petraki, E. Mavrommatis, O. Benhar, J.W. Clark, A. Fabrocini, S. Fantoni, Phys. Rev. C 67 (2003) 014605.
[15] J. Ryckebusch, D. Debruyne, P. Lava, S. Janssen, B. Van Overmeire, T. Van Cauteren, Nucl. Phys. A 728 (2003) 226.
[16] S.J. Wallace, Phys. Rev. Lett. 27 (1971) 622; S.J. Wallace, Ann. Phys. (N.Y.) 78 (1973) 190; S.J. Wallace, Phys. Rev. D 8 (1973) 1846; S.J. Wallace, J.A. McNeil, Phys. Rev. D 16 (1977) 3565; S.J. Wallace, Phys. Rev. C 29 (1984) 956.
[17] D. Waxman, C. Wilkin, J.-F. Germond, R.J. Lombard, Phys. Rev. C 24 (1981)
[18] G. Fäldt, A. Ingemarsson, J. Mahalanabis, Phys. Rev. C 46 (1992) 1974.
[19] F. Carstoiu, R.J. Lombard, Phys. Rev. C 48 (1993) 830.
[20] M.H. Cha, Y.J. Kim, Phys. Rev. C 51 (1995) 212.
[21] J.S. Al-Khalili, J.A. Tostevin, J.M. Brooke, Phys. Rev. C 55 (1997) R1018.
[22] C.E. Aguiar, F. Zardi, A. Vitturi, Phys. Rev. C 56 (1997) 1511.
[23] J.A. Tjon, S.J. Wallace, Phys. Rev. C 74 (2006) 064602.
[24] A. Baker, Phys. Rev. D 6 (1972) 3462.
[25] J.J. Kelly, Adv. Nucl. Phys.
23 (1996) 75.
[26] B.D. Serot, J.D. Walecka, Adv. Nucl. Phys. 16 (1986) 1.
[27] R.J. Furnstahl, B.D. Serot, H.-B. Tang, Nucl. Phys. A 615 (1997) 441.
[28] T. de Forest, Nucl. Phys. A 392 (1983) 232.
[29] R.D. Amado, J. Piekarewicz, D.A. Sparrow, J.A. McNeil, Phys. Rev. C 28 (1983) 1663.
[30] J.J. Kelly, Phys. Rev. C 60 (1999) 044609.
[31] G. Garino, et al., Phys. Rev. C 45 (1992) 780.
[32] T.G. O’Neill, et al., Phys. Lett. B 351 (1995) 87.
[33] N.C.R. Makins, et al., Phys. Rev. Lett. 72 (1994) 1986.
[34] D. Abbott, et al., Phys. Rev. Lett. 80 (1998) 5072.
[35] D. Dutta, et al., Phys. Rev. C 68 (2003) 064603.
[36] D. Rohe, et al., Phys. Rev. C 72 (2005) 054602.
[37] P. Lava, M.C. Martínez, J. Ryckebusch, J.A. Caballero, J.M. Udías, Phys. Lett. B 595 (2004) 177.
[38] E.D. Cooper, S. Hama, B.C. Clark, R.L. Mercer, Phys. Rev. C 47 (1993) 297.
[39] R.J. Woo, et al., Phys. Rev. Lett. 80 (1998) 456.
[40] J.M. Udías, J.R. Vignote, Phys. Rev. C 62 (2000) 034302.
[41] J.M. Udías, P. Sarriguren, E. Moya de Guerra, E. Garrido, J.A. Caballero, Phys. Rev. C 48 (1993) 2731.
[42] J. Ryckebusch, D. Debruyne, W. Van Nespen, S. Janssen, Phys. Rev. C 60 (1999) 034604.
[43] J. Gao, et al., Phys. Rev. Lett. 84 (2000) 3265.
[44] K.G. Fissum, et al., Phys. Rev. C 70 (2004) 034606.

[Figure 1: curves ROMEA and SOROMEA; horizontal axis $Q^2$ (GeV$^2$/c$^2$).]

Fig. 1. Nuclear transparencies versus $Q^2$ for A(e, e′p) reactions in quasielastic kinematics. The SOROMEA (dashed lines) are compared to the ROMEA (solid lines) results. The EDAD1 potential [38] has been employed in both formalisms. Data points are from Refs. [31] (open squares), [32, 33] (open triangles), [34, 35] (solid triangles), and [36] (open diamonds).

[Figure 2: panels 1p3/2 and 1s1/2; curves ROMEA, SOROMEA, ROMEA-noSO, SOROMEA-noSO; horizontal axis $p_m$ (MeV/c).]

Fig. 2. Induced normal polarization $P_n$ for proton knockout from the 1p3/2 (upper panel) and 1s1/2 (lower panel) shell in the $^{12}$C$(e, e'\vec{p})$ reaction.
The kinematics is determined by beam energy $\epsilon = 579$ MeV, momentum transfer $q = 760$ MeV/c, energy transfer $\omega = 292$ MeV, and azimuthal angle $\phi = 180^\circ$. The solid (dashed) curves represent ROMEA (SOROMEA) calculations. The dot-dashed (dotted) curves refer to predictions obtained within the ROMEA (SOROMEA) frameworks, with the spin-orbit term $V_{so}(\vec{b}, z)\, \vec{\sigma}\cdot\vec{b}\times\vec{k}$ turned off. The data are from Ref. [39].

[Figure 3: panels 1p1/2 and 1p3/2; curves ROMEA, SOROMEA, SOROMEA-noSV, EMAf-noSV, RPWIA; horizontal axis $p_m$ (MeV/c).]

Fig. 3. The left-right asymmetry $A_{LT}$ for the $^{16}$O(e, e′p) experiment of [43]. The kinematics was $\epsilon = 2.442$ GeV, $q = 1$ GeV/c, and $\omega = 445$ MeV (i.e., $Q^2 = 0.8$ (GeV/c)$^2$). The red solid (green dashed) lines show the results of the ROMEA (SOROMEA) calculations. The SOROMEA-noSV (orange long-dotted curves) calculations differ from the SOROMEA calculations in that the dynamical enhancement of the lower component of the scattering wave function is neglected. The cyan short-dotted curves present the results from an RDWIA calculation where the spinor distortions in the scattered wave are neglected. All calculations use the EDAI version for the optical potentials [38]. The black short-dotted curves represent the RPWIA results. The data points are from Ref. [43].

[Figure 4: panels 1p3/2 and 1p1/2; curves ROMEA, SOROMEA, RDWIA, RPWIA; horizontal axis $p_m$ (MeV/c).]

Fig. 4. $^{16}$O(e, e′p) cross sections compared to ROMEA, SOROMEA, RDWIA, and RPWIA calculations for the constant $(\vec{q}, \omega)$ kinematics of Fig. 3. The calculations use the optical potential EDAI [38]. The data are from Ref. [43] and the RDWIA results from Ref. [44]. The following convention is adopted: positive (negative) $p_m$ corresponds to $\phi = 180^\circ$ ($\phi = 0^\circ$).

ABSTRACT

The first-order eikonal approximation is frequently adopted in interpreting the results of $A(e,e'p)$ measurements. Glauber calculations, for example, typically adopt the first-order eikonal approximation.
We present an extension of the relativistic eikonal approach to $A(e,e'p)$ which accounts for second-order eikonal corrections. The numerical calculations are performed within the relativistic optical model eikonal approximation. The nuclear transparency results indicate that the effect of the second-order eikonal corrections is rather modest, even at $Q^{2} \approx 0.2$ (GeV/c)$^2$. The same applies to polarization observables, left-right asymmetries, and differential cross sections at low missing momenta. At high missing momenta, however, the second-order eikonal corrections are significant and bring the calculations into closer agreement with the data and/or the exact results from models adopting partial-wave expansions.

<|endoftext|><|startoftext|>

Flory-Huggins theory for the solubility of heterogeneously-modified polymers

Patrick B. Warren
Unilever R&D Port Sunlight, Bebington, Wirral, CH63 3JW, UK.
(Dated: April 5, 2007)

Many water soluble polymers are chemically modified versions of insoluble base materials such as cellulose. A Flory-Huggins model is solved to determine the effects of heterogeneity in modification on the solubility of such polymers. It is found that heterogeneity leads to decreased solubility, with the effect increasing with increasing blockiness. In the limit of extreme blockiness, the nature of the phase coexistence crosses over to a polymer-polymer demixing transition. Some consequences are discussed for the synthesis of partially modified polymers, and the experimental characterisation of such systems.

Many water-soluble polymers are made by chemically modifying insoluble base materials such as starches and gums; for example, a wide class of water-soluble polymers is obtained from cellulose [1, 2]. It is often possible to vary the degree of modification of the base polymer to obtain water soluble polymers with, in principle, continuously variable properties.
A basic characteristic of these polymers is their solubility, but given the essentially stochastic nature of the chemical modification step, what is the effect of heterogeneity in modification on the solubility of the resulting materials?

In the present paper, this question is approached from a theoretical point of view by setting up a Flory-Huggins model for the phase behaviour of a polymer-solvent mixture [3], where the polymers have a random degree of modification. In this approach, the issue of solubility is translated into the problem of determining the phase coexistence between a dissolved aqueous phase and an undissolved (water-poor) phase. The solubility is then formally given by the polymer concentration in the aqueous phase. Determination of the full phase behaviour for a multicomponent Flory-Huggins theory is an onerous task though, and a simpler approach is to examine the spinodal stability of the system, which can be taken to be representative of the full phase behaviour. This is the approach taken in the present paper. It is arguably more insightful than a full calculation of the phase behaviour since closed-form analytic expressions can be obtained for the spinodal stability limit. The approach taken is similar to models for the phase behaviour of random block copolymer melts which have been developed in the past [4, 5, 6]. There has been rather little work though on random copolymers which also include a solvent, apart from a brief example described by Sollich et al. [7].

In the present model, it is supposed that the system comprises a large number of species of polymers i with differing degrees of modification $0 < \alpha_i < 1$ and concentrations $\rho_i$. For simplicity, length polydispersity is neglected, and all the polymers are assumed to have the same number N of segments.
The system is then described by the following (mean field) Flory-Huggins free energy density,

$$\sum_i \rho_i \log \rho_i + (1-\phi)\log(1-\phi) + \chi(\phi-\eta)(1-\phi), \qquad (1)$$

where φ is the total polymer segment concentration and η is the concentration of chemically modified segments, given respectively by $\phi = N\sum_i \rho_i$ and $\eta = N\sum_i \rho_i\alpha_i$. The first term in Eq. (1) is the ideal free energy of mixing. The second term is the usual Flory-Huggins configurational chain entropy. The third term is the free energy cost of the unmodified polymer segments at a concentration $\phi - \eta$ coming into contact with solvent (water) at a concentration $1 - \phi$. Typically one expects $\chi > 1/2$ for this interaction, to represent the repulsion between unmodified segments and water which leads to phase separation of unmodified polymers. To keep the model simple, this is the only χ-parameter that is retained in the problem.

Eq. (1) has the structure of a moment free energy, since the excess free energy, comprising the second and third terms, only depends on φ and η which are moment densities. Such a system can be analysed using the methods developed by Sollich and coworkers [7, 8, 9, 10]. In particular, Ref. [10] describes how the spinodal stability conditions for systems with an excess free energy can be expressed in terms of moment densities, generalising various truncation theorems obtained by earlier workers [11, 12]. I now summarise the relevant results, translated into terms suitable for the present problem. Let us consider such a system with a free energy $\sum_i \rho_i \log \rho_i + f^{(ex)}(\phi^{(1)} \dots \phi^{(n)})$, where the excess free energy depends on moment densities of the form $\phi^{(r)} = \sum_i \rho_i w_i^{(r)}$ ($r = 1 \dots n$), with the $w_i^{(r)}$ being species-dependent weights. The fundamental idea is that the moment densities can be treated as effective species concentrations.
In particular, it can be proved that spinodal stability corresponds to the positive-definiteness of the matrix $\mathbf{M}$ of second partial derivatives of the free energy with respect to the moment densities. In Ref. [10] it is shown that $\mathbf{M} = \mathbf{M}_{id} + \mathbf{M}_{ex}$ where $(\mathbf{M}_{id}^{-1})_{rs} = \sum_i \rho_i w_i^{(r)} w_i^{(s)}$ and $(\mathbf{M}_{ex})_{rs} = \partial^2 f^{(ex)}/\partial\phi^{(r)}\partial\phi^{(s)}$. The limit of spinodal stability is given by $\det \mathbf{M} = 0$. This condition usually corresponds to the vanishing of a single eigenvalue of $\mathbf{M}$, with an eigenvector $\Delta\phi^{(s)}$ that satisfies $\sum_s (\mathbf{M})_{rs}\Delta\phi^{(s)} = 0$. It is shown in Ref. [10] that the spinodal instability direction in the space of species concentrations is given by $\Delta\rho_i = \sum_{rs} \rho_i w_i^{(r)} (\mathbf{M}_{id})_{rs} \Delta\phi^{(s)}$.

For the present problem, there are two moment densities φ and η, defined respectively with $w_i^{(1)} = N$ (a constant) and $w_i^{(2)} = N\alpha_i$ (the number of modified groups on the ith species). Application of the above theory to Eq. (1) leads to

$$\mathbf{M}_{id}^{-1} = N^2 \begin{pmatrix} \sum_i \rho_i & \sum_i \rho_i\alpha_i \\ \sum_i \rho_i\alpha_i & \sum_i \rho_i\alpha_i^2 \end{pmatrix}, \qquad (2)$$

$$\mathbf{M}_{ex} = \begin{pmatrix} (1-\phi)^{-1} - 2\chi & \chi \\ \chi & 0 \end{pmatrix}. \qquad (3)$$

After some algebra the condition $\det \mathbf{M} = 0$ reduces to

$$\frac{1}{N\phi} + \frac{1}{1-\phi} - 2\chi(1-\langle\alpha\rangle) - \chi^2 N\phi\,(\langle\alpha^2\rangle - \langle\alpha\rangle^2) = 0, \qquad (4)$$

where

$$\langle\alpha\rangle = \sum_i \rho_i\alpha_i \Big/ \sum_i \rho_i, \qquad \langle\alpha^2\rangle = \sum_i \rho_i\alpha_i^2 \Big/ \sum_i \rho_i. \qquad (5)$$

I emphasise that, despite being remarkably simple, Eq. (4) is exact.

http://arxiv.org/abs/0704.0707v1

One already reaches a significant conclusion from this. The first three terms in Eq. (4) are what one would expect from standard Flory-Huggins theory [3], with an effective χ-parameter given by the product of the original χ-parameter and the fraction $1 - \langle\alpha\rangle$ of unmodified segments. These terms therefore take account of the mean degree of modification. The final term in Eq. (4) is a correction due to the heterogeneity. Since the variance $\langle\alpha^2\rangle - \langle\alpha\rangle^2$ is positive, this term is always negative. The effect is that heterogeneity in modification reduces the solubility, over and above what would be expected from the mean degree of modification.

To make further progress, it is convenient to specify a model for the distribution of the $\alpha_i$.
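As a numerical cross-check of the reduction from the determinant condition to Eq. (4), the following sketch builds the two matrices of Eqs. (2) and (3) for an arbitrary discrete mixture. It relies on the identity $\det\mathbf{M} = \det\mathbf{M}_{id}\,\det(\mathbf{1} + \mathbf{M}_{id}^{-1}\mathbf{M}_{ex})$, so that only the inverse ideal matrix of Eq. (2) is needed, and on the observation that the second determinant equals $N\phi$ times the left-hand side of Eq. (4). The function names and the sample concentrations are illustrative assumptions, not part of the original analysis.

```python
import math


def moments(rho, alpha, N):
    """Return phi, <alpha> and <alpha^2> for species concentrations rho."""
    s0 = sum(rho)
    s1 = sum(r * a for r, a in zip(rho, alpha))
    s2 = sum(r * a * a for r, a in zip(rho, alpha))
    return N * s0, s1 / s0, s2 / s0


def determinant_condition(rho, alpha, N, chi):
    """det(1 + M_id^{-1} M_ex) with the matrices of Eqs. (2) and (3)."""
    phi, mean, mean2 = moments(rho, alpha, N)
    # M_id^{-1} = N * phi * [[1, <a>], [<a>, <a^2>]], since N^2 * sum(rho) = N * phi
    a11, a12, a22 = N * phi, N * phi * mean, N * phi * mean2
    # M_ex, Eq. (3)
    e11, e12, e22 = 1.0 / (1.0 - phi) - 2.0 * chi, chi, 0.0
    b11 = 1.0 + a11 * e11 + a12 * e12
    b12 = a11 * e12 + a12 * e22
    b21 = a12 * e11 + a22 * e12
    b22 = 1.0 + a12 * e12 + a22 * e22
    return b11 * b22 - b12 * b21


def reduced_condition(rho, alpha, N, chi):
    """Left-hand side of Eq. (4)."""
    phi, mean, mean2 = moments(rho, alpha, N)
    return (1.0 / (N * phi) + 1.0 / (1.0 - phi)
            - 2.0 * chi * (1.0 - mean)
            - chi**2 * N * phi * (mean2 - mean**2))


# Hypothetical three-species mixture (values chosen only for illustration).
rho = [1e-3, 2e-3, 5e-4]
alpha = [0.2, 0.5, 0.9]
N, chi = 100, 0.7
phi, _, _ = moments(rho, alpha, N)
print(determinant_condition(rho, alpha, N, chi),
      N * phi * reduced_condition(rho, alpha, N, chi))
```

The two printed numbers should agree to rounding error for any choice of concentrations with $0 < \phi < 1$, which is a direct way to see that Eq. (4) involves no approximation.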
In particular, such a model can be used to examine the effect of blockiness in modification, which is expected to play an important role. In previous work on random block copolymers [4, 5], a Markov model was used to characterise the correlations between different kinds of segments. Whilst such a model may be appropriate for the stochastic nature of the synthetic route for such random block copolymers, as discussed below it is probably not appropriate in the present case. I therefore consider instead a very simple model for the heterogeneity in which the modified segments occur in blocks of size M, where $1 < M < N$. In this model, it is supposed that each block has an equal probability p of being modified, and there are no further correlations. Then, for any particular species, $\alpha_i = (1/N)\sum_{j=1}^{N/M} M\epsilon_{ij}$ where j labels the blocks, and $\epsilon_{ij}$ is zero or one with probability $1 - p$ and p respectively. Thus the $\alpha_i$ are drawn from a scaled binomial distribution, with

$$\langle\alpha\rangle = p, \qquad \langle\alpha^2\rangle - \langle\alpha\rangle^2 = (M/N)\, p(1-p). \qquad (6)$$

Eq. (4) becomes

$$\frac{1}{N\phi} + \frac{1}{1-\phi} - 2\chi(1-p) - \chi^2 M\phi\, p(1-p) = 0. \qquad (7)$$

This is a quadratic equation for χ and the appropriate root is

$$\chi = \frac{1}{M\phi p}\left[\left\{1 + \frac{M\phi p}{1-p}\left(\frac{1}{N\phi} + \frac{1}{1-\phi}\right)\right\}^{1/2} - 1\right]. \qquad (8)$$

I now examine the consequences of this result.

The formal limit $M \to 0$ corresponds to a vanishing variance and a completely uniform distribution of modified segments, as though each monomer has undergone an identical fractional modification by a fraction p, rather than being modified or not with probability p and $1 - p$. As noted already above, this limit corresponds to simple Flory-Huggins theory with an effective χ-parameter equal to $\chi(1-p)$. For large N, this indicates the absence of phase separation for $\chi(1-p) < 1/2$ or $p > 1 - 1/(2\chi)$.

Now let us consider Eq. (8) for block size $M = 1$. In this case, individual segments are modified randomly with no correlations. For $M = 1$ and large N in Eq. (8), there are two behaviours depending on the value of p. For $p < 4/5$, there is an absence of phase separation for $\chi(1-p) < 1/2$, just as for the $M \to 0$ limit.
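The block model and the resulting spinodal can be sketched numerically as follows. The sketch samples the $\alpha_i$ of the block model to check the moments of Eq. (6), and evaluates the positive root of the quadratic in Eq. (7). The function names are illustrative, the sampler assumes that M divides N exactly, and treating $M = 0$ as the uniform limit is a convention of this sketch.

```python
import math
import random


def sample_alpha(p, M, N, rng):
    """Degree of modification of one chain: N/M blocks, each modified with
    probability p (assumes M divides N)."""
    blocks = N // M
    modified = sum(1 for _ in range(blocks) if rng.random() < p)
    return modified * M / N


def spinodal_lhs(chi, phi, p, M, N):
    """Left-hand side of Eq. (7); zero on the spinodal."""
    return (1.0 / (N * phi) + 1.0 / (1.0 - phi)
            - 2.0 * chi * (1.0 - p)
            - chi**2 * M * phi * p * (1.0 - p))


def spinodal_chi(phi, p, M, N):
    """Positive root of Eq. (7) in chi; M = 0 is used for the uniform limit."""
    g = 1.0 / (N * phi) + 1.0 / (1.0 - phi)
    x = M * phi * p
    if x == 0.0:
        # No heterogeneity: plain Flory-Huggins with chi_eff = chi * (1 - p).
        return g / (2.0 * (1.0 - p))
    return (math.sqrt(1.0 + x * g / (1.0 - p)) - 1.0) / x


rng = random.Random(0)
p, M, N = 0.3, 10, 1000
alphas = [sample_alpha(p, M, N, rng) for _ in range(5000)]
mean = sum(alphas) / len(alphas)
var = sum((a - mean) ** 2 for a in alphas) / len(alphas)
print("sampled mean, variance:", mean, var)
print("Eq. (6) prediction    :", p, (M / N) * p * (1.0 - p))

chi = spinodal_chi(0.2, 0.5, 10, 1000)
print("chi on the spinodal:", chi,
      "residual of Eq. (7):", spinodal_lhs(chi, 0.2, 0.5, 10, 1000))
```

Writing the root with the square root minus one in the numerator keeps the $M \to 0$ limit visible: expanding the square root to first order recovers the uniform result.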
For $4/5 < p < 1$, the behaviour is more complicated. To be precise, the location of the minimum value of the $\chi(\phi)$ spinodal shifts from $\phi_{min} \sim N^{-1/2}$ for $p < 4/5$ to a non-vanishing $0 < \phi_{min} < 1$ for $p > 4/5$ (it is the examination of Eq. (8) in the limit $\phi \sim N^{-1/2}$ that gives the crossover point $p = 4/5$). The change in behaviour can be seen for the $M = 1$ curves (dashed lines) in Fig. 1 and is shown explicitly in the upper plot of Fig. 2.

Let us next consider the limit of extreme blockiness $M = N$. This limit is strikingly different from the $M = 1$ case. For large N and $p > 0$, one can show that there is an absence of phase separation only for $\chi\sqrt{Np(1-p)} < 2$. In the large N limit, this inequality is always violated, indicating that the system always has a tendency to undergo phase separation in the limit of extreme blockiness. Since the unmodified polymer system itself only phase separates for $\chi > 1/2$, this suggests that the phase separation has the nature of a polymer-polymer demixing transition rather than a solvent-driven phase separation. This insight is confirmed by analysis of the spinodal instability direction below.

For large N and general M in Eq. (8), one would expect that the above two cases represent the two classes of behaviour. In the first case $M \ll N$ and the behaviour is similar to the $M = 1$ limit where individual segments are randomly modified. In the second case, $M \propto N$ and the behaviour is similar to the $M = N$ limit of extreme blockiness. Fig. 1 shows typical spinodal curves calculated from Eq. (8) for various values of p and M. The location of the minimum $(\phi_{min}, \chi_{min})$ of the spinodal curves can be numerically determined, and Fig. 2 shows how this depends on p. The results show firstly that for $M \ll N$, increasing p leads to increasing solubility as the value of χ required to reach the spinodal instability is increased.
Moreover, a decrease in solubility between a uniform model ($M \to 0$) with no heterogeneity, and a model with fine-grained blockiness ($M = 1$), is apparent. The major effect arises as $M \to N$ though, where the tendency for phase separation is greatly enhanced.

[Figure 1: three panels, p = 0.1, p = 0.5 and p = 0.9.]

FIG. 1: Spinodal curves calculated from Eq. (8) for polymers of length $N = 10^3$, for three values of the mean degree of modification p, and for block sizes $M \to 0$ (uniform limit, solid line), $M = 1$ (dashed line), $M = 10$ (dash-dot line), $M = 100$ (dash-dot-dot line) and $M = 10^3$ (dash-dash-dot line). The system is spinodally unstable above the indicated curves. Note the change in shape of the $M = 1$ curves: for p = 0.1 and 0.5 the minimum is at $\phi \to 0$, whereas for p = 0.9 the minimum is at $\phi \approx 0.25$.

[Figure 2: two panels.]

FIG. 2: The location of the numerically determined minimum of the spinodal curves from Eq. (8) is plotted as a function of p, for polymers of length $N = 10^3$ and block sizes $M \to 0$ (uniform limit, solid line), $M = 1$ (dashed line), $M = 10$ (dash-dot line), $M = 100$ (dash-dot-dot line) and $M = 10^3$ (dash-dash-dot line). For $M = 1$ (dashed line) the upper plot shows clearly that $\phi_{min} \approx N^{-1/2} \approx 0.03$ only holds for $p < 4/5 = 0.8$.

The above analysis is augmented by considering the spinodal instability direction associated with the spinodal stability limit, which can provide a useful mechanistic insight. As explained above, the spinodal instability direction is characterised by the eigenvector that corresponds to the vanishing eigenvalue responsible for the vanishing spinodal determinant. For the present problem, from Eqs.
(2)–(3), one finds the instability direction is characterised by

\[ \frac{\Delta\eta}{\Delta\phi} = \langle\alpha\rangle - \chi N \phi\,\bigl(\langle\alpha^2\rangle - \langle\alpha\rangle^2\bigr). \tag{9} \]

The corresponding spinodal instability direction in the space of species concentrations is

\[ \frac{\Delta\rho_i}{\rho_i} = \frac{\langle\alpha^2\rangle\Delta\phi - \langle\alpha\rangle\Delta\eta + \alpha_i(\Delta\eta - \langle\alpha\rangle\Delta\phi)}{\phi\,(\langle\alpha^2\rangle - \langle\alpha\rangle^2)} = \frac{\Delta\phi}{\phi}\,\bigl[1 + \chi N\phi\,(\langle\alpha\rangle - \alpha_i)\bigr], \]

where the second line follows by inserting the result for the ratio Δη/Δφ. These results should be evaluated on the spinodal. They are all exact, for an arbitrary distribution of α_i. For the instability direction to lie along a pure dilution line, one should have Δρ_i/ρ_i independent of species i. One can conclude that this only happens if Δη/Δφ = ⟨α⟩, in other words if the variance ⟨α²⟩ − ⟨α⟩² vanishes. In such a case, the phase transition is purely associative, or solvent-driven, meaning that the compositions of the coexisting phases remain the same (Δρ_i/Δφ = ρ_i/φ).

If one specialises to the model of blockiness described above by inserting the value of χ corresponding to the spinodal stability limit, the instability direction becomes … )}^{1/2}. This confirms that the spinodal instability lies along a dilution line (Δη/Δφ = ⟨α⟩ = p) only in the limit M → 0, which formally corresponds to a vanishing variance. For M = 1 (and M ≪ N in general) the phase transition has a mixed character. The interesting case occurs when M = N (or M ∝ N in general), for which Δη/Δφ ∼ (−)N^{1/2} in the limit of large N. One can write this as Δφ/Δη → 0 as N → ∞. This shows that the phase transition tends towards being purely segregative, meaning that the overall polymer concentration in coexisting phases remains the same (Δφ = 0). This confirms the suggestion above, that in the limit of extreme blockiness, the system tends towards a segregative polymer-polymer demixing transition.

Let us now try to draw some conclusions. The main effect of randomness is to reduce the solubility of partially-modified polymers beyond what would be expected from the mean degree of modification.
The extent to which this occurs depends on the blockiness in substitution. For fine-grained blockiness, the phase behaviour is expected to be similar to a system for which there is no randomness, albeit with a somewhat reduced solubility. For coarse-grained blockiness, where the block size is comparable to the polymer length, the nature of the phase transition changes to a polymer-polymer demixing transition. In this situation, one expects that the modified polymers (being almost fully modified) will partition into the aqueous phase, leaving the unmodified polymers behind.

The reason for considering the two extreme kinds of blockiness is now clearer: namely one can envisage two different mechanisms of chemical modification (this is the reason why a Markov model for the distribution of modified segments has not been used). Fine-grained blockiness would arise if monomers are equally accessible to the modifying agent, irrespective of their surroundings. If this cannot be achieved in a one-step process (for the reason described below) it could perhaps be achieved in a two-step process, by fully modifying the polymers then removing a random fraction of the derivative groups. Extreme blockiness on the scale of the polymer chain itself would arise if the modifying agent was present only in the aqueous phase, and as such only able to access polymer which had already been solubilised. This would lead to a mixture of polymers which were either fully modified, or remained unmodified and insoluble. The process of modification of insoluble polymers could still be initiated because the modifying agent is able to access the tiny proportion of the insoluble polymer segments which lie at the interface between the insoluble and aqueous phases. Experimentally, confirmation of the scenario of extreme blockiness would be given by measuring the mean degree of modification for the dissolved polymers.
One should find that this is much in excess of the apparent mean degree of modification.

In the calculation, the major effect arises from inter-chain rather than intra-chain heterogeneities. The model is not sophisticated enough to take account of the solution structures such as micelles or mesophases that could form for blocky polymers with block sizes M ≫ 1 but still M < N (for example, diblock copolymers). Such polymers would be expected to have greater solubilities than would be predicted from the Flory-Huggins theory since the hydrophobic groups can be buried in micelles or other solution structures. The present theory could be extended to discuss these inhomogeneous situations using a Landau approach developed for random block copolymers [4, 5, 6]. For the mechanistic routes discussed above though, it is difficult to envisage that polymers with intermediate block sizes could arise very easily. I therefore expect that the general conclusions will remain.

Finally I note that in principle the above model for the phase behaviour could be combined with a model for the chemical modification reaction, to obtain a theory for reaction-induced solubility. However, one needs to take great care to capture the kinetics correctly [13].

I thank Nigel Clarke for a critical reading of the manuscript.

[1] R. L. Davidson, Handbook of water-soluble gums and resins (McGraw-Hill, New York, 1980).
[2] J. Rueben, Macromol. 17, 156 (1984).
[3] P. J. Flory, Principles of polymer chemistry (Cornell University Press, Ithaca, New York, 1953).
[4] G. H. Fredrickson and S. T. Milner, Phys. Rev. Lett. 67, 835 (1991).
[5] G. H. Fredrickson, S. T. Milner, and L. Leibler, Macromolecules 25, 6341 (1992).
[6] A. Nesariker, M. Olvera de la Cruz, and B. Crist, J. Chem. Phys. 98, 7385 (1993).
[7] P. Sollich, P. B. Warren, and M. E. Cates, Adv. Chem. Phys. 116, 265 (2001).
[8] P. Sollich and M. E. Cates, Phys. Rev. Lett. 80, 1365 (1998).
[9] P. B. Warren, Phys. Rev. Lett. 80, 1369 (1998).
[10] P. B. Warren, Europhys. Lett. 46, 295 (1999).
[11] P. Irvine and M. Gordon, Proc. R. Soc. Lond. A 375, 397 (1981).
[12] E. M. Hendriks, Ind. Eng. Chem. Res. 27, 1728 (1988).
[13] G. A. Buxton and N. Clarke, Macromolecules 38, 8929 (2005).

ABSTRACT Many water soluble polymers are chemically modified versions of insoluble base materials such as cellulose. A Flory-Huggins model is solved to determine the effects of heterogeneity in modification on the solubility of such polymers. It is found that heterogeneity leads to decreased solubility, with the effect increasing with increasing blockiness. In the limit of extreme blockiness, the nature of the phase coexistence crosses over to a polymer-polymer demixing transition. Some consequences are discussed for the synthesis of partially modified polymers, and the experimental characterisation of such systems.

<|endoftext|><|startoftext|>

∗ Laboratoire de Mathématiques Appliquées de Compiègne, Université de Technologie de Compiègne.
http://arxiv.org/abs/0704.0708v1

Introduction and statement of the results.

Let Ω be a bounded open set with smooth boundary in R^2 or R^3. Consider an L^∞ function σ such that there exists a real c with σ(x) ≥ c > 0. Consider the elliptic equation

− div(σ(x)∇u) = 0 in Ω, (1)

with the Dirichlet boundary condition

u = f on ∂Ω. (2)

Define the Dirichlet-to-Neumann map as Λ_σ : f ↦ σ (∂_n u)|_{∂Ω}, where u solves (1),(2) and n is the outer unit normal vector to ∂Ω. The inverse conductivity problem of Calderón is to determine σ from Λ_σ. Electrical impedance tomography aims to form an image of the conductivity distribution σ from the knowledge of Λ_σ. When σ is smooth enough, one can reconstruct σ from Λ_σ (see the works of Sylvester and Uhlmann [21], Nachman [15, 16] and Novikov [17]). When the conductivity distribution is only L^∞, Astala and Päivärinta have recently shown in [3] that, in dimension two, the map Λ_σ determines σ ∈ L^∞(Ω).
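To make the map Λ_σ concrete, the following sketch (not taken from the paper) evaluates it in a case that can be solved by hand: Ω is the unit disk and the conductivity is assumed to equal σ2 in a concentric disk of radius ρ and σ1 elsewhere. Separation of variables shows that each Fourier mode f = e^{ikθ}, k ≥ 1, is an eigenfunction of Λ_σ. The function names and parameter values are illustrative assumptions.

```python
def dtn_eigenvalue(k, sigma1, sigma2, rho):
    """Eigenvalue of the Dirichlet-to-Neumann map on the unit disk for the
    boundary mode exp(i*k*theta), k >= 1, when the conductivity is sigma2
    for r < rho and sigma1 for rho < r < 1 (assumed concentric geometry).

    Matching the potential and the flux at r = rho gives the reflection
    coefficient mu = (sigma1 - sigma2) / (sigma1 + sigma2).
    """
    mu = (sigma1 - sigma2) / (sigma1 + sigma2)
    q = mu * rho ** (2 * k)
    return sigma1 * k * (1.0 - q) / (1.0 + q)

def relative_signature(k, sigma1, sigma2, rho):
    """Relative deviation from the homogeneous eigenvalue sigma1 * k."""
    homogeneous = sigma1 * k
    return abs(dtn_eigenvalue(k, sigma1, sigma2, rho) - homogeneous) / homogeneous

for k in (1, 2, 5, 10):
    print(k, dtn_eigenvalue(k, 1.0, 2.0, 0.5), relative_signature(k, 1.0, 2.0, 0.5))
```

In this model the trace left by the inclusion on the k-th mode is of order 2|μ|ρ^{2k}, so it decays geometrically with the frequency. This is one elementary way to see why recovering an inclusion from boundary data is delicate.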
We are interested in a particular case of that problem: when a body is inserted inside a given object with a distinct conductivity, the question of determining its shape from boundary measurements arises in many fields of modern technology. In the context of the inverse conductivity problem of Calderón, we restrict the range of admissible conductivity distributions to the family of piecewise constant functions which take only two distinct values σ1, σ2 > 0, which are assumed to be known. The conductivity distribution is then defined by an open subset ω as

σ = σ1 χ_{Ω\ω} + σ2 χ_ω. (3)

Here, the only unknown of the problem is ω, a subdomain of Ω with a smooth boundary ∂ω; its outer unit normal vector is denoted by n. The notation χ_ω (resp. χ_{Ω\ω}) denotes the characteristic function of ω (resp. Ω \ ω).

The second main difference arises from practical considerations: it is unrealistic from the point of view of applications to know the full graph of the Dirichlet-to-Neumann map. Therefore, we will assume that one has access to a single point in that graph. This nondestructive testing problem is usually written, from a numerical point of view, as the minimization of a cost function, typically a least-squares matching criterion. Many authors have investigated the steepest descent method for this problem [13, 7, 10, 18, 1] with the methods of shape optimization, since the unknown parameter is a geometrical domain. This work is devoted to the study of second order methods for this problem, which have only been considered before for simplified models in [5, 2]. By introducing second order methods, one aims to reach two distinct objectives.

• On the one hand, we provide all the material needed to design a Newton algorithm. We will give differentiability results for the state function and for the objective that we have chosen to study in this work.
Nevertheless, we point out that the discretization of a Newton method for this problem turns out to be very delicate; this is why, in the present paper, we neither discuss this issue nor present numerical examples. This topic is actually the main objective of a work in progress.

• On the other hand, we rigorously analyze the well-posedness of the optimization method. This is justified by the huge literature devoted to the numerical study of this question in the field of inverse problems; the numerical experiments insist on the ill-posedness of this problem. We will explain the instability in the continuous setting in terms of shape optimization. We show that the shape Hessian is not coercive (in fact, its Riesz operator is compact), and this explains the instability of the minimization process.

Let us describe the precise problem under consideration and the notations. We consider a bounded domain Ω ⊂ R^d (d = 2 or 3) with a C^2 boundary. It is filled with a material whose conductivity is σ1 and contains an unknown inclusion ω of conductivity σ2 ≠ σ1. We seek to reconstruct the shape of ω by measuring, on ∂Ω, the input voltage and the corresponding output current. In the sequel, we fix d0 > 0 and consider inclusions ω such that ω ⊂⊂ Ω_{d0} = {x ∈ Ω, d(x, ∂Ω) > d0}. We also assume that the boundary ∂ω is of class C^{4,α}.

The inverse problem arises when one has access to the normal derivative of the potential u that solves (1)-(2) when the conductivity distribution is defined by (3). Assume that one knows

σ1 ∂_n u = g on ∂Ω, (4)

then the problem (1)-(2)-(4) is overdetermined. The electrical impedance tomography problem we consider is to recover the shape of ω from the knowledge of the single Cauchy pair (f, g). In order to recover the shape of the inclusion ω, a usual strategy is to minimize a cost function.
Many choices are possible; however it turns out that a Kohn and Vogelius type objective leads to a minimization problem with nicer properties than the least squares fitting approaches (we refer to [1] for a comparison of different objectives with order one methods and to [2] for the case of a perfectly insulated inclusion). Therefore, we study such a cost function in this work. Let us define this criterion. Its distinctive feature is to involve two state functions ud and un: the state ud solves (1)-(2) while un solves (1)-(4). The Kohn -Vogelius objective JKV is then defined as: JKV (ω) = σ|∇(ud − un)| 2 (5) Let us sum up the results of this paper concerning the minimization of this objective. We first prove differentiability results for the state ud. In the sequel, we use the convention that a bold character denotes a vector. If h denotes a deformation field, it can be written as h = hτ + hnn on ∂ω. Note also that in the following lines, n denotes the outer normal field to ∂ω pointing into Ω\ω. Hence, for x ∈ ∂ω, we define, when the limit exists, u±(x) (resp. (∂nu) ±(x)) as the limit of u(x± tn(x)) (resp. 〈∇u(x± tn(x),n(x))) when t > 0 tends to 0. Note that hτ is a vector while hn is a scalar quantity. The admissible deformation fields have to preserve ∂Ω and the regularity of the boundaries: therefore the space of admissible fields is H = {h ∈ C4,α(Rd,Rd), Supp(h) ⊂ Ωd0}. The following result concerns the first order derivative of the state functions ud and un. It was derived in [7, 18, 1]. Theorem 1 Let Ω be an open smooth subset of Rd (d = 2 or 3) and let ω be an element of Ωd0 with a boundary of class C4,α. Then the state functions ud and un are shape differentiable; furthermore their shape derivative u′d and u n belongs to H 1(Ω \ ω) ∪H1(ω) and satisfy   ∆u′d = 0 in Ω \ ω and in ω, d on ∂ω,[ = [σ]divτ (hn∇τud) on ∂ω, u′d = 0 on ∂Ω.   ∆u′n = 0 in Ω \ ω and in ω, n on ∂ω,[ = [σ]divτ (hn∇τun) on ∂ω, ∂u′n = 0 on ∂Ω. 
The main result of this work concerns the second order derivative. It is given is the following theorem. Theorem 2 Let Ω be an open smooth subset of Rd (d = 2 or 3) and let ω be an element of Ωd0 with a C4,α boundary. Let h1 and h2 be two deformation fields in H. Then the state ud has a second order shape derivative u′′d ∈ H 1(Ω \ ω) ∪H1(ω) that solves   ∆u′′d = 0 in Ω \ ω and in ω,[ h1,nh2,nH − h1τ .(Dnh2τ ) [∂nud]− h1,n[∂n(ud) 2] + h2,n[∂n(ud) h1τ .∇h2,n + h2τ .∇h1,n [∂nud] on ∂ω,[ = divτ σ∇τ (ud) + h1,n σ∇τ (ud) + h1τ .(Dnh2τ )[σ∇τud] − divτ (h1τ .∇τh2,n +∇τh1,n.h2τ ) [σ∇τud] + divτ h2,nh1,n(2Dn−HI) [σ∇τud] on ∂ω, u′′d = 0 on ∂Ω. Here, (ud) i denotes the first order derivative of u in the direction of hi as given in (6), Dn stands for the second fundamental form of the manifold ∂ω and H stands for the mean curvature of ∂ω. The twin result concerning un is an easy adaption of Theorem 2. Once the differentiability of the state function has been established, one can consider the objectives. In [1], we have shown the first order result. Theorem 3 Let Ω be an open smooth subset of Rd (d = 2 or 3) and let ω be an element of Ωd0 with a C4,α boundary. Let h1 and h2 be two deformation fields in H. The Kohn-Vogelius objective is differentiable with respect to the shape and its derivative in the direction of a deformation field h is given by: DJKV (ω)h = [σ] 2 − |∂nu + |∇τud| 2 − |∇τun| hn. (9) We now give the second-order derivative of the Kohn and Vogelius criterion. Theorem 4 Let Ω be an open smooth subset of Rd (d = 2 or 3) and ω be an element of Ωd0 with a C4,α boundary. Let h1 and h2 be two deformation fields in H. 
The Kohn-Vogelius objective is twice differentiable with respect to the shape and its second derivative in the directions h1 and h2 is given D2JKV (ω)(h1,h2) = σ|∇v|2 h1τ .∇τ (h2,n) + h2τ .∇τ (h1,n)− h2τ .(Dnh1τ ) σ|∇v|2 h1,nh2,n + 2 σ∇v.(h1,n∇v 2 + h2,n∇v ∂n(un) 2 + ∂n(un) 1 − ∂nv 1(ud) 2 − ∂nv 2(ud) σ∂n(un) − σ1∂nv where we have set v = ud − un. To investigate the properties of stability of this cost function, we are led to consider an admissible inclusion ω∗ to solve both (1)-(2) and (1)-(4) in order to obtain the corresponding measurements f∗ and g∗. It is obvious that the domain ω∗ realizes the absolute minimum of the criterion JKV since, by construction, we can write ud = un in Ω and hence JKV (ω ∗) = 0. We will check that the Euler equation DJKV (ω ∗)(h) = 0, holds. We will also prove that D2JKV (ω ∗)(h,h) = σ|∇v′|2. (11) Moreover, if hn 6= 0, then D 2JKV (ω ∗)(h,h) > 0 holds. Nevertheless, (11) does not mean that the minimization problem is well-posed. In fact, it is the following theorem that explains the instability of standard minimization algorithms. Theorem 5 Assume that ω∗ is a critical shape of JKV for which the additional condition un = ud holds. Then the Riesz operator corresponding to D2JKV (ω ∗) defined from H1/2(∂ω∗) with values in H−1/2(∂ω∗) is compact. Moreover, the minimization problem is severely ill-posed in the following sense: if the target domain is C∞ and if λn denotes the n th eigenvalue of D2JKV (ω ∗), then λn = o(n−s) for all s > 0. Theorem 5 has two main consequences. First, the shape Hessian at the global minimizer is not coercive. This means that this minimizer may not be a local strict minimum of the criterion. Moreover, the criterion provides no control of the distance between the parameter ω and the target ω∗. The second consequence concerns any numerical scheme used to obtain this optimal domain ω∗. 
One has to face this difficulty, and this explains why frozen Newton or Levenberg-Marquardt schemes have been used to solve this problem numerically [7, 1].

The paper is organized as follows. In a first section, we state some preliminary results. Some are well known facts in shape optimization and will be recalled without proof for the sake of readability. Others (e.g. the derivatives of a Laplace-Beltrami operator and the tangential regularity of the solution to (1)-(2) along the discontinuity of the conductivity distribution) are less known and will be proved thanks to layer potential methods. We then tackle the computations in Section 3, which we consider as the core of this work: it is essentially devoted to proving Theorem 2. After a first part where we prove the existence of a second order derivative for the state, we propose two distinct methods to find the boundary value problem solved by this second order derivative. The first method (Subsection 3.3) follows the lines of classical proofs of shape differentiability by differentiating the weak formulation of problem (1)-(2) and interpreting the result in terms of differential operators and boundary conditions. The alternative method (Subsection 3.4) consists in a direct differentiation of the boundary conditions. Finally, Section 4 is devoted to the analysis of the criterion: we establish Theorem 4 and Theorem 5, and we present their consequences on the stability of critical shapes.

2 Preliminary results.

2.1 Elements of shape calculus

Before entering the proof of Theorem 2, we recall without proof some basic facts from shape optimization (see [6] for references). Let h be a deformation field in C^2(Ω, R^d) with ‖h‖_{C^2} < 1. We set T_t(h, ·) = Id + th and denote by Ω_t the transported domain Ω_t = T_t(Ω). To lighten the notation, we will write T_t instead of T_t(h, ·).

Material and shape derivatives.
Classically, in mechanics of continuous media, the material derivative is defined as a one-sided limit, taken as t tends to 0 from above. In our context, for any vector field h ∈ H, we define the material derivative of the domain functional y = y(Ω) at Ω in an admissible direction h as the limit

\[ \dot{y}(\Omega;h) = \lim_{t\to 0^+} \frac{y(\Omega_t)\circ T_t - y(\Omega)}{t}. \tag{12} \]

Similarly, one can define the material derivative ẏ(∂Ω; h) for any domain functional y = y(∂Ω) which depends on ∂Ω. Another kind of derivative occurs: it is called the shape derivative of y(Ω; h). It is viewed as a first local variation. Its definition is given by the following.

Definition 1 The shape derivative y′ = y′(Ω; h) of a functional y(Ω) at Ω in the direction of a vector field h is given by

\[ y' = \dot{y} - h\cdot\nabla y. \tag{13} \]

For more details on these derivatives, the reader can consult [20, 6].

Elements of tangential derivatives. In the sequel, we will need to manipulate the tangential differential operators on a manifold. For the reader's convenience, we recall from [4, 6] some definitions and some useful rules of calculus.

Definition 2 The tangential divergence of a vector field V ∈ C^1(R^d, R^d) is given by

\[ \operatorname{div}_\tau(V) = \operatorname{div}(V) - (DV\,n)\cdot n, \tag{14} \]

where the notation DV denotes the Jacobian matrix of V. When the vector field V ∈ C^1(∂Ω, R^d) is only defined on ∂Ω, the tangential divergence is defined by

\[ \operatorname{div}_\tau(V) = \operatorname{div}(\tilde{V}) - (D\tilde{V}\,n)\cdot n, \tag{15} \]

where Ṽ stands for an arbitrary C^1 extension of V to an open neighborhood of ∂Ω. We now introduce the notion of tangential gradient ∇_τ of a smooth scalar function f in C^1(∂Ω).

Definition 3 Let f ∈ C^1(∂Ω) be given and let f̃ be an extension of f in the sense that f̃ ∈ C^1(U) and f̃|_{∂Ω} = f, where U is an open neighborhood of ∂Ω. Then the tangential gradient is defined by

\[ \nabla_\tau f = \nabla\tilde{f}|_{\partial\Omega} - (\nabla\tilde{f}\cdot n)\,n \quad\text{on } \partial\Omega. \tag{16} \]

The details for the existence of such an extension can be found in [4]. Let us remark that these definitions do not depend on the choice of the extension.
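The definitions (14) and (16) can be tried out numerically. The sketch below is only an illustration under stated assumptions: ∂Ω is taken to be the unit sphere in R^3, the extensions are the obvious ones (the field x/|x| for the normal, f = z for the scalar function), and the Jacobian is approximated by central differences. The helper names are hypothetical.

```python
import math

def unit_normal(x):
    """Field x/|x|: a unitary extension of the outer normal of the unit sphere."""
    r = math.sqrt(sum(c * c for c in x))
    return [c / r for c in x]

def jacobian(field, x, eps=1e-5):
    """Central-difference Jacobian, jac[i][j] = d(field_i)/d(x_j)."""
    d = len(x)
    jac = [[0.0] * d for _ in range(d)]
    for j in range(d):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = field(xp), field(xm)
        for i in range(d):
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return jac

def tangential_divergence(field, x, n):
    """div_tau(V) = div(V) - (DV n).n, as in definition (14)."""
    jac = jacobian(field, x)
    d = len(x)
    div = sum(jac[i][i] for i in range(d))
    dvn_n = sum(n[i] * jac[i][j] * n[j] for i in range(d) for j in range(d))
    return div - dvn_n

def tangential_gradient(grad_f, n):
    """grad_tau f = grad f - (grad f . n) n, as in definition (16)."""
    gn = sum(g * c for g, c in zip(grad_f, n))
    return [g - gn * c for g, c in zip(grad_f, n)]

x = unit_normal([1.0, 2.0, 2.0])   # a point on the unit sphere
n = unit_normal(x)                 # outer unit normal at that point
div_tau_n = tangential_divergence(unit_normal, x, n)   # close to 2
grad_tau_z = tangential_gradient([0.0, 0.0, 1.0], n)   # f = z, so grad f = e_z
print(div_tau_n, grad_tau_z)
```

On the unit sphere the tangential divergence of the normal is 2, the sum of the two principal curvatures, and the tangential gradient is orthogonal to n by construction.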
Furthermore, one can show the important relation

\[ \int_{\partial\Omega} \nabla_\tau f\cdot F = -\int_{\partial\Omega} f\,\operatorname{div}_\tau(F), \tag{17} \]

for all elements f ∈ C^1(∂Ω) and all vector fields F ∈ C^1(∂Ω, R^d) satisfying F_n = ⟨F, n⟩ = 0.

Integration by parts on ∂Ω. In general, the condition F_n = 0 above is not satisfied, and we are led to extend the formula to the general case. The extension of this integration by parts formula to fields with a normal component involves curvature. First, we point out that the curvature is connected to the normal vector via the tangential divergence operator. Recall that the mean curvature of ∂Ω is defined as H = div_τ(n). Making use of the form of div_τ(n) on the boundary, one shows straightforwardly the following statement.

Proposition 1 Let Ω be an open subset of R^3 with a C^2 boundary. For any unitary extension N of n to a neighborhood of ∂Ω, one has div(N) = H on ∂Ω. Assume that the manifold ∂Ω has no boundary. If F ∈ H^2(∂Ω)^3 and f ∈ H^2(∂Ω), then we have

\[ \int_{\partial\Omega} \bigl(\nabla f\cdot F + f\,\operatorname{div}_\tau(F)\bigr) = \int_{\partial\Omega} \bigl(\nabla f\cdot n + H f\bigr)\,F\cdot n. \tag{18} \]

We assume now that the domain Ω has a C^3 boundary. The simplest second-order derivative is the Laplace-Beltrami operator; it is defined as follows (see [20, 4, 6]).

Definition 4 Let f ∈ H^2(∂Ω). The Laplace-Beltrami operator ∆_τ of f is defined as follows:

\[ \Delta_\tau f = \operatorname{div}_\tau(\nabla_\tau f). \tag{19} \]

There is a relation connecting the Laplace operator and the Laplace-Beltrami operator. Let us denote ∂²_{nn} f = (D²f n)·n, where D²f stands for the Hessian of f.

Proposition 2 Let Ω be a domain with a boundary ∂Ω of class C^3. For all functions f ∈ H^3(Ω), it holds

\[ \Delta f = \Delta_\tau f + H\,\partial_n f + \partial^2_{nn} f \quad\text{on } \partial\Omega. \tag{20} \]

We need to compute the shape and material derivatives of special vector fields: the outer unit normal vector n, the tangential gradient and the Laplace-Beltrami operator applied to a function.
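As a sanity check of relation (20), which is not part of the original argument, one may take ∂Ω to be the sphere of radius R in R^3 and f a harmonic polynomial, homogeneous of degree k; both choices are illustrative assumptions. With the convention H = div_τ(n), the sphere has H = 2/R, and Euler's identity for homogeneous functions gives ∂_n f = (k/R) f and ∂²_{nn} f = (k(k−1)/R²) f on ∂Ω, so that:

```latex
\begin{aligned}
0 = \Delta f &= \Delta_\tau f + H\,\partial_n f + \partial^2_{nn} f \\
             &= \Delta_\tau f + \frac{2}{R}\cdot\frac{k}{R}\,f + \frac{k(k-1)}{R^2}\,f, \\
\Delta_\tau f &= -\frac{k(k+1)}{R^2}\,f \quad\text{on } \partial\Omega .
\end{aligned}
```

This recovers the classical eigenvalues −k(k+1)/R² of the Laplace-Beltrami operator on the sphere, which supports the signs and the curvature convention used in (20).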
While the derivative of the normal vector is obtained by a straightforward calculus, we have to transport from ∂Ωt to ∂Ω the Laplace-Beltrami operator and the tangential gradient in order to compute the other derivatives. Derivatives of the normal vector. We describe the material and shape derivatives of the normal vector. We will denote by n the gradient of the signed distance to ∂Ω. This is an unitary extension of the unitary normal vector n at ∂Ω which is smooth in the vicinity of ∂Ω. This extension furnishes a symmetric Jacobian Dn that satisfies Dnn = 0 on ∂Ω. The direction h will be supposed to be in C2(Rd,Rd) or in C2(∂Ω,Rd). Proposition 3 The material derivative ṅ of the normal vector n at Ω in the direction of a vector field h ∈ C1(Rd,Rd) is given by ṅ = −∇τ (h.n) +Dnhτ , where hτ = h− h.n n. Concerning its shape derivative defined as n′ = (∂tnt)|t=0 where nt is any smooth unitary extension of n to ∂Ωt, we obtain. Proposition 4 The shape boundary n′ in the direction of h is given by n′ = −∇τ (h.n). Derivative of the tangential gradient. For f ∈ H3(∂Ω), we compute the material derivative of ∇τf . We first compute the difference ∇τf −∇ḟ . Proposition 5 For all functions f ∈ C2(R3) and directions h ∈ C2(∂Ω,R3), one has ˙∇τf = ∇ḟ + (D 2fh)τ −∇f.n ṅ−∇f.ṅ n Proof of Proposition 5. We differentiate ∇f and ∇f.n n and obtain ∇̇f = ∇f ′ +D2fh while ˙∇f.n n = ∇f.ṅ n+∇f.n ṅ+∇f ′.n n+ (D2fh).n n. The two former equations give the desired result. � Derivative of the Laplace-Beltrami operator. Now, we want to compute the material derivative ∆τf . We begin to study how to transport the Laplace-Beltrami operator when one works on ∂Ωt. Let ∆τ,t denote the Laplace-Beltrami operator on the manifold ∂Ωt. To compute the derivative of a Laplace-Beltrami operator, we need the following proposition that we quote from [20]. Proposition 6 Let f ∈ H5/2(Rd), then ∆τ,tf ◦ Tt γτ (t) φ = − ∇(f ◦ Tt)− (B(t) n).∇(f ◦ Tt) .∇φ, ∀φ ∈ D(Rd). 
In the former proposition, we set γ(t) = detDTt, γτ (t) = γ(t)‖(DT T .n‖Rd , B(t) = D(T−1t )(D(T ‖(DT−1t ) T .n‖2 C(t) = γτ (t)D(T t )(D(Tt) −1)T . . (22) A straightforward computation gives γ′(0) = divτ (h) , γ′τ (0) = divτ (h) = divτ (hτ ) +Hhn, B′(0) = 2(Dhn).nI − (Dh+ (Dh)T ), C′(0) = divτ (h) I − (Dh+ (Dh) . (23) Theorem 6 Let f ∈ D(Rd). The material derivative of ∆τf in the direction h is given by ∆τf = ∆τ ḟ+∇τf.∇τ divτ (hτ ) +∇τ (Hhn).∇τf − divτ Dh+ (Dh)T Proof of Theorem 6 : Formula (24) is shown in a weak sense. For each test function φ ∈ C∞(∂Ω), there exists an extension φ̃ ∈ D(Rd) such that ∂nφ̃ = 0; this can be done by extending φ as a constant along the orbits of the gradient of the signed distance function to ∂Ω and the use of a cut-off function. For f ∈ D(Rd), we set A(t) = (∆τ,tf) ◦ Tt −∆τf γτ (t) φ. After an integration by parts on ∂Ω, we obtain: A(t) = 1− γτ (t) (∆τ,tf) ◦ Tt φ+ (∆τ,tf) ◦ Ttφ+ ∇τf.∇τφ 1− γτ (t) (∆τ,tf) ◦ Tt φ ∇τf − C(t)∇ (f ◦ Tt) .∇φ̃+ (B(t)n.∇(f ◦ Tt) C(t) n.∇φ̃ Since ∂nφ̃ = 0 and C(0) = I, we get A(t) = 1− γτ (t) (∆τ,tf) ◦ Tt φ+ ∇τ (f − f ◦ Tt) .∇τ φ̃+ C(0)− C(t) ∇(f ◦ Tt).∇τ φ̃. When t→ 0, it then comes ∆τfφ = − γ′τ (t)∆τ fφ+∇τ ḟ .∇τφ+ C′(0).∇f .∇τφ, ∆τ ḟ − divτ (h)∆τf Dh+ (Dh)T − divτ (h) I ∇f.∇τφ, ∆τ ḟ − divτ (h)∆τf + divτ divτ (h)∇τf − divτ Dh+ (Dh)T Expanding the double divergence term, we obtain: ∆τf = ∆τ ḟ +∇τf.∇τdivτ (h)− divτ Dh+ (Dh)T In order to explicit these derivatives, we let appear the curvatures of ∂Ω by means of ∇τf.∇τdivτ (h) = ∇τf.∇τ divτ (hτ ) +Hhn and this ends the proof of the theorem (24). � 3 Existence of the second order derivative of the state. Proof of Theorem 2. The section is devoted to prove Theorem 2. We follow the usual strategy to derive existence in shape optimization. In section 3.2, we will write the weak formulation of the problem, then transport it on the reference domain, pass to the limit and obtain existence of the material derivative. 
In a second step, we will seek a boundary value problem solved by the material derivative. This will provide a characterization of the second order shape derivative. Two strategies, which we will detail, are possible: the first one, explored in Section 3.3, consists in working on the variational formulation, while the second one uses the tangential differential calculus by differentiating the boundary conditions. This last approach will be presented in Section 3.4. The computations made in Subsections 3.3 and 3.4 require some regularity of the traces of the state u_d on the interface of discontinuity ∂ω. For the sake of readability, we postpone all the needed justifications to Subsection 3.5.

3.1 Preliminary results.

In the sequel, we will use some technical formulae. To preserve the readability of the proof of the main result, we state them in this paragraph. The tools needed for proving these results can be found in [20]. Given a smooth vector field h, we denote

\[ A_h = Dh + Dh^T - \operatorname{div}(h)\,I. \]

We begin with the following formula.

Lemma 1 It holds:

\[ \nabla u\cdot A_h\nabla v = \nabla(h\cdot\nabla u)\cdot\nabla v + \nabla(h\cdot\nabla v)\cdot\nabla u - \operatorname{div}\bigl((\nabla u\cdot\nabla v)\,h\bigr). \tag{25} \]

Given two smooth vector fields h_1 and h_2, we set

\[ A = Dh_2\,A_{h_1} + A_{h_1}\,Dh_2^T - A_{h_1}\operatorname{div}(h_2) - (A_{h_1})'(h_2), \tag{26} \]

\[ b = (h_2\cdot\nabla u)\,A_{h_1}\nabla v + (h_2\cdot\nabla v)\,A_{h_1}\nabla u - \bigl((A_{h_1}\nabla u)\cdot\nabla v\bigr)\,h_2. \]

Here, the notation (A_{h_1})′(h_2) stands for the matrix defined by its elements ((A_{h_1})′(h_2))_{k,l} = ∇((A_{h_1})_{k,l})·h_2.

Lemma 2 One has:

\[ \nabla u\cdot A\nabla v = \operatorname{div}(b) - (h_2\cdot\nabla u)\operatorname{div}(A_{h_1}\nabla v) - (h_2\cdot\nabla v)\operatorname{div}(A_{h_1}\nabla u). \tag{27} \]

We need the following crucial result.

Lemma 3 If u is harmonic, then

\[ \operatorname{div}(A_{h_1}\nabla u) = \Delta(h_1\cdot\nabla u). \tag{28} \]

Proof of Lemma 3. For any harmonic function u in Ω and for every test function φ ∈ D(Ω), we can write

\[ \int_\Omega \nabla\dot{u}\cdot\nabla\varphi = \int_\Omega A_h\nabla u\cdot\nabla\varphi, \quad\text{then}\quad \int_\Omega \Delta\dot{u}\,\varphi = \int_\Omega \operatorname{div}(A_h\nabla u)\,\varphi. \]

Since u̇ = u′ + h·∇u and since u′ is harmonic in Ω, we obtain the result. □

3.2 Proof of existence of the second order derivative.

We follow Hettlich and Rundell [8] and Simon [19] to define the second order derivative of an operator with respect to a domain.
We compute the second derivative by considering two admissible deformations h1,h2 ∈ H that will describe the small variations of ∂ω. Simon shows that the second derivative F ′′(∂ω;h1,h2) of F (∂ω) is defined as a bounded bilinear operator satisfying F ′′(∂ω;h1,h2) = F ′(∂ω;h1) h2 − F ′(∂ω;Dh1 h2) (29) For more details, the reader can consult the appendix in page 613 of [8]. Let us begin the proof. Let h1,h2 ∈ H be two vector fields. The direction h1 being fixed, we consider u̇1,h2 the variation of u̇1 with respect to the direction h2. We recall from [1] that the material derivative u̇1 of u in the direction h1 satisfies ∀v ∈ H10 (Ω), σ∇u̇1.∇v = σ∇u.Ah1∇v. Let φ2 : Ω 7→ Ω be the diffeomorphism defined by φ2(x) = x+h2(x) and we set ψ2 = φ 2 . Setting ωh2 = x+ h2(x), x ∈ ω , Ωh2 = x+ h2(x), x ∈ Ω = Ω and σh2 = σ ◦ φ2, we get σh2∇u̇1,h2 .∇v = σh2∇uh2 .Ah1∇v (30) where uh2 is the solution of the original problem with ωh2 instead of ω. Making the change of variables x = φ2(X), we get the integral identity on the fixed domain Ω : σ∇˜̇u1,h2 . Dψ2(Dψ2) T det(Dφ2) σ∇ũh2 . Dψ2Ãh1(Dψ2) T det(Dφ2) ∇v (31) with the notations ũ = u ◦φ2 and Ãh1 = Ah1 ◦φ2. Since the material derivative u̇1 of u with respect to the direction h1 satisfies ∫ σ∇u̇1.∇v = σ∇u.Ah1∇v, the difference of (30) and (31) gives ˜̇u1,h2 − u̇1 .∇v = σ∇˜̇u1,h2 . I −Dψ2(Dψ2) T det(Dφ2) σ∇ũh2 . Dψ2Ãh1(Dψ2) T det(Dφ2)−Ah1 (∇ũh2 −∇u).Ah1∇v. We quote from [13] and [8] the following asymptotic formulae ‖ − div (hi) ‖∞ = O(‖hi‖ ‖Dψi(Dψi) T det(Dφi)− I +Ahi‖∞ = O(‖hi‖ ‖Dψ2Ãh1(Dψ2) T det(Dφ2)−Ah1 +Dh2Ah1 +Ah1(Dh2) T − div (h2)Ah1 − (Ah1) ′(h2)‖∞ = O(‖h2‖ Making the adequate substitutions, we easily check that the material derivative of u̇1 with respect to h2 exists. This derivative, denoted by ü1, satisfies σ∇ü1.∇v dx = ∇u̇1.Ah2∇v +∇u̇2.Ah1∇v −∇u.A∇v . (32) where A is defined in (26). 3.3 Derivation of (8) from the weak formulation. We want to make explicit the problem solved by (u′)′. 
To achieve this, we should write the right hand side ∇u̇1.Ah2∇v +∇u̇2.Ah1∇v −∇u.A∇v as the sum of an integral with ∇v in factor and an integral of a divergence to identify the jump conditions on ∂ω. To that end, we will use algebraic identities that involve second order derivatives of u, u̇i and of the test function v ∈ D(Ω). Using Lemma 1, we obtain: σ∇u̇1.Ah2∇v = ∇(h2.∇u̇1).∇v +∇(h2.∇v)∇u̇1 − div (∇u̇1.∇v)h2 σ∇u̇2.Ah1∇v = ∇(h1.∇u̇2).∇v +∇(h1.∇v)∇u̇2 − div (∇u̇2.∇v)h1 Concerning the remaining terms, we use Lemma 2 to get σ∇u.A∇v = σ div (h2.∇u)Ah1∇v + (h2.∇v)Ah1∇u− (Ah1∇u.∇v)h2 (h2.∇u)div Ah1∇v + (h2.∇v)div Ah1∇u We apply Lemma 3 and gather the expressions obtained for F . ∇ (h1.∇u̇2 + h2.∇u̇1) .∇v +∇(h2.∇v).∇u̇1 +∇(h1.∇v).∇u̇2 σ div (Ah1∇u.∇v −∇u̇1.∇v)h2 − (∇u̇2.∇v)h1 (h2.∇v)∆(h1.∇u)− div (h2.∇v)Ah1∇u −∇(h2.∇u).Ah1∇v Using (25), we remove the dependency on Ah1∇v: ∇(h2.∇u).Ah1∇v = ∇(h1.∇(h2.∇u)).∇v +∇(h1.∇v)∇(h2.∇u)− div (∇(h2.∇u).∇v)h1 Therefore, we write F = F1 + F2 where ∇ (h1.∇u̇2 + h2.∇u̇1)−∇(h1.∇(h2.∇u)) .∇v, (34) ∇(h1.∇v).∇(u̇2 − h2.∇u) +∇(h2.∇v).∇u̇1 + (h2.∇v)∆(h1.∇u) σ div (Ah1∇u.∇v −∇u̇1.∇v)h2 + ∇(h2.∇u).∇v −∇u̇2.∇v h1 − (h2.∇v)Ah1∇u The connection between second order material and shape derivatives is given by: ü1 = (u 2 + h1.∇u̇2 + h2.∇u̇1 − h1.∇(h2.∇u), incorporating this expression in (34), we rewrite (32) as: ∀v ∈ H10 (Ω), σ∇(u′1) 2.∇v = F2. (35) Testing it against v ∈ D(Ω \ ∂ω), we get ∆(u′1) 2 = 0 in Ω \ ω and in ω. We now deduce the jump conditions for (u′1) 2. To obtain the jump of the potential, we simply write that ü1 ∈ H 0(Ω), hence [ü1] = 0 on ∂ω and then [(u′1) 2] = −h1.∇u 2 − h2.∇u̇1. To express the jump of the flux, we then apply the Gauss formula in (35) to get [σ∂n(u 2]v = F2. (36) The second term F2 contains all the jumps of the flux on the interface ∂ω. A simplified expression of F2. To get a simplified formula for F2 under a boundary integral, some lengthy but straightforward calculations are needed. 
We summarize the result by means of the following lemma Lemma 4 One has: 2h2,nh1,nDn [σ∇τu]− h2,nn.∇h1,n [σ∇τu] + h2,nh1τ .Dnn [σ∇τu] h1τ .∇τ (h2,n) [σ∇τu]− h1,nh2,nH [σ∇τu] + divτ Proof of lemma First, write : σ∇(h1.∇v).∇(u̇2 − h2.∇u) = σ1 ∇(h1.∇v).∇u 2 + σ2 ∇(h1.∇v).∇u [σ∂nu 2](h1.∇v) Note that the normal vector is oriented from ω to Ω \ ω. In the same spirit, we write ∇(h2.∇v).∇u̇1 + (h2.∇v)∆(h1.∇u) = ∇(h2.∇v).∇(u̇1 − h1.∇u) + div (h2.∇v).∇(h1.∇u) By a argument of symmetry, we then can write: σ∇(h2.∇v).∇(u̇1 − h1.∇u) = − [σ∂nu 1](h2.∇v). To drop the dependency in Ah1 , we use (25) and get after expansion: Ah1∇u.∇v = div ∇(h1.∇v).∇u +∇(h1.∇u)∇v − div (∇u.∇v)h1 (h2.∇v)Ah1∇u = ∇(h2.∇v).Ah1∇u + (h2.∇v)div Ah1∇u = ∇(h1.∇(h2.∇v)).∇u +∇(h1.∇u)∇(h2.∇v) + (h2.∇v)∆(h1.∇u) − div ∇(h2.∇v).∇u h1.∇(h2.∇v) .∇u+ div (h2.∇v)∇(h1.∇u)− ∇(h2.∇v).∇u After integrating by parts, we conclude thanks to the state equation and obtain h1.∇(h2.∇v) .∇u = − h1.∇(h2.∇v) div (σ∇u) = 0 We substitute the shape derivative u′ to the material one u̇: F2 = − [σ∂nu 1](h2.∇v) + [σ∂nu 2](h1.∇v)− σ div (∇u.∇v)h1 σ div ∇(h1.∇v).∇u)h2 + (∇(h2.∇v).∇u (∇u′2.∇v)h1 + (∇u 1.∇v)h2 First, we use the continuity of the flux on ∂ω, then we integrate by parts on ∂ω and finally we incorporate the expressions of the jumps of the shape derivatives u′ to obtain ∇(h2.∇v).∇u σ∇u.∇(h2.∇v) h1,n = − [σ∇τu]h1,n∇τ (h2.∇v) [σ∇τu]h1,n h2.∇v = h2.∇v. This leads to a simplified expression for F2: F2 = − σ div (∇u.∇v)h1 (∇u′1.∇v)h2 + (∇u 2.∇v)h1 Let us study each term of this sum. Using Gauss formula and integrating by parts on the manifold ∂ω, we obtain σ div ∇u′1.∇v)h2 σ∇u′1.∇v ∂nv − ∂nv + By symmetry, we also get: σ div ∇u′2.∇v)h1 ∂nv + We now turn to the term with a double divergence. We first write it as a boundary integral thanks to Gauss formula as σ div (∇u.∇v)h1 h2,ndiv σ(∇u.∇v) then, we use (14) to introduce the tangential operators σ div (∇u.∇v)h1 h2,ndivτ σ(∇u.∇v) h2,nD(h1 σ(∇u.∇v) )n.n. We study each of these terms. 
We start with the one involving tangential derivatives: we expand the tangential divergence to incorporate the jump relation for the state u. σ(∇u.∇v) = divτ (h1) σ(∇u.∇v) + h1.∇τ [σ∇u.∇v] = divτ (h1) [σ∇τu] .∇τv + h1.∇τ [σ∇τu.∇τv] . Then, the first term becomes: h2,ndivτ σ(∇u.∇v) h2,ndivτ (h1) [σ∇τu] .∇τv + h2,nh1.∇τ [σ∇τu∇τv] . We use the integration by parts formula (18) to get: h2,ndivτ σ(∇u.∇v) h1,nh2,nH [σ∇τu] .∇τv − divτ divτ (h1)h2,n [σ∇τu] v − divτ h1h2,n [σ∇τu] .∇τv h1h2,n − divτ (h1)h2,n − h1,nh2,nH [σ∇τu] Expanding h1h2,n [σ∇τu] v = divτ divτ (h1) h2,n [σ∇τu] + h1.∇τ (h2,n) [σ∇τu] = divτ divτ (h1) h2,n [σ∇τu] v + divτ h1τ∇τh2,n [σ∇τu] we obtain the new expression: h2,ndivτ σ(∇u.∇v) h1τ∇τh2,n − h1,nh2,nH [σ∇τu] v. (38) Now, we consider the term involving normal components. We have n.D(h1 σ∇(∇u.∇v) )n = n.∇(h1,n [σ∇u.∇v])− [σ∇u.∇v]h1τ .Dnn = n.∇(h1,n) [σ∇τu]∇τv + h1,nn.∇([σ∇u.∇v]). Then, we get h2,nD(h1 σ(∇u.∇v) )n.n = h2,nn.∇(h1,n) [σ∇τu] .∇τv + h2,nh1,nn.∇([σ∇u.∇v]) −divτ h2,nn.∇(h1,n) [σ∇τu] v + h2,nh1,nn.∇([σ∇u.∇v]). A straightforward calculus leads to n.∇([σ∇u.∇v]) = n. σD2u∇v +D2v [σ∇u] ∇τv +D 2v [σ∇τu] = ∂nv ∇τv + n.D 2v [σ∇τu] . where D2u is the Hessian matrix of u. From (20) and from the jump conditions for the state u, we deduce that [ = − [σ∆τu] . When one differentiates the relation expressing the continuity of the flux for the state along the tangential direction ∇τv, one gets ([6], p 235): 0 = ∇[σ∂nu].∇τv = [σD 2u]∇τv.n+ [σ∇u].(Dn∇τv). In the same spirit, it comes that ∇∂nv.[σ∇τu] = D 2v[σ∇τu].n+∇v.(Dn[σ∇τu]). (40) Since Dn is a symmetric matrix and Dnn = 0, one checks ∇v.(Dn[σ∇τu]) = [σ∇u].(Dn∇τv). 
n.∇([σ∇u.∇v])) = − [σ∆τu] ∂nv − 2Dn [σ∇τu] .∇τv + [σ∇τu]∇τ∂nv We integrate this expression on ∂ω and obtain after some integration by parts: h2,nh1,nn.∇([σ∇u.∇v]) h2,nh1,n [σ∆τu]∂nv + h2,nh1,n [σ∇τu]∇τ∂nv − 2 h2,nh1,nDn [σ∇τu] .∇τv, h2,nh1,n [σ∆τu] + divτ h2,nh1,n [σ∇τu] ∂nv + 2 h2,nh1,nDn [σ∇τu] Hence h2,nD(h1 σ(∇u.∇v) )n.n = − h2,nh1,n [σ∆τu] + divτ h2,nh1,n [σ∇τu] 2h2,nh1,nDn [σ∇τu]− h2,nn.∇τh1,n [σ∇τu] (∇u.∇v)h1 h2,nh1,n [σ∆τu] + divτ h2,nh1,n [σ∇τu] 2h2,nh1,nDn [σ∇τu]− h2,nn.∇h1,n [σ∇τu] h1τ .∇τ (h2,n) [σ∇τu]− h1,nh2,nH [σ∇τu] Gathering all the terms, we write F2 as: 2h2,nh1,nDn [σ∇τu] + h1τ .∇τ (h2,n)− h2,nn.∇h1,n − h1,nh2,nH [σ∇τu] + divτ h2,nh1,n [σ∆τu] + divτ h2,nh1,n [σ∇τu] h1,ndivτ h2,n [σ∇τu] + h2,ndivτ h1,n [σ∇τu] We end the proof after expanding the tangential divergence of the last term of F2. � Let us return to the weak formulation (36) of the derivative. By identification, we get [σ∂n(u 2] = divτ + divτ − divτ h2,nh1,n(2Dn−HI) [σ∇τu] − divτ h1τ .∇τ (h2,n) [σ∇τu]− h2,nn.∇h1,n [σ∇τu] + h2,nh1τ .Dnn [σ∇τu] It remains to compute the jump of the flux for the second order derivative. Since u′′1,2 = (u 2 − u Dh1 h2 where u′Dh1 h2 is the first shape derivative of u in the direction of the vector field Dh1 h2. Thanks to (6), we can write the jump under the form [σ∂nu 1,2] = [σ∂n(u 2]− [σ∂nu Dh1 h2 ] = [σ∂n(u 2]− divτ Dh1h2.n[σ∇τu] . (42) Let us split the field h2 in two parts: Dh1 h2.n = h2,nn.Dh1 n +Dh1 h2τ .n. In the spirit of (40), we obtain Dh1h2τ .n = ∇τh1,n.h2τ − h1τ .Dnh2τ . (43) Thanks to (39), the jump [σ∂nu Dh1 h2 ] then can be written under the form [σ∂nu Dh1 h2 ] = divτ (h2,nn.∇h1,n +∇τh1,n.h2τ − h1τ .Dnh2τ )[σ∇τu] Gathering all the terms, simplifications occur and we get: [σ∂nu 1,2] =divτ + h1,n − divτ (h1τ .∇τh2,n +∇τh1,n.h2τ ) [σ∇τu] − divτ h2,nh1,n(2Dn−HI) [σ∇τu] + divτ h1τ .Dnh2τ )[σ∇τu] To get the jumps of the potential, we use (41) and obtain u′′1,2 (u′1) u′Dh1 h2 = −h1. − h2. [∇u̇1]− u′Dh1 h2 = −h2,n − h1,n − h2τ . − h1τ . − h2,nn. 
∇(h1.∇u) − h1,nn. ∇(h2.∇u) u′Dh1 h2 Thanks to the jump of the potential for the first order shape derivative given in (6), it comes that h2τ . = −h2τ . ∇(h1.∇u) and h1τ . = −h1τ . ∇(h2.∇u) and then: u′′1,2 = −h2,n − h1,n − h2,nn. ∇(h1.∇u) + h1τ . ∇(h2.∇u) u′Dh1 h2 Computing the other jumps that appeared in the former expression, we get ∇(h2.∇u) = (Dh2) T [∇u] + h1τ . ∇(h2.∇u) = n.Dh2h1τ [∂nu] + h2,nh1τ . n+ h1τ . h2τ . h2,nn. ∇(h1.∇u) = h2,n [∂nu]n.Dh1n+ h2,nh1,nn. n+ h2,nn. h1τ . u′Dh1 h2 = −Dh1 h2.n [∂nu] = − h2,nn.Dh1n+ n.Dh1h2τ [∂nu] With the help of formula (43), we obtain: −h2,nn. ∇(h1.∇u) + h1τ . ∇(h2.∇u) u′Dh1 h2 ∇τh1,n.h2τ +∇τh2,n.h1τ [∂nu] − 2h1τ .Dnh2τ [∂nu] + h1τ . h2τ − h2,nh1,nn. n = − [∆τu]−H [∂nu] = −H [∂nu] , h1τ . h2τ = h1τ .D([∇u])h2τ = h1τ .D([∂nu]n)h2τ = [∂nu]h1τ .Dnh2τ . Finally, we gather the results of these computations to write u′′1,2 + h1,n ∇τh1,n.h2τ +∇τh2,n.h1τ [∂nu] h2,nh1,nH − h1τ .Dnh2τ [∂nu]) 3.4 How to recover (8) by formal differentiation of the boundary condi- tions. The aim of this section is to retrieve the expression of the flux jump [σ∂nu ′′] by computing the normal derivatives of each of the expressions [σ∇u′].n and h1,n[σ∇τu] . Since [σ∇u′].n = divτ h1,n[σ∇τu] = h1,n[σ∆τu] +∇τh1,n.[σ∇τu], then, we get [σ∇u′].n = ˙h1,n[σ∆τu] + h1,n [σ∆τu] + ˙∇τh1,n.[σ∇τu] +∇τh1,n. [σ∇τu]. (46) In order to avoid lengthy computations, we shall concentrate on each normal derivative appearing in the above formula. Some of the results are straightforward and their proof will be left to the reader. Combining propositions (3) and (5), we conclude that ˙∇τh1,n = −∇τ (h1.∇τh2,n) + (D 2h1,n.h2)τ −∇h1,n.ṅ n−∇h1,n.n ṅ. In the same manner, we also get [σ∇τu] = [σ∇τu 2] + ([σD 2u].h2)τ − [σ∇τu].ṅ n− [σ∇τu]n ṅ. Hence, we can write h1.n = h2.∇h,n −∇τh2,n.h1τ . It remains to simplify the terms A = (D2u.h2)τ .∇τh1,n and B = [σ∇τu].(D 2h1,n.h2)τ . 
We obtain: A = −[σ∇τu].(Dn∇τh1,n)h2,n + [σ∆τu]∇τh1,n.h2τ , B = (D2h2,n.h2τ ).[σ∇τu] +∇τ (∂nh1,n).[σ∇τu]h2,n − [σ∇τu].(Dn∇τh1,n)h2,n. We tackle the computation of (∂nu ′)′. For the sake of clearness, we subdivide the work in several steps. First step. We compute h1,n[σ∇τu] . We expand: h1,n[σ∇τu] h1,n[σ∆τu] + ∇τh1,n.[σ∇τu], h1,n[σ∆τu] + h1,n [σ∆τu] + ∇τh1,n.[σ∇τu] +∇τh1,n. [σ∇τu]. Hence, after substitution, one gets h1,n[σ∇τu] = divτ h1,n[σ∇τu 2] + (h2,n∂nh1,n −∇τh2,n.h1τ )[σ∇τu] + 2[σ∆τu]∇τh1,n.h2τ − ∂nh1,[σ∇τu].(Dnh2τ ) + [σ∇τu].(D 2h1,n.h2τ ) − 2h2,n[σ∇τu].(Dn∇τh1,n) + h1,n σ∆τu− [σ∆τu . (47) Second step. We compute [σ∂nu 1]. From the expression of ṅ, we get after some straightforward computations: [σ∂nu 1] = [σ∂n(u 2] + ([σD 2u′1]h2).n+ [σ∇τu 1].(Dnh2τ −∇τh2,n). (48) Third step. We compute σ∂n(u 2. From the jump condition on the flux of the derivative (6) and (47) and (48), we obtain: [σ∂(u′1) 2] = divτ h1,n[σ∇τu h2,n∂nh1,n −∇τh2,n.h1τ [σ∇τu] + 2∇τh1,n.h2τ [σ∆τu] −([σD2u′1]h2).n+ [σ∇τu ∇τh2,n −Dnh2τ − ∂nh1,n[σ∇τu].(Dn h2τ ) +(D2h1,n h2τ ).[σ∇τu]− h2,n(Dn [σ∇τu])£.∇τh1,n. Taking account of the following calculation, −([σD2u′1]h2).n+ [σ∇τu 1].∇τh2, n = − h2,n[σD 2u′1]n+ [σD 2u′1]h2τ .n+ [σ∇τu 1].∇τh2, n, = h2,n [σ∆τu 1] +H [σ∂nu + [σu′1].∇τh2,n − ([σD 2u′1]h2τ ).n, = divτ h2,n[σ∇τu +Hh2,n[σ∂nu 1]− ([σD 2u′1]h2τ ).n; it comes [σ∂n(u 2] = divτ h1,n[σ∇τu 2] + h2,n[σ∇τu h2,n∂nh1,n −∇τh2,n.h1τ [σ∇τu] +2[σ∆τu]∇τh1,n.h2τ +Hh2,n[σ∂nu 1]− ([σD 2u′1]h2τ ).n [σ∇τu 1] + ∂nh1,n[σ∇τu] .(Dnh2τ ) + (D 2h1,n h2τ ).[σ∇τu] −2h2,n∇τh1,n.(Dn [σ∇τu]) + h1,n [σ∆τu]− [σ∆τu . (49) This formula remains hard to handle. To get a more convenient one, we decide to derive tangentially to the direction h2 the boundary identity [σ∂nu 1] = h1,n[σ∆τu] +∇τh1,n.[σ∇τu]. This leads to: ([σD2u′1]h2τ ).n+(Dnh2τ ).[σ∇τu 1] = ∇τh1,n.h2τ [σ∆τu] + h1,n∇τ [σ∆τu].h2τ + (D2h1,n h2τ ).[σ∇τu]− ∂nh1,n[σ∇τu].(Dnh2τ ) + [σ∆τu]h2τ .∇τh1,n. 
From (24) and subtracting (50) from (49), we can write [σ∂n(u 2] = divτ h1,n[σ∇τu 2] + h2,n[σ∇τu h2,n∂nh1,n −∇τh2,n.h1τ [σ∇τu] +divτ h1,nh2,n(HI − 2Dn).[σ∇τu] − h1,n ∇τ [σ∆τu].h2τ +∆τ [σ∇τu].h2τ +h1,n ∇τdivτ (h2τ ) .[σ∇τu]− divτ Dh2 + (Dh2) [σ∇τu] From (24), we obtain [σ∆τu] = [σ∆τ u̇] +∇τdivτ (h2τ ) .[σ∇τu] +∇τ (Hh2,n).[σ∇τu] − divτ Dh2 + (Dh2) [σ∇τu] and using the relation between the material and shape derivative, we get [σ∆τu] = [σ∆τu ′] +∇ [σ∆τu] .h2 and [σ∆τ u̇] = [σ∆τu ′] + ∆τ [σ∇u].h2 Injecting these relations in (51) and applying them for h2τ , we get ∆τ ([σ∇τu].h2τ ) +∇τdivτ (h2τ ) .[σ∇τu] = ∇τ [σ∆τu].h2τ + divτ Dh2 + (Dh2) [σ∇τu] This last fact allows us to conclude. 3.5 Justification of the formal computations. We have to justify rigorously that the right-hand sides of (6),(7),(8) make sense. They involve tangential derivatives of un and ud along the interface ∂ω up to the order three. The existence of these derivatives is not clear a priori since the gradient of the solution has a discontinuity along this interface. Our first aim is to precise the tangential regularity along the interface ∂ω of the solution u of (1) with either Dirichlet or Neumann boundary conditions. We should access to the trace of u on the interface ∂ω. Any numerical discretization needs also to compute the state, its derivatives with respect to the shape and the normal derivatives along the interface ∂ω. To that end, we introduce for any α ∈ H1/2(∂ω) and β ∈ H−1/2(∂ω) the following boundary value problems ∆v = 0 in Ω \ ω and in ω, [v] = α on ∂ω, [σ∂nv] = β on ∂ω, v = f1 on ∂Ω. and (N) ∆v = 0 in Ω \ ω and in ω, [v] = α on ∂ω, [σ∂nv] = β on ∂ω, ∂nv = g1 on ∂Ω, where (f1, g1) ∈ H 1/2(∂Ω)×H−1/2(∂Ω). Note that for α = 0, β = 0 and (f1, g1) = (f, g) then (ud) and un solve respectively (D) and (N); furthermore the choice of hn∂nu + and β = [σ] divτ (hn∇τu) (53) leads to (6) and (7) when we take (f1, g) = (0, 0). Existence of solutions to (D) and (N). 
To study these problems, we use the integral rep- resentation in terms of layer potentials. In a first step, we recall some definitions. The Newtonian potential Γ is defined as: Γ(x, y) = ln(|x− y|) if n = 2, |x− y| if n = 3. The integral equations applying to direct problem will be obtained from a study of the classical single- and double-layer potentials. We begin to introduce the following operators S∂Ω∂ω : u 7→ S∂Ω∂ωu(x) := Γ(x, y)u(y) dσ(y); S∂ω∂Ω : u 7→ S∂ω∂Ωu(x) := Γ(x, y)u(y) dσ(y); K∂Ω∂ω : u 7→ K∂Ω∂ωu(x) := ∂nΓ(x, y)u(y) dσ(y) ; K∂ω∂Ω : u 7→ K∂ω∂Ωu(x) := ∂nΓ(x, y)u(y) dσ(y) Note that all these operators have a smooth kernel since the boundaries ∂ω and ∂Ω are assumed to have no common point. We also denote SΩ : u 7→ SΩu(x) := Γ(x, y)u(y) dσ(y); KΩ : u 7→ KΩu(x) := ∂nΓ(x, y)u(y) dσ(y); Sω : u 7→ Sωu(x) := Γ(x, y)u(y) dσ(y); Kω : u 7→ Kωu(x) := ∂nΓ(x, y)u(y) dσ(y). We now obtain some systems of integral equations to compute the state function and their shape derivatives. Since v is harmonic in Ω \ ω and for all x ∈ ∂Ω ∪ ∂ω, it has the classical boundary representation: v(x) = ∂nΓ(x, y)v(y) − ∂nΓ(x, y)v(y)− Γ(x, y)∂nv(y) + Γ(x, y)∂nv(y). (54) Similarly since v harmonic in ω, for all x ∈ ∂ω we can write v(x) = ∂nΓ(x, y)v(y) − Γ(x, y)∂nv(y). (55) Let us denote by vd the solution of the boundary values problem (D) in (52). Let us show how to compute their restrictions and also their normal vector derivatives on the boundaries. Incorpo- rating the jump conditions, a straightforward computation leads to the following boundary integral equations I + µKω σ2 + σ1 S∂Ω∂ω µK∂ω∂Ω σ2 + σ1 (v+d )|∂ω (∂nvd)|∂Ω σ1 + σ2  I −Kω −σ2K∂ω∂Ω S∂ω∂Ω  σ1 + σ2  K∂Ω∂ωf1  where µ = [σ]/(σ1 + σ2). 
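As a rough numerical illustration of the remark that the cross-boundary operators have smooth kernels, the following sketch discretizes $S_{\partial\Omega\partial\omega}$ for two concentric circles in the plane. Everything specific here is an assumption made for illustration only: the normalization $\Gamma(x,y) = \frac{1}{2\pi}\ln|x-y|$ (the constant and sign convention are not fixed by the text above), the circular geometry with radii 1 and 2, the trapezoidal quadrature, and all helper names. It is a minimal sketch, not the discretization used by the authors.

```python
import numpy as np


def newtonian_potential_2d(x, y):
    """Assumed 2D normalisation: Gamma(x, y) = ln|x - y| / (2*pi)."""
    return np.log(np.linalg.norm(x - y, axis=-1)) / (2.0 * np.pi)


def circle(radius, n):
    """Return n equispaced points on the circle of given radius centred at 0."""
    t = 2.0 * np.pi * np.arange(n) / n
    return radius * np.stack([np.cos(t), np.sin(t)], axis=1)


def cross_single_layer(targets, sources, source_radius):
    """Trapezoidal-rule matrix of u -> int Gamma(x, y) u(y) dsigma(y).

    The integral runs over the source circle; x ranges over the targets.
    Because the two curves are disjoint, the kernel is smooth and no
    singular quadrature is needed.
    """
    n = sources.shape[0]
    weight = 2.0 * np.pi * source_radius / n  # arc length per node
    kernel = newtonian_potential_2d(targets[:, None, :], sources[None, :, :])
    return weight * kernel


n = 128
inner = circle(1.0, n)  # hypothetical stand-in for the interface
outer = circle(2.0, n)  # hypothetical stand-in for the outer boundary

S_cross = cross_single_layer(inner, outer, source_radius=2.0)
singular_values = np.linalg.svd(S_cross, compute_uv=False)

# Print every fourth singular value to see how fast they fall off.
for index in range(0, 24, 4):
    print(index, singular_values[index])
```

For this particular geometry one expects the singular values to fall off geometrically with the mode number, which is the discrete counterpart of the compactness of the cross-boundary operators.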
Thanks to (55), the quantity (∂nvd) + is then given by Sω(∂nvd) I +Kω v+d (x)|∂ω − α Concerning vn, the solution of the Neumann problem (N) in (52), the same kind of computations gives  I + µKω − σ2 + σ1 K∂Ω∂ω µK∂ω∂Ω − σ2 + σ1 I +KΩ  (v+n )|∂ω (vn)|∂Ω σ1 + σ2  I −Kω −σ2K∂ω∂Ω S∂ω∂Ω  σ1 + σ2 S∂Ω∂ωg1 Finally, the computation of (∂nvn) is given by Sω(∂nvn) I +Kω v+n (x)|∂ω − α Concerning the well-posedness of (56), we can state the following result. Theorem 7 The linear system of integral equation (56) has an unique solution in H1/2(∂ω) × H−1/2(∂Ω). Proof of Theorem 7 Let A be the matricial operator defined on H1/2(∂ω)×H−1/2(∂Ω) as I + µKω σ2 + σ1 S∂Ω∂ω µK∂ω∂Ω σ1 + σ2 The main argument of the proof is based on the Fredholm alternative. In a first step, we have to show that the adjoint operator A∗ is injective. Since the boundaries are bounded, the adjoint operator A∗ defined on H−1/2(∂ω)×H1/2(∂Ω) can be written under the form I + µK∗ω µK σ2 + σ1 S∂ω∂Ω σ2 + σ1 . (59) Let (u, v) ∈ H−1/2(∂ω)×H1/2(∂Ω) be in the kernel of A∗. Consider the potential W defined for each x ∈ Rd by W (x) = σ2 + σ1 Γ(x, y)u(y) + Γ(x, y)v(y) . (60) In a first step, we show that W = 0. The function W satisfies ∆W = 0 in Rd \ (∂ω ∪ ∂Ω) by construction. We check that W |∂Ω = 0 from the equation corresponding to the second line of A By the properties of the single layer potential, [W ] = 0 on ∂ω. Furthermore, it holds [σ∂nW ] = 0 on ∂ω. Indeed, we can have ([11]) σ1 + σ2 u+K∗∂Ω∂ωv σ1 + σ2 v +K∗∂Ω∂ωv hence, σ1∂nW + − σ2∂nW − = σ1 I + µK∗ω)u+ µK ∂Ω∂ωv This corresponds to the first line of A∗(u, v). Then, W solves the Laplace equation (1) with ho- mogeneous Dirichlet boundary conditions. By the uniqueness of the solution, we get W = 0 in In a second step, we deduce that u = v = 0. Since W = 0 in Ω, we see that [∂nW ] = 0 on ∂ω. Since [∂nW ] = σ1u/(σ1 + σ2) on ∂ω , we deduce u = 0. From the second line of A ∗(u, v) = 0, we see that SΩv = 0 on ∂Ω. 
Since the single layer potential operator SΩ : H −1/2(∂Ω) 7→ H1/2(∂Ω) is an isomorphism, v = 0 holds. The injectivity of A∗ is proved. Since 2A = I + C where C is a compact operator, we conclude that A has a continuous inverse thanks to the Fredholm alternative. � In a similar way, the problem (57) is well-posed under some additional assumptions. We define the adequate space ♦ (∂Ω) = φ ∈ H1/2(∂Ω) : φ = 0 We can state the following result. Theorem 8 If we impose the normalizing condition then there exists one unique couple ((vn)|∂ω, (vn)|∂Ω) ∈ H 1/2(∂ω)×H ♦ (∂Ω) solution of (57) . Proof of Theorem 8 Set  I + µKω − σ2 + σ1 K∂Ω∂ω µK∂ω∂Ω − σ1 + σ2 I +KΩ  the operator defined on H1/2(∂ω)×H ♦ (∂Ω). The adjoint B ∗ can be written under the form I + µK∗ω µK σ1 + σ2 K∗∂ω∂Ω − σ1 + σ2 I +K∗Ω In a first step, we begin to show that B∗ is injective. Let (u, v) ∈ H1/2(∂ω) × H1/2(∂Ω) be in the kernel of B∗. We introduce the potential Z(x) = − σ1 + σ2 Γ(x, y)u(y) + Γ(x, y)v(y) , x ∈ Rd. We can see that Z is a harmonic function in Rd\(∂ω ∪ ∂Ω), satisfying ∂nZ = 0 on ∂Ω. By the properties of the single layer potential, [Z] = 0 Furthermore, a straightforward calculation shows that [σ∂nZ] = 0 on ∂ω. Hence, Z solves the boundary value problem −div (σ∇Z) = 0 in Ω, ∂nZ = 0 on ∂Ω. The function is therefore constant in Ω. Writing [∂nZ] = 0 on ∂ω, we get easily u = 0 and then +K∗Ω)v = 0. Since the operator λI −K Ω is one to one on H ♦ (∂Ω), we deduce that v = 0. We conclude the proof thanks to the Fredholm alternative. � Tangential regularity results. Let us consider now the particular case where both α and β are the zero function and (f1, g1) = (f, g) where f and g are respectively the Dirichlet and Neumann boundary data. To recover the tangential regularity of the solution u along ∂ω, we look at the first line of (56) to deduce that I + µKω (ud)|∂ω = − σ2 + σ1 S∂Ω∂ω∂nud|∂Ω + σ1 + σ2 K∂Ω∂ωf ; (63) Sω(∂nud) I +Kω u+d (x)|∂ω (64) It is easy to deduce that (ud)|∂ω ∈ C 3,α(∂ω). 
Indeed, from (63), which we consider as an equation in $(u_d)|_{\partial\omega}$ with data $f$ and $(\partial_n u_d)|_{\partial\Omega} = g$, we see that $(f, (\partial_n v_d)|_{\partial\Omega})$ belongs to $H^{1/2}(\partial\Omega) \times H^{-1/2}(\partial\Omega)$, thanks to Theorem 7. In order to give a sense to the jump conditions arising in (6), (7), (8), we need to work in spaces of functions of higher regularity. We choose the framework of Hölder spaces and quote [12] to make precise the behavior of the layer potentials on these spaces.

Theorem 9 (Kirsch [12])
1. If $\partial\omega$ is of class $C^{2,\alpha}$, $0 < \alpha < 1$, then the operators $S_\omega$ and $K_\omega$ map $C^{\beta}(\partial\omega)$ continuously into $C^{1,\beta}(\partial\omega)$ for all $0 < \beta \le \alpha$.
2. Let $k \in \mathbb{N}$ with $k \neq 0$. If $\partial\omega$ is of class $C^{k+1,\alpha}$ with $0 < \alpha < 1$, then the operators $S_\omega$ and $K_\omega$ map $C^{k,\beta}(\partial\omega)$ continuously into $C^{k+1,\beta}(\partial\omega)$ for all $0 < \beta \le \alpha$.
3. Let $k$ be an integer. If $\partial\omega$ is of class $C^{k+2,\alpha}$, then $K^*_\omega$ maps $C^{k,\beta}(\partial\omega)$ continuously into $C^{k+1,\beta}(\partial\omega)$ for all $0 < \beta \le \alpha$.

We go back to the proof. Since the two boundaries do not intersect and since $\partial\omega$ is of class $C^{4,\alpha}$, the right-hand side of (63) is of class $C^{3,\alpha}(\partial\omega)$. We then conclude that the solution of (63) is of class $C^{3,\alpha}$, since the operator $\frac{1}{2} I + \mu K_\omega$ is an isomorphism from $C^{3,\alpha}(\partial\omega)$ onto itself. With the same arguments, one shows that $(\partial_n u_n) \in C^{2,\alpha}$.

About the regularity of the jumps of the second derivative. The equations giving the jump conditions for $[u'_d]$ and $[\partial_n u'_d]$ show that these jumps belong respectively to $C^{2,\alpha}(\partial\omega)$ and $C^{1,\alpha}(\partial\omega)$. Hence $[u''_d] \in C^{1,\alpha}$. With the same arguments, we show that $[\partial_n u''_d] \in C^{0,\alpha}$ (see [20] for more details), so that all the formal computations leading to the equations describing the second derivative make sense.

Remark 1 In view of a numerical discretization of the state equation, we emphasize that a finite element method seems inappropriate: one has to extract high-order tangential derivatives on the interface $\partial\omega$.
The obtained numerical accuracy is not sufficient to incorporate the results in an optimization scheme. On the converse, the systems of boundary integral equations (56) and (57) are well-suited for this kind of computation. Nevertheless, a discussion of adapted schemes should be precise and is out of the scope of this manuscript. 3.6 Case of Neumann boundary conditions. Since the admissible deformation fields have a support with no intersection points with the outer boundary, it is a straightforward application of the preceding computations to show that un solution to (1)-(4) is twice differentiable with respect to the shape. Furthermore, its second order derivative u′′n belongs to H 1(Ω \ ω) ∪H1(ω) and solves   ∆u′′n = 0 in ω \ ω and in ω,[ h1,nh2,nH − h1τ .Dnh2τ [∂nun]− h1,n[∂n(un) 2] + h2,n[∂n(un) h1τ .∇h2,n + h2τ .∇h1,n [∂nun] on ∂ω,[ = divτ σ∇τ (un) + h1,n σ∇τ (un) + h1τ .Dn.h2τ )[σ∇τun] −divτ (h1τ .∇τh2,n +∇τh1,n.h2τ + h2,nh1,n(2Dn−HI)) [σ∇τun] on ∂ω, n = 0 on ∂Ω; where we use the notations of Theorem 2. 4 Second order derivatives for the criterion. 4.1 Proof of Theorem 4. The differentiability of the objective is a direct application of Theorem 2. The computation we make here is based on the relation D2JKV (ω)(h1,h2) = D DJKV (w)h1 h2 −DJKV (w)Dh1h2. (66) To obtain (10), we compute in a first step the shape gradient in the direction h1. Then, in a second step, we differentiate the obtained expression in the direction of h2. In the sequel, we adopt the notation v = ud − un to obtain concise expressions. DJKV (ω)h1 = σ1 |∇v|2h1 + 2∇v.∇v′1 + σ2 |∇v|2h1 + 2∇v.∇v′1 = σ1(A1 + 2B1) + σ2(A2 + 2B2), where |∇v|2h1 ∇v.∇v′1 |∇v|2h1 ∇v.∇v′1. Now, we use the classical formulae to differentiate a domain integral to get DA1(ω)h2 = |∇v|2h1 + 2div ∇v.∇v′2 h1 |∇v+|2h1 h2,n + 2∇v +.∇(v′2) + h1,n; DA2(ω)h2 = |∇v−|2h1 h2,n + 2∇v −.∇(v′2) − h1,n. The terms DBi, i = 1, 2 require more precisions. 
First, we write DB1(ω)h2 = ∇v.∇v′1)h2 +∇v′1.∇v 2 +∇v.∇(v ∇v+.∇(v′1) +h2,n + ∂nv +((v′1) +)′2 + +(v′2) + + ∂n(v +(v′1) ∂nv((un) ∂n(ud) 1(un) 2 + ∂n(ud) 2(un) Note that we used the Green formula twice to keep the symmetry in h1 and h2. We also use the fact that the derivatives (ud) i are harmonic in Ω \ ω to transform the boundary integral on the exterior boundary into an integral on the moving boundary. We obtain DB1(ω)h2 = − ∇v+.∇(v′1) +h2,n + ∂nv +(((ud) + − v∂n(((un) +((ud) + + ∂n(v +((ud) + − ∂n(un) +(v′2) + − ∂n((un) +(v′1) By the same methods, we get DB2(ω)h2 = ∇v−.∇(v′1) −h2,n + ∂nv −((v′1) −(v′2) − + ∂n(v −(v′1) We regroup the different terms and after some straightforward computations, we obtain: DJKV (ω)h1 (ω)h2 = − σ|∇v|2h1 h1,n∇v 2 + h2,n∇v 1 + (ud) 2 − ∂n(un) 1 − ∂n(un) σ∂n((un) − σ1∂nv ((ud) In order to compute D2JKV (ω)(h1,h2), the first order derivative of the Kohn-Vogelius objective is needed. It can be written as follows: DJKV (w)h = − σ|∇v|2 hn + 2 σ∂n(un) − σ1∂nv Gathering (66),(41) and (42), we write the second derivative of the Kohn-Vogelius criterion as: D2JKV (ω)(h1,h2) = − σ|∇v|2h1 σ|∇v|2 (Dh1h2).n 1 + (ud) 2 − ∂n(un) 1 − ∂n(un) h1,n∇v 2 + h2,n∇v σ∂n(un) − σ1∂nv Let us give a more simplified version for the first term. We decompose the field h2 into normal vector and tangential parts and we use (43). After some elementary computations, we obtain σ|∇v|2h1 σ|∇v|2 (Dh1h2).n σ|∇v|2 h1τ .∇τh2,n + h2τ .∇τh1,n − h2τ .Dnh1τ σ|∇v|2 h1,nh2,n. Finally, the second order derivative of the Kohn-Vogelius objective becomes: D2JKV (ω)(h1,h2) = σ|∇v|2 h1τ .∇τh2,n + h2τ .∇τh1,n − h2τ .Dnh1τ σ|∇v|2 h1,nh2,n + 2 h1,n∇v 2 + h2,n∇v 1 + (ud) 2 − ∂n(un) 1 − ∂n(un) σ∂n(un) − σ1∂nv 4.2 Analysis of stability. Proof of Theorem 5 Now, we specify the domain ω that is assumed to be a critical shape for JKV . Moreover, we assume that the additional condition ud = un holds. To emphasize that we deal with such a special domain, we will denote it ω∗. 
The assumptions mean that the measurements are compatible and that ω∗ is a global minimum of the criterion. From the necessary condition of order two at a minimum, the shape Hessian is positive at such a point. Let us notice that only the normal component of h appears. Let us also emphasize that there is no hope to get h = 0 from the structure theorem for second order shape derivative ([6]). The deformation field h appears in D2JKV (ω ∗)(h,h) only thought its normal component hn since ω a critical point for JKV . This remark explains why we consider in the statement of Theorem 5 the scalar Sobolev space corresponding to the normal components of the deformation field. We now prove Theorem 5. From (67), we deduce DJ2KV (ω ∗)[h, h] = −2 u′d∂nv ′ − ∂nu = 2 [σ] (u′+d − u n )divτ (hn∇τud)− d hn∂n(u d − u = 2 [σ] u′+d − u n , divτ (hn∇τud) ∂nudhn, ∂n u′d − u where 〈, 〉 denotes the duality between H1/2(∂ω∗)×H−1/2(∂ω∗) . Let us introduce the operators T1 : H 1/2(∂ω∗) → H−1/2(∂ω∗) M1 : H 1/2(∂ω∗) → H1/2(∂ω∗) h 7→ divτ (hn∇τud) h 7→ u d − u T2 : H 1/2(∂ω∗) → H1/2(∂ω∗) M2 : H 1/2(∂ω∗) → H−1/2(∂ω∗) h 7→ hn∂nu d h 7→ ∂n u′+d − u The Hessian can then be written under the form : D2JKV (ω ∗)(h,h) = 2 [σ] M1(h), T1(h) T2(h),M2(h) From the classical results of Maz’ya and Shaposhnikova on multipliers ([14], [22]), we get easily that T1 and T2 are continuous operators. In fact, the compactness of the Hessian is a consequence of the fact that both operators M1 and M2 are compact. We use a regularity argument : we remark that M1 is the composition of the operators: R1 : H 1/2(∂ω∗) → H ⋄ (∂Ω) and R2 : H ⋄ (∂Ω) → H 1/2(∂ω∗) h 7→ −u′n φ 7→ ψ where ψ is the trace on ∂ω∗ of Ψ solution of −∆Ψ = 0 in Ω \ ω∗ and in ω∗, , [Ψ] = 0 on ∂ω∗, [σ∂nΨ] = 0 on ∂ω Ψ = φ on ∂Ω. While R1 is a continuous operator, we prove that R2 is compact. Let us express u|∂ω∗ = ψ. 
We use the integral formula of u to obtain: I + µKω∗ σ2 + σ1 S∂Ω∂ω∗ µK∂ω∗∂Ω σ2 + σ1 (u)|∂ω∗ (∂nu)|∂Ω σ1 + σ2  K∂Ω∂ω∗φ  The matricial operator arising in this equation appeared also in (56). It has a continuous inverse thanks to Theorem 7. Let us express u|∂ω∗ = ψ: I + µKω∗)− µS∂Ω∂ω∗S Ω K∂ω∗∂Ω σ1 + σ2 K∂Ω∂ω∗ − S∂Ω∂ω∗S I −KΩ) φ. (69) Since the operators K∂Ω∂ω∗ and S∂Ω∂ω∗ are compact, the operator R2 is compact, hence M1 is compact. The proof of compactness of M2 is similar. Let us mention that a similar strategy of proof can be found in [5]. The natural question is then to quantify how is this optimization problem ill-posed. This question is directly in related to the rate at which the singular values of the Hessian operator are decreasing. Equation (69) shows that this rate is the one of the operators K∂Ω∂ω∗ and S∂Ω∂ω∗ . Now, since for every u ∈ H1/2(∂Ω), the functions K∂Ω∂ω∗u and S∂Ω∂ω∗u are harmonic outside of ∂Ω and therefore in Ω, their restrictions on ∂ω∗ are as smooth as ∂ω∗. We conclude that if ∂ω∗ is C∞ then the restriction belongs to each Hs(∂ω∗) for s > 1/2 then that if λn denotes the n th eigenvalue of D2JKV (ω ∗), then λn = o(n −s) for all s > 0. References [1] L Afraites, and M. Dambrine and D. Kateb. Conformal mappings and shape derivatives for the transmission problem with a single measurement. Preprint HAL 2006 to appear in Numerical Functional Analysis and Optimization. [2] L Afraites, and M. Dambrine, and K. Eppler and D. Kateb. Detecting perfectly insulated obstacles by shape optimization techniques of order two. Preprint HAL 2006. [3] K. Astala, and L. Pävärinta. Calderón’s inverse conductivity problem in the plane. Ann. of Math., (163), 265-299. [4] M. Delfour, and J.-P. Zolesio. Shapes and Geometries: Analysis, Differential Calculus, and Optimization SIAM, (2001). [5] K. Eppler, and H. Harbrecht. A regularized Newton method in electrical impedance tomography using Hessian information, Control and Cybernetics (34), 203-225. [6] A. 
Henrot and M. Pierre. Variation et optimisation de formes. Springer, Mathématiques et Applications, volume 48 (2005).
[7] F. Hettlich and W. Rundell. The determination of a discontinuity in a conductivity from a single boundary measurement, Inverse Problems 14 (1998), 67-82.
[8] F. Hettlich and W. Rundell. A second degree method for nonlinear inverse problems, SIAM J. Numer. Anal. 37, no. 2 (1999), 587-620.
[9] B. Hofmann. Approximation of the inverse electrical impedance tomography problem by an inverse transmission problem, Inverse Problems 14 (1998), 1171-1187.
[10] K. Ito, K. Kunisch and Z. Li. Level-set function approach to an inverse interface problem, Inverse Problems 17 (2001), 1225-1242.
[11] R. Kress. Linear Integral Equations. Springer-Verlag, Applied Mathematical Sciences 82.
[12] A. Kirsch. Surface gradients and continuity properties for some integral operators in classical scattering theory, Mathematical Methods in the Applied Sciences 11 (1989), 789-804.
[13] A. Kirsch. The domain derivative and two applications in inverse scattering theory, Inverse Problems 9 (1993), 81-96.
[14] V.G. Maz'ya and T.O. Shaposhnikova. Theory of multipliers in spaces of differentiable functions, Pitman, Boston, Monographs and Studies in Mathematics 23 (1985).
[15] A.I. Nachman. Reconstruction from boundary measurements, Ann. of Math. 128 (1988), 531-576.
[16] A.I. Nachman. Global uniqueness for a two dimensional inverse boundary value problem, Ann. of Math. 143 (1996), 71-96.
[17] R.G. Novikov. A multidimensional inverse spectral problem for the equation ∆ψ + (v(x) − Eu(x))ψ = 0, Funktsional. Anal. i Prilozhen. 22 (1988), no. 4, 11-22, 96; translation in Funct. Anal. Appl. 22, 263-272.
[18] O. Pantz. Sensibilité de l'équation de la chaleur aux sauts de conductivité, C.R. Acad. Sci. Paris, Ser. I 341-5 (2005), 333-337.
[19] J. Simon. Second variation for domain optimization problems, in Control and Estimation of Distributed Parameter Systems, F. Kappel, K. Kunisch and W. Schappacher eds., International Series of Numerical Mathematics 91, Birkhäuser, 361-378.
[20] J. Sokolowski and J.-P. Zolesio. Introduction to Shape Optimization: Shape Sensitivity Analysis, Springer-Verlag (1992).
[21] J. Sylvester and G. Uhlmann. A global uniqueness theorem for an inverse boundary value problem, Ann. of Math. 125 (1987), 153-169.
[22] H. Triebel. Theory of Function Spaces, Birkhäuser (1983).

ABSTRACT This paper is devoted to the analysis of a second order method for recovering the \emph{a priori} unknown shape of an inclusion $\omega$ inside a body $\Omega$ from boundary measurement. This inverse problem, known as electrical impedance tomography, has many important practical applications and has therefore attracted much attention in recent years. However, to the best of our knowledge, no work has yet considered a second order approach for this problem. This paper aims to fill that void: we investigate the existence of the second order derivative of the state $u$ with respect to perturbations of the shape of the interface $\partial\omega$, then we choose a cost function in order to recover the geometry of $\partial \omega$ and derive the expression of the derivatives needed to implement the corresponding Newton method. We then investigate the stability of the process and explain why this inverse problem is severely ill-posed by proving the compactness of the Hessian at the global minimizer.
<|endoftext|><|startoftext|> Braiding transformation, entanglement swapping and Berry phase in entanglement space

Jing-Ling Chen,1,∗ Kang Xue,2 and Mo-Lin Ge1,†
1 Liuhui Center for Applied Mathematics and Theoretical Physics Division, Chern Institute of Mathematics, Nankai University, Tianjin 300071, People’s Republic of China
2 Department of Physics, Northeast Normal University, Changchun, Jilin 130024, People’s Republic of China

We show that braiding transformation is a natural approach to describe quantum entanglement, by using the unitary braiding operators to realize entanglement swapping and generate the GHZ states as well as the linear cluster states. A Hamiltonian is constructed from the unitary $\check{R}_{i,i+1}(\theta, \varphi)$-matrix, where $\varphi = \omega t$ is time-dependent while $\theta$ is time-independent. This in turn allows us to investigate the Berry phase in the entanglement space.

PACS numbers: 03.67.Mn, 02.40.-k, 03.65.Vf

I. INTRODUCTION

Quantum entanglement is the most surprising nonclassical property of composite quantum systems, which Schrödinger singled out many decades ago as “the characteristic trait of quantum mechanics”. Recently entanglement has become one of the most fascinating topics in quantum information, because it has been shown that entangled pairs are more powerful resources than separable ones in a number of applications, such as quantum cryptography [1], dense coding, teleportation [2] and the investigation of quantum channels, communication protocols and computation [3]. For instance, by using a maximally entangled state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|\uparrow\uparrow\rangle + |\downarrow\downarrow\rangle)$ (i.e., one of the Bell states, also called the Einstein-Podolsky-Rosen (EPR) channel in [2]), Bennett et al. have shown that a one-qubit state $a|\uparrow\rangle + b|\downarrow\rangle$ can be faithfully transmitted from one location (Alice) to another (Bob) by sending two bits of classical information.
For a two-qubit system, a “magic basis” consisting of the four Bell states has been defined [4]:
$$|\Phi^\pm\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\uparrow\rangle \pm |\downarrow\downarrow\rangle\right), \qquad |\Psi^\pm\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\downarrow\rangle \pm |\downarrow\uparrow\rangle\right), \tag{1}$$
where spin-1/2 notation has been used for definiteness. Any pure state of two qubits can be expanded in this particular basis, and its degree of entanglement can be expressed in a remarkably simple way [4]. It is possible to study these Bell states from the point of view of transformation theory. The fact that they are all normalized and mutually orthogonal naturally indicates that the four Bell states are connected to the standard basis $\{|\uparrow\uparrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\downarrow\downarrow\rangle\}$ by a unitary transformation
$$U = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}. \tag{2}$$
∗ Electronic address: chenjl@nankai.edu.cn
† Electronic address: geml@nankai.edu.cn

More precisely, let $|\uparrow\rangle = (1, 0)^T$ and $|\downarrow\rangle = (0, 1)^T$, with $|\uparrow\uparrow\rangle$ understood as $|\uparrow\rangle\otimes|\uparrow\rangle$. The standard basis then has the matrix forms $|\uparrow\uparrow\rangle = (1, 0, 0, 0)^T$, $|\uparrow\downarrow\rangle = (0, 1, 0, 0)^T$, $|\downarrow\uparrow\rangle = (0, 0, 1, 0)^T$, $|\downarrow\downarrow\rangle = (0, 0, 0, 1)^T$. Acting with the unitary matrix $U$ on the standard basis produces the four Bell states: $U|\uparrow\uparrow\rangle = \frac{1}{\sqrt{2}}(1, 0, 0, -1)^T = |\Phi^-\rangle$, $U|\uparrow\downarrow\rangle = \frac{1}{\sqrt{2}}(0, 1, -1, 0)^T = |\Psi^-\rangle$, $U|\downarrow\uparrow\rangle = \frac{1}{\sqrt{2}}(0, 1, 1, 0)^T = |\Psi^+\rangle$, $U|\downarrow\downarrow\rangle = \frac{1}{\sqrt{2}}(1, 0, 0, 1)^T = |\Phi^+\rangle$. In short, one obtains $U(|\uparrow\uparrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\downarrow\downarrow\rangle) = (|\Phi^-\rangle, |\Psi^-\rangle, |\Psi^+\rangle, |\Phi^+\rangle)$.

During the investigation of the relationships among quantum entanglement, topological entanglement and quantum computation, Kauffman et al. discovered the very significant result that the matrix $U$ is nothing but a braiding operator, and furthermore that it can be identified with a universal quantum gate (i.e., the CNOT gate) [5][6]. There is also an earlier literature on topological quantum computation, which is concerned with quantum computing using braiding [7].
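The action of $U$ on the standard basis described above can be checked with a few lines of standard-library Python. This is only an illustrative sketch: the helper names (`column`, `is_orthogonal`, `image_names`) and the dictionary labels are hypothetical and not part of the paper.

```python
import math

S = 1 / math.sqrt(2)

# Bell matrix U of Eq. (2); rows and columns are ordered as
# |up,up>, |up,down>, |down,up>, |down,down>.
U = [[ S, 0, 0, S],
     [ 0, S, S, 0],
     [ 0, -S, S, 0],
     [-S, 0, 0, S]]

# The four Bell states of Eq. (1) as coefficient vectors in the same ordering.
BELL = {
    "Phi+": [S, 0, 0, S],
    "Phi-": [S, 0, 0, -S],
    "Psi+": [0, S, S, 0],
    "Psi-": [0, S, -S, 0],
}

def column(m, j):
    """Return column j of m, i.e. the image of the j-th basis vector."""
    return [row[j] for row in m]

def is_orthogonal(m, tol=1e-12):
    """A real matrix is unitary exactly when its columns are orthonormal."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            overlap = sum(a * b for a, b in zip(column(m, i), column(m, j)))
            if abs(overlap - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

def image_names(m, tol=1e-12):
    """Name the Bell state onto which each standard basis vector is mapped."""
    names = []
    for j in range(4):
        col = column(m, j)
        for name, vec in BELL.items():
            if all(abs(a - b) < tol for a, b in zip(col, vec)):
                names.append(name)
    return names

print(is_orthogonal(U), image_names(U))
```

The order of the returned names reproduces the statement that the standard basis is mapped to $(|\Phi^-\rangle, |\Psi^-\rangle, |\Psi^+\rangle, |\Phi^+\rangle)$.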
These works introduce the braiding operators and the Yang–Baxter equations to the field of quantum information and quantum computation, and also provide a novel way to study quantum entanglement.

Our aim in this work is twofold: one is to show that braiding transformation is a natural approach to describing quantum entanglement, the other is to investigate the Berry phase in the entanglement space (or the Bloch space). The paper is organized as follows. In Sec. II, we briefly review the unitary braiding operators and apply them to realize entanglement swapping and to generate the Greenberger-Horne-Zeilinger (GHZ) states as well as the linear cluster states. In Sec. III, after briefly reviewing the Yang–Baxterization approach, we construct a Hamiltonian from the unitary $\check{R}_{i,i+1}(\theta,\varphi)$-matrix, where $\varphi$ is time-dependent while $\theta$ is time-independent. This in turn allows us to investigate the Berry phase in the entanglement space. Conclusions and discussion are given in the last section.

http://arxiv.org/abs/0704.0709v3

II. BRAIDING TRANSFORMATION AND ITS APPLICATIONS

Hereafter, for convenience, we denote the spin up $|\uparrow\rangle$ and spin down $|\downarrow\rangle$ states as $|0\rangle$ and $|1\rangle$, respectively. Braiding operators are generalizations of the usual permutation operators. For $N$ spin-1/2 particles, the permutation operator for the particles $i$ and $i+1$ reads
$$P_{i,i+1} = \frac{1}{2}\left(1 + \vec{\sigma}_i\cdot\vec{\sigma}_{i+1}\right) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{3}$$
Here $P_{i,i+1}$ is understood as $1_1 \otimes 1_2 \otimes \cdots \otimes 1_{i-1} \otimes (1 + \vec{\sigma}_i\cdot\vec{\sigma}_{i+1})/2 \otimes 1_{i+2} \otimes \cdots \otimes 1_N$, where $1$ is the $2\times 2$ unit matrix. The permutation operator $P_{i,i+1}$ maps the spin state $|k\rangle_i \otimes |l\rangle_{i+1}$ to $|l\rangle_i \otimes |k\rangle_{i+1}$.

The braiding operators satisfy the following braid relations:
$$b_{i,i+1}\,b_{i+1,i+2}\,b_{i,i+1} = b_{i+1,i+2}\,b_{i,i+1}\,b_{i+1,i+2}, \quad i \le N-2, \qquad b_{i,i+1}\,b_{j,j+1} = b_{j,j+1}\,b_{i,i+1}, \quad |i-j| \ge 2. \tag{4}$$
The usual permutation operator $P_{i,i+1}$ is a solution of Eq. (4) with the constraint $P_{i,i+1}^2 = 1$.
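As a quick numerical illustration of Eqs. (3) and (4), the following standard-library sketch builds $(1+\vec\sigma_i\cdot\vec\sigma_{i+1})/2$ from the Pauli matrices, confirms that it is the swap matrix, and tests the first braid relation on three qubits for both the permutation operator and the Bell matrix $U$ of Eq. (2). The helper names (`kron`, `matmul`, `close`, `satisfies_braid_relation`) are hypothetical.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
PAULI = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])

def permutation_from_spins():
    """Build (1 + sigma_i . sigma_{i+1}) / 2, the left-hand form of Eq. (3)."""
    total = kron(I2, I2)
    for s in PAULI:
        ss = kron(s, s)
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, ss)]
    return [[x / 2 for x in row] for row in total]

SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
S = 1 / math.sqrt(2)
U = [[S, 0, 0, S], [0, S, S, 0], [0, -S, S, 0], [-S, 0, 0, S]]

def satisfies_braid_relation(b):
    """Check b12 b23 b12 == b23 b12 b23 on three qubits (first line of Eq. (4))."""
    b12, b23 = kron(b, I2), kron(I2, b)
    lhs = matmul(b12, matmul(b23, b12))
    rhs = matmul(b23, matmul(b12, b23))
    return close(lhs, rhs)

print(close(permutation_from_spins(), SWAP),
      satisfies_braid_relation(SWAP), satisfies_braid_relation(U))
```

With only three qubits the second relation of Eq. (4), which needs $|i-j|\ge 2$, has no instance, so it is not tested here.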
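Physics favors unitary transformations. One may observe that both $U$ and $P_{i,i+1}$ are unitary. Two more general unitary braiding transformations satisfying the braid relations are
$$B_{i,i+1} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & 0 & e^{-i\varphi} \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ -e^{i\varphi} & 0 & 0 & 1 \end{pmatrix}, \tag{5}$$
$$\mathcal{P}_{i,i+1} = \begin{pmatrix} e^{i\xi_{00}} & 0 & 0 & 0 \\ 0 & 0 & e^{i\xi_{10}} & 0 \\ 0 & e^{i\xi_{01}} & 0 & 0 \\ 0 & 0 & 0 & e^{i\xi_{11}} \end{pmatrix}, \tag{6}$$
which allow additional phase factors. The braiding operators $B_{i,i+1}$ and $\mathcal{P}_{i,i+1}$ transform the direct-product states $|kl\rangle \equiv |k\rangle_i \otimes |l\rangle_{i+1}$ in the following way:
$$B_{i,i+1}\begin{pmatrix} |00\rangle \\ |01\rangle \\ |10\rangle \\ |11\rangle \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} |00\rangle - e^{i\varphi}|11\rangle \\ |01\rangle - |10\rangle \\ |01\rangle + |10\rangle \\ e^{-i\varphi}|00\rangle + |11\rangle \end{pmatrix}, \tag{7}$$
$$\mathcal{P}_{i,i+1}\begin{pmatrix} |00\rangle \\ |01\rangle \\ |10\rangle \\ |11\rangle \end{pmatrix} = \begin{pmatrix} e^{i\xi_{00}}|00\rangle \\ e^{i\xi_{10}}|10\rangle \\ e^{i\xi_{01}}|01\rangle \\ e^{i\xi_{11}}|11\rangle \end{pmatrix}. \tag{8}$$

FIG. 1: Realizing ES by braiding transformations. After acting with $B_{34}B_{12}$ on a separable state $|0000\rangle_{ABCD}$, one prepares a state $|\psi\rangle_{ABCD} = |\Phi^-\rangle_{AB} \otimes |\Phi^-\rangle_{CD}$ needed for quantum entanglement swapping. After performing the successive braiding transformations $B_{23}B_{34}B_{12}B_{23}$ on $|\psi\rangle_{ABCD}$, the entanglement involved in the state $|\psi\rangle_{ABCD}$ is swapped to the state $|\psi'\rangle_{ABCD} = -|\Phi^-\rangle_{AD} \otimes |\Phi^+\rangle_{BC}$.

These operators may generate entangled states from disentangled ones: (i) The braiding matrix $B_{i,i+1}$ yields directly the four Bell states $|\Phi^\pm\rangle$ and $|\Psi^\pm\rangle$ with the relative phase factor $e^{-i\varphi}$. The phase factor $e^{-i\varphi}$ originates from the $q$-deformation of the braiding operator $U$ with $q = e^{-i\varphi}$ [8][9], and $\varphi$ may have the physical significance of a magnetic flux [10]. In the next section, we shall vary the parameter $\varphi$ adiabatically to obtain the Berry phase in the entanglement space. (ii) When $\mathcal{P}_{i,i+1}$ acts on an initial separable state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)_i \otimes \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)_{i+1}$, it produces an entangled state $(e^{i\xi_{00}}|00\rangle + e^{i\xi_{01}}|01\rangle + e^{i\xi_{10}}|10\rangle + e^{i\xi_{11}}|11\rangle)/2$ whose degree of entanglement equals $|e^{i(\xi_{00}+\xi_{11})} - e^{i(\xi_{01}+\xi_{10})}|/2$. Thus braiding operators are indeed a very natural way to describe and to generate quantum entanglement.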
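The degree of entanglement quoted in item (ii) can be compared with the standard pure-state concurrence $2|ad-bc|$ of a two-qubit state $a|00\rangle+b|01\rangle+c|10\rangle+d|11\rangle$. The sketch below is a hedged illustration: it assumes the convention that the phased permutation sends $|kl\rangle$ to $e^{i\xi_{kl}}|lk\rangle$ (the result does not depend on how the two middle phases are labelled), and all function names are hypothetical.

```python
import cmath
import math

def phased_permutation(xi00, xi01, xi10, xi11):
    """Phased permutation of Eq. (6); assumed convention: |kl> -> exp(i xi_kl) |lk>."""
    e = lambda x: cmath.exp(1j * x)
    return [[e(xi00), 0, 0, 0],
            [0, 0, e(xi10), 0],
            [0, e(xi01), 0, 0],
            [0, 0, 0, e(xi11)]]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def concurrence(v):
    """Concurrence 2|ad - bc| of a pure two-qubit state with coefficients (a, b, c, d)."""
    a, b, c, d = v
    return 2 * abs(a * d - b * c)

def entangled_by_permutation(xi00, xi01, xi10, xi11):
    """Act on the separable state (|0>+|1>)(|0>+|1>)/2."""
    plus_plus = [0.5, 0.5, 0.5, 0.5]
    return apply(phased_permutation(xi00, xi01, xi10, xi11), plus_plus)

def predicted(xi00, xi01, xi10, xi11):
    """Closed form quoted in the text."""
    return abs(cmath.exp(1j * (xi00 + xi11)) - cmath.exp(1j * (xi01 + xi10))) / 2

xi = (0.3, 1.1, -0.7, 2.0)  # arbitrary illustrative phases
print(concurrence(entangled_by_permutation(*xi)), predicted(*xi))
```

With all phases equal to zero the operator reduces to the plain permutation of Eq. (3), which cannot entangle a product state; the choice $\xi_{00}=0$, $\xi_{01}=\xi_{10}=\xi_{11}=\pi$ gives a maximally entangled state.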
To strengthen this viewpoint, we provide two explicit examples of applications of braiding transformations.

Example 1: Entanglement swapping. Entanglement swapping (ES) is a very interesting quantum mechanical phenomenon, which was originally proposed by Żukowski et al. [11], generalized to multipartite quantum systems by Zeilinger et al. [12] and Bose et al. [13] independently, and experimentally realized by Pan et al. [14]. The original ES is based on quantum measurement: suppose Alice and Bob share an entangled state, and Claire and Danny likewise share an entangled state. If Bob and Claire come together, make a measurement in a suitable basis and communicate their measurement results classically, then Alice’s and Danny’s particles may become entangled. We now use the braiding transformations to realize ES. Starting from a separable state $|0000\rangle_{ABCD} \equiv |0000\rangle_{1234}$, we prepare the state $|\psi\rangle_{ABCD}$ needed for quantum entanglement swapping by means of the braiding transformations $B_{12}$ and $B_{34}$:
$$|\psi\rangle_{ABCD} = B_{34}B_{12}|0000\rangle_{ABCD} = |\Phi^-\rangle_{AB} \otimes |\Phi^-\rangle_{CD} = \frac{1}{2}(|00\rangle - |11\rangle)_{AB} \otimes (|00\rangle - |11\rangle)_{CD}, \tag{9}$$
where for simplicity we have set $\varphi = 0$, and $|\Phi^\pm\rangle$ are the usual Bell states. One may verify that
$$|\psi'\rangle_{ABCD} = B_{23}B_{34}B_{12}B_{23}|\psi\rangle_{ABCD} = -|\Phi^-\rangle_{AD} \otimes |\Phi^+\rangle_{BC} = \frac{1}{2}(-|00\rangle + |11\rangle)_{AD} \otimes (|00\rangle + |11\rangle)_{BC}. \tag{10}$$
In other words, after the successive braiding transformations $B_{23}B_{34}B_{12}B_{23}$, the entanglement involved in the state $|\psi\rangle_{ABCD}$ is swapped to $|\psi'\rangle_{ABCD}$, and we have therefore realized ES (see Fig. 1). The difference between the original ES scenario and ours is that the former is based on quantum measurement, while the latter is based on unitary braiding transformations without quantum measurement. It is worth mentioning that the approach of realizing ES by braiding transformations is not unique. For instance, ES can be done even more simply by using only the two permutations $P_{34}P_{23}$ acting on the state $|\psi\rangle_{ABCD}$.
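Eqs. (9) and (10) can be reproduced by applying the $\varphi = 0$ braiding matrix to neighbouring qubits of a four-qubit state. The following is a minimal sketch, assuming a sparse dictionary representation `{bitstring: amplitude}` with qubits ordered A, B, C, D; the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
# Braiding matrix of Eq. (5) at phi = 0, basis order |00>, |01>, |10>, |11>.
B = [[S, 0, 0, S], [0, S, S, 0], [0, -S, S, 0], [-S, 0, 0, S]]

def apply_gate(gate, state, i):
    """Apply a 4x4 gate to neighbouring qubits (i, i+1), counted from 0."""
    out = {}
    for bits, amp in state.items():
        col = int(bits[i:i + 2], 2)
        for row in range(4):
            if gate[row][col] != 0:
                new_bits = bits[:i] + format(row, "02b") + bits[i + 2:]
                out[new_bits] = out.get(new_bits, 0) + gate[row][col] * amp
    return out

def states_close(a, b, tol=1e-12):
    return all(abs(a.get(k, 0) - b.get(k, 0)) < tol for k in set(a) | set(b))

def prepare_pairs():
    """|psi> = B34 B12 |0000>, Eq. (9)."""
    state = {"0000": 1.0}
    state = apply_gate(B, state, 0)  # B12
    state = apply_gate(B, state, 2)  # B34
    return state

def swap_entanglement(state):
    """Apply B23 B34 B12 B23; the rightmost factor acts first."""
    for i in (1, 0, 2, 1):
        state = apply_gate(B, state, i)
    return state

psi = prepare_pairs()
print(sorted(swap_entanglement(psi).items()))
```

In this representation the target state of Eq. (10), $\frac12(-|00\rangle+|11\rangle)_{AD}\otimes(|00\rangle+|11\rangle)_{BC}$, has amplitude $-\frac12$ on `0000` and `0110` and $+\frac12$ on `1001` and `1111`.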
Example 2: Generating the GHZ states and the linear cluster states. There are several important kinds of entangled states in quantum information, such as the well-known GHZ state and the linear cluster state. (i) It is easy to check that, after acting with $B_{12}B_{23}$ on the initially separable three-qubit state $|111\rangle_{123}$, one obtains the state
$$|\psi'\rangle_{GHZ} = B_{12}B_{23}|111\rangle_{123} = \frac{1}{2}\left(|100\rangle_{123} + |010\rangle_{123} + |001\rangle_{123} + |111\rangle_{123}\right), \tag{11}$$
which is equivalent to the standard three-qubit GHZ state $|\psi\rangle_{GHZ} = \frac{1}{\sqrt{2}}(|000\rangle_{123} + |111\rangle_{123})$ up to a local unitary transformation:
$$|\psi'\rangle_{GHZ} = U_a \otimes U_b \otimes U_c\,|\psi\rangle_{GHZ}, \tag{12}$$
where $U_a = U_b = U_c = V$, and
$$V = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \tag{13}$$
i.e., the unitary transformation $V$ is decomposed as a product of the Hadamard gate and the phase gate $\sigma_z$. In general, one may obtain the $N$-qubit GHZ states by acting with $B_{12}B_{23}\cdots B_{N-1,N}$ on the initially separable $N$-qubit state $|11\cdots 1\rangle_{12\cdots N}$. (ii) The linear cluster state is the highly entangled multiparticle state on which one-way quantum computation is based [15][16]. The linear cluster state is locally equivalent to the $N$-qubit ring cluster state. Random quantum measurement errors can be overcome by applying a feed-forward technique, such that the future measurement basis depends on earlier measurement results. This technique is crucial for achieving deterministic quantum computation once a cluster state is prepared. For four qubits, the linear cluster state reads
$$|\psi\rangle_{cluster} = \frac{1}{2}\left(|0\rangle_1|0\rangle_2|0\rangle_3|0\rangle_4 + |0\rangle_1|0\rangle_2|1\rangle_3|1\rangle_4 + |1\rangle_1|1\rangle_2|0\rangle_3|0\rangle_4 - |1\rangle_1|1\rangle_2|1\rangle_3|1\rangle_4\right). \tag{14}$$
However, it is not easy to generate $|\psi\rangle_{cluster}$ by using only one kind of unitary braiding transformation $B_{i,i+1}$.
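The GHZ construction of Eq. (11) and the local equivalence of Eq. (12) can be illustrated with the same sparse-state technique. This sketch assumes $\varphi = 0$ and takes $V$ to be the Hadamard gate multiplied on the right by $\sigma_z$, which is one choice consistent with the description in the text; the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
B = [[S, 0, 0, S], [0, S, S, 0], [0, -S, S, 0], [-S, 0, 0, S]]  # Eq. (5), phi = 0
V = [[S, -S], [S, S]]  # assumption: V = Hadamard * sigma_z

def apply_gate(gate, state, i):
    """Apply a 2x2 or 4x4 gate to the qubit(s) starting at position i."""
    width = (len(gate) - 1).bit_length()  # 1 qubit for 2x2, 2 qubits for 4x4
    out = {}
    for bits, amp in state.items():
        col = int(bits[i:i + width], 2)
        for row in range(len(gate)):
            if gate[row][col] != 0:
                new_bits = bits[:i] + format(row, "0{}b".format(width)) + bits[i + width:]
                out[new_bits] = out.get(new_bits, 0) + gate[row][col] * amp
    return out

def states_close(a, b, tol=1e-12):
    return all(abs(a.get(k, 0) - b.get(k, 0)) < tol for k in set(a) | set(b))

def ghz_from_braiding():
    """B12 B23 |111>: B23 acts first, then B12."""
    state = {"111": 1.0}
    state = apply_gate(B, state, 1)  # B23
    state = apply_gate(B, state, 0)  # B12
    return state

def rotated_standard_ghz():
    """(V x V x V) applied to (|000> + |111>)/sqrt(2)."""
    state = {"000": S, "111": S}
    for i in range(3):
        state = apply_gate(V, state, i)
    return state

print(sorted(ghz_from_braiding().items()))
```

Both routes give amplitude $\frac12$ on `100`, `010`, `001` and `111`, as in Eq. (11).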
In the following, starting from the initial separable four-qubit state $|0000\rangle_{1234}$, we mathematically generate the four-qubit linear cluster state by combining two kinds of unitary braiding transformations, $B_{i,i+1}$ and $\mathcal{P}_{i,i+1}$, namely
$$|\psi\rangle_{cluster} = P_{23}\,\mathcal{P}_{23}\,B_{34}B_{12}|0000\rangle_{1234}, \tag{15}$$
where the phases in $\mathcal{P}_{23}$ are chosen as $\xi_{00} = 0$, $\xi_{01} = \xi_{10} = \xi_{11} = \pi$, and $P_{23}$ is the usual permutation operator in Eq. (3). Moreover, one can mathematically generate 16 orthogonal four-qubit linear cluster states by acting with $P_{23}\mathcal{P}_{23}B_{34}B_{12}$ on the initial states $|ijkl\rangle_{1234}$, where $i, j, k, l$ run from 0 to 1.

Significantly, the above realizations of entanglement swapping and of the GHZ states are based purely on one kind of braiding transformation, $B_{i,i+1}$. Eqs. (9)-(13) may provide an alternative approach for experimenters to realize ES and to generate the GHZ states through a network of quantum logic gates in the future. The recent realization of the linear cluster states is based on quantum measurements [16]. By using two kinds of braiding transformations, Eq. (15) mathematically produces the state $|\psi\rangle_{cluster}$. Since $B_{i,i+1}$ and $\mathcal{P}_{i,i+1}$ do not have the same eigenvalues, they cannot be the matrices representing exchanges within the same braid group representation, so a gap remains between the mathematical realization Eq. (15) and an actual physical realization.

III. R-MATRIX, HAMILTONIAN AND BERRY PHASE IN ENTANGLEMENT SPACE

In Ref. [6], the unitary matrix $\check{R}_{i,i+1}(\theta,\varphi)$ was introduced through the Yang–Baxterization approach [8] in order to include nonmaximally entangled states in the general discussion. To keep the paper self-contained, we briefly review it in the following.

The Yang–Baxterization of the unitary braiding operator $B_{i,i+1}$ is
$$\check{R}_{i,i+1}(x) = \frac{1}{\sqrt{1+x^2}}\left(B_{i,i+1} + x\,B^{-1}_{i,i+1}\right), \tag{16}$$
namely, the $\check{R}_{i,i+1}(x)$-matrix is a linear superposition of the matrices $B_{i,i+1}$ and $B^{-1}_{i,i+1}$, where $B^{-1} = B^\dagger$ is the inverse matrix of $B$.
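Returning briefly to the cluster-state construction of Eq. (15) in Sec. II, it can be checked numerically in the same way as the earlier examples. The sketch below assumes $\varphi = 0$ in $B$ and the convention that the phased permutation sends $|kl\rangle$ to $e^{i\xi_{kl}}|lk\rangle$; since $\xi_{01} = \xi_{10}$ here, the result does not depend on that labelling or on the order of the two permutations. All helper names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
B = [[S, 0, 0, S], [0, S, S, 0], [0, -S, S, 0], [-S, 0, 0, S]]  # Eq. (5), phi = 0
P = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]    # Eq. (3)

def phased_permutation(xi00, xi01, xi10, xi11):
    """Eq. (6); assumed convention: |kl> -> exp(i xi_kl) |lk>."""
    e = lambda x: cmath.exp(1j * x)
    return [[e(xi00), 0, 0, 0],
            [0, 0, e(xi10), 0],
            [0, e(xi01), 0, 0],
            [0, 0, 0, e(xi11)]]

def apply_gate(gate, state, i):
    """Apply a 4x4 gate to neighbouring qubits (i, i+1) of {bitstring: amplitude}."""
    out = {}
    for bits, amp in state.items():
        col = int(bits[i:i + 2], 2)
        for row in range(4):
            if gate[row][col] != 0:
                new_bits = bits[:i] + format(row, "02b") + bits[i + 2:]
                out[new_bits] = out.get(new_bits, 0) + gate[row][col] * amp
    return out

def states_close(a, b, tol=1e-12):
    return all(abs(a.get(k, 0) - b.get(k, 0)) < tol for k in set(a) | set(b))

def cluster_state():
    """P23 * phased P23 * B34 * B12 acting on |0000>; rightmost factor first."""
    phased = phased_permutation(0.0, math.pi, math.pi, math.pi)
    state = {"0000": 1.0}
    for gate, i in ((B, 0), (B, 2), (phased, 1), (P, 1)):
        state = apply_gate(gate, state, i)
    return state

print(sorted((k, round(v.real, 12)) for k, v in cluster_state().items()))
```

The combined action of the two permutations on qubits 2 and 3 is a diagonal phase gate, which is what turns the product of two Bell pairs into the state of Eq. (14).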
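The unitary $\check{R}$-matrix is a generalization of the unitary braiding matrix $B_{i,i+1}$, and it satisfies the Yang–Baxter equation
$$\check{R}_i(x)\,\check{R}_{i+1}(xy)\,\check{R}_i(y) = \check{R}_{i+1}(y)\,\check{R}_i(xy)\,\check{R}_{i+1}(x), \tag{17}$$
where $x$ and $y$ are called the spectral parameters. The braid relations (4) can be viewed as an asymptotic behavior of the Yang–Baxter equation. By introducing the angle $\theta$ through $\cos\theta = (1-x)/\sqrt{2(1+x^2)}$, $\sin\theta = (1+x)/\sqrt{2(1+x^2)}$, the matrix $\check{R}_{i,i+1}(x)$ may be recast as
$$\check{R}_{i,i+1}(\theta,\varphi) = \sin\theta\; 1_i \otimes 1_{i+1} + \cos\theta\, M_{i,i+1},$$
where $M_{i,i+1} = e^{-i\varphi} S^+_i \otimes S^+_{i+1} - e^{i\varphi} S^-_i \otimes S^-_{i+1} + S^+_i \otimes S^-_{i+1} - S^-_i \otimes S^+_{i+1}$, and $S^\pm = S^x \pm iS^y$ are the matrices for the spin-1/2 angular momentum operators. Similarly to Eq. (7), when the unitary matrix $\check{R}_{i,i+1}(\theta,\varphi)$ acts on the direct-product states $|kl\rangle$, it produces the nonmaximally entangled states
$$\check{R}_{i,i+1}(\theta,\varphi)\begin{pmatrix} |00\rangle \\ |01\rangle \\ |10\rangle \\ |11\rangle \end{pmatrix} = \begin{pmatrix} \sin\theta\,|00\rangle - e^{i\varphi}\cos\theta\,|11\rangle \\ \sin\theta\,|01\rangle - \cos\theta\,|10\rangle \\ \cos\theta\,|01\rangle + \sin\theta\,|10\rangle \\ e^{-i\varphi}\cos\theta\,|00\rangle + \sin\theta\,|11\rangle \end{pmatrix}. \tag{18}$$
Remarkably, the four states on the right-hand side of Eq. (18) possess the same degree of entanglement (or concurrence [17]), equal to $|\sin(2\theta)|$. When $\theta = \pi/4$, they reduce to the four Bell basis states and correspondingly the matrix $\check{R}_{i,i+1}(\theta,\varphi)$ reduces to the braiding operator $B_{i,i+1}$.

There are two parameters $\theta$, $\varphi$ in the unitary matrix $\check{R}_{i,i+1}(\theta,\varphi)$. If one lets $\theta$ be time-dependent and $\varphi$ be time-independent, one can construct a Hamiltonian as in Ref. [6]. However, the eigenstates of such a Hamiltonian are separable states, which do not allow us to study the Berry phases for entangled states. For this purpose, in this paper we let $\varphi = \omega t$ be time-dependent while $\theta$ is time-independent.

Equation (18) can be abbreviated as $\check{R}_{i,i+1}(\theta,\varphi)|\psi(\pi/2, 0)\rangle = |\psi(\theta,\varphi)\rangle$. Taking the Schrödinger equation $i\hbar\,\partial|\psi(\theta,\varphi)\rangle/\partial t = H(\theta,\varphi)|\psi(\theta,\varphi)\rangle$ into account, one obtains
$$i\hbar\frac{\partial}{\partial t}\left[\check{R}_{i,i+1}(\theta,\varphi)|\psi(\pi/2,0)\rangle\right] = i\hbar\frac{\partial}{\partial t}|\psi(\theta,\varphi)\rangle = H(\theta,\varphi)|\psi(\theta,\varphi)\rangle = H(\theta,\varphi)\,\check{R}_{i,i+1}(\theta,\varphi)|\psi(\pi/2,0)\rangle.$$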
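The properties of $\check{R}(x)$ stated above can be probed numerically. The following sketch works at $\varphi = 0$, where $B$ is real and $B^{-1}$ is its transpose, and checks unitarity of Eq. (16), the Yang–Baxter equation (17) on three qubits for one pair of real spectral parameters, and the concurrence $|\sin 2\theta|$ of the first state in Eq. (18). The function names and the sample values of $x$ and $y$ are hypothetical.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

S = 1 / math.sqrt(2)
I2 = [[1.0, 0.0], [0.0, 1.0]]
B = [[S, 0, 0, S], [0, S, S, 0], [0, -S, S, 0], [-S, 0, 0, S]]  # Eq. (5), phi = 0
B_INV = transpose(B)  # B is real orthogonal at phi = 0

def r_check(x):
    """R(x) = (B + x B^{-1}) / sqrt(1 + x^2), Eq. (16), for real x."""
    norm = math.sqrt(1 + x * x)
    return [[(B[i][j] + x * B_INV[i][j]) / norm for j in range(4)] for i in range(4)]

def is_unitary(m):
    """For a real matrix the adjoint is the transpose."""
    n = len(m)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return close(matmul(m, transpose(m)), identity)

def yang_baxter_holds(x, y):
    first = lambda z: kron(r_check(z), I2)   # acts on qubits (i, i+1)
    second = lambda z: kron(I2, r_check(z))  # acts on qubits (i+1, i+2)
    lhs = matmul(first(x), matmul(second(x * y), first(y)))
    rhs = matmul(second(y), matmul(first(x * y), second(x)))
    return close(lhs, rhs)

def theta_from_x(x):
    """Angle with cos(theta) ~ (1 - x) and sin(theta) ~ (1 + x)."""
    return math.atan2(1 + x, 1 - x)

def concurrence_of_first_column(x):
    """Concurrence 2|ad - bc| of R(x)|00>."""
    a, b, c, d = (row[0] for row in r_check(x))
    return 2 * abs(a * d - b * c)

x, y = 0.37, -1.8  # arbitrary illustrative spectral parameters
print(is_unitary(r_check(x)), yang_baxter_holds(x, y))
```

At $x = 0$ the matrix reduces to $B$ itself, i.e. $\theta = \pi/4$, in agreement with the remark after Eq. (18).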
Now let the parameter $\theta$ be time-independent and $\varphi(t) = \omega t$. One then arrives at a Hamiltonian through the unitary transformation $\check{R}_{i,i+1}(\theta,\varphi)$ as
$$H(\theta,\varphi) = i\hbar\,\frac{\partial \check{R}_{i,i+1}(\theta,\varphi)}{\partial t}\,\check{R}^\dagger_{i,i+1}(\theta,\varphi). \tag{19}$$
More precisely, the Hamiltonian reads
$$H(\theta,\varphi) = \hbar\dot\varphi\cos\theta\begin{pmatrix} \cos\theta & 0 & 0 & e^{-i\varphi}\sin\theta \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ e^{i\varphi}\sin\theta & 0 & 0 & -\cos\theta \end{pmatrix}, \tag{20}$$
or, $H(\theta,\varphi) = \hbar\dot\varphi\cos\theta\left[\cos\theta\,(S^z_i \otimes 1_{i+1} + 1_i \otimes S^z_{i+1}) + \sin\theta\,(e^{-i\varphi} S^+_i \otimes S^+_{i+1} + e^{i\varphi} S^-_i \otimes S^-_{i+1})\right]$. In the standard basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$, one observes that $H(\theta,\varphi)$ has contributions only on $\{|00\rangle, |11\rangle\}$, i.e., it makes four dimensions “collapse” to two dimensions, since $\theta$ is assumed to be time-independent. In the basis of $\{|01\rangle, |10\rangle\}$, the two eigenstates $|\chi_{01}\rangle = |01\rangle$, $|\chi_{10}\rangle = |10\rangle$ are degenerate with zero eigenvalues $E_{01} = E_{10} = 0$; they do not give rise to Berry phases, so we do not discuss them here. In the basis of $\{|00\rangle, |11\rangle\}$, the two eigenvalues are $E_\pm = \pm\hbar\dot\varphi\cos\theta$, and the two corresponding eigenstates read
$$|\chi_+(\theta,\varphi)\rangle = \cos\frac{\theta}{2}\,|00\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|11\rangle, \qquad |\chi_-(\theta,\varphi)\rangle = -e^{-i\varphi}\sin\frac{\theta}{2}\,|00\rangle + \cos\frac{\theta}{2}\,|11\rangle. \tag{21}$$

FIG. 2: Berry phases in Bloch space (or the entanglement space). The parameter $\theta$ comes from the Yang–Baxterization of the unitary braiding operators, while the parameter $\varphi$ originates from the $q$-deformation of the braiding operators. They define a point on the unit three-dimensional sphere named the Bloch sphere, and have definite geometric meanings as angles of longitude and latitude respectively. Let $\theta$ be time-independent; when the parameter $\varphi(t)$ evolves adiabatically from 0 to $2\pi$, one obtains the Berry phases for $\chi_\pm(\theta,\varphi)$ as shown in Eq. (22). The relation between the Berry phases and the concurrence of the entangled states $\chi_\pm(\theta,\varphi)$ is $\gamma_\pm = \mp\pi(1 - \sqrt{1 - C^2})$, where $C = |\sin\theta|$ is the concurrence.

Interestingly, the gap between $E_+$ and $E_-$ depends on $\theta$, which is related to the degree of entanglement of the states.
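Two of the statements above lend themselves to a numerical sanity check: that Eq. (19) reproduces the matrix of Eq. (20), and that the eigenstates of Eq. (21) acquire the geometric phase quoted in the caption of Fig. 2 when $\varphi$ runs from 0 to $2\pi$. The sketch below works in units of $\hbar\dot\varphi$, replaces the time derivative by a central finite difference in $\varphi$, and accumulates the phase from overlaps of neighbouring states on a discretized loop; the step sizes and function names are hypothetical choices.

```python
import cmath
import math

def r_matrix(theta, phi):
    """R(theta, phi) = sin(theta) 1 + cos(theta) M, basis |00>, |01>, |10>, |11>."""
    s, c = math.sin(theta), math.cos(theta)
    em, ep = cmath.exp(-1j * phi), cmath.exp(1j * phi)
    return [[s, 0, 0, c * em],
            [0, s, c, 0],
            [0, -c, s, 0],
            [-c * ep, 0, 0, s]]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hamiltonian_numeric(theta, phi, h=1e-6):
    """i (dR/dphi) R^dagger, i.e. Eq. (19) in units of hbar * phidot."""
    rp, rm = r_matrix(theta, phi + h), r_matrix(theta, phi - h)
    d = [[(rp[i][j] - rm[i][j]) / (2 * h) for j in range(4)] for i in range(4)]
    return [[1j * x for x in row] for row in matmul(d, dagger(r_matrix(theta, phi)))]

def hamiltonian_eq20(theta, phi):
    """Matrix of Eq. (20) in units of hbar * phidot."""
    s, c = math.sin(theta), math.cos(theta)
    em, ep = cmath.exp(-1j * phi), cmath.exp(1j * phi)
    return [[c * c, 0, 0, c * s * em], [0, 0, 0, 0], [0, 0, 0, 0], [c * s * ep, 0, 0, -c * c]]

def chi(sign, theta, phi):
    """Eigenstates of Eq. (21)."""
    if sign > 0:
        return [math.cos(theta / 2), 0, 0, cmath.exp(1j * phi) * math.sin(theta / 2)]
    return [-cmath.exp(-1j * phi) * math.sin(theta / 2), 0, 0, math.cos(theta / 2)]

def berry_phase(sign, theta, steps=2000):
    """Discrete geometric phase: minus the summed phases of neighbouring overlaps."""
    total = 0.0
    for k in range(steps):
        a = chi(sign, theta, 2 * math.pi * k / steps)
        b = chi(sign, theta, 2 * math.pi * (k + 1) / steps)
        total += cmath.phase(sum(x.conjugate() * y for x, y in zip(a, b)))
    return -total

theta = math.pi / 3  # illustrative value with theta < pi/2
print(berry_phase(+1, theta), berry_phase(-1, theta))
```

For $\theta < \pi/2$ the accumulated phase stays within $(-\pi, \pi)$, so the sum can be compared directly with $\mp\pi(1-\cos\theta)$.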
According to Berry’s theory [18], when $\varphi(t)$ evolves adiabatically from 0 to $2\pi$, the corresponding Berry phases for the entangled states are
$$\gamma_\pm = i\int_0^T dt\,\langle\chi_\pm(\theta,\varphi)|\frac{\partial}{\partial t}|\chi_\pm(\theta,\varphi)\rangle = \mp\frac{\Omega}{2}, \tag{22}$$
where $\Omega = 2\pi(1 - \cos\theta)$ is the familiar solid angle enclosed by the loop on the Bloch sphere (see Fig. 2).

Actually, the eigenstates $|\chi_\pm(\theta,\varphi)\rangle$ are SU(2) spin coherent states. We may express the Hamiltonian in terms of SU(2) generators as [19]
$$H(\theta,\varphi) = X_1J_1 + X_2J_2 + X_3J_3, \tag{23}$$
where $X_1 = 2\hbar\dot\varphi\cos\theta\sin\theta\cos\varphi$, $X_2 = 2\hbar\dot\varphi\cos\theta\sin\theta\sin\varphi$, $X_3 = 2\hbar\dot\varphi\cos\theta\cos\theta$, and the SU(2) generators are
$$J_1 = \frac{1}{2}\left(S^+_i \otimes S^+_{i+1} + S^-_i \otimes S^-_{i+1}\right), \quad J_2 = \frac{1}{2i}\left(S^+_i \otimes S^+_{i+1} - S^-_i \otimes S^-_{i+1}\right), \quad J_3 = \frac{1}{2}\left(S^z_i \otimes 1_{i+1} + 1_i \otimes S^z_{i+1}\right). \tag{24}$$
Based on these, one can verify directly that
$$|\chi_+(\theta,\varphi)\rangle = \exp[\zeta J_+ - \zeta^* J_-]\,|00\rangle, \qquad |\chi_-(\theta,\varphi)\rangle = \exp[\zeta J_+ - \zeta^* J_-]\,|11\rangle, \tag{25}$$
where $\exp[\zeta J_+ - \zeta^* J_-]$ is the spin coherent operator (and also the usual $D^{1/2}(\theta,\varphi)$-matrix in angular momentum theory), $J_\pm = J_1 \pm iJ_2$ and $\zeta = e^{-i\varphi}\theta/2$. The Berry phase for spin coherent states has been discussed in [19], where the corresponding result coincides with Eq. (22).

IV. CONCLUSION AND DISCUSSION

In summary, we have shown that braiding transformation is a natural approach to describing quantum entanglement, by applying the unitary braiding operators to realize entanglement swapping and to generate the GHZ states as well as the linear cluster states. The unitary braiding matrix $B_{i,i+1}$ describes the Bell states, and the Yang–Baxter matrix $\check{R}_{i,i+1}(\theta,\varphi)$ describes general entangled states with an arbitrary degree of entanglement. Varying the parameter $\theta$ continuously from 0 to $2\pi$, one may obtain an “oscillating entanglement” phenomenon for the entangled states. A Hamiltonian is constructed from the unitary $\check{R}_{i,i+1}(\theta,\varphi)$-matrix, where $\varphi = \omega t$ is time-dependent while $\theta$ is time-independent. This in turn allows us to investigate the Berry phases for the entangled states in the entanglement space.

We end this paper with two remarks.
(i) Very recently, geometric phases for mixed states [20] have been observed in experiments using NMR interferometry [21] as well as single photon interferometry [22]. In a noisy environment, the states $|\chi_\pm(\theta,\varphi)\rangle$ may become mixed states
$$\rho_\pm(r,\theta,\varphi) = r\,|\chi_\pm\rangle\langle\chi_\pm| + (1-r)\,\rho_{noise}, \tag{26}$$
where $0 \le r \le 1$. Usually, $\rho_{noise}$ is chosen as $1_i \otimes 1_{i+1}/4 = (|00\rangle\langle 00| + |01\rangle\langle 01| + |10\rangle\langle 10| + |11\rangle\langle 11|)/4$, and $\rho_\pm(r,\theta,\varphi)$ become the generalized Werner states [3]. Following Ref. [23], one may calculate the geometric phases for the mixed states $\rho_\pm(r,\theta,\varphi)$; however, the computation becomes complicated since $\rho_\pm(r,\theta,\varphi)$ have two nonzero degenerate eigenvalues in the subspace spanned by $\{|01\rangle, |10\rangle\}$. Geometric phases for degenerate mixed states are complicated and we will discuss them elsewhere. In the following, we discuss a simpler case of geometric phases of mixed states, by restricting the noise to the subspace spanned by $\{|00\rangle, |11\rangle\}$. The analysis under such a restriction of the noisy environment is limited, for it assumes that the states $|01\rangle$ and $|10\rangle$ are decoupled and that the environment only affects the $|00\rangle$ and $|11\rangle$ subspace.

For simplicity, let us denote $|\mathbf{0}\rangle \equiv |00\rangle$, $|\mathbf{1}\rangle \equiv |11\rangle$. The Hamiltonian can then be rewritten in the very simple form $H(\theta,\varphi) = \hbar\dot\varphi\cos\theta\;\hat{r}\cdot\vec\sigma$, where $\hat{r} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ is a unit vector on the Bloch sphere, and $\vec\sigma = (\sigma_1, \sigma_2, \sigma_3)$ is the Pauli matrix vector in the basis $\{|\mathbf{0}\rangle, |\mathbf{1}\rangle\}$, namely, $\sigma_1 = |\mathbf{0}\rangle\langle\mathbf{1}| + |\mathbf{1}\rangle\langle\mathbf{0}|$, $\sigma_2 = -i|\mathbf{0}\rangle\langle\mathbf{1}| + i|\mathbf{1}\rangle\langle\mathbf{0}|$, $\sigma_3 = |\mathbf{0}\rangle\langle\mathbf{0}| - |\mathbf{1}\rangle\langle\mathbf{1}|$. On this basis, the pure states $|\chi_\pm(\theta,\varphi)\rangle$ can be rewritten in density matrix form as $|\chi_\pm\rangle\langle\chi_\pm| = (\mathbb{1} \pm \hat{r}\cdot\vec\sigma)/2$, where $\mathbb{1} = |\mathbf{0}\rangle\langle\mathbf{0}| + |\mathbf{1}\rangle\langle\mathbf{1}|$. In other words, in the basis $\{|\mathbf{0}\rangle, |\mathbf{1}\rangle\}$, $|\chi_\pm\rangle$ may be viewed as states of a single “qubit”, which allows us to introduce mixed states and discuss their geometric phases in a particular noisy environment as follows. By choosing $\rho_{noise} = \mathbb{1}/2$, one has from Eq.
(26) that
$$\rho_\pm(r,\theta,\varphi) = \frac{1}{2}\left(\mathbb{1} \pm \vec{r}\cdot\vec\sigma\right), \tag{27}$$
where $\vec{r} = r\hat{r}$. The state $|\chi_+\rangle$ corresponds to a point $\hat{r}$ on the Bloch sphere; $\rho_{noise}$ is located at the center of the Bloch sphere; the unit vector $\hat{r}$ shrinks to $\vec{r}$ when this particular noise is present, and then $|\chi_\pm\rangle\langle\chi_\pm|$ turn into the mixed states $\rho_\pm(r,\theta,\varphi)$. Following the same calculations as in [23], with $r$ and $\theta$ time-independent and the parameter $\varphi(t)$ evolving adiabatically from 0 to $2\pi$, one obtains the geometric phase for the mixed states (27) as
$$\gamma^{mixed}_\pm = \mp\arctan\left(r\tan\frac{\Omega}{2}\right), \tag{28}$$
which reduces to Eq. (22) for $r = 1$.

(ii) The Berry phases in Eq. (22) can be expressed in terms of the concurrence of the states $|\chi_\pm(\theta,\varphi)\rangle$ as $\gamma_\pm = \mp\pi(1 - \sqrt{1 - C^2})$, with $C = |\sin\theta|$ being the concurrence. It is well known that $C$ is an entanglement invariant of the entangled states $|\chi_\pm(\theta,\varphi)\rangle$, while the Berry phase is related to certain topological structures. This might provide a bridge between quantum entanglement and topological quantum computation. Finally, when one restricts the discussion to the basis $\{|\mathbf{0}\rangle, |\mathbf{1}\rangle\}$ and takes $\theta = \pi/4$, $\varphi = -\pi/2$ (or $q = i$), the matrix $\check{R}_{i,i+1}$ reduces to the two-dimensional representation of braiding operators given in Eq. (140) of [9], which has physical applications in non-Abelian quantum Hall systems and topological quantum field theory.

ACKNOWLEDGMENTS

The authors thank Prof. L. D. Faddeev and Prof. K. Fijikawa for their encouragement and useful discussions. This work was supported in part by NSF of China (Grant No. 10575053 and No. 10605013) and Program for New Century Excellent Talents in University.

[1] A. K. Ekert, Phys. Rev. Lett. 67, 661 (1991).
[2] C. H. Bennett and S. J. Wiesner, Phys. Rev. Lett. 69, 2881 (1992); C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, Phys. Rev. Lett. 70, 1895 (1993).
[3] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).
[4] C. H. Bennett, D. P. DiVincenzo, J. A.
Smolin, and W. K. Wootters, Phys. Rev. A 54, 3824 (1996).
[5] L. H. Kauffman and S. J. Lomonaco Jr., New J. Phys. 6, 134 (2004); J. M. Franko, E. C. Rowell, and Z. Wang, J. Knot Theory Ramifications 15, 413 (2006).
[6] Y. Zhang, L. H. Kauffman, and M. L. Ge, Int. J. Quant. Inform. 3, 669 (2005).
[7] A. Y. Kitaev, Annals Phys. 303, 2 (2003); e-print quant-ph/9707021.
[8] Yang–Baxter Equations in Integrable Systems, edited by M. Jimbo (World Scientific, Singapore, 1990).
[9] J. K. Slingerland and F. A. Bais, Nucl. Phys. B 612, 229 (2001).
[10] G. Badurek, H. Rauch, A. Zeilinger, W. Bauspiess, and U. Bonse, Phys. Rev. D 14, 1177 (1976); A. Zeilinger, Physica B 137, 235 (1986).
[11] M. Żukowski, A. Zeilinger, M. A. Horne, and A. K. Ekert, Phys. Rev. Lett. 71, 4287 (1993).
[12] A. Zeilinger, M. A. Horne, H. Weinfurter, and M. Żukowski, Phys. Rev. Lett. 78, 3031 (1997).
[13] S. Bose, V. Vedral, and P. L. Knight, Phys. Rev. A 57, 822 (1998).
[14] J. W. Pan, D. Bouwmeester, H. Weinfurter, and A. Zeilinger, Phys. Rev. Lett. 80, 3891 (1998).
[15] R. Raussendorf and H. J. Briegel, Phys. Rev. Lett. 86, 5188 (2001).
[16] R. Prevedel, P. Walther, F. Tiefenbacher, P. Böhi, R. Kaltenbaek, T. Jennewein, and A. Zeilinger, Nature 445, 65 (2007).
[17] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[18] Geometric Phases in Physics, edited by A. Shapere and F. Wilczek (World Scientific, Singapore, 1989).
[19] S. Chaturvedi, M. S. Sriram, and V. Srinivasan, J. Phys. A 20, L1091 (1987).
[20] E. Sjöqvist, A. K. Pati, A. Ekert, J. S. Anandan, M. Ericsson, D. K. L. Oi, and V. Vedral, Phys. Rev. Lett. 85, 2845 (2000).
[21] J. Du, P. Zou, M. Shi, L. C. Kwek, J. W. Pan, C. H. Oh, A. Ekert, D. K. L. Oi, and M. Ericsson, Phys. Rev. Lett. 91, 100403 (2003).
[22] M. Ericsson, D. Achilles, J. T. Barreiro, D. Branning, N. A. Peters, and P. G. Kwiat, Phys. Rev. Lett. 94, 050401 (2005).
[23] K. Singh, D. M. Tong, K. Basu, J. L. Chen, and J. F. Du, Phys. Rev. A 67, 032106 (2003).
ABSTRACT We show that braiding transformation is a natural approach to describe quantum entanglement, by using the unitary braiding operators to realize entanglement swapping and generate the GHZ states as well as the linear cluster states. A Hamiltonian is constructed from the unitary $\check{R}_{i,i+1}(\theta,\phi)$-matrix, where $\phi=\omega t$ is time-dependent while $\theta$ is time-independent. This in turn allows us to investigate the Berry phase in the entanglement space. <|endoftext|><|startoftext|> Introduction

A large number of baryon resonances has been established experimentally [1]. Below a mass of 1.8 GeV/c2, most of these states are well reproduced by constituent quark models [2,3,4]. The models differ in details of the predicted mass spectrum but have a common feature: above 1.8 GeV/c2, they predict many more states than have been seen experimentally. A natural explanation for these missing resonances is that they have escaped detection. The majority of known non-strange baryon resonances stems from πN scattering experiments. Model calculations show that for some of these missing resonances only a small coupling to πN is expected [5]. In elastic scattering, the coupling to πN enters in the entrance and exit channel, so these resonances contribute only very weakly. By contrast, these resonances are predicted to have normal photo couplings [6] and some of them should be observed in channels like Nη, KΛ, KΣ [7], ∆η or ∆ω [8]. In comparison to the Nπ final state, most of the above provide a distinctive advantage: they act as isospin filters; only N∗ resonances contribute to the Nη and KΛ final states, while resonances in ∆η and ∆ω belong to the ∆∗ series.

Correspondence to: klempt@hiskp.uni-bonn.de

A partial-wave analysis of various photoproduction data suggested the existence of several new resonances [9].
The analysis included data from CB-ELSA on π0 and η photoproduction [10,11], Mainz-TAPS data on η photoproduction [12], beam-asymmetry measurements of π0 and η [13,14,15], data on γp → nπ+ [16] and from the compilation of the SAID database [17], and data on photoproduction of γp → K+Λ and γp → K+Σ0 from SAPHIR [18,19], CLAS [20,21], and LEPS [22].

J. Junkersfeld et al.: Photoproduction of π0ω off protons. http://arxiv.org/abs/0704.0710v1

Fig. 1. Contributions to ∆ω photoproduction: left, production of ∆∗ intermediate states; right, production of ω mesons via t-channel pion exchange.

The reaction γp → pω is known to receive large contributions from t-channel exchange processes [23,24]. A similar mechanism may also contribute to ∆ω photoproduction: the incoming photon may couple to ωπ0, the virtual π0 excites the nucleon to a ∆, and the ω escapes, preferentially in the forward direction. Fig. 1 shows a Feynman diagram for this reaction mechanism and for the production of a ∆ resonance decaying into ∆ω.

This paper reports on a measurement of differential and total cross sections for the reaction

γp → pπ0ω , (1)

with ω → π0γ and the π0 detected in its two-photon decay. From these data the total cross-section for

γp → ∆+ω (2)

with the subsequent decays ∆+ → pπ0 and ω → π0γ was extracted and compared to an earlier measurement at higher energies [25]. The low statistics for reactions (1) and (2) does not yet provide a sufficiently large data sample for a partial-wave analysis, but may serve as a guide for what to expect from future experiments; the present study is thus of exploratory character.

2 Experimental setup

The experiment was performed at the Electron Stretcher Accelerator ELSA [26] at the University of Bonn. Electrons were extracted at an energy of 3.2 GeV and bremsstrahlung was produced in a radiator foil with a thickness of 3/1000 of a radiation length (Fig. 2).
Electrons deflected in a magnet were detected with a tagging system covering the photon energy range from 750 to 2970 MeV. The tagging system consisted of 14 thick scintillation counters and two proportional wire chambers with a total of 352 wires. The scintillation counters were used to derive a fast timing signal and the wire chambers to determine the photon energy. The γ energy resolution varied from 30 MeV at the lower end to 0.5 MeV at the upper end of the spectrum, not taking into account the energy distribution of the electron beam of 3−5 MeV [26]. This is well matched with the overall resolution of the detector for this reaction of 30 MeV (FWHM) (see below). Typical rates were 1−3 × 10^6 photons/s. The photon beam hit a liquid H2 target of 5.3 cm length and 3 cm diameter. The absolute normalisation was derived from a comparison of our differential angular distributions for the reaction γp → pπ0 with the SAID model SM02. The normalisation uncertainty was estimated to be 15% [10,27].

Fig. 2. Setup of the CB-ELSA experiment (labelled elements: radiator, dipole magnet, electron beam, γ beam, quadrupole, scifi detector, target, Crystal Barrel, H2 liquifier; scale 6.6 m).

Charged reaction products were detected by a three-layer scintillating fibre detector covering polar angles from 15° to 165° [28]. The outer layer was parallel to the beam axis; the fibres of the other two layers were bent by ±25° with respect to the first layer to allow for a spatial reconstruction of hits. Photons and charged particles were detected in the Crystal Barrel detector [29], a calorimeter consisting of 1380 CsI(Tl) crystals with photodiode readout, covering 98% of the 4π solid angle. With its high granularity and energy resolution, the detector is excellently suited for the detection of multi-photon final states. Electromagnetic showers typically extended over up to 30 crystals in the calorimeter. Photons were reconstructed with an energy resolution of $\sigma_E/E = 2.5\%/\sqrt[4]{E\,[\mathrm{GeV}]}$ and an angular resolution of $\sigma_{\theta,\phi} \approx 1.1^\circ$.
Hits due to charged particles induce smaller clusters, typically with 3 – 6 crystals. A fast first-level trigger signal was derived from a coincidence between a hit in the tagging system and a signal in at least two out of three layers of the inner fibre detector. The second-level trigger required a minimum number of hits in the calorimeter. For part of the data the minimal number of hits in the calorimeter was 2, otherwise at least 3 hits were requested in order to reduce the dead-time. Dead-time losses were below 70%, and below 20% for the more restrictive trigger. A more detailed description of the experimental setup and the event reconstruction can be found in [27].

3 The reaction γp → pπ0ω

3.1 Event selection

The reaction γp → pπ0ω, ω → π0γ, leads to a final state with five photons and a proton. The π0ω photoproduction threshold is at Eγ = 1365 MeV; a cut on the tagged photon energy, Eγ > 1315 MeV, was applied right at the beginning.

Fig. 3. ω signal in the π0γ invariant mass (x axis: π0γ mass in MeV; fit result: mω = 783.8 ± 0.9 MeV, Nω = 2017 ± 83).

The first step in the analysis is the identification of the five photons and the reconstruction of their energies and directions. Protons with Ekin below ∼ 95 MeV only produce a signal in the inner detector but not in the calorimeter. Hence in the analysis, events were selected with 5 or 6 hits in the Crystal Barrel calorimeter and 1 – 3 hits in the inner detector. (A three-hit pattern can arise from three single hits in each layer not crossing in a single point.) At least two layers of the inner detector had to have a signal. For each pair of fibre and barrel hits it was tested whether the two vectors pointing from the target centre to a fibre-detector hit and to a Crystal Barrel hit form an angle of 20° or less; in this case the Crystal Barrel hit was identified as a proton, otherwise as a photon.
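The angular matching between fibre-detector and Crystal Barrel hits described above can be sketched as follows. This is an illustrative reading of the text, not the collaboration's reconstruction code: hits are assumed to be given as 3-vectors measured from the target centre, and the function names and the default `max_angle_deg` argument are hypothetical.

```python
import math

def opening_angle_deg(u, v):
    """Angle in degrees between two 3-vectors pointing from the target centre."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def classify_barrel_hits(barrel_hits, fibre_hits, max_angle_deg=20.0):
    """Label a barrel hit 'proton' if some fibre hit lies within the matching
    angle, otherwise 'photon'."""
    labels = []
    for b in barrel_hits:
        matched = any(opening_angle_deg(b, f) <= max_angle_deg for f in fibre_hits)
        labels.append("proton" if matched else "photon")
    return labels

# Toy example: one fibre hit along the z axis and three barrel hits.
ten_degrees = math.radians(10.0)
barrel = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0),
          (0.0, math.sin(ten_degrees), math.cos(ten_degrees))]
fibre = [(0.0, 0.0, 2.0)]
print(classify_barrel_hits(barrel, fibre))
```

In the toy example the first and third barrel hits lie within 20° of the fibre hit and are labelled as protons; a real selection would in addition apply the hit-multiplicity requirements quoted in the text.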
The 20° matching angle was chosen to allow for the extension of the target and the uncertainties in the measurement. Events with five photons were kept for further analysis. In case of 6 hits in the barrel, one of them had to match the proton identification.

Surviving events were kinematically fitted to the hypothesis γp → p_miss π0π0γ with a missing proton, neglecting identified charged hits and using all remaining photon candidates. The kinematic fit assumed that the reaction took place in the target centre. Since the momentum of the proton is unknown and needs to be reconstructed, energy and momentum conservation give one constraint, and the π0 masses two constraints. A cut on a confidence level of 2% was applied, optimised to lose only few good events. From the fit, the flight direction of the proton was determined and compared to hits in the inner detector. Again, the direction of the missing proton and the direction to a hit in the inner fibre detector had to form an angle of 20° or less for the hit to be identified as a proton.

The pπ0π0γ events were used to identify pπ0ω events with ω decaying into π0γ. Fig. 3 shows the π0γ mass distribution with two entries per event. The fit using a Voigt function (a Breit-Wigner convoluted with a Gaussian), imposing the ω width of Γ = 8.49 MeV/c2, assigns about 2000 events to reaction (1). The ω mass was determined to be (783.8 ± 0.9stat ± 1.0syst) MeV/c2. The systematic error was estimated from the comparison of η, η′, and ω masses in different reactions with the PDG values. The mass resolution is determined to be σ = 16 MeV/c2.

Fig. 4. Acceptance of pπ0ω events (left) and the misidentification probability of p 3π0 events (right), both as a function of Eγ [MeV].

[Figure 5 panels, x axis m(π0γ) in MeV: 2256 < Eγ < 2382 MeV with Nω = 253 ± 26; 2495 < Eγ < 2677 MeV with Nω = 364 ± 30.] Fig. 5.
The ω signal with background. The background is predicted in height and shape by simulations of p 3π0 (dark grey) and pπ0ω combinatorial background (light grey).

The background in the π0γ distribution of pπ0π0γ events has two main sources. A large fraction stems from p 3π0 events. Fig. 4 shows that, in the energy region 2000 < Eγ < 3000 MeV, p 3π0 events have a high probability to be misidentified as pπ0ω. The misidentification probability is only one order of magnitude smaller than the acceptance for pπ0ω. However, the branching ratio of p 3π0 → p 6γ is (96.44 ± 0.09)% compared with (8.71 ± 0.25)% for pπ0ω → p 5γ.

The cross-section for 3π0 photoproduction was estimated using the cross-section for γp → pη [11], which was determined from events with the η decaying into γγ and 3π0. The fractions of η and non-η events in the 3π0 event samples were determined and used to estimate the cross-section of γp → p 3π0. Monte Carlo simulations were performed using the p 3π0 cross-section estimate to determine the expected number of p 3π0 events surviving the pπ0π0γ reconstruction. Fig. 5 shows for two photon energy ranges the predicted contribution of pπ0π0π0 events to the background and the observed π0γ distribution.

The expected combinatorial background was determined from the number of reconstructed pπ0ω events. It is shown together with the p 3π0 part of the background in Fig. 5. In the ω mass region, there is good agreement between the simulated background distribution and the observed background. The study of simulated pπ0π0 and pπ0η events shows misidentification probabilities of the order of 0.1%. Their contributions were neglected.

The number of events due to reaction (1) in a given energy range was determined by fitting the π0γ distribution using a Voigt function for the ω signal and a second order polynomial for the background. The fit also returned the number of background events below the peak.
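The fit model named above, a Voigt peak on a second order polynomial, can be sketched as follows. This is an illustration only, under stated assumptions: the Breit-Wigner is taken in its non-relativistic (Lorentzian) form, the convolution is done by a plain Riemann sum, and the function and parameter names (`voigt`, `model`, `n_sig`, `p0`, `p1`, `p2`) are hypothetical. The width of 8.49 MeV and the resolution of 16 MeV are the values quoted in the text; the fitting procedure itself is not shown.

```python
import math

OMEGA_WIDTH = 8.49  # MeV, natural omega width imposed in the fit (from the text)
SIGMA_RES = 16.0    # MeV, mass resolution quoted in the text


def gaussian(x, sigma):
    """Unit-area Gaussian resolution function."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def breit_wigner(x, gamma):
    """Unit-area non-relativistic Breit-Wigner (Lorentzian) of full width gamma."""
    return (gamma / (2.0 * math.pi)) / (x * x + 0.25 * gamma * gamma)


def voigt(dm, sigma=SIGMA_RES, gamma=OMEGA_WIDTH, step=0.5, n_sigma=6.0):
    """Breit-Wigner convoluted with a Gaussian, evaluated at mass offset dm (MeV).

    The convolution integral is approximated by a Riemann sum over the
    Gaussian smearing variable within +- n_sigma * sigma.
    """
    n = int(n_sigma * sigma / step)
    total = 0.0
    for i in range(-n, n + 1):
        s = i * step
        total += gaussian(s, sigma) * breit_wigner(dm - s, gamma)
    return total * step


def model(m, n_sig, m0, p0, p1, p2):
    """Signal (unit-area Voigt peak at m0 scaled by n_sig) plus quadratic background."""
    return n_sig * voigt(m - m0) + p0 + p1 * m + p2 * m * m
```

In such a parametrisation the number of background events below the peak would follow from integrating the polynomial part over the signal window and dividing by the histogram bin width.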
For background subtraction, data histograms were filled with events within the ω mass region (mω ± 40 MeV/c2) and background histograms with events falling into the upper or lower sidebands (687 − 727 MeV/c2 and 839 − 879 MeV/c2, also shown in Fig. 3). The latter histograms were scaled to contain the same number of events as found in the background below the peak. For each energy and angular region, the sideband histograms were subtracted from the data histograms to extract the pπ0ω distributions. The same procedure was used to determine the pπ0ω distributions for each energy region as a function of the momentum transfer and of the invariant mass, respectively.

The acceptance was studied with a GEANT-based Monte Carlo simulation using phase-space distributed pπ0ω and ∆+ω events. In the first iteration only pπ0ω Monte Carlo events were taken into account. Cross-sections for γp → pπ0ω and γp → ∆+ω were thus obtained, as will be described in sections 3.3 and 3.4, and used to produce Monte Carlo events with a realistic mixture of pπ0ω and ∆+ω events. This provides a more realistic acceptance simulation. Stable results were achieved in the second iteration. The simulated acceptance was different when only phase-space distributed pπ0ω events or only ∆+ω events were used for the simulation. The difference in the acceptance was taken as a contribution to the systematic error.

3.2 Differential cross-sections

The differential cross-sections were obtained from the sideband subtracted histograms. We give, in the centre of mass system, cross-sections differential in cos θω, cos θπ0 and |t − tmin|: dσ/dΩ(cos θω), dσ/dΩ(cos θπ0) and dσ/dt (|t − tmin|), respectively. Here t is the squared four-momentum transfer from the photon beam to the pπ0 system given by t = q2 = (pγ − pω)2 and tmin is the minimal momentum transfer imposed by kinematics.

Fig.
6 presents the differential cross-sections as a function of cos θω; in Table 1 they are given in numerical form. The distributions are compatible with a description of the form

\frac{d\sigma}{d\Omega}(x) = a_0 + a_1 \cdot e^{a_2 x} \quad \text{with} \quad x = \cos\theta . \qquad (4)

In the backward direction, the acceptance is small and the errors are large. The fit using Eq. (4) took into account only data for which the acceptance ε was above 5% (thus restricting the fit range to cos θω > −0.6 for Eγ < 1800 MeV, and to cos θω > −0.8 for Eγ > 1800 MeV), and was then extrapolated to cover the full cos θω range. In the forward direction, there is a strong increase in intensity, in particular at energies above 2 GeV. Production of ω mesons via t-channel exchange with simultaneous p → ∆(1232) excitation seems to play an important role in the dynamics of reaction (1).

From the cos θω distributions a total cross-section was determined by summing over the measured values for which the acceptance was above 5% and using extrapolated values in the remaining range.

Fig. 6. Differential cross-sections dσ/dΩ(cos θω). [Eight panels, one per photon energy bin: 1383 − 1817, 1817 − 2020, 2020 − 2256, 2256 − 2382, 2382 − 2495, 2495 − 2677, 2677 − 2845 and 2845 − 2970 MeV; horizontal axis cos θω from −1 to 1.]

The differential cross-sections dσ/dΩ(cos θπ0) are shown in Fig. 7 and listed numerically in Table 2. There are no obvious structures besides some fluctuations in the forward and backward regions. The data were fitted using a constant. The fit was restricted to data points measured with an acceptance of at least 5%, thus excluding for Eγ > 2380 MeV the points with cos θπ0 > 0.8. From this distribution the total cross-section was derived using the measured data points where the acceptance was sufficient and the extrapolated fit for the points with small acceptance.

Fig. 8 shows the differential cross-sections dσ/dt as a function of |t − tmin|, which are compatible with an exponential behaviour in the low t region.
This is characteristic for production via t-channel exchange. The data were fitted in the region below 0.8 · |tmax − tmin| (approximately corresponding to ε > 5%) using

\frac{d\sigma}{dt}(|t - t_{min}|) = e^{a + b|t - t_{min}|} + c(E) \qquad (5)

where tmin is the minimum squared momentum transfer imposed by kinematics. The non-t-dependent contribution was described with a function c(E) = c0 + c1 · Eγ. The parameters c0 and c1 were determined in a combined fit of the differential cross-sections. The slope parameter b is shown in Fig. 9. The slope is approximately constant over the covered energy range. This indicates a strong contribution from ω production via t-channel exchange processes.

Table 1. Differential cross-sections dσ/dΩ(cos θω) in µb/sr. There is a common systematic error of 16%.

Eγ [MeV]          1383 - 1817     1817 - 2020     2020 - 2256     2256 - 2382
cos θω
−1.00 − −0.80     0.16 ± 0.08     0.00 ± 0.13     0.29 ± 0.14     0.35 ± 0.23
−0.80 − −0.60     0.08 ± 0.06     0.05 ± 0.07     0.18 ± 0.09     0.23 ± 0.13
−0.60 − −0.40     0.04 ± 0.04     0.12 ± 0.06     0.15 ± 0.06     0.33 ± 0.12
−0.40 − −0.20     0.02 ± 0.03     0.03 ± 0.04     0.05 ± 0.04     0.22 ± 0.08
−0.20 −  0.00     0.03 ± 0.02     0.17 ± 0.04     0.16 ± 0.05     0.10 ± 0.08
 0.00 −  0.20    −0.02 ± 0.02     0.09 ± 0.05     0.17 ± 0.05     0.31 ± 0.09
 0.20 −  0.40     0.06 ± 0.03     0.13 ± 0.04     0.22 ± 0.06     0.30 ± 0.08
 0.40 −  0.60     0.03 ± 0.03     0.19 ± 0.06     0.33 ± 0.06     0.39 ± 0.11
 0.60 −  0.80     0.04 ± 0.03     0.32 ± 0.06     0.46 ± 0.08     0.71 ± 0.13
 0.80 −  1.00     0.11 ± 0.04     0.32 ± 0.07     0.75 ± 0.10     0.65 ± 0.17

Eγ [MeV]          2382 - 2495     2495 - 2677     2677 - 2845     2845 - 2970
cos θω
−1.00 − −0.80    −0.10 ± 0.25     0.27 ± 0.18     0.70 ± 0.28     0.47 ± 0.23
−0.80 − −0.60     0.27 ± 0.14     0.44 ± 0.13     0.37 ± 0.16     0.42 ± 0.16
−0.60 − −0.40     0.44 ± 0.13     0.27 ± 0.09     0.15 ± 0.10     0.13 ± 0.12
−0.40 − −0.20     0.21 ± 0.10     0.17 ± 0.08     0.30 ± 0.09     0.25 ± 0.11
−0.20 −  0.00     0.17 ± 0.09     0.09 ± 0.07     0.14 ± 0.09     0.25 ± 0.10
 0.00 −  0.20     0.25 ± 0.09     0.28 ± 0.08     0.22 ± 0.08     0.17 ± 0.11
 0.20 −  0.40     0.48 ± 0.11     0.26 ± 0.08     0.42 ± 0.10     0.32 ± 0.12
 0.40 −  0.60     0.45 ± 0.13     0.42 ± 0.10     0.42 ± 0.11     0.51 ± 0.15
 0.60 −  0.80     1.02 ± 0.18     0.57 ± 0.13     0.77 ± 0.18     0.99 ± 0.24
 0.80 −  1.00     1.76 ± 0.27     1.23 ± 0.20     1.86 ± 0.28     1.79 ± 0.33

Table 2. Differential cross-sections dσ/dΩ(cos θπ0) in µb/sr. There is a common systematic error of 16%.

Eγ [MeV]          1383 - 1817     1817 - 2020     2020 - 2256     2256 - 2382
cos θπ0
−1.00 − −0.80     0.05 ± 0.03     0.23 ± 0.07     0.30 ± 0.07     0.40 ± 0.11
−0.80 − −0.60     0.05 ± 0.03     0.21 ± 0.05     0.32 ± 0.06     0.27 ± 0.09
−0.60 − −0.40     0.07 ± 0.03     0.09 ± 0.05     0.29 ± 0.06     0.40 ± 0.11
−0.40 − −0.20     0.04 ± 0.03     0.08 ± 0.05     0.27 ± 0.07     0.30 ± 0.11
−0.20 −  0.00    −0.04 ± 0.03     0.17 ± 0.05     0.23 ± 0.06     0.42 ± 0.11
 0.00 −  0.20     0.06 ± 0.03     0.17 ± 0.05     0.18 ± 0.06     0.17 ± 0.10
 0.20 −  0.40     0.03 ± 0.03     0.11 ± 0.05     0.30 ± 0.06     0.40 ± 0.12
 0.40 −  0.60     0.03 ± 0.03     0.16 ± 0.06     0.27 ± 0.07     0.32 ± 0.10
 0.60 −  0.80     0.07 ± 0.04     0.18 ± 0.06     0.24 ± 0.07     0.33 ± 0.11
 0.80 −  1.00     0.04 ± 0.04     0.18 ± 0.08     0.18 ± 0.10     0.64 ± 0.17

Eγ [MeV]          2382 - 2495     2495 - 2677     2677 - 2845     2845 - 2970
cos θπ0
−1.00 − −0.80     0.59 ± 0.14     0.67 ± 0.11     0.67 ± 0.14     0.61 ± 0.17
−0.80 − −0.60     0.54 ± 0.12     0.29 ± 0.10     0.37 ± 0.10     0.50 ± 0.14
−0.60 − −0.40     0.57 ± 0.13     0.37 ± 0.10     0.26 ± 0.11     0.32 ± 0.12
−0.40 − −0.20     0.47 ± 0.13     0.31 ± 0.10     0.34 ± 0.12     0.23 ± 0.14
−0.20 −  0.00     0.58 ± 0.13     0.46 ± 0.10     0.64 ± 0.12     0.43 ± 0.13
 0.00 −  0.20     0.22 ± 0.12     0.12 ± 0.09     0.42 ± 0.12     0.49 ± 0.13
 0.20 −  0.40     0.65 ± 0.14     0.39 ± 0.10     0.61 ± 0.12     0.60 ± 0.14
 0.40 −  0.60     0.28 ± 0.12     0.29 ± 0.10     0.37 ± 0.13     0.58 ± 0.15
 0.60 −  0.80     0.39 ± 0.15     0.39 ± 0.11     0.24 ± 0.12     0.09 ± 0.16
 0.80 −  1.00     0.56 ± 0.27     0.15 ± 0.21     0.26 ± 0.16     0.35 ± 0.26

From these differential distributions the total cross-section was obtained by integrating over function (5) from tmin to tmax.
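Since function (5) is an exponential plus a constant, its integral over |t − tmin| from 0 to |tmax − tmin| has the closed form e^a (e^{bX} − 1)/b + cX with X = |tmax − tmin|. The following minimal sketch evaluates it; the function names and the example parameter values are assumptions for illustration and are not fit results from this analysis.

```python
import math


def constant_term(e_gamma, c0, c1):
    """Energy-dependent constant c(E) = c0 + c1 * E_gamma of Eq. (5)."""
    return c0 + c1 * e_gamma


def integrated_cross_section(a, b, c, t_range):
    """Integral of exp(a + b*x) + c over x = |t - tmin| from 0 to t_range.

    t_range stands for |tmax - tmin|. For b -> 0 the exponential term
    reduces to exp(a) * t_range, which is handled separately.
    """
    if abs(b) < 1e-12:
        exponential_part = math.exp(a) * t_range
    else:
        # expm1 keeps precision when b * t_range is small.
        exponential_part = math.exp(a) * math.expm1(b * t_range) / b
    return exponential_part + c * t_range


# Illustrative values only (hypothetical, not taken from the fits in this paper):
sigma_example = integrated_cross_section(a=0.0, b=-2.0, c=0.1, t_range=3.0)
```

A negative slope b corresponds to a distribution that falls with increasing |t − tmin|, as expected for t-channel exchange.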
3.3 Total cross-section

The total cross-section was determined in three different ways, by extrapolation and summation of the three types of differential cross-sections, dσ/dΩ(cos θω), dσ/dΩ(cos θπ0), and dσ/dt (|t − tmin|), as described above. Statistical errors of the total cross-sections were determined by error propagation. As final result, the mean value of the total cross-section and the mean statistical error are shown in Fig. 10 (left) as a function of the photon energy. The cross-section rises with increasing photon energy, i.e. with the available phase space.

A systematic uncertainty was derived from the spread of the three different determinations of the total cross-section, using data of Figs. 6, 7 and 8. A further error of 5.7% was assigned to the Monte Carlo reconstruction efficiency [30]. These contributions and the 15% normalisation error [27] were added in quadrature to yield the total systematic error shown in Fig. 10.

Fig. 7. Differential cross-sections dσ/dΩ(cos θπ0). [Eight panels, one per photon energy bin from 1383 − 1817 MeV to 2845 − 2970 MeV; horizontal axis cos θπ0 from −1 to 1.]

Fig. 8. Differential cross-sections dσ/dt (|t − tmin|) of the squared four-momentum transfer t to the pπ0 system. [Eight panels, one per photon energy bin from 1383 − 1817 MeV to 2845 − 2970 MeV; horizontal axis |t − tmin| in (GeV/c)2.]

Fig. 9. Slope parameter of dσ/dt (|t − tmin|) as a function of Eγ (1500 to 3000 MeV).

3.4 The ∆+ω contribution to pπ0ω

Fig. 11 shows the differential cross-sections dσ/dm (pπ0), which were used to disentangle the ∆+ω contribution to the total cross-section. The distributions show prominent ∆ signals. The ∆ peak was fitted by a phase space corrected Breit-Wigner function (see e.g. [31] for details).
The non-resonant pπ0ω part was described by phase-space distributed pπ0ω Monte Carlo events. Only the amplitudes of the two contributions were left free in the fit. The Breit-Wigner width of the ∆ was fixed to 120 MeV/c2; the mass was fixed to 1232 MeV/c2 for energies below 2500 MeV and set to values between 1240 and 1250 MeV/c2 for higher energies to improve the fit. With these two components a good description of the ∆ peak and of the pπ0ω phase space contribution to the differential cross-section was achieved.

The Breit-Wigner distributions and the pπ0ω phase-space contributions were integrated and their fractions determined. The systematic uncertainty due to the disentanglement was estimated to be 3 − 10% and added in quadrature to the systematic error. The cross-section for γp → pπ0ω without ∆+ω contributions is shown in Fig. 10 (right).

The total cross-section of γp → ∆+ω was determined from the observed fraction of ∆+ω events and the γp → pπ0ω cross-section, taking into account the unseen ∆+ → nπ+ decay mode. The resulting cross-section is shown in Fig. 12 together with the results of the LAMP2 experiment [25].

It is worthwhile to discuss how the LAMP2 cross-section was determined. The LAMP2 experiment measured the reaction γp → ∆+ω by identifying ω mesons in their π+π−π0 decays. The ∆+ decay products were not observed. Instead, the ∆+ was identified in the missing mass spectrum of the γp → ωX reaction. The missing mass distribution (Fig. 13) contains signals for pω and ∆+ω production. The authors give a 15% systematic uncertainty due to the difficulty in disentangling the pω, pπ0ω and ∆+ω contributions.

In our analysis the fraction of pπ0 below the ∆ is significant (see Fig. 11) and larger than estimated by LAMP2. Hence it seems possible that the LAMP2 cross-section is overestimated.

The total cross-section for ∆+ω photoproduction in Fig.
12 (see Table 3) is consistent with a simple fit assuming a background amplitude of the form A · (E − Ethreshold)^{α/2} · (E − Eh)^{β/2} (A, α, β, Ethreshold, Eh fit parameters). The χ2 = 12.1 for NDoF = 6 corresponds to an acceptable 6% probability.

Fig. 10. Total cross-sections σ(γp → pπ0ω) before (left) and after (right) subtraction of the ∆+ω contribution, as a function of Eγ (1500 to 3000 MeV). The grey band represents the systematic uncertainty.

Fig. 11. Differential cross-sections dσ/dm (pπ0). They are fitted with a combination of a Breit-Wigner (blue) and phase space pπ0ω Monte Carlo events (red). [Eight panels, one per photon energy bin from 1383 − 1817 MeV to 2845 − 2970 MeV; horizontal axis m(pπ0) from 1200 to 1800 MeV.]

Fig. 12. Total cross-section σ(γp → ∆+ω) as a function of Eγ (2000 to 5000 MeV); shown are the data from this analysis (CB-ELSA) and from the LAMP2 experiment [25]. The systematic errors are shown as an error band in light (CB-ELSA) and dark grey (LAMP2). A fit to the data points is shown, which is described in the text.

Fig. 13. Missing mass of the ω in the LAMP2 experiment (from [25]). Note that the spectrum is dominated by pω events.

Table 3. Total cross-sections of γp → pπ0ω (with and without ∆+ω contribution) and γp → ∆+ω. The systematic error is shown in Figs. 10 and 12.

Eγ [MeV]        σ(γp → pπ0ω) [µb]    σ(γp → pπ0ω) (no ∆+ω) [µb]    σ(γp → ∆+ω) [µb]
1383 − 1817     0.49 ± 0.13          0.35 ± 0.13                   0.21 ± 0.17
1817 − 2020     1.95 ± 0.24          0.93 ± 0.21                   1.54 ± 0.35
2020 − 2256     3.28 ± 0.28          1.46 ± 0.28                   2.74 ± 0.44
2256 − 2382     4.47 ± 0.47          2.44 ± 0.50                   3.04 ± 0.63
2382 − 2495     6.31 ± 0.56          3.58 ± 0.61                   4.10 ± 0.76
2495 − 2677     4.80 ± 0.43          3.49 ± 0.50                   1.97 ± 0.50
2677 − 2845     5.87 ± 0.52          3.92 ± 0.62                   2.92 ± 0.63
2845 − 2970     5.93 ± 0.63          4.18 ± 0.77                   2.62 ± 0.74

4 Summary

We have studied the reaction γp → pπ0ω with ω → π0γ from the ωπ0 production threshold up to 3 GeV photon energy using an unpolarised tagged photon beam and a liquid hydrogen target. Differential cross-sections were determined as functions of cos θω, cos θπ0 and |t − tmin|. The distributions reveal strong contributions from isovector exchange currents from the photon (converting to an ω meson) to the proton, which undergoes a p-∆ excitation. The cross-sections for ∆ω production and for non-∆ω events rise with increasing phase space; LAMP2 data indicate a decrease of the cross-sections when going to larger photon energies (3 - 5 GeV).

We thank the technical staff at ELSA and at all the participating institutions for their invaluable contributions to the success of the experiment. We acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG) within the Sonderforschungsbereich SFB/TR16. The collaboration with St. Petersburg received funds from DFG and the Russian Foundation for Basic Research. B. Krusche acknowledges support from Schweizerischer Nationalfond. U. Thoma is grateful for an Emmy Noether grant from the DFG. A.V. Anisovich and A.V. Sarantsev acknowledge support from the Alexander von Humboldt Foundation. This work comprises part of the PhD thesis of J. Junkersfeld.

References

1. W.M. Yao et al., J. Phys. G 33, 1 (2006).
2. S. Capstick and N. Isgur, Phys. Rev. D 34 (1986) 2809. S. Capstick and W. Roberts, Prog. Part. Nucl. Phys. 45 (2000) S241.
3. L. Y. Glozman, W. Plessas, K. Varga, and R. F. Wagenbrunn, Phys. Rev. D 58 (1998) 094030.
4. U. Löring, K. Kretzschmar, B. C. Metsch, and H. R. Petry, Eur. Phys. J. A 10 (2001) 309. U. Löring, B. C. Metsch, and H. R. Petry, Eur. Phys. J. A 10 (2001) 395, 447.
5. S. Capstick and W. Roberts, Phys. Rev. D 49 (1994) 4570.
6. S. Capstick, Phys. Rev. D 46 (1992) 2864.
7. S.
Capstick and W. Roberts, Phys. Rev. D 58 (1998) 074011.
8. S. Capstick and W. Roberts, Phys. Rev. D 57 (1998) 4301.
9. A. V. Anisovich, A. Sarantsev, O. Bartholomy, E. Klempt, V. A. Nikonov and U. Thoma, Eur. Phys. J. A 25 (2005). A. Sarantsev, V. A. Nikonov, A. V. Anisovich, E. Klempt and U. Thoma, Eur. Phys. J. A 25 (2005) 441.
10. O. Bartholomy et al., Phys. Rev. Lett. 94 (2005) 012003.
11. V. Crede et al., Phys. Rev. Lett. 94 (2005) 012004.
12. B. Krusche et al., Phys. Rev. Lett. 74 (1995) 3736.
13. O. Bartalini et al., Eur. Phys. J. A 26 (2005) 399.
14. A.A. Belyaev et al., Nucl. Phys. B 213 (1983) 201. R. Beck et al., Phys. Rev. Lett. 78 (1997) 606. D. Rebreyend et al., Nucl. Phys. A 663 (2000) 436.
15. J. Ajaka et al., Phys. Rev. Lett. 81 (1998) 1797.
16. K.H. Althoff et al., Z. Phys. C 18 (1983) 199. E. J. Durwen, BONN-IR-80-7 (1980). K. Buechler et al., Nucl. Phys. A 570 (1994) 580.
17. R.A. Arndt et al., W. J. Briscoe, I. I. Strakovsky, and R. L. Workman, http://gwdac.phys.gwu.edu.
18. K. H. Glander et al., Eur. Phys. J. A 19 (2004) 251.
19. R. Lawall et al., Eur. Phys. J. A 24 (2005) 275.
20. J. W. C. McNabb et al., Phys. Rev. C 69 (2004) 042201.
21. B. Carnahan, UMI-31-09682 (microfiche), Ph.D. thesis (2003), CUA, Washington, D.C., see also R. Bradford et al., Phys. Rev. C 73 (2006) 035202.
22. R. G. T. Zegers et al., Phys. Rev. Lett. 91 (2003) 092001.
23. J. Barth et al., Eur. Phys. J. A 18 (2003) 117.
24. J. Ajaka et al., Phys. Rev. Lett. 96 (2006) 132003.
25. D. P. Barber et al., Z. Phys. C 26 (1984) 343.
26. W. Hillert, Eur. Phys. J. A 28S1 (2006) 139.
27. H. van Pee et al., Eur. Phys. J. A 31 61 (2007).
28. G. Suft et al., Nucl. Instrum. Meth. A 538 (2005) 416.
29. E. Aker et al., Nucl. Instrum. Meth. A 321 (1992) 69.
30. C. Amsler et al., Z. Phys. C 58 (1993) 175.
31. A. V. Anisovich, A. Sarantsev, O. Bartholomy, E. Klempt, V. A. Nikonov and U. Thoma, Eur. Phys. J.
A 25 (2005)

ABSTRACT

Differential and total cross-sections for the photoproduction reactions γp → pπ0ω and γp → ∆+ω were determined from measurements of the CB-ELSA experiment, performed at the electron accelerator ELSA in Bonn. The measurements covered the photon energy range from the production threshold up to 3 GeV. <|endoftext|><|startoftext|> arXiv:0704.0711v1 [nucl-th] 5 Apr 2007

Two-pion exchange three-nucleon potential: O(q4) chiral expansion

S. Ishikawa1 and M. R. Robilotta2
1 Department of Physics, Science Research Center, Hosei University, 2-17-1 Fujimi, Chiyoda, Tokyo 102-8160, Japan
2 Instituto de Física, Universidade de São Paulo, C.P. 66318, 05315-970, São Paulo, SP, Brazil
(Dated: November 4, 2018)

Abstract

We present the expansion of the two-pion exchange three-nucleon potential (TPE-3NP) to chiral order q4, which corresponds to a subset of all possibilities at this order and is based on the πN amplitude at O(q3). Results encompass both numerical corrections to strength coefficients of previous O(q3) terms and new structures in the profile functions. The former are typically smaller than 10% whereas the latter arise from either loop functions or non-local gradients acting on the wave function. The influence of the new TPE-3NP over static and scattering three-body observables has been assessed and found to be small, as expected from perturbative corrections.

PACS numbers: 13.75.Cs, 21.30.Fe, 13.75.Gx, 12.39.Fe

I. INTRODUCTION

The research programme for nuclear forces, outlined more than fifty years ago by Taketani, Nakamura, and Sasaki [1], treats pions and nucleons as basic degrees of freedom. This insight proved to be very fruitful. On the one hand, it implies the interconnection of all nuclear processes, both among themselves and with a class of free reactions.
On the other, it determines a close relationship between the number of pions involved in a given interaction and its range. As a consequence, the outer components of nuclear forces are dominated by just a few basic subamplitudes, describing either single (N → πN) or multipion (ππ → ππ, πN → πN, πN → ππN, ...) interactions.

Nevertheless, it took a long time for a theoretical tool to become available which allows the precise treatment of these amplitudes. Nowadays, owing to the development of chiral perturbation theory (ChPT) in association with effective lagrangians [2, 3], the roles of pions and nucleons in nuclear forces can be described consistently. The rationale for this approach is that the quarks u and d, which have small masses, dominate low-energy interactions. One then works with a two-flavor version of QCD and treats their masses as perturbations in a chiral symmetric lagrangian. The systematic inclusion of quark mass contributions is performed by means of chiral perturbation theory, which incorporates low-energy features of QCD into the nuclear force problem. In performing perturbative expansions, one uses a typical scale q, set by either pion four-momenta or nucleon three-momenta, such that q ≪ 1 GeV.

Nuclear forces are dominated by two-body (NN) interactions and leading contributions are due to the one-pion exchange potential (OPEP), which begins [4] at O(q0). The two-pion exchange potential (TPEP) begins at O(q2) and, at present, there are two independent expansions up to O(q4) in the literature, based on either heavy baryon [5] or covariant [6, 7] ChPT. The TPEP is closely related with the off-shell πN amplitude and, at this order, two-loop diagrams involving intermediate ππ scattering already begin to contribute.

In proper three-nucleon (3N) interactions, the leading term is due to the process known as TPE-3NP, in which the pion emitted by a nucleon is scattered before being absorbed by another one.
It has long been available [8–10], involves only tree-level interactions and has the longest possible range. This contribution begins at O(q3) and consistency with available NN forces demands the extension of the chiral series for the 3NP up to O(q4). However, the implementation of this programme is not straightforward, since it requires the evaluation of a rather large number of diagrams. With the purpose of exploring the magnitude of O(q4) effects, in this work we concentrate on the particular subset of processes which still belong to the TPE-3NP class.

Our presentation is divided as follows. In section II we display the general relationship between the TPE-3NP and the πN amplitude, in order to discuss how it affects chiral power counting in the former. The πN amplitude relevant for the O(q4) potential is derived in section III and used to construct the three-body interaction in section IV. We concentrate on numerical changes induced into both potential parameters and observables in sections V and VI, whereas conclusions are presented in section VII. There are also four appendices, dealing with kinematics, πN subthreshold coefficients, loop integrals and non-local terms.

II. GENERAL FORMULATION

Potentials to be used in non-relativistic equations can be derived from field theory by means of the T-matrix. In the case of three-nucleon potentials, one starts from the non-relativistic transition matrix describing the process N(p1) N(p2) N(p3) → N(p′1) N(p′2) N(p′3), which includes both kernels and their iterations. The former correspond to proper interactions, represented by diagrams which cannot be split into two pieces by cutting positive-energy nucleon lines only, whereas the latter are automatically generated by the dynamical equation. Therefore, just the kernels, denoted collectively by t̄3, are included in the potential.
The transformation of a T -matrix into a potential depends on both the dynamical equa- tion adopted and conventions associated with off-shell effects. The latter were discussed in a comprehensive paper by Friar [11]. Here we use the kinematical variables defined in Appendix A and relate t̄3 to the momentum space potential operator Ŵ by writing [12] 〈p′1,p′2,p′3 |Ŵ |p1,p2,p3〉 = −(2π)3 δ3(P ′−P ) t̄3(p′1,p′2,p′3,p1,p2,p3) . (1) In configuration space, internal dynamics is described by the function W (r′,ρ′; r,ρ) = − (2π)3 (2π)3 (2π)3 (2π)3 ×ei[Qr·(r′−r)+Qρ·(ρ′−ρ)+qr ·(r′+r)/2+qρ·(ρ′+ρ)/2] t̄3(Qr,Qρ, qr, qρ) , (2) which is to be used in a non-local version of the Schrödinger equation: ρ′ − ǫ ψ(r′,ρ′) = − dr dρ W (r′,ρ′; r,ρ) ψ(r,ρ) . (3) Non-local effects are associated with the variablesQr andQρ. When these effects are not too strong, they can be represented by gradients acting on the wave function and the potential W is rewritten as W (r′,ρ′; r,ρ) = δ3(r′−r) δ3(ρ′−ρ) V (r,ρ) . (4) The two-pion exchange three-nucleon potential is represented in Fig. 1a. It is closely related with the πN scattering amplitude, which is O(q) for free pions and becomes O(q2) within the three-nucleon system. As a consequence, the TPE-3NP begins at O(q3) and, at this order, it also receives contributions from interactions (c) and (d), which have shorter range. The extension of the chiral series to O(q4) requires both the inclusion of single loop effects into processes that already contribute at O(q3) and the evaluation of many new amplitudes, especially those associated with diagram (b). (a) (b) (c) (d) FIG. 1: (Color online) Classes of three-nucleon forces, where full and dashed lines represent nucleons and pions respectively; diagram (a) corresponds to the TPE-3NP. In this paper we concentrate on the particular set of processes which belong to the TPE- 3NP class, represented by the T -matrix Tππ and evaluated using the kinematical conditions given in Fig. 2. 
The coupling of a pion to nucleon i = (1, 2) is derived from the usual lowest order pseudo-vector lagrangian L(1) and the Dirac equation yields the equivalent forms for the vertex

\frac{g_A}{2 f_\pi}\,[\boldsymbol{\tau}\,\bar u\,(\slashed{p}' - \slashed{p})\,\gamma_5\,u]^{(i)} = \frac{m\,g_A}{f_\pi}\,[\boldsymbol{\tau}\,\bar u\,\gamma_5\,u]^{(i)} , \qquad (5)

where gA, fπ and m represent, respectively, the axial nucleon decay constant, the pion decay constant and the nucleon mass.

FIG. 2: (Color online) Two-pion exchange three-nucleon potential. [Diagram labels: pion lines (k, a) and (k′, b).]

The amplitude for the intermediate process πa(k)N(p) → πb(k′)N(p′) has the isospin structure

T_{ba} = \delta_{ab}\,T^{+} + i\,\epsilon_{bac}\,\tau_c\,T^{-} \qquad (6)

and Fig. 2 yields

T_{\pi\pi} = -\left[\frac{m\,g_A}{f_\pi}\right]^2 [\bar u\,\gamma_5\,u]^{(1)}\,[\bar u\,\gamma_5\,u]^{(2)}\,\frac{1}{k^2-\mu^2}\,\frac{1}{k'^2-\mu^2}\,\left[\boldsymbol{\tau}^{(1)}\!\cdot\boldsymbol{\tau}^{(2)}\,T^{+} - i\,\boldsymbol{\tau}^{(1)}\!\times\boldsymbol{\tau}^{(2)}\!\cdot\boldsymbol{\tau}^{(3)}\,T^{-}\right] , \qquad (7)

µ being the pion mass. Results in Appendix A show that [ū γ5 u](i) → O(q), whereas pion propagators are O(q−2). As a consequence, in the O(q4) expansion of the potential one needs Tππ to O(q) and T± to O(q3).

For on-shell nucleons, the subamplitudes T± can be written as

T^{\pm} = \bar u(p')\left[D^{\pm} - \frac{i}{2m}\,\sigma_{\mu\nu}\,(p'-p)^{\mu} K^{\nu}\,B^{\pm}\right] u(p) , \qquad (8)

with K = (k′+k)/2. The dynamical content of the πN interaction is carried by the functions D± and B± and their main properties were reviewed by Höhler [13]. The chiral structure of these subamplitudes was discussed by Becher and Leutwyler [14, 15] a few years ago, in the framework of covariant perturbation theory, and here we employ their results. As far as power counting is concerned, in Appendix A one finds [ū(p′) u(p)](3) → O(q0) and [ū(p′) σµν (p′−p)µ Kν u(p)](3) → O(q2), indicating that one needs the expansions of D± and B± up to O(q3) and O(q) respectively.

At low and intermediate energies, the πN amplitude is given by a nucleon pole superimposed to a smooth background. One then distinguishes the pseudovector (PV) Born term from a remainder (R) and writes

T^{\pm} = T^{\pm}_{pv} + T^{\pm}_{R} . \qquad (9)

The former contribution depends on just two observables, namely the nucleon mass m and the πN coupling constant g, as prescribed by the Ward-Takahashi identity [16].
The calculation of these quantities in chiral perturbation theory may involve loops and other coupling constants but, at the end, results must be organized so as to reproduce the physical values of both m and g in T±pv [17]. For this reason, one uses the constant g, instead of (gA/fπ), since the former is indeed the observable determined by the residue of the nucleon pole [13, 15, 18]. The pv Born subamplitudes are given by

D^{+}_{pv} = \frac{g^2}{2m}\left[\frac{k'\!\cdot k}{s-m^2} + \frac{k'\!\cdot k}{u-m^2}\right] , \qquad (10)

B^{+}_{pv} = -g^2\left[\frac{1}{s-m^2} - \frac{1}{u-m^2}\right] , \qquad (11)

D^{-}_{pv} = \frac{g^2}{2m}\left[\frac{k\!\cdot k'}{s-m^2} - \frac{k\!\cdot k'}{u-m^2} - \frac{\nu}{m}\right] , \qquad (12)

B^{-}_{pv} = -g^2\left[\frac{1}{s-m^2} + \frac{1}{u-m^2} + \frac{1}{2m^2}\right] , \qquad (13)

where s and u are the usual πN Mandelstam variables. In the case of free pions, their chiral orders are respectively [D+pv, B+pv, D−pv, B−pv] → O[q2, q−1, q, q0], but important changes do occur when the pions become off-shell.

The amplitudes T±R receive contributions from both tree interactions and loops. The former can be read directly from the basic lagrangians and correspond to polynomials in t = (k′−k)2 and ν = (p′+p)·(k′+k)/4m, with coefficients given by renormalized LECs [15]. The latter are more complex and depend on Feynman integrals. In the description of πN amplitudes below threshold, one approximates both types of contributions by polynomials and writes [13, 19]

X_R = \sum_{m,n} x_{mn}\,\nu^{2m}\,t^{n} , \qquad (14)

where XR stands for D+R, B+R/ν, D−R/ν or B−R. The subthreshold coefficients xmn have the status of observables, since they can be obtained by means of dispersion relations applied to scattering data. As such, they constitute an important source of information about the values of the LECs to be used in effective lagrangians.

The isospin odd subthreshold coefficients include leading order terms, which implement the predictions made by Weinberg [20] and Tomozawa [21] for πN scattering lengths, given by

D^{-}_{WT} = \frac{\nu}{2 f_\pi^2} , \qquad B^{-}_{WT} = \frac{1}{2 f_\pi^2} . \qquad (15)

For free pions, one has [D−WT, B−WT] → O[q, q0], but these orders of magnitude also change when pions become virtual.

Quite generally, the ranges of nuclear interactions are determined by t-channel exchanges.
At O(q3), the TPE-3NP involves only single-pion exchanges among different nucleons and has the longest possible range. Another t-channel structure becomes apparent at O(q4), associated with the pion cloud of the nucleon, which gives rise to both scalar and vector form factors [18]. These effects extend well beyond 1 fm [22, 23] and a limitation of the power series given by Eq. (14) is that it cannot accommodate these ranges, since Fourier transforms of polynomials yield only δ-functions and their derivatives.

In the description of the πN amplitude produced by Becher and Leutwyler [15], one learns that the only sources of medium range (mr) effects are their diagrams k and l, which contain two pions propagating in the t-channel. In our derivation of the TPE-3NP, the loop content of these diagrams is not approximated by power series and, for free pions, the non-pole subamplitudes are written as

D^{+}_{R} = D^{+}_{mr}(t) + \left[\bar d^{+}_{00} + d^{+}_{10}\,\nu^2 + \bar d^{+}_{01}\,t\right]_{(2)} + \left[d^{+}_{20}\,\nu^4 + d^{+}_{11}\,\nu^2 t + \bar d^{+}_{02}\,t^2\right]_{(3)} , \qquad (16)

B^{+}_{R} = B^{+}_{mr}(t) + \left[b^{+}_{00}\,\nu\right]_{(1)} , \qquad (17)

D^{-}_{R} = D^{-}_{mr}(t) + \left[\nu/(2 f_\pi^2)\right]_{(1)} + \left[\bar d^{-}_{00}\,\nu + d^{-}_{10}\,\nu^3 + \bar d^{-}_{01}\,\nu t\right]_{(3)} , \qquad (18)

B^{-}_{R} = B^{-}_{mr}(t) + \left[1/(2 f_\pi^2) + \bar b^{-}_{00}\right]_{(0)} + \left[b^{-}_{10}\,\nu^2 + \bar b^{-}_{01}\,t\right]_{(1)} , \qquad (19)

where the labels (n) outside the brackets indicate the presence of O(qn) leading terms and mr denotes terms associated with the nucleon pion cloud. The bar symbol over some coefficients indicates that they do not include both Weinberg-Tomozawa and medium range contributions, which are accounted for explicitly. The functions D±R and B±R depend on the parameters fπ, gA, µ, m and on the LECs ci and d̄i, which appear in higher order terms of the effective lagrangian. The subthreshold coefficients are the door through which LECs enter our calculation and their explicit forms are given in Appendix B.

The dynamical content of the O(q3) πN amplitude is shown in Fig. 3. The first two diagrams correspond to PV Born amplitudes, whereas the third one represents the Weinberg-
3: (Color online) Representation of the πN amplitude used in the construction of the TPE- Tomozawa contact interaction, all of them with physical masses and coupling constants. The fourth graph summarizes the terms within square brackets in Eqs. (16-19) and depends on the LECs. Finally, the last two diagrams describe medium range effects owing to the nucleon pion cloud, associated with scalar and vector form factors. This decomposition of the πN amplitude has also been used in our derivation of the two-pion exchange components of the NN interaction [6, 7] and hence the present calculation is consistent with those results. III. INTERMEDIATE πN AMPLITUDE The combination of Figs. 2 and 3 gives rise to the TPE-3NP, associated with the six diagrams shown in Fig. 4. In the sequence, we discuss their individual contributions to the subamplitudes D± and B±. We are interested only in the longest possible component of the potential and numerators of expressions are systematically simplified by using k2 → µ2 and k ′2 → µ2. In configuration space, this corresponds to keeping only those terms which contain two Yukawa functions and neglecting interactions associated with Figs. 1 (c) and 1 • diagrams (a) and (b): The crosses in the nucleon propagators of Figs. 4 (a) and 4 (b) indicate that they do not include forward propagating components, so as to avoid double counting when the potential is used in the dynamical equation. The covariant evaluation of these contributions is based on Eqs. (10-13). Denoting by p̄ the momenta of the propagating + += + + + (a) (b) (c) (d) (e) (f) FIG. 4: (Color online) Structure of the O(q4) two-pion exchange three-nucleon potential nucleons, the factors 1/(s−m2) and 1/(u−m2) are decomposed as (p̄0)2−Ē2 2Ē(p̄0−Ē) 2Ē(p̄0+Ē) , (20) with Ē = m2+p̄2. The first term represents forward propagating nucleons, associated with the iteration of the OPEP, whereas the second one gives rise to connected contributions. 
Discarding the former and using the results of Appendix A, one has 1/({su} −m2)→ −1/ 4m2 + 3q2r+q ρ/3+16Q ρ/3± 10qr ·Qρ/ 3∓ 2qρ ·Qr/ . (21) After appropriate truncation, one obtains D+ab = − (2µ2−t)→ O(q2) , (22) B+ab → O(q 2) , (23) D−ab = − ν → O(q2) , (24) B−ab → O(q 2) , (25) where we have used the fact that, in the case of virtual pions, ν → O(q2). • diagrams (c) and (d): These contributions are purely polynomial, can be read directly from Eqs. (16-19), and are given by D+cd = − 16 πf 4π (2µ2−t)→ O(q2) , (26) B+cd → O(q 2) , (27) D−cd = 2f 2π ν → O(q2) , (28) B−cd = 2f 2π 2 c4m 8 πf 4π → O(q0) . (29) • diagrams (e) and (f): The medium range components of the intermediate πN amplitude D+e = 64π2f 4π (2t−µ2) (1−t/2µ2) Πt − 2π → O(q3) , (30) D+ef → O(q 4) , (31) B−e = g2Amµ 16π2f 4π (1−t/4µ2) Πt − π → O(q) , (32) where Πt is the dimensionless Feynman integral µ2 F (a) ← M = 2µ/a , F (a) = tan−1 µ (1−a2/2) . (33) The amplitude D−ef , proportional to ν, is O(q3) for free pions and here becomes O(q4). Thus, in fact, diagram (f) does not contribute to the TPE-3NP at O(q4). • full results: The Golberger-Treiman relation g/m = gA/fπ is valid up to O(q2) and can be used in diagrams (a) and (b). One then has σ(2µ2) (2µ2−t) + c3 + g2A(1+g 16πf 2π 128π2f 2π (1−2t/µ2) Πt , (34) where σ(t = 2µ2) = −4 c1 µ2 − 3g2Aµ 32πf 2π is the value of the scalar form factor at the Cheng-Dashen point [14]. The remaining ampli- tudes read B+ → O(q2) , (36) 1−g2A 2f 2π ν , (37) 1 + 4 c4m 2f 2π A(1+2g 16 πf 4π g2Amµ 16 π2f 4π (1−t/4µ2) Πt . (38) The subamplitudes D± and B± begin at O(q2) and one needs just the leading terms in the spinor matrix elements of Eq. (8), which is rewritten as T+ = 2mD+ , (39) T− = 2mD− + iσ(3) ·k′×kB− , (40) with D+ → O(q2)+O(q3), D− → O(q2), and B− → O(q0)+O(q). • O(q3) reduction: In order to compare our amplitudes with previous O(q3) results, one notes that, in case corrections are dropped, one would have (2µ2−t) , (41) 2f 2π 2 c4m . 
(42) These expressions agree with those derived directly from a chiral lagrangian [24], except for the terms within square brackets in both D+ and B−. The former corresponds to a Born contribution whereas the latter is due to diagram (c) in Fig. 4, associated with the Weinberg-Tomozawa term. IV. TWO-PION EXCHANGE POTENTIAL The expansion of the TPE-3NP up to O(q4) requires only leading terms in vertices and propagators. In order to derive the non-relativistic potential in momentum space, one divides Eq. (7) by the relativistic normalization factor 2m for each external nucleon leg and writes1 t̄3 = 4f 2π ′2+µ2 (1) ·k σ(2) ·k′ (1) ·τ (2)D+ − i τ (1) × τ (2) ·τ (3) (3) ·k′×k B− . (43) The configuration space potential has the form V3(r,ρ) = τ (1) ·τ (2) V +3 (r,ρ) + τ (1) × τ (2) ·τ (3) V −3 (r,ρ) + cyclic permutations, (44) 1 One notes that this expression is identical with Eq. (33) of Ref. [10] divided by 8m3. V +3 (r,ρ) = C (1) ·x̂31 σ(2) ·x̂23 U1(x31)U1(x23) + C+2 (1/9)σ(1) ·σ(2) [U(x31)−U2(x31)] [U(x23)−U2(x23)] + (1/3)σ(1) ·x̂23 σ(2) ·x̂23 [U(x31)−U2(x31)] U2(x23) + (1/3)σ(1) ·x̂31 σ(2) ·x̂31 U2(x31) [U(x23)−U2(x23)] + σ(1) ·x̂31 σ(2) ·x̂23 x̂31 ·x̂23 U2(x31)U2(x23) + C+3 σ (1) ·∇I31 σ(2) ·∇I23 ∇I31 ·∇I23 I0 − 2 I1 , (45) V −3 (r,ρ) = C (1/9)σ(1)×σ(2) ·σ(3) [U(x31)−U2(x31)] [U(x23)−U2(x23)] + (1/3)σ(3)×σ(1) ·x̂23 σ(2) ·x̂23 [U(x31)−U2(x31)] U(x23) + (1/3)σ(1) ·x̂31 σ(2)×σ(3) ·x̂31 U2(x31) [U(x23)−U2(x23)] + σ(1) ·x̂31 σ(2) ·x̂23 σ(3) ·x̂31×x̂23 U2(x31)U2(x23) + C−2 (1) · 31 −i∇ (2) ·x̂23 [U(x31)−U2(x31)] U1(x23) + σ(1) ·x̂31 σ(2) · 31 −i∇ U1(x31) [U(x23)−U2(x23)] + 3σ(1) ·x̂31 σ(2) ·x̂23 31 −i∇ · [x̂31 U2(x31)U1(x23) + x̂23 U1(x31)U2(x23)] + C−3 σ (1) ·∇I31 σ(2) ·∇I23 σ(3) ·∇I31×∇I23 I0 − I1/4 . (46) The profile functions are written in terms of the dimensionless variables xij = µ rij and read U(x) = , (47) U1(x) = − , (48) U2(x) = , (49) In = − (2π)3 (2π)3 ei(k·r31+k ·r23) ′2+µ2 Πt(t) . (50) The last function involves the loop integral given in Eq. 
(33) and is discussed further in Appendix C. The gradients ∇Iij act on the functions I n, whereas the ∇ ij act only on the wave function and give rise to non-local interactions, as discussed in Appendix D. The strength coefficients are the following combinations of the basic coupling constants C+1 = 64 π2f 4π σ(2µ2) , (51) C+2 = 32 π2f 4π m +mc3 + g2A(1+g 16πf 2π , (52) C+3 = 4096 π3f 6π , (53) C−1 = 256 π2f 4π m 1 + 4mc4 − g2A(1+2g 8πf 2π , (54) C−2 = g2A(g A−1)µ6 768 π2f 4π m , (55) C−3 = − 2048 π3f 6π . (56) V. STRENGTH COEFFICIENTS The strength constants of the potential involve a blend of four well determined param- eters, namely m = 938.28 MeV, µ = 139.57 MeV, gA = 1.267 and fπ = 92.4 MeV, with the scalar form factor at the Cheng-Dashen point and the LECs c3 and c4, which are less precise. As far as σ(2µ2) is concerned, we rely on the results [25] σ(2µ2)−σ(0) = 15.2± 0.4 MeV, σ(0) = 45±8 MeV, and adopt the central value σ(2µ2) = 60 MeV. The values quoted for the LECs in the literature vary considerably, depending on the empirical input employed and the chiral order one is working at. A sample of values is given in Table I. Our work is based on the O(q3) expansion of the intermediate πN amplitude and, for the sake of consistency, one must use LECs extracted at the same order. The kinematical conditions of the three-body interaction are such that the variable ν is O(q2), an order of magnitude smaller than the threshold value, ν = µ. This makes information encompassed in the subthreshold coefficients better suited to this problem and we use results from Appendix TABLE I: Some values of the LECs c3 and c4; m is the nucleon mass. 
Reference | Chiral order | πN input | mc3 | mc4
[26] | 3 | amplitude at ν = 0, t = 0 | −5.00 ± 1.43 | 3.62 ± 0.04
[26] | 3 | amplitude at ν = 0, t = 2µ²/3 | −5.01 ± 1.01 | 3.62 ± 0.04
[27] | 3 | scattering amplitude | −5.69 ± 0.04 | 3.03 ± 0.16
[15] | 4 | subthreshold coefficients | −3.4 | 2.0
[15] | 4 | scattering lengths | −4.2 | 2.3
tree | 2 | subthreshold coefficients | −3.6 | 2.0
this work | 3 | subthreshold coefficients | −4.9 | 3.3

B in order to write

mc3 = −m fπ² d+01 − g⁴A mµ/(16π fπ²) − 77 g²A mµ/(768π fπ²) , (57)

mc4 = fπ² b−00/2 − 1/4 + g²A(1+g²A) mµ/(16π fπ²) . (58)

Adopting the values for the subthreshold coefficients given by Höhler [13], namely d+01 = (1.14 ± 0.02) µ⁻³ and b−00 = (10.36 ± 0.10) µ⁻², one finds the figures shown in the last row of Table I. These, in turn, produce the strength coefficients displayed in Table II. For the sake of comparison, we also quote values employed in our earlier calculation [10] and in two TM’ versions [28] of the Tucson-Melbourne potential [8].

TABLE II: Strength coefficients in MeV.

Reference | C+1 | C+2 | C+3 | C−1 | C−2 | C−3
this work | 0.794 | −2.118 | 0.034 | 0.691 | 0.014 | −0.067
Brazil [10] | 0.92 | −1.99 | - | 0.67 | - | -
TM’(93) [28] | 0.60 | −2.05 | - | 0.58 | - | -
TM’(99) [28] | 0.91 | −2.26 | - | 0.61 | - | -

Changes in these parameters represent theoretical progress achieved over more than two decades and it is worth investigating their origins in some detail. With this purpose in mind, we compare the present results with those of our previous O(q3) calculation [10].

At the chiral order one is working here, new qualitative effects begin to show up, associated with both loops and non-local interactions. They are represented by the terms proportional to the coefficients C+3, C−2 and C−3 in Eqs. (45) and (46).

The πN coupling is now described by g²A µ²/fπ² = 3.66 whereas, previously, the factor g²µ²/m² = 3.97 was used. From a conceptual point of view, the latter should be preferred, since g is indeed the proper coupling observable.
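The numbers quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that Eqs. (57) and (58) have the form mc3 = −m fπ² d+01 − g⁴A mµ/(16π fπ²) − 77 g²A mµ/(768π fπ²) and mc4 = fπ² b−00/2 − 1/4 + g²A(1+g²A) mµ/(16π fπ²), as suggested by Eqs. (B2) and (B5), and it uses the central values of Höhler's coefficients. The function names are hypothetical.

```python
import math

# Inputs quoted in Sec. V (MeV where dimensional).
M_N = 938.28   # nucleon mass m
MU = 139.57    # pion mass mu
G_A = 1.267    # axial coupling g_A
F_PI = 92.4    # pion decay constant f_pi

# Central values of Hoehler's subthreshold coefficients,
# in units of mu^-3 and mu^-2 respectively.
D01_PLUS = 1.14
B00_MINUS = 10.36


def loop_scale():
    """Dimensionless combination m*mu / (pi * f_pi^2) shared by the loop terms."""
    return M_N * MU / (math.pi * F_PI**2)


def m_c3():
    """m*c3 from d01+ (assumed form of Eq. (57))."""
    tree = -M_N * F_PI**2 * D01_PLUS / MU**3
    loops = -(G_A**4 / 16.0 + 77.0 * G_A**2 / 768.0) * loop_scale()
    return tree + loops


def m_c4():
    """m*c4 from b00- (assumed form of Eq. (58))."""
    tree = F_PI**2 * B00_MINUS / (2.0 * MU**2) - 0.25
    loops = G_A**2 * (1.0 + G_A**2) / 16.0 * loop_scale()
    return tree + loops


def coupling_factor():
    """The combination g_A^2 mu^2 / f_pi^2 used for the piN coupling."""
    return (G_A * MU / F_PI) ** 2


if __name__ == "__main__":
    print(f"m c3 = {m_c3():.2f}")
    print(f"m c4 = {m_c4():.2f}")
    print(f"g_A^2 mu^2 / f_pi^2 = {coupling_factor():.2f}")
```

Under these assumptions the script gives values that round to the last row of Table I and to the quoted coupling factor; the uncertainties of d+01 and b−00 are not propagated.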
In chiral perturbation theory, the difference between both forms is ascribed to the parameter ∆GT = −2d18µ2/g, which describes the Goldberger-Treiman discrepancy [15]. As this is a O(q2) effect, both forms of the coupling become equivalent in the present calculation. On the other hand, the empirical value of g is subject to larger uncertainties and the form based on gA is more precise. Our present choice accounts for a decrease of 8% in all parameters. The relations C+1 ↔ Cs, C+2 ↔ Cp and C−1 ↔ −C ′p allow one to compare Eqs. (45) and (46) with Eq. (67) of Ref. [10]. One notes that the latter contains an unfortunate misprint in the sign of the term proportional to C ′p, as pointed out in Ref. [29]. In the earlier calculation, the coefficient Cs was based on a parameter [30] ασ = 1.05µ −1, which corresponds to σ(2µ2) = 64 MeV. The results of Table II show that the values of C+2 and C−1 are rather close to those of Cp and −C ′p. This can be understood by rewriting Eqs. (52) and (54) in terms of the subthreshold coefficient d+01 and b 00 as follows C+2 = − 32 π2f 4π m mf 2π d 29g2Amµ 768πf 2π , (59) C−1 = 128 π2f 4π m f 2π b g2Amµ 16πf 2π . (60) Numerically, this amounts to C+2 = −(1.845 + 0.110 + [0.163]) MeV and C−1 = (0.624 + [0.067]) MeV. The second term in the former equation was overlooked in Ref. [10] and should have been considered there. The square brackets2 correspond to next-to-leading order contributions and yield corrections of about 8% and 11% to the leading terms in C+2 and C−1 , respectively. 3 As the model used in Ref. [10] was explicitly designed to reproduce the subthreshold coefficients quoted by Höhler [13], it produces the very same contributions as the first terms in Eqs. (59) and (60). 2 These factors can be traced back to loop diagrams in Fig. 3 and are dynamically related with the term proportional to C± , as we discuss in Appendix C. 
3 When comparing the new coefficients with those in the second row of Table II, one should also take into account the 8% effect due to the Goldberger-Treiman discrepancy. VI. NUMERICAL RESULTS FOR THREE-NUCLEON SYSTEMS In order to test the effects of the TPE-3NP at O(q4), in this section, we present some numerical results of Faddeev calculations for three-nucleon bound and scattering states. The calculations are based on a configuration space approach, in which we solve the Faddeev integral equations [31–33], Φ3 = Ξ12,3 + E + iǫ−H0 − V12 × [V12 (Φ1 + Φ2) +W3 (Φ1 + Φ2 + Φ3)] , (and cyclic permutations), (61) where Ξ12,3, which does not appear in the bound state problem, is an initial state wave function for the scattering problem, H0 is a three-body kinetic operator in the center of mass, V12 is a nucleon-nucleon (2NP) potential between nucleons 1 and 2, and W3 is the 3NP displayed in Fig. 2. Partial wave states of a 3N system, in which both NN and 3N forces act, are restricted to those with total NN angular momenta j ≤ 6 for bound state calculations, and j ≤ 3 for scattering state calculations. The total 3N angular momentum (J) is truncated at J = 19/2, while 3NP is switched off for 3N states with J > 9/2 for scattering calculations. These truncation procedures are confirmed to give converged results for the purposes of the present work. When just local terms are retained, t̄3 in Eq. (43) can be cast in the conventional form [8–10] t̄3 = − 4f 2π F (k2) 2 + µ2 F (k′2) ′2 + µ2 (σ(1) · k)(σ(2) · k′) (τ (1) · τ (2)){a+ b(k · k′)} −(iτ (1) × τ (2) · τ (3))(iσ(3) · k′ × k)d , (62) where the coefficients, a, b, and d are related with our potential strength coefficients by [C+1 , C 2 , C 1 ] = (4π)2 [−aµ4, bµ6, −dµ6] . (63) The values of the coefficients, a, b, and d for the TPE-3NP at O(q4) are shown in Table III, as BR-O(q4). In this table, the values for the older version of the Brazil TPE-3NP, BR(83) [10], and the potential up to O(q3) given by Eqs. 
(41-42), BR-O(q3), are shown as well. TABLE III: Coefficients a, b, and d of the TPE-3NP. 3NP a µ b µ3 d µ3 BR-O(q4) -0.981 -2.617 -0.854 BR-O(q3) -0.736 -3.483 -1.204 BR(83) -1.05 -2.29 -0.768 In Eq. (62), the function F (k2) represents a πNN form factor. We apply a dipole form factor with the cut off mass Λ, Λ2−µ2 , which modifies the profile functions U(x), U1(x), and U2(x) in Eqs. (47-49) as U(x) = Λ̄2 − 1 , (64) U1(x) = − + Λ̄2 e−Λ̄x Λ̄2 − 1 e−Λ̄x, (65) U2(r) = (Λ̄x)2 e−Λ̄x −Λ̄(Λ̄ 2 − 1) e−Λ̄x, (66) with Λ̄ = Λ/µ. We choose the Argonne V18 model (AV18) [34] for a realistic NN potential, by which the triton binding energy (B3) becomes 7.626 MeV, underbinding it by about 0.9 MeV compared to the empirical value, 8.482 MeV. As it is well known, the introduction of the TPE-3NP remedies this deficiency. The amount of attractive contribution depends on the cutoff mass Λ, as shown in Fig. 5. The solid curve shows the dependence of B3 on Λ for the calculation with the BR-O(q4) 3NP in addition to the AV18 2NP (AV18+BR-O(q4)). In the figure, the empirical value and the AV18 result are displayed by the dashed and dotted horizontal lines, respectively. Due to the strong attractive character of the 3NP, B3 is reproduced by choosing a rather small value of Λ, namely 660 MeV. In the same figure, the Λ-dependence of B3 for AV18+BR-O(q3) is displayed by a dashed curve and that for the AV18+BR(83) by a dotted curve. From these curves we see that AV18+BR-O(q3) reproduces B3 for Λ = 620 MeV and AV18+BR(83) for Λ = 680 MeV. In other words, the BR-O(q4) 3NP is slightly more attractive than the BR(83) 3NP and a large attractive effect occurs when one moves from the TPE O(q4) 3NP to the O(q3) 3NP. This tendency is strongly correlated with the magnitude of the coefficient b, as shown in Table III. This can be understood as a dominant contribution to B3 from the component of the TPE-3NP associated with the coefficients b. 
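The link between Tables II and III, and the correlation between |b| and the cutoff mass, can be checked with a short script. This is a sketch under a stated assumption: the prefactor of Eq. (63) is taken to be g²A/(4fπ²(4π)²), which is inferred here from the coupling in Eq. (62) and from the tabulated numbers, not quoted from the text. The helper names are hypothetical, and the conversion is applied only to the BR-O(q4) row, since the older potential used a different form of the πN coupling.

```python
import math

# Constants quoted in Sec. V.
MU = 139.57    # pion mass (MeV)
G_A = 1.267    # axial coupling
F_PI = 92.4    # pion decay constant (MeV)

# Table III entries (a*mu, b*mu^3, d*mu^3) and the cutoff mass (MeV) that
# reproduces the triton binding energy for each 3NP, as quoted in the text.
TABLE_III = {
    "BR-O(q4)": {"a": -0.981, "b": -2.617, "d": -0.854, "cutoff": 660.0},
    "BR-O(q3)": {"a": -0.736, "b": -3.483, "d": -1.204, "cutoff": 620.0},
    "BR(83)": {"a": -1.05, "b": -2.29, "d": -0.768, "cutoff": 680.0},
}


def conversion_factor():
    """Assumed prefactor g_A^2 mu^3 / (4 f_pi^2 (4 pi)^2), in MeV."""
    return G_A**2 * MU**3 / (4.0 * F_PI**2 * (4.0 * math.pi) ** 2)


def strengths(a, b, d):
    """Map (a*mu, b*mu^3, d*mu^3) to (C1+, C2+, C1-) in MeV."""
    k = conversion_factor()
    return (-a * k, b * k, -d * k)


def cutoffs_ordered_by_b():
    """Cutoff masses listed from the largest |b| to the smallest."""
    rows = sorted(TABLE_III.values(), key=lambda r: abs(r["b"]), reverse=True)
    return [r["cutoff"] for r in rows]


if __name__ == "__main__":
    row = TABLE_III["BR-O(q4)"]
    print(strengths(row["a"], row["b"], row["d"]))
    print(cutoffs_ordered_by_b())
```

With the assumed prefactor, the BR-O(q4) row of Table III maps onto the "this work" entries C+1, C+2 and C−1 of Table II to three decimal places, and the cutoff needed to reproduce B3 grows as |b| decreases.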
This dominance is shown in Table IV, where we tabulate the calculated B3 for the AV18 2NP plus the full BR-O(q4) 3NP, and for the AV18 2NP plus each individual term of the BR-O(q4) 3NP, associated with the coefficients a, b, and d.

FIG. 5: (Color online) The triton binding energy B3 as a function of the cutoff mass Λ (in MeV) of the πNN dipole form factor. The solid curve denotes the result for AV18+BR-O(q4), the dashed curve for AV18+BR-O(q3), and the dotted curve for AV18+BR(83). The horizontal lines denote the AV18 result (dotted line) and the empirical value (dashed line).

TABLE IV: Triton binding energy for the AV18 2NP plus the BR-O(q4) 3NP, and for the AV18 2NP plus each term of the BR-O(q4) 3NP, with Λ = 660 MeV. ∆B3 is the difference between the calculated binding energy and that of the AV18 calculation.

Potential | B3 (MeV) | ∆B3 (MeV)
AV18+BR-O(q4) | 8.492 | 0.866
AV18+BR-O(q4)-a | 7.673 | 0.047
AV18+BR-O(q4)-b | 8.241 | 0.615
AV18+BR-O(q4)-d | 7.787 | 0.161

In Fig. 6, we compare six calculated observables for proton-deuteron elastic scattering, namely the differential cross section σ(θ), the vector analyzing powers of the proton Ay(θ) and of the deuteron iT11(θ), and the tensor analyzing powers of the deuteron T20(θ), T21(θ), and T22(θ), at incident proton energy ElabN = 3.0 MeV (or incident deuteron energy Elabd = 6.0 MeV), with the experimental data of Refs. [35, 36]. In the figure, the solid curves designate the AV18 calculations and the dashed curves the AV18+BR-O(q4) calculations, which are almost indistinguishable from the AV18+BR-O(q3) and AV18+BR(83) calculations once the cutoff masses are chosen so that B3 is reproduced. We recall that the TPE-3NP has only minor effects on the vector analyzing powers. This happens because the exchange of pions gives rise essentially to scalar and tensor components of the nuclear interaction in spin space, which have little influence on the vector analyzing powers. On the other hand, as noted in Refs.
[37, 38], at ElabN = 3.0 MeV, the TPE-3NP gives a wrong contribution to the tensor analyzing power T21(θ) around θ = 90◦.

In Fig. 7, we compare calculations of observables in neutron-deuteron elastic scattering at ElabN = 28.0 MeV with the experimental data for proton-deuteron scattering of Ref. [39]. At this energy, discrepancies between the calculations and the experimental data in the vector analyzing power iT11(θ) appear at θ ∼ 100◦, where iT11(θ) has a minimum, and at θ ∼ 140◦, where iT11(θ) has a maximum, and they are not removed by the introduction of the TPE-3NP. On the other hand, while the AV18 calculation almost reproduces the experimental data of T21(θ) at θ ∼ 90◦, the introduction of the TPE-3NP gives a wrong effect, as in the ElabN = 3 MeV case.

FIG. 6: (Color online) Proton-deuteron elastic scattering observables at ElabN = 3.0 MeV (Elabd = 6.0 MeV), as functions of the scattering angle in degrees. Solid curves are calculations for the AV18 potential, and dashed curves for the AV18+BR-O(q4). Experimental data are taken from Refs. [35, 36].

These results set the stage for the introduction of terms associated with the coefficients C+3, C−2, and C−3, Eqs. (45-46), which are new features of the O(q4) expansion of the TPE-3NP. Terms proportional to C±3, which include the rather complicated function I(r31, r23) given in Appendix C, arise from a loop integral, Eq. (33). On the other hand, the term with C−2 corresponds to a non-local potential and includes the gradient operator ∇ij, which acts on the wave function and arises from the kinematical variable ν. Neither kind of contribution can be expressed in the conventional local form shown in Eq. (62), which involves only the coefficients C+1, C+2, and C−1, and the full evaluation of their effects would require an extensive rebuilding of large numerical codes.
However, the coefficients of the new terms are small, and in this exploratory paper we estimate their influence over observables as follows. The function I(r31, r23) is approximated by Eq. (C11), which amounts to replacing Πt(t) by a factor −π. Further, the kinematical factors in front of Πt(t) in Eqs. (34) and (38), namely 1 − 2t/µ² and 1 − t/4µ², are approximately evaluated by putting t ≈ 2µ², which yields −3 and 1/2, respectively. By this procedure, the coefficients C+3 and C−3 are absorbed into C+2 and C−1, or into b and d, respectively, and one has

∆C+2 = −3 C+3 , ∆C−1 = C−3/2 . (67)

Numerically, this corresponds to ∆C+2 = −0.102 MeV ∼ C+2/20 and ∆C−1 = −0.034 MeV ∼ −C−1/20, or ∆b = −0.125 (µ⁻³) and ∆d = 0.042 (µ⁻³). The net change produced in the triton binding energy is +0.026 MeV (+0.037 MeV from ∆C+2 and −0.011 MeV from ∆C−1), just about 1/30 of the total increase in B3 due to the local terms of the BR-O(q4) TPE-3NP.

FIG. 7: (Color online) Nucleon-deuteron elastic scattering observables at ElabN = 28.0 MeV, as functions of the scattering angle in degrees. Curves are calculations for neutron-deuteron scattering. Solid curves denote calculations for the AV18 potential and dashed curves for the AV18+BR-O(q4). Experimental data are those for proton-deuteron scattering taken from Ref. [39].

The non-local term proportional to C−2 is more involved and we restrict ourselves to a rough assessment of its role. We replace the variable ν by a constant 〈ν〉 and assume, for example, that 〈ν〉 = µ²/m. This changes the C−2 term in Eq. (46) into the very simple form

V−3(r,ρ) = C−1 (· · ·) + i C̃−2 σ(1)·x̂31 σ(2)·x̂23 U1(x31) U1(x23) + C−3 (· · ·) , (68)

C̃−2 = −g²A(1 − g²A) µ⁶/(512π² fπ⁴ m) = 0.021 MeV . (69)

Except for the isospin factor, this term is similar to that with C+1 (or a), which adds about 0.05 MeV to the triton binding energy.
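The absorption of the loop terms into the local coefficients, Eq. (67), can be retraced numerically. The sketch below evaluates the kinematical factors at t = 2µ², applies them to the C±3 values of Table II, and converts the shifts into ∆b and ∆d. The conversion factor g²A µ³/(4fπ²(4π)²) is an assumption inferred from the tabulated numbers, and the function names are hypothetical.

```python
import math

# Constants quoted in Sec. V.
MU, G_A, F_PI = 139.57, 1.267, 92.4   # MeV, dimensionless, MeV

# Loop coefficients from the "this work" row of Table II (MeV).
C3_PLUS, C3_MINUS = 0.034, -0.067


def kinematic_factors(t_over_mu2):
    """Factors 1 - 2t/mu^2 and 1 - t/(4 mu^2) multiplying Pi_t in Eqs. (34), (38)."""
    return (1.0 - 2.0 * t_over_mu2, 1.0 - t_over_mu2 / 4.0)


def absorbed_shifts(c3_plus, c3_minus, t_over_mu2=2.0):
    """Shifts of C2+ and C1- when Pi_t -> -pi and t is frozen, as in Eq. (67)."""
    f_plus, f_minus = kinematic_factors(t_over_mu2)
    return (f_plus * c3_plus, f_minus * c3_minus)


def to_b_d(delta_c2_plus, delta_c1_minus):
    """Convert the shifts (MeV) into changes of b and d, in units of mu^-3."""
    k = G_A**2 * MU**3 / (4.0 * F_PI**2 * (4.0 * math.pi) ** 2)  # assumed, MeV
    return (delta_c2_plus / k, -delta_c1_minus / k)


if __name__ == "__main__":
    dc2, dc1 = absorbed_shifts(C3_PLUS, C3_MINUS)
    db, dd = to_b_d(dc2, dc1)
    print(f"dC2+ = {dc2:.3f} MeV, dC1- = {dc1:.3f} MeV")
    print(f"db = {db:.3f}, dd = {dd:.3f}  (units of mu^-3)")
```

Within rounding of the tabulated inputs, the output agrees with the values quoted after Eq. (67).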
Since the potential strength C̃−2 is about 3% of C+1, its contribution to the binding energy may be estimated to be a tiny 0.001 MeV.

VII. CONCLUSIONS

In the framework of chiral perturbation theory, three-nucleon forces begin at O(q3), with a relatively simple long-range component due to the exchange of two pions. At O(q4), on the other hand, a large number of different processes intervene and a full description becomes rather complex. For this reason, here we concentrate on a subset of O(q4) interactions, namely that which still involves the exchange of just two pions. This part of the 3NP is closely related with the πN amplitude, and the expansion of the former up to O(q4) depends on the latter at O(q3).

Our expressions for the potential are given in Eqs. (44-56) and the new chiral layer of the TPE-3NP considered in this work gives rise to both numerical corrections to the strength coefficients of already existing terms (C+1, C+2, C−1) and new structures in the profile functions. Changes in the numerical coefficients lie in the neighborhood of 10% and can be read from Tables II and III. New structures, on the other hand, arise either from loop functions representing form factors or from the non-local terms associated with gradients acting on the wave function. They correspond to the terms proportional to the parameters C+3, C−2 and C−3, which are small and compatible with perturbative effects.

In order to insert our results into a broader picture, in Table V we show the orders at which the various effects begin to appear, including the drift potential derived recently [40].

TABLE V: Chiral picture for two- and three-body forces.
beginning TWO-BODY TWO-BODY THREE-BODY
O(q0) OPEP: V −T , V
O(q2) OPEP: V −D TPEP: V T , V
O(q3) TPEP: V −LS, V T , V SS ;V C , V LS TPEP: C 1 , C
O(q4) TPEP: V −D ;V Q , V D TPEP: C 3 , C

The influence of the new TPE-3NP over three-body observables has been assessed in both static and scattering environments, adopting the Argonne V18 potential for the two-body interaction. In order to reproduce the empirical triton binding energy, the O(q4) potential requires a cutoff mass of 660 MeV. Comparing this with the value of 680 MeV for the 1983 Brazil TPE-3NP, one learns that the later version is more attractive.

In the study of proton-deuteron elastic scattering, we have calculated cross sections σ(θ), vector analyzing powers Ay(θ) of the proton and iT11(θ) of the deuteron, and tensor analyzing powers T20(θ), T21(θ), and T22(θ) of the deuteron, at energies of 3 and 28 MeV. Results are displayed in Figs. 6 and 7, where one can see that there is little sensitivity to the changes induced in the strength parameters when one goes from O(q3) to O(q4). Old problems, such as the Ay(θ) puzzle, remain unsolved.

The present version of the TPE-3NP contains new structures, associated with loop integrals and non-local operators. Their influence over observables has been estimated and found to be at least one order of magnitude smaller than other three-body effects. A more detailed study of this part of the force is under way.

APPENDIX A: KINEMATICS

The coordinate describing the position of nucleon i is ri and one uses the combinations

R = (r1+r2+r3)/3 , r = r2−r1 , ρ = (2r3−r1−r2)/√3 , (A1)

which yield

r1 = R − r/2 − ρ/(2√3) , r2 = R + r/2 − ρ/(2√3) , r3 = R + ρ/√3 . (A2)

The momentum of nucleon i is pi and one defines

P = p1+p2+p3 , pr = (p2−p1)/2 , pρ = (2p3−p1−p2)/(2√3) . (A3)

Initial momenta p and final momenta p′ are used in the combinations

Q = (P′+P)/2 , q = (P′−P) , (A4)

Qr = (p′r+pr)/2 , qr = (p′r−pr) , (A5)

Qρ = (p′ρ+pρ)/2 , qρ = (p′ρ−pρ) .
(A6) In the CM, one has P = 0 and the three-momenta are given by p1 = −(Qr−qr/2)− (Qρ−qρ/2)/ 3 , p′1 = −(Qr+qr/2)− (Qρ+qρ/2)/ 3 , (A7) p2 = (Qr−qr/2)− (Qρ−qρ/2)/ 3 , p′2 = (Qr+qr/2)− (Qρ+qρ/2)/ 3 , (A8) p3 = 2(Qρ−qρ/2)/ 3 , p′3 = 2(Qρ+qρ/2)/ 3 . (A9) Energy conservation for on-shell particles yield the non-relativistic constraint Qr ·qr +Qρ ·qρ = 0 . (A10) The momenta of the exchanged pions are written as k = p1 − p′1 , k′ = p′2 − p2 , (A11) k0 = −(qr+qρ/ 3)·(Qr+Qρ/ 3)/m , k = qr+qρ/ 3 , (A12) ′0 = (qr−qρ/ 3)·(Qr−Qρ/ 3)/m , k′ = qr−qρ/ 3 , (A13) and the Mandelstam variables for nucleon 3 read s = (p3+k) 2 = m2 − (qr+qρ/ 3) · (qr+2Qr−qρ/ 3Qρ) +O(q4) , (A14) u = (p3−k′)2 = m2 − (qr−qρ/ 3) · (qr+2Qr+qρ/ 3Qρ) +O(q4) , (A15) ν = (s−u)/4m = −2 qr ·Qρ/ 3 +O(q4) . (A16) In the evaluation of the intermediate πN amplitude, one needs [ū(p′) u(p)](3) ≃ 2m+O(q2) , (A17) ū(p′) σµν(p ′−p)µKν u(p)](3) ≃ 2 iσ(3) ·qρ×qr/ 3 +O(q4) . (A18) The πN vertex for nucleon 1 is associated with [ū(p′) γ5 u(p)] (1) ≃ σ(1) ·(qr+qρ/ 3) +O(q3) , (A19) and results for nucleon 2 are obtained by making qr → −qr. APPENDIX B: SUBTHRESHOLD COEFFICIENTS The polynomial parts of the amplitudes T±R , Eqs. (30-35), are determined by the sub- threshold coefficients of Ref. [15]. The terms relevant to the O(q3) expansion are written as d+00 = − 2 (2c1 − c3) µ2 8 g4A µ 64 π f 4π 3 g2A µ 64 π f 4π , (B1) d+01 = − 48 g4A µ 768 π f 4π 77 g2A µ 768 π f 4π , (B2) d+02 = 193 g2A 15360 π f 4π µ , (B3) d−00 = 2 f 2π +O(q2) , (B4) b−00 = 2 f 2π 2 c4 m g4A m µ 8 π f 4π g2A m µ 8 π f 4π , (B5) b−01 = g2A m 96 π f 4π µ , (B6) where the parameters ci and d̃i are the usual coupling constants of the chiral lagrangians of order 2 and 3 respectively [41] and the tilde over the latter indicates that they were renormalized [15]. Terms within square brackets labeled (mr) in these results are due to the medium range diagrams shown in Fig. 3 and have been included explicitly into the functions D±mr and B mr. 
Terms bearing the (WT ) label were also explicitly considered in Eqs. (15-19). The subthreshold coefficients are determined from πN scatterig data and a set of experimental values is given in Ref. [13]. APPENDIX C: FUNCTIONS In The functions In, describing loop contributions, are given by In(r31, r23) = − (2π)3 (2π)3 ei(k·r31+k ·r23) ′2+µ2 Πt(t) . (C1) Using the definition Eq. (33) and the Jacobi variables Eq. (A1), one writes In(r31, r23) = I(r31, r23) , (C2) I(r31, r23) = 128π da tan−1 µ (1−a2/2) L(a; r,ρ) (C3) L(a; r,ρ) = (2π)3 (2π)3 ei(Q·r− 3q·ρ/2) a2q2+4µ2 [(Q−q)2+µ2] [(Q+q)2+µ2] . (C4) The numerical evaluation of the function L is can be simplified by using alternative repre- sentations. • form 1: One uses the Feynman procedure for manipulating denominators, which yields L(a; r,ρ) = (2π)3 (2π)3 ei(Q·r− 3q·ρ/2) a2q2+4µ2 [(Q2+q2/4+µ2)−(1−2b)q ·Q]2 (2π)3 ei[(1−2b)r− 3ρ]·q/2 a2q2+4µ2 e−Θ r µ2+b(1−b) q2 . (C5) Performing the angular integration over q, one has L(a; r,ρ) = 16 π3 e−Θ r Θ (a2q2+4µ2) sin q [(1− 2b) r − 3ρ]/2 [(1− 2b) r − 3ρ]/2 . (C6) • form 2: The Fourier transform dx e−ik·x allows one to write L(a; r,ρ) = e−µ|r31+z| |r31+z| e−µ|r23−z| |r23−z| e−2µ z/a . (C8) These results may be further simplified by means of approximations. • heavy baryon approximation: In the limit m → ∞, corresponding to the heavy baryon case, one uses F (a)→ 4π/a2 in Eq. (33) and Eqs. (C5) and (C7) yield, respectively, I(r31, r23) ≃ tan−1 e−Θ r sin q [(1− 2b) r − 3ρ]/2 [(1− 2b) r − 3ρ]/2 , (C9) I(r31, r23) ≃ e−µ|r31+z| |r31+z| e−µ|r23−z| |r23−z| e−2µ z 2µ z2 . (C10) • multipole approximation: The integrand in Eq. (C10) is peaked around z = 0 and a multipole expansion of the Yukawa functions produces I(r31, r23) ≃ U(x31) U(x23) + · · · . (C11) The same result can also be obtained by using the expansion Πt(t) ∼ −π[1 + t/12µ2 + t2/80µ4 + · · · ], valid for low t, directly into Eq. (C1). 
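To get a feeling for how much the replacement of Πt(t) by −π neglects, one can evaluate the truncated low-t series quoted above for a few sample values of t/µ². This is only an illustrative sketch: the function name and the sample points are arbitrary choices, and the series is used strictly as quoted, with no attempt to evaluate the full integral of Eq. (33).

```python
import math

# Coefficients of the low-t expansion quoted in Appendix C:
# Pi_t(t) ~ -pi * [1 + t/(12 mu^2) + t^2/(80 mu^4) + ...]
SERIES_COEFFS = (1.0, 1.0 / 12.0, 1.0 / 80.0)


def pi_t_low_t(t_over_mu2, order=2):
    """Truncated low-t series for Pi_t; the argument is t in units of mu^2."""
    x = t_over_mu2
    series = sum(c * x**n for n, c in enumerate(SERIES_COEFFS[: order + 1]))
    return -math.pi * series


if __name__ == "__main__":
    for x in (0.0, -0.5, -1.0):
        ratio = pi_t_low_t(x) / (-math.pi)
        print(f"t = {x:+.1f} mu^2 : Pi_t / (-pi) = {ratio:.4f}")
```

Setting `order=0` reproduces the leading approximation Πt → −π; the higher terms show how quickly the correction grows as |t| approaches µ².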
APPENDIX D: NON-LOCAL TERM

In configuration space, the variable $Q_\rho$ corresponds to a non-local operator, represented by a gradient acting on the wave function. In order to make the dependence of $\bar t_3$ on $Q_\rho$ explicit, one writes

$$\bar t_3 = [Q_\rho]_i\, X_i(q_r, q_\rho), \tag{D1}$$

where $X$ is a generic three-vector, and evaluates the matrix element

$$\langle\psi|W|\psi\rangle = -\left[\cdots\right]^{12}\int dr'\,d\rho'\,dr\,d\rho\;\psi^*(r',\rho')\,\psi(r,\rho)\int dQ_r\,dQ_\rho\,dq_r\,dq_\rho\; e^{i[Q_r\cdot(r'-r)+Q_\rho\cdot(\rho'-\rho)+q_r\cdot(r'+r)/2+q_\rho\cdot(\rho'+\rho)/2]}\;\bar t_3(Q_r,Q_\rho,q_r,q_\rho)$$

$$= \cdots\int dr\,d\rho\,\left[\cdots\psi^*(r,\rho)\,\psi(r,\rho)+\psi^*(r,\rho)\,\nabla_\rho\,\psi(r,\rho)\right]\cdots\int dq_r\,dq_\rho\; e^{i[q_r\cdot r+q_\rho\cdot\rho]}\,X_i(q_r,q_\rho). \tag{D2}$$

This yields the potential

$$V_3(r,\rho) = -\frac{\cdots}{(2\pi)^6}\cdots\int dq_r\,dq_\rho\; e^{i[q_r\cdot r+q_\rho\cdot\rho]}\,X_i(q_r,q_\rho), \tag{D3}$$

where the operator $\nabla$ acts only on the wave function. An alternative form can be obtained by integrating Eq. (D2) by parts, and one finds

$$V_3(r,\rho) = -\frac{\cdots}{(2\pi)^6}\left[\cdots\int dq_r\,dq_\rho\; e^{i[q_r\cdot r+q_\rho\cdot\rho]}\cdots X(q_r,q_\rho) - i\,\nabla^{\mathrm{wf}}_\rho\cdots\int dq_r\,dq_\rho\; e^{i[q_r\cdot r+q_\rho\cdot\rho]}\,X(q_r,q_\rho)\right]. \tag{D4}$$

In the case of the three-body force, the only non-local contribution is associated with the subamplitude $D^-$, Eq. (37), which yields

$$X_i = -i\,\tau^{(1)}\times\tau^{(2)}\cdot\tau^{(3)}\;\cdots\frac{1}{k'^2+\mu^2}\;\sigma^{(1)}\cdot k\;\sigma^{(2)}\cdot k'\;\cdots g_A^2\,(g_A^2-1)\cdots(k'+k)_i. \tag{D5}$$

The action of $\nabla_\rho$ on the second term of Eq. (D4) gives rise to an integrand proportional to $(k'^2-k^2)$, which has short range and does not contribute to the TPE-3NP. Therefore it is neglected.

ABSTRACT

We present the expansion of the two-pion exchange three-nucleon potential (TPE-3NP) to chiral order $q^4$, which corresponds to a subset of all possibilities at this order and is based on the $\pi N$ amplitude at $O(q^3)$. Results encompass both numerical corrections to strength coefficients of previous $O(q^3)$ terms and new structures in the profile functions. The former are typically smaller than 10% whereas the latter arise from either loop functions or non-local gradients acting on the wave function. The influence of the new TPE-3NP over static and scattering three-body observables has been assessed and found to be small, as expected from perturbative corrections.
<|endoftext|><|startoftext|> Interference effects in above-threshold ionization from diatomic molecules: determining the internuclear separation

H. Hetzheim,1,2 C. Figueira de Morisson Faria,3 and W. Becker2
1 Max-Planck-Institut für Kernphysik, Saupfercheckweg 1, 69117 Heidelberg, Germany
2 Max-Born-Institut für nichtlineare Optik und Kurzzeitspektroskopie, Max-Born-Str. 2A, D-12489 Berlin, Germany
3 Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, United Kingdom
(Dated: October 28, 2018)

We calculate angle-resolved above-threshold ionization spectra for diatomic molecules in linearly polarized laser fields, employing the strong-field approximation. The interference structure resulting from the individual contributions of the different scattering scenarios is discussed in detail, with respect to the dependence on the internuclear distance and molecular orientation. We show that, in general, the contributions from the processes in which the electron is freed at one center and rescatters off the other obscure the interference maxima and minima obtained from single-center processes. However, around the boundary of the energy regions for which rescattering has a classical counterpart, such processes play a negligible role and very clear interference patterns are observed. In such energy regions, one is able to infer the internuclear distance from the energy difference between adjacent interference minima.

I. INTRODUCTION

The interaction of matter with an intense laser field ($I \gtrsim 10^{13}$ W/cm$^2$) leads to several phenomena, such as above-threshold ionization (ATI) or high-order harmonic generation (HHG). Such phenomena owe their existence to physical mechanisms in which an electron reaches the continuum, by tunneling or multiphoton ionization, at an instant $t'$. Subsequently, it is accelerated by the field and driven back towards its parent ion, or molecule, with which it rescatters or recombines at a later time $t$ [1].
Such laser-induced recombination or rescattering processes take place within a fraction of a laser-field cycle. The period of a typical near-infrared Ti:sapphire laser pulse is $T = 2\pi/\omega \sim 2.6$ fs. Thus, HHG and ATI occur on a time scale of hundreds of attoseconds [2]. Hence, above-threshold ionization and high-order harmonic generation may be employed for probing, or even controlling, dynamic processes with attosecond and sub-angstrom resolution.

This fact, together with new alignment techniques, has opened a whole new range of possibilities for studying molecules in strong laser fields, employing high-energy photoelectrons or high-order harmonic radiation. Concrete examples are the attosecond reconstruction of the nuclear motion in a molecule [3], the real-time imaging of vibrational wavepackets [4], the tomographic reconstruction of molecular orbitals [5], the time-resolved measurement of intramolecular quantum-interference effects [6], or the determination of internuclear distances [7].

These applications are a direct consequence of the fact that a molecule possesses a very specific configuration of ions from which the electron may leave, or off which it may rescatter causing above-threshold ionization, or recombine generating high harmonics. This leads to characteristic quantum-interference patterns in the HHG or ATI spectra, in which structural information about the molecule is hidden. This is true both for polyatomic [8] and diatomic molecules [6, 7, 9, 10, 11, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]. In particular for diatomic molecules, it has been shown that the high-order harmonic or ATI spectra exhibit overall maxima and minima, which are highly dependent on the spatial separation between both centers in the molecules, and can be described as the interference between two radiating point sources.
In this sense, HHG or ATI by a diatomic molecule may be viewed as the microscopic analog of a double-slit experiment [6, 10, 11]. Furthermore, such features depend on the symmetry of the highest occupied molecular orbitals, and on the alignment angle of the molecule with respect to the laser-field polarization [6, 7, 10, 11, 12, 13, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27].

Specifically in the diatomic case, several aspects of this interference have been extensively studied in the past few years, such as the influence of the orbital symmetry, the internuclear distance, the alignment angles [6, 7, 10, 11, 12, 13, 15, 19, 20, 21, 22, 23, 27], and molecular vibration [24, 25, 26], as well as the role of the laser-field shape [16, 17] or polarization [18]. Furthermore, an adequate modeling of bound molecular states, in comparison with existing ionization experiments [14], has also raised considerable debate [15, 19, 20, 27].

For that purpose, both the purely numerical solution of the time-dependent Schrödinger equation [7, 11, 12] and the strong-field approximation [10, 13, 15, 19, 20, 22, 23, 24, 25, 26, 27] have been employed. The latter method allows a transparent physical interpretation of the phenomena in question as laser-induced rescattering or recombination processes, and permits a clear space-time picture, which can be related to the classical orbits of an electron in a strong laser field [28]. For a diatomic molecule, there exist two main rescattering or recombination scenarios: the electron born through ionization at the center $C_i$ upon its return may recollide and interact with either the same ion ($C_i$), or with the other one ($C_j$, $i \neq j$). Such processes have been taken into account for high-order harmonic generation employing a two-center zero-range potential [13, 21], using Bessel function expansions [19], and by means of saddle-point methods [21, 22].
In this paper, we calculate the energy spectra and angular distributions of ATI-produced electrons in linearly polarized laser fields, within the framework of the strong-field approximation (SFA) and the single-active-electron approximation (SAE). We employ a zero-range potential model similar to that in [13], and consider both the direct electrons, which reach the detector without interacting with their parent molecule, and the electrons that suffer a single act of rescattering before reaching the detector. In the latter case, we put particular emphasis on interference effects: a final state with given momentum outside the laser field can be reached via two different scenarios. An electron can be born at and rescatter off the same center, or it can be born at one center and rescatter off the other. We show that the processes involving two centers, in general, obscure the interference patterns in the ATI spectra, in almost all energy-angle regions. An exception, however, is the boundary of the region that after tunneling is classically accessible to the ionized electron, in other words, the region before the classical cutoff. Near this boundary, the two-center processes yield negligible contributions, and one may identify very clear interference patterns. This makes it possible to provide a recipe to determine the internuclear distance $R$ out of the angle-resolved ATI spectra. Throughout the article we will use the velocity gauge and atomic units ($e = m = \hbar = 1$, $c = 137$).

The paper is organized as follows: In Sec. II we provide the ATI transition amplitudes for the direct and for the rescattered electrons, which in Sec. III are employed to compute the ATI spectra. The interference patterns in the spectra are analyzed with respect to molecular orientation, internuclear distance, and the position of the detector with respect to the polarization of the laser field and the molecular axis (Sec. III A). In Sec. III B, we present angle-resolved spectra, from which we infer the internuclear distance. Finally, in Sec. IV, we summarize the paper.

II. TRANSITION AMPLITUDES

The transition amplitude for direct ionization, within the strong-field approximation (SFA) [29], is given by the Keldysh-Faisal-Reiss amplitude [30]

$$M_{\mathbf p} = -i\int_{-\infty}^{\infty} dt\,\langle\Psi^{(V)}_{\mathbf p}(t)|V|\Psi_0(t)\rangle, \tag{1}$$

where $|\Psi_0(t)\rangle = |\Psi_0\rangle \exp(i|E_0|t)$. The amplitude describes an electron, initially in the ground state $|\Psi_0\rangle$, that is injected into the continuum by the laser field overcoming the ionization potential $|E_0|$, and reaches the detector with final momentum $\mathbf p$. The form of the transition amplitude given here, which contains the binding potential $V(\mathbf r)$ rather than the interaction with the laser field, was first presented in Ref. [31]. In the SFA, the final state with momentum $\mathbf p$ is described by a Volkov state, which in velocity gauge has the form

$$\langle\mathbf r|\Psi^{(V)}_{\mathbf p}(t)\rangle = \frac{1}{(2\pi)^{3/2}}\, e^{i\mathbf p\cdot\mathbf r}\, \exp\left\{-\frac{i}{2}\int^{t} d\tau\,[\mathbf p+\mathbf A(\tau)]^2\right\}. \tag{2}$$

In the amplitude (1), the electron once ionized does not interact with the ion (that is, with the binding potential $V$) anymore. If we allow for at most one single act of rescattering, the amplitude (1) is replaced by

$$M^{(0,1)}_{\mathbf p} = -\int_{-\infty}^{\infty} dt \int_{-\infty}^{t} dt'\, \langle\Psi^{(V)}_{\mathbf p}(t)|V\,U^{(V)}(t,t')\,V|\Psi_0(t')\rangle. \tag{3}$$

Here, $U^{(V)}(t,t')$ denotes the Volkov time-evolution operator, which describes the evolution of the electron in the presence of the external laser field, ignoring the binding potential. Equation (3) incorporates direct ionization, as described by Eq. (1), as well as ionization followed by rescattering (for details, see, e.g., Ref. [32]).

In order to apply Eqs. (1) and (3) to a diatomic molecule, we consider the two-center binding potential

$$V(\mathbf r) = V_0(\mathbf r-\mathbf R_1) + V_0(\mathbf r-\mathbf R_2), \tag{4}$$

where $\mathbf R_i$ ($i = 1, 2$) denote the coordinates of the centers $C_i$ ($i = 1, 2$). For the ground-state wave function, we employ a linear combination of atomic orbitals (LCAO):

$$\Psi(\mathbf r) = c_1\Psi_0(\mathbf r-\mathbf R_1) + c_2\Psi_0(\mathbf r-\mathbf R_2). \tag{5}$$

Specifically, we will use the zero-range potential

$$V_0(\mathbf r) = \frac{2\pi}{\kappa}\,\delta(\mathbf r)\,\frac{\partial}{\partial r}\,r, \tag{6}$$

whose single bound state is described by the wave function

$$\Psi_0(\mathbf r) = \left(\frac{\kappa}{2\pi}\right)^{1/2} \frac{1}{r}\, e^{-\kappa r}, \tag{7}$$

with $\kappa = \sqrt{2|E_0|}$. The regularization operator $(\partial/\partial r)\,r$ acts on the wave function to its right in order to satisfy the proper boundary conditions at the origin [34]. For direct ionization by a monochromatic linearly polarized laser field

$$\mathbf A(t) = A_0 \cos\omega t\; \mathbf e, \tag{8}$$

the evaluation of the amplitude (1) is straightforward. Taking $\mathbf R_1 = \mathbf R/2$ and $\mathbf R_2 = -\mathbf R/2$, so that $R$ is the internuclear distance of the two centers, one obtains by expanding the exponent in the Volkov wave function in Eq. (2) into Bessel functions

$$M^0_{\mathbf p} = \cdots F\cos\left(\frac{\mathbf p\cdot\mathbf R}{2}\right)\sum_N \delta\left(\frac{p^2}{2} + U_p + |E_0| - N\omega\right)\sum_l \cdots J_{-(2l+N)}\left(2\,\mathbf p\cdot\mathbf e\cdots\right), \tag{9}$$

where $U_p = A_0^2/4$ denotes the ponderomotive energy of the laser field (8). The prefactor, which is proportional to

$$F = -\kappa + \exp(-\kappa R)/R, \tag{10}$$

is of no relevance, since we do not attempt to calculate total ionization rates. It is, however, worth mentioning that the limit of $R \to 0$ is not straightforward. Below we will not face this limit. For a more detailed discussion, see, e.g., Refs. [13, 35].

[Figure 1: horizontal axis, energy in units of $U_p$; curves: atom, molecule $R = 2$ a.u., molecule $R = 6$ a.u.]
FIG. 1: (Color online) ATI spectra of the direct electrons in the laser polarization direction in the atomic and molecular case, for a molecule aligned parallel to a laser field of frequency $\omega = 0.058$ a.u. and ponderomotive potential $U_p = 2.08$ a.u. We consider a symmetric combination of atomic orbitals [$c_1 = c_2 = 1/\sqrt{2}$ in Eq. (5)], with ionization potential $|E_0| = 0.9$ a.u. In order to facilitate the comparison, the same ionization potential $V_0(\mathbf r)$ was chosen in the atomic and molecular cases. The arrows mark the various destructive interference energies ($n = 0, 1, 2$) for $R = 6$ a.u.

The only difference between the matrix element (9) for a molecule and the corresponding matrix element for an atom, besides the $R$-dependent prefactor, is the presence of the term $\cos(\mathbf p\cdot\mathbf R/2)$.
This term describes the interference of electron orbits with momentum $\mathbf p$ originating from one or the other center of the two-center potential (4). The cosine term comes from assuming a symmetric combination of orbitals in the ground-state wave function (5), so that $c_1 = c_2 = 1/\sqrt{2}$. For an antisymmetric combination, so that $c_1 = -c_2$, the cosine is replaced by a sine, leading to suppression of electrons with low momenta due to destructive interference [9, 10, 11]. The interference factor $\cos(\mathbf p\cdot\mathbf R/2) = \cos(pR\cos\theta/2)$, where $\theta$ is the angle between $\mathbf p$ and $\mathbf R$, yields destructive interference for electrons with energies

$$E \equiv \frac{p^2}{2} = \frac{(2n+1)^2\pi^2}{2R^2\cos^2\theta} \tag{11}$$

for integer $n$. An illustration of the interference effect is given in Fig. 1, which shows the spectrum of the direct electrons for an atom and for a symmetric diatomic molecule aligned parallel to the laser-field polarization. The clearly visible sharp dips in the spectrum, due to destructive interference, are indicated by the arrows in the figure.

Next, we turn to the evaluation of the matrix element (3), which allows rescattering. With the two-center potential (4) and the symmetric ground-state wave function (5), the matrix element reads

$$M_{\mathbf p} \propto \frac{1}{(2\pi)^{3/2}}\int_{-\infty}^{\infty} dt\int_{-\infty}^{t} dt'\int d^3r\int d^3r'\; e^{-i\mathbf p\cdot\mathbf r}\, e^{\frac{i}{2}\int^{t} d\tau\,(\mathbf p+\mathbf A(\tau))^2}\,[V_0(\mathbf r+\mathbf R/2) + V_0(\mathbf r-\mathbf R/2)]$$

$$\times\; U^{(V)}(\mathbf r t; \mathbf r' t')\,[V_0(\mathbf r'+\mathbf R/2) + V_0(\mathbf r'-\mathbf R/2)]\,[\Psi_0(\mathbf r'+\mathbf R/2) + \Psi_0(\mathbf r'-\mathbf R/2)]\; e^{i|E_0|t'}. \tag{12}$$

For the zero-range potential (6), the integrations over space can be carried out easily [32], which leaves a two-dimensional integral over the ionization time $t'$ and the rescattering time $t$. For finite-range potentials, one may proceed by introducing form factors and employing saddle-point methods. In this case, the single-center prefactors cause an overall decrease of the yield for increasing photoelectron energy. There are, however, no significant changes in the interference patterns in comparison with the zero-range case, since these prefactors do not influence the action or the cosine factor. For a detailed discussion of the single-atom case, see, e.g., Ref. [36].
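As a numerical illustration of the destructive-interference condition of Eq. (11), the following Python sketch lists the first few minima for a given internuclear distance. It assumes atomic units and takes the ponderomotive energy $U_p = 2.08$ a.u. quoted for the figures as a default; the function name `interference_minima` and its arguments are hypothetical choices made for this sketch and are not part of the original calculation.

```python
import math

def interference_minima(R, theta=0.0, U_p=2.08, n_max=3):
    """Return (n, E in a.u., E in units of U_p) for the first n_max zeros
    of cos(p R cos(theta) / 2), i.e. p R cos(theta) = (2n + 1) pi."""
    c = abs(math.cos(theta))
    if c < 1e-12:
        # p . R = 0: the cosine factor equals one and never vanishes.
        raise ValueError("no interference minima for emission perpendicular to R")
    minima = []
    for n in range(n_max):
        p = (2 * n + 1) * math.pi / (R * c)   # momentum at the n-th minimum
        E = 0.5 * p * p                       # kinetic energy p^2 / 2
        minima.append((n, E, E / U_p))
    return minima

for n, E, E_up in interference_minima(6.0):
    print(f"n = {n}: E = {E:.4f} a.u. = {E_up:.3f} U_p")
```

For $R = 6$ a.u. and emission along the molecular axis, the $n = 0, 1, 2$ minima come out near 0.07, 0.59, and 1.65 $U_p$, which lies within the energy range displayed in Fig. 1. For $R = 2$ a.u. the $n = 1$ minimum falls near 5.3 $U_p$.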
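We split the eight integrals into those where the electron rescatters off the same center from which it was ionized ($\mathbf r = \mathbf r' = \pm\mathbf R/2$) and those where it rescatters off the opposite center ($\mathbf r = -\mathbf r' = \pm\mathbf R/2$). We refer to the respective terms by $M^{ij}$, where $i = +,-$ denotes the center of ionization and $j = +,-$ the center of rescattering. The integrals $M^{++}$ and $M^{--}$, which specify the electrons coming from and rescattering off the same center, are essentially identical to the corresponding results for an atom [32]. The structure of the molecule is reflected in the integrals $M^{+-}$ and $M^{-+}$, which characterize the electrons that experience the presence of both centers.

Evaluating the remaining two integrals over $t$ and $t'$, we substitute $t' = t - \tau$. The doubly infinite integral over $t$ then yields a $\delta$-function expressing energy conservation, while the semi-infinite integration over $\tau$ has to be calculated numerically. Expanding all oscillating exponents in terms of Bessel functions, we obtain for the transition amplitudes

$$M^{++}_{\mathbf p} + M^{--}_{\mathbf p} = 2F\cos\left(\frac{\mathbf p\cdot\mathbf R}{2}\right)\sum_N \delta\left(\frac{p^2}{2} + U_p + |E_0| - N\omega\right)\cdots J_{-(2l+N)}\left(2\,\mathbf p\cdot\mathbf e\cdots\right)\cdots\left(\cdots\right)^{3/2}\left\{ e^{-i(|E_0|\tau + l\alpha)}\, e^{-iU_p\tau\left[1-\left(\frac{\sin(\omega\tau/2)}{\omega\tau/2}\right)^2\right]}\cdots\right\}$$

$$= 2F\cos\left(\frac{\mathbf p\cdot\mathbf R}{2}\right) M^{(\mathrm{atom})}_{\mathbf p}, \tag{13}$$

$$M^{+-}_{\mathbf p} = F e^{\cdots}\sum_N \delta\left(\frac{p^2}{2} + U_p + |E_0| - N\omega\right)\cdots e^{\cdots/(2\tau)}\, e^{-i[|E_0|\tau + l\alpha + (2l+N)\beta_-]}\, e^{-iU_p\tau\left[1-\left(\frac{\sin(\omega\tau/2)}{\omega\tau/2}\right)^2\right]}\cdots J_{-(2l+N)}(\cdots), \tag{14}$$

$$M^{-+}_{\mathbf p} = F e^{\cdots}\sum_N \delta\left(\frac{p^2}{2} + U_p + |E_0| - N\omega\right)\cdots e^{\cdots/(2\tau)}\, e^{-i[|E_0|\tau + l\alpha + (2l+N)\beta_+]}\, e^{-iU_p\tau\left[1-\left(\frac{\sin(\omega\tau/2)}{\omega\tau/2}\right)^2\right]}\cdots J_{-(2l+N)}(\cdots). \tag{15}$$

The real quantities $A$, $B_\pm$ and the phases $\alpha$ and $\beta_\pm$ are defined by

$$A e^{-i\alpha} = \cdots e^{-2i\omega\tau} + \cdots e^{-i\omega\tau}\cdots, \tag{16}$$

$$B_\pm e^{-i\beta_\pm} = \cdots\left[\mathbf p\cdot\mathbf e \pm \cdots\mathbf R\cdot\mathbf e\right][i\sin\omega\tau - (1-\cos\omega\tau)]. \tag{17}$$

Upon $\mathbf R \to -\mathbf R$, we have $B_\pm \exp(-i\beta_\pm) \to B_\mp \exp(-i\beta_\mp)$. Consequently, the matrix element $M_{\mathbf p}$ does not change when $\mathbf R \to -\mathbf R$. The complete matrix element is the sum of the terms (13)-(15),

$$M_{\mathbf p} = M^{++}_{\mathbf p} + M^{--}_{\mathbf p} + M^{+-}_{\mathbf p} + M^{-+}_{\mathbf p}. \tag{18}$$

The first two terms describe electrons originating from and rescattering off the same center. They are proportional to the atomic ionization amplitude $M^{(\mathrm{atom})}_{\mathbf p}$ [32] multiplied by the wave-function overlap $F$ and the two-center interference factor $\cos(\mathbf p\cdot\mathbf R/2)$, which we observed for the direct electrons in Eq. (9).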
The behavior of the exchange terms $M^{+-}_{\mathbf p}$ and $M^{-+}_{\mathbf p}$ is more complicated and will be discussed below.

The transition amplitude $M_{\mathbf p}$ simplifies enormously when the molecule is aligned perpendicularly to the field so that $\mathbf R\cdot\mathbf e = 0$. Equation (17) shows that in this case $B_+ = B_-$ and $\beta_+ = \beta_- = 0$. Hence, the integrals on the right-hand side of Eqs. (14) and (15) are equal and $M^{+-}_{\mathbf p} + M^{-+}_{\mathbf p}$ becomes proportional to $2\cos(\mathbf p\cdot\mathbf R/2)$ just like $M^{++}_{\mathbf p} + M^{--}_{\mathbf p}$. If, in addition, the electron is emitted perpendicularly to the field so that also $\mathbf p\cdot\mathbf e = 0$, then $B_+ = B_- = 0$ and we have $M_{\mathbf p} = 0$ unless $N$ is even.

Substituting $\tau \to \tau/\omega$ in Eqs. (13)-(17) one can see that the amplitudes $M^{ij}_{\mathbf p}$ and their sum $M_{\mathbf p}$ depend on the parameters of the problem through the dimensionless quantities $p^2/\omega$, $U_p/\omega$, and $\omega R^2$ when the relative orientations of the vectors $\mathbf R$, $\mathbf p$, and $\mathbf e$ are kept fixed.

III. PHOTOELECTRON SPECTRA

In this section we discuss the ATI spectra computed employing the transition matrix elements (1) and (3), and a symmetric combination of equivalent centers [$c_1 = c_2 = 1/\sqrt{2}$ in Eq. (5)]. For the sake of simplicity, in the comparison between the atomic and molecular case the same ionization potential $V_0$ is chosen. Specifically, we take $|E_0| = 0.9$ a.u. in Eqs. (5) and (7) throughout [37]. In Sec. III A, we perform a detailed analysis of the interference patterns with respect to the molecular orientation, rescattering scenarios, and the direction of electron emission, while in Sec. III B we provide a recipe for measuring the internuclear distance from an analysis of the interference patterns in the angle-resolved spectra.

A. Analysis of the interference patterns

As a first step, we investigate how the interference patterns are influenced by the orientation of the molecule with respect to the laser-polarization direction. Such results are displayed in Figs. 2 and 3 for parallel and perpendicular orientations, respectively.
In both cases, we compare the entire ATI spectrum consisting of the direct and the rescattered electrons in the atomic and the molecular case. Unless stated otherwise (cf. Fig. 5), we consider electron emission in the laser-polarization direction. As expected from Eq. (9), for energies smaller than $2U_p$, the main contributions to the yield come from the direct-ionization matrix element (9). Apart from the interference-related factor of $\cos(\mathbf p\cdot\mathbf R/2)$, the transition matrix element is identical to that obtained for a single atom (cf. Fig. 1). This factor is responsible for the sharp interference dips at the positions given by Eq. (11).

In the plateau energy region, however, the spectra depend on the laser-field polarization in a more complex way, as will be discussed next. In case the molecule is aligned parallel to the laser-field polarization (Fig. 2), the plateau is strongly enhanced in the molecular case, and the structure of the spectrum is very different from the atomic case and dependent on the internuclear distance $R$. Indeed, inspection of the exchange integrals (14) and (15) does not reveal any simple dependence on the internuclear distance. Generally, for the molecular case there are more pathways into a given final state. For our case of a two-center potential, there are four pathways in place of one for the atomic case. If they add coherently, a significant enhancement can result, ideally by a factor of 16, which is roughly what is observed in Fig. 2 before the cutoff. The structure caused by the cosine factor is suppressed in the plateau region. This is caused by the contribution of the processes in which the electron is ejected from one center and rescatters off the other. Such processes correspond to the transition amplitudes $M^{-+}_{\mathbf p}$ and $M^{+-}_{\mathbf p}$, which do not exhibit the proportionality to the cosine that is characteristic of the one-center scattering amplitudes $M^{++}_{\mathbf p}$ and $M^{--}_{\mathbf p}$.
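The ideal enhancement factor of 16 quoted above is what results when four pathways of equal magnitude add in phase. The short Python sketch below compares that limit with a case of pairwise cancellation. Equal magnitudes are an assumption made purely for illustration, since the actual amplitudes $M^{ij}_{\mathbf p}$ differ in size, and the helper name `yield_enhancement` is hypothetical.

```python
import cmath

def yield_enhancement(phases):
    """Squared modulus of a sum of unit-magnitude amplitudes with the given
    phases, i.e. the yield relative to that of a single pathway."""
    total = sum(cmath.exp(1j * phi) for phi in phases)
    return abs(total) ** 2

# Four pathways in phase: the yield grows as the square of their number.
print(yield_enhancement([0.0, 0.0, 0.0, 0.0]))

# Two pathways out of phase with the other two: the yield is suppressed.
print(yield_enhancement([0.0, 0.0, cmath.pi, cmath.pi]))
```

In this idealized picture a single pathway gives a relative yield of 1 and four in-phase pathways give $4^2 = 16$, whereas unfavorable relative phases can suppress the yield almost completely.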
A further particular feature observed in this case is the displacement of the cutoff to higher energies with increasing internuclear distance. This can be understood by the fact that an electron that moves from one center to the other may gain more energy from the field since it may be accelerated over a longer distance before it recollides.

A strikingly different behavior is observed if the molecule is aligned perpendicular to the direction of the laser field so that $\mathbf R\cdot\mathbf e = 0$. In Fig. 3 we consider the case where the electron is emitted in the direction of the laser polarization so that $\mathbf p\cdot\mathbf R = 0$, too. In this case, there is a general enhancement of the ATI yield in comparison to that of a single atom by roughly a factor of two for the direct electrons and a much larger factor for the rescattered electrons. Notice that the molecular spectrum is practically independent of the internuclear distance [21], since the $R$ dependence of the prefactor (10) is weak for $R \gtrsim 2$ and the exponential of $R^2/(2\tau)$ is small for the values of $R$ that we consider and the values of $\tau$ that give the dominant contributions to the integral. The entire ATI spectrum does not show any interference structure, since the contributions from the two centers add constructively for $\mathbf p\cdot\mathbf R = 0$. In fact, the cosine term in the matrix elements $M^0_{\mathbf p}$, $M^{++}_{\mathbf p}$, and $M^{--}_{\mathbf p}$ simply reduces to one and the spectrum therefore looks like that of an atom. Specifically within the plateau, by symmetry the contributions from the two centers always interfere constructively. This results in a spectrum that is largely independent of the internuclear separation $R$, except that the plateau is enhanced, compared with the atomic case, by the existence of four pathways. Formally, this can be understood as discussed above at the end of Sec. II.

For arbitrary $\mathbf p\cdot\mathbf R$, if the electron is emitted perpendicular to the laser polarization so that $\mathbf p\cdot\mathbf e = 0$, then we see from Eqs. (16) and (17) that $B_+ = B_-$, while $\beta_- = \beta_+ + \pi$.
The sum of the two exchange terms then goes like $\cos(\mathbf p\cdot\mathbf R/2)$ for even $N$ and like $\sin(\mathbf p\cdot\mathbf R/2)$ for odd $N$. This holds regardless of the orientation of the molecule. If the laser polarization is perpendicular to both the electron momentum and the internuclear axis, then $B_+ = B_- = 0$. This implies that $M_{\mathbf p}$ is nonzero only for even $N$. Every other electron peak is missing.

Next, we discuss the interference pattern in more detail by analyzing the individual contributions to the transition matrix element. In Fig. 4, we separately investigate the individual contributions to the amplitude (18). If only $M^{++}_{\mathbf p}$ and $M^{--}_{\mathbf p}$ are taken, a very pronounced minimum is observed near $5U_p$. These matrix elements correspond to the case in which the electron is ejected from and rescatters off the same center, so that the minimum is due to the term $\cos(\mathbf p\cdot\mathbf R/2)$. In the full spectrum $|M_{\mathbf p}|^2$, however, this minimum is absent because it is filled by the contributions from the exchange terms $M^{-+}_{\mathbf p}$ and $M^{+-}_{\mathbf p}$.

For a given orientation of the molecule with respect to the laser field and for fixed momentum $\mathbf p$, Fig. 4 shows that the contribution $|M^{-+}_{\mathbf p}|^2$ of the scenario in which the electron is freed at the center $C_1$ and rescatters off the center $C_2$ is different from that of $|M^{+-}_{\mathbf p}|^2$ where it is released at $C_2$ and rescatters at $C_1$. The same has been observed for high-order harmonic generation in a two-center system [13].

The imprints of interference can still be observed if the electron is emitted away from the laser polarization direction. This will cause, however, an overall decrease in the photoelectron energies for both the direct and the rescattered electrons. Examples are presented in Fig. 5. This behavior is known from atomic ionization, and its origin is the same in the molecular case; for a discussion, see, e.g., Ref. [32].

[Figure 2: horizontal axis, energy in units of $U_p$; curves: atom, molecule $R = 2$ a.u., molecule $R = 4$ a.u.]
FIG. 2: (Color online) Comparison of the complete ATI spectra consisting of direct and rescattered electrons in the atomic and the molecular case for internuclear distances of $R = 2$ a.u. and $R = 4$ a.u. The molecule is aligned parallel to the laser-polarization direction, and the electrons are emitted in the same direction. The arrows mark the destructive interferences ($n = 0, 1$) of the molecule for $R = 2$ a.u. The destructive interference for $n = 1$ is already in the plateau region, where the role of exchange terms becomes important. The remaining parameters are the same as in Fig. 1.

[Figure 3: horizontal axis, energy in units of $U_p$; curves: atom, molecule $R = 2$ a.u., molecule $R = 4$ a.u.]
FIG. 3: (Color online) The same as Fig. 2 but with the molecular axis perpendicular to the laser polarization. The electrons are emitted parallel to the laser polarization.

[Figure 4: horizontal axis, energy in units of $U_p$; inset covering the region near the cutoff.]
FIG. 4: (Color online) Individual contributions of the various rescattering scenarios to the total amplitude (18) for a diatomic molecule with internuclear distance $R = 2$ a.u. aligned parallel to the laser-field polarization, for the same molecular and field parameters as in Fig. 2. The electrons are emitted in the polarization direction. The arrows mark the respective cutoff energies for the various transition amplitude matrix elements. The inset at the lower left is an enlargement of the region near the cutoff where the direct terms and the exchange terms differ in a characteristic fashion, allowing for the determination of the internuclear separation.

[Figure 5: horizontal axis, energy in units of $U_p$; panels: $\mathbf A \parallel \mathbf R$ and $\mathbf A \perp \mathbf R$.]
FIG. 5: (Color online) Electron yield for the parameters of Fig. 1 and internuclear distance $R = 2$ a.u. for different emission angles $\psi$ with respect to the polarization of the laser field. The molecule is aligned parallel (perpendicular) to the laser field in the upper (lower) panel.

B. Determining the internuclear distance

For a complete picture of the angle-resolved ATI spectrum, not restricted to emission in particular directions, we will now present density plots. While they invariably imply loss of fine details and depend on the positioning and gradient of the false-color scale, they give a comprehensive overview of the general structure. We restrict ourselves to the case of parallel alignment [41]. In this case the spectrum is symmetrical with respect to the internuclear axis. It is obvious that the electrons with maximal kinetic energy will be detected in the direction of the laser field.

FIG. 6: (Color online) Angle-resolved ATI spectra on a logarithmic scale for a diatomic molecule with internuclear distances $R = 2$ a.u. and $R = 3$ a.u. (middle and bottom panels, respectively), aligned parallel to the laser-field polarization, compared to the single-atom case (upper panel). The binding energy is $E_0 = 0.9$ a.u. in all cases, and the laser frequency and the ponderomotive potential are $\omega = 0.058$ a.u. and $U_p = 2.08$ a.u., respectively. The plotted lines depict the minima (solid lines) and the maxima (dashed lines) of the energy distribution given by Eq. (11).

FIG. 7: (Color online) Angle-resolved ATI spectra on a logarithmic scale for a diatomic molecule with ionization potential $E_0 = 0.9$ a.u. and internuclear distances $R = 4$ a.u., $R = 6$ a.u., and $R = 8$ a.u. (upper, middle, and bottom panels, respectively), aligned parallel to the laser-field polarization. The field parameters are the same as in the previous figure. The plotted lines depict the minima (solid lines) and the maxima (dashed lines) of the energy distribution given by Eq. (11).

FIG. 8: (Color online) Enlargement of a limited energy-angle region of Fig. 7 for $R = 8$ a.u. (lowest panel) with increased resolution. The indents of Fig. 7 are distinctly visible as valleys deeply cut into the high ridge that precedes the cutoff.

The angle-resolved spectra displayed in Figs.
6 and 7 are very intricate and do not exhibit any simple structures. They depend strongly on the internuclear separation but do not, on a first inspection, lend themselves in any obvious way to the assignment of a specific value of $R$ to a given spectrum. In particular, owing to the presence and magnitude of the exchange terms (14) and (15), the two-center interference, which is expressed in the $\cos(\mathbf p\cdot\mathbf R/2)$ term, is not immediately visible. However, looking more closely, one can observe a very distinct manifestation of this term just near the classical boundary of the spectrum. Roughly, the latter agrees with the boundaries of the colored areas in the various panels of Figs. 6 and 7. We observe well-defined indents in the overall smooth curve that defines the classical boundary. The positions of these indents and, especially, their separations agree quite well with the interference minima predicted by Eq. (11). The figures show that the separation $\delta E$ between the indents (on the scale of $U_p$) monotonically decreases with increasing $R$. Hence, by comparing a measured angle-resolved spectrum with Figs. 6 and 7 we can infer the internuclear separation. For the parameters underlying Figs. 6 and 7, the resulting function $R(\delta E)$ is given in Table I. Fig. 8 exhibits an enlargement of the relevant area around the classical cutoff for the case of $R = 8$ a.u. with a higher resolution of the electron yield. The indents are very clearly visible, like valleys that cut into the drop of a plateau on a topographical map.

An analytical formula for $R(\delta E)$ can be obtained from an analytical formula for the classical cutoff energy $E(\theta)$ as a function of the angle $\theta$. Intersecting this with the energies of the interference minima given by Eq. (11) allows one to determine the function $R(\delta E)$ as a function of the parameters of the problem. For the case of an atom, such a formula for $E(\theta)$ is actually known [39]. At least for $R \le 6$ a.u., Figs.
6 and 7 show that this classical boundary does not depend very strongly on the inter- nuclear separation R, so that the atomic result could be employed. However, even with this simplification, the re- sulting formula is quite complicated and we refrain from presenting it here. The question arises of why near the classical bound- ary the interference term cos(p ·R/2) roughly multiplies the angle-resolved spectrum, like it does for the direct electrons. The answer can be inferred from Fig. 4. The total ionization amplitude Mp is the superposition (13) of four different scenarios such that the electron starts from and rescatters off one or the other center. The two contributions (13) where they start from and rescatter off the same center are identical except for the geomet- rical phase, which leads to the cosine in Eq. (13). In contrast, the other two contributions (14) and (15) are uncorrelated since they are generated by geometrically different scenarios. Their magnitudes are different and almost nowhere do they exhibit a significant construc- tive interference. The two contributions (13) are individ- ually large when the long orbit and the short orbit add constructively, as is the case specifically just before the classical cutoff. In this case, they dominate the other two terms (14) and (15) by a factor of the order of 2 to 4. Hence the complete spectrum distinctly exhibits the geometrical interference, which is expressed in the factor cos(p ·R/2). Internuclear distances Energy differences of the indents of the molecule at the spectral boundary R = 2 a.u. δE ≈ 4.5Up R = 3 a.u. δE ≈ 3.2Up R = 4 a.u. δE ≈ 2.4Up R = 6 a.u. δE ≈ 1.3Up R = 8 a.u. δE ≈ 1.1Up TABLE I: Energy differences between adjacent indents around the classical boundary of the angle-resolved spectrum of Fig. 6. The differences are taken, for each internuclear separation, by starting with the first indent for Ψ ≥ 0◦ as a function of the energy. IV. 
CONCLUSIONS

We have analyzed ATI spectra for a two-center molecule in a linearly polarized laser field. The terms of the two-center wave function contributing to the interference structure within the SFA formalism could be identified as well as the absence of the interference structure throughout most of the plateau region. We have shown that the angle-resolved spectra can be used to determine the internuclear distance of a molecule aligned with the laser field, by reading off the energy differences between subsequent interference minima at the classical boundary of the spectrum.

The validity of this method depends on how closely the angle-resolved spectrum calculated for our model molecule matches reality. Certainly, the spectrum of the direct electrons cannot be trusted for this purpose. However, high-order ATI of an atom is well described by the SFA and a zero-range potential, especially near the classical boundary [32]; for a comparison of spectra calculated from the SFA with the solution of the time-dependent Schrödinger equation, see Ref. [40] for the case of an atom. Experimentally, application of the method requires a high detection efficiency that allows one to obtain a sufficient number of counts down to the classical boundary.

The problem of how to extract the internuclear separation from a diffraction pattern has been addressed by a different method in Ref. [7], employing the numerical solution of the time-dependent Schrödinger equation. In [7], however, it is necessary to compute a radial distribution function from the diffraction intensity, whereas, with the method discussed in this paper, one may determine the internuclear distances directly from the spectra. The method suggested in Ref. [7] has the advantage that it analyzes direct electrons and, therefore, does not require exceptionally high detection efficiency.
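The read-off of R from a measured indent spacing δE can be illustrated with a minimal sketch based on the five entries of Table I. The linear interpolation between tabulated points and the function name `infer_separation` are assumptions made here for illustration only; the paper gives no interpolation rule, and the table applies only to the laser parameters underlying Figs. 6 and 7.

```python
# (R in a.u., delta E in units of Up), copied from Table I.
TABLE_I = [(2.0, 4.5), (3.0, 3.2), (4.0, 2.4), (6.0, 1.3), (8.0, 1.1)]


def infer_separation(delta_e):
    """Estimate R (a.u.) from the indent spacing delta_e (in units of Up).

    Assumption: piecewise-linear interpolation between the tabulated points.
    Values outside the tabulated range are rejected rather than extrapolated.
    """
    points = sorted(TABLE_I, key=lambda p: p[1])  # ascending in delta E
    if not points[0][1] <= delta_e <= points[-1][1]:
        raise ValueError("delta_e lies outside the range covered by Table I")
    for (r_a, e_a), (r_b, e_b) in zip(points, points[1:]):
        if e_a <= delta_e <= e_b:
            t = (delta_e - e_a) / (e_b - e_a)
            return r_a + t * (r_b - r_a)
    raise ValueError("no interpolation interval found")


# A spacing halfway between the R = 4 and R = 3 entries.
print(infer_separation(2.8))
```

Because δE decreases monotonically with R in Table I, the inversion is unique within the tabulated range.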
Finally, in a real physical system, there exist additional effects, which have not been incorporated in this model and may alter the interference patterns. Molecular vi- bration, for instance, causes an intensity loss in the high- harmonic signal [25], which may lead to a blurring in the patterns. However, recently, numerical ATI computa- tions in which such an effect is included have shown that for H+2 the angle-dependent interference patterns related to the double-slit physical picture remain distinguishable in the case considered [17]. Generally, the amount of blurring depends on the rigidity of the vibrational poten- tial. Since the period of vibrations is much longer than the laser period, with a few-cycle laser pulse our method could be used to track a vibrational wave packet or the dissociation of a molecule [4]. Another feature which has not been incorporated in our model is the dependence of the ionization potential on the internuclear distance. In fact, we have taken E0 to be constant, whereas, in reality, it decreases with R [13]. This feature, however, will only cause an overall energy shift in the spectra. Therefore, it will not affect the distance between two consecutive minima or maxima in the patterns for constant internu- clear distance (Figs. 6 and 7). Therefore, we expect our method to be applicable to real physical systems and a wide parameter range. Acknowledgments C.F.M.F. would like to thank L.E. Chipperfield, R. Torres, and J.P. Marangos for useful discussions and the UK Engineering and Physical Sciences Research Council (Advanced Fellowship, grant No. EP/D07309X/1) for financial support [1] P. B. Corkum, Phys. Rev. Lett. 71, 1994 (1993); K. C. Kulander, K. J. Schafer, and J. L. Krause in: B. Piraux et al. eds., Proceedings of the SILAP conference, (Plenum, New York, 1993). [2] A. Scrinzi, M. Y. Ivanov, R. Kienberger, and D. M. Vil- leneuve, J. Phys. B 39, R1 (2006). [3] S. Baker, J. Robinson, C. Haworth, H. Teng, R. Smith, C. Chirilă, M. 
Lein, J. Tisch, and J. P. Marangos, Science 312, 424 (2006). [4] H. Niikura, F. Légaré, R. Hasbani, A. D. Bandrauk, M. Yu. Ivanov, D. M. Villeneuve, and P. B. Corkum, Nature 417, 917 (2002); H. Niikura, F. Légaré, R. Hasbani, M. Yu. Ivanov, D. M. Villeneuve, and P. B. Corkum, Nature 421, 826 (2003); E. Goll, G. Wunner, and A. Saenz, Phys. Rev. Lett. 97, 103003 (2006); F. Légaré, K. F. Lee, A. D. Bandrauk, D. M. Villeneuve, and P. B. Corkum, J. Phys. B 39, S503 (2006). [5] J. Itatani, J. Levesque, D. Zeidler, H. Niikura, H. Pépin, J.C. Kieffer, P. B. Corkum, and D. M. Villeneuve, Nature 432, 867 (2004). [6] T. Kanai, S. Minemoto, and H. Sakai, Nature 435, 470 (2005). [7] S. X. Hu and L. A. Collins, Phys. Rev. Lett. 94, 073004 (2005). [8] For experiments see, e.g., N. Hay, R. de Nalda, T. Half- mann, K. J. Mendham, M. B. Mason, M. Castillejo, and J. P. Marangos, Phys. Rev. A 62, 041803(R)(2000); C. Altucci, R. Velotta, E. Heesel, E. Springate, J. P. Maran- gos, C. Vozzi, E. Benedetti, F. Calegari, G. Sansone, S. Stagira, M. Nisoli, and V. Tosa, Phys. Rev. A 73, 043411 (2006); H. Ohmura, F. Ito, and M. Tachyia, Phys. Rev. A 74, 043410 (2006); and for theory T. K. Kjeldsen, C. Z. Bisgaard, L. B. Madsen, and H. Stapelfeld, Phys. Rev. A 71, 013418 (2005). [9] B. Shan, X. M. Tong, Z. Zhao, Z. Chang, and C. D. Lin, Phys. Rev. A 66, 061401(R) (2002); F. Grasbon, G. G. Paulus, S. L. Chin, H. Walther, J. Muth-Böhm, A. Becker, and F. H. M. Faisal, Phys. Rev. A 63, 041402(R)(2001); C. Altucci, R. Velotta, J. P. Maran- gos, E. Heesel, E. Springate, M. Pascolini, L. Poletto, P. Villoresi, C. Vozzi, G. Sansone, M. Anscombe, J. P. Caumes, S. Stagira, and M. Nisoli, Phys. Rev. A 71, 013409 (2005). [10] J. Muth-Böhm, A. Becker, and F. H. M. Faisal, Phys. Rev. Lett. 85, 2280 (2000); A. Jarón-Becker, A. Becker, and F. H. M. Faisal, Phys. Rev. A 69, 023410 (2004) [11] M. Lein, N. Hay, R. Velotta, J. P. Marangos, and P. L. Knight, Phys. Rev. Lett. 88, 183903 (2002); Phys. Rev. 
A 66, 023805 (2002); M. Lein, J. P. Marangos, and P. L. Knight, Phys. Rev. A 66, 051404(R) (2002); M. Spanner, O. Smirnova, P. B. Corkum, and M. Y. Ivanov, J. Phys. B 37, L243 (2004). [12] D. A. Telnov and Shih-I Chu, Phys. Rev. A 71, 013408 (2005); G. Lagmago Kamta and A. D. Bandrauk, Phys. Rev. A 71, 053407 (2005). [13] R. Kopold, W. Becker, and M. Kleber, Phys. Rev. A 58, 4022 (1998). [14] C. Guo, M. Li, J. P. Nibarger, and G. N. Gibson, Phys. Rev. A 58, R4271 (1998); M. J. DeWitt, E. Wells, and R. R. Jones, Phys. Rev. Lett. 87, 153001 (2001); E. Wells, M. J. DeWitt, and R. R. Jones, Phys. Rev. A 66, 013409 (2002); I. V. Litvinyuk, K.F. Lee, P.W. Dooley, D.M. Rayner, D.M. Villeneuve, and P.B. Corkum, Phys. Rev. Lett. 90, 233003 (2003). [15] T. K. Kjeldsen and L. B. Madsen, J. Phys. B 37, 2033 (2004); Phys. Rev. A 73, 047401 (2006). [16] C. P. J. Martiny, and L. B. Madsen, Phys. Rev. Lett. 97, 093001 (2006); ibid. 97, 169903 (2006); S. Baier, C. Ruiz, L. Plaja, and A. Becker, Phys. Rev. A 74, 033405 (2006). [17] S. Seltsø, J. F. McCann, M. Førre, J. P. Hansen, and L. B. Madsen, Phys. Rev. A 73, 033407 (2006). [18] M. Lein, P. P. Corso, J. P. Marangos, and P. L. Knight, Phys. Rev. A 67, 023819 (2003). [19] V. I. Usachenko, P. E. Pyak, and S.-I Chu, Laser Phys. 16, 1326 (2006). [20] V. I. Usachenko, and S.-I Chu, Phys. Rev. A 71, 063410 (2005); V. I. Usachenko, Phys. Rev. A 73, 047402 (2006). [21] H. Hetzheim, M. Sc. thesis (Humboldt Universität zu Berlin, 2005). [22] C. C. Chirilă and M. Lein, Phys. Rev. A 73, 023410 (2006); ibid. 74, 051401(R) (2006). [23] X. Zhou, X. M. Tong, Z. X. Zhao, and C. D. Lin, Phys. Rev. A 71, 061801(R) (2005); ibid. 72, 033412 (2005). [24] T. K. Kjeldsen and L. B. Madsen, Phys. Rev. A 71, 023411 (2005); Phys. Rev. Lett. 95, 073004 (2005); C. B. Madsen and L. B. Madsen, Phys. Rev. A 74, 023406 (2006). [25] M. Lein, Phys. Rev. Lett. 94, 053004 (2005); C. C. Chirilă and M. Lein, J. Phys. B 39, S437 (2006). [26] A. Requate, A. 
Becker, and F. H. M. Faisal, Phys. Rev. A 73, 033406 (2006). [27] D. B. Milošević, Phys. Rev. A 74, 063404 (2006). [28] P. Salières, B. Carré, L. LeDéroff, F. Grasbon, G. G. Paulus, H. Walther, R. Kopold, W. Becker, D. B. Milošević, A. Sanpera, and M. Lewenstein, Science 292, 902 (2001). [29] The SFA consists in neglecting the atomic binding poten- tials when the electron is in the continuum, the external laser field when the electron is bound, and the internal structure of the molecule. [30] L. V. Keldysh, Sov. Phys. JETP 20, 1307 (1964); F. H. M. Faisal, J. Phys. B 6, L89 (1973); H. R. Reiss, Phys. Rev. A 22, 1786 (1980). [31] A. Perelomov, V. Popov, and M. Terent’ev, JETP 23, 924 (1966). [32] A. Lohr, M. Kleber, R. Kopold, and W. Becker, Phys. Rev. A 55, R4003 (1997). [33] W. Becker, S. Long, and J. K. McIver Phys. Rev. A 50, 1540 (1994); M. Lewenstein, Ph. Balcou, M. Yu. Ivanov, A. L’Huillier, and P. B. Corkum Phys. Rev. A 49, 2117 (1994); W. Becker, A. Lohr, M. Kleber, and M. Lewen- stein, Phys. Rev. A 56, 645 (1997). [34] E. Fermi, Ric. Sci. 7, 13 (1936). [35] P. Krstić, D. B. Milošević, and R. Janev, Phys. Rev. A 44, 3089 (1991). [36] C. Figueira de Morisson Faria, H. Schomerus, and W. Becker, Phys. Rev. A 66, 043413 (2002). [37] In reality, the ionization potential of a molecule decreases with increasing internuclear distance. This effect, how- ever, will only cause an overall shift in the interference patterns. Since it will neither modify their shapes, nor the energy difference between neighboring maxima, it is not relevant to the present discussion. For the specific com- putation of this shift within the context of a zero-range model potential see, e.g., Ref. [13]. [38] W. Becker, F. Grasbon, R. Kopold, D. B. Milošević, G. G. Paulus, and H. Walther, Adv. At. Mol. Opt. Phys. 48, 36 (2002). [39] E. Hasović, M. Busuladžić, A. Gasibegović-Busuladžić, D. B. Milošević, and W. Becker, Laser Phys. 17, 376 (2007). [40] D. Bauer, D. B. Milošević, and W. Becker, J. 
Mod. Opt. 53, 135 (2006). [41] If one changes the orientation of the molecule, the situation becomes different. For small angles the electrons with the maximal energy are still observable in the direction of the laser field, but there exists a further maximum of electrons with a certain momentum in the opposite direction as a result of the alignment of the molecule [21]. With increasing angle of the molecular axis with respect to the laser polarization direction, this local maximum will move over the entire angle-resolved ATI spectrum.

ABSTRACT We calculate angle-resolved above-threshold ionization spectra for diatomic molecules in linearly polarized laser fields, employing the strong-field approximation. The interference structure resulting from the individual contributions of the different scattering scenarios is discussed in detail, with respect to the dependence on the internuclear distance and molecular orientation. We show that, in general, the contributions from the processes in which the electron is freed at one center and rescatters off the other obscure the interference maxima and minima obtained from single-center processes. However, around the boundary of the energy regions for which rescattering has a classical counterpart, such processes play a negligible role and very clear interference patterns are observed. In such energy regions, one is able to infer the internuclear distance from the energy difference between adjacent interference minima. <|endoftext|><|startoftext|> Néel order in the two-dimensional S = 1/2 Heisenberg Model

Ute Löw1
Theoretische Physik, Universität zu Köln, Zülpicher Str. 77, 50937 Köln, Germany
(Dated: November 1, 2018)

The existence of Néel order in the S = 1/2 Heisenberg model on the square lattice at T = 0 is shown using inequalities set up by Kennedy, Lieb and Shastry in combination with high precision Quantum Monte Carlo data.
The ground state order of quantum spin systems, in particular the question of whether the ground state shows long range magnetic order, has attracted long and continuous interest. For the prototype of spin models, the antiferromagnetic Heisenberg model, the existence of Néel order at low temperatures was proved in the seminal paper of Dyson, Lieb and Simon [1] in 1978 for spin S ≥ 1 and spatial dimension d ≥ 3 and also for S = 1/2 and d > 3. Ten years later Kennedy, Lieb and Shastry [2] showed that Néel order in the ground state also exists for S = 1/2 and d = 3.

The situation in two dimensions is different and more subtle, since the Mermin-Wagner-Hohenberg theorem forbids Néel order at finite T, leaving open, however, the possibility of Néel order in the ground state. The existence of Néel order for the two-dimensional model and S ≥ 1 was shown in [3, 4] and later in [2] by an independent derivation of the relevant inequality at T = 0.

However, the inequalities sufficient to show Néel order for S = 1 in the two-dimensional case are not sufficient to construct an analogous proof for S = 1/2. Thus the case of S = 1/2 remains an open problem. Still it is possible to derive inequalities concerning spin-spin correlations at short distances [2] which are violated if Néel order is present. That is, with a minimum of numerical information, the question of Néel order in the ground state can be decided.

The aim of this paper is to evaluate the spin-spin correlations of the two-dimensional S = 1/2 antiferromagnetic Heisenberg model at short distances and to demonstrate that these results combined with the analytic expressions of [2] show the existence of Néel order in the two-dimensional S = 1/2 antiferromagnetic Heisenberg model at T = 0. Such a study has become possible due to the development of high precision Monte Carlo techniques over the last decade.
In Ref. [2] Kennedy, Lieb and Shastry used data of Gross, Sanchez-Velasco and Siggia [5] for a comparison; however, these data clearly deviate from the results presented here. The authors of [5] used a Quantum Monte Carlo method without loop updates and with discrete Trotter time (see below). Their data served only as a crude comparison to extrapolated Lanczos data and data produced by the Neumann-Ulam method, which were the best algorithms to study the properties of the two-dimensional Heisenberg model in 1988. Today modern loop algorithms far outperform both methods.

As will be shown in the following, an accurate evaluation of correlation functions at short distances is possible with modern Quantum Monte Carlo methods, which allow us to compute expectation values at very low temperatures. Even though the short distance results have a certain finite size and finite temperature correction, these uncertainties are well controlled and allow definite conclusions to be drawn.

The approach and intention of this paper is different from a completely numerical evaluation of e.g. the correlation length, which involves a calculation of correlations at long distances and an appropriate extrapolation to infinite distances, and which cannot be used as a proof of long range order in any rigorous sense.

At first sight a "Quantum Monte Carlo algorithm" seems a puzzling concept, since an important step in any Monte Carlo method is the evaluation of Boltzmann weights for given energies of the system. For quantum models these energies are hard if not impossible to calculate. A key idea to make Monte Carlo methods applicable to quantum systems is to map the quantum model onto a classical model by introducing an extra dimension, usually referred to as Trotter time [6].

In the first generation of algorithms this mapping was straightforwardly applied to the quantum Heisenberg model.
Though this allowed for a wealth of new studies of the finite temperature properties in one- and in particular in two-dimensional systems, these algorithms had two major drawbacks, which became most evident at low temperatures. Firstly, the extra Trotter dimension was discretized, introducing the number of time slices as a parameter which had to be eliminated from the final results by an extrapolation. Secondly, the update procedure, i.e. the construction of new independent configurations, was done locally. As a consequence one had to move through the lattice site by site several times to obtain a configuration independent of the starting configuration and useful for a new evaluation of an observable.

A first improvement was introduced by the so-called loop algorithms [7], which use nonlocal updates similar to the Swendsen-Wang algorithm for classical models. A second and important step towards high precision Quantum Monte Carlo techniques was the introduction of algorithms which work directly in the Euclidean time continuum [8] and require no extrapolation in Trotter time. For the algorithm [9] used for the analysis presented here no approximations enter, and statistical errors are the only source of inaccuracy.

Since this work intends to produce highly accurate data it seems appropriate to assess the precision of the method by a comparison with exact results. The best candidates for such a comparison are the correlations of one-dimensional systems evaluated by the Bethe ansatz with almost arbitrary precision up to distance seven [10] and the results for finite chains from Ref. [11]. This is done in the Appendix for chains of 400 sites at T = 0.005.

After these introductory remarks we now return to our actual goal, which is the two-dimensional system.
Our starting point is the S = 1/2 Heisenberg model

$$H = \sum_{x,y \in \Lambda} \vec{S}_x \cdot \vec{S}_y \qquad (1)$$

with nearest neighbour interaction on a finite square lattice Λ with an even number of sites in every direction and periodic boundary conditions. The Fourier transform of the spin-spin correlation function at T = 0 is given by

$$g_q = \langle S_{-q} S_q \rangle = \sum_{x \in \Lambda} e^{-iqx} \langle S^3_0 S^3_x \rangle \qquad (2)$$

where

$$S_q = \frac{1}{\sqrt{|\Lambda|}} \sum_{x \in \Lambda} e^{-iqx} S^3_x. \qquad (3)$$

For the corresponding finite temperature expectation value of $g_q$ an upper bound $f_q$ was derived in [1]. The T = 0 limit of this bound was obtained in Ref. [4] and a direct proof of the bound at T = 0 was given in [2]. Following the notation and arguments of Kennedy, Lieb and Shastry [2] the inequality for d = 2 reads

$$g_q \le f_q \quad \text{for } q \neq Q \qquad (4)$$

where $f_q = \sqrt{e_0 E_q / (12 E_{q-Q})}$, $E_q = 2 - \cos q_1 - \cos q_2$, $Q = (\pi, \pi)$ and $-e_0$ is the ground state energy per site of the Heisenberg model Eq. 1 on the lattice Λ.

The fundamental idea is that the existence of Néel order in the limit of infinite system size corresponds to a delta-function in the Fourier transform of the spin-spin correlation $g_q$ at Q. This means that if Eq. 4 is integrated over the whole Brillouin zone one finds in the case of Néel order

$$m^2 + \frac{1}{(2\pi)^2} \int d^2q \, f_q \ge m^2 + \frac{1}{(2\pi)^2} \int d^2q \, g_q = S(S+1)/3 \qquad (5)$$

where $m^2$ is the coefficient of the delta-function at Q. If there is no Néel order $m^2$ is zero. By numerically evaluating the integral over $f_q$, and by using exact variational upper and lower bounds on the ground state energy $-e_0$, one sees that the above inequality and its analogue for d ≥ 3 cannot be fulfilled with $m^2 = 0$ and S ≥ 1, which proves Néel order.

Inequalities of type Eq. 5 are not sufficient to prove the existence of a nonzero $m^2$ for d = 2, 3 and S = 1/2, but a new relation is obtained by multiplying $g_q$ by $\cos q_i$ and again integrating over the Brillouin zone:

$$\frac{1}{(2\pi)^d} \int d^dq \, g_q \cos q_i = \langle S^3_0 S^3_{\delta_i} \rangle = -\frac{e_0}{3d} \qquad (6)$$

with i = 1, 2 for d = 2 and i = 1, 2, 3 for d = 3, where $\delta_i$ is the unit vector in i-direction. The value of the ground state energy from Ref. [12] is $e_0 = 0.669437(5)$.
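The statement that the integrated bound of Eq. 5 cannot by itself rule out $m^2 = 0$ for S = 1/2 in d = 2 can be checked numerically. The sketch below is an illustration, not the author's procedure: it assumes the form $f_q = \sqrt{e_0 E_q / (12 E_{q-Q})}$ for the bound, uses the quoted value $e_0 = 0.669437$, and replaces the Brillouin-zone integral by a simple midpoint rule (the grid size and function names are arbitrary choices).

```python
import math

E0 = 0.669437  # e0 as quoted from Ref. [12]


def f_q(q1, q2, e0=E0):
    """Assumed form of the bound: sqrt(e0 * E_q / (12 * E_{q-Q})), Q = (pi, pi)."""
    e_q = 2.0 - math.cos(q1) - math.cos(q2)
    e_q_shifted = 2.0 + math.cos(q1) + math.cos(q2)  # E_{q-Q}
    return math.sqrt(e0 * e_q / (12.0 * e_q_shifted))


def brillouin_average(func, n_grid=200):
    """Midpoint-rule estimate of (2*pi)^-2 times the integral over the zone.

    With an even n_grid the midpoints never coincide with Q = (pi, pi),
    where f_q has an integrable singularity.
    """
    total = 0.0
    for i in range(n_grid):
        q1 = 2.0 * math.pi * (i + 0.5) / n_grid
        for j in range(n_grid):
            q2 = 2.0 * math.pi * (j + 0.5) / n_grid
            total += func(q1, q2)
    return total / n_grid**2


spin = 0.5
integrated_bound = brillouin_average(f_q)
sum_rule = spin * (spin + 1.0) / 3.0  # right-hand side of Eq. 5

# If the integrated bound already exceeds S(S+1)/3, Eq. 5 is compatible
# with m^2 = 0 and gives no information about Neel order.
print(integrated_bound, sum_rule, integrated_bound > sum_rule)
```

Under these assumptions the integrated bound lies above S(S+1)/3 = 1/4, so Eq. 5 alone does not force a nonzero $m^2$ for S = 1/2, which is why further sum rules are needed.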
Carrying out an analogous integral over $f_q$ and using again Eq. 4 one finds:

$$\frac{e_0}{3d} \le \frac{1}{(2\pi)^d} \int d^dq \, f_q \, (-\cos q_i)_+ \qquad (7)$$

where $f_+$ denotes the positive part of a function f, which equals f when f is positive and is zero otherwise. Again Eq. 7, which is valid if no Néel order exists, was shown to be violated for d = 3 and S = 1/2 in Ref. [2] by using bounds on $e_0$, and thus the existence of Néel order was proved also for d = 3 and S = 1/2.

For S = 1/2 and d = 2 one cannot construct a contradiction by using only the ground state energy. Here more input from numerical data is needed. This can be incorporated by multiplying $g_q$ by $\cos(m q_i)$ with m = 2, 3, ... and again integrating over the whole Brillouin zone:

$$\frac{1}{(2\pi)^2} \int d^2q \, g_q \cos(m q_i) = \langle S^3_0 S^3_{m\delta_i} \rangle \qquad (8)$$

with i = 1, 2. Next, defining $\bar{g}(n)$ as

$$\bar{g}(n) = \frac{1}{n+1} \sum_{m=0}^{n} (-1)^m \langle S^3_0 S^3_{m\delta_i} \rangle \qquad (9)$$

and using again inequality 4 one constructs the following relations involving the correlation functions:

$$\bar{g}(n) = \frac{1}{(2\pi)^2} \int d^2q \, \frac{1}{2n+2} \sum_{m=0}^{n} (-1)^m \{\cos(m q_1) + \cos(m q_2)\} \, g_q \le \frac{1}{(2\pi)^2} \int d^2q \left[ \frac{1}{2n+2} \sum_{m=0}^{n} (-1)^m \{\cos(m q_1) + \cos(m q_2)\} \right]_+ f_q. \qquad (10)$$

Whenever the inequality Eq. 10 is violated for a certain n, a nonzero $m^2$ multiplying the delta-function at Q is needed and therefore the existence of Néel order is proved.

The $\bar{g}(n)$ as defined in Eq. 9 were calculated by the Quantum Monte Carlo method [9]. The results, displayed in Table I, show that the $\bar{g}(n)$ calculated by Quantum Monte Carlo cross the bound obtained by integrating over $f_q$ at n = 8. This is also depicted in Fig. 1.
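The comparison between the bound and the Monte Carlo values can be reproduced from the numbers of Table I. The sketch below is illustrative and rests on stated assumptions: $\bar{g}(n)$ is taken as $\frac{1}{n+1}\sum_{m=0}^{n}(-1)^m\langle S^3_0 S^3_{m\delta}\rangle$, the bound as the Brillouin-zone average of the positive part of the weight in Eq. 10 times $f_q = \sqrt{e_0 E_q/(12 E_{q-Q})}$, and the integral is approximated by a midpoint rule. The two data lists are copied from the Bound and T = 0.005 columns of Table I; all function names are hypothetical.

```python
import math

E0 = 0.669437  # e0 as quoted from Ref. [12]

# Columns "Bound" and "T = 0.005" of Table I, for n = 1..15.
BOUND = [2.297e-01, 1.714e-01, 1.383e-01, 1.166e-01, 1.013e-01,
         8.990e-02, 8.107e-02, 7.400e-02, 6.820e-02, 6.334e-02,
         5.921e-02, 5.563e-02, 5.252e-02, 4.976e-02, 4.732e-02]
G_BAR = [1.80799e-01, 1.40308e-01, 1.17686e-01, 1.03005e-01, 9.27815e-02,
         8.52115e-02, 7.93875e-02, 7.47551e-02, 7.09844e-02, 6.78504e-02,
         6.52055e-02, 6.29418e-02, 6.09835e-02, 5.92718e-02, 5.77638e-02]


def first_crossing(g_bar, bound):
    """Smallest n (counting from 1) at which g_bar exceeds the bound."""
    for n, (g, b) in enumerate(zip(g_bar, bound), start=1):
        if g > b:
            return n
    return None


def g_bar_one(e0=E0):
    """g_bar(1) from the sum rules: (1/2) * (<(S^3)^2> - <S^3_0 S^3_delta>)."""
    return 0.5 * (0.25 + e0 / 6.0)


def bound_from_integral(n, e0=E0, n_grid=200):
    """Midpoint-rule estimate of the right-hand side of Eq. 10 (assumed form)."""
    total = 0.0
    for i in range(n_grid):
        q1 = 2.0 * math.pi * (i + 0.5) / n_grid
        for j in range(n_grid):
            q2 = 2.0 * math.pi * (j + 0.5) / n_grid
            weight = sum((-1) ** m * (math.cos(m * q1) + math.cos(m * q2))
                         for m in range(n + 1)) / (2 * n + 2)
            if weight > 0.0:  # positive part of the weight
                e_q = 2.0 - math.cos(q1) - math.cos(q2)
                e_q_shifted = 2.0 + math.cos(q1) + math.cos(q2)
                total += weight * math.sqrt(e0 * e_q / (12.0 * e_q_shifted))
    return total / n_grid**2


print(first_crossing(G_BAR, BOUND))
print(g_bar_one(), G_BAR[0])
print(bound_from_integral(1), BOUND[0])
```

The value of `g_bar_one()` uses the infinite-lattice $e_0$, so it is expected to agree with the 40×40 entry only up to finite-size and finite-temperature corrections.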
Thus inequality Eq. 10 is violated and Néel order must exist in the two-dimensional antiferromagnetic Heisenberg model with S = 1/2 at T = 0.

n    Bound       T = 0.005                  T = 0.025     T = 0.075
1    2.297e-01   1.80799e-01 ± 3.63e-06     1.80794e-01   1.80792e-01
2    1.714e-01   1.40308e-01 ± 5.63e-06     1.40302e-01   1.40298e-01
3    1.383e-01   1.17686e-01 ± 6.84e-06     1.17678e-01   1.17670e-01
4    1.166e-01   1.03005e-01 ± 7.67e-06     1.02997e-01   1.02985e-01
5    1.013e-01   9.27815e-02 ± 8.27e-06     9.27743e-02   9.27544e-02
6    8.990e-02   8.52115e-02 ± 8.73e-06     8.52048e-02   8.51770e-02
7    8.107e-02   7.93875e-02 ± 9.10e-06     7.93811e-02   7.93436e-02
8    7.400e-02   7.47551e-02 ± 9.40e-06     7.47496e-02   7.47012e-02
9    6.820e-02   7.09844e-02 ± 9.64e-06     7.09795e-02   7.09191e-02
10   6.334e-02   6.78504e-02 ± 9.85e-06     6.78464e-02   6.77734e-02
11   5.921e-02   6.52055e-02 ± 1.00e-05     6.52021e-02   6.51163e-02
12   5.563e-02   6.29418e-02 ± 1.02e-05     6.29389e-02   6.28404e-02
13   5.252e-02   6.09835e-02 ± 1.03e-05     6.09806e-02   6.08695e-02
14   4.976e-02   5.92718e-02 ± 1.04e-05     5.92692e-02   5.91456e-02
15   4.732e-02   5.77638e-02 ± 1.06e-05     5.77617e-02   5.76255e-02

TABLE I: Bound obtained by integrating numerically over the right hand side of Eq. 10 compared with ḡ(n) for a 40×40 lattice and different temperatures.

There are three types of corrections to the data of Table I which need to be taken into account, but which, as we shall show in the following, do not change the above conclusion of a crossing of the curves at n = 8: (i) effects of finite temperature, (ii) effects of the finiteness of the system, (iii) statistical errors. In the following we comment on how these corrections modify the data.

(i) The Quantum Monte Carlo data presented are at T ≥ 0.005. The overall effect of finite temperature is to lower the absolute value of the correlations and therefore also the value of the ḡ(n). The effect of finite temperature is to shift the crossing of the bound and ḡ(n) to larger n, or eventually to destroy a crossing completely.
The functional dependence of the internal energy U(T), which up to an overall factor 3z (z = 2 is the coordination number of the two-dimensional square lattice) equals the correlation function at distance one, has been determined for low T by spin wave theory [13, 14]:

$$U(T) = -e_0 + b T^3. \qquad (11)$$

The coefficient is given in [15] as b ≈ 0.2853626, so the correction for distance one is ≈ b × 10^{-7}, which is two orders of magnitude smaller than the statistical error (see point (iii)).

For distances larger than one, we fitted the data as a function of temperature (taking the exponent of T as fit parameter) for T = 0.005, 0.025, 0.05, 0.075 and found the corrections due to finite temperature all of the order of 10^{-5}, which is the order of the statistical error. Therefore we do not give any finite temperature corrections.

n    Bound       T = 0.025 (40×40)   T = 0.025, extrapolated
1    2.297e-01   1.80794e-01         1.80791e-01 ± 5.09e-06
2    1.714e-01   1.40302e-01         1.40295e-01 ± 7.87e-06
3    1.383e-01   1.17678e-01         1.17668e-01 ± 9.53e-06
4    1.166e-01   1.02997e-01         1.02983e-01 ± 1.07e-05
5    1.013e-01   9.27743e-02         9.27534e-02 ± 1.15e-05
6    8.990e-02   8.52048e-02         8.51762e-02 ± 1.21e-05
7    8.107e-02   7.93811e-02         7.93428e-02 ± 1.26e-05
8    7.400e-02   7.47496e-02         7.46995e-02 ± 1.30e-05
9    6.820e-02   7.09795e-02         7.09154e-02 ± 1.34e-05
10   6.334e-02   6.78464e-02         6.77663e-02 ± 1.37e-05
11   5.921e-02   6.52021e-02         6.51035e-02 ± 1.39e-05
12   5.563e-02   6.29389e-02         6.28188e-02 ± 1.41e-05
13   5.252e-02   6.09806e-02         6.08346e-02 ± 1.43e-05
14   4.976e-02   5.92692e-02         5.90923e-02 ± 1.45e-05
15   4.732e-02   5.77617e-02         5.75473e-02 ± 1.46e-05

TABLE II: Bound obtained by integrating numerically over the right hand side of Eq. 10 compared with ḡ(n) extrapolated for N = 40, 36, 32, 24 at T = 0.025.

(ii) The absolute value of the correlations in the thermodynamic limit is smaller than in systems of finite size. This means that the effect of finite system size is opposite to the effect of temperature.
The finite size behaviour of the ground state energy is well studied for the Heisenberg model on the square lattice. Arguments originating from the quantum nonlinear sigma model description [16] of the Heisenberg model to lowest order in system size give

$$-e_0 = -e_0(N) + \frac{c}{N^3}, \quad \text{with } c > 0 \qquad (12)$$

where $-e_0(N)$ is the ground state energy of a system of size N × N. Though the corrections are not substantial, they do affect the results. Taking into account that the finite size errors, in contrast to the finite temperature effects, might falsely lead to a crossing, we extrapolated the data for N = 24...40 using the functional dependence Eq. 12, which we found well satisfied also for larger distances. The results are shown in Table II. One sees that the numeric values are changed but the crossing point is still at n = 8.

(iii) We compute $\Delta x = \frac{1}{\sqrt{N_{MC}}} \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$ (where the observable x stands for the value of a correlation at a given distance, temperature and system size and $N_{MC}$ is the number of Monte Carlo iterations), which is a reliable estimate for the statistical error of the mean value ⟨x⟩, since for the algorithm of Ref. [9] the autocorrelation time is of order one and the Monte Carlo configurations are almost independent. To assess the quality of our error analysis we also returned to the case of the one-dimensional antiferromagnetic Heisenberg model (see Appendix) and compared results with independent streams of random numbers.

To calculate an upper limit to the errors of ḡ(n), the errors of the correlations were added up (being evaluated with the same configurations, they are not independent).

To conclude, the error analysis shows that the short range correlations entering Eq. 9 were determined with sufficiently high accuracy to prove the existence of a crossing of the bound and the Quantum Monte Carlo data for ḡ(n) at n = 8 and therefore to show the existence of long range order.

FIG. 1: Bound on ḡ(n) obtained from Eq. 9 and ḡ(n) for 24×24, 36×36 and 40×40 at T = 0.025.

Distance   Quantum Monte Carlo    Bethe-Ansatz
0           0.25000000 (  0)
1          -0.14771586 (198)      -0.1477157268
2           0.06067787 (324)       0.0606797699
3          -0.05024194 (282)      -0.0502486272
4           0.03464515 (281)       0.0346527769
5          -0.03088096 (260)      -0.0308903666
6           0.02443619 (255)       0.0244467383
7          -0.02248413 (242)      -0.0224982227
8           0.01895736 (236)

TABLE III: Correlations for a chain with N = 400 sites at T/J = 0.005 compared with results from Ref. [10].

Appendix

(1) In this Appendix we list the correlations of a one-dimensional Heisenberg model with periodic boundary conditions and chain length N = 400 at T = 0.005 compared with results of Ref. [10] for infinite chain length and T = 0. For the internal energy U(T) of the Heisenberg chain the temperature dependence for low T is $U(T) = -e^1_0 + aT^2$ with the ground-state energy $e^1_0 = 0.4431471804$ for 400 sites and $e^1_0 = -\frac{1}{4} + \ln 2$ for the infinite size system [17], and the coefficient a = 1 given in Refs. [18, 19]. This means that the corrections for the correlations in Table III due to finite temperatures are of the order of 10^{-5}.

(2) The exact values of the correlation functions [11, 20] for distance one and two at T = 0 for a chain with 400 sites are $\langle S^3_0 S^3_1 \rangle_{400} = -0.147717441765735$ and $\langle S^3_0 S^3_2 \rangle_{400} = 0.0606813790491800$.

The above data show that the error analysis concerning statistical errors and finite temperature effects is consistent.

Acknowledgement

I am indebted to Prof. E.H. Lieb for bringing the problem of long range order to my attention and for his interest in this work.

[1] F.J. Dyson, E.H. Lieb and B. Simon, J. Stat. Phys. 18, 335-383 (1978).
[2] T. Kennedy, E.H. Lieb and S. Shastry, J. Stat. Phys. 53, 1019-1030 (1988).
[3] I. Affleck, T. Kennedy, E.H. Lieb and H. Tasaki, Comm. Math. Phys. 115:477-528.
[4] E. Jordão Neves and J.
Fernando Perez, Phys. Lett. 114A, 331-333 (1986).
[5] M. Gross, E. Sanchez-Velasco, and E. Siggia, Phys. Rev. B 39, 2484 (1989).
[6] M. Suzuki, Commun. Math. Phys. 51, (1976).
[7] H.G. Evertz, G. Lana, M. Marcu, Phys. Rev. Lett. 70, 875 (1993).
[8] E. Farhi and S. Gutmann, Ann. Phys. (N.Y.) 213, 182 (1992).
[9] B. B. Beard and U.-J. Wiese, Phys. Rev. Lett. 77, 5130 (1996).
[10] J. Sato, M. Shiroishi, M. Takahashi, hep-th/0507290.
[11] J. Damerau, F. Göhmann, N.P. Hasenclever, A. Klümper, cond-mat/0701463.
[12] A. W. Sandvik, Phys. Rev. B 56, 11678 (1997).
[13] R. Kubo, Phys. Rev. 87, 568 (1952).
[14] T. Oguchi, Phys. Rev. 117, 117 (1960).
[15] M. Takahashi, Phys. Rev. B 40, 2494 (1989).
[16] S. Chakravarty, B.I. Halperin, D.R. Nelson, Phys. Rev. B 39, 2344 (1989).
[17] L. Hulthén, Arkiv Mat. Astron. Fysik 26A, 1 (1938).
[18] H.M. Babujian, Nucl. Phys. B215, 317 (1982).
[19] I. Affleck, Phys. Rev. Lett. 56, 746 (1986).
[20] J. Damerau, private communication.

ABSTRACT The existence of Neel order in the S=1/2 Heisenberg model on the square lattice at T=0 is shown using inequalities set up by Kennedy, Lieb and Shastry in combination with high precision Quantum Monte Carlo data. <|endoftext|><|startoftext|> Introduction

Let (M, g) be a compact Riemannian manifold. The Perelman λ-functional is

(1.1) $\lambda_M(g) = \inf_{f \in C^\infty(M)} \{ F(g, f) : \int_M e^{-f} \, dvol_g = 1 \}$

where $F(g, f) = \int_M (R_g + |\nabla f|^2) e^{-f} \, dvol_g$ and $R_g$ is the scalar curvature of g. Note that $\lambda_M(g)$ is the lowest eigenvalue of the operator $-4\triangle + R_g$. By [Pe1] the gradient flow of the Perelman λ-functional is Hamilton's Ricci-flow evolution equation

(1.2) $\frac{\partial}{\partial t} g(t) = -2 Ric(g(t))$

The normalized Ricci flow equation on an n-manifold M reads

(1.3) $\frac{\partial}{\partial t} g(t) = -2 Ric(g(t)) + \frac{2 \bar{R}}{n} g(t)$

where Ric (resp. $\bar{R}$) denotes the Ricci tensor (resp. the average scalar curvature). Note that (1.2) and (1.3) differ only by a change of scale in space and time, and the volume Vol(g(t)) is constant in t. If dim M = n, $\bar{\lambda}_M(g) = \lambda_M(g) Vol_g(M)^{2/n}$ is invariant up to rescaling the metric.
Perelman [Pe1] has proved that λM(g(t)) is non-decreasing along the Ricci flow g(t) whenever λM(g(t)) ≤ 0. This leads to the The first author was supported by NSF Grant 19925104 of China, 973 project of Foundation Science of China, and the Capital Normal University. http://arxiv.org/abs/0704.0714v1 2 F. FANG, Y. ZHANG, AND Z. ZHANG Perelman invariant λM by taking supremum of λM(g) in the set of all Riemannian metrics on M . By [AIL] the Perelman invariant λM is equal to the Yamabe invariant whenever λM ≤ 0, after the earlier estimations (cf. [An5] [Pe2] [Le4] [FZ] and [Kot]). In particular, if (M, g) is a smooth compact oriented 4-manifold with a Spinc-structure c which is a monopole class (i.e., the associated Seiberg-Witten equation possesses an irreducible solution) so that that c21(c)[M ] > 0, by [FZ] λM ≤ − 32π2c21(c)[M ]. Moreover, g is a Kähler-Einstein metric of negative scalar curvature if and only if λM(g) = − 32π2c21(c)[M ]. However, there are plenty of 4-manifolds where the Perel- man invariant λM = − 32π2c21(c)[M ] but do not admit any Kähler Einstein metric. It is natural to study 4-manifolds with these extremal property. For such a 4-manifold M , to seek for an ”optimal” Riemannian metric on M with respect to the Perelman functional λM : M → R, we want to consider a maximal solution g(t) which is a solu- tion of the Ricci flow (1.3). We call a longtime solution g(t), t ∈ [0,+∞), to the Ricci flow (1.3) a maximum solution if lim λM(g(t)) = λM . For a compact 3-manifold, by Perelman [Pe2] all solutions of the Ricci flow (1.2) with surgery exist for longtime and are maximum solutions, provided λM ≤ 0. In the paper [FZZ] obstructions are found for the longtime solutions with bounded curvature to (1.3). In this paper we are going to study the maximum solutions of (1.3) with bounded Ricci curvatures instead. 
To avoid technical terminology we only state our results for symplectic 4-manifolds, using the celebrated work of Taubes [Ta]: if (M,ω) is a compact symplectic manifold with $b_2^+(M) > 1$ (the dimension of the space of self-dual harmonic 2-forms of M), the Spinc-structure induced by ω is a monopole class. Moreover, in this situation $c_1^2(c)[M] = 2\chi(M) + 3\tau(M)$, where χ(M) (resp. τ(M)) is the Euler characteristic (resp. signature) of M. Theorem 1.1. Let (M,ω) be a smooth compact symplectic 4-manifold satisfying $b_2^+(M) > 1$ and 2χ(M) + 3τ(M) > 0. If g(t), t ∈ [0,∞), is a solution to (1.3) such that |Ric(g(t))| ≤ 3, and $\bar\lambda_M(g(t)) = -\sqrt{32\pi^2(2\chi(M) + 3\tau(M))}$, then there exists an m ∈ N, and sequences of points {xj,k ∈ M}, j = 1, ..., m, satisfying that, by passing to a subsequence, $(M, g(t_k + t), x_{1,k}, \dots, x_{m,k}) \xrightarrow{d_{GH}} (\coprod_{j=1}^m N_j, g_\infty, x_{1,\infty}, \dots, x_{m,\infty})$, t ∈ [0,∞), in the m-pointed Gromov-Hausdorff sense for any sequence tk → ∞, where (Nj, g∞), j = 1, ..., m, are complete Kähler-Einstein orbifolds of complex dimension 2 with at most finitely many isolated orbifold points. The scalar curvature (resp. volume) of g∞ is $-\mathrm{Vol}_{g_0}(M)^{-1/2}\sqrt{32\pi^2(2\chi(M) + 3\tau(M))}$ (resp. $\mathrm{Vol}_{g_0}(M) = \sum_{j=1}^m \mathrm{Vol}_{g_\infty}(N_j)$). Moreover, the convergence is C∞ in the non-singular part of $\coprod_{j=1}^m N_j$. MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS We first remark that, if the diameters diamg(tk)(M) possess a uniform upper bound, then m = 1, and N1 is a compact Kähler-Einstein orbifold. Secondly, if the Ricci curvature bound in the above theorem is replaced by a uniform bound on the sectional curvature, then all (Nj, g∞), j = 1, ..., m, are complete Kähler-Einstein manifolds. By the same arguments as in [An5][An6], $\coprod_{j=1}^m N_j$ can be weakly embedded in M, written $\coprod_{j=1}^m N_j \subset\subset M$, i.e. for any compact subset $K \subset \coprod_{j=1}^m N_j$, there is a smooth embedding FK : K → M. Furthermore, there exists a sufficiently large compact subset $K \subset \coprod_{j=1}^m N_j$ such that M\K admits an F-structure of positive rank.
This type of geometric decomposition seems very useful for understanding the diffeomorphism type of 4-manifolds. Theorem 1.2. Let (M,ω) be a smooth compact symplectic 4-manifold such that $b_2^+(M) > 1$ and let g(t), t ∈ [0,∞), be a solution to (1.3) such that |R(g(t))| ≤ 12. If in addition χ(M) = 3τ(M) > 0, then $\bar\lambda_M(g(t)) = -\sqrt{32\pi^2(2\chi(M) + 3\tau(M))}$. Moreover, if |Ric(g(t))| ≤ 3, the Kähler-Einstein metric g∞ in Theorem 1.1 is complex hyperbolic. To conclude the section we point out that the main result in Theorem 1.1 (resp. Theorem 1.2) holds if the manifold is not symplectic but a compact oriented 4-manifold with a monopole class c1 (i.e. with a Spinc-structure with non-vanishing Seiberg-Witten invariant) so that $c_1^2 = 2\chi(M) + 3\tau(M) > 0$. 2. Preliminaries 2.1. Monopole class. Let (M, g) be a compact oriented Riemannian 4-manifold with a Spinc-structure c. Let $b_2^+(M)$ denote the dimension of the space of self-dual harmonic 2-forms in M. Let $S_c^\pm$ denote the Spinc-bundles associated to c, and let L be the determinant line bundle of c. There is a well-defined Dirac operator $D_A : \Gamma(S_c^+) \to \Gamma(S_c^-)$. Let $c : \wedge^* T^*M \to \mathrm{End}(S_c^+)$ denote the Clifford multiplication on the Spinc-bundles, and, for any $\phi \in \Gamma(S_c^\pm)$, let $q(\phi) = \phi \otimes \phi^* - \frac{1}{2}|\phi|^2\,\mathrm{id}$. The Seiberg-Witten equations read (2.1) $D_A\phi = 0$, $c(F_A^+) = q(\phi)$, where A is a Hermitian connection on L, and $F_A^+$ is the self-dual part of the curvature of A. A solution of (2.1) is called reducible if φ ≡ 0; otherwise, it is called irreducible. If (φ,A) is a solution of (2.1), one calculates easily that (2.2) $|F_A^+| = \frac{1}{2\sqrt{2}}|\phi|^2$. The Bochner formula reads (2.3) $0 = -2\triangle|\phi|^2 + 4|\nabla^A\phi|^2 + R_g|\phi|^2 + |\phi|^4$, where Rg is the scalar curvature of g. The Seiberg-Witten invariant can be defined by counting the irreducible solutions of the Seiberg-Witten equations (cf. [Le2]). Definition 2.2. ([K1]) Let M be a smooth compact oriented 4-manifold.
An element α ∈ H²(M,Z)/torsion is called a monopole class of M if and only if there exists a Spinc-structure c on M with first Chern class c1 ≡ α (mod torsion), so that the Seiberg-Witten equations have a solution for every Riemannian metric g on M. By the celebrated work of Taubes [Ta], if (M,ω) is a compact symplectic 4-manifold with $b_2^+(M) > 1$, the canonical class of (M,ω) is a monopole class. 2.3. Kato’s inequality. Let (M, g) be a Riemannian Spinc-manifold of dimension n. The following Kato inequality is useful. Proposition 2.4. (Proposition 2.2 in [BD]) Let φ be a harmonic Spinc-spinor on (M, g), i.e. DAφ = 0, where DA is the Dirac operator and A is a Hermitian connection on the determinant line bundle. Then (2.4) $|\nabla|\phi||^2 \le \frac{n-1}{n}|\nabla^A\phi|^2 \le |\nabla^A\phi|^2$ at all points where φ is non-zero. Moreover, $|\nabla|\phi||^2 = |\nabla^A\phi|^2$ occurs only if $\nabla^A\phi \equiv 0$. Note that the arguments in the proof of Proposition 2.2 in [BD], where the same conclusion was derived for a Spin-spinor φ, can be used to prove this proposition without any change. For any ϵ > 0, let $|\phi|_\epsilon^2 = |\phi|^2 + \epsilon^2$. If φ is harmonic, by the above proposition, (2.5) $|\nabla|\phi|_\epsilon|^2 \le |\nabla|\phi||^2 \le \frac{n-1}{n}|\nabla^A\phi|^2 \le |\nabla^A\phi|^2$ at points where φ(p) ≠ 0. Since {p ∈ M : φ(p) ≠ 0} is dense in M for harmonic φ, we conclude that (2.5) holds everywhere in M. 2.5. Chern-Gauss-Bonnet formula and Hirzebruch signature formula. Let (M, g) be a compact closed oriented Riemannian 4-manifold, and let χ(M) and τ(M) be the Euler number and the signature of M respectively. The Chern-Gauss-Bonnet formula and the Hirzebruch signature theorem say that (2.6) $\chi(M) = \frac{1}{8\pi^2}\int_M \bigl(\frac{R_g^2}{24} + |W_g|^2 - \frac{1}{2}|Ric^o|^2\bigr)\, dv_g$, and (2.7) $\tau(M) = \frac{1}{12\pi^2}\int_M \bigl(|W_g^+|^2 - |W_g^-|^2\bigr)\, dv_g$, where $Ric^o = Ric(g) - \frac{R_g}{4}\, g$ is the Einstein tensor, and $W_g^+$ and $W_g^-$ are the self-dual and anti-self-dual Weyl tensors respectively (cf. [B]). If g is a Kähler-Einstein metric, then (2.8) $R_g^2 = 24|W_g^+|^2$ (cf. [Le3]), which will be used in the proof of Theorem 1.1.
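To see how (2.6), (2.7) and (2.8) combine, the following short computation may help. It is a sketch under the assumption that the formulas carry their standard normalizations, with coefficients $\frac{1}{8\pi^2}$ in the Chern-Gauss-Bonnet formula and $\frac{1}{12\pi^2}$ in the signature formula; it is not quoted from the paper.

```latex
\begin{align*}
2\chi(M) + 3\tau(M)
  &= \frac{1}{4\pi^2}\int_M \Bigl(\frac{R_g^2}{24} + |W_g^+|^2 + |W_g^-|^2
       - \frac{1}{2}|Ric^o|^2\Bigr)\, dv_g
     + \frac{1}{4\pi^2}\int_M \bigl(|W_g^+|^2 - |W_g^-|^2\bigr)\, dv_g \\
  &= \frac{1}{4\pi^2}\int_M \Bigl(\frac{R_g^2}{24} + 2|W_g^+|^2
       - \frac{1}{2}|Ric^o|^2\Bigr)\, dv_g .
\end{align*}
% For a Kaehler-Einstein metric, Ric^o = 0 and R_g^2 = 24 |W_g^+|^2 with R_g constant:
\begin{align*}
2\chi(M) + 3\tau(M)
  = \frac{1}{4\pi^2}\int_M \Bigl(\frac{R_g^2}{24} + \frac{R_g^2}{12}\Bigr)\, dv_g
  = \frac{R_g^2\, \mathrm{Vol}_g(M)}{32\pi^2}.
\end{align*}
```

Under these assumptions a Kähler-Einstein metric of negative scalar curvature has $R_g = -\mathrm{Vol}_g(M)^{-1/2}\sqrt{32\pi^2(2\chi(M) + 3\tau(M))}$, which has the same form as the scalar curvature of the limit metric in Theorem 1.1.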
By the Chern-Gauss-Bonnet formula, one has an L²-bound on the curvature operator Rm(g) in terms of a bound on the Ricci curvature, i.e. if |Ric(g)| < C, then (2.9) $\int_M |Rm(g)|^2\, dv_g \le 8\pi^2\chi(M) + C_1 \mathrm{Vol}_g(M)$, where C and C1 are constants independent of (M, g). Let (N, g) be a complete Ricci-flat Einstein 4-manifold. Assume that (2.10) $\int_N |Rm(g)|^2\, dv_g < \infty$, and $\mathrm{Vol}_g(B_g(x, r)) \ge C r^4$, for all r > 0, a point x ∈ N, and a positive constant C. By Theorem 2.11 of [N], (N, g) is ALE (i.e., asymptotically locally Euclidean) of order 4. It is well-known that N is asymptotic to the cone on the spherical space form S³/Γ, where Γ ⊂ SO(4) is a finite group. The Chern-Gauss-Bonnet formula implies that (2.11) $\chi(N) = \frac{1}{8\pi^2}\int_N |Rm(g)|^2\, dv_g + \frac{1}{|\Gamma|}$ (cf. [N] and [An1]). 2.6. Curvature estimates for 4-manifolds. Now let us recall a result of [CT], which is important to the proof of Theorem 1.1. Let (M, g) be a complete Riemannian 4-manifold. A subset U ⊂ M such that for all p ∈ U , sup Bg(p,1) Ric(g) ≥ −3, is called ϱ-collapsed if for all p ∈ U, Volg(Bg(p, 1)) ≤ ϱ. By Theorem 0.1 in [CG], there is a constant ε4 such that if U is ϱ-collapsed with sectional curvature |Kg| ≤ 1 and ϱ ≤ ε4, then U carries an F-structure of positive rank. Theorem 2.7. (Remark 5.11 and Theorem 1.26 in [CT]) There exist constants δ > 0, c > 0 such that: if (M, g) is a complete Riemannian 4-manifold with |Ric(g)| ≤ 3 and $\int_M |Rm(g)|^2\, dv_g \le C$, and if E ⊂ M is a bounded open subset such that T1(E) = {x ∈ M : dist(x, E) ≤ 1} is ε4-collapsed with $\int_{B_g(x,1)} |Rm(g)|^2\, dv_g \le \delta$ for all x ∈ T1(E), then $\int_E |Rm(g)|^2\, dv_g \le c\, \mathrm{Vol}_g(A_{0,1}(E))$, where A0,1(E) = T1(E)\E. 3. The limiting behavior of Ricci flow In this section we study the limiting behavior of the Ricci flow with bounded Ricci curvature on 4-manifolds. We will assume in this section that M is a smooth closed oriented 4-manifold with λ̄M < 0, and g(t), t ∈ [0,+∞), is a longtime solution of the normalized Ricci flow (1.3) with bounded Ricci curvature.
By normalization we may assume that |Ric(g(t))| ≤ 3. By (2.9) there is a constant C independent of t such that |Rm(g(t))|2dvg(t) ≤ C. Let us denoted by V the volume Volg(0)(M) = Volg(t)(M), and R̆(g(t)) = min R(g(t))(x) the minimum of the scalar curvature of g(t). It is easy to see that R̆(g(t)) ≤ λMV − Lemma 3.1. (3.1.1) lim λM(g(t)) = lim R(g(t)) = lim R̆(g(t)) = R∞ (3.1.2) lim |R(g(t))−R(g(t))|dvg(t) = 0, (3.1.3) lim |Rico(g(t))|2dvg(t) = 0. Proof. By Perelman [Pe1] λM(g(t)) is a non-decreasing function on t, therefore the limit λM(g(t)) exists since λM < 0. Now let us denote by R∞ the limit lim λM(g(t)). Note that R∞ ≤ λMV − 2 < 0. To prove (3.1.1), we first prove that both lim R(g(t)) and lim R̆(g(t)) exist and take values R∞. By the same arguments as in the proof of Proposition 2.6 and Lemma 2.7 of [FZZ] we get that R(g(t))− R̆(g(t)) = 0. Observe that R(g(t)) ≥ λM(g(t)) ≥ R̆(g(t)) (cf. [KL] (92.3)). Therefore lim R(g(t)) = R∞ = lim R̆(g(t)). This proves (3.1.1). Note that∫ |R(g(t))− R(g(t))|dvg(t) ≤ (R(g(t))− R̆(g(t)))dvg(t) + (R(g(t))− R̆(g(t)))dvg(t) = 2(R(g(t))− R̆(g(t)))V (3.1.2) follows from (3.1.1). By Lemma 3.1 in [FZZ], |Rico(g(t))|2dvg(t)dt < ∞, and, by Lemma 1 in [Ye], we have |Rico(g(t))|2dvg(t) ≤ −2 |∇Rico(g(t))|2dvg(t)+4 |Rm||Rico(g(t))|2dvg(t) < D, where D is a constant independent of t. By the same argument as in the proof of Proposition 2.6 in [FZZ] (3.1.3) follows. � MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 7 The following is the main result of this section, which is an analogy of Theorem 10.5 in [CT], where the same conclusion was derived for closed oriented Einstein 4-manifolds with the same negative Einstein constant. The key point in our case is to use Lemma 3.1 to get non-collapsing balls and to prove the limiting metric is an Einstein metric (cf. Lemma 3.3 and Lemma 3.4 below). Proposition 3.2. Let M be a smooth closed oriented 4-manifold with λM < 0. 
If g(t), t ∈ [0,∞), is a solution to (1.3) such that |Ric(g(t))| ≤ 3, and {tk} is a sequence of times tending to infinity such that diamgk(M) → ∞ as k → ∞, where gk = g(tk), then there exists an m ∈ N, and sequences of points {xj,k ∈ M}, j = 1, ..., m, satisfying that, by passing to a subsequence, $(M, g_k, x_{1,k}, \dots, x_{m,k}) \xrightarrow{d_{GH}} (\coprod_{j=1}^m N_j, g_\infty, x_{1,\infty}, \dots, x_{m,\infty})$ in the m-pointed Gromov-Hausdorff sense for k → ∞, where (Nj, g∞), j = 1, ..., m, are complete Einstein 4-orbifolds with at most finitely many isolated orbifold points {qi}. The scalar curvature (resp. volume) of g∞ is $R_\infty = \lim_{t\to\infty} \lambda_M(g(t))$ (resp. $V = \mathrm{Vol}_{g_0}(M) = \sum_{j=1}^m \mathrm{Vol}_{g_\infty}(N_j)$). Furthermore, in the regular part of Nj, {gk} converges to g∞ in both the L2,p and the C1,α sense for all p < ∞ and α < 1. We divide the proof of Proposition 3.2 into several lemmas. A key result in the paper [CT] shows that, for any compact oriented Einstein 4-manifold (X, g) with Einstein constant −3, there exists a constant C depending only on the Euler number of X, and a point x ∈ X such that Volg(Bg(x, 1)) ≥ C Volg(X) (cf. Theorem 0.14 [CT]). Cheeger-Tian remarked that the same result continues to hold for 4-manifolds which are sufficiently negatively Ricci pinched. The following lemma is an analogue of this result for the metrics gk in Proposition 3.2. Lemma 3.3. There exists a constant v > 0, and a sequence {xk} ⊂ M such that Volgk(Bgk(xk, 1)) ≥ v. Proof. Let ε4 > 0 be the critical constant of Cheeger-Tian (cf. §1 [CT]), i.e., if X is a Riemannian 4-manifold which is ε4-collapsed with locally bounded curvature, then X carries an F-structure of positive rank. We may assume that, for all x ∈ M and gk, Volgk(Bgk(x, 1)) < ε4. By a standard covering argument, for any k, there exist finitely many points q1, ..., ql such that $E = M \setminus \bigcup_{i=1}^{l} B_{g_k}(q_i, 1)$ satisfies the hypothesis of Theorem 2.7. Moreover, l ≤ Cδ⁻¹ where C and δ are the constants in Theorem 2.7.
Therefore, by Theorem 2.7 we conclude that, there is a constant C1 independent of k such that (3.1) |R(gk)|2dvk ≤ 6 |Rm(gk)|2dvk ≤ C1 Volgk(Bgk(qi, 1)). 8 F. FANG, Y. ZHANG, AND Z. ZHANG On the other hand, by Lemma (3.1.2) (R(gk) 2 − R(gk)2)dvk| ≤ 24 |R(gk)− R(gk)|dvk k→∞−→ 0. Therefore (3.2) ∞Volgk(E)− R(gk) 2dvk ≤ R(gk)2Volgk(E)− R(gk) (R(gk) 2 − R(gk)2)dvk for sufficiently large k since R∞ ≤ λMV − 2 < 0. By inserting (3.1) we get that ∞(V − V olgk(Bgk(qi, 1))− ∞Volgk(E)− Volgk(Bgk(qi, 1)), V ≤ C2 Volgk(Bgk(qi, 1)), where C2 is a constant independent of k. Therefore, there is at least a ball among the l balls whose volume is at least V . The desired result follows. � Assuming that diamgk(M) → ∞ for k → ∞, by using the technique developed in [An3], the analogue of Theorem 3.3 in [An2] holds (cf. Theorem 2.3 in [An4]), i.e. there exist a sequence of points {xk} ⊂ M such that, by passing to a subsequence, {(M, gk, xk)} dGH−→ (N∞, g∞, x∞) where N∞ is a 4-orbifold with only isolated orbifold points {qi}, g∞ is a complete C0 orbifold metric, and g∞ is a C 1,α ∩L2,p Riemannian metric on the regular part of N∞, for all p < ∞ and α < 1. Furthermore, {gk} converges to g∞ in the L2,p (resp. C1,α) sense on the regular part of N∞, i.e. for any r ≫ 1 and k, there is a smooth embedding Fk,r : Bg∞(x∞, r)\ i Bg∞(qi, r −1) ⊂ N∞ → M such that, by passing to a subsequence, F ∗k,rgk converge to g∞ in both L 2,p and C1,α senses. Lemma 3.4. g∞ is an Einstein orbifold metric with scalar curvature R∞. Proof. We first prove that g∞ is an Einstein metric with scalar curvature R∞ on the regular part of N∞. Since F k,rgk converge to g∞ in the L 2,p(resp. C1,α) sense on Bg∞(p∞, r)\ i Bg∞(qi, r −1), for any r, by Lemma 3.1, we obtain that Bg∞ (p∞,r)\ i Bg∞(qi,r |Rico(g∞)|2dv∞ ≤ lim |Rico(gk)|2dvk = 0, MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 9 Bg∞ (p∞,r)\ i Bg∞ (qi,r |R(g∞)− R∞|dv∞ ≤ lim |R(gk)− R(gk)|dvk = 0. 
Therefore g∞ is a C1,α Riemannian metric on $B_{g_\infty}(p_\infty, r) \setminus \bigcup_i B_{g_\infty}(q_i, r^{-1})$ which satisfies the Einstein equation in the weak sense. By elliptic regularity theory, g∞ is a smooth Einstein metric with scalar curvature R∞. Recall that g∞ is a C0-orbifold metric, i.e. for any orbifold point qi ∈ N∞, there is a neighborhood Ui ≅ B(0, r)/Γ of qi such that g̃∞ is a C0-Riemannian metric on B(0, r) ⊂ R⁴, where Γ ⊂ SO(4) is a finite subgroup acting freely on S³, and g̃∞|B(0,r)\{0} is the pull-back metric of g∞. Note that g̃∞ is a smooth Einstein metric on B(0, r)\{0} satisfying $\int_{B(0,r)} |Rm(\tilde g_\infty)|^2\, dv_{\tilde g_\infty} < C < \infty$. By the arguments in [An1] and [Ti], g̃∞ is a C∞ Einstein metric on B(0, r) (cf. the proof of Theorem C in [An1], and Section 4 in [Ti]). Hence g∞ is an Einstein orbifold metric. □ By the discussion before Lemma 3.4 we may choose ℓ sequences of points {xj,k} ⊂ M, j = 1, ..., ℓ, such that distgk(xi,k, xj,k) → ∞ as k → ∞ for any i ≠ j, and (3.3) $\{(M, g_k, x_{1,k}, \dots, x_{\ell,k})\} \xrightarrow{d_{GH}} (\coprod_{j=1}^{\ell} N_j, g_\infty, x_{1,\infty}, \dots, x_{\ell,\infty})$, where (Nj, g∞, xj,∞), j = 1, ..., ℓ, are complete Einstein 4-orbifolds with only isolated singular points and scalar curvature R∞. Furthermore, {gk} converges to g∞ in both the L2,p and the C1,α sense on the regular parts of Nj, j = 1, ..., ℓ. Note that (3.4) $V \ge \sum_{j=1}^{\ell} \mathrm{Vol}_{g_\infty}(N_j)$. Lemma 3.5. The number of orbifold points of $\coprod_{j} N_j$ is less than a constant depending only on the Euler characteristic χ(M). Proof. For each orbifold point q ∈ Nj, there exist a sequence {qk} ⊂ M, and two constants r ≫ r1 > 0 such that: (3.5.1) q ∈ Bg∞(xj,∞, r); (3.5.2) Bg∞(q, r1)\Bg∞(q, σ) lies in the regular part of Bg∞(xj,∞, r) for any σ < r1; (3.5.3) $(B_{g_k}(q_k, r_1) \setminus B_{g_k}(q_k, \sigma), g_k) \xrightarrow{C^{1,\alpha}} (B_{g_\infty}(q, r_1) \setminus B_{g_\infty}(q, \sigma), g_\infty)$. By the definition of harmonic radius (cf. [An3]), the harmonic radii of all points in Bgk(qk, r1)\Bgk(qk, σ) have a uniform lower bound, say µ > 0, a constant depending on σ but independent of k.
Clearly, there is a positive constant v0 (e.g., Volg∞(Bg∞(xj,∞, r))) such that Volgk(Bgk(xj,k, r)) ≥ v0. Note that the Sobolev constants CS,k of Bgk(xj,k, r) are bounded from below by a constant depending only on v0, r (cf. [An2] and [Cr]). Therefore, by [An2] again we get that Volgk(Bgk(qk, s)) ≥ Cs⁴ for any s ≪ 1, where C is independent of k. Let us denote by rh,k the infimum of the harmonic radii of gk in the ball Bgk(qk, r1). Note that rh,k → 0 as k → ∞, since q is an orbifold point (cf. [An3]). Therefore, there is a point q̄k ∈ Bgk(qk, σ) so that rh(q̄k) = rh,k for sufficiently large k. Consider the normalized balls $(B_{g_k}(q_k, r_1), r_{h,k}^{-2} g_k)$, which have harmonic radii at least 1. By passing to a subsequence if necessary, $(B_{g_k}(q_k, r_1), r_{h,k}^{-2} g_k, \bar q_k) \xrightarrow{C^{1,\alpha}} (W, \bar g_\infty, \bar q)$, where (W, ḡ∞) is a complete Ricci-flat 4-manifold satisfying (3.5) Volḡ∞(Bḡ∞(q̄, r)) ≥ Cr⁴ for any r > 0. It is obvious that $\int_W |Rm(\bar g_\infty)|^2\, dv_{\bar g_\infty} \le \liminf_{k\to\infty} \int |Rm(g_k)|^2\, dv_k \le C$. Therefore (W, ḡ∞) is an asymptotically locally Euclidean space (cf. Theorem 2.11 in [N] or [An1]), which is asymptotic to a cone on S³/Γ, where Γ ⊂ SO(4) is a finite group acting freely on S³. By the Chern-Gauss-Bonnet formula (3.6) $\chi(W) = \frac{1}{8\pi^2}\int_W |Rm(\bar g_\infty)|^2\, dv_{\bar g_\infty} + \frac{1}{|\Gamma|}$. By [An1] W is isometric to R⁴, provided |Γ| = 1. Since the harmonic radius of ḡ∞ at q̄ is 1, ḡ∞ cannot be the Euclidean metric. Hence |Γ| ≥ 2. It is easy to verify that χ(W) ≥ 1. By (3.6), together with 1/|Γ| ≤ 1/2, we get that $\int_W |Rm(\bar g_\infty)|^2\, dv_{\bar g_\infty} \ge 4\pi^2$. This proves that every orbifold point contributes at least 4π² to $\liminf_{k\to\infty} \int |Rm(g_k)|^2\, dv_k$. By the rescaling invariance of the integral we conclude that the number of orbifold points satisfies β ≤ C/(4π²). □ The following lemma is an analogue of a result in Cheeger-Tian [CT]. Lemma 3.6. ℓ < χ(M)+β+1, where β := #{number of orbifold points in Lemma 3.5}. Proof.
Suppose not, i.e, ℓ ≥ χ(M) + β + 1, by definition there are at least χ(M) + 1 components of 1Nj which are smooth complete non-compact Einstein 4-manifolds of finite volume, for simplicity saying N1, · · · , Ns, where s ≥ χ(M) + 1. By Theorem 4.5 in [CT], for each 1 ≤ j ≤ s, |Rm(g∞)|2dvg∞ ≥ 8π2. Since (M, gk, xk,j) L2,p−→ (Nj , g∞, x∞,j), by Chern-Gauss-Bonnet formula and (3.1.3) in Lemma 3.1 we get that 8π2χ(M) = lim |Rm(gk)|2dvgk ≥ |Rm(g∞)|2dvg∞ ≥ 8π2(χ(M) + 1). A contradiction. � MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 11 Let m denote the maximal value of all possible choice of the base point sequences in (3.3), which has a upper bound by Lemma 3.6. Lemma 3.7. Let Mk,r = M\ Bgk(xj,k, r). For sufficiently large r, there is a constant C independent of r such that (3.7) lim Volgk(Mk,r) ≤ C Volg∞(Nj\Bg∞(xj,∞, (3.8) Volg∞(Nj) = V. Proof. We may choose r ≫ 1 such that, for any y ∈ j=1(Nj\Bg∞(xj,∞, r − 1)), Volg∞(Bg∞(y, 1)) ≤ 12ε4, where ε4 > 0 is the critical constant of Cheeger-Tian (cf. proof of Lemma 3.3 or §1 [CT] ). Now we claim that there is a constant k0 ≫ 1 such that, for any k > k0 and any x ∈ Mk,r, Volgk(Bgk(x, 1)) ≤ ε4. If it is false, without loss of generality we may assume a sequence of points {yk} ⊂ Mk,r such that (3.9) Volgk(Bgk(yk, 1)) > ε4 Observe that the distance distgk(yk, xj,k) → ∞ as k → ∞ for all 1 ≤ j ≤ m. Otherwise, assuming distgk(yk, xj,k) < ρ for some j and ρ > 0, we get that F j,k,ρ(yk) → y∞ ∈ Bg∞(xj,∞, ρ)\Bg∞(xj,∞, r − 1), and so (3.10) Volgk(Bgk(yk, 1)) → Volg∞(Bg∞(y∞, 1)) ≤ when k → ∞, since F ∗j,k,ρgk C1,α-converges to gj,∞, where (3.11) Fj,k,ρ : Bg∞(xj,∞, ρ)\ Bg∞(qi, ρ −1) ⊂ N∞ → M is a smooth embedding so that F ∗j,k,ρgk converges to g∞ in the C 1,α-sense (cf. the discussion before Lemma 3.4). A contradiction to (3.9). Note that (M, gk, yk) dGH−→ (N∞, g∞, y∞) where N∞ is a complete 4-orbifold different from each of Nj, 1 ≤ j ≤ m. This violates the choice of maximality of m. Hence we have proved the claim. 
By a standard covering argument, for any k, there exist finitely many points z1,k, · · · , zI,k such that Ek,r = Mk,r\ i=1Bzi,k(1) satisfies the hypothesis of Theorem 2.7, where I is independent of k. By Theorem 2.7, there is a constant C independent of k such that |R(gk)|2dvk ≤ 6 |Rm(gk)|2dvk ≤ C( Volgk(Bgk(zi,k, 1))+Volgk(A0,1(Mk,r))). By Lemma 3.1, for k ≫ 1, we have (3.12) |R(gk)− R(gk)|dvk < |R(gk)− R(gk)|dvk −→ 0. 12 F. FANG, Y. ZHANG, AND Z. ZHANG By (3.2) we get ∞Volgk(Ek,r)− R(gk) 2dvk ≤ (R(gk) 2 − R(gk)2)dvk |R(gk)− R(gk)|dvk, Since Volgk(Ek,r) ≥ Volgk(Mk,r) − Volgk(Bgk(zi,k, 1)), by the above together we get immediately that (3.13) Volgk(Mk,r) ≤ C( Volgk(Bgk(zi,k, 1))+Volgk(A0,1(Mk,r)))+24 |R(gk)−R(gk)|dvk. If distgk(zi,k, xj,k) → ∞ for all 1 ≤ j ≤ m, by the same argument as above we get that Volgk(Bgk(zi,k, 1)) → 0 when k → ∞. Otherwise, there exists a subsequence ks → ∞ and an index j such that distgks (zi,ks, xj,ks) < ρ for some constant ρ. In both cases, we obtain lim sup Volgk(Bgk(zi,ks, 1)) ≤ Volg∞(Nj\Bg∞(xj,∞, for r ≫ ρ. Therefore, by (3.12) and (3.13) we conclude immediately (3.7). By (3.7) it follows that lim k,r→∞ Volgk(Mk,r) → 0. Hence (3.8) follows. � By now Proposition 3.2 follows by the above lemmas. 4. Smooth convergence on the regular part The main result of this section is the following: Proposition 4.1. Let M be a closed 4-manifold satisfying that λ̄M < 0 and let g(t), t ∈ [0,∞), be a solution to the normalized Ricci flow equation (1.3) on M with uniformly bounded Ricci curvature. If (M, g(tk), pk) dGH−→ (N∞, g∞, p∞), where tk → ∞ and N∞ is a 4-dimensional orbifold, and g(tk) C1,α−→ g∞ on the regular part R of N∞ (the compliment of the orbifold points), then, by passing to a subsequence, for all t ∈ [0,∞), (M, g(tk + t), pk) dGH−→ (N∞, g∞(t), p∞), where g∞(t) is a family of smooth metrics on R solving the normalized Ricci flow equation on R with g∞(0) = g∞. Moreover, the convergence is smooth on R× [0,∞). 
In [Se] the convergence of the Kähler-Ricci flow on compact Kähler manifolds with bounded Ricci curvature was studied. It seems that the arguments in [Se] could be applied to prove Proposition 4.1, but the authors cannot completely follow that line of argument. Therefore, we give a quite different approach, where we first give a curvature estimate for the Ricci flow similar to Perelman’s pseudolocality theorem. Using this curvature estimate we prove that the limit Ricci flow exists on R × [0,∞). Finally, we prove that R is exactly the regular part of every subsequence limit of (M, g(tk + t), pk), for all t ∈ [0,∞). It is worth pointing out that our approach works only in dimension 4. We now give a curvature estimate for the Ricci flow which is an analogue of Perelman’s pseudolocality theorem (cf. [Pe1] Thm. 10.1). The difference is that here we use the hypothesis of local almost Euclidean volume growth, instead of the almost Euclidean isoperimetric estimate. The proof is much easier than that of Perelman’s pseudolocality theorem. Theorem 4.2. There exist universal constants δ0, ϵ0 > 0 with the following property. Let g(t), t ∈ [0, (ϵ0r0)²], be a solution to the Ricci flow equation (1.2) on a closed n-manifold M and let x0 ∈ M be a point. If the scalar curvature satisfies R(x, t) ≥ −r0⁻² whenever distg(t)(x0, x) ≤ r0, and the volume satisfies Volg(t)(Bg(t)(x, r)) ≥ (1 − δ0)Vol(B(r)) for all Bg(t)(x, r) ⊂ Bg(t)(x0, r0), where B(r) denotes a ball of radius r in Euclidean n-space and Vol(B(r)) denotes its Euclidean volume, then the Riemannian curvature tensor satisfies |Rm|g(t)(x, t) ≤ t⁻¹ whenever distg(t)(x0, x) < ϵ0r0 and 0 < t ≤ (ϵ0r0)². In particular, |Rm|g(t)(x0, t) ≤ t⁻¹ for all times t ∈ (0, (ϵ0r0)²]. Proof. We use Claim 1 and Claim 2 of Theorem 10.1 in [Pe1] and argue by contradiction.
For any given small constants ǫ, δ > 0, set ǫ0 = ǫ, δ0 = δ, then there is a solution to the Ricci flow equation (1.2), say (M, g(t)), not satisfying the conclusion of the theorem. After a rescaling, we may assume that r0 = 1. Denote by M̄ the non-empty set of pairs (x, t) such that |Rm|g(t)(x, t) > t−1, then as in Claim 1 and Claim 2 of Theorem 10.1 in [Pe1], we can choose another space time point (x̄, t̄) ∈ M̄ with 0 < t̄ ≤ ǫ2, distg(t̄)(x0, x̄) < 110 , such that |Rm|g(t)(x, t) ≤ 4Q whenever t̄− 1 Q−1 ≤ t ≤ t̄, distg(t̄)(x̄, x) ≤ (100nǫ)−1Q−1/2, where Q = |Rm|g(t̄)(x̄, t̄). It is remarkable that from the proof of Claim 2 of Theorem 10.1 in [Pe1], each such a space time point (x, t) satisfies distg(t)(x, x0) < distg(t̄)(x0, x̄) + (100nǫ) −1Q−1/2 < + (100n)−1 < Now choosing sequences of positive numbers ǫk → 0 and δk → 0, we obtain a sequence of solutions (Mk, gk(t)), t ∈ [0, ǫ2k] and a sequence of points x0,k, x̄k ∈ Mk and times t̄k, with each satisfying the assumptions of the theorem and the properties described above. In particular, we have that Qk = |Rmk|gk(t̄k)(x̄k, t̄k) → ∞. Consider the sequence of pointed Ricci flow solutions (Bgk(t̄k)(x̄k, (100nǫk) k ), Qkgk(Q k t+ t̄k), x̄k), t ∈ [− , 0]. Using Hamilton’s compactness theorem for solutions to the Ricci flow, we can extract a subsequence which converge to a complete Ricci flow solution (M∞, g∞(t), x̄∞), t ∈ , 0], with |Rm∞|g∞(0)(x̄∞, 0) = 1. 14 F. FANG, Y. ZHANG, AND Z. ZHANG By assumption, the balls Bgk(t̄k)(x̄k, (100nǫk) k ) ⊂ Bgk(t)(x0,k, for any t ∈ [t̄− 1 Q−1, t̄], so the scalar curvature Rk(x, t) ≥ −1 for t ∈ [t̄− 12nQ −1, t̄] and x ∈ Bgk(t̄k)(x̄k, 110(100nǫk) k ) and Volgk(t)(Bgk(t)(x, r)) ≥ (1−δk)Vol(B(r)) for any metric ball Bgk(t)(x, r) ⊂ Bgk(t̄k)(x̄k, 110(100nǫk) k ), t ∈ [t̄− 12nQ −1, t̄]. Passing to the limit, we see that g∞(t) has scalar curvature R∞ ≥ 0 everywhere and local volume Volg∞(t)(Bg∞(t)(z, r)) ≥ Vol(B(r)) for any balls Bg∞(t)(z, r) at time t ∈ (− 12n , 0]. 
Then the local variation formula of volume implies that R∞ ≡ 0 on M∞×(− 12n , 0], see [STW] for details. By the evolution of the scalar curvature ∂ R∞ = △R∞ + 2|Ric∞|2, we get that Ric∞ ≡ 0 over M∞ × (− 12n , 0]. Then the Bishop-Gromov volume comparison theorem implies that g∞(t) are flat solutions to the Ricci flow, which contradicts the fact that |Rm∞|(x̄∞, 0) = 1. This ends the proof of the theorem. � The next lemma provides a comparison of the curvature of the normalized and un- normalized Ricci flow. By assumption, there is C̄ < ∞ such that |Ric| ≤ C̄ everywhere along the flow (M, g(t)). Note that by Lemma 3.1, there is some time T < ∞ such that 2R∞ ≤ R(g(t)) ≤ 12R∞ < 0 whenever t > T . Fix any such a time t̄ > T and let h(t) and h̃(t̃) be the solutions to the normalized and unnormalized Ricci flow with initial metric h(0) = h̃(0) = g(t̄) respectively, where t̃ = t̃(t) is the corresponding rescaled time for t. Denote by Rmt̄, Rict̄, Rt̄ and R̃mt̄, R̃ict̄, R̃t̄ the corresponding Riemann- ian curvature, Ricci curvature and scalar curvature of them, where |Rict̄| ≤ C̄ since h(t) = g(t̄+ t). Then we have Lemma 4.3. The solution h̃(t̃) exists for all time t̃ ∈ [0,∞). Furthermore, there exist constants C and τ depending on λ̄M and C̄, such that t ≤ t̃ ≤ Ct, |R̃mt̄|(x, t̃) ≤ |Rmt̄|(x, t) ≤ C|R̃mt̄|(x, t̃), whenever t ≤ τ. Proof. The solution h(t) has average scalar curvature R(t̄ + t) ≤ 1 R∞ < 0, so h̃(t̃) also has average scalar curvature R̃ < 0. From the evolution d lnVol(h̃(t̃)) = −R̃, the volume Vol(h̃(t̃)) increases strictly in t̃, so to normalize it, we need to compress the space and time. Thus t̃ ≥ t and |R̃mt̄|(x, t̃) ≤ |Rmt̄|(x, t) for all (x, t). So h̃(t̃) exists for all time. The last assertion means that the scaling factor from normalized Ricci flow to the unnormalized one is less than C on the time interval [0, τ ]. 
Consider the evolution of average scalar curvature R̃(t̃): (2|R̃ict̄|2 − R̃2t̄ )dvk V olh̃( t̃)(M) for some constant Λ = Λ(C̄), since |R̃ict̄| ≤ |Rict̄| ≤ C̄, |R̃t̄| ≤ |Rt̄| ≤ C̄, |R̃| ≤ |R| ≤ C̄. Note that the initial value R̃(0) = R(g(t̄)) ≤ 1 R∞, so there is some constant τ̃ = τ̃(Λ) such that R̃(t̃) ≤ 1 R∞ for t̃ ∈ [0, τ̃ ]. Thus the scaling factor from normalized Ricci MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 15 flow to the unnormalized one, which equals R(h(t)) R̃(t̃) , is less than 8 on the time interval t̃ ∈ [0, τ̃ ]. Now the result follows, by setting τ = τ̃ and C = 8. � The following lemma gives the estimation of the local volume along the Ricci flow. As in [Se], the proof uses Theorem A 1.5 of [CC]. By assumption, we have a solution (M, g(t)) to the normalized Ricci flow (1.3) and a sequence of times tk → ∞ and points pk such that (M, g(tk), pk) dGH−→ (N∞, g∞, p∞) with g(tk) C1,α−→ g∞ on the regular part R of the orbifold N∞. For the space M or N∞, let Rǫ,ρ be the set of points x such that dGH(B(x, r), B(r)) < ǫr for any r ≤ u, where u ≥ ρ is some constant depending on x. Here and after, B(r) denotes a ball of radius r in 4-Euclidean space and B(x, r) the metric ball of radius r with center x in a metric space. A weak version is WRǫ,ρ, the set of points x such that there is u ≥ ρ with dGH(B(x, u), B(u)) < ǫu. Lemma 4.4. For each q ∈ R, choose a sequence qk ∈ M that converge to q. Then for any ǫ > 0, there exist k0, η, ρ > 0 such that Vol(Bg(tk+t)(q k, r)) ≥ (1− ǫ)Vol(B(r)), ∀r < ρ, k0 < k, whenever Bg(tk+t)(q k, r) ⊂ Bg(tk)(qk, ρ) and t ∈ [−η, η]. Proof. By the boundedness of Ricci tensor, there is a universal constant Λ = Λ(C̄) > 1 such that Bg(t)(p,Λ −1r) ⊂ Bg(s)(p, r) ⊂ Bg(t)(p,Λr) for all t, s ∈ [tk − 1, tk + 1], p ∈ M and r > 0. By Theorem A.1.5 of [CC], for fixed ǫ > 0, there are δ = δ(ǫ, n), ρ = ρ(ǫ, n) > 0 such that x ∈ WRδ,r implies Vol(Bg(t)(x, r)) ≥ (1 − ǫ)Vol(B(r)) for each r ≤ ρ and x ∈ M . 
So by definition, it suffice to show q′k ∈ Rδ,ρ with respect to each metric g(t), t ∈ [tk − η, tk + η], whenever q k ∈ Bg(tk)(qk,Λρ), for some constant η > 0. The constant ρ may be modified by a smaller one if necessary. Using Theorem A.1.5 of [CC] again, for fixed δ as above, there is δ1 = δ1(δ, n) > 0 such that qk ∈ WR (Λ2+1)ρ implies q k ∈ Rδ,ρ for any q k ∈ Bg(t)(qk,Λ2ρ). So it re- duces to show qk ∈ WR (Λ2+1)ρ with respect to each time t ∈ [tk − η, tk + η] for some η > 0 small enough. In fact, as showed in [Se], dGH(Bg(tk)(qk, ρ1), B(ρ1)) < for some small number ρ1 and all k large enough. By the boundedness of Ricci ten- sor again, there is a constant η ≤ 1 such that for each time t ∈ [−η, η], we have dGH(Bg(tk+t)(qk, ρ1), Bg(tk)(qk, ρ1)) < δ1ρ1 for all k. Thus dGH(Bg(tk+t)(qk, ρ1), B(ρ1)) < δ1ρ1 for each t ∈ [−η, η]. Now the result follows by setting ρ = (1−δ)ρ1Λ2+1 . � Note that in the proof, the constant δ1 = δ1(ǫ, n), so the constant η depends only on ǫ, n and C̄. By assumption, there is a compact exhaustion {Ki}∞i=1 of R and a sequence of smooth embeddings Fi : Ki → M such that Fi(p∞) = pi and F ∗i g(ti) converges to g∞ in the local C 1,α sense. Following the lines described in [Se], we can prove Lemma 4.5. Denote by Ki,k = Fk(Ki), then for any ǫ > 0 and i, there are k0, η, ρ > 0 such that Vol(Bg(tk+t)(q k, r)) ≥ (1− ǫ)Vol(B(r)), ∀q k ∈ Ki,k, k0 < k, t ∈ [−η, η] and r < ρ. Now we are ready to prove the Proposition 4.1. 16 F. FANG, Y. ZHANG, AND Z. ZHANG Proof of Proposition 4.1. Assume that p∞ ∈ Ki for each i. Set ǫ = δ0 in the the previous lemma, where δ0 is just the constant in Theorem 4.2, then for one fixed Ki, there exist k0, η, ρ > 0 such that Vol(Bg(tk+t)(q, r)) ≥ (1 − δ0)Vol(B(r)) whenever q ∈ Ki,k, k0 < k, t ∈ [−η, η] and r < ρ. Modifying ρ and η by smaller constants, we assume (ǫ0ρ) 2 ≤ 2η < τ , where τ and ǫ0 are constants in Lemma 4.3 and Theorem 4.2 respectively. 
Let hk(t̃) be the corresponding solutions to the unnormalized Ricci flow equation with initial value hk(0) = g(tk−η), then Vol(Bhk(t̃)(q, r)) ≥ (1−δ0)Vol(B(r)) whenever q ∈ Ki,k, r < ρ, k0 < k and t̃ satisfying t(t̃) ∈ [0, 2η], since the inequality Vol(B(q, r)) ≥ (1− δ0)Vol(B(r)) is scale invariant and Bhk(t̃) ⊂ Bg(tk+t(t̃))(q, r) for k large enough such that tk ≥ T + η for T chosen as above. Denote by R̃mk the Riemannian curvature tensor of hk, then by Theorem 4.2 and Lemma 4.3, we have |Rm|(q, tk + t) ≤ C|R̃mk|(q, t̃) ≤ C(t̃)−1 ≤ C(t− tk + η)−1, for all q ∈ Ki,k. Hence |Rm|(q, t) is uniformly bounded on Ki,k × [tk − η2 , tk + By Hamilton’s compactness theorem of Ricci flow solution, {(Ki,k, g(tk+ t), pk)}∞k=1 converge along a subsequence to a solution to the normalized Ricci flow (Ki,∞, gi,∞(t), pi,∞), t ∈ ), in the local C∞ sense. When we consider the time t = 0, then using a diag- onalization argument, a subsequence of {(Ki,k, g(tk), pk)}i,k will converge in the local C∞ sense to a smooth Riemannian manifold (K∞, g∞, p∞), which is just (R, g∞), by the uniqueness of the limit space. For fixed i, there is a family of metrics gi,∞(t), t ∈ (−η2 , ), on Ki. As showed in [Se], we translate the time by η , say considering the sequence {(Ki,k, g(tk + η4 + t), pk)}k, and repeat the above argument, then obtain that {(Ki,k, g(tk + t), pk)}k loc−→ (Ki,∞, gi,∞(t), pi,∞) along another subsequence, on the time interval t ∈ (−η2 , The essential point is that the estimate dGH(Bg(tk)(qk, ρ1), B(ρ1)) < δ1ρ1 in the proof of Lemma 4.4 holds for some constant ρ1, simultaneously the time tk is replaced by , but the constant η in Lemma 4.5 is fixed in this procedure. Iterating this process infinite times we obtain the convergence on Ki for all t ∈ [0,∞). Then do the same thing for each Ki, i = 1, 2, · · · , and after a diagonalization argument, we get that a subsequence of {(Ki,k, g(tk + t), pk)}k, say (Ki,ki, g(tki + t), pki) loc−→ (R, g∞(t), p∞) for all t ∈ [0,∞), with g∞(0) = g∞. 
We finally show that the completion of R with respect to the metric g∞(t), say R̄t, is just N∞, for each time t ∈ [0,∞). Denote by S = N∞\R the set of singular points of (N∞, g∞(0)); then it suffices to show that R̄t = R ∪ S for fixed time t. Assume S = {ql}, l = 1, · · · , Q, where Q ≤ β for β = β(M) by Lemma 3.5, and let ε > 0 be any small constant such that Bg∞(0)(qi, ε) ∩ Bg∞(0)(qj, ε) = ∅ whenever i ≠ j. Denote by Kε = R \ ∪l Bg∞(0)(ql, ε). Then, using |Ric∞| ≤ C̄ on R × [0,∞) and the evolution of the distance function, we obtain dGH((R\Kε, g∞(t)), S) ≤ e^{2C̄t} ε, and consequently R̄t = R ∪ S by letting ε → 0. □

5. Proofs of Theorems 1.1 and 1.2

The main result of this section is the following.

Theorem 5.1. Let (M, c) be a smooth oriented closed 4-manifold with a Spinc-structure c. Assume that the first Chern class c1(c) of c is a monopole class of M satisfying

(5.1) $c_1^2(c)[M] \geq 2\chi(M) + 3\tau(M) > 0.$

Let g(t), t ∈ [0,∞), be a solution to (1.3) such that |Ric(g(t))| ≤ 3 and

(5.2) $\lim_{t\to\infty} \lambda_M(g(t)) = -\sqrt{32\pi^2 c_1^2(c)[M]}.$

Then there exist an m ∈ N and sequences of points {xj,k ∈ M}, j = 1, · · · , m, such that, after passing to a subsequence,

$$(M, g(t_{k}+t), x_{1,k},\dots, x_{m,k}) \stackrel{d_{GH}}\longrightarrow (\coprod_{j=1}^m N_j, g_{\infty}, x_{1,\infty}, \dots, x_{m,\infty}), \quad t\in [0, \infty),$$

in the m-pointed Gromov-Hausdorff sense for any tk → ∞, where (Nj, g∞), j = 1, · · · , m, are complete Kähler-Einstein orbifolds of complex dimension 2 with at most finitely many isolated orbifold points {qi}. The scalar curvature (resp. volume) of g∞ is $-\text{Vol}_{g_0}(M)^{-1/2}\sqrt{32\pi^2 c_1^2(c)[M]}$ (resp. $V = \text{Vol}_{g_0}(M) = \sum_{j=1}^m \text{Vol}_{g_\infty}(N_j)$). Furthermore, in the regular part of each Nj, {g(tk + t)} converges to g∞ in the C∞ sense.

Compared with Proposition 3.2, Theorem 5.1 shows that the Einstein orbifolds are actually Kähler-Einstein orbifolds under the additional assumptions.
The key point in the proof is that the sequence of the self-dual parts of the curvatures of the connections on the determinant line bundles given by the irreducible solutions in the Seiberg-Witten equations converges to a non-trivial parallel self-dual 2-form on every component Nj , which is a candidate of the Kähler form. Let (M, c) and g(t) be the same as in Thoerem 5.1, and let V , m, tk, xj,k, R̆(g(t)), gk, g∞, Nj and Fj,k,r be the same as in Section 3. Assume that, for each k, (φk, Ak) is an irreducible solution to the Seiberg-Witten equations (2.1). Let | · |k denote the norm with respect to the metric gk = g(tk). The following lemma shows that the L 2-norms of the self-dual parts F+Ak tends to zero. Lemma 5.2. |∇kF+Ak | kdvk = 0, where ∇k is the connection on Λ2T ∗(M) induced by Levi-civita connection. Proof. The Bochner formula implies that 0 = −1 ∆k|φk|2k + |∇Akφk|2k + R(gk) |φk|2k + |φk|4k, By taking integration we get that (5.3) (|∇Akφk|2k + R(gk) |φk|2k)dvk = − |φk|4kdvk. 18 F. FANG, Y. ZHANG, AND Z. ZHANG Since λM(gk) is the lowest eigenvalue of the operator −4△k+R(gk), for any 1 ≫ ǫ > 0, by definition (5.4) λM(gk) |φk|2k,ǫdvk ≤ (4|∇|φk|k,ǫ|2 +R(gk)|φk|2k,ǫ)dvk, where | · |2k,ǫ = | · |2k + ǫ2. By Kato’s inequality (cf. (2.5)) and letting ǫ → 0, λM(gk) |φk|2kdvk ≤ (4|∇Akφk|2k +R(gk)|φk|2k)dvk = − |φk|4kdvk ≤ 0. As λM(gk) ≤ 0, by Schwarz inequality, λM(gk)( |φk|4k,ǫdvk) 2 = λM(gk)Volgk(M) |φk|4k,ǫdvk) 2 ≤ λM(gk) |φk|2k,ǫdvk. Therefore λM(gk)( |φk|4k,ǫdvk) (4|∇|φk|k,ǫ|2 +R(gk)|φ|2k,ǫ)dvk. (5.5) 4 (|∇Akφk|2k − |∇|φk|k,ǫ|2)dvk ≤ − |φk|4kdvk − λM(gk)( |φk|4k,ǫdvk) From (2.5), |∇|φk|k,ǫ|2 ≤ 34 |∇ Akφk|2k. Hence, by letting ǫ −→ 0, we have (5.6) |∇Akφk|2kdvk ≤ −(( |φk|4kdvk) 2 + λM(gk))( |φk|4kdvk) If c+1,k denotes the self-dual part of the harmonic form representing the first Chern class c1(c) of c, by the Seiberg-Witten equation we get that (5.7) |φk|4kdvk = 8 |F+Ak | 2dvk ≥ 32π2[c+1,k]2[M ] ≥ 32π2c21(c)[M ]. 
Note that, by the standard estimates for Seiberg-Witten equations, −R̆(gk) ≥ |φk|2k and, by Theorem 1.1 in [FZ], 32π2c21(c)[M ] + λM(gk) is non-positive. Hence (5.8) |∇Akφk|2kdvk ≤ −( 32π2c21(c)[M ] + λM(gk))( |φk|4kdvk) ≤ R̆(gk)V 32π2c21(c)[M ] + λM(gk)) −→ 0, when k −→ ∞, by (5.2) and Lemma 3.1. By the second one of the Seiberg-Witten equations again (cf. [Le2]), (5.9) |∇kF+Ak | |φk|2k|∇Akφk|2k, where ∇Ak is the connection on Γ(S ) induced by the Levi-civita connection. Hence |∇kF+Ak | kdvk ≤ |R̆(g(tk))| |∇Akφk|2kdvk −→ 0, when k −→ ∞. � MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 19 Regard F+Ak as self-dual 2-forms of g k on Uj,r = Bg∞(xj,∞, r)\ i Bg∞(qi,j, r where g′k = F j,k,r+1gk, and qi,j are the orbifold points of Nj. Since (5.10) |F+Ak | |φk|4k ≤ R̆(gk) 2 ≤ C, where C is a constant independent of k, F+Ak ∈ L 1,2(g′k), and ‖F+Ak‖L1,2(g′k) ≤ C where C ′ is a constant independent of k. Note that ‖ · ‖L1,2(g∞) ≤ 2‖ · ‖L1,2(g′k) for k ≫ 1 since g′k C1,α−→ g∞ on Uj,r. Thus, by passing to a subsequence, F+Ak L1,2−→ Ωj ∈ L1,2(g∞), a self-dual 2-form with respect to g∞. Lemma 5.3. For any j, Ωj is a smooth self-dual 2-form on Uj,r\∂Uj,r such that ∇∞Ωj ≡ 0, and |Ωj |∞ ≡ cont. 6= 0, where ∇∞ is the connection induced by the Levi-civita connection of g∞. Hence, g∞ is a Kähler metric with Kähler form |Ωj | on Uj,r. Proof. By Lemma 5.2 |∇∞Ωj |2∞dv∞ = lim |∇∞F+Ak | ∞dv∞ ≤ 2 lim |∇kF+Ak | kdvk = 0. It is easy to see that Ωj is a weak solution of the elliptic equation ∇∞Ωj = 0 on Uj,r. By elliptic equation theory, Ωj is a smooth self-dual 2-form on Uj,r\∂Uj,r, ∇∞Ωj ≡ 0, and |Ωj|∞ ≡ cont.. Now we claim that, for any j and r ≫ 1, |Ωj |2∞dv∞ 6= 0. If not, there exist js, s = 1, · · · , m0, m0 ≤ m, such that Ujs,r |Ωjs|2∞dv∞ ≡ 0. By Lemma 3.1, R∞ = R(gk) = lim R̆(gk) = λMV 2 , which is the scalar curvature of g∞, i.e. R∞ = R(g∞). 
Note that, by (5.10) and Lemma 3.7, |Ωj |2∞dv∞ = lim |F+Ak | R̆(gk) 2Volg′ (Uj,r) ∞Volg∞(Uj,r), |F+Ak | kdvk − |F+Ak | kdvk| ≤ R̆(gk) 2Volgk(M\ Fk,j,r(Uj,r)) Volg∞(Nj\Uj, r2 ), and, by Lemma 3.1, (R(gk) 2 − R2∞)dvk| ≤ 24 lim (|R(gk)−R(gk)|+ |R∞ − R(gk)|)dvk = 0, 20 F. FANG, Y. ZHANG, AND Z. ZHANG where C is a constant in-dependent of k. Hence, we obtain j 6=j1,··· ,jm0 Volg∞(Uj,r) ≥ 8|Ωj|2∞dv∞ = lim 8|F+Ak| ≥ lim 8|F+Ak | kdvk − CR Volg∞(Nj\Uj, r2 ) ≥ 32π2c21(c)[M ]− CR Volg∞(Nj\Uj, r2 ). The last inequality is obtained by (5.7). Thus, by (5.1), j 6=j1,··· ,jm0 Volg∞(Uj,r) ≥ 32π2(2χ(M) + 3τ(M))− CR Volg∞(Nj\Uj, r2 ). By the Chern-Gauss-Bonnet formula and the Hirzebruch signature theorem, 2χ(M) + 3τ(M) ≥ 1 R(gk) 2 + 2|W+(gk)|2k)dvk − |Rico(gk)|2dvk. By Lemma 3.1, and the fact that g′k L2,p−→ g∞ on Uj,r, we obtain that j 6=j1,··· ,jm0 Volg∞(Uj,r) ≥ + 2|W+(g∞)|2∞)dv∞ −CR2∞ Volg∞(Nj\Uj, r2 ). Note that, on any Uj,r, j 6= j1, · · · , jm0 , ∇∞Ωj ≡ 0, |Ωj|∞ ≡ cont. 6= 0, and Ωj is a self-dual 2-form. Thus g∞ is a Kähler metric with Kähler form |Ωj | on Uj,r, j 6= j1, · · · , jm0 . It is well known that R ∞ = 24|W+(g∞)|2∞ for Kähler metrics (cf. [Le3]). Thus j 6=j1,··· ,jm0 Volg∞(Uj,r) ≥ R j 6=j1,··· ,jm0 Volg∞(Uj,r)− CR Volg∞(Nj\Uj, r2 ) js=j1,··· ,jm0 + 2|W+(g∞)|2∞)dv∞ ≥ R2∞ j 6=j1,··· ,jm0 Volg∞(Uj,r)− CR Volg∞(Nj\Uj, r2 ) js=j1,··· ,jm0 ∞Volg∞(Ujs,r). MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 21 Note that, for r ≫ 1, 1 ≫ 3CR2∞ Volg∞(Nj\Uj, r2 ) ≥ js=j1,··· ,jm0 ∞Volg∞(Ujs,r). A contradiction. Thus, for all j, |Ωj |2∞dv∞ 6= 0, and∇∞Ωj ≡ 0, |Ωj |∞ ≡ cont. 6= 0. Thus we obtain the conclusion. Proof of Theorem 5.1. First, assume that diamg(tk)(M) −→ ∞, when k −→ ∞. 
By Proposition 3.2 and Proposition 4.1, there exist an m ∈ N and sequences of points {xj,k ∈ M}, k ∈ N, j = 1, · · · , m, such that, after passing to a subsequence, (M, g(tk + t), x1,k, · · · , xm,k), t ∈ [0,∞), converges to {(N1, g∞, x1,∞), · · · , (Nm, g∞, xm,∞)} in the m-pointed Gromov-Hausdorff sense as k → ∞, where (Nj, g∞), j = 1, · · · , m, are complete Einstein 4-orbifolds with finitely many isolated orbifold points {qi}. The scalar curvature of g∞ is $R_\infty = V^{-1/2}\lim_{t\to\infty}\lambda_M(g(t))$, and $V = \text{Vol}_{g_0}(M) = \sum_{j}\text{Vol}_{g_\infty}(N_j)$. By Lemma 5.2, g∞ is a Kähler-Einstein metric in the non-singular part of each Nj. Then, by the same arguments as in Section 4 of [Ti], g∞ is actually a Kähler-Einstein orbifold metric. Furthermore, in the non-singular part of Nj, {g(tk + t)}, t ∈ [0,∞), C∞-converges to g∞ by Proposition 4.1.

If diamgk(M) < C for a constant C independent of k, we obtain the conclusion by similar, but much easier, arguments. □

Theorem 5.4. Let (M, c) be a smooth closed oriented 4-manifold with a Spinc-structure c. Assume that the first Chern class c1(c) of c is a monopole class of M satisfying $c_1^2(c)[M] = 2\chi(M) + 3\tau(M) > 0$ and χ(M) = 3τ(M). If M admits a solution g(t), t ∈ [0,∞), to (1.3) with |R(g(t))| ≤ 12, then

$$\lim_{t\to\infty}\lambda_M(g(t)) = -\sqrt{32\pi^2 c_1^2(c)[M]}.$$

Furthermore, if |Ric(g(t))| ≤ 3, the Kähler-Einstein metric g∞ in Theorem 5.1 is a complex hyperbolic metric.

Proof. Let V = Volg(t)(M). By the Chern-Gauss-Bonnet formula and the Hirzebruch signature theorem,

(5.11) $2\chi(M) - 3\tau(M) \geq \frac{1}{4\pi^2}\int_M \Big(\frac{1}{24}R(g(t))^2 + 2|W^-(g(t))|^2 - \frac{1}{2}|Ric^o(g(t))|^2\Big)\, dv_{g(t)},$

where W− is the anti-self-dual Weyl tensor. Note that

(5.12) $\int_M R(g(t))^2\, dv_{g(t)} \geq \overline{R}(g(t))^2\, V \longrightarrow R_\infty^2 V = \lim_{t\to\infty}\lambda_M(g(t))^2$

when t → ∞, by the Schwarz inequality and Lemma 3.1, where $\overline{R}(g(t))$ denotes the average scalar curvature. By (5.11), (5.12), Lemma 3.1 and Theorem 1.1 in [FZ],

$$2\chi(M) - 3\tau(M) \geq \liminf_{t\to\infty}\frac{1}{2\pi^2}\int_M |W^-(g(t))|^2\, dv_{g(t)} + \frac{1}{96\pi^2}\lim_{t\to\infty}\lambda_M(g(t))^2$$
$$\geq \liminf_{t\to\infty}\frac{1}{2\pi^2}\int_M |W^-(g(t))|^2\, dv_{g(t)} + \frac{1}{3}c_1^2(c)[M]$$
$$= \liminf_{t\to\infty}\frac{1}{2\pi^2}\int_M |W^-(g(t))|^2\, dv_{g(t)} + \frac{1}{3}(2\chi(M) + 3\tau(M)).$$
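The arithmetic that forces equality in the next step can be spelled out. The block below is only a sketch: it assumes the standard normalization of the Chern-Gauss-Bonnet and Hirzebruch signature formulas (the coefficients $1/4\pi^2$, $1/24$, $2$ and $1/2$) and the scale-invariant bound $\lambda_M(g)^2 \geq 32\pi^2 c_1^2(c)[M]$ from Theorem 1.1 of [FZ]. These coefficients are our reading of the formulas and should be checked against Section 2.

```latex
% Standard identity (stated here as an assumption on normalization):
\[
  2\chi(M) - 3\tau(M)
  = \frac{1}{4\pi^2}\int_M \Big( \frac{R^2}{24} + 2|W^-|^2 - \frac{1}{2}|Ric^o|^2 \Big)\, dv_g .
\]
% Contribution of the scalar curvature term in the limit:
\[
  \frac{1}{4\pi^2}\cdot\frac{1}{24}\cdot 32\pi^2\, c_1^2(c)[M]
  = \frac{1}{3}\, c_1^2(c)[M]
  = \frac{1}{3}\big(2\chi(M) + 3\tau(M)\big).
\]
% With \chi(M) = 3\tau(M) both ends of the chain coincide:
\[
  2\chi(M) - 3\tau(M) = 3\tau(M),
  \qquad
  \frac{1}{3}\big(2\chi(M) + 3\tau(M)\big) = \frac{1}{3}\cdot 9\tau(M) = 3\tau(M).
\]
```

Since the left-hand and right-hand sides of the chain agree, every inequality in it must be an equality, which is what leaves no room for a positive Weyl term.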
Since χ(M) = 3τ(M), we obtain

$$\lim_{t\to\infty}\lambda_M(g(t)) = -\sqrt{32\pi^2 c_1^2(c)[M]}, \qquad \liminf_{t\to\infty}\int_M |W^-(g(t))|^2\, dv_{g(t)} = 0.$$

Now, assume that |Ric(g(t))| ≤ 3. Let tk, Nj, gk and g∞ be the same as above. For any j and any compact subset U of the regular part of Nj,

$$\int_U |W^-(g_\infty)|^2_\infty\, dv_\infty \leq \liminf_{k\to\infty}\int_M |W^-(g(t_k))|^2_k\, dv_k = 0,$$

since g(tk) converges to g∞ in the L2,p sense on U. Hence g∞ is a Kähler-Einstein metric with W−(g∞) ≡ 0. This implies that g∞ is a complex hyperbolic metric (cf. [Le1]). The desired result follows. □

Proofs of Theorem 1.1 and Theorem 1.2. By the work of Taubes [Ta], if (M, ω) is a compact symplectic manifold with b+2(M) > 1, the Spinc-structure induced by ω is a monopole class. Moreover, since in this situation $c_1^2(c)[M] = 2\chi(M) + 3\tau(M)$, Theorem 1.1 (resp. Theorem 1.2) is a direct consequence of Theorem 5.1 (resp. Theorem 5.4). □

References

[An1] M. T. Anderson, Ricci curvature bounds and Einstein metrics on compact manifolds, J. Amer. Math. Soc. 2 (1989), 455-490.
[An2] M. T. Anderson, The L2 structure of moduli spaces of Einstein metrics on 4-manifolds, G.A.F.A. (1991), 231-251.
[An3] M. T. Anderson, Convergence and rigidity of manifolds under Ricci curvature bounds, Invent. Math. 102 (1990), 429-445.
[An4] M. T. Anderson, Degeneration of metrics with bounded curvature and applications to critical metrics of Riemannian functionals, Proceedings of Symposia in Pure Mathematics 54 (1993), 53-79.
[An5] M. T. Anderson, Canonical metrics on 3-manifolds and 4-manifolds, Asian J. Math. 10 (2006), 127-163.
[An6] M. T. Anderson, Extrema of curvature functionals on the space of metrics on 3-manifolds, Calc. Var. and PDE 5 (1997), 199-269.
[AIL] K. Akutagawa, M. Ishida, and C. LeBrun, Perelman's invariant, Ricci flow, and the Yamabe invariants of smooth manifolds, arxiv/math.DG/0610130.
[B] A. L. Besse, Einstein manifolds, Ergebnisse der Math., Springer-Verlag, Berlin-New York 1987.
[BD] C. Bär, M. Dahl, Small eigenvalues of the conformal Laplacian, Geom. Funct. Anal.
13 (2003), 483-508.
[CC] J. Cheeger and T. H. Colding, On the structure of space with Ricci curvature bounded below I, J. Diff. Geom. 45 (1997), 406-480.
[CG] J. Cheeger and M. Gromov, Collapsing Riemannian manifolds while keeping their curvature bounded I, J. Diff. Geom. 23 (1986), 309-364.
[Cr] C. Croke, Some isoperimetric inequalities and eigenvalue estimates, Ann. Sci. Ecole Norm. Sup. (4) 13 (1980), 419-435.
[CT] J. Cheeger and G. Tian, Curvature and injectivity radius estimates for Einstein 4-manifolds, J. Amer. Math. Soc. 19 (2006), 487-525.
[FZ] F. Fang and Y. G. Zhang, Perelman's λ-functional and the Seiberg-Witten equations, math.FA/0608439.
[FZZ] F. Fang, Y. G. Zhang, and Z. L. Zhang, Non-singular solutions to the normalized Ricci flow equation, math.DG/0609254.
[H1] R. Hamilton, Three-manifolds with positive Ricci curvature, J. Diff. Geom. 17 (1982), 255-306.
[H2] R. Hamilton, A compactness property for solutions of the Ricci flow, Amer. J. Math. 117 (1995), 545-574.
[K] P. B. Kronheimer, Minimal genus in S1 × M3, Invent. Math. 135(1) (1999), 45-61.
[KL] B. Kleiner, J. Lott, Notes on Perelman's papers, arxiv/math.DG/0605667.
[Kot] D. Kotschick, Monopole classes and Perelman's invariant of four-manifolds, arXiv:math.DG/0608504.
[Le1] C. LeBrun, Einstein metrics and Mostow rigidity, Math. Res. Lett. 2 (1995), 1-8.
[Le2] C. LeBrun, Four-dimensional Einstein manifolds and beyond, in Surveys in Differential Geometry, vol. VI: Essays on Einstein Manifolds, 247-285.
[Le3] C. LeBrun, Ricci curvature, minimal volumes, and Seiberg-Witten theory, Invent. Math. 145 (2001), 279-316.
[Le4] C. LeBrun, Kodaira dimension and the Yamabe problem, Comm. Anal. Geom. 7 (1999), 133-156.
[N] H. Nakajima, Self-duality of ALE Ricci-flat 4-manifolds and positive mass theorem, Advanced Studies in Pure Math. 18-I (1990), 385-395.
[Pe1] G. Perelman, The entropy formula for the Ricci flow and its geometric applications, arXiv:math/0211159.
[Pe2] G. Perelman, Ricci flow with surgery on three-manifolds, arXiv:math.DG/0303109v1.
[Se] N. Sesum, Convergence of a Kähler-Ricci flow, arXiv:math.DG/0402238v1.
[STW] N. Sesum, G. Tian and X. D. Wang, Notes on Perelman's paper on the entropy formula for the Ricci flow and its geometric applications, preprint.
[Ta] C. H. Taubes, More constraints on symplectic forms from Seiberg-Witten invariants, Math. Res. Lett. 2 (1995), 9-13.
[Ti] G. Tian, On Calabi's conjecture for complex surfaces with positive first Chern class, Invent. Math. 101 (1990), 101-172.
[W] E. Witten, Monopoles and four-manifolds, Math. Res. Lett. 1 (1994), 809-822.
[Y] R. Ye, Ricci flow, Einstein metrics and space forms, Trans. Amer. Math. Soc. 338 no. 2 (1993), 871-896.

Department of Mathematics, Capital Normal University, Beijing, P.R.China
E-mail address: ffang@nankai.edu.cn

Department of Mathematics, Capital Normal University, Beijing, P.R.China

Nankai Institute of Mathematics, Weijin Road 94, Tianjin 300071, P.R.China

ABSTRACT We consider a maximum solution $g(t)$, $t\in [0, +\infty)$, to the normalized Ricci flow.
Among other things, we prove that, if $(M, \omega)$ is a smooth compact symplectic 4-manifold such that $b_2^+(M)>1$, and $g(t)$, $t\in[0,\infty)$, is a solution to (1.3) on $M$ whose Ricci curvature satisfies $|\text{Ric}(g(t))|\leq 3$ and, additionally, $\chi(M)=3 \tau (M)>0$, then there exist an $m\in \mathbb{N}$ and sequences of points $\{x_{j,k}\in M\}$, $j=1, ..., m$, such that, after passing to a subsequence, $$(M, g(t_{k}+t), x_{1,k},..., x_{m,k}) \stackrel{d_{GH}}\longrightarrow (\coprod_{j=1}^m N_j, g_{\infty}, x_{1,\infty}, ..., x_{m,\infty}),$$ $t\in [0, \infty)$, in the $m$-pointed Gromov-Hausdorff sense for any sequence $t_{k}\longrightarrow \infty$, where $(N_{j}, g_{\infty})$, $j=1,..., m$, are complete complex hyperbolic orbifolds of complex dimension 2 with at most finitely many isolated orbifold points. Moreover, the convergence is $C^{\infty}$ in the non-singular part of $\coprod_1^m N_{j}$ and $\text{Vol}_{g_{0}}(M)=\sum_{j=1}^{m}\text{Vol}_{g_{\infty}}(N_{j})$, where $\chi(M)$ (resp. $\tau(M)$) is the Euler characteristic (resp. signature) of $M$.
<|endoftext|><|startoftext|> Phase separation and flux quantization in the doped quantum dimer model on the square and triangular lattices

Arnaud Ralko,1 Frédéric Mila,2 and Didier Poilblanc1
1 Laboratoire de Physique Théorique, CNRS & Université Paul Sabatier, F-31062 Toulouse, France
2 Institute of Theoretical Physics, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
(Dated: April 5, 2007)

The doped two-dimensional quantum dimer model is investigated by numerical techniques on the square and triangular lattices, with significantly different results. On the square lattice, at small enough doping, there is always a phase separation between an insulating valence-bond solid and a uniform superfluid phase, whereas on the triangular lattice, doping leads directly to a uniform superfluid in a large portion of the RVB phase.
Under an applied Aharonov-Bohm flux, the superfluid exhibits quantization in terms of half-flux quanta, consistent with Q = 2e elementary charge quanta in transport properties.

PACS numbers: 75.10.Jm, 05.50.+q, 05.30.-d

Understanding electron pairing in high temperature superconductors is a major challenge in strongly correlated systems. In his milestone paper, Anderson proposed a simple connection between high temperature superconductors and Mott insulators [1]. Electron pairs "hidden" in the strongly correlated insulating parent state as Valence Bond (VB) singlets lead, once free to move at finite doping, to superconducting behavior. A very good candidate for the insulating parent state is the resonating VB (RVB) state, a state with only exponentially decaying correlations and no lattice symmetry breaking. A simple realization of RVB has been proposed by Rokhsar and Kivelson (RK) in the framework of an effective quantum dimer model (QDM) with only local processes and orthogonal dimer coverings [2]. Even though the relevance of these models for the description of SU(2) Heisenberg models is still debated, this approach is expected to capture the physics of systems that naturally possess singlet ground states (GS). For instance, specific quantum dimer models have recently been derived from a spin-orbital model describing LiNiO2 [3], or from the trimerized kagome antiferromagnet [4]. In a recent work, a family of doped QDMs (at T = 0) generalizing the so-called RK point of Ref. [2] has been constructed and investigated [5], taking advantage of a mapping to classical dimer models [6] that extends the mapping of the RK model onto a classical model at infinite temperature, with evidence of phase separation at low doping. However, the soluble models of Ref. [5] are ad hoc constructions, and this calls for the investigation of similar issues in the context of more realistic models.
In that respect, a natural minimal model to describe the motion of charge carriers in a sea of dimers is the two-dimensional quantum hard-core dimer-gas Hamiltonian:

$$H = v \sum_{c} N_c\, |c\rangle\langle c| - J \sum_{(c,c')} |c'\rangle\langle c| - t \sum_{(c,c'')} |c''\rangle\langle c| \qquad (1)$$

where the sum on (c) runs over all configurations of the Hilbert space, $N_c$ is the number of flippable plaquettes, the sum on (c′, c) runs over all configurations |c〉 and |c′〉 that differ by a single plaquette dimer flip, and the sum on (c′′, c) runs over all configurations |c〉 and |c′′〉 that differ by a single hole hopping between nearest neighbors (triangular) or (diagonal) next-nearest neighbors (square). Throughout, the energy scale is set by J = 1.

A schematic phase diagram for the two lattices in the undoped case is depicted in Fig. 1. Remarkably, these lattices lead to quite different insulating states. Indeed, an ordered plaquette phase appears on the square lattice immediately away from the special RK point, whereas an RVB liquid phase is present on the triangular lattice.

FIG. 1: (color online) Schematic phase diagrams for the triangular and the square lattice.

In this Letter, we investigate in detail the properties of model (1) on the square and triangular lattices at finite doping. Building on the differences between the two lattices in the undoped case, we investigate to what extent the properties of the doped system are governed by the nature of the insulating parent state. This investigation is based on exact diagonalizations and extensive Green's Function Monte-Carlo (GFMC) simulations [7] essentially free of the usual finite-size limitations [8].

http://arxiv.org/abs/0704.0715v2

Phase separation: At small t, it is expected that holes experience an effective attractive potential. It is therefore natural to first address the issue of phase separation (PS), i.e.
the possibility for the system to spontaneously undergo a macroscopic segregation into two phases with different hole concentrations. We analyze the problem as a function of the hopping parameter t and hole concentration $x = n_h/N$, where $n_h$ is the number of holes in the system and N the number of sites. In order to perform a Maxwell construction we define:

$$s(x) = \frac{e(x) - e(0)}{x} \qquad (2)$$

where e(x) is the energy per site at doping x. This quantity corresponds to the slope of the line passing through e(0) and e(x). If the system exhibits PS, the energy will present a change of curvature, implying that s(x) has a minimum at a critical doping $x_c$. The fact that the local curvature of e(x) at x = 0 is negative then implies that the two separated phases will have $x = x_c$ and x = 0 (the undoped insulator). In Fig. 2, typical results are shown for both square and triangular lattices and for different sizes. Interestingly, PS appears in both cases, but with noticeable differences.

FIG. 2: (color online) Slope of energy density (Eq. (2)) vs doping for different sizes. (a) Triangular lattice, v = 0.85, t = 0.05 (clusters 3×6×6, 3×8×8, 3×10×10). (b) Square lattice, v = 0.90, t = 0.10 (clusters 12×12, 14×14, 16×16).

While for the square lattice (lower panel) the critical hole concentration $x_c$ is roughly size independent, there is a strong size dependence for the triangular lattice (upper panel). This size effect can be traced back to the nature of the parent undoped state. On the square lattice, the crystalline phase (for v < 1) at zero doping is very robust, and for increasing size its local order changes only weakly. On the triangular lattice, it has been shown that size effects play an important role [8], especially in the RVB liquid phase for 0.8 ≲ v ≤ 1.
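The Maxwell construction described above can be sketched numerically. The following Python sketch is only an illustration, not the procedure used to produce Fig. 2: it assumes that energies per site e(x) are available on a grid of dopings that includes x = 0, evaluates s(x) = (e(x) − e(0))/x, and reports the doping at which s is minimal. The function names and the synthetic energy curve are hypothetical.

```python
def maxwell_slope(dopings, energies):
    """Return [(x, s(x))] with s(x) = (e(x) - e(0)) / x for every x > 0."""
    data = sorted(zip(dopings, energies))
    x0, e0 = data[0]
    if x0 != 0.0:
        raise ValueError("the undoped energy e(0) is required")
    return [(x, (e - e0) / x) for x, e in data[1:]]


def critical_doping(dopings, energies):
    """Doping x_c where s(x) is minimal, or None without an interior minimum.

    A minimum at the smallest doping means s(x) only increases (no phase
    separation resolved on this grid); a minimum at the largest doping
    means the scanned range is too short to locate x_c.
    """
    slopes = maxwell_slope(dopings, energies)
    if not slopes:
        return None
    i_min = min(range(len(slopes)), key=lambda i: slopes[i][1])
    if i_min == 0 or i_min == len(slopes) - 1:
        return None
    return slopes[i_min][0]


# Synthetic energy curve with negative curvature at x = 0 (made-up numbers):
# e(x) = -0.5 x^2 + 2.5 x^3, so s(x) = -0.5 x + 2.5 x^2 is minimal at x = 0.1.
dopings = [i / 100 for i in range(21)]
energies = [-0.5 * x**2 + 2.5 * x**3 for x in dopings]
x_c = critical_doping(dopings, energies)
```

On real data the minimum of s(x) is shallow, so in practice one would also want error bars on e(x) before quoting a value of the critical doping.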
Periodic boundary conditions (BC) tend to stabilize the so-called √12 × √12 phase on small clusters, and clusters with more than 192 sites are necessary to significantly reduce finite-size effects, in particular, as in Fig. 2, close to the transition point with the crystalline phase. Hence the PS observed around x = 0.075 for the 3 × 6 × 6 cluster is not representative of the thermodynamic limit.

To obtain the phase diagram in the (v, x) plane, we have performed a systematic size-scaling analysis at fixed t and for various values of v, depicted in Fig. 3. In agreement with the previous discussion, a significant size dependence is only present for the triangular lattice, in which case PS disappears for large clusters in the RVB phase in the vicinity of the RK point [9].

FIG. 3: (color online) Scaling of the critical doping $x_c$ defined by Maxwell construction with the inverse total number of sites. (a) Triangular lattice, t = 0.05. (b) Square lattice, t = 0.15.

In Fig. 4, we report the thermodynamic limit of $x_c$ for the two lattices as a function of v, and for different values of t. For the square lattice, calculations have been done from the RK point down to the expected phase transition between the plaquette phase and the columnar phase, namely v ≃ 0.6 [10]. For the triangular lattice, the range between the RK point and the RVB-√12 × √12 transition point at v ≃ 0.8 has been covered [8]. These results clearly demonstrate the difference between the square and triangular lattices. In the first case, as soon as v ≠ 1, PS occurs for $x < x_c$. Moreover, upon decreasing v, crystalline order strengthens and, for fixed t, it is necessary to consider a higher concentration of holes to reach a stable conducting phase. Similarly, the bigger t, the lower $x_c$.
On the triangular lattice, a finite-size scaling analysis shows that no phase separation appears down to a critical value v ∼ 0.9, well above the critical value v ∼ 0.8 below which plaquette order sets in. Although numerical limitations prevent computations for smaller v and t, our results up to the 3 × 12 × 12-site cluster provide clear evidence for a region of PS inside the RVB region, between v ∼ 0.8 and v ∼ 0.9 [11].

FIG. 4: (color online) Phase separation boundaries for the square and triangular lattices in the thermodynamic limit, for t = 0.05, 0.10, 0.15 and 0.20. The dashed lines correspond to the approximate location of the phase transition between plaquette and columnar phases [10] for the square lattice and between plaquette and RVB phases [8, 12] for the triangular lattice.

Dimer ordering on the square lattice: Next we investigate how dimer order, known to exist at x = 0, evolves under finite doping. Two scenarios are a priori possible: i) the dimer order vanishes in the stable conducting phase immediately at $x_c$; ii) dimer order survives above $x_c$ in a narrow region of the conducting phase. To solve this problem, we have calculated the squared order parameter $D^2(\vec{k})$ in the GS $|\Psi_0\rangle$, defined by

$$D^2(\vec{k}) = \frac{\langle\Psi_0|\, d(-\vec{k})\, d(\vec{k})\, |\Psi_0\rangle}{\langle\Psi_0|\Psi_0\rangle}$$

along the path of Fig. 5(a) in the first Brillouin zone of the square lattice, where $d(\vec{k})$ is the Fourier transform of the dimer operator defined on the horizontal bonds. Note that this calculation has not been tried for the triangular lattice, since no Bragg peak is present in the RVB phase and the algorithm loses efficiency for v ≲ 0.8 [9]. In the pure plaquette phase on the square lattice, a Bragg peak develops at the point $\vec{k}_M = (\pi, 0)$, the middle of the side of the BZ. We show in Fig. 5(b) a typical result for the squared order parameter on the 196-site cluster for different values of x. Clearly, the Bragg peak disappears upon doping.
A finite-size scaling of the order parameter can be performed thanks to the linear behaviour at low concentration. Within our data, $D^2(\vec{k} = \vec{k}_M) \equiv D^2_M(L, x)$ behaves like $a_L x + b_L$ in the linear region. In this case, one can determine rather precisely $x_{+\infty}$ such that $D^2_M(+\infty, x_{+\infty}) = 0$, i.e. $x_{+\infty}$ is the concentration in the thermodynamic limit where the Bragg peak vanishes. By definition, $x_{+\infty} = -b_{+\infty}/a_{+\infty} \simeq 0.05(8)$. This value, and the linear behaviour in the thermodynamic limit, are displayed in Fig. 5(c). If we compare $x_{+\infty}$ to the corresponding $x_c$ from Fig. 4, we can conclude that the Bragg peak indeed vanishes at $x_c \simeq 0.067(5)$ within error bars. Note that numerical errors increase for larger clusters, and we are not able to use the same analysis for clusters larger than 256 sites (the results for the 18 × 18 cluster have not been used). Although the determination of $x_{+\infty}$ is delicate, the linear behavior of $D^2$ vs x is consistent with the physical behavior expected for the binary system of Fig. 4. No dimer order is present above $x_c$, showing that the system is simply "conducting" in this case, with $D^2(\vec{k}_M)$ decreasing as $N^{-1}$ (critical behaviour).

FIG. 5: (color online) (a) First Brillouin zone. (b) Momentum dependence of the squared order parameter for the 196-site cluster (14x14), for x = 0.00, 0.03, 0.06, 0.09 and 0.12. (c) Squared order parameter at the M point as a function of the hole concentration x and for different cluster sizes (12x12, 14x14, 16x16, 18x18). The thermodynamic limit is depicted as a dashed line, with the corresponding error bar (see main text). Parameters: v = 0.90, t = 0.15.

Flux quantization: Finally, let us turn to a better characterization of the "conducting phase". Since holes are bosonic, one expects the conducting phase to be a superfluid (through Bose condensation) [13]. However, extra complexity results from the fact that these bosons move in a dimer environment. To investigate this issue, we pierce the torus by an Aharonov-Bohm flux of strength $\phi = \xi\phi_0$ with 0 ≤ ξ ≤ 1, where $\phi_0 = hc/e$ is the magnetic flux quantum. To reduce finite-size effects, we also consider arbitrary BC in the second direction (y). All this is implemented by the Peierls substitution, changing the hole hopping term into $t' = \exp(\pm i 2\pi\xi a/L_x \pm i 2\pi\eta a/L_y)\, t$, where the ± depends on the directions ±x ± y, while a is the lattice parameter and $L_x$ and $L_y$ are the linear sizes of the system. Obviously, the whole spectrum should be periodic in ξ with period 1. We show in Fig. 6 the spectrum of the 4 × 4 cluster on the square lattice, with 4 holes. It turns out that the GS energy (i) exhibits well-defined minima and (ii) is rigorously periodic with period ξ = 1/2, which means that there is flux quantization in units of half the flux quantum (red curve). Property (i) is typical of a superfluid [14]: it is the precursor of a finite barrier in the thermodynamic limit [15]. Property (ii) was suggested quite some time ago in the context of a more general QDM by Kivelson [13], who also predicted that, in the cylinder geometry, one should be able to tunnel between the two branches of Fig. 6, thus lifting the degeneracy at the level crossing. This degeneracy is not lifted in our case, neither in the torus geometry nor in the cylinder geometry, due to the translational symmetry, which puts the two states that are degenerate at ξ = 1/4 into different symmetry sectors.

FIG. 6: (color online) Energy spectrum vs (reduced) Aharonov-Bohm flux ξ, for a 4 × 4 cluster with 4 holes at v = 0.70 and t = 1.00. (a) Torus geometry. Arbitrary BC are used in the transverse direction to obtain a continuous set of momenta leading to a continuum (colored area). (b) The two low-energy branches for the case of a cylinder (dashed line) and including a small bond disorder (full red line).
However, getting rid of the translational symmetry by changing the amplitude of a local dimer flip indeed removes the degeneracy (upper panel of Fig. 6), leading to a detectable flux quantization in units of half the flux quantum in an experiment in which the flux is swept. Thus, in our model, the ground-state energy has periodicity hc/2e, consistent with mobile elementary particles of charge Q = 2e in the system. Unlike what was recently found in a bosonic model with correlated hopping [16], these particles are not boson pairs: from the bosonic point of view, it is the statistical flux of the dimer background that leads to the half-flux quantization. If dimers are interpreted as SU(2) electron singlets, these singlets are the physical pairs that lead to half-flux quantization. This scenario is fundamentally different from the usual mechanism related to real-space pairing of the charge carriers found e.g. in the extended Hubbard chain [17], in the 2-leg ladders [18] or, more generally, in Luther-Emery liquids [19], as can be inferred from the exact degeneracy between ξ = 0 and ξ = 0.5 for finite systems in the present case, to be contrasted with the significant finite-size effects of the other cases.

Summary and conclusions: The numerical investigation with Green's function Quantum Monte Carlo and exact diagonalizations of the doped two-dimensional quantum hard-core dimer model on the square and triangular lattices has led to a number of interesting conclusions regarding hole motion in a dimer background. Phase separation is often present at low doping, as suggested by earlier investigations, but our results indicate that it is related to the presence of valence bond order [20]: in the RVB phase of the triangular lattice, PS only occurs close to the plaquette phase, where short-range dimer correlations are already strong enough.
Close to the RK point, doping the RVB phase leads directly to a superfluid phase, as shown by its response to an Aharonov-Bohm flux. Moreover, we observed that the flux quantization is in units of half a flux quantum, consistent with the idea that the dimer background leads to effective particles of charge 2e. All these results are in qualitative agreement with the gauge theories of high-Tc superconductivity in strongly correlated systems [21].

We acknowledge useful discussions with Federico Becca. This work was supported by the Swiss National Fund, by MaNEP, and by the Agence Nationale de la Recherche (France).

[1] P.W. Anderson, Science 235, 1196 (1987).
[2] D.S. Rokhsar and S.A. Kivelson, Phys. Rev. Lett. 61, 2376 (1988).
[3] F. Vernay, A. Ralko, F. Becca and F. Mila, Phys. Rev. B 74, 054402 (2006).
[4] M. E. Zhitomirsky, Phys. Rev. B 71, 214413 (2005).
[5] D. Poilblanc, F. Alet, F. Becca, A. Ralko, F. Trousselet and F. Mila, Phys. Rev. B 74, 014437 (2006).
[6] C. Castelnovo, C. Chamon, C. Mudry and P. Pujol, Ann. Phys. 322, 903 (2007).
[7] N. Trivedi and D.M. Ceperley, Phys. Rev. B 41, 4552 (1990); M. Calandra and S. Sorella, Phys. Rev. B 57, 11446 (1998).
[8] A. Ralko, M. Ferrero, F. Becca, D. Ivanov, and F. Mila, Phys. Rev. B 74, 134301 (2006) and references therein.
[9] For small values of t, the GFMC algorithm is no longer ergodic due to hole localization.
[10] O.F. Syljuasen, Phys. Rev. B 73, 245105 (2006).
[11] Long-range Coulomb repulsion is expected to have a major role in the PS region and might stabilize stripes.
[12] R. Moessner and S. L. Sondhi, Phys. Rev. B 63, 224401 (2001).
[13] S. Kivelson, Phys. Rev. B 39, 259 (1989).
[14] N. Byers and C.N. Yang, Phys. Rev. Lett. 7, 46 (1961).
[15] By contrast, for non-interacting fermions, signs of a flat energy curve already appear on such small clusters provided one also uses arbitrary BC. Similar arguments were used in D. Poilblanc, Phys. Rev. B 44, 9562 (1991) for the 2D t-J model.
[16] R.
Bendjama, B. Kumar, F. Mila, Phys. Rev. Lett. 95, 110406 (2005). [17] K. Penc and F. Mila, Phys. Rev. B 49, 9670 (1994). [18] C. A. Hayward, D. Poilblanc, R. M. Noack, D. J. Scalapino and W. Hanke, Phys. Rev. Lett. 75, 926 (1995). [19] A. Seidel and D. H. Lee, Phys. Rev. B 71, 045113 (2005). [20] Our results give an explicit characterization of the confinement-deconfinement discussed in O.F. Syljuasen, Phys. Rev. B 71, 020401(R)(2005). [21] T. Senthil and P.A. Lee, Phys. Rev. B 71, 174515 (2005). and references therein. ABSTRACT The doped two-dimensional quantum dimer model is investigated by numerical techniques on the square and triangular lattices, with significantly different results. On the square lattice, at small enough doping, there is always a phase separation between an insulating valence-bond solid and a uniform superfluid phase, whereas on the triangular lattice, doping leads directly to a uniform superfluid in a large portion of the RVB phase. Under an applied Aharonov-Bohm flux, the superfluid exhibits quantization in terms of half-flux quanta, consistent with Q=2e elementary charge quanta in transport properties. <|endoftext|><|startoftext|> Introduction For a given combinatorial class of objects, such as polygons or polyhedra, the most basic question concerns the number of objects of a given size (always assumed to be finite), or an asymptotic estimate thereof. Informally stated, in this overview we will analyse the refined question: What does a typical object look like? In contrast to the combinatorial question about the number of objects of a given size, the latter question is of a probabilistic nature. For counting parameters in addition to object size, one asks for their (asymptotic) probability law. To give this question a meaning, an underlying ensemble has to be specified. The simplest choice is the uniform ensemble, where each object of a given size occurs with equal probability. 
For self-avoiding polygons on the square lattice, size may be the number of edges of the polygon, and an additional counting parameter may be the area enclosed by the polygon. We will call this ensemble the fixed perimeter ensemble. For the uniform fixed perimeter ensemble, one assumes that, for a fixed number of edges, each polygon occurs with the same probability. Another ensemble, which we will call the fixed area ensemble, is obtained with size being the polygon area, and the number of edges being an additional counting parameter. For the uniform fixed area ensemble, one assumes that, for fixed area, each polygon occurs with the same probability.

http://arxiv.org/abs/0704.0716v4

To be specific, let $p_{m,n}$ denote the number of square lattice self-avoiding polygons of half-perimeter $m$ and area $n$. Discrete random variables $\tilde X_m$ of area in the uniform fixed perimeter ensemble and $\tilde Y_n$ of perimeter in the uniform fixed area ensemble are defined by

$$P(\tilde X_m = n) = \frac{p_{m,n}}{\sum_n p_{m,n}}, \qquad P(\tilde Y_n = m) = \frac{p_{m,n}}{\sum_m p_{m,n}}.$$

We are interested in an asymptotic description of these probability laws, in the limit of infinite object size.

In statistical physics, certain non-uniform ensembles are important. For fixed object size, the probability of an object with value $n$ of the counting parameter (such as the area of a polygon) may be proportional to $a^n$, for some non-negative parameter $a = e^{-\beta E}$ of non-uniformity. Here $E$ is the energy of the object, and $\beta = 1/(k_B T)$, where $T$ is the temperature, and $k_B$ denotes Boltzmann's constant. A qualitative change in the behaviour of typical objects may then be reflected in a qualitative change in the probability law of the counting parameter w.r.t. $a$. Such a change is an indication of a phase transition, i.e., a non-analyticity in the free energy of the corresponding ensemble.
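The uniform and non-uniform ensembles above can be made concrete on a toy model. The following is a minimal sketch, assuming rectangles (side lengths l and m - l) as a stand-in for the counts p_{m,n}; the helper names `rectangle_counts` and `area_law` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction

def rectangle_counts(m):
    """Toy counts p_{m,n}: rectangles with side lengths l and m - l (half-perimeter m)."""
    counts = {}
    for l in range(1, m):
        n = l * (m - l)  # area of the l x (m - l) rectangle
        counts[n] = counts.get(n, 0) + 1
    return counts

def area_law(counts, a=1):
    """Probability of area n when each object carries weight a**n.

    a = 1 gives the uniform fixed perimeter ensemble; a < 1 suppresses large areas.
    """
    weights = {n: c * Fraction(a) ** n for n, c in counts.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

uniform = area_law(rectangle_counts(6))
deflated = area_law(rectangle_counts(6), Fraction(1, 2))
```

With weight a = 1/2 the probability mass moves to the smallest available area, which is the qualitative effect of a non-uniformity parameter below one.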
For self-avoiding polygons in the fixed perimeter ensemble, let $q$ denote the parameter of non-uniformity,

$$P(\tilde X_m(q) = n) = \frac{p_{m,n}\, q^n}{\sum_n p_{m,n}\, q^n}.$$

Polygons of large area are suppressed in probability for small values of $q$, such that one expects a typical self-avoiding polygon to closely resemble a branched polymer. Likewise, for large values of $q$, a typical polygon is expected to be inflated, closely resembling a ball (or square) shape. Let us define the ball-shaped phase by the condition that the mean area of a polygon grows quadratically with its perimeter. The ball-shaped phase occurs for $q > 1$ [31]. Linear growth of the mean area w.r.t. perimeter is expected to occur for all values $0 < q < 1$. This phase is called the branched polymer phase. Of particular interest is the point $q = 1$, at which a phase transition occurs [31]. This transition is called a collapse transition. Similar considerations apply for self-avoiding polygons in the fixed area ensemble,

$$P(\tilde Y_n(x) = m) = \frac{p_{m,n}\, x^m}{\sum_m p_{m,n}\, x^m},$$

with parameter of non-uniformity $x$, where $0 < x < \infty$.

For a given model, these effects may be studied using data from exact or Monte-Carlo enumeration and series extrapolation techniques. Sometimes, the underlying model is exactly solvable, i.e., it obeys a combinatorial decomposition, which leads to a recursion for the counting parameter. In that case, its (asymptotic) behaviour may be extracted from the recurrence.

A convenient tool is generating functions. The combinatorial information about the number of objects of a given size is coded in a one-variable (ordinary) generating function, typically of positive and finite radius of convergence. Given the generating function of the counting problem, the asymptotic behaviour of its coefficients can be inferred from the leading singular behaviour of the generating function. This is determined by the location and nature of the singularity of the generating function closest to the origin.
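The link between a dominant singularity and coefficient growth can be checked numerically on the simplest algebraic singularity. The sketch below assumes the test function f(x) = (1 - x/xc)^(-gamma), whose Taylor coefficients obey an exact two-term recurrence, and compares them with the standard leading form xc^(-m) m^(gamma-1) / Gamma(gamma). The values gamma = 1.5 and xc = 0.25 are arbitrary illustrative choices, not taken from any polygon model.

```python
import math

def coefficients(gamma, xc, m_max):
    """Taylor coefficients of (1 - x/xc)**(-gamma), via c_m = c_{m-1} * (m - 1 + gamma) / (m * xc)."""
    c = [1.0]
    for m in range(1, m_max + 1):
        c.append(c[-1] * (m - 1 + gamma) / (m * xc))
    return c

def leading_term(gamma, xc, m):
    """Leading asymptotic form xc**(-m) * m**(gamma - 1) / Gamma(gamma)."""
    return xc ** (-m) * m ** (gamma - 1) / math.gamma(gamma)

gamma, xc = 1.5, 0.25  # illustrative values only
c = coefficients(gamma, xc, 400)
# Ratio of exact coefficient to leading asymptotic term; it should approach 1.
ratios = [c[m] / leading_term(gamma, xc, m) for m in (50, 100, 200, 400)]
```

The location xc fixes the exponential growth rate and the exponent gamma fixes the power-law correction, so the ratios drift towards 1 as m grows.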
There are elaborate techniques for studying this behaviour exactly [37] or numerically [43].

The case of additional counting parameters leads to a multivariate generating function. For self-avoiding polygons, the half-perimeter and area generating function is

$$P(x, q) = \sum_{m,n} p_{m,n}\, x^m q^n.$$

For a fixed value of a non-uniformity parameter $q_0$, where $0 < q_0 \le 1$, let $x_0$ be the radius of convergence of $P(x, q_0)$. The asymptotic law of the counting parameter is encoded in the singular behaviour of the generating function $P(x, q)$ about $(x_0, q_0)$. If locally about $(x_0, q_0)$ the nature of the singularity of $P(x, q)$ does not change, then distributions are expected to be concentrated, with a Gaussian limit law. This corresponds to the physical intuition that fluctuations of macroscopic quantities are asymptotically negligible away from phase transition points. If the nature of the singularity does change locally, we expect non-concentrated distributions, resulting in non-Gaussian limit laws. This is expected to be the case at phase transition points.

Qualitative information about the singularity structure is given by the singularity diagram (also called the phase diagram). It displays the region of convergence of the two-variable generating function, i.e., the set of points $(x, q)$ in the closed upper right quadrant of the plane, such that the generating function $P(x, q)$ converges. The set of boundary points with positive coordinates is a set of singular points of $P(x, q)$, called the critical curve. See Figure 1 for a sketch of the singularity diagram of a typical polygon model such as self-avoiding polygons, counted by half-perimeter and area, with generating function $P(x, q)$ as above.

Figure 1: Singularity diagram of a typical polygon model counted by half-perimeter and area, with x conjugate to half-perimeter and q conjugate to area.

There appear two lines of singularities, which intersect at the point $(x, q) = (x_c, 1)$.
Here $x_c$ is the radius of convergence of the half-perimeter generating function $P(x, 1)$, also called the critical point. The nature of a singularity does not change along each of the two lines, and the intersection point $(x, q) = (x_c, 1)$ of the two lines is a phase transition point. For $0 < q < 1$ fixed, denote by $x_c(q)$ the radius of convergence of $P(x, q)$. The branched polymer phase for the fixed perimeter ensemble $0 < q < 1$ (and also for the corresponding fixed area ensemble) is asymptotically described by the singularity of $P(x, q)$ about $(x_c(q), q)$. In the ball-shaped phase $q > 1$ of the fixed perimeter ensemble, the (ordinary) generating function does not seem to be the right object to study, since it has zero radius of convergence for fixed $q > 1$. The singularity of $P(x, q)$ about $(x, 1)$ describes, for $0 < x < x_c$, a ball-shaped phase in the fixed area ensemble, with a finite average size of a ball.

For points $(x, q)$ within the region of convergence, both $x$ and $q$ positive, the generating function $P(x, q)$ is finite and positive. Thus, such points may be interpreted as parameters in a mixed infinite ensemble

$$P(\tilde X(x, q) = (m, n)) = \frac{p_{m,n}\, x^m q^n}{\sum_{m,n} p_{m,n}\, x^m q^n}.$$

The limiting law of the counting parameter in the fixed area or fixed perimeter ensemble can be extracted from the leading singular behaviour of the two-variable generating function. There are two different approaches to the problem. The first one consists in analysing, for fixed non-uniformity parameter $a$, the singular behaviour of the remaining one-parameter generating function and its derivatives w.r.t. $a$. This method is also called the method of moments. It can be successfully applied in the fixed perimeter ensemble at the phase transition point. Typically, this results in non-concentrated distributions.

The second approach derives an asymptotic approximation of the two-variable generating function.
Away from a phase transition point, such an approximation can be obtained for some classes of models, typically resulting in concentrated distributions, with a Gaussian law for the centred and normalised random variable. However, it is usually difficult to extract such information at a phase transition point. The theory of tricritical scaling seeks to fill this gap, by suggesting and justifying a particular ansatz for an approximation using scaling functions. Knowledge of the approximation may imply knowledge of the quantities analysed in the first approach.

In the following, we give an overview of these two approaches. For the first approach, summarised by the title "limit distributions", there are a number of rigorous results, which we will discuss. The second approach, summarised by the title "scaling functions", is less developed. For that reason, our presentation will be more descriptive, stating important open questions. We will stress connections between the two approaches, thereby providing a probabilistic interpretation of scaling functions in terms of limit distributions.

2 Polygon models and generating functions

Models of polygons, polyominoes or polyhedra have been studied intensively on the square and cubic lattices. It is believed that the leading asymptotic behaviour of such models, such as the type of limit distribution or critical exponents, is independent of the underlying lattice.

In two dimensions, a number of models of square lattice polygons have been enumerated according to perimeter and area and other parameters; see [7] for a review of models with an exact solution. The majority of such models have an algebraic perimeter generating function. We mention prudent polygons [96, 22, 8] as a notable exception. Of particular importance for polygon models is the fixed perimeter ensemble, since it models two-dimensional vesicle collapse. Another important ensemble is the fixed area ensemble, which serves as a model of ring polymers.
The fixed area ensemble may also describe percolation and cluster growth. For example, staircase polygons are models of directed compact percolation [26, 28, 29, 27, 12, 57]. This may be compared to the exactly solvable case of percolation on a tree [42]. The model of self-avoiding polygons is conjectured to describe the hull of critical percolation clusters [60].

In addition to perimeter, other counting parameters have been studied, such as width and height, generalisations of area [89], radius of gyration [53, 64], number of nearest-neighbour interactions [4], last column height [7], and site perimeter [20, 11]. Also, motivated by applications in chemistry, symmetry subclasses of polygon models have been analysed [63, 62, 40, 95]. Whereas this gives rise to a number of different ensembles, only a few of them have been asymptotically studied. Not all of them display phase transitions.

In three dimensions, models of polyhedra on the cubic lattice have been enumerated according to perimeter, surface area and volume; see [74, 102, 3] and the discussion in section 3.9. Various ensembles may be defined, such as the fixed surface area ensemble and the fixed volume ensemble. The fixed surface area ensemble serves as a model of three-dimensional vesicle collapse [104].

In this chapter, we will consider models of square lattice polygons, counted by half-perimeter and area. Let $p_{m,n}$ denote the (finite) number of such polygons of half-perimeter $m$ and area $n$. The numbers $p_{m,n}$ will always satisfy the following assumption.

Assumption 1. For $m, n \in \mathbb{N}_0$, let non-negative integers $p_{m,n} \in \mathbb{N}_0$ be given. The numbers $p_{m,n}$ are assumed to satisfy the following properties.
i) There exist positive constants $A, B > 0$ such that $p_{m,n} = 0$ if $n \le Am$ or if $n \ge Bm^2$.
ii) The sequence $(\sum_n p_{m,n})_{m \in \mathbb{N}_0}$ has infinitely many positive elements and grows at most exponentially.

Remarks.
i) A sequence $(a_n)_{n \in \mathbb{N}_0}$ is said to grow at most exponentially if there are positive constants $C, \mu$ such that $|a_n| \le C\mu^n$ for all $n$.
ii) Condition i) reflects the geometric constraint that the area of a polygon grows at most quadratically and at least linearly with its perimeter. For self-avoiding polygons, we have $n \ge m - 1$. Since $p_{m,n} = 0$ if $m < 2$, we may choose $A = 1/3$. Since $n \le m^2/4$ for self-avoiding polygons, we may choose $B = 1/3$. Condition ii) is a natural condition on the growth of the number of polygons of a given perimeter. For self-avoiding polygons, we may choose $C = 1$ and $\mu = 16$.
iii) For models with counting parameters different from area, or for models in higher dimensions, a modified assumption holds, with the growth condition i) being replaced by $n \le Am^{k_0}$ and $n \ge Bm^{k_1}$, for appropriate values of $k_0$ and $k_1$. Counting parameters satisfying $p_{m,n} = 0$ for $n \ge Bm^k$ are called rank $k$ parameters [25].

The above assumption imposes restrictions on the generating function of the numbers $p_{m,n}$. These explain the qualitative form of the singularity diagram Figure 1.

Proposition 1. For numbers $p_{m,n}$, let Assumption 1 be satisfied. Then, the generating function $P(x, q) = \sum_{m,n} p_{m,n}\, x^m q^n$ has the following properties.
i) The generating function $P(x, q)$ satisfies for $k \in \mathbb{N}$
$$A^k \Big(x\frac{\partial}{\partial x}\Big)^k P(x, q) \ll \Big(q\frac{\partial}{\partial q}\Big)^k P(x, q) \ll B^k \Big(x\frac{\partial}{\partial x}\Big)^{2k} P(x, q),$$
where $\ll$ denotes coefficient-wise domination.
ii) The evaluation $P(x, 1)$ is a power series with radius of convergence $x_c$, where $0 < x_c \le 1$.
iii) The generating function $P(x, q)$ diverges if $x \ne 0$ and $|q| > 1$. It converges if $|q| < 1$ and $|x| < x_c q^{-A}$. In particular, for $k \in \mathbb{N}_0$, the evaluations […] of $P(x, q)$ are power series with radius of convergence 1.
iv) For $k \in \mathbb{N}_0$, the evaluations $\frac{\partial^k}{\partial q^k} P(x, q)\big|_{q=1}$ are power series with radius of convergence $x_c$. They satisfy, for $|x| < x_c$, $\frac{\partial^k}{\partial q^k} P(x, q)\big|_{q=1} = \lim$ […]

[…] $> 0$. Then, the generating function $g(x) = \sum_{m=0}^{\infty} a_m x^m$ has radius of convergence $x_c$. If γ ∉ {0, −1, −2, . .
.}, then there exists a power series $g^{(reg)}(x)$ with radius of convergence strictly larger than $x_c$, such that $g(x)$ satisfies
$$g(x) - g^{(reg)}(x) \sim \frac{A\,\Gamma(\gamma)}{(1 - x/x_c)^{\gamma}} \qquad (x \nearrow x_c), \qquad (3)$$
where $\Gamma(z)$ denotes the Gamma function.

Remarks.
i) The above lemma can be proved using the analytic properties of the polylog function [32]. If $\gamma \in \{0, -1, -2, \ldots\}$, an asymptotic form similar to Eq. (3) is valid, which involves logarithms.
ii) The function $g^{(reg)}(x)$ in the above lemma is not unique. For example, if $\gamma > 0$, any polynomial in $x$ may be chosen. We demand $g^{(reg)}(x) \equiv 0$ in that case. If $\gamma < 0$ and $g^{(reg)}(x)$ is restricted to be a polynomial, it is uniquely defined. If $-1 < \gamma < 0$, we have $g^{(reg)}(x) \equiv g(x_c)$. In the general case, the polynomial has degree $\lfloor -\gamma \rfloor$, compare [32]. In the following, we will demand uniqueness by the above choice. The power series $g^{(sing)}(x) := g(x) - g^{(reg)}(x)$ is then called the singular part of $g(x)$.

Conversely, let a power series $g(x)$ with radius of convergence $x_c$ be given. In order to conclude from Eq. (3) the behaviour Eq. (2), certain additional analyticity assumptions on $g(x)$ have to be satisfied. To this end, a function $g(x)$ is called $\Delta(x_c, \eta, \phi)$-regular (or simply $\Delta$-regular) [30], if there is a positive real number $x_c > 0$, such that $g(x)$ is analytic in the indented disc
$$\Delta(x_c, \eta, \phi) := \{z \in \mathbb{C} : |z| \le x_c + \eta,\ |\mathrm{Arg}(z - x_c)| \ge \phi\},$$
for some $\eta > 0$ and some $\phi$, where $0 < \phi < \pi/2$. Note that $x_c \notin \Delta$, where we adopt the convention $\mathrm{Arg}(0) = 0$. The point $x = x_c$ is the only point for $|x| \le x_c$ where $g(x)$ may possess a singularity.

Lemma 2 ([35]). Let the function $g(x)$ be $\Delta$-regular and assume that
$$g(x) \sim \frac{1}{(1 - x/x_c)^{\gamma}} \qquad (x \to x_c \text{ in } \Delta).$$
If $\gamma \notin \{0, -1, -2, \ldots\}$, we then have
$$[x^m]\, g(x) \sim \frac{1}{\Gamma(\gamma)}\, x_c^{-m} m^{\gamma - 1} \qquad (m \to \infty),$$
where $[x^m] g(x)$ denotes the Taylor coefficient of $g(x)$ of order $m$ about $x = 0$.

Remarks.
i) Note that the coefficients of the function $f(x) = (1 - x/x_c)^{-\gamma}$ with real exponent $\gamma \notin \{0, -1, -2, \ldots\}$ satisfy
$$[x^m]\, f(x) \sim \frac{1}{\Gamma(\gamma)}\, x_c^{-m} m^{\gamma - 1} \qquad (m \to \infty).$$
(4)
This may be seen by an application of the binomial series and Stirling's formula. For functions $g(x) \sim f(x)$, the assumption of $\Delta$-regularity for $g(x)$ ensures that the same asymptotic estimate holds for the coefficients of $g(x)$.
ii) Theorems of the above type are called transfer theorems [35, 37]. The set of $\Delta$-regular functions with singularities of the above form is closed under addition, multiplication, differentiation, and integration [30].
iii) The case of a finite number of singularities on the circle of convergence can be treated by a straightforward extension of the above result [35, 37].

Lemma 1 implies a particular singular behaviour of the factorial moment generating functions, if the numbers $p_{m,n}$ satisfy certain typical asymptotic estimates. We write $(a)_k = a \cdot (a - 1) \cdot \ldots \cdot (a - k + 1)$ to denote the lower factorial.

Proposition 2. For $m, n \in \mathbb{N}_0$, let real numbers $p_{m,n}$ be given. Assume that the numbers $p_{m,n}$ asymptotically satisfy, for $k \in \mathbb{N}_0$,
$$\frac{1}{k!} \sum_n (n)_k\, p_{m,n} \sim A_k\, x_c^{-m} m^{\gamma_k - 1} \qquad (m \to \infty), \qquad (5)$$
for real numbers $A_k, x_c, \gamma_k$, where $A_k > 0$, $x_c > 0$, and $\gamma_k \notin \{0, -1, -2, \ldots\}$. Then, the factorial moment generating functions $g_k(x)$ satisfy
$$g_k^{(sing)}(x) \sim \frac{f_k}{(1 - x/x_c)^{\gamma_k}} \qquad (x \nearrow x_c), \qquad (6)$$
where $f_k = A_k\, \Gamma(\gamma_k)$.

Model φ θ γ0 Area limit law rectangles convex polygons −1 2 β1,1/2 Ferrers diagrams stacks 1 Gaussian staircase polygons bargraph polygons column-convex polygons directed column-convex polygons diagonally convex directed polygons rooted self-avoiding polygons∗ directed convex polygons 2 meander diagonally convex polygons∗ −1 three-choice polygons 0

Table 1: Exponents and area limit laws for prominent polygon models. An asterisk denotes a numerical analysis.

Remarks.
i) The above assumption on the growth of the coefficients in Eq. (5) is typical for polygon models, with $\gamma_k = (k - \theta)/\phi$, and $\phi > 0$.
ii) If the numbers $p_{m,n}$ satisfy, in addition to Eq. (5), Condition i) of Assumption 1, this implies for exponents of the form $\gamma_k = (k - \theta)/\phi$, where $\phi > 0$, the estimate 1/2 ≤ φ ≤ 1.
iii) The proposition implies that the singular part of the factorial moment generating function $g_k(x)$ is asymptotically equal to the singular part of the corresponding (ordinary) moment generating function,
$$\left( \frac{1}{k!} \frac{\partial^k}{\partial q^k} P(x, q) \Big|_{q=1} \right)^{(sing)} \sim \left( \frac{1}{k!} \Big( q \frac{\partial}{\partial q} \Big)^k P(x, q) \Big|_{q=1} \right)^{(sing)} \qquad (x \nearrow x_c).$$

We give a list of exponents and area limit distributions for a number of polygon models. An asterisk denotes that corresponding results rely on a numerical analysis. It appears that the value $(\theta, \phi) = (1/3, 2/3)$ arises for a large number of models. Furthermore, the exponent $\gamma_0$ seems to determine the area limit law. These two observations will be explained in the following section.

3 Limit distributions

In this section, we will concentrate on models of square lattice polygons in the fixed perimeter ensemble, and analyse their area law. The uniform ensemble is of particular interest, since non-Gaussian limit laws usually appear, due to expected phase transitions at $q = 1$. For non-uniform ensembles $q \ne 1$, Gaussian limit laws are expected, due to the absence of phase transitions.

There are effective techniques for the uniform ensemble, since the relevant generating functions are typically algebraic. This is different from the fixed area ensemble, where singularities are more difficult to analyse. It will turn out that the dominant singularity of the perimeter generating function determines the limiting area law of the model. We will first discuss several examples with different types of singularity. Then, we will describe a general result, by analysing classes of q-difference equations (see e.g. [103]), which exactly solvable polygon models obey. Whereas in the case $q \ne 1$ their theory is developed to some extent, the case $q = 1$ is more difficult to analyse. Motivated by the typical behaviour of polygon models, we assume that a q-difference equation reduces to an algebraic equation as $q$ approaches unity, and then analyse the behaviour of its solution about $q = 1$.
Useful background concerning a probabilistic analysis of counting parameters of combinatorial structures can be found in [37, Ch IX]. See [80, Ch 1] and [5, Ch 1] for background about asymptotic expansions. For properties of formal power series, see [39, Ch 1.1]. A useful reference on the Laplace transform, which will appear below, is [23].

3.1 An illustrative example: Rectangles

3.1.1 Limit law of area

Let $p_{m,n}$ denote the number of rectangles of half-perimeter $m$ and area $n$. Consider the uniform fixed perimeter ensemble, with a discrete random variable of area $\tilde X_m$ defined by
$$P(\tilde X_m = n) = \frac{p_{m,n}}{\sum_n p_{m,n}}. \qquad (7)$$
The $k$-th moments of $\tilde X_m$ are given explicitly by
$$E[\tilde X_m^k] = \frac{1}{m-1} \sum_{l=1}^{m-1} (l(m - l))^k \sim m^{2k} \int_0^1 (x(1 - x))^k\, dx = \frac{(k!)^2}{(2k + 1)!}\, m^{2k} \qquad (m \to \infty),$$
where we approximated the Riemann sum by an integral, using the Euler-MacLaurin summation formula. Thus, the random variable $\tilde X_m$ has mean $\mu_m \sim m^2/6$ and variance $\sigma_m^2 \sim m^4/180$. Since the sequence of random variables $(\tilde X_m)$ does not satisfy the concentration property $\lim_{m \to \infty} \sigma_m/\mu_m = 0$, we expect a non-trivial limiting distribution. Consider the normalised random variable
$$X_m = […]. \qquad (8)$$
Since the moments of $X_m$ converge as $m \to \infty$, and the limit sequence $M_k := \lim_{m \to \infty} E[X_m^k]$ satisfies the Carleman condition $\sum_k (M_{2k})^{-1/(2k)} = \infty$, they define [17, Ch 4.5] a unique random variable $X$ with moments $M_k$. Its moment generating function $M(t) = E[e^{-tX}]$ is readily obtained as
$$M(t) = \sum_{k \ge 0} \frac{E[X^k]}{k!} (-t)^k = […],$$
a closed-form expression involving the error function. The corresponding probability distribution $p(x)$ is obtained by an inverse Laplace transform, and is given by
$$p(x) = \begin{cases} \dfrac{1}{2\sqrt{1 - x}} & 0 \le x \le 1, \\ 0 & x > 1. \end{cases} \qquad (9)$$
This distribution is known as the beta distribution $\beta_{1,1/2}$. Together with [17, Thm 4.5.5], we arrive at the following result.

Theorem 1. The area random variable $\tilde X_m$ of rectangles Eq. (7) has mean $\mu_m \sim m^2/6$ and variance $\sigma_m^2 \sim m^4/180$. The normalised random variables $X_m$ Eq. (8) converge in distribution to a continuous random variable with limit law $\beta_{1,1/2}$ Eq. (9). We also have moment convergence.
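The moment asymptotics used above can be checked with exact arithmetic. The following is a minimal sketch, assuming the scaling area / m^2 as a convenient illustrative normalisation (it is not claimed to be the normalisation of Eq. (8)); the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def scaled_moment(m, k):
    """Exact k-th moment of (area / m**2) over the m - 1 rectangles of half-perimeter m."""
    total = sum(Fraction(l * (m - l), m * m) ** k for l in range(1, m))
    return total / (m - 1)

def limit_moment(k):
    """Limit value (k!)**2 / (2k + 1)!, the integral of (x(1 - x))**k over [0, 1]."""
    return Fraction(factorial(k) ** 2, factorial(2 * k + 1))

m = 2000  # illustrative size
errors = [abs(float(scaled_moment(m, k) - limit_moment(k))) for k in range(1, 5)]
# Variance implied by the first two limit moments, in units of m**4.
variance = limit_moment(2) - limit_moment(1) ** 2
```

The first two limit moments, 1/6 and 1/30, reproduce the mean m^2/6 and the variance m^4/180 quoted in Theorem 1, and the ratio of standard deviation to mean stays bounded away from zero.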
3.1.2 Limit law via generating functions

We now extract the limit distribution using generating functions. Whereas the derivation is less direct than the previous approach, the method applies to a number of other cases, where a direct approach fails. Consider the half-perimeter and area generating function $P(x, q)$ for rectangles,
$$P(x, q) = \sum_{m,n} p_{m,n}\, x^m q^n.$$
The factorial moments of the area random variable $\tilde X_m$ Eq. (7) are obtained from the generating function via
$$E[(\tilde X_m)_k] = \frac{\sum_n (n)_k\, p_{m,n}}{\sum_n p_{m,n}} = \frac{[x^m]\, \frac{\partial^k}{\partial q^k} P(x, q) \big|_{q=1}}{[x^m]\, P(x, 1)},$$
where $(a)_k = a \cdot (a - 1) \cdot \ldots \cdot (a - k + 1)$ is the lower factorial. The generating function $P(x, q)$ satisfies [87, Eq. 5.1] the linear q-difference equation [103]
$$P(x, q) = x^2 q\, P(qx, q) + \frac{x^2 q (1 + qx)}{1 - qx}. \qquad (10)$$
Due to the particular structure of the functional equation, the area moment generating functions
$$g_k(x) = \frac{1}{k!} \frac{\partial^k}{\partial q^k} P(x, q) \Big|_{q=1}$$
are rational functions and can be computed recursively from the functional equation, by repeated differentiation w.r.t. $q$ and then setting $q = 1$. (Such calculations are easily performed with a computer algebra system.) This gives, in particular,
$$g_0(x) = \frac{x^2}{(1 - x)^2}, \qquad g_1(x) = \frac{x^2}{(1 - x)^4},$$
$$g_2(x) = \frac{2x^3}{(1 - x)^6}, \qquad g_3(x) = \frac{6x^4}{(1 - x)^8},$$
$$g_4(x) = \frac{x^4 (1 + 22x + x^2)}{(1 - x)^{10}}, \qquad g_5(x) = \frac{12 x^5 (1 + 8x + x^2)}{(1 - x)^{12}}.$$
Whereas the exact expressions get messy for increasing $k$, their asymptotic form about their singularity $x_c = 1$ is simply given by
$$g_k(x) \sim \frac{k!}{(1 - x)^{2k+2}} \qquad (x \to 1). \qquad (11)$$
The above result can be inferred from the functional equation, which induces a recursion for the functions $g_k(x)$, which in turn can be asymptotically analysed. This method is called moment pumping [36]. Below, we will extract the above asymptotic behaviour by the method of dominant balance.

The asymptotic behaviour of the moments of $\tilde X_m$ can be obtained from singularity analysis of generating functions, as described in Lemma 2. Using the functional equation, it can be shown that all functions $g_k(x)$ are Laurent series about $x = 1$, with a finite number of terms.
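Both the q-difference equation Eq. (10) and the rational form of the low-order functions g_k(x) can be checked on truncated series. The sketch below assumes that rectangles are counted as ordered pairs of side lengths (r, s) with r, s >= 1, so that P(x, q) is the sum of x^(r+s) q^(rs), and uses the closed form g_2(x) = 2x^3/(1 - x)^6, which follows from the definition of g_k by summing geometric-type series. The truncation order N and the function names are arbitrary illustrative choices.

```python
from math import comb

N = 12  # truncation order in x (an arbitrary choice for this check)

def rectangle_series(max_deg):
    """Coefficients {(m, n): count} of P(x, q) = sum over r, s >= 1 of x**(r+s) * q**(r*s)."""
    series = {}
    for r in range(1, max_deg):
        for s in range(1, max_deg - r + 1):
            key = (r + s, r * s)
            series[key] = series.get(key, 0) + 1
    return series

def rhs_of_functional_equation(series, max_deg):
    """x^2 q P(qx, q) + x^2 q (1 + qx) / (1 - qx), truncated at x-degree max_deg."""
    out = {}
    # x^2 q P(qx, q): the monomial x^m q^n becomes x^(m+2) q^(n+m+1)
    for (m, n), c in series.items():
        if m + 2 <= max_deg:
            out[(m + 2, n + m + 1)] = out.get((m + 2, n + m + 1), 0) + c
    # x^2 q (1 + qx) / (1 - qx) = sum over j >= 0 of x^(j+2) q^(j+1) + x^(j+3) q^(j+2)
    for j in range(max_deg):
        for key in ((j + 2, j + 1), (j + 3, j + 2)):
            if key[0] <= max_deg:
                out[key] = out.get(key, 0) + 1
    return out

P = rectangle_series(N)
equation_holds = P == rhs_of_functional_equation(P, N)

def g_coefficient(k, m):
    """[x^m] g_k(x): sum of binomial(area, k) over rectangles of half-perimeter m."""
    return sum(comb(l * (m - l), k) for l in range(1, m))

# [x^m] 2 x^3 / (1 - x)^6 equals 2 * binomial(m + 2, 5)
g2_matches = all(g_coefficient(2, m) == 2 * comb(m + 2, 5) for m in range(2, 40))
```

The same coefficient comparison can be repeated for other k once a closed form for g_k(x) is available; only exact integer arithmetic is involved.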
Hence the remark following Lemma 2 implies for the (factorial) moments of the random variable $X_m$ Eq. (8) the expression
$$E[(X_m)^k] \sim E[(X_m)_k] \sim \frac{k!\, k!}{\Gamma(2k + 2)} = \frac{(k!)^2}{(2k + 1)!} \qquad (m \to \infty),$$
in accordance with the previous derivation. On the level of the moment generating function, an application of Watson's lemma [5, Sec 4.1] shows that the coefficients $k!$ in Eq. (11) appear in the asymptotic expansion of a certain Laplace transform of the (entire) moment generating function $E[e^{-tX}]$,
$$\int_0^\infty e^{-st} \left( \sum_{k \ge 0} \frac{E[X^k]}{k!} (-t^2)^k \right) t\, dt \sim \sum_{k \ge 0} (-1)^k k!\, s^{-(2k+2)} \qquad (s \to \infty).$$
Note that the r.h.s. is formally obtained by term-by-term integration of the l.h.s. Using the arguments of [46, Ch 8.11], one concludes that there exists an $s_0 > 0$, such that there is a unique function $F(s)$ analytic for $\Re(s) \ge s_0$ with the above asymptotic expansion. It is given by
$$F(s) = \mathrm{Ei}(s^2)\, e^{s^2}, \qquad (12)$$
where $\mathrm{Ei}(z) = \int_1^\infty e^{-tz}/t\, dt$ is the exponential integral. The moment generating function $M(t) = E[e^{-tX}]$ of the random variable $X$ is given by an inverse Laplace transform of $F(s)$,
$$\int_0^\infty e^{-st} M(t^2)\, t\, dt = F(s).$$
Since there are effective methods for computing inverse Laplace transforms [23], the question arises whether the function $F(s)$ can be easily obtained. It turns out that the functional equation Eq. (10) induces a differential equation for $F(s)$. This equation can be obtained in a mechanical way, using the method of dominant balance.

3.1.3 Dominant balance

For a given functional equation, the method of dominant balance consists of a certain rescaling of the variables, such that the quantity of interest appears in the expansion of a rescaled variable to leading order. The method was originally used as a heuristic tool in order to extract the scaling function of a polygon model [84] (see the following section). In the present framework, it is a rigorous method.

Consider the half-perimeter and area generating function $P(x, q)$ as a formal power series. The substitution $q = 1 - \tilde\epsilon$ is valid, since the coefficients of the power series $P(x, q)$ in $x$ are polynomials in $q$.
We get the power series in $\tilde\epsilon$,
$$H(x, \tilde\epsilon) = \sum_{k \ge 0} (-1)^k g_k(x)\, \tilde\epsilon^k,$$
whose coefficients $(-1)^k g_k(x)$ are power series in $x$. The functional equation Eq. (10) induces an equation for $H(x, \tilde\epsilon)$, from which the factorial area moment generating functions $g_k(x)$ may be computed recursively. Now replace $g_k(x)$ by its expansion about $x = 1$,
$$g_k(x) = \sum_{l \ge 0} \frac{f_{k,l}}{(1 - x)^{2k+2-l}}.$$
Introducing $\tilde s = 1 - x$, this leads to a power series $E(\tilde s, \tilde\epsilon)$ in $\tilde\epsilon$,
$$E(\tilde s, \tilde\epsilon) = \sum_{k \ge 0} (-1)^k \left( \sum_{l \ge 0} \frac{f_{k,l}}{\tilde s^{2k+2-l}} \right) \tilde\epsilon^k,$$
whose coefficients are Laurent series in $\tilde s$. As above, the functional equation induces an equation for the power series $E(\tilde s, \tilde\epsilon)$ in $\tilde\epsilon$, from which the expansion coefficients may be computed recursively. We infer from the previous equation that
$$E(s\epsilon, \epsilon^2) = \frac{1}{\epsilon^2} \sum_{l \ge 0} \left( \sum_{k \ge 0} (-1)^k \frac{f_{k,l}}{s^{2k+2-l}} \right) \epsilon^l = \frac{1}{\epsilon^2} F(s, \epsilon). \qquad (13)$$
Write $F(s, \epsilon) = \sum_{l \ge 0} F_l(s)\, \epsilon^l$. By construction, the (formal) series $F_0(s) = F(s, 0)$ coincides with the asymptotic expansion of the desired function $F(s)$ Eq. (12) about infinity.

The above example suggests a technique for computing $F_0(s)$. The functional equation Eq. (10) for $P(x, q)$ induces, after reparametrisation, differential equations for the functions $F_l(s)$, from which $F_0(s)$ may be obtained explicitly. These may be computed by first writing
$$P(x, q) = \frac{1}{1 - q}\, F\!\left( \frac{1 - x}{(1 - q)^{1/2}},\ (1 - q)^{1/2} \right), \qquad (14)$$
and then introducing variables $s$ and $\epsilon$, by setting $x = 1 - s\epsilon$ and $q = 1 - \epsilon^2$. Expand the equation to leading order in $\epsilon$. This yields, to order $\epsilon^0$, the first order differential equation
$$s F_0'(s) + 2 - 2 s^2 F_0(s) = 0.$$
The above equation translates into a recursion for the coefficients $f_{k,0}$, from which $f_{k,0} = k!$ can be deduced. In addition, the equation has a unique solution with the prescribed asymptotic behaviour Eq. (13), which is given by $F_0(s) = \mathrm{Ei}(s^2)\, e^{s^2}$.

As we will argue in the next section, Eq. (14) is sometimes referred to as a scaling Ansatz, the function $F(s, 0)$ appears as a scaling function, and the functions $F_l(s)$, for $l \ge 1$, appear as correction-to-scaling functions.
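The passage from the first order differential equation to the coefficients f_{k,0} = k! can be written out explicitly. The following worked derivation is a sketch, assuming only the formal expansion of F_0(s) in inverse powers of s that is read off from Eq. (13) at l = 0.

```latex
\begin{align*}
F_0(s) &= \sum_{k\ge 0} (-1)^k f_{k,0}\, s^{-(2k+2)}, \\
s\,F_0'(s) &= -\sum_{k\ge 0} (-1)^k (2k+2)\, f_{k,0}\, s^{-(2k+2)}, \\
2 s^2 F_0(s) &= 2 f_{0,0} - 2\sum_{k\ge 0} (-1)^{k} f_{k+1,0}\, s^{-(2k+2)}, \\
0 = s\,F_0'(s) + 2 - 2 s^2 F_0(s)
  &= (2 - 2 f_{0,0}) + \sum_{k\ge 0} (-1)^k \bigl( 2 f_{k+1,0} - (2k+2) f_{k,0} \bigr)\, s^{-(2k+2)}.
\end{align*}
```

Comparing coefficients gives f_{0,0} = 1 and f_{k+1,0} = (k+1) f_{k,0}, hence f_{k,0} = k!, in agreement with the amplitudes in Eq. (11).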
In our formal framework, where the series $F_l(s)$ are rescaled generating functions for the coefficients $f_{k,l}$, their derivation is rigorous.

3.2 A general method

In the preceding two subsections, we described a method for obtaining limit laws of counting parameters, via a generating function approach. Since this method will be important in the remainder of this section, we summarise it here. Its first ingredient is based on the so-called method of moments [17, Thm 4.5.5].

Proposition 3. For $m, n \in \mathbb{N}_0$, let real numbers $p_{m,n}$ be given. Assume that the numbers $p_{m,n}$ asymptotically satisfy, for $k \in \mathbb{N}_0$,
$$\frac{1}{k!} \sum_n (n)_k\, p_{m,n} \sim A_k\, x_c^{-m} m^{\gamma_k - 1} \qquad (m \to \infty), \qquad (15)$$
where $A_k$ are positive numbers, and $\gamma_k = (k - \theta)/\phi$, with real constants $\theta$ and $\phi > 0$. Assume that the numbers $M_k := A_k/A_0$ satisfy the Carleman condition
$$\sum_k (M_{2k})^{-1/(2k)} = +\infty. \qquad (16)$$
Then the following conclusions hold.
i) For almost all $m$, the random variables $\tilde X_m$,
$$P(\tilde X_m = n) = \frac{p_{m,n}}{\sum_n p_{m,n}}, \qquad (17)$$
are well defined. We have
$$X_m := \frac{\tilde X_m}{m^{1/\phi}} \xrightarrow{d} X, \qquad (18)$$
for a unique random variable $X$ with moments $M_k$, where $d$ denotes convergence in distribution. We also have moment convergence.
ii) If the numbers $M_k$ satisfy for all $t \in \mathbb{R}$ the estimate
$$[…] = 0, \qquad (19)$$
then the moment generating function $M(t) = E[e^{-tX}]$ of $X$ is an entire function. The coefficients $A_k \Gamma(\gamma_k)$ are related to $M(t)$ by a Laplace transform which has, for $\theta > 0$, the asymptotic expansion
$$\int_0^\infty e^{-st} \left( \sum_{k \ge 0} \frac{E[X^k]}{k!} (-t^{1/\phi})^k \right) \frac{1}{t^{1 - \gamma_0}}\, dt \sim \frac{1}{A_0} \sum_{k \ge 0} (-1)^k A_k \Gamma(\gamma_k)\, s^{-\gamma_k} \qquad (s \to \infty). \qquad (20)$$

Sketch. A straightforward calculation using Eq. (15) leads to
$$\frac{E[(\tilde X_m)_k]}{k!} \sim \frac{A_k}{A_0}\, m^{k/\phi} \qquad (m \to \infty).$$
This implies that the same asymptotic form holds for the (ordinary) moments $E[(\tilde X_m)^k]$. Due to the growth condition Eq. (16), the sequence $(M_k)$ defines a unique random variable $X$ with moments $M_k$. Also, moment convergence of the sequence $(X_m)$ to $X$ implies convergence in distribution, see [17, Thm 4.5.5]. Due to the growth condition Eq. (19), the function $M(t)$ is entire. Hence the conditions of Watson's Lemma [5, Sec 4.1] are satisfied, and we obtain Eq. (20).

Remarks. i) The growth condition Eq.
(19) implies the Carleman condition Eq. (16). All examples below have entire moment generating functions M(t). ii) If γ0 < 0, a modified version of Eq. (20) can be given, see for example staircase polygons below. Proposition 2 states that assumption Eq. (15) translates, at the level of the half- perimeter and area generating function P (x, q) = m,n pm,nx mqn, to a certain asymptotic expression for the factorial moment generating functions gk(x) = P (x, q) Their asymptotic behaviour follows from Eq. (15), and is (sing) k (x) ∼ (1− x/xc)γk (x ր xc), where fk = AkΓ(γk). Adopting the generating function viewpoint, the amplitudes fk determine the numbers Ak, hence the moments Mk = Ak/A0 of the limit distribution. The series F (s) = k≥0(−1)kfks−γk will be of central importance in the sequel. Definition 1 (Area amplitude series). Let Assumption 1 be satisfied. Assume that the generating function P (x, q) = m,n pm,nx mqn satisfies asymptotically P (x, q) )(sing) (1− x/xc)γk (x ր xc), with exponents γk /∈ {0,−1,−2, . . .}. Then, the formal series F (s) = (−1)k fk is called the area amplitude series. Remarks. i) Proposition 3 states that the area amplitude series appears in the asymptotic expansion about infinity of a Laplace transform of the moment generating function of the area limit distribution. The probability distribution of the limiting area distribution is related to F (s) by a double Laplace transform. ii) For typical polygon models, all derivatives of P (x, q) w.r.t. q, evaluated at q = 1, exist and have the same radius of convergence, see Proposition 1. Typical polygon models do have factorial moment generating functions of the above form, see the examples below. The second ingredient of the method consists in applying the method of dominant balance. As described above, this may result in a differential equation (or in a difference equation [90]) for the function F (s). Its applicability has to be tested for each given type of functional equation. 
Typically, it can be applied if the factorial area moment generating functions gk(x) Eq. (1) have, for values x < xc, a local expansion about x = xc of the form (sing) k (x) = (1− x/xc)γk,l where γk,l = (k − θl)/φ and θl+1 > θl. If a transfer theorem such as Lemma 2 applies, then the differential equation for F (s) induces a recurrence for the moments of the limit distribution. If the differential equation can be solved in closed form, inverse Laplace transform techniques may be applied in order to obtain explicit expressions for the moment generating function and the probability density. Also, higher order corrections to the limiting behaviour may be analysed, by studying the functions Fl(s), for l ≥ 1. See [87] for examples. 3.3 Further examples Using the general method as described above, area limit laws for the other exactly solved polygon models can be derived. A model with the same area limit law as rectangles is convex polygons, compare [87]. We will discuss some classes of polygon models with different area limit laws. 3.3.1 Ferrers diagrams In contrast to the previous example, the limit distribution of area of Ferrers diagrams is concentrated. Proposition 4. The area random variable X̃m of Ferrers diagrams has mean µm ∼ m2/8. The normalised random variablesXm Eq. (18) converge in distribution to a random variable with density p(x) = δ(x− 1/8). Remark. It should be noted that the above convergence statement already follows from the concentration property limm→∞ σm/µm = 0, with σ m ∼ m3/48 the variance of Xm, by an explicit analysis of the first three factorial moment generating functions. (By Cheby- shev’s inequality, the concentration property implies convergence in probability, which in turn implies convergence in distribution.) For illustrative purposes, we follow a different route via the moment method in the following proof. Proof. 
Ferrers diagrams, counted by half-perimeter and area, satisfy the linear q-difference equation [87, Eq (5.4)]
\[ P(x,q) = \frac{q x^2}{(1-qx)^2}\,P(qx,q) + \frac{q x^2}{(1-qx)^2}. \]
The perimeter generating function $g_0(x) = x^2/(1-2x)$ is obtained by setting $q = 1$ in the above equation. Hence $x_c = 1/2$. Using the functional equation, it can be shown by induction on $k$ that all area moment generating functions $g_k(x)$ are rational in $g_0(x)$ and its derivatives. Hence all $g_k(x)$ are rational functions. Since the area of a polygon grows at most quadratically with the perimeter, we have a bound on the exponent, $\gamma_k \le 2k+1$, of the leading singular part of $g_k(x)$. Given this bound, the method of dominant balance can be applied. We set
\[ P(x,q) = \frac{1}{(1-q)^{1/2}}\,F\Big(\frac{1-2x}{(1-q)^{1/2}},\,(1-q)^{1/2}\Big), \]
and introduce new variables $s$ and $\epsilon$ by $q = 1 - \epsilon^2$ and $2x = 1 - s\epsilon$. Then an expansion of the functional equation yields, to order $\epsilon^0$, the ODE of first order $F'(s) = 4sF(s) - 1$, whose unique solution with the prescribed asymptotic behaviour is
\[ F(s) = \sqrt{\frac{\pi}{8}}\;\mathrm{erfc}\big(\sqrt{2}\,s\big)\,e^{2s^2}. \]
It can be inferred from the differential equation that all coefficients in the asymptotic expansion of $F(s)$ at infinity are nonzero. Hence, the above exponent bound is tight. It can be inferred from the functional equation by induction on $k$ that each $g_k(x)$ is a Laurent polynomial about $x_c = 1/2$. Thus, Lemma 2 applies, and we obtain the moment generating function of the corresponding random variable Eq. (18) as $M(s) = \exp(-s/8)$. This is readily recognised as the moment generating function of a probability distribution concentrated at $x = 1/8$.

A sequence of random variables which satisfies the concentration property often leads to a Gaussian limit law, after centring and suitable normalisation. This is also the case for Ferrers diagrams.

Theorem 2 ([97]). The area random variable $\tilde X_m$ of Ferrers diagrams has mean $\mu_m \sim m^2/8$ and variance $\sigma_m^2 \sim m^3/48$. The centred and normalised random variables
\[ \frac{\tilde X_m - \mu_m}{\sigma_m} \tag{21} \]
converge in distribution to a Gaussian random variable.

Remarks.
i) It is possible to prove this result by the method of dominant balance. The idea of proof consists in studying the functional equation of the generating function for the “centred coefficients” pm,n − µm. ii) The above arguments can also be applied to stack polygons to yield the concentration property and a central limit theorem. 3.3.2 Staircase polygons The limit law of area of staircase polygons is the Airy distribution. This distribution (see [34] and the survey [52]) is conveniently defined via its moments. Definition 2 (Airy distribution [34]). The random variable Y is said to be Airy distributed E[Y k] Γ(γ0) Γ(γk) where γk = 3k/2− 1/2, and the numbers φk satisfy, for k ≥ 1, the quadratic recurrence γk−1φk−1 + φlφk−l = 0, with initial condition φ0 = −1. Remarks ([34, 58]). i) The first moment is E[Y ] = π. The sequence of moments can be shown to satisfy the Carleman condition. Hence the distribution is uniquely determined by its moments. ii) The numbers φk appear in the asymptotic expansion of the logarithmic derivative of the Airy function at infinity, logAi(s) ∼ (−1)kφk s−γk (s → ∞), where Ai(x) = 1 cos(t3/3 + tx) dt is the Airy function. iii) Explicit expressions for the numbers φk are known [58]. They are, for k ≥ 1, given by φk = 2 k+1 3 x3(k−1)/2 Ai(x)2 + Bi(x)2 where Bi(z) is the second standard solution of the Airy differential equation f ′′(z)−zf(z) = iv) The Airy distribution appears in a variety of contexts [34]. In particular, the random variable Y/ 8 describes the law of the area of a Brownian excursion. See also [76] for an overview from a physical perspective. Explicit expressions have been derived for the moment generating function of the Airy distribution and for its density. Fact 1 ([19, 66, 99, 34]). The moment generating function M(t) = E[e−tY ] of the Airy distribution satisfies the modified Laplace transform (e−st−1)M(2−3/2t3/2) dt = 21/3 Ai′(21/3s) Ai(21/3s) Ai′(0) Ai(0) . 
(22) The moment generating function M(t) is given explicitly by M(2−3/2t) = −βkt2/32−1/3 where the numbers −βk are the zeros of the Airy function. Its density p(x) is given explicitly 23/2p(23/2x) = e−vk v where vk = 2β k/(27x 2) and U(a, b, z) is the confluent hypergeometric function. Remarks. i) The confluent hypergeometric function U(a, b; z) is defined as [1] U(a, b; z) = sin πb 1F1[a, b; z] Γ(1 + a− b)Γ(b) 1F1[1 + a− b, 2− b; z] Γ(a)Γ(2− b) where 1F1[a; b; z] is the hypergeometric function 1F1[a; b; z] = 1 + a(a+ 1) b(b+ 1) + . . . ii) The moment generating function and its density are obtained by two consecutive inverse Laplace transforms of Eq. (22), see [67, 68] and [99, 54]. iii) In the proof of the following theorem, we will derive Eq. (22) using the model of staircase polygons. This shows, in particular, that the coefficients φk appear in the asymptotic expansion of the Airy function. Theorem 3. The normalised area random variables Xm of staircase polygons Eq. (18) satisfy d−→ Y√ (m → ∞), where Y is Airy distributed according to Definition 2. We also have moment convergence. Remark. Given the functional equation of the half-perimeter and area generating function of staircase polygons, P (x, q) = 1− 2xq − P (qx, q) (see [88] for a recent derivation), this result is a special case of Theorem 4 below, which is stated in [25]. Proof. We use the method of dominant balance. From the functional equation Eq. (23), we infer g0(x) = 1/4 + 1− 4x/2 + (1 − 4x)/4. Hence xc = 1/4. The structure of the functional equation implies that all functions gk(x) can be written as Laurent series in 1− 4x, see also Proposition 7 below. Explicitly, we get g1(x) = x2/(1 − 4x). This suggests γk = (3k− 1)/2. An upper bound of this form on the exponent γk can be derived without too much effort from the functional equation, by an application of Faa di Bruno’s formula, see also [89, Prop (4.4)]. Thus, the method of dominant balance can be applied. 
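The two explicit generating functions quoted in this proof can be checked against the functional equation by a short computation. The Python sketch below is an illustration added here. It reads Eq. (23) as $P(x,q) = x^2q/(1 - 2xq - P(qx,q))$, a reading that is consistent with the stated $g_0(x)$ and $g_1(x)$. It rewrites the equation as $P = x^2q + 2xq\,P + P\cdot P(qx,q)$ and iterates it on truncated power series. The function name is hypothetical.

```python
from collections import defaultdict
from math import comb


def staircase_counts(max_m):
    """Coefficients p[(m, n)] of x^m q^n in P(x, q), for m <= max_m.

    Fixed-point iteration of P = x^2 q + 2 x q P + P(x, q) * P(qx, q).
    Every term on the right raises the x-degree by at least one, so
    max_m iterations fix all coefficients up to x^max_m.
    """
    P = {}
    for _ in range(max_m):
        new = defaultdict(int)
        new[(2, 1)] = 1                                  # x^2 q: the unit square
        items = list(P.items())
        for (m, n), c in items:
            if m + 1 <= max_m:
                new[(m + 1, n + 1)] += 2 * c             # 2 x q P(x, q)
        for (m1, n1), c1 in items:
            for (m2, n2), c2 in items:
                if m1 + m2 <= max_m:
                    # P(qx, q) turns x^m2 q^n2 into x^m2 q^(n2 + m2)
                    new[(m1 + m2, n1 + n2 + m2)] += c1 * c2
        P = dict(new)
    return P


counts = staircase_counts(8)
number = defaultdict(int)       # coefficients of g_0(x)
total_area = defaultdict(int)   # coefficients of g_1(x)
for (m, n), c in counts.items():
    number[m] += c
    total_area[m] += n * c

for m in range(2, 9):
    print(m, number[m], total_area[m])
```

For each half-perimeter $m$, the first column of counts can be compared with the Catalan numbers $C_{m-1}$ generated by $g_0(x)$, and the second with $4^{m-2}$, the coefficients of $g_1(x) = x^2/(1-4x)$.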
We set
\[ P(x,q) = \frac14 + (1-q)^{1/3}\,F\Big(\frac{1-4x}{(1-q)^{2/3}},\,(1-q)^{1/3}\Big), \]
and introduce variables $s$, $\epsilon$ by $4x = 1 - s\epsilon^2$ and $q = 1 - \epsilon^3$. In the above equation, we separated the constant $1/4 =: P^{(\mathrm{reg})}(x,q)$ from the scaling part, since it does not contribute to the moment asymptotics. Expanding the functional equation to order $\epsilon^2$ gives the Riccati equation
\[ F'(s) + 4F(s)^2 - s = 0. \tag{24} \]
It follows that the coefficients $f_k$ of $F(s)$ satisfy, for $k \ge 1$, the quadratic recursion
\[ \gamma_{k-1} f_{k-1} + 4\sum_{l=0}^{k} f_l f_{k-l} = 0, \]
with initial condition $f_0 = -1/2$. A comparison with the definition of the Airy distribution shows that $\phi_k = 2^{2k+1} f_k$. Using the closure properties of ∆-regular functions, it can be inferred from the functional equation that (the analytic continuation of) each factorial moment generating function $g_k(x)$ is ∆-regular, with $x_c = 1/4$, see also Proposition 7 below. Hence the transfer theorem Lemma 2 can be applied. We obtain $4X_m \xrightarrow{d} Y$ in distribution and for moments, where $Y$ is Airy distributed.

Remarks. i) The unique solution $F(s)$ of the differential equation in the above proof Eq. (24), satisfying the prescribed asymptotic behaviour, is given by
\[ F(s) = \frac14\,\frac{d}{ds}\log \mathrm{Ai}(4^{1/3}s). \tag{25} \]
The moment generating function $M(t)$ of the limiting random variable $X = \lim_{m\to\infty} X_m$ is related to the function $F(s)$ via the modified Laplace transform
\[ \int_0^\infty \big(e^{-st} - 1\big)\,M(t^{3/2})\,\frac{1}{t^{3/2}}\,dt = 4\sqrt{\pi}\,\big(F(s) - F(0)\big), \]
where the modification has been introduced in order to ensure a finite integral about the origin. This result relates the above proof to Proposition 1.
ii) The method of dominant balance can be used to obtain corrections $F_l(s)$ to the limiting behaviour [87].

The fact that the area law of staircase polygons is, up to normalisation, the same as that of the area under a Brownian excursion suggests that there might be a combinatorial explanation. Indeed, as is well known, there is a bijection [21, 98] between staircase polygons and Dyck paths, a discrete version of Brownian excursions [2], see Figure 2 [88].
Within this bijection, the polygon area corresponds to the sum of peak heights of the Dyck path, but not to the area below the Dyck path. For more about this connection, see the remark at the end of the following subsection.

Figure 2: [88] A combinatorial bijection between staircase polygons and Dyck paths [21, 98]. Column heights of a polygon correspond to peak heights of a path.

3.4 q-difference equations

All polygon models discussed above have an algebraic perimeter generating function. Moreover, their half-perimeter and area generating function satisfies a functional equation of the form
\[ P(x,q) = G(x, q, P(x,q), P(qx,q)), \]
for a real polynomial $G(x,q,y_0,y_1)$. Since, under mild assumptions on $G$, the equation reduces to an algebraic equation for $P(x,1)$ in the limit $q \to 1$, it may be viewed as a “deformation” of an algebraic equation. In this subsection, we will analyse equations of this type at the special point $(x,q) = (x_c,1)$, where $x_c$ is the radius of convergence of $P(x,1)$. It will appear that the methods used in the above examples can also be applied to this more general case.

The above equation falls into the class of q-difference equations [103]. While particular examples appear in combinatorics in a number of places, see e.g. [37], the asymptotic behaviour of equations of the above form seems to have been systematically studied initially in [25, 87]. The study can be done in some generality, e.g., also for non-polynomial power series $G$, for replacements more general than $x \mapsto qx$, and for multivariate generalisations, see [89] and [25]. For simplicity, we will concentrate on polynomial $G$, and then briefly discuss generalisations. Our exposition closely follows [89, 87].

3.4.1 Algebraic q-difference equations

Definition 3 (Algebraic q-difference equation [25, 87]). An algebraic q-difference equation is an equation of the form
\[ P(x,q) = G(x, q, P(x,q), P(qx,q), \ldots, P(q^N x, q)), \tag{26} \]
where $G(x,q,y_0,y_1,\ldots,y_N)$ is a complex polynomial. We require that
\[ G(0,q,0,0,\ldots,0) \equiv 0, \qquad \frac{\partial G}{\partial y_k}(0,q,0,0,\ldots,0) \equiv 0 \quad (k = 0,1,\ldots,N). \]

Remarks. i) See [103] for an overview of the theory of q-difference equations. As $q$ approaches unity, the above equation reduces to an algebraic equation.
ii) Asymptotics for solutions of algebraic q-difference equations have been considered in [25]. The above definition is a special case of [89, Def 2.4], where a multivariate extension is considered, and where $G$ may be non-polynomial. Also, replacements more general than $x \mapsto f(q)x$ are allowed. Such equations are called q-functional equations in [89]. The results presented below apply mutatis mutandis also to q-functional equations.

The algebraic q-difference equation in Definition 3 uniquely defines a (formal) power series $P(x,q)$ satisfying $P(0,q) \equiv 0$. This is shown by analysing the implied recurrence for the coefficients $p_m(q)$ of $P(x,q) = \sum_{m>0} p_m(q)\,x^m$, see also [89, Prop 2.5]. In fact, $p_m(q)$ is a polynomial in $q$. The growth of its degree in $m$ is not larger than $cm^2$ for some positive constant $c$, hence the counting parameters are rank 2 parameters [25]. In our situation, such a bound holds, since the area of a polygon grows at most quadratically with its perimeter.

From the preceding discussion, it follows that the factorial moment generating functions
\[ g_k(x) = \frac{1}{k!}\,\frac{\partial^k}{\partial q^k}P(x,q)\Big|_{q=1} \]
are well-defined as formal power series. In fact, they can be recursively determined from the q-difference equation by implicit differentiation, as a consequence of the following proposition.

Proposition 5 ([87, 89]). Consider the derivative of order $k > 0$ of an algebraic q-difference equation Eq. (26) w.r.t. $q$, evaluated at $q = 1$. It is linear in $g_k(x)$, and its r.h.s. is a complex polynomial in the power series $g_l(x)$ and its derivatives up to order $k - l$, where $l = 0,\ldots,k$.

Remarks.
i) This statement can be shown by analysing the $k$-th derivative of the q-difference equation, using Faà di Bruno's formula [18].
ii) It follows that every function $g_k(x)$ is rational in $g_l(x)$ and its derivatives up to order $k - l$, where $0 \le l < k$. Since $G$ is a polynomial, $g_k(x)$ is algebraic, by the closure properties of algebraic functions.

We discuss analytic properties of the (analytic continuations of the) factorial moment generating functions $g_k(x)$. These are determined by the analytic properties of $g_0(x) = P(x,1)$. We discuss the case of a square-root singularity of $P(x,1)$, which often occurs for combinatorial structures, and which is well studied, see e.g. [79, Thm 10.6] or [37, Ch VII.4]. Other cases may be treated similarly. We make the following assumption:

Assumption 2. The q-difference equation in Definition 3 has the following properties:
i) All coefficients of the polynomial $G(x,q,y_0,y_1,\ldots,y_N)$ are non-negative.
ii) The polynomial $Q(x,y) := G(x,1,y,y,\ldots,y)$ satisfies $Q(x,0) \not\equiv 0$ and has degree at least two in $y$.
iii) $P(x,1) = \sum_{m\ge 1} p_m x^m$ is aperiodic, i.e., there exist indices $1 \le i < j < k$ such that $p_i p_j p_k \ne 0$, while $\gcd(j-i, k-i) = 1$.

Remarks. i) The positivity assumption is natural for combinatorial constructions. There are, however, q-difference equations with negative coefficients, which arise from systems of q-difference equations with non-negative coefficients by reduction. Examples are convex polygons [87, Sec 5.4] and directed convex polygons, see below.
ii) Assumptions i) and ii) result in a square-root singularity as the dominant singularity of $P(x,1)$.
iii) Assumption iii) implies that there is only one singularity of $P(x,1)$ on its circle of convergence. Since $P(x,1)$ has non-negative coefficients only, it occurs on the positive real half-line. The periodic case can be treated by a straightforward extension [37].
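The aperiodicity condition in Assumption 2 iii) is easy to test on the first few coefficients of a given series. The Python sketch below is an illustration added here, with a hypothetical helper name. It searches a finite list of coefficients for a witness triple $i < j < k$. A positive answer is conclusive. A negative answer on a truncated list only says that no witness appears up to the truncation order.

```python
from itertools import combinations
from math import gcd


def has_aperiodicity_witness(coeffs):
    """coeffs[m] holds p_m for m >= 1 (coeffs[0] is ignored).

    Returns True if there are indices i < j < k with p_i p_j p_k != 0
    and gcd(j - i, k - i) == 1, as required in Assumption 2 iii).
    """
    support = [m for m, p in enumerate(coeffs) if m >= 1 and p != 0]
    return any(gcd(j - i, k - i) == 1 for i, j, k in combinations(support, 3))


# Staircase polygons: p_m = C_{m-1} for m >= 2 (1, 2, 5, 14, ...).
staircase = [0, 0, 1, 2, 5, 14]
# A series in x^2 only, as an example of a periodic support.
even_only = [0, 0, 1, 0, 2, 0, 5]

print(has_aperiodicity_witness(staircase))
print(has_aperiodicity_witness(even_only))
```

For the staircase coefficients the triple $(2, 3, 4)$ is a witness, since $\gcd(1, 2) = 1$. For the second series all index differences are even, so no witness exists.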
An application of the (complex) implicit function theorem ensures that P (x, 1) is an- alytic at the origin. It can be analytically continued, as long as the defining algebraic equation remains invertible. Together with the positivity assumption, one can conclude that there is a number 0 < xc < ∞, such that the analytic continuation of P (x, 1) satisfies yc = limxրxc P (x, 1) < ∞, with Q(xc, yc) = yc, Q(xc, y) With the positivity assumption on the coefficients, it follows that Q(xc, y) > 0, C := Q(x, yc) > 0. (27) These conditions characterise the singularity of P (x, 1) at x = xc as a square-root. It can be shown that there exists a locally convergent expansion of P (x, 1) about x = xc, and that P (x, 1) is analytic for |x| < xc. We have the following result. Recall that a function f(z) is ∆-regular if it is analytic in the indented disc ∆ = {z : |z| ≤ xc+η, |Arg(z−xc)| ≥ φ} for some η > 0 and some φ, where 0 < φ < π/2. Proposition 6 ([79, 37, 89]). Given Assumption 2, the power series P (x, 1) is analytic at x = 0, with radius of convergence xc. Its analytic continuation is ∆-regular, with a square-root singularity at x = xc and a local Puiseux expansion P (x, 1) = yc + f0,l(1− x/xc)1/2+l/2, where yc = limxրxc P (x, 1) < ∞ and f0,0 = − xcC/B, for constants B > 0 and C > 0 as in Eq. (27). The numbers f0,l can be recursively determined from the q-difference equation. The asymptotic behaviour of P (x, 1) = g0(x) carries over to the factorial moment generating functions gk(x). Proposition 7 ([89]). Given Assumption 2, all factorial moment generating functions gk(x) are, for k ≥ 1, analytic at x = 0, with radius of convergence xc. Their analytic continuations are ∆-regular, with local Puiseux expansions gk(x) = fk,l(1− x/xc)−γk+l/2, where γk = 3k/2 − 1/2. The numbers fk,0 = fk are, for k ≥ 2, characterised by the recursion γk−1fk−1 + flfk−l = 0, and the numbers f0 < 0 and f1 > 0 are given by f0 = − , 4f1 = k=1 k (xc, 1, yc, yc, . . . 
, yc) , (28) for constants B > 0 and C > 0 as in Eq. (27). Remarks. i) This result can be obtained by a direct analysis of the q-difference equation, applying Faa di Bruno’s formula, see also [87, Sec 2.2]. ii) Alternatively, it can be obtained by applying the method of dominant balance to the q-difference equation. To this end, one notes that all functions gk(x) are Laurent series in√ 1− x/xc, and that their leading exponents are bounded from above by γk. (An upper bound on an exponent is usually easier to obtain than its exact value, since cancellations can be ignored). With these two ingredients, the method of dominant balance, as described above, can be applied. The differential equation of the function F (s) then translates, via a transfer theorem, into the above recursion for the coefficients. See [89, Sec 5]. The above result can be used to infer the limit distribution of area, along the lines of Section 3.2. Theorem 4 ([25, 89]). Let Assumption 2 be satisfied. For the solution of an algebraic q-difference equation P (x, q) = m,n pm,nx mqn, let X̃m denote the random variable P(X̃m = n) = pm,n∑ n pm,n (which is well-defined for almost all m). The mean of X̃m is given by E[X̃m] ∼ 2 m3/2 (m → ∞), where the numbers f0 and f1 are given in Eq. (28). The sequence of normalised random variables Xm converges in distribution, E[X̃m] d−→ Y√ (m → ∞), where Y is Airy distributed according to Definition 2. We also have moment convergence. Remarks. i) An explicit calculation shows that φk = |f0|−1 fk. Together with Proposition 7, the claim of the proof follows by standard reasoning, as in the examples above. ii) The above theorem appears in [25, Thm 3.1], together with an indication of the ar- guments of a proof. [There is a misprint in the definition of γ in [25, Thm 3.1]. In our notation γ = 4Bf1.] Within the more general setup of q-functional equations, the theorem is a special case of [89, Thm 1.5]. 
iii) The above theorem is a kind of central limit theorem for combinatorial constructions, since the Airy distribution arises under natural assumptions for a large class of combinatorial constructions. For a connection to certain Brownian motion functionals, see below.

3.4.2 q-functional equations and other extensions

We discuss extensions of the above result. Generically, the dominant singularity of $P(x,1)$ is a square-root. The case of a simple pole as dominant singularity, which generalises the example of Ferrers diagrams, has been discussed in [87]. Under weak assumptions, the resulting limit distribution of area is concentrated. Other singularities can also be analysed, as shown in the examples of rectangles above and of directed convex polygons in the following subsection. Compare also [90].

The case of non-polynomial $G$ can be discussed along the same lines, with certain assumptions on the analyticity properties of the series $G$. In the undeformed case $q = 1$, it is a classical result [37, Ch VII.3] that the generating function has a square-root as dominant singularity, as in the polynomial case. One can then argue along the above lines that an Airy distribution emerges as the limit law of the deformation variable [89, Thm 1.5]. Such an extension is relevant, since prominent combinatorial models, such as the Cayley tree generating function, fall into that class. See also the discussion of self-avoiding polygons below.

The above statements also remain valid for replacements more general than $x \mapsto qx$, e.g., for replacements $x \mapsto f(q)x$, where $f(q)$ is analytic for $0 \le q \le 1$, with non-negative series coefficients about $q = 0$. More interestingly, the idea of introducing a q-deformation may be iterated [25], leading to equations such as
\[ P(x, q_1, \ldots, q_M) = G\big(x, P(x q_1 \cdots q_M,\; q_1 q_2 \cdots q_M,\; q_2 q_3 \cdots q_M,\; \ldots,\; q_M)\big). \tag{29} \]
The counting parameters corresponding to $q_k$ are rank $k+1$ parameters, and limit distributions for such quantities have been derived for some types of singularities [77, 78, 88]. There is a central limit result for the generic case of a square-root singularity [89]. This generalisation applies to counting parameters which decompose linearly under a combinatorial construction. These results can also be obtained by an alternative method, which generalises to non-linear parameters, see [51].

The case where the limit $q$ to unity in a q-difference equation is not algebraic has not been discussed. For example, if $G(x, q, P(x,q), P(qx,q)) = 0$ for some polynomial $G$, the limit $q$ to unity might lead to an algebraic differential equation for $P(x,1)$. This may be seen by noting that
\[ \lim_{q\to 1} \frac{f(x) - f(qx)}{1-q} = x f'(x), \]
for $f(x)$ differentiable at $x$. Such equations are possibly related to polygon models such as three-choice polygons [44] or punctured staircase polygons [45]. Their perimeter generating function is not algebraic, hence the models do not satisfy an algebraic q-difference equation as in Definition 3.

3.4.3 A stochastic connection

Lastly, we indicate a link to Brownian motion, which appears in [99, 100] and was further developed in [77, 78, 89, 88]. As we saw in Section 3.2, limit distributions can, under certain conditions, be characterised by a certain Laplace transform of their moment generating functions. This approach, which arises naturally from the viewpoint of generating functions, can be applied to discrete versions of Brownian motion, excursions, bridges or meanders. Asymptotic results for the discrete models are then results for the corresponding stochastic objects. In fact, distributions of some functionals of Brownian motion have apparently first been obtained using this approach [99, 100].

Interestingly, a similar characterisation appears in stochastics for functionals of Brownian motion, via the Feynman-Kac formula.
For example, Louchard’s formula [66] relates the logarithmic derivative of the Airy function to a certain Laplace transform of the moment generating function of the law of the Brownian excursion area. Distributions of functionals of Brownian motion can also be obtained by a path integral approach, see [75] for a recent overview.

The discrete approach provides an alternative method for obtaining information about distributions of certain functionals of Brownian motion. For such functionals, it provides an alternative proof of Louchard’s formula [77, 78]. It leads, via the method of dominant balance, quite directly to moment recurrences for the underlying distribution. These have been studied in the case of rank $k$ parameters for discrete models of Brownian motion. In particular, they characterise the distributions of integrals over $(k-1)$-th powers of the corresponding stochastic objects [77, 78, 89, 88]. Such results have apparently not been previously derived using stochastic methods. The generating function approach can also be applied to classes of q-functional equations with singularities different from those connected to Brownian motion. For a related generalisation, see [10].

Vice versa, results and techniques from stochastics can be (and have been) used in order to study asymptotic properties of polygons. An example is the contour process of simply generated trees [38], which asymptotically describes the area of a staircase polygon. See also [69, 70, 71, 59].

3.5 Directed convex polygons

We show that the limit law of area of directed convex polygons in the uniform fixed perimeter ensemble is that of the area of the Brownian meander.

Fact 2 ([100, Thm 2]). The random variable $Z$ of area of the Brownian meander is characterised by
\[ \frac{E[Z^k]}{k!} = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_k)}\,\frac{\omega_k}{\omega_0}\,\frac{1}{2^{k/2}}, \]
where $\alpha_k = 3k/2 + 1/2$. The numbers $\omega_k$ satisfy for $k \ge 1$ the quadratic recurrence
\[ \alpha_{k-1}\,\omega_{k-1} + \sum_{l=0}^{k} \phi_l\, 2^{-l}\,\omega_{k-l} = 0, \]
with initial condition $\omega_0 = 1$, where the numbers $\phi_k$ appear in the Airy distribution as in Definition 2.
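The recurrence of Fact 2 is straightforward to iterate exactly. The Python sketch below is an illustration added here, and it rests on two labelled assumptions about formulas that are damaged in this copy. First, the Airy numbers are taken to satisfy $\gamma_{k-1}\phi_{k-1} + \tfrac12\sum_{l=0}^{k}\phi_l\phi_{k-l} = 0$ with $\phi_0 = -1$, the form obtained from the recursion for $f_k$ via $\phi_k = 2^{2k+1}f_k$. Second, the meander moments are read as $E[Z^k]/k! = \Gamma(\alpha_0)\,\omega_k\,2^{-k/2}/(\Gamma(\alpha_k)\,\omega_0)$ with $\alpha_{k-1}\omega_{k-1} + \sum_{l=0}^{k}\phi_l 2^{-l}\omega_{k-l} = 0$, a normalisation chosen so that the mean equals $3\sqrt{2\pi}/8$ as stated in the remarks following Fact 2. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, gamma, pi, sqrt


def airy_phi(kmax):
    """phi_0..phi_kmax from gamma_{k-1} phi_{k-1} + (1/2) sum_l phi_l phi_{k-l} = 0."""
    phi = [Fraction(-1)]
    for k in range(1, kmax + 1):
        g_prev = Fraction(3 * (k - 1) - 1, 2)             # gamma_{k-1}
        inner = sum((phi[l] * phi[k - l] for l in range(1, k)), Fraction(0))
        # The l = 0 and l = k terms give phi_0 * phi_k = -phi_k; solve for phi_k.
        phi.append(g_prev * phi[k - 1] + inner / 2)
    return phi


def meander_omega(kmax):
    """omega_0..omega_kmax from the assumed reading of the recurrence in Fact 2."""
    phi = airy_phi(kmax)
    omega = [Fraction(1)]
    for k in range(1, kmax + 1):
        a_prev = Fraction(3 * (k - 1) + 1, 2)             # alpha_{k-1}
        tail = sum((phi[l] * omega[k - l] / 2 ** l for l in range(1, k + 1)),
                   Fraction(0))
        # The l = 0 term is phi_0 * omega_k = -omega_k; solve for omega_k.
        omega.append(a_prev * omega[k - 1] + tail)
    return omega


def meander_moment(k, omega):
    """E[Z^k] under the assumed normalisation with the factor 2^(-k/2)."""
    alpha_k = 1.5 * k + 0.5
    return (factorial(k) * gamma(0.5) / gamma(alpha_k)
            * float(omega[k] / omega[0]) * 2 ** (-k / 2))


omega = meander_omega(4)
print(omega[:3])
print(meander_moment(1, omega), 3 * sqrt(2 * pi) / 8)
```

Under these assumptions the first two numbers are $\omega_1 = 3/4$ and $\omega_2 = 59/32$, and the computed first moment can be compared directly with the quoted mean.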
Remarks. i) This result has been derived using a discrete meander, whose length and area generating function is described by a system of two algebraic q-difference equations, see [77, Prop 1]. ii) We have E[Z] = 3 2π/8 for the mean of Z. The random variable Z is uniquely determined by its moments. The numbers ωk appear in the asymptotic expansion [100, Thm 3] Ω(s) = Ai(t) dt 3Ai(s) (−1)kωks−αk (s → ∞), where Ai(x) = 1 cos(t3/3 + tx) dt is the Airy function. Explicit expressions have been derived for the moment generating function and for the distribution function of Z. Fact 3 ([100, Thm 5]). The moment generating function M(t) = E[e−tZ ] of Z satisfies the Laplace transform ∫ ∞ e−stM( 2 t3/2) πΩ(s). (30) It is explicitly given by M(t) = 2−1/6t1/3 Rk exp(−βkt2/32−1/3) for ℜ(t) > 0, where the numbers −βk are the zeroes of the Airy function, and where βk(1 + 3 Ai(−t) dt) 3Ai′(−βk) The random variable Z has a continuous density p(y), with distribution function R(x) =∫ x p(y) dy given by R(x) = (18)1/6x −vk v k Ai((3vk/2) 2/3), where vk = (βk) 3/(27x2). Remark. The moment generating function and the distribution function are obtained by two consecutive inverse Laplace transforms of Eq. (30). Theorem 5. The normalised area random variablesXm of directed convex polygons Eq. (18) satisfy d−→ 1 Z (m → ∞), where Z is the area random variable of the Brownian meander as in Fact 2. We also have moment convergence. Proof. A system of q-difference equations for the generating function Q(x, y, q) of directed convex polygons, counted by width, height and area, has been given in [9, Lemma 1.1]. It can be reduced to a single equation, q(qx− 1)Q(x, y, q) + ((1 + q)(P (x, y, q) + y))Q(qx, y, q)+ xyq − y2 + P (x, y, q)(qx− y − 1) Q(q2x, y, q) − q2xy (y + P (x, y, q)− 1) = 0, where P (x, y, q) is the width, height and area generating function of staircase polygons. 
Setting q=1 and x = y yields the half-perimeter generating function g0(x) = 1− 4x Hence xc = 1/4 for the radius of convergence of Q(x, x, 1). It is possible to derive from Eq. (31) a q-difference equation for the (isotropic) half- perimeter and area generating function Q(x, q) = Q(x, x, q) of directed convex polygons. This is due to the symmetry Q(x, y, q) = Q(y, x, q), which results from invariance of the set of directed convex polygons under reflection along the negative diagonal y = −x. Since this equation is quite long, we do not give it here. By arguments analogous to those of the previous subsection, it can be deduced from this equation that all area moment generating functions gk(x) of Q(x, 1) are Laurent series in s = 1− 4x, see also [89, Prop (4.3)]. The leading singular exponent of gk(x), defined by gk(x) ∼ hk(1− x/xc)−αk as x ր xc, can be bounded from above by αk ≤ 3k/2 + 1/2, see also [89, Prop (4.4)] for the argument. We apply the method of dominant balance, in order to prove that αk = 3k/2 + 1/2 and to yield recurrences for the coefficients hk. We define P (x, q) = + (1− q)1/3F 1− 4x (1− q)2/3 , (1− q)1/3 Q(x, q) = (1− q)−1/3H 1− 4x (1− q)2/3 , (1− q)1/3 where F (s) = F (s, 0) has already been determined in Eq. (25). We set 4x = 1 − sǫ2, q = 1− ǫ3, and expand the q-difference equation to leading order in ǫ. We get for H(s) := H(s, 0) the inhomogeneous linear differential equation of first order H ′(s) + 4H(s)F (s) + This implies for the coefficients hk of H(s) = k≥0 hks −αk and fk of F (s) = k≥0 fks for k ≥ 1 the quadratic recursion αk−1hk−1 + 4 flhk−l = 0, where h0 = 1/16. Using fk = 2 −2k−1φk, we obtain the meander recursion in Fact 2 by setting hk = 2 −k−4ωk. It can be inferred from the functional equation that (the analytic continuations of) all factorial moment generating functions are ∆-regular, with xc = 1/4. Thus Lemma 2 applies, and we conclude Xm d→ Z/2. Remarks. 
i) The above theorem states that the limit distribution of area of directed convex polygons coincides, up to normalisation, with the area distribution of the Brownian meander [100]. This suggests that there might exist a combinatorial bijection to discrete meanders, in analogy to that between staircase polygons and Dyck paths. Up to now, a “nice” bijection has not been found, see however [6, 72] for combinatorial bijections to discrete bridges. ii) The above proof relies on a q-difference equation for the isotropic generating function Q(x, x, q). Up to normalisation, the meander distribution also appears for the anisotropic model Q(x, y, q), where 0 < y < 1/2 is fixed, as can be shown by a considerably simpler calculation. The normalisation constant coincides with that of the isotropic model for y = 1/2. The latter statement is also a consequence of the fact that the height random variable of directed polygons is asymptotically Gaussian, after centering and normalisation. Analogous considerations apply to the relation between isotropic and anisotropic versions of the other polygon classes. 3.6 Limit laws away from (xc, 1) As motivated in the introduction, limit laws in the fixed perimeter ensemble for q 6= 1 are expected to be Gaussian. The same remark holds for the fixed area ensemble for x 6= xc. There are partial results for the model of staircase polygons. The fixed area ensemble can, for x < xc and q near unity, be analysed using Fact 7 of the following section. For staircase polygons in the uniform fixed area ensemble x = 1, the following result holds. Fact 4 ([37, Prop IX.11]). Consider the perimeter random variable of staircase polygons in the uniform fixed area ensemble, P(Ỹn = m) = pm,n∑ m pm,n The variable Ỹn has mean µn ∼ µ ·n and standard deviation σn ∼ σ n, where the numbers µ and σ satisfy µ = 0.8417620156 . . . , σ = 0.4242065326 . . . The centred and normalised random variables Ỹn − µn converge in distribution to a Gaussian random variable. Remark. 
The above result is proved using an explicit expression for the half-perimeter and area generating function, as a ratio of two q-Bessel functions. It can be shown that this expression is meromorphic about $(x, q) = (1, q_c)$ with a simple pole, where $q_c$ is the radius of convergence of the generating function $P(1, q)$. The explicit form of the singularity about $(1, q_c)$ yields a Gaussian limit law.

There are a number of results for classes of column-convex polygons in the uniform fixed area ensemble, typically leading to Gaussian limit laws. The upper and lower shape of a polygon can be described by Brownian motions. See [69, 70, 71] for details.

It would be interesting to prove convergence to a Gaussian limit law within a more general framework, such as q-difference equations. Analogous questions for other functional equations, describing counting parameters such as horizontal width, have been studied in [24].

3.7 Self-avoiding polygons

A numerical analysis of self-avoiding polygons, using data from exact enumeration [91, 92], supports the conjecture that the limit law of area is, up to normalisation, the Airy distribution. Let $p_{m,n}$ denote the number of square lattice self-avoiding polygons of half-perimeter m and area n. Exact enumeration techniques have been applied to obtain the numbers $p_{m,n}$ for all values of n for given m ≤ 50. Numerical extrapolation techniques yield very accurate estimates of the asymptotic behaviour of the coefficients of the factorial moment generating functions. To leading order, these are given by

$$[x^m]\, g_k(x) = \frac{1}{k!} \sum_n (n)_k\, p_{m,n} \sim A_k\, x_c^{-m}\, m^{3k/2 - 3/2 - 1} \qquad (m \to \infty), \tag{32}$$

for positive amplitudes $A_k$. The above form has been numerically checked [91, 92] for values k ≤ 10 and is conjectured to hold for arbitrary k. The value $x_c$ is the radius of convergence of the half-perimeter generating function of self-avoiding polygons. The amplitudes $A_k$ have been extrapolated to at least five significant digits.
In particular, we have xc = 0.14368062927(2), A0 = 0.09940174(4), A1 = 0.0397886(1), where the numbers in brackets denote the uncertainty in the last digit. An exact value of the amplitude A1 = 1/(8π) has been predicted [15] using field-theoretic arguments. The particular form of the exponent implies that the model of rooted self-avoiding poly- gons p̃m,n = mpm,n has the same exponents φ = 2/3 and θ = 1/3 as staircase polygons. In particular, it implies a square-root as dominant singularity of the half-perimeter generat- ing function. Together with the above result for q-functional equations, this suggests that (rooted) self-avoiding polygons might obey the Airy distribution as a limit law of area. A natural method to test this conjecture consists in analysing ratios of moments, such that a normalisation constant is eliminated. Such ratios are also called universal amplitude ratios. If the conjecture were true, we would have asymptotically E[X̃km] E[X̃m]k ∼ k! Γ(γ1) Γ(γk)Γ(γ0)k−1 (m → ∞), for the area random variables X̃m as in Eq. (17). The numbers φk and exponents γk are those of the Airy distribution as in Definition 2. The above form was numerically confirmed for values of k ≤ 10 to a high level of numerical accuracy. The normalisation constant is obtained by noting that E[Y ] = Conjecture 1 (cf [91, 92]). Let pm,n denote the number of square lattice self-avoiding polygons of half-perimeter m and area n. Let X̃m denote the random variable of area in the uniform fixed perimeter ensemble, P(X̃m = n) = pm,n∑ n pm,n We conjecture that E[X̃m] d−→ Y√ where Y is Airy distributed according to Definition 2. Remarks. i) Field theoretic arguments [15] yield A1 = 1/(8π). ii) References [91, 92] contain conjectures for the scaling function of self-avoiding polygons and rooted self-avoiding polygons, see the following section. In fact, the numerical analysis in [91, 92] mainly concerns the area amplitudes Ak, which determine the limit distribution of area. 
iii) The area law of self-avoiding polygons has also been studied [91, 92] on the triangular and hexagonal lattices. As for the square lattice, the area limit law appears to be the Airy distribution, up to normalisation. iv) It is an open question whether there are non-trivial counting parameters other than the area, whose limit law (in the fixed perimeter ensembles) coincides between self-avoiding polygons and staircase polygons. See [88] for a negative example. This indicates that underlying stochastic processes must be quite different. v) A proof of the above conjecture is an outstanding open problem. It would be interest- ing to analyse the emergence of the Airy distribution using stochastic Loewner evolution [60]. Self-avoiding polygons at criticality are conjectured to describe the hull of critical percolation clusters and the outer boundary of two-dimensional Brownian motion [60]. A numerical analysis of the fixed area ensemble along the above lines again shows behaviour similar to that of staircase polygons. This supports the following conjecture. Conjecture 2. Consider the perimeter random variable of self-avoiding polygons in the uniform fixed area ensemble, P(Ỹn = m) = pm,n∑ m pm,n The random variable Ỹn is conjectured to have mean µn ∼ µ · n and standard deviation σn ∼ σ n, where the numbers µ and σ satisfy µ = 1.855217(1), σ2 = 0.3259(1), where the number in brackets denotes the uncertainty in the last digit. The centred and normalised random variables Ỹn − µn are conjectured to converge in distribution to a Gaussian random variable. The above conjectures, together with the results of the previous subsection, also raise the question whether rooted square-lattice self-avoiding polygons, counted by half-perimeter and area, might satisfy a q-functional equation. In particular, it would be interesting to consider whether rooted self-avoiding polygons might satisfy P (x) = G(x, P (x)), (33) for some power series G(x, y) in x, y. 
If the perimeter generating function P (x) is not algebraic, this excludes polynomials G(x, y) in x and y. Note that the anisotropic perimeter generating function of self-avoiding polygons is not D-finite [86]. It is thus unlikely that the isotropic perimeter generating function is D-finite and, in particular, algebraic. On the other hand, solutions of Eq. (33) need not be algebraic nor D-finite. An example is the Cayley tree generating function T (x) satisfying T (x) = x exp(T (x)), see [33]. 3.8 Punctured polygons Punctured polygons are self-avoiding polygons with internal holes, which are also self- avoiding polygons. The polygons are also mutually avoiding. The perimeter of a punctured polygon is the sum of the lengths of its boundary curves, the area of a punctured polygon is the area of the outer polygon minus the area of the holes. Apart from intrinsic combinato- rial interest, models of punctured polygons may be viewed as arising from two-dimensional sections of three-dimensional self-avoiding vesicles. Counted by area, they may serve as an approximation to the polyomino model. We consider, for a given subclass of self-avoiding polygons, punctured polygons with holes from the same subclass. The case of a bounded number of punctures of bounded size can be analysed in some generality. The case of a bounded number of punctures of unbounded size leads to simple results if the critical perimeter generating function of the model without punctures is finite. For a given subclass of self-avoiding polygons, the number pm,n denotes the number of polygons with half-perimeter m and area n. Let p (r,s) m,n denote the number of polygons with r ≥ 1 punctures whose half-perimeter sum equals s. Let p(r)m,n denote the number of polygons with r ≥ 1 punctures of arbitrary size. Theorem 6 ([94, Thms 1,2]). 
Assume that, for a class of self-avoiding polygons with- out punctures, the area moment coefficients p n≥0 n kpm,n have, for k ∈ N0, the asymptotic form p(k)m ∼ Akx−mc mγk−1 (m → ∞), for numbers Ak > 0, for 0 < xc ≤ 1 and for γk = (k − θ)/φ, where 0 < φ < 1. Let g0(x) = m≥0 p m denote the half-perimeter generating function. Then, the area moment coefficient p (r,k,s) (r,s) m,n of the polygon class with r ≥ 1 punctures whose half-perimeter sum equals s is, for k ∈ N0, asymptotically given by p(r,k,s)m ∼ A (r,s) γk+r−1 (m → ∞), where A (r,s) xsc[x s](g0(x)) If θ > 0, the area moment coefficient p (r,k) m,n of the polygon class with r ≥ 1 punctures of arbitrary size satisfies, for k ∈ N0, asymptotically p(r,k)m ∼ A γk+r−1 (m → ∞), where the amplitudes A k are given by Ak+r(g0(xc)) Remarks. i) The basic argument in the proof of the preceding result involves an estimate of interactions of hole polygons with one another or with the boundary of the external polygon, which are shown to be asymptotically irrelevant. This argument also applies in higher dimensions, as long as the exponent φ satisfies 0 < φ < 1. ii) In the case of an infinite critical perimeter generating function, such as for subclasses of convex polygons, boundary effects are asymptotically relevant, if punctures of unbounded size are considered. The case of an unbounded number of punctures, which approximates the polyomino problem, is unsolved. iii) The above result leads to new area limit distributions. For rectangles with r punctures of bounded size, we get βr+1,1/2 as the limit distribution of area. For staircase polygons with punctures, we obtain generalisations of the Airy distribution, which are discussed in [94]. In contrast, for Ferrers diagrams with punctures of bounded size, the limit distribution of area stays concentrated. iv) The theorem also applies to models of punctured polygons, which do not satisfy an algebraic q-difference equation. 
An example is given by staircase polygons with a staircase hole of unbounded size, whose perimeter generating function is not algebraic [45].

3.9 Models in three dimensions

There are very few results for models in higher dimensions, notably for models on the cubic lattice. There are a number of natural counting parameters for such objects. We restrict consideration to area and volume, which are the three-dimensional analogues of perimeter and area of two-dimensional models.

One prominent model is self-avoiding surfaces on the cubic lattice, also studied as a model of three-dimensional vesicle collapse. We follow the review in [102] (see also the references therein) and consider closed orientable surfaces of genus zero, i.e., surfaces homeomorphic to a sphere. Numerical studies indicate that the surface generating function displays a square-root singularity, γ = −1/2, as the dominant singularity. Consider the fixed surface area ensemble with weights proportional to $q^n$, with n the volume of the surface. One expects a deflated phase (branched polymer phase) for small values of q and an inflated phase (spherical phase) for large values of q. In the deflated phase, the mean volume of a surface should grow proportionally to the area m of the surface; in the inflated phase, the mean volume should grow like $m^{3/2}$ with the surface area. Numerical simulations suggest a phase transition at q = 1 with exponent φ = 1. This indicates that a typical surface resembles a branched polymer, and a concentrated distribution of volume is expected. Note that this behaviour differs from that of the two-dimensional model of self-avoiding polygons.

Even relatively simple subclasses of self-avoiding surfaces such as rectangular boxes [73] and plane partition vesicles [50], generalising the two-dimensional models of rectangles and Ferrers diagrams, display complicated behaviour. Let $p_{m,n}$ denote the number of surfaces of area m and volume n and consider the generating function $P(x, q) = \sum_{m,n} p_{m,n}\, x^m q^n$.
For rectangular box vesicles, we apparently have $P(x, 1) \sim A\, |\log(1-x)| / (1-x)^{3/2}$ as $x \to 1^-$, for some constant A > 0, see [73, Eq (35)]. In the fixed surface area ensemble, a linear polymer phase 0 < q < 1 is separated from a cubic phase q > 1. At q = 1, we have φ = 2/3, such that typical rectangular boxes are expected to attain a cubic shape. We expect a limit distribution which is concentrated. For plane partition vesicles, it is conjectured on the basis of numerical simulations [50, Sec 4.1.1] that $P(x, 1) \sim A \exp\bigl(\alpha/(x_c - x)^{1/3}\bigr)/(x_c - x)^{\gamma}$, where γ ≈ 1.7 at $x_c = 0.8467(3)$, for non-vanishing constants A and α. It is expected that φ = 1/2.

As in the previous subsection, three-dimensional models of punctured vesicles may be considered. The above arguments hold if the exponent φ satisfies 0 < φ < 1. A corresponding result for punctures of unbounded size can be stated if the critical surface area generating function is finite.

3.10 Summary

In this section, we described methods to extract asymptotic area laws for polygon models on the square lattice, and we applied these to various classes of polygons. Some of the laws were found to coincide with those of the (absolute) area under a Brownian excursion and a Brownian meander. A combinatorial explanation for the latter result has not been given. Is there a simple polygon model with the same area limit law as the area under a Brownian bridge? The connection to stochastics deserves further investigation. In particular, it would be interesting to identify underlying stochastic processes. For an approach to a number of different random combinatorial structures starting from a probabilistic viewpoint, see [82].

Area laws of polygon models in the uniform fixed perimeter ensemble q = 1 have been understood in some generality, by an analysis of the singular behaviour of q-functional equations about the point $(x, q) = (x_c, 1)$. Essentially, the type of singularity of the half-perimeter generating function determines the limit law.
A refined analysis can be done, leading to local limit laws and providing convergence rates. Also, limit distributions describing corrections to the asymptotic behaviour can be derived. They seem to coincide with distributions arising in models of punctured polygons, see [94]. For non-uniform ensembles, concentrated distributions are expected, but general re- sults, e.g. for q-functional equations, are lacking. These may be obtained by multivariate singularity analysis, see also [24, 65]. The underlying structure of q-functional equations appears in a number of other combi- natorial models, such as models of two-dimensional directed walks, counted by length and area between the walk and the x-axis, models of simply generated trees, counted by the number of nodes and path length, and models which appear in the average case analysis of algorithms, see [34, 37]. Thus, the above methods and results can be applied to such models. In statistical physics, this mainly concerns models of (interacting) directed walks, see [48] for a review. There is also an approach to the behaviour of such walks from a stochastic viewpoint, see e.g. the review [101]. There are exactly solvable polygon models, which do not satisfy an algebraic q-difference equation, such as three-choice polygons [44], punctured staircase polygons [45], prudent polygon subclasses [96], and possibly diagonally convex polygons. For a rigorous analysis of the above models, it may be necessary to understand q-difference equations with more general holonomic solutions, as q approaches unity. Focussing on self-avoiding polygons, it might be interesting to analyse whether the perimeter generating function of rooted self-avoiding polygons might satisfy an implicit equation Eq. (33). Asymptotic properties of the area can possibly be studied using stochas- tic Loewner evolution [60]. 
Another open question concerns the area limit law for $q \neq 1$ or the perimeter limit law for $x \neq x_c$, where Gaussian behaviour is expected. At present, even the simpler question of analyticity of the critical curve $x_c(q)$ for 0 < q < 1 is open.

Most results of this section concerned area limit laws of polygon models. Similarly, one can ask for perimeter laws in the fixed area ensemble. Results have been given for the uniform ensemble. Generally, Gaussian limit laws are expected away from criticality, i.e., away from $x = x_c$. Perimeter laws are more difficult to extract from a q-functional equation than area laws. We will however see in the following section that, surprisingly, under certain conditions, knowledge of the area limit law can be used to infer the perimeter limit law at criticality.

4 Scaling functions

From a technical perspective, the focus in the previous section was on the singular behaviour of the single-variable factorial moment generating functions $g_k(x)$, Eq. (1), and on the associated asymptotic behaviour of their coefficients. This yielded the limiting area distribution of some polygon models.

In this section, we discuss the more general problem of the singular behaviour of the two-variable perimeter and area generating function of a polygon model. Near the special point $(x, q) = (x_c, 1)$, the perimeter and area generating function $P(x, q) = \sum_{m \ge 0} p_m(q)\, x^m = \sum_{n \ge 0} a_n(x)\, q^n$ is expected to be approximated by a scaling function, and the corresponding coefficient functions $p_m(q)$ and $a_n(x)$ are expected to be approximated by finite size scaling functions. As we will see, scaling functions encapsulate information about the limit distributions discussed in the previous section, and thus have a probabilistic interpretation.

We will give a focussed review, guided by exactly solvable examples, since singularity analysis of multivariate generating functions is, in contrast to the one-variable case, not very well developed, see [81] for a recent overview.
Methods of particular interest to poly- gon models concern asymptotic expansions about multicritical points, which are discussed for special examples in [80, 5]. Conjectures for the behaviour of polygon models about multicritical points arise from the physical theory of tricritical scaling [41], see the review [61], which has been adapted to polygon models [14, 13]. There are few rigorous results about scaling behaviour of polygon models, which we will discuss. This will complement the exposition in [47]. See also [42, Ch 9] for the related subject of scaling in percolation. 4.1 Scaling and finite size scaling The half-perimeter and area generating function of a polygon model P (x, q) about (x, q) = (xc, 1) is expected to be approximated by a scaling function. This is motivated by the following heuristic argument. Assume that the factorial area moment generating functions gk(x) Eq. (1) have, for values x < xc, a local expansion about x = xc of the form gk(x) = (1− x/xc)γk,l where γk,l = (k − θl)/φ and θl+1 > θl. Disregarding questions of analyticity, we argue P (x, q) ≈ (−1)k (1− x/xc)γk,l (1− q)k (1− q)θl (−1)kfk,l 1− x/xc (1− q)φ )−γk,l) In the above calculation, we replaced P (x, q) by its Taylor series about q = 1, and then replaced the Taylor coefficients by their expansion about x = xc. The preceding heuristic calculation has, for some polygon models and on a formal level, a rigorous counterpart, see the previous section. In the above expression, the r.h.s. depends on series Fl(s) =∑ k≥0(−1)kfk,ls−γk,l of a single variable of combined argument s = (1 − x/xc)/(1 − q)φ. Restricting to the leading term l = 0, this motivates the following definition. For φ > 0 and xc > 0, we define for numbers s−, s+ ∈ [−∞,+∞] the domain D(s−, s+) = {(x, q) ∈ (0,∞)× (0, 1) : s− < (1− x/xc)/(1− q)φ < s+)}. Definition 4 (Scaling function). For numbers pm,n with generating function P (x, q) =∑ m,n pm,nx mqn, let Assumption 1 be satisfied. 
Let $0 < x_c \le 1$ be the radius of convergence of $P(x, 1)$. Assume that there exist constants $s_-, s_+ \in [-\infty, +\infty]$ satisfying $s_- < s_+$ and a function $\mathcal{F} : (s_-, s_+) \to \mathbb{R}$, such that $P(x, q)$ satisfies, for real constants θ and φ > 0,

$$P^{(sing)}(x, q) \sim (1-q)^{\theta}\, \mathcal{F}\!\left( \frac{1 - x/x_c}{(1-q)^{\phi}} \right) \qquad (x, q) \to (x_c, 1) \text{ in } D(s_-, s_+). \tag{34}$$

Then, the function $\mathcal{F}(s)$ is called an (area) scaling function, and θ and φ are called critical exponents.

Remarks.
i) In analogy to the one-variable case, the above asymptotic equality means that there exists a power series $P^{(reg)}(x, q)$ convergent for $|x| < x_1$ and $|q| < q_1$, where $x_1 > x_c$ and $q_1 > 1$, such that the function $P^{(sing)}(x, q) := P(x, q) - P^{(reg)}(x, q)$ is asymptotically equal to the r.h.s.
ii) Due to the region $D(s_-, s_+)$ where the limit $(x, q) \to (x_c, 1)$ is taken, admissible values (x, q) satisfy 0 < q < 1 and $0 < x < x_0(q)$, where $x_0(q) = x_c\,(1 - s_-(1-q)^{\phi})$, if $s_- \neq -\infty$. Thus, in this case, the critical curve $x_c(q)$ satisfies $x_c(q) \ge x_0(q)$ as q approaches unity. Note that equality need not hold in general.
iii) The method of dominant balance was originally applied in order to obtain a defining equation for a scaling function $\mathcal{F}(s)$ from a given functional equation of a polygon model. This assumes the existence of a scaling function, together with additional analyticity properties. See [84, 91, 87].
iv) For particular examples, an analytic scaling function $\mathcal{F}(s)$ exists, with an asymptotic expansion about infinity, and the area amplitude series F(s) agrees with the asymptotic series, see below.
v) There is an alternative definition of a scaling function [31] by demanding

$$P^{(sing)}(x, q) \sim \frac{1}{(1 - x/x_c)^{-\theta/\phi}}\, \mathcal{H}\!\left( \frac{1-q}{(1 - x/x_c)^{1/\phi}} \right) \qquad (x, q) \to (x_c, 1) \tag{35}$$

in a suited domain, for a function $\mathcal{H}(t)$ of argument $t = (1-q)/(1 - x/x_c)^{1/\phi}$. Such a scaling form is also motivated by the above argument. One may then call such a function $\mathcal{H}(t)$ a perimeter scaling function. If $\mathcal{F}(s)$ is a scaling function, then a function $\mathcal{H}(t)$, satisfying Eq. (35) in a suited domain, is given by $\mathcal{H}(t) = t^{\theta} \mathcal{F}(t^{-\phi})$.
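The relation between the area and perimeter scaling forms can be checked by direct substitution. The following short computation is a sketch, assuming $x < x_c$ and $0 < q < 1$ so that all powers are well defined, and assuming that both scaling forms hold on a common domain:

```latex
\begin{align*}
s &= \frac{1 - x/x_c}{(1-q)^{\phi}}, \qquad
t = \frac{1-q}{(1 - x/x_c)^{1/\phi}} = s^{-1/\phi}
\quad\Longleftrightarrow\quad s = t^{-\phi}, \\[4pt]
(1-q)^{\theta}\, \mathcal{F}(s)
  &= \bigl( t\, (1 - x/x_c)^{1/\phi} \bigr)^{\theta}\, \mathcal{F}(t^{-\phi})
   = (1 - x/x_c)^{\theta/\phi}\; t^{\theta}\, \mathcal{F}(t^{-\phi}).
\end{align*}
```

Comparing the last expression with the prefactor $(1 - x/x_c)^{\theta/\phi}$ of the perimeter scaling form identifies $\mathcal{H}(t) = t^{\theta}\mathcal{F}(t^{-\phi})$.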
If $s_- \le 0$ and $s_+ = \infty$, the particular scaling form Eq. (34) implies a certain asymptotic behaviour of the critical area generating function and of the half-perimeter generating function. The following lemma is a consequence of Definition 4.

Lemma 3. Let the assumptions of Definition 4 be satisfied.
i) If $s_+ = \infty$ and if the scaling function $\mathcal{F}(s)$ has the asymptotic behaviour $\mathcal{F}(s) \sim f_0\, s^{-\gamma_0}$ $(s \to \infty)$, then $\gamma_0 = -\theta/\phi$, and the half-perimeter generating function $P(x, 1)$ satisfies

$$P^{(sing)}(x, 1) \sim f_0\, (1 - x/x_c)^{\theta/\phi} \qquad (x \nearrow x_c).$$

ii) If $s_- \le 0$ and if the scaling function $\mathcal{F}(s)$ has the asymptotic behaviour $\mathcal{F}(s) \sim h_0\, s^{\alpha_0}$ $(s \searrow 0)$, then $\alpha_0 = 0$, and the critical area generating function $P(x_c, q)$ satisfies

$$P^{(sing)}(x_c, q) \sim h_0\, (1-q)^{\theta} \qquad (q \nearrow 1).$$

A sufficient condition for equality of the area amplitude series and the scaling function is stated in the following lemma, which is an extension of Lemma 3.

Lemma 4. Let the assumptions of Definition 4 be satisfied.
i) Assume that the relation Eq. (34) remains valid under arbitrary differentiation w.r.t. q. If $s_+ = \infty$, if the scaling function $\mathcal{F}(s)$ has an asymptotic expansion

$$\mathcal{F}(s) \sim \sum_{k \ge 0} (-1)^k f_k\, s^{-\gamma_k} \qquad (s \to \infty),$$

and if an according asymptotic expansion is true for arbitrary derivatives, then the following statements hold.
a) The exponent $\gamma_k$ is, for $k \in \mathbb{N}_0$, given by $\gamma_k = (k - \theta)/\phi$.
b) The scaling function $\mathcal{F}(s)$ determines the asymptotic behaviour of the factorial area moment generating functions via

$$\left( \frac{1}{k!} \frac{\partial^k}{\partial q^k} P(x, q) \Big|_{q=1} \right)^{(sing)} \sim \frac{f_k}{(1 - x/x_c)^{\gamma_k}} \qquad (x \nearrow x_c).$$

ii) Assume that the relation Eq. (34) remains valid under arbitrary differentiation w.r.t. x. If $s_- \le 0$, and if the scaling function $\mathcal{F}(s)$ has an asymptotic expansion

$$\mathcal{F}(s) \sim \sum_{k \ge 0} (-1)^k h_k\, s^{\alpha_k} \qquad (s \searrow 0),$$

and if an according asymptotic expansion is true for arbitrary derivatives, then the following statements hold.
a) The exponent $\alpha_k$ is, for $k \in \mathbb{N}_0$, given by $\alpha_k = k$.
b) The scaling function determines the asymptotic behaviour of the factorial perimeter moment generating functions at $x = x_c$ via

$$\left( \frac{1}{k!} \frac{\partial^k}{\partial x^k} P(x, q) \Big|_{x=x_c} \right)^{(sing)} \sim \frac{x_c^{-k}\, h_k}{(1-q)^{\beta_k}} \qquad (q \nearrow 1),$$

where $\beta_k = k\phi - \theta$.
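The exponent relations in Lemmas 3 and 4 follow from matching powers of $(1-q)$ and $(1 - x/x_c)$. The computation below is a sketch of that bookkeeping on a formal level, assuming that the asymptotic expansions may be inserted term by term into the scaling form Eq. (34):

```latex
\begin{align*}
(1-q)^{\theta}\, f_k\, s^{-\gamma_k}
  &= f_k\, (1 - x/x_c)^{-\gamma_k}\, (1-q)^{\theta + \phi \gamma_k},
  && s = \frac{1 - x/x_c}{(1-q)^{\phi}}, \\
\theta + \phi \gamma_k &= k
  \;\Longrightarrow\; \gamma_k = \frac{k - \theta}{\phi},
  && \text{in particular } \gamma_0 = -\frac{\theta}{\phi}, \\[6pt]
(1-q)^{\theta}\, h_k\, s^{\alpha_k}
  &= h_k\, (1 - x/x_c)^{\alpha_k}\, (1-q)^{\theta - \phi \alpha_k}, \\
\alpha_k &= k
  \;\Longrightarrow\; (1-q)^{\theta - k\phi} = (1-q)^{-\beta_k},
  && \beta_k = k\phi - \theta .
\end{align*}
```

In the first case, the $k$-th term must reproduce the $k$-th Taylor term in $(1-q)$ about $q = 1$, which fixes $\gamma_k$. In the second case, it must reproduce the $k$-th Taylor term in $(1 - x/x_c)$ about $x = x_c$, which fixes $\alpha_k$ and yields $\beta_k$.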
Remarks. Lemma 4 states conditions under which the area amplitude series coincides with the scaling function. Given these conditions, the scaling function also determines the perimeter law of the polygon model at criticality. In the one-variable case, the singular behaviour of a generating function translates, under suitable assumptions, to the asymptotic behaviour of its coefficients. We sketch the analogous situation for the asymptotic behaviour of a generating function involving a scaling function. Definition 5 (Finite size scaling function). For numbers pm,n with generating function P (x, q) = m,n pm,nx mqn, let Assumption 1 be satisfied. Let 0 < xc ≤ 1 be the radius of convergence of the generating function P (x, 1). i) Assume that there exist a number t+ ∈ (0,∞] and a function f : [0, t+] → R, such that the perimeter coefficient function asymptotically satisfies, for real constants γ0 and φ > 0, [xm]P (x, q) ∼ x−mc mγ0−1f(m1/φ(1− q)) (q,m) → (1,∞), where the limit is taken for m a positive integer and for real q, such that m1/φ(1−q) ∈ [0, t+]. Then, the function f(t) is called a finite size (perimeter) scaling function. ii) Assume that there exist constants t− ∈ [−∞, 0), t+ ∈ (0,∞], and a function h : [t−, t+] → R, such that the area coefficient function asymptotically satisfies, for real constants β0 and φ > 0, [qn]P (x, q) ∼ nβ0−1h(nφ(1− x/xc)) (x, n) → (xc,∞), where the limit is taken for n a positive integer and real x, such that nφ(1− x/xc) ∈ [t−, t+]. Then, the function h(t) is called a finite size (area) scaling function. Remarks. i) The following heuristic calculation motivates the expectation that a finite size scaling function approximates the coefficient function. For the perimeter coefficient function, assume that the exponents γk of the factorial area moment generating functions are of the special form γk = (k − θ)/φ. We argue [xm]P (x, q) ≈ [xm] (−1)k fk (1− x/xc)γk (1− q)k ≈ x−mc mγ0−1 (−1)k Γ(γk) m1/φ(1− q) In the above expression, the r.h.s. 
depends on a function f(t) of a single variable of combined argument $t = m^{1/\phi}(1-q)$. For the area coefficient function, we assume that $\beta_k = k\phi - \theta$ and argue as above,

$$[q^n] P(x, q) \approx [q^n] \sum_{k \ge 0} (-1)^k \frac{h_k}{(1-q)^{\beta_k}} (1 - x/x_c)^k \approx n^{\beta_0 - 1} \sum_{k \ge 0} (-1)^k \frac{h_k}{\Gamma(\beta_k)} \bigl( n^{\phi} (1 - x/x_c) \bigr)^k.$$

In the above expression, the r.h.s. depends on a function h(t) of a single variable of combined argument $t = n^{\phi}(1 - x/x_c)$.
ii) The above argument suggests that a scaling function and a finite size scaling function may be related by a Laplace transformation. A comparison with Eq. (20) leads one to expect that finite size scaling functions are moment generating functions of the limit laws of area and perimeter.
iii) Sufficient conditions under which knowledge of a scaling function implies the existence of a finite size scaling function have been given for the finite size area scaling function [13] using Darboux’s theorem.

A scaling function describes the leading singular behaviour of the generating function $P(x, q)$ in some region about $(x, q) = (x_c, 1)$. A particular form of subsequent correction terms has been argued for at the beginning of the section.

Definition 6 (Correction-to-scaling functions). For numbers $p_{m,n}$ with generating function $P(x, q) = \sum_{m,n} p_{m,n}\, x^m q^n$, let Assumption 1 be satisfied. Let $0 < x_c \le 1$ be the radius of convergence of the generating function $P(x, 1)$. Assume that there exist constants $s_-, s_+ \in [-\infty, +\infty]$ satisfying $s_- < s_+$, and functions $\mathcal{F}_l : (s_-, s_+) \to \mathbb{R}$ for $l \in \mathbb{N}_0$, such that the generating function $P(x, q)$ satisfies, for real constants φ > 0 and $\theta_l$, where $\theta_{l+1} > \theta_l$,

$$P^{(sing)}(x, q) \sim \sum_{l \ge 0} (1-q)^{\theta_l}\, \mathcal{F}_l\!\left( \frac{1 - x/x_c}{(1-q)^{\phi}} \right) \qquad (x, q) \to (x_c, 1) \text{ in } D(s_-, s_+).$$

Then, the function $\mathcal{F}_0(s)$ is a scaling function, and for $l \ge 1$, the functions $\mathcal{F}_l(s)$ are called correction-to-scaling functions.

Remarks.
i) In the above context, the symbol ∼ denotes a (generalised) asymptotic expansion (see also [80, Ch 1]): Let (Gk(x))k∈N0 be a sequence of (multivariate) functions satisfying for all k the estimate Gk+1(x) = o(Gk(x)) as x → xc in some prescribed region. For a function G(x), we then write G(x) ∼ k=0Gk(x) as x → xc, if for all n we have G(x) = k=0 Gk(x) +O(Gn(x)) as x → xc. ii) The previous section yielded effective methods for obtaining area amplitude functions. These are candidates for correction-to-scaling functions, see also [87]. 4.2 Squares and rectangles We consider the models of squares and rectangles, whose scaling behaviour can be explicitly computed. Their half-perimeter and area generating function can be written as a single sum, to which the Euler-MacLaurin summation formula [80, Ch 8] can be applied. We first discuss squares. Fact 5 (cf [49, Thm 2.4]). For 0 < x, q < 1, the generating function P (x, q) = m=0 x of squares, counted by half-perimeter and area, is given by P (x, q) = | log q| | log x| | log q| +R(x, q), with F(s) = erfc(s), where the remainder term R(x, q) is bounded by |R(x, q)| ≤ 1 | log x|. Remarks. i) The remainder term differs from that in [49, Thm 2.4], where it was estimated by an integral with lower bound one instead of zero [49, Eq. (46)]. ii) With xc = 1, s− = 0 and s+ = ∞, the function F(s) is a scaling function according to the above definition. The remainder term is uniformly bounded in any rectangle [x0, 1)× [q0, 1) for 0 < x0, q0 < 1, and so the approximation is uniform in this rectangle. iii) The generating function P (x, q) satisfies the quadratic q-difference equation P (x, q) = 1+xq1/4P (q1/2x, q). Using the methods of the previous section, the area amplitude series of the model can be derived. It coincides with the above scaling function F(s). This particular form is expected, since the distribution of area is concentrated, p(x) = δ(x−1/4), compare also with Ferrers diagrams. 
iv) It has not been studied whether the scaling region can be extended to values x > 1 near (x, q) = (1, 1).

It can be checked that the scaling function $\mathcal{F}(s)$ also determines the asymptotic behaviour of the perimeter moment generating functions, via its expansion about the origin. As expected, they indicate a concentrated distribution.

The half-perimeter and area generating function of rectangles is given by

$$P(x, q) = \sum_{r, s \ge 1} x^{r+s} q^{rs} = \sum_{r \ge 1} \frac{x\, (qx)^r}{1 - q^r x}.$$

We have $P(x, 1) = x^2/(1-x)^2$, and it can be shown that $P(1, q) \sim -\frac{\log(1-q)}{1-q}$ as $q \nearrow 1$, see [85, 49]. The latter result implies that a scaling form as in Definition 4, with $s_- \le 0$, does not exist for rectangles. We have the following result.

Fact 6 ([49, Thm 3.4]). For 0 < q < 1 and 0 < qx < 1, the generating function P (x, q) of rectangles satisfies P (x, q) = | log q| | log q| | log x| − LerchPhi qx, 1, | log x| | log q| +R(x, q), with the Lerch Phi-function $\mathrm{LerchPhi}(z, a, v) = \sum_{n \ge 0} \frac{z^n}{(v+n)^a}$, where the remainder term R(x, q) is bounded by |R(x, q)| ≤ 1− qx | log x| (1− qx)2 | log q|

Remarks.
i) The theorem implies that, for every $q_0 \in (0, 1)$, the function $(1 - qx)^2 P(x, q)$ is uniformly approximated for points (x, q) satisfying $q_0 < q < 1$ and $0 < x < x_c(q)$, where $x_c(q) = 1/q$ is the critical curve.
ii) Rectangles cannot have a scaling function $\mathcal{F}(s)$ as in Definition 4 with $s_- \le 0$, since the area generating function diverges with a logarithmic singularity. This is reflected in the above approximation.
iii) It has not been studied whether the area moments or the perimeter moments at criticality can be extracted from the above approximation.
iv) The relation of the above approximation to the area amplitude series of rectangles of the previous section, F (s) = Ei(s2) es , is not understood. Interestingly, the expansion of F (s) about s = 0 resembles a logarithmic divergence. It is not clear whether its expansion at the origin is related to the asymptotic behaviour of the perimeter moment generating functions.
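The explicit sums for squares and rectangles are easy to check numerically. The sketch below is illustrative only: the function names and truncation orders are assumptions made for this example, and the Gaussian-integral expression is the leading Euler-MacLaurin term as computed here from $\sum_{m \ge 0} x^m q^{m^2/4}$, which may be normalised differently from the statement of Fact 5. It checks the q-difference equation $P(x,q) = 1 + x q^{1/4} P(q^{1/2}x, q)$ for squares, compares the sum with its integral approximation, and compares the double and single sums for rectangles.

```python
import math


def squares_gf(x, q, terms=2000):
    """Truncated sum P(x, q) = sum_{m>=0} x^m q^(m^2/4) for squares."""
    return sum(x**m * q ** (m * m / 4.0) for m in range(terms))


def squares_integral_approx(x, q):
    """Integral of exp(-a*m - b*m^2/4) over m >= 0, with a = |log x|, b = |log q|.

    Completing the square gives sqrt(pi/b) * exp(s^2) * erfc(s), s = a/sqrt(b).
    This is the leading Euler-MacLaurin term (an assumption of this sketch).
    """
    a, b = -math.log(x), -math.log(q)
    s = a / math.sqrt(b)
    return math.sqrt(math.pi / b) * math.exp(s * s) * math.erfc(s)


def rectangles_double_sum(x, q, terms=200):
    """Truncated double sum over side lengths r, s >= 1 of x^(r+s) q^(r*s)."""
    return sum(
        x ** (r + s) * q ** (r * s)
        for r in range(1, terms)
        for s in range(1, terms)
    )


def rectangles_single_sum(x, q, terms=200):
    """Truncated single sum over r >= 1 of x (qx)^r / (1 - q^r x)."""
    return sum(x * (q * x) ** r / (1.0 - q**r * x) for r in range(1, terms))


if __name__ == "__main__":
    x, q = 0.9, 0.95
    lhs = squares_gf(x, q)
    rhs = 1.0 + x * q**0.25 * squares_gf(math.sqrt(q) * x, q)
    print("squares, q-difference equation residual:", abs(lhs - rhs))

    x, q = 0.99, 0.99
    print("squares, sum minus integral:",
          squares_gf(x, q) - squares_integral_approx(x, q))

    x, q = 0.5, 0.9
    print("rectangles, double vs single sum:",
          abs(rectangles_double_sum(x, q) - rectangles_single_sum(x, q)))
    print("rectangles at q = 1:", rectangles_single_sum(0.5, 1.0),
          "vs", 0.5**2 / (1 - 0.5) ** 2)
```

Since the summand for squares is positive and decreasing in m, the difference between the sum and the integral lies between 0 and 1, which is consistent with a bounded remainder term near (x, q) = (1, 1).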
4.3 Ferrers diagrams

The singularity diagram of Ferrers diagrams is special, since the value $x_c(1) := \lim_{q \nearrow 1} x_c(q)$ does not coincide with the radius of convergence $x_c$ of the half-perimeter generating function $P(x, 1)$. (The function $q \mapsto x_c(q)$ is continuous on (0, 1], as may be inferred from the exact solution.) Thus, there are two special points in the singularity diagram, namely $(x, q) = (x_c, 1)$ and $(x, q) = (x_c(1), 1)$. Scaling behaviour about the latter point has apparently not been studied, see also [85].

About the former point $(x, q) = (x_c, 1)$, scaling behaviour is expected. The area amplitude series F(s) of Ferrers diagrams is given by the entire function F (s) =
A numerical analysis indicates that its Taylor coefficients about s = 0 coincide with the perimeter moment amplitudes at criticality, which characterise a concentrated distribution. There is no singularity of F(s) on the negative real axis at any finite value of s, in accordance with the fact that the critical line at q = 1 extends above $x = x_c$.

It is not known whether a scaling function exists for Ferrers diagrams, or whether it would coincide with the amplitude generating function, see also the recent discussion [50, Sec 2.3]. A rigorous study may be possible, by first rewriting the half-perimeter and area generating function as a contour integral. A further analysis then reveals a saddle point coalescing with the integration boundary at criticality. For such phenomena, uniform asymptotic expansions can be obtained by Bleistein’s method [80, Ch 9.9]. The approach proposed above is similar to that for the staircase model [83] in the following subsection.

4.4 Staircase polygons

For staircase polygons, counted by width, height, and area with associated variables x, y, q, the existence of an area scaling function has been proved. The derivation starts from an exact expression for the generating function, which has then been written as a complex contour integral.
About the point (x, q) = (xc, 1), this led to a saddle-point evaluation with the effect of two coalescing saddles.

Fact 7 (cf [83, Thm 5.3]). Consider 0 < x, y, q < 1 such that the generating function P(x, y, q) of staircase polygons, counted by width, height and area, is convergent. Set q = e^{−ε} for ε > 0. Then, as ε ↘ 0, we have
$$P(x,y,q) = \left( \frac{1-x-y}{2} + \alpha^{-1/2}\epsilon^{1/3}\, \frac{\mathrm{Ai}'(\alpha\epsilon^{-2/3})}{\mathrm{Ai}(\alpha\epsilon^{-2/3})} \sqrt{\left(\frac{1-x-y}{2}\right)^2 - xy}\, \right) (1+O(\epsilon))$$
uniformly in x, y, where α = α(x, y) satisfies the implicit equation
$$\frac{4}{3}\,\alpha^{3/2} = \log(x)\,\log\frac{z_m-\sqrt{d}}{z_m+\sqrt{d}} + 2\,\mathrm{Li}_2(z_m-\sqrt{d}) - 2\,\mathrm{Li}_2(z_m+\sqrt{d}),$$
where $z_m = (1+y-x)/2$ and $d = z_m^2 - y$, and $\mathrm{Li}_2(t) = -\int_0^t \frac{\log(1-u)}{u}\,du$ is the Euler dilogarithm.

Remarks.
i) The characterisation of α^{3/2} given in [83, Eq (4.21)] has been used.
ii) The above approximation defines an area scaling function. For x = y and xc = 1/4, we obtain the approximation [83, Eq (1.14)]
$$P(x,q) \sim \frac{1}{4} + 4^{-2/3}\epsilon^{1/3}\, \frac{\mathrm{Ai}'(4^{4/3}(1/4-x)\epsilon^{-2/3})}{\mathrm{Ai}(4^{4/3}(1/4-x)\epsilon^{-2/3})}$$
as (x, q) → (xc, 1) within the region of convergence of P(x, q). It follows by comparison that the area amplitude series coincides with the area scaling function.
iii) An area amplitude series for the anisotropic model has been given in [56], by a suitable refinement of the method of dominant balance.
iv) It is expected that the perimeter law at x = xc may be inferred from the Taylor expansion of the scaling function F(s) at s = 0. A closed form for the moment generating function or the probability density has not been given. The right tail of the distribution has been analysed via the asymptotic behaviour of the moments [57, 55]. See also the next subsection.
v) The above expression gives the singular behaviour of P(x, q) as q approaches unity, uniformly in x, y. Restricting to x = y, it describes the singular behaviour along the line q = 1 for 0 < x < xc. In the compact percolation picture, this line describes compact percolation below criticality. Perimeter limit laws away from criticality may be inferred along the above lines.
(Asymptotic expansions which are uniform in an additional parameter appear also for solutions of differential equations near singular points [80].)
vi) By analytic continuation, it follows that the critical curve xc(q) for P(x, x, q) coincides near q = 1 with the upper boundary curve $x_0(q) = (1 - s_-(1-q)^{2/3})/4$ of the scaling domain, where the value s− is determined by the singularity of smallest modulus of the scaling function on the negative real axis, hence by the first zero of the Airy function. This leads to a simple pole singularity in the generating function, which describes the branched polymer phase close to q = 1.

4.5 Self-avoiding polygons
In the previous section, a conjecture for the limit distribution of area for self-avoiding polygons and rooted self-avoiding polygons was stated. We further explain the underlying numerical analysis, following [91, 92, 93]. The numerically established form Eq. (32) implies for the area moment generating functions for k ≠ 1 singular behaviour of the form $g_k^{(sing)}(x) \sim \frac{f_k}{(1-x/x_c)^{\gamma_k}}$ (x ↗ xc), with critical point xc = 0.14368062927(2) and γk = 3k/2 − 3/2, where the numbers fk are related to the amplitudes Ak in Eq. (32) by Γ(γk) For k = 1, we have γk = 0, and a logarithmic singularity is expected, g1(x) ∼ f1 log(1 − x/xc), with f1 = A1. Similar to Conjecture 1, this leads to a corresponding conjecture for the area amplitude series of self-avoiding polygons. If the area amplitude series were a scaling function, we would expect that it also describes the limit law of perimeter at criticality x = xc, via its expansion about the origin. (Interestingly, these moments are related to the moments of the Airy distribution of negative order, see [93, 34].) This prediction was confirmed in [93], up to numerical accuracy, for the first ten perimeter moments. Also, the crossover behaviour to the branched polymer phase has been found to be consistent with the corresponding scaling function prediction.
As was argued in the previous subsection, the critical curve xc(q) close to unity should coincide with the upper boundary curve $x_0(q) = x_c(1 - s_-(1-q)^{2/3})$, where the point s− is related to the first zero of the Airy function on the negative real axis, s− = −0.2608637(5). The latter two observations support the following conjecture.

Conjecture 3 ([87, 93]). Let pm,n denote the number of self-avoiding polygons of half-perimeter m and area n, with generating function $P(x,q) = \sum_{m,n} p_{m,n} x^m q^n$. Let xc = 0.14368062927(2) be the radius of convergence of the half-perimeter generating function P(x, 1). Assume that $p_{m,n} \sim A_0 x_c^{-m} m^{-5/2}$ (m → ∞), where A0 is estimated by A0 = 0.09940174(4). Let the number s− be such that (4A0) 3 πs− coincides with the zero of the Airy function on the negative real axis of smallest modulus. We have s− = −0.2608637(5).
i) For rooted self-avoiding polygons with half-perimeter and area generating function $P^{(r)}(x,q) = x \frac{d}{dx} P(x,q)$, the conjectured form of a scaling function F^{(r)}(s) : (s−,∞) → R as in Definition 4 is F (r)(s) = xc logAi (4A0) with critical exponents θ = 1/3 and φ = 2/3.
ii) The conjectured form of a scaling function F(s) : (s−,∞) → R for self-avoiding polygons is obtained by integration, F(s) = − log Ai (4A0) (1− q) log(1− q), (36) with critical exponents θ = 1 and φ = 2/3.

Remarks.
i) The above conjecture is essentially based on the conjecture of the previous section that both staircase polygons and rooted self-avoiding polygons have, up to normalisation constants, the same limiting distribution of area in the uniform ensemble q = 1. For a numerical investigation of the implications of the scaling function conjecture, see the preceding discussion.
ii) A field-theoretical justification of the above conjecture has been proposed [16]. Also, the values of A1 = 1/(8π) and the prefactor 1/(12π) in Eq. (36) have been predicted using field-theoretic methods [15], see also the discussion in [93].
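The upper boundary curve x0(q) = xc(1 − s−(1 − q)^{2/3}) quoted before Conjecture 3 can be tabulated directly from the numerical estimates of xc and s−. The following is a minimal Python sketch; the constant and function names are illustrative and not from the source, and the quoted error bars on xc and s− are ignored.

```python
X_C = 0.14368062927    # critical point x_c as quoted in the text
S_MINUS = -0.2608637   # value of s_- as quoted in the text


def upper_boundary(q, x_c=X_C, s_minus=S_MINUS):
    """Evaluate x_0(q) = x_c * (1 - s_- * (1 - q)^(2/3)) for 0 < q <= 1."""
    return x_c * (1.0 - s_minus * (1.0 - q) ** (2.0 / 3.0))


if __name__ == "__main__":
    # Since s_- < 0, the curve lies above x_c for q < 1 and meets x_c at q = 1.
    for q in (1.0, 0.999, 0.99, 0.9):
        print(q, upper_boundary(q))
```

The formula is only expected to describe the critical curve close to q = 1, so the values for q well below 1 are shown purely to illustrate the trend.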
4.6 Models in higher dimensions Only very few models of vesicles have been studied in three dimensions. For the simple model of cubes, the scaling behaviour in the perimeter-area ensemble is the same as for squares [49, Thm 2.4]. The scaling form in the area-volume ensemble has been given [49, Thm 2.8]. The asymptotic behaviour of rectangular box vesicles has been studied to some extent [73]. Explicit expressions for scaling functions have not been derived. 4.7 Open questions The mathematical problem of this section concerns the local behaviour of multivariate generating functions about non-isolated singularities. If such behaviour is known, it may, under appropriate conditions, be used to infer asymptotic properties such as limit distri- butions. Along lines of the same singular behaviour in the singularity diagram, expressions uniform in the parameters are expected. This may lead to Gaussian limit laws [37]. Parts of the theory of such asymptotic expansions have been developed using methods of sev- eral complex variables [81]. The case of several coalescing lines of different singularities is more difficult. Non-Gaussian limit laws are expected, and this case is subject to recent mathematical research [81]. Our approach is motivated by certain models of statistical physics. It relies on the observation that the singular behaviour of their generating function is described by a scaling function. There are major open questions concerning scaling functions. On a conceptual level, the transfer problem [35] should be studied in more detail, i.e., conditions under which the existence of a scaling function implies the existence of the finite-size scaling function. Also, conditions have to be derived such that limit laws can be extracted from scaling functions. This is related to the question when can an asymptotic relation be differentiated. Real analytic methods, in conjunction with monotonicity properties of the generating function, might prove useful [80]. 
For particular examples, such as models satisfying a linear q-difference equation or di- rected convex polygons, scaling functions may be extracted explicitly. It would be interest- ing to prove scaling behaviour for classes of polygon models from their defining functional equation. Furthermore, the staircase polygon result indicates that some generating func- tions may have in fact asymptotic expansions for q ր 1, which are valid uniformly in the perimeter variable (i.e., not only in the limit x ր xc). Such expansions would yield scaling functions and correction-to-scaling functions, thereby extending the formal results of the previous section. This might be worked out for specific models, at least in the relevant example of staircase polygons. Acknowledgements The author would like to thank Tony Guttmann and Iwan Jensen for comments on the manuscript, and Nadine Eisner, Thomas Prellberg and Uwe Schwerdtfeger for helpful dis- cussions. Financial support by the German Research Council (Deutsche Forschungsge- meinschaft) within the CRC701 is gratefully acknowledged. References [1] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions with Formu- las, Graphs, and Mathematical Tables, volume 18. National Bureau of Standards Applied Mathematics Series, 1964. Reprint Dover 1973. [2] D.J. Aldous. The continuum random tree II: An overview. In M.T. Barlow and N.H. Bingham, editors, Stochastic Analysis, pages 23–70. Cambridge University Press, Cambridge, 1991. [3] G. Aleksandrowicz and G. Barequet. Counting d-dimensional polycubes and nonrect- angular planar polyominoes. In Proc. 12th Ann. Int. Computing and Combinatorics Conf. (COCOON), Taipei, Taiwan, volume 4112 of Springer Lecture Notes in Com- puter Science, pages 418–427. Springer, 2006. [4] D. Bennett-Wood, I.G. Enting, D.S. Gaunt, A.J. Guttmann, J.L. Leask, A.L. Owczarek, and S.G. Whittington. Exact enumeration study of free energies of inter- acting polygons and walks in two dimensions. J. Phys. 
A: Math. Gen, 31:4725–4741, 1998. [5] N. Bleistein and R.A. Handelsman. Asymptotic Expansions of Integrals. Holt, Rine- hart and Winston, New York, 1975. [6] M. Bousquet-Mélou. Une bijection entre les polyominos convexes dirigés et les mots de Dyck bilatéres. RAIRO Inform. Théor. Appl., 26:205–219, 1992. [7] M. Bousquet-Mélou. A method for the enumeration of various classes of column- convex polygons. Discrete Math., 154:1–25, 1996. [8] M. Bousquet-Mélou. Families of prudent self-avoiding walks. Preprint arXiv:0804.4843, 2008. [9] M. Bousquet-Mélou and J.-M. Fédou. The generating function of convex polyomi- noes: the resolution of a q-differential system. Discr. Math., 137:53–75, 1995. [10] M. Bousquet-Mélou and S. Janson. The density of the ISE and local limit laws for embedded trees. Ann. Appl. Probab., 16:1597–1632, 2006. [11] M. Bousquet-Mélou and A. Rechnitzer. The site-perimeter of bargraphs. Adv. in Appl. Math., 31:86–112, 2003. [12] R. Brak and J.W. Essam. Directed compact percolation near a wall. III. Exact results for the mean length and number of contacts. J. Phys. A: Math. Gen., 32:355–367, 1999. [13] R. Brak and A.L. Owczarek. On the analyticity properties of scaling functions in models of polymer collapse. J. Phys. A: Math. Gen., 28:4709–4725, 1995. [14] R. Brak, A.L. Owczarek, and T. Prellberg. A scaling theory of the collapse transi- tion in geometric cluster models of polymers and vesicles. J. Phys. A: Math. Gen., 26:4565–5479, 1993. [15] J. Cardy. Mean area of self-avoiding loops. Phys. Rev. Lett., 72:1580–1583, 1994. [16] J. Cardy. Exact scaling functions for self-avoiding loops and branched polymers. J. Phys. A: Math. Gen., 34:L665–L672, 2001. [17] K.L. Chung. A Course in Probability Theory. Academic Press, New York, 2nd edition, 1974. [18] G.M. Constantine and T.H. Savits. A multivariate Faa di Bruno formula with ap- plications. Trans. Amer. Math. Soc., 348:503–520, 1996. [19] D.A. Darling. On the supremum of a certain Gaussian process. Ann. 
Probab., 11:803– 806, 1983. [20] M.-P. Delest, D. Gouyou-Beauchamps, and B. Vauquelin. Enumeration of parallelo- gram polyominoes with given bond and site perimeter. Graphs Combin., 3:325–339, 1987. [21] M.-P. Delest and X.G. Viennot. Algebraic languages and polyominoes enumeration. Theor. Comput. Sci., 34:169–206, 1984. [22] J.C. Dethridge, T.M. Garoni, A.J. Guttmann, and I. Jensen. Prudent walks and polygons. Preprint arXiv:0810:3137, 2008. [23] G. Doetsch. Introduction to the Theory and Application of the Laplace Transform. Springer, New York, 1974. [24] M. Drmota. Systems of functional equations. Random Structures Algorithms, 10:103– 124, 1997. [25] P. Duchon. q-grammars and wall polyominoes. Ann. Comb., 3:311–321, 1999. [26] J.W. Essam. Directed compact percolation: Cluster size and hyperscaling. J. Phys. A: Math. Gen., 22:4927–4937, 1989. [27] J.W. Essam and A.J. Guttmann. Directed compact percolation near a wall. II. Cluster length and size. J. Phys. A: Math. Gen., 28:3591–3598, 1995. [28] J.W. Essam and D. Tanlakishani. Directed compact percolation. II. Nodal points, mass distribution, and scaling. In Disorder in physical systems, volume 67, pages 67–86. Oxford Univ. Press, New York, 1990. [29] J.W. Essam and D. Tanlakishani. Directed compact percolation near a wall. I. Biased growth. J. Phys. A: Math. Gen., 27:3743–3750, 1994. [30] J.A. Fill, P. Flajolet, and N. Kapur. Singularity analysis, Hadamard products, and tree recurrences. J. Comput. Appl. Math., 174:271–313, 2005. [31] M.E. Fisher, A.J. Guttmann, and S.G. Whittington. Two-dimensional lattice vesicles and polygons. J. Phys. A: Math. Gen., 24:3095–3106, 1991. [32] P. Flajolet. Singularity analysis and asymptotics of Bernoulli sums. Theoret. Comput. Sci., 215:371–381, 1999. [33] P. Flajolet, S. Gerhold, and B. Salvy. On the non-holonomic character of logarithms, powers, and the n-th prime function. Electronic Journal of Combinatorics, 11:A2:1– 16, 2005. [34] P. Flajolet and G. Louchard. 
Analytic variations on the Airy distribution. Algorith- mica, 31:361–377, 2001. [35] P. Flajolet and A. Odlyzko. Singularity analysis of generating functions. SIAM J. Discr. Math., 3:216–240, 1990. [36] P. Flajolet, P. Poblete, and A. Viola. On the analysis of linear probing hashing. Average-case analysis of algorithms. Algorithmica, 22:37–71, 1998. [37] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Book in preparation, 2008. [38] B. Gittenberger. On the contour of random trees. SIAM J. Discr. Math., 12:434–458, 1999. [39] I.P. Goulden and D.M. Jackson. Combinatorial enumeration. John Wiley & Sons, New York, 1983. [40] D. Gouyou-Beauchamps and P. Leroux. Enumeration of symmetry classes of convex polyominoes on the honeycomb lattice. Theoret. Comput. Sci., 346:307–334, 2005. [41] R.B. Griffiths. Proposal for notation at tricritical points. Phys. Rev. B, 7:545–551, 1973. [42] G. Grimmett. Percolation. Springer, Berlin, 1999. 2nd ed. [43] A.J. Guttmann. Asymptotic analysis of power-series expansions. In C. Domb and J.L. Lebowitz, editors, Phase Transitions and Critical Phenomena, volume 13, pages 1–234. Academic, New York, 1989. [44] A.J. Guttmann and I. Jensen. Fuchsian differential equation for the perimeter gen- erating function of three-choice polygons. Séminaire Lotharingien de Combinatoire, 54:B54c, 2006. [45] A.J. Guttmann and I. Jensen. The perimeter generating function of punctured stair- case polygons. J. Phys. A: Math. Gen., 39:3871–3882, 2006. [46] G.H. Hardy. Divergent Series. Clarendon Press, Oxford, 1949. [47] E.J. Janse van Rensburg. The Statistical Mechanics of Interacting Walks, Polygons, Animals and Vesicles, volume 18 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, Oxford, 2000. [48] E.J. Janse van Rensburg. Statistical mechanics of directed models of polymers in the square lattice. J. Phys. A: Math. Gen., 36:R11–R61, 2003. [49] E.J. Janse van Rensburg. 
Inflating square and rectangular lattice vesicles. J. Phys. A: Math. Gen., 37:3903–3932, 2004. [50] E.J. Janse van Rensburg and J. Ma. Plane partition vesicles. J. Phys. A: Math. Gen., 39:11171–11192, 2006. [51] S. Janson. The Wiener index of simply generated random trees. Random Structures Algorithms, 22:337–358, 2003. [52] S. Janson. Brownian excursion area, Wright’s constants in graph enumeration, and other Brownian areas. Probab. Surv., 4:80–145, 2007. [53] I. Jensen. Perimeter generating function for the mean-squared radius of gyration of convex polygons. J. Phys. A: Math. Gen., 38:L769–775, 2005. [54] B.McK. Johnson and T. Killeen. An explicit formula for the c.d.f. of the l1 norm of the Brownian bridge. Ann. Prob., 11:807–808, 1983. [55] J.M. Kearney. On a random area variable arising in discrete-time queues and compact directed percolation. J. Phys. A: Math. Gen., 37:8421–8431, 2004. [56] M.J. Kearney. Staircase polygons, scaling functions and asymmetric compact perco- lation. J. Phys. A: Math. Gen., 35:L731–L735, 2002. [57] M.J. Kearney. On the finite-size scaling of clusters in compact directed percolation. J. Phys. A: Math. Gen., 36:6629–6633, 2003. [58] M.J. Kearney, S.N. Majumdar, and R.J. Martin. The first-passage area for drifted Brownian motion and the moments of the Airy distribution. J. Phys. A: Math. Theor., 40:F863–F869, 2007. [59] J.-M. Labarbe and J.-F. Marckert. Asymptotics of Bernoulli random walks, bridges, excursions and meanders with a given number of peaks. Electronic J. Probab., 12:229– 261, 2007. [60] G.F. Lawler, O. Schramm, and W. Werner. On the scaling limit of planar self- avoiding walk. In Fractal Geometry and Applications: A Jubilee of Benôıt Man- delbrot, Part 2, volume 72 of Proceedings of Symposia in Pure Mathematics, pages 339–364. Amer. Math. Soc., Providence, RI, 2004. [61] I.D. Lawrie and S. Sarbach. Theory of tricritical points. In C. Domb and J.L. 
Lebowitz, editors, Phase Transitions and Critical Phenomena, volume 9, pages 1– 161. Academic Press, London, 1984. [62] P. Leroux and É. Rassart. Enumeration of symmetry classes of parallelogram poly- ominoes. Ann. Sci. Math. Québec, 25:71–90, 2001. [63] P. Leroux, É. Rassart, and A. Robitaille. Enumeration of symmetry classes of convex polyominoes in the square lattice. Adv. in Appl. Math, 21:343–380, 1998. [64] K.Y. Lin. Rigorous derivation of the perimeter generating functions for the mean- squared radius of gyration of rectangular, Ferrers and pyramid polygons. J. Phys. A: Math. Gen., 39:8741–8745, 2006. [65] M. Lladser. Asymptotic enumeration via singularity analysis. PhD thesis, Ohio State University, 2003. Doctoral dissertation. [66] G. Louchard. Kac’s formula, Lévy’s local time and Brownian excursion. J. Appl. Probab., 21:479–499, 1984. [67] G. Louchard. The Brownian excursion area: A numerical analysis. Com- put. Math. Appl., 10:413–417, 1985. [68] G. Louchard. Erratum: ”The Brownian excursion area: A numerical analysis”. Comput. Math. Appl., 12:375, 1986. [69] G. Louchard. Probabilistic analysis of some (un)directed animals. Theoret. Comput. Sci., 159:65–79, 1996. [70] G. Louchard. Probabilistic analysis of column-convex and directed diagonally-convex animals. Random Structures Algorithms, 11:151–178, 1997. [71] G. Louchard. Probabilistic analysis of column-convex and directed diagonally-convex animals. II. Trajectories and shapes. Random Structures Algorithms, 15:1–23, 1999. [72] A. Del Lungo, M. Mirolli, R. Pinzani, and S. Rinaldi. A bijection for directed-convex polyominoes. Discr. Math. Theo. Comput. Sci., AA (DM-CCG):133–144, 2001. [73] J. Ma and E.J. Janse van Rensburg. Rectangular vesicles in three dimensions. J. Phys. A: Math. Gen., 38:4115–4147, 2005. [74] N. Madras and G. Slade. The Self-Avoiding Walk. Birkhäuser Boston, Boston, MA, 1993. [75] S.N. Majumdar. Brownian functionals in physics and computer science. Current Sci., 89:2076–2092, 2005. 
[76] S.N. Majumdar and A. Comtet. Airy distribution function: From the area under a Brownian excursion to the maximal height of fluctuating interfaces. J. Stat. Phys., 119:777–826, 2005. [77] M. Nguy˜̂en Th´̂e. Area of Brownian motion with generatingfunctionology. In C. Banderier and C. Krattenthaler, editors, Discrete Random Walks, DRW’03, Discrete Mathematics and Theoretical Computer Science Proceedings, AC, pages 229–242. Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2003. [78] M. Nguy˜̂en Th´̂e. Area and inertial moment of Dyck paths. Combin. Probab. Comput., 13:697–716, 2004. [79] A.M. Odlyzko. Asymptotic enumeration methods. In R.L. Graham, M. Grötschel, and L. Lovász, editors, Handbook of Combinatorics, volume 2, pages 1063–1229. Elsevier, Amsterdam, 1995. [80] F.W.J. Olver. Asymptotics and Special Functions. Academic Press, New York, 1974. [81] R. Pemantle and M. Wilson. Twenty combinatorial examples of asymptotics derived from multivariate generating functions. SIAM Rev., 50:199–272, 2008. [82] J. Pitman. Combinatorial Stochastic Processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2006. [83] T. Prellberg. Uniform q-series asymptotics for staircase polygons. J. Phys. A: Math. Gen., 28:1289–1304, 1995. [84] T. Prellberg and R. Brak. Critical exponents from nonlinear functional equations for partially directed cluster models. J. Stat. Phys., 78:701–730, 1995. [85] T. Prellberg and A.L. Owczarek. Stacking models of vesicles and compact clusters. J. Stat. Phys., 80:755–779, 1995. [86] A. Rechnitzer. Haruspicy 2: The anisotropic generating function of self-avoiding polygons is not D-finite. J. Combin. Theory Ser. A, 113:520–546, 2006. [87] C. Richard. Scaling behaviour of two-dimensional polygon models. J. Stat. Phys., 108:459–493, 2002. [88] C. Richard. Staircase polygons: Moments of diagonal lengths and column heights. J. Phys.: Conf. Ser., 42:239–257, 2006. [89] C. Richard. On q-functional equations and excursion moments. 
Discr. Math., in press, 2008. math.CO/0503198. [90] C. Richard and A.J. Guttmann. q-linear approximants: Scaling functions for polygon models. J. Phys. A: Math. Gen., 34:4783–4796, 2001. [91] C. Richard, A.J. Guttmann, and I. Jensen. Scaling function and universal amplitude combinations for self-avoiding polygons. J. Phys. A: Math. Gen., 34:L495–L501, 2001. [92] C. Richard, I. Jensen, and A.J. Guttmann. Scaling function for self-avoiding poly- gons. In D. Iagolnitzer, V. Rivasseau, and J. Zinn-Justin, editors, Proceedings of the International Congress on Theoretical Physics TH2002 (Paris), Supplement, pages 267–277. Birkhäuser, Basel, 2003. [93] C. Richard, I. Jensen, and A.J. Guttmann. Scaling function for self-avoiding polygons revisited. J. Stat. Mech.: Th. Exp., page P08007, 2004. [94] C. Richard, I. Jensen, and A.J. Guttmann. Area distribution and scaling function for punctured polygons. Electronic Journal of Combinatorics, 15:#R53, 2008. [95] C. Richard, U. Schwerdtfeger, and B. Thatte. Area limit laws for symmetry classes of staircase polygons. Preprint arXiv:0710:4041, 2007. [96] U. Schwerdtfeger. Exact solution of two classes of prudent polygons. Preprint arXiv:0809:5232, 2008. [97] U. Schwerdtfeger. Volume laws for boxed plane partitions and area laws for Ferrers diagrams. In Fifth Colloquium on Mathematics and Computer Science, Discrete Mathematics and Theoretical Computer Science Proceedings, AG, pages 535–544. Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2008. [98] R.P. Stanley. Enumerative Combinatorics, volume 2. Cambridge University Press, Cambridge, Cambridge. [99] L. Takács. On a probability problem connected with railway traffic. J. Appl. Math. Stochastic Anal., 4:1–27, 1991. [100] L. Takács. Limit distributions for the Bernoulli meander. J. Appl. Prob., 32:375–395, 1995. [101] R. van der Hofstad and W. König. A survey of one-dimensional random polymers. J. Statist. Phys., 103:915–944, 2001. [102] C. Vanderzande. 
Lattice Models of Polymers, volume 11 of Cambridge Lecture Notes in Physics. Cambridge University Press, Cambridge, 1998. [103] L. Di Vizio, J.-P. Ramis, J. Sauloy, and C. Zhang. Équations aux q-différences. Gaz. Math., 96:20–49, 2003. [104] S.G. Whittington. Statistical mechanics of three-dimensional vesicles. J. Math. Chem., 14:103–110, 1993. Introduction Polygon models and generating functions Limit distributions An illustrative example: Rectangles Limit law of area Limit law via generating functions Dominant balance A general method Further examples Ferrers diagrams Staircase polygons q-difference equations Algebraic q-difference equations q-functional equations and other extensions A stochastic connection Directed convex polygons Limit laws away from (xc,1) Self-avoiding polygons Punctured polygons Models in three dimensions Summary Scaling functions Scaling and finite size scaling Squares and rectangles Ferrers diagrams Staircase polygons Self-avoiding polygons Models in higher dimensions Open questions ABSTRACT We discuss the asymptotic behaviour of models of lattice polygons, mainly on the square lattice. In particular, we focus on limiting area laws in the uniform perimeter ensemble where, for fixed perimeter, each polygon of a given area occurs with the same probability. We relate limit distributions to the scaling behaviour of the associated perimeter and area generating functions, thereby providing a geometric interpretation of scaling functions. To a major extent, this article is a pedagogic review of known results. <|endoftext|><|startoftext|> Incommmensurability and unconventional superconductor to insulator transition in the Hubbard model with bond-charge interaction A. A. Aligia,1 A. Anfossi,2, 3 L. Arrachea,4, 3 C. Degli Esposti Boschi,5 A. O. Dobry,6 C. Gazza,6 A. Montorsi,2 F. Ortolani,7 and M. E. Torio6 1Comisión Nacional de Enerǵıa Atómica, Centro Atómico Bariloche and Instituto Balseiro, 8400 S.C. 
de Bariloche, Argentina 2Dipartimento di Fisica del Politecnico and CNISM, corso Duca degli Abruzzi 24, I-10129, Torino, Italy 3BIFI, Universidad de Zaragoza, Corona de Aragón 42, 5009 Zaragoza, Spain 4Departamento de F́ısica de la Materia Condensada, Universidad de Zaragoza, 5009 Zaragoza 5Unità CNISM and Dipartimento di Fisica dell’Università di Bologna, viale Berti-Pichat 6/2, I-40127, Bologna, Italy 6Instituto de F́ısica Rosario, CONICET-UNR, Bv. 27 de Febrero 210 bis, 2000 Rosario, Argentina. 7Dipartimento di Fisica dell’Università di Bologna and INFN, viale Berti-Pichat 6/2, I-40127, Bologna, Italy (Dated: November 1, 2018) We determine the quantum phase diagram of the one-dimensional Hubbard model with bond- charge interaction X in addition to the usual Coulomb repulsion U > 0 at half-filling. For large enoughX < t the model shows three phases. For large U the system is in the spin-density wave phase as in the usual Hubbard model. As U decreases, there is first a spin transition to a spontaneously dimerized bond-ordered wave phase and then a charge transition to a novel phase in which the dominant correlations at large distances correspond to an incommensurate singlet superconductor. PACS numbers: 71.10.Fd,71.10.Hf,71.10.Pm,71.30.+h The Hubbard model has been originally proposed to describe the effect of the Coulomb interaction in tran- sition metals, which usually contain localized orbitals. Other real compounds containing more extended orbitals cannot in general be properly described by this simple Hamiltonian. Well-known examples are several quasi- one-dimensional (1D) materials that have been recently investigated [1], which exhibit a variety of phases that cannot be explained with the usual Hubbard model. Ad- ditional interactions should be included. A natural in- teraction that arises in systems with extended orbitals is the bond-charge interaction X [2]. 
In fact, it is natural to assume that the charge in the bond affects screening and the effective potential acting on valence electrons, and therefore the extension of the Wannier orbitals and the hopping between them should vary with the charge. This leads to the U−X Hamiltonian:
$$H = -t \sum_{\sigma=\uparrow,\downarrow,\langle ij\rangle} (c^\dagger_{i\sigma} c_{j\sigma} + \mathrm{H.c.}) + U \sum_i n_{i\uparrow} n_{i\downarrow} + X \sum_{\sigma,\langle ij\rangle} (c^\dagger_{i\sigma} c_{j\sigma} + \mathrm{H.c.})(n_{i-\sigma} + n_{j-\sigma}). \qquad (1)$$
This model has been studied in two dimensions, motivated by a theory of hole superconductivity [3]. A modified version of it has been derived as an effective model for the cuprates and shows enhanced d-wave superconducting correlations [4]. Recently, this model has attracted the attention of broader audiences, and its relevance has been discussed in the context of mesoscopic transport [5] and quantum information [6, 7]. In 1D, there are bosonization [8, 9] and numerical [9] results available. However, at half-filling, the effect of X disappears in the standard bosonization treatment and a behavior different from the usual Hubbard model was not expected in these studies. For X = t, an exact solution is available [10]. In this case the ground state is highly degenerate: the transition to a metallic state takes place at Uc = 4t > 0, but the response of the system to an applied magnetic flux indicates that it is not superconducting [11]. In view of the previous studies, the recent evidence of an insulator-metal transition driven by X < t at finite Uc > 0 at half-filling comes as a surprise [12]. The nature of the metallic phase and the character of the transition have not been fully elucidated, though the possibility of superconductivity has been suggested.
In this Letter we employ several analytical and numerical techniques to calculate accurately the phase diagram of the model at half-filling in 1D and to determine the nature of each phase. We establish that the insulator-metal transition is of commensurate-incommensurate (CIC) type to a phase with dominating singlet superconducting (SS) correlations.
Remarkably, unlike other CIC transitions [13, 14], it is not driven by one-body effects like chemical potential or the emergence of more than two Fermi points in the noninteracting dispersion relation, but by strong correlations induced by large enough X. In addition, we unveil that inside the insulating phase there is a spin transition separating the expected spin-density wave (SDW) for U > Us from a spontaneously dimerized bond-ordered wave (BOW) phase for Uc < U < Us. This transition is of Kosterlitz-Thouless (KT) type and a spin gap opens in the BOW phase.
http://arxiv.org/abs/0704.0717v2
The nature of each phase and the qualitative aspects of the phase diagram can be understood by a weak coupling bosonization analysis provided it includes vertex corrections of second order in X to the coupling constants and one term of order a^2 in the bosonization of the bond-charge interaction as described below, where a is the lattice constant. A bosonized version of (1) is given by the following Hamiltonian density: H = H0σ +H0ρ + (2πα)2 8φσ)− (2πα)2 (2πα)2 8φσ)∂xφρ, (2) where H0σ and H0ρ are the usual known quadratic forms and α is a short range cutoff in the bosonization procedure. The first line of (2) has the structure of the previously studied bosonized theory [8], which corresponds to two decoupled sine-Gordon field theories, one for the spin (φσ) and the other for the charge (φρ). In order to take into account the effect of the bond-charge interaction on the phase diagram of the system, we included vertex corrections of second order in X in the definition of the coupling constants gi, due to virtual processes involving states far from the Fermi energy [15]. In addition, we took into account the usually neglected gσρ term that couples spin and charge degrees of freedom. The latter is ∝ a^2. It arises from including spatial derivatives of the fermionic fields in the representation of (1) in terms of a low energy field theory.
All of these terms have naive scaling dimension 3 and are usually neglected. However, one term, which bosonizes as the second line of (2), becomes relevant for large enough X and provides a mechanism for an incommensurate transition, as discussed below. Explicitly, the effective parameters read $g_{1\perp} = g_{2\perp} = \left(U - \frac{8X^2}{\pi(t-X)}\right)a$ and $g_{\sigma\rho} = 2a^2X$. The forward and umklapp processes are the same as in the Hubbard model, g3⊥ = g4⊥ = Ua. The Luttinger liquid parameters (Kρ and Kσ) and the charge and spin wave velocity (uρ and uσ) in terms of gi are given by known expressions [16].
Neglecting the gσρ term, the renormalization-group (RG) flow diagrams are of KT type. A spin gap opens when g1⊥ < 0, i.e., when the flow of RG, which takes place on the separatrix of the KT diagram due to spin SU(2) symmetry, goes to strong coupling. Therefore, the spin gapped phase appears when $U < U_s = \frac{8X^2}{\pi(t-X)}$. As for the behavior of the charge modes, a gap opens when the g3⊥ term becomes relevant. The charge gapped phase takes place for U > Uc, with Uc < Us.
The gσρ term becomes relevant for Kσ < 1/2 (X > 0.6t for U = 0). In the spin gapped phase the cos(√8 φσ) is frozen at its mean value. This term could be interpreted as a chemical potential [µ ∝ gσρ ⟨cos(√8 φσ)⟩/(2πα)^2] times a charge density operator. The effects of such a term are known [16]. If we start the analysis from a situation where there is also a charge gap (∆c) smaller than the spin one (∆s), and we then increase the value of X, the effect of this term is to close ∆c, leading to a metallic phase when µ > ∆c. The effective Fermi level is shifted with respect to the original one and the system develops incommensurate correlations. A numerical analysis discussed below shows that the system has dominant SS correlations. Thus, this phase can be characterized as incommensurate singlet superconducting (ICSS).

Figure 1: (Color online). Phase diagram. Left: Bosonization (top), and real space renormalization-group (bottom) predictions. Right: Numerical results as obtained by DMRG (circles-squares) and topological phase (crosses) methods. Legend entries: Bosonization S, Bosonization C; Exact, Top. Ph S, Top. Ph C, DMRG S, DMRG C.

For a qualitative localization of the boundary transition line between the insulator and the ICSS phase, we have implemented a procedure as follows: (i) We start from a parameter regime where the spin gap is open. (ii) We follow the RG flow up to a length scale where |g1⊥/(πUs)| ∼ 1. (iii) At this point the gσρ term is decoupled by a mean field approach similar to that used by Nersesyan et al. to show incommensurability in the anisotropic zigzag chain [17]. The value of ⟨cos(√8 φσ)⟩ is exactly obtained at the LE point (Kσ = 1/2). (iv) For vanishing gσρ, ∆c is obtained by rescaling the problem to the LE point of the charge sector, by using the RG equations of the sine-Gordon theory. (v) The CIC transition takes place when the effective chemical potential µ ∝ gσρ ⟨cos(√8 φσ)⟩/(2πα)^2 equals ∆c [16]. In the top left panel of Fig. 1 we show the phase diagram of the model predicted by this approach. For each value of X, there are two transition points Uc and Us corresponding to the charge and spin transition, respectively. Each phase is characterized by the gapped modes and the relevant order parameter. For U > Uc the system is an insulator. For U > Us, the slowest decaying correlation functions are the spin-spin ones. The system is in a SDW phase. For Uc < U < Us a fully gapped (spin and charge) phase is developed. The fields φσ and φρ are located at the minimum of the potential, and the translation symmetry is spontaneously broken. The BOW parameter, defined below, acquires a nonzero value. For U < Uc the charge gap closes and the dominant correlations at large distances are the SS ones.
While the nature of each phase has been identified, the phase boundaries predicted by bosonization are not quantitatively valid, particularly for large values of the interactions. In the right panel of Fig. 1 we show the phase diagram of the model, as obtained by accurate numerical techniques. One of them, used to determine the charge transition line, consists in studying singularities of single-site entanglement [12] by means of the density-matrix renormalization group (DMRG) [18]. Another method is based on topological numbers, or jumps of Berry phases [19], and was successfully applied to a similar model [8](b). The value of Uc (Us) is determined in this case by the jump of the charge (spin) Berry phase. The corresponding values of Uc and Us in systems up to L = 14 sites, extrapolated to the thermodynamic limit using a parabola in $1/L^2$, are also shown in Fig. 1.

DMRG evaluations of ∆c and ∆s confirm these predictions. The charge gap was calculated in [12] from the definition 2∆c = E0(N + 2) + E0(N − 2) − 2E0(N), E0(N) being the ground-state energy of the chain with N particles. Similarly, the spin gap is determined here through ∆s = E0(Sz = 1) − E0(Sz = 0), E0(Sz) being the ground-state energy of the half-filled system within the subspace with a given total Sz. We can see in Fig. 1 that the closings of ∆c and ∆s do not take place simultaneously for small U and X. The critical lines for the closing of both gaps, obtained by extrapolation to the thermodynamic limit, are in reasonable quantitative agreement with the ones determined by the method of the topological phases.

We have verified that the spin transition is of KT type, calculating the scaling dimensions of the singlet and triplet operators as described in [19]. In order to identify the universality class of the charge transition, we employed the finite-size crossing method [20].
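The gap definitions and the extrapolation step above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the energies in the usage note are made-up numbers rather than DMRG data, and "a parabola in $1/L^2$" is read here as the unique quadratic in x = 1/L^2 through three system sizes, evaluated at x = 0. The paper may have used a different fitting procedure.

```python
def charge_gap(e_n_plus_2, e_n_minus_2, e_n):
    """Charge gap from 2*Delta_c = E0(N+2) + E0(N-2) - 2*E0(N)."""
    return 0.5 * (e_n_plus_2 + e_n_minus_2 - 2.0 * e_n)


def spin_gap(e_sz1, e_sz0):
    """Spin gap Delta_s = E0(Sz=1) - E0(Sz=0)."""
    return e_sz1 - e_sz0


def extrapolate_parabola(sizes, values):
    """Evaluate at 1/L^2 = 0 the quadratic in x = 1/L^2 through three points.

    Assumption: exactly three (L, value) pairs are used; the value at x = 0
    is obtained with the Lagrange interpolation formula.
    """
    if len(sizes) != 3 or len(values) != 3:
        raise ValueError("this sketch expects exactly three system sizes")
    xs = [1.0 / (size * size) for size in sizes]
    result = 0.0
    for i in range(3):
        weight = 1.0
        for j in range(3):
            if j != i:
                # Lagrange basis polynomial evaluated at x = 0
                weight *= (0.0 - xs[j]) / (xs[i] - xs[j])
        result += values[i] * weight
    return result
```

For instance, with hypothetical energies, `charge_gap(-9.0, -9.5, -10.0)` evaluates to 0.75, and `extrapolate_parabola` recovers the constant term of any finite-size sequence that is exactly quadratic in 1/L^2.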
The study of the dependence of ⟨ni↑ni↓⟩ = ∂eL/∂U on the size L (eL being the ground-state energy density) provides a location of the critical points in agreement with the methods discussed above. In addition, the divergence that $\partial^2 e_L/\partial U^2$ develops with increasing L indicates that the gap exponent ν remains close to 1/2 (the value that can be computed exactly at the point X = t) for X/t = 0.6, 0.7, 0.8, with a possible increase for X/t → 0.5; below this point, our numerical analysis suggests that the charge transition becomes of KT type, with “ν = ∞”. The estimate ν = 1/2 relies upon the assumption that the dynamic exponent ζ (through which gap and correlation length ξ are related, $\Delta_c \propto \xi^{-\zeta}$) is still ζ = 2, as in the exactly solvable case X = t [7]. As already noted in [12], the behavior $\Delta_c \propto L^{-2}$ along the transition line is consistent with this exponent. We stress that such a feature is in agreement with the CIC character of the metal-insulator transition [16]. Instead, within the metallic phase, the finite-size scaling suggests $\Delta_c \propto L^{-1}$, although the data are rather noisy due to incommensurability.

In Fig. 2 we show numerical results supporting the incommensurate character of the metallic phase. We report the density distributions in real space for the local charge density ni = ni↑ + ni↓ in the ground state in an open chain with L = 64 sites. The incommensurate character of the metallic phase manifests itself also in the behavior of the charge and spin correlation functions, whose corresponding structure factors show peaks away from the commensurate reciprocal vector q = π (not shown). The left panel of Fig. 2 corresponds to X = 0.8t as U is varied.

Figure 2: (Color online). Charge distribution ⟨ni⟩ evaluated by DMRG. Left: X = 0.8t, with curves for U = 4.0t, 3.05t, 2.5t, and 0.5t. Right: U = 1.5t, with curves for X = 0.3t, 0.58t, and 0.8t.
The behavior is similar to the one observed within the incommensurate phase of the Hubbard model includ- ing next-nearest-neighbor hopping (t−t′−U model) [13]. For U > Uc = 3.05t, the commensurate charge distribu- tion characterizing the insulating phase is reached within a few lattice sites from the edge. The insulator-metal transition shows up via the appearance of incommensu- rate modulations in the charge distribution, whose wave- length increases within the metallic phase. The right part of the figure shows the results obtained by varying X at U = 1.5t. Interestingly, a first modulation appears already for Xs < X < Xc ( Xs ≈ 0.5t, and Xc ≈ 0.6t). Again, for X > Xc further incommensurate modulations appear in the LE phase. Within the charge sector U < Uc, the dominating cor- relations at large distance are superconducting pair-pair ones if the correlation exponent Kρ > 1 or charge-charge ones otherwise. We calculatedKρ employing the method- ology described in [9]. This study casts extrapolated val- ues Kρ ∼ 1.3 for U = 0 and X = 0.8t. To provide stronger evidence for the SS character of the incommen- surate phase, we have calculated on-site pairing correla- tions 〈P †i Pj〉 with Pi = c i↓ and charge-charge correla- tions |〈ninj〉 − 〈ni〉〈nj〉| in an open chain with 100 sites and using the sites 30 to 70 to avoid boundary effects. The results are displayed in Fig. 3. A fitting of the pair- ing correlations at distances between 8 and 40 sites gives Kρ = 1.32± 0.01. This value is also consistent with the long distance behavior of the charge-charge correlations. The inset also shows the tendency of the system to show the anomalous flux quantization characteristic of super- 0.0 0.5 1.0 1.5 0.0 0.5 1.0 -5.060 -5.055 -5.050 Φ / Φ pair-pair charge-charge |i-j| Figure 3: Pair-pair and charge-charge correlation functions for U = t and X = 0.8t. Full (dashed) line corresponds to a power law with exponent 1/Kρ (Kρ). 
The inset shows the ground state energy as a function of an applied magnetic flux. conductivity [11], which is more pronounced as the size of the system increases. An additional argument suggesting superconducting correlations within this phase is provided by the real space renormalization-group method, used before for the standard Hubbard model [21]. Different from that case, the recursive equations for the renormalized parameters in the positive U regime, depending on X and U , exhibit three different fixed points for the nth step renormalized Coulomb interaction U (n) in the large n limit: U (n) > 0 for U > Urc, U (n) = 0 for U = Urc, and U (n) < 0 for U < Urc. In the latter case, the effective Coulomb inter- action becomes attractive. In the bottom left insert of Fig. 1 Urc obtained in this way is reported. To support the bosonization predictions, which char- acterize the intermediate phase as a BOW, we have eval- uated with DMRG the BOW order parameter OBOW = (−1)i〈c†i+1σciσ +H.c.〉]/(L− 1) in chains with open boundary conditions, following the same procedure as Manmana et al. for the ionic Hubbard model [22] in chains up to 400 sites. In spite of the large systems used, finite-size effects are still important and do not allow an accurate extrapolation. In any case, the qualitative be- havior of our results (not shown) is similar to that found by Manmana et al. showing a clear maximum inside the BOW phase, an abrupt fall for U ∼ Uc as the system en- ters the SS phase and a slower decay for larger U ∼ Us, which for finite systems extends inside the SDW phase. To conclude, we have presented compelling evidence, based on bosonization as well as on other analytical and numerical techniques, of the existence of a narrow bond- ordered wave phase and a transition to an unconventional incommensurate metallic one with dominant singlet su- perconducting correlations in the phase diagram of the U − X model. 
The appearance of superconductivity in a model with repulsive on-site interactions at half filling, and of incommensurate correlations induced by interaction, are both unusual features. Their emergence can be understood from the structure of the exactly solvable case X = t. There the number Nd of doubly occupied sites (doublons) becomes a conserved quantity; holes and doublons play an identical role regarding the kinetic energy ε(kF), which can be mapped into that of a spinless fermion system, with Fermi momentum kF. The competition of ε(kF) and UNd fixes the Fermi level of the resulting effective model. The presence of doublons in the ground state (U < 4t) simultaneously drives the spinless fermions away from half-filling (kF ≠ π), and switches on the doublons' role in the kinetic energy. The latter ceases to be identical to that of holes as soon as X ≠ t, generating incommensurability within the system. Moreover, superconducting correlations can dominate away from half-filling [8]. Thus, a nonvanishing number of doublons provides the scenario for both incommensurability and superconductivity for X ≲ t.

We thank D. Cabra for useful discussions. We acknowledge support from PICT’s No. 03-11609, No. 03-12742, and No. 05-33775 of ANPCyT and PIP’s No. 5254 and No. 5306 of CONICET, Argentina, No. FIS2006-08533-C03-02, and the “Ramon y Cajal” program from MCEyC of Spain, Angelo Della Riccia Foundation, and PRIN 2005021773 Italy.

[1] Burbonais, Science 281, 1155 (1998); H. Kishida et al., Nature (London) 405, 929 (2000).
[2] J. T. Gammel and D. K. Campbell, Phys. Rev. B 60, 71 (1988); Y. Z. Zhang, ibid. 92, 246404 (2004); R. Strack and D. Volhardt, Phys. Rev. Lett. 70, 2637 (1993).
[3] J. E. Hirsch, Physica (Amsterdam) 158C, 326 (1989); J. E. Hirsch and F. Marsiglio, Phys. Rev. B 39, 11515 (1989).
[4] L. Arrachea and A. A. Aligia, Phys. Rev. B 59, 1333 (1999); ibid. 61, 9686 (2000).
[5] A. Hübsch et al., Phys. Rev. Lett. 96, 196401 (2006).
[6] A.
Anfossi et al. Phys. Rev. Lett. 95, 056402 (2005). [7] A. Anfossi, P. Giorda, and A. Montorsi, Phys. Rev. B 75, 165106 (2007). [8] G. I. Japaridze and E. Müller-Hartman, Ann. Phys. (Leipzig) 506, 163 (1994); A. A. Aligia and L. Arrachea, Phys. Rev. B 60, 15332 (1999), and references therein. [9] L. Arrachea et al., Phys. Rev. B 50, 16044 (1994). [10] L. Arrachea and A. A. Aligia, Phys. Rev. Lett. 73, 2240 (1994); J. de Boer, V. E. Korepin, and A. Schadschneider, ibid. 74, 789 (1995). [11] L. Arrachea, A. A. Aligia and E. Gagliano, Phys. Rev. Lett. 76, 4396 (1996). [12] A. Anfossi et al., Phys. Rev. B 73, 085113 (2006). [13] G. I. Japaridze, R. M. Noack, and D. Baeriswyl, Phys. Rev. B 76, 115118 (2007). [14] I. N. Karnaukhov, Phys. Rev. B 66, 092304 (2002). [15] M. Tsuchiizu and A. Furusaki, Phys. Rev. B 69, 035103 (2004). [16] T. Giamarchi, Quantum Physics in One Dimension (Ox- ford University Press, Oxford, U.K., 2004). [17] A. A. Nersesyan, A. O. Gogolin, and F.H.L. Eβler, Phys. Rev. Lett. 81, 910 (1998). [18] S. R. White, Phys. Rev. Lett. 69, 2863 (1992); K. Hall- berg, Adv. Phys. 55, 477 (2006). [19] M. E. Torio et al., Phys. Rev. B 73, 115109 (2006), and references therein. [20] L. Campos Venuti et al., Phys. Rev. A 73, 010303(R) (2006). [21] J. E. Hirsch, Phys. Rev. B 22, 5259 (1980). [22] S. R. Manmana et al., Phys. Rev. B 70, 155115 (2004). ABSTRACT We determine the quantum phase diagram of the one-dimensional Hubbard model with bond-charge interaction X in addition to the usual Coulomb repulsion U at half-filling. For large enough X and positive U the model shows three phases. For large U the system is in the spin-density wave phase already known in the usual Hubbard model. As U decreases, there is first a spin transition to a spontaneously dimerized bond-ordered wave phase and then a charge transition to a novel phase in which the dominant correlations at large distances correspond to an incommensurate singlet superconductor. 
<|endoftext|><|startoftext|> Introduction Flavor Ratios of Astrophysical Neutrinos Production of Astrophysical Neutrinos Energy Spectra in weak decays Power law Spectra Energy Loss Mechanisms Decay Energy Distribution Hadronic interactions Gas Target Photoproduction Nuclear photodisintegration Neutrino Production in Gamma Ray Bursts Experimental Flavor identification Conclusions References

ABSTRACT The measurement of the flavor composition of the neutrino fluxes from astrophysical sources has been proposed as a method to study not only the nature of their emission mechanisms, but also the fundamental properties of neutrinos. It is however problematic to reconcile these two goals, since a sufficiently accurate understanding of the neutrino fluxes at the source is needed to extract information about the physics of neutrino propagation. In this work we discuss critically the expectations for the flavor composition and energy spectrum from different types of astrophysical sources, and comment on the theoretical uncertainties connected to our limited knowledge of their structure. <|endoftext|><|startoftext|> Introduction. The gravitational waves from the present system are calculated by the quadrupole formula [33], which is given by

$$h_{+} = \frac{1}{4}\left[(h^{Q}_{xx} - h^{Q}_{yy})\cos 2\varphi + 2h^{Q}_{xy}\sin 2\varphi\right](\cos^2\theta + 1) - \frac{1}{4}\left(h^{Q}_{xx} + h^{Q}_{yy} - 2h^{Q}_{zz}\right)\sin^2\theta - \left(h^{Q}_{xz}\cos\varphi + h^{Q}_{yz}\sin\varphi\right)\sin\theta\cos\theta, \tag{3.1}$$

$$h_{\times} = \frac{1}{2}\left[2h^{Q}_{xy}\cos 2\varphi - (h^{Q}_{xx} - h^{Q}_{yy})\sin 2\varphi\right]\cos\theta + \left(h^{Q}_{xz}\sin\varphi - h^{Q}_{yz}\cos\varphi\right)\sin\theta, \tag{3.2}$$

where

$$h^{Q}_{ij} = \frac{2}{r}\frac{d^2 Q_{ij}}{dt^2} \quad \text{with} \quad Q_{ij} \equiv \mu\left(Z_i Z_j - \frac{1}{3}\delta_{ij} Z^2\right) \quad \text{(the reduced quadrupole moment of a point mass).} \tag{3.3}$$

(r, θ, ϕ) [or (x, y, z)] is the position of a distant observer in spherical coordinates [or Cartesian coordinates], and Z(t) is the trajectory of the particle. We assume that the observer is on the equatorial plane, i.e. (θ, ϕ) = (π/2, 0). Figure 6 shows the waveforms from Orbits (a), (b), and (c). The left panels show the “+” polarization modes of those waves, while the right ones show the “×” polarization.
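As a rough numerical illustration of the quadrupole formula for the equatorial observer (θ, ϕ) = (π/2, 0), the sketch below differentiates the reduced quadrupole moment of a sampled trajectory with central differences. Several things here are assumptions rather than statements from the paper: the function names are hypothetical, units with G = c = 1 are used, the samples are uniformly spaced, and the polarization basis is taken along the θ and ϕ unit vectors, which for this observer gives h_+ = (h_zz − h_yy)/2 and h_× = −h_yz. Overall sign conventions may differ from those of the authors.

```python
import math


def second_derivative(values, dt):
    """Central second difference; the result is two samples shorter than the input."""
    return [
        (values[i + 1] - 2.0 * values[i] + values[i - 1]) / (dt * dt)
        for i in range(1, len(values) - 1)
    ]


def quadrupole_waveform(times, positions, mu=1.0, r=1.0):
    """Return (t, h_plus, h_cross) for an observer on the x axis.

    positions is a list of (x, y, z) samples of the trajectory Z(t),
    taken at the uniformly spaced instants in times.
    """
    dt = times[1] - times[0]
    # Only the yy, zz and yz components of Q_ij = mu (Z_i Z_j - delta_ij Z^2 / 3)
    # enter the waveform seen from the x axis.
    q_yy = [mu * (y * y - (x * x + y * y + z * z) / 3.0) for x, y, z in positions]
    q_zz = [mu * (z * z - (x * x + y * y + z * z) / 3.0) for x, y, z in positions]
    q_yz = [mu * y * z for x, y, z in positions]
    # h_ij = (2 / r) d^2 Q_ij / dt^2
    h_yy = [2.0 * v / r for v in second_derivative(q_yy, dt)]
    h_zz = [2.0 * v / r for v in second_derivative(q_zz, dt)]
    h_yz = [2.0 * v / r for v in second_derivative(q_yz, dt)]
    h_plus = [0.5 * (zz - yy) for yy, zz in zip(h_yy, h_zz)]
    h_cross = [-yz for yz in h_yz]
    return times[1:-1], h_plus, h_cross
```

For a unit circular orbit in the x-y plane with unit angular frequency, this sketch gives h_+ = −2 cos 2t and h_× = 0 up to discretization error, i.e. the familiar emission at twice the orbital frequency.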
The top, middle, and bottom panels correspond to the waves from Orbits (a), (b), and (c), respectively. The waves from Orbits (a) and (c) show a periodic feature, which is expected from the Poincaré maps in Fig. 1. On the other hand, the waves from Orbit (b) show a completely different behaviour. We find much random spiky noise in the waveform before $t/M = 1.6 \times 10^5$ and after $t/M = 3.8 \times 10^5$. This is a typical feature of the gravitational waves from highly chaotic motion [11, 17]. We also find that the amplitude decreases in the time interval $t/M = (1.6 \sim 3.8) \times 10^5$. As shown in Fig. 3, in this time interval, the particle moves near the small tori in the phase space. This feature of the particle motion appears clearly in the gravitational wave amplitudes. That is, in the phase of nearly regular motion, the particle position and its velocity do not change much compared with those in the more strongly chaotic phase (b-1) (see Fig. 1(b) and Fig. 3(b)). The time variation of the quadrupole moment of the system is small and hence the wave amplitude decreases as well.

We also calculate the energy spectra of the gravitational waves, which will be one of the most important observable quantities in the near future. In Fig. 7, we show the energy spectra for each orbit. Figures 7(a) and (c) show many sharp peaks at certain characteristic frequencies. If a motion is regular, we expect several typical frequencies together with their harmonics. Such a result therefore reflects that the particle moves regularly. Figure 7(b) gives the spectrum of Orbit (b). It is clearly different from the previous two almost regular cases. It looks just like white noise below a typical frequency $fM \sim 10^{-2}$, i.e., the shape of the spectrum is flat and it contains many noisy components. However, the spectrum of Orbit (b-2) (Fig. 7(b-2)), which is computed from the orbit only in the time interval $t/M = (1.6 \sim 3.8) \times 10^5$, does not. Rather it looks similar to the spectrum of a regular orbit.
Contrary to Fig. 7(b), it does not contain much noise in the low frequency region ($fM \leq 10^{-2}$). To see more detail, dividing the time interval of Orbit (b) into two, we show the magnifications of the spectra of Orbits (a), (b-1), (b-2), and (c) in Fig. 8. Compared to the spectra (a) and (c), the spectra (b-1) and (b-2) contain many noisy spikes. Such noisy spikes are usually found in the gravitational waves from a chaotic orbit [17]. However, the spectra (b-1) and (b-2) are completely different. The spectrum (b-1) is just white noise. No structure is found. On the other hand, the spectrum (b-2) looks similar to those for regular orbits. The “sharp” peaks appear at some frequencies, but the widths of those peaks are broadened by many noisy spikes. Therefore, we conclude that Orbit (b-2) looks nearly regular but still retains its chaotic character, and such a feature is imprinted in the spectrum of the waves.

The important point is that the two phases in the particle orbit (b), i.e., the nearly regular phase and the more strongly chaotic one, are also distinguishable in the gravitational wave forms and the energy spectra. With this analysis, we could constrain orbital parameters.

IV. SUMMARY AND DISCUSSION

In this paper we have investigated the characteristics of chaos in the motion of a test particle in a system of a point mass with a massive disk in Newtonian gravity. To distinguish such characteristics, we propose using the gravitational waves emitted from this system. First, we analyzed the motion of the particle by use of the Poincaré map and the “local” Lyapunov exponent. We found that a phase in which the particle motion becomes nearly regular always appears even though the global motion is chaotic. We emphasize that both phases, of nearly regular and of more strongly chaotic motion, are found in the same orbit. The gravitational wave forms and their energy spectra have been evaluated by use of the quadrupole formula in each case.
In the two almost regular cases, the waves show periodic behaviour and certain sharp peaks appear in their energy spectra. In the chaotic case, we have found that the waves show two phases, the nearly regular phase and a more strongly chaotic one. In the nearly regular phase, the wave amplitude gets smaller than in the more strongly chaotic phase. The energy spectra are also clearly different. The spectrum in the more strongly chaotic phase looks like white noise, but in the nearly regular one, it becomes similar to those in the regular cases. However, it is accompanied by many small noisy spikes, which is a characteristic feature of a chaotic system. These spikes make the widths of the spectrum peaks broader than those in the regular cases. Comparing information from the waves with the particle motion, we conclude that we can extract chaotic characteristics of the particle motion from the gravitational waves of the system.

In the present analysis, in the spectrum (b-2) of the gravitational waves, we do not find a power-law structure, which appears in the spectrum of the particle motion. This may be because the waveform is given by the change of the quadrupole moment, which contains higher time derivatives of the particle trajectory such as the acceleration. It may be much more interesting if one can find the 1/f behaviour in some information of the gravitational waves, because such an indication may specify the type of chaos more clearly. This is under investigation.

FIG. 6: The gravitational waveforms evaluated by the quadrupole formula. Top, middle, and bottom figures correspond to those for Orbits (a), (b), and (c), respectively. The left and right columns give the “+” and “×” polarization modes, respectively.
FIG. 7: The energy spectra of the gravitational waves shown in Fig. 6, plotted against Log10[fM] for orbit(a), orbit(b), orbit(b-2), and orbit(c). Orbit (b-2) gives the spectrum of the waves for the “stagnant motion”, i.e., when the particle motion of Orbit (b) becomes nearly regular for $t/M = (1.6 \sim 3.8) \times 10^5$. Figures (a) and (c) show many sharp peaks at certain characteristic frequencies. This is because of the regular motion. The spectrum in Fig. (b), which looks like white noise for $fM \leq 10^{-2}$, is clearly different from those in Figs. (a) and (c), but the spectrum in Fig. (b-2) does not look like white noise. It looks similar to the cases (a) and (c). However, the peaks are not sharp but rather broadened by the appearance of many other spikes. Note that the typical frequency of the orbits is in the range $fM = 10^{-2} \sim 10^{-1}$ (see Fig. 5).

Finally, we mention a possibility to constrain parameters in a dynamical system. If the gravitational waves are observed for a sufficiently long time, we can monitor the time variation of the wave amplitudes, their forms and polarizations. We can then calculate the energy spectra for some durations. If the spectra show one of the typical characteristics found in this paper, the parameters of the particle motion could be constrained.

Of course, a realistic system can be more complicated, and the present model may be too simple. But we believe the characteristic behaviour of the gravitational waves found in this paper will help us to understand a chaotic system. Therefore our next task is to analyze the gravitational waves from various chaotic systems, especially relativistic chaotic systems [4, 5, 6, 7, 9, 10, 11, 12, 13, 17, 21, 32]. Then, we should investigate whether or not the correlation between the gravitational waves and chaos in dynamical systems found in this work is generic.
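The spectral diagnostic discussed in this paper, sharp peaks for regular motion versus a flat noisy floor for strongly chaotic motion, can be illustrated with a plain discrete Fourier transform of a sampled waveform. The sketch below is only a stand-in: it computes a simple power spectrum, not the energy spectrum with the normalization used by the authors, and the function name and the test signal are hypothetical.

```python
import cmath
import math


def power_spectrum(samples, dt):
    """Return (freqs, power) of a uniformly sampled real signal.

    A direct O(n^2) discrete Fourier transform is used so that the sketch
    needs only the standard library; the mean is removed first so that a
    constant offset does not dominate the zero-frequency bin.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    freqs = []
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * m / n) for m, c in enumerate(centered)
        )
        freqs.append(k / (n * dt))
        power.append(abs(coeff) ** 2 / n)
    return freqs, power
```

A purely periodic test signal, such as a single sinusoid, produces one sharp peak at its frequency; comparing the peak heights with the level of the remaining bins is one crude way to quantify how "noisy" a spectrum is.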
Acknowledgments

We express thanks to T. Konishi for useful discussions. This work was supported in part by Japan Society for Promotion of Science (JSPS) Research Fellowships (K.K. and H.K.), by a Grant-in-Aid from the Scientific Research Fund of the JSPS (No. 17540268), and by the Japan-U.K. Research Cooperative Program. K.M. would like to thank DAMTP, the Centre for Theoretical Cosmology, and Clare Hall, where this work was completed.

FIG. 8: Magnification of the energy spectra of Orbits (a), (b-1), (b-2), and (c), plotted against Log10[fM].

[1] Deterministic Chaos in General Relativity, edited by D. Hobill, A. Burd, and A. Coley (Plenum, New York, 1994), and references therein.
[2] J.D. Barrow, Phys. Rep. 85, 1 (1982).
[3] G. Contopoulos, Proc. R. Soc. London A 431, 183 (1990).
[4] V. Karas and D. Vokrouhlický, Gen. Relativ. Gravit. 24, 729 (1992).
[5] H. Varvoglis and D. Papadopoulos, Astron. Astrophys. 261, 664 (1992).
[6] L. Bombelli and E. Calzetta, Class. Quantum Grav. 9, 2573 (1992).
[7] R. Moeckel, Commun. Math. Phys. 150, 415 (1992).
[8] C.P. Dettmann, N.E. Frankel, and N.J. Cornish, Phys. Rev. D 50, 618 (1994).
[9] U. Yurtsever, Phys. Rev. D 52, 3176 (1995).
[10] Y. Sota, S. Suzuki, and K. Maeda, Class. Quantum Grav. 13, 1241 (1996).
[11] S. Suzuki and K. Maeda, Phys. Rev. D 55, 4848 (1997); 58, 023005 (1998); 61, 024005 (1999).
[12] P.S. Letelier and W.M. Viera, Phys. Rev. D 56, 8095 (1997).
[13] J. Podolsky and K. Vesely, Phys. Rev. D 58, 081501 (1998).
[14] J. Levin, Phys. Rev. Lett. 84, 3515 (2000).
[15] J.D. Schnittman and F.A. Rasio, Phys. Rev. Lett. 87, 121101 (2001).
[16] N.J. Cornish and J. Levin, Phys. Rev. Lett. 89, 179001 (2002).
[17] K. Kiuchi and K. Maeda, Phys. Rev. D 70, 064036 (2004).
[18] C.F.F. Karney, Physica D 8, 360 (1983).
[19] G. Contopoulos, M. Harsoula, and N. Voglis, Celest. Mech. Dyn. Astron. 78, 197 (2000).
[20] G. Contopoulos, Order and Chaos in Dynamical Astronomy (Springer, 2002).
[21] H. Koyama, K. Kiuchi, and T. Konishi, arXiv:gr-qc/0702072.
[22] K. Tsubono, prepared for the Edoardo Amaldi Meeting on Gravitational Wave Experiments, Rome, Italy, 14-17 Jun 1994.
[23] A. Abramovici et al., “LIGO: The Laser interferometer gravitational wave observatory,” Science 256, 325 (1992).
[24] J. Hough et al., prepared for the TAMA Workshop on Gravitational Wave Detection, Saitama, Japan, 12-14 Nov 1996.
[25] K.S. Thorne, arXiv:gr-qc/9506086.
[26] A. Saa and R. Venegeroles, Phys. Lett. A 259, 201 (1999).
[27] A. Saa, Phys. Lett. A 269, 204 (2000).
[28] Recent observations suggest that huge black holes exist at the centers of many galaxies. See, for example, J. Kormendy and D. Richstone, Astrophys. J. 393, 559 (1992).
[29] I. Shimada and T. Nagashima, PTP 61, 1605 (1979).
[30] The reason why we find a non-zero positive Lyapunov exponent for an integrable system is that we solve the equation of motion by a finite difference method, and the finite difference approximation does not provide an exactly integrable system. In fact, if we reduce the time step for integration, the value decreases.
[31] W.C. Saslaw, Gravitational Physics of Stellar and Galactic Systems (Cambridge University Press, 1985).
[32] J.P.S. Lemos and P.S. Letelier, Phys. Rev. D 49, 5135 (1994).
[33] L.D. Landau and E.M. Lifshitz, The Classical Theory of Fields (Pergamon, Oxford, 1951).
[34] P. Grassberger, R. Badii, and A. Politi, J. Stat. Phys. 51, 135 (1988); H.E. Kandrup, B.L. Eckstein, and B.O. Bradley, Astron. Astrophys. 320, 65 (1997).

APPENDIX A: LOCAL LYAPUNOV EXPONENT

In this appendix, we give the definition of the “local” Lyapunov exponent. Our definition of the “local” Lyapunov exponent is somewhat different from the conventional one [34], but the two are essentially the same.
First, let us consider a system whose time evolution is described by a set of differential equations in N-dimensional space,

$$\dot{x} = F(x), \tag{A1}$$

where x(t) is an N-dimensional vector. The time evolution of the orbital deviation δx, which is the difference between two nearby orbits, obeys the following set of linear differential equations:

$$\delta\dot{x} = \frac{\partial F}{\partial x}(x(t))\,\delta x. \tag{A2}$$

The solution of Eq. (A2) can be written formally as

$$\delta x(t) = U^{t}_{t_0}\,\delta x_0, \tag{A3}$$

where $\delta x_0$ is an “initial” deviation at some time $t_0$ and U is an evolution matrix, which is given by the following integration:

$$U^{t}_{t_0} = \exp\left[\int_{t_0}^{t} \frac{\partial F}{\partial x}(x(t'))\,dt'\right]. \tag{A4}$$

We define the “local” Lyapunov exponent in the time interval $[t_0, t]$ by

$$\lambda(e^k, t) = \frac{1}{t - t_0}\,\log\frac{\| U^{t}_{t_0} e_1 \wedge U^{t}_{t_0} e_2 \wedge \cdots \wedge U^{t}_{t_0} e_k \|}{\| e_1 \wedge e_2 \wedge \cdots \wedge e_k \|}$$

for k = 1, 2, ···, N, where $e^k$ is a k-dimensional subspace in the tangent space at the initial point $x_0$, which is spanned by k independent vectors $e_i$ (i = 1, 2, ···, k), ∧ is an exterior product, and || ◦ || is a norm with respect to some appropriate Riemannian metric. If we take the limit t → ∞, $\lambda(e^k, \infty)$ correspond to the conventional Lyapunov exponents. If the integration time interval $t_\Delta \equiv t - t_0$ is much longer than the dynamical time of the system, we may find convergent values for each $\lambda(e^k, t)$, which are almost independent of $t_\Delta$ (or $t_0$). We may call them “local” Lyapunov exponents at t. The maximum value of the “local” Lyapunov exponents, i.e. $\lambda(t) = \max\{\lambda(e^k, t)\,|\,k = 1, 2, \cdots, N\}$, is the most important one for our discussion. So we also call it the “local” Lyapunov exponent at t.

Introduction Basic equations Numerical Analysis Two phases of chaos in particle motion Indication of chaos in gravitational waves Summary and Discussion Acknowledgments References Local Lyapunov Exponent

ABSTRACT We study gravitational waves from a particle moving around a system of a point mass with a disk in Newtonian gravitational theory.
A particle motion in this system can be chaotic when the gravitational contribution from the surface density of the disk is comparable with that from the point mass. In such an orbit, we sometimes find that there appears a phase of the orbit in which the particle motion becomes nearly regular (the so-called “stagnant motion”) for a finite time interval between more strongly chaotic phases. To study how these different chaotic behaviours affect the observation of gravitational waves, we investigate the correlation between the particle motion and the waves. We find that such a difference in chaotic motions is reflected in the wave forms and energy spectra. The character of the waves in the stagnant motion is quite different from that either in a regular motion or in a more strongly chaotic motion. This suggests that we may make a distinction between different chaotic behaviours of the orbit via the gravitational waves. <|endoftext|><|startoftext|> Introduction This paper is a sequel to [Z]. We present here a Lohner-type algorithm for the computation of rigorous enclosures of partial derivatives with respect to initial conditions, up to an arbitrary order r, of the flow induced by an autonomous ODE, hence the name $C^r$-Lohner algorithm. Let r be a positive integer; by a $C^r$-algorithm we will mean a routine which gives rigorous estimates for partial derivatives with respect to initial conditions up to order r, and by $C^r$-computations we mean an application of a $C^r$-algorithm.

Our main motivation for the development of the $C^r$-algorithm was a desire to provide a tool which will considerably extend the possibilities of computer assisted proofs in the dynamics of ODEs. Until now most such proofs have used topological conditions (see for example [HZHT, MM, GZ, Z1]) and additionally conditions on the first derivatives with respect to initial conditions (see for example [RNS, T, Wi1, WZ, KZ]); hence they required $C^0$- and $C^1$-computations, respectively.
The spectrum of problems treated includes the questions of the existence of periodic orbits and their local uniqueness, the existence of symbolic dynamics, the existence of hyperbolic invariant sets, and the existence of homo- and heteroclinic orbits. To treat other phenomena, like bifurcations of periodic orbits, the route to chaos, or invariant tori through KAM theory, one needs the knowledge of partial derivatives with respect to initial conditions of higher order.

1 Research supported by an annual national scholarship for young scientists from the Foundation for Polish Science
2 Research supported in part by Polish State Ministry of Science and Information Technology grant N201 024 31/2163
http://arxiv.org/abs/0704.0720v1

In principle, one can think that a good rigorous ODE solver should be enough. Namely, to compute the partial derivatives of the flow induced by

$$x' = f(x), \quad x \in \mathbb{R}^n \tag{1}$$

it is enough to rigorously integrate a system of variational equations obtained by a formal differentiation of (1) with respect to the initial conditions. For example, for r = 2 we have the following system

$$x' = f(x), \tag{2}$$

$$V'_{ij}(t) = \sum_{s=1}^{n} \frac{\partial f_i}{\partial x_s}(x)\,V_{sj}(t), \tag{3}$$

$$H'_{ijk}(t) = \sum_{s,r=1}^{n} \frac{\partial^2 f_i}{\partial x_s \partial x_r}(x)\,V_{rk}(t)\,V_{sj}(t) + \sum_{s=1}^{n} \frac{\partial f_i}{\partial x_s}(x)\,H_{sjk}(t), \tag{4}$$

with the initial conditions $x(0) = x_0$, $V(0) = \mathrm{Id}$, $H_{ijk}(0) = 0$, i, j, k = 1, . . . , n. It is well known that if by $\varphi(t, x_0)$ we denote the (local) flow induced by (1), then

$$\frac{\partial \varphi_i}{\partial x_j}(t, x_0) = V_{ij}(t), \qquad \frac{\partial^2 \varphi_i}{\partial x_j \partial x_k}(t, x_0) = H_{ijk}(t).$$

Analogous statements are true for higher order partial derivatives with respect to initial conditions.

It turns out that a straightforward application of a rigorous ODE solver to the system of variational Equations (2–4) is very inefficient. Namely, it totally ignores the structure of the system and leads to very poor performance and unnecessarily long computation times (see Section 4.1). Our algorithm is a modification of the Lohner algorithm [Lo], which takes into account the structure of the variational Equations (2–4).
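To make the structure of the variational equations concrete, here is a minimal non-rigorous sketch for the scalar equation x' = x^2, chosen only because its flow is known in closed form: φ(t, x0) = x0/(1 − t x0), so that V = 1/(1 − t x0)^2 and H = 2t/(1 − t x0)^3. The code uses ordinary floating-point Runge-Kutta integration, not interval arithmetic, so it illustrates which system is being integrated and is in no sense the $C^r$-Lohner algorithm itself. All function names are hypothetical.

```python
def rhs(state):
    """Right-hand side of the r = 2 variational system for f(x) = x**2.

    In one dimension the system reads
        x' = f(x),  V' = f'(x) V,  H' = f''(x) V**2 + f'(x) H,
    with f'(x) = 2x and f''(x) = 2.
    """
    x, v, h = state
    return (x * x, 2.0 * x * v, 2.0 * v * v + 2.0 * x * h)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step (plain floating point)."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * dt))
    k3 = rhs(shifted(state, k2, 0.5 * dt))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(
        s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(x0, t_end, steps=1000):
    """Integrate (x, V, H) from the initial conditions x(0) = x0, V(0) = 1, H(0) = 0."""
    state = (x0, 1.0, 0.0)
    dt = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state
```

For x0 = 0.5 and t = 1 the closed-form values are x = 1, V = 4, and H = 16, which the sketch reproduces to high accuracy. A rigorous solver would replace each floating-point operation by an enclosure and would also have to control the wrapping effect.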
Basically, it consists of the Taylor method, a heuristic routine for a priori bounds for the solution of (2–4) during a time step, and a Lohner-type control of the wrapping effect, which is done separately for $x$ and for the partial derivatives with respect to initial conditions (the variables $V$ and $H$ in (3,4)). The Taylor method is realized using automatic differentiation [Ra] and algorithms for the computation of compositions of multivariate Taylor series.
The proposed algorithm has been successfully applied in [HNW] to the Michelson system [Mi], where a computer assisted proof of the existence of a cocoon bifurcation was presented. Some parts of this proof required $C^2$-computations.
In the present paper, in Section 8, we show an application of our algorithm to the pendulum equation with periodic forcing and to the Michelson system. We used it to compute rigorous bounds for the coefficients of some normal forms up to order five, which enabled us to prove the existence of invariant tori around some elliptic periodic orbits in these systems using the KAM theorem for twist maps on the plane. These proofs required $C^3$- and $C^5$-computations.
2 Basic definitions
To deal effectively with formulas involving partial derivatives we will make extensive use of multiindices, multipointers and submultipointers throughout the paper. As a motivation, let us consider the formula for the partial derivatives of a composition of maps. Assume $g : \mathbb{R}^n \to \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ are of class $C^3$. We have
$$\frac{\partial^3 (f\circ g)}{\partial x_i\partial x_j\partial x_c} = \sum_{k,r,s=1}^{n} \frac{\partial^3 f}{\partial x_k\partial x_r\partial x_s}\,\frac{\partial g_k}{\partial x_i}\frac{\partial g_r}{\partial x_j}\frac{\partial g_s}{\partial x_c} + \sum_{k=1}^{n}\frac{\partial f}{\partial x_k}\,\frac{\partial^3 g_k}{\partial x_i\partial x_j\partial x_c}$$
$$\qquad + \sum_{k,r=1}^{n} \frac{\partial^2 f}{\partial x_k\partial x_r}\left(\frac{\partial^2 g_k}{\partial x_i\partial x_c}\frac{\partial g_r}{\partial x_j} + \frac{\partial^2 g_k}{\partial x_j\partial x_c}\frac{\partial g_r}{\partial x_i} + \frac{\partial^2 g_k}{\partial x_i\partial x_j}\frac{\partial g_r}{\partial x_c}\right),$$
where the derivatives of $f$ are evaluated at $g(x)$.
To the operator $\frac{\partial^3}{\partial x_{i_1}\partial x_{i_2}\partial x_{i_3}}$ we can assign in a unique way a multipointer, which is a nondecreasing sequence of integers $(j_1, j_2, j_3)$ such that $\{i_1, i_2, i_3\} = \{j_1, j_2, j_3\}$. A submultipointer is a multipointer which is a part of a longer multipointer, for example $(i, j, c)_{(1,3)} = (i, c)$. One observes that submultipointers appear at several places in the above formula. A multiindex is an element $\alpha \in \mathbb{N}^n$.
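It is another way to represent various partial derivatives. The coefficient $\alpha_i$ tells us how many times to differentiate a function with respect to the $i$-th variable. Obviously, there is a one-to-one correspondence between multipointers and multiindices.
2.1 Multiindices
By $\mathbb{N}$ we will denote the set of nonnegative integers, i.e. $\mathbb{N} = \{0, 1, 2, \dots\}$.
Definition 1 An element $\tau \in \mathbb{N}^n$ will be called a multiindex.
For a sequence $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{N}^n$ and a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ we set
1. $|\alpha| = \alpha_1 + \dots + \alpha_n$
2. $\alpha! = \alpha_1! \cdot \alpha_2! \cdots \alpha_n!$
3. $x^\alpha = (x_1^{\alpha_1}, \dots, x_n^{\alpha_n})$
By $e_i^n \in \mathbb{N}^n$ we will denote $e_i^n = (0, 0, \dots, 0, 1, 0, \dots, 0, 0)$, with $1$ in the $i$-th position. We will drop the index $n$ (the dimension) in the symbol $e_i^n$ when it is obvious from the context. Put $\mathbb{N}^n_p := \{a \in \mathbb{N}^n : |a| = p\}$.
For $\delta = (\delta_1, \dots, \delta_k) \in \mathbb{N}^{n_1} \times \dots \times \mathbb{N}^{n_k}$ we set
1. $|\delta| = \sum_{i=1}^{k} |\delta_i|$
2. $\delta! = \prod_{i=1}^{k} \delta_i!$
Let $f = (f_1, \dots, f_m) : \mathbb{R}^n \to \mathbb{R}^m$ be sufficiently smooth. For $\alpha \in \mathbb{N}^n$ we set
1. $D^\alpha f_i = \dfrac{\partial^{|\alpha|} f_i}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$
2. $D^\alpha f = (D^\alpha f_1, D^\alpha f_2, \dots, D^\alpha f_m)$
For a function $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$, by $D^\alpha f_i(t, x)$ we will denote $D^\alpha f_i(t, \cdot)(x)$ and similarly $D^\alpha f(t, x) = (D^\alpha f_1(t, x), \dots, D^\alpha f_n(t, x))$. This convention means that $D^\alpha$ always acts on the $x$-variables.
2.2 Multipointers
For fixed $n > 0$ and $p > 0$ we define
$$\mathcal{N}^n_p := \{(a_1, a_2, \dots, a_p) \in \mathbb{N}^p : 1 \le a_1 \le \dots \le a_p \le n\}, \qquad \mathcal{N}^n := \bigcup_{p=1}^{\infty} \mathcal{N}^n_p.$$
Definition 2 An element of $\mathcal{N}^n$ will be called a multipointer.
Remark 3 The function
$$\Lambda : \mathcal{N}^n_p \ni (a_1, \dots, a_p) \to \sum_{i=1}^{p} e^n_{a_i} \in \mathbb{N}^n_p \tag{5}$$
is a bijection.
Let $f = (f_1, \dots, f_m) : \mathbb{R}^n \to \mathbb{R}^m$ be sufficiently smooth. For $a \in \mathcal{N}^n_p$ we set
1. $D_a f_i := \dfrac{\partial^p f_i}{\partial x_{a_1} \cdots \partial x_{a_p}}$
2. $D_a f := (D_a f_1, \dots, D_a f_m)$
For a function $f : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$, by $D_a f_i(t, x)$ we will denote $D_a f_i(t, \cdot)(x)$. In the light of the above notation, $D_a f = D^{\Lambda(a)} f$.
For multiindices $a = (a_1, a_2, \dots, a_n) \in \mathbb{N}^n_p$ and $b = (b_1, b_2, \dots, b_n) \in \mathbb{N}^n_q$ we define
$$a + b = (a_1 + b_1, \dots, a_n + b_n) \in \mathbb{N}^n_{p+q}.$$
For multipointers $\alpha \in \mathcal{N}^n_p$ and $\beta \in \mathcal{N}^n_q$ we define
$$\alpha + \beta = \Lambda^{-1}(\Lambda(\alpha) + \Lambda(\beta)) \in \mathcal{N}^n_{p+q}.$$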
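The correspondence between multipointers and multiindices defined above can be sketched in a few lines of Python. The function names below are hypothetical and chosen only for this illustration; multipointers are stored as nondecreasing tuples with entries in 1..n, multiindices as tuples of length n.

```python
from itertools import combinations_with_replacement


def multipointer_to_multiindex(a, n):
    """The map Lambda: count how many times each variable 1..n occurs in a."""
    alpha = [0] * n
    for entry in a:
        alpha[entry - 1] += 1
    return tuple(alpha)


def multiindex_to_multipointer(alpha):
    """The inverse of Lambda: list the variable i exactly alpha_i times."""
    return tuple(i + 1 for i, count in enumerate(alpha) for _ in range(count))


def add_multipointers(a, b, n):
    """Sum of multipointers, defined through the sum of their multiindices."""
    total = tuple(x + y for x, y in zip(multipointer_to_multiindex(a, n),
                                        multipointer_to_multiindex(b, n)))
    return multiindex_to_multipointer(total)


def all_multipointers(n, p):
    """All nondecreasing sequences of length p with entries in 1..n."""
    return list(combinations_with_replacement(range(1, n + 1), p))
```

For example, with $n = 3$ the multipointer $(1, 1, 3)$ corresponds to the multiindex $(2, 0, 1)$, and the sum of the multipointers $(1, 3)$ and $(2)$ is $(1, 2, 3)$, i.e. the sorted concatenation.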
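By $\le$ we will denote a linear order (the lexicographical order) in $\mathcal{N}^n$ defined in the following way. For $a \in \mathcal{N}^n_p$ and $b \in \mathcal{N}^n_q$,
$$(a \le b) \iff \text{either } \exists i,\ i \le p,\ i \le q,\ a_i < b_i \text{ and } a_j = b_j \text{ for } j < i, \text{ or } p \le q \text{ and } a_i = b_i \text{ for } i = 1, \dots, p. \tag{6}$$
Definition 4 For $k \le p$ we set
$$\mathcal{N}^p(k) := \{(\delta_1, \dots, \delta_k) \in (\mathcal{N}^p)^k : \delta_1 \le \dots \le \delta_k,\ \delta_1 + \dots + \delta_k = (1, 2, \dots, p)\} \tag{7}$$
We will use $\mathcal{N}^p(k)$ extensively in the next section. It will be used to label terms in $D_a f_i(\varphi(t, x))$. Observe that for $p > 0$
$$\mathcal{N}^p(1) = \{(1, 2, \dots, p)\}, \qquad \mathcal{N}^p(p) = \{((1), (2), \dots, (p))\}.$$
One can construct all elements of $\mathcal{N}^p(k)$ using the following recursive procedure. From the definition of $\mathcal{N}^p(k)$ it follows that if $(\delta_1, \dots, \delta_{m-1}) \in \mathcal{N}^{p-1}(m-1)$ then $(\delta_1, \dots, \delta_{m-1}, (p)) \in \mathcal{N}^p(m)$ (notice that the order is preserved). Similarly, if $(\delta_1, \dots, \delta_m) \in \mathcal{N}^{p-1}(m)$ then $(\delta_1, \dots, \delta_{s-1}, \delta_s + (p), \delta_{s+1}, \dots, \delta_m) \in \mathcal{N}^p(m)$ for $s = 1, \dots, m$, and again the order of elements is preserved. Hence, for $p > 2$ and $1 < k < p$ we have $\mathcal{N}^p(k) = A \cup B$, where
$$A = \{(\delta_1, \dots, \delta_{k-1}, (p)) : (\delta_1, \dots, \delta_{k-1}) \in \mathcal{N}^{p-1}(k-1)\},$$
$$B = \bigcup_{s=1}^{k} \{(\delta_1, \dots, \delta_{s-1}, \delta_s + (p), \delta_{s+1}, \dots, \delta_k) : (\delta_1, \dots, \delta_k) \in \mathcal{N}^{p-1}(k)\} \tag{8}$$
and the sets $A$ and $B$ are disjoint.
Another way to generate all elements of $\mathcal{N}^p(k)$ can be described as follows:
• decompose the set $\{1, 2, \dots, p\}$ into $k$ nonempty and disjoint sets $\Delta_i$, $i = 1, \dots, k$,
• sort each $\Delta_i$ and permute the $\Delta_i$'s to obtain $\min(\Delta_1) < \min(\Delta_2) < \dots < \min(\Delta_k)$,
• define $\delta_i$ to be the ordered set consisting of all elements of $\Delta_i$, for $i = 1, \dots, k$.
Definition 5 For an arbitrary $a \in \mathcal{N}^n_p$ and $\delta \in \mathcal{N}^p_k$ such that $k \le p$ we define a submultipointer $a_\delta \in \mathcal{N}^n_k$ by $(a_\delta)_i = a_{\delta_i}$ for $i = 1, \dots, k$, which can be expressed using $\Lambda$ as follows
$$a_\delta := \Lambda^{-1}\left(\sum_{i=1}^{k} e^n_{a_{\delta_i}}\right) \in \mathcal{N}^n_k.$$
3 Equations for variations
Consider an ODE $x' = f(x)$, where $f$ is $C^{K+1}$. Let $\varphi : \mathbb{R} \times \mathbb{R}^n$ −→◦ $\mathbb{R}^n$ be a local dynamical system induced by $x' = f(x)$.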
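It is well known that $\varphi \in C^K$ and that one can derive the equations for the partial derivatives of $\varphi$ by differentiating the equation $\frac{\partial\varphi}{\partial t}(t, x) = f(\varphi(t, x))$ with respect to the initial condition $x$. As a result we obtain a system of so-called equations for variations, whose size depends on the order $r$ of the partial derivatives we intend to compute. An example of such a system for $r = 2$ is given by (2–4), together with the initial conditions stated there.
The goal of this section is to write the equations for variations in a compact form using multipointers and multiindices, which allows us to take into account the symmetries of partial derivatives.
Lemma 6 Assume $f \in C^{r+1}$ and let $\varphi : \mathbb{R} \times \mathbb{R}^n$ −→◦ $\mathbb{R}^n$ be a local dynamical system induced by $x' = f(x)$. Then for $a \in \mathcal{N}^n_p$ such that $p \le r$ we have
$$\frac{d}{dt} D_a\varphi_i = \sum_{k=1}^{p}\ \sum_{i_1,\dots,i_k=1}^{n} D^{e_{i_1}+\dots+e_{i_k}} f_i \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}}\varphi_{i_j} \tag{9}$$
for $i = 1, \dots, n$.
Proof: In the proof the functions $D^{e_{i_1}+\dots+e_{i_k}} f_i$ are always evaluated at $\varphi(t, x)$, and the various partial derivatives of $\varphi$ are always evaluated at $(t, x)$; therefore the arguments will always be dropped to simplify the formulae.
We prove the lemma by induction on $p = |a|$. If $p = 1$ then $a = (c)$ for some $c \in \{1, \dots, n\}$ and (9) becomes
$$\frac{d}{dt} D_{(c)}\varphi_i = \sum_{s=1}^{n} D^{e_s} f_i \cdot D_{(c)}\varphi_s.$$
Assume (9) holds true for $p - 1$, $p > 1$. Let us fix $a \in \mathcal{N}^n_p$. We have $a = b + (c)$, where $b = (a_1, \dots, a_{p-1}) \in \mathcal{N}^n_{p-1}$ and $c = a_p$. Since (9) is satisfied for $p - 1$, we have
$$\frac{d}{dt} D_a\varphi_i = D_{(c)}\left(\frac{d}{dt} D_b\varphi_i\right) = D_{(c)}\left(\sum_{k=1}^{p-1}\ \sum_{\substack{i_1,\dots,i_k=1\\ \beta:=e_{i_1}+\dots+e_{i_k}}}^{n} D^{\beta} f_i \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \prod_{j=1}^{k} D_{b_{\delta_j}}\varphi_{i_j}\right)$$
$$= \sum_{k=1}^{p-1}\ \sum_{\substack{i_1,\dots,i_{k+1}=1\\ \beta:=e_{i_1}+\dots+e_{i_{k+1}}}}^{n} D^{\beta} f_i \cdot D_{(c)}\varphi_{i_{k+1}} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \prod_{j=1}^{k} D_{b_{\delta_j}}\varphi_{i_j}$$
$$\quad + \sum_{k=1}^{p-1}\ \sum_{\substack{i_1,\dots,i_k=1\\ \beta:=e_{i_1}+\dots+e_{i_k}}}^{n} D^{\beta} f_i \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \sum_{s=1}^{k} D_{b_{\delta_s}+(c)}\varphi_{i_s} \prod_{j\neq s} D_{b_{\delta_j}}\varphi_{i_j}.$$
For $k = 1, \dots, p$ we set
$$T_k := \sum_{i_1,\dots,i_k=1}^{n} D^{e_{i_1}+\dots+e_{i_k}} f_i \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}}\varphi_{i_j} \tag{10}$$
Now our goal is to prove that
$$\frac{d}{dt} D_a\varphi_i = \sum_{k=1}^{p} T_k \tag{11}$$
Our strategy of proof is as follows. We will define $S_1, \dots, S_p$ such that
$$\frac{d}{dt} D_a\varphi_i = \sum_{k=1}^{p} S_k, \qquad S_i = T_i, \quad i = 1, \dots, p.$$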
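(12)
We set
$$S_1 := \sum_{k=1}^{1}\ \sum_{\substack{i_1,\dots,i_k=1\\ \beta:=e_{i_1}+\dots+e_{i_k}}}^{n} D^{\beta} f_i \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \sum_{s=1}^{k} D_{b_{\delta_s}+(c)}\varphi_{i_s} \prod_{j\neq s} D_{b_{\delta_j}}\varphi_{i_j},$$
$$S_p := \sum_{k=p-1}^{p-1}\ \sum_{\substack{i_1,\dots,i_{k+1}=1\\ \beta:=e_{i_1}+\dots+e_{i_{k+1}}}}^{n} D^{\beta} f_i \cdot D_{(c)}\varphi_{i_{k+1}} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \prod_{j=1}^{k} D_{b_{\delta_j}}\varphi_{i_j}.$$
For $m = 2, 3, \dots, p-1$ we set
$$S_m := \sum_{k=m-1}^{m-1}\ \sum_{\substack{i_1,\dots,i_{k+1}=1\\ \beta:=e_{i_1}+\dots+e_{i_{k+1}}}}^{n} D^{\beta} f_i \cdot D_{(c)}\varphi_{i_{k+1}} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \prod_{j=1}^{k} D_{b_{\delta_j}}\varphi_{i_j}$$
$$\qquad + \sum_{k=m}^{m}\ \sum_{\substack{i_1,\dots,i_k=1\\ \beta:=e_{i_1}+\dots+e_{i_k}}}^{n} D^{\beta} f_i \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p-1}(k)}\ \sum_{s=1}^{k} D_{b_{\delta_s}+(c)}\varphi_{i_s} \prod_{j\neq s} D_{b_{\delta_j}}\varphi_{i_j}.$$
It remains to show that $S_i = T_i$ for $i = 1, \dots, p$. Consider first $i = 1$. Recall that $\mathcal{N}^{p-1}(1) = \{(1, 2, \dots, p-1)\}$, hence
$$S_1 = \sum_{s=1}^{n} D^{e_s} f_i \cdot D_{b+(c)}\varphi_s = \sum_{s=1}^{n} D^{e_s} f_i \cdot D_a\varphi_s.$$
Therefore
$$S_1 = T_1. \tag{13}$$
Consider now $i = p$. For an arbitrary $s > 0$, $\mathcal{N}^s(s)$ contains only one element $((1), (2), \dots, (s))$. Therefore we obtain
$$S_p = \sum_{i_1,\dots,i_p=1}^{n} D^{e_{i_1}+\dots+e_{i_p}} f_i \cdot D_{(c)}\varphi_{i_p} \sum_{(\delta_1,\dots,\delta_{p-1})\in\mathcal{N}^{p-1}(p-1)}\ \prod_{j=1}^{p-1} D_{b_{\delta_j}}\varphi_{i_j} = \sum_{i_1,\dots,i_p=1}^{n} D^{e_{i_1}+\dots+e_{i_p}} f_i \cdot D_{(c)}\varphi_{i_p} \prod_{j=1}^{p-1} D_{(b_j)}\varphi_{i_j}.$$
Since $a = b + (c)$, where $c = a_p$, we get
$$S_p = \sum_{i_1,\dots,i_p=1}^{n} D^{e_{i_1}+\dots+e_{i_p}} f_i \prod_{j=1}^{p} D_{(a_j)}\varphi_{i_j} = \sum_{i_1,\dots,i_p=1}^{n} D^{e_{i_1}+\dots+e_{i_p}} f_i \sum_{(\delta_1,\dots,\delta_p)\in\mathcal{N}^p(p)}\ \prod_{j=1}^{p} D_{a_{\delta_j}}\varphi_{i_j} = T_p.$$
Consider now $m = 2, 3, \dots, p-1$. We have
$$S_m = \sum_{i_1,\dots,i_m=1}^{n} D^{e_{i_1}+\dots+e_{i_m}} f_i \cdot D_{(c)}\varphi_{i_m} \sum_{(\delta_1,\dots,\delta_{m-1})\in\mathcal{N}^{p-1}(m-1)}\ \prod_{j=1}^{m-1} D_{b_{\delta_j}}\varphi_{i_j}$$
$$\qquad + \sum_{i_1,\dots,i_m=1}^{n} D^{e_{i_1}+\dots+e_{i_m}} f_i \sum_{(\delta_1,\dots,\delta_m)\in\mathcal{N}^{p-1}(m)}\ \sum_{s=1}^{m} D_{b_{\delta_s}+(c)}\varphi_{i_s} \prod_{j\neq s} D_{b_{\delta_j}}\varphi_{i_j}.$$
Using the decomposition $\mathcal{N}^p(m) = A \cup B$ as in (8) we obtain
$$S_m = \sum_{i_1,\dots,i_m=1}^{n} D^{e_{i_1}+\dots+e_{i_m}} f_i \sum_{(\delta_1,\dots,\delta_{m-1},\delta_m=(p))\in A}\ \prod_{j=1}^{m} D_{a_{\delta_j}}\varphi_{i_j} + \sum_{i_1,\dots,i_m=1}^{n} D^{e_{i_1}+\dots+e_{i_m}} f_i \sum_{(\delta_1,\dots,\delta_m)\in B}\ \prod_{j=1}^{m} D_{a_{\delta_j}}\varphi_{i_j}$$
$$= \sum_{i_1,\dots,i_m=1}^{n} D^{e_{i_1}+\dots+e_{i_m}} f_i \sum_{(\delta_1,\dots,\delta_m)\in\mathcal{N}^p(m)}\ \prod_{j=1}^{m} D_{a_{\delta_j}}\varphi_{i_j} = T_m.$$
We have shown that $T_i = S_i$ for $i = 1, \dots, p$. This finishes the proof.
4 $C^r$-Lohner algorithm
4.1 Why does one need a $C^r$-algorithm?
There are several effective algorithms for the computation of rigorous bounds for solutions of ordinary differential equations, including the Lohner method [Lo], the Hermite–Obreschkoff algorithm [NJ] and Taylor models [BM]. For $C^r$-computations the number of equations to solve is equal to $n\binom{n+r}{r}$; hence, even for $r = 1$, direct application of such an algorithm to the equations for variations (14) leads to integration in a high dimensional space and is usually inefficient. Let us recall after [Z, Sec. 6] the basic reason for this.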
In order to have good control over the expansion rate of the set of initial conditions during a time step, these algorithms, while being $C^0$, are $C^1$ 'internally' (or higher for Taylor models), because they solve non-rigorously the equations for $\frac{\partial\varphi}{\partial x}$ - the variational matrix of the flow. This effectively squares the dimension of the phase space of the equation and heavily impacts the computation time. But, as was observed in [Z], the equations for the partial derivatives of the flow can be seen as non-autonomous and nonhomogeneous linear equations, therefore we do not need additional equations for variations for them. As a result, the dimension of the effective phase space for our $C^r$-algorithm is given by $n\binom{n+r}{r}$ and not the square of this number.
Another important aspect of the proposed algorithm is the fact that the Lohner-type control of the wrapping effect is done separately for the $x$-variables and for the variables $D_a\varphi$. This feature is not present in the blind application of a $C^0$ algorithm to the system of variational equations, and it turns out that this often practically switches off the control of the wrapping effect on the $x$-variables, as the various choices used in this control become dominated by the $D_a\varphi$-variables.
In [Z] a $C^1$-algorithm has been proposed. Here we present an algorithm for the computation of higher order partial derivatives.
4.2 An outline of the algorithm
Let us fix $r \le K$ and consider the following system of differential equations
$$\frac{d}{dt}\varphi = f \circ \varphi, \qquad \frac{d}{dt} D_a\varphi = \sum_{k=1}^{d}\ \sum_{i_1,\dots,i_k=1}^{n} \left(D^{e_{i_1}+\dots+e_{i_k}} f\right)\circ\varphi \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^d(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}}\varphi_{i_j} \tag{14}$$
for all $a \in \mathcal{N}^n_d$, $d = 1, \dots, r$.
Our goal is to present an algorithm for computing a rigorous bound for the solution of (14) with a set of initial conditions
$$\varphi(0, x_0) \in [x_0] \subset \mathbb{R}^n, \qquad D\varphi(0, x_0) = \mathrm{Id}, \qquad D_a\varphi(0, x_0) = 0 \ \text{ for } a \in \mathcal{N}^n_2 \cup \dots \cup \mathcal{N}^n_r. \tag{15}$$
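The index sets that label the right-hand side of (14) can be enumerated directly. The Python sketch below follows the recursion (8) for the sets $\mathcal{N}^p(k)$, extracts submultipointers $a_\delta$, and counts the unknowns by simple combinatorics (one equation per component $\varphi_i$ and per multipointer of length $0, \dots, r$). All function names are assumptions made for this illustration and do not come from the authors' implementation.

```python
from math import comb


def partitions(p, k):
    """All (delta_1, ..., delta_k) in N^p(k), built with the recursion (8)."""
    if k < 1 or k > p:
        return []
    if p == 1:
        return [((1,),)]
    result = []
    # Set A: append the singleton (p) to an element of N^{p-1}(k-1).
    for d in partitions(p - 1, k - 1):
        result.append(d + ((p,),))
    # Set B: add p to one of the k blocks of an element of N^{p-1}(k).
    for d in partitions(p - 1, k):
        for s in range(k):
            result.append(d[:s] + (d[s] + (p,),) + d[s + 1:])
    return result


def submultipointer(a, delta):
    """a_delta: pick the entries of a at the (1-based) positions listed in delta."""
    return tuple(a[i - 1] for i in delta)


def variation_terms(a):
    """For each k, the tuples (a_{delta_1}, ..., a_{delta_k}) occurring in (14)."""
    p = len(a)
    return {k: [tuple(submultipointer(a, d) for d in deltas)
                for deltas in partitions(p, k)]
            for k in range(1, p + 1)}


def number_of_equations(n, r):
    """n components times the number of multipointers of length 0..r."""
    return n * comb(n + r, r)
```

For instance, `partitions(3, 2)` yields the three elements $((1,2),(3))$, $((1,3),(2))$ and $((1),(2,3))$, and `variation_terms((1, 2))` lists the product structures $D_{(1,2)}\varphi$ for $k = 1$ and $D_{(1)}\varphi \cdot D_{(2)}\varphi$ for $k = 2$. The number of elements of $\mathcal{N}^p(k)$ is the Stirling number of the second kind, since these elements are exactly the partitions of $\{1, \dots, p\}$ into $k$ nonempty blocks.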
In the sequel we will use the following notation:
• if a solution of system (14) is defined for $t > 0$ and some $x_0 \in \mathbb{R}^n$, then for $a \in \mathcal{N}^n$ by $V_a(t, x_0)$ we denote $D_a\varphi(t, x_0)$,
• for $[x_0] \subset \mathbb{R}^n$ by $[V_a(t, [x_0])]$ we will denote a set for which we have $V_a(t, [x_0]) \subset [V_a(t, [x_0])]$. This set is obtained using a rigorous numerical routine described below.
The $C^r$-Lohner algorithm is a modification of the $C^1$-Lohner algorithm [Z]. One step of the $C^r$-Lohner algorithm is a shift along the trajectory of the system (14) with the following input and output data.
Input data:
• $t_k$ - the current time,
• $h_k$ - a time step,
• $[x_k] \subset \mathbb{R}^n$, such that $\varphi(t_k, [x_0]) \subset [x_k]$,
• $[V_{k,a}] = [V_{k,a}(t_k, [x_0])] \subset \mathbb{R}^n$, such that $D_a\varphi(t_k, [x_0]) \subset [V_{k,a}]$ for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$.
Output data:
• $t_{k+1} = t_k + h_k$ - the new current time,
• $[x_{k+1}] \subset \mathbb{R}^n$, such that $\varphi(t_{k+1}, [x_0]) \subset [x_{k+1}]$,
• $[V_{k+1,a}] = [V_{k+1,a}(t_{k+1}, [x_0])] \subset \mathbb{R}^n$, such that $D_a\varphi(t_{k+1}, [x_0]) \subset [V_{k+1,a}]$ for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$.
We will often skip the arguments of $V_{k,a}$ when they are obvious from the context. The values of $[x_{k+1}]$ and $[V_{k+1,a}]$, $a \in \mathcal{N}^n_1$, are computed using one step of the $C^1$-Lohner algorithm. After this is done, we perform the following operations to compute $[V_{k+1,a}]$ for $a \in \mathcal{N}^n_2 \cup \dots \cup \mathcal{N}^n_r$:
1. Find a rough enclosure for $D_a\varphi([0, h_k], [x_k])$.
2. Compute $[V_{k+1,a}]$; this will also involve some rearrangement computations to reduce the wrapping effect for $V$ [Mo, Lo].
5 Computation of a rough enclosure for $D_a\varphi$
For a fixed multipointer $a \in \mathcal{N}^n_d$, Equation (14) can be written as follows
$$\frac{d}{dt} D_a\varphi(t, x) = B_a(t, x) + A(t, x)\,D_a\varphi(t, x) \tag{16}$$
where
$$B_a = \sum_{k=2}^{d}\ \sum_{i_1,\dots,i_k=1}^{n} \left(D^{e_{i_1}+\dots+e_{i_k}} f\right)\circ\varphi \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^d(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}}\varphi_{i_j}, \qquad A = Df \circ \varphi. \tag{17}$$
The procedure for computing the rough enclosure is based on the notion of a logarithmic norm, which we recall below.
Definition 7 [HNW] For a square matrix $A$ the logarithmic norm $\mu(A)$ is defined as the limit
$$\mu(A) = \limsup_{h\to 0^+} \frac{\|\mathrm{Id} + Ah\| - 1}{h},$$
where $\|\cdot\|$ is a given matrix norm.
The formulas for the logarithmic norm of a real matrix in the most frequently used norms are (see [HNW])
1.
for $\|x\|_1 = \sum_i |x_i|$, $\mu(A) = \max_j \left(a_{jj} + \sum_{i\neq j} |a_{ij}|\right)$
2. for $\|x\|_2 = \sqrt{\sum_i |x_i|^2}$, $\mu(A)$ is equal to the largest eigenvalue of $(A + A^T)/2$
3. for $\|x\|_\infty = \max_i |x_i|$, $\mu(A) = \max_i \left(a_{ii} + \sum_{j\neq i} |a_{ij}|\right)$
In order to find bounds for $D_a\varphi$ we use the following theorem [HNW, Thm. I.10.6].
Theorem 8 Let $x(t)$ be a solution of a differential equation
$$x'(t) = f(t, x(t)), \quad x \in \mathbb{R}^n \tag{18}$$
Let $\nu(t)$ be a piecewise differentiable function with values in $\mathbb{R}^n$. Assume that
$$\mu\left(\frac{\partial f}{\partial x}(t, \eta)\right) \le l(t) \ \text{ for } \eta \in [x(t), \nu(t)], \qquad |\nu'(t) - f(t, \nu(t))| \le \delta(t),$$
where by $\mu(A)$ we denote the logarithmic norm of a square matrix $A \in \mathbb{R}^{n\times n}$. Then for $t \ge t_0$ we have
$$|x(t) - \nu(t)| \le e^{L(t)}\left(|x(t_0) - \nu(t_0)| + \int_{t_0}^{t} e^{-L(s)}\delta(s)\,ds\right), \tag{19}$$
with $L(t) = \int_{t_0}^{t} l(\tau)\,d\tau$.
We apply the above theorem to Equation (16) to obtain
Lemma 9 Let us fix $x \in \mathbb{R}^n$. Assume that $|B_a(t, x)| \le \delta(t)$ and $\mu(A(t, x)) \le l(t)$. Then for $t > t_0$
$$|D_a\varphi(t, x)| \le |D_a\varphi(t_0, x)|\,e^{L(t)} + e^{L(t)}\int_{t_0}^{t} e^{-L(\tau)}\delta(\tau)\,d\tau \tag{20}$$
with $L(t) = \int_{t_0}^{t} l(\tau)\,d\tau$.
Proof: Consider Equation (16) and the homogeneous problem for (16),
$$\frac{d}{dt} w = f(t, w) := A(t, x)\cdot w, \quad w \in \mathbb{R}^n. \tag{21}$$
Using Theorem 8 we can estimate the difference between any solution $w$ of (21) and a solution of (16), denoted by $D_a\varphi$:
$$|D_a\varphi(t) - w(t)| \le |D_a\varphi(t_0) - w(t_0)|\,e^{L(t)} + e^{L(t)}\int_{t_0}^{t} e^{-L(\tau)}\delta(\tau)\,d\tau. \tag{22}$$
After the substitution $w(t) = 0$, which is a solution of the homogeneous equation, we obtain our assertion.
Usually, we do not have any control over the time dependence of $\delta$ and $l$, hence we will use the following
Lemma 10 Assume that $|B_a(t, x)| \le \delta$ and $\mu(A(t, x)) \le l$ for $t \in [0, h]$. Then for $t \in [0, h]$ we have
$$|D_a\varphi(t, x)| \le |D_a\varphi(0, x)|\max(1, e^{hl}) + \delta\,\frac{e^{lt} - 1}{l}, \quad \text{if } l \neq 0, \tag{23}$$
$$|D_a\varphi(t, x)| \le |D_a\varphi(0, x)| + \delta t, \quad \text{when } l = 0. \tag{24}$$
5.1 The procedure for the computation of the rough enclosure for $V$
The procedure for computing the rough enclosure is iterative, which means that given a rough enclosure for $\varphi([0, h_k], [x_k])$ and rough enclosures for $D_a\varphi([0, h_k], [x_k])$ for all $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_p$, we are able to compute the rough enclosure for $D_a\varphi([0, h_k], [x_k])$ for $a \in \mathcal{N}^n_{p+1}$.
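As a numerical illustration of the logarithmic norm formulas and of the bounds (23) and (24), here is a small Python sketch. It works in plain floating point without directed rounding, so it is not a rigorous implementation, and the function names are hypothetical.

```python
import math


def log_norm_1(A):
    """Logarithmic norm induced by the 1-norm: max over columns j."""
    n = len(A)
    return max(A[j][j] + sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))


def log_norm_inf(A):
    """Logarithmic norm induced by the max-norm: max over rows i."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def lemma10_bound(v0, delta, l, h, t):
    """Right-hand side of (23) for l != 0 and of (24) for l == 0.

    v0    - bound for |D_a phi(0, x)|,
    delta - bound for |B_a(t, x)| on [0, h],
    l     - bound for the logarithmic norm of A(t, x) on [0, h].
    """
    if not 0.0 <= t <= h:
        raise ValueError("t must lie in [0, h]")
    if l == 0:
        return v0 + delta * t
    return v0 * max(1.0, math.exp(h * l)) + delta * math.expm1(l * t) / l
```

A simple sanity check is the scalar equation $w' = l w + \delta$ with $w(0) = v_0 \ge 0$, whose solution $v_0 e^{lt} + \delta (e^{lt} - 1)/l$ never exceeds the value returned by `lemma10_bound`. In a rigorous code every operation above would be replaced by its interval counterpart with outward rounding.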
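The procedures for the computation of the rough enclosures of $\varphi([0, h_k], [x_k])$ and $D_a\varphi([0, h_k], [x_k])$ for $a \in \mathcal{N}^n_1$ have been given in [Z]. Below we present an algorithm for computing $[E_a]$ for $a \in \mathcal{N}^n_2 \cup \dots \cup \mathcal{N}^n_r$.
Input parameters:
• $h_k$ - a time step,
• $[x_k] \subset \mathbb{R}^n$ - the current value of $x = \varphi(t_k, [x_0])$,
• $[E_0] \subset \mathbb{R}^n$ - a compact and convex set such that $\varphi([0, h_k], [x_k]) \subset [E_0]$,
• $[E_a] \subset \mathbb{R}^n$, $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_p$, such that $D_a\varphi([0, h_k], [x_k]) \subset [E_a]$ for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_p$.
Output:
• $[E_a] \subset \mathbb{R}^n$, $a \in \mathcal{N}^n_{p+1}$, such that $D_a\varphi([0, h_k], [x_k]) \subset [E_a]$.
Before we present the algorithm, let us observe that for a fixed $a \in \mathcal{N}^n_{p+1}$, $B_a$ defined in (17) can be seen as a multivariate function of $t$, $x$ and $V_b = D_b\varphi$ for $b \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_p$. More precisely, put $m_p := \sharp\left(\mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_p\right)$, where $\sharp$ stands for the number of elements of a set. Recall that we have defined by (6) a linear order in $\mathcal{N}^n$. Hence, there is a unique sequence of multipointers $b_1, \dots, b_{m_p}$ such that $b_i \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_p$ for $i = 1, \dots, m_p$, $b_1 \le b_2 \le \dots \le b_{m_p}$ and $b_i \neq b_j$ for $i \neq j$. Let us define $\tilde{B}_a : \mathbb{R} \times (\mathbb{R}^n)^{m_p+1} \to \mathbb{R}^n$ and $F_a : \mathbb{R} \times (\mathbb{R}^n)^{m_p+1} \to \mathbb{R}^n$ by
$$\tilde{B}_a(t, x, v_{b_1}, \dots, v_{b_{m_p}}) = \sum_{k=2}^{p+1}\ \sum_{i_1,\dots,i_k=1}^{n} D^{e_{i_1}+\dots+e_{i_k}} f(\varphi(t, x)) \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p+1}(k)}\ \prod_{j=1}^{k} \left(v_{a_{\delta_j}}\right)_{i_j} \tag{25}$$
$$F_a(t, x, v_{b_1}, \dots, v_{b_{m_p}}) = \tilde{B}_a(t, x, v_{b_1}, \dots, v_{b_{m_p}}) + Df(\varphi(t, x))\,V_a(t, x) \tag{26}$$
Algorithm: To compute $[E_a]$ for $a \in \mathcal{N}^n_{p+1}$ we proceed as follows.
1. Find $l \ge \max_{x\in[E_0]} \mu(Df(x))$.
2. Compute $\delta_a \ge \max \|\tilde{B}_a\|$, i.e.
$$\delta_a \ge \max_{(x, v_{b_1}, \dots, v_{b_{m_p}}) \in [E_0]\times[E_{b_1}]\times\dots\times[E_{b_{m_p}}]} \left\|\tilde{B}_a(0, x, v_{b_1}, \dots, v_{b_{m_p}})\right\|.$$
For example, if $a = (j, c) \in \mathcal{N}^n_2$, then $\delta_a$ should be such that
$$\delta_a \ge \max_{x\in[E_0],\ v_1\in[E_{(1)}], \dots, v_n\in[E_{(n)}]} \left\|\sum_{r,s=1}^{n} \frac{\partial^2 f}{\partial x_r\partial x_s}(x)\,(v_j)_s\,(v_c)_r\right\|.$$
3. Define $[E_a]_i = [-1, 1]\,\delta_a\,\dfrac{e^{l h_k} - 1}{l}$ for $i = 1, \dots, n$, where $[E_a]_i$ denotes the $i$-th coordinate of $[E_a]$ (compare (23) with $D_a\varphi(0, x) = 0$; for $l = 0$ the factor is replaced by $h_k$, see (24)).
One can refine the obtained enclosure by
$$[E_a] := \left([0, h_k]\,F_a\left(0, [E_0], [E_{b_1}], \dots, [E_{b_{m_p}}]\right)\right) \cap [E_a].$$
Indeed, for $i = 1, \dots, n$, $t \in [0, h_k]$ and $x_0 \in [x_k]$ we have
$$D_a\varphi_i(t, x_0) = D_a\varphi_i(t, x_0) - D_a\varphi_i(0, x_0) = t\,(F_a)_i\left(\theta_i, x_0, D_{b_1}\varphi(\theta_i, x_0), \dots, D_{b_{m_p}}\varphi(\theta_i, x_0)\right)$$
$$= t\,(F_a)_i\left(0, \varphi(\theta_i, x_0), D_{b_1}\varphi(\theta_i, x_0), \dots, D_{b_{m_p}}\varphi(\theta_i, x_0)\right)$$
for some $\theta_i \in [0, t] \subset [0, h_k]$. In the above we have used the fact that $F_a(t, x, v_1, \dots, v_{m_p}) = F_a(0, \varphi(t, x), v_1, \dots, v_{m_p})$. Since $\varphi(\theta_i, x_0) \in [E_0]$ and $D_{b_j}\varphi(\theta_i, x_0) \in [E_{b_j}]$ for $j = 1, \dots, m_p$, we get
$$D_a\varphi_i(t, x_0) \in [0, h_k]\,(F_a)_i\left(0, [E_0], [E_{b_1}], \dots, [E_{b_{m_p}}]\right).$$
6 Computation of $[V_{k+1}]$
6.1 Composition formulas
For any $p$-times continuously differentiable functions $f, g : \mathbb{R}^n \to \mathbb{R}^n$ and $a \in \mathcal{N}^n_p$ we have
$$D_a(f \circ g)_i = \sum_{k=1}^{p}\ \sum_{i_1,\dots,i_k=1}^{n} \left(D^{e_{i_1}+\dots+e_{i_k}} f_i\right)\circ g \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}} g_{i_j} \tag{27}$$
We can apply the above formula to $f = \varphi(h_k, \cdot)$ and $g = \varphi(t_k, \cdot)$ to obtain, with $x_k = \varphi(t_k, x_0)$,
$$V_a(t_k + h_k, x_0) = \sum_{k=1}^{p}\ \sum_{i_1,\dots,i_k=1}^{n} V_{\Lambda^{-1}(e_{i_1}+\dots+e_{i_k})}(h_k, x_k) \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} \left(V_{a_{\delta_j}}\right)_{i_j}(t_k, x_0)$$
for all $x_0 \in [x_0]$. Using the notation $[V_{k+1,a}] := [V_a(t_k + h_k, [x_0])]$ and $[V_{k,a}] = [V_a(t_k, [x_0])]$ we can rewrite the above equation as
$$[V_{k+1,a}] = \sum_{k=1}^{p}\ \sum_{i_1,\dots,i_k=1}^{n} V_{\Lambda^{-1}(e_{i_1}+\dots+e_{i_k})}(h_k, [x_k]) \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} \left(V_{k,a_{\delta_j}}\right)_{i_j} \tag{28}$$
where $\Lambda$ is defined by (5).
6.2 The procedure for computation of $[V_{k+1}]$
We introduce new parameters $o_d$ - the order of the Taylor method used in the computation of $V_a$ for $a \in \mathcal{N}^n_d$. It makes sense to take $o_1 \ge o_2 \ge \dots \ge o_r$.
Input parameters:
• $h_k$ - a time step,
• $[x_k] \subset \mathbb{R}^n$ - the current value of $x = \varphi(t_k, [x_0])$,
• $[V_{k,a}] \subset \mathbb{R}^n$ - the current value of $V_{k,a}(t_k, [x_0])$, for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$,
• $[E_0] \subset \mathbb{R}^n$, compact and convex, such that $\varphi([0, h_k], [x_k]) \subset [E_0]$ - a rough enclosure for $[x_k]$,
• $[E_a] \subset \mathbb{R}^n$, compact and convex, such that $D_a\varphi([0, h_k], [x_k]) \subset [E_a]$, for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$.
Output: $[V_{k+1,a}] \subset \mathbb{R}^n$, such that
$$V_a(t_k + h_k, x_0) \in [V_{k+1,a}] \tag{29}$$
for $x_0 \in [x_0]$ and $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$.
Algorithm: We compute $[V_{k+1}]$ as follows.
1. Computation of $V_a(h_k, [x_k])$ using the Taylor method for Equation (14), i.e. for $a \in \mathcal{N}^n_p$ we compute
$$[F_a] = \sum_{i=1}^{o_p} \frac{h_k^i}{i!}\,\frac{d^{i-1}}{dt^{i-1}} F_a\left(0, [x_k], V_{b_1}, \dots, V_{b_{m_{p-1}}}\right) + \frac{h_k^{o_p+1}}{(o_p+1)!}\,\frac{d^{o_p}}{dt^{o_p}} F_a\left(0, [E_0], [E_{b_1}], \dots, [E_{b_{m_{p-1}}}]\right), \tag{30}$$
where $V_{b_i} = 0$ for $b_i \in \mathcal{N}^n_2 \cup \dots \cup \mathcal{N}^n_{p-1}$ and $V_{(j)} = e^n_j$ for $j = 1, \dots, n$.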
Observe that
$$V_a(h_k, [x_k]) \subset [F_a] \tag{31}$$
Indeed, using the Taylor series expansion we obtain that for $x_k \in [x_k]$ and $j = 1, \dots, n$
$$(V_a)_j(h_k, x_k) = \sum_{i=1}^{o_p} \frac{h_k^i}{i!}\,\frac{d^{i-1}}{dt^{i-1}} (F_a)_j\left(0, x_k, V_{b_1}(0, x_k), \dots, V_{b_{m_{p-1}}}(0, x_k)\right) + \frac{h_k^{o_p+1}}{(o_p+1)!}\,\frac{d^{o_p}}{dt^{o_p}} (F_a)_j\left(\theta_j, x_k, V_{b_1}(\theta_j, x_k), \dots, V_{b_{m_{p-1}}}(\theta_j, x_k)\right)$$
for some $\theta_j \in [0, h_k]$. Observe that
$$(F_a)_j\left(\theta_j, x_k, V_{b_1}(\theta_j, x_k), \dots, V_{b_{m_{p-1}}}(\theta_j, x_k)\right) = (F_a)_j\left(0, \varphi(\theta_j, x_k), V_{b_1}(\theta_j, x_k), \dots, V_{b_{m_{p-1}}}(\theta_j, x_k)\right).$$
Using $\varphi(\theta_j, x_k) \in [E_0]$ and $V_{b_s}(\theta_j, x_k) \in [E_{b_s}]$ for $s = 1, \dots, m_{p-1}$ we obtain our assertion.
2. The composition. Put
$$[J_k] := \left([F_{(1)}], \dots, [F_{(n)}]\right).$$
Using (28), for $a \in \mathcal{N}^n_p$ we have
$$[V_{k+1,a}] = [\alpha_a] + [J_k]\cdot[V_{k,a}], \tag{32}$$
where
$$[\alpha_a] = \sum_{k=2}^{p}\ \sum_{i_1,\dots,i_k=1}^{n} \left[F_{\Lambda^{-1}(e_{i_1}+\dots+e_{i_k})}\right] \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} \left(V_{k,a_{\delta_j}}\right)_{i_j}.$$
In our implementation of the algorithm we use symbolic differentiation to obtain formulae for $D_a f$. Next, using automatic differentiation, we compute the quantities
$$\frac{d^i}{dt^i} F_a\left(t, x, V_{b_1}(t, x), \dots, V_{b_{m_{p-1}}}(t, x)\right)\Big|_{t=0}$$
which appear in (30).
6.3 Rearrangement for $V_a$ - the evaluation of Equation (32)
It is well known that a direct evaluation of Equation (32) leads to the wrapping effect [Mo, Lo]. To avoid it, following the work of Lohner [Lo], we will use the same scheme as was proposed in [Z]. Namely, observe that Equation (32) has exactly the same structure as the propagation equations for the $C^1$-method (see [Z, Section 3]). Moreover, all vectors $V_{k,a}$, for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$, 'propagate' by the same $[J_k]$ as did the variational part in [Z], hence it makes sense to use the same approach. To be more precise, each set $[V_{k,a}]$, for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$, is represented in the following form
$$[V_{k,a}] = v_{k,a} + [B_k][r_{k,a}] + C_k[q_{k,a}] \tag{33}$$
where $[B_k]$ is an interval matrix, $C_k$ is a point matrix, $v_{k,a}$ is a point vector and $r_{k,a}$, $q_{k,a}$ are interval vectors. Observe that $[B_k]$ and $C_k$ are independent of $a$. In the sequel we will drop the index $a$. Equation (32) leads to
$$[V_{k+1}] = [\alpha] + [J_k]\left(v_k + [B_k][r_k] + C_k[q_k]\right) \tag{34}$$
Let $m([z])$ denote the center of an interval object, i.e.
$[z]$ is an interval vector or an interval matrix, and let $\Delta([z]) = [z] - m([z])$. Let $[Q]$ be an interval matrix which contains an orthogonal matrix. Usually, $[Q]$ is computed by the orthonormalisation of the columns of $m([J_k])[B_k]$. We set
$$[Z] = m([J_k])\,C_k, \qquad C_{k+1} = m([Z]), \qquad [B_{k+1}] = [Q].$$
Then we rearrange formula (34) as follows
$$[s] = [\alpha] + [J_k]v_k + \Delta([J_k])[V_k],$$
$$v_{k+1} = m([s]),$$
$$[q_{k+1}] = [q_k],$$
$$[r_{k+1}] = [Q^T]\left(\Delta([s]) + \Delta([Z])[q_k]\right) + \left([Q^T]\,m([J_k])\,[B_k]\right)[r_k].$$
Summarizing, we can use the following data structure to represent $\varphi(t_k, [x_0])$ and $D_a\varphi(t_k, [x_0])$, for $a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r$:
type CnSet = record
  v0, r0, q0 : IntervalVector;
  C0, B0, C, B : IntervalMatrix;
  {va, ra, qa : IntervalVector}, a ∈ N^n_1 ∪ ... ∪ N^n_r
end;
The set $\varphi(t_k, [x_0])$ is represented as $v_0 + B_0 r_0 + C_0 q_0$, and the partial derivatives $D_a\varphi(t_k, [x_0])$ are represented as $v_a + B r_a + C q_a$. The matrices $B$, $C$ are common for all partial derivatives.
Notice that if we start the $C^r$ computation with the initial condition (15), then there is no Lipschitz part at the beginning for the partial derivatives. Hence, the initial values for $C$ and $B$ are set to the identity matrix and the initial values for $q_a$, $r_a$ are set to zero. If the interval vectors $r_a$ become 'thick' (i.e. their diameters are larger than some threshold value), we can set a new Lipschitz part in our representation (this must be done simultaneously for all $D_a\varphi$) and reset $r_a$ in the following way:
$$q_a = r_a + (B^T C)\,q_a, \quad \text{for } a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r,$$
$$r_a = 0, \quad \text{for } a \in \mathcal{N}^n_1 \cup \dots \cup \mathcal{N}^n_r,$$
$$C = B, \qquad B = \mathrm{Id}.$$
A similar change of the Lipschitz part may be done when the vectors $r_a$ become thick in comparison to $q_a$.
7 Derivatives of Poincaré map
Consider a differential equation
$$x' = f(x), \quad x \in \mathbb{R}^n, \quad f \in C^{K+1} \tag{36}$$
Let $\varphi : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ be a (local) dynamical system induced by (36). Let $\alpha : \mathbb{R}^n \to \mathbb{R}$ be a $C^1$-map. Put $\Pi = \{x \mid \alpha(x) = C\}$.
Definition 11 We will say that $\Pi$ is a local section for the vector field $f$ at $y_0 \in \Pi$ if
$$\langle\nabla\alpha(y_0)|f(y_0)\rangle \neq 0. \tag{37}$$
Assume $x_0 \in \mathbb{R}^n$ and $t_0 \in \mathbb{R}$ are such that $\Pi$ is a local section at $\varphi(t_0, x_0)$.
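Consider the implicit equation
$$\alpha(\varphi(t_P(x), x)) = C. \tag{38}$$
It follows easily from (37) and from the implicit function theorem that there exists a uniquely defined $t_P : \mathbb{R}^n$ −→◦ $\mathbb{R}$ in a neighborhood of $x_0$ such that $t_P(x_0) = t_0$. The function $t_P$ is as smooth as the flow $\varphi$. We will refer to $t_P$ as the Poincaré return time to the section $\Pi$. We define a Poincaré map $P : \mathbb{R}^n \supset \mathrm{dom}(t_P) \to \mathbb{R}^n$ by
$$P(x) = \varphi(t_P(x), x). \tag{39}$$
Usually the Poincaré map is defined as a map $P : \Pi_1$ −→◦ $\Pi_2$, where $\Pi_1, \Pi_2$ are local sections in $\mathbb{R}^n$. The approach taken here, i.e. treating the Poincaré map as a map $P : \mathbb{R}^n$ −→◦ $\mathbb{R}^n$, allows us not to worry about the coordinates on the local sections.
In this section we are interested in the partial derivatives of $P$ defined by (39). From (39) we can compute $\frac{\partial P_i}{\partial x_j}$ and we obtain
$$\frac{\partial P_i}{\partial x_j}(x) = f_i(P(x))\,\frac{\partial t_P}{\partial x_j}(x) + \frac{\partial\varphi_i}{\partial x_j}(t_P(x), x). \tag{40}$$
We need $\frac{\partial t_P}{\partial x_j}$. We differentiate (38) to obtain
$$\sum_{k=1}^{n} \frac{\partial\alpha}{\partial x_k}(P(x))\left(f_k(P(x))\,\frac{\partial t_P}{\partial x_j}(x) + \frac{\partial\varphi_k}{\partial x_j}(t_P(x), x)\right) = \left(\nabla\alpha(P(x))\cdot f(P(x))\right)\frac{\partial t_P}{\partial x_j}(x) + \sum_{k=1}^{n}\frac{\partial\alpha}{\partial x_k}(P(x))\,\frac{\partial\varphi_k}{\partial x_j}(t_P(x), x) = 0. \tag{41}$$
Hence
$$\frac{\partial t_P}{\partial x_j}(x) = -\frac{1}{\langle\nabla\alpha(P(x))|f(P(x))\rangle} \sum_{k=1}^{n} \frac{\partial\alpha}{\partial x_k}(P(x))\,\frac{\partial\varphi_k}{\partial x_j}(t_P(x), x). \tag{42}$$
7.1 Higher order derivatives of the Poincaré map
To make the formulas transparent we will drop the arguments of functions in this section, but the reader should be aware that for $t_P$ and its partial derivatives the argument is $x$, and for $\varphi$ and $D_a\varphi$ the argument is always the pair $(t_P(x), x)$.
From (40) we obtain
$$D_{(j,c)}P = \frac{\partial^2\varphi}{\partial t^2}\,D_{(j)}t_P\,D_{(c)}t_P + \left(\frac{\partial}{\partial t}D_{(c)}\varphi\right)D_{(j)}t_P + \frac{\partial\varphi}{\partial t}\,D_{(j,c)}t_P + \left(\frac{\partial}{\partial t}D_{(j)}\varphi\right)D_{(c)}t_P + D_{(j,c)}\varphi.$$
It is easy to see that partial derivatives of higher order give rise to quite complex expressions, and it is not entirely obvious how to organize them in a coherent and programmable way. For this purpose we use the following
Lemma 12 For a multipointer $a \in \mathcal{N}^n_p$ we have
$$D_aP = D_a\varphi + \frac{\partial\varphi}{\partial t}\,D_a t_P + \sum_{k=2}^{p} \frac{\partial^k\varphi}{\partial t^k} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}} t_P + \sum_{k=2}^{p}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k} \left(\frac{\partial^{k-1}}{\partial t^{k-1}} D_{a_{\delta_s}}\varphi\right) \prod_{j\neq s} D_{a_{\delta_j}} t_P \tag{43}$$
Proof: By induction on $p$. For $p = 1$ formula (43) is equivalent to (40), because the last two sums are taken over the empty set. Assume (43) holds true for some $p \ge 1$ and fix $a \in \mathcal{N}^n_{p+1}$.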
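Our goal is to show that $D_aP = R_1 + R_2 + R_3$, where
$$R_1 = D_a\varphi + \frac{\partial\varphi}{\partial t}\,D_a t_P,$$
$$R_2 = \sum_{k=2}^{p+1} \frac{\partial^k\varphi}{\partial t^k} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p+1}(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}} t_P,$$
$$R_3 = \sum_{k=2}^{p+1}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^{p+1}(k)}\ \sum_{s=1}^{k} \left(\frac{\partial^{k-1}}{\partial t^{k-1}} D_{a_{\delta_s}}\varphi\right) \prod_{j\neq s} D_{a_{\delta_j}} t_P.$$
Write $a = \beta + \gamma$, where $\beta \in \mathcal{N}^n_p$ and $\gamma = (a_{p+1}) \in \mathcal{N}^n_1$. From the induction assumption we have
$$D_aP = D_\gamma\left(D_\beta\varphi + \frac{\partial\varphi}{\partial t}\,D_\beta t_P + \sum_{k=2}^{p} \frac{\partial^k\varphi}{\partial t^k} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{\beta_{\delta_j}} t_P + \sum_{k=2}^{p}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k} \left(\frac{\partial^{k-1}}{\partial t^{k-1}} D_{\beta_{\delta_s}}\varphi\right) \prod_{j\neq s} D_{\beta_{\delta_j}} t_P\right) = \sum_{i=1}^{10} S_i,$$
where
$$S_1 = D_a\varphi + \frac{\partial\varphi}{\partial t}\,D_a t_P, \qquad S_2 = \left(\frac{\partial}{\partial t}D_\beta\varphi\right)D_\gamma t_P, \qquad S_3 = \frac{\partial^2\varphi}{\partial t^2}\,D_\beta t_P\,D_\gamma t_P, \qquad S_4 = \left(\frac{\partial}{\partial t}D_\gamma\varphi\right)D_\beta t_P,$$
$$S_5 = \sum_{k=2}^{p} \left(\frac{\partial^k}{\partial t^k}D_\gamma\varphi\right) \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{\beta_{\delta_j}} t_P,$$
$$S_6 = \sum_{k=2}^{p} \frac{\partial^{k+1}\varphi}{\partial t^{k+1}}\,D_\gamma t_P \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{\beta_{\delta_j}} t_P,$$
$$S_7 = \sum_{k=2}^{p} \frac{\partial^k\varphi}{\partial t^k} \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k} D_{\beta_{\delta_s}+\gamma} t_P \prod_{j\neq s} D_{\beta_{\delta_j}} t_P,$$
$$S_8 = \sum_{k=2}^{p}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k} \left(\frac{\partial^{k-1}}{\partial t^{k-1}} D_{\beta_{\delta_s}+\gamma}\varphi\right) \prod_{j\neq s} D_{\beta_{\delta_j}} t_P,$$
$$S_9 = \sum_{k=2}^{p}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k} \left(\frac{\partial^{k}}{\partial t^{k}} D_{\beta_{\delta_s}}\varphi\right) D_\gamma t_P \prod_{j\neq s} D_{\beta_{\delta_j}} t_P,$$
$$S_{10} = \sum_{k=2}^{p}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k}\ \sum_{r\neq s} \left(\frac{\partial^{k-1}}{\partial t^{k-1}} D_{\beta_{\delta_s}}\varphi\right) D_{\beta_{\delta_r}+\gamma} t_P \prod_{j\neq s,\ j\neq r} D_{\beta_{\delta_j}} t_P.$$
Obviously $R_1 = S_1$. We will show that $R_2 = S_3 + S_6 + S_7$ and $R_3 = S_2 + S_4 + S_5 + S_8 + S_9 + S_{10}$.
Denote by $R_{i,k}$, $i = 2, 3$, the part of the sum $R_i$ with fixed $k = 2, \dots, p+1$. Similarly, let us denote by $S_{i,k}$ the part of the sum $S_i$, $i = 5, \dots, 10$, for $k = 2, \dots, p$. Using the decomposition of $\mathcal{N}^{p+1}(2)$ as in (8) we obtain that $R_{2,2} = S_3 + S_{7,2}$. Similarly, using (8) we observe that $R_{2,k} = S_{6,k-1} + S_{7,k}$ for $k = 3, \dots, p$. Finally, since $\mathcal{N}^{p+1}(p+1) = \{((1), (2), \dots, (p+1))\}$ and $\gamma = (a_{p+1})$, we find that $R_{2,p+1} = S_{6,p}$. This shows that $R_2 = S_3 + S_6 + S_7$.
It remains to show that $R_3 = S_2 + S_4 + S_5 + S_8 + S_9 + S_{10}$. We classify the possible terms according to where $p+1$ appears in $\delta_i$, $i = 1, \dots, k$, and how this $\delta_i$ enters $R_3$, as $\delta_s$ or as $\delta_j$. There are four cases:
1. $\delta_s = (p+1)$
2. $\delta_j = (p+1)$
3. $p+1 \in \delta_s$, $|\delta_s| \ge 2$
4. $p+1 \in \delta_j$, $|\delta_j| \ge 2$
Let us fix $k = 2$. Let $(\delta_1, \delta_2) \in \mathcal{N}^{p+1}(2)$. The term for case 1 is $S_4$, for case 2 it is $S_2$, for case 3 it is $S_{8,2}$ and for case 4 it is $S_{10,2}$. Hence, $R_{3,2} = S_2 + S_4 + S_{8,2} + S_{10,2}$. For $k = 3, \dots, p$ and fixed $(\delta_1, \dots, \delta_k) \in \mathcal{N}^{p+1}(k)$ we have: case 1 is given by $S_{5,k-1}$, case 2 by $S_{9,k-1}$, case 3 by $S_{8,k}$ and case 4 by $S_{10,k}$. Hence, for $k = 3, \dots, p$ we have $R_{3,k} = S_{5,k-1} + S_{9,k-1} + S_{8,k} + S_{10,k}$. Finally, for $k = p+1$ we observe that $R_{3,p+1} = S_{5,p} + S_{9,p}$. Indeed, in this case $(\delta_1, \dots, \delta_{p+1}) = ((1), (2), \dots, (p+1))$.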
Hence, for $\delta_s = (p+1)$ we obtain the term $S_{5,p}$, and for $\delta_s \neq (p+1)$ we obtain $S_{9,p}$. We have shown that $R_3 = S_2 + S_4 + S_5 + S_8 + S_9 + S_{10}$ and the proof is finished.
Hence, if we know all the partial derivatives of $t_P$ up to order $p$, we can compute the partial derivatives of the Poincaré map up to the same order. In the next subsection we show how to compute the partial derivatives of $t_P$ for affine sections.
7.2 Partial derivatives of $t_P$ for affine sections
Assume $\alpha : \mathbb{R}^n \to \mathbb{R}$ is an affine map given by
$$\alpha(x) = \alpha_0 + \sum_{i=1}^{n} \alpha_i x_i.$$
This is a quite restrictive assumption about sections, but it leads to relatively simple formulas for $D_a t_P$ and it is sufficient for the applications we have in mind.
Lemma 13 For a multipointer $a \in \mathcal{N}^n_p$ we have
$$-D_a t_P \left\langle\nabla\alpha \,\Big|\, \frac{\partial\varphi}{\partial t}\right\rangle = \langle\nabla\alpha|D_a\varphi\rangle + \sum_{k=2}^{p} \left\langle\nabla\alpha \,\Big|\, \frac{\partial^k\varphi}{\partial t^k}\right\rangle \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \prod_{j=1}^{k} D_{a_{\delta_j}} t_P + \sum_{k=2}^{p}\ \sum_{(\delta_1,\dots,\delta_k)\in\mathcal{N}^p(k)}\ \sum_{s=1}^{k} \left\langle\nabla\alpha \,\Big|\, \frac{\partial^{k-1}}{\partial t^{k-1}} D_{a_{\delta_s}}\varphi\right\rangle \prod_{j\neq s} D_{a_{\delta_j}} t_P.$$
Proof: The proof is a direct consequence of Lemma 12 and (38). Since $\alpha$ is affine, by differentiating $\alpha(P(x)) = C$ we get $\langle\nabla\alpha|D_aP\rangle = 0$. Using formula (43) for $D_aP$ we obtain our assertion.
Fix $[x] \subset \mathbb{R}^n$ and assume we have a rigorous bound $t_P([x]) \subset [t_1, t_2]$ (see [Z, Section 6] for more details on this). Lemmas 13 and 12 show that, given rigorous bounds for the partial derivatives $D_a\varphi([t_1, t_2], [x])$ and for their time derivatives $\frac{\partial^i}{\partial t^i}D_a\varphi([t_1, t_2], [x])$ up to some order $p$, we can recursively compute rigorous bounds for the partial derivatives of $t_P([x])$ and $P([x])$ up to the same order. Notice that the time derivatives $\frac{\partial^i}{\partial t^i}D_a\varphi$ are given by the Taylor coefficients of the solution of (14) with initial conditions $P([x])$ for the $C^0$ part and $D_a\varphi(t_P(x), [x])$ for the equations for variations. Hence, these coefficients can be easily computed using the automatic differentiation algorithm.
8 Applications.
Invariant tori are among the typical invariant sets in Hamiltonian mechanics. However, the existence of an invariant torus in a given system is often difficult to prove, despite the fact that the theory is quite well developed.
Probably the best work in this direction was done by Celletti and Chierchia [CC1, CC2], where an effective application (a computer assisted proof) of KAM theory to the restricted three body problem modelling the system consisting of the Sun, Jupiter and the asteroid 12 Victoria was given. Our aim here is more modest, as we focus on the invariant tori emanating from an elliptic fixed point satisfying a suitable twist condition.
In this section we show that rigorous computations of partial derivatives of a dynamical system up to order 3 or 5 can be used to prove that in a particular system an invariant torus exists around some elliptic periodic orbits. This will be done for the forced pendulum equation and for the Michelson system.
8.1 Area preserving maps on the plane, normal forms and KAM theorem
Definition 14 Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be a smooth area preserving map such that $f(p) = p$. Let $\lambda$ and $\mu$ be the eigenvalues of $df(p)$. Following [SM] we will call the point $p$
• hyperbolic if $\lambda, \mu \in \mathbb{R}$ and $\lambda \neq \mu$,
• elliptic if $\lambda = \bar{\mu}$ and $\lambda \neq \mu$,
• parabolic if $\lambda = \mu$.
The following KAM theorem will be the main tool to prove the existence of invariant tori in this paper.
Theorem 15 [SM, §32] Consider an analytic area preserving map $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f(r, s) = (r_1, s_1)$, where
$$r_1 = r\cos\alpha - s\sin\alpha + O_{2l+2}, \qquad s_1 = r\sin\alpha + s\cos\alpha + O_{2l+2}, \qquad \alpha = \sum_{k=0}^{l} \gamma_k\,(r^2 + s^2)^k \tag{44}$$
and $O_{2l+2}$ denotes a convergent power series in $r, s$ with terms of order greater than $2l+1$ only. If at least one of $\gamma_1, \dots, \gamma_l$ is not zero, then the origin is a stable fixed point for the map $f$. Moreover, in any neighborhood $U$ of the point $0$ there exists an invariant curve for the map $f$ around the origin contained in $U$.
The next theorem and its proof tell us how to bring a planar area preserving map in the neighborhood of an elliptic fixed point into the form (44).
Theorem 16 [SM, §23] Consider an analytic area preserving map $f : \mathbb{R}^2 \to \mathbb{R}^2$ such that $f(0) = 0$. Let $\lambda, \bar{\lambda}$ be the complex eigenvalues of $Df(0)$, such that $|\lambda| = |\bar{\lambda}| = 1$. If $\lambda^k \neq 1$ for k = 1, . . .
, 2l+2, then there is an analytic area preserving substitution such that in the new coordinates the mapping $f$ has the form (44).
The proof of the above theorem is constructive, i.e. given the power series for $f$ at an elliptic fixed point, one can explicitly construct an area preserving substitution and compute the coefficients $\gamma_0, \dots, \gamma_l$ in (44). An explicit formula for the coefficient $\gamma_1$ in the above normal form is given in Appendix A.
8.2 The existence of invariant tori in the forced pendulum.
Consider the equation
$$\ddot{\theta} = -\sin(\theta) + \sin(\omega t) \tag{45}$$
Observe that (45) is Hamiltonian. Let us denote by $P_\omega : \mathbb{R}^2$ −→◦ $\mathbb{R}^2$ the Poincaré map for Equation (45) with a parameter $\omega$, i.e. $P_\omega = \varphi(2\pi/\omega, \cdot)$, where $\varphi : \mathbb{R} \times \mathbb{R}^2$ −→◦ $\mathbb{R}^2$ is a local flow induced by (45). Observe that (45) is nonautonomous, but it is equivalent to the first order system of autonomous ODEs given by
$$\frac{d}{dt}\theta = \dot{\theta}, \qquad \frac{d}{dt}\dot{\theta} = -\sin(\theta) + \sin(\omega t), \qquad \frac{d}{dt}t = 1. \tag{46}$$
In the sequel all rigorous computations for (45) will in fact be performed for the system (46). Observe that to any invariant closed curve for $P_\omega$ there corresponds an invariant 2-torus for (45).
Consider the sets of parameter values
$$\Omega_1 = [2, 2.994], \quad \Omega_2 = [3, 3.997], \quad \Omega_3 = [4, 8], \quad \Omega = \Omega_1 \cup \Omega_2 \cup \Omega_3.$$
The following lemma was proved with computer assistance.
Lemma 17 For all parameter values $\omega \in \Omega$ there exists an elliptic fixed point $x_\omega \in \mathbb{R}^2$ for $P_\omega$. Moreover, there exists an area-preserving substitution such that in the new coordinates the map $f_\omega(x) = P_\omega(x + x_\omega) - x_\omega$ has the form (44) with $l = 1$ and $\gamma_1 \neq 0$.
Before we give the proof, let us briefly comment on the choice of the parameter set $\Omega$. For parameter values slightly lower than 2 we observe the parabolic case, i.e. there exists a parameter value $\omega_1$ for which the eigenvalues of the derivative of $P_{\omega_1}$ are equal to $-1$. In the two gaps in $\Omega$ below 3 and 4 we have resonances of low order. Namely, we have parameter values with an elliptic fixed point with eigenvalues equal to $e^{\pm 2\pi i/3} = -\frac{1}{2} \pm \frac{\sqrt{3}}{2}i$ and $e^{\pm i\pi/2} = \pm i$, respectively.
Clearly, in a computer assisted proof we need to exclude a small interval around those parameters. For ω > 4 it seems that the interval Ω3 can be extended much further to the right without any difficulty.

Proof of Lemma 17: The computer assisted proof consists of the following steps. We cover the set Ω by 9910 subintervals ωi of unequal size. The diameters of the ωi's were relatively large for values far away from the parabolic cases and very small close to them. For a fixed subinterval ωi we proceed as follows.

1. Let ω̄ denote an approximate center of the interval ωi. We find an approximate fixed point for Pω̄ using the standard nonrigorous Newton method. Let us denote such a point by xi.
2. We define a box centered at xi, i.e. we set vi := xi + [−εi, εi]², where εi > 0 depends on the subinterval ωi; the values we used are from the interval [5 · 10⁻⁶, 3 · 10⁻³], depending on how close ωi is to the parameter values corresponding to parabolic cases.
3. Using the C1-Lohner algorithm we compute the Interval Newton operator [Mo, N, A] Ni := N(Pωi − Id, xi, vi) and verify that Ni ⊂ int vi. This proves that for all ω ∈ ωi there exists a unique fixed point xω ∈ Ni for Pω.
4. Using the C3-Lohner algorithm we compute a rigorous bound for Pωi(Ni) and D^α Pωi(Ni), $\alpha \in \mathbb{N}^2_1 \cup \mathbb{N}^2_2 \cup \mathbb{N}^2_3$. Hence, we obtain a rigorous bound for the coefficients in
$$ f_\omega(x) = \sum_{|\alpha|=1}^{3} \frac{1}{\alpha!}\, D^\alpha P(x_\omega)\, x^\alpha + O_4. $$
5. We show that an arbitrary matrix M ∈ DPωi(Ni) has a pair of complex eigenvalues λ, λ̄ which satisfy λ^k ≠ 1 for k = 1, . . . , 4. From Theorem 16 it follows that there exists an area-preserving substitution such that in the new coordinates the map fω for ω ∈ ωi has the form (44) with l = 1.
6. We compute a rigorous bound for γ0 and γ1 which appear in the formula (44) and verify that γ1 ≠ 0 holds for all ω ∈ ωi.
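The nonresonance test in step 5 can be illustrated with a short Python sketch. It assumes that the derivative matrix has determinant one (as for an area preserving map), so that its eigenvalues are e^{±iθ} with 2 cos θ = tr M whenever |tr M| < 2; then λ^k = 1 for some k ≤ 4 exactly when tr M ∈ {2, −2, −1, 0}. The function name, its arguments and the tolerance are hypothetical, and ordinary floating point is used here, so this is not the rigorous interval computation of the proof nor the CAPD implementation.

```python
import math

def is_nonresonant_elliptic(trace_lo, trace_hi, max_order=4, tol=1e-12):
    """Check an enclosure [trace_lo, trace_hi] of tr(M) for a 2x2 matrix M.

    Assumption: det(M) = 1, so the eigenvalues are exp(+-i*theta) with
    2*cos(theta) = tr(M) when |tr(M)| < 2.
    Returns True if every trace in the enclosure gives complex eigenvalues
    lambda with lambda**k != 1 for k = 1, ..., max_order.
    """
    # Orders k = 1 and k = 2 (lambda = 1 or lambda = -1) correspond to
    # tr(M) = 2 and tr(M) = -2; both are excluded by the strict bounds.
    if not (-2.0 < trace_lo <= trace_hi < 2.0):
        return False
    # For k >= 3 the resonant angles in (0, pi) are theta = 2*pi*j/k.
    for k in range(3, max_order + 1):
        for j in range(1, k):
            if 2 * j < k:  # keeps theta strictly between 0 and pi
                resonant_trace = 2.0 * math.cos(2.0 * math.pi * j / k)
                if trace_lo - tol <= resonant_trace <= trace_hi + tol:
                    return False
    return True
```

With max_order=4 the excluded traces are −1 and 0, which matches the two low order resonances e^{±2πi/3} and ±i mentioned in the comment on the choice of Ω; with max_order=6 the same function covers the condition used for the higher order normal form.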
The rigorous bounds for the values of γ1 on Ω are

γ1(Ω1) ⊂ [0.29930416771330087, 30.118260918229566],
γ1(Ω2) ⊂ [0.099747909112924596, 0.56550301088840627],
γ1(Ω3) ⊂ [0.18574835001593507, 0.4129279974577012].

The computer assisted proof of the above took approximately 95 minutes on a Pentium IV 3GHz processor.

As a straightforward consequence of Lemma 17 and Theorem 15 we obtain

Theorem 18 For all parameter values ω ∈ Ω there exists an elliptic fixed point xω ∈ R² for Pω. Moreover, any neighborhood of the point xω contains an invariant curve for Pω around xω.

8.3 Higher order normal forms.

In the previous section it was shown that C3 computations are sufficient to prove that for (45) a family of invariant tori exists. However, it may happen that the coefficient γ1 in the normal form vanishes. In this situation we may try to compute a higher order normal form. As an example we consider a pendulum with a different forcing term,

θ̈ = − sin(θ) + sin(ωt) + sin(2ωt). (47)

Theorem 19 Let Pω be the Poincaré map for (47). For all parameter values ω ∈ Ω∗ = [2.9957694795, 2.9957694796] there exists an elliptic fixed point xω ∈ R² for Pω. Moreover, any neighbourhood of the point xω contains an invariant curve for Pω around xω.

Proof: The main concept of the proof is the same as in Lemma 17. Using the nonrigorous Newton method we find an approximate fixed point x = (−7.7491573604896152 · 10⁻¹², −0.54723831527031352). We set v = x + 3 · 10⁻⁵ ([−1, 1] × [−1, 1]). Using the C1-Lohner algorithm we compute the Interval Newton Operator of Pω − Id on v and we obtain that for all ω ∈ Ω∗, N = N(Pω − Id, center(v), v) ⊂ (N1, N2), where

N1 = [−5.1582932672798325, 5.1582631625020222] · 10⁻¹⁰,
N2 = [−0.54723831580217108, −0.54723831470891193].

Since N ⊂ v we conclude that for all ω ∈ Ω∗ there exists a unique fixed point xω ∈ N for the Poincaré map. Using the C5-Lohner algorithm we compute a rigorous bound for PΩ∗(N) and D^α PΩ∗(N), $\alpha \in \mathbb{N}^2_1 \cup \ldots \cup \mathbb{N}^2_5$.
Hence, we obtain a rigorous bound for the coefficients in

$$ f_\omega(x) = \sum_{|\alpha|=1}^{5} \frac{1}{\alpha!}\, D^\alpha P(x_\omega)\, x^\alpha + O_6. $$

We show that an arbitrary matrix M ∈ DPΩ∗(N) has a pair of complex eigenvalues λ, λ̄ which satisfy λ^k ≠ 1 for k = 1, . . . , 6. From Theorem 16 it follows that there exists an area-preserving substitution such that in the new coordinates the map fω for ω ∈ Ω∗ has the form (44) with l = 2. Next, we compute a rigorous bound for γ1 and γ2 which appear in the formula (44) and we get

γ1(Ω∗) ⊂ [−5.3924276719042241, 5.381714805052106] · 10⁻⁶,
γ2(Ω∗) ⊂ [199.95180660157078, 199.99104965939162].

Since γ2(ω) ≠ 0 for ω ∈ Ω∗, the assertion follows from Theorem 15.

The main observation which makes this example interesting is that there exists ω∗ ∈ Ω∗ for which γ1(ω∗) = 0, and hence we cannot conclude the existence of invariant tori for all ω ∈ Ω∗ from C3 computations. To be more precise, we computed the coefficient γ1 for the parameter values ω1 = min Ω∗ and ω2 = max Ω∗ and we get

γ1(ω1) ∈ [−2.3559594437885885, −1.3593457220363871] · 10⁻⁸,
γ1(ω2) ∈ [2.9671154858524365 · 10⁻⁹, 1.2819312939263052 · 10⁻⁸].

Since γ1 exists for all ω ∈ Ω∗ and depends continuously on ω, and since it is negative at ω1 and positive at ω2, we conclude that γ1(ω∗) = 0 for some ω∗ ∈ Ω∗.

8.4 Application to the Michelson system

The existence of an invariant curve for a planar map f : R² → R² can be proven without the assumption that f is measure preserving. The key assumption in the proof given in [SM] is that any curve γ around an elliptic point intersects its image under f, i.e. f(γ) ∩ γ ≠ ∅. Such a situation is also observed in reversible planar maps around symmetric elliptic fixed points.

Definition 20 An invertible transformation S : Ω −→ Ω is called a reversing symmetry of a local dynamical system φ : T × Ω −→ Ω, T = R or T = Z, if the following conditions are satisfied:
1. if (t, x) ∈ dom (φ) then (−t, S(x)) ∈ dom (φ),
2. S(φ(t, x)) = φ(−t, S(x)).

Remark 21 In the discrete time case, the above two conditions are equivalent to the identity S ◦ f = f⁻¹ ◦ S,
where f = φ(1, ·) is a generator of φ.

Definition 22 Let φ : T × Ω → Ω be a local (discrete or continuous) dynamical system. For x ∈ Ω put

I(x) = {t ∈ T : (t, x) ∈ dom (φ)},
O(x) = {φ(t, x) ∈ Ω : t ∈ I(x)}.

The set O(x) will be called a trajectory of a point x.

Definition 23 Assume S is a reversing symmetry for φ : T × Ω → Ω. An orbit O(x) is called an S-symmetric orbit if S(O(x)) = O(x).

Remark 24 [La] In the continuous case the orbit O(x) is S-symmetric if it contains a point from the set Fix(S) = {y : S(y) = y}.

Remark 25 [Wi2, Lem.3.3] It is easy to see that if Θ ⊂ Ω is a Poincaré section for an R-reversible flow φ : R × Ω → Ω such that Θ = R(Θ), then the Poincaré map P : Θ → Θ is R|Θ-reversible.

As we observed at the beginning of this section, an R-reversible planar map may admit an invariant curve around an R-symmetric elliptic fixed point. In the reversible case a planar map admits the same normal form around a symmetric elliptic fixed point as in the area-preserving case, and the substitution which brings the map to the normal form is exactly the same as the one described in Appendix A; for details see [Se, BHS].

Consider the ODE

$$ \dot x = y, \qquad \dot y = z, \qquad \dot z = c^2 - y - \tfrac{1}{2}x^2. \tag{48} $$

On one hand, the system (48) is an equation for the steady state solution of the one-dimensional Kuramoto-Sivashinsky PDE and it is known in the literature as the Michelson system [Mi]. On the other hand, this system appears as a part of the limit family of the unfolding of the nilpotent singularity of codimension three (see [DIK1]). The system (48) is reversible with respect to the symmetry

R : (x, y, z, t) → (−x, y, −z, −t) (49)

and since the divergence vanishes it is also volume preserving. The dynamical system induced by (48) exhibits several types of dynamics for different values of the parameter. For sufficiently large c there is a simple invariant set consisting of two equilibria (±c√2, 0, 0) and a heteroclinic orbit between them [MC].
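The structural properties of the Michelson system quoted above can be checked numerically with a few lines of Python. The sketch assumes the standard form of the vector field, ẋ = y, ẏ = z, ż = c² − y − x²/2, which is consistent with the equilibria (±c√2, 0, 0); the function names are hypothetical, and the check uses plain floating point rather than rigorous arithmetic.

```python
import math

def michelson(u, c):
    """Vector field of the Michelson system (assumed standard form)."""
    x, y, z = u
    return (y, z, c * c - y - 0.5 * x * x)

def reverse(u):
    """Spatial part of the reversing symmetry R: (x, y, z) -> (-x, y, -z)."""
    x, y, z = u
    return (-x, y, -z)

def reversibility_defect(u, c):
    """Largest component of f(R(u)) + R(f(u)); zero for an R-reversible field.

    Reversibility of the flow under (u, t) -> (R(u), -t) is equivalent to
    f(R(u)) = -R(f(u)) for the vector field f.
    """
    lhs = michelson(reverse(u), c)
    rhs = tuple(-w for w in reverse(michelson(u, c)))
    return max(abs(a - b) for a, b in zip(lhs, rhs))

def equilibria(c):
    """The two equilibria (+-c*sqrt(2), 0, 0) of the assumed vector field."""
    r = c * math.sqrt(2.0)
    return [(r, 0.0, 0.0), (-r, 0.0, 0.0)]
```

The divergence needs no code: none of the three components depends on its own variable, so ∂ẋ/∂x + ∂ẏ/∂y + ∂ż/∂z = 0 identically, which is the volume preservation noted in the text.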
Lau [Lau] numerically observed that when the parameter c decreases a cascade of cocoon bifurcations occurs and at the limit value c ≈ 1.266232337 a periodic orbit is born through a saddle-node bifurcation. This hypothesis has been proved in [KWZ]. The computer assisted proof of this fact given in [KWZ] uses the algorithm presented in this paper in order to compute partial derivatives up to second order for a certain Poincaré map.

For the parameter value equal to one and slightly smaller than one it was proven in [DIK2, Wi1, Wi2, Wi3] that the system has rich and complicated dynamics including symbolic dynamics, heteroclinic solutions and Shilnikov homoclinic solutions. However, as the bifurcation diagram presented by Michelson [Mi, Fig.1] suggests, for all parameter values c ∈ (0, 0.3195) there are at least two elliptic periodic orbits with large invariant islands around them. In this section we present a proof that such islands exist for some range of parameter values.

The main idea of the proof is almost the same as in the previous section. There are two main differences. First, the Poincaré map will not be a time shift. Therefore computations of the partial derivatives of the Poincaré map require Lemma 12 and Lemma 13. Second, we use the shooting method instead of the interval Newton method for the proof of the existence of a symmetric periodic orbit. The aim of this section is to prove the following

Theorem 26 For all parameter values from the set C = C1 ∪ C2 = [0.1, 0.225] ∪ [0.226, 0.25] there exists a symmetric elliptic periodic orbit for the Michelson system (48). Moreover, each neighbourhood of such an orbit contains a 2D torus invariant under the flow generated by the Michelson system.

Let us define the Poincaré section Π := {(0, y, z) : y, z ∈ R}. Let Pc = (P1, P2) : Π −→◦ Π be the Poincaré map for the system with the parameter value c.
Notice that Pc is in fact a half Poincaré map, which means that the trajectory of x crosses Π in opposite directions when passing through x and Pc(x), and therefore periodic orbits for the Michelson system correspond to periodic points for $P_c^2$. Since the section Π is invariant under the symmetry (x, y, z) → (−x, y, −z), from Remark 25 the Poincaré map is also reversible with respect to the involution R(y, z) = (y, −z). We will use the same letter R to denote the reversing symmetry of the Poincaré map and of the Michelson system.

Let us comment on the choice of the set C. In the gap between the intervals C1 and C2 there is a parameter value c∗ for which the eigenvalues of the Poincaré map $P_{c^*}^2$ are ±i. Apparently at this parameter value we have a bifurcation and four periodic islands are born, as shown in Fig. 1; see also the movie mpp.mov available at [Wi4], which presents an animation of the phase portrait of Pc for parameter values from the range [0.1, 0.25].

Proof of Theorem 26. The main concept of the proof is quite similar to the one presented in Lemma 17. We divide the set C of parameter values into 20800 parts of unequal size (smaller when close to the bifurcation parameter c∗ and close to 0.1 and 0.25). For a fixed subinterval ci from the grid we proceed as follows.

1. Let c̄ denote the center of the interval ci. We find an approximate fixed point of $P_{\bar c}^2$ using the standard nonrigorous Newton method. Let us denote this point by (yi, zi).
2. Since the map Pc is reversible, one can prove the existence of the fixed point for $P_c^2$ using the shooting method as follows. Let Fix(R) = {(y, z) ∈ Π : R(y, z) = (y, z)} = {(y, 0) ∈ Π : y ∈ R}. Since Pc satisfies (Pc ◦ R)² = Id whenever the left side is defined, one can see that if x ∈ Fix(R) and Pc(x) ∈ Fix(R) then $P_c^2(x) = x$. Let us remark that the approximate fixed points (yi, zi) resulting from the nonrigorous Newton method are always very close to Fix(R).
We define two points u1 = (yi − εi, 0), u2 = (yi + εi, 0) ∈ Fix(R), where εi is a small number depending on ci, and we show that πz(Pci(u1)) · πz(Pci(u2)) < 0, where πz is the projection onto the z coordinate. Hence, if Pci is defined on the set Ni = (0, [yi − εi, yi + εi], 0), then for all parameter values c ∈ ci there is a point uc ∈ Ni which satisfies πz(Pc(uc)) = 0 and therefore Pc(uc) ∈ Fix(R). This shows that for all c ∈ ci there exists a fixed point for $P_c^2$ inside Ni provided Pc is defined on Ni, which will be discussed below.
3. Using the C3-Lohner algorithm we compute rigorous bounds for $P_{c_i}^2(N_i)$ and $D^\alpha P_{c_i}^2(N_i)$ for $\alpha \in \mathbb{N}^2_1 \cup \mathbb{N}^2_2 \cup \mathbb{N}^2_3$. This implies also that Ni ⊂ dom Pci.
4. We show that an arbitrary matrix M ∈ DPci(Ni) has a pair of complex eigenvalues λ, λ̄ which satisfy λ^k ≠ 1 for k = 1, . . . , 4. From Theorem 16 it follows that there exists an area-preserving substitution such that in the new coordinates the map Pc for c ∈ ci has the form (44) with l = 1.
5. We compute a rigorous bound for γ0 and γ1 which appear in the formula (44) and verify that γ1 ≠ 0 holds for all c ∈ ci.

The rigorous bounds for the values of γ1 on C are

γ1(C1) ⊂ [0.014515898754816965, 157.76639522562903],
γ1(C2) ⊂ [1.1002393483255526, 151.35147664498677].

The computer assisted proof of the above took approximately 7 hours and 50 minutes on a Pentium IV 3GHz processor.

Figure 1: Phase portrait of the Poincaré map Pc (top) before bifurcation for c = 0.225 and (bottom) after bifurcation for c = 0.226 with four periodic islands. Between those parameters a resonant case occurs with eigenvalues equal to ±i. See also auxiliary material [Wi4].

9 Implementation notes.

All the algorithms presented in this paper have been implemented in C++ by the authors and are part of the CAPD library [CAPD].
In particular, the package implements the computation of partial derivatives of a flow with respect to initial conditions, partial derivatives of Poincaré maps for linear sections and computations of normal forms for planar maps up to order 5. The implementation combines automatic and symbolic differentiation in order to generate the coefficients in the Taylor series for the solutions of the system (14). Our tests show that without difficulty we can compute partial derivatives up to order 3 for an equation in 8-dimensional phase space (which gives 1320 equations to solve) on a computer with 512MB memory.

However, our current implementation is optimized for lower dimensional problems. All the trees which represent formulas (14) are stored in the memory of a computer. This speeds up computations because we do not need to recompute all the multiindices, multipointers and submultipointers in each step of the algorithm. Unfortunately, such an implementation is memory-consuming. Therefore, higher dimensional problems require a computer with huge memory even for C3 or C5 computations.

A Explicit formulas for third order normal forms for a planar map

The goal of this section is to give some details about the proof of Theorem 16. We want to present some formulas to give the reader a feeling for the necessary computations. Throughout this section we assume that the assumptions of Theorem 16 are satisfied. In the neighbourhood of 0, f is given by a real, convergent power series

$$ f(x,y) = (x_1,y_1), \qquad x_1 = \sum_{k=1}^{\infty}\sum_{l=0}^{k} a_{k-l,l}\,x^l y^{k-l}, \qquad y_1 = \sum_{k=1}^{\infty}\sum_{l=0}^{k} b_{k-l,l}\,x^l y^{k-l}. $$

Denote also by f : C² → C² a complex extension of f. Let λ, λ̄ ∈ C be complex eigenvalues of Df(0) and v, v̄ ∈ C² the corresponding eigenvectors (here the bar denotes complex conjugation). Then, using a linear substitution of the form L = [v^T, v̄^T], we can change the coordinate system such that in the new coordinates the mapping f has the form

$$ f(\xi,\eta) = \bigl(\lambda\xi + p(\xi,\eta),\ \bar\lambda\eta + q(\xi,\eta)\bigr), \qquad p(\xi,\eta) = \sum_{k=2}^{\infty}\sum_{l=0}^{k} p_{l,k-l}\,\xi^l\eta^{k-l}, \qquad q(\xi,\eta) = \sum_{k=2}^{\infty}\sum_{l=0}^{k} q_{l,k-l}\,\xi^l\eta^{k-l}, $$

$$ p_{i,j} = \overline{q_{j,i}} \quad \text{for } i, j \ge 0. \tag{50} $$
The last condition is a consequence of the invariance of R² ⊂ C² under the complex map f. We will refer to it as the reality condition. Namely, the set R² ⊂ C² in the new coordinates (ξ, η) is given by ξ = η̄, and the condition f(R²) ⊂ R² expressed in the coordinates (ξ, η) is equivalent to (50).

Assume now that λ^k ≠ 1 for k = 1, . . . , 4. Then the analytic area-preserving substitution satisfying the reality condition (50)

$$ (\Phi(z,v),\Psi(z,v)) = (z_1,v_1), \qquad z_1 = z + \sum_{k=2}^{3}\sum_{l=0}^{k}\phi_{l,k-l}\,z^l v^{k-l} + \cdots, \qquad v_1 = v + \sum_{k=2}^{3}\sum_{l=0}^{k}\psi_{l,k-l}\,z^l v^{k-l} + \cdots, $$

where

$$ \begin{aligned}
\overline{\psi_{2,0}} = \phi_{0,2} &= -\lambda^2 p_{0,2}\,(\lambda^3-1)^{-1},\\
\overline{\psi_{1,1}} = \phi_{1,1} &= -p_{1,1}\,(\lambda-1)^{-1},\\
\overline{\psi_{0,2}} = \phi_{2,0} &= p_{2,0}\,(\lambda^2-\lambda)^{-1},\\
\overline{\psi_{3,0}} = \phi_{0,3} &= -\lambda^3\,(p_{0,3} + p_{1,1}\phi_{0,2} + 2p_{0,2}\psi_{0,2})\,(\lambda^4-1)^{-1},\\
\overline{\psi_{2,1}} = \phi_{1,2} &= -\lambda\,(\lambda^2-1)^{-1}\,(p_{1,2} + 2p_{2,0}\phi_{0,2} + p_{1,1}\phi_{1,1} + p_{1,1}\psi_{0,2} + 2p_{0,2}\psi_{1,1}),\\
\overline{\psi_{1,2}} = \phi_{2,1} &= -\phi_{2,0}\psi_{0,2} + \phi_{0,2}\psi_{2,0},\\
\overline{\psi_{0,3}} = \phi_{3,0} &= (p_{3,0} + 2p_{2,0}\phi_{2,0} + p_{1,1}\psi_{2,0})\,(\lambda^3-\lambda)^{-1},
\end{aligned} $$

brings f = (f1, f2) to the normal form

$$ (z,v) \to \bigl(z(\alpha_0+\alpha_2 zv),\ v(\beta_0+\beta_2 zv)\bigr) + O((zv)^2), $$

$$ \overline{\beta_0} = \alpha_0 = \lambda, \qquad \beta_2 = \overline{\alpha_2} = q_{1,2} + 2q_{2,0}\phi_{0,2} + q_{1,1}\phi_{1,1} + q_{1,1}\psi_{0,2} + 2q_{0,2}\psi_{1,1}. $$

Finally, let γ0 ∈ R be such that λ = α0 = e^{iγ0}, and we compute the coefficient γ1 by

$$ \gamma_1 = -i\,\alpha_2/\alpha_0. $$

From the proof given in [SM] it follows that γ1 ∈ R and the mapping f in the coordinates (z, v) has the form

$$ f(z,v) = \bigl(z\,e^{i(\gamma_0+\gamma_1 zv)},\ v\,e^{-i(\gamma_0+\gamma_1 zv)}\bigr) + O_4, $$

where O4 is a convergent power series with terms of degree at least 4. Again, the coefficients of f(z, v) satisfy the reality condition (50). In order to express this normal form in terms of real variables we make the linear substitution z = r + is, v = r − is and we obtain the normal form for f

$$ f(r,s) = (r_1,s_1) + O_4, \qquad r_1 = r\cos(\gamma_0+\gamma_1(r^2+s^2)) - s\sin(\gamma_0+\gamma_1(r^2+s^2)), \qquad s_1 = r\sin(\gamma_0+\gamma_1(r^2+s^2)) + s\cos(\gamma_0+\gamma_1(r^2+s^2)), $$

which agrees with (44). The formulas for the higher order terms φi,j, ψi,j and for γ2 (which are not given here) have been computed in Mathematica.

References

[A] G. Alefeld, Inclusion methods for systems of nonlinear equations - the interval Newton method and modifications, in Topics in Validated Computations, J. 
Herzberger (Editor), Elsevier Science B.V., 1994.
[BHS] H.W. Broer, G.B. Huitema and M.B. Sevryuk, Quasi-periodicity in families of dynamical systems: order amidst chaos, Lecture Notes in Mathematics, Vol. 1645, Springer Verlag, (1996).
[BM] M. Berz, K. Makino, New Methods for High-Dimensional Verified Quadrature, Reliable Computing, 5, 13-22 (1999).
[CAPD] CAPD – Computer Assisted Proofs in Dynamics group, a C++ package for rigorous numerics, http://capd.wsb-nlu.edu.pl.
[CC1] A. Celletti, L. Chierchia, KAM Stability for three-body problem of the Solar System, Z. angew. Math. Phys. 57 (2006) 33-41.
[CC2] A. Celletti, L. Chierchia, KAM Stability and Celestial Mechanics, Memoirs of the AMS, Vol 187, Num 878 (2007).
[DIK1] F. Dumortier, S. Ibáñez, and H. Kokubu, New aspects in the unfolding of the nilpotent singularity of codimension three, Dynam. Syst. 16 (2001), 63–95.
[DIK2] F. Dumortier, S. Ibáñez, and H. Kokubu, Cocoon bifurcation in three dimensional reversible vector fields, Nonlinearity 19 (2006), 305–328.
[GZ] Z. Galias, P. Zgliczyński, Computer assisted proof of chaos in the Lorenz system, Physica D, 115, 1998, 165–188.
[HNW] E. Hairer, S.P. Nørsett, G. Wanner, Solving Ordinary Differential Equations I, Nonstiff Problems, Springer-Verlag, Berlin Heidelberg 1987.
[HZHT] B. Hassard, J. Zhang, S. Hastings, W. Troy, A computer proof that the Lorenz equations have "chaotic" solutions, Appl. Math. Letters, 7 (1994), 79–83.
[KZ] T. Kapela, P. Zgliczyński, The existence of simple choreographies for N-body problem - a computer assisted proof, Nonlinearity, 16 (2003), 1899-1918.
[KWZ] H. Kokubu, D. Wilczak, P. Zgliczyński, Rigorous verification of cocoon bifurcations in the Michelson system, http://www.ii.uj.edu.pl/~wilczak, submitted.
[La] J.S.W. 
Lamb, Reversing symmetries in dynamical systems, PhD Thesis, Amsterdam University, 1994.
[Lau] Y-T Lau, The "cocoon" bifurcations in three-dimensional systems with two fixed points, Int. Jour. Bif. Chaos, Vol.2, No.3 (1992) 543-558.
[Lo] R.J. Lohner, Computation of Guaranteed Enclosures for the Solutions of Ordinary Initial and Boundary Value Problems, in: Computational Ordinary Differential Equations, J.R. Cash, I. Gladwell Eds., Clarendon Press, Oxford, 1992.
[MC] C.K. McCord, Uniqueness of connecting orbits in the equation Y (3) = Y 2 − 1, J. Math. Anal. Appl. 114, 584-592.
[Mi] D. Michelson, Steady solutions of the Kuramoto–Sivashinsky equation, Physica D, 19, (1986) 89-111.
[Mo] R.E. Moore, Interval Analysis. Prentice Hall, Englewood Cliffs, N.J.
[MM] K. Mischaikow, M. Mrozek, Chaos in the Lorenz equations: A computer assisted proof. Part II: Details, Mathematics of Computation, 67, (1998), 1023–1046.
[MZ] M. Mrozek, P. Zgliczyński, Set arithmetic and the enclosing problem in dynamics, Annales Pol. Math., 2000, 237–259.
[NJ] N. S. Nedialkov, K. R. Jackson, An Interval Hermite – Obreschkoff Method for Computing Rigorous Bounds on the Solution of an Initial Value Problem for an Ordinary Differential Equation, chapter in the book Developments in Reliable Computing, editor T. Csendes, 289-310, Kluwer, Dordrecht, Netherlands, 1999.
[N] A. Neumaier, Interval methods for systems of equations, Cambridge University Press, 1990.
[RNS] T. Rage, A. Neumaier, C. Schlier, Rigorous verification of chaos in a molecular model, Phys. Rev. E 50 (1994), 2682–2688.
[Ra] L.B. Rall, Automatic Differentiation: Techniques and Applications, volume 120 of Lecture Notes in Computer Science. Springer Verlag, Berlin, 1981.
[Se] M.B. Sevryuk, Reversible systems, Lecture Notes in Mathematics, 1211. Springer-Verlag, Berlin, 1986.
[SM] C.L. Siegel, J.K. 
Moser, Lectures on Celestial Mechanics, Springer-Verlag Berlin Heidelberg New York, 1995.
[SK] D. Stoffer, U. Kirchgraber, Possible chaotic motion of comets in the Sun Jupiter system - an efficient computer-assisted approach, Nonlinearity, 17 (2004) 281-300.
[T] W. Tucker, A Rigorous ODE solver and Smale's 14th Problem, Foundations of Computational Mathematics, (2002), Vol. 2, Num. 1, 53-117.
[W] W. Walter, Differential and integral inequalities, Springer-Verlag Berlin Heidelberg New York, 1970.
[Wi1] D. Wilczak, The existence of Shilnikov homoclinic orbits in the Michelson system: a computer assisted proof, Found. Comp. Math. Vol.6, No.4, 495-535, (2006).
[Wi2] D. Wilczak, Chaos in the Kuramoto–Sivashinsky equations – a computer assisted proof, J. Diff. Eqns., 194, 433-459 (2003).
[Wi3] D. Wilczak, Symmetric heteroclinic connections in the Michelson system – a computer assisted proof, SIAM J. App. Dyn. Sys., Vol.4, No.3, 489-514 (2005).
[Wi4] D. Wilczak, http://www.ii.uj.edu.pl/~wilczak, a reference for auxiliary materials.
[WZ] D. Wilczak, P. Zgliczyński, Heteroclinic Connections between Periodic Orbits in Planar Restricted Circular Three Body Problem - A Computer Assisted Proof, Commun. Math. Phys. 234, 37-75 (2003).
[Z1] P. Zgliczyński, Computer assisted proof of chaos in the Hénon map and in the Rössler equations, Nonlinearity, 1997, Vol. 10, No. 1, 243–252.
[Z] P. Zgliczyński, C1-Lohner algorithm, Foundations of Computational Mathematics, (2002) 2:429–465.
[ZPer] P. Zgliczyński, Lohner Algorithm for perturbations of ODEs and differential inclusions, http://www.ii.uj.edu.pl/~zgliczyn
ABSTRACT We present a Lohner type algorithm for the computation of rigorous bounds for solutions of ordinary differential equations and its derivatives with respect to initial conditions up to arbitrary order. As an application we prove the existence of multiple invariant tori around some elliptic periodic orbits for the pendulum equation with periodic forcing and for Michelson system. <|endoftext|><|startoftext|> Mon. Not. R. Astron. Soc. 000, 1–8 (2006) Printed 1 November 2018 (MN LATEX style file v2.2) Timing evidence in determining the accretion state of the Seyfert galaxy NGC 3783 D. P. Summons1,3⋆, P. Arévalo1 , I. M. McHardy1, P. Uttley2 and A. Bhaskar3 1School of Physics and Astronomy, University of Southampton, Southampton SO17 1BJ, UK 2Astronomical Institute ‘Anton Pannekoek’, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, the Netherlands 3School of Engineering Science, University of Southampton, Southampton SO17 1BJ, UK Received /Accepted ABSTRACT Previous observations with the Rossi X-ray Timing Explorer (RXTE) have suggested that the power spectral density (PSD) of NGC 3783 flattens to a slope near zero at low frequencies, in a similar manner to that of Galactic black hole X-ray binary systems (GBHs) in the ‘hard’ state. The low radio flux emitted by this object, however, is inconsistent with a hard state interpretation.
The accretion rate of NGC 3783 (∼ 7% of the Eddington rate) is similar to that of other AGN with ‘soft’ state PSDs and higher than that at which the GBH Cyg X-1, with which AGN are often compared, changes between ‘hard’ and ‘soft’ states (∼ 2% of the Eddington rate). If NGC 3783 really does have a ‘hard’ state PSD, it would be quite unusual and would indicate that AGN and GBHs are not quite as similar as we currently believe. Here we present an improved X-ray PSD of NGC 3783, spanning from ∼ 10⁻⁸ to ∼ 10⁻³ Hz, based on considerably extended (5.5 years) RXTE observations combined with two orbits of continuous observation by XMM-Newton. We show that this PSD is, in fact, well fitted by a ‘soft’ state model which has only one break, at high frequencies. Although a ‘hard’ state model can also fit the data, the improvement in fit by adding a second break at low frequency is not significant. Thus NGC 3783 is not unusual. These results leave Arakelian 564 as the only AGN which shows a second break at low frequencies, although in that case the very high accretion rate implies a ‘very high’, rather than ‘hard’ state PSD. The break frequency found in NGC 3783 is consistent with the expectation based on comparisons with other AGN and GBHs, given its black hole mass and accretion rate.

Key words: galaxies: active – galaxies: Seyfert – galaxies: NGC 3783 – X rays: galaxies

1 INTRODUCTION

Super-massive black holes in active galactic nuclei (AGN) and Galactic stellar-mass black hole X-ray binary systems (GBHs) both display aperiodic X-ray variability which may be quantified by calculating the power spectral densities (PSDs) of the X-ray light curves. The PSDs can typically be represented by red-noise type power laws (i.e. P(ν), the power at frequency ν, ∝ ν^α where α ∼ -1) with a bend or break (to α ≤ -2) at a characteristic PSD frequency.
The time-scale, corresponding to the bend-frequency, scales approximately linearly with black hole mass from AGN to GBHs (McHardy 1988; Edelson & Nandra 1999; Uttley et al. 2002, 2005; Markowitz et al. 2003; McHardy et al. 2004, 2005), albeit with some scatter. However, the scatter is entirely accounted for by variations in accretion rate, allowing scaling between AGN and GBHs on time-scales from ∼ years to ∼ ms (McHardy et al. 2006).

⋆ E-mail: dps@astro.soton.ac.uk
© 2006 RAS. http://arxiv.org/abs/0704.0721v1

GBHs are observed in a number of distinct X-ray spectral states which also have distinct X-ray timing properties. Two common states are the low/hard (hereafter ‘hard’) and high/soft (hereafter ‘soft’) states. In the hard state, the energy-spectrum is dominated by a highly variable power law component and the PSDs are well fitted by multiple broad Lorentzians. For use in AGN, where signal/noise is lower than in GBHs, this PSD shape can be approximated by a double-bend power law with slopes α = 0, -1 and -2, from low to high frequency, where the high- and low-frequency bends correspond to the strongest peaks in the Lorentzian parameterisation. The breaks are typically separated by only one to two decades in frequency. In the soft state, the energy spectrum is dominated by an approximately constant thermal disc component which extends into the X-ray band in GBHs but which in AGN is shifted down to the optical/UV band. Therefore, a meaningful comparison between the PSDs of soft state GBH and AGN can only be made in cases where the GBH power-law emission is strong enough to show significant variability. Such GBHs are rare and the best example is Cyg X-1 which shows a ‘1/f’ PSD over many decades of frequencies (Reig et al. 2002). The soft state is distinguished from the hard state by having only one, high frequency, break in this power law, from slope -1 to -2.
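The two PSD shapes described above can be written down as a short Python sketch. It assumes a commonly used smooth parameterisation in which each bend contributes a factor 1/(1 + (ν/ν_b)^Δα); this functional form, the function names, the normalisation and the default bend frequencies are illustrative assumptions and are not taken from the fits in this paper.

```python
import math

def bending_power_law(nu, norm, slopes, bends):
    """Power law that bends from slopes[i] to slopes[i + 1] at bends[i].

    slopes: power-law indices from low to high frequency.
    bends:  bend frequencies in ascending order (same units as nu).
    """
    if len(slopes) != len(bends) + 1:
        raise ValueError("need exactly one more slope than bend frequencies")
    power = norm * nu ** slopes[0]
    for nu_b, a_low, a_high in zip(bends, slopes[:-1], slopes[1:]):
        # Each factor steepens the slope by (a_low - a_high) above nu_b.
        power /= 1.0 + (nu / nu_b) ** (a_low - a_high)
    return power

def soft_state_psd(nu, norm=1.0, nu_high=4e-6):
    """Single-bend model: slope -1 below nu_high, slope -2 above it."""
    return bending_power_law(nu, norm, [-1.0, -2.0], [nu_high])

def hard_state_psd(nu, norm=1.0, nu_low=2e-7, nu_high=4e-6):
    """Double-bend model: slopes 0, -1 and -2 from low to high frequency."""
    return bending_power_law(nu, norm, [0.0, -1.0, -2.0], [nu_low, nu_high])

def local_slope(psd, nu, step=1.01):
    """Finite-difference estimate of d log P / d log nu at frequency nu."""
    return (math.log(psd(nu * step)) - math.log(psd(nu))) / math.log(step)
```

Evaluating local_slope well below, between and well above the bend frequencies recovers the slopes 0, -1 and -2 of the hard state shape, and -1 and -2 for the soft state shape, so the two models differ only in whether the low-frequency flattening is present.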
It has been suggested that this pure simple broken or cut-off power-law PSD shape is unique to the soft state of Cyg X-1, which is a persistent source. However in transient GBHs with similar X-ray spectra, the power law PSD component may be seen in combination with broad Lorentzian features (Done & Gierliński 2005). Axelsson et al. (2006) also note that a mixed power law plus Lorentzian PSD is also present in Cyg X-1 in lower luminosity, harder spectral states, but as the luminosity rises the Lorentzian features weaken and the power law PSD component strengthens until, in the softest state, it completely dominates. Since the softest spectral states of transient GBHs are dominated by constant disc emission we cannot determine whether they show a similar PSD shape to Cyg X-1.

However, a direct comparison of transient GBHs and Cyg X-1 is complex, since the transients show much larger luminosity changes, and complex hysteresis effects in spectral hardness versus luminosity (e.g. Homan et al. 2001, Belloni et al. 2005) which are not seen in Cyg X-1. Therefore it is not clear that one can compare timing properties between Cyg X-1 and transient GBHs simply as a function of observed X-ray spectrum.

None the less, it is still interesting that the X-ray spectrum of Cyg X-1 never becomes totally disc-dominated, and always contains a relatively strong variable component whose PSD resembles that of X-ray bright AGN. If variability originates, at least partly, in the disc, so power spectral shape is related to the disc structure, that structure might be severely disrupted during outbursts, thereby suggesting a possible difference between the persistent Cyg X-1 and the transient GBH sources. The similarities between the PSDs of Cyg X-1 and AGN may also be related to the possible similarities in accretion flows between AGN and Cyg X-1 noted by Done & Gierliński (2005).
To date, NGC 3783 and the Narrow Line Seyfert 1 Galaxy (NLS1) Ark 564 are the only AGN with suggested second, low-frequency breaks in their PSDs (i.e. similar to low/hard GBHs) and are both commonly referred to as being unusual (e.g. Done & Gierliński 2005). The power spectral evidence for a second break in the case of Ark 564 is very strong (Pounds et al. 2001; Papadakis et al. 2002; Markowitz et al. 2003, McHardy et al. in prep.). Of all the AGN with good timing data, Ark 564 shows the highest accretion-rate (possibly super-Eddington) so it would not be surprising if it were in an unusual state, e.g. the ‘very high’ state where the PSD, in GBHs, also displays two distinct breaks.

The properties of NGC 3783, on the other hand, are similar to those of AGN with proven soft-state PSDs (e.g. NGC 3227 and NGC 4051, McHardy et al. 2004; MCG-6-30-15, McHardy et al. 2005), and in particular it is radio quiet (e.g. Reynolds 1997). In the hard state, GBHs are strong radio sources whereas in the soft state the radio emission is quenched (Corbel et al. 2000; Fender 2001; Körding et al. 2006). We also note that NGC 3783 has a more moderate accretion rate than Ark 564 (∼ 7%), and more similar to the other AGN mentioned above, and Cyg X-1 changes from the hard to the soft state at around 2% of the Eddington accretion rate (i.e. ṁE = 0.02) (Pottschmidt et al. 2003; Wilms et al. 2006; Axelsson et al. 2006). These two facts do not sit easily with a hard state identification of NGC 3783. Thus it would be surprising, and might indicate that our current ideas regarding the scaling between AGN and GBHs are not entirely correct, if NGC 3783 were proven to have a hard state PSD. It is therefore important to determine whether NGC 3783 does have a second, low frequency, break in its PSD or not.

Markowitz et al. (2003) recognised the presence of a break in the 2−10 keV PSD of NGC 3783 at 4 × 10⁻⁶ Hz and found provisional evidence for a second lower-frequency break at ∼ 2 × 10⁻⁷ Hz.
Specifically, Markowitz et al. (2003) rejected the possibility that the PSD is described by a single-break power law with low-frequency slope -1, similar to other AGN, at the 98% confidence level. In this paper we re-investigate the evidence for the second break in the PSD of NGC 3783, using new long-term monitoring data that cover the frequency range where the break appears to be. By including additional RXTE archival data spanning several years, along with short time-scale observations by XMM-Newton, we will demonstrate that the improved PSD is perfectly compatible with a single-bend power law, consistent with the behaviour of the other moderately-accreting Seyferts.

In Section 2 we describe the observations and the methods by which we extract the RXTE and XMM-Newton light curves. In Section 3 we discuss the PSD of NGC 3783 as produced from the RXTE and XMM-Newton observations, and compare it with various PSD models. In Section 4 we briefly review the implications of our observations.

2 OBSERVATIONS AND DATA REDUCTION

2.1 RXTE Data Reduction

From 1999 to 2006, NGC 3783 was the target of various monitoring campaigns with RXTE. These campaigns consisted of short, ∼1 ks duration, observations with the proportional counter array (PCA, Zhang et al. 1993). We have analysed the archival PCA STANDARD-2 data and our own proprietary data with FTOOLS v6.0.2 using standard extraction methods. We use data from the top layers of PCUs 0 and 2 up to 2000 May 12 and only top layer PCU 2 data from observations after this date. The remaining PCUs were not used due to repeated breakdowns.

Data were selected according to the standard ‘good-time’ criteria, i.e. target elevation > 10°, offset pointing < 0.02°, and electron contamination < 0.1. The background was simulated with the L7 model for faint sources using PCABACKEST v3.0. The response matrices for each PCA observation were calculated using PCARSP v10.1.
The final 2–10 keV fluxes were calculated using XSPEC v12.2.1 by fitting a power law with galactic absorption to the PHA data.

The RXTE data used in our analysis, together with the sampling patterns, are listed in Table 1 and displayed in Fig. 1. The early data (to MJD 52375) with 4 d sampling, together with the 20 d period of 3 h sampling already presented by Markowitz et al. (2003), are followed, after a 2 year gap, by our new long-term monitoring, with 2 d sampling. As the gap is large compared to the duration of each monitoring campaign we will include data from each monitoring campaign as separate lightcurves in our fits.

Figure 1. RXTE long-term light curve of NGC 3783 in the 2–10 keV band.

Figure 2. RXTE intense sampling light curve in the 2–10 keV band of NGC 3783.

Figure 3. XMM-Newton light curve in the 0.2–2 keV band of NGC 3783.

2.2 XMM-Newton Data Reduction

NGC 3783 was observed by XMM-Newton during revolutions 371 and 372, between 2001 December 17 and 2001 December 21. Temporal analysis of these data was first presented by Markowitz (2005), who discusses the coherence, frequency-dependent phase lags, and variation of high frequency PSD slope with energy. Here we use these data to constrain the high frequency part of the overall long and short timescale PSD. We used data from the European Photon Imaging Cameras (EPIC) PN and MOS2 instruments, which were operated in imaging mode. MOS1 was operated in Fast Uncompressed Mode and we do not use those data here. The PN camera was operated in Small Window mode, using the medium filter. Source photons were extracted from a circular region of 40′′ radius and the background was selected from a source-free region of equal area on the same chip. We selected single and double events, with quality flag=0. The MOS2 camera was operated in the Full Window mode, using the medium filter.
We extracted source and background photons using the same procedure as for the PN data and selected single, double, triple and quadru- ple events. These data showed no serious pile-up when tested with the XMM-SAS task epatplot. We constructed light curves, for each detector and or- bit, in the 0.2–2, 2–10 and 4–10 keV energy bands. We filled in the ∼ 5 ks gap in the middle of orbit 371 light curves, and some other much smaller gaps, by interpolation and added Poisson noise. The resulting PN and MOS2 continuous light curves were then combined to produce the final light curves for each orbit. The combined, background subtracted, av- erage count rates in the 0.2–10 keV band were 11.8 c/s for orbit 371 and 15.8 c/s for orbit 372, and the 0.2-2 keV com- bined light curve is shown in Fig. 3. Poisson noise dominates the PSD on timescales shorter than 1000s, so the light curves were binned into 200s bins. 3 POWER SPECTRAL MODELLING 3.1 Combining RXTE and XMM-Newton data To determine the PSD over the largest possible frequency range we combine the RXTE and XMM-Newton data. In GBHs the break-frequency and slope of the PSD below the break appear to be independent of the chosen energy band (Cui et al. 1997; Churazov et al. 2001; Nowak et al. 1999; Revnivtsev et al. 2000; McHardy et al. 2004). On the other hand, the PSD normalisation and the slope above the break are often energy-dependent (Markowitz 2005). Therefore, when combining data from different instruments, it is prefer- able to use similar energy ranges. The RXTE data are in the 2–10 keV band and, for NGC 3783, that band has a median photon energy of 5.7 keV. The XMM-Newton band with the same median photon energy is 4.1–10 keV. How- ever the count rate in that XMM-Newton band is low (2 c/s) so we only detect significant source power above the Poisson noise level at frequencies below 10−4Hz. 
To probe higher frequencies we can use the 0.2–2 keV XMM-Newton data (8.8 c/s) but we must re-scale its PSD normalisation to that of the 4–10 keV PSD. We determined the scaling correction by producing PSDs in both energy bands and fitting the same bending power law model to the noise-subtracted data. On the assumption that the PSD shape below the high frequency break is energy-independent, the combined RXTE 2–10 keV and XMM-Newton 0.2–2 keV PSD will then have the shape of the 0.2–2 keV PSD.

Table 1. Summary of the RXTE and XMM-Newton light curves used in the analysis of NGC 3783, including their sampling frequency and date range.

Light curve                            Sampling interval   Observation length   Date Range [MJD]
RXTE Long-term 1                       ∼4.36 days          1194.6 days          51180.5–52375.1
RXTE Long-term 2                       ∼2.1 days           928.3 days           53063.4–53991.6
RXTE Intense monitoring                ∼3.2 hours          19.9 days            51960.1–51980.1
XMM-Newton observations (2 orbits)     200-s               3.2 days             52260.8–52264.0

3.2 Monte Carlo simulations

We use the Monte Carlo technique of Uttley et al. (2002) (PSRESP) to estimate the underlying PSD parameters in the presence of sampling biases. In this method we first calculate the observed (or ‘dirty’) PSD, in parts, from the observed lightcurves, using the Discrete Fourier Transform. Here the PSD estimates are binned in bins of width 1.3ν, where ν is the starting frequency of the bin, by taking the average of the log of power (Papadakis & Lawrence 1993). We require a minimum of 4 PSD estimates per bin. We then simultaneously compare the dirty PSDs from each lightcurve with various model PSDs derived from lightcurves simulated with the same sampling pattern as the real observations. We alter the model parameters to obtain the best fit for any given model. We refer the reader to Uttley et al. (2002) for a full discussion of the method.

For each set of chosen underlying-PSD model parameters, we simulate red-noise light curves, as described by Timmer & Koenig (1995).
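The red-noise simulation step can be sketched roughly as follows. This is a minimal illustration of the Timmer & Koenig (1995) recipe, not the PSRESP code itself: the function names, the input slope of -2, the array length and the arbitrary normalisation are all assumptions made for the example.

```python
import numpy as np


def simulate_red_noise(n_points, dt, psd_model, rng):
    """Draw one evenly sampled, zero-mean light curve whose PSD follows psd_model.

    Follows the recipe of Timmer & Koenig (1995); n_points must be even.
    The overall normalisation is left arbitrary in this sketch.
    """
    if n_points % 2:
        raise ValueError("n_points must be even")
    # Positive Fourier frequencies up to the Nyquist frequency (zero excluded).
    freqs = np.fft.rfftfreq(n_points, d=dt)[1:]
    amplitude = np.sqrt(0.5 * psd_model(freqs))
    # Independent Gaussian real and imaginary parts at every frequency.
    real = rng.normal(size=freqs.size) * amplitude
    imag = rng.normal(size=freqs.size) * amplitude
    imag[-1] = 0.0  # the Nyquist term is purely real for even n_points
    spectrum = np.concatenate(([0.0], real + 1j * imag))  # zero-frequency term = 0
    return np.fft.irfft(spectrum, n=n_points)


def steep_power_law(freqs, slope=-2.0):
    """Hypothetical input PSD shape, used only for this demonstration."""
    return freqs ** slope


rng = np.random.default_rng(seed=1)
n_points, dt = 4096, 200.0  # illustrative length and 200-s time resolution
curve = simulate_red_noise(n_points, dt, steep_power_law, rng)

# Sanity check: the periodogram of the simulated curve should scatter
# around the input slope.
freqs = np.fft.rfftfreq(n_points, d=dt)[1:]
power = np.abs(np.fft.rfft(curve)[1:]) ** 2
fitted_slope = np.polyfit(np.log10(freqs), np.log10(power), 1)[0]
```

Because the Fourier amplitudes are drawn at random, each call gives a different realisation with the same underlying PSD, which is what allows the spread of simulated PSDs to be used as an error estimate.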
The RXTE light curves are simulated with time resolutions of 10.5 h, 5.0 h and 18 min for the first and second long time-scale and the medium time-scale light curves respectively. The simulated resolution, which is 10 times shorter than the typical sampling intervals of the real observations given in column 2 of Table 1, takes into account the effect of aliasing. These simulated light curves were resampled and binned to match the real NGC 3783 observations. XMM-Newton light curves were simulated with 200-s resolution, as at shorter time-scales the underlying variability power is negligible compared to the Poisson noise, so aliasing does not play a role. The Poisson noise level was not subtracted from the observed PSD but was added to the simulated PSDs. To reproduce the effect of red-noise leak, each light curve was simulated to be ∼300 times longer than the real observation, and was then split into sections, constituting 300 simulated light curves for each observed lightcurve. The simulated model average PSD is evaluated from this ensemble of PSD realisations, and the errors are assigned from the rms spread of the realisations within a frequency bin.

Figure 4. The best fit unbroken power law PSD of the combined RXTE and XMM-Newton data. The solid lines represent the observed data and the points with error bars exhibit the biased model and the spread in individual realisations of the model. The dashed line is the underlying model used to generate the simulated PSD. The three lowest frequency data sets are from RXTE observations and the high frequency data set (∼10^{-5} Hz) is from the XMM-Newton observations. Note that the rise in power at the highest frequencies is due to the photon Poisson noise.

We present the results of several PSD model fits in an attempt to quantify the underlying model shape that best describes the PSD of NGC 3783, and associate an acceptance probability with each model. We initially test a simple unbroken power law model.
We next fit a power law with a single bend in the PSD, and then a model incorporating a double bend. We also fit a single-bend power law with a Lorentzian component.

3.3 Unbroken power law model

To begin, we fitted a simple power law model to the data of the form:

$$P(\nu) = A \left(\frac{\nu}{\nu_0}\right)^{\alpha}$$

where A is the normalisation at a frequency ν0, and α is the power law slope. We made 900 simulations and in Fig. 4 we show the best fit plotted in ν × Pν, which has a PSD slope of -2.1. However, the fit is poor and this model can be rejected with a probability > 99 %, or ≳ 3σ.

3.4 Single-bend power law model

Here we fit a single-bend power law to the data. This model best describes the PSD of Cyg X-1 in the high/soft state, and provides a good fit to the PSDs of the AGN NGC 4051 and MCG–6-30-15 (McHardy et al. 2004, 2005):

$$P(\nu) = \frac{A\,\nu^{\alpha_L}}{1 + \left(\nu/\nu_B\right)^{\alpha_L - \alpha_H}}$$

where νB is the bend frequency and αL and αH are the power law slopes below and above the bend, respectively.

Figure 5. The best fit single-bend power law PSD of the combined RXTE and XMM-Newton data. The various lines represent the same data as seen in Fig. 4.

Figure 6. The best fit double-bend power law PSD of the combined RXTE and XMM-Newton data. The various lines represent the same data as seen in Fig. 4.

Fig. 5 presents the observed PSD fitted with a single-bend power law model, for which a good likelihood of acceptance is obtained (P = 44 %). The best fit bend-frequency is $\nu_B = 6.2^{+40.6}_{-5.6} \times 10^{-6}$ Hz, the high-frequency slope is $\alpha_H = -2.6^{+0.6}$, and the low-frequency slope is $\alpha_L = -0.8_{-0.5}$. The errors are 90% confidence limits; an asterisk indicates that the limit is unconstrained. For αH the best fit value is well within the searched parameter space, but the degeneracy produced by red-noise leak in the probability at high values of αH means that the upper limit is not constrained at the 90% confidence level. The confidence contours for the main parameters of interest are plotted in Figs. 7 and 8. Table 2 shows the single-bend power law best fit parameters to the data.
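As a rough illustration of the two model shapes fitted so far, the sketch below evaluates them numerically. It assumes the unbroken model has the form P(ν) = A (ν/ν0)^α and the bending model the form P(ν) = A ν^αL / [1 + (ν/νB)^(αL − αH)], with slopes quoted as negative numbers as in the text; the function names, the reference frequency and the unit normalisation are hypothetical choices made for the example.

```python
import numpy as np


def unbroken_power_law(nu, norm, alpha, nu0=1e-6):
    """P(nu) = norm * (nu / nu0)**alpha; nu0 is an arbitrary reference frequency."""
    return norm * (nu / nu0) ** alpha


def single_bend_power_law(nu, norm, alpha_low, alpha_high, nu_bend):
    """Bending power law: slope alpha_low well below nu_bend, alpha_high well above."""
    return norm * nu ** alpha_low / (1.0 + (nu / nu_bend) ** (alpha_low - alpha_high))


def local_slope(model, nu, eps=1e-3, **params):
    """Logarithmic slope d ln P / d ln nu, from a central difference in log space."""
    upper = np.log(model(nu * (1.0 + eps), **params))
    lower = np.log(model(nu / (1.0 + eps), **params))
    return (upper - lower) / (2.0 * np.log1p(eps))


# Best-fitting single-bend values quoted in the text; norm is set to 1
# because the slopes do not depend on it.
best_fit = dict(norm=1.0, alpha_low=-0.8, alpha_high=-2.6, nu_bend=6.2e-6)

slope_far_below = local_slope(single_bend_power_law, 1e-9, **best_fit)
slope_at_bend = local_slope(single_bend_power_law, 6.2e-6, **best_fit)
slope_far_above = local_slope(single_bend_power_law, 1e-2, **best_fit)
```

Under these assumptions the local slope tends to αL well below the bend and to αH well above it, and takes the mean of the two at ν = νB, which is one way to check that a hand-written model function has the intended sign convention.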
The best fit single-bend frequency obtained here is consistent with the value found by Markowitz et al. (2003).

Figure 7. Single-bend power law model: 68, 90, and 99% confidence contours for the bend frequency, νB, and the high frequency slope, −αH, for the single-bend power law fit to the combined RXTE and XMM-Newton PSD.

Figure 8. Single-bend power law model: 68, 90, and 99% confidence contours for the bend frequency, νB, and the low frequency slope, −αL, for the single-bend power law fit to the combined RXTE and XMM-Newton PSD.

3.5 Double-bend power law model

Markowitz et al. (2003) provide tentative evidence that a second, lower-frequency break exists in the PSD of NGC 3783. Thus, we also fitted a more complex double-bend power law model to see if the goodness-of-fit is improved significantly. The double-bend power law model is given by:

$$P(\nu) = \frac{A\,\nu^{\alpha_L}}{\left[1 + \left(\nu/\nu_L\right)^{\alpha_L - \alpha_I}\right]\left[1 + \left(\nu/\nu_H\right)^{\alpha_I - \alpha_H}\right]}$$

where αI is the intermediate-frequency slope and νL and νH are the low and high bend-frequencies respectively. We fixed the low-frequency slope to zero, to avoid making the simulations computationally prohibitive, and because a low-frequency slope of zero would allow the best qualitative comparison to the low state of Cyg X-1 (Nowak et al. 1999).

Fig. 6 presents the same observed PSD as in Fig. 5, but fitted with the double-bend power law model. A good likelihood of acceptance is obtained (P = 64 %). The best-fitting high bend-frequency is νH = 2.6 × 10^{-5} Hz, the high-frequency slope is αH = −3.2, the intermediate-frequency slope is αI = −1.3, and the low-frequency bend is νL = 1.7 × 10^{-7} Hz. As before, we use 90% confidence limits.

Table 2. Best fit model parameters for the examined models to the combined RXTE and XMM-Newton PSD of NGC 3783. The errors on the single- and double-bend fits are calculated from the 90% confidence intervals. The bend-frequency for the single-bend model, νB, is denoted here as νH. An asterisk indicates that the limit is unconstrained.

Model         Normalisation (a)   αH     αI     αL     νH (Hz)                        νL (Hz)         Acceptance (%)
Single-bend   1.5 × 10^{-4}       −2.6   NA     −0.8   6.2 (+40.6, −5.6) × 10^{-6}    NA              44.4
Double-bend   1.0 × 10^{2}        −3.2   −1.3   0.0    2.6 × 10^{-5}                  1.7 × 10^{-7}   63.9

The added parameters allow extra freedom to find better fit probabilities for any given set of double-bend parameters. For this reason, the contour levels cover larger ranges in the parameter space and, therefore, most of the 90% contours in our double-bend fit remain unbounded over the fitted parameter space. The high-frequency slope is subject to the same problems as in the single-bend model. Table 2 contains a summary of the best-fitting model parameters. The best-fitting low-frequency bend is found close to the lowest frequencies probed by the data and, as seen in Fig. 9, it is essentially unbounded down to the lowest measurable frequency at the 68% confidence level. These facts suggest that the second, low-frequency bend might not be required by the data and that the improvement in the fit might be due only to the increased complexity of the model fitted.

The likelihood of acceptance is better in the double-bend model than in the single-bend model, 64 versus 44 % respectively, but there are more free parameters. In order to determine the significance of this improvement, we performed the following test. Using the best-fitting single-bend PSD parameters, we generated 300 realisations of the sets of RXTE and XMM-Newton lightcurves. Each realisation was then fitted with the best-fitting double-bend parameters, exactly as was done with the real data, and the distribution of their fit probabilities was constructed. We found that 121 out of the 300 single-bend simulations have a higher fit probability than the real data, when fitted with the double-bend model.
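The outcome of this test can be read as an empirical chance probability: the fraction of single-bend simulations that do at least as well under the double-bend model as the real data do. A minimal sketch follows; the helper name and the short list of probabilities are hypothetical, and only the counts 121 and 300 come from the text.

```python
def fraction_exceeding(simulated_probs, observed_prob):
    """Fraction of simulated fit probabilities that exceed the observed one."""
    exceed = sum(1 for p in simulated_probs if p > observed_prob)
    return exceed / len(simulated_probs)


# Counts quoted in the text: 121 of 300 simulations beat the real data.
chance_fraction = 121 / 300

# Toy usage with made-up fit probabilities, compared against P = 0.64.
toy_fraction = fraction_exceeding([0.10, 0.70, 0.90], observed_prob=0.64)
```

With the quoted counts the fraction is about 0.40, so a fit probability as high as that of the real data arises in roughly four out of ten simulations that contain only a single bend.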
Therefore, we conclude that the improvement in fit probability is no more than may be expected from fitting a model which is more complicated than required by the data: the double-bend model does not represent a significant improvement.

Figure 9. Double-bend power law model: 68 and 90% confidence contours for the high bend-frequency, νH, and the low bend-frequency, νL, for the double-bend power law fit to the combined RXTE and XMM-Newton PSD.

3.6 Single-bend power law with a Lorentzian component

We finally consider whether the observed PSD might be best described by adding a Lorentzian component, such as are commonly used to describe broad-band noise components in GBHs (e.g. Nowak 2000), to the single-bend power law. We are motivated to consider this possibility because the PSD of the intense-sampling RXTE light curve is not very well described by either the single- or double-bend power law model. Visual inspection of this light curve, shown in Fig. 2, suggests that the variability is strongly concentrated on time-scales of around a day, or equivalently, frequencies around 10^{-5} Hz, which is confirmed by the peak seen in the corresponding section of the PSD, and the drop in the same PSD at lower frequencies (∼ 10^{-6} Hz). The long-term monitoring PSDs, however, do not show a dip at 10^{-6} Hz, creating a large discrepancy in the PSD measurements at this frequency. A strongly peaked component in the underlying PSD, at ∼ 10^{-5} Hz, could produce the observed features. Such a component would appear as a peak in a PSD that covered frequencies above and below its peak-frequency, but would be insufficiently sampled by the long-term monitoring campaigns; thus, its power would be aliased into the highest frequencies of the longer time-scale data, making them rise above the underlying model level and causing the apparent disparity.
The Lorentzian profile is described by:

$$P_{\rm Lor}(\nu) \propto \frac{1}{\nu_c^2 + 4Q^2(\nu_c - \nu)^2}$$

where the centroid frequency νc is related to the peak-frequency νp by $\nu_p = \nu_c \sqrt{1 + 1/(4Q^2)}$ and the quality factor Q is equal to νc divided by the full width at half maximum of the Lorentzian. The variable A parameterizes the relative contribution of the power law and Lorentzian components to the total rms. Fitting a Lorentzian component in addition to a single-bend power law provides a good fit (P = 52 %). The best-fitting Lorentzian contributes 20% of the variance in the frequency range probed and its best-fitting parameters are quoted in Table 3. Fig. 10 shows the observed PSD compared with the best-fitting single-bend power law model plus a Lorentzian component. The Lorentzian feature in the model can reproduce qualitatively the spurious power at the high frequency end of the long-term monitoring data and the turn down effect observed in the intensive-sampling data.

Table 3. Best fit single-bend power law with Lorentzian component model parameters to the combined RXTE and XMM-Newton PSD of NGC 3783, where νp is the Lorentzian peak frequency, Q is its quality factor, νB is the power law bend frequency and αL and αH are the power law slopes below and above the bend, respectively. The errors are calculated from the 68% confidence intervals, and an asterisk indicates that the limit is unconstrained.

νp (Hz)       νB (Hz)         Q     A    αL    αH    Acceptance (%)
× 10^{-6}     1.1 × 10^{-5}   5.1

Figure 10. The best-fitting single-bend power law with a Lorentzian component. The fit was done using the entire data set but here we only show the Lorentzian region. As before, solid lines represent the real data PSD, dashed lines represent the best-fitting model and markers with error bars represent the model distorted by sampling effects.

To determine the significance of the Lorentzian component fit we repeated the procedure used in determining the significance of the double-bend model. We found that 222 of the 300 single-bend simulated PSDs have a higher fit probability than the data, when fitted with the single-bend power law plus Lorentzian model. This result indicates that the increase in fit probability could be due to the added complexity of the model, and that the improvement in the fit over a simple bending power law is not significant.

4 DISCUSSION AND CONCLUSIONS

We have combined our own new RXTE monitoring data with archival RXTE and XMM-Newton observations to construct a high-quality PSD of NGC 3783 spanning five decades in frequency. We find that a ‘soft’ state model, with a single bend at 6.2 × 10^{-6} Hz, similar to that found earlier by Markowitz et al. (2003), a power law of slope approximately -0.8 extending over almost three decades in frequency below the bend, and a slope above the bend of approximately -2.6, is a good fit to the data. We also find that a ‘hard’ state model, with a double bend, fits the data, as does a model with a single bend plus an additional Lorentzian component. However, the improvement in fit is marginal and, given the additional free parameters, is not significant. Thus we conclude that a simple ‘soft’ state model provides the most likely explanation of the data.

Assuming a mass of 3 × 10^{7} M⊙ for NGC 3783 (Peterson et al. 2004), and an accretion rate of 7% of the Eddington limit (Uttley & McHardy 2005, based on Woo & Urry 2002), NGC 3783 is still in good agreement with the scaling of PSD break timescale as ∼ M/ṁE between AGN and GBHs found by McHardy et al. (2006).

Our new fits show that the PSD of NGC 3783 is perfectly consistent with a single-bend power law with low-frequency slope of -1, in contrast with the earlier result of Markowitz et al.
(2003), who found that a similar model was rejected tentatively at ∼ 98% confidence. The differ- ence can be understood in terms of the improved long-term data. Our new RXTE monitoring observations occur every 2 days, compared to 4 days previously, thereby increasing the long term RXTE data set by a factor 2.6 and, in particular, providing overlap at high frequencies with the RXTE inten- sive monitoring data. The drop in long-timescale variability power, evident in the older long term monitoring data is not reproduced by the new long-term monitoring data, showing that this drop could be just a statistical fluctuation. In addi- tion, the very high frequencies are better constrained by the 2 orbits of XMM-Newton data than by the earlier Chandra data used by Markowitz et al. (2003). The classification of the PSD as being ‘soft’ state means that NGC 3783 is no longer considered unusual amongst AGN. The fact that this AGN is radio-quiet strongly sup- ports the analogy with GBHs in the soft state. Also the ac- cretion rate of NGC 3783 ( ṁE =0.07) (Uttley & McHardy 2005, based on Woo & Urry 2002) is similar to that of other AGN with soft-state PSDs (e.g. NGC 3227 Uttley & McHardy 2005, NGC 4051 McHardy et al. 2004, MCG-6-30-15 McHardy et al. 2005). This accretion rate is above the rate at which the persistent GBH Cyg X-1 tran- sits between hard and soft states in either direction and at which other GBHs transit from the soft to hard state ( ṁE =0.02) (Maccarone et al. 2003; Maccarone 2003). We note that other transient GBHs in outburst, where the variable power law emission in the soft state PSD is weak, can re- main in the hard state to much higher accretion rates (∼ 2– 50% Homan & Belloni 2005) but it is not clear whether we should expect similar PSD shapes to AGN for such outburst- ing sources. Thus NGC 3783 remains compatible with other moderately accreting AGN in being analogous to Cyg X-1 in the soft state. 
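To put the numbers used in this discussion on a common footing, the short calculation below converts the quoted mass, Eddington-scaled accretion rate and bend frequency into a luminosity and a bend time-scale. It is only a back-of-the-envelope sketch: the Eddington luminosity coefficient of 1.26 × 10^38 erg/s per solar mass is the standard value for ionised hydrogen, and treating ṁE as the ratio of bolometric to Eddington luminosity is an assumption made here, as are the variable names.

```python
L_EDD_PER_MSUN = 1.26e38   # erg/s per solar mass (ionised hydrogen); standard value
SECONDS_PER_DAY = 86400.0

mass_msun = 3e7            # black hole mass quoted in the text (solar masses)
mdot_edd = 0.07            # accretion rate in Eddington units quoted in the text
nu_bend_hz = 6.2e-6        # best-fitting single-bend frequency quoted in the text

# Eddington luminosity and the implied bolometric luminosity, assuming
# mdot_edd can be read as L_bol / L_Edd.
l_edd = L_EDD_PER_MSUN * mass_msun
l_bol = mdot_edd * l_edd

# Characteristic time-scale corresponding to the bend frequency.
t_bend_days = 1.0 / nu_bend_hz / SECONDS_PER_DAY
```

Under these assumptions the Eddington luminosity is about 3.8 × 10^45 erg/s, the implied bolometric luminosity about 2.6 × 10^44 erg/s, and the bend time-scale just under two days.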
It is, of course, possible that the transition rate might not be independent of mass. Observations do not yet greatly constrain the transition rate as a function of mass but the absence of large deviations from the so-called ‘fundamental’ plane of radio luminosity, X-ray luminosity and black hole mass (Merloni et al. 2003; Falcke et al. 2004) argues against a large spread in the transition accretion-rates (e.g. see Körding et al. 2006). In the case of Seyfert galaxy NGC 3227, the accretion rate is ∼ 1–2% and a ‘soft’ state PSD is measured (Uttley & McHardy 2005), which suggests that the transition accretion-rate in AGN should be at or below that value.

Our observations, which show that NGC 3783 does not have a highly unusual PSD, therefore confirm the growing similarities between AGN and Galactic black hole systems and leave Arakelian 564, which is probably a very high state object, as the only AGN showing clear double breaks (or multiple Lorentzians) in its PSD (e.g. Arévalo et al. 2006, McHardy et al. in prep.).

ACKNOWLEDGEMENTS

We would like to thank the referee, Chris Done, for useful comments and suggestions. This research has made use of the data obtained from the High Energy Astrophysics Science Archive Research Center (HEASARC), provided by NASA’s Goddard Space Flight Center. We would like to thank Information Systems Services (ISS) at the University of Southampton for the use of their Beowulf cluster, Iridis2. PU acknowledges support from a Marie Curie Inter-European Research Fellowship.

REFERENCES

Arévalo P., Papadakis I. E., Uttley P., McHardy I. M., Brinkmann W., 2006, MNRAS, 372, 401
Axelsson M., Borgonovo L., Larsson S., 2006, A&A, 452,
Belloni T., Homan J., Casella P., van der Klis M., Nespoli E., Lewin W. H. G., Miller J. M., Méndez M., 2005, A&A, 440, 207
Churazov E., Gilfanov M., Revnivtsev M., 2001, MNRAS, 321, 759
Corbel S., Fender R. P., Tzioumis A.
K., Nowak M., McIntyre V., Durouchoux P., Sood R., 2000, A&A, 359, 251
Cui W., Heindl W. A., Rothschild R. E., Zhang S. N., Jahoda K., Focke W., 1997, ApJL, 474, L57
Done C., Gierliński M., 2005, MNRAS, 364, 208
Edelson R., Nandra K., 1999, ApJ, 514, 682
Falcke H., Körding E., Markoff S., 2004, A&A, 414, 895
Fender R. P., 2001, MNRAS, 322, 31
Homan J., Belloni T., 2005, ApSS, 300, 107
Homan J., Wijnands R., van der Klis M., Belloni T., van Paradijs J., Klein-Wolt M., Fender R., Méndez M., 2001, ApJS, 132, 377
Körding E. G., Fender R. P., Migliari S., 2006, MNRAS, 369, 1451
Körding E. G., Jester S., Fender R., 2006, MNRAS, 372,
Maccarone T. J., 2003, A&A, 409, 697
Maccarone T. J., Gallo E., Fender R., 2003, MNRAS, 345,
Markowitz A., 2005, ApJ, 635, 180
Markowitz A., Edelson R., Vaughan S., Uttley P., George I. M., Griffiths R. E., Kaspi S., Lawrence A., McHardy I., Nandra K., Pounds K., Reeves J., Schurch N., Warwick R., 2003, ApJ, 593, 96
McHardy I., 1988, Memorie della Societa Astronomica Italiana, 59, 239
McHardy I. M., Gunn K. F., Uttley P., Goad M. R., 2005, MNRAS, 359, 1469
McHardy I. M., Koerding E., Knigge C., Uttley P., Fender R. P., 2006, Nature, 444, 730
McHardy I. M., Papadakis I. E., Uttley P., Page M. J., Mason K. O., 2004, MNRAS, 348, 783
Merloni A., Heinz S., di Matteo T., 2003, MNRAS, 345,
Nowak M. A., 2000, MNRAS, 318, 361
Nowak M. A., Vaughan B. A., Wilms J., Dove J. B., Begelman M. C., 1999, ApJ, 510, 874
Papadakis I. E., Brinkmann W., Negoro H., Gliozzi M., 2002, A&A, 382, L1
Papadakis I. E., Lawrence A., 1993, MNRAS, 261, 612
Peterson B. M., Ferrarese L., Gilbert K. M., Kaspi S., Malkan M. A., Maoz D., Merritt D., Netzer H., Onken C. A., Pogge R. W., Vestergaard M., Wandel A., 2004, ApJ, 613, 682
Pottschmidt K., Wilms J., Nowak M. A., Pooley G. G., Gleissner T., Heindl W. A., Smith D.
M., Remillard R., Staubert R., 2003, A&A, 407, 1039
Pounds K., Edelson R., Markowitz A., Vaughan S., 2001, ApJL, 550, L15
Reig P., Papadakis I., Kylafis N. D., 2002, A&A, 383, 202
Revnivtsev M., Gilfanov M., Churazov E., 2000, A&A, 363,
Reynolds C. S., 1997, MNRAS, 286, 513
Timmer J., Koenig M., 1995, A&A, 300, 707
Uttley P., McHardy I. M., 2005, MNRAS, 363, 586
Uttley P., McHardy I. M., Papadakis I. E., 2002, MNRAS, 332, 231
Uttley P., McHardy I. M., Vaughan S., 2005, MNRAS, 359,
Wilms J., Nowak M. A., Pottschmidt K., Pooley G. G., Fritz S., 2006, A&A, 447, 245
Woo J.-H., Urry C. M., 2002, ApJ, 579, 530
Zhang W., Giles A. B., Jahoda K., Soong Y., Swank J. H., Morgan E. H., 1993, in Siegmund O. H., ed., Proc. SPIE Vol. 2006, EUV, X-Ray, and Gamma-Ray Instrumentation for Astronomy IV. Laboratory performance of the proportional counter array experiment for the X-ray Timing Explorer. pp 324–333

ABSTRACT

Previous observations with the Rossi X-ray Timing Explorer (RXTE) have suggested that the power spectral density (PSD) of NGC 3783 flattens to a slope near zero at low frequencies, in a similar manner to that of Galactic black hole X-ray binary systems (GBHs) in the `hard' state. The low radio flux emitted by this object, however, is inconsistent with a hard state interpretation. The accretion rate of NGC 3783 (~7% of the Eddington rate) is similar to that of other AGN with `soft' state PSDs and higher than that at which the GBH Cyg X-1, with which AGN are often compared, changes between `hard' and `soft' states (~2% of the Eddington rate).
If NGC 3783 really does have a `hard' state PSD, it would be quite unusual and would indicate that AGN and GBHs are not quite as similar as we currently believe. Here we present an improved X-ray PSD of NGC 3783, spanning from ~10^{-8} to ~10^{-3} Hz, based on considerably extended (5.5 years) RXTE observations combined with two orbits of continuous observation by XMM-Newton. We show that this PSD is, in fact, well fitted by a `soft' state model which has only one break, at high frequencies. Although a `hard' state model can also fit the data, the improvement in fit by adding a second break at low frequency is not significant. Thus NGC 3783 is not unusual. These results leave Arakelian 564 as the only AGN which shows a second break at low frequencies, although in that case the very high accretion rate implies a `very high', rather than `hard' state PSD. The break frequency found in NGC 3783 is consistent with the expectation based on comparisons with other AGN and GBHs, given its black hole mass and accretion rate. <|endoftext|><|startoftext|> Two- and three-point Green’s functions in two-dimensional Landau-gauge Yang-Mills theory Axel Maas1, ∗ Department of Complex Physical Systems, Institute of Physics, Slovak Academy of Sciences, Dúbravská cesta 9, SK-845 11 Bratislava, Slovakia (Dated: November 4, 2018) The ghost and gluon propagator and the ghost-gluon and three-gluon vertex of two-dimensional SU(2) Yang-Mills theory in (minimal) Landau gauge are studied using lattice gauge theory. It is found that the results are qualitatively similar to the ones in three and four dimensions. The propa- gators and the Faddeev-Popov operator behave as expected from the Gribov-Zwanziger scenario. In addition, finite volume effects affecting these Green’s functions are investigated systematically. The critical infrared exponents of the propagators, as proposed in calculations using stochastic quanti- zation and Dyson-Schwinger equations, are confirmed quantitatively. 
For this purpose lattices of volume up to (42.7 fm)2 have been used. PACS numbers: 11.10.Kk 11.15.-q 11.15.Ha 12.38.Aw I. INTRODUCTION Two-dimensional Yang-Mills theory turns out to be a very fascinating topic. Quite a number of quantities, e. g. the string tension [1], can be calculated exactly, al- though not all quantities are (yet) known analytically. In particular, up to now it was not possible to calcu- late the Green’s functions in Landau gauge. However, exactly these Green’s functions may contain interesting information. The reason for this is confinement. In two-dimensional Yang-Mills theory, confinement in Landau gauge is al- ready manifest in perturbation theory: All elementary fields, the gluons and ghosts, form a BRST quartet, and thus are confined according to the Kugo-Ojima mecha- nism [2]. This can be extended non-perturbatively, pro- vided that BRST symmetry is unbroken beyond pertur- bation theory. This makes explicit the absence of propa- gating degrees of freedom in two-dimensional Yang-Mills theory. But even without propagating degrees of free- dom, this permits to investigate the manifestation of the quartet mechanism on the level of the Green’s functions. In addition, the reasoning for the confinement scenario of Gribov and Zwanziger [3, 4, 5, 6] is applicable to two dimensions as well [6]. However, this scenario has no direct manifestation on the perturbative level, as in the case of the quartet mechanism. It is only manifest in the infrared properties of correlation functions. In par- ticular, the Gribov-Zwanziger scenario predicts that the Faddeev-Popov operator Mab accumulates near-zero or zero eigenvalues. As a consequence, the ghost propagator DG, being the expectation value of the inverse Faddeev- Popov operator, should be infrared diverging. 
∗Electronic address: axel.maas@savba.sk
http://arxiv.org/abs/0704.0722v2

Detailed calculations using stochastic quantization [6] or Dyson-Schwinger equations (DSEs) [7, 8] lead to a power-law behavior in the far infrared in any dimension from two to four,

$D_G(p) \sim p^{-2-2\kappa}$ for $p \to 0$. (1)

Furthermore, the gluon propagator is infrared vanishing, and thereby explicitly positivity violating. Its scalar part also behaves like a power-law in the far infrared,

$D(p) = \frac{1}{d-1}\left(\delta_{\mu\nu} - \frac{p_\mu p_\nu}{p^2}\right) D_{\mu\nu}(p) \sim p^{-2-2t}$ for $p \to 0$, (2)

where d is the space-time dimension. The two exponents are related by the sum rule

$t + 2\kappa + \frac{4-d}{2} = 0$. (3)

Under the assumption of an infrared bare ghost-gluon vertex, two possible values for κ are found, 0 and 1/5 [6, 8]. If physics is smooth as a function of dimensionality, the non-zero exponent would be expected due to the results obtained in three and in four dimensions [6, 7, 8, 9]. Note that in calculations using the renormalization group in the case of a bare ghost-gluon vertex the same equations as in DSE calculations are obtained, thus leading to the same results for the infrared exponents in any dimension [10].

These two scenarios are two of the most discussed for the confinement mechanism of gluons also in higher dimensions, see e.g. for four dimensions the reviews [11] and in three dimensions [6, 8, 12]. A verification of their predictions using lattice gauge theory in higher dimensions has, however, turned out to be very complicated, mainly due to finite volume effects. In three dimensions only a qualitative agreement between the predictions of the Gribov-Zwanziger scenario and functional calculations has been obtained [12, 13]. In four dimensions, the lattice results are inconclusive (see e.g. [14, 15, 16, 17]). Studies using Dyson-Schwinger equations in a finite volume support that these problems are, in fact, finite volume effects, and provide even a quantitative prediction of these in four dimensions [18].
The latter are in acceptable agreement with the results obtained in lattice calculations [18].

Here, for two dimensions, the accessible lattices permit a quantitative test of the predictions. It will be shown that the predictions, assumptions, and actually the value of κ = 1/5, of the Gribov-Zwanziger scenario are found in lattice calculations, and hence there is very strong evidence for the Gribov-Zwanziger scenario to be at work. In fact, it is possible to quantify the finite volume effects. Hence, in the following a quantitative confirmation of the predictions of the Gribov-Zwanziger scenario using lattice gauge theory for two-dimensional SU(2) Yang-Mills theory in (minimal) Landau gauge will be given.

Of course, with such results, one question immediately arises when comparing the two-dimensional results to those in higher dimensions: Why do they agree qualitatively on the level of two- and three-point Green's functions in the infrared? This points to a structural origin of both the Gribov-Zwanziger and the Kugo-Ojima scenario, provided both are, in fact, correct. It is particularly tempting to then investigate the relation of both scenarios in two dimensions. How the relation of two-dimensional Yang-Mills theory to topological field theory [19] then comes into play is another question that immediately comes to mind. These and similar questions arise when contemplating the results, and indicate that there are many interesting opportunities still present in the study of two-dimensional Yang-Mills theory. They are highly interesting, and must be investigated in the future. Within this work, however, as a first step just the results from the lattice calculations will be collected.

The two-point functions, and as associated quantities the Faddeev-Popov operator and the running coupling, will be investigated in section II. The three-point functions will afterwards be discussed in section III.
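As a quick consistency check of the infrared exponents that enter this quantitative test, the following sketch evaluates the sum rule for d = 2 and κ = 1/5. It assumes the general-d form t + 2κ + (4 − d)/2 = 0, which reduces to the two-dimensional form t + 2κ + 1 = 0 quoted in the caption of figure 6. The function names are illustrative and not taken from the paper.

```python
from fractions import Fraction

def gluon_exponent_t(kappa, d):
    """Solve the sum rule t + 2*kappa + (4 - d)/2 = 0 for t (assumed general-d form)."""
    return -2 * kappa - Fraction(4 - d, 2)

def infrared_powers(kappa, d):
    """Return t and the infrared powers of p for the two propagators.

    Ghost propagator:  D_G(p) ~ p**(-2 - 2*kappa), equation (1).
    Gluon propagator:  D(p)   ~ p**(-2 - 2*t),     equation (2).
    """
    t = gluon_exponent_t(kappa, d)
    return {"t": t, "ghost": -2 - 2 * kappa, "gluon": -2 - 2 * t}

powers = infrared_powers(Fraction(1, 5), d=2)
print(powers)
```

For κ = 1/5 in two dimensions this gives t = −7/5, so the gluon propagator vanishes like p^{4/5} (that is, with exponent 4κ), while the ghost dressing function p^2 D_G(p) diverges like p^{-2/5}. These are the powers of the solid lines quoted in the captions of figures 1 and 4.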
A short summary of the results will be given in section IV. The technicalities of the lattice simulations can be found in appendix A. Lattice artifacts other than finite volume effects will be discussed in appendix B. That the suppression of color indices in equations (1) and (2) is justified will be shown in appendix C.

II. TWO-POINT FUNCTIONS

The definition and determination of the two-point functions on the lattice, and the associated quantities, have been repeatedly discussed in the literature (see, e.g., [5, 12, 14, 15, 16, 17]). Here, the methods described in [12] are employed. Furthermore, the appearance of β-factors to obtain the correct scaling has been discussed there, also in case of the three-point functions. Hence, this will not be repeated here. To assign units to the quantities, the exactly calculable string tension [1] has been assigned the conventional value (440 MeV)^2, as in higher dimensions.

FIG. 1: The top panel shows the gluon propagator at small momenta for various volumes. The lower panel shows the gluon dressing function over the whole accessible momentum range. Both are plotted against p [GeV]. Open circles correspond to a volume of (42.7 fm)^2, full squares to (14.2 fm)^2, full triangles to (7.11 fm)^2, and upside-down full triangles to (2.02 fm)^2. The solid line in the top panel is the function 4.5 p^{4/5}.

A. Gluon propagator

The gluon propagator is the most readily accessible two-point correlation function. The results for the propagator D(p), equation (2), and its associated dressing function p^2 D(p) are shown in figure 1. A strongly infrared suppressed gluon propagator is clearly visible. At the same time, the infrared suppression increases with increasing physical volume.
In particular, while on a volume of (2.02 fm)^2 the propagator appears to be infrared diverging, a clear maximum appears already at a volume only a factor (2-3)^2 larger. Only the point at the lowest non-vanishing momentum and the point at zero are not consistent with an infrared vanishing gluon propagator. This is, however, expected [18].

FIG. 2: The zero-momentum value D(0) of the gluon propagator as a function of inverse edge length 1/L [GeV]. The straight line is the power-law fit 5.67 L^{-0.79} to the 20 points at the smallest volumes.

The scaling of D(0) with volume, shown in figure 2, makes it very likely that in the infinite volume limit the gluon propagator vanishes at zero momentum, as it vanishes like a power-law with inverse volume. In fact, the exponent 0.79 of the determined power-law is in very good agreement with the expectation [18] that it should coincide with the infrared exponent of the gluon propagator in equation (2), −2 − 2t = 4/5.

Furthermore, even the gluon dressing function does not exhibit any qualitative difference to three dimensions. In particular it also exhibits a (shallow) maximum. As the propagator becomes ultraviolet constant, as a consequence of asymptotic freedom, there is no intrinsic necessity for such a maximum, as in four dimensions. Its presence in this theory without propagating degrees of freedom is hence slightly surprising. However, in the context of a DSE treatment, it is natural to expect such a maximum due to the different signs of ghost and gluon self-energy contributions [8].

The most interesting quantity is the far infrared behavior. It is clearly visible that the gluon propagator is strongly infrared suppressed. The deviation at the very lowest momentum points, however, shows a more massive behavior, as expected from DSE studies in finite volumes [18].
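The power-law scaling of D(0) with the edge length can be extracted with a straight-line fit in log-log space. The sketch below is a minimal illustration on synthetic data generated from the quoted fit 5.67 L^{-0.79}; the lattice data themselves are not reproduced here, and the edge lengths, the noise level and the function name are assumptions made for the example.

```python
import numpy as np

def fit_power_law(L, y):
    """Fit y = A * L**(-s) by linear least squares on log(y) versus log(L)."""
    slope, intercept = np.polyfit(np.log(L), np.log(y), 1)
    return np.exp(intercept), -slope

# Synthetic stand-in for D(0) at 20 edge lengths (arbitrary units), NOT lattice data.
rng = np.random.default_rng(0)
L = np.linspace(2.0, 40.0, 20)
d0 = 5.67 * L**-0.79 * np.exp(rng.normal(0.0, 0.01, L.size))

A, s = fit_power_law(L, d0)
print(f"A = {A:.2f}, exponent = {s:.2f}")
```

A fit in log-log space weights relative rather than absolute deviations, which is a design choice of this sketch and not necessarily the procedure used for figure 2.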
However, the mass decreases rapidly with volume, as discussed above, and a massive behavior is seen only in a momentum window which rapidly decreases with increasing volume. More interestingly, it is expected that in the regime^1

$1/L \ll p \ll \Lambda_{QCD}$ (4)

the continuum behavior should prevail. In particular, the gluon propagator should decrease even in a finite volume in this domain like the power-law (2) [18]. Using the sum rule (3), the exponent of the propagator itself should be 4κ. Such a power-law is shown in the top panel in figure 1, and agrees well with the data inside the domain (4).

^1 The characteristic scale Λ_QCD is in two dimensions of course proportional to the coupling constant g.

FIG. 3: The measured infrared exponent κ_Z obtained from the gluon propagator, as a function of 1/L [GeV]. Two fits are given. The dashed line corresponds to a fit of type (5) which is forced to go to the predicted value κ = 1/5 at 1/L = 0, while the one given by the solid line is not forced to do so. The fit parameters can be found in table I.

To investigate this quantitatively, the effective exponent κ_Z was determined. This was done by discarding the two lowest non-vanishing momentum points. Then the next five points, in order of increasing momentum, were used to fit a power-law. To obtain errors, the steepest and shallowest curve consistent with a 1σ-confidence interval was determined as well. That this is likely too optimistic is shown by the scattering of the results below. If more than one momentum representation for a given momentum existed, the results were averaged over the various representations, as the violation of rotational symmetry is a minor effect that far in the infrared, see appendix B.

The results are shown in figure 3. While there are still significant fluctuations at large volumes, the measured exponents tend towards the continuum value. The volume-dependence of the measured exponents can be fitted by the formula^2

$\kappa_Z = a + \frac{b}{L} + \frac{c}{L^2} + \frac{d}{L^3}$. (5)

TABLE I: Fit parameters of formula (5). Fit 1 corresponds to one with fixed a = κ = 1/5, fit 2 to one where a was fitted as well.

Fit   a       b [fm]   c [fm^2]   d [fm^3]
1     1/5     0.130    -12.9      19.5
2     0.190   0.358    -14.0      20.9

Two fits have been done. In one case, a was fitted as well, while in the second case a was set to the continuum value κ = 1/5. However, even with a free, the result is in reasonable agreement with 1/5. In particular, the results are not consistent with an infrared finite gluon propagator, which would be expected if κ = 0, the second solution found in [6, 8]. The individual fit parameters are given in table I.

Hence, the gluon propagator behaves quantitatively exactly as predicted in the Gribov-Zwanziger scenario, when finite volume effects are properly taken into account.

B. Ghost propagator

The ghost propagator has been determined along the same lines as in higher dimensions [5, 12]. However, more interesting than the propagator itself is the dressing function p^2 D_G(p). The propagator and the dressing function are shown for different volumes in figure 4. It is clearly visible that the dressing function is infrared diverging. This already indicates that of the two possible exponents κ = 0 and κ = 1/5 found [6, 8] only the latter one, if one at all, is realized.

Compared to the case of the gluon propagator, finite volume effects are hardly visible to the eye. It seems that the propagator actually becomes less infrared diverging with volume. From the quantitative evaluation below, this is found not to be the case. What seems to be the case is that the domain of closest approach to the origin is affected by finite volume effects. Its modification leads to the various changes in the infrared in a non-trivial manner.
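Formula (5) is linear in its coefficients, so a fit of this type can be sketched as an ordinary least-squares problem. The example below assumes the form κ(L) = a + b/L + c/L^2 + d/L^3, which is what the units of b, c and d in table I suggest, and it uses synthetic values built from the fit 1 parameters of table I rather than the measured exponents. The function names and the list of edge lengths are illustrative.

```python
import numpy as np

def kappa_model(L, a, b, c, d):
    """Finite-volume ansatz kappa(L) = a + b/L + c/L**2 + d/L**3 (assumed form of (5))."""
    return a + b / L + c / L**2 + d / L**3

def fit_kappa(L, kappa, a_fixed=None):
    """Least-squares fit of the ansatz; if a_fixed is given, only b, c, d are fitted."""
    inv = 1.0 / np.asarray(L, dtype=float)
    if a_fixed is None:
        design = np.column_stack([np.ones_like(inv), inv, inv**2, inv**3])
        coeffs, *_ = np.linalg.lstsq(design, kappa, rcond=None)
        return tuple(coeffs)
    design = np.column_stack([inv, inv**2, inv**3])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(kappa) - a_fixed, rcond=None)
    return (a_fixed, *coeffs)

# Synthetic input from the fit 1 parameters of table I (lengths in fm), not measured data.
L = np.array([7.11, 8.08, 10.1, 12.1, 14.2, 17.8, 21.3, 24.9, 42.7])
kappa = kappa_model(L, 0.2, 0.130, -12.9, 19.5)

free = fit_kappa(L, kappa)                 # corresponds to "fit 2": a is fitted
fixed = fit_kappa(L, kappa, a_fixed=0.2)   # corresponds to "fit 1": a = 1/5 is imposed
print(free)
print(fixed)
```

The parameter a is the value of the ansatz at 1/L = 0, which is why fixing it to 1/5 amounts to forcing the curve through the predicted continuum value.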
If this is the case, the finite-volume effects would be very hard to compare between lattice calculations and functional calculations, as they would be dominated by mid-momentum effects, which in functional methods are usually most strongly affected by truncations [8, 11]. This would, on the other hand, explain why in four dimensions the finite volume effects in the ghost propagator have indeed been found to be at least to some extent different in lattice and in Dyson-Schwinger calculations [18]. In addition, Gribov-Singer effects [3, 20], which according to the Gribov-Zwanziger scenario are irrelevant in the infinite-volume limit [21], may still be relevant even at volumes as large as those used here. This has not yet been investigated in two dimensions in Landau gauge.

^2 The cubic term is necessary to include all volumes.

FIG. 4: The top panel shows the ghost dressing function at small momenta for various volumes. The lower panel shows the ghost propagator over the whole accessible momentum range. Both are plotted against p [GeV]. Open circles correspond to a volume of (42.7 fm)^2, full squares to (14.2 fm)^2, full triangles to (7.11 fm)^2, and upside-down full triangles to (2.02 fm)^2. The solid line is the function 1.1 p^{-2/5}.

Even with the available volumes the effect is small. A power-law with exponent κ = 1/5 already describes the data quite well in the infrared, as shown in the top panel of figure 4. Nevertheless, a more quantitative investigation of the infrared behavior is required.

FIG. 5: The measured infrared exponent κ obtained from the ghost propagator, as a function of 1/L [GeV]. Two fits of type (5) are given. The dashed line corresponds to a fit which is forced to go to the predicted value at 1/L = 0, while the one given by the solid line is not forced to do so. The fit parameters can be found in table II.
TABLE II: Fit parameters for the ghost effective exponent κ_G using formula (5). Fit 1 corresponds to one with fixed a = κ = 1/5, fit 2 to one where a was fitted as well.

Fit   a       b [fm]   c [fm^2]   d [fm^3]
1     1/5     0.139    -0.711     0.173
2     0.150   1.28     -6.41      7.47

This is done by extracting the effective infrared ghost exponent κ_G in the same way as in the case of the gluon propagator. The results for κ_G are shown in figure 5. While statistical errors are larger than in the case of the gluon propagator, it is visible that all results cluster around the predicted continuum value of κ = 1/5 at large volumes. This is also seen from a fit of the type (5). The corresponding fit parameters can be found in table II. Due to statistical uncertainties it is not as clean as for the gluon. However, it is visible that the exponent does not vary strongly with volume. In fact, the effective ghost exponent seems only to change by about a third when changing the volume by almost two orders of magnitude. Finally, even with the limited fit accuracy it is not unreasonable that the results are, in fact, consistent with the prediction of κ = 1/5.

Another possibility to check the continuum results is to test the predicted sum rule (3). This is done by using the effective measured exponents κ_Z and κ_G in figure 6. Again, a fit of type (5) has been performed. The corresponding fit parameters are given in table III. As already anticipated from the individual results, the sum rule becomes better and better satisfied when approaching the continuum limit. Hence, it seems very likely that the relation (3) is, in fact, recovered in the continuum limit.

FIG. 6: Test of the sum rule t + 2κ + 1 = 0, plotted against 1/L [GeV], using the effective ghost exponent κ_G, shown in figure 5, and the effective gluon exponent t_Z = −(1 + 2κ_Z), shown in figure 3. Two fits of type (5) are given. The dashed line corresponds to a fit which is forced to go to the predicted value at 1/L = 0, while the one given by the solid line is not forced to do so. The fit parameters are given in table III.

TABLE III: Fit parameters for a formula of type (5) for the sum rule. Fit 1 corresponds to one with fixed a = 0, fit 2 to one where a was fitted as well.

Fit   a         b [fm]   c [fm^2]   d [fm^3]
1     0         0.0192   24.4       -38.6
2     -0.0806   1.85     15.2       -26.9

One of the particularly interesting results so far is that the ghost exponent is only very weakly dependent on the volume, compared to the one of the gluon. This is in marked contrast to the case in four-dimensional DSEs in a finite volume [18]. Furthermore, all attempts to extract a ghost exponent in lattice calculations in higher dimensions also yield a rather small, more or less volume-independent exponent [17].

It is thus interesting to compare the ghost propagator in various dimensions at roughly the same volume. This is done in figure 7. Only the momentum regime is shown which is accessible by all of the lattices used. Furthermore, the propagators have been normalized so that they coincide at a momentum of 2 GeV. For the momenta themselves, the string tension was set to the same value for all three different dimensionalities.

FIG. 7: Comparison of the ghost propagator (one panel) and the ghost dressing function (other panel) in different dimensions, plotted against p [GeV]. Circles are two dimensions, triangles are three dimensions, and upside-down triangles are four dimensions. The lattice volumes are (6.06 fm)^2, (5.20 fm)^3 [12], and (5.28 fm)^4 [22], at β = 30, β = 4.2, and β = 2.3, respectively.

Aside from the question to which extent such a comparison is justified, the results behave as predicted: The ghost propagator becomes more divergent with increasing dimension. Also, it is in agreement with the predictions [6, 7, 8, 11] that the difference is more pronounced from two to three dimensions than from three to four dimensions: κ changes from 1/5 to ≈ 0.39 or 1/2 from two to three dimensions. The four-dimensional exponent of κ ≈ 0.59 is, on the other hand, rather close to the one in three dimensions. This qualitative behavior, with all its caveats, is another indication for the correctness of the Gribov-Zwanziger scenario and the quantitative predictions. Furthermore, the result in two dimensions in fact quantitatively confirms the Gribov-Zwanziger scenario. However, the very slow change in the effective exponent over orders of magnitude in volume is indicative of what challenges have to be met in higher dimensions to see the asymptotic ghost exponent.

FIG. 8: The effective running coupling divided by p^2 for various volumes, plotted against p [GeV]. Open circles correspond to a volume of (21.3 fm)^2, full squares to (14.2 fm)^2, full triangles to (8.08 fm)^2, and upside-down full triangles to (2.02 fm)^2.

Finally, the exact value of the exponent obtained in DSE calculations depends on the projection of the tensor equation for the ghost [8, 11]. The value of 1/5 is obtained only in the case of a transverse projection [8]. This in turn automatically implies a certain form of the longitudinal (w.r.t. the gluon momentum) tensor structure of the ghost-gluon vertex, such that it leads for arbitrary projections to the infrared exponent 1/5. This then makes the determination of this tensor structure an almost trivial exercise in the infrared limit. Furthermore, this precisely prescribes how the Slavnov-Taylor identity for the gluon propagator, and hence its transversality, is recovered in the far infrared.

C. Running coupling

Although it is possibly a questionable concept in two-dimensional Yang-Mills theory, it is possible to formally define a running coupling.
Analogous to higher dimensions [11, 23], the quantity^3

$\alpha(p) = p^6 D(p) D_G(p)^2$

is then proportional to the coupling constant. In particular, as a consequence of the sum rule, the quantity α(p) should behave in the infrared as p^2. Hence α/p^2 should be constant.

^3 To improve the statistical behavior, the ghost dressing function has been evaluated on a plane-wave source instead of a point source, as in case of the propagator alone [12]. Hence only the same volumes are accessible for the coupling constant as for the ghost-gluon vertex below, where this is also necessary.

From the results on the sum rule, given in figure 6, it is already clear that an infrared fixed point will hardly be seen. However, the results, shown in figure 8, exhibit such a fixed point at the largest volumes, provided the lowest point at non-vanishing momentum is discarded^4. Note that the finite volume effects seem to make the running coupling diverging instead of vanishing, as in higher dimensions [17, 18].

Thus at sufficiently large volumes, and taking finite volume effects into account, it is in fact possible to observe a fixed point in the coupling in lattice gauge theory.

Note that there is a small, systematic overall factor between the coupling obtained in the different volumes shown in figure 8. This effect is not visible in the propagators themselves, but is increased here by taking effectively the third power of the propagators. As this effect occurs at all momenta, it is likely not simply a finite volume effect. However, this can still be an O(a)-effect which is caused, among other effects, by the fact that tadpole corrections, which give overall factors to the propagators, have been neglected here [16, 24].

D. Faddeev-Popov operator

A last element in the analysis of the two-point correlation functions are the properties of the Faddeev-Popov operator, central to the Gribov-Zwanziger scenario [3, 4, 5].
The results on the ghost propagator, which is the expectation value of the inverse Faddeev-Popov operator, already indicate the existence of an enhancement of its eigenspectrum near zero eigenvalue. This enhancement is the hallmark of the Gribov-Zwanziger scenario. However, it is interesting to see the quantitative behavior of the eigenspectrum. Hence the spectral properties of the Faddeev-Popov operator have been determined as well, using the technique described in [12].

The near-zero part of the eigenspectrum is shown for various volumes in figure 9. The volume scaling of the lowest eigenvalue is shown in figure 10. It is clearly visible that with increasing volume more and more eigenvalues are found near zero. This is the near-zero eigenvalue enhancement, as predicted in the Gribov-Zwanziger scenario^5. In addition, the lowest eigenvalue vanishes in the infinite-volume limit, and in fact vanishes faster than the lowest eigenvalue of the Laplacian. This is a property which has also been observed in higher dimensions [12, 22]. It has been argued that such a larger rate may be necessary for the ghost propagator to develop an infrared divergence [25]. It is therefore further direct evidence for the validity of the Gribov-Zwanziger scenario. This vanishing of the lowest eigenvalue is in fact necessary for the Gribov-Zwanziger mechanism to work: For infinite volume, an average configuration should be arbitrarily close to or on the common boundary of the fundamental modular region and the Gribov horizon, where by definition the determinant of the Faddeev-Popov operator vanishes, so that the operator must have at least one vanishing eigenvalue [5].

^4 For the coupling constant only edge momenta have been used, in contrast to the propagators where also other momenta have been included. Dismissing here only the lowest non-vanishing momenta is thus equivalent to dismissing the two lowest non-vanishing momenta in case of the propagators.

^5 Note that the decrease towards larger eigenvalues seen in figure 9 is likely an artifact of the method to determine the eigenvalues [12]. Furthermore, all eigenvalues are only found with multiplicity 1.

FIG. 9: The near-zero part of the eigenvalue spectrum of the Faddeev-Popov operator for volumes (2.02 fm)^2 (dashed-dotted line), (7.11 fm)^2 (dotted line), (14.2 fm)^2 (dashed line), and (24.9 fm)^2 (solid line). 1164228, 2261493, 3517400, and 2614098 eigenvalues have been enclosed in the full spectrum, respectively.

FIG. 10: The volume-dependence of the lowest eigenvalue of the Faddeev-Popov operator, plotted against 1/L [GeV]. The solid line is the function 0.314 L^{-2.34}.

III. THREE-POINT FUNCTIONS

Investigating the vertices in two dimensions is a very interesting task. On the one hand, the vertices do not lend themselves easily to evaluation, since as three-point functions they are much more strongly affected by statistical fluctuations than two-point functions. Hence their investigation has so far been limited to rather small volumes in four [26, 27] and even in three dimensions [12, 26]. On the other hand, the vertices describe interaction effects, and it is not a priori clear how they should behave in a theory without propagating degrees of freedom. In particular, the possibility that the three-gluon vertex, or at least some of its tensor structures, could change sign is a very interesting observation in higher dimensions [12, 26]. Whether this is also the case in two dimensions, especially in large volumes, is thus also a question of interest.

One drawback of investigating vertices in two dimensions on a square lattice is the impossibility of constructing a momentum configuration such that all three momenta are equal.
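This obstruction can be made concrete with a brute-force search. The sketch below works with integer momentum vectors n, and asks for three non-zero vectors that sum to zero and have equal length. Reading "equal momenta" as equal magnitudes is an assumption here, as are the function name and the search range; lattice corrections to the momentum definition are ignored.

```python
from itertools import product

def equal_magnitude_triples(dim, n_max):
    """Find integer vectors n1, n2 with n3 = -(n1 + n2) and |n1| = |n2| = |n3| != 0."""
    vectors = [v for v in product(range(-n_max, n_max + 1), repeat=dim) if any(v)]
    found = []
    for n1 in vectors:
        norm = sum(x * x for x in n1)
        for n2 in vectors:
            if sum(x * x for x in n2) != norm:
                continue
            n3 = tuple(-(a + b) for a, b in zip(n1, n2))
            if sum(x * x for x in n3) == norm:
                found.append((n1, n2, n3))
    return found

two_dim = equal_magnitude_triples(dim=2, n_max=6)
three_dim = equal_magnitude_triples(dim=3, n_max=2)
print(len(two_dim), len(three_dim))
```

In two dimensions the search returns no configuration, since an equilateral triangle cannot have all its vertices on a square lattice. In three dimensions examples such as (1, −1, 0), (0, 1, −1), (−1, 0, 1) exist. Two orthogonal momenta, by contrast, are trivially available in two dimensions, for instance (1, 0) and (0, 1).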
This equal momentum configuration is the one usually preferred in functional studies of the vertices [28], as it leaves only one external scale. However, in higher dimensions it was found that the results do not change qualitatively when instead two of the momenta are taken to be orthogonal [12, 26]. This configuration can be realized in two dimensions, and will thus be employed here.

In general, vertices have a significant number of tensor structures. To obtain a simpler function to measure the interaction represented by a vertex, the quantity

$\frac{\Gamma^{tl,abc} G^{abc}}{\Gamma^{tl,abc} D^{ad} D^{be} D^{cf} \Gamma^{tl,def}}$ (6)

will be evaluated instead. Here the indices a, ..., f are generic multi-indices, encompassing field-type, Lorentz and color indices. Also, D^{ab} are the propagators of the fields, G^{abc} represent the full Green's functions and Γ^{tl,abc} are the corresponding tree-level vertices. This quantity is defined such that it becomes equal to one if the full and the tree-level vertex coincide. For a more detailed discussion of this quantity and its properties, see [12].

There are two vertices in Landau-gauge Yang-Mills theory. The first is the ghost-gluon vertex, which is shown for four different volumes in figure 11. In this case in fact the vertex is shown, as only one tensor structure, the tree-level one, survives non-amputation [12].

As in the higher-dimensional cases [12, 26, 27], it exhibits an essentially constant behavior, except for a possible small structure below roughly 1 GeV in ghost momentum. This structure is a maximum, with a drop towards smaller momenta below the tree-level value. Furthermore, the value at small ghost momenta and finite gluon momenta is below 1, but finite. If a modification away from a constant infrared behavior of this vertex should exist, it must set in with an extremely small effective exponent to not be visible on these volumes.
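The normalization of the quantity (6) can be illustrated with a toy calculation. The sketch assumes that (6) is the ratio of the contraction Γ^{tl} G to the contraction Γ^{tl} D D D Γ^{tl}, and it uses random real tensors and positive diagonal propagators in place of actual lattice correlation functions. All names are hypothetical.

```python
import numpy as np

def vertex_ratio(gamma_tl, green, prop):
    """Contract the full Green's function with the tree-level vertex, normalized as in (6)."""
    numerator = np.einsum("abc,abc->", gamma_tl, green)
    denominator = np.einsum("abc,ad,be,cf,def->", gamma_tl, prop, prop, prop, gamma_tl)
    return numerator / denominator

rng = np.random.default_rng(1)
n = 4                                        # size of the generic multi-index (toy value)
gamma_tl = rng.normal(size=(n, n, n))        # stand-in for the tree-level vertex
prop = np.diag(rng.uniform(0.5, 2.0, n))     # stand-in for the propagators D^{ab}

# Non-amputated Green's function built from the tree-level vertex itself.
green_tree = np.einsum("ad,be,cf,def->abc", prop, prop, prop, gamma_tl)

ratio_tree = vertex_ratio(gamma_tl, green_tree, prop)              # equals 1 up to rounding
ratio_suppressed = vertex_ratio(gamma_tl, 0.8 * green_tree, prop)  # equals 0.8 up to rounding
print(ratio_tree, ratio_suppressed)
```

Because the Green's functions are not amputated, the three propagator legs appear in both numerator and denominator, so the ratio isolates the deviation of the vertex from its tree-level form. Values below one then signal a suppression relative to tree level.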
These results are all in qualitative agreement with the ghost-gluon vertex in higher dimensions [12, 26, 27]. In particular, the results confirm the truncation scheme in the far infrared used in two dimensions in stochastic quantization and DSE calculations [6, 7, 8]. In that case an infrared finite ghost-gluon vertex was assumed, delivering the critical infrared exponent κ = 1/5, which in fact was observed in the previous section. This once more nicely confirms the Gribov-Zwanziger scenario, which leads directly to this type of approximation. Furthermore, in four dimensions the infrared critical exponent of the ghost-gluon vertex is fixed, once the exponents of the propagators are known [28]. This can be extended to two dimensions and yields in fact an infrared constant ghost-gluon vertex [29]. This is a very stringent test of the scenario. The results found here in lattice calculations once more pass this test.

The three-gluon vertex is much more troublesome to calculate due to strong statistical fluctuations, in particular at large lattice (not physical) momenta. These are, in fact, even more pronounced than in higher dimensions, as was already observed when going from three to four dimensions [26]. Thus the uncertainty connected with this vertex is quite large. Nonetheless, the results shown in figure 12 are quite spectacular. At a point of about 300-400 MeV, corresponding roughly to the position where the plateau in the coupling constant develops or where the gluon propagator starts to bend over, the quantity changes sign. Thereafter, it diverges, probably like a power-law, as can be seen from the bottom-left panel in figure 12. Precisely such a divergence is expected in higher dimensions [28]. This also compares very well to lattice results in higher dimensions, which found the onset of such a negative divergence in three dimensions [12], and at least an infrared suppression in four dimensions [26].
Note, however, that due to the contraction (6) it is not necessarily one particular tensor structure of the vertex that changes sign. It is equally possible that two tensor structures have opposite sign throughout, but differ in magnitude, and one is dominant in the infrared, while the other dominates in the ultraviolet.

The infrared divergence of the three-gluon vertex when one momentum vanishes is roughly in agreement with a power-law with exponent −2.2 for the single external scale, as can be seen in the bottom-left panel of figure 12. Although this is not the momentum configuration used in DSE calculations [28], there is again just one external scale. It could be expected that the infrared behavior is the same, if there is just one scale left. In that case, this exponent of −2.2 is actually the one expected in DSE calculations [29]. This statement applies as well to the infrared constancy of the ghost-gluon vertex.

FIG. 11: The ghost-gluon vertex for orthogonal momenta. The top left panel shows the vertex for all possible orthogonal momentum configurations, as a function of ghost momentum q [GeV] and gluon momentum [GeV], for a volume of (21.3 fm)^2, with errors suppressed. The ripple structure is an artifact of the method [12], and vanishes with increasing statistics. The bottom left and right panel show the vertex in two specific momentum configurations. In one case the gluon momentum vanishes (left panel), and in the other the gluon and ghost momenta are of equal magnitude (right panel). In this case, various physical volumes are compared. Open circles correspond to a volume of (21.3 fm)^2, full squares to (14.2 fm)^2, full triangles to (8.08 fm)^2, and upside-down full triangles to (2.02 fm)^2.
Taking this reasoning seriously would imply that all two- and three-point functions exhibit exactly and quan- titatively the infrared exponents predicted in DSE calcu- lations, and are in agreement with the Gribov-Zwanziger scenario. Therefore, this work here would represent the first quantitative confirmation of these two frameworks using lattice gauge theory. It is of course tempting to also investigate higher n- point functions. Unfortunately, this is currently out of reach in the present approach. The reason is that only non-amputated, full Green’s functions can be directly ob- tained with the methods used here. Therefore, it would be necessary to first subtract the not-connected part of the amplitude, and then amputate the Green’s functions. In general, the not-connected and the connected ampli- tude have the same infrared behavior, at least in four dimensions, if the predictions [28] are correct. There- fore, it would be necessary to disentangle the sum of two functions, which both have the same leading infrared behavior. As the statistical fluctuations become larger when increasing the number of external legs, the required statistics become impractical at the current time. Hence it would be necessary to reduce these fluctuations. It is possible that e. g. including only results for the same sign of the Polyakov loop6 would be helpful, as by this statis- tical fluctuations, at least in case of the gluon propagator, are reduced [30]. This has to be investigated further. 6 At finite volume, the value of the Polyakov loop is non-zero for each individual configuration. p [GeV] 0 0.5 1 1.5 2 2.5 3 3.5 4 Three-gluon vertex, one momentum vanishing p [GeV] 0 0.5 1 1.5 2 2.5 3 3.5 4 Three-gluon vertex, orthogonal momenta with two equal p [GeV] -210×5 -110 -110×2 -110×3 Three-gluon vertex, one momentum vanishing q [GeV k [GeV] Three-gluon vertex, orthogonal momenta FIG. 12: The three-gluon vertex for orthogonal momenta. 
The top left and right panels show the vertex in two specific momentum configurations. In one case one of the gluon momenta vanishes (left panel), and in the other two of the gluon momenta are of equal magnitude (right panel). The bottom left panel shows a magnification of the low-momentum regime for one momentum vanishing. In this case the absolute value of GA is displayed. Various physical volumes are compared. Open circles correspond to a volume of (21.3 fm)^2, full squares to (14.2 fm)^2, full triangles to (8.08 fm)^2, and upside-down full triangles to (2.02 fm)^2. Finally, in the bottom right panel GA is shown for the complete orthogonal momentum configuration plane in the case of the largest volume, (21.3 fm)^2. In the bottom left panel, results from all available volumes up to lattices of size 120^2 are shown, see appendix A. In addition to the previously used symbols, the remaining symbols correspond to (3.56 fm)^2 (pluses), (4.04 fm)^2 (open stars), (6.06 fm)^2 (open crosses), (7.11 fm)^2 (full stars), (10.1 fm)^2 (open triangles), (10.7 fm)^2 (diamonds), (12.1 fm)^2 (full circles), and (17.8 fm)^2 (open squares). The line is the function −0.17 p^(−2.2).

IV. SUMMARY

The volumes accessible in two-dimensional Yang-Mills theory made it possible to obtain the two-point and three-point functions on very large lattices, up to (42.7 fm)^2 and (21.3 fm)^2, respectively. In particular, it was possible to obtain quantitative results on the infrared behavior with a precision that is unprecedented in lattice investigations of these quantities.

These results demonstrated that the gluon propagator is infrared vanishing, the ghost propagator is infrared diverging, and the 'effective coupling constant' also has the expected qualitative infrared behavior. Moreover, it was possible to make these statements quantitative.
Including the effects of finite volume, it was possible to determine the infinite-volume limit of the characteristic infrared exponents for the two-point functions, and to demonstrate the validity of the sum rule (3). In fact, the value κ = 1/5 found coincides with one of the two possible values expected from stochastic quantization and Dyson-Schwinger equations for an 'on-shell', i.e. transverse, gluon. Furthermore, the infrared behavior of the vertices permits closing the system self-consistently in the context of such equations. In particular, the ghost-gluon vertex is infrared constant.

These results confirm the Gribov-Zwanziger scenario in two dimensions. Without any dynamical, i.e. propagating, degrees of freedom, all the infrared behavior is still qualitatively the same as in higher dimensions. This implies that these effects in fact stem from the gauge-fixing procedure, in essentially the way predicted by the Gribov-Zwanziger scenario.

It will, of course, take some time before it is possible to repeat the same in higher dimensions. One of the quantitative reasons is that the critical exponent in the gluon observables decreases with increasing dimension [6, 8]. Hence the effects observed here will only be observable for larger volumes in higher dimensions. Nonetheless, the results are also in excellent qualitative agreement with the predictions of DSE calculations for the finite-volume behavior of the propagators in four dimensions [18]. Finally, the comparison of the ghost propagator for different dimensions yields the pattern expected from the Gribov-Zwanziger scenario.

However, these results should also be taken with care, as two-dimensional Yang-Mills theory is different from its higher-dimensional versions. Although there is little evidence to the contrary, no rigorous implication exists that the effects seen here carry over to higher dimensions without changes.
Hence a satisfactory state of affairs in higher dimensions has to await equivalent investigations in higher dimensions. Until then, the results here are another piece of the puzzle, which seems to indicate that the Gribov-Zwanziger scenario in Landau gauge is also valid in higher dimensions.

Beyond these questions, the results are also interesting in their own right. It is very tempting to investigate how they relate to the host of exact results available in two-dimensional Yang-Mills theory, what the connection to the topological aspects of the theory is, and, last but not least, whether and how an equivalence between the Gribov-Zwanziger and the Kugo-Ojima confinement scenarios exists, at least in two dimensions.

Acknowledgments

The author is grateful to Attilio Cucchieri and Tereza Mendes for many helpful and interesting discussions. Furthermore he thanks all those (in particular, Markus Huber) who always ask about two dimensions. This work was supported by the DFG under grant MA 3935/1-2 and in part by the Slovak Grant Agency for Science, Project VEGA No. 2/6068/2006. The ROOT framework [31] has been used in this project.

APPENDIX A: GENERATION OF CONFIGURATIONS

The generation of configurations in two dimensions and their gauge-fixing to Landau gauge can be and has been done exactly as in higher dimensions [12]. In particular, the confirmation of the Gribov-Zwanziger scenario in the present work implies that the problem of Gribov-Singer copies [3, 20] should become irrelevant for Green’s functions in the infinite-volume limit in two dimensions as well [21]: Gribov-Singer effects should become smaller with increasing volume. Hence they have been ignored here, although, as discussed in section II, effects at finite volume cannot be excluded.

To give units to the momenta, the infinite-volume limit of the string tension for a given β, which can be determined analytically [1], is set to (440 MeV)^2. The configurations used are shown in table IV.
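The conversion between lattice size, physical extent, and lowest momentum in table IV can be illustrated with a few lines of code. This is a sketch under two assumptions that are not spelled out in the text: that the numbers 1.108 and 1.951 quoted in the caption of table IV are inverse lattice spacings 1/a in GeV (which is what the tabulated extents and momenta imply), and that the lattice momentum is defined in the usual way as p = (2/a) sin(π n/N). The function names are illustrative.

```python
import math

HBARC_GEV_FM = 0.1973269804  # hbar*c in GeV*fm, used to convert GeV^-1 to fm

def lowest_momentum_mev(n_sites, inv_a_gev):
    """Lowest non-vanishing lattice momentum p0 = (2/a) sin(pi/N), in MeV."""
    return 2.0 * inv_a_gev * math.sin(math.pi / n_sites) * 1000.0

def extent_fm(n_sites, inv_a_gev):
    """Linear extent N*a of the lattice in fm."""
    return n_sites / inv_a_gev * HBARC_GEV_FM

# Rows of table IV: N = 20 at beta = 10, and N = 240 at beta = 10.
p0_small = lowest_momentum_mev(20, 1.108)    # compare with 347 MeV in the table
extent_large = extent_fm(240, 1.108)         # compare with 42.7 fm in the table
```

Under these assumptions both the extents and the p0 column of the table are reproduced to the quoted precision.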
The comparison with the (also exactly known) infinite-volume value of the plaquette [1] shows that locally the continuum has been reached. However, the discussion in section II shows that this is not correct globally.

TABLE IV: Data of the configurations considered in the numerical simulations. The values of the inverse lattice spacing 1/a are 1.108 GeV for β = 10 and 1.951 GeV for β = 30 [1]. The momenta p0, pi, and pf denote the lowest non-vanishing momentum and the beginning and the end of the fit interval used in the determination of the effective exponents in section II, respectively. Note that for N ≥ 140 not as many momentum configurations for the gluon propagator were available as for N ≤ 120. Ghost configurations are the ones used to determine the ghost propagator, the properties of the Faddeev-Popov operator, the ghost-gluon vertex, and the running coupling. Gluon configurations are the ones used to determine the gluon propagator and the three-gluon vertex. As the autocorrelation time for the plaquette is less than one hybrid overrelaxation (HOR) sweep, all sweeps (after thermalization) have been used for the plaquette measurement, giving the number of plaquette configurations in the table. Note that all ghost configurations are also included in the gluon configurations, so the sets are not independent. For N ≥ 140, only the propagators have been determined. Hence the numbers of ghost and gluon configurations coincide. The quantity <P>/<P∞> gives the ratio of the expectation value of the plaquette to the analytical infinite-volume limit. The error is determined according to [12]. Finally, p is the tuning parameter for the stochastic overrelaxation algorithm used for gauge-fixing [32], which has been obtained by linear self-adjustment [12]. Note that this quantity is not very precisely determined, and should be used rather as an indication of the correct order. Sweeps is the number of HOR sweeps between two consecutive measurements [12].

V^(1/2) [fm]  N = V^(1/2)/a  β   p0 [MeV]  pi [MeV]  pf [MeV]  Ghost config.  Gluon config.  Plaq. config.  1−<P>/P∞      p     Sweeps
2.02          20             30  610       1206      1874      2430           11525          369211         -5(4)×10^-6   0.83  30
3.56          20             10  347       685       1064      2102           12319          355257         1(1)×10^-5    0.84  30
4.04          40             30  306       610       961       1964           10579          527689         1(2)×10^-6    0.88  50
6.06          60             30  204       408       644       1723           7311           510688         0(1)×10^-6    0.93  70
7.11          40             10  174       347       546       2161           10758          536786         -2(6)×10^-6   0.87  50
8.08          80             30  153       306       484       1429           4898           438579         0(1)×10^-6    0.90  90
10.1          100            30  123       245       387       747            1988           216391         -3(1)×10^-6   0.96  110
10.7          60             10  116       232       366       1825           7108           496291         -2(4)×10^-6   0.92  70
12.1          120            30  102       204       323       552            1754           225036         1(1)×10^-6    0.95  130
14.1          140            30  87.6      175       371       368            368            53971          -2(2)×10^-6   0.97  150
14.2          80             10  87.0      174       275       1582           6465           579900         -1(3)×10^-6   0.92  90
16.2          160            30  76.6      153       325       291            291            48199          0(2)×10^-6    0.98  170
17.8          100            10  69.6      139       220       1339           4337           478853         -2(3)×10^-6   0.96  110
18.2          180            30  68.1      136       289       308            308            56724          -2(1)×10^-6   0.96  190
20.2          200            30  61.3      123       260       199            199            40584          -2(1)×10^-6   0.96  210
21.3          120            10  58.0      116       183       762            5236           678065         1(2)×10^-6    0.93  130
22.2          220            30  55.7      111       236       232            232            51577          1(1)×10^-6    0.99  230
24.2          240            30  51.1      102       217       232            232            55691          0(1)×10^-6    0.98  250
24.9          140            10  49.7      99.4      211       517            517            76053          1(5)×10^-6    0.96  150
28.4          160            10  43.5      87.0      184       455            455            75500          -8(4)×10^-6   0.97  170
32.0          180            10  38.7      77.3      164       390            390            72034          -4(4)×10^-6   0.97  190
35.6          200            10  34.8      69.6      148       328            328            66976          -4(4)×10^-6   0.97  210
39.1          220            10  31.6      63.3      134       287            287            63703          3(3)×10^-3    0.98  230
42.7          240            10  29.0      58.0      123       394            394            96075          0(2)×10^-6    0.98  250

APPENDIX B: LATTICE ARTIFACTS OTHER THAN FINITE VOLUME

As one of the main claims here is that the deviation from the asymptotic continuum form in the infrared is a pure finite-volume effect, it is necessary to check the influence of other lattice artifacts. In particular, discretization effects and violation of rotational symmetry may be relevant. The latter is known to be a significant effect when comparing correlation functions measured along different directions of the hypercube (see, e.g., [14]), in the present case along an edge or along a diagonal.

In figure 13, these effects are explicitly checked. The results are at roughly the same volume of about (10.3 fm)^2 at two different values of β, 10 and 30, and results with momenta along any possible direction are directly compared. It is clearly visible that, despite a factor of nearly 2 in a, both results agree remarkably well over the whole range of momenta. Thus discretization effects are nearly negligible, at least for a volume of a few fm^2 and momenta not too close to the maximum one. Treating only the physical volume as an independent parameter in the infrared throughout the main text is hence justified. Also, no significant effect of the violation of rotational invariance is seen, which is usually most pronounced at intermediate momenta in the gluon dressing function. For the current case a few tens of lattice sites along each edge seem to be sufficient to obtain a quite good approximation of rotational invariance.

Furthermore, there is no distinct difference between the gluon and the ghost dressing function in terms of these artifacts. In the case of the propagators these effects would be diminished even further, as the trivial factor p^(−2) helps to reduce such artifacts. Hence the totally dominant contribution to the artifacts in the correlation functions in the infrared is clearly the finite physical volume. Similar observations pertain to all quantities measured here, and hence only the physical volumes are used as explicit parameters in the main text, and no heed is paid to the different β-values.
The only exception observed here is in the case of the running coupling in section II C, where an overall scaling factor has been seen. This issue has been discussed in detail in section II C.

APPENDIX C: CONTRIBUTIONS IN OTHER COLOR TENSOR STRUCTURES

There is no a-priori necessity for correlation functions to carry only their tree-level color structure, although such a color structure permits a consistent solution using functional methods in the infrared, at least in four dimensions [18]. Therefore, this property should be explicitly checked. This is done for the ghost and the gluon propagator in figure 14. All contributions are compatible with zero. Furthermore, the average value decreases in all cases with increasing statistics. So, within the statistics available, there are no color-off-diagonal components in the propagators. Due to the structure of the DSEs, it is then very hard to imagine how the higher n-point Green’s functions should have a color structure different from the tree-level one. This can, of course, not be excluded by this result. Nonetheless, it seems to be unlikely.

[Figure 13 panels: "Gluon dressing function" and "Ghost dressing function" (axis: p [GeV]).]

FIG. 13: Consequences of different discretizations and violation of rotational invariance in the case of the gluon (left panel) and ghost (right panel) dressing functions. Open circles correspond to a system at β = 30 and a volume of (10.1 fm)^2, open stars to a system at β = 10 and a volume of (10.7 fm)^2. The different momentum directions have not been marked differently.

[Figure 14 panels: "Off-diagonal gluon propagator" and "Off-diagonal ghost dressing function" (axis: p [GeV]).]

FIG. 14: The color off-diagonal elements of the gluon propagator (left) and of the ghost propagator (right) on a (21.3 fm)^2 volume.

[1] H. G. Dosch and V. F. Muller, Fortsch. Phys. 27, 547 (1979).
[2] T. Kugo and I. Ojima, Prog. Theor. Phys. Suppl. 66, 1 (1979) [Erratum Prog. Theor. Phys. 71, 1121 (1984)].
[3] V. N. Gribov, Nucl. Phys. B 139, 1 (1978).
[4] D. Zwanziger, Phys. Lett. B 257, 168 (1991); D. Zwanziger, Nucl. Phys. B 364, 127 (1991); D. Zwanziger, Phys. Rev. D 67, 105001 (2003) [arXiv:hep-th/0206053].
[5] D. Zwanziger, Nucl. Phys. B 412, 657 (1994).
[6] D. Zwanziger, Phys. Rev. D 65, 094039 (2002) [arXiv:hep-th/0109224].
[7] C. Lerche and L. von Smekal, Phys. Rev. D 65, 125006 (2002) [arXiv:hep-ph/0202194].
[8] A. Maas, J. Wambach, B. Grüter and R. Alkofer, Eur. Phys. J. C 37, No. 3, 335 (2004) [arXiv:hep-ph/0408074].
[9] D. Zwanziger, Phys. Rev. D 70, 094034 (2004) [arXiv:hep-ph/0312254].
[10] J. M. Pawlowski, D. F. Litim, S. Nedelko and L. von Smekal, Phys. Rev. Lett. 93, 152002 (2004) [arXiv:hep-th/0312324].
[11] R. Alkofer and L. von Smekal, Phys. Rept. 353, 281 (2001) [arXiv:hep-ph/0007355]; C. S. Fischer, J. Phys. G 32, R253 (2006) [arXiv:hep-ph/0605173]; R. Alkofer, arXiv:hep-ph/0611090.
[12] A. Cucchieri, A. Maas and T. Mendes, Phys. Rev. D 74, 014503 (2006) [arXiv:hep-lat/0605011].
[13] A. Cucchieri, Phys. Rev. D 60, 034508 (1999) [arXiv:hep-lat/9902023]; A. Cucchieri, F. Karsch and P. Petreczky, Phys. Rev. D 64, 036001 (2001) [arXiv:hep-lat/0103009]; A. Cucchieri, T. Mendes and A. R. Taurines, Phys. Rev. D 67, 091502 (2003) [arXiv:hep-lat/0302022]; A. Cucchieri and T. Mendes, Phys. Rev. D 73, 071502 (2006) [arXiv:hep-lat/0602012].
[14] A. Cucchieri and T. Mendes, arXiv:hep-ph/0605224.
[15] P. Boucaud et al., arXiv:hep-lat/0602006.
[16] J. C. R. Bloch, A. Cucchieri, K. Langfeld and T. Mendes, Nucl. Phys. B 687, 76 (2004) [arXiv:hep-lat/0312036].
[17] A. Sternbeck, E. M. Ilgenfritz, M. Müller-Preussker and A. Schiller, Phys. Rev. D 72, 014507 (2005) [arXiv:hep-lat/0506007]; A. Sternbeck, E. M. Ilgenfritz, M. Müller-Preussker, A. Schiller and I. L. Bogolubsky, PoS LAT2006, 076 (2006) [arXiv:hep-lat/0610053].
[18] C. S. Fischer, A. Maas, J. M. Pawlowski and L. von Smekal, arXiv:hep-ph/0701050, accepted by Ann. Phys.
[19] D. Birmingham, M. Blau, M. Rakowski and G. Thompson, Phys. Rept. 209, 129 (1991); S. Cordes, G. W. Moore and S. Ramgoolam, Nucl. Phys. Proc. Suppl. 41, 184 (1995) [arXiv:hep-th/9411210].
[20] I. M. Singer, Commun. Math. Phys. 60, 7 (1978).
[21] D. Zwanziger, Phys. Rev. D 69, 016002 (2004) [arXiv:hep-ph/0303028].
[22] A. Cucchieri, A. Maas and T. Mendes, arXiv:hep-lat/0702022, accepted by Phys. Rev. D.
[23] L. von Smekal, R. Alkofer and A. Hauck, Phys. Rev. Lett. 79, 3591 (1997) [arXiv:hep-ph/9705242]; Annals Phys. 267, 1 (1998) [Erratum-ibid. 269, 182 (1998)] [arXiv:hep-ph/9707327].
[24] G. P. Lepage and P. B. Mackenzie, Phys. Rev. D 48, 2250 (1993) [arXiv:hep-lat/9209022].
[25] A. Cucchieri, arXiv:hep-lat/0612004.
[26] A. Maas, A. Cucchieri and T. Mendes, arXiv:hep-lat/0610006.
[27] A. Cucchieri, T. Mendes and A. Mihara, JHEP 0412, 012 (2004) [arXiv:hep-lat/0408034]; A. Sternbeck, E. M. Ilgenfritz, M. Müller-Preussker, A. Schiller and I. L. Bogolubsky, PoS LAT2006, 076 (2006) [arXiv:hep-lat/0610053].
[28] C. S. Fischer and J. M. Pawlowski, Phys. Rev. D 75, 025012 (2007) [arXiv:hep-th/0609009]; R. Alkofer, C. S. Fischer and F. J. Llanes-Estrada, Phys. Lett. B 611, 279 (2005) [arXiv:hep-th/0412330].
[29] C. S. Fischer, private communication; M. Huber, Diploma Thesis, Graz U., March 2007 (advisor: R. Alkofer).
[30] A. Cucchieri, Nucl. Phys. B 508, 353 (1997) [arXiv:hep-lat/9705005].
[31] R. Brun and F. Rademakers, Nucl. Instrum. Meth. A 389, 81 (1997).
[32] A. Cucchieri and T. Mendes, Nucl. Phys. B 471, 263 (1996) [arXiv:hep-lat/9511020]; Ph. de Forcrand and R. Gupta, Nucl. Phys. B (Proc. Suppl.) 9, 516 (1989).
ABSTRACT The ghost and gluon propagator and the ghost-gluon and three-gluon vertex of two-dimensional SU(2) Yang-Mills theory in (minimal) Landau gauge are studied using lattice gauge theory. It is found that the results are qualitatively similar to the ones in three and four dimensions. The propagators and the Faddeev-Popov operator behave as expected from the Gribov-Zwanziger scenario. In addition, finite volume effects affecting these Green's functions are investigated systematically. The critical infrared exponents of the propagators, as proposed in calculations using stochastic quantization and Dyson-Schwinger equations, are confirmed quantitatively. For this purpose lattices of volume up to (42.7 fm)^2 have been used. <|endoftext|><|startoftext|> Coupled electron and phonon transport in one-dimensional atomic junctions J. T.
Lü∗ and Jian-Sheng Wang†
Center for Computational Science and Engineering and Department of Physics, National University of Singapore, Singapore 117542, Republic of Singapore
(Dated: August 5, 2021)

Employing the nonequilibrium Green’s function method, we develop a fully quantum mechanical model to study the coupled electron-phonon transport in one-dimensional atomic junctions in the presence of a weak electron-phonon interaction. This model enables us to study the electronic and phononic transport on an equal footing. We derive the electrical and energy currents of the coupled electron-phonon system and the energy exchange between them. As an application, we study the heat dissipation in current-carrying atomic junctions within the self-consistent Born approximation, which guarantees energy current conservation. We find that the inclusion of phonon transport is important in determining the heat dissipation and temperature change of the atomic junctions.

PACS numbers: 71.38.-k, 63.20.Kr, 72.10.Bg

I. INTRODUCTION

Electronic and phononic transport in meso- and nanostructures have attracted a great deal of interest in the past two decades, although the two fields have not always developed in parallel. These structures display important quantum effects due to the confinement in one or more directions$^{1}$. The quantized electrical conductance$^{2}$ was observed much earlier than the quantized thermal conductance$^{3}$, mainly because of the difficulty in measuring thermal transport properties. Electrons and phonons are not two isolated systems. Their interactions are important for both electronic and phononic transport. With the development of both fields, the need to study coupled electron-phonon transport arises from time to time. When studying electronic transport problems, one usually assumes that electrons interact with some phonon bath where the phonons are in their thermal equilibrium state characterized by the Bose distribution.
This simple assumption does not give satisfactory results in cases where the phonons are driven out of equilibrium by the electrons. This is especially true where the thermal conductance is low or the phonon relaxation is slow$^{4,5}$. To take into account the nonequilibrium phonon effect, one usually introduces into the electronic transport formalism some phenomenological parameters that describe the phonon relaxation process. In engineering applications, as the size of electronic devices decreases to the nanoscale, heat dissipation and conduction in these structures become critical issues, which may influence the electronic properties dramatically$^{6}$. Studying the electronic transport alone is not enough in these cases. On the other hand, heat transport in one-dimensional (1D) structures has received considerable attention recently$^{6,7,8}$. Fourier’s law of heat conduction is no longer valid in many 1D systems. The microscopic origin of the macroscopic Fourier’s law remains one of the most frustrating problems in nonequilibrium statistical mechanics. Since electrons and phonons both contribute to the heat conduction, their relative roles in many nanostructures are still not clear. In semiconductors especially, which of the two carries the majority of the thermal current is not a trivial question. To answer these questions, we need general models that take into account the electron transport, the phonon transport, and their mutual interactions.

Theoretically, although the development of electronic transport in 1D structures has been very striking, that of the phononic transport is relatively slow. Classical molecular dynamics (MD) and the Boltzmann-Peierls equation are the most widely used methods in phononic transport. The MD method is not accurate below the Debye temperature, while the Boltzmann-Peierls equation cannot be used in nanostructures without translational invariance. In both cases, the quantum effect becomes important$^{1}$.
Only recently, the nonequilibrium Green’s function method$^{9,10,11,12}$, which has been widely used to study electronic transport, has been applied to study quantum phononic transport$^{13,14,15,16,17}$.

As far as we know, studies of the coupled electronic and phononic transport in nanostructures are rare$^{18,19,20,21}$. In Ref. 19, the authors considered the nonequilibrium phonons in molecular transport junctions. Galperin and co-authors analyzed the heat generation and conduction in molecular systems$^{18}$. In this paper, using the nonequilibrium Green’s function method, we study the coupled electronic and phononic transport in 1D atomic junctions. The formalism is similar to that of Ref. 18. In our model the electron subsystem is described by a single-orbital tight-binding Hamiltonian, and the phonon subsystem is described in a harmonic approximation. We assume that the electron-phonon interaction is weak so that the perturbative treatment is valid. The strong-interaction case is left for future work.

The rest of the paper is organized as follows. In Sec. II, we introduce the 1D model system, and derive expressions for the electrical and energy currents of the coupled electron-phonon system. In Sec. III we show the heat generation in one- and two-atom structures under different model parameters. Sec. IV is the conclusion. In Appendices A-C we give some technical details of our derivation.

II. COUPLED ELECTRONIC AND PHONONIC TRANSPORT

A. The Hamiltonian

Our model system is an infinite 1D atomic chain as shown in Fig. 1. The electrons and atoms are only allowed to move in the longitudinal direction. We treat the atoms as coupled harmonic oscillators, and take into account their nearest-neighbour interactions up to the second order. We assume that there is only one electronic state for each atom and take into account hopping transitions between the nearest states. This corresponds to a single-orbital tight-binding model. Also, we assume that there is only one spin state for each orbital. Following Caroli$^{22}$, we divide the whole system into one central region and two semi-infinite leads, which act as electrical and thermal baths (Fig. 1). The Hamiltonian of the whole system is

$$H = \sum_{\alpha=L,C,R;\,\beta=e,ph} H^{\alpha}_{\beta} + \sum_{\alpha=L,R;\,\beta=e,ph} \left( H^{\alpha C}_{\beta} + H^{C\alpha}_{\beta} \right) + H_{eph}. \qquad (1)$$

The electron-phonon interaction Hamiltonian $H_{eph}$ is non-zero only in the central region. The electron Hamiltonian reads

$$H^{\alpha}_{e} = \sum_{i} \varepsilon^{\alpha}_{i}\, c^{\dagger}_{i} c_{i} + \sum_{|i-j|=1} t^{\alpha}_{ij}\, c^{\dagger}_{i} c_{j}, \qquad (2)$$

where $c^{\dagger}_{i}$ and $c_{i}$ are the electron creation and annihilation operators, $\varepsilon^{\alpha}_{i}$ is the electron onsite energy, and $t^{\alpha}_{ij}$ is the hopping energy between adjacent states. $i$ and $j$ run over the sites in the $\alpha$ region. The coupling Hamiltonian with the leads is

$$H^{LC}_{e} = \sum_{i,j} t^{LC}_{ij}\, c^{\dagger}_{i} c_{j}, \qquad (3)$$

$$H^{CR}_{e} = \sum_{i,j} t^{CR}_{ij}\, c^{\dagger}_{i} c_{j}. \qquad (4)$$

$H^{CL}_{e}$ and $H^{RC}_{e}$ have similar expressions. We also have $t^{\alpha C} = (t^{C\alpha})^{\dagger}$, $\alpha = L, R$. For our 1D tight-binding model, $t^{\alpha C}$ has only one non-zero element. If we label the central atoms with indices 1 to $n$ as shown in Fig. 1, the non-zero elements will be $t^{LC}_{01}$, $t^{CL}_{10}$, $t^{RC}_{n+1,n}$, and $t^{CR}_{n,n+1}$. The phonon Hamiltonian is

$$H^{\alpha}_{ph} = \frac{1}{2}\sum_{i} \dot{u}^{\alpha}_{i} \dot{u}^{\alpha}_{i} + \frac{1}{2}\sum_{|i-j|=0,1} u^{\alpha}_{i} K^{\alpha}_{ij} u^{\alpha}_{j}. \qquad (5)$$

$u^{\alpha}_{i}$ and $\dot{u}^{\alpha}_{i}$ are the mass-renormalized atomic displacement and momentum operators. $K^{\alpha}_{ii} = 2K^{\alpha}_{0}/m^{\alpha}_{i}$, and $K^{\alpha}_{ij} = -K^{\alpha}_{0}/\sqrt{m^{\alpha}_{i} m^{\alpha}_{j}}$ ($i \neq j$). Here $K^{\alpha}_{0}$ is the spring constant, and $m^{\alpha}_{i}$ is the mass of the $i$th atom in the $\alpha$ region. As for the electrons, the coupling Hamiltonian with the leads is

$$H^{LC}_{ph} = \frac{1}{2}\sum_{i,j} u^{L}_{i} K^{LC}_{ij} u^{C}_{j}, \qquad (6)$$

$$H^{CR}_{ph} = \frac{1}{2}\sum_{i,j} u^{C}_{i} K^{CR}_{ij} u^{R}_{j}. \qquad (7)$$

We also have $K^{C\alpha} = (K^{\alpha C})^{T}$. The non-zero elements are $K^{LC}_{01}$, $K^{CL}_{10}$, $K^{RC}_{n+1,n}$, and $K^{CR}_{n,n+1}$.

[Figure 1 labels: onsite energies $\varepsilon$, hopping elements $t$, and force constants $K$ of the left lead, central region, and right lead, with $H_{eph}$ coupling the electron states and the atoms of the central region.]

FIG. 1: Schematic diagram of the 1D coupled electron-phonon system and the parameters used in the model. The big dots in the bottom line represent atoms, while the small dots in the upper line represent electron states. They are coupled via the electron-phonon interaction.
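The mass-normalized force-constant matrix described above is easy to assemble numerically. The sketch below is illustrative only: it assumes a spring constant in eV/Å^2 and masses in amu, so that the matrix elements carry the unit eV/(Å^2·amu), and it treats a finite segment in isolation (the lead couplings are ignored). The function names and the example mass of 1 amu are hypothetical choices.

```python
import numpy as np

HBAR_EV_S = 6.582119569e-16     # reduced Planck constant in eV*s
EV_J = 1.602176634e-19          # 1 eV in joules
ANGSTROM_M = 1.0e-10            # 1 angstrom in metres
AMU_KG = 1.66053906660e-27      # 1 atomic mass unit in kg

def force_constant_matrix(k0, masses):
    """Nearest-neighbour matrix with K_ii = 2*k0/m_i and
    K_ij = -k0/sqrt(m_i*m_j) for adjacent atoms."""
    n = len(masses)
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] = 2.0 * k0 / masses[i]
        if i + 1 < n:
            K[i, i + 1] = K[i + 1, i] = -k0 / np.sqrt(masses[i] * masses[i + 1])
    return K

def phonon_energies_ev(K):
    """Mode energies hbar*omega in eV for K given in eV/(angstrom^2 * amu)."""
    to_si = EV_J / (ANGSTROM_M ** 2 * AMU_KG)    # converts the unit to s^-2
    omega_squared = np.linalg.eigvalsh(K) * to_si
    return HBAR_EV_S * np.sqrt(omega_squared)

# A single atom whose diagonal element equals 0.654 eV/(angstrom^2 * amu).
K_single = force_constant_matrix(0.327, [1.0])
energy_single = phonon_energies_ev(K_single)[0]   # roughly 0.05 eV
```

With a diagonal element of 0.654 eV/(Å^2·amu), the isolated-atom mode energy comes out close to 0.05 eV; coupling to the leads would shift this value.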
The electron-phonon interaction is included within the adiabatic Born-Oppenheimer approximation. First, the electron subsystem is solved with all the atoms in their equilibrium positions. Then, the isolated phonon subsystem is considered. After that, the electron-phonon interaction is turned on by allowing the atoms to oscillate around their equilibrium positions. Within this picture, the electron-phonon interaction is$^{11}$

$$H_{eph} = \sum_{i,j,k} M^{k}_{ij}\, c^{\dagger}_{i} c_{j} u_{k}. \qquad (8)$$

The interaction matrix element is denoted $M^{k}_{ij}$. All the operators in Eq. (8) are in the central region, so we omit the superscript $C$. In our model, the electron operators are in second quantization, while those of the phonons are in first quantization.

B. Green’s functions

The nonequilibrium Green’s function method for electronic transport is discussed in Refs. 9,10,11,12, and that for phononic transport in Refs. 13,14,15,16,17. Here we concentrate on the electron-phonon interactions. The definition of the electron contour-ordered Green’s function is $G_{jk}(\tau, \tau') = -i\langle T\{c_{j}(\tau) c^{\dagger}_{k}(\tau')\}\rangle$, and the phonon counterpart is $D_{jk}(\tau, \tau') = -i\langle T\{u_{j}(\tau) u_{k}(\tau')\}\rangle$. Here $\tau$ is the time on the Keldysh contour, and $T\{\cdots\}$ is the contour-ordering operator. We set $\hbar = 1$ throughout the formulas. Without the electron-phonon interaction, the isolated electron and phonon problems can be solved exactly. We denote these Green’s functions as $G_{0}(\tau, \tau')$ and $D_{0}(\tau, \tau')$, respectively. In our case, it is convenient to write the Hamiltonians as matrices and work in energy space. The electron retarded and advanced Green’s functions are

$$G^{r}_{0}(\varepsilon) = \left[G^{a}_{0}(\varepsilon)\right]^{\dagger} = \left[(\varepsilon + i\eta) I - H^{C}_{e} - \Sigma^{r}_{L}(\varepsilon) - \Sigma^{r}_{R}(\varepsilon)\right]^{-1}.$$

$I$ is an identity matrix, and $\eta \to 0^{+}$. The retarded self-energy $\Sigma^{r}_{\alpha} = t^{C\alpha} g^{r}_{\alpha} t^{\alpha C}$ is due to the interaction with lead $\alpha$. The retarded Green’s function of the semi-infinite lead, $g^{r}_{\alpha}$, can be obtained analytically (Appendix A). The “less than” Green’s function is given by $G^{<}_{0} = G^{r}_{0}(\Sigma^{<}_{L} + \Sigma^{<}_{R}) G^{a}_{0}$, where $\Sigma^{<}_{\alpha} = -f^{e}_{\alpha}(\Sigma^{r}_{\alpha} - \Sigma^{a}_{\alpha})$.
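$f^{e}_{\alpha}$ is the Fermi-Dirac distribution. The phonon retarded and advanced Green’s functions are$^{23}$

$$D^{r}_{0}(\omega) = \left[D^{a}_{0}(\omega)\right]^{\dagger} = \left[(\omega + i\eta)^{2} I - K^{C} - \Pi^{r}_{L}(\omega) - \Pi^{r}_{R}(\omega)\right]^{-1}.$$

The lead retarded self-energy is $\Pi^{r}_{\alpha}(\omega) = K^{C\alpha} d^{r}_{\alpha}(\omega) K^{\alpha C}$. $d^{r}_{\alpha}$ also has an analytical expression (Appendix A). The phonon “less than” Green’s function is $D^{<}_{0} = D^{r}_{0}(\Pi^{<}_{L} + \Pi^{<}_{R}) D^{a}_{0}$, where $\Pi^{<}_{\alpha} = f^{ph}_{\alpha}(\Pi^{r}_{\alpha} - \Pi^{a}_{\alpha})$. $f^{ph}_{\alpha}$ is the Bose distribution function.

Knowing the bare electron and phonon Green’s functions $G_{0}$ and $D_{0}$, we can include their interaction as a perturbation. Following the standard procedure of the nonequilibrium Green’s function method, we can express this interaction as self-energies. The full Green’s functions are obtained from the Dyson equation, e.g., for electrons $G^{r,a} = G^{r,a}_{0} + G^{r,a}_{0} \Sigma^{r,a}_{eph} G^{r,a}$, and $G^{<} = G^{r} \Sigma^{<} G^{a}$, where $\Sigma^{<}$ collects the lead and electron-phonon self-energies. [...] The electrical current of lead $\alpha$ is

$$J_{\alpha} = e \int \frac{d\varepsilon}{2\pi}\, \mathrm{Tr}\{G^{>}(\varepsilon)\Sigma^{<}_{\alpha}(\varepsilon) - G^{<}(\varepsilon)\Sigma^{>}_{\alpha}(\varepsilon)\}. \qquad (14)$$

The electron energy current is

$$J^{E,e}_{\alpha} = \int \frac{d\varepsilon}{2\pi}\, \varepsilon\, \mathrm{Tr}\{G^{>}(\varepsilon)\Sigma^{<}_{\alpha}(\varepsilon) - G^{<}(\varepsilon)\Sigma^{>}_{\alpha}(\varepsilon)\}. \qquad (15)$$

The electron heat current is obtained from Eqs. (14-15) as $J^{h,e}_{\alpha} = J^{E,e}_{\alpha} - \mu_{\alpha} J_{\alpha}/e$, where $\mu_{\alpha}$ is the lead chemical potential. The derivation of the phonon energy current runs parallel to that of the electrons$^{17}$:

$$J^{E,ph}_{\alpha} = -\int_{-\infty}^{\infty} \frac{d\omega}{4\pi}\, \omega\, \mathrm{Tr}\{D^{>}(\omega)\Pi^{<}_{\alpha}(\omega) - D^{<}(\omega)\Pi^{>}_{\alpha}(\omega)\}. \qquad (16)$$

For phonons the energy current is the same as the heat current.

When there is no electron-phonon interaction, the electron energy current is conserved throughout the structure. So is the phonon energy current. In the presence of such an interaction, only the total energy current is conserved, due to the energy exchange between the two subsystems. The phonons do not carry charge, so in both cases the electrical current is conserved. Since we cannot obtain the exact self-energies in most cases, we need some approximations. Properly defined self-energies should fulfill the electrical and energy current conservation

$$\sum_{\alpha} J_{\alpha} = 0, \qquad (17)$$

$$\sum_{\alpha} \left(J^{E,e}_{\alpha} + J^{E,ph}_{\alpha}\right) = 0, \qquad (18)$$

where $\alpha$ runs over all the leads. We show that the self-consistent Born approximation (SCBA) fulfills these conservation laws, while the Born approximation (BA) fails to conserve the energy current (Appendix B).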
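The bare retarded Green’s function and the lead self-energies introduced above can be evaluated directly for a small central region. The following sketch assumes uniform semi-infinite tight-binding leads, for which the surface Green’s function has a closed form, and it evaluates the non-interacting transmission $\mathrm{Tr}[G^{r}_{0}\Gamma_{L}G^{a}_{0}\Gamma_{R}]$ with $\Gamma_{\alpha} = i(\Sigma^{r}_{\alpha} - \Sigma^{a}_{\alpha})$. The function names, the default lead parameters (onsite energy 0 eV, hopping 0.5 eV), and the broadening `eta` are illustrative assumptions; the electron-phonon self-energies are not included.

```python
import cmath
import numpy as np

def lead_surface_gf(energy, onsite=0.0, hopping=0.5, eta=1e-9):
    """Retarded surface Green's function of a semi-infinite 1D tight-binding lead."""
    z = energy + 1j * eta - onsite
    root = cmath.sqrt(z * z - 4.0 * hopping ** 2)
    g = (z - root) / (2.0 * hopping ** 2)
    # The retarded branch is the root with a non-positive imaginary part.
    return g if g.imag <= 0.0 else (z + root) / (2.0 * hopping ** 2)

def bare_transmission(energy, h_center, t_cl, t_cr, eta=1e-9):
    """Tr[G^r Gamma_L G^a Gamma_R] without electron-phonon interaction.

    The left lead couples to the first central site, the right lead to the last.
    """
    n = h_center.shape[0]
    g_lead = lead_surface_gf(energy, eta=eta)
    sigma_l = np.zeros((n, n), dtype=complex)
    sigma_r = np.zeros((n, n), dtype=complex)
    sigma_l[0, 0] = t_cl ** 2 * g_lead        # Sigma^r_L = t^{CL} g^r_L t^{LC}
    sigma_r[-1, -1] = t_cr ** 2 * g_lead      # Sigma^r_R = t^{CR} g^r_R t^{RC}
    g_r = np.linalg.inv((energy + 1j * eta) * np.eye(n) - h_center - sigma_l - sigma_r)
    gamma_l = 1j * (sigma_l - sigma_l.conj().T)
    gamma_r = 1j * (sigma_r - sigma_r.conj().T)
    return float(np.trace(g_r @ gamma_l @ g_r.conj().T @ gamma_r).real)

# Sanity check: a central site identical to the lead atoms transmits perfectly.
t_perfect = bare_transmission(0.3, np.array([[0.0]]), 0.5, 0.5)
# Single weakly coupled atom (onsite 0.1 eV, coupling 0.1 eV), evaluated on resonance.
t_weak = bare_transmission(0.1, np.array([[0.1]]), 0.1, 0.1)
```

The perfect-chain check is a convenient way to confirm that the correct (retarded) branch of the surface Green’s function has been selected.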
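Provided we satisfy these conservation laws, we can write the electrical and energy currents in symmetric forms. The electrical current is

$$J = e \int \frac{d\varepsilon}{2\pi}\, \tilde{T}^{e}(\varepsilon) \left[f^{e}_{L}(\varepsilon) - f^{e}_{R}(\varepsilon)\right]. \qquad (19)$$

The transmission coefficient reads

$$\tilde{T}^{e} = \frac{1}{2}\mathrm{Tr}\left\{G^{r}\left(\Gamma_{L} + \tfrac{1}{2}\Gamma_{eph} - S^{e}\right)G^{a}\Gamma_{R}\right\} + \frac{1}{2}\mathrm{Tr}\left\{G^{r}\Gamma_{L}G^{a}\left(\Gamma_{R} + \tfrac{1}{2}\Gamma_{eph} + S^{e}\right)\right\}, \qquad (20)$$

where $S^{e}$ is

$$S^{e} = \frac{\tfrac{1}{2}(f^{e}_{R} + f^{e}_{L})\Gamma_{eph} + i\Sigma^{<}_{eph}}{f^{e}_{L} - f^{e}_{R}}. \qquad (21)$$

$\Gamma_{\alpha} = i(\Sigma^{r}_{\alpha} - \Sigma^{a}_{\alpha})$, $\alpha = L, R$, is the electron level-width function. $\Gamma_{eph} = i(\Sigma^{r}_{eph} - \Sigma^{a}_{eph})$ is due to the electron-phonon interaction. The total energy current is

$$J^{E} = \int \frac{d\varepsilon}{2\pi}\, \varepsilon\, \tilde{T}^{e}(\varepsilon)\left[f^{e}_{L}(\varepsilon) - f^{e}_{R}(\varepsilon)\right] + \int_{-\infty}^{\infty} \frac{d\omega}{4\pi}\, \omega\, \tilde{T}^{ph}(\omega)\left[f^{ph}_{L}(\omega) - f^{ph}_{R}(\omega)\right]. \qquad (22)$$

The phonon transmission coefficient is

$$\tilde{T}^{ph} = \frac{1}{2}\mathrm{Tr}\left\{D^{r}\left(\Lambda_{L} + \tfrac{1}{2}\Lambda_{eph} - S^{ph}\right)D^{a}\Lambda_{R}\right\} + \frac{1}{2}\mathrm{Tr}\left\{D^{r}\Lambda_{L}D^{a}\left(\Lambda_{R} + \tfrac{1}{2}\Lambda_{eph} + S^{ph}\right)\right\}, \qquad (23)$$

where $S^{ph}$ is

$$S^{ph} = \frac{\tfrac{1}{2}(f^{ph}_{R} + f^{ph}_{L})\Lambda_{eph} - i\Pi^{<}_{eph}}{f^{ph}_{L} - f^{ph}_{R}}. \qquad (24)$$

$\Lambda_{\alpha} = i(\Pi^{r}_{\alpha} - \Pi^{a}_{\alpha})$ is the phonon level-width function. $\Lambda_{eph} = i(\Pi^{r}_{eph} - \Pi^{a}_{eph})$ is due to the electron-phonon interaction. Eqs. (19-24) are the generalization of the Caroli formula$^{22}$ to include the electron-phonon interaction.

III. HEAT GENERATION IN CURRENT CARRYING 1D ATOMIC JUNCTIONS

As an application of the formalism in Sec. II, we study the heat dissipation in current-carrying 1D atomic junctions$^{18,25,26,27,28,29,30,31}$. In the presence of a potential difference between the two leads, an electrical current flows between them. When the electrons pass through the central region, there is energy exchange between the electron and phonon systems. The energy dissipated into the phonon system makes the atom temperature higher than that of the leads if it is not efficiently conducted to the leads. If the electron-phonon interaction is weak, the energy dissipated into the phonon system is only a small fraction of the electron energy current. But this small fraction still influences the transport properties of the atomic junction and can even lead to junction breakup$^{32,33}$, especially when the thermal conductance is low. Different models have been used to study the local heating effect. Some simply assume that the phonons are in their thermal equilibrium states$^{31}$.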
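To make the structure of the Landauer-type current formula of Eq. (19) concrete, the integral can be evaluated numerically for a given transmission function. The sketch below is illustrative and rests on several assumptions: a single spinless channel, a bias applied symmetrically so that the lead chemical potentials are ±V/2 in eV, an energy window of −1 to 1 eV, and a transmission supplied by the caller (here a perfectly transmitting channel, not the interacting $\tilde{T}^{e}$ of the text). With $\hbar = 1$ restored, the prefactor $e/2\pi$ becomes $e/h$.

```python
import numpy as np

E2_OVER_H = 3.874045865e-5   # e^2/h for one spinless channel, in siemens
K_B = 8.617333262e-5         # Boltzmann constant in eV/K

def fermi(energy, mu, temperature):
    """Fermi-Dirac distribution, written with tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh((energy - mu) / (2.0 * K_B * temperature)))

def landauer_current(transmission, bias, temperature=4.2,
                     e_min=-1.0, e_max=1.0, n_grid=20001):
    """Current in amperes: (e/h) * integral of T(E) * [f_L(E) - f_R(E)] dE."""
    energy = np.linspace(e_min, e_max, n_grid)                       # eV
    window = (fermi(energy, +0.5 * bias, temperature)
              - fermi(energy, -0.5 * bias, temperature))
    integrand = transmission(energy) * window
    # Trapezoidal rule; the result is in eV.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(energy))
    return E2_OVER_H * integral        # (e/h) * eV equals (e^2/h) * volt

# Perfect transmission at a bias of 0.1 V gives the conductance quantum times the bias.
current = landauer_current(lambda e: np.ones_like(e), bias=0.1)
```

Any callable returning the transmission on an energy grid can be passed in place of the constant used here, for example one built from the retarded Green’s function of the central region.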
Some take into account the phonon transport by using the rate equations30 or other semi-classical models26,27,32. Few of them take into ac- count the quantum effect in heat transport18,34. Our model treats the electron and phonon transport on an equal quantum-mechanical footing, and includes their in- teractions self-consistently. The heat generation is given by (Eq. (B9)) Q = i G>nm(ε)M kl(ω)G ij(ε− ω)M ljn . (25) At zero temperature, we can get an analytical expression Eq. (C1) for a single-atom structure by using the bare Green’s functions G0 and D0 in Eq. (25) (Appendix C). Equation (C1) can reproduce most qualitative features of heat generation in a single atom, except that it does not take into account heat conduction in the phonon system. We first study the case where the lead energy band is wide compared to the voltage applied to the structure. For most metallic leads, this condition should hold. In the weak electron-phonon coupling regime, the Born ap- proximation should give acceptable results for the heat generation, although physically it is not a good approx- imation. Figure 2 shows the heat generation of a single atom (n = 1 in Fig. 1) computed using Eqs. (B5) and (B8) under BA and SCBA, respectively. The parameters used in the calculation are stated in the figure caption. With these parameters, the electron energy band is in the range −1 ≤ ε ≤ 1 eV. The chemical potential of each lead is zero in equilibrium. The phonon energy is ap- proximately ω = 0.05 eV. In all the results presented in this section, the temperature is T = 4.2 K, the electron- phonon coupling matrix M = 0.08 eV/(Å·amu 12 ). The cut-off energy of the electron system is 2.1 eV, and the phonon system is 0.2 eV. The energy spacing is dis- cretized into grids of 1 meV. Equation (B5) gives the energy decrease of the electron system, while Eq. (B8) gives the energy increase of the phonon system. Numeri- cal results from Eq. (B5) and Eq. (B8) under SCBA have some slight discrepancy. 
This is due to numerical inaccuracies. But most of the discrepancy under BA comes from the difference between the bare and the full Green’s functions, which may become even larger for some parameters. So BA should be used with care in the study of the energy exchange between the electron and phonon systems. We also note that although Eqs. (B5) and (B7) are equivalent, the numerical result from Eq. (B5) is unstable in many cases. The reason is that the energy exchange between the electron and the phonon system is only a small fraction of the total electrical energy current. Equation (B5) is the difference between two large numbers, so the numerical integration has to be very accurate to give a reasonable result30. In contrast, Eq. (25) is much more stable, since the difference has been evaluated analytically. All the results presented below use this equation.

FIG. 2: Comparison of different methods to compute the heat generation in a single-atom structure. The four curves (labeled SCBA, B8; SCBA, B5; BA, B8; BA, B5; horizontal axis: Voltage (V)) correspond to results from Eqs. (B5) and (B8) under BA and SCBA, respectively. If we label this single atom as index 1, its electronic onsite energy is written as $\varepsilon^{C}_{1} = 0.1$ eV. The onsite energy of the leads is $\varepsilon_L = \varepsilon_R = 0$ eV. The hopping energy is $t^{L}_{ij} = t^{R}_{ij} = 0.5$ eV. The non-zero electronic coupling with the leads is $t^{CL}_{10} = t^{RC}_{21} = 0.1$ eV. The matrix element of the single atom is $K^{C}_{11} = 0.654$ eV/(Å·amu). The spring constant between the lead atoms is $K^{L}_{ij} = K^{R}_{ij} = 0.654$ eV/(Å²·amu). The non-zero atomic coupling with the leads is $K^{CL}_{10} = K^{RC}_{21} = 0.127$ eV/(Å·amu).

From Fig. 2, we can see two threshold values in heat generation. The first one corresponds to the onset of phonon emission. At low temperatures, the equilibrium phonon occupation is very small, so the phonon absorption process seldom takes place. If the applied bias is smaller than the phonon energy, electrons don’t have enough energy to emit one phonon, so the heat generation is zero. Once the applied voltage is larger than the phonon energy, phonon emission turns on. The heat generation increases almost linearly with the applied bias (inset of Fig. 2). This is different from the electrical current, which increases smoothly in this regime. The second threshold value corresponds to the alignment of the left lead chemical potential with the electron onsite energy, $eV = 2\varepsilon_0$ (positive bias, $\mu_L > \mu_R$). The electron transmission is nearly unity above the onsite energy. The larger the transmission, the larger the current and the heat generation, provided that the other parameters remain unchanged. These two threshold behaviours may become less obvious when the coupling with the leads gets stronger. As a result of the coupling, the discrete electron and phonon densities of states (DoS) broaden into small energy regions around their discrete values. The continuous phonon DoS leads to the broadening of the first threshold behaviour, while the continuous electron DoS is responsible for that of the second. The second threshold is smoothed out when the coupling is large enough (Fig. 3). Only electrons whose energies are within the broadened energy spectrum can tunnel across the central atom. The heat generation reaches its maximum when the electron states in one lead are all occupied in this energy range, while those in the other are all empty.

The electron-lead coupling not only leads to the electron level broadening, but it also influences the electron tunneling time. The larger this coupling, the less time electrons spend in the central region. In Fig. 3, we show the heat generation and the atom temperature for a single-atom structure under different electronic coupling strengths. The definition of temperature is ambiguous in nanostructures6. Here we use the method proposed in Ref. 18. We can only see one threshold behaviour at about 0.2 V, which is smoothed out when the coupling is larger than 0.2 eV.
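The statement above that phonon absorption is negligible at low temperature can be checked with a line of arithmetic. The sketch below is illustrative only: it evaluates the equilibrium Bose occupation for the phonon energy ω = 0.05 eV and the temperature T = 4.2 K quoted in this section. The function name `bose_occupation` and the numerical value used for the Boltzmann constant are our own assumptions and are not part of the original calculation.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K (assumed CODATA value)

def bose_occupation(energy_ev: float, temperature_k: float) -> float:
    """Equilibrium Bose-Einstein occupation 1 / (exp(E / kT) - 1)."""
    x = energy_ev / (K_B_EV_PER_K * temperature_k)
    return 1.0 / math.expm1(x)

omega = 0.05   # phonon energy in eV (value quoted in the text)
temp = 4.2     # temperature in K (value quoted in the text)

ratio = omega / (K_B_EV_PER_K * temp)
n_eq = bose_occupation(omega, temp)
print(f"omega / kT = {ratio:.1f}")
print(f"equilibrium occupation = {n_eq:.3e}")
```

With these values ω/kT is roughly 138, so the equilibrium occupation is vanishingly small and phonon absorption can safely be neglected next to emission, in line with the argument in the text.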
The temperature and the heat generation show similar trends. The saturation voltage of the heat generation increases with the strength of the electron-lead coupling. This is due to the coupling-induced atomic level broadening. The decrease of the heat generation and temperature with increasing electron-lead coupling can be easily understood. The larger this coupling, the less time electrons spend at the central atom. Since the electron-phonon interaction takes place there, the heat generation decreases. We also show the heat generation as a function of electron-lead coupling in the inset of the lower panel. The applied voltage is 0.3 V. On the one hand, when the coupling is too small, few electrons can tunnel through the atom, and the heat generation is small. On the other hand, when the coupling is very large, the electron tunneling process is too quick for the phonons to interact with the electrons, and the heat generation is also small. It has a maximum value at some moderate coupling strength. This is different from the electrical current, which increases monotonically with increasing coupling strength.

FIG. 3: Heat generation Q and the atom temperature T (horizontal axis: Voltage (V)) under different electron coupling strengths $t^{CL}_{10} = t^{RC}_{21} = 0.05$, 0.1, 0.2, and 0.3 eV, respectively. Other parameters are the same as in Fig. 2. The inset shows the heat generation as a function of electron coupling strength at an applied bias V = 0.3 V.

The atom-lead coupling determines how well the generated heat can be conducted into the surrounding leads. One of the important reasons why we are interested in the heat generation in nanostructures is that it may lead to a temperature increase and even structure breakup. To study the temperature change, we need to take into account not only the heat generation, but also the heat conduction into the leads. In the simplest one-atom structure, the heat conductance is mainly determined by the atom-lead coupling.
Our model includes this intrinsically. Figure 4 shows the heat generation and the atom temperature as a function of atom-lead coupling. For the heat generation, the BA and SCBA results show a large difference around the resonant position, which corresponds to a perfect atomic junction. For the atom temperature, BA and SCBA give almost the same results. In the case of a perfect junction, the heat generation reaches its maximum value, while the atom temperature is the lowest. The reason is that the perfect junction has the best heat conductance. When the atom-lead coupling is weak, the heat generation is small. But the poor heat conductance can still result in a much higher temperature than that of the surrounding leads. We also show the heat conductance as a function of atom-lead coupling in the inset of the upper panel, which shows a sharp peak at resonance.

FIG. 4: Heat generation Q and the atom temperature T as a function of the atom-lead coupling $K^{CL}_{10} = K^{RC}_{21} = K$ (horizontal axis: K in eV/(Å²·amu)). Dashed and dotted lines correspond to results of SCBA and BA, respectively. Other parameters are the same as in Fig. 2. The inset shows the thermal conductance κ as a function of K. The unit is 1 × 10⁻¹² W/K.

In Fig. 5, we show the heat generation as a function of electron onsite energy at different biases. We assume that we can tune the onsite energy via a gate voltage. When the applied bias is less than the phonon energy, there is no heat generation. When the bias energy is slightly larger than the phonon energy and less than 2ω, there are two energy positions where the heat generation is the largest. These two peaks are approximately at −0.5eV + ω and 0.5eV − ω. They merge into a single peak at a bias of eV = 2ω, which then grows until it reaches saturation. After that, this peak broadens and turns into ladders. All these behaviours can also be explained by the analytical result of Eq. (C1). In Fig. 6 we show the heat generation of a two-atom structure (n = 2 in Fig. 1).
The central region has two identical atoms. Interaction between them leads to two discrete energy levels. One is at 0 eV, and the other at 0.4 eV. When the electrical coupling between the leads and the central region is small (0.1 eV), in addition to the threshold behaviour at eV = ω, there are two ladders corresponding to the phonon-assisted resonant tunneling across the two electrical levels. If the electrical coupling gets larger (0.2 eV), the two ladders broaden out. Again this is attributed to the coupling-induced level broadening. The heat generation for the two-atom structure is much larger than that of a single-atom structure. The more electrical levels there are, the larger the electrical current and heat generation. It is worth noting that for multi-atom structures the distribution of the electrostatic potential may influence the results significantly35. In the above calculation, we assume that the two electrical levels don’t change with the applied bias, and that we can tune their positions via a gate voltage.

FIG. 5: Heat generation Q as a function of electrical onsite energy $\varepsilon^{C}_{1}$ (horizontal axis: ε (eV)) under different biases. From the inner to the outer side, the applied biases are V = 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, and 0.50 V, respectively.

FIG. 6: Heat generation Q as a function of applied voltage (horizontal axis: Voltage (V)) for a two-atom structure. The two-atom onsite energy is $\varepsilon^{C} = 0.2$ eV, the hopping energy is $t^{C} = 0.1$ eV, and the spring constant is $K^{C} = 0.654$ eV/(Å²·amu). The two leads are identical. The electron onsite energy is $\varepsilon_L = \varepsilon_R = 0$, and the hopping energy is $t_L = t_R = 0.5$ eV. Their spring constants are the same as that of the central region. The non-zero couplings with the leads are $K^{CL}_{10} = K^{RC}_{32} = 0.327$ eV/(Å·amu), and $t^{CL}_{10} = t^{RC}_{32} = 0.1$ (solid), 0.2 eV (dashed), respectively.
If one of the metallic leads is replaced by a semiconductor, there will be some new features in the electrical current and the heat generation. In our simple model, we can alternate the electron onsite energies between two values to mimic a simple semiconductor (Appendix A). In Fig. 7 we show the heat generation and the electrical current for such a structure. The alternating onsite energies of the left lead are −0.1 and −0.2 eV, respectively. This produces an energy band-gap of 0.1 eV. Other parameters are given in the figure caption. We can see that negative differential conductivity appears in the current-voltage characteristics due to the semiconductor band-gap. This qualitatively agrees with the experimental36 and first-principles37 studies. The heat generation curve is slightly different. In addition to its threshold behaviour, the peak and valley positions are also different. The electrical current has a peak when the chemical potential of the lead is aligned with the central electrical level, while the peak of the heat generation shifts to the right by one phonon energy. This corresponds to the phonon-assisted resonant tunneling. The current and heat generation decrease when the single electrical level is within the band-gap of the left lead. The peak-to-valley ratio depends on the coupling with the semiconductor lead. In the limit of small band-gap and large coupling, we recover the metallic lead results.

FIG. 7: Heat generation Q and electrical current J as a function of applied voltage (horizontal axis: Voltage (V)) for a single-atom structure. The left lead is a semiconductor. Its alternating onsite energies are −0.2 and −0.1 eV, respectively. The chemical potential is µL = 0.05 eV higher than the conduction band bottom, which corresponds to n-type doping. Other parameters are $K^{CL}_{10} = K^{RC}_{21} = 0.4$ eV/(Å·amu), $t^{CL}_{10} = t^{RC}_{21} = 0.1$ eV, $\varepsilon^{C}_{1} = 0.1$ eV, $K^{C}_{11} = 0.654$ eV/(Å·amu), $\varepsilon_R = -0.05$ eV, $t_L = t_R = 0.5$ eV, and $K_L = K_R = 0.654$ eV/(Å²·amu).
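The 0.1 eV gap quoted above for the lead with alternating onsite energies can be checked with a short numerical sketch. It assumes an infinite chain with a two-site unit cell, onsite energies ε1 = −0.2 eV and ε2 = −0.1 eV, and nearest-neighbour hopping t0 = 0.5 eV (the values from the Fig. 7 caption), and diagonalizes the 2×2 Bloch Hamiltonian on a grid of wave vectors. The function name `alternating_chain_bands`, the sign convention of the hopping, and the grid size are our own illustrative choices; the band edges do not depend on the sign of t0.

```python
import numpy as np

def alternating_chain_bands(eps1, eps2, t0, nk=2001):
    """Return (k, lower band, upper band) for a chain with alternating onsite energies."""
    ks = np.linspace(-np.pi, np.pi, nk)        # k in units of 1/(unit-cell length)
    off = t0 * (1.0 + np.exp(-1j * ks))        # intra-cell plus inter-cell hopping
    h = np.zeros((nk, 2, 2), dtype=complex)
    h[:, 0, 0] = eps1
    h[:, 1, 1] = eps2
    h[:, 0, 1] = off
    h[:, 1, 0] = np.conj(off)
    bands = np.linalg.eigvalsh(h)              # eigenvalues sorted ascending for each k
    return ks, bands[:, 0], bands[:, 1]

eps1, eps2, t0 = -0.2, -0.1, 0.5               # eV, values from the Fig. 7 caption
ks, lower, upper = alternating_chain_bands(eps1, eps2, t0)

gap = upper.min() - lower.max()
root = np.sqrt((eps1 - eps2) ** 2 + 16.0 * t0 ** 2)
print(f"valence band:    [{lower.min():.4f}, {lower.max():.4f}] eV")
print(f"conduction band: [{upper.min():.4f}, {upper.max():.4f}] eV")
print(f"analytic outer edges: {0.5 * (eps1 + eps2 - root):.4f}, {0.5 * (eps1 + eps2 + root):.4f} eV")
print(f"gap = {gap:.4f} eV")
```

The inner band edges sit at ε1 and ε2, so the gap equals ε2 − ε1, and the outer edges follow $\tfrac{1}{2}\left[(\varepsilon_1+\varepsilon_2) \mp \sqrt{(\varepsilon_1-\varepsilon_2)^2 + 16 t_0^2}\right]$, which is the form of the band limits used in Appendix A.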
IV. CONCLUSION

We studied the coupled electron and phonon transport in 1D atomic junctions in the weak electron-phonon interaction regime. Based on the nonequilibrium Green’s function method, we derived the electrical and energy currents of the coupled electron-phonon system, and the energy exchange between them. We showed that the SCBA conserves the energy current. Using this formalism, we studied the heat generation in one- and two-atom structures coupled to different leads under a broad range of parameters. In particular, we studied the influence of the thermal transport properties on the heat generation and atom temperature of the central region. The results on semiconductor leads agree qualitatively with the experimental and first-principles studies. This model can be easily extended to study more realistic structures such as molecular transport junctions and metallic nanowires. The electron and phonon Hamiltonians, their interaction, and the lead-coupling matrices can all be obtained from first-principles calculations17,38,39. The surface Green’s functions for bulk leads can be computed by a recursive method17,38. It is also possible to include the electron-electron and the phonon-phonon interactions14,17.

Acknowledgments

We thank Baowen Li, Sai Kong Chin, Jian Wang, and Nan Zeng for discussions. This work is supported in part by a Faculty Research Grant of the National University of Singapore.

APPENDIX A: SURFACE GREEN’S FUNCTIONS OF THE 1D LEAD

In this Appendix, we show that for the 1D tight-binding model the lead self-energies can be expressed analytically17. The electron and phonon self-energies are similar in form. Here we take electrons as an example, and give the phonon results directly. We assume that the onsite energies of the electrons alternate between ε1 and ε2. The hopping energy is $t_{ij} = t_0$. If ε1 = ε2, we get a single continuous band. This corresponds to a metallic lead. If they are not equal, we get two bands with a band gap.
We can take the lower as the valence band (VB), and the upper as the conduction band (CB). We use this method to mimic a semiconductor lead. In this case, the semi- infinite lead has two electron states in each period. In the tight-binding model, only the left- (right-) most state of the central region is coupled to the left (right) lead. So we only need to know the surface Green’s function, e.g., for the left lead it is g0 = g 00. We assume the retarded Green’s function is grij = i−j state 1, i−j state 2. Putting it into the definition of the retarded Green’s func- tions [(ε+ iη)I −H ]gr = I, we have − t0c1 + (ε+ iη − ε2)c2 − t0c1λ = 0, (A2) − t0c2 + (ε+ iη − ε1)c1λ− t0c2λ = 0. (A3) From Eqs. (A2-A3), we get an equation for λ 2− (ε+ iη − ε1)(ε+ iη − ε2) λ+ 1 = 0. (A4) The condition that Eq. (A4) has travelling wave solutions gives the dispersion relation (ε1+ε2)− (ε1−ε2)2+16t ≤ ε ≤ ε1 (VB), ε2 ≤ ε ≤ (ε1+ε2)+ (ε1−ε2)2+16t (CB). We assume ε1 ≤ ε2 without loss of generality. The energy band-gap is ε2 − ε1. If they are equal, the two bands merge into one, which corresponds to a metallic lead. For the surface Green’s function of the left lead, we also have (ε+ iη − ε1)c1 − t0c2 = 1. (A6) From Eqs. (A2,A6), we get ε+iη−ε2 (1+λ)t2 (VB), ε+iη−ε1 (1+λ)t2 (CB). |λ| ≥ 1 is one of the roots of Eq. (A4). The surface Green’s function of the right lead is identical. We can also alternate the atom masses to generate a phonon band-gap. In our model the mass change will modify the renormalized spring constants. The diagonal elements of the dynamical matrix will be two alternating values Kαii = 2k1 or 2k2, while the off-diagonal elements will be a single value Kαij = − k1k2, where |i − j| = 1. If we assume that k2 ≥ k1, the acoustic band (AB) is 0 < ω2 < 2k1, and the optical band (OB) 2k2 < ω 2(k1 + k2). The surface Green’s function is (1+λ)k1k2 (AB), (1+λ)k1k2 (OB), where Ωn = (ω + iη) 2 − 2kn. |λ| ≥ 1 is one of the roots 2− Ω1Ω2 λ+ 1 = 0. 
(A9) In all the simulation results of present paper, the two spring constants are equal (k1 = k2), which correspond a single continuum phonon band. The electron onsite energies are also equal (ε1 = ε2) except in Fig. 7, where we set ε1 = −0.2 eV and ε2 = −0.1 eV to mimic a semiconductor lead. APPENDIX B: ENERGY CURRENT CONSERVATION In this Appendix, we justify that the SCBA satisfies the energy current conservation. The justification of the electrical current conservation is given in the Refs.40,41. What we need to prove is that (JE,eα + J α ) = 0. (B1) The electron part is JE,eα = ε Tr{G>(ε)Σ<α (ε)−G<(ε)Σ>α (ε)}. Using the important relation39,40 G>Σt = 0, (B3) we get JE,eα = − ε Tr{G>(ε)Σeph(ε)}. The Hartree term does not contribute to the current di- rectly. It’s just like a static potential which only modifies the Green’s function. Putting the Fock self-energy into Eq. (B4), we have JE,eα G>nm(ε)M kl(ω)G ij(ε− ω)M −Gkl(ω)G ij(ε− ω)M ljn . (B5) Sum over all the indices is assumed. The heat generation Q is the energy decrease of the electron system, which should also be the energy increase of the phonon system. Replacing ω by −ω, using the symmetric properties of the phonon Green’s functions17, replacing ε by ε − ω, and finally changing dummy variables, we get Gnm(ε)M kl(ω)G ij(ε− ω)M Putting Eq. (B6) back into Eq. (B5), we get JE,eα G>nm(ε)M kl(ω)G ij(ε− ω)M ljn 6= 0. (B7) For the phonon energy current we have JE,phα D>nm(ω)M ki(ε)G jl(ε− ω)M −Dki(ε)G jl(ε− ω)M . (B8) Following the same procedure as electrons, finally we get JE,phα G>nm(ε)M kl(ω)G ij(ε− ω)M ljn 6= 0. (B9) So we still have JE,eα + J = 0. (B10) Eqs. (B7, B9) give the energy exchange between the electron and the phonon system, which is also the heat generation of the atomic junction. Replacing D<, G< by D<0 , G 0 in Eq. (B7), and G >, G< by G>0 , G Eq. (B9), we get the results under BA. 
We can find that the energy increase of the phonons does not equal to the energy decrease of the electrons under BA. APPENDIX C: ANALYTICAL RESULT AT ZERO TEMPERATURE At zero temperature, we can get an analytical expres- sion for heat generation in a single-atom structure by using the bare Green’s functions in Eq. (B7, B9). We only take into account the imaginary part of the lead self- energies and ignore their energy dependence (the wide- band limit)24. Finally, we assume that the phonons are in their equilibrium states. Under these approximations, the heat generation is (assuming eV ≥ ω0) Q ≈ 1 M2ΓLΓR [(ε− ε0)2 + Γ2/4] [(ε− ε0 − ω0)2 + Γ2/4] M2ΓLΓR 4π(ω20 + Γ (eV/2− ε0)2 + Γ2/4 (eV/2− ε0 − ω0)2 + Γ2/4 (−eV/2− ε0 + ω0)2 + Γ2/4 (−eV/2− ε0)2 + Γ2/4 arctan eV/2− ε0 + arctan eV/2− ε0 − ω0 −arctan −eV/2− ε0 −arctan −eV/2− ε0 + ω0 . (C1) ω0 is the phonon energy, ε0 is the electron onsite energy, V is the applied bias, and Γ = ΓL+ΓR. The heat genera- tion is zero when eV ≤ ω0. Equation (C1) can reproduce most the qualitative features of heat generation in a sin- gle atom, except that it does not take into account heat conduction in the phonon system. ∗ Electronic address: tower.lu@gmail.com http://staff.science.nus.edu.sg/~phywjs/ 1 S. Ciraci, A. Buldum, and I. P. Batra, J. Phys.: Condens. Matter 13, R537 (2001). 2 B. J. van Wees, H. van Houten, C. W. J. Beenakker, J. G. Williamson, L. P. Kouwenhoven, D. van der Marel, and C. T. Foxon, Phys. Rev. Lett. 60, 848 (1988); D. A. Wharam, T. J. Thornton, R. Newbury, M. Pepper, H. Ahmed, J. E. F. Frost, D. G. Hasko, D. C. Peacock, D. A. Ritchie, and G. A. C. Jones, J. Phys. C: Solid State Phys. 21, L209 (1988). 3 L. G. C. Rego and G. Kirczenow, Phys. Rev. Lett. 81, 232 (1998); K. Schwab, E. A. Henriksen, J. M. Worlock, and M. L. Roukes, Nature 404, 974 (2000). 4 E. Pop, D. Mann, J. Cao, Q. Wang, K. Goodson, and H. Dai, Phys. Rev. Lett. 95, 155505 (2005); D. Mann, Y. K. Kato, A. Kinkhabwala, E. Pop, J. Cao, X. Wang, L. Zhang, Q. 
Wang, J. Guo, and H. Dai, Nature Nanotechnology 2, 33 (2007). 5 M. Lazzeri, S. Piscanec, F. Mauri, A. C. Ferrari, and J. Robertson, Phys. Rev. Lett. 95, 236802 (2005). 6 D. G. Cahill, W. K. Ford, K. E. Goodson, G. D. Mahan, A. Majumdar, H. J. Maris, R. Merlin, and S. R. Phillpot, J. Appl. Phys. 93, 793 (2003). 7 S. Lepri, R. Livi, and A. Politi, Phys. Rep. 377, 1 (2003). 8 B. Li, J. Wang, L. Wang, and G. Zhang, Chaos 15, 015121 (2005). 9 L. V. Keldysh, Sov. Phys. JETP 20, 1018 (1965). 10 L. Kadanoff and G. Baym, Quantum Statistical Mechanics (W. A. Benjamin, New York, 1962). 11 H. Haug and A.-P. Jauho, Quantum Kinetics in Transport and Optics of Semiconductors (Springer, Berlin, 1996). 12 S. Datta, Electronic Transport in Mesoscopic Systems (Cambridge University Press, 1997). 13 A. Ozpineci and S. Ciraci, Phys. Rev. B 63, 125415 (2001). 14 N. Mingo and L. Yang, Phys. Rev. B 68, 245406 (2003); Phys. Rev. B 70, 249901(E) (2004); N. Mingo, Phys. Rev. B 74, 125402 (2006). 15 T. Yamamoto and K. Watanabe, Phys. Rev. Lett. 96, 255503 (2006). 16 A. Dhar and D. Sen, Phys. Rev. B 73, 085119 (2006). 17 J.-S. Wang, J. Wang, and N. Zeng, Phys. Rev. B 74, 033408 (2006); J.-S. Wang, N. Zeng, J. Wang, and C. K. Gan, cond-mat/0701164. 18 M. Galperin, M. A. Ratner, and A. Nitzan, J. Phys.: Condens. Matter 19, 103201 (2007); cond-mat/0611169. 19 D. A. Ryndyk, M. Hartung, and G. Cuniberti, Phys. Rev. B 73, 045420 (2006). 20 C. Auer, F. Schurrer, and C. Ertler, Phys. Rev. B 74, 165409 (2006). 21 M. Lazzeri and F. Mauri, Phys. Rev. B 73, 165419 (2006). 22 C. Caroli, R. Combescot, P. Nozieres, and D. Saint-James, J. Phys. C: Solid State Phys. 4, 916 (1971). 23 We denote electron energy as ε, and phonon energy as ω, except in the unified expressions including both electrons and phonons, i.e., Eq. (22). 24 Y. Meir and N. S. Wingreen, Phys. Rev. Lett. 68, 2512 (1992); A.-P. Jauho, N. S. Wingreen, and Y. Meir, Phys.
Rev. B 50, 5528 (1994). 25 R. Lake and S. Datta, Phys. Rev. B 46, 4757 (1992). 26 T. N. Todorov, Phil. Mag. B 77, 965 (1998). 27 M. J. Montgomery, T. N. Todorov, and A. P. Sutton, J. Phys.: Condens. Matter 14, 5377 (2002). 28 A. P. Horsfield, D. R. Bowler, H. Ness, C. G. Sánchez, T. N. Todorov, and A. J. Fisher, Rep. Prog. Phys. 69, 1195 (2006). 29 N. Agraït, C. Untiedt, G. Rubio-Bollinger, and S. Vieira, Phys. Rev. Lett. 88, 216803 (2002). 30 A. Pecchia, G. Romano, and A. D. Carlo, Phys. Rev. B 75, 035401 (2007). 31 Q.-F. Sun and X. C. Xie, cond-mat/0608536. 32 T. N. Todorov, J. Hoekstra, and A. P. Sutton, Phys. Rev. Lett. 86, 3606 (2001). 33 M. D. Ventra, Y.-C. Chen, and T. N. Todorov, Phys. Rev. Lett. 92, 176803 (2004). 34 Y.-C. Chen, M. Zwolak, and M. D. Ventra, Nano Lett. 3, 1691 (2003); Z. Yang, M. Chshiev, M. Zwolak, Y.-C. Chen, and M. D. Ventra, Phys. Rev. B 71, 041402(R) (2005); Z. Huang, B. Xu, Y. Chen, M. D. Ventra, and N. Tao, Nano Lett. 6, 1240 (2006). 35 D. Segal and A. Nitzan, J. Chem. Phys. 117, 3915 (2002). 36 N. Guisinger, M. Greene, R. Basu, A. Baluch, and M. Hersam, Nano Lett. 4, 55 (2004). 37 T. Rakshit, G.-C. Liang, A. W. Ghosh, M. C. Hersam, and S. Datta, Phys. Rev. B 72, 125305 (2005). 38 J. Taylor, H. Guo, and J. Wang, Phys. Rev. B 63, 245407 (2001); N. Sergueev, D. Roubtsov, and H. Guo, Phys. Rev. Lett. 95, 146803 (2005). 39 T. Frederiksen, M. Brandbyge, N. Lorente, and A.-P. Jauho, Phys. Rev. Lett. 93, 256601 (2004); T. Frederiksen, M. Paulsson, M. Brandbyge, and A.-P. Jauho, cond-mat/0611562. 40 T. Frederiksen, Master’s thesis, Technical University of Denmark (2004). 41 J. K. Viljas, J. C. Cuevas, F. Pauly, and M. Hafner, Phys. Rev. B 72, 245415 (2005). ABSTRACT Employing the nonequilibrium Green's function method, we develop a fully quantum mechanical model to study the coupled electron-phonon transport in one-dimensional atomic junctions in the presence of a weak electron-phonon interaction.
This model enables us to study the electronic and phononic transport on an equal footing. We derive the electrical and energy currents of the coupled electron-phonon system and the energy exchange between them. As an application, we study the heat dissipation in current-carrying atomic junctions within the self-consistent Born approximation, which guarantees energy current conservation. We find that the inclusion of phonon transport is important in determining the heat dissipation and temperature change of the atomic junctions. <|endoftext|><|startoftext|> Introduction to Mesoscopic Physics 2nd ed. (Oxford, 2001).

FIG. 1: (a) Surface plot of the spin subbands, which explains why two Fermi contours with different kF (labeled h-hole and l-hole; axes [100] and [010]) appear as a result of the spin-orbit interaction. (b) According to a more realistic calculation [6, 7], the outer subband (h-hole) is significantly warped. (c) Schematic illustration of Beff seen by holes moving around a ring structure in the presence of spin-orbit interaction. (d) Scanning electron micrograph of the ADL sample.

FIG. 2: Upper solid line: Resistance of the sample as a function of the external magnetic field Bext (horizontal axis: Bext (T)) for T = 60 mK. Lower broken line: Extracted oscillating part of the magnetoresistance. For the background subtraction, a 12th-order polynomial for each set of 100 consecutive data points is adopted.

FIG. 3: Fourier spectrum of the AB-type oscillation in the magnetoresistance at 60 mK (horizontal axis: Frequency (T⁻¹)). The main peak splits into four (B, A, A’, B’) and the 2nd harmonic also splits into two (A∗, A’∗). (The other two are within the noise level.) The inset shows peak structures around four times the frequency of the main peak structure, as indicated by arrows. The scales of the FT amplitude are shown by the arrows marked as u in the respective figures.

FIG. 4: FT results at temperatures from 300 mK down to 60 mK (horizontal axis: Frequency (T⁻¹); curves labeled T = 60 mK, 120 mK, 150 mK, 210 mK, 250 mK). The data are offset for clarity. The inset shows the peak heights normalized at 60 mK for peaks A, A’ and B’ (inset horizontal axis: T (mK)). The solid line in the inset is the fit to the Dingle function (4).

FIG. 5: Solid lines are the FT spectra of the function (5) for the winding number n = 1, 2, 3, 4 (horizontal axis: Frequency (T⁻¹)). The magnetic field region is taken as −0.5 T < B < 0.5 T and the parameter ∆kF is taken from SdH measurement as 4.1 × 10⁷ m⁻¹. The scale of the abscissa is modified with n to show the entire peak structures. Dotted lines are the results when Berry’s phase ∆θB is set to zero. There is a significant difference between the solid and dotted lines around the center, while they are almost identical in the surrounding regions.

ABSTRACT We report observation of spin-orbit Berry's phase in the Aharonov-Bohm (AB) type oscillation of weak field magnetoresistance in an anti-dot lattice (ADL) of a two-dimensional hole system. An AB-type oscillation is superposed on the commensurability peak, and the main peak in the Fourier transform is clearly split up due to variation in Berry's phase originating from the spin-orbit interaction. A simulation considering Berry's phase and the phase arising from the spin-orbit shift in the momentum space shows qualitative agreement with the experiment. <|endoftext|><|startoftext|> Introduction Jones (1976) studied the propagation of light in a moving dielectric and showed by experiment that a rotating medium induces a rotation of the polarisation of the transmitted light. Player (1976) confirmed that this observation could be accounted for through an application of Maxwell's equations in a moving medium. More recently Padgett et al. (2006) reasoned that the rotation of the medium turns a transmitted image by the same angle as the polarisation.
This is in contrast to the Faraday effect (Faraday 1846), where a static magnetic field in a dielectric medium, parallel to the propagation of light, causes a rotation of the polarisation but not a rotation of a transmitted image. Rotation of the plane of polarisation and image rotation in a rotating medium may be attributed respectively to the spin and orbital angular momentum of light (Allen et al. 1999, 2003).

Article submitted to Royal Society TEX Paper http://arxiv.org/abs/0704.0725v1 J. B. Götte, S. M. Barnett and M. Padgett

The first theoretical treatment of this problem was published by Fermi (1923), who considered plane waves and a non-dispersive medium. The theoretical analysis of Player (1976) was also restricted to the propagation of plane waves, but took the dispersion of the medium into account. Player assumed that the dielectric response does not depend on the motion of the medium. In our treatment we follow his assumption, although a more careful analysis by Nienhuis et al. (1992) showed that there will be an effect of the motion on the refractive index for a dispersive medium near an absorption resonance (see also Baranova & Zel’dovich (1979) for a discussion of the effect of the Coriolis force on the refractive index). In contrast to Player we allow for more general electromagnetic fields that can carry orbital angular momentum (OAM). This leads to an additional term in our wave equation, which corresponds to a Fresnel drag term familiar from the analysis of uniform motion. For a rotating medium, however, this drag leads to a rotational shift of the image. The propagation of light in a rotating medium thus involves both spin angular momentum (SAM) and OAM. We solve the wave equation for circularly polarised Bessel beams and consider two different superpositions of such Bessel beams to quantify the effects of both polarisation and image rotation.
For rotation of the polarisation we examine a superposition of left- and right-circularly polarised Bessel beams carrying the same amount of OAM. For image rotation we consider a superposition of Bessel beams with the same circular polarisation but opposite OAM values. Such a superposition creates an intensity pattern with lobes or ‘petals’. In both cases the constituent Bessel beams propagate differently in the medium, which leads to a change in their relative phase. This is the origin of the rotation of both the polarisation and the transmitted image. For both phenomena we derive an expression for the angle per unit length of dielectric through which the image or the polarisation is rotated.

The significance of the total angular momentum can be most easily seen in the wave equation for the propagation of light in a rotating medium. We derive this wave equation in section 2. In the remaining sections we calculate the rotation of polarisation (section 3) and the image rotation (section 4) and reveal their common form.

2. Wave equations

The wave equation for a general electric displacement D in a rigid dielectric medium rotating with angular velocity Ω is given by:

$$-\nabla^2\mathbf{D} = -\epsilon(\omega')\ddot{\mathbf{D}} + 2[\epsilon(\omega')-1]\left[\boldsymbol{\Omega}\times\dot{\mathbf{D}} - (\mathbf{v}\cdot\nabla)\dot{\mathbf{D}}\right]. \tag{2.1}$$

An analogous wave equation can be derived for the magnetic induction B. Compared to the form derived by Player (1976), who considered the special case of a plane wave propagating along the direction of Ω, these wave equations contain an additional term $2[\epsilon(\omega')-1](\mathbf{v}\cdot\nabla)\dot{\mathbf{D}}$. This term is responsible for the Fresnel drag effect, which modifies the speed of light in a moving medium (McCrea 1954; Barton 1999; Rindler 2001). In the following we will derive this wave equation for the electric displacement.

Our analysis starts with the same considerations as Player (1976), by introducing a rest frame and a moving frame. In the rest frame the dielectric medium rotates with angular velocity Ω, so that a point at position r moves with velocity v = Ω × r, and in the moving frame the medium is at rest.
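We restrict our analysis to small velocities with $v \ll c$ and use Maxwell’s equations in both reference frames (Landau & Lifshitz 1975). For the medium at rest we assume the following constitutive relations:

\[ \mathbf{D}' = \epsilon(\omega')\mathbf{E}', \qquad (2.2a) \]
\[ \mathbf{B}' = \mathbf{H}', \qquad (2.2b) \]

where we have used primes to denote the fields and their frequency $\omega'$ in the moving frame.

On the dragging of light by a rotating medium

The fields in the moving frame can be expressed in the rest frame by a Lorentz transformation (Stratton 1941; Jackson 1999), which gives to first order in $v/c$:

\[ \mathbf{D}' = \mathbf{D} + \mathbf{v}\times\mathbf{H}, \qquad (2.3a) \]
\[ \mathbf{B}' = \mathbf{B} - \mathbf{v}\times\mathbf{E}, \qquad (2.3b) \]
\[ \mathbf{E}' = \mathbf{E} + \mathbf{v}\times\mathbf{B}, \qquad (2.3c) \]
\[ \mathbf{H}' = \mathbf{H} - \mathbf{v}\times\mathbf{D}, \qquad (2.3d) \]

where we have set $c = 1$ and work with units in which $\epsilon_0 = \mu_0 = 1$. The two constitutive relations in (2.2) are thus given in the rest frame by

\[ \mathbf{D} + \mathbf{v}\times\mathbf{H} = \epsilon(\omega')\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right), \qquad (2.4a) \]
\[ \mathbf{B} - \mathbf{v}\times\mathbf{E} = \mathbf{H} - \mathbf{v}\times\mathbf{D}. \qquad (2.4b) \]

The dielectric constant is still given as a function of the frequency in the moving frame. We also assume that the dielectric constant depends only on the frequency and is otherwise independent of the state of motion of the medium. On combining these two equations we can express $\mathbf{D}$ and $\mathbf{B}$ in terms of the two other fields $\mathbf{E}$ and $\mathbf{H}$ to first order in $v$:

\[ \mathbf{D} = \epsilon(\omega')\mathbf{E} + [\epsilon(\omega')-1]\,\mathbf{v}\times\mathbf{H}, \qquad (2.5a) \]
\[ \mathbf{B} = \mathbf{H} - [\epsilon(\omega')-1]\,\mathbf{v}\times\mathbf{E}. \qquad (2.5b) \]

After taking the curl of (2.5a) we can use the Maxwell equation $\nabla\times\mathbf{E} = -\dot{\mathbf{B}}$ and express $\dot{\mathbf{B}}$, with the help of (2.5b), in terms of $\dot{\mathbf{H}}$ and $\dot{\mathbf{E}}$. If we assume $\mathbf{v}$ to be constant (see Appendix A), as in Player’s paper (Player 1976), this yields

\[ \nabla\times\mathbf{D} = -\epsilon(\omega')\dot{\mathbf{H}} + \epsilon(\omega')[\epsilon(\omega')-1]\,\mathbf{v}\times\dot{\mathbf{E}} + [\epsilon(\omega')-1]\,\nabla\times(\mathbf{v}\times\mathbf{H}). \qquad (2.6) \]

It follows from (2.5a) that $\epsilon(\omega')\,\mathbf{v}\times\mathbf{E} = \mathbf{v}\times\mathbf{D}$ to first order in $v$, and so we can rewrite (2.6) as:

\[ \nabla\times\mathbf{D} = -\epsilon(\omega')\dot{\mathbf{H}} + [\epsilon(\omega')-1]\,\mathbf{v}\times\dot{\mathbf{D}} + [\epsilon(\omega')-1]\,\nabla\times(\mathbf{v}\times\mathbf{H}). \qquad (2.7) \]

We can now take the curl of (2.7) to obtain a wave equation for $\mathbf{D}$, as $\nabla\times\nabla\times\mathbf{D} = -\nabla^2\mathbf{D}$ for $\nabla\cdot\mathbf{D} = 0$, and the curl of $\dot{\mathbf{H}}$ is given by $\nabla\times\dot{\mathbf{H}} = \ddot{\mathbf{D}}$.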
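The step from (2.4) to (2.5) can be checked numerically. The Python sketch below substitutes the first-order expressions (2.5) back into the rest-frame relations (2.4) and looks at how the mismatch scales with the speed; if (2.5) is correct to first order, the mismatch should shrink by a factor of about 100 when the speed is reduced tenfold. The field vectors, the value ε = 2.25 and all function names are illustrative assumptions, not part of the original analysis.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(s, a):
    return tuple(s * x for x in a)


def norm(a):
    return sum(x * x for x in a) ** 0.5


def residuals(eps, v, E, H):
    """Norms of the mismatch in (2.4a) and (2.4b) when D, B come from (2.5)."""
    D = add(scale(eps, E), scale(eps - 1.0, cross(v, H)))      # (2.5a)
    B = add(H, scale(-(eps - 1.0), cross(v, E)))               # (2.5b)
    # (2.4a): D + v x H - eps (E + v x B)
    r_a = add(add(D, cross(v, H)), scale(-eps, add(E, cross(v, B))))
    # (2.4b): B - v x E - (H - v x D)
    r_b = add(add(B, scale(-1.0, cross(v, E))),
              add(scale(-1.0, H), cross(v, D)))
    return norm(r_a), norm(r_b)


def scaling_ratios(eps=2.25, speed=1e-3):
    """Ratio of the residuals at `speed` and at `speed / 10` (units with c = 1)."""
    E = (1.0, 0.5, -0.3)      # arbitrary illustrative field values
    H = (0.2, -1.0, 0.7)
    u = (0.3, 0.4, 0.5)       # arbitrary direction of motion
    big = residuals(eps, scale(speed, u), E, H)
    small = residuals(eps, scale(speed / 10.0, u), E, H)
    return big[0] / small[0], big[1] / small[1]


if __name__ == "__main__":
    print(scaling_ratios())
```

Both ratios come out close to 100, which is the quadratic scaling expected of terms that were dropped at second order in v.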
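In order to express the curl of the vector products we use the identity $\nabla\times(\mathbf{a}\times\mathbf{b}) = \partial_i(b_i\mathbf{a}) - \partial_i(a_i\mathbf{b})$, where the doubly occurring index denotes a summation over the Cartesian components. The operator $\partial_i$ represents differentiation with respect to the $i$th component and acts on the whole product, which gives rise to terms containing the divergences of $\mathbf{v}$, $\mathbf{D}$ and $\mathbf{H}$. These terms are either zero, because $\nabla\cdot\mathbf{v} = 0$ and $\nabla\cdot\mathbf{D} = 0$, or they lead to terms which are of second order in $v$ and therefore negligible. The wave equation for $\mathbf{D}$ is thus given by

\[ -\nabla^2\mathbf{D} = -\epsilon(\omega')\ddot{\mathbf{D}} + [\epsilon(\omega')-1]\left[(\dot{\mathbf{D}}\cdot\nabla)\mathbf{v} - (\mathbf{v}\cdot\nabla)\dot{\mathbf{D}}\right] + [\epsilon(\omega')-1]\,\nabla\times\left[(\mathbf{H}\cdot\nabla)\mathbf{v} - (\mathbf{v}\cdot\nabla)\mathbf{H}\right]. \qquad (2.8) \]

For a rotation $\mathbf{v} = \boldsymbol{\Omega}\times\mathbf{r}$ we can specify terms of the form $(\mathbf{a}\cdot\nabla)\mathbf{v}$ by expressing the components of the velocity $\mathbf{v}$ using the Levi-Civita symbol $\varepsilon_{ijk}$ as $v_i = \varepsilon_{ijk}\Omega_j r_k$. The components of $(\mathbf{a}\cdot\nabla)\mathbf{v}$ are thus given by

\[ [(\mathbf{a}\cdot\nabla)\mathbf{v}]_i = a_l\partial_l\,\varepsilon_{ijk}\Omega_j r_k = a_l\,\varepsilon_{ijk}\Omega_j\delta_{lk} = [\boldsymbol{\Omega}\times\mathbf{a}]_i. \qquad (2.9) \]

If we use the results from (2.9) in (2.8) we find for $\nabla^2\mathbf{D}$:

\[ -\nabla^2\mathbf{D} = -\epsilon(\omega')\ddot{\mathbf{D}} + [\epsilon(\omega')-1]\left[\boldsymbol{\Omega}\times\dot{\mathbf{D}} - (\mathbf{v}\cdot\nabla)\dot{\mathbf{D}}\right] + [\epsilon(\omega')-1]\,\nabla\times\left[\boldsymbol{\Omega}\times\mathbf{H} - (\mathbf{v}\cdot\nabla)\mathbf{H}\right]. \qquad (2.10) \]

The curl of the last bracket requires some additional calculations. The first term is given by:

\[ \nabla\times(\boldsymbol{\Omega}\times\mathbf{H}) = (\nabla\cdot\mathbf{H})\boldsymbol{\Omega} - (\boldsymbol{\Omega}\cdot\nabla)\mathbf{H}, \qquad (2.11) \]

and the second term can be written as:

\[ \nabla\times[(\mathbf{v}\cdot\nabla)\mathbf{H}] = \boldsymbol{\Omega}(\nabla\cdot\mathbf{H}) - \nabla(\boldsymbol{\Omega}\cdot\mathbf{H}) + (\mathbf{v}\cdot\nabla)\dot{\mathbf{D}}, \qquad (2.12) \]

where the last term originates from $\nabla\times\mathbf{H} = \dot{\mathbf{D}}$. The terms containing the divergence of $\mathbf{H}$ cancel and the term $(\mathbf{v}\cdot\nabla)\dot{\mathbf{D}}$ can be added to the second term in (2.10). The two remaining terms $-(\boldsymbol{\Omega}\cdot\nabla)\mathbf{H}$ and $\nabla(\boldsymbol{\Omega}\cdot\mathbf{H})$ together give $\boldsymbol{\Omega}\times\dot{\mathbf{D}}$:

\[ \boldsymbol{\Omega}\times\dot{\mathbf{D}} = \boldsymbol{\Omega}\times(\nabla\times\mathbf{H}) = \nabla(\boldsymbol{\Omega}\cdot\mathbf{H}) - (\boldsymbol{\Omega}\cdot\nabla)\mathbf{H}. \qquad (2.13) \]

This concludes the derivation of the wave equation (2.1). It is possible to derive the same wave equation for $\mathbf{B}$ using similar methods.

For a rotation around the $z$ axis with constant angular velocity $\boldsymbol{\Omega} = \Omega\mathbf{e}_z$, the directional derivative $\mathbf{v}\cdot\nabla$ is proportional to an azimuthal derivative, as $\mathbf{v}\cdot\nabla = (\boldsymbol{\Omega}\times\mathbf{r})\cdot\nabla = \Omega\partial_\phi$.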
This allows us to identify the two terms $\boldsymbol{\Omega}\times\dot{\mathbf{D}}$ and $\Omega\partial_\phi\dot{\mathbf{D}}$ in the wave equation

\[ -\nabla^2\mathbf{D} = -\epsilon(\omega')\ddot{\mathbf{D}} + 2[\epsilon(\omega')-1]\left[\boldsymbol{\Omega}\times\dot{\mathbf{D}} - \Omega\partial_\phi\dot{\mathbf{D}}\right] \qquad (2.14) \]

as the polarisation rotation and rotary Fresnel drag terms, respectively. Player’s derivation does not contain the term proportional to $\partial_\phi\dot{\mathbf{D}}$ because he treated only the case of a plane wave propagating in the $z$-direction, and for such fields $\mathbf{D}$ is independent of $\phi$. On substituting a monochromatic ansatz of the form $\mathbf{D} = \mathbf{D}_0\exp(-i\omega t)$ into (2.14), where $\omega$ is the optical angular frequency in the rest frame, we obtain:

\[ -\nabla^2\mathbf{D}_0 = \epsilon(\omega')\omega^2\mathbf{D}_0 - 2[\epsilon(\omega')-1]\,\omega\Omega\left[i\mathbf{e}_z\times\mathbf{D}_0 - i\partial_\phi\mathbf{D}_0\right]. \qquad (2.15) \]

If we make an ansatz for $\mathbf{D}_0$ with a general polarisation given by the complex numbers $\alpha$ and $\beta$ (with $|\alpha|^2 + |\beta|^2 = 1$) in the form of $\mathbf{D}_0 = (\alpha\mathbf{e}_x + \beta\mathbf{e}_y)D + D_z\mathbf{e}_z$, we find that the $x$ and $y$ components of the wave equation (2.15) decouple if $\beta = \pm i\alpha$, corresponding to left- and right-circularly polarised light respectively. If we restrict the solutions to these two cases we can write the wave equation as:

\[ \nabla^2 D = -\epsilon(\omega')\omega^2 D + 2[\epsilon(\omega')-1]\,\omega\Omega\,(\pm 1 - i\partial_\phi)D, \qquad (2.16) \]

where the plus sign refers to left-circular polarisation and the minus sign to right-circular polarisation. We can then identify $\pm 1$ as the extreme values of the variable $\sigma$ which corresponds to the circular polarisation or SAM of the light beam. Similarly we can identify $-i\partial_\phi = L_z$ as the OAM operator, so that the wave equation contains a term which depends on the total angular momentum $\sigma + L_z$:

\[ \nabla^2 D = -\epsilon(\omega')\omega^2 D + 2[\epsilon(\omega')-1]\,\omega\Omega\,(\sigma + L_z)D. \qquad (2.17) \]

We shall see that it is the dependence on the optical angular momentum that is responsible for the rotation of both the polarisation and of a transmitted image.

3. Specific rotary power

The rotation of the polarisation arises from the difference in the refractive indices for left- and right-circularly polarised light. The angle per unit length by which the polarisation is rotated is called the specific rotary power.
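The identification of σ and L_z in (2.16) and (2.17) can be illustrated with a few lines of Python. The sketch below applies the operator i e_z × to the transverse polarisation vectors with β = ±iα and estimates the action of −i∂φ on exp(imφ) by a central finite difference. The function names, the sample angle and the step size are assumptions made for this illustration only.

```python
import cmath


def i_ez_cross(d):
    """Apply the operator i e_z x to a transverse vector d = (dx, dy)."""
    dx, dy = d
    # e_z x (dx, dy, 0) = (-dy, dx, 0)
    return (1j * (-dy), 1j * dx)


def circular(sign):
    """Transverse polarisation (alpha, beta) with beta = sign * i * alpha."""
    alpha = 2 ** -0.5
    return (alpha, sign * 1j * alpha)


def spin_eigenvalue(sign):
    """Eigenvalue of i e_z x on the circular polarisation with beta = sign*i*alpha."""
    d = circular(sign)
    out = i_ez_cross(d)
    # Both components give the same ratio, because d is an eigenvector.
    return out[0] / d[0]


def lz_eigenvalue(m, phi=0.7, h=1e-5):
    """Central-difference estimate of (-i d/dphi) exp(i m phi) / exp(i m phi)."""
    def f(p):
        return cmath.exp(1j * m * p)
    return (-1j * (f(phi + h) - f(phi - h)) / (2.0 * h)) / f(phi)


if __name__ == "__main__":
    print(spin_eigenvalue(+1), spin_eigenvalue(-1), lz_eigenvalue(2))
```

The two circular polarisations return +1 and −1, the values of σ used in (2.17), and the finite-difference estimate for exp(imφ) returns a value close to m.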
For an optically active medium at rest the specific rotary power is characteristic for a given material, but from (2.17) it can be seen that light propagates differently in a rotating medium, depending on whether the circular polarisation turns in the same rotation sense as the dielectric or in the opposite sense. This phenomenon is described by the effective specific rotary power (Jones 1976; Player 1976). The specific rotary power, defined as (Fowles 1975):

\[ \delta_{\mathrm{pol}}(\omega) = \left(n_r(\omega) - n_l(\omega)\right)\frac{\pi}{\lambda} = \left(n_r(\omega) - n_l(\omega)\right)\frac{\omega}{2c}, \qquad (3.1) \]

is the angle of rotation of the plane of polarisation per unit length in an optically active medium. Here, the indices $r$ and $l$ refer to right- and left-circularly polarised light. It was convenient to set $c = 1$ for our derivation in section 2 but we reintroduce it here to facilitate the calculation of measurable quantities. In order to illustrate the effect of the OAM of light we choose a Bessel beam as an ansatz for the electric displacement in the $x$-$y$ plane:

\[ D = J_m(\kappa\rho)\exp(im\phi)\exp(ik_z z), \qquad (3.2) \]

where $\kappa$ and $k_z$ are the transverse and longitudinal components of the wavevector. Bessel beams of this form carry OAM of $m\hbar$ per photon (Allen et al. 1992, 1999, 2003). Substituting the Bessel beam ansatz in the wave equation (2.17) yields the following result for the overall wavenumber $k = \sqrt{\kappa^2 + k_z^2}$:

\[ k^2_{l/r}(\omega) = \epsilon(\omega')\frac{\omega^2}{c^2} - 2[\epsilon(\omega')-1]\frac{\omega\Omega}{c^2}(\sigma + m). \qquad (3.3) \]

The indices $l$ and $r$ denoting the circular polarisation correspond respectively to $\sigma = 1$ and $\sigma = -1$. With the help of the relations $\epsilon(\omega') = n^2(\omega')$ and $k(\omega) = n(\omega)\omega/c$ we can turn the equation for the wavenumbers into an equation for the effective refractive indices for left- and right-circularly polarised light:

\[ n^2_{l/r}(\omega) = n^2(\omega') - 2[n^2(\omega')-1]\frac{\Omega}{\omega}(\sigma + m). \qquad (3.4) \]

Following Player (1976) we assume that $\Omega \ll \omega$ and we can therefore approximate the square root for the refractive indices $n_{l/r}$ by a small parameter expansion to first order in $\Omega/\omega$:

\[ n_{l/r}(\omega) \simeq n(\omega') - \left[n(\omega') - \frac{1}{n(\omega')}\right]\frac{\Omega}{\omega}(\sigma + m). \qquad (3.5) \]

The frequency in the moving frame $\omega'$ is different for left- and right-circularly polarised light (Garetz 1981) and, more generally, the azimuthal or rotational Doppler shift is proportional to the total angular momentum $(\sigma + m)$ (Allen et al. 1994; Bialynicki-Birula & Bialynicka-Birula 1997; Courtial et al. 1998; Allen et al. 2003). For left-circularly polarised light with $\sigma = 1$ the frequency is thus $\omega' = \omega - \Omega(1 + m)$, and for right-circularly polarised light with $\sigma = -1$ the frequency changes to $\omega' = \omega - \Omega(-1 + m)$. Following Player (1976) we expand the refractive index of the dielectric in a Taylor series to calculate the difference $n_r - n_l$:

\[ n_l(\omega) \simeq n(\omega) - n'(\omega)\,\Omega(1 + m) - \left[n(\omega) - \frac{1}{n(\omega)}\right]\frac{\Omega}{\omega}(1 + m), \qquad (3.6a) \]
\[ n_r(\omega) \simeq n(\omega) - n'(\omega)\,\Omega(-1 + m) - \left[n(\omega) - \frac{1}{n(\omega)}\right]\frac{\Omega}{\omega}(-1 + m), \qquad (3.6b) \]

where $n'(\omega)$ denotes the derivative of $n$ with respect to $\omega$. Higher order derivatives of $n$ become comparable in magnitude if $n'(\omega)\Omega \simeq n(\omega)$. This will only be the case for a strongly dispersive medium, such as atomic or molecular gases, near a resonance. For such gaseous media the dielectric response in a rotating medium has to be examined more closely (Nienhuis et al. 1992). For solid materials, such as a rotating glass rod, and for optical frequencies this condition is not fulfilled and we can neglect higher order derivatives in the expansion (3.6). Within Player’s assumption that the refractive index is independent of the motion of the medium we find for the effective specific rotary power:

\[ \delta_{\mathrm{pol}}(\omega) = \left[\omega n'(\omega) + n(\omega) - \frac{1}{n(\omega)}\right]\frac{\Omega}{c}. \qquad (3.7) \]

On introducing the group refractive index $n_g(\omega) = n(\omega) + \omega n'(\omega)$ and the phase refractive index $n_\phi(\omega) = n(\omega)$, we can rewrite the rotary power as

\[ \delta_{\mathrm{pol}}(\omega) = \left[n_g(\omega) - n_\phi^{-1}(\omega)\right](\Omega/c), \qquad (3.8) \]

which is identical to Player’s (1976) expression. In this form the specific rotary power (3.8) can be used directly with experimental data in the SI unit system. In the next section we look at image rotation caused by a difference in the effective refractive indices for different values of $m$.

4. Image rotation

The specific rotary power describes the rotation of the polarisation, but we can define, analogously, a rotary power of image rotation. The image can simply be created by the superposition of two light beams carrying different values of OAM, which leads to an azimuthal variation of the intensity pattern. In particular we consider an incident superposition of two similarly circularly polarised Bessel beams with opposite OAM values of the form

\[ D = D_+ + D_- = J_m(\kappa\rho)\exp(im\phi)\exp(ik_z z) + J_{-m}(\kappa\rho)\exp(-im\phi)\exp(ik_z z). \qquad (4.1) \]

Outside the medium the superposition can be written as one Bessel beam with a trigonometric modulation

\[ D = J_m(\kappa\rho)\left(\exp(im\phi) + (-1)^m\exp(-im\phi)\right)\exp(ik_z z), \qquad (4.2) \]

but inside the medium the effective refractive index is different for the two components of the superposition (Allen & Padgett 2007). On propagation this leads to a phase difference which causes a rotation of the image (see figure 1).

Figure 1. Image rotation. Intensity pattern created by the superposition of Bessel beams in (4.1) for $m = 2$ (a). On propagation the relative phase between the constituent Bessel beams changes, which leads to a rotation of the pattern (b). The angle of rotation at a propagation distance $L$ is given by $\delta_{\mathrm{img}}L$.

We define

\[ \delta_{\mathrm{img}}(\omega) = \left(n_-(\omega) - n_+(\omega)\right)\frac{\omega}{2mc}, \qquad (4.3) \]

which is the angle per unit length by which the image is rotated. The factor $m$ in the expression for $\delta_{\mathrm{img}}$ appears because of the $\exp(im\phi)$ and $\exp(-im\phi)$ phase structure of the interfering beams and the resulting $2m$-fold symmetry of the created image (Padgett et al. 2006). The different effective refractive indices for the components of the superposition (4.1) are given by:

\[ n^2_{+/-}(\omega) = n^2(\omega') - 2[n^2(\omega')-1]\frac{\Omega}{\omega}(\sigma \pm m). \qquad (4.4) \]

Here, $\sigma$ is fixed, in contrast to (3.4).
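The effective indices (3.4) and (4.4) can also be evaluated directly, without the first-order expansion, and compared with the closed form (3.8). The Python sketch below does this in dimensionless units (ω = c = 1) for a hypothetical medium with linear dispersion n(ω) = N0 + N1 (ω − 1). The dispersion model, the parameter values and the function names are assumptions made for illustration, and (4.3) is read with the prefactor ω/(2mc). A deliberately large ratio Ω/ω = 10⁻⁶ is used so that floating-point rounding does not mask the effect.

```python
import math

N0 = 1.5    # hypothetical phase index at omega = 1
N1 = 0.05   # hypothetical slope dn/domega (dimensionless units)


def n_phase(omega):
    """Toy linear dispersion n(omega) = N0 + N1 * (omega - 1)."""
    return N0 + N1 * (omega - 1.0)


def n_eff(omega, Omega, sigma, m):
    """Effective index from (3.4)/(4.4), evaluated at the shifted frequency."""
    omega_p = omega - Omega * (sigma + m)   # frequency in the moving frame
    n2 = n_phase(omega_p) ** 2
    return math.sqrt(n2 - 2.0 * (n2 - 1.0) * (Omega / omega) * (sigma + m))


def delta_pol(omega, Omega, m, c=1.0):
    """(3.1): same OAM m, right (sigma = -1) minus left (sigma = +1)."""
    diff = n_eff(omega, Omega, -1, m) - n_eff(omega, Omega, +1, m)
    return diff * omega / (2.0 * c)


def delta_img(omega, Omega, sigma, m, c=1.0):
    """(4.3), read as (n_- - n_+) * omega / (2 m c), for fixed sigma."""
    diff = n_eff(omega, Omega, sigma, -m) - n_eff(omega, Omega, sigma, +m)
    return diff * omega / (2.0 * m * c)


def delta_first_order(omega, Omega, c=1.0):
    """(3.8): (n_g - 1/n_phi) * Omega / c with n_g = n + omega * dn/domega."""
    n = n_phase(omega)
    return (n + omega * N1 - 1.0 / n) * Omega / c


if __name__ == "__main__":
    print(delta_pol(1.0, 1e-6, m=2),
          delta_img(1.0, 1e-6, sigma=1, m=2),
          delta_first_order(1.0, 1e-6))
```

For these toy parameters the three numbers agree to within the size of the neglected second-order terms, which is the numerical counterpart of the statement that both rotations are governed by the total angular momentum σ + m.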
The roles of $\sigma$ and $m$ are reversed for the image rotation and the refractive indices for positive and negative OAM are given by:

\[ n_+(\omega) \simeq n(\omega) - n'(\omega)\,\Omega(\sigma + m) - \left[n(\omega) - \frac{1}{n(\omega)}\right]\frac{\Omega}{\omega}(\sigma + m), \qquad (4.5a) \]
\[ n_-(\omega) \simeq n(\omega) - n'(\omega)\,\Omega(\sigma - m) - \left[n(\omega) - \frac{1}{n(\omega)}\right]\frac{\Omega}{\omega}(\sigma - m). \qquad (4.5b) \]

On substituting (4.5) into (4.3) we find:

\[ \delta_{\mathrm{img}}(\omega) = \left[\omega n'(\omega) + n(\omega) - \frac{1}{n(\omega)}\right]\frac{\Omega}{c}, \qquad (4.6) \]

which can be written in terms of the group and phase refractive indices as:

\[ \delta_{\mathrm{img}}(\omega) = \left[n_g(\omega) - n_\phi^{-1}(\omega)\right](\Omega/c). \qquad (4.7) \]

This verifies the reasoning of Padgett et al. (2006) that the polarisation and the image are turned by the same amount when passing through a rotating medium. It is the total angular momentum that determines the phase shifts and a linearly polarised image will undergo rotations of both the plane of polarisation and the intensity pattern or image.

5. Conclusion

We have extended a theoretical study by Player (1976) on the propagation of light through a rotating medium to include general electromagnetic fields. In the original analysis Player (1976) showed that the rotation of the polarisation inside a rotating medium can be understood in terms of a difference in the propagation for left- and right-circularly polarised light. Player’s (1976) analysis was thus concerned solely with the spin angular momentum (SAM) of light. Our treatment has shown that the general wave equation has an additional term, which is of the same form as the Fresnel drag term for a uniform motion. In the context of rotating motion, however, this term is connected to the orbital angular momentum (OAM) of the light. By extending the theoretical analysis to include OAM we have been able to attribute polarisation rotation and image rotation to SAM and OAM respectively.
We have shown that a superposition of Bessel beams with the same OAM but opposite SAM states leads to the rotation of the polarisation, whereas a superposition of Bessel beams with the same SAM and opposite OAM values gives rise to a rotation of the transmitted image. We have obtained quantitative expressions for the rotation of the polarisation and of the transmitted image and have verified that both are turned through the same angle, as recently suggested by Padgett et al. (2006).

Player (1976) remarked that the derivation by Fermi (1923) appears to be in error. The mistake in Fermi’s treatment seems to be in missing the transformation of the magnetic fields. Whereas the change in the electric fields induced by the motion of the medium is explicitly given in terms of the electric polarisation P†, a similar transformation for the magnetic field is missing. In terms of our derivation this would mean that (2.5b) changes to $\mathbf{B} = \mathbf{H}$ in the rest frame. This in turn would cause the term $\mathbf{v}\times\dot{\mathbf{D}}$ to be missing from (2.6). This term and the term $\nabla\times(\mathbf{v}\times\mathbf{H})$ contribute equally to the wave equation (2.1), which explains why Fermi’s result for the specific rotary power is smaller than Player’s and ours by a factor of two. As pointed out by Player (1976), this missing factor is cancelled by an additional factor of two in Fermi’s definition of the specific rotary power.

† Fermi (1923) denotes the electric polarisation by S.

We would like to thank Amanda Wright and Jonathan Leach whose experiments on this problem motivated our work. This work was supported by the UK Engineering and Physical Sciences Research Council.

Appendix A. Accelerated motion

The assumption that $\mathbf{v} = \boldsymbol{\Omega}\times\mathbf{r}$ is steady is problematic for a rotating motion; if we assume $\boldsymbol{\Omega}$ to be constant over time, then $\dot{\mathbf{v}} = (\boldsymbol{\Omega}\cdot\mathbf{r})\boldsymbol{\Omega} - \Omega^2\mathbf{r}$. In principle this would invalidate our initial considerations for the transformation of the electromagnetic fields (2.3), which strictly hold only for uniform motion. Including the time-derivative of $\mathbf{v}$ would lead to additional terms in (2.6) of the form $\epsilon(\omega')[\epsilon(\omega')-1]\,\dot{\mathbf{v}}\times\mathbf{E}$. If we proceed in taking the curl of this vector product we produce four terms which either can be neglected because they are second order in $v/c$, or do not contain the time derivative of an optical field. The latter are smaller than terms that do contain a time derivative by $\sim\Omega/\omega$. For our assumption $\Omega \ll \omega$ all such terms are negligible.

References

Allen, L., Beijersbergen, M. W., Spreeuw, R. J. C. & Woerdman, J. P. 1992 Orbital angular momentum of light and the transformation of Laguerre-Gaussian modes. Phys. Rev. A 45, 8185–8190.
Allen, L., Babiker, M. & Power, W. L. 1994 Azimuthal Doppler-shift in light-beams with orbital angular-momentum. Opt. Commun. 112, 141–144.
Allen, L., Padgett, M. J. & Babiker, M. 1999 The Orbital Angular Momentum of Light. Prog. Opt. 39, 291–372.
Allen, L., Barnett, S. M. & Padgett, M. J. 2003 Optical Angular Momentum. Bristol: Institute of Physics Publishing.
Allen, L. & Padgett, M. J. 2007 Equivalent geometric transformation for spin and orbital angular momentum of light. J. Mod. Opt. 54, 487–491.
Baranova, N. B. & Zel’dovich, B. Ya. 1979 Coriolis contribution to the rotary ether drag. Proc. R. Soc. Lond. A 368, 591–592.
Barton, G. 1999 Introduction to the Relativity Principle. Chichester: John Wiley & Sons.
Bialynicki-Birula, I. & Bialynicka-Birula, Z. 1997 Rotational Frequency Shift. Phys. Rev. Lett. 78, 2539–2542.
Courtial, J., Robertson, D. A., Dholakia, K., Allen, L. & Padgett, M. J. 1998 Rotational Frequency Shift of a Light Beam. Phys. Rev. Lett. 81, 4828–4830.
Faraday, M. 1846 Experimental Researches in Electricity. Nineteenth Series. Philos. Trans. R. Soc. Lond. 136, 1–20.
Fermi, E. 1923 Sul trascinamento del piano di polarizzazione da parte di un mezzo rotante. Rend. Lincei 32, 115–118. Reprinted in: Fermi, E. 1962 Collected Papers, vol. 1. Chicago: University of Chicago Press.
Fowles, G. R. 1975 Introduction to Modern Optics, 2nd edn. New York: Dover Publications.
Garetz, B. A. 1981 Angular Doppler-effect. J. Opt. Soc. Am. 71, 609–611.
Jackson, J. D. 1999 Classical Electrodynamics, 3rd edn. New York: John Wiley & Sons.
Jones, R. V. 1976 Rotary ‘aether drag’. Proc. R. Soc. Lond. A 349, 423–439.
Landau, L. D. & Lifshitz, E. M. 1975 The Classical Theory of Fields, 4th edn. Oxford: Elsevier Butterworth-Heinemann.
McCrea, W. H. 1954 Relativity Physics, 4th edn. London: Methuen Publishing.
Nienhuis, G., Woerdman, J. P. & Kuščer, I. 1992 Magnetic and mechanical Faraday effects. Phys. Rev. A 46 (11), 7079–7092.
Padgett, M., Whyte, G., Girkin, J., Wright, A., Allen, L., Öhberg, P. & Barnett, S. M. 2006 Polarization and image rotation induced by a rotating dielectric rod: an optical angular momentum interpretation. Opt. Lett. 31 (14), 2205–2207.
Player, M. A. 1976 On the dragging of the plane of polarization of light propagating in a rotating medium. Proc. R. Soc. Lond. A 349, 441–445.
Rindler, W. 2001 Relativity. Oxford: Oxford University Press.
Stratton, J. A. 1941 Electromagnetic Theory. New York: McGraw-Hill.

Introduction
Wave equations
Specific rotary power
Image rotation
Conclusion
Accelerated motion

ABSTRACT When light passes through a rotating medium the optical polarisation is rotated. Recently it has been reasoned that this rotation applies also to the transmitted image (Padgett et al. 2006). We examine these two phenomena by extending an analysis of Player (1976) to general electromagnetic fields.
We find that in this more general case the wave equation inside the rotating medium has to be amended by a term which is connected to the orbital angular momentum of the light. We show that optical spin and orbital angular momentum account respectively for the rotation of the polarisation and the rotation of the transmitted image.
<|endoftext|><|startoftext|>
Introduction
Multifractal Detrended Fluctuation Analysis
Description of MF-DFA method
Relation to standard multifractal analysis
Analysis of music frequency series
Conclusion
Acknowledgment
References

ABSTRACT We show that some of Bach's pitch series can be considered as a stochastic process with scaling behavior.
Using the multifractal detrended fluctuation analysis (MF-DFA) method, the frequency series of Bach's pitches have been analyzed. In this view we find similar second-moment exponents (after double profiling), in the range 1.7-1.8, across his works. Comparing the MF-DFA results of the original series with those for shuffled and surrogate series, we can distinguish multifractality due to long-range correlations from multifractality due to a broad probability density function. Finally we determine the scaling exponents and the singularity spectrum. We conclude that the fat tail contributes more to the multifractal nature of the series than long-range correlations do.
<|endoftext|><|startoftext|>
Circuit QED with a Flux Qubit Strongly Coupled to a Coplanar Transmission Line Resonator

T. Lindström,1 C. H. Webster,1 J. E. Healey,2 M. S. Colclough,2 C. M. Muirhead,2 A. Ya. Tzalenchuk1
1 National Physical Laboratory, Hampton Road, Teddington, TW11 0LW, UK
2 University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
(Dated: October 22, 2018)

We propose a scheme for circuit quantum electrodynamics with a superconducting flux qubit coupled to a high-Q coplanar resonator. Assuming realistic circuit parameters we predict that it is possible to reach the strong coupling regime. Routes to metrological applications, such as single photon generation and quantum non-demolition measurements, are discussed.

I. INTRODUCTION

Until a few years ago it was an open question whether true quantum effects such as quantum entanglement would ever be observed in a man-made macroscopic electronic device. However, over the past decade, quantum coherence has been demonstrated in a variety of macroscopic systems, including superconducting circuits [1,2,3,4]. Many of these experiments drew inspiration from the pioneering work on atomic qubits that took place a decade earlier [5]. As the fields of atomic physics and quantum optics continue to advance, it makes sense to continue to look to them for guidance.
The universal nature of quantum mechanics is greatly to our advantage, in that the terminology and methodology apply as well to macroscopic as to microscopic systems. This allows well-known results from atomic physics and quantum optics to be used to plan and predict the outcome of experiments on solid state devices. Recently, such techniques have been applied with great success to implement a number of ideas such as quantum state tomography [6], Mach-Zehnder interferometry [7] and sideband cooling [8].

This approach has also been very successful in the field of cavity quantum electrodynamics (CQED) [see Ref. 9 for an introduction]. In CQED an atomic 2-level system (i.e. a qubit) is made to interact with a high-finesse optical cavity with a coupling energy ħg. Provided that the relaxation rates γ of the qubit and κ of the cavity field are smaller than g (known as the strong coupling criterion), it is possible to observe a coherent exchange of energy between the qubit and the cavity field. The resulting entangled states can be detected spectroscopically. Recently, Schoelkopf and co-workers [10,11] achieved strong coupling in a macroscopic circuit comprising a superconducting charge qubit and a coplanar transmission line resonator. This new field is known as circuit-QED, and has many potential applications such as the generation and detection of single microwave photons. More recently strong coupling was observed in experiments on photonic crystals [12] and quantum dots [13].

Much of the work to date on qubit-cavity systems has been focused on methods for making quantum non-demolition (QND) measurements of the qubit state. QND schemes have been used to read out single qubits [10,14] and also to measure the photon number in the cavity [15]. Several ways of reading out qubits using low-Q cavities/resonators have also been reported [16,17], but these do not allow the strong coupling regime to be reached.
The benefits of superconducting circuit-QED systems over atomic systems are twofold. Firstly, the qubit energy can be tuned by varying the external magnetic field, enabling control over the qubit-cavity interaction. Secondly, qubit parameters such as the level separation at the degeneracy point can be engineered through appropriate circuit design. The main advantage of flux qubits over other types of superconducting qubits is that they are less susceptible to fluctuations in the background charge and the associated noise. This makes them less prone to decoherence, and therefore easier to manipulate in a deterministic way.

In section II we summarise the established background theory in the context of our proposed system; in III we show that it is possible to achieve strong coupling between a superconducting flux qubit [18] and a high-Q coplanar transmission line resonator; in IV we present computer simulations of the resonator response; and in V we propose a scheme for producing single photons on demand.

II. THEORY OF CIRCUIT QED WITH A FLUX QUBIT

In this section we analyze the processes which occur when a superconducting qubit is coupled to a superconducting coplanar transmission line resonator, as shown in Fig. 1. The effect of coupling a flux qubit to a resonator has previously been experimentally demonstrated [19], but the quality factor of the resonator was too small to fulfill the strong coupling criterion.

Two types of flux qubits will be discussed: an RF SQUID, which consists of a single Josephson junction in a superconducting loop, and a persistent current qubit (PCQ), which has three junctions in the loop, one of which is smaller than the others by a factor α [18]. Both need to be biased by an external magnetic flux $\Phi_x$, which tunes the energy level separation through an anticrossing at $\Phi_x = 0.5\Phi_0$. The resonator is a coplanar transmission line with inductance $L$ and capacitance $C$, which is weakly coupled to external transmission lines via coupling capacitors $C_C$.

http://arxiv.org/abs/0704.0727v3

FIG. 1: a) Sketch of a typical coplanar waveguide resonator of length $l = \lambda/2 \approx 11$ mm. Also shown is how the qubit can be placed in between the centre conductor and the ground plane of the waveguide. b) Schematic diagram of a superconducting qubit coupled to a coplanar transmission line resonator. (A) Persistent current qubit. (B) RF SQUID. $M$ is the mutual inductance between the qubit and resonator, and $\Phi_x$ is the magnetic flux threading the qubit loop.

The Hamiltonian $H$ of the complete system is

\[ H = H_q + H_r + H_g + H_I + H_E. \qquad (1) \]

$H_q$ describes the qubit, $H_r$ the resonator, and $H_g$ the interaction between them. $H_I$ denotes the interaction of the system with a periodic drive field. Finally, $H_E$ describes the interaction of the resonator with its environment, resulting in the loss of photons to the external transmission lines, and the interaction of the qubit with its environment, resulting in spontaneous decay from the excited state to the ground state.

The qubit Hamiltonian $H_q$ is given by the expression $(\hbar/2)(-\epsilon\sigma_z - \delta\sigma_x)$, where $\sigma_z$ and $\sigma_x$ are Pauli spin matrices, $\delta$ is the level repulsion, $\epsilon = (2I_p/\hbar)(\Phi_x - \Phi_0/2)$, and $I_p$ is the persistent current. In the case of an RF SQUID suitable for operation as a qubit, $I_p$ is roughly equal to half the critical current of the single junction, whereas, for the persistent current qubit, $I_p$ is approximately equal to the critical current of the smallest of the three junctions. If the qubit is operated at or near the degeneracy point, $H_q$ can be expressed more simply by transforming to the basis in which the ground state $|\downarrow\rangle$ and excited state $|\uparrow\rangle$ correspond to symmetric and antisymmetric superpositions of clockwise and anti-clockwise persistent currents. This yields

\[ H_q = \frac{\hbar\omega_0}{2}\left(|\uparrow\rangle\langle\uparrow| - |\downarrow\rangle\langle\downarrow|\right) = \frac{\hbar\omega_0}{2}\sigma_z, \qquad (2) \]

where the level separation is $\hbar\omega_0$, $\omega_0 = \sqrt{\epsilon^2 + \delta^2}$ being the qubit Larmor frequency.
Assuming the resonator supports only a single mode of the electromagnetic field, its Hamiltonian is given by

\[ H_r = \hbar\omega_r\left(a^\dagger a + \tfrac{1}{2}\right), \qquad (3) \]

where $a^\dagger$ ($a$) is the creation (annihilation) operator which creates (destroys) a single photon in the cavity. The eigenstates of the resonator described by this Hamiltonian are Fock states $|0\rangle \ldots |n\rangle \ldots$ with $n$ photons. The single mode condition corresponds to a harmonic oscillator with the energy levels at $\hbar\omega_r(n + 1/2)$ and the zero-point energy $\hbar\omega_r/2$.

The interaction between the radiation and the qubit is described as the dipole interaction $H_g = -\hat{\mu}\cdot\hat{B}$ between the magnetic moment $\mu$ of the persistent current circulating in the qubit loop and the magnetic field $B$ in the resonator. Introducing quantization, this term in the Hamiltonian can be written in the form

\[ H_g = \hbar g\left(a^\dagger\sigma_- + \sigma_+ a\right), \qquad (4) \]

where $\sigma_+$ ($\sigma_-$) is the raising (lowering) operator for the qubit. This expression is valid in the so-called non-dispersive regime where $\omega_r \approx \omega_0$ [35]. The constant $g$ characterizes the qubit-photon interaction strength and the expression in the brackets describes the process whereby the qubit can be excited by absorbing a photon, or a photon can be generated at the expense of de-exciting the qubit into its ground state. In the next section we shall explicitly calculate the dipole coupling strength $g$ for specific designs of the qubit and the cavity.

The interaction of the coupled system with an external classical drive field can be seen as a periodic exchange of photons between the resonator and the driving field:

\[ H_I = \xi\left(e^{-i\omega t}a^\dagger + e^{i\omega t}a\right), \qquad (5) \]

where $\xi$ is the drive amplitude. One can also drive the qubit directly using a separate control line, leading to terms $\xi'(e^{-i\omega t}\sigma_+ + e^{i\omega t}\sigma_-)$, but we shall not consider this case any further. Adding the terms (2)-(5) together we arrive at the driven Jaynes-Cummings Hamiltonian [20]

\[ H = \frac{\hbar\omega_0}{2}\sigma_z + \hbar\omega_r\left(a^\dagger a + \tfrac{1}{2}\right) + \hbar g\left(a^\dagger\sigma_- + \sigma_+ a\right) + \xi\left(e^{-i\omega t}a^\dagger + e^{i\omega t}a\right). \qquad (6) \]
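To make the structure of (6) concrete, the following Python sketch diagonalises its undriven part (ξ = 0) in the one-excitation subspace spanned by |0↑⟩ and |1↓⟩, where it reduces to a 2×2 matrix with off-diagonal element g. Frequencies are in arbitrary angular-frequency units with ħ = 1; the function name and the numerical values are assumptions chosen only for illustration.

```python
import math


def dressed_levels(omega0, omega_r, g):
    """Eigenfrequencies (hbar = 1) of the undriven part of (6) restricted to
    the subspace spanned by |0 up> and |1 down>."""
    e_up0 = 0.5 * omega0 + 0.5 * omega_r       # |0 up>: excited qubit, no photon
    e_down1 = -0.5 * omega0 + 1.5 * omega_r    # |1 down>: ground-state qubit, one photon
    mean = 0.5 * (e_up0 + e_down1)
    # 2x2 symmetric matrix with diagonal (e_up0, e_down1) and off-diagonal g
    half_gap = math.hypot(0.5 * (e_up0 - e_down1), g)
    return mean - half_gap, mean + half_gap


def splitting(omega0, omega_r, g):
    """Separation of the two dressed levels."""
    lo, hi = dressed_levels(omega0, omega_r, g)
    return hi - lo


def upper_level_shift(omega0, omega_r, g):
    """Shift of the upper dressed level from the uncoupled |0 up> energy
    (meaningful for omega0 > omega_r)."""
    _, hi = dressed_levels(omega0, omega_r, g)
    return hi - (0.5 * omega0 + 0.5 * omega_r)


if __name__ == "__main__":
    print(splitting(6.0, 6.0, 0.1))            # on resonance
    print(upper_level_shift(8.0, 6.0, 0.1))    # far detuned
```

On resonance the separation of the two levels equals 2g, and for a detuning much larger than g the shift of the upper level approaches g² divided by the detuning, which is the limiting behaviour the text goes on to discuss.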
This Hamiltonian can be used to write down a master equation which completely describes the dynamics of the system. All the parameters which define the system can be conveniently written in units of angular frequency, a convention which we will follow here. If the cavity is weakly coupled to an already weak drive field one can achieve a regime in which only the two lowest Fock states of the resonator are relevant. Within the picture described above we make one two-level system, the qubit, interact with another two-level system, the resonator. When the qubit is detuned from the resonator the eigenstates of the coupled system can be written $|0\downarrow\rangle$, $|0\uparrow\rangle$, $|1\downarrow\rangle$ and $|1\uparrow\rangle$, where the number represents a Fock state of the resonator and the arrow represents the qubit state. However, when the qubit is brought into resonance, $H_g$ couples the states $|0\uparrow\rangle$ and $|1\downarrow\rangle$ and lifts their degeneracy. The system will oscillate between the states $|0\uparrow\rangle$ and $|1\downarrow\rangle$ at a frequency $\Omega = 2g$, known as the vacuum Rabi frequency [21], giving rise to a splitting [22] of the central peak as shown in Fig. 2. One can visualize this as a cycle in which the resonator and qubit continuously exchange an amount of energy equal to one photon.

As the drive amplitude increases it will start to perturb the system. This leads to a set of states that are shifted by

\[ E_n = \pm\sqrt{n}\,\hbar g\left[1 - (2\xi/g)^2\right]^{3/4}. \qquad (7) \]

Hence, the effect of the drive is to reduce the Rabi frequency.

The coupling strength $g$ can be determined experimentally by making a spectroscopic measurement of the splitting $\Delta E_1$. The drive amplitude should be reduced until the splitting reaches its maximum value, where it is equal to $2g\hbar$.

[Figure 2: vertical axis Energy; labelled levels $\hbar\omega_0$, $\hbar\omega_r$, $|1\downarrow\rangle$ and $|0\uparrow\rangle \pm |1\downarrow\rangle$.]

FIG. 2: Energy levels of the coupled qubit-cavity system. On the left of the diagram the qubit is far detuned from the cavity. As we move from left to right, the magnetic flux threading the qubit loop is increased, tuning the qubit transition frequency into resonance with the cavity.
On the right of the diagram, the qubit is once again far detuned.

The above discussion covers the resonant regime, where the qubit is tuned into resonance with the cavity. In contrast, when the detuning ∆ = ω0 − ωr is large, such that g/∆ ≪ 1, a dispersive Stark shift pulls the cavity frequency by ±g²/∆. This so-called dispersive regime can be used to perform quantum non-demolition measurements of the qubit state [11].

The effects of the environment on the system are taken into account by the term HE. There are three types of damping that need to be considered:
• Photons leak out of the cavity at a rate κ = ωr/Q, where Q is the cavity quality factor.
• The qubit relaxes at a rate γ = 1/T1, where T1 is the energy relaxation time.
• The qubit undergoes pure dephasing at a rate γφ = 1/Tφ = 1/T2 − 1/(2T1), where T2 is the dephasing time.

Dephasing plays a larger role in solid state systems than in atomic systems, due to the stronger interaction of solid state qubits with their environments. In the absence of pure dephasing we would have T2 = 2T1, but in real systems T2 is frequently much shorter than that, indicating the need to take pure dephasing into account.

III. STRONG COUPLING WITH A FLUX QUBIT

Below we estimate the coupling strength g using a semi-classical approach. We treat the flux qubit as a magnetic dipole and assume that it is placed at a magnetic field antinode of the resonator. The coupling strength is given by g = µB0rms/ħ, where B0rms is the zero-point root mean square magnetic field generated by the current fluctuations at the antinode of the resonator. The magnetic dipole moment of the qubit is given by µ = IpA, where Ip is the persistent current flowing around the loop and A is the loop area. We can estimate B0rms by considering the zero-point energy of the resonator, ħωr/2. This energy cycles continuously between inductive and capacitive components.
The magnetic field is determined by the inductive component, LI(t)²/2, where L is the total equivalent inductance of the resonator near resonance and I(t) is the instantaneous current. At the moment when the energy is purely inductive, we have (1/2)LI²max = (1/2)ħωr. Since I(t) undergoes sinusoidal oscillation, Irms = Imax/√2 and we therefore have that

\[ I_{\mathrm{rms}} = \sqrt{\frac{\hbar\omega_r}{2L}}. \tag{8} \]

Assuming that current flows in thin strips (whose width is determined by the superconducting penetration depth) at the edges of the centre conductor and ground plane, the field at the antinode of the fundamental mode is approximately given by

\[ B^0_{\mathrm{rms}} \approx \frac{\mu_0 I_{\mathrm{rms}}}{\pi r}, \tag{9} \]

where µ0 is the permeability of free space and r is half the width of the gap between the centre conductor and the ground plane (we assume the qubit is placed at the centre of the gap). Therefore, the coupling strength between the qubit and resonator is given approximately by

\[ g \approx \frac{I_p A \mu_0}{\pi r}\sqrt{\frac{\omega_r}{2\hbar L}}. \tag{10} \]

By inserting realistic values for the parameters in the above equation we can obtain an estimate of g.

First, we choose the fundamental frequency of the resonator. It is convenient to choose a value that lies within the range 4–8 GHz, as this is well within the design scope of both the qubit and resonator, and can be accessed with commercial microwave sources and components. We choose ωr/2π = 6 GHz. With a centre conductor of width ∼ 10 µm and a gap of width ∼ 5 µm, it is possible to achieve a total inductance L ∼ 2 nH for a resonator operated at its fundamental frequency. Next, we choose Ip such that the transition frequency ω0 of the qubit at the degeneracy point is slightly less than that of the resonator. This will enable us to tune the qubit in and out of resonance with the resonator by changing the external flux Φx threading the qubit loop.
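The semi-classical estimate of this section can be checked numerically. The sketch below assumes the forms Irms = √(ħωr/2L) and B0rms ≈ µ0Irms/(πr), with r = 2.5 µm taken as half of the ∼ 5 µm gap; the geometric factor 1/(πr) in particular is an assumption of this sketch. The qubit values Ip ≈ 580 nA and A ≈ 8 µm² are the ones used for the persistent current qubit in this section, and the function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
MU_0 = 4e-7 * math.pi       # permeability of free space, T m / A

def coupling_strength(i_p, area, omega_r, inductance, half_gap):
    """Semi-classical qubit-resonator coupling g in rad/s.

    Assumes g = mu * B_rms / hbar with mu = i_p * area, and the field
    B_rms ~ mu_0 * I_rms / (pi * half_gap) (an assumed geometric factor).
    """
    i_rms = math.sqrt(HBAR * omega_r / (2.0 * inductance))  # zero-point rms current
    b_rms = MU_0 * i_rms / (math.pi * half_gap)             # rms field at the qubit
    return i_p * area * b_rms / HBAR

g = coupling_strength(
    i_p=580e-9,                   # persistent current, A
    area=8e-12,                   # loop area, m^2 (8 um^2)
    omega_r=2 * math.pi * 6e9,    # resonator angular frequency, rad/s
    inductance=2e-9,              # total resonator inductance, H
    half_gap=2.5e-6,              # half the gap width, m
)
g_mhz = g / (2 * math.pi) / 1e6
print(f"g/2pi = {g_mhz:.1f} MHz")
```

With these inputs the result is close to the value g/2π ≈ 35 MHz quoted for the persistent current qubit. Because g scales linearly with Ip and A and as 1/r, the same function gives a quick feel for how sensitive the estimate is to the layout.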
For the 3-junction persistent current qubit having two junctions of critical current Ic = 800 nA and junction capacitance C = 4 fF, and one junction of critical current αIc, where α = 0.72, we get ω0/2π = 4.9 GHz at the degeneracy point and Ip ≈ 580 nA. These parameters were obtained by solving the Schrödinger equation numerically. The transition frequency ω0 does not depend on the area A of the qubit loop, provided that the loop inductance remains small compared with the Josephson inductance. However, we note that the larger the loop area, the greater the (undesired) coupling to the environment. Here we choose a value A ≈ 8 µm². With the above parameters we obtain g/2π ≈ 35 MHz.

We now compare this with the rate of photon loss from the resonator κ and the relaxation rate of the qubit γ. When g > κ, γ, the coupled system is able to undergo many cycles (≈ 2g/(κ + γ)) of vacuum Rabi oscillation before losing coherence. This is important for applications such as single microwave photon generation.

The photon loss rate from the resonator is given by κ = ωr/Q, where Q is the loaded quality factor of the resonator. It is possible to design a resonator with Q = 10⁵, yielding κ/2π ≈ 0.1 MHz. The relaxation rate of the qubit is given by γ = 2π/T1. Taking T1 ∼ 1 µs [3], we obtain γ/2π ≈ 1 MHz. Naturally, the values of T1 and T2 for a real system cannot be predicted with any accuracy and will depend on the experimental conditions. However, based on the data available in the literature [23, 24], we believe that the aforementioned values are reasonable. Hence, our estimated value of g for the persistent current qubit should satisfy the strong coupling criterion.

For an RF SQUID with critical current Ic = 10 µA, area A = 64 µm², loop inductance LSQUID = 35 pH and junction capacitance C = 50 fF, we obtain a transition frequency ω0/2π = 4.6 GHz.
In contrast to the persistent current qubit, the area of the RF SQUID does affect the transition frequency, via the loop inductance. It is difficult to reduce the area further than the value we have chosen, as this necessitates increasing the critical current and decreasing the junction capacitance, which becomes increasingly difficult to achieve in practice. Close to the degeneracy point, the persistent current Ip in the above SQUID is expected to be about 5 µA. Combined with the increased loop area, this is likely to lead to an even larger coupling strength g than predicted for the persistent current qubit (unless the SQUID is displaced significantly from the antinode of the resonator), but at the same time it makes the qubit more susceptible to noise. The fact that the loop area of the 3-junction persistent current qubit can be made small enough to render it relatively insensitive to flux noise is one reason why it has been so successfully used by, e.g., Mooij and co-workers [3, 18].

IV. NUMERICAL SIMULATION OF THE SPECTRUM UNDER MICROWAVE EXCITATION

Having shown that it is possible to reach the strong coupling regime with a flux qubit coupled to a coplanar resonator, we now simulate the results of a spectroscopic experiment to measure g. The experiment would involve driving the coupled system with an external microwave field whose frequency ωl would be swept through the resonance of the coupled system. The most straightforward way to probe the response of the system would be to use what effectively amounts to a standard microwave transmission (S12) measurement of the cavity. The experiment would be done by starting with the qubit far detuned from the resonator, then stepping the external magnetic flux to tune the qubit through the cavity resonance. This type of experiment would allow us to record the output power of the resonator as a function of the qubit Larmor frequency.
These simulations were performed by solving the master equation using a Liouvillian with the Hamiltonian (6) and three collapse (Lindblad) operators which account for the decay and dephasing of the qubit and the cavity at the aforementioned rates κ, γ and γφ (i.e. the effects of HE). A brief description of the formalism can be found in Appendix B. All simulations were performed using the "Quantum Optics Toolbox" developed by Tan [25]. Note that all the figures show the spectrum of the intracavity field (see Appendix C for the definition and a description of how it is calculated).

In the regime where the qubit Larmor frequency ω0 is very far detuned from the cavity (i.e. when even dispersive effects are negligible), the qubit and the cavity are effectively decoupled and no exchange of energy can take place. If the system is probed by measuring the response of the cavity, a single peak located at the bare resonance frequency ωr will be seen. If instead the qubit is tuned exactly on resonance (∆ = 0), the effects of the coupling become clearly visible. Now there are two peaks located at ωr ± Ω/2 = ωr ± g, as can be seen in Fig. 3.

FIG. 3: The spectrum far from resonance (where the only effect of the qubit is to broaden the resonance) and at zero detuning for small drive (mean steady-state photon number 〈n〉 < 10⁻³), the latter giving rise to a Rabi splitting 2g. Also shown is the spectrum at large (〈n〉 ≈ 8) drive amplitudes (dashed line). Horizontal axis: frequency shift (MHz), from −100 to 100.

In between these two extremes there is a gradual change from a single- to a double-peaked spectrum, where the splitting is approximately g²/∆, as can be seen in the left picture of Fig. 4. In this case the pure dephasing rate γφ was set to zero. The result is a "diamond-shaped" picture with a maximum splitting of 2g at zero detuning. Far from resonance we can identify the four branches as being associated with the states |0 ↑〉 and |1 ↓〉 above and below the resonance.
Exactly on resonance the system is in a superposition of states |0 ↑〉 ± |1 ↓〉. This is in agreement with the diagram shown in Fig. 2. Note that an interferometric measurement method [26], for example, must be used in order to observe the whole spectrum; a simpler transmission experiment would see only one peak off resonance, since such a measurement only records transitions between photon states where the final state can emit a photon, in this case |0 ↑〉 ↔ |1 ↓〉.

The right picture in Fig. 4 shows the spectrum when pure dephasing is introduced into the model. The result is somewhat more complicated in that the spectral weights are asymmetric with respect to ωr − ωl = 0 when the qubit is detuned from resonance. The reason for this is that dephasing leads to a loss of coherence, meaning that the qubit tends to stay in its ground state and the coupling between the states |0 ↑〉 and |1 ↓〉 is reduced; the system is therefore no longer in a superposition of those states. Hence, states with zero photons in the cavity are effectively "decoupled" from the one-photon states. Since only states which allow for (at least) one photon in the cavity can be measured, it follows that only the two branches (one above and one below resonance) that (approximately) correspond to ±|1 ↓〉 will be clearly visible. Note, however, that despite the loss of coherence an on-resonance measurement would still show two peaks separated by 2g, i.e. exactly on resonance this situation is effectively indistinguishable from the more coherent case even when the full spectrum is measured.

FIG. 4: (Color online) Power spectrum of the coupled qubit-resonator system as a function of qubit detuning in the strong coupling limit. The frequency of the drive field ωl is held at the resonance frequency ωr of the bare resonator, while the Larmor frequency ω0 of the qubit is tuned by changing the external magnetic field. Left: Spectrum in the absence of pure dephasing. Right: Adding a pure dephasing channel to the dissipation results in an asymmetric spectrum; here the pure dephasing rate γφ is 9.5 MHz. Also shown is one branch of the expression ±g²/∆ (white dashed line). The following parameters were used in the simulations (in GHz): ωr = ωl = 6, g = 0.035, κ = 0.004, γ = 0.001 and ξ/ħ = 0.25κ.

While there are no bound states in the limit of strong driving, ξ > g/2, a continuum of states still exists, giving rise to complex spectra (dashed line in Fig. 3). The structure is reminiscent of the so-called "Mollow" peaks, well known in atomic physics from, e.g., fluorescence spectroscopy. However, in the latter case the peaks are the result of strong driving of the atom (qubit), whereas in this simulation the cavity is being driven, so the similarity is somewhat superficial. When, as in this case, the cavity is driven so strongly that there are on average several photons in the cavity, we see both a drive-induced shift of the position of the sidebands and a reappearance of the central peak. Note that since it is difficult to directly relate the parameter ξ to the power output from a microwave generator, care must be taken not to drive the system inadvertently into this regime.

For the persistent current qubit considered here, a detuning of ±0.4 GHz corresponds to an external magnetic flux in the range ±10⁻⁴ Φ0, which is a useful value for a real experiment. However, for the RF SQUID, the flux required is rather less: ±3 × 10⁻⁵ Φ0.

One effect which is not taken into account in our simulations is the presence of thermal photons in the cavity. Thermal effects can, in general, be neglected when working with optical cavities due to the very small average number of photons at those frequencies. In experiments on solid state qubits this is, however, not generally true, since they are operated in the microwave range. Also, the relevant temperature scale is set not only by the phonon temperature (e.g.
the temperature of the mixing chamber of a dilution refrigerator) but also by the amount of noise (essentially "hot" photons) which reaches the system via the leads. However, in a well-filtered system a total temperature of 50 mK is attainable. The resonator will then nearly be in its ground state, with an average thermal occupancy n̄ of 0.009 and thermal fluctuations in the photon number of the order of 0.1. This justifies ignoring thermal effects in our simulations for now. That said, even a moderate increase in temperature can significantly change the outcome of an experiment [27].

One further simplifying assumption in the model is that T1 and T2 do not change as the qubit is detuned from the optimal bias point Φ0/2. While this is clearly unrealistic, it has been experimentally shown [24] that neither parameter should change dramatically in the parameter range considered here, giving some justification to this approximation.

V. APPLICATIONS OF CIRCUIT-QED

One of the most important applications of circuit QED is the generation of single microwave photons on demand. Single photon sources in the optical regime have been realized using, e.g., cavity QED with atoms and high-finesse optical cavities [26]. Design and fabrication of deterministic sources that operate in the microwave regime have proved to be more difficult, but a source based on superconducting circuit QED was recently demonstrated [28]. This kind of source could be used for quantum radiometry, as well as for quantum information applications such as quantum key distribution.

Various schemes can be envisaged [29, 30, 31, 32] by which single photons can be generated with a circuit QED device. Below, we describe a straightforward technique, based on manipulation of the qubit state with microwave pulses and rapid changes in the DC magnetic field to tune the qubit in and out of resonance with the cavity.
Our technique begins with the qubit far detuned from the cavity and the combined system in its ground state |0 ↓〉. A microwave π-pulse is applied to the qubit to excite it to the state |0 ↑〉 [Fig. 5(a)], and this is followed by a step in the magnetic field to bring the qubit into resonance with the cavity [Fig. 5(b)]. The state of the system immediately after the step is still |0 ↑〉, but due to the qubit-cavity interaction on resonance, this is no longer an eigenstate, so the system begins to precess around the equator of the Bloch sphere at the vacuum Rabi frequency 2g. After a time 2π/4g, the state of the system will be |1 ↓〉 [Fig. 5(c)]. This means that coherent energy exchange has taken place between the qubit and the cavity, creating a photon-like state. Another step in magnetic field detunes the qubit from the cavity so that the state |1 ↓〉 is once more an eigenstate of the system [Fig. 5(d)]. The system remains in this state until the photon decays out of the cavity into one of the external waveguides in a time of order 2π/κ. By repeating this sequence many times, photons can be generated on demand, provided that the time window within which they are required is much longer than 2π/κ.

FIG. 5: (Color online). Bloch sphere diagrams showing the state of the qubit-cavity system at successive stages, (a) to (d), of the single photon generation process.

If a scheme similar to the one above is implemented, it is important to prove that it generates single photons deterministically, rather than stochastically. At optical frequencies this is done by studying photon-counting statistics using interferometric measurements. Such measurements require the use of a beamsplitter.
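The timing of the swap step in the photon generation sequence can be illustrated with a two-state calculation. The sketch below is an idealized model, not the authors' simulation: it assumes exact resonance, no dissipation, ħ = 1, and a Hilbert space restricted to {|0 ↑〉, |1 ↓〉}, where the interaction reduces to g times a Pauli-X matrix. The function name is hypothetical, and g/2π = 35 MHz is the estimate from Sec. III.

```python
import numpy as np

def photon_probability(g, t):
    """Probability of finding |1, down> at time t, starting from |0, up>.

    Idealized resonant two-state model with H = g * sigma_x (hbar = 1),
    ignoring cavity decay, qubit relaxation and dephasing.
    """
    h = g * np.array([[0.0, 1.0], [1.0, 0.0]])
    evals, evecs = np.linalg.eigh(h)
    # Propagator exp(-i H t) built from the eigen-decomposition of H.
    u = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    psi = u @ np.array([1.0, 0.0])      # initial state |0, up>
    return abs(psi[1]) ** 2

g = 2 * np.pi * 35e6                    # coupling in rad/s
t_swap = 2 * np.pi / (4 * g)            # waiting time quoted in the text
p_swap = photon_probability(g, t_swap)
p_half = photon_probability(g, t_swap / 2)

print(f"t_swap = {t_swap * 1e9:.2f} ns, P(|1,down>) = {p_swap:.6f}")
print(f"at t_swap/2 the state is an equal superposition: P = {p_half:.3f}")
```

For these values the swap takes about 7 ns, which is short compared with the photon lifetime of order 2π/κ, so in this idealized picture the field step that detunes the qubit must be timed on the nanosecond scale.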
An analogous experiment can be envisaged in the microwave regime, provided that a microwave beamsplitter can be realised. Such a device has been proposed recently [29], and could lead to a microwave analogue of the Hanbury-Brown and Twiss interferometer [26].

VI. CONCLUSIONS

We have shown that it should be possible to reach the strong coupling regime using a flux qubit coupled to a coplanar waveguide resonator. If realized, it would open the door to potential applications in metrology, quantum communication and experimental tests of quantum mechanics. The fact that conventional lithography can be used to fabricate the samples and that the experimental parameters can be chosen freely can in some cases be a significant advantage compared to CQED implementations utilizing, e.g., atoms or ions. Since the relevant frequencies are in the microwave regime, it is also possible to use well-established methods to manipulate the system. The main drawbacks compared to experiments done at optical frequencies are the short coherence time of the qubit and the fact that the system must be operated at very low temperatures.

VII. ACKNOWLEDGMENTS

The authors would like to thank Mark Oxborrow, Alexandre Zagoskin, Alexander Blais and Vladimir Antonov for helpful discussions and comments. We would also like to thank Scott Parkins for his help with the Quantum Optics Toolbox. This work was funded by the UK Department of Trade and Industry Quantum Metrology Programme, project QM04.3.4, and the Swedish Research Council.

APPENDIX A: DERIVATION OF THE HAMILTONIAN

It is useful to compare the informal procedure used in the introduction of this paper to derive the Hamiltonian with a more formal approach. Starting with the bare qubit Hamiltonian −(1/2)(εσz + ∆σx), where ε = 2Ip(Φx − Φ0/2), we proceed just as before by noting that the flux threading the qubit loop will be modulated via the mutual inductance M that couples the fluctuations in the cavity to the qubit.
Writing the total external flux as the sum of a static part and a fluctuating part δΦ, where δΦ is proportional to M(a† + a), adding the Hamiltonian for the oscillator mode and the external field, and finally transforming into the eigenbasis of the qubit, we get

\[ H = \frac{\hbar\omega_0}{2}\sigma_z + \hbar\omega_r\left(a^\dagger a + \tfrac{1}{2}\right) + \hbar g\,(a^\dagger\sigma_- + \sigma_+ a)\sin\theta + \xi\,(e^{-i\omega t}a^\dagger + e^{i\omega t}a) - \hbar g\,(a^\dagger + a)\,\sigma_z\cos\theta \tag{A1} \]

in the RWA. Here we have introduced the mixing angle θ = arctan(∆/ε). This Hamiltonian is identical to the J-C Hamiltonian (6) except that we now have an effective coupling g sin θ and an extra term ħg(a† + a)σz cos θ, which is zero when the qubit is operated at the degeneracy point θ = π/2. By moving to an interaction frame rotating at the drive frequency ω, we see that all terms in the Hamiltonian (A1) are time-independent except the last term, which picks up a factor exp(−iωt), meaning that it can be neglected in the rotating wave approximation. Note, however, that this additional term can potentially play a role in the dispersive regime.

APPENDIX B: DISSIPATION

The effects of the environment on a quantum system are in general very difficult to model, but they are nevertheless crucial to understand since they are the cause of decoherence. However, assuming the interaction with the environment is Markovian, the evolution of the (reduced) density matrix of the system can be described by a master equation of Lindblad form [33],

\[ \dot\rho = \mathcal{L}\rho = -\frac{i}{\hbar}[H,\rho] + \sum_k\left(C_k\rho C_k^\dagger - \tfrac{1}{2}C_k^\dagger C_k\rho - \tfrac{1}{2}\rho\,C_k^\dagger C_k\right), \tag{B1} \]

where Ck are Lindblad operators.

In the case considered here we have three Lindblad operators. Firstly, the relaxation from the excited state to the ground state at a rate γ1 = 1/T1 is represented by a Lindblad operator proportional to the lowering operator σ̂−, i.e. C1 = √γ1 σ̂−. The cavity is losing energy at a rate κ = ωr/Q, which leads to the "destruction" of photons in the system, C2 = √κ a. Finally, we also need to consider pure dephasing of the qubit at a rate γφ = 1/Tφ = 1/T2 − 1/(2T1), where T2 is the usual total dephasing time of the qubit. This process is represented by the operator C3 = √(γφ/2) σ̂z.
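The Lindblad formalism of this appendix can be sketched with plain NumPy. The example below is not the Quantum Optics Toolbox code used for the simulations; it is a minimal illustration under stated assumptions: ħ = 1, a frame rotating at the drive frequency with ωl = ωr = ω0 (so only the coupling and drive terms remain in H), a cutoff of four Fock states, the simulation parameters of Fig. 4 treated as angular frequencies, and collapse operators √γ σ−, √κ a and √(γφ/2) σz. All function names are hypothetical.

```python
import numpy as np

def liouvillian(h, collapse_ops):
    """Lindblad superoperator acting on the row-major flattened density matrix.

    Uses vec(A rho B) = (A kron B^T) vec(rho) for row-major flattening.
    """
    eye = np.eye(h.shape[0])
    liou = -1j * (np.kron(h, eye) - np.kron(eye, h.T))      # -i [H, rho]
    for c in collapse_ops:
        cdc = c.conj().T @ c
        liou += (np.kron(c, c.conj())                        # C rho C^dagger
                 - 0.5 * np.kron(cdc, eye)                   # -(1/2) C^dagger C rho
                 - 0.5 * np.kron(eye, cdc.T))                # -(1/2) rho C^dagger C
    return liou

def steady_state(liou, dim):
    """Null vector of the Liouvillian, reshaped and normalized to unit trace."""
    _, _, vh = np.linalg.svd(liou)
    rho = vh[-1].conj().reshape(dim, dim)    # right singular vector of smallest s
    return rho / np.trace(rho)

n_fock = 4                                   # Fock-space cutoff (assumption)
a = np.kron(np.diag(np.sqrt(np.arange(1, n_fock)), k=1), np.eye(2))
sm = np.kron(np.eye(n_fock), np.array([[0.0, 0.0], [1.0, 0.0]]))   # sigma_-
sz = np.kron(np.eye(n_fock), np.diag([1.0, -1.0]))                 # sigma_z

g, kappa, gamma, gamma_phi = 0.035, 0.004, 0.001, 0.0
xi = 0.25 * kappa                            # weak drive
h = g * (a.T @ sm + sm.T @ a) + xi * (a + a.T)
ops = [np.sqrt(gamma) * sm, np.sqrt(kappa) * a, np.sqrt(gamma_phi / 2) * sz]

liou = liouvillian(h, ops)
rho = steady_state(liou, 2 * n_fock)
n_mean = np.trace(a.T @ a @ rho).real        # mean steady-state photon number
print(f"mean photon number = {n_mean:.3e}")
```

Solving for the null vector by singular value decomposition is adequate for a superoperator of this size (64 × 64); for larger cutoffs a sparse solver would be the more natural choice.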
APPENDIX C: CALCULATION OF THE SPECTRUM

Our aim is to calculate the steady-state power spectrum S(ω) of the intracavity field. Formally, this is defined in terms of the photocount output from the cavity as seen by a monochromatic detector [20]. The spectrum can be calculated from the two-time correlation function [34] 〈a†(t + τ)a(t)〉,

\[ S(\omega) = \int_{-\infty}^{\infty} e^{-i\omega\tau}\,\langle a^\dagger(t+\tau)\,a(t)\rangle\,d\tau, \tag{C1} \]

which can be evaluated using the quantum regression theorem

\[ \langle a^\dagger(t+\tau)\,a(t)\rangle = \mathrm{Tr}\{a^\dagger e^{\mathcal{L}\tau} a\rho\}, \tag{C2} \]

where the Liouvillian (which includes the three Lindblad operators defined in Appendix B) is given by the right-hand side of Eq. (B1) and ρ is the steady-state density matrix, which is the solution to Lρ = 0. These calculations are straightforward using the built-in routines of the "Quantum Optics Toolbox".

[1] Y. Nakamura, Yu.A. Pashkin, and J.S. Tsai. Coherent control of macroscopic quantum states in a single-Cooper-pair box. Nature, 398:786–8, 1999.
[2] C.H. Van Der Wal, A.C.J. Ter Haar, F.K. Wilhelm, R.N. Schouten, C.J.P.M. Harmans, T.P. Orlando, S. Lloyd, and J.E. Mooij. Quantum superposition of macroscopic persistent-current states. Science, 290:773–7, 2000.
[3] I. Chiorescu, Y. Nakamura, C.J.P.M. Harmans, and J.E. Mooij. Coherent quantum dynamics of a superconducting flux qubit. Science, 299:1869–71, 2003.
[4] J.M. Martinis, S. Nam, J. Aumentado, and C. Urbina. Rabi oscillations in a large Josephson-junction qubit. Phys. Rev. Lett., 89(11):117901–1, 2003.
[5] A. Zeilinger. Experiment and the foundations of quantum physics. Reviews of Modern Physics, 71:288, 1999.
[6] Matthias Steffen, M. Ansmann, Radoslaw C. Bialczak, N. Katz, Erik Lucero, R. McDermott, Matthew Neeley, E. M. Weig, A. N. Cleland, and John M. Martinis. Measurement of the entanglement of two superconducting qubits via quantum state tomography. Science, 313:1423–1425, 2006.
[7] William D. Oliver, Yang Yu, Janice C. Lee, Karl K. Berggren, Leonid S. Levitov, and Terry P. Orlando.
Mach-Zehnder interferometry in a strongly driven superconducting qubit. Science, 310:1653–1657, 2005.
[8] S.A. Valenzuela, W. Oliver, D.M. Berns, K.K. Berggren, L.S. Levitov, and T.P. Orlando. Microwave-induced cooling of a superconducting qubit. Science, 314:1589–1591, 2006.
[9] C.S. Gerry and P.L. Knight. Introductory Quantum Optics. Cambridge University Press, 2005.
[10] A. Wallraff, D. I. Schuster, A. Blais, L. Frunzio, R.-S. Huang, J. Majer, S. Kumar, S. M. Girvin, and R. J. Schoelkopf. Strong coupling of a single photon to a superconducting qubit using circuit quantum electrodynamics. Nature, 431:162–167, 2004.
[11] A. Blais, R. Huang, A. Wallraff, S.M. Girvin, and R.J. Schoelkopf. Cavity quantum electrodynamics for superconducting electrical circuits: An architecture for quantum computation. Phys. Rev. A, 69:062320, 2004.
[12] T. Yoshie, A. Sherer, A. Hendrickson, G. Khitrova, H.M. Gibbs, G. Rupper, C. Ell, O.B. Shchekin, and D.G. Deppe. Vacuum Rabi splitting with a single quantum dot in a photonic crystal nanocavity. Nature, 432:200–203, 2004.
[13] J. P. Reithmaier, A. Löffler, G. Sęk, C. Hofmann, S. Kuhn, S. Reitzenstein, L.V. Keldysh, V.D. Kulakovskii, T. L. Reinecke, and A. Forchel. Strong coupling in a single quantum dot-semiconductor microcavity system. Nature, 432:197–200, 2004.
[14] A. Lupaşcu, S. Saito, T. Pictot, P.C. de Groot, C.J.M. Harmans, and J.E. Mooij. Quantum non-demolition measurement of superconducting two-level system. arXiv:cond-mat, (0611505), 2006.
[15] D. I. Schuster, A. A. Houck, J. A. Schreier, A. Wallraff, J. M. Gambetta, A. Blais, L. Frunzio, B. Johnson, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf. Resolving photon number states in a superconducting circuit. Nature, 445:515, 2007.
[16] G. Johansson, L. Tornberg, and C. M. Wilson. Fast quantum limited readout of a superconducting qubit using a slow oscillator. Phys. Rev. B, 74:100504(R), 2006.
[17] Grajcar et al. Four-qubit device with mixed couplings. Phys. Rev. Lett., 96:047006, 2006.
[18] J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H. van der Wal, and S. Lloyd. Josephson persistent-current qubit. Science, 285:1036, 1999.
[19] I. Chiorescu, P. Bertet, K. Semba, Y. Nakamura, C.J.P.M. Harmans, and J.E. Mooij. Coherent dynamics of a flux qubit coupled to a harmonic oscillator. Nature, 431:159–162, 2004.
[20] D.F. Walls and G.J. Milburn. Quantum Optics. Springer-Verlag, 1994.
[21] J. Johansson, S. Saito, T. Meno, H. Nakano, M. Ueda, K. Semba, and H. Takayanagi. Vacuum Rabi oscillations in a macroscopic superconducting qubit LC oscillator system. Physical Review Letters, 96:127006, 2007.
[22] P. Alsing, D.S. Guo, and H.J. Carmichael. Dynamic Stark effect for the Jaynes-Cummings system. Phys. Rev. A, 47(7):5135–5143, 1994.
[23] P. Bertet, I. Chiorescu, G. Burkard, K. Semba, C. J. P. M. Harmans, D.P. DiVincenzo, and J. E. Mooij. Relaxation and dephasing in a flux qubit. arXiv:cond-mat, (0412485), 2004.
[24] K. Kakuyanagi, T. Meno, S. Saito, H. Nakano, K. Semba, H. Takayanagi, F. Deppe, and A. Shnirman. Dephasing of a superconducting flux qubit. Physical Review Letters, 98:47004, 2007.
[25] S. Tan. A computational toolbox for quantum and atomic optics. Journal of Optics B, 1:424–432, 1999.
[26] M. Oxborrow and A. Sinclair. Single photon sources. Contemporary Physics, 46(3):173–206, 2005.
[27] I. Rau, G. Johansson, and A. Shnirman. Cavity quantum electrodynamics in superconducting circuits: Susceptibility at elevated temperatures. Phys. Rev. B, 70:054521, 2004.
[28] A. A. Houck, D. I. Schuster, J. M. Gambetta, J. A. Schreier, B. R. Johnson, J. M. Chow, J. Majer, L. Frunzio, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf. Generating single microwave photons in a circuit. arXiv:cond-mat, (0702648v1), 2007.
[29] M. Mariantoni, M.J. Storcz, F.K. Wilhelm, W.D. Oliver, A. Emmert, A. Marx, R. Gross, H. Christ, and E. Solano. On-chip microwave Fock states and quantum homodyne measurements. arXiv:cond-mat, (0509737), 2006.
[30] F. Marquardt.
Efficient on-chip source of microwave photon pairs in superconducting circuit QED. arXiv:cond-mat, (0605232), 2006.
[31] A.M. Zagoskin, M. Grajcar, and A.N. Omelyanchouk. Selective amplification of a quantum state. Phys. Rev. B, 70:060301(R), 2004.
[32] K. Saito, M. Wubs, S. Kohler, P. Hänggi, and Y. Kayanuma. Quantum state preparation in circuit QED via Landau-Zener tunneling. Europhysics Letters (EPL), 76(1):22–28, 2006.
[33] H. P. Breuer and F. Petruccione. The theory of open quantum systems. Oxford University Press, 2004.
[34] R. J. Glauber. Coherent and incoherent states of radiation field. Physical Review, 131:2766–2788, 1963.
[35] In the dispersive regime the effective Hamiltonian becomes H = (g²/∆)(σ̂+σ̂− + â†âσ̂z).

ABSTRACT

We propose a scheme for circuit quantum electrodynamics with a superconducting flux qubit coupled to a high-Q coplanar resonator. Assuming realistic circuit parameters, we predict that it is possible to reach the strong coupling regime. Routes to metrological applications, such as single photon generation and quantum non-demolition measurements, are discussed.
<|endoftext|><|startoftext|> Zero bias anomaly out of equilibrium

D. B. Gutman (1), Yuval Gefen (2), and A. D. Mirlin (3, 4, ∗)
(1) Dept. of Physics, University of Florida, Gainesville, FL 32611, USA
(2) Dept. of Condensed Matter Physics, Weizmann Institute of Science, Rehovot 76100, Israel
(3) Institut für Nanotechnologie, Forschungszentrum Karlsruhe, 76021 Karlsruhe, Germany
(4) Inst. für Theorie der kondensierten Materie, Universität Karlsruhe, 76128 Karlsruhe, Germany
(Dated: August 7, 2021)

The non-equilibrium zero bias anomaly (ZBA) in the tunneling density of states of a diffusive metallic film is studied. An effective action describing virtual fluctuations out of equilibrium is derived. The singular behavior of the equilibrium ZBA is smoothed out by real processes of inelastic scattering.
PACS numbers: 73.23.-b, 73.40.Gk, 73.50.Td

The suppression of tunneling current at low bias due to electron-electron interaction is known as the zero bias anomaly (ZBA). The theory of the ZBA for disordered metals at thermal equilibrium has been developed, on a perturbative level, by Altshuler and Aronov [1, 2]. The non-perturbative generalization of this theory was achieved by Finkelstein [3]. Measurements of the tunneling density of states (DOS) in biased quasi-one-dimensional wires [4] call for an extension of the theory to non-equilibrium setups. In this work we study the ZBA for disordered metallic films out of equilibrium, in both the perturbative and the non-perturbative (in interaction) limits.

Besides the experimental motivation, the problem of the ZBA in a non-equilibrium system is of fundamental theoretical interest. At equilibrium, the distribution of electrons in phase space has a single edge at the Fermi surface. The Coulomb interaction between the tunneling electron and the electrons in the Fermi sea excites virtual particle-hole pairs around the Fermi edge, leading to the suppression of the tunneling DOS, similarly to the Debye-Waller factor. The suppression gets stronger when the electron energy approaches the Fermi energy. Out of equilibrium, the distribution of particles may have several sharp edges rather than a single one at the Fermi surface, which poses important questions addressed in this work: How will the excitation of electron-hole pairs in this situation affect the tunneling DOS? Will there be an interplay between the two edges? We show that the two edges are not independent: one edge affects the ZBA near the other via real interaction-induced scattering processes governing the dephasing of electrons in the non-equilibrium regime.
From this point of view the problem we are considering is representative of a class of phenomena that involve renormalization away from thermal equilibrium, such as the Fermi edge singularity [5] and the Kondo effect [6].

What makes the ZBA particularly interesting is its deep connection to various conceptually important phenomenological ideas. At equilibrium, the non-perturbative results [3] have been reproduced by quantum hydrodynamical methods [7], and, within the framework of the theory of dissipation [8], by methods that rely on the fluctuation-dissipation theorem. Our work circumvents this restriction to equilibrium. Starting with the Keldysh non-linear σ-model, we derive an effective action that accounts for virtual fluctuations in disordered metals away from equilibrium. This action is complementary to the one for the kinetics of real fluctuations (such as noise) developed earlier [11, 12, 13, 14]. Further, we discuss a connection between our theory and phenomenological methods [7, 8, 9, 10]. As a central application of our theory, we analyze the ZBA problem and calculate the tunneling DOS for a two-dimensional (2D) diffusive metallic film subject to external bias.

Consider an electron which tunnels from a tip into a metal, subject to an external bias. The Coulomb interaction U(q) (U(q) = 2πe²/q, d = 2) causes electrons in the metal to readjust their positions so as to screen the added charge. Their motion is impeded by static disorder. For a sufficiently good metal, characterized by the dimensionless conductance g = εF τ ≫ 1 (εF is the Fermi energy and τ is the elastic mean free time), the electron motion is diffusive.
Low energy excitations are accounted for by the non-linear Keldysh σ-model [15]

iS[Q,φ] = iTr{φT (U−1 + ν0)σ1φ} Tr{D(∇Q)2 − 4∂tQ} − iπν0Tr{φαγ αQ}, (1)

where Q(r, t, t′) is a 2 × 2 matrix field (in the Keldysh space) describing slow electronic modes, φ(r, t) is a 2-vector in the Keldysh space representing the Coulomb interaction, and Tr denotes the trace over Keldysh indices (TrK) as well as over spatial and temporal coordinates. The Q field is subject to a non-linear constraint,

∫ dt1 Q(r, t, t1) Q(r, t1, t′) = δ(t − t′)σ0, (2)

γ1 ≡ σ0 is the 2×2 unit matrix, γ2 ≡ σ1 is the first Pauli matrix, ν0 is the DOS at the Fermi energy in the absence of interaction, and D is the diffusion coefficient. The theory can be rewritten in terms of the field Q only. Integrating the φ field out, we derive the following action:

iS[Q] = − Tr{D(∇Q)2 − 4∂tQ} i(πν0) TrK{σ1Q} U−1 + ν0 TrK{Q}. (3)

http://arxiv.org/abs/0704.0728v1

Physical observables can be found by differentiating the generating functional

Z[ϕ1, ϕ2] = exp (iS[Q]) exp 1 + ν0U 2ϕ1ϕ2 − πϕ1TrK{Q} − πϕ2TrK{σ1Q} (4)

over the source fields ϕ1 and ϕ2.

So far all low energy excitations in the problem have been kept indiscriminately. The price one pays for this is the extreme technical complexity of the theory. In some cases, this description is excessive and the theory can be simplified by singling out a particular subspace of the Q matrices. The resulting theory is less general, but more suitable for tackling a specific class of problems.

The noise statistics in disordered systems is a remarkable example of such a simplification. An extension of the Boltzmann-Langevin approach [16] to high order correlators is achieved on the subspace of matrices that are diagonal in the Wigner representation,

Q(r, ε, t) = e f̄(r,ε,t) 1 2− 4f(r, ε, t) f̄(r,ε,t). (5)

The theory [11, 13, 14] resulting from the σ-model (3), reduced to the subspace (5), accounts for real fluctuations in agreement with the cascade idea of Nagaev [17].
Below we use a similar ideology, “projecting” the σ-model on the subspace appropriate for the ZBA problem. According to works on the ZBA at equilibrium, it is virtual fluctuations of gauge fields that dominate the suppression of the tunneling DOS [3, 15, 18]. This suggests that gauge-type fluctuations constituting local-in-time rotations of the saddle point Λt−t′ ,

Q(r, t, t′) = e f̄(r,t)+ i f(r,t)Λ r,t−t′e f(r,t′)− f̄(r,t′), r,t−t′ = 1 2− 4n(r, t− t′) , (6)

where n(r, t − t′) is the Fourier transform of the local distribution function n(r, ε), are to be retained. Plugging (6) into the action (3) and expanding up to quadratic order in f and f̄, we derive an effective theory of fluctuations in an interacting diffusive conductor:

iS[f, f̄ ] = (dω)(dq) f̄(−ω,−q) ω −Dq2 + 1 + ν0U(q) f(ω, q) −Dq2T (ω)f̄(−ω,−q)f̄(ω, q) . (7)

Here the effective temperature T(ω) is defined as

T (ω) = dε n(ε) [2− n(ε − ω)− n(ε+ ω)] , (8)

and n(ε) is a distribution function determined by the Boltzmann equation.

To demonstrate the consistency of our approach, we first show that the effective action (7) correctly reproduces known density correlation functions. The symmetrized density correlation function is given by

ρ̂(r, t), ρ̂(0, 0) 〉ω,q = − ∂φ2(ω, q)∂φ2(−ω,−q) 1 + ν0U ω2〈ff〉ω,q . (9)

Similarly, the response function can be expressed as

i〈[ρ̂(r, t), ρ̂(0, 0)]−θ(t)〉ω,q = ∂φ1(ω, q)∂φ2(−ω,−q) 1 + ν0U ω2〈f f̄〉ω,q + 1 + ν0U . (10)

Calculating the correlation functions entering Eqs.
(9), (10) using the action (7),

〈ff〉ω,q = 4Dq2T (ω) Dq2 + iω 1+ν0V (q) 〈fω,qf̄−ω,−q〉 ≡ 〈f f̄〉ω,q = Dq2 − iω 1+ν0U(q) ) ,(11)

we reproduce the known results for the spectral function of density fluctuations

ρ̂(r, t), ρ̂(0, 0) 〉ω,q = 2T (ω) |Dq2[1 + ν0U(q)] + iω|2

and the density-density response function,

i〈[ρ̂(r, t), ρ̂(0, 0)]−θ(t)〉ω,q = Dq2[1 + ν0U(q)]− iω

It is worth noting the analogy between our effective action and phenomenological theories that describe the ZBA at equilibrium within an effective environment model [9, 10]. The latter explain the ZBA as the influence of virtual fluctuations of an electromagnetic field at equilibrium [8] on the electron tunneling. In view of the fluctuation-dissipation theorem, fluctuations of the electromagnetic field are determined by the complex impedance of the system. In the zero dimensional case, modes of the electromagnetic field can be considered as independent quantum harmonic oscillators. Being suddenly shaken by the incoming electron, they move away from the equilibrium position, reducing the overlap with their original configuration and hence suppressing the electron tunneling amplitude. The action (7) describes similar processes in a 2D diffusive system and without the assumption of thermal equilibrium, thus keeping the information about both real and virtual processes in a non-equilibrium state.

Now we are ready to apply the theory to the problem of the ZBA out of equilibrium. The tunneling DOS

ν(ε) = GR(ε, p)−GA(ε, p)

can be rewritten in terms of Q matrices as

ν(ε) = dt e−iε(t−t ′)〈TrK Q(r, t ′, t)σz〉Q . (13)

Plugging Eq. (6) into Eq. (13), we find

= 1+2i dt n(t) eiεt− Iff (t) sin Iff̄ (t) , (14)

where we used the notations

Iff (t) = (dω)(dq) (1 − cosωt) 〈〈ff〉〉ω,q , Iff̄ (t) = (dω)(dq) sinωt 〈〈f f̄〉〉ω,q . (15)

There is a subtle point related to the definition of the correlators 〈〈ff〉〉 and 〈〈f f̄〉〉 in (15). The ZBA should vanish in the absence of interaction (U = 0).
This physically obvious statement is valid to all orders of the diagrammatic calculations (all potentially singular contributions vanish after the time integration over the Keldysh contour) and is satisfied by the non-linear σ-model. However, it ceases to be an exact feature of the theory when the full space of Q matrices is reduced to the subspace Eq. (6). This problem is easily cured. Since we are only interested in summing up the leading interaction-induced ln^2 ε contributions to the tunneling DOS (to all orders), it is the interaction-induced parts of 〈ff〉 and 〈f f̄〉 that we have to keep in Eq. (15) [19]:

〈〈f f̄〉〉ω,q ≡ 〈f f̄〉ω,q − 〈f f̄〉ω,q,U=0 (Dq2 − iω)[Dq2(1 + ν0U)− iω] . (16)

Analogously,

〈〈ff〉〉ω,q = 4Dq2T (ω)(2 + ν0U)U |Dq2 − iω|2|Dq2(1 + ν0U)− iω|2 . (17)

Equations (14)–(17) constitute our general result for the non-equilibrium ZBA. At equilibrium they reproduce the known results for the ZBA in the perturbative [1, 2] and non-perturbative [3] regimes.

To illustrate the effect of non-equilibrium conditions on the ZBA, we consider a diffusive film of length L connected to two zero-temperature reservoirs with voltage difference V. We assume that the energy relaxation in the film can be neglected, i.e. that L is shorter than the energy relaxation length, a condition met at not too high bias. In this case the solution of the Boltzmann equation is a double-step function

n(ε, x) = an0(ε− eV/2) + (1 − a)n0(ε+ eV/2) , (18)

where n0(ε) is the Fermi distribution function, a = x/L, and x is the distance from the reservoir. This distribution corresponds to the effective temperature

T (ω) = [a2 + (1− a)2]Teq(ω) +a(1− a)[Teq(ω + eV ) + Teq(ω − eV )] , (19)

where Teq(ω) is the equilibrium value

Teq(ω) = ω coth(ω/T ) → |ω| . (20)

Substituting Eqs. (16), (17) into Eqs. (15), (14), we find the tunneling DOS out of equilibrium:

= a exp 8π2ν0D log(D2κ4τt−) +(1− a) exp 8π2ν0D log(D2κ4τt+) .
(21)

This result is valid in both the perturbative (when the exponentials can be expanded up to the first non-trivial term) and non-perturbative regimes. The DOS consists of two terms corresponding to electrons coming from the left and right reservoirs. The energy scales governing the argument of the logarithm in these two terms are

= max a(1− a)eV log 4πν0D , (22)

respectively (κ = 2πe2ν).

The evolution of the ZBA in the tunneling DOS with decreasing conductance (i.e. from the perturbative to the non-perturbative regime) is illustrated in Fig. 1. The DOS has a two-dip shape with minima reached at the energies where the distribution function exhibits discontinuous jumps (i.e. at the positions of the Fermi edge in the left and right leads, ε = ± eV ). Away from the minima, the DOS is controlled by the energy measured from the corresponding edge. As this energy decreases (we get closer to one of the Fermi edges), the singularity near this edge gets affected by the presence of the other edge. The broadening of the ZBA singularity takes place on a new energy scale,

(V ) = a(1 − a) 4πν0D eV log . (23)

The notation introduced in Eq. (23) stresses the analogy with the equilibrium dephasing rate τφ^{-1}(T) governing the infrared cutoff of interference phenomena such as weak localization [2]. The emergence of τφ^{-1} shows that inelastic scattering processes (responsible for dephasing) lead to smearing of the ZBA singularity.

Qualitatively the results can be explained as follows. Virtual fluctuations of the gauge fields act on a quasiparticle and suppress the tunneling DOS. Their influence is limited by the quasiparticle lifetime. An external bias enhances fluctuations of the electromagnetic field, which in turn dephase the tunneling quasiparticle. This results in a competition between the virtual processes establishing the ZBA and the real fluctuations (noise) cutting it off.

FIG. 1: Tunneling DOS vs. energy for the double-step distribution function (18) with a = 0.5. The dimensionless conductance is, from top to bottom, g = 100, 10 (a) and 1 (b).

Let us stress the crucial difference with the equilibrium situation. At equilibrium, the ZBA singularity in 2D is cut off by temperature (which enters via the distribution function); specifically, the dephasing rate τφ^{-1}(T) does not affect the ZBA in any essential way, since τφ^{-1}(T) ≪ T. It is thus a distinct feature of the strongly non-equilibrium regime that the ZBA is smeared by the inelastic scattering (dephasing) rate.

In conclusion, we have developed an effective theory for virtual fluctuations in disordered metals out of equilibrium. Using this theory, we studied the ZBA in a 2D metallic film biased by external voltage. We have found that out of equilibrium the tunneling DOS has a double-dip structure with minima reached at the "edges" of the particle distribution. The ZBA near any of the "edges" is influenced by the other one. The suppression of the DOS is smoothed out by the real processes of inelastic scattering (dephasing) with characteristic energy scale τφ^{-1}(V). Further applications and extensions of our theory include, in particular, the ZBA in quasi-one-dimensional and strictly one-dimensional (Luttinger liquid) wires [20] out of equilibrium; these results will be reported elsewhere.

We thank D. Bagrets, N. Birge, A. Finkelstein, D. Maslov, A. Shnirman, and A. Shytov for useful discussions. This work was supported by NSF-DMR-0308377 (DG), US-Israel BSF, ISF of the Israel Academy of Sciences, and DFG SPP 1285 (YG), DFG Center for Functional Nanostructures, EC Transnational Access Program at the WIS Braun Submicron Center (ADM), and Einstein Minerva Center.

[*] Also at Petersburg Nuclear Physics Institute, 188300 St. Petersburg, Russia.
[1] B. L. Altshuler and A. G. Aronov, Solid State Commun. 30, 115 (1979); Sov.
Phys. JETP 50, 968 (1979); B. L. Altshuler, A. G. Aronov, and P. A. Lee, Phys. Rev. Lett. 44, 1288 (1980).
[2] B. L. Altshuler and A. G. Aronov, in Electron–Electron Interaction In Disordered Systems, ed. by A.L. Efros and M. Pollak (Elsevier, 1985), p. 1.
[3] A. M. Finkel’stein, Sov. Phys. JETP 57, 97 (1983); ibid 59, 212 (1984); Sov. Sci. Rev. A 14, 1 (1990).
[4] A. Anthore, F. Pierre, H. Pothier, and D. Esteve, Phys. Rev. Lett. 90, 076806 (2003).
[5] D.A. Abanin and L.S. Levitov, Phys. Rev. Lett. 94, 186803 (2005).
[6] J. Paaske, A. Rosch, J. Kroha, and P. Wölfle, Phys. Rev. B 70, 155301 (2004); J. Paaske, A. Rosch, P. Wölfle, N. Mason, C.M. Marcus, and J. Nygard, Nature Physics 2, 460 (2006).
[7] L.S. Levitov and A.V. Shytov, JETP Lett. 66, 214 (1997).
[8] Yu. V. Nazarov, Sov. Phys. JETP 68, 561 (1989).
[9] M.H. Devoret, D. Esteve, H. Grabert, G.-L. Ingold, H. Pothier, and C. Urbina, Phys. Rev. Lett. 64, 1824 (1990).
[10] S.M. Girvin, L.I. Glazman, M. Jonson, D.R. Penn, and M.D. Stiles, Phys. Rev. Lett. 64, 3183 (1990).
[11] S. Pilgram, Phys. Rev. B 69, 115315 (2004); S. Pilgram, K.E. Nagaev, and M. Büttiker, Phys. Rev. B 70, 045304 (2004); A.N. Jordan, E.V. Sukhorukov, and S. Pilgram, J. Math. Phys. 45, 4386 (2004).
[12] T. Bodineau and B. Derrida, Phys. Rev. Lett. 92, 180601 (2004).
[13] D.B. Gutman, Y. Gefen, and A.D. Mirlin, Phys. Rev. B 71, 085118 (2005).
[14] D.A. Bagrets, Phys. Rev. Lett. 93, 236803 (2004).
[15] A. Kamenev and A. Andreev, Phys. Rev. B 60, 2218 (1999).
[16] A. Ya. Shulman and Sh. M. Kogan, Sov. Phys. JETP 29, 467 (1969); S.V. Gantsevich, V. L. Gurevich, and R. Katilius, Sov. Phys. JETP 30, 276 (1970).
[17] K. Nagaev, Phys. Rev. B 66, 075334 (2002).
[18] P. Kopietz, Phys. Rev. Lett. 81, 2120 (1998).
[19] A similar subtraction was implemented in L.S. Levitov, A.V. Shytov, and B.I. Halperin, Phys. Rev. B 64, 075322 (2001).
[20] The equilibrium ZBA in the 1D geometry, including a crossover between the diffusive and ballistic regimes, was recently studied in E.G. Mishchenko, A.V. Andreev, and L.I. Glazman, Phys. Rev. Lett. 87, 246801 (2001); C. Mora, R. Egger, and A. Altland, Phys. Rev. B 75, 035310 (2007).

ABSTRACT The non-equilibrium zero bias anomaly (ZBA) in the tunneling density of states of a diffusive metallic film is studied. An effective action describing virtual fluctuations out of equilibrium is derived. The singular behavior of the equilibrium ZBA is smoothed out by real processes of inelastic scattering.

<|endoftext|><|startoftext|> Introduction

The physics of Kaluza-Klein black holes, i.e. black hole spacetimes asymptotic at infinity to $M \times S^1$, has proved to be a surprisingly rich subject, including such phenomena as the Gregory-Laflamme instability, non-uniform static black strings and the black hole/black string phase transition (see e.g. the reviews [1, 2]). Research to date has focused primarily on the static case. However, it is also of interest to explore the properties of stationary solutions. Accordingly, in this paper we will study the thermodynamics of stationary Kaluza-Klein black holes$^1$.

Static Kaluza-Klein black holes are characterized at infinity by the mass $\mathcal{M}$, tension $\mathcal{T}$ and the length $\mathcal{L}$ of the compact direction. The physical meaning of the tension follows from its role in the first law for static $S^1$ Kaluza-Klein black holes [6][7][8]

$$d\mathcal{M} = \frac{\kappa}{8\pi G}\, dA + \mathcal{T}\, d\mathcal{L}. \tag{1}$$

We see that the tension determines the variation of the mass with varying length of the compact direction, under the constraint that the horizon area is held fixed. Within the thermodynamic analogy, it appears to be an intensive parameter of the system, like temperature or pressure.

Stationary Kaluza-Klein black holes can carry linear momentum in the compact direction, as well as angular momentum.
In this paper, we will be interested in this linear momentum, which we denote by $\mathcal{P}$, and will assume that the angular momentum vanishes. The simplest solutions with $\mathcal{P} \neq 0$ are boosted black strings. These are obtained by starting from the infinite uniform black string, boosting in the $z$ direction and then identifying the new $z$ coordinate with period $\mathcal{L}$. The boosted black string is then locally, but not globally, the same as the static uniform black string. Further stationary, but not $z$-translationally invariant, solutions may be obtained by giving localized black holes or non-uniform black strings velocity in the compact direction.

In subsequent sections, we present the following results. We use the Hamiltonian methods of [9][10][8] to establish the first law for stationary, non-rotating Kaluza-Klein black holes. We also derive two Smarr formulas for these spacetimes. These are exact relations between the geometric quantities $\mathcal{M}$, $\kappa A$, $v_H \mathcal{P}$ and $\mathcal{T}\mathcal{L}$, where the quantity $v_H$ is defined below. The first of these formulas holds for the entire class of spacetimes under consideration. The second Smarr formula holds under the additional assumption of exact translation invariance in the compact direction. A linear combination of these two formulas gives the relation between mass and tension for the boosted black string. We derive each of these Smarr formulas in two ways, first using scaling arguments (as in e.g. reference [11]) and second using Komar integral relations (as in reference [12]). Finally, we present a Gibbs-Duhem formula that relates variations in the tension to variations in the other intensive parameters. This result generalizes the ‘tension first law’ of reference [13].

$^1$ Aspects of stationary Kaluza-Klein black holes have been studied in [3][4]. The thermodynamics of asymptotically AdS, boosted domain walls has been investigated in reference [5].
Our result for the first law resolves a small puzzle related to the boosted black string, which formed part of the motivation for this work. It was found in reference [3] that the tension of the boosted black string becomes negative for values of the boost parameter in excess of a certain critical value, which depends only on the spacetime dimension. If the physical interpretation of the tension based on equation (1) were to continue to hold in the stationary case, then the energy of the system would decrease with increasing $\mathcal{L}$, which seems counter-intuitive. This puzzle is resolved by showing that the coefficient of the $d\mathcal{L}$ term in the first law for black holes is an effective tension $\hat{\mathcal{T}}$. The effective tension $\hat{\mathcal{T}}$ is equal to the ADM tension in the static case, but includes a contribution from the momentum in the stationary case. For the boosted black string $\hat{\mathcal{T}}$ is always positive, and is in fact given by the tension of the unboosted black string with the same horizon radius.

2 Stationary, non-rotating Kaluza-Klein black holes

We consider stationary $D$-dimensional vacuum black hole spacetimes that are asymptotic to $M^{D-1} \times S^1$, and assume that the black hole horizon is a bifurcate Killing horizon. In accordance with our focus on linear momentum around the $S^1$, we take the ADM angular momentum to vanish. We denote the horizon generating Killing field by $l^a$, and assume that at infinity it has the form

$$l^a = T^a + v_H Z^a \tag{2}$$

where $T^a = (\partial/\partial t)^a$ and $Z^a = (\partial/\partial z)^a$, with $z$ being the coordinate around the $S^1$. The surface gravity $\kappa$ of the black hole horizon is defined, as usual, via the relation on the horizon

$$\nabla_a (l^b l_b) = -2\kappa\, l_a. \tag{3}$$

The form (2) of the horizon generating Killing field at infinity resembles the decomposition of the horizon generating Killing field for a rotating, asymptotically flat black hole. In that case, i.e. upon replacing $v_H Z^a$ with $\Omega_H \phi^a$, with $\phi$ an azimuthal coordinate, one can show that $T^a$ and $\phi^a$ are themselves Killing vectors [14, 15, 16].
The quantity $\Omega_H$ can then be interpreted as the angular velocity of the horizon, and further shown to be constant on the horizon [14].

Returning to the case of Kaluza-Klein black holes, the situation is quite different. Already in the static case, solutions exist which are non-uniform in the $z$ direction. In the stationary case, then, it will not generally be the case that $T^a$ and $Z^a$ are Killing vectors. For localized black holes or non-uniform black strings with velocity around the $S^1$, only the linear combination $l^a$ is a Killing vector.

2.1 Two commuting Killing fields

It is nonetheless useful to separately consider the case in which $T^a = (\partial/\partial t)^a$ and $Z^a = (\partial/\partial z)^a$ are two commuting Killing fields and the relation (2) holds throughout the spacetime. The boosted black string falls into this class of spacetimes. If both $Z^a$ and $T^a$ are Killing fields, then the quantity $v_H$ in equation (2) may be considered to be the velocity of the black hole horizon. This identification follows in a similar way to that of $\Omega_H$ as the angular velocity in the rotating case (see e.g. the article by Carter in [17]). It follows from equation (3), together with our assumption that $T^a$ and $Z^a$ are commuting Killing vectors, that in addition to $l^a l_a = 0$ on the horizon, one also has there the orthogonality relations

$$l^a T_a = 0, \qquad l^a Z_a = 0. \tag{4}$$

Given these, one can then show that the metric components on the horizon satisfy the two relations

$$(T^a Z_a)^2 = (T^a T_a)(Z^b Z_b), \qquad v_H = -\frac{T^a Z_a}{Z^b Z_b}. \tag{5}$$

The second of these leads to the interpretation of $v_H$ as the velocity of the horizon in the following manner. For rotating black holes, one considers ‘zero angular momentum observers’ or ZAMOs. The angular velocity $\Omega_H$ of the horizon is the limit of a ZAMO’s angular velocity as it approaches the horizon radius. For a boosted black string, we may analogously consider observers with zero linear momentum along the string, which we might be justified in calling ZELMOs.
Let $p^a = m\, dx^a/d\tau$ be the momentum of a particle following a geodesic. Its energy $E = -T^a p_a$ and the $z$-component of its momentum $P = Z^a p_a$ are both constants of motion. The condition $P = 0$ of vanishing linear momentum is then $dz/dt = -g_{tz}/g_{zz}$, and we see that on the horizon the coordinate velocity of a ZELMO is equal to $v_H$.

3 ADM mass, tension and momentum

We review the formulas for the ADM mass, tension and momentum. Let us write the spacetime metric near infinity as $g_{ab} = \eta_{ab} + \gamma_{ab}$, where $\eta_{ab}$ is the $D$-dimensional Minkowski metric. The components of $\gamma_{ab}$ are assumed to fall off sufficiently rapidly that the integral expressions for the mass, tension and momentum are well-defined. In the asymptotic region, write the spacetime coordinates as $x^a = (t, z, x^i)$, where $i = 1, \dots, D-2$ and the coordinate $z$ running around the $S^1$ is identified with period $\mathcal{L}$. Let $\Sigma$ be a spatial slice and $\partial\Sigma_\infty$ its boundary at spatial infinity. The ADM mass and momentum in the $z$ direction are then given in asymptotically Cartesian coordinates by the integrals

$$\mathcal{M} = \frac{1}{16\pi G}\int_{\partial\Sigma_\infty} dz\, ds_i \left(-\partial^i \gamma^j{}_j - \partial^i \gamma^z{}_z + \partial_j \gamma^{ij}\right), \tag{6}$$

$$\mathcal{P} = \frac{1}{16\pi G}\int_{\partial\Sigma_\infty} dz\, ds_i\, \partial^i \gamma_{tz}, \tag{7}$$

where indices are raised and lowered with the asymptotic metric $\eta_{ab}$ and the area element $ds_i$ is that of a sphere $S^{D-3}$ at infinity in a slice of constant $t$ and $z$. The ADM tension is similarly given by the integral [13][6][18]

$$\mathcal{T} = -\frac{1}{16\pi G}\int_{\partial\Sigma_\infty/S^1} ds_i \left(-\partial^i \gamma^j{}_j - \partial^i \gamma^t{}_t + \partial_j \gamma^{ij}\right). \tag{8}$$

Note that in contrast with the ADM mass and momentum, the definition of the tension does not include an integral in the $z$-direction. The ADM mass is an integral over the boundary of a slice of constant $t$, which includes the direction around the $S^1$. The tension, on the other hand, is defined [13][6][18] by an integral over the boundary of a slice of constant $z$. This includes, in principle, an integration over time. However, if one expands the integrand around spatial infinity, one finds that terms that make non-zero contributions to the integral are always time independent.
Time dependent terms fall off too rapidly to contribute. Hence, one can omit the integration over the time direction and work with the quantity $\mathcal{T}$ defined above, which is strictly speaking a ‘tension per unit time’.

We can evaluate these formulas for $\mathcal{M}$, $\mathcal{P}$ and $\mathcal{T}$ in terms of the asymptotic parameters of our spacetimes. The spacetimes we consider have topology $R^{D-1} \times S^1$, the coordinate $z$ in the compact direction being identified with period $\mathcal{L}$. We can write the metric explicitly as

$$ds^2 = g_{tt}\, dt^2 + 2 g_{tz}\, dt\, dz + g_{zz}\, dz^2 + 2\left(g_{ti}\, dt\, dx^i + g_{zi}\, dz\, dx^i\right) + g_{ij}\, dx^i dx^j \tag{9}$$

where $x^i$ with $i = 1, \dots, D-2$ are the non-compact spatial coordinates. We assume the following falloff conditions at spatial infinity

$$g_{tt} \simeq -1 + \frac{c_t}{r^{D-4}}, \qquad g_{zz} \simeq 1 + \frac{c_z}{r^{D-4}}, \qquad g_{tz} \simeq \frac{c_{tz}}{r^{D-4}}, \tag{10}$$

and further that the coefficients $g_{ti}$ and $g_{zi}$ fall off sufficiently fast that they do not contribute to any ADM integrals at infinity. The mass, tension [7] and momentum can then be shown, using the field equations, to be given in terms of the asymptotic parameters $c_t$, $c_z$ and $c_{tz}$ by

$$\mathcal{M} = \frac{\Omega_{D-3}\mathcal{L}}{16\pi G}\left((D-3)c_t - c_z\right), \qquad \mathcal{T} = \frac{\Omega_{D-3}}{16\pi G}\left(c_t - (D-3)c_z\right), \tag{11}$$

$$\mathcal{P} = -\frac{(D-4)\,\Omega_{D-3}\mathcal{L}}{16\pi G}\, c_{tz}. \tag{12}$$

4 The boosted black string

The boosted black string serves as a simple analytic vacuum spacetime in which to check the results we present below for the first law, Smarr and Gibbs-Duhem relations. The boosted black string metric may be obtained starting from the uniform black string, performing a boost transformation with parameter $\beta$ and identifying the new, boosted $z$ coordinate with period $\mathcal{L}$. This gives

$$ds^2 = -\left(1 - \frac{c}{r^{D-4}}\cosh^2\beta\right) dt^2 + \left(1 + \frac{c}{r^{D-4}}\sinh^2\beta\right) dz^2 + \frac{2c}{r^{D-4}}\sinh\beta\cosh\beta\, dz\, dt + \left(1 - \frac{c}{r^{D-4}}\right)^{-1} dr^2 + r^2 d\Omega^2_{D-3}. \tag{13}$$

The horizon, which has topology $S^{D-3} \times S^1$, is located at $r_H = c^{1/(D-4)}$.
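The construction just described can be checked numerically. The sketch below is illustrative only: it boosts the $(t, z)$ block of the static uniform string, compares the result with the coefficients read off from equation (13), and then evaluates the ADM charges of equations (11) and (12). The function names are hypothetical, and the overall $1/(16\pi G)$ normalisation is an assumption of this sketch, chosen to agree with the factor $16\pi G$ that appears in the first-law derivation; $G = 1$ is used by default.

```python
import math


def unit_sphere_area(n):
    """Area of the unit n-sphere S^n (for example 4*pi for n = 2)."""
    return 2.0 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2)


def boosted_tz_block(r, c, D, beta):
    """Return (g_tt, g_tz, g_zz) obtained by boosting the static string.

    The static block is diag(-f, 1) with f = 1 - c / r**(D - 4). The boost is
    t_old = t*cosh(beta) + z*sinh(beta), z_old = t*sinh(beta) + z*cosh(beta).
    """
    f = 1.0 - c / r ** (D - 4)
    ch, sh = math.cosh(beta), math.sinh(beta)
    g_tt = -f * ch * ch + sh * sh
    g_tz = (1.0 - f) * sh * ch
    g_zz = -f * sh * sh + ch * ch
    return g_tt, g_tz, g_zz


def asymptotic_coefficients(c, beta):
    """Coefficients (c_t, c_z, c_tz) of Eq. (10), read off from Eq. (13)."""
    ch, sh = math.cosh(beta), math.sinh(beta)
    return c * ch * ch, c * sh * sh, c * sh * ch


def adm_charges(c_t, c_z, c_tz, L, D, G=1.0):
    """Mass, tension and momentum from Eqs. (11) and (12).

    Assumption of this sketch: common prefactor Omega_{D-3} / (16 pi G).
    """
    pref = unit_sphere_area(D - 3) / (16.0 * math.pi * G)
    mass = pref * L * ((D - 3) * c_t - c_z)
    tension = pref * (c_t - (D - 3) * c_z)
    momentum = -pref * L * (D - 4) * c_tz
    return mass, tension, momentum


def critical_boost(D):
    """Boost at which the tension changes sign: sinh^2(beta) = 1/(D - 4)."""
    return math.asinh(1.0 / math.sqrt(D - 4))


if __name__ == "__main__":
    D, c, L = 5, 1.0, 2.0
    for beta in (0.0, 0.5, critical_boost(D), 1.5):
        M, T, P = adm_charges(*asymptotic_coefficients(c, beta), L, D)
        print(f"beta={beta:.3f}  M={M:.4f}  T={T:.4f}  P={P:.4f}")
```

Evaluating `boosted_tz_block` at $r = r_H$ and forming $-g_{tz}/g_{zz}$ gives $-\tanh\beta$, the ZELMO coordinate velocity on the horizon discussed in section 2.1.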
From the asymptotic form of the metric, one finds using the expressions (11) and (12) that the ADM mass, tension and momentum are given as in [3] by

$$\mathcal{M} = \frac{\Omega_{D-3}\mathcal{L}}{16\pi G}\, r_H^{D-4}\left((D-4)\cosh^2\beta + 1\right) \tag{14}$$

$$\mathcal{T} = \frac{\Omega_{D-3}}{16\pi G}\, r_H^{D-4}\left(1 - (D-4)\sinh^2\beta\right) \tag{15}$$

$$\mathcal{P} = -\frac{\Omega_{D-3}\mathcal{L}}{16\pi G}\, r_H^{D-4}\,(D-4)\sinh\beta\cosh\beta \tag{16}$$

Note that, as mentioned in the introduction, the tension becomes negative for $\sinh^2\beta > 1/(D-4)$. We can further compute, as in reference [3], that the horizon area, surface gravity, and horizon velocity of the boosted black string are given by

$$A = \Omega_{D-3}\mathcal{L}\, r_H^{D-3}\cosh\beta, \qquad \kappa = \frac{D-4}{2 r_H \cosh\beta}, \qquad v_H = -\frac{\sinh\beta}{\cosh\beta}. \tag{17}$$

5 Gauss’ Laws for Perturbations

Following the work of [9], we use the Hamiltonian formalism of general relativity to derive the first law for stationary, non-rotating Kaluza-Klein black holes. Another of our goals is to derive a ‘first law’ for variations in the tension as in reference [13], for this class of spacetimes. This requires a slight generalization of the Hamiltonian formalism to accommodate evolution of data on timelike surfaces in a spacelike direction. Although, as we discuss below in section (8), we have not yet succeeded in providing a Hamiltonian derivation of the ‘tension first law’ in the stationary case, our presentation of the Hamiltonian formalism will be general enough to provide the necessary tools.

The essence of the method is as follows. In vacuum gravity, suppose one has a black hole solution with a Killing field. Now consider solutions that are perturbatively close to this background solution, but are not required to have the original Killing symmetry. The linearized Einstein constraint equations on a hypersurface can be expressed in the form of a Gauss’ law (see [10]), relating a boundary integral at infinity to a boundary integral at the horizon. The physical meaning of this Gauss’ law relation depends on the choice of Killing field, as well as on the choice of hypersurface.
Taking the generator $l^a$ of a Killing horizon, together with an appropriate choice of a spacelike hypersurface, yields the usual first law for variation of the mass [9]. In the case of solutions that are $z$ translation invariant, choosing the spatial translation Killing vector $Z^a$, again with an appropriate choice of a timelike hypersurface, gives a ‘first law’ for variations in the tension [13].

The formalism then proceeds in the following way. Assume we have a foliation of a spacetime by a family of hypersurfaces of constant coordinate $w$. We denote these hypersurfaces, both collectively and individually, by $V$ and the unit normal to the hypersurfaces by $w^a$. With the application to tension in mind, we consider both timelike and spacelike normals by setting $w_a w^a = \epsilon$ with $\epsilon = \pm 1$. This slight generalization introduces factors of $\epsilon$ into a number of otherwise standard formulas. The spacetime metric can be written as

$$g_{ab} = \epsilon\, w_a w_b + s_{ab} \tag{18}$$

where $s_{ab}$, satisfying $s_a{}^b w_b = 0$, is the metric on the hypersurfaces $V$. As usual, the dynamical variables in the Hamiltonian formalism are the metric $s_{ab}$ and its canonically conjugate momentum $\pi^{ab} = \epsilon\sqrt{|s|}\,(K s^{ab} - K^{ab})$. Here $K_{ab} = s_a{}^c \nabla_c w_b$ is the extrinsic curvature of a hypersurface $V$. We consider Hamiltonian evolution along the vector field $W^a = (\partial/\partial w)^a$, which can be decomposed into its components normal and tangential to $V$, according to $W^a = N w^a + N^a$ where $N^a w_a = 0$. As usual, we refer to $N$ and $N^a$ respectively as the lapse function and the shift vector. The gravitational Hamiltonian density which evolves the system along $W^a$ is then given by $\mathcal{H} = N H + N^a H_a$ with

$$H = -R^{(D-1)} + \frac{\epsilon}{|s|}\left(\frac{\pi^2}{D-2} - \pi^{ab}\pi_{ab}\right) \tag{19}$$

$$H_b = -2 D_a\left(|s|^{-1/2}\,\pi^a{}_b\right) \tag{20}$$

where $R^{(D-1)}$ is the scalar curvature for the metric $s_{ab}$ and $D_a$ is the derivative operator on the hypersurface $V$.
One further finds that the quantities $H$ and $H_a$ are simply related to the normal components of the Einstein tensor,

$$H = 2\epsilon\, G_{ab} w^a w^b, \qquad H_b = 2\epsilon\, G_{ac} w^a s^c{}_b. \tag{21}$$

These components of the field equations contain only first derivatives with respect to the coordinate $w$, and hence represent constraints on the dynamical fields, $s_{ab}$ and $\pi^{ab}$, on $V$. This property is independent of whether the normal direction is timelike, as in the usual ADM formalism, or spacelike. In vacuum, the equations $H = 0$ and $H_b = 0$ are enforced in the Hamiltonian formalism as the equations of motion of the nondynamical lapse and shift variables. These are referred to as the Hamiltonian and momentum constraints, a terminology we continue to use in the case that the normal $w^a$ is spacelike.

Let us now assume that the spacetime metric $\bar g_{ab}$ is a solution to the vacuum Einstein equations$^2$ with a Killing vector $\xi^a$. We decompose $\xi^a$ into components normal and tangent to the hypersurfaces $V$ introduced above, according to $\xi^a = F w^a + \beta^a$. Now, let us further assume that the metric $g_{ab} = \bar g_{ab} + \delta g_{ab}$ is the linear approximation to another solution to the vacuum Einstein equations. Denote the Hamiltonian data for the background metric by $\bar s_{ab}$, $\bar\pi^{ab}$, the corresponding perturbations to the data by $h_{ab} = \delta s_{ab}$ and $p^{ab} = \delta\pi^{ab}$, and the linearized Hamiltonian and momentum constraints by $\delta H$ and $\delta H_a$. As shown in [10, 9, 13], the following statement then holds as a consequence of Killing’s equation in the background metric,

$$F\,\delta H + \beta^a\,\delta H_a = -\bar D_a B^a \tag{22}$$

where $\bar D_a$ is the background covariant derivative operator on the hypersurface and the vector $B^a$ is given by

$$B^a = F\left(\bar D^a h - \bar D_b h^{ab}\right) - h\,\bar D^a F + h^{ab}\bar D_b F + \frac{1}{\sqrt{|\bar s|}}\,\beta^b\left(\bar\pi^{cd} h_{cd}\,\bar s^a{}_b - 2\bar\pi^{ac} h_{bc} - 2 p^a{}_b\right). \tag{23}$$

Here indices are raised and lowered with the background metric $\bar s_{ab}$. Since the metric $g_{ab}$ solves the vacuum Einstein equations by assumption, we know that $\delta H = \delta H_a = 0$. Therefore, we have the Gauss’ law type statement $\bar D_a B^a = 0$.
Note that the detailed form of this relation for the perturbations $h_{ab}$ and $p^{ab}$ depends on the Killing vector $\xi^a$ and the normal $w^a$ to the hypersurface, as well as on the background spacetime metric. Making different choices for the Killing vector and normal can lead to different relations of this form.

$^2$ In this paper we will focus on the case when the background spacetime is vacuum. It is straightforward to add stress-energy which is described by a Hamiltonian [13]. If the matter Hamiltonian contains constraints, such as for Maxwell theory, then additional charges appear in the first law. This was worked out for Einstein-Yang-Mills in [9]. The general case when the background spacetime has stress-energy, such as a cosmology, was studied earlier in [10]. In this situation, the criterion for a Gauss’ law on perturbations is that the background have an Integral Constraint Vector.

We can now integrate the relation $\bar D_a B^a = 0$ over the hypersurface $V$ and use Stokes theorem to obtain

$$\int_{\partial V} da_c\, B^c = 0, \tag{24}$$

where for black hole spacetimes the boundary $\partial V$ of the hypersurface $V$ typically has two components, one on the horizon and one at infinity$^3$.

6 The first law for stationary, Kaluza-Klein black holes

Following references [9, 8], we now use the Hamiltonian formalism presented in the last section to derive the first law for stationary, non-rotating Kaluza-Klein black holes. The first law relates the difference $\delta A$ in the horizon area between nearby solutions to the variations $\delta\mathcal{M}$, $\delta\mathcal{P}$ and $\delta\mathcal{L}$ in the mass, momentum and length of the compact direction. As in reference [8], we carry out the calculation first holding the length at infinity, $\mathcal{L}$, fixed, and then use this result in order to do the calculation with $\delta\mathcal{L} \neq 0$.

We assume as in section (2) that we have a stationary, non-rotating Kaluza-Klein black hole solution with metric $\bar g_{ab}$ and horizon generating Killing field $l^a$, which is given at infinity by $l^a = T^a + v_H Z^a$.
We further assume as in section (5) that the metric $g_{ab} = \bar g_{ab} + \delta g_{ab}$ is a linear approximation to a solution of the field equations. At this stage, we assume that $\delta g_{ab}$ is such that $\delta L = 0$. Further on, we will relax this assumption. The derivation of the mass first law is then quite similar to that for rotating black holes [9]. Consider a spacelike hypersurface $V$, which intersects the horizon at the bifurcation surface and has a unit normal approaching the vector $T^a$ at infinity. Choose the Killing vector in the Gauss' law construction to be the horizon generator $l^a$. Let $\partial V_\infty$ and $\partial V_H$ denote the boundaries of the hypersurface $V$ at infinity and at the horizon bifurcation surface. Equation (24) then implies that

$$I_H + I_\infty = 0, \qquad (25)$$

where

$$I_H = \int_{\partial V_H} da_c\, B^c, \qquad I_\infty = \int_{\partial V_\infty} da_c\, B^c. \qquad (26)$$

3 Kaluza-Klein bubble spacetimes, which we do not consider here, provide an interesting contrast. There is no interior horizon, but the rotational Killing field has an axis. Hence to use Stokes' theorem, one must exclude the axis, which introduces an inner boundary.

Let us first consider the calculation of $I_H$. On the horizon bifurcation surface, the quantities $F$ and $\beta^a$ vanish, and the boundary integral on the horizon reduces to

$$I_H = -\int_{\partial V_H} da\, \hat\rho_c\,(-h\,\bar D^c F + h^{cb}\bar D_b F), \qquad (27)$$

where $\hat\rho^c$ is the unit outward pointing normal to the bifurcation surface within $V$. One can show that the surface gravity is given by $\kappa = \hat\rho^c\partial_c F$, and it then follows as in reference [9] that

$$I_H = 2\kappa\,\delta A. \qquad (28)$$

Now consider the boundary term at infinity. Many of the terms in (23) fall off too rapidly to make non-zero contributions. In particular, it is straightforward to check that the $\bar D_a F$ terms, as well as those including products of $\bar\pi^{ab}$ with the metric perturbation, fall off too rapidly as $r \to \infty$ to contribute. Furthermore, it is sufficient to take $F \simeq 1$ and $\beta^z = v_H$ in this limit.
We then arrive at the expression

$$I_\infty = \int dz\, ds_i\,(\partial^i h - \partial_j h^{ij} - 2 v_H\, p^{iz}). \qquad (29)$$

At this point, we need to note that the formulas (6) and (8) for the ADM mass, momentum and tension are written in terms of the variable $\gamma_{ab}$ defined by $g_{ab} = \eta_{ab} + \gamma_{ab}$. In order to interpret the boundary integral (29) in terms of variations in $M$, $P$ and $T$, we need to relate the perturbations $h_{ab}$ and $p^{ab}$ in the Hamiltonian formalism to a covariant perturbation $\delta\gamma_{ab}$. It is straightforward to show that to the required order of accuracy $h = \delta^{kl}\delta\gamma_{kl} + \delta\gamma_{zz}$ and $h^{ij} \simeq \delta^{ik}\delta^{jl}\delta\gamma_{kl}$, while a further calculation reveals that $p^{iz} \simeq -(1/2)\,\partial^i \delta h_{zt}$. We then find that

$$\int dz\, ds_i\,(\partial^i h - \partial_j h^{ij} + v_H\,\partial^i h_{zt}) = -16\pi G\,(\delta M - v_H\,\delta P). \qquad (31)$$

Inserting these results into equation (25) then yields the mass first law for boosted black strings (with the length $L$ at infinity held fixed)$^4$

$$\delta M = \frac{\kappa}{8\pi G}\,\delta A + v_H\,\delta P. \qquad (33)$$

4 We can also include perturbative stress energy in this relation, in which case the mass first law becomes $\delta M = \frac{\kappa}{8\pi G}\,\delta A + v_H\,\delta P + (-\delta T^a{}_b)$. (32)

We see that the momentum appears as an extensive parameter in the first law, while $v_H$, which for the boosted black string is the horizon velocity, appears as an intensive parameter. This parallels the way angular momentum enters the first law for rotating black holes. Equation (32) is easily verified for the case of the boosted black string using the formulas of section (4).

We now generalize the first law (33) to include perturbations with $\delta L \neq 0$. Our analysis of the boundary term is based on that in [8] for the static case. The boundary integral at the horizon in this case remains unchanged and is still given by equation (28). Additional terms, however, occur in the boundary term at infinity. Given the results above, we can write the boundary term at infinity as

$$I_\infty = 16\pi G\left(-\delta M|_{\delta L=0} + v_H\,\delta P|_{\delta L=0} + \lambda\,\delta L\right), \qquad (34)$$

where $\lambda$ remains to be determined. On the other hand, we know the $L$ dependence of $M$ and $P$ explicitly from the expressions (11) and (12). This allows us to write

$$\delta M = \delta M|_{\delta L=0} + \frac{M}{L}\,\delta L, \qquad (35)$$

$$\delta P = \delta P|_{\delta L=0} + \frac{P}{L}\,\delta L. \qquad (36)$$

Combining these with equations (34), (25) and (28) then gives

$$I_\infty = 16\pi G\left[-\delta M + v_H\,\delta P + \left(\lambda + \frac{M}{L} - \frac{v_H P}{L}\right)\delta L\right]. \qquad (37)$$

We can now further appeal to the results of [8] for the case $P = 0$ and write $\lambda = \lambda|_{P=0} + \lambda'$. We know from [8] that $\lambda|_{P=0} + M/L = T$. Putting this together allows us to rewrite (37) as

$$I_\infty = 16\pi G\left[-\delta M + v_H\,\delta P + \left(\lambda' + T - \frac{v_H P}{L}\right)\delta L\right]. \qquad (38)$$

We still need to calculate the quantity $\hat I_\infty = 16\pi G\,\lambda'\,\delta L$, which includes only the terms in $I_\infty$ that are proportional to both $P$ and $\delta L$. It is noted in [8] that in order for the perturbative Gauss' law (24) to apply with $\delta L \neq 0$, one needs to make a coordinate transformation so that $\delta L$ appears in the metric perturbation, rather than in a change in the range of coordinates. Following this procedure yields the metric perturbations $h_{zz} \simeq 2(\ldots)$, $h_{zt} \simeq \ldots$ (39).

There are two terms in equation (23) that potentially contribute to $\hat I_\infty$ and we accordingly write $\hat I_\infty = \hat I^{(1)} + \hat I^{(2)}$. The first of these terms is given by

$$\hat I^{(1)} = \int dz\, da_c\,(-2\beta^b\,\bar\pi^{ac}h_{ab}) = \int dz\, da_i\,(-2 v_H\,\bar\pi^{iz}h_{zz}) \qquad (41)$$

$$= \frac{2\,\delta L}{L}\int dz\, da_i\, v_H\,\partial^i \bar g_{tz} = 16\pi G\,\frac{2\,\delta L}{L}\, v_H P.$$

The second term, which requires some care in evaluating, is given by

$$\hat I^{(2)} = \int dz\, da_c\,(-2\beta^b\,\delta\pi^c{}_b) = \int dz\, da_i\,(-2 v_H\, p^{iz}) \qquad (45)$$

$$= \frac{\delta L}{L}\int dz\, da_i\, v_H\,\partial^i \bar g_{tz}\,(1 - 2 + 1) \qquad (46)$$

$$= 0, \qquad (47)$$

where the factor $(1 - 2 + 1)$ in line (46) comes about in the following way. We have $p^{iz} \simeq \delta(\sqrt{s}\, s^{zz} K^i{}_z)$ with $K_{iz} \simeq -(1/2)\,\partial_i g_{tz}$. The first 1 comes from the variation of the volume element $\sqrt{s}$, the $-2$ comes from the variation of the inverse metric component $s^{zz}$ following from equation (39), and the final 1 comes from the variation of $g_{tz}$, also as in equation (39). Putting these results together gives $\lambda' = 2 v_H P/L$ and hence

$$I_\infty = 16\pi G\left[-\delta M + v_H\,\delta P + \left(T + \frac{v_H P}{L}\right)\delta L\right]. \qquad (48)$$

Finally, combining this with $I_H$ gives the first law

$$\delta M = \frac{\kappa}{8\pi G}\,\delta A + v_H\,\delta P + \left(T + \frac{v_H P}{L}\right)\delta L. \qquad (49)$$

From the $\delta L$ term, we see that the coefficient of $\delta L$ is an effective tension given by $\hat T = T + v_H P/L$. As mentioned in the introduction, the tension of the boosted black string becomes negative for sufficiently large boost parameter.
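The final combination can be written out explicitly. As a sketch, taking $I_H = 2\kappa\,\delta A$ from (28) and the boundary term at infinity in the form (48), and assuming the overall $1/(8\pi G)$ normalisation that these two expressions imply, the constraint (25) gives:

```latex
\begin{align*}
0 &= I_H + I_\infty
   = 2\kappa\,\delta A
     + 16\pi G\left[-\delta M + v_H\,\delta P
     + \left(T + \frac{v_H P}{L}\right)\delta L\right] \\[4pt]
\Longrightarrow\quad
\delta M &= \frac{\kappa}{8\pi G}\,\delta A + v_H\,\delta P + \hat T\,\delta L,
\qquad
\hat T \equiv T + \frac{v_H P}{L}.
\end{align*}
```

Written this way, it is visible that the shift from $T$ to $\hat T$ comes entirely from the $\lambda' = 2 v_H P/L$ contribution, half of which is cancelled by the $-v_H P/L$ term produced by the explicit $L$ dependence of $P$.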
It is straightforward to check that the first law (49) is satisfied for variations within the family of boosted black string solutions in section (4), and also that the effective tension is given by

$$\hat T = \frac{\Omega_{D-3}}{16\pi G}\, r_H^{D-4}, \qquad (50)$$

which is equal to the tension of the unboosted black string having the same horizon radius.

7 Smarr formulas, scaling relations and Komar integrals

Smarr formulas are relations between the thermodynamic parameters that hold for black hole solutions that have exact symmetries. In this section we will derive the Smarr formula for stationary, but non-rotating Kaluza-Klein black holes. We will also derive a second Smarr-type formula that holds in the case of exact translation invariance in the z-direction, e.g. for the boosted black string. We present two approaches to deriving these formulas. The first is based on general scaling relations, which are familiar from classical thermodynamics, and the second is based on Komar integral relations.

Given the statement of the first law (49) for stationary, non-rotating Kaluza-Klein black holes, the Smarr formula can be derived by making use of a simple scaling argument (see e.g. [11]). Given any stationary vacuum solution to Einstein's equations, we may obtain a one parameter family of solutions by rescaling all the dimensionful parameters in the given solution in an appropriate way. If a parameter $\mu$ has dimensions $(\text{length})^n$, we replace it with $\lambda^n\mu$. The parameters $c_t$, $c_z$ and $c_{tz}$ that specify the asymptotics of the stationary solution all scale as $(\text{length})^{D-4}$. If we rescale these accordingly, and also replace $L$ with $\lambda L$, then the mass and momentum rescale as

$$M = \lambda^{D-3}\bar M, \qquad P = \lambda^{D-3}\bar P, \qquad (51)$$

where $\bar M$ and $\bar P$ are the mass and momentum of the original solution. Similarly, the area of the event horizon of the family of spacetimes will be $A = \lambda^{D-2}\bar A$. Now consider how these quantities change under a small change in $\lambda$.
We have

$$dM = (D-3)M\,\frac{d\lambda}{\lambda}, \quad dP = (D-3)P\,\frac{d\lambda}{\lambda}, \quad dA = (D-2)A\,\frac{d\lambda}{\lambda}, \quad dL = L\,\frac{d\lambda}{\lambda}. \qquad (52)$$

The first law (49) must hold in particular for this variation in $\lambda$. This implies that

$$(D-3)M = (D-2)\,\frac{1}{8\pi G}\,\kappa A + \hat T L + (D-3)\,v_H P, \qquad (53)$$

which is the Smarr formula for stationary, non-rotating Kaluza-Klein black holes. Note that via the scaling argument, the effective tension $\hat T$ naturally enters the Smarr formula as well as the first law.

We now derive a second Smarr formula that holds only for solutions, such as the boosted black string, that have exact translation invariance in the z-direction$^5$. Note that the mass, momentum and horizon area are all extensive quantities in the compactification length $L$ and that different values of $L$ give another one parameter family of solutions. Within this family we have

$$dM = M\,\frac{dL}{L}, \quad dP = P\,\frac{dL}{L}, \quad dA = A\,\frac{dL}{L} \qquad (54)$$

under a small variation in $L$. For the first law to be satisfied under such variations, we must have

$$M = \frac{1}{8\pi G}\,\kappa A + \hat T L + v_H P. \qquad (55)$$

Because of the simple extensivity of $M$, $P$, $A$ and $L$ itself in the length $L$ of the compact direction, this second Smarr formula takes the form of the usual Euler relation for a thermodynamic system, without any additional dimension dependent prefactors. Note that by taking a linear combination of the two Smarr formulas (53) and (55), the horizon area term may be eliminated, giving

$$M = (D-3)\,\hat T L + v_H P \qquad (56)$$

$$\phantom{M} = (D-3)\,T L + (D-2)\,v_H P. \qquad (57)$$

For $P = 0$ this is the well known relation between the mass and tension for a uniform black string.

5 It appears likely that the boosted black strings can be shown to be the only stationary, non-rotating, z-translationally invariant vacuum solutions with non-singular horizons (see reference [19]).

The Smarr formulas may also be derived by geometrical means using Komar integral relations. This is done in reference [12] for the first Smarr formula in the case $P = 0$.
For a vacuum spacetime with a Killing vector $k^a$, and a hypersurface $\Sigma$ with boundaries $\partial\Sigma_\infty$ at infinity and $\partial\Sigma_H$ at the black hole horizon, the Komar integral relation implies the equality $I_{\partial\Sigma_\infty} = I_{\partial\Sigma_H}$, where

$$I_S = -\frac{1}{16\pi G}\int_S dS_{ab}\,\nabla^a k^b. \qquad (58)$$

The first Smarr formula results from taking $k^a$ to be the horizon generator $l^a$ and $\Sigma$ to be a spacelike hypersurface with normal $dt$ at infinity. The computation of the horizon boundary term in this case is by now quite standard (see [20]). The horizon generator $l^a$ is null on the horizon and consequently normal to the boundary $\partial\Sigma_H$. Let $q^a$ be the second null vector orthogonal to $\partial\Sigma_H$, normalized so that $l^a q_a = -1$. One then has on the boundary $dS_{ab} = 2\,l_{[a}q_{b]}\,dA$, where $dA$ is the surface area element. It then follows that

$$I_{\partial\Sigma_H} = \frac{1}{8\pi G}\,\kappa A, \qquad (59)$$

where we have made use of the definition (3) of the surface gravity. The boundary term at infinity may be straightforwardly computed using the asymptotic form of the metric in (10) and the expressions (11) and (12) for the ADM mass, tension and momentum. One finds

$$I_{\partial\Sigma_\infty} = \ldots\,\Omega_{D-3}\,L\,(D-4)\,c_t\,\ldots \qquad (60)$$

$$\phantom{I_{\partial\Sigma_\infty}} = \frac{1}{D-2}\left((D-3)M - T L\right) - v_H P. \qquad (61)$$

Equating the two boundary integrals correctly reproduces the first Smarr formula (53).

The scaling argument that led to the second Smarr formula assumed translation invariance in the z-direction, i.e. that $Z^a$ as well as the horizon generator $l^a = T^a + v_H Z^a$ is a Killing vector. To give a geometric derivation, we additionally assume, as in section (2.1), that the Killing vectors $T^a$ and $Z^a$ commute. Let us now take the Killing vector in the Komar construction to be $V^a = v_H T^a + Z^a$, which is orthogonal to the horizon generator $l^a$ both at infinity and on the horizon. We take the hypersurface $\Sigma$ to be timelike, with normal equal to $dz$ at infinity and proportional to $V_a$ at the horizon. The normal to the horizon within $\Sigma$ is then proportional to the horizon generator $l^a$, and hence the boundary term at the horizon includes the factor

$$l^a V^b\,\nabla_a V_b = V^a V^b\,\nabla_a l_b = 0. \qquad (62)$$

In the first equality the commutativity of the Killing vectors is used, and the second equality follows from Killing's equation. Hence, the boundary term at the horizon $I_{\partial\Sigma_H}$ vanishes for this choice of Killing field and hypersurface. The boundary term at infinity is again straightforward to compute using the expressions in (10), (11) and (12). We find

$$I_{\partial\Sigma_\infty} = -\ldots\,(D-4)\,(c_z + v_H c_{tz}) \qquad (63)$$

$$\phantom{I_{\partial\Sigma_\infty}} = -\frac{1}{D-2}\left(\frac{M}{L} - (D-3)\,T\right) + \frac{v_H P}{L}. \qquad (64)$$

Equating this with zero gives the relation (57), and hence, together with the first Smarr formula (53), the second Smarr formula (55).

8 Tension first law and Gibbs-Duhem relation

A second kind of variational formula for static Kaluza-Klein black holes was derived in reference [13]. This 'tension first law' states that

$$L\,dT = -\frac{1}{8\pi G}\,A\,d\kappa \qquad (65)$$

and holds for perturbations that take a static, translation invariant solution into a nearby solution that is stationary, but not necessarily translation invariant. It applies, for example, to the perturbation between the marginally stable uniform black string and the static non-uniform black string of reference [21]. In this section, we discuss the thermodynamic context of this formula and conjecture its extension to include $P \neq 0$.

We regard the quantities $M$, $A$, $L$ and $P$ as extensive parameters, while $\kappa$, $T$ and $v_H$ are regarded as intensive parameters. For thermodynamic systems, the first law relates variations in the extensive parameters, as does equation (49). In classical thermodynamics a formula relating the variations of the intensive parameters is known as a Gibbs-Duhem relation. A Gibbs-Duhem relation can be derived from the first law, together with the variation of an Euler formula, such as equation (55). In the present case, variation of the Euler formula gives

$$dM = \frac{1}{8\pi G}\,(\kappa\,dA + A\,d\kappa) + \hat T\,dL + L\,d\hat T + v_H\,dP + P\,dv_H. \qquad (66)$$

Combining this with the first law then gives the Gibbs-Duhem relation

$$0 = \frac{1}{8\pi G}\,A\,d\kappa + L\,d\hat T + P\,dv_H, \qquad (67)$$

which reduces to (65) for $P = 0$.
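The step from the Euler relation to the Gibbs-Duhem relation is a single subtraction. As a sketch, assuming the first law in the form (49) with $\hat T = T + v_H P/L$ and the $1/(8\pi G)$ normalisation of the $\kappa A$ term:

```latex
\begin{align*}
dM &= \frac{1}{8\pi G}\,(\kappa\,dA + A\,d\kappa)
      + \hat T\,dL + L\,d\hat T + v_H\,dP + P\,dv_H
   && \text{(variation of the Euler relation (55))} \\
dM &= \frac{\kappa}{8\pi G}\,dA + \hat T\,dL + v_H\,dP
   && \text{(first law (49))} \\[4pt]
0  &= \frac{1}{8\pi G}\,A\,d\kappa + L\,d\hat T + P\,dv_H
   && \text{(difference of the two lines)}
\end{align*}
```

Setting $P = 0$ gives $\hat T = T$ and removes the $P\,dv_H$ term, leaving $L\,dT = -\frac{1}{8\pi G}\,A\,d\kappa$.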
Note, however, that the Euler formula (55) holds only for z-translationally invariant solutions, and hence the result above holds only for perturbations that respect this symmetry, i.e. within the boosted black string family of solutions. Equation (65) was derived in [13] via the Hamiltonian perturbation methods of section (5), and does not require that the perturbations are invariant under z translations. We would like to extend the derivation of [13] to the stationary non-rotating case, but have not yet accomplished this.

9 Conclusions

We have derived various thermodynamic relations for stationary, non-rotating Kaluza-Klein black holes. As in reference [8], the derivation of the first law required a careful application of Hamiltonian perturbation theory techniques. Perhaps the most interesting aspect of the first law (49) is the appearance of the effective tension $\hat T$, which generally differs from the ADM tension. For the boosted black string, the ADM tension becomes negative for large boost parameter, while the effective tension remains positive. We note that the gravitational contribution to the ADM tension was shown to be positive for static spacetimes in reference [22] using spinorial techniques. It should be interesting to see what these techniques yield in the stationary case, e.g. do they prove positivity of the effective tension.

Our results concerning the Smarr formulas in section (7) are also of interest. In particular, the parallels between the scaling argument and Komar integral relation derivations are intriguing and can most likely be understood in a more general setting. Finally, we would like to be able to give a Hamiltonian derivation of the Gibbs-Duhem, or 'tension first law', result in equation (67).

Acknowledgments

The authors would like to thank Roberto Emparan and Henriette Elvang for helpful discussions. This work was supported in part by NSF grant PHY-0555304.

References

[1] B.
Kol, “The phase transition between caged black holes and black strings: A review,” Phys. Rept. 422, 119 (2006) [arXiv:hep-th/0411240].
[2] T. Harmark, V. Niarchos and N. A. Obers, “Instabilities of black strings and branes,” arXiv:hep-th/0701022.
[3] J. L. Hovdebo and R. C. Myers, “Black rings, boosted strings and Gregory-Laflamme,” Phys. Rev. D 73, 084013 (2006) [arXiv:hep-th/0601079].
[4] B. Kleihaus, J. Kunz and E. Radu, “Rotating nonuniform black string solutions,” arXiv:hep-th/0702053.
[5] R. G. Cai, “Boosted domain wall and charged Kaigorodov space,” Phys. Lett. B 572, 75 (2003) [arXiv:hep-th/0306140].
[6] P. K. Townsend and M. Zamaklar, “The first law of black brane mechanics,” Class. Quant. Grav. 18, 5269 (2001) [arXiv:hep-th/0107228].
[7] T. Harmark and N. A. Obers, “Phase structure of black holes and strings on cylinders,” Nucl. Phys. B 684, 183 (2004) [arXiv:hep-th/0309230].
[8] D. Kastor and J. Traschen, “Stresses and strains in the first law for Kaluza-Klein black holes,” JHEP 0609, 022 (2006) [arXiv:hep-th/0607051].
[9] D. Sudarsky and R. M. Wald, “Extrema of mass, stationarity, and staticity, and solutions to the Einstein Yang-Mills equations,” Phys. Rev. D 46, 1453 (1992).
[10] J. H. Traschen, “Constraints On Stress Energy Perturbations In General Relativity,” Phys. Rev. D 66, 010001 (2002).
[11] B. D. Chowdhury, S. Giusto and S. D. Mathur, “A microscopic model for the black hole - black string phase transition,” Nucl. Phys. B 762, 301 (2007) [arXiv:hep-th/0610069].
[12] T. Harmark and N. A. Obers, “New phase diagram for black holes and strings on cylinders,” Class. Quant. Grav. 21, 1709 (2004) [arXiv:hep-th/0309116].
[13] J. H. Traschen and D.
Fox, “Tension perturbations of black brane spacetimes,” Class. Quant. Grav. 21, 289 (2004) [arXiv:gr-qc/0103106].
[14] S. W. Hawking, “Black holes in general relativity,” Commun. Math. Phys. 25, 152 (1972).
[15] S. W. Hawking and G. F. R. Ellis, “The Large scale structure of space-time,” Cambridge University Press, Cambridge, 1973.
[16] S. Hollands, A. Ishibashi and R. M. Wald, “A higher dimensional stationary rotating black hole must be axisymmetric,” arXiv:gr-qc/0605106.
[17] B. Carter and J. B. Hartle, “Gravitation in Astrophysics, Proceedings, Nato Advanced Study Institute, Cargese, France, July 15-31, 1986,” New York, USA: Plenum (1987) 399 P. (Nato ASI Series. Series B, Physics, 156).
[18] T. Harmark and N. A. Obers, “General definition of gravitational tension,” JHEP 0405, 043 (2004) [arXiv:hep-th/0403103].
[19] J. Lee and H. C. Kim, “Black string solution and frame dragging,” arXiv:gr-qc/0703091.
[20] J. M. Bardeen, B. Carter and S. W. Hawking, “The Four laws of black hole mechanics,” Commun. Math. Phys. 31, 161 (1973).
[21] T. Wiseman, “Static axisymmetric vacuum solutions and non-uniform black strings,” Class. Quant. Grav. 20, 1137 (2003) [arXiv:hep-th/0209051].
[22] J. H. Traschen, “A positivity theorem for gravitational tension in brane spacetimes,” Class. Quant. Grav. 21, 1343 (2004) [arXiv:hep-th/0308173].
ABSTRACT We study the thermodynamics of Kaluza-Klein black holes with momentum along the compact dimension, but vanishing angular momentum. These black holes are stationary, but non-rotating. We derive the first law for these spacetimes and find that the parameter conjugate to variations in the length of the compact direction is an effective tension, which generally differs from the ADM tension. For the boosted black string, this effective tension is always positive, while the ADM tension is negative for large boost parameter. We also derive two Smarr formulas, one that follows from time translation invariance, and a second one that holds only in the case of exact translation symmetry in the compact dimension. Finally, we show that the `tension first law' derived by Traschen and Fox in the static case has the form of a thermodynamic Gibbs-Duhem relation and give its extension in the stationary, non-rotating case. <|endoftext|><|startoftext|> arXiv:0704.0730v1 [cs.PF] 5 Apr 2007 Revisiting the Issues On Netflow Sample and Export Performance Hamed Haddadi, Raul Landa, Miguel Rio Department of Electronic and Electrical Engineering University College London United Kingdom Email: hamed,mrio,rlanda@ee.ucl.ac.uk Saleem Bhatti School of Computer Science University of St.
Andrews United Kingdom Email: saleem@dcs.st-and.ac.uk

Abstract: The high volume of packets and the high packet rates on some router links make it exceedingly difficult for routers to examine every packet in order to keep detailed statistics about the traffic traversing the router. Sampling is commonly applied on routers in order to limit the load incurred by the collection of information that the router has to undertake when evaluating flow information for monitoring purposes. The sampling process in nearly all cases is a deterministic process of choosing 1 in every N packets on a per-interface basis, and then forming the flow statistics based on the collected sampled statistics. Even though the effect of this sampling may not be significant for some statistics, such as packet rate, others can be severely distorted. It is therefore important to consider the sampling techniques and their relative accuracy when applied to different traffic patterns. The main disadvantage of sampling is the loss of accuracy in the collected trace when compared to the original traffic stream. To date there has not been a detailed analysis of the impact of sampling at a router for various traffic profiles and flow criteria. In this paper, we assess the performance of the sampling process as used in NetFlow in detail, and we discuss some techniques for compensating for the loss of monitoring detail.

I. INTRODUCTION

Packet sampling is an integral part of passive network measurement on today's Internet. The high traffic volumes on backbone networks and the pressure on routers have resulted in the need to control the consumption of resources in the measurement infrastructure. This has resulted in the definition and use of estimated statistics by routers, generated by sampling packets in each direction of each port on the routers. The aim of this paper is to analyse the effects of the sampling process as operated by NetFlow, the dominant standard on today's routers.
There are three constraints on a core router which lead to the use of packet sampling: the size of the record buffer, the CPU speed and the record look-up time. In [6], it is noted that in order to manage and analyse the performance of a network, it is enough to look at the basic statistical measures and summary statistics such as average range, variance, and standard deviation. However, in this paper we analyse, both analytically and practically, the accuracy of the inference of original characteristics from the sampled stream when higher order statistics are used.

This paper focuses on the inference of original network traffic characteristics for flows from a sampled set of packets and examines how the sampling process can affect the quality of the results. In this context, a flow is identified specifically as the tuple of the following five key fields: Source IP address, Destination IP address, Source port number, Destination port number, Layer 4 protocol type.

A. NetFlow memory constraints

A router at the core of an internet link carries a large number of flows at any given time. This pressure on the router entails the use of strict rules for exporting the statistics, so that the router memory buffer and CPU resources remain available to deal with changes in traffic patterns, by avoiding the handling of large tables of flow records. Rules for expiring NetFlow cache entries include:
• Flows which have been idle for a specified time are expired and removed from the cache (15 seconds is default)
• Long lived flows are expired and removed from the cache (30 minutes is default)
• As the cache becomes full a number of heuristics are applied to aggressively age groups of flows simultaneously
• TCP connections which have reached the end of byte stream (FIN) or which have been reset (RST) will be expired

B. Sampling basics

Distribution studies have been carried out extensively in the literature.
In brief, internet traffic is believed to have a heavy-tailed distribution, a self-similar nature and Long Range Dependence [2]. Sampling has the following effects on the flows:
• It is easy to miss short flows [14]
• Mis-ranking of high flows [4]
• Sparse flow creation [14]

Packet sampling: The inversion methods are of little to no use in practice for low sampling probability q, such as q = 0.01 (1 packet in 100) or smaller, and become much worse as q becomes smaller still. For example, on the Abilene network, 50% sampling was needed to detect the top flow correctly [4].

Flow sampling: Preserves flows intact, and the sampling is done on the flow records. In practice, any attempt to gather flow statistics involves classifying individual packets into flows. All packet meta-data has to be organised into flows before sampling can take place. This involves more CPU load and more memory if one uses the traditional hash table approach with one entry per flow. New flow classification techniques, such as bitmap algorithms, could be applied, but they are not currently used in practice in this manner.

II. VARIATION OF HIGHER ORDER STATISTICS

In this section we look at a more detailed analysis of the effect of sampling as performed by NetFlow on higher order statistics of the packet and flow size distributions. To analyse packet sampling as used by NetFlow, we emulated the NetFlow operation on a 1 hour OC-48 trace, collected from the CAIDA link on 24th of April 2003. This data set is available from the public repository at CAIDA [7]. The trace comprises 84579462 packets with anonymised source and destination IP addresses. An important factor to remember in this work is the fact that the memory constraint on the router has been relaxed in generating the flows from the sampled stream.
This means that there may be more than tens of thousands of flow keys present in memory at a given time, while in NetFlow the export mechanism empties the buffer list regularly, which can have a more severe impact on the resultant distribution of flow rates and statistics (footnote 3).

A. Effects of the short time-out imposed by memory constraints

Table I illustrates the data rates d(t) per interval of measurement. Inverted data rates, obtained by dividing d(t) by the sampling probability q, are shown as d_n(t).

TABLE I
THE STATISTICAL PROPERTIES ON DATA RATES d(t)

Dataset, bin (secs)      STD           Skewness   Kurtosis
d(t), 30                 2.2274e+07     0.5421     0.6163
d_n(t), 30               2.9109e+07     0.3837     0.4444
d(t) − d_n(t), 30        1.6748e+07    -0.2083     0.7172
d(t), 120                7.8650e+07     0.7398     1.6190
d_n(t), 120              9.5216e+07     0.3274     0.9268
d(t) − d_n(t), 120       3.7652e+07    -0.2971    -1.1848
d(t), 300                1.8491e+08     1.3058     3.7451
d_n(t), 300              2.1248e+08     1.1016     2.5408
d(t) − d_n(t), 300       6.1039e+07     0.1840    -1.1628

As observed in Table I, the mean does not have a great variation, possibly because distributions of packet sizes within single flows do not exhibit high variability. The standard deviation of the estimated data rate is higher than the corresponding standard deviation for the unsampled data stream. In the absence of any additional knowledge about the higher level protocol, or the nature of the session level activity, in the unsampled data stream, each flow can be thought of as having packets of varying sizes that are more or less independent from one another. Thus, the whole traffic profile results from the addition of many independent random variables which, by the central limit theorem, tend to balance among themselves to produce a more predictable, homogeneous traffic aggregate.

Footnote 3: The processing of the data was done using tools which are made available to the public by the authors.
However, simple inversion eliminates this multiplicity of randomly distributed values by introducing a very strong correlation effect, whereby the size of all the packets in a reconstructed flow depends on the size of a very small set of sampled packets. This eliminates the possibility for balancing and thus increases the variability of the resulting stream, i.e. its standard deviation.

The skewness and kurtosis also change. Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. Roughly speaking, a distribution has positive skew (right-skewed) if the right (higher value) tail is longer and negative skew (left-skewed) if the left (lower value) tail is longer (confusing the two is a common error). Skewness, the third standardised moment, is written as $\gamma_1$ and defined as

$$\gamma_1 = \frac{\mu_3}{\sigma^3},$$

where $\mu_3$ is the third moment about the mean and $\sigma$ is the standard deviation. Kurtosis is more commonly defined as the fourth cumulant divided by the square of the variance of the probability distribution,

$$\gamma_2 = \frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\sigma^4} - 3,$$

which is known as excess kurtosis. The "minus 3" at the end of this formula is often explained as a correction to make the kurtosis of the normal distribution equal to zero. The skewness is thus a measure of the asymmetry of the distribution function, while the kurtosis measures the flatness of the distribution function compared to what would be expected from a Gaussian distribution.

Table II illustrates the packet rates p(t) per interval of measurement. Inverted packet rates, obtained by dividing p(t) by the sampling probability q, are shown as p_n(t). The distributions before and after sampling are extremely close, and thus their difference tends to exaggerate the small differences that they do have. That is the reason for the enormous skewness and kurtosis that are observed.
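The two standardised moments defined above can be computed directly from a list of per-interval rates. The following is a minimal sketch in plain Python using population (biased) moments; the function name `standardised_moments` is an illustrative assumption and is not taken from the authors' tools, which may use a different estimator.

```python
import math


def standardised_moments(samples):
    """Return (std, skewness, excess kurtosis) using population moments.

    Hypothetical helper for illustration only:
    skewness        = mu_3 / sigma^3
    excess kurtosis = mu_4 / sigma^4 - 3
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Central moments about the mean
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        raise ValueError("all samples are identical; moments are undefined")
    sigma = math.sqrt(m2)
    skewness = m3 / sigma ** 3
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return sigma, skewness, excess_kurtosis


# A symmetric sample has zero skewness; a sample with one large value
# on the right has positive skewness.
sigma_sym, skew_sym, kurt_sym = standardised_moments([1, 2, 3, 4, 5])
_, skew_right, _ = standardised_moments([0, 0, 0, 10])
```

Applied to the three series of each bin size (original, inverted, and their difference), a routine of this kind yields the three columns reported in Tables I and II, up to the choice of biased or unbiased estimator.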
The skewness of the reconstructed stream is smaller than that of the unsampled stream. This means that the reconstructed distribution is more symmetric, that is, it tends to diverge in a more homogeneous manner around the mean. Additionally, it is positive, meaning that in both cases the distribution tends to have longer tails towards large packets rather than towards short packets, concentrating its bulk on the smaller packets. If we concede that small flows (flows consisting of a small number of packets) tend to contain small packets, then it is clear that these smaller packets will be underrepresented and the distribution will shift its weight towards bigger packets (members of bigger flows). Thus, it will become more symmetric and hence less skewed. The kurtosis decreases in all of the considered examples.

TABLE II
THE STATISTICAL PROPERTIES ON PACKET RATES p(t)

Dataset, bin (secs)      STD           Skewness   Kurtosis
p(t), 30                 3.1162e+04    -0.4007     0.7415
p_n(t), 30               3.1359e+04    -0.3584     0.6072
p(t) − p_n(t), 30        5.4148e+03     9.1469    96.0659
p(t), 120                1.1215e+05    -0.3875     1.2027
p_n(t), 120              1.1178e+05    -0.3759     1.2238
p(t) − p_n(t), 120       3.0157e+03     4.7140    26.1079
p(t), 300                2.5128e+05     0.1305     1.6495
p_n(t), 300              2.5152e+05     0.1433     1.6597
p(t) − p_n(t), 300       2.1047e+03     2.4298     8.9377

This means that the reconstructed streams are more homogeneous and less prone to outliers when compared with the original traces. Thus, more of the variance in packet size in the original traces can be attributed to infrequent, inordinately large packets that were missed in the sampling process, and the variance in the reconstructed stream consists more of homogeneous differences and not large outliers. However, both the reconstructed and unsampled streams are leptokurtic and thus tend to have long, heavy tails.

B. The two-sample KS test

The two-sample KS test is one of the most useful and general non-parametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. A CDF was calculated for the number of packets per flow and the number of octets per flow for each of the 120 sampling intervals of 30 seconds each, both for the sampled/inverted and unsampled streams. Then, a two-sample Kolmogorov-Smirnov test with 5% significance level was performed between the 120 unsampled and the 120 sampled and inverted distributions. In every case the distributions before and after sampling and inversion were found to be significantly different, and thus it is very clear that the sampling and inversion process significantly distorts the actual flow behaviour of the network.

III. PRACTICAL IMPLICATIONS OF SAMPLING

The effects of sampling on network traffic statistics can be measured from different perspectives. In this section we cover the theory behind the sampling strategy and use some real data captures from CAIDA in an emulation approach to demonstrate the performance constraints of systematic sampling.

A. Inversion errors on sampled statistics

The great advantage of sampling is the fact that the first order statistics do not show much variation when the sampling is done at consistent intervals and from a large pool of data. This enables network monitoring to use the sampled statistics to form a relatively good measure of aggregate network performance. Figure 1 displays the data rates d(t), in number of bytes seen per 30 second interval, on the one hour trace. The inverted data d_n(t) is also shown with diamond notation, showing the statistics gathered after the sampled data is multiplied by the sampling rate.
The black dots display the relative error per interval, e(t) = d(t) − dn(t).

[Figure 1 plot: "Data Rates on CAIDA, 30sec interval"; x-axis: Sampling interval [x 30sec]; series: Original, Inverted from sampling, Relative error]

Fig. 1. Data rates per 30 second interval, original versus normal inversion of sampled

Figure 2 displays the packet rates p(t), the number of packets per 30 second interval, versus the sampled and inverted packet rates pn(t). In this figure, it can be observed that the inversion does a very good job at nearly all times and the relative error is negligible. This is a characteristic of systematic sampling and is due to the central limit theorem.

[Figure 2 plot: "Packet Rate p(t) on CAIDA trace, 30 Second period"; x-axis: Sampling interval [x 30sec]; series: Original, Inverted from sampling]

Fig. 2. Packet rates per 30 second interval, original vs inversion of sampled

It can be readily seen that the recovery of packet rates by simple inversion is much better than the recovery of data rates. This is because sampling one in a thousand packets deterministically can be trivially inverted by multiplying by the sampling rate (1000): we focus on packet level measurement, as opposed to a flow level measurement. If the whole traffic flow is collapsed into a single link, then if we sample one packet out of every thousand and then multiply that by the sampling rate, we will get the total number of packets in that time window. We believe that the small differences that we can see in Figure 2 are due to the fact that at the end of the window some packets are lost (because their ‘representative’ was not sampled) or overcounted (a ‘representative’ for 1000 packets was sampled but the time interval finished before they had passed). We believe these errors happen between measurement windows in time, i.e. they are window-edge effects. The inversion property described above does not hold for measuring the number of bytes in a sampling interval.
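The contrast between packet-count and byte-count recovery can be illustrated with a minimal sketch. It is not the emulation used on the CAIDA trace: the function names and both packet windows are hypothetical, chosen only to show a window-edge error on the packet count and a size-dependent error on the byte count under deterministic 1-in-N sampling.

```python
def systematic_sample(packet_sizes, n):
    """Keep every n-th packet of the window (deterministic 1-in-n sampling)."""
    return packet_sizes[n - 1::n]


def simple_inversion(sampled_sizes, n):
    """Estimate (packet count, byte count) by scaling the sample by n."""
    return len(sampled_sizes) * n, sum(sampled_sizes) * n


def inversion_errors(packet_sizes, n):
    """Return (packet error, byte error), each as estimate minus true value."""
    est_packets, est_bytes = simple_inversion(systematic_sample(packet_sizes, n), n)
    return est_packets - len(packet_sizes), est_bytes - sum(packet_sizes)


if __name__ == "__main__":
    n = 1000  # sampling rate quoted in the text
    # Hypothetical window A: 2500 equal-sized packets. The last 500 packets
    # have no sampled 'representative', so only a window-edge error remains.
    print(inversion_errors([100] * 2500, n))
    # Hypothetical window B: 3000 packets, mostly 40 octets, but every 1000th
    # packet is 1500 octets, so the sample happens to see only large packets.
    window_b = [1500 if (i + 1) % n == 0 else 40 for i in range(3000)]
    print(inversion_errors(window_b, n))
```

In window A the packet error is bounded in magnitude by N, as expected for a window-edge effect. In the deliberately adversarial window B the packet count is recovered exactly while the byte count is overestimated many times over, because the sampled packets are not representative of the packet sizes in the window.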
Simple inversion essentially assumes that all packets in a given flow are the same size, and of course this assumption is incorrect. It is to be expected that the greater the standard deviation of packet size over an individual flow, the more inaccurate the recovery by simple inversion will be regarding the number of bytes per measurement interval. Figures 3 and 4 display the standard error rate on data rate and packet rate recovery respectively, in different measurement intervals.

[Figure 3 plot, three panels: "Datarate errors, 300 second"; "Datarate errors, 120 second"; "Datarate errors, 30 second"]

Fig. 3. Standard Sampling & inversion error on data rates, different measurement bins

[Figure 4 plot, three panels: "Packet rate errors, 300 second"; "Packet rate errors, 30 second"; "Packet rate errors, 120 second"]

Fig. 4. Sampling & inversion error on packet rates, different measurement

B. Flow size and packet size distributions

Figure 5 displays the CDF of the packet size distribution in all the flows formed from the sampled and unsampled streams. The small variation in the packet size distribution is consistent with the findings of the previous section, where it was discussed that packet sampling has a low impact on the packet size distribution.

[Figure 5 plot: "CDF of packet size distributions, Logarithmic view"; x-axis: Log of Packet size [Octets]; series: Sampled, Original]

Fig. 5. Normalised CDF of packets distributions per flow, original vs inverted

Figure 6:1 shows the effect that the distribution of packet lengths can have on the distribution of flow lengths when periodic packet sampling is applied. As flows reconstructed from a sampled packet stream are predominantly formed by just one packet, their length distribution follows that of single packets (Figure 5). That is the reason for the sharp jump near 1500 octets, as this characteristic originates from the maximum frame size in Ethernet networks.
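The tendency of sampled flows to collapse into single-packet flows can be reproduced on a toy stream. The sketch below is an illustration under stated assumptions, not the flow reconstruction performed by NetFlow: the stream layout (100 short flows of 5 packets interleaved with one long flow "L"), the sampling rate of 1 in 7 and all function names are hypothetical.

```python
from collections import Counter


def interleaved_stream(blocks=100, short_len=5, long_len=5):
    """Hypothetical stream: each block holds one short flow, then a slice of flow 'L'."""
    stream = []
    for k in range(blocks):
        stream.extend([("s", k)] * short_len)  # short flow k, present only in block k
        stream.extend(["L"] * long_len)        # packets of the single long flow
    return stream


def flow_lengths(packet_flow_ids):
    """Packets per flow in the unsampled stream."""
    return Counter(packet_flow_ids)


def sampled_flow_lengths(packet_flow_ids, n):
    """Packets per flow after deterministic 1-in-n packet sampling."""
    return Counter(packet_flow_ids[n - 1::n])


def single_packet_fraction(lengths):
    """Fraction of flows that consist of exactly one packet."""
    if not lengths:
        return 0.0
    return sum(1 for count in lengths.values() if count == 1) / len(lengths)


if __name__ == "__main__":
    stream = interleaved_stream()
    original = flow_lengths(stream)
    sampled = sampled_flow_lengths(stream, 7)
    print(len(original), single_packet_fraction(original))
    print(len(sampled), single_packet_fraction(sampled), sampled["L"])
```

In this contrived stream no original flow is a single-packet flow, yet 71 of the 72 flows that survive sampling are seen as single packets, 29 short flows are missed entirely, and the long flow is reduced from 500 packets to 71 sampled packets.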
[Figure 6 plot, two panels: "CDF of Flow Size Distributions [logarithmic]", x-axis: Flow size [Log(Octets)], series: Sampled, Original; "CDF of Flow Length Distributions [logarithmic]", x-axis: Flow length [Log(Packets)], series: Inverted, Original]

Fig. 6. Normalised CDF of flow size in packets [left] & length in bytes [right] per flow, original vs inverted

From Figure 6:2, it can be readily seen that, in the sampled stream, more than 90 percent of flows consist of a single packet, whereas in the unsampled case a much greater diversity in flow lengths exists for small flows. This is due to the fact that simple packet-based deterministic sampling under-represents short flows, and those short flows that are indeed detected by the procedure after sampling usually consist of a single packet. Thus, short flows are either lost or recovered as single packet flows, and long flows have their lengths reduced.

IV. CONCLUSION

In this paper we have reviewed the effects of sampling and flow record creation, as done by NetFlow, on the traffic statistics which are reported by such a process. It is clear that systematic sampling can no longer provide a realistic picture of the traffic profile present on internet links. The emergence of applications such as video on-demand, file sharing, streaming applications and even on-line data processing packages prevents the routers from reporting an optimal measure of the traffic traversing them. In the inversion process, it is a mistake to assume that the inversion of statistics by multiplication by the sampling rate is an indicator of even the first order statistics such as packet rates. An extension to this work and the inversion problem entails the use of more detailed statistics such as port numbers and TCP flags in order to be able to infer the original characteristics from the probability distribution functions of such variables. This will enable a more detailed recovery of original packet and data rates for different applications.
The inference of such probabilities, together with methods such as Bayesian inference, would allow a forecasting method that inverts the sampled stream in near real time. In related work, we will be looking at alternative flow synthesis schemes that replace NetFlow, such as hashing techniques using Bloom filters. The use of a lightweight flow indexing system will allow for a larger number of flows to be present at the router, possibly relaxing the memory constraints and allowing for a higher sampling rate, which will in turn lead to more accurate inversion.

V. RELATED WORK

There has been a great deal of work done on the analysis of the sampling process and the inversion problem. Choi et al. have explored the sampling error and measurement overhead of NetFlow in [11], though they have not looked at the inversion process. In [3], the authors have compared the NetFlow reports with those obtained from SNMP statistics and packet level traces, but without using the sampling feature of NetFlow, which is perhaps the dominant version in use nowadays. Estan et al. [5] have proposed a novel method of adapting the sampling rate at a NetFlow router in order to keep the memory resources at a constant level. This is done by upgrading the router firmware, which can be compromised by an attacker injecting varying traffic volume in order to take down the router. Also, this work has not considered the flow length statistics which are the primary focus of our work. Hohn et al. [10] have proposed a flow sampling model which can be used in an offline analysis of flow records formed from an unsampled packet stream. In this model the statistics of the original stream are recovered to a great extent. However, the intensive computing and memory resources needed in this process prevent the implementation of such a scheme on high-speed routers.
They prove it impossible to accurately recover statistics from a packet sampled stream, but based on the assumption of packets being independent and identically distributed. Roughan [12] has looked at statistical processes of active measurement using Poisson and uniform sampling and has compared the theoretical performance of the two methods. Papagiannaki et al. [8] have discussed the effect of sampling on tiny flows when looking at the generation of traffic matrices. The authors of [15] have looked at anomaly detection using flow statistics, but without sampling. In [17] and [16], the authors have looked at inferring the numbers and lengths of flows of original traffic that evaded sampling altogether. They have looked at inversion via multiplication.

ACKNOWLEDGEMENTS

The authors would like to acknowledge CAIDA [7] for providing the trace files. This work is conducted under the MASTS (EPSRC grant GR/T10503) and the 46PaQ project (EPSRC grant GR/S93707).

REFERENCES

[1] NetFlow Services Solutions Guide. Available at http://www.cisco.com/univercd/cc/td/doc/cisintwk/intsolns/netflsol/nfwhite.htm
[2] Will Leland, Murad Taqqu, Walter Willinger, and Daniel Wilson, On the Self-Similar Nature of Ethernet Traffic (Extended Version), IEEE/ACM Transactions on Networking, Vol. 2, No. 1, pp. 1-15, February 1994.
[3] Sommer, R. and Feldmann, A. 2002. NetFlow: information loss or win?. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement (Marseille, France, November 06 - 08, 2002). IMW ’02. ACM Press, New York, NY, 173-174. DOI= http://doi.acm.org/10.1145/637201.637226
[4] Barakat, C., Iannaccone, G., and Diot, C. 2005. Ranking flows from sampled traffic. In Proceedings of the 2005 ACM Conference on Emerging Network Experiment and Technology (Toulouse, France, October 24 - 27, 2005). CoNEXT’05. ACM Press, New York, NY, 188-199.
[5] Estan, C., Keys, K., Moore, D., and Varghese, G. 2004. Building a better NetFlow. SIGCOMM Comput. Commun. Rev. 34, 4 (Aug. 2004), 245-256. http://doi.acm.org/10.1145/1030194.1015495
[6] Performance and Fault Management (Cisco Press Core Series) by Paul L Della Maggiora (Author), James M. Thompson (Author), Robert L. Pavone Jr. (Author), Kent J. Phelps (Author), Christopher E. Elliott (Editor), Publisher: Cisco Press; 1st edition, ISBN: 1578701805
[7] CAIDA, the Cooperative Association for Internet Data Analysis: http://www.caida.org
[8] K. Papagiannaki, N. Taft, and A. Lakhina. A Distributed Approach to Measure IP Traffic Matrices. In ACM Internet Measurement Conference, Taormina, Italy, October, 2004.
[9] IETF PSAMP Working Group: http://www.ietf.org/html.charters/psamp-charter.html
[10] N. Hohn and D. Veitch. Inverting Sampled Traffic. In Internet Measurement Conference, 2003. http://citeseer.ist.psu.edu/hohn03inverting.html
[11] Choi, B. and Bhattacharyya, S. 2005. Observations on Cisco sampled NetFlow. SIGMETRICS Perform. Eval. Rev. 33, 3 (Dec. 2005), 18-23. DOI= http://doi.acm.org/10.1145/1111572.1111579
[12] A Comparison of Poisson and Uniform Sampling for Active Measurements, Matthew Roughan, accepted to appear in IEEE JSAC.
[13] InMon Corporation (2004). sFlow accuracy and billing. Available at www.inmon.com/pdf/sFlowBilling.pdf
[14] Sampling for Passive Internet Measurement: A Review, N.G. Duffield, Statistical Science, Vol. 19, No. 3, 472-498, 2004.
[15] Brauckhoff, D., Tellenbach, B., Wagner, A., May, M., and Lakhina, A. 2006. Impact of packet sampling on anomaly detection metrics. In Proceedings of the 6th ACM SIGCOMM on Internet Measurement (Rio de Janeiro, Brazil, October 25 - 27, 2006). IMC ’06. ACM Press, New York, NY, 159-164. DOI= http://doi.acm.org/10.1145/1177080.1177101
[16] N. Duffield, C. Lund, and M. Thorup. Properties and Prediction of Flow Statistics from Sampled Packet Streams. In Proc. ACM SIGCOMM IMW’02, Marseille, France, Nov. 2002.
[17] N. Duffield, C. Lund, and M. Thorup. Estimating Flow Distributions from Sampled Flow Statistics. In Proc. ACM SIGCOMM’03, Karlsruhe, Germany, Aug. 2003.

ABSTRACT

The high volume of packets and packet rates of traffic on some router links makes it exceedingly difficult for routers to examine every packet in order to keep detailed statistics about the traffic which is traversing the router. Sampling is commonly applied on routers in order to limit the load incurred by the collection of information that the router has to undertake when evaluating flow information for monitoring purposes. The sampling process in nearly all cases is a deterministic process of choosing 1 in every N packets on a per-interface basis, and then forming the flow statistics based on the collected sampled statistics. Even though the effect of this sampling may not be significant for some statistics, such as packet rate, others can be severely distorted. It is therefore important to consider the sampling techniques and their relative accuracy when applied to different traffic patterns. The main disadvantage of sampling is the loss of accuracy in the collected trace when compared to the original traffic stream. To date there has not been a detailed analysis of the impact of sampling at a router in various traffic profiles and flow criteria.
In this paper, we assess the performance of the sampling process as used in NetFlow in detail, and we discuss some techniques for the compensation of loss of monitoring detail. <|endoftext|><|startoftext|> The tensor part of the Skyrme energy density functional. I. Spherical nuclei

T. Lesinski,1,∗ M. Bender,2,3,† K. Bennaceur,1,2 T. Duguet,4 and J. Meyer1

1 Université de Lyon, F-69003 Lyon, France; Institut de Physique Nucléaire de Lyon, CNRS/IN2P3, Université Lyon 1, F-69622 Villeurbanne, France
2 DSM/DAPNIA/SPhN, CEA Saclay, F-91191 Gif-sur-Yvette Cedex, France
3 Université Bordeaux 1; CNRS/IN2P3; Centre d’Études Nucléaires de Bordeaux Gradignan, UMR5797, Chemin du Solarium, BP120, F-33175 Gradignan, France
4 National Superconducting Cyclotron Laboratory and Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA

(Dated: April 4, 2007)

We perform a systematic study of the impact of the $J^2$ tensor term in the Skyrme energy functional on properties of spherical nuclei. In the Skyrme energy functional, the tensor terms originate both from zero-range central and tensor forces. We build a set of 36 parameterizations which cover a wide range of the parameter space of the isoscalar and isovector tensor term coupling constants with a fit protocol very similar to that of the successful SLy parameterizations. We analyze the impact of the tensor terms on a large variety of observables in spherical mean-field calculations, such as the spin-orbit splittings and single-particle spectra of doubly-magic nuclei, the evolution of spin-orbit splittings along chains of semi-magic nuclei, mass residuals of spherical nuclei, and known anomalies of radii. The major findings of our study are (i) tensor terms should not be added perturbatively to existing parameterizations; a complete refit of the entire parameter set is imperative. (ii) The free variation of the tensor terms does not lower the $\chi^2$ within a standard Skyrme energy functional.
(iii) For certain regions of the parameter space of their coupling constants, the tensor terms lead to instabilities of the spherical shell structure, or even the coexistence of two configurations with different spherical shell structure. (iv) The standard spin-orbit interaction does not scale properly with the principal quantum number, such that single-particle states with one or several nodes have too large spin-orbit splittings, while those of nodeless intruder levels are tentatively too small. Tensor terms with realistic coupling constants cannot cure this problem. (v) Positive values of the coupling constants of proton-neutron and like-particle tensor terms allow for a qualitative description of the evolution of spin-orbit splittings in chains of Ca, Ni and Sn isotopes. (vi) For the same values of the tensor term coupling constants, however, the overall agreement of the single-particle spectra in doubly-magic nuclei is deteriorated, which can be traced back to features of the single-particle spectra that are not related to the tensor terms. We conclude that the currently used central and spin-orbit parts of the Skyrme energy density functional are not flexible enough to allow for the presence of large tensor terms.

PACS numbers: 21.10.Dr, 21.10.Pc, 21.30.Fe, 21.60.Jz

I. INTRODUCTION

The strong nuclear spin-orbit interaction in nuclei is responsible for the observed magic numbers in heavy nuclei [1, 2, 3, 4]. While a simple spin-orbit interaction allows for the qualitative description of the global features of shell structure, the available data suggest that single-particle energies evolve with neutron and proton number in a manner that cannot be related to the geometrical growth of the single-particle potential with N and Z. Many anomalies of shell structure have been identified that do not fit into simple experimental systematics, and that challenge any global model of nuclear structure.
∗ Electronic address: lesinski@ipnl.in2p3.fr
† Electronic address: bender@cenbg.in2p3.fr
http://arxiv.org/abs/0704.0731v3

The evolution of shell structure with N and Z as a feature of self-consistent mean-field models has long been known. To quote the pioneering study of shell structure in a self-consistent model performed by Beiner et al. [5], the “most striking effect is the appearance of N = 16, 34 and 56 as neutron magic numbers for unstable nuclei, together with a weakening of the shell closure at N = 20 and 28”. Various mechanisms that modify the appearance of gaps in the single-particle spectra have been discussed in detail in the literature. The two most prominent ones that were worked out by Dobaczewski et al. in Ref. [6], however, play mainly a role for weakly-bound exotic nuclei far from stability, as they are directly or indirectly related to the physics of loosely bound single-particle states, namely that the enhancement of the diffuseness of the neutron density distribution reduces the spin-orbit coupling in neutron-rich nuclei on the one hand, and the interaction between bound orbitals and the continuum results in a quenching of shell effects in light and medium systems on the other hand. The former effect was also extensively discussed in the framework of relativistic models by Lalazissis et al. [7, 8], while the latter triggered a number of studies that discussed the potential relevance of this so-called “Bogoliubov enhanced shell quenching” to explain the abundance pattern from the astrophysical r-process of nucleosynthesis [9, 10, 11, 12]. These two effects take place in neutron-rich nuclei. In proton-rich nuclei, the Coulomb barrier suppresses both the diffuseness of the proton density and the coupling of bound proton states to the continuum.
But the Coulomb interaction itself can also modify the shell structure: for super-heavy nuclei, it begins to destabilize the nucleus as a whole. Mean-field models predict that it amplifies the shell oscillations of the densities for incompletely filled oscillator shells, which leads to strong variations of the density profile that feed back onto the single-particle spectra [13, 14].

Interestingly, most theoretical papers about the evolution of shell structure from the last decade have speculated about new effects that mainly affect neutron shells in nuclei far from stability in the anticipation of the rare-isotope physics that might become accessible with the next generation of experimental facilities. The known anomalies, some of which have been known for a long time while many more have been identified recently, also concern proton shells and already appear sufficiently close to stability that “exotic phenomena can be ruled out for their explanation” in most cases, to paraphrase the authors of Ref. [15]. By contrast, this suggests that there exists a mechanism that induces a strong evolution of single-particle spectra already in stable nuclei that has long been overlooked.

There is a prominent ingredient of the nucleon-nucleon interaction that has been ignored for decades in virtually all global nuclear structure models for medium and heavy nuclei, be it macroscopic-microscopic approaches or self-consistent mean-field methods. It is only very recently that the systematic discrepancies between model predictions and experiment have triggered a renaissance of the tensor force in the description of finite medium- and heavy-mass nuclei.

The tensor force is a crucial and necessary ingredient of the bare nucleon-nucleon interaction [16, 17], and consequently is contained in all ab-initio approaches that are available for light, mainly p-shell nuclei [18, 19].
One of the first experimental signatures of the tensor force was the small, but finite quadrupole moment of the deuteron. In a boson-exchange picture of the bare nucleon-nucleon interaction, the tensor force originates from the exchange of pseudoscalar pions, which have both central and tensor couplings, see for example section 2.3 in Ref. [20] or appendix 13A of Ref. [21]. In a nuclear many-body system, the bare tensor force induces a strong correlation between the spatial and spin orientations in the two-body density matrix. For two nucleons with parallel spins, the tensor force energetically favors the configuration where the distance vector is aligned with the spins, while for anti-parallel spins the tensor force prefers the configuration where the distance vector is perpendicular to the spins, see the discussion of Fig. 13 in Ref. [22] and of Fig. 3 in Ref. [23]. The authors of these papers also demonstrate very nicely the well-known fact [24, 25] that in an approach that starts from the bare nucleon-nucleon interaction, nuclei are not bound without taking into account the two-body correlations induced by the tensor force.

The role of the tensor force, however, manifests itself differently in self-consistent mean-field models, otherwise called energy density functional (EDF) methods, the tool of choice for medium and heavy nuclei. The latter methods use an independent-particle state as a reference state to express the energy of the correlated nuclear ground state. Thus, correlations are not explicitly present in the higher-order density matrices of the reference state, but rather included in the form of a more elaborate functional of the (local and nearly local parts of the) one-body density matrix of that reference state.
In such a scheme, most of the effect of the bare tensor force on the binding energy is integrated out through the renormalization of the coupling constants associated with a central effective vertex, in a similar fashion as the tensor part of the bare interaction is renormalized into the central one when going from the bare nucleon-nucleon force to a Brueckner G matrix. The tensor terms of the EDF relate to a residual tensor vertex that gives nothing but a correction to the spin-orbit splittings, which for light p-shell nuclei might be of the same order as the contribution from the genuine spin-orbit force. The interplay of spin-orbit and tensor forces in the mean field of medium and heavy nuclei was explored in Refs. [26, 27, 28], where the particular role of spin-unsaturated shells was pointed out.

There are two widely used effective interactions for non-relativistic self-consistent mean-field models [29], the zero-range non-local Skyrme interaction [30, 31, 32, 33] on the one hand and the finite-range Gogny force [34, 35] on the other hand.

In fact, the effective zero-range non-local interaction proposed by Skyrme in 1956 [30, 31, 32, 33] already contained a zero-range tensor force. The first applications of Skyrme’s interaction in self-consistent mean-field models that became available around 1970, however, neglected the tensor force, and the simplified effective Skyrme interaction used in the seminal paper by Vautherin and Brink [36] soon became the standard Skyrme interaction that has been used in most applications ever since. Until very recently, there was only very little exploratory work on Skyrme’s tensor force. In their early study, Stancu, Brink and Flocard [37], who added the tensor force perturbatively to the SIII parameterization, pointed out that some spin-orbit splittings in magic nuclei can be improved with a tensor force.
A complete fit including the terms from the tensor force that contribute in spherical nuclei was attempted by Tondeur [38], with the relevant coupling constants of the spin-orbit and tensor terms adjusted to selected spin-orbit splittings in $^{16}$O, $^{48}$Ca and $^{208}$Pb. Another complete fit of a generalized Skyrme interaction including a tensor force was performed by Liu et al. [39], but the authors did not investigate the effect of the tensor force in detail, nor was the resulting parameterization ever used in the literature thereafter.

Similarly, the seminal paper by Gogny [34] on the evaluation of matrix elements of a finite-range force of Gaussian shape in a harmonic oscillator basis contains the expressions for a finite-range tensor force, which, however, was omitted in the parameterizations of Gogny’s force adjusted by the Bruyères-le-Châtel group [35]. It was Onishi and Negele [40] who first published an effective interaction that combined a Gaussian two-body central force, a finite-range tensor force with a zero-range spin-orbit force and a zero-range non-local three-body force, which, however, also fell into oblivion.

The role of the tensor force is slightly different in Skyrme and Gogny interactions. In the Gogny force, the contributions from the central and tensor parts remain explicitly distinct, although, of course, this does not prevent a certain entanglement of their physical effects. In the context of Skyrme’s functional, however, the contribution of a zero-range tensor force to the spherical mean-field state of an even-even nucleus has exactly the same form as a particular exchange term from the non-local part of the central Skyrme force. When looking at spherical nuclei only, adding Skyrme’s tensor force simply allows one to decouple a term that is already provided by the central force.
This indeed makes the effective-interaction-restricted functional more flexible, as the additional degrees of freedom from the tensor force remove an interdependence between the effective mass, the surface terms and the “tensor terms”. However, one must always keep in mind that both the central and tensor part of the effective vertex contribute to the so-called $J_t^2$ “tensor” terms of the functional.1

1 As we will outline below, and as was already pointed out in Ref. [5], this argument does not hold for deformed even-even nuclei or any situation where intrinsic time-reversal is broken, for example odd nuclei or dynamics. There, the tensor and non-local central parts of the effective Skyrme interaction give contributions to the mean-fields and the binding energy with different analytical expressions. This will be discussed in a companion article [41].

In the context of relativistic mean-field models, the equivalent of the non-relativistic tensor force appears as the exchange term of effective fields with the quantum numbers of the pion, which by construction do not appear in the standard relativistic Hartree models. Only relativistic Hartree-Fock models contain this tensor force, with the first predictive parameterizations becoming available just recently [42].

We also mention that there is a large body of work on the tensor force in the interacting shell model, see Ref. [43] for a review, that concentrates on a completely different aspect of the tensor force, namely its unique contribution to excitations with unnatural parity.

The recent interest in the effect of the tensor force in the context of self-consistent mean field models was triggered by the observed evolution of single-particle levels of one nucleon species as a function of the number of the other nucleon species. Otsuka et al. [44] proposed that at least part of the effect is caused by the proton-neutron tensor force from pion exchange.
Many groups now attempt to explain known, but so far unresolved, anomalies of shell structure in terms of a tensor force. A particularly popular playground is the relative shift of the proton $1g_{7/2}$ and $1h_{11/2}$ levels in tin isotopes, which is interpreted as the reduction of the spin-orbit splittings of both levels with their respective partners with increasing neutron number [45].

Otsuka et al. [46] added a Gaussian tensor force, adjusted on the long-range part of a one-pion+ρ exchange potential, to a standard Gogny force. After a consistent readjustment of the parameters of its central and spin-orbit parts, they were able to explain coherently the anomalous relative evolution of some single-particle levels without, however, being able to describe their absolute distance in energy. Dobaczewski [47] has pointed out that a perturbatively added tensor interaction with suitably chosen coupling constants in the Skyrme energy density functional not only modifies the evolution of shell structure, but also improves the description of nuclear masses around magic nuclei. Brown et al. [48] have fitted a Skyrme interaction with added zero-range tensor force with emphasis on the reproduction of single-particle spectra. While the authors appreciate the qualitatively correctly described evolution of relative level distances, they point out that the combination of zero-range spin-orbit and tensor forces does not and cannot correctly describe the ℓ-dependence of spin-orbit splittings. Colò et al. [49] and Brink et al. [50] have added Skyrme’s tensor force perturbatively to the existing standard parameterization SLy5 [51, 52], and to the SIII [5] one, respectively. They have investigated some single-particle energy differences: the $1h_{11/2}$ and $1g_{7/2}$ proton states in tin isotopes as well as $1i_{13/2}$ and $1h_{9/2}$ neutron states in N = 82 isotones, and propose similar parameters as in Ref. [48].
The effect of the tensor force on the centroid of the GT giant resonance is also estimated by Colò et al. using a sum-rule approach and found to be substantial. Long et al. [53] demonstrate that the tensor force that emerges naturally in relativistic Hartree-Fock also improves the relative shifts of the proton $1g_{7/2}$ and $1h_{11/2}$ levels in tin isotopes.

The work on the tensor force published so far aims at an optimal single parameterization that establishes a best fit to either the underlying bare tensor force [46, 48] or empirical data [38, 47, 49]. The published results, as well as our first exploratory studies, however, suggest that adding a tensor force to the existing mean-field models gives only a local improvement of the relative change of certain single-particle energies, but not necessarily a global improvement of single-particle spectra or other observables. In the framework of the Skyrme interaction, which we will employ throughout this work, there is also the already mentioned ambiguity that the contribution from the tensor force to spherical nuclei has the same structure as a term from the central force. In view of this situation, we will pursue a different strategy and investigate the effect of the tensor terms on a multitude of observables in nuclei through a set of Skyrme interactions with systematically varied coupling constants of the tensor terms.

The present study was motivated by the finding that the performance of the existing Skyrme-type effective interactions for masses and spectroscopic properties is limited by systematic deficiencies of the single-particle spectra [54, 55, 56, 57] that seem to be impossible to remove within the standard Skyrme interaction.
The details of single-particle spectra have so far been somewhat outside the focus of self-consistent mean-field methods, on the one hand because they do not correspond directly to empirical single-particle energies (we will come back to that below), and on the other hand because many of the observables that are usually calculated with self-consistent mean-field methods are not very sensitive to the exact placement of single-particle levels. By contrast, there is an enormous body of work that examines in great detail the infinite and semi-infinite nuclear matter properties of the effective interactions, which are the analog of liquid-drop and droplet parameters. The reason is, of course, that the global trends over the whole chart of nuclei have to be understood before one can look into details. The last few years have seen an increasing demand on predictive power. Moreover, beyond-mean-field approaches of the projected generator coordinate method (GCM) or Bohr-Hamiltonian type have become widely used tools to analyze and predict spectroscopic properties in medium and heavy nuclei, employing either Gogny or Skyrme interactions. The underlying single-particle spectra thus now deserve more attention, as many of the spectroscopic properties of interest turn out to be extremely sensitive to even subtle details of the single-particle spectra. As the tensor force is the most obvious missing piece in all standard mean-field interactions, it is the natural starting point for the systematic investigation of possible generalizations with the ultimate goal of improving the predictive power of the interactions for spectroscopy.

In the present paper, we will outline the formalism of a Skyrme interaction with added tensor force, describe the fit of the parameterizations, and analyze the role of the tensor terms for single-particle spectra, masses and radii of spherical even-even nuclei.
A second paper [41] studies the surface and deformation properties of these Skyrme interactions for even-even nuclei, and future work will examine the stability of nuclear matter and the role of the time-odd terms from the tensor force in odd and rotating nuclei. Only deformed nuclei and, in particular, observables sensitive to the time-odd contributions will possibly allow one to distinguish clearly between the non-local central and tensor parts of the Skyrme force.

II. THE SKYRME INTERACTION WITH TENSOR TERMS

A. The energy density functional

The usual ansatz for the Skyrme effective interaction [51, 52] leads to an energy density functional which can be written as the sum of a kinetic term, the Skyrme potential energy functional that models the effective strong interaction in the particle-hole channel, a pairing energy functional corresponding to a density-dependent contact pairing interaction, the Coulomb energy functional (calculated using the Slater approximation [58]) and correction terms to approximately remove the excitation energy from spurious motion caused by broken symmetries

$$ E = E_{\mathrm{kin}} + E_{\mathrm{Skyrme}} + E_{\mathrm{pairing}} + E_{\mathrm{Coulomb}} + E_{\mathrm{corr}} . \tag{1} $$

B. The Skyrme energy density functional

Throughout this work, we will use an effective Skyrme energy functional that corresponds to an antisymmetrized density-dependent two-body vertex in the particle-hole channel of the strong interaction, which can be decomposed into central, spin-orbit and tensor contributions

$$ v_{\mathrm{Skyrme}} = v^{c} + v^{t} + v^{LS} . \tag{2} $$

Other choices for the writing of the Skyrme energy functional are possible and have been made in the literature, which might affect the form of the effective interaction, its interpretation and the results obtained from it. We will come back to that in section II D below.
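The Skyrme energy density functional is a functional of local densities and currents

$$ E_{\mathrm{Skyrme}} = \int d^3r \, \mathcal{H}_{\mathrm{Skyrme}}(\mathbf{r}) , \tag{3} $$

which has many technical advantages compared to finite-range forces such as the Gogny force. All exchange terms have the same structure as the direct terms, which greatly reduces the number of necessary integrations during a calculation.

1. Local densities and currents

Throughout this paper we will assume that we have pure proton and neutron states. The formal framework of the general case including proton-neutron mixing is discussed in Ref. [59]. Without making reference to any single-particle basis, we start from the density matrices of protons and neutrons in coordinate space [60]

$$ \rho_q(\mathbf{r}\sigma, \mathbf{r}'\sigma') = \langle \hat{a}^\dagger_{\mathbf{r}'\sigma' q} \hat{a}_{\mathbf{r}\sigma q} \rangle = \tfrac{1}{2} \rho_q(\mathbf{r}, \mathbf{r}') \, \delta_{\sigma\sigma'} + \tfrac{1}{2} \mathbf{s}_q(\mathbf{r}, \mathbf{r}') \cdot \langle \sigma' | \hat{\boldsymbol{\sigma}} | \sigma \rangle , \tag{4} $$

where

$$ \rho_q(\mathbf{r}, \mathbf{r}') = \sum_{\sigma} \rho_q(\mathbf{r}\sigma, \mathbf{r}'\sigma) , \qquad \mathbf{s}_q(\mathbf{r}, \mathbf{r}') = \sum_{\sigma\sigma'} \rho_q(\mathbf{r}\sigma, \mathbf{r}'\sigma') \, \langle \sigma' | \hat{\boldsymbol{\sigma}} | \sigma \rangle . \tag{5} $$

The Skyrme energy functional up to second order in derivatives that we will introduce below can be expressed in terms of seven local densities and currents [59] that are defined as

$$
\begin{aligned}
\rho_q(\mathbf{r}) &= \rho_q(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} , \\
\mathbf{s}_q(\mathbf{r}) &= \mathbf{s}_q(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} , \\
\tau_q(\mathbf{r}) &= \nabla \cdot \nabla' \, \rho_q(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} , \\
T_{q,\mu}(\mathbf{r}) &= \nabla \cdot \nabla' \, s_{q,\mu}(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} , \\
\mathbf{j}_q(\mathbf{r}) &= -\tfrac{i}{2} (\nabla - \nabla') \, \rho_q(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} , \\
J_{q,\mu\nu}(\mathbf{r}) &= -\tfrac{i}{2} (\nabla_\mu - \nabla'_\mu) \, s_{q,\nu}(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} , \\
F_{q,\mu}(\mathbf{r}) &= \tfrac{1}{2} \sum_{\nu=x}^{z} (\nabla_\mu \nabla'_\nu + \nabla'_\mu \nabla_\nu) \, s_{q,\nu}(\mathbf{r}, \mathbf{r}') \big|_{\mathbf{r}=\mathbf{r}'} ,
\end{aligned}
\tag{6}
$$

which are the density $\rho_q(\mathbf{r})$, the kinetic density $\tau_q(\mathbf{r})$, the current (vector) density $\mathbf{j}_q(\mathbf{r})$, the spin (pseudovector) density $\mathbf{s}_q(\mathbf{r})$, the spin kinetic (pseudovector) density $\mathbf{T}_q(\mathbf{r})$, the spin-current (pseudotensor) density $J_{q,\mu\nu}(\mathbf{r})$, and the tensor-kinetic (pseudovector) density $\mathbf{F}_q(\mathbf{r})$. The densities $\rho_q(\mathbf{r})$, $\tau_q(\mathbf{r})$ and $J_{q,\mu\nu}(\mathbf{r})$ are time-even, while $\mathbf{s}_q(\mathbf{r})$, $\mathbf{T}_q(\mathbf{r})$, $\mathbf{j}_q(\mathbf{r})$ and $\mathbf{F}_q(\mathbf{r})$ are time-odd. For a detailed discussion of their symmetries see Ref. [60]. There are other local densities up to second order in derivatives that can be constructed, but when constructing an energy functional they either cannot be combined with others to terms with proper symmetries or they lead to terms that are not independent from the others [61].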
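The cartesian spin-current pseudotensor density $J_{\mu\nu}$ can be decomposed into pseudoscalar, (antisymmetric) vector and (symmetric) traceless pseudotensor parts, all of which have well-defined transformation properties under rotations

$$ J_{\mu\nu}(\mathbf{r}) = \tfrac{1}{3} \delta_{\mu\nu} J^{(0)}(\mathbf{r}) + \tfrac{1}{2} \sum_{\kappa=x}^{z} \epsilon_{\mu\nu\kappa} J^{(1)}_{\kappa}(\mathbf{r}) + J^{(2)}_{\mu\nu}(\mathbf{r}) , \tag{7} $$

where $\delta_{\mu\nu}$ is the Kronecker symbol and $\epsilon_{\mu\nu\kappa}$ the Levi-Civita tensor. The pseudoscalar, vector and pseudotensor parts expressed in terms of the cartesian tensor are given by

$$
\begin{aligned}
J^{(0)}(\mathbf{r}) &= \sum_{\mu=x}^{z} J_{\mu\mu}(\mathbf{r}) , \\
J^{(1)}_{\kappa}(\mathbf{r}) &= \sum_{\mu,\nu=x}^{z} \epsilon_{\kappa\mu\nu} J_{\mu\nu}(\mathbf{r}) , \\
J^{(2)}_{\mu\nu}(\mathbf{r}) &= \tfrac{1}{2} \big[ J_{\mu\nu}(\mathbf{r}) + J_{\nu\mu}(\mathbf{r}) \big] - \tfrac{1}{3} \delta_{\mu\nu} \sum_{\kappa=x}^{z} J_{\kappa\kappa}(\mathbf{r}) .
\end{aligned}
\tag{8}
$$

The vector spin current density $\mathbf{J}^{(1)}(\mathbf{r}) \equiv \mathbf{J}(\mathbf{r})$ is often called spin-orbit current, as it enters the spin-orbit energy density.$^2$

For the formal discussion of the physical content of the Skyrme energy functional it is advantageous to recouple the proton and neutron densities to isoscalar and isovector densities, for example

$$ \rho_0(\mathbf{r}) = \rho_n(\mathbf{r}) + \rho_p(\mathbf{r}) , \qquad \rho_1(\mathbf{r}) = \rho_n(\mathbf{r}) - \rho_p(\mathbf{r}) , \tag{9} $$

and similarly for all others. As we assume pure proton and neutron states, only the $T_z = 0$ component of the isovector density is non-zero, which we exploit to drop the index $T_z$ from the isovector densities $\rho_{1 T_z}(\mathbf{r})$ etc.

2. Skyrme’s central force

We will use the standard density-dependent central Skyrme force

$$
\begin{aligned}
v^{c}(\mathbf{R}, \mathbf{r}) ={}& t_0 \, (1 + x_0 \hat{P}_\sigma) \, \delta(\mathbf{r}) + \tfrac{1}{6} t_3 \, (1 + x_3 \hat{P}_\sigma) \, \rho^{\alpha}(\mathbf{R}) \, \delta(\mathbf{r}) \\
& + \tfrac{1}{2} t_1 \, (1 + x_1 \hat{P}_\sigma) \big[ \hat{\mathbf{k}}'^{\,2} \, \delta(\mathbf{r}) + \delta(\mathbf{r}) \, \hat{\mathbf{k}}^{2} \big] + t_2 \, (1 + x_2 \hat{P}_\sigma) \, \hat{\mathbf{k}}' \cdot \delta(\mathbf{r}) \, \hat{\mathbf{k}} ,
\end{aligned}
\tag{10}
$$

where we use the shorthand notation

$$ \mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2 , \qquad \mathbf{R} = \tfrac{1}{2} (\mathbf{r}_1 + \mathbf{r}_2) , \tag{11} $$

while $\hat{\mathbf{k}}$ is the usual operator for relative momenta

$$ \hat{\mathbf{k}} = -\tfrac{i}{2} (\nabla_1 - \nabla_2) \tag{12} $$

and $\hat{\mathbf{k}}'$ its complex conjugate acting to the left. Finally, $\hat{P}_\sigma$ is the spin exchange operator that controls the relative strength of the $S = 0$ and $S = 1$ channels for a given term in the two-body interaction

$$ \hat{P}_\sigma = \tfrac{1}{2} (1 + \hat{\boldsymbol{\sigma}}_1 \cdot \hat{\boldsymbol{\sigma}}_2) . \tag{13} $$

As said above, we restrict ourselves to a parameterization of the Skyrme energy functional as obtained from the average value of an effective two-body vertex in the reference Slater determinant.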
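We decompose the isoscalar and isovector parts of the resulting energy density functional $\mathcal{H}^{c}$ into a part $\mathcal{H}^{c,\mathrm{even}}_t$ that is composed entirely of time-even densities and currents, and a part $\mathcal{H}^{c,\mathrm{odd}}_t$ that contains terms which are bilinear in time-odd densities and currents and vanishes in intrinsically time-reversal invariant systems

$$ \mathcal{H}^{c}(\mathbf{r}) = \sum_{t=0,1} \Big[ \mathcal{H}^{c,\mathrm{even}}_t(\mathbf{r}) + \mathcal{H}^{c,\mathrm{odd}}_t(\mathbf{r}) \Big] . \tag{14} $$

Both $\mathcal{H}^{c,\mathrm{even}}_t$ and $\mathcal{H}^{c,\mathrm{odd}}_t$ are of course constructed such that they are time-even; they are given by [59, 62]

$$
\begin{aligned}
\mathcal{H}^{c,\mathrm{even}}_t &= A^{\rho}_t[\rho_0] \, \rho_t^2 + A^{\Delta\rho}_t \, \rho_t \Delta\rho_t + A^{\tau}_t \, \rho_t \tau_t - A^{T}_t \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\mu\nu} , \\
\mathcal{H}^{c,\mathrm{odd}}_t &= A^{s}_t[\rho_0] \, \mathbf{s}_t^2 - A^{\tau}_t \, \mathbf{j}_t^2 + A^{\Delta s}_t \, \mathbf{s}_t \cdot \Delta\mathbf{s}_t + A^{T}_t \, \mathbf{s}_t \cdot \mathbf{T}_t ,
\end{aligned}
\tag{15}
$$

where $A^{\rho}_t[\rho_0]$ and $A^{s}_t[\rho_0]$ are density-dependent coupling constants that depend on the total (isoscalar) density. The detailed relations between the coupling constants of the functional and the central Skyrme force are given in appendix A. The notation reflects that two pairs of terms in $\mathcal{H}^{c,\mathrm{even}}_t$ and $\mathcal{H}^{c,\mathrm{odd}}_t$ are connected by the requirement of local gauge invariance of the Skyrme energy functional [63].

$^2$ Some authors call $\mathbf{J}(\mathbf{r})$ spin density, which is ambiguous and confusing when discussing the complete energy density functional including terms that contain the time-odd $\mathbf{s}(\mathbf{r})$.

3. A zero-range spin-orbit force

The spin-orbit force used with most standard Skyrme interactions

$$ v^{LS}(\mathbf{r}) = i W_0 \, (\hat{\boldsymbol{\sigma}}_1 + \hat{\boldsymbol{\sigma}}_2) \cdot \hat{\mathbf{k}}' \times \delta(\mathbf{r}) \, \hat{\mathbf{k}} \tag{16} $$

is a special case of the one proposed by Bell and Skyrme [32, 33]. Again, the corresponding energy functional [59, 62] can be separated into a time-even and a time-odd term

$$ \mathcal{H}^{LS}(\mathbf{r}) = \sum_{t=0,1} \Big[ \mathcal{H}^{LS,\mathrm{even}}_t(\mathbf{r}) + \mathcal{H}^{LS,\mathrm{odd}}_t(\mathbf{r}) \Big] , \tag{17} $$

where

$$
\begin{aligned}
\mathcal{H}^{LS,\mathrm{even}}_t &= A^{\nabla\cdot J}_t \, \rho_t \nabla \cdot \mathbf{J}_t , \\
\mathcal{H}^{LS,\mathrm{odd}}_t &= A^{\nabla\cdot J}_t \, \mathbf{s}_t \cdot \nabla \times \mathbf{j}_t ,
\end{aligned}
\tag{18}
$$

which share the same coupling constant, as again both terms are linked by the local gauge invariance of the energy functional. The relation between $A^{\nabla\cdot J}_t$ and the single coupling constant $W_0$ of the two-body spin-orbit force is given in appendix A.

4.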
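Skyrme’s tensor force

By convention, the tensor operator in the tensor force is constructed using the unit vector in the direction of the relative coordinate $\mathbf{e}_r = \mathbf{r}/|\mathbf{r}|$ and subtracting $\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\boldsymbol{\sigma}}_2$

$$ \hat{S}_{12} = 3 (\hat{\boldsymbol{\sigma}}_1 \cdot \mathbf{e}_r)(\hat{\boldsymbol{\sigma}}_2 \cdot \mathbf{e}_r) - \hat{\boldsymbol{\sigma}}_1 \cdot \hat{\boldsymbol{\sigma}}_2 , \tag{19} $$

such that its mean value vanishes for a relative S state, which decouples the central and tensor channels of the interaction. The operator $\hat{S}_{12}$ commutes with the total spin, $[\hat{S}_{12}, \hat{\mathbf{S}}^2] = 0$, therefore it does not mix partial waves with different spin, i.e. spin singlet and spin triplet states. In particular, it does not act in spin singlet states at all, as $\hat{S}_{12} \hat{P}_{S=0} = 0$ (see section 13.6 of Ref. [21]). As a consequence, there is no point in multiplying a tensor force with an exchange operator $(1 + x_t \hat{P}_\sigma)$ as done for the central force, as this will only lead to an overall rescaling of its strength.

The derivation of the general energy functional from a zero-range two-body tensor force is discussed in detail in Refs. [59, 64]. We repeat here the details relevant for our discussion, starting from the two zero-range tensor forces proposed by Skyrme [30, 31]

$$
\begin{aligned}
v^{t}(\mathbf{r}) ={}& \tfrac{1}{2} t_e \Big\{ \big[ 3 (\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\mathbf{k}}')(\hat{\boldsymbol{\sigma}}_2 \cdot \hat{\mathbf{k}}') - (\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\boldsymbol{\sigma}}_2) \hat{\mathbf{k}}'^{\,2} \big] \delta(\mathbf{r}) + \delta(\mathbf{r}) \big[ 3 (\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\mathbf{k}})(\hat{\boldsymbol{\sigma}}_2 \cdot \hat{\mathbf{k}}) - (\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\boldsymbol{\sigma}}_2) \hat{\mathbf{k}}^{2} \big] \Big\} \\
& + t_o \Big\{ 3 (\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\mathbf{k}}') \, \delta(\mathbf{r}) \, (\hat{\boldsymbol{\sigma}}_2 \cdot \hat{\mathbf{k}}) - (\hat{\boldsymbol{\sigma}}_1 \cdot \hat{\boldsymbol{\sigma}}_2) \, \hat{\mathbf{k}}' \cdot \delta(\mathbf{r}) \, \hat{\mathbf{k}} \Big\} ,
\end{aligned}
\tag{20}
$$

where $\mathbf{r}$, $\hat{\mathbf{k}}$ and $\hat{\mathbf{k}}'$ are defined as above, Eqs. (11) and (12). The corresponding energy density functional can again be decomposed into a time-even and a time-odd part

$$ \mathcal{H}^{t}(\mathbf{r}) = \sum_{t=0,1} \Big[ \mathcal{H}^{t,\mathrm{even}}_t(\mathbf{r}) + \mathcal{H}^{t,\mathrm{odd}}_t(\mathbf{r}) \Big] \tag{21} $$

with [59]

$$
\begin{aligned}
\mathcal{H}^{t,\mathrm{even}}_t &= -B^{T}_t \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\mu\nu} - \tfrac{1}{2} B^{F}_t \bigg[ \Big( \sum_{\mu=x}^{z} J_{t,\mu\mu} \Big)^{2} + \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\nu\mu} \bigg] , \\
\mathcal{H}^{t,\mathrm{odd}}_t &= B^{T}_t \, \mathbf{s}_t \cdot \mathbf{T}_t + B^{F}_t \, \mathbf{s}_t \cdot \mathbf{F}_t + B^{\Delta s}_t \, \mathbf{s}_t \cdot \Delta\mathbf{s}_t + B^{\nabla s}_t \, (\nabla \cdot \mathbf{s}_t)^{2} ,
\end{aligned}
\tag{22}
$$

where we already used the local gauge invariance of the energy functional [59] for the expressions of the coupling constants. The actual expressions for the coupling constants in terms of the two coupling constants $t_e$ and $t_o$ of the tensor forces are given in appendix A.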
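The algebraic properties of the tensor operator quoted around Eq. (19) can be checked numerically in the four-dimensional two-spin space. The following is a minimal sketch, not part of the original work: it assumes NumPy, uses an arbitrarily chosen direction for $\mathbf{e}_r$, and the helper names (`dot`, `s12`) are hypothetical. The angular average over a relative S state is represented here by the average over the three cartesian axes, which reproduces $\langle e_\mu e_\nu \rangle = \delta_{\mu\nu}/3$.

```python
import numpy as np

# Pauli matrices and the 2x2 identity
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
one = np.eye(2, dtype=complex)
pauli = (sx, sy, sz)

# sigma_1 and sigma_2 acting on the two-particle spin space (dimension 4)
sig1 = [np.kron(s, one) for s in pauli]
sig2 = [np.kron(one, s) for s in pauli]


def dot(a, b):
    """Scalar product of two vector operators, sum_mu a_mu b_mu."""
    return sum(x @ y for x, y in zip(a, b))


def s12(e):
    """Tensor operator of Eq. (19) for a unit vector e (hypothetical helper)."""
    s1e = sum(c * s for c, s in zip(e, sig1))
    s2e = sum(c * s for c, s in zip(e, sig2))
    return 3.0 * s1e @ s2e - dot(sig1, sig2)


# Total spin squared, with S = (sigma_1 + sigma_2) / 2
S = [(a + b) / 2 for a, b in zip(sig1, sig2)]
S2 = dot(S, S)

# Projector on the spin-singlet state, P_{S=0} = (1 - sigma_1 . sigma_2) / 4
P0 = (np.eye(4) - dot(sig1, sig2)) / 4

# Arbitrary (illustrative) direction of the relative coordinate
e = np.array([1.0, 2.0, -0.5])
e /= np.linalg.norm(e)
T = s12(e)

commutator_norm = np.linalg.norm(T @ S2 - S2 @ T)   # [S12, S^2] = 0
singlet_norm = np.linalg.norm(T @ P0)               # S12 P_{S=0} = 0
avg = sum(s12(axis) for axis in np.eye(3)) / 3      # average over x, y, z axes

print(commutator_norm, singlet_norm, np.linalg.norm(avg))
```

All three printed norms should vanish to machine precision, which illustrates why an exchange factor $(1 + x_t \hat{P}_\sigma)$ can only rescale a tensor force.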
The “even” term proportional to $t_e$ in the two-body tensor force (20) mixes relative S and D waves, while the “odd” term proportional to $t_o$ mixes relative P and F waves. Thus, due to the fact that both act in spin-triplet states only, antisymmetrization implies that the former acts in isospin-singlet states (and hence contributes to the neutron-proton interaction only) and the latter in isospin-triplet states (contributing both to the like-particle and neutron-proton interactions). The central and spin-orbit interactions as we use them, however, do not contain D or F wave interactions. From this point of view, one might suspect a mismatch when combining the various interaction terms. From the point of view of the energy functional (22), however, all contributions from the zero-range tensor force are of the same second order in derivatives as the contributions from the non-local part of the central Skyrme force (15) and from the spin-orbit force (18).

In the time-even part of the energy functional $\mathcal{H}^{t,\mathrm{even}}$ there appear three different combinations of the cartesian components of the spin current tensor. The term proportional to $B^{T}_t$ contains the symmetric combination $J_{\mu\nu} J_{\mu\nu}$ as it already appeared in the energy functional from the central Skyrme interaction (15), while the term proportional to $B^{F}_t$ contains two different terms, namely the antisymmetric combination $J_{\mu\nu} J_{\nu\mu}$ and the square of the trace of $J_{\mu\nu}$.

5. Combining central and tensor interactions

The Skyrme energy functional representing central, tensor, and spin-orbit interactions is given by

$$
\begin{aligned}
E_{\mathrm{Skyrme}} ={}& E_{c} + E_{LS} + E_{t} \\
={}& \int d^3r \sum_{t=0,1} \bigg\{ C^{\rho}_t[\rho_0] \, \rho_t^2 + C^{s}_t[\rho_0] \, \mathbf{s}_t^2 + C^{\Delta\rho}_t \, \rho_t \Delta\rho_t + C^{\nabla s}_t \, (\nabla \cdot \mathbf{s}_t)^2 + C^{\Delta s}_t \, \mathbf{s}_t \cdot \Delta\mathbf{s}_t \\
& + C^{\tau}_t \, (\rho_t \tau_t - \mathbf{j}_t^2) + C^{T}_t \Big( \mathbf{s}_t \cdot \mathbf{T}_t - \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\mu\nu} \Big) \\
& + C^{F}_t \bigg[ \mathbf{s}_t \cdot \mathbf{F}_t - \tfrac{1}{2} \Big( \sum_{\mu=x}^{z} J_{t,\mu\mu} \Big)^{2} - \tfrac{1}{2} \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\nu\mu} \bigg] + C^{\nabla\cdot J}_t \, (\rho_t \nabla \cdot \mathbf{J}_t + \mathbf{s}_t \cdot \nabla \times \mathbf{j}_t) \bigg\} .
\end{aligned}
$$
(23)

This functional contains all possible bilinear terms up to second order in the derivatives that can be constructed from local densities and that are invariant under spatial and time inversion, rotations, and local gauge transformations [59].

Some of the coupling constants are completely defined by the standard central Skyrme force, i.e. $C^{\rho}_t = A^{\rho}_t$, $C^{s}_t = A^{s}_t$, $C^{\tau}_t = A^{\tau}_t$, and $C^{\Delta\rho}_t = A^{\Delta\rho}_t$, two by the spin-orbit force, $C^{\nabla J}_t = A^{\nabla J}_t$, others by the tensor force, $C^{F}_t = B^{F}_t$ and $C^{\nabla s}_t = B^{\nabla s}_t$, while some are the sum of coupling constants from both central and tensor forces, $C^{T}_t = A^{T}_t + B^{T}_t$ and $C^{\Delta s}_t = A^{\Delta s}_t + B^{\Delta s}_t$.

The three terms bilinear in $J_{\mu\nu}$ can be recoupled into terms bilinear in its pseudoscalar, vector, and pseudotensor components $J^{(0)}$, $J^{(1)}$, and $J^{(2)}$, Eq. (8), which is preferred by some authors [59]

$$ \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\mu\nu} = \tfrac{1}{3} \big( J^{(0)}_t \big)^2 + \tfrac{1}{2} \big( \mathbf{J}^{(1)}_t \big)^2 + \sum_{\mu,\nu=x}^{z} J^{(2)}_{t,\mu\nu} J^{(2)}_{t,\mu\nu} , \tag{24} $$

$$ \tfrac{1}{2} \Big( \sum_{\mu=x}^{z} J_{t,\mu\mu} \Big)^{2} + \tfrac{1}{2} \sum_{\mu,\nu=x}^{z} J_{t,\mu\nu} J_{t,\nu\mu} = \tfrac{2}{3} \big( J^{(0)}_t \big)^2 - \tfrac{1}{4} \big( \mathbf{J}^{(1)}_t \big)^2 + \tfrac{1}{2} \sum_{\mu,\nu=x}^{z} J^{(2)}_{t,\mu\nu} J^{(2)}_{t,\mu\nu} . \tag{25} $$

After combining (23) with the kinetic, Coulomb, pairing and other contributions from (1), the mean-field equations are obtained by standard functional derivative techniques from the total energy functional [29, 59].

The complete Skyrme energy functional (23) has quite a complicated structure, and in the most general case leads to seven distinct mean fields in the single-particle Hamiltonian [59]. As already mentioned, we want to divide the examination of those terms that contain two derivatives and two Pauli matrices in the complete functional, i.e. those terms from the central Skyrme force that are often neglected and all the terms from the tensor Skyrme force, into three distinct steps: First, in the present paper, we enforce spherical symmetry, which removes all time-odd densities and all but one out of the nine components of the spin current tensor $J_{\mu\nu}$, as will be outlined in the following section.
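The recoupling of the spin-current tensor, Eqs. (7) and (8), and the two bilinear identities (24) and (25) can be verified numerically for an arbitrary real 3×3 matrix. The sketch below is an illustration added here, not the authors' code; it assumes NumPy, and the function names (`decompose`, `recompose`) and the random test tensor are hypothetical.

```python
import numpy as np


def levi_civita():
    """Rank-3 Levi-Civita tensor epsilon_{ijk}."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0    # even permutations
        eps[i, k, j] = -1.0   # odd permutations
    return eps


EPS = levi_civita()


def decompose(J):
    """Pseudoscalar, vector and traceless symmetric parts, as in Eq. (8)."""
    J0 = np.trace(J)
    J1 = np.einsum('kmn,mn->k', EPS, J)
    J2 = 0.5 * (J + J.T) - np.eye(3) * J0 / 3.0
    return J0, J1, J2


def recompose(J0, J1, J2):
    """Rebuild the cartesian tensor from its parts, as in Eq. (7)."""
    return np.eye(3) * J0 / 3.0 + 0.5 * np.einsum('mnk,k->mn', EPS, J1) + J2


rng = np.random.default_rng(0)
J = rng.normal(size=(3, 3))          # arbitrary test tensor at one point r
J0, J1, J2 = decompose(J)

# Eq. (24): sum_{mu nu} J_{mu nu} J_{mu nu}
lhs24 = np.sum(J * J)
rhs24 = J0**2 / 3 + 0.5 * J1 @ J1 + np.sum(J2 * J2)

# Eq. (25): (1/2) (trace J)^2 + (1/2) sum_{mu nu} J_{mu nu} J_{nu mu}
lhs25 = 0.5 * np.trace(J) ** 2 + 0.5 * np.sum(J * J.T)
rhs25 = 2 * J0**2 / 3 - 0.25 * J1 @ J1 + 0.5 * np.sum(J2 * J2)

print(np.allclose(recompose(J0, J1, J2), J), lhs24 - rhs24, lhs25 - rhs25)
```

Setting $J^{(0)} = J^{(2)} = 0$ in these identities shows directly how only the $\mathbf{J}^2$ combination of the vector part survives when the other components vanish.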
A subsequent paper [41] will discuss deformed even-even nuclei where the complete spin current tensor $J_{\mu\nu}$ is present, and future work will address the time-odd part of the energy functional (23).

C. The Skyrme energy functional in spherical symmetry

For the rest of this paper, we will concentrate on spherical nuclei, enforcing spherical symmetry of the N-body wave functions. As a consequence, the canonical single-particle wave functions $\Psi_i$ [65] can be labeled by $j_i$, $\ell_i$ and $m_i$. The index $n_i$ labels the different states with the same $j_i$ and $\ell_i$. The functions $\Psi_i$ separate into a radial part $\psi$ and an angular and spin part, represented by a tensor spherical harmonic $\Omega_{j\ell m}$

$$ \Psi_{nj\ell m}(\mathbf{r}) = \frac{1}{r} \, \psi_{nj\ell}(r) \, \Omega_{j\ell m}(\theta, \phi) . \tag{26} $$

Spherical symmetry also enforces that all magnetic substates of $\Psi_{nj\ell m}$ have the same occupation probability $v^2_{nj\ell m} \equiv v^2_{nj\ell}$ for all $-j \leq m \leq j$.

For a static spherical state, all time-odd densities are zero, $\mathbf{s}_q(\mathbf{r}) = \mathbf{T}_q(\mathbf{r}) = \mathbf{j}_q(\mathbf{r}) = \mathbf{F}_q(\mathbf{r}) = 0$, as are the corresponding mean fields in the single-particle Hamiltonian.

Enforcing spherical symmetry also greatly simplifies the spin-current tensor: both the pseudoscalar and pseudotensor parts of $J_{\mu\nu}$ vanish. From the vector spin-orbit current, only the radial component is non-zero, which is given by [36]

$$ J_q(r) = \frac{1}{4\pi r^3} \sum_{n,j,\ell} (2j + 1) \, v^2_{nj\ell} \Big[ j(j + 1) - \ell(\ell + 1) - \tfrac{3}{4} \Big] \psi^2_{nj\ell}(r) , \tag{27} $$

so that there is only one out of the nine components of the spin-current tensor density that contributes in spherical nuclei. Unlike the total density $\rho$ and the kinetic density $\tau$, which are bulk properties of the nucleus and grow with the size of the nucleus, the spin-orbit current is a shell effect that shows strong fluctuations. Assume two shells with the same $n$ and $\ell$ which are split by the spin-orbit interaction, one coupled with the spin to $j = \ell + \tfrac{1}{2}$, the other to $j = \ell - \tfrac{1}{2}$.
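For such a pair of spin-orbit partners, the angular weight that multiplies the squared radial wave function in Eq. (27) can be evaluated in closed form. The short calculation below is a worked check added for illustration; it assumes full occupation, $v^2 = 1$, of both shells.

$$
\begin{aligned}
j = \ell + \tfrac{1}{2}: \quad & (2j + 1) \Big[ j(j + 1) - \ell(\ell + 1) - \tfrac{3}{4} \Big] = (2\ell + 2) \, \ell = +2 \, \ell (\ell + 1) , \\
j = \ell - \tfrac{1}{2}: \quad & (2j + 1) \Big[ j(j + 1) - \ell(\ell + 1) - \tfrac{3}{4} \Big] = 2\ell \, \big[ -(\ell + 1) \big] = -2 \, \ell (\ell + 1) .
\end{aligned}
$$

The two weights are equal in magnitude and opposite in sign for every $\ell$, so their sum vanishes whenever the two radial wave functions coincide.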
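It is easy to verify that their contributions to $J_q(r)$ are equal but of opposite signs, such that they cancel when (i) both shells are completely filled and (ii) their radial wave functions are identical, $\psi_{n,\ell+1/2,\ell} = \psi_{n,\ell-1/2,\ell}$. Although the latter condition is never exactly fulfilled, this demonstrates that the spin-orbit current is not a bulk property, but a shell effect that strongly fluctuates with N and Z. It nearly vanishes in so-called spin-saturated nuclei, where all spin-orbit partners are either completely occupied or empty, and it might be quite large when only the $j = \ell + 1/2$ level out of one or even several pairs of spin-orbit partners is filled.

Altogether, the Skyrme part of the energy density functional in spherical nuclei is reduced to

$$ \mathcal{H}_{\mathrm{Skyrme}} = \sum_{t=0,1} \Big\{ C^{\rho}_t[\rho_0] \, \rho_t^2 + C^{\Delta\rho}_t \, \rho_t \Delta\rho_t + C^{\tau}_t \, \rho_t \tau_t + \tfrac{1}{2} C^{J}_t \, \mathbf{J}_t^2 + C^{\nabla J}_t \, \rho_t \nabla \cdot \mathbf{J}_t \Big\} , \tag{28} $$

where we have introduced an effective coupling constant $C^{J}_t$ of the $\mathbf{J}_t^2$ tensor terms at sphericity, such that the corresponding contribution to the energy functional is given by

$$ \mathcal{H}^{t} = \sum_{t=0,1} \tfrac{1}{2} C^{J}_t \, \mathbf{J}_t^2 = \sum_{t=0,1} \Big( -\tfrac{1}{2} C^{T}_t + \tfrac{1}{4} C^{F}_t \Big) \mathbf{J}_t^2 . \tag{29} $$

The effective coupling constants can be separated back into contributions from the non-local central and tensor forces

$$ C^{J}_t = A^{J}_t + B^{J}_t , \tag{30} $$

which are given by

$$
\begin{aligned}
A^{J}_0 &= \tfrac{1}{8} t_1 \big( \tfrac{1}{2} - x_1 \big) - \tfrac{1}{8} t_2 \big( \tfrac{1}{2} + x_2 \big) , \\
A^{J}_1 &= \tfrac{1}{16} (t_1 - t_2) , \\
B^{J}_0 &= \tfrac{5}{16} (t_e + 3 t_o) = \tfrac{5}{48} (T + 3U) , \\
B^{J}_1 &= \tfrac{5}{16} (t_o - t_e) = \tfrac{5}{48} (U - T) ,
\end{aligned}
\tag{31}
$$

where we also give the expressions using the notation $T = 3 t_e$ and $U = 3 t_o$ employed in [37, 49, 64]. For the following discussion it will also be illuminating to recouple this expression to a representation that uses proton and neutron densities, where we use the notation introduced in Ref. [37]

$$ \mathcal{H}^{t} = \tfrac{1}{2} \alpha \, (\mathbf{J}_n^2 + \mathbf{J}_p^2) + \beta \, \mathbf{J}_n \cdot \mathbf{J}_p , \tag{32} $$

$$ \alpha = C^{J}_0 + C^{J}_1 , \quad \beta = C^{J}_0 - C^{J}_1 , \qquad C^{J}_0 = \tfrac{1}{2} (\alpha + \beta) , \quad C^{J}_1 = \tfrac{1}{2} (\alpha - \beta) . \tag{33} $$

The proton-neutron coupling constants $\alpha = \alpha_C + \alpha_T$ and $\beta = \beta_C + \beta_T$ can again be separated into contributions from central and tensor forces

$$
\begin{aligned}
\alpha_C &= \tfrac{1}{8} (t_1 - t_2) - \tfrac{1}{8} (t_1 x_1 + t_2 x_2) , \\
\beta_C &= -\tfrac{1}{8} (t_1 x_1 + t_2 x_2) , \\
\alpha_T &= \tfrac{5}{4} t_o = \tfrac{5}{12} U , \\
\beta_T &= \tfrac{5}{8} (t_e + t_o) = \tfrac{5}{24} (T + U) .
\end{aligned}
$$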
(34)

As could be expected, the isospin-singlet tensor force contributes only to the proton-neutron term, while the isospin-triplet tensor force contributes to both.

The spin-orbit potential of the neutrons is given by

$$ \mathbf{W}_n(r) = \frac{\delta E}{\delta \mathbf{J}_n(r)} = \frac{W_0}{2} \big( 2 \nabla\rho_n + \nabla\rho_p \big) + \alpha \, \mathbf{J}_n + \beta \, \mathbf{J}_p . \tag{35} $$

The expression for the protons is obtained by exchanging the indices for protons and neutrons. In spherical symmetry, the tensor force gives a contribution to the spin-orbit potential, but does not alter the structure of the spin-orbit terms in the single-particle Hamiltonian as such. This will be different in the case of deformed mean fields [41, 59].

The dependence of the spin-orbit potential $W_q(r)$ on the spin-orbit current $J_q(r)$ through the tensor terms is the source of a potential instability. When the spin-orbit splitting becomes larger than the splitting of the centroids of single-particle states with different orbital angular momentum ℓ, the reordering of levels might increase the number of spin-unsaturated levels, which increases the spin-orbit current $J_n$ and feeds back on the spin-orbit potential by increasing it even further, which ultimately leads to an unphysical shell structure. An example will be given in appendix B.

D. A brief history of tensor terms in the central Skyrme energy functional

For the interpretation of the parameterizations we will describe below, it is important to point out that, within our choice of the effective Skyrme interaction as an antisymmetrized vertex, the two coupling constants of the contribution from the central force to $\mathcal{H}^{t}$, Eq. (29), either represented through $A^{J}_0$, $A^{J}_1$ or through $\alpha_C$, $\beta_C$, are not independent from the coupling constants $A^{\tau}_0$, $A^{\tau}_1$, $A^{\Delta\rho}_0$ and $A^{\Delta\rho}_1$ that appear in Eq. (28). Through the expressions given in appendix A, all six of them are determined by the four coupling constants $t_1$, $x_1$, $t_2$, and $x_2$ from the central Skyrme force, Eq. (10).
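The bookkeeping between the force parameters and the effective $\mathbf{J}^2$ coupling constants, Eqs. (30) to (34), can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the numerical prefactors are those commonly quoted for the zero-range Skyrme central and tensor forces and should be checked against appendix A, the class and function names are hypothetical, and the parameter values in the example are invented for the consistency check and do not correspond to any published parameterization.

```python
import math
from dataclasses import dataclass


@dataclass
class SkyrmeParams:
    """Non-local central and tensor parameters (units MeV fm^5 for t1, t2, te, to)."""
    t1: float
    x1: float
    t2: float
    x2: float
    te: float = 0.0
    to: float = 0.0


def alpha_beta(p):
    """Like-particle (alpha) and proton-neutron (beta) couplings, Eq. (34).

    Prefactors are assumed standard values, to be checked against appendix A.
    """
    alpha_c = (p.t1 - p.t2) / 8 - (p.t1 * p.x1 + p.t2 * p.x2) / 8
    beta_c = -(p.t1 * p.x1 + p.t2 * p.x2) / 8
    alpha_t = 5 * p.to / 4
    beta_t = 5 * (p.te + p.to) / 8
    return alpha_c + alpha_t, beta_c + beta_t


def cj_from_alpha_beta(alpha, beta):
    """Isoscalar and isovector couplings C^J_0, C^J_1 from Eq. (33)."""
    return 0.5 * (alpha + beta), 0.5 * (alpha - beta)


def cj_direct(p):
    """C^J_t = A^J_t + B^J_t evaluated directly, Eqs. (30) and (31)."""
    a0 = p.t1 * (0.5 - p.x1) / 8 - p.t2 * (0.5 + p.x2) / 8
    a1 = (p.t1 - p.t2) / 16
    b0 = 5 * (p.te + 3 * p.to) / 16
    b1 = 5 * (p.to - p.te) / 16
    return a0 + b0, a1 + b1


# Invented values, used only to check that both routes agree
example = SkyrmeParams(t1=480.0, x1=-0.3, t2=-550.0, x2=-0.9, te=100.0, to=-50.0)
alpha, beta = alpha_beta(example)
c0, c1 = cj_from_alpha_beta(alpha, beta)
c0_direct, c1_direct = cj_direct(example)
print(alpha, beta, c0, c1)

# Without tensor force and with x1 = x2 = 0, the proton-neutron term vanishes
alpha_early, beta_early = alpha_beta(SkyrmeParams(t1=480.0, x1=0.0, t2=-550.0, x2=0.0))
```

The last two lines reproduce the statement that a central force constrained to $x_1 = x_2 = 0$ yields a pure like-particle $\mathbf{J}^2$ term, $\beta = 0$.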
As a consequence, a tensor force is absolutely necessary to decouple the values of the $C^{J}_t$ from those of the $C^{\tau}_t$ and $C^{\Delta\rho}_t$, which determine the isoscalar and isovector effective masses and give the dominant contribution to the surface and surface asymmetry coefficients, respectively.

This interpretation of the Skyrme interaction is, however, far from being common practice and a source of confusion and potential inconsistencies in the literature. Many authors have used parameterizations of the central and spin-orbit Skyrme energy functional with coupling constants that in one way or the other do not exactly correspond to the functional obtained from Eqs. (10) and (16), which, depending on the point of view, can be seen as an approximation to or a generalization of the original Skyrme interaction. As the most popular modification concerns the tensor terms, a few comments on the subject are in order. Again, the practice goes back to the seminal paper by Vautherin and Brink [36], who state that “the contribution of this term to [the spin-orbit potential] is quite small. Since it is difficult to include such a term in the case of deformed nuclei, it has been neglected”. This choice was further motivated by the interpretation of the effective Skyrme interaction as a density-matrix expansion (DME) [25, 66, 67, 68]. All early parameterizations such as SI and SII [36], SIII-SVI [5], SkM [69] and SkM∗ [70] followed this example and did not contain the $\mathbf{J}^2$ terms. Beiner et al. [5] weakened the case for $\mathbf{J}^2$ terms further by pointing out that they might lead to unphysical single-particle spectra. During the 1980s and later, however, it became more popular to include them, for example in SkP [65], the parameterizations T1-T9 by Tondeur et al. [71], and $E_\sigma$ and $Z_\sigma$ by Friedrich and Reinhard [72]. Some of the recent parameterizations come in pairs, where variants without and with $\mathbf{J}^2$ terms are fitted within the same fit protocol, for example (SLy4, SLy5) and (SLy6, SLy7) in Ref.
[52], or (SkO, SkO’) in Ref. [73].

Interestingly, all but one parameterization of the central Skyrme interaction found in the literature set the coupling constants of the $\mathbf{J}^2$ terms either to their Skyrme force value (A1) or strictly to zero. The exception is Ref. [38] by Tondeur, where an independent fit of the coupling constants of the $\mathbf{J}^2$ terms was attempted, making explicit reference to a DME interpretation of the energy functional.

Setting the coupling constants of a term to zero when one does not know how to adjust its parameters is of course an acceptable practice when permitted by the chosen framework. For Skyrme interactions fitted without the $\mathbf{J}^2$ terms, the situation becomes confusing when one looks at deformed nuclei and any situation that breaks time-reversal invariance. First of all, Galilean invariance of the energy functional dictates that the coupling constant of the $\mathbf{s} \cdot \mathbf{T}$ terms is also set to zero, as already indicated by the presentation of the energy functional in Eq. (23). Second, using a DME interpretation of the Skyrme energy functional in one place, but the interrelations from the two-body Skyrme force in all others, is not entirely satisfactory. Many authors who drop the $\mathbf{J}^2$ terms show few scruples about keeping most of the time-odd terms in the Skyrme energy functional (23) with coupling constants $A^{s}_t$ and $A^{\Delta s}_t$ from (A1), although they are not at all constrained in the common fit protocols employing properties of even-even nuclei and spin-saturated nuclear matter. For a list of exceptions see Sect. II.A.2.d of Ref. [29]. An alternative is to set up a hierarchy of terms, as was attempted by Bonche, Flocard and Heenen in their mean-field and beyond codes, which set $A^{\Delta s}_t = 0$ in addition to the coupling constant of the $\mathbf{J}^2$ terms, as all three terms have in common that they couple two Pauli matrices with two derivatives in different manners; see the footnote on page 129 of [74].
There are also inconsistent applications of parameterizations without $\mathbf{J}^2 - \mathbf{s} \cdot \mathbf{T}$ terms to be found in the literature. For example, almost all applications of Skyrme interactions to the Landau parameters $g_\ell$ and $g'_\ell$ and the properties of polarized nuclear matter include the contribution from the $\mathbf{s} \cdot \mathbf{T}$ terms, although it should be dropped for parameterizations fitted without $\mathbf{J}^2$ terms. Similarly, most RPA and QRPA codes include them for simplicity; see the discussion in Refs. [75, 76, 77].

As it is relevant for the subject of the present paper, we also mention another generalization of the Skyrme interaction that invokes the interpretation of the Skyrme energy functional in a DME framework. The spin-orbit force (16) fixes the isospin mix of the corresponding terms in the Skyrme energy functional (23) such that $A^{\nabla J}_0 = 3 A^{\nabla J}_1$ (A2). There are a few parameterizations such as MSkA [78], SkI3 and SkI4 [79], SkO and SkO’ [73] and SLy10 [52] that liberate the isospin degree of freedom in the spin-orbit functional. A DME interpretation of the energy functional is mandatory for this generalization. It is motivated by the better performance of standard relativistic mean-field models for the kink of the charge radii in Pb isotopes. Note that the standard RMF models are effective Hartree theories without exchange terms, and that the standard Lagrangians have very limited isovector degrees of freedom [29], both of which suppress a strong isospin dependence of the spin-orbit interaction. It is interesting to note that the existing fits of Skyrme energy functionals with generalized spin-orbit interaction do not improve spin-orbit splittings [14].

III. THE FITS

A. General remarks

In order to study the effect of the $\mathbf{J}^2$ terms, we have built a set of 36 effective interactions that systematically cover the region of coupling constants $C^{J}_0$ and $C^{J}_1$ that give a reasonable description of finite nuclei in connection with the standard central and spin-orbit Skyrme forces.
At variance with the perturbative approach used in Refs. [37, 49], each of these parameterizations has been fitted separately, following a procedure nearly identical to that used for the construction of the SLy parameterizations [51, 52], so that we can keep the connection between the new fits and parameterizations that have been applied to a large variety of observables and phenomena. The Saclay-Lyon fit protocol focuses on the simultaneous reproduction of nuclear bulk properties such as binding energies and radii of finite nuclei and the empirical characteristics of infinite nuclear matter (i.e. symmetric and pure neutron matter). The latter establishes an important, though highly idealized, limiting case as it permits one to confront the energy functional with calculations from first principles using the bare nucleon-nucleon force [80].

The region of effective coupling constants $(C^{J}_0, C^{J}_1)$ of the $\mathbf{J}^2$ terms acting in spherical nuclei, as defined in Eq. (28), that we will explore is shown in Fig. 1. The parameterizations are labeled TIJ, where the indices I and J refer to the proton-neutron (β) and like-particle (α) coupling constants in Eq. (32) such that

$$ \alpha = 60 \, (J - 2) \ \mathrm{MeV\,fm^5} , \qquad \beta = 60 \, (I - 2) \ \mathrm{MeV\,fm^5} . \tag{36} $$

The corresponding values of $C^{J}_t$ can be obtained through Eq. (33) or from Fig. 1. On the one hand, we cover the positions of the most popular existing Skyrme interactions that take the $\mathbf{J}^2$ terms from the central force into account, which are SLy5 [52], SkP [65], $Z_\sigma$ [72], T6 [71], SkO’ [73] and BSk9 [81]. On the other hand, among recent parameterizations including a tensor term, i.e. Skxta [48], Skxtb [48, 82] as well as those published by Colò et al. [49] and Brink and Stancu [50], most fall in a region of negative $C^{J}_1$ and vanishing $C^{J}_0$, that is to the lower left of Fig. 1. Parameterizations of this region, which also includes a part of the triangle advocated in the perturbative study of Stancu et al.
[37], gave unsatisfactory results for many observables. Moreover, when attempting to fit parameterizations with large negative coupling constants, we sometimes obtained unrealistic single-particle spectra or even ran into the instabilities already mentioned and outlined in appendix B. Parameterizations further to the lower and upper right also have unrealistic deformation properties. The contribution from the $\mathbf{J}^2$ terms vanishes for T22, which will serve as the reference point. For the parameterizations T2J, only the proton-proton and neutron-neutron terms in $\mathcal{H}^{t}$ are non-zero (β = 0), while for the parameterizations TI2, only the proton-neutron term in $\mathcal{H}^{t}$ contributes (α = 0). Note that the earlier parameterizations T6 and $Z_\sigma$ have pure like-particle $\mathbf{J}^2$ terms as a consequence of the constraint $x_1 = x_2 = 0$ employed for both (and most other early parameterizations of Skyrme’s interaction).

B. The fit protocol and procedure

The list of observables used to construct the cost function $\chi^2$ minimized during the fit (see Eq. (4.1) in Ref. [51]) reads as follows: binding energies and charge radii of $^{40}$Ca, $^{48}$Ca, $^{56}$Ni, $^{90}$Zr, $^{132}$Sn and $^{208}$Pb; the binding energy of $^{100}$Sn; the spin-orbit splitting of the neutron 3p state in $^{208}$Pb; the empirical energy per particle and density at the saturation point of symmetric nuclear matter; and finally, the equation of state of neutron matter as predicted by Wiringa et al. [16].

Furthermore, some properties of infinite nuclear matter are constrained through analytic relations between

[Plot for Fig. 1: axis $C^{J}_0$ [MeV fm$^5$]; labeled points include BSk9, T6, Skxta, Skxtb and Brink.]

FIG. 1: Values of $C^{J}_0$ and $C^{J}_1$ for our set of parameterizations (circles). Diagonal lines indicate $\alpha = C^{J}_0 + C^{J}_1 = 0$ (pure neutron-proton coupling) and $\beta = C^{J}_0 - C^{J}_1 = 0$ (pure like-particle coupling). Values for classical parameter sets are also indicated (dots), with SLy4 representing all parameterizations for which $\mathbf{J}^2$ terms have been omitted in the fit.
Recent parameterizations with tensor terms are indicated by squares.

coupling constants in the same manner as they were in Refs. [51, 52]: the incompressibility modulus $K_\infty$ is kept at 230 MeV, while the volume symmetry energy coefficient $a_\tau$ is set to 32 MeV. The isovector effective mass, expressed through the Thomas-Reiche-Kuhn sum rule enhancement factor $\kappa_v$, is taken such that $\kappa_v = 0.25$.

When using a single density-dependent term in the central Skyrme force (10), the isoscalar effective mass $m^*_0$ cannot be chosen independently from the incompressibility modulus for a given exponent α of $\rho_0$. We follow here the prescription used for the SLy parameterizations [51, 52] and use α = 1/6, which leads to an isoscalar effective mass close to 0.7 in units of the bare nucleon mass for all TIJ parameterizations. This value allows for a correct description of dynamical properties, as for example the energy of the giant quadrupole resonance [83]. Using such a protocol we cannot reproduce an isovector effective mass consistent with recent ab-initio predictions [84]. For the present exploratory study of the tensor terms this is not a critical limitation, in particular as the influence of this quantity on static properties of finite nuclei turns out to be small.

There are three modifications of the fit protocol compared to [51, 52]. The obvious one is that the values for $C^{J}_0$ and $C^{J}_1$ are fixed beforehand as the parameters that will later on label and classify the fits. The second is that we have added the binding energies of $^{90}$Zr and $^{100}$Sn to the set of data. Indeed, we observed that the latter nucleus is usually significantly overbound when not

[Plot for Fig. 2: axes α [MeV fm$^5$] and β [MeV fm$^5$]; the label “(T11)” marks that parameterization.]

FIG. 2: Values of the cost function $\chi^2$ as defined in the fit procedure, for the set of parameterizations TIJ. The label “T11” indicates the position of this parameterization in the (α, β)-plane as obtained from Eqs. (36).
Contour lines are drawn at χ2 = 11, 12, 15, 20, 25, and 30. The minimum value is found for T21 (χ2 = 10.05), the maximum for T61 (χ2 = 37.11). included in the fit. The third is that we have dropped the constraint x2 = −1 that was imposed on the SLy pa- rameterizations [51, 52] to ensure the stability of infinite homogeneous neutron matter against a transition into a ferromagnetic state. On the one hand, this stability criterion is completely determined by the coupling con- stants of the time-odd terms in the energy functional [76], that we do not want to constrain here, accepting that the parameterizations might be of limited use beyond the present study. On the other hand, the tensor force brings many new contributions to the energy per parti- cle of polarized nuclear matter that lead to a much more complex stability criterion. We postpone the entire dis- cussion concerning the stability in polarized systems in the presence of a tensor force to future work that will also address finite-size instabilities [84]. It also has to be stressed that the actual stability criterion, as all proper- ties of the time-odd part of the Skyrme energy functional, depends on the choices made for the interpretation of its coupling constants, i.e. antisymmetrized vertex or den- sity functional [76]. The properties of the finite nuclei entering the fit are computed using a Slater determinant without taking pairing into account. The cost function χ2 was mini- mized using a simulated annealing algorithm. The an- nealing schedule was an exponential one, with a charac- teristic time of 200 iterations (also referred to as “simu- lated quenching”) Thus, assuming a reasonably smooth cost function, we strive to obtain satisfactory convergence to its absolute minimum in a single run, allowing a sys- tematic and straightforward production of a large series of forces. The coupling constants for all 36 parameteri- zations can be found in the Physical Review archive [85]. 
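
As an illustration of the minimization strategy described above, the following is a minimal sketch of simulated annealing with an exponential cooling schedule, T(i) = T0 exp(−i/τ) with τ = 200 iterations. Only the exponential form and the characteristic time are taken from the text; the Metropolis acceptance rule, the Gaussian single-parameter moves, the initial temperature `t0`, the step width, and the two-parameter function `toy_chi2` are assumptions made purely for illustration and do not represent the actual fit code or cost function.

```python
import math
import random


def anneal(cost, x0, t0=1.0, tau=200.0, n_iter=4000, step=0.5, seed=0):
    """Minimize cost(x) with Metropolis moves and T(i) = t0 * exp(-i / tau).

    tau = 200 mirrors the characteristic time quoted in the text;
    every other setting is a hypothetical choice for this sketch.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = cost(x)
    best_x, best_f = list(x), fx
    for i in range(n_iter):
        temp = t0 * math.exp(-i / tau)  # exponential ("quenching") schedule
        # Propose a move: perturb one randomly chosen parameter.
        trial = list(x)
        k = rng.randrange(len(trial))
        trial[k] += rng.gauss(0.0, step)
        f_trial = cost(trial)
        delta = f_trial - fx
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T), which vanishes as T -> 0.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            x, fx = trial, f_trial
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def toy_chi2(p):
    """Hypothetical smooth cost function with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


start = [5.0, 5.0]
best, value = anneal(toy_chi2, start)
print(best, value)
```

With such a fast schedule the temperature drops by a factor e every 200 iterations, so the search becomes essentially greedy after a few hundred steps. This is adequate only if the cost function is reasonably smooth, which is the assumption stated in the text.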
Figure 2 displays the value of χ² after minimization as a function of the recoupled coupling constants α and β. The first striking feature is the existence of a “valley” at β = 0, i.e. a pure like-particle tensor term ∼ ($\mathbf{J}_n^2 + \mathbf{J}_p^2$). The abrupt rise of χ² around this value can be attributed to the term depending on nuclear binding energies, as sharp variations of energy residuals can be seen between neighboring magic nuclei with functionals of the T6J series (β = 240). For example, 48Ca and 90Zr tend to be significantly overbound in this case. We will come back later to discussing the implications for the quality of the functionals.

FIG. 3: The contributions from the tensor force $B_0^J$ and $B_1^J$ to the effective coupling constants of the J² term at sphericity. Diagonal lines as in Fig. 1. The diagonal where $B_0^J + B_1^J = \alpha_T = 0$ (pure proton-neutron contribution) additionally corresponds to an isospin-singlet force with $t_o \equiv U = 0$.

C. General properties of the fits

The coupling constants of the energy functional for spherical nuclei (28) obtained for T22 are very similar to those of SLy4, except for a slight readjustment coming from the inclusion of the binding energies of 90Zr and 100Sn in the fit as well as the abandoned constraint on x2. With its value of −0.945, the x2 obtained for T22 still stays close to the value −1 enforced for SLy4, which confirms that this is not too severe a constraint for parameterizations without effective J² terms at sphericity. When the effective tensor term coupling constants $C_t^J$ are increased, however, the values for x2 start to deviate strongly from the region around −1, which is to a large extent due to the feedback from the contribution of the J² terms to the surface and surface symmetry energy coefficients in the presence of constraints on isoscalar and isovector effective masses, all of which also depend on x2. A more detailed discussion of the contribution of the J² terms to the surface energy coefficients will be given elsewhere [41].

FIG. 4: Value of spin-orbit coupling constant W0 for each of the parameterizations TIJ, vs. indices I and J (the “(T11)” label indicates the position of this parameterization in the (α, β)-plane). The contour lines differ by 20 MeV fm⁵. The values plotted here range from 103.7 MeV fm⁵ (T11) to 195.3 MeV fm⁵ (T66).

From the constrained coupling constants $C_0^J$ and $C_1^J$ the respective contributions $B_0^J$ and $B_1^J$ from the tensor force can be deduced afterwards using the expressions given in Sect. II C. Their values, shown in Fig. 3, are less regularly distributed, which is a consequence of the non-linear interdependence of all coupling constants. Still, a general trend can be observed, such that all parameterizations are shifted towards the “south-west” compared to Fig. 1. In turn, this indicates that the contribution from the central Skyrme force always stays in the small region outlined by SkP, SLy5, Zσ, etc. in Fig. 1, with values that range between 28 and 104 MeV fm⁵ for $A_0^J$ and between 38 and 62 MeV fm⁵ for $A_1^J$, respectively. This justifies a posteriori the use of the tensor force as a motivation to decouple the $J_t^2$ terms from the central part of the effective Skyrme vertex. We note in passing that all our parameterizations TI4 correspond to an almost pure proton-neutron or isospin-singlet tensor force, i.e. the term ∝ $t_e$ in Eq. (20), as they are all located close to the $\alpha_T = 0$ line.

We also find a particularly strong and systematic variation of the coupling constant W0 of the spin-orbit force, which varies from W0 = 103.7 MeV fm⁵ for T11 to W0 = 195.3 MeV fm⁵ for T66, see Fig. 4. This variation is of course correlated to the strength of the tensor force. As already shown, the tensor force has the tendency to reduce the spin-orbit splittings in spin-unsaturated nuclei.
To maintain a given spin-orbit splitting in such a nucleus, the spin-orbit coupling constant W0 has to be increased.

FIG. 5: (Color online) Radial component of the neutron spin-orbit current for the chain of Ni isotopes, plotted against radius and neutron number N. The solid line on the base plot indicates the radius where the total density has half its saturation value.

IV. RESULTS AND DISCUSSION

The calculations presented below include open-shell nuclei treated in the Hartree-Fock-Bogoliubov (HFB) framework. In the particle-particle channel, we use a zero-range interaction with a mixed surface/volume form factor (called DFTM pairing in Ref. [86]). The HFB equations were regularized with a cutoff at 60 MeV in the quasiparticle equivalent spectrum [87]. The pairing strength was adjusted in 120Sn with the particle-hole mean field calculated using the parameter set T33. The resulting strength was kept at the same value for all parameterizations, which is justified by the fact that the effective mass parameters are the same. Moreover, we thus avoid including, in the adjustment of the pairing strength, local effects linked with changes in details of the single-particle spectrum.

A. Spin-orbit currents and potentials

As a first step in the analysis of the role of the tensor terms and their interplay with the spin-orbit interaction in spherical nuclei, we analyze the spin-orbit current density and its relative contribution to the spin-orbit potential. We choose the chain of nickel isotopes, Z = 28, as it covers the largest number of spherical neutron shells and subshells (N = 20, 28, 40 and 50) of any isotopic chain, two of which are spin-saturated (N = 20 and 40), while the other two are not. Figure 5 displays the radial component of the neutron spin-orbit current Jn for isotopes from the proton to the neutron drip-lines.
The calculations are performed with T44, but the spin-orbit current is fairly independent of the parameterization. Starting from N = 20, which corresponds to a completely filled and spin-saturated sd-shell, the next magic number at N = 28 is reached by filling the 1f7/2 shell, which leads to the steeply rising bump in the plot of Jn in the foreground, peaked around r ≃ 3.5 fm. Then, from N = 28 to N = 40 the rest of the fp shell is filled, which first produces the small bump at small radii that corresponds to the filling of the 2p3/2 shell, but ultimately leads to a vanishing spin-orbit current when the 1f and 2p levels are completely filled for the N = 40 isotope, visible as the deep valley in Fig. 5. Adding more neutrons, the filling of the 1g9/2 shell leads again to a strong neutron spin-orbit current at N = 50. For the remaining isotopes up to the neutron drip line, the evolution of Jn is slower with the filling of the 2d and 3s orbitals.

A few further comments are in order. First, the spin-orbit current clearly reflects the spatial probability distribution of the single-particle wave function in pairs of unsaturated spin-orbit partners. Within a given shell, the high-ℓ states contribute at the surface, represented by the solid line on the base of Fig. 5, while low-ℓ states contribute at the interior. The peak from the high-ℓ orbitals, however, is always located on the inside of the nuclear surface, as defined by the radius of half saturation density. Second, within a given shell, the largest contributions to the spin-orbit current density obviously come from the levels with largest ℓ, as they have the largest degeneracy factors in (27), and because they do not have nodes, which leads to a single, sharply peaked contribution. Third, the spin-orbit current is not exactly zero for nominally “spin-saturated” nuclei, exemplified by the N = 20 and N = 40 isotopes in Fig. 5, as the radial single-particle wave functions are not exactly identical for all pairs of spin-orbit partners, which is a necessary requirement to obtain Jn = 0 at all radii (cf. the example of the ν 2d states in 132Sn in Fig. 16 below). Fourth, pairing and other correlations will always smooth the fluctuations of the spin-orbit current with nucleon numbers, as levels in the vicinity of the Fermi energy will never be completely filled or empty.

Next, we compare the contributions from the tensor terms and from the spin-orbit force to the spin-orbit potentials of protons and neutrons, Eq. (35). The contributions from the tensor force to the spin-orbit potential are proportional to the spin-orbit currents of protons and neutrons. For the Ni isotopes, the proton spin-orbit current is very similar to that of the neutrons at N = 28 displayed in Fig. 5. For the parameterization T44 we use here as an example, we have contributions from both proton and neutron spin-orbit currents, which come with equal weights. Their combined contribution to the spin-orbit potential of the neutron Wn might be as large as 4 MeV, see Fig. 6. This is more than a third of the maximum contribution from the spin-orbit force to Wn, see Fig. 7. The latter is proportional to a combination of the gradients of the proton and neutron densities, 2∇ρn(r) + ∇ρp(r), see Eq. (35). As a consequence, it has a smooth behavior as a function of particle number, with slowly and monotonically varying width, depth and position. Only limited local variations can be seen on the interior due to small variations of the density profile originating from the successive filling of different orbits. Furthermore, one can easily verify that the contribution from the spin-orbit force is peaked at the surface of the nucleus (the solid line on the base plot). The strongest variation of the depth of this potential occurs just before the neutron drip line at N = 62, where it becomes wider and shallower due to the development of a diffuse neutron skin, which reduces the gradient of the neutron density [6, 7, 8].

FIG. 6: (Color online) Contribution from the tensor terms to the neutron spin-orbit potential for the chain of Ni isotopes as obtained with the parameterization T44. The solid line on the base plot indicates the radius where the isoscalar density ρ0 crosses half its saturation value.

FIG. 7: (Color online) Contribution from the spin-orbit force to the neutron spin-orbit potential for the chain of Ni isotopes as obtained with the parameterization T44. The solid line on the base plot indicates the radius where the isoscalar density ρ0 crosses half its saturation value.

FIG. 8: (Color online) Total neutron spin-orbit potential for the chain of Ni isotopes as obtained with the parameterization T44. The solid line on the base plot indicates the radius where the isoscalar density ρ0 crosses half its saturation value.

FIG. 9: (Color online) Total proton spin-orbit potential for the chain of Ni isotopes as obtained with the parameterization T44. The solid line on the base plot indicates the radius where the isoscalar density ρ0 crosses half its saturation value.

Adding the contributions from the proton and neutron tensor terms to that from the spin-orbit force yields the total spin-orbit potential for neutrons in Ni isotopes, shown in Fig. 8. For the parameterization T44 used here (and most others in the sample of parameterizations used in this study) the dominating contributions from the spin-orbit and tensor forces to the spin-orbit potential are of opposite sign.
For Ni isotopes, Jp is always quite large, while Jn varies as shown in Fig. 5. Notably, both are peaked inside of the surface. When examining the combined contribution from the spin-orbit and tensor forces to the spin-orbit potential (35), one must keep in mind that they are peaked at different radii. Moreover, the variation of tensor-term coupling constants among a set of parameterizations implies a rearrangement of the spin-orbit term strength, as will be discussed later. As a consequence, taking into account the tensor force modifies the width and localization of the spin-orbit potential Wq(r) much more than it modifies its depth through the variation of the spin-orbit currents.

FIG. 10: (Color online) Single-particle spectra of neutrons (upper panel) and protons (lower panel) for the chain of Ni isotopes, as obtained with the parameterization T22 with vanishing combined J² terms. The thick solid line in the upper panel denotes the Fermi energy for neutrons.

Our observations also confirm the finding of Otsuka et al. [46] that the spin-orbit splittings might be more strongly modified by the tensor force than they are by neutron skins in neutron-rich nuclei through the reduction of the gradient of the density.

Figure 9 shows the spin-orbit potential of the protons for the chain of Ni isotopes. Here, the larger part of the contribution from the spin-orbit force comes from the gradient of the proton density, which just grows with the mass number without being subject to shell fluctuations. The same holds for the proton contribution from the tensor terms. Only the neutron contribution from the tensor terms varies rapidly, proportional to Jn displayed in Fig. 5, which has a very limited effect on the total spin-orbit potential, though.
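
The interplay discussed in this subsection can be made concrete with a small numerical sketch. It assumes that the radial neutron spin-orbit potential has the form Wn(r) = (W0/2)[2 ρn′(r) + ρp′(r)] + α Jn(r) + β Jp(r), which is our reading of the structure the text attributes to Eq. (35); Eq. (35) itself is not reproduced in this section, so this form should be treated as an assumption. The Fermi-shaped densities, the toy current profile, and all numerical values (W0 = 160 MeV fm⁵, α = β = 120 MeV fm⁵, amplitudes and radii) are hypothetical and chosen only to give magnitudes comparable to those quoted for Figs. 6 and 7.

```python
import math


def fermi_gradient(r, rho_c, radius, diffuseness):
    """Radial derivative of a Fermi-shaped density rho_c / (1 + exp((r - R) / a))."""
    e = math.exp((r - radius) / diffuseness)
    return -rho_c / diffuseness * e / (1.0 + e) ** 2


def toy_current(r, amplitude, r_peak, width):
    """Hypothetical radial spin-orbit current: zero at r = 0, peaked near r_peak."""
    return amplitude * (r / r_peak) * math.exp(-((r - r_peak) / width) ** 2)


def neutron_so_potential(r, w0, alpha, beta, profiles):
    """Return (spin-orbit-force part, tensor part) of W_n(r) in MeV fm.

    Assumed form: W_n = (w0 / 2) (2 rho_n' + rho_p') + alpha J_n + beta J_p.
    """
    so_part = 0.5 * w0 * (
        2.0 * fermi_gradient(r, *profiles["rho_n"])
        + fermi_gradient(r, *profiles["rho_p"])
    )
    tensor_part = (
        alpha * toy_current(r, *profiles["j_n"])
        + beta * toy_current(r, *profiles["j_p"])
    )
    return so_part, tensor_part


# Illustrative profile parameters (fm^-3 and fm for densities, fm^-4 and fm for currents).
profiles = {
    "rho_n": (0.08, 4.2, 0.55),
    "rho_p": (0.08, 4.0, 0.50),
    "j_n": (0.015, 3.5, 0.8),
    "j_p": (0.015, 3.5, 0.8),
}
W0, ALPHA, BETA = 160.0, 120.0, 120.0  # MeV fm^5, hypothetical values

so_surface, _ = neutron_so_potential(4.1, W0, ALPHA, BETA, profiles)
_, tensor_peak = neutron_so_potential(3.5, W0, ALPHA, BETA, profiles)
_, tensor_origin = neutron_so_potential(0.0, W0, ALPHA, BETA, profiles)
print(so_surface, tensor_peak, tensor_origin)
```

In this toy setting the spin-orbit-force part is negative and peaked near the surface, while the tensor part is positive and peaked further inside, so the two partially cancel and the sum is displaced in radius rather than simply rescaled in depth.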
With that, we can examine how the tensor terms affect the evolution of single-particle spectra. To that end, Fig. 10 shows the single-particle energies of protons and neutrons along the chain of Ni isotopes for the parameterization T22 with vanishing combined tensor terms, which will serve as a reference, while Fig. 11 shows the same for the parameterization T44 with proton-neutron and like-particle tensor terms of equal strength. For the latter, the variation of the neutron spin-orbit current with N influences both neutron and proton single-particle spectra. The effect of the tensor terms is subtle, but clearly visible: for T22, the major change of the single-particle energies is their compression with increasing mass number, while for T44 the level distances oscillate on top of this background, in correlation with the neutron shell and sub-shell closures at N = 20, 28, 40 and 50. As shown above, the neutron spin-orbit current vanishes for N = 20, where it consequently has no effect on the spin-orbit potentials and splittings. By contrast, the neutron spin-orbit current is large for N = 28 and 50, where its contribution to the spin-orbit potential reduces the splittings from the spin-orbit force.

FIG. 11: (Color online) The same as Fig. 10, obtained with T44 with proton-neutron and like-particle tensor terms of equal strength.

The strong variation of the spin-orbit current with nucleon numbers is typical for light nuclei up to about mass 100. For heavier nuclei, its variation becomes much smaller. This is exemplified in Fig. 12 for the neutron spin-orbit current in the chain of Pb isotopes.
There remain the fast fluctuations at small radii which we already saw for the Ni isotopes and that reflect the subsequent filling of low-ℓ levels with many nodes, but which have a very limited impact on the spin-orbit splittings when fed into the spin-orbit potential. The dominating peak of the spin-orbit current, just beneath the surface, shows only small fluctuations, as the overlapping spin-orbit splittings of levels with different ℓ never give rise to a spin-saturated configuration in heavy nuclei.

FIG. 12: (Color online) Radial component of the neutron spin-orbit current for the chain of Pb isotopes plotted in the same manner as in Fig. 5.

Note that both the spin-orbit current J and the spin-orbit potential are exactly zero at r = 0 as they are vectors with negative parity.

B. Single-particle energies

As a next step, we analyze the modifications that the presence of J² terms brings to single-particle energies in detail. Before we do so, a few general comments on the definition and interpretation of single-particle energies are in order. From an experimental point of view, empirical single-particle energies in a doubly-magic nucleus are determined as the separation energies between the even-even doubly magic nucleus and low-lying states in the adjacent odd-A nuclei, i.e. they are differences of binding energies. In nuclear models, however, it is customary to discuss shell structure and single-particle energies in terms of the spectrum of eigenvalues $\epsilon_i$ of the Hartree-Fock mean-field Hamiltonian (in even-even nuclei), as we have done already in Figs. 10 and 11:

$$\hat{h}\, \Phi_i = \epsilon_i\, \Phi_i . \qquad (37)$$

In the nuclear EDF approach without pairing, the reference state is directly constructed as a Slater determinant of eigenstates of $\hat{h}$; hence, the corresponding eigenvalues are directly connected to the fundamental building blocks of the theory and reflect the mean field in the nucleus. The density of single-particle levels around the Fermi surface drives the magnitude of pairing correlations; the relative distance of single-particle levels at sphericity and their quantum numbers determine to a large extent the detailed structure of the deformation energy landscape, which in turn determines the collective spectroscopy. The spectroscopic properties of even-even nuclei, in particular when they exhibit shape coexistence, provide valuable benchmarks for the underlying single-particle spectrum [56]. The link between the spectrum of single-particle energies on the one hand and the collective excitation spectrum on the other hand, however, always remains indirect.

On the other hand, “single-particle” states near the Fermi level of a magic nucleus can be observed by adding or removing a particle in one of these states, and thus correspond to the ground and excited states of the neighboring odd-mass nuclei. Assuming an infinitely stiff magic core, which is neither subject to any rearrangement or polarization, nor to any collective excitations following the addition (or removal) of a nucleon, the separation energies with the states in the odd-mass neighbors are equal to the single-particle energies as defined through (37). This highly idealized situation is modified by static [88] and dynamic [89, 90] correlations, often called “core polarization” (see chapter 7 of Ref. [91]) and “particle-vibration coupling” (see section 9.3.3 of Ref. [92]) in the literature, that alter the separation energies.
The main effect of the correlations is that they compress the spectrum, pulling down the levels from above the Fermi energy and pushing up those from below. The gross features, i.e. the ordering and relative placement of single-particle states, however, are more weakly affected by correlations. The particle-vibration coupling, however, is also responsible for the fragmentation of the single-particle strength. When the latter is too large, the naive comparison between the calculated $\epsilon_i$ given by Eq. (37) and the energy of the lowest experimental state with the same quantum numbers is not even qualitatively meaningful anymore [48].

We mention that a part of the static correlations originates from the non-vanishing time-odd densities in the mean-field ground state of an odd-A nucleus, which also cannot be truly spherical, so that the complete energy functional from Eq. (23) should be considered in a fully self-consistent calculation of the separation energies.

The effective single-particle energies that are used to characterize the underlying shell structure in the interacting shell model [93] have a slightly different meaning. Their definition usually renormalizes polarization and particle-vibration coupling effects around a doubly-magic nucleus whereas their evolution is discussed in terms of monopole shifts [94]. A collection of effective single-particle energies and their evolution was compiled by Grawe [95, 96]. Note that the SkX parameterization of the Skyrme energy functional by Brown and its variants [48, 97] were constructed aiming at a description of effective single-particle energies along these lines.

It should be kept in mind that the obvious, coarse discrepancies between the calculated spectra of $\epsilon_\mu$ and the empirical single-particle energies are often larger than the uncertainties coming from the missing correlations, as long as one observes some elementary precautions.
We took care to ensure that the states used in the analysis below were one-quasiparticle states weakly coupled to core phonons. First, we checked that the even-even nucleus of interest could be described as spherical, indicated by a sufficiently high-lying 2⁺ state. Second, we avoided all levels which were obviously correlated with the energies of 2⁺ states in the adjacent semi-magic series, as this indicates strong coupling with core excitations. Finally, we carefully examined states lying above the 2⁺ energy and/or twice the pairing gap of adjacent semi-magic nuclei, in order to eliminate those more accurately described as an elementary core excitation coupled to one or more quasiparticles, which generally appear as a multiplet of states. We did not attempt to use energy centroids calculated with use of spectroscopic factors, as these are not systematically available. Indeed, our requirement is that if some collectivity is present, it should be similar among all nuclei considered, in order to be easily subtracted out. Empirical single-particle levels shown below are determined from the lowest states having given quantum numbers in an odd-mass nucleus.

1. Spin-orbit splittings

The primary effect one expects from a tensor term is that it affects spin-orbit splittings by altering the strength of the spin-orbit field in spin-unsaturated nuclei, according to Eq. (35). One should remember, though, that the spin-orbit coupling itself is readjusted for each pair of coupling constants $C_0^J$ and $C_1^J$. The effect of this readjustment is generally opposite to that of the variation of the isoscalar tensor term coupling constant. It should thus be stressed that the effects described result from the balance between the variation of tensor and spin-orbit terms, which for most of our parameterizations pull in opposite directions.
Common wisdom states that the energy spacings between levels that are both above or both below the magic gap are not much affected by correlations, even when their absolute energy changes; hence it is common practice to confront only the spin-orbit splittings between pairs of particle or hole states with calculated single-particle energies from the spherical mean field. Figure 13 shows the relative error of the single-particle splittings of such levels for doubly-magic nuclei throughout the chart of nuclei. The calculated values are typically 20 to 60% larger than the experimental ones, with the exception of 16O, where the splittings of the neutron and proton 1p states are acceptably reproduced at least for the parameterizations T22, T24 and T42, i.e. those with the weakest tensor terms in the sample. It is noteworthy that the calculated splittings depend much more sensitively on the tensor terms for light nuclei with spin-saturated shells (protons and neutrons in 16O, protons in 90Zr) than for the heavy doubly-magic 132Sn and 208Pb, which are quite robust against a variation of the tensor terms. The reason will become clear below.

FIG. 13: (Color online) Relative error of the spin-orbit splittings in doubly-magic nuclei for ℓ ≤ 2 levels.

2. Connection between tensor and spin-orbit terms

The finding that our parameterizations systematically overestimate the spin-orbit splittings deserves an explanation. It was noted earlier that all standard Skyrme interactions, including the SLy parameterizations that share our fit protocol, have an unresolved tendency to overestimate the spin-orbit splittings in heavy nuclei [14, 29, 98]. Adding the tensor terms, however, further deteriorates the overall description of spin-orbit splittings, instead of improving it.
It is particularly disturbing that the spin-orbit splitting of the 3p level in 208Pb that was used to constrain W0 in the fit is overestimated by 30 to 40%, which is larger than the relative tolerance of 20% included in the fit protocol. In fact, it turns out that the coupling constant W0 of the spin-orbit force is more tightly constrained by the binding energies of light nuclei than by this or any other spin-orbit splitting. In the HF approach used during the fit, the structure of 40Ca, 48Ca, and 56Ni differs by the occupation of the neutron and proton 1f7/2 levels. First, we have to note that the terms in the energy functional that contain the spin-orbit current play an important role for the energy difference between 40Ca and 56Ni. The combined contribution from the tensor and spin-orbit terms varies from a near-zero value in the spin-saturated 40Ca to about −60 MeV in 56Ni for all our parameterizations, which is a large fraction of the −142 MeV difference in total binding energy between both nuclei. The Z = 40 subshell and Z = 50 shell are another example of abrupt variation of the spin-orbit current with the filling of the 1g9/2 level, which strongly affects the relative binding energy of the N = 50 isotones 90Zr and 100Sn. Second, the fit to phenomenological data can take advantage of the large relative variation of these terms to mock up missing physics in the energy functional that should contribute to the energy difference, but that is absent in it. The consequence will be a spurious increase of the spin-orbit and tensor term coupling constants. The resulting energy functional will correctly describe the mass difference, but not the physics of the spin-orbit and tensor terms.

In order to test the above interpretation, we performed a refit of selected TIJ parameterizations without taking into account the masses of 40Ca, 48Ca, 56Ni and 90Zr in the fit procedure.
In the resulting parameterizations, the spin-orbit coefficient W0 is typically 20% lower than in the original ones. As a consequence, the empirical value for the spin-orbit splitting of the neutron 3p level in 208Pb is met well within tolerance, at the price of binding energy residuals in light nuclei being unacceptably large, i.e. 56Ni being underbound by 5 MeV while 40Ca and 90Zr are overbound by up to 10 MeV. While the global trend of the spin-orbit splittings shown in Fig. 13 is enormously improved with these fits, in particular for heavy nuclei, the overall agreement of the single-particle spectra with experiment is not, so that we had to discard these parameterizations. This finding hints at a deeply rooted deficiency of the Skyrme energy functional. The spin-orbit and, when present, tensor terms indeed do simulate missing physics of the energy functional at the price of unrealistic spin-orbit splittings. This also hints at why perturbative studies, such as those performed in [37, 49], give much more promising results than what we will find below with our complete refits. We will discuss mass residuals in more detail in Sect. IV C 1 below.

During the fit, the masses of light nuclei do not only compromise the spin-orbit splittings, they also establish a correlation between W0 and $C_0^J$ in all our parameterizations. The combined spin-orbit and spin-current energy of a given spherical nucleus (N,Z) is given by (keeping only the isoscalar part since we shall focus on the N = Z nuclei 40Ca and 56Ni)

$$E_0^{\mathrm{spin}}(N,Z) = C_0^{\nabla J}\, I_0^{\nabla J}(N,Z) + C_0^{J}\, I_0^{J}(N,Z) , \qquad (38)$$

$$I_0^{\nabla J}(N,Z) = \int d^3r \, \rho_0 \, \nabla \cdot \mathbf{J}_0 , \qquad (39)$$

$$I_0^{J}(N,Z) = \int d^3r \, \mathbf{J}_0^2 . \qquad (40)$$

FIG. 14: Correlation between the values of the spin-orbit coupling constant $C_0^{\nabla J}$ and the isoscalar spherical effective spin-current coupling constant $C_0^J$. Dots: values for the actual parameterizations TIJ, solid line: trend estimated through Eq. (42) (see text).
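
For a spherical nucleus the two integrals of Eqs. (39) and (40) reduce to one-dimensional radial quadratures, since ∇·J = r⁻² d(r²J)/dr for a purely radial current J(r). The sketch below evaluates them on a uniform grid and checks the first one against its integrated-by-parts form, −4π ∫ r² J ρ′ dr, which holds when r²J vanishes at the origin and ρ vanishes at large r. The Fermi-shaped density, the toy current profile and all numerical parameters are hypothetical stand-ins, not densities from an actual Hartree-Fock calculation.

```python
import math


def trapezoid(values, h):
    """Composite trapezoidal rule on a uniform grid with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def radial_spin_integrals(rho, drho, current, r_max=15.0, n=3000):
    """Return (I_nablaJ, I_nablaJ_by_parts, I_J) for spherical toy profiles.

    rho(r), drho(r): density and its radial derivative
    current(r): radial component J(r) of the spin-orbit current
    """
    h = r_max / n
    r = [i * h for i in range(n + 1)]
    g = [x * x * current(x) for x in r]  # g(r) = r^2 J(r)
    # Finite-difference derivative of g (central inside, one-sided at the ends).
    dg = []
    for i in range(n + 1):
        lo, hi = max(i - 1, 0), min(i + 1, n)
        dg.append((g[hi] - g[lo]) / ((hi - lo) * h))
    four_pi = 4.0 * math.pi
    # Eq. (39): 4 pi * int dr rho(r) d(r^2 J)/dr
    i_nabla = four_pi * trapezoid([rho(x) * d for x, d in zip(r, dg)], h)
    # Same integral after integration by parts: -4 pi * int dr r^2 J(r) rho'(r)
    i_parts = -four_pi * trapezoid([gx * drho(x) for x, gx in zip(r, g)], h)
    # Eq. (40): 4 pi * int dr r^2 J(r)^2
    i_j = four_pi * trapezoid([gx * current(x) for x, gx in zip(r, g)], h)
    return i_nabla, i_parts, i_j


# Hypothetical profiles: Fermi-shaped density and a surface-peaked current.
RHO_C, RADIUS, DIFF = 0.16, 4.0, 0.5


def rho(r):
    return RHO_C / (1.0 + math.exp((r - RADIUS) / DIFF))


def drho(r):
    e = math.exp((r - RADIUS) / DIFF)
    return -RHO_C / DIFF * e / (1.0 + e) ** 2


def current(r):
    return 0.03 * (r / 3.5) * math.exp(-((r - 3.5) / 0.8) ** 2)


i_nabla, i_parts, i_j = radial_spin_integrals(rho, drho, current)
print(i_nabla, i_parts, i_j)
```

The by-parts form makes explicit that the first integral is driven by the overlap of the current with the density gradient in the surface region, whereas the second is quadratic in the current alone.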
The difference of $E_0^{\mathrm{spin}}$ between 56Ni and 40Ca,

$$E_0^{\mathrm{spin}}(^{56}\mathrm{Ni}) - E_0^{\mathrm{spin}}(^{40}\mathrm{Ca}) = \Delta E_{\mathrm{spin}} , \qquad (41)$$

turns out to be fairly independent of the parameterization. Averaged over all 36 parameterizations TIJ used here, $\Delta E_{\mathrm{spin}}$ has a value of −58.991 MeV with a standard deviation as small as 3.202 MeV, or 5.4%. The integrals in Eqs. (39,40) are fairly independent of the actual parameterization. For a rough estimate, we can replace them in Eq. (38) by their average values. Plugged into Eq. (41) this yields

$$C_0^{\nabla J} = \frac{\Delta E_{\mathrm{spin}} - C_0^{J} \, \langle I_0^{J}(^{56}\mathrm{Ni}) - I_0^{J}(^{40}\mathrm{Ca}) \rangle}{\langle I_0^{\nabla J}(^{56}\mathrm{Ni}) - I_0^{\nabla J}(^{40}\mathrm{Ca}) \rangle} . \qquad (42)$$

Figure 14 compares the values of $C_0^{\nabla J}$ as obtained through (42) with the values for the actual parameterizations. The estimate works very well, which demonstrates that $C_0^{\nabla J} = -\tfrac{3}{4} W_0$ and $C_0^J$ are indeed correlated and cannot be varied independently within a high quality fit of the energy functional (28). As the combined strength of the spin-orbit and tensor terms in the energy functional is mainly determined by the mass difference of the two N = Z nuclei 40Ca and 56Ni, the spin-orbit coupling constant W0 depends more or less linearly on the isoscalar tensor coupling constant $C_0^J$, while for all practical purposes it is independent of the isovector one, see also Fig. 4 above.

3. Splitting of high-ℓ states and the role of the radial form factor

As stated above, it is common practice to confront only the spin-orbit splittings between pairs of particle or hole states with calculated single-particle energies from the spherical mean field. The spin-orbit splitting of intruder states is rarely examined. Figure 15 displays the relative deviation of the spin-orbit splittings of the intruder states with ℓ ≥ 3 that span across major shell closures and are thus given by the energy difference of a particle and a hole state. These splittings are not “safe”, i.e. they can be expected to be strongly decreased by polarization and correlation effects [88, 89, 90]. To leave room for this effect, a mean-field calculation should overestimate the empirical spin-orbit splittings. We observe, however, that the mean-field calculations done here give values that are quite close to the experimental ones, or even smaller for parameterizations with large positive isoscalar tensor coupling (cf. the evolution from T22 to T66). This means that the spin-orbit splittings are not too large in general, as might be concluded from Fig. 13, but that there is a wrong trend of the splittings with ℓ, with the strength of the spin-orbit potential establishing a compromise between the in-shell splittings of small-ℓ orbits, which are too large, and the across-shell splittings of the intruders, which tend to be too small. In fact, the levels in Fig. 15 obviously have in common that their radial wave functions do not have nodes, while the levels in Fig. 13 have one or two nodes, with the notable exception of the 1p levels in 16O, for which we also find smaller deviations of the spin-orbit splittings than for the other levels in Fig. 13.

FIG. 15: (Color online) Spin-orbit splittings of high-ℓ levels in magic nuclei across the Fermi energy. The calculated values are less robust against correlation effects than those shown in Fig. 13 and have to be interpreted with caution (see text).

Underestimating the spin-orbit splittings of intruder levels has immediate and obvious consequences for the performance of an effective interaction, as this closes the magic gaps in the single-particle spectra and compromises the predictions for doubly-magic nuclei, as we will demonstrate in detail below. By contrast, the spin-orbit splittings of the low-ℓ states within the major shells have no obvious direct impact on bulk properties. Their deviation from empirical data is less dramatic, as the typical bulk observables discussed with mean-field approaches are not very sensitive to them.
It is only in applications to spectroscopy that their deficiencies become evident. It is noteworthy that the parameterization T22 without effective tensor terms at sphericity provides a reasonable compromise between the tentatively underestimated splittings of the intruder levels shown in Fig. 15 and the tentatively overestimated splittings of the levels within major shells shown in Fig. 13 above, while for parameterizations with tensor terms this balance is lost.

There clearly is a proton-neutron staggering in Figs. 13 and 15, such that calculated proton splittings are relatively smaller than the neutron ones. The effect appears both when comparing proton and neutron levels with different ℓ in the same nucleus, and when comparing proton and neutron levels with the same ℓ in the same or different nuclei (see the 1h levels in 132Sn and 208Pb). The staggering for the intruder levels is even amplified for parameterizations with large proton-neutron tensor term, such as T62, T64 or T66. The effect is particularly prominent for the heavy 132Sn and 208Pb with a large neutron-to-proton ratio N/Z, which might hint at unresolved isospin dependence of the spin-orbit interaction, although alternative explanations that involve how single-particle states in different shells should interact through tensor and spin-orbit forces are possible as well, see also the next paragraph. Note that the spin-orbit splittings of the low-ℓ levels shown in Fig. 13 also exhibit a staggering, although of smaller amplitude.

It has been pointed out by Skalski [99] that an exact treatment of the Coulomb exchange term (compared to the Slater approximation used here and in nearly all existing literature) does indeed slightly increase the spin-orbit splittings of protons across major shells.
This effect might give a clue to the staggering observed for the N = Z nucleus 56Ni, but the magnitude of the effect reported in [99] is too small to explain the large staggering we find for the heavier N ≠ Z nuclei.

FIG. 16: (Color online) Neutron spin-orbit potential (top) and the radial wave function of selected orbitals (bottom) in 132Sn. Horizontal axis: r [fm]; orbitals shown: ν 2d3/2, ν 2d5/2, ν 1h11/2, π 1g9/2.

Next, we use the example of 132Sn to demonstrate why the spin-orbit splittings of nodeless high-ℓ states are more sensitive to the tensor terms than low-ℓ states with one or several nodes, see Fig. 16. The lower panel shows the neutron spin-orbit potential in 132Sn for four different parameterizations, while the upper panel shows selected radial single-particle wave functions. The ν 1h11/2 and π 1g9/2 levels give the main contribution to the neutron and proton spin-orbit currents in this nucleus, and consequently to the tensor contribution to the spin-orbit potential. Indeed, the largest differences between the spin-orbit potentials from the chosen parameterizations are caused by the varying contribution from the tensor terms and appear for the region between 3 and 6 fm, where the wave functions of the 1g and 1h states are peaked. This region corresponds to the inner flank of the spin-orbit potential well, while the outer flank is much less affected. While the 1g and 1h wave functions are peaked at the inner flank, the 2d orbitals have their node in this region. Consequently, the splittings of the 1g and 1h levels are strongly modified by the tensor terms, while those of the 2d orbitals are quite insensitive. As a rule of thumb, the tensor contribution to the spin-orbit potential in doubly-magic nuclei comes mainly from the nodeless intruder states, which, when present, in turn mainly affect their own spin-orbit splittings, leaving the splittings of the low-ℓ states with one or more nodes nearly unchanged for reasons of geometrical overlap.
We note in passing that the slightly different radial wave functions of the 2d orbitals demonstrate nicely that their contribution to the spin-orbit current, Eq. (27), cannot completely cancel.

In fact, when regarding more specifically the evolution of the spin-orbit potential between the parameterizations T22 and T66, it is striking that for T66 it is essentially narrowed and its minimum slightly pushed towards larger radii, while its depth remains unaltered. Recalling that T66 shows a pathological behavior of too weak spin-orbit splitting of the intruder states, it appears that a correct ℓ-dependence of spin-orbit splittings might require modifying the radial dependence of the spin-orbit potential such that it becomes wider towards smaller radii. This uncalled-for modification of the shape of the spin-orbit field has previously been put forward by Brown et al. [48] as an argument for a negative like-particle J2 coupling constant α. However, as will be discussed in paragraph IVB6 below, the evolution of single-particle levels along isotopic chains calls for α > 0, see also [48]. Additionally, as we will show in appendix B, large negative values of α pose the risk of instabilities towards the transition to states with unphysical shell structure.

FIG. 17: Single-particle energies in 132Sn for a subset of our parameterizations (columns: Exp., T22, T42, T62, T24, T44, T64, T26, T46, T66). We also show the centroid of the intruder levels, defined through Eq. (43). Top panel (a): neutron levels, with the 1h centroid; bottom panel (b): proton levels, with the 1g centroid. A thick mark indicates the Fermi level.

4. Single-particle spectra of doubly-magic nuclei

After we have examined the predictions for spin-orbit splittings, we will now turn to the overall quality of the single-particle spectra of doubly-magic nuclei. Figure 17 shows the single-particle spectrum of 132Sn. It is evident that as a consequence of the underestimated spin-orbit splittings of the intruder levels that we discussed in the last section, the spectrum is deteriorated for large positive isoscalar tensor term coupling constants CJ0 (see T66), as, for example, a decrease of the spin-orbit splitting of the neutron 1h shell pushes the 1h11/2 further up, closing the N = 82 gap. As a consequence, the presence of the tensor terms cannot remove the problem shared by all standard mean-field methods that always wrongly put the neutron 1h11/2 level above the 2d3/2 and 3s1/2 levels [29], which compromises the description of the entire mass region. For the same reason, the proton spectrum of 132Sn also excludes interactions with large positive CJ0, which reduces the Z = 50 gap between the 1g levels to unacceptably small values.

FIG. 18: Same as Fig. 17 for 208Pb. Panel (a): neutron levels, with the 1i centroid; panel (b): proton levels, with the 1h centroid.

Figure 17 also shows the energy centroids of the ν 1h and π 1g levels, defined as

$$\varepsilon^{\rm cent}_{qn\ell} = \frac{\ell+1}{2\ell+1} \, \varepsilon_{qn\ell,\,j=\ell+1/2} + \frac{\ell}{2\ell+1} \, \varepsilon_{qn\ell,\,j=\ell-1/2} . \qquad (43)$$

The position of the centroid is fairly independent from the parameterization. Assuming that the calculated energy of the centroid of an intruder state is more robust against corrections from core polarization and particle-vibration coupling than its spin-orbit splitting, we see that the ν 1h centroid is clearly too high in energy by about 1 MeV. In combination with its tentatively too small spin-orbit splitting, see Fig.
15, this offers an explanation for the notorious wrong positioning of the ν 1h11/2, 2d3/2 and 3s1/2 levels in 132Sn [29]. The near-degeneracy of the ν 2d3/2 and 3s1/2 levels is always well reproduced, while the 1h11/2 comes out much too high. As the 1h11/2 is the last occupied neutron level, self-consistency puts it close to the Fermi energy, which, in turn, pushes the 2d3/2 and 3s1/2 levels down in the spectrum.

The overall situation is similar for 208Pb, see Fig. 18. Again, the high-ℓ intruder states move too close to the Z = 82 and N = 126 gaps for large positive CJ0. The effect is less obvious than for 132Sn as the intruders and their spin-orbit partners are further away from the gaps. Still, the level ordering and the size of the Z = 82 gap become unacceptable for parameterizations with large tensor coupling constants. For strong tensor term coupling constants (both like-particle and proton-neutron), a Z = 92 gap opens in the single-particle spectrum of the protons that is also frequently predicted by relativistic mean-field models [14, 88] but absent in experiment [100].

The single-particle spectra for the light doubly magic nuclei 40Ca (Fig. 19), 48Ca (Fig. 20), 56Ni (Fig. 21), 68Ni (Fig. 22) and 90Zr (Fig. 23), all have in common that the relative impact of the J2 terms on the ordering and relative distance of single-particle levels is even stronger than for the heavy nuclei discussed above. But not all of the strong dependence on the coupling constants of the J2 terms that we see in the figures is due to the actual contribution of the tensor terms to the spin-orbit potential. This is most obvious for 40Ca, where protons and neutrons are spin-saturated so that the J2 terms do not contribute to the spin-orbit potentials. Still, increasing their coupling constants increases the spin-orbit splittings, which manifests the readjustment of the spin-orbit force to a given set of CJ0 and CJ1 (see Fig. 4).
The evolution of the spin-orbit splittings in 40Ca visible in Fig. 19 is the background which we have to keep in mind when discussing the impact of the tensor terms on nuclei with non-vanishing spin-orbit currents. Note that the spin-orbit coupling constant W0 is correlated with the isoscalar tensor coupling constant CJ0, such that the single-particle spectra obtained with T24 and T42 are very similar, as they are for T26, T44 and T62.

For 48Ca, Fig. 20, the protons are still spin-saturated with vanishing proton spin-orbit current Jp, while for neutrons we have a large Jn. Depending on the nature of the tensor terms in the energy functional (i.e. like-particle or proton-neutron or a mixture of both), the spin-orbit current will either contribute to the spin-orbit potential of the neutrons or that of the protons or both, see Eq. (35). For the parameterizations with dominating like-particle J2 term, for example T24 and T26, the situation for the protons is the same as for 40Ca: there is no contribution from the tensor terms to the proton spin-orbit splittings, but compared to T22 the proton Z = 20 gap is reduced through the readjustment of the spin-orbit force, leading to values that are too small. For the same parameterizations, the large contribution from Jn to Wn opens up the N = 20 gap to values that are tentatively too large, as it reduces the neutron spin-orbit splittings and thereby compensates, even overcompensates, the effect from the readjustment of the spin-orbit force. At the same time the N = 28 gap is reduced. The opposite effect is seen for parameterizations with large proton-neutron tensor term, for example T42 or T62. For those, the proton spin-orbit splitting is reduced, opening up the Z = 20 gap compared to T22, while the neutron spin-orbit splittings are increased by the background effect from the readjusted spin-orbit force.

FIG. 19: Same as Fig. 17 for 40Ca.

FIG. 20: Same as Fig. 17 for 48Ca.

For 56Ni, Fig. 21, we have large Jn and Jp. In this N = Z nucleus, the like-particle or proton-neutron parts of the tensor terms cannot be distinguished. The spectra depend only on the overall coupling constant of the isoscalar tensor term CJ0, on the one hand directly through the contribution of the tensor terms to the spin-orbit potentials, and on the other hand through the background readjustment of W0 that is correlated to CJ0 as well. As already mentioned, results for T24 and T42 are very similar, as they are for T26, T44 and T62. All parameterizations have in common that the proton and neutron gaps at 28 are too small. The variation of the single-particle spectra among the parameterizations is smaller than for 40Ca, mainly because the tensor terms compensate the background drift from the readjustment of W0.

The slightly neutron-rich 68Ni combines a spin-saturated sub-shell closure N = 40 that gives a vanishing neutron spin-orbit current with the magic Z = 28 that gives a strong proton spin-orbit current. The variation of the single-particle spectra in dependence of the coupling constants of the tensor terms is similar to those of 48Ca, with the roles of protons and neutrons exchanged.

FIG. 21: Same as Fig. 17 for 56Ni.

FIG. 22: Same as Fig. 17 for 68Ni.
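The grouping of parameterizations with similar spectra can be made explicit with a short sketch. It assumes the labelling convention α = 60 (J − 2) MeV fm5 and β = 60 (I − 2) MeV fm5 for TIJ, which is consistent with the values quoted in this section (β = 0 for the T2J series, β = 120 MeV fm5 for the T4J series) but is not restated here, together with the assumed relations CJ0 = (α + β)/2 and CJ1 = (α − β)/2. All function names are illustrative only.

```python
from collections import defaultdict


def tij_couplings(label):
    """Return (alpha, beta) in MeV fm^5 for a label such as 'T24'.

    Assumed convention (not restated in this section):
    alpha = 60 (J - 2) is the like-particle coupling constant,
    beta  = 60 (I - 2) is the proton-neutron coupling constant.
    """
    i, j = int(label[1]), int(label[2])
    return 60 * (j - 2), 60 * (i - 2)


def isoscalar_isovector(label):
    """Return (C0, C1), assuming C0 = (alpha + beta)/2 and C1 = (alpha - beta)/2."""
    alpha, beta = tij_couplings(label)
    return (alpha + beta) / 2, (alpha - beta) / 2


def group_by_isoscalar(labels):
    """Group labels that share the same isoscalar coupling constant C0."""
    groups = defaultdict(list)
    for label in labels:
        c0, _ = isoscalar_isovector(label)
        groups[c0].append(label)
    return dict(groups)


# The nine parameterizations displayed in Figs. 19-23.
labels = ["T22", "T24", "T26", "T42", "T44", "T46", "T62", "T64", "T66"]
groups = group_by_isoscalar(labels)
for c0 in sorted(groups):
    print(f"C0 = {c0:6.1f} MeV fm^5: {groups[c0]}")
```

Under these assumptions T24 and T42 fall into one group and T26, T44 and T62 into another, which is the pattern of similar spectra noted for the N = Z nuclei, where only the isoscalar coupling constant matters.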
The nucleus 90Zr combines the spin-saturated proton sub-shell closure Z = 40 with the major neutron shell closure N = 50. The high degeneracy of the occupied ν 1g9/2 level leads to a very strong neutron spin-orbit cur- rent, while the proton spin-orbit current is zero. Even in the absence of a tensor term contributing to their spin-orbit potential for parameterizations with pure like- particle tensor terms, the proton single-particle spectra are dramatically changed by the feedback effect from the readjusted spin-orbit force; see the evolution from T22 to T26. The π 1g9/2 comes down, and closes the Z = 40 sub-shell gap. For parameterizations with pure proton- neutron tensor term, one has the opposite effect, this time because the contribution from the tensor terms overcom- pensates the background effect from the spin-orbit force. The effect of the tensor terms on the neutron spin-orbit splittings is less dramatic, but still might be sizable. We have to point out that the calculations displayed in Fig. 23 were performed without taking pairing into account, as the HFB scheme breaks down in the weak pairing regime of doubly magic nuclei. For some ex- treme (and unrealistic) parameterizations, however, the gaps disappear which, in turn, would lead to strong pair- ing correlations if the calculations were performed within the HFB scheme. This happens, for example, for neu- trons in 90Zr when using T26 and T46. Interestingly, the pairing correlations for neutrons break the spin sat- uration, which leads to a substantial neutron spin-orbit current Jn. As these parameterizations use values of the like-particle coupling constant significantly larger than the neutron-proton one, Jn feeds back onto the neutron spin-orbit potential only, Eq. (35). As the correspond- ing coupling constant α is positive for T26 and T46, the contribution from the tensor terms reduces the spin-orbit splittings, in particular those of the 1g9/2 and 1f5/2. 
As a result, this counteracts the reduction of the N = 40 gap predicted by T26 and T46 in calculations without pairing.

FIG. 23: Same as Fig. 17 for 90Zr.

5. Evolution along isotopic chains: np coupling

In the preceding sections, we have analyzed characteristics of the single-particle spectra for isolated doubly-magic nuclei. We found that larger tensor terms do not lead to an overall improvement of the single-particle spectra. However, we also argued that it might be essentially due to deficiencies of the central (and possibly spin-orbit) interactions and that it should not be used to discard the tensor terms as such. In any case, the results gathered so far on single-particle spectra of doubly-magic nuclei do not allow us to narrow down a region of meaningful coupling constants of the tensor terms. The analysis must be complemented by looking at other observables. A better suited observable is provided by the evolution of spin-orbit splittings along an isotopic or isotonic chain, which ideally reflects the nucleon-number-dependent contribution from the J2 terms to the spin-orbit potentials. Unfortunately, safe experimental data for the evolution of spin-orbit partners are scarce; hence, one has to content oneself with the evolution of the energy distance of levels with different ℓ, assuming that the effect is primarily caused by the evolution of the spin-orbit splittings of each level with its respective partner.

A popular playground for such studies is the chain of Sn isotopes, where two such pairs of levels have gained attention: the π 2d5/2 and π 1g7/2 on the one hand, and the π 1g7/2 and π 1h11/2 on the other hand. Figure 24 shows these two sets of results for a selection of our parameterizations.
Experimentally, the 2d5/2 and 1g7/2 levels cross between N = 70 and 72, such that the 2d5/2 provides the ground state of light odd-A Sb isotopes, and 1g7/2 that of the heavy ones, see for example Ref. [101]. The crossing as such is predicted by many mean-field interactions and most of the parameterizations of the Skyrme interaction we use here. It has also been studied in detail with the standard Gogny force (without any tensor term) using elaborate blocking calculations of the odd-A nuclei [102]. The crossing, however, is never predicted at the right neutron number, see Fig. 24. As we have learned above, we should not assume that the absolute distance of the two levels will be correctly described by any of our parameterizations (as the centroids of the ℓ shells will not have the proper distance and the spin-orbit splittings have a wrong ℓ dependence within a given shell). Hence, the neutron number where the crossing takes place cannot and should not be used as a quality criterion. What does characterize the tensor terms is the bend of the curves in Fig. 24, as ideally it reflects how the spin-orbit splittings of both levels change in the presence of the tensor terms. Similar caution has to be exercised in the analysis of the unusual relative evolution of the proton 1g7/2 and 1h11/2 levels that was brought to attention by Schiffer et al. [45]. Their spacing has been investigated in terms of the tensor force before [44, 46, 48, 49].

FIG. 24: (Color online) Distance of the proton 1h11/2 and 1g7/2 levels (top) and of the proton 2d5/2 and 1g7/2 levels (bottom), for the chain of tin isotopes. The “best” parameterization cannot and should not be determined with a χ2 criterion, see text.

FIG. 25: (Color online) Distance of the proton 1f5/2 and 2p3/2 in the chain of Ni isotopes.
Again, we pay attention to the qualitative nature of the bend without focusing too much on the precise value by which the splitting changes when going from N ≈ 58 to N = 82. Indeed, the matching of the lowest proton frag- ment with quantum number 1h11/2 seen experimentally with the corresponding empirical single-particle energy is unsafe because of the fractionization of the strength as discussed in Ref. [48]. For both pairs of levels, the evolution of their distance can be attributed to the tensor coupling between the pro- ton levels and neutrons filling the 1h11/2 level below the N = 82 gap. Unfortunately, this introduces an addi- tional source of uncertainty: as can be seen in Fig. 17, the ordering of the neutron levels in 132Sn is not prop- erly reproduced by any of our parameterizations, with the 1h11/2 level being predicted above the 2d3/2 level, while it is the other way round in experiment. This means that in the calculations, the contribution from the 1h11/2 level to the neutron spin-orbit current builds up at larger N than what can be expected in experiment. As a conse- quence, the prediction for the relative evolution of the levels might be shifted by up to four mass units to the right compared to experiment for both pairs of levels we examine here. In the end, the trend of both splittings is best repro- duced when using a positive value of the neutron-proton Jn · Jp coupling constant β such that the filling of the neutron 1h11/2 shell decreases the spin-orbit splittings of the proton shells. The parameterizations from the T4J and T6J series indeed do reproduce the bend of empirical data, with, however, a clear shift in the neutron number where it occurs, as expected from the previous discussion. A value of β = 120 MeV fm5, which corresponds to the series of T4J parameterizations, matches its magnitude best (see for example T44). A similar analysis can be performed for the proton 1f5/2 and 2p3/2 levels in the chain of Ni isotopes, see Fig. 25. 
This case is interesting as no distinctive feature can be observed in the empirical spectra, yet the standard parameterizations without tensor terms like T22 do not reproduce them. In fact, to keep the 1f5/2 and 2p3/2 at a constant distance, two competing effects have to cancel. First, the increasing diffuseness of the neutron density with increasing neutron number diminishes the proton spin-orbit splittings through its reduced gradient in the expression for the proton spin-orbit potential when going from N = 32 to N = 40. Second, the filling of the neu- tron 1f5/2 state reduces the neutron spin-orbit current which in turn increases the proton spin-orbit splittings for interactions with sizable proton-neutron tensor con- tribution to the proton spin-orbit potential when going from N = 32 to N = 40. The former effect can be clearly seen for parameterizations T2J with vanishing proton- neutron tensor term, β = 0. Again, parameterizations of the T4J series seem to be the most appropriate to describe the evolution of these levels. The evolution of single-particle levels is the tool of choice to determine the sign and magnitude of the proton-neutron tensor coupling constant. The value which we favor, as a result of our semi-qualitative analy- sis is β = 120 MeV fm5. This value is only slightly larger than the value of 94 to 96 MeV fm5 advocated by Brown et al. in Ref. [48], which was adjusted to theoretical level shifts in the chain of tin isotopes obtained from a G- matrix interaction. We can consider this as a reasonable agreement. Let us defer the discussion of this value to the end of this section and study in the next paragraph the like- particle tensor-term coupling constant α. 6. Evolution along isotopic chains: nn coupling In order to narrow down an empirical value for the neutron-neutron tensor coupling constant, the ideal ob- servable would be the evolution of neutron single-particle levels along an isotopic chain. 
Unfortunately, these are only accessible at the respective shell closures. We shall therefore compare neutron single-particle spectra of pairs of doubly-magic nuclei belonging to the same isotopic chain. Again, the necessity to extract pure single-particle effects calls for precautions. We choose pairs of particle or hole levels which are close enough in energy that their absolute spacing is not much affected by particle-vibration coupling. Of course, one also has to be careful if both states appear at relatively high excitation energy in the neighboring odd isotope because the fractionization of their strength could again interfere with the analysis. In the following, we choose pairs of orbitals which are as safe as possible.

FIG. 26: Shift of the distance between the neutron 1d3/2 and 2s1/2 levels when going from 40Ca to 48Ca, Eq. (44) (top) and of the neutron 1f5/2 and 2p1/2 levels when going from 56Ni to 68Ni, Eq. (45) (bottom). Horizontal axis: α [MeV fm5]; curves for β = 0, 120 and 240.

To remove the uncertainties from the deficiencies of the central and spin-orbit parts of the effective interaction that we have identified above, we will look at a double difference, where, first, we construct the energy difference between the neutron 1d3/2 and 2s1/2 levels separately for 40Ca and 48Ca, and then compare the value of this difference in both nuclei

$$\delta_{\rm Ca} = \left( \varepsilon_{\nu 1d_{3/2}} - \varepsilon_{\nu 2s_{1/2}} \right)_{{}^{48}{\rm Ca}} - \left( \varepsilon_{\nu 1d_{3/2}} - \varepsilon_{\nu 2s_{1/2}} \right)_{{}^{40}{\rm Ca}} . \qquad (44)$$

Assuming that the problems from the central and spin-orbit forces discussed in Sects. IVB1 and IVB4 have the same effect in both nuclei, they will cancel out in δCa. The interesting feature of this pair of states is that they are separated by more than 2 MeV in 40Ca, while they are nearly degenerate in 48Ca, see Figs. 19 and 20. Such a shift can only be reproduced with a positive (140-180 MeV fm5) value of α, which decreases the splitting of the neutron 1d shell when the neutron 1f7/2 level is filled.
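The cancellation argument behind the double difference can be illustrated with a minimal sketch. The level energies below are placeholders chosen only to mimic the qualitative situation described in the text (more than 2 MeV apart in 40Ca, nearly degenerate in 48Ca); they are not calculated or measured values. The shift is taken here as the 48Ca value minus the 40Ca value, following the wording "when going from 40Ca to 48Ca"; the opposite convention only flips the sign. The function names and the per-orbital offsets are hypothetical.

```python
def level_distance(levels, a, b):
    """Energy distance eps(a) - eps(b) within one nucleus (MeV)."""
    return levels[a] - levels[b]


def double_difference(levels_final, levels_initial, a, b):
    """Change of the a-b level distance between two nuclei (final minus initial)."""
    return level_distance(levels_final, a, b) - level_distance(levels_initial, a, b)


def with_systematic_error(levels, offsets):
    """Add an orbital-dependent but nucleus-independent error to each level."""
    return {name: energy + offsets[name] for name, energy in levels.items()}


# Placeholder neutron single-particle energies in MeV (illustrative, not data).
ca40 = {"1d3/2": -15.0, "2s1/2": -17.5}
ca48 = {"1d3/2": -12.0, "2s1/2": -12.2}

delta = double_difference(ca48, ca40, "1d3/2", "2s1/2")

# Hypothetical deficiency of the central and spin-orbit parts: each orbital is
# misplaced by a fixed amount that is assumed to be the same in both nuclei.
offsets = {"1d3/2": 0.7, "2s1/2": -0.4}
delta_shifted = double_difference(
    with_systematic_error(ca48, offsets),
    with_systematic_error(ca40, offsets),
    "1d3/2",
    "2s1/2",
)

print(f"delta without offsets: {delta:.3f} MeV")
print(f"delta with offsets:    {delta_shifted:.3f} MeV")
```

The two printed values agree because any error that is identical in both nuclei drops out of the double difference, whereas it would not drop out of the level distance in a single nucleus. The cancellation holds only to the extent that the assumption of nucleus-independent errors is valid.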
A similar analysis can be performed for the 1f5/2 and 2p1/2 neutron states in the Ni isotopes 56Ni and 68Ni

$$\delta_{\rm Ni} = \left( \varepsilon_{\nu 1f_{5/2}} - \varepsilon_{\nu 2p_{1/2}} \right)_{{}^{68}{\rm Ni}} - \left( \varepsilon_{\nu 1f_{5/2}} - \varepsilon_{\nu 2p_{1/2}} \right)_{{}^{56}{\rm Ni}} . \qquad (45)$$

Going from 56Ni to 68Ni, the neutron 1f5/2 level comes further down in energy than the 2p1/2 level for parameterizations without tensor terms (T22), see Figs. 21 and 22. The reason for this trend is the geometrical growth of the nucleus, which on the one hand lowers the centroid of the 1f levels in the widening potential well, and on the other hand pushes the spin-orbit field to larger radii, which has opposite effects on the splittings of 2p and 1f states. The like-particle tensor terms can compensate this trend through a reduction of the spin-orbit splitting of the 1f levels. The observed downward shift by 0.3 MeV can be recovered with a value of α around 120 MeV fm5, see Fig. 26.

It is also gratifying to see that the analysis of Ca and Ni isotopes suggests nearly the same value for the like-particle tensor term coupling constant α.

C. Binding energies

Our ultimate goal, although far beyond the scope of the present paper, is the construction of a universal nuclear energy density functional that simultaneously describes bulk properties like masses and radii, giant resonances, and low-energy spectroscopy, such as quasiparticle configurations and collective rotational and vibrational states. To crosscheck how our findings on single-particle spectra and spin-orbit splittings translate into bulk properties, we will now analyze the evolution of mass residuals and charge radii along isotopic and isotonic chains. It has been repeatedly noted in the literature that the mass residuals from mean-field calculations show characteristic arches [29, 52, 54, 65, 72, 103, 104, 105], where heavy mid-shell nuclei are usually underbound compared to the doubly magic ones that are located at the bottom of deep ravines. For light nuclei, the patterns are often less obvious.
Part of this effect can be explained and removed taking large-amplitude corre- lations from collective shape degrees of freedom into ac- count through suitable beyond-mean-field methods. In turn, this means that the mass residuals should leave room for the extra binding of mid-shell nuclei from cor- relations. However, it turns out that for typical effective interactions the amplitude of the arches is larger than what is brought by correlations [54]. Furthermore, this effect seems not to be of the same size for isotopic and isotonic chains, which altogether hints at deficiencies of the current effective interactions. Recently, Dobaczewski pointed out [47] that the strongly fluctuating contribution brought by the J2 terms to the total binding energy could remove at least some of the ravines found in the mass residuals around magic numbers. The hypothesis was motivated by calculations that evaluate the tensor terms either perturbatively, or self-consistently, using in this case an existing standard parameterization without tensor terms for the rest of the energy functional. Our set of refitted parameterizations with varied coupling constants of the tensor terms gives us a tool to check how much of the argument persists to a full fit. 1. 
Semi-magic series Figure 27 displays binding energy residuals along var- ious isotopic and isotonic chains of semi-magic nuclei for a selection of our parameterizations: T22 is the reference with vanishing J2 terms at sphericity; T24 has a sub- stantial like-particle coupling constant α and vanishing proton-neutron coupling constant β, which is similar to most of the published parameterizations which take the J2 terms from the central Skyrme force into account; T42 and T62 are parameterizations with substantial proton- neutron coupling constant β and vanishing like-particle coupling constant; T44 has a mixture of like-particle and proton-neutron tensor terms that is close to what we found preferable for the evolution of spin-orbit splittings above; and T46 is a parameterization that gives the best root-mean-square residual of binding energies for spher- ical nuclei, as we will see below. Finally, T66 is a pa- rameterization with large and equal proton-neutron and like-particle tensor-term coupling constants. The tensor terms have opposite effects in light and heavy nuclei: The curves obtained with T22, the parame- terization without J2 term contribution at sphericity, are relatively flat for the light isotopic and isotonic chains, but show very pronounced arches with an amplitude of 5 or even more MeV for the heavy Sn and Pb isotopic chains. By contrast, the most striking effect of the J2 terms is that they induce large fluctuations of the mass residuals in light nuclei, while they flatten the curves in the heavy ones. The strong variation between the parameter sets for light nuclei are of course the direct consequence of the strong variation of the spin-orbit current J that enters the spin-orbit and tensor terms when going back and forth between nuclei where the configuration of at least one nucleon species is spin-saturated. 
The variations seen are a result of the modifications of the tensor-term coupling constants and the associated readjustment of the spin-orbit strength W0. For example, 48Ca is overbound with respect to 40Ca and 56Ni for parameterizations with a proton-neutron coupling constant β > 0, while the like-particle coupling constant α has a more limited effect. Since only the neutron core is spin-unsaturated in this nucleus, this must be attributed to the increase in the readjusted spin-orbit strength W0 (correlated with C^J_0 ∝ α + β), which dominates when β is increased and α kept at zero, and counterbalances the effect of α when the latter varies. See the parameter sets T62 and T66 in Figures 27 and 28. The large overbinding of nuclei around 90Zr (Z = 40, N = 50) for parameterizations with large proton-neutron tensor coupling constant has the same origin.

For a given parameterization and a given nucleus, the energy gain from the spin-orbit term seems to be almost always larger than the energy loss from the J² one; see Fig. 28 for Ca isotopes and Fig. 29 for Sn isotopes. Of course other terms in the energy functional compensate for a part of the gain from the spin-orbit term, but the overall trends of the mass residuals suggest that the spin-orbit energy has a much larger contribution to the differences between the parameterizations visible in Fig. 27 than the J² terms.

FIG. 27: (Color online) Mass residuals Eth − Eexp along selected isotopic and isotonic chains of semi-magic nuclei for the parameterizations as indicated. Positive values of Eth − Eexp denote underbound nuclei, negative values overbound nuclei.

We have to note that the spin-orbit current does not completely vanish for the nominally proton and neutron spin-saturated 40Ca for parameterizations with large coupling constants of the J² terms.
For those, the gap at 20 is strongly (and unphysically) reduced, see Fig. 19. The small gap at 20 no longer suppresses pairing correlations in our HFB approach. The resulting scattering of particles from the sd shell to the fp shell breaks the spin-saturation, such that there is a finite, in some cases quite sizable, contribution from the spin-orbit term to the total binding energy. Owing to the compensation between all contributions, the total energy gain compared to a HF calculation without pairing is usually small and remains of the order of 200 keV for the parameterizations shown in Fig. 27.

FIG. 28: (Color online) Evolution of spin-orbit current (J²_t) energy (bottom panel, zero by construction for T22) and spin-orbit energy (top panel) with neutron number N in the chain of Ca isotopes (Z = 20).

It is also important to note that some of the light chains in Fig. 27 are sufficiently close to, or even cross, the N = Z line that they are subject to the Wigner energy, which still lacks a satisfying explanation, not to mention a description in the framework of mean-field methods [106]. The Wigner energy is not taken into account in our fits, while it turned out to be a crucial ingredient of any HFB [107, 108, 109] or other mass formula. In fact, as shown in Fig. 14 of Ref. [54], the missing Wigner energy clearly sticks out from the mass residuals for SLy4 (which is very similar to T22) when they are plotted for isobaric chains. This local trend around N = Z is, however, overlaid with a global trend with mass number, such that the missing Wigner energy cannot be spotted anymore when looking at the mass residuals for the chain of Ca isotopes, similar to what is seen for T22 in Fig. 27. Within our fit protocol, the correlation between the masses of 40Ca, 48Ca and 56Ni that is brought by the spin-orbit force (see Sect.
IVB 2) does not tolerate a correction for the Wigner energy for standard central and spin-orbit Skyrme forces, as this will lead to an unacceptable underbinding of 48Ca. This, however, might change when the J² terms are added. Indeed, Fig. 27 suggests that adding a phenomenological Wigner term around 40Ca and 56Ni to a parameter set like T44, which is consistent with the evolution of single-particle levels, would flatten the curves for the mass residuals in the Ca, Ni and N = 28 chains. The mass residuals for the chain of oxygen isotopes, which are not shown here, would be improved in a similar manner. However, extreme caution should be exercised before jumping to premature conclusions, as the spin-orbit splittings and level distances in light nuclei are far from realistic for all our parameterizations; as a consequence it is difficult to judge if the room we find for the Wigner energy is fortuitous or indeed a feature of well-tuned J² terms. Note that the HFB mass formulas that do include a correction for the Wigner energy side-by-side with the J² terms from the central Skyrme force give satisfying mass residuals for light nuclei [107, 108, 109], but have nuclear matter properties that are quite different from ours; cf. BSk1 and BSk6 with SLy4 in Table I of Ref. [110]. Our constraints on the empirical nuclear matter properties (same as those on SLy4), which are absent in these HFB mass fits, might be the deeper reason for this conflict.

FIG. 29: (Color online) Same as Fig. 28 for tin isotopes (Z = 50).

Large tensor-term coupling constants straighten the arches in the mass residuals in the heavy Sn and Pb isotopic chains, but the improvements are not completely satisfactory. Large, combined proton-neutron and like-particle coupling constants tend to transform the arch for the tin isotopic chain into an s-shaped curve, which is not very realistic from the standpoint of expected corrections through collective effects.
It can again be assumed that the deficiencies of the single-particle spectra pointed out in Fig. 17 are responsible, where the ν 1h11/2 and π 1g9/2 levels are placed too high above the rest of the single-particle spectra in heavy Sn isotopes. For Pb isotopes, large values of the tensor terms tend to overbind the neutron-deficient isotopes. It is noteworthy that the tensor terms seem not to affect much the mass residuals of the heavy Pb isotopes above N = 126, which are on the flank of a very deep ravine that becomes visible when going towards heavier elements, cf. the SLy4 results in Ref. [54].

FIG. 30: (Color online) Two-neutron separation energy along the chain of Sn isotopes (Z = 50).

It has been often noted that effective interactions that give a similarly satisfying description of masses close to the valley of stability give diverging predictions when extrapolated to exotic nuclei. The standard example is the two-neutron separation energy S2n(N, Z) = E(N − 2, Z) − E(N, Z) for the chain of Sn isotopes. Results obtained with a subset of our parameterizations are shown in Fig. 30. It is noteworthy that the differences for neutron-rich nuclei beyond N = 82 are not larger than those for the isotopes closer to stability. Around the valley of stability, increasing the coupling constants of the tensor terms, in particular the like-particle ones, tilts the curve, pushing it up for light isotopes and pulling it down for heavy ones, which of course reflects the position of the ν 1h11/2 level that is pushed into the N = 82 gap, see Fig. 17. For the neutron-rich isotopes, small differences appear around N = 90, which reflects the change of level structure above the ν 2f7/2 level and at the drip line, but they are much smaller than the differences seen between parameterizations obtained with different fit protocols, see Fig. 5 of Ref. [29].

2. Systematics

In the preceding section we showed how the J² terms in the energy functional modify the trends of mass residuals along isotopic and isotonic chains, in particular the amplitude of the arches between doubly-magic nuclei. In this section, we want to examine how this translates into quality criteria for the overall performance of the parameterizations for masses.

Figure 31 displays the root-mean-square deviation of the mass residuals for all our 36 parameterizations, evaluated for a set of 134 nuclei predicted to have spherical mean-field ground states when calculated with the parameterization SLy4 [54]. One observes a clear minimum around T46, i.e. (α, β) = (240, 120), with (Eth − Eexp)r.m.s. = 1.96 MeV, compared with 3.44 MeV for T22 (α = β = 0). We found even slightly better values with even more repulsive isoscalar and isovector coupling constants, but the single-particle spectra of these interactions turn out to be quite unrealistic, cf. Sect. IVB 1. This already demonstrates that in the presence of the J² terms a good fit of masses does not necessarily lead to satisfactory single-particle spectra.

FIG. 31: Root-mean-square deviation from experiment of the binding energies of a set of 134 spherical nuclei, for each of the forces TIJ, vs. α and β in MeV fm⁵ (the "(T11)" label indicates the position of this parameterization in the (α, β)-plane). Contour lines at ∆Erms = 2.0, 2.25, 2.5, 3.0, 3.5, 4.0 MeV. The minimal value is found for T46 (∆Erms = 1.96 MeV).

Figure 32 demonstrates how the distribution of the mass residuals Eth − Eexp affects the evolution of their r.m.s. value for a subset of 9 parameterizations. For T22 (α = β = 0), the distribution is centered at positive mass residuals, with only very few nuclei being overbound.
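The interplay between the position and the width of the residual distribution can be made concrete with a short sketch. The code below is an illustration only, not the procedure used for Fig. 31: it computes the residuals Eth − Eexp (positive values meaning underbound nuclei, as in Fig. 27) and uses the mean rather than the median as the measure of centering, so that the exact identity rms² = mean² + spread² holds. The function name `residual_statistics` and all energies are hypothetical.

```python
import math
from typing import Dict, Sequence


def residual_statistics(e_th: Sequence[float], e_exp: Sequence[float]) -> Dict[str, float]:
    """Mean, spread (standard deviation) and r.m.s. of E_th - E_exp in MeV."""
    if len(e_th) != len(e_exp) or len(e_th) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - x for t, x in zip(e_th, e_exp)]
    n = len(residuals)
    mean = sum(residuals) / n
    variance = sum((r - mean) ** 2 for r in residuals) / n
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return {"mean": mean, "spread": math.sqrt(variance), "rms": rms}


if __name__ == "__main__":
    # Hypothetical total energies in MeV (negative for bound nuclei).
    e_exp = [-342.0, -416.0, -484.0]
    e_th = [-340.5, -413.0, -482.5]  # all three nuclei underbound
    stats = residual_statistics(e_th, e_exp)
    print(stats)
    # rms^2 = mean^2 + spread^2: shifting the distribution towards zero lowers
    # the r.m.s. value, while broadening the distribution raises it again.
    print(stats["rms"] ** 2, stats["mean"] ** 2 + stats["spread"] ** 2)
```

This decomposition is one way to read the discussion of Fig. 32: a change of coupling constants that recenters the distribution without broadening it reduces the r.m.s. value most efficiently.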
Increasing β to 120 MeV fm⁵ (T42) or even 240 MeV fm⁵ (T62) shifts the median of the distribution to smaller values, which yields more and more overbound nuclei. For large values of β, the distribution spreads out more, which diminishes the improvement from centering the distribution closer to zero. For given β, increasing α mainly shifts the median of the distribution without spreading out its overall shape, which is preferable for optimizing the r.m.s. value.

These considerations, however, have to be taken with caution. As said above, we aim at a model where certain correlations beyond the mean field are treated explicitly, which calls for a distribution of mean-field mass residuals that is asymmetric towards positive mass residuals, with a width that is similar to the difference between the maximum and minimum correlation energies to be found.

FIG. 32: (Color online) Distribution of deviations from experiment, Eth − Eexp in MeV, of the binding energies of a set of 134 spherical nuclei (1 MeV bins) for a subset of parameterizations. Each panel corresponds to a given value of β (from top to bottom: β = 0, 120, 240 MeV fm⁵).

D. Radii

The evolution of nuclear charge radii along isotopic chains reflects how the mean field of the protons changes when neutrons are added to the system. In the simplistic liquid-drop model, it just follows the geometrical growth of the nucleus ∼ A^(1/3), but data show that there are many local deviations from this global trend. On the one hand, radii are of course subject to correlations beyond the mean field [54, 111, 112, 113, 114]. On the other hand, they are also sensitive to the detailed shell structure, which, in turn, might be influenced by tensor terms.
We will concentrate here on two anomalies of the evolution of charge radii, both of which are not much influenced by collective correlations beyond the mean field (at least in calculations with the Skyrme interaction SLy4) [54]: the fact that the root-mean-square (r.m.s.) charge radius of 48Ca is almost the same as that of the lighter 40Ca, or possibly slightly smaller, and the kink in the isotopic shifts of mean-square (m.s.) charge radii in the Pb isotopes, where Pb isotopes above 208Pb are larger than what could be expected from liquid-drop systematics. In both cases it is plausible that shell effects are the determining factor, although alternative explanations that involve pairing effects have been put forward for the latter case as well [115, 116].

FIG. 33: (Color online) Middle panel: Difference of mean-square charge radii between 40Ca and 48Ca as a function of the proton-neutron tensor term coupling constant β (in MeV fm⁵) for three values of α. The experimental value (with error bar) is represented by the two horizontal black lines. Bottom panel: Root-mean-square charge radii of 40Ca and 48Ca. Top panel: Contribution of the single-particle proton states to the difference of the charge radii (mean square radius of the point proton distribution, see Eq. (46)).

Charge radii have been calculated with the approximation used in Ref. [51] (see footnote 3) and derived from Ref. [117],

\[
r_{\mathrm{ch}}^2 = \langle r^2 \rangle_p + r_p^2 + \frac{N}{Z}\, r_n^2 + \frac{1}{Z} \left( \frac{\hbar}{mc} \right)^2 \sum_i v_i^2 \, \mu_{q_i} \langle \boldsymbol{\sigma} \cdot \boldsymbol{\ell} \rangle_i ,
\tag{46}
\]

where the mean-square (m.s.) radius of the point-proton distribution ⟨r²⟩_p is corrected by three terms: the first two estimate the effects of the intrinsic charge distribution of the free proton and neutron (with m.s. radii r²_p and r²_n) and the third adds a correction from the magnetic moments of the nucleons. Since we will consider the shift of charge radii for different isotopes of the same series, the actual value of r²_p cancels out.
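The cancellation of the proton term in isotopic shifts can be checked with a small numerical sketch. It assumes that the neutron correction enters the charge radius with the weight N/Z and treats the magnetic correction as a precomputed input, so that its prefactor does not need to be specified. The value r²_n = −0.117 fm² is the one quoted in this section; the function names, the point-proton radii and the two trial values of r²_p are hypothetical.

```python
R_N2 = -0.117  # fm^2, m.s. radius of the neutron charge distribution quoted in the text


def charge_radius_sq(r2_point_p: float, n: int, z: int, r_p2: float,
                     magnetic: float = 0.0, r_n2: float = R_N2) -> float:
    """Mean-square charge radius in fm^2.

    Assumed form: <r^2>_p + r_p^2 + (N/Z) r_n^2 + magnetic correction,
    where the magnetic correction is supplied by the caller.
    """
    return r2_point_p + r_p2 + (n / z) * r_n2 + magnetic


def isotopic_shift(r2_a: float, n_a: int, r2_b: float, n_b: int, z: int,
                   mag_a: float = 0.0, mag_b: float = 0.0,
                   r_n2: float = R_N2) -> float:
    """Shift of the m.s. charge radius from isotope a to isotope b (same Z).

    The proton term r_p^2 is the same for both isotopes and drops out.
    """
    return (r2_b - r2_a) + ((n_b - n_a) / z) * r_n2 + (mag_b - mag_a)


if __name__ == "__main__":
    # Hypothetical point-proton m.s. radii (fm^2) for two isotopes with Z = 20.
    r2_a, r2_b = 11.9, 12.0
    for r_p2 in (0.64, 0.77):  # two different hypothetical proton m.s. radii
        diff = (charge_radius_sq(r2_b, 28, 20, r_p2)
                - charge_radius_sq(r2_a, 20, 20, r_p2))
        print(f"r_p^2 = {r_p2}: shift = {diff:.4f} fm^2")
    print(f"direct: {isotopic_shift(r2_a, 20, r2_b, 28, 20):.4f} fm^2")
```

Both trial values of r²_p give the same shift, which illustrates why only the neutron and magnetic corrections matter when isotopes of the same element are compared.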
For the second correction term, which is independent of the interaction, we take r²_n = −0.117 fm² [29]. Finally, the magnetic correction can only depend weakly on the details of the interaction through the occupation factors v²_i when non-magic nuclei are considered. The same expressions were used during the fit of our parameterizations.

We begin with the Ca isotopes. Most parameterizations of Skyrme's interaction are not able to reproduce the fact that the charge radius of 48Ca has about the same size as that of 40Ca, see Fig. 11 in Ref. [29]. The middle panel of Fig. 33 shows the difference of the m.s. radii of 48Ca and 40Ca as a function of the tensor-term coupling constants α and β. First, this difference is almost independent of α, the strength of the like-particle tensor terms. Second, it is strongly correlated with β, the strength of the proton-neutron tensor term, with large positive values of β bringing the difference of radii into the domain of experimentally acceptable values [118] or even below, with a best match obtained for β = 80 MeV fm⁵. This effect can be explained by looking at the proton single-particle spectra of 40Ca (Fig. 19) and 48Ca (Fig. 20). Indeed, one observes that a positive neutron-proton tensor coupling constant decreases the strength of the proton spin-orbit field in 48Ca, which in turn lowers the π 1d3/2 level in 48Ca (compare the parameterizations TIJ in Fig. 20 with increasing I for given J). As a consequence, the m.s. radius of this state decreases as it sinks deeper into the potential well of 48Ca. At the same time, this level is pushed up in 40Ca, which slightly increases the contribution of this state to the charge m.s. radius of this nucleus. This effect is demonstrated in the top panel of Fig. 33, which displays the degeneracy-weighted and normalized change of the m.s.
radii of proton hole states between 40Ca and 48Ca as a function of the proton-neutron tensor term coupling constant β for forces with a like-particle tensor term coupling constant α = 120 MeV fm⁵. Indeed, the decreasing contribution from the π 1d3/2 state to the m.s. radius significantly decreases the isotopic shift between both Ca isotopes. It has to be noted that the m.s. charge radii of 40Ca and 48Ca are almost independent of α and that their absolute values are not reproduced by any of our parameterizations.

Footnote 3: There is a typographical error in Eq. (4.2) of Ref. [51] that was copied to Eq. (110) of Ref. [29]: the ħ/mc factor should be squared, as is trivially found by dimensional analysis and confirmed by Ref. [117].

FIG. 34: Change of slope in the m.s. charge radii, ∆²⟨r²_ch⟩, around 208Pb, Eq. (47), in fm² as a function of α (in MeV fm⁵) for three values of β (β = 0, 120, 240 MeV fm⁵). The experimental value is about one and a half times as large as the largest theoretical value shown here, see text.

The latter study demonstrates the correlation between the isotopic shift of m.s. charge radius between 40Ca and 48Ca and the absolute single-particle energy of the proton 1d3/2 state. This level can be moved around within the single-particle spectrum with the J² terms. However, the agreement of the calculated single-particle energy of the proton 1d3/2 state in both nuclei with experiment is not necessarily improved for the parameterizations that reproduce the isotopic shift of the m.s. charge radius. Furthermore, a good reproduction of the isotopic shift does not guarantee that the absolute values of the charge radii are well reproduced, see the bottom panel in Fig. 33. In fact, they are predicted too large for all of our parameterizations, which again points to deficiencies of the central field. Altogether, this suggests that in spite of its sensitivity to the coupling constants of the J² terms, the isotopic shift of m.s.
charge radius between 40Ca and 48Ca should not be used to constrain them before one has gained sufficient control over the central interaction.

A few further words of caution are in order. The charge radii of all light nuclei are significantly increased by dynamical quadrupole correlations, see Fig. 23 of Ref. [54]. Correlations beyond the static self-consistent mean field are also at the origin of the arch of the m.s. charge radii between 40Ca and 48Ca, which is reproduced neither by any pure mean-field model, see again Fig. 11 in Ref. [29], nor by the beyond-mean-field calculations with SLy4 of Ref. [54], while the shell model allows for a satisfactory description [119].

Many explanations have been put forward for the kink in the isotopic shifts of Pb radii. As it qualitatively appears in relativistic mean-field models, but not in non-relativistic ones using the standard spin-orbit interaction, Eq. (16), it has been used as a motivation to generalize the isospin mix of the standard spin-orbit energy density functional, Eq. (18), to simulate the isospin dependence of the relativistic Hartree models [78, 79]. The resulting parameterizations are not completely satisfactory, as the price for the improvement of the radii is a further deterioration of spin-orbit splittings [14], while the relativistic mean field gives a satisfactory description of both. Some standard Skyrme interactions that take the tensor terms from the central Skyrme force into account also give a kink, but it is by far too small to reproduce the experimental values [52].

When the m.s. radii along the chain of Pb isotopes are plotted as a function of N, the trend is nearly linear when looking separately at the isotopes below and above 208Pb. We will concentrate on the change in the slope at 208Pb that is brought by the tensor terms, which can be quantified through the second finite difference of the m.s.
radii at 208Pb,

\[
\Delta^2 \langle r^2_{\mathrm{ch}} \rangle (^{208}\mathrm{Pb}) = \frac{1}{2} \left[ r^2_{\mathrm{ch}}(^{206}\mathrm{Pb}) - 2\, r^2_{\mathrm{ch}}(^{208}\mathrm{Pb}) + r^2_{\mathrm{ch}}(^{210}\mathrm{Pb}) \right] .
\tag{47}
\]

There are two conflicting values to be found in the literature, 46.4 ± 1.4 fm² [118] and the significantly larger 59 ± 3 fm² [120]. Figure 34 shows the change of slope around 208Pb as defined through Eq. (47) as a function of the like-particle tensor coupling constant α and for three different values of β. It is striking to see that this quantity is almost independent of the neutron-proton tensor coupling constant β, so the change is mainly induced by the tensor interaction between particles of the same kind. It has been noted before that the kink in the isotopic shift of the charge radii in Pb isotopes is correlated with the single-particle spectrum of neutrons above N = 126, in particular the position of the 1i11/2 level. (This has to be contrasted with the Ca isotopic chain discussed above, where the difference of charge radii between 40Ca and 48Ca appears to be particularly sensitive to the single-particle spectrum of the protons.) The closer the 1i11/2 level is to the 2g9/2 level that is filled above N = 126, the more the 1i11/2 becomes occupied through pairing correlations. Through the shape of its radial wave function, the partial filling of the nodeless 1i11/2 increases the neutron radius faster than filling only the 2g9/2, and in particular faster than for the isotopes below N = 126. As the protons follow the density distribution of the neutrons, the charge radius grows rapidly beyond N = 126. This offers an explanation why the kink increases with the like-particle tensor term coupling constant α: for large values of the weight α of the neutron spin-orbit current in the neutron spin-orbit potential, Eq. (35), the spin-orbit splitting of the ν 1i levels is reduced such that the 1i11/2 approaches the 2g9/2 level in 208Pb, see Fig. 18.
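The kink measure of Eq. (47) is simple to evaluate once the three m.s. charge radii are known. The sketch below assumes that Eq. (47) is the second finite difference of the m.s. charge radii around 208Pb with a conventional prefactor of 1/2; this prefactor is an assumption and is exposed as the parameter `prefactor` so that it can be changed. The function name `kink` and all radii are hypothetical and serve only to show the behaviour of the formula.

```python
def kink(r2_minus: float, r2_center: float, r2_plus: float,
         prefactor: float = 0.5) -> float:
    """Second finite difference of m.s. charge radii (fm^2).

    r2_minus, r2_center, r2_plus: m.s. charge radii of the isotopes with
    N - 2, N and N + 2 neutrons. The default prefactor of 1/2 is an
    assumption about the normalization of Eq. (47).
    """
    return prefactor * (r2_minus - 2.0 * r2_center + r2_plus)


if __name__ == "__main__":
    # Hypothetical radii with the same slope below and above the central
    # isotope: no kink.
    print(kink(30.0, 30.1, 30.2))
    # Hypothetical radii growing twice as fast above the central isotope
    # as below it: positive kink.
    print(kink(30.0, 30.1, 30.3))
```

A strictly linear evolution of the radii with N gives a vanishing second difference, so any non-zero value directly measures the change of slope at the shell closure.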
While the kink is clearly sensitive to the tensor terms, they cannot be responsible for the entire effect, as even for extreme parameterizations that give unrealistic single-particle spectra the calculated kink hardly reaches about three quarters of its experimental value.

V. SUMMARY AND CONCLUSIONS

We have reported a systematic study of the effects of the J² (tensor) terms in the Skyrme energy functional for spherical nuclei. The aim of the present study was not to obtain a unique best fit of the Skyrme energy functional with tensor terms, but to analyze the impact of the tensor terms on a large variety of observables in calculations at a pure mean-field level and to identify, if possible, observables that are particularly, even uniquely, sensitive to the J² terms. To reach our goal, we have built a set of 36 parameterizations that cover the region of the two-dimensional parameter space of the coupling constants of the J²_t terms that does not give obviously unphysical predictions for the wide variety of observables we have looked at. The fits were performed using a protocol very similar to that of the SLy parameterizations [51, 52]. The 36 actual sets of parameters can be found in the Physical Review archive [85].

We use a formalism that explicitly relates the tensor terms in the energy functional to underlying effective density-dependent central, spin-orbit and tensor forces (or vertices) in the particle-hole channel. As has long been known, a zero-range tensor force gives no qualitatively new terms for spherical mean-field states when combined with a central Skyrme force, but solely modifies the coupling constants of the J² terms that are already present. The contribution from the central Skyrme force to the coupling constants of the J² terms depends on the same parameters t1, x1, t2 and x2 that determine the effective mass and contribute to the surface terms.
As the latter terms are much more important for the description of bulk properties than the J² terms, the coupling constants of the J² terms are confined to a very small region of the parameter space. From this point of view, adding a tensor force is necessary to explore it fully.

There is, however, the alternative interpretation of the Skyrme energy functional from the density matrix expansion, which in the absence of ab-initio realizations so far is used as a motivation to set up energy functionals with independent, and phenomenologically fitted, coupling constants of all terms not constrained by symmetries. In particular, this can be used to set unwanted or underconstrained terms to zero, as is done for many existing parameterizations of the (central) Skyrme interaction. For the ground states of spherical nuclei, as discussed here, the frameworks cannot be distinguished. For deformed nuclei and, in particular, polarized nuclear matter, this choice will make a difference.

As a result of our study, we have obtained a long list of potential deficiencies of the Skyrme energy functional, most of which can be expected to be related to the properties of the central and spin-orbit interactions used. In fact, these deficiencies become more obvious the moment one adds a tensor force, as it appears that the presence of a tensor force unbalances a delicate compromise among the various terms of the Skyrme interaction that makes it possible to get the global trend of gross features of the shell structure right.

Our conclusions, however, have to be taken with a grain of salt. On the one hand, some might depend on the fit protocol; on the other hand, we have to stress that (within the framework of our study, and of all others available so far using mean-field methods) the comparison between calculated and empirical single-particle energies is neither straightforward nor without the risk of being misled.
However, without even looking at single-particle spectra, we find that

1. The presence of the tensor terms leads to a strong rearrangement of the other coupling constants, most notably that of the spin-orbit force. In fact, we find that the variation of the spin-orbit strength W0 provoked by the presence of tensor terms has a larger impact on the global systematics of single-particle spectra than the tensor terms themselves. The rearrangement of the parameters of the central and spin-orbit parts of the effective interaction suggests that perturbative studies of the tensor terms, in which they are added to an existing parameterization without readjustment, allow only very limited conclusions.

2. In the Skyrme energy functional, the combined coupling constants of the spin-orbit and tensor terms are nearly exclusively fixed by the mass differences between 40Ca, 48Ca and 56Ni. This correlation appears to be (at least partly) spurious, the rapidly varying spin-orbit and tensor terms being misused to simulate missing physics in the standard Skyrme functional.

3. The cost function χ² used in our fit protocol prefers parameterizations with β = 0, i.e. pure like-particle tensor terms ∼ (J²_n + J²_p), without giving a clear preference for a value of the corresponding coupling constant α. By contrast, the mass residuals of 134 spherical even-even nuclei are minimized for interactions with large α and β. However, as we will discuss in [41], the deformation properties of many nuclei obtained with the latter parameterizations are unrealistic, which disfavors this region of the parameter space.

4. The difference of the charge radii of 40Ca and 48Ca turns out to be particularly sensitive to the absolute single-particle energy of the proton 1d3/2 level, which can be moved around by the J² terms.
As the parameterizations that give the best agreement for the absolute placement of this level do not necessarily give the best overall single-particle spectra for these two nuclei, this quantity should not be used to constrain the J² terms.

Concerning the global properties of the spin-orbit current J and its contribution to the spin-orbit potential, we have shown that

1. The spin-orbit current J in non-spin-saturated doubly-magic nuclei such as 56Ni, 100Sn, 132Sn or 208Pb is dominated by the nodeless intruder orbitals. Through the contribution of the tensor terms to the spin-orbit field, the feedback effect on their own spin-orbit splitting is maximized.

2. In light nuclei, J, and consequently the contribution of the J² terms to the binding energy and the spin-orbit potential, varies rapidly between near-zero and very large values when adding just a few nucleons to a given nucleus. In heavy spherical nuclei, the variation becomes much slower and smoother, as on the one hand one does not encounter spin-saturated configurations anymore, and on the other hand there are more and more high-ℓ states with large degeneracy that require more nucleons to be filled.

3. The contribution from the zero-range spin-orbit force to the spin-orbit potential is peaked at the nuclear surface, as it is proportional to the gradient of the density. By contrast, the contribution from the zero-range tensor terms is peaked further inside the nucleus, modifying the width of the spin-orbit potential with varying nucleon numbers. As shown in Ref. [48], experimental data tend to disfavor such a modification.

4. Large negative coupling constants of the tensor terms will lead to instabilities, where a nucleus gains energy by separating the levels from many spin-orbit partners on both sides of the Fermi energy. This process leads to unphysical single-particle spectra and rules out a large part of the parameter space.
In particular cases, one might even obtain a (probably spurious) coexistence of two spherical configurations with different shell structure in the same nucleus, which are separated by a barrier.

The main motivation to add J² terms is of course to improve the single-particle spectra. All observations and conclusions concerning those have to be taken with care, as in this study we compare the eigenvalues of a spherical single-particle Hamiltonian with the separation energy to low-lying states in the odd-A neighbors of doubly and semi-magic nuclei (as was done in all existing earlier studies). When looking at the single-particle spectra in doubly-magic nuclei (or semi-magic nuclei combined with a strong subshell closure of the other species) we find that

1. The relative error of the spin-orbit splittings depends strongly on the principal quantum number of the orbitals within a given shell, such that for parameterizations without the tensor terms the splitting of the intruder state (without nodes in the radial wave function) tends to be too small, while it becomes too large with increasing number of nodes. Adding the tensor terms further increases the discrepancy. This problem can only be resolved by an improved control over the shape of the spin-orbit potential. Indeed, the size of the spin-orbit splittings is related to the overlap of the radial wave function of a given single-particle state with the spin-orbit potential. The tensor terms modify the width of the spin-orbit potential, but curing this deficiency calls for a large negative like-particle tensor coupling constant α, which is not consistent with the evolution of spin-orbit splittings along chains of semi-magic nuclei, and will lead to instabilities.

2. We also find that, in a given nucleus, the predicted spin-orbit splittings of neutron levels are larger than those of the protons when both are compared to experiment, which hints at an unresolved isospin trend in the spin-orbit interaction.

3. For spin-saturated doubly-magic nuclei such as 16O and 40Ca, the spin-orbit splittings of the spin-saturated species of nucleons depend strongly on the coupling constants of the J² terms, although they do not contribute to the spin-orbit field. This is a consequence of the strong correlation between the spin-orbit and tensor term coupling constants, which try to compensate each other in spin-unsaturated nuclei. For parameterizations with strong tensor-term coupling constants, the resulting spin-orbit force leads to unrealistic single-particle spectra of spin-saturated configurations.

4. The centroid of the spin-orbit partners that give the intruder state tends to be too high compared to the major shell below.

The main effect of the tensor terms, on which most of the recent studies concentrate, is the evolution of spin-orbit splittings with N and Z. Unfortunately, there are no data for the splittings themselves, such that one relies on data for the evolution of the distance of two levels with different ℓ. The comparison is compromised by the global deficiencies of the single-particle spectra listed above. Still, a careful comparison of calculations and experiment suggests that

1. The evolution of the proton 1h11/2, 1g7/2 and 2d5/2 levels in the chain of Sn isotopes and that of the proton 1f5/2 and 2p3/2 levels in Ni isotopes call for a positive proton-neutron tensor coupling constant β with a value around 120 MeV fm⁵, consistent with the findings of Refs. [48, 49, 50].

2. The evolution of the neutron 1d3/2 and 2s1/2 levels between 40Ca and 48Ca calls for a like-particle tensor coupling constant α with a similar value around 120 MeV fm⁵. This is at variance with the findings of Refs.
[48, 49, 50], but in qualitative agreement with the parameterization skxta of Brown et al. [48] for which the tensor terms were derived from a realis- tic interaction but disregarded thereafter because of its poor description of spin-orbit splittings. 3. Combined this leads to a dominantly isoscalar ten- sor term with a coupling constant CJ0 around 120 MeV fm5, while the isovector coupling constant will have a small, near-zero, value. Our study is obviously only a stepping stone towards improved parameterizations of the Skyrme energy den- sity functional. There are a number of necessary further studies and future theoretical developments 1. The deformation properties of selected parameter- izations TIJ from this study will be discussed in a forthcoming paper [41]. 2. The influence of the terms depending on time-odd densities and currents in the complete energy func- tional (23) on nuclear matter and finite nuclei (rota- tional bands etc) is under investigation as well. The existing stability criteria of polarized matter have to be generalized as the tensor force introduces new unique terms, for example in the Landau parame- ters [121]. 3. It is well known that the strength of the spin- orbit force has to scale with the effective mass of an interaction, which in turn determines the average density of single-particle levels. All pa- rameterizations discussed here have a similar effec- tive mass close to m∗0/m = 0.7 that was already used for the SLy parameterizations. This value is somewhat smaller than the one obtained from ab- initio calculations. We have checked that increas- ing the effective isoscalar mass to the more realistic m∗0/m = 0.8 (which within our fit protocol requires to use two density dependent terms [84]) does not significantly affect any of our conclusions. 4. It is evident that improvements of the central and spin-orbit parts of the energy density functional are necessary, which will require a generalization of its functional form. 
Other motivations for such a generalization were found recently [84].

5. The only quantity that we found sufficiently sensitive to the tensor terms is the evolution of the distance between single-particle levels in isotopic or isotonic chains of semi-magic nuclei. The distance between the levels that can be used for such studies is so large that it might be compromised by their coupling to collective excitations. Reliable calculations including pairing, polarization as well as particle-vibration coupling effects [89, 90] along isotopic and isotonic chains are needed to test the quality, reliability and limits of the simplistic identification of the eigenvalues of the spherical mean-field Hamiltonian in an even-even nucleus with the separation energy to or from low-lying states in the adjacent odd-A nuclei.

Acknowledgments

We thank P. Bonche, H. Flocard, P.-H. Heenen and B. A. Brown for stimulating and encouraging discussions. Work by M. B. and K. B. was performed within the framework of the Espace de Structure Nucléaire Théorique (ESNT). T. L. acknowledges the hospitality of the SPhN and ESNT on many occasions during the realization of this work. This work was supported by the U.S. National Science Foundation under Grant No. PHY-0456903.

APPENDIX A: COUPLING CONSTANTS OF THE SKYRME ENERGY FUNCTIONAL

The coupling constants of the central Skyrme energy density functional in terms of the parameters of the central Skyrme force are given by

\begin{equation}
\begin{aligned}
A^{\rho}_0 &= \tfrac{3}{8}\, t_0 + \tfrac{3}{48}\, t_3\, \rho_0^{\alpha}(\mathbf{r}), \\
A^{\rho}_1 &= -\tfrac{1}{4}\, t_0 \left( \tfrac{1}{2} + x_0 \right) - \tfrac{1}{24}\, t_3 \left( \tfrac{1}{2} + x_3 \right) \rho_0^{\alpha}(\mathbf{r}), \\
A^{s}_0 &= -\tfrac{1}{4}\, t_0 \left( \tfrac{1}{2} - x_0 \right) - \tfrac{1}{24}\, t_3 \left( \tfrac{1}{2} - x_3 \right) \rho_0^{\alpha}(\mathbf{r}), \\
A^{s}_1 &= -\tfrac{1}{8}\, t_0 - \tfrac{1}{48}\, t_3\, \rho_0^{\alpha}(\mathbf{r}), \\
A^{\tau}_0 &= \tfrac{3}{16}\, t_1 + \tfrac{1}{4}\, t_2 \left( \tfrac{5}{4} + x_2 \right), \\
A^{\tau}_1 &= -\tfrac{1}{8}\, t_1 \left( \tfrac{1}{2} + x_1 \right) + \tfrac{1}{8}\, t_2 \left( \tfrac{1}{2} + x_2 \right), \\
A^{T}_0 &= -\tfrac{1}{8} \left[ t_1 \left( \tfrac{1}{2} - x_1 \right) - t_2 \left( \tfrac{1}{2} + x_2 \right) \right], \\
A^{T}_1 &= -\tfrac{1}{16} \left( t_1 - t_2 \right), \\
A^{\Delta\rho}_0 &= -\tfrac{9}{64}\, t_1 + \tfrac{1}{16}\, t_2 \left( \tfrac{5}{4} + x_2 \right), \\
A^{\Delta\rho}_1 &= \tfrac{3}{32}\, t_1 \left( \tfrac{1}{2} + x_1 \right) + \tfrac{1}{32}\, t_2 \left( \tfrac{1}{2} + x_2 \right), \\
A^{\Delta s}_0 &= \tfrac{3}{32}\, t_1 \left( \tfrac{1}{2} - x_1 \right) + \tfrac{1}{32}\, t_2 \left( \tfrac{1}{2} + x_2 \right), \\
A^{\Delta s}_1 &= \tfrac{3}{64}\, t_1 + \tfrac{1}{64}\, t_2 .
\end{aligned}
\tag{A1}
\end{equation}

The coupling constants of the spin-orbit energy density functional in terms of the parameters of the spin-orbit force are given by

\begin{equation}
A^{\nabla J}_0 = -\tfrac{3}{4}\, W_0 , \qquad A^{\nabla J}_1 = -\tfrac{1}{4}\, W_0 .
\tag{A2}
\end{equation}

The coupling constants of the tensor energy density functional in terms of the parameters of Skyrme’s tensor force are given by (Table I in [59])

\begin{align}
B^{T}_0 &= -\tfrac{1}{8} \left( t_e + 3 t_o \right), & B^{T}_1 &= \tfrac{1}{8} \left( t_e - t_o \right), \tag{A3} \\
B^{F}_0 &= \tfrac{3}{8} \left( t_e + 3 t_o \right), & B^{F}_1 &= -\tfrac{3}{8} \left( t_e - t_o \right), \tag{A4} \\
B^{\Delta s}_0 &= \tfrac{3}{32} \left( t_e - t_o \right), & B^{\Delta s}_1 &= -\tfrac{1}{32} \left( 3 t_e + t_o \right), \tag{A5} \\
B^{\nabla s}_0 &= \tfrac{9}{32} \left( t_e - t_o \right), & B^{\nabla s}_1 &= -\tfrac{3}{32} \left( 3 t_e + t_o \right). \tag{A6}
\end{align}

APPENDIX B: PHASE TRANSITIONS

The densities ρ and τ entering the energy functional (28) vary smoothly with nucleon numbers as they follow the geometric growth of the nucleus. As a result, a functional depending only on ρ and τ usually shows a unique minimum for given N, Z and shape. The situation is quite different when the tensor terms are taken into account. Indeed, the amplitude of the spin-orbit current density J (27) depends on the number of spin-unsaturated single-particle states in the nucleus; it varies from (almost) zero in spin-saturated nuclei to large finite values as a consequence of shell and finite-size effects, see Fig. 5. This behavior poses the risk of an instability, which was already reported in [5]: multiplying J with a large coupling constant in the spin-orbit potential (35) might, for certain combinations of the signs of the coupling constant and the spin-orbit currents of protons and neutrons, increase the spin-orbit splittings. In some nuclei, this will cause two levels originating from different ℓ shells to approach the Fermi energy, one from above and the other from below, or even to cross. In that situation, their occupation numbers will change such that J increases further, which feeds back onto the spin-orbit potential and ultimately leads to a dramatic rearrangement of the single-particle spectrum.

FIG. 35: Total binding energy of $^{120}$Sn as a function of $C = \int d^3r \, \mathbf{J}_n \cdot \nabla \rho_n$ in a constrained calculation. The dashed curve shows results obtained with the parameterization mentioned in the text, while the solid curve shows results obtained with SLy5.
We faced this problem when attempting to fit parameter sets with large negative $C^J_0$ and $C^J_1$. During the fit, some nuclei sometimes fell into the instability, depending on the values of the other coupling constants. As this is a highly nonlinear threshold effect that results in a very large energy gain from tiny modifications of the coupling constants, the corresponding fits did not, and could not, converge.

In special cases, one might even run into a situation with two coexisting minima, where as a function of a suitable coordinate the configuration with regular shell structure is separated by a barrier from a configuration with unphysically large spin-orbit splittings. In such a case, a calculation of the ground state might converge into one or the other minimum depending on the initial conditions chosen for the iterative solution of the HFB equations. In a calculation along an isotopic or isotonic chain, the coexistence will reveal itself through a large scattering of the mass residuals, which will fall on two distinct curves.

FIG. 36: Single-particle spectra of $^{120}$Sn corresponding to the minimum found with SLy5 and (a) the secondary minimum found with TXX, (b) the absolute minimum (see Fig. 35; left: neutron levels, right: proton levels).

We illustrate this phenomenon in Fig. 35 for $^{120}$Sn using a parameter set denoted “TXX” with $C^J_0 = -157.57$ MeV fm$^5$ and $C^J_1 = -114.88$ MeV fm$^5$, which is located outside the parameter space shown in Fig. 1, to its lower left. Among the various possible recipes for a constraint on the spin-orbit current density, we chose to minimize the following quantity:

\[
E[\rho] - \mu \left[ \int d^3r \, \mathbf{J}_n \cdot \nabla \rho_n - C \right],
\]

where μ is a Lagrange parameter and C is a constant used to tune the constraint. The energy curve exhibits two minima denoted (a) and (b). The corresponding single-particle spectra are shown in Fig.
36 along with those obtained for SLy5. The minimum (a) corresponds to an almost spin-saturated neutron configuration where both spin partners are either occupied or empty,$^{4}$ which is very similar to what is found using SLy5. In the minimum (b), which is deeper by more than 7 MeV, the single-particle spectrum is completely reorganized in order to maximize the spin-orbit current density and take advantage of its contribution in the functional. In this situation the neutron spin doublets 2d, 1g and 1h split on both sides of the Fermi surface and generate a large spin-orbit current density. This clearly shows that the parameter sets with large and negative coupling constants of the $J^2$ terms must be discarded since for many nuclei they lead to ground states with unrealistic single-particle structure.

Note that this kind of instability does not appear for the spin-orbit term: although its contribution to the energy functional (28) also varies between small, sometimes near-zero, and very large values, see Figs. 28 and 29, it is only linear in J. As a consequence, its contribution to the spin-orbit potential (35) lacks the feedback mechanism outlined above as it does not scale with J. Still, its contribution to the total energy is usually much larger than that of the $J^2$ terms, so it plays a decisive role for the absolute energy gained when varying J.

[1] M. Goeppert Mayer, Phys. Rev. 74, 235 (1948). [2] O. Haxel, J. H. D. Jensen, and H. E. Suess, Phys. Rev. 75, 1766 (1949). [3] E. Feenberg and K. C. Hammack, Phys. Rev. 75, 1877 (1949). [4] M. Goeppert Mayer, Phys. Rev. 75, 1969 (1949). [5] M. Beiner, H. Flocard, Nguyen Van Giai, and P. Quentin, Nucl. Phys. A238, 29 (1975). [6] J. Dobaczewski, I. Hamamoto, W. Nazarewicz, and J. A. Sheikh, Phys. Rev. Lett. 72, 981 (1994). [7] G. A. Lalazissis, D. Vretenar, W. Pöschl, and P. Ring, Phys. Lett. B418, 7 (1998). [8] G. A. Lalazissis, D. Vretenar, W. Pöschl, and P. Ring, Nucl. Phys. A632, 363 (1998). [9] B. Chen, J.
Dobaczewski, K. L. Kratz, K. Langanke, B. Pfeiffer, F.-K. Thielemann, and P. Vogel, Phys. Lett. B355, 37 (1995). [10] J. Dobaczewski, A. Nazarewicz, and T. R. Werner, Phys. Scr. T56, 15 (1995). [11] J. M. Pearson, R. C. Nayak, and S. Goriely, Phys. Lett. B387, 455 (1996). [12] B. Pfeiffer, K.-L. Kratz, and F.-K. Thielemann, Z. Phys. A357, 235 (1997). [13] J. Dechargé, J. F. Berger, K. Dietrich, and M. S. Weiss, Phys. Lett. B451, 275 (1999). [14] M. Bender, K. Rutz, P.-G. Reinhard, J. A. Maruhn, and W. Greiner, Phys. Rev. C 60, 034304 (1999). [15] K. Langanke, J. Terasaki, F. Nowacki, D. J. Dean, and W. Nazarewicz, Phys. Rev. C 67, 044314 (2003). [16] R. B. Wiringa, V. G. J. Stoks, and R. Schiavilla, Phys. Rev. C 51, 38 (1995). [17] R. Machleidt, Phys. Rev. C 63, 024001 (2001). [18] S. C. Pieper and R. B. Wiringa, Ann. Rev. Nucl. Part. Sci. 51, 53 (2001). [19] P. Navrátil and W. E. Ormand, Phys. Rev. C 68, 034305 (2003). [20] J. M. Eisenberg and W. Greiner, Nuclear Theory. III. Microscopic theory of the nucleus (Second printing, North Holland Physics Publ., Elsevier Science Publish- ers, Amsterdam, 1976). [21] S. G. Nilsson and I. Ragnarsson, Shapes and Shells in Nuclear Structure (Cambridge University Press, Cam- bridge, England, 1995). [22] T. Neff and H. Feldmeier, Nucl. Phys. A713, 311 (2003). [23] R. Roth, T. Neff, H. Hergert, and H. Feldmeier, Nucl. Phys. A745, 3 (2004). [24] H. A. Bethe, Phys. Rev. 167, 879 (1968). [25] J. W. Negele, Phys. Rev. C 1, 1260 (1970). [26] R. R. Scheerbaum, Phys. Lett. B63, 381 (1976). [27] A. L. Goodman and J. Borysowicz, Nucl. Phys. A295, 333 (1978). [28] D. C. Zheng and L. Zamick, Ann. Phys. (NY) 206, 106 (1991). [29] M. Bender, P.-H. Heenen, and P.-G. Reinhard, Rev. Mod. Phys. 75, 121 (2003). [30] T. H. R. Skyrme, Phil. Mag. 1, 1043 (1956). [31] T. H. R. Skyrme, Nucl. Phys. 9, 615 (1958). [32] J. S. Bell and T. H. R. Skyrme, Phil. Mag. 1, 1055 (1956). [33] T. H. R. Skyrme, Nucl. Phys. 9, 635 (1958). [34] D. Gogny, Nucl. Phys. 
A237, 399 (1975). [35] J. Dechargé and D. Gogny, Phys. Rev. C 21, 1568 (1980). [36] D. Vautherin and D. M. Brink, Phys. Rev. C 5, 626 (1972). [37] F. Stancu, D. M. Brink, and H. Flocard, Phys. Lett. B68, 108 (1977). [38] F. Tondeur, Phys. Lett. B123, 139 (1983). [39] K.-F. Liu, H. Luo, Z. Ma, Q. Shen, and S. A. Moszkowski, Nucl. Phys. A534, 1 (1991). [40] N. Onishi and J. W. Negele, Nucl. Phys. A301, 336 (1978). [41] M. Bender, K. Bennaceur, T. Duguet, P.-H. Heenen, T. Lesinski, and J. Meyer (2007), companion paper, in preparation. [42] W.-H. Long, Nguyen Van Giai, and J. Meng, Phys. Lett. B640, 150 (2006). [43] M. S. Fayache, L. Zamick, and B. Castel, Phys. Rep. 290, 201 (1997). [44] T. Otsuka, T. Suzuki, R. Fujimoto, H. Grawe, and Y. Akaishi, Phys. Rev. Lett. 95, 232502 (2005). [45] J. P. Schiffer, S. J. Freeman, J. A. Caggiano, C. Deibel, A. Heinz, C.-L. Jiang, R. Lewis, A. Parikh, P. D. Parker, K. E. Rehm et al., Phys. Rev. Lett. 92, 162501 (2004). [46] T. Otsuka, T. Matsuo, and D. Abe, Phys. Rev. Lett. 97, 162501 (2006). [47] J. Dobaczewski, in Proceedings of the Third ANL/MSU/JINA/INT RIA Workshop, edited by T. Duguet, H. Esbensen, K. M. Nollett, and C. D. Roberts (World Scientific, 2006), vol. 15 of Proceed- ings from the Institute for Nuclear Theory, preprint nucl-th/0604043. [48] B. A. Brown, T. Duguet, T. Otsuka, D. Abe, and T. Suzuki, Phys. Rev. C 74, 061303(R) (2006). [49] G. Colò, H. Sagawa, S. Fracasso, and P. F. Bortignon, Phys. Lett. B646, 227 (2007). [50] D. M. Brink and F. Stancu, Phys. Rev. C 75, 064311 (2007). [51] E. Chabanat, P. Bonche, P. Haensel, J. Meyer, and R. Schaeffer, Nucl. Phys. A627, 710 (1997). [52] E. Chabanat, P. Bonche, P. Haensel, J. Meyer, and R. Schaeffer, Nucl. Phys. A635, 231 (1998), erratum Nucl. Phys. A643, 441 (1998). [53] W. H. Long, H. Sagawa, J. Meng, and Nguyen Van Giai (2006), preprint nucl-th/0609076. [54] M. Bender, G. F. Bertsch, and P.-H. Heenen, Phys. Rev. C 73, 034322 (2006). [55] M. Bender, P. Bonche, T. 
Duguet, and P. H. Heenen, Nucl. Phys. A723, 354 (2003). [56] M. Bender, P. Bonche, and P. H. Heenen, Phys. Rev. C 74, 024312 (2006). [57] A. Chatillon, C. Theisen, P. T. Greenlees, G. Auger, J. E. Bastin, E. Bouchez, B. Bouriquet, J. M. Casand- jian, R. Cee, E. Clément et al., Eur. Phys. J. A30, 397 (2006). [58] J. C. Slater, Phys. Rev. 81, 385 (1951). [59] E. Perlińska, S. G. Rohoziński, J. Dobaczewski, and W. Nazarewicz, Phys. Rev. C 69, 014316 (2004). [60] J. Dobaczewski, J. Dudek, S. G. Rohoziński, and T. R. Werner, Phys. Rev. C 62, 014310 (2000). [61] J. Dobaczewski and J. Dudek, Acta Phys. Pol. B27, 45 (1996). [62] Y. M. Engel, D. M. Brink, K. Goeke, S. J. Krieger, and D. Vautherin, Nucl. Phys. A249, 215 (1975). [63] J. Dobaczewski and J. Dudek, Phys. Rev. C 52, 1827 (1995), erratum Phys. Rev. C 55, 3177 (1997). [64] H. Flocard, Ph.D. thesis, Orsay, Série A, No. 1543, Uni- versité Paris Sud (1975). [65] J. Dobaczewski, H. Flocard, and J. Treiner, Nucl. Phys. A422, 103 (1984). [66] J. W. Negele and D. Vautherin, Phys. Rev. C 5, 1472 (1972). [67] J. W. Negele and D. Vautherin, Phys. Rev. C 11, 1031 (1975). [68] X. Campi and A. Bouyssy, Phys. Lett. 73B, 263 (1978). [69] H. Krivine, J. Treiner, and O. Bohigas, Nucl. Phys. A336, 155 (1980). [70] J. Bartel, P. Quentin, M. Brack, C. Guet, and H. B. Hakansson, Nucl. Phys. A386, 79 (1982). [71] F. Tondeur, M. Brack, M. Farine, and J. M. Pearson, Nucl. Phys. A420, 297 (1984). [72] J. Friedrich and P. G. Reinhard, Phys. Rev. C 33, 335 (1986). [73] P. G. Reinhard, D. J. Dean, W. Nazarewicz, J. Dobaczewski, J. A. Maruhn, and M. R. Strayer, Phys. Rev. C 60, 014316 (1999). [74] P. Bonche, H. Flocard, and P. H. Heenen, Nucl. Phys. A467, 115 (1987). [75] J. Engel, M. Bender, J. Dobaczewski, W. Nazarewicz, and R. Surman, Phys. Rev. C 60, 014302 (1999). [76] M. Bender, J. Dobaczewski, J. Engel, and W. Nazarewicz, Phys. Rev. C 65, 054322 (2002). [77] J. Terasaki, J. Engel, M. Bender, J. Dobaczewski, W. Nazarewicz, and M. V. 
Stoitsov, Phys. Rev. C 71, 034310 (2005). [78] M. M. Sharma, G. Lalazissis, J. König, and P. Ring, Phys. Rev. Lett. 74, 3744 (1995). [79] P. G. Reinhard and H. Flocard, Nucl. Phys. A584, 467 (1995). [80] A. Akmal, V. R. Pandharipande, and D. G. Ravenhall, Phys. Rev. C58, 1804 (1998). [81] S. Goriely, M. Samyn, J. M. Pearson, and M. Onsi, Nucl. Phys. A750, 425 (2005). [82] B. A. Brown, private communication. [83] K. F. Liu and G. E. Brown, Nucl. Phys. A265, 385 (1976). [84] T. Lesinski, K. Bennaceur, T. Duguet, and J. Meyer, Phys. Rev. C 74, 044315 (2006). [85] See EPAPS Document No. ???? for the coupling con- stants of the 36 TIJ parametrizations. [86] T. Duguet, K. Bennaceur, and P. Bonche (2006), invited talk at the Workshop on New developments in Nuclear Self-Consistent Mean-Field Theories, Yukawa Institute for Theoretical Physics, Kyoto, Japan, May 30 - June 1, 2005, nucl-th/0508054. [87] K. Bennaceur and J. Dobaczewski, Comp. Phys. Comm. 168, 96 (2005). [88] K. Rutz, M. Bender, P. G. Reinhard, J. A. Maruhn, and W. Greiner, Nucl. Phys. A634, 67 (1998). [89] V. Bernard and Nguyen Van Giai, Nucl. Phys. A348, 75 (1980). [90] E. Litvinova and P. Ring, Phys. Rev. C 73, 044328 (2006). [91] G. F. Bertsch, The Practitioner’s Shell model (North Holland, Amsterdam, 1972). [92] P. Ring and P. Schuck, The Nuclear Many Body Problem (Springer, Berlin, 1980). [93] E. Caurier, G. Martinez-Pinedo, F. Nowacki, A. Poves, and A. P. Zuker, Rev. Mod. Phys. 77, 427 (2005). [94] J. Duflo and A. P. Zuker, Phys. Rev. C 59, R2347 (1999). [95] H. Grawe, A. Blazhev, M. Górska, I. Mukha, C. Plet- tner, E. Roeckl, F. Nowacki, R. Grzywacz, and M. Saw- icka, Eur. Phys. J. A25, 357 (2005). [96] H. Grawe, A. Blazhev, M. Górska, R. Grzywacz, H. Mach, and I. Mukha, Eur. Phys. J. A27, 257 (2006). [97] B. A. Brown, Phys. Rev. C 58, 220 (1998). [98] M. López-Quelle, Nguyen Van Giai, S. Marcos, and L. N. Savushkin, Phys. Rev. C 61, 064321 (2000). [99] J. Skalski, Phys. Rev. C 63, 024312 (2001). 
[100] K. Hauschild, M. Rejmund, H. Grawe, E. Caurier, F. Nowacki, F. Becker, Y. Le Coz, W. Korten, J. Döring, M. Górska et al., Phys. Rev. Lett. 87, 072501 (2001). [101] J. Shergur, D. J. Dean, D. Seweryniak, W. B. Walters, A. Wohr, P. Boutachkov, C. N. Davids, I. Dillmann, A. Juodagalvis, G. Mukherjee et al., Phys. Rev. C 71, 064323 (2005). [102] M. G. Porquet, S. Péru, and M. Girod, Eur. Phys. J. A25, 319 (2005). [103] Z. Patyk, A. Baran, J. F. Berger, J. Dechargé, J. Dobaczewski, P. Ring, and A. Sobiczewski, Phys. Rev. C 59, 704 (1999). [104] D. Lunney, J. M. Pearson, and C. Thibault, Rev. Mod. Phys. 75, 1021 (2003). [105] J. Dobaczewski, M. V. Stoitsov, and W. Nazarewicz, in Proc. Int. Conf. on Nuclear Physics, Large and Small, Cocoyoc, Mexico, April 19-22, 2004, edited by R. Bi- jker, R. F. Casten, and A. Frank (American Insti- tute of Physics, Melville, NY, 2004), pp. 51–56, nucl- th/0404077. [106] W. Satu la, D. J. Dean, J. Gary, S. Mizutori, and W. Nazarewicz, Phys. Lett. B407, 103 (1997). [107] F. Tondeur, S. Goriely, J. M. Pearson, and M. Onsi, Phys. Rev. C 62, 024308 (2000). [108] M. Samyn, S. Goriely, P. H. Heenen, J. M. Pearson, and F. Tondeur, Nucl. Phys. A700, 142 (2002). [109] S. Goriely, M. Samyn, M. Bender, and J. M. Pearson, Phys. Rev. C 68, 054325 (2003). [110] P.-G. Reinhard, M. Bender, W. Nazarewicz, and T. Vertse, Phys. Rev. C 73, 014309 (2006). [111] P.-G. Reinhard and D. Drechsel, Z. Phys. A290, 85 (1979). [112] M. Girod and P.-G. Reinhard, Nucl. Phys. A384, 179 (1982). [113] P. Bonche, J. Dobaczewski, H. Flocard, and P. H. Hee- nen, Nucl. Phys. A530, 149 (1991). [114] P. H. Heenen, P. Bonche, J. Dobaczewski, and H. Flo- card, Nucl. Phys. A561, 367 (1993). [115] N. Tajima, P. Bonche, H. Flocard, P. H. Heenen, and M. S. Weiss, Nucl. Phys. A551, 434 (1993). [116] S. A. Fayans, S. V. Tolokonnikov, E. L. Trykov, and D. Zawischa, Nucl. Phys. A676, 49 (2000). [117] W. Bertozzi, J. Friar, J. Heisenberg, and J. W. Negele, Phys. Lett. 
B41, 408 (1972). [118] E. W. Otten, in Treatise on Heavy-Ion Science, edited by A. D. Bromley (Plenum, New York, 1989), vol. 8. Nuclei far from Stability, pp. 517–638. [119] E. Caurier, K. Langanke, G. Martinez-Pinedo, F. Nowacki, and P. Vogel, Phys. Lett. B522, 240 (2001). [120] I. Angeli, Atom. Data Nucl. Data Tables 87, 185 (2004). [121] P. Haensel and A. J. Jerzak, Phys. Lett. B112, 285 (1982). ABSTRACT We perform a systematic study of the impact of the J^2 tensor term in the Skyrme energy functional on properties of spherical nuclei. In the Skyrme energy functional, the tensor terms originate both from zero-range central and tensor forces. We build a set of 36 parameterizations, which covers a wide range of the parameter space of the isoscalar and isovector tensor term coupling constants, with a fit protocol very similar to that of the successful SLy parameterizations. We analyze the impact of the tensor terms on a large variety of observables in spherical mean-field calculations, such as the spin-orbit splittings and single-particle spectra of doubly-magic nuclei, the evolution of spin-orbit splittings along chains of semi-magic nuclei, mass residuals of spherical nuclei, and known anomalies of charge radii. Our main conclusion is that the currently used central and spin-orbit parts of the Skyrme energy density functional are not flexible enough to allow for the presence of large tensor terms. <|endoftext|><|startoftext|> Introduction It is a pleasure to participate in the celebration of the seminal accomplish- ments of Gabriele Veneziano. I will try to do so by reviewing a line of research which is intimately connected with several of Gabriele’s important contributions, being concerned with the cardinal problem of String Cosmol- ogy: the fate of the Einstein-like space-time description at big crunch/big bang cosmological singularities. Actually, the work, described below started as a by-product of the string cosmology program initiated by M. 
Gasperini and G. Veneziano [1]. While collaborating with Gabriele on the possible birth of “pre-big bang bubbles” from the gravitational-collapse instability of a generic string vacuum made of a stochastic bath of incoming gravitational and dilatonic waves [2], a question arose: what is the structure of a generic spacelike (i.e. big crunch or big bang) singularity within the effective field theory approximation of (super-)string theory (when keeping all fields, and not only the metric and the dilaton)? The answer turned out to be surprisingly complex, and rich in hidden structures. It was first found [3, 4] that the general solution, near a spacelike singularity, of the massless bosonic sector of all superstring models (D = 10, IIA, IIB, I, HE, HO), as well as that of M theory (D = 11 supergravity), exhibits a never-ending oscillatory behaviour of the Belinsky-Khalatnikov-Lifshitz (BKL) type [5]. However, it was later realized that behind this seemingly entirely chaotic behaviour there was a hidden symmetry structure [6, 7, 8]. This led to the conjecture of the existence of a hidden equivalence (i.e. a correspondence) between two seemingly very different dynamical systems: on the one hand, 11-dimensional supergravity (or even, hopefully, “M-theory”), and, on the other hand, a one-dimensional $E_{10}/K(E_{10})$ nonlinear σ model, i.e. the geodesic motion of a massless particle on the infinite-dimensional coset space$^{2}$ $E_{10}/K(E_{10})$ [8].

$^{1}$ Contribution to “String Theory and Fundamental Interactions”, in celebration of Gabriele Veneziano’s 65th birthday, eds. M. Gasperini and J. Maharana, Springer-Verlag, Heidelberg, 2007. http://arxiv.org/abs/0704.0732v1
The intuitive hope behind this conjecture is that the BKL-type near-spacelike-singularity limit might act as a tool for revealing a hidden structure, in analogy to the much better established AdS/CFT correspondence [9], where the consideration of the near-horizon limit of certain black D-branes has revealed a hidden equivalence between 10-dimensional string theory in AdS spacetime on one side, and a lower-dimensional CFT on the other side. If the (much less firmly established) “gravity/coset correspondence” were confirmed, it might provide both the basis of a new definition of M-theory, and a description of the “de-emergence” of space near a cosmological singularity (see [10] and below).

$^{2}$ Here $K(E_{10})$ denotes the (formal) “maximal compact subgroup” of the hyperbolic Kac-Moody group $E_{10}$.

2 Cosmological billiards

Let us start by summarizing the BKL-type analysis of the “near spacelike singularity limit”, that is, of the asymptotic behaviour of the metric $g_{\mu\nu}(t,\mathbf{x})$, together with the other fields (such as the 3-form $A_{\mu\nu\lambda}(t,\mathbf{x})$ in supergravity), near a singular hypersurface. The basic idea is that, near a spacelike singularity, the time derivatives are expected to dominate over spatial derivatives. More precisely, BKL found that spatial derivatives introduce terms in the equations of motion for the metric which are similar to the “walls” of a billiard table [5]. To see this, it is convenient [11] to decompose the D-dimensional metric $g_{\mu\nu}$ into non-dynamical (lapse $N$, and shift $N^i$, here set to zero) and dynamical ($e^{-2\beta^a}$, $\theta^a_i$) components. They are defined so that the line element reads

\[
ds^2 = -N^2 dt^2 + \sum_{a=1}^{d} e^{-2\beta^a} \theta^a_i \theta^a_j \, dx^i dx^j . \tag{1}
\]

Here $d \equiv D - 1$ denotes the spatial dimension (d = 10 for SUGRA11, and d = 9 for string theory), the $e^{-2\beta^a}$ represent (in an Iwasawa decomposition) the “diagonal” components of the spatial metric $g_{ij}$, while the “off-diagonal” components are represented by the $\theta^a_i$, defined to be upper triangular matrices with 1’s on the diagonal (so that, in particular, $\det\theta = 1$).

The Hamiltonian constraint, at a given spatial point, reads (with $\tilde N \equiv N/\sqrt{\det g_{ij}}$ denoting the “rescaled lapse”)

\[
H(\beta^a, \pi_a, P, Q) = \tilde N \left[ \tfrac{1}{2}\, G^{ab} \pi_a \pi_b + \sum_A c_A(Q, P, \partial\beta, \partial^2\beta, \partial Q) \exp\big( -2 w_A(\beta) \big) \right] . \tag{2}
\]

Here $\pi_a$ (with a = 1, ..., d) denote the canonical momenta conjugate to the “logarithmic scale factors” $\beta^a$, while Q denote the remaining configuration variables ($\theta^a_i$, 3-form components $A_{ijk}(t,\mathbf{x})$ in supergravity), and P their canonically conjugate momenta ($P^i_a$, $\pi^{ijk}$). The symbol ∂ denotes spatial derivatives. The (inverse) metric $G^{ab}$ in Eq. (2) is the DeWitt “superspace” metric induced on the β’s by the Einstein-Hilbert action. It endows the d-dimensional$^{3}$ β-space with a Lorentzian structure $G_{ab} \dot\beta^a \dot\beta^b$.

One of the crucial features of Eq. (2) is the appearance of Toda-like exponential potential terms $\propto \exp(-2 w_A(\beta))$, where the $w_A(\beta)$ are linear forms in the logarithmic scale factors: $w_A(\beta) \equiv w_{A\,a} \beta^a$. The range of labels A and the specific “wall forms” $w_A(\beta)$ that appear depend on the considered model. For instance, in SUGRA11 there appear: “symmetry wall forms” $w^S_{ab}(\beta) \equiv \beta^b - \beta^a$ (with a < b), “gravitational wall forms” $w^g_{abc}(\beta) \equiv 2\beta^a + \sum_{e \neq a,b,c} \beta^e$ ($a \neq b$, $b \neq c$, $c \neq a$), “electric 3-form wall forms” $e_{abc}(\beta) \equiv \beta^a + \beta^b + \beta^c$ ($a \neq b$, $b \neq c$, $c \neq a$), and “magnetic 3-form wall forms” $m_{a_1 \ldots a_6} \equiv \beta^{a_1} + \beta^{a_2} + \ldots + \beta^{a_6}$ (with indices all different).

One then finds that the near-spacelike-singularity limit amounts to considering the large β limit in Eq. (2). In this limit a crucial role is played by the linear forms $w_A(\beta)$ appearing in the “exponential walls”. Actually, these walls enter in successive “layers”.
A first layer consists of a subset of all the walls called the dominant walls $w_i(\beta)$. The effect of these dynamically dominant walls is to confine the motion in β-space to a fundamental billiard chamber defined by the inequalities $w_i(\beta) > 0$. In the case of SUGRA11, one finds that there are 10 dominant walls: 9 of them are the symmetry walls $w^S_{12}(\beta), w^S_{23}(\beta), \ldots, w^S_{9\,10}(\beta)$, and the 10th is an electric 3-form wall $e_{123}(\beta) = \beta^1 + \beta^2 + \beta^3$. As noticed in [6], a remarkable fact is that the fundamental cosmological billiard chamber of SUGRA11 (as well as type-II string theories) is the Weyl chamber of the hyperbolic Kac-Moody algebra $E_{10}$. More precisely, the 10 dynamically dominant wall forms $w^S_{12}(\beta), w^S_{23}(\beta), \ldots, w^S_{9\,10}(\beta), e_{123}(\beta)$ can be identified with the 10 simple roots $\{\alpha_1(h), \alpha_2(h), \ldots, \alpha_{10}(h)\}$ of $E_{10}$. Here h parametrizes a generic element of a Cartan subalgebra (CSA) of $E_{10}$. [Let us also note that for Heterotic and type-I string theories the cosmological billiard is the Weyl chamber of another rank-10 hyperbolic Kac-Moody algebra, namely $BE_{10}$.] In the Dynkin diagram of $E_{10}$, Fig. 1, the 9 “horizontal” nodes correspond to the 9 symmetry walls, while the characteristic “exceptional” node sticking out “vertically” corresponds to the electric 3-form wall $e_{123} = \beta^1 + \beta^2 + \beta^3$. [The fact that this node stems from the 3rd horizontal node is then seen to be directly related to the presence of the 3-form $A_{\mu\nu\lambda}$, with electric kinetic energy $\propto g^{i\ell} g^{jm} g^{kn} \dot A_{ijk} \dot A_{\ell mn}$.]

Figure 1: Dynkin diagram of $E_{10}$.

$^{3}$ 10-dimensional for SUGRA11; but the various superstring theories also lead to a 10-dimensional Lorentz space because one must add the (positive) kinetic term of the dilaton $\varphi \equiv \beta^{10}$ to the 9-dimensional DeWitt metric corresponding to the 9 spatial dimensions.

The appearance of $E_{10}$ in the BKL behaviour of SUGRA11 revived an old suggestion of B. Julia [12] about the possible role of $E_{10}$ in a one-dimensional reduction of SUGRA11.
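The identification of the ten dominant wall forms with the simple roots of $E_{10}$ can be checked numerically. The following Python sketch is only an illustration: it assumes the standard form of the inverse DeWitt metric on linear forms, $G^{ab} v_a w_b = \sum_a v_a w_a - \frac{1}{d-1} (\sum_a v_a)(\sum_b w_b)$ with d = 10, which is not written out explicitly in the text, and all function names are hypothetical. It builds the Gram matrix of the dominant walls and compares it with the Cartan matrix read off the Dynkin diagram (a chain of 9 nodes with the 10th node attached to the 3rd).

```python
from fractions import Fraction

D = 10  # number of logarithmic scale factors beta^a in SUGRA11


def symmetry_wall(a, b):
    """Components of w^S_ab(beta) = beta^b - beta^a (1-based labels, a < b)."""
    w = [0] * D
    w[a - 1], w[b - 1] = -1, 1
    return w


def electric_wall(a, b, c):
    """Components of e_abc(beta) = beta^a + beta^b + beta^c."""
    w = [0] * D
    for index in (a, b, c):
        w[index - 1] = 1
    return w


def inner(v, w):
    """Scalar product of two wall forms in the (assumed) inverse DeWitt metric."""
    plain = sum(x * y for x, y in zip(v, w))
    return Fraction(plain) - Fraction(sum(v) * sum(w), D - 1)


def cartan_matrix(roots):
    """A_ij = 2 (alpha_i | alpha_j) / (alpha_i | alpha_i)."""
    return [[2 * inner(r, s) / inner(r, r) for s in roots] for r in roots]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def e10_cartan_matrix():
    """Cartan matrix of the Dynkin diagram: 9-node chain, node 10 linked to node 3."""
    A = [[2 if i == j else 0 for j in range(D)] for i in range(D)]
    links = [(i, i + 1) for i in range(8)] + [(2, 9)]  # 0-based node labels
    for i, j in links:
        A[i][j] = A[j][i] = -1
    return A


# alpha_1..alpha_9 are the symmetry walls, alpha_10 is the electric wall e_123.
dominant_walls = [symmetry_wall(i, i + 1) for i in range(1, D)] + [electric_wall(1, 2, 3)]
A = cartan_matrix(dominant_walls)
print("matches the E10 Dynkin diagram:", A == e10_cartan_matrix())
print("determinant of the Cartan matrix:", determinant(A))
```

Under the stated assumption on the metric, all ten wall forms have squared length 2, and the negative determinant of the Gram matrix reflects the Lorentzian signature of β-space.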
A posteriori, one can view the BKL behaviour as a kind of spontaneous reduction to one dimension (time) of a multidimensional theory. Note, however, that we are always discussing generic inhomogeneous 11-dimensional solutions, but that we examine them in the near-spacelike-singularity limit where the spatial derivatives are sub-dominant: $\partial_x \ll \partial_t$. Note also that the discrete $E_{10}(\mathbb{Z})$ was proposed as a U-duality group of the full ($T^{10}$) spatial toroidal compactification of M-theory by Hull and Townsend [13].

3 Gravity/Coset correspondence

Refs [8, 14] went beyond the leading-order BKL analysis just recalled by including the first three “layers” of spatial-gradient-related sub-dominant walls $\propto \exp(-2 w_A(\beta))$ in Eq. (2). The relative importance of these sub-dominant walls, which modify the leading billiard dynamics defined by the 10 dominant walls $w_i(\beta)$, can be ordered by means of an expansion which counts how many dominant wall forms $w_i(\beta)$ are contained in the exponents of the sub-dominant wall forms $w_A(\beta)$, associated with higher spatial gradients. By mapping the dominant gravity wall forms $w_i(\beta)$ onto the corresponding $E_{10}$ simple roots $\alpha_i(h)$, i = 1, ..., 10, the BKL-type gradient expansion just described becomes mapped onto a Lie-algebraic height expansion in the roots of $E_{10}$. Remarkably, it was found that, up to height 30 (i.e. up to small corrections to the billiard dynamics associated with the product of 30 leading walls $e^{-2 w_i(\beta)}$), the SUGRA11 dynamics for $g_{\mu\nu}(t,\mathbf{x})$, $A_{\mu\nu\lambda}(t,\mathbf{x})$ considered at some given spatial point $\mathbf{x}_0$ could be identified with the geodesic dynamics of a massless particle moving on the (infinite-dimensional) coset space $E_{10}/K(E_{10})$. Note the “holographic” nature of this correspondence between an 11-dimensional dynamics on one side, and a 1-dimensional one on the other side.
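The counting that underlies the height expansion can be made concrete with a small Python sketch. It is a hedged illustration with hypothetical helper names: each wall form is written as a vector of coefficients of the $\beta^a$, expanded exactly in the basis of the ten dominant wall forms (the simple roots $\alpha_1, \ldots, \alpha_{10}$), and its height is taken to be the sum of the expansion coefficients.

```python
from fractions import Fraction

D = 10  # number of logarithmic scale factors beta^a in SUGRA11


def symmetry_wall(a, b):
    """beta^b - beta^a with 1-based labels, a < b."""
    w = [0] * D
    w[a - 1], w[b - 1] = -1, 1
    return w


def electric_wall(a, b, c):
    """beta^a + beta^b + beta^c."""
    return [1 if i + 1 in (a, b, c) else 0 for i in range(D)]


def magnetic_wall(indices):
    """Sum of beta^a over six distinct labels."""
    chosen = set(indices)
    return [1 if i + 1 in chosen else 0 for i in range(D)]


def gravitational_wall(a, b, c):
    """2 beta^a + sum of beta^e over all e different from a, b, c."""
    w = [0 if i + 1 in (b, c) else 1 for i in range(D)]
    w[a - 1] = 2
    return w


# Dominant walls = simple roots: alpha_i = beta^{i+1} - beta^i, alpha_10 = e_123.
SIMPLE_ROOTS = [symmetry_wall(i, i + 1) for i in range(1, D)] + [electric_wall(1, 2, 3)]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def root_coefficients(wall):
    """Coefficients n_i such that wall = sum_i n_i alpha_i."""
    # Column i of the matrix holds the beta-components of alpha_i.
    matrix = [[SIMPLE_ROOTS[i][a] for i in range(D)] for a in range(D)]
    return solve_linear(matrix, wall)


def height(wall):
    """Number of dominant wall forms contained in the given wall form."""
    return sum(root_coefficients(wall))


for name, wall in [
    ("e_123", electric_wall(1, 2, 3)),
    ("w^S_13", symmetry_wall(1, 3)),
    ("m_123456", magnetic_wall(range(1, 7))),
    ("w^g_{1,9,10}", gravitational_wall(1, 9, 10)),
]:
    print(name, [int(n) for n in root_coefficients(wall)], "height", int(height(wall)))
```

The last expansion coefficient, which multiplies the electric root $\alpha_{10}$, distinguishes the different families of walls; the particular walls chosen here are only examples and are not claimed to be the lowest ones of their type.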
A point on the coset space $E_{10}(\mathbb{R})/K(E_{10}(\mathbb{R}))$ is coordinatized by a time-dependent (but spatially independent) element of the $E_{10}(\mathbb{R})$ group of the (Iwasawa) form $g(t) = \exp h(t) \, \exp \nu(t)$. Here, $h(t) = \beta^a_{\rm coset}(t) H_a$ belongs to the 10-dimensional CSA of $E_{10}$, while $\nu(t) = \sum_{\alpha > 0} \nu^\alpha(t) E_\alpha$ belongs to a Borel subalgebra of $E_{10}$ and has an infinite number of components labelled by the positive roots α of $E_{10}$. The (null) geodesic action over the coset space $E_{10}/K(E_{10})$ takes the simple form

\[
S_{E_{10}/K(E_{10})} = \int \frac{dt}{n(t)} \, (v^{\rm sym} | v^{\rm sym}) \tag{3}
\]

where $v^{\rm sym} \equiv \tfrac{1}{2} (v + v^T)$ is the “symmetric”$^{4}$ part of the “velocity” $v \equiv (dg/dt) g^{-1}$ of a group element g(t) running over $E_{10}(\mathbb{R})$.

$^{4}$ Here the transpose operation T denotes the negative of the Chevalley involution ω defining the real form $E_{10(10)}$ of $E_{10}$. It is such that the elements k of the Lie sub-algebra of $K(E_{10})$ are “T-antisymmetric”: $k^T = -k$, which is equivalent to them being fixed under ω: $\omega(k) = +k$.

The correspondence between the gravity, Eq. (2), and coset, Eq. (3), dynamics is best exhibited by decomposing (the Lie algebra of) $E_{10}$ with respect to (the Lie algebra of) the GL(10) subgroup defined by the horizontal line in the Dynkin diagram of $E_{10}$. This allows one to grade the various components of g(t) by their GL(10) level ℓ. One finds that, at the ℓ = 0 level, g(t) is parametrized by the Cartan coordinates $\beta^a_{\rm coset}(t)$ together with a unimodular upper triangular zehnbein $\theta^a_{{\rm coset}\, i}(t)$. At level ℓ = 1, one finds a 3-form $A^{\rm coset}_{ijk}(t)$; at level ℓ = 2, a 6-form $A^{\rm coset}_{i_1 i_2 \ldots i_6}(t)$; and at level ℓ = 3, a 9-index object $A^{\rm coset}_{i_1 | i_2 \ldots i_9}(t)$ with Young-tableau symmetry {8, 1}. The coset action (3) then defines a coupled set of equations of motion for $\beta^a_{\rm coset}(t)$, $\theta^a_{{\rm coset}\, i}(t)$, $A^{\rm coset}_{ijk}(t)$, $A^{\rm coset}_{i_1 \ldots i_6}(t)$, $A^{\rm coset}_{i_1 | i_2 \ldots i_9}(t)$.
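To get a feeling for the size of the low-level coset fields, one can count their independent components as GL(10) tensors. The Python sketch below is an illustration under stated assumptions: it uses the standard hook-content formula for the dimension of a GL(n) representation, and it reads the Young-tableau symmetry {8, 1} as a diagram with one column of 8 boxes and one column of 1 box (row lengths 2, 1, 1, 1, 1, 1, 1, 1). The function name and the row-length convention are choices made here, not notation from the text.

```python
from fractions import Fraction


def gl_irrep_dimension(row_lengths, n):
    """Dimension of the GL(n) tensor with the given Young diagram (hook-content formula).

    row_lengths lists the lengths of the rows of the diagram, top to bottom.
    """
    dim = Fraction(1)
    for r, row_len in enumerate(row_lengths):
        for c in range(row_len):
            arm = row_len - c - 1                                        # boxes to the right
            leg = sum(1 for other in row_lengths[r + 1:] if other > c)   # boxes below
            dim *= Fraction(n + c - r, arm + leg + 1)                    # (n + content) / hook
    return int(dim)


three_form = [1] * 3        # level 1: totally antisymmetric A_ijk
six_form = [1] * 6          # level 2: totally antisymmetric A_i1...i6
mixed_8_1 = [2] + [1] * 7   # level 3: A_i1|i2...i9, columns of height 8 and 1

for label, diagram in [("level 1", three_form), ("level 2", six_form), ("level 3", mixed_8_1)]:
    print(label, gl_irrep_dimension(diagram, 10))
```

For the two antisymmetric forms the result reduces to the binomial coefficients $\binom{10}{3}$ and $\binom{10}{6}$; for the mixed-symmetry object it equals $10 \binom{10}{8} - \binom{10}{9}$, i.e. a tensor antisymmetric in eight indices with its totally antisymmetric part removed.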
By explicit calculations, it was found that these coupled equations of motion could be identified (modulo terms corresponding to potential walls of height at least 30) to the SUGRA$_{11}$ equations of motion, considered at some given spatial point $x_0$. The dictionary between the two dynamics says essentially that: (0) $\beta^a_{\rm gravity}(t,x_0) \leftrightarrow \beta^a_{\rm coset}(t)$, $\theta^a_i(t,x_0) \leftrightarrow \theta^a_{{\rm coset}\, i}(t)$, (1) $\partial_t A^{\rm coset}_{ijk}(t)$ corresponds to the electric components of the 11-dimensional field strength $F_{\rm gravity} = dA_{\rm gravity}$ in a certain frame $e^i$, (2) the conjugate momentum of $A^{\rm coset}_{i_1 \ldots i_6}(t)$ corresponds to the dual (using $\varepsilon^{i_1 i_2 \ldots i_{10}}$) of the “magnetic” frame components of the 4-form $F_{\rm gravity} = dA_{\rm gravity}$, and (3) the conjugate momentum of $A_{i_1|i_2 \ldots i_9}(t)$ corresponds to the $\varepsilon^{10}$ dual (on $jk$) of the structure constants $C^i_{jk}$ of the coframe $e^i$ ($d e^i = \frac{1}{2} C^i_{jk}\, e^j \wedge e^k$).

The fact that at levels $\ell = 2$ and $\ell = 3$ the dictionary between supergravity and coset variables maps the first spatial gradients of the SUGRA variables $A_{ijk}(t,x)$ and $g_{ij}(t,x)$ onto (time derivatives of) coset variables suggested the conjecture [8] of a hidden equivalence between the two models, i.e. the existence of a dynamics-preserving map between the infinite tower of (spatially independent) coset variables $(\beta^a_{\rm coset}, \nu^\alpha)$, together with their conjugate momenta $(\pi^{\rm coset}_a, p_\alpha)$, and the infinite sequence of spatial Taylor coefficients $(\beta(x_0), \pi(x_0), Q(x_0), P(x_0), \partial Q(x_0), \partial^2\beta(x_0), \partial^2 Q(x_0), \ldots, \partial^n Q(x_0), \ldots)$ formally describing the dynamics of the gravity variables $(\beta(x), \pi(x), Q(x), P(x))$ around some given spatial point $x_0$.$^{5}$

It has been possible to extend the correspondence between the two models to the inclusion of fermionic terms on both sides [15, 16, 17]. Moreover, Ref. [18] found evidence for a nice compatibility between some high-level contributions (height −115!) in the coset action, corresponding to imaginary roots$^{6}$, and M-theory one-loop corrections to SUGRA$_{11}$, notably the terms quartic in the curvature tensor. (See also [19] for a study of the compatibility of an underlying Kac-Moody symmetry with quantum corrections in various models).

$^{5}$ One, however, expects the map between the two models to become spatially non-local for heights $\geq 30$.

$^{6}$ I.e. such that $(\alpha, \alpha) < 0$, by contrast to the “real” roots, $(\alpha, \alpha) = +2$, which enter the checks mentioned above.

4 A new view of the (quantum) fate of space at a cosmological singularity

Let us now, following [10], sketch the physical picture suggested by the gravity/coset correspondence. That is, let us take seriously the idea that, upon approaching a spacelike singularity, the description in terms of a spatial continuum, and space-time based (quantum) field theory breaks down, and should be replaced by a purely abstract Lie algebraic description. More precisely, we suggest that the information previously encoded in the spatial variation of the geometry and of the matter fields gets transferred to an infinite tower of spatially independent (but time dependent) Lie algebraic variables. In other words, we are led to the conclusion that space actually “disappears” (or “de-emerges”) as the singularity is approached$^{7}$. In particular (and this would be bad news for Gabriele’s pre-big bang scenario), we suggest no (quantum) “bounce” from an incoming collapsing universe to some outgoing expanding universe. Rather it is suggested that “life continues” for an infinite “affine time” at a singularity, with the double understanding, however, that: (i) life continues only in a totally new form (as in a kind of “transmigration”), and (ii) an infinite affine time interval (measured, say, in the coordinate $t$ of Eq. (3) with a coset lapse function $n(t) = 1$) corresponds to a sub-Planckian interval of geometrical proper time$^{8}$.

$^{7}$ We have in mind here a “big crunch”, i.e. we conventionally consider that we are tending towards the singularity. Mutatis mutandis, we would say that space “appears” or “emerges” at a big bang.

$^{8}$ Indeed, it is found that the coset time $t$ (with $n(t) = 1$) corresponds to a “Zeno-like” gravity coordinate time (with rescaled lapse $\tilde N = N/\sqrt{g} = 1$) which tends to $+\infty$ as the proper time tends to zero.

Let us also comment on some expected aspects of the “duality” between the two models. It seems probable (from the AdS/CFT paradigm) that, even if the equivalence between the “gravity” and the “coset” descriptions is formally exact, each model has a natural domain of applicability in which the corresponding description is sufficiently “weakly coupled” to be trusted as is, even in the leading approximation. For the gravity description this domain is clearly that of curvatures smaller than the Planck scale. One then expects that the natural domain of validity of the dual coset model would correspond (in gravity variables) to that of curvatures larger than the Planck scale.

In addition, it is possible that the coset description should primarily be considered as a quantum model, as now sketched. The coset action (3) describes the classical motion of a massless particle on the symmetric space $E_{10}(\mathbb{R})/K(E_{10}(\mathbb{R}))$. Quantum mechanically, one should consider a quantum massless particle, i.e., if we neglect polarization effects$^{9}$, a Klein-Gordon equation,

$$\Box \Psi(\beta^a, \nu^\alpha) = 0 , \qquad (4)$$

where $\Box$ denotes the (formal) Laplace-Beltrami operator on the infinite-dimensional Lorentz-signature curved coset manifold $E_{10}(\mathbb{R})/K(E_{10}(\mathbb{R}))$. Eq. (4) would apply to the case considered here of un-compactified M-theory. In the case where all spatial dimensions are toroidally compactified, it has been suggested [20, 21] that $\Psi$ satisfy (4) together with a condition of periodicity over the discrete group $E_{10}(\mathbb{Z})$.

$^{9}$ Actually, Refs. [15, 16, 17] indicate the need to consider a spinning massless particle, i.e. some kind of Dirac equation on $E_{10}/K(E_{10})$.
In other words, $\Psi$ would be a “modular wave form” on $E_{10}(\mathbb{Z})\backslash E_{10}(\mathbb{R})/K(E_{10}(\mathbb{R}))$.

Let us emphasize (still following [10]) that all reference to space and time has disappeared in Eq. (4). The disappearance of time is common between (4) and the usual Wheeler-DeWitt equation in which the “wave function(al) of the universe” $\Psi[g_{ij}(x)]$ no longer depends on any extrinsic time parameter. [As usual, one needs to choose, among all the dynamical variables, a specific “clock field” to be used as an intrinsic time variable parametrizing the dynamics of the remaining variables.] The interesting new feature of (4) (when compared to a Wheeler-DeWitt type equation) is the disappearance of any notion of geometry $g_{ij}(x)$ and its replacement by the infinite tower of Lie-algebraic variables $(\beta^a, \nu^\alpha)$$^{10}$. This quantum de-emergence of space, and the emergence of an infinite-dimensional symmetry group $E_{10}$$^{11}$ which deeply intertwines space-time with matter degrees of freedom, might be radical enough to get us closer to an understanding of the fate of space-time and matter at cosmological singularities.

$^{10}$ Note that this is conceptually very different from the $E_{11}$-based proposal of [22].

$^{11}$ Let us note that $E_{10}$ enjoys a similarly distinguished status among the (infinite-dimensional) hyperbolic Kac-Moody Lie groups as $E_8$ does in the Cartan-Killing classification of the finite-dimensional simple Lie groups [23].

Acknowledgments. It is a pleasure to dedicate this review to Gabriele Veneziano, a dear friend and a great physicist from whom I have learned a lot. I am also very grateful to my collaborators Marc Henneaux and Hermann Nicolai for the (continuing) $E_{10}$ adventure. I wish also to thank Maurizio Gasperini and Jnan Maharana for their patience.

References

[1] M. Gasperini and G. Veneziano, Phys. Rept. 373, 1 (2003) [arXiv:hep-th/0207130].
[2] A. Buonanno, T. Damour and G. Veneziano, Nucl. Phys. B 543, 275 (1999) [arXiv:hep-th/9806230].
[3] T. Damour and M. Henneaux, Phys. Rev. Lett. 85, 920 (2000) [arXiv:hep-th/0003139].
[4] T. Damour and M. Henneaux, Phys. Lett. B 488, 108 (2000) [Erratum-ibid. B 491, 377 (2000)] [arXiv:hep-th/0006171].
[5] V. A. Belinsky, I. M. Khalatnikov and E. M. Lifshitz, Adv. Phys. 19, 525 (1970).
[6] T. Damour and M. Henneaux, Phys. Rev. Lett. 86, 4749 (2001) [arXiv:hep-th/0012172].
[7] T. Damour, M. Henneaux, B. Julia and H. Nicolai, Phys. Lett. B 509, 323 (2001) [arXiv:hep-th/0103094].
[8] T. Damour, M. Henneaux and H. Nicolai, Phys. Rev. Lett. 89, 221601 (2002) [arXiv:hep-th/0207267].
[9] O. Aharony, S. S. Gubser, J. M. Maldacena, H. Ooguri and Y. Oz, Phys. Rept. 323, 183 (2000) [arXiv:hep-th/9905111].
[10] T. Damour and H. Nicolai, “Symmetries, Singularities and the De-emergence of Space”, essay submitted to the Gravity Research Foundation, March 2007.
[11] T. Damour, M. Henneaux and H. Nicolai, Class. Quant. Grav. 20, R145 (2003) [arXiv:hep-th/0212256].
[12] B. Julia, in: Lectures in Applied Mathematics, Vol. 21 (1985), AMS-SIAM, p. 335; preprint LPTENS 80/16.
[13] C. M. Hull and P. K. Townsend, Nucl. Phys. B 438, 109 (1995) [arXiv:hep-th/9410167].
[14] T. Damour and H. Nicolai, arXiv:hep-th/0410245.
[15] T. Damour, A. Kleinschmidt and H. Nicolai, Phys. Lett. B 634, 319 (2006) [arXiv:hep-th/0512163].
[16] S. de Buyl, M. Henneaux and L. Paulot, JHEP 0602, 056 (2006) [arXiv:hep-th/0512292].
[17] T. Damour, A. Kleinschmidt and H. Nicolai, JHEP 0608, 046 (2006) [arXiv:hep-th/0606105].
[18] T. Damour and H. Nicolai, Class. Quant. Grav. 22, 2849 (2005) [arXiv:hep-th/0504153].
[19] T. Damour, A. Hanany, M. Henneaux, A. Kleinschmidt and H. Nicolai, Gen. Rel. Grav. 38, 1507 (2006) [arXiv:hep-th/0604143].
[20] O. J. Ganor, arXiv:hep-th/9903110.
[21] J. Brown, O. J. Ganor and C. Helfgott, JHEP 0408, 063 (2004) [arXiv:hep-th/0401053].
[22] P. C. West, Class. Quant. Grav. 18, 4443 (2001) [arXiv:hep-th/0104081].
[23] V. G. Kac, Infinite dimensional Lie algebras, 3rd edition, Cambridge University Press (Cambridge, 1990).
ABSTRACT We review the recently discovered connection between the Belinsky-Khalatnikov-Lifshitz-like ``chaotic'' structure of generic cosmological singularities in eleven-dimensional supergravity and the ``last'' hyperbolic Kac-Moody algebra E(10). This intriguing connection suggests the existence of a hidden ``correspondence'' between supergravity (or even M-theory) and null geodesic motion on the infinite-dimensional coset space E(10)/K(E(10)). If true, this gravity/coset correspondence would offer a new view of the (quantum) fate of space (and matter) at cosmological singularities. <|endoftext|><|startoftext|> Acceleration and localization of matter in a ring trap

Yu. V. Bludov1 and V. V. Konotop1,2

1 Centro de Física Teórica e Computacional, Universidade de Lisboa, Complexo Interdisciplinar, Avenida Professor Gama Pinto 2, Lisboa 1649-003, Portugal
2 Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Ed. C8, Piso 6, Lisboa 1749-016, Portugal and Departamento de Matemáticas, E. T. S. de Ingenieros Industriales, Universidad de Castilla-La Mancha 13071 Ciudad Real, Spain

A toroidal trap combined with an external time-dependent electric field can be used for implementing different dynamical regimes of matter waves. In particular, we show that dynamical and stochastic acceleration, localization and implementation of the Kapitza pendulum can be achieved by a proper choice of the external force.

PACS numbers: 03.75.Lm, 03.75.Kk, 03.75.Ss

I. INTRODUCTION

Exploring different geometries of potentials trapping cold condensed atoms is of both fundamental and practical importance. Toroidal traps play a special role, allowing for “infinite” atomic trajectories and for realization of quasi-one-dimensional (quasi-1D) regimes.
These advantages are relevant for designing highly precise sensors based on matter wave interferometry [1, 2] as well as for accurate study of such phenomena as superfluid currents, stability of sound waves, solitons and vortices in Bose-Einstein condensates (BECs) [3, 4]. Traps with circular geometry are also believed to be conceptually important for implementing the main ideas of accelerator physics at ultra-low temperatures [2] and, in particular, for acceleration of ultracold atoms [5]. In this last context the existence of well localized wave packets, and thus attenuation of the dispersion, the latter being an intrinsic property of quantum systems, is of primary importance. In the first experimental studies [2] it was shown that the dispersive spreading out [1] can be compensated by using betatron resonances in a storage ring. An alternative and well known way of counterbalancing dispersion is nonlinearity, which in the quasi-1D regime leads to the existence of bright and dark matter solitons (see e.g. [6, 7] and [8, 9, 10], respectively). This issue has already been explored [11] from the point of view of acceleration of matter waves in a toroidal trap with the help of a modulated optical lattice, which is known to be an efficient tool for acceleration of matter waves [12].

In this paper we propose two alternative ways of accelerating matter wave solitons: either by a time-varying or by a stochastic external electric field. These new ways of soliton acceleration are especially relevant in view of radiative losses [13] and distortions [12] of solitons moving in optical lattices (effects that become significant for long trajectories). At the same time, it turns out that the toroidal geometry of a trap confining a BEC allows one to realize a number of other dynamical regimes, like dynamical localization of solitons and a solitonic implementation of the celebrated Kapitza pendulum.
Theoretical description of all the mentioned phenomena can be put within a single framework, based on the perturbation theory for solitons, which is done in the present paper. More specifically, in Sec. II we formulate the model and the main physical constraints determining its validity. In Secs. III and IV we describe how, by applying an external time-dependent electric field, matter solitons can be accelerated in the usual sense and in the sense of the growth in time of the velocity variance (the stochastic acceleration), respectively. In Sec. V we describe localized states of matter in a circular trap subject to an external field, and in Sec. VI we show that a matter soliton affected by a rapidly varying force represents an example of the Kapitza pendulum [14]. A summary and discussion of the results are given in the Conclusions.

II. SCALING AND THE EVOLUTION EQUATION

We assume that a BEC is loaded in a circular trap, which in cylindrical coordinates $\mathbf{r} = (\rho, \varphi, z)$ is described by $V = V_c(\rho) + m\omega_z^2 z^2/2$, where $\omega_z$ is the frequency of the magnetic trap in the $z$-direction, $V_c(\rho)$ is the potential in the radial direction, forming the trap circular in the $(x, y)$-plane, and $m$ is the mass of an atom. We also suppose that the BEC is subject to an external electric field with amplitude $E_0$, which produces an additional potential $V_{ext} = -\alpha' E_0^2/4$, where $\alpha'$ is the polarizability of the atoms (see e.g. [15]). If the amplitude $E_0$ or the direction of the field varies along some direction, say, along the $x$-axis, smoothly on the scale of the trap radius $R$, the potential energy $V_{ext}$ can be expanded in a Taylor series and, after neglecting a nonessential constant, rewritten in the form $V_{ext} = -\alpha x$, where $\alpha = (\alpha'/4)\,\partial(E_0^2)/\partial x|_{x=0}$ and we have restricted the consideration to the first term of the expansion. In order to realize the one-dimensional geometry we require the torus radius to be much larger than the core radius $r_c$, which allows us to define a small parameter $\epsilon = r_c/R \ll 1$.
In order to introduce quantitative characteristics, we consider the normalized ground state $\phi$ of the eigenvalue problem

$$-\frac{\hbar^2}{2m}\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{d\phi}{d\rho}\right) + V_c(\rho)\phi = \varepsilon_r\phi, \qquad \int_0^\infty \phi^2\rho\, d\rho = 1 \qquad (1)$$

and define $R_1 = \int_0^\infty \phi^2\rho^2 d\rho$, $R_2 = \left(\int_0^\infty \phi^2 d\rho/\rho\right)^{-1/2}$ and $\lambda = \left(\int_0^\infty \phi^4\rho\, d\rho\right)^{-1/2}$. In the case at hand $\lambda \sim \sqrt{R r_c} \sim \epsilon^{1/2}R$ and thus $\lambda \ll R_1 \sim R_2 \sim R$.

In the present paper we are interested in the dynamics of matter waves whose spatial extension is much less than the trap perimeter, which allows us to treat them similarly to matter solitons in an infinite one-dimensional trap. This is in particular the case when the spatial size of the BEC excitations along the trap is of order $\lambda$, which is a well defined parameter and thus convenient for formulating the constraints of the theory. Indeed, now we can estimate the kinetic energy of the longitudinal excitations as $\varepsilon_\parallel = \hbar^2/(2m\lambda^2)$ and require it to be much less than the kinetic energy of the transverse excitations, $\varepsilon_r \sim \hbar^2/(2m r_c^2)$ (for the sake of simplicity here we assume that the size of the trap in the $z$-direction is of the order of the core radius: $a_z = \sqrt{\hbar/m\omega_z} \sim r_c$). Adding the requirement that the energy of the two-body interactions, which is estimated as $|g|n$ (where $g = 4\pi\hbar^2 a_s/m$, $a_s$ is the scattering length, $n \sim N/V$ is the mean density, $N$ is the total number of atoms and $V$ is the effective volume occupied by the atoms, estimated as $V \sim \pi\lambda r_c a_z$), be of order $\varepsilon_\parallel$ and much less than $\varepsilon_r$ (or, more precisely, requiring $|g|n/\varepsilon_r \sim \epsilon$), we can neglect in the leading order the transitions between the transverse energy levels [9, 10] and employ the multiple-scale expansion [6, 10] for the description of the quasi-one-dimensional evolution of the BEC. We also notice that under the assumptions introduced, one has the estimate $N \sim \epsilon^{3/2}R/(8|a_s|)$.

In order to get an insight into practical numbers, let us consider $^7$Li atoms ($a_s = -2$ nm) in a trap with $R = 100\,\mu$m, $r_c = 5\,\mu$m and $a_z = 10\,\mu$m.
Then $\epsilon = 0.05$, the characteristic size of solitonic excitations is $\lambda \approx 22\,\mu$m and the number of particles is estimated as $N \approx 140$. We emphasize that these estimates indicate only the order of magnitude of the parameters. Thus, for example, a condensate of $10^2$ to $10^3$ lithium atoms satisfies the conditions of the theory.

We will be interested in managing soliton dynamics by means of a weak (i.e. not destroying solitons) electromagnetic field varying in time. Accordingly, we consider $\alpha$ to be time-dependent and characterized by the estimate $\alpha \sim \hbar^2/(m R_1\lambda^2)$. Then, starting with the Gross-Pitaevskii equation, in which the external potential in cylindrical coordinates has the form $V_{ext} = -\alpha\rho\cos(\varphi) \approx -\alpha R_1\cos(\varphi)$, and using the multiple-scale expansion, one finds that the BEC macroscopic wave function in the leading order allows the factorization

$$\Psi = \pi^{-1/4} a_z^{-1/2} e^{-i(\omega_r+\omega_z)t/2}\, e^{-z^2/2a_z^2}\, \phi(\rho)\,\psi(t, \varphi), \qquad (2)$$

where $\omega_r = 2\varepsilon_r/\hbar$ and $\psi(t, \varphi)$ solves the nonlinear Schrödinger equation, which we write in terms of $A = \sqrt{|g|m/(\sqrt{2\pi}\hbar^2 a_z)}\,\psi$, $\zeta = R_2\varphi/\lambda$, and $\tau = \hbar t/(m\lambda^2)$:

$$i\frac{\partial A}{\partial\tau} = -\frac{1}{2}\frac{\partial^2 A}{\partial\zeta^2} - \cos(\kappa\zeta)f(\tau)A + \sigma|A|^2A . \qquad (3)$$

Here $\sigma = {\rm sign}\, a_s$, $f(\tau) \equiv m R_1\lambda^2\alpha(t)/\hbar^2$ and $\kappa = \lambda/R_2 \sim \sqrt{\epsilon}$. We choose the scaling in such a way that all terms in (3) are of order unity, and in particular $A = O(1)$. This can be done, taking into account the normalization

$$\int_0^L |A|^2 d\zeta = 2\sqrt{2\pi}\,\frac{R_2}{\lambda a_z}\,|a_s| N , \qquad (4)$$

$L = 2\pi/\kappa$, which follows from the normalization condition for the order parameter, $\int|\Psi|^2 d^3r = N$, and considering $N \sim a_z/|a_s|$, which is of order $10^3$ in a typical experimental setting. Eq. (3) is subject to the periodic boundary conditions $A(\zeta, \tau) = A(\zeta + L, \tau)$.

III. ACCELERATION OF BRIGHT MATTER SOLITONS BY TIME-DEPENDENT EXTERNAL FORCE

First we consider the acceleration, $\gamma$, which can be achieved due to the potential $V_{ext}$ properly dependent on time. An order of magnitude of $\gamma$ can be estimated by taking into account that Eq. (3) makes sense provided that all terms are of order unity. In physical units this gives $\gamma \sim \hbar^2/(m^2\lambda^3)$.
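As a quick cross-check of the order-of-magnitude estimates above, the following Python sketch evaluates them for the lithium example. It assumes the atom-number estimate is taken in the form N ∼ ǫλa_z/(8|a_s|r_c), which follows from |g|n/ε_r ∼ ǫ with V ∼ πλr_c a_z and reduces to ǫ^{3/2}R/(8|a_s|) when a_z = r_c. The function name and the rounded 7Li mass are illustrative choices, not part of the original analysis.

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J s
AMU = 1.66053906660e-27   # atomic mass unit, kg


def ring_trap_estimates(R, r_c, a_z, a_s, mass):
    """Order-of-magnitude estimates for a ring trap (all inputs in SI units).

    Returns (eps, lam, n_atoms, gamma):
      eps     = r_c / R                        small parameter
      lam     = sqrt(R * r_c)                  soliton length scale
      n_atoms = eps * lam * a_z / (8 |a_s| r_c)  (assumed form, see lead-in)
      gamma   = hbar^2 / (m^2 lam^3)           acceleration scale
    """
    eps = r_c / R
    lam = math.sqrt(R * r_c)
    n_atoms = eps * lam * a_z / (8.0 * abs(a_s) * r_c)
    gamma = HBAR ** 2 / (mass ** 2 * lam ** 3)
    return eps, lam, n_atoms, gamma


# 7Li example from the text: R = 100 um, r_c = 5 um, a_z = 10 um, a_s = -2 nm.
# The mass 7.016 u is a rounded value for 7Li (assumption of this sketch).
eps, lam, n_atoms, gamma = ring_trap_estimates(
    R=100e-6, r_c=5e-6, a_z=10e-6, a_s=-2e-9, mass=7.016 * AMU
)
print(f"eps = {eps:.3f}, lambda = {lam * 1e6:.1f} um, "
      f"N = {n_atoms:.0f}, gamma = {gamma * 1e3:.1f} mm/s^2")
```

With these inputs the sketch gives ǫ = 0.05, λ ≈ 22 µm and N ≈ 140, and an acceleration scale of a few mm/s², in line with the numbers quoted in the text.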
Then, recalling the above example of the lithium condensate, we estimate $\gamma \sim 7$ mm/s$^2$, which is of the order of the acceleration reported in [11]. This, however, is not the best estimate in our case, because it is based on the 1D model, while lowering the dimensionality imposes constraints on the atomic density and consequently on the amplitude of the applied force.

To describe the physics of the phenomenon we consider a BEC with a negative scattering length ($\sigma = -1$). Then a “bright soliton” solution of Eq. (3) at $f(\tau) \equiv 0$ (or, more precisely, a periodic solution mimicking a bright soliton in an infinite 1D system) which moves with a constant velocity $v_n$ can be written as follows [8]

$$A_s = e^{-i(\omega(k)+v_n^2/2)\tau + i v_n\zeta}\,\eta(k)\,{\rm dn}\left(\eta(k)(\zeta - v_n\tau), k\right) . \qquad (5)$$

Here ${\rm dn}(x, k)$ is the Jacobi elliptic function [16] and $k$ is the elliptic modulus parameterizing the solution. The frequency and the amplitude are given by $\omega(k) = (k^2/2 - 1)\eta^2(k)$ and $\eta(k) = 2K(k)/L$ [$K(k)$ is the complete elliptic integral of the first kind]. The velocity of the soliton is quantized: $v_n = 2\pi n/L$ with integer $n$. To ensure that the solution $A_s$ satisfies the scaling relations imposed above, we notice that the size of the soliton can be estimated as $\pi/K(k)$ and its smallness implies that $k$ is close to unity. In that case we obtain the estimates $1 - k^2 \sim 16\exp(-2\pi/\sqrt{\epsilon})$ and ${\rm dn}\left(\eta(k)(\zeta - v_n\tau), k\right) \approx 1/\cosh(\eta(k)(\zeta - v_n\tau))$. In the limit $k \to 1$ quantization of the velocity does not play any significant role. We verified this numerically. For example, for $L = 10$ a deviation of the initial velocity from the quantized one produces an appreciable effect on the dynamics during intervals $\tau \lesssim 100$ only if $k \lesssim 0.99$.

When an external force is applied, $f(\tau) \neq 0$, the velocity is no longer preserved, which manifests itself in the evolution of the momentum $P = (1/2i)\int_0^L\left(A_\zeta\bar{A} - A\bar{A}_\zeta\right)d\zeta$ (here $\bar{A}$ stands for the complex conjugate of $A$) according to the equation

$$\frac{dP}{d\tau} = -f(\tau)\int_0^L \cos(\kappa\zeta)\,\frac{\partial|A|^2}{\partial\zeta}\,d\zeta . \qquad (6)$$

The external field, however, does not affect the norm: $\int_0^L|A|^2 d\zeta = {\rm const}$.
It follows from (5) that in the adiabatic approximation the solution of the perturbed equation (3) can be sought in the form

$$A = e^{-i(\omega(k)+V(\tau)^2/2)\tau + iV(\tau)\zeta}\,\eta\,{\rm dn}\left(\eta(\zeta - X(\tau)), k\right) \qquad (7)$$

where $V(\tau) = dX(\tau)/d\tau$ is the time-dependent velocity of the soliton and $X(\tau)$ is the coordinate of the soliton center. Substituting (7) in (6) and taking into account the parity of the functions in the integrand as well as the fact that all of them are periodic with the same period $L$, we obtain the equation for the soliton coordinate

$$\frac{d^2X}{d\tau^2} = -\kappa C(k) f(\tau)\sin(\kappa X) . \qquad (8)$$

Here

$$C(k) = \frac{K(k)}{2\pi E(k)}\int_{-\pi}^{\pi}\cos(\theta)\,{\rm dn}^2\left(\frac{K(k)\theta}{\pi}, k\right)d\theta \qquad (9)$$

and it is taken into account that the norm is $\int_0^L|A|^2 d\zeta = 2\eta E(k)$, where $E(k)$ is the complete elliptic integral of the second kind.

Depending on the choice of the function $f(\tau)$, Eq. (8) describes different dynamical regimes. Now we are interested in the acceleration which occurs during the rotational motion of the soliton in the trap (i.e. $X$ is a growing function). We illustrate this acceleration using the example of the simplest step-like dependence $f(\tau)$. To this end we assume that initially the soliton is centered at $X(0) = 0$ and require $f(\tau)$ to be a constant $f_0$ for time intervals such that the soliton coordinate $X(\tau) \in I_p$ and to be zero for $X(\tau) \notin I_p$, where the intervals $I_p$ are given by $I_p = [(p + \frac{1}{2})L, (p + 1)L]$ with $p = 0, 1, \ldots$. Then, clearly, the acceleration of the soliton, which is given by the right hand side of (8), is never negative.

The above requirement introduces a natural splitting of the temporal axis into the set of intervals $T_l = [\tau_l, \tau_{l+1}]$ ($l = 0, 1, \ldots$), with $\tau_0 = 0$, such that $f(\tau) = 0$ for $\tau \in T_{2p}$ and $f(\tau) = f_0$ for $\tau \in T_{2p+1}$ (here $X(\tau_l) = lL/2$). Thus, our task is to find $\tau_l$. This can be done by taking into account that during each of the “odd” intervals $T_{2p+1}$ Eq. (8) describes a conservative nonlinear oscillator, the solution for which is well known. During the “even” intervals $T_{2p}$ the motion is free (with a constant velocity), which means that the time $T_{2p}$ necessary for the soliton to cross the interval $[pL, (p + \frac{1}{2})L]$ is

$$T_{2p} = \tau_{2p+1} - \tau_{2p} = L/(2v_{2p}), \qquad (10)$$

where $v_{2p}$ is the velocity at the point $pL$. During the time interval $T_{2p+1}$ the soliton has to cross the interval $I_p$. From this condition we obtain

$$T_{2p+1} = \tau_{2p+2} - \tau_{2p+1} = \frac{\sqrt{2}}{\kappa}\,\frac{K\left(\sqrt{2E_0/(H_{2p+1} + E_0)}\right)}{\sqrt{H_{2p+1} + E_0}}, \qquad (11)$$

where $H_{2p+1} = v_{2p+1}^2/2 + E_0$ is the energy of the soliton at the point $(p + 1/2)L$, $E_0 = C(k)f_0$, and $v_{2p+1} = v_{2p}$ is the soliton velocity at the same point. At the end of the interval $T_{2p+1}$ the soliton velocity is given by $v_{2(p+1)} = \sqrt{2(H_{2p+1} + E_0)}$. Thus after $p$ rotations the soliton acquires the velocity $v_{2(p+1)}$, which can be obtained from the recurrence relation $v_{2(p+1)}^2 = v_{2p}^2 + 4C(k)f_0$.

FIG. 1: (Color online) The soliton velocity vs time (panels a, b) for the parameters $k = 0.99999$, $L = 10.0$, $f_0 = 0.3$, $n = 0.43$ (panel a, the non-quantized velocity), $n = 1.0$ (panel b, the quantized velocity), and the shapes of the soliton (panels c, d, e) at the instants of time indicated in panel a. In panels a, b solid and dashed lines represent the velocity numerically computed from Eq. (3) and Eq. (8), respectively.

In Fig. 1 a, b we compare the solution obtained from the perturbation theory, Eq. (8), with the numerical simulation of Eq. (3) for $f_0 = 0.3$. Note that in the numerical simulation we used the values of $T_{2p}$ and $T_{2p+1}$ [Eqs. (10) and (11)] obtained in the adiabatic approximation. It follows from the results presented that the dashed and solid lines match perfectly until $\tau \approx 50.0$. At larger times an appreciable discrepancy appears. It occurs due to the failure of the adiabatic approximation and can be removed by introducing temporal corrections to $T_{2p}$ and $T_{2p+1}$. This naturally leads to an optimization problem, which requires a numerical approach and goes beyond the scope of the present work. Finally we notice that for the above example of the $^7$Li condensate the obtained acceleration is 0.36 mm/s$^2$.
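The switching schedule described above can be tabulated directly from the adiabatic formulas. The Python sketch below is only an illustration under stated assumptions: the free-flight time is L/(2v), the velocity update is v² → v² + 4E0 with E0 = C(k)f0, and the duration of each accelerating interval is taken as the standard elliptic-integral traversal time for a particle in the potential −E0 cos(κX), which is an assumption about the intended form of Eq. (11). The value of C(k) used in the demo is a placeholder, not computed from Eq. (9), and the helper names are hypothetical.

```python
import math


def ellipk(q, iterations=40):
    """Complete elliptic integral of the first kind K(q) for modulus 0 <= q < 1,
    via the arithmetic-geometric mean: K = pi / (2 * AGM(1, sqrt(1 - q^2)))."""
    a, b = 1.0, math.sqrt(1.0 - q * q)
    for _ in range(iterations):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def switching_schedule(v0, e0, length, rotations):
    """Return a list of (t_free, t_push, v_after) per rotation.

    v0       initial soliton velocity (must be positive)
    e0       E0 = C(k) * f0
    length   trap perimeter L in scaled units
    """
    kappa = 2.0 * math.pi / length
    v = v0
    schedule = []
    for _ in range(rotations):
        t_free = length / (2.0 * v)          # force off: cross half the ring freely
        h = 0.5 * v * v + e0                 # energy at the switch-on point
        q = math.sqrt(2.0 * e0 / (h + e0))   # elliptic modulus (assumed form)
        t_push = math.sqrt(2.0) / kappa * ellipk(q) / math.sqrt(h + e0)
        v = math.sqrt(2.0 * (h + e0))        # equivalent to v^2 -> v^2 + 4 * e0
        schedule.append((t_free, t_push, v))
    return schedule


# Demo with L = 10, f0 = 0.3 and a placeholder C(k) = 0.9 (illustrative only).
for p, (t_free, t_push, v) in enumerate(switching_schedule(0.628, 0.9 * 0.3, 10.0, 5)):
    print(f"rotation {p}: free {t_free:.3f}, push {t_push:.3f}, velocity {v:.3f}")
```

Because the force only acts while the soliton moves at or above its entry velocity, each accelerating interval is shorter than the preceding free one, and both shrink as the velocity grows.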
Comparison of panels a and b in Fig. 1 shows that for $k \approx 1$ quantization of the velocity is not important, which is also confirmed by the evolution of the soliton shapes depicted in panels c-e of Fig. 1.

IV. STOCHASTIC ACCELERATION OF MATTER SOLITONS

Now we concentrate on another dynamical regime, the stochastic acceleration, where the increase of the velocity of a matter soliton in a toroidal trap is achieved by applying a fluctuating external field. To this end, keeping all conditions of applicability of the model (3), we consider the case of a stochastic force $f(\tau)$, which is a delta-correlated Gaussian process with characteristics $\langle f(\tau)\rangle = 0$ and $\langle f(\tau)f(\tau')\rangle = D\delta(\tau - \tau')$ (the angular brackets stand for the stochastic averaging and $D$ is the dispersion). Now the dynamics can be described in terms of the distribution function

$$P(V, \Phi, \tau) = \langle\delta(\Phi - \Phi(\tau))\,\delta(V - V(\tau))\rangle, \qquad (12)$$

where $\Phi(\tau) \equiv \kappa X$ is the angular coordinate of the soliton; $\Phi(\tau)$ and $V(\tau)$ with explicit time dependence stand for the soliton coordinates obtained from the dynamical equations, while $\Phi$ and $V$ are considered as independent variables. The distribution function solves the Fokker-Planck equation, which is obtained by the standard procedure (see e.g. [17]):

$$\frac{\partial P}{\partial\tau} = -V\frac{\partial P}{\partial\Phi} + \tilde{D}\sin^2(\Phi)\frac{\partial^2 P}{\partial V^2} . \qquad (13)$$

Here $\tilde{D} = \kappa^4 C^2(k) D$ is the diffusion coefficient. Due to the circular geometry of the trap, Eq. (13) is considered on the interval $-\pi < \Phi < \pi$ with the periodic boundary conditions $P(V, \Phi - \pi, \tau) = P(V, \Phi + \pi, \tau)$ with respect to $\Phi$ and zero boundary conditions with respect to $V$: $P \to 0$ as $V \to \pm\infty$. The normalization condition for the probability density reads: $\int_{-\infty}^{\infty}dV\int_{-\pi}^{\pi}d\Phi\, P = 1$.

Multiplying Eq. (13) by $V$ and $\Phi$ and integrating over $V$ and $\Phi$ one readily obtains that the averaged velocity and angular position of the soliton are constants, which for the sake of simplicity will be taken to be zero, i.e. $\langle V\rangle = 0$ and $\langle\Phi\rangle = 0$.
Next, multiplying (13) by $V^2$, $\Phi^2$ and $V\Phi$ and performing the integration one obtains the equations for the second moments. They are not closed and can be written as follows:

$$\frac{d\langle V^2\rangle}{d\tau} = 2\tilde{D}\langle\sin^2\Phi\rangle , \qquad (14)$$

$$\frac{d\langle\Phi^2\rangle}{d\tau} = 2\langle V\Phi\rangle , \qquad (15)$$

$$\frac{d\langle V\Phi\rangle}{d\tau} = -2\pi\int_{-\infty}^{\infty}P(\pi, V, \tau)V^2 dV + \langle V^2\rangle . \qquad (16)$$

Eq. (14) means that the average square velocity grows with time, i.e. the soliton undergoes stochastic acceleration. The law of growth of the velocity variance deviates from the linear one, which would hold for Brownian diffusion in momentum space; this happens because the diffusion coefficient in the Fokker-Planck equation (13) is not a constant, but depends on the angular variable. However, due to the diffusion one can expect that the phase distribution will tend to a homogeneous one, i.e. that $P \to 1/(2\pi)$ as $\tau \to \infty$. In this formal limit one obtains that $\langle V\Phi\rangle \to 0$, $\langle\sin^2\Phi\rangle \to 1/2$ and hence $\langle V^2\rangle \to \tilde{D}\tau$. In other words, the system (14)-(16) describes a random walk which, in the limit of large time, approaches Brownian diffusion in velocity space. In that limit the stochastic acceleration, which can be defined as $\tilde\gamma = d\sqrt{\langle V^2\rangle}/d\tau$, would tend to zero according to the law $\tilde\gamma \propto \tau^{-1/2}$.

FIG. 2: (Color online) The mean square velocity (panel a) and the stochastic acceleration $\tilde\gamma$ (panel b) of the soliton vs time for different values of the dispersion, obtained by numerical integration of Eq. (13) with parameters $L = 10.0$ and $k = 0.99999$. In panel b dashed lines depict the approximation of the numerical data by the law $\tilde\gamma \propto \tau^{-1/2}$. All axes in panels a, b are represented in logarithmic scale.

In order to check the above predictions and reveal other features of the stochastic dynamics of a soliton in a ring trap we solved Eq. (13) numerically subject to the initial condition $P(V, \Phi, 0) = \langle\delta(\Phi)\delta(V)\rangle$. The results are summarized in Fig. 2. In panel a one observes the predicted monotonic growth of the mean square velocity with time, which slightly differs from the linear law.
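The growth of the mean square velocity can also be probed by direct sampling rather than by solving the Fokker-Planck equation. The Python sketch below is a minimal Euler-Maruyama integration of the Langevin pair dΦ/dτ = V, dV/dτ = (2D̃)^{1/2} sin(Φ) ξ(τ) with unit white noise ξ. This pair is assumed here to be the stochastic dynamics behind Eq. (13), since it reproduces the moment equation (14). The initial phases are drawn uniformly on (−π, π), which is a choice of this sketch: for a soliton started exactly at Φ = 0 with V = 0 the discretized noise term vanishes identically. The function name, step size and sample count are illustrative.

```python
import math
import random


def mean_square_velocity(d_tilde, t_end, dt=0.01, n_paths=300, seed=0):
    """Estimate <V^2> at time t_end by averaging over n_paths trajectories of
        dPhi = V dtau,   dV = sqrt(2 * d_tilde) * sin(Phi) dW
    integrated with the Euler-Maruyama scheme (assumed model, see lead-in)."""
    rng = random.Random(seed)
    amp = math.sqrt(2.0 * d_tilde * dt)    # noise amplitude per time step
    steps = int(round(t_end / dt))
    total = 0.0
    for _ in range(n_paths):
        phi = rng.uniform(-math.pi, math.pi)   # uniform initial phase (assumption)
        v = 0.0
        for _ in range(steps):
            kick = amp * math.sin(phi) * rng.gauss(0.0, 1.0)
            phi += v * dt                      # advance the phase with the old velocity
            v += kick                          # then apply the stochastic kick
        total += v * v
    return total / n_paths


d_tilde, t_end = 0.5, 20.0
msv = mean_square_velocity(d_tilde, t_end)
print(f"<V^2> at tau = {t_end}: {msv:.2f}  (reference value D~ * tau = {d_tilde * t_end:.2f})")
```

Since sin²Φ never exceeds one, the sampled ⟨V²⟩ stays below 2D̃τ up to statistical noise, and it is expected to approach D̃τ once the phase distribution has become homogeneous.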
In panel b one can see that the stochastic acceleration $\tilde\gamma$ is a monotonically decreasing function, which at sufficiently large times tends to zero. In particular, at $\tau \gtrsim 15$ the decrease of the acceleration with time is well approximated by the predicted law $\tilde\gamma \propto \tau^{-1/2}$, as shown by the dashed curves in panel b of Fig. 2 (it was verified that at the same times $\langle\sin^2\Phi\rangle \approx 1/2$, in agreement with the analytical predictions). Fig. 2 b also shows that the stochastic acceleration is larger for larger $D$. The physical explanation of this last fact is simple: the acceleration is generated by the stochastic force, whose intensity is determined by the dispersion $D$.

V. LOCALIZATION OF MATTER INDUCED BY THE EXTERNAL FIELD

Let us now turn to localized states of matter in a toroidal trap and concentrate on the states generated by a constant external electric field, i.e. by $f(\tau) \equiv f_0$. Accordingly, we look for stationary solutions of Eq. (3) in the form $A = e^{-i\omega\tau}\mathcal{A}(\zeta)$ and obtain for $\mathcal{A}(\zeta)$ the equation

$$-\frac{1}{2}\frac{d^2\mathcal{A}}{d\zeta^2} - f_0\cos(\kappa\zeta)\mathcal{A} + \sigma|\mathcal{A}|^2\mathcal{A} = \omega\mathcal{A} \qquad (17)$$

which is subject to the periodic boundary conditions $\mathcal{A}(\zeta) = \mathcal{A}(\zeta + L)$.

Several lowest branches of the numerically obtained solutions of Eq. (17) are shown in Fig. 3. The lowest branch approaches zero at the frequency $\omega_0 \approx -0.143$ (it is interesting to mention that this frequency coincides with the lowest gap edge of the spectrum of the Mathieu equation, i.e. Eq. (17) without the nonlinear term, considered on the whole axis), where the amplitude of the nonlinear periodic mode is small and it transforms into the linear periodic Bloch mode at the lowest gap edge. Such behavior of the branch is similar to that of strongly localized modes of BECs in an optical lattice [18]. The lowest mode A is localized in the vicinity of $\varphi = 0$ (Fig. 3 b), i.e.
around the minimum of the effective potential, which is why such a mode is stable and can exist even in the linear case, where the two-body interactions are negligible (here it is important that we are dealing with periodic boundary conditions). The modes of the upper branches, B and D (their typical examples are shown in Fig. 3 b), bifurcate at $\omega_* \approx -0.345$. They are localized at $\varphi = \pm\pi$, i.e. near the points where the potential has maxima. Clearly, this is a purely nonlinear effect, which occurs due to a delicate balance between the attractive interactions and the repulsive forces of the external potential. Such a balance can easily be destroyed even by an infinitesimal perturbation, which allows us to expect instability of the modes. The mode C represents two local maxima of the atomic distribution at $\varphi = 0, \pi$. Similarly to the modes B and D, one can expect it to be unstable, which physically can be explained by the existence of local atomic maxima at the maxima of the potential. By direct numerical solution of Eq. (3) (more specifically, by perturbing the mode profiles by the factor 1 + 0.1 cos (21 ζ) and computing the dynamics until $\tau = 1000$) we have verified that, indeed, only the mode A in Fig. 3 is dynamically stable, while the modes B, C, and D are unstable.

FIG. 3: (Color online) The number of bosons $N$ (for the example of the lithium condensate described in the text) vs frequency $\omega$ (panel a) and shapes of the localized modes at $\omega = -0.4$ (panel b) for the case where $L = 10.0$, $f_0 = 0.3$, $\sigma = -1$.

VI. MATTER SOLITON AS A KAPITZA PENDULUM

As the final example of nontrivial dynamics of a matter soliton in a toroidal trap we consider dynamical localization induced by a rapidly oscillating force $f(\tau) = f_0[\nu + \cos(\Omega\tau)]$. In this case the solitonic motion mimics the famous Kapitza pendulum, which acquires an additional stable point due to rapid oscillation of the pivot [14].
Assuming that the physical conditions for the validity of the quasi-1D approximation (3) hold and that the frequency Ω is large enough, i.e. Ω² ≫ κ²C(k)f0, one can perform the standard analysis (see e.g. [14]), i.e. look for a solution of (3) in the form X(τ) + ξ(τ), where ξ is small, |ξ| ≪ |X|, and rapidly varying, and average over the rapid oscillations. One then arrives at the equation d²X/dτ² = −∂U/∂X with the effective potential

U = −C(k) f0 [ν cos(κX) + (κ²C(k)f0/(8Ω²)) cos(2κX)]. (18)

If the condition κ²C(k)f0/(2Ω²ν) > 1 is met, the effective potential U possesses two stable points: X = 0 (Φ = 0) and X = L/2 (Φ = π). This opens the possibility of a new type of soliton motion, around the new stable point. Two typical trajectories of the soliton, obtained by numerical integration of Eq. (3), are presented in Fig. 4. One of the trajectories displays oscillations around the new equilibrium point, while the other shows large oscillations around the equilibrium point Φ = 2π, started with the same initial data but in the case where Φ = π is no longer an equilibrium. The amplitude of the large oscillations decays with time because of energy losses of the soliton in the nonconservative system.

FIG. 4: (Color online) The angular coordinate of the soliton center vs time for the soliton motion affected by the rapidly oscillating external force, obtained numerically from Eq. (3) with parameters L = 10.0, n = 0.01, σ = −1, f0 = 0.15, Ω = 2.0, and k = 0.99999.

VII. CONCLUSIONS

In the present paper we have shown that the dynamics of a matter soliton in a toroidal trap, which reproduces a one-dimensional geometry well, can be governed very efficiently by a time-varying external electric field. In particular, regimes such as dynamical acceleration, stochastic acceleration, localization and implementation of the Kapitza pendulum can be realized by proper choices of the time dependence of the external force.

Experimental detection of the acceleration can be implemented either by direct imaging of the atomic cloud, which is well localized in space and has a well-specified trajectory, or by measurement of the atomic distribution in momentum space, which displays a shift of the maximum towards higher kinetic energies. Alternatively, one can study the evolution of the atomic cloud released from the trap (by switching the trap off) after some period of accelerating motion. The respective dynamics will be a spreading cloud whose center of mass moves with the acquired velocity.

The results obtained here are based on the one-dimensional model, although it is deduced using the multiple-scale method and is thus mathematically controllable. This means that a number of problems are still left open. One of them is the limitation on the soliton velocity, and thus on the acceleration, introduced by lowering the space dimension, which appears when the solitonic kinetic energy becomes comparable with the transverse kinetic energy. Another limitation on the soliton acceleration emerges from the velocity quantization, when the radius of the ring trap is not large enough. We have also left for further study the diversity of localized stationary atomic distributions supported by the external field, indicating only the lowest modes. We thus believe that the richness of the phenomena that can be observed by a simple combination of the trap geometry and a varying external field will stimulate new experimental and theoretical studies.

Acknowledgments

The authors are indebted to H. Michinel for providing additional data on Ref. [11]. YVB was supported by the FCT grant SFRH/PD/20292/2004. VVK was supported by the Secretaría de Estado de Universidades e Investigación (Spain) under the grant SAB2005-0195.
The work was supported by the FCT (Portugal) and European program FEDER under the grant POCI/FIS/56237/2004.

[1] S. Gupta, K. W. Murch, K. L. Moore, T. P. Purdy, and D. M. Stamper-Kurn, Phys. Rev. Lett. 95, 143201 (2005).
[2] K. W. Murch, K. L. Moore, S. Gupta, and D. M. Stamper-Kurn, Phys. Rev. Lett. 96, 013202 (2006).
[3] J. Javanainen, S. M. Paik, and S. Mi Yoo, Phys. Rev. A 58, 580 (1998); L. Salasnich, A. Parola, and L. Reatto, Phys. Rev. A 59, 2990 (1999); A. B. Bhattacherjee, E. Courtade, and A. Arimondo, J. Phys. B 37, 4397 (2004).
[4] J. Brand and W. P. Reinhardt, J. Phys. B: At. Mol. Opt. Phys. 34, L113 (2001).
[5] O. Dutta, M. Jääskeläinen, and P. Meystre, Phys. Rev. A 74, 023609 (2006).
[6] V. M. Pérez-García, H. Michinel, and H. Herrero, Phys. Rev. A 57, 3837 (1998).
[7] F. Kh. Abdullaev et al., Int. J. Mod. Phys. B 19, 3415 (2005).
[8] T. Tsuzuki, J. Low Temp. Phys. 4, 441 (1971).
[9] Th. Busch and J. R. Anglin, Phys. Rev. Lett. 84, 2298 (2000); A. Muryshev, G. V. Shlyapnikov, W. Ertmer, K. Sengstock, and M. Lewenstein, Phys. Rev. Lett. 89, 110401 (2002).
[10] V. A. Brazhnyi and V. V. Konotop, Phys. Rev. A 68, 043613 (2003).
[11] A. V. Carpentier and H. Michinel, cond-mat/0610047.
[12] V. A. Brazhnyi, V. V. Konotop, and V. Kuzmiak, Phys. Rev. A 70, 043604 (2004).
[13] A. V. Yulin, D. V. Skryabin, and P. St. J. Russell, Phys. Rev. Lett. 91, 260402 (2003).
[14] See e.g. L. D. Landau and E. M. Lifshitz, Mechanics (Pergamon Press, New York, 1976).
[15] C. J. Pethick and H. Smith, Bose-Einstein condensation in dilute gases (Cambridge University Press, 2001).
[16] D. K. Lawden, Elliptic Functions and applications (Springer-Verlag, New York Inc., 1989).
[17] See e.g. V. V. Konotop and L. Vázquez, Nonlinear random waves (World Sci., Singapore, 1994).
[18] A. Trombettoni and A. Smerzi, Phys. Rev. Lett. 86, 2353 (2001); F. K. Abdullaev, B. B. Baizakov, S. A. Darmanyan, V. V. Konotop, and M. Salerno, Phys. Rev. A 64, 043606 (2001).
http://arxiv.org/abs/cond-mat/0610047

ABSTRACT A toroidal trap combined with an external time-dependent electric field can be used for implementing different dynamical regimes of matter waves. In particular, we show that dynamical and stochastic acceleration, localization and implementation of the Kapitza pendulum can be realized by a proper choice of the external force. <|endoftext|><|startoftext|> **FULL TITLE** ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION** **NAMES OF EDITORS**

GRO J1655-40: from ASCA and XMM-Newton Observations

Xiao-Ling Zhang¹, Shuang Nan Zhang²,³,⁴, Gloria Sala¹, Jochen Greiner¹, Yuxin Feng³,⁴, Yangsen Yao⁵

Abstract. We have analysed four ASCA observations (1994–1995, 1996–1997) and three XMM-Newton observations (2005) of this source, in all of which the source is in the high/soft state. We modeled the continuum spectra with the relativistic disk model kerrbb, estimated the spin of the central black hole, and constrained the spectral hardening factor fcol and the distance. If the kerrbb model applies, then for the normally used value of fcol (1.7) the distance cannot be very small, and fcol changes between observations.

1. Background

GRO J1655-40, the second microquasar (after GRS 1915+105), had X-ray outbursts in 1994–1995, 1996–1997, and 2005. Its geometric parameters are considered the best determined: mass MBH = 7.0 ± 0.2 M⊙, inclination angle θ = 69.50° ± 0.08° (Orosz & Bailyn 1997), distance D = 3.2 ± 0.2 kpc (Hjellming & Rupen 1995), which makes it a very good laboratory for studying black holes and their environments. The spin of the central black hole has been estimated by various authors with various methods (see, e.g., Zhang et al. 1997; Abramowicz & Kluźniak 2001; Aschenbach 2004; Shafee et al. 2006), and the reported values range from 0.2 (Abramowicz & Kluźniak 2001) to 0.996 (Aschenbach 2004). In estimating black hole spin from continuum spectral modeling, the color correction factor, fcol = Tcol/Teff, is one of the key factors.
The normally used value of fcol is 1.7, following Shimura & Takahara (1995), while many authors believe it should not be constant (see, e.g., Merloni et al. 2000). The distance is also very important. The widely accepted value of 3.2 ± 0.2 kpc was challenged by Foellmi et al. (2006), who gave an upper limit of 1.7 kpc.

Affiliations:
1 MPE, Postfach 1312, 85741 Garching, Germany, zhangx@mpe.mpg.de
2 Tsinghua Univ, 100084, Beijing, China
3 U. of Alabama, Huntsville, AL 35899, USA
4 NSSTC, Sparkman Dr. 320, Huntsville AL 35805, USA
5 MIT Kavli Inst. for Astro. and Space Research, 70 Vassar Street, Cambridge, MA 02139

http://arxiv.org/abs/0704.0734v1

2. Observations, data reduction and model fitting

We analysed three ASCA observations during the 1994–1995 and the 1996–1997 outbursts, and three XMM-Newton observations during the 2005 outburst, in all of which the source was in the high/soft state. For ASCA, only GIS2 data were used, after gain correction and deadtime correction. For XMM-Newton, only Epic-pn data were used, after correction for rate-dependent Charge-Transfer-Efficiency (Sala et al. 2006).

The classical way of estimating black hole spin from continuum spectral fitting is to fit the spectra with disk models and obtain the spin directly or indirectly. All models take the source distance as a parameter, and most models treat the disk as multi-temperature black-body rings, so the derived spin value depends on the apparent/effective temperature ratio.

The relativistic disk model kerrbb in XSPEC was used in the fitting. We let fcol vary from 1.0 to 3.0, and D vary from 1.0 kpc to 3.2 kpc. For each combination of fcol and D, we fitted the data sets and obtained a spin value if the fit was acceptable (χ²/dof < 2). The contours of the derived spin a over D and fcol are shown in Fig. 1.

3. Conclusion

From Fig. 1 we can see that
1. for the normally used fcol value of 1.7, the kerrbb model does not favor a small distance;
2. because the black hole spin and the source distance should be constant, fcol changes dramatically between these observations.

Figure 1. Contours of black hole specific angular momentum a versus distance D and fcol. The two dotted lines indicate D = 1.7 kpc and fcol = 1.7.

References

Abramowicz, M. A., & Kluźniak, W. 2001, A&A, 374, L19
Aschenbach, B. 2004, A&A, 425, 1075
Foellmi, C., Depagne, E., Dall, T. H., & Mirabel, I. F. 2006, A&A, 457, 249
Hjellming, R. M., & Rupen, M. P. 1995, Nat, 375, 464
Merloni, A., Fabian, A. C., & Ross, R. R. 2000, MNRAS, 313, 193
Orosz, J. A., & Bailyn, C. D. 1997, ApJ, 477, 876
Sala, G., et al. 2006, A&A, accepted (astro-ph/0606272)
Shafee, R., et al. 2006, ApJ, 636, L113
Shimura, T., & Takahara, F. 1995, ApJ, 445, 780
Zhang, S. N., Cui, W., & Chen, W. 1997, ApJ, 482, L155

http://arxiv.org/abs/astro-ph/0606272

ABSTRACT We have analysed four ASCA observations (1994--1995, 1996--1997) and three XMM-Newton observations (2005) of this source, in all of which the source is in the high/soft state. We modeled the continuum spectra with the relativistic disk model kerrbb, estimated the spin of the central black hole, and constrained the spectral hardening factor f_col and the distance. If the kerrbb model applies, then for the normally used value of f_col the distance cannot be very small, and f_col changes between observations. <|endoftext|><|startoftext|> Introduction

Hard X-ray surveys have clearly revealed the important role played by obscured Active Galactic Nuclei (AGN) in reproducing the cosmic X-ray background (XRB; e.g., Comastri et al. 1995) and have provided evidence that a significant fraction of the accretion-driven energy density in the Universe resides in obscured X-ray sources (e.g., Barger et al. 2005; Hopkins et al. 2006; Hickox & Markevitch 2006).

Until recently, the limited information on the broad-band emission of the counterparts of obscured X-ray sources prevented a reliable determination of their bolometric luminosities.
The lack of proper knowledge of the spectral energy distributions (SEDs) of obscured sources has led many authors to adopt, in the computation of the bolometric luminosities, the average value derived by Elvis et al. (1994), although the sample in that work comprises mostly local unabsorbed quasars.

With the current work, we aim to provide a robust estimate of the bolometric luminosity for obscured sources, which is an essential parameter to derive the cosmic mass density of supermassive black holes (SMBHs, i.e., following the Soltan 1982 approach). A reliable estimate of the bolometric luminosity of obscured AGN is typically limited by the actual capability of disentangling the nuclear emission (related to the accretion processes) from that of the host galaxy which, unlike for unabsorbed quasars, often dominates at optical and near-infrared (near-IR) bands.

A significant fraction of high-redshift, luminous obscured AGN (the so-called Type 2 quasars) may have escaped spectroscopic identification due to their faint optical counterparts, thus preventing current studies from sampling obscured sources accurately. Mid-infrared (mid-IR) observations appear to be fundamental for this class of objects, since they are only marginally affected by dust obscuration and are able to recover the nuclear emission. With mid-IR observations, we expect to reveal the nuclear radiation re-processed by the torus of the obscured active nuclei, which are often recognized as such by means of their X-ray emission only. For these sources, the soft X-ray emission, which is photo-electrically absorbed by the gas, and the optical emission, extinguished by the dusty circumnuclear medium, are expected to be downgraded in energy and to emerge as thermally reprocessed radiation in the IR at wavelengths in the range between a few and a few hundred µm (Granato et al. 1997).

http://arxiv.org/abs/0704.0735v2
F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars

Table 1. Properties of our targets

Source Id.         2–10 keV flux†             R       Ks      Morph(Ks)   X/O    z
                   (10⁻¹⁴ erg cm⁻² s⁻¹)
Abell 2690#75      3.30                       24.60   18.33   E           1.86   2.13a
PKS 0312−77#36     1.90                       24.70   19.13   E           1.66   –
PKS 0537−28#91     4.20                       23.70   18.99   E           1.60   –
PKS 0537−28#54     2.10                       25.10   18.91   E           1.86   –
PKS 0537−28#111    2.10                       24.50   17.64   E           1.62   –
Abell 2690#29      2.80                       25.10   17.67   P           1.99   2.08b
PKS 0312−77#45     2.80                       24.40   18.62   P           1.71   –
BPM 16274#69       2.27                       24.08   17.87   E           1.49   1.35b

† 2–10 keV X-ray fluxes from Perola et al. (2004).
a Tentative spectroscopic redshift from near-IR spectroscopic observations (Maiolino et al. 2006).
b Spectroscopic redshift from near-IR spectroscopic observations (Maiolino et al. 2006).

The potential of mid-IR observations was first shown by Fadda et al. (2002), who detected with ISOCAM at 15 µm about two-thirds of the X-ray sources detected in the 5–10 keV band in the XMM-Newton Lockman Hole survey. A similarly high detection rate at 24 µm has been recently reported by Rigby et al. (2004) and Franceschini et al. (2005), who studied the Spitzer counterparts of the Chandra sources in the CDF-S and in the ELAIS-N1 field, respectively, within the SWIRE survey (Lonsdale et al. 2004).

In this context, an interesting new class of objects is emerging from the current X-ray surveys: these sources are characterized by a high (>1) X-ray-to-optical flux ratio (hereafter X/O; see footnote 1); for comparison, unobscured Type 1 AGN have a broad distribution peaked at X/O ≈ 0. Objects with X/O ≳ 1 are about 20% of the hard X-ray selected sources, and the fraction of these sources seems to remain constant over ≈ 3 decades of X-ray flux (Comastri & Fiore 2004). By definition, sources with high X/O are among the faintest sources in the optical band. In the shallow, large-area X-ray surveys (e.g., the HELLAS2XMM survey, with F(2–10 keV) > 10⁻¹⁴ erg cm⁻² s⁻¹ over ≈ 3 deg²; Baldi et al.
2002), where the identification of a sizable sample of sources with X/O > 1 has been possible (e.g., Fiore et al. 2003), the X/O selection criterion has proven to be effective in selecting Type 2 quasars at high redshifts.

Footnote 1: X/O is defined as log(FX/FR) = log FX + R/2.5 + 5.5, with FX the 2–10 keV flux in erg cm⁻² s⁻¹ and R the R-band magnitude.

We have performed a pilot program to study with Spitzer a sample of eight sources selected in the 2–10 keV band from the HELLAS2XMM survey on the basis of their high (>1) X/O and large column densities (NH ≥ 10²² cm⁻²). The sample observed with Spitzer has been previously investigated in other bands. The most surprising finding of the follow-up campaigns was the association of these sources with luminous near-IR objects (Mignoli et al. 2004), placing them in the class of Extremely Red Objects (EROs, R − K ≥ 5).

The outline of the paper is as follows: in Sect. 2 we present the sample selection; Spitzer data reduction and analysis are discussed in Sect. 3, while in Sect. 4 we describe the analysis of the SEDs. Finally, in Sect. 5 we estimate the bolometric luminosities, the stellar masses of the host galaxies, the black hole masses, and the Eddington ratios.

Throughout this paper we adopt the “concordance” (WMAP) cosmology (H0 = 70 km s⁻¹ Mpc⁻¹, ΩM = 0.3, and ΩΛ = 0.7; Spergel et al. 2003). Magnitudes are expressed in the Vega system.

2. Sample selection

The eight objects presented in this paper (see Table 1) were selected among the 10 HELLAS2XMM high X/O ratio sources detected in the Ks band with ISAAC at ESO-VLT; for details on the association of the Ks-band counterpart to the X-ray source, see Mignoli et al. (2004). Two sources of the original sample were not selected for Spitzer observations: one (PKS 0537−28#31) is associated with a disky galaxy, while the other (BPM 16274#181) has an ambiguous Ks-band morphological classification. All but one of the sources observed by Spitzer belong to the first square degree field (122 X-ray sources; see Fiore et al.
2003 for the spectroscopic and photometric identification and Perola et al. 2004 for the X-ray spectral analysis); the only exception is BPM 16274#69, which belongs to the sample of the second square degree (110 X-ray sources; see Cocchia et al. 2007; Lanzuisi et al., in preparation).

Fig. 1. ISAAC Ks images, centered on the Ks counterpart of the X-ray sources; each box is 30′′ wide. North is to the top and East to the left. Contour levels of the 24 µm emission corresponding to [3, 4, 5, 7, 10, 20, 40]σ are superimposed on each image.

The selected sources, although faint in the optical band (23.7 ≤ R ≤ 25.1; Table 1), have red optical-to-near-IR colours (R − Ks ≳ 5), thus being EROs. Mignoli et al. (2004) were able to study the surface brightness profiles of these sources in the Ks band and obtained a morphological classification. Only two sources are classified as point-like objects, while all of the others are extended, with clear detection of the host galaxy and radial profiles consistent with those of elliptical galaxies. In this latter class of sources, the central AGN is evident in the X-ray band, while in the near-IR the host galaxy dominates. An upper limit to the contribution of a central unresolved source (i.e., the nuclear emission), ranging from 2% to 12% of the galaxy emission, was obtained. Furthermore, using both the R − K colour and the morphological information, a minimum photometric redshift, in the range 0.9–2.4, was estimated for these sources.

For three of the sources with the reddest colours (Abell 2690#75, Abell 2690#29 and BPM 16274#69; see Table 1), near-IR spectroscopic observations with ISAAC at ESO-VLT were performed by Maiolino et al. (2006), thus allowing for a spectroscopic identification for at least two of these sources (one redshift measurement appears tentative).
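As a quick consistency check on the selection criterion, the X/O column of Table 1 can be recomputed from the tabulated 2–10 keV fluxes and R magnitudes. The sketch below assumes the commonly used definition X/O = log FX + R/2.5 + 5.5 (FX in erg cm⁻² s⁻¹); the function name `x_over_o` is ours and purely illustrative.

```python
import math


def x_over_o(flux_2_10_kev, r_mag):
    """X-ray-to-optical flux ratio, X/O = log10(F_X) + R/2.5 + 5.5.

    flux_2_10_kev: 2-10 keV flux in erg cm^-2 s^-1.
    r_mag: R-band magnitude (Vega system).
    """
    return math.log10(flux_2_10_kev) + r_mag / 2.5 + 5.5


# (source, 2-10 keV flux in units of 1e-14 erg cm^-2 s^-1, R magnitude),
# copied from Table 1.
TABLE_1 = [
    ("Abell 2690#75", 3.30, 24.60),
    ("PKS 0312-77#36", 1.90, 24.70),
    ("PKS 0537-28#91", 4.20, 23.70),
    ("PKS 0537-28#54", 2.10, 25.10),
    ("PKS 0537-28#111", 2.10, 24.50),
    ("Abell 2690#29", 2.80, 25.10),
    ("PKS 0312-77#45", 2.80, 24.40),
    ("BPM 16274#69", 2.27, 24.08),
]

for name, flux14, r_mag in TABLE_1:
    print(f"{name:16s} X/O = {x_over_o(flux14 * 1e-14, r_mag):.2f}")
```

With this definition every source has X/O well above 1, and the recomputed values agree with the tabulated column to within rounding.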
The point-like source (Abell 2690#29) shows the typical rest-frame optical spectrum of high-redshift dust-reddened quasars, with a broad Hα line (Gregg et al. 2002). The other two observed sources, both of them extended in the Ks band, have narrow emission-line spectra: one is a LINER-like object at z=1.35 and the second source has a spectrum with a single weak line, tentatively associated with Hα at z=2.13. Consistently with the morphological information, in the first source the AGN dominates the emission, while in the other two sources the nuclear spectrum is heavily diluted by the host galaxy starlight.

Table 2. Spitzer flux densities (Sν ± ∆Sν, in µJy)

Source Id.         3.6 µm     4.5 µm     5.8 µm     8.0 µm     24 µm
Abell 2690#75      51 ± 5     56 ± 6     89 ± 11    139 ± 15   565 ± 62
PKS 0312−77#36     41 ± 4     44 ± 5     40 ± 8     71 ± 9     236 ± 30a
PKS 0537−28#91     28 ± 4     35 ± 4     42 ± 8     80 ± 10    301 ± 40
PKS 0537−28#54     31 ± 4     35 ± 4     50 ± 10    47 ± 8     279 ± 45
PKS 0537−28#111    88 ± 9     75 ± 8     41 ± 6     46 ± 7     148 ± 28
Abell 2690#29      141 ± 14   185 ± 19   260 ± 27   371 ± 38   1012 ± 106a
PKS 0312−77#45     50 ± 6     62 ± 7     69 ± 10    78 ± 10    249 ± 35
BPM 16274#69       86 ± 9     92 ± 9     97 ± 11    120 ± 13   286 ± 34

The flux density is reported in units of µJy.
a The 24 µm flux density is probably overestimated due to contamination from nearby sources and should be considered as an upper limit.

3. Spitzer observations and data reduction

The whole sample of eight hard X-ray selected sources has been observed by Spitzer (Werner et al. 2004), with IRAC (Fazio et al. 2004) observations of 480 s integration time and MIPS (Rieke et al. 2004) observations at 24 µm with a total integration time of ≈ 1400 s per position. IRAC observations were performed in photometry mode with frame time of 30 s and dither pattern of 16 points.
The MIPS 24 µm observations were performed in MIPS photometry mode with a frame time of 10 s, 10 cycles and the small-field pattern. To reduce overheads, the cluster option was used when possible.

For the IRAC bands, we used the final combined post-basic calibrated data (BCD) mosaics produced by the Spitzer Science Center (SSC) pipeline (Version S12.0–S13.01). At 24 µm, we started the analysis from the BCD produced by the SSC pipeline (Version S12.4.2–S13.01) and then applied ad hoc procedures to optimize the reduction, since some of our sources were close to the detection limit (see Table 2). We recall that BCD are individual frames already corrected for dark, flat field and geometric distortion. We improved the quality of the BCD by correcting each individual BCD for a residual flat field depending on the scan mirror position (Fadda et al. 2006). The residual flat field was obtained from our own data by averaging the BCD corresponding to the same scan mirror position and the same Astronomical Observation Request (AOR), considering all the different cluster positions. The median level of each BCD was subtracted before this operation. This procedure was possible since our observations are not dominated by background fluctuations. The corrected BCD were co-added and background-subtracted using the SSC MOPEX software (Makovoz & Marleau 2005). The resulting mosaics were made with 2.4′′ pixel size.

The overall analysis at 24 µm produces mosaics with a typical noise of ≈ 0.020 MJy/pixel (a factor of 2 lower than in the SSC pipeline mosaics). The noise map has been computed for each mosaic by scaling the measured mean rms of the central part of the map according to the inverse square root of the coverage map.

The flux densities of our targets in the IRAC and MIPS bands were measured on the signal maps using aperture photometry at the position of the sources.
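The noise-map construction described above (central rms scaled by the inverse square root of the coverage) can be sketched in a few lines. This is only an illustration under our own assumptions: the function `noise_map`, the use of the mean central coverage as the reference, and the toy 3×3 coverage map are hypothetical, not the authors' actual pipeline.

```python
import math


def noise_map(coverage, central_rms, central_coverage):
    """Scale a reference rms by 1/sqrt(coverage), pixel by pixel.

    coverage: 2D list with the number of frames contributing to each pixel.
    central_rms: mean rms measured in the well-covered centre of the mosaic.
    central_coverage: mean coverage of that same central region (assumed
        here to be the reference for the scaling).
    Pixels with zero coverage are assigned infinite noise.
    """
    return [
        [central_rms * math.sqrt(central_coverage / c) if c > 0 else math.inf
         for c in row]
        for row in coverage
    ]


# Hypothetical coverage map: deepest in the centre, shallower at the edges.
coverage = [
    [5, 10, 5],
    [10, 20, 10],
    [5, 10, 5],
]
sigma = noise_map(coverage, central_rms=0.020, central_coverage=20)
for row in sigma:
    print(["%.4f" % value for value in row])
```

In this toy case a corner pixel with a quarter of the central coverage has twice the central noise, as expected for uncorrelated frames.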
The chosen aperture radius for the IRAC bands is 2.45′′ and the adopted factors for the aperture corrections are 1.21, 1.23, 1.38 and 1.58 (following the IRAC Data Handbook, Version 3.0) at 3.6 µm, 4.5 µm, 5.8 µm and 8 µm, respectively. The chosen aperture radius at 24 µm is 7.5′′; aperture corrections were derived by examining the photometry of bright stars. Taking into account an additional correction of 1.15 to match the procedure used by the MIPS instrument team to derive calibration factors from standard star observations, the resulting aperture correction is 1.57 (in agreement with the SWIRE team, see Surace et al. 2005).

Table 2 reports the results of the Spitzer observations. To compute the photometric uncertainties, we added in quadrature the noise from the noise map and the systematic uncertainties (≈10%, see MIPS and IRAC Data Handbook 2006, Version 3.0). The relative photometric uncertainties range from ≈10% in the best cases (IRAC channels 1 and 2) up to ≈20% at 24 µm in the worst ones. All eight sources are clearly detected in both the IRAC and MIPS 24 µm bands. At 24 µm, the flux densities span an order of magnitude, ranging between ≈1000 µJy and ≈150 µJy, with the faintest source (PKS 0537−28#111) close to the 5σ detection level.

In Fig. 1 the Ks-band images, along with the contour levels of the 24 µm emission, are shown. At 24 µm, the sources PKS 0312−77#36 and Abell 2690#29 appear to be confused. In particular, in both cases, there is a second source at ≈ 8–10′′, unrelated to the targets. The contribution of these sources to the 24 µm flux density of our targets has been estimated by a decomposition analysis (using the PSF fitting algorithm IMFIT within the AIPS environment). Furthermore, both PKS 0312−77#36 and Abell 2690#29 present a second object at ≈ 2′′, clearly visible in the Ks images, which is too close to our targets for a decomposition analysis, given the 24 µm pixel size.
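The error budget quoted above for Table 2 (map noise added in quadrature to a ≈10% systematic term, after applying the aperture corrections) can be illustrated as follows. The helper names and the 25 µJy map noise in the example are hypothetical; only the correction factors and the 10% systematic fraction are taken from the text.

```python
import math

# Aperture-correction factors quoted in the text, keyed by band in micron.
APERTURE_CORRECTION = {"3.6": 1.21, "4.5": 1.23, "5.8": 1.38, "8.0": 1.58, "24": 1.57}


def corrected_flux(aperture_flux, band):
    """Scale the flux measured inside the aperture to a total flux density."""
    return aperture_flux * APERTURE_CORRECTION[band]


def photometric_uncertainty(flux, map_noise, systematic_fraction=0.10):
    """Add the noise-map term and the systematic term in quadrature."""
    return math.hypot(map_noise, systematic_fraction * flux)


# Example: a 565 microJy source with an assumed (hypothetical) map noise
# of 25 microJy at its position.
total_error = photometric_uncertainty(565.0, 25.0)
print(f"565 +/- {total_error:.0f} microJy")
```

For bright sources the systematic term dominates, which is consistent with the ≈10% relative uncertainties quoted for the best cases.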
Since these close companions become increasingly fainter moving from the Ks band to the longer IRAC wavelengths, we have attributed the entire flux density estimated from the decomposition analysis to our targets (see Table 2). However, the 24 µm flux densities for these two sources should be treated as upper limits (see Fig. 2).

We point out that the deblending procedure measures the peak flux density. To convert the peak flux density into total flux density, we have assumed that the deblended objects are point-like sources and applied a correction factor of 8.9, derived from the 24 µm Spitzer PSF and including the 1.15 calibration factor.

4. Analysis of the spectral energy distributions

In this section we provide an analysis of the SEDs of our sources, in order to derive the energy distribution of the nuclear component “cleaned” of the host galaxy contribution. As anticipated in Sect. 1, the determination of the nuclear SEDs over a large wavelength range is an essential step to estimate the physical properties of the black hole, such as its bolometric luminosity, mass, and accretion rate. Taking advantage of the new Spitzer photometric points, we have estimated the photometric redshifts of our sources simultaneously with the SED determination. These new values are then compared with the minimum redshifts estimated by Mignoli et al. (2004) using only the R and Ks bands and with the spectroscopic ones measured in three cases by Maiolino et al. (2006, see §2).

Given the different morphological properties of our sources, two approaches have been adopted, one for the sources dominated in the Ks band by the host galaxy (elliptical-like sources) and another one for the sources dominated by the nuclear component (point-like sources).

4.1.
Elliptical-like sources

From the Ks-band morphological analysis, we know that at least up to the observed 2.2 µm band the stellar contribution dominates the emission in these sources. At longer wavelengths, the nuclear component is expected to arise as reprocessed radiation of the primary emission, while the stellar component is expected to drop (e.g., Bruzual & Charlot 2003; Silva et al. 2004).

In the analysis presented in this work, we have decided to follow a phenomenological approach, checking whether the emission of our sources can be reproduced as the sum of two components, one from the host galaxy and the other related to the reprocessing of the nuclear emission by the dusty torus envisaged by unification schemes (Antonucci 1993). The shape and relative strengths of the two components have to be consistent with all our observed data sets (multi-band photometry, Ks-band morphology and magnitude, Ks-band upper limit on the nuclear component, and X-ray spectral analysis).

For the galaxy component, we adopted a set of six galaxy templates, obtained from the synthetic spectra of GISSEL 2003 (Bruzual & Charlot 2003) assuming a simple stellar population and spanning a wide range of ages, from 1 Gyr up to a “maximum age” model (z_form = 20). The “maximum age” model has been adopted by Mignoli et al. (2004) to derive the minimum photometric redshift for the sources of the current sample.

For the nuclear component, we adopted the nuclear templates from Silva et al. (2004), based on the radiative transfer models of Granato & Danese (1994). We chose these templates since in the work of Silva et al. (2004) the radiative transfer models are used to interpolate the observed nuclear IR data for a sizable sample of local AGN. We must note, however, that the nuclear observed SEDs are available only in the 2–20 µm regime, where data from small-aperture instruments are available. At wavelengths above ≈ 20 µm, the SEDs are model extrapolations.

Silva et al.
(2004) found that the nuclear SEDs can be expressed as a function of two parameters, the hard X-ray (2–10 keV) intrinsic luminosity, which provides the normalization to the SED, and the column density NH, which gives the shape to the SED (see Fig. 2 in Silva et al. 2004). The shapes of the SEDs of the Seyfert galaxies are assumed to be valid also at quasar luminosities.

In the attempt to provide a better estimate for the source redshifts, we used the four torus templates as given by Silva et al. (2004), which depend on the column densities NH, and we left the normalizations free to vary. The redshift interval explored by this procedure is 0.5–3.0; the best-fit solution is obtained when the algorithm, based on the χ², finds a minimum in the galaxy template, torus template and redshift parameter space.

For four out of six sources the procedure finds a clear minimum χ², which allows us to determine a photometric redshift with relatively good accuracy (see Table 3). For sources BPM 16274#69 and PKS 0537−28#54, our procedure constrains only the lower bound of the redshift interval. For all the six sources, the estimated redshifts are consistent with the minimum redshifts of Mignoli et al. (2004). In case of the source BPM 16274#69, where a secure spectroscopic redshift is available, the minimum photometric redshift (zphot > 1.25) is consistent with the spectroscopic one (z = 1.35). For source Abell 2690#75, we find zphot = 1.30 (+0.30, −0.20), which is significantly lower than the spectroscopic value (z = 2.13) reported by Maiolino et al. (2006). In this case, we choose to adopt the photometric determination, since the spectroscopic redshift is based on the tentative detection of a single line.

The results of the SED fitting and decomposition are shown in Fig. 2. The dot-dashed line represents the best-fit galaxy template, the dashed line is the best-fit nuclear template and the thick solid line is the sum of the two components.
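The fitting strategy described above (a χ² minimum searched over galaxy template, torus template and redshift, with both normalizations free) can be sketched as a brute-force grid search in which the two normalizations are solved analytically at each grid point. Everything below is a toy illustration: the analytic templates, their names and the synthetic photometry are hypothetical stand-ins, not the GISSEL 2003 or Silva et al. (2004) SEDs, and the solver is our own assumption about how such a fit could be organized.

```python
import math


# Hypothetical rest-frame templates in arbitrary units (wavelength in micron).
def _bump(width):
    return lambda lam: math.exp(-math.log(lam / 1.6) ** 2 / (2.0 * width ** 2))


GALAXY_TEMPLATES = {"young": _bump(0.8), "old": _bump(0.5)}
TORUS_TEMPLATES = {"shallow": lambda lam: lam ** 1.0, "steep": lambda lam: lam ** 2.0}


def fit_normalizations(g, t, flux, err):
    """Weighted linear least squares for flux ~ a*g + b*t.

    Returns (a, b, chi2), or None if the system is singular or a component
    would need a negative (unphysical) normalization.
    """
    w = [1.0 / e ** 2 for e in err]
    s_gg = sum(wi * gi * gi for wi, gi in zip(w, g))
    s_tt = sum(wi * ti * ti for wi, ti in zip(w, t))
    s_gt = sum(wi * gi * ti for wi, gi, ti in zip(w, g, t))
    s_gf = sum(wi * gi * fi for wi, gi, fi in zip(w, g, flux))
    s_tf = sum(wi * ti * fi for wi, ti, fi in zip(w, t, flux))
    det = s_gg * s_tt - s_gt ** 2
    if det == 0.0:
        return None
    a = (s_gf * s_tt - s_tf * s_gt) / det
    b = (s_tf * s_gg - s_gf * s_gt) / det
    if a < 0.0 or b < 0.0:
        return None
    chi2 = sum(wi * (fi - a * gi - b * ti) ** 2
               for wi, gi, ti, fi in zip(w, g, t, flux))
    return a, b, chi2


def grid_fit(lam_obs, flux, err):
    """Scan redshift and template pairs; keep the combination with lowest chi2."""
    best = None
    for tenth in range(5, 31):  # z = 0.5 ... 3.0 in steps of 0.1
        z = tenth / 10.0
        lam_rest = [lam / (1.0 + z) for lam in lam_obs]
        for g_name, g_fn in GALAXY_TEMPLATES.items():
            g = [g_fn(lam) for lam in lam_rest]
            for t_name, t_fn in TORUS_TEMPLATES.items():
                t = [t_fn(lam) for lam in lam_rest]
                fit = fit_normalizations(g, t, flux, err)
                if fit is not None and (best is None or fit[2] < best["chi2"]):
                    best = {"z": z, "galaxy": g_name, "torus": t_name,
                            "a": fit[0], "b": fit[1], "chi2": fit[2]}
    return best


# Synthetic, noiseless photometry built from the toy templates at z = 1.3.
LAM_OBS = [0.65, 2.2, 3.6, 4.5, 5.8, 8.0, 24.0]  # R, Ks, IRAC and MIPS bands
TRUE_Z = 1.3
flux = [50.0 * GALAXY_TEMPLATES["old"](lam / (1 + TRUE_Z))
        + 2.0 * TORUS_TEMPLATES["steep"](lam / (1 + TRUE_Z)) for lam in LAM_OBS]
err = [0.1 * f for f in flux]
best = grid_fit(LAM_OBS, flux, err)
print(best["z"], best["galaxy"], best["torus"])
```

Solving the normalizations in closed form keeps the grid three-dimensional (two discrete template indices plus redshift), which is what makes an exhaustive search affordable; a real application would also map the χ² surface around the minimum to derive the redshift confidence interval.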
The best-fit galaxy templates are all typical of early-type galaxies with ages between 3 and 6 Gyr. We find an overall agreement between the SED templates and the data points. Moreover, in all but one of the sources, the nuclear component derived from the best fit is consistent with the upper limits derived from the analysis of the Ks-band images (shown as downward-pointing arrows).

In Fig. 2 we also report as a dotted line the SED of the nuclear component normalized to the intrinsic (i.e., de-absorbed) X-ray luminosity following the prescriptions of Silva et al. (2004), where, at a given NH, the normalization depends only on the intrinsic hard X-ray luminosity. The SEDs normalized to the X-ray luminosity and the best-fit SEDs agree within a factor of ≈ 2–3, which is a noteworthy result.

In Table 3 we report, along with the photometric redshifts, the column densities NH and the de-absorbed L(2–10 keV) luminosities. We derive rest-frame NH column densities in the range 10^22.0–10^23.4 cm⁻² and 2–10 keV luminosities in the range 10^43.8–10^44.7 erg s⁻¹, placing these sources among the Type 2 quasar population.

Fig. 2. Rest-frame SEDs of the elliptical sources (black filled circles) compared with the best-fit model obtained as the sum (solid line) of an early-type galaxy (dot-dashed line) and a nuclear component (dashed line). For comparison, the nuclear component as derived from the X-ray normalization is also reported (dotted line). The nuclear Ks-band upper limits (downward-pointing arrows) were derived from the morphological analysis carried out by Mignoli et al. (2004). The 24 µm upper limit of source PKS 0312−77#36 takes into account a possible contribution from a companion source. zs means that the redshift is spectroscopic (see §2 for details), while zphot means that the redshift is photometric (see §4.1).

4.2.
Point-like sources From the Ks-band morphological analysis, Abell 2690#29 and PKS 0312−77#45 (see Mignoli et al. 2004 and Table 1) show a completely different appearance at 2.2 µm in comparison with the sample of extended objects; these two sources have their near-IR emission mostly dominated by an unresolved source. The dominant role played by the AGN is supported for Abell 2690#29 also by its near-IR spectrum, where a broad Hα emission line is detected (see Maiolino et al. 2006 and §2). F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars 7 Fig. 3. Rest-frame SEDs of the two point-like sources (black filled circles) compared to an extinguished quasar template (dashed line) and the best-fit red quasar template (solid line). The extinguished quasar template is obtained from the unobscured quasar template of Elvis et al. (1994) using the SMC extinction law with E(B − V) = 0.7 and scaled to fit the R − Ks colour (for comparison, the unobscured quasar template is also shown as a dotted line). The red quasar template is taken from Polletta et al. (2006). The 24 µm flux density upper limit for the source Abell 2690#29 takes into account a possible contribution from a companion source. zs means that the redshift is spectroscopic (see §2 for details), while zphot means that the redshift is photometric (see §4.2). Unfortunately, the near-IR spectroscopy information is absent for PKS 0312−77#45 (see Table 1). We first tried to reproduce their observed SEDs by redden- ing the composite template spectrum of bright Type 1 quasars of Elvis et al. (1994) with several extinction laws. Reddening has been applied as prescribed by Calzetti (1997) for a dust- screen model and by Pei (1992) for the Small Magellanic Cloud (SMC) galaxy. The two prescriptions produce similar effects at λ > 0.5 µm, but the SMC law produces redder spec- tra at shorter wavelengths for the same amount of extinction. 
Reddened templates with an SMC law reproduce quite well the optical spectra of dust-reddened quasars in the Sloan Digital Sky Survey (SDSS; see Richards et al. 2003), while, using the Calzetti (1997) law, Polletta et al. (2006) were able to reproduce the SEDs of X-ray sources in the Spitzer SWIRE survey.

The procedure of reddening a typical Type 1 quasar does not provide a satisfactory fit to the photometric data points of our sources. In Fig. 3 we show the results obtained when the prescription of Mignoli et al. (2004) for the extinction [SMC extinction law and E(B − V) = 0.7] is adopted. The dashed line shows the reddened quasar template normalized to fit the R − K_s colour. For comparison, the unobscured quasar template is also shown (dotted line). Although the R − K_s colour is reproduced by construction, the overall SED is not, since the observed IRAC and 24 µm flux densities are systematically lower than predicted (by up to a factor of 10 at 24 µm).

The discrepancy between the data points and the reddened Type 1 quasar template might be due to the application to active galactic nuclei of an extinction curve derived from galaxies. The different behaviour of AGN and galaxies can be attributed to different dust distributions (i.e., the torus shape in AGN) and gas-to-dust ratios, which can lead to an unusual dust reddening curve for AGN.

Since a reddened quasar template does not reproduce the shape of our data, we adopted the red quasar template from Polletta et al. (2006), which is a composite spectrum: in the optical/near-IR band, it is the spectrum of the red quasar FIRST J013435.7−093102 from Gregg et al. (2002), while the average of several bright quasars from the Palomar-Green (PG) sample (Schmidt & Green 1983) with consistent optical data has been used in the IR. The Polletta et al.
(2006) template reproduces the observed data points significantly better, allowing for the observed sharp decrease from 0.2 to 0.7 µm in the source rest frame (see Fig. 3, where the Polletta et al. 2006 spectrum is shown as a solid line).

For source Abell 2690#29, which has a spectroscopic redshift, the SED normalization has been obtained through a best-fit procedure. For source PKS 0312−77#45, where only a minimum redshift was available prior to this analysis, we left both the normalization and the redshift free to vary.

In Table 3 we report the derived redshifts, column densities N_H and de-absorbed L_{2-10 keV} also for these sources which, similarly to the elliptical-like sources, belong to the Type 2 quasar population.

5. Physical parameters

5.1. Bolometric correction

Once the SED of the nuclear component has been determined, the next step is to estimate the bolometric luminosity L_bol, which is a quantity directly related to the central black hole activity. The bolometric luminosity L_bol can be estimated from the luminosity in a given band b, L_b, by applying a suitable bolometric correction k_{bol,b} = L_bol/L_b. For X-ray selected sources, the bolometric luminosity is typically estimated from the luminosity in the 2–10 keV band (k_{bol,2-10 keV} = L_bol/L_{2-10 keV}). In previous works, several authors used the bolometric correction obtained by Elvis et al. (1994) for luminous, mostly nearby quasars, i.e., k_{bol,2-10 keV} ≈ 30. However, these corrections could be affected by the following uncertainties: firstly, they are average corrections obtained from a few dozen bright quasars; secondly, as discussed in Marconi et al.
(2004), these corrections could overestimate the bolometric luminosities since they are based on the integral of the observed SEDs of bright unobscured AGN, without removing the IR bump (hence counting twice a fraction of ≈ 30% of the intrinsic optical–UV radiation). At lower luminosities (typical of Seyfert galaxies, i.e., 10^42–10^44 erg s^-1), a lower value for this correction (k_{bol,2-10 keV} ≈ 10) was suggested (e.g., Fabian 2004). For heavily obscured luminous sources, only a few objects have been studied in detail; in particular, for two SWIRE Compton-thick (i.e., log N_H ≳ 24 cm^-2) AGN Polletta et al. (2006) found k_{bol,0.3-8 keV} ≈ 3 and ≈ 100.

In this work, thanks to the multi-band observations and the effort made in disentangling the AGN and the host components, we try to derive the nuclear bolometric luminosity of our sources directly, without assuming any average correction. We estimate L_bol by adding the X-ray luminosity integrated over the entire X-ray range (L_{0.5-500 keV}, not corrected for absorption) to the IR luminosity (L_{1-1000 µm}).

L_{0.5-500 keV} has been estimated from the observed L_{2-10 keV} luminosity assuming a single power-law spectrum with Γ = 1.9 (typical for AGN emission; see, e.g., Fig. 6 of Vignali et al. 2005 and references therein) plus absorption (where the column densities are taken from the X-ray spectral analysis; see Perola et al. 2004 and Lanzuisi et al., in preparation) and an exponential cut-off at 200 keV. The median value found for the ratio L_{0.5-500 keV}/L_{2-10 keV} is ≈ 4. The IR luminosity has been estimated by integrating the SED from 1 µm to 1000 µm using only the nuclear component for the AGN hosted in the elliptical galaxies (see §4.1), and the Polletta et al. (2006) template for the point-like sources.

Before computing the bolometric output of our sources, the derived IR luminosities must be properly corrected to account for the geometry of the torus and its orientation.
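As a rough cross-check of the band extrapolation described above, the ratio L_{0.5-500 keV}/L_{2-10 keV} can be evaluated numerically for a cut-off power law. The sketch below is not the procedure used for the analysis: it assumes a photon spectrum N(E) ∝ E^-Γ exp(−E/E_c) with Γ = 1.9 and E_c = 200 keV, and it deliberately ignores absorption, so it only gives the intrinsic (unabsorbed) ratio. The function names are hypothetical.

```python
import numpy as np


def band_luminosity(e_lo_kev, e_hi_kev, gamma=1.9, e_cut_kev=200.0, n=20001):
    """Relative luminosity in [e_lo, e_hi] for a photon spectrum
    N(E) proportional to E**(-gamma) * exp(-E / e_cut).
    Absorption is NOT included (simplifying assumption)."""
    energy = np.logspace(np.log10(e_lo_kev), np.log10(e_hi_kev), n)
    # The energy flux density is E * N(E).
    density = energy ** (1.0 - gamma) * np.exp(-energy / e_cut_kev)
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(energy)))


def band_ratio(gamma=1.9, e_cut_kev=200.0):
    """Ratio L(0.5-500 keV) / L(2-10 keV) for the assumed spectrum."""
    wide = band_luminosity(0.5, 500.0, gamma, e_cut_kev)
    narrow = band_luminosity(2.0, 10.0, gamma, e_cut_kev)
    return wide / narrow


ratio_with_cutoff = band_ratio()
ratio_pure_power_law = band_ratio(e_cut_kev=np.inf)
print(f"with 200 keV cut-off: {ratio_with_cutoff:.2f}")
print(f"pure power law:       {ratio_pure_power_law:.2f}")
```

Because the observed 2–10 keV luminosity in the text is not corrected for absorption, the ratios derived for the actual sources would be expected to differ somewhat from this intrinsic value, depending on N_H.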
The first correction is related to the covering factor f (which represents the fraction of the primary optical–UV radiation intercepted by the torus), while the second correction is due to the anisotropy of the IR emission, which is a function of the viewing angle (see Pier & Krolik 1993 and Granato & Danese 1994 for further details). We estimated the first correction (≈ 1.5) from the ratio of obscured (Compton thin + Compton thick) to unobscured quasars as required by the most recent X-ray background synthesis model (see Gilli et al. 2007) in the luminosity range of our sources. A correction of ≈ 1.5 implies an average covering factor f ≈ 0.67 which, in a simple torus geometry, corresponds to an angle θ ≈ 48° between the perpendicular to the equatorial plane and the edge of the torus.

A first-order estimate of the anisotropy factor has been computed from the Silva et al. (2004) templates as the ratio (R) of the luminosity of a face-on to that of an edge-on AGN, whose obscuration is parametrized as a function of N_H. The integration has been performed in the 1–30 µm range, after normalizing the two SEDs to the same luminosity in the 30–100 µm range, where the anisotropy is thought to be negligible. The derived anisotropy factors are large only for the Silva et al. (2004) template with the highest column density (R ≈ 3–4 for N_H = 10^24.5 cm^-2); since all of our targets are characterized by lower obscuration, such corrections do not affect our IR luminosities significantly (R ≈ 1.2–1.3 for N_H ≈ 10^22.0–10^23.4 cm^-2).

In conclusion, the final combined corrections to be applied to the observed IR luminosities of our sources, given their column densities, are in the range ≈ 1.8–2.0. After adding the X-ray luminosities, the IR correction factors translate into a mean correction factor of ≈ 1.7 in the computation of the bolometric luminosities.
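The numbers quoted above can be reproduced with a short calculation. The sketch assumes two things that the text does not spell out: that in the "simple torus geometry" the covering factor equals cos θ, with θ measured from the polar axis (the fraction of the sky left open by two polar cones of half-opening angle θ is 1 − cos θ), and that the combined correction is the product of the covering-factor correction and the anisotropy factor R.

```python
import math

# Values quoted in the text.
covering_correction = 1.5          # from obscured/unobscured quasar ratio
anisotropy_range = (1.2, 1.3)      # R for N_H of about 10^22.0-10^23.4 cm^-2

# Assumption: the correction is the inverse of the covering factor f.
covering_factor = 1.0 / covering_correction

# Assumption: f = cos(theta), with theta measured from the polar axis.
theta_deg = math.degrees(math.acos(covering_factor))

# Assumption: the two corrections combine multiplicatively.
combined_range = tuple(covering_correction * r for r in anisotropy_range)

print(f"f = {covering_factor:.2f}, theta = {theta_deg:.1f} deg")
print(f"combined IR correction: {combined_range[0]:.2f}-{combined_range[1]:.2f}")
```

Under these assumptions the angle comes out close to 48° and the combined factor falls in the ≈ 1.8–2.0 range quoted above.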
In Table 3 the derived bolometric luminosities are reported along with the full range (L_{0.5-500 keV}) of X-ray luminosities, the IR (L_{1-1000 µm}) luminosities and the bolometric corrections (k_{bol,2-10 keV}).

We note that our L_{1-1000 µm} estimates (hence L_bol) are robust against the choice of SED. By comparing the L_{1-1000 µm} obtained using the Silva et al. (2004) model with the L_{1-1000 µm} obtained adopting other recent average quasar SEDs (i.e., Richards et al. 2006), we have verified that the uncertainties in L_{1-1000 µm} are within the ≈ 10% level. The bolometric output of our targets is dominated by the IR reprocessed emission, the primary X-ray radiation (L_{0.5-500 keV}) accounting for only ≲ 15% of the total luminosity.

The derived median (mean) value of k_{bol,2-10 keV} is ≈ 25 (35±9; see Fig. 4 and column 8 of Table 3), consistent with the value k_{bol,2-10 keV} ≈ 30 from Elvis et al. (1994) widely adopted in past works. However, as pointed out also by Elvis et al. (1994), the bolometric corrections span a wide range of values (≈ 12–100); as a consequence, the adoption of a mean value could lead to inaccurate results. We note that in the extreme case where no corrections for the covering factor and the anisotropy of the torus are applied, we would obtain a median k_{bol,2-10 keV} of ≈ 16.

In Fig. 4 the derived L_bol is shown as a function of L_{2-10 keV}; the dot-dashed line joins the expected values from the analysis of Marconi et al. (2004), where k_{bol,2-10 keV} is derived by constructing an AGN reference template taking into account how the spectral index α_ox (Zamorani et al. 1981) varies as a function of the luminosity (Vignali et al. 2003). Although the bolometric luminosities estimated for our objects are on average lower than those expected on the basis of the Marconi et al. (2004) relation, they are nevertheless consistent with a trend of higher k_{bol,2-10 keV} for objects with higher X-ray luminosity.
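The median and mean quoted above can be checked directly against the L_bol/L_{2-10 keV} column of Table 3. The snippet below simply recomputes the summary statistics from the eight tabulated values; interpreting the quoted ±9 as a standard error of the mean is an assumption, since the text does not state which uncertainty estimator was used.

```python
import math
import statistics

# L_bol / L_(2-10 keV) values as listed in Table 3 (column 8), in table order.
k_bol = [34.7, 20.6, 13.4, 23.0, 12.3, 97.0, 47.9, 28.2]

median_k = statistics.median(k_bol)
mean_k = statistics.mean(k_bol)

# Assumption: standard error of the mean from the sample standard deviation.
std_err_k = statistics.stdev(k_bol) / math.sqrt(len(k_bol))

print(f"median = {median_k:.1f}")
print(f"mean   = {mean_k:.1f} +/- {std_err_k:.1f}")
print(f"range  = {min(k_bol):.1f}-{max(k_bol):.1f}")
```

The wide range of individual values relative to the mean illustrates why adopting a single average correction can be inaccurate for a given object.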
If we fit our objects in the L_bol − L_{2-10 keV} plane with the same slope as the Marconi et al. (2004) relation, the difference in normalization is ≈ 50%.

Table 3. Inferred rest-frame properties of our targets

Source Id.       z^a                  N_H^b        L_{2-10 keV}^c  L_{0.5-500 keV}^d  L_{1-1000 µm}^e  L_bol           L_bol/L_{2-10 keV}  L_K^f        M_star     M_BH^g    L_bol/L_Edd^h
                                      10^22 cm^-2  10^44 erg s^-1  10^45 erg s^-1     10^45 erg s^-1   10^45 erg s^-1                      10^11 L_K,⊙  10^11 M_⊙  10^9 M_⊙
Abell 2690#75    1.30 (+0.30, −0.20)  6.9          3.2             1.3                9.7              11.0            34.7                5.2          3.5        1.3       0.065
PKS 0312-77#36   0.90 (+0.05, −0.15)  1.0          0.7             0.2                1.2              1.5             20.6                1.0          0.8        0.2       0.058
PKS 0537-28#91   1.30 (+0.40, −0.70)  25.8         5.3             2.6                4.4              7.1             13.4                2.8          1.5        0.7       0.084
PKS 0537-28#54   >1.30                1.6          2.0             0.6                3.9              4.6             23.0                3.0          2.0        0.7       0.049
PKS 0537-28#111  1.20 (+0.20, −0.10)  9.1          1.7             0.7                1.4              2.1             12.3                7.8          6.2        2.1       0.008
Abell 2690#29    2.08                 2.1          8.4             2.8                78.4             81.2            97.0                4.17         –          –         –
PKS 0312-77#45   1.85 (+0.20, −0.30)  8.0          6.2             2.6                27.2             29.8            47.9                1.65         –          –         –
BPM 16274#69     1.35                 2.5          2.4             0.8                5.9              6.7             28.2                8.8          6.0        2.5       0.022

a Photometric redshifts as derived from the analysis presented in this paper (see §4.1). For source PKS 0537-28#54, only a minimum redshift was estimated; for sources Abell 2690#29 and BPM 16274#69, the spectroscopic redshifts measured by Maiolino et al. (2006) are reported.
b The column densities, measured through X-ray spectral fitting (see Perola et al. 2004 and Lanzuisi et al., in preparation, for details), were "matched" to the redshift used in the SED best-fitting procedure (using the relation N_H(z) = N_H(z = 0) × (1 + z)^2.6).
c Absorption-corrected X-ray luminosity.
d The 0.5–500 keV luminosities have been derived from the observed 2–10 keV luminosities as described in the text. The luminosities are not corrected for absorption.
e The 1–1000 µm luminosities have been derived from the integral of the nuclear SEDs (including the corrections described in the text). The values reported for the point-like sources refer to the Polletta et al. (2006) red quasar template (see §4.2 for details).
f The K_s-band luminosities refer to the nuclear component for the point-like sources (Abell 2690#29 and PKS 0312−77#45) and to the host-galaxy starlight for the AGN hosted in the elliptical galaxies.
g,h The reported M_BH and L_bol/L_Edd have been computed from the local L_K − M_BH relation (Marconi & Hunt 2003), under the hypothesis of an evolution of the M_BH/M_star ratio of a factor of two with redshift (see §5.2 for details).

Fig. 4. Bolometric luminosity vs. absorption-corrected 2–10 keV luminosity for the six AGN hosted in the elliptical galaxies (circles) and the two point-like AGN (triangles). The filled symbols refer to the values corrected for the covering factor and torus anisotropy, while the empty symbols refer to the bolometric luminosity without applying these corrections. The dot-dashed line represents the correlation from Marconi et al. (2004). The three dashed lines represent the loci of k_{bol,X} = 10, 30 and 100.

5.2. Galaxy and black hole masses, and black hole Eddington ratios

For the AGN hosted in elliptical galaxies we are able to infer both the galaxy and the black hole masses. The galaxy masses are estimated, assuming a Salpeter (1955) initial mass function (IMF), from the K_s luminosities, taking into account that M_star/L_K for an old stellar population can vary from ≈ 0.5 to ≈ 0.9 (for ages between 3 and 6 Gyr; Bruzual & Charlot 2003). We can derive M_star directly from L_K since for these sources the K_s-band emission is dominated by the galaxy starlight. The rest-frame L_K have been derived using the appropriate SED templates (see §4.1). The inferred stellar masses are in the range (0.8–6.2)×10^11 M_⊙, implying that our obscured AGN are hosted by massive elliptical galaxies at high redshifts. In Table 3 both the L_K and M_star values are reported; we note that assuming the Chabrier (2003) IMF instead would produce masses lower by a factor of ≈ 1.7 (di Serego Alighieri et al. 2005).

Fig. 5. Black hole mass (M_BH) vs. bolometric luminosity (L_bol) for the AGN hosted in the elliptical galaxies. The Eddington luminosity for a given M_BH is reported on the right-hand axis. The black hole masses have been estimated from the local L_K − M_BH relation (Marconi & Hunt 2003) under two different hypotheses (see §5.2): (1) evolution by a factor of two of the M_BH/M_star ratio with redshift in comparison to the local values (black filled circles); (2) no evolution of the M_BH/M_star ratio (empty circles). The three dashed lines represent the loci of L_bol/L_Edd = 0.01, 0.033 and 0.1 (from left to right).

To estimate the black hole masses, we take advantage of the local M_BH − L_K relation (Marconi & Hunt 2003) which, taking into account the M_star/L_K values, is an expression of the intrinsic M_BH − M_star relation. Given how challenging measurements of high-redshift black hole masses are, the behaviour of this relation with redshift is still a matter of debate, and different authors, using different techniques, have found different results.

Woo et al. (2006) and Peng et al. (2006) derive a significant evolution of the M_BH − M_star relation with redshift, with the M_BH/M_star ratio at high redshift larger than the local value by up to a factor of ≈ 4. In the Woo et al. (2006) analysis the discrepancy with respect to the local value is already present at z = 0.36, while Peng et al. (2006) find an average M_BH/M_star a factor of ≳ 4 larger than the local value at z > 1.7; at lower redshifts (1 ≲ z ≲ 1.7) they derive a ratio which is at most two times higher than the local value, and may be consistent with marginal or no evolution. On the other hand, Shields et al. (2006) and Hopkins et al.
(2006) suggest that the M_BH/M_star ratio is not significantly higher (at most a factor of two) than that measured locally up to z ≲ 2.

Given these uncertainties about the evolution of the M_BH − M_star relation with redshift, we have estimated the black hole masses for our objects under two different hypotheses: (1) the M_BH/M_star ratio is higher than locally by a factor of two in the redshift range (0.9 ≲ z ≲ 1.4) of our sources; (2) the M_BH/M_star ratio does not evolve with redshift. In both cases, our results imply very massive black holes (see Fig. 5), in the range ≈ 2.0×10^8–2.5×10^9 M_⊙ under the former hypothesis, and a factor of two lower under the latter. Our estimated M_BH are consistent with the results derived by McLure & Dunlop (2004), who studied a large sample of Type 1 SDSS quasars and derived the black hole masses from virial methods; most of their black hole masses are in the range 1.5×10^8–2.5×10^9 M_⊙ in the redshift interval of our sample (see Fig. 1 of McLure & Dunlop 2004).

From the comparison of the bolometric luminosities computed in the previous section (see Table 3) with the Eddington luminosities calculated from the black hole masses estimated above, we derive that our obscured AGN are radiating at a relatively low fraction of their Eddington luminosity (λ ≈ 0.008–0.084 and λ ≈ 0.015–0.170 under the two hypotheses; see Fig. 5 and Table 3). This finding confirms and extends to a larger sample the results found by Maiolino et al. (2006) for two sources of our sample and by Brusa et al. (2005) for a sample of EROs in the "Daddi Field". As suggested by Maiolino et al. (2006), the data indicate that our very massive black holes may have already passed their rapidly accreting phase and are reaching their final masses at low accretion rates.

Fig. 6. Bolometric luminosity as a fraction of the Eddington luminosity vs. redshift for the whole sample of SDSS quasars of McLure & Dunlop (2004), plotted as small crosses. The large circles indicate the six HELLAS2XMM AGN hosted in elliptical galaxies (symbols as in Fig. 5).

The estimated radiating efficiencies are significantly lower than the average L_bol/L_Edd ≈ 0.4 inferred by Marconi et al. (2004). However, since in the Marconi et al. (2004) model only the phases of significant black hole growth are considered, our results are not in contrast with the proposed model but suggest that our targets belong to the tail of the sources characterized by low accretion rates. Consistently, our data (black filled and empty circles representing the evolution and no-evolution hypotheses, respectively) lie in the lower envelope of the Eddington ratio distribution found by McLure & Dunlop (2004) for their large SDSS quasar sample. This is shown in Fig. 6, where our data are overlaid on the SDSS data points. This suggests that the SDSS quasar survey and the HELLAS survey probe different regimes of AGN activity: the SDSS samples the brightest sources in the sky (R ≲ 20), most likely characterized by a high accretion rate, while our targets (X-ray selected, optically faint, i.e., R > 24, and obscured) are associated with a different evolutionary phase. We argue that the SMBHs in our targets have already reached their final masses and that the observed emission is witnessing a late stage of the accretion activity.

6. Conclusions

We have performed with Spitzer a pilot program to study a sample of eight Type 2 (i.e., luminous and obscured) quasars at high redshift, selected from the HELLAS2XMM survey. Three sources have a measured spectroscopic redshift (two secure and one tentative) from near-IR spectroscopy; the remaining objects have an estimated minimum redshift obtained from the R − K colours.
On the basis of their K_s-band morphological properties, the sample is divided into two classes: sources with radial profiles typical of elliptical galaxies and point-like objects. The most important results can be summarized as follows:
• All eight sources have been clearly detected in both the IRAC and MIPS 24 µm bands.
• The Spitzer observations have allowed us to detect the nuclear component (often hidden at short wavelengths by the host galaxy) as thermal IR re-processed emission from the circumnuclear torus. While for the two point-like sources the nuclear component dominates at all frequencies, for the six sources with elliptical-like radial profiles the contribution from the strong stellar continuum is dominant up to the first IRAC bands, but the torus emission accounts for the entire emission at 24 µm.
• Taking advantage of the new Spitzer data, the nuclear SEDs of the sources have been modeled and new photometric redshifts have been estimated, following two approaches. For the elliptical sources, the nuclear emission has been "cleaned" of the host galaxy contribution by adopting a two-component model (galaxy plus nuclear component), constrained using all the available observed data sets. For the point-like sources, the SEDs appear inconsistent with an extinguished Type 1 quasar template and are instead well reproduced by an empirical SED of red quasars (Polletta et al. 2006). We find an overall agreement between the SED templates and the data points, and the derived photometric redshifts are consistent with the spectroscopic ones for two sources.
• Using the model components to extrapolate the nuclear SEDs to the far-IR regime, we derived the bolometric luminosities (in the range ≈ 10^45–10^47 erg s^-1) by adding the IR luminosities to the full range of X-ray luminosities.
In this computation, we have considered and discussed the corrections to be applied to the observed IR luminosities to take into account the covering factor of the torus and the anisotropy of the IR emission. The median 2–10 keV bolometric correction is ≈ 25, consistent with the value typically assumed in the literature.
• For the elliptical sources, thanks to the independent estimates of the stellar light and nuclear bolometric luminosity, the physical parameters of the central black holes have been estimated using the M_BH − L_K relation and exploring different hypotheses for the evolution of the M_BH/M_star ratio with redshift. Under the hypothesis that the M_BH/M_star ratio is a factor of two higher at z ≈ 1.2 than locally, our luminous, obscured AGN have masses in the range (0.2–2.5)×10^9 M_⊙, reside in massive [(0.8–6.2)×10^11 M_⊙] high-redshift ellipticals and are characterized by low Eddington ratios (λ ≈ 0.008–0.084). Through our direct estimate of the IR luminosity, we confirm the conclusion of Maiolino et al. (2006) that these black holes may have already passed their rapid accretion phase.

Acknowledgements. The authors acknowledge partial support by the Italian Space Agency under the contract ASI–INAF I/023/05/0. The authors thank R. Gilli, M. Polletta and L. Silva for useful discussions, and R. J. McLure for kindly providing us with the data points of Fig. 6. We thank the anonymous referee for the useful comments.

References

Antonucci, R. 1993, ARA&A, 31, 473
Baldi, A., Molendi, S., Comastri, A., et al. 2002, ApJ, 564, 190
Barger, A. J., Cowie, L. L., Mushotzky, R. F., et al. 2005, AJ, 129, 578
Brusa, M., Comastri, A., Daddi, E., et al. 2005, A&A, 432, 69
Bruzual, G. & Charlot, S. 2003, MNRAS, 344, 1000
Calzetti, D. 1997, in American Institute of Physics Conference Series, ed. W. H. Waller, 403
Chabrier, G. 2003, PASP, 115, 763
Cocchia, F., Fiore, F., Vignali, C., et al. 2007, A&A, in press, astro-ph/0612023
Comastri, A. & Fiore, F. 2004, Ap&SS, 294, 63
Comastri, A., Setti, G., Zamorani, G., & Hasinger, G. 1995, A&A, 296, 1
di Serego Alighieri, S., Vernet, J., Cimatti, A., et al. 2005, A&A, 442, 125
Elvis, M., Wilkes, B. J., McDowell, J. C., et al. 1994, ApJS, 95,
Fabian, A. C. 2004, in Coevolution of Black Holes and Galaxies, ed. L. C. Ho, 446
Fadda, D., Flores, H., Hasinger, G., et al. 2002, A&A, 383, 838
Fadda, D., Marleau, F. R., Storrie-Lombardi, L. J., et al. 2006, AJ, 131, 2859
Fazio, G. G., Hora, J. L., Allen, L. E., et al. 2004, ApJS, 154,
Fiore, F., Brusa, M., Cocchia, F., et al. 2003, A&A, 409, 79
Franceschini, A., Manners, J., Polletta, M. d. C., et al. 2005, AJ, 129, 2074
Gilli, R., Comastri, A., & Hasinger, G. 2007, A&A, 463, 79
Granato, G. L. & Danese, L. 1994, MNRAS, 268, 235
Granato, G. L., Danese, L., & Franceschini, A. 1997, ApJ, 486,
Gregg, M. D., Lacy, M., White, R. L., et al. 2002, ApJ, 564,
Hickox, R. C. & Markevitch, M. 2006, ApJ, 645, 95
Hopkins, P. F., Hernquist, L., Cox, T. J., et al. 2006, ApJS, 163,
Lonsdale, C., Polletta, M. d. C., Surace, J., et al. 2004, ApJS, 154, 54
Maiolino, R., Mignoli, M., Pozzetti, L., et al. 2006, A&A, 445,
Makovoz, D. & Marleau, F. R. 2005, PASP, 117, 1113
Marconi, A. & Hunt, L. K. 2003, ApJ, 589, L21
Marconi, A., Risaliti, G., Gilli, R., et al. 2004, MNRAS, 351,
McLure, R. J. & Dunlop, J. S. 2004, MNRAS, 352, 1390
Mignoli, M., Pozzetti, L., Comastri, A., et al. 2004, A&A, 418,
Pei, Y. C. 1992, ApJ, 395, 130
Peng, C. Y., Impey, C. D., Rix, H.-W., et al. 2006, ApJ, 649,
Perola, G. C., Puccetti, S., Fiore, F., et al. 2004, A&A, 421, 491
Pier, E. A. & Krolik, J. H. 1993, ApJ, 418, 673
Polletta, M. d. C., Wilkes, B. J., Siana, B., et al. 2006, ApJ, 642,
Richards, G. T., Hall, P. B., Vanden Berk, D. E., et al. 2003, AJ, 126, 1131
Richards, G. T., Lacy, M., Storrie-Lombardi, L. J., et al. 2006, ApJS, 166, 470
Rieke, G. H., Young, E. T., Engelbracht, C. W., et al. 2004, ApJS, 154, 25
Rigby, J. R., Rieke, G. H., Maiolino, R., et al. 2004, ApJS, 154,
Salpeter, E. E. 1955, ApJ, 121, 161
Schmidt, M. & Green, R. F. 1983, ApJ, 269, 352
Shields, G. A., Salviander, S., & Bonning, E. W. 2006, New Astronomy Review, 50, 809
Silva, L., Maiolino, R., & Granato, G. L. 2004, MNRAS, 355,
Soltan, A. 1982, MNRAS, 200, 115
Spergel, D. N., Verde, L., Peiris, H. V., et al. 2003, ApJS, 148,
Surace, J. A., Shupe, D. L., Fang, F., et al. 2005, technical report, The SWIRE Data Release 2. Available at http://swire.ipac.caltech.edu/swire/astronomers/publications/SWIRE2_doc_083105.pdf
Vignali, C., Brandt, W. N., & Schneider, D. P. 2003, AJ, 125,
Vignali, C., Brandt, W. N., Schneider, D. P., & Kaspi, S. 2005, AJ, 129, 2519
Werner, M. W., Roellig, T. L., Low, F. J., et al. 2004, ApJS, 154, 1
Woo, J.-H., Treu, T., Malkan, M. A., & Blandford, R. D. 2006, ApJ, 645, 900
Zamorani, G., Henry, J. P., Maccacaro, T., et al. 1981, ApJ, 245, 357

Introduction
Sample selection
Spitzer observations and data reduction
Analysis of the spectral energy distributions
Elliptical-like sources
Point-like sources
Physical parameters
Bolometric correction
Galaxy and black hole masses, and black hole Eddington ratios
Conclusions

ABSTRACT
Aims: We aim to estimate the spectral energy distributions (SEDs) and the physical parameters related to the black holes harbored in eight high X-ray-to-optical (F_X/F_R>10) obscured quasars at z>0.9 selected in the 2–10 keV band from the HELLAS2XMM survey.
Methods: We use IRAC and MIPS 24 micron observations, along with optical and K_s-band photometry, to obtain the SEDs of the sources. The observed SEDs are modeled using a combination of an elliptical template and torus emission (using the phenomenological templates of Silva et al. 2004) for six sources associated with passive galaxies; for two point-like sources, the empirical SEDs of red quasars are adopted.
The bolometric luminosities and the M_BH-L_K relation are used to provide an estimate of the masses and Eddington ratios of the black holes residing in these AGN.
Results: All of our sources are detected in the IRAC and MIPS (at 24 micron) bands. The SED modeling described above is in good agreement with the observed near- and mid-infrared data. The derived bolometric luminosities are in the range ~10^45-10^47 erg s^-1, and the median 2–10 keV bolometric correction is ~25, consistent with the widely adopted value derived by Elvis et al. (1994). For the objects with elliptical-like profiles in the K_s band, we derive high stellar masses, (0.8-6.2)×10^11 M_⊙, black hole masses in the range (0.2-2.5)×10^9 M_⊙, and Eddington ratios L/L_Edd<0.1, suggesting a low-accretion phase.
<|endoftext|><|startoftext|>
Introduction

1.1. p-Adic wavelets and pseudo-differential operators. According to the well-known Ostrovsky theorem, any nontrivial valuation on the field Q is equivalent either to the real valuation | · | or to one of the p-adic valuations | · |_p. We recall that the field Q_p of p-adic numbers is defined as the completion of the field of rational numbers Q with respect to the non-Archimedean p-adic norm | · |_p. This norm is defined as follows: if an arbitrary rational number x ≠ 0 is represented as x = p^γ m/n, where γ = γ(x) ∈ Z, and m and n are not divisible by p, then

(1.1) |x|_p = p^-γ, x ≠ 0, |0|_p = 0.

This norm in Q_p satisfies the strong triangle inequality |x + y|_p ≤ max(|x|_p, |y|_p).

Thus there are two universes with equal rights: the real one and the p-adic one. The latter has specific and unusual properties. Nevertheless, there are a lot of papers where different applications of p-adic analysis to physical problems, stochastics, cognitive sciences and psychology are studied [6]–[10], [13]–[19], [34]–[36] (see also the references therein). In view of the Ostrovsky theorem such investigations are not only of great interest in
In view of the Ostrovsky theorem such investigations not only have great interest in Date: 2000 Mathematics Subject Classification. Primary 11F85, 42C40; Secondary 46F10. Key words and phrases. p-adic multiresolution analysis, p-adic compactly supported wavelets. The first author (V. S.) was also supported in part by DFG Project 436 RUS 113/809 and Grant 05-01-04002-NNIOa of Russian Foundation for Basic Research. http://arxiv.org/abs/0704.0736v1 2 V. M. SHELKOVICH AND M. SKOPINA itself, but lead to applications and better understanding of similar problems in usual mathematical physics. We recall that there exists a p-adic analysis connected with the mapping Qp into Qp and an analysis connected with the mapping Qp into the field of com- plex numbers C, there exist two types of p-adic physics models. For the p-adic analysis related to the mapping Qp → C the operation of partial differentia- tion is not defined , and as a result, large number of models connected with p-adic differential equations use pseudo-differential operators and the theory of p-adic distributions (generalized functions) (see the above mentioned pa- pers and books). In particular, fractional operators Dα are extensively used in applications (see fore-quoted papers and especially [34]). It is well known that the theory of p-adic pseudo-differential operators (in particular, fractional operators) and equations closely related to wavelet type bases. It is typical that p-adic compactly supported wavelets are eigenfunc- tions of p-adic pseudo-differential operators [3]– [5], [16], [17], [18], [20] – [22]. Thus the wavelet theory plays a key role in application of p-adic analysis and gives a new powerful technique for solving p-adic problems. This theory starts development only in resent years and has many open problems. In [20], S. V. Kozyrev constructed the orthonormal compactly supported p-adic wavelet basis (1.2) in L2(Qp): (1.2) θγja(x) = p −γ/2χp p−1j(pγx− a) |pγx− a|p , x ∈ Qp, j ∈ Jp = {1, 2, . 
. . , p−1}, γ ∈ Z, a ∈ Ip = Qp/Zp. Kozyrev’s wavelets (1.2) are eigenfunctions of the Vladimirov fractional operator [34, IX]. Further develop- ment and generalization of the theory of such type wavelets can be found in the papers by S. V. Kozyrev [21], [22], A. Yu. Khrennikov, and S. V. Kozyrev [16], [17], J. J. Benedetto, and R. L. Benedetto [8], and R. L. Benedetto [9]. In [3], the multidimensional p-adic wavelets generated by direct product of the Kozyrev one-dimensional wavelets were introduced. In [18], a new type of p-adic multidimensional wavelet basis was introduced: θ(m)γsa (x) = p −γ/2χp s(pγx− a) |pγx− a|p , x ∈ Qp, where s ∈ Jp;m, γ ∈ Z, a ∈ Ip. Here Jp;m = {s = p−m s0 + s1p + · · · + sm−1p : sj = 0, 1, . . . , p − 1; j = 0, 1, . . . , m − 1; s0 6= 0}, m ≥ 1 is a fixed positive integer. The multidimensional wavelets from [3] are a par- ticular case of the last wavelets. Moreover, in [3], [18], there were derived the necessary and sufficient conditions for a class of multidimensional p-adic pseudo-differential operators (including fractional operator) to have such mul- tidimensional wavelets as eigenfunctions. It remains to point out that for pseudo-differential operators from [3], [18] a “natural” definition domain is the Lizorkin spaces of distributions Φ′(Qnp ), introduced in [3]. The space Φ′(Qnp ) is invariant under the mentioned above pseudo-differential operators. Moreover, the above mentioned p-adic wavelets belong to the Lizorkin space Φ(Qnp ) of test functions. Recall that the usual p-ADIC HAAR MULTIRESOLUTION ANALYSIS 3 Lizorkin spaces were studied in the excellent papers of P. I. Lizorkin [24], [25] (see also [29], [30]). It’s interesting to compare appearing first wavelets in p-adic analysis with the history of the wavelet theory in real analysis. In 1910 Haar [12] constructed an orthogonal basis for L2(R) consisting of the dyadic shifts and scales of one piecewise constant function. 
Many mathematicians actively studied the Haar basis, and various generalizations were introduced, but for almost a whole century nobody could find another wavelet function (a function whose shifts and scales form an orthogonal basis). Only in the early nineties did a method for constructing wavelet functions appear. This method is based on the notion of multiresolution analysis (MRA in the sequel) introduced by Y. Meyer and S. Mallat [28], [26], [27]. Smooth compactly supported wavelet functions were found in this way, which has been very important for some engineering applications.

In this paper we introduce an MRA in $L^2(\mathbb{Q}_p)$ and present a concrete MRA for $p = 2$ that is an analog of the Haar MRA in $L^2(\mathbb{R})$. The same scheme as in the real setting leads to a Haar basis. It turns out that this Haar basis coincides with Kozyrev's wavelet system. However, the 2-adic Haar MRA is not an identical copy of its real analog. In contrast to the Haar MRA in $L^2(\mathbb{R})$, we prove that there exist infinitely many different Haar orthogonal bases in $L^2(\mathbb{Q}_2)$ generated by the same MRA.

1.2. Contents of the paper. In Sec. 2, we recall some facts from the p-adic theory of distributions [11], [32], [33], [34]. In Sec. 3, some facts from the theory of the p-adic Lizorkin spaces [3] are recalled. In Sec. 4, by Definition 4.1 we introduce the MRA adapted to the p-adic case. In Subsec. 4.2, we introduce the refinement equation
$$\phi(x) = \sum_{r=0}^{p-1}\phi\Big(\frac{1}{p}x - \frac{r}{p}\Big), \quad x \in \mathbb{Q}_p, \tag{4.7}$$
whose solution $\phi(x) = \Omega(|x|_p)$ is the characteristic function of the unit disc, where $\Omega(t)$ is the characteristic function of the interval $[0, 1]$. The idea of using the above equation as the refinement equation was proposed in [18]. The above refinement equation is natural and reflects the fact that the characteristic function $\Omega(|x|_p)$ of the unit disc $B_0$ is represented as a sum of $p$ characteristic functions of the disjoint discs $B_{-1}(r)$, $r = 0, 1, \dots, p-1$ (see (2.7)). In Subsec. 4.3, the 2-adic MRA is constructed.
Namely, we prove that the MRA is generated by a refinable function which is the characteristic function $\phi(x) = \Omega(|x|_2)$ of the unit disc $B_0 = \{x : |x|_2 \leq 1\} \subset \mathbb{Q}_2$ and satisfies the refinement equation
$$\phi(x) = \phi\Big(\frac{x}{2}\Big) + \phi\Big(\frac{x}{2} - \frac{1}{2}\Big), \quad x \in \mathbb{Q}_2. \tag{4.8}$$
Using our MRA we construct the 2-adic orthonormal wavelet basis (4.15) in $L^2(\mathbb{Q}_2)$, which is the Kozyrev basis (1.2) for the case $p = 2$. It turns out that the Kozyrev wavelet basis is not the only orthonormal wavelet basis. In Sec. 5, infinitely many different 2-adic orthonormal wavelet bases in $L^2(\mathbb{Q}_2)$ are constructed. Namely, using Theorem 5.1, we construct wavelet functions $\psi^{(s)}(x)$, $s \in \mathbb{N}$, whose dilatations and shifts form 2-adic orthonormal wavelet bases in $L^2(\mathbb{Q}_2)$.

Since many p-adic models use pseudo-differential operators, in particular the fractional operator, these results on p-adic wavelets can be used intensively in applications. Moreover, p-adic wavelets can be used to construct solutions of linear and semi-linear pseudo-differential equations [5], [23].

2. p-Adic distributions

We recall some facts from the theory of p-adic distributions (generalized functions). Here and in what follows, we shall systematically use the notation and results from [34] and [11, Ch.II]. Let $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{C}$ be the sets of positive integers, integers, and complex numbers, respectively, and $\mathbb{N}_0 = \{0\} \cup \mathbb{N}$. Denote by $\mathbb{Q}_p^* = \mathbb{Q}_p \setminus \{0\}$ the multiplicative group of the field $\mathbb{Q}_p$.

The canonical form of a p-adic number $x \neq 0$ is
$$x = p^{\gamma}(x_0 + x_1 p + x_2 p^2 + \cdots), \tag{2.1}$$
where $\gamma = \gamma(x) \in \mathbb{Z}$, $x_j = 0, 1, \dots, p-1$, $x_0 \neq 0$, $j = 0, 1, \dots$. The series is convergent in the p-adic norm (1.1), and one has $|x|_p = p^{-\gamma}$.

By means of representation (2.1), the fractional part $\{x\}_p$ of a number $x \in \mathbb{Q}_p$ is defined as follows:
$$\{x\}_p = \begin{cases} 0, & \text{if } \gamma(x) \geq 0 \text{ or } x = 0,\\ p^{\gamma}(x_0 + x_1 p + x_2 p^2 + \cdots + x_{|\gamma|-1}p^{|\gamma|-1}), & \text{if } \gamma(x) < 0. \end{cases} \tag{2.2}$$
The function
$$\chi_p(\xi x) = e^{2\pi i\{\xi x\}_p} \tag{2.3}$$
for every fixed $\xi \in \mathbb{Q}_p$ is an additive character of the field $\mathbb{Q}_p$.
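The definitions (2.1)–(2.3) can be made concrete on rational numbers, which form a dense subset of $\mathbb{Q}_p$. The following is a minimal sketch under that restriction; the function names `valuation`, `norm_p`, `frac_part` and `chi` are hypothetical helpers introduced here for illustration and are not part of the paper.

```python
from fractions import Fraction
import cmath
import math


def valuation(x: Fraction, p: int) -> int:
    """Return gamma(x) from (2.1), i.e. x = p**gamma * (p-adic unit); x must be non-zero."""
    if x == 0:
        raise ValueError("valuation of 0 is undefined")
    num, den, gamma = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        gamma += 1
    while den % p == 0:
        den //= p
        gamma -= 1
    return gamma


def norm_p(x, p: int) -> Fraction:
    """p-adic norm |x|_p = p**(-gamma(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-valuation(x, p))


def frac_part(x, p: int) -> Fraction:
    """Fractional part {x}_p from (2.2) for a rational x."""
    x = Fraction(x)
    if x == 0 or valuation(x, p) >= 0:
        return Fraction(0)
    k = -valuation(x, p)
    pk = p ** k
    unit_den = x.denominator // pk          # part of the denominator prime to p
    # r/p**k is the unique element of [0, 1) with x - r/p**k in Z_p
    r = (x.numerator * pow(unit_den, -1, pk)) % pk
    return Fraction(r, pk)


def chi(x, p: int) -> complex:
    """Additive character chi_p(x) = exp(2*pi*i*{x}_p) from (2.3)."""
    return cmath.exp(2j * math.pi * float(frac_part(x, p)))
```

For instance, `frac_part(Fraction(7, 4), 2)` gives `Fraction(3, 4)`, because $7/4 = 2^{-2}(1 + 2 + 2^2)$, and `chi(Fraction(1, 2), 2)` is $-1$ up to rounding. The modular inverse via `pow(..., -1, ...)` assumes Python 3.8 or later.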
According to [34, III.2.], any multiplicative character $\pi$ of the field $\mathbb{Q}_p$ can be represented as
$$\pi(x) = \pi_{\alpha}(x) = |x|_p^{\alpha-1}\pi_1(x), \quad x \in \mathbb{Q}_p^*,$$
where $\pi(p) = p^{1-\alpha}$ and $\pi_1(x)$ is a normed multiplicative character such that $\pi_1(x) = \pi_1(|x|_p x)$, $\pi_1(p) = \pi_1(1) = 1$, $|\pi_1(x)| = 1$. We denote $\pi_0 = |x|_p^{-1}$.

The space $\mathbb{Q}_p^n = \mathbb{Q}_p \times \cdots \times \mathbb{Q}_p$ consists of points $x = (x_1, \dots, x_n)$, where $x_j \in \mathbb{Q}_p$, $j = 1, 2, \dots, n$, $n \geq 2$. The p-adic norm on $\mathbb{Q}_p^n$ is
$$|x|_p = \max_{1 \leq j \leq n}|x_j|_p, \quad x \in \mathbb{Q}_p^n, \tag{2.4}$$
where $|x_j|_p$ is defined by (1.1).

Denote by $B^n_{\gamma}(a) = \{x \in \mathbb{Q}_p^n : |x - a|_p \leq p^{\gamma}\}$ the ball of radius $p^{\gamma}$ with center at a point $a = (a_1, \dots, a_n) \in \mathbb{Q}_p^n$, and by $S^n_{\gamma}(a) = \{x \in \mathbb{Q}_p^n : |x - a|_p = p^{\gamma}\} = B^n_{\gamma}(a) \setminus B^n_{\gamma-1}(a)$ its boundary (sphere), $\gamma \in \mathbb{Z}$. For $a = 0$ we set $B^n_{\gamma}(0) = B^n_{\gamma}$ and $S^n_{\gamma}(0) = S^n_{\gamma}$. For the case $n = 1$ we omit the upper index $n$. It is clear that
$$B^n_{\gamma}(a) = B_{\gamma}(a_1) \times \cdots \times B_{\gamma}(a_n), \tag{2.5}$$
where $B_{\gamma}(a_j) = \{x_j : |x_j - a_j|_p \leq p^{\gamma}\} \subset \mathbb{Q}_p$ is a disc of radius $p^{\gamma}$ with center at a point $a_j \in \mathbb{Q}_p$, $j = 1, 2, \dots, n$. Any two balls in $\mathbb{Q}_p^n$ either are disjoint or one contains the other. Every point of a ball is its center.

According to [34, I.3, Examples 1, 2.], the disc $B_{\gamma}$ is represented as the union of $p^{\gamma-\gamma'}$ disjoint discs $B_{\gamma'}(a)$, $\gamma' < \gamma$:
$$B_{\gamma} = B_{\gamma'} \cup \bigcup_{a} B_{\gamma'}(a), \tag{2.6}$$
where $a = 0$ and $a = a_{-r}p^{-r} + a_{-r+1}p^{-r+1} + \cdots + a_{-\gamma'-1}p^{-\gamma'-1}$ are the centers of the discs $B_{\gamma'}(a)$, $r = \gamma, \gamma-1, \gamma-2, \dots, \gamma'+1$, $0 \leq a_j \leq p-1$, $a_{-r} \neq 0$. In particular, the disc $B_0$ is represented as the union of $p$ disjoint discs
$$B_0 = B_{-1} \cup \bigcup_{r=1}^{p-1} B_{-1}(r), \tag{2.7}$$
where $B_{-1}(r) = \{x \in S_0 : x_0 = r\} = r + p\mathbb{Z}_p$, $r = 1, \dots, p-1$; $B_{-1} = \{|x|_p \leq p^{-1}\} = p\mathbb{Z}_p$; and $S_0 = \{|x|_p = 1\} = \bigcup_{r=1}^{p-1} B_{-1}(r)$. Here all the discs are disjoint. We call the coverings (2.6) and (2.7) the canonical coverings of the discs $B_{\gamma}$ and $B_0$, respectively.

On $\mathbb{Q}_p$ there exists the Haar measure, i.e., a positive measure $dx$ invariant under shifts, $d(x + a) = dx$, and normalized by the equality $\int_{|x|_p \leq 1} dx = 1$.
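The canonical covering (2.7) of the unit disc can be spot-checked on rational points. The sketch below is an illustration only: it assumes rational inputs, and the helper names `norm_p` and `disc_index` are hypothetical. It confirms that every sampled point of $B_0$ lies in exactly one disc $B_{-1}(r)$, $r = 0, 1, \dots, p-1$ (which, by shift invariance of the Haar measure, is why each of these discs has measure $p^{-1}$), and that the norm satisfies the strong triangle inequality.

```python
from fractions import Fraction


def norm_p(x, p: int) -> Fraction:
    """p-adic norm of a rational number, with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, gamma = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        gamma += 1
    while den % p == 0:
        den //= p
        gamma -= 1
    return Fraction(p) ** (-gamma)


def disc_index(x, p: int) -> list:
    """All r in {0, ..., p-1} such that x lies in B_{-1}(r) = {|x - r|_p <= 1/p}."""
    x = Fraction(x)
    return [r for r in range(p) if norm_p(x - r, p) <= Fraction(1, p)]


def strong_triangle_holds(x, y, p: int) -> bool:
    """Check the non-Archimedean inequality |x + y|_p <= max(|x|_p, |y|_p)."""
    return norm_p(Fraction(x) + Fraction(y), p) <= max(norm_p(x, p), norm_p(y, p))
```

For example, `disc_index(Fraction(1, 2), 3)` returns `[2]`, since $1/2 \equiv 2 \pmod 3$ in $\mathbb{Z}_3$, while `disc_index(Fraction(1, 3), 3)` returns `[]` because $1/3$ lies outside $B_0$.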
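The invariant measure $dx$ on the field $\mathbb{Q}_p$ is extended to an invariant measure $d^n x = dx_1 \cdots dx_n$ on $\mathbb{Q}_p^n$ in the standard way. If $f$ is an integrable function on $\mathbb{Q}_p$, then [11, Ch.II, §2.2], [34, IV]:
$$\int_{B_{\gamma}} dx = p^{\gamma}, \qquad \int_{\mathbb{Q}_p} f(x)\,dx = \sum_{\gamma=-\infty}^{\infty}\int_{S_{\gamma}} f(x)\,dx, \qquad \int_{S_{\gamma}} f(x)\,dx = \int_{B_{\gamma}} f(x)\,dx - \int_{B_{\gamma-1}} f(x)\,dx. \tag{2.8}$$

A complex-valued function $f$ defined on $\mathbb{Q}_p^n$ is called locally constant if for any $x \in \mathbb{Q}_p^n$ there exists an integer $l(x) \in \mathbb{Z}$ such that
$$f(x + y) = f(x), \quad y \in B^n_{l(x)}.$$
Let $\mathcal{E}(\mathbb{Q}_p^n)$ and $\mathcal{D}(\mathbb{Q}_p^n)$ be the linear spaces of locally constant $\mathbb{C}$-valued functions on $\mathbb{Q}_p^n$ and of locally constant $\mathbb{C}$-valued functions with compact supports (so-called test functions), respectively [34, VI.1., 2.]. If $\varphi \in \mathcal{D}(\mathbb{Q}_p^n)$, then according to Lemma 1 from [34, VI.1.] there exists $l \in \mathbb{Z}$ such that
$$\varphi(x + y) = \varphi(x), \quad y \in B^n_l, \quad x \in \mathbb{Q}_p^n.$$
The largest such number $l = l(\varphi)$ is called the parameter of constancy of the function $\varphi$. Let us denote by $\mathcal{D}^l_N(\mathbb{Q}_p^n)$ the finite-dimensional space of test functions from $\mathcal{D}(\mathbb{Q}_p^n)$ having supports in the ball $B^n_N$ and with parameters of constancy $\geq l$ [34, VI.2.]. The following embedding holds: $\mathcal{D}^l_N(\mathbb{Q}_p^n) \subset \mathcal{D}^{l'}_{N'}(\mathbb{Q}_p^n)$, $N \leq N'$, $l \geq l'$. Thus $\mathcal{D}(\mathbb{Q}_p^n) = \lim \operatorname{ind}_{N \to \infty} \lim \operatorname{ind}_{l \to -\infty} \mathcal{D}^l_N(\mathbb{Q}_p^n)$. The space $\mathcal{D}(\mathbb{Q}_p^n)$ is a complete locally convex vector space.

According to [34, VI,(5.2')], any function $\varphi \in \mathcal{D}^l_N(\mathbb{Q}_p^n)$ is represented in the form
$$\varphi(x) = \sum_{\nu=1}^{p^{n(N-l)}} \varphi(c^{\nu})\Delta_l(x - c^{\nu}), \quad x \in \mathbb{Q}_p^n, \tag{2.9}$$
where $\Delta_l(x - c^{\nu})$ are the characteristic functions of the disjoint balls $B^n_l(c^{\nu})$, and the points $c^{\nu} = (c^{\nu}_1, \dots, c^{\nu}_n) \in B^n_N$ do not depend on $\varphi$.

Denote by $\mathcal{D}'(\mathbb{Q}_p^n)$ the set of all linear functionals on $\mathcal{D}(\mathbb{Q}_p^n)$ [34, VI.3.]. Let us introduce in $\mathcal{D}(\mathbb{Q}_p^n)$ a canonical δ-sequence $\delta_k(x) = p^{nk}\Omega(p^k|x|_p)$ and a canonical 1-sequence $\Delta_k(x) = \Omega(p^{-k}|x|_p)$, $k \in \mathbb{Z}$, $x \in \mathbb{Q}_p^n$, where
$$\Omega(t) = \begin{cases} 1, & 0 \leq t \leq 1,\\ 0, & t > 1. \end{cases} \tag{2.10}$$
Here $\Delta_k(x)$ is the characteristic function of the ball $B^n_k$. It is clear [34, VI.3., VII.1.] that $\delta_k \to \delta$, $k \to \infty$, in $\mathcal{D}'(\mathbb{Q}_p^n)$ and $\Delta_k \to 1$, $k \to \infty$, in $\mathcal{E}(\mathbb{Q}_p^n)$.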
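The Fourier transform of $\varphi \in \mathcal{D}(\mathbb{Q}_p^n)$ is defined by the formula
$$F[\varphi](\xi) = \int_{\mathbb{Q}_p^n} \chi_p(\xi \cdot x)\varphi(x)\,d^n x, \quad \xi \in \mathbb{Q}_p^n,$$
where $\chi_p(\xi \cdot x) = \chi_p(\xi_1 x_1) \cdots \chi_p(\xi_n x_n) = e^{2\pi i \sum_{j=1}^{n}\{\xi_j x_j\}_p}$ and $\xi \cdot x$ is the scalar product of vectors. The Fourier transform is a linear isomorphism of $\mathcal{D}(\mathbb{Q}_p^n)$ onto $\mathcal{D}(\mathbb{Q}_p^n)$. Moreover, according to [32, Lemma A.], [33, III,(3.2)], [34, VII.2.],
$$\varphi \in \mathcal{D}^l_N(\mathbb{Q}_p^n) \quad \text{iff} \quad F[\varphi] \in \mathcal{D}^{-N}_{-l}(\mathbb{Q}_p^n). \tag{2.11}$$
We define the Fourier transform $F[f]$ of a distribution $f \in \mathcal{D}'(\mathbb{Q}_p^n)$ by the relation [34, VII.3.]:
$$\langle F[f], \varphi\rangle = \langle f, F[\varphi]\rangle, \quad \forall\,\varphi \in \mathcal{D}(\mathbb{Q}_p^n). \tag{2.12}$$
Let $A$ be a matrix with $\det A \neq 0$ and $b \in \mathbb{Q}_p^n$. Then for a distribution $f \in \mathcal{D}'(\mathbb{Q}_p^n)$ the following relation holds [34, VII,(3.3)]:
$$F[f(Ax + b)](\xi) = |\det A|_p^{-1}\,\chi_p\big(-A^{-1}b \cdot \xi\big)\,F[f(x)]\big((A^{T})^{-1}\xi\big). \tag{2.13}$$
According to [34, IV,(3.1)],
$$F[\Delta_k](x) = \delta_k(x), \quad k \in \mathbb{Z}, \quad x \in \mathbb{Q}_p^n. \tag{2.14}$$
In particular, $F[\Omega(|\xi|_p)](x) = \Omega(|x|_p)$.

The convolution $f * g$ for distributions $f, g \in \mathcal{D}'(\mathbb{Q}_p^n)$ is defined (see [34, VII.1.]) as
$$\langle f * g, \varphi\rangle = \lim_{k \to \infty}\langle f(x) \times g(y), \Delta_k(x)\varphi(x + y)\rangle \tag{2.15}$$
if the limit exists for all $\varphi \in \mathcal{D}(\mathbb{Q}_p^n)$, where $f(x) \times g(y)$ is the direct product of distributions. If for distributions $f, g \in \mathcal{D}'(\mathbb{Q}_p^n)$ the convolution $f * g$ exists, then [34, VII,(5.4)]
$$F[f * g] = F[f]F[g]. \tag{2.16}$$

Definition 2.1. Let $\pi_{\alpha}$ be a multiplicative character of the field $\mathbb{Q}_p$. A distribution $f \in \mathcal{D}'(\mathbb{Q}_p^n)$ is called homogeneous of degree $\pi_{\alpha}$ if for all $\varphi \in \mathcal{D}(\mathbb{Q}_p^n)$ and $t \in \mathbb{Q}_p^*$ we have the relation
$$\Big\langle f, \varphi\Big(\frac{x_1}{t}, \dots, \frac{x_n}{t}\Big)\Big\rangle = \pi_{\alpha}(t)|t|_p^n\,\big\langle f, \varphi(x_1, \dots, x_n)\big\rangle,$$
i.e., $f(tx) = f(tx_1, \dots, tx_n) = \pi_{\alpha}(t)f(x)$, $x = (x_1, \dots, x_n) \in \mathbb{Q}_p^n$. A homogeneous distribution of degree $\pi_{\alpha}(t) = |t|_p^{\alpha-1}$ ($\alpha \neq 0$) is called homogeneous of degree $\alpha - 1$.

3. The p-adic Lizorkin spaces

Let us introduce the p-adic Lizorkin space of test functions
$$\Phi(\mathbb{Q}_p^n) = \{\phi : \phi = F[\psi],\ \psi \in \Psi(\mathbb{Q}_p^n)\},$$
where $\Psi(\mathbb{Q}_p^n) = \{\psi(\xi) \in \mathcal{D}(\mathbb{Q}_p^n) : \psi(0) = 0\}$. Here $\Psi(\mathbb{Q}_p^n), \Phi(\mathbb{Q}_p^n) \subset \mathcal{D}(\mathbb{Q}_p^n)$. The space $\Phi(\mathbb{Q}_p^n)$ is called the p-adic Lizorkin space of test functions.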
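The space $\Phi(\mathbb{Q}_p^n)$ can be equipped with the topology of the space $\mathcal{D}(\mathbb{Q}_p^n)$, which makes $\Phi$ a complete space. In view of (2.11), the following lemma holds.

Lemma 3.1. ([3], [4]) (a) $\phi \in \Phi(\mathbb{Q}_p^n)$ iff $\phi \in \mathcal{D}(\mathbb{Q}_p^n)$ and
$$\int_{\mathbb{Q}_p^n}\phi(x)\,d^n x = 0. \tag{3.1}$$
(b) $\phi \in \mathcal{D}^l_N(\mathbb{Q}_p^n) \cap \Phi(\mathbb{Q}_p^n)$, i.e., $\int_{\mathbb{Q}_p^n}\phi(x)\,d^n x = 0$, iff $\psi = F^{-1}[\phi] \in \mathcal{D}^{-N}_{-l}(\mathbb{Q}_p^n) \cap \Psi(\mathbb{Q}_p^n)$, i.e., $\psi(\xi) = 0$, $\xi \in B^n_{-N}$.

Unlike the classical Lizorkin space, any function $\psi(\xi) \in \Psi(\mathbb{Q}_p^n)$ is equal to zero not only at $\xi = 0$ but in a ball $B^n \ni 0$ as well.

Let $\Phi'(\mathbb{Q}_p^n)$ denote the topological dual of the space $\Phi(\mathbb{Q}_p^n)$. We call it the p-adic Lizorkin space of distributions.

By $\Psi^{\perp}$ and $\Phi^{\perp}$ we denote the subspaces of functionals in $\mathcal{D}'(\mathbb{Q}_p^n)$ orthogonal to $\Psi(\mathbb{Q}_p^n)$ and $\Phi(\mathbb{Q}_p^n)$, respectively. Thus $\Psi^{\perp} = \{f \in \mathcal{D}'(\mathbb{Q}_p^n) : f = C\delta,\ C \in \mathbb{C}\}$ and $\Phi^{\perp} = \{f \in \mathcal{D}'(\mathbb{Q}_p^n) : f = C,\ C \in \mathbb{C}\}$.

Proposition 3.1. ([3])
$$\Phi'(\mathbb{Q}_p^n) = \mathcal{D}'(\mathbb{Q}_p^n)/\Phi^{\perp}, \qquad \Psi'(\mathbb{Q}_p^n) = \mathcal{D}'(\mathbb{Q}_p^n)/\Psi^{\perp}.$$

The space $\Phi'(\mathbb{Q}_p^n)$ can be obtained from $\mathcal{D}'(\mathbb{Q}_p^n)$ by "sifting out" constants. Thus two distributions in $\mathcal{D}'(\mathbb{Q}_p^n)$ differing by a constant are indistinguishable as elements of $\Phi'(\mathbb{Q}_p^n)$.

Similarly to (2.12), we define the Fourier transform of distributions $f \in \Phi'(\mathbb{Q}_p^n)$ and $g \in \Psi'(\mathbb{Q}_p^n)$ by the relations
$$\langle F[f], \psi\rangle = \langle f, F[\psi]\rangle, \quad \forall\,\psi \in \Psi(\mathbb{Q}_p^n), \qquad \langle F[g], \phi\rangle = \langle g, F[\phi]\rangle, \quad \forall\,\phi \in \Phi(\mathbb{Q}_p^n). \tag{3.2}$$
By definition, $F[\Phi(\mathbb{Q}_p^n)] = \Psi(\mathbb{Q}_p^n)$ and $F[\Psi(\mathbb{Q}_p^n)] = \Phi(\mathbb{Q}_p^n)$, i.e., the relations (3.2) give well-defined objects.

4. Construction of multiresolution analysis

4.1. p-Adic multiresolution analysis. Denote the factor group $\mathbb{Q}_p/\mathbb{Z}_p$ by $I_p$, i.e.,
$$I_p = \{a = p^{-\gamma}(a_0 + a_1 p + \cdots + a_{\gamma-1}p^{\gamma-1}) : \gamma \in \mathbb{N};\ a_j = 0, 1, \dots, p-1;\ j = 0, 1, \dots, \gamma-1\}. \tag{4.1}$$
It is well known that $\mathbb{Q}_p = B_0 \cup \bigcup_{\gamma=1}^{\infty} S_{\gamma}$, where $S_{\gamma} = \{x \in \mathbb{Q}_p : |x|_p = p^{\gamma}\}$. In view of (2.1), $x \in S_{\gamma}$, $\gamma \geq 1$, if and only if $x = x_{-\gamma}p^{-\gamma} + x_{-\gamma+1}p^{-\gamma+1} + \cdots + x_{-1}p^{-1} + \xi$, where $\xi \in B_0$. Since $x_{-\gamma}p^{-\gamma} + x_{-\gamma+1}p^{-\gamma+1} + \cdots + x_{-1}p^{-1} \in I_p$, we have a "natural" decomposition of $\mathbb{Q}_p$ into a union of mutually disjoint discs: $\mathbb{Q}_p = \bigcup_{a \in I_p} B_0(a)$. So, $I_p$ is a "natural" group of shifts for $\mathbb{Q}_p$.

Definition 4.1.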
A collection of closed subspaces $V_j \subset L^2(\mathbb{Q}_p)$, $j \in \mathbb{Z}$, is called a multiresolution analysis (MRA) in $L^2(\mathbb{Q}_p)$ if the following axioms hold:
(a) $V_j \subset V_{j+1}$ for all $j \in \mathbb{Z}$;
(b) $\bigcup_{j \in \mathbb{Z}} V_j$ is dense in $L^2(\mathbb{Q}_p)$;
(c) $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$;
(d) $f(\cdot) \in V_j \iff f(p^{-1}\cdot) \in V_{j+1}$ for all $j \in \mathbb{Z}$;
(e) there exists a function $\phi \in V_0$ such that the system $\phi(x - a)$, $a \in I_p$, forms an orthonormal basis for $V_0$.

The function $\phi$ from axiom (e) is called scaling or refinable. It follows immediately from axioms (d) and (e) that the functions $p^{j/2}\phi(p^{-j}\cdot - a)$, $a \in I_p$, form an orthonormal basis for $V_j$.

According to the standard scheme (see, e.g., [31, §1.3]) for the construction of MRA-based wavelets, for each $j$ we define a space $W_j$ (wavelet space) as the orthogonal complement of $V_j$ in $V_{j+1}$, i.e.,
$$V_{j+1} = V_j \oplus W_j, \quad j \in \mathbb{Z}, \tag{4.2}$$
where $W_j \perp V_j$, $j \in \mathbb{Z}$. It is not difficult to see that
$$f \in W_j \iff f(p^{-1}\cdot) \in W_{j+1} \quad \text{for all } j \in \mathbb{Z}, \tag{4.3}$$
and $W_j \perp W_k$, $j \neq k$. Taking into account axioms (b) and (c), we obtain
$$\bigoplus_{j \in \mathbb{Z}} W_j = L^2(\mathbb{Q}_p) \quad \text{(orthogonal direct sum)}. \tag{4.4}$$
If we now find a function $\psi \in W_0$ such that the system $\psi(x - a)$, $a \in I_p$, forms an orthonormal basis for $W_0$, then the system $p^{j/2}\psi(p^{-j}\cdot - a)$, $a \in I_p$, $j \in \mathbb{Z}$, is an orthonormal basis for $L^2(\mathbb{Q}_p)$. Such a function $\psi$ is called a wavelet function and the basis is called a wavelet basis.

4.2. p-Adic refinement equation. Let $\phi$ be a refinable function for an MRA. As was mentioned above, the system $p^{1/2}\phi(p^{-1}\cdot - a)$, $a \in I_p$, is a basis for $V_1$. It follows from axiom (a) that
$$\phi = \sum_{a \in I_p}\alpha_a\phi(p^{-1}\cdot - a), \quad \alpha_a \in \mathbb{C}. \tag{4.5}$$
We see that the function $\phi$ is a solution of a special kind of functional equation. Such equations are called refinement equations. Investigation of refinement equations and their solutions is the most difficult part of wavelet theory in real analysis.

A natural way to construct an MRA (see, e.g., [31, §1.2]) is the following. We start with an appropriate function $\phi$ whose shifts $\phi(\cdot - a)$, $a \in I_p$, form an orthonormal system, and set
$$V_0 = \overline{\operatorname{span}}\{\phi(x - a) : a \in I_p\} \quad \text{and} \quad V_j = \overline{\operatorname{span}}\{\phi(p^{-j}x - a) : a \in I_p\}, \quad j \in \mathbb{Z}.$$
It is clear that axioms (d) and (e) of Definition 4.1 are fulfilled. Of course, not every such function $\phi$ provides axiom (a). In the real setting, the relation $V_0 \subset V_1$ holds if and only if the refinable function satisfies a refinement equation. The situation is different in the p-adic case. Generally speaking, a refinement equation (4.5) does not imply the inclusion $V_0 \subset V_1$. Indeed, we need all the functions $\phi(\cdot - b)$, $b \in I_p$, to belong to the space $V_1$, i.e., the equalities $\phi(x - b) = \sum_{a \in I_p}\alpha_{a,b}\phi(p^{-1}x - a)$ should be fulfilled for all $b \in I_p$. Since $p^{-1}b + a$ is not in $I_p$ in general, we cannot state that the relation $\phi(x - b) = \sum_{a \in I_p}\alpha_a\phi(p^{-1}x - p^{-1}b - a)$, which follows from the refinement equation (4.5), implies $\phi(\cdot - b) \in V_1$ for all $b \in I_p$.

The refinement equation reflects some "self-similarity". The structure of the space $\mathbb{Q}_p$ has a natural "self-similarity" property, which is given by formulas (2.6), (2.7). By (2.7), the characteristic function $\Delta_0(x) = \Omega(|x|_p)$ of the unit disc $B_0$ is represented as a sum of $p$ characteristic functions of the disjoint discs $B_{-1}(r)$, $r = 0, 1, \dots, p-1$, i.e.,
$$\Delta_0(x) = \sum_{r=0}^{p-1}\Delta_0\Big(\frac{1}{p}x - \frac{r}{p}\Big), \quad x \in \mathbb{Q}_p. \tag{4.6}$$
Thus, in the p-adic case, we have a natural refinement equation of the form (4.5):
$$\phi(x) = \sum_{r=0}^{p-1}\phi\Big(\frac{1}{p}x - \frac{r}{p}\Big), \quad x \in \mathbb{Q}_p, \tag{4.7}$$
whose solution is $\phi(x) = \Delta_0(x) = \Omega(|x|_p)$. This equation is an analog of the refinement equation generating the Haar MRA in real analysis.

4.3. Construction of 2-adic Haar multiresolution analysis. Now, using the refinement equation (4.7) for $p = 2$,
$$\phi(x) = \phi\Big(\frac{x}{2}\Big) + \phi\Big(\frac{x}{2} - \frac{1}{2}\Big), \quad x \in \mathbb{Q}_2, \tag{4.8}$$
and its solution, the refinable function $\phi(x) = \Delta_0(x) = \Omega(|x|_2)$, we construct a 2-adic multiresolution analysis. Set
$$V_0 = \overline{\operatorname{span}}\{\phi(x - a) : a \in I_2\}, \tag{4.9}$$
$$V_j = \overline{\operatorname{span}}\{\phi(2^{-j}x - a) : a \in I_2\}, \quad j \in \mathbb{Z}. \tag{4.10}$$
It is clear that axioms (d) and (e) of Definition 4.1 are fulfilled and the system $2^{j/2}\phi(2^{-j}\cdot - a)$, $a \in I_2$, is an orthonormal basis for $V_j$, $j \in \mathbb{Z}$.

Note that the characteristic function of the unit disc $\Omega(|\cdot|_2)$ has a remarkable feature: $\Omega(|\cdot + \xi|_2) = \Omega(|\cdot|_2)$ for all $\xi \in \mathbb{Z}_2$, because the p-adic norm is non-Archimedean.
In particular, $\Omega(|\cdot \pm 1|_2) = \Omega(|\cdot|_2)$, i.e.,
$$\phi(x \pm 1) = \phi(x), \quad \forall\,x \in \mathbb{Q}_2. \tag{4.11}$$
Thus $\phi$ is periodic with period 1. In view of this fact, and taking into account that $2^{-1}b + a \pmod 1$ is in $I_2$ for all $a, b \in I_2$, it follows from the refinement equation (4.8) that $V_0 \subset V_1$. By (4.10), this yields $V_j \subset V_{j+1}$ for all $j \in \mathbb{Z}$, i.e., axiom (a) of Definition 4.1 holds.

Lemma 4.1. Axiom (b) of Definition 4.1 holds, i.e., $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{Q}_2)$.

Proof. According to (2.9), any function $\varphi \in \mathcal{D}(\mathbb{Q}_2)$ belongs to one of the spaces $\mathcal{D}^l_N(\mathbb{Q}_2)$, and consequently is represented in the form
$$\varphi(x) = \sum_{\nu=1}^{2^{N-l}}\varphi(c^{\nu})\Delta_l(x - c^{\nu}), \quad x \in \mathbb{Q}_2, \tag{4.12}$$
where $\Delta_l(\cdot - c^{\nu})$ are the characteristic functions of the mutually disjoint discs $B_l(c^{\nu}) \subset \mathbb{Q}_2$, $c^{\nu} \in B_N$, $\nu = 1, 2, \dots, 2^{N-l}$; $l = l(\varphi)$, $N = N(\varphi)$. Since $\Delta_l(x - c^{\nu}) = \Omega(2^{-l}|x - c^{\nu}|_2) = \Omega(|2^l x - 2^l c^{\nu}|_2)$ and any number $2^l c^{\nu}$ can be represented in the form $2^l c^{\nu} = a^{\nu} + b^{\nu}$, where $a^{\nu} \in I_2$, $b^{\nu} \in \mathbb{Z}_2$, we have $\Delta_l(x - c^{\nu}) = \Omega(|2^l x - a^{\nu}|_2) = \phi(2^l x - a^{\nu})$. Thus any function $\varphi \in \mathcal{D}(\mathbb{Q}_2)$ can be represented in the form
$$\varphi(x) = \sum_{\nu=1}^{2^{N-l}}\alpha_{\nu}\,\phi(2^l x - a^{\nu}), \quad x \in \mathbb{Q}_2, \quad a^{\nu} \in I_2, \quad \alpha_{\nu} \in \mathbb{C}. \tag{4.13}$$
Consequently, on the basis of (4.10), $\varphi \in V_{-l}$. Thus any test function $\varphi$ belongs to one of the spaces $V_j$, where $j = j(\varphi)$. Since the space $\mathcal{D}(\mathbb{Q}_2)$ is dense in $L^2(\mathbb{Q}_2)$ [34, VI.2], approximating any function from $L^2(\mathbb{Q}_2)$ by test functions (4.13), we prove our assertion. □

Lemma 4.2. Axiom (c) of Definition 4.1 holds, i.e., $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$.

Proof. Suppose that $\bigcap_{j \in \mathbb{Z}} V_j \neq \{0\}$. Then there exists a nonzero function $f$ such that $f \in V_j$ for all $j \in \mathbb{Z}$. Hence, due to (4.10), $f(x) = \sum_{a \in I_2} c_{ja}\,\phi(2^{-j}x - a)$ for all $j \in \mathbb{Z}$. Let $x = 2^{-N}(x_0 + x_1 2 + x_2 2^2 + \cdots)$. Since $2^{-j}x = 2^{-N-j}(x_0 + x_1 2 + x_2 2^2 + \cdots)$, for all $j \leq -N$ we have $2^{-j}x \in \mathbb{Z}_2$, and consequently $|2^{-j}x - a|_2 > 1$ for all $a \in I_2$, $a \neq 0$. Thus $\phi(2^{-j}x - a) = 0$ for all $j \leq -N$ and $a \in I_2$, $a \neq 0$. Since $|2^{-j}x|_2 \leq 1$, we have $f(x) = c_{j0}$ for all $j \leq -N$.
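Similarly, for another point $x' = 2^{-N'}(x'_0 + x'_1 2 + x'_2 2^2 + \cdots)$, we have $f(x') = c_{j0}$ for all $j \leq -N'$. Taking $j \leq \min(-N, -N')$, we see that $f(x) = f(x')$. Consequently, $f(x) \equiv C$, where $C$ is a constant. However, if $C \neq 0$, then $f \notin L^2(\mathbb{Q}_2)$. Thus $C = 0$, and the proof of the lemma is complete. □

According to the above scheme, we introduce the space $W_0$ as the orthogonal complement of $V_0$ in $V_1$. Set
$$\psi^{(0)}(x) = \phi\Big(\frac{x}{2}\Big) - \phi\Big(\frac{x}{2} - \frac{1}{2}\Big). \tag{4.14}$$

Lemma 4.3. The shift system $\psi^{(0)}(x - a)$, $a \in I_2$, is an orthonormal basis of the space $W_0$.

Proof. Let us prove that $W_0 \perp V_0$. It follows from (4.8), (4.14) that
$$\big(\psi^{(0)}(\cdot - a), \phi(\cdot - b)\big) = \int_{\mathbb{Q}_2}\psi^{(0)}(x - a)\phi(x - b)\,dx = \int_{\mathbb{Q}_2}\Big(\phi\Big(\frac{x-a}{2}\Big) - \phi\Big(\frac{x-a}{2} - \frac{1}{2}\Big)\Big)\Big(\phi\Big(\frac{x-b}{2}\Big) + \phi\Big(\frac{x-b}{2} - \frac{1}{2}\Big)\Big)\,dx$$
for all $a, b \in I_2$. Let $a \neq b$. Since neither $a = b + 1$ nor $b = a + 1$ is possible, the four shifts involved are pairwise distinct modulo 1; taking into account that the functions $2^{1/2}\phi(2^{-1}\cdot - c)$, $c \in I_2$, are orthonormal, we obtain $\big(\psi^{(0)}(\cdot - a), \phi(\cdot - b)\big) = 0$. If $a = b$, then again due to the orthonormality of the system $2^{1/2}\phi(2^{-1}\cdot - c)$, $c \in I_2$, and taking into account that $\frac{a}{2}, \frac{a}{2} + \frac{1}{2} \in I_2$, the cross terms vanish and the two squared terms cancel, so that
$$\big(\psi^{(0)}(\cdot - a), \phi(\cdot - a)\big) = \int_{\mathbb{Q}_2}\Big(\phi^2\Big(\frac{x-a}{2}\Big) - \phi^2\Big(\frac{x-a}{2} - \frac{1}{2}\Big)\Big)\,dx = 0.$$
Thus $\psi^{(0)}(x - a) \perp \phi(x - b)$ for all $a, b \in I_2$.

The refinement equation (4.8) and relation (4.14) imply that
$$\phi\Big(\frac{x}{2} - a\Big) = \frac{1}{2}\Big(\phi(x - 2a) + \psi^{(0)}(x - 2a)\Big), \quad a \in I_2.$$
Since $\{2^{1/2}\phi(2^{-1}x - a) : a \in I_2\}$ is a basis for $V_1$, we have $V_1 = V_0 \oplus W_0$, i.e., (4.2) holds. □

Thus we have proved that the collection $\{V_j : j \in \mathbb{Z}\}$ is an MRA in $L^2(\mathbb{Q}_2)$ and the function $\psi^{(0)}$ defined by (4.14) is a wavelet function. This MRA is a 2-adic analog of the real Haar MRA, and the wavelet basis generated by $\psi^{(0)}$ is an analog of the real Haar wavelet basis. But in contrast to the real setting, the refinable function $\phi$ generating our Haar MRA is periodic with period 1 (see (4.11)), which never holds for real refinable functions. It will be shown below that, due to this specific property of $\phi$, there exist infinitely many different orthonormal wavelet bases in the same Haar MRA (see Sec. 5).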
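The relations used in this subsection can be spot-checked on rational points of $\mathbb{Q}_2$. The sketch below is an illustration under stated assumptions: inputs are rational, (4.14) is read as $\psi^{(0)}(x) = \phi(x/2) - \phi(x/2 - 1/2)$, and the helper names `phi` and `psi0` are hypothetical. It tests the refinement equation (4.8), the periodicity (4.11), and the sign change of $\psi^{(0)}$ under a shift by 1.

```python
from fractions import Fraction


def phi(x) -> int:
    """Characteristic function of the unit disc Z_2: a reduced rational lies
    in Z_2 exactly when its denominator is odd."""
    return 1 if Fraction(x).denominator % 2 == 1 else 0


def psi0(x) -> int:
    """Haar wavelet function, read here as phi(x/2) - phi(x/2 - 1/2)."""
    x = Fraction(x)
    return phi(x / 2) - phi(x / 2 - Fraction(1, 2))


# Sample rational points with small numerators and denominators.
samples = [Fraction(n, d) for n in range(-8, 9) for d in (1, 2, 3, 4, 5, 8)]

refinement_ok = all(
    phi(x) == phi(x / 2) + phi(x / 2 - Fraction(1, 2)) for x in samples
)
periodic_ok = all(phi(x + 1) == phi(x) for x in samples)
antiperiodic_ok = all(psi0(x + 1) == -psi0(x) for x in samples)
```

On these samples all three flags evaluate to `True`. The function `psi0` takes the value $+1$ on $2\mathbb{Z}_2$, $-1$ on $1 + 2\mathbb{Z}_2$, and $0$ outside $\mathbb{Z}_2$, which is the 2-adic counterpart of the two-step shape of the real Haar wavelet.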
Due to (2.3), (2.7), the function $\psi^{(0)}$ can be rewritten in the form $\psi^{(0)}(x) = \chi_2(2^{-1}x)\,\Omega(|x|_2)$, and the Haar wavelet basis is
$$\psi^{(0)}_{\gamma a}(x) = 2^{-\gamma/2}\psi^{(0)}(2^{\gamma}x - a) = 2^{-\gamma/2}\chi_2\big(2^{-1}(2^{\gamma}x - a)\big)\,\Omega\big(|2^{\gamma}x - a|_2\big), \quad x \in \mathbb{Q}_2, \tag{4.15}$$
$\gamma \in \mathbb{Z}$, $a \in I_2$. It is clear that
$$\int_{\mathbb{Q}_2}\psi^{(0)}_{\gamma a}(x)\,dx = 0, \tag{4.16}$$
and, according to Lemma 3.1, $\psi^{(0)}_{\gamma a}(x)$ belongs to the Lizorkin space $\Phi(\mathbb{Q}_2)$.

Remark 4.1. The Haar wavelet basis (4.15) coincides with Kozyrev's wavelet basis (1.2) for the case $p = 2$. In the present paper we restrict ourselves to constructing the Haar wavelets only for $p = 2$. Since the Haar refinement equation (4.7) was presented for all $p$, a similar construction can easily be realized in the general case. Moreover, it is not difficult to see that Kozyrev's wavelet function $\theta_j(x)$ from (1.2) can be expressed in terms of the refinable function $\phi(x)$ as
$$\theta_j(x) = \chi_p(p^{-1}jx)\,\Omega(|x|_p) = p^{-1/2}\sum_{r=0}^{p-1}h_r\,\phi\Big(\frac{1}{p}x - \frac{r}{p}\Big), \quad x \in \mathbb{Q}_p, \tag{4.17}$$
where $h_r = p^{1/2}e^{2\pi i\{\frac{jr}{p}\}_p}$, $r = 0, 1, \dots, p-1$, $j = 1, 2, \dots, p-1$.

Remark 4.2. In view of the periodicity (4.11) of the refinable function $\phi$, one can use the shifts $\psi^{(0)}(\cdot + a)$, $a \in I_2$, instead of the shifts $\psi^{(0)}(\cdot - a)$, $a \in I_2$.

Now we show that there is another function $\psi^{(1)}(x)$ whose shifts form an orthonormal basis in $W_0$. Indeed, taking into account (4.11), we have
$$\psi^{(1)}(x) = [\ldots] \tag{4.18}$$
and its shifts
$$[\ldots] \tag{4.19}$$
$$[\ldots] \tag{4.20}$$
Since the system of functions $\{\phi(2^{-1}x - a) : a \in I_2\}$ is orthogonal, in view of (4.11), formulas (4.18)–(4.20) imply that the function $\psi^{(1)}(x)$ and the function $\psi^{(1)}(x - a)$ are orthogonal whenever $a \in I_2$, $a \neq 0, \frac{1}{2}$. Here we take into account that all shifts (modulo 1) of the refinable function in (4.18), (4.20) are distinct. Similarly, by (4.18), (4.19), we have
$$\big(\psi^{(1)}(x), \psi^{(1)}(x + 2^{-1})\big) = \int_{\mathbb{Q}_2}\psi^{(1)}(x)\psi^{(1)}(x + 2^{-1})\,dx = 2^{-1}\int_{\mathbb{Q}_2}[\ldots]\,dx = 0,$$
$$\big(\psi^{(1)}(x), \psi^{(1)}(x)\big) = 2^{-1}\int_{\mathbb{Q}_2}[\ldots]\,dx = 1.$$
Thus all shifts of $\psi^{(1)}$ are orthonormal.
It is clear that the functions (4.18) and (4.19) can be rewritten in the form
$$\psi^{(1)}(x) = [\ldots]\big({-\psi^{(0)}}[\ldots] + \psi^{(0)}[\ldots]\big). \tag{4.21}$$
It follows that
$$\psi^{(0)}(x) = [\ldots] + \psi^{(1)}[\ldots].$$
Since the system $\psi^{(0)}(\cdot - a)$, $a \in I_2$, forms an orthonormal basis for $W_0$, the system $\psi^{(1)}(\cdot - a)$, $a \in I_2$, is another orthonormal basis for $W_0$. So, we have shown that a wavelet basis generated by the Haar MRA is not unique.

5. Description of 2-adic Haar bases

5.1. Complex wavelets. Using the fact that all dilatations and shifts ($x \to 2^{\gamma}x + a$, $a \in I_2$) of the Haar wavelet function $\psi^{(0)}$ form an orthonormal basis in $L^2(\mathbb{Q}_2)$, we show that there exist infinitely many wavelet functions $\psi^{(s)}$, $s \in \mathbb{N}$, in $W_0$.

In what follows, we shall write the 2-adic number $a = 2^{-s}(a_0 + a_1 2 + \cdots + a_{s-1}2^{s-1}) \in I_2$, $a_j = 0, 1$, $j = 0, 1, \dots, s-1$, briefly as a rational number $a = \frac{m}{2^s}$, where $m = a_0 + a_1 2 + \cdots + a_{s-1}2^{s-1}$.

Since the characteristic function of the unit disc $\phi(x) = \Delta_0(x) = \Omega(|x|_2)$ is periodic with period $\xi$ for every $\xi \in S_0$, the wavelet function $\psi^{(0)}(x)$ has the following evident and important property:
$$\psi^{(0)}(x + \xi) = -\psi^{(0)}(x), \quad \xi \in S_0. \tag{5.1}$$
Here $\xi = 1 + \xi_1 2 + \xi_2 2^2 + \cdots$, where $\xi_j = 0, 1$; $j \in \mathbb{N}$.

Before we prove a general result, we consider the simplest particular case. Consider the function
$$\psi^{(1)}(x) = \alpha_0\psi^{(0)}(x) + \alpha_1\psi^{(0)}\Big(x + \frac{1}{2}\Big), \quad \alpha_0, \alpha_1 \in \mathbb{C}, \tag{5.2}$$
and determine when the shifts $\psi^{(1)}(x + a)$, $a \in I_2$, of this function form an orthonormal basis in $W_0$. Taking into account the orthonormality of the system $\psi^{(0)}(x + a)$, $a \in I_2$, and relation (5.1), we see that the function $\psi^{(1)}(x)$ and the functions $\psi^{(1)}(x + a)$ are orthogonal for all $a \in I_2$, $a \neq 0, \frac{1}{2}$. Thus, in view of (5.1), the system of functions $\psi^{(1)}(x + a)$, $a \in I_2$, is orthonormal if and only if the system consisting of the function (5.2) and the function
$$\psi^{(1)}\Big(x + \frac{1}{2}\Big) = -\alpha_1\psi^{(0)}(x) + \alpha_0\psi^{(0)}\Big(x + \frac{1}{2}\Big) \tag{5.3}$$
is orthonormal. Hence, we have $|\alpha_0|^2 + |\alpha_1|^2 = 1$ and, for complex coefficients, additionally $\operatorname{Im}(\alpha_0\overline{\alpha_1}) = 0$. In other words, the matrix
$$\begin{pmatrix}\alpha_0 & \alpha_1\\ -\alpha_1 & \alpha_0\end{pmatrix}$$
is unitary. Thus the function (5.2) with such coefficients is a wavelet function.
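The unitarity condition on the $2 \times 2$ coefficient matrix can be checked numerically. The following is a minimal sketch with a hypothetical helper `is_unitary_2x2`; it assumes that the matrix in question has rows $(\alpha_0, \alpha_1)$ and $(-\alpha_1, \alpha_0)$. It illustrates that real coefficients with $|\alpha_0|^2 + |\alpha_1|^2 = 1$ always give a unitary matrix, whereas for complex coefficients the two rows must in addition be orthogonal.

```python
import math


def is_unitary_2x2(a0: complex, a1: complex, tol: float = 1e-12) -> bool:
    """Check that the rows (a0, a1) and (-a1, a0) are orthonormal in C^2."""
    rows = [(complex(a0), complex(a1)), (-complex(a1), complex(a0))]
    for i, u in enumerate(rows):
        for j, v in enumerate(rows):
            # Hermitian inner product of row i with row j
            g = sum(x * y.conjugate() for x, y in zip(u, v))
            if abs(g - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


theta = 0.7
real_case = is_unitary_2x2(math.cos(theta), math.sin(theta))
# Normalised, but the rows are not orthogonal: their inner product equals i.
complex_counterexample = is_unitary_2x2(2 ** -0.5, 1j * 2 ** -0.5)
# A common unimodular phase keeps the rows orthogonal.
common_phase_case = is_unitary_2x2(1j * math.cos(theta), 1j * math.sin(theta))
```

Here `real_case` and `common_phase_case` are `True`, while `complex_counterexample` is `False`, although $|\alpha_0|^2 + |\alpha_1|^2 = 1$ holds in all three cases.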
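It is clear that the wavelet function (4.21) is a particular case of the wavelet function (5.2). Consequently, all dilatations and shifts of $\psi^{(1)}(x)$ form a 2-adic orthonormal wavelet basis in $L^2(\mathbb{Q}_2)$.

Now we will prove a general theorem.

Theorem 5.1. Let $s = 1, 2, \dots$. The function
$$\psi^{(s)}(x) = \sum_{k=0}^{2^s-1}\alpha_k\,\psi^{(0)}\Big(x + \frac{k}{2^s}\Big) \tag{5.4}$$
is a wavelet function (whose dilatations and shifts form a 2-adic orthonormal wavelet basis in $L^2(\mathbb{Q}_2)$) if and only if
$$\alpha_k = 2^{-s}(-1)^k\sum_{r=0}^{2^s-1}\gamma_r\,e^{-i\pi\frac{2r+1}{2^s}k}, \quad k = 0, 1, 2, \dots, 2^s-1, \tag{5.5}$$
where $\gamma_k \in \mathbb{C}$, $|\gamma_k| = 1$.

Proof. Suppose that $\psi^{(s)}(x)$, $s \geq 1$, is given by formula (5.4). Since the system $\psi^{(0)}(\cdot + a)$, $a \in I_2$, is orthonormal (see Subsec. 4.3), in view of relation (5.1) it is easy to see that $\psi^{(s)}$ and $\psi^{(s)}(\cdot + a)$ are orthogonal for any $a \in I_2$, $a \neq \frac{k}{2^s}$, $k = 0, 1, \dots, 2^s-1$. Thus the system of functions $\psi^{(s)}(x + a)$, $a \in I_2$, is orthonormal if and only if the system of functions consisting of the function (5.4) and its shifts, i.e.,
$$\psi^{(s)}\Big(x + \frac{r}{2^s}\Big) = -\alpha_{2^s-r}\psi^{(0)}(x) - \alpha_{2^s-r+1}\psi^{(0)}\Big(x + \frac{1}{2^s}\Big) - \cdots - \alpha_{2^s-1}\psi^{(0)}\Big(x + \frac{r-1}{2^s}\Big) + \alpha_0\psi^{(0)}\Big(x + \frac{r}{2^s}\Big) + \cdots + \alpha_{2^s-r-1}\psi^{(0)}\Big(x + \frac{2^s-1}{2^s}\Big), \tag{5.6}$$
$r = 0, 1, \dots, 2^s-1$, is orthonormal.

Set $\Xi^{(0)} = \{\psi^{(0)}(\cdot + \frac{k}{2^s}) : k = 0, 1, \dots, 2^s-1\}^T$ and $\Xi^{(s)} = \{\psi^{(s)}(\cdot + \frac{k}{2^s}) : k = 0, 1, \dots, 2^s-1\}^T$. In view of (5.4), (5.6), $\Xi^{(s)} = D\,\Xi^{(0)}$, where
$$D = \begin{pmatrix}
\alpha_0 & \alpha_1 & \alpha_2 & \dots & \alpha_{2^s-2} & \alpha_{2^s-1}\\
-\alpha_{2^s-1} & \alpha_0 & \alpha_1 & \dots & \alpha_{2^s-3} & \alpha_{2^s-2}\\
-\alpha_{2^s-2} & -\alpha_{2^s-1} & \alpha_0 & \dots & \alpha_{2^s-4} & \alpha_{2^s-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
-\alpha_2 & -\alpha_3 & -\alpha_4 & \dots & \alpha_0 & \alpha_1\\
-\alpha_1 & -\alpha_2 & -\alpha_3 & \dots & -\alpha_{2^s-1} & \alpha_0
\end{pmatrix}. \tag{5.7}$$
Thus the system $\Xi^{(s)}$ is orthonormal if and only if the matrix $D$ is unitary.

Let $u = (\alpha_0, \alpha_1, \dots, \alpha_{2^s-1})^T$ be a vector and let
$$A = \begin{pmatrix}
0 & 0 & \dots & 0 & 0 & -1\\
1 & 0 & \dots & 0 & 0 & 0\\
0 & 1 & \dots & 0 & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & \dots & 1 & 0 & 0\\
0 & 0 & \dots & 0 & 1 & 0
\end{pmatrix}$$
be a $2^s \times 2^s$ matrix. It is easy to see that
$$A^r u = (-\alpha_{2^s-r}, -\alpha_{2^s-r+1}, \dots, -\alpha_{2^s-1}, \alpha_0, \alpha_1, \dots, \alpha_{2^s-r-1})^T, \quad r = 1, 2, \dots, 2^s-1.$$
Thus $D = \big(u, Au, \dots, A^{2^s-1}u\big)^T$. It is significant that $A^{2^s}u = -u$.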
Consequently, in order to describe all matrices $D$ (or, in other words, all vectors $u$), we should find all vectors $u = (\alpha_0, \alpha_1, \dots, \alpha_{2^s-1})^T$ such that the system $\{A^r u : r = 0, 1, 2, \dots, 2^s-1\}$ is orthonormal. In view of the fact that the system $\psi^{(0)}(x + a)$, $a \in I_2$, forms an orthonormal basis in $W_0$, it is easy to see that the vector $u_0 = (1, 0, \dots, 0, 0)^T$ is one of the above-mentioned vectors $u$. That is, the system composed of the vectors $u_0$ and $A^r u_0 = (\delta_{0r}, \delta_{1r}, \dots, \delta_{2^s-2\,r}, \delta_{2^s-1\,r})^T$, $r = 1, 2, \dots, 2^s-1$, is orthonormal, where $\delta_{ir}$ is the Kronecker symbol.

Let us prove that a vector $u = (\alpha_0, \alpha_1, \dots, \alpha_{2^s-1})^T$ is such that the system $A^r u$, $r = 0, 1, 2, \dots, 2^s-1$, is orthonormal if and only if it can be expressed by the formula $u = Bu_0$, where $B$ is a unitary matrix such that $AB = BA$.

Indeed, let $u = Bu_0$, where $B$ is a unitary matrix such that $AB = BA$. Then $A^r u = BA^r u_0$, $r = 0, 1, 2, \dots, 2^s-1$. Since the system $A^r u_0$, $r = 0, 1, 2, \dots, 2^s-1$, is orthonormal and the matrix $B$ is unitary, the vectors $A^r u$, $r = 0, 1, 2, \dots, 2^s-1$, are orthonormal.

Conversely, if the system $A^r u$, $r = 0, 1, 2, \dots, 2^s-1$, is orthonormal, then, taking into account that the system $A^r u_0$, $r = 0, 1, 2, \dots, 2^s-1$, is orthonormal, we conclude that there exists a unitary matrix $B$ such that $A^r u = B(A^r u_0)$, $r = 0, 1, 2, \dots, 2^s-1$. Since $A^{2^s}u = -u$ and $A^{2^s}u_0 = -u_0$, we have the additional relation $A^{2^s}u = BA^{2^s}u_0$. It follows from the above relations that $(AB - BA)(A^r u_0) = 0$, $r = 0, 1, 2, \dots, 2^s-1$. Since the vectors $A^r u_0$, $r = 0, 1, 2, \dots, 2^s-1$, form a basis in the $2^s$-dimensional space, we conclude that $AB = BA$.

Thus we have $D = \big(Bu_0, BAu_0, \dots, BA^{2^s-1}u_0\big)^T$.

It is clear that the eigenvalues of $A$ and the corresponding normalized eigenvectors are
$$\lambda_r = -e^{i\pi\frac{2r+1}{2^s}} \tag{5.8}$$
and $v_r = \big((v_r)_0, \dots, (v_r)_{2^s-1}\big)^T$, respectively, where
$$(v_r)_l = 2^{-s/2}(-1)^l e^{-i\pi\frac{2r+1}{2^s}l}, \quad l = 0, 1, 2, \dots, 2^s-1, \tag{5.9}$$
$r = 0, 1, 2, \dots, 2^s-1$.
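The eigenpairs (5.8), (5.9) and the coefficient formula (5.5) can be verified numerically for small $s$. The sketch below is an illustration under stated assumptions: (5.5) is read as $\alpha_k = 2^{-s}(-1)^k\sum_{r}\gamma_r e^{-i\pi(2r+1)k/2^s}$, the matrix $A$ acts as $Au = (-u_{2^s-1}, u_0, \dots, u_{2^s-2})^T$, and all function names are hypothetical helpers. It checks that $Av_r = \lambda_r v_r$ and that, for unimodular $\gamma_r$, the vectors $A^r u$ (the rows of $D$) are orthonormal.

```python
import cmath
import math


def apply_A(u):
    """Signed cyclic shift: (A u) = (-u[n-1], u[0], ..., u[n-2])."""
    return [-u[-1]] + list(u[:-1])


def inner(u, v):
    """Hermitian inner product on C^n."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def eigenpair(s, r):
    """Eigenvalue (5.8) and normalized eigenvector (5.9) of A."""
    n = 2 ** s
    t = math.pi * (2 * r + 1) / n
    lam = -cmath.exp(1j * t)
    v = [n ** -0.5 * (-1) ** l * cmath.exp(-1j * t * l) for l in range(n)]
    return lam, v


def alphas(s, gammas):
    """Coefficients alpha_k of (5.5) for unimodular gammas[0..2**s - 1]."""
    n = 2 ** s
    return [
        (-1) ** k / n
        * sum(g * cmath.exp(-1j * math.pi * (2 * r + 1) * k / n)
              for r, g in enumerate(gammas))
        for k in range(n)
    ]


def rows_of_D(u):
    """Rows u, Au, ..., A^(n-1) u of the matrix D in (5.7)."""
    rows = [list(u)]
    for _ in range(len(u) - 1):
        rows.append(apply_A(rows[-1]))
    return rows


def is_orthonormal(rows, tol=1e-10):
    """Check that the Gram matrix of the rows is the identity."""
    return all(
        abs(inner(rows[i], rows[j]) - (1.0 if i == j else 0.0)) < tol
        for i in range(len(rows)) for j in range(len(rows))
    )
```

For example, with `gammas = [cmath.exp(1j * (0.3 + 0.7 * r * r)) for r in range(4)]`, the call `is_orthonormal(rows_of_D(alphas(2, gammas)))` returns `True`, and choosing all $\gamma_r = 1$ recovers $u_0 = (1, 0, \dots, 0)^T$ up to rounding.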
As is well known, the matrix $A$ can be represented as $A = C\tilde{A}C^{-1}$, where
$$\tilde{A} = \begin{pmatrix}\lambda_0 & 0 & \dots & 0\\ 0 & \lambda_1 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \lambda_{2^s-1}\end{pmatrix}$$
is a diagonal matrix and $C = \big(v_0, v_1, \dots, v_{2^s-1}\big)$. Since $C$ is a unitary matrix, the matrix $B = C\tilde{B}C^{-1}$ is unitary if and only if $\tilde{B}$ is unitary. On the other hand, $AB = BA$ if and only if $\tilde{A}\tilde{B} = \tilde{B}\tilde{A}$. Moreover, since according to (5.8) $\lambda_k \neq \lambda_l$ whenever $k \neq l$, all unitary matrices $\tilde{B}$ such that $\tilde{A}\tilde{B} = \tilde{B}\tilde{A}$ are given by
$$\tilde{B} = \begin{pmatrix}\gamma_0 & 0 & \dots & 0\\ 0 & \gamma_1 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \gamma_{2^s-1}\end{pmatrix},$$
where $\gamma_k \in \mathbb{C}$, $|\gamma_k| = 1$. Hence, all unitary matrices $B$ such that $AB = BA$ are given by $B = C\tilde{B}C^{-1}$, where $\tilde{B}$ is the above diagonal matrix. By using formula (5.9), one can calculate
$$\alpha_k = (Bu_0)_k = (C\tilde{B}C^{-1}u_0)_k = \sum_{r=0}^{2^s-1}\gamma_r(v_r)_k\overline{(v_r)_0} = 2^{-s}(-1)^k\sum_{r=0}^{2^s-1}\gamma_r\,e^{-i\pi\frac{2r+1}{2^s}k}, \quad k = 0, 1, 2, \dots, 2^s-1,$$
where $\gamma_k \in \mathbb{C}$, $|\gamma_k| = 1$. Thus (5.5) holds.

Taking into account that $\Xi^{(0)} = D^{-1}\Xi^{(s)}$, we conclude that if we define $\psi^{(s)}(x)$ by formula (5.4), where $\alpha_k$ is given by (5.5), $k = 0, 1, 2, \dots, 2^s-1$, then the system of functions $\{\psi^{(s)}(\cdot - a) : a \in I_2\}$ is orthonormal and forms an orthonormal basis in $W_0$. Consequently, all dilatations and shifts of the function (5.4) form a 2-adic orthonormal wavelet basis in $L^2(\mathbb{Q}_2)$. □

It is clear that
$$\int_{\mathbb{Q}_2}\psi^{(s)}_{\gamma a}(x)\,dx = 0,$$
and in view of Lemma 3.1, $\psi^{(s)}_{\gamma a}(x)$ belongs to the Lizorkin space $\Phi(\mathbb{Q}_2)$.

5.2. Real wavelets. Using formulas (5.5), one can extract all real wavelet functions (5.4).

Let $s = 1$. According to (5.2), (5.3),
$$\psi^{(1)}(x) = \cos\theta\,\psi^{(0)}(x) + \sin\theta\,\psi^{(0)}\Big(x + \frac{1}{2}\Big) \tag{5.10}$$
is a real wavelet function.

Let $s = 2$. Set $\gamma_r = e^{i\theta_{r+1}}$, $r = 0, 1, 2, \dots, 2^s-1$. Then (5.5) implies that the wavelet function $\psi^{(2)}(x)$ is real if and only if
$$\sin\theta_1 + \sin\theta_2 + \sin\theta_3 + \sin\theta_4 = 0,$$
$$\cos\theta_1 - \cos\theta_2 + \cos\theta_3 - \cos\theta_4 = 0,$$
$$\sin\theta_1 - \sin\theta_2 - \sin\theta_3 + \sin\theta_4 = \cos\theta_1 + \cos\theta_2 - \cos\theta_3 - \cos\theta_4,$$
$$\sin\theta_1 - \sin\theta_2 - \sin\theta_3 + \sin\theta_4 = -(\cos\theta_1 + \cos\theta_2 - \cos\theta_3 - \cos\theta_4).$$
The last relations are equivalent to the system
$$\sin\theta_1 = -\sin\theta_4, \quad \cos\theta_1 = \cos\theta_4, \quad \sin\theta_2 = -\sin\theta_3, \quad \cos\theta_2 = \cos\theta_3.$$
Thus for $s = 2$ the real wavelet function (5.4) is represented as ψ(1)(x) = (cos θ1 + cos θ2)ψ (cos θ1 − cos θ2 + sin θ1 + sin θ2)ψ(0) (sin θ1 − sin θ2)ψ(0) (5.11) + (cos θ1 − cos θ2 − sin θ1 − sin θ2)ψ(0) In particular, for the special cases θ1 = θ2 = θ, θ1 = −θ2 = θ, θ1 = θ2 + π/2 = θ, we obtain one-parameter families of the real wavelet functions (5.12) ψ(1)(x) = cos θψ(0) + sin θψ(0) ψ(1)(x) = cos θψ(0) sin θψ(0) sin θψ(0) ψ(1)(x) = 1 (cos θ − sin θ)ψ(0) (cos θ + sin θ)ψ(0) (cos θ − sin θ)ψ(0) respectively.

Acknowledgments

The authors are greatly indebted to E. Yu. Panov for fruitful discussions.

References

[1] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Associated homogeneous p-adic distributions, J. Math. An. Appl. 313 (2006) 64–83.
[2] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Associated homogeneous p-adic generalized functions, Dokl. Ross. Akad. Nauk 393 no. 3 (2003) 300–303. English transl. in Russian Doklady Mathematics 68 no. 3 (2003) 354–357.
[3] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Harmonic analysis in the p-adic Lizorkin spaces: fractional operators, pseudo-differential equations, p-adic wavelets, Tauberian theorems, Journal of Fourier Analysis and Applications 12 no. 4 (2006) 393–425.
[4] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Pseudo-differential operators in the p-adic Lizorkin space, p-Adic Mathematical Physics. 2nd International Conference, Belgrade, Serbia and Montenegro, 15–21 September 2005, Eds: Branko Dragovich, Zoran Rakic, Melville, New York, 2006, AIP Conference Proceedings, March 29, 2006, Vol. 826, Issue 1, pp. 195–205.
[5] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, p-Adic semi-linear evolutionary pseudo-differential equations in the Lizorkin space, to appear in Dokl. Ross. Akad. Nauk (2007). English transl. in Russian Doklady Mathematics (2007).
[6] I.Ya. Aref′eva, B.G. Dragovic, and I.V. Volovich, On the adelic string amplitudes, Phys. Lett. B 209 no. 4 (1998) 445–450.
[7] V.A. Avetisov, A.H. Bikulov, S.V. Kozyrev, and V.A. Osipov, p-Adic models of ultrametric diffusion constrained by hierarchical energy landscapes, J. Phys. A: Math. Gen. 12 (2002) 177–189.
[8] J.J. Benedetto and R.L. Benedetto, A wavelet theory for local fields and related groups, The Journal of Geometric Analysis 3 (2004) 423–456.
[9] R.L. Benedetto, Examples of wavelets for local fields, Wavelets, Frames, and Operator Theory (College Park, MD, 2003), Am. Math. Soc., Providence, RI (2004) 27–47.
[10] A.H. Bikulov and I.V. Volovich, p-Adic Brownian motion, Izvestia Akademii Nauk, Seria Math. 61 no. 3 (1997) 537–552.
[11] I.M. Gel′fand, M.I. Graev and I.I. Piatetskii-Shapiro, Generalized functions. Vol. 6: Representation theory and automorphic functions. Nauka, Moscow, 1966.
[12] A. Haar, Zur Theorie der orthogonalen Funktionensysteme, Math. Ann. 69 (1910) 331–371.
[13] A. Khrennikov, p-Adic valued distributions in mathematical physics. Kluwer Academic Publ., Dordrecht, 1994.
[14] A. Khrennikov, Non-archimedean analysis: quantum paradoxes, dynamical systems and biological models. Kluwer Academic Publ., Dordrecht, 1997.
[15] A. Khrennikov, Information dynamics in cognitive, psychological, social and anomalous phenomena. Kluwer Academic Publ., Dordrecht, 2004.
[16] A.Yu. Khrennikov and S.V. Kozyrev, Wavelets on ultrametric spaces, Applied and Computational Harmonic Analysis 19 (2005) 61–76.
[17] A.Yu. Khrennikov and S.V. Kozyrev, Pseudodifferential operators on ultrametric spaces and ultrametric wavelets, Izvestia Akademii Nauk, Seria Math. 69 no. 5 (2005) 133–148.
[18] A.Yu. Khrennikov, V.M. Shelkovich, p-Adic multidimensional wavelets and their application to p-adic pseudo-differential operators (2006), preprint at http://arxiv.org/abs/math-ph/0612049
[19] A.N. Kochubei, Pseudo-differential equations and stochastics over non-archimedean fields, Marcel Dekker, Inc., New York, Basel, 2001.
[20] S.V. Kozyrev, Wavelet analysis as a p-adic spectral analysis, Izvestia Akademii Nauk, Seria Math. 66 no. 2 (2002) 149–158.
[21] S.V. Kozyrev, p-Adic pseudodifferential operators: methods and applications, Proc. Steklov Inst. Math. 245, Moscow (2004) 154–165.
[22] S.V. Kozyrev, p-Adic pseudodifferential operators and p-adic wavelets, Theor. Math. Physics 138 no. 3 (2004) 1–42.
[23] S.V. Kozyrev, V.Al. Osipov, V.A. Avetisov, Nondegenerate ultrametric diffusion, J. Math. Phys. 46 no. 6 (2005) 15 pp.
[24] P.I. Lizorkin, Generalized Liouville differentiation and the functional spaces $L_p^r(E_n)$. Imbedding theorems (Russian), Mat. Sb. (N.S.) 60(102) (1963) 325–353.
[25] P.I. Lizorkin, Operators connected with fractional differentiation, and classes of differentiable functions (Russian), Studies in the theory of differentiable functions of several variables and its applications, IV. Trudy Mat. Inst. Steklov. 117 (1972) 212–243.
[26] S. Mallat, Multiresolution representation and wavelets, Ph.D. Thesis, University of Pennsylvania, Philadelphia, PA, 1988.
[27] S. Mallat, An efficient image representation for multiscale analysis, in: Proc. of Machine Vision Conference, Lake Tahoe, 1987.
[28] Y. Meyer, Ondelettes et fonctions splines, Séminaire EDP, Paris, December 1986.
[29] S.G. Samko, Hypersingular integrals and their applications. Taylor & Francis, London, 2002.
[30] S.G. Samko, A.A. Kilbas, and O.I. Marichev, Fractional integrals and derivatives and some of their applications. Minsk, Nauka i Tekhnika, 1987 (in Russian); English translation: Fractional integrals and derivatives. Theory and applications, Gordon and Breach, London, 1993.
[31] I. Novikov, V. Protassov, and M. Skopina, Wavelet Theory. Moscow: Fizmatlit, 2005.
[32] M.H. Taibleson, Harmonic analysis on n-dimensional vector spaces over local fields. I. Basic results on fractional integration, Math. Annalen 176 (1968) 191–207.
[33] M.H. Taibleson, Fourier analysis on local fields. Princeton University Press, Princeton, 1975.
[34] V.S. Vladimirov, I.V. Volovich and E.I. Zelenov, p-Adic analysis and mathematical physics. World Scientific, Singapore, 1994.
[35] V.S. Vladimirov, I.V. Volovich, p-Adic quantum mechanics, Commun. Math. Phys. 123 (1989) 659–676.
[36] I.V. Volovich, p-Adic string, Class. Quant. Grav. 4 (1987) L83–L87.

Department of Mathematics, St.-Petersburg State Architecture and Civil Engineering University, 2 Krasnoarmeiskaya 4, 190005, St. Petersburg, Russia. Phone: +7 (812) 2517549 Fax: +7 (812) 3165872 E-mail address: shelkv@vs1567.spb.edu

Department of Applied Mathematics and Control Processes, St. Petersburg State University, Universitetskii pr.-35, Petrodvorets, 198504 St. Petersburg, Russia. Phone: +7 (812) 51326090 Fax: +7 (812) E-mail address: skopina@ms1167.spb.edu

ABSTRACT

In this paper, the notion of {\em $p$-adic multiresolution analysis (MRA)} is introduced. We use a ``natural'' refinement equation whose solution (a refinable function) is the characteristic function of the unit disc. This equation reflects the fact that the characteristic function of the unit disc is the sum of $p$ characteristic functions of disjoint discs of radius $p^{-1}$. The case $p=2$ is studied in detail. Our MRA is a 2-adic analog of the real Haar MRA.
But in contrast to the real setting, the refinable function generating our Haar MRA is periodic with period 1, which never holds for real refinable functions. This fact implies that there exist infinitely many different 2-adic orthonormal wavelet bases in ${\cL}^2(\bQ_2)$ generated by the same Haar MRA. All of these bases are constructed. Since $p$-adic pseudo-differential operators are closely related to wavelet-type bases, our bases can be intensively used for applications.
<|endoftext|><|startoftext|>
1 Introduction
2 Review
2.1 KKLT
2.2 Consistency of KKLT
2.3 Large volume scenario (LVS)
3 String loop corrections to LVS
3.1 From toroidal orientifolds to Calabi-Yau manifolds
3.2 LVS with loop corrections
3.3 The $\mathbb{P}^4_{[1,1,1,6,9]}$ model
4 Gaugino masses
4.1 Including loop corrections
4.2 Other soft terms
5 LVS for other classes of Calabi-Yau manifolds?
5.1 Abundance of “Swiss cheese” Calabi-Yau manifolds
5.2 Toroidal orientifolds
5.3 Fibered Calabi-Yau manifolds
6 Further corrections
7 Conclusions
A Some details on LVS
A.1 LVS for $\mathbb{P}^4_{[1,1,1,6,9]}$
A.2 Many Kähler moduli
B Loop corrected inverse Kähler metric for $\mathbb{P}^4_{[1,1,1,6,9]}$
C No-scale Kähler potential in type II string theory
C.1 No-scale structure in type IIA
C.2 No-scale structure in type IIB
C.3 Cancellation with just the volume modulus
C.4 Cancellation with many Kähler moduli
C.5 Perturbative corrections to $V_{np1}$ and $V_{np2}$
D KK spectrum with fluxes
E The orientifold calculation
F Factorized approximation
F.1 Factorized approximation of the scalar potential

1 Introduction

The KKLT strategy [1,2] for producing stabilized string vacua that can serve as a starting point for phenomenology has been a source of great interest for the last few years. The “large-volume scenario” (LVS) [3,4] is an extension of KKLT where string corrections to the tree-level supergravity effective action computed in [5] play a significant role, and where the compactification volume can be as large as $10^{15}$ in string units. In LVS, work has been done on soft supersymmetry breaking [4,6,7], the QCD axion [8,9], neutrino masses [10], inflation [11–14], and even first attempts at LHC phenomenology [15].

Although tantalizing, the models discussed in the aforementioned papers (nominally “string compactifications”) raise many questions. It remains an open problem to construct complete KKLT models in string theory, as opposed to supergravity. Problems one faces include the description of RR fluxes in string theory, showing that the necessary nonperturbative effects actually can and do appear in a way consistent with other contributions to the potential (for progress in this direction, see [16–32]), and verifying that one can uplift to a Minkowski or de Sitter vacuum without ruining stabilization [26,33–36]. In LVS, since string corrections play a crucial role, striving for actual string constructions seems quite important.
In the end, the restrictiveness this entails may greatly improve predictivity, or kill the models completely as string compactifications.

In this paper, we will not improve on the consistency of KKLT or LVS in general, but rather assume the existence of LVS models in string theory, and then perform self-consistency checks. This is a modest step on the way towards reconciling phenomenologically promising scenarios with underlying string models. We will see that although a priori the situation looks very bleak, and one might have hastily concluded that even our modest consistency check would put very strong constraints on LVS, things are more interesting. It turns out that LVS jumps through every hoop we present it with, and instead of broad qualitative changes, we find only small quantitative changes.

The main difference between KKLT and LVS is that LVS includes a specific string $\alpha'$ correction $\Delta K_{\alpha'}$ in the Kähler potential $K$ of the four-dimensional $N = 1$ effective supergravity. Naturally, the four-dimensional string effective action also contains other string corrections. Here, we will focus on $g_s$ corrections due to sources (D-branes and O-planes). For some $N = 1$ and $N = 2$ toroidal orientifolds, these corrections were computed in [37] (see also [38]; for a comprehensive introduction to orientifolds, see [39]). Compared to the $\alpha'$ correction $\Delta K_{\alpha'}$ considered in LVS, the $g_s$ corrections to the Kähler potential $\Delta K_{g_s}$ will scale as
$$\Delta K_{\alpha'} : \Delta K_{g_s} \sim \mathcal{O}(\alpha'^3) : \mathcal{O}(g_s^2\alpha'^2) \quad \text{(string frame)}. \tag{1}$$
By naive dimensional analysis, one would expect that in a $1/\mathcal{V}$ expansion, where $\mathcal{V}$ is the overall volume in the Einstein frame, eq. (1) implies
$$\Delta K_{\alpha'} \sim \mathcal{O}(g_s^{-3/2}\mathcal{V}^{-1}), \qquad \Delta K_{g_s} \sim \mathcal{O}(g_s\mathcal{V}^{-2/3}) \quad \text{(Einstein frame)}. \tag{2}$$
If there is more than one Kähler modulus, as is usually the case, various combinations of Kähler moduli may appear in $\Delta K_{g_s}$ in eq. (2), and a priori this could lead to even weaker suppression in $1/\mathcal{V}$ than that shown.
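To get a feel for the naive estimate (2), the two suppression factors can be compared numerically. This is only a rough sketch: all order-one coefficients are dropped, and the function names and sample values of $g_s$ are illustrative assumptions rather than anything fixed by the text.

```python
def delta_k_alpha(g_s: float, volume: float) -> float:
    """Naive size of the alpha' correction in eq. (2): g_s^(-3/2) / V."""
    return g_s ** -1.5 / volume


def delta_k_gs(g_s: float, volume: float) -> float:
    """Naive size of the string loop correction in eq. (2): g_s * V^(-2/3)."""
    return g_s * volume ** (-2.0 / 3.0)


def ratio_gs_to_alpha(g_s: float, volume: float) -> float:
    """Ratio of the two estimates, equal to g_s^(5/2) * V^(1/3)."""
    return delta_k_gs(g_s, volume) / delta_k_alpha(g_s, volume)


if __name__ == "__main__":
    volume = 1e15  # the large-volume value quoted in the text
    for g_s in (0.01, 0.1, 0.3):  # illustrative values of the string coupling
        print(f"g_s = {g_s}: ratio = {ratio_gs_to_alpha(g_s, volume):.3g}")
```

For a volume of $10^{15}$ the two estimates are equal at $g_s = 10^{-2}$, while for $g_s = 0.1$ the loop estimate is larger by a factor of roughly $3 \times 10^{2}$.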
However, we will argue that (2) is actually correct as far as the suppression factors in the $1/\mathcal{V}$ expansion go. Nevertheless, even the suppression displayed in (2) seems to be a challenge for LVS, if indeed $\mathcal{V} \sim 10^{15}$. For $\mathcal{V}$ this large, $\Delta K_{g_s}$ would dominate $\Delta K_{\alpha'}$, since we do not expect the string coupling $g_s$ to be stabilized extremely small.

On the other hand, if we are interested in the effects $g_s$ corrections may have on the existence of the large volume minima, the relevant quantity to look at is the scalar potential $V$, rather than the Kähler potential $K$. It turns out that certain cancellations in the expression for the scalar potential leave us with leading correction terms to $V$ that scale as
$$\Delta V_{\alpha'} \sim \mathcal{O}(g_s^{-1/2}\mathcal{V}^{-3}), \qquad \Delta V_{g_s} \sim \mathcal{O}(g_s\mathcal{V}^{-3}). \tag{3}$$
This is already much better news for LVS. However, restoring numerical factors in (3), and with $g_s$ typically not stabilized extremely small, it would seem that $\Delta K_{g_s}$ could still have a significant effect both on stabilization and on the resultant phenomenology (like soft supersymmetry breaking terms, which also depend on the Kähler potential). We will see that although this is indeed so in principle, in practice the models we consider are surprisingly robust against the inclusion of $\Delta K_{g_s}$. The clearest example of this is the calculation of gaugino masses in sec. 4. The result is that for the “11169 model” (analyzed in [6]), the correction to the gaugino masses due to $\Delta K_{g_s}$ is negligible.

Thus, for the most part, LVS survives our onslaught unscathed. We consider this a sign that scenarios such as LVS deserve to be taken seriously as goals to be studied in detail in string theory, even as the caveats above (that apply to any KKLT-like setup) serve to remind us that there is much work left to be done to really understand phenomenologically viable stabilized flux compactifications in string theory.

2 Review

Let us begin with a quick review of the KKLT and large volume scenarios.
For reasons that will become clear, we will want to allow for more than a single Kähler modulus.

2.1 KKLT

The KKLT setup [1,2] is a warped Type IIB flux compactification on a Calabi-Yau (or more generally, F-theory) orientifold, with all moduli stabilized. In this paper, we will neglect warping. For progress towards taking warping into account in phenomenological contexts, see [40,41]. In the four-dimensional $N = 1$ effective supergravity, the Kähler potential and superpotential read
$$K = -\ln(S + \bar{S}) - 2\ln(\mathcal{V}) + K_{cs}(U, \bar{U}), \qquad W = W_{tree} + W_{np} = W(S, U) + \sum_i A_i(S, U)\,e^{-a_iT_i}, \tag{4}$$
where the volume $\mathcal{V}$ is a function of the Kähler moduli $T_i = \tau_i + ib_i$ whose real parts are 4-cycle volumes and whose imaginary parts are axions $b_i$, arising from the integral of the RR 4-form over the corresponding 4-cycles. In particular, the volume $\mathcal{V}$ depends on the $T_i$ only through the real parts $\tau_i$,
$$\mathcal{V} = \mathcal{V}(T_i + \bar{T}_i) = \mathcal{V}(\tau_i), \tag{5}$$
and the nonperturbative superpotential $W_{np}$ a priori depends on the complexified dilaton $S$ and the complex structure moduli $U$. After stabilization of $S$ and $U$ by demanding $D_UW = 0 = D_SW$, we have
$$K = -\ln(S + \bar{S}) - 2\ln(\mathcal{V}) + K_{cs}(U, \bar{U}), \qquad W = W_{tree} + W_{np} = W_0 + \sum_i A_i\,e^{-a_iT_i}. \tag{6}$$
We keep the dependence on the complexified dilaton $S$ and the complex structure moduli $U$ in the Kähler potential for now, since the Kähler metric in the F-term potential
$$V = e^K\left(G^{\bar{J}I}D_{\bar{J}}\bar{W}D_IW - 3|W|^2\right) \tag{7}$$
is to be calculated with the full Kähler potential $K$, including the dependence on $S$ and $U$. (In eq. (7), the index $I$ a priori runs over all moduli, but after fixing the complex structure moduli and the dilaton, only the sum over the Kähler moduli remains.)

The scalar potential $V$ has a supersymmetric AdS minimum at a radius that is barely large enough to make the use of a large-radius effective supergravity self-consistent, typically $\tau \sim 100$ (recall that $\tau$ has units of (length)$^4$).$^{1}$ In addition, to obtain a supersymmetric minimum at all, one needs to tune the flux superpotential $W_0$ to very small values.
That is, the stabilization only works for a small parameter range. This is easy to understand, since we are balancing a nonperturbative term against a tree-level term. Let us briefly digress on the reasons for and implications of this balancing.

$^{1}$ This minimum then has to be uplifted to dS or Minkowski by an additional contribution to the potential. Various mechanisms were suggested in [2,36,42–45].

2.2 Consistency of KKLT

In the previous section we only considered the lowest-order supergravity effective action. As was already noted in the original KKLT paper, $\alpha'$ corrections and $g_s$ corrections (string loops) that appear in addition to the tree-level effective action could in principle affect stabilization. Oftentimes, the logic of string effective actions is that if one such correction matters, they all do, so no reliable physics can be learned from considering the first few corrections. If this is true, one can only consider regimes in which all corrections are suppressed. This is not necessarily so if some symmetry prevents the tree-level contribution to the effective action from appearing, so that the first correction (be it $\alpha'$ or $g_s$) constitutes lowest order. This indeed happens for type IIB flux compactifications; given the tree level Kähler potential (6), if we were to set $W_{np} = 0$, the remaining $K$ and $W$ in (6) produce a no-scale potential, i.e. the scalar potential for the Kähler moduli then vanishes [46]. In KKLT, this no-scale structure is only broken by the nonperturbative contribution to the superpotential $W_{np}$. Since each term in $W_{np}$ is exponentially suppressed in some Kähler modulus, the resulting terms in the potential are also exponentially suppressed.
For instance, for the simpler example of a single modulus $\tau$, the potential (after already fixing the axionic partner along the lines of appendix A.2) reads
$$V = e^K\left[4|A|^2a\tau\,e^{-2a\tau}\left(\tfrac{1}{3}a\tau + 1\right) - 4a\tau|AW_0|\,e^{-a\tau}\right], \tag{8}$$
meaning that even for moderate values of the Kähler modulus $\tau$, all these terms are numerically very small. Corrections in $\alpha'$ and $g_s$, however, are expected to go as powers of Kähler moduli $\tau$, so will dominate the scalar potential for most of parameter space. In particular, it was argued in [3,4] that only for very small values of $W_0$ can perturbative corrections to the Kähler potential be neglected. It was the insight of [3] that even if $W_0$ is $\mathcal{O}(1)$ (which is more generic than the tiny value for $W_0$ required in KKLT), there can still be a competition between the perturbative and nonperturbative corrections to the potential in regions of the Kähler cone where large hierarchies between the Kähler moduli are present. We now review this scenario.

2.3 Large volume scenario (LVS)

As was shown in [5], the no-scale structure (and factorization of moduli space) is broken by perturbative $\alpha'$ corrections to the Kähler potential, such as
$$K = -\ln(2S_1) - 2\ln\left(\mathcal{V} + \tfrac{1}{2}\xi S_1^{3/2}\right) + K_{cs}(U, \bar{U}), \tag{9}$$
where$^{2}$ $\xi = -\zeta(3)\chi/(2(2\pi)^3)$ and $S_1 = \mathrm{Re}\,S$. For large volume $\mathcal{V}$, we see that the perturbative correction goes as a power in the volume,
$$-2\ln\left(\mathcal{V} + \tfrac{1}{2}\xi S_1^{3/2}\right) = -2\ln\mathcal{V} - \frac{\xi S_1^{3/2}}{\mathcal{V}} + \dots, \tag{10}$$
which by the discussion in the previous subsection will dominate in the scalar potential if all Kähler moduli are even moderately large.

$^{2}$ Here $\xi$ differs by a factor $(2\pi)^{-3}$ from [5] because we use the string length $l_s = 2\pi\sqrt{\alpha'}$.

Using the superpotential
$$W = W_0 + W_{np} = W_0 + \sum_i A_i\,e^{-a_iT_i}, \tag{11}$$
the scalar potential has the structure
$$V = V_{np1} + V_{np2} + V_3 = e^K\left\{G^{\bar{j}i}\partial_{\bar{j}}\bar{W}_{np}\partial_iW_{np} + \left[G^{\bar{j}i}K_{\bar{j}}(\bar{W}_0 + \bar{W}_{np})\partial_iW_{np} + \text{c.c.}\right] + \left(G^{\bar{j}i}K_{\bar{j}}K_i - 3\right)|W|^2\right\}. \tag{12}$$
For concrete calculations we will use the model based on the hypersurface of degree 18 in $\mathbb{P}^4_{[1,1,1,6,9]}$ (see [16,47,48] for background information on its topology.
Some comments about generalizations to other models with arbitrary numbers of Kähler moduli are given in appendix A.2). The defining equation is
$$z_1^{18} + z_2^{18} + z_3^{18} + z_4^3 + z_5^2 - 18\psi z_1z_2z_3z_4z_5 - 3\phi z_1^6z_2^6z_3^6 = 0 \tag{13}$$
and it has the Hodge numbers $h^{1,1} = 2$ and $h^{2,1} = 272$ (only two of the complex structure moduli $\psi$ and $\phi$ have been made explicit in (13); moreover, not all of the 272 survive orientifolding). We denote the two Kähler moduli by $T_b = \tau_b + ib_b$ and $T_s = \tau_s + ib_s$, where $\tau_b$ and $\tau_s$ are the volumes of 4-cycles, and the subscripts “b” and “s” are chosen in anticipation of the fact that one of the Kähler moduli ($\tau_b$) will be stabilized big, and the other one ($\tau_s$) will be stabilized small. An interesting property of this model is that it allows expressing the 2-cycle volumes $t_i$ explicitly as functions of the 4-cycle volumes $\tau_j$, so that the total volume of the manifold can be written directly in terms of 4-cycle volumes, yielding
$$\mathcal{V} = \frac{1}{9\sqrt{2}}\left(\tau_b^{3/2} - \tau_s^{3/2}\right), \qquad \tau_b = \frac{(t_s + 6t_b)^2}{2}, \qquad \tau_s = \frac{t_s^2}{2}. \tag{14}$$
Following [4], we are interested in minima of the potential with the peculiar property that one Kähler modulus $\tau_b \sim \mathcal{V}^{2/3}$ is stabilized large and the rest are relatively small (but still large compared to the string scale),
$$a\tau_s \sim \ln\mathcal{V} \sim \ln\tau_b \tag{15}$$
in the case at hand. Thus, we expand the potential around large volume, treating $e^{-a\tau_s}$ as being of the same order as $\mathcal{V}^{-1}$. In the end one has to check that the resulting potential indeed leads to a minimum consistent with the exponential hierarchy $a\tau_s \sim \ln\mathcal{V}$, so that the procedure is self-consistent. Applying this strategy, the scalar potential at leading order in $1/\mathcal{V}$ becomes$^{3}$
$$V_{\mathcal{O}(1/\mathcal{V}^3)} = \left[\frac{12\sqrt{2}\,|A|^2a^2\sqrt{\tau_s}\,e^{-2a\tau_s}}{S_1\mathcal{V}} - \frac{2a|AW_0|\,\tau_s\,e^{-a\tau_s}}{S_1\mathcal{V}^2} + \frac{3\,\xi S_1^{1/2}|W_0|^2}{8\,\mathcal{V}^3}\right]e^{K_{cs}}. \tag{16}$$
From here one can see the existence of the large volume minima rather generally. By the Dine-Seiberg argument [49], the scalar potential goes to zero asymptotically in every direction.
Along the direction (15), for large volume the leading term in (16) is
$$V \sim V_{np2} \propto -\frac{\ln\mathcal{V}}{\mathcal{V}^3}, \tag{17}$$
which is negative, so the potential $V$ approaches zero from below. For moderately small values of the volume, $V$ is positive (this is guaranteed if the Euler number $\chi$ is negative, hence $\xi$ positive), so in between there is a minimum. This minimum is typically nonsupersymmetric, and because we are no longer balancing a tree-level versus a nonperturbative term, we can find minima at large volume, hence the name large volume scenario (LVS).$^{4}$ To be precise, in flux compactifications we move in parameter space by the choice of discrete fluxes, but since $\mathcal{V}$ is exponentially sensitive to parameters like $S_1$, large volume minima appear easy to achieve also by small changes in flux parameters. If we allow for very small values of $W_0$ (so that KKLT minima exist at all), the above minimum can coexist with the KKLT minimum [4,50]. Here, we will allow $W_0$ to take generic values of order one.

$^{3}$ Here we have already stabilized the axion $b_s$, i.e. solved $\partial V/\partial b_s = 0$, which produces the minus sign in the second term; this is also true with many small moduli $\tau_i$. See appendix A.2 for details. Also note that solving $D_UW = 0 = D_SW$ causes the values of $U$ and $S$ at the minimum to depend on the Kähler moduli. However, this dependence arises either from the nonperturbative terms in the superpotential or from the $\alpha'$-correction to the Kähler potential. Thus it would only modify the potential at subleading order in the $1/\mathcal{V}$ expansion.

$^{4}$ By “tree-level” we intend “tree-level supergravity”, i.e. for the purposes of this paper we call both $\alpha'$ and $g_s$ corrections “quantum corrections”.

The astute reader will have noticed that this argument for the existence of the LVS minimum is “one dimensional”, as it only takes into account the behavior of the potential along the direction (15). One must of course check minimization with respect to all Kähler moduli. In [3] a plausibility argument to this effect was given, and the existence of the minimum was checked explicitly in the case of the $\mathbb{P}^4_{[1,1,1,6,9]}$ model by minimizing the potential (16) with respect to the Kähler moduli. In doing so, it is convenient to trade the two independent variables $\{\tau_b, \tau_s\}$ for $\{\mathcal{V}, \tau_s\}$ so that $\partial_{\tau_s}\mathcal{V} = 0$, as then the last term in (16) is independent of $\tau_s$ (this will be different when we include loop corrections). Extremizing with respect to $\tau_s$, and defining
$$X \equiv Ae^{-a\tau_s}, \tag{18}$$
one obtains a quadratic equation for $X$,
$$\left[-\frac{6\sqrt{2}\,a^2}{\sqrt{\tau_s}\,S_1\mathcal{V}}\,(4a\tau_s - 1)\,X^2 + \frac{2a|W_0|}{S_1\mathcal{V}^2}\,(a\tau_s - 1)\,X\right]e^{K_{cs}} = 0. \tag{19}$$
In (18), we chose $A$ to be real, since any phase of $A$ can be absorbed into a shift of the axion $b$ and disappears after minimization with respect to $b$ (see section A.2). Two comments are in order. The quadratic equation (19) has just one meaningful solution ($X = 0$ corresponds to $\tau_s = \infty$). Moreover, when expanding (19) in $1/(a\tau_s)$, the leading terms arise from derivatives of the exponential. Formula (19) is an implicit equation determining $\tau_s$. However, one can easily solve (19) for $X$ and obtain
$$X = Ae^{-a\tau_s} = \frac{\sqrt{2}\,|W_0|\sqrt{\tau_s}}{24\,a\mathcal{V}}\left(1 - \frac{3}{4a\tau_s} + \mathcal{O}\!\left(\frac{1}{(a\tau_s)^2}\right)\right). \tag{20}$$
The hierarchy (15) is obvious in this solution, rendering the procedure self-consistent. One also notices that reasonably large values of $\tau_s$ (e.g. 35) are not difficult to obtain, if $\mathcal{V}$ is stabilized large enough; for example, simply set $a \sim 1$, $A \sim 1$, $W_0 \sim 1$. We fill in the numerical details, following [3], in appendix A.1 (including some further observations).

3 String loop corrections to LVS

As already emphasized, the $\alpha'$ correction proportional to $\xi$ is only one among many corrections in the string effective action. We now consider the effect of string loop corrections on this scenario and what the regime of validity is for including or neglecting those corrections. Volume stabilization with string loop corrections but without nonperturbative effects was considered in [51].
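As a baseline for the loop-corrected analysis, the extremization in $\tau_s$ from section 2.3 can be sketched numerically. The snippet below is a minimal illustration under stated assumptions: the numerical coefficients of the three terms in the leading-order potential ($12\sqrt{2}$, $2$ and $3/8$, with the indicated powers of $S_1$) are assumed values obtained by evaluating (12) with (9) and (14) and should be checked against the original; the overall factor $e^{K_{cs}}$ is dropped; and the parameter values $a = A = W_0 = \xi = 1$, $S_1 = 10$ are purely illustrative. It inverts the nonzero root of the quadratic condition in $X = Ae^{-a\tau_s}$ to find the volume at which a given $\tau_s$ is an extremum, and checks this with a finite difference.

```python
import math


def v_leading(tau_s, vol, a=1.0, A=1.0, W0=1.0, S1=10.0, xi=1.0):
    """Assumed leading-order potential at fixed volume (factor e^{K_cs} dropped)."""
    x = A * math.exp(-a * tau_s)  # X = A e^{-a tau_s}, eq. (18)
    v_np1 = 12.0 * math.sqrt(2.0) * a**2 * math.sqrt(tau_s) * x**2 / (S1 * vol)
    v_np2 = -2.0 * a * abs(W0) * tau_s * x / (S1 * vol**2)
    v_3 = 3.0 * xi * math.sqrt(S1) * W0**2 / (8.0 * vol**3)
    return v_np1 + v_np2 + v_3


def volume_at_extremum(tau_s, a=1.0, A=1.0, W0=1.0):
    """Volume for which tau_s solves dV/dtau_s = 0 (nonzero root of the quadratic in X)."""
    x = A * math.exp(-a * tau_s)
    return (math.sqrt(2.0) * abs(W0) * math.sqrt(tau_s) * (a * tau_s - 1.0)
            / (6.0 * a * x * (4.0 * a * tau_s - 1.0)))


def dv_dtau_s(tau_s, vol, h=1e-5, **params):
    """Central finite difference of the potential in tau_s at fixed volume."""
    return (v_leading(tau_s + h, vol, **params)
            - v_leading(tau_s - h, vol, **params)) / (2.0 * h)


if __name__ == "__main__":
    tau_s = 35.0  # the example value mentioned in the text
    vol = volume_at_extremum(tau_s)
    print(f"tau_s = {tau_s}: volume = {vol:.3e}, dV/dtau_s = {dv_dtau_s(tau_s, vol):.3e}")
```

With these illustrative parameters, $\tau_s = 35$ corresponds to a volume of order $10^{14}$ to $10^{15}$, in line with the large volumes discussed in the text.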
To be precise, the corrections considered in [51] were those of [37], which were computed for toroidal $N = 1$ and $N = 2$ orientifolds. Here, we would need the analogous corrections for smooth Calabi-Yau orientifolds. Needless to say, these are not known. Faced with the fact that the string coupling $g_s$ is stabilized at a finite (and typically not terribly small) value, we propose that attempting to estimate the corrections based on experience with the toroidal case is better than arbitrarily discarding them. As we will see, if our estimates are correct, typically the loop corrections can be neglected, though there may at least be some regions of parameter space where they must be taken into account (see figure 4). (In section 5, we will briefly consider “cousins” of LVS where they cannot be neglected anywhere in parameter space.) Improvement on our guesswork would of course be very desirable.

3.1 From toroidal orientifolds to Calabi-Yau manifolds

We would like to make an educated guess for the possible form of one-loop corrections in a general Calabi-Yau orientifold. All we can hope to guess is the scaling of these corrections with the Kähler moduli $T$ and the dilaton $S$. The dependence on other moduli, like the complex structure moduli $U$, cannot be determined by the following arguments (even in the toroidal orientifolds this dependence was quite complicated).

In order to generalize the results of [37] to the case of smooth Calabi-Yau manifolds, we should first review them and in particular remind ourselves where the various corrections come from in the case of toroidal orientifolds. There, the Kähler potential looks as follows (we will explain the notation as we go along):
$$K = -\ln(2S_1) - 2\ln(\mathcal{V}) + K_{cs}(U, \bar{U}) - \frac{\xi S_1^{3/2}}{\mathcal{V}} + \sum_i \frac{\mathcal{E}^{(K)}_i(U, \bar{U})}{4\tau_iS_1} + \sum_{i \neq j \neq k} \frac{\mathcal{E}^{(W)}_k(U, \bar{U})}{4\tau_i\tau_j}. \tag{21}$$
There are two kinds of corrections.
One comes from the exchange of Kaluza-Klein (KK) modes between D7-branes (or O7-planes) and D3-branes (or O3-planes, both localized in the internal space), which are usually needed for tadpole cancellation, cf. fig. 1. This leads to the first kind of corrections in (21), proportional to $\mathcal{E}^{(K)}_i$, where the superscript $(K)$ reminds us that these terms originate from KK modes. In the toroidal orientifold case, this type of correction is suppressed by the dilaton and a single Kähler modulus $\tau_i$, related to the volume of the 4-cycle wrapped by the D7-branes (or O7-planes, respectively).$^{5}$ We expect an analog of these terms to arise more generally, given that they originate from the exchange of KK states which are present in all compactifications.

$^{5}$ We should mention that there was no additional correction of this kind coming from KK exchange between (parallel) D7-branes in [37] (actually that paper considered the T-dual version with D5-branes, but here we directly translate the result to the D7-brane language). This was due to the fact that in [37] the D7-brane scalars were set to zero. In general we would also expect a correction coming from parallel (or more generally, non-intersecting) D7-branes by exchange of KK-states. These should scale in the same way with the Kähler moduli as those arising from the KK exchange between D3- and D7-branes.

Figure 1: The loop correction $\mathcal{E}^{(K)}$ comes from the exchange of closed strings, or equivalently an open-string one-loop diagram, between the D3-brane and D7-branes (or O7-planes) wrapped on either the small 4-cycle $\tau_s$ (as in a) or the large 4-cycle $\tau_b$ (as in b). The exchanged closed strings carry Kaluza-Klein momentum.

The second type of correction comes from the exchange of winding strings between intersecting stacks of D7-branes (or between intersecting D7-branes and O7-planes). The exchanged strings are wound around non-contractible 1-cycles within the intersection locus of the D7-branes (and O7-planes, respectively), cf. fig. 2. This leads to the second kind of correction in (21) proportional to $\mathcal{E}^{(W)}_i$. The superscript $(W)$ reminds us that these terms arise from the exchange of winding strings. In toroidal orientifolds, this type of correction is suppressed by the two Kähler moduli measuring the volumes of the 4-cycles wrapped by the D7-branes (and O7-planes). One might a priori think that this kind of correction does not generalize easily to a smooth Calabi-Yau which has vanishing first Betti number (and therefore at most torsional 1-cycles). However, the exchanged winding strings are, from the open string point of view, Dirichlet strings with their endpoints stuck on the D7-branes. Thus, the topological condition is on the cycle over which the two D7-brane stacks (or one D7-brane stack and an O7-plane) intersect, as in figure 3. Thus, it depends on the topology of specific cycles within cycles whether winding open strings exist in a given model.$^{6}$

Figure 2: The loop correction $\mathcal{E}^{(W)}$ comes from the exchange of winding strings on the intersection between the small 4-cycle $\tau_s$ and the large 4-cycle $\tau_b$. If this intersection is empty, there are no terms with $\mathcal{E}^{(W)}$.

Figure 3: A D7-brane is wrapped on a 4-cycle A, which intersects the 4-cycle B on a 2-cycle C. For Dirichlet strings, the relevant topological condition (the existence of nontrivial 1-cycles) is on the intersection locus C, not on cycle B or on the whole Calabi-Yau. In other words, without the D-brane, the string on cycle C could have been unwound by sliding it along cycle B (as shown in the figure). With the D-brane, the string on cycle C is stuck.
$^{6}$ The toroidal orientifold case seems to be a bit degenerate. Two stacks of D7-branes intersect along a 2-cycle with the topology of $\mathbb{P}^1$. However, there are point-like curvature singularities along the $\mathbb{P}^1$ at the orbifold point and strings winding around these singular points cannot be contracted without crossing the singularities. This seems to allow for stability of winding strings (at least classically).

Given the expressions in [37] and the subset reproduced in (21) above, it is tempting to conjecture that some terms at one loop might be suppressed only by powers of single Kähler moduli like the $\tau_i$ (and the dilaton): Calabi-Yau: $\Delta K_{g_s}$ for some function $\mathcal{E}$ of the complex structure and open string moduli. If this were the case, the one-loop corrections would typically dominate the $\alpha'$ correction in (21) (which is suppressed by the overall volume $\mathcal{V}$) in the Kähler potential, if there are large hierarchies among the Kähler moduli. However, one should keep in mind that toroidal orientifolds are rather special in that they have very simple intersection numbers. In particular, the overall volume can be written as $\mathcal{V} \sim \tau_it_i$, where there is no summation over $i$ implied. Thus, it is not obvious whether a generalization to the case of a general Calabi-Yau really contains terms suppressed by single Kähler moduli instead of the overall volume. Even though we cannot exclude the presence of such terms, we deem it more likely that the scaling of one-loop corrections to the Kähler potential is not (22), but rather
$$\text{Calabi-Yau:} \qquad \Delta K_{g_s} \sim \sum_a \frac{g^a_K(t, S_1)\,\mathcal{E}^{(K)}_a}{\mathcal{V}} + \sum_q \frac{g^q_W(t, S_1)\,\mathcal{E}^{(W)}_q}{\mathcal{V}}, \tag{23}$$
where the sums run over KK and winding states, respectively.
Also, $E^{(K)}$ and $E^{(W)}$ are again unknown functions of the complex structure and open string moduli, $t$ stands for the 2-cycle volumes (in the Einstein frame; see appendix C.1) and the functions $g_K(t, S_1)$ and $g_W(t, S_1)$ determine the scaling of the KK and winding mode masses with the Kähler moduli and the dilaton.7 As we review in appendix E, in the toroidal orientifold case the suppression by the overall volume arises naturally through the Weyl rescaling to the 4-dimensional Einstein frame. Starting with the ansatz (23) for smooth Calabi-Yau manifolds, the known form (22) for toroidal orientifolds follows simply by substituting $g_K$, $g_W$ and the intersection numbers for the toroidal orientifold case. In particular, $g_K \sim t_i$ for the 2-cycle transverse to the relevant D7-brane, while $g_W \sim t_i^{-1}$ for the 2-cycle along which the two D7-branes intersect. Then, the first of the terms in (23) reduces to $E^{(K)}_i/(S_1 \tau_i)$ for toroidal orientifolds, the second to $E^{(W)}_i/(\tau_j \tau_k)$ with $j \neq i \neq k$, cf. (50).

Our strategy in the following chapters will therefore be to assume a scaling like (23) for the 1-loop corrections to the Kähler potential for general Calabi-Yau spaces. As already mentioned, the dependence on the complex structure and open string moduli cannot be inferred by analogy to the orientifold case. We parameterize our ignorance by keeping the expressions $E$ in (23) as unknown functions of the corresponding moduli.

7 In rewriting the sums over KK and winding states in terms of the functions $g$ and $E$, we assume that the dependence of the corresponding spectra on the complex structure and Kähler moduli factorizes. In the known examples of toroidal orientifolds (with or without world-volume fluxes), this is always the case, cf. [52]. Moreover, in general there can appear several contributions (denoted by $a$ and $q$) depending on which tower of KK or winding states is exchanged in a given process. We will see explicit examples of this in the following.
Then we investigate the consequences of the one-loop terms, depending on the size of these unknown functions at the minimum of the potential for the complex structure and open string moduli. Some further comments on the form of ∆Kgs will appear in section 5.2. 3.2 LVS with loop corrections Thus, allowing for string loop corrections of the form (23) in (9), and expanding the α′ correction as in (10), we can write K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)− W = W0 + −aiTi , (24) where as explained in the previous section, we have not specified the explicit form of the loop corrections E , that are allowed to be functions of U (and in general of the open string moduli, that we neglect in our analysis, assuming that they can be stabilized by fluxes). The Kähler potential for the complex structure moduli Kcs(U, Ū) is left unspecified in (24), indeed we will not need its explicit form. For consistency, we have also included loop corrections to the α′ correction.8 This changes ξ to ξ̃, which is a small change; for S1 = 10, numerically ξ̃ ≈ 1.02 ξ. Neglecting fluxes, the functions gaK and g W are proportional and inversely propor- tional to some 2-cycle volume, respectively. (We will come back to corrections from fluxes in appendix D.) When using a particular basis of 2-cycles (with volumes ti as in appendix C.1), the 2-cycle volume appearing in gaK or g W might be given by a linear combination ta = i citi of the basis cycles ti (and similarly for tq). Depending on which 2-cycle is the relevant one, this linear combination might or might not contain the large 2-cycle tb ∼ V1/3, which always exists in LVS. If it is present in the linear combination, one can neglect the contribution of the small 2-cycles to leading order in a large volume expansion and obtains possible terms proportional to E (K)b S−11 V−2/3 or E (W )b V−4/3, where the subscript b refers to the large 4-cycle τb. 
8We remind the reader that the α′ correction arises from the R4 term in 10 dimensions whose coefficient receives corrections at 1-loop (and from D-instantons). The 1-loop correction amounts to a shift of the prefactor from ξ to ξ̃ = ξ 1 + π 3ζ(3)S2 , see for instance [53] for a review. Before getting into the details, it is hard to resist trying to anticipate what might happen. For those terms that are more suppressed in volume than the ξ̃ term (e.g E (W )b ), one would expect the loop corrections to have little effect on stabilization. They could still represent a small but interesting correction to physical quantities in LVS. For those that are less suppressed in volume than the ξ̃ term (e.g. E (K)b ), one would expect the loop correction to have a huge effect on stabilization, and severely constrain the allowed values for the complex structure moduli and the dilaton in LVS (in particular, constrain them to a region in moduli space where the function E (K)b takes very small values). We will find, however, that this expectation is sometimes too naive. For example, there can be cancellations in the scalar potential that are not obvious from just looking at the Kähler potential. Let us now get into more detail on what happens in the LVS model with loop corrections. 3.3 The P4 [1,1,1,6,9] model We would now like to specify the general form of the Kähler- and superpotential (24) to the case of the P4[1,1,1,6,9] model. In this space, the divisors that produce nonperturbative superpotentials when D7- (or D3-) branes are wrapped around them do not intersect, as reviewed for instance in [48]. Therefore, we do not expect any correction of the E (W ) type in this model (for the generalization to models where there are such intersections, see appendix D). Moreover, we neglect flux corrections to the KK mass spectrum in the main text. 
It is shown in appendix D that, for small fluxes, this correctly captures all the qualitative features we are interested in, and it leads to much clearer formulas. Thus, we now consider the scalar potential resulting from K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)− τbE (K)b τsE (K)s W = W0 + Ae −aTs . (25) As τb is very large the corresponding non-perturbative term in the superpotential of (24) can be neglected, which allowed us to simplify the notation by setting As = A and as = a. The general structure of the scalar potential was already given in (12). The three contributions at leading order (O(V−3)) in the large volume expansion are Vnp1 = e 24a2|A|2τ 3/2s e−2aτs V∆ , (26) Vnp2 = −eKcs 2a|AW0|τse−aτs 6E (K)s , (27) 3eKcs |W0|2 1 ξ̃ + 4(E (K)s )2 , (28) where the axion has already been minimized for, as discussed in section A.2, and 2S1τs − 3E (K)s . (29) The leading α′-correction is the ξ̃ term in V3 above. We now see that it scales with the volume and the string coupling gs = 1/S1 as claimed in the Introduction, in eq. (3). Also the volume dependence of the loop correction (E (K)s term) in V3 is as announced in (3). The gs factors seem to differ from (3); we see g s , g s and g s for Vnp1, Vnp2 and V3, respectively. This is because the gs dependence advertised in (3) arises in models where, unlike in P4[1,1,1,6,9], the E (W ) correction is present as well, cf. appendix D.9 It is also worth mentioning that the loop correction proportional to E (K)s modifies Vnp1 and Vnp2 at leading order in the V-expansion. whereas the α′ correction does not; it only appears in V3. This is so even though both corrections are equally suppressed in the Kähler potential (i.e. ∼ V−1). The reason for this can be traced back to the fact that the loop-correction explicitly depends on τs and not only on the overall volume, cf. the discussion in appendix C.4 and C.5. 
As anticipated, E (K)b and its first derivatives appear only at the next order, O(V−10/3): V10/3 = 2 61/3|W0|2eKcs S31V10/3 (E (K)b )2 + ∂αE (K)b ∂ᾱE , (30) where ∂α = ∂/∂U α and ∂ᾱ = ∂/∂Ū ᾱ and α enumerates the complex structure moduli. For E (K)b = E s = 0, the potential terms at leading order coincide with the original case discussed in (56), cf. appendix A.1. The singularity from zeros of the denominator is an artifact of the expansion as discussed in appendix B. The range of validity is 9There, it is shown that including the effect of fluxes on the KK spectrum might also produce this behavior. limited to the range in moduli space where the denominator ∆ does not become too small. It is also apparent that the loop terms are subleading in a large τs, large S1 expansion. However, depending on the relative values of the parameters {E (K)s , τs, S1}, a truncation to the first terms in such an expansion may or may not be valid. We perform a numerical comparison of the two contributions to V3 in figure 4. Figure 4: The top surface is the α′ correction, the second is the gs correction, and the “red carpet” is 10/∆ (we used the values A = 1,W0 = 1, a = 2π/8). We see that for most of the parameter range, the α′ correction dominates, and only for large E(K)s , with the string coupling gs = 1/S1 not too small, do the contributions become comparable. We can understand the volume dependence of the terms (26)-(30) as follows. The common prefactor eK gives an overall suppression τ−3b ≃ V−2. The quantum corrections obey the rule that a term proportional to 1/τλb in K appears in V3 at order 1/τ (where the +3 comes from the overall eK factor) for all values of λ except for λ = 1. When it does appear, it is generated by the term (KiKi − 3) and breaks the no-scale structure. For λ = 1 there is a cancellation at leading order, so it appears only at order 1/τ 2+3b (see appendix C.3 and C.4). 
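The counting rule just stated can be checked with a few lines of exact arithmetic. The sketch below is only an illustration under stated assumptions: it uses the LVS relation $\tau_b \sim V^{2/3}$, reads the rule as "a term proportional to $1/\tau_b^{\lambda}$ in $K$ enters the potential at order $1/\tau_b^{\lambda+3}$, except for $\lambda = 1$, where it enters at $1/\tau_b^{2+3}$", and the function name `potential_volume_power` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def potential_volume_power(lam):
    """Power p such that a 1/tau_b**lam term in K enters the potential as V**p.

    Assumes tau_b ~ V**(2/3); the special case lam == 1 encodes the
    leading-order cancellation described in the text.
    """
    lam = Fraction(lam)
    tau_power = (2 if lam == 1 else lam) + 3   # exponent of 1/tau_b in the potential
    return -Fraction(2, 3) * tau_power         # convert tau_b powers into V powers

# alpha' and E_s^(K) corrections: lam = 3/2 gives order V**(-3)
print(potential_volume_power(Fraction(3, 2)))  # -3
# E_b^(K) correction: lam = 1 gives order V**(-10/3)
print(potential_volume_power(1))               # -10/3
```

Both outputs agree with the orders quoted for (26)-(28) and (30), namely $O(V^{-3})$ and $O(V^{-10/3})$.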
This rule can explicitly be verified in our calculation: the $\alpha'$ and the $E^{(K)}_s$ corrections are suppressed by $1/\tau_b^{3/2}$ in $K$, and therefore they appear with the suppression $1/\tau_b^{9/2}$ in $V_3$. On the other hand, for the $E^{(K)}_b$ term a cancellation takes place to leading order ($\lambda = 1$). It appears neither in $V_{np1}$ nor in $V_{np2}$ at leading order (which can be understood more generally, cf. appendix C.5). Thus, it only appears subleading in the potential, at $O(V^{-10/3})$.10

10 This cancellation for $\lambda = 1$ was already noticed in [51], albeit in the case without nonperturbative superpotential. In [54] it was argued that this cancellation can be understood from a field redefinition

We now proceed to minimize the potential (26)-(28), using the same strategy as in the case without loop corrections, cf. section 2.3 and appendix A.1. The equations $\partial_{\mathcal{V}} V = 0 = \partial_{\tau_s} V$ (with $\mathcal{V}$ the volume and $V$ the potential) are of course more complicated now, but it is easy to solve them numerically. Doing so we find that the volume V and the small 4-cycle volume $\tau_s$, viewed as functions of $S_1$ and $E^{(K)}_s$, are well fit by linear functions when restricted to a sufficiently limited range in parameter space. For example,

range: $S_1 = [8, 11]$, $E^{(K)}_s = [20, 40]$
$\log_{10} V = 1.720\, S_1 - 0.1208\, E^{(K)}_s - 3.437$ , (31)
$\tau_s = 5.000\, S_1 - 0.3581\, E^{(K)}_s - 8.638$ .

The fits are quite good; the error is no greater than $\pm 0.3$ for $\tau_s$ and $\pm 0.1$ for $\log_{10} V$ in this range, for an $\{S_1, E^{(K)}_s\}$ grid of $40^2$ points. From (31) we see an interesting difference to the case without loop corrections. The value of $\tau_s$ at the minimum depends on the complex structure moduli $U$, through $E^{(K)}_s$. This is in contrast to the case without loop corrections, where the value of $\tau_s$ is only determined by the value of the Euler number $\xi$ and the dilaton $S_1$, cf. (57) below. It is analogous to the perturbative stabilization in [51] where the volume at the minimum of the potential also depends on $U$.
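For readers who want to evaluate the fits (31) quickly, here is a minimal sketch. The coefficients and the validity range are the ones quoted above; the function name `fit_minimum` and the sample point $S_1 = 10$, $E^{(K)}_s = 30$ are illustrative choices, and the function deliberately refuses inputs outside the quoted range because the linear fits are not claimed to hold there.

```python
def fit_minimum(S1, E_s):
    """Evaluate the linear fits (31) for log10(V) and tau_s at the minimum.

    The fits are quoted only for 8 <= S1 <= 11 and 20 <= E_s <= 40.
    """
    if not (8 <= S1 <= 11 and 20 <= E_s <= 40):
        raise ValueError("outside the range in which the fits (31) are quoted")
    log10_V = 1.720 * S1 - 0.1208 * E_s - 3.437
    tau_s = 5.000 * S1 - 0.3581 * E_s - 8.638
    return log10_V, tau_s

# Hypothetical sample point inside the fitted range
log10_V, tau_s = fit_minimum(S1=10, E_s=30)
print(f"log10(V) = {log10_V:.3f}, tau_s = {tau_s:.3f}")
```

Both fitted quantities decrease as $E^{(K)}_s$ grows at fixed $S_1$, which makes the dependence of the minimum on the complex structure moduli explicit.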
The result (26)-(28) was derived in a particular model, but we expect the appearance of loop corrections in $V$ to be more general. This opens up the possibility that in principle, one might obtain large volume minima even for manifolds of vanishing (or even positive) Euler number, where LVS is not applicable, as LVS-style stabilization only holds for one sign of $\xi$. In practice it might be difficult to get large enough values for the 1-loop corrections to stabilize $\tau_s$ at a value sufficiently bigger than the string scale. This deserves further study. We also note that the special structure of (28) and (30), i.e. the appearance of $E^{(K)}_s$ only in (28) and of $E^{(K)}_b$ only in (30), offers additional flexibility in tuning the relative size of these terms in a purely perturbative stabilization of the Kähler moduli along the lines of [51,55]. This point also deserves further study.

argument combined with the no-scale structure of the tree-level Kähler potential. That argument holds for the case of a single Kähler modulus $T$ with tree-level Kähler potential $-3 \ln(T + \bar T)$ and under the assumption that the coefficient of the loop correction to the Kähler potential $\sim (T + \bar T)^{-1}$ is independent of the complex structure moduli and the dilaton. Here, these assumptions do not hold, but we showed that the term $\sim (T_b + \bar T_b)^{-1}$ in the Kähler potential nevertheless only appears at subleading order in the potential in LVS, cf. (26)-(30).

4 Gaugino masses

Now that we know how the stabilization of the (Kähler) moduli is modified by loop corrections, it is natural to extend our analysis to the soft supersymmetry breaking Lagrangian (for a review see for instance [56,57]). In LVS, supersymmetry breaking is mostly due to F-terms: $F^s \neq 0$, $F^b \neq 0$. These determine the soft supersymmetry breaking terms which can be present in the low energy effective action without spoiling the hierarchy between the electroweak and the Planck scale,

$L_{\rm eff} = L_{\rm MSSM} + L_{\rm soft}$ .
(32)

The soft Lagrangian contains gaugino masses $M$, scalar masses $m$, further scalar bilinear terms $B$ and trilinear terms $A$. (For explicit expressions, see the aforementioned reviews, or e.g. [6].) Let us start by considering gaugino masses. In [6] it was shown that in LVS, gaugino masses $M_a$ are generically suppressed with respect to the gravitino mass $m_{3/2}$:

$|M_a| \simeq \frac{m_{3/2}}{\ln(1/m_{3/2})} \left[ 1 + O\!\left( \frac{1}{\ln(1/m_{3/2})} \right) \right]$ (33)

(we use units in which $M_{\rm Pl} = 1$). This suppression results from a cancellation of the leading order F-term contribution to gaugino masses. We briefly review this calculation. Given the F-terms

$F^I = e^{K/2} G^{\bar J I} D_{\bar J} \bar W$ , (34)

gaugino masses are given by [56]

$M_a = \frac{1}{2\, {\rm Re} f_a} F^I \partial_I f_a$ , (35)

where $f_a$ are the gauge kinetic functions and $a$ labels the different gauge group factors. In LVS the Standard Model (SM) gauge groups arise from D7-branes wrapped around small 4-cycles. We do not try to go into the details of how to embed the SM concretely, but we mention that different gauge group factors might arise from brane stacks wrapping the same 4-cycle if world volume fluxes are present on the branes. In that case the gauge kinetic functions are given by11

$f_a = \frac{T_a}{4\pi} + h_a(F)\, S + f^{(1)}_a(U)$ , (36)

where $h_a$ depends on the world volume fluxes and we also included a possible 1-loop correction to the gauge kinetic function which depends on the complex structure (and possibly open string) moduli. If several gauge groups arise from branes wrapped around the same cycle, the same Kähler modulus $T$ would appear in all of them. From (36) it is clear why the D7-branes of the SM have to wrap small 4-cycles, because otherwise the gauge coupling would come out too small (unless there is an unnatural cancellation between the different contributions to $f_a$).

11 We use the “phenomenology” normalization of the gauge generators, in the language of [58]; that explains the relative factor of $4\pi$ in (36).
As is also apparent from (36), the gauge kinetic function in general depends not only on the Kähler moduli but also on the dilaton and the complex structure. Thus, according to (35) we need to know FU , F S and F i for the small Kähler moduli.12 From the definition (34), it is clear that FU and F S might be non-vanishing even though we assume DUW = 0 = DSW , provided the inverse metric G J̄I contains mixed components between Kähler moduli on the one hand and complex structure moduli and dilaton on the other hand. Without loop corrections (i.e. considering only the leading α′ correction) there is no mixing between the Kähler and complex structure moduli, and one finds FU = 0 , F S ∼ O(V−2) and F i ∼ O(V−1) (without loop corrections) . (37) Thus, at leading order in the large volume expansion, the sum in (35) only runs over the Kähler moduli. Moreover, taking into account the linear dependence of the gauge kinetic functions (36) on the (small) Kähler moduli, the sum effectively only involves a single term, i.e. F a +O(V−2) , (38) where F a is the F-term of the (small) Kähler modulus appearing in fa. As a concrete example we consider again the P4[1,1,1,6,9] model with only one small Kähler modulus τs. The corresponding F-term is given by F s = eK/2 Gs̄s∂s̄W̄ + (G s̄sKs̄ +G b̄sKb̄)W̄ = 2τse K/2W̄0 − 1 +O((aτs)−2) +O(V−2) , (39) where we used (20) and (61) for the first term and (64) for the second. 12With a slight abuse of notation, we denote the F -terms of the Kähler moduli by the index i, but the F -terms of the other moduli are identified by the symbol for the corresponding modulus, like FS . This is to avoid introducing too many indices. Now the leading order cancellation is obvious in (39). Determining the gaugino masses requires dividing by Refs, cf. (35). In order to further evaluate this, [6,7] as- sumed that the dilute flux approximation fs = (4π) −1Ts is valid, i.e. 
they neglected the contributions from world-volume fluxes and one-loop terms compared to the tree-level term. This puts some constraints on the allowed discrete flux values determining hs. We want to stress that the cancellation appearing in (39) is independent of this approx- imation. We are mainly interested in the fate of this cancellation when including loop corrections, and do not have anything to add concerning phenomenological constraints that may arise from imposing the dilute flux approximation. Using it, the gaugino masses simplify to |Ms| = = eK/2|W0| +O((aτs)−2) ln(1/m3/2) ln(1/m3/2) which is the announced result. In (40) we used m3/2 ∼ |W0|/V and aτs ∼ ln(V/|W0|) , (41) where the second relation holds in LVS due to (20). 4.1 Including loop corrections The previous section was a review of the results found in [6]. Now we ask what changes if one considers the loop corrected Kähler potential (25). A priori, as (40) results from a leading order cancellation, one might wonder whether loop corrections might spoil this small hierarchy between the gaugino and gravitino masses. To address this concern we start by observing that the gaugino masses are still determined by the F-terms of the small Kähler moduli (in the large volume limit). More precisely, the scaling of the F-terms (37) now becomes FU = O(V−2), F S ∼ O(V−2) and F i ∼ O(V−1) (with loop corrections) , (42) i.e. FU no longer vanishes, but it is just as suppressed as F S. We again focus on the P4[1,1,1,6,9] model and ask how (39) is modified by loop cor- rections. Amongst other things, we need to generalize equation (20) to include loop corrections, since we need it to calculate the first term in (39). Thus, we need to extremize the potential again with respect to τs by setting ∂τsV = 4aτs − 1 ∆+ 6E (K)s 2a|W0| V2S1∆2 aτs − 1 ∆2 − 18(E (K)s )2 2aS1τ s E (K)s X (43) − 3|W0| 2(E (K)s )2 4S21V3∆ to zero. Obviously, X = 0 is no longer a solution. 
Instead, there are now two non-trivial solutions, one of which goes to zero in the limit E (K)s → 0. This solution corresponds to a maximum of the potential, so it is of no use to us here. We can expand the other solution for large aτs, as in the case without loop corrections, yielding X = Ae−aτs = 2|W0| 2aE (K)s (aτs)2 . (44) Another ingredient we need is the quantity Gı̄sKı̄, in order to evaluate the second term in (39). Using equation (65) we obtain Gı̄sKı̄ = −2τs 6E (K)s + . . . = −2τs − 2E (K)s − 18(E S21τs 2(E (K)s )3 τ 2s S + . . . , (45) where the ellipsis represents terms that are more suppressed in V−1. Now we see from (44), (45) and (65) that at leading order in an expansion in aτs, the quantities relevant to evaluate (39) are not affected by the loop corrections. Thus, the leading order cancellation in the gaugino masses survives the inclusion of loop effects.13 At first glance, though, equations (44), (45) and (65) seem to suggest a correction to the subleading term, that could potentially give a significant contribution to the gaugino masses after the leading-order cancellation, cf. (39). In the actual calculation, this contribution drops out. Putting all the ingredients together (and employing the dilute flux approximation again), the gaugino mass turns 13One might argue that this result was to be expected, because the main assumption of [6] is that the stabilization is due to nonperturbative effects, i.e. the dominant effect in ∂τsV should arise from the nonperturbative superpotential. However, in view of (43), it is no longer obvious that the nonperturbative terms dominate when loop corrections are included. out to be |Ms| = = 3eK/2|W0| 16a2τ 2s S1 − 12 2aE (K)s 64S1a3τ 3s + . . . ln(1/m3/2) ln(1/m3/2) . (46) The result of [6] is therefore very robust. Unexpectedly, the correction to (40) due to E (K)s only appears at sub-sub-leading order in the 1/ ln(1/m3/2) expansion. 4.2 Other soft terms In [7] all other soft terms were calculated for LVS. 
The main result (see p. 15 of [7]) is that roughly speaking, all the soft parameters are determined by F s and by the power with which the chiral matter metrics scale with τs. As we saw in the previous section, F s gets modified by loop corrections only at sub-sub-leading order in a 1/τs expansion (see (46)). Therefore, the calculation of all the soft terms in [7] appears to be quite robust against including loop effects. One of the key assumptions in [7] is that all the Yukawa couplings Y are already present in perturbation theory, i.e. they have the schematic form Y = Y pert(U) + Y np(e−T ). This requirement featured prominently already in the derivation of the volume dependence of the chiral matter metrics in [59] by scaling arguments. In [7] the same schematic form is essential for simplifying the trilinear soft terms A. In general these terms receive a contribution of the schematic form F T∂T logY = F T ∂T (Y pert(U) + Y np(e−T )) Y pert(U) + Y np(e−T ) ∼ O(e O(T 0) +O(e−T ) , (47) which is exponentially suppressed if and only if Y pert(U) is non-vanishing. However, in many examples the Yukawa couplings are actually only generated nonperturbatively, see for instance the discussion in [60], and [61] for some examples. This poses a con- straint on the way the Standard Model is realized in LVS, if one wants to ensure flavor universality of the soft breaking terms as advertised in [7]. One more comment about the important issue of flavor universality. In [7], section 3.4., it was argued that in LVS, approximate flavor universality is a natural consequence of the zeroth-order factorization of Kähler and complex structure moduli spaces. We provide some more details on the factorized approximation in appendix F. 5 LVS for other classes of Calabi-Yau manifolds? In section 3.3 and 4 we saw that the 1-loop corrections to the moduli Kähler potential only have relatively small effects on the large volume scenario based on the P4 [1,1,1,6,9] model of [3]. 
In this chapter, we would like to ask the question how generic the “Swiss cheese” form is for a Calabi-Yau manifold and if there are other models in which the one-loop corrections discussed above might become more important. This is indeed to be expected if the Calabi-Yau under consideration has a fibered structure, as we explain in the following. If gs corrections do dominate α ′ corrections, they could ruin the volume expansion of LVS. 5.1 Abundance of “Swiss cheese” Calabi-Yau manifolds In the LVS examples discussed in [4] the volume in terms of the Kähler moduli takes the “Swiss cheese” form − . . .− , (48) where the coefficients ai, ..., ci are only non-vanishing for the small Kähler moduli. The LVS limit consists in scaling the overall volume of the Calabi-Yau more or less isotrop- ically while having small holes inside the manifold. The τ ’s are linear combinations of ∂tiV, where now V is considered as a (cubic) function of the 2-cycle volumes ti. For the effective field theory analysis to be valid one should not only demand that the 4-cycle volumes τi are large compared to the string scale, but also that the 2-cycle volumes ti are large. In the cases discussed in [4], the linear combinations ∂tiV are indeed such that one can have one of them exponentially large and the others small (but still sufficiently larger than the string scale), without taking any of the ti to be exponentially small. This is obvious for the P4 [1,1,1,6,9] example where the 2-cycle volume tb only appears in the definition of one of the τ ’s, cf. (14), but it is also true for the second example of [4], cf. their formulas (84). However, the F18 model of [16] does not seem to allow its volume to be written in the form (48) with one Kähler modulus τb that can become large while keeping all the others small (again, demanding that the ti stay larger than 1 in string units). Thus, it is an interesting question how generic or non-generic the “Swiss cheese” Calabi-Yau manifolds are. 
We do not attempt to give a general answer; instead, we turn to two examples in which the form of the volume differs from (48). 5.2 Toroidal orientifolds The reason loop corrections may be more important in toroidal orientifolds than in compactifications on “Swiss cheese” Calabi-Yau manifolds is the following. As we al- ready discussed in section 3.1, the conjectured form of 1-loop corrections (23) simplifies in the case of toroidal orientifolds, because they have very special and simple intersec- tion numbers. More concretely, using the definition τi = ∂tiV, together with the special form of the intersection numbers in the toroidal case, i.e. V = t1t2t3, the volume can alternatively be expressed as V = √τ1τ2τ3 = tiτi (no summation; i = 1, 2 or 3) . (49) Thus, formula (23) simplifies and the 1-loop corrections proportional to E (K)i are only suppressed by single Kähler moduli instead of by the overall volume. Also the terms proportional to E (W )i can be rewritten in the toroidal orientifold case and the Kähler potential takes the form (for the T 6/(Z2 × Z2) example) K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)− V (50) E (K)i (U, Ū) 4τiS1 i 6=j 6=k E (W )k (U, Ū) 4τiτj where the functions E are non-holomorphic Eisenstein series in this case [37]. It is easy to see that the origin of this simplification is the fact that there is just a single non- vanishing intersection number in the toroidal orientifold case and all Kähler moduli appear linearly in the cubic expression for the volume. The difference of the toroidal orientifold to the “Swiss cheese” case of LVS can also be seen in the different forms of the functional dependence of the volume on the Kähler moduli. In the toroidal orientifold case one has the relations ∂t1V = t2t3 , ∂t2V = t1t3 , ∂t3V = t1t2 , (51) so that two of them will always become large if one takes one of the ti to be large and demands that the other two stay larger than 1. This also holds for any linear combinations of them. 
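The relations (49) and (51) are easy to confirm numerically. The following sketch is purely illustrative: the 2-cycle volumes are hypothetical numbers chosen so that one $t_i$ is large while the other two stay of order one, and the helper name `four_cycle_volumes` is ours.

```python
import math

def four_cycle_volumes(t1, t2, t3):
    """tau_i = dV/dt_i for the toroidal volume V = t1*t2*t3, as in eq. (51)."""
    return (t2 * t3, t1 * t3, t1 * t2)

t = (1000.0, 2.0, 3.0)   # hypothetical: one large 2-cycle, two of order one
V = t[0] * t[1] * t[2]
tau = four_cycle_volumes(*t)

# Eq. (49): V = sqrt(tau1*tau2*tau3) = t_i*tau_i for each i (no summation)
assert math.isclose(V, math.sqrt(tau[0] * tau[1] * tau[2]))
assert all(math.isclose(V, ti * taui) for ti, taui in zip(t, tau))

print(tau)   # (6.0, 3000.0, 2000.0): two 4-cycles are large, one stays small
```

With one large 2-cycle, exactly two of the three 4-cycle volumes become large, which is the behaviour described below (51).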
The difference is also obvious from the fact that the 2-cycle volume tb that is responsible for τb becoming large in the LVS examples of [4] always appears cubically in the volume. This is related to the fact that the term (τb + aiτi) should be the square of a linear combination of the ti, in order for (48) to be expressible as a cubic polynomial in the ti. In contrast, any (untwisted) 2-cycle volume in the toroidal orientifold case only appears linearly in the cubic volume polynomial. To illustrate the effect of terms in the Kähler potential that are suppressed only by single Kähler moduli instead of the overall volume, we take the Kähler potential (50) and expand V3 in the region of the Kähler cone where τ1 = τ2 = τb ≫ τ3 = τs (as we explained above, at least two of the Kähler moduli have to become large simultaneously, if one wants to avoid any of the 2-cycle volumes becoming very small). This leads to (for simplicity setting all Ui = U , all E (K)i = E (K) and all E i = E (W )): |W0|2eKcs 2S1V2 (E (K))2 + 1 (∂UŪKcs) −1∂UE (K)∂ŪE (K) 8τ 2s S 3 ξ̃ S E (W ) (E (K))2 + (∂UŪKcs)−1∂UE (K)∂ŪE (K) 4S21τs . (52) Obviously, the leading term in the large τb expansion now comes from the loop correc- tion and not from the α′ term (which term really dominates depends on the values of S1 and U as well, of course). Thus, an expansion of the potential as in LVS, cf. (16), would not be realized in this case, even if one found a way to lift enough zero modes by fluxes for τs to appear in a nonperturbative superpotential. This toy example was meant to show that for a consistent large volume expansion in models with large hierarchies in the Kähler moduli, it is important to make sure that there are no correction terms in the Kähler potential (from loop or α′-corrections) that are suppressed only by some of the small Kähler moduli. We should stress again that also terms suppressed by the large volume can be dangerous if the suppression is less than for the α′ term, i.e. 
if it is τ−λb with λ < 3/2. The only exception to this rule is the case λ = 1 as we showed above (and as is shown more generally in appendices C.4 and C.5). In this respect it would be important to know if the conjecture (25) really bears out. If it turns out that the actual form of the 1-loop corrections also contains terms like tλ1b t , (53) with λ1 + λ2 = 1 but 0 < λ1 6= 1 or 0, such a 1-loop correction would spoil the large volume expansion performed in (16).14 14In principle, one would also need an argument that no such terms arise at higher loop order, which 5.3 Fibered Calabi-Yau manifolds The feature of orientifolds that all Kähler moduli appear linearly in the cubic expression for the volume shows that a similar simplification can occur in the case of (K3) fibered Calabi-Yau manifolds, which also have the property that one Kähler modulus (the one corresponding to the volume of the base) only appears linearly in the cubic expression for the volume. This takes the form V = tbηijtitj + dijktitjtk , (54) where ηij are the intersection numbers of the (K3) fiber, and neither they nor the triple intersection numbers dijk contain the index b, which is chosen to denote “base”, but it is also suggestively the same index as the one we used for the large Kähler modulus in the P4 [1,1,1,6,9] model. Two-parameter examples of this type appear in e.g. [47,62]. In a region of the Kähler moduli space where the base tb is rather large but all the other ti stay relatively small, the volume is approximately V = tbηijtitj . Thus, if the Kähler potential has a 1-loop correction ∼ E (K)b tb/V, it could be approximated in this region E (K)b tb E (K)b , (55) where τf = ηijtitj is the volume of the (K3) fiber (which is small compared to t Obviously, this would lead to a correction to the Kähler potential that is only sup- pressed by a single (small) 4-cycle volume, similar to the toroidal orientifold example we discussed in the last section. 
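The approximation leading to (55) can be illustrated with a toy version of (54). This is a sketch under simplifying assumptions that are not taken from the text: a single fibre 2-cycle $t_f$, so that $V = t_b\, \eta\, t_f^2 + d\, t_f^3$, with hypothetical values for $\eta$, $d$ and $t_f$. It only shows that $t_b/V$ approaches $1/\tau_f$ as the base grows.

```python
def fibered_volume(t_b, t_f, eta=1.0, d=1.0):
    """Toy version of eq. (54) with a single fibre 2-cycle t_f:
    V = t_b * eta * t_f**2 + d * t_f**3 (eta and d are hypothetical numbers)."""
    return t_b * eta * t_f**2 + d * t_f**3

t_f, eta = 2.0, 1.0
tau_f = eta * t_f**2          # fibre volume, tau_f = eta_ij t_i t_j in the toy model

ratios = []
for t_b in (10.0, 1e3, 1e5):
    # (t_b / V) * tau_f should approach 1, i.e. t_b / V -> 1 / tau_f
    ratios.append((t_b / fibered_volume(t_b, t_f, eta)) * tau_f)

print(ratios)   # increases toward 1 as the base t_b becomes large
```

In this toy model a correction of the form $E^{(K)}_b t_b / V$ therefore tends to $E^{(K)}_b / \tau_f$, suppressed only by the small fibre volume.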
We should note that this limit (large base and small fiber for (K3) fibered Calabi-Yau manifolds) is quite different from the one performed in the usual LVS of [3], even though both cases involve hierarchical limits of the Kähler moduli. As explained in section 5.1, the LVS limit consists in scaling the overall volume of the Calabi-Yau more or less isotropically while keeping holes in the bulk of the manifold small. In contrast, the limit of large base and small fiber is anisotropic. At the moment we have nothing to add about whether such anisotropic configurations with all moduli stabilized actually exist. We merely wanted to point out that if they do exist, that would be an example of smooth Calabi-Yau compactifications where the $g_s$ corrections we consider dominate over the $\alpha'$ corrections considered in the large volume limit, as in the toroidal orientifold case.

would, however, have to be further suppressed in the dilaton $S_1$.

6 Further corrections

In [4], further $\alpha'$ corrections to the string effective action beyond the one in (9) were considered. In the case of bulk $\alpha'$ corrections (i.e. those already present in type IIB bulk theory without D-branes, arising from sphere level) scaling arguments were given as to why they are suppressed in the large volume limit. Although that discussion was surprisingly powerful in its simplicity, we do not consider it completely conclusive if large hierarchies between the Kähler moduli exist. After all, dimensional analysis alone does not guarantee that the other $\alpha'$ corrections are always suppressed by additional powers in the overall volume, instead of powers of some of the small Kähler moduli.

Moreover, in addition to the bulk $\alpha'$ corrections that appear at order $O(\alpha'^3)$, in the models of interest for LVS further $\alpha'$ corrections arise on the worldvolume of D-branes and O-planes, cf. [63–70].
These corrections begin already at order $O(\alpha')$, and scaling arguments of the kind used for the bulk corrections do not seem to be sufficient to neglect them. Indeed, there are correction terms involving two powers of the Riemann tensor which do modify the effective D3-brane charge and tension, if the D7-branes are wrapped over 4-cycles with non-vanishing Euler number. These terms were already taken into account in [1]. However, there are further contributions to the DBI action at the same order in $\alpha'$, like $F_3^2 R$ or $F_3^4$, where $F_3$ stands for the RR 3-form field strength, $R$ for the Riemann tensor, and we left index contractions unspecified. If the D7-branes do not break supersymmetry and remain BPS, it seems unlikely that these terms could contribute to the potential for the closed string moduli, i.e. induce some effective D3-brane tension. The reason is that there does not seem to be a corresponding term in the Chern-Simons action that could lead to the necessary modification of the effective D3-brane charge at the same time. This could be checked in more detail. In general, we think that the question of additional corrections to the moduli (Kähler) potential deserves further attention. Here we only outlined some steps in that direction.

7 Conclusions

In this paper, we have investigated whether string loop corrections may impact a) stabilization in the large volume compactification scenario (LVS), and b) the phenomenology of those scenarios, as manifested in the soft supersymmetry breaking terms. The result is that for the specific class of compactification manifolds considered in LVS, so-called "Swiss cheese" Calabi-Yau manifolds, changes are minuscule. Only if the loop corrections become abnormally large (in the toroidal orientifold case, this can happen if the complex structure is stabilized very large) do they affect LVS. For other classes of manifolds, the corrections may be important.
We hasten to add that the detailed expressions for the loop corrections in LVS remain unknown; we have merely tried to infer their scaling with the Kähler moduli from experience in the toroidal orientifold limit. We think it is important to attempt to address this issue, as the string coupling is stabilized at a nonzero value, so the corrections cannot be turned off.

We also stress the (to some readers obvious) fact that there remain a host of issues that must ultimately be dealt with if one wishes to claim that these are "string compactifications".

• We cannot be sure that fluxes do not alter the corrections, since backgrounds with RR and NSNS fluxes are not well understood in string perturbation theory.

• Additional corrections may appear (see section 6) that could be equally threatening to LVS as the loop corrections, or worse.

• In [37] only the corrections to the Kähler potential coming from N = 2 sectors were determined and we based our generalization on those results. However, there might be interesting corrections coming from the N = 1 sectors as well.

• It has not yet been shown that a local Standard Model-like construction can be embedded in the simplest examples like the $\mathbb{P}^4_{[1,1,1,6,9]}$ model. If more general models turn out to be needed, one needs to reconsider whether the requisite nonperturbative superpotentials are generated.

• We have largely ignored open string moduli, under the proviso that they are stabilized heavy, as are the dilaton and complex structure moduli.

• The coefficient $A(S, U)$ in the nonperturbative superpotential is generally assumed to be of order 1. It is not known how generic this is.

• All string computations we have discussed were performed in a supersymmetric context. In LVS supersymmetry is broken already before uplift, in the AdS minimum. Supersymmetry breaking directly in string theory is not very well understood [39,71].

Faced with all these caveats, a pessimist might be inclined to give up.
We think we have shown that it is worth considering these issues in detail. Sometimes, an effect one would have thought to be devastating turns out to be as gentle as a summer breeze.

Acknowledgments

It is a pleasure to thank Carlo Angelantonj, Vijay Balasubramanian, Massimo Bianchi, Joe Conlon, Gottfried Curio, Robbert Dijkgraaf, Michael Douglas, Bogdan Florea, Elias Kiritsis, Max Kreuzer, Fernando Marchesano, Peter Mayr, Thomas Mohaupt, Hans-Peter Nilles, Gabi Pfuff, Fernando Quevedo, Waldemar Schulgin, Mike Schulz, Stephan Stieberger, Angel Uranga, and Alexander Westphal for helpful discussions and comments and Boris Körs for initial collaboration. This work is supported in part by the European Community's Human Potential Program under contract MRTN-CT-2004-005104 "Constituents, fundamental forces and symmetries of the universe". The work of M. B. is supported by European Community's Human Potential Program under contract MRTN-CT-2004-512194, "The European Superstring Theory Network". He would like to thank the Galileo Galilei Institute in Florence for hospitality. M. H. would like to thank the university of Nis for hospitality. The work of M. H. and E. P. is supported by the German Research Foundation (DFG) within the Emmy Noether-Program (grant number: HA 3448/3-1). Both M. B. and M. H. would like to thank the KITP in Santa Barbara for hospitality during the program "String Phenomenology" and the university of Hamburg for hospitality during the workshop "Generalized Geometry and Flux Compactifications".

A Some details on LVS

In this appendix we collect some details on the minimization of the potential in LVS, mainly reviewing the results of [3,8], but filling in some details. The minimization with respect to the axions (i.e.
the imaginary parts of the Kähler moduli) is performed for an arbitrary number of Kähler moduli, while for the minimization of the real parts, we restrict to the example of the hypersurface in $\mathbb{P}^4_{[1,1,1,6,9]}$ discussed throughout the main text.

A.1 LVS for $\mathbb{P}^4_{[1,1,1,6,9]}$

Here we give some more numerical details on large-volume stabilization in the $\mathbb{P}^4_{[1,1,1,6,9]}$ orientifold. The relevant features of this Calabi-Yau have been described in section 2.3. The leading terms of the scalar potential are

$$ V e^{-K_{cs}} = \lambda\, \frac{\sqrt{\tau}\, e^{-2a\tau}}{\mathcal{V}} - \mu\, \frac{\tau\, e^{-a\tau}}{\mathcal{V}^2} + \frac{\nu}{\mathcal{V}^3} \ , \qquad (56) $$

where we use $\tau = \tau_s$ and $\mathcal{V}$ as the independent variables and for the expansion we have in mind the limit (15). The minimum of this potential under the assumption that $a\tau \gg 1$ is given by

$$ \tau = \left(\frac{4\nu\lambda}{\mu^2}\right)^{2/3} , \qquad \mathcal{V} = \frac{\mu}{2\lambda}\, \sqrt{\tau}\, e^{a\tau} \ . \qquad (57) $$

In the $\mathbb{P}^4_{[1,1,1,6,9]}$ orientifold the coefficients $\lambda$, $\mu$ and $\nu$ can be calculated explicitly, yielding

$$ \lambda = \frac{12\sqrt{2}\, a^2 |A|^2}{S_1} \ , \qquad \mu = \frac{2a|A W_0|}{S_1} \qquad \text{and} \qquad \nu = \frac{3\, \xi \sqrt{S_1}\, |W_0|^2}{8} \ . \qquad (58) $$

We notice that the value of $\tau$ at the minimum is determined only by the Euler number, $\tau \propto \chi^{2/3}$, and the value of the dilaton $S_1$ at its minimum. An example of a set of possible parameters (using $a = 2\pi/10$, $A = 1$, $S_1 = 10$ and $W_0 = 10$) is

$$ \xi = -\frac{\zeta(3)\chi}{2(2\pi)^3} \simeq 1.31 \quad \longrightarrow \quad \nu \simeq 155 \ , \quad \lambda \simeq 0.67 \ , \quad \mu = \frac{4\pi}{10} \ . \qquad (59) $$

There is an unknown overall factor $e^{K_{cs}}$ that does not change the shape of the potential and so leaves the position of the minima unchanged. For the parameters given in equation (59), the minimum is at $\tau \simeq 41.1$ and $\mathcal{V} \simeq 9.96 \cdot 10^{11}$. These values come from equation (57), which is only approximate because it relies on the assumption $a\tau \gg 1$. This solution has the shortcoming that, if one is interested in the value of the potential at the minimum, after substitution of (57) into (56) one finds $V = 0$. If instead one solves the exact equation for the minimum of the potential numerically, the result is $V \simeq -6.6 \cdot 10^{-37}$ at the point $\tau \simeq 41.7$ and $\mathcal{V} \simeq 1.38 \cdot 10^{12}$. From this one checks that, apart from the shortcoming that $V = 0$, the approximate solution gives the position of the minimum with good precision.
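The numbers quoted in this subsection can be reproduced with a few lines of Python. The sketch below is only a cross-check under stated assumptions: the closed forms used for $\lambda$, $\mu$, $\nu$ and for the approximate minimum are the ones that reproduce the quoted values $\lambda \simeq 0.67$, $\nu \simeq 155$, $\tau \simeq 41.1$ and $\mathcal{V} \simeq 9.96 \cdot 10^{11}$, and the variable names are ours.

```python
import math

# Inputs quoted in the text: a = 2*pi/10, A = 1, S1 = 10, W0 = 10, xi ~ 1.31.
a, A, S1, W0, xi = 2 * math.pi / 10, 1.0, 10.0, 10.0, 1.31

# ASSUMED coefficient forms, chosen because they reproduce the quoted
# lambda ~ 0.67 and nu ~ 155 for the inputs above.
lam = 12 * math.sqrt(2) * a**2 * A**2 / S1
mu = 2 * a * A * W0 / S1
nu = 3 * xi * math.sqrt(S1) * W0**2 / 8


def potential(tau, vol):
    """Leading LVS potential V * exp(-K_cs) in terms of tau_s and the volume."""
    return (lam * math.sqrt(tau) * math.exp(-2 * a * tau) / vol
            - mu * tau * math.exp(-a * tau) / vol**2
            + nu / vol**3)


# Approximate minimum, valid for a * tau >> 1 (ASSUMED closed form).
tau_min = (4 * nu * lam / mu**2) ** (2 / 3)
vol_min = mu / (2 * lam) * math.sqrt(tau_min) * math.exp(a * tau_min)

print(f"lambda = {lam:.3f}, mu = {mu:.3f}, nu = {nu:.1f}")
print(f"tau = {tau_min:.1f}, volume = {vol_min:.3g}")
print(f"potential at the approximate minimum = {potential(tau_min, vol_min):.1e}")
```

The last line illustrates the shortcoming mentioned above: at the approximate minimum the three terms of the potential cancel up to rounding noise, so the value of the potential itself has to be obtained from the exact minimization.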
A.2 Many Kähler moduli The simple picture of P4 [1,1,1,6,9] , gets slightly more involved in models with more than two Kähler moduli, but some general statements can still be made. For a single small Kähler modulus, among the leading contributions to the potential only the one from Vnp2 is axion dependent, while the leading terms in V3 and Vnp1 are axion independent. For several small Kähler moduli, all three terms are axion dependent. However, the argument that the leading term in Vnp2 only receives a sign change due to axion stabi- lization generalizes (and holds also for the regular KKLT scenario with relatively small volume, see e.g. [8], section 3.2). Indeed, with the superpotential (11) one obtains Vnp1 = e K G̄i aiaj |AiAj |e−aiτi−ajτj cos(−aibi + ajbj + βi − βj) Vnp2 = −2eK aiGk̄iKk̄ |AiW0|e−aiτi cos(−aibi + βi − βW0) +|AiAj |e−aiτi−ajτj cos(−aibi + ajbj + βi − βj) , (60) V3 = e K (Gk̄lKk̄Kl − 3) |W0|2 + 2 |W0Ai|e−aiτi cos(−aibi + βi − βW0) +|AiAj |e−aiτi−ajτj cos(−aibi + ajbj + βi − βj) where Ai = |Ai|eiβi, W0 = |W0|eiβW0 and a sum over repeated indices is understood throughout. As the only dependence on the axions is in form of cosines, one can easily see that this potential has a minimum for aibi = −βW0 + βi + niπ , ni ∈ 2Z+ 1 . (61) We notice that the minimum of the bi depends on the (already fixed) complex structure moduli, but it is independent of the Kähler moduli. In the regime (15) the scalar potential again contains three terms at leading order, Vnp1 ∼ 2eKcs aiaj |AiAj |e−aiτi−ajτjM ljMki (−VVlk + VlVk) + . . . , Vnp2 ∼ −2eKcs ai|AiW0|e−aiτiτi + . . . , (62) V3 ∼ eKcs 8V3 |W0| 2 + . . . , where the sum over i and j effectively only picks up terms from the small moduli because of the exponential suppression of Vnp1 and Vnp2. Moreover, for Vnp1 we used the form (89) for the inverse of the moduli metric with respect to the basis (83). 
The Kähler moduli appearing in the nonperturbative superpotential are linear combinations of these, which we account for by a basis-changing matrix Mki , i.e. Ti =M i T̃k , (63) where T̃k are the fields defined in (83) and Ti are the Kähler moduli appearing in the nonperturbative superpotential. (Another way of saying this is that the real parts of Ti measure the volumes of a basis of divisors that have the right properties to contribute to the nonperturbative superpotential.) In the second term we used Gk̄iKk̄ = −2τi + . . . = −2ReTi + . . . . (64) In the basis (83), this would follow straightforwardly from (87), (89) and the relations (80), but it holds equally well after a change of basis, because both sides of (64) transform linearly under a change of basis (63). Note finally that the ellipsis in (62) and (64) stand for subleading corrections in the large volume limit (assuming also (15)). B Loop corrected inverse Kähler metric for P4 [1,1,1,6,9] We now have a closer look at the inverse metric from the Kähler potential in equation (25). We invert the 4× 4 matrix and focus on the four terms that appear in the scalar potential for the Kähler moduli, Gb̄b = τ 2b +O (τb) , Gb̄s = Gs̄b = 4τbτs 6E (K)s , (65) Gs̄s = 2S1τs where we have performed an expansion in τb ≃ V2/3 and the quantity ∆ was introduced in (29). We notice that only Gb̄b is not corrected at leading order. The apparent divergence from zeros of the denominator ∆ is an artifact of the expansion. In fact, the determinant of the (entire) Kähler metric behaves as detG ∼ Aτ−7/2b +Bτ b + . . . (66) for some expressions15 A and B, which depend on the moduli τs, U and S1. In par- ticular, one finds A ∼ ∆, but B does not vanish at a zero of ∆. Thus, in general the expansion in large τb picks up the factor A, which is responsible for the apparent divergence in (65). However, this is fictitious because when ∆ = 0, the next term proportional to B is non-vanishing and the determinant stays away from zero. 
Indeed, we do not expect to find any zero of the determinant in the range of validity of the parameters. If E (K)s ≪ (S1τs), one can further expand (65) with respect to E (K)s /(S1τs), yielding Gb̄b = τ 2b +O (τb) , Gb̄s = Gs̄b = 4τbτs 2E (K)s ((E (K)s )2 Gs̄s = 3E (K)s√ 2S1τs ((E (K)s )2 Depending on the values of the moduli (τs and S1), this expansion may or may not be useful. In general, only the expansion in τb makes sense and one has to deal with the full expressions (65). That is what we did in section 3.3. C No-scale Kähler potential in type II string the- In this appendix we review why compactification of type IIA and type IIB theory on general Calabi-Yau manifolds, or orientifolds thereof, lead to no-scale (F-term) potentials if i) the superpotential does not depend on the Kähler moduli (68) and if ii) one uses the tree-level form of the Kähler potential. (69) (Of course, in LVS neither i) nor ii) holds, but one can think of jointly imposing i) and ii) as a zeroth-order approximation, that we will successively move away from in later subsections of this appendix.) 15This A has nothing to do with the A in Wnp. If the moduli spaces of Kähler and complex structure moduli factorize (see appendix F for more details on this), and under assumption i), the F-term potential takes the V = eK GĪJDĪW̄DJW − 3|W |2 GābDāW̄DbW + (G ı̄jKı̄Kj − 3)|W |2 . (71) The indices a and b run over the complex structure moduli and the dilaton, i, j over the Kähler moduli and I and J refer to all moduli. The condition for a no-scale potential (V = 0 for the Kähler moduli) is then Gı̄jKı̄Kj = 3 , (72) and we will verify in turn that this is fulfilled in both type IIA and type IIB Calabi- Yau compactifications, if one uses the tree-level Kähler potential, as in assumption ii). In that case, the moduli spaces of Kähler and complex structure moduli do factorize exactly. 
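Condition (72) can also be checked in exact arithmetic before going through the derivations. The sketch below evaluates the type IIB closed forms $K_i = -t_i/(2\mathcal{V})$ and $G^{\bar\imath j} = 4(-\mathcal{V}\mathcal{V}_{ij} + \mathcal{V}_i\mathcal{V}_j)$ for a set of purely hypothetical intersection numbers (they do not correspond to any particular Calabi-Yau) and verifies that the contraction equals 3 and that $G^{\bar k i}K_{\bar k} = -2\mathcal{V}_i$. It assumes the normalization $\mathcal{V} = \frac{1}{6} d_{ijk} t_i t_j t_k$; it tests the closed forms, not the inversion of the metric itself.

```python
from fractions import Fraction
from itertools import permutations


def symmetric_intersections(entries, n):
    """Build a fully symmetric d[i][j][k] from values given for index triples."""
    d = [[[Fraction(0)] * n for _ in range(n)] for _ in range(n)]
    for idx, val in entries.items():
        for i, j, k in set(permutations(idx)):
            d[i][j][k] = Fraction(val)
    return d


def volume_data(d, t):
    """Return (V, V_i, V_ij) for V = d_ijk t_i t_j t_k / 6."""
    r = range(len(t))
    vij = [[sum(d[i][j][k] * t[k] for k in r) for j in r] for i in r]
    vi = [sum(vij[i][j] * t[j] for j in r) / 2 for i in r]
    vol = sum(vi[i] * t[i] for i in r) / 3
    return vol, vi, vij


def no_scale_data_iib(d, t):
    """Return (G^{ij} K_i K_j, [G^{ki} K_k for each i], V_i) at tree level."""
    vol, vi, vij = volume_data(d, t)
    r = range(len(t))
    k_i = [-t[i] / (2 * vol) for i in r]
    g_inv = [[4 * (-vol * vij[i][j] + vi[i] * vi[j]) for j in r] for i in r]
    contraction = sum(g_inv[i][j] * k_i[i] * k_i[j] for i in r for j in r)
    g_k = [sum(g_inv[k][i] * k_i[k] for k in r) for i in r]
    return contraction, g_k, vi


# Hypothetical intersection numbers and 2-cycle volumes, for illustration only.
d_example = symmetric_intersections(
    {(0, 0, 0): 2, (0, 0, 1): 1, (0, 1, 2): 3, (1, 1, 2): 1, (2, 2, 2): 5}, 3)
t_example = [Fraction(3, 2), Fraction(1), Fraction(2)]

contraction, g_k, vi = no_scale_data_iib(d_example, t_example)
print(contraction)  # the no-scale value G^{ij} K_i K_j
```

Because the check only uses the homogeneity of $\mathcal{V}$ in the $t_i$, the result does not depend on the chosen intersection numbers.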
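C.1 No-scale structure in type IIA

The tree level Kähler potential for the Kähler moduli is

$$ K = -\ln\left[\frac{1}{48}\, d_{ijk}(\sigma+\bar\sigma)_i(\sigma+\bar\sigma)_j(\sigma+\bar\sigma)_k\right] = -\ln\left[\frac{1}{6}\, d_{ijk} t_i t_j t_k\right] = -\ln(\mathcal{V}) \ , \qquad (73) $$

where $d_{ijk}$ are the intersection numbers of the Calabi-Yau,

$$ d_{ijk} = \int \omega_i \wedge \omega_j \wedge \omega_k \ , \qquad (74) $$

and

$$ \sigma_i = t_i + i c_i \qquad (75) $$

are the complexified Kähler moduli whose real parts $t_i$ represent the volumes of 2-cycles and whose imaginary parts originate from the expansion of the NSNS 2-form. Using the Kähler form

$$ J = t_i \omega_i \qquad (76) $$

of the Calabi-Yau, it is useful to introduce the notation

$$ \mathcal{V} = \frac{1}{6}\int J\wedge J\wedge J = \frac{1}{6}\, d_{ijk} t_i t_j t_k \ , \qquad \mathcal{V}_i = \frac{1}{2}\int \omega_i\wedge J\wedge J = \frac{1}{2}\, d_{ijk} t_j t_k \ , \qquad \mathcal{V}_{ij} = \int \omega_i\wedge\omega_j\wedge J = d_{ijk} t_k \ . \qquad (77) $$

Note that here the index $i$ does not denote a derivative with respect to the Kähler variables (in contrast to subscripts on the Kähler potential $K$). Instead, one has the relations $\mathcal{V}_i = 2\partial_{\sigma_i}\mathcal{V}$ and $\mathcal{V}_{ij} = 4\partial_{\sigma_i}\partial_{\bar\sigma_{\bar\jmath}}\mathcal{V}$. It is straightforward to calculate

$$ K_i = -\frac{\mathcal{V}_i}{2\mathcal{V}} = K_{\bar\imath} \ , \qquad G_{i\bar\jmath} = K_{i\bar\jmath} = -\frac{1}{4}\left(\frac{\mathcal{V}_{ij}}{\mathcal{V}} - \frac{\mathcal{V}_i\mathcal{V}_j}{\mathcal{V}^2}\right) . \qquad (78) $$

Then one can show that the inverse Kähler metric is

$$ G^{\bar\jmath i} = -4\mathcal{V}^{ji}\mathcal{V} + 2 t_j t_i \ , \qquad (79) $$

where $\mathcal{V}^{ij}$ denotes the inverse of $\mathcal{V}_{ij}$. To verify this, one has to use

$$ \mathcal{V}^{ij}\mathcal{V}_j = \frac{1}{2}\, t_i \ , \qquad \mathcal{V}_{ij} t_j = 2\mathcal{V}_i \ , \qquad \mathcal{V}_i t_i = 3\mathcal{V} \ . \qquad (80) $$

Putting everything together, one arrives at

$$ G^{\bar\imath j} K_{\bar\imath} K_j = \left(-4\mathcal{V}^{ij}\mathcal{V} + 2 t_i t_j\right)\frac{\mathcal{V}_i\mathcal{V}_j}{4\mathcal{V}^2} = 3 \ , \qquad (81) $$

i.e. (72) is fulfilled under assumption (69).

C.2 No-scale structure in type IIB

In the type IIB case, the tree-level Kähler potential for the Kähler moduli is

$$ K = -2\ln\left[\frac{1}{6}\, d_{ijk} t_i t_j t_k\right] = -2\ln(\mathcal{V}) \ . \qquad (82) $$

The difference from the IIA case is that, even if $K$ in (82) is expressed in terms of the 2-cycle volumes $t_i$, the real parts of the good Kähler moduli, $\tilde T_i$, are now the 4-cycle volumes $\tilde\tau_i$ (the imaginary parts, on the other hand, arise from the RR 4-form). The relation between them depends on the particular Calabi-Yau:

$$ {\rm Re}\, \tilde T_i = \tilde\tau_i = \frac{1}{2}\, d_{ijk} t_j t_k = \mathcal{V}_i \ , \qquad (83) $$

which cannot be inverted in general.16 In order to calculate $K_i = \partial_{\tilde T_i} K$ we note that

$$ \partial_{t_i} = (\partial_{t_i}\tilde T_j)\,\partial_{\tilde T_j} + (\partial_{t_i}\bar{\tilde T}_{\bar\jmath})\,\partial_{\bar{\tilde T}_{\bar\jmath}} = \mathcal{V}_{ij}\left(\partial_{\tilde T_j} + \partial_{\bar{\tilde T}_{\bar\jmath}}\right) . \qquad (84) $$

If acting on a function $F$ that only depends on $\tilde T + \bar{\tilde T}$, as is the case for $K$, (84) simplifies to

$$ \partial_{t_i} F(\tilde T + \bar{\tilde T}) = 2\mathcal{V}_{ij}\,\partial_{\tilde T_j} F(\tilde T + \bar{\tilde T}) \ , \qquad (85) $$

where on the left hand side $\tilde T$ is understood as a function of $t$. Alternatively, one has

$$ \partial_{\tilde T_i} F(\tilde T + \bar{\tilde T}) = \frac{1}{2}\,\mathcal{V}^{ij}\,\partial_{t_j} F(\tilde T + \bar{\tilde T}) = \partial_{\bar{\tilde T}_{\bar\imath}} F(\tilde T + \bar{\tilde T}) \ . \qquad (86) $$

Using this, one can calculate

$$ K_i = -\frac{2}{\mathcal{V}}\,\partial_{\tilde T_i}\mathcal{V} = -\frac{\mathcal{V}^{ij}\mathcal{V}_j}{\mathcal{V}} = -\frac{t_i}{2\mathcal{V}} = K_{\bar\imath} \ , \qquad (87) $$

where in the last step we used (80). In the same way one can calculate

$$ G_{i\bar\jmath} = \frac{1}{8}\frac{t_i t_j}{\mathcal{V}^2} - \frac{1}{4}\frac{\mathcal{V}^{ij}}{\mathcal{V}} \ . \qquad (88) $$

Using this formula one can check that the inverse Kähler metric is given by

$$ G^{\bar\imath j} = 4\left(-\mathcal{V}\mathcal{V}_{ij} + \mathcal{V}_i\mathcal{V}_j\right) . \qquad (89) $$

Putting everything together, no-scale structure holds also for type IIB:

$$ G^{\bar\imath j} K_{\bar\imath} K_j = \left(-\mathcal{V}\mathcal{V}_{ij} + \mathcal{V}_i\mathcal{V}_j\right)\frac{t_i t_j}{\mathcal{V}^2} = 3 \ , \qquad (90) $$

again under the assumption (69).

16 Note that the Kähler moduli appearing in the non-perturbative superpotentials in the examples of [16] are related to the ones in (83) by a linear field redefinition. However, this does not play any role in verifying the no-scale structure at leading order, as (90) is invariant under field redefinitions. We chose to make the distinction clear by using tildes for the Kähler moduli defined by (83).

C.3 Cancellation with just the volume modulus

Now we relax assumption (69). For simplicity, let us first consider the Kähler potential

$$ K = -3\ln(T+\bar T) + \frac{\Xi}{(T+\bar T)^{\lambda}} \ , \qquad (91) $$

which corresponds to the case of a single Kähler modulus, with the complex structure moduli and the dilaton neglected. A generic quantum correction was added to the tree level term, which could be an $\alpha'$ or a loop correction, depending on the value of $\lambda$. Focusing on $V_3$, i.e.

$$ \frac{V_3}{e^K |W|^2} = G^{\bar\jmath i} K_{\bar\jmath} K_i - 3 \ , \qquad (92) $$

one calculates

$$ \frac{V_3}{e^K |W|^2} = \frac{\left(3(2\tau)^{\lambda} + \Xi\lambda\right)^2}{3(2\tau)^{2\lambda} + \Xi(2\tau)^{\lambda}\lambda(\lambda+1)} - 3 = 3 - 3 + (\lambda-\lambda^2)\frac{\Xi}{(2\tau)^{\lambda}} + \frac{\lambda^4\,\Xi^2}{3(2\tau)^{2\lambda}} + \ldots \ . \qquad (93) $$

This simplified calculation gives an intuition of why the $E^{(K)}_b$-term does not appear in $V_3$ of (28) whereas the $\alpha'$- and $E^{(K)}_s$-terms contribute.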
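The single-modulus cancellation is easy to probe numerically. The following sketch evaluates $G^{\bar T T}K_T K_{\bar T} - 3$ directly from the derivatives of $K = -3\ln(T+\bar T) + \Xi/(T+\bar T)^{\lambda}$ and compares it with the leading term $(\lambda-\lambda^2)\,\Xi/(2\tau)^{\lambda}$. The function and argument names are ours, and the sample values of $\tau$ and $\Xi$ are arbitrary.

```python
def no_scale_defect(tau, lam, big_xi):
    """G^{T Tbar} K_T K_Tbar - 3 for K = -3 ln(T + Tbar) + Xi / (T + Tbar)**lam."""
    x = 2.0 * tau  # x = T + Tbar
    k_t = -3.0 / x - lam * big_xi * x ** (-lam - 1.0)
    k_ttbar = 3.0 / x**2 + lam * (lam + 1.0) * big_xi * x ** (-lam - 2.0)
    return k_t * k_t / k_ttbar - 3.0


def leading_term(tau, lam, big_xi):
    """Leading correction (lam - lam^2) * Xi / (2 tau)^lam."""
    return (lam - lam**2) * big_xi / (2.0 * tau) ** lam


# Arbitrary sample point; lam = 1 is the case where the leading term cancels.
for lam in (0.5, 1.0, 1.5):
    exact = no_scale_defect(1.0e3, lam, 1.0)
    print(f"lam = {lam}: exact = {exact:.3e}, leading = {leading_term(1.0e3, lam, 1.0):.3e}")
```

For $\lambda = 1$ the leading term is identically zero and what survives is quadratic in $\Xi/(2\tau)$; for the other values of $\lambda$ the exact expression tracks the leading term.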
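When the exponent of the quantum correction is exactly 1, there is a cancellation at leading order in the scalar potential (compare also the discussion in footnote 10). Note that since we focused on $V_3$ in this subsection, it did not matter whether assumption (68) holds or not.

C.4 Cancellation with many Kähler moduli

We would now like to see how the previous result changes when we have an arbitrary number of moduli. We do not make any assumption on the dependence of the volume on the Kähler moduli ("Swiss cheese" or fibered manifolds are special cases). Due to its relevance for LVS, we consider a single correction to the Kähler potential which only depends on the large Kähler modulus $T_b$ (an example would be the $\alpha'$-correction or the loop term proportional to $E^{(K)}_b$, considering the moduli other than the Kähler moduli as fixed; this is allowed at leading order in a $\tau_b$-expansion, as we argue in appendix F). Thus, we take the Kähler potential to be of the form

$$ K = K^{(0)} + \delta K = -2\ln(\mathcal{V}) + \delta K(T_b, \bar T_b) \equiv -2\ln(\mathcal{V}) + \frac{\Xi_b}{(T_b + \bar T_b)^{\lambda}} \ . \qquad (94) $$

Again focusing on $V_3$, we obtain

$$ \frac{V_3}{e^K |W|^2} = G^{\bar\jmath i} K_{\bar\jmath} K_i - 3 = \left(G_0^{\bar\jmath i} + \delta G^{\bar\jmath i}\right)\left(K^0_{\bar\jmath} + \delta K_{\bar\jmath}\right)\left(K^0_i + \delta K_i\right) - 3 \ , \qquad (95) $$

where $\delta K_i \equiv \partial_{T_i}\delta K$ and $G_0^{\bar\jmath i}$ is the inverse metric of appendix C.2; finally $\delta G^{\bar\jmath i}$ is the modification of the inverse metric coming from considering the modified Kähler potential (94). Explicitly one has

$$ G^{\bar\jmath i} = \left(G^0_{i\bar\jmath} + \delta K_{i\bar\jmath}\right)^{-1} \simeq G_0^{\bar\jmath i} - G_0^{\bar\jmath h}\,\delta K_{h\bar k}\, G_0^{\bar k i} + \ldots \ , \qquad \delta K_i = -\frac{\lambda\,\Xi_b}{(2\tau_b)^{\lambda+1}}\,\delta_{ib} \ , \qquad \delta K_{i\bar\jmath} = (\lambda^2+\lambda)\frac{\Xi_b}{(2\tau_b)^{\lambda+2}}\,\delta_{ib}\delta_{\bar\jmath b} \ . \qquad (96) $$

We now put everything together and use the results of appendix C.2 and formula (64) (which, for the unperturbed metric and Kähler potential, is an exact equality) to arrive at

$$ \frac{V_3}{e^K |W|^2} = \left(G_0^{\bar\jmath i} K^0_{\bar\jmath} K^0_i - 3\right) + 2\, G_0^{\bar\jmath i} K^0_{\bar\jmath}\,\delta K_i - K^0_{\bar\jmath}\, G_0^{\bar\jmath k}\,\delta K_{k\bar h}\, G_0^{\bar h i} K^0_i + \ldots $$
$$ = 0 + \frac{4\lambda\,\Xi_b}{(2\tau_b)^{\lambda+1}}\,\tau_b - \frac{4(\lambda^2+\lambda)\,\Xi_b}{(2\tau_b)^{\lambda+2}}\,\tau_b\tau_b + \ldots = \frac{(\lambda-\lambda^2)\,\Xi_b}{(2\tau_b)^{\lambda}} + \ldots \ . \qquad (97) $$

We notice that the term $1/\tau_b^{\lambda}$ vanishes exactly for $\lambda = 1$, independently of the explicit form of the volume in terms of the Kähler moduli.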
In particular, the loop correction proportional to E (K)b experiences a cancellation at leading order in V3 (and it is not difficult to see that the subleading order is suppressed by τ−2λb ). Therefore, the loop correction is subleading in the potential compared to the α′ correction, even though it is leading in the Kähler potential. Next, we would like to extend this analysis to the other parts of the potential, i.e. Vnp1 and Vnp2. C.5 Perturbative corrections to Vnp1 and Vnp2 We now introduce the nonpertubative superpotential into the game, i.e. relax assump- tion (68), and look at the other terms of the scalar potential, Vnp1 and Vnp2 (see eq. (12)). For this, we restrict to the P4[1,1,1,6,9] model again. The contribution Vnp1 is pro- portional to Gs̄sW̄,s̄W,s. From (65) we see that no E (K)b appears at leading order. This can be understood as follows. Consider the Kähler potential (94) where now V is the volume of P4 [1,1,1,6,9] , given in (14). Then the scaling with the large Kähler modulus τb is schematically given by Gs̄s ≃ Gs̄s0 −Gs̄b0 δKbb̄Gb̄s0 + . . . ∼ τ 3/2b + τ 2b τλ+2b ∼ τ 3/2b + + . . . , (98) which shows that any loop correction to the Kähler potential of the form Ξ/τλb leads to a subleading contribution to Vnp1 in the large volume expansion. As usual, the ellipsis stands for terms that are even more subleading in the τb expansion. To understand the E (K)s correction to Gs̄s we need to consider K = −2 ln(V) + δ̃K(T, T̄ ) ≡ −2 ln(V) + Ξb g(Ts, T̄s) (Tb + T̄b)λ for some function g(Ts, T̄s) of the small Kähler modulus and we assume λ ≥ 3/2 in the following. Then, again very schematically, the scaling behavior is given by17 Gs̄s ≃ Gs̄s0 −Gs̄i0 δ̃Ki̄G 0 + . . . ∼ τ 3/2b + Ξb (τb, τ τ−λ−2b τ τ−λ−1b τ + . . . (100) ∼ τ 3/2b + τ−λb + τ −λ+3/2 b + τ b + . . . . One sees that λ = 3/2 indeed contributes at the same order as Gs̄s0 . This is confirmed by the dependence of Gs̄s in (65) on E (K)s through ∆, cf. (29). We now consider Vnp2. 
This is proportional to G ̄sK̄. Again we start by considering a correction to the Kähler potential whose only dependence on the Kähler moduli is via τb, as in (94). Schematically, we find G̄sK̄ ≃ (G̄s0 −G 0 δKbb̄G ̄ + δK̄ + . . . ∼ τs + Gs̄b0 Ξb τλ+1b + . . . ∼ τ 0b + + . . . . (101) This result is confirmed by the absence of E (K)b in the leading term of Vnp2. A calculation very similar to the one in (100) shows, however, that Vnp2 is modified by a correction to the Kähler potential of the form (99) for λ = 3/2. It is straightforward to generalize this analysis to a more general form of the “Swiss cheese” volume, with more than one small Kähler modulus. 17For λ = 3/2 we can still use the expansion of the inverse metric (96), because the correction term would also be further suppressed e.g. in the dilaton. D KK spectrum with fluxes In this section we would like to develop some intuition on how the analysis of sections 3.3 and 4 might change in the presence of fluxes. We will restrict the discussion to one possible effect of the fluxes, namely their influence on the KK spectrum. It is not known explicitly how closed string fluxes, which are present in LVS, would change the mass spectrum. We will consider a toy example, using an analogy to the correction arising from world volume fluxes (cf. [52]), in order to get a feeling for what kind of effects one might expect. In particular, for the purposes of this appendix we assume a modified KK mass spectrum of the form m2KK ∼ 1 + F 1 + F ) , (102) where F represents any of the fluxes that may be present, and in the second equality the factors of S1 appeared when expressing the 2-cycle volumes in Einstein frame as compared to the string frame (t ∼ e−Φ/2tstr). Note that expanding (102) for large values of t would lead to a correction ∆m2 ∼ F 2/t3, whose scaling with the flux and with t is reminiscent of the moduli masses induced by closed string 3-form flux [72,73]. 
In that case, the suppression would be by the overall volume (which would lead to only mild effects in LVS), but in (102) we allow for a suppression by single 2-cycle volumes (which might be the small 2-cycle in the P4[1,1,1,6,9] model). Substituting (102) in (23), we now consider the scalar potential resulting from K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)− τbE (K)b τsE (K)s F 2S1 W = W0 + Ae −aTs . (103) We have not included any flux correction to the term proportional to E (K)b because we expect such corrections to be subleading in a large volume expansion.18 Note that the F -dependent correction term we did include is of the same form as the winding string 18Even though we think it is unlikely, we cannot exclude that the correction to KK masses that scale like t−1b without fluxes is only suppressed by F 2/τs instead of F 2/τb. In that case, one would have to redo the analysis of appendix C.5, using (99) with λ = 1. This would prohibit the use of the expansion (96), because in the large volume limit the leading contribution to Gss̄ would arise from the loop correction (it would scale as τ−1b as opposed to the scaling of the tree level contribution ∼ τ In that case the leading terms in Vnp1 and Vnp2 would be suppressed compared to V3 and only arise at order V−10/3, thus invalidating the volume expansion of LVS. correction ∼ E (W )s , when one neglects any potential flux corrections to the winding string spectrum, cf. (24) (remember that gsW would just be proportional to 1/ without fluxes). Thus, by considering (103) we implicitly also analyze in the following the effect of corrections from winding strings (recall from section 3.3 that this correction is not present in the P4[1,1,1,6,9] model, but may be present in general). We now give the generalization of (26)-(28) when using the modified Kähler po- tential (103). 
The three contributions at leading order (O(V−3)) in the large volume expansion are Vnp1 = e 24a2|A|2τ 3/2s e−2aτs V∆ , (104) Vnp2 = −eKcs 2a|AW0|τse−aτs 6E (K)s 1− 2F , (105) 3eKcs |W0|2 1 ξ̃ + (106) 4(E (K)s )2 − 8(E (K)s )2F 2S1τ−1s (1 + F 2S1τ−1s )− 8 F 2S21E where the axion has already been minimized for, as discussed in section A.2, and now ∆ is generalized to 2S1τs − 3E (K)s 1− 3F . (107) Plots for F = 1 and F = 3 are given in fig. 5, and they look quite similar to the plot without flux, fig. 4. Qualitatively, the conclusion is the same; only for nongeneric values of the gs corrections do they compete with the α ′ correction. Note, however, that the amount of fine-tuning seems to depend on the value of the flux, cf. fig. 5. The same is true for the dependence of the values of V and τs at the minimum on S1 and E (K)s . For F = 3, for instance, this dependence becomes more complicated than what we found in (31). For the parameter range shown in fig. 5, the values of τs and V in the minimum vary in the ranges τs ∈ [14.6, 46.3] and log10 V ∈ [3.7, 15.5], where the smallest value for both of them is reached in the corner where the two corrections become comparable. Also the cancellation that we found for the gaugino masses survives the inclusion of the flux factor in (103). The correction still only appears at sub-sub-leading order 0.0 9 PSfrag replacements 0.0 9 PSfrag replacements Figure 5: Similarly to figure 4, the top surface is the α′ correction, the second is the gs correction (with F = 1 in the left graph and F = 3 in the right), and the “red carpet” is 10/∆, with ∆ from (107), using the same values as in fig. 4. The result is qualitatively the same as before. Note, however, that the range for E(K)s differs. For larger values of F one does not need to fine-tune E(K)s as much in order for the two corrections to become of similar order. 
in an expansion in ln(1/m3/2) and we find (again using the dilute flux approximation for the prefactor (Ref)−1): |Ms| = = 3eK/2|W0| 16a2τ 2s S1 − 12 2a(1− 2F 2S1τ−1s )E 64S1a3τ 3s + . . . ln(1/m3/2) ln(1/m3/2) . (108) This concludes our brief study of the direct effects of fluxes on the loop corrections. E The orientifold calculation In the main text, we are interested in how ∆Kgs , the one-loop correction to the Kähler potential, scales with the Kähler moduli Ti. Our argument in section 3.1 is based on the known result for ∆Kgs in the case of N = 2 supersymmetric K3× T 2 orientifolds and N = 1 supersymmetric T 6/ZN (or T 6/(Z2 ×Z2)) orientifolds, from [37] (see also [74]). Here we review this computation for the case of K3 × T 2, and take this opportunity to adapt it to our case of D3-branes and D7-branes from the beginning. (One can also obtain them by T-duality on the final D9/D5 results of [37], e.g. as in the appendix of [75], but as we shall see, the direct computation is enlightening in its own right.) We will leave out details that are essentially identical to [37], and only emphasize the differences. As shown in [37] using “Kähler adapted” vertex operators, the easiest way to com- pute ∆Kgs is by considering the 2-point function of the complex structure modulus U of T 2, with vanishing Wilson line moduli, i.e. 〈VUVŪ〉 = − 4g2cα (U − Ū)2 〈V (0,0)ZZ V (0,0) 〉σ . (109) Here, we use the notation of [37], (0,0) U = −gcα′−2 U − Ū (0,0) (0,0) = gcα ′−2 2 U − Ū (0,0) (110) (0,0) ZZ = − i∂Z + α′(p · ψ)Ψ i∂̄Z + α′(p · ψ̃)Ψ̃ eipX , (0,0) = − 2 i∂Z̄ + α′(p · ψ)Ψ̄ i∂̄Z̄ + α′(p · ψ̃) ¯̃Ψ eipX . (111) As in [37], and [76] before that, we find these complex worldsheet variables particularly convenient: (X4 + ŪX5) , Z̄ = (X4 + UX5) , (ψ4 + Ūψ5) , Ψ̄ = (ψ4 + Uψ5) , (112) where Gstr is the volume of T 2 measured in string frame. 
The 2-point function (109) can be expanded for small momenta, p1 · p2 ≪ 1, and we obtain 〈V (0,0)ZZ V (0,0) 〉σ = −V4 (p1 · p2) 16(4π2α′)2 d2ν1d k=0,1 ~n=(n,m)T e−π~n ~nt−1 ](0, τ) η3(τ) γσ,kZ intσ,k[ 〈∂̄Z(ν̄1)∂̄Z̄(ν̄2)〉σ〈Ψ(ν1)Ψ̄(ν2)〉α,βσ 〈ψ(ν1)ψ(ν2)〉α,βσ (113) +〈∂̄Z(ν̄1)∂Z̄(ν2)〉σ〈Ψ(ν1) ¯̃Ψ(ν̄2)〉α,βσ 〈ψ(ν1)ψ̃(ν̄2)〉α,βσ + c.c. +O((p1 · p2)2) . For the details we refer to [37]. The main difference to the corresponding formula (C.3) in [37] is the appearance of the inverse metric G−1str in the exponent arising from the zero mode sum, and in the prefactor. This is due to the fact that the D3 and D7 branes are localized along the T 2, and so the closed string channel involves a Kaluza-Klein momentum sum instead of a winding sum. The sum over bosonic zero modes has been made explicit, since there is also an implicit dependence on m,n in the bosonic correlators: this arises from the classical piece in the split into zero modes and fluctuations. That is, Z(ν) = Zclass(ν) + Zqu(ν), where the classical part is given Zclass = n+mŪ Re(ν) c̃σ , c̃σ = 1 for K 2 for A ,M . (114) These zero modes have the right periodicity under Re(ν) → Re(ν) + π (for A,M) or Re(ν) → Re(ν) + 2π (for K), i.e. X4 → X4 + 2πn α′ and X5 → X5 + 2πm α′. In contrast to [37] they involve the real part of ν. The reason is again that in the D3/D7 case the branes are localized along T 2 and thus the winding appears in the open string channel as opposed to the closed string channel (as was the case for D9/D5 branes). The sum over spin structures is performed using Riemann identities. This leaves the correlators of the bosonic fields as the only piece that depends on the positions νi of the vertex operators. The νi integral can then be evaluated. As the zero modes (114) involve the real part of ν in the case of D3/D7-branes, in contrast to the D9/D5- case studied in [37], the zero mode contribution in the Z-correlators drops out. 
The quantum part is evaluated using the method of images on the worldsheet [38,77,78]. To evaluate the KK sum in (113), it is useful to regularize the integral over t by a UV cutoff Λ. With this we obtain 1/(eσΛ2) ~n=(n,m)T π3c2σtα ′e−π~n ~nt−1 π3α′c2σe 4 + πα′c2σ E2(0, U) + . . . , (115) where the prime at the sum indicates that the (n,m) = (0, 0) term is left out, and cσ, eσ are constants whose precise values will not be important in the following (but can be found in [37]). Terms that go to zero in the limit Λ → ∞ have been dropped, as indicated by the ellipsis. The nonholomorphic Eisenstein series E2(0, U) is the s = 2 special case of Es(0, U) = ~n=(n,m)T |n+mU |2s . (116) The terms involving the UV cutoff Λ drop out after summing over all diagrams, due to tadpole cancellation. We have then reduced (113) to 〈V (0,0)ZZ V (0,0) 〉σ = −(p1 · p2)α′ (4π2α′)2 E2(0, U)γσ,kQσ,k +O((p1 · p2)2) . (117) The quantities Qσ,k come from the sum over spin structures and are defined in [37]. Introducing the notation E2(0, U) = k=0,1 E2(0, U)γσ,kQσ,k , (118) we end up with (neglecting some irrelevant factors of gc, α ′, terms subleading in the low-energy expansion, and constants of order 1) 〈VUVŪ〉 ∼ −i(p1 · p2) (4π2α′)2 (U − Ū)2 E2(0, U) . (119) To read off the one-loop correction to the kinetic term of U we need to perform a Weyl rescaling to the Einstein frame. In the one-loop term (119) this just leads to Weyl rescaling: × e Vstr , (120) where Vstr = VstrK3 Gstr (121) is the overall volume in string frame. The Kähler potential can then be read off from the kinetic term by use of the identity ∂U∂ŪE2(0, U) = − (U − Ū)2 E2(0, U) , (122) producing the final result ∆Kgs ∼ Gstre Vstr(S + S̄) E2(0, U) , (123) where Gstre Φ/Vstr is to be interpreted as a function of the Kähler variables. 
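To get a feeling for the size of the Eisenstein series entering (123), one can evaluate the lattice sum by brute force. The sketch below assumes the common normalization $E_s(0,U) = \sum'_{(n,m)} ({\rm Im}\,U)^s / |n+mU|^{2s}$, with the prime excluding $(n,m)=(0,0)$; if a different overall normalization is used, the value changes by a constant factor. The function name and the square cutoff are ours. At $U = i$ the sum reduces to $\sum' (n^2+m^2)^{-2} = 4\,\zeta(2)\,\beta(2)$, with $\beta(2)$ Catalan's constant, which serves as a cross-check.

```python
import math


def eisenstein(s, u, cutoff=100):
    """Truncated sum over (n, m) != (0, 0) of (Im U)^s / |n + m U|^(2s).

    ASSUMED normalization; the truncation to |n|, |m| <= cutoff leaves an
    error of order 1 / cutoff^2 for s = 2.
    """
    total = 0.0
    for n in range(-cutoff, cutoff + 1):
        for m in range(-cutoff, cutoff + 1):
            if n == 0 and m == 0:
                continue
            total += u.imag ** s / abs(n + m * u) ** (2 * s)
    return total


CATALAN = 0.915965594177219  # beta(2)
reference = 4 * (math.pi**2 / 6) * CATALAN

value = eisenstein(2, 1j)
print(f"E_2(0, i) ~ {value:.4f} (closed form {reference:.4f})")
```

The sum is of order a few for $U$ of order $i$ and grows for large ${\rm Im}\,U$, in line with the remark in the conclusions that the loop corrections can become large if the complex structure is stabilized at a large value.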
In the K3 × T² orientifold case, using (121), this is just proportional to $e^{\Phi}/V^{K3}_{\rm str} \sim (T + \bar T)^{-1}$ (with Re T the volume of K3 measured in Einstein frame), giving a result T-dual to [37] (note that we switched the real and imaginary parts in the definition of T and S as compared to [37], to conform with the rest of this paper). As we argue in section 3.1, in general the dependence on the Kähler moduli will be more complicated than this, because there is no analog of the relation (121). It is still clear that the inverse suppression in the overall volume will appear as in (123), given that it is a direct consequence of the Weyl rescaling.

F Factorized approximation

As mentioned in section 4.2, it is an important issue to what extent the moduli spaces of Kähler and complex structure moduli factorize. In this appendix, we give further details on the factorized approximation. A common starting point in the analysis of the potential arising in type IIB theory with 3-form fluxes is to assume that all complex structure moduli $U^\alpha$ and the dilaton S are stabilized by demanding

$$ D_{U^\alpha} W = 0 = D_S W . \qquad (124) $$

In this case the F-term potential for the moduli (7) reduces to

$$ V = e^{K} \left( G^{\bar\jmath i} D_{\bar\jmath}\bar W D_i W - 3|W|^2 \right) , \qquad (125) $$

where, as in the main text, the indices i and j refer only to the Kähler moduli and thus run from 1 to $h^{1,1}$. Note that even though the complex structure moduli and the dilaton are assumed to be stabilized by (124), the inverse metric $G^{\bar\jmath i}$ is part of the inverse of the whole moduli space metric. More precisely, if we denote the Kähler moduli by $T_i$, as before, and all other moduli (i.e. the complex structure moduli and the dilaton) collectively as $Z_a$, the moduli space metric is given by

$$ G_{I\bar J} \sim \begin{pmatrix} K_{i\bar\jmath} & K_{i\bar b} \\ K_{a\bar\jmath} & K_{a\bar b} \end{pmatrix} . \qquad (126) $$

We denote the inverse of this (whole) metric by $G^{\bar J I}$. In general

$$ G^{\bar\jmath i} \neq (K_{i\bar\jmath})^{-1} . \qquad (127) $$

Equality only holds if $G_{i\bar b} = 0$, i.e. if the moduli space of the Kähler moduli is factorized from the rest, as is the case without loop and α′ corrections.
In this appendix, we would like to investigate at which order in a large volume expansion the two matrices in (127) start to deviate from each other. For this analysis we assume a volume of the “Swiss cheese” form as in (48) and a Kähler potential of the form (24) (without taking possible effects of fluxes on the KK and winding mode spectra into account as was done in appendix D; thus, gaK ∼ ta and g K ∼ t−1q for some 2-cycle volumes). To avoid cumbersome notation we will indicate all the small moduli collectively as τs. We then use the formula A−1(1 +BP−1CA−1) −A−1BP−1 −P−1CA−1 P−1 , (128) where P is the Schur complement of A, defined as P = D − CA−1B . (129) In our case P is the Schur complement of Ki̄. From (24) we read off that GIJ̄ ∼ τ−2b τ τ−2b τ τ−2b τ , (130) where we only indicate the τb dependence and the indices run over I, J = {Tb, Ts, U, S}. Here β = −2 for those τi with a nonvanishing ai in (48) (so β has an implicit index i), otherwise β = −5/2 (which is in particular the value in the P[1,1,1,6,9] case). We decompose GIJ̄ as in (128) τ−2b τ , A−1 ∼ τ 2b τ 7/2+β 7/2+β B = CT ∼ τ−2b τ , D ∼ τ 0b τ τ−1b τ τ 0b τ τ−1b τ , P−1 ∼ τ 0b τ τ−1b τ . (131) Using (128) one easily obtains the scaling of the inverse: GJ̄I ∼ τ 2b τ 7/2+β 7/2+β τ 0b τ τ 0b τ . (132) Now, from (128), G̄i receives two contributions. The first is K ̄i, that would be the only term in the case of a factorized metric; the second is K−1h̄ Khb̄P Kal̄K , that breaks factorization. Let us compare their τb scaling: G̄i = A−1 + A−1BP−1CA−1 (133) τ 2b τ 7/2+β 7/2+β τ 0b τ τ 0b τ Thus the corrections coming from non-vanishing off-diagonal metric elements in (126) set in with a suppression by τ−2b , τ −7/2−β b and τ b in G b̄b, Gb̄s and Gs̄s, respectively. In the explicit example based on P[1,1,1,6,9], β = −5/2, and we checked this result by comparing to the subleading terms in (65). 
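The block-inverse formula (128) with the Schur complement (129) can be sanity-checked numerically. The sketch below uses small random matrices as stand-ins; they are illustrative placeholders, not the actual Kähler metric, and the helper name `block_inverse` is ours. It also isolates the term $A^{-1}BP^{-1}CA^{-1}$, which is the piece that breaks factorization in the upper-left block.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Invert [[A, B], [C, D]] using the Schur complement P = D - C A^{-1} B."""
    A_inv = np.linalg.inv(A)
    P_inv = np.linalg.inv(D - C @ A_inv @ B)
    top_left = A_inv @ (np.eye(A.shape[0]) + B @ P_inv @ C @ A_inv)
    return np.block([[top_left, -A_inv @ B @ P_inv],
                     [-P_inv @ C @ A_inv, P_inv]])

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 2))
Y = rng.normal(size=(3, 3))
A = X @ X.T + np.eye(2)            # positive definite stand-in for the Kähler block
D = Y @ Y.T + np.eye(3)            # positive definite stand-in for the remaining block
B = 0.1 * rng.normal(size=(2, 3))  # small off-diagonal mixing
C = B.T

M = np.block([[A, B], [C, D]])
M_inv = block_inverse(A, B, C, D)

# Residual of M_inv @ M against the identity.
max_error = np.max(np.abs(M_inv @ M - np.eye(5)))

# Deviation of the upper-left block of the full inverse from A^{-1}:
# this equals A^{-1} B P^{-1} C A^{-1} and is second order in the mixing B.
correction = M_inv[:2, :2] - np.linalg.inv(A)
print(max_error, np.max(np.abs(correction)))
```

Rescaling `B` by a factor ε rescales `correction` by roughly ε², which is the numerical counterpart of the statement that the factorized approximation fails only at subleading order when the off-diagonal metric elements are suppressed.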
F.1 Factorized approximation of the scalar potential What we are really interested in is not the (inverse) metric itself, but the scalar po- tential, to which we now turn. For the nonperturbative terms Vnp1 and Vnp2, the suppression of the off-diagonal terms in (133) is inherited by the scalar potential, as they are proportional to Gs̄sW̄,s̄W,s and G ̄sK̄, respectively. For V3 things are not as simple, due to the no-scale structure at leading order. Let us neglect for a moment all the quantum corrections, then the no-scale structure implies Gı̄jKı̄Kj − 3 no−scale ∼ (τ b , τ τ 2b τ 7/2+β 7/2+β ∼ τ 0b + τ 2β+7/2 b = 0 . (134) The two terms have to vanish independently. Now let us add corrections that break no-scale structure. Because of the cancellation described in appendix C.3, the leading contribution can be seen to come at order τ b (from the α ′, E (K)s and E (W )s corrections). On the other hand, the off-diagonal terms appear at order Gı̄jKı̄Kj off−diagonal ∼ (τ b , τ τ 0b τ τ 0b τ ∼ τ−2b + τ b + τ ∼ τ−2b + . . . , (135) for both β = −2 and β = −5/2. Therefore, the off-diagonal terms of the moduli space metric appear in the scalar potential with a suppression of at least τ b (as is confirmed by the explicit example of section 3.3, cf. formulas (26)-(30)). The suppression can be even stronger if some corrections are absent and the leading term in (135) vanishes. To summarize: if one is only interested in the leading term of the scalar potential in the large volume (i.e. large τb) expansion, then one can use the factorized approximation, G̄i = K ̄i +O . (136) This provides a useful tool to simplify the calculations. References [1] S. B. Giddings, S. Kachru, and J. Polchinski, Hierarchies from fluxes in string compactifications, Phys. Rev. D66 (2002) 106006, [hep-th/0105097]. [2] S. Kachru, R. Kallosh, A. Linde, and S. P. Trivedi, De Sitter vacua in string theory, Phys. Rev. D68 (2003) 046005, [hep-th/0301240]. [3] V. Balasubramanian, P. Berglund, J. P. 
Conlon, and F. Quevedo, Systematics of moduli stabilisation in Calabi-Yau flux compactifications, JHEP 03 (2005) 007, [hep-th/0502058].
[4] J. P. Conlon, F. Quevedo, and K. Suruliz, Large-volume flux compactifications: Moduli spectrum and D3/D7 soft supersymmetry breaking, JHEP 08 (2005) 007, [hep-th/0505076].
[5] K. Becker, M. Becker, M. Haack, and J. Louis, Supersymmetry breaking and alpha’-corrections to flux induced potentials, JHEP 06 (2002) 060, [hep-th/0204254].
[6] J. P. Conlon and F. Quevedo, Gaugino and scalar masses in the landscape, JHEP 06 (2006) 029, [hep-th/0605141].
[7] J. P. Conlon, S. S. Abdussalam, F. Quevedo, and K. Suruliz, Soft SUSY breaking terms for chiral matter in IIB string compactifications, JHEP 01 (2007) 032, [hep-th/0610129].
[8] J. P. Conlon, The QCD axion and moduli stabilisation, JHEP 05 (2006) 078, [hep-th/0602233].
[9] J. P. Conlon, Seeing the invisible axion in the sparticle spectrum, Phys. Rev. Lett. 97 (2006) 261802, [hep-ph/0607138].
[10] J. P. Conlon and D. Cremades, The neutrino suppression scale from large volumes, hep-ph/0611144.
[11] J. P. Conlon and F. Quevedo, Kaehler moduli inflation, JHEP 01 (2006) 146, [hep-th/0509012].
[12] R. Holman and J. A. Hutasoit, Axionic inflation from large volume flux compactifications, hep-th/0603246.
[13] J. Simon, R. Jimenez, L. Verde, P. Berglund, and V. Balasubramanian, Using cosmology to constrain the topology of hidden dimensions, astro-ph/0605371.
[14] J. R. Bond, L. Kofman, S. Prokushkin, and P. M. Vaudrevange, Roulette inflation with Kaehler moduli and their axions, hep-th/0612197.
[15] G. L. Kane, P. Kumar, and J. Shao, LHC string phenomenology, hep-ph/0610038.
[16] F. Denef, M. R. Douglas, and B. Florea, Building a better racetrack, JHEP 06 (2004) 034, [hep-th/0404257].
[17] L. Görlich, S. Kachru, P. K. Tripathy, and S. P. Trivedi, Gaugino condensation and nonperturbative superpotentials in flux compactifications, JHEP 12 (2004) 074, [hep-th/0407130].
[18] P. K. Tripathy and S. P. Trivedi, D3 brane action and fermion zero modes in presence of background flux, JHEP 06 (2005) 066, [hep-th/0503072].
[19] F. Denef, M. R. Douglas, B. Florea, A. Grassi, and S. Kachru, Fixing all moduli in a simple F-theory compactification, Adv. Theor. Math. Phys. 9 (2005) 861–929, [hep-th/0503124].
[20] N. Saulina, Topological constraints on stabilized flux vacua, Nucl. Phys. B720 (2005) 203–210, [hep-th/0503125].
[21] R. Kallosh, A.-K. Kashani-Poor, and A. Tomasiello, Counting fermionic zero modes on M5 with fluxes, JHEP 06 (2005) 069, [hep-th/0503138].
[22] L. Martucci, J. Rosseel, D. Van den Bleeken, and A. Van Proeyen, Dirac actions for D-branes on backgrounds with fluxes, Class. Quant. Grav. 22 (2005) 2745–2764, [hep-th/0504041].
[23] P. Berglund and P. Mayr, Non-perturbative superpotentials in F-theory and string duality, hep-th/0504058.
[24] E. Bergshoeff, R. Kallosh, A.-K. Kashani-Poor, D. Sorokin, and A. Tomasiello, An index for the Dirac operator on D3 branes with background fluxes, JHEP 10 (2005) 102, [hep-th/0507069].
[25] D. Lüst, S. Reffert, W. Schulgin, and P. K. Tripathy, Fermion zero modes in the presence of fluxes and a non-perturbative superpotential, JHEP 08 (2006) 071, [hep-th/0509082].
[26] D. Lüst, S. Reffert, E. Scheidegger, W.
Schulgin, and S. Stieberger, Moduli stabilization in type IIB orientifolds. II, Nucl. Phys. B766 (2007) 178–231, [hep-th/0609013].
[27] R. Blumenhagen, M. Cvetic, and T. Weigand, Spacetime instanton corrections in 4D string vacua - the seesaw mechanism for D-brane models, Nucl. Phys. B771 (2007) 113–142, [hep-th/0609191].
[28] M. Haack, D. Krefl, D. Lüst, A. Van Proeyen, and M. Zagermann, Gaugino condensates and D-terms from D7-branes, JHEP 01 (2007) 078, [hep-th/0609211].
[29] N. Akerblom, R. Blumenhagen, D. Lüst, E. Plauschinn, and M. Schmidt-Sommerfeld, Non-perturbative SQCD superpotentials from string instantons, JHEP 04 (2007) 076, [hep-th/0612132].
[30] D. Tsimpis, Fivebrane instantons and Calabi-Yau fourfolds with flux, JHEP 03 (2007) 099, [hep-th/0701287].
[31] M. Bianchi and E. Kiritsis, Non-perturbative and flux superpotentials for Type I strings on the Z3 orbifold, hep-th/0702015.
[32] R. Argurio, M. Bertolini, G. Ferretti, A. Lerda, and C. Petersson, Stringy instantons at orbifold singularities, arXiv:0704.0262 [hep-th].
[33] K. Choi, A. Falkowski, H. P. Nilles, M. Olechowski, and S. Pokorski, Stability of flux compactifications and the pattern of supersymmetry breaking, JHEP 11 (2004) 076, [hep-th/0411066].
[34] D. Lüst, S. Reffert, W. Schulgin, and S. Stieberger, Moduli stabilization in type IIB orientifolds. I: Orbifold limits, Nucl. Phys. B766 (2007) 68–149, [hep-th/0506090].
[35] D. Krefl and D. Lüst, On supersymmetric minkowski vacua in IIB orientifolds, JHEP 06 (2006) 023, [hep-th/0603166].
[36] M. Gomez-Reino and C. A.
Scrucca, Locally stable non-supersymmetric Minkowski vacua in supergravity, JHEP 05 (2006) 015, [hep-th/0602246].
[37] M. Berg, M. Haack, and B. Körs, String loop corrections to Kaehler potentials in orientifolds, JHEP 11 (2005) 030, [hep-th/0508043].
[38] I. Antoniadis, C. Bachas, C. Fabre, H. Partouche, and T. R. Taylor, Aspects of type I - type II - heterotic triality in four dimensions, Nucl. Phys. B489 (1997) 160–178, [hep-th/9608012].
[39] C. Angelantonj and A. Sagnotti, Open strings, Phys. Rept. 371 (2002) 1–150, [hep-th/0204089].
[40] C. P. Burgess, P. Camara, S. de Alwis, S. Giddings, A. Maharana, F. Quevedo, and K. Suruliz, Warped supersymmetry breaking, hep-th/0610255.
[41] S. B. Giddings and A. Maharana, Dynamics of warped compactifications and the shape of the warped landscape, Phys. Rev. D73 (2006) 126003, [hep-th/0507158].
[42] C. P. Burgess, R. Kallosh, and F. Quevedo, de Sitter string vacua from supersymmetric D-terms, JHEP 10 (2003) 056, [hep-th/0309187].
[43] O. Lebedev, H. P. Nilles, and M. Ratz, de Sitter vacua from matter superpotentials, Phys. Lett. B636 (2006) 126, [hep-th/0603047].
[44] E. Dudas and Y. Mambrini, Moduli stabilization with positive vacuum energy, JHEP 10 (2006) 044, [hep-th/0607077].
[45] E. Dudas, C. Papineau, and S. Pokorski, Moduli stabilization and uplifting with dynamically generated F-terms, JHEP 02 (2007) 028, [hep-th/0610297].
[46] E. Cremmer, S. Ferrara, C. Kounnas, and D. V.
Nanopoulos, Naturally vanishing cosmological constant in N=1 supergravity, Phys. Lett. B133 (1983)
[47] P. Candelas, A. Font, S. H. Katz, and D. R. Morrison, Mirror symmetry for two parameter models. 2, Nucl. Phys. B429 (1994) 626–674, [hep-th/9403187].
[48] G. Curio and V. Spillner, On the modified KKLT procedure: A case study for the P(11169)(18) model, hep-th/0606047.
[49] M. Dine and N. Seiberg, Is the superstring weakly coupled?, Phys. Lett. B162 (1985) 299.
[50] V. Balasubramanian and P. Berglund, Stringy corrections to Kaehler potentials, SUSY breaking, and the cosmological constant problem, JHEP 11 (2004) 085, [hep-th/0408054].
[51] M. Berg, M. Haack, and B. Körs, On volume stabilization by quantum corrections, Phys. Rev. Lett. 96 (2006) 021601, [hep-th/0508171].
[52] R. Blumenhagen, B. Körs, D. Lüst, and S. Stieberger, Four-dimensional string compactifications with D-branes, orientifolds and fluxes, hep-th/0610327.
[53] M. B. Green, Interconnections between type II superstrings, M theory and N = 4 Yang-Mills, hep-th/9903124.
[54] K. Choi and H. P. Nilles, The gaugino code, JHEP 04 (2007) 006, [hep-ph/0702146].
[55] G. von Gersdorff and A. Hebecker, Kaehler corrections for the volume modulus of flux compactifications, Phys. Lett. B624 (2005) 270–274, [hep-th/0507131].
[56] H. P. Nilles, Supersymmetry, supergravity and particle physics, Phys. Rept. 110 (1984) 1.
[57] S. P. Martin, A supersymmetry primer, hep-ph/9709356.
[58] V. S. Kaplunovsky, One loop threshold effects in string unification, Nucl. Phys.
B307 (1988) 145, [hep-th/9205068].
[59] J. P. Conlon, D. Cremades, and F. Quevedo, Kaehler potentials of chiral matter fields for Calabi-Yau string compactifications, JHEP 01 (2007) 022, [hep-th/0609180].
[60] D. Berenstein, Branes vs. GUTS: Challenges for string inspired phenomenology, hep-th/0603103.
[61] L. E. Ibanez, F. Marchesano, and R. Rabadan, Getting just the standard model at intersecting branes, JHEP 11 (2001) 002, [hep-th/0105155].
[62] P. Candelas, X. De La Ossa, A. Font, S. H. Katz, and D. R. Morrison, Mirror symmetry for two parameter models. I, Nucl. Phys. B416 (1994) 481–538, [hep-th/9308083].
[63] M. B. Green, J. A. Harvey, and G. W. Moore, I-brane inflow and anomalous couplings on D-branes, Class. Quant. Grav. 14 (1997) 47–52, [hep-th/9605033].
[64] K. Dasgupta, D. P. Jatkar, and S. Mukhi, Gravitational couplings and Z(2) orientifolds, Nucl. Phys. B523 (1998) 465–484, [hep-th/9707224].
[65] Y.-K. E. Cheung and Z. Yin, Anomalies, branes, and currents, Nucl. Phys. B517 (1998) 69–91, [hep-th/9710206].
[66] R. Minasian and G. W. Moore, K-theory and Ramond-Ramond charge, JHEP 11 (1997) 002, [hep-th/9710230].
[67] J. F. Morales, C. A. Scrucca, and M. Serone, Anomalous couplings for D-branes and O-planes, Nucl. Phys. B552 (1999) 291–315, [hep-th/9812071].
[68] B. Stefanski, Jr., Gravitational couplings of D-branes and O-planes, Nucl. Phys. B548 (1999) 275–290, [hep-th/9812088].
[69] C. P. Bachas, P. Bain, and M. B.
Green, Curvature terms in D-brane actions and their M-theory origin, JHEP 05 (1999) 011, [hep-th/9903210].
[70] A. Fotopoulos, On (alpha’)**2 corrections to the D-brane action for non-geodesic world-volume embeddings, JHEP 09 (2001) 005, [hep-th/0104146].
[71] E. Dudas, G. Pradisi, M. Nicolosi, and A. Sagnotti, On tadpoles and vacuum redefinitions in string theory, Nucl. Phys. B708 (2005) 3–44, [hep-th/0410101].
[72] N. Kaloper and R. C. Myers, The O(dd) story of massive supergravity, JHEP 05 (1999) 010, [hep-th/9901045].
[73] S. Kachru, M. B. Schulz, and S. Trivedi, Moduli stabilization from fluxes in a simple IIB orientifold, JHEP 10 (2003) 007, [hep-th/0201028].
[74] M. Berg, M. Haack, and B. Körs, Loop corrections to volume moduli and inflation in string theory, Phys. Rev. D71 (2005) 026005, [hep-th/0404087].
[75] M. Berg, M. Haack, and B. Körs, On the moduli dependence of nonperturbative superpotentials in brane inflation, hep-th/0409282.
[76] D. Lüst, P. Mayr, R. Richter, and S. Stieberger, Scattering of gauge, matter, and moduli fields from intersecting branes, Nucl. Phys. B696 (2004) 205–250, [hep-th/0404134].
[77] C. P. Burgess and T. R. Morris, Open and unoriented strings a la Polyakov, Nucl. Phys. B291 (1987) 256.
[78] C. P. Burgess and T. R. Morris, Open superstrings a la Polyakov, Nucl. Phys. B291 (1987) 285.
Introduction Review KKLT Consistency of KKLT Large volume scenario (LVS) String loop corrections to LVS From toroidal orientifolds to Calabi-Yau manifolds LVS with loop corrections The P[1,1,1,6,9]4 model Gaugino masses Including loop corrections Other soft terms LVS for other classes of Calabi-Yau manifolds? Abundance of ``Swiss cheese'' Calabi-Yau manifolds Toroidal orientifolds Fibered Calabi-Yau manifolds Further corrections Conclusions Some details on LVS LVS for P[1,1,1,6,9]4 Many Kähler moduli Loop corrected inverse Kähler metric for P[1,1,1,6,9]4 No-scale Kähler potential in type II string theory No-scale structure in type IIA No-scale structure in type IIB Cancellation with just the volume modulus Cancellation with many Kähler moduli Perturbative corrections to Vnp1 and Vnp2 KK spectrum with fluxes The orientifold calculation Factorized approximation Factorized approximation of the scalar potential

ABSTRACT We subject the phenomenologically successful large volume scenario of hep-th/0502058 to a first consistency check in string theory. In particular, we consider whether the expansion of the string effective action is consistent in the presence of D-branes and O-planes. Due to the no-scale structure at tree-level, the scenario is surprisingly robust. We compute the modification of soft supersymmetry breaking terms, and find only subleading corrections. We also comment that for large-volume limits of toroidal orientifolds and fibered Calabi-Yau manifolds the corrections can be more important, and we discuss further checks that need to be performed.
<|endoftext|><|startoftext|> Driven activation versus thermal activation

Patrick Ilg∗ and Jean-Louis Barrat
Université de Lyon; Univ. Lyon I, Laboratoire de Physique de la Matière Condensée et des Nanostructures; CNRS, UMR 5586, 43 Bvd. du 11 Nov. 1918, 69622 Villeurbanne Cedex, France
(Dated: November 2, 2018)

Activated dynamics in a glassy system undergoing steady shear deformation is studied by numerical simulations. Our results show that the external driving force has a strong influence on the barrier crossing rate, even though the reaction coordinate is only weakly coupled to the nonequilibrium system. This "driven activation" can be quantified by introducing in the Arrhenius expression an effective temperature, which is close to the one determined from the fluctuation-dissipation relation. This conclusion is supported by analytical results for a simplified model system.

PACS numbers: 64.70.Pf, 05.40.-a, 05.70.Ln

Activated rate theory is ubiquitous in the description and understanding of dynamical processes in condensed matter, physical chemistry and materials science. The basic problem, known as the "barrier crossing" or "Kramers" problem, is that of a single degree of freedom, coupled to a heat bath and moving in a double-well potential. The "barrier crossing rate" is defined as the inverse of the average time taken by the system to switch from one potential well to the other under the influence of thermal noise. In general, the single degree of freedom, often called the "reaction coordinate", is coupled to a complex, fluctuating environment. The "thermal noise" is a schematic description of the interaction with this environment.

This approach has been applied to a wealth of different problems. We can mention, for example, diffusion in solids, in which case the reaction coordinate is an atomic position and the noise is associated with thermal vibrations.
In isomerization reactions, the reaction coordinate is an internal coordinate of the molecule, coupled to a liquid solvent. In nucleation theory, the internal coordinate describes a collective fluctuation of an order parameter, and the "barrier" is interpreted as a free energy barrier rather than an energy barrier. Other examples involve the Eyring theory of plasticity in solids, in which the activated process is associated with a local strain change.

The analysis of the barrier crossing problem is often associated with the names of Eyring, who proposed the so-called "transition state approximation" [1], and of Kramers, who made the first complete analysis of the problem in the limits of low and high friction [2]. Since then, many refinements of the theory have been studied; they are reviewed in reference [3]. In all cases, it turns out that an essential factor in the reaction rate, which to a large extent governs the variation with temperature T, is the Arrhenius contribution:

r(T) ∼ exp(−∆E/kBT) (1)

where ∆E is the energy barrier to overcome. The exponential variation of the Arrhenius factor (1) is, in fact, the hallmark of activated processes.

∗Present address: ETH Zürich, Polymer Physics, HCI H541, CH-8093 Zürich, Switzerland

As discussed above, activated processes are often invoked in the description of the dynamical response of condensed matter systems. As such, they will typically take place under nonequilibrium conditions. The deviation from equilibrium can be weak, e.g. during the flow of a Newtonian liquid, in which case the applicability of equation (1) is straightforward. In other cases, however, the same equation is applied to systems that are strongly out of equilibrium, in the sense that their response to an external driving force is strongly nonlinear, or that their phase space distribution is very different from the equilibrium Gibbs-Boltzmann distribution.
A prototypical example of such a strongly nonequilibrium situation is the flow of a glassy system. Such a flow can be induced only by stresses larger than the yield stress (see e.g. [4] for the effect of strain and temperature in glassy solids). In the absence of flow, the relaxation is very slow, and the system is out of equilibrium and non-stationary [5, 6]. The flow produces a nonequilibrium steady state [7, 8, 9], with a typical relaxation time that is fixed internally by the applied stress or the strain rate. This situation has attracted a considerable amount of theoretical and experimental interest in two different contexts. The first is the rheology of "soft glasses" (emulsions, pastes, colloidal glasses, foams). The second is the plastic deformation of bulk metallic glasses. In both cases, approaches have been proposed that introduce a "noise temperature" [10] or "disorder temperature" [11, 12]. In [10], this noise temperature replaces the actual temperature in equation (1). In such models, the effective temperature is introduced in a somewhat empirical manner.

Another concept of effective temperature, rooted in statistical mechanics, was introduced in [13, 14], based on the "fluctuation-dissipation ratio" (FDR). At equilibrium, the fluctuation-dissipation theorem states that the ratio between integrated response and correlation functions is equal to the temperature. Cugliandolo et al. [13] showed how this concept could be extended to out-of-equilibrium systems by defining the effective temperature from the FDR, which now differs from the thermal bath temperature. It was proposed that a thermometer probing a nonequilibrium system on long time scales would actually be sensitive to this effective temperature, and this result was checked numerically on simple models [8, 9, 15]. Experimental evidence supporting this definition of an effective temperature has been found e.g. in [16, 17].

arXiv: http://arxiv.org/abs/0704.0738v1
In this contribution, we explore the influence of an external driving force on the rate of a simple activated process. Our primary objective here is to check how the external drive, and the "noise" it generates, can influence the dynamics of an internal degree of freedom that is not directly coupled to the driving force. A very standard way of quantifying the results is to use the Arrhenius representation, which provides an operational way of introducing an "activation temperature" that can be compared to other calculations of effective temperatures in nonequilibrium systems.

Our approach involves the simulation of the classical Kob-Andersen "binary Lennard-Jones" model undergoing shear flow, similar to the one used in ref. [8]. In order to probe activated dynamics, one appealing possibility would be to identify and study the activated events that actually give rise to the flow at low temperature, in the spirit of [10]. This approach, however, is difficult and could yield ambiguous results, as the flow is self-consistently coupled to these events. We therefore make use of the flexibility of numerical modeling to devise a very simple "activated degree of freedom" that has only a weak coupling to the existing flow in our system. This is achieved by replacing each particle of the minority species, at position $\mathbf{r}^B_j$, by a peanut-shaped "dumbbell" whose two centers of force have coordinates $\mathbf{r}^B_j \pm (u_j/2)\,\mathbf{e}_z$, with fixed orientation along $\mathbf{e}_z$, the direction perpendicular to the shear plane. Each center of force in the dumbbell carries half of the particle interaction, and the separation u between the two centers of force is small enough that the perturbation of the surrounding fluid can be neglected.
The important feature of the model is the fact that the two centers of force are related through an internal "reaction coordinate" u, which evolves in a bistable intramolecular potential $V(u) = (V_0/u_0^4)\,(u^2 - u_0^2)^2$, where u0 = 0.1 (in Lennard-Jones units) is the equilibrium dumbbell separation (see fig. 1b). Each dumbbell is therefore a simple "two-state" system which can undergo, under the influence of the interactions with the surrounding fluid, an "isomerization reaction". This reaction corresponds to exchanging the positions of the two centers of force (see fig. 1).

This "isomerization" will be the focus of our study. Its rate can be studied as a function of the imposed barrier height, of the external temperature T and of the driving force, which is here quantified by the shear rate γ̇. We have chosen to work under conditions for T and γ̇ that have been well characterized previously [8] (T = 0.3 and γ̇ = 10⁻³, in Lennard-Jones units) and to concentrate on the influence of the barrier height ∆E = V0. At this temperature, the system would not undergo structural relaxation on the time scales that can be achieved using computer simulation. Under the influence of the external drive, a relaxation on a time scale τα ≃ 100 is observed. This time scale is very well separated from microscopic, vibrational time scales, so that our system is a practical realization of the theoretical concepts described in ref. [13].

Determination of reaction rates is a notoriously difficult challenge for numerical simulations, as the activated events typically take place on much larger time scales than the short-time vibrations of the intramolecular bonds. A number of sophisticated methods [18] have been developed to bypass this intrinsic difficulty, either from biased simulations or by making direct use of the rate formula (1). Unfortunately, such methods always assume that the system is close to thermal equilibrium, and are therefore inapplicable in our case.
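As a concrete illustration of the bistable intramolecular potential introduced above, the following sketch assumes the quartic double-well form $V(u) = V_0 (u^2 - u_0^2)^2 / u_0^4$, which has minima at u = ±u0 and a barrier of height V0 at u = 0. The function names are illustrative and not taken from the simulation code.

```python
def double_well(u, v0, u0=0.1):
    """Assumed bistable potential V(u) = v0 * (u^2 - u0^2)^2 / u0^4."""
    return v0 * (u**2 - u0**2) ** 2 / u0**4

def double_well_force(u, v0, u0=0.1):
    """Force -dV/du on the internal coordinate u for the potential above."""
    return -4.0 * v0 * u * (u**2 - u0**2) / u0**4

v0, temperature = 3.0, 0.3
barrier = double_well(0.0, v0) - double_well(0.1, v0)  # barrier height, equal to v0
reduced_barrier = barrier / temperature                # V0/T, about 10 for these values
print(barrier, reduced_barrier)
```

The force vanishes at the two minima and at the barrier top, and it pushes u back towards ±u0 on either side of a minimum, so each dumbbell behaves as a two-state system with dividing surface at u = 0.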
The forward flux method recently proposed in [19] is applicable to nonequilibrium systems. However, only a single reaction coordinate per system can be treated with this method, which is impractical for the present situation. As a result, we have to use "brute force" simulations to obtain reaction rates from the study of individual trajectories, which seriously limits the range of barrier heights that can be considered.

The Sllod equations of motion appropriate for a fluid undergoing simple shear were integrated with a leapfrog algorithm using a time step of ∆t = 5×10⁻⁴ in reduced Lennard-Jones units [8]. For the dumbbell particles, the leapfrog algorithm is applied to the center-of-mass and relative positions and momenta. Lees-Edwards periodic boundary conditions [8] are employed in order to minimize effects due to the finite system size. Constant temperature conditions are ensured by rescaling the velocity components in the neutral direction of all particles at each time step.

The reaction rate r is determined from the number correlation function C(t) by

C(t) = 〈δn(t)δn(0)〉 / 〈δn(0)²〉 ≈ exp[−rt/〈n〉] (2)

where δn(t) = n(t) − 〈n〉, and n(t) equals one if u(t) > uB and zero otherwise [20]. The systems studied contain N = 2048 particles, 410 of which are dumbbells. Eq. (2) is evaluated from an ensemble average over 20−40 independent systems. The results are found to be independent of the exact location of the dividing surface uB in the vicinity of the barrier maximum uB = 0. The fast initial decay of C(t) is well described by transition state theory. Escape rates are extracted from fits to Eq. (2) for intermediate times 5 ≤ t ≤ 10. We verified that very similar results are found within a broad range 1 ≤ t ≤ 30, before the correlation function finally decays to zero, in full agreement with theoretical expectation [20]. For relatively low barrier heights V0/T ≲ 3, C(t) decays more rapidly, so that we extracted rates for shorter times, 0.5 ≤ t ≤ 1.
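The rate extraction based on Eq. (2) can be illustrated on synthetic data. The sketch below does not use simulation output: it generates symmetric two-state (telegraph) trajectories with a known per-step switching probability `p_switch`, a hypothetical stand-in for the indicator n(t), computes the normalized number correlation function, and recovers the rate from a linear fit of ln C(t). All names and parameter values are illustrative assumptions.

```python
import numpy as np

def two_state_trajectories(n_traj, n_steps, p_switch, rng):
    """Synthetic indicator n(t) in {0, 1}: each step flips with probability p_switch."""
    start = rng.integers(0, 2, size=(n_traj, 1))
    flips = rng.random((n_traj, n_steps)) < p_switch
    return (start + np.cumsum(flips, axis=1)) % 2

def number_correlation(n, max_lag):
    """C(lag) = <dn(t + lag) dn(t)> / <dn^2>, averaged over time and trajectories."""
    dn = n - n.mean()
    variance = np.mean(dn * dn)
    n_steps = n.shape[1]
    return np.array([np.mean(dn[:, : n_steps - lag] * dn[:, lag:]) / variance
                     for lag in range(max_lag + 1)])

def rate_from_correlation(n, dt, fit_lags):
    """Fit ln C(t) = -r t / <n> over the chosen lags and return r."""
    lags = np.asarray(fit_lags)
    c = number_correlation(n, int(lags.max()))
    slope = np.polyfit(lags * dt, np.log(c[lags]), 1)[0]
    return -slope * n.mean()

rng = np.random.default_rng(1)
p_switch, dt = 0.01, 1.0
n = two_state_trajectories(n_traj=200, n_steps=5000, p_switch=p_switch, rng=rng)
rate_estimate = rate_from_correlation(n, dt, fit_lags=range(5, 51))
print(rate_estimate, p_switch / dt)
```

For this toy process the exact correlation function is (1 − 2p)^t, so with 〈n〉 = 1/2 the fitted rate should lie close to p/dt. In the simulations the fit window has to be chosen after the fast initial decay, as described above.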
In the following, we present results for the reaction rate as a function of $V_0$ based on the study of the decay in the number-number correlation function [20]. We adopt common practice by giving all temperature and energy values in terms of the depth of the Lennard-Jones potential $\epsilon$.

Figure 2 illustrates the difficulty of the approach by showing the trajectories of selected dumbbells for different values of the barrier at $T = 0.3$. For $V_0/kT = 1$, barrier crossings are so common that describing them through classical rate theory is problematic. For $V_0/kT = 10$, the crossings become very unlikely, so that the determination of the rate becomes difficult. This leaves us with typically two decades in terms of variation of the reaction rate.

The corresponding reaction rates, determined from the correlation function of the dumbbell internal coordinate, are shown in figure 3. At $T = 0.8$, $\dot\gamma = 10^{-3}$, the rates obey the equilibrium Arrhenius law (1), showing that under these conditions the drive is only a weak perturbation to the system. We now concentrate on the rates obtained at $T = 0.3$ and $\dot\gamma = 10^{-3}$. The reaction rates are clearly influenced by the external driving imposed on the system. To show this, we use the rates obtained at a rather high temperature, $r_{eq}(T = 0.8)$, to extrapolate to $T = 0.3$. The equilibrium extrapolation $r_{ext}$ is achieved using the Arrhenius formula, i.e. $r_{ext}(T = 0.3) = r_{eq}(T = 0.8)\times\exp(+V_0/0.8 - V_0/0.3)$. Clearly, the extrapolated rates are significantly lower than those actually observed under shear, except at low barrier heights (high rates), where the two estimates almost coincide. The difference between the extrapolated rates and the measured ones is an indicator of the inadequacy of the standard Arrhenius formula, using the thermal bath temperature, in the driven system.
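The extrapolation formula for $r_{ext}$ is simple enough to write down directly. The sketch below only restates that formula; the function name and the unit input rate are hypothetical and are not simulation data.

```python
import math


def arrhenius_extrapolate(r_high, v0, t_high=0.8, t_low=0.3):
    """Equilibrium Arrhenius extrapolation of a rate from t_high to t_low.

    Implements r_ext = r_eq(t_high) * exp(+v0 / t_high - v0 / t_low),
    with temperatures and barrier height in Lennard-Jones units.
    """
    return r_high * math.exp(v0 / t_high - v0 / t_low)


# Suppression factor r_ext / r_eq for a few barrier heights
# (the input rate 1.0 is a placeholder, not a measured value).
for barrier in (0.5, 1.0, 2.0, 3.0):
    factor = arrhenius_extrapolate(1.0, barrier)
    print(f"V0 = {barrier}: r_ext / r_eq = {factor:.3e}")
```

The exponent is negative for any positive barrier because $T = 0.3$ is lower than $T = 0.8$, so the extrapolated rate falls off rapidly as $V_0$ grows.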
In spite of the limited range of accessible rates, it is clear from figure 3 that the rates in a glassy system under shear do not obey Arrhenius behavior of the form $\exp(-V_0/T)$ over the whole range of barrier heights under study. While this law is relatively well obeyed at low barriers and large crossing rates, it would significantly underestimate the rate for high barriers. Instead, at high barriers, the crossing rate is considerably increased. If an attempt is made to fit the results to an "effective Arrhenius factor", a value of $T_{eff} \simeq 0.6$ is obtained.

Under the same conditions, a completely different determination of the effective temperature [8], based on the fluctuation-dissipation approach mentioned above, yields $T^* \simeq 0.65$. This is in good agreement with the present fit to an Arrhenius law. The determination of $T_{eff}$ based on reaction rates is of limited accuracy, such that we cannot exclude that $T_{eff}$ and $T^*$ actually differ slightly or that $T_{eff}$ is slightly dependent on barrier height. A more precise determination of $T_{eff}$ would require larger barrier heights, which is computationally quite demanding. Note that $V_0 = 3$ already corresponds, for $T = 0.3$, to the rather high barrier height of $V_0/kT = 10$.

In figure 3, we also display the results obtained for the rates at a slightly higher value of the shear rate, $\dot\gamma = 10^{-2}$. The separation of time scales between relaxation time and microscopic times is less marked than for the low shear rates ($\tau_\alpha \simeq 10$ in this case). It appears that the increase in shear induces a change in the prefactor for the rates, rather than in the barrier height dependence. This is consistent with the relatively weak influence of shear rate on effective temperature reported earlier [8].

It is interesting to discuss the time scale at which the crossover between the two Arrhenius laws, characterized either by the bath temperature or an effective temperature, takes place.
A natural guess would be to associate this crossover with a value of the rate that corresponds to the inverse of the $\alpha$ relaxation time. The general idea is that fluctuations taking place on longer time scales will be associated with a higher temperature [8, 13]. In figure 3 we see that this guess overestimates the crossover rate by a factor of 5 in the case of $\dot\gamma = 10^{-3}$. It is not clear at this point whether this difference is significant or merely reflects some arbitrariness in the definition of relaxation times.

The simulation results presented above suggest that the activated dynamics is governed by an elevated temperature $T_{eff} \simeq 0.6 > T$. This temperature is consistent with the effective temperature $T^* = 0.65$ found in extensive simulation studies on the fluctuation-dissipation relation in this system [8]. In order to investigate the relation between $T_{eff}$ and $T$ and to rationalize our simulation results, we study the following toy model proposed in [21].

Consider a particle of mass $m$ at position $x$ moving in an external potential $V(x)$ under the influence of two thermal baths. One bath, associated with the fast degrees of freedom, is kept at temperature $T_{fast}$ and exerts an instantaneous friction force of strength $\Gamma_0$. The second bath, which mimics the slow degrees of freedom, is held at temperature $T_{slow}$ and is described by the retarded friction coefficient (memory kernel) $\Gamma(t)$. The equations of motion read

$$\dot x = v, \qquad m\dot v = -V'(x) - \int ds\,\Gamma(t-s)\,v(s) - \Gamma_0 v(t) + \xi(t) + \eta(t) \qquad (3)$$

The fast bath is modeled as Gaussian white noise with $\langle\eta(t)\rangle = 0$, $\langle\eta(t)\eta(s)\rangle = 2T_{fast}\,\delta(t-s)$, whereas the random force due to the slow bath is described by $\langle\xi(t)\rangle = 0$, $\langle\xi(t)\xi(s)\rangle = 2T_{slow}\,\Gamma(t-s)$. We use an exponentially decaying memory kernel $\Gamma(t) = \alpha^{-1}e^{-t/(\alpha\gamma)}$, for which the non-Markovian dynamics (3) can equivalently be rewritten as Markovian dynamics in an extended set of variables [22]. Exact solutions of the model (3) for harmonic potentials $V$ are presented in [21].
For barrier crossing problems with double-well potentials $V$, no analytical solutions to (3) are known. We therefore extend the widely used transition state approximation to the present model after adiabatic elimination of the fast degrees of freedom. The resulting expression for the rate $r_{TST}$ is rather lengthy and will be presented elsewhere together with the (straightforward) procedure. For the double-well potential $V(x)$ considered above, the dependence of the rate $r_{TST}$ on the barrier height $V_0$ is again dominated by the Arrhenius factor, however with an effective temperature $T_{eff,TST} = T_{fast}\,w/[w + 4(T_{fast} - T_{slow})]$, where $w = T_{slow} + \alpha V'' T_{fast}$ and $V'' = 8V_0/u_0^2$. Thus, if the slow and fast baths are both kept at the same temperature, $T_{slow} = T_{fast} = T$, one recovers the usual Kramers result with $T_{eff,TST} = T$. If, however, $T_{slow} > T_{fast}$, the escape rate is enhanced due to $T_{eff,TST} > T_{fast}$. Due to the interplay between fast and slow dynamics in the barrier crossing, the effective temperature is in general intermediate between the temperatures of the slow and the fast bath. These predictions are in agreement with the simulation results presented above. Furthermore, estimating the coefficient $\alpha \approx 0.01$ from the inverse high-frequency shear modulus for the Lennard-Jones system [22], the predicted effective temperature is $T_{eff,TST} \approx 0.45$. In view of the simplicity of the model and the uncertainty in $\alpha$, the order-of-magnitude agreement with the observed $T_{eff}$ is reasonable.

In conclusion, we have shown that activated processes out of equilibrium are influenced by an external driving, even if the corresponding degree of freedom is weakly coupled to the drive. Qualitatively, this increase of the rate is the essential result of our simulations. From a more quantitative point of view, the analysis of the Arrhenius plot allows one to define operationally an effective activation temperature.
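The expression for $T_{eff,TST}$ quoted above is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, $T_{slow}$ is identified with $T^* = 0.65$, and the barrier height $V_0 = 1.5$ used in the example is an arbitrary choice within the simulated range, not a value singled out in the text.

```python
def teff_tst(t_fast, t_slow, alpha, v0, u0=0.1):
    """Effective temperature of the two-bath transition state estimate.

    Uses T_eff = T_fast * w / (w + 4 (T_fast - T_slow)) with
    w = T_slow + alpha * V'' * T_fast and V'' = 8 V0 / u0^2.
    """
    curvature = 8.0 * v0 / u0 ** 2
    w = t_slow + alpha * curvature * t_fast
    denom = w + 4.0 * (t_fast - t_slow)
    if denom <= 0.0:
        # The closed form is not meaningful in this parameter range.
        raise ValueError("w + 4 (T_fast - T_slow) must be positive")
    return t_fast * w / denom


# Equal bath temperatures recover the bath temperature itself.
print(teff_tst(0.3, 0.3, alpha=0.01, v0=1.5))
# Hotter slow bath (assumed T_slow = 0.65) raises the effective temperature.
print(teff_tst(0.3, 0.65, alpha=0.01, v0=1.5))
```

With these assumed inputs the second value lies between the two bath temperatures; note that the expression depends on $V_0$ through $V''$, so a single number can only be indicative.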
The link of this activation temperature to other definitions of effective temperature, and the time scale for the crossover from "thermal activation" to "driven activation", will have to be explored further. However, the results are consistent with a general picture involving a degree of freedom coupled to two different heat baths, one associated with short-time vibrations and one associated with shear-induced fluctuations, taking place on longer time scales and described by a higher temperature [21].

This "driven activation" (as opposed to "thermal" activation) could have interesting consequences for characterizing the effective temperature of nonequilibrium systems, by providing a "thermometer" based on activated processes. It can also be of importance within the theory of plasticity of amorphous materials, by providing a self-consistent description of the "noise" that induces local plastic events, within a classical statistical mechanics description involving a noise temperature [11, 23]. Inserting an effective temperature in Eyring's rate theory of plasticity, T. Haxton and A. J. Liu were recently able to account for the flow curves of simple glassy systems at low temperatures [24].

[1] H. Eyring, J. Chem. Phys. 3, 107 (1935).
[2] H. A. Kramers, Physica 7, 284 (1940).
[3] P. Hänggi, P. Talkner, and M. Borkovec, Rev. Mod. Phys. 62, 251 (1990).
[4] J. Rottler and M. O. Robbins, Phys. Rev. E 68, 011507 (2003).
[5] L. Berthier, G. Biroli, J.-P. Bouchaud, L. Cipelletti, D. E. Masri, D. L'Hôte, F. Ladieu, and M. Pierno, Science 310, 1797 (2005).
[6] K. N. Pham, A. M. Puertas, J. Bergenholtz, S. U. Egelhaaf, A. Moussaïd, P. N. Pusey, A. B. Schofield, M. E. Cates, M. Fuchs, and W. C. K. Poon, Science 296, 104 (2002).
[7] L. Berthier, J.-L. Barrat, and J. Kurchan, Phys. Rev. E 61, 5464 (2000).
[8] L. Berthier and J.-L. Barrat, J. Chem. Phys. 116, 6228 (2002).
[9] L. Berthier and J.-L. Barrat, Phys. Rev. Lett. 89, 095702 (2002).
[10] P. Sollich, F. Lequeux, P. Hébraud, and M. E. Cates, Phys. Rev. Lett. 78, 2020 (1997).
[11] J. S. Langer, Phys. Rev. E 70, 041502 (2004).
[12] J. S. Langer and A. Lemaitre, Phys. Rev. Lett. 94, 175701 (2005).
[13] L. Cugliandolo, J. Kurchan, and L. Peliti, Phys. Rev. E 55, 3898 (1997).
[14] J. Kurchan, Nature 433, 222 (2005).
[15] I. K. Ono, C. S. O'Hern, D. J. Durian, S. A. Langer, A. J. Liu, and S. R. Nagel, Phys. Rev. Lett. 89, 095703 (2002).
[16] P. Wang, C. Song, and H. A. Makse, Nature Physics 2, 526 (2006).
[17] D. Herisson and M. Ocio, Phys. Rev. Lett. 88, 257202 (2002).
[18] See e.g. C. Dellago, P. G. Bolhuis, and P. L. Geissler, Advances Chem. Phys. 123, 1 (2002).
[19] R. J. Allen, P. B. Warren, and P. R. ten Wolde, Phys. Rev. Lett. 94, 018104 (2005).
[20] D. Chandler, J. Chem. Phys. 68, 2959 (1978).
[21] P. Ilg and J.-L. Barrat, J. Phys.: Conf. Ser. 40, 76 (2006).
[22] J.-L. Barrat, Chem. Phys. Lett. 165, 551 (1990).
[23] A. Lemaitre and C. Caroli, cond-mat/0609689.
[24] A. Liu, private communication.

Figures

[Figure 1; horizontal axis of panel (b): dumbbell separation u]

FIG. 1: (a) Schematic representation of a dumbbell particle in a system undergoing shear flow with fixed orientation perpendicular to the shear plane. Also shown is an isomerization reaction. The magnitude of the separation between the two centers of force is considerably exaggerated in this schematic representation.
(b) Intramolecular potential (characteristic of the nonlinear "spring" shown in panel (a)) between the two centers of force that define the dumbbell.

FIG. 2: (Color online) Trajectories of the internal dumbbell coordinate for different barrier heights. The thermal bath temperature is $T = 0.3$ in reduced Lennard-Jones units, and the shear rate is $\dot\gamma = 10^{-3}$.

[Figure 3; horizontal axis: barrier height V; plot annotations: $\tau_\alpha$ for $\dot\gamma = 10^{-2}$ and $\dot\gamma = 10^{-3}$, reference slopes 1/0.8, 1/0.3 and 1/0.6]

FIG. 3: (Color online) Reaction rate as a function of barrier height, for fixed temperature and shear rate. Full squares: results for $T = 0.8$ (red) at equilibrium. $T = 0.3$ (black and blue), and different shear rates (full diamonds and circles correspond to $\dot\gamma = 10^{-2}$ and $\dot\gamma = 10^{-3}$, respectively). Open circles represent $r_{ext}$, an extrapolation of the high temperature results to $T = 0.3$ as explained in the text.

ABSTRACT Activated dynamics in a glassy system undergoing steady shear deformation is studied by numerical simulations. Our results show that the external driving force has a strong influence on the barrier crossing rate, even though the reaction coordinate is only weakly coupled to the nonequilibrium system. This "driven activation" can be quantified by introducing in the Arrhenius expression an effective temperature, which is close to the one determined from the fluctuation-dissipation relation. This conclusion is supported by analytical results for a simplified model system.
<|endoftext|><|startoftext|> Computation of Power Loss in Likelihood Ratio Tests for Probability Densities Extended by Lehmann Alternatives

Lucas Gallindo Martins Soares
Departamento de Estatística e Informática, Universidade Federal Rural de Pernambuco, Brasil
lucasgallindo@gmail.com

Abstract We compute the loss of power in likelihood ratio tests when we test the original parameter of a probability density extended by the first Lehmann alternative.
1 Distributions Generated by Lehmann Alternatives

In the context of parametric models for lifetime data, [Gupta et alii 1998] disseminated the study of distributions generated by Lehmann alternatives, cumulative distributions that take one of the following forms:

$$G_1(x, \lambda) = [F(x)]^{\lambda} \quad\text{or}\quad G_2(x, \lambda) = 1 - [1 - F(x)]^{\lambda} \qquad (1)$$

where $F(x)$ is any cumulative distribution and $\lambda > 0$. In the present note, we call both $G$ distributions generated distributions or extended distributions. It is easy to see that, for integer values of $\lambda$, $G_1$ and $G_2$ are, respectively, the distributions of the maximum and the minimum of a sample of size $\lambda$, that the support of the two distributions is the same as that of $F$, and that the associated density functions are

$$g_1(x, \lambda) = \lambda f(x)\,[F(x)]^{\lambda-1} \quad\text{and}\quad g_2(x, \lambda) = \lambda f(x)\,[1 - F(x)]^{\lambda-1} \qquad (2)$$

where $f(x)$ is the density function associated with $F$. Suppose that we generate a distribution $G(x|\lambda)$ based on the distribution $F(x)$, and want to generate another distribution $G'(x|\lambda, \lambda')$ by repeating the process. It is easy to see that the distribution $G'$ will be of the same form as $G$, since the new parameter of the distribution, $\lambda\lambda'$, may be summarized as a single one. This has the interesting side effect that the standard uniparametric exponential distribution may be seen as a distribution generated by the second Lehmann alternative from the distribution $F(x) = 1 - e^{-x}$.

To compute the moments of distributions generated by Lehmann alternatives, we use the change of variables $u = F(x)$ in the expression

$$E[X^k] = \int x^k \lambda f(x)\,[F(x)]^{\lambda-1}\,dx \qquad (3)$$

yielding

$$\int_0^1 \lambda\,Q^k(u)\,u^{\lambda-1}\,du = E_{Beta(\lambda,1)}\left[Q^k(u)\right] \qquad (4)$$

where $Q(u) = F^{-1}(u)$ is the quantile function. This integral is equivalent to the expectation of $Q^k(u)$ with respect to a Beta distribution with parameters $\alpha = \lambda$, $\beta = 1$. The same reasoning can be used to show that, for the second Lehmann alternative, $E[X^k] = E_{Beta(1,\lambda)}\left[Q^k(u)\right]$.
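The moment formula (4) can be checked numerically on a case where the answer is known. The sketch below assumes a unit exponential base distribution, for which $Q(u) = -\ln(1-u)$, and $\lambda = 2$, so that $G_1$ is the distribution of the maximum of two unit exponentials, whose mean is $1 + 1/2$. The function names and the midpoint quadrature are illustrative choices, not part of the original note.

```python
import math


def lehmann_max_moment(quantile, lam, k=1, steps=200_000):
    """Midpoint-rule estimate of E[X^k] under G1 = F^lam, using Eq. (4).

    Integrates lam * Q(u)^k * u^(lam - 1) over u in (0, 1).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += lam * quantile(u) ** k * u ** (lam - 1.0)
    return total * h


def exp_quantile(u):
    """Quantile function of the unit exponential, F(x) = 1 - exp(-x)."""
    return -math.log(1.0 - u)


estimate = lehmann_max_moment(exp_quantile, lam=2.0)
print(f"numerical mean: {estimate:.4f}, exact mean: {1.0 + 0.5:.4f}")
```

The integrand has an integrable logarithmic singularity at $u = 1$, which the midpoint rule handles without evaluating the endpoint.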
Using the log-likelihood functions

$$\ell_{G_1}(\lambda) = n\ln(\lambda) + \sum_{j=1}^{n} \ln f(x_j) + (\lambda - 1)\sum_{j=1}^{n} \ln F(x_j) \qquad (5)$$

$$\ell_{G_2}(\lambda) = n\ln(\lambda) + \sum_{j=1}^{n} \ln f(x_j) + (\lambda - 1)\sum_{j=1}^{n} \ln\left[1 - F(x_j)\right] \qquad (6)$$

we see that the maximum likelihood estimators of the parameter $\lambda$ have the forms

$$\hat\lambda = -\frac{n}{\sum_{j=1}^{n} \ln F(x_j)} \quad\text{and}\quad \hat\lambda = -\frac{n}{\sum_{j=1}^{n} \ln\left[1 - F(x_j)\right]} \qquad (7)$$

The existing literature about distributions generated by Lehmann alternatives concerns mostly distributions defined on the interval $(0, \infty)$ or on the real line, with the paper by [Nadarajah and Kotz 2006] being the most complete review of progress and the paper [Nadarajah 2006] being an interesting application of the concepts outside the original proposal by [Gupta et alii 1998], which was to analyze lifetime data. In the present paper, we are concerned with some information-theoretical quantities of the first extension. These are not the only papers dealing with the subject, but a complete list with comments would be a paper on its own.

2 Kullback-Leibler Divergence

Given two probability density functions, the quantity defined as

$$D_{KL}(f|g) = \int f(x)\ln\frac{f(x)}{g(x)}\,dx \qquad (8)$$

is called the Kullback-Leibler divergence (abbreviated $D_{KL}$) after the authors of the classical paper [Kullback and Leibler 1951]. Very often, this quantity is used as a measure of distance between two probability density functions, even though it is not a metric: this divergence is clearly greater than or equal to zero, with zero occurring if and only if $f = g$, but it is not symmetric, so in general $D_{KL}(f|g) \neq D_{KL}(g|f)$, and it does not obey the triangle inequality either. Rewriting equation (8), we get

$$\int f(x)\ln\frac{f(x)}{g(x)}\,dx = \int \left[f(x)\ln(f(x)) - f(x)\ln(g(x))\right]dx \qquad (9)$$

$$= E_f[\ln(f(X))] - E_f[\ln(g(X))] \qquad (10)$$

where $E_f[h(X)]$ is the expectation of the random variable $h(X)$ with respect to the probability density $f$. Since $D_{KL}(f|g)$ is greater than or equal to zero, we have

$$E_f[\ln(f(X))] \geq E_f[\ln(g(X))] \qquad (11)$$

We will now show that maximizing the likelihood is equivalent to minimizing $D_{KL}(f|e)$, where $e$ is the empirical distribution function.
Calculating $D_{KL}(f|e)$ we arrive at

$$D_{KL}(f|e) = E_f[\ln(f(X))] - \frac{1}{n}\sum_{j=1}^{n} \ln\left(f(x_j, \theta)\right) \qquad (12)$$

where the rightmost term is the empirical log-likelihood multiplied by a constant. So, by maximizing the rightmost term we minimize the whole divergence; the process of maximizing the likelihood is therefore equivalent to minimizing the divergence between the empirical density and the parametric model. This result is very common in the related literature, and is shown in full detail in sources like [Eguchi and Copas 1998], which gives an accessible but rather compact deduction of properties of methods based on likelihood functions using $D_{KL}$. In the next (and last) section we draw freely from a result shown in the [Eguchi and Copas 1998] paper, which states that $D_{KL}$ may be used to measure the loss of power in likelihood ratio tests when the distribution under the alternative hypothesis is mis-specified.

3 Wrong Specification of Reference Distribution and Loss of Power in Likelihood Ratio Tests

Suppose we have data from a probability distribution $H(x|\theta, \lambda)$, and want to test the hypothesis that $(\theta = \theta_0, \lambda = \lambda_0)$. The usual log-likelihood ratio is expressed as

$$\Lambda(\lambda_0, \theta_0) = \frac{\ell(\hat\lambda, \hat\theta)}{\ell(\lambda_0, \theta_0)} \qquad (13)$$

where the notation $\hat\xi$ is used for the unrestricted maximum likelihood estimate of the parameter $\xi$. Suppose we are not willing (or not able) to compute $\ell(\hat\lambda, \hat\theta)$ because estimating the parameter $\lambda$ is troublesome, and decide to approximate the likelihood ratio statistic using $\ell(\lambda_1, \tilde\theta)$ instead of the likelihood under the alternative hypothesis, where $\tilde\theta$ is the maximum likelihood estimator of $\theta$ given that $\lambda = \lambda_1$.
We have then the relation

$$\Lambda(x) \approx \frac{\ell(\lambda_1, \tilde\theta)}{\ell(\lambda_0, \theta_0)} \qquad (14)$$

A result by [Eguchi and Copas 1998], section 3, states that the test statistic generated this way is less powerful than the usual one, with the loss in power equal to

$$\Delta\mathrm{Power} = D_{KL}\left(f(x|\hat\lambda, \hat\theta),\, f(x|\lambda_1, \tilde\theta)\right) \qquad (15)$$

In the present paper, we are concerned with the case where the data follow a distribution extended with the first Lehmann alternative, where the original distribution is such that $F = F(x|\theta)$ for a parameter $\theta$. The null hypothesis will be of the form

$$H_0 : \theta = \theta_0,\ \lambda = 1 \qquad (16)$$

against an alternative hypothesis

$$H_A : \theta \neq \theta_0,\ \lambda \neq 1 \qquad (17)$$

If we erroneously consider that the data do not come from an extended distribution $G(x|\lambda, \theta)$, but from a population that follows the original $F(x|\theta)$ distribution, we are approximating the log-likelihood under the alternative hypothesis as in the previous discussion. In this case, the log-likelihood will be taken under the hypothesis

$$H_{A'} : \theta \neq \theta_0,\ \lambda = 1 \qquad (18)$$

which generates the following expression for the likelihood ratio:

$$\Lambda(x) \approx \frac{\ell(1, \tilde\theta)}{\ell(1, \theta_0)} \qquad (19)$$

Then the test has less power than the one using the full $G$ distribution; the difference in the power of the tests is given by

$$\Delta\mathrm{Power} = D_{KL}\left(g(x|\hat\lambda, \hat\theta)\,|\,g(x|1, \tilde\theta)\right) \qquad (20)$$

The main point in the above discussion is that, for testing hypotheses about the "original" parameter $\theta$, the tests using the extended version of the distributions are always more powerful, with a considerable difference in the type II error rate.
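The power-loss divergence above can be evaluated numerically in a simple setting. The sketch below assumes that both densities share the same base distribution $F$ (that is, $\hat\theta = \tilde\theta$, an assumption made here for illustration); the substitution $u = F(x)$ then removes $F$ altogether and leaves a one-dimensional integral in $u$ that depends only on $\lambda$. The closed form used for comparison, $\ln\lambda + 1/\lambda - 1$, follows from $\int_0^1 \lambda u^{\lambda-1}\ln u\,du = -1/\lambda$. Function names are hypothetical.

```python
import math


def lehmann_kl(lam, steps=200_000):
    """Midpoint-rule estimate of D_KL(g1(.|lam) | f) after u = F(x).

    With u = F(x), the density ratio g1/f equals lam * u^(lam - 1)
    and the integration measure f(x) dx becomes du on (0, 1).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        ratio = lam * u ** (lam - 1.0)
        total += ratio * math.log(ratio)
    return total * h


def lehmann_kl_closed_form(lam):
    """Closed form ln(lam) + 1/lam - 1 under the same assumption."""
    return math.log(lam) + 1.0 / lam - 1.0


for lam in (1.0, 2.0, 5.0):
    print(lam, lehmann_kl(lam), lehmann_kl_closed_form(lam))
```

Under this assumption the divergence vanishes at $\lambda = 1$ and grows with $\lambda$ for $\lambda > 1$, which is the regime in which the loss of power is plotted.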
Expanding equation (20), we have

$$\Delta P = D_{KL}\left(g(x|\hat\lambda, \hat\theta)\,|\,g(x|1, \tilde\theta)\right) \qquad (21)$$

$$= \int g(x|\hat\lambda, \hat\theta)\,\ln\frac{g(x|\hat\lambda, \hat\theta)}{g(x|1, \tilde\theta)}\,dx \qquad (22)$$

$$= \int \lambda f(x|\hat\lambda, \hat\theta)\,F^{\lambda-1}(x|\hat\lambda, \hat\theta)\,\ln\frac{\lambda f(x|\hat\lambda, \hat\theta)\,F^{\lambda-1}(x|\hat\lambda, \hat\theta)}{f(x|1, \tilde\theta)}\,dx \qquad (23)$$

$$= \int \lambda f(x|\hat\lambda, \hat\theta)\,F^{\lambda-1}(x|\hat\lambda, \hat\theta)\,\ln\left[\lambda F^{\lambda-1}(x|\hat\lambda, \hat\theta)\right]dx \qquad (24)$$

$$= \ln\lambda + \int \lambda(\lambda - 1)\,f(x|\hat\lambda, \hat\theta)\,F^{\lambda-1}(x|\hat\lambda, \hat\theta)\,\ln F(x|\hat\lambda, \hat\theta)\,dx \qquad (25)$$

Integrating by parts, we get

$$\Delta\mathrm{Power} = \ln\lambda + \frac{1 - \lambda}{\lambda} \qquad (26)$$

The graph of this function is the loss of power in our test when the distribution of our data is one extended by the first Lehmann alternative and we fail to notice it; it is depicted in Figure 1 for values of $\lambda$ greater than one.

References

[Eguchi and Copas 1998] EGUCHI, S. AND COPAS, J. (2006). Interpreting Kullback-Leibler divergence with the Neyman-Pearson lemma. Journal of Multivariate Analysis, vol. 97, Issue 9, pages 2034-2040.
[Gupta et alii 1998] GUPTA, R. C., GUPTA, P. L. AND GUPTA, R. D. (1998). Modeling failure time data by Lehman alternatives. Communication in Statistics: Theory and Methods, vol. 27, pages 887-904.
[Kullback and Leibler 1951] KULLBACK, S. AND LEIBLER, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, vol. 22, Number 1, pages 79-86.
[Nadarajah and Kotz 2006] NADARAJAH, S. AND KOTZ, S. (2006). The Exponentiated Type Distributions. Acta Applicandae Mathematicae, vol. 92, pages 97-111.
[Nadarajah 2006] NADARAJAH, S. (2006). The exponentiated Gumbel distribution with climate application. Environmetrics, vol. 17, Number 1, pages 13-23.

Figure 1: Loss of Power as a Function of $\lambda$, for $\lambda > 1$.
<|endoftext|><|startoftext|> Introduction The method Two-integral dynamics A preliminary analysis Results Miyamoto-Nagai disks Thick exponential disks Milky-Way like galaxies Discussion and conclusions REFERENCES ABSTRACT We investigate the possibility of discriminating between Modified Newtonian Dynamics (MOND) and Newtonian gravity with dark matter, by studying the vertical dynamics of disk galaxies. We consider models with the same circular velocity in the equatorial plane (purely baryonic disks in MOND and the same disks in Newtonian gravity embedded in spherical dark matter haloes), and we construct their intrinsic and projected kinematical fields by solving the Jeans equations under the assumption of a two-integral distribution function. We found that the vertical velocity dispersion of deep-MOND disks can be much larger than in the equivalent spherical Newtonian models. However, in the more realistic case of high-surface density disks this effect is significantly reduced, casting doubts on the possibility of discriminating between MOND and Newtonian gravity with dark matter by using current observations. <|endoftext|><|startoftext|> Al’tshuler-Aronov correction to the conductivity of a large metallic square network Christophe Texier1, 2 and Gilles Montambaux2 1Laboratoire de Physique Théorique et Modèles Statistiques, UMR 8626 du CNRS, Université Paris-Sud, F-91405 Orsay Cedex, France. 2Laboratoire de Physique des Solides, UMR 8502 du CNRS, Université Paris-Sud, F-91405 Orsay Cedex, France. (Dated: April 5, 2007) We consider the correction ∆σee due to electron-electron interaction to the conductivity of a weakly disordered metal (Al’tshuler-Aronov correction). The correction is related to the spectral determinant of the Laplace operator. The case of a large square metallic network is considered. 
The variation of $\Delta\sigma_{ee}(L_T)$ as a function of the thermal length $L_T$ is found to be very similar to the variation of the weak localization correction $\Delta\sigma_{WL}(L_\varphi)$ as a function of the phase coherence length. Our result for $\Delta\sigma_{ee}$ interpolates between the known 1d and 2d results, but the interaction parameter entering the expression of $\Delta\sigma_{ee}$ keeps a 1d behaviour. Quite surprisingly, the result is very close to the 2d logarithmic behaviour already for $L_T \sim a/2$, where $a$ is the lattice parameter.

PACS numbers: 73.23.-b ; 73.20.Fz ; 72.15.Rn

I. INTRODUCTION

At low temperature, the classical (Drude) conductivity of a weakly disordered metal is affected by two kinds of quantum corrections: the first one is the weak localization (WL) correction, a phase coherent contribution that originates from quantum interferences between reversed electronic trajectories. This contribution to the averaged conductivity depends on the phase coherence length $L_\varphi$ and the magnetic field: $\Delta\sigma_{WL}(B, L_\varphi)$. The temperature manifests itself through $L_\varphi$, since phase breaking may depend on temperature, e.g. if it originates from electron-electron [1] or electron-phonon [2] interaction.

In a metal, an electron is not only elastically scattered by the disordered potential but, due to the electron-electron interaction, is also scattered by the electrostatic potential created by the other electrons. At low temperatures, when the elastic scattering rate ($1/\tau_e$) dominates the electron-electron scattering rate ($1/\tau_{ee}(T)$), the motion of the electron is diffusive between scattering events with other electrons. In this regime, the electron-electron interaction is responsible for a small depletion of the density of states at the Fermi energy (called the DoS anomaly or the Coulomb dip) and for a correction to the averaged conductivity as well, the so-called Al'tshuler-Aronov (AA) correction [3,4,5,6,7,8,9] (see Refs. [10,11,12] for a recent discussion). AA and WL corrections are of the same order (but the latter vanishes in a magnetic field).
However, contrary to the WL, the AA correction is not sensitive to phase coherence and involves another important length scale: the thermal length $L_T = \sqrt{D/T}$ ($\hbar = k_B = 1$). The AA correction, denoted below $\Delta\sigma_{ee}(L_T)$, has been measured in metallic wires in several experiments [14,15,16,17]. From the experimental point of view, the AA correction makes it possible to study interaction effects in weakly disordered metals, and also furnishes a local probe of temperature in order to control Joule heating effects [15,17], which is crucial in a phase coherent experiment.

All the aforementioned works refer to the quasi-one-dimensional (wire) or two-dimensional (plane) situations. Quantum transport has also been studied in more complex geometries like networks of quasi-1d wires. For example, several studies of WL have been performed on large regular networks: honeycomb and square metallic networks [18,19], square networks etched in a 2DEG [20], and square and dice silver networks [21]. Theoretical studies of WL on networks were initiated by the works of Douçot & Rammal [22,23] and improved by Pascaud & Montambaux [24], who introduced a powerful tool [25]: the spectral determinant of the Laplace operator, which will be used in the following (see also Ref. [26]). The aim of this paper is to study how the AA correction can be computed in networks. In a first part we briefly recall how the spectral determinant can be used to compute the WL. Then in a second part we will consider the AA correction.

II. SPECTRAL DETERMINANT AND WEAK LOCALIZATION

Interferences of reversed electronic trajectories are encoded in the Cooperon, solution of a diffusion-like equation $(\partial_t - D[\nabla - 2ieA(x)]^2)\,P_c(x, x'; t) = \delta(x - x')\,\delta(t)$, where $A(x)$ is the vector potential.
On large regular networks, when it is justified to integrate uniformly the Cooperon over the network (see Ref.27 for a discus- sion of this point) it is meaningful to introduce the space-averaged Cooperon Pc(t) = Pc(x, x; t) http://arxiv.org/abs/0704.0741v1 ∆σWL = − dt e−t/τϕ Pc(t) (1) = −2e lnS(γ) (2) where τϕ = L ϕ/D is the phase coherence time. The factor 2 stands for spin degeneracy. We have omitted in (1,2) a factor 1/s where s is the cross-section of the wires. The parameter γ is re- lated to the phase coherence length γ = 1/L2ϕ (note that description of the decoherence due to electron-electron interaction in networks requires a more refined discussion28,29). The spectral de- terminant of the Laplace operator is formally de- fined as S(γ) = det(γ −∆) = n(γ + En) where {En} is the spectrum of −∆ [in the presence of a magnetic field, ∆ → (∇ − 2ieA)2]. The inter- est in introducing S(γ) is that it can be related to the determinant of a V × V -matrix, where V is the number of vertices, that encodes all informa- tion on the network (topology, length of the wires, magnetic field, connection to reservoirs). We la- bel vertices by greek letters. lαβ designates the length of the wire (αβ) and θαβ the circulation of the vector potential along the wire. The topology is encoded in the adjacency matrix : aαβ = 1 if α and β are linked by a wire, aαβ = 0 otherwise. λα = ∞ if α is connected to a reservoir and λα = 0 if not. We introduce the matrix Mαβ = δαβ aαµ coth −iθαβ where the aαµ constraints the sum to run over neighbouring vertices. Then24 S(γ) = γlαβ√ detM (4) where the product runs over all wires. We now con- sider a large square network of size Nx ×Ny made of wires of length lαβ = a ∀(αβ). For simplicity we impose periodic boundary conditions (topology of a torus), which is inessential as soon as the to- tal size of the network remains small compared to Lϕ. 
At zero magnetic field the spectrum of the adjacency matrix is ǫn,m = 2 cos(2nπ/Nx) + 2 cos(2mπ/Ny), with n = 1, · · · , Nx and m = 1, · · · , Ny. Therefore S(γ) = )NxNy 2 cosh γa− cos 2πn − cos 2πm The calculation of lnS(γ) involves a sum that can be replaced by an integral when Nx, Ny ≫ Lϕ/a. Using (2π)2 2A+ cosx+ cos y K(1/A), (6) where K(x) is the complete elliptic integral of first kind30, yields20 lnS(γ) = γa− 1√ where the volume of the network is Vol = 2NxNya. We recover the expression of the WL first derived by Douçot & Rammal23. Figure 1 displays the dependence of the WL correction as a function of the phase coherence length Lϕ. We now discuss two limiting cases. 0 1 2 3 4 5Lϕ/a 0.1 1.0 10.0 FIG. 1: ∆σWL in unit of 2e 2/h as a function of Lϕ/a (at zero magnetic field). The dashed line is the 1d re- sult. The dotted line is the 2d limit eq. (10). 1d limit.– In the limit Lϕ ≪ a (i.e. γa ≫ 1) : ∆σWL = − −2a/Lϕ We compare with the result for a wire of length a connected at its extremities : ∆σwireWL ≃ − 2e ). As we can see the dominant terms coincide. Deviations appear when Lϕ/a increases since tra- jectories begin to feel the topology of the network. This is already visible by comparing the second terms of the expansions. 2d limit.– In the limit Lϕ ≫ a (i.e. γa ≪ 1), we obtain lnS(γ) = ln(4Lϕ/a) + The conductivity reads ∆σWL ≃ − ln(Lϕ/a) + CWL with CWL = 2 ln 2 ≃ 0.608. As noticed in the beginning of the section, eqs. (8,10) should be di- vided by the cross-section s of the wires. In the 2d limit, diffusive trajectories expand over distances larger than a and feel the two dimensional nature of the system, being the reason why (10) is reminis- cent of the 2d result. It is interesting to point that the network provides a natural cutoff (the length of the wires, a) while the computation of the WL for a plane in the diffusion approximation requires to introduce a cutoff by hand for lower times in eq. (1), which is the elastic scattering time τe. 
In this latter case the constant added to the logarith- mic behaviour is not well controlled since it de- pends on the cutoff procedure (the computation of the constant for a plane requires to go beyond the diffusion approximation and leads to31 ∆σ plane ln(2L2ϕ/ℓ e + 1) ≃ − 2e ln(Lϕ/ℓe) + ln 2] since ℓe ≪ Lϕ). III. AL’TSHULER-ARONOV CORRECTION At first order in the electron-electron interac- tion, the exchange term is the dominant contribu- tion to the correction to the conductivity8,10,11,32 ∆σee = − dπVol ω coth D~q 2 U(~q, ω) (−iω +D~q 2)3 (11) where U(~q, ω) is the dynamically screened inter- action. Within the RPA approximation and in the small ~q and ω limit, the interaction takes the form33 U(~q, ω) ≃ 1 −iω+D~q 2 D~q 2 where ρ0 is the den- sity of states per spin channel. Replacing the Drude conductivity by its expression σ0 = 2e and performing an integration by parts, we get ∆σee = − πdVol ω coth −iω +D~q 2 (12) After Fourier transform, the result can be cast in the form11 : ∆σee = −λσ sinhπT t Pd(t) , For the exchange term considered here, one finds λσ = 4/d. Further calculation yields 8 λσ ≃ F , where F is the average of the interaction on the Fermi surface (see definition in Refs.8,9). This expression of λσ is valid in the perturba- tive regime, F ≪ 1 ; nonperturbative expression is given in Refs.6,7,8,9. Pd(t) is the space inte- grated return probability Pd(t) = Pd(x, x; t), where Pd(x, x′; t) is solution of a classical diffusion equation similar to the equation for Pc(x, x′; t), apart that it does not feel the magnetic field : [∂t−D∆]Pd(x, x′; t) = δ(x−x′)δ(t). Therefore the Laplace transform of Pd(t) is given by ∂γ lnS(γ) with θαβ = 0. It is interesting to point out that (13) has a similar structure to (1) with a different cutoff procedure for large time. It also involves a different scale : the temperature dependence of ∆σee is driven by the length scale LT instead of Lϕ for the weak-localization correction ∆σWL. Up to eq. 
(13) the discussion is rather general and nothing has been specified on the system. We have seen in section II that the WL for the square network presents a dimensional crossover from 1d to 2d by tuning Lϕ/a. A similar dimen- sional crossover occurs for the AA correction by tuning LT /a as we will see. This remark raises the question of the dimension d in eq. (11). To answer this question we should return to the origin of the factor 1/d : the current lines in the conductivity σij produce a factor qiqj replaced by δij ~q 2 after angular integration. Since in a network the diffu- sion in the wires has a 1d structure (provided that W ≪ LT ∼ D/ω, where W is the width of the wires), the dimension in λσ is d = 1. Therefore we have for the network λnetworkσ ≃ 4− 32F . If one now expands the thermal function in (13) sinh y = 4y2 n e−2ny , (14) we can also relate ∆σee to the spectral determi- nant. We obtain : ∆σee = −λσ lnS(γ) γ= 2nπ which is the central result of this paper. It is the starting point of the discussion below. Application to the case of the square network.– We have to compute γ2 ∂ lnS(γ). We start from (7) and compute its second derivative. We obtain after some algebra : ∆σee = −λσ where the function ϕ(x) is given by : ϕ(x) = − 8 2x coshx sinh3 x sinh2 x 3 cothx 3 tanhx coshx 3− 2x sinh 2x coshx , (17) E(x) being the complete elliptic integral of second kind30. The function ϕ(x) is plotted in figure 2 and its limiting behaviours are easily obtained30 : ϕ(x) = +O(x2) for x → 0 (18) +O(xe−2x) for x → ∞ (19) The LT dependence of AA correction on a square network is displayed on figure 3, where we have plotted ∆σee(LT ) given by eq. (16). The di- mensional crossover now occurs by tuning the ra- tio LT/a. We consider the two limits. 0 5 10 15 FIG. 2: The function ϕ(x) of eq. (17). 1d limit.– For LT ≪ a we can replace the expan- sion (19) in the series (16). Therefore ∆σee ≃ −λσ 3ζ(3/2) 3ζ(3/2) ≃ 0.782. 
The dominant term again coincides with the one for a connected wire8,10,11 while the second differs by a factor 2, as for the WL [see discussion after eq. (8)].

2d limit.– In the limit LT ≫ a we introduce N = (LT/a)^2 and cut the sum (16) in two pieces : $\sum_{n=1}^{\infty} = \sum_{n=1}^{N} + \sum_{n=N+1}^{\infty}$. It is clear from the limit behaviours of ϕ(x) that the first sum diverges logarithmically with N while the second brings a negligible contribution of order N^0. Therefore : ∆σee ≃ −λσ ln(LT /a) + Cee The constant is estimated numerically. We find Cee ≃ 0.56. The two eqs. (20,21) should be divided by the cross-section s of the wires.

FIG. 3: The continuous line is ∆σee in unit of λσ as a function of LT/a (the series (16) is computed numerically). The dashed line is the 1d limit, eq. (20), and the dotted curve is the 2d limit, eq. (21).

The two functions ∆σWL(B = 0, Lϕ) (figure 1) and ∆σee(LT) (figure 3) are very similar. Apart from the prefactors 2e^2/h and λσ e^2/h which account respectively for the spin degeneracy and the interaction strength, the linear behaviours at the origin have a different slope (1 and 0.782) and the logarithmic behaviours are slightly shifted : CWL ≃ 0.61 and Cee ≃ 0.56.

IV. COMPARISON WITH EXPERIMENTS

The AA correction has been recently measured by Mallet et al34 in networks of silver wires with 3 × 10^4 and 10^5 cells, lattice spacing a = 0.64 µm and diffusion constant D ≃ 100 cm^2/s. The diffusion constant D has been measured separately (through measurement of the Drude conductance), therefore we can compare our result (16) with experiment using one fitting parameter only : the interaction parameter λσ. The 2d logarithmic behaviour (21) has been observed in the range 100 mK < T < 1 K from which the value $\lambda_\sigma^{\rm exp} \simeq 3.1$ was extracted, in agreement with similar measurements performed on a long silver wire for which34,35 $\lambda_\sigma^{\rm exp,\,wire} \simeq 3.2$.
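The experimental values above can be set against a theoretical estimate of λσ. The following is a minimal sketch of that arithmetic, assuming the Thomas-Fermi form F = (κ/2kF)^2 ln[1 + (2kF/κ)^2] and the 1d nonperturbative form λσ = 4 + (48/F)(√(1 + F/2) − 1 − F/4). Both closed forms are assumptions of this sketch, chosen because they reproduce the numbers quoted in this section, and the function names are illustrative only.

```python
import math

def thomas_fermi_F(kF_inv_nm, kappa_inv_nm):
    """Interaction parameter F = (kappa/2kF)^2 * ln[1 + (2kF/kappa)^2].

    Assumed Thomas-Fermi form; both arguments are lengths in nm.
    """
    x = 2.0 * kappa_inv_nm / kF_inv_nm  # dimensionless ratio 2 kF / kappa
    return math.log(1.0 + x * x) / (x * x)

def lambda_sigma_1d(F):
    """Assumed 1d nonperturbative expression; tends to 4 - 3F/2 for small F."""
    return 4.0 + (48.0 / F) * (math.sqrt(1.0 + F / 2.0) - 1.0 - F / 4.0)

# Silver parameters quoted in the text: 1/kF = 0.083 nm, 1/kappa = 0.055 nm.
F = thomas_fermi_F(0.083, 0.055)
lam = lambda_sigma_1d(F)
print(f"F = {F:.3f}, lambda_sigma = {lam:.3f}, perturbative = {4 - 1.5 * F:.3f}")
```

The perturbative value 4 − 3F/2 is printed alongside for comparison, since F is not small for silver and the two expressions differ noticeably.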
We now compare with the theoretical value : for silver the inverse Fermi wave vector is $k_F^{-1} = 0.083$ nm and the Thomas-Fermi screening length is $\kappa^{-1} = 1/\sqrt{8\pi\rho_0 e^2} = 0.055$ nm. In the Thomas-Fermi approximation, the parameter F is given by11 $F = (\kappa/2k_F)^2 \ln[1 + (2k_F/\kappa)^2]$, therefore F ≃ 0.58. Using the 1d nonperturbative expression8 $\lambda_\sigma = 4 + \frac{48}{F}\left(\sqrt{1 + F/2} - 1 - F/4\right)$, we get $\lambda_\sigma^{\rm th} \simeq 3.24$, close to the experimental value.

V. CONCLUSION

Equations (15,16) are our main results. The first one shows that AA and WL can be formally related : ∆σee(LT ) = ∆σWL(Lϕ) = 2nπ The validity of this relation is the same as for eqs. (1,2) : the system should be such that it is meaningful to average uniformly the nonlocal conductivity σ(r, r′) to get the local conductivity $\sigma = \int \frac{dr\,dr'}{\mathrm{Vol}}\,\sigma(r, r')$. A similar discussion has been proposed to relate WL and conductivity fluctuations (see appendix E of Ref.29).

Our starting point (11) is a formulation in Fourier space, which implicitly assumes translation invariance. Whereas this assumption seems reasonable for a large regular network such as the square network studied in this article, its validity is not clear for networks of arbitrary topology, which would need further developments.

We have computed the AA correction in a large square network and shown that the result interpolates between a 1d result, eq. (20), and a 2d result, eq. (21). Interestingly, the 2d limit in a network involves a 1d constant $\lambda_\sigma^{\rm network} \simeq 4 - \frac{3}{2}F$, which is confirmed by experiments, as discussed in section IV. The advantage of the network over the plane is that the constant Cee of eq. (21) is controlled : for a plane, a cutoff must be introduced in eq. (13) at short time t ∼ τe and the constant Cee is replaced by a number that depends on the precise cutoff procedure. Experimentally, it would be interesting to observe the crossover from (20) to (21) by varying LT/a.
This was not pos- sible in experiments of Mallet et al34 described in section IV because measurements are compli- cated by the fact that electron-phonon interaction also brings a temperature-dependent contribution, ∆σe−ph, at high temperature (above few Kelvins). The conductivity is given by σ = σ0 + ∆σWL + ∆σee + ∆σe−ph. The WL can be suppressed by a magnetic field however the electron-phonon contri- bution is difficult to separate from ∆σee. There- fore the network should be patterned in a way such that the crossover 1d-2d remains below T ∼ 1 K where ∆σe−ph is negligible. As an example we con- sider the silver networks studied in Ref.21 for which LT = 0.27 × T−1/2 (LT in µm and T in K). In order to see clearly the 1d and the 2d regimes it would be convenient to study two networks with different lattice spacings. If temperature is con- strained by 10 mK< T < 1 K, for a = 0.5 µm we have 0.5 . LT /a . 5, which probes the 2d regime over one decade. A second lattice with a ∼ 5 µm would allow to probe the 1d regime since in this case 0.05 . LT/a . 0.5. Acknowledgements We have benefitted from stimulating discussions with Christopher Bäuerle, Hélène Bouchiat, Meydi Ferrier, François Mallet, Laurent Saminadayar and Félicien Schopfer. 1 B. L. Altshuler, A. G. Aronov, and D. E. Khmel- nitsky, Effects of electron-electron collisions with small energy transfers on quantum localisation, J. Phys. C: Solid St. Phys. 15, 7367 (1982). 2 S. Chakravarty and A. Schmid, Weak localization: the quasiclassical theory of electrons in a random potential, Phys. Rep. 140(4), 193 (1986). 3 B. L. Al’tshuler and A. G. Aronov, Contribu- tion to the theory of disordered metals in strongly doped semiconductors, Sov. Phys. JETP 50(5), 968 (1979). 4 B. L. Al’tshuler, D. E. Khmel’nitzkĭı, A. I. Larkin, and P. A. Lee, Magnetoresistance and Hall effect in a disordered two-dimensional electron gas, Phys. Rev. B 22(11), 5142 (1980). 5 B. L. Altshuler and A. G. 
Aronov, Fermi-liquid the- ory of the electron-electron interaction effects in disordered metals, Solid State Commun. 46, 429 (1983). 6 A. M. Finkel’shtĕın, Influence of Coulomb interac- tion on the properties of disordered metals, Sov. Phys. JETP 57(1), 97 (1983). 7 C. Castellani, C. Di Castro, P. A. Lee, and M. Ma, Interaction-driven metal-insulator transi- tions in disordered fermion systems, Phys. Rev. B 30(2), 527 (1984). 8 B. L. Altshuler and A. G. Aronov, Electron-electron interaction in disordered conductors, in Electron- electron interactions in disordered systems, edited by A. L. Efros and M. Pollak, page 1, North- Holland, 1985. 9 P. A. Lee and T. V. Ramakrishnan, Disordered elec- tronic systems, Rev. Mod. Phys. 57, 287 (1985). 10 I. L. Aleiner, B. L. Altshuler, and M. E. Gershenson, Interaction effects and phase relaxation in disor- dered systems, Waves RandomMedia 9, 201 (1999). 11 É. Akkermans and G. Montambaux, Mesoscopic physics of electrons and photons, Cambridge Uni- versity Press, 2007. 12 It is worth mentioning that a similar effect exists at higher temperature, in the ballistic regime (τee ≪ τe) ; Ref. 13 provides a nice review on this point and describes the crossover between the two regimes. 13 G. Zala, B. N. Narozhny, and I. L. Aleiner, Interac- tion corrections at intermediate temperatures: Lon- gitudinal conductivity and kinetic equation, Phys. Rev. B 64, 214204 (2001). 14 A. E. White, M. Tinkham, W. J. Skocpol, and D. C. Flanders, Evidence for interaction effects in the low-temperature resistance rise in ultrathin metallic wires, Phys. Rev. Lett. 48(25), 1752 (1982). 15 P. M. Echternach, M. E. Gershenson, H. M. Bozler, A. L. Bogdanov, and B. Nilsson, Temperature de- pendence of the resistance of one-dimensional metal films with dominant Nyquist phase breaking, Phys. Rev. B 50(8), 5748 (1994). 16 F. Pierre, A. B. Gougam, A. Anthore, H. Pothier, D. Esteve, and N. O. Birge, Dephasing of electrons in mesoscopic metal wires, Phys. Rev. 
B 68, 085413 (2003). 17 C. Bäuerle, F. Mallet, F. Schopfer, D. Mailly, G. Eska, and L. Saminadayar, Experimental Test of the Numerical Renormalization Group Theory for Inelastic Scattering from Magnetic Impurities, Phys. Rev. Lett. 95, 266805 (2005). 18 B. Pannetier, J. Chaussy, R. Rammal, and P. Gan- dit, First Observation of Altshuler-Aronov-Spivak effect in gold and copper, Phys. Rev. B 31(5), 3209 (1985). 19 G. J. Dolan, J. C. Licini, and D. J. Bishop, Quan- tum Interference Effects in Lithium Ring Arrays, Phys. Rev. Lett. 56(14), 1493 (1986). 20 M. Ferrier, L. Angers, A. C. H. Rowe, S. Guéron, H. Bouchiat, C. Texier, G. Montambaux, and D. Mailly, Direct measurement of the phase co- herence length in a GaAs/GaAlAs square network, Phys. Rev. Lett. 93, 246804 (2004). 21 F. Schopfer, F. Mallet, D. Mailly, C. Texier, G. Montambaux, L. Saminadayar, and C. Bäuerle, Dimensional crossover in quantum networks: from mesoscopic to macroscopic physics, Phys. Rev. Lett. 98, 026807 (2007). 22 B. Douçot and R. Rammal, Quantum oscillations in normal-metal networks, Phys. Rev. Lett. 55(10), 1148 (1985). 23 B. Douçot and R. Rammal, Interference effects and magnetoresistance oscillations in normal-metal net- works: 1. weak localization approach, J. Physique 47, 973–999 (1986). 24 M. Pascaud and G. Montambaux, Persistent cur- rents on networks, Phys. Rev. Lett. 82, 4512 (1999). 25 Pascaud & Montambaux have rather considered thermodynamic properties. The nonlocal effects in networks have been further investigated in Ref.27. 26 E. Akkermans, A. Comtet, J. Desbois, G. Montam- baux, and C. Texier, On the spectral determinant of quantum graphs, Ann. Phys. (N.Y.) 284, 10–51 (2000). 27 C. Texier and G. Montambaux, Weak localization in multiterminal networks of diffusive wires, Phys. Rev. Lett. 92, 186801 (2004). 28 T. Ludwig and A. D. Mirlin, Interaction-induced dephasing of Aharonov-Bohm oscillations, Phys. Rev. B 69, 193306 (2004). 29 C. Texier and G. 
Montambaux, Dephasing due to electron-electron interaction in a diffusive ring, Phys. Rev. B 72, 115327 (2005). 30 I. S. Gradshteyn and I. M. Ryzhik, Table of in- tegrals, series and products, Academic Press, fifth edition, 1994. 31 A. Cassam-Chenai and B. Shapiro, Two dimen- sional weak localization beyond the diffusion ap- proximation, J. Phys. I France 4, 1527 (1994). 32 The formula (5.1) of Ref.8 has the wrong sign. 33 This interaction assumes that the screening length is smaller than the transverse size of the wire. 34 F. Mallet et al, to be published (2007). 35 L. Saminadayar, P. Mohanty, R. A. Webb, P. De- giovanni and C. Bäuerle, Phase coherence in the presence of magnetic impurities, Physica E, to be published (2007). ABSTRACT We consider the correction $\Delta\sigma_\mathrm{ee}$ due to electron-electron interaction to the conductivity of a weakly disordered metal (Al'tshuler-Aronov correction). The correction is related to the spectral determinant of the Laplace operator. The case of a large square metallic network is considered. The variation of $\Delta\sigma_\mathrm{ee}(L_T)$ as a function of the thermal length $L_T$ is found very similar to the variation of the weak localization $\Delta\sigma_\mathrm{WL}(L_\phi)$ as a function of the phase coherence length. Our result for $\Delta\sigma_\mathrm{ee}$ interpolates between the known 1d and 2d results, but the interaction parameter entering the expression of $\Delta\sigma_\mathrm{ee}$ keeps a 1d behaviour. Quite surprisingly, the result is very close to the 2d logarithmic behaviour already for $L_T\sim{a}/2$, where $a$ is the lattice parameter. <|endoftext|><|startoftext|> Introduction At low temperature, quantum interferences of reversed electronic trajectories are responsible for a small reduction of the averaged conductivity called the weak localization (WL) correction. 
This correction is a manifestation of quantum coherence, which is always limited to a certain length scale, named the phase coherence length Lϕ. A way to extract this important length scale in experiments is to use the magnetic field sensitivity of the WL. For example the WL correction of an infinitely long wire of rectangular section of width W and area S submitted to a perpendicular magnetic field B is (Ref. 1) $\langle\Delta\sigma\rangle = -\frac{2e^2}{h}\,\frac{1}{S}\left[\frac{1}{L_\varphi^2} + \frac{1}{3}\left(\frac{eBW}{\hbar}\right)^2\right]^{-1/2}$ (in the following we will forget the 1/S factor). The width of the magnetoconductance (MC) curve provides a direct determination of Lϕ.

Figure 1: Chains of rings, (a) and (b). If we consider the regime b ≫ Lϕ ≫ L the rings can be considered as independent in case (a) but not in case (b).

Another possibility to extract the phase coherence length is to study arrays of rings whose MC presents oscillations as a function of the flux φ per ring with period half the flux quantum φ0 = h/e. These are the famous Al’tshuler-Aronov-Spivak (AAS) oscillations (Ref. 2), observed in many experiments3,4,5,6. In order to extract the phase coherence length from AAS oscillations, a precise theoretical prediction for the behaviour of the AAS harmonics with the phase coherence length is needed. Harmonics of the oscillations are defined as $\langle\Delta\sigma(\phi)\rangle = \sum_n \Delta\sigma_n\, e^{4i\pi n\phi/\phi_0}$. A well-known expression has been derived in Ref. 1 for an isolated ring of perimeter L :

$\Delta\sigma_n = -\frac{2e^2}{h}\, L_\varphi\, e^{-|n|L/L_\varphi}$ (1)

However, in a real experiment where the ring is connected to wires, or embedded in a larger network, this expression can only be relevant in the regime Lϕ ≪ L. At the lowest temperatures, when Lϕ ≳ L, the AAS harmonics are strongly affected by the surrounding wires since trajectories can expand outside the ring over distances larger than the perimeter. It is the aim of this paper to discuss the behaviour of AAS harmonics in chains of rings when Lϕ ≳ L.
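To make the exponential decay of the isolated-ring harmonics concrete, the sketch below checks numerically that the time integral of a winding-resolved return probability, damped by e^{−t/τϕ}, equals (Lϕ/2D) e^{−|n|L/Lϕ}. It assumes the standard Gaussian form P_n(t) = exp[−(nL)^2/4Dt]/√(4πDt) for diffusion on a ring and τϕ = Lϕ^2/D. The function names, the default D = 1 and the step count are illustrative choices, not taken from the paper.

```python
import math

def damped_return_integral(n, L, L_phi, D=1.0, steps=20000):
    """Trapezoid estimate of int_0^inf dt P_n(t) exp(-t/tau_phi).

    Assumes P_n(t) = exp(-(n L)^2 / (4 D t)) / sqrt(4 pi D t) and
    tau_phi = L_phi^2 / D. The substitution t = u^2 removes the
    1/sqrt(t) singularity at t = 0.
    """
    tau_phi = L_phi ** 2 / D
    b = (n * L) ** 2 / (4.0 * D)
    u_max = 12.0 * math.sqrt(tau_phi)  # integrand is ~exp(-144) beyond this point
    h = u_max / steps
    total = 0.0
    for k in range(1, steps):
        u = k * h
        total += 2.0 * math.exp(-u * u / tau_phi - b / (u * u))
    # Endpoint terms: the integrand at u = 0 is 2 for n = 0 and 0 otherwise;
    # the value at u_max is negligible.
    total += 0.5 * (2.0 if n == 0 else 0.0)
    return total * h / math.sqrt(4.0 * math.pi * D)

def closed_form(n, L, L_phi, D=1.0):
    """Closed form of the same integral: (L_phi / 2D) exp(-|n| L / L_phi)."""
    return L_phi / (2.0 * D) * math.exp(-abs(n) * L / L_phi)

for n in range(4):
    print(n, damped_return_integral(n, 1.0, 2.0), closed_form(n, 1.0, 2.0))
```

Each harmonic is suppressed by a further factor e^{−L/Lϕ}, which is the behaviour that the surrounding wires modify once Lϕ becomes comparable to or larger than L.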
We will consider two cases represented on figure 1 : in the first situation the rings are separated by a distance b ≫ Lϕ and can therefore be considered as independent (however the connecting wires will affect the AAS harmonics). In the second case rings are in contact and harmonics can involve trajectories winding around several neighbouring rings. In section 3, we will see that when electron-electron interaction is the dominant process for decoherence, eq. (1) cannot be used even in the regime L ≪ Lϕ.

http://arxiv.org/abs/0704.0742v1

2 Nonlocality of quantum transport in chain of rings

We consider an array of rings all pierced by the same flux φ. The n-th harmonic of the WL correction at a given point x of a network can be expressed as

$\Delta\sigma_n(x) = -\frac{2e^2 D}{\pi\hbar}\int_0^\infty dt\, P_n(x, x; t)\, e^{-t/\tau_\varphi}$ (2)

where D is the diffusion constant and $\tau_\varphi = L_\varphi^2/D$ the phase coherence time. The factor 2 stands for spin degeneracy. Pn(x, x; t) is the probability that a particle diffusing in the network comes back to its initial point x in a time t, after having encircled a flux nφ. For example, in an isolated ring $P_n(x, x; t) = \frac{1}{\sqrt{4\pi Dt}}\, e^{-(nL)^2/4Dt}$, which immediately gives eq. (1). Except in translation invariant systems, ∆σn(x) depends on x and expression (2) must be averaged over the network in a proper way described in Ref. 7.

A ring with two arms.– The case of a ring connected to two arms has been studied in detail in Ref. 8 where it has been shown that Pn(x, x; t) ≃ 2(Dt)3/4 (Dt)1/4 ) for time scales t ≫ L^2/D with x inside the ring (the precise form of the dimensionless function Ψ(ξ) is inessential for the present discussion). Compared with the isolated ring case, where the typical winding number scales with time as n_t ∼ t^{1/2}, diffusion around the ring is slowed down as n_t ∼ t^{1/4} due to the time spent in the arms.
As a consequence the harmonics of the conductance of a ring are given by 8 : ∆σn ∝ Lϕ3/2e−|n| 2L/Lϕ (note that the scaling n ∼ L1/2ϕ is analogous to the scaling of winding with time nt ∼ t1/4 since Lϕ ∼ t1/2). The chain of distant rings.– The same argument holds for the chain of rings separated by a distance b ≫ Lϕ (figure 1.a). In this case, averaging properly ∆σn(x) inside the chain of Nr rings, one finds that the harmonics of the dimensionless conductance read : ∆gn ≃ − 1/2Lϕ 2 [(Nr + 1)b]2 2L/Lϕ for b ≫ Lϕ ≫ L (3) The chain of attached rings.– If we now consider the network of figure 1.b, we can show that the probability reads 9 Pn(x, x; t) ≃ L8πDte −(nL)2/4Dt for t ≫ L2/D. The AAS harmonics are given in this case by 9 : ∆gn ≃ − [ln(2Lϕ/|n|L) + bn] for Lϕ/L ≫ |n| (4) −|n|L/Lϕ |n|L/Lϕ for |n| ≫ Lϕ/L ≫ 1 (5) where bn depends weakly on n (b∞ = −C, the Euler constant). 3 Decoherence due to electron-electron interaction The above results rely on the fact that, in eq. (2), the long times have been cut off with an exponential damping e−t/τϕ . However it has been shown recently that this simple modelization does not account correctly for the decoherence due to electron-electron interaction, which is the dominant one at low temperature a. In this case, an alternative description was proposed by Al’tshuler, Aronov & Khmel’nitskii (AAK) 12 but it is only recently that the consequences for AAS oscillations have been understood13,14,15. The model of AAK.– The length scale characterizing the efficiency of electron-electron inter- action to suppress phase coherence in wires is known as the Nyquist length LN = (ν0D 2/T )1/3 where ν0 is the density of states, D the diffusion constant and T the temperature (~ = kB = 1). In the model of AAK, the random phase accumulated by an electron moving in the fluctuat- ing electric potential due to other electrons is included in the calculation of the WL. 
The pair of reversed interfering trajectories picks a phase eiΦ[C], where C designates a closed diffusive trajectory, and the harmonics of WL are given by ∆σn ∼ − 〈eiΦ[Cn]〉V = − 〈Φ[Cn]2〉V (6) The sum runs over all closed trajectories with winding n (a proper formulation of eq. (6) requires a path integral). Gaussian fluctuations of the electric potential are given by the fluctuation- dissipation theorem 〈V (~r, t)V (~r ′, t′)〉V = 2e Tδ(t − t′)Pd(~r,~r ′) (written here in the classical limit T ≪ ω), where σ0 is the classical Drude conductivity. Pd is solution of the diffusion equation −∆Pd(~r,~r ′) = δ(~r−~r ′) and therefore depends on the topology of the system. Then14 〈Φ[x(τ)]2〉V = dτ [Pd(x(τ), x(τ)) − Pd(x(τ), x(t− τ))] (7) where C ≡ (x(τ), 0 6 τ 6 t |x(0) = x(t)) is a closed diffusive path. The crucial point is that the simple exponential damping of eq. (2) is replaced in eq. (6) by a functional of the trajectory 〈Φ[Cn]2〉V . Therefore decoherence is now network-dependent and a priori sensitive to the nature of trajectories (in particular whether they do enclose a magnetic flux or not). The limit LN ≪ L.– The model described above was applied to the case of a single ring 13,14,15. The result for an isolated ring is relevant to describe arrays of rings in the limit LN ≪ L where winding trajectories hardly exit from a ring, which makes rings independent from each other. For the chain of distant rings (figure 1.a) we have ∆gn ∼ − [(Nr + 1)b]2 −|n|π (L/LN ) 3/2 ∼ −nL3/2T 1/2 T 1/3 for LN ≪ L ≪ b (8) (for the case of the chain of rings in contact (figure 1.b), (Nr + 1)b in the denominator is replaced by NrL/4). Whereas the time characterizing efficiency of electron-electron interaction to suppress phase coherence in a wire is the Nyquist time τN = L N/D ∝ T−2/3, it was shown in Refs. 13,14 that the behaviour (8) is related to a new time scale characterizing decoherence for winding trajectories : τc = τ L ∝ T −1, where τL = L 2/D is the Thouless time of the ring. 
The chain of distant rings.– If we consider a ring connected to long arms, winding trajectories spend most of the time in the arms8 and the decoherence mostly occurs in the arms. Therefore decoherence occurs on a time scale τN , like in a wire. The function 〈eiΦ[C]〉V for a wire was a The exponential damping gives the correct shape of a MC of a wire 10,11 with Lϕ → 2LN (see below for definition of LN ), however this simple substitution gives an incorrect result for AAS harmonics as explained below. studied in Ref. 16. Using this result and the winding properties recalled in section 2 leads to 14 ∆gn ∼ − [(Nr + 1)b]2 for n2 ≪ LN/L (9) [(Nr + 1)b] )7/12 −κ2|n| L/LN ∼ e −nL1/2T 1/6 T 11/36 for n2 ≫ LN/L , (10) where κ2 = 2|u1|1/4 ≃ 1.421. The chain of attached rings.– In this case, the nature of decoherence was shown to be closely related to the one of a wire since diffusion along the chain is reminiscent of a 1d diffusion and again occurs on time scale τN ∆gn ≃ − ln(LN/|n|L) + cste for |n| ≪ LN/L (11) ≃ − 1 Nr|u1|3/2 −κ3|n|L/LN ∼ e−nLT 1/3 for |n| ≫ LN/L (12) where κ3 = 2 −1/3|u1|1/2 ≃ 0.801. 4 Conclusion We have considered networks of connected rings, made of weakly disordered wires. We have first shown that geometrical effects can strongly modify the exponential behaviour of AAS harmonics well-known for an isolated ring, since trajectories can now explore the network around each ring. In the second part we have shown that decoherence due to electron-electron interaction is sensitive to geometry, a second reason that modifies the simple AAS result. An interesting experiment would be to compare precisely AAS oscillations for the two networks of figure 1 in the low temperature regime LN ≫ L. References 1. B. L. Al’tshuler and A. G. Aronov, JETP Lett. 33(10), 499 (1981). 2. B. L. Al’tshuler, A. G. Aronov, and B. Z. Spivak, JETP Lett. 33(2), 94 (1981). 3. B. Pannetier, J. Chaussy, R. Rammal, and P. Gandit, Phys. Rev. B 31(5), 3209 (1985). 4. G. J. Dolan, J. C. 
Licini, and D. J. Bishop, Phys. Rev. Lett. 56(14), 1493 (1986). 5. M. Ferrier, L. Angers, A. C. H. Rowe, S. Guéron, H. Bouchiat, C. Texier, G. Montambaux, and D. Mailly, Phys. Rev. Lett. 93, 246804 (2004). 6. F. Schopfer, F. Mallet, D. Mailly, C. Texier, G. Montambaux, L. Saminadayar, and C. Bäuerle, Phys. Rev. Lett. 98, 026807 (2007). 7. C. Texier and G. Montambaux, Phys. Rev. Lett. 92, 186801 (2004). 8. C. Texier and G. Montambaux, J. Phys. A: Math. Gen. 38, 3455–3471 (2005). 9. C. Texier and G. Montambaux, in preparation (2007). 10. F. Pierre, A. B. Gougam, A. Anthore, H. Pothier, D. Esteve, and N. O. Birge, Phys. Rev. B 68, 085413 (2003). 11. É. Akkermans and G. Montambaux, Physique mésoscopique des électrons et des photons, EDP Sciences, CNRS éditions, 2004. Mesoscopic physics of electrons and photons, Cam- bridge University Press, 2007. 12. B. L. Altshuler, A. G. Aronov, and D. E. Khmelnitsky, J. Phys. C: Solid St. Phys. 15, 7367 (1982). 13. T. Ludwig and A. D. Mirlin, Phys. Rev. B 69, 193306 (2004). 14. C. Texier and G. Montambaux, Phys. Rev. B 72, 115327 (2005) ; ibid 74, 209902(E) (2006). 15. C. Texier and G. Montambaux, Comment on Ref. 13, submitted (2007). 16. G. Montambaux and E. Akkermans, Phys. Rev. Lett. 95, 016403 (2005). Introduction Nonlocality of quantum transport in chain of rings Decoherence due to electron-electron interaction Conclusion ABSTRACT We study weak localization in chains of metallic rings. We show than nonlocality of quantum transport can drastically affect the behaviour of the harmonics of magnetoconductance oscillations. Two different geometries are considered: the case of rings separated by long wires compared to the phase coherence length and the case of contacted rings. In a second part we discuss the role of decoherence due to electron-electron interaction in these two geometries. 
<|endoftext|><|startoftext|> Diatomic molecule as a quantum entanglement switch

Adam Rycerz ∗
Marian Smoluchowski Institute of Physics, Jagiellonian University, Reymonta 4, 30–059 Kraków, Poland

Abstract
We investigate the pair entanglement of electrons in a diatomic molecule, modeled as a correlated double quantum dot attached to the leads. The low-temperature properties are derived from the ground state obtained by utilizing the Rejec-Ramšak variational technique within the framework of the EDABI method, which combines exact diagonalization with ab initio calculations. The results show that single-particle basis renormalization modifies the entanglement-switch effectiveness significantly. We also found the entanglement signature of a competition between extended Kondo and singlet phases.

Key words: Correlated nanosystems, Entanglement manipulation, EDABI method
PACS: 73.63.-b, 03.67.Mn, 72.15.Qm

Quantum entanglement, as one of the most intriguing features of quantum mechanics, has spurred a great deal of scientific activity during the last decade, mainly because it is regarded as a valuable resource in quantum communication and information processing [1]. The question of entanglement between microscopic degrees of freedom in a condensed phase has been raised recently [2], in the hope of shedding new light on the physics of quantum phase transitions and quantum coherence [3]. In the field of quantum electronics, pair entanglement appeared to be a convenient tool to characterize the nature of transport through a quantum dot, since it vanishes when the system is in the Kondo regime [4]. Analogous behavior was observed for two qubits in a double quantum dot, for both serial and parallel configurations [5]. The latter case is intriguing, since the concurrence [6] at T = 0 changes abruptly from C ≈ 1 to C = 0 when varying the interdot coupling, so a finite Anderson system shows a true quantum phase transition.
Here we consider a nanoscale version of such an entanglement switch, inspired by conductance measurements for a single hydrogen molecule [7]. Special attention is paid to electron-correlation effects, in particular the wave-function renormalization [8]. A recent experiment [9] shows that the current through a molecule is carried by a single conductance channel, so the serial configuration shown in Fig. 1 seems to be the realistic one. The Hamiltonian of the system is

H = HL + VL + HC + VR + HR, (1)

where HC models the central region, HL(R) describes the left (right) lead, and VL(R) is the coupling between the lead and the central region. Both HL(R) and VL(R) terms have a tight-binding form, with the chemical potential in the leads µ, the hopping t, and the tunneling amplitude V, as depicted schematically in Fig. 1.

∗ Corresponding author. Tel: (+48 12) 663–55–68 Fax: (+48 12) 633–40–79 Email address: rycerz@th.if.uj.edu.pl (Adam Rycerz).

Fig. 1. Diatomic molecule modeled as a double quantum dot attached serially to the leads. A cross-section of the single-particle potential along the main system axis is shown schematically.

The central-region Hamiltonian

$H_C = \sum_{ij\sigma} t_{ij}\, c^\dagger_{i\sigma} c_{j\sigma} + \frac{1}{2}\sum_{i\sigma \neq j\sigma'} U_{ij}\, n_{i\sigma} n_{j\sigma'} + (Ze)^2/R$ (2)

(with i, j = 1, 2 and σ = ↑, ↓) describes a double quantum dot with electron-electron interaction. tij and Uij are single-particle and interaction elements; the last term describes the Coulomb repulsion of the two ions at the distance R. Here we put Z = 1 and calculate all the parameters tij, Uij as the Slater integrals [10] for 1s-like hydrogenic orbitals $\Psi_{1s}(\mathbf{r}) = \sqrt{\alpha^3/\pi}\,\exp(-\alpha|\mathbf{r}|)$, where α^{-1} is the orbital size (cf. Fig. 1). The parameter α is optimized to get a minimal ground-state energy for the whole system

Preprint submitted to Elsevier 15 November 2018 http://arxiv.org/abs/0704.0743v2

Fig. 2.
Entanglement and transport through the system in Fig. 1 as a function of the chemical potential µ (top panel) and the average filling 〈n1+n2〉 (bottom panel). Thick (thin) solid and dashed lines show the concurrence C (conductance G) for Γ = t/9 and t/4, respectively. The limits Γ → 0 are depicted with dotted lines in the bottom panel. The interatomic distance is R = 1.5a0.

Fig. 3. Concurrence (thick lines) and conductance (thin lines) at the half-filled sector 〈n1+n2〉 = 2 as a function of the interatomic distance R. The remaining parameters are the same as in Fig. 2.

described by the Hamiltonian (1). Thus, following the idea of the EDABI method [8], we reduce the number of physical parameters of the problem to just three: the interatomic distance R, the lead-molecule hybridization Γ = V^2/t, and the chemical potential µ (we put the lead hopping t = 1 Ry = 13.6 eV to work in the wide-bandwidth limit). The entanglement between electrons placed on two atoms can be characterized by the charge concurrence [4]

$C = 2\max\left\{0,\ |\langle c^\dagger_{i\sigma} c_{j\sigma}\rangle| - \sqrt{\langle n_{i\sigma} n_{j\sigma}\rangle\langle \bar n_{i\sigma} \bar n_{j\sigma}\rangle}\right\}$, (3)

where n̄iσ ≡ 1−niσ. We also discuss the conductance calculated from the formula G = G0 sin^2(E+−E−)/4tN [11], where G0 = 2e^2/ħ, and E± are the ground-state energies of the system with periodic and antiperiodic boundary conditions, respectively. Both the energies E± and the correlation functions in Eq. (3) are calculated within the Rejec–Ramšak variational method [11], complemented by the orbital size optimization, as mentioned above. We use up to N = 10^4 sites to reach convergence. In Fig. 2 we show the concurrence and conductance for R = 1.5a0 (where a0 is the Bohr radius) and two values of the hybridization Γ = t/9 and t/4.
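The charge concurrence of Eq. (3) is a simple function of three ground-state expectation values. A minimal sketch, assuming the form C = 2 max{0, |〈c†iσ cjσ〉| − √(〈niσ njσ〉〈n̄iσ n̄jσ〉)} for a fixed spin index σ, is given below. The function and argument names are hypothetical, and the input numbers are hand-built toy states rather than results of the variational calculation.

```python
import math

def charge_concurrence(hop, n_both, n_none):
    """Charge concurrence for one spin species on sites i and j.

    hop    : <c_i^dagger c_j>, the inter-site coherence
    n_both : <n_i n_j>, probability that both sites are occupied
    n_none : <(1 - n_i)(1 - n_j)>, probability that both sites are empty
    """
    return 2.0 * max(0.0, abs(hop) - math.sqrt(n_both * n_none))

# One electron delocalized in the bonding state (c_1^dagger + c_2^dagger)|0>/sqrt(2):
# coherence 1/2, never both occupied, never both empty.
print(charge_concurrence(0.5, 0.0, 0.0))

# Electron localized on one site: no coherence, hence no entanglement.
print(charge_concurrence(0.0, 0.0, 0.0))

# Weak coherence outweighed by the occupied/empty correlations.
print(charge_concurrence(0.2, 0.25, 0.25))
```

The `max` with zero reflects that the concurrence vanishes identically once the coherence falls below the geometric mean of the doubly occupied and doubly empty probabilities.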
The conductance spectrum asymmetry, caused by wave-function renormalization [8], is followed by an analogous effect on entanglement, which changes significantly faster for the upper conduction band, where the average filling is 〈n1+n2〉 ≈ 3 (one extra electron). The asymmetry vanishes when analyzing the system properties as a function of 〈n1+n2〉, showing that it originates from the varying charge compressibility χc = ∂〈n1+n2〉/∂µ ≈ 2/(U11 + U12) ∼ 1/α. We also note the convergence of the discussed quantities with Γ → 0 to C ≈ 1 − |〈n1+n2〉 − 2|/2 and G ≈ G0 sin^2(π〈n1+n2〉/2). Entanglement evolution with R is illustrated in Fig. 3, where we focus on the charge-neutral sector 〈n1+n2〉 = 2. The abrupt entanglement drop follows the sharp conductance peak for Γ = t/9, which is associated with the competition between double Kondo and spin/charge singlet phases [12]. For Γ = t/4 the dependence of both C and G on R becomes smooth, but the switching behavior is still present. Earlier, we have shown that Γ = t/4 is large enough to cause molecule instability and therefore may allow individual atom manipulation [8].

In conclusion, we analyzed the pair entanglement of electrons in a diatomic molecule attached serially to the leads. Entanglement evolution with the chemical potential speeds up remarkably for the negatively charged system, due to electron correlation effects. The switching behavior was also observed when changing the interatomic distance.

The work was supported by Polish Science Foundation (FNP), and Ministry of Science Grant No. 1 P03B 001 29.

References [1] See review by C.H. Bennett and D.P. DiVincenzo, Nature 404, 247 (2000); M.A. Nielsen and I.L. Chuang, Quantum Computation and Quantum Information (Cambridge, 2000). [2] A. Osterloh et al., Nature 416, 608 (2002); T.J. Osborne and M.A. Nielsen, Phys. Rev. A 66, 032110 (2002); S.-J. Gu et al., Phys. Rev. Lett. 93, 086402 (2004). [3] J. van Wezel, J. van den Brink, J. Zaanen, Phys. Rev. Lett.
94, 230401 (2005); cond-mat/0606140.
[4] A. Rycerz, Eur. Phys. J. B 52, 291 (2006); S. Oh, J. Kim, Phys. Rev. B 73, 052407 (2007).
[5] A. Ramšak, J. Mravlje, R. Žitko, J. Bonča, Phys. Rev. B 74, 241305(R) (2006); R. Žitko, J. Bonča, ibid. 74, 045312 (2006).
[6] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[7] R. H. M. Smit et al., Nature 419, 906 (2002).
[8] J. Spałek et al., cond-mat/0610815.
[9] D. Djukic, J. M. van Ruitenbeek, Nano Lett. 6, 789 (2006); M. Kiguchi et al., cond-mat/0612681.
[10] J. C. Slater, Quantum Theory of Molecules and Solids, McGraw-Hill (New York, 1963), Vol. 1, p. 50.
[11] T. Rejec, A. Ramšak, Phys. Rev. B 68, 033306 (2003).
[12] P. S. Cornaglia, D. R. Grempel, Phys. Rev. B 71, 075305 (2005); J. Mravlje, A. Ramšak, T. Rejec, ibid. 73, 241305(R) (2006).

ABSTRACT
We investigate the pair entanglement of electrons in a diatomic molecule, modeled as a correlated double quantum dot attached to the leads. The low-temperature properties are derived from the ground state obtained by utilizing the Rejec-Ramšak variational technique within the framework of the EDABI method, which combines exact diagonalization with ab initio calculations. The results show that single-particle basis renormalization modifies the entanglement-switch effectiveness significantly. We also found the entanglement signature of a competition between extended Kondo and singlet phases.
<|endoftext|><|startoftext|> Introduction and Setting

Let $(\Omega, \mathcal{F}, P)$ be a probability space carrying an N-dimensional Brownian motion $(W_t)_{t \ge 0}$ with a d × d correlation matrix. We consider smooth curves $F_\epsilon : \mathbb{R} \to L^2(\Omega; \mathbb{R}^N)$ of random variables, where $\epsilon \in \mathbb{R}$ is a parameter. We apply Taylor theorems to obtain strong approximations of the curve $F_\epsilon$ at $\epsilon = 0$, and we apply partial integration on Wiener space to obtain weak approximations of the law of $F_\epsilon$ for small values of $\epsilon$.
We choose the notion Taylor expansion instead of asymptotic expansion in order to point out that the strong method is indeed a classical Taylor expansion with the usual conditions for convergence. The weak method represents a truncated converging power series in the parameter $\epsilon$ if, for instance, the payoff $f : \mathbb{R}^N \to \mathbb{R}$ stems from a real analytic function and some distributional properties are satisfied.

2. Weak and strong Taylor methods - Structure Theorems

We introduce in this section two concepts of approximation. Consider a curve $\epsilon \mapsto F_\epsilon$, where $\epsilon \in \mathbb{R}$ and $F_\epsilon \in L^2(\Omega; \mathbb{R}^N)$.

Definition 1. A strong Taylor approximation of order n ≥ 0 is a (truncated) power series

(2.1) $T^n_\epsilon(F_\epsilon) := \sum_{i=0}^{n} \frac{\epsilon^i}{i!} \frac{d^i}{d\epsilon^i}\Big|_{\epsilon=0} F_\epsilon$

such that

(2.2) $E\left(|F_\epsilon - T^n_\epsilon(F_\epsilon)|\right) = o(\epsilon^n)$

holds true as $\epsilon \to 0$.

Financial support from the Austrian Science Fund (FWF) under grant P 15889 and the START-prize-grant Y328-N13 is gratefully acknowledged. Furthermore this work was financially supported by the Christian Doppler Research Association (CDG). The authors gratefully acknowledge a fruitful collaboration and continued support by Bank Austria and the Austrian Federal Financing Agency (ÖBFA) through CDG.
http://arxiv.org/abs/0704.0745v1
MARIA SIOPACHA AND JOSEF TEICHMANN

Remark 1. In our setting a strong Taylor approximation of any order n ≥ 0 of the curve $F_\epsilon$ can always be obtained, see for instance [KM97]. Let $f : \mathbb{R}^N \to \mathbb{R}$ be a Lipschitz function with Lipschitz constant K, then we obtain

(2.3) $E\left(|f(F_\epsilon) - f(T^n_\epsilon(F_\epsilon))|\right) \le K\, E\left(\|F_\epsilon - T^n_\epsilon(F_\epsilon)\|\right) = K\, o(\epsilon^n).$

Equation (2.3) no longer holds if f is not globally Lipschitz continuous. In particular, we observe the dependence of the right hand side on the Lipschitz constant K. Hence, truncating an a-priori known Taylor expansion leads to an error term which contains the Lipschitz constant and is therefore not useful for non-Lipschitz claims. The weak method navigates around this feature by partial integration.

Definition 2.
A weak Taylor approximation of order n ≥ 0 is a power series, for each bounded, measurable $f : \mathbb{R}^N \to \mathbb{R}$,

$W^n_\epsilon(f, F_\epsilon) := \sum_{i=0}^{n} \frac{\epsilon^i}{i!} E(f(F_0)\pi_i),$

where $\pi_i \in L^1(\Omega)$ denote real valued, integrable random variables, such that

$\left|E(f(F_\epsilon)) - W^n_\epsilon(f, F_\epsilon)\right| = o(\epsilon^n).$

Remark 2. The weights $\pi_i$ for i ≥ 1 are called Malliavin weights.

Remark 3. If the law of $F_\epsilon$ is real analytic at $\epsilon = 0$ in the weak sense, i.e. if there exist (signed) measures $\mu_i$ such that for all bounded, measurable $f : \mathbb{R}^N \to \mathbb{R}$ the following series converges and the equality

$E(f(F_\epsilon)) = \sum_{i \ge 0} \frac{\epsilon^i}{i!} \int_{\mathbb{R}^N} f(x)\,\mu_i(dx)$

holds true, precisely then we do have a converging weak Taylor expansion. We aim for constructing stochastic representations of the following type, for i ≥ 0:

$\int_{\mathbb{R}^N} f(x)\,\mu_i(dx) = E(f(F_0)\pi_i).$

For the definition of the weak Taylor approximation to make sense, existence of the Malliavin weights has to hold. The following theorem can be found in a slightly different version in [MT06] and goes back to S. Watanabe. For the definition and notion of $D^\infty(\mathbb{R}^N)$ see [Mal97] or [Nua06].

Theorem 1. Let $F_\epsilon : \mathbb{R} \to D^\infty(\mathbb{R}^N)$ be smooth and assume that the Malliavin covariance matrix $\gamma(F_\epsilon)$ is invertible with p-integrable inverse for every p ≥ 1 around $\epsilon = 0$ (i.e. on an open interval containing $\epsilon = 0$). Then there is a weak Taylor approximation of any order n ≥ 0 and there are explicit formulas for the weights $\pi_i$. If we only know that the Malliavin covariance matrix $\gamma(F_0)$ is invertible with p-integrable inverse, then we can also calculate the Malliavin weights, since they depend only on $\gamma(F_0)$.

WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES

Proof. Fix n ≥ 0 and take a smooth test function $f : \mathbb{R}^N \to \mathbb{R}$ and assume that $\gamma^{-1}(F_\epsilon)$ exists as a smooth curve in $D^\infty$ on an open $\epsilon$-interval containing $\epsilon = 0$.
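By standard arguments we can prove the following formula

$\frac{d}{d\epsilon} E(f(F_\epsilon)) = E\left(f(F_\epsilon)\,\delta\left(s \mapsto (D_sF_\epsilon)^T \gamma^{-1}(F_\epsilon)\,\frac{d}{d\epsilon}F_\epsilon\right)\right).$

More precisely, by the integration by parts [Nua06, Definition 1.3.1-(1.42)], the chain rule [Nua06, Proposition 1.2.3] and the definition of the Malliavin covariance matrix, [Nua06, page 92], we obtain from the right hand side the desired left hand side. Notice that the $\epsilon$-dependence of the Skorohod integral is smooth due to basic properties of $D^\infty$. Hence, we can calculate higher derivatives of the left hand side by iterating the above procedure and differentiating the Skorohod integral. We denote

(2.4) $\pi_1 := \delta\left(s \mapsto (D_sF_\epsilon)^T \gamma^{-1}(F_\epsilon)\,\frac{d}{d\epsilon}F_\epsilon\right).$

We write then, pars pro toto, the formula for the second derivative

$\frac{d^2}{d\epsilon^2} E(f(F_\epsilon)) = E\left(f(F_\epsilon)\,\delta\left(s \mapsto \pi_1 (D_sF_\epsilon)^T \gamma^{-1}(F_\epsilon)\,\frac{d}{d\epsilon}F_\epsilon\right)\right)$
$\quad + E\left(f(F_\epsilon)\,\delta\left(s \mapsto \left(D_s \frac{d}{d\epsilon}F_\epsilon\right)^T \gamma^{-1}(F_\epsilon)\,\frac{d}{d\epsilon}F_\epsilon\right)\right)$
$\quad - E\left(f(F_\epsilon)\,\delta\left(s \mapsto (D_sF_\epsilon)^T \gamma^{-1}(F_\epsilon)\,\frac{d\gamma(F_\epsilon)}{d\epsilon}\,\gamma^{-1}(F_\epsilon)\,\frac{d}{d\epsilon}F_\epsilon\right)\right)$
$\quad + E\left(f(F_\epsilon)\,\delta\left(s \mapsto (D_sF_\epsilon)^T \gamma^{-1}(F_\epsilon)\,\frac{d^2}{d\epsilon^2}F_\epsilon\right)\right).$

This formula makes perfect sense at $\epsilon = 0$ and, by induction, we see that we can perform this step for any derivative. The general, recursive result is the following:

$a_s := (D_sF_\epsilon)^T \gamma^{-1}(F_\epsilon)\,\frac{d}{d\epsilon}F_\epsilon$ for 0 ≤ s ≤ T,
$\pi_n := \delta(s \mapsto a_s \pi_{n-1}) + \frac{d}{d\epsilon}\pi_{n-1},$
$\pi_0 := 1.$

Here we understand the weights $\pi_n$ as $\epsilon$-dependent, whereas in the final formulas we put $\epsilon = 0$. This proves the result for smooth test functions f and under the assumption that the Malliavin covariance matrix is invertible around $\epsilon = 0$. If we approximate a bounded, measurable function f by smooth test functions we obtain the desired assertion by standard arguments, since the weights are integrable. □

Remark 4. By Taylor's theorem and the Faà-di-Bruno-formula we obtain

$\frac{d^n}{d\epsilon^n} f(F_\epsilon) = \sum_{|\alpha| \le n} f^{(\alpha)}(F_\epsilon)\,p_\alpha,$

where $p_\alpha$ is a well-defined polynomial in derivatives of the curve $\epsilon \mapsto F_\epsilon$, for a multi-index $\alpha$. Since $D^\infty$ is an algebra, see [Mal97], the above expression lies in $L^p(\Omega)$ for each p ≥ 0. The previous result provides a representation of the partial integration result for

$E\left(\sum_{|\alpha| \le n} f^{(\alpha)}(F_\epsilon)\,p_\alpha\right) = E(f(F_\epsilon)\pi_n).$

The structure of the weights is seen from above.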
The result can be considered as a dual version of the Faà-di-Bruno-formula. However, the structure of this dual formula is much simpler. We provide an example to demonstrate the strong and weak method of approx- imation. The method works in order to replace time-consuming iteration schemes, like the Euler-scheme, by simulations of “simple” Itô integrals. Example 1. We deal with a generic, real-valued random variable over a one- dimensional Gaussian space, see [Nua06], i.e. where the F i lie in the (i+1)st Wiener chaos Hi+1(Ω) (one can think of a Hermite expansion for instance) and the sum is understood in the L2-sense. From the strong expansion we obtain immediately – for a given Lipschitz function f : R → R – that f(Fǫ) f(F 0 + ǫF 1) | ≤ Ko(ǫ), as ǫ→ 0, where K denotes the Lipschitz constant of f . This simple approximation can be sometimes quite useful. We assume now that F 0 = h(s)dWs has non-vanishing variance in order to calculate the weights, which do depend only on γ(F0). The strong Taylor approxi- mation is given by definition, the weak Taylor expansion can be constructed by the previous recursive formulas and the specifications 0 = h(s), γ(F 0) = h(s)2ds, h(s)2ds In order to obtain a first-order approximation for bounded, measurable random variables we therefore have to calculate f(F 0) f(F 0)π1 where π1 = δ s 7→ asF 1 This amounts to an integration of f times a polynomial with respect to a Gaussian density, since: f(F 0)π1 f(F 0)F 1 asdWs f(F 0)DsF Notice that the strong approximation does not yield such a result for bounded, measurable random variables. Notice also that in the given case the approximation can be calculated in a deterministic way, since we deal with Gaussian integrations. WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 5 The second-order weak Taylor approximation is given by f(F 0) f(F 0)π1 + ǫ2E f(F 0)π2 where π2 = δ s 7→ π1asF 1 s 7→ asF 2 3. 
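Example 1 notes that, for a Gaussian leading term, the first-order weak correction reduces to a deterministic Gaussian integral. The sketch below illustrates this in the simplest case we could think of, which is our own choice and not taken from the text: the curve F_eps = W_T + eps*c with a constant c, and the bounded, non-smooth payoff f(x) = 1 if x > k, else 0. For this curve D_s F = 1 on [0, T], the Malliavin covariance is T, and formula (2.4) gives the weight pi_1 = c*W_T/T. All function names are illustrative.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_expectation(eps, k, c, T):
    """E f(F_eps) for f = indicator(x > k) and F_eps = W_T + eps * c."""
    return 1.0 - normal_cdf((k - eps * c) / math.sqrt(T))


def weighted_expectation(k, c, T, steps=20000, cutoff=10.0):
    """E(f(W_T) * pi_1) with pi_1 = c * W_T / T, by the trapezoid rule.

    The payoff vanishes below k, so we integrate the weight against the
    N(0, T) density on [k, cutoff * sqrt(T)] only.
    """
    s = math.sqrt(T)
    lo, hi = k, cutoff * s
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        w = 0.5 if j in (0, steps) else 1.0
        total += w * (c * x / T) * normal_pdf(x / s) / s
    return total * h


def weak_taylor_first_order(eps, k, c, T):
    """Order-one weak Taylor approximation: E f(F_0) + eps * E(f(F_0) pi_1)."""
    return exact_expectation(0.0, k, c, T) + eps * weighted_expectation(k, c, T)


k, c, T, eps = 0.5, 1.0, 2.0, 0.1
print(exact_expectation(eps, k, c, T), weak_taylor_first_order(eps, k, c, T))
```

The payoff here is not even continuous, so a pathwise (strong) argument based on a Lipschitz constant would not apply, while the weighted expectation still captures the first-order change.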
Applications from Financial Mathematics

For applications we want to deal with strong and weak Taylor approximations of a given curve of random variables. We are particularly interested in cases where the first derivative $\frac{d}{d\epsilon}F_\epsilon|_{\epsilon=0}$ is of simple form or, even more important, where the Malliavin covariance matrix $\gamma(F_0)$ is of simple form. In these cases it is easy to obtain first or second order approximations of the respective quantities in the weak or strong sense.

In what follows, we first present one of the most widely applied interest rate models, namely the LIBOR market model (LMM). Then we introduce the commonly used technique of freezing the drift and show how to embed it into our framework of Taylor approximations. We understand freezing the drift as a strong Taylor approximation of order zero in the drift term of the LIBOR SDE. Our goal is to turn this technique into a method in which the order of approximation can be improved. Finally, we extend the assumption of log normality and develop a stochastic volatility LMM, where we show how to obtain tractable option prices via our weak Taylor approximations.

3.1. The LIBOR Market Model. We apply our concepts to the LMM, initially constructed by [BGM97], [MSS97] and [Jam97]. Let T denote a strictly positive fixed time horizon and $(\Omega, \mathcal{F}_T, P, (\mathcal{F}_t)_{0 \le t \le T})$ be a complete probability space, supporting an N-dimensional Brownian motion $W_t = (W^1_t, \dots, W^N_t)_{0 \le t \le T}$. The factors are correlated with $dW^i_t\,dW^j_t = \rho_{ij}\,dt$. Let $0 = T_0 < T_1 < T_2 < \dots < T_N < T_{N+1} =: T$ be a discrete tenor structure and $\alpha := T_{i+1} - T_i$ the accrual factor for the time period $[T_i, T_{i+1}]$, i = 0, ..., N. Let $P(t, T_i)$ denote the value at time t of a zero coupon bond with maturity $T_i \in [0, T]$. The measure P is the terminal forward measure, which corresponds to taking the final bond $P(t, T)$ as numéraire.
The forward LIBOR rate $L^i_t := L_t(T_i, T_{i+1})$ at time $t \le T_i$ for the period $[T_i, T_{i+1}]$ is given by:

$L^i_t = L_t(T_i, T_{i+1}) = \frac{1}{\alpha}\left(\frac{P(t, T_i)}{P(t, T_{i+1})} - 1\right).$

We assume that for any maturity $T_i$ there exists a bounded, continuous, deterministic function $\sigma^i(t) : [0, T_i] \to \mathbb{R}$, which represents the volatility of the LIBOR $L^i_t$, i = 1, ..., N. The log normal LIBOR market model can be expressed under the measure P as:

(3.1) $dL^i_t = -\sigma^i(t) L^i_t \sum_{j=i+1}^{N} \frac{\alpha L^j_t}{1 + \alpha L^j_t}\,\sigma^j(t)\rho_{ij}\,dt + \sigma^i(t) L^i_t\,dW^i_t, \quad i = 1, \dots, N.$

3.2. Freezing the Drift. The dynamics of forward LIBORs for i = 1, ..., N − 1 depend on the stochastic drift term $\frac{\alpha L^j_t}{1 + \alpha L^j_t}$, $i < j \le N$, which is determined by LIBOR rates with longer maturities. This random drift prohibits analytic tractability when pricing products that depend on more than one LIBOR rate, since there is no unifying measure under which all LIBOR rates are simultaneously log normal. In addition, it encumbers the numerical implementation of the model. Common practice is to approximate this term by its starting value, widely referred to as freezing the drift, i.e.

$\frac{\alpha L^j_t}{1 + \alpha L^j_t} \approx \frac{\alpha L^j_0}{1 + \alpha L^j_0}.$

It was first implemented in the original paper [BGM97] for the pricing of swaptions based on the LMM. [BW00] and [Sch02] argue that freezing the drift is justified due to the fact that this term has small variance. However, freezing the drift produces a difference between option prices computed with the real and with the frozen drift. It has not been examined how big the error is, or for which assets the technique works well. Our aim is to investigate this phenomenon and to improve the performance by providing correction terms of order one.

3.3. Correcting the Frozen Drift. The purpose of this section is to embed the well-known and often applied technique of freezing the drift into the strong and weak Taylor approximations, in order to develop a method to improve the order of accuracy.
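To make the freezing step of Section 3.2 concrete, here is a small Python sketch. It assumes the terminal-measure drift of rate i, per unit of L_i, has the form -sigma_i * sum_{j>i} alpha*L_j/(1+alpha*L_j) * sigma_j * rho_ij; the sign convention and all function names are assumptions of this sketch. The numerical values are borrowed from the paper's swaption example with constant volatilities, except that rho_13 and rho_23 are set to 0.75 purely for illustration.

```python
def drift_factor(rate, alpha):
    """The state-dependent factor alpha*L / (1 + alpha*L) appearing in the drift."""
    return alpha * rate / (1.0 + alpha * rate)


def lmm_drift(i, rates, vols, rho, alpha):
    """Drift of dL_i / L_i under the terminal measure (0-based index i)."""
    return -vols[i] * sum(
        drift_factor(rates[j], alpha) * vols[j] * rho[i][j]
        for j in range(i + 1, len(rates))
    )


def frozen_drift(i, initial_rates, vols, rho, alpha):
    """Frozen version: every L_j in the drift is replaced by its time-0 value."""
    return lmm_drift(i, initial_rates, vols, rho, alpha)


ALPHA = 0.25
C = [0.0537375, 0.0540, 0.0540125]          # initial rates c1, c2, c3
VOLS = [0.18, 0.15, 0.12]                   # constant volatilities
RHO = [[1.0, 0.75, 0.75],                   # rho_13, rho_23: illustrative only
       [0.75, 1.0, 0.75],
       [0.75, 0.75, 1.0]]

# If rates drift away from their starting values, the true drift moves
# while the frozen one stays put.
moved = [1.5 * c for c in C]
print(frozen_drift(0, C, VOLS, RHO, ALPHA), lmm_drift(0, moved, VOLS, RHO, ALPHA))
```

The map x -> alpha*x/(1+alpha*x) has slope at most alpha for x >= 0, so the gap between the two printed numbers is controlled by alpha times the move in the rates; this is the small Lipschitz constant that the strong Taylor argument relies on.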
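Specifically for the strong Taylor approximation, the method works well, since we always deal with a globally Lipschitz drift term $x \mapsto \frac{\alpha x_+}{1 + \alpha x_+}$ with small Lipschitz constant $\alpha$.

Remark 5. As will become clear later, the strong Taylor correction method can be adapted to any extension of the log normal LMM, for example to the Lévy LIBOR model by Eberlein and Özkan [EÖ05].

3.3.1. Strong Taylor Approximation. We first state a useful lemma, asserting that we can indeed freeze the drift under a special model formulation and choice of parameters.

Lemma 1. Let $\epsilon_1 \in \mathbb{R}$ and consider for i = 1, ..., N the following stochastic differential equation:

(3.2) $dX^{(i,\epsilon_1)}_t = \epsilon_1\,\sigma^i(t) X^{(i,\epsilon_1)}_t \left(-\sum_{j=i+1}^{N} \frac{\alpha X^{(j,\epsilon_1)}_t \sigma^j(t)}{1 + \alpha X^{(j,\epsilon_1)}_t}\,\rho_{ij}\,dt + dW^i_t\right),$

defined on the complete probability space $(\Omega, \mathcal{F}_T, P, (\mathcal{F}_t)_{0 \le t \le T})$, where $W_t$ is an N-dimensional Brownian motion under the measure P with $dW^i_t\,dW^j_t = \rho_{ij}\,dt$. Then the first-order strong Taylor approximation for $X^{(i,\epsilon_1)}_t$ is given by:

(3.3) $T^1_{\epsilon_1}(X^{(i,\epsilon_1)}_t) = X^{(i,0)}_t + \epsilon_1 \frac{\partial}{\partial \epsilon_1}\Big|_{\epsilon_1=0} X^{(i,\epsilon_1)}_t.$

Proof. By (2.1) we obtain for n = 1:

$T^1_{\epsilon_1}(X^{(i,\epsilon_1)}_t) \simeq X^{(i,\epsilon_1)}_t = X^{(i,0)}_0 + \epsilon_1 Y^i_t + o(\epsilon_1),$

since $X^{(i,0)}_t = X^{(i,0)}_0$ and where $Y^i_t := \frac{\partial}{\partial \epsilon_1}\big|_{\epsilon_1=0} X^{(i,\epsilon_1)}_t$ is the first-order correction term. By differentiating (3.2) with respect to $\epsilon_1$, we calculate:

$d\left(\frac{\partial}{\partial \epsilon_1}\Big|_{\epsilon_1=0} X^{(i,\epsilon_1)}_t\right) = \sigma^i(t) X^{(i,0)}_0 \left(-\sum_{j=i+1}^{N} \frac{\alpha X^{(j,0)}_0 \sigma^j(t)}{1 + \alpha X^{(j,0)}_0}\,\rho_{ij}\,dt + dW^i_t\right)$

and derive $Y^i_t$ as the solution to the above linear SDE:

(3.4) $Y^i_t = \int_0^t -\sigma^i(s) X^{(i,0)}_0 \sum_{j=i+1}^{N} \frac{\alpha X^{(j,0)}_0 \sigma^j(s)}{1 + \alpha X^{(j,0)}_0}\,\rho_{ij}\,ds + \int_0^t \sigma^i(s) X^{(i,0)}_0\,dW^i_s,$

with $Y^i_0 = 0$. □

Remark 6. We parametrise the LIBOR market model in terms of the parameter $\epsilon_1$ as follows:

$dL^{(i,\epsilon_1)}_t = \sigma^i(t) L^{(i,\epsilon_1)}_t \left(-\sum_{j=i+1}^{N} \frac{\alpha X^{(j,\epsilon_1)}_t \sigma^j(t)}{1 + \alpha X^{(j,\epsilon_1)}_t}\,\rho_{ij}\,dt + dW^i_t\right)$

and assume at t = 0 that $L^{(i,\epsilon_1)}_0 = X^{(i,\epsilon_1)}_0$ for all $\epsilon_1$ and all i = 1, ..., N. If $\epsilon_1 = 1$, what we obtain is the standard LIBOR market model formulation and in particular $L^{(i,1)}_t = X^{(i,1)}_t$.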
For ǫ1 = 0, X (i,0) t equals its starting value and thus the drift term in the following SDE is no longer stochastic: (3.5) dL (i,0) t = σ i(t)L (i,0) j=i+1 (j,0) 1 + αX (j,0) ρijdt+ dW The next proposition provides a way for a pathwise approximation of L (i,ǫ1) t , by means of adjusting its SDE. This is achieved by adding Tnǫ1(X (j,ǫ1) t ) in the frozen drift part. Proposition 1. Assume the setup of Lemma 1 and assume further at t = 0 that (i,ǫ1) 0 = X (i,ǫ1) 0 for all ǫ1 and all i = 1, ..., N . Then the stochastic differential equation for L (i,ǫ1) t with the unfrozen drift: (3.6) dL (i,ǫ1) t = σ i(t)L (i,ǫ1) j=i+1 (j,ǫ1) 1 + αX (j,ǫ1) ρijdt+ dW can be strongly approximated as ǫ1 ↓ 0 by (3.7) dL̂ (i,ǫ1) t = σ i(t)L̂ (i,ǫ1) j=i+1 Tnǫ1(X (j,ǫ1) σj(t) 1 + α Tnǫ1(X (j,ǫ1) ρijdt+ dW Remark 7. For n = 0, we derive the ”freezing the drift” case. For n = 1, we already obtain an improvement. Proof. First step is to interchange X (j,ǫ1) t with (X (j,ǫ1) t )+ in (3.6) to obtain: (i,ǫ1) t = σ i(t)L (i,ǫ1) j=i+1 (j,ǫ1) t )+σ 1 + α(X (j,ǫ1) ρijdt+ dW This yields no change for the dynamics of L (i,ǫ1) t , since X (j,ǫ1) t = (X (j,ǫ1) t )+. 8 MARIA SIOPACHA AND JOSEF TEICHMANN By Taylor’s expansion, we know that as ǫ1 ↓ 0, L̂(i,ǫ1)t → L (i,ǫ1) t P-a.s. The estimate for the error term is given by log L̂ (i,ǫ1) t − logL (i,ǫ1) σi(s) j=i+1 Tnǫ1(X (j,ǫ1) σj(s) 1 + α Tnǫ1(X (j,ǫ1) ρij + j=i+1 (j,ǫ1) t )+σ 1 + α(X (j,ǫ1) α|X(j,ǫ1)s − (Tnǫ1(X (j,ǫ1) s ))+|ds. Remark 8. The SDE for the approximated L̂ (i,ǫ1) t is easier and faster to simulate than (3.1), as it is exhibited by the following example. Notice additionally that (i,ǫ1) t is a continuous functional of the process Y t (3.4) and of the Brownian path W it . Eventually, by using L̂ (i,ǫ1) t as the LIBOR rates, the computational complexity of the drift and thus of the model can be reduced substantially, while maintaining accuracy of prices. Example 2. 
In this example, we examine the performance of the strong Taylor correction method. Let N = 3 and consider pricing a caplet on the LIBOR rate L1 with strike K. Its price is given by: 0 = αEP L1T1 −K Assume that the volatility functions σi(t) : [0, Ti] → R for i = 1, 2, 3 are given by (cf. Brigo and Mercurio [BM01], formulation (6.12)): σi(t) = a(Ti − t) + d − b(Ti − t) where the constants a, b, d, e are the same for all three LIBOR rates and are equal to a = −0.113035, b = 0.22911, d = −a, e = 0.684784. Thus, we can write the model under the terminal measure P as: (1,ǫ1) t = σ 1(t)L (1,ǫ1) (2,ǫ1) 2(t)ρ12 1 + αX (2,ǫ1) (3,ǫ1) 3(t)ρ13 1 + αX (3,ǫ1) dt+ σ1(t)L (1,ǫ1) (2,ǫ1) t = σ 2(t)L (2,ǫ1) (3,ǫ1) 3(t)ρ23 1 + αX (3,ǫ1) dt+ σ2(t)L (2,ǫ1) dL3t = σ 3(t)L3t dW (1,ǫ1) t = ǫ1 σ1(t)X (1,ǫ1) (2,ǫ1) 2(t)ρ12 1 + αX (2,ǫ1) (3,ǫ1) 3(t)ρ13 1 + αX (3,ǫ1) dt+ σ1(t)X (1,ǫ1) (2,ǫ1) t = ǫ1 σ2(t)X (2,ǫ1) (3,ǫ1) 3(t)ρ23 1 + αX (3,ǫ1) dt+ σ2(t)X (2,ǫ1) (3,ǫ1) t = ǫ1 σ3(t)X (3,ǫ1) with initial values L (i,ǫ1) 0 = X (i,ǫ1) 0 = ci, for i = 1, 2, 3 and for all ǫ1. The Brownian motion vector (W 1t ,W t ) is correlated with correlation coefficient ρij given by: ρij = 0.49 + (1− 0.49) exp (−0.13|i− j|), i, j = 1, 2, 3. WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 9 The SDEs for the approximated LIBOR rates L̂ (1,ǫ1) t and L̂ (2,ǫ1) t are given by: (1,ǫ1) t = σ 1(t)L̂ (1,ǫ1) c2 + ǫ1Y σ2(t)ρ12 1 + α c2 + ǫ1Y c3 + ǫ1Y σ3(t)ρ13 1 + α c3 + ǫ1Y + σ1(t)L̂ (1,ǫ1) (2,ǫ1) t = σ 2(t)L̂ (2,ǫ1) c3 + ǫ1Y σ3(t)ρ23 1 + α c3 + ǫ1Y dt+ σ2(t)L̂ (2,ǫ1) The partial derivative terms Y 2t and Y t are equal to: Y 2t = c2 ( ∫ t σ2(s)dW 2s − αc3ρ23 1 + αc3 σ2(s)σ3(s)ds Y 3t = c3 σ3(s)dW 3s . We compare three caplet prices: • benchmark price, underlying L(1,ǫ1)t ; • strong Taylor price, underlying L̂(1,ǫ1)t ; • frozen drift price, underlying L(1,0)t . 
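The inputs of Example 2 are easy to tabulate before any simulation is run. The sketch below assumes that the volatility function has the humped form sigma_i(t) = (a*(T_i - t) + d) * exp(-b*(T_i - t)) + e, which is our reading of the garbled formula in the light of the Brigo and Mercurio parametrisation it cites; the correlation formula is taken as stated. Function names are illustrative.

```python
import math

# Constants quoted in Example 2 (with d = -a).
A, B, D, E = -0.113035, 0.22911, 0.113035, 0.684784


def sigma(t, T_i, a=A, b=B, d=D, e=E):
    """Assumed humped volatility: (a*tau + d) * exp(-b*tau) + e, tau = T_i - t."""
    tau = T_i - t
    return (a * tau + d) * math.exp(-b * tau) + e


def rho(i, j):
    """Correlation between the Brownian motions driving rates i and j."""
    return 0.49 + (1.0 - 0.49) * math.exp(-0.13 * abs(i - j))


def correlation_matrix(n):
    return [[rho(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]


for row in correlation_matrix(3):
    print([round(x, 4) for x in row])
print(round(sigma(0.0, 1.53151), 4))   # volatility of L^1 at t = 0, T_1 = 1.53151
```

With these constants the correlation decays only mildly with the distance |i - j| and never falls below 0.49, so all three rates move closely together.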
Numerical results in basis points (bps) are displayed in Table 2 for parameters ǫ1 = 1, N = 3, α = 0.50137, c1 = 3.86777%, c2 = 3.7574%, c3 = 3.8631%, T1 = 1.53151, Ti = T1 + iα, i = 2, 3, 4. We observe a clear difference between the benchmark and frozen drift prices, whilst our strong Taylor correction method performs very well and is computationally simpler and faster.

strikes          K=3%      K=3.5%    K=4%     K=5.75%   K=6.25%   K=8%
benchmark        11.1831   8.5897    6.5503   3.0349    2.4423    1.2969
strong Taylor    11.0687   8.5691    6.5867   3.1448    2.5513    1.3926
frozen drift     13.9551   11.1822   8.8803   4.6313    3.8506    2.2524

Table 1: Caplet values in bps for parameters ǫ1 = 1, α = 0.50137, c1 = 3.86777%, c2 = 3.7574%, c3 = 3.8631% and T1 = 1.53151.

3.3.2. Weak Taylor Approximation. In what follows, we provide some results on how to correct option prices obtained by the SDE with the frozen drift (3.5) by adding a correction term involving the appropriate Malliavin weight. Let L i,k,ǫ1 denote the vector of the LIBOR rates (L (i,ǫ1) , . . . , L (k,ǫ1)

Proposition 2. Assume the setup of Lemma 1, where the ith LIBOR rate is given by

(3.8) $dL^{(i,\epsilon_1)}_t = \sigma^i(t) L^{(i,\epsilon_1)}_t \left(-\sum_{j=i+1}^{N} \frac{\alpha X^{(j,\epsilon_1)}_t \sigma^j(t)}{1 + \alpha X^{(j,\epsilon_1)}_t}\,\rho_{ij}\,dt + dW^i_t\right)$

with $L^{(i,\epsilon_1)}_0 = X^{(i,\epsilon_1)}_0$ for all ǫ1 and all i = 1, ..., N. Assume furthermore that the Malliavin covariance matrix γ(L i,k,0 ) is invertible. Then the price of an option with payoff g(L i,k,ǫ1 ), for i ≤ k ≤ N and g bounded measurable, can be approximated by the weak Taylor approximation of order one: a(g,L i,k,ǫ1 ) = P (0, T ) i,k,0 + ǫ1EP i,k,0 ,(3.9) where the Malliavin weight ζTi is given by: ζTi = δ i,k,0 )Tγ−1(L i,k,0 i,k,ǫ1 ,(3.10) for t ≤ Ti.

Proof. The weight ζTi is obtained by (2.4). Notice that we can write: i,k,ǫ1 i,k,0 and hence the result (3.9) by Definition 2 for n = 1. □

Example 3.
In this example we let N = 3 and we price a payers swaption with strike price K and maturity T1, where the underlying swap is entered at T1 and has payment dates T2 and T3. We assume that the volatility functions σ i(t) : [0, Ti] → R for i = 1, 2, 3 are constant: σ1(t) = σ1, σ 2(t) = σ2, σ 3(t) = σ3, such that we obtain under the terminal measure P: (1,ǫ1) t = σ1L (1,ǫ1) (2,ǫ1) 1 + αX (2,ǫ1) (3,ǫ1) 1 + αX (3,ǫ1) dt+ dW 1t (2,ǫ1) t = σ2L (2,ǫ1) (3,ǫ1) 1 + αX (3,ǫ1) dt+ dW 2t dL3t = σ3L t ,(3.11) (1,ǫ1) t = ǫ1 (1,ǫ1) (2,ǫ1) 1 + αX (2,ǫ1) (3,ǫ1) 1 + αX (3,ǫ1) dt+ dW 1t (2,ǫ1) t = ǫ1 (2,ǫ1) (3,ǫ1) 1 + αX (3,ǫ1) dt+ dW 2t (3,ǫ1) t = ǫ1 (3,ǫ1) with initial values L (i,ǫ1) 0 = X (i,ǫ1) 0 = ci, for i = 1, 2, 3 and for all ǫ1. W t and W 2t are correlated with correlation coefficient ρ12. We freeze the drifts in the above equations to obtain: (1,0) t = c1 exp ( αc2σ2 1 + αc2 αc3σ3 1 + αc3 (2,0) t = c2 exp ( αc3σ3 1 + αc3 L3t = c3 exp Similarly to the previous example, we compare four option prices: • benchmark price; • frozen drift; WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 11 • strong Taylor price; • weak Taylor price. The weak correction formula (3.9) adds a correction term to the closed form price of the option. The swaption payoff at Ti can be found for example in [MR98]: swptn k=i+1 (1 + αL if the underlying swap is entered at time Ti and has payment dates Ti+1, ..., T . αk is given by: Kα, k = i+ 1, . . . , N, 1 +Kα, k = N + 1. The payers swaption value at time t = 0 can be written as: swptn 0 = P (0, Ti)EPi swptn = P (0, T )EP (1 + αL )− (1 +Kα) ,(3.12) where αi := −1 and Pi denotes the forward measure corresponding to the bond P (t, Ti) as numéraire. Therefore, its benchmark price is given by the above formula with N = 2 and i = 1: swptn 0 = P (0, T ) αL1T1 + αL + α2L1T1L −Kα2L2T1 − 2Kα Its weak Taylor price is given by (3.9) with i = 1 and k = N = 2. 
swptn 0 = P (0, T ) (1,0) (2,0) + α2L (1,0) (2,0) −Kα2L(2,0) − 2Kα + ǫ1EP (1,0) (2,0) + α2L (1,0) (2,0) −Kα2L(2,0) − 2Kα The weight ζT1 is given by (3.10). The partial derivative terms C |ǫ1=0L (1,ǫ1) and C2T1 := |ǫ1=0L (2,ǫ1) are given by: C1T1 = L (1,0) σ1ρ12 ( σ3αc3β2 (1 + αc3) t− (β2 + β3)W 2t C2T1 = L (2,0) −σ2β3W 2t dt, with C10 = C 0 = 0 and β2 := (1+αc2)2 , β3 := (1+αc3)2 . The Malliavin covariance matrix of the vector (L (1,0) (2,0) ) is equal to: (1,0) (2,0) (1 + ρ212)(L (1,0) )2T1σ 1 2ρ12(L (1,0) (2,0) )T1σ1σ2 2ρ12(L (1,0) (2,0) )T1σ1σ2 (1 + ρ 12)(L (2,0) )2T1σ ⇒ det (1,0) (2,0) (1,0) (2,0) )2T 21 σ 2(1− ρ212). 12 MARIA SIOPACHA AND JOSEF TEICHMANN The determinant is not zero as long as ρ12 6= 1, which is a natural assumption. Hence under this condition, its inverse is given by: (1,0) (2,0) (1− ρ212) 1+ρ212 (1,0) )2T1σ − 2ρ12 (1,0) (2,0) T1σ1σ2 − 2ρ12 (1,0) (2,0) T1σ1σ2 1+ρ212 (2,0) )2T1σ Write the weight ζT1 = ζ + ζ2T1 , where the first weight ζ is obtained as: ζ1T1 = (1,0) C1T1γ 11 + C γ−112 +D1tL (2,0) C1T1γ 21 + C γ−122 δW 1t , and ζ2T1 similarly: ζ2T1 = (1,0) C1T1γ 11 + C γ−112 +D2tL (2,0) C1T1γ 21 + C γ−122 δW 2t . Performing all necessary calculations, we conclude that: ζ1T1 = ρ12 W 1T1 (σ3αc3β2T1 2(1 + αc3) − (β2 + β3) W 2t dt ρ12(β2 + β3)T1 − ρ12 (ρ12β3T1 W 2t dt Analogously we obtain ζ2T1 as: ζ2T1 = ρ W 2T1 (σ3αc3β2T1 2(1 + αc3) (β2 + β3) W 2t dt (β2 + β3)T1 (β3T1 W 2t dt Notice that the weights are functions of normal variables and thus the calculation of the weak Taylor price amounts just to computation of deterministic integrals. Table 3 gives the swaption prices in bps for parameters N = 3, α = 0.25, σ1 = 18%, σ2 = 15%, σ3 = 12%, c0 = 5.28875%, c1 = 5.37375%, c2 = 5.40%, c3 = 5.40125% and ρ12 = 0.75. 
strikes          K=4%      K=4.5%   K=4.75%   K=5%     K=5.15%   K=5.25%
benchmark        10.2240   6.5386   4.7454    3.1060   2.2599    1.7758
frozen drift     10.2132   6.5326   4.7419    3.1028   2.2582    1.7618
strong Taylor    10.2240   6.5386   4.7454    3.1060   2.2599    1.7758
weak Taylor      10.2266   6.5407   4.7485    3.1064   2.2593    1.7626

Table 2: Swaption values in bps for parameters ǫ1 = 1, α = 0.25, σ1 = 18%, σ2 = 15%, σ3 = 12%, c0 = 5.28875%, c1 = 5.37375%, c2 = 5.40%, c3 = 5.40125% and ρ12 = 0.75.

3.4. The Stochastic Volatility LIBOR Market Model. In this section, we develop a stochastic volatility LMM. The stochastic volatility parameter $v_t$ follows a square root process, as in the extensively applied Heston model [Hes93]. The resulting model, called hereafter the stochastic volatility LMM (SVLMM), has the following dynamics under the terminal measure:

(3.13) $dL^i_t = \sigma^i(t) L^i_t \left(-\sum_{j=i+1}^{N} \frac{\alpha L^j_t \sigma^j(t)}{1 + \alpha L^j_t}\,\rho_{ij}\,v_t\,dt + \sqrt{v_t}\,dW^i_t\right), \quad i = 1, \dots, N,$
$dv_t = \kappa(\theta - v_t)\,dt + \epsilon_2 \sqrt{v_t}\,dB_t,$

where $\kappa, \theta, \epsilon_2 \in \mathbb{R}_+$. The Brownian motions $W_t = (W^1_t, \dots, W^N_t)$ and $B_t$ are expressed under the terminal measure with correlations $dW^i_t\,dB_t = \rho_i\,dt$ and $dW^i_t\,dW^j_t = \rho_{ij}\,dt$ for i, j = 1, ..., N. We assume additionally that the filtration $(\mathcal{F}_t)_{0 \le t \le T}$ is generated by both Brownian motions. Observe that the process $v_t$ is a time-changed squared Bessel process with dimension $\delta = 4\kappa\theta/\epsilon_2^2$. If δ ≥ 2, then the point zero is unattainable. So we require $2\kappa\theta \ge \epsilon_2^2$ for the process $v_t$ not to reach zero.

3.4.1. Pricing a multi-LIBOR option. In this section, we aim at approximating the price of an option with payoff depending on the vector L i,k,ǫ1,ǫ2 i,ǫ1,ǫ2 , . . . , L k,ǫ1,ǫ2 We interpret the volatility of the volatility parameter ǫ2 as a parameter on which the LIBOR rates depend. Overall, we parametrise the SVLMM by both ǫ1 and ǫ2 and correct prices in a weak sense by introducing Malliavin weights.

Proposition 3. Consider the SVLMM (3.13) and assume that the Malliavin covariance matrix γ(L i,k,0,0 ) is invertible.
Then the price of an option with payoff i,k,ǫ1,ǫ2 ), i ≤ k ≤ N , where ψ is a bounded measurable function, can be approx- imated by the weak Taylor approximation of order one: (ǫ1,ǫ2) i,k,ǫ1,ǫ2 )) = P (0, T ) i,k,0,0 + ǫ1EP i,k,0,0 + ǫ2EP i,k,0,0 ,(3.14) where the Malliavin weights ζTi , πTi are given by: ζTi = δ i,k,0,0 )Tγ−1(L i,k,0,0 i,k,ǫ1,0 ,(3.15) πTi = δ i,k,0,0 )Tγ−1(L i,k,0,0 i,k,0,ǫ2 ),(3.16) for t ≤ Ti. Proof. The weights ζTi and πTi are obtained by (2.4). We derive (3.14) by noticing that: i,k,ǫ1,ǫ2 i,k,0,0 i,k,ǫ1,0 i,k,0,ǫ2 i,k,0,0 + ǫ1E i,k,0,0 + ǫ2E i,k,0,0 from Definition 2 for n = 1. � 14 MARIA SIOPACHA AND JOSEF TEICHMANN Example 4. Let N = 2 and consider the SVLMM where the volatility functions σi(t) : [0, Ti] → R for i = 1, 2 are assumed to be constant and in particular σ1(t) = σ1, σ 2(t) = σ2. We derive an approximative formula for the price of a payers swaption with maturity T1 and strike price K. The underlying swap is entered at T1 and has payment dates T2, T3. Under the terminal measure P we can write the SDEs for the LIBOR rates and stochastic volatility as: dvǫ2t = κ θ − vǫ2t dt+ ǫ2 vǫ2t dBt, (1,ǫ1,ǫ2) t = −L (1,ǫ1,ǫ2) t ρ12 (2,ǫ1,ǫ2) 1 + αX (2,ǫ1,ǫ2) t dt+ σ1L (1,ǫ1,ǫ2) vǫ2t dW (2,ǫ2) t = σ2L (2,ǫ2) vǫ2t dW (2,ǫ1) t = ǫ1 (2,ǫ2) vǫ2t dW W 1t and W t are assumed to be correlated, so correlations are as dW t dBt = ρidt and dW 1t dW t = ρ12 for i = 1, 2. The (0, 0)-model is given by: v0t = exp (−κt)(v00 − θ) + θ, (1,0,0) = c1 exp v0t dW ( αc2ρ12 1 + αc2 (2,0) = c2 exp v0t dW (2,0,0) t = c2, with c := v0t dt = θT1 − (exp (−κT1) − 1). As in the previous example, we compare the following option prices: • benchmark price; • frozen drift; • weak Taylor price (3.14). 
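Two deterministic quantities of the (0, 0)-model in Example 4 can be checked directly: the mean variance path v^0_t = exp(-kappa*t)*(v_0 - theta) + theta and its time integral c over [0, T_1]. The sketch below assumes the closed form c = theta*T_1 - (v_0 - theta)/kappa * (exp(-kappa*T_1) - 1), which is what integrating v^0_t gives, and also tests the condition 2*kappa*theta >= eps_2^2 stated for the SVLMM. The function names and the value T_1 = 1.0 are hypothetical, chosen only for illustration; the remaining parameters are those quoted for the example.

```python
import math


def mean_variance(t, v0, kappa, theta):
    """v^0_t: solution of dv = kappa*(theta - v) dt, i.e. the eps_2 = 0 variance."""
    return math.exp(-kappa * t) * (v0 - theta) + theta


def integrated_mean_variance(T1, v0, kappa, theta):
    """c = integral of v^0_t over [0, T1], in closed form."""
    return theta * T1 - (v0 - theta) / kappa * (math.exp(-kappa * T1) - 1.0)


def zero_unattainable(kappa, theta, eps2):
    """Condition 2*kappa*theta >= eps2^2 under which v_t does not reach zero."""
    return 2.0 * kappa * theta >= eps2 ** 2


V0, KAPPA, THETA, EPS2 = 1.0, 2.3767, 0.2143, 0.25
T1 = 1.0  # hypothetical maturity, not taken from the paper

print(integrated_mean_variance(T1, V0, KAPPA, THETA))
print(zero_unattainable(KAPPA, THETA, EPS2))
```

Since c enters the Malliavin covariance matrix of the (0, 0)-model as a common factor, it only needs to be computed once per maturity.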
The benchmark price is given by (3.12) with N = 2 and i = 1: swptn 0 = P (0, T )EP (1,ǫ1,ǫ2) (2,ǫ2) (1,ǫ1,ǫ2) (2,ǫ2) −Kα2L(2,ǫ2) The weak Taylor price is obtained by (3.14): swptn 0 = P (0, T ) (1,0,0) (2,0) + α2L (1,0,0) (2,0) −Kα2L(2,0) − 2Kα + ǫ1EP (1,0,0) (2,0) + α2L (1,0,0) (2,0) −Kα2· · L(2,0) − 2Kα + ǫ2EP (1,0,0) (2,0) + α2L (1,0,0) (2,0) −Kα2L(2,0)T1 − 2Kα We calculate the Malliavin weights ζT1 , πT1 as given by (3.15) and (3.16) corre- spondingly. We can express the weight ζT1 as: ζT1 = ζ + ζ2T1 , WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 15 with: ζ1T1 = (1,0,0) (1,ǫ1,0) γ−1(L (1,0,0) (2,0,0) (2,0,0) (1,ǫ1,0) γ−1(L (1,0,0) (2,0,0) δW 1t , ζ2T1 = (1,0,0) (1,ǫ1,0) γ−1(L (1,0,0) (2,0,0) (2,0,0) (1,ǫ1,0) γ−1(L (1,0,0) (2,0,0) δW 2t . since ∂ |ǫ1=0 L (2,ǫ1,0) = 0. The partial derivative term with respect to ǫ1 for L given by: (1,ǫ1,0) (1,0,0) (1 + αc2)2 v0sdW = −σ1ρ12β2L(1,0,0)T1 θ(T1 − t)− v00 − θ (exp (−κT1)− exp (−κt)) dW 2t , where β2 = (1+αc2)2 . Similarly the weight πT1 is given by: πT1 = π + π2T1 , with: π1T1 = (l,0,0) (j,0,ǫ2) (1,0,0) (2,0,0) δW 1t , π2T1 = (l,0,0) (j,0,ǫ2) (1,0,0) (2,0,0) δW 2t . Partial derivative terms are equal to: (1,0,ǫ2) (1,0,0) exp (−κt) exp (κs) v0sdBsdW αc2σ1σ2ρ12 1 + αc2 exp (κs) exp (−κT1)− exp (−κs) Doing similar calculations, we derive the second partial derivative: (2,ǫ2) (2,0) σ1σ2Vtdt where Vt = exp (−κt) exp (κs) v0sdBs. 16 MARIA SIOPACHA AND JOSEF TEICHMANN We calculate the Malliavin covariance matrix γ (1,0,0) (2,0,0) and its in- verse. (1 + ρ212)(L (1,0,0) )2σ21 v0t dt ︸ ︷︷ ︸ 2ρ12L (1,0,0) (2,0,0) σ1σ2c 2ρ12L (1,0,0) (2,0,0) σ1σ2c (1 + ρ 12)(L (2,0,0) )2σ22c ⇒ det (1,0,0) (2,0,0) (1,0,0) (2,0,0) )2σ21σ 2(1− ρ212). 
Hence its inverse is given by, for ρ12 6= 1: γ−1 = (1− ρ212) 1+ρ212 (1,0,0) )2σ21c − 2ρ12 (1,0,0) (2,0,0) σ1σ2c − 2ρ12 (1,0,0) (2,0,0) σ1σ2c 1+ρ212 (2,0,0) )2σ22c If we define Xi = v0t dW t , i = 1, 2 and Y = θ(T1−t)− v (exp (−κT1)− exp (−κt)) dW 2t , we finally obtain the weights as: ζ1T1 = − ρ12β2 X1Y − Cov(X1, Y ) ζ2T1 = ρ212β2 X2Y − Cov(X2, Y ) Moreover, for the weight πT1 we define: exp (−κT1)− exp (−κt) and random variables Di, Zi for i = 1, 2: g(s)dW isdW g(s)dZisdW where the Brownian motions Zit are independent from W t and f(t) = exp (−κt)√ g(s) = exp (κs) v0s . Therefore, we obtain the weights as: π1T1 = X1(ρ1D1 + 1− ρ21Z1) + (αc2(2ρ12σ2 + σ1) + σ1 1 + αc2 (αc2(2ρ12σ2 + σ1) + σ1 1 + αc2 X1(ρ2D2 + 1− ρ22Z2)+ σ1BX1 − σ1ρ1E WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 17 where E equals to 1− exp (−κT1) . Similarly we get π2T1 as: π2T1 = X2(ρ2D2 + 1− ρ22Z2) + σ2X2 + 1 − σ2ρ2E − ρ12 X2(ρ1D1 + 1− ρ21Z1) + σ1BX2 (αc2(2ρ12σ2 + σ1) + σ1 1 + αc2 − σ1ρ2E (αc2(2ρ12σ2 + σ1) + σ1 1 + αc2 In this example, the weights are functions of normal variables and double sto- chastic integrals, which are computed via simulation. Table 4 reports the swaption prices in bps with parameters N = 2, α = 1.5, σ1 = 25%, σ2 = 15%, c0 = 5.28875%, c1 = 5.4%, c2 = 5.39%, v0 = 1, ρ1 = −0.75, ρ2 = −0.6, κ = 2.3767, θ = 0.2143, ǫ2 = 25%, ρ12 = 0.63. strikes K=3.5% K=4% K=5% K=6% K=7% K=8% benchmark 3.8984 2.9221 1.2588 0.3858 0.1019 0.0216 (0, 0)-model 3.8951 2.9053 1.2705 0.3966 0.0942 0.0185 weak Taylor 3.8990 2.9159 1.2694 0.3791 0.1042 0.0210 Table 3: Stochastic volatility swaption values in bps for parameters ǫ1 = 1, α = 1.5, σ1 = 25%, σ2 = 15%, c0 = 5.28875%, c1 = 5.4%, c2 = 5.39%, v0 = 1, ρ1 = −0.75, ρ2 = −0.6, κ = 2.3767, θ = 0.2143, ǫ2 = 25%, ρ12 = 0.63. References [BGM97] A. Brace, D. Gatarek, and M. Musiela, The Market Model of Interest Rate Dynamics, Mathematical Finance 7 (1997), no. 2, 127–155. [BM01] D. Brigo and F. 
Mercurio, Interest Rate Models: Theory and Practice, Springer Finance, Springer, 2001.
[BW00] A. Brace and R.S. Womersley, Exact Fit to the Swaption Volatility Matrix using Semidefinite Programming, Working paper, presented at ICBI Global Derivatives Conference, Paris, April 2000, 2000.
[EÖ05] E. Eberlein and F. Özkan, The Lévy LIBOR model, Finance and Stochastics 9 (2005), 327–348.
[Hes93] S. Heston, A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options, The Review of Financial Studies 6 (1993), no. 2, 327–343.
[Jam97] F. Jamshidian, LIBOR and Swap Market Models and Measures, Finance and Stochastics 1 (1997), 293–330.
[KM97] A. Kriegl and P. W. Michor, The Convenient Setting of Global Analysis, American Mathematical Society, 1997.
[Mal97] P. Malliavin, Stochastic Analysis, Springer, 1997.
[MR98] M. Musiela and M. Rutkowski, Martingale Methods in Financial Modelling, second ed., Springer, 1998.
[MSS97] K. Miltersen, K. Sandmann, and D. Sondermann, Closed Form Solutions for Term Structures Derivatives with Log-Normal Interest Rates, Journal of Finance 52 (1997), 409–430.
[MT06] P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Finance, Springer, 2006.
[Nua06] D. Nualart, The Malliavin Calculus and Related Topics, second ed., Springer Verlag, 2006.
[Sch02] E. Schlögl, A Multicurrency Extension of the Lognormal Interest Rate Market Models, Finance and Stochastics 6 (2002), 173–196.

Department of Mathematical Methods in Economics, Vienna University of Technology, Wiedner Hauptstrasse 8–10/105–1, A-1040 Vienna, Austria.
E-mail address: [josef.teichmann,siopacha]@fam.tuwien.ac.at

ABSTRACT
We apply results of Malliavin-Thalmaier-Watanabe for strong and weak Taylor expansions of solutions of perturbed stochastic differential equations (SDEs). In particular, we work out weight expressions for the Taylor coefficients of the expansion. The results are applied to LIBOR market models in order to deal with the typical stochastic drift and with stochastic volatility. In contrast to other accurate methods like numerical schemes for the full SDE, we obtain easily tractable expressions for accurate pricing. In particular, we present an easily tractable alternative to "freezing the drift" in LIBOR market models, which has an accuracy similar to the full numerical scheme. Numerical examples underline the results.
<|endoftext|><|startoftext|> Finite bias visibility of the electronic Mach-Zehnder interferometer

Preden Roulleau, F. Portier, D. C. Glattli,∗ and P. Roche†
Nanoelectronic group, Service de Physique de l'Etat Condensé, CEA Saclay, F-91191 Gif-Sur-Yvette, France
A. Cavanna, G. Faini, U. Gennser, and D. Mailly
CNRS, Phynano team, Laboratoire de Photonique et Nanostructures, Route de Nozay, F-91460 Marcoussis, France
(Dated: November 28, 2018)

We present an original statistical method to measure the visibility of interferences in an electronic Mach-Zehnder interferometer in the presence of low frequency fluctuations. The visibility presents a single side lobe structure shown to result from a gaussian phase averaging whose variance is quadratic with the bias. To reinforce our approach and validate our statistical method, the same experiment is also realized with a stable sample. It exhibits the same visibility behavior as the fluctuating one, indicating the intrinsic character of finite bias phase averaging. In both samples, the dilution of the impinging current reduces the variance of the gaussian distribution.
PACS numbers: 85.35.Ds, 73.43.Fj

Nowadays quantum conductors can be used to perform experiments usually done in optics, with electron beams replacing photon beams. A beamlike electron motion can be obtained in the Integer Quantum Hall Effect (IQHE) regime using a high mobility two dimensional electron gas in a high magnetic field at low temperature. In the IQHE regime, one-dimensional gapless excitation modes form, which correspond to electrons drifting along the edge of the sample. The number of these so-called edge channels corresponds to the number of filled Landau levels in the bulk. The chirality of the excitations yields long collision times between quasi-particles, making edge states very suitable for quantum interference experiments like the electronic Mach-Zehnder interferometer (MZI) [1, 2, 3]. Surprisingly, despite some experiments which show that the equilibrium length in chiral wires is rather long [4], very little is known about the coherence length or the phase averaging in these "perfect" chiral one-dimensional wires. In particular, while in the very first MZI interference experiment the interference visibility showed a monotonic decrease with voltage bias, which was attributed to phase noise [1], a more recent paper reported a surprising non-monotonic decrease with a lobe structure [5]. A satisfactory explanation has not yet been found, and no other group has so far reported an experiment confirming these results.

We report here on an original method to measure the visibility of interferences in a MZI when low frequency phase fluctuations prevent direct observation of the periodic interference pattern obtained by changing the magnetic flux through the MZI. We studied the visibility at finite energy and observed a single side lobe structure, which can be explained by a gaussian phase averaging whose variance is proportional to $V^2$, where $V$ is the bias voltage. To reinforce our result and check whether low frequency fluctuations may be responsible for that behavior, we realized the same experiment on a stable sample: we also observed a single side lobe structure which can be fitted with our approach of gaussian phase averaging. This proves the validity of the results, which cannot be an artefact due to the low frequency phase fluctuations in the first sample. In both samples, the dilution of the impinging current has an unexpected effect: it decreases the variance of the gaussian distribution.

FIG. 1: SEM view of the electronic Mach-Zehnder with a schematic representation of the edge state. G0, G1, G2 are quantum point contacts which mimic beam splitters. The pairs of split gates defining a QPC are electrically connected via a Au metallic bridge deposited on an insulator (SU8). G0 allows a dilution of the impinging current; G1 and G2 are the two beam splitters of the Mach-Zehnder interferometer. SG is a side gate which allows a variation of the length of the lower path (b).

http://arxiv.org/abs/0704.0746v3

The MZI geometry is patterned using e-beam lithography on a high mobility two dimensional electron gas in a GaAs/Ga$_{1-x}$Al$_x$As heterojunction with a sheet density $n_S = 2.0\times10^{11}$ cm$^{-2}$ and a mobility of $2.5\times10^{6}$ cm$^2$/Vs. The experiment was performed in the IQHE regime at filling factor $\nu = n_S h/eB = 2$ (magnetic field $B = 5.2$ Tesla). Transport occurs through two edge states with an extremely large energy redistribution length [4]. Quantum point contacts (QPC) controlled by gates G0, G1 and G2 define electronic beam splitters with transmissions $T_0$, $T_1$ and $T_2$ respectively. In all the results presented here, the interferences were studied on the outer edge state, schematically drawn as black lines in Fig. (1), the inner edge state being fully reflected by all the QPCs. The interferometer consists of G1, G2 and the small central ohmic contact in between the two arms.
G1 splits the incident beam into two trajectories (a) and (b), which are recombined with G2, leading to interferences. The two arms defined by the mesa are 8 µm long and enclose a 14 µm$^2$ area. The current which is not transmitted through the MZI, $I_B = I_D - I_T$, is collected to the ground with the small ohmic contact. An additional gate SG allows a change of the length of the trajectory (b). The impinging current $I_0$ can be diluted thanks to the beam splitter G0, whose transmission $T_0$ determines the diluted current $dI_D = T_0 \times dI_0$. We measure the differential transmission through the MZI by standard lock-in techniques using a 619 Hz, 5 µV$_{rms}$ AC bias $V_{AC}$ superimposed on the DC voltage $V$. This AC bias modulates the incoming current, $dI_D = T_0 \times e^2/h \times V_{AC}$, and thus the transmitted current in an energy range close to $eV$, giving the transmission $T(eV) = dI_T/dI_0$.

Using the single particle approach of the Landauer-Büttiker formalism, the transmission amplitude $t$ through the MZI is the sum of the two complex transmission amplitudes corresponding to paths (a) and (b) of the interferometer: $t = t_0\{t_1 \exp(i\phi_a) t_2 - r_1 \exp(i\phi_b) r_2\}$. This leads to a transmission probability $T(\epsilon) = T_0\{T_1T_2 + R_1R_2 + \sqrt{T_1R_2R_1T_2}\,\sin[\varphi(\epsilon)]\}$, where $\varphi(\epsilon) = \phi_a - \phi_b$ and $T_i = |t_i|^2 = 1 - R_i$. The phase $\varphi(\epsilon)$ corresponds to the total Aharonov-Bohm (AB) flux across the surface $S(\epsilon)$ defined by the arms of the MZI, $\varphi(\epsilon) = 2\pi S(\epsilon) \times eB/h$. The surface $S$ depends on the energy $\epsilon$ when there is a finite length difference $\Delta L = L_a - L_b$ between the two arms. This leads to a variation of the phase with the energy, $\varphi(\epsilon + E_F) = \varphi(E_F) + \epsilon\Delta L/(\hbar v_D)$, where $v_D$ is the drift velocity. When varying the AB flux, the interferences manifest themselves as oscillations of the transmission; in practice this is done either by varying the magnetic field or by varying the surface of the MZI with a side gate [1, 5, 6].
The visibility of the interferences, defined as $\mathcal{V} = (T_{MAX} - T_{MIN})/(T_{MAX} + T_{MIN})$, is maximum when both beam splitter transmissions are set to 1/2. In the present experiment the MZI is designed with equal arm lengths ($\Delta L = 0$) and the visibility is not expected to be sensitive to the coherence length of the source $\hbar v_D/\max(k_BT, eV_{AC})$. Thus the visibility provides a direct measurement of the decoherence and/or phase averaging in this quantum circuit.

FIG. 2: Sample #1. a) Transmission $T = dI_T/dI_0$ as a function of the gate voltages $V_1$ and $V_2$ applied on G1 and G2. (◦) $T = T_1$ versus $V_1$. (•) $T = T_2$ versus $V_2$. The solid line is the transmission $T$ obtained with $T_1$ fixed to 1/2 while sweeping $V_2$: transmission fluctuations due to interferences with low frequency phase noise appear. b) Stack histogram of 6000 successive transmission measurements as a function of the normalized deviation from the mean value. The solid line is the distribution of transmission expected for a uniform distribution of phases. c) Visibility of interferences as a function of the transmission $T_2$ when $T_1 = 1/2$. The solid line is the $\sqrt{T_2(1-T_2)}$ dependence predicted by the theory.

In Ref. [1], 60% visibility was observed at low temperature, showing that the quantum coherence length can be at least as large as several micrometers at 20 mK (and probably larger if phase averaging is the limiting factor). At finite energy (compared to the Fermi energy), the visibility was also found to decrease with the bias voltage [1, 5, 6]. This effect is not due to an increase of the coherence length of the electron source, which remains determined by $eV_{AC}$ or $k_BT$ [7]. In a first experiment, a monotonic visibility decrease was found, which was attributed to phase averaging, as confirmed by shot noise measurements [1].
Nevertheless, it remains unclear why and how the phase averaging increases with the bias. In a recent paper, instead of a monotonic decrease of the visibility, a lobe structure was observed for filling factor less than 1 in the QPCs [5]. No non-interacting electron model was found to be able to explain this observation, and although interaction effects have been proposed [8], a satisfactory explanation has not yet been found to account for all the experimental observations. So far, two experiments have shown two different behaviors, raising questions about the universality of these observations. Here, we report experiments where different samples give consistent results, with a fit to the data clearly demonstrating that our MZI suffers from a gaussian phase averaging whose variance is proportional to $V^2$, leading to the single side lobe structure of the visibility.

We have used the following procedure to tune the MZI. We first measure independently the two beam splitters' transparencies versus their respective gate voltages, the inner edge state being fully reflected. This is shown in figure (2a), where the transmission ($T_1$ or $T_2$) through one QPC is varied while keeping unit transparency for the other QPC. This provides the characterization of the transparency of each beam splitter as a function of its gate voltage. The fact that the transmission vanishes for large negative voltages means that the small ohmic contact in between the two arms can absorb all incoming electrons; otherwise the transmission would tend to a finite value. This is very important in order to avoid any spurious effect in the interference pattern. In a second step we fix the transmission $T_1$ to 1/2 while sweeping the gate voltage of G2 (solid line of figure (2a)). Whereas for a fully incoherent system the transmission $T$ should be $1/2 \times (R_2 + T_2) = 1/2$, we observe large temporal transmission fluctuations around 1/2.
We show in the following that they result from the interferences expected in the coherent regime, but in the presence of large low frequency phase noise. This is revealed by the probability distribution of the transmissions obtained when making a large number of transmission measurements for the same gate voltage. Figure (2b) shows a histogram of $T$ when making 6000 measurements (each measurement being separated from the next by 10 ms). The histogram of the transmission fluctuations $\delta T = T - T_{mean}$ displays two maxima very well fitted using a probability distribution $p(\delta T/T_{mean}) = 1/\left(2\pi\sqrt{1 - (\delta T/T_{mean})^2/\mathcal{V}^2}\right)$ (the solid line of figure (2b)). This distribution is obtained assuming interferences $\delta T = T_{mean} \times \mathcal{V}\sin(\varphi)$ and a uniform probability distribution of $\varphi$ over $[-\pi, +\pi]$. Note that the peaks around $|\delta T/T_{mean}| = \mathcal{V}$ have a finite width. They correspond to the gaussian distribution associated with the detection noise, which has to be convoluted with the previous distribution.

Although no regular oscillations of transmission can be observed due to phase noise, we can directly extract the visibility of the interferences by calculating the variance of the fluctuations (the approach is similar to measurements of Universal Conductance Fluctuations via the amplitude of 1/f noise in diffusive metallic wires) [10]. As expected when $T_1 = 1/2$, the visibility extracted by our method is proportional to $\sqrt{T_2(1 - T_2)}$, definitively showing that the fluctuations result from interference: we are able to measure the visibility of fluctuating interferences (see figure (2c)).

The visibility depends on the bias voltage with a lobe structure shown in figure (3), confirming the pioneering observation [5]. Nevertheless, there are marked differences. The visibility shape is not the same as that in Ref. [5]. We have always seen only one side lobe, although the sensitivity of our measurements would be high enough to observe a second one if it existed.
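The variance-based extraction of the visibility described above can be illustrated numerically. The following is a minimal sketch, not the authors' analysis code: it assumes $\delta T = T_{mean}\,\mathcal{V}\sin\varphi$ with $\varphi$ uniform on $[-\pi, +\pi]$ plus independent gaussian detection noise, so that $\langle\delta T^2\rangle = T_{mean}^2\mathcal{V}^2/2 + \langle\delta T^2\rangle_0$. The function names, the noise level and the prefactor $\sqrt{2}$ (which follows from this particular normalization) are assumptions made for illustration.

```python
import math
import random


def simulate_transmissions(t_mean, visibility, noise_std, n, rng):
    """Draw n transmissions T = t_mean * (1 + visibility * sin(phi)) + noise.

    phi is uniform on [-pi, pi] (fully scrambled phase) and the noise is
    gaussian, mimicking the detection noise mentioned in the text.
    """
    samples = []
    for _ in range(n):
        phi = rng.uniform(-math.pi, math.pi)
        noise = rng.gauss(0.0, noise_std)
        samples.append(t_mean * (1.0 + visibility * math.sin(phi)) + noise)
    return samples


def visibility_from_variance(samples, noise_var=0.0):
    """Estimate the visibility from the variance of the fluctuations.

    Under the assumptions above, var(T) = t_mean^2 * V^2 / 2 + noise_var,
    so V = sqrt(2 * (var(T) - noise_var)) / t_mean.
    """
    t_mean = sum(samples) / len(samples)
    var = sum((t - t_mean) ** 2 for t in samples) / len(samples)
    excess = max(var - noise_var, 0.0)  # guard against a negative estimate
    return math.sqrt(2.0 * excess) / t_mean


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = simulate_transmissions(t_mean=0.5, visibility=0.3,
                              noise_std=0.01, n=20000, rng=rng)
estimate = visibility_from_variance(data, noise_var=0.01 ** 2)
print(f"estimated visibility: {estimate:.3f}")
```

Although no individual oscillation is resolved in such a data set, the estimate recovers the visibility used to generate it to within the statistical error, which is the point of the method.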
Moreover, the lobe width (see figure (3)) can be increased by diluting the impinging current with G0, whereas no such effect is seen for G1 and G2. This apparent increase of the energy scale cannot be attributed to the addition of a resistance in series with the MZI because G0 is close to the MZI, at a distance shorter than the coherence length.

FIG. 3: (Color online) Sample #1: Visibility of the interferences as a function of the drain-source voltage $I_0 h/e^2$ for three different values of $T_0$. The curves are shifted for clarity. The energy width of the lobe structure is modified by the dilution whereas the maximum visibility at zero bias is not modified. Solid lines are fits using equation (1). From top to bottom, $T_0 = 0.02$ and $V_0 = 31$ µV, $T_0 = 0.14$ and $V_0 = 22$ µV, $T_0 = 1$ and $V_0 = 11.4$ µV.

An almost perfect fit for the whole range of $T_0$ (dilution) is

$$\mathcal{V} = \mathcal{V}_0\, e^{-V^2/2V_0^2}\left|1 - \frac{V I_D}{V_0^2\, dI_D/dV}\right|, \qquad (1)$$

where $V_0$ is a fitting parameter. Equation (1) is obtained when assuming a gaussian phase averaging with a variance $\langle\delta\varphi^2\rangle$ proportional to $V^2$ and a length difference $\Delta L$ small enough to neglect the energy dependence of the phase in the observed energy range, $eV \ll \hbar v_D/\Delta L$. In such a case, the interfering part of the current $I_\sim$ is proportional to $I_D\sin(\varphi)$. The gaussian distribution of the phase leads to $I_\sim \propto I_D\sin(\langle\varphi\rangle)\,e^{-\langle\delta\varphi^2\rangle/2}$, where $\langle\varphi\rangle$ is the mean value of the phase distribution. The measured interfering part of the transmission, $T_\sim = (h/e^2)\,dI_\sim/dV$, gives a visibility corresponding to formula (1) when $\langle\delta\varphi^2\rangle = V^2/V_0^2$. Such behavior gives a null visibility accompanied by a $\pi$ shift of the phase when $V I_D/(V_0^2\, dI_D/dV) = 1$. When $T_0 \sim 1$, $I_D$ is proportional to $V$ and the width of the central lobe is simply equal to $2V_0$. However, in the most general case, $dI_D/dV$ varies with $V$.
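To make the shape of the fit function concrete, here is a small sketch that evaluates equation (1), read as $\mathcal{V} = \mathcal{V}_0\,e^{-V^2/2V_0^2}\,|1 - V I_D/(V_0^2\,dI_D/dV)|$. It is only an illustration: the function names are hypothetical, the zero-bias visibility of 0.6 is an arbitrary choice, and the linear current-voltage relation corresponds to the undiluted case $T_0 \sim 1$ mentioned in the text.

```python
import math


def visibility_eq1(v, v0, vis0, current, dcurrent_dv):
    """Evaluate equation (1) at bias v.

    v0          -- fitting parameter setting the lobe width (same unit as v)
    vis0        -- visibility at zero bias
    current     -- callable returning I_D(v)
    dcurrent_dv -- callable returning dI_D/dV at v
    """
    envelope = math.exp(-v * v / (2.0 * v0 * v0))
    ratio = v * current(v) / (v0 * v0 * dcurrent_dv(v))
    return vis0 * envelope * abs(1.0 - ratio)


def visibility_linear(v, v0, vis0):
    """Special case I_D proportional to V (undiluted beam, T0 close to 1)."""
    g = 1.0  # arbitrary conductance; it cancels in the ratio
    return visibility_eq1(v, v0, vis0, lambda x: g * x, lambda x: g)


V0 = 11.4  # microvolts, the value quoted for T0 = 1 in the caption of Fig. 3
for bias in (0.0, V0, math.sqrt(3.0) * V0):
    print(f"V = {bias:6.2f} uV  ->  visibility = {visibility_linear(bias, V0, 0.6):.4f}")
```

In the linear case the expression reduces to $\mathcal{V}_0\,e^{-x^2/2}|1 - x^2|$ with $x = V/V_0$: the visibility vanishes at $V = \pm V_0$, so the central lobe has width $2V_0$, and the single side lobe peaks at $V = \sqrt{3}\,V_0$ with height $2e^{-3/2}\,\mathcal{V}_0$.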
One can see in figure (3) that the fit with equation (1) is very good, definitively showing that the existence of one side lobe, as observed in the experiment of Ref. [5] at $\nu = 2$ (for the highest fields) and at $\nu = 1$, can be explained within our simple approach. Concerning multiple side lobes, we cannot yet conclude whether they arise from long range interaction as recently proposed in Ref. [8]. Our geometry is different from the one used in the earlier experiment [5], and the coupling between counter-propagating edge states, thought to be responsible for multiple side lobes [8], should be less efficient here.

FIG. 4: (Color online) Sample #2: a) Gray plot of the transmission $T$ as a function of the bias voltage $V$ and the side gate voltage $V_{SG}$. Note the $\pi$ shift of the phase when the visibility reaches 0. b) & c) $T$ as a function of the side gate voltage for two different values of the drain-source voltage corresponding to the dashed lines of a) (0 and 16 µV respectively). d) Lobe structure of the visibility fitted using equation (1) for a diluted ($T_0 = 0.06$) and an undiluted impinging current.

To check whether low frequency fluctuations have an impact on the finite bias phase averaging, we have studied another sample, with the same geometry and fabricated simultaneously (sample #2), which exhibits a clear interference pattern (see figure (4a,b,c)). As can be seen in figure (4d), the lobe structure is well fitted with our theory, definitively showing that the gaussian phase averaging is not associated with low frequency phase fluctuations. It is noteworthy that $V_0$ increases (see figure (5)) with the dilution, namely when the transmission $T_0$ at zero bias decreases. An impact of the dilution was already observed, as it suppressed multiple side lobes [9] (arXiv version of Ref. [5]), but the conclusion was that the width of the central lobe was barely affected.
Here, dilution plays a clear role whose $T_0$ dependence is the same for the two studied samples, once normalized to the undiluted case. This dilution effect is nevertheless not easy to explain. For example, mechanisms like screening, intra-edge scattering and fluctuations mediated by shot noise should have maximum effect at half transmission, in contradiction with figure (5). More generally, it is difficult to determine whether the process responsible for the phase averaging introduced in our model is located at the beam splitters or is uniformly distributed along the interfering channels. However, setting $T_1 = 0.02$ or 0.05, keeping $T_2 = 0.5$, leaves the lobe width unaffected. This shows that, if located at the Quantum Point Contacts, the phase averaging process is independent of transmission.

FIG. 5: (Color online) $V_0$ obtained by fitting the visibility with equation (1), normalized to $V_0$ at $T_0 = 1$, as a function of $T_0$ at zero bias. Legend: Sample #1: $V_0(1) = 13.7$ µV; Sample #2: $V_0(1) = 10.6$ µV.

To summarize, we propose a statistical method to measure the visibility of "invisible" interferences. We observe a single side lobe structure of the visibility on stable and unstable samples, which is shown to result from a gaussian phase averaging whose variance is proportional to $V^2$. Moreover, this variance is shown to be reduced by diluting the impinging current. However, the mechanism responsible for this type of phase averaging remains unexplained.

The authors would like to thank M. Büttiker for fruitful discussions. This work was supported by the French National Research Agency (grant n◦ 2A4002).

∗ Also at LPA, Ecole Normale Supérieure, Paris.
† Electronic address: patrice.roche@cea.fr

[1] Y. Ji, Y. Chung, D. Sprinzak, M. Heiblum, D. Mahalu, and H. Shtrikman, Nature 422, 415 (2003).
[2] P. Samuelsson, E. V. Sukhorukov, and M. Büttiker, Phys. Rev. Lett. 92, 026805 (2004).
[3] I. Neder, N. Ofek, Y. Chung, M. Heiblum, D. Mahalu, and V.
Umansky, arXiv:0705.0173 (2007).
[4] T. Machida, H. Hirai, S. Komiyama, T. Osada, and Y. Shiraki, Solid State Commun. 103, 441 (1997).
[5] I. Neder, M. Heiblum, Y. Levinson, D. Mahalu, and V. Umansky, Phys. Rev. Lett. 96, 016804 (2006).
[6] L. V. Litvin, H.-P. Tranitz, W. Wegscheider, and C. Strunk, Phys. Rev. B 75, 033315 (2007).
[7] V. S.-W. Chung, P. Samuelsson, and M. Büttiker, Phys. Rev. B 72, 125320 (2005).
[8] E. V. Sukhorukov and V. V. Cheianov, cond-mat/0609288.
[9] I. Neder, M. Heiblum, Y. Levinson, D. Mahalu, and V. Umansky, cond-mat/0508024 (2005).
[10] All the results on the visibility reported here on sample #1 have been obtained using the following procedure: we measured the transmission N = 2000 times and calculated the mean value $T_{mean}$ and the variance $\langle\delta T^2\rangle$. It is straightforward to show that the visibility is $\mathcal{V} = \sqrt{\langle\delta T^2\rangle - \langle\delta T^2\rangle_0}/T_{mean}$, where $\langle\delta T^2\rangle_0$ is the measurement noise, which depends on the AC bias amplitude, the noise of the amplifiers and the time constant of the lock-in amplifiers (fixed to 10 ms), measured in the absence of the quantum interferences.

ABSTRACT We present an original statistical method to measure the visibility of interferences in an electronic Mach-Zehnder interferometer in the presence of low frequency fluctuations. The visibility presents a single side lobe structure shown to result from a gaussian phase averaging whose variance is quadratic with the bias. To reinforce our approach and validate our statistical method, the same experiment is also realized with a stable sample. It exhibits the same visibility behavior as the fluctuating one, indicating the intrinsic character of finite bias phase averaging.
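In both samples, the dilution of the impinging current reduces the variance of the gaussian distribution. <|endoftext|><|startoftext|> arXiv:0704.0747v1 [math.DG] 5 Apr 2007 Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Mat. 7 (1996), 105–109. A NOTE ON HIGHER-ORDER DIFFERENTIAL OPERATIONS Branko J. Malešević

In this paper we consider successive iterations of the first-order differential operations in space $\mathbf{R}^3$.

1. INTRODUCTION

Let $C^\infty(\mathbf{R}^3)$ be the set of scalar functions $f = f(x_1, x_2, x_3) : \mathbf{R}^3 \to \mathbf{R}$ which have continuous partial derivatives of arbitrary order with respect to the coordinates $x_i$ ($i = 1, 2, 3$). Let $\vec{C}^\infty(\mathbf{R}^3)$ be the set of vector functions $\vec{f} = \big(f_1(x_1, x_2, x_3), f_2(x_1, x_2, x_3), f_3(x_1, x_2, x_3)\big) : \mathbf{R}^3 \to \mathbf{R}^3$ whose coordinate functions have continuous partial derivatives of arbitrary order with respect to the coordinates $x_i$ ($i = 1, 2, 3$). First-order differential operations of the vector analysis of the space $\mathbf{R}^3$ are defined on the following sets of functions:

$$\mathrm{F} = \{ f : \mathbf{R}^3 \to \mathbf{R} \mid f \in C^\infty(\mathbf{R}^3) \} \quad\text{and}\quad \vec{\mathrm{F}} = \{ \vec{f} : \mathbf{R}^3 \to \mathbf{R}^3 \mid \vec{f} \in \vec{C}^\infty(\mathbf{R}^3) \}.$$

They are the following three linear operations [1], denoted here by $\nabla_1$, $\nabla_2$ and $\nabla_3$ for convenience:

(1) $\operatorname{grad} f = \nabla_1 f = \dfrac{\partial f}{\partial x_1}\,\vec{e}_1 + \dfrac{\partial f}{\partial x_2}\,\vec{e}_2 + \dfrac{\partial f}{\partial x_3}\,\vec{e}_3 : \mathrm{F} \to \vec{\mathrm{F}}$,

(2) $\operatorname{curl} \vec{f} = \nabla_2 \vec{f} = \Big(\dfrac{\partial f_3}{\partial x_2} - \dfrac{\partial f_2}{\partial x_3}\Big)\vec{e}_1 + \Big(\dfrac{\partial f_1}{\partial x_3} - \dfrac{\partial f_3}{\partial x_1}\Big)\vec{e}_2 + \Big(\dfrac{\partial f_2}{\partial x_1} - \dfrac{\partial f_1}{\partial x_2}\Big)\vec{e}_3 : \vec{\mathrm{F}} \to \vec{\mathrm{F}}$,

(3) $\operatorname{div} \vec{f} = \nabla_3 \vec{f} = \dfrac{\partial f_1}{\partial x_1} + \dfrac{\partial f_2}{\partial x_2} + \dfrac{\partial f_3}{\partial x_3} : \vec{\mathrm{F}} \to \mathrm{F}$.

Let $\Omega = \{\nabla_1, \nabla_2, \nabla_3\}$ be the set of the operations defined above and let $\Sigma = \mathrm{F} \cup \vec{\mathrm{F}}$. Then the first-order differential operations can be considered as partial operations $\Sigma \to \Sigma$, i.e. as operations whose domain (and codomain) are the subsets $\mathrm{F}$ or $\vec{\mathrm{F}}$ of $\Sigma$.

1991 Mathematics Subject Classification: 26B12

Second and higher-order differential operations are then defined as products of operations in $\Omega$ in the sense of composition of operations. Some of these products are meaningful, like $\nabla_3 \circ \nabla_1$, while others are meaningless, like $\nabla_1 \circ \nabla_1$.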
To all meaningless products, for any argument, we associate the value of the nowhere defined function $\vartheta$ ($\mathrm{Dom}(\vartheta) = \emptyset$ and $\mathrm{Ran}(\vartheta) = \emptyset$). The nowhere defined function $\vartheta$ ($f_\emptyset$) is a concept from recursive function theory [2]. We do not consider the function $\vartheta$ as a starting argument for calculating the value of the higher-order differential operations. In this way we extend the set $\Sigma$ to the set $\Sigma = \mathrm{F} \cup \vec{\mathrm{F}} \cup \{\vartheta\}$. All meaningful second-order differential operations are:

(4) $\Delta f = \operatorname{div}\operatorname{grad} f = (\nabla_3 \circ \nabla_1)(f)$,
(5) $\operatorname{curl}\operatorname{curl} \vec{f} = (\nabla_2 \circ \nabla_2)(\vec{f})$,
(6) $\operatorname{grad}\operatorname{div} \vec{f} = (\nabla_1 \circ \nabla_3)(\vec{f})$,
(7) $\operatorname{div}\operatorname{curl} \vec{f} = (\nabla_3 \circ \nabla_2)(\vec{f}) = 0$,
(8) $\operatorname{curl}\operatorname{grad} f = (\nabla_2 \circ \nabla_1)(f) = \vec{0}$,

for $f, \vec{f} \in \Sigma \setminus \{\vartheta\}$. In this paper we consider higher-order differential operations, search for the meaningful ones and present some applications.

2. HIGHER-ORDER DIFFERENTIAL OPERATIONS

Theorem 1. For arbitrary operations $\nabla_i, \nabla_j, \nabla_k \in \Omega$ ($i, j, k \in \{1, 2, 3\}$) and argument $\xi \in \Sigma \setminus \{\vartheta\}$ the associative law holds:

(9) $\nabla_i \circ (\nabla_j \circ \nabla_k)(\xi) = (\nabla_i \circ \nabla_j) \circ \nabla_k(\xi)$.

Proof. Choosing $\nabla_i, \nabla_j, \nabla_k$ from $\Omega$ and the argument $\xi$ from $\Sigma \setminus \{\vartheta\}$, (9) appears in 54 possible cases. It is directly verified that whenever the left side of the equality is meaningless, the right side is also meaningless. Then, all meaningless products have the same value, the nowhere defined function $\vartheta$, so that (9) is true in the form $\vartheta = \vartheta$. Also, whenever the left side of the equality is meaningful, the right side is also meaningful. Then, by the associative law for the composition of functions, we conclude that (9) is true.

From Theorem 1 it follows (by induction) that the generalized associative law also holds, so we may write the product $\nabla_{i_1} \circ \nabla_{i_2} \circ \cdots \circ \nabla_{i_n}$ without brackets ($i_j \in \{1, 2, 3\}$, $j = 1, 2, \ldots, n$). Higher-order differential operations given as meaningful products are called trivial products if they are trivially annulled, i.e. if they are identically equal to the annulling functions $0$, $\vec{0}$ from $\Sigma$.
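Otherwise, we refer to the higher-order differential operations given as meaningful products as nontrivial products (if they are nontrivially annulled). Next, we prove the following statement.

Theorem 2. Higher-order differential operations appear as nontrivial products in the following three forms:

$(\operatorname{grad})\operatorname{div} \ldots \operatorname{grad}\operatorname{div}\operatorname{grad} f = (\nabla_1\circ)\nabla_3 \circ \cdots \circ \nabla_1 \circ \nabla_3 \circ \nabla_1 f$,
$\operatorname{curl}\operatorname{curl} \ldots \operatorname{curl}\operatorname{curl}\operatorname{curl} \vec{f} = \nabla_2 \circ \nabla_2 \circ \cdots \circ \nabla_2 \circ \nabla_2 \circ \nabla_2 \vec{f}$,
$(\operatorname{div})\operatorname{grad} \ldots \operatorname{div}\operatorname{grad}\operatorname{div} \vec{f} = (\nabla_3\circ)\nabla_1 \circ \cdots \circ \nabla_3 \circ \nabla_1 \circ \nabla_3 \vec{f}$,

for arbitrary functions $f, \vec{f} \in \Sigma \setminus \{\vartheta\}$, where the terms in brackets are included for an odd number of terms and are left out otherwise. All other meaningful operations are identically zero on their domain.

Proof. Meaningful third-order differential operations appear in the form of eight compositions as follows:

(10) $\operatorname{grad}\operatorname{div}\operatorname{grad} f = \nabla_1 \circ \nabla_3 \circ \nabla_1 f$,
(11) $\operatorname{curl}\operatorname{curl}\operatorname{curl} \vec{f} = \nabla_2 \circ \nabla_2 \circ \nabla_2 \vec{f}$,
(12) $\operatorname{div}\operatorname{grad}\operatorname{div} \vec{f} = \nabla_3 \circ \nabla_1 \circ \nabla_3 \vec{f}$,
(13) $\operatorname{div}\operatorname{curl}\operatorname{curl} \vec{f} = \nabla_3 \circ \nabla_2 \circ \nabla_2 \vec{f} = 0$,
(14) $\operatorname{div}\operatorname{curl}\operatorname{grad} f = \nabla_3 \circ \nabla_2 \circ \nabla_1 f = 0$,
(15) $\operatorname{curl}\operatorname{curl}\operatorname{grad} f = \nabla_2 \circ \nabla_2 \circ \nabla_1 f = \vec{0}$,
(16) $\operatorname{curl}\operatorname{grad}\operatorname{div} \vec{f} = \nabla_2 \circ \nabla_1 \circ \nabla_3 \vec{f} = \vec{0}$,
(17) $\operatorname{grad}\operatorname{div}\operatorname{curl} \vec{f} = \nabla_1 \circ \nabla_3 \circ \nabla_2 \vec{f} = \vec{0}$,

for $f, \vec{f} \in \Sigma \setminus \{\vartheta\}$. The vanishing of the operations (13)–(17) follows directly from the identities (7)–(8). The statement then follows by mathematical induction, using the generalized associative law and formulas (10)–(17).

For a given sequence of operations $\nabla_{i_1}, \nabla_{i_2}, \ldots, \nabla_{i_n}$ from the set $\Omega$, let us define a collection of functions as a subset $\Theta \subseteq \Sigma \setminus \{\vartheta\}$ such that all functions $\xi$ from $\Theta$ annul the nontrivial product $\nabla_{i_1} \circ \nabla_{i_2} \circ \cdots \circ \nabla_{i_n}(\xi)$. Let us form some collections. Scalar functions $f$ from $\Sigma$ such that $\Delta^n f = 0$ define the harmonic collection $H_n$ of order $n$, i.e. the polyharmonic functions.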
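The counting behind Theorems 1 and 2 can be checked by brute force. The sketch below is an illustration and not part of the original paper: it treats each operation only through its domain and codomain (scalar or vector field), calls a product meaningless when these do not chain, and calls a meaningful product trivial when it contains div curl or curl grad as adjacent factors, following identities (7) and (8). The names `SIGNATURE`, `classify` and `census` are hypothetical.

```python
from itertools import product

# (domain, codomain) of each first-order operation:
# "S" = scalar field (set F), "V" = vector field (set of vector functions).
SIGNATURE = {
    1: ("S", "V"),  # nabla_1 = grad
    2: ("V", "V"),  # nabla_2 = curl
    3: ("V", "S"),  # nabla_3 = div
}


def classify(word):
    """Classify the product nabla_{i1} o nabla_{i2} o ... o nabla_{in}.

    `word` is the tuple (i1, ..., in); the rightmost factor acts first.
    Returns "meaningless", "trivial" or "nontrivial".
    """
    applied = word[::-1]  # order in which the operations are applied
    pairs = list(zip(applied, applied[1:]))
    # Meaningful only if each codomain matches the next domain.
    for first, second in pairs:
        if SIGNATURE[first][1] != SIGNATURE[second][0]:
            return "meaningless"
    # curl after grad, or div after curl, annuls every argument.
    for first, second in pairs:
        if (first, second) in ((1, 2), (2, 3)):
            return "trivial"
    return "nontrivial"


def census(n):
    """Count the 3**n products of order n by class."""
    counts = {"meaningless": 0, "trivial": 0, "nontrivial": 0}
    for word in product((1, 2, 3), repeat=n):
        counts[classify(word)] += 1
    return counts


for order in (2, 3, 4):
    print(order, census(order))
```

For order 3 this reproduces the eight meaningful compositions (10)–(17), of which three are nontrivial; for every order the number of nontrivial products stays at three, in agreement with the three forms listed in Theorem 2.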
Let us notice that in the case of two dimensions there is a general form of the polyharmonic functions $f$ as solutions of the equation $\Delta^n f = 0$ [3]. Vector functions $\vec{f}$ from $\Sigma$ such that $\operatorname{curl}^n \vec{f} = \vec{0}$ define the curling collection $C_n$ of order $n$.

We remark that besides the total scalar operation $\Delta : \mathrm{F} \to \mathrm{F}$ (partial scalar operation $\Delta : \Sigma \to \Sigma$) we can also consider the total vector operation $\vec{\Delta} : \vec{\mathrm{F}} \to \vec{\mathrm{F}}$ (partial vector operation $\vec{\Delta} : \Sigma \to \Sigma$) defined by:

(18) $\vec{\Delta}\vec{f} = (\Delta f_1, \Delta f_2, \Delta f_3) = \Delta f_1 \cdot \vec{e}_1 + \Delta f_2 \cdot \vec{e}_2 + \Delta f_3 \cdot \vec{e}_3$.

Let $\vec{H}_n$ denote the set of vector functions $\vec{f}$ from $\Sigma$ such that $\vec{\Delta}^n(\vec{f}) = \vec{0}$, where $\vec{\Delta}^n$ is the iteration of order $n$ of the vector operation $\vec{\Delta}$ given by (18). The set of vector harmonic functions $\vec{H}_n$ of order $n$ defined in this way is not in the list of collections which appear in the previous theorem, because it is not obtained through compositions of the operations (1)–(3). For the set $\vec{H}_n$ we shall nevertheless keep the term collection.

Let us notice that for scalar polyharmonic collections, vector polyharmonic collections and curling collections, ordered by index, the following inclusions hold:

(19) $H \subset H_2 \subset \cdots \subset H_{n-1} \subset H_n \subset \cdots$,
(20) $\vec{H} \subset \vec{H}_2 \subset \cdots \subset \vec{H}_{n-1} \subset \vec{H}_n \subset \cdots$,
(21) $C \subset C_2 \subset \cdots \subset C_{n-1} \subset C_n \subset \cdots$.

Let us emphasize that all the previous considerations can be transferred to a three-dimensional orthogonal curvilinear coordinate system by introducing the corresponding assumptions on the functions from the sets $\mathrm{F}$, $\vec{\mathrm{F}}$ and on Lamé's coefficients.

Finally, let us state a few examples where scalar and vector polyharmonic collections appear.

Example 1. All meaningful products of third-and-higher-order differential operations for vector functions $\vec{f} \in \vec{H}$ and scalar functions $f \in H$ are annulled. For vector functions $\vec{f} \in \vec{H}$ the following equation holds:

(22) $\operatorname{curl}\operatorname{curl} \vec{f} = \operatorname{grad}\operatorname{div} \vec{f}$.
Hence, for $f \in H$ and $\vec{f} \in \vec{H}$, on the basis of formulas (22) and (10)–(17) the following is true:

$\operatorname{grad}\operatorname{div}\operatorname{grad} f = \operatorname{grad}(\Delta f) = \vec{0}$,
$\operatorname{curl}\operatorname{curl}\operatorname{curl} \vec{f} = \operatorname{curl}(\operatorname{grad}\operatorname{div}) \vec{f} = \vec{0}$,
$\operatorname{div}\operatorname{grad}\operatorname{div} \vec{f} = \operatorname{div}(\operatorname{curl}\operatorname{curl}) \vec{f} = 0$.

Thus, all eight meaningful products of third-order differential operations are annulled, so that the statement is true.

Example 2. If $f \in H_{n-1}$, then $x \cdot f \in H_n$, $n \geq 2$.

Let us notice that if $f \in \mathrm{F}$, then $x \cdot f \in \mathrm{F}$. For an arbitrary scalar function $f \in \mathrm{F}$ the following equation is directly verified:

$\Delta(x \cdot f) = 2\,\partial f/\partial x + x \cdot \Delta(f)$.

Its inductive generalization is the equation:

$\Delta^n(x \cdot f) = 2n \cdot \partial\big(\Delta^{n-1}(f)\big)/\partial x + x \cdot \Delta^n(f)$.

Thus, for an $(n-1)$-harmonic function $f \in H_{n-1}$ the conclusion $x \cdot f \in H_n$ is true.

Example 3. If $f \in H_{n-1}$, then $(x^2 + y^2 + z^2) \cdot f \in H_n$, $n \geq 2$.

Let us notice that if $f \in \mathrm{F}$, then $(x^2 + y^2 + z^2) \cdot f \in \mathrm{F}$. For an arbitrary scalar function $f \in \mathrm{F}$ the following equations are directly verified:

$\Delta(x^2 \cdot f) = 2 \cdot f + 4x \cdot \partial f/\partial x + x^2 \cdot \Delta(f)$,
$\Delta^2(x^2 \cdot f) = 8 \cdot \partial^2 f/\partial x^2 + 8x \cdot \partial\big(\Delta(f)\big)/\partial x + 4 \cdot \Delta(f) + x^2 \cdot \Delta^2(f)$.

Their inductive generalization is the equation:

$\Delta^n(x^2 \cdot f) = 4n(n-1) \cdot \partial^2\big(\Delta^{n-2}(f)\big)/\partial x^2 + 4nx \cdot \partial\big(\Delta^{n-1}(f)\big)/\partial x + 2n \cdot \Delta^{n-1}(f) + x^2 \cdot \Delta^n(f)$.

Adding the analogous equations for $y^2 \cdot f$ and $z^2 \cdot f$, the second-derivative terms sum to $4n(n-1) \cdot \Delta^{n-1}(f)$. Thus, if $f \in H_{n-1}$, then $(x^2 + y^2 + z^2) \cdot f \in H_n$.

The two previous examples are generalizations of the corresponding problems contained in [4].

Acknowledgement. I wish to express my gratitude to Professors M. Merkle, I. Lazarević and D. Tošić, who examined the first version of the paper and gave me their suggestions and some very useful remarks.

REFERENCES

1. M. L. Krasnov, A. I. Kiselev, G. I. Makarenko: Vector Analysis. Moscow 1981.
2. N. Cutland: Computability. Cambridge University Press, London 1980.
3. D. S. Mitrinović, J. D. Kečkić: Jednačine matematičke fizike. Beograd 1985.
4. D. S. Mitrinović, in association with P. M. Vasić: Diferencijalne jednačine, Novi zbornik problema 4. Beograd 1986.
5. M. J. Crowe: A History of Vector Analysis.
University of Notre Dame Press, London 1967.

(Received May 6, 1996)

Faculty of Electrical Engineering, University of Belgrade, P.O.B 816, 11001 Belgrade, Yugoslavia
malesevic@kiklop.etf.bg.ac.yu

ABSTRACT In this paper we consider successive iterations of the first-order differential operations in space ${\bf R}^3.$ <|endoftext|><|startoftext|> Introduction

The most important magneto-optical interactions that can occur in material media are the Faraday effect, magnetic dichroism, and magnetic birefringence (the Cotton-Mouton effect). Quantum electrodynamics predicts that, because of photon-photon interactions, even the vacuum becomes birefringent in the presence of a strong magnetic field [1-5]. Further, the interaction of an axion-like particle with two photons via the Primakoff effect will also lend optical properties to the vacuum in the presence of a strong magnetic field [6-10]. The occurrence of an apparent magnetic dichroism of the vacuum would imply the preferential disappearance of left- or right-circularly polarized photons from a light beam. To conserve mass and energy this would imply either the production of particles or photon splitting. The QED effect and the axion effect are treated in terms of an effective Lagrangian [1-7], in units where $\hbar = c = 1$ and $\alpha = e^2/4\pi \approx 1/137$:

$$L = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \frac{\alpha^2}{90 m_e^4}\left[\left(F_{\mu\nu}F^{\mu\nu}\right)^2 + \frac{7}{4}\left(F_{\mu\nu}\tilde{F}^{\mu\nu}\right)^2\right] + \frac{1}{2}\left(\partial_\mu a\,\partial^\mu a - m_a^2 a^2\right) + \frac{1}{4M}F_{\mu\nu}\tilde{F}^{\mu\nu}a, \qquad (1)$$

where the first part of the expression is the Euler-Heisenberg effective Lagrangian, which is appropriate to the QED effect, and the second part is the effective Lagrangian which is appropriate to the Primakoff effect and accounts for the axion. Here, $a$ is the axion field, $m_a$ is the axion mass, and $M$ is the inverse axion coupling constant. Raffelt and Stodolsky [7] synthesize the results of Adler [4] and solve for the equations of motion.
Analysis of the classical wave solutions of the equations of motion produces a picture of mixing between photon and axion modes in a polarized laser experiment with a static transverse magnetic field and an optical cavity to increase path length. In such an experiment, CP arguments predict that the axion will only couple to the parallel components of the beam. Thus, two main effects are predicted. The first effect is a phase difference $\Delta\phi = \phi_\parallel - \phi_\perp$ between the parallel and perpendicular components of polarized light interacting with the magnetic field. This arises from both QED and the preferential mixing of axion and photon modes. In the mixing part of this picture, a photon mode oscillates into an axion mode before turning back into a photon and gets out of phase. In both cases, this phase difference causes an apparent birefringence. The second main effect is an apparent linear dichroism, which manifests itself as a rotation, $\psi$, of the polarization and attenuation. This is caused by the fact that mirrors do not reflect axions and, hence, any axion modes that do not oscillate back to photons before hitting the mirror will appear as lost parallel photon modes. For small axion masses, the theory predicts: φ ω∆ = , ext a ∆ = and lψ = , (2) where $l$ is the length of the cavity, $N$ is the number of passes and $L = Nl$ is the total path length of the beam through the interaction region. The subscripts QED and $a$ refer to the origin of the effects. In terms of index of refraction, $\Delta\phi = kL(n_\parallel - n_\perp)$. Choosing the limit of small axion masses is justified by several experimental results and astrophysical observations [6-11] which bound the axion mass to $10^{-3}\,\mathrm{eV} > m_a > 10^{-6}\,\mathrm{eV}$ and $M > 10^{10}$ GeV. This result also takes into account Adler’s analysis of the E-H Lagrangian, which predicts the following vacuum birefringence: $n_\perp = 1 + 2\xi\sin^2\theta$, $n_\parallel = 1 + \frac{7}{2}\xi\sin^2\theta$ with $\xi$ = .
(3) Here, $\theta$ is the angle between $B_{ext}$ and $k$ [4,7].

The expected birefringence due to the QED effect, as a function of $B_{ext}$ in tesla, is $\Delta n = n_\parallel - n_\perp = 4\times10^{-23} B_{ext}^2$. The phase shift between the two orthogonal components of a light beam is $\Delta\phi = 2\pi L\,\Delta n/\lambda$. For input light linearly polarized at 45° to the field direction, this translates into an induced ellipticity of the light $\varepsilon = \pi L\,\Delta n/\lambda$, for a path length $L$. For a 1 m path and a 1 T field the induced ellipticity is expected to be $1.2\times10^{-16}$. No experiment to date has achieved this sensitivity.

The BRFT and PVLAS Experiments

Two important experiments have attempted to detect the phenomena that would result from the Primakoff effect. In the BRFT experiment [12] an upper limit of $3.5\times10^{-10}$ rad was determined for the possible rotation angle for a 2.2 km path in a 3.25 T field, equivalent to $1.5\times10^{-14}$ rad m$^{-1}$ T$^{-2}$, and an ellipticity of $1.6\times10^{-9}$ was measured on a 299 m path in a 3.25 T field. The PVLAS experiment [13] claims a rotation of $1.7\times10^{-7}$ rad for a 44 km long path in a 5 T field, equivalent to $1.55\times10^{-14}$ rad m$^{-1}$ T$^{-2}$.

The BRFT and PVLAS experiments differ in several important specific ways, although from the standpoint of applying a modulated magnetic field they are similar. BRFT uses a transverse magnetic field modulated at a frequency of 32 mHz about a background level of 3.25 T. PVLAS uses a transverse magnetic field that rotates around the light propagation axis at 1.89 rad/s. This field is equivalent to the simultaneous application of two orthogonal transverse field components oscillating at 0.3 Hz, but in quadrature. Neither the BRFT nor the PVLAS experiment operated at the photon noise limit. The BRFT experiment used a 200 mW argon ion laser and achieved a sensitivity of $4.7\times10^{-7}$ rad Hz$^{-1/2}$ mW$^{-1/2}$. The PVLAS experiment used a 100 mW 1.06 µm Nd:YAG laser and achieved a sensitivity of $10^{-6}$ rad Hz$^{-1/2}$ mW$^{-1/2}$.
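The ellipticity estimate quoted above for a 1 m path in a 1 T field can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are ours, the birefringence coefficient is the value quoted in the text, and the wavelength of 1.06 µm is an assumption (the text does not say which wavelength the estimate refers to).

```python
import math

# Coefficient of B_ext^2 in the QED birefringence, as quoted in the text (per T^2).
DN_COEFF_QED = 4e-23


def birefringence(b_ext_tesla, coeff=DN_COEFF_QED):
    """Delta n = coeff * B_ext^2, with B_ext in tesla."""
    return coeff * b_ext_tesla ** 2


def phase_shift(path_m, b_ext_tesla, wavelength_m):
    """Delta phi = 2 * pi * L * Delta n / lambda."""
    return 2.0 * math.pi * path_m * birefringence(b_ext_tesla) / wavelength_m


def ellipticity(path_m, b_ext_tesla, wavelength_m):
    """epsilon = pi * L * Delta n / lambda, i.e. half the phase shift
    (input light polarized at 45 degrees to the field)."""
    return 0.5 * phase_shift(path_m, b_ext_tesla, wavelength_m)


# Assumed wavelength: 1.06 um (the Nd:YAG line mentioned in the text).
eps = ellipticity(1.0, 1.0, 1.06e-6)
print(f"ellipticity for a 1 m path in a 1 T field: {eps:.1e}")
```

With these inputs the result rounds to the $1.2\times10^{-16}$ quoted in the text; the ellipticity scales linearly with path length and quadratically with field.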
The photon noise limit at 1.06 µm for a detector with a responsivity of 0.4 A/W (a typical value for a Si photodiode at this wavelength) is $2\times10^{-8}$ rad Hz$^{-1/2}$ mW$^{-1/2}$.

Discussion

We have for several years operated a balanced coherent homodyne polarization interferometer for the study of the Faraday and Cotton-Mouton effects in condensed matter [14], and have achieved a photon noise limited sensitivity of $2\times10^{-8}$ rad Hz$^{-1/2}$ mW$^{-1/2}$ at 632.8 nm or 1.06 µm. Because we have only a 1 kGauss modulated transverse field magnet with 0.1 m pole pieces, we could not compete with the BRFT and PVLAS experiments in overall sensitivity, since we were a factor of $2.3\times10^{4}$ m T$^2$ below BRFT and a factor $10^{9}$ m T$^2$ below PVLAS in terms of path length and field strength. However, our experience with a very sensitive system for measuring ellipticity has taught us much about the potential pitfalls of these experiments from an experimental optics standpoint. It is clear to us that the PVLAS experiment suffers from artifacts, as has already been pointed out by Melissinos [15]. That the BRFT experiment suffers from artifacts has been acknowledged by its authors, although they do not specify all the sources of these spurious signals.

A primary source of spurious signals in sensitive experiments of this kind is motion of optical components caused by a time-varying or a rotating magnetic field. The BRFT experiment acknowledges this and used a feedback system to attempt to minimize its effects. The PVLAS data show clear sideband peaks corresponding to the rotation frequency of their magnet, which should not be present for an effect proportional to $B^2$. Indeed these peaks are approximately 18 times larger than the “real” signal at twice the magnet rotation frequency. They do not explain the origin of the fundamental signal but interpret the second harmonic signal as resulting from an interaction involving a light, neutral, spin-zero particle.
In both the BRFT and PVLAS experiments optical components are either close to the magnet or mechanically coupled to the magnet and its cryostat. A primary component of the experiment that is strongly affected by the magnetic field is the evacuated tube passing through the magnet. This tube extends to the cavity end mirrors. All components in the experiment that experience any modulated field or field gradients will experience time-varying diamagnetic or paramagnetic forces. For example, any stainless steel or aluminum optical mounts will experience paramagnetic forces. There are torques acting on induced magnetic dipoles, especially in any components exposed to the field that are not absolutely symmetrically placed with respect to the field direction. A quartz sample tube in the magnet will experience the strongest forces in the regions where it leaves the magnet and experiences the largest field gradients, and will be pulled into the magnet bore. In general, these time-varying forces all result from changes in magnetic stored energy that occur as the field is modulated. This generalized force on an object is

$F = -\nabla \int \frac{1}{2}\, B \cdot H \, dV$ .

In our sensitive magneto-optical experiments we have verified that significant artifacts can result from any modulated feedback of light into the laser [16]. It has been shown that if a part of its own field is fed back into a laser by an optical component vibrating with small amplitude, then in the weak feedback regime the phase and amplitude of the output beam from the laser are synchronously modulated [17]. This effect is so efficient that, when the source laser is influenced by the feedback, the modulated light can cause interference in a sensitive measurement even for a balanced homodyne interferometer measuring an extremely small signal. We have performed a rigorous study of the feedback effect for the case of a balanced homodyne polarization interferometer.
As a result, we have been able to detect phase and/or amplitude modulation produced in a balanced homodyne polarization interferometer when light from a mirror oscillating with an amplitude of only 9nm is fed back into the laser with 120dB of attenuation. This effect is still present even if the laser is an extremely low phase noise Nd:YAG ring laser [17]. The BRFT experiment is less sensitive to this feedback effect because it uses a multipass, zig-zag Herriott type cavity [18,19] rather than a spherical Fabry-Perot cavity. It is possible for light scattered by any of the optical components in these experiments to cause feedback, even if no specific optical component is used in the normal direction, and this includes scattered light that reflects off the inside walls of the evacuated tube inside the magnet. The BRFT experiment uses a single optical isolator, which probably does not provide sufficient isolation to prevent feedback modulation effects. It appears that, according to the experimental arrangement shown in ref [11], the PVLAS experiment does not use an optical isolator after its laser. In principle, the Fabry-Perot resonator might not reflect significant incident light if the source laser is perfectly frequency locked to the resonator. In practice, however, even for a very high-Q resonator, it is impossible to avoid the feedback due to imperfectness of mirrors and locking electronics. Therefore, in the PVLAS experiment, the feedback modulation effects may cause major interference in measurements. In principle, any correlated intensity noise can be rejected in a balanced homodyne interferometer. However, because of the imperfect performance of real optical and/or electronic components, overall common mode rejection ratio of the interferometer used in our study was approximately 40dB. Synchronous feedback can cause interference in a sensitive experiment even when the signal level is very low. 
In the case of the PVLAS scheme, by including the feedback effect synchronized at twice the rotating frequency of the magnet, the representation for the light intensity transmitted through the crossed polarizers of the ellipsometer given in Eq. (2) of Ref. [11] can be rewritten as

$I = \left(I_0 + I_{N,2\nu_m}\right)\left\{\sigma^2 + \left[\alpha(t) + \eta(t) + \Gamma(t)\right]^2\right\}$ ,

where $I_0$, $\sigma^2$, $\alpha$, $\eta$, and $\Gamma$ have the same meaning as in Ref. [11] and $I_{N,2\nu_m}$ is the intensity modulation caused by the feedback. The frequency of this synchronized modulation is given by the vibration frequency of a feedback element, twice the frequency of the rotating magnet. Small misalignment between the polarization components must be included in the quasi-static, uncompensated rotation and ellipticity, $\Gamma$, which is much larger than the rotation caused by the Primakoff effect. Thus the term $2 I_{N,2\nu_m}(t)\,\eta(t)\,\Gamma$ in the above equation not only has the same Fourier frequency as $2 I_0\,\alpha\eta$ but also has the same phase relationship when the quarter-wave plate is rotated by 90°. The synchronous interference therefore cannot be distinguished from the magneto-optical effect being sought.

An important but subtle distinction between the BRFT and PVLAS experiments is that the BRFT uses a mode-matched mirror cavity while the PVLAS apparently does not. Consequently, in the PVLAS experiment, as the light beam oscillates between the two cavity mirrors its spot size and radius of curvature both oscillate, and the radius of curvature does not match the mirror curvatures. This mismatch in radius leads to local non-normal incidence on the cavity mirrors (except on axis) and causes the local P- and S-polarization components of the beam to suffer different phase shifts, which vary radially on the mirror. A calculation for a typical very high reflectance multilayer mirror shows that this phase difference can easily be $10^{-11}$ rad per reflection for an incidence angle of 1.5 mrad.
The PVLAS cavity is subject to these effects, which would be modulated if the cavity mirrors move, although the BRFT cavity is not.

A potential confounder in a search for vacuum magneto-optic effects is the Faraday effect resulting from residual axial field components and trace gas. There are residual axial field components in both the BRFT and PVLAS experiments, since the local wave-vector directions in a Gaussian beam are only nominally perpendicular to a transverse field at the beam waist, or on axis. We do not, however, believe that these were the sources of sidebands at the magnet oscillation or rotation frequency $\omega_m$. Nonetheless, an experiment in which there is no obvious modulation of the effect at frequency $\omega_m$ is desirable, since an effect proportional to $B_{ext}^2$ only shows up at frequency $2\omega_m$. In an experiment in which the entire field is modulated at frequency $\omega_m$, a Faraday effect signal, or spurious signal, at $\omega_m$ is distinguished from the desired signal at frequency $2\omega_m$, which should be further checked by verifying that the desired signal is proportional to $B_{ext}^2$. A complication can arise if the magnet modulation is not a pure harmonic at frequency $\omega_m$. Any second harmonic of the magnetic field can produce a spurious signal at $2\omega_m$, but this can be identified since it will be linear in $B_{ext}$.

Features of an Improved Experiment

It is our belief that a balanced coherent homodyne interferometer is a better instrument to use than an extinction-based ellipsometer in a search for vacuum magneto-optical effects. Such a system is almost guaranteed to achieve the photon noise limit and provides excellent common mode rejection of laser noise. We also believe that any effect observed should be demonstrated to scale with $B_{ext}^2$ [14]. It will also be desirable to use the largest magnetic field possible, but not to modulate this. An experiment similar to PVLAS can then be performed by rotating the optical train at angular frequency $\omega_m$.
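The frequency-discrimination argument made above can be illustrated numerically: a response proportional to the square of the field appears at twice the modulation frequency and scales quadratically with field amplitude, while a response linear in the field (Faraday-like or spurious) appears at the modulation frequency itself, and reaches the second harmonic only if the field drive contains a second harmonic, in which case it scales linearly. The sketch below is a toy model under our own assumptions: the signal is taken as `quad_coeff * B**2 + lin_coeff * B`, the field is `B0 * (cos(wt) + h2 * cos(2wt))`, and all names and coefficient values are illustrative rather than taken from either experiment.

```python
import cmath
import math


def harmonic_amplitude(samples, harmonic):
    """Amplitude of the given harmonic for samples covering exactly one period."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * harmonic * k / n)
              for k, s in enumerate(samples))
    return 2.0 * abs(acc) / n


def detected_signal(b0, quad_coeff, lin_coeff, h2=0.0, n=256):
    """Toy detector signal over one modulation period.

    The field is b0 * (cos(wt) + h2 * cos(2wt)); h2 is an assumed fractional
    second-harmonic impurity of the magnet modulation.
    """
    out = []
    for k in range(n):
        phase = 2.0 * math.pi * k / n
        b = b0 * (math.cos(phase) + h2 * math.cos(2.0 * phase))
        out.append(quad_coeff * b * b + lin_coeff * b)
    return out


# Pure B^2 effect with a clean drive: nothing at w_m, a 2 w_m line that
# quadruples when the field amplitude doubles.
quad_1 = harmonic_amplitude(detected_signal(1.0, 1.0, 0.0), 2)
quad_2 = harmonic_amplitude(detected_signal(2.0, 1.0, 0.0), 2)

# Linear effect with a 10% second-harmonic impurity in the drive: a 2 w_m
# line that only doubles when the field amplitude doubles.
lin_1 = harmonic_amplitude(detected_signal(1.0, 0.0, 1.0, h2=0.1), 2)
lin_2 = harmonic_amplitude(detected_signal(2.0, 0.0, 1.0, h2=0.1), 2)

print(f"B^2 effect:      2w amplitude ratio = {quad_2 / quad_1:.2f}")
print(f"linear + 2nd h.: 2w amplitude ratio = {lin_2 / lin_1:.2f}")
```

Measuring the second-harmonic amplitude at two or more field amplitudes therefore separates the two cases in this toy model, which is the scaling check advocated in the text.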
Conclusions We believe that we have identified the likely causes of artifacts in the PVLAS experiment, and therefore suggest that the case for an interaction involving an axion-like particle has not been made. Furthermore, the PVLAS experiment contradicts the findings of the BRFT experiment, and a series of astrophysical observations that restrict the range of axion particle masses that are possible. An improved experimental arrangement is needed to pursue vacuum magnetic birefringence and polarization rotation effects. With an improved system, detection of the QED- predicted magnetic birefringence [4,5] should be possible, and a more sensitive examination of the existences of any axion-like interactions. * Corresponding author Email address: davis@umd.edu [1] W. Heisenberg and H. Euler, Z. Phys. 98, 714 (1936). [2] V.F. Weisskopf, K. Dan. Vidensk.Selsk.Mat.Fys.Medd, 14, 6 (1936). [3] J. Schwinger, Phys.Rev. 82, 664 (1951). [4] S.L. Adler, Ann Phys. (N.Y.) 67,599 (1971). [5] S. L. Adler, J. Phys. A 40, F143 (2007). [6] P. Sikivie, Phys. Rev. Lett. 51, 1415 (1983). [7] G. Raffelt and L. Stodolsky, Phys. Rev. D 37, 1237 (1988). [8] S.J. Asztalos et al. Ann.Rev.Nucl.Part.Sci. 56, 293 (2006). [9] P. Sikivie, arXiv:hep-ph/0701198v1 (2007). [10] G. Raffelt, arXiv:hep-ph/0611350 (2006). [11] J. Jaeckel et al, Phys.Rev.D 75, 013004 (2007) [12] R. Cameron et al. Phys Rev. D 47, 3703 (1993) [13] E. Zavattini et al. Phys. Rev. Lett. 96, 110406 (2006) [14] K. Cho, S.P. Bush, D.L. Mazzoni, and C.C. Davis, Phys. Rev. B 43, 965 (1991). [15] A.C. Melissinos, arXiv:hep-ph/0702135v1 13 Feb 2007. [16] K. Cho, Ph.D Thesis, University of Maryland, 1991. [17] M. Sargent III, M.O. Scully and W.E. Lamb, Laser Physics, Addison-Wesley, Reading. Mass, 1974. [18] D. R. Herriott, H. Kogelnik, and R. Kompfner, Appl. Opt. 3, 523 (1964) [19] D. R. Herriott and H.J. Schulte, Appl. Opt. Aug. 
4, 883 (1965)

ABSTRACT We discuss the experimental techniques used to date for measuring the changes in polarization state of a laser produced by a strong transverse magnetic field acting in a vacuum. We point out the likely artifacts that can arise in such experiments, with particular reference to the recent PVLAS observations and the previous findings of the BFRT collaboration. Our observations are based on studies with a photon-noise limited coherent homodyne interferometer with a polarization sensitivity of 2x10^-8 rad Hz^(1/2) mW^(-1/2). <|endoftext|><|startoftext|> Introduction

The discovery of binary pulsars in 1974 [1] opened up a new testing ground for relativistic gravity. Before this discovery, the only available testing ground for relativistic gravity was the solar system. As Einstein’s theory of General Relativity (GR) is one of the basic pillars of modern science, it deserves to be tested, with the highest possible accuracy, in all its aspects. In the solar system, the gravitational field is slowly varying and represents only a very small deformation of a flat spacetime. As a consequence, solar system tests can only probe the quasi-stationary (non-radiative) weak-field limit of relativistic gravity. By contrast, binary systems containing compact objects (neutron stars or black holes) involve spacetime domains (inside and near the compact objects) where the gravitational field is strong. Indeed, the surface relativistic gravitational field $h_{00} \simeq 2GM/c^2R$ of a neutron star is of order 0.4, which is close to the one of a black hole ($2GM/c^2R = 1$) and much larger than the surface gravitational fields of solar system bodies: $(2GM/c^2R)_{\rm Sun} \sim 10^{-6}$, $(2GM/c^2R)_{\rm Earth} \sim 10^{-9}$.
In addition, the high stability of “pulsar clocks” has made it possible to monitor the dynamics of its orbital motion down to a precision allowing one to measure the small ($\sim (v/c)^5$) orbital effects linked to the propagation of the gravitational field at the velocity of light between the pulsar and its companion.

The recent discovery of the remarkable double binary pulsar PSR J0737−3039 [2, 3] (see also the contributions of M. Kramer and A. Possenti to these proceedings) has renewed the interest in the use of binary pulsars as test-beds of gravity theories. The aim of these notes is to provide an introduction to the theoretical frameworks needed for interpreting binary pulsar data as tests of GR and alternative gravity theories.

∗ Based on lectures given at the SIGRAV School “A Century from Einstein Relativity: Probing Gravity Theories in Binary Systems”, Villa Olmo (Como Lake, Italy), 17-21 May 2005. To appear in the Proceedings, edited by M. Colpi et al. (to be published by Springer). http://arxiv.org/abs/0704.0749v1

2 Motion of binary pulsars in general relativity

The traditional (textbook) approach to the problem of motion of $N$ separate bodies in GR consists of solving, by successive approximations, Einstein’s field equations (we use the signature −+++)

$R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu}$ , (1)

together with their consequence

$\nabla_\nu T^{\mu\nu} = 0$ . (2)

To do so, one assumes some specific matter model, say a perfect fluid,

$T^{\mu\nu} = (\varepsilon + p)\, u^\mu u^\nu + p\, g^{\mu\nu}$ . (3)

One expands (say in powers of Newton’s constant)

$g_{\mu\nu}(x^\lambda) = \eta_{\mu\nu} + h^{(1)}_{\mu\nu} + h^{(2)}_{\mu\nu} + \ldots$ , (4)

together with the use of the simplifications brought by the ‘Post-Newtonian’ approximation ($\partial_0 h_{\mu\nu} = c^{-1}\partial_t h_{\mu\nu} \ll \partial_i h_{\mu\nu}$; $v/c \ll 1$, $p \ll \varepsilon$). Then one integrates the local material equation of motion (2) over the volume of each separate body, labelled say by $a = 1, 2, \ldots, N$.
In so doing, one must define some ‘center of mass’ $z^i_a$ of body $a$, as well as some (approximately conserved) ‘mass’ $m_a$ of body $a$, together with some corresponding ‘spin vector’ $S^i_a$ and, possibly, higher multipole moments.

An important feature of this traditional method is to use a unique coordinate chart $x^\mu$ to describe the full $N$-body system. For instance, the center of mass, shape and spin of each body $a$ are all described within this common coordinate system $x^\mu$. This use of a single chart has several inconvenient aspects, even in the case of weakly self-gravitating bodies (as in the solar system case). Indeed, it means for instance that a body which is, say, spherically symmetric in its own ‘rest frame’ $X^\alpha$ will appear as deformed into some kind of ellipsoid in the common coordinate chart $x^\mu$. Moreover, it is not clear how to construct ‘good definitions’ of the center of mass, spin vector, and higher multipole moments of body $a$, when described in the common coordinate chart $x^\mu$. In addition, as we are interested in the motion of strongly self-gravitating bodies, it is not a priori justified to use a simple expansion of the type (4) because $h^{(1)}_{\mu\nu} \sim G m_a/(c^2 |x - z_a|)$ will not be uniformly small in the common coordinate system $x^\mu$. It will be small if one stays far away from each object $a$, but, as recalled above, it will become of order unity on the surface of a compact body.

These two shortcomings of the traditional ‘one-chart’ approach to the relativistic problem of motion can be cured by using a ‘multi-chart’ approach. The multi-chart approach describes the motion of $N$ (possibly, but not necessarily, compact) bodies by using $N+1$ separate coordinate systems: (i) one global coordinate chart $x^\mu$ ($\mu = 0, 1, 2, 3$) used to describe the spacetime outside $N$ ‘tubes’, each containing one body, and (ii) $N$ local coordinate charts $X^\alpha_a$ ($\alpha = 0, 1, 2, 3$; $a = 1, 2, \ldots, N$) used to describe the spacetime in and around each body $a$.
The multi-chart approach was first used to discuss the motion of black holes and other compact objects [4, 5, 6, 7, 8, 9, 10, 11]. Then it was also found to be very convenient for describing, with the high accuracy required for dealing with modern technologies such as VLBI, systems of $N$ weakly self-gravitating bodies, such as the solar system [12, 13].

The essential idea of the multi-chart approach is to combine the information contained in several expansions. One uses both a global expansion of the type (4) and several local expansions of the type

$G^a_{\alpha\beta}(X^\gamma_a) = G^{(0)}_{\alpha\beta}(X^\gamma_a; m_a) + H^{(1)}_{\alpha\beta}(X^\gamma_a; m_a, m_b) + \cdots$ , (5)

where $G^{(0)}_{\alpha\beta}(X; m_a)$ denotes the (possibly strong-field) metric generated by an isolated body of mass $m_a$ (possibly with the additional effect of spin).

The separate expansions (4) and (5) are then ‘matched’ in some overlapping domain of common validity of the type $G m_a/c^2 \lesssim R_a \ll |x - z_a| \ll d \sim |x_a - x_b|$ (with $b \neq a$), where one can relate the different coordinate systems by expansions of the form

$x^\mu = z^\mu_a(T_a) + e^\mu_i(T_a)\, X^i_a + \frac{1}{2} f^\mu_{ij}(T_a)\, X^i_a X^j_a + \cdots$ (6)

The multi-chart approach becomes simplified if one considers compact bodies (of radius $R_a$ comparable to $2Gm_a/c^2$). In this case, it was shown [9], by considering how the ‘internal expansion’ (5) propagates into the ‘external’ one (4) via the matching (6), that, in General Relativity, the internal structure of each compact body was effaced to a very high degree, when seen in the external expansion (4). For instance, for non-spinning bodies, the internal structure of each body (notably the way it responds to an external tidal excitation) shows up in the external problem of motion only at the fifth post-Newtonian (5PN) approximation, i.e. in terms of order $(v/c)^{10}$ in the equations of motion.

This ‘effacement of internal structure’ indicates that it should be possible to simplify the rigorous multi-chart approach by skeletonizing each compact body by means of some delta-function source.
Mathematically, the use of distributional sources is delicate in a nonlinear theory such as GR. However, it was found that one can reproduce the results of the more rigorous matched-multi-chart approach by treating the divergent integrals generated by the use of delta-function sources by means of (complex) analytic continuation [9]. The most efficient method (especially to high PN orders) has been found to use analytic continuation in the dimension of space $d$ [14].

Finally, the most efficient way to derive the general relativistic equations of motion of $N$ compact bodies consists of solving the equations derived from the action (where $g \equiv -\det(g_{\mu\nu})$)

$S = \int \frac{d^{d+1}x}{c}\, \frac{c^4}{16\pi G}\, \sqrt{g}\, R(g) - \sum_a m_a c \int \sqrt{-g_{\mu\nu}(z^\lambda_a)\, dz^\mu_a\, dz^\nu_a}$ , (7)

formally using the standard weak-field expansion (4), but considering the space dimension $d$ as an arbitrary complex number which is sent to its physical value $d = 3$ only at the end of the calculation.

Using this method¹ one has derived the equations of motion of two compact bodies at the 2.5PN ($v^5/c^5$) approximation level needed for describing binary pulsars [15, 16, 9]:

$\frac{d^2 z^i_a}{dt^2} = A^i_{a0}(z_a - z_b) + c^{-2} A^i_{a2}(z_a - z_b, v_a, v_b) + c^{-4} A^i_{a4}(z_a - z_b, v_a, v_b, S_a, S_b) + c^{-5} A^i_{a5}(z_a - z_b, v_a - v_b) + O(c^{-6})$ . (8)

Here $A^i_{a0} = -G m_b (z^i_a - z^i_b)/|z_a - z_b|^3$ denotes the Newtonian acceleration, $A^i_{a2}$ its 1PN modification, $A^i_{a4}$ its 2PN modification (together with the spin-orbit effects), and $A^i_{a5}$ the 2.5PN contribution of order $v^5/c^5$. [See the references above, or the review [17], for more references and the explicit expressions of $A_2$, $A_4$ and $A_5$.] It was verified that the term $A^i_{a5}$ has the effect of decreasing the mechanical energy of the system by an amount equal (on average) to the energy lost in the form of gravitational wave flux at infinity. Note, however, that here $A^i_{a5}$ was derived, in the near zone of the system, as a direct consequence of the general relativistic propagation of gravity, at the velocity $c$, between the two bodies.
This highlights the fact that binary pulsar tests of the existence of $A^i_{a5}$ are direct tests of the reality of gravitational radiation.

Recently, the equations of motion (8) have been computed to even higher accuracy: 3PN $\sim v^6/c^6$ [18, 19, 20, 21, 22] and 3.5PN $\sim v^7/c^7$ [23, 24, 25] (see also the review [26]). These refinements are, however, not (yet) needed for interpreting binary pulsar data.

¹ Or, more precisely, an essentially equivalent analytic continuation using the so-called ‘Riesz kernels’.

3 Timing of binary pulsars in general relativity

In order to extract observational effects from the equations of motion (8) one needs to go through two steps: (i) to solve the equations of motion (8) so as to get the coordinate positions $z_1$ and $z_2$ as explicit functions of the coordinate time $t$, and (ii) to relate the coordinate motion $z_a(t)$ to the pulsar observables, i.e. mainly to the times of arrival of electromagnetic pulses on Earth.

The first step has been accomplished, in a form particularly useful for discussing pulsar timing, in Ref. [27]. There (see also [28]) it was shown that, when considering the full (periodic and secular) effects of the $A_2 \sim v^2/c^2$ terms in Eq. (8), together with the secular effects of the $A_4 \sim v^4/c^4$ and $A_5 \sim v^5/c^5$ terms, the relativistic two-body motion could be written in a very simple ‘quasi-Keplerian’ form (in polar coordinates), namely:

$\int n\, dt + \sigma = u - e_t \sin u$ , (9)

$\theta - \theta_0 = (1 + k)\, 2\arctan\left[\sqrt{\frac{1 + e_\theta}{1 - e_\theta}}\, \tan\frac{u}{2}\right]$ , (10)

$R \equiv r_{ab} = a_R (1 - e_R \cos u)$ , (11)

$r_a \equiv |z_a - z_{CM}| = a_r (1 - e_r \cos u)$ , (12)

$r_b \equiv |z_b - z_{CM}| = a_{r'} (1 - e_{r'} \cos u)$ . (13)

Here $n \equiv 2\pi/P_b$ denotes the orbital frequency, $k = \Delta\theta/2\pi = \langle\dot\omega\rangle/n = \langle\dot\omega\rangle P_b/2\pi$ the fractional periastron advance per orbit, $u$ an auxiliary angle (‘relativistic eccentric anomaly’), $e_t$, $e_\theta$, $e_R$, $e_r$ and $e_{r'}$ various ‘relativistic eccentricities’ and $a_R$, $a_r$ and $a_{r'}$ some ‘relativistic semi-major axes’.
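To make the quasi-Keplerian parametrization concrete, the following sketch solves the Kepler-like equation (9) for $u$ by Newton iteration and then evaluates the polar angle (10) and the separation (11). It is only an illustration: the function names, the Newton solver, and the numerical values used are our own choices, and the arctangent is evaluated with `atan2` so that the angle stays continuous over one orbit, $0 \le u < 2\pi$.

```python
import math


def solve_kepler(mean_anomaly, e_t, tol=1e-14, max_iter=50):
    """Solve u - e_t * sin(u) = mean_anomaly for u (Eq. (9)) by Newton iteration."""
    u = mean_anomaly
    for _ in range(max_iter):
        step = (u - e_t * math.sin(u) - mean_anomaly) / (1.0 - e_t * math.cos(u))
        u -= step
        if abs(step) < tol:
            break
    return u


def polar_angle(u, e_theta, k, theta0=0.0):
    """theta of Eq. (10); the factor (1 + k) encodes the periastron advance.

    Written with atan2 so the result is continuous for 0 <= u < 2*pi.
    """
    half = math.atan2(math.sqrt(1.0 + e_theta) * math.sin(0.5 * u),
                      math.sqrt(1.0 - e_theta) * math.cos(0.5 * u))
    return theta0 + (1.0 + k) * 2.0 * half


def separation(u, a_R, e_R):
    """Relative separation R = a_R * (1 - e_R * cos(u)) of Eq. (11)."""
    return a_R * (1.0 - e_R * math.cos(u))


# Illustrative values only (not taken from any measured pulsar).
u = solve_kepler(1.0, 0.6)
print("u =", u)
print("theta =", polar_angle(u, 0.6, 1.0e-5))
print("R =", separation(u, 1.0, 0.6))
```

For $k = 0$ and equal eccentricities the three functions reduce to the usual Keplerian relations; the relativistic content sits in the small differences between $e_t$, $e_\theta$, $e_R$ and in the factor $1 + k$.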
See [27] for the relations between these quantities, as well as their link to the relativistic energy and angular momentum $E$, $J$.

A direct study [28] of the dynamical effect of the contribution $A_5 \sim v^5/c^5$ in the equations of motion (8) has checked that it led to a secular increase of the orbital frequency $n(t) \simeq n(0) + \dot n\,(t - t_0)$, and thereby to a quadratic term in the ‘relativistic mean anomaly’ $\ell = \int n\, dt + \sigma$ appearing on the left-hand side (L.H.S.) of Eq. (9):

$\ell \simeq \sigma_0 + n_0 (t - t_0) + \frac{1}{2}\dot n\,(t - t_0)^2$ . (14)

As for the contribution $A_4 \sim v^4/c^4$, it induces several secular effects in the orbital motion: various 2PN contributions to the dimensionless periastron parameter $k$ ($\delta_4 k \sim v^4/c^4$ + spin-orbit effects), and secular variations in the inclination of the orbital plane (due to spin-orbit effects).

The second step in relating (8) to pulsar observations has been accomplished through the derivation of a ‘relativistic timing formula’ [29, 30]. The ‘timing formula’ of a binary pulsar is a multi-parameter mathematical function relating the observed time of arrival (at the radio-telescope) of the center of the $N$th pulse to the integer $N$. It involves many different physical effects: (i) dispersion effects, (ii) travel time across the solar system, (iii) gravitational delay due to the Sun and the planets, (iv) time dilation effects between the time measured on the Earth and the solar-system-barycenter time, (v) variations in the travel time between the binary pulsar and the solar-system barycenter (due to relative accelerations, parallax and proper motion), (vi) time delays happening within the binary system. We shall focus here on the time delays which take place within the binary system (see the lectures of M. Kramer for a discussion of the other effects).

For a proper derivation of the time delays occurring within the binary system we need to use the multi-chart approach mentioned above.
In the ‘rest frame’ $(X^0_a = c\,T_a, X^i_a)$ attached to the pulsar $a$, the pulsar phenomenon can be modelled by the secularly changing rotation of a beam of radio waves:

$\Phi_a = \int \Omega_a(T_a)\, dT_a \simeq \Omega_a T_a + \frac{1}{2}\dot\Omega_a T_a^2 + \frac{1}{6}\ddot\Omega_a T_a^3 + \cdots$ , (15)

where $\Phi_a$ is the longitude around the spin axis. [Depending on the precise definition of the rest-frame attached to the pulsar, the spin axis can either be fixed, or be slowly evolving, see e.g. [13].] One must then relate the initial direction $(\Theta_a, \Phi_a)$, and proper time $T_a$, of emission of the pulsar beam to the coordinate direction and coordinate time of the null geodesic representing the electromagnetic beam in the ‘global’ coordinates $x^\mu$ used to describe the dynamics of the binary system [NB: the explicit orbital motion (9)–(13) refers to such global coordinates $x^0 = ct$, $x^i$]. This is done by using the link (6) in which $z^i_a$ denotes the global coordinates of the ‘center of mass’ of the pulsar, $T_a$ the local (proper) time of the pulsar frame, and where, for instance

$e^0_i = \frac{v^i_a}{c}\left(1 + \frac{1}{2}\frac{v_a^2}{c^2} + 3\,\frac{G m_b}{c^2 r_{ab}} + \cdots\right) + \cdots$ (16)

Using the link (6) (with expressions such as (16) for the coefficients $e^\mu_i, \ldots$) one finds, among other results, that a radio beam emitted in the proper direction $N^i$ in the local frame appears to propagate, in the global frame, in the coordinate direction $n^i$ where

$n^i = N^i + \frac{v^i_a}{c} - N^i\, \frac{N^j v^j_a}{c} + O\!\left(\frac{v^2}{c^2}\right)$ . (17)

This is the well known ‘aberration effect’, which will then contribute to the timing formula.

One must also write the link between the pulsar ‘proper time’ $T_a$ and the coordinate time $t = x^0/c = z^0_a/c$ used in the orbital motion (9)–(13). This reads

$-c^2\, dT_a^2 = \tilde g_{\mu\nu}(z^\lambda_a)\, dz^\mu_a\, dz^\nu_a$ (18)

where the ‘tilde’ denotes the operation consisting (in the matching approach) in discarding in $g_{\mu\nu}$ the ‘self contributions’ $\sim (G m_a/R_a)^n$, while keeping the effect of the companion ($\sim G m_b/r_{ab}$, etc.). One checks that this is equivalent (in the dimensional-continuation approach) to taking $x^\mu = z^\mu_a$ for sufficiently small values of the real part of the dimension $d$.
To lowest order this yields the link

$T_a \simeq \int dt \left(1 - \frac{2 G m_b}{c^2 r_{ab}} - \frac{v_a^2}{c^2}\right)^{1/2} \simeq \int dt \left(1 - \frac{G m_b}{c^2 r_{ab}} - \frac{1}{2}\frac{v_a^2}{c^2}\right)$ , (19)

which combines the special relativistic and general relativistic time dilation effects. Hence, following [30] we can refer to them as the ‘Einstein time delay’.

Then, one must compute the (global) time taken by a light beam emitted by the pulsar, at the proper time $T_a$ (linked to $t_{\rm emission}$ by (19)), in the initial global direction $n^i$ (see Eq. (17)), to reach the barycenter of the solar system. This is done by writing that this light beam follows a null geodesic: in particular

$0 = ds^2 = g_{\mu\nu}(x^\lambda)\, dx^\mu dx^\nu \simeq -\left(1 - \frac{2U}{c^2}\right) c^2 dt^2 + \left(1 + \frac{2U}{c^2}\right) dx^2$ (20)

where $U = G m_a/|x - z_a| + G m_b/|x - z_b|$ is the Newtonian potential within the binary system. This yields (with $t_e \equiv t_{\rm emission}$, $t_a \equiv t_{\rm arrival}$)

$t_a - t_e = \int_{t_e}^{t_a} dt \simeq \frac{1}{c}\int |dx| + \frac{2}{c^3}\int \left(\frac{G m_a}{|x - z_a|} + \frac{G m_b}{|x - z_b|}\right) |dx|$ . (21)

The first term on the last RHS of Eq. (21) is the usual ‘light crossing time’ $|z_{\rm barycenter}(t_a) - z_a(t_e)|/c$ between the pulsar and the solar barycenter. It contains the ‘Roemer time delay’ due to the fact that $z_a(t_e)$ moves on an orbit. The second term on the last RHS of Eq. (21) is the ‘Shapiro time delay’ due to the propagation of the beam in a curved spacetime (only the $G m_b$ piece linked to the companion is variable).

When inserting the ‘quasi-Keplerian’ form (9)–(13) of the relativistic motion in the ‘Roemer’ term in (21), together with all other relativistic effects, one finds that the final expression for the relativistic timing formula can be significantly simplified by doing two mathematical transformations. One can redefine the ‘time eccentricity’ $e_t$ appearing in the ‘Kepler equation’ (9), and one can define a new ‘eccentric anomaly’ angle: $u \to u^{\rm new}$ [we henceforth drop the superscript ‘new’ on $u$].
After these changes, the binary-system part of the general relativistic timing formula [30] takes the form (we suppress the index $a$ on the pulsar proper time $T_a$)

$$t_{\rm barycenter} - t_0 = D^{-1}\left[T + \Delta_R(T) + \Delta_E(T) + \Delta_S(T) + \Delta_A(T)\right] , \qquad (22)$$

$$\Delta_R = x \sin\omega\,[\cos u - e(1+\delta_r)] + x\,[1 - e^2(1+\delta_\theta)^2]^{1/2} \cos\omega \sin u , \qquad (23)$$

$$\Delta_E = \gamma \sin u , \qquad (24)$$

$$\Delta_S = -2r \ln\{1 - e\cos u - s\,[\sin\omega(\cos u - e) + (1-e^2)^{1/2}\cos\omega \sin u]\} , \qquad (25)$$

$$\Delta_A = A\,\{\sin[\omega + A_e(u)] + e\sin\omega\} + B\,\{\cos[\omega + A_e(u)] + e\cos\omega\} , \qquad (26)$$

where $x = x_0 + \dot x\,(T - T_0)$ represents the projected light-crossing time ($x = a_{\rm pulsar}\sin i/c$), $e = e_0 + \dot e\,(T - T_0)$ a certain (relativistically-defined) ‘timing eccentricity’, $A_e(u)$ the function

$$A_e(u) \equiv 2\arctan\left[\left(\frac{1+e}{1-e}\right)^{1/2}\tan\frac{u}{2}\right] , \qquad (27)$$

$\omega = \omega_0 + k\,A_e(u)$ the ‘argument of the periastron’, and where the (relativistically-defined) ‘eccentric anomaly’ $u$ is the function of the ‘pulsar proper time’ $T$ obtained by solving the Kepler equation

$$u - e\sin u = 2\pi\left[\frac{T - T_0}{P_b} - \frac{1}{2}\,\dot P_b\left(\frac{T - T_0}{P_b}\right)^2\right] . \qquad (28)$$

It is understood here that the pulsar proper time $T$ corresponding to the $N$th pulse is related to the integer $N$ by an equation of the form

$$N = c_0 + \nu_p T + \frac{1}{2}\,\dot\nu_p T^2 + \frac{1}{6}\,\ddot\nu_p T^3 . \qquad (29)$$

From these formulas, one sees that $\delta_\theta$ (and $\delta_r$) measure some relativistic distortion of the pulsar orbit, $\gamma$ the amplitude of the ‘Einstein time delay’ (footnote 2) $\Delta_E$, and $r$ and $s$ the range and shape of the ‘Shapiro time delay’ (footnote 3) $\Delta_S$. Note also that the dimensionless PPK parameter $k$ measures the non-uniform advance of the periastron. It is related to the often quoted secular rate of periastron advance $\dot\omega \equiv \langle d\omega/dt\rangle$ by the relation $k = \dot\omega P_b/2\pi$. It has been explicitly checked that binary-pulsar observational data do indeed require one to model the relativistic periastron advance by means of the non-uniform (and non-trivial) function of $u$ multiplying $k$ on the R.H.S. of Eq. (27) [31] (footnote 4). Finally, we see from Eq. (28) that $P_b$ represents the (periastron to periastron) orbital period at the fiducial epoch $T_0$, while the dimensionless parameter $\dot P_b$ represents the time derivative of $P_b$ (at $T_0$).
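The chain from proper time to delays described above (Kepler equation (28), then the eccentric anomaly $u$, then the delays (23)–(25)) can be sketched numerically. This is a minimal illustration under stated assumptions, not the timing code used in [30, 31]: the function names are hypothetical, the Newton iteration is just one simple way of solving Eq. (28), the secular drifts $\dot x$, $\dot e$ and the aberration delay (26) are omitted, and the parameter values in the example are made up rather than those of any real pulsar.

```python
import math

def mean_anomaly(T, T0, Pb, Pbdot=0.0):
    """Right-hand side of the Kepler equation (28)."""
    phase = (T - T0) / Pb
    return 2.0 * math.pi * (phase - 0.5 * Pbdot * phase**2)

def solve_kepler(M, e, tol=1e-14, max_iter=100):
    """Newton iteration for u - e*sin(u) = M, assuming 0 <= e < 1."""
    u = M + e * math.sin(M)                      # starting guess
    for _ in range(max_iter):
        step = (u - e * math.sin(u) - M) / (1.0 - e * math.cos(u))
        u -= step
        if abs(step) < tol:
            break
    return u

def true_anomaly(u, e):
    """A_e(u) of Eq. (27); this simple form is only valid for -pi < u < pi."""
    return 2.0 * math.atan(math.sqrt((1.0 + e) / (1.0 - e)) * math.tan(0.5 * u))

def dd_delays(u, x, e, omega, gamma, r, s, delta_r=0.0, delta_theta=0.0):
    """Roemer, Einstein and Shapiro delays of Eqs. (23)-(25)."""
    su, cu = math.sin(u), math.cos(u)
    so, co = math.sin(omega), math.cos(omega)
    roemer = (x * so * (cu - e * (1.0 + delta_r))
              + x * math.sqrt(1.0 - e**2 * (1.0 + delta_theta)**2) * co * su)
    einstein = gamma * su
    shapiro = -2.0 * r * math.log(
        1.0 - e * cu - s * (so * (cu - e) + math.sqrt(1.0 - e**2) * co * su))
    return roemer, einstein, shapiro

# Example with made-up parameter values (assumptions, for illustration only).
M = mean_anomaly(T=1000.0, T0=0.0, Pb=30000.0, Pbdot=-2.4e-12)
u = solve_kepler(M, e=0.6)
omega = 1.0 + 1.0e-5 * true_anomaly(u, 0.6)      # omega = omega0 + k * A_e(u)
print(dd_delays(u, x=2.3, e=0.6, omega=omega, gamma=4.0e-3, r=7.0e-6, s=0.7))
```

For $e = 0$ the Roemer term reduces to $x\sin(\omega + u)$, which gives a quick sanity check of the implementation.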
Schematically, the structure of the DD timing formula (22) is

$$t_{\rm barycenter} - t_0 = F\left[T_N; \{p^K\}; \{p^{PK}\}; \{q^{PK}\}\right] , \qquad (30)$$

where $t_{\rm barycenter}$ denotes the solar-system barycentric (infinite frequency) arrival time of a pulse, $T$ the pulsar emission proper time (corrected for aberration), $\{p^K\} = \{P_b, T_0, e_0, \omega_0, x_0\}$ is the set of Keplerian parameters, $\{p^{PK}\} = \{k, \gamma, \dot P_b, r, s, \delta_\theta, \dot e, \dot x\}$ the set of separately measurable post-Keplerian parameters, and $\{q^{PK}\} = \{\delta_r, A, B, D\}$ the set of not separately measurable post-Keplerian parameters [31]. [The parameter $D$ is a ‘Doppler factor’ which enters as an overall multiplicative factor $D^{-1}$ on the right-hand side of Eq. (22).]

Footnote 2: The post-Keplerian timing parameter $\gamma$, first introduced in [29], has the dimension of time, and should not be confused with the dimensionless post-Newtonian Eddington parameter $\gamma^{\rm PPN}$ probed by solar-system experiments (see below).

Footnote 3: The dimensionless parameter $s$ is numerically equal to the sine of the inclination angle $i$ of the orbital plane, but its real definition within the PPK formalism is the timing parameter which determines the ‘shape’ of the logarithmic time delay $\Delta_S(T)$.

Footnote 4: Alas, this function is theory-independent, so that the non-uniform aspect of the periastron advance cannot be used to yield discriminating tests of relativistic gravity theories.

A further simplification of the DD timing formula was found possible. Indeed, the fact that the parameters $\{q^{PK}\} = \{\delta_r, A, B, D\}$ are not separately measurable means that they can be absorbed in changes of the other parameters. The explicit formulas for doing that were given in [30] and [31]: they consist in redefining $e$, $x$, $P_b$, $\delta_\theta$ and $\delta_r$. At the end of the day, it suffices to consider a simplified timing formula where $\{\delta_r, A, B, D\}$ have been set to some given fiducial values, e.g. $\{0, 0, 0, 1\}$, and where one only fits for the remaining parameters $\{p^K\}$ and $\{p^{PK}\}$.
Finally, let us mention that it is possible to extend the general parametrized timing formula (30) by writing a similar parametrized formula describing the effect of the pulsar orbital motion on the directional spectral luminosity [d(energy)/d(time) d(frequency) d(solid angle)] received by an observer. As discussed in detail in [31], this introduces a new set of ‘pulse-structure post-Keplerian parameters’.

4 Phenomenological approach to testing relativistic gravity with binary pulsar data

As said in the Introduction, binary pulsars contain strong gravity domains and should therefore allow one to test the strong-field aspects of relativistic gravity. The question we face is then the following: How can one use binary pulsar data to test strong-field (and radiative) gravity?

Two different types of answers can be given to this question: a phenomenological (or theory-independent) one, or various types of theory-dependent approaches. In this Section we shall consider the phenomenological approach.

The phenomenological approach to binary-pulsar tests of relativistic gravity is called the parametrized post-Keplerian formalism [32, 31]. This approach is based on the fact that the mathematical form of the multi-parameter DD timing formula (30) was found to be applicable not only in General Relativity, but also in a wide class of alternative theories of gravity. Indeed, any theory in which gravity is mediated not only by a metric field $g_{\mu\nu}$ but by a general combination of a metric field and of one or several scalar fields $\varphi^{(a)}$ will induce relativistic timing effects in binary pulsars which can still be parametrized by the formulas (22)–(29). Such general ‘tensor-multi-scalar’ theories of gravity contain arbitrary functions of the scalar fields. They have been studied in full generality in [33].
It was shown that, under certain conditions, such tensor-scalar gravity theories could lead, because of strong-field effects, to very different predictions from those of General Relativity in binary pulsar timing observations [34, 35, 36]. However, the point which is important for this Section is that, even when such strong-field effects develop, one can still use the universal DD timing formula (30) to fit the observed pulsar times of arrival.

The basic idea of the phenomenological, parametrized post-Keplerian (PPK) approach is then the following: By least-squares fitting the observed sequence of pulsar arrival times $t_N$ to the parametrized formula (30) (in which $T_N$ is defined by Eq. (29), which introduces the further parameters $\nu_p$, $\dot\nu_p$, $\ddot\nu_p$) one can phenomenologically extract from raw observational data the (best fit) values of all the parameters entering Eqs. (29) and (30). In particular, one so determines both the set of Keplerian parameters $\{p^K\} = \{P_b, T_0, e_0, \omega_0, x_0\}$, and the set of post-Keplerian (PK) parameters $\{p^{PK}\} = \{k, \gamma, \dot P_b, r, s, \delta_\theta, \dot e, \dot x\}$. In extracting these values, we did not have to assume any theory of gravity. However, each specific theory of gravity will make specific predictions relating the PK parameters to the Keplerian ones, and to the two (a priori unknown) masses $m_a$ and $m_b$ of the pulsar and its companion. [For certain PK parameters one must also consider other variables related to the spin vectors of $a$ and $b$.] In other words, the measurement (in addition to the Keplerian parameters) of each PK parameter defines, for each given theory, a curve in the $(m_a, m_b)$ mass plane. For any given theory, the measurement of two PK parameters determines two curves and thereby generically determines the values of the two masses $m_a$ and $m_b$ (as the point of intersection of these two curves). Therefore, as soon as one measures three PK parameters one obtains a test of the considered gravity theory.
The test is passed only if the three curves meet at one point. More generally, the measurement of $n$ PK timing parameters yields $n - 2$ independent tests of relativistic gravity. Any one of these tests, i.e. any simultaneous measurement of three PK parameters, can either confirm or put in doubt any given theory of gravity.

As General Relativity is our current most successful theory of gravity, it is clearly the prime target for these tests. We have seen above that the timing data of each binary pulsar provide a maximum of 8 PK parameters: $k$, $\gamma$, $\dot P_b$, $r$, $s$, $\delta_\theta$, $\dot e$ and $\dot x$. Here, we were talking about a normal ‘single line’ binary pulsar where, among the two compact objects $a$ and $b$, only one of the two, say $a$, is observed as a pulsar. In this case, one binary system can provide up to $8 - 2 = 6$ tests of GR. In practice, however, it has not yet been possible to measure the parameter $\delta_\theta$ (which measures a small relativistic deformation of the elliptical orbit), nor the secular parameters $\dot e$ and $\dot x$. The original Hulse-Taylor system PSR 1913+16 has allowed one to measure 3 PK parameters: $k \equiv \langle\dot\omega\rangle P_b/2\pi$, $\gamma$ and $\dot P_b$. The two parameters $k$ and $\gamma$ involve (non radiative) strong-field effects, while, as explained above, the orbital period derivative $\dot P_b$ is a direct consequence of the term $A_5 \sim v^5/c^5$ in the binary-system equations of motion (5). The term $A_5$ is itself directly linked to the retarded propagation, at the velocity of light, of the gravitational interaction between the two strongly self-gravitating bodies $a$ and $b$. Therefore, any test involving $\dot P_b$ will be a mixed radiative strong-field test.

Let us use this example to explain what information one needs to implement a phenomenological test such as the $(k - \gamma - \dot P_b)_{1913+16}$ one. First, we need to know the predictions made by the considered target theory for the PK parameters $k$, $\gamma$ and $\dot P_b$ as functions of the two masses $m_a$ and $m_b$. These predictions have been worked out, for General Relativity, in Refs. [29, 28, 30].
Introducing the notation (where $n \equiv 2\pi/P_b$)

$$M \equiv m_a + m_b , \qquad (31)$$

$$X_a \equiv m_a/M ; \quad X_b \equiv m_b/M ; \quad X_a + X_b \equiv 1 , \qquad (32)$$

$$\beta_O(M) \equiv \left(\frac{GMn}{c^3}\right)^{1/3} , \qquad (33)$$

the GR predictions read

$$k^{\rm GR}(m_a, m_b) = \frac{3}{1 - e^2}\,\beta_O^2 , \qquad (34)$$

$$\gamma^{\rm GR}(m_a, m_b) = \frac{e}{n}\,X_b(1 + X_b)\,\beta_O^2 , \qquad (35)$$

$$\dot P_b^{\rm GR}(m_a, m_b) = -\frac{192\pi}{5}\,\frac{1 + \frac{73}{24}e^2 + \frac{37}{96}e^4}{(1 - e^2)^{7/2}}\,X_a X_b\,\beta_O^5 . \qquad (36)$$

However, if we use the three predictions (34)–(36), together with the best current observed values of the PK parameters $k^{\rm obs}$, $\gamma^{\rm obs}$, $\dot P_b^{\rm obs}$ [37], we shall find that the three curves $k^{\rm GR}(m_a, m_b) = k^{\rm obs}$, $\gamma^{\rm GR}(m_a, m_b) = \gamma^{\rm obs}$, $\dot P_b^{\rm GR}(m_a, m_b) = \dot P_b^{\rm obs}$ in the $(m_a, m_b)$ mass plane fail to meet at about the 13 σ level! Should this put in doubt General Relativity? No, because Ref. [38] has shown that the time variation (notably due to galactic acceleration effects) of the Doppler factor $D$ entering Eq. (22) entailed an extra contribution to the ‘observed’ period derivative $\dot P_b^{\rm obs}$. We need to subtract this non-GR contribution before drawing the corresponding curve: $\dot P_b^{\rm GR}(m_a, m_b) = \dot P_b^{\rm obs} - \dot P_b^{\rm galactic}$. Then one finds that the three curves do meet within one σ. This yields a deep confirmation of General Relativity, and a direct observational proof of the reality of gravitational radiation.

We said several times that this test is also a probe of the strong-field aspects of GR. How can one see this? A look at the GR predictions (34)–(36) does not exhibit explicit strong-field effects. Indeed, the derivation of Eqs. (34)–(36) used in a crucial way the ‘effacement of internal structure’ that occurs in the general relativistic dynamics of compact objects. This non trivial property is rather specific to GR and means that, in this theory, all the strong-field effects can be absorbed in the definition of the masses $m_a$ and $m_b$. One can, however, verify that strong-field effects do enter the observable PK parameters $k$, $\gamma$, $\dot P_b$, etc. by considering how the theoretical predictions (34)–(36) get modified in alternative theories of gravity. The presence of such strong-field effects in PK parameters was first pointed out in Ref.
[7] (see also [39]) for the Jordan-Fierz-Brans-Dicke theory of gravity, and in Ref. [8] for Rosen’s bi-metric theory of gravity. A detailed study of such strong-field deviations was then performed in [33, 34, 35] for general tensor-(multi-)scalar theories of gravity. In the following Section we shall exhibit how such strong-field effects enter the various post-Keplerian parameters.

Continuing our historical review of phenomenological pulsar tests, let us come to the binary system which was the first one to provide several ‘pure strong-field tests’ of relativistic gravity, without mixing of radiative effects: PSR 1534+12. In this system, it was possible to measure the four (non radiative) PK parameters $k$, $\gamma$, $r$ and $s$. [We see from Eq. (25) that $r$ and $s$ measure, respectively, the range and the shape of the ‘Shapiro time delay’ $\Delta_S$.] The measurement of the 4 PK parameters $k$, $\gamma$, $r$, $s$ defines 4 curves in the $(m_a, m_b)$ mass plane, and thereby yields 2 strong-field tests of GR. It was found in [40] that GR passes these two tests. For instance, the ratio between the measured value $s^{\rm obs}$ of the phenomenological parameter (footnote 5) $s$ and the value $s^{\rm GR}[k^{\rm obs}, \gamma^{\rm obs}]$ predicted by GR on the basis of the measurements of the two PK parameters $k$ and $\gamma$ (which determine, via Eqs. (34), (35), the GR-predicted value of $m_a$ and $m_b$) was found to be $s^{\rm obs}/s^{\rm GR}[k^{\rm obs}, \gamma^{\rm obs}] = 1.004 \pm 0.007$ [40]. The most recent data [41] yield $s^{\rm obs}/s^{\rm GR}[k^{\rm obs}, \gamma^{\rm obs}] = 1.000 \pm 0.007$. We see that we have here a confirmation of the strong-field regime of GR at the 1% level.

Another way to get phenomenological tests of the strong field aspects of gravity concerns the possibility of a violation of the strong equivalence principle. This is parametrized by phenomenologically assuming that the ratio between the gravitational and the inertial mass of the pulsar differs from unity (which is its value in GR): $(m_{\rm grav}/m_{\rm inert})_a = 1 + \Delta_a$.
Similarly to what happens in the Earth-Moon-Sun system [42], the three-body system made of a binary pulsar and of the Galaxy exhibits a ‘polarization’ of the orbit which is proportional to $\Delta \equiv \Delta_a - \Delta_b$, and which can be constrained by considering certain quasi-circular neutron-star-white-dwarf binary systems [43]. See [44] for recently published improved limits (footnote 6) on the phenomenological equivalence-principle violation parameter $\Delta$.

The Parkes multibeam survey has recently discovered several new interesting ‘relativistic’ binary pulsars, thereby giving a huge increase in the number of phenomenological tests of relativistic gravity. Among those new binary pulsar systems, two stand out as superb testing grounds for relativistic gravity: (i) PSR J1141−6545 [46, 47], and (ii) the remarkable double binary pulsar PSR J0737−3039A and B [2, 3, 48, 49] (see also the lectures by M. Kramer and A. Possenti).

The PSR J1141−6545 timing data have led to the measurement of 3 PK parameters: $k$, $\gamma$, and $\dot P_b$ [47]. As in PSR 1913+16 this yields one mixed radiative-strong-field test (footnote 7).

Footnote 5: As already mentioned, the dimensionless parameter $s$ is numerically equal (in all theories) to the sine of the inclination angle $i$ of the orbital plane, but it is better thought of, in the PPK formalism, as a phenomenological timing parameter determining the ‘shape’ of the logarithmic time delay $\Delta_S(T)$.

Footnote 6: Note, however, that these limits, as well as those previously obtained in [45], assume that the (a priori pulsar-mass dependent) parameter $\Delta \simeq \Delta_a$ is the same for all the analyzed pulsars.

Footnote 7: In addition, scintillation data have led to an estimate of the sine of the orbital inclination, $\sin i$ [50]. As said above, $\sin i$ numerically coincides with the PK parameter $s$ measuring the ‘shape’ of the Shapiro time delay. Therefore, one could use the scintillation measurements as an indirect determination of $s$, thereby obtaining two independent tests from PSR J1141−6545 data.
Footnote 7 (continued): A caveat, however, is that the extraction of $\sin i$ from scintillation measurements rests on several simplifying assumptions whose validity is unclear. In fact, in the case of PSR J0737−3039 the direct timing measurement of $s$ disagrees with its estimate via scintillation ...

The timing data of the millisecond binary pulsar PSR J0737−3039A have led to the direct measurement of 5 PK parameters: $k$, $\gamma$, $r$, $s$ and $\dot P_b$ [3, 48, 49]. In addition, the ‘double line’ nature of this binary system (i.e. the fact that one observes both components, A and B, as radio pulsars) allows one to perform new phenomenological tests by using Keplerian parameters. Indeed, the simultaneous measurement of the Keplerian parameters $x_a$ and $x_b$ representing the projected light crossing times of both pulsars (A and B) gives access to the combined Keplerian parameter

$$R^{\rm obs} \equiv \frac{x_b^{\rm obs}}{x_a^{\rm obs}} . \qquad (37)$$

On the other hand, the general derivation of [30] (applicable to any Lorentz-invariant theory of gravity, and notably to any tensor-scalar theory) shows that the theoretical prediction for the ratio $R$, considered as a function of the masses $m_a$ and $m_b$, is

$$R^{\rm theory} = \frac{m_a}{m_b} + O\!\left(\frac{v^4}{c^4}\right) . \qquad (38)$$

The absence of any explicit strong-field-gravity effects in the theoretical prediction (38) (to be contrasted, for instance, with the predictions for PK parameters in tensor-scalar gravity discussed in the next Section) is mainly due to the convention used in [30] and [31] for defining the masses $m_a$ and $m_b$. These are always defined so that the Lagrangian for two non interacting compact objects reads $L_0 = \sum_a -m_a c^2 (1 - v_a^2/c^2)^{1/2}$. In other words, $m_a c^2$ represents the total energy of body $a$. This means that one has implicitly lumped in the definition of $m_a$ many strong-self-gravity effects. [For instance, in tensor-scalar gravity $m_a$ includes not only the usual Einsteinian gravitational binding energy due to the self-gravitational field $g_{\mu\nu}(x)$, but also the extra binding energy linked to the scalar field $\varphi(x)$.]
Anyway, what is important is that, when performing a phenomenological test from the measurement of a triplet of parameters, e.g. $\{k, \gamma, R\}$, at least one parameter among them be a priori sensitive to strong-field effects. This is enough for guaranteeing that the crossing of the three curves $k^{\rm theory}(m_a, m_b) = k^{\rm obs}$, $\gamma^{\rm theory}(m_a, m_b) = \gamma^{\rm obs}$, $R^{\rm theory}(m_a, m_b) = R^{\rm obs}$ is really a probe of strong-field gravity.

Footnote 7 (continued): ... data [49]. It is therefore safer not to use scintillation estimates of $\sin i$ on the same footing as direct timing measurements of the PK parameter $s$. On the other hand, a safe way of obtaining an $s$-related gravity test consists in using the necessary mathematical fact that $s = \sin i \leq 1$. In GR the definition $x_a = a_a \sin i/c$ leads to $\sin i = n x_a/(\beta_O X_b)$. Therefore we can write the inequality $n x_a/(\beta_O(M) X_b) \leq 1$ as a phenomenological test of GR.

[Figure 1: Phenomenological tests of General Relativity obtained from Keplerian and post-Keplerian timing parameters of four relativistic pulsars. Four mass-plane panels, each marking the intersection region (and, where relevant, the $s \leq 1$ limit): PSR B1913+16, PSR B1534+12, PSR J1141−6545 and PSR J0737−3039; both axes run from 0 to 2.5. Figure taken from [51].]

In conclusion, the two recently discovered binary pulsars PSR J1141−6545 and PSR J0737−3039 have more than doubled the number of phenomenological tests of (radiative and) strong-field gravity. Before their discovery, the ‘canonical’ relativistic binary pulsars PSR 1913+16 and PSR 1534+12 had given us four such tests: one $(k - \gamma - \dot P_b)$ test from PSR 1913+16 and three $(k - \gamma - r - s - \dot P_b)$ (footnote 8) tests from PSR 1534+12. The two new binary systems have given us five (footnote 9) more phenomenological tests: one $(k - \gamma - \dot P_b)$ (or two, $k - \gamma - \dot P_b - s$) tests from PSR J1141−6545 and four $(k - \gamma - r - s - \dot P_b - R)$ tests from PSR J0737−3039 (footnote 10). As illustrated in Figure 1, these nine phenomenological tests of strong-field (and radiative) gravity are all in beautiful agreement with General Relativity.
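The logic of a three-parameter mass-plane test can be sketched in a few lines, assuming GR and the standard quadrupole-formula expressions for $k$, $\gamma$ and $\dot P_b$ (Eqs. (34)–(36)): two PK parameters fix the two masses, and the third becomes a prediction to be compared with observation. This is only an illustration. The function names are hypothetical, the constant `T_SUN` ($GM_\odot/c^3$ in seconds) is a standard value not quoted in the text, and the ‘observations’ are synthetic numbers generated from round, roughly Hulse-Taylor-like inputs, not measured values.

```python
import math

T_SUN = 4.925490947e-6   # G*M_sun/c^3 in seconds (assumed standard value)

def beta_O(M, Pb):
    """beta_O(M) = (G M n / c^3)^(1/3), M in solar masses, n = 2*pi/Pb."""
    n = 2.0 * math.pi / Pb
    return (T_SUN * M * n) ** (1.0 / 3.0)

def pk_gr(ma, mb, Pb, e):
    """GR predictions for (k, gamma in seconds, Pb_dot), Eqs. (34)-(36)."""
    M = ma + mb
    Xa, Xb = ma / M, mb / M
    n = 2.0 * math.pi / Pb
    b = beta_O(M, Pb)
    k = 3.0 * b**2 / (1.0 - e**2)
    gamma = (e / n) * Xb * (1.0 + Xb) * b**2
    f_e = (1.0 + 73.0 / 24.0 * e**2 + 37.0 / 96.0 * e**4) / (1.0 - e**2) ** 3.5
    pb_dot = -(192.0 * math.pi / 5.0) * f_e * Xa * Xb * b**5
    return k, gamma, pb_dot

def masses_from_k_gamma(k, gamma, Pb, e):
    """Intersection of the k and gamma curves in the (ma, mb) plane, in GR."""
    n = 2.0 * math.pi / Pb
    b2 = k * (1.0 - e**2) / 3.0              # beta_O^2 from Eq. (34)
    M = b2 ** 1.5 / (T_SUN * n)              # invert Eq. (33)
    q = gamma * n / (e * b2)                 # q = Xb * (1 + Xb) from Eq. (35)
    Xb = 0.5 * (-1.0 + math.sqrt(1.0 + 4.0 * q))
    return (1.0 - Xb) * M, Xb * M

# Synthetic "observations" built from illustrative inputs (assumptions).
PB, ECC = 27907.0, 0.617                     # orbital period [s], eccentricity
k_obs, gamma_obs, pbdot_obs = pk_gr(1.44, 1.39, PB, ECC)

# Step 1: two PK parameters determine the two masses.
ma, mb = masses_from_k_gamma(k_obs, gamma_obs, PB, ECC)
# Step 2: the third PK parameter is then a prediction, i.e. one test.
pbdot_pred = pk_gr(ma, mb, PB, ECC)[2]

omega_dot_deg_yr = math.degrees(k_obs * 2.0 * math.pi / PB) * 365.25 * 86400.0
print(ma, mb, pbdot_pred / pbdot_obs, omega_dot_deg_yr)
```

With real data the ratio `pbdot_pred / pbdot_obs` would differ from unity by the measurement errors and, as discussed for PSR 1913+16, by kinematic contributions that must be subtracted first. Each additional PK parameter adds one more such comparison, which is the origin of the $n - 2$ counting.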
In addition, let us recall that several quasi-circular wide binaries, made of a neutron star and a white dwarf, have led to high-precision phenomenological confirmations [44] (in strong-field conditions) of one of the deep predictions of General Relativity: the ‘strong’ equivalence principle, i.e. the fact that various bodies fall with the same acceleration in an external gravitational field, independently of the strength of their self-gravity.

Footnote 8: The timing measurement of $\dot P_b^{\rm obs}$ in PSR 1534+12 is even more strongly affected by kinematic corrections ($\dot D$ terms) than in the PSR 1913+16 case. In the absence of a precise, independent measurement of the distance to PSR 1534+12, the $k - \gamma - \dot P_b$ test yields, at best, a ∼ 15% test of GR.

Footnote 9: Or even six, if we use the scintillation determination of $s$ in PSR J1141−6545.

Footnote 10: The companion pulsar 0737−3039B being non recycled, and being visible only during a small part of its orbit, cannot be timed with sufficient accuracy to allow one to measure any of its post-Keplerian parameters.

Finally, let us mention that Ref. [31] has extended the philosophy of the phenomenological (parametrized post-Keplerian) analysis of timing data to a similar phenomenological analysis of pulse-structure data. Ref. [31] showed that, in principle, one could extract up to 11 ‘post-Keplerian pulse-structure parameters’. Together with the 8 post-Keplerian timing parameters of a (single-line) binary pulsar, this makes a total of 19 phenomenological PK parameters. As these parameters depend not only on the two masses $m_a$, $m_b$ but also on the two angles $\lambda$, $\eta$ determining the direction of the spin axis of the pulsar, the maximum number of tests one might hope to extract from one (single-line) binary pulsar is $19 - 4 = 15$. However, the present accuracy with which one can model and measure the pulse structure of the known pulsars has not yet allowed one to measure any of these new pulse-structure parameters in a theory-independent and model-independent way.
Nonetheless, it has been possible to confirm the reality (and order of magnitude) of the spin-orbit coupling in GR, which was pointed out [52, 53] to be observable via a secular change of the intensity profile of a pulsar signal. Confirmations of general relativistic spin-orbit effects in the evolution of pulsar profiles were obtained in several pulsars: PSR 1913+16 [54, 55], PSR B1534+12 [56] and PSR J1141−6545 [57]. In this respect, let us mention that the spin-orbit interaction also affects several PK parameters, either by inducing a secular evolution in some of them (see [31]) or by contributing to their value. For instance, the spin-orbit interaction contributes to the observed value of the periastron advance parameter $k$ an amount which is significant for the pulsars (such as 1913+16 and 0737−3039) where $k$ is measured with high accuracy. It was then pointed out [58] that this gives, in principle, an indirect way of measuring the moment of inertia of neutron stars (a useful quantity for probing the equation of state of nuclear matter [59, 60]). However, this can be done only if one measures, besides $k$, two other PK parameters with $10^{-5}$ accuracy: a rather tall order which will be a challenge to meet.

The phenomenological approach to pulsar tests has the advantage that it can confirm or invalidate a specific theory of gravity without making assumptions about other theories. Moreover, as General Relativity has no free parameters, any test of its predictions is a potentially lethal test. From this point of view, it is remarkable that GR has passed with flying colours all the pulsar tests it has been submitted to. [See, notably, Fig. 1.] As argued above, these tests have probed strong-field aspects of gravity which had not been probed by solar-system (or cosmological) tests. On the other hand, a disadvantage of the phenomenological tests is that they do not tell us in any precise way which strong-field structures have actually been tested.
For instance, let us imagine that one day one specific PPK test fails to be satisfied by GR, while the others are OK. This leaves us in a quandary: If we trust the problematic test, we must conclude that GR is wrong. However, the other tests say that GR is OK. This example shows that we would like to have some idea of what physical effects, linked to strong-field gravity, enter in each test, or even better in each PK parameter. The ‘effacement of internal structure’ which takes place in GR does not allow one to discuss this issue. This gives us a motivation for going beyond the phenomenological PPK approach by considering theory-dependent formalisms in which one embeds GR within a space of alternative gravity theories.

5 Theory-space approach to testing relativistic gravity with binary pulsar data

A complementary approach to testing gravity with binary pulsar data consists in embedding General Relativity within a multi-parameter space of alternative theories of gravity. In other words, we want to contrast the predictions of GR with the predictions of continuous families of alternative theories. In so doing we hope to learn more about which structures of GR are actually being probed in binary pulsar tests. This is a bit similar to the well-known psycho-physiological fact that the best way to appreciate a nuance of colour is to surround a given patch of colour by other patches with slightly different colours. This makes it much easier to detect subtle differences in colour. In the same way, we hope to learn about the probing power of pulsar tests by seeing how the phenomenological tests summarized in Fig. 1 fail (or continue) to be satisfied when one continuously deforms, away from GR, the gravity theory which is being tested.

Let us first recall the various ways in which this theory-space approach has been used in the context of the solar-system tests of relativistic gravity.
5.1 Theory-space approaches to solar-system tests of relativistic gravity

In the quasi-stationary weak-field context of the solar-system, this theory-space approach has been implemented in two different ways. First, the parametrized post-Newtonian (PPN) formalism [61, 62, 63, 42, 64, 65, 11, 66] describes many ‘directions’ in which generic alternative theories of gravity might differ in their weak-field predictions from GR. In its most general versions the PPN formalism contains 10 ‘post-Einstein’ PPN parameters, $\bar\gamma \equiv \gamma^{\rm PPN} - 1$ (footnote 11), $\bar\beta \equiv \beta^{\rm PPN} - 1$, $\xi$, $\alpha_1$, $\alpha_2$, $\alpha_3$, $\zeta_1$, $\zeta_2$, $\zeta_3$, $\zeta_4$. Each one of these dimensionless quantities parametrizes a certain class of slow-motion, weak-field gravitational effects which deviate from corresponding GR predictions. For instance, $\bar\gamma$ parametrizes modifications both of the effect of a massive body (say, the Sun) on the light passing near it, and of the terms in the two-body gravitational Lagrangian which are proportional to $(G m_a m_b/r_{ab}) \cdot (\mathbf{v}_a - \mathbf{v}_b)^2/c^2$.

A second way of implementing the theory-space philosophy consists in considering some explicit, parameter-dependent family of alternative relativistic theories of gravity. For instance, the simplest tensor-scalar theory of gravity put forward by Jordan [67], Fierz [68] and Brans and Dicke [69] has a unique free parameter, say $\alpha_0^2 = (2\omega_{\rm BD} + 3)^{-1}$. When $\alpha_0^2 \to 0$, this theory reduces to GR, so that $\alpha_0^2$ (or $1/\omega_{\rm BD}$) measures all the deviations from GR.

Footnote 11: The PPN parameter $\gamma^{\rm PPN}$ is usually denoted simply as $\gamma$. To distinguish it from the Einstein-time-delay PPK timing parameter $\gamma$ used above we add the superscript PPN. In addition, as the value of $\gamma^{\rm PPN}$ in GR is 1, we prefer to work with the parameter $\bar\gamma \equiv \gamma^{\rm PPN} - 1$ which vanishes in GR, and therefore measures a ‘deviation’ from GR in a certain ‘direction’ in theory-space. Similarly with $\bar\beta \equiv \beta^{\rm PPN} - 1$.
When considering the weak-field limit of the Jordan-Fierz-Brans-Dicke (JFBD) theory, one finds that it can be described within the PPN formalism by choosing $\bar\gamma = -2\alpha_0^2 (1 + \alpha_0^2)^{-1}$, $\bar\beta = 0$ and $\xi = \alpha_i = \zeta_j = 0$.

Having briefly recalled the two types of theory-space approaches used to discuss solar-system tests, let us now consider the case of binary-pulsar tests.

5.2 Theory-space approaches to binary-pulsar tests of relativistic gravity

There exist generalizations of these two different theory-space approaches to the context of strong-field gravity and binary pulsar tests. First, the PPN formalism has been (partially) extended beyond the ‘first post-Newtonian’ (1PN) order deviations from GR ($\sim v^2/c^2 + Gm/c^2 r$) to describe 2PN order deviations from GR [70]. Remarkably, there appear only two new parameters at the 2PN level (footnote 12): $\epsilon$ and $\zeta$. Also, by expanding in powers of the self-gravity parameters of bodies $a$ and $b$ the predictions for the PPK timing parameters in generic tensor-multi-scalar theories, one has shown that these predictions depend on several ‘layers’ of new dimensionless parameters [33]. First among these parameters one finds the 1PN parameters $\bar\beta$, $\bar\gamma$, and then the basic 2PN parameters $\epsilon$ and $\zeta$, but one also finds further parameters $\beta_3$, $(\beta\beta')$, $\beta''$, ... which would not enter usual 2PN effects. The two approaches that we have just mentioned can be viewed as generalizations of the PPN formalism.

There exist also useful generalizations to the strong-field context of the idea of considering some explicit parameter-dependent family of alternative theories of relativistic gravity. Early studies [7, 8, 39] focussed either on the one-parameter JFBD tensor-scalar theory, or on some theories which are not continuously connected to GR, such as Rosen’s bimetric theory of gravity.
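The JFBD relations quoted in Section 5.1, $\alpha_0^2 = (2\omega_{\rm BD}+3)^{-1}$ and $\bar\gamma = -2\alpha_0^2/(1+\alpha_0^2)$, are easy to invert, which makes explicit how a weak-field bound on $\bar\gamma$ translates into a bound on the single JFBD parameter. The sketch below is only illustrative: the helper names are hypothetical and the value of $\omega_{\rm BD}$ is an arbitrary example, not an experimental limit.

```python
def alpha0_sq_from_omega(omega_bd):
    """alpha_0^2 = 1 / (2*omega_BD + 3)."""
    return 1.0 / (2.0 * omega_bd + 3.0)

def gamma_bar_jfbd(alpha0_sq):
    """gamma_bar = gamma_PPN - 1 = -2*alpha_0^2 / (1 + alpha_0^2)."""
    return -2.0 * alpha0_sq / (1.0 + alpha0_sq)

def alpha0_sq_from_gamma_bar(gamma_bar):
    """Inverse of gamma_bar_jfbd: alpha_0^2 = -gamma_bar / (2 + gamma_bar)."""
    return -gamma_bar / (2.0 + gamma_bar)

omega_bd = 100.0                          # arbitrary illustrative value
a2 = alpha0_sq_from_omega(omega_bd)
g_bar = gamma_bar_jfbd(a2)
# Consistency check with the familiar form gamma_PPN = (1 + omega)/(2 + omega).
print(1.0 + g_bar, (1.0 + omega_bd) / (2.0 + omega_bd))
```

Since $\bar\gamma \simeq -2\alpha_0^2$ for small $\alpha_0^2$, a bound $|\bar\gamma| < \epsilon_\gamma$ corresponds to $\alpha_0^2 \lesssim \epsilon_\gamma/2$.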
Though the JFBD theory exhibits a marked difference from GR in that it predicts the existence of dipole radiation, it has the disadvantage that the weak-field, solar-system constraints on its unique parameter $\alpha_0^2$ are so strong that they drastically constrain (and essentially forbid) the presence of any non-radiative, strong-field deviations from GR. In view of this, it is useful to consider other ‘mini-spaces’ of alternative theories.

A two-parameter mini-space of theories, that we shall denote (footnote 13) here as $T_2(\beta', \beta'')$, was introduced in [33]. This two-parameter family of tensor-bi-scalar theories was constructed so as to have exactly the same first post-Newtonian limit as GR (i.e. $\bar\gamma = \bar\beta = \cdots = 0$), but to differ from GR in its predictions for the various observables that can be extracted from binary pulsar data. Let us give one example of this behaviour of the $T_2(\beta', \beta'')$ class of theories. For a general theory of gravity we expect to have violations of the strong equivalence principle in the sense that the ratio between the gravitational mass of a self-gravitating body and its inertial mass will admit an expansion of the type

$$\frac{m_a^{\rm grav}}{m_a^{\rm inert}} \equiv 1 + \Delta_a = 1 - \frac{1}{2}\,\eta_1 c_a + \eta_2 c_a^2 + \ldots \qquad (39)$$

where $c_a \equiv -2\,\partial \ln m_a/\partial \ln G$ measures the ‘gravitational compactness’ (or fractional gravitational binding energy, $c_a \simeq -2E_a^{\rm grav}/m_a c^2$) of body $a$. The numerical coefficient $\eta_1$ of the contribution linear in $c_a$ is a combination of the first post-Newtonian order PPN parameters, namely $\eta_1 = 4\bar\beta - \bar\gamma$ [42]. The numerical coefficient $\eta_2$ of the term quadratic in $c_a$ is a combination of the 1PN and 2PN parameters.

Footnote 12: When restricting oneself to the general class of tensor-multi-scalar theories. At the 1PN level, this restriction would imply that only the ‘directions’ $\bar\gamma$ and $\bar\beta$ are allowed.

Footnote 13: We add here an index 2 to $T$ as a reminder that this is a class of tensor-bi-scalar theories, i.e. that they contain two independent scalar fields $\varphi_1$, $\varphi_2$ besides a dynamical metric $g_{\mu\nu}$.
When working in the context of the $T_2(\beta', \beta'')$ theories, the 1PN parameters vanish exactly ($\bar\beta = 0 = \bar\gamma$) and the coefficient of the quadratic term becomes simply proportional to the theory parameter $\beta'$: $\eta_2 = B\beta'$, where $B \approx 1.026$. This example shows explicitly how binary pulsar data (here the data constraining the equivalence principle violation parameter $\Delta = \Delta_a - \Delta_b$, see above) can go beyond solar-system experiments in probing certain strong-self-gravity effects. Indeed, solar-system experiments are totally insensitive to 2PN parameters because of the smallness of $c_a \sim G m_a/c^2 R_a$ and of the structure of 2PN effects [70]. By contrast, the ‘compactness’ of neutron stars is of order $c_a \sim 0.21\, m_a/M_\odot \sim 0.3$ [33], so that the pulsar limit $|\Delta| < 5.5 \times 10^{-3}$ [44] yields, within the $T_2(\beta', \beta'')$ framework, a significant limit on the dimensionless (2PN order) parameter $\beta'$: $|\beta'| < 0.12$.

Ref. [35] introduced a new two-parameter mini-space of gravity theories, denoted here as $T_1(\alpha_0, \beta_0)$, which, from the point of view of theoretical physics, has several advantages over the $T_2(\beta', \beta'')$ mini-space mentioned above. First, it is technically simpler in that it contains only one scalar field $\varphi$ besides the metric $g_{\mu\nu}$ (hence the index 1 on $T_1(\alpha_0, \beta_0)$). Second, it contains only positive-energy excitations (while one combination of the two scalar fields of $T_2(\beta', \beta'')$ carried negative-energy waves). Third, it is the minimal way to parametrize the huge class of tensor-mono-scalar theories with a ‘coupling function’ $a(\varphi)$ satisfying some very general requirements (see below).

Let us now motivate the use of tensor-scalar theories of gravity as alternatives to general relativity.

5.3 Tensor-scalar theories of gravity

Let us start by recalling (essentially from [35]) why tensor-(mono)-scalar theories define a natural class of alternatives to GR.
First, and foremost, the existence of scalar partners to the graviton is a simple theoretical possibility which has surfaced many times in the development of unified theories, from Kaluza-Klein to superstring theory. Second, they are general enough to describe many interesting deviations from GR (both in weak-field and in strong-field conditions), but simple enough to allow one to work out their predictions in full detail.

Let us therefore consider a general tensor-scalar action involving a metric g̃µν (with signature ‘mostly plus’), a scalar field Φ, and some matter variables ψm (including gauge bosons):

$$S = \frac{c^4}{16\pi G_*}\int \frac{d^4x}{c}\,\tilde g^{1/2}\left[F(\Phi)\,\tilde R - Z(\Phi)\,\tilde g^{\mu\nu}\partial_\mu\Phi\,\partial_\nu\Phi - U(\Phi)\right] + S_m[\psi_m; \tilde g_{\mu\nu}] . \tag{40}$$

For simplicity, we assume here that the weak equivalence principle is satisfied, i.e., that the matter variables ψm are all coupled to the same ‘physical metric’ g̃µν. The general model (40) involves three arbitrary functions: a function F(Φ) coupling the scalar Φ to the Ricci scalar of g̃µν, R̃ ≡ R(g̃µν), a function Z(Φ) renormalizing the kinetic term of Φ, and a potential function U(Φ). As we have the freedom of arbitrary redefinitions of the scalar field, Φ → Φ′ = f(Φ), only two functions among F, Z and U are independent. It is often convenient to rewrite (40) in a canonical form, obtained by redefining both Φ and g̃µν according to

$$g^*_{\mu\nu} = F(\Phi)\,\tilde g_{\mu\nu} , \tag{41}$$

$$\varphi = \pm\int d\Phi\left[\frac{3}{4}\,\frac{F'^2(\Phi)}{F^2(\Phi)} + \frac{1}{2}\,\frac{Z(\Phi)}{F(\Phi)}\right]^{1/2} . \tag{42}$$

This yields

$$S = \frac{c^4}{16\pi G_*}\int \frac{d^4x}{c}\,g_*^{1/2}\left[R_* - 2\,g_*^{\mu\nu}\partial_\mu\varphi\,\partial_\nu\varphi - V(\varphi)\right] + S_m\left[\psi_m; A^2(\varphi)\,g^*_{\mu\nu}\right] , \tag{43}$$

where R∗ ≡ R(g∗µν), where the potential

$$V(\varphi) = F^{-2}(\Phi)\,U(\Phi) , \tag{44}$$

and where the conformal coupling function A(ϕ) is given by

$$A(\varphi) = F^{-1/2}(\Phi) , \tag{45}$$

with Φ(ϕ) obtained by inverting the integral (42).

The two arbitrary functions entering the canonical form (43) are: (i) the conformal coupling function A(ϕ), and (ii) the potential function V(ϕ). Note that the ‘physical metric’ g̃µν (the one measured by laboratory clocks and rods) is conformally related to the ‘Einstein metric’ g∗µν, being given by $\tilde g_{\mu\nu} = A^2(\varphi)\,g^*_{\mu\nu}$.
The canonical representation is technically useful because it decouples the two irreducible propagating excitations: the spin-0 excitations are described by ϕ, while the pure spin-2 excitations are described by the Einstein metric g∗µν (with kinetic term the usual Einstein-Hilbert action ∝ R(g∗µν)).

^14 Actually, most unified models suggest that there are violations of the weak equivalence principle. However, the study of general string-inspired tensor-scalar models [71] has found that the composition-dependent effects would be negligible in the gravitational physics of neutron stars that we consider here. The experimental limits on tests of the equivalence principle would, however, bring a strong additional constraint of order $10^{-5}\,\alpha_0^2 \sim \Delta a/a \lesssim 10^{-12}$. As this constraint is strongly model-dependent, we will not use it in our exclusion plots below. One should, however, keep in mind that a limit on the scalar coupling strength of order $\alpha_0^2 \lesssim 10^{-7}$ [71, 72] is likely to exist in many physically motivated tensor-scalar models.

In many technical developments it is useful to work with the logarithmic coupling function a(ϕ) such that:

$$a(\varphi) \equiv \ln A(\varphi)\,; \qquad A(\varphi) \equiv e^{a(\varphi)} . \tag{46}$$

In the case of the general model (40) this logarithmic^15 coupling function is given by

$$a(\varphi) = -\frac{1}{2}\,\ln F(\Phi) ,$$

where Φ(ϕ) must be obtained from (42).

In the following, we shall assume that the potential V(ϕ) is a slowly varying function of ϕ which, in the domain of variation we shall explore, is roughly equivalent to a very small mass term $V(\varphi) \sim 2\,m_\varphi^2(\varphi-\varphi_0)^2$ with $m_\varphi^2$ of cosmological order of magnitude, $m_\varphi^2 = O(H_0^2)$, or, at least, with a range $\lambda_\varphi = m_\varphi^{-1}$ much larger than the typical length scales that we shall consider (such as the size of the binary orbit, or the size of the Galaxy when considering violations of the strong equivalence principle).
Under this assumption^16 the potential function V(ϕ) will only serve the role of fixing the value of ϕ far from the system (to ϕ(r = ∞) = ϕ0), and its effect on the propagation of ϕ within the system will be negligible. In the end, the tensor-scalar phenomenology that we shall explore only depends on one function: the coupling function a(ϕ).

^15 As we shall mostly work with a(ϕ) below, we shall henceforth drop the adjective ‘logarithmic’.

^16 Note, however, that, as was recently explored in [73, 74, 75], a sufficiently fast varying potential V(ϕ) can change the tensor-scalar phenomenology by endowing ϕ with a mass term $m_\varphi^2 = \partial^2 V/\partial\varphi^2$ which strongly depends on the local value of ϕ and thereby can get large in sufficiently dense environments.

Let us consider some examples to see what kind of coupling functions might naturally arise. First, the simplest case is the Jordan-Fierz-Brans-Dicke action, which is of the general type (40) with

$$F(\Phi) = \Phi , \tag{47}$$

$$Z(\Phi) = \omega_{\rm BD}\,\Phi^{-1} , \tag{48}$$

where ωBD is an arbitrary constant. Using Eqs. (42), (45) above, one finds that −2α0 ϕ = ln Φ and that the (logarithmic) coupling function is simply

$$a(\varphi) = \alpha_0\,\varphi + {\rm const.} , \tag{49}$$

where $\alpha_0 = \mp(2\omega_{\rm BD}+3)^{-1/2}$, depending on the sign chosen in Eq. (42). Independently of this sign, one has the link

$$\alpha_0^2 = \frac{1}{2\omega_{\rm BD}+3} . \tag{50}$$

Note that 2ωBD + 3 must be positive for the spin-0 excitations to have the correct (non-ghost) sign.

Let us now discuss the often considered case of a massive scalar field having a nonminimal coupling to curvature:

$$S = \frac{c^4}{16\pi G_*}\int \frac{d^4x}{c}\,\tilde g^{1/2}\left[\tilde R - \tilde g^{\mu\nu}\partial_\mu\Phi\,\partial_\nu\Phi - m_\Phi^2\,\Phi^2 + \xi\,\tilde R\,\Phi^2\right] + S_m[\psi_m; \tilde g_{\mu\nu}] . \tag{51}$$

This is of the form (40) with

$$F(\Phi) = 1 + \xi\,\Phi^2 , \qquad Z(\Phi) = 1 , \qquad U(\Phi) = m_\Phi^2\,\Phi^2 . \tag{52}$$

The case ξ = −1/6 is usually referred to as that of ‘conformal coupling’. With the variables (51) the theory is ghost-free only if $2\,(1+\xi\Phi^2)^2\,(d\varphi/d\Phi)^2 = 1+\xi(1+6\xi)\Phi^2$ is everywhere positive. If we do not wish to restrict the initial values of Φ, we must have ξ(1+6 ξ) > 0.
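The ghost-freedom condition just quoted can be checked in two lines. The sketch below assumes that the field redefinition (42) is equivalent to $(d\varphi/d\Phi)^2 = \frac{3}{4}(F'/F)^2 + Z/(2F)$, and inserts the functions (52):

```latex
\begin{align*}
\left(\frac{d\varphi}{d\Phi}\right)^2
  &= \frac{3}{4}\,\frac{(2\xi\Phi)^2}{(1+\xi\Phi^2)^2} + \frac{1}{2\,(1+\xi\Phi^2)}
   = \frac{6\,\xi^2\Phi^2 + (1+\xi\Phi^2)}{2\,(1+\xi\Phi^2)^2} , \\[4pt]
2\,(1+\xi\Phi^2)^2\left(\frac{d\varphi}{d\Phi}\right)^2
  &= 1 + \xi\,(1+6\,\xi)\,\Phi^2 .
\end{align*}
```

The right-hand side is positive for all Φ precisely when ξ(1+6 ξ) ≥ 0, and it reduces to 1 in the conformally coupled case ξ = −1/6.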
Introducing then the notation $\chi \equiv \sqrt{\xi(1+6\,\xi)}$, we get the following link between Φ and ϕ:

$$2\sqrt{2}\,\varphi = \frac{\chi}{\xi}\,\ln\left[1 + 2\chi\Phi\left(\sqrt{1+\chi^2\Phi^2} + \chi\Phi\right)\right] + \sqrt{6}\,\ln\left[1 - 2\sqrt{6}\,\xi\Phi\,\frac{\sqrt{1+\chi^2\Phi^2} - \sqrt{6}\,\xi\Phi}{1+\xi\Phi^2}\right] . \tag{53}$$

For small values of Φ, this yields $\varphi = \Phi/\sqrt{2} + O(\Phi^3)$. The potential and the coupling functions are given by

$$V(\varphi) = \frac{m_\Phi^2\,\Phi^2}{(1+\xi\Phi^2)^2} , \tag{54}$$

$$a(\varphi) = -\frac{1}{2}\,\ln(1+\xi\Phi^2) . \tag{55}$$

These functions have singularities when 1 + ξΦ² vanishes. If we do not wish to restrict the initial value of Φ we must assume ξ > 0 (which then implies our previous assumption ξ(1+6 ξ) > 0). Then there is a one-to-one relation between Φ and ϕ over the entire real line. Small values of Φ correspond to small values of ϕ and to a coupling function

$$a(\varphi) = -\,\xi\,\varphi^2 + O(\varphi^4) . \tag{56}$$

On the other hand, large values of |Φ| correspond to large values of |ϕ|, and to a coupling function of the asymptotic form

$$a(\varphi) \simeq -\sqrt{\frac{2\,\xi}{1+6\,\xi}}\;|\varphi| + {\rm const.} \tag{57}$$

The potential V(ϕ) has a minimum at ϕ = 0, as well as other minima at ϕ → ±∞. If we assume, for instance, that $m_\Phi^2$ and the cosmological dynamics are such that the cosmological value of ϕ is currently attracted towards zero, the value of ϕ at large distances from the local gravitating systems we shall consider will be ϕ0 ≪ 1.

As a final example of a possible tensor-scalar gravity theory, let us discuss the string-motivated dilaton-runaway scenario considered in [76]. The starting action (a functional of ḡµν and Φ) was taken of the general form

$$S = \int d^4x\,\sqrt{\bar g}\left(\frac{B_g(\Phi)}{\alpha'}\,\bar R + \frac{B_\Phi(\Phi)}{\alpha'}\left[2\,\bar\Box\Phi - (\bar\nabla\Phi)^2\right] - \frac{1}{4}\,B_F(\Phi)\,\bar F^2 - V(\Phi) + \cdots\right) ,$$

and it was assumed that all the functions $B_i(\Phi)$ have a regular asymptotic behavior when Φ → +∞ of the form $B_i(\Phi) = C_i + O(e^{-\Phi})$. Under this assumption the early cosmological evolution can push Φ towards +∞ (hence the name ‘runaway dilaton’).
In the canonical, ‘Einstein frame’ representation (43), one has, for large values of Φ, Φ ≃ c ϕ, where c is a numerical constant, and the coupling function to hadronic matter is given by

$$e^{a(\varphi)} \propto \Lambda_{\rm QCD}(\varphi) \propto B_g^{-1/2}(\varphi)\,\exp\left[-8\pi^2\,b_3^{-1}\,B_F(\varphi)\right] ,$$

where $b_3$ is the one-loop rational coefficient entering the renormalization-group running of the gauge field coupling $g_F^2$. This finally yields a coupling function of the approximate form (for large values of ϕ):

$$a(\varphi) \simeq k\,e^{-c\varphi} + {\rm const.} ,$$

where the dimensionless constants k and c are both expected to be of order unity. [The constant c must be positive, but the sign of k is not a priori restricted.]

Summarizing: the JFBD model yields a coupling function which is a linear function of ϕ, Eq. (49), a nonminimally coupled scalar yields a coupling function which interpolates between a quadratic function of ϕ, Eq. (56), and a linear one, Eq. (57), and the dilaton-runaway scenario of Ref. [76] yields a coupling function of a decaying exponential type.

5.4 The role of the coupling function a(ϕ); definition of the two-dimensional space of tensor-scalar gravity theories T1(α0, β0)

Let us now discuss how the coupling function a(ϕ) enters the observable predictions of tensor-scalar gravity at the first post-Newtonian (1PN) level, i.e., in the weak-field conditions appropriate to solar-system tests. It was shown in previous work that, if one uses appropriate units in the asymptotic region far from the system, namely units such that the asymptotic value a(ϕ0) of a(ϕ) vanishes^17, all observable quantities at the 1PN level depend only on the values of the first two derivatives of a(ϕ) at ϕ = ϕ0.

^17 In these units the Einstein metric g∗µν and the physical metric g̃µν asymptotically coincide.
More precisely, if one defines

$$\alpha(\varphi) \equiv \frac{\partial a(\varphi)}{\partial\varphi}\,; \qquad \beta(\varphi) \equiv \frac{\partial\alpha(\varphi)}{\partial\varphi} = \frac{\partial^2 a(\varphi)}{\partial\varphi^2} , \tag{58}$$

and denotes by α0 ≡ α(ϕ0), β0 ≡ β(ϕ0) their asymptotic values, one finds (see, e.g., [33]) that the effective gravitational constant between two bodies (as measured by a Cavendish experiment) is given by

$$G = G_*\,(1+\alpha_0^2) , \tag{59}$$

while, among the PPN parameters, only the two basic Eddington ones, γ̄ ≡ γPPN − 1 and β̄ ≡ βPPN − 1, do not vanish, and are given by

$$\bar\gamma \equiv \gamma^{\rm PPN} - 1 = -2\,\frac{\alpha_0^2}{1+\alpha_0^2} , \tag{60}$$

$$\bar\beta \equiv \beta^{\rm PPN} - 1 = +\frac{1}{2}\,\frac{\alpha_0\,\beta_0\,\alpha_0}{(1+\alpha_0^2)^2} . \tag{61}$$

The structure of the results (60) and (61) can be transparently expressed by means of simple (Feynman-like) diagrams (see, e.g., [77]). Eqs. (59) and (60) correspond to diagrams where the interaction between two worldlines (representing two massive bodies) is mediated by the sum of the exchange of one graviton and one scalar particle. The scalar couples to matter with strength $\propto \alpha_0\sqrt{G_*}$. The exchange of a scalar excitation then leads to a term $\propto \alpha_0^2$. On the other hand, Eq. (61) corresponds to a nonlinear interaction between three worldlines involving: (i) the ‘generation’ of a scalar excitation on a first worldline (factor α0), (ii) a nonlinear vertex on a second worldline associated to the quadratic piece of a(ϕ) ($a_{\rm quad}(\varphi) = \frac{1}{2}\beta_0(\varphi-\varphi_0)^2$, so that one gets a factor β0), and (iii) the final ‘absorption’ of a scalar excitation on a third worldline (second factor α0).

Eqs. (60) and (61) can be summarized by saying that the first two coefficients in the Taylor expansion of the coupling function a(ϕ) around ϕ = ϕ0 (after setting a(ϕ0) = 0)

$$a(\varphi) = \alpha_0\,(\varphi-\varphi_0) + \frac{1}{2}\,\beta_0\,(\varphi-\varphi_0)^2 + \cdots \tag{62}$$

suffice to determine the quasi-stationary, weak-field (1PN) predictions of any tensor-scalar theory. In other words, the solar-system tests only explore the ‘osculating approximation’ (62) (slope and local curvature) to the function a(ϕ).
Note that GR corresponds to a vanishing coupling function a(ϕ) = 0 (so that α0 = β0 = · · · = 0), the JFBD model corresponds to keeping only the first term on the R.H.S. of (62), while, for instance, the nonminimally coupled scalar field (with asymptotic value ϕ0 ≪ 1) does indeed lead to nonzero values for both α0 and β0, namely

$$\alpha_0 \simeq -2\,\xi\,\varphi_0\,; \qquad \beta_0 \simeq -2\,\xi . \tag{63}$$

Finally the dilaton-runaway scenario considered above leads also to nonzero values for both α0 and β0, namely

$$\alpha_0 \simeq -k\,c\,e^{-c\varphi_0}\,; \qquad \beta_0 \simeq +k\,c^2\,e^{-c\varphi_0} , \tag{64}$$

for a largish value of ϕ0. Note that the dilaton-runaway model naturally predicts that α0 ≪ 1, and that β0 is of the same order of magnitude as α0: β0 ≃ − c α0 with c being (positive and) of order unity. The interesting outcome is that such a model is well approximated by the usual JFBD model (with β0 = 0). This shows that a JFBD-like theory could come out from a model which is initially quite different from the usual exact JFBD theory.

As we shall discuss in detail below, solar-system tests constrain $\alpha_0^2$ and $\alpha_0^2\,|\beta_0|$ to be both small. This immediately implies that |α0| must be small, i.e., that the scalar field is linearly weakly coupled to matter. On the other hand, the quadratic coupling parameter β0 is not directly constrained. Both its magnitude and its sign can be more or less arbitrary. Note that there are no a priori sign restrictions on β0. The conformal factor $A^2(\varphi) = \exp(2\,a(\varphi))$ entering Eq. (43) had to be positive, but this leads to no restrictions on the sign of a(ϕ) and of its various derivatives^18. For instance, in the nonminimally coupled scalar field case, it seemed more natural to require ξ > 0, which leads to a negative β0 in view of Eq. (63).

Let us summarize the results above: (i) the most general tensor-scalar theory^19 is described by one arbitrary function a(ϕ); and (ii) weak-field tests depend only on the first two terms, parametrized by α0 and β0, in the Taylor expansion (62) of a(ϕ) around its asymptotic value ϕ0.
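The weak-field dictionary summarized above can be sketched numerically. The following minimal example maps (α0, β0) to G/G∗, γ̄ and β̄ through Eqs. (59)-(61), taking β̄ = ½ β0 α0²/(1+α0²)², and evaluates the three example couplings of Eqs. (50), (63) and (64). All function names and numerical inputs are illustrative assumptions; the negative sign chosen for α0 in the JFBD case is one of the two allowed choices.

```python
import math


def weak_field_parameters(alpha0: float, beta0: float):
    """Return (G/G_*, gamma_bar, beta_bar) following Eqs. (59)-(61)."""
    a2 = alpha0**2
    g_ratio = 1.0 + a2
    gamma_bar = -2.0 * a2 / (1.0 + a2)
    beta_bar = 0.5 * beta0 * a2 / (1.0 + a2) ** 2
    return g_ratio, gamma_bar, beta_bar


def jfbd_couplings(omega_bd: float):
    """(alpha0, beta0) for Jordan-Fierz-Brans-Dicke, Eq. (50); sign of alpha0 is a choice."""
    return -1.0 / math.sqrt(2.0 * omega_bd + 3.0), 0.0


def nonminimal_couplings(xi: float, phi0: float):
    """(alpha0, beta0) for the nonminimally coupled scalar at small phi0, Eq. (63)."""
    return -2.0 * xi * phi0, -2.0 * xi


def dilaton_runaway_couplings(k: float, c: float, phi0: float):
    """(alpha0, beta0) for the dilaton-runaway coupling function, Eq. (64)."""
    decay = math.exp(-c * phi0)
    return -k * c * decay, k * c**2 * decay


# Illustrative (assumed) parameter values.
examples = {
    "JFBD, omega_BD = 4e4": jfbd_couplings(4.0e4),
    "nonminimal, xi = 0.5, phi0 = 0.01": nonminimal_couplings(0.5, 0.01),
    "dilaton runaway, k = c = 1, phi0 = 5": dilaton_runaway_couplings(1.0, 1.0, 5.0),
}
for label, (alpha0, beta0) in examples.items():
    g_ratio, gamma_bar, beta_bar = weak_field_parameters(alpha0, beta0)
    print(f"{label}: gamma_bar={gamma_bar:.3e}, beta_bar={beta_bar:.3e}")
```

In the JFBD case the sketch reproduces the familiar relation γ̄ = −1/(ωBD + 2), since α0² = 1/(2ωBD + 3).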
From this follows a rather natural way to define a simple mini-space of tensor-scalar theories. It suffices to consider the two-dimensional space of theories, say T1(α0, β0), defined by the coupling function which is a quadratic polynomial in ϕ [34, 35], say

$$a_{\alpha_0,\beta_0}(\varphi) = \alpha_0\,(\varphi-\varphi_0) + \frac{1}{2}\,\beta_0\,(\varphi-\varphi_0)^2 . \tag{65}$$

As indicated, this class of theories depends only on two parameters: α0 and β0. The asymptotic value ϕ0 of ϕ does not count as a third parameter (when using the form (65)) because one can always work with the shifted field ϕ̄ ≡ ϕ − ϕ0, with asymptotic value ϕ̄0 = 0 and coupling function $a_{\alpha_0,\beta_0}(\bar\varphi) = \alpha_0\,\bar\varphi + \frac{1}{2}\,\beta_0\,\bar\varphi^2$. Moreover, as already said, the asymptotic value a(ϕ0) of a(ϕ) also has no physical meaning, because one can always use units such that it vanishes (as done in (65)).

^18 As explained above, we assume here the presence of a potential term V(ϕ) to fix the asymptotic value ϕ0 of ϕ. If the potential V(ϕ) is absent (or negligible), the ‘attractor mechanism’ of Refs. [78, 71] would attract ϕ to a minimum of the coupling function a(ϕ), thereby favoring a positive value of β0.

^19 Under the assumption that the potential V(ϕ) is a slowly-varying function of ϕ, which modifies the propagation of ϕ only on very large scales.

Note also that an alternative way to represent the same class of theories is to use a coupling function of the very simple form

$$a_\beta(\varphi) = \frac{1}{2}\,\beta\,\varphi^2 , \tag{66}$$

but to keep the asymptotic value ϕ0 as an independent parameter. This class of theories is clearly equivalent to T1(α0, β0), Eq. (65), with the dictionary: α0 = β ϕ0, β0 = β.

5.5 Tensor-scalar gravity, strong-field effects, and binary-pulsar observables

Having chosen some mini-space of gravity theories, we now wish to derive what predictions these theories make for the timing observables of binary pulsars. To do this we need to generalize the general relativistic treatment of the motion and timing of binary systems comprising strongly self-gravitating bodies summarized above.
Let us recall that this treatment was based on a multi-chart method, using a matching between two separate problems: (i) the ‘internal problem’ considers each strongly self-gravitating body in a suitable approximately freely falling frame where the influence of its companion is small, and (ii) the ‘external problem’ where the two bodies are described as effective point masses which interact via the various fields they are coupled to.

Let us first consider the internal problem, i.e., the description of a neutron star in an approximately freely falling frame where the influence of the companion is reduced to imposing some boundary conditions on the tensor and scalar fields with which it interacts [7, 8, 33, 34, 35]. The field equations of a general tensor-scalar theory, as derived from the canonical action (43) (neglecting the effect of V(ϕ)) read

$$R^*_{\mu\nu} = 2\,\partial_\mu\varphi\,\partial_\nu\varphi + \frac{8\pi G_*}{c^4}\left(T^*_{\mu\nu} - \frac{1}{2}\,T^*\,g^*_{\mu\nu}\right) , \tag{67}$$

$$\Box_{g_*}\varphi = -\frac{4\pi G_*}{c^4}\,\alpha(\varphi)\,T_* , \tag{68}$$

where $T_*^{\mu\nu} \equiv 2c\,(g_*)^{-1/2}\,\delta S_m/\delta g^*_{\mu\nu}$ denotes the material stress-energy tensor in ‘Einstein units’, and α(ϕ) the ϕ-derivative of the coupling function, see Eq. (58). All tensorial operations in Eqs. (67) and (68) are performed by using the Einstein metric g∗µν.

^20 We henceforth use the labels A and B for the (recycled) pulsar and its companion, instead of the labels a and b used above. We henceforth use the label a to denote the asymptotic value of some quantity (at large radial distances within the local frame, $X_A^i$ or $X_B^i$, of the considered neutron star A or B).

Explicitly writing the field equations (67) and (68) for a slowly rotating (stationary, axisymmetric) neutron star, labelled^20 A, leads to a coupled set of ordinary differential equations constraining the radial dependence of g∗µν and ϕ [35, 79]. Imposing the boundary conditions g∗µν → ηµν, ϕ → ϕa at large radial distances finally determines the crucial ‘form factors’ (in Einstein units) describing the effective coupling between the neutron star A and the fields to
which it is sensitive: total mass mA(ϕa), total scalar charge ωA(ϕa), and inertia moment IA(ϕa). As indicated, these quantities are functions of the asymptotic value ϕa of ϕ felt by the considered neutron star^21. They satisfy the relation ωA(ϕa) = −∂mA(ϕa)/∂ϕa. From them, one defines other quantities that play an important role in binary pulsar physics, notably

$$\alpha_A(\varphi_a) \equiv -\frac{\omega_A}{m_A} \equiv \frac{\partial \ln m_A}{\partial\varphi_a} , \tag{69}$$

$$\beta_A(\varphi_a) \equiv \frac{\partial\alpha_A}{\partial\varphi_a} , \tag{70}$$

as well as

$$k_A(\varphi_a) \equiv -\frac{\partial \ln I_A}{\partial\varphi_a} . \tag{71}$$

The quantity αA, Eq. (69), plays a crucial role. It measures the effective coupling strength between the neutron star and the ambient scalar field. If we formally let the self-gravity of the neutron star A tend toward zero (i.e., if we consider a weakly self-gravitating object), the function αA(ϕa) becomes replaced by α(ϕa), where α(ϕ) ≡ ∂a(ϕ)/∂ϕ is the coupling strength appearing in the R.H.S. of Eq. (68). Roughly speaking, we can think of αA(ϕa) as a (suitably defined) average value of the local coupling strength α(ϕ(r)) over the radial profile of the neutron star.

[Figure 2 plot: scalar charge of the neutron star versus its baryonic mass; annotations mark the critical mass, the maximum mass, and the maximum mass in GR.]

Figure 2: Dependence upon the baryonic mass m̄A of the coupling parameter αA in the theory T1(α0, β0) with α0 = −0.014, β0 = −6. Figure taken from [80].

^21 This ϕa is a combination of the cosmological background value ϕ0 and of the scalar influence of the companion of the considered neutron star. It varies with the orbital period and is determined as part of the ‘external problem’ discussed below. Note that, strictly speaking, the label a (for asymptotic) should be indexed by the label of the considered neutron star: i.e. one should use a label $a_A$ (and a locally asymptotic value $\varphi_{a_A}$) when considering the neutron star A, and a label $a_B$ (with a corresponding $\varphi_{a_B}$) when considering the neutron star B.

It was pointed out in Refs. [34, 35] that the strong self-gravity of a neutron star can cause the effective coupling strength αA(ϕa) to become of order unity,
even when its weak-field counterpart α0 = α(ϕa) is extremely small (as is implied by solar-system tests that put strong constraints on the PPN combination $\bar\gamma = -2\alpha_0^2/(1+\alpha_0^2)$). This is illustrated, in the minimal context of the T1(α0, β0) class of theories, in Figure 2. Note that when the baryonic mass m̄A of the neutron star is smaller than the critical mass $\bar m_{\rm cr} \simeq 1.24\,M_\odot$ the effective scalar coupling strength αA of the star is quite small (because it is proportional to its weak-field limit α0 = α(ϕa)). By contrast, when m̄A > m̄cr, |αA| becomes of order unity, nearly independently of the externally imposed α0 = αa = α(ϕa). This interesting non-perturbative behaviour was related in [34, 35] to a mechanism of spontaneous scalarization, akin to the well-known mechanism of spontaneous magnetization of ferromagnets. See also [51] for a simple analytical description of the behaviour of αA. Let us also mention in passing that, in the case where A is a black hole, the effective coupling strength αA actually vanishes [33]. This result is related to the impossibility of having (regular) ‘scalar hair’ on a black hole.

We have sketched above the first part of the matching approach to the motion and timing of strongly self-gravitating bodies: the ‘internal problem’. It remains to describe the ‘external problem’. As already mentioned (and emphasized, in the present context, by Eardley [7, 11]), the most efficient way to describe the external problem is, instead of matching in detail the external fields (g∗µν, ϕ) to the fields generated by each body in its comoving frame, to ‘skeletonize’ the bodies by point masses. Technically this means working with the action

$$S = \frac{c^4}{16\pi G_*}\int \frac{d^Dx}{c}\,g_*^{1/2}\left[R_* - 2\,g_*^{\mu\nu}\partial_\mu\varphi\,\partial_\nu\varphi\right] - \sum_A c\int m_A(\varphi(z_A))\left(-g^*_{\mu\nu}(z_A)\,dz_A^\mu\,dz_A^\nu\right)^{1/2} , \tag{72}$$

where the function mA(ϕ) in the last term on the R.H.S. is the function mA(ϕa) obtained above by solving the internal problem. Eq.
(72) indicates that the argument of this function is taken to be ϕa = ϕ(zA), i.e., the value that the scalar field (as viewed in the external problem) takes at the location $z_A^\mu$ of the center of mass of body A. However, as body A is described, in the external problem, as a point mass this causes a technical difficulty: the externally determined field ϕ(x) becomes formally singular at the location of the point sources, so that ϕ(zA) is a priori undefined. One can either deal with this problem by coming back to the physically well-defined matching approach (which shows that ϕ(zA) should be replaced by ϕa, the value of ϕ in an intermediate domain RA ≪ r ≪ |zA − zB|), or use the efficient technique of dimensional regularization. This means that the spacetime dimension D in Eq. (72) is first taken to have a complex value such that ϕ(zA) is finite, before being analytically continued to its physical value D = 4.

One then derives from the action (72) two important consequences for the motion and timing of binary pulsars. First, one derives the Lagrangian describing the relativistic interaction between N strongly self-gravitating bodies (including orbital $\sim (v/c)^2$ effects, and neglecting $O(v^4/c^4)$ ones) [11, 7, 33, 39]. It is the sum of one-body, two-body and three-body terms.

The one-body action has the usual form of the sum (over the label A) of the kinetic term of each point mass:

$$L_A^{\rm one\text{-}body} = -m_A\,c^2\,\sqrt{1 - v_A^2/c^2} = -m_A\,c^2 + \frac{1}{2}\,m_A\,v_A^2 + \frac{1}{8\,c^2}\,m_A\,(v_A^2)^2 + \cdots . \tag{73}$$

Here, we use Einstein units, and the inertial mass mA entering Eq. (73) is mA ≡ mA(ϕ0), where ϕ0 is the asymptotic value of ϕ far away from the considered N-body system.
The two-body action is a sum over the pairs A, B of a term $L_{AB}^{\rm 2\text{-}body}$ which differs from the GR-predicted 2-body Lagrangian in two ways: (i) the usual gravitational constant G appearing as an overall factor in $L_{AB}^{\rm 2\text{-}body}$ must be replaced by an effective (body-dependent) gravitational constant (in the appropriate units mentioned above) given by

$$G_{AB} = G_*\,(1 + \alpha_A\,\alpha_B) , \tag{74}$$

and (ii) the relativistic ($O(v^2/c^2)$) terms in $L_{AB}^{\rm 2\text{-}body}$ contain, in addition to those predicted by GR, new velocity-dependent terms of the form

$$\delta^{\bar\gamma} L_{AB}^{\rm 2\text{-}body} = (\bar\gamma_{AB})\,\frac{G_{AB}\,m_A\,m_B}{r_{AB}}\,\frac{(v_A - v_B)^2}{c^2} , \tag{75}$$

with

$$\bar\gamma_{AB} \equiv \gamma_{AB} - 1 = -2\,\frac{\alpha_A\,\alpha_B}{1 + \alpha_A\,\alpha_B} . \tag{76}$$

In these expressions αA ≡ αA(ϕ0) ≡ ∂ ln mA(ϕ0)/∂ϕ0 (see Eq. (69) with ϕa → ϕ0).

Finally, the 3-body action is a sum over the pairs B, C and over A (with A ≠ B, A ≠ C, but the possibility of having B = C) of

$$L_{ABC}^{\rm 3\text{-}body} = -\left(1 + 2\,\bar\beta^A_{BC}\right)\frac{G_{AB}\,G_{AC}\,m_A\,m_B\,m_C}{c^2\,r_{AB}\,r_{AC}} , \tag{77}$$

where

$$\bar\beta^A_{BC} \equiv \beta^A_{BC} - 1 = \frac{1}{2}\,\frac{\alpha_B\,\beta_A\,\alpha_C}{(1 + \alpha_A\,\alpha_B)(1 + \alpha_A\,\alpha_C)} , \tag{78}$$

with βA = ∂αA(ϕ0)/∂ϕ0 (see Eq. (70) with ϕa → ϕ0).

When comparing the strong-field results (74), (76), (78) to their weak-field counterparts (59), (60), (61) one sees that the body-dependent quantity αA replaces the weak-field coupling strength α0 in all quantities which are linked to a scalar effect generated by body A. Note also that, in keeping with the ‘3-body’ nature of Eq. (77), the quantity $\beta^A_{BC} - 1$ is linked to scalar interactions which are generated in bodies B and C and which nonlinearly interact on body A.

The notation used above has been chosen to emphasize that $\gamma_{AB}$ and $\beta^A_{BC}$ are strong-field analogs of the usual Eddington parameters γPPN, βPPN, so that $\bar\gamma_{AB}$ and $\bar\beta^A_{BC}$ are strong-field analogs of the ‘post-Einstein’ 1PN parameters γ̄ and β̄ (which vanish in GR). Indeed the usual PPN results for the post-Einstein terms in the $O(1/c^2)$ 2-body and 3-body Lagrangians are obtained by replacing in Eqs. (75) and (77) $\bar\gamma_{AB} \to \bar\gamma$, $\bar\beta^A_{BC} \to \bar\beta$ and $G_{AB} \to G$.
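The body-dependent quantities (74), (76) and (78) are simple algebraic functions of the strong-field couplings, so their size is easy to explore. The sketch below assumes the ½ normalization of β̄ that matches the weak-field result (61); the function names and the sample couplings are illustrative assumptions.

```python
def g_ab_over_g_star(alpha_a: float, alpha_b: float) -> float:
    """Effective gravitational constant G_AB / G_*, Eq. (74)."""
    return 1.0 + alpha_a * alpha_b


def gamma_bar_ab(alpha_a: float, alpha_b: float) -> float:
    """Strong-field analog of gamma_PPN - 1, Eq. (76)."""
    return -2.0 * alpha_a * alpha_b / (1.0 + alpha_a * alpha_b)


def beta_bar_a_bc(alpha_a: float, beta_a: float, alpha_b: float, alpha_c: float) -> float:
    """Strong-field analog of beta_PPN - 1 acting on body A, Eq. (78)."""
    denominator = (1.0 + alpha_a * alpha_b) * (1.0 + alpha_a * alpha_c)
    return 0.5 * alpha_b * beta_a * alpha_c / denominator


# Weak-field check: replacing every body-dependent coupling by (alpha0, beta0)
# should reproduce Eqs. (60) and (61).
alpha0, beta0 = -0.014, -6.0
print(gamma_bar_ab(alpha0, alpha0), beta_bar_a_bc(alpha0, beta0, alpha0, alpha0))

# Illustrative strongly scalarized pair (assumed couplings of order unity).
print(gamma_bar_ab(0.6, 0.6))
```

With couplings of order 0.6 the deviation of γAB from unity is of order one half, which is far larger than anything allowed for γ̄ by solar-system tests.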
The non-perturbative strong-field effects discussed above show that the strong self-gravity of neutron stars can cause $\gamma_{AB}$ and $\beta^A_{BC}$ to be significantly different from their GR values γGR = 1, βGR = 1, in some scalar-tensor theories having a small value of the basic coupling parameter α0 (so that $\gamma^{\rm PPN} - 1 \propto \alpha_0^2$ and $\beta^{\rm PPN} - 1 \propto \beta_0\,\alpha_0^2$ are both small). For instance, Fig. 2 shows that it is possible to have αA ∼ αB ∼ ± 0.6, which implies γAB − 1 ∼ − 0.53, i.e., a 50% deviation from GR! Even larger effects can arise in $\beta^A_{BC} - 1$ because of the large values that βA = ∂αA/∂ϕ0 can reach near the spontaneous scalarization transition [35].

Those possible strong-field modifications of the effective Eddington parameters $\gamma_{AB}$, $\beta^A_{BC}$, which parametrize the ‘first post-Keplerian’ (1PK) effects (i.e., the orbital effects $\sim v^2/c^2$ smaller than those entailed by the Lagrangian $L_0 = \sum_A \frac{1}{2}\,m_A v_A^2 + \frac{1}{2}\sum_{A\neq B} G_{AB}\,m_A m_B/r_{AB}$), can then significantly modify the usual GR predictions relating the directly observable parametrized post-Keplerian (PPK) parameters to the values of the masses of the pulsar and its companion. As worked out in Refs. [11, 31, 33, 35] one finds the following modified predictions for the PPK parameters k ≡ 〈ω̇〉/n, r and s:

$$k^{\rm th}(m_A,m_B) = \frac{3}{1-e^2}\left(\frac{G_{AB}(m_A+m_B)\,n}{c^3}\right)^{2/3}\left[\frac{1-\frac{1}{3}\,\alpha_A\alpha_B}{1+\alpha_A\alpha_B} - \frac{X_A\,\beta_B\,\alpha_A^2 + X_B\,\beta_A\,\alpha_B^2}{6\,(1+\alpha_A\alpha_B)^2}\right] , \tag{79}$$

$$r^{\rm th}(m_A,m_B) = \frac{G_{0B}\,m_B}{c^3} , \tag{80}$$

$$s^{\rm th}(m_A,m_B) = \frac{n\,x_A}{X_B}\left[\frac{G_{AB}(m_A+m_B)\,n}{c^3}\right]^{-1/3} . \tag{81}$$

^22 In the double binary pulsar, both the first discovered pulsar and its companion are pulsars. However, the companion B is a non-recycled, slow pulsar whose motion is well described by Keplerian parameters only.

Here, the label A refers to the object which is timed (‘the pulsar’^22), the label B refers to its companion, xA = aA sin i/c denotes the projected semi-major axis of the orbit of A (in light seconds), XA ≡ mA/(mA + mB) and XB ≡ mB/(mA + mB) = 1 − XA the mass ratios, n ≡ 2π/Pb the orbital frequency and G0B = G∗(1 + α0 αB) the effective gravitational constant measuring the interaction
between B and a test object (namely electromagnetic waves on their way from the pulsar toward the Earth). In addition one must replace the unknown bare Newtonian G∗ by its expression in terms of the one measured in Cavendish experiments, i.e., $G_* = G/(1+\alpha_0^2)$ as deduced from Eq. (59).

The modified theoretical prediction for the PPK parameter γ entering the ‘Einstein time delay’ ∆E, Eq. (24), is more complicated to derive because one must take into account the modulation of the proper spin period of the pulsar caused by the variation of its moment of inertia IA under the (scalar) influence of its companion [11, 7, 35]. This leads to

$$\gamma^{\rm th}(m_A,m_B) = \frac{e}{n}\,\frac{X_B}{1+\alpha_A\alpha_B}\left(\frac{G_{AB}(m_A+m_B)\,n}{c^3}\right)^{2/3}\left[X_B\,(1+\alpha_A\alpha_B) + 1 + k_A\,\alpha_B\right] , \tag{82}$$

where kA(ϕ0) = −∂ ln IA(ϕ0)/∂ϕ0 (see Eq. (71) with ϕa → ϕ0). Numerical studies [35] show that kA can take quite large values. Actually, the quantity kA αB entering (82) blows up near the scalarization transition when α0 → 0 (keeping β0 < 0 fixed). In other words, a theory which is closer to GR in weak-field conditions predicts larger deviations in the strong-field regime.

The structure dependence of the effective gravitational constant GAB, Eq. (74), also has the consequence that the object A does not fall in the same way as B in the gravitational field of the Galaxy. As most of the mass of the Galaxy is made of non-strongly-self-gravitating bodies, A will fall toward the Galaxy with an acceleration ∝ GA0, while B will fall with an acceleration ∝ GB0. Here, as above, GA0 = G0A = G∗(1 + α0 αA) is the effective gravitational constant between A and any weakly self-gravitating body. As pointed out in Ref. [43] this possible violation of the universality of free fall of self-gravitating bodies can be constrained by using observational data on the class of small-eccentricity long-orbital-period binary pulsars.
More precisely, the quantity which can be observationally constrained is not exactly the violation $\Delta_{AB} = (G_{0A} - G_{0B})/G = (1+\alpha_0^2)^{-1}(\alpha_0\alpha_A - \alpha_0\alpha_B)$ of the strong equivalence principle [which simplifies to $\Delta_{A0} = (G_{0A} - G)/G = (1+\alpha_0^2)^{-1}(\alpha_0\alpha_A - \alpha_0^2)$ in the case of observational relevance where one neglects the self-gravity of the white-dwarf companion] but rather^23 [33]

$$\Delta_{\rm effective} \equiv \left(\frac{2\,\gamma_{AB} - (X_A\,\beta^B_{AA} + X_B\,\beta^A_{BB}) + 2}{3}\right)^{-1}(1+\alpha_A\alpha_B)^{-3/2}\,(1+\alpha_0^2)^{-1}\,(\alpha_0\alpha_A - \alpha_0\alpha_B) . \tag{83}$$

Here, the index B (= white-dwarf companion) can be replaced by 0 (weakly self-gravitating body) so that, for instance, $\gamma_{AB} = \gamma_{A0} = 1 - 2\alpha_A\alpha_0/(1+\alpha_A\alpha_0) = (1-\alpha_A\alpha_0)/(1+\alpha_A\alpha_0)$, as deduced from Eq. (76).

^23 This refinement is given here for pedagogical completeness. However, in practice, the lowest-order result $\Delta \simeq (1+\alpha_0^2)^{-1}(\alpha_0\alpha_A - \alpha_0^2) \simeq \alpha_0\alpha_A - \alpha_0^2$ is accurate enough.

It remains to discuss the possible strong-field modifications of the theoretical prediction for the orbital period derivative $\dot P_b = \dot P_b^{\rm th}(m_A,m_B)$. This is obtained by deriving from the effective action (72) the energy lost by the binary system in the form of fluxes of spin-2 and spin-0 waves at infinity. The needed results in a generic tensor-scalar theory were derived in Refs. [33, 39] (in addition one must take into account the tensor-scalar modification of the additional ‘varying-Doppler’ contribution to the observed Ṗb due to the Galactic acceleration [38]). The final result for Ṗb is of the form

$$\dot P_b^{\rm th}(m_A,m_B) = \dot P_{b\varphi}^{\rm monopole} + \dot P_{b\varphi}^{\rm dipole} + \dot P_{b\varphi}^{\rm quadrupole} + \dot P_{b g_*}^{\rm quadrupole} + \dot P_{b\,{\rm GR}}^{\rm galactic} + \delta^{\rm th}\dot P_b^{\rm galactic} , \tag{84}$$

where, for instance, $\dot P_{b\varphi}^{\rm monopole}$ is (heuristically^24) related to the monopolar flux of spin-0 waves at infinity. The term $\dot P_{b g_*}^{\rm quadrupole}$ corresponds to the usual quadrupolar flux of spin-2 waves at infinity.
It reads:

$$\dot P_{b g_*}^{\rm quadrupole}(m_A,m_B) = -\frac{192\,\pi}{5\,(1+\alpha_A\alpha_B)}\,\frac{m_A\,m_B}{(m_A+m_B)^2}\left(\frac{G_{AB}(m_A+m_B)\,n}{c^3}\right)^{5/3}\frac{1 + 73\,e^2/24 + 37\,e^4/96}{(1-e^2)^{7/2}} , \tag{85}$$

with $G_{AB} = G_*(1+\alpha_A\alpha_B) = G\,(1+\alpha_A\alpha_B)/(1+\alpha_0^2)$, where G∗ is the ‘bare’ gravitational constant appearing in the action, while G is the gravitational constant measured in Cavendish experiments. The flux (85) is the only one which survives in GR (although without any αA-related modifications). Among the several other contributions which arise in tensor-scalar theories, let us only write down the explicit expression of the contribution to (84) coming from the dipolar flux of scalar waves. Indeed, this contribution is, in most cases, the dominant one [7] because it scales as $(v/c)^3$, while the monopolar and quadrupolar contributions scale as $(v/c)^5$. It reads

$$\dot P_{b\varphi}^{\rm dipole}(m_A,m_B) = -2\pi\,\frac{G_*\,m_A\,m_B\,n}{c^3\,(m_A+m_B)}\,\frac{1+e^2/2}{(1-e^2)^{5/2}}\,(\alpha_A - \alpha_B)^2 . \tag{86}$$

Note that the dipolar effect (86) vanishes when αA = αB. Indeed, a binary system made of two identical objects (A = B) cannot select a preferred direction for a dipole vector, and cannot therefore emit any dipolar radiation. This also implies that double neutron star systems (which tend to have $m_A \approx m_B \sim 1.35\,M_\odot$) will be rather poor emitters of dipolar radiation (though (86) still tends to dominate over the other terms in (84), because of the remaining difference (mA − mB)/(mA + mB) ≠ 0). By contrast, very dissymmetric systems such as a neutron-star and a white-dwarf (or a neutron-star and a black hole) will be very efficient emitters of dipolar radiation, and will potentially lead to very strong constraints on tensor-scalar theories. See below.

^24 Contrary to the GR case where a lot of effort was spent to show how the observed Ṗb was directly related to the GR predictions for the $(v/c)^5$-accurate orbital equations of motion of a binary system [9], we use here the indirect and less rigorous argument that the energy flux at infinity should be balanced by a corresponding decrease of the mechanical energy of the binary system.
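To illustrate why dissymmetric systems are such efficient probes, the following minimal sketch evaluates the spin-2 quadrupole term (85) and the scalar dipole term (86) in SI units. The function names, the physical constants as typed here, and the sample system (masses, period, eccentricity, couplings) are illustrative assumptions; in particular the value of αA is invented for the example and is not a fitted or published number.

```python
import math

G_NEWTON = 6.67430e-11    # m^3 kg^-1 s^-2, Cavendish value G
C_LIGHT = 299_792_458.0   # m/s
M_SUN = 1.98847e30        # kg


def pbdot_quadrupole_spin2(m_a, m_b, p_b, ecc, alpha_a=0.0, alpha_b=0.0, alpha_0=0.0):
    """Spin-2 quadrupolar term of Eq. (85); masses in kg, orbital period in s."""
    n = 2.0 * math.pi / p_b
    m_tot = m_a + m_b
    g_ab = G_NEWTON * (1.0 + alpha_a * alpha_b) / (1.0 + alpha_0**2)
    ecc_factor = (1.0 + 73.0 * ecc**2 / 24.0 + 37.0 * ecc**4 / 96.0) / (1.0 - ecc**2) ** 3.5
    return (-192.0 * math.pi / (5.0 * (1.0 + alpha_a * alpha_b))
            * m_a * m_b / m_tot**2
            * (g_ab * m_tot * n / C_LIGHT**3) ** (5.0 / 3.0)
            * ecc_factor)


def pbdot_dipole_scalar(m_a, m_b, p_b, ecc, alpha_a, alpha_b, alpha_0=0.0):
    """Scalar dipolar term of Eq. (86); masses in kg, orbital period in s."""
    n = 2.0 * math.pi / p_b
    g_star = G_NEWTON / (1.0 + alpha_0**2)
    ecc_factor = (1.0 + ecc**2 / 2.0) / (1.0 - ecc**2) ** 2.5
    return (-2.0 * math.pi * g_star * m_a * m_b * n / (C_LIGHT**3 * (m_a + m_b))
            * ecc_factor * (alpha_a - alpha_b) ** 2)


# Hypothetical neutron-star / white-dwarf binary with assumed couplings.
m_a, m_b = 1.3 * M_SUN, 1.0 * M_SUN
p_b, ecc = 0.2 * 86400.0, 0.17
alpha_0 = 1.0e-3
alpha_a, alpha_b = 0.05, alpha_0   # the white dwarf is taken as weakly self-gravitating

quad = pbdot_quadrupole_spin2(m_a, m_b, p_b, ecc, alpha_a, alpha_b, alpha_0)
dip = pbdot_dipole_scalar(m_a, m_b, p_b, ecc, alpha_a, alpha_b, alpha_0)
print(f"quadrupole (spin-2): {quad:.3e}")
print(f"dipole (spin-0):     {dip:.3e}")
print(f"ratio dipole/quadrupole: {dip / quad:.3e}")
```

Setting all couplings to zero reduces the first function to the standard GR quadrupole formula, and the second function returns zero whenever αA = αB, in line with the symmetry argument given above.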
5.6 Theory-space analyses of binary pulsar data

Having reviewed the theoretical results needed to discuss the predictions of alternative gravity theories, let us end by summarizing the results of various theory-space analyses of binary pulsar data.

Let us first recall what are the best, current solar-system limits on the two 1PN ‘post-Einstein’ parameters γ̄ ≡ γPPN − 1 and β̄ ≡ βPPN − 1. They are:

$$\bar\gamma = (2.1 \pm 2.3)\times 10^{-5} , \tag{87}$$

from frequency shift measurements made with the Cassini spacecraft [81], which supersedes the constraint

$$\bar\gamma = (-1.7 \pm 4.5)\times 10^{-4} \tag{88}$$

from VLBI measurements [82],

$$|2\,\bar\gamma - \bar\beta| < 3\times 10^{-3} , \tag{89}$$

from Mercury’s perihelion shift [66, 83], and

$$4\,\bar\beta - \bar\gamma = (4.4 \pm 4.5)\times 10^{-4} , \tag{90}$$

from Lunar laser ranging measurements [84].

Concerning binary pulsar data, we can make use of the published measurements of various Keplerian and post-Keplerian timing parameters in the binary pulsars: PSR 1913+16 [37], PSR B1534+12 [41], PSR J1141−6545 [47] and PSR J0737−3039A+B [3, 48, 49]. In addition, we can use^25 the recently updated limit on the parameter ∆ measuring a possible violation of the strong equivalence principle (SEP), namely $|\Delta| < 5.5\times 10^{-3}$ at the 95% confidence level [44].

This ensemble of solar-system and binary-pulsar data can then be analyzed within any given parametrized theoretical framework. For instance, one might work within

(i) the 4-parameter framework T0(γ̄, β̄; ε, ζ) [70] which defines the 2PN extension of the original (Eddington) PPN framework T0(γ̄, β̄); or

(ii) the 2-parameter class of tensor-mono-scalar theories T1(α0, β0) [34]; or

(iii) the 2-parameter class of tensor-bi-scalar theories T2(β′, β′′) [33].

^25 There is, however, a caveat in the theoretical use one can make of the phenomenological limits on ∆. Indeed, in the small-eccentricity long-orbital-period binary pulsar systems used to constrain ∆ one does not have access to enough PK parameters to measure the pulsar mass mA directly. As the theoretical expression of $\Delta \simeq \alpha_0\alpha_A - \alpha_0^2$ depends on mA (through αA), one needs to assume some fiducial value of mA (say $m_A \simeq 1.35\,M_\odot$).

Here, the index 0 on T0(γ̄, β̄; ε, ζ) is a reminder of the fact that this framework is not a family of specific theories (it contains zero explicit dynamical fields), but is a parametrization of 2PN deviations from GR. As a consequence, its use for analyzing binary pulsar data is somewhat ill-defined because one needs to truncate the various timing observables (which are functions of the compactness of the two bodies A and B, say $P^{\rm PK} = f(c_A, c_B)$) at the 2PN order (i.e. essentially at the quadratic order in $c_A$ and/or $c_B$). For some observables (or for products of observables) there might be several ways of defining this truncation. In spite of this slight inconvenience, the use of the T0(γ̄, β̄; ε, ζ) framework is conceptually useful because it shows very clearly why and how binary-pulsar data can probe the behaviour of gravitational theories beyond the usual 1PN regime probed by solar-system tests. For instance, the parameter $\Delta_A \equiv m_A^{\rm grav}/m_A^{\rm inert} - 1$ measuring the strong equivalence principle (SEP) violation in a neutron star has, within the T0(γ̄, β̄; ε, ζ) framework, a 2PN-order expansion of the form [33, 70]

$$\Delta_A = -\frac{1}{2}\,(4\,\bar\beta - \bar\gamma)\,c_A + \left(\frac{\epsilon}{2} + \zeta + O(\bar\beta)\right) b_A , \tag{91}$$

where $c_A = -2\,\partial\ln m_A/\partial\ln G \simeq \frac{1}{c^2}\langle U\rangle_A$, $b_A = \frac{1}{c^4}\langle U^2\rangle_A \simeq B\,c_A^2$, with B ≃ 1.026 and $c_A \simeq k\,m_A/M_\odot$ with k ∼ 0.21. The general result (91) is compatible with the result quoted in subsection 5.2 within the context of the theory T2(β′, β′′) when taking into account the fact that, within T2(β′, β′′), one has β̄ = γ̄ = 0, ε = β′ and ζ = 0 [and that β′′ parametrizes some effects beyond the 2PN level]. On the example of Eq.
(91) one sees that, after having used solar-system tests to constrain the first contribution on the RHS to a very small value, one can use binary-pulsar tests of the SEP to set a significant limit on the combination ε + ζ of 2PN parameters. Other pulsar data then yield significant limits on other combinations of the two 2PN parameters ε and ζ. The final conclusion is that binary-pulsar data allow one to set significant limits (around or better than the 1% level) on the possible 2PN deviations from GR (in contrast to solar-system tests which are unable to yield any limit on ε and ζ) [70]. For a recent update of the limits on ε and ζ, which makes use of recent pulsar data, see [51].

Let us now briefly discuss the use of mini-spaces of theories, such as T1(α0, β0) or T2(β′, β′′), for analyzing solar-system and binary-pulsar data. The basic methodology is to compute, for each given theory (e.g. for each given pair of values of α0 and β0 if one chooses to work in the T1(α0, β0) theory space) a goodness-of-fit statistic χ²(α0, β0) measuring the quality of the agreement between the experimental data and the considered theory. For instance, when considering the timing data of a particular pulsar, for which one has measured several PK parameters $p_i$ (i = 1, . . . , n) with some standard deviations $\sigma^{\rm obs}_{p_i}$, one defines, for this pulsar

$$\chi^2(\alpha_0, \beta_0) = \min_{m_A, m_B} \sum_{i=1}^{n} \left(\sigma^{\rm obs}_{p_i}\right)^{-2} \left( p_i^{\rm theory}(\alpha_0, \beta_0; m_A, m_B) - p_i^{\rm obs} \right)^2 , \qquad (92)$$

where ‘min’ denotes the result of minimizing over the unknown masses mA, mB and where $p_i^{\rm theory}(\alpha_0, \beta_0; m_A, m_B)$ denotes the theoretical prediction (within T1(α0, β0)) for the PK observable $p_i$ (given also the observed values of the Keplerian parameters). The goodness-of-fit quantity χ²(α0, β0) will reach its minimum $\chi^2_{\min}$ for some values, say $\alpha_0^{\min}$, $\beta_0^{\min}$, of α0 and β0. Then, one focusses, for each pulsar, on the level contours of the function

$$\Delta\chi^2(\alpha_0, \beta_0) \equiv \chi^2(\alpha_0, \beta_0) - \chi^2_{\min} . \qquad (93)$$

Each choice of level contour (e.g.
∆χ2 = 1 or ∆χ2 = 2.3) defines a certain region in theory space, which contains, with a certain corresponding ‘confidence level’, the ‘correct’ theory of gravity (if it belongs to the considered mini-space of theories). When combining together several independent data sets (e.g. solar- system data, and different pulsar data) we can define a total goodness-of-fit statistics χ2tot(α0, β0), by adding together the various individual χ 2(α0, β0). This leads to a corresponding combined contour ∆χ2tot(α0, β0). Let us end by briefly summarizing the results of the theory-space approach to relativistic gravity tests. For detailed discussions the reader should consult Refs. [33, 40, 35, 36, 80], and especially the recent update [51] which uses the latest binary-pulsar data. Regarding the two-parameter class of tensor-bi-scalar theories T2(β ′, β′′) the recent analysis [51] has shown that the ∆χ2(β′, β′′) corresponding to the double binary pulsar PSR J0737−3039 was defining quite a small elliptical allowed region in the (β′, β′′) plane. By contrast the other pulsar data define much wider allowed regions, while the strong equivalence principle tests define (in view of the theoretical result ∆ ≃ 1 + 1 Bβ′(c2A − c2B)) a thin, but infinitely long, strip |β′| < cst. in the (β′, β′′) plane. This highlights the power of the double binary pulsar in probing certain specific strong-field deviations from GR. Contrary to the T2(β ′, β′′) tensor-bi-scalar theories, which were constructed to have exactly the same first post-Newtonian limit as GR26 (so that solar- system tests put no constraints on β′ and β′′), the class of tensor-mono-scalar theories T1(α0, β0) is such that its parameters α0 and β0 parametrize both the weak-field 1PN regime (see Eqs. (60) and (61) above) and the strong-field regime (which plays an important role in compact binaries). This means that each class of solar-system data (see Eqs. 
(87)–(90) above) will define, via a corresponding goodness-of-fit statistic of the type, say,

$$\chi^2_{\rm Cassini}(\alpha_0, \beta_0) = \left(\sigma_\gamma^{\rm Cassini}\right)^{-2} \left( \bar\gamma_{\rm theory}(\alpha_0, \beta_0) - \bar\gamma_{\rm Cassini} \right)^2 ,$$

a certain allowed region (see footnote 27) in the (α0, β0) plane. As a consequence, the analysis in the framework of the T1(α0, β0) space of theories allows one to compare and contrast the probing powers of solar-system tests versus binary-pulsar tests (while comparing also solar-system tests among themselves and binary-pulsar ones among themselves). The result of the recent analysis [51] is shown in Figure 3.

Footnote 26: However, this could be achieved only at the cost of allowing some combination of the two scalar fields to carry a negative energy flux.

[Figure 3: Solar-system and binary-pulsar constraints on the two-parameter family of tensor-mono-scalar theories T1(α0, β0). Regions are labelled B1913+16, B1534+12, J1141–6545, J0737–3039 and solar system; general relativity is marked. Figure taken from [51].]

In Fig. 3, the various solar-system constraints (87)–(90) are concentrated around the horizontal β0 axis. In particular, the high-precision Cassini constraint is the lower small grey strip. The various pulsar constraints are labelled by the name of the pulsar, except for the strong equivalence principle constraint which is labelled SEP. Note that General Relativity corresponds to the origin of the (α0, β0) plane, and is compatible with all existing tests. The global constraint obtained by combining all the pulsar tests would, to a good accuracy, be obtained by intersecting the various pulsar-allowed regions. One can then see in Fig. 3 that it would be comparable to the pre-Cassini solar-system constraints and that its boundaries would be defined successively (starting from the left) by 1913+16, 1141−6545, 0737−3039, 1913+16 again and 1141−6545 again.
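As a rough illustration of how a single-datum statistic such as the Cassini one above carves out an allowed region, the following Python sketch evaluates it along the α0 direction. It is only a sketch under stated assumptions: the relation γ̄ = −2α0²/(1 + α0²) is assumed here as the 1PN prediction of tensor-mono-scalar theories (it is what Eqs. (60) and (61) are taken to express), the uncertainty is the one of Eq. (87), the central value is set to zero, and all function and constant names are hypothetical.

```python
import math

# Assumed inputs: 1-sigma uncertainty of Eq. (87), central value set to zero.
SIGMA_GAMMA_CASSINI = 2.3e-5
GAMMA_BAR_CASSINI = 0.0


def gamma_bar_theory(alpha0):
    """Assumed 1PN relation: gamma_bar = -2 alpha0^2 / (1 + alpha0^2)."""
    a2 = alpha0 ** 2
    return -2.0 * a2 / (1.0 + a2)


def chi2_cassini(alpha0, beta0=0.0):
    """Single-datum goodness-of-fit statistic; beta0 does not enter gamma_bar."""
    residual = gamma_bar_theory(alpha0) - GAMMA_BAR_CASSINI
    return (residual / SIGMA_GAMMA_CASSINI) ** 2


def alpha0_limit(delta_chi2=1.0):
    """|alpha0| on the contour chi2 = delta_chi2 (closed form of the relation above)."""
    g = math.sqrt(delta_chi2) * SIGMA_GAMMA_CASSINI  # allowed |gamma_bar|
    return math.sqrt(g / (2.0 - g))                  # solves 2 a2 / (1 + a2) = g


if __name__ == "__main__":
    limit = alpha0_limit()
    print(f"alpha0^2 < {limit**2:.3e}, |alpha0| < {limit:.2e}")
```

Under these assumptions the 1σ contour corresponds to α0² just below 1.15 × 10^{-5}, in line with the Cassini bound quoted in the text. Because β0 drops out of γ̄, this statistic alone bounds only |α0|, which is why it appears as a horizontal strip in the (α0, β0) plane.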
A first conclusion is therefore that, at the quantitative level, binary-pulsar tests constrain tensor-scalar gravity theories as strongly as most solar-system tests (excluding the exceptionally accurate Cassini result which constrains α0² to be smaller than 1.15 × 10^{-5}, i.e. |α0| < 3.4 × 10^{-3}). A second conclusion is obtained by comparing the behaviour of the solar-system exclusion plots and of the binary-pulsar ones around the negative β0 axis. One sees that binary-pulsar tests exclude a whole domain of the theory space (located on the left, at β0 < −4) which is compatible with all solar-system experiments (even when including the very tight Cassini constraint). This remarkable qualitative feature of pulsar tests is a direct consequence of the existence of (non-perturbative) strong-field effects which start developing when the product −β0 cA (with cA denoting, as above, the compactness of the pulsar) becomes of order unity.

Footnote 27: Actually, in the case of the Cassini data, as it is quite plausible that the positive value of the published central value γ̄Cassini = +2.1 × 10^{-5} is due to unsubtracted systematic effects, we use $\sigma_\gamma^{\rm Cassini} = 2.3\times 10^{-5}$ but γ̄Cassini = 0. Otherwise, we would get unreasonably strong 1σ limits on α0² because tensor-scalar theories predict that γ̄ must be negative, see Eqs. (60) and (61).

6 Conclusion

In conclusion, we hope to have convinced the reader of the superb opportunities that binary pulsar data offer for testing gravity theories. In particular, they have been able to go qualitatively beyond solar-system experiments in probing two physically important regimes of relativistic gravity: the radiative regime and the strong-field one. Up to now, General Relativity has passed with flying colours all the radiative and strong-field tests provided by pulsar data. However, it is important to continue testing General Relativity in all its aspects (weak-field, radiative and strong-field).
Indeed, history has taught us that physical theories have a limited range of validity, and that it is quite difficult to predict in which regime a theory will cease to be an accurate description of nature. Let us look forward to new results, and possibly interesting surprises, from binary pulsar data. Acknowledgments It is a pleasure to thank my long-term collaborator Gilles Esposito-Farèse for his useful remarks on the text, and for providing the figures. I wish also to thank the organizers of the 2005 Sigrav School, and notably Monica Colpi and Ugo Moschella, for organizing a warm and intellectually stimulating meeting. This work was partly supported by the European Research and Training Network “Forces Universe” (contract number MRTN-CT-2004-005104). References [1] R. A. Hulse and J. H. Taylor: Discovery of a pulsar in a binary system, Astrophys. J. 195, L51 (1975). [2] M. Burgay et al.: An increased estimate of the merger rate of double neutron stars from observations of a highly relativistic system, Nature 426, 531 (2003), arXiv:astro-ph/0312071. [3] A. G. Lyne et al.: A double-pulsar system: A rare laboratory for relativistic gravity and plasma physics, Science 303, 1153 (2004). http://arxiv.org/abs/astro-ph/0312071 [4] F. K. Manasse: J. Math. Phys. 4, 746 (1963). [5] P. D. D’Eath: Phys. Rev. D 11, 1387 (1975). [6] R. E. Kates: Phys. Rev. D 22, 1853 (1980). [7] D. M. Eardley: Astrophys. J. 196, L59 (1975). [8] C. M. Will, D. M. Eardley: Astrophys. J. 212, L91 (1977). [9] T. Damour: Gravitational radiation and the motion of compact bodies, in Gravitational Radiation, edited by N. Deruelle and T. Piran, North- Holland, Amsterdam, pp. 59-144 (1983). [10] K. S. Thorne and J. B. Hartle: Laws of motion and precession for black holes and other bodies, Phys. Rev. D 31, 1815 (1984). [11] C. M. Will: Theory and experiment in gravitational physics, Cambridge University Press (1993) 380 p. [12] V. A. Brumberg and S. M. Kopejkin: Nuovo Cimento B 103, 63 (1988) [13] T. 
Damour, M. Soffel and C. M. Xu: General relativistic celestial mechan- ics. 1. Method and definition of reference system, Phys. Rev. D 43, 3273 (1991); General relativistic celestial mechanics. 2. Translational equations of motion, Phys. Rev. D 45, 1017 (1992); General relativistic celestial me- chanics. 3. Rotational equations of motion, Phys. Rev. D 47, 3124 (1993); General relativistic celestial mechanics. 4. Theory of satellite motion, Phys. Rev. D 49, 618 (1994). [14] G. ’t Hooft and M. J. G. Veltman: Regularization and renormalization of gauge fields, Nucl. Phys. B 44, 189 (1972). [15] T. Damour and N. Deruelle: Radiation reaction and angular momentum loss in small angle gravitational scattering, Phys. Lett. A 87, 81 (1981). [16] T. Damour: Problème des deux corps et freinage de rayonnement en rela- tivité générale, C.R. Acad. Sci. Paris, Série II, 294, 1355 (1982). [17] T. Damour: The problem of motion in Newtonian and Einsteinian grav- ity, in Three Hundred Years of Gravitation, edited by S.W. Hawking and W. Israel, Cambridge University Press, Cambridge, pp. 128-198 (1987). [18] P. Jaranowski, G. Schäfer: Third post-Newtonian higher order ADM Hamilton dynamics for two-body point-mass systems, Phys. Rev. D 57, 7274 (1998). [19] L. Blanchet, G. Faye: General relativistic dynamics of compact binaries at the third post-Newtonian order, Phys. Rev. D 63, 062005-1-43 (2001). [20] T. Damour, P. Jaranowski, G. Schäfer: Dimensional regularization of the gravitational interaction of point masses, Phys. Lett. B 513, 147 (2001). [21] Y. Itoh, T. Futamase: New derivation of a third post-Newtonian equation of motion for relativistic compact binaries without ambiguity, Phys. Rev. D 68, 121501(R), (2003). [22] L. Blanchet, T. Damour, G. Esposito-Farèse: Dimensional regularization of the third post-Newtonian dynamics of point particles in harmonic coor- dinates, Phys. Rev. D 69, 124007 (2004). [23] M. E. Pati, C. M. 
Will: Post-Newtonian gravitational radiation and equa- tions of motion via direct integration of the relaxed Einstein equations. II. Two-body equations of motion to second post-Newtonian order, and radiation-reaction to 3.5 post-Newtonian order, Phys. Rev. D 65, 104008- 1-21 (2001). [24] C. Königsdörffer, G. Faye, G. Schäfer: Binary black-hole dynamics at the third-and-a-half post-Newtonian order in the ADM formalism, Phys. Rev. D 68, 044004-1-19 (2003). [25] S. Nissanke, L. Blanchet: Gravitational radiation reaction in the equa- tions of motion of compact binaries to 3.5 post-Newtonian order, Class. Quantum Grav. 22, 1007 (2005). [26] L. Blanchet: Gravitational radiation from post-Newtonian sources and inspiralling compact binaries, Living Rev. Rel. 5, 3 (2002); Updated article: http://www.livingreviews.org/lrr-2006-4 [27] T. Damour, N. Deruelle: General relativitic celestial mechanics of binary system I. The post-Newtonian motion, Ann. Inst. Henri Poincaré 43, 107 (1985). [28] T. Damour: Gravitational radiation reaction in the binary pulsar and the quadrupole formula controversy, Phys. Rev. Lett. 51, 1019 (1983). [29] R. Blandford, S. A. Teukolsky: Astrophys. J. 205, 580 (1976). [30] T. Damour, N. Deruelle: General relativitic celestial mechanics of binary system II. The post-Newtonian timing formula, Ann. Inst. Henri Poincaré 44, 263 (1986). [31] T. Damour, J. H. Taylor: Strong field tests of relativistic gravity and binary pulsars, Phys. Rev. D 45, 1840 (1992). [32] T. Damour: Strong-field tests of general relativity and the binary pulsar, in Proceedings of the 2cd Canadian Conference on General Relativity and Relativistic Astrophysics, edited by A. Coley, C. Dyer, T. Tupper, World Scientific, Singapore, pp. 315-334 (1988). [33] T. Damour, G. Esposito-Farèse: Tensor-multi-scalar theories of gravita- tion, Class. Quant. Grav. 9, 2093 (1992). http://www.livingreviews.org/lrr-2006-4 [34] T. Damour, G. 
Esposito-Farèse: Non-perturbative strong-field effects in tensor-scalar theories of gravitation, Phys. Rev. Lett. 70, 2220 (1993). [35] T. Damour, G. Esposito-Farèse: Tensor-scalar gravity and binary-pulsar experiments, Phys. Rev. D 54, 1474 (1996), arXiv:gr-qc/9602056. [36] T. Damour, G. Esposito-Farèse: Gravitational-wave versus binary-pulsar tests of strong-field gravity, Phys. Rev. D 58, 042001 (1998). [37] J. M. Weisberg, J. H. Taylor: Relativistic binary pulsar B1913+16: thirty years of observations and analysis, To appear in the proceedings of As- pen Winter Conference on Astrophysics: Binary Radio Pulsars, Aspen, Colorado, 11-17 Jan 2004., arXiv:astro-ph/0407149. [38] T. Damour, J. H. Taylor: On the orbital period change of the binary pulsar Psr-1913+16, The Astrophysical Journal 366, 501 (1991). [39] C. M. Will, H. W. Zaglauer: Gravitational radiation, close binary systems, and the Brans-Dicke theory of gravity, Astrophys. J. 346, 366 (1989). [40] J. H. Taylor, A. Wolszczan, T. Damour, J. M. Weisberg: Experimental constraints on strong field relativistic gravity, Nature 355, 132 (1992). [41] I. H. Stairs, S. E. Thorsett, J. H. Taylor, A. Wolszczan: Studies of the relativistic binary pulsar PSR B1534+12: I. Timing analysis, Astrophys. J. 581, 501 (2002). [42] K. Nordtvedt: Equivalence principle for massive bodies. 2. Theory, Phys. Rev. 169, 1017 (1968). [43] T. Damour and G. Schäfer: New tests of the strong equivalence principle using binary pulsar data, Phys. Rev. Lett. 66, 2549 (1991). [44] I. H. Stairs et al.: Discovery of three wide-orbit binary pulsars: implica- tions for binary evolution and equivalence principles, Astrophys. J. 632, 1060 (2005). [45] N. Wex: New limits on the violation of the Strong Equivalence Princi- ple in strong field regimes, Astronomy and Astrophysics 317, 976 (1997), gr-qc/9511017. [46] V. M. Kaspi et al.: Discovery of a young radio pulsar in a relativistic binary orbit, arXiv:astro-ph/0005214. [47] M. Bailes, S. M. Ord, H. S. 
Knight, A. W. Hotan: Self-consistency of relativistic observables with general relativity in the white dwarf-neutron star binary pulsar PSR J1141-6545, Astrophys. J. 595, L49 (2003). [48] M. Kramer et al.: eConf C041213, 0038 (2004), astro-ph/0503386. [49] M. Kramer et al.: Tests of general relativity from timing the double pulsar, Science 314, 97-102 (2006). [50] S. M. Ord, M. Bailes and W. van Straten: The Scintillation Velocity of the Relativistic Binary Pulsar PSR J1141-6545, arXiv:astro-ph/0204421. [51] T. Damour, G. Esposito-Farèse: Binary-pulsar versus solar-system tests of tensor-scalar gravity, 2007, in preparation. [52] T. Damour, R. Ruffini: Sur certaines vérifications nouvelles de la relativité générale rendues possibles par la découverte d’un pulsar membre d’un système binaire, C.R. Acad. Sci. Paris (Série A) 279, 971 (1974). [53] B. M. Barker, R. F. O’Connell: Gravitational two-body problem with arbitrary masses, spins, and quadrupole moments, Phys. Rev. D 12, 329 (1975). [54] M. Kramer: Astrophys. J. 509, 856 (1998). [55] J. M. Weisberg and J. H. Taylor: Astrophys. J. 576, 942 (2002). [56] I. H. Stairs, S. E. Thorsett, Z. Arzoumanian: Measurement of gravitational spin-orbit coupling in a binary pulsar system, Phys. Rev. Lett. 93, 141101 (2004). [57] A. W. Hotan, M. Bailes, S. M. Ord: Geodetic Precession in PSR J1141-6545, Astrophys. J. 624, 906 (2005). [58] T. Damour, G. Schäfer: Higher order relativistic periastron advances and binary pulsars, Nuovo Cim. B 101, 127 (1988). [59] J.M. Lattimer, B.F. Schutz: Constraining the equation of state with moment of inertia measurements, Astrophys. J. 629, 979 (2005), arXiv:astro-ph/0411470. [60] I. A. Morrison, T. W. Baumgarte, S. L. Shapiro, V. R.
Pandharipande: The moment of inertia of the binary pulsar J0737-3039A: constraining the nuclear equation of state, Astrophys. J. 617, L135 (2004). [61] A. S. Eddington: The Mathematical Theory of Relativity, Cambridge University Press, London (1923). [62] L. I. Schiff: Am. J. Phys. 28, 340 (1960). [63] R. Baierlein: Phys. Rev. 162, 1275 (1967). [64] C. M. Will: Astrophys. J. 163, 611 (1971). [65] C. M. Will, K. Nordtvedt: Astrophys. J. 177, 757 (1972). [66] C. M. Will: The confrontation between general relativity and experiment, Living Rev. Rel. 4, 4 (2001) arXiv:gr-qc/0103036; update (2005) in arXiv:gr-qc/0510072. [67] P. Jordan, Nature (London) 164, 637 (1949); Schwerkraft und Weltall (Vieweg, Braunschweig, 1955); Z. Phys. 157, 112 (1959). [68] M. Fierz: Helv. Phys. Acta 29, 128 (1956). [69] C. Brans, R. H. Dicke: Mach’s principle and a relativistic theory of gravitation, Phys. Rev. 124, 925 (1961). [70] T. Damour, G. Esposito-Farèse: Testing gravity to second postNewtonian order: A Field theory approach, Phys. Rev. D 53, 5541 (1996), arXiv:gr-qc/9506063. [71] T. Damour, A. M. Polyakov: The string dilaton and a least coupling principle, Nucl. Phys. B 423, 532 (1994) arXiv:hep-th/9401069; String theory and gravity, Gen. Rel. Grav. 26, 1171 (1994), arXiv:gr-qc/9411069. [72] T. Damour, D. Vokrouhlicky: The equivalence principle and the moon, Phys. Rev. D 53, 4177 (1996), arXiv:gr-qc/9507016. [73] J. Khoury, A. Weltman: Chameleon fields: Awaiting surprises for tests of gravity in space, Phys. Rev. Lett. 93, 171104 (2004), arXiv:astro-ph/0309300. [74] J. Khoury, A. Weltman: Chameleon cosmology, Phys. Rev. D 69, 044026 (2004), arXiv:astro-ph/0309411. [75] P. Brax, C. van de Bruck, A. C. Davis, J. Khoury, A. Weltman: Detecting dark energy in orbit: The cosmological chameleon, Phys. Rev. D 70, 123518 (2004), arXiv:astro-ph/0408415. [76] T. Damour, F. Piazza, G.
Veneziano: Runaway dilaton and equivalence principle violations, Phys. Rev. Lett. 89, 081601 (2002), arXiv:gr-qc/0204094; Violations of the equivalence principle in a dilaton-runaway scenario, Phys. Rev. D 66, 046007 (2002), arXiv:hep-th/0205111. [77] T. Damour, G. Esposito-Farèse: Testing gravity to second postNewtonian order: A Field theory approach, Phys. Rev. D 53, 5541 (1996), arXiv:gr-qc/9506063. [78] T. Damour, K. Nordtvedt: General relativity as a cosmological attractor of tensor scalar theories, Phys. Rev. Lett. 70, 2217 (1993); Tensor-scalar cosmological models and their relaxation toward general relativity, Phys. Rev. D 48, 3436 (1993). [79] J. B. Hartle: Slowly rotating relativistic stars. 1. Equations of structure, Astrophys. J. 150, 1005 (1967). [80] G. Esposito-Farèse: Binary-pulsar tests of strong-field gravity and gravitational radiation damping, in Proceedings of the tenth Marcel Grossmann Meeting, July 2003, edited by M. Novello et al., World Scientific (2005), p. 647, arXiv:gr-qc/0402007. [81] B. Bertotti, L. Iess, P. Tortora: A test of general relativity using radio links with the Cassini spacecraft, Nature 425, 374 (2003). [82] S. S. Shapiro et al: Phys. Rev. Lett 92, 121101 (2004). [83] I. I. Shapiro, in General Relativity and Gravitation 12, edited by N. Ashby, D. F. Bartlett, and W. Wyss (Cambridge University Press, 1990), p. 313. [84] J. G. Williams, S. G. Turyshev, D. H. Boggs: Progress in lunar laser ranging tests of relativistic gravity, Phys. Rev. Lett. 93, 261101 (2004), arXiv:gr-qc/0411113.
http://arxiv.org/abs/gr-qc/0402007 http://arxiv.org/abs/gr-qc/0411113 Introduction Motion of binary pulsars in general relativity Timing of binary pulsars in general relativity Phenomenological approach to testing relativistic gravity with binary pulsar data Theory-space approach to testing relativistic gravity with binary pulsar data Theory-space approaches to solar-system tests of relativistic gravity Theory-space approaches to binary-pulsar tests of relativistic gravity Tensor-scalar theories of gravity The role of the coupling function a(); definition of the two-dimensional space of tensor-scalar gravity theories T1 (0 , 0) Tensor-scalar gravity, strong-field effects, and binary-pulsar observables Theory-space analyses of binary pulsar data Conclusion ABSTRACT We review the general relativistic theory of the motion, and of the timing, of binary systems containing compact objects (neutron stars or black holes). Then we indicate the various ways one can use binary pulsar data to test the strong-field and/or radiative aspects of General Relativity, and of general classes of alternative theories of relativistic gravity. <|endoftext|><|startoftext|> arXiv:0704.0750v1 [math.DG] 5 Apr 2007 Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Mat. 9 (1998), 29–33 SOME COMBINATORIAL ASPECTS OF DIFFERENTIAL OPERATION COMPOSITION ON THE SPACE R Branko J. Malešević In this paper we present a recurrent relation for counting meaningful compositions of the higher-order differential operations on the space Rn (n=3,4,...) and extract the non-trivial compositions of order higher than two. 1. 
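DIFFERENTIAL FORMS AND OPERATIONS ON THE SPACE $R^3$

It is well known that the first-order differential operations grad, curl and div on the space $R^3$ can be introduced using the operator of exterior differentiation d of differential forms [1]:

$$\Omega^0(R^3) \xrightarrow{\;d\;} \Omega^1(R^3) \xrightarrow{\;d\;} \Omega^2(R^3) \xrightarrow{\;d\;} \Omega^3(R^3),$$

where $\Omega^i(R^3)$ is the space of differential forms of degree i = 0, 1, 2, 3 on the space $R^3$ over the ring of functions $A = \{f : R^3 \to R \mid f \in C^\infty(R^3)\}$. In what follows we give definitions of the first-order differential operations. Let us notice that the one-dimensional spaces $\Omega^0(R^3)$ and $\Omega^3(R^3)$ are isomorphic to A, and let $\varphi_0 : \Omega^0(R^3) \to A$, $\varphi_3 : \Omega^3(R^3) \to A$ be the corresponding isomorphisms. Next, the set of vector functions $B = \{\vec f = (f_1, f_2, f_3) : R^3 \to R^3 \mid f_1, f_2, f_3 \in C^\infty(R^3)\}$, over the ring A, is three-dimensional. It is isomorphic to $\Omega^1(R^3)$ and $\Omega^2(R^3)$. Let $\varphi_1 : \Omega^1(R^3) \to B$, $\varphi_2 : \Omega^2(R^3) \to B$ be the corresponding isomorphisms. In that case, the compositions $\varphi_0^{-1} \circ \varphi_3 : \Omega^3(R^3) \to \Omega^0(R^3)$ and $\varphi_1^{-1} \circ \varphi_2 : \Omega^2(R^3) \to \Omega^1(R^3)$ are isomorphisms of the corresponding spaces of differential forms. The first-order differential operations are defined via the operator of exterior differentiation d of differential forms in the following form:

$$\nabla_1 = \varphi_1 \circ d \circ \varphi_0^{-1} : A \to B, \qquad \nabla_2 = \varphi_2 \circ d \circ \varphi_1^{-1} : B \to B, \qquad \nabla_3 = \varphi_3 \circ d \circ \varphi_2^{-1} : B \to A.$$

Therefore we obtain explicit expressions for the first-order differential operations $\nabla_1$, $\nabla_2$, $\nabla_3$ on the space $R^3$ in the following form:

$$(1)\quad \mathrm{grad}\, f = \nabla_1 f = \frac{\partial f}{\partial x_1}\,\vec e_1 + \frac{\partial f}{\partial x_2}\,\vec e_2 + \frac{\partial f}{\partial x_3}\,\vec e_3 \;:\; A \to B,$$

$$(2)\quad \mathrm{curl}\, \vec f = \nabla_2 \vec f = \left(\frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3}\right)\vec e_1 + \left(\frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1}\right)\vec e_2 + \left(\frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2}\right)\vec e_3 \;:\; B \to B,$$

$$(3)\quad \mathrm{div}\, \vec f = \nabla_3 \vec f = \frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2} + \frac{\partial f_3}{\partial x_3} \;:\; B \to A.$$

1991 Mathematics Subject Classification: 26B12, 58A10

Let us count meaningful compositions of the differential operations ∇1, ∇2, ∇3. Consider the set of functions Θ = {∇1, ∇2, ∇3}. Let us define a binary relation ρ ”to be in composition” with ∇iρ∇j = ⊤ iff the composition ∇j ◦ ∇i is meaningful (∇i, ∇j ∈ Θ). Cayley’s table of this relation reads:

(4)
  ρ  | ∇1  ∇2  ∇3
  ∇1 | ⊥   ⊤   ⊤
  ∇2 | ⊥   ⊤   ⊤
  ∇3 | ⊤   ⊥   ⊥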
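The relation ρ in the table above can also be explored by brute force. The short Python sketch below, with hypothetical names chosen only for illustration, encodes the table as a dictionary and lists every meaningful composition of a given order, writing each one as the sequence of indices in the order in which the operations are applied.

```python
from itertools import product

# COMPOSABLE[i] lists the j for which "nabla_i rho nabla_j" is true,
# i.e. for which the composition nabla_j o nabla_i is meaningful.
COMPOSABLE = {1: (2, 3), 2: (2, 3), 3: (1,)}
NAMES = {1: "grad", 2: "curl", 3: "div"}


def meaningful_compositions(k):
    """All index sequences (i1, ..., ik), applied in that order, that are meaningful."""
    sequences = []
    for seq in product((1, 2, 3), repeat=k):
        # Every consecutive pair (a, b) must satisfy "nabla_a rho nabla_b".
        if all(b in COMPOSABLE[a] for a, b in zip(seq, seq[1:])):
            sequences.append(seq)
    return sequences


def as_formula(seq):
    """Write a sequence in the usual right-to-left composition order."""
    return " o ".join(NAMES[i] for i in reversed(seq))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, len(meaningful_compositions(k)))
    print([as_formula(s) for s in meaningful_compositions(2)])
```

For orders 1 to 5 this enumeration yields 3, 5, 8, 13 and 21 meaningful compositions. Exhaustive enumeration grows exponentially with the order, so it is only meant as a check for small k.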
We form the graph of relation ρ as follows. If ∇iρ∇j = ⊤ then we put the node ∇j under the node ∇i. Let us mark ∇0 as the nowhere-defined function ϑ, with domain and range being the empty set [2]. We shall consider ∇0ρ∇i = ⊤ (i = 1, 2, 3). For the set of functions Θ ∪ {∇0} our graph is the tree with the root in the node ∇0.

[Fig. 1: tree of meaningful compositions with root ∇0; the numbers of nodes on successive levels are f(0) = 1, f(1) = 3, f(2) = 5, f(3) = 8, f(4) = 13, f(5) = 21.]

Let $f_i(k)$ be the number of meaningful compositions of the k-th order beginning with ∇i. Let f(k) be the number of meaningful compositions of the k-th order of operations over Θ. Then $f(k) = f_1(k) + f_2(k) + f_3(k)$. Based on the partial self-similarity of the tree (Fig. 1), which is formed according to Cayley’s table (4), we get the equalities:

$$f_1(k) = f_2(k-1) + f_3(k-1), \qquad f_2(k) = f_2(k-1) + f_3(k-1), \qquad f_3(k) = f_1(k-1).$$

Now, a recurrent relation for f(k) can be derived as follows:

$$\begin{aligned} f(k) &= f_1(k) + f_2(k) + f_3(k) \\ &= \big(f_1(k-1) + f_2(k-1) + f_3(k-1)\big) + \big(f_3(k-1) + f_2(k-1)\big) \\ &= f(k-1) + \big(f_1(k-2) + f_2(k-2) + f_3(k-2)\big) \\ &= f(k-1) + f(k-2). \end{aligned}$$

Based on the initial values f(1) = 3, f(2) = 5, f(3) = 8 we conclude that $f(k) = F_{k+3}$, where $F_{k+3}$ is the Fibonacci number of order k + 3.

Let us note that ∇2 ◦ ∇1 = 0 and ∇3 ◦ ∇2 = 0, because $d^2 = 0$. On the other hand, the compositions ∇1 ◦ ∇3, ∇2 ◦ ∇2 and ∇3 ◦ ∇1 are not annihilated, because $\varphi_0^{-1} \circ \varphi_3 \neq i$ and $\varphi_1^{-1} \circ \varphi_2 \neq i$. Thus, as in the paper [2], we conclude that the non-trivial compositions are of the following form:

(∇1◦) ∇3 ◦ · · · ◦ ∇1 ◦ ∇3 ◦ ∇1,
∇2 ◦ ∇2 ◦ · · · ◦ ∇2 ◦ ∇2 ◦ ∇2,
(∇3◦) ∇1 ◦ · · · ◦ ∇3 ◦ ∇1 ◦ ∇3.

As non-trivial compositions we consider those which are not identical to the zero function. Terms in parentheses are included for an odd number of terms and are left out otherwise.

2.
DIFFERENTIAL FORMS AND OPERATIONS ON THE SPACE $R^n$

Let us present a recurrent relation for counting meaningful compositions of the higher-order differential operations on the space $R^n$ (n = 3, 4, . . .) and extract the non-trivial compositions of order higher than two. Let us form the following sets of functions:

$$A_i = \left\{ \vec f : R^n \to R^{\binom{n}{i}} \;\middle|\; f_1, \ldots, f_{\binom{n}{i}} \in C^\infty(R^n) \right\}$$

for i = 0, 1, . . . , m where m = [n/2]. Let $\Omega^i(R^n)$ be the set of differential forms of degree i = 0, 1, . . . , n on the space $R^n$. Let us notice that $\Omega^i(R^n)$ and $\Omega^{n-i}(R^n)$, over the ring $A_0$, are spaces of the same dimension $\binom{n}{i}$, for i = 0, 1, . . . , m. They can be identified with $A_i$, using the corresponding isomorphisms: $\varphi_i : \Omega^i(R^n) \to A_i$ (0 ≤ i ≤ m) and $\varphi_{n-i} : \Omega^{n-i}(R^n) \to A_i$ (0 ≤ i < n − m). We define the first-order differential operations on the space $R^n$ via the operator of exterior differentiation d as follows:

$$\nabla_i = \varphi_i \circ d \circ \varphi_{i-1}^{-1} \qquad (1 \le i \le n).$$

Therefore, we obtain the first-order differential operations on the space $R^n$, depending on the parity of the dimension n, in the following form:

n = 2m:
$$\nabla_1 : A_0 \to A_1,\;\; \nabla_2 : A_1 \to A_2,\;\; \ldots,\;\; \nabla_i : A_{i-1} \to A_i,\;\; \ldots,\;\; \nabla_m : A_{m-1} \to A_m,$$
$$\nabla_{m+1} : A_m \to A_{m-1},\;\; \ldots,\;\; \nabla_{n-j} : A_{j+1} \to A_j,\;\; \ldots,\;\; \nabla_{n-1} : A_2 \to A_1,\;\; \nabla_n : A_1 \to A_0,$$

n = 2m + 1:
$$\nabla_1 : A_0 \to A_1,\;\; \nabla_2 : A_1 \to A_2,\;\; \ldots,\;\; \nabla_i : A_{i-1} \to A_i,\;\; \ldots,\;\; \nabla_m : A_{m-1} \to A_m,\;\; \nabla_{m+1} : A_m \to A_m,$$
$$\nabla_{m+2} : A_m \to A_{m-1},\;\; \ldots,\;\; \nabla_{n-j} : A_{j+1} \to A_j,\;\; \ldots,\;\; \nabla_{n-1} : A_2 \to A_1,\;\; \nabla_n : A_1 \to A_0.$$

Consider the set of functions Θ = {∇1, ∇2, . . . , ∇n}. Let us define a binary relation ρ ”to be in composition” with ∇iρ∇j = ⊤ iff the composition ∇j ◦ ∇i is meaningful (∇i, ∇j ∈ Θ). It is not difficult to check that Cayley’s table of this relation is determined by:

$$(6)\quad \nabla_i\,\rho\,\nabla_j = \begin{cases} \top & : (j = i+1) \vee (i+j = n+1), \\ \bot & : (j \neq i+1) \wedge (i+j \neq n+1). \end{cases}$$

Let us form the adjacency matrix $A = [a_{ij}] \in \{0, 1\}^{n \times n}$ of the graph determined by the relation ρ. Let $f_i(k)$ be the number of meaningful compositions of the k-th order beginning with ∇i (notice that $f_i(1) = 1$ for i = 1, . . . , n). Let f(k) be the number of meaningful compositions of the k-th order of operations over Θ.
Then $f(k) = f_1(k) + \ldots + f_n(k)$. Notice that the following is true:

$$(7)\quad f_i(k) = \sum_{j=1}^{n} a_{ij} \cdot f_j(k-1),$$

for i = 1, . . . , n. Based on (7) we form the system of recurrent equations:

$$(8)\quad \begin{bmatrix} f_1(k) \\ \vdots \\ f_n(k) \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \cdot \begin{bmatrix} f_1(k-1) \\ \vdots \\ f_n(k-1) \end{bmatrix}.$$

If $v_n = [\,1 \cdots 1\,]_{1 \times n}$ then:

$$(9)\quad f(k) = v_n \cdot \begin{bmatrix} f_1(k) \\ \vdots \\ f_n(k) \end{bmatrix}.$$

So, the expression:

$$(10)\quad f(k) = v_n \cdot A^{k-1} \cdot v_n^{T}$$

follows from (8) and (9). Reducing the system of the recurrent equations (8), for any of the functions $f_i(k)$ we have:

$$(11)\quad \alpha_0 f_i(k) + \alpha_1 f_i(k-1) + \cdots + \alpha_n f_i(k-n) = 0 \qquad (k > n),$$

where $\alpha_0, \ldots, \alpha_n$ are the coefficients of the characteristic polynomial $P_n(\lambda) = |A - \lambda I| = \alpha_0\lambda^n + \alpha_1\lambda^{n-1} + \ldots + \alpha_n$. Thus, we conclude that the function $f(k) = \sum_{i=1}^{n} f_i(k)$ also satisfies:

$$(12)\quad \alpha_0 f(k) + \alpha_1 f(k-1) + \cdots + \alpha_n f(k-n) = 0 \qquad (k > n).$$

Hence, the following theorem holds.

Theorem 1. The number of meaningful differential operations, on the space $R^n$ (n = 3, 4, . . .), of order higher than two, is determined by the formula (10), i.e. by the recurrent formula (12).

In the n-dimensional space $R^n$, for dimensions n = 3, 4, 5, . . . , 10, using the previous theorem we form a table of the corresponding recurrent formulas:

Dimension | Recurrent relation for the number of meaningful compositions
n = 3  | f(i + 2) = f(i + 1) + f(i)
n = 4  | f(i + 2) = 2f(i)
n = 5  | f(i + 3) = f(i + 2) + 2f(i + 1) − f(i)
n = 6  | f(i + 4) = 3f(i + 2) − f(i)
n = 7  | f(i + 4) = f(i + 3) + 3f(i + 2) − 2f(i + 1) − f(i)
n = 8  | f(i + 4) = 4f(i + 2) − 3f(i)
n = 9  | f(i + 5) = f(i + 4) + 4f(i + 3) − 3f(i + 2) − 3f(i + 1) + f(i)
n = 10 | f(i + 6) = 5f(i + 4) − 6f(i + 2) + f(i)

Let us determine the non-trivial higher-order meaningful compositions on the space $R^n$. For the isomorphisms $\varphi_k$ we have:

$$(13)\quad \varphi_k^{-1} \circ \varphi_{n-k} \neq i,$$

for k = 1, 2, . . . , n and 2k ≠ n.
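Formula (10) and the table of recurrent relations above lend themselves to a quick numerical check. The following Python sketch, whose function names are hypothetical and introduced only for illustration, builds the adjacency matrix from rule (6) and evaluates f(k) by repeatedly applying the system (8) to the all-ones vector, which is equivalent to computing $v_n \cdot A^{k-1} \cdot v_n^T$.

```python
def adjacency(n):
    """Adjacency matrix of rule (6): a[i][j] = 1 iff j = i + 1 or i + j = n + 1 (1-based)."""
    return [
        [1 if (j == i + 1 or i + j == n + 1) else 0 for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


def count_compositions(n, k):
    """Number f(k) of meaningful compositions of order k on R^n, as in formula (10)."""
    a = adjacency(n)
    f = [1] * n  # f_i(1) = 1 for every i
    for _ in range(k - 1):
        # One step of system (8): f_i(k) = sum_j a_ij * f_j(k - 1)
        f = [sum(a[i][j] * f[j] for j in range(n)) for i in range(n)]
    return sum(f)


if __name__ == "__main__":
    for n in (3, 4, 7, 8):
        print(n, [count_compositions(n, k) for k in range(1, 6)])
```

For n = 3 the first five values are 3, 5, 8, 13, 21, and the values for n = 7 and n = 8 can be compared directly with the corresponding rows of the table. Plain integer lists are used instead of a linear-algebra library so that the counts stay exact for large k.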
Then, based on (6) and (13), all second-order compositions are given by the formula:

(14) $\nabla_j \circ \nabla_k = \begin{cases} 0 : & j = k+1, \\ g_{j,k} : & (k+j = n+1) \land (2k \neq n), \\ \vartheta : & (j \neq k+1) \land (k+j \neq n+1); \end{cases}$

where $0$ is a trivial composition, $g_{j,k}$ is a non-trivial second-order composition and $\vartheta$ is a nowhere-defined function, for $j, k = 1, \ldots, n$. Notice that in

$g_{j,k} = \nabla_j \circ \nabla_k = \varphi_{n+1-k} \circ d \circ \varphi_{n-k}^{-1} \circ \varphi_k \circ d \circ \varphi_{k-1}^{-1}$ ($j = n+1-k \,\land\, 2k \neq n$)

switching the terms is impossible, because in that way we get the nowhere-defined function $\vartheta$. Hence, we conclude that the following theorem holds.

Theorem 2. All meaningful non-trivial differential operations on the space $R^n$ ($n = 3, 4, \ldots$), of order higher than two, are given in the form of the following compositions:

$(\nabla_k) \circ \nabla_j \circ \nabla_k \circ \cdots \circ \nabla_j \circ \nabla_k$,
$(\nabla_j) \circ \nabla_k \circ \nabla_j \circ \cdots \circ \nabla_k \circ \nabla_j$,

subject to the condition $k + j = n + 1$ and $2k, 2j \neq n$ for $k, j = 1, 2, \ldots, n$. Terms in parentheses are included for an odd number of terms and are left out otherwise.

Acknowledgment. I wish to express my gratitude to Professors M. Merkle and M. Prvanović, who examined the first version of the paper and gave me their suggestions and some very useful remarks.

REFERENCES
1. R. Bott, L. W. Tu: Differential forms in algebraic topology, Springer, New York 1982.
2. B. J. Malešević: A note on higher-order differential operations, Univ. Beograd, Publ. Elektrotehn. Fak., Ser. Mat. 7 (1996), 105-109.

(Received September 8, 1997) (Revised October 30, 1998)
University of Belgrade, Faculty of Electrical Engineering, P.O.Box 35-54, 11120 Belgrade, Yugoslavia
malesevic@kiklop.etf.bg.ac.yu

ABSTRACT In this paper we present a recurrent relation for counting meaningful compositions of the higher-order differential operations on the space $R^{n}$ (n=3,4,...) and extract the non-trivial compositions of order higher than two. <|endoftext|><|startoftext|> Introduction 2. Preliminary 3. The proof of Theorem ?? 4.
Applications References ABSTRACT We provide several equivalent characterizations of Kobayashi hyperbolicity in unbounded convex domains in terms of peak and anti-peak functions at infinity, affine lines, Bergman metric and iteration theory. <|endoftext|><|startoftext|> Introduction Two-dimensional models have been widely used in the context of two-dimensional gravity (e.g. see [1, 2, 3, 4] and references therein) and string theory. From the 2d-gravity point of view, higher-dimensional gravity models reduce to 2d-gravity by dimensional reduction [1, 2, 3]. From the string theory point of view, the (1+1)-dimensional actions are fundamental tools of the theory. However, 2d-gravity and 2d-string theory are closely related to each other. The known sigma models for the string, in the presence of the dilaton field $\Phi(X)$, contain the two-dimensional scalar curvature $R(h_{ab})$, $\sqrt{h}\,R\,\Phi(X)$. (1) In two dimensions the combination $\sqrt{h}\,R$ is a total derivative. Thus, in the absence of the dilaton field, this action is a topological invariant that gives no dynamics to the worldsheet metric $h_{ab}$. In fact, in the action (1), the dilaton is not the only choice. For example, replacing the dilaton field with the scalar curvature $R$ leads to $R^2$-gravity [1, 4, 5]. In particular, the Polyakov action is replaced by a special combination of the worldsheet fields, which includes an overall factor $R^{-1}$. Removing the dilaton and replacing it with other quantities motivated us to study a class of two-dimensional actions. They are useful in the context of non-critical strings with a curved worldsheet, and of 2-dimensional gravity. Instead of the dilaton field, we introduce some combinations of $h_{ab}$, $R$ and the induced metric on the worldsheet, i.e. $\gamma_{ab}$, which give dynamics to $h_{ab}$. These non-linear combinations can contain an arbitrary function $f(R)$ of the scalar curvature $R$. We observe that these dynamics lead to the constraint equation for $h_{ab}$, extracted from the Polyakov action.
For the flat spacetime, these models have the Poincaré symmetry. In addition, they are reparametrization invariant. However, for any function f(R), they do not have the Weyl symmetry. Therefore, the string worldsheet at most is conformally flat. By introducing an extra scalar field in these actions, they also find the Weyl symmetry. Note that a Weyl non-invariant string theory has noncritical dimension, e.g. see [6]. This paper is organized as follows. In section 2, we introduce a new action for the string in which the corresponding worldsheet always is curved. In section 3, the Poincaré symmetry of this string model will be studied. In section 4, the generalized form of the above action will be introduced and it will be analyzed. 2 Curved worldsheet in the curved spacetime We consider the following action for the string, which propagates in the curved spacetime S = −T habγab , (2) where h = − det hab, and T is a dimensionless constant. In addition, R denotes the two- dimensional scalar curvature which is made from hab. The string coordinates are {Xµ(σ, τ)}. The induced metric on the worldsheet, i.e. γab, is also given by γab = gµν(X)∂aX µ(σ, τ)∂bX ν(σ, τ), (3) where gµν(X) is the spacetime metric. In two dimensions, the symmetries of the curvature tensor imply the identity Rab − habR = 0. (4) Therefore, the variation of the action (2) leads to the following equation of motion for hab, Rab − γab = 0. (5) This implies that the energy-momentum tensor, extracted from the action (2), vanishes. Contraction of this equation by hab gives R = 1 habγab. Introducing this equation and the equation (5) into (4) leads to (Polyakov) ab ≡ γab − hab(h a′b′γa′b′) = 0. (6) This is the constraint equation, extracted from the Polyakov action. Note that the energy- momentum tensor, due to the action (2), is proportional to the left-hand-side of the equation (5). Thus, it is different from (6). The equation of motion of the string coordinate Xµ(σ, τ) also is hRhab∂bX hRhabΓ νλ∂aX λ = 0. 
(7) Presence of the scalar curvature R distinguishes this equation from its analog, extracted from the Polyakov action. Now consider those solutions of the equations of motion (5) and (7), which admit constant scalar curvature R. For these solutions, the equation (7) reduces to the equation of motion of the string coordinates, extracted from the Polyakov action with the curved background. However, for general solutions the scalar curvature R depends on the worldsheet coordinates σ and τ , and hence this coincidence does not occur. 2.1 The model in the conformal gauge Under reparametrization of σ and τ , the action (2) is invariant. That is, in two dimensions the general coordinate transformations σ → σ′(σ, τ) and τ → τ ′(σ, τ), depend on two free functions, namely the new coordinates σ′ and τ ′. By means of such transformations any two of the three independent components of hab can be eliminated. A standard choice is a parametrization of the worldsheet such that hab = e φ(σ,τ)ηab, (8) where ηab = diag(−1, 1), and eφ(σ,τ) is an unknown conformal factor. The choice (8) is called the conformal gauge. Since the action (2) does not have the Weyl symmetry (a local rescaling of the worldsheet metric hab) we cannot choose the gauge hab = ηab. The scalar curvature corresponding to the metric (8) is R = −e−φ∂2φ, (9) where ∂2 = ηab∂a∂b. Thus, the action (2) reduces to S ′ = −T d2σe−φ∂2φ ηabγab . (10) According to the gauge (8), this action describes a conformally flat worldsheet. 3 Poincaré symmetry of the model In this section we consider flat Minkowski space, i.e. gµν(X) = ηµν . Therefore, the equations of motion are simplified to Rab − ηµν∂aX ν = 0, (11) hRhab∂bX µ) = 0. (12) The Poincaré symmetry reflects the symmetry of the background in which the string is propagating. It is described by the transformations δXµ = aµνX ν + bµ, δhab = 0, (13) where aµν and b µ are independent of the worldsheet coordinates σ and τ , and aµν = ηµλa is antisymmetric. 
Thus, from the worldsheet point of view, these transformations are global symmetries. Under these transformations the action (2) is invariant. 3.1 The conserved currents The Poincaré invariance of the action (2) is associated to the following Noether currents J µνa = T hRhab(Xµ∂bX ν −Xν∂bXµ), Pµa = T hRhab∂bX µ, (14) where the current Pµa is corresponding to the translation invariance and J µνa is the current associated to the Lorentz symmetry. According to the equation of motion (12) these are conserved currents ∂aJ µνa = 0, ∂aPµa = 0. (15) 3.2 The covariantly conserved currents It is possible to construct two other currents from (14), in which they be covariantly con- served. For this, there is the useful formula ∇aKa = a), (16) where Ka is a worldsheet vector. Therefore, we define the currents Jµνa and P µa as in the following Jµνa = J µνa, P µa = Pµa. (17) According to the equations (15) and (16), these are covariantly conserved currents, i.e., ∇aJµνa = ∇aP µa = 0. (18) The currents (17) can also be written as Jµνa = R(Xµ∂aX ν −Xν∂aXµ), P µa = µ. (19) Since there is ∇ahbc = 0, the conservation laws (18) also imply the covariantly conservation of the currents (19). 4 Generalization of the model The generalized form of the action (2) is I = −T f(R)− , (20) where f(R) is an arbitrary differentiable function of the scalar curvature R. The set {Xµ(σ, τ)} describes a string worldsheet in the spacetime. These string coordinates ap- peared in the induced metric γab through the equation (3). Thus, (20) is a model for the string action. The equation of motion of Xµ is as previous, i.e. (7). Vanishing the variation of this action with respect to the worldsheet metric hab, gives the equation of motion of hab, df(R) γab = 0. (21) The trace of this equation is df(R) habγab = 0. (22) Combining the equations (4), (21) and (22) again leads to the equation (6). As an example, consider the function f(R) = α lnR + β. 
Thus, the field equation (21) implies that the intrinsic metric hab becomes proportional to the induced metric γab, that is hab = Since the Poincaré transformations contain δhab = 0, the generalized action (20) for the flat background metric gµν = ηµν , also has the Poincaré invariance. This leads to the previous conserved currents, i.e. (14) and (19). 4.1 Weyl invariance in the presence of a new scalar field The action (20) under the reparametrization transformations is symmetric. The Weyl trans- formation is also defined by hab −→ h′ab = eρ(σ,τ)hab. (23) Thus, the scalar curvature transforms as R −→ R′ = e−ρ(R−∇2ρ), (24) where ∇2ρ = 1√ hhab∂bρ). The equations (23) and (24) imply that the action (20), for any function f(R), is Weyl non-invariant. Introducing (23) and (24) into the action (20) gives a new action which contains the field ρ(σ, τ), I ′ = −T h(R−∇2ρ) f [e−ρ(R−∇2ρ)]− 1 e−ρhabγab . (25) We can ignore the origin of this action. In other words, it is another model for string. However, under the Weyl transformations hab −→ eu(σ,τ)hab, ρ −→ ρ− u, (26) the action I ′, for any function f , is symmetric. Note that according to the definition of ∇2 there is the transformation ∇2 → e−u∇2. 5 Conclusions We considered some string actions which give dynamics to the worldsheet metric hab. Due to the absence of the Weyl invariance, these models admit at most conformally flat (but not flat) worldsheet. We observed that the constraint equation on the metric, extracted from the Polyakov action, is a special result of the field equations of our string models. Obtaining this constraint equation admits us to introduce an arbitrary function of the scalar curvature to the action. For the case f(R) = α lnR + β, the metric hab becomes proportional to the induced metric of the worldsheet. By introducing a new degree of freedom we obtained a string action, in which for any function f is Weyl invariant. 
Our string models with arbitrary f(R) have the Poincaré symmetry in the flat background. The associated conserved currents are proportional to the scalar curvature R. We also constructed covariantly conserved currents from the Poincaré currents.

References
[1] H.J. Schmidt, Int. J. Mod. Phys. D7 (1998) 215, gr-qc/9712034.
[2] D. Park and Y. Kiem, Phys. Rev. D53 (1996) 5513; Phys. Rev. D53 (1996) 747.
[3] A. Achucarro and M. Ortiz, Phys. Rev. D48 (1993) 3600.
[4] D. Grumiller, W. Kummer and D.V. Vassilevich, Phys. Rept. 369 (2002) 327-429, hep-th/0204253.
[5] M.O. Katanaev and I.V. Volovich, Phys. Lett. B175 (1986) 413, hep-th/0209014.
[6] F. David, Mod. Phys. Lett. A3 (1988) 1651; J. Distler, H. Kawai, Nucl. Phys. B321 (1989) 509; A. A. Tseytlin, Int. Jour. Mod. Phys. A4 (1989) 1257; J. Polchinski, Nucl. Phys. B324 (1989) 123.

ABSTRACT First we introduce an action for the string that leads to a worldsheet which is always curved. For this action we study the Poincaré symmetry and the associated conserved currents. Then a generalization of this action, which contains an arbitrary function of the two-dimensional scalar curvature, is introduced. An extra scalar field enables us to modify these actions into Weyl invariant models. <|endoftext|><|startoftext|> Introduction Cosmic strings play an important role in the study of the early universe. These strings arise during the phase transition after the big bang explosion as the temperature drops below some critical temperature, as predicted by grand unified theories [1-5]. It is thought that cosmic strings cause density perturbations leading to the formation of galaxies [6]. These cosmic strings have stress-energy and couple with the gravitational field. Therefore, it is interesting to study the gravitational effects that arise from strings. The general relativistic treatment of strings was started by Letelier [7, 8] and Stachel [9].
Exact solutions of string cosmology in various space-times have been studied by several authors [10-23]. On the other hand, the magnetic field has an important role at the cosmological scale and is present in galactic and intergalactic spaces. The importance of the magnetic field for various astrophysical phenomena has been studied in many papers. Melvin [24] has pointed out that during the evolution of the universe, matter was in a highly ionized state, smoothly coupled with the field, and formed neutral matter as a result of the expansion of the universe. FRW models are approximately valid, as the present-day magnetic field strength is very small. In the early universe, the strength might have been appreciable. The breakdown of isotropy is due to the magnetic field. Therefore the possibility of the presence of a magnetic field in the cloud string universe is not unrealistic and has been investigated by many authors [25-28]. In this paper, we investigate a Bianchi type I massive string magnetized barotropic perfect fluid cosmological model in General Relativity. The magnetic field is due to an electric current produced along the x-axis with infinite electrical conductivity. The behaviour of the model in the presence and absence of the magnetic field, together with other physical aspects, is also discussed.

2 The Metric and Field Equations

We consider the space-time of Bianchi type-I in the form

$ds^2 = -dt^2 + A^2(t)\,dx^2 + B^2(t)\,dy^2 + C^2(t)\,dz^2$. (1)

The energy momentum tensor for a cloud of massive strings and perfect fluid distribution with electromagnetic field is taken as

$T_i^j = (\rho + p)\, v_i v^j + p\, g_i^j - \lambda\, x_i x^j + E_i^j$, (2)

where $v_i$ and $x_i$ satisfy the conditions

$v_i v^i = -x_i x^i = -1$, $v^i x_i = 0$, (3)

$p$ is the isotropic pressure, $\rho$ is the proper energy density for a cloud of strings with particles attached to them, $\lambda$ is the string tension density, $v^i$ is the four-velocity of the particles, and $x^i$ is a unit space-like vector representing the direction of the string.
In a co-moving co-ordinate system, we have vi = (0, 0, 0, 1), xi = , 0, 0, 0 . (4) The electromagnetic field E i given by Lichnerowicz [29] as i = µ̄ | h |2 − hihj . (5) Here the flow-vector vi satisfies ivj = −1, (6) and µ̄ is the magnetic permeability, hi the magnetic flux vector defined by ǫijklF klvj , (7) where Fkl is the electromagnetic field tensor and ǫijkl is the Levi Civita tensor density. The incidental magnetic field is taken along x-axis, so that h1 6= 0, h2 = h3 = h4 = 0. We assume that F23 is the only non-vanishing component of Fij . The Maxwell’s equations Fij;k + Fjk;i + Fki;j = 0, ;j = 0, (8) are satisfied by F23 = constant = H(say). Here F14 = 0 = F24 = F34, due to the assumption of infinite electrical conduc- tivity [30]. Hence . (9) Since | h |2= hlhl = h1h1 = g11(h1)2, therefore | h |2= µ̄2B2C2 . (10) Using Eqs. (9) and (10) in (5), we have E11 = − 2µ̄B2C2 = −E22 = −E33 = E44 . (11) If the particle density of the configuration is denoted by ρp, then we have ρ = ρp + λ. (12) The Einstein’s field equations (in gravitational units c = 1, G = 1) read as i = −T i , (13) where R i is the Ricci tensor; R = g ijRij is the Ricci scalar. The field equations (13) with (2) subsequently lead to the following system of equations: = −p+ λ+ H 2µ̄B2C2 , (14) 2µ̄B2C2 , (15) 2µ̄B2C2 , (16) 2µ̄B2C2 , (17) where the suffix 4 at the symbols A, B and C denotes ordinary differentiation with respect to t. 3 Solution of Field Equations The field Eqs. (14)-(17) are a system of four equations with six unknown param- eters A, B, C, p, λ and ρ. Two additional constraints relating these parameters are required to obtain explicit solutions of the system. From Eq. (16), we have p = −A44 − B44 − A4B4 , (18) where K = H . Now from Eq. (17), we have . (19) To get deterministic solution, we first assume that the universe is filled with barotropic perfect fluid which leads to p = γρ, (20) where γ(0 ≤ γ ≤ 1) is a constant. Putting the values of p and ρ from Eqs. 
(18) and (19) in (20), we obtain + (1 + γ) + (1 − γ) K = 0. (21) Equations (15) and (16) lead to (CB4 −BC4)4 (CB4 −BC4) = −A4 , (22) which again leads to , (23) where L is an integrating constant and BC = µ, = ν. (24) Thus from Eqs. (23) and (24), we have . (25) For deterministic solution, we secondly assume A = constant = α(say). (26) Thus Eq. (25) leads to . (27) From Eqs. (21) and (26), we have (1− γ)K = 0. (28) Using (24) in Eq. (28), we obtain + (γ − 1) µ + (1− γ) L 4α2µ2 + (1− γ)K = 0, (29) which again leads to 2µ44 + (γ − 1) , (30) where a = (γ − 1)L + 4(γ − 1)K. (31) Let us assume that µ4 = f(µ). Thus µ44 = ff ′, where f ′ = df . Accordingly Eq. (30) leads to (f2) + (γ − 1) 1 , (32) which again reduces to γ − 1 + bµ1−γ . (33) Now from Eq. (27), we have dµ (34) Using Eq. (33) in Eq. (34), we have Lµ̄γdµ bµ1−γ ℓ2 + µ1−γ , (35) where ℓ2 = a (γ−1)b . Eq. (35), after integration, leads to ν = S ℓ2 + µ1−γ − ℓ ℓ2 + µ1−γ + ℓ α(1−γ)ℓ , (36) where S is the constant of integration. Thus the metric (1) reduces to the form ds2 = − dµ2 + α2dx2 + µ ℓ2 + µ1−γ − ℓ ℓ2 + µ1−γ + ℓ αℓ(1−γ) ℓ2 + µ1−γ − ℓ ℓ2 + µ1−γ + ℓ αℓ(1−γ) , (37) which after suitable transformation of coordinates, leads to ds2 = − dT b(ℓ2 + T 1−γ) + dX2 + T ℓ2 + µ1−γ − ℓ ℓ2 + µ1−γ + ℓ αℓ(1−γ) ℓ2 + µ1−γ − ℓ ℓ2 + µ1−γ + ℓ αℓ(1−γ) , (38) where αx = X, Sy = Y, 1√ z = z, µ = T . In the absence of the magnetic field, i.e. when K → 0, then the metric (37) reduces to 2 = − + T 1−γ ) + dX2 + T + T 1−γ − L + T 1−γ + L + T 1−γ − L + T 1−γ + L . (39) 4 The Geometric and Physical Significance of Model The energy density (ρ), the string tension density (λ), the particle density (ρp), the isotropic pressure (p), the scalar of expansion (θ), and shear tensor (σ) for the model (38) are given by b(ℓ2 + T 1−γ)− L , (40) b(1− γ)T 1−γ , (41) b(ℓ2 + γT 1−γ)− L , (42) b(ℓ2 + T 1−γ)− L . (43) ℓ2 + T 1−γ , (44) 6b(ℓ2 + T 1−γ) + . (45) ρ+ p = b{2ℓ2 − (1− γ)T 1−γ} − 2L , (46) ρ+ 3p = b{4ℓ2 + (1 + 3γ)T 1−γ − 4L − 16K . 
(47) The reality conditions given by Ellis [31] as (i)ρ+ p > 0, (ii)ρ+ 3p > 0, are satisfied when T 1−γ < ℓ2 − L The energy conditions ρ ≥ 0 and ρp ≥ 0 are satisfied in the presence of magnetic field for the model (38). The condition ρ ≥ 0 leads to b(ℓ2 + T 1−γ) ≥ + 4K. The condition ρp ≥ 0 leads to b(ℓ2 + T 1−γ) ≥ L − 4K. From Eq. (42), we observe that the string tension density λ ≥ 0 provided b(1− γ)T 1−γ ≥ 8K. The model (38) starts with a big bang at T = 0 and the expansion in the model decreases as time increases. When T → 0 then ρ → ∞, λ → ∞. When T → ∞ then ρ → 0, λ → 0. Also p → ∞ when T → 0 and p → 0 when T → ∞. Since limT→∞ 6= 0, hence the model does not isotropize in general. However, if L = 0 then the model (38) isotropizes for large values of T . There is a point type singularity [32] in the model (38) at T = 0. The ratio of magnetic energy to material energy is given by b(ℓ2 + T (1−γ))− L2 , (48) where 0 ≤ γ ≤ 1. The ratio E is non-zero finite quantity initially and tends to zero as T → ∞. The scale factor (R) is given by R3 = ABC = αµ = αT. (49) Thus R increases as T increases. The deceleration parameter (q) in presence of magnetic field is given by q = −RR44 a(3a− 2) + 1 b(1 + 3γ)T (1−γ) γ−1 + bT (1−γ) ) . (50) The deceleration parameter (q) approaches the value (−1) as in the case of de-Sitter universe if T (1−γ) = 2a(3a− 2)(γ − 1)− a b(1− γ)(1− 3γ) In the absence of magnetic field, i.e. K → 0, the above mentioned quantities are given by 4T 1+γ , (51) b(1− γ) 4T 1+γ , (52) 4T 1+γ , (53) 4T 1+γ , (54) In the absence of magnetic field when γ = 1, then ρ = b and also the string tension density becomes zero. The energy conditions ρ ≥ 0 and ρp ≥ 0 are satisfied for the model (38) when b ≥ 0. The reality conditions given by Ellis [31] as (i) ρ+ p > 0, (ii) ρ+ 3p > 0, are satisfied when b > 0. + bT 1−γ , (55) + 6bT 1−γ . 
(56) In the absence of magnetic field, the model (39) starts with a big bang at T = 0 and the expansion in the model decreases as time increases. When T → 0 then ρ → ∞, λ → ∞ and p → ∞. When T → ∞ then ρ → 0, λ → 0 and p → 0. In the absence of magnetic field, the particle density (ρp) and the isotropic pressure (p) are equal. Since limT→∞ 6= 0, therefore the model does not isotropize in general. However, if L = 0 then the model (39) isotropizes for large values of T . There is a point type singularity [32] in the model (39) at T = 0. In absence of magnetic field, the scale factor (R) is given by R3 = αT. (57) The R increases as T increases in this case also. The deceleration parameter (q) is given by q = − b(1 + 3γ)T (1−γ) − L {3L2(1− γ) + α2} + bT (1−γ) We observe that q < 0 if T (1−γ) > 2{3L2(1−γ)α2} b(1+3γ)α4 . The deceleration parameter (q) approaches the value (−1) as in the case of de-Sitter universe if T (1−γ) = L2{3L2(1− γ) + α2} 3bγα4 Acknowledgments Authors would like to thank the Inter-University Centre for Astronomy and Astrophysics (IUCAA), Pune, India for providing facility and support where this work was carried out. Authors also thank to the referee for their fruitful comments. References [1] Kibble T W B 1976 J. Phys. A: Math. Gen. 9 1387 [2] Zel’dovich Ya B, Kobzarev, I Yu, and Okun, L B 1975 Zh. Eksp. Teor. Fiz. 67 3 Zel’dovich Ya B, Kobzarev, I Yu, and Okun, L B 1975 Sov. Phys.-JETP [3] Kibble T W B 1980 Phys. Rep. 67 183 [4] Everett A E 1981 Phys. Rev. 24 858 [5] Vilenkin A 1981 Phys. Rev. D 24 2082 [6] Zel’dovich Ya B 1980 Mon. Not. R. Astron. Soc. 192 663 [7] Letelier P S 1979 Phys. Rev. D 20 1249 [8] Letelier P S 1983 Phys. Rev. D 28 2414 [9] Stachel J 1980 Phys. Rev. D 21 2171 [10] Banerjee A, Sanyal A K and Chakraborty S 1990 Pramana-J. Phys. 34 1 [11] Chakraborty S 1991 Ind. J. Pure Appl. Phys. 29 31 [12] Tikekar R and Patel L K 1992 Gen. Rel. Grav. 24 397 [13] Tikekar R and Patel L K 1994 Pramana-J. Phys. 
42 483
[14] Patel L K and Maharaj S D 1996 Pramana-J. Phys. 47 33
[15] Ram S and Singh T K 1995 Gen. Rel. Grav. 27 1207
[16] Carminati J and McIntosh C B G 1980 J. Phys. A: Math. Gen. 13 953
[17] Krori K D, Chaudhury T, Mahanta C R and Mazumdar A 1990 Gen. Rel. Grav. 22 123
[18] Wang X X 2003 Chin. Phys. Lett. 20 615
[19] Singh G P and Singh T 1999 Gen. Relativ. Gravit. 31 371
[20] Bali R and Upadhaya R D 2003 Astrophys. Space Sci. 283 97
[21] Bali R and Pradhan A 2007 Chin. Phys. Lett. 24 585
[22] Bali R and Anjali 2006 Astrophys. Space Sci. 302 201
[23] Yadav M K, Rai A and Pradhan A 2007 Int. J. Theor. Phys. to appear (gr-qc/0611032)
[24] Melvin M A 1975 Ann. New York Acad. Sci. 262 253
[25] Wang X X 2006 Chin. Phys. Lett. 23 1702
[26] Wang X X 2004 Astrophys. Space Sci. 293 933
[27] Chakraborty N C and Chakraborty 2001 Int. J. Mod. Phys. D 10 723
[28] Singh G P and Singh T 1999 Gen. Rel. Gravit. 31 371
[29] Lichnerowicz A 1967 Relativistic Hydrodynamics and Magnetohydrodynamics Benjamin New York p. 13
[30] Maartens R 2000 Pramana-J. Phys. 55 576
[31] Ellis G F R 1971 General Relativity and Cosmology ed. Sachs R K Clarendon Press p. 117
[32] MacCallum M A H 1971 Comm. Math. Phys. 20 57

ABSTRACT A Bianchi type I massive string cosmological model with magnetic field for a barotropic perfect fluid distribution is investigated through the techniques used by Letelier and Stachel. To get a deterministic model of the universe, it is assumed that the universe is filled with a barotropic perfect fluid distribution. The magnetic field is due to an electric current produced along the x-axis with infinite electrical conductivity. The behaviour of the model in the presence and absence of the magnetic field, together with other physical aspects, is further discussed.
<|endoftext|><|startoftext|> Introduction The theory of general relativity was developed by Einstein in work that extended from 1907 to 1915. The starting point for Einstein’s thinking was the composition of a review article in 1907 on what we today call the theory of special relativity. Recall that the latter theory sprang from a new kinematics governing length and time measurements that was proposed by Einstein in June of 1905 [1], [2], following important pioneering work by Lorentz and Poincaré. The theory of special relativity essentially poses a new fundamental framework (in place of the one posed by Galileo, Descartes, and Newton) for the formulation of physical laws: this framework being the chrono-geometric space-time structure of Poincaré and Minkowski. After 1905, it therefore seemed a natural task to formulate, reformulate, or modify the then known physical laws so that they fit within the framework of special relativity. For Newton’s law of gravitation, this task was begun (before Einstein had even supplied his conceptual crystallization in 1905) by Lorentz (1900) and Poincaré (1905), and was pursued in the period from 1910 to 1915 by Max Abraham, Gunnar Nordström and Gustav Mie (with these latter researchers developing scalar relativistic theories of gravitation). Meanwhile, in 1907, Einstein became aware that gravitational interactions possessed particular characteristics that suggested the necessity of generalizing the framework and structure of the 1905 theory of relativity. After many years of intense intellectual effort, Einstein succeeded in constructing a generalized theory of relativity (or general relativity) that proposed a profound modification of the chrono-geometric structure of the space-time of special relativity.

∗Talk given at the Poincaré Seminar “Gravitation et Expérience” (28 October 2006, Paris); to appear in the proceedings to be published by Birkhäuser.
†Translated from the French by Eric Novak.
In 1915, in place of a simple, neutral arena, given a priori, independently of all material content, space-time became a physical “field” (identified with the gravitational field). In other words, it was now a dynamical entity, both influencing and influenced by the distribution of mass-energy that it contains. This radically new conception of the structure of space-time remained for a long while on the margins of the development of physics. Twentieth century physics discovered a great number of new physical laws and phenomena while working with the space-time of special relativity as its fundamental framework, as well as imposing the respect of its symmetries (namely the Lorentz-Poincaré group). On the other hand, the theory of general relativity seemed for a long time to be a theory that was both poorly confirmed by experiment and without connection to the extraordinary progress springing from application of quantum theory (along with special relativity) to high-energy physics. This marginalization of general relativity no longer obtains. Today, general relativity has become one of the essential players in cutting-edge science. Numerous high-precision experimental tests have confirmed, in detail, the pertinence of this theory. General relativity has become the favored tool for the description of the macroscopic universe, covering everything from the big bang to black holes, including the solar system, neutron stars, pulsars, and gravitational waves. Moreover, the search for a consistent description of fundamental physics in its entirety has led to the exploration of theories that unify, within a general quantum framework, the description of matter and all its interactions (including gravity). These theories, which are still under construction and are provisionally known as string theories, contain general relativity in a central way but suggest that the fundamental structure of space-time-matter is even richer than is suggested separately by quantum theory and general relativity.

2 Special Relativity

We begin our exposition of the theory of general relativity by recalling the chrono-geometric structure of space-time in the theory of special relativity. The structure of Poincaré-Minkowski space-time is given by a generalization of the Euclidean geometric structure of ordinary space. The latter structure is summarized by the formula $L^2 = (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2$ (a consequence of the Pythagorean theorem), expressing the square of the distance $L$ between two points in space as a sum of the squares of the differences of the (orthonormal) coordinates $x, y, z$ that label the points. The symmetry group of Euclidean geometry is the group of coordinate transformations $(x, y, z) \to (x', y', z')$ that leave the quadratic form $L^2 = (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2$ invariant. (This group is generated by translations, rotations, and “reversals” such as the transformation given by reflection in a mirror, for example: $x' = -x$, $y' = y$, $z' = z$.)

The Poincaré-Minkowski space-time is defined as the ensemble of events (idealizations of what happens at a particular point in space, at a particular moment in time), together with the notion of a (squared) interval $S^2$ defined between any two events. An event is fixed by four coordinates, $x$, $y$, $z$, and $t$, where $(x, y, z)$ are the spatial coordinates of the point in space where the event in question “occurs,” and where $t$ fixes the instant when this event “occurs.” Another event will be described (within the same reference frame) by four different coordinates, let us say $x + \Delta x$, $y + \Delta y$, $z + \Delta z$, and $t + \Delta t$. The points in space where these two events occur are separated by a distance $L$ given by the formula above, $L^2 = (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2$.
The moments in time when these two events occur are separated by a time interval $T$ given by $T = \Delta t$. The squared interval $S^2$ between these two events is given as a function of these quantities, by definition, through the following generalization of the Pythagorean theorem:

$S^2 = L^2 - c^2 T^2 = (\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 - c^2 (\Delta t)^2$, (1)

where $c$ denotes the speed of light (or, more precisely, the maximum speed of signal propagation).

Equation (1) defines the chrono-geometry of Poincaré-Minkowski space-time. The symmetry group of this chrono-geometry is the group of coordinate transformations $(x, y, z, t) \to (x', y', z', t')$ that leave the quadratic form (1) of the interval $S$ invariant. We will show that this group is made up of linear transformations and that it is generated by translations in space and time, spatial rotations, “boosts” (meaning special Lorentz transformations), and reversals of space and time.

It is useful to replace the time coordinate $t$ by the “light-time” $x^0 \equiv ct$, and to collectively denote the coordinates as $x^\mu \equiv (x^0, x^i)$ where the Greek indices $\mu, \nu, \ldots = 0, 1, 2, 3$, and the Roman indices $i, j, \ldots = 1, 2, 3$ (with $x^1 = x$, $x^2 = y$, and $x^3 = z$). Equation (1) is then written

$S^2 = \eta_{\mu\nu}\, \Delta x^\mu\, \Delta x^\nu$, (2)

where we have used the Einstein summation convention$^1$ and where $\eta_{\mu\nu}$ is a diagonal matrix whose only non-zero elements are $\eta_{00} = -1$ and $\eta_{11} = \eta_{22} = \eta_{33} = +1$. The symmetry group of Poincaré-Minkowski space-time is therefore the ensemble of Lorentz-Poincaré transformations,

$x'^\mu = \Lambda^\mu{}_\nu\, x^\nu + a^\mu$, (3)

where $\eta_{\alpha\beta}\, \Lambda^\alpha{}_\mu\, \Lambda^\beta{}_\nu = \eta_{\mu\nu}$.

The chrono-geometry of Poincaré-Minkowski space-time can be visualized by representing, around each point $x$ in space-time, the locus of points that are separated from the point $x$ by a unit (squared) interval, in other words the ensemble of points $x'$ such that $S^2_{xx'} = \eta_{\mu\nu}(x'^\mu - x^\mu)(x'^\nu - x^\nu) = +1$. This locus is a one-sheeted (unit) hyperboloid.
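The invariance of the interval (2) under the transformations (3) can be checked numerically. The Python sketch below is an illustration added here, not part of the original text: it assumes the coordinate ordering $(x^0, x, y, z)$, takes a boost along $x$ with velocity $\beta c$ as the example transformation, and uses hypothetical helper names. The translation $a^\mu$ drops out of coordinate differences, so only the matrix $\Lambda$ is tested.

```python
import math

# Minkowski metric eta_{mu nu} = diag(-1, +1, +1, +1), as defined below equation (2).
ETA = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def boost_x(beta):
    """Matrix Lambda of a boost along x with velocity beta * c (assumes |beta| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [
        [gamma, -gamma * beta, 0.0, 0.0],
        [-gamma * beta, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def interval(dx):
    """Squared interval S^2 = eta_{mu nu} dx^mu dx^nu of equation (2)."""
    return sum(ETA[m][n] * dx[m] * dx[n] for m in range(4) for n in range(4))


def transform(lam, dx):
    """Apply Lambda to a coordinate difference: dx'^mu = Lambda^mu_nu dx^nu."""
    return [sum(lam[m][n] * dx[n] for n in range(4)) for m in range(4)]


def preserves_eta(lam, tol=1e-12):
    """Check eta_{alpha beta} Lambda^alpha_mu Lambda^beta_nu = eta_{mu nu}."""
    for m in range(4):
        for n in range(4):
            s = sum(
                ETA[a][b] * lam[a][m] * lam[b][n]
                for a in range(4)
                for b in range(4)
            )
            if abs(s - ETA[m][n]) > tol:
                return False
    return True


if __name__ == "__main__":
    lam = boost_x(0.6)
    dx = [1.0, 2.0, -0.5, 3.0]  # (c * dt, dx, dy, dz), arbitrary sample values
    print(preserves_eta(lam))
    print(interval(dx), interval(transform(lam, dx)))
```

The two printed intervals agree up to rounding, which is the statement that a boost leaves the quadratic form (1) unchanged.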
If we were within an ordinary Euclidean space, the ensemble of points $x'$ would trace out a (unit) sphere centered on $x$, and the “field” of these spheres centered on each point $x$ would allow one to completely characterize the Euclidean geometry of the space. Similarly, in the case of Poincaré-Minkowski space-time, the “field” of unit hyperboloids centered on each point $x$ is a visual characterization of the geometry of this space-time. See Figure 1. This figure gives an idea of the symmetry group of Poincaré-Minkowski space-time, and renders the rigid and homogeneous nature of its geometry particularly clear.

Footnote 1. Every repeated index is supposed to be summed over all of its possible values.

Figure 1: Geometry of the “rigid” space-time of the theory of special relativity. This geometry is visualized by representing, around each point $x$ in space-time, the locus of points separated from the point $x$ by a unit (squared) interval. The space-time shown here has only three dimensions: one time dimension (represented vertically), $x^0 = ct$, and two spatial dimensions (represented horizontally), $x$, $y$. We have also shown the “space-time line,” or “world-line,” (moving from the bottom to the top of the “space-time block,” or from the past towards the future) representing the history of a particle’s motion.

The essential idea in Einstein’s article of June 1905 was to impose the group of transformations (3) as a symmetry group of the fundamental laws of physics (“the principle of relativity”). This point of view proved to be extraordinarily fruitful, since it led to the discovery of new laws and the prediction of new phenomena.
Let us mention some of these for the record: the relativistic dynamics of classical particles, the dilation of lifetimes for relativistic particles, the relation $E = mc^2$ between energy and inertial mass, Dirac’s relativistic theory of quantum spin-$\frac{1}{2}$ particles, the prediction of antimatter, the classification of particles by rest mass and spin, the relation between spin and statistics, and the CPT theorem.

After these recollections on special relativity, let us discuss the special feature of gravity which, in 1907, suggested to Einstein the need for a profound generalization of the chrono-geometric structure of space-time.

3 The Principle of Equivalence

Einstein’s point of departure was a striking experimental fact: all bodies in an external gravitational field fall with the same acceleration. This fact was pointed out by Galileo in 1638. Through a remarkable combination of logical reasoning, thought experiments, and real experiments performed on inclined planes (see footnote 2), Galileo was in fact the first to conceive of what we today call the “universality of free-fall” or the “weak principle of equivalence.” Let us cite the conclusion that Galileo drew from a hypothetical argument where he varied the ratio between the densities of the freely falling bodies under consideration and the resistance of the medium through which they fall: “Having observed this I came to the conclusion that in a medium totally devoid of resistance all bodies would fall with the same speed” [3]. This universality of free-fall was verified with more precision by Newton’s experiments with pendulums, and was incorporated by him into his theory of gravitation (1687) in the form of the identification of the inertial mass $m_i$ (appearing in the fundamental law of dynamics $\mathbf{F} = m_i\, \mathbf{a}$) with the gravitational mass $m_g$ (appearing in the gravitational force, $F_g = G\, m_g\, m'_g / r^2$):

$$m_i = m_g . \tag{4}$$

At the end of the nineteenth century, Baron Roland von Eötvös verified the equivalence (4) between $m_i$ and $m_g$ with a precision on the order of $10^{-9}$, and Einstein was aware of this high-precision verification. (At present, the equivalence between $m_i$ and $m_g$ has been verified at the level of $10^{-12}$ [4].) The point that struck Einstein was that, given the precision with which $m_i = m_g$ was verified, and given the equivalence between inertial mass and energy discovered by Einstein in September of 1905 [2] ($E = m_i c^2$), one must conclude that all of the various forms of energy that contribute to the inertial mass of a body (rest mass of the elementary constituents, various binding energies, internal kinetic energy, etc.) contribute in a strictly identical way to the gravitational mass of this body, meaning both to its capacity for reacting to an external gravitational field and to its capacity to create a gravitational field.

In 1907, Einstein realized that the equivalence between $m_i$ and $m_g$ implicitly contained a deeper equivalence between inertia and gravitation that had important consequences for the notion of an inertial reference frame (which was a fundamental concept in the theory of special relativity). In an ingenious thought experiment, Einstein imagined the behavior of rigid bodies and reference clocks within a freely falling elevator. Because of the universality of free-fall, all of the objects in such a “freely falling local reference frame” would appear not to be accelerating with respect to it. Thus, with respect to such a reference frame, the exterior gravitational field is “erased” (or “effaced”). Einstein therefore postulated what he called the “principle of equivalence” between gravitation and inertia. This principle has two parts, which Einstein used in turn.
The first part says that, for any external gravitational field whatsoever, it is possible to locally “erase” the gravitational field by using an appropriate freely falling local reference frame and that, because of this, the non-gravitational physical laws apply within this local reference frame just as they would in an inertial reference frame (free of gravity) in special relativity. The second part of Einstein’s equivalence principle says that, by starting from an inertial reference frame in special relativity (in the absence of any “true” gravitational field), one can create an apparent gravitational field in a local reference frame, if this reference frame is accelerated (be it in a straight line or through a rotation).

Footnote 2. The experiment with falling bodies said to be performed from atop the Leaning Tower of Pisa is a myth, although it aptly summarizes the essence of Galilean innovation.

4 Gravitation and Space-Time Chrono-Geometry

Einstein was able (through an extraordinary intellectual journey that lasted eight years) to construct a new theory of gravitation, based on a rich generalization of the 1905 theory of relativity, starting just from the equivalence principle described above. The first step in this journey consisted in understanding that the principle of equivalence would suggest a profound modification of the chrono-geometric structure of Poincaré-Minkowski space-time recalled in Equation (1) above.

To illustrate, let $X^\alpha$, $\alpha = 0, 1, 2, 3$, be the space-time coordinates in a local, freely falling reference frame (or locally inertial reference frame). In such a reference frame, the laws of special relativity apply.
In particular, the infinitesimal space-time interval $ds^2 = dL^2 - c^2\, dT^2$ between two neighboring events within such a reference frame, $X^\alpha$ and $X'^\alpha = X^\alpha + dX^\alpha$ (close to the center of this reference frame), takes the form

$$ds^2 = dL^2 - c^2\, dT^2 = \eta_{\alpha\beta}\, dX^\alpha\, dX^\beta , \tag{5}$$

where we recall that the repeated indices $\alpha$ and $\beta$ are summed over all of their values ($\alpha, \beta = 0, 1, 2, 3$). We also know that in special relativity the local energy and momentum densities and fluxes are collected into the ten components of the energy-momentum tensor $T^{\alpha\beta}$. (For example, the energy per unit volume is equal to $T^{00}$, in the reference frame described by coordinates $X^\alpha = (X^0, X^i)$, $i = 1, 2, 3$.) The conservation of energy and momentum translates into the equation $\partial_\beta T^{\alpha\beta} = 0$, where $\partial_\beta = \partial/\partial X^\beta$.

The theory of special relativity tells us that we can change our locally inertial reference frame (while remaining in the neighborhood of a space-time point where one has “erased” gravity) through a Lorentz transformation, $X'^\alpha = \Lambda^\alpha{}_\beta\, X^\beta$. Under such a transformation, the infinitesimal interval $ds^2$, Equation (5), remains invariant and the ten components of the (symmetric) tensor $T^{\alpha\beta}$ are transformed according to $T'^{\alpha\beta} = \Lambda^\alpha{}_\gamma\, \Lambda^\beta{}_\delta\, T^{\gamma\delta}$. On the other hand, when we pass from a locally inertial reference frame (with coordinates $X^\alpha$) to an extended non-inertial reference frame (with coordinates $x^\mu$; $\mu = 0, 1, 2, 3$), the transformation connecting the $X^\alpha$ to the $x^\mu$ is no longer a linear transformation (like the Lorentz transformation) but becomes a non-linear transformation $X^\alpha = X^\alpha(x^\mu)$ that can take any form whatsoever. Because of this, the value of the infinitesimal interval $ds^2$, when expressed in a general, extended reference frame, will take a more complicated form than the very simple one given by Equation (5) that it had in a reference frame that was locally in free-fall. In fact, by differentiating the non-linear functions $X^\alpha = X^\alpha(x^\mu)$ we obtain the relation $dX^\alpha = (\partial X^\alpha/\partial x^\mu)\, dx^\mu$.
By substituting this relation into (5) we then obtain

$$ds^2 = g_{\mu\nu}(x^\lambda)\, dx^\mu\, dx^\nu , \tag{6}$$

where the indices $\mu, \nu$ are summed over 0, 1, 2, 3 and where the ten functions $g_{\mu\nu}(x)$ (symmetric over the indices $\mu$ and $\nu$) of the four variables $x^\lambda$ are defined, point by point (meaning that for each point $x^\lambda$ we consider a reference frame that is locally freely falling at $x$, with local coordinates $X^\alpha_x$) by $g_{\mu\nu}(x) = \eta_{\alpha\beta}\, \big(\partial X^\alpha_x(x)/\partial x^\mu\big)\, \big(\partial X^\beta_x(x)/\partial x^\nu\big)$. Because of the nonlinearity of the functions $X^\alpha(x)$, the functions $g_{\mu\nu}(x)$ generally depend in a nontrivial way on the coordinates $x^\lambda$.

The local chrono-geometry of space-time thus appears to be given, not by the simple Minkowskian metric (2), with constant coefficients $\eta_{\mu\nu}$, but by a quadratic metric of a much more general type, Equation (6), with coefficients $g_{\mu\nu}(x)$ that vary from point to point. Such general metric spaces had been introduced and studied by Gauss and Riemann in the nineteenth century (in the case where the quadratic form (6) is positive definite). They carry the name Riemannian spaces or curved spaces. (In the case of interest for Einstein’s theory, where the quadratic form (6) is not positive definite, one speaks of a pseudo-Riemannian metric.)

We do not have the space here to explain in detail the various geometric structures in a Riemannian space that are derivable from the data of the infinitesimal interval (6). Let us note simply that given Equation (6), which gives the distance $ds$ between two infinitesimally separated points, we are able, through integration along a curve, to define the length of an arbitrary curve connecting two widely separated points $A$ and $B$: $L_{AB} = \int_A^B ds$. One can then define the “straightest possible line” between two given points $A$ and $B$ to be the shortest line, in other words the curve that minimizes (or, more generally, extremizes) the integrated distance $L_{AB}$. These straightest possible lines are called geodesic curves.
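The definition of length by integration, and the idea of a shortest line, can be illustrated numerically in the positive-definite case. The sketch below assumes the unit sphere with the standard metric $ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$ (colatitude $\theta$, longitude $\varphi$), and compares two curves joining the same pair of points; the choice of points, the step count, and all function names are illustrative assumptions.

```python
import math


def sphere_ds(theta, dtheta, dphi):
    """Line element of the unit sphere: ds^2 = dtheta^2 + sin^2(theta) dphi^2."""
    return math.sqrt(dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2)


def curve_length(curve, steps=20000):
    """Approximate L_AB = integral of ds along curve(t), t in [0, 1]."""
    total = 0.0
    theta0, phi0 = curve(0.0)
    for k in range(1, steps + 1):
        theta1, phi1 = curve(k / steps)
        # Evaluate the metric at the midpoint of each small segment.
        total += sphere_ds(0.5 * (theta0 + theta1), theta1 - theta0, phi1 - phi0)
        theta0, phi0 = theta1, phi1
    return total


THETA_0 = math.pi / 3      # colatitude shared by the two end points A and B
DELTA_PHI = math.pi / 2    # difference in longitude between A and B


def along_parallel(t):
    """Curve that stays on the circle of constant colatitude."""
    return THETA_0, DELTA_PHI * t


# Unit vectors of A and B, and the angle between them seen from the center.
A = (math.sin(THETA_0), 0.0, math.cos(THETA_0))
B = (math.sin(THETA_0) * math.cos(DELTA_PHI),
     math.sin(THETA_0) * math.sin(DELTA_PHI),
     math.cos(THETA_0))
OMEGA = math.acos(sum(a * b for a, b in zip(A, B)))


def along_great_circle(t):
    """Great-circle arc from A to B, obtained by spherical interpolation."""
    wa = math.sin((1.0 - t) * OMEGA) / math.sin(OMEGA)
    wb = math.sin(t * OMEGA) / math.sin(OMEGA)
    px, py, pz = (wa * a + wb * b for a, b in zip(A, B))
    return math.acos(pz), math.atan2(py, px)


length_parallel = curve_length(along_parallel)
length_great_circle = curve_length(along_great_circle)
print(length_parallel, length_great_circle, OMEGA)
```

The great-circle arc comes out shorter than the path along the parallel, and its length agrees with the central angle between the two points, as expected for a sphere of unit radius.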
To give a simple example, the geodesics of a spherical surface (like the surface of the Earth) are the great circles (with radius equal to the radius of the sphere). If one mathematically writes the condition for a curve, as given by its parametric representation $x^\mu = x^\mu(s)$, where $s$ is the length along the curve, to extremize the total length $L_{AB}$, one finds that $x^\lambda(s)$ must satisfy the following second-order differential equation:

$$\frac{d^2 x^\lambda}{ds^2} + \Gamma^\lambda_{\mu\nu}(x)\, \frac{dx^\mu}{ds}\, \frac{dx^\nu}{ds} = 0 , \tag{7}$$

where the quantities $\Gamma^\lambda_{\mu\nu}$, known as the Christoffel coefficients or connection coefficients, are calculated, at each point $x$, from the metric components $g_{\mu\nu}(x)$ by the equation

$$\Gamma^\lambda_{\mu\nu} \equiv \frac{1}{2}\, g^{\lambda\sigma} \left( \partial_\mu g_{\nu\sigma} + \partial_\nu g_{\mu\sigma} - \partial_\sigma g_{\mu\nu} \right) , \tag{8}$$

where $g^{\mu\nu}$ denotes the matrix inverse to $g_{\mu\nu}$ ($g^{\mu\sigma} g_{\sigma\nu} = \delta^\mu_\nu$, where the Kronecker symbol $\delta^\mu_\nu$ is equal to 1 when $\mu = \nu$ and 0 otherwise) and where $\partial_\mu \equiv \partial/\partial x^\mu$ denotes the partial derivative with respect to the coordinate $x^\mu$. To give a very simple example: in the Poincaré-Minkowski space-time the components of the metric are constant, $g_{\mu\nu} = \eta_{\mu\nu}$ (when we use an inertial reference frame). Because of this, the connection coefficients (8) vanish in an inertial reference frame, and the differential equation for geodesics reduces to $d^2 x^\lambda/ds^2 = 0$, whose solutions are ordinary straight lines: $x^\lambda(s) = a^\lambda s + b^\lambda$. On the other hand, in a general “curved” space-time (meaning one with components $g_{\mu\nu}$ that depend in an arbitrary way on the point $x$) the geodesics cannot be globally represented by straight lines. One can nevertheless show that it always remains possible, for any $g_{\mu\nu}(x)$ whatsoever, to change coordinates $x^\mu \to X^\alpha(x)$ in such a way that the connection coefficients $\Gamma^\alpha_{\beta\gamma}$, in the new system of coordinates $X^\alpha$, vanish locally, at a given point $X^\alpha_0$ (or even along an arbitrary curve).
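As a worked illustration of Equations (7) and (8), consider again the sphere. The sketch below assumes a sphere of unit radius with the standard coordinates $(\theta, \varphi)$ and metric $ds^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$; this choice of coordinates is an assumption made for the example, not something fixed by the text.

```latex
% Metric components and their inverse on the unit sphere:
g_{\theta\theta} = 1, \quad g_{\varphi\varphi} = \sin^2\theta, \quad
g^{\theta\theta} = 1, \quad g^{\varphi\varphi} = \frac{1}{\sin^2\theta}.

% Non-vanishing connection coefficients from Equation (8):
\Gamma^{\theta}_{\varphi\varphi}
  = -\tfrac{1}{2}\, g^{\theta\theta}\, \partial_\theta g_{\varphi\varphi}
  = -\sin\theta \cos\theta ,
\qquad
\Gamma^{\varphi}_{\theta\varphi} = \Gamma^{\varphi}_{\varphi\theta}
  = \tfrac{1}{2}\, g^{\varphi\varphi}\, \partial_\theta g_{\varphi\varphi}
  = \frac{\cos\theta}{\sin\theta} .

% Geodesic equation (7), component by component:
\frac{d^2\theta}{ds^2} - \sin\theta \cos\theta \left(\frac{d\varphi}{ds}\right)^2 = 0 ,
\qquad
\frac{d^2\varphi}{ds^2} + 2\, \frac{\cos\theta}{\sin\theta}\,
  \frac{d\theta}{ds}\, \frac{d\varphi}{ds} = 0 .

% The equator, theta = pi/2 and phi = s, solves both equations,
% since cos(pi/2) = 0 and d(theta)/ds = 0 along it:
\theta(s) = \frac{\pi}{2}, \qquad \varphi(s) = s .
```

The equator is a great circle, and every other great circle is obtained from it by a rotation of the sphere, which leaves the metric unchanged.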
Locally geodesic coordinate systems, in which the connection coefficients vanish at a chosen point, realize Einstein’s equivalence principle mathematically: up to terms of second order, the components $g_{\alpha\beta}(X)$ of a “curved” metric in locally geodesic coordinates $X^\alpha$ ($ds^2 = g_{\alpha\beta}(X)\, dX^\alpha\, dX^\beta$) can be identified with the components of a “flat” Poincaré-Minkowski metric: $g_{\alpha\beta}(X) = \eta_{\alpha\beta} + O((X - X_0)^2)$, where $X_0$ is the point around which we expand.

5 Einstein’s Equations: Elastic Space-Time

Having postulated that a consistent relativistic theory of the gravitational field should include the consideration of a far-reaching generalization of the Poincaré-Minkowski space-time, Equation (6), Einstein concluded that the same ten functions $g_{\mu\nu}(x)$ should describe both the geometry of space-time and gravitation. He therefore got down to the task of finding which equations must be satisfied by the “geometric-gravitational field” $g_{\mu\nu}(x)$. He was guided in this search by three principles. The first was the principle of general relativity, which asserts that in the presence of a gravitational field one should be able to write the fundamental laws of physics (including those governing the gravitational field itself) in the same way in any coordinate system whatsoever. The second was that the “source” of the gravitational field should be the energy-momentum tensor $T^{\mu\nu}$. The third was a principle of correspondence with earlier physics: in the limit where one neglects gravitational effects, $g_{\mu\nu}(x) = \eta_{\mu\nu}$ should be a solution of the equations being sought, and there should also be a so-called Newtonian limit where the new theory reduces to Newton’s theory of gravity.

Note that the principle of general relativity (contrary to appearances and contrary to what Einstein believed for several years) has a different physical status than the principle of special relativity.
The principle of special relativity was a symmetry principle for the structure of space-time that asserted that physics is the same in a particular class of reference frames, and therefore that certain “corresponding” phenomena occur in exactly the same way in different reference frames (“active” transformations). On the other hand, the principle of general relativity is a principle of indifference: the phenomena do not (in general) take place in the same way in different coordinate systems. However, none of these (extended) coordinate systems enjoys any privileged status with respect to the others.

The principle asserting that the energy-momentum tensor $T^{\mu\nu}$ should be the source of the gravitational field is founded on two ideas: the relations $E = m_i c^2$ and the weak principle of equivalence $m_i = m_g$ show that, in the Newtonian limit, the source of gravitation, the gravitational mass $m_g$, is equal to the total energy of the body considered, or in other words the integral over space of the energy density $T^{00}$, up to the factor $c^{-2}$. Therefore at least one of the components of the tensor $T^{\mu\nu}$ must play the role of source for the gravitational field. However, since the gravitational field is encoded, according to Einstein, by the ten components of the metric $g_{\mu\nu}$, it is natural to suppose that the source for $g_{\mu\nu}$ must also have ten components, which is precisely the case for the (symmetric) tensor $T^{\mu\nu}$.

In November of 1915, after many years of conceptually arduous work, Einstein wrote the final form of the theory of general relativity [6]. Einstein’s equations are non-linear, second-order partial differential equations for the geometric-gravitational field $g_{\mu\nu}$, containing the energy-momentum tensor $T_{\mu\nu} \equiv g_{\mu\kappa}\, g_{\nu\lambda}\, T^{\kappa\lambda}$ on the right-hand side.
They are written as follows:

$$R_{\mu\nu} - \frac{1}{2}\, R\, g_{\mu\nu} = \frac{8\pi G}{c^4}\, T_{\mu\nu} , \tag{9}$$

where $G$ is the (Newtonian) gravitational constant, $c$ is the speed of light, and $R \equiv g^{\mu\nu} R_{\mu\nu}$ and the Ricci tensor $R_{\mu\nu}$ are calculated as a function of the connection coefficients $\Gamma^\lambda_{\mu\nu}$ (8) in the following way:

$$R_{\mu\nu} \equiv \partial_\alpha \Gamma^\alpha_{\mu\nu} - \partial_\nu \Gamma^\alpha_{\mu\alpha} + \Gamma^\alpha_{\beta\alpha}\, \Gamma^\beta_{\mu\nu} - \Gamma^\alpha_{\beta\nu}\, \Gamma^\beta_{\mu\alpha} . \tag{10}$$

One can show that, in a four-dimensional space-time, the three principles we have described previously uniquely determine Einstein’s equations (9). It is nevertheless remarkable that these equations may also be developed from points of view that are completely different from the one taken by Einstein. For example, in the 1960s various authors (in particular Feynman, Weinberg and Deser; see references in [4]) showed that Einstein’s equations could be obtained from a purely dynamical approach, founded on the consistency of interactions of a long-range spin 2 field, without making any appeal, as Einstein had, to the geometric notions coming from mathematical work on Riemannian spaces. Let us also note that if we relax one of the principles described previously (as Einstein did in 1917) we can find a generalization of Equation (9) in which one adds the term $+\Lambda\, g_{\mu\nu}$ to the left-hand side, where $\Lambda$ is the so-called cosmological constant. Such a modification was proposed by Einstein in 1917 in order to be able to write down a globally homogeneous and stationary cosmological solution. Einstein rejected this additional term after work by Friedmann (1922) showed the existence of expanding cosmological solutions of general relativity and after the observational discovery (by Hubble in 1929) of the expanding motion of galaxies within the universe.
However, recent cosmological data have once again made this possibility fashionable, although in the fundamental physics of today one tends to believe that a term of the type $\Lambda\, g_{\mu\nu}$ should be considered as a particular physical contribution to the right-hand side of Einstein’s equations (more precisely, as the stress-energy tensor of the vacuum, $T^V_{\mu\nu} = -\frac{c^4}{8\pi G}\, \Lambda\, g_{\mu\nu}$), rather than as a universal geometric modification of the left-hand side.

Let us now comment on the physical meaning of Einstein’s equations (9). The essential new idea is that the chrono-geometric structure of space-time, Equation (6), in other words the structure that underlies all of the measurements that one could locally make of duration, $dT$, and of distance, $dL$ (we recall that, locally, $ds^2 = dL^2 - c^2\, dT^2$), is no longer a rigid structure that is given a priori, once and for all (as was the case for the structure of Poincaré-Minkowski space-time), but instead has become a field, a dynamical or elastic structure, which is created and/or deformed by the presence of an energy-momentum distribution. See Figure 2, which visualizes the “elastic” geometry of space-time in the theory of general relativity by representing, around each point $x$, the locus of points (assumed to be infinitesimally close to $x$) separated from $x$ by a constant (squared) interval: $ds^2 = \varepsilon^2$. As in the case of Poincaré-Minkowski space-time (Figure 1), one arrives at a “field” of hyperboloids. However, this field of hyperboloids no longer has a “rigid” and homogeneous structure.

Figure 2: “Elastic” space-time geometry in the theory of general relativity. This geometry is visualized by representing, around each space-time point $x$, the locus of points separated from $x$ by a given small positive (squared) interval.

The space-time field $g_{\mu\nu}(x)$ describes the variation from point to point of the chrono-geometry as well as all gravitational effects.
The simplest example of space-time chrono-geometric elasticity is the effect that the proximity of a mass has on the “local rate of flow for time.” In concrete terms, if you separate two twins at birth, with one staying on the surface of the Earth and the other going to live on the peak of a very tall mountain (in other words farther from the Earth’s center), and then reunite them after 100 years, the “highlander” will be older (will have lived longer) than the twin who stayed on the valley floor. Everything takes place as if time flows more slowly the closer one is to a given distribution of mass-energy. In mathematical terms this effect is due to the fact that the coefficient $g_{00}(x)$ of $(dx^0)^2$ in Equation (6) is deformed with respect to its value in special relativity, $g^{\rm Minkowski}_{00} = \eta_{00} = -1$, to become $g^{\rm Einstein}_{00}(x) \simeq -1 + 2GM/(c^2 r)$, where $M$ is the Earth’s mass (in our example) and $r$ the distance to the center of the Earth. In the example considered above of terrestrial twins the effect is extremely small (a difference in the amount of time lived of about one second over 100 years), but the effect is real and has been verified many times using atomic clocks (see the references in [4]). Let us mention that today this “Einstein effect” has important practical repercussions, for example in aerial or maritime navigation, for the piloting of automobiles, or even farm machinery, etc. In fact, the GPS (Global Positioning System), which uses the data transmitted by a constellation of atomic clocks on board satellites, incorporates the Einsteinian deformation of space-time chrono-geometry into its software. The effect is only on the order of one part in a billion, but if it were not taken into account, it would introduce an unacceptably large error into the GPS, which would continually grow over time.
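The order of magnitude quoted for the GPS can be estimated from the expression for $g_{00}$ given above. The sketch below is a rough estimate under stated assumptions: both clocks are treated as being at rest (so the time dilation due to the orbital velocity and to the Earth’s rotation is ignored), the Earth is treated as a point mass, and the orbital radius of about 26,560 km and the other constants are standard reference values that do not appear in the text.

```python
import math

GM_EARTH = 3.986004418e14   # m^3/s^2, geocentric gravitational constant
C = 299_792_458.0           # m/s, speed of light
R_GROUND = 6.371e6          # m, mean radius of the Earth (assumed clock position)
R_GPS = 2.656e7             # m, approximate GPS orbital radius (assumed)


def g00(r):
    """Weak-field coefficient g_00 ~ -1 + 2GM/(c^2 r) from the text."""
    return -1.0 + 2.0 * GM_EARTH / (C ** 2 * r)


def clock_rate_ratio(r_high, r_low):
    """Ratio of proper-time rates of two clocks at rest, sqrt(g00(high)/g00(low))."""
    return math.sqrt(g00(r_high) / g00(r_low))


fractional_shift = clock_rate_ratio(R_GPS, R_GROUND) - 1.0
drift_per_day = fractional_shift * 86400.0   # seconds gained per day by the orbiting clock

print(f"fractional rate difference: {fractional_shift:.3e}")
print(f"accumulated drift per day:  {drift_per_day * 1e6:.1f} microseconds")
```

Under these assumptions the orbiting clock runs fast by a few parts in $10^{10}$, and the corresponding offset accumulates day after day, which is the sense in which the error would continually grow over time.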
Indeed, GPS performance relies on the high stability of the orbiting atomic clocks, a stability better than $10^{-13}$, or in other words 10,000 times greater than the apparent change in frequency ($\sim 10^{-9}$) due to the Einsteinian deformation of the chrono-geometry.

6 The Weak-Field Limit and the Newtonian Limit

To understand the physical consequences of Einstein’s equations (9), it is useful to begin by considering the limiting case of weak geometric-gravitational fields, namely the case where $g_{\mu\nu}(x) = \eta_{\mu\nu} + h_{\mu\nu}(x)$, with perturbations $h_{\mu\nu}(x)$ that are very small with respect to unity: $|h_{\mu\nu}(x)| \ll 1$. In this case, a simple calculation (that we encourage the reader to perform) starting from Definitions (8) and (10) above, leads to the following explicit form of Einstein’s equations (where we ignore terms of order $h^2$ and $hT$):

$$\Box h_{\mu\nu} - \partial_\mu \partial^\alpha h_{\alpha\nu} - \partial_\nu \partial^\alpha h_{\alpha\mu} + \partial_{\mu\nu} h^\alpha_\alpha = -\frac{16\pi G}{c^4}\, \tilde{T}_{\mu\nu} , \tag{11}$$

where $\Box = \eta^{\mu\nu} \partial_{\mu\nu} = \Delta - \partial_0^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2 - c^{-2}\, \partial^2/\partial t^2$ denotes the “flat” d’Alembertian (or wave operator; $x^\mu = (ct, x, y, z)$), and where indices in the upper position have been raised by the inverse $\eta^{\mu\nu}$ of the flat metric $\eta_{\mu\nu}$ (numerically $\eta^{\mu\nu} = \eta_{\mu\nu}$, meaning that $-\eta^{00} = \eta^{11} = \eta^{22} = \eta^{33} = +1$). For example $\partial^\alpha h_{\alpha\nu}$ denotes $\eta^{\alpha\beta} \partial_\alpha h_{\beta\nu}$ and $h^\alpha_\alpha \equiv \eta^{\alpha\beta} h_{\alpha\beta} = -h_{00} + h_{11} + h_{22} + h_{33}$. The “source” $\tilde{T}_{\mu\nu}$ appearing on the right-hand side of (11) denotes the combination $\tilde{T}_{\mu\nu} \equiv T_{\mu\nu} - \frac{1}{2}\, T^\alpha_\alpha\, \eta_{\mu\nu}$ (when space-time is four-dimensional).

The “linearized” approximation (11) of Einstein’s equations is analogous to Maxwell’s equations

$$\Box A_\mu - \partial_\mu \partial^\alpha A_\alpha = -4\pi\, J_\mu , \tag{12}$$

connecting the electromagnetic four-potential $A_\mu \equiv \eta_{\mu\nu} A^\nu$ (where $A^0 = V$, $A^i = \mathbf{A}$, $i = 1, 2, 3$) to the four-current density $J_\mu \equiv \eta_{\mu\nu} J^\nu$ (where $J^0 = \rho$ is the charge density and $J^i = \mathbf{J}$ is the current density). Another analogy is that the structure of the left-hand side of Maxwell’s equations implies that the “source” $J_\mu$ appearing on the right-hand side must satisfy $\partial^\mu J_\mu = 0$ ($\partial^\mu \equiv \eta^{\mu\nu} \partial_\nu$), which expresses the conservation of electric charge.
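The step from the structure of Equation (12) to charge conservation can be written out in one line. The sketch below uses only Equation (12) and the fact that partial derivatives commute.

```latex
% Apply \partial^\mu to both sides of Equation (12):
\partial^\mu \left( \Box A_\mu - \partial_\mu \partial^\alpha A_\alpha \right)
  = \Box\, \partial^\mu A_\mu - \Box\, \partial^\alpha A_\alpha
  = 0 ,
% using \partial^\mu \partial_\mu = \Box and relabeling the summed index.
% The right-hand side must therefore vanish as well:
-4\pi\, \partial^\mu J_\mu = 0
\quad \Longrightarrow \quad
\partial^\mu J_\mu = 0 .
```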
Likewise, the structure of the left-hand side of the linearized form of Einstein’s equations (11) implies that the “source” $T_{\mu\nu} = \tilde{T}_{\mu\nu} - \frac{1}{2}\, \tilde{T}^\alpha_\alpha\, \eta_{\mu\nu}$ must satisfy $\partial^\mu T_{\mu\nu} = 0$, which expresses the conservation of energy and momentum of matter. (The structure of the left-hand side of the exact form of Einstein’s equations (9) implies that the source $T_{\mu\nu}$ must satisfy the more complicated equation $\partial_\mu T^{\mu\nu} + \Gamma^\mu_{\sigma\mu}\, T^{\sigma\nu} + \Gamma^\nu_{\sigma\mu}\, T^{\mu\sigma} = 0$, where the terms in $\Gamma T$ can be interpreted as describing an exchange of energy and momentum between matter and the gravitational field.) The major difference is that, in the case of electromagnetism, the field $A_\mu$ and its source $J_\mu$ have a single space-time index, while in the gravitational case the field $h_{\mu\nu}$ and its source $\tilde{T}_{\mu\nu}$ have two space-time indices. We shall return later to this analogy/difference between $A_\mu$ and $h_{\mu\nu}$, which suggests the existence of a certain relation between gravitation and electromagnetism.

We recover the Newtonian theory of gravitation as the limiting case of Einstein’s theory by assuming not only that the gravitational field is a weak deformation of the flat Minkowski space-time ($h_{\mu\nu} \ll 1$), but also that the field $h_{\mu\nu}$ is slowly varying ($\partial_0 h_{\mu\nu} \ll \partial_i h_{\mu\nu}$) and that its source $T_{\mu\nu}$ is non-relativistic ($T_{ij} \ll T_{0i} \ll T_{00}$). Under these conditions Equation (11) leads to a Poisson-type equation for the purely temporal component, $h_{00}$, of the space-time field,

$$\Delta h_{00} = -\frac{16\pi G}{c^4}\, \tilde{T}_{00} = -\frac{8\pi G}{c^4}\, (T_{00} + T_{ii}) \simeq -\frac{8\pi G}{c^4}\, T_{00} , \tag{13}$$

where $\Delta = \partial_x^2 + \partial_y^2 + \partial_z^2$ is the Laplacian. Recall that, according to Laplace and Poisson, Newton’s theory of gravity is summarized by saying that the gravitational field is described by a single potential $U(x)$, produced by the mass density $\rho(x)$ according to the Poisson equation $\Delta U = -4\pi G \rho$, that determines the acceleration of a test particle placed in the exterior field $U(x)$ according to the equation $d^2 x^i/dt^2 = \partial_i U(x) \equiv \partial U/\partial x^i$. Because of the relation $m_i = m_g = E/c^2$ one can identify $\rho = T^{00}/c^2$.
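The middle equality in (13) follows from the definition of $\tilde{T}_{\mu\nu}$ given with Equation (11). A short sketch of that step, using $\eta_{00} = -1$ and a sum over the repeated spatial index $i$:

```latex
% Trace of the energy-momentum tensor with the flat metric:
T^\alpha_\alpha = \eta^{\alpha\beta}\, T_{\alpha\beta} = -T_{00} + T_{ii} .

% Temporal component of the source:
\tilde{T}_{00} = T_{00} - \tfrac{1}{2}\, T^\alpha_\alpha\, \eta_{00}
             = T_{00} + \tfrac{1}{2} \left( -T_{00} + T_{ii} \right)
             = \tfrac{1}{2} \left( T_{00} + T_{ii} \right) .

% Hence the coefficient 16 pi G / c^4 in (11) becomes 8 pi G / c^4 in (13).
% With rho = T^{00}/c^2 and T_{ii} neglected, (13) reads
\Delta h_{00} \simeq -\frac{8\pi G}{c^2}\, \rho ,
% which is Delta U = -4 pi G rho multiplied by 2/c^2.
```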
We therefore find that (13) reproduces the Poisson equation if $h_{00} = +2U/c^2$. It therefore remains to verify that Einstein’s theory indeed predicts that a non-relativistic test particle is accelerated by a space-time field according to $d^2 x^i/dt^2 \simeq \frac{1}{2}\, c^2\, \partial_i h_{00}$. Einstein understood that this was a consequence of the equivalence principle. In fact, as we discussed in Section 4 above, the principle of equivalence states that the gravitational field is (locally) erased in a locally inertial reference frame $X^\alpha$ (such that $g_{\alpha\beta}(X) = \eta_{\alpha\beta} + O((X - X_0)^2)$). In such a reference frame, the laws of special relativity apply at the point $X_0$. In particular an isolated (and electrically neutral) body must satisfy a principle of inertia in this frame: its center of mass moves in a straight line at constant speed. In other words it satisfies the equation of motion $d^2 X^\alpha/ds^2 = 0$. By passing back to an arbitrary (extended) coordinate system $x^\mu$, one verifies that this equation for inertial motion transforms into the geodesic equation (7). Therefore (7) describes falling bodies, such as they are observed in arbitrary extended reference frames (for example a reference frame at rest with respect to the Earth or at rest with respect to the center of mass of the solar system). From this one concludes that the relativistic analog of the Newtonian field of gravitational acceleration, $\mathbf{g}(x) = \nabla U(x)$, is $g^\lambda(x) \equiv -c^2\, \Gamma^\lambda_{\mu\nu}\, (dx^\mu/ds)(dx^\nu/ds)$. By considering a particle whose motion is slow with respect to the speed of light ($dx^i/ds \ll dx^0/ds \simeq 1$) one can easily verify that $g^i(x) \simeq -c^2\, \Gamma^i_{00}$. Finally, by using the definition (8) of $\Gamma^\alpha_{\mu\nu}$, and the hypothesis of weak fields, one indeed verifies that $g^i(x) \simeq \frac{1}{2}\, c^2\, \partial_i h_{00}$, in perfect agreement with the identification $h_{00} = 2U/c^2$ anticipated above.
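A sketch of this last verification, assuming only Definition (8), the weak-field form $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$, and the slow-variation condition $\partial_0 h_{\mu\nu} \ll \partial_i h_{\mu\nu}$ stated above:

```latex
% Definition (8) with lambda = i and mu = nu = 0:
\Gamma^i_{00} = \tfrac{1}{2}\, g^{i\sigma}
  \left( \partial_0 g_{0\sigma} + \partial_0 g_{0\sigma} - \partial_\sigma g_{00} \right) .

% To first order in h, g^{i\sigma} may be replaced by \eta^{i\sigma} = \delta^{i\sigma}:
\Gamma^i_{00} \simeq \tfrac{1}{2} \left( 2\, \partial_0 h_{0i} - \partial_i h_{00} \right)
  \simeq -\tfrac{1}{2}\, \partial_i h_{00} ,
% where the time derivative has been neglected for a slowly varying field.

% Acceleration of a slow particle:
g^i(x) \simeq -c^2\, \Gamma^i_{00} \simeq \tfrac{1}{2}\, c^2\, \partial_i h_{00}
  = \partial_i U \quad \text{when } h_{00} = 2U/c^2 .
```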
We encourage the reader to personally verify this result, which contains the very essence of Einstein’s theory: gravitational motion is no longer described as being due to a force, but is identified with motion that is “as inertial as possible” within a space-time whose chrono-geometry is deformed in the presence of a mass-energy distribution.

Finding the Newtonian theory as a limiting case of Einstein’s theory is obviously a necessity for seriously considering this new theory. But of course, from the very beginning Einstein explored the observational consequences of general relativity that go beyond the Newtonian description of gravitation. We have already mentioned one of these above: the fact that $g_{00} = \eta_{00} + h_{00} \simeq -1 + 2U(x)/c^2$ implies a distortion in the relative measurement of time in the neighborhood of massive bodies. In 1907 (as soon as he had developed the principle of equivalence, and long before he had obtained the field equations of general relativity) Einstein had predicted the existence of such a distortion for measurements of time and frequency in the presence of an external gravitational field. He realized that this should have observable consequences for the frequency, as observed on Earth, of the spectral lines emitted from the surface of the Sun. Specifically, a spectral line of (proper local) frequency $\nu_0$ emitted from a point $x_0$ where the (stationary) gravitational potential takes the value $U(x_0)$ and observed (via electromagnetic signals) at a point $x$ where the potential is $U(x)$ should appear to have a frequency $\nu$ such that

$$\frac{\nu}{\nu_0} = \sqrt{\frac{g_{00}(x_0)}{g_{00}(x)}} \simeq 1 + \frac{1}{c^2}\, [U(x) - U(x_0)] . \tag{14}$$

In the case where the point of emission $x_0$ is in a gravitational potential well deeper than the point of observation $x$ (meaning that $U(x_0) > U(x)$) one has $\nu < \nu_0$, in other words a reddening effect on frequencies.
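Equation (14) can be evaluated for the case Einstein had in mind, a spectral line emitted at the surface of the Sun. The sketch below is an estimate under simplifying assumptions: the observer is taken to be far from the Sun with $U \approx 0$ (so the Earth’s own potential and the Sun’s potential at the Earth’s orbit are neglected), and the solar constants are standard nominal values that do not appear in the text.

```python
import math

C = 299_792_458.0        # m/s, speed of light
GM_SUN = 1.3271244e20    # m^3/s^2, nominal solar mass parameter
R_SUN = 6.957e8          # m, nominal solar radius


def g00(potential):
    """g_00 ~ -1 + 2U/c^2 for a Newtonian potential U (positive near a mass)."""
    return -1.0 + 2.0 * potential / C ** 2


def frequency_ratio(u_emit, u_obs):
    """nu/nu_0 from Equation (14): sqrt(g00 at emission / g00 at observation)."""
    return math.sqrt(g00(u_emit) / g00(u_obs))


def frequency_ratio_first_order(u_emit, u_obs):
    """First-order form of Equation (14): 1 + [U(x) - U(x_0)] / c^2."""
    return 1.0 + (u_obs - u_emit) / C ** 2


u_surface = GM_SUN / R_SUN   # potential at the solar surface
u_far = 0.0                  # assumed potential at the observer

exact = frequency_ratio(u_surface, u_far)
approx = frequency_ratio_first_order(u_surface, u_far)
print(f"relative frequency shift (sqrt form):   {exact - 1.0:.4e}")
print(f"relative frequency shift (first order): {approx - 1.0:.4e}")
```

Both forms give a relative shift of about two parts in a million toward lower frequencies, and they differ only at second order in $U/c^2$.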
This effect, which was predicted by Einstein in 1907, was unambiguously verified only in the 1960s, in experiments by Pound and collaborators over a height of about twenty meters. The most precise verification (at the level of $\sim 10^{-4}$) is due to Vessot and collaborators, who compared a hydrogen maser, launched aboard a rocket that reached about 10,000 km in altitude, to a clock of similar construction on the ground. Other experiments compared the times shown on clocks placed aboard airplanes to clocks remaining on the ground. (For references to these experiments see [4].) As we have already mentioned, the “Einstein effect” (14) must be incorporated in a crucial way into the software of satellite positioning systems such as the GPS.

In 1907, Einstein also pointed out that the equivalence principle would suggest that light rays should be deflected by a gravitational field. Indeed, a generalization of the reasoning given above for the motion of particles in an external gravitational field, based on the principle of equivalence, shows that light must itself follow a trajectory that is “as inertial as possible,” meaning a geodesic of the curved space-time. Light rays must therefore satisfy the geodesic equation (7). (The only difference from the geodesics followed by material particles is that the parameter $s$ in Equation (7) can no longer be taken equal to the “length” along the geodesic, since a “light” geodesic must also satisfy the constraint $g_{\mu\nu}(x)\, dx^\mu\, dx^\nu = 0$, ensuring that its speed is equal to $c$, when it is measured in a locally inertial reference frame.) Starting from Equation (7) one can therefore calculate to what extent light is deflected when it passes through the neighborhood of a large mass (such as the Sun). One nevertheless soon realizes that in order to perform this calculation one must know more than the component $h_{00}$ of the gravitational field.
The other components of $h_{\mu\nu}$, and in particular the spatial components $h_{ij}$, come into play in a crucial way in this calculation. This is why it was only in November of 1915, after having obtained the (essentially) final form of his theory, that Einstein could predict the total value of the deflection of light by the Sun. Starting from the linearized form of Einstein’s equations (11) and continuing by making the “non-relativistic” simplifications indicated above ($T_{ij} \ll T_{0i} \ll T_{00}$, $\partial_0 h \ll \partial_i h$) it is easy to see that the spatial component $h_{ij}$, like $h_{00}$, can be written (after a helpful choice of coordinates) in terms of the Newtonian potential $U$ as $h_{ij}(x) \simeq +2U(x)\, \delta_{ij}/c^2$, where $\delta_{ij}$ takes the value 1 if $i = j$ and 0 otherwise ($i, j = 1, 2, 3$). By inserting this result, as well as the preceding result $h_{00} = +2U/c^2$, into the geodesic equation (7) for the motion of light, one finds (as Einstein did in 1915) that general relativity predicts that the Sun should deflect a ray of light by an angle $\theta = 4GM/(c^2 b)$, where $b$ is the impact parameter of the ray (meaning its minimum distance from the Sun). As is well known, the confirmation of this effect in 1919 (with rather weak precision) made the theory of general relativity and its creator famous.

7 The Post-Newtonian Approximation and Experimental Confirmations in the Regime of Weak and Quasi-Stationary Gravitational Fields

We have already pointed out some of the experimental confirmations of the theory of general relativity. At present, the extreme precision of certain measurements of time or frequency in the solar system necessitates a very careful account of the modifications brought by general relativity to the Newtonian description of space-time.
As a consequence, general relativity is used in a great number of situations, from astronomical or geophysical research (such as very long range radio interferometry, radar tracking of the planets, and laser tracking of the Moon or artificial satellites) to metrological, geodetic or other applications (such as the definition of international atomic time, precision cartography, and the GPS). To do this, the so-called post-Newtonian approximation has been developed. This method involves working in the Newtonian limit sketched above while keeping the terms of higher order in the small parameter

$$\varepsilon \sim \frac{v^2}{c^2} \sim |h_{\mu\nu}| \sim |\partial_0 h/\partial_i h|^2 \sim |T^{0i}/T^{00}|^2 \sim |T^{ij}/T^{00}| ,$$

where $v$ denotes a characteristic speed for the elements in the system considered. For all present applications of general relativity to the solar system it suffices to include the first post-Newtonian approximation, in other words to keep the relative corrections of order $\varepsilon$ to the Newtonian predictions.

Since the theory of general relativity was poorly verified for a long time, one found it useful (as in the pioneering work of A. Eddington, generalized in the 1960s by K. Nordtvedt and C.M. Will) not only to study the precise predictions of the equations (9) defining Einstein’s theory, but also to consider possible deviations from these predictions. These possible deviations were parameterized by means of several non-dimensional “post-Newtonian” parameters. Among these parameters, two play a key role: $\gamma$ and $\beta$. The parameter $\gamma$ describes a possible deviation from general relativity that comes into play starting at the linearized level, in other words one that modifies the linearized approximation given above. More precisely, it is defined by writing that the difference $h_{ij} \equiv g_{ij} - \delta_{ij}$ between the spatial metric and the Euclidean metric can take the value $h_{ij} = 2\gamma\,U\,\delta_{ij}/c^2$ (in a suitable coordinate system), rather than the value $h^{GR}_{ij} = 2U\,\delta_{ij}/c^2$ that it takes in general relativity, thus differing by a factor $\gamma$.
Therefore, by definition $\gamma$ takes the value 1 in general relativity, and $\gamma - 1$ measures the possible deviation with respect to this theory. As for the parameter $\beta$ (or rather $\beta - 1$), it measures a possible deviation (with respect to general relativity) in the value of $h_{00} \equiv g_{00} - \eta_{00}$. The value of $h_{00}$ in general relativity is $h^{GR}_{00} = 2U/c^2 - 2U^2/c^4$, where the first term (discussed above) reproduces the Newtonian approximation (and cannot therefore be modified, as the idea is to parameterize gravitational physics beyond Newtonian predictions) and where the second term is obtained by solving Einstein’s equations (9) at the second order of approximation. One then writes an $h_{00}$ of a more general parameterized type, $h_{00} = 2U/c^2 - 2\beta\,U^2/c^4$, where, by definition, $\beta$ takes the value 1 in general relativity. Let us finally point out that the parameters $\gamma - 1$ and $\beta - 1$ completely parameterize the post-Newtonian regime of the simplest theoretical alternatives to general relativity, namely the tensor-scalar theories of gravitation. In these theories, the gravitational interaction is carried by two fields at the same time: a massless tensor (spin 2) field coupled to $T^{\mu\nu}$, and a massless scalar (spin 0) field $\varphi$ coupled to the trace $T^\alpha_\alpha$. In this case the parameter $-(\gamma - 1)$ plays the key role of measuring the ratio between the scalar coupling and the tensor coupling.

All of the experiments performed to date within the solar system are compatible with the predictions of general relativity. When they are interpreted in terms of the post-Newtonian (and “post-Einsteinian”) parameters $\gamma - 1$ and $\beta - 1$, they lead to strong constraints on possible deviations from Einstein’s theory.
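As a numerical illustration of how the parameter $\gamma$ enters an observable, the following minimal sketch evaluates the deflection angle $\theta = 4GM/(c^2 b)$ for a ray grazing the Sun. The factor $(1+\gamma)/2$ is the standard post-Newtonian generalization of this angle; it is not written out in the text and is included here as an assumption, as are the rounded physical constants and the function names.

```python
import math

# Rounded constants (assumed values, SI units).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m (impact parameter of a grazing ray)


def deflection_angle(mass, impact_parameter, gamma=1.0):
    """Deflection angle in radians.

    gamma = 1 reproduces theta = 4GM/(c^2 b) of general relativity.
    The (1 + gamma)/2 prefactor is the usual parameterized form (assumption).
    """
    return 0.5 * (1.0 + gamma) * 4.0 * G * mass / (C**2 * impact_parameter)


def to_arcsec(angle_rad):
    """Convert radians to seconds of arc."""
    return math.degrees(angle_rad) * 3600.0


theta_gr = deflection_angle(M_SUN, R_SUN)               # full h00 + hij effect
theta_h00 = deflection_angle(M_SUN, R_SUN, gamma=0.0)   # h00 contribution only

print(f"gamma = 1: {to_arcsec(theta_gr):.3f} arcsec")
print(f"gamma = 0: {to_arcsec(theta_h00):.3f} arcsec")
```

Setting `gamma=0.0` switches off the spatial components $h_{ij}$ and gives half of the general relativistic angle, which is one way to see why the component $h_{00}$ alone does not suffice for this calculation.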
We make note of the following among tests performed in the solar system: the deflection of electromagnetic waves in the neighborhood of the Sun, the gravitational delay (‘Shapiro effect’) of radar signals bounced from the Viking lander on Mars, the global analysis of solar system dynamics (including the advance of planetary perihelia), the sub-centimeter measurement of the Earth-Moon distance obtained from laser signals bounced off of reflectors on the Moon’s surface, etc. At present (October of 2006) the most precise test (that has been published) of general relativity was obtained in 2003 by measuring the ratio $1 + y \equiv f/f_0$ between the frequency $f_0$ of radio waves sent from Earth to the Cassini space probe and the frequency $f$ of coherent radio waves sent back (with the same local frequency) from Cassini to Earth and compared (on Earth) to the emitted frequency $f_0$. The main contribution to the small quantity $y$ is an effect equal, in general relativity, to $y_{GR} = 8\,(GM/c^3 b)\,db/dt$ (where $b$ is, as before, the impact parameter) due to the propagation of radio waves in the geometry of a space-time deformed by the Sun: $ds^2 \simeq -(1 - 2U/c^2)\,c^2 dt^2 + (1 + 2U/c^2)(dx^2 + dy^2 + dz^2)$, where $U = GM/r$. The maximum value of the frequency change predicted by general relativity was only $|y_{GR}| \lesssim 2 \times 10^{-10}$ for the best observations, but thanks to an excellent frequency stability $\sim 10^{-14}$ (after correction for the perturbations caused by the solar corona) and to a relatively large number of individual measurements spread over 18 days, this experiment was able to verify Einstein’s theory at the remarkable level of $\sim 10^{-5}$ [7]. More precisely, when this experiment is interpreted in terms of the post-Newtonian parameters $\gamma - 1$ and $\beta - 1$, it gives the following limit for the parameter $\gamma - 1$ [7]:

$$\gamma - 1 = (2.1 \pm 2.3) \times 10^{-5} . \qquad (15)$$

As for the best present-day limit on the parameter $\beta - 1$, it is smaller than $10^{-3}$ and comes from the non-observation, in the data from lasers bounced off of the Moon, of any possible polarization of the Moon’s orbit in the direction of the Sun (‘Nordtvedt effect’; see [4] for references):

$$4(\beta - 1) - (\gamma - 1) = -0.0007 \pm 0.0010 . \qquad (16)$$

Although the theory of general relativity is one of the best verified theories in physics, scientists continue to design and plan new or increasingly precise tests of the theory. This is the case in particular for the space mission Gravity Probe B (launched by NASA in April of 2004), whose principal aim is to directly observe a prediction of general relativity that states (intuitively speaking) that space is not only “elastic,” but also “fluid.” In the nineteenth century Foucault invented both the gyroscope and his famous pendulum in order to render Newton’s absolute (and rigid) space directly observable. His experiments in fact showed that, for example, a gyroscope on the surface of the Earth continued, despite the Earth’s rotation, to align itself in a direction that is “fixed” with respect to the distant stars. However, in 1918, when Lense and Thirring analyzed some of the consequences of the (linearized) Einstein equations (11), they found that general relativity predicts, among other things, the following phenomenon: the rotation of the Earth (or any other ball of matter) creates a particular deformation of the chrono-geometry of space-time.
This deformation is described by the “gravito-magnetic” components $h_{0i}$ of the metric, and induces an effect analogous to the “rotation drag” effect caused by a ball of matter turning in a fluid: the rotation of the Earth (minimally) drags all of the space around it, causing it to continually “turn,” as a fluid would.$^{3}$ This “rotation of space” translates, in an observable way, into a violation of the effects predicted by Newton and confirmed by Foucault’s experiments: in particular, a gyroscope no longer aligns itself in a direction that is “fixed in absolute space”; rather, its axis of rotation is “dragged” by the rotating motion of the local space where it is located. This effect is much too small to be visible in Foucault’s experiments. Its observation by Gravity Probe B (see [8] and the contribution of John Mester to this Poincaré seminar) is important for making Einstein’s revolutionary notion of a fluid space-time tangible to the general public.

Up till now we have only discussed the regime of weak and slowly varying gravitational fields. The theory of general relativity predicts the appearance of new phenomena when the gravitational field becomes strong and/or rapidly varying. (We shall not here discuss the cosmological aspects of relativistic gravitation.)

8 Strong Gravitational Fields and Black Holes

The regime of strong gravitational fields is encountered in the physics of gravitationally condensed bodies. This term designates the final states of stellar evolution, and in particular neutron stars and black holes. Recall that most of the life of a star is spent slowly burning its nuclear fuel. This process causes the star to be structured as a series of layers of differentiated nuclear structure, surrounding a progressively denser core (an “onion-like” structure).
When the initial mass of the star is sufficiently large, this process ends in a catastrophic phenomenon: the core, already much denser than ordinary matter, collapses in on itself under the influence of its gravitational self-attraction. (This implosion of the central part of the star is, in many cases, accompanied by an explosion of the outer layers of the star, a supernova.) Depending on the quantity of mass that collapses with the core of a star, this collapse can give rise to either a neutron star or a black hole.

$^{3}$ Recent historical work (by Herbert Pfister) has in fact shown that this effect had already been derived by Einstein within the framework of the provisory relativistic theory of gravity that he started to develop in 1912 in collaboration with Marcel Grossmann.

A neutron star condenses a mass on the order of the mass of the Sun inside a radius on the order of 10 km. The density in the interior of a neutron star (named thus because neutrons dominate its nuclear composition) is more than 100 million tons per cubic centimeter ($10^{14}$ g/cm$^3$)! It is about the same as the density in the interior of atomic nuclei. What is important for our discussion is that the deformation away from the Minkowski metric in the immediate neighborhood of a neutron star, measured by $h_{00} \sim h_{ii} \sim 2GM/c^2R$, where $R$ is the radius of the star, is no longer a small quantity, as it was in the solar system. In fact, while $h \sim 2GM/c^2R$ is on the order of $10^{-9}$ for the Earth and $10^{-6}$ for the Sun, one finds that $h \sim 0.4$ for a typical neutron star ($M \simeq 1.4\,M_\odot$, $R \sim 10$ km). One thus concludes that it is no longer possible, as was the case in the solar system, to study the structure and physics of neutron stars by using the post-Newtonian approximation outlined above. One must consider the exact form of Einstein’s equations (9), with all of their non-linear structure.
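The orders of magnitude quoted above for $h \sim 2GM/c^2R$ can be checked with a few lines of arithmetic. The sketch below is only illustrative: the masses and radii of the Earth and the Sun are rounded textbook values that do not appear in the text, and the function name is hypothetical.

```python
# Rounded constants (assumed values, SI units).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg


def metric_deformation(mass_kg, radius_m):
    """Surface value of h ~ 2GM/(c^2 R), a dimensionless number."""
    return 2.0 * G * mass_kg / (C**2 * radius_m)


# (mass in kg, radius in m); the neutron star uses M = 1.4 M_sun, R = 10 km.
BODIES = {
    "Earth": (5.972e24, 6.371e6),
    "Sun": (M_SUN, 6.957e8),
    "neutron star": (1.4 * M_SUN, 1.0e4),
}

for name, (mass, radius) in BODIES.items():
    print(f"{name:>12}: h ~ {metric_deformation(mass, radius):.2e}")
```

The three printed values span nine orders of magnitude, which is the quantitative content of the statement that the post-Newtonian expansion in powers of $h$ is adequate in the solar system but not near a neutron star.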
Because of this, we expect that observations concerning neutron stars will allow us to confirm (or refute) the theory of general relativity in its strongly non-linear regime. We shall discuss such tests below in relation to observations of binary pulsars.

A black hole is the result of a continued collapse, meaning that it does not stop with the formation of an ultra-dense star (such as a neutron star). (The physical concept of a black hole was introduced by J.R. Oppenheimer and H. Snyder in 1939. The global geometric structure of black holes was not understood until some years later, thanks notably to the work of R. Penrose. For a historical review of the idea of black holes see [9].) It is a particular structure of curved space-time characterized by the existence of a boundary (called the “black hole surface” or “horizon”) between an exterior region, from which it is possible to emit signals to infinity, and an interior region (of space-time), within which any emitted signal remains trapped. See Figure 3.

Figure 3: Schematic representation of the space-time for a black hole created from the collapse of a spherical star. (Labels in the figure: $r = 0$ singularity; $r = 2M$ horizon; flash of light emitted from center; collapsing star; space axis.) Each cone represents the space-time history of a flash of light emitted from a point at a particular instant. (Such a “cone field” is obtained by taking the limit $\varepsilon^2 = 0$ from Figure 2, and keeping only the upper part, in other words the part directed towards the future, of the double cones obtained as limits of the hyperboloids of Figure 2.) The interior of the black hole is shaded, its outer boundary being the “black hole surface” or “horizon.” The “inner boundary” (shown in dark grey) of the interior region of the black hole is a space-time singularity of the big-crunch type.

The cones shown in this figure are called “light cones.” They are defined as the locus of points (infinitesimally close to $x$) such that $ds^2 = 0$, with $dx^0 = c\,dt \geq 0$.
Each represents the beginning of the space-time history of a flash of light emitted from a certain point in space-time. The cones whose vertices are located outside of the horizon (the unshaded zone) will evolve by spreading out to infinity, thus representing the possibility for electromagnetic signals to reach infinity. On the other hand, the cones whose vertices are located inside the horizon (the grey zone) will evolve without ever succeeding in escaping the grey zone. It is therefore impossible to emit an electromagnetic signal that reaches infinity from the grey zone. The horizon, namely the boundary between the shaded zone and the unshaded zone, is itself the history of a particular flash of light, emitted from the center of the star over the course of its collapse, such that it asymptotically stabilizes as a space-time cylinder. This space-time cylinder (the asymptotic horizon) therefore represents the space-time history of a bubble of light that, viewed locally, moves outward at the speed $c$, but which globally “runs in place.” This remarkable behavior is a striking illustration of the “fluid” character of space-time in Einstein’s theory. Indeed, one can compare the preceding situation with what may take place around the open drain of an emptying sink: a wave may move along the water, away from the hole, all the while running in place with respect to the sink because of the falling motion of the water in the direction of the drain.

Note that the temporal development of the interior region is limited, terminating in a singularity (the dark gray surface) where the curvature becomes infinite and where the classical description of space and time loses its meaning. This singularity is locally similar to the temporal inverse of a cosmological singularity of the big bang type. This is called a big crunch. It is a space-time frontier, beyond which space-time ceases to exist.
The appearance of singularities associated with regions of strong gravitational fields is a generic phenomenon in general relativity, as shown by theorems of R. Penrose and S.W. Hawking.

Black holes have some remarkable properties. First, a uniqueness theorem (due to W. Israel, B. Carter, D.C. Robinson, G. Bunting, and P.O. Mazur) asserts that an isolated, stationary black hole (in Einstein-Maxwell theory) is completely described by three parameters: its mass $M$, its angular momentum $J$, and its electric charge $Q$. The exact solution (called the Kerr-Newman solution) of Einstein’s equations (11) describing a black hole with parameters $M, J, Q$ is explicitly known. We shall here content ourselves with writing the space-time geometry in the simplest case of a black hole: the one in which $J = Q = 0$ and the black hole is described only by its mass (a solution discovered by K. Schwarzschild in January of 1916):

$$ds^2 = -\left(1 - \frac{2GM}{c^2 r}\right) c^2 dt^2 + \frac{dr^2}{1 - \frac{2GM}{c^2 r}} + r^2\left(d\theta^2 + \sin^2\theta\, d\varphi^2\right) . \qquad (17)$$

We see that the purely temporal component of the metric, $g_{00} = -(1 - 2GM/c^2 r)$, vanishes when the radial coordinate $r$ takes the value $r = r_H \equiv 2GM/c^2$. According to the earlier equation (14), it would therefore seem that light emitted from an arbitrary point on the sphere $r_0 = r_H$, when it is viewed by an observer located anywhere in the exterior (in $r > r_H$), would experience an infinite reddening of its emission frequency ($\nu/\nu_0 = 0$). In fact, the sphere $r_H = 2GM/c^2$ is the horizon of the Schwarzschild black hole, and no particle (that is capable of emitting light) can remain at rest when $r = r_H$ (nor, a fortiori, when $r < r_H$). To study what happens at the horizon ($r = r_H$) or in the interior ($r < r_H$) of a Schwarzschild black hole, one must use other space-time coordinates than the coordinates $(t, r, \theta, \varphi)$ used in Equation (17).
The “big crunch” singularity in the interior of a Schwarzschild black hole, in the coordinates of (17), is located at $r = 0$ (which does not describe, as one might believe, a point in space, but rather an instant in time).

The space-time metric of a black hole space-time, such as Equation (17) in the simple case $J = Q = 0$, allows one to study the influence of a black hole on particles and fields in its neighborhood. One finds that a black hole is a gravitational potential well that is so deep that any particle or wave that penetrates the interior of the black hole (the region $r < r_H$) will never be able to come out again, and that the total energy of the particle or wave that “falls” into the black hole ends up augmenting the total mass-energy $M$ of the black hole. By studying such black hole “accretion” processes with falling particles (following R. Penrose), D. Christodoulou and R. Ruffini showed that a black hole is not only a potential well, but also a physical object possessing a significant free energy that it is possible, in principle, to extract. Such black hole energetics is encapsulated in the “mass formula” of Christodoulou and Ruffini (in units where $c = 1$)

$$M^2 = \left(M_{irr} + \frac{Q^2}{4GM_{irr}}\right)^2 + \frac{J^2}{4G^2M_{irr}^2} , \qquad (18)$$

where $M_{irr}$ denotes the irreducible mass of the black hole, a quantity that can only grow, irreversibly. One deduces from (18) that a rotating ($J \neq 0$) and/or charged ($Q \neq 0$) black hole possesses a free energy $M - M_{irr} > 0$ that can, in principle, be extracted through processes that reduce its angular momentum and/or its electric charge. Such black hole energy-extraction processes may lie at the origin of certain ultra-energetic astrophysical phenomena (such as quasars or gamma ray bursts). Let us note that, according to Equation (18), (rotating or charged) black holes are the largest reservoirs of free energy in the Universe: in fact, 29% of their mass energy can be stored in the form of rotational energy, and up to 50% can be stored in the form of electric energy.
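The two percentages follow from the Christodoulou-Ruffini mass formula (18), $M^2 = (M_{irr} + Q^2/4GM_{irr})^2 + J^2/4G^2M_{irr}^2$, evaluated for maximally rotating or maximally charged black holes. The sketch below is a hedged check in units $G = c = 1$. The extremality conditions $J = GM^2$ and $Q^2 = GM^2$, and the corresponding irreducible masses $M/\sqrt{2}$ and $M/2$, are standard results that are not stated in the text and are used here as assumptions; the function names are hypothetical.

```python
import math


def total_mass(m_irr, J=0.0, Q=0.0, G=1.0):
    """Total mass M from the mass formula (18), in units where c = 1."""
    charge_term = m_irr + Q**2 / (4.0 * G * m_irr)
    spin_term = J**2 / (4.0 * G**2 * m_irr**2)
    return math.sqrt(charge_term**2 + spin_term)


def free_energy_fraction(m_irr, J=0.0, Q=0.0, G=1.0):
    """Fraction (M - M_irr)/M of the mass-energy that is extractable in principle."""
    return 1.0 - m_irr / total_mass(m_irr, J, Q, G)


# Assumed extremal cases, normalized so that the total mass is M = 1:
#   maximal rotation: J = G M^2 = 1, for which M_irr = M / sqrt(2)
#   maximal charge:   Q^2 = G M^2 = 1, for which M_irr = M / 2
rotating = free_energy_fraction(1.0 / math.sqrt(2.0), J=1.0)
charged = free_energy_fraction(0.5, Q=1.0)

print(f"rotational energy fraction: {rotating:.1%}")
print(f"electric energy fraction:   {charged:.1%}")
```

For $J = Q = 0$ the function returns zero: a Schwarzschild black hole has $M = M_{irr}$ and no extractable free energy in this classical sense.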
These percentages are much higher than the few percent of nuclear binding energy that is at the origin of all the light emitted by stars over their lifetimes. Even though there is not, at present, irrefutable proof of the existence of black holes in the universe, an entire range of very strong presumptive evidence lends credence to their existence. In particular, more than a dozen X-ray emitting binary systems in our galaxy are most likely made up of a black hole and an ordinary star. Moreover, the center of our galaxy seems to contain a very compact concentration of mass $\sim 3 \times 10^6\,M_\odot$ that is probably a black hole. (For a review of the observational data leading to these conclusions see, for example, Section 7.6 of the recent book by N. Straumann [6].)

The fact that a quantity associated with a black hole, here the irreducible mass $M_{irr}$, or, according to a more general result due to S.W. Hawking, the total area $A$ of the surface of a black hole ($A = 16\pi G^2 M_{irr}^2$), can evolve only by irreversibly growing is reminiscent of the second law of thermodynamics. This result led J.D. Bekenstein to interpret the horizon area, $A$, as being proportional to the entropy of the black hole. Such a thermodynamic interpretation is reinforced by the study of the growth of $A$ under the influence of external perturbations, a growth that one can in fact attribute to some local dissipative properties of the black hole surface, notably a surface viscosity and an electrical resistivity equal to 377 ohm (as shown in work by T. Damour and R.L. Znajek). These “thermodynamic” interpretations of black hole properties are based on simple analogies at the level of classical physics, but a remarkable result by Hawking showed that they have real content at the level of quantum physics.
In 1974, Hawking discovered that the presence of a horizon in a black hole space-time affected the definition of a quantum particle, and caused a black hole to continuously emit a flux of particles having the characteristic spectrum (Planck spectrum) of thermal emission at the temperature $T = 4\hbar G\,\partial M/\partial A$, where $\hbar$ is the reduced Planck constant. By using the general thermodynamic relation connecting the temperature to the energy $E = M$ and the entropy $S$, $T = \partial M/\partial S$, we see from Hawking’s result (in conformity with Bekenstein’s ideas) that a black hole possesses an entropy $S$ equal (again with $c = 1$) to

$$S = \frac{A}{4\hbar G} . \qquad (19)$$

The Bekenstein-Hawking formula (19) suggests an unexpected, and perhaps profound, connection between gravitation, thermodynamics, and quantum theory. See Section 11 below.

9 Binary Pulsars and Experimental Confirmations in the Regime of Strong and Radiating Gravitational Fields

Binary pulsars are binary systems made up of a pulsar (a rapidly spinning neutron star) and a very dense companion star (either a neutron star or a white dwarf). The first system of this type (called PSR B1913+16) was discovered by R.A. Hulse and J.H. Taylor in 1974 [10]. Today, a dozen are known. Some of these (including the first-discovered PSR B1913+16) have revealed themselves to be remarkable probes of relativistic gravitation and, in particular, of the regime of strong and/or radiating gravitational fields. The reason for which a binary pulsar allows for the probing of strong gravitational fields is that, as we have already indicated above, the deformation of the space-time geometry in the neighborhood of a neutron star is no longer a small quantity, as it is in the solar system. Rather, it is on the order of unity: $h_{\mu\nu} \equiv g_{\mu\nu} - \eta_{\mu\nu} \sim 2GM/c^2R \sim 0.4$. (We note that this value is only 2.5 times smaller than in the extreme case of a black hole, for which $2GM/c^2R = 1$.)
Moreover, the fact that the gravitational interaction propagates at the speed of light (as indicated by the presence of the wave operator $\Box = \Delta - c^{-2}\partial^2/\partial t^2$ in (11)) between the pulsar and its companion is found to play an observationally significant role for certain binary pulsars.

Let us outline how the observational data from binary pulsars are used to probe the regime of strong ($h_{\mu\nu}$ on the order of unity) and/or radiative (effects propagating at the speed $c$) gravitational fields. (For more details on the observational data from binary pulsars and their use in probing relativistic gravitation, see Michael Kramer’s contribution to this Poincaré seminar.) Essentially, a pulsar plays the role of an extremely stable clock. Indeed, the “pulsar phenomenon” is due to the rotation of a bundle of electromagnetic waves, created in the neighborhood of the two magnetic poles of a strongly magnetized neutron star (with a magnetic field on the order of $10^{12}$ Gauss, $10^{12}$ times the size of the terrestrial magnetic field). Since the magnetic axis of a pulsar is not aligned with its axis of rotation, the rapid rotation of the pulsar causes the (inner) magnetosphere as a whole to rotate, and likewise the bundle of electromagnetic waves created near the magnetic poles. The pulsar is therefore analogous to a lighthouse that sweeps out space with two bundles (one per pole) of electromagnetic waves. Just as for a lighthouse, one does not see the pulsar from Earth except when the bundle sweeps the Earth, thus causing a flash of electromagnetic noise with each turn of the pulsar around itself (in some cases, one even sees a secondary flash, due to emission from the second pole, after each half-turn). One can then measure the time of arrival at Earth of (the center of) each flash of electromagnetic noise.
The basic observational data of a pulsar are thus made up of a regular, discrete sequence of the arrival times at Earth of these flashes or “pulses.” This sequence is analogous to the signal from a clock: tick, tick, tick, .... Observationally, one finds that some pulsars (and in particular those that belong to binary systems) thus define clocks of a stability comparable to the best atomic clocks [11]. In the case of a solitary pulsar, the sequence of its arrival times is (in essence) a regular “arithmetic sequence,” $T_N = aN + b$, where $N$ is an integer labelling the pulse considered, and where $a$ is equal to the period of rotation of the pulsar around itself. In the case of a binary pulsar, the sequence of arrival times is a much richer signal, say $T_N = aN + b + \Delta_N$, where $\Delta_N$ measures the deviation with respect to a regular arithmetic sequence. This deviation (after the subtraction of effects not connected to the orbital period of the pulsar) is due to a whole ensemble of physical effects connected to the orbital motion of the pulsar around its companion or, more precisely, around the center of mass of the binary system. Some of these effects could be predicted by a purely Keplerian description of the motion of the pulsar in space, and are analogous to the “Rœmer effect” that allowed Rœmer to determine, for the first time, the speed of light from the arrival times at Earth of light signals coming from Jupiter’s satellites (the light signals coming from a body moving in orbit are “delayed” by the time taken by light to cross this orbit and arrive at Earth). Other effects can only be predicted and calculated by using a relativistic description, either of the orbital motion of the pulsar, or of the propagation of electromagnetic signals between the pulsar and Earth.
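The decomposition $T_N = aN + b + \Delta_N$ can be made concrete with a toy example. The sketch below builds two synthetic sequences of arrival times, fits the arithmetic part $aN + b$ by ordinary least squares, and reads off the residuals $\Delta_N$. All numbers (spin period, orbital period in pulses, size of the delay) are hypothetical, and the single sinusoid is only a crude stand-in for a circular-orbit Rœmer delay; it is not the timing model discussed in the text.

```python
import math


def fit_arithmetic_sequence(times):
    """Least-squares fit of T_N = a*N + b for N = 0, 1, ..., len(times) - 1."""
    count = len(times)
    mean_n = (count - 1) / 2.0
    mean_t = sum(times) / count
    sxx = sum((n - mean_n) ** 2 for n in range(count))
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in enumerate(times))
    a = sxy / sxx
    b = mean_t - a * mean_n
    return a, b


def residuals(times):
    """Deviation Delta_N of each arrival time from the fitted arithmetic sequence."""
    a, b = fit_arithmetic_sequence(times)
    return [t - (a * n + b) for n, t in enumerate(times)]


# Hypothetical toy parameters (not observational data).
SPIN_PERIOD = 0.059        # seconds between pulses ("a")
OFFSET = 10.0              # arrival time of pulse 0 ("b")
PULSES_PER_ORBIT = 200     # orbital period expressed in pulses
DELAY_AMPLITUDE = 2.0      # light-travel time across the orbit, seconds
N_PULSES = 1000

solitary = [SPIN_PERIOD * n + OFFSET for n in range(N_PULSES)]
binary = [
    SPIN_PERIOD * n + OFFSET
    + DELAY_AMPLITUDE * math.sin(2.0 * math.pi * n / PULSES_PER_ORBIT)
    for n in range(N_PULSES)
]

print("largest |Delta_N|, solitary:", max(abs(r) for r in residuals(solitary)))
print("largest |Delta_N|, binary:  ", max(abs(r) for r in residuals(binary)))
```

For the solitary sequence the residuals vanish up to rounding, while for the binary sequence they oscillate with the orbital period. A real analysis fits the spin and orbital parameters simultaneously instead of fitting the straight line first, since a finite stretch of orbital signal slightly biases the fitted slope.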
For example, the following facts must be accounted for: (i) the “pulsar clock” moves at a large speed (on the order of 300 km/s $\sim 10^{-3}c$) and is embedded in the varying gravitational potential of the companion; (ii) the orbit of the pulsar is not a simple Keplerian ellipse, but (in general relativity) a more complicated orbit that traces out a “rosette” around the center of mass; (iii) the propagation of electromagnetic signals between the pulsar and Earth takes place in a space-time that is curved by both the pulsar and its companion, which leads to particular effects of relativistic delay; etc. Taking relativistic effects into account in the theoretical description of arrival times for signals emitted by binary pulsars thus leads one to write what is called a timing formula. This timing formula (due to T. Damour and N. Deruelle) in essence allows one to parameterize the sequence of arrival times, $T_N = aN + b + \Delta_N$, in other words to parameterize $\Delta_N$, as a function of a set of “phenomenological parameters” that include not only the so-called “Keplerian” parameters (such as the orbital period $P$, the projection of the semi-major axis of the pulsar’s orbit along the line of sight $x_A = a_A \sin i$, and the eccentricity $e$), but also the post-Keplerian parameters associated with the relativistic effects mentioned above. For example, effect (i) discussed above is parameterized by a quantity denoted $\gamma_T$; effect (ii) by (among others) the quantities $\dot\omega$, $\dot P$; effect (iii) by the quantities $r$, $s$; etc.

The way in which observations of binary pulsars allow one to test relativistic theories of gravity is therefore the following. A (least-squares) fit between the observational timing data, $\Delta^{obs}_N$, and the parameterized theoretical timing formula, $\Delta^{th}_N(P, x_A, e; \gamma_T, \dot\omega, \dot P, r, s)$, allows for the determination of the observational values of the Keplerian ($P^{obs}$, $x^{obs}_A$, $e^{obs}$) and post-Keplerian ($\gamma^{obs}_T$, $\dot\omega^{obs}$, $\dot P^{obs}$, $r^{obs}$, $s^{obs}$) parameters.
The theory of general relativity predicts the value of each post-Keplerian parameter as a function of the Keplerian parameters and the two masses of the binary system (the mass $m_A$ of the pulsar and the mass $m_B$ of the companion). For example, the theoretical value predicted by general relativity for the parameter $\gamma_T$ is $\gamma^{GR}_T(m_A, m_B) = e\,n^{-1}(GMn/c^3)^{2/3}\,m_B(m_A + 2m_B)/M^2$, where $e$ is the eccentricity, $n = 2\pi/P$ the orbital frequency, and $M \equiv m_A + m_B$. We thus see that, if one assumes that general relativity is correct, the observational measurement of a post-Keplerian parameter, for example $\gamma^{obs}_T$, determines a curve in the plane $(m_A, m_B)$ of the two masses: $\gamma^{GR}_T(m_A, m_B) = \gamma^{obs}_T$, in our example. The measurement of two post-Keplerian parameters thus gives two curves in the $(m_A, m_B)$ plane and generically allows one to determine the values of the two masses $m_A$ and $m_B$, by considering the intersection of the two curves. We obtain a test of general relativity as soon as one observationally measures three or more post-Keplerian parameters: if the three (or more) curves all intersect at one point in the plane of the two masses, the theory of general relativity is confirmed, but if this is not the case the theory is refuted. At present, four distinct binary pulsars have allowed one to test general relativity. These four “relativistic” binary pulsars are: the first binary pulsar PSR B1913+16, the pulsar PSR B1534+12 (discovered by A. Wolszczan in 1991), and two recently discovered pulsars: PSR J1141−6545 (discovered in 1999 by V.M. Kaspi et al., whose first timing results are due to M. Bailes et al. in 2003), and PSR J0737−3039 (discovered in 2003 by M. Burgay et al., whose first timing results are due to A.G. Lyne et al. and M. Kramer et al.). With the exception of PSR J1141−6545, whose companion is a white dwarf, the companions of the pulsars are neutron stars. In the case of PSR J0737−3039 the companion turns out to also be a pulsar that is visible from Earth.
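To illustrate how one measured post-Keplerian parameter defines a curve in the mass plane, the sketch below evaluates $\gamma^{GR}_T(m_A, m_B) = e\,n^{-1}(GMn/c^3)^{2/3}\,m_B(m_A + 2m_B)/M^2$ and then, for a fixed $m_B$, solves $\gamma^{GR}_T = \gamma^{obs}_T$ for $m_A$ by bisection. The orbital period, eccentricity and masses are rounded, illustrative inputs of the order of those commonly quoted for PSR B1913+16; they are not taken from the text, and the function names are hypothetical.

```python
import math

T_SUN = 4.925490947e-6  # G * M_sun / c^3 in seconds (assumed constant)


def gamma_t_gr(m_a, m_b, period_s, ecc):
    """gamma_T in seconds; masses in solar masses, orbital period in seconds."""
    n = 2.0 * math.pi / period_s            # orbital frequency n = 2*pi/P
    m_tot = m_a + m_b                       # M = m_A + m_B
    return (ecc / n) * (T_SUN * m_tot * n) ** (2.0 / 3.0) \
        * m_b * (m_a + 2.0 * m_b) / m_tot**2


def m_a_on_gamma_curve(gamma_obs, m_b, period_s, ecc, lo=0.1, hi=5.0):
    """Point (m_A, m_B) of the curve gamma_T^GR = gamma_obs, for a given m_B.

    Bisection works because, at fixed m_B, gamma_T^GR decreases with m_A.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_t_gr(mid, m_b, period_s, ecc) > gamma_obs:
            lo = mid    # prediction too large: the solution has a larger m_A
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative inputs (assumptions, rounded).
PERIOD = 27906.98   # orbital period in seconds
ECC = 0.6171        # eccentricity

gamma_value = gamma_t_gr(1.44, 1.39, PERIOD, ECC)
print(f"gamma_T ~ {gamma_value * 1e3:.2f} ms")
print("m_A recovered on the curve:",
      round(m_a_on_gamma_curve(gamma_value, 1.39, PERIOD, ECC), 6))
```

Repeating the second step for a range of $m_B$ values traces the whole curve. Doing the same for a second post-Keplerian parameter and intersecting the two curves gives the mass determination described above.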
In the system PSR B1913+16, three post-Keplerian parameters have been measured ($\dot\omega$, $\gamma_T$, $\dot P$), which gives one test of the theory. In the system PSR J1141−6545, three post-Keplerian parameters have been measured ($\dot\omega$, $\gamma_T$, $\dot P$), which gives one test of the theory. (The parameter $s$ is also measured through scintillation phenomena, but the use of this measurement for testing gravitation is more problematic.) In the system PSR B1534+12, five post-Keplerian parameters have been measured, which gives three tests of the theory. In the system PSR J0737−3039, six post-Keplerian parameters have been measured,$^{4}$ which gives four tests of the theory. It is remarkable that all of these tests have confirmed general relativity. See Figure 4 and, for references and details, [4, 11, 12, 13], as well as the contribution by Michael Kramer. Note that, in Figure 4, some post-Keplerian parameters are measured with such great precision that they in fact define very thin curves in the $(m_A, m_B)$ plane. On the other hand, some of them are only measured with a rough fractional precision and thus define “thick curves,” or “strips” in the plane of the masses (see, for example, the strips associated with $\dot P$, $r$ and $s$ in the case of PSR B1534+12). In any case, the theory is confirmed when all of the strips (thick or thin) have a non-empty common intersection. (One should also note that the strips represented in Figure 4 only use the “one sigma” error bars, in other words a 68% level of confidence. Therefore, the fact that the $\dot P$ strip for PSR B1534+12 is a little bit disjoint from the intersection of the other strips is not significant: a “two sigma” figure would show excellent agreement between observation and general relativity.)

In view of the arguments presented above, all of the tests shown in Figure 4 confirm the validity of general relativity in the regime of strong gravitational fields ($h_{\mu\nu} \sim 1$).
Moreover, the four tests that use measurements of the parameter $\dot P$ (in the four corresponding systems) are direct experimental confirmations of the fact that the gravitational interaction propagates at the speed $c$ between the companion and the pulsar. In fact, $\dot P$ denotes the long-term variation $\langle dP/dt \rangle$ of the orbital period. Detailed theoretical calculations of the motion of two gravitationally condensed objects in general relativity, which take into account the effects connected to the propagation of the gravitational interaction at finite speed [14], have shown that one of the observable effects of this propagation is a long-term decrease in the orbital period given by the formula

$$\dot P_{GR}(m_A, m_B) = -\frac{192\,\pi}{5}\,\frac{1 + \frac{73}{24}\,e^2 + \frac{37}{96}\,e^4}{(1 - e^2)^{7/2}}\,\frac{m_A m_B}{(m_A + m_B)^2}\left(\frac{G\,(m_A + m_B)\,n}{c^3}\right)^{5/3},$$

where $e$ is the orbital eccentricity and $n = 2\pi/P$ is the orbital angular frequency.

Footnote 4: In the case of PSR J0737−3039, one of the six measured parameters is the ratio $x_A/x_B$ between a Keplerian parameter of the pulsar and its analog for the companion, which turns out to also be a pulsar.

[Figure 4: four panels in the ($m_A$, $m_B$) plane, with both axes running from 0 to 2.5: PSR B1913+16, PSR J1141−6545, PSR B1534+12 and PSR J0737−3039. Each panel marks the “intersection” of the strips and the excluded region bounded by $s \le 1$.]

Figure 4: Tests of general relativity obtained from observations of four binary pulsars. For each binary pulsar one has traced the “curves,” in the plane of the two masses ($m_A$ = mass of the pulsar, $m_B$ = mass of the companion), defined by equating the theoretical expressions for the various post-Keplerian parameters, as predicted by general relativity, to their observational value, determined through a least-squares fit to the parameterized theoretical timing formula. Each “curve” is in fact a “strip,” whose thickness is given by the (one sigma) precision with which the corresponding post-Keplerian parameter is measured. For some parameters, these strips are too thin to be visible. The grey zones would correspond to a sine for the angle of inclination of the orbital plane with respect to the plane of the sky that is greater than 1, and are therefore physically excluded.

The direct physical origin of this decrease in the orbital period lies in the modification, produced by general relativity, of the usual Newtonian law of gravitational attraction between two bodies, $F_{Newton} = G\,m_A m_B / r_{AB}^2$. In place of such a simple law, general relativity predicts a more complicated force law that can be expanded in the symbolic form

$$F_{Einstein} = \frac{G\,m_A m_B}{r_{AB}^2}\left(1 + \frac{v^2}{c^2} + \frac{v^4}{c^4} + \frac{v^5}{c^5} + \frac{v^6}{c^6} + \frac{v^7}{c^7} + \cdots\right), \qquad (20)$$

where, for example, “$v^2/c^2$” represents a whole set of terms of order $v_A^2/c^2$, $v_B^2/c^2$, $v_A v_B/c^2$, or even $G m_A/(c^2 r)$ or $G m_B/(c^2 r)$. Here $v_A$ denotes the speed of body A, $v_B$ that of body B, and $r_{AB}$ the distance between the two bodies. The term of order $v^5/c^5$ in Equation (20) is particularly important. This term is a direct consequence of the finite-speed propagation of the gravitational interaction between A and B, and its calculation shows that it contains a component that is opposed to the relative velocity $v_A - v_B$ of the two bodies and that, consequently, slows down the orbital motion of each body, causing it to evolve towards an orbit that lies closer to its companion (and therefore has a shorter orbital period). This “braking” term (which is correlated with the emission of gravitational waves), $\delta F_{Einstein} \sim (v^5/c^5)\,F_{Newton}$, leads to a long-term decrease in the orbital period $\dot P_{GR} \sim -(v/c)^5 \sim -10^{-12}$ that is very small, but whose reality has been verified with a fractional precision of order $10^{-3}$ in PSR B1913+16 and of order 20% in PSR B1534+12 and PSR J1141−6545 [4, 11, 13].
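To make the order of magnitude $\dot P_{GR} \sim -10^{-12}$ concrete, the following minimal sketch evaluates the standard quadrupole-order expression for the orbital period decay. The function name, the constant `T_SUN` ($G M_\odot/c^3$ in seconds) and the rounded input values (masses in solar masses, eccentricity, period in seconds, chosen to be close to published values for PSR B1913+16) are assumptions introduced here for illustration and are not taken from the text.

```python
import math

# G * M_sun / c^3 in seconds (assumed constant, used to avoid large/small SI numbers)
T_SUN = 4.925490947e-6


def pdot_gr(m_a, m_b, ecc, period_s):
    """Quadrupole-order orbital period derivative (dimensionless, s/s).

    m_a, m_b : masses in solar masses
    ecc      : orbital eccentricity
    period_s : orbital period in seconds
    """
    m_tot = m_a + m_b
    n = 2.0 * math.pi / period_s                      # orbital angular frequency
    v_over_c_5 = (T_SUN * m_tot * n) ** (5.0 / 3.0)   # (G M n / c^3)^(5/3) ~ (v/c)^5
    ecc_factor = (1.0 + 73.0 / 24.0 * ecc**2 + 37.0 / 96.0 * ecc**4) / (1.0 - ecc**2) ** 3.5
    mass_factor = m_a * m_b / m_tot**2
    return -(192.0 * math.pi / 5.0) * ecc_factor * mass_factor * v_over_c_5


# Illustrative, rounded inputs (assumed): roughly those of PSR B1913+16
pdot = pdot_gr(1.44, 1.39, 0.617, 27907.0)
print(f"Pdot_GR ~ {pdot:.2e}")
```

With these inputs $(v/c)^5$ alone is of order $10^{-14}$; the eccentricity enhancement and the numerical prefactor bring the result up to a few times $10^{-12}$, in line with the estimate quoted above.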
To conclude this brief outline of the tests of relativistic gravitation by binary pulsars, let us note that there is an analog, for the regime of strong gravitational fields, of the formalism of parametrization for possible deviations from general relativity mentioned in Section 6 in the framework of weak gravitational fields (using the post-Newtonian parameters $\gamma - 1$ and $\beta - 1$). This analog is obtained by considering a two-parameter family of relativistic theories of gravitation, assuming that the gravitational interaction is propagated not only by a tensor field $g_{\mu\nu}$ but also by a scalar field $\varphi$. Such a class of tensor-scalar theories of gravitation allows for a description of possible deviations in both the solar system and in binary pulsars. It also allows one to explicitly demonstrate that binary pulsars indeed test the effects of strong fields that go beyond the tests of the weak fields of the solar system, by exhibiting classes of theories that are compatible with all of the observations in the solar system but that are incompatible with the observations of binary pulsars; see [4, 13].

10 Gravitational Waves: Propagation, Generation, and Detection

As soon as he had finished constructing the theory of general relativity, Einstein realized that it implied the existence of waves of geometric deformations of space-time, or “gravitational waves” [15, 2]. Mathematically, these waves are analogs (with the replacement $A_\mu \to h_{\mu\nu}$) of electromagnetic waves, but conceptually they signify something remarkable: they exemplify, in the purest possible way, the “elastic” nature of space-time in general relativity. Before Einstein, space-time was a rigid structure, given a priori, which was not influenced by the material content of the Universe.
After Einstein, a distribution of matter (or more generally of mass-energy) that changes over the course of time, let us say for concreteness a binary system of two neutron stars or two black holes, will not only deform the chrono-geometry of the space-time in its immediate neighborhood, but this deformation will propagate in every possible direction away from the system considered, and will travel out to infinity in the form of a wave whose oscillations will reflect the temporal variations of the matter distribution. We therefore see that the study of these gravitational waves poses three separate problems: that of generation, that of propagation, and, finally, that of detection of such gravitational radiation. These three problems are at present being actively studied, since it is hoped that we will soon detect gravitational waves, and thus will be able to obtain new information about the Universe [16]. We shall here content ourselves with an elementary introduction to this field of research. For a more detailed introduction to the detection of gravitational waves see the contribution by Jean-Yves Vinet to this Poincaré seminar.

Let us first consider the simplest case of very weak gravitational waves, outside of their material sources. The geometry of such a space-time can be written, as in Section 6, as $g_{\mu\nu}(x) = \eta_{\mu\nu} + h_{\mu\nu}(x)$, where $h_{\mu\nu} \ll 1$. At first order in $h$, and outside of the source (namely in the domain where $T_{\mu\nu}(x) = 0$), the perturbation of the geometry, $h_{\mu\nu}(x)$, satisfies a homogeneous equation obtained by replacing the right-hand side of Equation (11) with zero. It can be shown that one can simplify this equation through a suitable choice of coordinate system. In a transverse traceless (TT) coordinate system the only non-zero components of a general gravitational wave are the spatial components $h^{TT}_{ij}$, $i, j = 1, 2, 3$ (in other words $h^{TT}_{00} = 0 = h^{TT}_{0i}$), and these components satisfy

$$\Box\, h^{TT}_{ij} = 0\,, \qquad \partial_j h^{TT}_{ij} = 0\,, \qquad h^{TT}_{jj} = 0\,. \qquad (21)$$

The first equation in (21), where the wave operator $\Box = \Delta - c^{-2}\,\partial_t^2$ appears, shows that gravitational waves (like electromagnetic waves) propagate at the speed $c$. If we consider for simplicity a monochromatic plane wave ($h^{TT}_{ij} = \zeta_{ij} \exp(i\,\mathbf{k}\cdot\mathbf{x} - i\,\omega t)$ + complex conjugate, with $\omega = c\,|\mathbf{k}|$), the second equation in (21) shows that the (complex) tensor $\zeta_{ij}$ measuring the polarization of a gravitational wave only has non-zero components in the plane orthogonal to the wave’s direction of propagation: $\zeta_{ij}\,k^j = 0$. Finally, the third equation in (21) shows that the polarization tensor $\zeta_{ij}$ has vanishing trace: $\zeta_{jj} = 0$. More concretely, this means that if a gravitational wave propagates in the z-direction, its polarization is described by a 2 × 2 matrix,

$$\begin{pmatrix} \zeta_{xx} & \zeta_{xy} \\ \zeta_{yx} & \zeta_{yy} \end{pmatrix},$$

which is symmetric and traceless. Such a polarization matrix therefore only contains two independent (complex) components: $\zeta_+ \equiv \zeta_{xx} = -\zeta_{yy}$, and $\zeta_\times \equiv \zeta_{xy} = \zeta_{yx}$. This is the same number of independent (complex) components that an electromagnetic wave has. Indeed, in a transverse gauge, an electromagnetic wave only has spatial components $A^T_i$ that satisfy

$$\Box\, A^T_i = 0\,, \qquad \partial_j A^T_j = 0\,. \qquad (22)$$

As in the case above, the first equation in (22) means that an electromagnetic wave propagates at the speed $c$, and the second equation shows that a monochromatic plane electromagnetic wave ($A^T_i = \zeta_i \exp(i\,\mathbf{k}\cdot\mathbf{x} - i\,\omega t)$ + c.c., $\omega = c\,|\mathbf{k}|$) is described by a (complex) polarization vector $\zeta_i$ that is orthogonal to the direction of propagation: $\zeta_j\,k^j = 0$. For a wave propagating in the z-direction such a vector only has two independent (complex) components, $\zeta_x$ and $\zeta_y$. It is indeed the same number of components that a gravitational wave has, but we see that the two quantities measuring the polarization of a gravitational wave, $\zeta_+ = \zeta_{xx} = -\zeta_{yy}$, $\zeta_\times = \zeta_{xy} = \zeta_{yx}$, are mathematically quite different from the two quantities $\zeta_x$, $\zeta_y$ measuring the polarization of an electromagnetic wave. However, see Section 11 below.
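The transversality and trace conditions on the polarization tensor can be imposed on any symmetric 3 × 3 matrix by a purely algebraic projection. The following is a minimal sketch, assuming a unit propagation direction `n`; the function name `tt_project` and the sample matrix are hypothetical and serve only to show that two independent components survive.

```python
def tt_project(a, n):
    """Transverse-traceless part of a symmetric 3x3 matrix `a`
    with respect to the unit vector `n` (assumed normalized)."""
    idx = range(3)
    # Projector onto the plane orthogonal to n: P_ij = delta_ij - n_i n_j
    p = [[(1.0 if i == j else 0.0) - n[i] * n[j] for j in idx] for i in idx]
    # Transverse part: (P a P)_ij
    pap = [[sum(p[i][k] * a[k][l] * p[l][j] for k in idx for l in idx)
            for j in idx] for i in idx]
    # Subtract the trace inside the transverse plane (trace of P is 2)
    tr = sum(pap[i][i] for i in idx)
    return [[pap[i][j] - 0.5 * p[i][j] * tr for j in idx] for i in idx]


# Arbitrary symmetric matrix (illustrative values) and a wave travelling along z
a = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 5.0],
     [3.0, 5.0, 6.0]]
n = [0.0, 0.0, 1.0]

zeta = tt_project(a, n)
zeta_plus = zeta[0][0]    # equals -zeta[1][1]
zeta_cross = zeta[0][1]   # equals  zeta[1][0]
print(zeta_plus, zeta_cross)
```

For propagation along z the projected matrix has non-zero entries only in its upper-left 2 × 2 block, which is symmetric and traceless, so it is fully described by the two numbers `zeta_plus` and `zeta_cross`.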
We have here discussed the propagation of a gravitational wave in a background space-time described by the Minkowski metric $\eta_{\mu\nu}$. One can also consider the propagation of a wave in a curved background space-time, namely by studying solutions of Einstein’s equations (9) of the form $g_{\mu\nu}(x) = g^B_{\mu\nu}(x) + h_{\mu\nu}(x)$, where $h_{\mu\nu}$ is not only small, but varies on temporal and spatial scales much shorter than those of the background metric $g^B_{\mu\nu}(x)$. Such a study is necessary, for example, for understanding the propagation of gravitational waves in the cosmological Universe.

The problem of generation consists in searching for the connection between the tensorial amplitude $h^{TT}_{ij}$ of the gravitational radiation in the radiation zone and the motion and structure of the source. If one considers the simplest case of a source that is sufficiently diffuse that it only creates waves that are everywhere weak ($g_{\mu\nu} - \eta_{\mu\nu} = h_{\mu\nu} \ll 1$), one can use the linearized approximation to Einstein’s equations (9), namely Equations (11). One can solve Equations (11) by the same technique that is used to solve Maxwell’s equations (12): one fixes the coordinate system by imposing $\partial^\alpha h_{\alpha\mu} - \frac{1}{2}\,\partial_\mu h^\alpha_{\ \alpha} = 0$ (analogous to the Lorentz gauge condition $\partial^\alpha A_\alpha = 0$), then one inverts the wave operator by using retarded potentials. Finally, one must study the asymptotic form, at infinity, of the emitted wave, and write it in the reduced form of a transverse and traceless amplitude $h^{TT}_{ij}$ satisfying Equations (21) (analogous to a transverse electromagnetic wave $A^T_i$ satisfying (22)). One then finds that, just as charge conservation implies that there is no monopole type electromagnetic radiation, but only dipole or higher orders of polarity, the conservation of energy-momentum implies the absence of monopole and dipole gravitational radiation. For a slowly varying source ($v/c \ll 1$), the dominant gravitational radiation is of quadrupole type.
It is given, in the radiation zone, by an expression of the form

$$h^{TT}_{ij}(t, r, \mathbf{n}) \simeq \frac{2G}{c^4\,r}\,\frac{\partial^2}{\partial t^2}\left[I_{ij}(t - r/c)\right]^{TT}. \qquad (23)$$

Here $r$ denotes the distance to the center of mass of the source, $I_{ij}(t) \equiv \int d^3x\; c^{-2}\,T^{00}(t,\mathbf{x})\left(x^i x^j - \frac{1}{3}\,\mathbf{x}^2\,\delta^{ij}\right)$ is the quadrupole moment of the mass-energy distribution, and the upper index TT denotes an algebraic projection operation for the quadrupole tensor $I_{ij}$ (which is a 3 × 3 matrix) that only retains the part orthogonal to the local direction of wave propagation $n^i \equiv x^i/r$ with vanishing trace ($I^{TT}_{ij}$ is therefore locally a (real) 2 × 2 symmetric, traceless matrix of the same type as $\zeta_{ij}$ above). Formula (23) (which was in essence obtained by Einstein in 1918 [15]) is only the first approximation to an expansion in powers of $v/c$, where $v$ designates an internal speed characteristic of the source. The prospect of soon being able to detect gravitational waves has motivated theorists to improve Formula (23): (i) by describing the terms of higher order in $v/c$, up to a very high order, and (ii) by using new approximation methods that allow one to treat sources containing regions of strong gravitational fields (such as, for example, a binary system of two black holes or two neutron stars). See below for the most recent results.

Finally, the problem of detection, of which the pioneer was Joseph Weber in the 1960s, is at present giving rise to very active experimental research. The principle behind any detector is that a gravitational wave of amplitude $h^{TT}_{ij}$ induces a change in the distance $L$ between two bodies on the order of $\delta L \sim h L$ during its passage. One way of seeing this is to consider the action of a wave $h^{TT}_{ij}$ on two free particles, at rest before the arrival of the wave at the positions $x^i_1$ and $x^i_2$ respectively. As we have seen, each particle, in the presence of the wave, will follow a geodesic motion in the geometry $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ (with $h_{00} = h_{0i} = 0$ and $h_{ij} = h^{TT}_{ij}$).
By writing out the geodesic equation, Equation (7), one finds that it simply reduces (at first order in $h$) to $d^2x^i/ds^2 = 0$. Therefore, particles that are initially at rest ($x^i$ = const.) remain at rest in a transverse and traceless system of coordinates! This does not however mean that the gravitational wave has no observable effect. In fact, since the spatial geometry is perturbed by the passage of the wave, $g_{ij}(t,\mathbf{x}) = \delta_{ij} + h^{TT}_{ij}(t,\mathbf{x})$, one finds that the physical distance between the two particles $x^i_1$, $x^i_2$ (which is observable, for example, by measuring the time taken for light to make a round trip between the two particles) varies, during the passage of the wave, according to

$$L^2 = \left(\delta_{ij} + h^{TT}_{ij}\right)\left(x^i_2 - x^i_1\right)\left(x^j_2 - x^j_1\right).$$

The problem of detecting a gravitational wave thus leads to the problem of detecting a small relative displacement $\delta L/L \sim h$. By using Formula (23), one finds that the order of magnitude of $h$, for known or hoped for astrophysical sources (for example, a very close system of two neutron stars or two black holes), situated at distances such that one may hope to see several events per year ($r \gtrsim 600$ million light-years), is in fact extremely small: $h \lesssim 10^{-22}$ for signals whose characteristic frequency is around 100 Hertz. Several types of detectors have been developed since the pioneering work of J. Weber [16]. At present, the detectors that should succeed in the near future at detecting amplitudes $h \sim \delta L/L \sim 10^{-22}$ are large interferometers, of the Michelson or Fabry-Pérot type, having arms that are many kilometers in length into which a very powerful monochromatic laser beam is injected. Such terrestrial interferometric detectors presently exist in the U.S.A. (the LIGO detectors [17]), in Europe (the VIRGO [18] and GEO 600 [19] detectors) and elsewhere (such as the TAMA detector in Japan).
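To give a feeling for the size of the displacement $\delta L \sim h L$, here is a minimal sketch that expands the distance formula $L^2 = (\delta_{ij} + h^{TT}_{ij})\,\Delta x^i \Delta x^j$ to first order in $h$. The function name, the wave amplitude and the 4 km arm length are illustrative assumptions, not values taken from a specific detector description in the text.

```python
import math


def first_order_length_change(h_tt, x1, x2):
    """First-order change in the proper distance between two free particles.

    Expanding L^2 = (delta_ij + h_ij) dx^i dx^j in h gives
    delta_L ~ h_ij dx^i dx^j / (2 * L0), with L0 the unperturbed distance.
    """
    dx = [b - a for a, b in zip(x1, x2)]
    l0 = math.sqrt(sum(d * d for d in dx))
    h_dx_dx = sum(h_tt[i][j] * dx[i] * dx[j] for i in range(3) for j in range(3))
    return h_dx_dx / (2.0 * l0)


# Hypothetical numbers: a "+"-polarized wave travelling along z with
# amplitude 1e-22, and two perpendicular arms of 4 km along x and y.
H_PLUS = 1e-22
ARM = 4.0e3  # metres (assumed arm length)
h_tt = [[H_PLUS, 0.0, 0.0],
        [0.0, -H_PLUS, 0.0],
        [0.0, 0.0, 0.0]]

dl_x = first_order_length_change(h_tt, [0.0, 0.0, 0.0], [ARM, 0.0, 0.0])
dl_y = first_order_length_change(h_tt, [0.0, 0.0, 0.0], [0.0, ARM, 0.0])
print(dl_x, dl_y, dl_x - dl_y)
```

The first-order expansion is used on purpose: evaluating $\sqrt{1 + 10^{-22}}$ directly in double precision would round to exactly 1 and hide the effect. One arm is stretched while the other is squeezed, and the difference between the two is of order $h L$, far smaller than an atomic nucleus.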
Moreover, the international space project LISA [20], made up of an interferometer between satellites that are several million kilometers apart, should allow one to detect low frequency (∼ one hundredth or one thousandth of a Hertz) gravitational waves in a dozen years or so. This collection of gravitational wave detectors promises to bring invaluable information for astronomy by opening a new “window” on the Universe that is much more transparent than the various electromagnetic (or neutrino) windows that have so greatly expanded our knowledge of the Universe in the twentieth century.

The extreme smallness of the expected gravitational signals has led a number of experimentalists to contribute, over many years, a wealth of ingenuity and know-how in order to develop technology that is sufficiently precise and trustworthy (see [17, 18, 19, 20]). To conclude, let us also mention how much concerted theoretical effort has been made, both in calculating the general relativistic predictions for gravitational waves emitted by certain sources, and in developing methods adapted to the extraction of the gravitational signal from the background noise in the detectors. For example, one of the most promising sources for terrestrial detectors is the wave train for gravitational waves emitted by a system of two black holes, and in particular the final (most intense) portion of this wave train, which is emitted during the last few orbits of the system and the final coalescence of the two black holes into a single, more massive black hole. We have seen above (see Section 9) that the finite speed of propagation of the gravitational interaction between the two bodies of a binary system gives rise to a progressive acceleration of the orbital frequency, connected to the progressive approach of the two bodies towards each other.
Here we are speaking of the final stages in such a process, where the two bodies are so close that they orbit around each other in a spiral pattern that accelerates until they attain (for the final “stable” orbits) speeds that become comparable to the speed of light, all the while remaining slightly slower. In order to be able to determine, with a precision that is acceptable for the needs of detection, the dynamics of such a binary black hole system in such a situation, as well as the gravitational amplitude $h^{TT}_{ij}$ that it emits, it was necessary to develop a whole ensemble of analytic techniques to a very high level of precision. For example, it was necessary to calculate the expansion (20) of the force determining the motion of the two bodies to a very high order and also to calculate the amplitude $h^{TT}_{ij}$ of the gravitational radiation emitted to infinity with a precision going well beyond the quadrupole approximation (23). These calculations are comparable in complexity to high-order calculations in quantum field theory. Some of the techniques developed for quantum field theory indeed proved to be extremely useful for these calculations in the (classical) theory of general relativity (such as certain resummation methods and the mathematical use of analytic continuation in the number of space-time dimensions). For an entryway into the literature of these modern analytic methods, see [21], and for an early example of a result obtained by such methods of direct interest for the physics of detection see Figure 5 [22], which shows a component of the gravitational amplitude $h^{TT}_{ij}(t)$ emitted during the final stages of evolution of a system of two black holes of equal mass. The first oscillations shown in Figure 5 are emitted during the last quasi-circular orbits (accelerated motion in a spiral of decreasing radius).
The middle part of the signal corresponds to a phase where, having moved past the last stable orbit, the two black holes “fall” toward each other while spiraling rapidly. In fact, contrary to Newton’s theory, which predicts that two condensed bodies would be able to orbit around each other with an orbit of arbitrarily small radius (basically up until the point that the two bodies touch), Einstein’s theory predicts a modified law for the force between the two bodies, Equation (20), whose analysis shows that it is so attractive that it no longer allows for stable circular orbits when the distance between the two bodies becomes smaller than around $6\,G\,(m_A + m_B)/c^2$. In the case of two black holes, this distance is sufficiently larger than the black hole “radii” ($2\,G\,m_A/c^2$ and $2\,G\,m_B/c^2$) that one is still able to analytically treat the beginning of the “spiralling plunge” of the two black holes towards each other. The final oscillations in Figure 5 are emitted by the rotating (and initially highly deformed) black hole formed from the merger of the two initial, separate black holes.

[Figure 5: waveform plot; horizontal axis ticks from −200 to 100, vertical axis ticks from −0.48 to −0.08; annotated regions “inspiral + plunge” and “merger + ring-down”.]

Figure 5: The gravitational amplitude $h(t)$ emitted during the final stages of evolution of a system of two equal-mass black holes. The beginning of the signal (the left side of the figure), which is sinusoidal, corresponds to an inspiral motion of two separate black holes (with decreasing distance); the middle corresponds to a rapid “inspiralling plunge” of the two black holes towards each other; the end (at right) corresponds to the oscillations of the final, rotating black hole formed from the merger of the two initial black holes.
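The scales involved in the last stable orbit can be estimated with a few lines of arithmetic. The sketch below assumes two hypothetical black holes of 10 solar masses each, rounded SI values for $G$, $c$ and the solar mass, and a Newtonian (Keplerian) orbital frequency at the separation $6\,G\,(m_A + m_B)/c^2$; the function names are introduced here for illustration only, and the frequency is an order-of-magnitude estimate rather than a relativistic result.

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # m/s (rounded)
M_SUN = 1.989e30   # kg (rounded)


def last_stable_separation(m_a, m_b):
    """Approximate separation below which circular orbits are unstable."""
    return 6.0 * G * (m_a + m_b) / C**2


def horizon_radius(m):
    """Black hole 'radius' 2 G m / c^2."""
    return 2.0 * G * m / C**2


def gw_frequency_at(separation, m_a, m_b):
    """Gravitational-wave frequency (twice the orbital frequency),
    using Kepler's third law as a rough Newtonian estimate."""
    omega_orb = math.sqrt(G * (m_a + m_b) / separation**3)
    return omega_orb / math.pi


# Hypothetical equal-mass system: two black holes of 10 solar masses each
m_a = m_b = 10.0 * M_SUN
r_lso = last_stable_separation(m_a, m_b)
r_bh = horizon_radius(m_a)
f_gw = gw_frequency_at(r_lso, m_a, m_b)

print(f"last stable separation ~ {r_lso / 1e3:.0f} km")
print(f"black hole radius      ~ {r_bh / 1e3:.0f} km")
print(f"GW frequency           ~ {f_gw:.0f} Hz")
```

For equal masses the last stable separation is six times the radius of each black hole, and for this choice of masses the corresponding frequency falls in the range of a few hundred Hertz, which is the band targeted by the terrestrial interferometers.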
Up until quite recently the analytic predictions illustrated in Figure 5 concerning the gravitational signal $h(t)$ emitted by the spiralling plunge and merger of two black holes remained conjectural, since they could be compared neither to other theoretical predictions nor to observational data. Recently, worldwide efforts made over three decades to attack the problem of the coalescence of two black holes by numerically solving Einstein’s equations (9) have spectacularly begun to bear fruit. Several groups have been able to numerically calculate the signal $h(t)$ emitted during the final orbits and merger of two black holes [23]. In essence, there is good agreement between the analytical and numerical predictions. In order to be able to detect the gravitational waves emitted by the coalescence of two black holes, it will most likely be necessary to properly combine the information on the structure of the signal $h(t)$ obtained by the two types of methods, which are in fact complementary.

11 General Relativity and Quantum Theory: From Supergravity to String Theory

Up until now, we have discussed the classical theory of general relativity, neglecting any quantum effects. What becomes of the theory in the quantum regime? This apparently innocent question in fact opens up vast new prospects that are still under construction. We will do nothing more here than to touch upon the subject, by pointing out to the reader some of the paths along which contemporary physics has been led by the challenge of unifying general relativity and quantum theory. For a more complete introduction to the various possibilities “beyond” general relativity suggested within the framework of string theory (which is still under construction) one should consult the contribution of Ignatios Antoniadis to this Poincaré Seminar.
Let us recall that, from the very beginning of the quasi-definitive formulation of quantum theory (1925–1930), the creators of quantum mechanics (Born, Heisenberg, Jordan; Dirac; Pauli; etc.) showed how to “quantize” not only systems with several particles (such as an atom), but also fields, continuous dynamical systems whose classical description implies a continuous distribution of energy and momentum in space. In particular, they showed how to quantize (or in other words how to formulate within a framework compatible with quantum theory) the electromagnetic field $A_\mu$, which, as we have recalled above, satisfies the Maxwell equations (12) at the classical level. They nevertheless ran into difficulty due to the following fact. In quantum theory, the physics of a system’s evolution is essentially contained in the transition amplitudes $A(f, i)$ between an initial state labelled by $i$ and a final state labelled by $f$. These amplitudes $A(f, i)$ are complex numbers. They satisfy a “transitivity” property of the type

$$A(f, i) = \sum_n A(f, n)\,A(n, i)\,, \qquad (24)$$

which contains a sum over all possible intermediate states, labelled by $n$ (with this sum becoming an integral when there is a continuum of intermediate possible states). R. Feynman used Equation (24) as a point of departure for a new formulation of quantum theory, by interpreting it as an analog of Huygens’ Principle: if one thinks of $A(f, i)$ as the amplitude, “at the point $f$,” of a “wave” emitted “from the point $i$,” Equation (24) states that this amplitude can be calculated by considering the “wave” emitted from $i$ as passing through all possible intermediate “points” $n$ ($A(n, i)$), while reemitting “wavelets” starting from these intermediate points ($A(f, n)$), which then superpose to form the total wave arriving at the “final point $f$.”

Property (24) does not pose any problem in the quantum mechanics of discrete systems (particle systems).
It simply shows that the amplitude $A(f, i)$ behaves like a wave, and therefore must satisfy a “wave equation” (which is indeed the case for the Schrödinger equation describing the dependence of $A(f, i)$ on the parameters determining the final configuration $f$). On the other hand, Property (24) poses formidable problems when one applies it to the quantization of continuous dynamical systems (fields). In fact, for such systems the “space” of intermediate possible states is infinitely larger than in the case of the mechanics of discrete systems. Roughly speaking, the intermediate possible states for a field can be described as containing $\ell = 1, 2, 3, \ldots$ quantum excitations of the field, with each quantum excitation (or pair of “virtual particles”) being described essentially by a plane wave, $\zeta \exp(i\,k_\mu x^\mu)$, where $\zeta$ measures the polarization of these virtual particles and $k_\mu = \eta_{\mu\nu}\,k^\nu$, with $k^0 = \omega$ and $k^i = \mathbf{k}$, their angular frequency and wave vector, or (using the Planck-Einstein-de Broglie relations $E = \hbar\,\omega$, $\mathbf{p} = \hbar\,\mathbf{k}$) their energy-momentum $p^\mu = \hbar\,k^\mu$. The quantum theory shows (basically because of the uncertainty principle) that the four-frequencies (and four-momenta) $p^\mu = \hbar\,k^\mu$ of the intermediate states cannot be constrained to satisfy the classical equation $\eta_{\mu\nu}\,p^\mu p^\nu = -m^2$ (or in other words $E^2 = \mathbf{p}^2 + m^2$; we use $c = 1$ in this section). As a consequence, the sum over intermediate states for a quantum field theory has the following properties (among others): (i) when $\ell = 1$ (an intermediate state containing only one pair of virtual particles, called a one-loop contribution), there is an integral over a four-momentum $p^\mu$, $\int d^4p = \int dE \int d\mathbf{p}$; (ii) when $\ell = 2$ (two pairs of virtual particles; a two-loop contribution), there is an integral over two four-momenta $p_1^\mu$, $p_2^\mu$, $\int d^4p_1\, d^4p_2$; etc. The delicate point comes from the fact that the energy-momentum of an intermediate state can take arbitrarily high values.
This possibility is directly connected (through a Fourier transform) to the fact that a field possesses an infinite number of degrees of freedom, corresponding to configurations that vary over arbitrarily small time and length scales.

The problems posed by the necessity of integrating over the infinite domain of four-momenta of intermediate virtual particles (or in other words of accounting for the fact that field configurations can vary over arbitrarily small scales) appeared in the 1930s when the quantum theory of the electromagnetic field $A_\mu$ (called quantum electrodynamics, or QED) was studied in detail. These problems imposed themselves in the following form: when one calculates the transition amplitude for given initial and final states (for example the collision of two light quanta, with two photons entering and two photons leaving) by using (24), one finds a result given in the form of a divergent integral, because of the integral (in the one-loop approximation, $\ell = 1$) over the arbitrarily large energy-momentum describing virtual electron-positron pairs appearing as possible intermediate states. Little by little, theoretical physicists understood that the types of divergent integrals appearing in QED were relatively benign and, after the Second World War, they developed a method (renormalization theory) that allowed one to unambiguously isolate the infinite part of these integrals, and to subtract them by expressing the amplitudes $A(f, i)$ solely as a function of observable quantities [24] (work by J. Schwinger, R. Feynman, F. Dyson, etc.).
The preceding work led to the development of consistent quantum theories not only for the electromagnetic field $A_\mu$ (QED), but also for generalizations of electromagnetism (Yang-Mills theory or non-abelian gauge theory) that turned out to provide excellent descriptions of the new interactions between elementary particles discovered in the twentieth century (the electroweak theory, partially unifying electromagnetism and weak nuclear interactions, and quantum chromodynamics, describing the strong nuclear interactions). All of these theories give rise to only relatively benign divergences that can be “renormalized” and thus allow one to compute amplitudes $A(f, i)$ corresponding to observable physical processes [24] (notably, work by G. ’t Hooft and M. Veltman).

What happens when we use (24) to construct a “perturbative” quantum theory of general relativity (namely one obtained by expanding in the number $\ell$ of virtual particle pairs appearing in the intermediate states)? The answer is that the integrals over the four-momenta of intermediate virtual particles are not at all of the benign type that allowed them to be renormalized in the simpler case of electromagnetism. The source of this difference is not accidental, but is rather connected with the basic physics of relativistic gravitation. Indeed, as we have mentioned, the virtual particles have arbitrarily large energies $E$. Because of the basic relations that led Einstein to develop general relativity, namely $E = m_i$ and $m_i = m_g$, one deduces that these virtual particles correspond to arbitrarily large gravitational masses $m_g$. They will therefore end up creating intense gravitational effects that become more and more intense as the number $\ell$ of virtual particle pairs grows. These gravitational interactions that grow without limit with energy and momentum correspond (by Fourier transform) to field configurations concentrated in arbitrarily small space and time scales.
One way of seeing why the quantum gravitational field creates much more violent problems than the quantum electromagnetic field is, quite simply, to go back to dimensional analysis. Simple considerations in fact show that the relative (non-dimensional) one-loop amplitude $A_1$ must be proportional to the product $\hbar G$ and must contain an integral $\int d^4k$. However, in 1900 Planck had noticed that (in units where $c = 1$) the dimensions of $\hbar$ and $G$ were such that the product $\hbar G$ had the dimensions of length (or time) squared:

$$\ell_P \equiv \sqrt{\hbar G} \simeq 1.6 \times 10^{-33}\ \mathrm{cm}\,, \qquad t_P \equiv \sqrt{\hbar G} \simeq 5.4 \times 10^{-44}\ \mathrm{s}\,. \qquad (25)$$

One thus deduces that the integral $\int d^4k\, f(k)$ must have the dimensions of a squared frequency, and therefore that $A_1$ must (when $k \to \infty$) be of the type $A_1 \sim \hbar G \int d^4k/k^2$. Such an integral diverges quadratically with the upper limit $\Lambda$ of the integral (the cutoff frequency, such that $|k| \le \Lambda$), so that $A_1 \sim \hbar G\,\Lambda^2 \sim t_P^2\,\Lambda^2$. The extension of this dimensional analysis to the intermediate states with several loops ($\ell > 1$) causes even more severe polynomial divergences to appear, of a type such that the power of $\Lambda$ that appears grows without limit with $\ell$.

In summary, the essential physical characteristics of gravitation ($E = m_i = m_g$ and the dimension of Newton’s constant $G$) imply the impossibility of generalizing to the gravitational case the methods that allowed a satisfactory quantum treatment of the other interactions (electromagnetic, weak, and strong). Several paths have been explored to get out of this impasse. Some researchers tried to quantize general relativity non-perturbatively, without using an expansion in intermediate states (24) (work by A. Ashtekar, L. Smolin, and others). Others have tried to generalize general relativity by adding a fermionic field to Einstein’s (bosonic) gravitational field $g_{\mu\nu}(x)$, the gravitino field $\psi_\mu(x)$. It is indeed remarkable that it is possible to define a theory, known as supergravity, that generalizes the geometric invariance of general relativity in a profound way.
After the 1974 discovery (by J. Wess and B. Zumino) of a possible new global symmetry for interacting bosonic and fermionic fields, supersymmetry (which is a sort of global rotation transforming bosons to fermions and vice versa), D.Z. Freedman, P. van Nieuwenhuizen, and S. Ferrara, as well as S. Deser and B. Zumino, showed that one could generalize global supersymmetry to a local supersymmetry, meaning that it varies from point to point in space-time. Local supersymmetry is a sort of fermionic generalization (with anti-commuting parameters) of the geometric invariance at the base of general relativity (the invariance under any change in coordinates). The generalization of Einstein’s theory of gravitation that admits such a local supersymmetry is called supergravity theory. As we have mentioned, in four dimensions this theory contains, in addition to the (commuting) bosonic field $g_{\mu\nu}(x)$, an (anti-commuting) fermionic field $\psi_\mu(x)$ that is both a space-time vector (with index $\mu$) and a spinor. (It is a massless field of spin 3/2, intermediate between a massless spin 1 field like $A_\mu$ and a massless spin 2 field like $h_{\mu\nu} = g_{\mu\nu} - \eta_{\mu\nu}$.) Supergravity was extended to richer fermionic structures (with many gravitinos), and was formulated in space-times having more than four dimensions. It is nevertheless remarkable that there is a maximal dimension, equal to $D = 11$, admitting a theory of supergravity (the maximal supergravity constructed by E. Cremmer, B. Julia, and J. Scherk). The initial hope underlying the construction of these supergravity theories was that they would perhaps allow one to give meaning to the perturbative calculation (24) of quantum amplitudes. Indeed, one finds for example that at one loop, $\ell = 1$, the contributions coming from intermediate fermionic states have a sign opposite to the bosonic contributions and (because of the supersymmetry, bosons ↔ fermions) exactly cancel them.
Unfortunately, although such cancellations exist for the lowest orders of approximation, it appeared that this was probably not going to be the case at all orders (see footnote 5). The fact that the gravitational interaction constant $G$ has “a bad dimension” remains true and creates non-renormalizable divergences starting at a certain number of loops $\ell$.

Footnote 5: Recent work by Z. Bern et al. and M. Green et al. has, however, suggested that such cancellations take place at all orders for the case of maximal supergravity, dimensionally reduced to $D = 4$ dimensions.

Meanwhile, a third way of defining a consistent quantum theory of gravity was developed, under the name of string theory. Initially formulated as models for the strong interactions (in particular by G. Veneziano, M. Virasoro, P. Ramond, A. Neveu, and J.H. Schwarz), the string theories were founded upon the quantization of the relativistic dynamics of an extended object of one spatial dimension: a “string.” This string could be closed in on itself, like a small rubber band (a closed string), or it could have two ends (an open string). Note that the point of departure of string theory only includes the Poincaré-Minkowski space-time, in other words the metric $\eta_{\mu\nu}$ of Equation (2), and quantum theory (with the constant $\hbar = h/2\pi$). In particular, the only symmetry manifest in the classical dynamics of a string is the Poincaré group (3). It is, however, remarkable that (as shown by T. Yoneya, and J. Scherk and J.H. Schwarz, in 1974) one of the quantum excitations of a closed string reproduces, in a certain limit, all of the non-linear structure of general relativity (see below). Among the other remarkable properties of string theory [25], let us point out that it is the first physical theory to determine the space-time dimension $D$. In fact, this theory is only consistent if $D = 10$, for the versions allowing fermionic excitations (the purely bosonic string theory selects $D = 26$).
The fact that 10 > 4 does not mean that this theory has no relevance to the real world. Indeed, it has been known since the 1930s (from work of T. Kaluza and O. Klein) that a space-time of dimension D > 4 is compatible with experiment if the supplementary (spatial) dimensions close in on themselves (meaning they are compactified) on very small distance scales. The low-energy physics of such a theory seems to take place in a four-dimensional space-time, but it contains new (a priori massless) fields connected to the geometry of the additional compactified dimensions. Moreover, recent work (due in particular to I. Antoniadis, N. Arkani-Hamed, S. Dimopoulos, and G. Dvali) has suggested the possibility that the additional dimensions are compactified on scales that are small with respect to everyday life, but very large with respect to the Planck length. This possibility opens up an entire phenomenological field dealing with the possible observation of signals coming from string theory (see the contribution of I. Antoniadis to this Poincaré seminar).

However, string theory’s most remarkable property is that it seems to avoid, in a radical way, the problems of divergent (non-renormalizable) integrals that have weighed down every direct attempt at perturbatively quantizing gravity. In order to explain how string theory arrives at such a result, we must discuss some elements of its formalism. Recall that the classical dynamics of any system is obtained by minimizing a functional of the time evolution of the system’s configuration, called the action (the principle of least action). For example, the action for a particle of mass m, moving in a Riemannian space-time (6), is proportional to the length of the line that it traces in space-time: $S = -m \int ds$. This action is minimized when the particle follows a geodesic, in other words when its equation of motion is given by (7). According to Y. Nambu and T.
Goto, the action for a string is $S = -T \iint dA$, where the parameter T (analogous to m for the particle) is called the string tension, and where $\iint dA$ is the area of the two-dimensional surface traced out by the evolution of the string in the (D-dimensional) space-time in which it lives. In quantum theory, the action functional serves (as shown by R. Feynman) to define the transition amplitude (24). Basically, when one considers two intermediate configurations m and n (in the sense of the right-hand side of (24)) that are close to each other, the amplitude A(n,m) is proportional to $\exp(i\,S(n,m)/\hbar)$, where S(n,m) is the minimal classical action such that the system considered evolves from the configuration labelled by n to that labelled by m. Generalizing the decomposition in (24) by introducing an infinite number of intermediate configurations that lie close to each other, one ends up (in a generalization of Huygens’ principle) expressing the amplitude A(f, i) as a multiple sum over all of the “paths” (in the configuration space of the system studied) connecting the initial state i to the final state f. Each path contributes a term $e^{i\phi}$ where the phase $\phi = S/\hbar$ is proportional to the action S corresponding to this “path,” or in other words to this possible evolution of the system. In string theory, $\phi = -(T/\hbar) \iint dA$. Since the phase is a dimensionless quantity, and $\iint dA$ has the dimension of an area, we see that the quantum theory of strings brings in the quantity $\hbar/T$, having the dimensions of a length squared, at a fundamental level. More precisely, the fundamental length of string theory, $\ell_s$, is defined by

$\ell_s^2 \equiv \alpha' \equiv \frac{\hbar}{2\pi T}$ . (26)

This fundamental length plays a central role in string theory. Roughly speaking, it defines the characteristic “size” of the quantum states of a string.
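As a quick consistency check on the dimensional argument above, the following sketch rewrites the string phase in terms of the fundamental length. It assumes the reading $\ell_s^2 = \hbar/(2\pi T)$ of definition (26) and units in which c = 1; both are assumptions of this sketch.

```latex
% Assumed reading of (26): \ell_s^2 = \hbar / (2 \pi T), in units with c = 1.
\phi \;=\; \frac{S}{\hbar}
     \;=\; -\frac{T}{\hbar} \iint dA
     \;=\; -\frac{1}{2\pi \ell_s^2} \iint dA ,
\qquad
\left[ \frac{\hbar}{T} \right] \;=\; [\,dA\,] \;=\; (\text{length})^2 .
```

Read this way, the phase attached to each world-sheet is simply its area measured in units of $2\pi\ell_s^2$.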
If $\ell_s$ is much smaller than the observational resolution with which one studies the string, the string will look like a point-like particle, and its interactions will be described by a quantum theory of relativistic particles, which is equivalent to a theory of relativistic fields. It is precisely in this sense that general relativity emerges as a limit of string theory. Since this is an important conceptual point for our story, let us give some details about the emergence of general relativity from string theory.

The action functional that is used in practice to quantize a string is not really $-T \iint dA$, but rather (as emphasized by A. Polyakov)

$\frac{S}{\hbar} = -\frac{1}{4\pi\ell_s^2} \iint \sqrt{-\gamma}\; d^2\sigma\; \gamma^{ab}\, \partial_a X^\mu\, \partial_b X^\nu\, \eta_{\mu\nu} + \cdots$ , (27)

where $\sigma^a$, a = 0, 1 are two coordinates that allow an event to be located on the space-time surface (or ‘world-sheet’) traced out by the string within the ambient space-time; $\gamma_{ab}$ is an auxiliary metric ($d\Sigma^2 = \gamma_{ab}(\sigma)\, d\sigma^a\, d\sigma^b$) defined on this surface (with $\gamma^{ab}$ being its inverse, and $\gamma$ its determinant); and $X^\mu(\sigma^a)$ defines the embedding of the string in the ambient (flat) space-time. The dots indicate additional terms, and in particular terms of fermionic type that were introduced by P. Ramond, by A. Neveu and J.H. Schwarz, and by others. If one separates the two coordinates $\sigma^a = (\sigma^0, \sigma^1)$ into a temporal coordinate, $\tau \equiv \sigma^0$, and a spatial coordinate, $\sigma \equiv \sigma^1$, the configuration “at time $\tau$” of the string is described by the functions $X^\mu(\tau, \sigma)$, where one can interpret $\sigma$ as a curvilinear abscissa describing the spatial extent of the string. If we consider a closed string, one that is topologically equivalent to a circle, the function $X^\mu(\tau, \sigma)$ must be periodic in $\sigma$. One can show that (modulo the imposition of certain constraints) one can choose the coordinates $\tau$ and $\sigma$ on the string such that $d\Sigma^2 = -d\tau^2 + d\sigma^2$. Then, the dynamical equations for the string (obtained by minimizing the action (27)) reduce to the standard equation for waves on a string: $-\partial^2 X^\mu/\partial\tau^2 + \partial^2 X^\mu/\partial\sigma^2 = 0$.
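The reduction to the wave equation can be sketched in a few lines. The sketch assumes the standard Polyakov form of the action (27), keeps only the bosonic terms, and drops boundary terms when integrating by parts; these are assumptions of the sketch, not details spelled out in the text.

```latex
% Conformal gauge: d\Sigma^2 = -d\tau^2 + d\sigma^2, hence
% \gamma^{ab} = \mathrm{diag}(-1, +1) and \sqrt{-\gamma} = 1.
\frac{S}{\hbar} = -\frac{1}{4\pi\ell_s^2} \iint d\tau\, d\sigma\;
   \eta_{\mu\nu} \left( -\,\partial_\tau X^\mu\, \partial_\tau X^\nu
                        + \partial_\sigma X^\mu\, \partial_\sigma X^\nu \right)

% Vary X^\mu \to X^\mu + \delta X^\mu and integrate by parts
% (boundary terms dropped):
\frac{\delta S}{\hbar} = \frac{1}{2\pi\ell_s^2} \iint d\tau\, d\sigma\;
   \eta_{\mu\nu} \left( -\,\partial_\tau^2 X^\mu + \partial_\sigma^2 X^\mu \right) \delta X^\nu

% Requiring \delta S = 0 for arbitrary \delta X^\nu gives the wave equation:
-\,\frac{\partial^2 X^\mu}{\partial \tau^2} + \frac{\partial^2 X^\mu}{\partial \sigma^2} = 0
```

For an open string the dropped boundary terms are what produce the reflection conditions at the endpoints.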
The general solution to this equation describes a superposition of waves travelling along the string in both possible directions: $X^\mu = X_L^\mu(\tau+\sigma) + X_R^\mu(\tau-\sigma)$. If we consider a closed string (one that is topologically equivalent to a circle), these two types of wave are independent of each other. For an open string (with certain reflection conditions at the endpoints of the string) these two types of waves are connected to each other. Moreover, since the string has a finite length in both cases, one can decompose the left- or right-moving waves $X_L^\mu(\tau+\sigma)$ or $X_R^\mu(\tau-\sigma)$ as a Fourier series. For example, for a closed string one may write

$X^\mu(\tau, \sigma) = X_0^\mu(\tau) + \frac{i}{\sqrt{2}}\, \ell_s \sum_{n=1}^{\infty} \left( \frac{a_n^\mu}{\sqrt{n}}\, e^{-2in(\tau-\sigma)} + \frac{\tilde a_n^\mu}{\sqrt{n}}\, e^{-2in(\tau+\sigma)} \right) + \text{h.c.}$ (28)

Here $X_0^\mu(\tau) = x^\mu + 2\,\ell_s^2\, p^\mu \tau$ describes the motion of the string’s center of mass, and the remainder describes the decomposition of the motion around the center of mass into a discrete set of oscillatory modes. Like any vibrating string, a relativistic string can vibrate in its fundamental mode (n = 1) or in a “harmonic” of the fundamental mode (for an integer n > 1). In the classical case the complex coefficients $a_n^\mu$, $\tilde a_n^\mu$ represent the (complex) amplitudes of vibration for the modes of oscillation at frequency n times the fundamental frequency. (Here $a_n^\mu$ corresponds to a wave travelling to the right, while $\tilde a_n^\mu$ corresponds to a wave travelling to the left.) When one quantizes the string dynamics the position of the string $X^\mu(\tau, \sigma)$ becomes an operator (acting in the space of quantum states of the system), and because of this the quantities $x^\mu$, $p^\mu$, $a_n^\mu$ and $\tilde a_n^\mu$ in (28) become operators. The notation h.c. signifies that one must add the hermitian conjugates of the oscillation terms, which will contain the operators $(a_n^\mu)^\dagger$ and $(\tilde a_n^\mu)^\dagger$. (The notation † indicates hermitian conjugation, in other words the operator analog of complex conjugation.)
One then finds that the operators $x^\mu$ and $p^\mu$ describing the motion of the center of mass satisfy the usual commutation relations of a relativistic particle, $[x^\mu, p^\nu] = i\hbar\, \eta^{\mu\nu}$, and that the operators $a_n^\mu$ and $\tilde a_n^\mu$ become annihilation operators, like those that appear in the quantum theory of any vibrating system: $[a_n^\mu, (a_m^\nu)^\dagger] = \hbar\, \eta^{\mu\nu}\, \delta_{nm}$, $[\tilde a_n^\mu, (\tilde a_m^\nu)^\dagger] = \hbar\, \eta^{\mu\nu}\, \delta_{nm}$. In the case of an open string, one only has one set of oscillators, let us say $a_n^\mu$.

The discussion up until now has neglected to mention that the oscillation amplitudes $a_n^\mu$, $\tilde a_n^\mu$ must satisfy an infinite number of constraints (connected with the equation obtained by minimizing (27) with respect to the auxiliary metric $\gamma_{ab}$). One can satisfy these by expressing two of the space-time components of the oscillators $a_n^\mu$, $\tilde a_n^\mu$ (for each n) as a function of the others. Because of this, the physical states of the string are described by oscillators $a_n^i$, $\tilde a_n^i$ where the index i only takes D − 2 values in a space-time of dimension D. Forgetting this subtlety for the moment (which is nevertheless crucial physically), let us conclude this discussion by summarizing the spectrum of a quantum string, or in other words the ensemble of quantum states of motion for a string.

For an open string, the ensemble of quantum states describes the states of motion (the momenta $p^\mu$) of an infinite collection of relativistic particles, having squared masses $M^2 = -\eta_{\mu\nu}\, p^\mu p^\nu$ equal to $(N-1)\, m_s^2$, where N is a non-negative integer and $m_s \equiv \hbar/\ell_s$ is the fundamental mass of string theory associated to the fundamental length $\ell_s$. For a closed string, one finds another “infinite tower” of more and more massive particles, this time with $M^2 = 4(N-1)\, m_s^2$. In both cases the integer N is given, as a function of the string’s oscillation amplitudes (travelling to the right), by

$N = \sum_{n=1}^{\infty} n\, \eta_{\mu\nu}\, (a_n^\mu)^\dagger\, a_n^\nu$ . (29)

In the case of a closed string one must also satisfy the constraint $N = \tilde N$ where $\tilde N$ is the operator obtained by replacing $a_n^\mu$ by $\tilde a_n^\mu$ in (29).
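The two mass formulae quoted above are easy to tabulate. The following is a minimal sketch: the function names, the choice of units, and the conversion constant are assumptions of this example and do not come from the text.

```python
# Illustrative sketch only: names and the GeV conversion are assumptions
# of this example, not part of the original discussion.

HBAR_C_GEV_CM = 1.97327e-14  # hbar * c in GeV * cm (i.e. 197.327 MeV * fm)


def mass_squared(level, closed=False, m_s=1.0):
    """Return M^2 at oscillator level N, in the units used for m_s.

    Open string:   M^2 = (N - 1) * m_s^2
    Closed string: M^2 = 4 * (N - 1) * m_s^2   (with N equal to N-tilde)
    """
    if level < 0 or level != int(level):
        raise ValueError("level N must be a non-negative integer")
    factor = 4 if closed else 1
    return factor * (level - 1) * m_s ** 2


def string_mass_gev(string_length_cm):
    """Fundamental mass m_s = hbar / l_s, returned as the energy m_s c^2 in GeV."""
    if string_length_cm <= 0:
        raise ValueError("string length must be positive")
    return HBAR_C_GEV_CM / string_length_cm


if __name__ == "__main__":
    # Columns: level N, open-string M^2, closed-string M^2 (units of m_s^2).
    for n in range(4):
        print(n, mass_squared(n), mass_squared(n, closed=True))
```

Level N = 0 gives a negative squared mass in both towers and level N = 1 gives massless states, which are the two cases the discussion turns to next.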
The preceding result essentially states that the (quantized) internal energy of an oscillating string defines the squared mass of the associated particle. The presence of the additional term −1 in the formulae given above for $M^2$ means that the quantum state of minimum internal energy for a string, that is, the “vacuum” state $|0\rangle$ where all oscillators are in their ground state, $a_n^\mu\, |0\rangle = 0$, corresponds to a negative squared mass ($M^2 = -m_s^2$ for the open string and $M^2 = -4 m_s^2$ for the closed string). This unusual quantum state (a tachyon) corresponds to an instability of the theory of bosonic strings. It is absent from the more sophisticated versions of string theory (“superstrings”) due to F. Gliozzi, J. Scherk, and D. Olive, to M. Green and J.H. Schwarz, and to D. Gross and collaborators. Let us concentrate on the other states (which are the only ones that have corresponding states in superstring theory). One then finds that the first possible physical quantum states (such that N = 1) describe some massless particles. In relativistic quantum theory it is known that any particle is the quantized excitation of a corresponding field. Therefore the massless particles that appear in string theory must correspond to long-range fields. To know which fields appear in this way one must more closely examine which possible combinations of oscillator excitations $a_1^\mu$, $a_2^\mu$, $a_3^\mu$, . . ., appearing in Formula (29), can lead to N = 1. Because of the factor n in (29) multiplying the harmonic contribution of order n to the mass squared, only the oscillators of the fundamental mode n = 1 can give N = 1. One then deduces that the internal quantum states of massless particles appearing in the theory of open strings are described by a string oscillation state of the form

$\zeta_\mu\, (a_1^\mu)^\dagger\, |0\rangle$ . (30)

On the other hand, because of the constraint $N = \tilde N = 1$, the internal quantum states of the massless particles appearing in the theory of closed strings are described by a state of excitation containing both a left-moving oscillation and a right-moving oscillation:

$\zeta_{\mu\nu}\, (a_1^\mu)^\dagger\, (\tilde a_1^\nu)^\dagger\, |0\rangle$ . (31)

In Equations (30) and (31) the state $|0\rangle$ denotes the ground state of all oscillators ($a_n^\mu\, |0\rangle = \tilde a_n^\mu\, |0\rangle = 0$).

The state (30) therefore describes a massless particle (with momentum satisfying $\eta_{\mu\nu}\, p^\mu p^\nu = 0$), possessing an “internal structure” described by a vector polarization $\zeta_\mu$. Here we recognize exactly the definition of a photon, the quantum state associated with a wave $A_\mu(x) = \zeta_\mu \exp(i\, k_\lambda x^\lambda)$, where $p^\mu = \hbar\, k^\mu$. The theory of open strings therefore contains Maxwell’s theory. (One can also show that, because of the constraints briefly mentioned above, the polarization $\zeta_\mu$ must be transverse, $k^\mu \zeta_\mu = 0$, and that it is only defined up to a gauge transformation: $\zeta'_\mu = \zeta_\mu + a\, k_\mu$.) As for the state (31), this describes a massless particle ($\eta_{\mu\nu}\, p^\mu p^\nu = 0$), possessing an “internal structure” described by a tensor polarization $\zeta_{\mu\nu}$. The plane wave associated with such a particle is therefore of the form $\bar h_{\mu\nu}(x) = \zeta_{\mu\nu} \exp(i\, k_\lambda x^\lambda)$, where $p^\mu = \hbar\, k^\mu$. As in the case of the open string, one can show that $\zeta_{\mu\nu}$ must be transverse, $\zeta_{\mu\nu}\, k^\nu = 0$, and that it is only defined up to a gauge transformation, $\zeta'_{\mu\nu} = \zeta_{\mu\nu} + k_\mu a_\nu + k_\nu b_\mu$. We here see the same type of structure appear that we had in general relativity for plane waves. However, here we have a structure that is richer than that of general relativity. Indeed, since the state (31) is obtained by combining two independent states of oscillation, $(a_1^\mu)^\dagger$ and $(\tilde a_1^\nu)^\dagger$, the polarization tensor $\zeta_{\mu\nu}$ is not constrained to be symmetric. Moreover it is not constrained to have vanishing trace.
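Because $\zeta_{\mu\nu}$ is a general tensor, it can always be split algebraically into three pieces. The following is a schematic sketch of that split: it takes the trace with the flat metric $\eta_{\mu\nu}$ over all D space-time indices and ignores the transversality constraints, both of which are simplifying assumptions of this sketch.

```latex
% Schematic split of a general polarization tensor in D dimensions,
% with \zeta \equiv \eta^{\alpha\beta} \zeta_{\alpha\beta} (trace over all D indices assumed).
\zeta_{\mu\nu}
  = \underbrace{\tfrac{1}{2}\left(\zeta_{\mu\nu} + \zeta_{\nu\mu}\right)
      - \tfrac{1}{D}\, \eta_{\mu\nu}\, \zeta}_{\text{symmetric, traceless}}
  \;+\; \underbrace{\tfrac{1}{D}\, \eta_{\mu\nu}\, \zeta}_{\text{trace}}
  \;+\; \underbrace{\tfrac{1}{2}\left(\zeta_{\mu\nu} - \zeta_{\nu\mu}\right)}_{\text{antisymmetric}}

% Check: contracting the first piece with \eta^{\mu\nu} gives
% \zeta - \tfrac{1}{D}\, D\, \zeta = 0, so it is indeed traceless.
```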
Therefore, if we decompose $\zeta_{\mu\nu}$ into its possible irreducible parts (a symmetric traceless part, a symmetric part with trace, and an antisymmetric part) we find that the field $\bar h_{\mu\nu}(x)$ associated with the massless states of a closed string decomposes into: (i) a field $h_{\mu\nu}(x)$ (the graviton) representing a weak gravitational wave in general relativity, (ii) a scalar field $\Phi(x)$ (called the dilaton), and (iii) an antisymmetric tensor field $B_{\mu\nu}(x) = -B_{\nu\mu}(x)$ subject to the gauge invariance $B'_{\mu\nu}(x) = B_{\mu\nu}(x) + \partial_\mu a_\nu(x) - \partial_\nu a_\mu(x)$. Moreover, when one studies the non-linear interactions between these various fields, as described by the transition amplitudes A(f, i) in string theory, one can show that the field $h_{\mu\nu}(x)$ truly represents a deformation of the flat geometry of the background space-time in which the theory was initially formulated. Let us emphasize this remarkable result. We started from a theory that studied the quantum dynamics of a string in a rigid background space-time. This theory predicts that certain quantum excitations of a string (that propagate at the speed of light) in fact represent waves of deformation of the space-time geometry. In intuitive terms, the “elasticity” of space-time postulated by the theory of general relativity appears here as being due to certain internal vibrations of an elastic object extended in one spatial dimension.

Another suggestive consequence of string theory is the link revealed by the comparison between (30) and (31). Roughly, Equation (31) states that the internal state of a closed string corresponding to a graviton is constructed by taking the (tensor) product of the states corresponding to photons in the theory of open strings. This unexpected link between Einstein’s gravitation ($g_{\mu\nu}$) and Maxwell’s theory ($A_\mu$) translates, when we look at interactions in string theory, into remarkable identities (due to H. Kawai, D.C. Lewellen, and S.-H.H.
Tye) between the transition amplitudes of open strings and those of closed strings. This affinity between electromagnetism, or rather Yang-Mills theory, and gravitation has recently given rise to fascinating conjectures (due to A. Polyakov and J. Maldacena) connecting quantum Yang-Mills theory in flat space-time to quasi-classical limits of string theory and gravitation in curved space-time. Einstein would certainly have been interested to see how classical general relativity is used here to clarify the limit of a quantum Yang-Mills theory.

Having explained the starting point of string theory, we can outline the intuitive reason for which this theory avoids the problems with divergent integrals that appeared when one tried to directly quantize gravitation. We have seen that string theory contains an infinite tower of particles whose masses grow with the degree of excitation of the string’s internal oscillators. The gravitational field appears in the limit where one considers the low energy interactions ($E \ll m_s$) between the massless states of the theory. In this limit the graviton (meaning the particle associated with the gravitational field) is treated as a “point-like” particle. When we consider more complicated processes (at one loop, ℓ = 1, see above), virtual elementary gravitons could appear with arbitrarily high energy. It is these virtual high-energy gravitons that are responsible for the divergences. However, in string theory, when we consider any intermediate process whatsoever where high energies appear, it must be remembered that this high intermediate energy can also be used to excite the internal state of the virtual gravitons, and thus reveal that they are “made” from an extended string. An analysis of this fact shows that string theory introduces an effective truncation of the type $E \lesssim m_s$ on the energies of exchanged virtual particles.
In other words, the fact that there are no truly “point-like” particles in string theory, but only string excitations having a characteristic length $\sim \ell_s$, eliminates the problem of infinities connected to arbitrarily small length and time scales. Because of this, in string theory one can calculate the transition amplitudes corresponding to a collision between two gravitons, and one finds that the result is given by a finite integral [25].

Up until now we have only considered the starting point of string theory. This is a complex theory that is still in a stage of rapid development. Let us briefly sketch some other aspects of this theory that are relevant for this exposé centered around relativistic gravitation. Let us first state that the more sophisticated versions of string theory (superstrings) require the inclusion of fermionic oscillators $b_n^\mu$, $\tilde b_n^\mu$, in addition to the bosonic oscillators $a_n^\mu$, $\tilde a_n^\mu$ introduced above. One then finds that there are no particles of negative mass-squared, and that the space-time dimension D must be equal to 10. One also finds that the massless states contain more states than those indicated above. In fact, one finds that the fields corresponding to these states describe the various possible theories of supergravity in D = 10. Recently (in work by J. Polchinski) it has also been understood that string theory contains not only the states of excitation of strings (in other words of objects extended in one spatial direction), but also the states of excitation of objects extended in p spatial directions, where the integer p can take other values than 1. For example, p = 2 corresponds to a membrane. It even seems (according to C. Hull and P. Townsend) that one should recognize that there is a sort of “democracy” between several different values for p. An object extended in p spatial directions is called a p-brane.
In general, the masses of the quantum states of these p-branes are very large, being parametrically higher than the characteristic mass $m_s$. However, one may also consider a limit where the mass of certain p-branes tends towards zero. In this limit, the fields associated with these p-branes become long-range fields. A surprising result (by E. Witten) is that, in this limit, the infinite tower of states of certain p-branes (in particular for p = 0) corresponds exactly to the infinite tower of states that appear when one considers the maximal supergravity in D = 11 dimensions, with the eleventh (spatial) dimension compactified on a circle (that is to say with a periodicity condition on $x^{11}$). In other words, in a certain limit, a theory of superstrings in D = 10 transforms into a theory that lives in D = 11 dimensions! Because of this, many experts in string theory believe that the true definition of string theory (which is still to be found) must start from a theory (to be defined) in 11 dimensions (known as “M-theory”).

We have seen in Section 8 that one point of contact between relativistic gravitation and quantum theory is the phenomenon of thermal emission from black holes discovered by S.W. Hawking. String theory has shed new light upon this phenomenon, as well as on the concept of black hole “entropy.” The essential question that the calculation of S.W. Hawking left in the shadows is: what is the physical meaning of the quantity S defined by Equation (19)? In the thermodynamic theory of ordinary bodies, the entropy of a system is interpreted, since Boltzmann’s work, as the (natural) logarithm of the number of microscopic states N having the same macroscopic characteristics (energy, volume, etc.) as the state of the system under consideration: $S = \log N$.
Bekenstein had attempted to estimate the number of microscopic internal states of a macroscopically defined black hole, and had argued for a result such that $\log N$ was on the order of magnitude of $A/\hbar G$, but his arguments remained indirect and did not allow a clear meaning to be attributed to this counting of microscopic states. Work by A. Sen and by A. Strominger and C. Vafa, as well as by C.G. Callan and J.M. Maldacena, has, for the first time, given examples of black holes whose microscopic description in string theory is sufficiently precise to allow for the calculation (in certain limits) of the number of internal quantum states, N. It is therefore quite satisfying to find a final result for N whose logarithm is precisely equal to the expression (19). However, there do remain dark areas in the understanding of the quantum structure of black holes. In particular, the string theory calculations allowing one to give a precise statistical meaning to the entropy (19) deal with very special black holes (known as extremal black holes, which have the maximal electric charge that a black hole with a regular horizon can support). These black holes have a Hawking temperature equal to zero, and therefore do not emit thermal radiation. They correspond to stable states in the quantum theory. One would nevertheless also like to understand the detailed internal quantum structure of unstable black holes, such as the Schwarzschild black hole (17), which has a non-zero temperature, and which therefore loses its mass little by little in the form of thermal radiation. What is the final state to which this gradual process of black hole “evaporation” leads? Is it the case that an initial pure quantum state radiates all of its initial mass to transform itself entirely into incoherent thermal radiation? Or does a Schwarzschild black hole transform itself, after having obtained a minimum size, into something else?
The answers to these questions remain open to a large extent, although it has been argued that a Schwarzschild black hole transforms itself into a highly massive quantum string state when its radius becomes on the order of $\ell_s$ [26].

We have seen previously that string theory contains general relativity in a certain limit. At the same time, string theory is, strictly speaking, infinitely richer than Einstein’s gravitation, for the graviton is nothing more than a particular quantum excitation of a string, among an infinite number of others. What deviations from Einstein’s gravity are predicted by string theory? This question remains open today because of our lack of comprehension about the connection between string theory and the reality observed in our everyday environment (4-dimensional space-time; electromagnetic, weak, and strong interactions; the spectrum of observed particles; . . .). We shall content ourselves here with outlining a few possibilities. (See the contribution by I. Antoniadis for a discussion of other possibilities.) First, let us state that if one considers collisions between gravitons with energy-momentum k smaller than, but not negligible with respect to, the characteristic string mass $m_s$, the calculations of transition amplitudes in string theory show that the usual Einstein equations (in the absence of matter) $R_{\mu\nu} = 0$ must be modified, by including corrections of order $(k/m_s)^2$. One finds that these modified Einstein equations have the form (for bosonic string theory)

$R_{\mu\nu} + \ell_s^2\, R_{\mu\alpha\beta\gamma}\, R_\nu{}^{\alpha\beta\gamma} + \cdots = 0$ , (32)

where

$R^\mu{}_{\nu\alpha\beta} \equiv \partial_\alpha \Gamma^\mu_{\nu\beta} + \Gamma^\mu_{\sigma\alpha}\, \Gamma^\sigma_{\nu\beta} - \partial_\beta \Gamma^\mu_{\nu\alpha} - \Gamma^\mu_{\sigma\beta}\, \Gamma^\sigma_{\nu\alpha}$ , (33)

denotes the “curvature tensor” of the metric $g_{\mu\nu}$. (The quantity $R_{\mu\nu}$ defined in Section 5 that appears in Einstein’s equations in an essential way is a “trace” of this tensor: $R_{\mu\nu} = R^\sigma{}_{\mu\sigma\nu}$.) As indicated by the dots in (32), the terms written are no more than the first two terms of an infinite series in increasing powers of $\ell_s^2 \equiv \alpha'$.
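The size of the first correction relative to the Einstein term can be estimated with a rough sketch. It assumes that the curvature components are of order $1/L^2$ for a curvature radius L, so that the ratio of the two terms is about $(\ell_s/L)^2$, and it ignores numerical factors. The function name and the sample lengths (including the value used for $\ell_s$) are illustrative assumptions only.

```python
# Order-of-magnitude sketch only: the function name and sample values are
# illustrative assumptions, and all numerical factors are ignored.


def relative_correction(string_length_cm, curvature_radius_cm):
    """Rough ratio of the l_s^2 * R * R term to the R_{mu nu} term.

    With curvature components of order 1 / L^2 for a curvature radius L,
    the ratio is about l_s^2 * (1 / L^2)^2 / (1 / L^2) = (l_s / L)^2.
    """
    if string_length_cm <= 0 or curvature_radius_cm <= 0:
        raise ValueError("lengths must be positive")
    return (string_length_cm / curvature_radius_cm) ** 2


if __name__ == "__main__":
    l_s = 1e-32  # cm, purely illustrative input
    # Curvature radii from a few kilometres down to the string length itself.
    for radius_cm in (3e5, 1.0, 1e-30, 1e-32):
        print(radius_cm, relative_correction(l_s, radius_cm))
```

In this estimate the ratio only reaches order one when the curvature radius shrinks to the string length itself.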
Equation (32) shows how the fact that the string is not a point, but is rather extended over a characteristic length $\sim \ell_s$, modifies the Einsteinian description of gravity. The corrections to Einstein’s equation shown in (32) are nevertheless completely negligible in most applications of general relativity. In fact, it is expected that $\ell_s$ is on the order of the Planck scale $\ell_p$, Equation (25). More precisely, one expects that $\ell_s$ is on the order of magnitude of $10^{-32}$ cm. (Nevertheless, this question remains open, and it has been recently suggested that $\ell_s$ is much larger, and perhaps on the order of $10^{-17}$ cm.) If one assumes that $\ell_s$ is on the order of magnitude of $10^{-32}$ cm (and that the extra dimensions are compactified on distance scales on the order of $\ell_s$), the only area of general relativistic applications where the modifications shown in (32) should play an important role is in primordial cosmology. Indeed, close to the initial singularity of the Big Bang (if it exists), the “curvature” $R_{\mu\nu\alpha\beta}$ becomes extremely large. When it reaches values comparable to $\ell_s^{-2}$ the infinite series of corrections in (32) begins to play a role comparable to the first term, discovered by Einstein. Such a situation is also found in the interior of a black hole, when one gets very close to the singularity (see Figure 3). Unfortunately, in such situations, one must take the infinite series of terms in (32) into account, or in other words replace Einstein’s description of gravitation in terms of a field (which corresponds to a point-like (quantum) particle) by its exact stringy description. This is a difficult problem that no one really knows how to attack today.

However, a priori string theory predicts more drastic low energy ($k \ll m_s$) modifications to general relativity than the corrections shown in (32). In fact, we have seen in Equation (31) above that Einsteinian gravity does not appear alone in string theory.
It is always necessarily accompanied by other long-range fields, in particular a scalar field $\Phi(x)$, the dilaton, and an antisymmetric tensor $B_{\mu\nu}(x)$. What role do these “partners” of the graviton play in observable reality? This question does not yet have a clear answer. Moreover, if one recalls that (super)string theory must live in a space-time of dimension D = 10, and that it includes the D = 10 (and possibly the D = 11) theory of supergravity, there are many other supplementary fields that add themselves to the ten components of the usual metric tensor $g_{\mu\nu}$ (in D = 4). It is conceivable that all of these supplementary fields (which are massless to first approximation in string theory) acquire masses in our local universe that are large enough that they no longer propagate observable effects over macroscopic scales. It remains possible, however, that one or several of these fields remain (essentially) massless, and therefore can propagate physical effects over distances that are large enough to be observable. It is therefore of interest to understand what physical effects are implied, for example, by the dilaton $\Phi(x)$ or by $B_{\mu\nu}(x)$. Concerning the latter, it is interesting to note that (as emphasized by A. Connes, M. Douglas, and A. Schwartz), in a certain limit, the presence of a background $B_{\mu\nu}(x)$ has the effect of deforming the space-time geometry in a “non-commutative” way. This means that, in a certain sense, the space-time coordinates $x^\mu$ cease to be simple real (commuting) numbers in order to become non-commuting quantities: $x^\mu x^\nu - x^\nu x^\mu = \varepsilon^{\mu\nu}$ where $\varepsilon^{\mu\nu} = -\varepsilon^{\nu\mu}$ is connected to a (uniform) background $B_{\mu\nu}$.

To conclude, let us consider the other obligatory partner of the graviton $g_{\mu\nu}(x)$, the dilaton $\Phi(x)$. This field plays a central role in string theory. In fact, the average value of the dilaton (in the vacuum) determines the string theory coupling constant, $g_s = e^\Phi$.
The value of $g_s$ in turn determines (along with other fields) the physical coupling constants. For example, the gravitational coupling constant is given by a formula of the type $\hbar G = \ell_s^2\, (g_s^2 + \cdots)$ where the dots denote correction terms (which can become quite important if $g_s$ is not very small). Similarly, the fine structure constant, $\alpha = e^2/\hbar c \simeq 1/137$, which determines the intensity of electromagnetic interactions, is a function of $g_s^2$. Because of these relations between the physical coupling constants and $g_s$ (and therefore the value of the dilaton; $g_s = e^\Phi$), we see that if the dilaton is massless (or in other words is long-range), its value $\Phi(x)$ at a space-time point x will depend on the distribution of matter in the universe. For example, as is the case with the gravitational field (for example $g_{00}(x) \simeq -1 + 2GM/c^2 r$), we expect that the value of $\Phi(x)$ depends on the masses present around the point x, and should be different at the Earth’s surface than it is at a higher altitude. One may also expect that $\Phi(x)$ would be sensitive to the expansion of the universe and would vary over a time scale comparable to the age of the universe. However, if $\Phi(x)$ varies over space and/or time, one concludes from the relations shown above between $g_s = e^\Phi$ and the physical coupling constants that the latter must also vary over space and/or time. Therefore, for example, the value, here and now, of the fine structure constant $\alpha$ could be slightly different from the value it had, long ago, in a very distant galaxy. Such effects are accessible to detailed astronomical observations and, in fact, some recent observations have suggested that the interaction constants were different in distant galaxies.
However, other experimental data (such as the fossil nuclear reactor at Oklo and the isotopic composition of ancient terrestrial meteorites) put very severe limits on any variability of the coupling “constants.” Let us finally note that if the fine structure “constant” $\alpha$, as well as other coupling “constants,” varies with a massless field such as the dilaton $\Phi(x)$, then this implies a violation of the basic postulate of general relativity: the principle of equivalence. In particular, one can show that the universality of free fall is necessarily violated, meaning that bodies with different nuclear composition would fall with different accelerations in an external gravitational field. This gives an important motivation for testing the principle of equivalence with greater precision. For example, the MICROSCOPE space mission [27] (of the CNES) should soon test the universality of free fall to the level of $10^{-15}$, and the STEP space project (Satellite Test of the Equivalence Principle) [28] could reach the level $10^{-18}$.

Another interesting phenomenological possibility is that the dilaton (and/or other scalar fields of the same type, called moduli) acquires a non-zero mass that is however very small with respect to the string mass scale $m_s$. One could then observe a modification of Newtonian gravitation over small distances (smaller than a tenth of a millimeter). For a discussion of this theoretical possibility and of its recent experimental tests see, respectively, the contributions by I. Antoniadis and J. Mester to this Poincaré seminar.

12 Conclusion

For a long time general relativity was admired as a marvellous intellectual construction, but it only played a marginal role in physics.
Typical of the appraisal of this theory is the comment by Max Born [29] made upon the fiftieth anniversary of the annus mirabilis: “The foundations of general relativity seemed to me then, and they still do today, to be the greatest feat of human thought concerning Nature, the most astounding association of philosophical penetration, physical intuition, and mathematical ability. However its connections to experiment were tenuous. It seduced me like a great work of art that should be appreciated and admired from a distance.”

Today, one century after the annus mirabilis, the situation is quite different. General relativity plays a central role in a large domain of physics, including everything from primordial cosmology and the physics of black holes to the observation of binary pulsars and the definition of international atomic time. It even has everyday practical applications, via the satellite positioning systems (such as the GPS and, soon, its European counterpart Galileo). Many ambitious (and costly) experimental projects aim to test it (G.P.B., MICROSCOPE, STEP, . . .), or use it as a tool for deciphering the distant universe (LIGO/VIRGO/GEO, LISA, . . .). The time when its connection with experiment was tenuous is therefore long gone. Nevertheless, it is worth noting that the fascination with the structure and physical implications of the theory evoked by Born remains intact. One of the motivations for thinking that the theory of strings (and other extended objects) holds the key to the problem of the unification of physics is its deep affinity with general relativity. Indeed, while the attempts at “Grand Unification” made in the 1970s completely ignored the gravitational interaction, string theory necessarily leads to Einstein’s fundamental concept of a dynamical space-time.
At any rate, it seems that one must more deeply understand the “generalized quantum geometry” created through the in- teraction of strings and p-branes in order to completely formulate this theory and to understand its hidden symmetries and physical implications. Einstein would no doubt appreciate seeing the key role played by symmetry principles and gravity within modern physics. References [1] A. Einstein, Zur Elektrodynamik bewegter Körper, Annalen der Physik 17, 891 (1905). [2] See http://www.einstein.caltech.edu for an entry into the Einstein Col- lected Papers Project. The French reader will have access to Einstein’s main papers in Albert Einstein, Œuvres choisies, Paris, Le Seuil/CNRS, 1993, under the direction of F. Balibar. See in particular Volumes 2 (Rel- ativités I) and 3 (Relativités II). One can also consult the 2005 Poincaré seminar dedicated to Einstein (http://www.lpthe.jussieu.fr/poincare): Einstein, 1905-2005, Poincaré Seminar 2005, edited by T. Damour, O. Darrigol, B. Duplantier and V. Rivasseau (Birkhäuser Verlag, Basel, Suisse, 2006). See also the excellent summary article by D. Giulini and N. Straumann, “Einstein’s impact on the physics of the twentieth cen- tury,” Studies in History and Philosophy of Modern Physics 37, 115- 173 (2006). For online access to many of Einstein’s original articles and to documents about him, see http://www.alberteinstein.info/. We also note that most of the work in progress on general relativity can be consulted on various archives at http://xxx.lanl.gov, in particular the archive gr-qc. Review articles on certain sub-fields of general relativ- ity are accessible at http://relativity.livingreviews.org. Finally, see T. Damour, Once Upon Einstein, A K Peters Ltd, Wellesley, 2006, for a recent non-technical account of the formation of Einstein’s ideas. [3] Galileo, Dialogues Concerning Two New Sciences, translated by Henry Crew and Alfonso di Salvio, Macmillan, New York, 1914. 
[4] The reader interested in learning about recent experimental tests of gravitational theories may consult, on the internet, either the highly detailed review by C.M. Will in Living Reviews (http://relativity.livingreviews.org/Articles/lrr-2001-4) or the brief review by T. Damour in the Review of Particle Physics (http://pdg.lbl.gov/). See also John Mester’s contribution to this Poincaré seminar.
[5] A. Einstein, Die Feldgleichungen der Gravitation, Sitz. Preuss. Akad. Wiss., 1915, p. 844.
[6] The reader wishing to study the formalism and applications of general relativity in detail can consult, for example, the following works: L. Landau and E. Lifshitz, The Classical Theory of Fields, Butterworth-Heinemann, 1995; S. Weinberg, Gravitation and Cosmology, Wiley, New York, 1972; H.C. Ohanian and R. Ruffini, Gravitation and Spacetime, Second Edition, Norton, New York, 1994; N. Straumann, General Relativity, With Applications to Astrophysics, Springer Verlag, 2004. Let us also mention detailed course notes on general relativity by S.M. Carroll, available on the internet: http://pancake.uchicago.edu/~carroll/notes/~; as well as at gr-qc/9712019. Finally, let us mention the recent book (in French) on the history of the discovery and reception of general relativity: J. Eisenstaedt, Einstein et la relativité générale, CNRS, Paris, 2002.
[7] B. Bertotti, L. Iess, and P. Tortora, A Test of General Relativity Using Radio Links with the Cassini Spacecraft, Nature 425, 374 (2003).
[8] http://einstein.stanford.edu
[9] W. Israel, Dark stars: the evolution of an idea, in 300 Years of Gravitation, edited by S.W. Hawking and W. Israel, Cambridge University Press, Cambridge, 1987, Chapter 7, pp. 199-276.
[10] The discovery of binary pulsars is related in Hulse’s Nobel Lecture: R.A. Hulse, Reviews of Modern Physics 66, 699 (1994).
[11] For an introduction to the observational characteristics of pulsars, and their use in testing relativistic gravitation, see Taylor’s Nobel Lecture: J.H. Taylor, Reviews of Modern Physics 66, 711 (1994). See also Michael Kramer’s contribution to this Poincaré seminar.
[12] For an update on the observational characteristics of pulsars, and their use in testing general relativity, see the Living Review by I.H. Stairs, available at http://relativity.livingreviews.org/Articles/lrr-2003-5/ and the contribution by Michael Kramer to this Poincaré seminar.
[13] For a recent update on tests of relativistic gravitation (and of tensor-scalar theories) obtained through the chronometry of binary pulsars, see G. Esposito-Farèse, gr-qc/0402007 (available on the general relativity and quantum cosmology archive at the address http://xxx.lanl.gov), and T. Damour and G. Esposito-Farèse, in preparation. Figure 4 is adapted from these references.
[14] For a review of the problem of the motion of two gravitationally condensed bodies in general relativity, up to the level where the effects connected to the finite speed of propagation of the gravitational interaction appear, see T. Damour, The problem of motion in Newtonian and Einsteinian gravity, in 300 Years of Gravitation, edited by S.W. Hawking and W. Israel, Cambridge University Press, Cambridge, 1987, Chapter 6, pp. 128-198.
[15] A. Einstein, Näherungsweise Integration der Feldgleichungen der Gravitation, Sitz. Preuss. Akad. Wiss., 1916, p. 688; ibidem, Über Gravitationswellen, 1918, p. 154.
[16] For a highly detailed introduction to these three problems, see K.S. Thorne Gravitational radiation, in 300 Years of Gravitation, edited by S.W. Hawking and W. Israel, Cambridge University Press, Cam- bridge, 1987, Chapter 9, pp. 330-458. [17] http://www.ligo.caltech.edu/ [18] http://www.virgo.infn.it/ [19] http://www/geo600.uni-hanover.de/ [20] http://lisa.jpl.nasa.gov/ [21] L. Blanchet et al., gr-qc/0406012 ; see also the Living Review by L. Blanchet, available at http://relativity.livingreviews.org/Articles. [22] Figure 5 is adapted from work by A. Buonanno and T. Damour, gr-qc/0001013. [23] F. Pretorius, Phys. Rev. Lett. 95, 121101 (2005), gr-qc/0507014; M. Campanelli et al., Phys. Rev. Lett. 96, 111101 (2006), gr-qc/0511048 J. Baker et al., Phys. Rev. D 73, 104002 (2006), gr-qc/0602026. [24] For a particularly clear exposé of the development of the quantum the- ory of fields, see, for example, the first chapter of S. Weinberg, The Quantum Theory of Fields, volume 1, Foundations, Cambridge Univer- sity Press, Cambridge, 1995. [25] For an introduction to the theory of (super)strings see http://superstringtheory.com/. For a detailed (and technical) in- troduction to the theory see the books: K. Becker, M. Becker, and J.H. Schwarz, String Theory and M-theory: An Introduction, Cambridge University Press, Cambridge, 2006; B. Zwiebach, A First Course in String Theory, Cambridge University Press, Cambridge, 2004; M.B. Green, J.H. Schwarz et E. Witten, Superstring theory, 2 vol- umes, Cambridge University Press, Cambridge, 1987 ; and J. 
Polchinski, String Theory, 2 volumes, Cambridge University Press, Cambridge, 1998. To read review articles or to research this theory as it develops see the hep-th archive at http://xxx.lanl.gov. To search for information on the string theory literature (and more generally that of high-energy physics) see also the site http://www.slac.stanford.edu/spires/find/hep.
[26] For a detailed introduction to black hole physics see P.K. Townsend, gr-qc/9707012; for an entry into the vast literature on black hole entropy, see, for example, T. Damour, hep-th/0401160 in Poincaré Seminar 2003, edited by Jean Dalibard, Bertrand Duplantier, and Vincent Rivasseau (Birkhäuser Verlag, Basel, 2004), pp. 227-264.
[27] http://www.onera.fr/microscope/
[28] http://www.sstd.rl.ac.uk/fundphys/step/.
[29] M. Born, Physics and Relativity, in Fünfzig Jahre Relativitätstheorie, Bern, 11-16 Juli 1955, Verhandlungen, edited by A. Mercier and M. Kervaire, Helvetica Physica Acta, Supplement 4, 244-260 (1956).
ABSTRACT After recalling the conceptual foundations and the basic structure of general relativity, we review some of its main modern developments (apart from cosmology): (i) the post-Newtonian limit and weak-field tests in the solar system, (ii) strong gravitational fields and black holes, (iii) strong-field and radiative tests in binary pulsar observations, (iv) gravitational waves, (v) general relativity and quantum theory. <|endoftext|><|startoftext|> Introduction

This worksheet demonstrates the use of Maple in Linear Algebra. We give a new procedure (PowerMatrix) in Maple for finding the kth power of an n-by-n square matrix A, in symbolic form, for any positive integer k, k ≥ n. The algorithm is based on an application of the Cayley-Hamilton theorem. We use the fact that the entries of the matrix A^k satisfy the same recurrence relation, which is determined by the characteristic polynomial of the matrix A (see [1]). The order of these recurrences is n − d, where d is the lowest degree of the characteristic polynomial of the matrix A. For non-singular matrices the procedure can be extended to values of k that are not positive integers.
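The recurrence described above can also be tried out numerically outside Maple. The following Python sketch is not the PowerMatrix procedure: it works with concrete numbers rather than a symbolic k, it obtains the characteristic polynomial with the Faddeev-LeVerrier method (a swapped-in technique, not taken from the worksheet), and the helper names mat_mul, identity, char_poly and power_by_recurrence are hypothetical. It only illustrates that, once A^0, ..., A^(n-1) are known, every higher power follows from the Cayley-Hamilton recurrence.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """n-by-n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def char_poly(A):
    """Coefficients c[0..n] of det(x*I - A) = sum(c[i] * x**i), with c[n] = 1.

    Uses the Faddeev-LeVerrier iteration (an assumption of this sketch).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A * M_(k-1) + c[n-k+1] * I
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = mat_mul(A, M)
        # c[n-k] = -trace(A * M_k) / k
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def power_by_recurrence(A, k):
    """A**k from the recurrence A^m = -sum(c[i] * A^(m-n+i), i = 0..n-1)."""
    n = len(A)
    c = char_poly(A)
    A = [[Fraction(x) for x in row] for row in A]
    powers = [identity(n)]              # A^0
    for _ in range(1, n):
        powers.append(mat_mul(powers[-1], A))   # A^1 .. A^(n-1)
    for _ in range(n, k + 1):
        window = powers[-n:]            # the n most recent powers
        powers.append([[-sum(c[i] * window[i][r][s] for i in range(n))
                        for s in range(n)] for r in range(n)])
    return powers[k]


if __name__ == "__main__":
    # Matrix of Example 1 in this worksheet.
    A = [[4, -2, 2], [-5, 7, -5], [-6, 6, -4]]
    print(char_poly(A))
    print(power_by_recurrence(A, 5))
```

Exact rational arithmetic (the standard-library Fraction type) is used so that the comparison with direct multiplication is not blurred by rounding.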
2 Initialization

> restart: with(LinearAlgebra):

2.1 Procedure Definition

2.1.1 PowerMatrix

Input data are a square matrix A and a parameter k. Elements of the matrix A can be numbers and/or parameters. The parameter k can take a numeric value or be a symbol. The output is the kth power of the matrix. The procedure PowerMatrix is as powerful as the procedure rsolve.

> PowerMatrix := proc(A::Matrix,k)
  local i,j,m,r,q,n,d,f,P,F,C;
  # characteristic polynomial, its degree n and its lowest degree d
  P := x->CharacteristicPolynomial(A,x);
  n := degree(P(x),x);
  d := ldegree(P(x),x);
  # entry (i,j) of A^q solves the recurrence given by P, with initial values taken from A^(d+1)..A^n
  F := (i,j)->rsolve({sum(coeff(P(x),x,m)*f(m+q),m=0..n)=0,seq(f(r)=(A^r)[i,j],r=d+1..n)},f);
  C := q->Matrix(n,n,F);
  if (type(k,integer)) then return(simplify(A^k))
  elif (Determinant(A)=0 and not type(k,numeric)) then printf("The %ath power of the matrix for %a>=%d:", k,k,n)
  elif (Determinant(A)=0 and type(k,numeric)) then return(simplify(A^k))
  fi;
  return(simplify(subs(q=k,C(q))));
  end:

3 Examples

3.1 Example 1.

> A := Matrix([[4,-2,2],[-5,7,-5],[-6,6,-4]]);

    [  4  -2   2 ]
    [ -5   7  -5 ]
    [ -6   6  -4 ]

> PowerMatrix(A,k);

    [ -2^k + 2·3^k       2^(1+k) - 2·3^k    -2^(1+k) + 2·3^k ]
    [ -5·3^k + 5·2^k     5·3^k - 4·2^k      -5·3^k + 5·2^k   ]
    [ 6·2^k - 6·3^k      -6·2^k + 6·3^k     -6·3^k + 7·2^k   ]

> Determinant(A);
> B := A^(-1);
> PowerMatrix(B,k);

    [ -2^(-k) + 2·3^(-k)       2^(1-k) - 2·3^(-k)      -2^(1-k) + 2·3^(-k)   ]
    [ -5·3^(-k) + 5·2^(-k)     5·3^(-k) - 4·2^(-k)     -5·3^(-k) + 5·2^(-k)  ]
    [ -6·3^(-k) + 6·2^(-k)     -6·2^(-k) + 6·3^(-k)    -6·3^(-k) + 7·2^(-k)  ]

3.2 Example 2.

> A := Matrix([[1-p,p],[p,1-p]]);

    [ 1-p    p  ]
    [  p    1-p ]

> PowerMatrix(A,k);

    [ 1/2 + (1/2)·(1 - 2p)^k     1/2 - (1/2)·(1 - 2p)^k ]
    [ 1/2 - (1/2)·(1 - 2p)^k     1/2 + (1/2)·(1 - 2p)^k ]

The example is from [4], page 272, exercise 19.

3.3 Example 3.
> A := Matrix([[a,b,c],[d,e,f],[g,h,i]]);

    [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]

> PowerMatrix(A,k)[1,1];

    Sum over R = RootOf((gbf + hdc + iea - gce - hfa - idb) Z^3 + (gc + hf + db - ie - ia - ea) Z^2 + (i + e + a) Z - 1) of

        (R^2 ie - R^2 hf - R e - R i + 1) · (1/R)^k
        / ((3 R^2 gbf + 3 R^2 hdc + 3 R^2 iea - 3 R^2 gce - 3 R^2 hfa - 3 R^2 idb + 2 R gc + 2 R hf + 2 R db - 2 R ie - 2 R ia - 2 R ea + i + e + a) · R)

# Warning! In this example the MatrixPower and MatrixFunction procedures cannot be done in real-time.
# MatrixPower(A,k)[1,1];
# MatrixFunction(A,v^k,v)[1,1];

3.4 Example 4.

> A := Matrix([[0,0,1,0,1],[1,0,0,0,1],[0,0,0,1,1],[0,1,0,0,1],[1,1,1,1,0]]);

    [ 0  0  1  0  1 ]
    [ 1  0  0  0  1 ]
    [ 0  0  0  1  1 ]
    [ 0  1  0  0  1 ]
    [ 1  1  1  1  0 ]

> PowerMatrix(A,k)[1,5];

Replace ’:’ with ’;’ and see result!

> MatrixPower(A,k)[1,5]:
> assume(m::integer):simplify(MatrixPower(A,k)[1,5]):

The example is from [3], page 101.

3.5 Example 5. and Example 6.

Pay attention to what happens for singular matrices.

3.5.1 Example 5.

> A := Matrix([[0,2,1,3],[0,0,-2,4],[0,0,0,5],[0,0,0,0]]);

    [ 0  2   1  3 ]
    [ 0  0  -2  4 ]
    [ 0  0   0  5 ]
    [ 0  0   0  0 ]

> PowerMatrix(A,2);

    [ 0  0  -4   13 ]
    [ 0  0   0  -10 ]
    [ 0  0   0    0 ]
    [ 0  0   0    0 ]

> PowerMatrix(A,3);

    [ 0  0  0  -20 ]
    [ 0  0  0    0 ]
    [ 0  0  0    0 ]
    [ 0  0  0    0 ]

> PowerMatrix(A,k);

The kth power of the matrix A for k ≥ 4:

    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  0 ]

> MatrixPower(A,k);
Error, (in LinearAlgebra:-LA_Main:-MatrixPower) power k is not defined for this Matrix
> MatrixFunction(A,v^k,v);
Error, (in LinearAlgebra:-LA_Main:-MatrixFunction) Matrix function v^k is not defined for this Matrix

The example is from [2], page 151, exercise 23.

3.5.2 Example 6.
> A := Matrix([[1,1,1,0],[1,1,1,-1],[0,0,-1,1],[0,0,1,-1]]);

    [ 1  1   1   0 ]
    [ 1  1   1  -1 ]
    [ 0  0  -1   1 ]
    [ 0  0   1  -1 ]

> PowerMatrix(A,k);

The kth power of the matrix for k ≥ 4:

    [ 2^(-1+k)   2^(-1+k)   (-1)^(1+k)·2^k/16 + 5·2^k/16       (-1)^k·2^k/16 - 2^k/16   ]
    [ 2^(-1+k)   2^(-1+k)   5·2^k/16 + 5·(-1)^(1+k)·2^k/16     5·(-1)^k·2^k/16 - 2^k/16 ]
    [ 0          0          (-1)^k·2^(-1+k)                    (-1)^(1+k)·2^(-1+k)      ]
    [ 0          0          (-1)^(1+k)·2^(-1+k)                (-1)^k·2^(-1+k)          ]

> MatrixPower(A,k);
Error, (in LinearAlgebra:-LA_Main:-MatrixPower) power k is not defined for this Matrix
> MatrixFunction(A,v^k,v);
Error, (in LinearAlgebra:-LA_Main:-MatrixFunction) Matrix function v^k is not defined for this Matrix

4 References

[1] Branko Malešević: Some combinatorial aspects of the composition of a set of functions, NSJOM 2006 (36), 3-9, URLs: http://www.im.ns.ac.yu/NSJOM/Papers/36_1/NSJOM_36_1_003_009.pdf, http://arxiv.org/abs/math.CO/0409287.
[2] John B. Johnston, G. Baley Price, Fred S. Van Vleck: Linear Equations and Matrices, Addison-Wesley, 1966.
[3] Carl D. Meyer: Matrix Analysis and Applied Linear Algebra Book and Solutions Manual, SIAM, 2001.
[4] Robert Messer: Linear Algebra Gateway to Mathematics, New York, Harper-Collins College Publisher, 1993.

5 Conclusions

This procedure has an educational character. It is an interesting demonstration of finding the kth power of a matrix in symbolic form. Sometimes it gives solutions in a better form than the existing procedure MatrixPower (see example 4.). See also example 5. and example 6., where we consider singular matrices. In these cases the procedure MatrixPower does not give a solution. The procedure PowerMatrix calculates the kth power of any singular matrix. In some examples it is possible to get a solution in a better form by using the procedure allvalues (see example 3.).

Legal Notice: The copyright for this application is owned by the authors. Neither Maplesoft nor the author are responsible for any errors contained within and are not liable for any damages resulting from the use of this material.
This application is intended for non-commercial, non-profit use only. Contact the author for permission if you wish to use this application in for-profit activities.

ABSTRACT We give a new procedure in Maple for finding the k-th power of a matrix. The algorithm is based on the article [1]. <|endoftext|><|startoftext|> Introduction: String theory, the most serious candidate for a quantum theory of gravity, predicts the existence of ’branes’, i.e. hypersurfaces in the 10- (or 11-) dimensional spacetime on which ordinary matter, e.g. gauge particles and fermions, are confined. Gravitons can move freely in the ’bulk’, the full higher dimensional spacetime [1].

The scenario where our Universe moves through a five-dimensional Anti de Sitter (AdS) spacetime has been especially successful in reproducing the observed four-dimensional behavior of gravity. It has been shown that at sufficiently low energies and large scales, not only does gravity on the brane look four dimensional [2], but cosmological expansion can also be reproduced [3]. We shall concentrate here on this example and comment on behavior which may survive in other warped braneworlds.

We consider the following situation: A fixed ’static brane’ is sitting in the bulk. The ’physical brane’, our Universe, is first moving away from the AdS Cauchy horizon, approaching the second brane. This motion corresponds to a contracting Universe. After a closest encounter the physical brane turns around and moves away from the static brane. This motion mimics the observed expanding Universe.
The moving brane acts as a time-dependent boundary for the 5D bulk, leading to production of gravitons from vacuum fluctuations in the same way a moving mirror causes photon creation from vacuum in dynamical cavities [4]. Apart from massless gravitons, braneworlds allow for a tower of Kaluza-Klein (KK) gravitons which appear as massive particles on the brane, possibly leading to phenomenological consequences.

We postulate that high energy stringy physics will lead to a turnaround of the brane motion, i.e., provoke a repulsion of the physical brane from the static one. This motion is modeled by a kink where the brane velocity changes sign. As we shall see, a perfect kink leads to divergent particle production due to its infinite acceleration. We therefore assume that the kink is rounded off at the string scale L_s. Then particles with energies E > E_s = 1/L_s are not generated. This setup represents a regular ’bouncing Universe’ as, for example, the ’ekpyrotic Universe’ [5]. Four-dimensional bouncing Universes have also been studied in Ref. [6].

∗Electronic address: ruth.durrer@physics.unige.ch
†Electronic address: marcus.ruser@physics.unige.ch

Moving brane in AdS5: Our starting point is the metric of AdS5 in Poincaré coordinates:

$ds^2 = g_{AB}\,dx^A dx^B = \frac{L^2}{y^2}\left[-dt^2 + \delta_{ij}\,dx^i dx^j + dy^2\right]$ . (1)

The physical brane (our Universe) is located at some time-dependent position y = y_b(t), while the static brane is at a fixed position y = y_s > y_b(t). The scale factor on the brane is

$a(\eta) = \frac{L}{y_b(t)}$ , $d\eta = \sqrt{1 - v^2}\,dt = \gamma^{-1}dt$ , $v = \frac{dy_b}{dt}$ ,

where we have introduced the brane velocity v and the conformal time η on the brane. If v ≪ 1, the junction conditions lead to the Friedmann equations on the brane. For reviews see [7, 8]. Defining the string and Planck scales by κ_5 ≡ L_s^3 and κ_4 ≡ L_Pl^2, the Randall-Sundrum (RS) fine tuning condition [2] implies

$\frac{\kappa_5}{\kappa_4} = \frac{L_s^3}{L_{Pl}^2} = L$ . (2)

We assume that the brane energy density is dominated by a radiation component.
The contracting (t < 0) and expanding (t > 0) phases are then described by

$a(t) = \frac{|t| + t_b}{L}$ , $y_b(t) = \frac{L^2}{|t| + t_b}$ , (3)

$v(t) = -\frac{\mathrm{sign}(t)\,L^2}{(|t| + t_b)^2} \simeq -HL$ , (4)

where H = (da/dη)/a^2 is the Hubble parameter and we have used that η ≃ t if v ≪ 1. A small velocity also requires y_b(t) ≪ L. The transition from contraction to expansion is approximated by a kink at t = 0, such that at the moment of the bounce

$|v(0)| \equiv v_b = \frac{L^2}{t_b^2}$ , $a_b = a(0) = \frac{1}{\sqrt{v_b}}$ , $H_b^2 = \frac{v_b^2}{L^2}$ . (5)

Tensor perturbations: We now consider tensor perturbations h_ij on this background,

$ds^2 = \frac{L^2}{y^2}\left[-dt^2 + (\delta_{ij} + 2h_{ij})\,dx^i dx^j + dy^2\right]$ . (6)

For each polarization, their amplitude h satisfies the Klein-Gordon equation in AdS5 [8]

$\left[\partial_t^2 + k^2 - \partial_y^2 + \frac{3}{y}\partial_y\right] h(t, y; \mathbf{k}) = 0$ , (7)

where k = |k| is the momentum parallel to the brane and h is subject to the boundary (2nd junction) conditions

$(v\partial_t + \partial_y)h|_{y_b(t)} = 0 \;\rightarrow\; \partial_y h|_{y_b(t)} = 0$ and $\partial_y h|_{y_s} = 0$ . (8)

Being interested in late-time (low energy) effects, we have approximated the first of those conditions by a Neumann condition (v ≪ 1). Then, the spatial part of Eq. (7) together with (8) forms a Sturm-Liouville problem at any given time and therefore has a complete orthonormal set of eigenfunctions $\{\phi_\alpha(t, y)\}_{\alpha=0}^{\infty}$. These ’instantaneous’ mode functions are given by

$\phi_0(t) = \frac{y_s\,y_b(t)}{\sqrt{y_s^2 - y_b^2(t)}}$ , (9)

$\phi_n(t, y) = N_n(t)\,y^2\,C_2(m_n(t), y_b(t), y)$ with $C_\nu(m, x, y) = Y_1(mx)J_\nu(my) - J_1(mx)Y_\nu(my)$ , (10)

and satisfy $[-\partial_y^2 + (3/y)\partial_y]\phi_\alpha(y) = m_\alpha^2\phi_\alpha(y)$ as well as (8). N_n is a time-dependent normalization factor. More details can be found in [9]. The massless mode φ_0 represents the ordinary four-dimensional graviton on the brane, while the massive modes are KK gravitons. Their masses are quantized by the boundary condition at the static brane, which requires C_1(m_n, y_b, y_s) = 0. At late times and for large n the KK masses are roughly given by m_n ≃ nπ/y_s.
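The brane kinematics of Eqs. (3)-(5) can be checked numerically. The sketch below assumes the radiation-dominated trajectory a(t) = (|t| + t_b)/L with y_b = L/a, sets L = 1 and t_b = 10 purely for illustration (these values are not from the text), and uses hypothetical function names. It compares a finite-difference derivative of y_b with the closed form for v(t), with the approximation v ≃ −HL, and with the bounce relation a_b = 1/√v_b.

```python
import math

L = 1.0      # AdS curvature scale in arbitrary units (illustrative value)
T_B = 10.0   # bounce parameter t_b; T_B >> L keeps the velocity small


def scale_factor(t):
    """a(t) = (|t| + t_b) / L for the radiation-dominated brane."""
    return (abs(t) + T_B) / L


def brane_position(t):
    """y_b(t) = L / a(t) = L^2 / (|t| + t_b)."""
    return L / scale_factor(t)


def brane_velocity(t):
    """Closed form v(t) = -sign(t) L^2 / (|t| + t_b)^2 (kink at t = 0)."""
    sign = math.copysign(1.0, t)
    return -sign * L ** 2 / (abs(t) + T_B) ** 2


def hubble(t, dt=1e-6):
    """H = (da/d eta) / a^2, using eta ~ t, valid for small velocities."""
    da = (scale_factor(t + dt) - scale_factor(t - dt)) / (2.0 * dt)
    return da / scale_factor(t) ** 2


def numerical_velocity(t, dt=1e-6):
    """Central finite difference of y_b(t), away from the kink."""
    return (brane_position(t + dt) - brane_position(t - dt)) / (2.0 * dt)


V_B = L ** 2 / T_B ** 2   # bounce velocity |v(0)|

if __name__ == "__main__":
    t = 5.0
    print("v (closed form):", brane_velocity(t))
    print("v (finite diff):", numerical_velocity(t))
    print("-H L           :", -hubble(t) * L)
    print("a(0), 1/sqrt(v_b):", scale_factor(0.0), 1.0 / math.sqrt(V_B))
```

Within the approximation η ≃ t the relation v = −HL holds exactly for this trajectory, so the three printed velocities agree up to finite-difference error.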
The gravity wave amplitude h may now be decomposed as [9] h(t, y;k) = qα,k(t)φα(t, y) (11) where the prefactor assures that the variables qα,k are canonically normalized. Their time evolution is deter- mined by the brane motion [cf. Eq. (14)]. Localization of gravity: From the above expressions and using L/yb(t) = a(t), we can determine the late- time behavior of the mode functions φα on the brane (yb ≪ L ≪ ys) φ0(t, yb) → , φn(t, yb) → . (12) At this point we can already make two crucial ob- servations: First, the mass mn is a comoving mass. The instantaneous energy of a KK graviton is ωn,k = k2 +m2n, where k denotes comoving wave number. The ’physical mass’ of a KK mode measured by an ob- server on the brane with cosmic time dτ = adt is therefore mn/a, i.e. the KK masses are redshifted with the expan- sion of the Universe. This comes from the fact that mn is the wave number corresponding to the y direction with respect to the bulk time t which corresponds to conformal time η on the brane and not to physical time. It implies that the energy of KK particles on a moving AdS brane is redshifted like that of massless particles. From this alone we would expect the energy density of KK modes on the brane decays like 1/a4. But this is not all. In contrast to the zero mode which behaves as φ0(t, yb) ∝ 1/a the KK-mode functions φn(t, yb) decay as 1/a 2 with the expansion of the Uni- verse and scale like 1/ ys. Consequently the amplitude of the KK modes on the brane dilutes rapidly with the expansion of the Universe and is in general smaller the larger ys. This can be understood by studying the prob- ability of finding a KK-graviton at position y in the bulk which turns out to be much larger in regions of less warp- ing than in the vicinity of the physical brane[9]. If KK gravitons are present on the brane, they escape rapidly into the bulk, i.e., the moving brane looses them, since their wave function is repulsed away from the brane. 
This causes the additional 1/a-dependence of φ_n(t, y_b) compared to φ_0(t, y_b). The 1/√y_s-dependence expresses the fact that the larger the bulk, the smaller the probability to find a KK-graviton at the position of the moving brane. This behavior reflects the localization of gravity: traces of the five-dimensional nature of gravity like KK gravitons become less and less ’visible’ on the brane as time evolves. As a consequence, the energy density of KK gravitons at late times on the brane behaves as

$\rho_{KK} \propto 1/a^6$ . (13)

It means that KK gravitons redshift like stiff matter and cannot be the dark matter in an AdS braneworld, since their energy density does not have the required 1/a^3 behavior. They also do not behave like dark radiation [7, 8], as one might naively expect. This new result is derived in detail in Ref. [9]. It is based on the calculation of $\langle\dot h^2(t, y_b, \mathbf{k})\rangle \propto \langle\dot q^2_{\alpha,\mathbf{k}}(t)\rangle\,\phi_\alpha^2(t, y_b)$, where the bracket incorporates a quantum expectation value with respect to a well-defined initial vacuum state and averaging over several oscillations of the field [9]. An overdot denotes the derivative with respect to t. The scaling behaviour (13) is due to $\phi_\alpha^2(t, y_b)$ only, since graviton production from vacuum fluctuations has ceased at late times (like in radiation domination), which is necessary for a meaningful particle definition. Then, $\langle\dot q^2_{\alpha,\mathbf{k}}(t)\rangle$ is related to the number of produced gravitons and is constant in time. In case amplification of tensor perturbations is still ongoing, e.g., during a de Sitter phase, the energy density related to the massive modes might scale differently. The scaling behavior (13) remains valid also when the fixed brane is sent off to infinity and we end up with a single braneworld in AdS5, like in the Randall-Sundrum II scenario [2]. The situation is not altered if we replace the graviton by a scalar or vector degree of freedom in the bulk.
Since every bulk degree of freedom must satisfy the five-dimensional Klein-Gordon equation, the mode functions will always be the functions φ_α, and the energy density of the KK-modes decays like 1/a^6. KK particles on a brane moving through an AdS bulk cannot play the role of dark matter. It is important here that we consider a static bulk and that the time dependence of the brane comes solely from its motion through the bulk. In Ref. [10] the situation of a fixed brane in a time-dependent bulk is discussed. There it is shown that under certain assumptions (separability of the y and t dependence of fluctuations), the energy density of KK modes on a low energy cosmological brane does scale like 1/a^3, which seems to be in contradiction with our result. However, the approximations used in [10] lead to a system of equations governing the expansion of the Universe but neglecting the time dependence of the bulk. The situation is then effectively four dimensional even for the KK modes; effects of the fifth dimension, like the possibility of KK gravitons escaping into the bulk, seem to be lost in this approach. In our case we would have a similar situation if we kept the expansion on the brane a(t) but took the position of the brane in the bulk as static, y_b(t) = const, which is not consistent with the general relation y_b(t) = L/a(t). [For a fixed physical mass M = m/a, if we neglect the time dependence of φ_n(y_b(t)) ∝ 1/a^2 we also obtain an energy density for this mass proportional to 1/a^3.]

Particle production: The equation of motion for the canonical variables q_{α,k} is of the form, see Ref. [9],

$\ddot q_{\alpha,\mathbf{k}} + \omega_{\alpha,k}^2\,q_{\alpha,\mathbf{k}} = \sum_{\beta\neq\alpha} M_{\alpha\beta}\,\dot q_{\beta,\mathbf{k}} + \sum_{\beta} N_{\alpha\beta}\,q_{\beta,\mathbf{k}}$ . (14)

Here $\omega_{\alpha,k} = \sqrt{k^2 + m_\alpha^2}$ is the frequency of the mode and M and N are coupling matrices.
When we quantize these variables, gravitons can be created by two effects: First, the time dependence of the effective frequency (ωeffα,k) ω2α,k−Nαα and second, the time dependence of the mode couplings described by the antisymmetric matrix M and the off-diagonal part of N . Note that Equation (14) is derived from the corre- sponding action for the variables qα,k rather than from the wave equation (7) itself. In this way the approxi- mated boundary conditions (8) can be implemented con- sistently [9, 11]. In the technical paper [9] we have studied graviton pro- duction provoked by a brane moving according to (3) in great detail numerically. We have found that for long wavelengths, kL ≪ 1, the zero mode is mainly generated by its self-coupling, i.e. the time dependence of its ef- fective frequency. One actually finds that N00 ∝ δ(t), so that there is an instability at the moment of the kink which leads to particle creation, and the number of 4D- gravitons is given by 2vb/(kL) 2. This is specific to ra- diation dominated expansion where H2a2 = −∂η(Ha). For another expansion law we would also obtain particle creation during the contraction and expansion phases. Light KK gravitons are produced mainly via their cou- pling to the zero mode. This behavior changes drastically for short wavelengths kL ≫ 1. Then the evolution of the zero mode couples strongly to the KK modes and pro- duction of 4D gravitons via the decay of KK modes takes place. In this case the number of produced 4D gravitons decays only like ∝ 1/(kL). Results and discussion: The numerical simulations have revealed a multitude of interesting effects. In the following we summarize the main findings. We refer the interested reader to Ref. [9] for an extensive discussion. For the zero-mode power spectrum we find on scales kL ≪ 1 on which we observe cosmological fluctuations (Mpc or larger) P0(k) = k2 if kt ≪ 1 (La)−2 if kt ≫ 1 . 
(15) The spectrum of tensor perturbations is blue on super- horizon scales as one would expect for an ekpyrotic sce- nario. On cosmic microwave background scales the am- plitude of perturbations is of the order of (H0/mPl) 2 and hence unobservably small. Calculating the energy density of the produced massless gravitons one obtains [9] ρh0 ≃ . (16) Comparing this with the radiation energy density, ρrad = (3/(κ4L 2))a−4, the RS fine-tuning condition leads to the simple relation ρh0/ρrad ≃ vb/2. (17) The nucleosynthesis bound [12] requests ρh0 <∼ 0.1ρrad, which implies vb ≤ 0.2, justifying our low energy ap- proach. The model is not severely constrained by the zero-mode. More stringent bounds come from the KK modes. Their energy density on the brane is found to be ρKK ≃ . (18) This result is dominated by high energy KK gravitons which are produced due to the kink. It is reasonable to require that the KK-energy density on the brane be (much) smaller than the radiation density at all times, and in particular, right after the bounce where ρKK is greatest. If this is not satisfied, back reaction cannot be neglected. We obtain with ρrad(0) = 3H b /κ4 a=a(0)=1/ ≃ 100 v3b . (19) If we use the largest value for the brane velocity vb ad- mitted by the nucleosynthesis bound vb ≃ 0.2 and re- quire that ρKK/ρrad be (much) smaller than one for back- reaction effects to be negligible, we obtain the very strin- gent condition . (20) Taking the largest allowed value for L ≃ 0.1mm, the RS fine-tuning condition Eq. (2) determines Ls = (LL 1/3 ≃ 10−22mm ≃ 1/(106TeV) and (L/Ls) 2 ≃ 1042 so that ys > L(L/Ls)2 ≃ 1041mm ∼ 1016Mpc. This is about 12 orders of magnitude larger than the present Hubble scale. Also, since yb(t) ≪ L in the low energy regime, and ys ≫ L according to the inequality (20), the physical brane and the static brane need to be far apart at all times otherwise back reaction is not negligible. This situation is probably not very realistic. 
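The orders of magnitude quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the relation L_s = (L L_Pl^2)^(1/3) and the bound y_s > L (L/L_s)^2 as they are quoted in the text; the Planck length of about 1.6 × 10^-32 mm and the conversion 1 Mpc ≈ 3.086 × 10^25 mm are inputs supplied here, not taken from the letter, and the function names are hypothetical.

```python
import math

MM_PER_MPC = 3.086e25   # assumed conversion: one megaparsec in millimetres


def string_length(L, L_pl):
    """String scale from the RS fine-tuning relation L_s^3 = L * L_Pl^2."""
    return (L * L_pl ** 2) ** (1.0 / 3.0)


def min_static_brane_position(L, L_s):
    """Lower bound y_s > L (L / L_s)^2 for negligible back reaction."""
    return L * (L / L_s) ** 2


if __name__ == "__main__":
    L = 0.1          # largest allowed AdS scale in mm, as quoted in the text
    L_pl = 1.6e-32   # Planck length in mm (assumed input)

    L_s = string_length(L, L_pl)
    print("L_s [mm]:", L_s)

    # The text rounds L_s to 1e-22 mm before forming (L / L_s)^2.
    for label, value in (("rounded", 1e-22), ("unrounded", L_s)):
        y_s = min_static_brane_position(L, value)
        print(label, "y_s bound [mm]:", y_s, "[Mpc]:", y_s / MM_PER_MPC)
```

With the unrounded L_s the bound comes out roughly an order of magnitude below the rounded estimate, which leaves the qualitative conclusion (a static brane far beyond the Hubble scale) unchanged.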
We need some high energy, stringy effects to provoke the bounce, and these may well be relevant only when the branes are sufficiently close, i.e. at a distance of order L_s. But in this case the constraint (20) will be violated, which implies that back reaction will be relevant. On the other hand, if we want y_s ≃ L and back reaction to be unimportant, then Eq. (19) implies that the bounce velocity has to be exceedingly small, v_b ≲ 10^-15. One might first hope to find a way out of these conclusions by allowing the bounce to happen in the high energy regime. But then v_b ≃ 1 and the nucleosynthesis bound is violated, since too many zero-mode gravitons are being produced. Clearly our low energy approach loses its justification if v_b ≃ 1, but it seems unlikely that modifications coming from the high energy regime alleviate the bounds.

Conclusions: Studying graviton production in an AdS braneworld we have found the following. First, the energy density of KK gravitons on the brane behaves as ∝ 1/a^6, i.e. it scales like stiff matter with the expansion of the Universe and can therefore not serve as a candidate for dark matter. Furthermore, if gravity looks four dimensional on the brane, its higher-dimensional aspects, like the KK modes, are repelled from the brane. Even if KK gravitons are produced on the brane, they rapidly escape into the bulk as time evolves, leaving no traces of the underlying higher-dimensional nature of gravity. This is likely to survive also in other warped braneworlds when expansion can be mimicked by brane motion. Secondly, a braneworld bouncing at low energies is not constrained by massless 4D gravitons and satisfies the nucleosynthesis bound as long as v_b ≲ 0.2. However, for interesting values of the string and AdS scales and the largest admitted bounce velocity, the back reaction of the KK modes is only negligible if the two branes are far apart from each other at all times, which seems rather unrealistic.
For a realistic bounce the back reaction from KK modes can most likely not be neglected. Even if the energy density of the KK gravitons on the brane dilutes rapidly after the bounce, the corresponding energy density in the bulk could even lead to important changes of the bulk geometry. The present model seems to be adequate to address the back reaction issue since the creation of KK gravitons happens exclusively at the bounce. This and the treatment of the high energy regime vb ≃ 1 is reserved for future work. We thank Kazuya Koyama for discussions. This work is supported by the Swiss National Science Foundation. [1] J. Polchinski, String theory. An introduction to the bosonic string, Vol. I, and String theory. Superstring the- ory and beyond, Vol. II , Cambridge University Press (1998). J. Polchinski, Phys. Rev. Lett. 75, 4724 (1995), hep- th/9910219. [2] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370 (1999), hep-th/9905221; 83, 4690, hep-th/9906064 [3] P. Binetruy, C. Deffayet, U. Ellwanger, and D. Langlois, Phys. Lett. B477, 285 (2000), hep-th/9910219. [4] M. Ruser, Phys. Rev. A73, 043811 (2006); J. Phys. A39, 6711 (2006), and references therein. [5] J. Khoury, P. Steinhardt and N. Turok, Phys. Rev. Lett. 92, 031302 (2004), hep-th/0307132; Phys. Rev. Lett. 91 161301 (2003), astro-ph/0302012. [6] R. Durrer and F. Vernizzi, Phys.Rev.D66 083503 (2002), hep-ph/0203275; C. Cartier, E. Copeland and R. Durrer, Phys. Rev. D67, 103517 (2003), hep-th/0301198. [7] R. Maartens, Living Rev. Rel. 7, 7 (2004), gr-qc/0312059. [8] R. Durrer, Braneworlds, at the XI Brazilian School of Cosmology and Gravitation, Edt. M. Novello and S.E. Perez Bergliaffa, AIP Conference Proceedings, 782 (2005), hep-th/0507006. [9] M. Ruser and R. Durrer, Phys. Rev. D 76, 104014 (2007), arXiv:0704.0790. [10] M. Minamitsuji, M. Sasaki and D. Langlois, Phys. Rev. D71, 084019 (2005). [11] C. Cartier, R. Durrer and M. Ruser, Phys. Rev. D72, 104018 (2005). [12] M. Maggiore. Phys. Rept. 
331, 283 (2000). ABSTRACT In braneworld cosmology the expanding Universe is realized as a brane moving through a warped higher-dimensional spacetime. Like a moving mirror causes the creation of photons out of vacuum fluctuations, a moving brane leads to graviton production. We show that, very generically, Kaluza-Klein (KK) particles scale like stiff matter with the expansion of the Universe and can therefore not represent the dark matter in a warped braneworld. We present results for the production of massless and KK gravitons for bouncing branes in five-dimensional anti de Sitter space. We find that for a realistic bounce the back reaction from the generated gravitons will be most likely relevant. This letter summarizes the main results and conclusions from numerical simulations which are presented in detail in a long paper [M.Ruser and R. Durrer, Phys. Rev. D 76, 104014 (2007), arXiv:0704.0790] <|endoftext|><|startoftext|> Bounds on Negativity of Superpositions Yong-Cheng Ou and Heng Fan Institute of Physics, Chinese Academy of Sciences, Beijing 100080, People’s Republic of China The entanglement quantified by negativity of pure bipartite superposed states is studied. We show that if the entanglement is quantified by the concurrence two pure states of high fidelity to one another still have nearly the same entanglement. Furthermore this conclusion can be guaranteed by our obtained inequality, and the concurrence is shown to be a continuous function even in infinite dimensions. The bounds on the negativity of superposed states in terms of those of the states being superposed are obtained. These bounds can find useful applications in estimating the amount of the entanglement of a given pure state. PACS numbers: 03.67.Mn, 03.65.Ta, 03.65.Ud Quantum entanglement plays an important role both in many aspects of quantum information theory[1] and in describing quantum phase transition in quantum many- body systems[2, 3]. 
As such, the characterization and quantification of quantum entanglement is a fundamental issue. Consequently, legitimate measures of entanglement are desirable as a first step. The best-known bipartite measure of entanglement with an elegant formula is the concurrence derived analytically by Wootters[4], and the entanglement of formation[5, 6] is a monotonically increasing function of the concurrence. In general, for a multipartite or higher-dimensional system, quantifying the entanglement is a formidable task since it requires a complicated convex-roof extension. In the last 10 years some important properties of quantum entanglement were found, one of which is the monogamy property described by the Coffman-Kundu-Wootters inequality in terms of concurrence [7]. In our previous work we have shown that the monogamy inequality cannot be generalized to higher-dimensional systems[8] and established a monogamy inequality in terms of negativity, giving a different residual entanglement[9]. On the other hand, quantum entanglement is a direct consequence of the superposition principle. It is an interesting physical phenomenon that the superposition of two separable states may give birth to an entangled state while, on the contrary, the superposition of two entangled states may give birth to a separable state. The relation between the entanglement of a state and the entanglement of the individual terms that by superposition yield the state has been studied, where the entanglement is quantified by the von Neumann entropy[10] and the concurrence[12]. Recently it was generalized to the superposition of more than two components[13]. If the entanglement is quantified by negativity, it would be interesting to establish the analogous relation and obtain the bound of entanglement for the superposition state. In this paper, we first show that, in contrast to the von Neumann entropy, the concurrence is a continuous function even in infinite dimensions.
We deduce an inequality to guarantee this property. Next we give the bounds on the negativity of the superposition state. The discussion and conclusion are presented at the end. The authors of [10] have shown that two states of high fidelity to one another may not have the same entanglement, i.e., $|\langle\psi|\phi\rangle|^2 \to 1$ may not generally result in $E(\psi) \to E(\phi)$, where $E$ is the von Neumann entropy. For a bipartite pure state $|\Phi\rangle_{AB}$ the von Neumann entropy is defined as $E(\Phi_{AB}) \equiv S({\rm Tr}_B|\Phi\rangle_{AB}\langle\Phi|) = S({\rm Tr}_A|\Phi\rangle_{AB}\langle\Phi|)$, (1) where $S(\rho) = -{\rm Tr}(\rho\log\rho)$, and the concurrence is defined as $C(\Phi_{AB}) \equiv \sqrt{2(1 - {\rm Tr}\rho_A^2)} = \sqrt{2(1 - \sum_i \mu_i^2)}$, (2) where $\rho_A = {\rm Tr}_B|\Phi\rangle_{AB}\langle\Phi|$ has the eigenvalues $\mu_i$. However, if we employ the concurrence to quantify the entanglement, $|\langle\psi|\phi\rangle|^2 \to 1$ must result in $C(\psi) \to C(\phi)$. Let us consider their example, letting $|\phi\rangle_{AB} = |00\rangle$, (3) $|\psi\rangle_{AB} = \sqrt{1-\epsilon}\,|\phi\rangle_{AB} + \sqrt{\epsilon/d}\,[\,|11\rangle + |22\rangle + \cdots + |dd\rangle\,]$. (4) Obviously $E(\phi_{AB}) = C(\phi_{AB}) = 0$, while according to [10] the von Neumann entropy of the state $|\psi\rangle_{AB}$ is $E(\psi_{AB}) \approx \epsilon\log_2 d \to \infty$, (5) when $d$ is taken as large as we like. It follows from Eq.(2) that the concurrence of the state $|\psi\rangle_{AB}$ gives $C^2(\psi_{AB}) = 2\left(2\epsilon - \epsilon^2 - \epsilon^2/d\right) \to 0$, (6) when $\epsilon$ is sufficiently small. In contrast to $E(\psi_{AB})$ in Eq.(5), $C^2(\psi_{AB})$ in Eq.(6) stays bounded by $4\epsilon$ for every $d$. Note that when $\epsilon$ is small the two states have high fidelity $|\langle\psi|\phi\rangle|^2 = 1 - \epsilon \to 1$. Comparing Eq.(5) to Eq.(6), we conclude that if the entanglement is quantified by the concurrence, two states of high fidelity to one another still have nearly the same entanglement. Indeed, the difference of the von Neumann entropy between two pure states of fixed dimension can be bounded using Fannes' inequality[11], while the von Neumann entropy is not a continuous function and no such bound applies in infinite dimensions.
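The contrast between Eqs.(5) and (6) is easy to check numerically. The sketch below is an illustration rather than part of the original argument: it builds the spectrum of $\rho_A$ for the state in Eq.(4) and evaluates the entropy of Eq.(1) (in bits) and the squared concurrence of Eq.(2). The function names and the chosen values of $\epsilon$ and $d$ are assumptions made for the example.

```python
import math


def schmidt_probabilities(eps, d):
    """Eigenvalues of rho_A for sqrt(1-eps)|00> + sqrt(eps/d) * sum_{k=1..d} |kk>."""
    return [1.0 - eps] + [eps / d] * d


def entropy_bits(probs):
    """Von Neumann entropy S = -sum p log2 p of a spectrum."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def concurrence_squared(probs):
    """C^2 = 2 (1 - Tr rho_A^2), following Eq.(2)."""
    return 2.0 * (1.0 - sum(p * p for p in probs))


eps = 1e-3  # fidelity with |00> is 1 - eps
for d in (10, 10**3, 10**6):
    probs = schmidt_probabilities(eps, d)
    print(f"d = {d:>7}: E = {entropy_bits(probs):.5f} bits, "
          f"C^2 = {concurrence_squared(probs):.6f}")
```

For fixed $\epsilon$ the entropy keeps growing like $\epsilon\log_2 d$ as $d$ increases, whereas $C^2$ never exceeds $4\epsilon$.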
However, as we will show here, a similar bound still works if the entanglement is quantified by the concurrence, and the concurrence is a continuous function even in infinite dimensions. To make this precise we present the following Theorem, which is similar to the original Fannes' inequality except that the entanglement is quantified by the concurrence. Theorem 1. Suppose $\rho_{AB}$ and $\sigma_{AB}$ are density matrices of two bipartite pure states in arbitrary dimensions. For the trace distance $T(\rho_A, \sigma_A) \equiv {\rm Tr}|\rho_A - \sigma_A|$ between $\rho_A = {\rm Tr}_B\rho_{AB}$ and $\sigma_A = {\rm Tr}_B\sigma_{AB}$ we have $|C^2(\rho_{AB}) - C^2(\sigma_{AB})| \le 4\,T(\rho_A, \sigma_A)$. (7) Proof. Let $r_1 \ge r_2 \ge \cdots \ge r_d$ be the eigenvalues of $\rho_A$, in decreasing order, and $s_1 \ge s_2 \ge \cdots \ge s_d$ be the eigenvalues of $\sigma_A$, also in decreasing order. According to [1], it follows that $\sum_i |r_i - s_i| \le T(\rho_A, \sigma_A)$. (8) From the definition of the concurrence in Eq.(2), we can rewrite the left-hand side of Eq.(7) as $|C^2(\rho_{AB}) - C^2(\sigma_{AB})| = 2\left|\sum_i (r_i^2 - s_i^2)\right| \le 2\sum_i |r_i^2 - s_i^2| = 2\sum_i |r_i + s_i|\,|r_i - s_i| \le 4\sum_i |r_i - s_i|$. (9) The first inequality follows from $|a + b + \cdots + k| \le |a| + |b| + \cdots + |k|$ for any complex quantities $a, b, \cdots, k$. In the last inequality we have used the fact that $r_i + s_i \le 2$, since no eigenvalue $r_i$ or $s_i$ is greater than one. Combining Eqs.(8) and (9) gives Eq.(7). Thus the proof is completed. From Theorem 1 it can be seen that the difference of the squared concurrences of two pure states is bounded by Eq.(7) in terms of the trace distance of their reduced states. Moreover, in contrast to the von Neumann entropy[10], the concurrence is a continuous function and such a bound still works in infinite dimensions. Note that whether a bound similar to Eq.(7) holds for the negativity is still open. In the following paragraphs we deduce bounds on the negativity of any bipartite pure state written as a superposition of two terms, $|\Gamma\rangle_{AB} = \alpha|\Psi\rangle + \beta|\Phi\rangle$.
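A quick numerical sanity check of Eq.(7) is sketched below. To stay within the standard library it is restricted to the special case in which $\rho_A$ and $\sigma_A$ are diagonal in the same basis, so that ${\rm Tr}|\rho_A - \sigma_A|$ reduces to $\sum_i |r_i - s_i|$; the general case would need an eigenvalue solver. The random spectra, seed and function names are assumptions made for this illustration.

```python
import random


def random_spectrum(dim, rng):
    """Random probability vector, used as the spectrum of a reduced state."""
    weights = [rng.random() for _ in range(dim)]
    total = sum(weights)
    return [w / total for w in weights]


def concurrence_squared(spectrum):
    """C^2 = 2 (1 - sum_i mu_i^2), following Eq.(2)."""
    return 2.0 * (1.0 - sum(mu * mu for mu in spectrum))


def trace_distance_commuting(r, s):
    """Tr|rho_A - sigma_A| when both states are diagonal in the same basis."""
    return sum(abs(a - b) for a, b in zip(r, s))


def max_violation(trials=1000, dim=8, seed=0):
    """Largest value of |C^2(rho) - C^2(sigma)| - 4 T over random pairs.

    Eq.(7) predicts that this never becomes positive.
    """
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        r = random_spectrum(dim, rng)
        s = random_spectrum(dim, rng)
        gap = abs(concurrence_squared(r) - concurrence_squared(s))
        worst = max(worst, gap - 4.0 * trace_distance_commuting(r, s))
    return worst


print("largest violation found:", max_violation())
```

A non-positive result is consistent with Theorem 1 but is of course no substitute for the proof.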
Before embarking on this study, we first recall some basic definitions concerning the negativity. For detecting entangled states in higher-dimensional Hilbert spaces, the Peres-Horodecki criterion based on the partial transpose[15, 16] is a convenient method. Given a density matrix $\rho$ of a bipartite pure state of systems A and B, the partial transpose with respect to subsystem A is given by $(\rho^{T_A})_{ij,kl} = (\rho)_{kj,il}$ and the negativity is defined as $\mathcal{N} = \frac{1}{2}\left(\|\rho^{T_A}\| - 1\right)$. (10) The trace norm $\|R\|$ is given by $\|R\| = {\rm Tr}\sqrt{RR^\dagger}$. Note that $\mathcal{N} > 0$ is the necessary and sufficient condition for a bipartite pure state to be entangled. There are two key ingredients for obtaining the bounds on the negativity of bipartite superposition pure states. One is that the negativity can be expressed by means of the Schmidt coefficients of a pure state. Suppose that a pure $m \otimes n$ ($m \le n$) quantum state has the standard Schmidt form $|\psi\rangle_{AB} = \sum_i \sqrt{\mu_i}\,|a_i b_i\rangle$, where $\sqrt{\mu_i}$ ($i = 1, \cdots, m$) are the Schmidt coefficients, and $a_i$ and $b_i$ are orthogonal bases in $H_A$ and $H_B$, respectively. For the pure bipartite state we can derive $\|\rho^{T_A}\| = \left(\sum_i \sqrt{\mu_i}\right)^2$ [18], and therefore Eq.(10) can be reexpressed as $\mathcal{N} = \frac{1}{2}\left[\left(\sum_i \sqrt{\mu_i}\right)^2 - 1\right]$. (11) For later use we transform Eq.(11) into $\left(\sum_i \sqrt{\mu_i}\right)^2 = 2\mathcal{N} + 1$. (12) The other is the Theorem[17] which states that for any two Hermitian matrices $H$ and $K$ in $\mathbb{C}^{n\times n}$, $\mu_i(H) + \mu_1(K) \le \mu_i(H + K) \le \mu_i(H) + \mu_n(K)$ (13) holds, where $\mu_i(\cdot)$ are the eigenvalues in increasing order. If $\mu_1(K) \ge 0$, from Eq.(13) it is easy to check that $\mu_i(H) \le \mu_i(H + K) \le \mu_i(H) + \mu_n(K)$ (14) holds as well. Eq.(14) will be used repeatedly in what follows. For the negativity of an arbitrary superposition state let us first consider the simplest case, in which the two bipartite states we are superposing, $\Phi_1$ and $\Psi_1$, are biorthogonal[10], i.e., $\Phi_1\Psi_1^\dagger = \Psi_1\Phi_1^\dagger = 0$[12]. Since the matrix representation of a reduced density matrix will be used, we explain the corresponding notation in the following.
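For the pure state $|\Phi\rangle_{AB}$ defined in $m \otimes n$ dimensions, generally it can be considered as a vector: $|\Phi\rangle_{AB} = [a_{00}, a_{01}, \cdots, a_{0m}, a_{10}, a_{11}, \cdots, a_{mn}]^T$ with the superscript $T$ denoting the transpose operation. In matrix notation, the reduced density matrix reads $\rho_A = \Phi\Phi^\dagger$, (15) whose eigenvalues are the $\mu_i$ appearing in Eq.(11). Theorem 2. Suppose that $\Phi_1$ and $\Psi_1$ are two biorthogonal pure states defined in $m \otimes n$ ($n \le m$) dimensions. The negativity of their superposition $\Gamma_1 = \alpha\Phi_1 + \beta\Psi_1$ with $|\alpha|^2 + |\beta|^2 = 1$ satisfies $\frac{2|\alpha|^2\mathcal{N}(\Phi_1) + 2|\beta|^2\mathcal{N}(\Psi_1) - 1}{4} \le \mathcal{N}(\alpha\Phi_1 + \beta\Psi_1) \le \frac{2|\alpha|^2\tilde{\mathcal{N}}(\Phi_1) + 2|\beta|^2\tilde{\mathcal{N}}(\Psi_1) - 1}{4}$, (16) where $\tilde{\mathcal{N}}(\Phi_1) = \mathcal{N}(\Phi_1) + \frac{n|\beta|\sqrt{\mu_n(\Psi_1)[2\mathcal{N}(\Phi_1) + 1]}}{|\alpha|} + \frac{n^2|\beta|^2\mu_n(\Psi_1)}{2|\alpha|^2}$, $\tilde{\mathcal{N}}(\Psi_1) = \mathcal{N}(\Psi_1) + \frac{n|\alpha|\sqrt{\mu_n(\Phi_1)[2\mathcal{N}(\Psi_1) + 1]}}{|\beta|} + \frac{n^2|\alpha|^2\mu_n(\Phi_1)}{2|\beta|^2}$. Proof. From Eq.(15) the reduced density matrix of the state $\Gamma_1$ reads $\Gamma_1\Gamma_1^\dagger = |\alpha|^2\Phi_1\Phi_1^\dagger + |\beta|^2\Psi_1\Psi_1^\dagger + \alpha\beta^*\Phi_1\Psi_1^\dagger + \alpha^*\beta\Psi_1\Phi_1^\dagger$. (17) The biorthogonality conditions $\Phi_1\Psi_1^\dagger = 0$ and $\Psi_1\Phi_1^\dagger = 0$ reduce Eq.(17) to $\Gamma_1\Gamma_1^\dagger = |\alpha|^2\Phi_1\Phi_1^\dagger + |\beta|^2\Psi_1\Psi_1^\dagger$. (18) Substituting Eq.(18) into the left inequality of Eq.(13) we have $|\alpha|^2\mu_i(\Phi_1\Phi_1^\dagger) + |\beta|^2\mu_1(\Psi_1\Psi_1^\dagger) \le \mu_i(\Gamma_1\Gamma_1^\dagger)$. (19) Since $\Psi_1\Psi_1^\dagger$ is positive semidefinite, $\mu_1(\Psi_1\Psi_1^\dagger) \ge 0$. Thus Eq.(19) becomes $|\alpha|^2\mu_i(\Phi_1\Phi_1^\dagger) \le \mu_i(\Gamma_1\Gamma_1^\dagger)$. (20) Taking the square root of both sides of Eq.(20) and summing over all indices $i$, we have $|\alpha|\sum_i\sqrt{\mu_i(\Phi_1\Phi_1^\dagger)} \le \sum_i\sqrt{\mu_i(\Gamma_1\Gamma_1^\dagger)}$. (21) In a similar way, substituting Eq.(18) into the right inequality of Eq.(14) and summing over all indices $i$, we have $\sum_i\sqrt{\mu_i(\Gamma_1\Gamma_1^\dagger)} \le |\alpha|\sum_i\sqrt{\mu_i(\Phi_1\Phi_1^\dagger)} + n|\beta|\sqrt{\mu_n(\Psi_1\Psi_1^\dagger)}$. (22) Squaring Eqs.(21) and (22) and using Eq.(12), we obtain $|\alpha|^2\mathcal{N}(\Phi_1) + \frac{|\alpha|^2 - 1}{2} \le \mathcal{N}(\alpha\Phi_1 + \beta\Psi_1) \le |\alpha|^2\tilde{\mathcal{N}}(\Phi_1) + \frac{|\alpha|^2 - 1}{2}$. (23) If we replace the matrix $|\alpha|^2\Phi_1\Phi_1^\dagger$ with $|\beta|^2\Psi_1\Psi_1^\dagger$ in Eqs.(20) and (21), i.e., equivalently exchange the matrices $H$ and $K$ in Eq.(14), we also obtain $|\beta|^2\mathcal{N}(\Psi_1) + \frac{|\beta|^2 - 1}{2} \le \mathcal{N}(\alpha\Phi_1 + \beta\Psi_1) \le |\beta|^2\tilde{\mathcal{N}}(\Psi_1) + \frac{|\beta|^2 - 1}{2}$. (24) Adding Eqs.(23) and (24) and using $|\alpha|^2 + |\beta|^2 = 1$ gives Eq.(16). Thus the proof is completed.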
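Note that the lower bound in Eq.(16) provides a nonzero value only when $2|\alpha|^2\mathcal{N}(\Phi_1) + 2|\beta|^2\mathcal{N}(\Psi_1) > 1$. Next we show an example to illustrate the validity of our bound. Consider the state $|\phi\rangle_{AB} = \alpha|\varphi\rangle_{AB} + \beta|\psi\rangle_{AB}$, (25) $|\varphi\rangle_{AB} = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$, (26) $|\psi\rangle_{AB} = \frac{1}{\sqrt{2}}|22\rangle + \frac{1}{\sqrt{2}}|33\rangle$, (27) where $\alpha = \beta = 1/\sqrt{2}$. It is easy to check that $|\varphi\rangle_{AB}$ and $|\psi\rangle_{AB}$ are biorthogonal, $\mathcal{N}(|\phi\rangle_{AB}) = 3/2$, $\mathcal{N}(|\varphi\rangle_{AB}) = \mathcal{N}(|\psi\rangle_{AB}) = 1/2$, and $\mu_4(|\varphi\rangle_{AB}) = \mu_4(|\psi\rangle_{AB}) = 1/2$. Accordingly, from Eq.(16) we obtain the lower and upper bounds $0 < \mathcal{N}(|\phi\rangle_{AB}) = \frac{3}{2} < 4$, (28) which are indeed satisfied. Finally we present the main Theorem of this paper, in which the two states being superposed can be biorthogonal, orthogonal, or nonorthogonal. Theorem 3. Suppose that $\Phi_2$ with rank $r_1$ and $\Psi_2$ with rank $r_2$ are two arbitrary normalized pure states defined in any dimensions. The negativity of their superposition $\Gamma_2 = \alpha\Phi_2 + \beta\Psi_2$ with rank $r_3$ and $|\alpha|^2 + |\beta|^2 = 1$ satisfies $2\|\alpha|\Phi_2\rangle + \beta|\Psi_2\rangle\|^2\,\mathcal{N}(\alpha\Phi_2 + \beta\Psi_2) \le 2|\alpha|^2\tilde{\mathcal{N}}(\Phi_2) + 2|\beta|^2\tilde{\mathcal{N}}(\Psi_2) - \|\alpha|\Phi_2\rangle + \beta|\Psi_2\rangle\|^2 + 1$, (29) where $\tilde{\mathcal{N}}(\Phi_2) = \mathcal{N}(\Phi_2) + \frac{r|\beta|\sqrt{\mu_n(\Psi_2)[2\mathcal{N}(\Phi_2) + 1]}}{|\alpha|} + \frac{r^2|\beta|^2\mu_n(\Psi_2)}{2|\alpha|^2}$, $\tilde{\mathcal{N}}(\Psi_2) = \mathcal{N}(\Psi_2) + \frac{r|\alpha|\sqrt{\mu_n(\Phi_2)[2\mathcal{N}(\Psi_2) + 1]}}{|\beta|} + \frac{r^2|\alpha|^2\mu_n(\Phi_2)}{2|\beta|^2}$, with $r = \max\{r_1, r_2, r_3\}$. Proof. Consider the matrix $M = |\alpha|^2\Phi_2\Phi_2^\dagger + |\beta|^2\Psi_2\Psi_2^\dagger$, (30) which can be rewritten as $M = \frac{\|\Gamma_2\|^2}{2}\hat{\Gamma}_2\hat{\Gamma}_2^\dagger + \frac{\|\Gamma_2^-\|^2}{2}\hat{\Gamma}_2^-(\hat{\Gamma}_2^-)^\dagger$, (31) where $\Gamma_2^- = \alpha\Phi_2 - \beta\Psi_2$, $\hat{\Gamma}_2 = \Gamma_2/\|\Gamma_2\|$, and $\hat{\Gamma}_2^- = \Gamma_2^-/\|\Gamma_2^-\|$. Thus Eq.(13) shows that $|\alpha|^2\mu_i(\Phi_2\Phi_2^\dagger) + |\beta|^2\mu_1(\Psi_2\Psi_2^\dagger) \le \mu_i(M) \le |\alpha|^2\mu_i(\Phi_2\Phi_2^\dagger) + |\beta|^2\mu_n(\Psi_2\Psi_2^\dagger)$, (32) $\frac{\|\Gamma_2\|^2}{2}\mu_i(\hat{\Gamma}_2\hat{\Gamma}_2^\dagger) + \frac{\|\Gamma_2^-\|^2}{2}\mu_1(\hat{\Gamma}_2^-(\hat{\Gamma}_2^-)^\dagger) \le \mu_i(M) \le \frac{\|\Gamma_2\|^2}{2}\mu_i(\hat{\Gamma}_2\hat{\Gamma}_2^\dagger) + \frac{\|\Gamma_2^-\|^2}{2}\mu_n(\hat{\Gamma}_2^-(\hat{\Gamma}_2^-)^\dagger)$. (33) Since $\mu_1(\Psi_2\Psi_2^\dagger) \ge 0$ and $\mu_1(\hat{\Gamma}_2^-(\hat{\Gamma}_2^-)^\dagger) \ge 0$, combining the left inequality of Eq.(33) with the right inequality of Eq.(32) we have $\frac{\|\Gamma_2\|}{\sqrt{2}}\sum_i\sqrt{\mu_i(\hat{\Gamma}_2\hat{\Gamma}_2^\dagger)} \le |\alpha|\sum_i\sqrt{\mu_i(\Phi_2\Phi_2^\dagger)} + r|\beta|\sqrt{\mu_n(\Psi_2\Psi_2^\dagger)}$. (34) Squaring Eq.(34) and using Eq.(12) we have $\|\alpha|\Phi_2\rangle + \beta|\Psi_2\rangle\|^2\,\mathcal{N}(\alpha\Phi_2 + \beta\Psi_2) \le 2|\alpha|^2\tilde{\mathcal{N}}(\Phi_2) - \frac{\|\alpha|\Phi_2\rangle + \beta|\Psi_2\rangle\|^2}{2} + |\alpha|^2$. (35) Likewise, if we exchange the two matrices $|\alpha|^2\Phi_2\Phi_2^\dagger$ and $|\beta|^2\Psi_2\Psi_2^\dagger$ in Eq.(32), we obtain $\|\alpha|\Phi_2\rangle + \beta|\Psi_2\rangle\|^2\,\mathcal{N}(\alpha\Phi_2 + \beta\Psi_2) \le 2|\beta|^2\tilde{\mathcal{N}}(\Psi_2) - \frac{\|\alpha|\Phi_2\rangle + \beta|\Psi_2\rangle\|^2}{2} + |\beta|^2$. (36) Adding Eqs.(35) and (36) gives Eq.(29). Thus the proof is completed.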
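The numbers in the example of Eqs.(25)-(28) can be reproduced directly from the Schmidt spectra. The sketch below evaluates Eq.(11) and the two sides of Eq.(16). It rests on an assumption that should be kept in mind: the auxiliary quantity $\tilde{\mathcal{N}}$ is coded as $\mathcal{N} + n|\beta|\sqrt{\mu_n[2\mathcal{N}+1]}/|\alpha| + n^2|\beta|^2\mu_n/(2|\alpha|^2)$ and the bounds are divided by 4, which is the reading of the formulas that reproduces the quoted values 0 and 4. All function names are hypothetical.

```python
import math


def negativity(spectrum):
    """Eq.(11): N = ((sum_i sqrt(mu_i))^2 - 1) / 2 for the spectrum of rho_A."""
    return (sum(math.sqrt(mu) for mu in spectrum) ** 2 - 1.0) / 2.0


def n_tilde(n_self, mu_max_other, amp_self, amp_other, n):
    """Auxiliary quantity of Theorem 2 (assumed reading, see lead-in)."""
    cross = n * amp_other * math.sqrt(mu_max_other * (2.0 * n_self + 1.0)) / amp_self
    square = n ** 2 * amp_other ** 2 * mu_max_other / (2.0 * amp_self ** 2)
    return n_self + cross + square


def theorem2_bounds(mu_phi, mu_psi, alpha, beta):
    """Lower and upper bound of Eq.(16) for biorthogonal states."""
    n = len(mu_phi)
    a, b = abs(alpha), abs(beta)
    n_phi, n_psi = negativity(mu_phi), negativity(mu_psi)
    nt_phi = n_tilde(n_phi, max(mu_psi), a, b, n)
    nt_psi = n_tilde(n_psi, max(mu_phi), b, a, n)
    lower = (2 * a**2 * n_phi + 2 * b**2 * n_psi - 1.0) / 4.0
    upper = (2 * a**2 * nt_phi + 2 * b**2 * nt_psi - 1.0) / 4.0
    return lower, upper


# Spectra of rho_A for the states of Eqs.(26) and (27), and for their
# equal-weight superposition (|00> + |11> + |22> + |33>) / 2.
mu_varphi = [0.5, 0.5, 0.0, 0.0]
mu_psi = [0.0, 0.0, 0.5, 0.5]
mu_superposed = [0.25, 0.25, 0.25, 0.25]

alpha = beta = 1.0 / math.sqrt(2.0)
lower, upper = theorem2_bounds(mu_varphi, mu_psi, alpha, beta)
print("N(varphi) =", negativity(mu_varphi))
print("N(phi)    =", negativity(mu_superposed))
print("bounds    =", lower, upper)
```

Up to floating-point rounding this gives $\mathcal{N}(\varphi) = 1/2$, $\mathcal{N}(\phi) = 3/2$ and the bounds 0 and 4 of Eq.(28).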
Since there is an extra term containing the maximal eigenvalue in the second inequality of Eq.(33), it is in general difficult to obtain a universal formula for the lower bound of the negativity in this case. We leave this for future work. In conclusion, we have shown that if the entanglement is quantified by the concurrence, two pure states of high fidelity to one another still have nearly the same entanglement, and we have obtained an inequality which guarantees that the concurrence is a continuous function even in infinite dimensions. However, whether a similar property holds for the negativity is still open. The bounds on the negativity of superposed states in terms of those of the states being superposed were obtained. So far, bounds for superposition states have been provided for several widely studied measures of entanglement: the von Neumann entropy[10], the concurrence[12] and, in this paper, the negativity. Given that the concurrence is directly accessible in laboratory experiments[19], these bounds can find useful applications in estimating the amount of entanglement of a given pure state. The author Y.C.O. was supported by the China Postdoctoral Science Foundation and the author H.F. was supported by the 'Bairen' program, an NSFC grant and the '973' program (2006CB921107). [1] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2000). [2] A. Osterloh, L. Amico, G. Falci, and R. Fazio, Nature (London) 416, 608 (2002). [3] L. A. Wu, M. S. Sarandy, and D. A. Lidar, Phys. Rev. Lett. 93, 250404 (2004). [4] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998). [5] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K. Wootters, Phys. Rev. A 54, 3824 (1996). [6] S. Hill and W. K. Wootters, Phys. Rev. Lett. 78, 5022 (1997). [7] V. Coffman, J. Kundu, and W. K. Wootters, Phys. Rev. A 61, 052306 (2000). [8] Y. C. Ou, Phys. Rev. A 75, 034305 (2007). [9] Y. C. Ou and H. Fan, quant-ph/0702127.
[10] N. Linden, S. Popescu, and J. A. Smolin, Phys. Rev. Lett. 97, 100502 (2006). [11] M. Ohya and D. Petz, Quantum Entropy and Its Use (Springer-Verlag, Berlin 1983); see also Ref.(1). [12] C. S. Yu, X. X. Yi, and H. S. Song, Phys. Rev. A 75, 022332 (2007). [13] Y. Xiang, S. J. Xiong, and F. Y. Hong, quant-ph/0701188. [14] G. Vidal and R. F. Werner, Phys. Rev. A 65, 032314 (2002). [15] A. Peres, Phys. Rev. Lett. 77, 1413 (1996). [16] M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Lett. A 223, 1 (1996). [17] R. A. Horn and C. R. Johnson, Matrix Analysis (Cambridge University Press, New York, 1985), see Theorem 4.3.1. [18] K. Chen, S. Albeverio, and S. M. Fei, Phys. Rev. Lett. 95, 040504 (2005). [19] S. P. Walborn, P. H. Souto Ribeiro, L. Davidovich, F. Mintert, and A. Buchleitner, Nature (London) 440, 1022 (2006). <|endoftext|><|startoftext|> Entangling Independent Photons by Time Measurement Matthäus Halder, Alexios Beveratos, Nicolas Gisin, Valerio Scarani, Christoph Simon & Hugo Zbinden Group of Applied Physics, University of Geneva, 20, rue de l'Ecole-de-Médecine, 1211 Geneva 4, Switzerland A quantum system composed of two or more subsystems can be in an entangled state, i.e.
a state in which the properties of the global system are well defined but the properties of each subsystem are not. Entanglement is at the heart of quantum physics, both for its conceptual foundations and for applications in information processing and quantum communication. Remarkably, entanglement can be “swapped”: if one prepares two independent entangled pairs A1-A2 and B1-B2, a joint measurement on A1 and B1 (called a “Bell-State Measurement”, BSM) has the effect of projecting A2 and B2 onto an entangled state, although these two particles have never interacted or shared any common past1,2. Experiments using twin photons produced by spontaneous parametric down-conversion (SPDC) have already demonstrated entanglement swapping3-6, but here we present its first realization using continuous wave (CW) sources, as originally proposed2. The challenge was to achieve sufficiently sharp synchronization of the photons in the BSM. Using narrow-band filters, the coherence time of the photons that undergo the BSM is significantly increased, exceeding the temporal resolution of the detectors. Hence pulsed sources can be replaced by CW sources, which do not require any synchronization6,7, allowing for the first time the use of completely autonomous sources. Our experiment exploits recent progress in the time precision of photon detectors, in the efficiency of photon pair production by SPDC with waveguides in nonlinear crystals8, and in the stability of narrow-band filters. This approach is independent of the form of entanglement; we employed time-bin entangled photons9 at telecom wavelengths. In addition to entangling photons from autonomous sources, a fundamental quantum phenomenon, our setup is robust against thermal or mechanical fluctuations in optical fibres thanks to cm-long coherence lengths. The present experiment is thus an important step towards real- world quantum networks with truly independent and distant nodes. 
The BSM is the essential element in an entanglement-swapping experiment. Linear optics allows the realization of only a partial BSM10 by coupling the two incoming modes on a beam-splitter (BS) and observing a suitable detection pattern in the outgoing modes. Such a measurement is successful in at most 50% of the cases. Still, a successful partial BSM entangles two photons that were, up to then, independent. The physics behind this realization is the bosonic character of photons, it is therefore crucial that the two incoming photons are indistinguishable: they must be identical in their spectral, spatial, polarization and temporal modes at the BS: Spectral overlap is achieved by the use of similar filters, spatial overlap by the use of single-mode optical fibres and polarization is matched by a polarization controller. In addition, the temporal resolution must be unambiguous: detection at a time t ± ∆td, with ∆td the temporal resolution of the detector, must single out a unique time mode. In previous experiments, synchronised pulsed sources created both the photons at the same time and path lengths had to be matched to obtain the required temporal overlap. The pulse length, i.e. the coherence length of the photons, was τc << ∆td (typically τc <1ps), but two subsequent pulses were separated by more than ∆td11. The drawback of such a realization is that the two sources cannot be totally autonomous, because of the indispensable synchronization. Here, by using stable narrow-band filters and detectors with low jitter, we reach the regime where τc > ∆td12. In this case, the detectors always single out a unique time mode. As a benefit, we can give up the pulsed character of the sources and the synchronization between them, realizing for the first time the entanglement swapping scheme as originally proposed in Ref.2. The experimental scheme is sketched in Fig.1. 
Each of the two non-linear crystals emits pairs of energy-time entangled photons13 produced by SPDC of a photon originating from a CW laser. A pair can be created at any time t, and all these processes are coherent within the km-long coherence length of the laser: $|\psi\rangle_A \propto \sum_t |t,t\rangle_A$, where $|t,t\rangle_A$ describes a pair of signal and idler photons emitted by source A at time t. Thus, the state produced by two independent sources can conveniently be represented as $|\Psi_{\rm prep}\rangle = |\psi\rangle_A|\psi\rangle_B \propto \sum_t |t,t\rangle_A|t,t\rangle_B + \sum_{t,\,\tau>0}\left(|t,t\rangle_A|t+\tau,t+\tau\rangle_B + |t+\tau,t+\tau\rangle_A|t,t\rangle_B\right)$. The first term in the above sum describes 4 photons all arriving at the same time t at a BS. Since for this case two identical photons bunch in the same mode, due to their bosonic nature, this term leads to a Hong-Ou-Mandel (HOM) dip14. The second term describes two photon pairs arriving with a time difference τ>0. The two photons A1 and B1 are sent through a 50/50 BS. This fibre-coupler and the two detectors behind it realize a partial BSM10: in particular, when one of the detectors fires at time t and the other one at time t+τ, this corresponds to a measurement of the Bell state $|\Psi^-\rangle$ for A1 and B1 (ref. 9). In consequence the remaining two photons A2 and B2 are projected onto the state $|\psi\rangle_{A_2B_2} \equiv |\Psi^-\rangle_{A_2B_2} \propto |t\rangle_{A_2}|t+\tau\rangle_{B_2} - |t+\tau\rangle_{A_2}|t\rangle_{B_2}$, which is a singlet state for time-bin entanglement. Hence entanglement has been swapped. This process can be seen as teleportation15-19 of entanglement. It can be tested by sending the photons into unbalanced interferometers such that the path difference between the two arms corresponds to τ. Interference between temporally distinguishable events (at t and t+τ, respectively) is obtained by erasing the time information via unbalanced interferometers9,12,20 as shown in Fig.1. Note that the value of τ varies from one successful entanglement swapping event to another. As in our experiment the path differences of the analysing interferometers are fixed, we test the entanglement of the swapped pairs produced with one fixed τ. We now describe our experiment in more detail.
Above we have assumed that the detection times t and t+τ of the BSM are sharply defined. In physical terms, this requirement means that the detection times have to be determined with sub-coherence-time precision: this is the key ingredient that makes it possible to achieve synchronisation of photons A1 and B1 by detection, and thus to use CW sources. Since single-photon detectors have a certain intrinsic minimal jitter, the coherence time of the photons has to be increased by narrow filtering to exceed this value. Consider the case where each of the two sources emits one entangled pair of photons, and where A1 and B1 take different exits of the BS. The photon that takes output port 1 is detected by a NbN superconducting single-photon detector (SSPD)21 with a time resolution ∆td = 74 ps. The photon in output port 2 is detected by an InGaAs single photon avalanche diode (APD, ∆td = 105 ps) triggered by the detection in the SSPD. The time resolution of these detectors is several times better than that of commercial telecommunication photon detection modules. To enable synchronization of the photons at the BS by post-selection, the coherence time of the photons has to exceed ∆td. This is achieved by using filters of 10 pm bandwidth, corresponding to a coherence time τc of 350 ps. We are able to tolerate the losses due to filtering because we use cm-long waveguides in PPLN crystals with a high down-conversion efficiency of $5\times10^{-7}$ per pump photon and per nm of the created spectrum. For 2 mW of laser power, an emission flux q of $2\times10^{-2}$ pairs per coherence time is obtained. This q is independent of the filtered bandwidth: in fact, narrower filtering decreases the number of photons per second but increases their coherence time by the same factor, hence keeping q constant. Any two-detector click in the BSM prepares the two remaining photons in a time-bin entangled state.
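The link between the 10 pm filter bandwidth, the 350 ps coherence time and the cm-scale coherence length can be illustrated with a short estimate. This is a sketch under assumptions that are not stated in the text: a Gaussian spectral profile (time-bandwidth product of about 0.44) and a fibre group index of about 1.47. The centre wavelength of 1560 nm is the one quoted in the Methods.

```python
C_VACUUM_M_PER_S = 299_792_458.0


def bandwidth_hz(center_wavelength_m, bandwidth_m):
    """Convert a spectral width in wavelength to a width in frequency."""
    return C_VACUUM_M_PER_S * bandwidth_m / center_wavelength_m ** 2


def coherence_time_s(center_wavelength_m, bandwidth_m, time_bandwidth_product=0.44):
    """Coherence time for an assumed Gaussian profile (product ~0.44)."""
    return time_bandwidth_product / bandwidth_hz(center_wavelength_m, bandwidth_m)


def coherence_length_m(coherence_time, group_index=1.47):
    """Coherence length inside a fibre with an assumed group index."""
    return C_VACUUM_M_PER_S * coherence_time / group_index


tau_c = coherence_time_s(1560e-9, 10e-12)   # 10 pm filter at 1560 nm
length_c = coherence_length_m(tau_c)

detector_resolution_s = {"SSPD": 74e-12, "APD": 105e-12}  # values from the text
print(f"coherence time   ~ {tau_c * 1e12:.0f} ps")
print(f"coherence length ~ {length_c * 100:.1f} cm")
for name, resolution in detector_resolution_s.items():
    print(f"tau_c / resolution ({name}) ~ {tau_c / resolution:.1f}")
```

Under these assumptions the estimate lands close to the quoted 350 ps and 7 cm, and the coherence time exceeds both detector resolutions by a factor of a few, which is the regime the experiment relies on.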
In our experiment the creation rate for such entangled photon pairs is ≈ $10^4$ per second, with time delays τ ranging up to 10 ns. This is two orders of magnitude larger than in previous experiments at shorter and similar wavelengths3-6. As the probability of both pairs originating from different sources equals the probability of creating them in the same source, the former cases have to be post-selected by considering only 4-fold events. Furthermore only one fixed τ is tested. The resulting rate is smaller by two orders of magnitude compared to the creation rate. To verify their entanglement, the two photons are sent through unbalanced interferometers (a and b) in Michelson configuration. The path length differences of the interferometers must be identical only within the coherence length of the analyzed photons (7 cm), but stable in phase (α and β): this is achieved by active stabilization22. On each side, both output ports of the interferometer are connected to InGaAs APDs, triggered by the detection of both photons in the BSM. Four-fold coincidences, between one click in each BSM detector and one behind each interferometer, are registered by a multistop time to digital converter (TDC) and the arrival times (t, t+τ) are stored in a table. For τ = 0, we observe a decrease in this coincidence count rate (see Fig.2). The visibility of this HOM dip of 77% indicates the degree of indistinguishability of the two photons A1 and B1 and could be further improved by increasing τc/∆td. The width of the dip corresponds to the convolution of τc for the two photons with the jitter of the detectors. Note that photons which are detected after the BS at measurably different times, but within τc, do still partially bunch, which confirms that the relevant time precision is set by the coherence time of the photons. To test for successful entanglement swapping, the relative phase α-β between the interferometers is changed by keeping α fixed and scanning β.
As usual for the analysis of time-bin entanglement9, interference is observable in the case where, at the output of the interferometers, both photons are detected at the same time. We measured the four possible 4-fold coincidence count rates $R_{ij}(\alpha,\beta)$ (clicks in two outer detectors conditioned on a successful BSM), with $i, j \in \{+, -\}$ labelling the different detectors behind interferometer a and b, respectively. Thus the two-photon spin-correlation coefficient $E(\alpha,\beta) = \frac{R_{++}(\alpha,\beta) - R_{+-}(\alpha,\beta) - R_{-+}(\alpha,\beta) + R_{--}(\alpha,\beta)}{R_{++}(\alpha,\beta) + R_{+-}(\alpha,\beta) + R_{-+}(\alpha,\beta) + R_{--}(\alpha,\beta)}$ is obtained as a function of the phase settings α and β and plotted in Fig.3 for α fixed. A fit of the form $E(\alpha,\beta) = V\cos(\alpha - \beta)$ to our experimental data gives a visibility V = 0.63. If one assumes that the two photons are in a Werner state (which corresponds to white noise), one can show that $V > 1/3$ is sufficient to demonstrate entanglement 5,23. Our experimental visibility clearly exceeds this bound24. The plain squares show that the 3-fold coincidence count rate between a successful BSM and only one of the outside detectors is independent of the phase setting, as expected for a $|\Psi^-\rangle$ state. V is limited by imperfections in the matching of wavelengths, polarisations and temporal synchronisation. In our setup, the latter is the main source of errors. The integration time of this measurement was 1 hour for each of the 13 phase settings and the experiment was run 8 times, hence took 104 hours, which demonstrates the stability of our setup. Such long integration times are necessary because of low count rates (5 four-fold coincidences per hour), which are mainly due to poor coupling efficiencies of the photons into optical fibres, losses in optical components like filters and interferometers, as well as the limited detector efficiencies. All these factors decrease the probability of detecting all four photons of a two-pair event.
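The data reduction described above (count rates, then correlation coefficient, then fitted visibility, then comparison with the Werner-state threshold) can be sketched in a few lines. This is not the authors' analysis code: the function names are hypothetical, the fit is a plain least-squares estimate of V in $E = V\cos(\alpha-\beta)$, and the data in the usage part are synthetic values generated from V = 0.63 for illustration only.

```python
import math


def correlation(r_pp, r_pm, r_mp, r_mm):
    """E = (R++ - R+- - R-+ + R--) / (R++ + R+- + R-+ + R--)."""
    total = r_pp + r_pm + r_mp + r_mm
    return (r_pp - r_pm - r_mp + r_mm) / total


def fit_visibility(phase_differences, correlations):
    """Least-squares estimate of V in the model E = V cos(alpha - beta)."""
    num = sum(e * math.cos(d) for d, e in zip(phase_differences, correlations))
    den = sum(math.cos(d) ** 2 for d in phase_differences)
    return num / den


def exceeds_werner_threshold(visibility):
    """For a Werner state, V > 1/3 is sufficient to demonstrate entanglement."""
    return visibility > 1.0 / 3.0


# Synthetic illustration: 13 phase settings, noiseless data with V = 0.63.
phases = [2.0 * math.pi * k / 12 for k in range(13)]
synthetic_e = [0.63 * math.cos(d) for d in phases]

v_fit = fit_visibility(phases, synthetic_e)
print("fitted visibility:", round(v_fit, 3))
print("above Werner threshold:", exceeds_werner_threshold(v_fit))
print("total integration time (h):", 13 * 8 * 1)
```

With real data each point would carry Poissonian uncertainties from the low four-fold count rate, so a weighted fit would be the more appropriate choice; the unweighted version is used here only to keep the sketch short.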
Exploiting all the produced entangled pairs with different delays τ is possible in principle using rapidly adjustable delays in the interferometers or quantum memories. This would be an important step towards the realization of recent proposals for long-distance quantum communication25. Time-bin entanglement is particularly stable and well suited for fibre optic communications26, and the coherence length of 7 cm allows tolerating significant fibre length fluctuations as expected in field experiments. If, additionally, count rates are further improved, long distance applications like quantum relays27,28 become realistic. In conclusion, we realized an entanglement swapping experiment with completely autonomous CW sources. This is possible thanks to the low jitter of new NbN superconducting and InGaAs avalanche single-photon detectors and to the long coherence length of the created photon pairs after narrow-band filtering. The setup does not require any synchronization between the sources and is highly stable against length fluctuations of the quantum channels. Methods Schematic description of the setup. Both sources consist of an external cavity diode laser in CW mode at 780.027 nm (Toptica DL100), stabilized against a Rubidium transition ($^{85}$Rb, F = 3), pumping a nonlinear periodically poled Lithium Niobate waveguide8 (PPLN, HC photonics Corp) at a power of 2 mW. The process of SPDC creates $4\times10^{11}$ pairs of photons per second with a spectral width of 80 nm FWHM centered at 1560 nm. The photons are emitted collinearly and coupled into a single-mode fibre with 25% efficiency, and the remaining laser light is blocked with a silicon high-pass filter (Si). Signal and idler photons are separated and filtered down to a bandwidth of 10 pm by custom-made tunable phase-shifted Bragg gratings (AOS GmbH). These filters have a rejection of >40 dB, 3 dB insertion losses, and can be tuned independently over a range of 400 pm.
Once a signal photon has been filtered to $\omega_s$, the corresponding idler photon has a well-defined frequency $\omega_i$, due to the stabilized pump wavelength and energy conservation in the process of SPDC ($\omega_s + \omega_i = \omega_{laser}$). After filtering, the effective conversion efficiency for creating a photon pair within these 10 pm is $5\times10^{-9}$ per pump photon. In principle, the available pump power permits us to produce narrow-band entangled photon pairs at rates up to $3\times10^{8}$ pairs per second, which translates to an emission flux of more than 0.1 photons per coherence time. In this experiment, we limited the laser to 2 mW in order to reduce the probability of multiple pair creation, which would decrease the interference visibility [29].

After the beam splitter (BS), the first photon is detected by a NbN superconducting single-photon detector (SSPD, Scontel) operated in free running mode [21], with a total detection efficiency of 4.5%, 300 dark counts/sec and a timing resolution of 74 ps, including the time jitter of both the detector and the amplification and discrimination electronics. The second photon is detected by an InGaAs single-photon avalanche diode operated in Geiger mode and actively triggered by the detection in the SSPD. With home-made electronics this detector has a time jitter of 105 ps. The observed HOM dip with a visibility of 77% was obtained with two SSPD detectors, which were used because of their smaller time jitter. For the entanglement swapping, we used an APD, because of its higher efficiency, in order to shorten the integration time. This means that the visibility of the interference fringe in Fig. 3 could be increased further by the use of two SSPDs, but with the drawback of longer measurement times. Photons A2 and B2 are also detected by InGaAs APDs (ID200, idQuantique). All the APDs have quantum efficiencies of 30% and dark count probabilities of $10^{-4}$ per ns.
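The numbers in the first paragraph of this description can be cross-checked with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, the signal wavelength of exactly 1560 nm is an arbitrary example, and the coherence time of 0.35 ns is an assumed value, not one given in this section.

```python
H = 6.62607015e-34   # Planck constant, J s
C = 299_792_458.0    # speed of light in vacuum, m/s

def pump_photon_rate(power_w, wavelength_m):
    """Number of pump photons per second for a CW beam."""
    return power_w * wavelength_m / (H * C)

def idler_wavelength(pump_wavelength_m, signal_wavelength_m):
    """Energy conservation w_s + w_i = w_laser, written for wavelengths."""
    return 1.0 / (1.0 / pump_wavelength_m - 1.0 / signal_wavelength_m)

# Pair rate inside the 10 pm filters at 2 mW pump power.
pump_rate = pump_photon_rate(2e-3, 780.027e-9)
filtered_pair_rate = pump_rate * 5e-9   # 5e-9 pairs per pump photon (from the text)

# Idler wavelength for a signal filtered to 1560 nm (example value).
lambda_idler = idler_wavelength(780.027e-9, 1560.0e-9)

# Photons per coherence time at the maximum quoted rate,
# with an assumed coherence time of 0.35 ns.
photons_per_coherence_time = 3e8 * 0.35e-9
```

At 2 mW this gives a filtered pair rate of a few times $10^7$ per second, below the $3\times10^{8}$ pairs per second quoted as achievable with the full pump power, and the last line reproduces the figure of roughly 0.1 photons per coherence time.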
The interferometers are actively stabilized against a laser locked on an atomic transition, have a path length difference of 1.2 ns and insertion losses of 4 dB each.

1. Yurke, B. & Stoler, D. Bell’s-inequality experiments using independent-particle sources. Phys. Rev. A 46, 2229 (1992).
2. Żukowski, M., Zeilinger, A., Horne, M. A. & Ekert, A. K. “Event-ready-detectors” Bell experiment via entanglement swapping. Phys. Rev. Lett. 71, 4287-4290 (1993).
3. Pan, J.-W., Bouwmeester, D., Weinfurter, H. & Zeilinger, A. Experimental Entanglement Swapping: Entangling Photons That Never Interacted. Phys. Rev. Lett. 80, 3891-3894 (1998).
4. Jennewein, T., Weihs, G., Pan, J.-W. & Zeilinger, A. Experimental Nonlocality Proof of Quantum Teleportation and Entanglement Swapping. Phys. Rev. Lett. 88, 017903 (2002).
5. de Riedmatten, H., Marcikic, I., Tittel, W., Zbinden, H. & Gisin, N. Long-distance entanglement swapping with photons from separated sources. Phys. Rev. A 71, 050302 (2005).
6. Yang, T. et al. Experimental Synchronization of Independent Entangled Photon Sources. Phys. Rev. Lett. 96, 110501 (2006).
7. Kaltenbaek, R., Blauensteiner, B., Żukowski, M., Aspelmeyer, M. & Zeilinger, A. Experimental interference of independent photons. Phys. Rev. Lett. 96, 240502 (2006).
8. Tanzilli, S. et al. PPLN waveguide for quantum communication. Eur. Phys. J. D 18, 155 (2002).
9. Brendel, J., Gisin, N., Tittel, W. & Zbinden, H. Pulsed Energy-Time Entangled Twin-Photon Source for Quantum Communication. Phys. Rev. Lett. 82, 2597 (1999).
10. Weinfurter, H. Experimental Bell-State Analysis. Europhys. Lett. 25, 559 (1994).
11. Bouwmeester, D. et al. Experimental quantum teleportation. Nature 390, 575 (1997).
12. Legero, T., Wilk, T., Henrich, M., Rempe, G. & Kuhn, A. Quantum Beat of Two Single Photons. Phys. Rev. Lett. 93, 070503 (2004).
13. Franson, J. D. Bell inequality for position and time. Phys. Rev. Lett. 62, 2205 (1989).
14. Hong, C. K., Ou, Z. Y. & Mandel, L.
Measurement of Subpicosecond Time Intervals between Two Photons by Interference. Phys. Rev. Lett. 59, 2044 (1987).
15. Furusawa, A. Unconditional Quantum Teleportation. Science 282, 706 (1998).
16. Boschi, D., Branca, S., De Martini, F., Hardy, L. & Popescu, S. Experimental Realization of Teleporting an Unknown Pure Quantum State via Dual Classical and Einstein-Podolsky-Rosen Channels. Phys. Rev. Lett. 80, 1121 (1998).
17. Marcikic, I., de Riedmatten, H., Tittel, W., Zbinden, H. & Gisin, N. Long-distance teleportation of qubits at telecommunication wavelengths. Nature 421, 509 (2003).
18. Riebe, M. et al. Deterministic quantum teleportation with atoms. Nature 429, 734 (2004).
19. Barrett, M. D. et al. Deterministic quantum teleportation of atomic qubits. Nature 429, 737 (2004).
20. Pittman, T. B. et al. Can Two-Photon Interference be Considered the Interference of Two Photons? Phys. Rev. Lett. 77, 1917 (1996).
21. Milostnaya, I. et al. Superconducting single-photon detectors designed for operation at 1.55-µm telecommunication wavelength. J. Phys. Conference Series 43, 1334 (2006).
22. Marcikic, I. et al. Distribution of Time-Bin Entangled Qubits over 50 km of Optical Fiber. Phys. Rev. Lett. 93, 180502 (2004).
23. Peres, A. Separability Criterion for Density Matrices. Phys. Rev. Lett. 77, 1413 (1996).
24. In fact, in our experimental situation entanglement is present even for values of V that are smaller than 0.33 because the noise is dominated by phase errors due to the partial distinguishability of the two photons involved in the BSM.
25. Duan, L.-M., Lukin, M. D., Cirac, J. I. & Zoller, P. Long-distance quantum communication with atomic ensembles and linear optics. Nature 414, 413 (2001).
26. Thew, R. T., Tanzilli, S., Tittel, W., Zbinden, H. & Gisin, N. Experimental investigation of the robustness of partially entangled qubits over 11 km. Phys. Rev. A 66, 062304 (2002).
27. Jacobs, B. C., Pittman, T. B. & Franson, J. D.
Quantum relays and noise suppression using linear optics. Phys. Rev. A 66, 052307 (2002).
28. Collins, D., Gisin, N. & de Riedmatten, H. Quantum Relay for Long Distance Quantum Cryptography. J. Mod. Opt. 522, 735 (2005).
29. Scarani, V., de Riedmatten, H., Marcikic, I., Zbinden, H. & Gisin, N. Four-photon correction in two-photon Bell experiments. Eur. Phys. J. D 32, 129 (2005).

Acknowledgements: We thank C. Barreiro, J.-D. Gautier, G. Gol’tsman, C. Jorel, S. Tanzilli and J. van Houwelingen for technical support, and H. de Riedmatten, S. Iblisdir and R. Thew for helpful discussions. Financial support by the EU projects QAP and SINPHONIA and by the Swiss NCCR Quantum Photonics is acknowledged.

Author Information: Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare that they have no competing financial interests. Correspondence and requests for materials should be addressed to M.H. (matthaeus.halder@physics.unige.ch).

Figure 1: Experimental Setup. Two pairs of entangled photons (A1-A2 and B1-B2) are produced, one by each source (A and B), and all the photons are narrowly filtered (10 pm). One photon of each pair is sent into a 50/50 beam splitter (BS) and both undergo a partial Bell-State measurement (BSM). By detection in different output ports of the BS and with a certain time delay τ, the two photons A1 and B1 are projected on the $|\Psi^-\rangle$ state for time-bin qubits, projecting the two remaining photons on the $|\Psi^-\rangle$ state as well. The entanglement is swapped onto the photons A2 and B2 and can be tested by passing them through interferometers with phases α and β, and detecting them by single-photon avalanche detectors (APD) in both outputs (+,-) of each interferometer.

Figure 2: 4-fold coincidence count rate as a function of the temporal delay τ.
It can be seen that the detection probability decreases if the two photons A1 and B1 arrive simultaneously (τ=0) at the beam splitter due to photon bunching, leading to a Hong-Ou-Mandel dip with 77% visibility.

Figure 3: The correlation coefficient E(α,β) between photons A2 and B2, conditioned on a BSM of photons A1 and B1, as a function of the relative phase α−β of the interferometers (open points). A fit of the form $E(\alpha,\beta) = V\cos(\alpha-\beta)$ gives a visibility V = 0.63. This proves successful entanglement swapping (see text). The coincidence count rate of only one detector conditioned on a successful BSM (3-fold coincidence) is independent of the phase setting, as expected for a $|\Psi^-\rangle$ state (squares).

ABSTRACT

A quantum system composed of two or more subsystems can be in an entangled state, i.e. a state in which the properties of the global system are well defined but the properties of each subsystem are not. Entanglement is at the heart of quantum physics, both for its conceptual foundations and for applications in information processing and quantum communication. Remarkably, entanglement can be "swapped": if one prepares two independent entangled pairs A1-A2 and B1-B2, a joint measurement on A1 and B1 (called a "Bell-State Measurement", BSM) has the effect of projecting A2 and B2 onto an entangled state, although these two particles have never interacted or shared any common past[1,2]. Experiments using twin photons produced by spontaneous parametric down-conversion (SPDC) have already demonstrated entanglement swapping[3-6], but here we present its first realization using continuous wave (CW) sources, as originally proposed[2]. The challenge was to achieve sufficiently sharp synchronization of the photons in the BSM. Using narrow-band filters, the coherence time of the photons that undergo the BSM is significantly increased, exceeding the temporal resolution of the detectors.
Hence pulsed sources can be replaced by CW sources, which do not require any synchronization[6,7], allowing for the first time the use of completely autonomous sources. Our experiment exploits recent progress in the time precision of photon detectors, in the efficiency of photon pair production by SPDC with waveguides in nonlinear crystals[8], and in the stability of narrow-band filters. This approach is independent of the form of entanglement; we employed time-bin entangled photons[9] at telecom wavelengths. Our setup is robust against thermal or mechanical fluctuations in optical fibres thanks to cm-long coherence lengths.
<|endoftext|><|startoftext|>
arXiv:0704.0759v1 [math.AP] 5 Apr 2007

ENERGY CONSERVATION AND ONSAGER’S CONJECTURE FOR THE EULER EQUATIONS

A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY

ABSTRACT. Onsager conjectured that weak solutions of the Euler equations for incompressible fluids in $\mathbb{R}^3$ conserve energy only if they have a certain minimal smoothness (of order of 1/3 fractional derivatives) and that they dissipate energy if they are rougher. In this paper we prove that energy is conserved for velocities in the function space $B^{1/3}_{3,c(\mathbb{N})}$. We show that this space is sharp in a natural sense. We phrase the energy spectrum in terms of the Littlewood-Paley decomposition and show that the energy flux is controlled by local interactions. This locality is shown to hold also for the helicity flux; moreover, every weak solution of the Euler equations that belongs to $B^{2/3}_{3,c(\mathbb{N})}$ conserves helicity. In contrast, in two dimensions, the strong locality of the enstrophy holds only in the ultraviolet range.

1. INTRODUCTION

The Euler equations for the motion of an incompressible inviscid fluid are
(1) $\dfrac{\partial u}{\partial t} + (u\cdot\nabla)u = -\nabla p$,
(2) $\nabla\cdot u = 0$,
where $u(x,t)$ denotes the d-dimensional velocity, $p(x,t)$ denotes the pressure, and $x \in \mathbb{R}^d$. We mainly consider the case d = 3.
When $u(x,t)$ is a classical solution, it follows directly that the total energy $E(t) = \frac{1}{2}\int |u|^2\,dx$ is conserved. However, conservation of energy may fail for weak solutions (see Scheffer [25], Shnirelman [24]). This possibility has given rise to a considerable body of literature and it is closely connected with statistical theories of turbulence envisioned 60 years ago by Kolmogorov and Onsager. For reviews see, for example, Eyink and Sreenivasan [14], Robert [23], and Frisch [15].

Date: April 5, 2007.
2000 Mathematics Subject Classification. Primary: 76B03; Secondary: 76F02.
Key words and phrases. Euler equations, anomalous dissipation, energy flux, Onsager conjecture, turbulence, Littlewood-Paley spectrum.

Onsager [22] conjectured that in 3-dimensional turbulent flows, energy dissipation might exist even in the limit of vanishing viscosity. He suggested that an appropriate mathematical description of turbulent flows (in the inviscid limit) might be given by weak solutions of the Euler equations that are not regular enough to conserve energy. According to this view, non-conservation of energy in a turbulent flow might occur not only from viscous dissipation, but also from lack of smoothness of the velocity. Specifically, Onsager conjectured that weak solutions of the Euler equation with Hölder continuity exponent h > 1/3 do conserve energy and that turbulent or anomalous dissipation occurs when h ≤ 1/3. Eyink [12] proved energy conservation under a stronger assumption. Subsequently, Constantin, E and Titi [7] proved energy conservation for u in the Besov space $B^{\alpha}_{3,\infty}$, α > 1/3. More recently the result was proved under a slightly weaker assumption by Duchon and Robert [11].

In this paper we sharpen the result of [7]: we prove that energy is conserved for velocities in the Besov space of tempered distributions $B^{1/3}_{3,p}$.
In fact we prove the result for velocities in the slightly larger space $B^{1/3}_{3,c(\mathbb{N})}$ (see Section 3). This is a space in which the “Hölder exponent” is exactly 1/3, but the slightly better regularity is encoded in the summability condition. The method of proof combines the approach of [7] in bounding the trilinear term in (3) with a suitable choice of the test function for weak solutions in terms of a Littlewood-Paley decomposition. Certain cancelations in the trilinear term become apparent using this decomposition. We observe that the space $B^{1/3}_{3,c(\mathbb{N})}$ is sharp in the context of no anomalous dissipation. We give an example of a divergence free vector field in $B^{1/3}_{3,\infty}$ for which the energy flux due to the trilinear term is bounded from below by a positive constant. This construction follows ideas in [12]. However, because it is not a solution of the unforced Euler equation, the example does not prove that there indeed exist unforced solutions to the Euler equation that live in $B^{1/3}_{3,\infty}$ and dissipate energy.

Experiments and numerical simulations indicate that for many turbulent flows the energy dissipation rate appears to remain positive at large Reynolds numbers. However, there are no known rigorous lower bounds for slightly viscous Navier-Stokes equations. The existence of a weak solution of Euler’s equation with positive smoothness that does not conserve energy remains an open question. For a discussion see, for example, Duchon and Robert [11], Eyink [12], Shnirelman [25], Scheffer [24], de Lellis and Szekelyhidi [10].

We note that the proof in Section 3 applied to Burgers’ equation for 1-dimensional compressible flow gives conservation of energy in $B^{1/3}_{3,c(\mathbb{N})}$. In this case it is easy to show that conservation of energy can fail in $B^{1/3}_{3,\infty}$, which is the sharp space for shocks.
The Littlewood-Paley approach to the issue of energy conservation versus turbulent dissipation is mirrored in a study of a discrete dyadic model for the forced Euler equations [4, 5]. By construction, all the interactions in that model system are local and energy cascades strictly to higher wave numbers. There is a unique fixed point which is an exponential global attractor. Onsager’s conjecture is confirmed for the model in both directions, i.e. solutions with bounded $H^{5/6}$ norm satisfy the energy balance condition and turbulent dissipation occurs for all solutions when the $H^{5/6}$ norm becomes unbounded, which happens in finite time. The absence of anomalous dissipation for inviscid shell models has been obtained in [8] in a space with regularity logarithmically higher than 1/3.

In Section 3.2 we present the definition of the energy flux employed in the paper. This is the flux of the Littlewood-Paley spectrum ([6]), which is a mathematically convenient variant of the physical concept of flux from the turbulence literature. Our estimates employing the Littlewood-Paley decomposition produce not only a sharpening of the conditions under which there is no anomalous dissipation, but also provide detailed information concerning the cascade of energy flux through frequency space. In Section 3.3 we prove that the energy flux through the sphere of radius κ is controlled primarily by scales of order κ. Thus we give a mathematical justification for the physical intuition underlying much of turbulence theory, namely that the flux is controlled by local interactions (see, for example, Kolmogorov [16] and also [13], where sufficient conditions for locality were described). Our analysis makes precise an exponential decay of nonlocal contributions to the flux that was conjectured by Kraichnan [17].

The energy is not the only scalar quantity that is conserved under evolution by classical solutions of the Euler equations.
For 3-dimensional flows the helicity is an important quantity related to the topological configurations of vortex tubes (see, for example, Moffatt and Tsinober [21]). The total helicity is conserved for smooth ideal flows. In Section 4 we observe that the techniques used in Section 3 carry over exactly to considerations of the helicity flux, i.e., there is locality for turbulent cascades of helicity and every weak solution of the Euler equation that belongs to $B^{2/3}_{3,c(\mathbb{N})}$ conserves helicity. This strengthens a recent result of Chae [2]. Once again our argument is sharp in the sense that a divergence free vector field in $B^{2/3}_{3,\infty}$ can be constructed to produce an example for which the helicity flux is bounded from below by a positive constant.

An important property of smooth flows of an ideal fluid in two dimensions is conservation of enstrophy (i.e. the $L^2$ norm of the curl of the velocity). In Section 4.2 we apply the techniques of Section 3 to the weak formulation of the Euler equations for velocity using a test function that permits estimation of the enstrophy. We obtain the result that, unlike the cases of the energy and the helicity, the locality in the enstrophy cascade is strong only in the ultraviolet range. In the infrared range there are nonlocal effects. Such ultraviolet locality was predicted by Kraichnan [18] and agrees with numerical and experimental evidence. Furthermore, there are arguments in the physical literature that hold that the enstrophy cascade is not local in the infrared range. We present a concrete example that exhibits this behavior.

In the final section of this paper, we study the bilinear term $B(u,v)$. We show that the trilinear map $(u,v,w) \mapsto \langle B(u,v), w\rangle$ defined for smooth vector fields in $L^3$ has a unique continuous extension to $\{B^{1/2}_{18/7,2}\}^3$ (and a fortiori to $\{H^{5/6}\}^3$, which is the relevant space for the dyadic model problem referred to above).
We present an example to show that this result is optimal. We stress that the borderline space for energy conservation is much rougher than the space of continuity for $\langle B(u,v), w\rangle$.

2. PRELIMINARIES

We will use the notation $\lambda_q = 2^q$ (in some inverse length units). Let $B(0,r)$ denote the ball centered at 0 of radius r in $\mathbb{R}^d$. We fix a nonnegative radial function χ belonging to $C_0^\infty(B(0,1))$ such that $\chi(\xi) = 1$ for $|\xi| \le 1/2$. We further define
(3) $\varphi(\xi) = \chi(\lambda_1^{-1}\xi) - \chi(\xi)$.
Then the following is true:
(4) $\chi(\xi) + \sum_{q\ge 0} \varphi(\lambda_q^{-1}\xi) = 1$,
(5) $|p - q| \ge 2 \Rightarrow \operatorname{Supp}\varphi(\lambda_q^{-1}\cdot) \cap \operatorname{Supp}\varphi(\lambda_p^{-1}\cdot) = \emptyset$.
We define a Littlewood-Paley decomposition. Let us denote by $\mathcal{F}$ the Fourier transform on $\mathbb{R}^d$. Let $h$, $\tilde h$, $\Delta_q$ ($q \ge -1$) be defined as follows:
$h = \mathcal{F}^{-1}\varphi$ and $\tilde h = \mathcal{F}^{-1}\chi$,
$\Delta_q u = \mathcal{F}^{-1}(\varphi(\lambda_q^{-1}\xi)\mathcal{F}u) = \lambda_q^d \int h(\lambda_q y)\,u(x-y)\,dy$, $q \ge 0$,
$\Delta_{-1} u = \mathcal{F}^{-1}(\chi(\xi)\mathcal{F}u) = \int \tilde h(y)\,u(x-y)\,dy$.
For $Q \in \mathbb{N}$ we define
(6) $S_Q = \sum_{q=-1}^{Q} \Delta_q$.
Due to (3) we have
(7) $S_Q u = \mathcal{F}^{-1}(\chi(\lambda_{Q+1}^{-1}\xi)\mathcal{F}u)$.
Let us now recall the definition of inhomogeneous Besov spaces.

Definition 2.1. Let s be a real number, p and r two real numbers greater than 1. Then
$\|u\|_{B^s_{p,r}} = \|\Delta_{-1}u\|_{L^p} + \big\| \{\lambda_q^s \|\Delta_q u\|_{L^p}\}_{q\in\mathbb{N}} \big\|_{\ell^r(\mathbb{N})}$
is the inhomogeneous Besov norm.

Definition 2.2. Let s be a real number, p and r two real numbers greater than 1. The inhomogeneous Besov space $B^s_{p,r}$ is the space of tempered distributions u such that the norm $\|u\|_{B^s_{p,r}}$ is finite.

We refer to [3] and [19] for background on harmonic analysis in the context of fluids. We will use the Bernstein inequalities.

Lemma 2.3. $\|\Delta_q u\|_{L^b} \le \lambda_q^{d(1/a - 1/b)} \|\Delta_q u\|_{L^a}$ for $b \ge a \ge 1$.

As a consequence we have the following inclusions.

Corollary 2.4. If $b \ge a \ge 1$, then we have the following continuous embeddings:
$B^s_{a,r} \subset B^{s - d(1/a - 1/b)}_{b,r}$, $B^0_{a,2} \subset L^a$.

In particular, the following chain of inclusions will be used throughout the text:
(8) $H^{5/6}(\mathbb{R}^3) \subset B^{2/3}_{9/4,2}(\mathbb{R}^3) \subset B^{1/2}_{18/7,2}(\mathbb{R}^3) \subset B^{1/3}_{3,2}(\mathbb{R}^3)$.

3. ENERGY FLUX AND LOCALITY

3.1. Weak solutions.

Definition 3.1.
A function u is a weak solution of the Euler equations with initial data $u_0 \in L^2(\mathbb{R}^d)$ if $u \in C_w([0,T]; L^2(\mathbb{R}^d))$ (the space of weakly continuous functions) and for every $\psi \in C^1([0,T]; S(\mathbb{R}^d))$, with $S(\mathbb{R}^d)$ the space of rapidly decaying functions, with $\nabla_x\cdot\psi = 0$ and $0 \le t \le T$, we have
(9) $(u(t),\psi(t)) - (u(0),\psi(0)) - \int_0^t (u(s),\partial_s\psi(s))\,ds = \int_0^t b(u,\psi,u)(s)\,ds$,
where
$(u,v) = \int_{\mathbb{R}^d} u\cdot v\,dx$, $b(u,v,w) = \int_{\mathbb{R}^d} u\cdot\nabla v\cdot w\,dx$,
and $\nabla_x\cdot u(t) = 0$ in the sense of distributions for every $t \in [0,T]$.

Clearly, (9) implies Lipschitz continuity of the maps $t \to (u(t),\psi)$ for fixed test functions. By an approximation argument one can show that for any weak solution u of the Euler equation, the relationship (9) holds for all ψ that are smooth and localized in space, but only weakly Lipschitz in time. This justifies the use of physical space mollifications of u as test functions ψ. Because we do not have an existence theory of weak solutions, this is a rather academic point.

3.2. Energy flux. For a divergence-free vector field $u \in L^2$ we introduce the Littlewood-Paley energy flux at wave number $\lambda_Q$ by
(10) $\Pi_Q = \int_{\mathbb{R}^3} \operatorname{Tr}[S_Q(u\otimes u)\cdot\nabla S_Q u]\,dx$.
If $u(t)$ is a weak solution to the Euler equation, then substituting the test function $\psi = S_Q^2 u$ into the weak formulation of the Euler equation (9) we obtain
(11) $\Pi_Q(t) = \dfrac{1}{2}\dfrac{d}{dt}\|S_Q u(t)\|_2^2$.
Let us introduce the following localization kernel:
(12) $K(q) = \lambda_q^{2/3}$ for $q \le 0$; $K(q) = \lambda_q^{-4/3}$ for $q > 0$.
For a tempered distribution u in $\mathbb{R}^3$ we denote
(13) $d_q = \lambda_q^{1/3}\|\Delta_q u\|_3$,
(14) $d^2 = \{d_q^2\}_{q\ge -1}$.

Proposition 3.2. The energy flux of a divergence-free vector field $u \in L^2$ satisfies the following estimate:
(15) $|\Pi_Q| \le C\,(K * d^2)^{3/2}(Q)$.

From (15) we immediately obtain
(16) $\limsup_{Q\to\infty} |\Pi_Q| \le C \limsup_{Q\to\infty} d_Q^3$.
We define $B^{1/3}_{3,c(\mathbb{N})}$ to be the class of all tempered distributions u in $\mathbb{R}^3$ for which
(17) $\lim_{q\to\infty} \lambda_q^{1/3}\|\Delta_q u\|_3 = 0$,
and hence $d_q \to 0$.
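The mechanism behind (15) and (16) can be illustrated numerically: because the kernel K decays geometrically in both directions, the convolution $(K * d^2)(Q)$ is dominated by the values $d_q$ with q near Q, so it tends to zero whenever $d_q \to 0$. The following Python sketch uses a hypothetical sequence $d_q = 1/(q+3)$, chosen purely for illustration, and truncates the sum at a finite index.

```python
def lam(q):
    """Dyadic wave number lambda_q = 2^q."""
    return 2.0 ** q

def K(q):
    """Localization kernel (12)."""
    return lam(q) ** (2.0 / 3.0) if q <= 0 else lam(q) ** (-4.0 / 3.0)

def conv_K(d2, Q, q_min=-1):
    """(K * d^2)(Q) = sum over q >= -1 of K(Q - q) d_q^2,
    truncated to the finite range stored in d2 (d2[0] is d_{-1}^2)."""
    return sum(K(Q - q) * d2[q - q_min] for q in range(q_min, q_min + len(d2)))

# Hypothetical sequence with d_q -> 0 (an assumption, not from the paper).
q_max = 80
d2 = [(1.0 / (q + 3)) ** 2 for q in range(-1, q_max + 1)]

# Right-hand side of (15), up to the constant C, at three wave numbers.
flux_bounds = [conv_K(d2, Q) ** 1.5 for Q in (5, 20, 40)]
```

The entries of `flux_bounds` decrease as Q grows, which is the content of (16) for this particular sequence.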
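We endow $B^{1/3}_{3,c(\mathbb{N})}$ with the norm inherited from $B^{1/3}_{3,\infty}$. Notice that the Besov spaces $B^{1/3}_{3,p}$ for $1 \le p < \infty$, and in particular $B^{1/3}_{3,2}$, are included in $B^{1/3}_{3,c(\mathbb{N})}$.

As a consequence of (11) and (16) we obtain the following theorem.

Theorem 3.3. The total energy flux of any divergence-free vector field in the class $B^{1/3}_{3,c(\mathbb{N})} \cap L^2$ vanishes. In particular, every weak solution to the Euler equation that belongs to the class $L^3([0,T]; B^{1/3}_{3,c(\mathbb{N})}) \cap C_w([0,T]; L^2)$ conserves energy.

Proof of Proposition 3.2. In the argument below all the inequalities should be understood up to a constant multiple. Following [7] we write
(18) $S_Q(u\otimes u) = r_Q(u,u) - (u - S_Q u)\otimes(u - S_Q u) + S_Q u\otimes S_Q u$,
where
$r_Q(u,u) = \int_{\mathbb{R}^3} \tilde h_Q(y)\,(u(x-y) - u(x))\otimes(u(x-y) - u(x))\,dy$, $\tilde h_Q(y) = \lambda_{Q+1}^3\,\tilde h(\lambda_{Q+1}y)$.
After substituting (18) into (10) we find (the contribution of $S_Q u\otimes S_Q u$ vanishes because u is divergence free)
(19) $\Pi_Q = \int_{\mathbb{R}^3} \operatorname{Tr}[r_Q(u,u)\cdot\nabla S_Q u]\,dx$
(20) $\qquad - \int_{\mathbb{R}^3} \operatorname{Tr}[(u - S_Q u)\otimes(u - S_Q u)\cdot\nabla S_Q u]\,dx$.
We can estimate the term in (19) using the Hölder inequality by
$\|r_Q(u,u)\|_{3/2}\,\|\nabla S_Q u\|_3$,
whereas
(21) $\|r_Q(u,u)\|_{3/2} \le \int_{\mathbb{R}^3} |\tilde h_Q(y)|\,\|u(\cdot - y) - u(\cdot)\|_3^2\,dy$.
Let us now use Bernstein’s inequalities and Corollary 2.4 to estimate
(22) $\|u(\cdot - y) - u(\cdot)\|_3^2 \le \sum_{q\le Q} |y|^2\lambda_q^2\|\Delta_q u\|_3^2 + \sum_{q>Q} \|\Delta_q u\|_3^2$
(23) $\qquad = \lambda_Q^{4/3}|y|^2 \sum_{q\le Q} \lambda_{Q-q}^{-4/3} d_q^2 + \lambda_Q^{-2/3} \sum_{q>Q} \lambda_{Q-q}^{2/3} d_q^2$
(24) $\qquad \le (\lambda_Q^{4/3}|y|^2 + \lambda_Q^{-2/3})\,(K * d^2)(Q)$.
Collecting the obtained estimates we find
$\left| \int_{\mathbb{R}^3} \operatorname{Tr}[r_Q(u,u)\cdot\nabla S_Q u]\,dx \right| \le (K * d^2)(Q) \left[ \int_{\mathbb{R}^3} |\tilde h_Q(y)|\,\lambda_Q^{4/3}|y|^2\,dy + \lambda_Q^{-2/3} \right] \left[ \sum_{q\le Q} \lambda_q^2\|\Delta_q u\|_3^2 \right]^{1/2}$
$\qquad \le (K * d^2)(Q)\,\lambda_Q^{-2/3} \left[ \sum_{q\le Q} \lambda_q^{4/3} d_q^2 \right]^{1/2}$
$\qquad \le (K * d^2)^{3/2}(Q)$.
Analogously we estimate the term in (20):
$\left| \int_{\mathbb{R}^3} \operatorname{Tr}[(u - S_Q u)\otimes(u - S_Q u)\cdot\nabla S_Q u]\,dx \right| \le \|u - S_Q u\|_3^2\,\|\nabla S_Q u\|_3$
$\qquad \le \left( \sum_{q>Q} \|\Delta_q u\|_3^2 \right) \left( \sum_{q\le Q} \lambda_q^2\|\Delta_q u\|_3^2 \right)^{1/2}$
$\qquad \le (K * d^2)^{3/2}(Q)$.
This finishes the proof.

3.3. Energy flux through dyadic shells. Let us introduce the energy flux through a sequence of dyadic shells between scales $-1 \le Q_0 < Q_1 < \infty$ as follows:
(25) $\Pi_{Q_0Q_1} = \int_{\mathbb{R}^3} \operatorname{Tr}[S_{Q_0Q_1}(u\otimes u)\cdot\nabla S_{Q_0Q_1}u]\,dx$,
where
(26) $S_{Q_0Q_1} = \sum_{Q_0\le q\le Q_1} \Delta_q = S_{Q_1} - S_{Q_0-1}$.
We will show that, similar to formula (15), the flux through dyadic shells is essentially controlled by scales near the inner and outer radii.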
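In fact it almost follows from (15) in view of the following decomposition:
(27) $S_{Q_0Q_1}^2 = (S_{Q_1} - S_{Q_0-1})^2$
$\qquad = S_{Q_1}^2 + S_{Q_0-1}^2 - 2S_{Q_0-1}S_{Q_1}$
$\qquad = S_{Q_1}^2 + S_{Q_0-1}^2 - 2S_{Q_0-1}$
$\qquad = S_{Q_1}^2 - S_{Q_0-1}^2 - 2S_{Q_0-1}(1 - S_{Q_0-1})$
$\qquad = S_{Q_1}^2 - S_{Q_0-1}^2 - 2\Delta_{Q_0-1}\Delta_{Q_0}$.
Therefore
(28) $\Pi_{Q_0Q_1} = \Pi_{Q_1} - \Pi_{Q_0-1} - 2\int_{\mathbb{R}^3} \operatorname{Tr}[\bar\Delta_{Q_0}(u\otimes u)\cdot\nabla\bar\Delta_{Q_0}u]\,dx$,
where
(29) $\bar\Delta_{Q_0}(u) = \int_{\mathbb{R}^3} \bar h_{Q_0}(y)\,u(x-y)\,dy$,
and
$\bar h_{Q_0}(x) = \mathcal{F}^{-1}\sqrt{\varphi(\lambda_{Q_0-1}^{-1}\xi)\,\varphi(\lambda_{Q_0}^{-1}\xi)}$.
Note that the flux through a sequence of dyadic shells is equal to the difference between the fluxes across the dyadic spheres on the boundary plus a small error term that can be easily estimated. Indeed, let us rewrite the tensor product term as follows:
(30) $\bar\Delta_{Q_0}(u\otimes u) = \bar r_{Q_0}(u,u) + \bar\Delta_{Q_0}u\otimes u + u\otimes\bar\Delta_{Q_0}u$,
where
$\bar r_Q(u,u) = \int_{\mathbb{R}^3} \bar h_Q(y)\,(u(x-y) - u(x))\otimes(u(x-y) - u(x))\,dy$.
Thus we have
$\int_{\mathbb{R}^3} \operatorname{Tr}[\bar\Delta_{Q_0}(u\otimes u)\cdot\nabla\bar\Delta_{Q_0}u]\,dx = \int_{\mathbb{R}^3} \operatorname{Tr}[\bar r_{Q_0}(u,u)\cdot\nabla\bar\Delta_{Q_0}u]\,dx - \int_{\mathbb{R}^3} \bar\Delta_{Q_0}u\cdot\nabla u\cdot\bar\Delta_{Q_0}u\,dx$.
We estimate the first integral as previously to obtain
$\left| \int_{\mathbb{R}^3} \operatorname{Tr}[\bar r_{Q_0}(u,u)\cdot\nabla\bar\Delta_{Q_0}u]\,dx \right| \le d_{Q_0}\,(K * d^2)(Q_0)$.
As to the second integral we have
$\left| \int_{\mathbb{R}^3} \bar\Delta_{Q_0}u\cdot\nabla u\cdot\bar\Delta_{Q_0}u\,dx \right| = \left| \int_{\mathbb{R}^3} \bar\Delta_{Q_0}u\cdot\nabla S_{Q_0}u\cdot\bar\Delta_{Q_0}u\,dx \right| \le d_{Q_0}^2\,(K * d^2)^{1/2}(Q_0)$.
Applying these estimates to the flux (28) we arrive at the following conclusion.

Theorem 3.4. The energy flux through dyadic shells between wavenumbers $\lambda_{Q_0}$ and $\lambda_{Q_1}$ is controlled primarily by the end-point scales. More precisely, the following estimate holds:
(33) $|\Pi_{Q_0Q_1}| \le C\,(K * d^2)^{3/2}(Q_0) + C\,(K * d^2)^{3/2}(Q_1)$.

3.4. Construction of a divergence free vector field with non-vanishing energy flux. In this section we give a construction of a divergence free vector field in $B^{1/3}_{3,\infty}(\mathbb{R}^3)$ for which the energy flux is bounded from below by a positive constant. This suggests the sharpness of $B^{1/3}_{3,c(\mathbb{N})}(\mathbb{R}^3)$ for energy conservation. Our construction is based on Eyink’s example on a torus [12], which we transform to $\mathbb{R}^3$ using a method described below.

Let $\chi_Q(\xi) = \chi(\lambda_{Q+1}^{-1}\xi)$. We define $P^\perp_\xi$ for vectors $\xi \in \mathbb{R}^3$, $\xi \ne 0$, by
$P^\perp_\xi v = v - |\xi|^{-2}(v\cdot\xi)\,\xi = (I - |\xi|^{-2}(\xi\otimes\xi))\,v$
for $v \in \mathbb{C}^3$, and we use $v\cdot w = v_j w_j$ for $v, w \in \mathbb{C}^3$.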
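The projection $P^\perp_\xi$ just defined removes the component of v along ξ. A minimal Python sketch (the function names `dot` and `perp_projection` and the sample vectors are hypothetical, chosen only for illustration) checks the two properties used implicitly later: $P^\perp_\xi v$ is orthogonal to ξ, and applying the projection twice changes nothing. Following the convention in the text, the product $v\cdot w = v_j w_j$ is bilinear, without complex conjugation.

```python
def dot(a, b):
    """Bilinear product v . w = v_j w_j (no complex conjugation, as in the text)."""
    return sum(x * y for x, y in zip(a, b))

def perp_projection(xi, v):
    """P_xi^perp v = v - |xi|^{-2} (v . xi) xi for a real nonzero vector xi."""
    xi_sq = dot(xi, xi)
    if xi_sq == 0:
        raise ValueError("xi must be nonzero")
    coeff = dot(v, xi) / xi_sq
    return [vj - coeff * xj for vj, xj in zip(v, xi)]

# Sample data (assumptions for illustration only).
xi = (1.0, 2.0, -2.0)
v = (1 + 2j, -1j, 3.0)

pv = perp_projection(xi, v)
ppv = perp_projection(xi, pv)

orthogonality_defect = abs(dot(pv, xi))                       # should be ~0
idempotence_defect = max(abs(a - b) for a, b in zip(pv, ppv))  # should be ~0
```

Both defects vanish up to floating-point rounding, consistent with $P^\perp_\xi$ being the orthogonal projection onto the plane perpendicular to ξ.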
Lemma 3.5. Let Φk(x) be R3 – valued functions, such that Ik := |ξ||FΦk(ξ)| dξ <∞. ENERGY CONSERVATION 11 Let also Ψk(x) = P(e ik·xΦk(x)) where P is the Leray projector onto the space of divergence free vectors. Then (34) sup ∣∣Ψk(x)− eik·x(P⊥k Φk)(x) ∣∣ ≤ 1 |k| , (35) sup ∣∣(S2QΨk)(x)− χ2Q(k)Ψk(x) ∣∣ ≤ c (2π)3 where c is the the Lipschitz constant of χ(ξ)2. Proof. First, note that for any k, ξ ∈ R3 and v ∈ C3 we have (v · ξ)ξ |ξ|2 + (v · ξ)k ∣∣∣∣ ≤ |ξ| ξ + |v||ξ + k| |k| . In addition, it follows that (v · k)k |k|2 + (v · ξ)k ∣∣∣∣ = |(v · (k + ξ))k| ≤ |v||ξ + k||k| . Adding (36) and (37) we obtain |P⊥ξ v − P⊥k v| = (v · ξ)ξ |ξ|2 − (v · k)k (v · ξ)ξ |ξ|2 + (v · ξ)k ∣∣∣∣ + (v · k)k |k|2 + (v · ξ)k ≤ 2 |v||ξ + k||k| . Using this inequality we can now derive the following estimate: |Ψk(x)− eik·x(P⊥k Φk)(x)| = |F−1[P⊥ξ (FΦk)(ξ + k)− P⊥k (FΦk)(ξ + k)]| (2π)3 |ξ + k| |k| |(FΦk)(ξ + k)| dξ = |k|−1 1 |ξ||(FΦk(ξ))| dξ. 12 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY Finally, we have |(S2QΨk)(x)− χQ(k)2Ψk(x)| = |F−1[(χQ(ξ)2 − χQ(k)2)(FΨk)(ξ)]| (2π)3 c|ξ + k| |(FΦk)(ξ + k)| dξ = λ−1Q+1 (2π)3 |ξ||(FΦk)(ξ)| dξ, where c is the the Lipschitz constant of χ(ξ)2. This concludes the proof. � Example illustrating the sharpness of Theorem 3.3. Now we proceed to construct a divergence free vector field in B 3,∞(R 3) with non-vanishing energy flux. Let U(k) be a vector field U : Z3 → C3 as in Eyink’s example [12] with U(λq, 0, 0) = iλ q (0, 0,−1), U(−λq, 0, 0) = iλ−1/3q (0, 0, 1), U(0, λq, 0) = iλ q (1, 0, 1), U(0,−λq, 0) = iλ−1/3q (−1, 0,−1), U(λq, λq, 0) = iλ q (0, 0, 1), U(−λq,−λq, 0) = iλ−1/3q (0, 0,−1), U(λq,−λq, 0) = iλ−1/3q (1, 1,−1), U(−λq, λq, 0) = iλ−1/3q (−1,−1, 1), for all q ∈ N and zero otherwise. Denote ρ(x) = F−1χ(4ξ) and A =∫ ρ(x)3 dx. Since χ(ξ) is radial, ρ(x) is real. Moreover, ρ(x)3 dx = (2π)3 F(ρ2)Fρ dξ (2π)6 χ(4η)χ(4(ξ − η))χ(4ξ) dηdξ > 0. Now let u(x) = P U(k)eik·xρ(x). Note that u ∈ B1/33,∞(R3). 
Our goal is to estimate the flux ΠQ for the vector field u. Define Φk = |k|1/3U(k)ρ(x) and Ψk(x) = P(eik·xΦk(x)). Then clearly Φk(x) and Ψk(x) satisfy the conditions of Lemma 3.5, and we (41) u(x) = k∈Z3\{0} |k|−1/3Ψk(x). ENERGY CONSERVATION 13 Now note that Ψk1 · ∇S2QΨk2 = Ψk1 · S2QP[∇(eik·xΦk2)] = i(Ψk1 · k2)S2QΨk2 +Ψk1 · S2QP(eik2·x∇Φk2). In addition, the following equality holds by construction: (43) P⊥k Φk = Φk, ∀k ∈ Z3. Define the annulusAQ = Z 3∩B(0, λQ+2)\B(0, λQ−1). Thanks to Lemma 3.5, for any sequences k1(Q), k2(Q), k3(Q) ∈ AQ with k1 + k2 = k3, we have (Ψk1 · ∇S2QΨk2) ·Ψ∗k3 dx = i (Ψk1 · k2)S2QΨk2 ·Ψ∗k3 dx+O(λ (eik1·xΦk1 · k2)χQ(k2)2eik2·xΦk2 · e−ik3·xΦ∗k3 dx+O(λ = i(|k1||k2||k3|)1/3A(U(k1) · k2)χQ(k2)2U(k2) · U(k3)∗ +O(λ0Q). On the other hand, since the Fourier transform of Ψk is supported in B(k, 1/4), we have (Ψk1 · ∇S2QΨk2) ·Ψ∗k3 dx = 0, whenever k1 + k2 6= k3. In addition, due to locality of interactions in this example, (44) also holds if Aq \ {k1, k2, k3} 6= ∅ for all q ∈ N. Finally, (Ψk1 · ∇S2QΨk2) ·Ψ∗k3 dx+ (Ψk1 · ∇S2QΨk3) ·Ψ∗k2 dx = 0, whenever k2 /∈ AQ and k3 /∈ AQ. Hence, the flux for u can be written as (46) ΠQ = − k1,k2,k3∈AQ k1+k2+k3=0 (|k1||k2||k3|)−1/3 (Ψk1 · ∇S2QΨk2) ·Ψk3 dx. Since the number of nonzero terms in the above sum is independent of Q, we obtain (47) ΠQ = AΠ̃Q +O(λ where Π̃ is the flux for the vector field U , i.e., (48) Π̃Q := − k1,k2,k3∈AQ k1+k2+k3=0 i(U(k1) · k2)χQ(k2)2U(k2) · U(k3). 14 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY The flux Π̃Q has only the following non-zero terms (see [12] for details): |k2|=λQ |k3|= i(U1(−k2 − k3) · k2)U2(k2) · U3(k3)(χQ(k2)2 − χQ(k3)2) ≥ 4(χ(1/2)2 − χ(1/ 2)2), |k2|= |k3|=2λQ i(U1(−k2 − k3) · k2)U2(k2) · U3(k3)(χQ(k2)2 − χQ(k3)2) ≥ 4(χ(1/ 2)2 − χ(1)2). Hence Π̃Q ≥ 4(χ(1/2)2 − χ(1/ 2)2 + χ(1/ 2)2 − χ(1)2) = 4. This together with (47) implies that lim inf ΠQ ≥ 4A. 4. 
OTHER CONSERVATION LAWS

In this section we apply similar techniques to derive optimal results concerning the conservation of helicity in 3D and that of enstrophy in 2D for weak solutions of the Euler equation. In the case of the helicity flux we prove that simultaneous infrared and ultraviolet localization occurs, as for the energy flux. However, the enstrophy flux exhibits strong localization only in the ultraviolet region, and a partial localization in the infrared region. A possibility of such a type of localization was discussed in [18].

4.1. Helicity. For a divergence-free vector field $u \in H^{1/2}$ with vorticity $\omega = \nabla\times u \in H^{-1/2}$ we define the helicity and truncated helicity flux as follows:
(49) $H = \int_{\mathbb{R}^3} u\cdot\omega\,dx$,
(50) $H_Q = \int_{\mathbb{R}^3} \operatorname{Tr}[S_Q(u\otimes u)\cdot\nabla S_Q\omega + S_Q(u\wedge\omega)\cdot\nabla S_Q u]\,dx$,
where $u\wedge\omega = u\otimes\omega - \omega\otimes u$. Thus, if u were a solution to the Euler equation, then $H_Q$ would be the time derivative of the Littlewood-Paley helicity at frequency $\lambda_Q$,
$\int_{\mathbb{R}^3} S_Q u\cdot S_Q\omega\,dx$.
Let us denote
(51) $b_q = \lambda_q^{2/3}\|\Delta_q u\|_3$,
(52) $b^2 = \{b_q^2\}_{q=-1}^{\infty}$,
(53) $T(q) = \lambda_q^{4/3}$ for $q \le 0$; $T(q) = \lambda_q^{-2/3}$ for $q > 0$.

Proposition 4.1. The helicity flux of a divergence-free vector field $u \in H^{1/2}$ satisfies the following estimate:
(54) $|H_Q| \le C\,(T * b^2)^{3/2}(Q)$.

Theorem 4.2. The total helicity flux of any divergence-free vector field in the class $B^{2/3}_{3,c(\mathbb{N})} \cap H^{1/2}$ vanishes, i.e.
(55) $\lim_{Q\to\infty} H_Q = 0$.
Consequently, every weak solution to the Euler equation that belongs to the class $L^3([0,T]; B^{2/3}_{3,c(\mathbb{N})}) \cap L^\infty([0,T]; H^{1/2})$ conserves helicity.

Proposition 4.1 and Theorem 4.2 are proved by direct analogy with the proofs of Proposition 3.2 and Theorem 3.3.

Example illustrating the sharpness of Theorem 4.2. We can also construct an example of a vector field in $B^{2/3}_{3,\infty}(\mathbb{R}^3)$ for which the helicity flux is bounded from below by a positive constant. Indeed, let $U(k)$ be a vector field $U : \mathbb{Z}^3 \to \mathbb{C}^3$ with
$U(\pm\lambda_q, 0, 0) = \lambda_q^{-2/3}(0, 0, -1)$,
$U(0, \pm\lambda_q, 0) = \lambda_q^{-2/3}(1, 0, 1)$,
$U(\pm\lambda_q, \pm\lambda_q, 0) = \lambda_q^{-2/3}(0, 0, 1)$,
$U(\pm\lambda_q, \mp\lambda_q, 0) = \lambda_q^{-2/3}(1, 1, -1)$,
for all $q \in \mathbb{N}$ and zero otherwise.
Denote ρ(x) = F−1χ(4ξ),A = ρ(x)3 dx, and let (56) u(x) = P U(k)eik·xρ(x). Note that u ∈ B2/33,∞(R3). On the other hand, a computation similar to the one in Section 3.4 yields (57) lim inf |HQ| ≥ 4A. 16 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY 4.2. Enstrophy. We work with the case of a two dimensional fluid in this section. In order to obtain an expression for the enstrophy flux one can use the original weak formulation of the Euler equation for velocities (9) with the test function chosen to be (58) ψ = ∇⊥S2Qω. Let us denote by ΩQ the expression resulting on the right hand side of (9): (59) ΩQ = SQ(u⊗ u) · ∇∇⊥SQω Thus, d‖SQω‖22 = ΩQ. As before we write rQ(u, u) · ∇∇⊥SQω (u− SQu)⊗ (u− SQu) · ∇∇⊥SQω Let us denote cq = ‖∆qω‖3,(61) c2 = {c2q}∞q=−1,(62) W (q) = λ2q, q ≤ 0; λ−4q , q > 0, We have the following estimate (absolute constants are omitted) |ΩQ| ≤ ∣∣∣h̃Q(y) ∣∣∣ (‖∇SQu‖23|y|2 + ‖(I − SQ)u‖23)‖∇2SQω‖3dy + ‖(I − SQ)u‖23‖∇2SQω‖3 λ−2Q ‖SQω‖23 + λ−2q c λ−2q c ≤ ‖SQω‖23 λ−4Q−qc λ2Q−qc λ−4Q−qc ≤ ‖SQω‖23(W ∗ c2)1/2(Q) + (W ∗ c2)3/2(Q) Thus, we have proved the following proposition. ENERGY CONSERVATION 17 FIGURE 1. Construction of the vector field illustrating in- frared nonlocality. Proposition 4.3. The enstrophy flux of a divergence-free vector field satis- fies the following estimate up to multiplication by an absolute constant (64) |ΩQ| ≤ ‖SQω‖23(W ∗ c2)1/2(Q) + (W ∗ c2)3/2(Q). Consequently, every weak solution to the 2D Euler equation with ω ∈ L3([0, T ];L3) conserves enstrophy. Much stronger results concerning conservation of enstrophy are available for the Euler equations ([13], [20]) and for the long time zero-viscosity limit for damped and driven Navier-Stokes equations ([9]). Example illustrating infrared nonlocality. We conclude this section with a construction of a vector field for which the enstrophy cascade is nonlocal in the infrared range. 
Let θq = arcsin(λq−Q−2) and (65) U lq = (cos(θq),− sin(θq)), Uhq = (sin(θq), cos(θq)), klq = λq(sin(θq), cos(θq)), k λ2Q+2 − λ2q(cos(θq),− sin(θq)), see Fig. 4.2 for the case q = Q. Denote ρ(x) = δh̃(δx),A = ρ(x)3 dx =∫ h̃(x)3 dx. Note that A > 0 and is independent of δ. Now let (67) ulq(x) = P[U q sin(k q · x)ρ(x)], uhq (x) = P[Uhq sin(khq · x)ρ(x)]. (68) uq(x) = u q(x) + u q (x) 18 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY for q = 0, . . . , Q, and (69) uQ+1(x) = P[V sin(λQ+2x1)ρ(x)], where V = (0, 1). Now define (70) u(x) = uq(x). Our goal is to estimate the enstrophy flux for u. Since Fu is compactly supported, the expression (59) is equivalent to (71) ΩQ = (u · ∇)S2Qω · ω dx. It is easy to see that (72) ΩQ ≥ (uhq · ∇)S2Q(∇⊥ · ulq)(∇⊥ · uQ+1) dx. Using Lemma 3.5 we obtain ΩQ ≥ A |Uhq |λ2q|U lq|λQ+2|V |+O(δ) = λQ+2‖∆Q+2u‖3 λ2q‖∆qu‖23 +O(δ), which shows sharpness of (64) in the infrared range. 5. INEQUALITIES FOR THE NONLINEAR TERM We take d = 3 and consider u, v ∈ B 3,2 with ∇ · u = 0 and wish to examine the advective term (74) B(u, v) = P(u · ∇v) = ΛH(u⊗ v) where (75) [H(u⊗ v)]i = Rj(ujvi) +Ri(RkRl(ukvl)) and P is the Leray-Hodge projector, Λ = (−∆) 12 is the Zygmund operator and Rk = ∂kΛ −1 are Riesz transforms. Proposition 5.1. The bilinear advective term B(u, v) maps continuously the space B 3,2 × B 3,2 to the space B . More precisely, there exist ENERGY CONSERVATION 19 bilinear continuous maps C(u, v), I(u, v) so that B(u, v) = C(u, v) + I(u, v) and constants C such that, for all u, v ∈ B 3,2 with ∇ · u = 0, (76) ‖C(u, v)‖ ≤ C‖u‖ (77) ‖I(u, v)‖ ≤ C‖u‖ hold. If u, v, w ∈ B (78) |〈B(u, v), w〉| ≤ C‖u‖ holds. So the trilinear map (u, v, w) 7→ 〈B(u, v), w〉 defined for smooth vector fields in L3 has a unique continuous extension to and a fortiori to Proof. We use duality. 
We take w smooth (w ∈ B ) and take the scalar product 〈B(u, v), w〉 = B(u, v) · wdx We write, in the spirit of the paraproduct of Bony ([1]) (79) ∆q(B(u, v)) = Cq(u, v) + Iq(u, v) (80) Cq(u, v) = p≥q−2, |p−p′|≤2 ∆q(ΛH(∆pu,∆p′v)) Iq(u, v) = [∆qΛH(Sq+j−2u,∆q+jv) + ∆qΛH(Sq+j−2v,∆q+ju)] 20 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY We estimate the contribution coming from the Cq(u, v): |〈Cq(u, v), w〉| |q−q′|≤1 p≥q−2, |p−p′|≤2 3∆pu‖L3‖Λ 3∆p′v‖L3‖∆q′w‖L3 |p−p′|≤2 3∆pu‖L3‖Λ 3∆p′v‖L3 q≤p+2,|q−q′|≤1 3∆q′w‖L3 |p−p′|≤2 ‖Λ 13∆pu‖ L3‖Λ 3∆p′v‖L3  ‖w‖ ≤ C‖u‖ This shows that the bilinear map C(u, v) = q≥−1Cq(u, v) maps continu- ously (82) |〈C(u, v), w〉| ≤ C‖u‖ The terms Iq(u, v) contribute |〈Iq(u, v), w〉| |j|≤2, |q−q′|≤1 λq‖Sq+j−2u‖ ‖∆q+jv‖L3‖∆q′w‖ |j|≤2, |q−q′|≤1 λq‖Sq+j−2v‖ ‖∆q+ju‖L3‖∆q′w‖ ≤ C‖u‖ |j|≤2,|q−q′|≤1 q ‖∆q+jv‖L3λ q ‖∆q′w‖ +C‖v‖ |j|≤2,|q−q′|≤1 q ‖∆q+ju‖L3λ q ‖∆q′w‖ ≤ C‖u‖ Here we used the fact that ‖Squ‖ ≤ C‖u‖ ENERGY CONSERVATION 21 This last fact is proved easily: ‖Sq(u)‖ ∥∥∥∥∥∥ |∆ju|2 ∥∥∥∥∥∥ ‖∆ju‖2 ≤ C‖u‖ We used Minkowski’s inequality in L 4 in the penultimate inequality and Bernstein’s inequality in the last. This proves that I maps continuously 3,2 × B 3,2 to B The proof of (78) follows along the same lines. Because of Bernstein’s inequalities, the inequality (82) for the trilinear term 〈C(u, v), w〉 is stronger than (78). The estimate of I follows: |〈Iq(u, v), w〉| |j|≤2, |q−q′|≤1 λq‖Sq+j−2u‖ ‖∆q+jv‖ ‖∆q′w‖ |j|≤2, |q−q′|≤1 λq‖Sq+j−2v‖ ‖∆q+ju‖ ‖∆q′w‖ ≤ C‖u‖ |j|≤2,|q−q′|≤1 q ‖∆q+jv‖ q ‖∆q′w‖ +C‖v‖ |j|≤2,|q−q′|≤1 q ‖∆q+ju‖ q ‖∆q′w‖ + ‖v‖ This concludes the proof. ✷ The inequality (82) is not true for 〈B(u, v), w〉 and (78) is close to being optimal: Proposition 5.2. For any 0 ≤ s ≤ 1 , 1 < p < ∞, 2 < r ≤ ∞ there exist functions u, v, w ∈ Bsp,r and smooth, rapidly decaying functions un, vn, wn, such that limn→∞ un = u, limn→∞ vn = v, limn→∞wn = w hold in the norm of Bsp,r and such that 〈B(un, vn), wn〉 = ∞ Proof. 
We start the construction with a divergence-free, smooth function u such that Fu ∈ C∞0 (B(0, 14)) and u31dx > 0. We select a direction 22 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY e = (1, 0, 0) and set Φ = (0, u1, 0). Then (83) A := (u(x) · e) ∣∣P⊥e Φ(x) ∣∣2 dx > 0. Next we consider the sequence aq = so that (aq) ∈ ℓr(N) for r > 2, but not for r = 2, and the functions (84) vn = q aqP [sin(λqe · x)Φ(x)] (85) wn = q aqP [cos(λqe · x)Φ(x)] . Clearly, the limits v = limn→∞ vn and w = limn→∞wn exist in norm in every Bsp,r with 0 ≤ s ≤ 12 , 1 < p < ∞ and r > 2. Manifestly, by construction, u, vn and wn are divergence-free, and because their Fourier transforms are in C∞0 , they are rapidly decaying functions. Clearly also 〈B(u, vn), wn〉 = P(u · ∇vn)wndx = (u · ∇vn) · wndx. The terms corresponding to each q in u · ∇vn = (u(x) · e)aqλ q P [cos(λqe · x)Φ(x)] q u(x) · P [sin(λqe · x)∇Φ(x)] and in (85) have Fourier transforms supported B(λqe, ) ∪ B(−λq, 12) and respectively B(λqe, ) ∪ B(−λqe, 14). These are mutually disjoint sets for distinct q and, consequently, the terms corresponding to different indices q do not contribute to the integral (u · ∇vn) · wndx. The terms from the second sum in (86) form a convergent series. Therefore, using Lemma 3.5, we obtain (u · vn) · wn = (u(x) · e) {P [cos(λqe · x)Φ(x)]}2 dx+O(1) (u(x) · e) ∣∣P⊥e Φ(x) ∣∣2 dx+O(1) A+O(1), ENERGY CONSERVATION 23 which concludes the proof. � ACKNOWLEDGMENT The work of AC was partially supported by NSF PHY grant 0555324, the work of PC by NSF DMS grant 0504213, the work of SF by NSF DMS grant 0503768, and the work of RS by NSF DMS grant 0604050. REFERENCES [1] J-M Bony, Calcul symbolique et propagation des singularité pour leséquations aux dérivées partielles non linéaires, Ann. Ecole Norm. Sup. 14 (1981) 209–246. [2] D. Chae, Remarks on the helicity of the 3-D incompressible Euler equations, Comm. Math. Phys. 240 (2003), 501–507. 
[3] J-Y Chemin, Perfect Incompressible Fluids, Clarendon Press, Oxford Univ 1998.
[4] A. Cheskidov, S. Friedlander and N. Pavlović, An inviscid dyadic model of turbulence: the fixed point and Onsager’s conjecture, Journal of Mathematical Physics, to appear.
[5] A. Cheskidov, S. Friedlander and N. Pavlović, An inviscid dyadic model of turbulence: the global attractor, preprint.
[6] P. Constantin, The Littlewood-Paley spectrum in 2D turbulence, Theor. Comp. Fluid Dyn. 9 (1997), 183–189.
[7] P. Constantin, W. E, E. Titi, Onsager’s conjecture on the energy conservation for solutions of Euler’s equation, Commun. Math. Phys. 165 (1994), 207–209.
[8] P. Constantin, B. Levant, E. Titi, Regularity of inviscid shell models of turbulence, Physical Review E 75 1 (2007) 016305.
[9] P. Constantin, F. Ramos, Inviscid limit for damped and driven incompressible Navier-Stokes equations in R^2, Commun. Math. Phys., to appear (2007).
[10] C. De Lellis and L. Székelyhidi, The Euler equations as differential inclusion, preprint.
[11] J. Duchon and R. Robert, Inertial energy dissipation for weak solutions of incompressible Euler and Navier-Stokes equations, Nonlinearity 13 (2000), 249–255.
[12] G. L. Eyink, Energy dissipation without viscosity in ideal hydrodynamics. I. Fourier analysis and local energy transfer, Phys. D 78 (1994), 222–240.
[13] G. L. Eyink, Locality of turbulent cascades, Phys. D 207 (2005), 91–116.
[14] G. L. Eyink and K. R. Sreenivasan, Onsager and the theory of hydrodynamic turbulence, Rev. Mod. Phys. 78 (2006).
[15] U. Frisch, Turbulence. The legacy of A. N. Kolmogorov. Cambridge University Press, Cambridge, 1995.
[16] A. N. Kolmogorov, The local structure of turbulence in incompressible viscous fluids at very large Reynolds numbers, Dokl. Akad. Nauk. SSSR 30 (1941), 301–305.
[17] R. H. Kraichnan, The structure of isotropic turbulence at very high Reynolds numbers, J. Fluid Mech. 5 (1959), 497–543.
[18] R. H.
Kraichnan, Inertial ranges in two-dimensional turbulence, Phys. Fluids 10 (1967), 1417–1423.
[19] P-G Lemarié-Rieusset, Recent developments in the Navier-Stokes problem, Chapman and Hall/CRC, Boca Raton, 2002.
[20] M. Lopes Filho, A. Mazzucato, H. Nussenzveig-Lopes, Weak solutions, renormalized solutions and enstrophy defects in 2D turbulence, ARMA 179 (2006), 353–387.
[21] H. K. Moffatt and A. Tsinober, Helicity in laminar and turbulent flow, Ann. Rev. Fluid Mech. 24 (1992), 281–312.
[22] L. Onsager, Statistical Hydrodynamics, Nuovo Cimento (Supplemento) 6 (1949), 279–287.
[23] R. Robert, Statistical Hydrodynamics (Onsager revisited), Handbook of Mathematical Fluid Dynamics, vol 2 (2003), 1–55. Ed. Friedlander and Serre. Elsevier.
[24] V. Scheffer, An inviscid flow with compact support in space-time, J. Geom. Anal. 3(4) (1993), 343–401.
[25] A. Shnirelman, On the nonuniqueness of weak solution of the Euler equation, Comm. Pure Appl. Math. 50 (1997), 1261–1286.

(A. Cheskidov) DEPARTMENT OF MATHEMATICS, UNIVERSITY OF MICHIGAN, ANN ARBOR, MI 48109
E-mail address: acheskid@umich.edu

(P. Constantin) DEPARTMENT OF MATHEMATICS, UNIVERSITY OF CHICAGO, CHICAGO, IL 60637
E-mail address: const@cs.uchicago.edu

(S. Friedlander and R. Shvydkoy) DEPARTMENT OF MATHEMATICS, STAT. AND COMP. SCI., UNIVERSITY OF ILLINOIS, CHICAGO, IL 60607
E-mail address: susan@math.northwestern.edu
E-mail address: shvydkoy@math.uic.edu

ABSTRACT

Onsager conjectured that weak solutions of the Euler equations for incompressible fluids in 3D conserve energy only if they have a certain minimal smoothness (of the order of 1/3 fractional derivatives) and that they dissipate energy if they are rougher. In this paper we prove that energy is conserved for velocities in the function space $B^{1/3}_{3,c(\NN)}$. We show that this space is sharp in a natural sense.
We phrase the energy spectrum in terms of the Littlewood-Paley decomposition and show that the energy flux is controlled by local interactions. This locality is shown to hold also for the helicity flux; moreover, every weak solution of the Euler equations that belongs to $B^{2/3}_{3,c(\NN)}$ conserves helicity. In contrast, in two dimensions, the strong locality of the enstrophy holds only in the ultraviolet range. <|endoftext|><|startoftext|> Search for Heavy, Long-Lived Particles that Decay to Photons at CDF II A. Abulencia,24 J. Adelman,13 T. Affolder,10 T. Akimoto,55 M.G. Albrow,17 S. Amerio,43 D. Amidei,35 A. Anastassov,52 K. Anikeev,17 A. Annovi,19 J. Antos,14 M. Aoki,55 G. Apollinari,17 T. Arisawa,57 A. Artikov,15 W. Ashmanskas,17 A. Attal,3 A. Aurisano,42 F. Azfar,42 P. Azzi-Bacchetta,43 P. Azzurri,46 N. Bacchetta,43 W. Badgett,17 A. Barbaro-Galtieri,29 V.E. Barnes,48 B.A. Barnett,25 S. Baroiant,7 V. Bartsch,31 G. Bauer,33 P.-H. Beauchemin,34 F. Bedeschi,46 S. Behari,25 G. Bellettini,46 J. Bellinger,59 A. Belloni,33 D. Benjamin,16 A. Beretvas,17 J. Beringer,29 T. Berry,30 A. Bhatti,50 M. Binkley,17 D. Bisello,43 I. Bizjak,31 R.E. Blair,2 C. Blocker,6 B. Blumenfeld,25 A. Bocci,16 A. Bodek,49 V. Boisvert,49 G. Bolla,48 A. Bolshov,33 D. Bortoletto,48 J. Boudreau,47 A. Boveia,10 B. Brau,10 L. Brigliadori,5 C. Bromberg,36 E. Brubaker,13 J. Budagov,15 H.S. Budd,49 S. Budd,24 K. Burkett,17 G. Busetto,43 P. Bussey,21 A. Buzatu,34 K. L. Byrum,2 S. Cabreraq,16 M. Campanelli,20 M. Campbell,35 F. Canelli,17 A. Canepa,45 S. Carilloi,18 D. Carlsmith,59 R. Carosi,46 S. Carron,34 B. Casal,11 M. Casarsa,54 A. Castro,5 P. Catastini,46 D. Cauz,54 M. Cavalli-Sforza,3 A. Cerri,29 L. Cerritom,31 S.H. Chang,28 Y.C. Chen,1 M. Chertok,7 G. Chiarelli,46 G. Chlachidze,17 F. Chlebana,17 I. Cho,28 K. Cho,28 D. Chokheli,15 J.P. Chou,22 G. Choudalakis,33 S.H. Chuang,52 K. Chung,12 W.H. Chung,59 Y.S. Chung,49 M. Cilijak,46 C.I. Ciobanu,24 M.A. Ciocci,46 A. Clark,20 D. Clark,6 M. 
Coca,16 G. Compostella,43 M.E. Convery,50 J. Conway,7 B. Cooper,31 K. Copic,35 M. Cordelli,19 G. Cortiana,43 F. Crescioli,46 C. Cuenca Almenarq,7 J. Cuevasl,11 R. Culbertson,17 J.C. Cully,35 S. DaRonco,43 M. Datta,17 S. D’Auria,21 T. Davies,21 D. Dagenhart,17 P. de Barbaro,49 S. De Cecco,51 A. Deisher,29 G. De Lentdeckerc,49 G. De Lorenzo,3 M. Dell’Orso,46 F. Delli Paoli,43 L. Demortier,50 J. Deng,16 M. Deninno,5 D. De Pedis,51 P.F. Derwent,17 G.P. Di Giovanni,44 C. Dionisi,51 B. Di Ruzza,54 J.R. Dittmann,4 M. D’Onofrio,3 C. Dörr,26 S. Donati,46 P. Dong,8 J. Donini,43 T. Dorigo,43 S. Dube,52 J. Efron,39 R. Erbacher,7 D. Errede,24 S. Errede,24 R. Eusebi,17 H.C. Fang,29 S. Farrington,30 I. Fedorko,46 W.T. Fedorko,13 R.G. Feild,60 M. Feindt,26 J.P. Fernandez,32 R. Field,18 G. Flanagan,48 R. Forrest,7 S. Forrester,7 M. Franklin,22 J.C. Freeman,29 I. Furic,13 M. Gallinaro,50 J. Galyardt,12 J.E. Garcia,46 F. Garberson,10 A.F. Garfinkel,48 C. Gay,60 H. Gerberich,24 D. Gerdes,35 S. Giagu,51 P. Giannetti,46 K. Gibson,47 J.L. Gimmell,49 C. Ginsburg,17 N. Giokarisa,15 M. Giordani,54 P. Giromini,19 M. Giunta,46 G. Giurgiu,25 V. Glagolev,15 D. Glenzinski,17 M. Gold,37 N. Goldschmidt,18 J. Goldsteinb,42 A. Golossanov,17 G. Gomez,11 G. Gomez-Ceballos,33 M. Goncharov,53 O. González,32 I. Gorelov,37 A.T. Goshaw,16 K. Goulianos,50 A. Gresele,43 S. Grinstein,22 C. Grosso-Pilcher,13 R.C. Group,17 U. Grundler,24 J. Guimaraes da Costa,22 Z. Gunay-Unalan,36 C. Haber,29 K. Hahn,33 S.R. Hahn,17 E. Halkiadakis,52 A. Hamilton,20 B.-Y. Han,49 J.Y. Han,49 R. Handler,59 F. Happacher,19 K. Hara,55 D. Hare,52 M. Hare,56 S. Harper,42 R.F. Harr,58 R.M. Harris,17 M. Hartz,47 K. Hatakeyama,50 J. Hauser,8 C. Hays,42 M. Heck,26 A. Heijboer,45 B. Heinemann,29 J. Heinrich,45 C. Henderson,33 M. Herndon,59 J. Heuser,26 D. Hidas,16 C.S. Hillb,10 D. Hirschbuehl,26 A. Hocker,17 A. Holloway,22 S. Hou,1 M. Houlden,30 S.-C. Hsu,9 B.T. Huffman,42 R.E. Hughes,39 U. Husemann,60 J. Huston,36 J. Incandela,10 G. 
Introzzi,46 M. Iori,51 A. Ivanov,7 B. Iyutin,33 E. James,17 D. Jang,52 B. Jayatilaka,16 D. Jeans,51 E.J. Jeon,28 S. Jindariani,18 W. Johnson,7 M. Jones,48 K.K. Joo,28 S.Y. Jun,12 J.E. Jung,28 T.R. Junk,24 T. Kamon,53 P.E. Karchin,58 Y. Kato,41 Y. Kemp,26 R. Kephart,17 U. Kerzel,26 V. Khotilovich,53 B. Kilminster,39 D.H. Kim,28 H.S. Kim,28 J.E. Kim,28 M.J. Kim,17 S.B. Kim,28 S.H. Kim,55 Y.K. Kim,13 N. Kimura,55 L. Kirsch,6 S. Klimenko,18 M. Klute,33 B. Knuteson,33 B.R. Ko,16 K. Kondo,57 D.J. Kong,28 J. Konigsberg,18 A. Korytov,18 A.V. Kotwal,16 A.C. Kraan,45 J. Kraus,24 M. Kreps,26 J. Kroll,45 N. Krumnack,4 M. Kruse,16 V. Krutelyov,10 T. Kubo,55 S. E. Kuhlmann,2 T. Kuhr,26 N.P. Kulkarni,58 Y. Kusakabe,57 S. Kwang,13 A.T. Laasanen,48 S. Lai,34 S. Lami,46 S. Lammel,17 M. Lancaster,31 R.L. Lander,7 K. Lannon,39 A. Lath,52 G. Latino,46 I. Lazzizzera,43 T. LeCompte,2 E. Lee,53 J. Lee,49 J. Lee,28 Y.J. Lee,28 S.W. Leeo,53 R. Lefèvre,20 N. Leonardo,33 S. Leone,46 S. Levy,13 J.D. Lewis,17 C. Lin,60 C.S. Lin,17 M. Lindgren,17 E. Lipeles,9 A. Lister,7 D.O. Litvintsev,17 T. Liu,17 N.S. Lockyer,45 A. Loginov,60 M. Loreti,43 R.-S. Lu,1 D. Lucchesi,43 P. Lujan,29 P. Lukens,17 G. Lungu,18 L. Lyons,42 J. Lys,29 R. Lysak,14 E. Lytken,48 P. Mack,26 D. MacQueen,34 R. Madrak,17 K. Maeshima,17 K. Makhoul,33 T. Maki,23 P. Maksimovic,25 S. Malde,42 S. Malik,31 G. Manca,30 F. Margaroli,5 R. Marginean,17 C. Marino,26 C.P. Marino,24 A. Martin,60 M. Martin,25 V. Marting,21 M. Martínez,3 R. Martínez-Ballarín,32 T. Maruyama,55 P. Mastrandrea,51 T. Masubuchi,55 H. Matsunaga,55 M.E. Mattson,58 R. Mazini,34 P. Mazzanti,5 K.S. McFarland,49 P. McIntyre,53 R. McNultyf,30 A. Mehta,30 P. Mehtala,23 S. Menzemerh,11 A. Menzione,46 P. Merkel,48 C. Mesropian,50 A. Messina,36 T. Miao,17 N. Miladinovic,6 J. Miles,33 R. Miller,36 C. Mills,10 M. Milnik,26 A. Mitra,1 G. Mitselmakher,18 A. Miyamoto,27 S. Moed,20 N. Moggi,5 B. Mohr,8 C.S. Moon,28 R. Moore,17 M. Morello,46 P.
Movilla Fernandez,29 J. Mülmenstädt,29 A. Mukherjee,17 Th. Muller,26 R. Mumford,25 P. Murat,17 M. Mussini,5 J. Nachtman,17 A. Nagano,55 J. Naganoma,57 K. Nakamura,55 I. Nakano,40 A. Napier,56 V. Necula,16 C. Neu,45 M.S. Neubauer,9 J. Nielsenn,29 L. Nodulman,2 O. Norniella,3 E. Nurse,31 S.H. Oh,16 Y.D. Oh,28 I. Oksuzian,18 T. Okusawa,41 R. Oldeman,30 R. Orava,23 K. Osterberg,23 C. Pagliarone,46 E. Palencia,11 V. Papadimitriou,17 A. Papaikonomou,26 A.A. Paramonov,13 B. Parks,39 S. Pashapour,34 J. Patrick,17 G. Pauletta,54 M. Paulini,12 C. Paus,33 D.E. Pellett,7 A. Penzo,54 T.J. Phillips,16 G. Piacentino,46 J. Piedra,44 L. Pinera,18 K. Pitts,24 C. Plager,8 L. Pondrom,59 X. Portell,3 O. Poukhov,15 N. Pounder,42 F. Prakoshyn,15 A. Pronko,17 J. Proudfoot,2 F. Ptohose,19 G. Punzi,46 J. Pursley,25 J. Rademackerb,42 A. Rahaman,47 V. Ramakrishnan,59 N. Ranjan,48 I. Redondo,32 B. Reisert,17 V. Rekovic,37 P. Renton,42 M. Rescigno,51 S. Richter,26 F. Rimondi,5 L. Ristori,46 A. Robson,21 T. Rodrigo,11 E. Rogers,24 S. Rolli,56 R. Roser,17 M. Rossi,54 R. Rossin,10 P. Roy,34 A. Ruiz,11 J. Russ,12 V. Rusu,13 H. Saarikko,23 A. Safonov,53 W.K. Sakumoto,49 G. Salamanna,51 O. Saltó,3 L. Santi,54 S. Sarkar,51 L. Sartori,46 K. Sato,17 P. Savard,34 A. Savoy-Navarro,44 T. Scheidle,26 P. Schlabach,17 E.E. Schmidt,17 M.P. Schmidt,60 M. Schmitt,38 T. Schwarz,7 L. Scodellaro,11 A.L. Scott,10 A. Scribano,46 F. Scuri,46 A. Sedov,48 S. Seidel,37 Y. Seiya,41 A. Semenov,15 L. Sexton-Kennedy,17 A. Sfyrla,20 S.Z. Shalhout,58 M.D. Shapiro,29 T. Shears,30 P.F. Shepard,47 D. Sherman,22 M. Shimojimak,55 M. Shochet,13 Y. Shon,59 I. Shreyber,20 A. Sidoti,46 P. Sinervo,34 A. Sisakyan,15 A.J. Slaughter,17 J. Slaunwhite,39 K. Sliwa,56 J.R. Smith,7 F.D. Snider,17 R. Snihur,34 M. Soderberg,35 A. Soha,7 S. Somalwar,52 V. Sorin,36 J. Spalding,17 F. Spinella,46 T. Spreitzer,34 P. Squillacioti,46 M. Stanitzki,60 A. Staveris-Polykalas,46 R. St. Denis,21 B. Stelzer,8 O. Stelzer-Chilton,42 D. Stentz,38 J. 
Strologas,37 D. Stuart,10 J.S. Suh,28 A. Sukhanov,18 H. Sun,56 I. Suslov,15 T. Suzuki,55 A. Taffardp,24 R. Takashima,40 Y. Takeuchi,55 R. Tanaka,40 M. Tecchio,35 P.K. Teng,1 K. Terashi,50 J. Thomd,17 A.S. Thompson,21 E. Thomson,45 P. Tipton,60 V. Tiwari,12 S. Tkaczyk,17 D. Toback,53 S. Tokar,14 K. Tollefson,36 T. Tomura,55 D. Tonelli,46 S. Torre,19 D. Torretta,17 S. Tourneur,44 W. Trischuk,34 S. Tsuno,40 Y. Tu,45 N. Turini,46 F. Ukegawa,55 S. Uozumi,55 S. Vallecorsa,20 N. van Remortel,23 A. Varganov,35 E. Vataga,37 F. Vazquezi,18 G. Velev,17 G. Veramendi,24 V. Veszpremi,48 M. Vidal,32 R. Vidal,17 I. Vila,11 R. Vilar,11 T. Vine,31 I. Vollrath,34 I. Volobouevo,29 G. Volpi,46 F. Würthwein,9 P. Wagner,53 R.G. Wagner,2 R.L. Wagner,17 J. Wagner,26 W. Wagner,26 R. Wallny,8 S.M. Wang,1 A. Warburton,34 D. Waters,31 M. Weinberger,53 W.C. Wester III,17 B. Whitehouse,56 D. Whiteson,45 A.B. Wicklund,2 E. Wicklund,17 G. Williams,34 H.H. Williams,45 P. Wilson,17 B.L. Winer,39 P. Wittichd,17 S. Wolbers,17 C. Wolfe,13 T. Wright,35 X. Wu,20 S.M. Wynne,30 A. Yagil,9 K. Yamamoto,41 J. Yamaoka,52 T. Yamashita,40 C. Yang,60 U.K. Yangj,13 Y.C. Yang,28 W.M. Yao,29 G.P. Yeh,17 J. Yoh,17 K. Yorita,13 T. Yoshida,41 G.B. Yu,49 I. Yu,28 S.S. Yu,17 J.C. Yun,17 L. Zanello,51 A. Zanetti,54 I. Zaw,22 X. Zhang,24 J. Zhou,52 and S. 
Zucchelli5 (CDF Collaboration∗) 1Institute of Physics, Academia Sinica, Taipei, Taiwan 11529, Republic of China 2Argonne National Laboratory, Argonne, Illinois 60439 3Institut de Fisica d’Altes Energies, Universitat Autonoma de Barcelona, E-08193, Bellaterra (Barcelona), Spain 4Baylor University, Waco, Texas 76798 5Istituto Nazionale di Fisica Nucleare, University of Bologna, I-40127 Bologna, Italy 6Brandeis University, Waltham, Massachusetts 02254 7University of California, Davis, Davis, California 95616 8University of California, Los Angeles, Los Angeles, California 90024 9University of California, San Diego, La Jolla, California 92093 10University of California, Santa Barbara, Santa Barbara, California 93106 11Instituto de Fisica de Cantabria, CSIC-University of Cantabria, 39005 Santander, Spain 12Carnegie Mellon University, Pittsburgh, PA 15213 13Enrico Fermi Institute, University of Chicago, Chicago, Illinois 60637 14Comenius University, 842 48 Bratislava, Slovakia; Institute of Experimental Physics, 040 01 Kosice, Slovakia 15Joint Institute for Nuclear Research, RU-141980 Dubna, Russia 16Duke University, Durham, North Carolina 27708 17Fermi National Accelerator Laboratory, Batavia, Illinois 60510 18University of Florida, Gainesville, Florida 32611 19Laboratori Nazionali di Frascati, Istituto Nazionale di Fisica Nucleare, I-00044 Frascati, Italy 20University of Geneva, CH-1211 Geneva 4, Switzerland 21Glasgow University, Glasgow G12 8QQ, United Kingdom 22Harvard University, Cambridge, Massachusetts 02138 23Division of High Energy Physics, Department of Physics, University of Helsinki and Helsinki Institute of Physics, FIN-00014, Helsinki, Finland 24University of Illinois, Urbana, Illinois 61801 25The Johns Hopkins University, Baltimore, Maryland 21218 26Institut für Experimentelle Kernphysik, Universität Karlsruhe, 76128 Karlsruhe, Germany 27High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki 305, Japan 28Center for High Energy Physics: 
Kyungpook National University, Taegu 702-701, Korea; Seoul National University, Seoul 151-742, Korea; SungKyunKwan University, Suwon 440-746, Korea 29Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California 94720 30University of Liverpool, Liverpool L69 7ZE, United Kingdom 31University College London, London WC1E 6BT, United Kingdom 32Centro de Investigaciones Energeticas Medioambientales y Tecnologicas, E-28040 Madrid, Spain 33Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 34Institute of Particle Physics: McGill University, Montréal, Canada H3A 2T8; and University of Toronto, Toronto, Canada M5S 1A7 35University of Michigan, Ann Arbor, Michigan 48109 36Michigan State University, East Lansing, Michigan 48824 37University of New Mexico, Albuquerque, New Mexico 87131 38Northwestern University, Evanston, Illinois 60208 39The Ohio State University, Columbus, Ohio 43210 40Okayama University, Okayama 700-8530, Japan 41Osaka City University, Osaka 588, Japan 42University of Oxford, Oxford OX1 3RH, United Kingdom 43University of Padova, Istituto Nazionale di Fisica Nucleare, Sezione di Padova-Trento, I-35131 Padova, Italy 44LPNHE, Universite Pierre et Marie Curie/IN2P3-CNRS, UMR7585, Paris, F-75252 France 45University of Pennsylvania, Philadelphia, Pennsylvania 19104 46Istituto Nazionale di Fisica Nucleare Pisa, Universities of Pisa, Siena and Scuola Normale Superiore, I-56127 Pisa, Italy 47University of Pittsburgh, Pittsburgh, Pennsylvania 15260 48Purdue University, West Lafayette, Indiana 47907 49University of Rochester, Rochester, New York 14627 50The Rockefeller University, New York, New York 10021 51Istituto Nazionale di Fisica Nucleare, Sezione di Roma 1, University of Rome “La Sapienza,” I-00185 Roma, Italy 52Rutgers University, Piscataway, New Jersey 08855 53Texas A&M University, College Station, Texas 77843 54Istituto Nazionale di Fisica Nucleare, University of Trieste/ Udine, Italy 55University of Tsukuba, Tsukuba, Ibaraki 
305, Japan 56Tufts University, Medford, Massachusetts 02155 57Waseda University, Tokyo 169, Japan 58Wayne State University, Detroit, Michigan 48201 59University of Wisconsin, Madison, Wisconsin 53706 60Yale University, New Haven, Connecticut 06520
(Dated: November 3, 2018; Version 5.1)

We present the first search for heavy, long-lived particles that decay to photons at a hadron collider. We use a sample of γ+jet+missing transverse energy events in pp̄ collisions at 1.96 TeV taken with the CDF II detector. Candidate events are selected based on the arrival time of the photon at the detector. Using an integrated luminosity of 570 pb⁻¹ of collision data, we observe 2 events, consistent with the background estimate of 1.3±0.7 events. While our search strategy does not rely on model-specific dynamics, we set cross section limits in a supersymmetric model with χ̃₁⁰ → γG̃ and place the world-best 95% C.L. lower limit on the χ̃₁⁰ mass of 101 GeV/c² at τ(χ̃₁⁰) = 5 ns.

PACS numbers: 13.85.Rm, 12.60.Jv, 13.85.Qk, 14.80.Ly

∗With visitors from aUniversity of Athens, bUniversity of Bristol, cUniversity Libre de Bruxelles, dCornell University, eUniversity of Cyprus, fUniversity of Dublin, gUniversity of Edinburgh, hUniversity of Heidelberg, iUniversidad Iberoamericana, jUniversity of Manchester, kNagasaki Institute of Applied Science, lUniversity de Oviedo, mUniversity of London, Queen Mary College, nUniversity of California Santa Cruz, oTexas Tech University,

Searches for events with final state photons and missing transverse energy (E/T) [1] at collider experiments are sensitive to new physics from a wide variety of models [2] including gauge mediated supersymmetry breaking (GMSB) [3]. In these models the lightest neutralino (χ̃₁⁰) decays into a photon (γ) and a weakly interacting, stable gravitino (G̃) that gives rise to E/T by leaving the detector without depositing any energy.
The observation of an eeγγE/T candidate event by the CDF experiment during Run I at the Fermilab Tevatron [4] has increased the interest in experimental tests of this class of theories. Most subsequent searches have focused on promptly produced photons [5, 6]; however, the χ̃₁⁰ can have a lifetime on the order of nanoseconds or more.

This is the first search for heavy, long-lived particles that decay to photons at a hadron collider. We optimize our selection requirements using a GMSB model with a standard choice of parameters [7] and vary the values of the χ̃₁⁰ mass and lifetime. However, the final search strategy is chosen to be sufficiently general and independent of the specific GMSB model dynamics to yield results that are approximately valid for any model producing the same reconstructed final state topology and kinematics [8]. In pp̄ collisions at the Tevatron the inclusive GMSB production cross section is dominated by pair production of gauginos. The gauginos decay promptly, resulting in a pair of long-lived χ̃₁⁰'s in association with other final state particles that can be identified as jets. For a heavy χ̃₁⁰ decaying inside the detector, the photon can arrive at the face of the detector with a time delay relative to promptly produced photons. To have good sensitivity for nanosecond-lifetime χ̃₁⁰'s [8], we search for events that contain a time-delayed photon, E/T, and ≥ 1 jet. This is equivalent to requiring that at least one of the long-lived χ̃₁⁰'s decays inside the detector.

This Letter summarizes [9] the first search for heavy, long-lived particles that decay to photons at a hadron collider. The data comprise 570±34 pb⁻¹ of pp̄ collisions collected with the CDF II detector [10] at √s = 1.96 TeV. Previous searches for nanosecond-lifetime particles using non-timing techniques yielded null results [11].

A full description of the CDF II detector can be found elsewhere [10]. Here we briefly describe the aspects of the detector relevant to this analysis.
The magnetic spectrometer consists of tracking devices inside a 3-m diameter, 5-m long superconducting solenoid magnet that operates at 1.4 T. An eight-layer silicon microstrip detector array and a 3.1-m long drift chamber with 96 layers of sense wires measure the position (x⃗i) and time (ti) of the pp̄ interaction [12] and the momenta of charged particles. Muons from collisions or cosmic rays are identified by a system of drift chambers situated outside the calorimeters in the region with pseudorapidity |η| < 1.1 [1].

∗(continued) pUniversity of California Irvine, qIFIC (CSIC-Universitat de Valencia).

The calorimeter consists of projective towers with electromagnetic and hadronic compartments. It is divided into a central barrel that surrounds the solenoid coil (|η| < 1.1) and a pair of end-plugs that cover the region 1.1 < |η| < 3.6. Both calorimeters are used to identify and measure the energy and position of photons, electrons, jets, and E/T. The electromagnetic calorimeters were recently instrumented with a new system, the EMTiming system (completed in Fall 2004) [13], that measures the arrival time of electrons and photons in each tower with |η| < 2.1 for all energies above ∼5 GeV.

The time and position of arrival of the photon at the calorimeter, tf and x⃗f, are used to separate the photons from the decays of heavy, long-lived χ̃₁⁰'s from promptly produced photons or photons from non-collision sources. We define the corrected arrival time of the photon as

    tγc ≡ tf − ti − |x⃗f − x⃗i| / c.

The tγc distribution for promptly produced, high energy photons is Gaussian with a mean of zero by construction and with a standard deviation that depends only on the measurement resolution, assuming that the pp̄ production vertex has been correctly identified. Photons from heavy, long-lived particles can have arrival times that are many standard deviations larger than zero.

The analysis preselection is summarized in Table I.
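As an illustration of the corrected arrival time defined above, the following minimal Python sketch evaluates tγc for a single photon candidate. The function name, the choice of units (ns and cm), and the example coordinates are assumptions made for illustration only; they are not taken from the CDF reconstruction software.

```python
import math

# Speed of light in cm/ns (the unit choice is an assumption of this sketch).
C_CM_PER_NS = 29.9792458


def corrected_arrival_time(t_f, x_f, t_i, x_i):
    """Return t_f - t_i - |x_f - x_i| / c, with times in ns and positions in cm."""
    flight_time = math.dist(x_f, x_i) / C_CM_PER_NS
    return t_f - t_i - flight_time


# Hypothetical collision vertex (position in cm, time in ns) and calorimeter hit position.
vertex_pos, vertex_time = (0.0, 0.0, 10.0), 0.2
calo_pos = (180.0, 0.0, 50.0)

# A prompt photon arrives exactly one light-travel time after the collision.
prompt_arrival = vertex_time + math.dist(calo_pos, vertex_pos) / C_CM_PER_NS
t_prompt = corrected_arrival_time(prompt_arrival, calo_pos, vertex_time, vertex_pos)

# A photon arriving 3 ns later than the prompt expectation keeps that 3 ns delay.
t_delayed = corrected_arrival_time(prompt_arrival + 3.0, calo_pos, vertex_time, vertex_pos)
```

In this sketch the prompt photon yields a corrected time compatible with zero, while the late photon retains its full delay, which is the quantity the timing selection acts on.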
The preselection begins with events passing an online, three-level trigger by having a photon candidate in the region |η| < 1.1 with ET > 25 GeV and E/T > 25 GeV. Offline, the highest ET photon candidate in the fiducial region of the calorimeter is required to have ET > 30 GeV and to pass the standard photon identification requirements [5] with a minor modification [14]. We require the event to have E/T > 30 GeV, where the trigger is 100% efficient. We require at least one jet with |ηjet| < 2.0 and jet ET > 30 GeV [15]. Since a second photon can be identified as a jet, the analysis is sensitive to signatures where one or both χ̃₁⁰'s decay inside the detector. To ensure a high quality ti and x⃗i measurement, we require a vertex with at least 4 tracks, Σtracks pT > 15 GeV/c, and |zi| < 60 cm; this also helps to reduce non-collision backgrounds. For events with multiple reconstructed vertices, we pick the vertex with the highest Σtracks pT. To reduce cosmic ray background, events are rejected if there are hits in a muon chamber that are not matched to any track and are within 30° of the photon. After the above requirements there are 11,932 events in the data sample.

There are two major classes of background events: collision and non-collision photon candidates. Collision photons are presumed to come from standard model interactions, e.g., γ+jet+mismeasured E/T, dijet+mismeasured E/T where the jet is mis-identified as a γ, and W → eν where the electron is mis-identified as a γ. Non-collision
Non-collision backgrounds come from cosmic rays and beam effects that can produce photon candidates, $\not\!E_T$, and sometimes the reconstructed jet. We separate data events as a function of $t_c^\gamma$ into several control regions that allow us to estimate the number of background events in the final signal region by fitting to the data using collision and non-collision shape templates as shown in Fig. 1.

TABLE I: The data selection criteria and the cumulative and individual requirement efficiencies for an example GMSB model point at $m_{\tilde{\chi}_1^0} = 100$ GeV/$c^2$ and $\tau_{\tilde{\chi}_1^0} = 5$ ns. The efficiencies listed are, in general, model-dependent and have a fractional uncertainty of 10%. Model-independent efficiencies are indicated with an asterisk. The collision fiducial requirement of $|z_i| < 60$ cm is part of the good vertex requirement (95%) and is estimated from data.

Preselection requirements | Cumulative (individual) efficiency (%)
$E_T > 30$ GeV, $\not\!E_T > 30$ GeV | 54 (54)
Photon ID and fiducial, |η| < 1.0 | 39 (74)*
Good vertex, $\sum_{tracks} p_T > 15$ GeV/c | 31 (79)
$|\eta^{jet}| < 2.0$, $E_T^{jet} > 30$ GeV | 24 (77)
Cosmic ray rejection | 23 (98)*
Requirements after optimization |
$\not\!E_T > 40$ GeV, $E_T^{jet} > 35$ GeV | 21 (92)
$\Delta\phi(\not\!E_T, jet) > 1$ rad | 18 (86)
2 ns < $t_c^\gamma$ < 10 ns | 6 (33)

Collision photons are subdivided into two subclasses: correct and incorrect vertex selection [13]. An incorrect vertex can be selected when two or more collisions occur in one beam bunch crossing, making it possible that the vertex with the highest reconstructed $\sum_{tracks} p_T$ does not produce the photon. While the fraction of events with incorrect vertices depends on the final event selection criteria, the $t_c^\gamma$ distribution for each subclass is estimated separately using W → eν data where the electron track is dropped from the vertexing. For events with a correctly associated vertex, the $t_c^\gamma$ distribution is Gaussian and centered at zero with a standard deviation of 0.64 ns [13].
For those with an incorrectly selected vertex the $t_c^\gamma$ distribution is also Gaussian with a standard deviation of 2.05 ns.

The $t_c^\gamma$ distributions for both non-collision backgrounds are estimated separately from data using events with no reconstructed tracks. Photon candidates from cosmic rays are not correlated in time with collisions, and therefore their $t_c^\gamma$ distribution is roughly flat. Beam halo photon candidates are produced by muons that originate upstream of the detector (from the $p$ direction) and travel through the calorimeter, typically depositing small amounts of energy. When the muon deposits significant energy in the EM calorimeter, it can be misidentified as a photon and cause $\not\!E_T$. These photons populate predominantly the negative $t_c^\gamma$ region, but can contribute to the signal region. Since beam halo muons travel parallel to the beam line, these events can be separated from cosmic ray events by identifying the small energy deposited in the calorimeter towers along the beam halo muon trajectory.

The background prediction uses control regions outside the signal time window but well within the 132 ns time window that the calorimeter uses to measure the energy. The non-collision background templates are normalized to match the number of events in two time windows: a beam halo-dominated window at {−20, −6} ns, selected to be 3σ away from the wrong vertex collision background, and a cosmic ray-dominated window at {25, 90} ns, well away from the standard model and beam halo contributions. The collision background is estimated by fitting events in the {−10, 1.2} ns window with the non-collision contribution subtracted and with the fraction of correct to incorrect vertex events allowed to vary. In this way the background for the signal region is entirely estimated from data samples. The systematic uncertainty on the background estimate is dominated by our ability to calibrate the mean of the $t_c^\gamma$ distribution for prompt photons.
We find a variation of 200 ps on the mean and 20 ps on the standard deviation of the distribution by considering various possible event selection criteria. These contribute to the systematic uncertainty of the collision background estimate in the signal region and are added in quadrature with the statistical uncertainties of the final fit procedure.

We estimate the sensitivity to heavy, long-lived particles that decay to photons using GMSB models for different $\tilde{\chi}_1^0$ masses and lifetimes. Events from all SUSY processes are simulated with the PYTHIA Monte Carlo program [16] along with the detector simulation [17]. The acceptance is the ratio of simulated events that pass all the requirements to all events produced. It is used in the optimization procedure and in the final limit setting and depends on a number of effects. The fraction of $\tilde{\chi}_1^0$ decays in the detector volume is the dominant effect on the acceptance. For a given lifetime this depends on the boost of the $\tilde{\chi}_1^0$. A highly boosted $\tilde{\chi}_1^0$ that decays in the detector typically does not contribute to the acceptance because it tends to produce a photon traveling in the same direction as the $\tilde{\chi}_1^0$. Thus, the photon's arrival time is indistinguishable from that of promptly produced photons. At small boosts the decay is more likely to happen inside the detector, and the decay angle is more likely to be large, which translates into a larger delay for the photon. The fraction of events with a delayed photon arrival time initially rises as a function of $\tilde{\chi}_1^0$ lifetime, but falls as the fraction of $\tilde{\chi}_1^0$'s decaying outside the detector begins to dominate. In the $\tilde{\chi}_1^0$ mass region considered ($65 \le m_{\tilde{\chi}_1^0} \le 150$ GeV/$c^2$), the acceptance peaks at a lifetime of around 5 ns. The acceptance also depends on the mass as the boost effects are mitigated by the ability to produce high energy photons or $\not\!E_T$ in the collision, as discussed in Ref. [8].
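The two kinematic effects described above, the chance that the long-lived particle decays inside the detector and the delay of the resulting photon, can be illustrated with a minimal Python sketch. It assumes exponential decay in the particle's rest frame and straight-line flight. The function names, the 150 cm distance, the decay point and the boost values are illustrative assumptions and are not taken from the analysis.

```python
import math

C_CM_PER_NS = 29.9792458  # speed of light in cm/ns

def decay_fraction(distance_cm, beta_gamma, tau_ns):
    """Probability that a particle with proper lifetime tau_ns and boost
    beta*gamma decays before travelling distance_cm in the lab frame."""
    decay_length_cm = beta_gamma * C_CM_PER_NS * tau_ns
    return 1.0 - math.exp(-distance_cm / decay_length_cm)

def corrected_time(t_f, t_i, x_f, x_i):
    """t_c = t_f - t_i - |x_f - x_i| / c, with times in ns and positions in cm."""
    return t_f - t_i - math.dist(x_f, x_i) / C_CM_PER_NS

# Hypothetical numbers: 150 cm flight distance and a lifetime of 5 ns.
for bg in (0.5, 1.0, 3.0):
    frac = decay_fraction(150.0, bg, 5.0)
    print(f"beta*gamma = {bg}: fraction decaying within 150 cm = {frac:.2f}")

# A prompt photon produced at the vertex has t_c = 0 by construction.
x_i, x_f = (0.0, 0.0, 0.0), (150.0, 0.0, 0.0)
prompt = corrected_time(150.0 / C_CM_PER_NS, 0.0, x_f, x_i)

# A photon from a slow parent (beta = 0.5) that decays off-axis arrives late.
beta = 0.5
decay_point = (0.0, 60.0, 0.0)
t_decay = math.dist(decay_point, x_i) / (beta * C_CM_PER_NS)
t_f = t_decay + math.dist(x_f, decay_point) / C_CM_PER_NS
delayed = corrected_time(t_f, 0.0, x_f, x_i)
print(f"prompt t_c = {prompt:.3f} ns, delayed t_c = {delayed:.2f} ns")
```

In this toy geometry the slower parent is more likely to decay within the chosen distance and its photon arrives later, which is the qualitative behavior the text describes.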
The total systematic uncertainty of 10% on the acceptance is dominated by the uncertainty on the mean of the $t_c^\gamma$ distribution (7%) and on the photon ID efficiency (5%). Other significant contributions come from uncertainties on initial and final state radiation (3%), jet energy measurement (3%), and the parton distribution functions (1%).

FIG. 1: The time distribution for photons passing all but the final timing requirement for the background predictions, data, and a GMSB signal for an example point at $m_{\tilde{\chi}_1^0} = 100$ GeV/$c^2$, $\tau_{\tilde{\chi}_1^0} = 5$ ns. A total of 1.3±0.7 background events are predicted and 2 (marked with a star) are observed in the signal region of 2 < $t_c^\gamma$ < 10 ns. [Figure: x-axis, Photon Corrected Time of Arrival (ns), from −20 to 80; legend: γ + Jet + $\not\!E_T$ data (570 pb⁻¹), Standard Model, Beam Halo, Cosmics, GMSB Signal MC.]

We determine the kinematic and $t_c^\gamma$ selection requirements that define the final data sample by optimizing the expected cross section limit without looking at the data in the signal region. To compute the expected 95% confidence level (C.L.) cross section upper limit [18], we combine the predicted GMSB signal and background estimates with the systematic uncertainties using a Bayesian method with a flat prior [19]. The expected limits are optimized by simultaneously varying the selection requirements for $\not\!E_T$, photon $E_T$, jet $E_T$, the azimuthal angle between the leading jet and $\not\!E_T$ ($\Delta\phi(\not\!E_T, jet)$), and $t_c^\gamma$. The $\Delta\phi(\not\!E_T, jet)$ requirement rejects events where the $\not\!E_T$ is overestimated because of a poorly measured jet. While each point in $\tilde{\chi}_1^0$ lifetime vs. mass space gives a slightly different optimization, we choose a single set of requirements because it simplifies the final analysis, while only causing a small loss of sensitivity. The optimized requirements are summarized in Table I. As an example, the acceptance for $m_{\tilde{\chi}_1^0} = 100$ GeV/$c^2$ and lifetime $\tau_{\tilde{\chi}_1^0} = 5$ ns is estimated to be (6.3±0.6)%.
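As a rough cross-check of the quoted 10% total systematic uncertainty on the acceptance, the individual contributions listed earlier in this section can be combined in quadrature. Treating them as independent is an assumption made here for illustration; the text does not state how the total was obtained.

```python
import math

# Fractional systematic uncertainties on the acceptance, in percent (from the text).
contributions = {
    "mean of the corrected-time distribution": 7.0,
    "photon ID efficiency": 5.0,
    "initial and final state radiation": 3.0,
    "jet energy measurement": 3.0,
    "parton distribution functions": 1.0,
}

def quadrature_sum(values):
    """Combine independent uncertainties in quadrature."""
    return math.sqrt(sum(v * v for v in values))

total = quadrature_sum(contributions.values())
print(f"total = {total:.1f}%")
```

Under this assumption the listed terms give a total slightly below 10%, consistent with the rounded value quoted above.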
After all kinematic requirements, 508 events are observed in the data before the final signal region time requirement. Their time distribution is shown in Fig. 1. Our fit to the data outside the signal region predicts total backgrounds of 6.2±3.5 from cosmic rays, 6.8±4.9 from beam halo background sources, and the rest from the standard model. Inside the signal time region, {2, 10} ns, we predict 1.25±0.66 events: 0.71±0.60 from standard model, 0.46±0.26 from cosmic rays, and 0.07±0.05 from beam halo. Two events are observed in the data.

FIG. 2: The contours of constant 95% C.L. upper cross section limits for a GMSB model [7]. [Figure: x-axis, $\tilde{\chi}_1^0$ mass (GeV/$c^2$), from 65 to 110; contours labeled 1.0, 0.5, 0.3, 0.2, and 0.13 pb.]

Since the result is consistent with the no-signal hypothesis, we set limits on the $\tilde{\chi}_1^0$ lifetime and mass. Figure 2 shows the contours of constant 95% C.L. cross section upper limit. Figure 3 shows the exclusion region at 95% C.L., along with the expected limit for comparison. This takes into account the predicted production cross section at next-to-leading order [20] as well as the uncertainties on the parton distribution functions (6%) and the renormalization scale (2%). Since the number of observed events is above expectations, the observed limits are slightly worse than the expected limits. These limits extend at large masses beyond those of LEP searches using photon “pointing” methods [11].

In conclusion, we have performed the first search for heavy, long-lived particles that decay to photons at a hadron collider using data collected with the EMTiming system at the CDF II detector. There is no excess of events beyond expectations. As our search strategy does not rely on event properties specific solely to GMSB models, we can exclude any γ+jet+$\not\!E_T$ signal that would produce more than 5.5 events. We set cross section limits using a supersymmetric model with $\tilde{\chi}_1^0 \to \gamma\tilde{G}$, and find a GMSB exclusion region in the $\tilde{\chi}_1^0$ lifetime vs.
mass plane with the world-best 95% C.L. lower limit on the $\tilde{\chi}_1^0$ mass of 101 GeV/$c^2$ at $\tau_{\tilde{\chi}_1^0} = 5$ ns. Future improvements with similar techniques should also provide sensitivity to new particle decays with a delayed electron signature [2]. By the end of Run II, an integrated luminosity of 10 fb⁻¹ is possible, for which we estimate a mass reach of ≃ 140 GeV/$c^2$ at a lifetime of 5 ns.

FIG. 3: The exclusion region at 95% C.L. as a function of $\tilde{\chi}_1^0$ lifetime and mass for a GMSB model [7]. The predicted and the observed regions are shown separately and are compared to the most stringent published limit from LEP searches [11]. [Figure: x-axis, $\tilde{\chi}_1^0$ mass (GeV/$c^2$), from 65 to 110; legend: +1jet analysis with EMTiming (570 pb⁻¹), Predicted exclusion region, Observed exclusion region, ALEPH exclusion upper limit; model label: GMSB with $M_{mess} = 2\Lambda$, $\tan(\beta) = 15$, $N_{mess} = 1$, $\mu > 0$.]

We thank the Fermilab staff and the technical staffs of the participating institutions for their vital contributions. This work was supported by the U.S. Department of Energy and National Science Foundation; the Italian Istituto Nazionale di Fisica Nucleare; the Ministry of Education, Culture, Sports, Science and Technology of Japan; the Natural Sciences and Engineering Research Council of Canada; the National Science Council of the Republic of China; the Swiss National Science Foundation; the A.P. Sloan Foundation; the Bundesministerium für Bildung und Forschung, Germany; the Korean Science and Engineering Foundation and the Korean Research Foundation; the Particle Physics and Astronomy Research Council and the Royal Society, UK; the Russian Foundation for Basic Research; the Comisión Interministerial de Ciencia y Tecnología, Spain; in part by the European Community's Human Potential Programme under contract HPRN-CT-2002-00292; and the Academy of Finland.
[1] We use a cylindrical coordinate system in which the proton beam travels along the z-axis, θ is the polar angle, φ is the azimuthal angle, and η = − ln tan(θ/2). The transverse energy and momentum are defined as $E_T = E \sin\theta$ and $p_T = p \sin\theta$, where E is the energy measured by the calorimeter and p the momentum measured in the tracking system. $\not\!E_T = |-\sum_i E_T^i \vec{n}_i|$, where $\vec{n}_i$ is a unit vector that points from the interaction vertex to the ith calorimeter tower in the transverse plane.
[2] J. L. Feng, A. Rajaraman and F. Takayama, Phys. Rev. D 68, 063504 (2003); M. J. Strassler and K. M. Zurek, arXiv:hep-ph/0605193.
[3] S. Ambrosanio et al., Phys. Rev. D 54, 5395 (1996); C. H. Chen and J. F. Gunion, Phys. Rev. D 58, 075005 (1998).
[4] F. Abe et al. (CDF Collaboration), Phys. Rev. Lett. 81, 1791 (1998) and Phys. Rev. D 59, 092002 (1999).
[5] D. Acosta et al. (CDF Collaboration), Phys. Rev. D 71, 031104 (2005).
[6] V. Abazov et al. (D0 Collaboration), Phys. Rev. Lett. 94, 041801 (2005).
[7] B. C. Allanach et al., Eur. Phys. J. C 25, 113 (2002). We use benchmark model 8 and allow the $\tilde{G}$ mass factor and the supersymmetry breaking scale to vary independently.
[8] D. Toback and P. Wagner, Phys. Rev. D 70, 114032 (2004).
[9] P. Wagner, Ph.D. Thesis, Texas A&M University, 2007.
[10] D. Acosta et al. (CDF Collaboration), Phys. Rev. D 71, 032001 (2005).
[11] A. Heister et al. (ALEPH Collaboration), Eur. Phys. J. C 25, 339 (2002); also see M. Gataullin, S. Rosier, L. Xia and H. Yang, arXiv:hep-ex/0611010; G. Abbiendi et al. (OPAL Collaboration), Proc. Sci. HEP2005, 346 (2006); J. Abdallah et al. (DELPHI Collaboration), Eur. Phys. J. C 38, 395 (2005).
[12] The distribution of the $p\bar{p}$ collisions has a standard deviation of 30 cm and 1.3 ns in $z_i$ and $t_i$, respectively.
[13] M. Goncharov et al., Nucl. Instrum. Methods A565, 543 (2006).
[14] The standard requirement, $\chi^2_{CES} < 20$ (see F. Abe et al. (CDF Collaboration), Phys. Rev.
D 52, 4784 (1995)), has been removed because there is evidence that it is inefficient for photons that arrive with large incident angles relative to the face of the detector.
[15] See F. Abe et al. (CDF Collaboration), Phys. Rev. D 45, 1448 (1992). We use corrected jets reconstructed with a cone of ∆R = 0.7, see A. Bhatti et al., Nucl. Instrum. Methods A566, 375 (2006).
[16] T. Sjöstrand et al., Comput. Phys. Commun. 135, 238 (2001). We use version 6.216.
[17] We use the standard GEANT-based detector simulation [R. Brun et al., CERN-DD/EE/84-1 (1987)] and add a parametrized EMTiming simulation.
[18] E. Boos, A. Vologdin, D. Toback, and J. Gaspard, Phys. Rev. D 66, 013011 (2002).
[19] J. Conway, CERN Yellow Book Report No. CERN 2000-005, 2000, p. 247.
[20] W. Beenakker et al., Phys. Rev. Lett. 83, 3780 (1999).

ABSTRACT We present the first search for heavy, long-lived particles that decay to photons at a hadron collider. We use a sample of photon+jet+missing transverse energy events in $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV taken with the CDF II detector. Candidate events are selected based on the arrival time of the photon at the detector. Using an integrated luminosity of 570 pb⁻¹ of collision data, we observe 2 events, consistent with the background estimate of 1.3±0.7 events. While our search strategy does not rely on model-specific dynamics, we set cross section limits in a supersymmetric model with $\tilde{\chi}_1^0 \to \gamma\tilde{G}$ and place the world-best 95% C.L. lower limit on the $\tilde{\chi}_1^0$ mass of 101 GeV/$c^2$ at $\tau_{\tilde{\chi}_1^0} = 5$ ns. <|endoftext|><|startoftext|> Failure of the work-Hamiltonian connection for free energy calculations

Jose M. G. Vilar¹ and J.
Miguel Rubi²
¹Computational Biology Program, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021
²Departament de Fisica Fonamental, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain

Abstract

Extensions of statistical mechanics are routinely being used to infer free energies from the work performed over single-molecule nonequilibrium trajectories. A key element of this approach is the ubiquitous expression $dW/dt = \partial H(x,t)/\partial t$, which connects the microscopic work $W$ performed by a time-dependent force on the coordinate $x$ with the corresponding Hamiltonian $H(x,t)$ at time $t$. Here we show that this connection, as pivotal as it is, cannot be used to estimate free energy changes. We discuss the implications of this result for single-molecule experiments and atomistic molecular simulations and point out possible avenues to overcome these limitations.

PACS numbers: 05.40.-a, 05.20.-y, 05.70.Ln

Hamiltonians provide two key ingredients to bridge the microscopic structure of nature with macroscopic thermodynamic properties: they completely specify the underlying dynamics and they can be identified with the energy of the system [1]. At equilibrium, the link with the thermodynamic properties is established through the partition function $Z = \int e^{-\beta H(x)}\,dx$, which here uses the Hamiltonian $H(x)$ in the coordinate space $x$ as the energy of the system [2]. In particular, the free energy is given by $G = -\frac{1}{\beta}\ln Z$, where $\beta \equiv 1/k_B T$ is the inverse of the temperature $T$ times Boltzmann's constant $k_B$. Thermodynamic properties play an important role because they provide information that is not readily available from the microscopic properties, such as whether or not a given process happens spontaneously.
The connection between work and Hamiltonian expressed through the relation $dW/dt = \partial H(x,t)/\partial t$, or equivalently through its integral representation $W = \int_0^t \frac{\partial H(x(t'),t')}{\partial t'}\,dt'$, is typically used to extend statistical mechanics to far-from-equilibrium situations [3-5]. These relations are meant to imply that the work $W$ performed on a system is used to change its energy. The potential advantage of this type of approach is that it would allow one to infer thermodynamic properties even when the relevant details of the Hamiltonian are not known or when they are too complex for a direct analysis. Experiments and computer simulations can thus be performed to probe the microscopic mechanical properties from which to obtain thermodynamic properties. Time-dependent Hamiltonians, however, provide the energy up to an arbitrary factor that typically depends on time and on the microscopic history of the system. Such dependence, as we show below, prevents this approach from being generally applicable to compute thermodynamic properties.

To illustrate how work and Hamiltonian fail to be generally connected, we consider a system described by the Hamiltonian $H_0(x)$ under the effects of a time-dependent force $f(t)$. The total Hamiltonian is given by $H(x,t) = H_0(x) - f(t)x + g(t)$, where $g(t)$ is an arbitrary function of time, which leads to a total force $F = -\partial H_0/\partial x + f(t)$. The function $g(t)$ does not affect the total force but it changes the Hamiltonian. Therefore, $g(t)$ has to be chosen so that the Hamiltonian can be identified with the energy of the system.

In general, the arbitrary time dependence of the Hamiltonian, $g(t)$, cannot be chosen so that the Hamiltonian gives a consistent energy. Consider, for instance, that the system, being initially at $x_0$, is subjected to a sudden perturbation $f(t) \equiv f_0\Theta(t)$, where $f_0$ is a constant and $\Theta(t)$ is the Heaviside step function.
The work performed on the system, $W = f_0(x_t - x_0)$, where $x_t \equiv x(t)$ represents the value of the coordinate $x$ at time $t$, is in general different from $\int_0^t \frac{\partial H(x,t')}{\partial t'}\,dt' = -f_0 x_0 + g(t) - g(0)$, irrespective of the explicit form of the function $g(t)$.

To illustrate the consequences of the lack of connection between work and changes in the Hamiltonian, we focus on the domain of validity of nonequilibrium work relations [3] of the type $e^{-\beta \Delta G_E} = \langle e^{-\beta W} \rangle$, which have been widely used recently to obtain estimates $\Delta G_E$ of free energy changes from single-molecule pulling experiments [6] and atomistic computer simulations [7]. The promise of this type of relation is that it provides the values of the free energy from irreversible trajectories and therefore does not require equilibration of the system. Yet, in almost all instances in which this approach has been applied, the agreement with the canonical thermodynamic results has not been complete and in some cases the discrepancies have been large. These discrepancies have been attributed to the presence of statistical errors in the estimation of the exponential average $\langle e^{-\beta W} \rangle$ [8].

Currently, the mathematical validity of this type of nonequilibrium work relation appears to be well established: such relations have been derived using approximations [3] and rigorously for systems described by Langevin equations [4, 5]. However, all these derivations rely in different ways on the work-Hamiltonian connection, which as we show below prevents them from giving general estimates of thermodynamic free energies.

The free energy difference between two states is defined as $\Delta G = \langle W_{rev} \rangle$, where $W_{rev}$ is the work required to bring the system from the initial to the final state in a reversible manner [2]. Note that, if the system is not macroscopic, $W_{rev}$ is in general a fluctuating quantity. At quasi-equilibrium, the external force $f(t)$ balances with the system force $-\partial H_0(x)/\partial x$.
After integration by the displacement, the reversible work done on the system is given by $W_{rev} = H_0(x_t) - H_0(x_0)$. Therefore, the free energy follows from

$\Delta G = \int\!\!\int W_{rev}\, P_{eq}(x_t,t)\, P_{eq}(x_0,0)\, dx_t\, dx_0$,

where the equilibrium probabilities are obtained, in the usual way, from the Boltzmann distribution $P_{eq}(x,t) = \frac{1}{Z(t)} e^{-\beta H(x,t)}$.

To be explicit, let us consider a harmonic system described by $H_0(x) = \frac{1}{2}kx^2$ and $g(t) = 0$, with $k$ a constant. In this case, we can compute exactly the free energy change: $\Delta G = \frac{1}{2}k x_{eq}^2$, where $x_{eq} \equiv f(t)/k$, which leads to a positive value as required for non-spontaneous processes.

One might have been tempted to use the partition function to estimate changes in free energy according to the expression $\Delta G_Z = -\frac{1}{\beta}\ln(Z(t)/Z(0))$, where $Z(t) = \int e^{-\beta H(x,t)}\,dx$ is the time-dependent quasi-equilibrium partition function [3, 4]. However, this relation is not valid when changes in the Hamiltonian cannot be associated with changes in energy. In the case of the harmonic potential, the use of the time-dependent partition function leads to $\Delta G_Z = -\frac{1}{2}k x_{eq}^2$, a negative value inconsistent with a process that is not spontaneous. More generally, the Hamiltonian $H(x,t) = \frac{1}{2}kx^2 - f(t)(x - \gamma)$, where $\gamma$ is a constant parameter that does not affect the dynamics of the system, leads to $\Delta G_Z = k x_{eq}(\gamma - x_{eq}/2)$, which can be positive or negative depending on the value of $\gamma$. Therefore, the estimates $\Delta G_Z$ are not suitable to predict typical thermodynamic properties, such as whether or not a process happens spontaneously.

To what extent does the failure of the work-Hamiltonian connection impact nonequilibrium work equalities? In the case of a sudden perturbation and a harmonic potential discussed previously, the following result follows straightforwardly:

$\langle e^{-\beta W} \rangle = \int\!\!\int e^{-\beta f_0 (x_t - x_0)}\, P_{eq}(x_t,t)\, P_{eq}(x_0,0)\, dx_t\, dx_0 = 1$,

which is different from
$e^{-\beta \Delta G}$.

An intriguing question then arises: why do experiments and computer simulations sometimes lead to results that agree with nonequilibrium work equalities? Let us consider a situation closer to the experimental and computational setups, with a harmonic time-dependent force that constrains the motion on the coordinate $x$: $H(x,t) = H_0(x) + \frac{1}{2}K(x - X_t)^2$. Here $K$ is a constant and $X_t$ is the time-dependent equilibrium position for the constraining force. In this case, with $H_0(x) = \frac{1}{2}kx^2$ and $X_0 = 0$, we also have $\Delta G = \langle W_{rev} \rangle = \frac{1}{2}k x_{eq}^2$, where now $x_{eq} \equiv \frac{K}{k+K} X_t$. For quasi-equilibrium displacements of $X_t$, so that the work performed is equal to the reversible work, $W = W_{rev} = H_0(x_t) - H_0(x_0)$, we have

$\langle e^{-\beta W_{rev}} \rangle = \int\!\!\int e^{-\beta (H_0(x_t) - H_0(x_0))}\, P_{eq}(x_t,t)\, P_{eq}(x_0,0)\, dx_t\, dx_0$,

which leads to

$\langle e^{-\beta W} \rangle = \frac{k+K}{\sqrt{K(2k+K)}} \exp\left(-\beta \frac{k(k+K)}{2(2k+K)} x_{eq}^2\right)$.

This result indicates that quasi-equilibrium does not guarantee the accuracy of the exponential estimate of the free energy from nonequilibrium work relations. The free energy change $\Delta G$ and its exponential estimate $\Delta G_E$ agree with each other only for large values of $K$. The reason is that, in this case, work and Hamiltonian are connected to each other when both quasi-equilibrium and large-$K$ conditions are fulfilled simultaneously. Under such conditions, the work-Hamiltonian connection is valid because $x \approx x_{eq} \approx X_t$ implies that the rate of change of the Hamiltonian, $\partial H(x,t)/\partial t = -K(x - X_t)\,dX_t/dt$, equals the power associated with the external force, $dW/dt = -K(x - X_t)\,dx/dt$. Interestingly, large values of $K$ suppress fluctuations and lead to quasi-deterministic dynamics. Indeed, the experimental data [6] and computer simulations [7] indicate that the agreement between the free energy change $\Delta G$ and its exponential estimate $\Delta G_E$ occurs mainly for relatively slow perturbations that lead to quasi-deterministic trajectories.
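The comparison between the free energy change and its exponential estimate for the constrained harmonic system can be checked numerically. The following Python sketch evaluates the double integral for the exponential average of the reversible work by simple quadrature, using Gaussian equilibrium distributions for the initial and final coordinates as implied by the quadratic Hamiltonian. The parameter values, the integration grid and the function names are illustrative assumptions.

```python
import math

def gaussian_average(func, mean, var, n=4001, width=12.0):
    """Average func(x) over a normal distribution using the trapezoidal rule."""
    sigma = math.sqrt(var)
    lo, hi = mean - width * sigma, mean + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        pdf = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        total += weight * func(x) * pdf
    return total * h

def delta_g_estimates(k, K, X_t, beta=1.0):
    """Return (dG, dG_E) for H0 = k x^2 / 2 with constraint K (x - X_t)^2 / 2."""
    var = 1.0 / (beta * (k + K))   # equilibrium variance, the same at all times
    x_eq = K * X_t / (k + K)       # equilibrium position at time t
    h0 = lambda x: 0.5 * k * x * x
    # The average factorizes because x_t and x_0 are drawn independently.
    avg_t = gaussian_average(lambda x: math.exp(-beta * h0(x)), x_eq, var)
    avg_0 = gaussian_average(lambda x: math.exp(beta * h0(x)), 0.0, var)
    dG = 0.5 * k * x_eq ** 2       # mean reversible work, as given in the text
    dG_E = -math.log(avg_t * avg_0) / beta
    return dG, dG_E

for K in (1.0, 10.0, 100.0, 1000.0):
    # X_t is chosen so that x_eq = 1 for every K, which keeps dG fixed at k/2.
    dG, dG_E = delta_g_estimates(k=1.0, K=K, X_t=(1.0 + K) / K)
    print(f"K = {K:7.1f}: dG = {dG:.4f}, dG_E = {dG_E:.4f}")
```

With these assumed parameters the gap between the two quantities shrinks as the constraint stiffness grows, in line with the statement that they agree only for large values of the constraint constant.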
Bringing thermodynamics to nonequilibrium microscopic processes [9] is becoming increasingly important with the advent of new experimental and computational techniques able to probe the properties of single molecules [6, 7]. Our results show that the classical connection between work and changes in the Hamiltonian cannot be applied straightforwardly to time-dependent systems. As a result, quantities that are based on the work-Hamiltonian connection, such as those obtained from nonequilibrium work relations and time-dependent partition functions, cannot generally be used to estimate thermodynamically consistent free energy changes. A possible avenue to overcome these limitations, as we have shown here, is to identify the particular conditions for which work and changes in the Hamiltonian are connected to each other.

References
[1] H. Goldstein, Classical mechanics (Addison-Wesley Pub. Co., Reading, Mass., 1980).
[2] R. C. Tolman, The principles of statistical mechanics (Oxford University Press, London, 1955).
[3] C. Jarzynski, Physical Review Letters 78, 2690 (1997).
[4] G. Hummer and A. Szabo, Proc Natl Acad Sci USA 98, 3658 (2001).
[5] A. Imparato and L. Peliti, Physical Review E 72, 046114 (2005).
[6] J. Liphardt et al., Science 296, 1832 (2002).
[7] S. Park et al., Journal of Chemical Physics 119, 3559 (2003).
[8] J. Gore, F. Ritort, and C. Bustamante, Proc Natl Acad Sci USA 100, 12564 (2003).
[9] D. Reguera, J. M. Rubi, and J. M. G. Vilar, Journal of Physical Chemistry B 109, 21502 (2005).

ABSTRACT Extensions of statistical mechanics are routinely being used to infer free energies from the work performed over single-molecule nonequilibrium trajectories. A key element of this approach is the ubiquitous expression $dW/dt = \partial H(x,t)/\partial t$, which connects the microscopic work $W$ performed by a time-dependent force on the coordinate $x$ with the corresponding Hamiltonian $H(x,t)$ at time $t$.
Here we show that this connection, as pivotal as it is, cannot be used to estimate free energy changes. We discuss the implications of this result for single-molecule experiments and atomistic molecular simulations and point out possible avenues to overcome these limitations. <|endoftext|><|startoftext|> Introduction

Adsorption of polymers on surfaces plays a key role in many technological applications and is also relevant to many biological processes. As a result, it has been studied for more than three decades¹ and continues to receive intense interest.² The field is rich and contains a wide variety of topics, from equilibrium properties of adsorbed layers and conformations of adsorbed polymer chains to dynamic properties and non-equilibrium processes in adsorption.² For polymer adsorption on planar surfaces, it is well-known that there exists a critical adsorption point (CAP) that marks the transition of a polymer chain, in contact with a surface, from a non-adsorbed state to an adsorbed state.³ Scaling laws for a variety of quantities below, above and at the CAP for a homopolymer in contact with a planar surface were developed by Eisenriegler, Kremer, and Binder (EKB).⁴ For example, when the chain goes from a non-adsorbed state to an adsorbed state, the energy of the chain $E$ changes from an intensive variable independent of chain length $N$ to an extensive variable dependent on $N$. At the CAP, $E$ is expected to scale with $N^\phi$, where $\phi$ is the crossover exponent. Numerical studies, including exact enumeration,⁵ the scanning method⁶,⁷ and the multiple Markov chain method,⁸ have been performed to determine the location of the CAP and the crossover exponent $\phi$. The values reported are, however, not completely in agreement with each other and are still under debate, especially the crossover exponent $\phi$. The disagreement may be traced, as suggested by a recent article,⁹ to the different methods used for determining the CAP and the crossover exponent $\phi$.
While many studies focused on adsorption of homopolymers on planar homogeneous surfaces, adsorption of polymers on chemically or physically heterogeneous surfaces has also received a fair amount of attention.¹⁰⁻²⁰ Some studies were inspired by specific applications such as segregation of polymer chains on patterned surfaces¹⁰ or pattern transfer via surface adsorption;²¹,²² others were motivated by a desire to understand how the presence of surface or sequence disorder may influence adsorption.¹³,¹⁴,¹⁶,¹⁷,²³⁻²⁵ For example, Sebastian and Sumithra developed an analytical theory of the adsorption of Gaussian chains on random surfaces using a Gaussian variational approach.²⁴,²⁵ They took surface heterogeneity into account by modifying de Gennes' adsorption boundary condition and analyzed the influence of randomness on the conformation of the adsorbed chains. Adsorption of heteropolymers on heterogeneous surfaces, in particular, has been studied because of its relevance to molecular recognition in biological processes. The concept of “pattern matching” was proposed²⁶ and has been investigated with different approaches.¹²,²⁰,²⁶,²⁷ Muthukumar, for example, derived an equation for the critical condition of adsorption of a polyelectrolyte to an oppositely charged patterned surface.²⁶ Golumbfski et al.¹² showed that a statistically blocky chain was selectively adsorbed on a patchy surface while a statistically alternating chain was selectively adsorbed on an alternating surface. Jayaraman et al.¹⁹ described a simulation method to design surfaces for recognizing specific monomer sequences in heteropolymers. Recently Polotsky et al.¹⁸ considered adsorption of Gaussian heteropolymer chains onto heterogeneous surfaces. They found that the presence of correlations between sequence and surface heterogeneity always enhances adsorption. However, the dependence of the critical adsorption point on either surface disorder or sequence disorder is not well understood.
Lack of this knowledge hampers further understanding of the correlation between sequence disorder and surface disorder during adsorption. Here we present theoretical equations that describe the dependence of the CAP on the surface disorder or sequence disorder, along with Monte Carlo simulation data in agreement with the derived equations. The current study does not address the correlation between sequence disorder and surface disorder. We only consider cases where the disorder is present randomly either on the surface (i.e., adsorption of homopolymers on a random heterogeneous surface) or on the sequence (i.e., adsorption of a random copolymer on a homogeneous surface). The correlation between sequence disorder and surface disorder will be the subject of future publications.

In the following, we first present the theory that predicts the dependence of the CAP on surface disorder and sequence disorder. Then we present details of the Monte Carlo simulation methods used to determine the CAP, followed by simulation data that agree with the derived equations. Finally, we discuss implications of these results for practical applications such as chromatographic separations of polymers.

2. Theory

2.1 Adsorption of a homopolymer on a homogeneous surface

We first consider adsorption of a homopolymer chain on a homogeneous surface. This can be represented by a self-avoiding walk (SAW) in a three-dimensional lattice interacting with a plane and restricted to lie on one side of the plane. The vertices of the walks interact with the surface sites with an attractive energy $\varepsilon_w$. The partition function for an $N$-step SAW interacting with a homogeneous surface is given by

$Z_{homo}(N, \varepsilon_w) = \sum_v c_N(v) \exp(v\varepsilon_w)$    (1)

where $c_N(v)$ is the number of SAWs that lie above the surface with $v$ visits to the surface. Hammersley et al.²⁸ have shown that the model exhibits a phase transition at a critical adsorption energy, $\varepsilon_c$, with a desorbed state for $\varepsilon_w < \varepsilon_c$, and an adsorbed state for $\varepsilon_w > \varepsilon_c$.
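For very short chains, the partition function of Eq. (1) can be evaluated by exact enumeration. The Python sketch below counts self-avoiding walks on the simple cubic lattice that start on the surface and stay in the half-space above it. The choice of lattice, the convention that the starting vertex is not counted as a visit, and the function names are assumptions made for illustration; the energy is taken in units of $k_B T$.

```python
import math
from collections import Counter

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def count_walks(n_steps):
    """Return a Counter {v: c_N(v)} for SAWs on the simple cubic lattice that
    start at the origin on the surface z = 0 and stay in the half-space z >= 0.
    v counts visited vertices with z = 0, excluding the starting vertex."""
    counts = Counter()

    def extend(pos, visited, steps_left, visits):
        if steps_left == 0:
            counts[visits] += 1
            return
        for dx, dy, dz in STEPS:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt[2] < 0 or nxt in visited:
                continue  # below the surface, or site already occupied
            visited.add(nxt)
            extend(nxt, visited, steps_left - 1, visits + (nxt[2] == 0))
            visited.remove(nxt)

    origin = (0, 0, 0)
    extend(origin, {origin}, n_steps, 0)
    return counts

def partition_function(counts, eps_w):
    """Z(N, eps_w) = sum_v c_N(v) exp(v * eps_w), with eps_w in units of k_B T."""
    return sum(c * math.exp(v * eps_w) for v, c in counts.items())

c2 = count_walks(2)
z2 = partition_function(c2, 0.0)
print(dict(c2), z2)
```

The cost of this enumeration grows exponentially with the number of steps, which is one reason sampling methods such as Monte Carlo simulation are used for longer chains.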
Hammersley et al. have also shown that the limiting monomer free energy

$$f(\varepsilon_w) = \lim_{N \to \infty} \frac{1}{N} \log Z_N^{\mathrm{homo}}(\varepsilon_w) \qquad (2)$$

exists and is a convex, non-decreasing, continuous function of εw. Moreover, f(εw) = κ for εw ≤ 0, where κ is the lattice connective constant, and f(εw) is a strictly increasing function of εw when εw > εc. Therefore, f(εw) is non-analytic at εw = εc. εc has also been determined to be greater than zero and, based on the best-known connective constant for the simple cubic lattice,29 to have an upper bound of 0.5738. The lattice connective constant κ is also the limiting monomer free energy of SAWs in bulk solution. Hence the CAP can be understood as the condition where the limiting monomer free energy of a chain attached to the surface becomes equal to the limiting monomer free energy of the chain in bulk solution.

2.2 Adsorption of a homopolymer on a random heterogeneous surface

Now we consider the adsorption of a homopolymer interacting with a heterogeneous surface consisting of two types of surface sites, A and B. The interaction energies of the vertices with the two types of surface sites are εwA and εwB. Following Soteros and Whittington,23 we express the partition function of an N-step SAW interacting with a heterogeneous surface that consists of A and B surface sites as

$$Z_N^{\mathrm{het}}(f_A, f_B) = \sum_{v} c_N(v) \sum_{v(A)} \binom{v}{v(A)}\, f_A^{\,v(A)}\, f_B^{\,v(B)}\, \exp\big(v(A)\,\varepsilon_{wA}\big)\, \exp\big(v(B)\,\varepsilon_{wB}\big) \qquad (3)$$

where cN(v) is the number of walks that have v surface contacts, v(A) is the number of monomers interacting with the A sites, v(B) = v − v(A) is the number of monomers interacting with the B sites, and fA and fB = 1 − fA are the fractions of A and B sites on the surface, respectively. Here the partition function is averaged over random distributions of the surface sites, i.e., the so-called annealed approximation. Physically, annealed disorder means that the type of a surface site may change while the system attains its equilibrium state.
However, it has been previously suggested11,15 that the annealed approximation is valid if the chain can visit a large area of the surface and hence samples all distributions of surface patterns. Furthermore, the surface sites are assumed to be randomly distributed. If there is a correlation between surface sites, such as that present in a patchy surface or an alternating surface, then Eq. (3) will not be valid, because Eq. (3) gives equal weight to all possible surface labelings, while correlations restrict the possible labelings. Summing over v(A), Eq. (3) simplifies to

$$Z_N^{\mathrm{het}}(f_A, f_B) = \sum_{v} c_N(v)\, \big[ f_A \exp(\varepsilon_{wA}) + f_B \exp(\varepsilon_{wB}) \big]^{v} \qquad (4)$$

A comparison of Eqs. (1) and (4) reveals that the partition functions for the homogeneous surface and the annealed random heterogeneous surface become equivalent if

$$\exp(\varepsilon_w) = f_A \exp(\varepsilon_{wA}) + f_B \exp(\varepsilon_{wB}) \qquad (5)$$

From Eq. (5), we derive the following equation for the dependence of the CAP on the surface disorder:

$$\exp\big(\varepsilon_w^{h}(cc)\big) = (1 - f_B)\, \exp(\varepsilon_{wA}) + f_B\, \exp\big(\varepsilon_{wB}(cc)\big) \qquad (6)$$

where εwh(cc) is the CAP of a homopolymer above a homogeneous surface, and εwB(cc) is the CAP of a homopolymer above a heterogeneous surface when the surface interaction energy εwA is held constant. It can be seen from this equation that the dependence of the CAP on the percentage of attractive sites on the surface is not expected to be linear, in contrast to the conclusion drawn by an earlier study.13 Equation (6) is expected to be valid as long as two conditions are met: (i) the chain has enough mobility to visit a large area of the surface so that the annealed approximation is valid, and (ii) the surface sites are randomly distributed (i.e., uncorrelated).

2.3 Adsorption of a random heteropolymer on a homogeneous surface

The same approach can be extended to the adsorption of a random heteropolymer interacting with a homogeneous surface. We will use the same notation as in the previous section, except that fA and fB now represent the fractions of A and B monomers present on the heteropolymer.
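Equation (6) can be solved explicitly for εwB(cc), which makes the nonlinear dependence on fB easy to see. The snippet below is a minimal sketch of that rearrangement; the function name and argument names are illustrative assumptions, and the default-free call shown in the note uses εwh(cc) = 0.276, the homogeneous-surface value reported in the results of this paper.

```python
import math


def critical_eps_b(f_b, eps_wa, eps_wh_cc):
    """Solve Eq. (6) for eps_wB(cc).

    exp(eps_wh_cc) = (1 - f_b) * exp(eps_wa) + f_b * exp(eps_wB(cc))

    f_b       : fraction of B sites (or B monomers), 0 < f_b <= 1
    eps_wa    : interaction energy of the A sites, held constant
    eps_wh_cc : CAP of a homopolymer above a homogeneous surface
    """
    if not 0.0 < f_b <= 1.0:
        raise ValueError("f_b must lie in (0, 1]")
    argument = (math.exp(eps_wh_cc) - (1.0 - f_b) * math.exp(eps_wa)) / f_b
    if argument <= 0.0:
        # The A sites alone already satisfy Eq. (5); no eps_wB(cc) exists.
        raise ValueError("Eq. (6) has no solution for these parameters")
    return math.log(argument)


if __name__ == "__main__":
    for f_b in (1.0, 0.75, 0.5, 0.25, 0.1):
        print(f_b, round(critical_eps_b(f_b, 0.0, 0.276), 3))
```

With εwA = 0 and εwh(cc) = 0.276, the function returns 0.276 at fB = 1 and approximately 0.49 at fB = 0.5, and the predicted value rises steeply as fB becomes small.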
We will only consider random copolymers composed of A and B monomers. The sequence of a random copolymer can be represented by χ = {χ1, χ2, …, χN}, where the χi are independent and identically distributed random variables with χi = A with probability fA and χi = B with probability 1 − fA. A sequence order parameter λ can be defined to characterize the sequence randomness:12,27

$$\lambda = 1 - p_{AB} - p_{BA} \qquad (7)$$

where pij is the nearest-neighbor transition probability, i.e., the probability that a monomer of type i is followed by a monomer of type j. When λ = 0, the sequence is random. When λ > 0, the sequence is statistically blocky, and when λ < 0, the sequence is statistically alternating. We note that a given random sequence designated by χ may have a non-zero value of λ. More discussion will be given in a later section. The partition function of N-step SAWs with a given sequence above a homogeneous surface is written as

$$Z_N^{\mathrm{hetpoly}}(f_A, f_B, \chi) = \sum_{v_A, v_B} C_N(v_A, v_B \mid \chi)\, \exp(v_A\,\varepsilon_{wA} + v_B\,\varepsilon_{wB}) \qquad (8)$$

There are two different ways to average over different distributions of random sequences, namely the annealed average and the quenched average. With the annealed average, the partition function in Eq. (8) is first averaged over different distributions of χ. This leads to a partition function, Zhetpoly(fA, fB), which is exactly the same as in Eq. (3). With the annealed approximation, we therefore derive the same equation as Eq. (6) for the CAP of a random heteropolymer interacting with a homogeneous surface, provided that fA and fB now represent the fractions of A and B monomers on the chain. In the following, we present Monte Carlo simulation data that conform to these equations, as well as results that do not conform because the approximations used in deriving the equations no longer hold.

3.
Monte Carlo Simulation Methods

In our simulations, polymer chains are modeled as SAWs with N vertices on a simple cubic lattice of dimensions 250a × 250a × 100a, where a is the lattice spacing. Each vertex represents a monomer on the polymer chain. Chain lengths studied are in the range N = 25 to 250. There is an impenetrable wall in the z = a plane representing the surface. One monomer, picked randomly from the chain, is first placed on a site adjacent to the wall (in the z = 2a plane). The rest of the chain is then grown using the biased chain insertion method.30 Monomers that are in the z = 2a plane are considered to be adsorbed on the surface. For all adsorbed monomers, an attractive polymer-surface interaction, εw, is applied. The standard chemical potential of the chain, µ0 (standard because it does not contain the translational entropy), is calculated from the Rosenbluth-Rosenbluth weighting factor, W(N), which is given by30

$$\beta\mu^{0} = -\ln \langle W(N) \rangle \quad \text{and} \quad W(N) = \prod_{i=1}^{N} \frac{1}{z} \sum_{j=1}^{z} \exp\big(-\beta E_j^{(i)}\big) \qquad (9)$$

where z is the lattice coordination number (z = 6 for the simple cubic lattice) and Ej(i) is the energy of the ith inserted monomer in the jth potential direction. We note that the µ0 calculated is the free energy per chain, and µ0/N is the free energy per monomer discussed in Eq. (2). Typically, the chemical potential is determined from about twenty million trial chain conformations. We obtained the standard chemical potential of a chain with at least one monomer attached to the surface, µads0, and compared it against that of a chain grown in bulk solution, µbulk0. The bulk solution is modeled by a 100a × 100a × 100a lattice with periodic boundary conditions applied in all three directions. All chemical potentials are reported in reduced units, with β = 1/kBT = 1. A coefficient K, analogous to the partition coefficient obtained when the chain is placed in a pore instead of near a surface, is calculated as K = exp(−∆µ0), where ∆µ0 = µads0 − µbulk0.
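The weight in Eq. (9) can be illustrated with a stripped-down Rosenbluth growth routine. The sketch below is a simplified illustration rather than the simulation code used here: it assumes the first monomer is always the anchored one (the paper anchors a randomly chosen monomer), uses lattice units with the adsorbed layer at z = 1, ignores the finite box, and uses hypothetical function names. Attraction is applied as a Boltzmann factor exp(εw) per adsorbed monomer, with β = 1.

```python
import math
import random

# Six trial directions on the simple cubic lattice (z = 6 in Eq. (9)).
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def rosenbluth_weight(n_monomers, eps_w, rng):
    """Grow one chain next to a wall and return its Rosenbluth weight W(N).

    Assumptions: sites with z < 1 are inside the wall, monomers at z == 1
    are adsorbed, and the first monomer is placed in the adsorbed layer.
    """
    pos = (0, 0, 1)
    occupied = {pos}
    weight = math.exp(eps_w)  # Boltzmann factor of the anchored monomer
    for _ in range(n_monomers - 1):
        candidates, factors = [], []
        for d in DIRECTIONS:
            trial = (pos[0] + d[0], pos[1] + d[1], pos[2] + d[2])
            # Blocked directions (wall or overlap) contribute zero weight.
            if trial[2] < 1 or trial in occupied:
                continue
            candidates.append(trial)
            factors.append(math.exp(eps_w) if trial[2] == 1 else 1.0)
        total = sum(factors)
        if total == 0.0:
            return 0.0  # dead end: the chain cannot be completed
        weight *= total / len(DIRECTIONS)
        # Pick the next site with probability proportional to its factor.
        pos = rng.choices(candidates, weights=factors, k=1)[0]
        occupied.add(pos)
    return weight


def beta_mu0(weights):
    """Standard chemical potential from Eq. (9): -ln of the mean weight."""
    return -math.log(sum(weights) / len(weights))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rosenbluth_weight(25, 0.276, rng) for _ in range(2000)]
    print(beta_mu0(sample))
```

For a two-monomer chain the weight is deterministic, exp(εw)·(4·exp(εw) + 1)/6, which gives a convenient sanity check on the bookkeeping.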
The way we determined the CAP is based on the dependence of K on the chain length N and will be presented in the results section.

Heterogeneous surfaces were modeled by making the z = a plane composed of two different types of sites, which have different values of the polymer-surface interaction. The designations εwA and εwB are used to distinguish between the interaction energies of the two site types. Simulations were performed using surfaces with different fractions of A and B sites. Surfaces were created by randomly assigning each site as A or B based on the probabilities pA and pB, where pA and pB are, respectively, the desired fractions of A and B sites on the surface. Because of the size of the surface, this procedure resulted in actual surface compositions matching the desired compositions to within 0.1%. For a given surface composition, the surface was randomly created once and was subsequently used in all simulations that determined the chemical potential of a chain above that surface. The surfaces therefore displayed quenched randomness, i.e., the surface pattern remained unchanged throughout the simulations. However, the first bead of the chain was placed randomly over the surface during chain insertion, and hence the chemical potential determined has been averaged over different local realizations of the surface randomness. Therefore, the annealed approximation used in deriving Eq. (6) was met in the simulations. In a few cases, patchy and alternating surfaces were created by simulating a two-dimensional Ising model at appropriate conditions.

Heteropolymers were modeled as SAWs consisting of two types of monomers, A and B, with specified fractions fA and fB = 1 − fA. Chains were created by randomly selecting N·fB different positions along the chain to be B beads, while the remaining beads were assigned as A beads, ensuring that the chain had the exact composition called for by fA and fB.
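The sequence generation just described, together with the order parameter of Eq. (7), can be sketched as follows. This is an illustrative sketch with hypothetical function names; rounding N·fB to the nearest integer and estimating pAB and pBA from nearest-neighbor pair counts along a single finite chain are assumptions, not details given in the text.

```python
import random


def random_sequence(n, f_b, rng):
    """Return a sequence of 'A'/'B' with exactly round(n * f_b) B beads."""
    n_b = round(n * f_b)
    beads = ["A"] * n
    # Choose n_b distinct positions along the chain to become B beads.
    for index in rng.sample(range(n), n_b):
        beads[index] = "B"
    return "".join(beads)


def order_parameter(sequence):
    """Estimate lambda = 1 - p_AB - p_BA (Eq. (7)) from one sequence.

    p_AB is the fraction of A monomers (that have a successor) followed
    by a B monomer; p_BA is defined analogously.
    """
    n_a = n_b = n_ab = n_ba = 0
    for current, following in zip(sequence, sequence[1:]):
        if current == "A":
            n_a += 1
            n_ab += following == "B"
        else:
            n_b += 1
            n_ba += following == "A"
    if n_a == 0 or n_b == 0:
        raise ValueError("both monomer types must occur before the last bead")
    return 1.0 - n_ab / n_a - n_ba / n_b


if __name__ == "__main__":
    rng = random.Random(2024)
    lambdas = [order_parameter(random_sequence(100, 0.5, rng)) for _ in range(5000)]
    print(sum(lambdas) / len(lambdas))
```

A strictly alternating sequence gives λ = −1 and a diblock sequence gives λ close to +1, while individual random sequences scatter around zero.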
The sequence order parameter, λ, of the generated random sequences exhibits a Gaussian distribution with zero mean. Examples of the distributions are presented in Figure 1. The longer the chain, the narrower the distribution. For a given chain length N, we typically generate 5000 random sequences with the specified fA. Each sequence is then used in 5000 or more biased insertions to obtain the Rosenbluth-Rosenbluth weighting factor.

[Figure 1: Distribution of sequence order parameters obtained from 5000 copies of random sequences generated with fA = fB = 0.50 for three different chain lengths. Lines are smooth fits to the data. Horizontal axis: sequence order parameter λ, from -1.0 to 1.0; recovered legend entries: N=100, N=200.]

Letting W(N, χ) stand for the Rosenbluth-Rosenbluth weighting factor obtained for a given sequence χ, the chemical potential of a chain can be obtained using two different averages over sequences:

$$\beta\mu^{0}_{ads}(N) = -\ln \big\langle W(N, \chi) \big\rangle_{\chi} \qquad (10)$$

$$\beta\mu^{0}_{ads}(N) = \big\langle \beta\mu^{0}_{ads}(N, \chi) \big\rangle_{\chi} = -\big\langle \ln W(N, \chi) \big\rangle_{\chi} \qquad (11)$$

The first approach is the annealed average, while the second is the quenched average. The two chemical potentials differ slightly from each other. More discussion of the quenched versus annealed averages will be given later. For the determination of the CAP, we have used annealed chemical potentials.

4. Results and Discussion

4.1. Method Used to Determine the Critical Adsorption Point

The method we used to determine the CAP follows our earlier papers31,32 and is briefly sketched here. We obtain the difference in standard chemical potential, ∆µ0, at different values of the surface interaction εw for a set of chains with different lengths. An example is presented in Figure 2(a) for a homopolymer above a homogeneous surface. The lines for different chain lengths N nearly intersect at a common point, which is estimated to be at εc = 0.276 ± 0.005.
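A near-common crossing of several ∆µ0(εw) curves can be located numerically by scanning εw for the value at which the curves for different N are closest together. The sketch below illustrates the idea on synthetic straight lines constructed to cross at εw = 0.28; these numbers are made up for the illustration and are not simulation data from this paper. The use of the population standard deviation and all names are assumptions.

```python
import statistics


def spread_minimum(eps_grid, delta_mu_by_length):
    """Return (eps, sigma) where the spread across chain lengths is smallest.

    eps_grid           : list of scanned surface interaction energies
    delta_mu_by_length : {N: [delta_mu at each eps in eps_grid]}
    """
    best = None
    for k, eps in enumerate(eps_grid):
        values = [curve[k] for curve in delta_mu_by_length.values()]
        sigma = statistics.pstdev(values)
        if best is None or sigma < best[1]:
            best = (eps, sigma)
    return best


# Synthetic curves (hypothetical): steeper for longer chains, all passing
# through the same point (eps = 0.28, delta_mu = -0.1).
eps_grid = [round(0.20 + 0.01 * k, 2) for k in range(15)]
slopes = {25: -2.0, 50: -4.0, 100: -8.0, 200: -16.0}
curves = {
    n: [-0.1 + slope * (eps - 0.28) for eps in eps_grid]
    for n, slope in slopes.items()
}
eps_cc, sigma_cc = spread_minimum(eps_grid, curves)
```

Note that the crossing in this toy example sits at a non-zero ∆µ0, so the minimum-spread criterion and the condition ∆µ0 = 0 are distinct ways of defining a critical point. With real data the resolution is limited by the εw increment scanned.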
A convenient way to identify this intersection point is to plot the standard deviation of ∆µ0 over the range of chain lengths studied, σ(∆µ0), versus εw; the intersection appears as a minimum in the plot, as shown in Figure 2(b). The minimum identified is directly related to the critical condition point employed in liquid chromatography at the critical condition (LCCC).32-34 In LCCC, the critical condition is defined as the co-elution point of homopolymers with different molecular weights, which in computer simulation corresponds to the point where K has the least dependence on chain length. If K were truly independent of chain length, σ(∆µ0) would be zero at the minimum in Figure 2(b). The critical condition point bracketed in this fashion depends slightly on the range of chain lengths included in the calculation of σ(∆µ0); in the current study, we fixed the range of chain lengths used.

Since this common intersection point does not occur at ∆µ0 = 0, one may wonder whether it is the critical adsorption point discussed in the literature. We have applied the same method to random walks above a planar surface on the simple cubic lattice.31 The intersection point found was εc = 0.183 ± 0.002, in excellent agreement with the expected CAP for random walks, εc = −ln(5/6) = 0.1823 (ref 1). On the other hand, the CAP can be understood as the point where the limiting monomer free energy of a chain attached to the surface, f(ε), equals the limiting monomer free energy of an unattached chain in bulk solution. Therefore, we may define a CAP at a finite chain length, εc(N), at which ∆µ0(N) = 0. From Figure 2(a), we extract such εc(N). This εc(N) is expected to depend on N through a scaling law, εc(N) = εc(∞) − αN^(−φ), where εc(∞) is the CAP in the limit of infinite chain length. Assuming φ = 0.5, Figure 3 shows the linear fit of εc(N) versus N^(−0.5), which yields εc(∞) = 0.274 ± 0.005. The εc(∞) identified is within the error bars of the common intersection point.

[Figure 2: (a) Plot of ∆µ0 versus εw for SAW chains with N = 25, 50, 100, and 200 above a homogeneous surface. The critical adsorption point is identified as the common intersection point, εw(cc) = 0.276 ± 0.005. (b) Plot of the standard deviation in ∆µ0 over the given range of N versus εw. The minimum in the plot is the critical adsorption point. Horizontal axes: εw from 0.18 to 0.34.]

[Figure 3: Plot of εc(N) versus N^(−0.5), where εc(N) is extracted from Figure 2(a) as the point at which ∆µ0(N) = 0. The extrapolated εc(∞) = 0.274 ± 0.005. Horizontal axis: N^(−0.5) from 0.00 to 0.25.]

The CAP of SAWs on the simple cubic lattice has been studied by others.6-8 The reported literature values range from ~0.37 by Ma et al.35 down to 0.288 ± 0.02 by Janse van Rensburg and Rechnitzer.8 The value reported by Ma et al. is considered to be too high, probably because the chains analyzed were too short. The methods used to determine the CAP vary in the literature. Meirovitch and Livne6 obtained the CAP for SAWs on the simple cubic lattice from Monte Carlo simulations with the scanning method. They plotted E(T)/N against N and found the exponent α in E(T)/N ~ N^α over three different ranges of chain length (N = 20-60, 60-170, and 170-350). The critical point was then located by finding the value of the reciprocal temperature Θ that resulted in the exponent α being constant across the three ranges of chain length. Their reported Θc, which is equivalent to our εc, was 0.291 ± 0.001. Their method for determining Θc was based on the scaling theory developed by Eisenriegler, Kremer, and Binder (EKB).4 According to this theory, at the CAP, E(T)/N is expected to scale as N^(φ−1), where φ is the crossover exponent. The value of this crossover exponent has been debated. EKB first showed that φ ≈ ν ≈ 0.59, where ν is the Flory exponent.
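The extrapolation εc(N) = εc(∞) − αN^(−φ) amounts to a straight-line fit of εc(N) against N^(−φ), so the extrapolated εc(∞) depends on the exponent assumed. The sketch below shows this with an ordinary least-squares fit on synthetic values generated with φ = 0.5; the data, the value of α, and the function names are hypothetical and serve only to illustrate the procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def extrapolate_cap(lengths, eps_c, phi):
    """Fit eps_c(N) = eps_c(inf) - alpha * N**(-phi); return (eps_c(inf), alpha)."""
    xs = [n ** (-phi) for n in lengths]
    slope, intercept = fit_line(xs, eps_c)
    return intercept, -slope


# Synthetic finite-size values (hypothetical), generated with phi = 0.5.
lengths = [25, 50, 100, 200]
synthetic = [0.274 - 0.5 * n ** -0.5 for n in lengths]

eps_inf_half, alpha_half = extrapolate_cap(lengths, synthetic, 0.5)
eps_inf_ekb, _ = extrapolate_cap(lengths, synthetic, 0.59)
```

Fitting the same values with φ = 0.59 instead of the generating exponent φ = 0.5 shifts the extrapolated εc(∞), which is one way to see why a CAP estimate that relies on φ inherits the uncertainty in φ.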
Several recent reports suggest that φ = 0.5 even for SAW chains, the same as φ for random walks.8,36 In Meirovitch and Livne's study, φ was left as an adjustable parameter. The φ value reported in their study was 0.530 ± 0.007, slightly larger than the recently reported value of φ = 0.5. If we were to take φ = 0.5, their data would suggest a lower Θc. Recently, Descas et al.9 explored four different ways to determine the CAP, mostly based on the scaling idea. They found that a slight change in εc leads to large deviations in the resulting φ. Therefore, simultaneous determination of εc and φ may not give the true location of the CAP. Janse van Rensburg and Rechnitzer8 studied the CAP for SAWs in two and three dimensions using a variety of methods, including the energy ratios of walks of different lengths and the specific heats of the chains. They found that analysis of the specific heat data in three dimensions was fraught with difficulty. The energy ratios of walks of different lengths and the free energy method yielded values of εc that agreed within the error bars. They reported a CAP of εc = 0.288 ± 0.020 and a crossover exponent φ = 0.5005 ± 0.0036. Our CAP is within the error bars of their reported value. Interestingly, when they assumed that the convergence of the energy ratios of different chain lengths is proportional to N^(−1), they obtained εc = 0.276 ± 0.029, the same central value as in our study.

The above discussion suggests that the critical condition determined with our approach is the CAP. Our approach does not depend on knowledge of φ and therefore does not suffer from the uncertainty in εc that arises when both εc and φ need to be determined simultaneously. In the remainder of the paper, we use this method to determine the CAP of SAWs above a planar heterogeneous surface and of heteropolymer SAWs above a planar homogeneous surface.

4.2.
Homopolymers above Heterogeneous Surfaces with Attractive and Non-Interacting Sites

Here we consider adsorption of homopolymers above a heterogeneous surface. The first type of heterogeneous surface studied is composed of two types of sites. One type, which will be called A sites, does not interact with the polymer chains; that is, εwA = 0. The other type, the B sites, has an attractive interaction with the polymer chains, εwB. The value of εwB was varied to locate the CAP. Figure 4 shows a plot of the standard deviation in β∆µ0 over all chain lengths for each value of εwB scanned, for a surface with 50% attractive sites. The minimum in the standard deviation occurs at εwB(cc) = 0.49 ± 0.01, where the error is based on the energy increment scanned. The same method was used to determine the CAP for surfaces with 10%, 15%, 20%, 25%, and 75% attractive sites. Table 1 summarizes the CAP of homopolymers over heterogeneous surfaces, along with the value over a homogeneous surface. Figure 5 presents the plot of the CAP, εwB(cc), as a function of fB, along with the theoretical prediction according to Eq. (6) with εwA = 0 and εwh(cc) = 0.276. Good agreement between Eq. (6) and the simulation data is observed. We also note that the CAP does not depend linearly on fB over the entire range but is well described by Eq. (6). The earlier study by Sumithra and Baumgaertner13 focused on surfaces with fB above the percolation threshold. Within that limited range of fB, a linear dependence may be obtained. This study is the first to confirm the dependence of the CAP on surface disorder over a wide range of fB.

[Figure 4: Plot of the standard deviation in ∆µ0 against εwB for a homopolymer chain adsorbing on a surface with 50% attractive sites and 50% non-interacting sites. The CAP occurs at εwB = 0.49 ± 0.01. Horizontal axis: εwB from 0.44 to 0.56.]

As discussed in the theory section, one of the assumptions used in deriving Eq.
(6) is that the interacting surface sites are randomly distributed. We have tested this assumption by studying adsorption of homopolymers over surfaces with 50% B sites arranged in alternating and patchy patterns. For a surface with 50% A and 50% B sites, an order parameter O.P. can be defined (readers are referred to the literature for the definition).19 If O.P. = 0, the surface is random; if O.P. = +1, the surface is patchy; and if O.P. = −1, the surface is alternating. The data are included in Table 1 and are indicated in Figure 5. The two points deviate from the line described by Eq. (6). The CAP obtained over a 50% alternating surface is larger than that over a 50% random surface. On the other hand, the CAP obtained over a 50% patchy surface is smaller than that over a 50% random surface. These results can be easily understood. When a chain is adsorbed on the surface, it forms trains, loops, and tails.1 Formation of trains lowers the energy of the chain, compensating for the entropy loss during adsorption. When a chain is in contact with an alternating surface, however, it is difficult to form trains because no two adsorbing sites are adjacent, whereas this is possible for random and patchy surfaces. Therefore, the attraction of the chain to the alternating surface is lessened, and adsorption over a 50% alternating surface has to occur at a larger value of εw. On the other hand, a chain over a patchy surface can selectively sample the patches of the surface composed of adsorbing sites, so adsorption over a patchy surface can occur at a smaller value of εw.

[Figure 5: Plot of the CAP, εwB(cc), against the percent of attractive B sites, fB. The symbols are the CAP determined by simulation, and the solid line is from Eq. (6) with εwA = 0.0 and εwh(cc) = 0.276. Circles are the CAP over random surfaces, the cross (×) is the CAP over a strictly alternating surface, and the upper triangle (∆) is the CAP over a patchy surface with O.P. = +0.94. Horizontal axis: percent B sites, 0 to 100.]

Another assumption used in deriving Eq.
(6) is the annealed approximation. This approximation is strictly met if the surface pattern in contact with the chain changes during chain adsorption,11 so that averaging over different distributions can be performed as in Eq. (3). The surface in this case is said to contain annealed randomness. If the surface pattern cannot change, the surface is said to contain quenched randomness. In our simulations, the surface contains quenched randomness. In fact, we have used only one realization of a quenched random surface. However, the chain was placed randomly over different surface sites, making the annealed approximation applicable to our simulations. We note that Sumithra and Baumgaertner,13 in their studies, averaged over 50 different realizations of quenched randomness and compared the results with those of a single surface realization. They did not find a major difference between the two approaches, especially at high temperature. Moghaddam and Whittington16 investigated the difference between the quenched average and the annealed average for homopolymer adsorption on a heterogeneous surface and for random copolymer adsorption on a homogeneous surface. Their data show no difference between the two averages in the case of adsorption on random surfaces, but there were differences for adsorption of random copolymers, especially at low temperature. It has been argued that quenched and annealed averages are equivalent in cases where the quenched surface is large in comparison with the polymer.11,15 Polotsky et al.18 have also found that the CAPs for quenched and annealed surface disorder are the same. In our simulations, the surface is large in comparison with the size of the polymer, and the attachment of the polymer to the surface occurs at many random places on the surface. Therefore, the chain can effectively interact with many different random arrangements of surface sites, and the system approaches the annealed average.

4.3.
Homopolymers above Heterogeneous Surfaces with All Sites Interacting

In order to assess whether the equation derived for the CAP of random surfaces is valid in more general cases, random surfaces on which all sites interact with the chain were prepared. For these surfaces, the polymer-surface interaction for the A sites, εwA, was set at a relatively weak attractive strength, 0.10, and the interaction for the B sites was varied to find the CAP. Additionally, surfaces with repulsive A sites (εwA = −0.10) were investigated.

[Figure 6: Plot of the critical adsorption point, εwB(cc), against the percent of attractive B sites for surfaces with attractive or repulsive A sites. The dashed line and open symbols are for surfaces with slightly repulsive A sites, εwA = −0.10. The solid line and closed symbols are for surfaces with slightly attractive A sites, εwA = +0.10. The symbols are simulation results, while the lines are from Eq. (6) with the corresponding εwA values. Horizontal axis: percent B sites, 0 to 100.]

Figure 6 shows the values of εwB(cc) determined for these two cases, as well as the prediction of εwB(cc) from Eq. (6) given the values of fB and εwA used in the simulations. As can be seen in the figure, there is good agreement between the data and the equation, indicating that the equation is valid for many different types of surfaces, not just surfaces with attractive and non-interacting sites.

4.4. Random Copolymers above Homogeneous Surfaces

Critical adsorption points for random copolymers adsorbing on homogeneous surfaces were also determined. In these systems, polymer chains are composed of two different types of monomers, A and B, interacting with a surface composed of only one type of site. B monomers are attracted to the surface, while A monomers do not interact with the surface, i.e., εwA = 0.
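For random copolymers, the chemical potential has to be averaged over sequences, either as the annealed average of Eq. (10) or as the quenched average of Eq. (11). The sketch below contrasts the two on a small set of made-up per-sequence weights W(N, χ); the weights and function names are hypothetical. Because the logarithm is concave, the annealed value can never exceed the quenched one (Jensen's inequality).

```python
import math


def annealed_beta_mu(weights):
    """Eq. (10): minus the log of the sequence-averaged weight."""
    return -math.log(sum(weights) / len(weights))


def quenched_beta_mu(weights):
    """Eq. (11): sequence average of minus the log of each weight."""
    return -sum(math.log(w) for w in weights) / len(weights)


# Hypothetical per-sequence Rosenbluth weights, for illustration only.
weights = [0.5, 1.0, 2.0]
annealed = annealed_beta_mu(weights)
quenched = quenched_beta_mu(weights)
```

The two averages coincide only when every sequence has the same weight; the wider the distribution of W(N, χ) across sequences, the larger the gap between them.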
Table 2 shows the values of the CAP, εwB(cc), for various values of fB, along with results obtained for homopolymers, alternating copolymers, and block copolymers. Here we have used annealed chemical potentials to determine the CAP. Figure 7 presents the plot of εwB(cc) as a function of fB, along with the theoretical prediction according to Eq. (6) with εwA = 0 and εwh(cc) = 0.276. The data fit the equation well in situations where the sequences are random. However, as with homopolymer adsorption on heterogeneous surfaces, the equation does not apply when the chain sequence is not random. For a diblock copolymer, where the first half of the chain is all A monomers and the second half is all B monomers, a weaker attraction is required to reach the CAP than for a random 50% copolymer chain. An alternating copolymer requires a slightly stronger attraction to reach the CAP. Again, these results can be explained by considering the tendency to form trains during adsorption. The diblock copolymer is a homogeneous string of adsorbing B monomers attached to a string of A monomers. The B section of the chain is able to interact with the surface like a homopolymer chain, while the A section does not adsorb and slightly repels the chain from the surface, so the value of εwB(cc) for a diblock chain should be similar to that of a homopolymer on a homogeneous surface. In fact, εwB(cc) = 0.30 for diblock copolymers, a value only slightly higher than that for homopolymer adsorption and much lower than εwB(cc) for a 50% random copolymer chain. For an alternating chain, consecutive attractive interactions are not possible, so a stronger εwB(cc) is needed than for a random chain.

Finally, we compare the chemical potential determined with the annealed approximation with that from the quenched average. We found that the chemical potential of a random copolymer above the surface, µ0ads, obtained via the annealed average in Eq.
(10) was smaller than the quenched average in Eq. (11). This has been suggested in the literature.23 The annealed approximation implies that the chain sequence can change when the chain interacts with the surface. As a result, the chemical potential is lowered compared with that of a chain with a fixed sequence.

[Figure 7: Plot of the CAP, εwB(cc), of copolymers over a homogeneous surface against the percent of attractive B monomers, fB. The symbols are the CAP determined by simulation, and the solid line is the plot according to Eq. (6) with εwA = 0.0 and εwh(cc) = 0.276. Circles are the CAP of random copolymers, the cross (×) is the CAP of block copolymers, and (∆) is the CAP of alternating copolymers. Horizontal axis: percent B monomers, 0 to 100.]

Figure 8 shows the distribution of µ0ads, obtained from trial insertions of a given random sequence, against the sequence order parameter λ. As discussed in Section 3, a generated random sequence may not correspond exactly to λ = 0, resulting in a distribution of µ0ads against λ. Figure 8 shows that, within the range of λ spanned by random sequences, the chemical potential depends on λ: µ0ads is higher for negative λ and lower for positive λ. This is consistent with the results in Table 2. A negative λ implies that the random copolymer chain exhibits statistically alternating behavior. A higher µ0ads implies that the chain is more difficult to adsorb on the surface; therefore, it needs a stronger attraction to reach the CAP.

[Figure 8: The distribution of µ0ads versus the sequence order parameter λ of a random copolymer. Each data point represents one µ0ads based on 5000 insertions of one given random sequence, and the figure contains data for 5000 random sequences. Chain length N = 100, fA = fB = 0.5, εwA = 0.0, and εwB = -0.5. Horizontal axis: λ from -0.4 to 0.4.]

5.
Summary Remarks

Polymer adsorption at surfaces is relevant to many practical applications and has thus received extensive experimental investigation. However, interest in the CAP has, until recently, remained largely a theoretical exercise. There were neither experimental methods that directly measured the CAP nor applications that depended on its exact location. This has now changed as interesting applications in liquid chromatography separations have been developed.37,38 In particular, liquid chromatography at the critical condition (LCCC), first reported in the 1980s, is now widely used for the characterization of polymer systems that contain structural and chemical heterogeneities. The critical condition in LCCC experiments is defined as the point where homopolymers of a specific type co-elute regardless of their molecular weights. By erasing the dependence of elution on the molecular weight of one species, other species, differing either chemically or structurally, can then be analyzed. Experimentalists39 have mostly regarded this critical condition as the CAP. Our earlier Monte Carlo simulations largely support this view.31-34 The current study provides knowledge of the dependence of the CAP on sequence disorder or surface disorder, and such knowledge will be useful in developing chromatographic methods for analyzing random copolymers.

We note that several earlier studies13,14,16,17 have examined the adsorption of polymers in the presence of either surface disorder or sequence disorder. These studies examined the influence of disorder on a variety of properties related to polymer adsorption, such as the heat capacity, the energy of the chain, and the radius of gyration of the chain. Very few, however, have tried to determine the dependence of the CAP on the disorder. One possible reason is the lack of a convenient way to determine the CAP.
As we have discussed in the theory section, the CAP is typically understood as the phase transition of an infinitely long chain near a surface. Earlier studies trying to determine the CAP had to wrestle with the difficulty of extrapolating results to the limit of an infinitely long chain. On the other hand, the validity of our study hinges on the way we determine the CAP. For the adsorption of homopolymers over a homogeneous surface, we discussed the relationship between the CAP determined by our method and reported literature values, and evidence supporting the validity of our approach was presented in Section 4.1. However, for adsorption over a heterogeneous surface, the nature of this CAP is not well understood. Can a long chain in contact with a surface with few adsorbing sites still exhibit a phase transition similar to that of homopolymers over a homogeneous surface? If it does, is the transition first-order or second-order? These questions may cast some doubt on the CAP determined by our approach in the presence of disorder. However, the CAP we determined is directly related to the critical condition point in LCCC. Hence, even though the physical meaning of the CAP determined in this study in the presence of disorder could be subjected to further scrutiny, the importance of our results is not undermined.

Table 1: Critical Adsorption Point for Homopolymers above Heterogeneous Surfaces with Attractive B Sites and Non-interacting A Sites.
Percentage of Attractive Sites        εwB(cc)
100%                                  0.276 ± 0.005
75%                                   0.35 ± 0.01
50%                                   0.49 ± 0.01
25%                                   0.82 ± 0.01
20%                                   0.96 ± 0.01
15%                                   1.15 ± 0.01
10%                                   1.45 ± 0.01
50% alternating surface               0.55 ± 0.01
50% patchy surface (O.P. = 0.94)      0.31 ± 0.01

Table 2: Critical Adsorption Point for Heteropolymers with Attractive B Monomers and Non-interacting A Monomers over Homogeneous Surface

Percentage of B monomers              εwB(cc)
100%                                  0.276 ± 0.005
75%                                   0.36 ± 0.01
50%                                   0.49 ± 0.01
25%                                   0.84 ± 0.01
15%                                   1.16 ± 0.01
50% alternating copolymers            0.55 ± 0.01
50% block copolymers                  0.30 ± 0.01

References:
(1) Fleer, G. J.; Cohen Stuart, M. A.; Scheutjens, J. M. H. M.; Cosgrove, T.; Vincent, B. Polymers at Interfaces; Chapman & Hall: London, UK, 1993.
(2) O'Shaughnessy, B.; Vavylonis, D. J. Phys.: Condens. Matter 2005, 17, R63-R99.
(3) De Gennes, P. G. Scaling Concepts in Polymer Physics; Cornell Univ. Press: Ithaca, 1979.
(4) Eisenriegler, E.; Kremer, K.; Binder, K. J. Chem. Phys. 1982, 77, 6296-6320.
(5) Ishinabe, T. J. Chem. Phys. 1982, 77, 3171-3176.
(6) Meirovitch, H.; Livne, S. J. Chem. Phys. 1988, 88, 4507-4515.
(7) Livne, S.; Meirovitch, H. J. Chem. Phys. 1988, 88, 4498-4506.
(8) van Rensburg, E. J. J.; Rechnitzer, A. R. J. Phys. A: Math. Gen. 2004, 37, 6875-6898.
(9) Descas, R.; Sommer, J.-U.; Blumen, A. J. Chem. Phys. 2004, 120, 8831-8840.
(10) Balazs, A. C.; Huang, K.; McElwain, P.; Brady, J. E. Macromolecules 1991, 24, 714-717.
(11) Wu, D.; Hui, K.; Chandler, D. J. Chem. Phys. 1992, 96, 835-841.
(12) Golumbfskie, A. J.; Pande, V. S.; Chakraborty, A. K. Proc. Natl. Acad. Sci. 1999, 96, 11707-11712.
(13) Sumithra, K.; Baumgaertner, A. J. Chem. Phys. 1998, 109, 1540-1544.
(14) Sumithra, K.; Baumgaertner, A. J. Chem. Phys. 1999, 110, 2727-2731.
(15) Chakraborty, A. K. Phys. Rep. 2001, 342, 1-61.
(16) Moghaddam, M. S.; Whittington, S. G. J. Phys. A: Math. Gen. 2002, 35, 33-42.
(17) Moghaddam, M. S. J. Phys. A: Math. Gen. 2003, 36, 939-949.
(18) Polotsky, A.; Schmid, F.; Degenhard, A. J. Chem. Phys.
2004, 121, 4853-4864.
(19) Jayaraman, A.; Hall, C. K.; Genzer, J. Phys. Rev. Lett. 2005, 94, 078103.
(20) Bogner, T.; Degenhard, A.; Schmid, F. Phys. Rev. Lett. 2004, 93, 268108.
(21) Genzer, J. J. Chem. Phys. 2001, 115, 4873-4881.
(22) Genzer, J. Macromol. Theory Simul. 2002, 11, 481-493.
(23) Soteros, C. E.; Whittington, S. G. J. Phys. A: Math. Gen. 2004, 37, R279-R325.
(24) Sumithra, K.; Sebastian, K. L. J. Phys. Chem. 1994, 98, 9312-9317.
(25) Sebastian, K. L.; Sumithra, K. Phys. Rev. E 1993, 47, R32-R35.
(26) Muthukumar, M. J. Chem. Phys. 1995, 103, 4723-4731.
(27) Bratko, D.; Chakraborty, A. K.; Shakhnovich, E. I. Chem. Phys. Lett. 1997, 280, 46-52.
(28) Hammersley, J. M.; Torrie, G. M.; Whittington, S. G. J. Phys. A: Math. Gen. 1982, 15, 539-571.
(29) Arteca, G. A.; Zhang, S. Phys. Rev. E 1998, 58, 6817-6820.
(30) Frenkel, D.; Smit, B. Understanding Molecular Simulation: From Algorithms to Applications; Academic Press: San Diego, CA, 2002.
(31) Gong, Y.; Wang, Y. Macromolecules 2002, 35, 7492-7498.
(32) Orelli, S.; Jiang, W.; Wang, Y. Macromolecules 2004, 37, 10073-10078.
(33) Jiang, W.; Khan, S.; Wang, Y. Macromolecules 2005, 38, 7514-.
(34) Ziebarth, J.; Orelli, S.; Wang, Y. Polymer 2005, 46, 10450-10456.
(35) Ma, L.; Middlemiss, K. M.; Torrie, G. M.; Whittington, S. G. J. Chem. Soc., Faraday Trans. II 1978, 74, 721-726.
(36) Metzger, S.; Müller, M.; Binder, K.; Baschnagel, J. Macromol. Theory Simul. 2002, 11, 985-995.
(37) Pasch, H.; Trathnigg, B. HPLC of Polymers; Springer-Verlag: Berlin, Heidelberg, 1999.
(38) Chang, T. J. Polym. Sci. B 2005, 43, 1591-1607.
(39) Macko, T.; Hunkeler, D. Adv. Polym. Sci. 2003, 163, 61-136.

[Graphic to be used for the Table of Contents; horizontal axis: Percent B sites]

ABSTRACT

The critical adsorption point (CAP) of self-avoiding walks (SAW) interacting with a planar surface with surface disorder or sequence disorder has been studied.
We present theoretical equations, based on ones previously developed by Soteros and Whittington (J. Phys. A: Math. Gen. 2004, 37, R279-R325), that describe the dependence of the CAP on the disorder, along with Monte Carlo simulation data that agree with the equations. We also show simulation results that deviate from the equations when the approximations used in the theory break down. Such knowledge is the first step toward understanding the correlation of surface disorder and sequence disorder during polymer adsorption. <|endoftext|><|startoftext|>
Coherent control of atomic tunneling

John Martin and Daniel Braun
Laboratoire de Physique Théorique, IRSAMC, UMR 5152 du CNRS, Université Paul Sabatier, Toulouse, FRANCE

We study the tunneling of a two-level atom in a double well potential while the atom is coupled to a single electromagnetic field mode of a cavity. The coupling between internal and external degrees of freedom, due to the mechanical effect on the atom from photon emission into the cavity mode, can dramatically change the tunneling behavior. We predict that in general the tunneling process becomes quasiperiodic. In a certain regime of parameters a collapse and revival of the tunneling occurs. Accessing the internal degrees of freedom of the atom with a laser allows one to coherently manipulate the atom position, and in particular to prepare the atom in one of the two wells. The effects described should be observable with atoms in an optical double well trap.

PACS numbers: 73.40.Gk, 37.30.+i

I. INTRODUCTION

The tunneling effect is considered one of the hallmarks of quantum mechanical behavior. Historically, tunneling was first examined for single particles (e.g. α particles [1], electrons in field emission [2] and later in mesoscopic circuits [3]), for Cooper pairs [4], and for molecular groups [5, 6, 7]. Recently the tunneling of atoms has attracted substantial attention [8, 9, 10, 11].
Dynamical (chaos assisted) tunneling of ultracold atoms between different islands of stability in phase space was analyzed in [12, 13] and has been observed experimentally [14, 15]. Resonantly enhanced tunneling of atoms between wells of a tilted optical lattice has also been observed very recently [16]. In all of these examples, the atoms have been considered internally inert, and only the center-of-mass coordinate of the atom was of interest. In [17] it was shown that by taking into account the internal degrees of freedom of atoms, an atom/optical double well potential could be created in which tunneling atoms see their internal and external states correlated (such an effect is also known from other contexts [18]). Mechanical effects of light in optical resonators were also investigated in [19], but no tunneling was considered.

Here we show that the tunneling effect can be drastically modified if an internal transition of the atom is coupled to a single electromagnetic mode in a cavity, such that photon emission is a reversible and coherent process. The resulting Rabi oscillations between states with the excitation in the atom and states with a photon in the cavity modulate the periodic tunneling motion. Depending on the frequencies involved, a rich quasi-periodic behavior can result. If the cavity is fed with a coherent state, collapse and revival of the tunneling effect can occur. Moreover, we show that one may profit from access to the internal degrees of freedom of the atom (e.g. with a laser) to control the atomic motion in the external potential.

FIG. 1: (Color online) Two-level atom in a double well potential interacting with a standing wave inside a cavity.

II. MODEL

A. Derivation of the Hamiltonian

Consider a trapped two-level atom (with levels $|g\rangle$, $|e\rangle$ of energy $\mp\hbar\omega_0/2$ respectively) interacting with a standing wave (with wave number $k$ and frequency $\omega$) inside a cavity as illustrated in Fig. 1.
The atom is assumed to be bound in the $y$-$z$ plane at the equilibrium position $y = z = 0$ and to experience a symmetric double well potential $V(x)$ along the $x$ direction. We denote by $\Delta$ the tunnel splitting, i.e. the energy spacing between the two lowest energy states (the symmetric $|-\rangle$ and antisymmetric $|+\rangle$ states) of this double well potential. Below we also allow the trapped atom to interact resonantly with an external laser. The Hamiltonian of this system is given by

$$H = H_A + H_F + H_{AF}, \tag{1}$$

where $H_A = H_A^{\rm ex} + H_A^{\rm in}$ is the Hamiltonian of the trapped atom, $H_F$ is the Hamiltonian of the free field and $H_{AF}$ is the interaction Hamiltonian describing the atom-field interaction. We have

$$H_A^{\rm ex} = \frac{p_x^2}{2m} + V(x), \quad H_A^{\rm in} = \frac{\hbar\omega_0}{2}\,\sigma_z^{\rm in}, \quad H_F = \hbar\omega\, a^\dagger a, \quad H_{AF} = -\mathbf{d}\cdot\mathbf{E}, \tag{2}$$

where $\mathbf{d}$ denotes the atomic dipole,

$$\mathbf{E} = \mathcal{E}_\omega\, \boldsymbol{\varepsilon}\,(a + a^\dagger)\sin(k(x - x_0)) \tag{3}$$

is the electric field operator, with $\mathcal{E}_\omega = \sqrt{\hbar\omega/(\epsilon_0 V)}$, where $\epsilon_0$ is the permittivity of free space, $V$ the electromagnetic mode volume, $x_0$ the abscissa at the left cavity mirror ($x_0 < 0$), and $\boldsymbol{\varepsilon}$ the electric field polarization vector. We have introduced the operators $\sigma_i^{\rm in}$ (resp. $\sigma_i^{\rm ex}$) for $i = x, y, z$ as the Pauli spin operators in the basis $\{|e\rangle, |g\rangle\}$ (resp. $\{|+\rangle, |-\rangle\}$). The operator $x$ stands for the center-of-mass position of the atom, $p_x$ is the conjugate momentum along the $x$ axis, $m$ denotes the atomic mass, and $a$ ($a^\dagger$) the annihilation (creation) operator of the cavity radiation field.

http://arxiv.org/abs/0704.0763v2

We adopt the two-level approximation, which consists of taking into account only the two lowest motional energy states. This requires the Rabi frequency $\sqrt{4g^2 + \delta^2}$ (with $\delta = \omega - \omega_0$ the detuning between the cavity field and the atomic transition frequencies) to be much smaller than the frequency gap $\tilde\Delta$ between the upper motional states and the ground state doublet (see Fig. 1). Within this approximation, the Hamiltonian $H_A^{\rm ex}$ becomes

$$H_A^{\rm ex} = \frac{\hbar\Delta}{2}\,\sigma_z^{\rm ex} \tag{4}$$

and the position operator takes the form $x = \frac{b}{2}\,\sigma_x^{\rm ex}$ with $b/2 = \langle +|x|-\rangle$.
We can form states that are mainly concentrated in the left/right wells,

$$|L\rangle = (|+\rangle - |-\rangle)/\sqrt{2}, \qquad |R\rangle = (|+\rangle + |-\rangle)/\sqrt{2}.$$

The average position of a particle localized in the right well is then given by $b/2$ (see Fig. 1) and $\sigma_x^{\rm ex} = |R\rangle\langle R| - |L\rangle\langle L|$. The interaction Hamiltonian $H_{AF}$ can then be written

$$H_{AF} = -\hbar g\,(a + a^\dagger)\left[\sin\chi\cos\kappa\;\sigma_x^{\rm in} - \cos\chi\sin\kappa\;\sigma_x^{\rm ex}\sigma_x^{\rm in}\right] \tag{5}$$

with the atom-field coupling strength $g = -\langle e|\mathbf{d}|g\rangle\cdot\boldsymbol{\varepsilon}\,\mathcal{E}_\omega/\hbar$, and

$$\chi = k x_0, \qquad \kappa = k b/2. \tag{6}$$

For long wavelengths ($\kappa \ll 1$), or $\kappa = n\pi$ with integer $n$, the left and right sites of the double well are indistinguishable to the cavity photon and $H_{AF}$ reduces to the Jaynes-Cummings Hamiltonian without rotating wave approximation (with a sine-varying coupling constant), $-\hbar g \sin\chi\,(a + a^\dagger)\,\sigma_x^{\rm in}$. Note that $\kappa \ll 1$ would normally be identified with the Lamb-Dicke regime. Here the situation is more subtle, as the level spacings between the tunneling split ground state doublet and the next excited states can be very different, such that the recoil energy $\hbar\omega_{\rm recoil}$ satisfies $\Delta \ll \omega_{\rm recoil} \ll \tilde\Delta$. One may thus be in the Lamb-Dicke regime concerning transitions to higher vibrational states but have a significant mechanical effect on the atomic tunneling. Furthermore, since there is only one photon mode, the recoil energy cannot vary continuously, and exciting higher vibrational levels requires $\omega_{\rm recoil}$ close to a level spacing. Our numerical calculations show that even for $\kappa \sim 1$ the two-level approximation can still work very well (see Fig. 4).

For $\delta, \Delta \ll \omega, \omega_0$, a rotating wave approximation is justified, which consists in eliminating the energy non-conserving terms $a\,\sigma_\pm^{\rm ex}\sigma_-^{\rm in}$ and $a^\dagger\sigma_\pm^{\rm ex}\sigma_+^{\rm in}$ with $\sigma_+^{\rm in} = |e\rangle\langle g|$, $\sigma_-^{\rm in} = (\sigma_+^{\rm in})^\dagger$ and $\sigma_+^{\rm ex} = |+\rangle\langle -|$, $\sigma_-^{\rm ex} = (\sigma_+^{\rm ex})^\dagger$.
Within this approximation, the total Hamiltonian reads

$$H = \frac{\hbar\Delta}{2}\,\sigma_z^{\rm ex} + \frac{\hbar\omega_0}{2}\,\sigma_z^{\rm in} + \hbar\omega\, a^\dagger a + \hbar g\,(a\,\sigma_+^{\rm in} + a^\dagger\sigma_-^{\rm in})\left[\cos\chi\sin\kappa\;\sigma_x^{\rm ex} - \sin\chi\cos\kappa\;\mathbb{1}^{\rm ex}\right]. \tag{7}$$

Thus, depending on the parameters $\chi$ and $\kappa$, the cavity photon may induce internal transitions in the atom only ($\cos\chi\sin\kappa = 0$), or induce transitions between internal and external states at the same time ($\cos\chi\sin\kappa \neq 0$), even for a vanishing detuning ($\delta = \omega - \omega_0 = 0$). This is in contrast to conventional sideband transitions of harmonically bound atoms or ions in the Lamb-Dicke regime, which require an appropriate value of the detuning. For a fixed potential center (and thus fixed $\chi$), $\kappa$ can be changed through a modulation of the well-to-well separation $b$. We will neglect in the following the effects of decoherence, which means that not only $g$ but also $\Delta$ should be much larger than the rate of spontaneous emission $\Gamma$ and the cavity decay rate $\kappa_{\rm cav}$. We denote the global state of the atom-field system by $|n, i, j\rangle \equiv |n\rangle \otimes |i\rangle \otimes |j\rangle$, where $|n\rangle$ stands for the cavity field eigenstates, $|i\rangle \in \{|-\rangle, |+\rangle\}$ for the external motional states, and $|j\rangle \in \{|g\rangle, |e\rangle\}$ for the internal states. The total excitation number $N$ is given by $a^\dagger a + \sigma_+^{\rm in}\sigma_-^{\rm in}$.

B. Energy levels

The states $|0, \pm, g\rangle$ are eigenstates of $H$ with eigenvalue $(-\hbar\omega_0 \pm \hbar\Delta)/2$, i.e. these states remain uncoupled and represent the two lowest energy states in the regime $\delta, \Delta \ll \omega, \omega_0$. It is straightforward to verify that the Hamiltonian (7) only induces transitions between states with the same number of excitations $N$, $\{|N-1, +, e\rangle, |N, +, g\rangle, |N-1, -, e\rangle, |N, -, g\rangle\} \equiv \{|1\rangle, |2\rangle, |3\rangle, |4\rangle\}$. It is therefore sufficient to solve the dynamics in this subspace. In doing so, we obtain the eigenvalues of $H$,

$$\lambda_{\rho\mu} = (N - 1/2)\,\hbar\omega + \rho\,\frac{\hbar\,\Omega_\mu}{2}, \tag{8}$$

for $\rho, \mu \in \{\pm\}$, $N = 1, 2, \ldots$, and with

$$\Omega_\pm = \sqrt{2Ng^2\,(1 - \cos(2\kappa)\cos(2\chi)) + \delta^2 + \Delta^2 \pm 2\Omega^2}, \tag{9}$$

$$\Omega^2 = \sqrt{4Ng^2\cos^2\kappa\,\sin^2\chi\,(\Delta^2 + 4Ng^2\sin^2\kappa\,\cos^2\chi) + \delta^2\Delta^2}\,.$$
(10)

For a vanishing tunnel splitting ($\Delta = 0$), $\Omega_\pm$ reduces to the maximum (minimum) of the two Rabi frequencies of the Jaynes-Cummings models in the right and left wells. For $\cos\kappa = 1$, the decoupling of external and internal degrees of freedom manifests itself also in the eigenvalues, with $\Omega_\pm = \left|\sqrt{4Ng^2\sin^2\chi + \delta^2} \pm \Delta\right|$.

C. Evolution operator

The whole dynamics of the system can be described by means of the evolution operator $U(t) = e^{-iHt/\hbar}$ with components $U_{ij} = \langle i|U(t)|j\rangle = U_{ji}$, which can be calculated exactly. In order to simplify the expressions, we restrict ourselves in the following to $\chi = -\pi/4 - 2n\pi$ (integer $n$). We find, up to an overall phase $e^{-i(N-1/2)\omega t}$,

U11 = − µSµΩ−µ ξ + µ(∆− δ)Ω2 − iµΩ+Ω−Cµ(δ∆− µΩ2)
U12 = Ng cosκ√ µSµΩ−µ(∆ 2 + 2Ng2 sin2 κ + µΩ2) + iµΩ+Ω−∆Cµ
U13 = −iNg2 sin(2κ) µδΩµS−µ + iµΩ+Ω−Cµ
U23 = Ng sinκ√ µΩµS−µ(δ∆+ 2Ng 2 cos2 κ− µΩ2)
ξ = ∆(δ2 + 2Ng2 cos2 κ− δ∆), Λ = Ω+Ω−Ω

and where all time dependence is in the coefficients

$$C_\pm = \cos(\Omega_\pm t/2), \qquad S_\pm = \sin(\Omega_\pm t/2). \tag{16}$$

The remaining components can be deduced from the relations $U_{22}(\delta, \Delta) = U_{33}(-\delta, -\Delta) = U_{44}(\delta, -\Delta) = U_{11}(-\delta, \Delta)$, $U_{24}(\delta, \Delta) = U_{13}(-\delta, \Delta)$, $U_{14}(\delta, \Delta) = U_{23}(\delta, -\Delta) = U_{23}(-\delta, \Delta)$, and $U_{34}(\delta, \Delta) = U_{12}(\delta, -\Delta)$, valid for any $\chi$, where we have made explicit the dependence of the $U_{ij}$ on $\delta$ and $\Delta$.

III. INTERNAL AND EXTERNAL DYNAMICS

The reduced density matrix $\rho^{\rm ex}$ for the atomic center-of-mass motion alone follows from $\rho = |\psi(t)\rangle\langle\psi(t)|$ by tracing out the field and internal degrees of freedom, where the total wave function at time $t$ reads $|\psi(t)\rangle = \sum_{i,j=1}^{4} U_{ij}\,\langle j|\psi(0)\rangle\,|i\rangle$. The average position of the atom in the double well potential is then given by

$$\langle x\rangle = \frac{b}{2}\,{\rm Tr}_{\rm ex}(\rho^{\rm ex}\sigma_x^{\rm ex}) = \frac{b}{2}\,(1 - 2\rho_{LL}) \tag{17}$$

with $\rho_{LL} = \langle L|\rho^{\rm ex}|L\rangle$. Similarly, we obtain the reduced density matrix $\rho^{\rm in}$ for the internal atomic state by tracing out the field and external degrees of freedom, and the probability to find the atom in the excited state as $\rho_{ee} = \langle e|\rho^{\rm in}|e\rangle$.
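The spectrum quoted in Eqs. (8)-(10) can be cross-checked numerically. The sketch below is not part of the original derivation: it assumes the basis ordering $\{|N-1,+,e\rangle, |N,+,g\rangle, |N-1,-,e\rangle, |N,-,g\rangle\}$ given above, sets $\hbar = 1$, measures energies from $(N-1/2)\omega$, and uses the hypothetical helper names `subspace_hamiltonian` and `omega_pm`. The matrix elements follow from Eq. (7) with $\langle N-1|a|N\rangle = \sqrt{N}$.

```python
import numpy as np

def subspace_hamiltonian(N, g, Delta, delta, kappa, chi):
    """4x4 block of Eq. (7) in the basis {|N-1,+,e>, |N,+,g>, |N-1,-,e>, |N,-,g>}.

    Units with hbar = 1; energies are measured from (N - 1/2) * omega.
    """
    A = -g * np.sqrt(N) * np.sin(chi) * np.cos(kappa)  # internal transition only
    B = g * np.sqrt(N) * np.cos(chi) * np.sin(kappa)   # internal transition with |+> <-> |-> flip
    a, b = (Delta - delta) / 2, (Delta + delta) / 2
    return np.array([[a, A, 0, B],
                     [A, b, B, 0],
                     [0, B, -b, A],
                     [B, 0, A, -a]], dtype=float)

def omega_pm(N, g, Delta, delta, kappa, chi):
    """Frequencies Omega_+ and Omega_- as written in Eqs. (9) and (10)."""
    Ng2 = N * g**2
    Om2 = np.sqrt(4 * Ng2 * np.cos(kappa)**2 * np.sin(chi)**2
                  * (Delta**2 + 4 * Ng2 * np.sin(kappa)**2 * np.cos(chi)**2)
                  + delta**2 * Delta**2)
    base = 2 * Ng2 * (1 - np.cos(2 * kappa) * np.cos(2 * chi)) + delta**2 + Delta**2
    # max(..., 0) guards against tiny negative rounding errors under the root
    return np.sqrt(base + 2 * Om2), np.sqrt(max(base - 2 * Om2, 0.0))

# Arbitrary illustrative parameters (not taken from the paper)
params = dict(N=3, g=1.0, Delta=0.7, delta=0.4, kappa=0.9, chi=0.3)
print(np.linalg.eigvalsh(subspace_hamiltonian(**params)))
print(omega_pm(**params))
```

The four numerical eigenvalues should coincide with $\pm\Omega_+/2$ and $\pm\Omega_-/2$, which is the content of Eq. (8) once the common offset $(N-1/2)\hbar\omega$ is removed.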
In the following, we first focus on resonant atom-field interaction ($\omega = \omega_0$) before moving to the non-resonant case ($\omega \neq \omega_0$). We distinguish three regimes according to the tunnel splitting compared to the Rabi frequency $g$: the small tunnel splitting regime (when $\Delta/g \ll 1$), the intermediate regime (when $\Delta/g \sim 1$), and the large tunnel splitting regime (when $\Delta/g \gg 1$).

A. Resonant atom-field interaction

For resonant atom-field interaction ($\delta = 0$), the expressions for $U_{ij}$ can be greatly simplified. If the system is initially prepared in the state $|N-1, R, e\rangle$ and for $\kappa = \pi/4$, we have

$$\rho_{LL} = \frac{\Delta^2}{\Delta^2 + Ng^2}\,\sin^2\!\left(\frac{\Omega_{\rm tun}\,t}{2}\right) \tag{18}$$

with the tunnel frequency

$$\Omega_{\rm tun} = \frac{1}{2}\,(\Omega_+ + \Omega_-), \tag{19}$$

and

$$\rho_{ee} = \frac{1}{2} + \frac{\sum_{\mu=\pm}(\Omega_\mu^2 - \Delta^2)\cos(\Omega_\mu t) + 4\Delta^2\cos\!\left(\frac{\Omega_+ - \Omega_-}{2}\,t\right)}{8\,(Ng^2 + \Delta^2)}. \tag{20}$$

The atom position oscillates with a single frequency $\Omega_{\rm tun}$ given by Eq. (19), whereas $\rho_{ee}$ evolves with three in general incommensurable frequencies $\Omega_+$, $\Omega_-$, and $(\Omega_+ - \Omega_-)/2$, giving rise to a quasi-periodic signal.

For $\Delta/g \ll 1$, Eq. (18) leads to $\rho_{LL} \simeq 0$ (up to order $(\Delta/g)^2$), indicating that tunneling is suppressed. This is already obvious from (7), as the term responsible for tunneling, $(\hbar\Delta/2)\,\sigma_z^{\rm ex} = (\hbar\Delta/2)(|R\rangle\langle L| + |L\rangle\langle R|)$, becomes very small compared to the last term, diagonal in $|R\rangle$, $|L\rangle$, which leads to internal Rabi flopping. Note, however, that tunneling is suppressed on all time scales, even for $t \gg 1/\Delta$, due to the reduced amplitude in Eq. (18). This is very much in contrast to tunneling without internal degrees of freedom, where only the period of the tunneling motion, but not the amplitude, is affected when $\Delta$ is reduced. For $\kappa$ approaching $\pi$, the situation changes because the term $g\cos\chi\sin\kappa\;\sigma_x^{\rm ex}$ of the interaction Hamiltonian inducing transitions between vibrational states becomes small in comparison with $\Delta$, thereby allowing tunneling again.

Because internal and external degrees of freedom are coupled, the tunneling frequency (Eq. (19)) depends on the number of photons inside the cavity.
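The resonant results for $\rho_{LL}$ and $\rho_{ee}$ can be compared with a direct propagation in the four-state subspace. The following sketch is an illustration under stated assumptions rather than the authors' code: it assumes $\hbar = 1$, $\delta = 0$, $\kappa = \pi/4$, $\chi = -\pi/4$, the basis ordering of Sec. II B, and the closed forms $\Omega_\pm = \sqrt{\Delta^2 + Ng^2} \pm \sqrt{N}g$ that Eqs. (9) and (10) reduce to for these parameters. All function names are hypothetical.

```python
import numpy as np

def rho_LL_and_ee(t, N, g, Delta, delta=0.0, kappa=np.pi / 4, chi=-np.pi / 4):
    """Propagate |N-1,R,e> with the 4x4 block of Eq. (7); return (rho_LL, rho_ee) at time t."""
    A = -g * np.sqrt(N) * np.sin(chi) * np.cos(kappa)
    B = g * np.sqrt(N) * np.cos(chi) * np.sin(kappa)
    a, b = (Delta - delta) / 2, (Delta + delta) / 2
    # basis: 0=|N-1,+,e>, 1=|N,+,g>, 2=|N-1,-,e>, 3=|N,-,g>
    H = np.array([[a, A, 0, B], [A, b, B, 0], [0, B, -b, A], [B, 0, A, -a]], dtype=float)
    w, V = np.linalg.eigh(H)
    psi0 = np.array([1, 0, 1, 0]) / np.sqrt(2)       # |R> = (|+> + |->)/sqrt(2), atom excited
    psi = V @ (np.exp(-1j * w * t) * (V.T @ psi0))   # exp(-iHt) applied to psi0
    amp_L_e = (psi[0] - psi[2]) / np.sqrt(2)         # projection on |L> = (|+> - |->)/sqrt(2)
    amp_L_g = (psi[1] - psi[3]) / np.sqrt(2)
    rho_LL = abs(amp_L_e)**2 + abs(amp_L_g)**2
    rho_ee = abs(psi[0])**2 + abs(psi[2])**2
    return rho_LL, rho_ee

def rho_LL_eq18(t, N, g, Delta):
    W = np.sqrt(Delta**2 + N * g**2)                 # Omega_tun of Eq. (19) for these parameters
    return Delta**2 / W**2 * np.sin(W * t / 2)**2

def rho_ee_eq20(t, N, g, Delta):
    W, G = np.sqrt(Delta**2 + N * g**2), np.sqrt(N) * g
    Op, Om = W + G, W - G                            # Omega_+ and Omega_-
    num = ((Op**2 - Delta**2) * np.cos(Op * t) + (Om**2 - Delta**2) * np.cos(Om * t)
           + 4 * Delta**2 * np.cos((Op - Om) * t / 2))
    return 0.5 + num / (8 * (N * g**2 + Delta**2))

for t in (0.0, 1.0, 5.0):
    print(t, rho_LL_and_ee(t, 2, 1.0, 1.5), rho_LL_eq18(t, 2, 1.0, 1.5), rho_ee_eq20(t, 2, 1.0, 1.5))
```

For these parameters the right well sits at a field antinode and the left well at a node, which is why the propagation separates into two detuned two-level problems and yields the single tunneling frequency $\Omega_{\rm tun}$.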
As an example, let us now consider $\Delta \sim g$ and a cavity field initially in a coherent state $|\alpha\rangle = e^{-|\alpha|^2/2}\sum_n \frac{\alpha^n}{\sqrt{n!}}\,|n\rangle$ with $|\alpha|^2$ equal to the mean photon number $\langle n\rangle$. Figure 2 shows that the average position of the atom in the double well as a function of time for a coherent state exhibits collapses and revivals. The oscillation amplitude decreases with increasing mean photon number $\langle n\rangle$ and decreasing tunnel splitting $\Delta$ (see Eqs. (18) and (9)). Since the probability to find the atom in the excited state oscillates with three frequencies, no collapses and revivals are observed for $\rho_{ee}$. The collapse time $t_c$ of the tunneling motion can be estimated from the condition [20] $\left(\Omega_{\rm tun}(\langle n\rangle + \sqrt{\langle n\rangle}) - \Omega_{\rm tun}(\langle n\rangle - \sqrt{\langle n\rangle})\right) t_c \sim 1$ with $\Omega_{\rm tun}(m)$ given by Eq. (19) for $N = m + 1$, which yields, for $\langle n\rangle \gg 1$,

(∆/g)2 + 3/4 +O(〈n〉−2) (21)

The time interval between two following revivals, $t_r$, follows from $\left(\Omega_{\rm tun}(\langle n\rangle) - \Omega_{\rm tun}(\langle n\rangle - 1)\right) t_r = 2\pi$, and is given for $\langle n\rangle \gg 1$ by

(∆/g)2 + 1/2 +O(〈n〉−2) (22)

For the parameters of Fig. 2, Eq. (22) yields $g t_r \simeq 68.23$ for $\Delta/g = 2$ and $g t_r \simeq 86.70$ for $\Delta/g = 5$. Smaller revival times are possible for smaller values of $\langle n\rangle$, but in general the observation of revivals will be quite challenging, as they require $\Delta \sim g \gg \kappa_{\rm cav}$.

For large tunnel splitting, $\Delta/g \gg 1$, $\Omega_{\rm tun} = \Delta + Ng^2/(2\Delta) + O((g/\Delta)^3)$, and Eq. (18) reduces to $\rho_{LL} \simeq \sin^2(\Delta t/2)$, which is identical to the tunneling of a particle without internal structure. Equation (20) reduces to a Rabi oscillation $\rho_{ee} \simeq \cos^2(\sqrt{N} g t/2)$.

FIG. 2: (Color online) Average position of the atom in the double well as a function of time for $\Delta/g = 2$ (blue, top curve) and $\Delta/g = 5$ (red), $\kappa = \pi/4$ and a coherent state with $\alpha = 5$.

B. Non-resonant atom-field interaction

For non-resonant atom-field interaction ($\delta \neq 0$) and intermediate tunnel splitting [see Fig. 3 for $\Delta = \delta = g$], $\langle x(t)\rangle$ involves in general the two non-commensurate frequencies $\Omega_+$ and $\Omega_-$ and therefore varies quasiperiodically as a function of time. Figure 3 also shows that an atom initially located in one of the two wells remains mostly confined to that well.

FIG. 3: (Color online) Average position of the atom in the double well as a function of time for $\Delta = \delta = g$, $\kappa = \pi/4$ and $N = 1$. The blue solid/red dashed curve corresponds to an excited atom initially located in the left/right well.

For small tunnel splitting, $\Delta/g \ll 1$, and large detuning $|\delta|/g \gg 1$ (with $\Delta|\delta|/g^2 \sim 1$), the matrix elements of $U$ simplify to

$$U_{13} = i\,\frac{Ng^2\sin 2\kappa}{\sqrt{\delta^2\Delta^2 + N^2g^4\sin^2(2\kappa)}}\,\sin\!\left(\frac{\bar\Omega t}{2}\right), \tag{23a}$$

$$U_{33} = \cos\!\left(\frac{\bar\Omega t}{2}\right) + i\,\frac{\delta\Delta}{\sqrt{\delta^2\Delta^2 + N^2g^4\sin^2(2\kappa)}}\,\sin\!\left(\frac{\bar\Omega t}{2}\right), \tag{23b}$$

up to corrections of order $O(\Delta/g)$ and a phase factor $e^{i[(Ng^2/\delta + \delta) - (2N-1)\omega]t/2}$, while the components $U_{12}$ and $U_{23}$ are of order $O(\Delta/g)$. In this situation, the system oscillates only between the two states $|N-1, +, e\rangle$ and $|N-1, -, e\rangle$ with a single frequency

$$\bar\Omega = \frac{\sqrt{\delta^2\Delta^2 + N^2g^4\sin^2(2\kappa)}}{\delta}, \tag{24}$$

just as a three-level atom undergoing a Raman transition in the far detuned regime behaves as a two-level system. If the system is initially in the state $|N-1, -, e\rangle$, we have from Eqs. (23)

$$\rho_{LL} = \frac{1}{2} - \frac{N\delta\Delta\sin(2\kappa)}{2\bar\Omega^2(\delta/g)^2}\left[1 - \cos(\bar\Omega t)\right], \tag{25}$$

and $\rho_{ee} = 1$. For a detuning $\delta = \pm Ng^2\sin(2\kappa)/\Delta$,

$$\rho_{LL} = \frac{1}{2} \mp \frac{1}{4}\left[1 - \cos(\sqrt{2}\,\Delta t)\right]. \tag{26}$$

This regime may be suitable for coherently manipulating the atom position through access to its internal degrees of freedom with a laser.

FIG. 4: (Color online) Density matrix elements $\rho_{RR}$ (top) and $\rho_{ee}$ (bottom) as a function of the interaction time $gt$ for an initially excited atom located in the right well and for the parameters $\Delta/g \simeq 0.3336$, $\delta/g = 3$, $\kappa = \pi/4$, and $N = 1$. Numerical results from the propagation of the time dependent Schrödinger equation with Hamiltonian (1) and rotating wave approximation are represented by circles and analytical results by solid curves. The time propagation was done with ($\hbar = m = 1$) $g = 0.01$ and the double well potential $V(x) = 0.08x^4 - x^2$, yielding a tunnel splitting $\Delta \simeq 0.003336$ and a ratio $\tilde\Delta/\sqrt{4g^2 + \delta^2} \simeq 44.4 \gg 1$.
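The far-detuned regime described above lends itself to a quick numerical illustration. The sketch below is an assumption-laden check, not the authors' simulation: it propagates the full four-state block of Eq. (7) with $\hbar = 1$, $\chi = -\pi/4$, $\kappa = \pi/4$, $N = 1$, and illustrative values $\Delta/g = 0.05$, $\delta = \mp g^2/\Delta$ chosen so that $\Delta/g \ll 1$ and $|\delta|/g \gg 1$. It compares the result with the effective two-level expression for $\rho_{LL}$, written here in the equivalent form $\rho_{LL} = \tfrac12 - \tfrac{N g^2 \delta\Delta\sin 2\kappa}{2(\delta^2\Delta^2 + N^2g^4\sin^2 2\kappa)}[1 - \cos\bar\Omega t]$. Function names are hypothetical.

```python
import numpy as np

def rho_LL_from_minus(t, N, g, Delta, delta, kappa=np.pi / 4, chi=-np.pi / 4):
    """rho_LL(t) for the initial state |N-1,-,e>, using the full 4x4 block of Eq. (7)."""
    A = -g * np.sqrt(N) * np.sin(chi) * np.cos(kappa)
    B = g * np.sqrt(N) * np.cos(chi) * np.sin(kappa)
    a, b = (Delta - delta) / 2, (Delta + delta) / 2
    # basis: 0=|N-1,+,e>, 1=|N,+,g>, 2=|N-1,-,e>, 3=|N,-,g>
    H = np.array([[a, A, 0, B], [A, b, B, 0], [0, B, -b, A], [B, 0, A, -a]], dtype=float)
    w, V = np.linalg.eigh(H)
    psi0 = np.array([0.0, 0.0, 1.0, 0.0])
    psi = V @ (np.exp(-1j * w * t) * (V.T @ psi0))
    amp_L_e = (psi[0] - psi[2]) / np.sqrt(2)   # |L> = (|+> - |->)/sqrt(2)
    amp_L_g = (psi[1] - psi[3]) / np.sqrt(2)
    return abs(amp_L_e)**2 + abs(amp_L_g)**2

def rho_LL_two_level(t, N, g, Delta, delta, kappa=np.pi / 4):
    """Effective two-level prediction for rho_LL in the far-detuned regime."""
    s = N * g**2 * np.sin(2 * kappa)
    root2 = delta**2 * Delta**2 + s**2
    Omega_bar = np.sqrt(root2) / abs(delta)    # only cos(Omega_bar * t) enters, so the sign is irrelevant
    return 0.5 - s * delta * Delta / (2 * root2) * (1 - np.cos(Omega_bar * t))

g, Delta, N = 1.0, 0.05, 1
t_pi = np.pi / (np.sqrt(2) * Delta)            # time at which sqrt(2) * Delta * t = pi
for delta in (-g**2 / Delta, +g**2 / Delta):
    print(delta, rho_LL_from_minus(t_pi, N, g, Delta, delta), rho_LL_two_level(t_pi, N, g, Delta, delta))
```

With the negative detuning the atom should end up almost entirely in the left well at $\sqrt{2}\,\Delta t = \pi$, and with the positive detuning almost entirely in the right well, up to the small corrections mentioned in the text.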
Coherent manipulation of the position of neutral atoms has been proposed and demonstrated before, see e.g. [21, 22, 23]. In these examples, the manipulation is done by modifying the external potential. The mechanism we propose here is very different, as the potential remains totally unchanged, and only internal transitions and the tunneling effect are used to move the atom in a controlled way. As an example, we show how the atom can be prepared in the left well starting from the ground state $|0, -, g\rangle$ for $\delta = -g^2/\Delta$. We first apply a $\pi$-pulse with an external laser resonant with the atomic transition. By using a laser with a wave vector perpendicular to the $Ox$-direction, only the atomic internal degree of freedom is affected, resulting in the transition $|0, -, g\rangle \to -i|0, -, e\rangle$. Here we assume that the laser Rabi frequency $\Omega_R$ is much larger than the tunnel frequency $\Delta$. Now we use the coupling between the internal and external degrees of freedom to create a superposition of the $|0, \pm, e\rangle$ states, and then apply a second resonant $\pi$-pulse to get back to the uncoupled states $|0, \pm, g\rangle$. For $\Delta/g \ll 1$, $\delta = -g^2/\Delta$ and $\kappa = \pi/4$, the initial state transforms according to

$$|0, -, g\rangle \xrightarrow{\;\Omega_R t = \pi\;} |0, -, e\rangle \xrightarrow{\;\Delta t = \pi/\sqrt{2}\;} |0, L, e\rangle \xrightarrow{\;\Omega_R t = \pi\;} |0, L, g\rangle$$

up to a physically irrelevant phase. Other coherent superpositions of $|0, +, g\rangle$ and $|0, -, g\rangle$ can be obtained by choosing appropriate interaction times.

In order to verify that the two-level approximation for the external motion used in the derivation of the Hamiltonian is a good approximation, we have numerically solved the time dependent Schrödinger equation with Hamiltonian (1) and rotating wave approximation but with the exact external potential $V(x)$ (i.e. with a large number of vibrational states). Figure 4 shows that, provided $\tilde\Delta \gg \sqrt{4g^2 + \delta^2}$ as stated before, taking only the two lowest vibrational states into account is indeed a good approximation.

We finally comment on possible experimental realizations of our model.
Double well potentials with tunable well-to-well separation have been demonstrated with optical dipole traps, e.g. in [21, 24], and on atom chips, e.g. in [25, 26]. For our model, the double well potential has to be realized inside the cavity. Optical trapping and even cooling of atoms close to their ground state inside a cavity has been achieved by several groups by now [27, 28, 29, 30], but to our knowledge double well potentials have not been realized in a cavity so far. However, some of the cavities developed have a very long lateral opening (up to 222 µm [31]) and should allow more complicated trapping potentials (optical lattices intersecting a cavity have been realized in Chapman's group [31]). We remark that it is not essential for our model that the double well potential be aligned with the cavity axes. Any other orientation is possible, and only leads to modified coefficients $\cos\chi\sin\kappa$ and $\sin\chi\cos\kappa$. At certain "magical wavelengths", Cs, Yb, Sr, Mg, and Ca atoms in optical traps experience the same potential for ground and excited internal states coupled by a dipole transition [27, 32, 33, 34]. In a symmetric potential $V(z)$ the tunneling frequency $\Delta$ is given in WKB approximation by $\Delta \sim \omega_{\rm osc}\exp\!\left(-\frac{1}{\hbar}\int_{-a}^{a}\sqrt{2m\,(V(z) - E_0)}\,dz\right)$, where $E_0$ is the ground state energy, $\omega_{\rm osc}$ the single well harmonic oscillation frequency, and $z = \pm a$ are the corresponding classical turning points delimiting the range of the barrier. The exponential factor can approach unity for a barrier that is only slightly higher than the ground state energy $E_0$, in which case cooling to temperatures $k_B T < \hbar\Delta$ should be possible with state of the art techniques [27]. In [27] a trap depth $V_0/\hbar = 47$ MHz was achieved inside a cavity with 1.2 mW laser power.
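To give a feeling for the numbers involved, the tunnel splitting of the model potential $V(x) = 0.08x^4 - x^2$ quoted in the caption of Fig. 4 (with $\hbar = m = 1$) can be estimated by direct diagonalization instead of the WKB formula. The sketch below is one simple way to do this and is not claimed to be the authors' method; the three-point finite-difference stencil, the grid size, and the box length are assumptions made here, and the function name `double_well_levels` is hypothetical.

```python
import numpy as np

def double_well_levels(n_levels=4, x_max=7.0, n_grid=701):
    """Lowest eigenvalues of H = p^2/2 + 0.08 x^4 - x^2 (hbar = m = 1) on a uniform grid."""
    x = np.linspace(-x_max, x_max, n_grid)
    dx = x[1] - x[0]
    V = 0.08 * x**4 - x**2
    # -(1/2) d^2/dx^2 with a three-point stencil and hard walls at +/- x_max
    H = (np.diag(1.0 / dx**2 + V)
         - 0.5 / dx**2 * (np.eye(n_grid, k=1) + np.eye(n_grid, k=-1)))
    return np.linalg.eigvalsh(H)[:n_levels]

E = double_well_levels()
tunnel_splitting = E[1] - E[0]   # splitting of the ground state doublet
gap_to_next = E[2] - E[1]        # gap from the doublet to the next motional state
print(tunnel_splitting, gap_to_next)
```

The first number can be compared with the value $\Delta \simeq 0.003336$ given in the caption of Fig. 4, and the second with the gap $\tilde\Delta$ that enters the condition $\tilde\Delta \gg \sqrt{4g^2 + \delta^2}$ for the two-level approximation.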
In any case, the trap frequency and thus the tunnel splitting are determined by the laser power and the focusing (or the wavelength for optical lattices), and can therefore be controlled independently of $\Gamma$ and $\kappa_{\rm cav}$, such that there should be no fundamental problem in achieving $\Delta \gg \Gamma, \kappa_{\rm cav}$. The detection of the tunneling motion should be possible by optical imaging, i.e. scattering of laser light from another transition in the optical regime with a wavelength smaller than the well separation. Alternatively, one might monitor the transmission through the cavity in the case that it differs for the two locations of the wells [35]. Another possibility might be using the atomic spin as a position meter [17].

Acknowledgments

We thank Jacques Vigué for an interesting discussion and CALMIP (Toulouse) for the use of their computers. This work was supported by the Agence Nationale de la Recherche (ANR), project INFOSYSQQ, and the EC IST-FET project EUROSQIP.

[1] G. Gamow, Z. Phys. 51, 204 (1928).
[2] E. Guth and C. J. Mullin, Phys. Rev. 61, 339 (1942).
[3] M. H. Devoret, D. Esteve, H. Grabert, G.-L. Ingold, H. Pothier, and C. Urbina, Phys. Rev. Lett. 64, 1824 (1990).
[4] B. D. Josephson, Rev. Mod. Phys. 46, 251 (1974).
[5] A. Hueller, Z. Phys. B 36, 215 (1980).
[6] A. Würger, Z. Phys. B 76, 65 (1989).
[7] D. Braun and U. Weiss, Physica B 202, 264 (1994).
[8] A. A. Louis and J. P. Sethna, Phys. Rev. Lett. 74, 1363 (1995).
[9] F. Meier and W. Zwerger, Phys. Rev. A 64, 033610 (2001).
[10] D. L. Luxat and A. Griffin, Phys. Rev. A 65, 043618 (2002).
[11] M. Albiez, R. Gati, J. Fölling, S. Hunsmann, M. Cristiani, and M. K. Oberthaler, Phys. Rev. Lett. 95, 010402 (2005).
[12] F. Grossmann, T. Dittrich, P. Jung, and P. Hänggi, Phys. Rev. Lett. 67, 516 (1991).
[13] V. Averbukh, S. Osovski, and N. Moiseyev, Phys. Rev. Lett. 89, 253201 (2002).
[14] D. A. Steck, W. H. Oskay, and M. G. Raizen, Science 293, 274 (2001).
[15] W. K. Hensinger, H. Häffner, A. Browaeys, N. R. Heckenberg, K.
Helmerson, C. McKenzie, G. J. Milburn, W. D. Phillips, S. L. Rolston, H. Rubinsztein-Dunlop, et al., Nature 412, 52 (2001).
[16] C. Sias, A. Zenesini, H. Lignier, S. Wimberger, D. Ciampini, O. Morsch, and E. Arimondo, Phys. Rev. Lett. 98, 120403 (2007).
[17] D. L. Haycock, P. M. Alsing, I. H. Deutsch, J. Grondalski, and P. S. Jessen, Phys. Rev. Lett. 85, 3365 (2000).
[18] T. Salzburger and H. Ritsch, Phys. Rev. Lett. 93, 063002 (2004).
[19] P. Domokos and H. Ritsch, J. Opt. Soc. Am. B 20, 1098 (2003).
[20] M. Scully and M. Zubairy, Quantum Optics (Cambridge University Press, Cambridge, UK, 1997).
[21] J. Sebby-Strabley, M. Anderlini, P. S. Jessen, and J. V. Porto, Phys. Rev. A 73, 033605 (2006).
[22] O. Mandel, M. Greiner, A. Widera, T. Rom, T. W. Hänsch, and I. Bloch, Phys. Rev. Lett. 91, 010407 (2003).
[23] J. Mompart, K. Eckert, W. Ertmer, G. Birkl, and M. Lewenstein, Phys. Rev. Lett. 90, 147901 (2003).
[24] Y. Shin, M. Saba, T. A. Pasquini, W. Ketterle, D. E. Pritchard, and A. E. Leanhardt, Phys. Rev. Lett. 92, 050405 (2004).
[25] E. A. Hinds, C. J. Vale, and M. G. Boshier, Phys. Rev. Lett. 86, 1462 (2001).
[26] W. Hänsel, J. Reichel, P. Hommelhoff, and T. W. Hänsch, Phys. Rev. A 64, 063607 (2001).
[27] J. McKeever, J. R. Buck, A. D. Boozer, A. Kuzmich, H.-C. Nägerl, D. M. Stamper-Kurn, and H. J. Kimble, Phys. Rev. Lett. 90, 133602 (2003).
[28] J. Ye, D. W. Vernooy, and H. J. Kimble, Phys. Rev. Lett. 83, 4987 (1999).
[29] J. A. Sauer, K. M. Fortier, M. S. Chang, C. D. Hamley, and M. S. Chapman, Phys. Rev. A 69, 051804(R) (2004).
[30] P. Maunz, T. Puppe, I. Schuster, N. Syassen, P. W. H. Pinkse, and G. Rempe, Nature 428, 50 (2004).
[31] K. M. Fortier, S. Y. Kim, M. J. Gibbons, P. Ahmadi, and M. S. Chapman, Phys. Rev. Lett. 98, 233601 (2007).
[32] H. Katori, M. Takamoto, V. G. Pal'chikov, and V. D. Ovsiannikov, Phys. Rev. Lett. 91, 173005 (2003).
[33] A. Brusch, R. LeTargat, X. Baillard, M. Fouche, and P. Lemonde, Phys. Rev. Lett. 96, 103003 (2006).
[34] Z. W.
Barber, C. W. Hoyt, C. W. Oates, L. Hollberg, A. V. Taichenachev, and V. I. Yudin, Phys. Rev. Lett. 96, 083002 (2006).
[35] P. Maunz, T. Puppe, I. Schuster, N. Syassen, P. W. H. Pinkse, and G. Rempe, Phys. Rev. Lett. 94, 033002 (2005).
<|endoftext|><|startoftext|>
Correlation functions and excitation spectrum of the frustrated ferromagnetic spin-1 chain in an external magnetic field

T. Vekua,1 A. Honecker,2 H.-J. Mikeska,3 and F.
Heidrich-Meisner4
1 Laboratoire de Physique Théorique et Modèles Statistiques, Université Paris Sud, 91405 Orsay Cedex, France
2 Institut für Theoretische Physik, Universität Göttingen, 37077 Göttingen, Germany
3 Institut für Theoretische Physik, Universität Hannover, Appelstrasse 2, 30167 Hannover, Germany
4 Materials Science and Technology Division, Oak Ridge National Laboratory, Tennessee, 37831, USA and Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee 37996, USA
(Dated: April 5, 2007; revised: July 6, 2007)

Magnetic field effects on the one-dimensional frustrated ferromagnetic chain are studied by means of effective field theory approaches in combination with numerical calculations utilizing Lanczos diagonalization and the density matrix renormalization group method. The nature of the ground state is shown to change from a spin-density-wave region to a nematic-like one upon approaching the saturation magnetization. The excitation spectrum is analyzed and the behavior of the single spin-flip excitation gap is studied in detail, including the emergent finite-size corrections.

I. INTRODUCTION

The interest in helical and chiral phases of frustrated low-dimensional quantum magnets has been triggered by recent experimental results. While many copper-oxide based materials predominantly realize antiferromagnetic exchange interactions, several candidate materials with magnetic properties believed to be described by frustrated ferromagnetic chains have been identified,1,2,3,4,5,6 including Rb2Cu2Mo3O12 (Ref. 1), LiCuVO4 (Refs. 2, 3, 4, 5), and Li2ZrCuO4 (Ref. 6).
The frustrated antiferromagnetic chain is well-studied,7 but the magnetic phase diagram of the model with ferromagnetic nearest-neighbor interactions remains a subject of active theoretical investigations.8,9,10,11

In this work we consider a parameter regime that is particularly relevant for the low-energy properties of LiCuVO4, corresponding to a ratio of $J_1 \approx -0.3\,J_2$ between the nearest neighbor interaction $J_1$ and the frustrating next-nearest neighbor interaction $J_2 > 0$. As the interchain couplings for this material are an order of magnitude smaller than the intrachain ones,3 we analyze a purely one-dimensional (1D) model. Apart from mean-field based predictions,8 the nature of the ground state in a magnetic field $h$ is not yet completely known. Therefore, combining the bosonization technique with a numerical analysis, we determine ground-state properties and discuss the model's elementary excitations. The Hamiltonian for our 1D model reads:

$$H = \sum_x \left[J_1\,\vec S_x\cdot\vec S_{x+1} + J_2\,\vec S_x\cdot\vec S_{x+2}\right] - h\sum_x S^z_x, \tag{1}$$

where $\vec S_x$ represents a spin one-half operator at site $x$. Bosonization has turned out to be the appropriate language for describing the regime $|J_1| \ll J_2$ of Eq. (1). This result has been established by studying the magnetization process, yielding good agreement between field theory and numerical data.9 The derivation of the effective field theory is summarized in Sec. II. Here, we extend this comparison of analytical and numerical results and further confirm the predictions of field theory by analyzing several correlation functions in Sec. III. Then, in Sec. IV, we numerically compute the one- and two-spin flip excitation gaps and compare them to field-theory predictions. Finally, Sec. V contains a summary and a discussion of our results.

II. EFFECTIVE FIELD THEORY

We start from an effective field theory describing the long-wavelength fluctuations of Eq. (1).
In the limit of strong next-nearest-neighbor interactions $J_2 \gg |J_1|$, the spin operators can be expressed as:

$$S^z_\alpha(r) \sim m + c(m)\,\sin\!\left(2k_F r + \sqrt{4\pi}\,\phi_\alpha\right) + \cdots , \qquad S^-_\alpha(r) \sim (-1)^r e^{-i\theta_\alpha\sqrt{\pi}} + \cdots . \qquad (2)$$

$k_F = (\tfrac{1}{2} - m)\pi$ is the Fermi wave vector and $\alpha = 1, 2$ enumerates the two chains of the zig-zag ladder. In relation with Eq. (1), note that $\vec S_1(r) = \vec S_{(x+1)/2}$ ($\vec S_2(r) = \vec S_{x/2}$) for $x$ odd (even). $\phi_\alpha$ and $\theta_\alpha$ are compactified quantum fields describing the out-of-plane and in-plane angles of fluctuating spins obeying Gaussian Hamiltonians:

$$H = \frac{v}{2}\int dx \left[ \frac{1}{K}(\partial_x\phi_\alpha)^2 + K(\partial_x\theta_\alpha)^2 \right] , \qquad (3)$$

with $[\phi_\alpha(x), \theta_\alpha(y)] = i\Theta(y - x)$, where $\Theta(x)$ is the Heaviside function. Sub-leading terms are suppressed in Eq. (2). $m$ is the magnetization of decoupled chains, related to the real magnetization $M$ of the zig-zag system by

$$M \simeq m\left(1 - \frac{2K(m)J_1}{\pi v(m)}\right) . \qquad (4)$$

$K(m)$ and $v(m)$ are the Luttinger liquid (LL) parameter and the spin-wave velocity of the decoupled chains, respectively. The nonuniversal amplitude $c(m)$ appearing in the bosonization formulas (2) has been determined from density matrix renormalization group (DMRG) calculations.12 Note that in our notation $M = 1/2$ at saturation.

http://arxiv.org/abs/0704.0764v2

Now we perturbatively add the interchain coupling term to two decoupled chains, each of which is described by an effective Hamiltonian of the form Eq. (3) and fields $\phi_i$ and $\theta_i$, $i = 1, 2$. For convenience, we transform to the symmetric and antisymmetric combinations of the bosonic fields $\phi_\pm = (\phi_1 \pm \phi_2)/\sqrt{2}$ and $\theta_\pm = (\theta_1 \pm \theta_2)/\sqrt{2}$. In this basis and apart from terms $H_0^\pm$ of the form (3), the effective Hamiltonian describing low-energy properties of Eq. (1) contains a single relevant interaction term with the bare coupling $g_1 \propto J_1 \ll v$:

$$H_{\rm eff} = H_0^+ + H_0^- + g_1 \int dx\, \cos\!\left(\sqrt{8\pi}\,\phi_-\right) , \qquad (5)$$

and the renormalized LL parameters $K_\pm$ are, in the weak coupling limit:

$$K_\pm = K\left(1 \mp \frac{J_1 K}{\pi v}\right) . \qquad (6)$$

$K_+$ is the Luttinger-liquid parameter of the soft mode of the zig-zag ladder.
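As a small numerical illustration of Eqs. (4) and (6), the sketch below evaluates the magnetization $M$ and the renormalized parameters $K_\pm$, assuming the weak-coupling forms $M \simeq m(1 - 2KJ_1/\pi v)$ and $K_\pm = K(1 \mp J_1K/\pi v)$. The function name and the input values of $m$, $K$ and $v$ are hypothetical placeholders chosen for illustration; in the paper $K(m)$ and $v(m)$ come from the Bethe ansatz, which is not reproduced here.

```python
import math


def coupled_chain_parameters(m, K, v, J1):
    """Weak-coupling estimates for two chains coupled by J1 (illustrative only).

    m  : magnetization of the decoupled chains
    K  : Luttinger parameter of a decoupled chain at magnetization m
    v  : spin-wave velocity of a decoupled chain at magnetization m
    J1 : nearest-neighbor (interchain) coupling, negative when ferromagnetic
    Returns (M, K_plus, K_minus) following the forms assumed for Eqs. (4) and (6).
    """
    x = J1 * K / (math.pi * v)      # small dimensionless coupling J1 K / (pi v)
    M = m * (1.0 - 2.0 * x)         # Eq. (4)
    K_plus = K * (1.0 - x)          # Eq. (6), symmetric (soft) sector
    K_minus = K * (1.0 + x)         # Eq. (6), antisymmetric (gapped) sector
    return M, K_plus, K_minus


# Placeholder inputs, not Bethe-ansatz values.
M, K_plus, K_minus = coupled_chain_parameters(m=0.3, K=0.8, v=1.0, J1=-0.3)
print(M, K_plus, K_minus)
```

For ferromagnetic $J_1 < 0$ the sketch gives $K_+ > K > K_-$, i.e. the coupling pushes the soft-mode parameter upward.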
The Hamiltonian (5) represents the minimal effective low-energy field theory describing the region $J_2 \gg |J_1|$ of the frustrated FM spin-1/2 chain for $M \neq 0$.9,13 The relevant interaction term $\cos(\sqrt{8\pi}\,\phi_-)$ opens a gap in the $\phi_-$ sector. Since $S^z_{x+1} - S^z_x \sim \partial_x\phi_-$, relative fluctuations of the two chains are locked. This implies that single-spin flips are gapped with a sine-Gordon gap in the sector describing relative spin fluctuations of the two-chain system.9 Gapless excitations come from the $\Delta S^z = 2$ channel, i.e. only those excitations are soft where spins simultaneously flip on both chains. DMRG results show that this picture applies to a large part of the magnetic phase diagram.9

III. CORRELATION FUNCTIONS

We now turn to the ground state properties of Eq. (1) as a function of magnetization, concentrating on several correlation functions in order to identify the leading instabilities. Note that our analysis is only valid if $M \neq 0$. Apart from a term representing the magnetization $M$ induced by the external field, the longitudinal correlation function shows an algebraic decay with distance $r$:

〈Szα(0)Szβ(r)〉 ≃ M2+ C1 cos(2kF r + (α− β)kF ) 2π2rK+ 8π2r2 (7)

The constants $C_i$, $i = 1, 2, 3$, appearing here and in Eq. (9) will be determined through a comparison with numerical results.

In contrast to Eq. (7), the transverse xy-correlation functions decay exponentially, reflecting the gapped nature of the single spin-flip excitations. Here we do not restrict ourselves to the equal-time expression, because we will need non-equal time correlation functions to extract the finite-size corrections to the gap later on. We obtain:

〈S+α (0, 0)S−β (r, τ)〉 ≃ δα,β(−1)re−∆1(M) τ2+r2/v2 (r2 + v2+τ 8K+ (r2 + v2−τ (8)

where $\tau$ stands for the Euclidean time, $\Delta_1(M)$ is the $\Delta S^z = 1$ gap, and $v_\pm \sim v \pm J_1/\pi$ in the weak coupling limit. The Kronecker delta strictly applies to the thermodynamic limit, while on the lattice an additional contribution for $\alpha \neq \beta$ exists.
It is noteworthy that, in contrast to Eq. (8), the in-plane correlation functions involving bilinear spin combinations decay algebraically. This stems from the gapless nature of the $\Delta S^z = 2$ excitations. In fact, these are the slowest decaying correlators close to the saturation magnetization:

$$\langle S^+_1(r) S^+_2(r) S^-_1(0) S^-_2(0) \rangle \simeq \frac{C_2}{r^{1/K_+}} + \frac{C_3 \cos(2k_F r)}{r^{K_+ + 1/K_+}} . \qquad (9)$$

This result is reminiscent of a partially ordered state because the ordering tendencies in this correlation function are more pronounced than those of the corresponding single-spin correlation function Eq. (8). Therefore, we call the correlator (9) 'nematic'. Furthermore, we will refer to a situation where Eq. (9) is the slowest decaying one among all correlation functions as a 'nematic-like phase'. By virtue of the exponential decay in (8), the correlator (9) is proportional to:

$$\langle (S^+_1(r) + S^+_2(r))^2\, (S^-_1(0) + S^-_2(0))^2 \rangle . \qquad (10)$$

The term $(S^\alpha_1 + S^\alpha_2)^2$ appearing in the case of the $S = 1/2$ zig-zag ladder corresponds to the operator $(S^\alpha)^2$ in the case of an $S = 1$ chain. One can think of an effective $S = 1$ spin formed from two neighboring $S = 1/2$ spins coupled by the ferromagnetic interaction. A similar behavior of correlation functions, namely the exponential decay of in-plane spin components and the algebraic decay of their bilinear combinations, is also encountered in the XY2 phase of the anisotropic $S = 1$ chain14 and in the spin-1 chain with biquadratic interactions, see, e.g., Ref. 15.

The algebraic decay of the nematic correlator, as opposed to the exponential decay of (8), suggests that there are tendencies towards nematic ordering in this phase. Depending on the value of $K_+$, the dominant instabilities are either spin-density-wave ones for $K_+ < 1$ or nematic ones for $K_+ > 1$. From the result for $K_+$ given in Eq. (6) one can perturbatively evaluate the crossover value of $J_1$ by setting $K_+ = 1$:

$$|J_{1,\rm cr}| = \frac{\pi v(m)}{K(m)}\left(\frac{1}{K(m)} - 1\right) . \qquad (11)$$

For $J_1 < J_{1,\rm cr}$ the nematic correlator (9) is the slowest decaying one, i.e. one is in the nematic-like phase. The behavior of the cross-over line can be read off from the behavior of $K(m)$: $K(m)$ increases monotonically with $m$, tends to $K = 1$ for $m \to 1/2$, and satisfies $K < 1$ for $m < 1/2$ (see, e.g., Refs. 16,17). Therefore, we have $J_{1,\rm cr} = 0$ for $M = 1/2$ with increasing ferromagnetic $|J_{1,\rm cr}|$ for decreasing $M$. This means that for $J_1 < 0$ a regime opens at high $M$ where nematic correlations given by Eq. (9) dominate over spin-density-wave correlations given by Eq. (7), in agreement with Chubukov's prediction.8

FIG. 1: (Color online) Correlation functions at $J_1 = -J_2 < 0$ and magnetization $M = 3/8$: (a) longitudinal component $S^z$, (b) transverse component $S^\pm$, (c) spin nematic $S^\pm$. $x$ is the distance in a single-chain notation. ED results for periodic boundary conditions are shown by symbols, fits by lines (legend: $L = 32$, 48, 64). Note the logarithmic scale of the vertical axis in panel (b).

Now we check the correlation functions obtained within bosonization against exact diagonalization (ED) results. Numerical data obtained for $J_1 = -J_2 < 0$ and $M = 3/8$ on finite systems with periodic boundary conditions are shown in Fig. 1. This parameter set allows for a clear test of the above predictions, but represents the generic behavior in the phase of two weakly coupled chains. To take into account finite-size effects we use the observation that for a conformally invariant theory, any power law on a plane becomes a power law in the following variable defined on a cylinder of circumference $L$:

$$x \to \frac{L}{\pi}\sin\!\left(\frac{\pi x}{L}\right) . \qquad (12)$$

First we fit the nematic correlator given by Eq. (9), which from bosonization is expected to be the leading instability at high magnetizations. Using the part with $x \geq 5$ of the $L = 64$ data shown in Fig. 1c, we find $1/K_+ = 0.904 \pm 0.011$, $C_2 = 0.143 \pm 0.004$, and $C_3 = -0.326 \pm 0.013$. Fig.
1c shows that all finite-size results for the nematic correlator are well described by this fit, with the dependence on $L$ taken into account by substituting Eq. (12) for the power laws. Moreover, from $K_+ > 1$, we see that the system is indeed in the region dominated by nematic correlations for $M = 3/8$ and $J_1 = -J_2$.

Now we turn to the longitudinal correlation function, which we fit to the bosonization result Eq. (7). Since most numerical parameters have been determined by the previous fit, only one free parameter is left, which we determine from the numerical results of Fig. 1a for $L = 64$ and $x \geq 14$ as $C_1 = 0.060 \pm 0.004$. Predictions for other system sizes are again obtained by substituting Eq. (12) for the power laws. The agreement in Fig. 1a is not as good as in Fig. 1c. However, it improves at larger distances $x$ and system sizes $L$, indicating that corrections omitted in Eq. (7) are still relevant on the length scales considered here.

Finally, the xy-correlation function is shown in Fig. 1b with a logarithmic scale of the vertical axis of this panel. The exponential decay predicted by Eq. (8) is verified. One further observes that correlations between the in-plane spin operators belonging to different chains (odd $x$) are an order of magnitude smaller than on the same chain (even $x$). This suppression of correlations between different chains corresponds to the $\delta$ symbol in (8), which strictly applies only in the thermodynamic limit and for large distances.

We summarize the main result of this section: in-plane spin correlators are exponentially suppressed for any finite value of the magnetization in the parameter region $|J_1| < J_2$. The ground state crosses over from a spin-density-wave dominated to a nematic-like phase with increasing magnetic field, with the crossover line given by Eq. (11).

IV. EXCITATIONS

We next address the excitation spectrum.
Since the gap to $\Delta S^z = 1$ excitations should be directly accessible to microscopic experimental probes such as inelastic neutron scattering or nuclear magnetic resonance, we analyze its behavior as a function of magnetization. Sufficiently below the fully polarized state the gap can be calculated analytically using results from sine-Gordon theory. In addition, to leading order in the interchain coupling, one can get qualitative expressions using dimensional arguments for the perturbed conformally invariant model:

$$\Delta_1(m) \sim \left[ \frac{c^2(m)\,|J_1|\,\sin(\pi m)}{v(m)\,\bigl(1 - J_1 K(m)/\pi v(m)\bigr)} \right]^{1/\nu} , \qquad (13)$$

where $\nu = 2 - 2K(m)\,[1 + J_1 K(m)/\pi v(m)]$. $m(h)$, $K(h)$ and $v(h)$ can be determined numerically from the Bethe ansatz integral equations.16,17,18,19

With this information and Eqs. (4) and (13) we determine the qualitative behavior of the single-spin gap $\Delta_1(M)$ as a function of $M$: it increases from zero at zero magnetization, reaches a maximum at intermediate magnetization values, then shows a minimum and, upon approaching the saturation magnetization, it increases again. As our formulas do not strictly apply at $m = 0$, the notion of a vanishing gap at zero magnetization may be a spurious result. Note that when the fully polarized state is approached, the magnetization increases in an unphysical fashion since in this limit bosonization becomes inapplicable. At the point where the magnetization saturates, the exact value of the gap can be obtained from the following mapping to hard-core bosons:8,11,20

$$S^z_i = \tfrac{1}{2} - a^\dagger_i a_i , \qquad S^-_i = a^\dagger_i . \qquad (14)$$

Comparing Eq. (14) with Eq. (2) one recognizes the leading terms in Haldane's harmonic fluid transformation for bosons.20 Using a ladder approximation which is exact in the two-magnon subspace we arrive at:

$$\Delta_1 = \frac{4J_2^2 - 2J_1J_2 - J_1^2}{2(J_2 - J_1)} - \frac{J_1^2 + 8J_1J_2 + 16J_2^2}{8J_2} = \frac{J_1^2\,(3J_2 + J_1)}{8J_2\,(J_2 - J_1)} . \qquad (15)$$

In Eq. (15) we have represented the gap as a difference of two terms, the quantum and the classical instability fields, emphasizing its quantum origin.
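The saturation gap can be checked numerically. The sketch below assumes that Eq. (15) is the difference between a quantum instability field $(4J_2^2 - 2J_1J_2 - J_1^2)/[2(J_2 - J_1)]$ and a classical instability field $(J_1^2 + 8J_1J_2 + 16J_2^2)/(8J_2)$; this reading is an assumption, but it reproduces the value $\Delta_1/J_2 \approx 0.023$ quoted in the text for $J_1 = -0.3\,J_2$. The function names are hypothetical.

```python
def instability_fields(J1, J2):
    """Return (h_quantum, h_classical), the two terms assumed to enter Eq. (15)."""
    h_quantum = (4 * J2**2 - 2 * J1 * J2 - J1**2) / (2 * (J2 - J1))
    h_classical = (J1**2 + 8 * J1 * J2 + 16 * J2**2) / (8 * J2)
    return h_quantum, h_classical


def saturation_gap(J1, J2):
    """Closed form of the difference h_quantum - h_classical."""
    return J1**2 * (3 * J2 + J1) / (8 * J2 * (J2 - J1))


J1, J2 = -0.3, 1.0                      # parameter ratio used for LiCuVO4 in the text
h_q, h_cl = instability_fields(J1, J2)
print(h_q, h_cl, h_q - h_cl)            # the difference is the gap in units of J2
print(saturation_gap(J1, J2))           # same number from the closed form
```

The two routes agree because the difference of the two fractions simplifies algebraically to $J_1^2(3J_2 + J_1)/[8J_2(J_2 - J_1)]$, which is small and positive for weak ferromagnetic $J_1$.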
In order to verify these field theory predictions, we perform complementary numerical computations using the DMRG method.21 Open boundary conditions are imposed and we typically keep up to 400 DMRG states. From DMRG we obtain the ground-state energies $E(S^z)$ as a function of total $S^z$. For those values of $S^z$ that emerge as a ground state in an external magnetic field we compute the single-spin excitation gap from

$$\Delta_1(M) = E(S^z + 1) + E(S^z - 1) - 2E(S^z) . \qquad (16)$$

Fig. 2a shows numerical results for $\Delta_1$ at a selected value of $J_1 = -0.3\,J_2 < 0$ for the largest system sizes investigated. We find that the finite-size behavior of the gap $\Delta_1(M, L)$ for system sizes $L \geq 24$ is well described by a $1/L$ correction. This will be further corroborated by field-theoretical arguments outlined below. Therefore, we extrapolate it to the thermodynamic limit using a fit to the form

$$\Delta_1(M, L) = \Delta_1(M) + \frac{a(M)}{L} + \cdots , \qquad (17)$$

allowing for an additional $1/L^2$ correction for those values of $M$ where at least 4 different system sizes are available. This extrapolation is represented by the full circles in Fig. 2a; errors are estimated not to exceed the size of the symbols. Our extrapolation for $\Delta_1$ is consistent with a vanishing gap at $M = 0$, in agreement with previous numerical studies,22 although bosonization predicts a non-zero, possibly very small, gap.13,22,23 The behavior of $\Delta_1(M)$ confirms the picture described above: the gap is non-zero for $M > 0$, goes first through a maximum and then a minimum, and finally approaches $\Delta_1/J_2 \approx 0.023$ given by Eq. (15) for $M \to 1/2$.

FIG. 2: (Color online) Density matrix renormalization group results for the gaps at $J_1 = -0.3\,J_2 < 0$ as a function of magnetization $M$. Panel (a) shows the single-spin excitation gap (16), panel (b) the finite-size gap (19) for two flipped spins multiplied by the chain length $L$ (legend: $L = 120$, 144, 156, and extrapolated values).
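The procedure of Eqs. (16) and (17) can be sketched as follows. This is a minimal illustration, not the authors' DMRG workflow: the function names are hypothetical, the energies and gaps in the example are synthetic numbers, and the fit is an ordinary least-squares fit in powers of $1/L$ solved with pure-Python Gaussian elimination.

```python
def single_spin_gap(energies, sz):
    """Eq. (16): energies maps total S^z -> ground-state energy in that sector."""
    return energies[sz + 1] + energies[sz - 1] - 2.0 * energies[sz]


def fit_gap(sizes, gaps, order=1):
    """Least-squares fit gap(L) = c0 + c1/L (+ c2/L**2 if order=2).

    Returns [c0, c1, ...]; c0 is the extrapolated gap, c1 the 1/L amplitude.
    """
    L0 = min(sizes)
    u = [L0 / L for L in sizes]          # rescaled 1/L of order one (better conditioning)
    n = order + 1
    # Normal equations (A^T A) c = A^T y with A[i][k] = u[i]**k.
    ata = [[sum(ui ** (j + k) for ui in u) for k in range(n)] for j in range(n)]
    aty = [sum(ui ** j * g for ui, g in zip(u, gaps)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for k in range(col, n):
                ata[r][k] -= f * ata[col][k]
            aty[r] -= f * aty[col]
    # Back substitution.
    c = [0.0] * n
    for r in reversed(range(n)):
        s = aty[r] - sum(ata[r][k] * c[k] for k in range(r + 1, n))
        c[r] = s / ata[r][r]
    # Undo the rescaling: c[k] multiplies (L0/L)**k.
    return [c[k] * L0 ** k for k in range(n)]


# Synthetic example: gaps generated from 0.023 + 0.5/L + 2/L**2.
sizes = [96, 120, 144, 156]
gaps = [0.023 + 0.5 / L + 2.0 / L**2 for L in sizes]
print(fit_gap(sizes, gaps, order=2))
```

With four system sizes, `order=2` corresponds to the additional $1/L^2$ term mentioned in the text; with fewer sizes, `order=1` keeps only the leading correction.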
We further wish to point out that for chains with periodic boundary conditions, the coefficient $a(M)$ of the finite-size extrapolation Eq. (17) is determined by the spin-wave velocity and the critical exponent of the soft mode from the $\Delta S^z = 2$ channel. Indeed, setting $r = 0$ in Eq. (8) and using the conformal mapping (12) to the cylinder, we see that the leading finite-size correction to the gap is:

$$\Delta_1(M, L) = \Delta_1(M) + \frac{\pi v_+(M)}{4K_+(M)\,L} . \qquad (18)$$

Note that we have to replace sin with sinh in Eq. (12) in order to extract a gap, since we are dealing with Euclidean time. In addition we used the fact that in our approximation the effective Hamiltonian (5) is a direct sum of symmetric and antisymmetric sectors. Moreover, only the symmetric sector enjoys conformal invariance, and consequently we perform the replacement $\tau \to \sinh\tau$ only in the symmetric sector. The antisymmetric sector has a spectral gap and its contribution to the finite-size corrections of the single-spin flip excitation energy is exponentially suppressed with system size.24 With this method one cannot fix the amplitudes of the $1/L^2$ term and beyond. Note furthermore that there may be additional surface terms for open boundary conditions as employed in the numerical DMRG computations. Nevertheless, there is a dominant $1/L$ correction in any case.

FIG. 3: (Color online) Numerical dispersion spectrum in the subspaces of odd $S^z$ computed for $L = 24$ and $J_1 = -J_2 < 0$, plotted against the wave vector (0 to $\pi$). The wave vector is given relative to the ground state wave vector (0 for $S^z = 0, 4, 8$ and $\pi$ for $S^z = 2, 6$).

Next, we briefly look at the $\Delta S^z = 2$ excitations. Their finite-size gap is, in analogy to Eq. (16), computed with DMRG from

$$\Delta_2(M) = E(S^z + 2) + E(S^z - 2) - 2E(S^z) . \qquad (19)$$

Fig. 2b shows numerical results for $L\Delta_2(M, L)$, again at the value $J_1 = -0.3\,J_2 < 0$.
One observes that the scaled finite-size gaps collapse onto a single curve, which shows that $\Delta_2(M, L)$ scales linearly to zero with $1/L$, exactly as expected for gapless excitations in 1D. Furthermore, we observe that the scaled quantity $L\Delta_2(M, L)$ vanishes as one approaches saturation $M = 1/2$, which indicates a vanishing of the velocity of the corresponding excitations at saturation.

We proceed by discussing the wave-vector dependence of the $\Delta S^z = 1$ excitation, while we remind the reader that the low-energy excitations are in the $\Delta S^z = 2$ sector. Fig. 3 shows representative ED results obtained for rings with $L = 24$ and $J_1 = -J_2 < 0$. For ground states with low $S^z$, the $\Delta S^z = 1$ excitation spectrum looks similar to the continuum of spinons. On the other hand, close to saturation one has single-magnon excitations with a minimum given by the classical value of the wave vector $k_{\rm cl} = \arccos(|J_1|/4J_2)$.8,13,25 We read off from Fig. 3 that upon lowering the magnetic field, this minimum shifts from the classical incommensurate value towards $\pi/2$, i.e. the value appropriate for two decoupled chains. This renormalization of the minimum of the magnon excitations towards the value of decoupled chains can be interpreted in terms of quantum fluctuations, which are enhanced when the density of magnons increases. A strong quantum renormalization of the pitch angle from its classical value at zero magnetization was previously observed by the coupled-cluster method and DMRG calculations.26

V. SUMMARY

We have combined numerical techniques with analytical approaches and mapped out the ground state phase diagram of the frustrated ferromagnetic spin chain in an external magnetic field. We have established that with increasing magnetic field, the ground state crosses over from a spin-density-wave dominated to a nematic-like phase. Single spin flip excitations are gapped, giving rise to an exponential decay of in-plane spin correlation functions in both regimes.
We have studied the single- and two-spin flip excitation energy numerically. Using tools from conformal field theory we have further shown that the amplitude of the leading $1/L$ correction term to the single-spin flip gap is determined by the critical exponent and the spin-wave velocity of the soft mode.

Finally, in order to apply our findings to the material LiCuVO4, one should take into account interchain interactions as well as anisotropies, which are expected to be present in this system.3 At low fields, a helical state has been observed experimentally.2,3 On the other hand, for the purely one-dimensional case, we have shown that upon increasing the magnetic field there is a competition between spin-density-wave and nematic-like tendencies. Those are the two leading instabilities at high magnetizations and thus they are the natural candidates to become long-range ordered in higher dimensions. The question whether there are true phase transitions at high fields in higher dimensions is beyond the scope of the current work.

Acknowledgments

We thank A. Feiguin for providing us with his DMRG code used for large scale calculations. Most of T.V.'s work was done during his visits to the Institutes of Theoretical Physics at the Universities of Hannover and Göttingen, supported by the Deutsche Forschungsgemeinschaft. The hospitality of the host institutions is gratefully acknowledged. T.V. also acknowledges support from the Georgian National Science Foundation under grant N 06−81−4−100. LPTMS is a mixed research unit 8626 of CNRS and University Paris-Sud. A.H. is supported by the Deutsche Forschungsgemeinschaft (Project No. HO 2325/4-1), and F.H.-M. is supported by NSF grant No. DMR-0443144.

1 M. Hase, H. Kuroe, K. Ozawa, O. Suzuki, H. Kitazawa, G. Kido, and T. Sekine, Phys. Rev. B 70, 104426 (2004). 2 B. J. Gibson, R. K. Kremer, A. V. Prokofiev, W. Assmus and G. J. McIntyre, Physica B 350, e253 (2004). 3 M. Enderle, C. Mukherjee, B. Fåk, R. K. Kremer, J.-M.
Broto, H. Rosner, S.-L. Drechsler, J. Richter, J. Malek, A. Prokofiev, W. Assmus, S. Pujol, J.-L. Raggazzoni, H. Rakoto, M. Rheinstädter, and H. M. Rønnow, Europhys. Lett. 70, 237 (2005). 4 M. G. Banks, F. Heidrich-Meisner, A. Honecker, H. Rakoto, J.-M. Broto, and R. K. Kremer, J. Phys.: Cond. Mat. 19, 145227 (2007). 5 N. Büttgen, H.-A. Krug von Nidda, L. E. Svistov, L. A. Prozorova, A. Prokofiev, and W. Aßmus, Phys. Rev. B 76, 014440 (2007). 6 S.-L. Drechsler, O. Volkova, A. N. Vasiliev, N. Tristan, J. Richter, M. Schmitt, H. Rosner, J. Málek, R. Klingeler, A. A. Zvyagin, and B. Büchner, Phys. Rev. Lett. 98, 077202 (2007). 7 H.-J. Mikeska and A. K. Kolezhuk, Lect. Notes Phys. 645, 1 (2004). 8 A. V. Chubukov, Phys. Rev. B 44, 4693 (1991). 9 F. Heidrich-Meisner, A. Honecker, and T. Vekua, Phys. Rev. B 74, 020403(R) (2006). 10 D. V. Dmitriev, V. Y. Krivnov, and J. Richter, Phys. Rev. B 75, 0114424 (2007). 11 R. O. Kuzian and S.-L. Drechsler, Phys. Rev. B 75, 024401 (2007). 12 See, e.g., T. Hikihara and A. Furusaki, Phys. Rev. B 69, 064427 (2004). 13 D. C. Cabra, A. Honecker, and P. Pujol, Eur. Phys. J. B 13, 55 (2000). 14 H. J. Schulz, Phys. Rev. B 34, 6372 (1986). 15 A. Läuchli, G. Schmid, and S. Trebst, Phys. Rev. B 74, 144426 (2006). 16 K. Totsuka, Phys. Lett. A 228, 103 (1997). 17 D. C. Cabra, A. Honecker, and P. Pujol, Phys. Rev. B 58, 6241 (1998). 18 V. E. Korepin, N. M. Bogoliubov, and A. G. Izergin, Quan- tum Inverse Scattering Method and Correlation Functions (Cambridge University Press, Cambridge, England, 1993). 19 S. Qin, M. Fabrizio, L. Yu, M. Oshikawa, and I. Affleck, Phys. Rev. B 56, 9766 (1997). 20 F. D. M. Haldane, Phys. Rev. Lett 47, 1840 (1981). 21 S. R. White, Phys. Rev. Lett. 69, 2863 (1992); Phys. Rev. B 48, 10345 (1993). 22 C. Itoi and S. Qin, Phys. Rev. B 63, 224423 (2001). 23 A. A. Nersesyan, A. O. Gogolin, and F. H. L. Eßler, Phys. Rev. Lett. 81, 910 (1998). 24 S. I. Matveenko and A. M. Tsvelik, private communication. 25 C. Gerhardt, K.-H. 
Mütter, and H. Kröger, Phys. Rev. B 57, 11504 (1998). 26 R. Bursill, G. A. Gehring, D. J. J. Farnell, J. B. Parkinson, T. Xiang, and C. Zeng, J. Phys. Cond. Mat. 7, 8605 (1995).

<|endoftext|><|startoftext|> Evidence of Spatially Inhomogeneous Pairing on the Insulating Side of a Disorder-Tuned Superconductor-Insulator Transition

K. H. Sarwa B. Tan, Kevin A. Parendo, and A. M. Goldman
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA

Abstract

Measurements of transport properties of amorphous insulating InxOy thin films have been interpreted as evidence of the presence of superconducting islands on the insulating side of a disorder-tuned superconductor-insulator transition. Although the films are not granular, the behavior is similar to that observed in granular films. The results support theoretical models in which the destruction of superconductivity by disorder produces spatially inhomogeneous pairing with a spectral gap.

The interplay between localization and superconductivity can be investigated through studies of disordered superconducting films [1], originally treated by Anderson [2], and Abrikosov and Gor'kov [3], who considered the low-disorder regime.
Several approaches have been proposed for strong disorder, including fermionic mean field theories [4, 5, 6] and theories that focus on the universal critical properties near the superconductor-insulator transition. The latter consider the transition to belong to the dirty boson universality class [7]. When quantum fluctuations are included in fermionic theories for high levels of disorder, a spatially inhomogeneous pairing amplitude is found which retains a nonvanishing spectral gap [8]. For sufficiently disordered systems inhomogeneous pairing can also be brought about by thermal fluctuations [9]. A similar inhomogeneous regime has also been considered under the rubric of electronic microemulsions in the context of the metal-insulator transition of two dimensional electron gases [10]. In this letter we provide evidence of a spatially inhomogeneous order parameter on the insulating side of a superconductor-insulator transition driven by structural and/or chemical disorder.

Studies of disorder and magnetic field tuned superconductor-insulator transitions have usually been carried out on films that are either amorphous or granular. In the former the disorder is on an atomic scale, and in the latter on a mesoscopic scale, in which case the films consist of metallic grains or clusters connected by tunneling that are either embedded in an insulating matrix or on a bare substrate [1]. Amorphous films can be produced when films of metal atoms such as Pb or Bi are grown at liquid helium temperatures on substrates precoated with a wetting layer of amorphous Ge or Sb [11], or by careful vapor deposition of MoxGey, InxOy, or TiN using a variety of techniques.

Granular films are known to develop superconductivity in stages. If the grains are small and weakly connected, the film is an insulator. For grains larger than some characteristic size, and sufficiently close together, "local superconductivity" will develop below some temperature.
The opening of a spectral gap in the density of states of the grains [12] results in a relatively sharp upturn in the resistance below this temperature, which is usually close to the transition temperature of the bulk material. For well enough coupled grains, there may be a small drop in resistance at that temperature, followed by this upturn. This is in contrast with the "global superconductivity" that occurs when a sufficient fraction of the grains or clusters are strongly enough Josephson coupled to form a percolating superconducting path across the film.

We have measured the temperature and magnetic field dependence of the resistance, and nonlinear conductance-voltage characteristics of amorphous InxOy films prepared by electron-beam evaporation. These films are not granular, but nevertheless exhibit local superconductivity at the lowest temperatures. At low temperatures, the application of a magnetic field results in a dramatic rise in resistance exhibiting a maximum that, with decreasing temperature, is found at decreasingly small fields. The conductance-voltage characteristics in this high resistance regime are nonlinear in a manner suggestive of single-particle tunneling between superconductors. We argue that these observations are consistent with the presence of droplets, or islands of superconductivity, characterized by a nonvanishing superconducting pair amplitude and coupled by tunneling. Many of the droplets are Josephson coupled, but their density is not high enough to produce a superconducting path across the film.

The 22 nm thick films used in this study were deposited at a rate of 0.4 nm/s by electron beam evaporation onto (001) SrTiO3 epi-polished single crystal substrates. Platinum electrodes, 10 nm in thickness, were deposited prior to growth. The starting material was 99.999 % pure In2O3. A shadow mask defined a Hall bar geometry in which the effective area for four-terminal resistance measurements was 500 × 500 µm².
As-grown films exhibited sheet resistances of about 4600 Ω at room temperature and about 23 kΩ at 10 K. By annealing at relatively low temperatures (55-70 °C) in a high vacuum environment (10⁻⁷ Torr), film resistances were lowered, and depending upon the annealing time either insulating or superconducting behavior at low temperatures could be induced [13]. Low-temperature rather than high-temperature annealing avoids changes in morphology that would result in granular or microcrystalline films. As reported by Gantmakher et al. [14], at room temperature the resistances of annealed films were found to be unstable. However, at low temperatures (40-1400 mK) and in vacuum, they were stable. The films of the present study had resistances of 2600 Ω at room temperature.

http://arxiv.org/abs/0704.0765v2

FIG. 1: Resistance vs. temperature for Films 1 and 2.

Film structure was studied using atomic force microscopy (AFM), X-ray diffraction (XRD) analysis, and high resolution scanning electron microscopy (SEM). From the XRD there was no indication of the presence of crystalline In or In2O3. The SEM did not detect any In inclusions, and could be correlated with AFM studies which revealed, for a 22 nm thick film, roughness in the form of surface features with a height of 8.5 nm and with bases about 18 nm in diameter. The conclusion from these characterization efforts is that the films were homogeneous and amorphous, and did not contain isolated grains or In inclusions.

Measurements were carried out in an Oxford Kelvinox-25 dilution refrigerator housed in a screen room, with measuring leads filtered at room temperature using π-section filters and RC filters. For measurements of resistance, the applied current was set in the range of 10-100 pA, to avoid the possibility of heating.

Figure 1 shows a plot of R(T) for two films which were studied in detail. For each, dR/dT is negative at the lowest temperatures.
In the case of Film 1 there is a local minimum in R(T) at about 350 mK. Both films exhibit a sharp upturn in R(T) between 200 and 300 mK, with the effects to be discussed below occurring for Film 1 at higher temperatures than for Film 2. These behaviors are suggestive of a regime of local superconductivity [12].

The sheet resistances of Films 1 and 2 were both approximately 78 kΩ at 40 mK. In a perpendicular magnetic field of only 0.2 T, their sheet resistances increased by up to a factor of 40 at 40 mK. The maximum in R(B), as shown in Fig. 2(a) for Film 1, is followed, at the lowest temperatures, by a relatively slow decrease in resistance with increasing field. The resistance maximum moves to higher fields with increasing temperature. The behavior of Film 2 resembled the higher temperature data for Film 1, presumably because Film 2 exhibited weaker traces of superconductivity, as evidenced by the absence of a local minimum in R(T) in zero field. This variation in properties from film to film is expected, as small changes in chemistry and/or morphology can have a large effect on disordered film properties. The temperature dependencies of the fields Bpeak and resistances Rpeak are presented in Fig. 2(b). A qualitatively similar, but weaker, enhancement of resistance was previously reported for insulating InxOy films by Gantmakher and coworkers [14]. A larger enhancement was reported for ultrathin insulating Be thin films [15]. However, neither of these works demonstrated the systematic effects shown in Fig. 2(b).

To probe the nature of the high resistance state, differential conductance-voltage characteristics were also studied [15, 16, 17, 18, 19, 20]. These are shown in Fig. 3 for Film 2, which was studied in detail. Film 1 exhibited qualitatively similar features.
All of the nonlinear conductance-voltage characteristics are reminiscent of the single-particle tunneling characteristics of superconductor-insulator-superconductor (SIS) tunneling junctions. The effects of electron heating are found at voltages well above the observed conductance thresholds [21]. The fact that the low voltage nonlinearities vanish at temperatures above approximately 200 mK suggests that they are associated with the presence of a nonvanishing pairing amplitude occurring in the disconnected superconducting regions.
We can model these thin disordered films exhibiting spatially inhomogeneous pairing as random networks of tunneling junctions of various (random) levels of conductivity, connecting superconducting clusters embedded in an insulating matrix. Some of these junctions are superconducting because they are Josephson coupled. As a consequence there are “superclusters,” which are aggregates of Josephson coupled smaller clusters that may cover a macroscopic fraction of the film area. Charge may flow through “superclusters” with zero electrical resistance. However, as long as these do not span the film the resistance will be determined by single-particle tunneling. The sheet resistance of the resultant network can be inferred using the following simple argument [22]. Disconnect all of the junctions in the network whose conductance involves single particle tunneling, and then reconnect them one by one in ascending order of resistance. A stage will be reached at which the next junction completes an infinite cluster connecting the ends of the network. Let the normal state resistance of this last junction be Rn. The actual value will depend upon the nature of the distribution of single-particle tunneling resistances in the film. The measured normal-state sheet resistance of the entire network will then be Rn, as this junction is the bottleneck. Junctions with R > Rn are irrelevant since they are always shunted by junctions with resistances of order Rn. Junctions with R < Rn only form finite clusters over macroscopic distances. They do not affect the conductivity because current must still pass through junctions with resistance of order Rn to get from one “supercluster” to the next. The action of a magnetic field is to quench the Josephson coupling within the “superclusters.” When this happens, the resistance at each temperature will be governed by the new, higher value of the bottleneck resistance, as there will no longer be any Josephson-coupled “superclusters,” and the distribution of junction resistances will shift towards higher values of resistance.
FIG. 2: (a) Resistance vs. magnetic field for Film 1. The temperatures are 40 (top), 80, 100, 120, 130, 140, 150, 170, 180, 200, 230, 250, 300, 350, 400, and 500 mK (bottom). (b) The fields (left axis) and the resistances (right axis) of the peaks in R(B) are plotted as a function of temperature. The flattening of Rpeak at the lowest temperatures may be the result of a failure to cool the electrons.
FIG. 3: Differential conductance vs. voltage of Film 2 at 100 mK for 0 (top), 0.01, 0.02, 0.03, 0.04, 0.06, 0.1, 0.175, 0.25, 0.5, and 1 T (bottom).
The fact that the magnetic field that induces higher resistance decreases with decreasing temperature is a counter-intuitive result, implying a divergent magnetic length scale in the zero temperature limit, possibly of the form $[\phi_0/B]^{1/2}$, where $\phi_0$ is the flux quantum. This result suggests enhanced quantum fluctuations in the zero temperature limit. A heuristic argument can be made to demonstrate that this is plausible by treating the inhomogeneous pairing state of the film as a collection of superconducting grains or islands coupled by tunneling junctions. Without the inclusion of quantum fluctuations the argument may not capture all of the features of the data.
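The reconnection argument for the bottleneck resistance Rn given earlier can be illustrated with a short numerical sketch. It is not the authors' calculation: the square lattice, the log-uniform spread of junction resistances, the treatment of Josephson-coupled junctions as zero-resistance bonds, and the names `bottleneck_resistance`, `square_lattice_bonds`, and `with_josephson` are all assumptions made for illustration. Bonds are added in ascending order of resistance, and a union-find structure reports the bond that first connects the two electrodes.

```python
import random


def bottleneck_resistance(bonds, left, right):
    """Resistance of the bond that first links `left` to `right`.

    bonds: iterable of (resistance, node_a, node_b)
    left, right: sets of nodes touching the two electrodes
    Returns None if the electrodes never become connected.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Tie every electrode node to a virtual LEFT or RIGHT terminal.
    for node in left:
        union("LEFT", node)
    for node in right:
        union("RIGHT", node)

    # Reconnect junctions one by one in ascending order of resistance.
    for resistance, a, b in sorted(bonds, key=lambda bond: bond[0]):
        union(a, b)
        if find("LEFT") == find("RIGHT"):
            return resistance  # this junction completes the spanning cluster
    return None


def square_lattice_bonds(n, rng):
    """Nearest-neighbour bonds on an n x n lattice (assumed geometry) with
    log-uniform single-particle resistances between 1e3 and 1e7 ohms."""
    bonds = []
    for row in range(n):
        for col in range(n):
            for neighbor in ((row, col + 1), (row + 1, col)):
                if neighbor[0] < n and neighbor[1] < n:
                    bonds.append((10 ** rng.uniform(3, 7), (row, col), neighbor))
    return bonds


def with_josephson(bonds, rng, fraction):
    """Copy of `bonds` in which a random fraction is Josephson coupled,
    modelled here (an assumption) as zero resistance."""
    return [(0.0 if rng.random() < fraction else r, a, b) for r, a, b in bonds]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 30
    left = {(row, 0) for row in range(n)}
    right = {(row, n - 1) for row in range(n)}
    quenched = square_lattice_bonds(n, rng)        # field on: no Josephson links
    coupled = with_josephson(quenched, rng, 0.3)   # zero field: some links are free
    print("bottleneck with Josephson coupling:", bottleneck_resistance(coupled, left, right))
    print("bottleneck after quenching:        ", bottleneck_resistance(quenched, left, right))
```

Because every bond in the quenched network is at least as resistive as its counterpart in the coupled one, the bottleneck value can only stay the same or rise when the Josephson links are removed, which is the qualitative behavior the argument relies on.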
It is first useful to consider the magnetic field dependence of the in-plane Josephson coupling between two planar thin film square islands with an area $L^2$. This is a geometry resembling the grain boundary geometry of high temperature superconductor junctions. A magnetic field applied perpendicular to the plane will completely penetrate both electrodes of such a junction. As a consequence the minima of the diffraction pattern will be governed by the field corresponding to a single flux quantum over the full area of the structure, or $B[L(2L+d)] = \phi_0$, where $d$ is the width of the barrier or gap [23]. Since $L \gg d$, the field at the first minimum of the diffraction pattern would be found at a value proportional to $L^{-2}$. For a “supercluster” consisting of a square array of square islands that are Josephson coupled, with some degree of randomness in the coupling, one would expect coherence to vanish at the first minimum. For a random array and with clusters that are not square, one might expect a similar dependence on $L^{-2}$. If the characteristic size of the islands increased with decreasing temperature, which is a plausible assumption, the field quenching the Josephson coupling would be expected to decrease, as is observed. For the films studied, at the lowest temperatures, the peak in the resistance occurs in a field of 0.2 T, which would correspond to a length of approximately 100 nm. The fall off of the resistance at fields above that producing the maximum slows with decreasing temperature, consistent with the strengthening of the pairing amplitude with decreasing temperature. The fact that this remnant of superconductivity persists to fields up to 12 T, far above the bulk critical field of InxOy, implies that the superconducting islands are much smaller than the penetration depth.
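As a rough check on the length scale quoted above, the one-flux-quantum condition B[L(2L + d)] = φ0 can be solved for L directly. The sketch below is only an order-of-magnitude illustration; the function names `island_size` and `first_minimum_field` are hypothetical, and taking the gap width d to be zero by default is an assumption justified only by L being much larger than d.

```python
import math

PHI0 = 2.067833848e-15  # superconducting flux quantum h/(2e), in Wb


def island_size(b_tesla, gap_m=0.0):
    """Positive root L (in metres) of B * L * (2L + d) = phi0.

    Rearranged as 2B L^2 + B d L - phi0 = 0 and solved with the
    quadratic formula.
    """
    a = 2.0 * b_tesla
    b = b_tesla * gap_m
    return (-b + math.sqrt(b * b + 4.0 * a * PHI0)) / (2.0 * a)


def first_minimum_field(length_m, gap_m=0.0):
    """Field (in tesla) that puts one flux quantum through the area L(2L + d)."""
    return PHI0 / (length_m * (2.0 * length_m + gap_m))


if __name__ == "__main__":
    size = island_size(0.2)  # peak field at the lowest temperatures
    print(f"island edge L at 0.2 T: {size * 1e9:.0f} nm")
    print(f"full junction width 2L: {2 * size * 1e9:.0f} nm")
    # Doubling L lowers the quenching field by a factor of four (B ~ L^-2).
    print(f"field for 2L islands:   {first_minimum_field(2 * size):.3f} T")
```

With d neglected, the island edge comes out at several tens of nanometres and the full junction width at somewhat over 100 nm, consistent in order of magnitude with the "approximately 100 nm" estimate in the text.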
It should be possible to develop a detailed percolation model of this effect, similar to that developed for granular superconductors [24], which includes the quenching of the Josephson coupling by magnetic field and quantum fluctuations.
Although the resistance of the films of the present work increases with decreasing temperatures in zero magnetic field, there is no guarantee that at some temperature lower than the minimum value accessed in these measurements, the resistance will not fall to zero. This could result from the percolation of Josephson coupling across the film as the size of the clusters increases. In that event the inhomogeneous pairing implied by the data would be more likely governed by a theory including both quantum [8] and thermal [9] fluctuations.
The large peaks in the magnetoresistance found at fields above the magnetic field-induced superconductor-insulator transition of superconducting amorphous InxOy thin films may result from a similar inhomogeneity of the pairing amplitude, in that case induced by magnetic field rather than disorder. Such peaks were first reported by Hebard and Paalanen [25], who suggested that the state induced when superconductivity was quenched was a Bose insulator, characterized by localized Cooper pairs. They proposed that the peak was the signature of a crossover to a Fermi insulating state of localized electrons. This resistance peak has been the subject of more recent studies of InxOy films [14, 16, 26], of microcrystalline TiN films where the high field limit appears to be a “quantum metal” [18], and of high-Tc superconductors [27]. The fact that inhomogeneous pairing can be induced in disordered superconductors by magnetic fields has been recently established using a Hubbard Model [28].
The notion that disorder implies inhomogeneity of superconducting order on some length scale was first discussed by Kowal and Ovadyahu [29], and as was mentioned earlier emerges naturally in a fermionic model of the superconductor-insulator transition that exhibits a disorder-tuned inhomogeneity of the pairing amplitude [8]. The films of Kowal and Ovadyahu differ from those of the present work in that they are presumably more disordered, and thus further into the insulating regime. Their magnetoresistance is always negative as there is no Josephson coupling between islands and the main effect of magnetic field is to weaken the inhomogeneous pairing amplitude, leading to negative magnetoresistance.
This work was supported by the National Science Foundation under grant no. NSF/DMR-0455121. The authors would like to thank Zvi Ovadyahu for providing samples and for critical comments, and Leonid Glazman and Alex Kamenev for useful discussions.
[1] A. M. Goldman and N. Marković, Phys. Today 51 (11), 39 (1998).
[2] P. W. Anderson, J. Phys. Chem. Solids 11, 26 (1959).
[3] A. A. Abrikosov and L. P. Gor’kov, Zh. Eksp. Teor. Fiz. 36, 319 (1959) [Sov. Phys. JETP 9, 220 (1959)].
[4] H. Fukuyama, H. Ebisawa, and S. Maekawa, J. Phys. Soc. Jpn. 53, 3560 (1984).
[5] M. Ma and P. A. Lee, Phys. Rev. B 32, 5658 (1985).
[6] A. M. Finkel’stein, Physica B 197, 636 (1994).
[7] M. P. A. Fisher, Phys. Rev. Lett. 65, 923 (1990).
[8] A. Ghosal, M. Randeria, and N. Trivedi, Phys. Rev. Lett. 81, 3940 (1998); A. Ghosal, M. Randeria, and N. Trivedi, Phys. Rev. B 65, 014501 (2001).
[9] L. N. Bulaevskii, S. V. Panyukov, and M. V. Sadovskii, Zh. Eksp. Teor. Fiz. 92, 672 (1987) [Sov. Phys. JETP 65, 380 (1987)].
[10] Boris Spivak and Steven A. Kivelson, arXiv:cond-mat/0510422 v2.
[11] Myron Strongin, R. S. Thompson, O. F. Kammerer, and J. E. Crow, Phys. Rev. B 1, 1078 (1970).
[12] B. G. Orr, H. M. Jaeger, and A. M. Goldman, Phys. Rev. B 32, 7586 (1985).
[13] Z. Ovadyahu, J. Phys. C 19, 5187 (1986).
[14] V. F. Gantmakher, M. V. Golubkov, J. G. S. Lok, and A. K. Geim, JETP 82, 951 (1996).
[15] E. Bielejec, J. Ruan, and W. Wu, Phys. Rev. B 63, 100502(R) (2001).
[16] G. Sambandamurthy, L. W. Engel, A. Johansson, and D. Shahar, Phys. Rev. Lett. 92, 107005 (2004); G. Sambandamurthy et al., Phys. Rev. Lett. 94, 017003 (2005).
[17] C. Christiansen, L. M. Hernandez, and A. M. Goldman, Phys. Rev. Lett. 88, 037004 (2002).
[18] T. I. Baturina, C. Strunk, M. R. Baklanov, and A. Satta, Phys. Rev. Lett. 98, 127003 (2007).
[19] W. Wu and P. W. Adams, Phys. Rev. B 50, 13065 (1994).
[20] R. P. Barber, Jr., Shih-Ying Hsu, J. M. Valles, Jr., R. C. Dynes, and R. E. Glover III, Phys. Rev. B 73, 134516 (2006).
[21] M. E. Gershenson, Yu. B. Khavin, D. Reuter, P. Schafmeister, and A. D. Wieck, Phys. Rev. Lett. 85, 1718 (2000).
[22] B. G. Orr, H. M. Jaeger, A. M. Goldman, and C. G. Kuper, Phys. Rev. Lett. 56, 378 (1986).
[23] K. L. Ngai, Phys. Rev. 182, 555 (1969).
[24] Pedro A. Pury and Manuel O. Cáceres, Phys. Rev. B 55, 3841 (1997).
[25] A. F. Hebard and M. A. Paalanen, Phys. Rev. Lett. 65, 927 (1990); M. A. Paalanen, A. F. Hebard, and R. R. Ruel, Phys. Rev. Lett. 69, 1604 (1992).
[26] M. Steiner and A. Kapitulnik, Physica C 422, 16 (2005).
[27] M. A. Steiner, G. Boebinger, and A. Kapitulnik, Phys. Rev. Lett. 94, 107008 (2005).
[28] Yonatan Dubi, Yigal Meir, and Yshai Avishai, Nature 449, 876 (2007).
[29] D. Kowal and Z. Ovadyahu, Solid State Commun. 90, 783 (1994).
ABSTRACT Measurements of transport properties of amorphous insulating indium oxide thin films have been interpreted as evidence of the presence of superconducting islands on the insulating side of a disorder-tuned superconductor-insulator transition. Although the films are not granular, the behavior is similar to that observed in granular films.
The results support theoretical models in which the destruction of superconductivity by disorder produces spatially inhomogeneous pairing with a spectral gap. <|endoftext|><|startoftext|> Introduction How Non-local is de Broglie-Bohm? How non-local is Hooke's Law? Entanglement and a priori Nonlocality Nonlocality involving violation of Bell's Inequality de Broglie-Bohm Trajectories for Massive Singlets Local vs. Nonlocal Velocity Prescriptions The Locality of de Broglie-Bohm Trajectories for Massive Singlets in Aligned Fields Computer Experiments Physical But Unfair Sampling Effects Photon EPR experiments Conclusion The Meaning of Nonlocality Résumé References ABSTRACT We present a local interpretation of what is usually considered to be a nonlocal de Broglie-Bohm trajectory prescription for an entangled singlet state of massive particles. After reviewing various meanings of the term ``nonlocal'', we show that by using appropriately retarded wavefunctions (i.e., the locality loophole) this local model can violate Bell's inequality, without making any appeal to detector inefficiencies. We analyze a possible experimental configuration appropriate to massive two-particle singlet wavefunctions and find that as long as the particles are not ultra-relativistic, a locality loophole exists and Dirac wave(s) can propagate from Alice or Bob's changing magnetic field, through space, to the other detector, arriving before the particle and thereby allowing a local interpretation to the 2-particle de Broglie-Bohm trajectories. We also propose a physical effect due to changing magnetic fields in a Stern-Gerlach EPR setup that will throw away events and create a detector loophole in otherwise perfectly efficient detectors, an effect that is only significant for near-luminal particles that might otherwise close the locality loophole. <|endoftext|><|startoftext|> Ground-based Microlensing Surveys Andrew Gould1, B. Scott Gaudi1, and David P. Bennett2 1.
Overview
Microlensing is a proven extrasolar planet search method that has already yielded the detection of four exoplanets. These detections have changed our understanding of planet formation “beyond the snowline” by demonstrating that Neptune-mass planets with separations of several AU are common. Microlensing is sensitive to planets that are generally inaccessible to other methods, in particular cool planets at or beyond the snowline, very low-mass (i.e. terrestrial) planets, planets orbiting low-mass stars, free-floating planets, and even planets in external galaxies. Such planets can provide critical constraints on models of planet formation, and therefore the next generation of extrasolar planet searches should include an aggressive and well-funded microlensing component. When combined with the results from other complementary surveys, next generation microlensing surveys can yield an accurate and complete census of the frequency and properties of planets, and in particular low-mass terrestrial planets. Such a census provides a critical input for the design of direct imaging experiments.
Microlensing planet searches can be carried out from either the ground or space. Here we focus on the former, and leave the discussion of space-based surveys for a separate paper. We review the microlensing method and its properties, and then outline the potential of next generation ground-based microlensing surveys. Detailed models of such surveys have already been carried out, and the first steps in constructing the required network of 1-2m class telescopes with wide FOV instruments are being taken. However, these steps are primarily being taken by other countries, and if the US is to remain competitive, it must commit resources to microlensing surveys in the relatively near future.
2. The Properties of Microlensing Planet Searches
If a foreground star (“lens”) becomes closely aligned with a more distant star (“source”), it bends the source light into two images.
The resulting magnification is a monotonic function of the projected separation. For Galactic stars, the image sizes and separations are of order µas and mas respectively, so they are generally not resolved. Rather, “microlensing events” are recognized from their time-variable magnification (Paczynski 1986), which typically occurs on timescales tE of months, although it ranges from days to years in extreme cases. Presently about 600 microlensing events are discovered each year, almost all toward the Galactic bulge.
If one of these images passes close to a planetary companion of the lens star, it further perturbs the image and so changes the magnification. Because the range of gravitational action scales as $\sqrt{M}$, where M is the mass of the lens, the planetary perturbation typically lasts $t_p \sim t_E \sqrt{m_p/M}$, where mp is the planet mass. That is, tp ∼ 1 day for Jupiters and tp ∼ 1.5 hours for Earths. Hence, planets are discovered by intensive, round-the-clock photometric monitoring of ongoing microlensing events (Mao & Paczynski 1991; Gould & Loeb 1992).
1 Department of Astronomy, The Ohio State University, 140 W. 18th Ave., Columbus, OH 43210, USA
2 Department of Physics, University of Notre Dame, IN 46556, USA
http://arxiv.org/abs/0704.0767v1
2.1 Sensitivity of Microlensing
While, in principle, microlensing can detect planets of any mass and separation, orbiting stars of any mass and distance from the Sun, the characteristics of microlensing favor some regimes of parameter space.
• Sensitivity to Low-mass Planets: Compared to other techniques, microlensing is more sensitive to low-mass planets. This is because the amplitude of the perturbation does not decline as the planet mass declines, at least until mass goes below that of Mars (Bennett & Rhie 1996).
The duration does decline as $\sqrt{m_p}$ (so higher cadence is required for small planets) and the probability of a perturbation also declines as $\sqrt{m_p}$ (so more stars must be monitored), but if a signal is detected, its magnitude is typically large (≳ 10%), and so easily characterized and unambiguous.
• Sensitivity to Planets Beyond the Snowline: Because microlensing works by perturbing images, it is most sensitive to planets that lie at projected distances where the images are the largest. This so-called “lensing zone” lies within a factor of 1.6 of the Einstein ring, $r_E = \sqrt{(4GM/c^2)\,D_s\,x(1-x)}$, where $x = D_l/D_s$ and $D_l$ and $D_s$ are the distances to the lens and source. At the Einstein ring, the equilibrium temperature is
$$T_E = T_\oplus \left(\frac{L}{L_\odot}\right)^{1/4} \left(\frac{r_E}{\mathrm{AU}}\right)^{-1/2} \rightarrow 70\,\mathrm{K}\; \frac{M}{0.5\,M_\odot}\,[4x(1-x)]^{-1/4} \qquad (1)$$
where we have adopted a simple model for lens luminosity $L \propto M^5$, and assumed $D_s = 8$ kpc. Hence, microlensing is primarily sensitive to planets in temperature zones similar to Jupiter/Saturn/Uranus/Neptune.
• Sensitivity to Free Floating Planets: Because the microlensing effect arises directly from the planet mass, the existence of a host star is not required for detection. Thus, microlensing maintains significant sensitivity at arbitrarily large separations, and in particular is the only method that is sensitive to old, free-floating planets. See § 4.
• Sensitivity to Planets from 1 kpc to M31: Microlensing searches require dense star fields and so are best carried out against the Galactic bulge, which is 8 kpc away. Given that the Einstein radius peaks at x = 1/2, it is most sensitive to planets that are 4 kpc away, but maintains considerable sensitivity provided the lens is at least 1 kpc from both the observer and the source. Hence, microlensing is about equally sensitive to planets in the bulge and disk of the Milky Way. However, specialized searches are also sensitive to closer planets and to planets in other galaxies, particularly M31. See § 5.
• Sensitivity to Planets Orbiting a Wide Range of Host Stars: Microlensing is about equally sensitive to planets independent of host luminosity, i.e., planets of stars all along the main sequence, from G to M, as well as white dwarfs and brown dwarfs. By contrast, other techniques are generally challenged to detect planets around low-luminosity hosts.
• Sensitivity to Multiple Planet Systems: In general, the probability of detecting two planets (even if they are present) is the square of the probability of finding one, which means it is usually very small. However, for high-magnification events, the planet-detection probability is close to unity (Griest & Safizadeh 1998), and so its square is also near unity (Gaudi et al. 1998). In certain rare cases, microlensing can also detect the moon of a planet (Bennett & Rhie 2002).
2.2 Planet and Host Star Characterization
Microlensing fits routinely return the planet/star mass ratio q = mp/M and the projected separation in units of the Einstein radius b = r⊥/rE (Gaudi & Gould 1997). Historically, it was believed that, for the majority of microlensing discoveries, it would be difficult to obtain additional information about the planet or the host star beyond measurements of q and b. This is because of the well-known difficulty that the routinely-measured timescale tE is a degenerate combination of M, Dl, and the velocity of the lens. In this regime, individual constraints on these parameters must rely on a Bayesian analysis incorporating priors derived from a Galactic model (e.g., Dong et al. 2006). Experience with the actual detections has demonstrated that the original view was likely shortsighted, and that one can routinely expect improved constraints on the mass of the host and planet.
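The sensitivity scalings quoted in Section 2.1 can be checked with a few lines of arithmetic. The sketch below is a hedged illustration rather than anything from the paper: it assumes the standard Einstein-ring expression r_E = sqrt((4GM/c^2) D_s x(1-x)), the text's luminosity model L ∝ M^5, a reference equilibrium temperature of 278 K at 1 AU, the square-root scaling t_p ~ t_E sqrt(m_p/M) (which is what reproduces the quoted 1 day and 1.5 hours), and an event timescale of 25 days. The function names are hypothetical.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
KPC = 3.0857e19    # kiloparsec, m
AU = 1.496e11      # astronomical unit, m
T_EARTH = 278.0    # assumed equilibrium temperature at 1 AU from a Sun-like star, K


def einstein_radius_au(mass_msun, d_source_kpc=8.0, x=0.5):
    """Einstein radius r_E = sqrt((4GM/c^2) D_s x (1 - x)), in AU."""
    schwarzschild_term = 4.0 * G * mass_msun * M_SUN / C**2
    r_e = math.sqrt(schwarzschild_term * d_source_kpc * KPC * x * (1.0 - x))
    return r_e / AU


def equilibrium_temperature(mass_msun, d_source_kpc=8.0, x=0.5):
    """T_E = T_earth (L/L_sun)^(1/4) (r_E/AU)^(-1/2), with L = M^5 in solar units."""
    luminosity = mass_msun ** 5
    r_e = einstein_radius_au(mass_msun, d_source_kpc, x)
    return T_EARTH * luminosity ** 0.25 * r_e ** -0.5


def perturbation_timescale_days(t_e_days, planet_mass_msun, host_mass_msun):
    """Planetary perturbation duration t_p ~ t_E sqrt(m_p / M)."""
    return t_e_days * math.sqrt(planet_mass_msun / host_mass_msun)


if __name__ == "__main__":
    host = 0.5  # host mass in solar masses
    print(f"r_E: {einstein_radius_au(host):.2f} AU")
    print(f"T_E: {equilibrium_temperature(host):.0f} K")
    jupiter, earth = 9.55e-4, 3.0e-6  # planet masses in solar masses
    print(f"t_p (Jupiter): {perturbation_timescale_days(25.0, jupiter, host):.2f} days")
    print(f"t_p (Earth):   {24 * perturbation_timescale_days(25.0, earth, host):.2f} hours")
```

For a 0.5 solar-mass lens halfway to a source at 8 kpc, these assumptions give an Einstein radius of a few AU and an equilibrium temperature close to the 70 K quoted in Eq. (1).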
In three of the four microlensing events yielding exoplanet detections, the effect of the angular size of the source was imprinted on the light curve, thus enabling a measurement of the angular size of the Einstein radius θE = rE/Dl. This constrains the statistical estimate of M and Dl (and so mp and r⊥). In hindsight, one can expect this to be a generic outcome. Furthermore, it is now clear that for a substantial fraction of events, the lens light can be detected during and after the event, allowing photometric mass and distance estimates, and so reasonable estimates of mp and r⊥ (Bennett et al. 2007). By waiting sufficiently long (usually 2 to 20 years) one could use space telescopes or adaptive optics to see the lens separating from the source, even if the lens is faint. Such an analysis has already been used to constrain the mass of the host star of the first microlensing planet discovery (Bennett et al. 2006), and similar constraints for several of the remaining discoveries are forthcoming. Finally, in special cases it may also be possible to obtain information about the three-dimensional orbits of the discovered planets.
3. Present-Day Microlensing Searches
Microlensing searches today still basically carry out the approach advocated by Gould & Loeb (1992): two international networks of astronomers intensively follow up ongoing microlensing events that are discovered by two other groups that search for events. The one major modification is that, following the suggestion of Griest & Safizadeh (1998), they try to focus on the highest magnification events, which are the most sensitive to planets. Monitoring is done with 1m (and smaller) class telescopes. Indeed, because the most sensitive events are highly magnified, amateurs, with telescopes as small as 0.25m, play a major role.
Fig. 1. (Left) Known extrasolar planets detected via transits (blue), RV (red), and microlensing (green), as a function of their mass and equilibrium temperature.
(Right) Same as the left panel, but versus semimajor axis. The contours show the number of detections per year from a NextGen microlensing survey.
To date, four secure planets have been detected, all with equilibrium temperatures 40K < T < 70K. Two are Jupiter class planets and so are similar to the planets found by RV at these temperatures (Bond et al. 2004; Udalski et al. 2005). However, two are Neptune mass planets, which are an order of magnitude lighter than planets detected by RV at these temperatures (Beaulieu et al. 2006; Gould et al. 2006). See Figure 1. This emphasizes the main advantages that microlensing has over other methods in this parameter range. The main disadvantage is simply that relatively few planets have been detected despite a huge amount of work.
4. NextGen Microlensing Searches
Next-generation microlensing experiments will operate on completely different principles from those at present, which survey large sections of the Galactic bulge one to a few times per night and then intensively monitor a handful of the events that are identified. Instead, wide-field (∼ 4 deg²) cameras on 2m telescopes on 3–4 continents will monitor large (∼ 10 deg²) areas of the bulge once every 10 minutes around-the-clock. The higher cadence will find 6000 events per year instead of 600. More important, all 6000 events will automatically be monitored for planetary perturbations by the search survey itself, as opposed to roughly 50 events monitored per year as at present. These two changes will yield a roughly 100-fold increase in the number of events probed and so in the number of planetary detections.
Two groups (led respectively by Scott Gaudi and Dave Bennett) have carried out detailed simulations of such a survey, taking account of variable seeing and weather conditions as well as photometry systematics, and including a Galactic model that matches all known constraints. While these two independent simulations differ in detail, they come to similar conclusions.
Fig. 2. Expectations from a NextGen ground-based microlensing survey. These results represent the average of two independent simulations which include very different input assumptions but differ in their predictions by only ∼ 0.3 dex. (Left) Number of planets detected per year assuming every main-sequence (MS) star has a planet of a given mass and semi-major axis (see §4). (Right) Same as left panel, but assuming every MS star has two planets distributed uniformly in log(a) between 0.4-20 AU. The arrows indicate the masses of the four microlensing exoplanet detections.
Figure 2 shows the number of planets detected assuming all main-sequence stars have a planet of a given mass and given semi-major axis. While, of course, not all stars have planets at all these different masses, Gould et al. (2006) have shown that the two “cold Neptunes” detected by microlensing imply that roughly a third of stars have such planets in the “lensing zone”, i.e. the region most sensitive for microlensing searches.
Microlensing sensitivity does decline at separations that are larger than the Einstein radius, but then levels to a plateau, which remains constant even into the regime of free-floating planets. In this case, the timescales are similar to those of bound-planet perturbations (1 day for Jupiters, 1.5 hours for Earths) but there is no “primary event”. Again, typical amplitudes are a factor of a few, which makes them easily recognizable. If every star ejected f planets of mass mp, the event rate would be $\Gamma = 2 \times 10^{-5}\, f\, m_p/M_j\ \mathrm{yr}^{-1}$ per monitored star. Since NextGen experiments will monitor tens of millions of stars for integrated times of well over a year, this population will easily be detected unless f is very small. Microlensing is the only known way of detecting (old) free-floating planets, which may be a generic outcome of planet formation (Goldreich et al. 2004; Juric & Tremaine 2007; Ford & Rasio 2007).
4.1 Transition to Next Generation
Although NextGen microlensing experiments will work on completely different principles, the transition is actually taking place step by step. The Japanese/New Zealand group MOA already has a 2 deg² camera in place on their 1.8m NZ telescope and monitors about 4 deg² every 10 minutes, while covering a much wider area every hour. The OGLE team has funds from the Polish government to replace their current 0.4 deg² camera on their 1.3m telescope in Chile with a 1.7 deg² camera. When finished, they will also densely monitor several square degrees while monitoring a much larger area once per night. Astronomers in Korea and Germany have each made comprehensive proposals to their governments to build a major new telescope/camera in southern Africa, which would enable virtually round-the-clock monitoring of several square degrees. Chinese astronomers are considering a similar initiative. In the meantime, intensive followup of the currently surveyed fields is continuing.
5. Other Microlensing Planet Searches
While microlensing searches are most efficiently carried out toward the Galactic bulge, there are two other frontiers that microlensing can broach over the next decade or so.
• Extragalactic Planets: Microlensing searches of M31 are not presently sensitive to planets, but could be with relatively minor modifications. M31’s greater distance implies that only more luminous (hence physically larger) sources can give rise to detectable microlensing events. To generate substantial magnification, the planetary Einstein ring must be larger than the source, which generally implies that Jupiters are detectable, but Neptunes (or Earths) are not (Covone et al. 2000; Baltz & Gondolo 2001). Nevertheless, it is astonishing that extragalactic planets are detectable at all.
To probe for M31 planets, M31 microlensing events must be detected in real time, and then must trigger intensive followup observations of the type currently carried out toward the Galactic bulge, but with larger telescopes (Chung et al. 2006). This capability is well within reach.
• Nearby microlensing events: In his seminal paper on microlensing, Einstein (1936) famously dismissed the possibility that it would ever be observed because the event rate for the bright stars visible in his day was too small. Nevertheless, a Japanese amateur recently discovered such a “domestic microlensing event” (DME) of a bright (V ∼ 11.4), nearby (∼ 1 kpc) star, which was then intensively monitored by other amateurs (organized by Columbia professor Joe Patterson). While intensive observations began too late to detect planets, Gaudi et al. (2007) showed that more timely observations would have been sensitive to an Earth-mass planet orbiting the lens. In contrast to more distant lenses, DME lenses would usually be subject to followup observations, including RV. This would open a new domain in microlensing planet searches. Virtually all such DMEs could be found with two “fly’s eye” telescopes, one in each hemisphere, which would combine 120 10 cm cameras on a single mount to simultaneously monitor the π steradians above airmass 2 to V = 15. A fly’s eye telescope would have many other applications including an all-sky search for transiting planets and a 3-day warning system for Tunguska-type impactors. Each would cost ∼$4M.
6. Conclusion and Outlook
In our own solar system, the equilibrium-temperature range probed by microlensing (out past the “snow line”) is inhabited by four planets, two gas giants and two ice giants. All have similar-sized ice-rock cores and differ primarily in the amount of gas they have accreted. Systematic study of this region around other stars would test predictive models of planet formation (e.g.
Ida & Lin 2004) by determining whether smaller cores (incapable of accreting gas) also form. Such a survey would give clues as to why cores that reach critical gas-grabbing size do or do not actually manage to accrete gas, and if so, how much. In the inner parts of this region, RV probes the gas giants but not the ice giants nor, of course, terrestrial planets. RV cannot make reliable measurements in the outer part of this region at all because the periods are too long. Future astrometry missions (such as SIM) could probe the inner regions down to terrestrial masses, but in the outer regions they are also constrained by their limited lifetime. Hence, microlensing is uniquely suited to a comprehensive study of this region.
Although microlensing searches have so far detected only a handful of planets, these have already changed our understanding of planet formation “beyond the snowline”. Next generation microlensing surveys, which would be sensitive to dozens of “cold Earths” in this region, are well advanced in design conception and are starting initial practical implementation. These surveys play an additional crucial role as proving grounds for a space-based microlensing survey, the results of which are likely to completely revolutionize our understanding of planets over a very broad range of masses, separations, and host star masses (see the Bennett et al. ExoPTF white paper).
Traditionally, US astronomers have played a major role in microlensing planet searches. For example, Bohdan Paczyński at Princeton essentially founded the entire field (Paczynski 1986) and co-started OGLE. Half a dozen US theorists have all contributed key ideas and led the analysis of planetary events. The Ohio State and Notre Dame groups have played key roles in inaugurating and sustaining the follow-up teams that made 3 of the 4 microlensing planet detections possible.
Nevertheless, it must be frankly stated that the field is increasingly dominated by other countries, often with GDPs that are 5–10% of the US GDP, for the simple reason that they are outspending the US by a substantial margin. There are simply no programs that would provide the $5–$10M required to be in the NextGen microlensing game. If US astronomers still are in this game at all, it is because of the strong intellectual heritage that we bring, augmented by the practical observing programs that we initiated when the entire subject was being run on a shoestring. These historical advantages will quickly disappear as the next generation of students is trained on NextGen experiments, somewhere else. – 8 – REFERENCES Baltz, E. A., & Gondolo, P. 2001, ApJ, 559, 41 Beaulieu, J.-P., et al. 2006, Nature, 439, 437 Bennett, D. P., & Rhie, S. H. 1996, ApJ, 472, 660 Bennett, D. P., & Rhie, S. H. 2002, ApJ, 574, 985 Bennett, D. P., Anderson, J., Bond, I. A., Udalski, A., & Gould, A. 2006, ApJ, 647, L171 Bennett, D. P., Anderson, J., & Gaudi, B. S. 2006, ApJ, accepted (astro-ph/0611448) Bond, I. A., et al. 2004, ApJ, 606, L155 Chung, S.-J., et al. 2006, ApJ, 650, 432 Covone, G., de Ritis, R., Dominik, M., & Marino, A. A. 2000, A&A, 357, 816 Dong, S., et al. 2006, ApJ, 642, 842 Einstein, A. 1936, Science, 84, 506 Ford, E. B., & Rasio, F. A. 2007, ApJ, submitted (astro-ph/0703163) Gaudi, B. S., & Gould, A. 1997, ApJ, 486, 85 Gaudi, B. S., Naber, R. M., & Sackett, P. D. 1998, ApJ, 502, L33 Gaudi, B. S., et al. 2007, ApJ, submitted (astro-ph/0703125 ) Goldreich, P., Lithwick, Y., & Sari, R. 2004, ApJ, 614, 497 Gould, A., & Loeb, A. 1992, ApJ, 396, 104 Gould, A., et al. 2006, ApJ, 644, L37 Griest, K., & Safizadeh, N. 1998, ApJ, 500, 37 Ida, S., & Lin, D. N. C. 2004, ApJ, 604, 388 Juric, M., & Tremaine, S. 2007, ApJ, submitted (astro-ph/0703160) Mao, S., & Paczynski, B. 1991, ApJ, 374, L37 Paczynski, B. 1986, ApJ, 304, 1 Udalski, A., et al. 
2005, ApJ, 628, L109

ABSTRACT

Microlensing is a proven extrasolar planet search method that has already yielded the detection of four exoplanets. These detections have changed our understanding of planet formation “beyond the snowline” by demonstrating that Neptune-mass planets with separations of several AU are common. Microlensing is sensitive to planets that are generally inaccessible to other methods, in particular cool planets at or beyond the snowline, very low-mass (i.e. terrestrial) planets, planets orbiting low-mass stars, free-floating planets, and even planets in external galaxies. Such planets can provide critical constraints on models of planet formation, and therefore the next generation of extrasolar planet searches should include an aggressive and well-funded microlensing component. When combined with the results from other complementary surveys, next generation microlensing surveys can yield an accurate and complete census of the frequency and properties of planets, and in particular low-mass terrestrial planets.

<|endoftext|><|startoftext|> Introduction

Luminous infrared galaxies (LIGs) are characterized by extreme IR luminosities L_IR ≳ 10^11 L⊙ at mid- to far-infrared (FIR) wavelengths. In their comprehensive spectroscopic survey of LIGs, Kim et al. (1995) and Veilleux et al. (1995) have shown a clear tendency for the more luminous objects to be more Seyfert-like. The starburst and AGN are tightly connected phenomena and the interaction between them is a matter of debate.

Send offprint requests to: Ivanka Yankulova, e-mail: yan@phys.uni-sofia.bg
⋆ Based on observations obtained at the Peak Terskol Observatory, Caucasus, Russia.
Based on a large spectroscopic optical survey of bright IRAS and X-ray sources from the ROSAT All Sky Survey, Moran et al. (1996) extracted low-redshift galaxies with optical spectra characteristic of H II regions and X-ray luminosities typical of AGNs; these objects were named Composite Seyfert/Starburst galaxies. Other similar galaxies (i.e. with bright X-ray emission together with the clear predominance of a starburst in the optical and IR regime) have also been found in the deep ROSAT fields (Boyle et al. 1995, Griffiths et al. 1996) and in the Chandra and XMM-Newton deep fields (Rosati et al. 2001).

http://arxiv.org/abs/0704.0768v1 2 Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ...

A significant part of the observed FIR emission of these composites could be associated with circumnuclear starburst events. The nuclear X-ray source there is generally absorbed with a column density N_H > 10^22 cm^-2; the values range from 10^22 cm^-2 to above 10^24 cm^-2 for about 96% of this class of objects (Risaliti et al. 1999, Bassani et al. 1999). The circumnuclear starburst should also play a major role in the obscuration processes (see Levenson et al. 2001 and references therein for details). However, there are Sy2 galaxies with column densities lower than 10^22 cm^-2. Panessa & Bassani (2002, hereafter PB02) present a sample of 17 type 2 SyGs showing such low absorption in X-rays. The Compton-thin nature of these sources is strongly suggested by some isotropic indicators such as FIR and [O III] emission.

The fraction of Composite Seyfert/Starburst objects is estimated to be in the range of 10%–30% of the Sy2 population. The simple formulation of the Unified model for SyGs is not applicable to such sources. The observed absorption is likely to originate at larger scales instead of in the pc-scale molecular torus.
Probably the Broad Line Regions (BLRs) of these objects are covered by some obscuring dusty material.

NGC 7679 is a nearby (z = 0.0177), nearly face-on SB0 Seyfert 2 type galaxy in which starburst and AGN activities coexist. The IRAS fluxes show that the luminosity of NGC 7679 in the far infrared is about L_FIR ≈ 10^11 L⊙. This object is included in the large spectroscopic survey of 200 luminous IRAS galaxies (Kim et al. 1995, Veilleux et al. 1995). NGC 7679 is physically connected by a common stream of ionized gas to the Sy2 galaxy NGC 7682, located ∼ 4.5 arcmin to the east (PA ≈ 72°); together they form the pair Arp 216 (VV 329). The tidal interactions between the two galaxies, together with the existence of a bar in NGC 7679, could enhance the gas flow towards the nuclear regions and possibly trigger the starburst processes (Gu et al. 2001).

The X-ray properties of NGC 7679 based on the BeppoSAX observations and on the ASCA archive were discussed by Della Ceca et al. (2001, hereafter DC01). Their conclusion is that NGC 7679 is a Seyfert-starburst composite galaxy, which implies the clear predominance of an AGN in the X-ray regime connected with a starburst in the optical and IR regime. DC01 found that a simple power-law spectral model with Γ ∼ 1.75 and small intrinsic absorption (N_H < 4 × 10^20 cm^-2) provides a good description of the spectral properties of NGC 7679 from 0.1 to 50 keV. The small X-ray absorption and the absence of strong (EW ∼ 1 keV) Fe lines suggest a Compton-thin type 2 AGN in NGC 7679, which clearly distinguishes this galaxy from the other LIG Seyferts.

The main goal of this article is to investigate both the gas distribution and the ionization structure in the circumnuclear regions of the luminous IR unabsorbed Seyfert galaxy NGC 7679 and to look for tracers of the presence of a hidden Sy1-type nucleus.

Some information on the observations and data reduction procedures is presented in Section 2.
The results are presented in Section 3 and discussed in Section 4. The combination of the data taken from recent literature and our Fabry-Perot observations provides new insight into the circumnuclear region of NGC 7679 and the phenomena occurring there.

Table 1. Observation details

image frame       interference filter a)    Fabry-Perot tuned       frames × exposure
                  λ_c/FWHM (Å)/(Å)          wavelength λ_FP (Å)     time (s)
Hα                6662/55                   6674.8                  1 × 1800, 2 × 900
[N II] λ6548      6662/55                   6659.9                  1 × 900
continuum         6719/33                   6720.0                  1 × 1800, 1 × 900
[O III] λ5007     5094/44                   5092.4                  2 × 900
continuum         5002/41                   4437.7                  1 × 1200
Gunn r b)         6800/1110                                         1 × 60
BG 39/2 b)        4720/700                                          2 × 1500

a) Used to separate Fabry-Perot working orders
b) Broad-band image taken without Fabry-Perot to reveal the morphology

2. Observations and data reduction

NGC 7679 was observed by K. Jockers, T. Bonev, and T. Credner with the 2m RCC reflector of the Peak Terskol Observatory, Caucasus, Russia. The observations were carried out in October 1996 with the Two-channel Focal Reducer of the former Max-Planck-Institut für Aeronomie, Germany (now Max-Planck-Institut für Sonnensystemforschung, MPS). This instrument was primarily intended for cometary studies but it has repeatedly been used for observations of active galactic nuclei (see for example Golev et al. 1995, 1996, and Yankulova 1999). The technical data and the present capabilities of the MPS Two-channel Focal Reducer are described in Jockers (1997) and Jockers et al. (2000).

All observations were taken in Fabry-Perot (FP) mode using tunable FP narrow-band imaging with a spectral FWHM of the Airy profile δλ of the order of 3–4 Å. The details of the observations are presented in Table 1, which lists the central wavelengths λ_c and the effective widths ∆λ of the interference filters used to separate the Fabry-Perot interference orders, the wavelength λ_FP at which the Fabry-Perot was tuned, and the exposures.
The overall “finesse” of the system, ∆λ/δλ, is ≈ 15, where ∆λ is the free spectral range of the FP. As one can see from Table 1, ∆λ is comparable to the filter’s bandwidth and therefore all FP orders except the central one are efficiently suppressed. Two exposures of NGC 7679 were obtained through each filter to eliminate cosmic ray events and to increase the signal-to-noise ratio. Flatfield exposures were obtained using dusk and dawn twilight for uniform illumination of the detector. No dark correction was required.

Fig. 1. Contours of the continuum-subtracted narrow-band [O III] λ5007 image superimposed on the gray-scale [O III] λ5007 emission distribution of the circumnuclear region of NGC 7679. The background noise level is σ = 2.01 × 10^-17 ergs cm^-2 s^-1 arcsec^-2. The outermost contour is taken at 5σ above the sky level and the next contours increase by a factor of 2. Note the East-West elongation and the two extrema offset by ∼ 4 arcsec from the position of the nucleus, which is marked by a cross. North is up, East is to the left.

The images were reduced following the usual reduction steps for narrow-band imaging. After flatfielding, the frames were aligned by rebinning to a common origin. The final alignment of all the images was estimated to be better than 0.1 px (the scale is 1 px = 0.8 arcsec). A convolution procedure was performed in order to match the Point-Spread Functions (PSFs) of each line-continuum pair, which unavoidably degrades the final FWHM of the images to the mean value ≈ 3–3.3 arcsec (shown as ’seeing’ in Fig. 1). At the distance of NGC 7679 one arcsec corresponds to about 340 pc, assuming H_0 = 75 km s^-1 Mpc^-1.

3. Results

3.1.
Narrow-band emission-line images

Gray-scale images of the narrow-band flux distribution of the extended circumnuclear region of NGC 7679 in the [O III] λ5007, Hα, and [N II] λ6548 emission lines with superimposed contours are presented in Figs. 1, 2, and 3, respectively.

The [O III] λ5007 emission shown in Fig. 1 reveals a bright extended emission-line region (EELR), about 20 arcsec in size, which is elongated approximately in the East direction (PA ≈ 80° ± 10°). This region is similar to the analogous EELRs observed in many Sy2 type galaxies. Most probably it is powered by the AGN-type activity of the nucleus. The emission-line peak of [O III] λ5007 is shifted by ∼ 4 arcsec to the East with respect to the center defined by the continuum emission and marked by a cross in Fig. 1. At larger distances (∼ 37 arcsec) the ionized gas forms an envelope which is extended along the direction PA ≈ 72° towards NGC 7682, the counterpart of NGC 7679, as was already noted by Durret & Warin (1990).

Fig. 2. Contours of the continuum-subtracted narrow-band Hα image superimposed on the gray-scale Hα emission distribution of the circumnuclear region of NGC 7679. The background noise level is σ = 2.77 × 10^-18 ergs cm^-2 s^-1 arcsec^-2. The outermost contour is taken at 5σ above the sky level and the next contours increase by a factor of 2. North is up, East is to the left.

Fig. 3. Contours of the continuum-subtracted narrow-band [N II] λ6548 image superimposed on the gray-scale [N II] λ6548 emission distribution of the circumnuclear region of NGC 7679. The background noise level is σ = 4.75 × 10^-18 ergs cm^-2 s^-1 arcsec^-2. The outermost contour is taken at 5σ above the sky level and the next contours increase by a factor of 2. North is up, East is to the left.

In Fig.
2 we present our very deep and high-contrast Hα continuum-subtracted image with numerous starburst regions. Because of both seeing and pixel size, we are able to see only elliptical central isophotes instead of the “double nucleus” observed recently by Buson et al. (2006). Our analysis of the unpublished Hα images taken from the archive of the Isaac Newton Group of telescopes at La Palma, as well as the archive images of Buson et al. (2006) from the ESO La Silla NTT, also revealed a “double nucleus” otherwise unseen in the known broad-band images. The separation between the nuclear counterparts (in fact one is the active nucleus itself and the other one is a bright, spiral-like, extremely powerful starburst region) is ≈ 3 arcsec. The existence of this “double nucleus” in NGC 7679 could enhance the gas flows towards the nuclear regions and possibly trigger the starburst process itself. The “double nucleus” can also be seen in a very different wavelength range, on the 6 cm and 20 cm high-resolution VLA radio continuum maps of NGC 7679 published by Stine (1992). The angular distance and PA between the two counterparts are nearly the same. The radio spectral index is −0.37 and steepens away from the center, which indicates that nonthermal emission leaks out of the starburst region.

The low-excitation gas traced by the Hα emission reveals a different morphology from that of the [O III] λ5007 emission. Inside the region with a radius of 6–8 arcsec from the center, the contours of the Hα emission are nearly circular. Outside this region, to the West of the main body of NGC 7679, a clearly outlined wide arc is observed at 16 arcsec (∼ 5 kpc) from the center. To the East this arc turns into a gaseous envelope which forms a part of the circumnuclear star-forming ring mentioned by Pogge (1989). This arc is not detected in the narrow-band continuum image next to Hα. The same morphology in Hα + [N II] with higher spatial resolution was observed by Buson et al. (2006).
The Fabry-Perot technique used by us makes it possible to disentangle [N II] λ6548 from Hα. The pure [N II] λ6548 emission (Fig. 3) shows extended structure ∼ 20 arcsec in diameter. The star-forming ring revealed by the Hα image is not seen here. As a rule, the gas in the star-forming ring is ionized by stellar UV emission, and its [N II] λ6548 emission is weaker than in regions where the gas is ionized by the power-law AGN continuum. On the other hand, this could be an effect of the shorter exposure time of our [N II] λ6548 frame.

3.2. Narrow-band emission-line total fluxes

The total emission-line fluxes of Hα, [O III] λ5007, and [N II] λ6583 were estimated from our flux-calibrated images in an aperture of 2 kpc (r ≲ 3 arcsec) like the one used by the authors cited in Table 2. In this table we have collected the measurements available so far for the emission lines observed by us.

Our measured fluxes are in good agreement with those of Kim et al. (1995) and differ from the measurements of Contini et al. (1998). The flux values given by Contini et al. (1998) are about twice as large as ours and those given by Kim et al. (1995). Recently Gu et al. (2006) measured the central flux in [O III] λ5007. We found reasonable agreement between their value (1.55 × 10^-14 ergs cm^-2 s^-1) and ours (1.94 × 10^-14 ergs cm^-2 s^-1) in the much smaller aperture used by them.

We estimated the flux of the continuum near [O III] λ5007 within the central 2 kpc to be F(λ_cont) = 6.74 × 10^-15 ergs cm^-2 s^-1 Å^-1. Then the equivalent width of the [O III] λ5007 emission line is EW(λ5007) = 7.6 Å. Baskin & Laor (2005) have used the photoionization code CLOUDY to calculate the dependence of EW(λ5007) on the electron density n_e, the ionization parameter U, and the covering factor CF. Following their Fig. 5 and our estimate of EW(λ5007) we derive for the covering factor CF the range 0.016 ≤ CF ≤ 0.04 with the most probable value CF ≈ 0.024.

There is a large quantity of absorbing matter in the central region of NGC 7679 (Telesco et al.
1995) which modifies the Balmer emission lines. The Balmer decrement reported by Kim et al. (1995) in the central 2 kpc is F(Hα)/F(Hβ) ≈ 17.4, but following Contini et al. (1998) this decrement is 8.5. Kewley et al. (2000) give E(B − V) = 0.47, which corresponds to F(Hα)/F(Hβ) = 5.04. In Table 2 the value of the parameter C is evaluated from the measured Balmer decrement under the assumptions that in AGNs the intrinsic ratio is F(Hα)/F(Hβ) = 3.1 and that the optical depth is τ_λ = C f(λ), where f(λ) is the reddening curve (Osterbrock 1989). The extinction E(B − V) derived from the Balmer decrement is also given in Table 2.

Contini et al. (1998) present measurements of emission-line fluxes made in the extranuclear region 9 arcsec off the nucleus at PA = 207° in an aperture of 3 arcsec. We estimated the emission-line fluxes from our images in the same aperture at the same place in order to compare with those given by Contini et al. (1998). The results are given in Table 2. The values of Contini et al. (1998) are about 2 times larger than ours in the extranuclear region as well as at the nucleus.

Moustakas & Kennicutt (2006) report total emission-line fluxes of Hα and [O III] λ5007 in a wide rectangular aperture of 30 × 80 arcsec oriented at PA = 90°. Their Hα flux F(Hα) = (1.535 ± 0.062) × 10^-12 ergs cm^-2 s^-1 coincides with our value (1.52 × 10^-12 ergs cm^-2 s^-1) in the same wide aperture after a correction for extinction with E(B − V) = 0.065 used by them. In [O III] λ5007 the agreement is reasonably good (4.72 × 10^-13 compared with our 3.90 × 10^-13 ergs cm^-2 s^-1).

3.3. The ionization map F([O III] λ5007) / F(Hα)

Our flux-calibrated emission-line images are used to form the F([O III] λ5007)/F(Hα) ionization map in order to analyse the mean level of ionization. This map is shown in the left panel of Fig. 4. All pixels below 4σ of the background noise level were suppressed before the division of the corresponding images.
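The masking-and-division step described above can be sketched in a few lines. This is only an illustrative sketch, not the reduction code used for the paper: the function name, the use of `None` for suppressed pixels, the choice to mask on both images, and the toy 2 × 2 arrays are assumptions; only the noise levels (taken from the captions of Figs. 1 and 2) and the 4σ threshold come from the text.

```python
# Noise levels quoted in the captions of Fig. 1 ([O III]) and Fig. 2 (H-alpha),
# in ergs cm^-2 s^-1 arcsec^-2.
SIGMA_OIII = 2.01e-17
SIGMA_HALPHA = 2.77e-18


def ionization_map(oiii, halpha, n_sigma=4.0):
    """Return the pixel-by-pixel ratio F([O III])/F(H-alpha).

    Pixels fainter than n_sigma times the background noise in either
    image are suppressed (set to None) before the division.
    Masking on both images is an assumption of this sketch.
    """
    ratio = []
    for row_oiii, row_halpha in zip(oiii, halpha):
        ratio_row = []
        for f_oiii, f_halpha in zip(row_oiii, row_halpha):
            too_faint = (f_oiii < n_sigma * SIGMA_OIII
                         or f_halpha < n_sigma * SIGMA_HALPHA)
            ratio_row.append(None if too_faint else f_oiii / f_halpha)
        ratio.append(ratio_row)
    return ratio


# Hypothetical 2 x 2 surface-brightness images (not data from the paper).
toy_oiii = [[1.0e-16, 5.0e-17], [2.0e-16, 1.0e-16]]
toy_halpha = [[1.0e-16, 2.0e-17], [1.0e-18, 4.0e-17]]
toy_ratio = ionization_map(toy_oiii, toy_halpha)
```

In the toy example two of the four pixels fall below the threshold in one of the images and are dropped, so only pixels with a reliable detection in both lines contribute to the map.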
The ionization map indicates the presence of a maximum shifted to the East at PA ≈ 80° with respect to the photometric center, which is defined by the integrated light of the continuum images and marked by a cross in the figure.

A slice of this map along PA ≈ 80° versus the axial distance from the nucleus is presented in the right panel of Fig. 4. Below we will discuss in more detail the behaviour of the ionization at positions 1 to 5.

Table 2. Measured emission-line fluxes in the 2 kpc central aperture in NGC 7679. Measured flux F(λ) in ergs cm^-2 s^-1.

                 2 kpc central aperture                                                        9 arcsec off the nucleus
Emission         1              2              3              4              5                 6              7
Hα               1.92 × 10^-13  1.9 × 10^-13   4.5 × 10^-13   –              3.8 × 10^-13      3.73 × 10^-14  1.04 × 10^-14
[N II] λ6548     9.96 × 10^-14  1.08 × 10^-13  1.86 × 10^-13  –              –                 9.8 × 10^-15   4.5 × 10^-15
[O III] λ5007    5.2 × 10^-14   5.3 × 10^-14   8.8 × 10^-14   –              –                 9 × 10^-15     4.6 × 10^-15
Hβ               –              1.1 × 10^-14   5.24 × 10^-14  1.0 × 10^-14   –                 5.9 × 10^-15   –
F(Hα)/F(Hβ)      –              17.4           8.5            5.0            4.58              6.3            –
F(Hγ)/F(Hβ)      –              0.24           0.32           0.4            0.3               –              –
C                –              4.93           2.88           1.6            1.12              2.02           –
E(B − V)         –              1.45           0.85           0.47           0.33              0.65           –

Columns: 1 - this work; 2 - Kim et al. (1995); 3 - Contini et al. (1998); 4 - Kewley et al. (2000); 5 - Buson et al. (2006); 6 - Contini et al. (1998); 7 - this work, PA = 207°.

Fig. 4. F([O III] λ5007) / F(Hα) ionization map of NGC 7679. All pixels below 4σ of the background noise level were suppressed before image division (left). The ratio F([O III] λ5007)/F(Hα) vs the axial distance from the nucleus along PA ≈ 80° (right). The positions labeled 1 to 5 are equidistant with a step size of 3 arcsec. We refer to them later in the text (see Fig. 6).

4. Discussion

4.1. The ionizing flux from the central engine

In order to estimate the number of ionizing photons emitted from the central engine, we made use of the recent X-ray observations of NGC 7679. This object was observed by ASCA and BeppoSAX in 1998, and by XMM-Newton in 2005.
A detailed analysis of the ASCA and BeppoSAX data sets is presented in DC01. They show that a single absorbed power-law function (with a photon index of 1.75) fits the observed spectrum very well and that the X-ray absorption is relatively small (N_H ≤ 4 × 10^20 cm^-2).

The data for the X-ray observations in 2005 were taken from the XMM-Newton public archive. The corresponding X-ray spectra for the PN and the two MOS detectors were extracted following the standard procedures using the XMM-Newton Science Analysis System software (SAS version 7.0.0). A single absorbed power-law function gave a good fit (χ^2/dof = 201/191) to all three spectra, which were fitted simultaneously. The small X-ray absorption in the nucleus of NGC 7679 was confirmed, N_H = 5.6 [4.0–7.5] × 10^20 cm^-2, and no change in the shape of the spectrum was found, with a photon index of 1.81 [1.70–1.92] (the 90%-confidence intervals are given in brackets). The absorbing X-ray column density along the line of sight is about an order of magnitude smaller than that estimated from the observed Balmer decrement, which is N_H ∼ 8 × 10^21 cm^-2 and ∼ 5 × 10^21 cm^-2 following Kim et al. (1995) and Contini et al. (1998), respectively.

Interestingly, the observed X-ray flux has decreased by a factor of ∼ 10 over a time period of ∼ 7 years: F_X = 3.8 × 10^-13 and 5.8 × 10^-13 ergs cm^-2 s^-1 in the 0.1-2.0 keV and 2.0-10.0 keV energy intervals, respectively. Since, on the one hand, there is only about 5% scatter of the fluxes for all three detectors (one PN and two MOS) around the average values given above, and, on the other hand, NGC 7679 shows an appreciable X-ray variability (DC01), it is likely that the detected decrease of the X-ray flux is real and not an instrumental effect.
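The column densities quoted above from the Balmer decrement can be reproduced with a short calculation. The sketch below is hedged: the paper does not state its conversion factors, so the three constants marked as assumed are inferred here (the first two because they reproduce the C and E(B − V) entries of Table 2, the third being the commonly used Galactic gas-to-dust ratio), and the function names are hypothetical.

```python
import math

INTRINSIC_DECREMENT = 3.1  # intrinsic F(Halpha)/F(Hbeta) adopted for AGNs in the text
DELTA_F = 0.35             # assumed f(Hbeta) - f(Halpha) of the reddening curve
C_PER_EBV = 3.4            # assumed C / E(B-V), inferred from Table 2
NH_PER_EBV = 5.8e21        # assumed N_H / E(B-V) in cm^-2 mag^-1 (Galactic ratio)


def reddening_constant(decrement):
    """Parameter C from an observed Balmer decrement, with tau = C f(lambda)."""
    return math.log(decrement / INTRINSIC_DECREMENT) / DELTA_F


def colour_excess(decrement):
    """E(B-V) implied by the Balmer decrement."""
    return reddening_constant(decrement) / C_PER_EBV


def column_density(decrement):
    """Hydrogen column density (cm^-2) implied by the Balmer decrement."""
    return NH_PER_EBV * colour_excess(decrement)


for label, decrement in (("Kim et al. (1995)", 17.4), ("Contini et al. (1998)", 8.5)):
    print(f"{label}: C = {reddening_constant(decrement):.2f}, "
          f"E(B-V) = {colour_excess(decrement):.2f}, "
          f"N_H = {column_density(decrement):.1e} cm^-2")
```

Under these assumptions the two decrements give column densities close to the ∼ 8 × 10^21 and ∼ 5 × 10^21 cm^-2 quoted in the text, both roughly an order of magnitude above the X-ray value.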
The extrapolation of DC01’s power law to the UV spectral domain (that is, to hν_0 = 13.6 eV) yields

$$F^{\rm nt}_{\nu} = F_{\nu_0}\,(\nu_0/\nu)^{\alpha},$$

where α = 0.75 and F_ν0 = 2.0 × 10^-28 erg cm^-2 s^-1 Hz^-1. The same extrapolation for the XMM-Newton spectrum results in the same form, F^nt_ν = F_ν0 (ν_0/ν)^α, where α = 0.81 and F_ν0 = 2.8 × 10^-29 erg cm^-2 s^-1 Hz^-1.

The number of ionizing photons with hν > 55 eV provided by the central AGN source is defined as

$$N_{\rm ion} = 4\pi R_G^2 \int_{h\nu = 55\,{\rm eV}}^{\infty} \frac{F^{\rm nt}_{\nu}}{h\nu}\, d\nu \qquad (1)$$

where R_G is the distance to NGC 7679. For the BeppoSAX data this estimate is N_ion ∼ 10^52 ph s^-1 and for the XMM-Newton data N_ion ∼ 10^51 ph s^-1. These values are averaged between all BeppoSAX and XMM-Newton bands, respectively. The number of ionizing photons thus decreased between the BeppoSAX and XMM-Newton epochs, within the range 10^51 ≲ N_ion ≲ 10^52 ph s^-1.

4.2. Physical conditions in the circumnuclear region of NGC 7679

The extended emission-line region in NGC 7679 has a rather different morphology when observed in Hα (low ionization emission line) as compared to [O III] λ5007 (high ionization emission line). The Hα image (Fig. 2) contains a compact circumnuclear region (∼ 20 arcsec in diameter) whose isophotes show no preferred direction. In contrast, the [O III] λ5007 image (Fig. 1) of the circumnuclear region of NGC 7679 shows elliptical isophotes extended along PA ≈ 80° ± 10°. Such a difference in the morphology of the emission-line images signals the presence of at least two distinct ionization components (see for example Pogge 1989).

The extended morphology both of the [O III] λ5007 image (Fig. 1) and of the [O III] λ5007/Hα flux ratio image (Fig. 4) suggests an anisotropy of the radiation field. In order to check whether the ionizing field is collimated or not, we have to compare the number of ionizing photons N_ph absorbed by the extended emission-line gas with the number of ionizing photons N_ion emitted by the central AGN engine. Usually, the hydrogen line flux F(Hα) or F(Hβ) is used to find N_ph.
But the high-resolution Hα image of NGC 7679 reveals a central circumnuclear star-forming spiral ring capable of producing ∼ 75% of the optical line emission within a radius of ∼ 1 kpc (Buson et al. 2006). For this reason it is not quite correct to use F(Hα) to make the N_ph estimate.

Kauffmann et al. (2003) focus on the luminosity of [O III] λ5007 as a tracer of AGN activity. We can estimate the number N_ph of ionizing photons with energy above hν = 55 eV from the observed [O III] λ5007 luminosity after correction for extinction. A dust correction to [O III] based on the ratio F(Hα)/F(Hβ) should be regarded as the best approximation (Kauffmann et al. 2003). According to Draine & Lee (1984) (Fig. 7 therein) the optical depth is τ_5007 = 0.96 C = 2.76. Here we adopt the value C = 2.88 following Contini et al. (1998) as a compromise reddening value among the different Balmer decrement assessments. Then the luminosity, corrected for extinction, is L_corr([O^+2]λ5007) = 4.4 × 10^41 ergs s^-1. We note that PB02 give 5.7 × 10^41 ergs s^-1 for the [O III] λ5007 luminosity.

The total number of ionizing photons that must be available to produce the observed [O III] λ5007 emission is given by the expression

$$N_{\rm ph} = \frac{\alpha_G({\rm O}^{+2}, T_e)\, L_{\rm corr}([{\rm O}^{+2}]\lambda5007)\, CF^{-1}}{\alpha^{\rm eff}_{5007}(n_e, T_e)\, h\nu_{5007}} \approx 2 \times 10^{52}\ {\rm ph\ s^{-1}} \qquad (2)$$

where α_G(O^+2, T_e) = 5.1 × 10^-12 cm^3 s^-1 (Aldrovandi & Pequignot 1973) is the recombination coefficient at T_e ≈ 10^4 K and α^eff_5007(n_e, T_e) = 1.1 × 10^-9 cm^3 s^-1 is the effective recombination coefficient at n_e = 10^5 cm^-3 and T_e = 10^4 K. This coefficient strongly depends on the electron density and temperature. If we accept T_e = 10^4 K then α^eff_5007(n_e) = 5.14 × 10^-3 A_21/n_e cm^3 s^-1, where A_21 = 0.021 s^-1. As the critical electron density is n_e^cr(5007) = 5 × 10^5 cm^-3, we assume that the electron density is not lower than n_e ≈ 10^4 cm^-3 in order to emit [O III] λ5007. Then the lower limit for N_ph is ≈ 2 × 10^51 ph s^-1. For NGC 7679 the covering factor is CF = 0.024.
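Equations (1) and (2) are simple enough to evaluate directly, which makes it easy to see where the quoted orders of magnitude come from. The following is a sketch under stated assumptions rather than the authors' calculation: the distance is taken from the Hubble law with z = 0.0177 and H_0 = 75 km s^-1 Mpc^-1 (values from the text, but the choice of a pure Hubble-law distance is an assumption), the integral in Eq. (1) is done analytically for a pure power law, and all function names are hypothetical.

```python
import math

H_PLANCK = 6.626e-27   # Planck constant, erg s
C_LIGHT_CM = 2.998e10  # speed of light, cm/s
C_LIGHT_KM = 2.998e5   # speed of light, km/s
MPC_IN_CM = 3.086e24   # one megaparsec in cm


def hubble_distance_cm(z=0.0177, h0=75.0):
    """Distance R_G from the Hubble law (an assumption of this sketch)."""
    return C_LIGHT_KM * z / h0 * MPC_IN_CM


def n_ion(f_nu0, alpha, r_cm, e0_ev=13.6, e_min_ev=55.0):
    """Eq. (1) for F_nu = f_nu0 (nu0/nu)^alpha, integrated analytically.

    The integral of F_nu/(h nu) from nu_min to infinity equals
    f_nu0 / (h alpha) * (nu0/nu_min)^alpha.
    """
    return (4.0 * math.pi * r_cm ** 2 * f_nu0 / (H_PLANCK * alpha)
            * (e0_ev / e_min_ev) ** alpha)


def alpha_eff_5007(n_e, a21=0.021):
    """Effective recombination coefficient at T_e = 1e4 K (cm^3 s^-1)."""
    return 5.14e-3 * a21 / n_e


def n_ph(l_corr, n_e, cf=0.024, alpha_g=5.1e-12):
    """Eq. (2): ionizing photons needed to produce the [O III] 5007 line."""
    h_nu_5007 = H_PLANCK * C_LIGHT_CM / 5007e-8  # photon energy in erg
    return alpha_g * l_corr / (cf * alpha_eff_5007(n_e) * h_nu_5007)


r_g = hubble_distance_cm()
n_ion_sax = n_ion(2.0e-28, 0.75, r_g)  # BeppoSAX extrapolation
n_ion_xmm = n_ion(2.8e-29, 0.81, r_g)  # XMM-Newton extrapolation
n_ph_high = n_ph(4.4e41, 1.0e5)        # n_e = 1e5 cm^-3
n_ph_low = n_ph(4.4e41, 1.0e4)         # n_e = 1e4 cm^-3 (lower limit)

print(f"N_ion: {n_ion_xmm:.1e} to {n_ion_sax:.1e} ph/s")
print(f"N_ph:  {n_ph_low:.1e} to {n_ph_high:.1e} ph/s")
print(f"N_ph/N_ion: {n_ph_low / n_ion_sax:.2f} to {n_ph_high / n_ion_xmm:.0f}")
```

With these inputs the sketch gives N_ion of order 10^51 to 10^52 ph s^-1 and N_ph of order 2 × 10^51 to 2 × 10^52 ph s^-1, so the ratio spans a few tenths to about twenty, in line with the range discussed in the text.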
The photon ratio N_ph/N_ion is a probe of the collimation hypothesis. In the anisotropic case this ratio is considerably larger than 1. Under the above assumptions about n_e and T_e we estimate for NGC 7679 0.2 ≲ (N_ph/N_ion)_hν>55 eV ≲ 20, but the lower limit could increase if the luminosity L([O^+2]λ5007) is integrated over the whole image. The increase of the upper limit of this ratio is due to the XMM-Newton data, which are ∼ 8 times lower than the ASCA/BeppoSAX ones.

Fig. 5. Spectral energy distribution (SED) from the radio to the X-ray band of the composite Starburst/Sy2 galaxy NGC 7679 (open diamonds). The radio values at 6 cm and 20 cm are from the VLA (Stine, 1992). The X-ray band data are from ASCA and BeppoSAX (DC02 and Risaliti, 2002). Filled diamonds represent recent X-ray observations taken from the XMM-Newton archive. All other data are taken from NED. The SED has been compared with a normal spiral galaxy template (dotted line) taken from Elvis et al. (1994), with Starburst and Sy2 galaxy templates (dashed line and thin solid line) taken from Schmitt et al. (1997), and with a Sy1 galaxy template (thick solid line) taken from Mas-Hesse et al. (1995).

Both the ratio (N_ph/N_ion)_hν>55 eV and the presence of weak and elusive broad Hα wings (Kewley et al. 2000) indicate a hidden AGN in NGC 7679. In contrast, the X-ray spectrum of NGC 7679 is not highly absorbed, with N_H < 4 × 10^20 cm^-2 (see the discussion in Section 4.1). As a matter of fact, Bian & Gu (2006) recently found a very high detectability of hidden BLRs (∼ 85%) for Compton-thin Sy2s with higher [O III] luminosity, L([O^+2]λ5007) > 10^41 erg s^-1.

We have to note that NGC 7679 resembles in many respects the galaxy IRAS 12393+3520. In this galaxy direct X-ray evidence suggests the presence of a hidden AGN (Guainazzi et al., 2000). This homology can be seen in Fig.
5 where the spectral energy distribution (SED) from the radio to the X-ray band of NGC 7679 is shown. The composite nature of NGC 7679 is clearly seen. Whereas the starburst component dominates in the FIR-IR range, the X-ray band emission is well below that of a typical Sy1. The extrapolation of the power-law X-ray spectrum to 13.6 eV shows a much lower value than the typical Sy2 emission at this wavelength. This again favors the idea of a hidden central engine. Guainazzi et al. (2000) suppose that a dusty ionized absorber is able to selectively obscure the optical emission, leaving the X-rays almost unabsorbed.

Fig. 6. The [O III] λ5007/Hβ vs. [N II] λ6583/Hα diagnostic diagram of Veilleux & Osterbrock (1987). The dashed and dotted theoretical lines demarcate between Starbursts and AGNs according to Kauffmann et al. (2003) and Kewley et al. (2001), respectively. The line dividing between LINERs and SyGs is taken according to PA02. The label “Comp” indicates the region of the diagram in which composite objects are expected to be found. The diagnostic value measured by us is denoted by a thick triangle. See text for other designations.

4.3. Ionization structure in the circumnuclear region of NGC 7679

The ionization map (the right panel of Fig. 4) displays the clear signature of highly-excited gas. The [O III] λ5007/Hα ratio increases in the direction of the counterpart galaxy NGC 7682, reaching a maximum of ≈ 2.5 at about 12 arcsec off the nucleus. More than 15 years ago Durret & Warin (1990) also reported the presence of high-ionization gas in this direction (see their Fig. 3a), but their result seemingly did not attract attention.

On the other hand, at PA ≈ 0° our map shows values around [O III] λ5007/Hα ≈ 0.3, and the ionization in this direction is entirely due to young hot stars.

The [O III] λ5007/Hβ vs.
[N II] λ6583/Hα diagnostic diagram (Veilleux & Osterbrock, 1987) helps to delineate the different mechanisms maintaining the ionization of the gaseous component in AGNs and in Starbursts. In Fig. 6 such a diagram is shown for NGC 7679. Kewley et al. (2001) distinguish between Starbursts and AGN using a theoretical upper limit derived from star-forming models. This limit is shown as a dotted line in Fig. 6. Objects with emission-line ratios above this limit cannot be explained by any possible combination of parameters in a star-forming model. Kauffmann et al. (2003) published an updated estimate for the starburst boundary derived from the SDSS observations. In Fig. 6 this boundary is shown as a dashed line. The location of the Composites is expected to lie between these two lines (see e.g. Panessa et al., 2005).

In Fig. 6 we plot the emission-line flux ratios of NGC 7679 measured in an aperture of 3 arcsec in steps of 3 arcsec along both PA ≈ 80° (with crosses) and PA = 0° (with diamonds). The labels 1 - 5 for PA ≈ 80° correspond to the labels in the right panel of Fig. 4. Using spectra taken from the Smithsonian Astrophysical Observatory Data Center Z-Machine Archive obtained with a 3 arcsec slit width, we estimate the observed F(Hα)/F(Hβ) ∼ 5 in the NLR. In Fig. 6 positions 1 and 2 at PA = 80° off the nucleus lie well within the region occupied by the Sy 2 galaxies. Position 5, which is at the same distance from the nucleus but in the opposite direction, is located nearly on the dividing line. All points which refer to PA = 0° are situated between Kauffmann’s and Kewley’s demarcation lines in the region of Composites.

In Fig. 6 we also plot with asterisks the nuclear diagnostic ratios according to the data of the authors presented in Table 2.
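The classification logic of the diagnostic diagram can be written down compactly. The sketch below uses the widely quoted analytic forms of the Kauffmann et al. (2003) and Kewley et al. (2001) demarcation curves; these coefficients are not given in this paper and are stated here as an assumption, the function names are hypothetical, and the Seyfert/LINER division shown in Fig. 6 is not reproduced.

```python
def kauffmann_2003(log_nii_ha):
    """Empirical starburst boundary (assumed form), valid for log_nii_ha < 0.05."""
    return 0.61 / (log_nii_ha - 0.05) + 1.3


def kewley_2001(log_nii_ha):
    """Theoretical maximum-starburst line (assumed form), valid for log_nii_ha < 0.47."""
    return 0.61 / (log_nii_ha - 0.47) + 1.19


def classify(log_nii_ha, log_oiii_hb):
    """Place a point of the [O III]/Hbeta vs. [N II]/Halpha diagram.

    Inputs are log10 of the two line ratios. Points to the right of a
    curve's asymptote are treated as lying above that curve.
    """
    above_kewley = log_nii_ha >= 0.47 or log_oiii_hb > kewley_2001(log_nii_ha)
    above_kauffmann = log_nii_ha >= 0.05 or log_oiii_hb > kauffmann_2003(log_nii_ha)
    if above_kewley:
        return "AGN"
    if above_kauffmann:
        return "composite"
    return "star-forming"


# Hypothetical line ratios for illustration (not measurements from the paper).
for point in ((-0.5, -0.5), (-0.2, 0.0), (0.0, 1.0)):
    print(point, classify(*point))
```

Points between the two curves fall in the composite region, which is where the PA = 0° measurements of NGC 7679 are reported to lie.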
The thick triangle refers to the nucleus according to our measurements under the assumption of F(Hα)/F(Hβ) = 8.5 (Contini et al., 1998). The large scatter of the nuclear values is probably due to variations in the strength of the Hβ absorption line of the star-forming stellar population.

4.4. Unabsorbed SyGs with and without hidden BLRs

The unabsorbed Sy2 galaxies with low absorption in X-rays (N_H < 10^22 cm^-2) possess a hidden or nonhidden central engine and BLRs. We have used the [O III] λ5007 emission to test the presence of hidden or nonhidden AGN sources in unabsorbed Sy2 galaxies in the samples of PB02 (14 objects) and Panessa et al., 2005 (6 objects selected by Moran et al., 1996) in the same way as was done for NGC 7679 (Subsections 4.1 and 4.2).

We derive the ratio (N_ph/N_ion)_hν>55 eV following equations (1) and (2) under the assumptions of n_e ≈ 5 × 10^4 cm^-3 (which is an order of magnitude smaller than the critical electron density for the [O III] λ5007 emission), T_e ≈ 10^4 K, and CF ≈ 10^-2. These assumptions refer to the inner circumnuclear clouds of AGNs. The ratios are presented in Table 3. For the objects discussed in Panessa et al. (2005) the most popular (i.e. as in NED) galaxy names are used. The L_corr([O^+2]λ5007) values are taken from PB02 and Panessa et al. (2005). In the case of NGC 7679 we have used both their and our determinations of L_corr([O^+2]λ5007).

For three objects with an estimated broad Hα component L^broad_Hα (Panessa et al., 2005, Table 1 therein) we also derive the number of recombinations N_rec resulting in the Hα emission. We assume T_e = 10^4 K and CF = 1, which leads to a lower limit on the value of N_rec. The N_rec/N_ion lower limits are also presented in Table 3.

One can see that 17 out of 20 objects of the unabsorbed Sy2s discussed here reveal (N_ph/N_ion)_hν>55 eV > 0.3. This indicates that the central AGN sources in a considerable part of the unabsorbed Sy2s are obscured.
NGC 7679 is no exception and also possesses a hidden AGN engine, as suggested both by the [O III] λ5007 morphology and by the photon deficiency.

Table 3. The photon deficiency for unabsorbed Sy2s discussed by Panessa and Bassani (2002) and Panessa et al. (2005)

galaxy              (Nph/Nion)_{hν>55 eV}      Nrec/Nion (lower limit)
ESO 540-G001        4.2                        13.0
CGCG 551-008        1.0
MCG -03-05-007      2.2
UGC 03134           19.5
IRAS 20051-1117     1.6                        1.2
CGCG 303-017        1.3                        2.0
IC 1631             0.3
NGC 2992            2.0
NGC 3147            0.4
NGC 4565            6.7
NGC 4579            0.2
NGC 4594            1.7
NGC 4698            0.3
NGC 5033            1.3
MRK 273x            0.4
NGC 5995            0.4
NGC 6221            0.02
NGC 6251            6.0
NGC 7590            0.4
NGC 7679            3.4 (2.0 from our data)

It is still not clear what kind of physical process is related to the presence of hidden central engines in Sy2s. PB02 suggest two scenarios for the unabsorbed Sy2s: (i) the central engine and its BLR must be hidden by an absorbing medium with a high value of the $A_V/N_{\rm H}$ ratio, and (ii) the BLR is very weak or absent.

5. Conclusions

We present a new [O III] λ5007 emission-line image of the circumnuclear region of NGC 7679, which shows elliptical isophotes extended along PA ≈ 80° ± 10° in the direction of the counterpart galaxy NGC 7682. The maximum of this emission is displaced by about 4 arcsec from the photometric center defined by the continuum emission.

The ratio of the number of ionizing photons inferred from the observed extinction-corrected [O III] λ5007 luminosity to the number of ionizing photons with hν > 55 eV provided by the central AGN source, $(N_{\rm ph}/N_{\rm ion})_{h\nu>55\,{\rm eV}} \approx 0.2-20$, as well as the presence of weak and elusive Hα broad wings, probably indicate a hidden AGN.

The high ionization inferred from the flux ratio [O III] λ5007/Hα in the direction of about PA ≈ 80° ± 10° coincides with the direction of the counterpart galaxy NGC 7682. It is possible that the dust and gas in this direction have a direct view of the central AGN engine.
It suggests that starburst and dust decay in this direction have occurred because of tidal interaction between the two galaxies. In the direction PA ≈ 0° the ionization is entirely caused by hot stars.

A large part of the unabsorbed Compton-thin Sy2s with higher [O III] luminosity ($\gtrsim 10^{41}$ erg s$^{-1}$) possesses a hidden AGN source.

Acknowledgements. We are grateful to the referee, Lucio Buson, for his valuable comments, which improved both the content and the clarity of this manuscript. We would like to thank T. Bonev, Institute of Astronomy of the Bulgarian Academy of Sciences, for kindly providing the Fabry-Perot observations and for useful discussions. We are grateful to S. Zhekov, Space Research Institute of the Bulgarian Academy of Sciences, for the numerous fruitful discussions and especially for the analysis of the X-ray properties of NGC 7679. Our work was partially based on data from the La Palma ING, ESO NTT, and XMM-Newton Archives. This research has made use of the SIMBAD database, operated at CDS, Strasbourg, France, and of the NASA/IPAC Extragalactic Database (NED), which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. We acknowledge the support of the National Science Research Fund through grant No. F-201/2006.

References
Aldrovandi, S. M. V., & Pequignot, D. 1973, A&A, 25, 137 Bassani, L., Dadina, M., Maiolino, R., et al. 1999, ApJS, 121, 473 Baskin, A., & Laor, A. 2005, MNRAS, 358, 1043 Bian, W., & Gu, Q. 2006, ApJ accepted (astro-ph/0611199) Boyle, B. J., McMahon, R. G., Wilkes, B. J., & Elvis, M. 1995, MNRAS, 276, 315 Buson, L. M., Cappellari, M., Corsini, E. M., Held, E. V., Lim, J., & Pizzella, A. 2006, A&A, 447, 441 Condon, J., Huang, Z., Yin, Q., & Thuan, T. 1991, ApJ, 378, 65 Contini T., Considere S., & Davoust E.
1998, A&AS, 130, 285 Della Ceca, R., Pellegrini, S., Bassani, L., Beckmann, V., Cappi, M., Palumbo, G. G. C., Trinchieri, G., & Wolter, A. 2001, A&A, 375, 781 (DC01) Draine, B. T., & Lee, H. M. 1984, ApJ, 285, 89 Durret, F., & Warin, F. 1990, A&A, 238, 15 Elvis M., Wilkes, B. J., McDowell, J. C., Green, R. F., Bechtold, J., Willner, S. P., Oey, M. S., Polomski, E., & Cutri, R. 1994, ApJS 95, 1 Golev, V., Yankulova, I., Bonev, T., & Jockers, K. 1995, MNRAS, 273, Golev, V., Yankulova, I., & Bonev, T. 1996, MNRAS, 280, 29 Granato, G. L., & Danese, L. 1994, MNRAS, 268, 235 Griffiths, R. E., Della Ceca, R., Georgantopoulos, I., Boyle, B., Stewart, G., Shnks, T., & Fruscione, A. 1996 MNRAS, 281, 71 Gu, Q., Melnick, J., Fernandes, R. Cid, Kunth, D., Terlevich, E., & Terlevich, R. 2006, MNRAS, 366, 480 Gu, Q. S., Huang, J. H., de Diego, J. A., Dultzin-Hacyan, D., Lei, S. J., & Benitez, E. 2001, A&A, 374, 932 Guainazzi, M., Dennefeld, M., Piro, L., Boller, T., Rafanelli, P., & Yamauchi, M. 2000, A&A, 355, 113 Heckman, T. M., Armus, L., & Miley, G. K. 1990, ApJS, 74, 833 Jockers, K. 1997, Experimental Astronomy, 7, 305 Jockers, K., Credner, T., Bonev, T., Kiselev, N., Korsun, P., Kulik, I., Rosenbush, V., Andrienko, A., Karpov, N., Sergeev, A., & Tarady, V. 2000, Kinematika i Fizika Nebesnykh Tel, Suppl, No. 3, 13 Kauffmann, G., Heckman, T. M., Tremonti, C., et al. 2003, MNRAS, 346, 1055 Kewley, L. J., Heisler, C. A., Dopita, M. A., Sutherland, R. Norris, R., Reynolds, J., & Lumsden, S. 2000, ApJ, 530, 704 Kewley, L. J., Heisler, C. A., Dopita, M. A., & Lumsden, S. 2001, ApJS, 132, 37 Kim, D.-C., Sanders, D. B., Veilleux, S., Mazzarella, J. M., & Soifer, B. T. 1995, ApJS, 98, 129 Kotilainen, J. K., & Prieto, M. A. 1995, A&A, 295, 646 Levenson, N., Weaver, K., & Heckman, T. 2001, ApJ, 550, 230 Lipari, S., Bonatto, Ch., & Pastoriza, M. 1991, MNRAS, 253, 19 Mas-Hesse, J. M., Rodriguez-Pascual, P. M., Sanz Fernandez de Cordoba, L., Mirabel, I. 
F., Wamsteker, W., Makino, F., & Otani, C. 1995, A&A 298, 22 Moran, E. C., Halpern, J. P.,& Helfand, D. J. 1996, ApJS, 106, 341 Moustakas, J., & Kennicutt, R. C. 2006, ApJS, 164, 81 Osterbrock, D. 1989, Astrophysics of gaseous nebulae and active galactic nuclei, University Science Books Panessa, F., & Bassani, L. 2002, A&A, 394, 435 (PB02) Panessa, F., Wolter, A., Pellegrini, S., Fruscione, A., Bassani, L., Della Ceca, R., Palumbo, G., & Trinchieri, G. 2005, ApJ, 631, 707 Pier, E. A., & Krolik, J. 1992, ApJ, 401, 99 Pogge, R. W. 1989, AJ, 98, 124 Risaliti, G., Maiolino, R., & Salvati, M. 1999, ApJ, 522, 157 Risaliti, G. 2002, A&A, 386, 379 Rosati, P., & Chandra Deep Field South Team, 2001, A&AS, Bull.AAS, 33, 1519 Sanders, D., Soifer, B., Elias, J., Madore, B., Matthews, K., Neugebauer, G., & Scoville, N. 1988, ApJ, 325, 74 Schmitt, H. R., Kinney, A. L., Calzetti, D., & Storchi Bergmann, T. 1997, AJ 114, 592 Simpson, C., Mulchaey, J. S., Wislon, A. S., Ward, M. J., & Alonso- Herrero, A. 1996, ApJ, 457, L19 Simpson, C., Wislon, A. S., Bower, G., Heckman, T. M., Krolik, J. H., & Miley, G. K. 1997, ApJ, 474, 121 Smith, H. E., Lonsdale, C. J., & Londsdale C. J. 1998, ApJ, 492, 137 Stine, P. C. 1992, ApJS, 81, 49 Telesco, C. M., Dressel, L., & Wolstencroft,R. 1993, ApJ, 414, 120 Veilleux, S., Kim, D.-C., Sanders, D. B., Mazzarella, J. M., & Soifer, B. T. 1995, ApJS, 98, 171 Veilleux, S., & Osterbrock, D. E. 1987, ApJS, 63, 295 Wilson, A. S., Braatz, J. A., Heckman, T. M., Krolik, J. H., & Miley, G. K. 1993, ApJ, 419, L61 Yankulova, I. 
1999, A&A, 344, 36 http://arxiv.org/abs/astro-ph/0611199 Introduction Observations and data reduction Results Narrow-band emission-line images Narrow-band emission-line total fluxes The ionization map F(ø3 5007)/F(H) Discussion The ionizing flux from the central engine Physical conditions in the circumnuclear region of NGC 7679 Ionization structure in the circumnuclear region of the NGC 7679 Unabsorbed SyGs with and without hidden BLRs Conclusions ABSTRACT NGC 7679 is a nearby luminous infrared Sy2 galaxy in which starburst and AGN activities co-exist. The ionization structure is maintained by both the AGN power-law continuum and starburst. The galaxy is a bright X-ray source possessing a low X-ray column density N_H < 4 x 10^20 cm^{-2}. The Compton-thin nature of such unabsorbed objects infers that the simple formulation of the Unified model for SyGs is not applicable in their case. The main goal of this article is to investigate both gas distribution and ionization structure in the circumnuclear region of NGC 7679 in search for the presence of a hidden Sy1 nucleus, using the [O III] 5007 luminosity as a tracer of AGN activity. The [O III] 5007 image of the NGC 7679 shows elliptical isophotes extended along the PA ~ 80 deg in the direction to the counterpart galaxy NGC 7682. The maximum of ionization by the AGN power-law continuum traced by [O III] 5007/Halpha ratio is displaced by ~ 13 arcsec eastward from the nucleus. We conclude that the dust and gas in the high ionization direction has a direct view to the central AGN engine. This possibly results in dust/star-formation decay. A large fraction of the unabsorbed Compton-thin Sy2s with [O III] luminosity > 10^41 erg s^{-1} possesses a hidden AGN source (abridged). <|endoftext|><|startoftext|> arXiv:0704.0769v1 [cond-mat.other] 5 Apr 2007 The Fermionic Density-functional at Feshbach Resonance Michael Seidl Institute of Theoretical Physics, University of Regensburg, D-93040 Regensburg, Germany Rajat K. 
Bhaduri
Department of Physics and Astronomy, McMaster University, Hamilton, Canada L8S 4M1
(Dated: November 17, 2018)

We consider a dilute gas of neutral unpolarized fermionic atoms at zero temperature. The atoms interact via a short-range (tunable) attractive interaction. We demonstrate analytically a curious property of the gas at unitarity. Namely, the correlation energy of the gas, evaluated by second order perturbation theory, has the same density dependence as the first order exchange energy, and the two almost exactly cancel each other at Feshbach resonance irrespective of the shape of the potential, provided $(\mu r_s) \gg 1$. Here $\mu^{-1}$ is the range of the two-body potential, and $r_s$ is defined through the number density, $n = 3/(4\pi r_s^3)$. The implications of this result for universality are discussed.

I. INTRODUCTION

Consider a dilute gas of $N \gg 1$ neutral fermionic atoms (mass $M$) at $T = 0$ interacting with a short-range attractive potential. In general, the properties of the dilute gas are determined by the number density $n$ and the scattering length $a$. The Hamiltonian of this $N$-particle system reads

$$\hat H = -\frac{\hbar^2}{2M}\sum_{i=1}^{N}\nabla_i^2 + \sum_{i<j} v\big(|\mathbf r_i - \mathbf r_j|\big). \qquad (1)$$

Not written explicitly here, there is also an external potential $v_{\rm ext}(\mathbf r)$ that forces the $N$ atoms to stay within a large box with volume $\Omega$ [where $v_{\rm ext}(\mathbf r) \equiv 0$]. The attractive interaction potential is assumed to have the 2-parameter form

$$v(r) = -v_0 f(\mu r) \qquad (2)$$

where $v_0 > 0$ is the strength of the interaction, $R_0 = 1/\mu$ is its range, and $f(x)$ is a dimensionless function.

In the true ground state of the Hamiltonian (1) the attractive atoms may form dimers or even clusters. We are, however, looking for a metastable state where there is a dilute gas of separated atoms with uniform density $n$, satisfying the condition $(\mu r_s) \gg 1$, where $n = N/\Omega = 3/(4\pi r_s^3)$. Even then, for a weak $v_0$, there will be BCS-type pairing, followed by dimer formation as the strength of the interaction increases.
This was predicted long ago by Leggett [1], and has been observed experimentally [2]. For the density functional analysis of the uniform gas at Feshbach resonance, we shall disregard the BCS condensed pairs in this paper.

To study the effect of the attractive interaction $v(r)$, we consider the corresponding atom-atom scattering problem in the relative s-state. Separating the center-of-mass motion, we are left with the relative Hamiltonian

$$\hat H_{\rm rel} = -\frac{\hbar^2}{M}\nabla^2 - v_0 f(\mu r). \qquad (3)$$

Keeping the range of the potential small enough such that $(\mu r_s) \gg 1$, the strength $v_0$ is adjusted such that the potential can support a single bound state at zero energy. This happens when the scattering length $a \to \infty$, leaving no length scale from the interaction. Such a tuning of the interaction is possible experimentally, and gives rise to Feshbach resonance [3]. The scattering cross section in the given partial wave (s-wave in our case) reaches the unitary limit, and the gas is said to be at unitarity. It is then expected to display universal behavior [4].

Note that at Feshbach resonance, there is no length scale left other than the inverse of the Fermi wave number $k_F$, where $k_F = (3\pi^2 n)^{1/3}$. The energy per particle, $E/N$, as a function of the density $n$, should therefore scale the same way as the noninteracting kinetic energy, $\frac{3}{5}\hbar^2 k_F^2/2M \propto n^{2/3}$. There has been much interest amongst theorists in calculating the properties of the gas in the unitary regime $(k_F|a| \gg 1)$. In particular, at $T = 0$, the energy per particle of the gas is calculated to be

$$\frac{E}{N} = \xi\,\frac{3}{5}\,\frac{\hbar^2 k_F^2}{2M}, \qquad (4)$$

where $\xi \simeq 0.44$ [5]. The experimental value of $\xi$ is about 0.5, but with large error bars [6]. Recently, there have been two Monte Carlo (MC) finite temperature calculations [7, 8] of an untrapped gas at unitarity, where various thermodynamic properties as a function of temperature have been computed.

It is clear that at unitarity, the kinetic and potential energies should scale the same way.
This has been assumed a priori in a previous density functional treatment of a unitary gas [9]. However, such a scaling behavior is not evident from the density functionals for the direct, exchange and correlation energies [10] (see sects. II and III). The aim of the present paper is to examine this point in some detail. In particular, we are able to show analytically that the leading contribution of the correlation energy (calculated in second order perturbation theory) cancels the first order exchange energy almost exactly at Feshbach resonance. This happens irrespective of the shape of the potential as specified by $f(\mu r)$, provided the condition $(\mu r_s) \gg 1$ holds. We show that our general Eq. (24) (derived later in the text), which ensures such a cancellation, is satisfied at unitarity for a variety of 2-parameter potentials, including the square well and the delta-shell, as well as the smoothly varying $\cosh^{-2}(\mu r)$ and Gaussian potentials. This is the main result of the present work.

The implications of this result for universality are marginal. This is because these potential energy terms, in the limit of $(\mu r_s) \gg 1$, are very small compared to the kinetic energy [4]. For a moderately large value like $(\mu r_s) \simeq 3$, however, these terms are comparable in magnitude to the kinetic energy (sect. IV). Even then, the cancellation of the first order exchange and the second order perturbative terms leaves the direct first order term intact. In the electron gas, this (repulsive) term is cancelled by the interaction of the electrons with the positive ionic background. There is no such mechanism of cancellation here, unless we assume, rather arbitrarily, that the short-range interatomic repulsion cancels this direct (attractive) contribution. Even without any such assumptions, however, our main result (Table I), applicable at Feshbach resonance, is interesting from the angle of potential theory.

II.
PERTURBATION EXPANSION Treating the interaction (2) as a weak perturbation in the Hamiltonian (1), the unperturbed energy E(0) is the kinetic energy of a non-interacting Fermi gas, E(0) = Nts(rs) ≡ N . (5) Here, kF = and α3 = 4 . The corresponding ground state |Φ0〉 is a Slater determinant of plane waves. In terms of dimensionless coordinates xi = µri, the Hamiltonian (1) can be written as Ĥ = −1 ∇2i − λ |xi − xj | , λ = This suggests that the perturbation parameter is not re- ally small at unitarity. For example, for the square-well potential, the zero-energy single bound state occurs when λ = π (see sect. III). Nevertheless the low-order terms can point to important information, even when the ex- pansion is divergent [11]. In our problem, there are three parameters, µ, v0, and rs. The unitarity condition re- lates µ and v0, so two independent parameters are left. One of these may be taken to be the small parameter ζ = (µrs) −1. The remaining free parameter v0 may be chosen independently of ζ to fulfill the unitarity condi- tion. A. First order Formally, the first-order correction, E(1) = 〈Φ0|V̂int|Φ0〉, (7) has a direct contribution U(rs, µ) = Nu(rs, µ) with u(rs, µ) = d3r′v |r− r′| (µrs)3 f2. (8) Here, f2 = dxx2f(x). The other first-order contribution is the exchange en- ergy Ex(rs, µ) = Nex(rs, µ), [4] ex(rs, µ) = − drj1(kF r) 2v(r). (9) Here, j1(z) is a spherical Bessel function. Since v(r) is short-range and kF is small in a dilute gas, we can use the small-z expansion j1(z) = +O(z3) to find ex(rs, µ) = (µrs)3 f2 +O(µrs) −5. (10) B. Second order 1. General expressions Also the second-order correction, E(2) = − |〈Φn|V̂int|Φ0〉|2 En − E0 dir + e , (11) has a direct and an exchange contribution [12], dir(rs, µ) = − d3q f̃ d3k1 d q · (q+ k1 − k2) , (12) e(2)ex (rs, µ) = + d3q f̃ d3k1 d |q+ k1 − k2| q · (q+ k1 − k2) . (13) While v20(2M/~ 2µ2) has the dimension energy, the inte- gration variables q, k1, and k2 are dimensionless here. 
The domain of the integral over d3k1 d 3k2 depends on q, D : |k1|, |k2| < 1; |k1 + q|, |k2 − q| > 1. (14) Furthermore, f̃(y) is a dimensionless transform of f(x), f̃(y) = dxx2 f(x) j0(yx) dxx f(x) sin(yx). (15) To recover Eqs. (8) and (9) of Ref. [12], put M = me, v0 = −e2µ, and f(x) = 1x or f̃(y) = , such that v(r) = becomes the electronic Coulomb repulsion. (Note that Ref. [12] uses Rydberg units, mee 4/2~2 = e2/2aB = 1.) 2. The limit µrs ≫ 1 For a dilute gas (small kF ) with short-range interaction (large µ), Eqs. (12) and (13) can be evaluated in the limit µ/kF ≡ αµrs ≫ 1 where α3 = 49π . Following Ref. [13], we choose a number q1 such that 1 ≪ q1 ≪ µ/kF and split the integrals over d3q into two parts, d3q = d3q + d3q. (16) In the first part with q < q1, we have q ≪ µ/kF and |q + k1 − k2| ≪ µ/kF (note that |k1|, |k2| < 1 ≪ q1). Therefore, we may expand f̃(y) = f2+O(y 2) in Eqs. (12) and (13) and keep the leading term f2 only. The sum of the two resulting q < q1 contributions reads q q1 ≫ 1 of the integral (16), we can put q + k1 − k2 ≈ q, since |k1|, |k2| < 1. The resulting contributions to Eqs. (12) and (13) add up to q>q1(rs, µ) = − where d3k1 d 3k2 = ( )2 has been used. Now, dy f̃(y)2 (20) where y1 = kF q1/µ ≪ 1. If dy f̃(y)2 in Eq. (20) did not depend on y1, expression (19) did rigorously have the order O(µrs) −3. However, using the small-y expansion f̃(y) = f2 +O(y 2), we have dyf̃(y)2 = f22 y1 +O(y Consequently, shifting the lower limit y1 of the integral (20) to zero does not affect the leading-order contribution to expression (19), q>q1(rs, µ) = O(µrs) −3. (21) Therefore, the quantity (18) does not contribute to the leading order of e c = e dir + e ex which is purely due to expression (19), e(2)c (rs, µ) = − (µrs)3 F +O(µrs) −4 (22) where F = dyf̃(y)2. III. 
DENSITY SCALING AT UNITARITY

If the perturbation expansion is convergent [12], the total energy $E(r_s, \mu) = N e(r_s, \mu)$ of the gas can be expressed in the form

$$e(r_s, \mu) = t_s(r_s) + e_x(r_s, \mu) + \sum_{n=2}^{\infty} e_c^{(n)}(r_s, \mu). \qquad (23)$$

At unitarity, when the relative Hamiltonian (3) has a single bound state at zero energy, the exchange plus correlation energy $e_x + \sum_{n=2}^{\infty} e_c^{(n)}$ should display the same density scaling as the kinetic energy, $t_s(r_s) \propto r_s^{-2} \propto \rho^{2/3}$. This is obviously not the case with any one of the present (leading-order) results (10) and (22). However, since the exchange energy (10) and the second-order correlation energy (22) have opposite signs, they can cancel each other at some value of $\mu$. This happens when

$$\frac{M v_0}{\hbar^2\mu^2} = \frac{\pi f_2}{2F}, \qquad (24)$$

where $f_2 = \int_0^\infty dx\, x^2 f(x)$ and $F = \int_0^\infty dy\, \tilde f(y)^2$. This is the main result of our paper, and we check it by taking four different potentials. The results of this analysis, summarized in Table I, are discussed in detail below.

Generally, we need an eigenfunction $\psi(r) = u(r)/r$ of the relative Hamiltonian (3) with eigenvalue zero. Writing $u(r) = \phi(\mu r)$, the corresponding dimensionless Schrödinger equation reads

$$\phi''(x) = -\lambda f(x)\,\phi(x), \qquad \lambda \equiv \frac{M v_0}{\hbar^2\mu^2}. \qquad (25)$$

Precisely, we wish to determine that particular value $\lambda_{\rm uty}$ of $\lambda$ for which this zero-energy solution is the only bound state. Then, $\phi(x)$ must obey $\phi(0) = 0$, $\phi'(x) < 0$ for $x \ge 0$, and $\phi(x) \to$ const. for $x \to \infty$. In the following examples (A-D), the solution $\phi(x)$ can be found analytically or numerically.

(A) Square-well potential of radius $R_0 = 1/\mu$:

$$v(r) = -v_0\,\Theta(R_0 - r), \qquad (26)$$

where $\Theta(z)$ denotes the Heaviside step function, $\Theta(z) = 1$ for $z > 0$ and $\Theta(z) = 0$ for $z \le 0$. By setting the dimensionless variable $\mu r = x$, we see that $f(x) = \Theta(1 - x)$. The square-well potential (26) supports a single zero energy bound state when the LHS of Eq. (24) is $\lambda_{\rm uty} = \pi^2/4$. It may be easily checked analytically that for the square-well potential (26), $f_2 = \frac{1}{3}$ and $F = \frac{\pi}{15}$, so that the RHS of Eq. (24) is $\frac{5}{2}$, very close to its LHS, $\pi^2/4 = 2.47$.
(B) Rosen-Morse hyperbolic potential [5]. This potential is given by

$$v(r) = -v_0\,{\rm sech}^2(\mu r), \qquad (27)$$

which supports a single zero energy bound state when the LHS of Eq. (24) is $\lambda_{\rm uty} = 2$ instead of $\pi^2/4$. For this potential, it is easy to check that $f_2 = \pi^2/12$. The quantity $F$, however, has to be calculated numerically, and is given by $F = 0.596$. Again, Eq. (24) is approximately satisfied, since its RHS for this potential is 2.17.

(C) Delta-shell potential [14]. Consider the potential

$$v(r) = -\eta\,\frac{\hbar^2}{M}\,\delta(r - R_0) = -\eta\,\frac{\hbar^2}{M R_0}\,\delta\!\left(\frac{r}{R_0} - 1\right) = -v_0 f(\mu r). \qquad (28)$$

Thus, we have $v_0 = \eta\,\frac{\hbar^2}{M R_0}$, $\mu = \frac{1}{R_0}$, and $f(x) = \delta(x - 1)$. So we get $f_2 = 1$, $\tilde f(y) = \frac{\sin y}{y}$, and $F = \frac{\pi}{2}$. Hence the RHS of Eq. (24) is unity. The LHS is $(\eta R_0)$, which is exactly unity when the s-state scattering length goes to infinity [14]. Thus Eq. (24) is exactly obeyed in this case.

(D) Gaussian potential.

$$v(r) = -v_0 \exp(-\mu^2 r^2) \qquad (29)$$

For this example, $f(x) = \exp(-x^2)$ in Eq. (2). We find $f_2 = \frac{\sqrt{\pi}}{4}$ and $F = \frac{1}{8}\left(\frac{\pi}{2}\right)^{3/2}$, so that the RHS of Eq. (24) becomes $\pi f_2/2F = 2^{3/2}$. Solving Eq. (25) numerically for this $f(x)$, we obtain a single bound state at zero energy when the LHS of Eq. (24) is $\lambda_{\rm uty} = 0.949 \times 2^{3/2}$, close to $2^{3/2}$.

Note, however, that contributions $O(\mu r_s)^{-3}$ may also come from higher order terms of the perturbation expansion in section II, since that expansion is carried out with respect to the parameter $\lambda = M v_0/\hbar^2\mu^2$, but not $1/\mu r_s$.

TABLE I: The moments $f_2$ and $F$ of four different profiles $f(x)$ for the potential (2). $\lambda_{\rm uty}$ is the value at unitarity of the parameter $\lambda$ in Eq. (25). At unitarity, the ratio $Q$ of the LHS of Eq. (24) to the RHS is always close to 1.

f(x)          f2         F                     λuty      Q
Θ(1 − x)      1/3        π/15                  π²/4      0.987
sech²(x)      π²/12      0.596                 2         0.922
δ(1 − x)      1          π/2                   1         1.000
exp(−x²)      √π/4       (1/8)(π/2)^{3/2}      2.684     0.949

IV. DISCUSSION

The dimensionless Hamiltonian $\hat h$ from Eq. (6) depends on the dimensionless parameters

$$\lambda = \frac{M v_0}{\hbar^2\mu^2} \qquad (30)$$

and, not written explicitly, $x_s = \mu r_s$. The perturbation expansion of the ground-state energy of $\hat h$ reads

$$\varepsilon(x_s, \lambda) = \sum_{n=0}^{\infty} \varepsilon_n(x_s)\,\lambda^n.$$
(31) The ground-state energy of the original Hamiltonian Ĥ , with three independent parameters, is then given by E(rs, µ, λ) = ε(µrs, λ) εn(µrs)λ n. (32) For µrs ≫ 1, we may expand εn(µrs) = (µrs)m . (33) From Eq. (5), we have ε02 = N α−2 while ε0m = 0 for m 6= 2. Eqs. (8) and (10) imply that ε1m = 0 for m < 3 and ε13 = N(− 32 + )f2. Eventually, due to Eq. (22), ε2m = 0 for m < 3 and ε23 = N(− 34π )2F . So far as the unitary point is concerned, we are inter- ested in a situation where kF |a| ≫ 1 ≫ kFR0 ∼ (µrs)−1. In view of the fact that the perturbation series above does not converge at unitarity, how significant is our low order perturbation calculation in this situation ? Note that our first order direct and exchange (potential) energy terms given by Eqs. (8,10) are the same as those obtained in the Hartree-Fock calculation (see, for example, Eq.(10) of Heiselberg [4]). How big are these terms at unitarity compared to the kinetic energy per particle ? Taking the example of the square-well potential discussed earlier, it is straight forward to show that our exchange term (10) at Feshbach resonance is ex(rs, µ) = . (34) For the square-well example, (kF a) = (µrs) 1− tan . (35) At unitarity, the RHS diverges for any finite value of (µrs), how ever large. Even in the neighbourhood of unitarity, it is possible to have (kF |a|) ≫ 1 for (µrs) ≫ 1. From Eq.(34), we note that too large a choice for (µrs) would make ex negligible against EF . Instead, taking a modestly large value, µrs = 3, we obtain the ration of ex to kinetic energy per particle to be about 0.56. Noting that ex has a different density-dependence than the kinetic energy per particle, its cancellation with the second order perturbative correlation term helps towards scale invariance, but only if there is a mechanism for the direct first order term to be cancelled. 
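The consistency check summarized in Table I can be reproduced with a few lines of arithmetic. The sketch below (Python, standard library only) takes the moments $f_2$ and $F$ and the unitarity value $\lambda_{\rm uty}$ quoted in Sect. III, evaluates the RHS of Eq. (24) in the form $\pi f_2/2F$ used in example (D), and forms the ratio $Q$. The function and variable names are illustrative and do not come from the paper. For the sech² profile the rounded value $F = 0.596$ from the text is used as input, so the last digit of $Q$ may differ slightly from the table.

```python
import math

# (f2, F, lambda_uty) for each profile f(x), as quoted in Sect. III.
# F for sech^2 is the rounded numerical value given in the text (assumption:
# no more digits are available here).
PROFILES = {
    "square well": (1.0 / 3.0, math.pi / 15.0, math.pi ** 2 / 4.0),
    "sech^2": (math.pi ** 2 / 12.0, 0.596, 2.0),
    "delta shell": (1.0, math.pi / 2.0, 1.0),
    "gaussian": (math.sqrt(math.pi) / 4.0,
                 (math.pi / 2.0) ** 1.5 / 8.0,
                 0.949 * 2.0 ** 1.5),
}


def rhs_eq24(f2, big_f):
    """Right-hand side of Eq. (24): pi * f2 / (2 F)."""
    return math.pi * f2 / (2.0 * big_f)


def ratio_q(f2, big_f, lam_uty):
    """Ratio Q of the LHS of Eq. (24) at unitarity to its RHS."""
    return lam_uty / rhs_eq24(f2, big_f)


def table_one():
    """Return {profile name: (RHS of Eq. 24, Q)} for the four profiles."""
    return {
        name: (rhs_eq24(f2, big_f), ratio_q(f2, big_f, lam))
        for name, (f2, big_f, lam) in PROFILES.items()
    }


if __name__ == "__main__":
    for name, (rhs, q) in table_one().items():
        print(f"{name:12s}  RHS = {rhs:.3f}  Q = {q:.3f}")
```

Keeping the inputs in one dictionary makes it easy to add a further profile once its $f_2$, $F$ and $\lambda_{\rm uty}$ are known; computing $F$ and $\lambda_{\rm uty}$ themselves for a new potential would require numerical quadrature and a zero-energy Schrödinger solver, which this sketch does not attempt.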
We conclude by emphasizing that the new result in this paper is displayed in Table I, and should be of interest from the point of view of potential theory.

The authors would like to thank Brandon van Zyl for discussions. This research was financed by NSERC of Canada.

[1] A. J. Leggett, in Modern Trends in the Theory of Condensed Matter, Springer-Verlag Lecture Notes, Vol. 115, edited by A. Pekalski and J. Przystawa (Springer-Verlag, Berlin, 1980), p. 13.
[2] C. A. Regal et al., Nature (London) 424, 47 (2003); M. W. Zwierlein et al., Phys. Rev. Lett. 91, 250401 (2003); C. A. Regal et al., Phys. Rev. Lett. 92, 040403 (2004); M. W. Zwierlein et al., Nature (London) 435, 1046 (2005); G. B. Partridge et al., Science 311, 503 (2006).
[3] S. Inouye et al., Nature (London) 392, 151 (1998); Ph. Courteille et al., Phys. Rev. Lett. 81, 69 (1998).
[4] G. A. Baker, Phys. Rev. C 60, 054311 (1999); H. Heiselberg, Phys. Rev. A 63, 043606 (2001); T.-L. Ho, Phys. Rev. Lett. 92, 090402 (2004).
[5] J. Carlson, S.-Y. Chang, V. R. Pandharipande, and K. E. Schmidt, Phys. Rev. Lett. 91, 050401 (2003); A. Perali, P. Pieri, and G. C. Strinati, Phys. Rev. Lett. 93, 100404 (2004).
[6] M. Bartenstein et al., Phys. Rev. Lett. 92, 120401 (2004); T. Bourdel et al., Phys. Rev. Lett. 93, 050401 (2004).
[7] A. Bulgac, J. E. Drut, and P. Magierski, Phys. Rev. Lett. 96, 090404 (2006).
[8] E. Burovski, N. Prokof'ev, B. Svistunov, and M. Troyer, Phys. Rev. Lett. 96, 160402 (2006).
[9] T. Papenbrock, Phys. Rev. A 72, 041603(R) (2005); A. Bhattacharyya and T. Papenbrock, Phys. Rev. A 74, 041602(R) (2006).
[10] R. G. Parr and W. Yang, Density-Functional Theory of Atoms and Molecules (Oxford University Press, New York, 1989); W. Kohn, Rev. Mod. Phys. 71, 1253 (1999).
[11] M. Seidl, J. P. Perdew, and S. Kurth, Phys. Rev. Lett. 84, 5070 (2000).
[12] M. Gell-Mann and K. A. Brueckner, Phys. Rev. 106, 364 (1957).
[13] L. Zecca, P. Gori-Giorgi, S. Moroni, and G. B. Bachelet, Phys. Rev. B 70, 205127 (2004).
[14] K.
Gottfried, Quantum Mechanicsvol.I, (W. A. Ben- jamin, Inc., New York, 1966). See sect. (15). ABSTRACT We consider a dilute gas of neutral unpolarized fermionic atoms at zero temperature.The atoms interact via a short range (tunable) attractive interaction. We demonstrate analytically a curious property of the gas at unitarity. Namely, the correlation energy of the gas, evaluated by second order perturbation theory, has the same density dependence as the first order exchange energy, and the two almost exactly cancel each other at Feshbach resonance irrespective of the shape of the potential, provided $(\mu r_s) >> 1$. Here $(\mu)^{-1}$ is the range of the two-body potential, and $r_s$ is defined through the number density $n=3/(4\pi r_s^3)$. The implications of this result for universality is discussed. <|endoftext|><|startoftext|> Chemical Evolution Francesca Matteucci Department of Astronomy University of Trieste and Osservatorio Astronomico di Trieste (INAF) Via G.B. Tiepolo, 11, 34124 Trieste Italy (matteucci@ts.astro.it) http://arxiv.org/abs/0704.0770v1 Contents 1 Chemical Evolution page 1 1.1 Lecture I: basic assumptions and equations of chem- ical evolution 1 1.1.1 The basic ingredients 1 1.1.2 The Star Formation Rate 2 1.1.3 The Initial Mass Function 3 1.1.4 The Infall Rate 4 1.1.5 The Outflow Rate 4 1.1.6 Stellar evolution and nucleosynthesis: the stellar yields 5 1.1.7 Type Ia SN Progenitors 6 1.1.8 Yields per Stellar Generation 7 1.1.9 Analytical models 8 1.1.10 Numerical Models 9 1.2 Lecture II: the Milky Way and other spirals 11 1.2.1 The Galactic formation timescales 11 1.2.2 The two-infall model 12 1.2.3 Common Conclusions from MW Models 18 1.2.4 Abundance Gradients from Emission Lines 19 1.2.5 Abundance Gradients in External Galaxies 21 1.2.6 How to model the Hubble Sequence 21 1.2.7 Type Ia SN rates in different galaxies 24 1.2.8 Time-delay model for different galaxies 25 1.3 Lecture III: interpretation of abundances in dwarf irregulars 27 1.3.1 
Properties of Dwarf Irregular Galaxies 27 1.3.2 Galactic Winds 31 1.3.3 Results on DIG and BCG from purely chemical models 32 1.3.4 Results from Chemo-Dynamical models: IZw18 34 1.4 Lecture IV: Elliptical galaxies-Quasars-ICM Enrichment 38 1.4.1 Ellipticals 38 1.4.2 Chemical Properties 38 1.4.3 Scenarios for galaxy formation 39 1.4.4 Ellipticals-Quasars connection 41 1.4.5 The chemical evolution of QSOs 41 1.4.6 The chemical enrichment of the ICM 43 1.4.7 Conclusions on the enrichment of the ICM 46 References 48

Chemical Evolution

1.1 Lecture I: basic assumptions and equations of chemical evolution

To build galaxy chemical evolution models one needs to spell out a number of hypotheses and make assumptions about the basic ingredients.

1.1.1 The basic ingredients

• INITIAL CONDITIONS: whether the mass of gas out of which stars will form is all present initially or will be accreted later on, and the chemical composition of the initial gas (primordial or already enriched by a pregalactic stellar generation).
• THE BIRTHRATE FUNCTION:

$$B(M, t) = \psi(t)\,\varphi(M) \qquad (1.1)$$

where

$$\psi(t) = {\rm SFR} \qquad (1.2)$$

is the star formation rate (SFR) and

$$\varphi(M) = {\rm IMF} \qquad (1.3)$$

is the initial mass function (IMF).
• STELLAR EVOLUTION AND NUCLEOSYNTHESIS: stellar yields, yields per stellar generation.
• SUPPLEMENTARY PARAMETERS: infall, outflow, radial flows.

1.1.2 The Star Formation Rate

Here we summarize the most common parametrizations for the SFR in galaxies, as adopted by chemical evolution models:

• Constant in space and time and equal to the estimated present time SFR. For example, for the local disk, the present time SFR is ${\rm SFR} = 2-5\, M_\odot\,{\rm pc^{-2}\,Gyr^{-1}}$ (Boissier & Prantzos, 1999).
• Exponentially decreasing:

$${\rm SFR} = \nu\, e^{-t/\tau_*} \qquad (1.4)$$

with $\tau_* = 5-15$ Gyr (Tosi, 1988). The quantity $\nu$ is a parameter that we call the efficiency of SF, since it represents the SFR per unit mass of gas and is expressed in Gyr$^{-1}$.
• The most used SFR is the Schmidt (1959) law, which assumes a dependence on the gas density, in particular:

$${\rm SFR} = \nu\,\sigma_{\rm gas}^{k} \qquad (1.5)$$

where $k = 1.4 \pm 0.15$, as suggested by a study of Kennicutt (1998) of local star forming galaxies.
• Some variations of the Schmidt law with a dependence also on the total mass have been suggested, for example by Dopita & Ryder (1994). This formulation takes into account the feedback mechanism acting between supernovae (SNe) and stellar winds injecting energy into the interstellar medium (ISM) and the galactic potential well. In other words, the SF process is regulated by the fact that in a region of recent star formation the gas is too hot to form stars and is easily removed from that region. Before new stars can form, the gas needs to cool and collapse back into the star forming region, and this process depends on the potential well and therefore on the total mass density:

$${\rm SFR} = \nu\,\sigma_{\rm tot}^{k_1}\,\sigma_{\rm gas}^{k_2} \qquad (1.6)$$

with $k_1 = 0.5$ and $k_2 = 1.5$.
• Kennicutt (1998) also suggested, as an alternative to the Schmidt law to fit the data, the following relation:

$${\rm SFR} = 0.017\,\Omega_{\rm gas}\,\sigma_{\rm gas} \propto R^{-1}\sigma_{\rm gas} \qquad (1.7)$$

with $\Omega_{\rm gas}$ being the angular rotation speed of the gas.
• Finally, a SFR induced by spiral density waves was suggested by Wyse & Silk (1989):

$${\rm SFR} = \nu\,V(R)\,R^{-1}\,\sigma_{\rm gas}^{1.5} \qquad (1.8)$$

with $R$ being the galactocentric distance and $V(R)$ the gas rotation velocity.

1.1.3 The Initial Mass Function

The IMF is a probability function describing the distribution of stars as a function of mass. The present day mass function is derived for the stars in the solar vicinity by counting the Main Sequence stars as a function of magnitude and then applying the mass-luminosity relation, which holds for Main Sequence stars, to derive the distribution of stars as a function of mass. In order to derive the IMF one then has to make assumptions on the past history of SF.
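As an aside on Section 1.1.2, the SFR parametrizations of Eqs. (1.4)-(1.8) are simple enough to write as one-line functions, which can help when comparing them inside a chemical evolution code. The following is a minimal sketch in Python; the function and argument names are illustrative and not taken from any published code, and no unit conversion is attempted, so the caller is assumed to supply quantities in mutually consistent units.

```python
import math


def sfr_exponential(t, nu, tau):
    """Eq. (1.4): exponentially decreasing SFR, nu * exp(-t / tau)."""
    return nu * math.exp(-t / tau)


def sfr_schmidt(sigma_gas, nu, k=1.4):
    """Eq. (1.5): Schmidt law, nu * sigma_gas**k (k = 1.4 +/- 0.15)."""
    return nu * sigma_gas ** k


def sfr_dopita_ryder(sigma_tot, sigma_gas, nu, k1=0.5, k2=1.5):
    """Eq. (1.6): Schmidt-like law that also depends on total surface density."""
    return nu * sigma_tot ** k1 * sigma_gas ** k2


def sfr_kennicutt_dynamical(omega_gas, sigma_gas):
    """Eq. (1.7): 0.017 * Omega_gas * sigma_gas."""
    return 0.017 * omega_gas * sigma_gas


def sfr_wyse_silk(v_rot, radius, sigma_gas, nu):
    """Eq. (1.8): nu * V(R) / R * sigma_gas**1.5."""
    return nu * v_rot / radius * sigma_gas ** 1.5


if __name__ == "__main__":
    # Illustrative inputs only; they are not fitted to any galaxy.
    print(sfr_exponential(t=5.0, nu=1.0, tau=10.0))
    print(sfr_schmidt(sigma_gas=10.0, nu=1.0))
```

Writing each law as a separate function with the exponents as keyword defaults keeps the values quoted in the text visible while still letting them be varied.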
The derived IMF is normally approximated by a power law:
$$\varphi(M)\,dM = a\,M^{-(1+x)}\,dM \tag{1.9}$$
where ϕ(M)dM is the number of stars with masses in the interval (M, M + dM). Salpeter (1955) proposed a one-slope IMF (x = 1.35) valid for stars with M > 10M⊙. Multi-slope (x1, x2, ..) IMFs have been suggested later on, always for the solar vicinity (Scalo 1986, 1998; Kroupa et al. 1993; Chabrier 2003). The IMF is generally normalized as:
$$\int_{0.1}^{100} M\,\varphi(M)\,dM = 1 \tag{1.10}$$
where a is the normalization constant and the assumed interval of integration is 0.1-100 M⊙. The IMF is generally considered constant in space and time, with some exceptions such as the IMF suggested by Larson (1998) with:
$$x = 1.35\,(1 + m/m_1)^{-1} \tag{1.11}$$
where m1 is a variable typical mass associated with the Jeans mass. This IMF then predicts that m1 is a decreasing function of time.

1.1.4 The Infall Rate

For the rate of gas accretion there are several parametrizations in the literature:

• The infall rate is constant in space and time and equal to the present-time infall rate as measured in the Galaxy (∼ 1.0 M⊙ yr⁻¹).
• The infall rate is variable in space and time, and the most common assumption is an exponential law (Chiosi 1980; Lacey & Fall 1985):
$$IR = A(R)\,e^{-t/\tau(R)} \tag{1.12}$$
with τ(R) constant or varying with the galactocentric distance. The parameter A(R) is derived by fitting the present-day total surface mass density, σtot(tG), at any specific galactocentric radius R.
• For the formation of the Milky Way two episodes of infall have been suggested (Chiappini et al. 1997): during the first infall episode the stellar halo forms, whereas during the second infall episode the disk forms. This particular infall law gives a good representation of the formation of the Milky Way. The proposed two-infall law is:
$$IR = A(R)\,e^{-t/\tau_H(R)} + B(R)\,e^{-(t-t_{max})/\tau_D(R)} \tag{1.13}$$
where τH(R) is the timescale for the formation of the halo, which can be constant or vary with galactocentric distance.
The quantity τD(R) is the timescale for the formation of the disk and is a function of the galactocentric distance; in most of the models it is assumed to increase with R (e.g. Matteucci & François, 1989).
• More recently, Prantzos (2003) suggested a Gaussian law with a peak at 0.1 Gyr and a FWHM of 0.04 Gyr for the formation of the stellar halo.

1.1.5 The Outflow Rate

The so-called galactic winds occur when the thermal energy of the gas in galaxies exceeds its potential energy. Generally, gas outflows are called winds when the gas is lost forever from the galaxy. Only detailed dynamical simulations can tell whether there is a wind or just an outflow of gas which will sooner or later fall back into the galaxy. In chemical evolution models galactic winds can be sudden or continuous. If they are sudden, the mass is assumed to be lost in a very short interval of time and the galaxy is left devoid of gas; if they are continuous, one has to assume the rate of gas loss. Generally, in chemical evolution models (Bradamante et al. 1998) and also in cosmological simulations (Springel & Hernquist, 2003) it is assumed that the rate of gas loss is several times the SFR:
$$W = -\lambda\,\mathrm{SFR} \tag{1.14}$$
where λ is a free parameter with the meaning of wind efficiency. This particular formulation for the galactic wind rate is supported by observational findings (see Martin, 1999).

1.1.6 Stellar evolution and nucleosynthesis: the stellar yields

Here we summarize the various contributions to element production by stars of all masses.

• Brown Dwarfs (M < ML, ML = 0.08-0.09 M⊙) are objects which never ignite H and whose lifetimes are longer than the age of the Universe. They contribute only to locking up mass.
• Low mass stars (0.5 ≤ M/M⊙ ≤ MHeF, with MHeF = 1.85-2.2 M⊙) ignite He explosively but without destroying themselves and then become C-O white dwarfs (WD). If M < 0.5M⊙ they become He WDs.
Their lifetimes range from several 10⁹ years up to several Hubble times!
• Intermediate mass stars (MHeF ≤ M/M⊙ ≤ Mup) ignite He quiescently. The mass Mup is the limiting mass for the formation of a C-O degenerate core and is in the range 5-9 M⊙, depending on stellar evolution calculations. Lifetimes are from several 10⁷ to 10⁹ years. They die as C-O WDs if not in binary systems. If in binary systems they can give rise to cataclysmic variables such as novae and Type Ia SNe.
• Massive stars (M > Mup). We distinguish here several cases:
- Mup ≤ M/M⊙ ≤ 10-12. Stars with Main Sequence masses in this range end up as electron-capture SNe leaving neutron stars as remnants. These SNe will appear as Type II SNe, which show H in their spectra.
- 10-12 ≤ M/M⊙ ≤ MWR (with MWR ∼ 20-40 M⊙ being the limiting mass for the formation of a Wolf-Rayet (WR) star). Stars in this mass range end their life as core-collapse SNe (Type II) leaving a neutron star or a black hole as remnant.
- MWR ≤ M/M⊙ ≤ 100. Stars in this mass range probably explode as Type Ib/c SNe, which do not show H in their spectra. Their lifetimes are of the order of ∼ 10⁶ years.
• Very Massive Stars (M > 100M⊙) should explode by means of an instability due to “pair creation” and are called pair-creation SNe. In fact, at T ∼ 2 · 10⁹ K a large portion of the gravitational energy goes into the creation of pairs (e+, e−), and the star becomes unstable and explodes. They leave no remnants and their lifetimes are < 10⁶ years. Probably these very massive stars formed only when the metal content was almost zero (Population III stars, Schneider et al. 2004).

All the elements with mass number A from 12 to 60 have been formed in stars during the quiescent burnings. Stars transform H into He and then He into heavier elements up to the Fe-peak elements, where the binding energy per nucleon reaches a maximum and the nuclear fusion reactions stop.
H is transformed into He through the proton-proton chain or the CNO cycle, then ⁴He is transformed into ¹²C through the triple-α reaction. Elements heavier than ¹²C are then produced by synthesis of α-particles. They are called α-elements (O, Ne, Mg, Si and others). The last main burning in stars is ²⁸Si-burning, which produces ⁵⁶Ni, which then decays into ⁵⁶Co and ⁵⁶Fe. Si-burning can be quiescent or explosive (depending on the temperature). Explosive nucleosynthesis occurring during SN explosions mainly produces Fe-peak elements. Elements originating from s- and r-processes (with A > 60 up to Th and U) are formed by means of slow or rapid (relative to the β-decay) neutron capture by Fe seed nuclei; s-processing occurs during quiescent He-burning whereas r-processing occurs during SN explosions.

1.1.7 Type Ia SN Progenitors

The Type Ia SNe, which do not show H in their spectra, are believed to originate from WDs in binary systems and to be the major producers of Fe in the Universe. The proposed models are basically two:

• Single Degenerate Scenario (SDS), with a WD plus a Main Sequence or Red Giant star, as originally suggested by Whelan and Iben (1973). The explosion (C-deflagration) occurs when the C-O WD reaches the Chandrasekhar mass, MCh ∼ 1.44M⊙, after accreting material from the companion. In this model the clock to the explosion is given by the lifetime of the companion of the WD (namely the less massive star in the system). It is interesting to define the minimum timescale for the explosion, which is given by the lifetime of an 8M⊙ star, namely tSNIamin = 0.03 Gyr (Greggio and Renzini 1983). Recent observations in radio-galaxies by Mannucci et al. (2005; 2006) seem to confirm the existence of such prompt Type Ia SNe.
• Double Degenerate Scenario (DDS), where the merging of two C-O WDs of mass ∼ 0.7M⊙, due to loss of angular momentum as a consequence of gravitational wave radiation, produces C-deflagration (Iben and Tutukov 1984). In this case the clock to the explosion is given by the lifetime of the secondary star, as above, plus the gravitational time delay, namely the time necessary for the two WDs to merge. The minimum time for the explosion is tSNIamin = 0.03 + ∆tgrav = 0.04 Gyr (see Tornambè 1989).

Some variations of the above scenarios have been proposed, such as the model by Hachisu et al. (1996; 1999), which is based on the single degenerate scenario but considers a wind from the WD. Such a wind stabilizes the accretion from the companion and introduces a metallicity effect. In particular, the wind, necessary to this model, occurs only if the systems have metallicity ([Fe/H]< −1.0). This implies that the minimum time for the explosion is larger than in the previous cases. In particular, tSNIamin = 0.33 Gyr, which is the lifetime of the most massive secondary considered (2.3M⊙), plus the metallicity delay, which depends on the assumed chemical evolution model.

1.1.8 Yields per Stellar Generation

Under the Instantaneous Recycling Approximation (IRA), which states that all stars more massive than 1M⊙ die immediately whereas all stars with masses lower than 1M⊙ live forever, one can define the yield per stellar generation (Tinsley, 1980):
$$y_i = \frac{1}{1-R}\int_{1}^{\infty} m\,p_{im}\,\varphi(m)\,dm \tag{1.15}$$
where pim is the stellar yield of the element i, namely the newly formed element i ejected by a star of mass m. The quantity R is the so-called Returned Fraction:
$$R = \int_{1}^{\infty} (m - M_{rem})\,\varphi(m)\,dm \tag{1.16}$$
and is the total mass of gas restored into the ISM by an entire stellar generation.

1.1.9 Analytical models

The Simple Model for the chemical evolution of the solar neighbourhood is the simplest approach to modelling chemical evolution.
The solar neighbourhood is assumed to be a cylinder of 1 kpc radius centered on the Sun. The basic assumptions of the Simple Model are:
- the system is one-zone and closed, with no inflows or outflows and with the total mass present since the beginning,
- the initial gas is primordial (no metals),
- the instantaneous recycling approximation holds,
- the IMF, ϕ(m), is assumed to be constant in time,
- the gas is well mixed at any time (instantaneous mixing approximation, IMA).

The Simple Model fails to describe the evolution of the Milky Way (G-dwarf metallicity distribution, elements produced on long timescales and abundance ratios), and the reason is that at least two of the above assumptions are manifestly wrong, especially if one intends to model the evolution of the abundance of elements produced on long timescales, such as Fe: in particular, the assumptions of a closed box and of the IRA. However, it is interesting to know the solution of the Simple Model and its implications. Let Xi be the abundance by mass of an element i. If Xi << 1, which is generally true for metals, we obtain the solution of the Simple Model. This solution is obtained analytically by ignoring the stellar lifetimes:
$$X_i = y_i \ln\left(\frac{1}{G}\right) \tag{1.17}$$
where G = Mgas/Mtot is the gas mass fraction and yi is the yield per stellar generation, as defined above. The effective yield, in turn, is defined as:
$$y_{i}^{eff} = \frac{X_i}{\ln(1/G)} \tag{1.18}$$
namely the yield that the system would have if it behaved as the simple closed-box model. This means that if yieff > yi, then the actual system has attained a higher abundance for the element i at a given gas fraction G. Generally, in the IRA, we can assume:
$$\frac{X_i}{X_j} = \frac{y_i}{y_j} \tag{1.19}$$
which means that the ratio of two element abundances is always equal to the ratio of their yields. This is no longer true when the IRA is relaxed. In fact, relaxing the IRA is necessary to study in detail the evolution of the abundances of single elements.
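The Simple Model relations of eqs. (1.17) to (1.19) can be checked with a few lines of code. The following is a minimal sketch, assuming G denotes the gas mass fraction Mgas/Mtot with 0 < G < 1; the function names and the numerical yields are hypothetical and chosen only for illustration.

```python
import math


def simple_model_abundance(gas_fraction, yield_i):
    """Eq. (1.17): X_i = y_i * ln(1/G) for a closed box under the IRA."""
    if not 0.0 < gas_fraction <= 1.0:
        raise ValueError("gas fraction G must satisfy 0 < G <= 1")
    return yield_i * math.log(1.0 / gas_fraction)


def effective_yield(abundance, gas_fraction):
    """Eq. (1.18): y_eff = X_i / ln(1/G); undefined at G = 1."""
    if not 0.0 < gas_fraction < 1.0:
        raise ValueError("gas fraction G must satisfy 0 < G < 1")
    return abundance / math.log(1.0 / gas_fraction)


if __name__ == "__main__":
    y_a, y_b = 0.01, 0.002  # illustrative yields (assumed values)
    for g in (0.9, 0.5, 0.1):
        x_a = simple_model_abundance(g, y_a)
        x_b = simple_model_abundance(g, y_b)
        # Eq. (1.19): the abundance ratio stays equal to the yield ratio.
        print(g, x_a, x_b, x_a / x_b)
```

Applying `effective_yield` to an abundance produced by `simple_model_abundance` returns the input yield by construction; for an observed system, a departure of the effective yield from the true yield signals that the closed-box assumptions are violated.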
One can also obtain analytical solutions in the presence of infall and/or outflow, but the necessary condition is to assume the IRA. Matteucci & Chiosi (1983) found solutions for models with outflow and infall, and Matteucci (2001) found one for a model with infall and outflow acting at the same time. The main assumption in the model with outflow but no infall is that the outflow rate is:
$$W(t) = \lambda\,(1-R)\,\psi(t) \tag{1.20}$$
where λ ≥ 0 is the wind parameter. The solution of this model is:
$$X_i = \frac{y_i}{1+\lambda}\,\ln\left[(1+\lambda)\,G^{-1} - \lambda\right] \tag{1.21}$$
For λ = 0 the equation reduces to that of the Simple Model (1.17). The solution of the equation of metals for a model without wind but with primordial infalling material (XAi = 0) at a rate:
$$A(t) = \Lambda\,(1-R)\,\psi(t) \tag{1.22}$$
and Λ ≠ 1 is:
$$X_i = \frac{y_i}{\Lambda}\left[1 - \left(\Lambda - (\Lambda-1)\,G^{-1}\right)^{-\Lambda/(1-\Lambda)}\right] \tag{1.23}$$
For Λ = 1 one obtains the well-known case of extreme infall studied by Larson (1972), whose solution is:
$$X_i = y_i\left[1 - e^{-(G^{-1}-1)}\right] \tag{1.24}$$
This extreme infall solution shows that when G → 0 then Xi → yi.

1.1.10 Numerical Models

Numerical models relax the IRA and the closed-box assumption but generally retain the constancy of ϕ(m) and the IMA. If Gi is the mass fraction of gas in the form of an element i, we can write:
$$\begin{aligned}
\dot G_i(t) = {} & -\psi(t)\,X_i(t) \\
& + \int_{M_L}^{M_{Bm}} \psi(t-\tau_m)\,Q_{mi}(t-\tau_m)\,\varphi(m)\,dm \\
& + A \int_{M_{Bm}}^{M_{BM}} \varphi(m)\left[\int_{\mu_{min}}^{0.5} f(\gamma)\,\psi(t-\tau_{m2})\,Q_{mi}(t-\tau_{m2})\,d\gamma\right] dm \\
& + B \int_{M_{Bm}}^{M_{BM}} \psi(t-\tau_m)\,Q_{mi}(t-\tau_m)\,\varphi(m)\,dm \\
& + \int_{M_{BM}}^{M_U} \psi(t-\tau_m)\,Q_{mi}(t-\tau_m)\,\varphi(m)\,dm \\
& + X_{A_i}\,A(t) - X_i(t)\,W(t)
\end{aligned} \tag{1.25}$$
where B = 1 − A and A = 0.05-0.09. The A parameter is the fraction in the IMF of binary systems with the specific features required to give rise to Type Ia SNe, whereas B is the fraction of all the single stars and binary systems in the same mass range as the progenitors of Type Ia SNe. The values of A indicated above are correct for the evolution of the solar vicinity where an IMF of Scalo (1986, 1989) or Kroupa et al. (1993) is adopted. If one adopts a flatter IMF, such as that of Salpeter (1955), then A is different.
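Returning briefly to the analytical solutions of Section 1.1.9, the outflow and infall cases (eqs. 1.21, 1.23 and 1.24) can be evaluated numerically to verify their limiting behaviour. The sketch below is illustrative only: the function names are hypothetical, G is taken to be the gas mass fraction, and the guard on the infall case reflects the fact that the bracketed term must stay positive for the power to be real.

```python
import math


def abundance_outflow(gas_fraction, yield_i, lam):
    """Eq. (1.21): X_i = y_i / (1 + lam) * ln[(1 + lam) / G - lam], lam >= 0."""
    return yield_i / (1.0 + lam) * math.log((1.0 + lam) / gas_fraction - lam)


def abundance_infall(gas_fraction, yield_i, big_lam):
    """Eq. (1.23) for Lambda != 1 and eq. (1.24) for Lambda = 1 (primordial infall)."""
    if big_lam == 0.0:
        # No infall: the closed-box solution of eq. (1.17).
        return yield_i * math.log(1.0 / gas_fraction)
    if math.isclose(big_lam, 1.0):
        # Extreme infall (Larson 1972), eq. (1.24).
        return yield_i * (1.0 - math.exp(-(1.0 / gas_fraction - 1.0)))
    base = big_lam - (big_lam - 1.0) / gas_fraction
    if base <= 0.0:
        raise ValueError("gas fraction not reachable for this Lambda")
    return yield_i / big_lam * (1.0 - base ** (-big_lam / (1.0 - big_lam)))


if __name__ == "__main__":
    y = 0.01  # illustrative yield (assumed value)
    for g in (0.8, 0.4, 0.1):
        print(g,
              abundance_outflow(g, y, lam=0.0),   # Simple Model limit
              abundance_outflow(g, y, lam=2.0),   # with wind
              abundance_infall(g, y, big_lam=1.0))  # extreme infall
```

At fixed gas fraction, both the wind case (λ > 0) and the infall case give a lower abundance than the Simple Model, and the extreme infall solution saturates at Xi = yi as G tends to zero.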
In eq. (1.25) the contribution of Type Ia SNe is contained in the third term on the right hand side. The integral is taken over a range of masses from 3 to 16 M⊙, which represents the total masses of binary systems able to produce Type Ia SNe in the framework of the SDS. There is also an integration over the mass distribution of binary systems; in particular, one considers the function f(γ) where γ = M2/(M1 + M2), with M1 and M2 being the primary and secondary mass of the binary system, respectively (for more details see Matteucci & Greggio 1986 and Matteucci 2001). The functions A(t) and W(t) are the infall and wind rate, respectively. Finally, the quantity Qmi represents the stellar yields (both processed and unprocessed material).

1.2 Lecture II: the Milky Way and other spirals

The Milky Way galaxy has four main stellar populations: 1) the halo stars, with low metallicities (the most common metallicity indicator in stars is [Fe/H] = log(Fe/H)∗ − log(Fe/H)⊙) and eccentric orbits, 2) the bulge population, which has a large range of metallicities and is dominated by random motions, 3) the thin disk stars, with an average metallicity <[Fe/H]> = −0.5 dex and circular orbits, and finally 4) the thick disk stars, which possess chemical and kinematical properties intermediate between those of the halo and those of the thin disk. The halo stars have average metallicities of <[Fe/H]> = −1.5 dex and a maximum metallicity of ∼ −1.0 dex, although stars with [Fe/H] as high as −0.6 dex and halo kinematics are observed. The average metallicity of thin disk stars is ∼ −0.6 dex, whereas that of Bulge stars is ∼ −0.2 dex.

1.2.1 The Galactic formation timescales

The kinematical and chemical properties of the different Galactic stellar populations can be interpreted in terms of the Galaxy formation mechanism. Eggen et al.
(1962), in a cornerstone paper, suggested a rapid collapse for the formation of the Galaxy lasting ∼ 3 · 10⁸ years. This suggestion was based on a kinematical and chemical study of solar neighbourhood stars. Later on, Searle & Zinn (1979) proposed a central collapse like the one proposed by Eggen et al., but also that the outer halo formed by merging of large fragments taking place over a considerable timescale, > 1 Gyr. More recently, Berman & Suchov (1991) proposed the so-called hot Galaxy picture, with an initial strong burst of SF which inhibited further SF for a few Gyr while a strong Galactic wind was created.

From a historical point of view, the modelling of Galactic chemical evolution has passed through different phases, which I summarize in the following.

• SERIAL FORMATION
The Galaxy is modeled by means of one accretion episode lasting for the entire Galactic lifetime, where halo, thick disk and thin disk form in sequence as a continuous process. The obvious limit of this approach is that it does not allow us to predict the observed overlap in metallicity between halo and thick disk stars and between thick and thin disk stars, but it gives a fair representation of our Galaxy (e.g. Matteucci & François 1989).
• PARALLEL FORMATION
In this formulation, the various Galactic components start at the same time and from the same gas but evolve at different rates (e.g. Pardi et al. 1995). It predicts an overlap of stars belonging to the different components, but implies that the thick disk formed out of gas shed by the halo and that the thin disk formed out of gas shed by the thick disk. This is at variance with the distribution of the stellar angular momentum per unit mass (Wyse & Gilmore 1992), which indicates that the disk did not form out of gas shed by the halo.
• TWO-INFALL FORMATION
In this scenario, halo and disk formed out of two separate infall episodes (an overlap in metallicity is also predicted) (e.g. Chiappini et al.
1997; Chang et al. 1999). The first infall episode lasted no more than 1-2 Gyr whereas the second, during which the thin disk formed, lasted much longer, with a timescale for the formation of the solar vicinity of 6-8 Gyr (Chiappini et al. 1997; Boissier & Prantzos 1999).
• STOCHASTIC APPROACH
Here the hypothesis is that in the early halo phases ([Fe/H] < −3.0 dex) mixing was not efficient and, as a consequence, one should observe in low metallicity halo stars the effects of pollution from single SNe (e.g. Tsujimoto et al. 1999; Argast et al. 2000; Oey 2000). These models predict a large spread for [Fe/H] < −3.0 dex which is not observed, as shown by recent data with metallicities down to −4.0 dex (Cayrel et al. 2004; see later).

1.2.2 The two-infall model

The adopted SFR (see Figure 1.1) is eq. (1.6) with different SF efficiencies for the halo and disk, in particular νH = 2.0 Gyr⁻¹ and νD = 1.0 Gyr⁻¹, respectively. A threshold density (σth = 7 M⊙ pc⁻²) for the SFR is also assumed, in agreement with results from Kennicutt (1989; 1998). In Figure 1.2 we show the SN (II and Ia) rates predicted by the two-infall model. Note that the Type Ia SN rate is calculated according to the SDS (Greggio & Renzini, 1983; Matteucci & Recchi, 2001). There is a delay between the Type II SN rate and the Type Ia SN rate, and while the Type II SN rate strictly follows the SFR, the Type Ia SN rate is smoothly increasing.

Fig. 1.1. The predicted SFR in the solar vicinity with the two-infall model. Figure from Chiappini et al. (1997). The oscillating behaviour at late times is due to the assumed threshold density for SF. The threshold gas density is also responsible for the gap in the SFR seen at around 1 Gyr.

François et al. (2004) compared the predictions of the two-infall model for the abundance ratios versus metallicity relations ([X/Fe] vs. [Fe/H]) with the very recent and very accurate data of the project “First Stars” by Cayrel et al. (2004).
They adopted yields from the literature both for Type II and Type Ia SNe and noticed that, while for some elements (O, Fe, Si, Ca) the yields of Woosley & Weaver (1995) (hereafter WW95) reproduce the data fairly well, for the Fe-peak and heavier elements none of the available yields gives good agreement. Therefore, they varied the yields of these elements empirically in order to best fit the data. In Figures 1.3 and 1.4 we show the predictions for α-elements (O, Mg, Si, Ca, Ti, K) plus some Fe-peak elements and Zn. In Figure 1.4 we also show the ratios between the yields derived empirically by François et al. (2004) in order to obtain the excellent fits shown in the figures, and those of WW95 for massive stars. For some elements it was also necessary to change the yields from Type Ia SNe relative to the reference ones, which are those of Iwamoto et al. (1999) (hereafter I99).

Fig. 1.2. The predicted Type II and Ia SN rate in the solar vicinity with the two-infall model. Figure from Chiappini et al. (1997).

In Figure 1.5 we show the predictions of chemical evolution models for ¹²C and ¹⁴N compared with abundance data. The behaviour of C shows a roughly constant [C/Fe] as a function of [Fe/H], although C seems to slightly increase at very low metallicities, indicating that the bulk of these two elements comes from stars with the same lifetimes. The data in these figures, especially those for N, are old and do not contain very metal-poor stars. Newer data containing stars with [Fe/H] down to ∼ −4.0 dex (Spite et al. 2005; Israelian et al. 2004) indicate that the [N/Fe] ratio continues to be high also at low metallicities, indicating a primary origin for N produced in massive stars. We recall here that we define as primary a chemical element which is produced in the stars starting from H and He, whereas we define as secondary a chemical element which is formed from heavy elements already present in the star at its birth and not produced in situ.
The model predictions shown in Figure 1.5 for C and N assume that the bulk of these elements is produced by low and intermediate mass stars (yields from van den Hoeck and Groenewegen, 1997) and that N is produced as a partly secondary and partly primary element. The N production from massive stars has only a secondary origin (yields from WW95). In Figure 1.5 we also show a model prediction where N is considered as a primary element in massive stars with the yields artificially increased. Recently, Chiappini et al. (2006) have shown that primary N produced by very metal-poor, fast-rotating massive stars can well reproduce the observations.

Fig. 1.3. Predicted and observed [X/Fe] vs. [Fe/H] for several α- and Fe-peak elements plus Zn, compared with a compilation of data. In particular, the black dots are the recent high resolution data from Cayrel et al. (2004). For the other data see references in François et al. (2004). The solar value indicated in the upper right part of each figure represents the predicted solar value for the ratio [X/Fe]. The assumed solar abundances are those of Grevesse & Sauval (1998), except for oxygen, for which we take the value of Holweger (2001).

Fig. 1.4. Upper panel: predicted and observed [X/Fe] vs. [Fe/H] for several elements as in Figure 1.3. The bottom part of this Figure shows the ratios between the empirical yields and the yields by WW95 for massive stars. Such empirical yields have been suggested by François et al. (2004) in order to best fit all the [X/Fe] vs. [Fe/H] relations. The small panel at the bottom right also shows the ratios between the empirical yields for Type Ia SNe and the yields by I99.

Fig. 1.5. Upper panel: predicted and observed [C/Fe] vs. [Fe/H]. Models from Chiappini et al. (2003a). Lower panel: predicted and observed [N/Fe] vs. [Fe/H]. For references to the data see the original paper. The thin and thick continuous lines in both panels represent models with standard nucleosynthesis, as described in the text, whereas the dashed line represents the predictions of a model where N in massive stars has been considered as a primary element with “ad hoc” stellar yields.

In summary, the comparison between model predictions and abundance data indicates the following scenario for the formation of heavy elements:

• ¹²C and ¹⁴N are mainly produced in low and intermediate mass stars (0.8 ≤ M/M⊙ ≤ 8). The amounts of primary and secondary N are still uncertain, as is the fraction of C produced in massive stars. Primary N from massive stars seems to be required to reproduce the N abundance in low metallicity halo stars.
• α-elements originate in massive stars: the nucleosynthesis of O is rather well understood (there is agreement between different authors), and the yields from WW95 as functions of metallicity produce an excellent agreement with the observations for this particular element.
• Magnesium is generally underproduced by nucleosynthesis models. Taking the yields of WW95 as a reference, the Mg yields should be increased in stars with masses M ≤ 20M⊙ and decreased in stars with M > 20M⊙ to fit the data. Silicon should be slightly increased in stars with masses M > 40M⊙.
• Fe originates mostly in Type Ia SNe. The Fe yields in massive stars are still uncertain; the WW95 metallicity-dependent yields overestimate Fe in stars < 30M⊙. For this element, it is better to adopt the yields of WW95 for solar metallicity.
• Fe-peak elements: the yields of Cr and Mn should be increased in stars of 10-20 M⊙ relative to the yields of WW95, whereas the yield of Co should be increased in Type Ia SNe, relative to the yields of I99, and decreased in stars in the range 10-20 M⊙, relative to the yields of WW95. Finally, the yield of Ni should be decreased in Type Ia SNe.
• The yields of Cu and Zn from Type Ia SNe should be larger, relative to the standard yields, as already suggested by Matteucci et al. (1993).

1.2.3 Common Conclusions from MW Models

Most of the chemical evolution models for the Milky Way existing in the literature conclude that:

• The G-dwarf metallicity distribution can be reproduced only by assuming a slow formation of the local disk by infall. In particular, the time-scale for the formation of the local disk should be in the range τd ∼ 6-8 Gyr (Chiappini et al. 1997; Boissier and Prantzos 1999; Chang et al. 1999; Chiappini et al. 2001; Alibés et al. 2001).
• The relative abundance ratios [X/Fe] vs. [Fe/H], interpreted as a time-delay between Type Ia and II SNe, suggest a timescale for the halo-thick disk formation of τh ∼ 1.5-2.0 Gyr (Matteucci and Greggio 1986; Matteucci and François, 1989; Chiappini et al. 1997). The external halo and thick disk probably formed more slowly or have been accreted (Chiappini et al. 2001).
• To fit abundance gradients, SFR and gas distribution along the Galactic thin disk we must assume that the disk formed inside-out (Matteucci & François, 1989; Chiappini et al. 2001; Boissier & Prantzos 1999; Alibés et al. 2001). Radial flows can help in forming the gradients (Portinari & Chiosi 2000) but they are probably not the main cause for them.
A variable IMF along the Disk can in principle explain abundance gradients but it creates unrealistic situations: in fact, in order to reproduce the negative gradients one should assume that in the external and less metal-rich parts of the Disk low mass stars form preferentially (see Chiappini et al. 2000 for a discussion on this point).
• The SFR is a strongly varying function of the galactocentric distance (Matteucci & François 1989; Chiappini et al. 1997, 2001; Goswami & Prantzos 2000; Alibés et al. 2001).

1.2.4 Abundance Gradients from Emission Lines

There are two types of abundance determinations in HII regions: one is based on recombination lines, which should have a weak dependence on the temperature of the nebula (He, C, N, O), the other is based on collisionally excited lines, where a strong temperature dependence is intrinsic to the method (C, N, O, Ne, Si, S, Cl, Ar, Fe and Ni). This second method has predominated until now. A direct determination of the abundance gradients from HII regions in the Galaxy from optical lines is difficult because of extinction, so usually the abundances for distances larger than 3 kpc from the Sun are obtained from radio and infrared emission lines. Abundance gradients can also be derived from optical emission lines in Planetary Nebulae (PNe). However, the abundances of He, C and N in PNe give information only on the internal nucleosynthesis of the star. So, to derive gradients one should look at the abundances of O, S and Ne, which are unaffected by stellar processes. In Figure 1.6 we show theoretical predictions of abundance gradients along the disk of the Milky Way compared with data from HII regions and B stars. The adopted model is from Chiappini et al. (2001; 2003a) and is based on an inside-out formation of the thin disk with the inner regions forming faster than the outer ones, in particular τ(R) = 0.875R − 0.75 Gyr.
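The inside-out relation τ(R) = 0.875R − 0.75 Gyr is easy to tabulate. The short sketch below does so under two assumptions that are not stated explicitly in the text: that R is measured in kpc, and that the disk infall at each radius can be illustrated by a bare exponential exp(−t/τ(R)) normalized to unity at t = 0 (the amplitude and the time offset of the full infall law are ignored). Function names are hypothetical.

```python
import math


def disk_infall_timescale_gyr(radius_kpc):
    """Inside-out timescale tau(R) = 0.875 * R - 0.75 Gyr, R assumed in kpc."""
    tau = 0.875 * radius_kpc - 0.75
    if tau <= 0.0:
        raise ValueError("relation gives a non-positive timescale at this radius")
    return tau


def relative_infall_rate(t_gyr, radius_kpc):
    """Illustrative exp(-t / tau(R)), normalized to 1 at t = 0 (assumption)."""
    return math.exp(-t_gyr / disk_infall_timescale_gyr(radius_kpc))


if __name__ == "__main__":
    for radius in (4.0, 8.0, 12.0, 16.0):
        print(radius,
              disk_infall_timescale_gyr(radius),
              relative_infall_rate(5.0, radius))
```

Taking R = 8 kpc as representative of the solar vicinity (an assumed value), the relation gives τ = 6.25 Gyr, within the 6-8 Gyr range quoted earlier for the formation of the local disk, while the outer disk accretes on timescales longer than 10 Gyr.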
Note that to obtain a better fit for ¹²C, the yields of this element have been increased artificially relative to those of WW95. As already said, most of the models agree on the inside-out scenario for the Disk formation; however, not all models agree on the evolution of the gradients with time. In fact, some models predict a flattening with time (Boissier and Prantzos 1998; Alibés et al. 2001), whereas others, such as that of Chiappini et al. (2001), predict a steepening. The reason for the steepening is that the model of Chiappini et al. includes a threshold density for SF, which induces the SF to stop when the density decreases below the threshold. This effect is particularly strong in the external regions of the Disk, thus contributing to a slower evolution and therefore to a steepening of the gradients with time, as shown in Figure 1.6, bottom panel.

Fig. 1.6. Upper panel: abundance gradients along the Disk of the MW. The lines are the models from Chiappini et al. (2003a): these models differ by the nucleosynthesis prescriptions. In particular, the dash-dotted line represents a model with van den Hoeck & Groenewegen (1997, hereafter HG97) yields for low-intermediate mass stars with η (mass loss parameter) constant and Thielemann et al.’s (1996) yields for massive stars, the long-dashed thick line has HG97 yields with variable η and Thielemann et al. yields, and the long-dashed thin line has HG97 yields with variable η but WW95 yields for massive stars. It is interesting to note that in all of these models the yields of ¹²C in stars > 40M⊙ have been artificially increased by a factor of 3 relative to the yields of WW95. Lower panel: the temporal behaviour of abundance gradients along the Disk as predicted by the best model of Chiappini et al. (2001). The upper lines in each panel represent the present time gradient, whereas the lower ones represent the gradient a few Gyr ago. It is clear that the gradients tend to steepen in time, a still controversial result.

1.2.5 Abundance Gradients in External Galaxies

Abundance gradients expressed in dex/kpc are found to be steeper in smaller disks, but the correlation disappears if they are expressed in dex/Rd, which means that there is a universal slope per unit scale length (ref). The gradients are generally flatter in galaxies with central bars (ref). The SFR is measured mainly from Hα emission (Kennicutt, 1998) and shows a correlation with the total surface gas density (HI+H2); in particular, the suggested law is that of eq. (1.5). In the observed gas distributions, differences between field and cluster spirals are found, in the sense that cluster spirals have less gas, probably as a consequence of stronger interactions with the environment. Integrated colors of spiral galaxies (Josey & Arimoto 1992; Jimenez et al. 1998; Prantzos & Boissier 2000) indicate inside-out formation, as also found for the Milky Way. As an example of abundance gradients in a spiral galaxy we show in Figure 1.7 the observed and predicted gas distribution and abundance gradients for the disk of M101. In this case the gas distribution and the abundance gradients are reproduced with systematically smaller timescales for the disk formation relative to the MW (M101 formed faster), and the difference between the timescales of formation of the internal and external regions is smaller (τM101 = 0.75R − 0.5 Gyr, Chiappini et al. 2003a). To conclude this section we recall a paper by Boissier et al. (2001) where a detailed study of the properties of disks is presented. They conclude that more massive disks are redder, more metal-rich and more gas-poor than smaller ones.
On the other hand, their estimated SF efficiency (defined as the SFR per unit mass of gas) seems to be similar among different spirals: this leads them to conclude that more massive disks are older than less massive ones.

Fig. 1.7. Upper panel: predicted and observed gas distribution along the disk of M101. The observed HI, H$_2$ and total gas are indicated in the Figure. The large open circles indicate the models: in particular, the open circles connected by a continuous line refer to a model with a central surface mass density of $1000\,M_\odot\,{\rm pc}^{-2}$, while the dotted line refers to a model with $800\,M_\odot\,{\rm pc}^{-2}$ and the dashed line to a model with $600\,M_\odot\,{\rm pc}^{-2}$. Lower panel: predicted and observed abundance gradients of the C, N, O elements along the disk of M101. The models are the lines and differ in the threshold density for SF, which is larger in the dashed model. All the models are by Chiappini et al. (2003a).

1.2.6 How to model the Hubble Sequence

The Hubble Sequence can be simply thought of as a sequence of objects where the SFR proceeds faster in the early than in the late types (see also Sandage, 1986). We take the Milky Way galaxy, whose properties are best known, as a reference galaxy and we change the SFR relative to the Galactic one, for which we adopt eq. (1.6). The quantity ν in eq. (1.6) is the efficiency of SF, which we assume to be characteristic of each Hubble type. In the two-infall model for the Milky Way we adopt $\nu_{\rm halo} = 2.0$ Gyr$^{-1}$ and $\nu_{\rm disk} = 1.0$ Gyr$^{-1}$ (see Figure 2.1). We adopt a dependence on the total surface mass density for the Galactic disk because it helps in producing a SFR that varies strongly with galactocentric distance, as required by the observed SFR and gas density distribution as well as by the abundance gradients.
In fact, the inside-out scenario influences the rate at which the gas mass is accumulated by infall at each galactocentric distance, and this in turn influences the SFR.

For bulges and ellipticals we assume that the SF proceeds as in a burst with very high star formation efficiency, namely:

$$\mathrm{SFR} = \nu\,\sigma^{k} \qquad (1.26)$$

with $k = 1.0$ for the sake of simplicity and $\nu = 10$-$20$ Gyr$^{-1}$ (see Matteucci, 1994; Pipino & Matteucci 2004). For irregular galaxies, on the other hand, we assume that the SFR proceeds more slowly and less efficiently than in the Milky Way disk; in particular, we assume the same SF law as for spheroids but with $0.01 \leq \nu\,({\rm Gyr}^{-1}) \leq 0.1$.

Among irregular galaxies, a special position is taken by the Blue Compact Galaxies (BCG), namely galaxies which have blue colors because they are forming stars at the present time, and which have small masses, large amounts of gas and low metallicities. For these galaxies, we assume that they suffered on average from 1 to 7 short bursts, with the SF efficiency mentioned above (see Bradamante et al. 1998 and the next Lecture).

Finally, dwarf spheroidals are also a special category, characterized by old stars, no gas and low metallicities. For these galaxies we assume that they suffered one long starburst lasting 7-8 Gyr or at most a couple of extended SF periods, in agreement with their measured Color-Magnitude diagrams. It is worth noting that both ellipticals and dwarf spheroidals should lose most of their gas, and therefore one may conclude that galactic winds should play an important role in their evolution, although ram pressure stripping cannot be excluded as a mechanism for gas removal. Also for these galaxies we assume the previous SF law with $k = 1$ and $\nu = 0.01$-$1.0$ Gyr$^{-1}$. Lanfranchi & Matteucci (2003, 2004) developed more detailed models for dwarf spheroidals by adopting the SF history suggested by the Color-Magnitude diagrams of single galaxies and with the same efficiency of SF as above.

In Figure 2.8 we show the adopted SFRs in different galaxies and in Figure 2.9 the corresponding predicted Type Ia SN rates. For the irregular galaxy, the predicted Type Ia SN rate refers to a specific galaxy, the LMC, with a SFR taken from observations (see Calura et al. 2003), with an early and a late burst of SF and low SF in between.

Fig. 1.8. Predicted SFRs in galaxies of different morphological type. Figure from Calura (2004). Note that for the elliptical galaxy the SF stops abruptly as a consequence of the galactic wind.

Fig. 1.9. Predicted Type Ia SN rates for the SFRs of Figure 2.8. Figure from Calura (2004). Note that for the irregular galaxy the predictions are for the LMC, where a recent SF burst is assumed.

1.2.7 Type Ia SN rates in different galaxies

Following Matteucci & Recchi (2001), we define the typical timescale for Type Ia SN enrichment as the time when the SN rate reaches its maximum. In the following we will always adopt the SDS for the progenitors of Type Ia SNe. A point that is often misunderstood is that this timescale depends upon the progenitor lifetimes, the IMF and the SFR, and therefore is not universal. Sometimes in the literature the typical Type Ia SN timescale is quoted as being universal and equal to 1 Gyr, whereas this is just the timescale at which the Type Ia SNe start to be important in the process of Fe enrichment in the solar vicinity.

Matteucci & Recchi (2001) showed that for an elliptical galaxy or the bulge of a spiral with a high SFR the timescale for Type Ia SN enrichment is quite short, in particular $t_{\rm SNIa} = 0.3$-$0.5$ Gyr. For a spiral like the Milky Way, in the two-infall model, a first peak is reached at 1.0-1.5 Gyr (the time at which SNe Ia become important as Fe producers; Matteucci and Greggio 1986), while a second, less important peak occurs at $t_{\rm SNIa} = 4$-$5$ Gyr.
For an irregular galaxy with a continuous but very low SFR the timescale is $t_{\rm SNIa} > 5$ Gyr.

1.2.8 Time-delay model for different galaxies

As we have already seen, the time-delay between the production of oxygen by Type II SNe and that of Fe by Type Ia SNe allows us to explain the [X/Fe] vs. [Fe/H] relations in an elegant way. However, the [X/Fe] vs. [Fe/H] plots depend not only on nucleosynthesis and the IMF but also on other model assumptions, such as the SFR, through the absolute Fe abundance ([Fe/H]). Therefore, we should expect a different behaviour in galaxies with different SF histories. In Figure 2.10 we show the predictions of the time-delay model for a spheroid like the Bulge, for the solar vicinity and for a typical irregular magellanic galaxy.

Fig. 1.10. Predicted [α/Fe] ratios in galaxies with different SF histories. The top line represents the predictions for the Bulge or for an elliptical galaxy of the same mass ($\sim 10^{10}\,M_\odot$), the middle line represents the prediction for the solar vicinity and the lower line the prediction for an irregular magellanic galaxy. The differences among the various models are in the efficiency of star formation, which is quite high for spheroids ($\nu = 20$ Gyr$^{-1}$), moderate for the Milky Way ($\nu = 1$-$2$ Gyr$^{-1}$) and low for irregular galaxies ($\nu = 0.1$ Gyr$^{-1}$). The nucleosynthesis prescriptions are the same in all objects. The time-delay between the production of α-elements and Fe, coupled with the different SF histories, produces the differences in the plots. Data for Damped-Lyman-α systems (DLA; Vladilo 2002), the LMC (Hill et al. 2000) and the Bulge are shown for comparison.

As one can see in this Figure, we predict a long plateau, well above the solar value, for the [α/Fe] ratios in the Bulge (and ellipticals), owing to the fast Fe enrichment reached in these systems by means of Type II SNe: when the Type Ia SNe start enriching the ISM substantially, at 0.3-0.5 Gyr, the gas Fe abundance is already solar.
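To make the role of the SF efficiency concrete, the following minimal sketch assumes a closed box with the linear law SFR = νσ (eq. 1.26 with k = 1) and ignores infall, winds and the gas returned by dying stars, so that the gas decays as exp(−νt) with an e-folding time 1/ν. The efficiencies are those quoted in the caption of Fig. 1.10 (taking ν = 1 Gyr^-1 for the Milky Way, which is an assumption within the quoted 1-2 Gyr^-1 range); the function names are purely illustrative and are not part of the models discussed here.

```python
import math

# Star formation efficiencies in Gyr^-1, as quoted in the caption of Fig. 1.10.
# The Milky Way value (1-2 Gyr^-1 in the caption) is set to 1.0 as an assumption.
EFFICIENCY_PER_GYR = {"spheroid": 20.0, "Milky Way": 1.0, "irregular": 0.1}


def efolding_time_gyr(nu):
    """Gas consumption timescale 1/nu for the linear law SFR = nu * sigma."""
    return 1.0 / nu


def gas_fraction_left(nu, t_gyr):
    """Closed-box toy model: d(sigma)/dt = -nu * sigma, so sigma/sigma_0 = exp(-nu * t)."""
    return math.exp(-nu * t_gyr)


if __name__ == "__main__":
    t_snia = 0.4  # Gyr, inside the 0.3-0.5 Gyr range quoted for spheroids
    for name, nu in EFFICIENCY_PER_GYR.items():
        print(f"{name:10s} 1/nu = {efolding_time_gyr(nu):6.2f} Gyr, "
              f"gas left after {t_snia} Gyr = {gas_fraction_left(nu, t_snia):.3f}")
```

In this toy picture a spheroid with ν = 20 Gyr^-1 has an e-folding time of only 0.05 Gyr, so most of its gas is processed before Type Ia SNe become important.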
The opposite occurs in Irregulars, where the Fe enrichment proceeds very slowly, so that when Type Ia SNe start restoring Fe in a substantial way (> 3 Gyr) the Fe in the gas is still well below solar. Therefore, here we observe a steeper slope for the [α/Fe] ratio. In other words, we have below solar [α/Fe] ratios at below solar [Fe/H] ratios. This diagram is very important since it allows us to recognize a galaxy type only by means of its abundances, and therefore it can be used to understand the nature of high redshift objects.

1.3 Lecture III: interpretation of abundances in dwarf irregulars

Dwarf irregular galaxies are rather simple objects with low metallicity and large gas content, suggesting that they are either young or have undergone discontinuous star formation activity (bursts) or a continuous but inefficient star formation. They are very interesting objects for studying galaxy evolution. In fact, in "bottom-up" cosmological scenarios they should be the first self-gravitating systems to form, and they could also be important contributors to the population of systems giving rise to QSO absorption lines at high redshift (see Matteucci et al. 1997 and Calura et al. 2002).

1.3.1 Properties of Dwarf Irregular Galaxies

Among local star forming galaxies, sometimes referred to as HII galaxies, most are dwarfs. Dwarf irregular galaxies can be divided into two categories: Dwarf Irregular (DIG) and Blue Compact galaxies (BCG). The latter have very blue colors due to active star formation at the present time.

Chemical abundances in these galaxies are derived from optical emission lines in HII regions. Both DIG and BCG show a distinctive spread in their chemical properties, although this spread is decreasing with the new, more accurate data, but also a definite mass-metallicity relation.
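As a rough illustration of what such a mass-metallicity relation looks like, the sketch below evaluates the quadratic fit of Tremonti et al. (2004) that is quoted as eq. (1.28) later in this Lecture. The coefficients are those of eq. (1.28); the function names and the sample masses are illustrative assumptions, and the fit is only meant to be used within the mass range of the galaxies it was derived from.

```python
def oxygen_abundance(log_mstar):
    """12 + log(O/H) from the Tremonti et al. (2004) fit, eq. (1.28).

    log_mstar is log10 of the stellar mass in solar masses.
    """
    return -1.492 + 1.847 * log_mstar - 0.08026 * log_mstar ** 2


def local_slope(log_mstar):
    """Derivative of the fit with respect to log_mstar (dex per dex)."""
    return 1.847 - 2.0 * 0.08026 * log_mstar


if __name__ == "__main__":
    for log_m in (8.5, 9.5, 10.5, 11.0):
        print(f"log M* = {log_m:4.1f}: 12+log(O/H) = {oxygen_abundance(log_m):.2f}, "
              f"slope = {local_slope(log_m):.2f}")
```

The slope of the fit drops from about 0.48 dex per dex at log M* = 8.5 to about 0.16 at log M* = 10.5, which is the flattening at high mass noted by Tremonti et al. (2004).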
From the point of view of chemical evolution, Matteucci and Chiosi (1983) first studied the evolution of DIG and BCG by means of analytical chemical evolution models including either outflow or infall. They concluded that closed-box models cannot account for the Z-log G ($G = M_{gas}/M_{tot}$) distribution, even if the number of bursts varies from galaxy to galaxy, and suggested possible solutions to explain the observed spread. In other words, the data show a range of values of the metallicity for a given G ratio, and this means that the effective yield is lower than that of the Simple Model and varies from galaxy to galaxy. The possible solutions suggested to lower the effective yield were:

• a. different IMFs
• b. different amounts of galactic wind
• c. different amounts of infall

In Figure 3.1 we show graphically the solutions a), b) and c). Concerning solution a), one simply varies the IMF, whereas solutions b) and c) have already been described (eqs. 1.21 and 1.23).

Later on, Pilyugin (1993) put forward the idea that the spread observed also in other chemical properties of these galaxies, such as in the He/H vs. O/H and N/O vs. O/H relations, can be due to self-pollution of the HII regions, which do not mix efficiently with the surrounding medium, coupled with "enriched" or "differential" galactic winds, namely winds in which different chemical elements are lost at different rates. Other models (Marconi et al. 1994; Bradamante et al. 1998) followed the suggestion of differential winds and introduced the novelty of the contribution to the chemical enrichment and energetics of the ISM by SNe of different types (II, Ia and Ib).

Another important feature of these galaxies is the mass-metallicity relation. The existence of a luminosity-metallicity relation in irregulars and BCG was suggested first by Lequeux et al. (1979), then confirmed by Skillman et al. (1989) and extended also to spirals by Garnett & Shields (1987). In particular, Lequeux et al.
suggested the relation:

$$M_T = (8.5 \pm 0.4) + (190 \pm 60)\,Z \qquad (1.27)$$

with Z being the global metal content. Recently, Tremonti et al. (2004) analyzed 53000 local star-forming galaxies in the SDSS (irregulars and spirals). Metallicity was measured from the optical nebular emission lines. Masses were derived from fitting spectral energy distribution (SED) models. The strong optical nebular lines of elements other than H are produced by collisionally excited transitions. Metallicity was then determined by fitting simultaneously the most prominent emission lines ([OIII], Hβ, [OII], Hα, [NII], [SII]). Tremonti et al. (2004) derived a relation indicating that 12 + log(O/H) increases steeply for $M_*$ going from $10^{8.5}$ to $10^{10.5}\,M_\odot$ but flattens for $M_* > 10^{10.5}\,M_\odot$. In particular, the Tremonti et al. relation is:

$$12 + \log({\rm O/H}) = -1.492 + 1.847\,(\log M_*) - 0.08026\,(\log M_*)^2. \qquad (1.28)$$

This relation extends to higher masses the mass-metallicity relation found for star forming dwarfs and contains very important information on the physics governing galactic evolution.

Fig. 1.11. The Z-log G diagram. Solutions a), b) and c), from top to bottom, to lower the effective yield in DIG and BCG by Matteucci & Chiosi (1983). Solution a) consists in varying the yield per stellar generation, here indicated by $p_Z$, just by changing the IMF. Solutions b) and c) correspond to eqs. (1.21) and (1.23), respectively.

Fig. 1.12. Figure 3 from Erb et al. (2006) showing the mass-metallicity relation for star forming galaxies at high redshift. The data from Tremonti et al. (2004) are also shown.

Even more recently, Erb et al. (2006) found the same mass-metallicity relation for star-forming galaxies at redshift z > 2, with an offset from the local relation of ∼ 0.3 dex. They used Hα and [NII] spectra. In Figure 3.2 we show the figure from Erb et al.
(2006) for the mass-metallicity relation at high redshift, which includes the relation of Tremonti et al. (2004) for the local mass-metallicity relation.

The simplest interpretation of the mass-metallicity relation is that the effective yield increases with galactic mass. This can be achieved in several ways, as shown in Fig. 3.1: either by changing the IMF or the stellar yields as a function of galactic mass, or by assuming that the galactic wind is less efficient in more massive systems, or that the infall rate is less efficient in more massive systems. One of the most common interpretations of the mass-metallicity relation is that the effective yield changes because of the occurrence of galactic winds, which should be more important in small systems. Evidence for galactic winds exists for dwarf irregular galaxies, as we will see next.

1.3.2 Galactic Winds

Papaderos et al. (1994) estimated a galactic wind flowing at a velocity of 1320 km/s for the irregular dwarf VIIZw403. The escape velocity estimated for this galaxy is ≃ 50 km/s. Lequeux et al. (1995) suggested a galactic wind in Haro 2 = Mkn 33 flowing at a velocity of ≃ 200 km/s, also larger than the escape velocity of this object. More recently, Martin (1996; 1998) also found supershells in 12 dwarfs, including IZw18, which imply gas outflow. Martin (1999) concluded that the galactic wind rates are several times the SFR. Finally, the presence of metals in the ICM (revealed by X-ray observations) and in the IGM (Ellison et al. 2000) is a clear indication that galaxies lose their metals. However, we cannot exclude that the gas with metals is also lost by ram pressure stripping, especially in galaxy clusters.

In models of chemical evolution of dwarf irregulars (e.g. Bradamante et al.
1998) the feedback effects are taken into account and the condition for the development of a wind is:

$$(E_{th})_{ISM} \geq E_{Bgas} \qquad (1.29)$$

namely, that the thermal energy of the gas is larger than or equal to its binding energy. The thermal energy of the gas due to SN and stellar wind heating is:

$$(E_{th})_{ISM} = E_{th_{SN}} + E_{th_{w}} \qquad (1.30)$$

with the contribution of SNe being:

$$E_{th_{SN}} = \int_0^t \epsilon_{SN}\,R_{SN}(t')\,dt', \qquad (1.31)$$

while the contribution of stellar winds is:

$$E_{th_{w}} = \int_0^t\!\int^{100} \varphi(m)\,\psi(t')\,\epsilon_w\,dm\,dt' \qquad (1.32)$$

with $\epsilon_{SN} = \eta_{SN}\,\epsilon_o$ and $\epsilon_o = 10^{51}$ erg (typical SN energy), and $\epsilon_w = \eta_w E_w$ with $E_w = 10^{49}$ erg (typical energy injected by a $20\,M_\odot$ star taken as representative). $\eta_w$ and $\eta_{SN}$ are two free parameters and indicate the efficiency of energy transfer from stellar winds and SNe into the ISM, respectively, quantities that are still largely unknown. The total mass of the galaxy is expressed as $M_{tot}(t) = M_*(t) + M_{gas}(t) + M_{dark}(t)$, with $M_L(t) = M_*(t) + M_{gas}(t)$, and the binding energy of the gas is:

$$E_{Bgas}(t) = W_L(t) + W_{LD}(t) \qquad (1.33)$$

with:

$$W_L(t) = -0.5\,G\,\frac{M_{gas}(t)\,M_L(t)}{r_L} \qquad (1.34)$$

which is the potential well due to the luminous matter, and with:

$$W_{LD}(t) = -G\,w_{LD}\,\frac{M_{gas}(t)\,M_{dark}}{r_L} \qquad (1.35)$$

which represents the potential well due to the interaction between dark and luminous matter, where $w_{LD} \sim S(1 + 1.37\,S)$, with $S = r_L/r_D$ being the ratio between the galaxy effective radius and the radius of the dark matter core.

The typical model for a BCG has a luminous mass of $10^{8}$-$10^{9}\,M_\odot$, a dark matter halo ten times larger than the luminous mass and various values for the parameter S. The galactic wind in these galaxies develops easily, but it carries out mainly metals, so that the total mass lost in the wind is small.

1.3.3 Results on DIG and BCG from purely chemical models

Purely chemical models (Bradamante et al. 1998, Marconi et al.
1994) for DIG and BCG have been computed in recent years by varying the number of bursts, the time of occurrence of the bursts $t_{burst}$, the star formation efficiency, the type of galactic wind (differential or normal), the IMF and the nucleosynthesis prescriptions. The best model of Bradamante et al. (1998) suggests that the number of bursts should be $N_{bursts} \leq 10$ and that the SF efficiency should vary from 0.1 to 0.7 Gyr$^{-1}$ for either a Salpeter or a Scalo (1986) IMF (the Salpeter IMF is favored). Metal enriched winds are favored. The results of these models also suggest that SNe of Type II dominate the chemical evolution and energetics of these galaxies, whereas stellar winds are negligible. The predicted [O/Fe] ratios tend to be overabundant relative to the solar ratios, owing to the predominance of Type II SNe during the bursts, in agreement with observational data (see Figure 3.5, upper panel). Models with strong differential winds and $N_{burst} = 10$-$15$ can, however, give rise to negative [O/Fe] ratios. The main difference between DIGs and BCGs, in these models, is that the BCGs suffer a present time burst, whereas the DIGs are in a quiescent phase. In Figure 3.3 we show some of the results of Bradamante et al. (1998) compared with data on BCGs: it is evident from the Figure that the spread in the chemical properties can be simply reproduced by different SF efficiencies, which translate into different wind efficiencies.

In Fig. 3.4 we show the results of the chemical evolution models of Henry et al. (2000). These models take into account exponential infall but not outflow. They suggested that the SF efficiency in extragalactic HII regions must have been low and that this effect, coupled with the primary N production from intermediate mass stars, can explain the plateau in log(N/O) observed at low 12 + log(O/H). Henry et al.
(2000) also concluded that $^{12}$C is mainly produced in massive stars (yields by Maeder 1992), whereas $^{14}$N is mainly produced in intermediate mass stars (yields by HG97). This conclusion, however, should also be tested on the abundances of stars in the Milky Way, where the flat behaviour of [C/Fe] vs. [Fe/H] from [Fe/H] = -2.2 up to [Fe/H] = 0 suggests a similar origin for the two elements, namely partly from massive stars and mainly from low and intermediate mass ones (Chiappini et al. 2003b).

Concerning the [O/Fe] ratios, we show results from Thuan et al. (1995) in Figure 3.5, where it is evident that BCGs generally have overabundant [O/Fe] ratios. Very recently, an extensive SDSS study of chemical abundances from emission lines in a sample of 310 metal poor emission line galaxies appeared (Izotov et al. 2006). The global metallicity in these galaxies ranges from 12 + log(O/H) ∼ 7.1 ($Z_\odot/30$) to ∼ 8.5 ($0.7\,Z_\odot$). The SDSS sample is merged with 109 BCGs containing extremely low metallicity objects. These data, shown in Figure 3.5, lower panel, substantially confirm previous ones, showing that the α-elements do not depend on the O abundance, which suggests a common origin for these elements in stars with $M > 10\,M_\odot$, except for a slight increase of Ne/O with metallicity, which is interpreted as due to a moderate dust depletion of O in metal rich galaxies. An important finding is that all the studied galaxies are found to have log(N/O) > −1.6, which indicates that none of these galaxies is a truly young object, unlike the DLA systems at high redshift, which show log(N/O) ∼ −2.3.

Fig. 1.13. Upper panel: predicted log(N/O) vs. 12 + log(O/H) for a model with 3 bursts of SF separated by quiescent periods and different SF efficiencies, here indicated with γ = ν. Lower panel: predicted log(C/O) vs. 12 + log(O/H). The data in both panels are from Kobulnicky and Skillman (1996). The models assume a dark matter halo ten times larger than the luminous mass and S = 0.3 (Bradamante et al.
1998, see text).

Fig. 1.14. Figure from Henry et al. (2000): a comparison between numerical models and data for extragalactic HII regions and stars (filled circles, filled boxes and filled diamonds); M and S mark the position of the Galactic HII regions and the Sun, respectively. Their best model is model B, with an efficiency of SF of ν = 0.03.

Fig. 1.15. Upper panel: [O/Fe] vs. [Fe/H] observed in a sample of BCGs by Thuan et al. (1995) (filled circles); open triangles and asterisks are disk and halo stars shown for comparison. Figure from Thuan et al. (1995). Lower panel: new data from Izotov et al. (2006). The large filled circles represent the BCGs whereas the dots are the SDSS galaxies. Abundances in the left panel are calculated as in Thuan et al. (1995), whereas those in the right panel are calculated as in Izotov et al. (2006) (see original papers for details). Figure from Izotov et al. (2006).

1.3.4 Results from Chemo-Dynamical models: IZw18

IZw18 is the most metal poor local galaxy, thus resembling a primordial object. It probably did not experience more than two bursts of star formation, including the present one. The age of the oldest stars in this galaxy is still uncertain, although recently Tosi et al. (2006) suggested an age possibly > 2 Gyr. The oxygen abundance in IZw18 is 12 + log(O/H) = 7.17-7.26, ∼ 15-20 times lower than the solar oxygen abundance (12 + log(O/H) = 8.39, Asplund et al. 2005), and log(N/O) = -1.54 to -1.60 (Garnett et al. 1997).

Recently, FUSE also provided abundances for the HI in IZw18: the evidence is that the abundances in the HI are lower than in the HII (Aloisi et al. 2003; Lecavelier des Etangs et al. 2003). In particular, Aloisi et al. (2003) found the largest difference relative to the HII data.

Chemo-dynamical (2-D) models (Recchi et al. 2001) first studied the case of IZw18 with only one burst at the present time and concluded that the starburst triggers a galactic outflow. In particular, the metals leave the galaxy more easily than the unprocessed gas, and among the enriched material the SN Ia ejecta leave the galaxy more easily than other ejecta. In fact, Recchi et al. (2001) had reasonably assumed that Type Ia SNe can transfer almost all of their energy to the gas, since they explode in an already hot and rarefied medium after the SN II explosions. As a consequence of this, they predicted that the [α/Fe] ratios in the gas inside the galaxy should be larger than the [α/Fe] ratios in the gas outside the galaxy. At variance with previous studies, they found that most of the metals are already in the cold gas phase after 8-10 Myr, since the superbubble does not break out immediately and thermal conduction can act efficiently.

Subsequently, Recchi et al. (2004) extended the model to a two-burst case, again with the aim of reproducing the characteristics of IZw18. The model reproduces the chemical properties of IZw18 well with a relatively long episode of SF lasting 270 Myr plus a recent burst of SF that is still going on. In Figure 3.6 we show the predictions of Recchi et al. (2004) for the abundances in the HII regions of IZw18 and in Figure 3.7 those for the HI region, showing little difference between the HII and HI abundances, more in agreement with the data of Lecavelier des Etangs et al. (2004).

Fig. 1.16. Figure from Recchi et al. (2004): predicted abundances for the HII region in IZw18. The dashed lines represent a model adopting the yields of Meynet & Maeder (2002) for $Z = 10^{-5}$, whereas the continuous line refers to a higher metallicity (Z = 0.004). Observational data are represented by the shaded areas.

Fig. 1.17. Figure from Recchi et al. (2004): predicted abundances for the HI region. The models are the same as in Figure 3.6. Observational data are represented by the shaded areas.
The upper shaded area in the panel for oxygen and the lower shaded area in the panel for N/O represent the data of Lecavelier des Etangs et al. (2003).

1.4 Lecture IV: Elliptical galaxies - Quasars - ICM Enrichment

1.4.1 Ellipticals

We recall here some of the most important properties of ellipticals or early type galaxies (ETG), which are systems made of old stars with no gas and no ongoing SF. The metallicity of ellipticals is measured only by means of metallicity indices obtained from their integrated spectra, which are very similar to those of K giants. In order to pass from metallicity indices to [Fe/H] one then needs to adopt a suitable calibration, often based on population synthesis models (Worthey, 1994). We also summarize the most common scenarios for the formation of ellipticals.

1.4.2 Chemical Properties

The main properties of the stellar populations in ellipticals are:

• There exist the well-known Color-Magnitude and Color-$\sigma_o$ (velocity dispersion) relations, indicating that the integrated colors become redder with increasing luminosity and mass (Faber 1977; Bower et al. 1992). These relations are interpreted as a metallicity effect, although a well known degeneracy exists between metallicity and age of the stellar populations in the integrated colors (Worthey 1994).
• The index Mg$_2$ is normally used as a metallicity indicator since it does not depend much upon the age of the stellar populations. There exists for ellipticals a well defined Mg$_2$-$\sigma_o$ relation, equivalent to the already discussed mass-metallicity relation for star forming galaxies (Bender et al. 1993; Bernardi et al. 1998; Colless et al. 1999).
• Abundance gradients in the stellar populations inside ellipticals are found (Carollo et al. 1993; Davies et al. 1993).
Kobayashi & Arimoto (1999) derived the average gradient for ETGs from a large compilation of data: $\Delta[{\rm Fe/H}]/\Delta r \sim -0.3$, with the average metallicity in ETGs of $\langle[{\rm Fe/H}]\rangle_* \sim -0.3$ dex (from -0.8 to +0.3 dex).
• A very important characteristic of ellipticals is that their central dominant stellar population (dominant in the visual light) shows an overabundance, relative to the Sun, of the Mg/Fe ratio, $\langle[{\rm Mg/Fe}]\rangle_* > 0$ (from 0.05 to +0.3 dex) (Peletier 1989; Worthey et al. 1992; Weiss et al. 1995; Kuntschner et al. 2001).
• In addition, the overabundance increases with increasing galactic mass and luminosity, $\langle[{\rm Mg/Fe}]\rangle_*$ vs. $\sigma_o$ (Worthey et al. 1992; Matteucci 1994; Jorgensen 1999; Kuntschner et al. 2001).

Fig. 1.18. The relation [α/Fe] vs. velocity dispersion (mass) for ETGs. Figure adapted from Thomas et al. (2002). The continuous line represents the prediction of the model by Pipino & Matteucci (2004). The shaded area represents the prediction of hierarchical models for the formation of ellipticals. The symbols are the observational data.

1.4.3 Scenarios for galaxy formation

The most common ideas on the formation and evolution of ellipticals can be summarized as:

• They formed by an early monolithic collapse of a gas cloud or early merging of lumps of gas where dissipation plays a fundamental role (Larson 1974; Arimoto & Yoshii 1987; Matteucci & Tornambè 1987). In this model SF proceeds very intensively until a galactic wind develops, and SF stops after that. The galactic wind removes all the residual gas from the galaxy.
• They formed by means of intense bursts of star formation in merging subsystems made of gas (Tinsley & Larson 1979). In this picture SF stops after the last burst and gas is lost via ram pressure stripping or a galactic wind.
• They formed by early merging of lumps containing gas and stars in which some dissipation is present (Bender et al. 1993).
• They formed and continue to form in a wide redshift range and preferentially at late epochs by merging of early formed stellar systems (e.g. Kauffmann et al. 1993; 1996).

Pipino & Matteucci (2004), by means of recent revised monolithic models taking into account the development of a galactic wind (see Lecture III), computed the relation [Mg/Fe] versus mass (velocity dispersion) and compared it with the data by Thomas et al. (2002). Thomas (1990) already showed that hierarchical semi-analytical models cannot reproduce the observed [Mg/Fe] vs. mass trend, since in this scenario massive ellipticals have longer periods of star formation than smaller ones. In Figure 4.1, the original figure from Thomas et al. (2002) is shown, where we have also plotted our predictions. In the Pipino & Matteucci (2004) model it is assumed that the most massive galaxies assemble faster and form stars faster than less massive ones. The adopted IMF is the Salpeter one. In other words, more massive ellipticals seem to be older than less massive ones, in agreement with what is found for spirals (Boissier et al. 2001). In particular, in order to explain the observed $\langle[{\rm Mg/Fe}]\rangle_* > 0$ in giant ellipticals, the dominant stellar population should have formed on a time scale no longer than $3$-$5 \times 10^{8}$ yr (Weiss et al. 1995; Pipino & Matteucci 2004).

1.4.4 Ellipticals-Quasars connection

We now know that most if not all massive ETGs host an AGN for some time during their life. Therefore, there is a close link between the quasar activity and the evolution of ellipticals.

1.4.5 The chemical evolution of QSOs

It is very interesting to study the chemical evolution of QSOs by means of the broad emission lines in the QSO region. The first studies by Wills et al. (1985) and Collin-Souffrin et al.
(1986) found that the abundance of Fe in QSOs, as measured from broad emission lines, was about a factor of 10 higher than the solar one, and this represented a challenge for chemical evolution modelers. Hamann & Ferland (1992) derived the N/C abundance ratios from N V/C IV line ratios in QSOs and inferred the QSO metallicities. They suggested that N is overabundant by factors of 2-9 in the high redshift sources (z > 2). Metallicities 3-14 times the solar one were also suggested in order to produce such a high N abundance, under the assumption of a mainly secondary N. To interpret their data they built a chemical evolution model, a Milky Way-like model, and suggested that these high metallicities are reached in only 0.5 Gyr, implying that QSOs are associated with vigorous star formation.

At the same time, Padovani & Matteucci (1993) and Matteucci & Padovani (1993) proposed a model for QSOs in which QSOs are hosted by massive ellipticals. They assumed that after the occurrence of a galactic wind the galaxy evolves passively and that for masses $> 10^{11}\,M_\odot$ the gas restored by the dying stars is not lost but feeds the central black hole. They showed that in this context the stellar mass loss rate can explain the observed AGN luminosities. They also found that solar abundances in the gas are reached in no more than $10^{8}$ years, explaining in a natural way the standard emission lines observed in high-z QSOs. The predicted abundances could explain the data available at that time and solve the problem of the quasi-similarity of QSO spectra at different redshifts. Finally, they also suggested a criterion for establishing the ages of QSOs on the basis of the [α/Fe] ratios observed from broad emission lines (see also Hamann & Ferland 1993). Much more recently, Maiolino et al.
(2005, 2006) used more than 5000 QSO spectra from SDSS data to investigate the metallicity of the broad emission line region in the redshift range 2 < z < 4.5 and over the luminosity range −29.5 < M_B < −24.5. They found substantial chemical enrichment in QSOs already at z = 6. Models for ellipticals by Pipino & Matteucci (2004) were used as a comparison and reproduce the data well, as one can see in Figure 4.2. In this figure the evolution of the abundances of several chemical elements in the gas of a typical elliptical is shown. The elliptical suffers a galactic wind at around 0.4 Gyr after the beginning of star formation. This wind empties the galaxy of all the gas present at that time. After this time, the SF stops and the galaxy evolves passively. All the gas restored after the galactic wind event by dying stars can in principle feed the central black hole; thus the abundances shown in Figure 4.2 after the time of the wind can be compared with the abundances measured in the broad emission line region. As one can see, the predicted Fe abundance after the galactic wind is always higher than the O one, owing to the Type Ia SNe, which continue to produce Fe even after the SF has stopped. On the other hand, O and α-elements cease to be produced when the SF halts. The predicted abundances and those derived from the QSO spectra are in very good agreement, and the comparison indicates ages for these objects between 0.5 and 1 Gyr. Finally, in the context of the joint formation of QSOs and ellipticals we recall the work of Granato et al. (2001), who include the energy feedback from the central AGN in ellipticals. This feedback produces outflows and stops the SF in a down-sizing fashion, in agreement with the chemical properties of ETGs indicating a shorter period of SF for the more massive objects.

Fig. 1.19.
The temporal evolution of the abundances of several chemical elements in the gas of an elliptical galaxy with luminous mass of 10^11 M⊙. Feedback effects are taken into account in the model (Pipino & Matteucci 2004), as described in Lecture III. The down arrow indicates the time of occurrence of the galactic wind. After this time, the SF stops and the elliptical evolves passively. All the abundances after the time of the wind are those that we observe in the broad emission line region. The shaded area indicates the abundance sets which best fit the line ratios observed in the QSO spectra. Figure from Maiolino et al. 2006.

1.4.6 The chemical enrichment of the ICM

The X-ray emission from galaxy clusters is generally interpreted as thermal bremsstrahlung in a hot gas (10^7-10^8 K). There are several emission lines (O, Mg, Si, S), including the strong Fe K-line at around 7 keV, which was discovered by Mitchell et al. (1976). Iron is the best studied element in clusters. For kT ≥ 3 keV the intracluster medium (ICM) Fe abundance is constant and ∼ 0.3 Fe⊙ in the central cluster regions; the existence of metallicity gradients seems evident only in some clusters (see Renzini 2004). At lower temperatures, the situation is not so simple and the Fe abundance seems to increase. The first works on chemical enrichment of the ICM even preceded the discovery of the Fe line (Gunn & Gott 1972, Larson & Dinerstein 1975). In the following years other works appeared, such as those of Vigroux (1977), Himmes & Biermann (1988) and Matteucci & Vettolani (1988). In particular, Matteucci & Vettolani (1988) started a more detailed approach to the problem, followed by David et al. (1991), Arnaud (1992), Renzini et al. (1993), Elbaz et al. (1995), Matteucci & Gibson (1995), Gibson & Matteucci (1997), Loewenstein & Mushotzky (1996), Martinelli et al. (2000), Chiosi (2000), Moretti et al. (2003).
The majority of these papers assumed that galactic winds (mainly from ellipticals and S0 galaxies) are responsible for the ICM chemical enrichment. In fact, ETGs are the dominant type of galaxy in clusters and Arnaud (1992) found a clear correlation between the mass of Fe in clusters and the total luminosity of ellipticals. No such correlation was found for spirals in clusters. Alternatively, the abundances in the ICM could be due to ram pressure stripping (Himmes & Biermann 1988) or derive from chemical enrichment by pre-galactic Pop III stars (White & Rees 1978).

In Matteucci & Vettolani (1988) the Fe abundance in the ICM relative to the Sun, X_Fe/X_Fe⊙, was calculated as (M_Fe)_pred/(M_gas)_obs, to be compared with the observed ratio (X_Fe/X_Fe⊙)_obs = 0.3-0.5 (Rothenflug & Arnaud 1985). They found good agreement with the observed Fe abundance in clusters if all the Fe produced by ellipticals and S0s, after SF has stopped, is eventually restored into the ICM and if the majority of gas in clusters has a primordial origin. Low values for [Mg/Fe] and [Si/Fe] were predicted at the present time, due to the short period of SF in ETGs and to the Fe produced by Type Ia SNe. With a Salpeter IMF they found that Type Ia SNe contribute ≥ 50% of the total Fe in clusters. This leads to a bimodality in the [α/Fe] ratios in the stars and in the gas in the ICM, since the stars have overabundances of [α/Fe] > 0 whereas the ICM should have [α/Fe] ≤ 0. The same conclusion was reached, and further emphasized, later by Renzini et al. (1993). More recently, Pipino et al. (2002) computed the chemical enrichment of the ICM as a function of redshift by considering the evolution of the cluster luminosity function and an updated treatment of the SN feedback. They adopted Woosley & Weaver (1995) yields for Type II SNe, the Nomoto et al. (1997) W7 model for Type Ia SNe and a Salpeter IMF. They also predicted solar or undersolar [α/Fe] ratios in the ICM.
Fig. 1.20. Observed Fe abundance and predicted Fe abundance in the ICM as a function of redshift: data from Tozzi et al. (2003), model (continuous line) from Pipino et al. (2002), where the formation of ETGs was assumed to occur at z=8.

The observational data on abundance ratios in clusters are still uncertain and vary from cluster centers, where they tend to be solar or undersolar, to the outer regions, where they tend to be oversolar (e.g. Tamura et al. 2004). So, no firm conclusions can be drawn on this point. Concerning the evolution of the Fe abundance in the ICM as a function of redshift, most of the above mentioned models predict very little or no evolution of the Fe abundance from z=1 to z=0 (Pipino et al. 2002). This prediction seemed to be in good agreement with data from Tozzi et al. (2003), as shown in Fig. 1.20. However, more recently, more Fe abundance data for high redshift clusters have appeared, showing a different behaviour. In Figure 4.4 we show the data of Balestra et al. (2006), who claim an increase, by at least a factor of two, of the Fe abundance in the ICM from z=1 to z=0. Clearly, if we assume that only ellipticals have contributed to the Fe abundance in the ICM, this effect is difficult to explain unless we assume recent star formation in ellipticals. Another possible explanation could be that spiral galaxies contribute to Fe when they become S0 as a consequence of ram pressure stripping, and this morphological transformation might have started just at z=1.

Fig. 1.21. New data (always relative to Fe) from Balestra et al. (2006) showing an increase of the Fe abundance in the ICM from z=1 to z=0. Error bars refer to the 1σ confidence level. The big shaded area represents the rms dispersion. Figure from Balestra et al. (2006).
1.4.7 Conclusions on the enrichment of the ICM

From what has been said above we can conclude that:

• Elliptical galaxies are the dominant contributors to the abundances and energetic content of the ICM. A constant Fe abundance of ∼ 0.3 Fe⊙ is found in the central regions of clusters hotter than 3 keV (Renzini 2004).
• Good models for the chemical enrichment of the ICM should reproduce the iron mass measured in clusters plus the [α/Fe] ratios inside galaxies and in the ICM, as well as the Fe mass to light ratio (IMLR = M_Fe^ICM/L_B, with L_B being the total blue luminosity of member galaxies, as defined by Renzini et al. 1993). Abundance ratios are very powerful tools to impose constraints on the evolution of ellipticals and of the ICM.
• Models which do not assume a top-heavy IMF for the galaxies in clusters (a Salpeter IMF can reproduce at best the properties of local ellipticals) predict [α/Fe] > 0 inside ellipticals and [α/Fe] ≤ 0 in the ICM. Observed values are still too uncertain to draw firm conclusions on this point.

Acknowledgements

This research has been supported by INAF (Italian National Institute for Astrophysics), Project PRIN-INAF-2005-1.06.08.16

References

[1] Alibés, A., Labay, J. & Canal, R., 2001, A&A, 370, 1103
[2] Aloisi, A., Savaglio, S., Heckman, T. M., Hoopes, C. G., Leitherer, C. & Sembach, K. R., 2003, ApJ, 595, 760
[3] Argast, D., Samland, M., Gerhard, O.E. & Thielemann, F.-K., 2000, A&A, 356, 873
[4] Arimoto, N. & Yoshii, Y., 1987, A&A, 173, 23
[5] Arnaud, M., Rothenflug, R., Boulade, O., Vigroux, L. & Vangioni-Flam, E., 1992, A&A, 254, 49
[6] Asplund, M., Grevesse, N. & Sauval, A.J., 2005, ASP (Astronomical Society of the Pacific) Conf. Series, Vol. 336, p.55
[7] Balestra, I., Tozzi, P., Ettori, S., Rosati, P., Borgani, S., Mainieri, V., Norman, C. & Viola, M., 2006, A&A in press, astro-ph/0609664
[8] Barbuy, B. & Grenon, M., 1990, in: Bulges of Galaxies, eds. B.J. Jarvis & D.M.
Terndrup, ESO/CTO Workshop, p.83 [9] Barbuy, B.,Ortolani, S.& Bica, E., 1998, A&AS, 132, 333 [10] Bender, R., Burstein, D. & Faber, S. M., 1993, ApJ, 411, 153 [11] Berman, B.C. & Suchov, A.A., 1991, Astrophys. Space Sci. 184, 169 [12] Bernardi, M., Renzini, A., da Costa, L. N.., Wegner, G. & al., 1998, ApJ, 508, L143 [13] Boissier, S., Prantzos, N., 1999, MNRAS, 307, 857 [14] Boissier, S., Boselli, A., Prantzos, N. & Gavazzi, G., 2001, MNRAS, 321, [15] Bradamante, F., Matteucci, F. & D’Ercole, A., 1998, A&A, 337, 338 [16] Calura, F. 2004 PhD Thesis, Trieste University [17] Calura, F., Matteucci, F. & Vladilo, G., 2003, MNRAS, 340, 59 [18] Carollo, C. M., Danziger, I. J.& Buson, L., 1993, MNRAS, 265, 553 [19] Cayrel, R., Depagne, E., Spite, M., Hill, V., Spite, F., Franois, P., Plez, B., Beers, T., & al., 2004, A&A, 416, 117 [20] Chabrier, G., 2003, PASP, 115, 763 [21] Chang, R.X., Hou, J.L., Shu, C.G. & Fu, C.Q., 1999, A&A 350, 38 [22] Chiappini, C,. Hirschi, R., Meynet, G., Ekstroem, S., Maeder, A. & Mat- teucci, F., 2006, A&A, 449, L27 [23] Chiappini, C., Matteucci F. & Gratton R. 1997, ApJ, 477, 765 [24] Chiappini, C., Matteucci, F. & Meynet, G. 2003b, A&A, 410, 257 [25] Chiappini, C., Matteucci, F. & Padoan, P., 2000, ApJ, 528, 711 [26] Chiappini, C., Matteucci, F., & Romano, D., 2001, ApJ, 554, 1044 http://arxiv.org/abs/astro-ph/0609664 References 49 [27] Chiappini, C., Romano, D & Matteucci, F., 2003a, MNRAS, 339, 63 [28] Chiosi, C., 1980, A&A, 83, 206 [29] Chiosi, C., 2000, A&A 364, 423 [30] Colless, M., Burstein, D., Davies, R.L., McMahan, R. K., Saglia, R. P. & Wegner, G., 1999, MNRAS, 303, 813 [31] Collin-Souffrin, S., Joly, M., Pequignot, D. & Dumont, S., 1986, A&A, 166, 27 [32] Davies, R. L., Sadler, E. M. & Peletier, R. F., 1993, MNRAS, 262, 650 [33] David, L.P., Forman, W., & Jones, C., 1991, ApJ, 376, 380 [34] Dopita, M.A.& Ryder, S.D., 1994, ApJ, 430, 163 [35] Eggen, O.J., Lynden-Bell, D. 
& Sandage, A.R., 1962, ApJ, 136, 748 [36] Elbaz, D., Cesarsky, C. J., Fadda, D., Aussel, H. & al., 1999, A&A, 351, [37] Ellison, S.L., Songaila, A., Schaye, J. & Pettini, M., 2000, AJ, 120, 1175 [38] Erb, D. K., Shapley, A.E., Pettini, M., Steidel, C.C., Reddy, N.A.& Adel- berger, K.L., 2006, ApJ, 644, 813 [39] François, P., Matteucci, F. Cayrel, R., Spite, M., Spite, F. & Chiappini, C., 2004, A&A, 421, 613 [40] Garnett, D.R.& Shields, G.A., 1987, ApJ, 317, 82 [] Garnett, D.R., Skillman, E.D., Dufour, R.J.& Shields, G.A., 1997, ApJ, 481, [41] Gibson, B.K. & Matteucci, F., 1997, ApJ, 475, 47 [42] Granato, G.L., Silva, L.,Monaco, P., Panuzzo, P., Salucci, P., De Zotti, G.& Danese, L., 2001, MNRAS, 324, 757 [43] Greggio, L. & Renzini, A., 1983, A&A, 118, 217 [44] Grevesse, N., & Sauval, A.J., 1998, Space Science Reviews, Vol. 85, p.161 [45] Goswami, A. & Prantzos, N., 2000, A&A, 359, 191 [46] Gunn, J. E. & Gott, J. R. III, 1972, ApJ, 176, 1 [47] Holweger, H., 2001, Joint SOHO/ACE workshop ”Solar and Galactic Composition”. Edited by Robert F. Wimmer-Schweingruber. Publisher: American Institute of Physics Conference proceedings Vol. 598, p.23 [48] Hachisu, I., Kato, M. & Nomoto, K., 1996, ApJ, 470, L97 [49] Hachisu, I., Kato, M. & Nomoto, K., 1999, ApJ, 522, 487 [50] Hamman, F. & Ferland, G., 1993, ApJ, 418, 11 [51] Henry, R.B.C., Edmunds, M.G.& Koeppen, J., 2000, ApJ, 541, 660 [52] Himmes, A., & Biermann, P., A&A, 1988, 86, 11 [69] Iben, I.Jr. & Tutukov, A.V., 1984, ApJS, 54, 335 [54] Ishimaru, Y., & Arimoto, N., 1997, PASJ, 49, 1 [55] Izotov, Y. I., Stasinska, G., Meynet, G., Guseva, N. G. & Thuan, T. X., 2006, A&A, 448, 955 [56] Jimenez, R., Padoan, P., Matteucci, F. & Heavens, A.F., 1998, MNRAS 299, 123 [57] Jorgensen, I., 1999, MNRAS, 306, 607 [58] Josey, S. A. & Arimoto, N., 1992, A&A, 255, 105 [59] Kauffmann, G., Charlot, S. & White, S. D. M., 1996, MNRAS 283, L117 [60] Kauffmann, G., White, S.D.M. & Guiderdoni, B., 1993, MNRAS, 264, 201 [61] Kennicutt, R.C. 
Jr., 1989, ApJ, 344, 685 [62] Kennicutt, R.C. Jr., 1998, ARAA, 36, 189 [63] Kobayashi, C. & Arimoto, N., 1999, ApJ, 527, 573 [64] Kodama, T., Yamada, T., Akiyama, M., Aoki, K., Doi, M., Furusawa, H.,Fuse, T., Imanishi, M. & al., 2004, ApJ, 492, 461 50 References [65] Kobulnicky, H.A. & Skillman, E.D., 1996, ApJ, 471, 211 [66] Kroupa, P., Tout, C.A. & Gilmore, G., 1993, MNRAS, 262, 545 [67] Kuntschner, H., Lucey, J. R., Smith, R. J., Hudson, M. J. & Davies, R. L., 2001, MNRAS, 323, 625 [68] Hill, V., François, P., Spite, M., Primas, F., Spite, F., 2000, A&A, 364, [69] Iben, I. Jr. & Tutukov, A., 1984, ApJ, 284, 719 [70] Iwamoto, K., Brachwitz, F., Nomoto, K., Kishimoto, N., Umeda, H., Hix, W. R. & Thielemann, F-K., 1999, ApJS, 125, 439 (I99) [71] Lacey, C.G. & Fall, S. M., 1985, ApJ, 290, 154 [72] Lanfranchi, G. & Matteucci, F., 2003, MNRAS, 345, 71 [73] Lanfranchi, G. & Matteucci, F., 2004, MNRAS, 351, 1338 [74] Larson, R.B., 1972, Nature, 236, 21 [75] Larson, R.B., 1974, MNRAS 169, 229 [76] Larson, R.B., 1976, MNRAS 176, 31 [77] Larson, R.B., 1998, MNRAS, 301, 569 [78] Larson, R.B., & Dinerstein, H.L., 1975, PASP, 87, 911 [79] Lecavelier des Etangs, A., Desert, J.-M. & Kunth, D., 2003, A&A, 413, [80] Lequeux, J., Kunth, D., Mas-Hesse, J. M. & Sargent, W. L. W., 1995, A&A 301, 18 [81] Lequeux, J.,Peimbert, M., Rayo, J. F., Serrano, A. & Torres-Peimbert, S., 1979, A&A, 80, 155 [82] Loewenstein, M., & Mushotzky, F., 1996, ApJ, 466, 695 [83] Maeder, A., 1992, A&A, 264, 105 [84] Maiolino, R., Cox, P., Caselli, P., Beelen, A., Bertoldi, F., Carilli, C. L., Kaufman, M. J., Menten, K. M.& al., 2005, A&A, 440, L51 [85] Maiolino, R., Nagao, T., Marconi, A., Schneider, R., Pedani, M., Pipino, A, Matteucci, F. & al., 2006, Mem. S.A.It. Vol. 77, 643 [86] Mannucci, F., Della Valle, M., Panagia, N., Cappellaro, E., Cresci, G., Maiolino, R., Petrosian, A. 
& Turatto, M., 2005, A & A, 433, 807 [87] Mannucci, F., Della Valle, M.& Panagia, N., 2006, MNRAS, 370, 773 [88] Marconi, G., Matteucci, F. & Tosi, M., 1994, MNRAS, 270, 35 [89] Martin, C.L., 1996, ApJ, 465, 680 [90] Martin, C.L., 1998, ApJ, 506, 222 [91] Martin, C.L., 1999, ApJ, 513, 156 [92] Martinelli, A., Matteucci, F. & Colafrancesco, S., 2000, A&A 354, 387 [93] Matteucci, F., 2001, The Chemical Evolution of the Galaxy, ASSL, Kluwer Academic Publisher [94] Matteucci, F.,1994, A&A, 288, 57 [95] Matteucci, F. & Chiosi, C., 1983, A&A 123, 121 [96] Matteucci, F. & François, P., 1989, MNRAS 239, 885 [97] Matteucci, F.& Gibson, B.K., 1995, A&A 304, 11 [98] Matteucci, F., Raiteri, C. M., Busso, M., Gallino, R. & Gratton, R., 1993, A&A, 272, 421 [99] Matteucci, F. & Greggio, L., 1986, A&A ,154, 279 [100] Matteucci, F., Molaro, P. & Vladilo, G., 1997, A&A 321, 45 [101] Matteucci, F. & Padovani, P., 1993, ApJ, 419, 485 [102] Matteucci, F. & Recchi, S., 2001, ApJ 5,58, 351 [103] Matteucci, F.& Tornambé, A., 1987, A&A, 185, 51 [104] Matteucci, F., & Vettolani, G., 1988, A&A, 202, 21 References 51 [105] McWilliam, A. & Rich, R. M., 1994, ApJS, 91, 749 [106] Menanteau, F., Jimenez, R.& Matteucci, F., 2001, ApJ, 562, L23 [107] Meynet, G. & Maeder, A., 2002, A&A, 390, 561 [108] Moretti, A., Portinari, L. & Chiosi, C., 2003, A&A, 408, 431 [109] Nomoto, K., Hashimoto, M., Tsujimoto, T., Thielemann, F.-K. & al., 1997, Nucl. Phys. A, 616, 79 [110] Oey, M. S., 2000, ApJ, 542, L25 [111] Padovani, P. & Matteucci, F., 1993, ApJ, 416, 26 [112] Papaderos, P., Fricke, K. J., Thuan, T. X. & Loose, H.-H., 1994, A&A 291, L13 [113] Pardi, M.C., Ferrini, F. & Matteucci, F., 1994, ApJ, 444, 207 [114] Peletier, R. 1989, PhD Thesis, University of Groningen, The Netherlands [115] Pilyugin, I.S., 1993, A&A 277, 42 [116] Pipino, A., Matteucci, F., Borgani, S. 
& Biviano, A., 2002, NewAstr., 7, [117] Pipino, A., Matteucci, F., 2004, MNRAS, 347, 968 [118] Pipino, A., Matteucci, F., 2006, MNRAS, 365, 1114 [119] Portinari, L. & Chiosi, C., 2000, A&A, 355, 929 [120] Prantzos, N., 2003, A&A, 404, 211 [121] Prantzos, N. & Boissier, S., 2000, MNRAS 313, 338 [122] Recchi, S., Matteucci, F. & D’Ercole, A., 2001, MNRAS 322, 800 [123] Recchi, S., Matteucci, F., D’Ercole, A. & Tosi, M., 2004, A&A, 426, 37 [124] Renzini, A., 2004, in Clusters of Galaxies: Probes of Cosmological Struc- ture and Galaxy Evolution, eds. J.S. Mulchay, A. Dressler & Oemler, A. (Cambridge University Press), p.260 [125] Renzini, A. & Ciotti, L., 1993, ApJ, 416, L49 [126] Renzini, A., Ciotti, L., D’Ercole, A. & Pellegrini, S., 1993, ApJ 416, L49 [127] Rothenflug, R. & Arnaud, M., 1985, A&A, 144, 431 [128] Salpeter, E.E., 1955, ApJ, 121, 161 [129] Sandage, A., 1986, A&A, 161, 89 [130] Scalo, J.M., 1986, Fund. Cosmic Phys. 11, 1 [131] Scalo, J.M., 1998, The Stellar Initial Mass Function, A.S.P. Conf. Ser., Vol. 142 p.201 [132] Schechter, P., 1976, ApJ, 203, 297 [133] Schmidt, M., 1959, ApJ, 129, 243 [134] Schmidt, M., 1963, ApJ, 137, 758 [135] Schneider, R., Salvaterra, R., Ferrara, A. & Ciardi, B., 2006, MNRAS, 369, 825 [136] Searle, L. & Zinn, R., 1978, ApJ, 225, 357 [137] Skillman, E.D, Terlevich, R. & Melnick, J., 1989, MNRAS, 240, 563 [138] Springel, V. & Hernquist, L., 2003, MNRAS, 339, 312 [139] Tamura, T., Kaastra, J.S., den Herder, J.W.A., Bleeeker, J.A.M. & Pe- terson, J.R., 2004, A&A, 420, 135 [140] Thielemann, F.K., Nomoto, K. & Hashimoto, M., 1996, ApJ, 460, 408 [141] Thomas, D., Greggio, L., Bender, R., 1999, MNRAS, 302, 537 [142] Thomas, D., Maraston, C., Bender, R. & Mensez de Oliveira, C., 2005, ApJ, 621, 673 [143] Thomas, D., Maraston, C.& Bender, R., 2002, in: R.E. Schielicke (ed.), Reviews in Modern Astronomy, Vol.15, p.219 [144] Thuan, T.X., Izotov, Y.I., Lipovetsky, V.A., 1995, ApJ, 445, 108 [145] Tinsley, B.M., 1980, Fund. 
Cosmic Phys., Vol. 5, 287 52 References [146] Tinsley, B.M. & Larson, R.B., 1979, MNRAS, 186, 503 [147] Tornambé, A., 1989, MNRAS, 239, 771 [148] Tosi, M., 1988, A&A, 197, 33 [149] Tosi, M., Aloisi, A., & Annibali, F., 2006, IAU Symp. N.35, p.19 [150] Tozzi, P., Rosati, P., Ettori, S., Borgani, S., Mainieri, V.& Norman, C., 2003, ApJ, 593, 705 [151] Tremonti, C.A., Heckman, T. M., Kauffmann, G., Brinchmann, J., Char- lot, S., White, S. D. M.; Seibert, M., Peng, E. W. & al., 2004, ApJ, 613, [152] Tsujimoto, T., Shigeyama, T. & Yoshii, Y., 1999, ApJ 519,63 [153] van den Hoek, L.B. & Groenewegen, M.A.T., 1997, A&AS, 123, 305 (HG97) [154] Vladilo, G., 2002, A&A, 391, 407 [155] Vigroux, L., 1977, A&A, 56, 473 [156] Weiss, A. Peletier, R. F. & Matteucci, F., 1995, A&A, 296, 73 [157] Whelan, J. & Iben, I. Jr., 1973, ApJ, 186, 1007 [158] White, S.D.M., & Rees, M.J., 1978, MNRAS 183, 341 [159] Wills, B.J., Netzer, H. & Wills, D., 1985, ApJ, 288, 94 [160] Worthey, G., 1994, ApJS, 95, 107 [161] Worthey, G. Faber, S. M. & Gonzalez, J. J., 1992, ApJ, 398, 69 [162] Worthey, G, Trager, S.C., Faber, S. M., 1995, ASP Conf. Ser., 86, 203 [163] Woosley, S.E. & Weaver, T.A., 1995, ApJS, 101, 181 (WW95) [164] Wyse, R.F.G. & Gilmore, G., 1992, AJ, 104, 144 [165] Wyse, R. F. 
G.& Silk, J., 1989, ApJ, 339, 700 Chemical Evolution Lecture I: basic assumptions and equations of chemical evolution The basic ingredients The Star Formation Rate The Initial Mass Function The Infall Rate The Outflow Rate Stellar evolution and nucleosynthesis: the stellar yields Type Ia SN Progenitors Yields per Stellar Generation Analytical models Numerical Models Lecture II: the Milky Way and other spirals The Galactic formation timescales The two-infall model Common Conclusions from MW Models Abundance Gradients from Emission Lines Abundance Gradients in External Galaxies How to model the Hubble Sequence Type Ia SN rates in different galaxies Time-delay model for different galaxies Lecture III: interpretation of abundances in dwarf irregulars Properties of Dwarf Irregular Galaxies Galactic Winds Results on DIG and BCG from purely chemical models Results from Chemo-Dynamical models: IZw18 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment Ellipticals Chemical Properties Scenarios for galaxy formation Ellipticals-Quasars connection The chemical evolution of QSOs The chemical enrichment of the ICM Conclusions on the enrichment of the ICM References ABSTRACT In this series of lectures we first describe the basic ingredients of galactic chemical evolution and discuss both analytical and numerical models. Then we compare model results for the Milky Way, Dwarf Irregulars, Quasars and the Intra-Cluster- Medium with abundances derived from emission lines. These comparisons allow us to put strong constraints on the stellar nucleosynthesis and the mechanisms of galaxy formation. <|endoftext|><|startoftext|> Suppression of 1/fα noise in one-qubit systems Pekko Kuopanportti, Mikko Möttönen, Ville Bergholm, Olli-Pentti Saira, Jun Zhang, and K. Birgitta Whaley Laboratory of Physi s, Helsinki University of Te hnology P. O. Box 4100, 02015 TKK, Finland Low Temperature Laboratory, Helsinki University of Te hnology, P.O. 
Box 3500, 02015 TKK, Finland
Department of Chemistry and Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720
(Dated: October 26, 2018)

We investigate the generation of quantum operations for one-qubit systems under classical noise with $1/f^{\alpha}$ power spectrum, where $2 > \alpha > 0$. We present an efficient way to approximate the noise with a discrete multi-state Markovian fluctuator. With this method, the average temporal evolution of the qubit density matrix under $1/f^{\alpha}$ noise can be feasibly determined from recently derived deterministic master equations. We obtain qubit operations such as quantum memory and the NOT gate to high fidelity by a gradient based optimization algorithm. For the NOT gate, the computed fidelities are qualitatively similar to those obtained earlier for random telegraph noise. In the case of quantum memory, however, we observe a nonmonotonic dependency of the fidelity on the operation time, yielding a natural access rate of the memory.

I. INTRODUCTION

In solid-state realization of qubits, material specific fluctuations typically induce the major contribution to the intrinsic noise. Much effort has been focused on the preservation of the state in a quantum memory in the presence of $1/f^{\alpha}$ noise since this is a ubiquitous form of noise encountered in solid-state qubit applications [1, 2, 3]. Both charge and spin qubits are susceptible to noise of this form. For Josephson junctions, both charge noise [4, 5] and critical current noise [6, 7] have been measured to have $1/f^{\alpha}$ power spectral densities. Similar charge fluctuations are responsible for the well-known $1/f^{\alpha}$ nature of low frequency noise in single electron transistors [8]. Background charge fluctuations resulting in $1/f^{\alpha}$ noise spectra are considered to be the most important source of dephasing in Josephson junction qubits [4, 5, 9].
Spin qubits such as those formed from donor spins in semiconductors are susceptible to nuclear spin noise deriving from dipolar coupling between environmental nuclear spins. The nuclear spin bath couples to the donor spins by hyperfine interactions, so that the dynamics of the nuclear spins causes dephasing. Recent calculations for a phosphorus donor in silicon show that the high frequency component of the nuclear spin noise is approximately described by a $1/f^{\alpha}$ power spectrum [10]. Electron spin qubits implanted into silicon [11] are also affected by relaxation of dangling bonds deriving from oxygen vacancies at the Si/SiO2 interface. This gives rise to a magnetic noise with a $1/f^{\alpha}$ spectrum that is the dominant mechanism for phase fluctuations of donor spins near the surface [12]. Another form of noise closely related to $1/f^{\alpha}$ noise is random telegraph noise (RTN), which arises from coupling of individual bistable fluctuators to a qubit [2, 13, 14, 15, 16, 17, 18, 19].

Electronic address: pekko.kuopanportti@tkk.fi

Several approaches to suppress decoherence based on pulse design have been proposed in the literature. Among them, dynamical decoupling schemes average out the unwanted effects of the environmental interaction through the application of suitable control pulses [20, 21]. Application of these schemes often involves hard pulses with instantaneous switchings and unbounded control amplitudes, resulting in a range of validity restricted to time scales for which the pulse duration is much less than the noise correlation time [22, 23].

In Ref. [24], a direct pulse optimization method restricted to bounded control pulses was developed for implementing one-qubit operations in a noisy environment. This initial work on noise suppression addressed the example of a single qubit system under the influence of classically modeled random telegraph noise, such as might arise from a single bistable fluctuator. In this paper, we extend the work of Ref.
[24] to the physically relevant situation of $1/f^{\alpha}$ noise where $2 > \alpha > 0$. This kind of noise is known to result, for example, from a set of bistable fluctuators [25, 26, 27, 28], i.e., RTN sources. We investigate two ways to approximate the $1/f^{\alpha}$ noise for computer simulations, namely, the sum of independent RTN fluctuators and a single discrete multi-state Markovian noise source. We show that the single fluctuator provides a much more efficient way to model $1/f^{\alpha}$ noise than independent RTN fluctuators. Furthermore, the average temporal evolution of the density matrix under this Markovian noise can be exactly described by a set of deterministic master equations derived in Ref. [29]. Using this approach, we avoid the heavy computational task arising from the numerical evaluation of the density matrix averaged over a large number of different sample paths of the noise as computed in Ref. [24]. This framework not only significantly accelerates the convergence of the control pulse sequence optimization, but also allows further theoretical analysis. Using these master equations, we employ gradient based optimization procedures to obtain pulse sequences that suppress $1/f^{\alpha}$ noise for quantum memory and for a NOT gate. Comparisons with composite pulses designed to eliminate systematic errors and with refocusing pulses demonstrate that the numerically optimized pulse sequences yield the highest fidelities.

http://arxiv.org/abs/0704.0771v1

The remainder of this paper is organized as follows. In Sec. II, we show how to efficiently approximate the $1/f^{\alpha}$ noise by a multi-state Markovian fluctuator. In Sec. III, we define the fidelity of qubit operations, review the master equations describing the average evolution of the qubit density matrix in the presence of the noise and describe the numerical optimization procedure.
Sections IV and V present optimized control pulse sequences and the achieved fidelities for quantum memory and for the NOT gate, respectively. Finally, Sec. VI concludes and indicates further applications of the method.

II. ONE-QUBIT SYSTEM SUBJECT TO $1/f^{\alpha}$ NOISE

We consider a one-qubit system described by the effective Hamiltonian
$$H = \frac{1}{2} a(t)\sigma_x + \frac{1}{2}\eta(t)\sigma_z, \tag{1}$$
where $a(t) \in [-a_{\max}, a_{\max}]$ is the external control field applied along the $x$ direction and $\eta(t)$ is the classical noise signal perturbing the system along the $z$ direction. The noise source $\eta(t)$ can be characterized by its autocorrelation function
$$C(t) \equiv \langle \eta(0)\eta(t) \rangle = \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} \eta(s)\eta(s+t)\, ds, \tag{2}$$
the Fourier transformation of which defines the noise power spectral density as
$$S(f) = \int_{-\infty}^{\infty} C(t)\, e^{-i 2\pi f t}\, dt. \tag{3}$$
For a single RTN source with the amplitude $\Delta$ and correlation time $\tau_c$, the autocorrelation function is given by [30]
$$C_{\mathrm{RTN}}(t) = \Delta^2 e^{-2|t|/\tau_c}, \tag{4}$$
and the corresponding power spectral density by
$$S_{\mathrm{RTN}}(f) = \frac{\Delta^2 \tau_c}{1 + (\pi f \tau_c)^2}. \tag{5}$$

A standard way to simulate $1/f^{\alpha}$ noise is to use an ensemble of $K$ independent uncorrelated RTN processes [1, 25, 27]. Let $\eta_k(t)$ be a symmetric RTN signal switching between values $-\Delta_k$ and $\Delta_k$ with the correlation time $\tau_k \equiv 1/\gamma_k$, where $\gamma_k$ is the transition rate between the two states. The total noise process appears in the Hamiltonian (1) as $\eta(t) = \sum_{k=1}^{K} \eta_k(t)$. Since the RTN sources are independent, Eqs. (2) and (4) yield the autocorrelation function
$$C(t) = \sum_{k=1}^{K} \Delta_k^2 e^{-2|t|/\tau_k} = \sum_{k=1}^{K} \Delta_k^2 e^{-2\gamma_k |t|}, \tag{6}$$
and the corresponding power spectral density is given by
$$S(f) = \sum_{k=1}^{K} \frac{\Delta_k^2 \gamma_k}{\gamma_k^2 + (\pi f)^2}. \tag{7}$$
Introducing the density of transition rates $g(\gamma)$ and expressing the noise strength $\Delta$ as a function of the transition rate, we can replace the summation in Eq. (7) by an integration, which yields
$$S(f) = \int_{\gamma_{\min}}^{\gamma_{\max}} \frac{\Delta^2(\gamma)\, g(\gamma)\, \gamma}{\gamma^2 + (\pi f)^2}\, d\gamma, \tag{8}$$
where $\gamma_{\min}$ and $\gamma_{\max}$ are minimal and maximal transition rates, respectively. Provided that
$$\Delta^2(\gamma)\, g(\gamma) = 2A/\gamma, \tag{9}$$
where $A$ is a constant, the power spectral density in Eq. (8) becomes [27]
$$S(f) = \frac{2A}{\pi f}\left[\arctan\!\left(\frac{\gamma_{\max}}{\pi f}\right) - \arctan\!\left(\frac{\gamma_{\min}}{\pi f}\right)\right] \approx \frac{A}{f}, \qquad \gamma_{\min} \ll \pi f \ll \gamma_{\max}. \tag{10}$$
Thus Eq. (10) yields an approximation to the $1/f$ power spectrum. To generate a general $1/f^{\alpha}$ power spectral density for $2 > \alpha > 0$, we can choose
$$\Delta^2(\gamma)\, g(\gamma) = 2A\gamma^{-\alpha} \tag{11}$$
as shown in [27].

Although the above method yields a valid approximation for the $1/f^{\alpha}$ spectrum, it is computationally inefficient. In particular, the number of distinct noise states increases exponentially with the number of RTN fluctuators $K$, i.e., the number of terms in the sum of Eq. (7) approximating the $1/f^{\alpha}$ noise. Since the size of the differential equation system describing the average qubit dynamics increases linearly with the number of noise states [29], in practice one has to restrict the computation to a rather small number of independent RTN fluctuators.

To overcome this problem, we present a conceptually different way of generating the desired $1/f^{\alpha}$ noise spectrum using a single multi-state Markovian fluctuator. Consider a continuous-time Markovian noise process with $M$ discrete noise states. Let $\Gamma_{kj}$ denote the transition rate from the $j$th state to the $k$th one. In order to preserve total probability, we must have
$$\sum_{j=1}^{M} \Gamma_{jk} = 0 \quad \text{for all } k = 1, 2, \ldots, M. \tag{12}$$
Let us assume that the transition rates are symmetric, i.e., $\Gamma = \Gamma^T$. Under this assumption the noise process has a steady-state solution in which the different noise states are equally probable. In order for the noise to be unbiased, i.e., $\langle \eta \rangle = 0$, the amplitudes $b_k$ associated with the noise states must satisfy
$$\sum_{k=1}^{M} b_k = 0. \tag{13}$$
Thus the autocorrelation is given by
$$C(t) = \langle \eta(t)\eta(0) \rangle = \frac{1}{M}\, b^T e^{\Gamma |t|} b. \tag{14}$$
Since $\Gamma$ is symmetric, we can diagonalize it with an orthogonal matrix $V$ as $\Gamma = V \Lambda V^T$, where the real diagonal matrix $\Lambda = \mathrm{diag}\{\lambda_k\}_{k=1}^{M}$ carries the eigenvalues of $\Gamma$ in a descending order. Defining $\chi := \frac{1}{\sqrt{M}} V^T b$, we rewrite Eq. (14) in the form of Eq. (6) as
$$C(t) = \chi^T e^{\Lambda |t|} \chi = \sum_{k=1}^{M} \chi_k^2 e^{\lambda_k |t|}. \tag{15}$$

In order to use this multi-state Markovian fluctuator to approximate $1/f^{\alpha}$ noise, we have to choose the eigenvalues $\lambda_k$ and the amplitudes $\chi_k$ such that Eq.
(11) is fulfilled. Moreover, we must construct the orthogonal matrix $V$ such that $\Gamma = V \Lambda V^T$ satisfies Eq. (12), the amplitudes $b_k$ satisfy Eq. (13), and the off-diagonal elements of $\Gamma$ are non-negative.

One way to satisfy these requirements is to pick an integer $m \ge 2$, set $M = 2^m$, and choose the eigenvalues as
$$\{\lambda_k\}_{k=1}^{M} = -2\{0, \gamma_{\min}, \gamma_{\min} + \delta, \gamma_{\min} + 2\delta, \ldots, \gamma_{\max}\},$$
where $\gamma_{\max} = (M-2)\delta + \gamma_{\min}$ and $0 < \delta \le \gamma_{\min}$. Hence, the distribution of the transition rates $g(\gamma)$ is uniform on $[\gamma_{\min}, \gamma_{\max}]$. Then we set $V = H^{\otimes m}$, where $H$ is the Hadamard matrix
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
Explicit calculation shows that these choices ensure that Eq. (12) is satisfied. To fulfill Eqs. (11) and (13), we set $\chi_1 = 0$ and $\chi_k = \gamma_k^{-\alpha/2}$ for $k = 2, \ldots, M$, where $\gamma_k$ is equal to $\gamma_{\min} + (k-2)\delta$. It can be shown that this construction will also produce transition matrices $\Gamma$ with non-negative off-diagonal elements. Hence we have provided an efficient way to implement 1/f^α noise. Note that the $M$-state Markovian fluctuator, Eq. (15), corresponds formally to Eq. (6) with $M - 1$ non-vanishing RTN fluctuators. Thus we have achieved an exponential improvement in the efficiency of the noise approximation. Alternatively, we can choose the eigenvalues of $\Gamma$ freely and obtain a valid matrix $V$ with numerical optimization, which may result in an even more faithful approximation of 1/f^α noise.

FIG. 1: Logarithm of the power spectral density for five independent RTN fluctuators (dash-dotted line), a multi-state Markovian source corresponding to 31 RTN fluctuators (solid line), and an ideal 1/f noise (dotted line). The transition rates of the RTN fluctuators are in both cases distributed uniformly on the interval $[\gamma_0, 30\gamma_0]$.

Figure 1 compares the approximation of the spectral density of 1/f noise generated by independent RTN sources and by a multi-state Markovian source.
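The Hadamard-based construction described above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `sylvester_hadamard` and `build_fluctuator` are illustrative, and the Kronecker-power form of $V$ and the choice $\chi_k = \gamma_k^{-\alpha/2}$ are taken as assumptions consistent with Eqs. (11) to (15).

```python
import numpy as np

def sylvester_hadamard(m):
    """Normalized 2^m x 2^m Hadamard matrix (orthogonal and symmetric)."""
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    V = np.array([[1.0]])
    for _ in range(m):
        V = np.kron(V, H2)
    return V

def build_fluctuator(m, gamma_min, gamma_max, alpha=1.0):
    """Return (Gamma, b, gammas) for an M = 2^m state Markovian fluctuator."""
    M = 2 ** m
    gammas = np.linspace(gamma_min, gamma_max, M - 1)        # uniform g(gamma)
    lam = np.concatenate(([0.0], -2.0 * gammas))             # eigenvalues of Gamma
    chi = np.concatenate(([0.0], gammas ** (-alpha / 2.0)))  # chi_1 = 0 gives Eq. (13)
    V = sylvester_hadamard(m)
    Gamma = V @ np.diag(lam) @ V.T
    b = np.sqrt(M) * (V @ chi)                               # inverts chi = V^T b / sqrt(M)
    return Gamma, b, gammas

# 32-state source with rates uniform on [gamma_0, 30 gamma_0], gamma_0 = 1
Gamma, b, gammas = build_fluctuator(m=5, gamma_min=1.0, gamma_max=30.0)
off_diagonal = Gamma - np.diag(np.diag(Gamma))
```

With these parameters the spacing is $\delta = 29/30 \le \gamma_{\min}$, so the column sums of `Gamma` vanish, its off-diagonal entries stay non-negative, and the amplitudes `b` sum to zero, as required by Eqs. (12) and (13).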
For the RTN approach, we choose 5 independent noise sources, for which the transition rates $\gamma_k$ are uniformly distributed in the range $[\gamma_{\min}, \gamma_{\max}] = [\gamma_0, 30\gamma_0]$, and the strengths are given by $\Delta_k = 1/\sqrt{\gamma_k}$. This yields a fluctuator with 32 distinct noise states. For the multi-state fluctuator, we choose a 32-state noise source, for which the nonzero eigenvalues $\lambda_k$ of its transition rate matrix $\Gamma$ are distributed uniformly on $[-60\gamma_0, -2\gamma_0]$, and $\chi_k = 1/\sqrt{-\lambda_k/2}$. Thus the condition in Eq. (9) is satisfied for both of the approaches and the multi-state noise source has an autocorrelation function and power spectral density which are equal to those for a certain ensemble of 31 RTN fluctuators. We employ representations of similar computational complexity here in order to be able to assess the relative accuracy for a given computational effort.

Figure 1 shows that an ensemble of five RTN processes is not an accurate model for 1/f noise, whereas a single 32-state Markovian noise source is quite accurate, especially in the range $3\gamma_0 \lesssim \omega \lesssim 16\gamma_0$. The poor quality of the approximation with five RTN fluctuators is due to the small number of independent noise sources employed here, whereas the 32-state Markovian fluctuator contains more parameters and thereby introduces more flexibility in the noise approximation. The frequency range over which the approximation is accurate is relatively short if one considers that the 1/f noise detected in experimental applications often extends over several frequency decades. The width of this frequency range can of course be increased by increasing the width of the region from which the eigenvalues of the matrix $\Gamma$ are chosen. In this case, however, the number of discrete levels in the Markovian source must also be increased to preserve the desired accuracy.
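The comparison above can be reproduced in a few lines. This is an illustrative sketch only: it evaluates Eq. (7) for 5 and for 31 uniformly spaced rates with $\Delta_k^2 = 1/\gamma_k$ and measures how flat $f\,S(f)$ is over $3\gamma_0 \le \omega \le 16\gamma_0$, taking $\omega = 2\pi f$ and $\gamma_0 = 1$ as assumptions. The helper names are hypothetical.

```python
import numpy as np

def rtn_sum_spectrum(f, gammas, strengths_sq):
    """Eq. (7): S(f) = sum_k Delta_k^2 gamma_k / (gamma_k^2 + (pi f)^2)."""
    f = np.asarray(f, dtype=float)[:, None]
    return np.sum(strengths_sq * gammas / (gammas**2 + (np.pi * f)**2), axis=1)

def flatness(n_terms, gamma0=1.0):
    """Ratio max/min of f*S(f) over the band; 1.0 would be an ideal 1/f spectrum."""
    gammas = np.linspace(gamma0, 30.0 * gamma0, n_terms)
    omega = np.linspace(3.0 * gamma0, 16.0 * gamma0, 200)
    f = omega / (2.0 * np.pi)
    fS = f * rtn_sum_spectrum(f, gammas, 1.0 / gammas)
    return fS.max() / fS.min()

spread_5 = flatness(5)    # five independent RTN fluctuators
spread_31 = flatness(31)  # the 31 terms reproduced by the 32-state source
```

A smaller ratio means a spectrum closer to 1/f; the 31-term sum comes out markedly flatter than the 5-term one over this band.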
For the main purpose of demonstrating the feasibility of the numerical optimization algorithm, in the rest of this paper we will continue to approximate 1/f^α noise by a single Markovian noise source with 32 levels.

III. QUBIT DYNAMICS AND CONTROL

In Ref. [24], the temporal evolution of the qubit density matrix was calculated by averaging over $10^4$ to $10^5$ unitary quantum trajectories, each corresponding to a sample noise path. To ensure accuracy, a large number of unitary trajectories are required, which results in extensive computational effort. In Ref. [29], exact deterministic master equations describing the average temporal evolution of quantum systems under Markovian noise were derived.

Following Ref. [29], we introduce a conditional density operator $\rho_k(t)$ which corresponds to the density operator of the system averaged over all the noise sample paths occupying the kth state at the time instant $t$. The conditional density operators are normalized such that the trace of the operator $\rho_k(t)$ yields the probability of the kth noise state as $P_k(t) = \mathrm{Tr}[\rho_k(t)]$. The total average density operator can be expressed as
$$\rho(t) = \sum_{k=1}^{M} \rho_k(t). \tag{16}$$
The dynamics of $\rho_k$ is obtained from the coupled master equations [29]
$$\partial_t \rho_k(t) = \frac{1}{i\hbar}\,[H_k(t), \rho_k(t)] + \sum_{j=1}^{M} \Gamma_{kj}\, \rho_j(t), \tag{17}$$
where $H_k(t)$ is the Hamiltonian of the system corresponding to the kth noise state, and $\Gamma_{kj}$ the transition rate from the jth state to the kth state, as defined in Sec. II. Specifically, in our one-qubit case,
$$H_k(t) = \frac{1}{2} a(t)\sigma_x + \frac{1}{2} b_k \sigma_z, \tag{18}$$
where $b_k$ is the noise amplitude of the state $k$. We shall use $E_a\{\rho\}$ to denote the state $\rho$ evolved under the influence of noise and the control sequence $a$.

The fidelity function quantifying the overlap between the desired state $\rho_f$ and the actual achieved final state is defined as
$$\phi(\rho_f, E_a\{\rho_0\}) = \mathrm{Tr}\left[\rho_f^{\dagger}\, E_a\{\rho_0\}\right], \tag{19}$$
where $\rho_0$ is the initial state of the system.
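Equations (16) to (19) can be sketched numerically as follows. This is not the implementation of Ref. [29]: it is a plain fourth-order Runge-Kutta integrator for a piecewise-constant control, assuming $\hbar = 1$, the Hamiltonian normalization $H_k = \frac{1}{2}(a\sigma_x + b_k\sigma_z)$, and an equal initial weight $1/M$ for every noise state. The two-state example source and all function names are illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rhs(rhos, a, b, Gamma):
    """Right-hand side of Eq. (17) with hbar = 1; rhos has shape (M, 2, 2)."""
    out = np.empty_like(rhos)
    for k, bk in enumerate(b):
        Hk = 0.5 * (a * SX + bk * SZ)
        out[k] = -1j * (Hk @ rhos[k] - rhos[k] @ Hk)
    # noise-state mixing term: sum_j Gamma_kj rho_j
    return out + np.tensordot(Gamma, rhos, axes=(1, 0))

def evolve(rho0, pulse, b, Gamma, steps=400):
    """pulse: list of (amplitude, duration) pairs. Returns rho(T) of Eq. (16)."""
    M = len(b)
    rhos = np.array([rho0 / M] * M, dtype=complex)
    for a, duration in pulse:
        h = duration / steps
        for _ in range(steps):
            k1 = rhs(rhos, a, b, Gamma)
            k2 = rhs(rhos + 0.5 * h * k1, a, b, Gamma)
            k3 = rhs(rhos + 0.5 * h * k2, a, b, Gamma)
            k4 = rhs(rhos + h * k3, a, b, Gamma)
            rhos = rhos + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rhos.sum(axis=0)

def fidelity(rho_f, rho):
    """Eq. (19): Tr(rho_f^dagger rho)."""
    return float(np.real(np.trace(rho_f.conj().T @ rho)))

# Illustrative two-state (single RTN) source with rate 0.1 and amplitude 0.125
rate = 0.1
Gamma2 = np.array([[-rate, rate], [rate, -rate]])
b2 = np.array([0.125, -0.125])
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)     # |0><0|
target = np.array([[0, 0], [0, 1]], dtype=complex)   # |1><1|
pi_pulse = [(1.0, np.pi)]
rho_clean = evolve(rho0, pi_pulse, np.zeros(2), Gamma2)
rho_noisy = evolve(rho0, pi_pulse, b2, Gamma2)
f_clean = fidelity(target, rho_clean)
f_noisy = fidelity(target, rho_noisy)
```

The system size grows only linearly with the number of noise states $M$, which is what makes a 32-state source tractable inside an optimization loop.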
To measure how close the evolution $E_a$ is to the intended quantum gate operation $U$, we calculate the average of the fidelity $\phi(U\rho_0 U^{\dagger}, E_a\{\rho_0\})$ over all pure initial states $\rho_0$, and obtain the gate fidelity function [24]
$$\Phi(U) = \frac{1}{2} + \frac{1}{12} \sum_{k=x,y,z} \mathrm{Tr}\left[U \sigma_k U^{\dagger}\, E_a\{\sigma_k\}\right]. \tag{20}$$

We aim to find the optimal control pulses which maximize the fidelity of the achieved quantum operation, and hence apply a typical gradient based optimization algorithm such as the gradient ascent pulse engineering (GRAPE) method developed in Ref. [31]. If the continuous pulse profiles are approximated by piecewise constant functions, the gradient of the fidelity function with respect to these constant pulse values and durations can be calculated by the chain rule. This gradient is further used as a proportional adjustment to update the control pulse profile. The optimization procedure is terminated when a desired accuracy is achieved. Note that due to the non-convex nature of the problem, the gradient based algorithm will only yield a locally optimal solution. We therefore employ a multitude of initial conditions to find a control pulse which achieves the highest fidelity.

IV. QUANTUM MEMORY

In this section, we focus on the implementation of quantum memory, i.e., the identity operator. For the purpose of comparison with the optimized pulse sequences, we introduce four other kinds of control schemes which generate the identity operator.

The first reference sequence is simply not to apply any external control pulse, i.e., $a(t) = 0$. This pulse has no compensation for decoherence or error. The second reference sequence is a constant 2π pulse given by
$$a_{2\pi}(t) = a_{\max}, \quad \text{for } t \in [0, 2\pi\hbar/a_{\max}].$$
(21)
The third reference sequence is the composite pulse sequence known as compensation for off-resonance with a pulse sequence (CORPSE), which was originally designed to correct systematic errors in the implementation of one-qubit quantum operations and to provide high order control protocols for systematic qubit bias, i.e., for the noise correlation time $\tau_c \to \infty$ [32, 33]. For the identity operation, the CORPSE pulse sequence can be obtained as
$$a_{SC2\pi}(t) = \begin{cases} a_{\max}, & \text{for } 0 < t' < \pi \\ -a_{\max}, & \text{for } \pi \le t' \le 3\pi \\ a_{\max}, & \text{for } 3\pi < t' < 4\pi, \end{cases} \tag{22}$$
where the dimensionless time $t'$ is defined as $t' = a_{\max} t/\hbar$.

In the absence of noise, the CORPSE sequence generates the identity operator exactly, although it requires twice as long an operation time as a 2π pulse, the second reference pulse above. In the presence of small systematic errors, the CORPSE sequence is much more accurate than the 2π pulse. For example, consider a state transformation from the north pole back to itself on the Bloch sphere. For $\eta(t) \equiv \Delta$ in Eq. (1), the fidelities defined in Eq. (19) can be derived to leading order in $\Delta/a_{\max}$ as
$$\phi_{2\pi} \approx 1 - \frac{\pi^2}{4}\left(\frac{\Delta}{a_{\max}}\right)^4, \tag{23}$$
$$\phi_{SC2\pi} \approx 1 - 4\pi^2\left(\frac{\Delta}{a_{\max}}\right)^8. \tag{24}$$
We observe that the error in the fidelity of the 2π pulse is fourth order in the relative noise strength $\Delta/a_{\max}$, whereas for the CORPSE pulse sequence it is eighth order. Thus the CORPSE sequence is much more accurate than a 2π pulse in correcting the effects of systematic errors on quantum memory.

The fourth standard pulse sequence which we take as a reference is the Carr-Purcell-Meiboom-Gill (CPMG) [34] sequence, which is designed to preserve qubit coherence. In our context, this sequence consists of a π/2 pulse followed by multiple π pulses at intervals $t_p$, followed by a final π/2 pulse to bring the system back to the original state. This pulse sequence is designed for $T_2$ measurements on spins, starting from the $|0\rangle$ state. Thus one does not expect a CPMG pulse sequence to perform as well if the initial state is averaged over the Bloch sphere, as is done to compute a gate fidelity.
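The different error orders of the 2π pulse and the CORPSE sequence under a static offset can be checked directly. The sketch below is illustrative only and assumes $\hbar = 1$, $a_{\max} = 1$, and the normalization $H = \frac{1}{2}(a\sigma_x + \Delta\sigma_z)$; it uses the closed form of a spin-1/2 rotation for each constant segment. The helper names are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def segment(a, delta, t):
    """exp(-i H t) for H = (a*SX + delta*SZ)/2 with hbar = 1 (requires a != 0 or delta != 0)."""
    w = 0.5 * np.hypot(a, delta)          # |H| for a traceless 2x2 Hamiltonian
    n = 0.5 * (a * SX + delta * SZ) / w   # unit-norm generator, n @ n = identity
    return np.cos(w * t) * I2 - 1j * np.sin(w * t) * n

def memory_fidelity(pulse, delta):
    """Fidelity of returning the north pole |0> to itself, cf. Eq. (19)."""
    U = I2
    for a, t in pulse:
        U = segment(a, delta, t) @ U
    return abs(U[0, 0]) ** 2

TWO_PI = [(1.0, 2 * np.pi)]
CORPSE_2PI = [(1.0, np.pi), (-1.0, 2 * np.pi), (1.0, np.pi)]   # Eq. (22)

eps = 0.05
err_2pi = 1.0 - memory_fidelity(TWO_PI, eps)
err_corpse = 1.0 - memory_fidelity(CORPSE_2PI, 0.1)
err_2pi_at_01 = 1.0 - memory_fidelity(TWO_PI, 0.1)
```

For small offsets the 2π error follows $(\pi^2/4)(\Delta/a_{\max})^4$ closely, while the CORPSE error at the same offset is smaller by orders of magnitude.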
We first present the fidelities obtained for the identity operator using the various control pulse options in the presence of 1/f noise. The noise is generated here by the single Markovian noise source discussed in Sec. II, with transition rates distributed uniformly over the interval $[1/\tau_c, 30/\tau_c]$. In Fig. 2, the fidelities obtained from optimized control pulses, 2π pulse, CORPSE, CPMG, and zero pulse sequences are plotted as functions of the characteristic correlation time $\tau_c$ of the approximate 1/f noise. Here, CPMG1 and CPMG2 refer to two CPMG types of pulses with the intervals between π pulses being π and 2π, respectively. The total duration of each of these pulses is $12\pi\hbar/a_{\max}$. The optimal control pulse is designed for a duration of $6\pi\hbar/a_{\max}$, and therefore we repeat it twice. Similarly, we repeat the 2π pulse 6 times, the CORPSE sequence 3 times, the CPMG1 sequence 3 times, and the CPMG2 sequence twice. The optimal control pulse clearly yields the highest fidelity among all these pulses, whereas the zero pulse sequence has the worst performance as there are no correction mechanisms. Note that due to motional narrowing, all curves approach unit fidelity in the limit $\tau_c \to 0$.

FIG. 2: Fidelity of the quantum memory as a function of the characteristic correlation time $\tau_c$, in units of $\hbar/a_{\max}$, for optimized control pulses (black solid), a 2π pulse (black dash-dotted), CORPSE pulse sequence (black dotted), CPMG1 pulse sequence (black dashed), CPMG2 pulse sequence (gray solid), zero pulse sequence (gray dash-dotted). The operation time is chosen to be $12\pi\hbar/a_{\max}$. The noise is produced by a single 32-state Markovian source with the average strength $\langle|\eta|\rangle = 0.125 \times a_{\max}$ corresponding to 31 RTN fluctuators with the transition rates uniformly distributed over the region $[1/\tau_c, 30/\tau_c]$ and strengths chosen as described in Sec. II.

The memory access rate is an important specification in modern computer technology [35]. In our context, it corresponds to the total duration of the control pulses. Figure 3 shows the fidelity as a function of the duration for the numerically optimized control pulses. Equation (1) implies that in the absence of noise, the quantum system will generate an identity operator for $a = a_{\max}$ and the duration $T = 2n\pi\hbar/a_{\max}$. In Fig. 3, we observe that, despite an overall decrease, there are peaks in the fidelity near these operation times. Thus we can regard $2n\pi\hbar/a_{\max}$ as the natural periods for quantum memory, and we always choose the total duration of control pulses correspondingly.

FIG. 3: Fidelity of the quantum memory as a function of the operation time $T$, in units of $\pi\hbar/a_{\max}$, for control pulses optimized at each point. The noise is produced by a similar multi-state Markovian source as in Fig. 2, with $\tau_c = 3\hbar/a_{\max}$.

Here, we study the relation between the optimized fidelities achieved above and the average noise strength $\langle|\eta|\rangle$, for a fixed value of the characteristic correlation time $\tau_c = 30\hbar/a_{\max}$. Figure 4 shows the fidelity as a function of the noise strength for the optimized control pulses, 2π pulse, the CORPSE, CPMG1, CPMG2, and zero pulse sequences. At small values of $\langle|\eta|\rangle$, the optimized control pulses again consistently achieve higher fidelities than all reference pulses. However, we note that if the noise strength exceeds ∼0.4, the optimized pulse sequence reduces to the zero pulse sequence, i.e., any nonzero pulse sequence will actually deteriorate the fidelity performance.

FIG. 4: Fidelity of the quantum memory as a function of the average absolute noise strength $\langle|\eta|\rangle$ for optimized control pulses (black solid), 2π pulse (black dash-dotted), CORPSE (black dotted), CPMG1 (black dashed), CPMG2 (gray solid), and zero (gray dash-dotted). The operation time is chosen to be $12\pi\hbar/a_{\max}$. Except for its strength, the noise is produced by a similar multi-state Markovian source as in Fig. 2 with the correlation time $\tau_c = 30\hbar/a_{\max}$.
The discussion above is based on the specific noise density spectrum 1/f^α with α = 1. Figure 5 shows the fidelities of quantum memory for four optimized control pulses, each of which is obtained for a different value of α. The noise is produced here by a single multi-state Markovian source with average strength $\langle|\eta|\rangle = 0.125 \times a_{\max}$, and the total duration of all control pulses is fixed to $6\pi\hbar/a_{\max}$. A systematic scaling of the correlation time axis with respect to α is clearly visible in Fig. 5. This phenomenon is explained by the fact that the concentration of the power spectrum of 1/f^α at low frequencies, i.e., long correlation times, increases with α. Hence, the curves scale down in $\tau_c$ with increasing α.

FIG. 5: Fidelity of the quantum memory achieved with optimized control pulses as a function of the characteristic correlation time $\tau_c$, in units of $\hbar/a_{\max}$, for 1/f^α noise with α = 1 (solid), α = 1.25 (dotted), α = 1.5 (dash-dotted), and α = 1.75 (dashed). The operation time is chosen to be $6\pi\hbar/a_{\max}$. The noise is produced by a similar multi-state Markovian source as in Fig. 2 with variable values of the power α.

V. NOT GATE

In this section, we focus on the generation of high-fidelity NOT gates, i.e., the $\sigma_x$ operator, under 1/f noise. As in the case of quantum memory, we compare the numerically optimized results with reference pulses. In this case, our first reference pulse is the π pulse given by
$$a_{\pi}(t) = a_{\max}, \quad \text{for } t \in [0, \pi\hbar/a_{\max}], \tag{25}$$
which in the absence of noise is the most efficient way of achieving a NOT gate. In addition, we will use the two composite pulse sequences CORPSE and short CORPSE [32, 33], which assume here the form
$$a_{C\pi}(t) = \begin{cases} a_{\max}, & \text{for } 0 < t' < \pi/3 \\ -a_{\max}, & \text{for } \pi/3 \le t' \le 2\pi \\ a_{\max}, & \text{for } 2\pi < t' < 13\pi/3, \end{cases} \tag{26}$$
$$a_{SC\pi}(t) = \begin{cases} -a_{\max}, & \text{for } 0 < t' < \pi/3 \\ a_{\max}, & \text{for } \pi/3 \le t' \le 2\pi \\ -a_{\max}, & \text{for } 2\pi < t' < 7\pi/3, \end{cases} \tag{27}$$
respectively. Both of these pulse sequences correct for systematic error, CORPSE being more efficient.
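The gate fidelity of Eq. (20) can be evaluated for the three reference pulses under a purely systematic offset, where the evolution is unitary and $E_a\{\sigma_k\} = W\sigma_k W^{\dagger}$. The following sketch assumes $\hbar = 1$, $a_{\max} = 1$, and the normalization $H = \frac{1}{2}(a\sigma_x + \Delta\sigma_z)$; segment durations are read off Eqs. (25) to (27), and all names are illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def segment(a, delta, t):
    """exp(-i H t) for H = (a*SX + delta*SZ)/2 with hbar = 1 (requires a != 0 or delta != 0)."""
    w = 0.5 * np.hypot(a, delta)
    n = 0.5 * (a * SX + delta * SZ) / w
    return np.cos(w * t) * I2 - 1j * np.sin(w * t) * n

def propagator(pulse, delta):
    """Time-ordered product over (amplitude, duration) segments."""
    W = I2
    for a, t in pulse:
        W = segment(a, delta, t) @ W
    return W

def gate_fidelity(U, W):
    """Eq. (20) for the unitary channel rho -> W rho W^dagger."""
    total = 0.0
    for s in (SX, SY, SZ):
        total += np.real(np.trace(U @ s @ U.conj().T @ W @ s @ W.conj().T))
    return 0.5 + total / 12.0

PI_PULSE = [(1.0, np.pi)]
CORPSE = [(1.0, np.pi / 3), (-1.0, 5 * np.pi / 3), (1.0, 7 * np.pi / 3)]
SHORT_CORPSE = [(-1.0, np.pi / 3), (1.0, 5 * np.pi / 3), (-1.0, np.pi / 3)]

delta = 0.05
phi_pi = gate_fidelity(SX, propagator(PI_PULSE, delta))
phi_corpse = gate_fidelity(SX, propagator(CORPSE, delta))
phi_short = gate_fidelity(SX, propagator(SHORT_CORPSE, delta))
```

Because Eq. (20) is insensitive to a global phase, all three sequences reach unit fidelity at zero offset, and both composite sequences outperform the plain π pulse for a small static offset.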
However, the operation time of short CORPSE is much shorter than that of CORPSE, and hence it can yield higher fidelities in the presence of noise.

Figure 6 shows the NOT gate fidelities obtained by the reference and optimized pulses in the presence of the same 1/f noise as employed in the analysis of quantum memory in Sec. IV. We observe that for long enough correlation times, the composite pulse sequences provide good error correction. Furthermore, as observed earlier for RTN [24], for intermediate correlation times, short CORPSE achieves the highest fidelity among the reference pulses. Figure 7 presents the pulse sequences obtained from the numerical optimizations for three different values of the noise correlation time $\tau_c$. For the optimized pulse sequence, we find a transition from an approximately constant pulse to a short CORPSE-like pulse sequence at the characteristic correlation time $\tau_c \approx 50\hbar/a_{\max}$. This change in optimal pulse sequence is responsible for the apparent discontinuity in the first derivative of the fidelity curve in Fig. 6.

FIG. 6: NOT gate fidelities as functions of the characteristic noise correlation time $\tau_c$, in units of $\hbar/a_{\max}$, for a π pulse (dotted), CORPSE (dash-dotted), short CORPSE (dashed), and gradient optimized pulse sequence (solid). The 1/f noise is generated as in Fig. 2.

These results for the generation of NOT gates under 1/f noise are qualitatively quite similar to the previous results presented in Refs. [24, 29] for a single RTN. This similarity is due to the fact that 1/f noise can be regarded as arising from a sum of independent RTN fluctuators, each of which has a similar fidelity dependence on its correlation time. Note that the scale for the reference correlation time $\tau_c$ of the fidelity obtained in presence of 1/f noise in Fig.
6 is somewhat different from the corresponding scale for the correlation time of a single RTN source, since the 1/f noise involves an ensemble of RTN fluctuators with a range of correlation times.

VI. CONCLUSIONS

We have studied a single qubit under the influence of 1/f^α noise for $2 > \alpha > 0$ and investigated how decoherence due to this noise can be suppressed in the implementation of single qubit operations. We presented an efficient way to approximate the noise with a discrete multi-state Markovian fluctuator. Due to this finding, the average temporal evolution of the qubit density matrix under 1/f^α noise can be efficiently determined from a deterministic master equation.

Employing these exact deterministic master equations describing the temporal evolution of the qubit density operator under Markovian noise, we applied a gradient based optimization procedure to search for optimal control pulses implementing quantum operations. In particular, we studied the physical application of quantum memory, i.e., the identity operator, which is a fundamental concept in the realization of a quantum computer. The optimized control pulses significantly improved the fidelity over several reference sequences such as 2π, CORPSE, CPMG, and zero pulses. We observe peaks on fidelity curves corresponding to integer multiples of $2\pi\hbar/a_{\max}$ in the total durations of control pulses, where $a_{\max}$ is the maximum magnitude of the external control field. We also studied the performance of optimal control pulses under 1/f^α noise for several different values of $2 > \alpha \ge 1$, and found a monotonic behavior in the noise frequency as a function of α, i.e., the fidelity curves are scaled down in the correlation time for increasing α. We also investigated how the fidelities degrade as the noise strength increases.

FIG. 7: Optimized pulse sequences yielding the highest gate fidelities for correlation times (a) $45\hbar/a_{\max}$, (b) $100\hbar/a_{\max}$ and (c) $150\hbar/a_{\max}$ corresponding to Fig. 6.
For the generation of high-fidelity NOT gates, we obtained results showing qualitatively similar behavior to the previous results presented in Refs. [24, 29] for a single RTN source. In particular, just as for a single noise source, in the presence of 1/f^α noise we observed a transition in the optimal control pulse sequence from a constant pulse to a CORPSE-like sequence as the noise characteristic correlation time $\tau_c$ is increased.

This approach of coupled master equations indexed by noise states of the environment, together with an optimization technique for pulse design, can be readily generalized to multiple qubits evolving in the presence of 1/f^α noise and other Markovian noise sources. Furthermore, it can be used to develop realistic pulse sequences for mitigation of nuclear spin and surface magnetic noise acting on donor spins implanted in silicon [11], as well as for suppression of background charge noise acting on superconducting qubits [4]. In future, we will study the implementation of multi-qubit gates, e.g., the controlled NOT gate, in noisy systems and the swapping of quantum information from a noisy qubit to long term quantum memory. We will also consider more realistic noise with 1/f^α spectrum over many frequency decades.

Acknowledgments

This work was supported by the Academy of Finland, the National Security Agency (NSA) under MOD713106A and by the NSF ITR program under grant number EIA-0205641. M. M. and V. B. acknowledge the Finnish Cultural Foundation, M. M. the Väisälä Foundation and Magnus Ehrnrooth Foundation for the financial support. We thank J. Clarke for insightful discussions.

[1] L. Faoro and L. Viola, Phys. Rev. Lett. 92, 117905 (2004).
[2] E. Paladino, L. Faoro, G. Falci, and R. Fazio, Phys. Rev. Lett. 88, 228304 (2002).
[3] G. Falci, A. D'Arrigo, A. Mastellone, and E. Paladino, Phys. Rev. A 70, 040101 (2004).
[4] O. Astafiev, Y. A. Pashkin, Y. Nakamura, T. Yamamoto, and J. S. Tsai, Phys. Rev. Lett. 96, 137001 (2006).
[5] O.
Astafiev, Y. A. Pashkin, Y. Nakamura, T. Yamamoto, and J. S. Tsai, Phys. Rev. Lett. 93, 267007 (2004).
[6] F. C. Wellstood, C. Urbina, and J. Clarke, Appl. Phys. Lett. 85, 5296 (2004).
[7] M. Mück, M. Korn, C. G. A. Mugford, J. B. Kycia, and J. Clarke, Appl. Phys. Lett. 86, 012610 (2005).
[8] T. M. Eiles, R. L. Kautz, and J. M. Martinis, Appl. Phys. Lett. 61, 237 (1992).
[9] Y. Nakamura, Y. A. Pashkin, T. Yamamoto, and J. S. Tsai, Physica Scripta 102, 155 (2002).
[10] R. de Sousa, unpublished, cond-mat/0610716 (2006).
[11] T. Schenkel, J. A. Liddle, A. Persaud, A. M. Tyryshkin, S. A. Lyon, R. de Sousa, K. B. Whaley, J. B. J. Shangkuan, and I. Chakarov, Appl. Phys. Lett. 8, 11201 (2006).
[12] R. de Sousa et al., unpublished (2007).
[13] Y. Nakamura, Y. A. Pashkin, T. Yamamoto, and J. S. Tsai, Phys. Rev. Lett. 88, 047901 (2002).
[14] Y. M. Galperin, B. L. Altshuler, J. Bergli, and D. V. Shantsev, Phys. Rev. Lett. 96, 097009 (2006).
[15] B. Savo, F. C. Wellstood, and J. Clarke, Appl. Phys. Lett. 50, 1757 (1987).
[16] R. T. Wakai and D. J. V. Harlingen, Phys. Rev. Lett. 58, 1687 (1987).
[17] T. Fujisawa and Y. Hirayama, Appl. Phys. Lett. 77, 543 (2000).
[18] C. Kurdak, C.-J. Chen, D. C. Tsui, S. Parihar, S. Lyon, and G. W. Weimann, Phys. Rev. Lett. 56, 9813 (1997).
[19] R. de Sousa, K. B. Whaley, F. K. Wilhelm, and J. von Delft, Phys. Rev. Lett. 95, 247006 (2005).
[20] L. Viola and S. Lloyd, Phys. Rev. A 58, 2733 (1998).
[21] L. Viola, S. Lloyd, and E. Knill, Phys. Rev. Lett. 83, 4888 (1999).
[22] A. G. Kofman and G. Kurizki, Phys. Rev. Lett. 87, 270405 (2001).
[23] A. G. Kofman and G. Kurizki, Phys. Rev. Lett. 93, 130406 (2004).
[24] M. Möttönen, R. de Sousa, J. Zhang, and K. B. Whaley, Phys. Rev. A 73, 022332 (2006).
[25] M. B. Weissman, Rev. Mod. Phys. 60, 537 (1988).
[26] E. Paladino, L. Faoro, G. Falci, and R. Fazio, Phys. Rev. Lett. 88, 228304 (2002).
[27] B. Kaulakys, V. Gontis, and M. Alaburda, Phys. Rev. E 71, 051105 (2005).
[28] Y. M. Galperin, B. L. Altshuler, J.
Bergli, and D. V. Shantsev, Phys. Rev. Lett. 96, 097009 (2006).
[29] O.-P. Saira, V. Bergholm, T. Ojanen, and M. Möttönen, Phys. Rev. A 75, 012308 (2007).
[30] M. J. Kirton and M. J. Uren, Advances in Physics 38 (1989).
[31] N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, J. Mag. Res. 172, 296 (2005).
[32] H. K. Cummins and J. A. Jones, New J. Phys. 2, 1 (2000).
[33] H. K. Cummins, G. Llewellyn, and J. A. Jones, Phys. Rev. A 67, 042308 (2003).
[34] S. Meiboom and D. Gill, Rev. Sci. Instr. 29, 688 (1958).
[35] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach (Morgan Kaufmann, 2006).

ABSTRACT We investigate the generation of quantum operations for one-qubit systems under classical noise with 1/f^\alpha power spectrum, where 2>\alpha > 0. We present an efficient way to approximate the noise with a discrete multi-state Markovian fluctuator. With this method, the average temporal evolution of the qubit density matrix under 1/f^\alpha noise can be feasibly determined from recently derived deterministic master equations. We obtain qubit operations such as quantum memory and the NOT gate to high fidelity by a gradient based optimization algorithm. For the NOT gate, the computed fidelities are qualitatively similar to those obtained earlier for random telegraph noise. In the case of quantum memory, however, we observe a nonmonotonic dependency of the fidelity on the operation time, yielding a natural access rate of the memory. <|endoftext|><|startoftext|> Introduction

We consider the flow of an incompressible fluid in an open bounded set $\Omega \subset \mathbb{R}^2$ during the time interval $[0, T]$. The velocity field $\mathbf{u} : \Omega \times [0, T] \to \mathbb{R}^2$ and the pressure field $p : \Omega \times [0, T] \to \mathbb{R}$ satisfy the Navier-Stokes equations
$$\partial_t \mathbf{u} - \frac{1}{Re}\,\Delta \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} + \nabla p = \mathbf{f}, \tag{1.1}$$
$$\mathrm{div}\, \mathbf{u} = 0, \tag{1.2}$$
with the boundary and initial conditions $\mathbf{u}|_{\partial\Omega} = 0$, $\mathbf{u}|_{t=0} = \mathbf{u}_0$. The terms $\Delta \mathbf{u}$ and $(\mathbf{u}\cdot\nabla)\mathbf{u}$ are respectively associated with the physical phenomena of diffusion and convection.
The Reynolds number $Re$ measures the influence of convection in the flow. For equations (1.1)–(1.2), finite element and finite difference methods are well known and mathematical studies are available (see [10] for example). Numerous computations have also been conducted with finite volume schemes (e.g. [14] and [1]). However, in this case, few mathematical results are available. Let us cite Eymard and Herbin [7] and Eymard, Latché and Herbin [8]. In order to deal with the incompressibility constraint (1.2), these works use a penalization method. Another way is to use the projection methods which have been introduced by Chorin [4] and Temam [15]. This is the case in Faure [9]. In that work, however, the mesh is made of squares, so that the geometry of the problem is limited. Therefore, we introduce in what follows a finite volume scheme on triangular meshes for equations (1.1)–(1.2), using a projection method. An interesting feature of this scheme is that the unknowns for the velocity and pressure are both piecewise constant (colocated scheme). It leads to economical computer storage, and allows an easy generalization of the scheme to the 3D case.

Received by the editors April 1, 2007 and, in revised form, April 1, 2007.
2000 Mathematics Subject Classification. 76M12, 76B99.
http://arxiv.org/abs/0704.0772v1
S. ZIMMERMANN

The layout of the article is the following. We first introduce (section 2) some notations and hypotheses on the mesh. We define (section 2.2) the spaces we use to approximate the velocity and pressure. We define also (section 2.3) the operators we use to approximate the differential operators in (1.1)–(1.2). Combining this with a projection method, we build the scheme in section 3. In order to provide a mathematical analysis for the scheme, we prove in section 4 that the differential operators in (1.1)–(1.2) and their discrete counterparts share similar properties.
In particular, the discrete operators for the gradient and the divergence are adjoint. Also, the discrete gradient operator is a consistent approximation of its continuous counterpart. The discrete operator for the convection term is positive, stable and consistent. The discrete operator for the divergence satisfies an inf-sup (Babuška-Brezzi) condition. From these properties we deduce in section 5 the stability of the scheme.

We conclude with some notations. The spaces $(L^2, |.|)$ and $(L^\infty, \|.\|_\infty)$ are the usual Lebesgue spaces and we set $L^2_0 = \{q \in L^2 \,;\, \int_\Omega q(x)\, dx = 0\}$. Their vectorial counterparts are $(\mathbf{L}^2, |.|)$ and $(\mathbf{L}^\infty, \|.\|_\infty)$ with $\mathbf{L}^2 = (L^2)^2$ and $\mathbf{L}^\infty = (L^\infty)^2$. For $k \in \mathbb{N}^*$, $(H^k, \|\cdot\|_k)$ is the usual Sobolev space. Its vectorial counterpart is $(\mathbf{H}^k, \|.\|_k)$ with $\mathbf{H}^k = (H^k)^2$. For $k = 1$, the functions of $\mathbf{H}^1$ with a null trace on the boundary form the space $\mathbf{H}^1_0$. Also, we set $\nabla \mathbf{u} = (\nabla u_1, \nabla u_2)^T$ if $\mathbf{u} = (u_1, u_2) \in \mathbf{H}^1$. If $X \subset \mathbf{L}^2$ is a Banach space, we define $C(0, T; X)$ (resp. $L^2(0, T; X)$) as the set of the mappings $g : [0, T] \to X$ such that $t \to |g(t)|$ is continuous (resp. square integrable). The norms $\|.\|_{C(0,T;X)}$ and $\|.\|_{L^2(0,T;X)}$ are defined respectively by
$$\|g\|_{C(0,T;X)} = \sup_{t\in[0,T]} |g(t)| \quad \text{and} \quad \|g\|_{L^2(0,T;X)} = \left(\int_0^T |g(t)|^2\, dt\right)^{1/2}.$$
In all calculations, $C$ is a generic positive constant, depending only on $\Omega$, $\mathbf{u}_0$ and $\mathbf{f}$.

2. Discrete setting

First, we introduce the spaces and the operators needed to build the scheme.

2.1. The mesh. Let $\mathcal{T}_h$ be a triangular mesh of $\Omega$: $\overline{\Omega} = \cup_{K\in\mathcal{T}_h} K$. For each triangle $K \in \mathcal{T}_h$, we denote by $|K|$ its area and by $\mathcal{E}_K$ the set of its edges. If $\sigma \in \mathcal{E}_K$, $\mathbf{n}_{K,\sigma}$ is the unit vector normal to $\sigma$ pointing outward of $K$. The set of edges of the mesh is $\mathcal{E}_h = \cup_{K\in\mathcal{T}_h} \mathcal{E}_K$. The length of an edge $\sigma \in \mathcal{E}_h$ is $|\sigma|$ and its midpoint is $\mathbf{x}_\sigma$. The set of edges located inside $\Omega$ (resp. on its boundary) is $\mathcal{E}_h^{int}$ (resp. $\mathcal{E}_h^{ext}$): $\mathcal{E}_h = \mathcal{E}_h^{int} \cup \mathcal{E}_h^{ext}$. If $\sigma \in \mathcal{E}_h^{int}$, $K_\sigma$ and $L_\sigma$ are the triangles sharing $\sigma$ as an edge. If $\sigma \in \mathcal{E}_h^{ext}$, only the triangle $K_\sigma$ inside $\Omega$ is defined.
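We denote by $\mathbf{x}_K$ the circumcenter of a triangle $K$. We assume that all interior angles of the triangles of the mesh are below $\pi/2$, so that $\mathbf{x}_K \in K$. If $\sigma \in \mathcal{E}_h^{int}$ (resp. $\sigma \in \mathcal{E}_h^{ext}$) we set $d_\sigma = d(\mathbf{x}_{K_\sigma}, \mathbf{x}_{L_\sigma})$ (resp. $d_\sigma = d(\mathbf{x}_\sigma, \mathbf{x}_{K_\sigma})$). We define for all edges $\sigma \in \mathcal{E}_h$
$$\tau_\sigma = \frac{|\sigma|}{d_\sigma}. \tag{2.1}$$
The maximum circumradius of the triangles of the mesh is $h$. We assume ([6] p. 776) that there exists $C > 0$ such that
$$\forall \sigma \in \mathcal{E}_h, \quad d(\mathbf{x}_{K_\sigma}, \sigma) \ge C|\sigma| \quad \text{and} \quad |\sigma| \ge C h.$$
It implies that there exists $C > 0$ such that
$$\forall \sigma \in \mathcal{E}_h, \quad \tau_\sigma \ge C, \tag{2.2}$$
and for all triangles $K \in \mathcal{T}_h$ we have (with $\sigma \in \mathcal{E}_K$ and $h_{K,\sigma}$ the matching altitude)
$$|K| = \frac{1}{2}|\sigma|\, h_{K,\sigma} \ge \frac{1}{2}|\sigma|\, d(\mathbf{x}_K, \mathbf{x}_\sigma) \ge C h^2. \tag{2.3}$$
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS
Lastly, if $K \in \mathcal{T}_h$ and $L \in \mathcal{T}_h$ are two triangles sharing the edge $\sigma \in \mathcal{E}_h^{int}$, we define
$$\alpha_{K,L} = \frac{d(\mathbf{x}_L, \mathbf{x}_\sigma)}{d(\mathbf{x}_K, \mathbf{x}_L)}.$$
Let us notice that $\alpha_{K,L} \in [0, 1]$ and $\alpha_{K,L} + \alpha_{L,K} = 1$.

2.2. The discrete spaces. We first define
$$P_0 = \{q \in L^2 \,;\, \forall K \in \mathcal{T}_h,\ q|_K \text{ is a constant}\}, \qquad \mathbf{P}_0 = (P_0)^2.$$
For the sake of concision, we set for all $q_h \in P_0$ (resp. $\mathbf{v}_h \in \mathbf{P}_0$) and all triangles $K \in \mathcal{T}_h$: $q_K = q_h|_K$ (resp. $\mathbf{v}_K = \mathbf{v}_h|_K$). Although $\mathbf{P}_0 \not\subset \mathbf{H}^1$, we define the discrete equivalent of an $H^1$ norm as follows. For all $\mathbf{v}_h \in \mathbf{P}_0$ we set
$$\|\mathbf{v}_h\|_h = \left(\sum_{\sigma\in\mathcal{E}_h^{int}} \tau_\sigma\, |\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}|^2 + \sum_{\sigma\in\mathcal{E}_h^{ext}} \tau_\sigma\, |\mathbf{v}_{K_\sigma}|^2\right)^{1/2}, \tag{2.4}$$
where $\tau_\sigma$ is given by (2.1). We have [6] a Poincaré-like inequality for $\mathbf{P}_0$: there exists $C > 0$ such that for all $\mathbf{v}_h \in \mathbf{P}_0$
$$|\mathbf{v}_h| \le C\, \|\mathbf{v}_h\|_h. \tag{2.5}$$
We also have the following inverse inequality.

Proposition 2.1. There exists a constant $C > 0$ such that for all $\mathbf{v}_h \in \mathbf{P}_0$
$$h\, \|\mathbf{v}_h\|_h \le C\, |\mathbf{v}_h|.$$

Proof. According to (2.4)
$$h^2 \|\mathbf{v}_h\|_h^2 = \sum_{\sigma\in\mathcal{E}_h^{int}} h^2 \tau_\sigma\, |\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}|^2 + \sum_{\sigma\in\mathcal{E}_h^{ext}} h^2 \tau_\sigma\, |\mathbf{v}_{K_\sigma}|^2.$$
We deduce from (2.2) and (2.3) that $h^2 \tau_\sigma \le C |K_\sigma|$ and $h^2 \tau_\sigma \le C |L_\sigma|$. Thus, since $|\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}|^2 \le 2\left(|\mathbf{v}_{L_\sigma}|^2 + |\mathbf{v}_{K_\sigma}|^2\right)$, we get
$$h^2 \|\mathbf{v}_h\|_h^2 \le C \sum_{\sigma\in\mathcal{E}_h^{int}} \left(|K_\sigma|\, |\mathbf{v}_{K_\sigma}|^2 + |L_\sigma|\, |\mathbf{v}_{L_\sigma}|^2\right) + C \sum_{\sigma\in\mathcal{E}_h^{ext}} |K_\sigma|\, |\mathbf{v}_{K_\sigma}|^2.$$
Hence
$$h^2 \|\mathbf{v}_h\|_h^2 \le C \sum_{K\in\mathcal{T}_h} |K|\, |\mathbf{v}_K|^2 \le C\, |\mathbf{v}_h|^2.$$

From the norm $\|.\|_h$ we deduce a dual norm.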
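For all $\mathbf{v}_h \in \mathbf{P}_0$ we set
$$\|\mathbf{v}_h\|_{-1,h} = \sup_{\psi_h \in \mathbf{P}_0} \frac{(\mathbf{v}_h, \psi_h)}{\|\psi_h\|_h}. \tag{2.6}$$
For all $\mathbf{u}_h \in \mathbf{P}_0$ and $\mathbf{v}_h \in \mathbf{P}_0$ we have $(\mathbf{u}_h, \mathbf{v}_h) \le \|\mathbf{u}_h\|_{-1,h}\, \|\mathbf{v}_h\|_h$.

Now we introduce some operators on $P_0$ and $\mathbf{P}_0$. We define the projection operator $\Pi_{\mathbf{P}_0} : \mathbf{L}^2 \to \mathbf{P}_0$ as follows. For all $\mathbf{w} \in \mathbf{L}^2$, $\Pi_{\mathbf{P}_0}\mathbf{w} \in \mathbf{P}_0$ is given by
$$\forall K \in \mathcal{T}_h, \quad (\Pi_{\mathbf{P}_0}\mathbf{w})|_K = \frac{1}{|K|}\int_K \mathbf{w}(x)\, dx. \tag{2.7}$$
We easily check that for all $\mathbf{w} \in \mathbf{L}^2$ and $\mathbf{v}_h \in \mathbf{P}_0$ we have $(\Pi_{\mathbf{P}_0}\mathbf{w}, \mathbf{v}_h) = (\mathbf{w}, \mathbf{v}_h)$. It implies that $\Pi_{\mathbf{P}_0}$ is stable for the $\mathbf{L}^2$ norm. We define also the interpolation operator $\widetilde{\Pi}_{P_0} : H^2 \to P_0$. For all $q \in H^2$, $\widetilde{\Pi}_{P_0} q \in P_0$ is given by
$$\forall K \in \mathcal{T}_h, \quad \widetilde{\Pi}_{P_0} q|_K = q(\mathbf{x}_K).$$
According to the Sobolev embedding theorem, $q \in H^2$ is a.e. equal to a continuous function. Therefore the definition above makes sense. We also set $\widetilde{\Pi}_{\mathbf{P}_0} = (\widetilde{\Pi}_{P_0})^2$. The operator $\widetilde{\Pi}_{P_0}$ (resp. $\widetilde{\Pi}_{\mathbf{P}_0}$) is naturally stable for the $L^\infty$ (resp. $\mathbf{L}^\infty$) norm. One also checks ([2] and [16]) that there exists $C > 0$ such that
$$|\mathbf{v} - \Pi_{\mathbf{P}_0}\mathbf{v}| \le C h\, \|\mathbf{v}\|_1, \qquad |q - \widetilde{\Pi}_{P_0} q| \le C h\, \|q\|_2 \tag{2.8}$$
for all $\mathbf{v} \in \mathbf{H}^1$ and $q \in H^2$.

We introduce the finite element spaces
$$P_1^d = \{v \in L^2 \,;\, \forall K \in \mathcal{T}_h,\ v|_K \text{ is affine}\},$$
$$P_1^{nc} = \{v_h \in P_1^d \,;\, \forall \sigma \in \mathcal{E}_h^{int},\ v_h|_{K_\sigma}(\mathbf{x}_\sigma) = v_h|_{L_\sigma}(\mathbf{x}_\sigma)\},$$
$$\mathbf{P}_1^c = \{\mathbf{v}_h \in (P_1^d)^2 \,;\, \mathbf{v}_h \text{ is continuous and } \mathbf{v}_h|_{\partial\Omega} = 0\}.$$
We have $\mathbf{P}_1^c \subset \mathbf{H}^1_0$. We define the projection operator $\Pi_{\mathbf{P}_1^c} : \mathbf{H}^1_0 \to \mathbf{P}_1^c$. For all $\mathbf{v} = (v^1, v^2) \in \mathbf{H}^1_0$, $\Pi_{\mathbf{P}_1^c}\mathbf{v} = (v_h^1, v_h^2) \in \mathbf{P}_1^c$ is given by
$$\forall \phi_h = (\phi_h^1, \phi_h^2) \in \mathbf{P}_1^c, \quad \sum_{i=1}^{2} (\nabla v_h^i, \nabla \phi_h^i) = \sum_{i=1}^{2} (\nabla v^i, \nabla \phi_h^i).$$
The operator $\Pi_{\mathbf{P}_1^c}$ is stable for the $\mathbf{H}^1$ norm and ([2] p. 110) there exists $C > 0$ such that for all $\mathbf{v} \in \mathbf{H}^1_0$
$$|\mathbf{v} - \Pi_{\mathbf{P}_1^c}\mathbf{v}| \le C h\, \|\mathbf{v}\|_1. \tag{2.9}$$
Let us now address the space $P_1^{nc}$. If $q_h \in P_1^{nc}$, we usually have $\nabla q_h \notin \mathbf{L}^2$. Thus we define the operator $\widetilde{\nabla}_h : P_1^{nc} \to \mathbf{P}_0$ by setting for all $q_h \in P_1^{nc}$ and all $K \in \mathcal{T}_h$
$$\widetilde{\nabla}_h q_h|_K = \frac{1}{|K|}\int_K \nabla q_h\, dx.$$
The associated norm is given by
$$\|q_h\|_{1,h} = \left(|q_h|^2 + |\widetilde{\nabla}_h q_h|^2\right)^{1/2}.$$
We also have a Poincaré inequality: there exists $C > 0$ such that for all $q_h \in P_1^{nc} \cap L^2_0$
$$|q_h| \le C\, |\widetilde{\nabla}_h q_h|. \tag{2.10}$$
We define the projection operator $\Pi_{P_1^{nc}} : L^2 \to P_1^{nc}$. For all $q \in L^2$, $\Pi_{P_1^{nc}} q \in P_1^{nc}$ is given by
$$\forall \phi_h \in P_1^{nc}, \quad (\Pi_{P_1^{nc}} q, \phi_h) = (q, \phi_h). \tag{2.11}$$
We have the following result.

Proposition 2.2.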
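If $q_h \in P_0$, $\Pi_{P^{nc}_1} q_h$ is given by

$$\forall \sigma \in \mathcal{E}^{int}_h, \quad (\Pi_{P^{nc}_1} q_h)(\mathbf{x}_\sigma) = \frac{|K_\sigma|}{|K_\sigma| + |L_\sigma|}\,q_{K_\sigma} + \frac{|L_\sigma|}{|K_\sigma| + |L_\sigma|}\,q_{L_\sigma},$$
$$\forall \sigma \in \mathcal{E}^{ext}_h, \quad (\Pi_{P^{nc}_1} q_h)(\mathbf{x}_\sigma) = q_{K_\sigma}.$$

Proof. For every edge $\sigma \in \mathcal{E}_h$, we define the function $\psi_\sigma \in P^{nc}_1$ by setting $\psi_\sigma(\mathbf{x}_{\sigma'}) = 1$ if $\sigma = \sigma'$, and $0$ otherwise. Let us notice that $\psi_\sigma$ vanishes outside $K_\sigma \cup L_\sigma$ if $\sigma \in \mathcal{E}^{int}_h$ and outside $K_\sigma$ if $\sigma \in \mathcal{E}^{ext}_h$. Let $\sigma \in \mathcal{E}^{int}_h$. Using a quadrature formula we get

$$(\Pi_{P^{nc}_1} q_h, \psi_\sigma) = \frac{|K_\sigma| + |L_\sigma|}{3}\,(\Pi_{P^{nc}_1} q_h)(\mathbf{x}_\sigma), \qquad (q_h, \psi_\sigma) = \frac{|K_\sigma|}{3}\,q_{K_\sigma} + \frac{|L_\sigma|}{3}\,q_{L_\sigma}.$$

For an edge $\sigma \in \mathcal{E}^{ext}_h$ we have

$$(\Pi_{P^{nc}_1} q_h, \psi_\sigma) = \frac{|K_\sigma|}{3}\,(\Pi_{P^{nc}_1} q_h)(\mathbf{x}_\sigma) \quad \text{and} \quad (q_h, \psi_\sigma) = \frac{|K_\sigma|}{3}\,q_{K_\sigma}.$$

By plugging these equations into (2.11) with $\phi = \psi_\sigma$, we get the result.

We finally introduce the Raviart-Thomas spaces

$$\mathbf{RT}^d_0 = \{ \mathbf{v}_h \in (P^d_1)^2 \,;\ \forall \sigma \in \mathcal{E}_K,\ \mathbf{v}_h|_K \cdot \mathbf{n}_{K,\sigma} \text{ is a constant, and } \mathbf{v}_h \cdot \mathbf{n}|_{\partial\Omega} = 0 \},$$
$$\mathbf{RT}_0 = \{ \mathbf{v}_h \in \mathbf{RT}^d_0 \,;\ \forall K \in \mathcal{T}_h,\ \forall \sigma \in \mathcal{E}_K,\ \mathbf{v}_h|_{K_\sigma} \cdot \mathbf{n}_{K_\sigma,\sigma} = \mathbf{v}_h|_{L_\sigma} \cdot \mathbf{n}_{K_\sigma,\sigma} \}.$$

For all $\mathbf{v}_h \in \mathbf{RT}_0$, $K \in \mathcal{T}_h$ and $\sigma \in \mathcal{E}_K$ we set $(\mathbf{v}_h \cdot \mathbf{n}_{K,\sigma})_\sigma = \mathbf{v}_h|_K \cdot \mathbf{n}_{K,\sigma}$. We define the operator $\Pi_{\mathbf{RT}_0} : \mathbf{H}^1 \to \mathbf{RT}_0$. For all $\mathbf{v} \in \mathbf{H}^1$, $\Pi_{\mathbf{RT}_0}\mathbf{v} \in \mathbf{RT}_0$ is given by

$$\forall K \in \mathcal{T}_h,\ \forall \sigma \in \mathcal{E}_K, \quad (\Pi_{\mathbf{RT}_0}\mathbf{v} \cdot \mathbf{n}_{K,\sigma})_\sigma = \frac{1}{|\sigma|} \int_\sigma \mathbf{v} \cdot \mathbf{n}_{K,\sigma}\,d\sigma. \tag{2.12}$$

One checks [3] that there exists $C > 0$ such that for all $\mathbf{v} \in \mathbf{H}^1$

$$|\mathbf{v} - \Pi_{\mathbf{RT}_0}\mathbf{v}| \le C\,h\,\|\mathbf{v}\|_1. \tag{2.13}$$

The following result will be useful.

Proposition 2.3. For all $\mathbf{v} \in \mathbf{H}^1$ such that $\mathrm{div}\,\mathbf{v} = 0$, we have $\Pi_{\mathbf{RT}_0}\mathbf{v} \in \mathbf{P}_0$.

Proof. Let $\mathbf{v}_h = \Pi_{\mathbf{RT}_0}\mathbf{v}$ and $K \in \mathcal{T}_h$. According to [3] there exist $\mathbf{a}_K \in \mathbb{R}^2$ and $b_K \in \mathbb{R}$ such that: $\forall \mathbf{x} \in K$, $\mathbf{v}_h(\mathbf{x}) = \mathbf{a}_K + b_K\,\mathbf{x}$. Thus $\mathrm{div}\,\mathbf{v}_h|_K = 2\,b_K$. On the other hand, according to the divergence formula and (2.12)

$$\int_K \mathrm{div}\,\mathbf{v}\,d\mathbf{x} = \int_{\partial K} \mathbf{v} \cdot \mathbf{n}\,d\gamma = \int_{\partial K} \mathbf{v}_h \cdot \mathbf{n}\,d\gamma = \int_K \mathrm{div}\,\mathbf{v}_h\,d\mathbf{x}.$$

Hence $b_K = 0$ and we get: $\forall \mathbf{x} \in K$, $\mathbf{v}_h(\mathbf{x}) = \mathbf{a}_K$.

2.3. The discrete operators. The equations (1.1)–(1.2) use the differential operators gradient, divergence and laplacian. Using the spaces of section 2.2 we define their discrete counterparts. The discrete gradient $\nabla_h : P_0 \to \mathbf{P}_0$ is built using a linear interpolation on the edges of the mesh (see [16] for details). This kind of construction has also been considered in [5].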
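We set for all $q_h \in P_0$ and all $K \in \mathcal{T}_h$

$$\nabla_h q_h|_K = \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\big( \alpha_{K_\sigma,L_\sigma}\,q_{K_\sigma} + \alpha_{L_\sigma,K_\sigma}\,q_{L_\sigma} \big)\,\mathbf{n}_{K,\sigma} + \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{ext}_h} |\sigma|\,q_{K_\sigma}\,\mathbf{n}_{K,\sigma}. \tag{2.14}$$

We have the following result [16].

Proposition 2.4. If $q_h \in P_0 \cap L^2_0$ is such that $\nabla_h q_h = 0$, then $q_h = 0$.

The discrete divergence operator $\mathrm{div}_h : \mathbf{P}_0 \to P_0$ is built so that it is adjoint to the operator $\nabla_h$ (proposition 4.6 below). We set for all $\mathbf{v}_h \in \mathbf{P}_0$ and all $K \in \mathcal{T}_h$

$$\mathrm{div}_h \mathbf{v}_h|_K = \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\big( \alpha_{L_\sigma,K_\sigma}\,\mathbf{v}_{K_\sigma} + \alpha_{K_\sigma,L_\sigma}\,\mathbf{v}_{L_\sigma} \big) \cdot \mathbf{n}_{K,\sigma}. \tag{2.15}$$

The first discrete laplacian $\Delta_h : P_0 \to P_0$ ensures that the incompressibility constraint (1.2) is satisfied in a discrete sense (proposition 3.1). We set for all $q_h \in P_0$

$$\Delta_h q_h = \mathrm{div}_h(\nabla_h q_h). \tag{2.16}$$

The second discrete laplacian $\tilde{\Delta}_h : \mathbf{P}_0 \to \mathbf{P}_0$ is the usual operator in finite volume schemes [6]. We set for all $\mathbf{v}_h \in \mathbf{P}_0$ and all $K \in \mathcal{T}_h$

$$\tilde{\Delta}_h \mathbf{v}_h|_K = \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} \tau_\sigma\,(\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}) - \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{ext}_h} \tau_\sigma\,\mathbf{v}_{K_\sigma}.$$

In order to approximate the convection term $(\mathbf{u} \cdot \nabla)\mathbf{u}$ in (1.1) we define a bilinear form $\tilde{\mathbf{b}}_h : \mathbf{P}_0 \times \mathbf{P}_0 \to \mathbf{P}_0$ using the well-known upwind scheme ([6] p. 766). For all $\mathbf{u}_h \in \mathbf{P}_0$, $\mathbf{v}_h \in \mathbf{P}_0$, and all $K \in \mathcal{T}_h$ we have

$$\tilde{\mathbf{b}}_h(\mathbf{u}_h, \mathbf{v}_h)|_K = \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\big( (\mathbf{u}_\sigma \cdot \mathbf{n}_{K,\sigma})^+\,\mathbf{v}_K + (\mathbf{u}_\sigma \cdot \mathbf{n}_{K,\sigma})^-\,\mathbf{v}_{L_\sigma} \big). \tag{2.17}$$

We have set $\mathbf{u}_\sigma = \alpha_{L_\sigma,K_\sigma}\,\mathbf{u}_{K_\sigma} + \alpha_{K_\sigma,L_\sigma}\,\mathbf{u}_{L_\sigma}$ and $a^+ = \max(a, 0)$, $a^- = \min(a, 0)$ for all $a \in \mathbb{R}$. Lastly, we define the trilinear form $b_h : \mathbf{P}_0 \times \mathbf{P}_0 \times \mathbf{P}_0 \to \mathbb{R}$ as follows. For all $\mathbf{u}_h \in \mathbf{P}_0$, $\mathbf{v}_h \in \mathbf{P}_0$, $\mathbf{w}_h \in \mathbf{P}_0$, we set

$$b_h(\mathbf{u}_h, \mathbf{v}_h, \mathbf{w}_h) = \sum_{K \in \mathcal{T}_h} |K|\,\mathbf{w}_K \cdot \tilde{\mathbf{b}}_h(\mathbf{u}_h, \mathbf{v}_h)|_K. \tag{2.18}$$

3. The scheme

We have defined in section 2 the discretization in space. We now have to define a discretization in time, and treat the incompressibility constraint (1.2). We use a projection method to this end. This kind of method was introduced by Chorin [4] and Temam [15]. The basic idea is the following. The time interval $[0, T]$ is split with a time step $k$: $[0, T] = \bigcup_{n=0}^{N-1} [t_n, t_{n+1}]$ with $N \in \mathbb{N}^*$ and $t_n = n\,k$ for all $n \in \{0, \dots, N\}$. For all $m \in \{2, \dots, N\}$, we compute (see equation (3.2) below) a first velocity field $\tilde{\mathbf{u}}^m_h \simeq \mathbf{u}(t_m)$ using only equation (1.1). We use a second-order BDF scheme for the discretization in time.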
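The second-order BDF time discretization mentioned here can be tried on a scalar model problem. The sketch below is not the scheme of this paper: it only applies the standard BDF2 difference quotient $(3y^{n+1} - 4y^n + y^{n-1})/(2k)$ to the test equation $y' = -\lambda y$, with the first step taken from the exact solution (an assumption made to isolate the BDF2 error). The function names are hypothetical.

```python
import math


def bdf2_decay(lam, T, n_steps):
    """Integrate y' = -lam * y, y(0) = 1 on [0, T] with BDF2.

    Each step solves (3 y_next - 4 y_curr + y_prev) / (2 k) = -lam * y_next.
    The value y^1 is the exact solution at t = k (assumption of this sketch).
    """
    k = T / n_steps
    y_prev, y_curr = 1.0, math.exp(-lam * k)
    for _ in range(n_steps - 1):
        y_next = (4.0 * y_curr - y_prev) / (3.0 + 2.0 * k * lam)
        y_prev, y_curr = y_curr, y_next
    return y_curr


def observed_order(lam=1.0, T=1.0, n_steps=100):
    """Estimate the convergence order by halving the time step once."""
    exact = math.exp(-lam * T)
    err_coarse = abs(bdf2_decay(lam, T, n_steps) - exact)
    err_fine = abs(bdf2_decay(lam, T, 2 * n_steps) - exact)
    return math.log2(err_coarse / err_fine)


if __name__ == "__main__":
    # An observed order close to 2 is expected for a second-order method.
    print(observed_order())
```

Halving the time step should divide the error at the final time by roughly four, which is what the three-level formula is chosen for; it also explains why the scheme needs two initial velocity fields before the first BDF step.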
We then project ũmh (see equation (3.4) below) over a subspace of P0. We get a a pressure field p h ≃ p(tm) and a second velocity field umh ≃ u(tm), which fulfills the incompressibilty constraint (1.2) in a discrete sense. The algorithm goes as follows. First, for all m ∈ {0, . . . , N}, we set fmh = ΠP0 f(tm). Since the operator ΠP0 is stable for the L2-norm we get (3.1) |fmh | = |ΠP0 f(tm)| ≤ |f(tm)| ≤ ‖f‖C(0,T ;L2). We start with the initial values u0h ∈ P0 ∩RT0 , u1h ∈ P0 ∩RT0 p1h ∈ P0 ∩ L20. For all n ∈ {1, . . . , N}, (ũn+1h , p h ) is deduced from (ũ h) as follows. • ũn+1h ∈ P0 is given by (3.2) 3 ũn+1h − 4unh + u ∆̃hũ h +b̃h(2u h−un−1h , ũ h )+∇hp h = f • pn+1h ∈ Pnc1 ∩ L20 is the solution of (3.3) ∆h(p h − p divh ũ • un+1h ∈ P0 is deduced by (3.4) un+1h = ũ ∇h(pn+1h − p A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 7 Existence and unicity of a solution to equation (3.2) is classical ([6] for example). Let us show that equation (3.3) has also a unique solution. Let qh ∈ P0 ∩ L20 such that ∆hqh = 0. According to proposition 4.6 we have for all qh ∈ P0 −(∆hqh, qh) = − divh(∇hqh), qh = (∇hqh,∇hqh) = |∇hqh|2. Therefore we have ∇hqh = 0. Using proposition 2.4 we get qh = 0. We have thus proved the unicity of a solution for equation (3.3). It is also the case for the associated linear system. It implies that this linear system has indeed a solution. Hence it is also the case for equation (3.3). Let us now prove that for all m ∈ {0, . . . , N}, umh fulfills (1.2) in a discrete sense. Lemma 3.1. If vh ∈ RT0 ∩P0 then divh vh = 0. Proof. Let K ∈ Th. Since vh ∈ RT0, definition (2.15) reads divh vh|K = |σ| (αLσ ,K + αK,Lσ )vK · nK,σ. Since αKσ ,Lσ + αLσ,Kσ = 1 we conclude that divh vh|K = |σ|vK · nK,σ = vK · |σ|nK,σ Proposition 3.1. For all m ∈ {0, . . . , N} we have divh umh = 0. Proof. For m ∈ {0, 1} we have u0h ∈ P0 ∩ RT0 and u1h ∈ P0 ∩RT0. Applying the lemma above we get the result. If m ∈ {2, . . . 
, N}, we apply the operator divh to (3.3) and compare with (3.4). 4. Properties of the discrete operators We prove that the differential operators in (1.1)–(1.2) and the operators defined in section 2.3 share similar properties. 4.1. Properties of the discrete convective term. We define b̃ : H1 ×H1 → L2. For all u ∈ H1 and v = (v1, v2) ∈ H1 we set (4.1) b̃(u,v) = div(v1 u), div(v2 u) We show that the operator b̃h is a consistent approximation of b̃. Proposition 4.1. There exists a constant C > 0 such that for all v ∈ H2 and all u ∈ H2 ∩H10 satisfying divu = 0 ‖ΠP0b̃(u,v) − b̃h(ΠRT0u, Π̃P0v)‖−1,h ≤ C h ‖u‖2 ‖v‖1. Proof. Let uh = ΠRT0u and vh = Π̃P0v. According to proposition 2.3 we have uh ∈ P0. Let K ∈ Th. According to the divergence formula and (2.7) we have ΠP0b̃(u,v)|K = σ∈EK∩E v (u · n) dσ. On the other hand, let us rewrite b̃h(uh,vh). Let σ ∈ EK ∩ E inth . Setting vK,Lσ = vK si (uh · nK,σ)σ ≥ 0 vLσ si (uh · nK,σ)σ < 0 one checks that vK (uσ ·nK,σ)++vLσ (uσ ·nK,σ)− = vK,Lσ (uσ ·nK,σ). By definition uσ · nK,σ = αLσ,K (uK · nK,σ) + αK,Lσ (uLσ · nK,σ) ; since uh ∈ RT0 we get 8 S. ZIMMERMANN uσ · nK,σ = (αLσ ,K + αK,Lσ ) (uK · nK,σ) = (uK · nK,σ) = (uh · nK,σ)σ. Using at last (2.12), we deduce from (2.17) b̃h(uh,vh)|K = σ∈EK∩E vK,Lσ (u · nK,σ) dσ. ΠP0 b̃(u,v) − b̃h(uh,vh) σ∈EK∩E (v − vK,Lσ ) (u · n) dσ. Let ψh ∈ P0. We have ΠP0b̃(u,v) − b̃h(uh,vh),ψh σ∈EK∩E (v − vK,Lσ ) (u · n) dσ σ∈Eint (ψKσ −ψLσ) (v − vKσ ,Lσ) (u · n) dσ.(4.2) Let σ ∈ E inth . We want to estimate the integral over σ. Since we work in a two- dimensional domain, we have the Sobolev injection H2 ⊂ L∞. Thus (v − vKσ,Lσ) (u · n) dσ ∣∣∣∣ ≤ ‖u‖L∞ |v−vKσ ,Lσ | dσ ≤ C ‖u‖2 |v−vKσ ,Lσ | dσ. Let us first assume that v ∈ C1. We set xKσ,Lσ = xKσ si (uh · nK,σ)σ ≥ 0 xLσ si (uh · nK,σ)σ < 0 If x ∈ σ, we have the following Taylor expansion v(x)− vKσ,Lσ = v(x)−v(xKσ ,Lσ) = ∇v (tx+(1− t)xKσ ,Lσ) (x−xKσ ,Lσ) dt. We have |x−xKσ ,Lσ | ≤ h. 
Thus, integrating over σ and using the Cauchy-Schwarz inequality, we get |v − vKσ ,Lσ | dσ ≤ |∇v (tx+ (1− t)xKσ ,Lσ)|2 h t dt dσ We then use the change of variable (t,x) → y = tx + (1 − t)xKσ ,Lσ . Let Dσ be the quadrilateral domain given by the endpoints of σ, xKσ and xLσ . The domain [0, 1]× σ becomes DKσ,Lσ with DKσ,Lσ = Dσ ∩Kσ si (uh · nK,σ)σ ≥ 0 Dσ ∩ Lσ si (uh · nK,σ)σ < 0 For all t ∈ [0, 1] we have h t ≤ h t ≤ C d(xKσ ,Lσ , σ) t thanks to the hypothesis on the mesh. We check easily that d(xKσ ,Lσ , σ) t dt dσ = dy. Thus we get |v − vKσ ,Lσ | dσ ≤ C h DKσ,Lσ |∇v (y)|2 dy A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 9 Since (C1)2 is dense in H2, this estimate still holds for v ∈ H2. Plugging this estimate into (4.2) and using the Cauchy-Schwarz inequality we get ΠP0 b̃(u,v) − b̃h(ΠRT0u, Π̃P0v),ψh ≤ C h ‖u‖H2 σ∈Eint |ψLσ −ψKσ | σ∈Eint DKσ,Lσ |∇v (y)|2 dy so that ΠP0 b̃(u,v)− b̃h(ΠRT0u, Π̃P0v),ψh )∣∣∣ ≤ C h ‖u‖H2 ‖ψh‖1,h ‖v‖1. Using then definition (2.6), we get the result. Let us consider now the operator bh. Let u ∈ H1 and v ∈ L∞∩H1 with divu ≥ 0. Integrating by parts we deduce from (4.1): v · b̃(u,v) dx = divu dx ≥ 0. The discrete operator bh shares a similar property. Proposition 4.2. Let uh ∈ P0 such that divh uh ≥ 0. For all vh ∈ P0 we have bh(uh,vh,vh) ≥ 0. Proof. Remember that for all edges σ ∈ E inth , two triangles Kσ et Lσ share σ as an edge. We denote by Kσ the one such that uσ · nKσ,σ ≥ 0. Using the algebraic identity 2 a (a− b) = a2 − b2 + (a− b)2 we deduce from (2.18) 2 bh(uh,vh,vh) = 2 σ∈Eint |σ|vKσ · (vKσ − vLσ ) (uσ · nKσ,σ) σ∈Eint |vKσ|2 − |vLσ |2 + |vKσ − vLσ |2 (uσ · nKσ,σ) so that 2 bh(uh,vh,vh) ≥ σ∈Eint |vKσ|2−|vLσ |2 (uσ ·nKσ,σ). This sum can be written as a sum over the triangles of the mesh. We get 2 bh(uh,vh,vh) ≥ |vKσ |2 σ∈EK∩E |σ| (uσ · nKσ,σ). Using finally definition (2.15) we get 2 bh(uh,vh,vh) ≥ |K| |vK |2 (divh uh)|K ≥ 0. The following result states that the operator bh is stable for suitable norms. 
Proposition 4.3. There exists a constant C > 0 such that for all vh ∈ P0, wh ∈ P0, uh ∈ P0 satisfying divh uh = 0 |bh(uh,vh,vh)| ≤ C |uh| ‖vh‖h ‖vh‖h. Proof. For all triangle K ∈ Th and all edge σ ∈ EK ∩ E inth , we have (uσ · nK,σ)+ vK + (uσ · nK,σ)− vLσ = (uσ · nK,σ)vK − |(uσ · nK,σ)| (vLσ − vK). This way, we deduce from (4.7) bh(uh,vh,wh) = S1 + S2 with vK ·wK σ∈EK∩E |σ| (uσ · nK,σ) , S2 = − σ∈EK∩E |σ| |uσ · nK,σ| (vLσ − vK). 10 S. ZIMMERMANN By writing the sum over the edges as a sum over the triangles we get S2 = − σ∈Eint |σ| |uσ · nK,σ| (vLσ − vK) · (wLσ −wK). Using the Cauchy-Schwarz inequality we get |S2| ≤ h ‖uh‖∞ σ∈Eint |vLσ − vKσ |2 1/2  σ∈Eint |wLσ −wKσ |2 Since uh ∈ P0 we have the inverse inequality [6] h ‖uh‖∞ ≤ C |uh|. Using (2.2) and (2.4) we have σ∈Eint |vLσ − vKσ |2 ≤ C σ∈Eint τσ |vLσ − vKσ |2 ≤ C ‖vh‖2h σ∈Eint |wLσ −wKσ |2 ≤ C ‖wh‖2h. Therefore |S2| ≤ C |uh| ‖vh‖h ‖wh‖h. On the other hand we deduce from definition (2.15) |K| (vK ·wK) (divh uh)|K = 0. By combining the estimates for S1 and S2 we get the result. 4.2. Properties of the discrete gradient. Proposition 4.4. There exists a constant C > 0 such that for all qh ∈ P0: h |∇hqh| ≤ C |qh|. Proof. Using (2.14) and the Minkowski inequality, we have for all triangle K ∈ Th |K| |∇hqh |K |2 ≤ σ∈EK∩E 6 |σ|2 (q2K + q σ∈EK∩E 6 |σ|2 q2K . Let us sum over K ∈ Th. Since |σ| ≤ h, using (2.3), we get |∇hqh|2 ≤ σ∈EK∩E |K| q2K + |Lσ| q2Lσ σ∈EK∩E |K| q2K Thus h2 |∇hqh|2 ≤ C |K| q2K ≤ C |qh|2. We now prove that ∇h is a consistent approximation of the gradient. Proposition 4.5. There exists a constant C > 0 such that for all q ∈ H2 |ΠP0(∇q)−∇h(Π̃P0q)| ≤ C h ‖q‖2. Proof. Let K ∈ Th. Using the gradient formula and definition (2.14) we get ΠP0(∇q)−∇h(Π̃P0q) ∇q dx− |K| ∇h(Π̃P0q) where we have set for all edge σ ∈ EK ∩ E inth IσK = αK,Lσ q(xK) + αLσ,K q(xLσ ) nK,σ dσ A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 11 and for all edge σ ∈ EK ∩ Eexth : IσK = q − q(xK) nK,σ dσ. 
Squaring and using (2.3) we get ΠP0(∇q)−∇h(Π̃P0q) |IσK |2 ≤ |IσK |2. Summing over the triangles K ∈ Th we get (4.3) ∣∣∣ΠP0(∇q)−∇h(Π̃P0q) |IσK |2. We must estimate the integral terms IσK . Let K ∈ Th. Let us first assume that q ∈ C2(Ω). Let σ ∈ EK ∩ E inth . For x ∈ σ we have the following Taylor expansions q(xK) = q(x)+∇q(x) · (xK −x)+ H(q) (txK +(1− t)x)(xK −x) · (xK −x) t dt , q(xLσ) = q(x)+∇q(x)·(xLσ −x)+ H(q) (txLσ+(1−t)x)(xLσ−x)·(xLσ−x) t dt , ∇q(x) = ∇q(xK)− txK + (1− t)x (xK − x) dt. Plugging the last expansion into the two others and integrating over σ we get (4.4) q(xK)− q dσ = |σ| ∇q(xK) · (xK − xσ)−AσK +BσK , (4.5) q(xLσ )− q dσ = |σ| ∇q(xK) · (xLσ − xσ)−AσLσ + B We have set for T ∈ {Kσ, Lσ} (4.6) AσT = ∇∇q (txT + (1 − t)x) (xT − x) dt dσ , (4.7) BσT = H(q) (txT + (1− t)x)(xT − x) · (xT − x) t dt dσ. One can bound these terms as in the proof of proposition 4.1. We get (4.8) |AσT |2 ≤ C h2 |∇∇q (y)|2 dy , |BσT |2 ≤ C h4 |H(q)(y)|2 dy. Now, let us multiply (4.4) by −αK,Lσ nK,σ, (4.5) by −αLσ,K nK,σ and sum the equalities. Since αLσ,K + αK,Lσ = 1 we have −αLσ,K q(xK)− q nK,σ dσ − αK,Lσ q(xLσ )− q nK,σ dσ αKσ,Lσ q(xK,σ) + αLσ,Kσ q(xL,σ) nK,σ dσ = I On the other hand −αK,Lσ (xK −xσ) ·nK,σ −αLσ,K (xLσ −xσ) ·nK,σ = −αK,Lσ αLσ,K (dσ − dσ) = 0. Therefore we get IσK = −αLσ,K AσK +B nK,σ −αK,Lσ AσLσ +B nK,σ. Using estimates (4.8) we obtain |IσK |2 ≤ C h4 (|H(q)(y)|2 + |∇∇q(y)|2) dy. 12 S. ZIMMERMANN We now consider the case σ ∈ EK ∩ Eexth . For x ∈ σ we have q(xK) = q(x)+∇q(x) · (xK − x)+ H(q)(txK + (1− t)x)(xK − x) · (xK − x)tdt. Multiplying by nK,σ and integrating over σ, we get −IσK = JσK nK,σ+BσK nK,σ with JσK = ∇q(x) · (xK − x) dx. Since |xK − x| ≤ h if x ∈ σ, using a trace theorem, we have |JσK | ≤ C h2 ‖∇q‖L∞(σ) ≤ C h2 |∇q(y)|2 + |∇∇q(y)|2 By combining this estimate with (4.8), we get |IσK |2 ≤ 2 |Jσ|2 + 2 |BσK |2 ≤ C h4 |H(q)(y)|2 dy + C h4 (|∇q(y)|2 + |∇∇q(y)|2) dy. The space C2(Ω) is dense in H2. Therefore the bounds for IσK still hold for q ∈ H2. 
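Plugging these bounds into (4.3) we get the result.

4.3. Properties of the discrete divergence. The divergence and gradient operators are adjoint: if $q \in H^1$ and $\mathbf{v} \in \mathbf{H}^1$ with $\mathbf{v} \cdot \mathbf{n}|_{\partial\Omega} = 0$, we get $(\mathbf{v}, \nabla q) = -(q, \mathrm{div}\,\mathbf{v})$ by integrating by parts. For $\nabla_h$ and $\mathrm{div}_h$ we state

Proposition 4.6. If $\mathbf{v}_h \in \mathbf{P}_0$ and $q_h \in P_0$ we have: $(\mathbf{v}_h, \nabla_h q_h) = -(q_h, \mathrm{div}_h \mathbf{v}_h)$.

Proof. Using (2.14) one checks that $(\mathbf{v}_h, \nabla_h q_h) = \sum_{K \in \mathcal{T}_h} q_K\,(S_1 + S_2 + S_3)$ with

$$S_1 = \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\alpha_{K,L_\sigma}\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma}, \qquad S_2 = \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\alpha_{K,L_\sigma}\,\mathbf{v}_{L_\sigma} \cdot \mathbf{n}_{L_\sigma,\sigma},$$

and

$$S_3 = \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{ext}_h} |\sigma|\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma}.$$

Since $\alpha_{K,L_\sigma} + \alpha_{L_\sigma,K} = 1$ we have

$$S_1 = \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,(1 - \alpha_{L_\sigma,K})\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma} = \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma} - \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\alpha_{L_\sigma,K}\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma}.$$

Since $\mathbf{n}_{L_\sigma,\sigma} = -\mathbf{n}_{K,\sigma}$, we also have

$$S_2 = \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\alpha_{K,L_\sigma}\,\mathbf{v}_{L_\sigma} \cdot \mathbf{n}_{L_\sigma,\sigma} = -\sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,\alpha_{K,L_\sigma}\,\mathbf{v}_{L_\sigma} \cdot \mathbf{n}_{K,\sigma}.$$

Therefore

$$(\mathbf{v}_h, \nabla_h q_h) = -\sum_{K \in \mathcal{T}_h} q_K \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} |\sigma|\,(\alpha_{L_\sigma,K}\,\mathbf{v}_K + \alpha_{K,L_\sigma}\,\mathbf{v}_{L_\sigma}) \cdot \mathbf{n}_{K,\sigma} + \sum_{K \in \mathcal{T}_h} q_K \sum_{\sigma \in \mathcal{E}_K} |\sigma|\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma}.$$

Using definition (2.15) we get

$$(\mathbf{v}_h, \nabla_h q_h) = -\sum_{K \in \mathcal{T}_h} |K|\,q_K\,\mathrm{div}_h \mathbf{v}_h|_K + \sum_{K \in \mathcal{T}_h} q_K \sum_{\sigma \in \mathcal{E}_K} |\sigma|\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma}.$$

Since $\sum_{\sigma \in \mathcal{E}_K} |\sigma|\,\mathbf{v}_K \cdot \mathbf{n}_{K,\sigma} = \mathbf{v}_K \cdot \sum_{\sigma \in \mathcal{E}_K} |\sigma|\,\mathbf{n}_{K,\sigma} = 0$ we finally obtain

$$(\mathbf{v}_h, \nabla_h q_h) = -\sum_{K \in \mathcal{T}_h} q_K\,|K|\,\mathrm{div}_h \mathbf{v}_h|_K = -(q_h, \mathrm{div}_h \mathbf{v}_h).$$

The divergence operator and the spaces $L^2_0$, $\mathbf{H}^1_0$ satisfy the following property, called the inf-sup (or Babuška-Brezzi) condition (see [10] for example). There exists a constant $C > 0$ such that

$$\inf_{q \in L^2_0 \setminus \{0\}}\ \sup_{\mathbf{v} \in \mathbf{H}^1_0 \setminus \{0\}} \frac{-(q, \mathrm{div}\,\mathbf{v})}{\|\mathbf{v}\|_1\,|q|} \ge C. \tag{4.9}$$

We will now prove that the operator $\mathrm{div}_h$ and the spaces $P_0 \cap L^2_0$, $\mathbf{P}_0$ satisfy an analogous property. The proof is based on the following lemma.

Lemma 4.1. We assume that the mesh is uniform (i.e. the triangles of the mesh are equilateral). Then we have for all $q_h \in P_0$

$$\nabla_h q_h = \tilde{\nabla}_h(\Pi_{P^{nc}_1} q_h).$$

Proof. Since the mesh is uniform we have: $\forall \sigma \in \mathcal{E}^{int}_h$, $\alpha_{K_\sigma,L_\sigma} = \frac{1}{2}$. Let $K \in \mathcal{T}_h$. Using definition (2.14) and the gradient formula we get

$$\nabla_h q_h|_K - \tilde{\nabla}_h(\Pi_{P^{nc}_1} q_h)|_K = \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{int}_h} \frac{|\sigma|}{2}\,(q_{K_\sigma} + q_{L_\sigma})\,\mathbf{n}_{K,\sigma} + \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}^{ext}_h} |\sigma|\,q_{K_\sigma}\,\mathbf{n}_{K,\sigma} - \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K} \int_\sigma (\Pi_{P^{nc}_1} q_h)\,\mathbf{n}_{K,\sigma}\,d\sigma.$$

Since $q_h \in P_0$ we deduce from proposition 2.2

$$\int_\sigma \Pi_{P^{nc}_1} q_h\,d\sigma = |\sigma|\,(\Pi_{P^{nc}_1} q_h)(\mathbf{x}_\sigma) = \begin{cases} \dfrac{|\sigma|}{2}\,(q_{K_\sigma} + q_{L_\sigma}) & \text{if } \sigma \in \mathcal{E}^{int}_h, \\[1ex] |\sigma|\,q_{K_\sigma} & \text{if } \sigma \in \mathcal{E}^{ext}_h. \end{cases}$$

Plugging this into the equation above, we get $\nabla_h q_h|_K = \tilde{\nabla}_h(\Pi_{P^{nc}_1} q_h)|_K$.

Lemma 4.2.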
We assume that the mesh is uniform. There exists a constant C > 0 such that ∀ qh ∈ P0 ∩ L20 , sup vh∈P0\{0} − (qh, divh vh) ‖vh‖h ≥ C h ‖ΠPnc qh‖1,h. Proof. If qh = 0 the result is trivial. Let qh ∈ P0 ∩ L20\{0}. Let vh = ∇hqh ∈ P0\{0}. Using proposition 4.6 we have −(qh, divhvh) = (vh,∇hqh) = |∇hqh|2 = |∇hqh| |vh|. Let χΩ be the characteristic function of Ω. Putting ψ = χΩ in (2.11) we get qh ∈ L20. So according to (2.10) and(4.1) we have |∇hqh| = ∣∣∣∇̃h(ΠPnc ∣∣∣ ≥ C ‖ΠPnc qh‖1,h. On the other hand, according to proposition 2.1: |vh| ≥ C h ‖vh‖h. Therefore −(qh, divhvh) ≥ C h ‖ΠPnc qh‖1,h ‖vh‖h. Proposition 4.7. We assume that the mesh is uniform. There exists a constant C > 0 such that for all qh ∈ P0 ∩ L20 vh∈P0\{0} − (qh, divh vh) ‖vh‖h ≥ C |ΠPnc 14 S. ZIMMERMANN Proof. If qh = 0 the result is clear. Let qh ∈ P0 ∩ L20\{0}. According to (4.9) there exists v ∈ H10 such that (4.10) divv = −ΠPnc qh and ‖v‖1 ≤ C |ΠPnc We set vh = ΠPc v. We want to estimate − qh, divh(ΠP0vh) . Since ∇hqh ∈ P0 we deduce from proposition 4.6 qh, divh(ΠP0vh) = (ΠP0vh,∇hqh) = (vh,∇hqh). Splitting the last term we get (4.11) − qh, divh(ΠP0vh) = (v,∇hqh)− (v − vh,∇hqh). One one hand, integrating by parts, we get (v,∇hqh) = −(ΠPnc qh, divv) + (ΠPnc qh) (v · nK,σ) dσ. According to (4.10) we have −(ΠPnc qh, divv) = |ΠPnc qh|2. Moreover (ΠPnc qh) (v · nK,σ) dσ = σ∈Eint (ΠPnc qh) (v · nKσ ,σ) dσ since v|∂Ω = 0. Using [2] p.269 and (4.10) we have ∣∣∣∣∣∣ σ∈Eint (ΠPnc qh) (v · nK,σ) dσ ∣∣∣∣∣∣ ≤ C h ‖v‖1 ‖ΠPnc qh‖1,h ≤ C h |ΠPnc qh| ‖ΠPnc qh‖1,h. So we get (4.12) (v,∇hqh) ≥ (|ΠPnc qh| − C h ‖ΠPnc qh‖1,h) |ΠPnc On the other hand, using lemma 4.1 and the Cauchy-Schwarz inequality |(v − vh,∇hqh)| = |(v − vh, ∇̃h(ΠPnc qh))| ≤ |v − vh| |∇̃h(ΠPnc qh)|. Using (2.9) and (4.10) we get |v − vh| = |v −ΠPc v| ≤ C h ‖v‖1 ≤ C h |ΠPnc |(v − vh,∇hqh)| ≤ C h |ΠPnc qh| |∇̃h(ΠPnc qh)| ≤ C h |ΠPnc qh| ‖ΠPnc qh‖1,h. Let us plug this estimate and (4.12) into (4.11). 
We get qh, divh(ΠP0vh) ≥ (|ΠPnc qh| − C h ‖ΠPnc qh‖1,h) |ΠPnc We now introduce the norm ‖.‖h. We have vh = ΠPc v ∈ Pc1 ⊂ H1. Thus, using [6] p. 776, we get ‖ΠP0vh‖h ≤ C ‖vh‖1. Since ΠPc1 is stable for the H 1 norm, we deduce from (4.10) ‖vh‖1 = ‖ΠPc v‖1 ≤ ‖v‖1 ≤ C |ΠPnc Therefore ‖ΠP0vh‖h ≤ C |ΠPnc1 qh|. Using this inequality in (4.3) we obtain that there exists constants C1 > 0 and C2 > 0 such that qh, divh(ΠP0vh) C1 |ΠPnc qh| − C2 h ‖ΠPnc qh‖1,h ‖ΠP0vh‖h. We deduce from this vh∈P0\{0} − (qh, divh vh) ‖vh‖h ≥ C1 |ΠPnc qh| − C2 h ‖ΠPnc qh‖1,h. A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 15 Let us combine this with lemma 4.2. Since ∀ t ≥ 0 , max C t , C1 |ΠPnc qh| − C2 t ≥ C C1 C + C2 |ΠPnc we get the result. 4.4. Properties of the discrete laplacian. We first prove the coercivity of the discrete laplacian. Proposition 4.8. For all uh ∈ P0 and vh ∈ P0 we have −(∆̃huh,uh) = ‖uh‖2h − (∆̃huh,vh) ≤ ‖uh‖h ‖vh‖h. Proof. Using definition (2.3) and writing the sum over the triangles as a sum over the edges, we have −(∆̃huh,vh) = − σ∈EK∩E τσ (uLσ − uK)− σ∈EK∩E τσ uK σ∈Eint τσ (vLσ − vK) · (uLσ − uK) + σ∈EK∩E τσ uK · vK . We get the first half of the result by taking vh = uh. On the other hand, using the Cauchy-Schwarz inequality and the algebraic identity a b+c d ≤ a2 + c2 b2 + d2, we get the second half. If v ∈ H2, we have |∆v| ≤ ‖v‖2. The operator ∆h shares a similar property. Proposition 4.9. There exists a constant C > 0 such that for all v ∈ H2 ∣∣∣∆̃h(Π̃P0v) ∣∣∣ ≤ C ‖v‖2. Proof. Let vh = Π̃P0v. Let K ∈ Th. According to definition (2.16) (4.13) ∆̃hvh|K = σ∈EK∩E τσ (v(xLσ )− v(xK))− σ∈EK∩E τσ v(xK ). Let us first assume that v = (v1, v2) ∈ (C∞0 )2. Let i ∈ {1, 2}. If σ ∈ EK ∩ E inth and x ∈ σ we have the Taylor expansions vi(xLσ ) = vi(x)+∇vi(x)·(xLσ−x)+ H(vi)(txLσ+(1−t)x)(xLσ−x)·(xLσ−x) t dt , vi(xK) = vi(x)+∇vi(x)·(xK−x)+ H(vi)(txK+(1−t)x)(xK−x)·(xK−x) t dt , ∇vi(x) = ∇vi(xK)− ∇∇vi(txK + (1− t)x)(xK − x) dt. 
The notation H(vi) refers to the hessian matrix of vi. Plugging the last expansion into the two others and integrating over σ, we get vi(xLσ )− vi(x) dx = ∇vi(xK) · (xLσ − xσ)−A vi(xK)− vi(x) dx = ∇vi(xK) · (xK − xσ)−Aσ,iK +B The terms A T and B T are the same as in (4.6) and (4.7), with vi instead of q. We substract these equations. Since xLσ − xK = dσ nK,σ we infer from (2.1) vi(xLσ )− vi(xK) = ∇vi(xK) · nK,σ + −Aσ,iLσ +B 16 S. ZIMMERMANN Let us consider now the case σ ∈ EK ∩Eexth . If x ∈ σ we have the Taylor expansions vi(xK) = vi(x)+∇vi(x)·(xK−x)+ H(vi)(txK+(1−t)x)(xK−x)·(xK−x) t dt , ∇vi(x) = ∇vi(xK)− ∇∇vi(txK + (1− t)x)(xK − x) dt. Since vi ∈ C∞0 we have vi(x) = 0. We plug the last expansion into the other and integrate over σ. Since xK − xσ = −dσ nK,σ we deduce from (2.1) −τσ vi(xK) = ∇vi(xK) · nK,σ + Thus we get σ∈EK∩E vi(xLσ )− vi(xK) σ∈EK∩E τσ vi(xK) ∇vi(xK) · |σ|nK,σ + where we have set for all edge σ ∈ EK ∩ E inth −Aσ,iLσ +B and for all edge σ ∈ EK ∩ Eexth : Riσ = 1dσ K − B . Since |σ|nK,σ = 0, setting Rσ = (R σ), we get σ∈EK∩E v(xLσ )− v(xK ) σ∈EK∩E τσ v(xK) = Since the space (C∞0 )2 is dense in H2, one checks that this equation still holds for v ∈ H2. Using (4.13) we infer from it ∣∣∣∆̃hvh ∣∣∣∆̃hvh|K |Rσ|2. Using estimates (4.6) and (4.7) we obtain ∣∣∣∆̃hvh |∇∇vi|2 + |H(vi)|2 dx ≤ C ‖v‖22. 5. Stability of the scheme We now use the results of section 4 to prove the stability of the scheme. We first show an estimate for the computed velocity (theorem 5.1). We then state a similar result for the increments in time (lemma 5.2). Using the inf-sup condition (proposition 4.7), we infer from it some estimates on the pressure (theorem 5.2). Lemma 5.1. For all m ∈ {0, . . . , N} and n ∈ {0, . . . , N} we have (umh ,∇hpnh) = 0 , |umh |2 − |ũmh |2 + |umh − ũmh |2 = 0. Proof. First, using propositions 3.1 and 4.6, we get (umh ,∇hpnh) = −(pnh, divhumh ) = 0. 
Thus we deduce from (3.4) 2 (umh ,u h − ũmh ) = − umh ,∇h(pmh − pm−1h ) A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 17 Using the algebraic identity 2 a (a− b) = a2 − b2 + (a− b)2 we get 2 (umh ,u h − ũmh ) = |umh |2 − |ũmh |2 + |umh − ũmh |2 = 0. We introduce the following hypothesis on the initial data. (H1) There exists C > 0 such that |u0h|+ |u1h|+ k|∇hp1h| ≤ C. Hypothesis (H1) is fulfilled if we set u0h = ΠRT0u0 and we use a semi-implicit Euler scheme to compute u1h. We have the following result. Theorem 5.1. We assume that the initial values of the scheme fulfill (H1). For all m ∈ {2, . . . , N} we have |umh |2 + k ‖ũnh‖2h ≤ C. Proof. Let m ∈ {2, . . . , N} and n ∈ {1, . . . ,m− 1}. Taking the scalar product of (3.2) with 4 k ũn+1h we get 3 ũn+1h − 4unh + u , 4 k ũn+1h − 4 k (∆̃hũ h , ũ +4 k bh(2u h − un−1h , ũ h , ũ h ) + 4 k (∇hp h, ũ h ) = 4 k (f h , ũ h ).(5.1) First of all, using lemma 5.1, we get as in [12] ũn+1h , 3 ũn+1h − 4unh + u = |un+1h | 2 − |unh|2 + 6 |ũn+1h − u 2 + |2un+1h − u h |2 − |2unh − un−1h | +|un+1h − 2u h + u According to proposition 4.8 we have − 4 k (∆̃hũ h , ũ h ) = ‖ũn+1h ‖2h. Also, using lemma 5.1 and (3.4), we have 4 k (∇hpnh, ũn+1h ) = 4 k (∇hp h, ũ h − u (|∇pn+1h | 2 − |∇pnh|2 − |∇pn+1h −∇p h|2). Multiplying (3.4) by 4k∇h(pn+1h − pnh) and using the Young inequality we get |∇(pn+1h − p h)|2 ≤ 3 |un+1h − ũ According to proposition 4.2 we have 4 k bh(2u h − u h , ũ h , ũ h ) ≥ 0. At last using the Cauchy-Schwarz inequality, (2.5) and (3.1) we have 4 k (fn+1h , ũ h ) ≤ 4 k |f h | |ũ h | ≤ C k ‖f‖C(0,T ;L2) ‖ũ h ‖h. Using the Young inequality we get 4 k (fn+1h , ũ h ) ≤ 3 k ‖ũ h + C k ‖f‖2C(0,T ;L2). Let us plug these estimates into (5.1). We get |un+1h | 2 − |unh|2 + |2un+1h − u h|2 − |2unh − un−1h | 2 + |un+1h − 2u h + u +3 |ũn+1h − u 2 + k ‖ũn+1h ‖ (|∇hpn+1h | 2 − |∇hpnh|2) ≤ C k. 18 S. 
ZIMMERMANN Summing from n = 1 to m− 1 we have |umh |2 + |2umh − um−1h | 2 + 3 |ũn+1h − u 2 + k ‖ũn+1h ‖ |∇hpmh |2 ≤ C + 4 |u1h|2 + |2u1h − u0h|2 + k2 |∇hp1h|2. Using hypothesis (H1) we get the result. We now want to estimate the computed pressure. From now on, we make the following hypothesis on the data f ∈ C(0, T ;L2) , ft ∈ L2(0, T ;L2) , u0 ∈ H2 ∩H10 , divu0 = 0. For all sequence (qm)m∈N we define the sequence (δq m)m∈N∗ by setting δq qm − qm−1 for m ≥ 1. We set δ = (δ)2. If the data u0 and f fulfill a compatibility condition [13] there exists a solution (u, p) to the equations (1.1)–(1.2) such that u ∈ C(0, T ;H2) , ut ∈ C(0, T ;L2) , ∇p ∈ C(0, T ;L2). We introduce the following hypothesis on the initial values of the scheme: there exists a constant C > 0 such that (H2) |u0h − u0|+ ‖u1h − u(t1)‖∞ + |p1h − p(t1)| ≤ C h , |u1h − u0h| ≤ C k. One checks easily that this hypothesis implies (H1). We have the following result. Lemma 5.2. We assume that the initial values of the scheme fulfill (H2). Then there exists a constant C > 0 such that for all m ∈ {1, . . . , N} (5.2) |δumh | ≤ C. Proof. We prove the result by induction. The result holds for m = 1 thanks to hypothesis (H2). Let us consider the case m = 2. We set ũ1h = u h. Let u h ∈ P0 given by (5.3) u−1h = 4u h − 3u1h + h − 2 k b̃h(u0h, ũ1h)− 2 k∇hp1h − 2 k f1h . We substract this equation from equation (3.4) written for n = 1. Since b̃h(2u h − u0h, ũ2h)− b̃h(u0h, ũ1h) = b̃(2u1h − 2u0h, ũ2h) + b̃h(u0h, δũ2h) , upon setting δu0h = u h − u h , we get 3 δũ2h − 4 δu1h + δu0h ∆̃h(δũ h) + b̃h(2u h − 2u0h, ũ2h) + b̃h(u0h, δũ2h) = δf2h . Taking the scalar product with 4 k δũ2h we get 3 δũ2h − 4 δu1h + δu0h, δũ2h ∆̃h(δũ h), δũ +4 k bh(u h, δũ h, δũ h) + 4 k bh(2u h − 2u0h, ũ2h, δũ2h) = 4 k (δf2h , δũ2h).(5.4) According to proposition 4.3 we have 4 k |bh(2u1h − 2u0h, ũ2h, δũ2h)| ≤ C k |2u1h − 2u0h| ‖ũ2h‖h ‖δũ2h‖h ; so that, using hypothesis (H2) ∣∣bh(2u1h − 2u0h, ũ2h, δũ2h) ∣∣ ≤ C k2 ‖ũ2h‖h ‖δũ2h‖h. 
From the Young inequality and theorem 5.1 we deduce ∣∣bh(2u1h − u0h, ũ2h, ũ2h − ũ1h) ∣∣ ≤ k ‖δũ2h‖2h + C k3 ‖ũ2h‖2h ≤ ‖δũ2h‖2h + C k2. A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 19 On the other hand δf2h = f h − f1h = ΠP0 f(t2)−ΠP0 f(t1) = ΠP0 (∫ t2 ft(s) ds Since ΠP0 is stable for the L 2 norm, using the Cauchy-Schwarz inequality, we get |δf2h | ≤ |ft(s)| ds ≤ (∫ t2 |ft(s)|2 ds k ‖ft‖L2(0,T ;L2). 4 k |(δf2h , δũ2h)| ≤ 4 k |δf2h | |δũ2h| ≤ C k3/2 |δũ2h|. So that, using (2.5) and the Young inequality 4 k |(δf2h , δũ2h)| ≤ C k3/2 ‖δũ2h‖h ≤ ‖δũ2h‖2h + C k2. The other terms in (5.4) are dealt with as in the prooof of theroem 5.1. We get (5.5) |δu2h|2 ≤ |δu1h|2 + |2 δu1h − δu0h|2. We know ((5.2) for m = 1) that |δu1h|2 ≤ C k2. It remains to estimate the term |2 δu1h − δu0h|2. According to (5.3) 2 δu1h − δu0h = −δu1h + h − 2 k b̃h(u0h,u1h)− 2 k∇hp1h − 2 k f1h ; by taking the scalar product with 2 δu1h − u0h and using the Cauchy-Schwarz in- equality we get |2 δu1h − δu0h|2 ≤ 2 k ( |δu1h| |∆̃hu1h|+ |∇hp1h|+ |f1h | |2 δu1h − δu0h| + 2 k ∣∣b(u0h, ũ h, 2 δu h − δu0h) ∣∣ .(5.6) Let us bound the terms between braces. First, we have h = ∆̃h u1h − Π̃P0u(t1) + ∆̃h ΠP0u(t1) On one hand, according to proposition 4.8 ∣∣∣∆̃h u1h − Π̃P0u(t1) u1h − Π̃P0u(t1) , ∆̃h u1h − Π̃P0u(t1) ≤ ‖∆̃h u1h − Π̃P0u(t1) ‖h ‖u1h − Π̃P0u(t1)‖h. Applying proposition 2.1 we get ∣∣∣∆̃h u1h − Π̃P0u(t1) u1h − Π̃P0u(t1) | |u1h − Π̃P0u(t1)|. Using the embedding L∞ ⊂ L2 we have |u1h − Π̃P0u(t1)| = |Π̃P0(u1h − u(t1))| ≤ ‖Π̃P0(u1h − u(t1))‖∞ ; since Π̃P0 is stable for the L ∞ norm, we get using hypothesis (H2) |u1h − Π̃P0u(t1)| ≤ ‖u1h − u(t1)‖∞ ≤ C h2. Therefore ∣∣∣∆̃h u1h − Π̃P0u(t1) )∣∣∣ ≤ C. And according to proposition 4.9 ∣∣∣∆̃h ΠP0u(t1) )∣∣∣ ≤ C ‖u(t1)‖ ≤ C ‖u‖C(0,T ;H2). Hence |∆̃hu1h| ≤ C. Let us now bound the pressure term in (5.6). We have ∇hp1h = ∇h p1h − Π̃P0p(t1) Π̃P0p(t1) − ΠP0∇p(t1) +ΠP0∇p(t1). 20 S. 
ZIMMERMANN According to proposition 4.4 we have ∣∣∣∇h p1h − Π̃P0p(t1) )∣∣∣ ≤ Ch |p h − Π̃P0p(t1)|. Using (2.8) we get ∣∣∣∇h p1h − Π̃P0p(t1) )∣∣∣ ≤ C ‖p(t1)‖2 ≤ C ‖p‖C(0,T ;H2). Since P0 is stable for the L 2 norm we have |ΠP0∇p(t1)| ≤ |∇p(t1)| ≤ ‖p‖C(0,T ;H1). Using proposition 4.5 to treat last term we get |∇hp1h| ≤ C. And according to (3.1) and (5.2) for m = 1 we have + |f1h | ≤ C. We are left with the term∣∣bh(u0h, ũ1h, 2 δu1h − δu0h) ∣∣ in (5.6). We use the following splitting b̃h(u h) = b̃h(u h −ΠRT0u0,u1h) + b̃h ΠRT0u0,u h − Π̃P0u(t1) + b̃h ΠRT0u0, Π̃P0u(t1) Let us take the scalar product with 2 δu1h − δu0h. We get h, 2 δu h − δu0h) = B1 +B2 +B3 B1 = bh(u h −ΠRT0u0,u1h, 2 δu1h − δu0h) , B2 = bh ΠRT0u0,u h − Π̃P0u(t1), 2 δu1h − δu0h ΠRT0u0, Π̃P0u(t1) , 2 δu1h − δu0h Applying propositions 2.1 and 4.3 we have |B1| ≤ |u0h −ΠRT0u0| ‖u1h‖h |2 δu1h − δu0h|. According to (2.8) and (2.13) we have have |u0h−ΠRT0u0| = |ΠP0u0 −ΠRT0u0| ≤ |ΠP0u0 −u0|+ |u0−ΠRT0u0| ≤ C h ‖u0‖1. According to proposition 4.8 and (2.5) ‖u1h‖2h = −(∆̃hu1h,u1h) ≤ |∆̃hu1h| |u1h| ≤ C |∆̃hu1h| ‖u1h‖h ; since |∆̃hu1h| is bounded we get ‖u1h‖h ≤ C. Hence |B1| ≤ C |2 δu1h − δu0h|. In a similar way, using propositions 2.1 and 4.3, we get |B2| ≤ |ΠRT0u0| |u1h − Π̃P0u(t1)| |2 δu1h − δu0h|. We have |ΠRT0u0| ≤ |ΠRT0u0 − u0| + |u0| ≤ C h ‖u0‖1 + |u0| ≤ C ‖u0‖1. Using moreover (5) we get |B2| ≤ C |2 δu1h − δu0h|. Lastly using the following splitting ΠRT0u0, Π̃P0u(t1) ΠRT0u0, Π̃P0u(t1) −ΠP0 b̃ u0,u(t1) + ΠP0 b̃ u0,u(t1) we have B3 = B31 +B32 with B31 = ΠRT0u0, Π̃P0u(t1) −ΠP0 b̃ u0,u(t1) , 2δu1h − δu0h B32 = ΠP0 b̃ u0,u(t1) , 2δu1h − δu0h We have |B31| ≤ ‖b̃h ΠRT0u0, Π̃P0u(t1) −ΠP0 b̃ u0,u(t1) ‖−1,h ‖2δu1h − δu0h‖h A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 21 So that, using proposition 4.1 |B31| ≤ C h ‖u0‖2 ‖u(t1)‖2 ‖2 δu1h − δu0h‖h. Using proposition 2.1 we obtain |B31| ≤ C ‖u0‖2 ‖u‖C(0,T ;H2) |2 δu1h − δu0h|. Let us now bound B32. 
Using the Cauchy-Schwarz inequality and the stability of ΠP0 for the L 2 norm, we have |B32| ≤ ∣∣∣ΠP0 b̃ u0,u(t1) )∣∣∣ |2 δu1h − δu0h| ≤ ∣∣∣b̃ u0,u(t1) )∣∣∣ |2 δu1h − δu0h|. Integrating by parts, we deduce from (4.1) ∣∣∣b̃ u0,u(t1) )∣∣∣ ≤ |u0 · ∇ui(t1)| ≤ |u0| ‖u(t1)‖2 ≤ C |u0| ‖u‖C(0,T ;H2). Thus |B32| ≤ C |2 δu1h − δu0h|. By gathering the estimates for B1, B2, B3 we get ∣∣bh(u0h,u h, 2 δu h − δu0h) ∣∣ ≤ C. Thus we have bounded the right-hand side in (5.6). We infer from it |2 δu1h − δu0h| ≤ C k. Plugging this estimate into (5.5) and using (5.2) for m = 1, we get (5.2) for m = 2. Let m ∈ {3, . . . , N − 1}. We assume that the induction hypothesis is satisfied up to rank n = m− 1. Let us substract equation (3.2) with the same for n− 1. Since the operator b̃h is bilinear we get 3 δũn+1h − 4 δunh + δu ∆̃h(δũ h ) + b̃h(2 δu h − δun−1h , ũ + b̃h(2u h − un−1h , δũ h ) +∇h(δp h) = δf Let us take the scalar product with 4 k δũn+1h . We get 3 δũn+1h − 4 δunh + δu , 4 k δũn+1h − 4 k ∆̃h(δũ h ), δũ +4 k bh(2 δu h − δun−1h , ũ h , δũ h ) + 4 k bh(2u h − un−1h , δũ h , δũ ∇h(δpnh), δũn+1h = 4 k (δfn+1h , δũ According to proposition 4.3 we have ∣∣4 k bh(2 δunh − δu h , ũ h , δũ ∣∣ ≤ C k |2 δunh − δu h | ‖ũ h ‖h ‖δũ h ‖h. Using the induction hypothesis we get ∣∣4 k bh(2 δunh − δu h , ũ h , δũ ∣∣ ≤ C k2 ‖ũn+1h ‖h ‖δũ h ‖h. Using the Young inequality and (5.1) we infer that ∣∣4 k bh(2 δunh − δun−1h , ũ h , δũ ∣∣ ≤ k ‖δũn+1h ‖ h + C k The other terms are treated like the case m = 2. We finally obtain (5.2). Theorem 5.2. We assume that the initial values of the scheme fulfull (H2). There exists a constant C > 0 such that for all m ∈ {2, . . . , N} |ΠPnc pnh|2 ≤ C. 22 S. ZIMMERMANN Proof. Let m ∈ {2, . . . , N}. We set n = m− 1. Using the inf-sup condition (4.7) and proposition 4.6, we get that there exists vh ∈ P0\{0} such that (5.7) C ‖vh‖h |ΠPnc h | ≤ −(p h , divh vh) = (∇hp h ,vh). 
Plugging (3.4) into (3.2) we have ∇hpn+1h = − 3un+1h − 4unh + u ∆̃hũ h − b̃h(2u h − un−1h , ũ h ) + f so that (∇hpn+1h ,vh) = − 3un+1h − 4unh + u ∆̃hũ h ,vh − bh(2unh − un−1h , ũ h ,vh) + (f h ,vh). Using the Cauchy-Schwarz inequality, (2.5) and (3.1) we have 3un+1h − 4unh + u )∣∣∣∣ ≤ C 3un+1h − 4unh + u ∣∣∣∣ ‖vh‖h (fn+1h ,vh) ≤ |f h | |vh| ≤ C |vh| ≤ C ‖vh‖h , Thanks to proposition 4.3 and theorem 5.1 we have ∣∣bh(2unh − un−1h , ũ h ,vh) 2 |unh|+ |un−1h | ‖ũn+1h ‖h ‖vh‖h ≤ C ‖ũ h ‖h ‖vh‖h. And according to proposition 4.8 we have ∆̃hũ h ,vh ≤ ‖ũn+1h ‖h ‖vh‖h. Thus (∇hpn+1h ,vh) ≤ C + C |3un+1h − 4unh + u + ‖ũn+1h ‖h ‖vh‖h. Comparing with (5.7) we get |ΠPnc h | ≤ C + C |3un+1h − 4unh + u + ‖ũn+1h ‖h Squaring and summing from n = 1 to m− 1 we obtain |ΠPnc pnh|2 ≤ C + C k |3un+1h − 4unh + u + C k ‖ũn+1h ‖ The last term on the right-hand side is bounded, thanks to theorem 5.1. And since 3un+1h − 4u h + u h = 3(u h − u h)− (unh − u h ) = 3 δu h − δu we deduce from lemma 5.2 |3un+1h − 4unh + u ≤ C k |δunh|2

Department of Mathematics, Centrale Lyon University, 63177 Ecully, FRANCE E-mail : Sebastien.Zimmermann@ec-lyon.fr

ABSTRACT We introduce a finite volume scheme for the two-dimensional incompressible Navier-Stokes equations. We use a triangular mesh.
The unknowns for the velocity and pressure are both piecewise constant (colocated scheme). We use a projection (fractional-step) method to deal with the incompressibility constraint. We prove that the differential operators in the Navier-Stokes equations and their discrete counterparts share similar properties. In particular, we state an inf-sup (Babuska-Brezzi) condition. We infer from it the stability of the scheme. <|endoftext|><|startoftext|>

TABLE I: The list of 201 stocks of NSE analyzed in this paper.
i Company Sector | i Company Sector | i Company Sector
1 UCALFUEL Automobiles Transport | 68 IBP Energy | 135 HIMATSEIDE Industrial
2 MICO Automobiles Transport | 69 ESSAROIL Energy | 136 BOMDYEING Industrial
3 SHANTIGEAR Automobiles Transport | 70 VESUVIUS Energy | 137 NAHAREXP Industrial
4 LUMAXIND Automobiles Transport | 71 NOCIL Basic Materials | 138 MAHAVIRSPG Industrial
5 BAJAJAUTO Automobiles Transport | 72 GOODLASNER Basic Materials | 139 MARALOVER Industrial
6 HEROHONDA Automobiles Transport | 73 SPIC Basic Materials | 140 GARDENSILK Industrial
7 MAHSCOOTER Automobiles Transport | 74 TIRUMALCHM Basic Materials | 141 NAHARSPG Industrial
8 ESCORTS Automobiles Transport | 75 TATACHEM Basic Materials | 142 SRF Industrial
9 ASHOKLEY Automobiles Transport | 76 GHCL Basic Materials | 143 CENTENKA Industrial
10 M&M Automobiles Transport | 77 GUJALKALI Basic Materials | 144 GUJAMBCEM Industrial
11 EICHERMOT Automobiles Transport | 78 PIDILITIND Basic Materials | 145 GRASIM Industrial
12 HINDMOTOR Automobiles Transport | 79 FOSECOIND Basic Materials | 146 ACC Industrial
13 PUNJABTRAC Automobiles Transport | 80 BASF Basic Materials | 147 INDIACEM Industrial
14 SWARAJMAZD Automobiles Transport | 81 NIPPONDENR Basic Materials | 148 MADRASCEM Industrial
15 SWARAJENG Automobiles Transport | 82 LLOYDSTEEL Basic Materials | 149 UNITECH Industrial
16 LML Automobiles Transport | 83 HINDALC0 Basic Materials | 150 HINDSANIT Industrial
17 VARUNSHIP Automobiles Transport | 84 SAIL Basic Materials | 151 MYSORECEM Industrial
18 APOLLOTYRE Automobiles Transport | 85 TATAMETALI Basic Materials | 152 HINDCONS Industrial
19 CEAT Automobiles Transport | 86 MAHSEAMLES Basic Materials | 153 CARBORUNIV Industrial
20 GOETZEIND Automobiles Transport | 87 SURYAROSNI Basic Materials | 154 SUPREMEIND Industrial
21 MRF Automobiles Transport | 88 BILT Basic Materials | 155 RUCHISOYA Industrial
22 IDBI Financial | 89 TNPL Basic Materials | 156 BHARATFORG Industrial
23 HDFCBANK Financial | 90 ITC Consumer Goods | 157 GESHIPPING Industrial
24 SBIN Financial | 91 VSTIND Consumer Goods | 158 SUNDRMFAST Industrial
25 ORIENTBANK Financial | 92 GODFRYPHLP Consumer Goods | 159 SHYAMTELE Telecom
26 KARURVYSYA Financial | 93 TATATEA Consumer Goods | 160 ITI Telecom
27 LAKSHVILAS Financial | 94 HARRMALAYA Consumer Goods | 161 HIMACHLFUT Telecom
28 IFCI Financial | 95 BALRAMCHIN Consumer Goods | 162 MTNL Telecom
29 BANKRAJAS Financial | 96 RAJSREESUG Consumer Goods | 163 BIRLAERIC Telecom
30 RELCAPITAL Financial | 97 KAKATCEM Consumer Goods | 164 INDHOTEL Services
31 CHOLAINV Financial | 98 SAKHTISUG Consumer Goods | 165 EIHOTEL Services
32 FIRSTLEASE Financial | 99 DHAMPURSUG Consumer Goods | 166 ASIANHOTEL Services
33 BAJAUTOFIN Financial | 100 BRITANNIA Consumer Goods | 167 HOTELEELA Services
34 SUNDARMFIN Financial | 101 SATNAMOVER Consumer Goods | 168 FLEX Services
35 HDFC Financial | 102 INDSHAVING Consumer Goods | 169 ESSELPACK Services
36 LICHSGFIN Financial | 103 MIRCELECTR Consumer Discretionary | 170 MAX Services
37 CANFINHOME Financial | 104 SURAJDIAMN Consumer Discretionary | 171 COSMOFILMS Services
38 GICHSGFIN Financial | 105 SAMTEL Consumer Discretionary | 172 DABUR Health Care
39 TFCILTD Financial | 106 VDOCONAPPL Consumer Discretionary | 173 COLGATE Health Care
40 TATAELXSI Technology | 107 VDOCONINTL Consumer Discretionary | 174 GLAXO Health Care
41 MOSERBAER Technology | 108 INGERRAND Consumer Discretionary | 175 DRREDDY Health Care
42 SATYAMCOMP Technology | 109 ELGIEQUIP Consumer Discretionary | 176 CIPLA Health Care
43 ROLTA Technology | 110 KSBPUMPS Consumer Discretionary | 177 RANBAXY Health Care
44 INFOSYSTCH Technology | 111 NIRMA Consumer Discretionary | 178 SUNPHARMA Health Care
45 MASTEK Technology | 112 VOLTAS Consumer Discretionary | 179 IPCALAB Health Care
46 WIPRO Technology | 113 KECINTL Consumer Discretionary | 180 PFIZER Health Care
47 BEML Technology | 114 TUBEINVEST Consumer Discretionary | 181 EMERCK Health Care
48 ALFALAVAL Technology | 115 TITAN Consumer Discretionary | 182 NICOLASPIR Health Care
49 RIIL Technology | 116 ABB Industrial | 183 SHASUNCHEM Health Care
50 GIPCL Energy | 117 BHEL Industrial | 184 AUROPHARMA Health Care
51 CESC Energy | 118 THERMAX Industrial | 185 NATCOPHARM Health Care
52 TATAPOWER Energy | 119 SIEMENS Industrial | 186 HINDLEVER Miscellaneous
53 GUJRATGAS Energy | 120 CROMPGREAV Industrial | 187 CENTURYTEX Miscellaneous
54 GUJFLUORO Energy | 121 HEG Industrial | 188 EIDPARRY Miscellaneous
55 HINDOILEXP Energy | 122 ESABINDIA Industrial | 189 KESORAMIND Miscellaneous
56 ONGC Energy | 123 BATAINDIA Industrial | 190 ADANIEXPO Miscellaneous
57 COCHINREFN Energy | 124 ASIANPAINT Industrial | 191 ZEETELE Miscellaneous
58 IPCL Energy | 125 ICI Industrial | 192 FINCABLES Miscellaneous
59 FINPIPE Energy | 126 BERGEPAINT Industrial | 193 RAMANEWSPR Miscellaneous
60 TNPETRO Energy | 127 GNFC Industrial | 194 APOLLOHOSP Miscellaneous
61 SUPPETRO Energy | 128 NAGARFERT Industrial | 195 THOMASCOOK Miscellaneous
62 DCW Energy | 129 DEEPAKFERT Industrial | 196 POLYPLEX Miscellaneous
63 CHEMPLAST Energy | 130 GSFC Industrial | 197 BLUEDART Miscellaneous
64 RELIANCE Energy | 131 ZUARIAGRO Industrial | 198 GTCIND Miscellaneous
65 HINDPETRO Energy | 132 GODAVRFERT Industrial | 199 TATAVASHIS Miscellaneous
66 BONGAIREFN Energy | 133 ARVINDMILL Industrial | 200 CRISIL Miscellaneous
67 BPCL Energy | 134 RAYMOND Industrial | 201 INDRAYON Miscellaneous

ABSTRACT To investigate the universality of the structure of interactions in different markets, we analyze the cross-correlation matrix C of stock price fluctuations in the National Stock Exchange (NSE) of India. We find that this emerging market exhibits strong correlations in the movement of stock prices compared to developed markets, such as the New York Stock Exchange (NYSE). This is shown to be due to the dominant influence of a common market mode on the stock prices. By comparison, interactions between related stocks, e.g., those belonging to the same business sector, are much weaker. This lack of distinct sector identity in emerging markets is explicitly shown by reconstructing the network of mutually interacting stocks.
Spectral analysis of C for NSE reveals that the few largest eigenvalues deviate from the bulk of the spectrum predicted by random matrix theory, but that they are far fewer in number than for, e.g., NYSE. We show this to be due to the relative weakness of intra-sector interactions between stocks, compared to the market mode, by modeling stock price dynamics with a two-factor model. Our results suggest that the emergence of an internal structure comprising multiple groups of strongly coupled components is a signature of market development. <|endoftext|><|startoftext|>

Introduction
Visual galaxy classification
Linking morphology to cluster environment
Linking morphology to local projected galaxy density
Linking morphology to projected cluster mass
Linking morphology to cluster radius
Linking morphology to photometric classification
Conclusions

ABSTRACT We present a morphological study of galaxies in the A901/902 supercluster from the COMBO-17 survey. A total of 570 galaxies with photometric redshifts in the range 0.155 < z_phot < 0.185 are visually classified by three independent classifiers to M_V=-18. These morphological classifications are compared to local galaxy density, distance from the nearest cluster centre, local surface mass density from weak lensing, and photometric classification. At high local galaxy densities, log(Sigma_10 /Mpc^2) > 1.5, a classical morphology-density relation is found. A correlation is also found between morphology and local projected surface mass density, but no trend is observed with distance to the nearest cluster. This supports the finding that local environment is more important to galaxy morphology than global cluster properties. The breakdown of the morphological catalogue by colour shows a dominance of blue galaxies among the galaxies displaying late-type morphologies and a corresponding dominance of red galaxies in the early-type population.
Using the 17-band photometry from COMBO-17, we further split the supercluster red sequence into old passive galaxies and galaxies with young stars and dust according to the prescription of Wolf et al. (2005). We find that the dusty star-forming population describes an intermediate morphological group between late-type and early-type galaxies, supporting the hypothesis that field and group spiral galaxies are transformed into S0s and, perhaps, ellipticals during cluster infall. <|endoftext|><|startoftext|> Introduction

For more than forty years, K-theory has been an essential tool in studying rings and algebras [1, 7]. Given a ring $R$, a simple functorial object associated to $R$ is the abelian group $K_0(R)$. There are multiple ways of defining $K_0(R)$, but the most useful characterization when working with operator algebras is to define $K_0(R)$ in terms of idempotents (or projections, if an involution is present) in matrix algebras over $R$; i.e., elements $e$ in $M_k(R)$ for some $k$ with the feature that $e^2 = e$ ($p = p^* = p^2$ in the involutive case).

In this paper, we define, for each natural number $n \geq 2$, a group which we denote $K_0^n(R)$. This group is constructed from matrices $e$ over $R$ with the property that $e^n = e$; we call such matrices $n$-potents. We define $K_0^n(R)$ for all rings, unital or not, and show that $K_0^n$ determines a covariant functor from rings to abelian groups.

Let $\mathbb{Q}(n-1)$ be the cyclotomic field obtained from the rationals by adjoining the $(n-1)$-th roots of unity. We show that $K_0^n$ is half-exact on the subcategory of $\mathbb{Q}(n-1)$-algebras, and given any such algebra $A$, we show that $K_0^n(A)$ is isomorphic to a direct sum of $n-1$ copies of $K_0(A)$. Since a $\mathbb{C}$-algebra $A$ is a $\mathbb{Q}(n-1)$-algebra for all $n$, whatever invariants are contained in $K_0^n(A)$ are already contained in $K_0(A)$.
However, $K_0^p$ for $p \neq n$ may generate new groups for cyclotomic algebras, e.g., $K_0^4(\mathbb{Q}(4)) \cong \mathbb{Z} \oplus 2\mathbb{Z}$ (Theorem 3.15), which is not isomorphic to $K_0^4(\mathbb{Q}(3)) \cong \mathbb{Z}^3$. Thus, $K_0^4$ distinguishes between the fields $\mathbb{Q}(3)$ and $\mathbb{Q}(4)$, but idempotent, and also tripotent ($n = 3$), K-theory does not.

2010 Mathematics Subject Classification: 18F30, 19A99, 19K99. http://arxiv.org/abs/0704.0775v2

EFTON PARK AND JODY TROUT

The paper is organized as follows. In Section 2, we define various notions of equivalence on the set of $n$-potents, and explore the relationships between these equivalence relations. Most of our results in this section mirror analogous facts about idempotents, but in many cases the proofs differ or are more delicate for $n$-potents. In Section 3, we define $n$-potent K-theory, study its properties, and compute some examples. Finally, in Section 4, we consider $n$-homomorphisms on rings and algebras [2, 3, 4], and show that $n$-potent K-theory is functorial for such maps; this is a phenomenon that does not appear in ordinary idempotent K-theory.

The authors thank Dana Williams and Tom Shemanske for their helpful comments and suggestions.

Note: Unless stated otherwise, all rings and algebras have a unit; i.e., a multiplicative identity, and all ring and algebra homomorphisms are unital.

2. Equivalence of n-potents

Fix a natural number $n \geq 2$. In this section, we develop the basic theory of $n$-potents, including various equivalence relations among them. We begin by looking at $n$-potents over general rings, but eventually we will specialize to get a well-behaved theory.

Definition 2.1. Let $R$ be a ring. An element $e$ in $R$ is called an $n$-potent if $e^n = e$. For $n = 2, 3, 4$, we use the terms idempotent, tripotent, and quadripotent, respectively. The set of all $n$-potents in $R$ is denoted $P^n(R)$.

We begin with a very simple but useful fact about $n$-potents:

Lemma 2.2. Suppose $e$ is an $n$-potent. Then $e^{n-1}$ is an idempotent.

Proof. $(e^{n-1})^2 = e^{n-1}e^{n-1} = e^n e^{n-2} = e\,e^{n-2} = e^{n-1}$. □

Definition 2.3.
Let $e$ and $f$ be $n$-potents in a ring $R$. We say that $e$ and $f$ are algebraically equivalent and write $e \sim_a f$ if there exist elements $a$ and $b$ in $R$ such that $e = ab$ and $f = ba$. We say that $e$ and $f$ are similar and write $e \sim_s f$ if there exists an invertible element $z$ in $R$ with the property that $f = zez^{-1}$.

K0-THEORY WITH n-POTENTS

Lemma 2.4. Suppose that $e$ and $f$ are algebraically equivalent $n$-potents in a ring $R$. Then the elements $a$ and $b$ described in Definition 2.3 can be chosen so that
\[ a = e^{n-1}a = af^{n-1} = e^{n-1}af^{n-1}, \qquad b = f^{n-1}b = be^{n-1} = f^{n-1}be^{n-1}. \]

Proof. Choose elements $\tilde a$ and $\tilde b$ in $R$ so that $\tilde a\tilde b = e$ and $\tilde b\tilde a = f$. Set $a = e^{n-1}\tilde a f^{n-1}$ and $b = f^{n-1}\tilde b e^{n-1}$. Using Lemma 2.2, we have
\[ ab = (e^{n-1}\tilde a f^{n-1})(f^{n-1}\tilde b e^{n-1}) = e^{n-1}\tilde a f^{n-1}\tilde b e^{n-1} = e^{n-1}(\tilde a\tilde b)^n e^{n-1} = e^{n-1}e^n e^{n-1} = (e^{n-1})^2 e^n = e^{n-1}e = e^n = e. \]
Similarly, $ba = f$. The two strings of equalities in the statement of the lemma then follow easily. □

Proposition 2.5. The relations $\sim_a$ and $\sim_s$ are equivalence relations on $P^n(R)$.

Proof. The only nonobvious point to establish is that $\sim_a$ is transitive. Let $e$, $f$, and $g$ be elements of $P^n(R)$, and suppose that $e \sim_a f \sim_a g$. Choose elements $a$, $b$, $c$ and $d$ in $R$ so that $e = ab$, $f = ba = cd$, and $g = dc$, and set $s = af^{n-2}c$ and $t = db$. Then
\[ st = af^{n-2}cdb = af^{n-1}b = a(ba)^{n-1}b = (ab)^n = e^n = e, \]
\[ ts = dbaf^{n-2}c = df^{n-1}c = d(cd)^{n-1}c = (dc)^n = g^n = g. \]
□

Proposition 2.6. If $e$ and $f$ are similar $n$-potents in a ring $R$, then they are algebraically equivalent.

Proof. Choose an invertible element $z$ in $R$ such that $f = zez^{-1}$, and set $a = ez^{-1}$ and $b = ze^{n-1}$. Then $ab = e^n = e$ and $ba = ze^nz^{-1} = f$. □

As is the case with idempotents, algebraic equivalence does not imply similarity in general. However, we do have the following result, just as for idempotents:

Proposition 2.7. Suppose that $e$ and $f$ are algebraically equivalent $n$-potents in a ring $R$. Then
\[ \begin{pmatrix} e & 0 \\ 0 & 0 \end{pmatrix} \sim_s \begin{pmatrix} f & 0 \\ 0 & 0 \end{pmatrix} \]
in the ring $M_2(R)$ of $2 \times 2$ matrices over $R$.

Proof.
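Choose elements $a$ and $b$ in $R$ so that $e = ab$ and $f = ba$; without loss of generality, we assume that $a$ and $b$ satisfy the conclusions of Lemma 2.4. Define
\[ u = \begin{pmatrix} 1 - f^{n-1} & b \\ af^{n-2} & 1 - e^{n-1} \end{pmatrix}, \qquad v = \begin{pmatrix} 1 - e^{n-1} & e^{n-1} \\ e^{n-1} & 1 - e^{n-1} \end{pmatrix}. \]
Straightforward computation yields that both $u^2$ and $v^2$ equal the identity matrix in $M_2(R)$, and thus each is its own inverse. Set $z = uv$. Then we compute that
\[ z \begin{pmatrix} e & 0 \\ 0 & 0 \end{pmatrix} z^{-1} = \begin{pmatrix} 1 - f^{n-1} & b \\ af^{n-2} & 1 - e^{n-1} \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & e \end{pmatrix} \begin{pmatrix} 1 - f^{n-1} & b \\ af^{n-2} & 1 - e^{n-1} \end{pmatrix} = \begin{pmatrix} beaf^{n-2} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} f & 0 \\ 0 & 0 \end{pmatrix}, \]
since $beaf^{n-2} = b(ab)a(ba)^{n-2} = (ba)^n = f^n = f$. □

Definition 2.8. We say $n$-potents $e$ and $f$ in a ring $R$ are orthogonal if $ef = fe = 0$, in which case we write $e \perp f$.

The next result follows immediately by mathematical induction.

Proposition 2.9. Let $e$ and $f$ be orthogonal $n$-potents in a ring $R$. Then $(e + f)^k = e^k + f^k$ for every integer $k \geq 1$. In particular, $e + f$ is an $n$-potent.

Proposition 2.10. For $i = 1, 2$, let $e_i$ and $f_i$ be algebraically equivalent $n$-potents in a ring $R$. Suppose that $e_1$ and $f_1$ are orthogonal to $e_2$ and $f_2$, respectively. Then $e_1 + e_2$ and $f_1 + f_2$ are algebraically equivalent.

Proof. For $i = 1, 2$, choose $a_i$ and $b_i$ so that $e_i = a_ib_i$, $f_i = b_ia_i$, and so that $a_i$ and $b_i$ satisfy the conclusion of Lemma 2.4. Then $a_1b_2 = a_1f_1^{n-1}f_2^{n-1}b_2 = 0$. Similarly, $b_2a_1$, $a_2b_1$, and $b_1a_2$ are also zero. Thus
\[ (a_1 + a_2)(b_1 + b_2) = a_1b_1 + a_2b_2 = e_1 + e_2, \]
\[ (b_1 + b_2)(a_1 + a_2) = b_1a_1 + b_2a_2 = f_1 + f_2, \]
whence $e_1 + e_2$ is algebraically equivalent to $f_1 + f_2$. □

Proposition 2.11. Let $e$ and $f$ be $n$-potents in a ring $R$.
(a) $\begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix} \sim_a \begin{pmatrix} f & 0 \\ 0 & e \end{pmatrix}$ and $\begin{pmatrix} e & 0 \\ 0 & 0 \end{pmatrix} \sim_a \begin{pmatrix} 0 & 0 \\ 0 & e \end{pmatrix}$.
(b) If $e \perp f$ then $\begin{pmatrix} e + f & 0 \\ 0 & 0 \end{pmatrix} \sim_a \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix}$.

Proof. Define
\[ a = \begin{pmatrix} 0 & e \\ f & 0 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 0 & f^{n-1} \\ e^{n-1} & 0 \end{pmatrix} \]
in $M_2(R)$. Then
\[ ab = \begin{pmatrix} 0 & e \\ f & 0 \end{pmatrix} \begin{pmatrix} 0 & f^{n-1} \\ e^{n-1} & 0 \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix}, \qquad ba = \begin{pmatrix} 0 & f^{n-1} \\ e^{n-1} & 0 \end{pmatrix} \begin{pmatrix} 0 & e \\ f & 0 \end{pmatrix} = \begin{pmatrix} f & 0 \\ 0 & e \end{pmatrix}, \]
which establishes the first part of (a); to obtain the second part, simply take $f$ to be zero. To prove (b), first observe that if $e \perp f$, then $e + f$ is an $n$-potent by Proposition 2.9. Define
\[ a = \begin{pmatrix} e & 0 \\ f & 0 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} e^{n-1} & f^{n-1} \\ 0 & 0 \end{pmatrix}. \]
Then
\[ ab = \begin{pmatrix} e^n & ef^{n-1} \\ fe^{n-1} & f^n \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix}, \qquad ba = \begin{pmatrix} e^n + f^n & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} e + f & 0 \\ 0 & 0 \end{pmatrix}, \]
whence the result follows. □

Later in this paper we will restrict our attention to $n$-potent K-theory of cyclotomic algebras:

Definition 2.12.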
For each integer $n \geq 2$, the cyclotomic field $\mathbb{Q}(n-1)$ is the field obtained by adjoining the primitive $(n-1)$st root of unity $\zeta_{n-1} = e^{2\pi i/(n-1)}$ to the field $\mathbb{Q}$ of rational numbers. A cyclotomic algebra is a $\mathbb{Q}(n-1)$-algebra for some $n \geq 2$.

Observe that $\mathbb{Q}(n-1) \subset \mathbb{C}$, and therefore every $\mathbb{C}$-algebra is canonically a $\mathbb{Q}(n-1)$-algebra for all $n$.

Definition 2.13. Let $F$ be a field and let $A$ be an $F$-algebra with unit. An $n$-partition of unity is an ordered $n$-tuple $(e_0, e_1, \ldots, e_{n-1})$ of idempotents in $A$ such that
(1) $e_0 + e_1 + \cdots + e_{n-1} = 1$;
(2) $e_0, e_1, \ldots, e_{n-1}$ are pairwise orthogonal; i.e., $e_je_k = \delta_{jk}e_k$ for all $0 \leq j, k \leq n-1$.

Note that $e_0 = 1 - (e_1 + \cdots + e_{n-1})$ is completely determined by $e_1, e_2, \ldots, e_{n-1}$ and is thus redundant in the notation for an $n$-partition of unity.

Cyclotomic algebras admit a distinguished $n$-partition of unity. Set $\omega_0 = 0$ and let $\omega_k = e^{2\pi i(k-1)/(n-1)}$ for $1 \leq k \leq n-1$. Note that $\omega_1, \ldots, \omega_{n-1}$ are the $(n-1)$st roots of unity, and $\Omega_n = \{\omega_0, \omega_1, \ldots, \omega_{n-1}\}$ is the set of roots of the polynomial equation $x^n - x = 0$.

Theorem 2.14. Let $A$ be a $\mathbb{Q}(n-1)$-algebra with unit, and suppose $e$ is an $n$-potent in $A$. Then there exists a unique $n$-partition of unity $(e_0, e_1, \ldots, e_{n-1})$ in $A$ such that
\[ e = \sum_{k=1}^{n-1} \omega_k e_k. \]

Proof. Let $p_0, p_1, \ldots, p_{n-1} \in \mathbb{Q}(n-1)[x]$ be the Lagrange polynomials
\[ p_k(x) = \frac{\prod_{j \neq k}(x - \omega_j)}{\prod_{j \neq k}(\omega_k - \omega_j)}. \]
In particular, $p_0(x) = 1 - x^{n-1}$. Each polynomial $p_k$ has degree $n-1$ and satisfies $p_k(\omega_k) = 1$ and $p_k(\omega_j) = 0$ for all $j \neq k$. We claim that for all numbers $\alpha \in \mathbb{Q}(n-1) \subseteq \mathbb{C}$,
\[ \sum_{k=0}^{n-1} p_k(\alpha) = p_0(\alpha) + \cdots + p_{n-1}(\alpha) = 1 \tag{1} \]
and that
\[ \alpha = \sum_{k=1}^{n-1} \omega_k p_k(\alpha). \tag{2} \]
Indeed, these identities follow from the fact that these polynomial equations have degree at most $n-1$ but are satisfied by the $n$ distinct points in $\Omega_n$. Now, given any $\omega_i$ in $\Omega_n$, so that $\omega_i^n = \omega_i$, it follows that $p_k(\omega_i)^2 = p_k(\omega_i)$. Hence, for any $n$-potent $e \in A$, if we define $e_k = p_k(e)$, then each $e_k$ is an idempotent in $A$, and Equation (1) implies that $\sum_{k=0}^{n-1} e_k = \sum_{k=0}^{n-1} p_k(e) = 1$.
These idempotents are pairwise orthogonal, because $e_je_k = p_j(e)p_k(e) = 0$ for $j \neq k$. Finally,
\[ e = \sum_{k=1}^{n-1} \omega_k p_k(e) = \sum_{k=1}^{n-1} \omega_k e_k \]
by Equation (2). □

3. K0-theory with n-potents

We can now proceed to construct our $n$-potent K-theory groups.

Definition 3.1. Let $R$ be a ring. For all $k \geq 1$, let $P^n_k(R)$ denote the set of $n$-potents in $M_k(R)$, and let $i_k$ denote the inclusion
\[ i_k(a) = \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} \]
of $M_k(R)$ into $M_{k+1}(R)$, as well as its restriction as a map from $P^n_k(R)$ to $P^n_{k+1}(R)$. Define $M_\infty(R)$ and $P^n_\infty(R)$ to be the (algebraic) direct limits
\[ M_\infty(R) = \varinjlim M_k(R), \qquad P^n_\infty(R) = \varinjlim P^n_k(R) = P^n(M_\infty(R)). \]
We define a binary operation $\oplus$ on $P^n_\infty(R)$ as follows: let $e$ and $f$ be elements of $P^n_\infty(R)$, choose the smallest natural numbers $k$ and $\ell$ such that $e \in M_k(R)$ and $f \in M_\ell(R)$, and set
\[ e \oplus f = \operatorname{diag}(e, f) = \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix} \in P^n_{k+\ell}(R) \subset P^n_\infty(R). \]

Definition 3.2. Let $R$ be a ring, and define an equivalence relation $\sim$ on $P^n_\infty(R)$ as follows: take $e$ and $f$ in $P^n_\infty(R)$, and choose a natural number $k$ sufficiently large that $e$ and $f$ are elements of $P^n_k(R)$. Then $e \sim f$ if $e \sim_a f$ in $M_k(R)$. We let $V^n(R)$ denote the set of equivalence classes of $\sim$.

Note that if $e = ab$ and $f = ba$ in $M_k(R)$, then
\[ \begin{pmatrix} e & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} b & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} f & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} b & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix}, \]
and therefore the equivalence relation described in Definition 3.2 is well-defined.

Note that for any $n$-potents $e$, $f$ in $M_\infty(R)$, we get
\[ \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix} \sim \begin{pmatrix} f & 0 \\ 0 & e \end{pmatrix}. \]
Thus, the binary operation $\oplus$ induces a binary operation $+$ on $V^n(R)$ as follows: take $e$ and $f$ in $P^n_\infty(R)$, and define
\[ [e] + [f] = [e \oplus f] = \left[ \begin{pmatrix} e & 0 \\ 0 & f \end{pmatrix} \right]. \]
This operation is well-defined and commutative by Propositions 2.9 and 2.11.

The next proposition is straightforward and left to the reader.

Proposition 3.3. For every ring $R$ and natural number $n \geq 2$, $V^n(R)$ is an abelian monoid under the addition defined above, whose identity element is the class of the zero $n$-potent. If $\alpha : R \to S$ is a unital ring homomorphism, then the induced map $V^n(\alpha) : V^n(R) \to V^n(S)$ given by $V^n(\alpha)([(a_{ij})]) = [(\alpha(a_{ij}))]$ is a well-defined homomorphism of abelian semigroups.
The correspondences $R \mapsto V^n(R)$ and $\alpha \mapsto V^n(\alpha)$ induce a covariant functor from the category of rings and ring homomorphisms to the category of abelian monoids and monoid homomorphisms.

Definition 3.4. Let $R$ be a ring and let $n \geq 2$ be a natural number. We define $K_0^n(R)$ to be the Grothendieck completion [6] of the abelian monoid $V^n(R)$. Given an $n$-potent $e$ in $P^n_\infty(R)$, we denote its class in $K_0^n(R)$ by $[e]$.

In light of Propositions 2.6 and 2.7, we could have alternatively used similarity to define $V^n(R)$, and hence $K_0^n(R)$.

Proposition 3.5. The assignment $R \mapsto K_0^n(R)$ determines a covariant functor from the category of rings and ring homomorphisms to the category of abelian groups and group homomorphisms.

Proof. Proposition 3.3 states that $V^n$ is a covariant functor from the category of rings to the category of abelian monoids, and Grothendieck completion determines a covariant functor from the category of abelian monoids to the category of abelian groups; we get the desired result by composing these two functors. □

The following result shows that for (unital) algebras over a field of characteristic $\neq 2$, the tripotent K-theory functor $K_0^3$ offers us no new invariants over ordinary idempotent K-theory. However, we will see later (Theorem 3.15) that the situation is subtly different for $K_0^4$.

Theorem 3.6. Let $F$ be a field with characteristic $\neq 2$. If $A$ is a unital algebra over $F$ then there is a natural isomorphism
\[ K_0^3(A) \cong (K_0(A))^2 \]
of abelian groups.

Proof. If $e = e^3 \in M_\infty(A)$ is a tripotent, then one can easily check that
\[ e_1 = \tfrac{1}{2}(e^2 + e) \quad\text{and}\quad e_2 = \tfrac{1}{2}(e^2 - e) \]
are (unique) idempotents in $M_\infty(A)$ such that $e = e_1 - e_2$. It follows that we have a natural bijection of abelian monoids
\[ V^3(A) \to V^2(A) \oplus V^2(A), \qquad [e] \mapsto [e_1] \oplus [e_2], \]
with inverse map $[e_1] \oplus [e_2] \mapsto [e_1 \oplus -e_2]$. Since these maps are additive, the result easily follows.
□

While $K_0^n(R)$ is well-defined for any ring $R$, to obtain a well-behaved theory where the usual exact sequences exist, we must restrict our attention to a smaller class of rings. The problem is that, unlike the situation for idempotents, it is not generally true that if $e$ is an $n$-potent, then so is $1 - e$. However, given an $n$-potent in an algebra over the cyclotomic field $\mathbb{Q}(n-1)$, there is an adequate substitute:

Definition 3.7. Let $e$ be an $n$-potent in a $\mathbb{Q}(n-1)$-algebra $A$, and write
\[ e = \sum_{k=1}^{n-1} \omega_k e_k \]
as in the conclusion of Theorem 2.14. We define an $n$-potent
\[ e^\perp = \operatorname{diag}\bigl(\omega_1(1 - e_1), \omega_2(1 - e_2), \ldots, \omega_{n-1}(1 - e_{n-1})\bigr) \in M_{n-1}(A) \]
and call $e^\perp$ the complementary $n$-potent of $e$.

Observe that if $n = 2$, this definition agrees with the usual one for idempotents; i.e., $e^\perp = 1 - e$. Note also that $e \oplus e^\perp \sim_s \omega$, where
\[ \omega = \operatorname{diag}(\omega_1 1_A, \ldots, \omega_{n-1} 1_A) \in M_{n-1}(\mathbb{Q}(n-1)) \subseteq M_{n-1}(A). \]

Proposition 3.8 (Standard Picture of $K_0^n(A)$). Let $n \geq 2$ be a natural number and let $A$ be a $\mathbb{Q}(n-1)$-algebra. Then every element of $K_0^n(A)$ can be written in the form $[e] - [\omega]$, where $e$ is an $n$-potent in $M_k(A)$ for some natural number $k$ and $\omega$ is a diagonal $n$-potent in $M_k(\mathbb{Q}(n-1))$.

Proof. Start with an element $[\tilde e] - [\tilde f]$ in $K_0^n(A)$, and take $\tilde f^\perp$ to be the complementary $n$-potent of $\tilde f$ as defined in Definition 3.7. Then
\[ [\tilde e] - [\tilde f] = \bigl([\tilde e] + [\tilde f^\perp]\bigr) - \bigl([\tilde f] + [\tilde f^\perp]\bigr). \]
The $n$-potents $\tilde f$ and $\tilde f^\perp$ are orthogonal, and therefore $[\tilde f] + [\tilde f^\perp] = [\tilde f + \tilde f^\perp] = [\omega]$, where $\omega$ has the desired form. Finally we take $e$ to be $\tilde e \oplus \tilde f^\perp$, and by enlarging the matrix $\omega$, we obtain the desired result. □

Proposition 3.9. Let $n \geq 2$ and let $A$ be a $\mathbb{Q}(n-1)$-algebra. Suppose $e$ and $f$ are $n$-potents in $M_\infty(A)$. Then $[e] = [f]$ in $K_0^n(A)$ if and only if $e \oplus \omega$ is similar to $f \oplus \omega$ for some $n$-potent $\omega$ in $M_\infty(\mathbb{Q}(n-1))$.

Proof. The "if" direction is obvious. To show the inference in the opposite direction, suppose that $[e] = [f]$ in $K_0^n(A)$. By the definition of the Grothendieck completion, $e \oplus \tilde e$ is similar to $f \oplus \tilde e$ for some $n$-potent $\tilde e$ in $M_\infty(A)$.
Then $e \oplus \tilde e \oplus \tilde e^\perp$ is similar to $f \oplus \tilde e \oplus \tilde e^\perp$. But if we write $\tilde e = \sum_{k=1}^{n-1} \omega_k \tilde e_k$ as in Theorem 2.14, then Proposition 2.11(b) implies that
\[ \tilde e \sim_s \operatorname{diag}(\omega_1\tilde e_1, \omega_2\tilde e_2, \ldots, \omega_{n-1}\tilde e_{n-1}). \]
Therefore $\tilde e \oplus \tilde e^\perp$ is similar to an $n$-potent in $M_\infty(\mathbb{Q}(n-1))$, and the proposition follows. □

We next turn our attention to $n$-potent K-theory for nonunital algebras. Given a nonunital $\mathbb{Q}(n-1)$-algebra $A$, we define its unitization $A^+$ as the unital $\mathbb{Q}(n-1)$-algebra
\[ A^+ = \{(a, \lambda) : a \in A,\ \lambda \in \mathbb{Q}(n-1)\}, \]
where addition and scalar multiplication are defined componentwise, and multiplication is given by
\[ (a, \lambda)(b, \tau) = (ab + a\tau + b\lambda, \lambda\tau). \]

Definition 3.10. Let $A$ be a nonunital $\mathbb{Q}(n-1)$-algebra, and let $A^+$ be its unitization. Let $\pi : A^+ \to \mathbb{Q}(n-1)$ be the algebra homomorphism $\pi(a, \lambda) = \lambda$. Then we define $K_0^n(A) = \ker \pi_*$.

It is easy to see that $\pi_*$ is surjective, so by definition of $K_0^n(A)$ we have a short exact sequence
\[ 0 \to K_0^n(A) \to K_0^n(A^+) \xrightarrow{\pi_*} K_0^n(\mathbb{Q}(n-1)) \to 0 \]
with splitting induced by the map $\psi : \mathbb{Q}(n-1) \to A^+$ defined by $\psi(\lambda) = (0, \lambda)$. In addition, it is easy to check that if $A$ already has a unit and we form $A^+$, then $\ker \pi_*$ is naturally isomorphic to our original definition of $K_0^n(A)$.

Proposition 3.11. Let $A$ be a nonunital $\mathbb{Q}(n-1)$-algebra. Then every element of $K_0^n(A)$ can be written in the form $[e] - [s(e)]$, where $e$ is an $n$-potent in $M_k(A^+)$ for some integer $k \geq 1$, and $s = \psi \circ \pi : A^+ \to A^+$ is the scalar mapping [6, Sect. 4.2.1].

Proof. Follows directly from Proposition 3.8 and Definition 3.10. □

Proposition 3.12 (Half-exactness). Every short exact sequence
\[ 0 \to I \xrightarrow{i} A \xrightarrow{q} A/I \to 0 \]
of $\mathbb{Q}(n-1)$-algebras, with $A$ unital, induces an exact sequence
\[ K_0^n(I) \xrightarrow{i_*} K_0^n(A) \xrightarrow{q_*} K_0^n(A/I) \]
of abelian $n$-potent K-theory groups.

Proof. Since $q \circ i = 0$, we have by functoriality that $q_* \circ i_* = 0$ and so the image of $K_0^n(I)$ under $i_*$ in $K_0^n(A)$ is contained in the kernel of $q_*$. To show the reverse inclusion, suppose we have $[e] - [\lambda]$ in $K_0^n(A)$ such that $q_*([e] - [\lambda]) = 0$. Then $[q(e)] = [q(\lambda)] = [\lambda]$ in $K_0^n(A/I)$.
By Proposition 3.9, there exists an $n$-potent $\tau$ in $M_\infty(\mathbb{Q}(n-1))$ so that
\[ q(e) \oplus \tau \sim_s \lambda \oplus \tau. \]
Choose $N$ sufficiently large so that we may view $e$, $\lambda$, and $\tau$ as $N$ by $N$ matrices, and choose $z$ in $GL_{2N}(A/I)$ so that
\[ z\,(q(e) \oplus \tau)\,z^{-1} = \lambda \oplus \tau. \]
By Proposition 3.4.2 and Corollary 3.4.4 in [1], we can lift $\operatorname{diag}(z, z^{-1})$ to an element $u$ in $GL_{4N}(A)$. Set $f = u(e \oplus \tau)u^{-1}$. Then
\[ q(f) = \operatorname{diag}(z, z^{-1})\,(q(e) \oplus \tau)\,\operatorname{diag}(z^{-1}, z) = \lambda \oplus \tau, \]
and thus $f$ and $\lambda \oplus \tau$ are in $M_{4N}(I^+)$. Therefore
\[ [e] - [\lambda] = [e \oplus \tau] - [\lambda \oplus \tau] = i_*\bigl([f] - [\lambda \oplus \tau]\bigr) \]
is in the image of $K_0^n(I)$ under $i_*$ as desired. □

Note that our proof of Proposition 3.12 relies critically on Proposition 3.9, which in turn is proved using the standard picture of $K_0^n(A)$. We do not have a standard picture for $K_0^k(A)$ when $k \neq n$, and it seems likely to the authors that $K_0^k$ is, in fact, not half-exact in this case. However, we do not have a counterexample where half-exactness fails to hold.

While it is not at all obvious from its definition, $K_0^n(A)$ can be identified with a more familiar object.

Theorem 3.13. Let $n \geq 2$ be a natural number and let $A$ be a not necessarily unital $\mathbb{Q}(n-1)$-algebra. Then there is a natural isomorphism
\[ K_0^n(A) \cong (K_0(A))^{n-1} \]
of abelian groups.

Proof. First consider the case where $A$ is unital. We define a homomorphism $\tilde\psi : V^n(A) \to (V^2(A))^{n-1}$ in the following way: for each $n$-potent $e = \sum_{k=1}^{n-1} \omega_k e_k$ in $M_\infty(A)$, set
\[ \tilde\psi[e] = \bigl([e_1], [e_2], \ldots, [e_{n-1}]\bigr). \]
It is easy to check that $\tilde\psi$ is additive and well-defined. Next, define a homomorphism $\tilde\phi : (V^2(A))^{n-1} \to V^n(A)$ by the formula
\[ \tilde\phi\bigl([f_1], [f_2], \ldots, [f_{n-1}]\bigr) = \bigl[\omega_1\operatorname{diag}(f_1, 0, 0, \ldots, 0) + \omega_2\operatorname{diag}(0, f_2, 0, \ldots, 0) + \cdots + \omega_{n-1}\operatorname{diag}(0, 0, \ldots, 0, f_{n-1})\bigr]. \]
Note that
\[ \tilde\psi\tilde\phi\bigl([f_1], \ldots, [f_{n-1}]\bigr) = \tilde\psi\bigl[\omega_1\operatorname{diag}(f_1, 0, \ldots, 0) + \cdots + \omega_{n-1}\operatorname{diag}(0, 0, \ldots, f_{n-1})\bigr] = \bigl([\operatorname{diag}(f_1, 0, \ldots, 0)], [\operatorname{diag}(0, f_2, \ldots, 0)], \ldots, [\operatorname{diag}(0, 0, \ldots, f_{n-1})]\bigr) = \bigl([f_1], [f_2], \ldots, [f_{n-1}]\bigr) \]
and
\[ \tilde\phi\tilde\psi[e] = \tilde\phi\bigl([e_1], [e_2], \ldots, [e_{n-1}]\bigr) = \bigl[\omega_1\operatorname{diag}(e_1, 0, \ldots, 0) + \cdots + \omega_{n-1}\operatorname{diag}(0, 0, \ldots, e_{n-1})\bigr] = \bigl[\operatorname{diag}(\omega_1e_1, \omega_2e_2, \ldots, \omega_{n-1}e_{n-1})\bigr] = [e], \]
where the last equality is a consequence of Proposition 2.11(b). The universal mapping property of the Grothendieck completion implies that $\tilde\psi$ extends uniquely to an abelian group isomorphism
\[ \psi : K_0^n(A) \to (K_0(A))^{n-1} \]
and thus the theorem is true for unital $\mathbb{Q}(n-1)$-algebras.

Now suppose that $A$ does not have a unit. Then we have the following commutative diagram with exact rows:
\[ \begin{array}{ccccccccc} 0 & \to & K_0^n(A) & \to & K_0^n(A^+) & \to & K_0^n(\mathbb{Q}(n-1)) & \to & 0 \\ & & & & \downarrow \cong & & \downarrow \cong & & \\ 0 & \to & (K_0(A))^{n-1} & \to & (K_0(A^+))^{n-1} & \to & (K_0(\mathbb{Q}(n-1)))^{n-1} & \to & 0 \end{array} \]
An easy diagram chase shows that there is a unique group isomorphism from $K_0^n(A)$ to $(K_0(A))^{n-1}$ that makes the diagram commute. □

Since a complex algebra is a $\mathbb{Q}(n-1)$-algebra for all values of $n$, we have the following immediate corollary.

Corollary 3.14. If $A$ is a $\mathbb{C}$-algebra, there are natural isomorphisms
\[ K_0^n(A) \cong (K_0(A))^{n-1} \]
of abelian groups for all natural numbers $n \geq 2$.

We now arrive at the result that suggests why we should consider all $K_0^n$-functors for algebras over a cyclotomic field.

Theorem 3.15. Let $\mathbb{Q}(4) = \mathbb{Q}[i]$ be the 4th cyclotomic field. Then we have the following isomorphisms of abelian groups:
\[ K_0^2(\mathbb{Q}(4)) \cong \mathbb{Z}, \quad K_0^3(\mathbb{Q}(4)) \cong \mathbb{Z}^2, \quad K_0^4(\mathbb{Q}(4)) \cong \mathbb{Z} \oplus 2\mathbb{Z}, \quad K_0^5(\mathbb{Q}(4)) \cong \mathbb{Z}^4. \]
Thus, $K_0^4(\mathbb{Q}(4)) \not\cong \mathbb{Z}^3 \cong K_0^4(\mathbb{Q}(3))$.

Proof. Since $\mathbb{Q}(4)$ is a field [7], we have $K_0^2(\mathbb{Q}(4)) = K_0(\mathbb{Q}(4)) \cong \mathbb{Z}$. The field $\mathbb{Q}(4)$ has characteristic $0 \neq 2$, so Theorem 3.6 implies that
\[ K_0^3(\mathbb{Q}(4)) \cong (K_0(\mathbb{Q}(4)))^2 \cong \mathbb{Z}^2. \]
Theorem 3.13 implies that we have an isomorphism
\[ K_0^5(\mathbb{Q}(4)) \cong (K_0(\mathbb{Q}(4)))^4 \cong \mathbb{Z}^4. \]
However, the spectrum of 4-potents is contained in $\{0, 1, -\tfrac{1}{2} \pm \tfrac{\sqrt{3}}{2}i\}$, which is not contained in $\mathbb{Q}(4)$ since the two primitive 3rd roots of unity
\[ \omega = \zeta_3 = -\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}i \quad\text{and}\quad \bar\omega = \bar\zeta_3 = -\tfrac{1}{2} - \tfrac{\sqrt{3}}{2}i \]
are not in $\mathbb{Q}(4) = \mathbb{Q}[i]$. Given any 4-potent $e \in M_n(\mathbb{Q}(4)) \subset M_n(\mathbb{C})$ we can uniquely write
\[ e = e_1 + \omega e_2 + \bar\omega e_3, \]
where $e_1, e_2, e_3$ are orthogonal idempotents in $M_n(\mathbb{C})$ that sum to an idempotent $e_1 + e_2 + e_3 = e^3$ in $M_n(\mathbb{Q}(4))$ by Lemma 2.2.
We thus have
$$e^2 = e_1 + \bar{\omega} e_2 + \omega e_3, \qquad e^3 = e_1 + e_2 + e_3,$$
because $\omega^2 = \bar{\omega}$, $\bar{\omega}^2 = \omega$, and $\omega^3 = \bar{\omega}^3 = 1$. Since $\omega + \bar{\omega} = -1$, this implies that the first idempotent $e_1 = (e + e^2 + e^3)/3$ and the sum of the last two idempotents $e_2 + e_3 = e^3 - e_1$ are both in $M_n(\mathbb{Q}(4))$. Using a simple trace argument and the fact that $\omega, \bar{\omega} \notin \mathbb{Q}(4)$, we conclude that
$$\operatorname{rank}(e_2) = \operatorname{trace}(e_2) = \operatorname{trace}(e_3) = \operatorname{rank}(e_3),$$
and so $\operatorname{rank}(e_2 + e_3) = \operatorname{trace}(e_2 + e_3) = 2\operatorname{trace}(e_2)$ is even. We then have a well-defined map
$$V_4(\mathbb{Q}(4)) \to V_2(\mathbb{Q}(4)) \oplus 2V_2(\mathbb{Q}(4)) \cong \mathbb{N} \oplus 2\mathbb{N}, \qquad [e] \mapsto [e_1] \oplus [e_2 + e_3] \cong \operatorname{trace}(e_1) \oplus 2\operatorname{trace}(e_2);$$
this is because the classes of $e_1$ and $e_2 + e_3$ are preserved by (stable) similarity, and the $K_0$-class of an idempotent in a matrix ring over a number field (or a PID) is the rank (= trace). It is easy to check that this map is injective (using $e_1 \perp e_2 + e_3$ in $M_n(\mathbb{Q}(4))$) and additive. The only question is surjectivity. It suffices to show that there is a 4-potent $e$ over $\mathbb{Q}(4)$ whose stable similarity class is mapped to the generator $1 \oplus 2$ of $\mathbb{N} \oplus 2\mathbb{N}$. Consider the block diagonal matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & i \\ 0 & i & -1 \end{pmatrix} \in M_3(\mathbb{Q}(4)),$$
which is easily checked to be quadripotent. The lower right $2 \times 2$ block is invertible and quadripotent, with trace $-1$ and determinant $1$; it therefore has the desired eigenvalues $\omega$ and $\bar{\omega}$, and so does not diagonalize over $\mathbb{Q}(4)$. The result now follows easily. □

4. n-Homomorphisms and $K_0^n$ Functoriality

We know from Proposition 3.5 that $K_0^n$ is a covariant functor from the category of (unital) rings and ring homomorphisms to the category of abelian groups and group homomorphisms. However, $K_0^n$ is actually functorial for a more general class of ring mappings.

Definition 4.1. Let $R$ and $S$ be rings. An additive map (not necessarily unital) $\phi : R \to S$ is called an $n$-homomorphism if
$$\phi(a_1 a_2 \cdots a_n) = \phi(a_1)\phi(a_2) \cdots \phi(a_n)$$
for all $a_1, a_2, \dots, a_n$ in $R$.

Obviously every (ring) homomorphism is an $n$-homomorphism, but the converse is false in general.
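A concrete instance of the last remark is easy to check by machine. On any ring the map $\phi(a) = -a$ is additive and satisfies $\phi(abc) = -abc = (-a)(-b)(-c)$, so it is a 3-homomorphism, while $\phi(ab) = -ab$ differs from $\phi(a)\phi(b) = ab$ as soon as $ab \neq -ab$. The sketch below tests this on a few $2 \times 2$ integer matrices; the helper names (`mat_mul`, `phi`, `prod_of`, `is_n_hom_on`) and the sample set are illustrative choices, not taken from the paper, and a finite sample can only refute multiplicativity, not prove it.

```python
from itertools import product


def mat_mul(a, b):
    """Multiply two 2x2 matrices given as tuples of row tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def phi(a):
    """The additive map a -> -a (example map chosen for illustration)."""
    return tuple(tuple(-x for x in row) for row in a)


def prod_of(mats):
    """Ordered product of a non-empty sequence of matrices."""
    result = mats[0]
    for m in mats[1:]:
        result = mat_mul(result, m)
    return result


def is_n_hom_on(samples, n):
    """Test phi(a1...an) == phi(a1)...phi(an) on all n-tuples of samples."""
    return all(
        phi(prod_of(t)) == prod_of([phi(m) for m in t])
        for t in product(samples, repeat=n)
    )


# A small, arbitrary sample of integer matrices (includes the identity).
SAMPLES = [
    ((1, 0), (0, 1)),
    ((0, 1), (0, 0)),
    ((1, 2), (3, 4)),
    ((0, -1), (1, 1)),
]

if __name__ == "__main__":
    print("3-multiplicative on samples:", is_n_hom_on(SAMPLES, 3))
    print("2-multiplicative on samples:", is_n_hom_on(SAMPLES, 2))
```

The same sign argument works for every odd $n$, which is why the notion of $n$-homomorphism is strictly weaker than that of a ring homomorphism.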
For example, an $AE_n$-ring is a ring $R$ such that every additive map $\phi : R \to R$ is an $n$-homomorphism. Feigelstock [2, 3] classified all unital $AE_n$-rings. The algebraic version of $n$-homomorphism was introduced for complex algebras in [4] and has been carefully studied in the case of $C^*$-algebras in [5].

Proposition 4.2. Let $\phi : R \to S$ be an $n$-homomorphism between unital rings. Then $\phi$ induces a group homomorphism $\phi_* : K_0^n(R) \to K_0^n(S)$. Furthermore, the assignment $R \mapsto K_0^n(R)$ is a covariant functor from the category of unital rings and $n$-homomorphisms to the category of abelian groups and ordinary group homomorphisms.

Proof. For each natural number $k$, we extend $\phi$ to a map from $M_k(R)$ to $M_k(S)$ by applying $\phi$ to each matrix entry; it is easy to check this also gives us an $n$-homomorphism. Moreover, $\phi$ is compatible with stabilization of matrices; the only nonobvious point to check is that $\phi$ respects algebraic equivalence. Let $e$ and $f$ be algebraically equivalent $n$-potents in $M_k(R)$ for some $k$, and choose $a$ and $b$ in $M_k(R)$ so that $e = ab$ and $f = ba$. Define elements $a' = \phi(ea)\phi(f)^{n-2}$ and $b' = \phi(b)$ in $M_k(S)$. We compute:
$$a'b' = \phi(ea)\phi(f)^{n-2}\phi(b) = \phi\bigl((ea)f^{n-2}b\bigr) = \phi\bigl(ea(ba)^{n-2}b\bigr) = \phi\bigl(e(ab)^{n-1}\bigr) = \phi(e^n) = \phi(e).$$
A similar argument shows that $b'a' = \phi(f)$. Therefore $\phi$ determines a monoid homomorphism from $V_n(R)$ to $V_n(S)$, and hence a group homomorphism $\phi_* : K_0^n(R) \to K_0^n(S)$. We leave it to the reader to make the straightforward computations to show that we have a covariant functor. □

Note that while we have an isomorphism $K_0^n(A) \cong \bigl(K_0(A)\bigr)^{n-1}$ for $\mathbb{Q}(n-1)$-algebras, it is not at all clear from the right hand side of this isomorphism that $K_0^n(A)$ is functorial for $n$-homomorphisms.

References

[1] B. Blackadar, K-theory for Operator Algebras, 2nd ed., MSRI Publication Series 5, Springer-Verlag, New York, 1998.
[2] S. Feigelstock, Rings whose additive endomorphisms are N-multiplicative, Bull. Austral. Math. Soc. 39 (1989), no. 1, 11–14.
[3] S.
Feigelstock, Rings whose additive endomorphisms are n-multiplicative. II, Period. Math. Hungar. 25 (1992), no. 1, 21–26.
[4] M. Hejazian, M. Mirzavaziri, M.S. Moslehian, n-homomorphisms, Bull. Iranian Math. Soc. 31 (2005), no. 1, 13–23.
[5] E. Park and J. Trout, On the Nonexistence of Nontrivial Involutive n-homomorphisms of C∗-algebras, Trans. Amer. Math. Soc. 361 (2009), no. 4, 1949–1961.
[6] M. Rordam, F. Larsen, N. Laustsen, An Introduction to K-theory for C∗-algebras, London Mathematical Society Student Texts, vol. 49, Cambridge University Press, Cambridge, 2000.
[7] J. Rosenberg, Algebraic K-theory and Its Applications, Graduate Texts in Mathematics, vol. 147, Springer-Verlag, New York, 1994.

Box 298900, Texas Christian University, Fort Worth, TX 76129
E-mail address: e.park@tcu.edu

6188 Kemeny Hall, Dartmouth College, Hanover, NH 03755
E-mail address: jody.trout@dartmouth.edu

1. Introduction
2. Equivalence of n-potents
3. K0-theory with n-potents
4. n-Homomorphisms and $K_0^n$ Functoriality
References

ABSTRACT
Let $n \geq 2$ be an integer. An \emph{$n$-potent} is an element $e$ of a ring $R$ such that $e^n = e$. In this paper, we study $n$-potents in matrices over $R$ and use them to construct an abelian group $K_0^n(R)$. If $A$ is a complex algebra, there is a group isomorphism $K_0^n(A) \cong \bigl(K_0(A)\bigr)^{n-1}$ for all $n \geq 2$. However, for algebras over cyclotomic fields, this is not true in general. We consider $K_0^n$ as a covariant functor, and show that it is also functorial for a generalization of homomorphism called an \emph{$n$-homomorphism}.
<|endoftext|><|startoftext|>
Spin coupling in zigzag Wigner crystals

A. D. Klironomos,1,2 J. S. Meyer,2 T. Hikihara,3 and K. A.
Matveev1
1 Materials Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
2 Department of Physics, The Ohio State University, Columbus, Ohio 43210, USA
3 Department of Physics, Hokkaido University, Sapporo 060-0810, Japan
(Dated: October 28, 2018)

We consider interacting electrons in a quantum wire in the case of a shallow confining potential and low electron density. In a certain range of densities, the electrons form a two-row (zigzag) Wigner crystal whose spin properties are determined by nearest and next-nearest neighbor exchange as well as by three- and four-particle ring exchange processes. The phase diagram of the resulting zigzag spin chain has regions of complete spin polarization and partial spin polarization in addition to a number of unpolarized phases, including antiferromagnetism and dimer order as well as a novel phase generated by the four-particle ring exchange.

PACS numbers: 73.21.Hb, 71.10.Pm

I. INTRODUCTION

The deviations of the conductance from perfect quantization in integer multiples of $G_0 = 2e^2/h$ observed in ballistic quantum wires at low electron densities have generated great experimental and theoretical interest in recent years [1–27]. These conductance anomalies manifest themselves as quasi-plateaus in the conductance as a function of gate voltage at about 0.5 to 0.7 of the conductance quantum $G_0$, depending on the device. Although most experiments are performed with electrons in GaAs wires [1–11], a similar “0.7 structure” was recently observed in devices formed in two-dimensional hole systems [12–14]. It is widely accepted that the origin of the quasi-plateau lies in correlation effects, but a complete understanding of this phenomenon remains elusive.
Although some alternative interpretations have been proposed [11,26,27], most commonly the experimental findings are attributed to non-trivial spin properties of quantum wires [1,4–10,14–25]. In a truly one-dimensional geometry the spin coupling is relatively simple: electron spins are coupled antiferromagnetically, and the low energy properties of the system are described by the Luttinger liquid theory. The picture may change dramatically when transverse displacements of electrons are important and the system becomes quasi-one-dimensional. In particular, the spontaneous spin polarization of the ground state, which was proposed [1,6,9,10,14–16] as a possible origin of the conductance anomalies, is forbidden in one dimension [28], but allowed in this case.

The electron system in a quantum wire undergoes a transition from a one-dimensional to a quasi-one-dimensional state when the energy of quantization in the confining potential is no longer large compared to other important energy scales. In this paper we consider the spin properties of a quantum wire with shallow confining potential. In such a wire the electron system becomes quasi-one-dimensional while the electron density is still very low, and thus the interactions between electrons are effectively strong. At very low densities, electrons in the wire form a one-dimensional structure with short-range crystalline order, the so-called Wigner crystal. As the density increases, strong Coulomb interactions cause deviations from one-dimensionality, creating a quasi-one-dimensional zigzag crystal with dramatically different spin properties. In particular, ring exchanges will be shown to play an essential role.

We find several interesting spin structures in the zigzag crystal. In a sufficiently shallow confining potential, in a certain range of electron densities, the 3-particle ring exchange dominates and leads to a fully spin-polarized ground state.
At higher electron densities, and/or in a somewhat stronger confining potential, the 4-particle ring exchange becomes important. We study the phase diagram of the corresponding spin chain using the method of exact diagonalization, and find that the 4-particle ring exchange gives rise to novel phases, including one of partial spin polarization.

The paper is organized as follows. The formation of a Wigner crystal in a quantum wire and its evolution into a zigzag chain as a function of electron density are discussed in Sec. II. Spin interactions in a zigzag Wigner crystal which arise through 2-particle as well as ring exchanges are introduced in Sec. III. The numerical calculation of the relevant exchange constants is presented in Sec. IV. The results of the numerical calculation establish the existence of a ferromagnetic phase at intermediate densities and the dominance of the 4-particle ring exchange at high densities. Subsequently, a detailed study of the zigzag chain with 4-particle ring exchange is presented in Sec. V. An attempt to construct the phase diagram for a realistic quantum wire as a function of electron density and interaction strength is presented in Sec. VI. The paper concludes with a discussion of the relation of our work to recent experiments, given in Sec. VII. A brief summary of some of our results has been reported previously in Ref. 29.

http://arxiv.org/abs/0704.0776v1

[Figure 1 panels: (a) ν = 0.70, (b) ν = 0.90, (c) ν = 1.75]
FIG. 1: Wigner crystal of electrons in a quantum wire. The structure as determined by the dimensionless distance between rows $d/r_0$ depends on the parameter $\nu$ proportional to electron density (see text). As density grows, the one-dimensional crystal (a) gives way to a zigzag chain (b,c).

II. WIGNER CRYSTALS IN QUANTUM WIRES

We consider a long quantum wire in which the electrons are confined by some smooth potential in the direction transverse to the wire axis.
Assuming a quadratic dispersion and zero temperature, the kinetic energy of an electron is typically of the order of the Fermi energy $E_F = (\pi\hbar n)^2/8m$, whereas the Coulomb interaction energy is of the order of $e^2 n/\epsilon$. Here, $n$ is the (one-dimensional) density of electrons, $\epsilon$ is the dielectric constant of the host material, and $m$ is the effective electron mass. As the density of electrons is lowered, Coulomb interactions become increasingly more important, and at $n \ll a_B^{-1}$ they dominate over the kinetic energy, where the Bohr radius is given as $a_B = \hbar^2\epsilon/me^2$. (In GaAs its value is approximately $a_B \approx 100$ Å.)

In this low-density limit, the electrons can be treated as classical particles. They will minimize their mutual Coulomb repulsion by occupying equidistant positions along the wire, forming a structure with short-range crystalline order, the so-called Wigner crystal, Fig. 1(a). Unlike in higher dimensions, the long-range order in a one-dimensional Wigner crystal is smeared by quantum fluctuations, and only weak density correlations remain at large distances [30]. However, as will be shown in the following sections, the coupling of electron spins is controlled by electron interactions at distances of order $1/n$, where the picture of a one-dimensional Wigner crystal is applicable. Henceforth, we speak of a Wigner crystal in a quantum wire with this important distinction in mind.

Upon increasing the density, the inter-electron distance diminishes, and the resulting stronger electron repulsion will eventually overcome the confining potential $V_{\rm conf}$, transforming the classical one-dimensional Wigner crystal into a staggered or zigzag chain [31,32], as depicted in Fig. 1(b,c). From the comparison of the Coulomb interaction energy $V_{\rm int}(r) = e^2/\epsilon r$ with the confining potential an important characteristic length scale emerges.
Indeed, the transition from the one-dimensional Wigner crystal to the zigzag chain is expected to take place when distances between electrons are of the order of the scale $r_0$ such that $V_{\rm conf}(r_0) = V_{\rm int}(r_0)$.

It is therefore necessary to identify the electron equilibrium configuration as a function of density. In order to proceed in a quantitative way we consider a specific model, namely a quantum wire with a parabolic confining potential $V_{\rm conf}(y) = m\Omega^2 y^2/2$, where $\Omega$ is the frequency of harmonic oscillations in the potential $V_{\rm conf}(y)$. Within that model the characteristic length scale $r_0$ is given as
$$r_0 = \left(\frac{2e^2}{\epsilon m\Omega^2}\right)^{1/3}. \qquad (1)$$
It is convenient for the following discussion to measure lengths in units of $r_0$. To that end we introduce a dimensionless density
$$\nu = n r_0. \qquad (2)$$
Then minimization of the energy with respect to the electron configuration [31,32] reveals that a one-dimensional crystal is stable for densities $\nu < 0.78$, whereas a zigzag chain forms at intermediate densities $0.78 < \nu < 1.75$. (If density is further increased, structures with larger numbers of rows appear [31,32].) The distance $d$ between rows grows with density as shown in Fig. 1. Note that at $\nu \approx 1.46$ the equilateral configuration is achieved. Therefore, at higher densities, and in a curious contradiction in terms, the distance between next-nearest neighbors is smaller than the distance between nearest neighbors (see Fig. 1(c)).

III. SPIN EXCHANGE

In order to introduce spin interactions in the Wigner crystal, it is necessary to go beyond the classical limit. In quantum mechanics spin interactions arise due to exchange processes in which electrons switch positions by tunneling through the potential barrier that separates them. The tunneling barrier is created by the exchanging particles as well as all other electrons in the wire. The resulting exchange energy is exponentially small compared to the Fermi energy $E_F$.
Furthermore, as a result of the exponential decay of the tunneling amplitude with distance, only nearest neighbor exchange is relevant in a one-dimensional crystal. Thus, the one-dimensional crystal is described by the Heisenberg Hamiltonian $H_1 = \sum_j J_1 \mathbf{S}_j \cdot \mathbf{S}_{j+1}$, where the exchange constant $J_1$ is positive and has been studied in detail recently [24,33–35]. The exchange constant being positive leads to a spin-singlet ground state with quasi-long-range antiferromagnetic order, in accordance with the Lieb-Mattis theorem [28].

The zigzag chain introduced in the previous section displays much richer spin physics. As the distance between the two rows increases as a function of density, the distance between next-nearest neighbors becomes comparable to and eventually even smaller than the distance between nearest neighbors, as illustrated in Fig. 1(b,c). Consequently, the next-nearest neighbor exchange constant $J_2$ may be comparable to or larger than the nearest neighbor exchange constant $J_1$. Drawing intuition from studies of the two-dimensional Wigner crystal [36–39], one comes to a further important realization regarding the physics of the zigzag chain: in addition to 2-particle exchange processes, ring exchange processes, in which three or more particles exchange positions in a cyclic fashion, have to be considered in this geometry.

It has long been established that, due to symmetry properties of the ground state wavefunctions, ring exchanges of an even number of fermions favor antiferromagnetism, while those of an odd number of fermions favor ferromagnetism [40]. In a zigzag chain, the Hamiltonian reads
$$H = \frac{1}{2}\sum_j \Bigl\{ J_1 P_{j\,j+1} + J_2 P_{j\,j+2} - J_3\bigl(P_{j\,j+1\,j+2} + P_{j+2\,j+1\,j}\bigr) + J_4\bigl(P_{j\,j+1\,j+3\,j+2} + P_{j+2\,j+3\,j+1\,j}\bigr) - \dots \Bigr\}, \qquad (3)$$
where $P_{j_1 \dots j_l}$ denotes the cyclic permutation operator of $l$ spins. Here the exchange constants are defined such that all $J_l > 0$. Furthermore, only the dominant $l$-particle exchanges are shown.
A more familiar form of the Hamiltonian in terms of spin operators is obtained by noting that $P_{ij} = \tfrac{1}{2} + 2\,\mathbf{S}_i \cdot \mathbf{S}_j$ and $P_{j_1 \dots j_l} = P_{j_1 j_2} P_{j_2 j_3} \cdots P_{j_{l-1} j_l}$.

Using spin operators and considering the two-spin exchanges one obtains the Hamiltonian
$$H_{12} = \sum_j \bigl(J_1 \mathbf{S}_j \cdot \mathbf{S}_{j+1} + J_2 \mathbf{S}_j \cdot \mathbf{S}_{j+2}\bigr). \qquad (4)$$
The competition between the nearest neighbor and next-nearest neighbor exchanges becomes the source of frustration of the antiferromagnetic spin order and eventually leads to a gapped dimerized ground state at $J_2 > 0.24 J_1$ [41–44].

The simplest ring exchange involves three particles and is therefore ferromagnetic. Including the 3-particle ring exchange $J_3$ in addition to the 2-particle exchanges, the Hamiltonian of the corresponding spin chain retains a simple form. The 3-particle ring exchange does not introduce a new type of coupling, but rather modifies the 2-particle exchange constants [40]. For a zigzag crystal we find the effective 2-particle exchange constants
$$\tilde{J}_1 = J_1 - 2J_3, \qquad (5)$$
$$\tilde{J}_2 = J_2 - J_3. \qquad (6)$$
Thus the total Hamiltonian has the form
$$H_{123} = \sum_j \bigl(\tilde{J}_1 \mathbf{S}_j \cdot \mathbf{S}_{j+1} + \tilde{J}_2 \mathbf{S}_j \cdot \mathbf{S}_{j+2}\bigr), \qquad (7)$$
where $\tilde{J}_1$ and $\tilde{J}_2$ can have either sign.

[Figure 2: phase diagram in the ($\tilde{J}_1$, $\tilde{J}_2$) plane; labeled regions: FM, AF, Dimers]
FIG. 2: The phase diagram including nearest neighbor, next-nearest neighbor, and 3-particle ring exchanges. The effective couplings $\tilde{J}_1$ and $\tilde{J}_2$ are defined in the text. The shaded region between the dimer and ferromagnetic phases corresponds to the exotic phase predicted in Ref. 48.

Consequently, regions of negative (i.e. ferromagnetic) nearest and/or next-nearest neighbor coupling become accessible.
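The origin of the shifts in Eqs. (5) and (6) can be traced in a few lines. The following is a sketch rather than the authors' derivation: it assumes spin-1/2 operators, the identity $P_{ij} = \tfrac12 + 2\,\mathbf{S}_i\cdot\mathbf{S}_j$, and an overall prefactor $1/2$ in front of the permutation Hamiltonian, chosen so that its two-particle part reduces to Eq. (4); constants are dropped and boundary terms ignored.

```latex
% Three-spin cyclic permutations on one triangle (spin 1/2):
P_{j\,j+1\,j+2} + P_{j+2\,j+1\,j}
  = P_{j\,j+1} + P_{j+1\,j+2} + P_{j\,j+2} - 1
  = \tfrac12 + 2\bigl(\mathbf{S}_j\cdot\mathbf{S}_{j+1}
      + \mathbf{S}_{j+1}\cdot\mathbf{S}_{j+2}
      + \mathbf{S}_j\cdot\mathbf{S}_{j+2}\bigr).

% Each nearest-neighbor bond belongs to two triangles, each
% next-nearest-neighbor bond to one, so after summing over j:
-\frac{J_3}{2}\sum_j \bigl(P_{j\,j+1\,j+2} + P_{j+2\,j+1\,j}\bigr)
  = -\sum_j \bigl(2J_3\,\mathbf{S}_j\cdot\mathbf{S}_{j+1}
      + J_3\,\mathbf{S}_j\cdot\mathbf{S}_{j+2}\bigr) + \text{const}.

% Adding the two-particle part H_{12}:
H_{123} = \sum_j \bigl[(J_1 - 2J_3)\,\mathbf{S}_j\cdot\mathbf{S}_{j+1}
      + (J_2 - J_3)\,\mathbf{S}_j\cdot\mathbf{S}_{j+2}\bigr] + \text{const}.
```

The factor of two between the corrections to $J_1$ and $J_2$ is thus purely geometric: it counts how many triangles of the zigzag chain share a given bond.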
The phase diagram of the Heisenberg spin chain (7) with both positive and negative couplings has been studied extensively [41–50]. In addition to the antiferromagnetic and dimer phases discussed earlier, a ferromagnetic phase exists for $\tilde{J}_1 < \min\{0, -4\tilde{J}_2\}$ [46]. An exotic phase called the chiral-biaxial-nematic phase has been predicted [48] to appear for $\tilde{J}_1 < 0$ and $-0.38 < \tilde{J}_2/\tilde{J}_1 < -0.25$. However, the nature of the system in this parameter region is still controversial. The phase diagram is drawn in Fig. 2.

Thus, depending on the relative magnitudes of the various exchange constants, different phases are realized. Extensive studies of the two-dimensional Wigner crystal have shown that, at low densities (or strong interactions), the 3-particle ring exchange dominates over the 2-particle exchanges. As a result, the two-dimensional Wigner crystal becomes ferromagnetic at sufficiently strong interactions [36,39]. Given that the electrons in a two-dimensional Wigner crystal form a triangular lattice, by analogy, one should expect a similar effect in the zigzag chain at densities where the electrons form approximately equilateral triangles. More specifically, upon increasing the density and consequently the distance between rows, one would expect the system to undergo a phase transition from an antiferromagnetic to a ferromagnetic phase.

To establish this scenario conclusively, the various exchange energies in the zigzag crystal have to be determined. The system differs from the two-dimensional crystal in two important aspects. (i) The electrons are subject to a confining potential as opposed to the flat background in the two-dimensional case. Even more importantly, (ii) the electron configuration depends on density, cf. Fig. 1, as opposed to the ideal triangular lattice in two dimensions.
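The phase boundaries quoted above for the chain of Eq. (7) can be collected into a small lookup. This is only a rough sketch built from the numbers in the text: the function name `classify_phase` is hypothetical, the exotic window is read as $-0.38 < \tilde{J}_2/\tilde{J}_1 < -0.25$ (the only ordering for which the interval is non-empty and adjoins the ferromagnetic boundary $\tilde{J}_1 = -4\tilde{J}_2$), and the region beyond that window is labeled as dimer on the basis of the Fig. 2 caption. The case $\tilde{J}_1 = 0$ is not covered by the quoted conditions and is returned as undetermined.

```python
def classify_phase(j1_eff, j2_eff):
    """Rough phase lookup for H_123 from the boundaries quoted in the text.

    j1_eff, j2_eff: effective couplings J1~ and J2~ (any common energy unit).
    Returns a descriptive string; boundaries themselves are not resolved.
    """
    # Ferromagnet: J1~ < min{0, -4 J2~}.
    if j1_eff < min(0.0, -4.0 * j2_eff):
        return "ferromagnetic"

    if j1_eff < 0.0:
        # Not ferromagnetic, so J2~/J1~ <= -1/4 here.
        ratio = j2_eff / j1_eff
        if -0.38 < ratio < -0.25:
            return "chiral-biaxial-nematic (predicted, debated)"
        return "dimer"

    if j1_eff > 0.0:
        # Frustration threshold J2 > 0.24 J1 for the dimerized state.
        return "dimer" if j2_eff > 0.24 * j1_eff else "antiferromagnetic"

    # J1~ == 0 is not addressed by the quoted conditions.
    return "undetermined"


if __name__ == "__main__":
    for couplings in [(1.0, 0.1), (1.0, 0.5), (-1.0, 0.1), (-1.0, 0.3), (-1.0, 1.0)]:
        print(couplings, "->", classify_phase(*couplings))
```

Combined with Eqs. (5) and (6), such a lookup makes it easy to see how a growing $J_3$ pushes a point from the antiferromagnetic region toward the ferromagnetic one.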
In the following section, we proceed with a numerical study of the exchange energies for the specific configurations of the zigzag Wigner crystal in a parabolic confining potential.

IV. SEMICLASSICAL EVALUATION OF THE EXCHANGE CONSTANTS

The effective strength of interactions is usually described by the interaction parameter $r_s$, which measures the relative magnitude of the interaction energy and the kinetic energy and is of order the distance between electrons measured in units of the Bohr radius. For quantum wires, it is more appropriate to use the parameter $r_\Omega = r_0/a_B$, which takes into account the confining potential. Within our model, the interaction parameter $r_\Omega$ is
$$r_\Omega = 2\left(\frac{m e^4}{2\epsilon^2\hbar^2}\,\frac{1}{\hbar\Omega}\right)^{2/3}. \qquad (8)$$
For $r_\Omega \gg 1$, strong interactions dominate the physics of the system, and a semiclassical description is applicable. In order to calculate the various exchange constants, we use the standard instanton method, successfully employed in the study of the two-dimensional [36–38] and one-dimensional [34,35] Wigner crystal. Within this approach, the exchange constants are given by $J_l = J_l^* \exp(-S_l/\hbar)$. Here $S_l$ is the value of the Euclidean (imaginary time) action, evaluated along the classical exchange path. By measuring length and time in units of $r_0$ and $T = \sqrt{2}/\Omega$, respectively, the action $S[\{\mathbf{r}_j(\tau)\}]$ can be rewritten in the form $S = \hbar\eta\sqrt{r_\Omega}$, where the functional
$$\eta[\{\mathbf{r}_j(\tau)\}] = \int_{-\infty}^{\infty} d\tau \left[\sum_j \left(\frac{\dot{\mathbf{r}}_j^2}{2} + y_j^2\right) + \sum_{j<i} \frac{1}{|\mathbf{r}_j - \mathbf{r}_i|}\right] \qquad (9)$$
is dimensionless. Thus, we find the exchange constants in the form
$$J_l = J_l^* \exp\left(-\eta_l\sqrt{r_\Omega}\right), \qquad (10)$$
where the dimensionless coefficients $\eta_l$ depend only on the electron configuration (cf. Fig. 1) or, equivalently, on the density $\nu$. The exponents $\eta_l$ are calculated numerically for each type of exchange by minimizing the action (9) with respect to the instanton trajectories of the exchanging electrons. This procedure is mathematically equivalent to solving a set of coupled differential equations, second order in the imaginary time $\tau$, for the trajectories $\mathbf{r}_j(\tau)$.
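For reference, the rescaling that produces $S = \hbar\eta\sqrt{r_\Omega}$ can be checked directly. The sketch below assumes the standard Euclidean action of classical electrons in the parabolic potential with Coulomb repulsion (this explicit dimensional form is our assumption; the text gives only the rescaled functional), and uses $r_0^3 = 2e^2/\epsilon m\Omega^2$, $a_B = \hbar^2\epsilon/me^2$, and the time unit $T = \sqrt{2}/\Omega$.

```latex
% Dimensional Euclidean action (assumed form), with imaginary time t:
S = \int dt \left[ \sum_j \left( \frac{m\dot{\mathbf{r}}_j^2}{2}
      + \frac{m\Omega^2 y_j^2}{2} \right)
      + \sum_{j<i} \frac{e^2}{\epsilon\,|\mathbf{r}_j - \mathbf{r}_i|} \right].

% Rescale r -> r_0 r and t -> T\tau with T = \sqrt{2}/\Omega, and use
% e^2/(\epsilon r_0) = m\Omega^2 r_0^2/2, which follows from the definition of r_0:
S = \frac{m r_0^2 \Omega}{\sqrt{2}} \int d\tau \left[ \sum_j
      \left( \frac{\dot{\mathbf{r}}_j^2}{2} + y_j^2 \right)
      + \sum_{j<i} \frac{1}{|\mathbf{r}_j - \mathbf{r}_i|} \right]
  = \frac{m r_0^2 \Omega}{\sqrt{2}}\,\eta .

% The prefactor is \hbar\sqrt{r_\Omega}, because
r_\Omega = \frac{r_0}{a_B} = \frac{r_0\, m e^2}{\hbar^2 \epsilon}
         = \frac{r_0\, m}{\hbar^2}\cdot\frac{m\Omega^2 r_0^3}{2}
         = \left( \frac{m r_0^2 \Omega}{\sqrt{2}\,\hbar} \right)^{2}.
```

With this choice of units all three terms of the action acquire the same prefactor, which is why the exponents $\eta_l$ depend on the density $\nu$ alone.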
The boundary conditions at $\tau = -\infty$ and $\tau = +\infty$ are, respectively, the original equilibrium configuration and the configuration where the electrons have exchanged positions according to the exchange process considered.

[Figure 3: exponents $\eta_l$ plotted against the dimensionless density $\nu$ over the range 0.9 to 1.8]
FIG. 3: The exponents $\eta_1$, $\eta_2$, $\eta_3$, and $\eta_4$ as functions of the dimensionless density $\nu$.

ν     η1      η2      η3      η4
1.0   1.050   2.427   1.254   1.712
1.1   1.161   2.169   1.261   1.605
1.2   1.255   1.952   1.275   1.532
1.3   1.337   1.754   1.287   1.469
1.4   1.406   1.566   1.293   1.398
1.5   1.456   1.376   1.278   1.299
1.6   1.471   1.169   1.215   1.135
1.7   1.391   0.901   1.022   0.784

TABLE I: The numerically calculated values of the density dependent exponents $\eta_l$, see Eq. (10). The computation was carried out including 12 moving spectator particles on either side of the exchanging particles. Corrections to all $\eta_l$ from the remaining spectators do not exceed 0.1%.

In the simplest approximation only the exchanging electrons are included in the calculation while all other electrons, being frozen in place, create the background potential. It turns out, however, that it is important to take into account the motion of “spectators” (the electrons in the crystal to the left and to the right of the exchanging particles) during the exchange process. The results presented here are obtained by successively adding more spectators on both sides until the values $\eta_l$ converge. We find that including 12 moving spectators on either side of the exchanging particles determines the exponents to an accuracy better than 0.1%.

Figure 3 shows the calculated exponents for various exchanges as a function of dimensionless density $\nu$, and the corresponding values are reported in Table I. At strong interactions ($r_\Omega \gg 1$), the exchange with the smallest value of $\eta_l$ is clearly dominant, and the prefactor $J_l^*$ is of secondary importance to our argument. At low densities, when the zigzag chain is still close to one-dimensional, $J_1$ is the largest exchange constant, and the spin physics is controlled by the nearest-neighbor exchange. In an intermediate density regime, when the electron configuration is close to equilateral triangles, the 3-particle ring exchange dominates. Thus, the numerical calculation confirms our original expectation, and a transition from an antiferromagnetic to a ferromagnetic state takes place upon increasing the density. Surprisingly, however, at even higher densities the 4-particle ring exchange is the dominant process. The role of the 4-particle ring exchange and the phase diagram of the associated zigzag spin chain will be the subject of the following section.

[Figure 4 panels: (a) $J_1$, (b) $J_2$, (c) $J_3$, (d) $J_4$]
FIG. 4: The calculated particle trajectories for various exchanges at a representative density $\nu = 1.5$. It is evident that only a few near neighbors of the exchanging particles move appreciably.

More complicated exchanges have also been computed, namely multi-particle ($l \geq 5$) ring exchanges as well as exchanges involving more distant neighbors. However, the exchanges displayed in Fig. 3 were found to be the dominant ones [29].

It is important to note here that spectators contribute to our results in an essential way. Allowing spectators to move results not only in quantitative changes (namely a reduction of the initially overestimated values $\eta_l$) but in qualitative changes as well: at high densities, the dominance of the 4-particle ring exchange $J_4$ over the next-nearest neighbor exchange $J_2$ is obtained only if spectators are taken into account. In particular, it is necessary to include at least 6 moving spectators on each side of the exchanging particles for $J_4$ to take over at high densities.

The considerable effect that the spectators have on the values of the exponents raises the question whether a short-ranged interaction potential might cause further quantitative or qualitative changes to the physical picture.
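The sequence of dominant exchanges described in this section can be read off Table I mechanically. The sketch below hard-codes the tabulated exponents and, for each density, picks the exchange with the smallest $\eta_l$; it also evaluates the relative suppression $\exp[-(\eta_l - \eta_{\min})\sqrt{r_\Omega}]$ implied by Eq. (10) when the prefactors $J_l^*$ are ignored. The names `TABLE_I`, `dominant_exchange`, and `relative_weight`, and the sample value of $r_\Omega$ in the usage lines, are illustrative assumptions.

```python
import math

# Exponents (eta_1, eta_2, eta_3, eta_4) from Table I, keyed by density nu.
TABLE_I = {
    1.0: (1.050, 2.427, 1.254, 1.712),
    1.1: (1.161, 2.169, 1.261, 1.605),
    1.2: (1.255, 1.952, 1.275, 1.532),
    1.3: (1.337, 1.754, 1.287, 1.469),
    1.4: (1.406, 1.566, 1.293, 1.398),
    1.5: (1.456, 1.376, 1.278, 1.299),
    1.6: (1.471, 1.169, 1.215, 1.135),
    1.7: (1.391, 0.901, 1.022, 0.784),
}


def dominant_exchange(nu):
    """Label of the exchange with the smallest exponent at density nu."""
    etas = TABLE_I[nu]
    index = min(range(len(etas)), key=lambda k: etas[k])
    return f"J{index + 1}"


def relative_weight(nu, l, r_omega):
    """exp(-(eta_l - eta_min) * sqrt(r_omega)); prefactors J_l^* are ignored."""
    etas = TABLE_I[nu]
    return math.exp(-(etas[l - 1] - min(etas)) * math.sqrt(r_omega))


if __name__ == "__main__":
    for nu in sorted(TABLE_I):
        print(nu, dominant_exchange(nu))
    # Illustrative value of the interaction parameter (an assumption).
    print(relative_weight(1.5, 4, r_omega=100.0))
```

Because the exponents at $\nu = 1.5$ and $\nu = 1.6$ differ by only a few percent, the crossover between $J_3$ and $J_4$ is sharp only for large $r_\Omega$; at moderate $r_\Omega$ the neglected prefactors may matter.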
In order to investigate that possibility we have repeated the entire calculation for a modified Coulomb interaction of the form
$$V(x) = \frac{e^2}{\epsilon}\left(\frac{1}{|x|} - \frac{1}{\sqrt{x^2 + (2d)^2}}\right). \qquad (11)$$
This particular interaction accounts for the presence of a metal gate, modeled by a conducting plane at a distance $d$ from the crystal. The gate screens the bare Coulomb potential, modifying the electron-electron interaction at long distances. Our calculation shows that this modification affects the values of the exponents only weakly, even when the gate is placed at a distance from the crystal comparable to the inter-particle spacing. Qualitatively, the physical picture remains the same, with the order of dominance of the various exchanges unaffected throughout the range of densities.

At the same time, it is particularly noteworthy that (both for the screened and unscreened interaction) the contribution of the spectator electrons saturates rapidly as their number is increased. This is an indication that the destruction of long-range order in the quasi-one-dimensional Wigner crystal by quantum fluctuations will not affect our conclusions. Figure 4 shows the particle trajectories for the dominant exchanges at a representative density of $\nu = 1.5$. The trajectories of both the exchanging particles and a subset of the spectators are shown, and their relative displacements can be readily compared.

V. FOUR-PARTICLE RING EXCHANGE

We have shown in the preceding section that in a certain range of densities, the 4-particle ring exchange dominates. Unlike the 3-particle exchange, the 4-particle ring exchange not only modifies the nearest and next-nearest neighbor exchange constants, but, in addition, introduces more complicated spin interactions [40]. For the zigzag chain, we find
$$H_4 = J_4 \sum_j \Bigl\{ \sum_{l=1}^{3} \frac{4-l}{2}\,\mathbf{S}_j \cdot \mathbf{S}_{j+l} + 2\bigl[(\mathbf{S}_j \cdot \mathbf{S}_{j+1})(\mathbf{S}_{j+2} \cdot \mathbf{S}_{j+3}) + (\mathbf{S}_j \cdot \mathbf{S}_{j+2})(\mathbf{S}_{j+1} \cdot \mathbf{S}_{j+3}) - (\mathbf{S}_j \cdot \mathbf{S}_{j+3})(\mathbf{S}_{j+1} \cdot \mathbf{S}_{j+2})\bigr] \Bigr\}. \qquad (12)$$
Not much is known about the physics of zigzag spin chains with interactions of this type.
We have studied this particular system described by the Hamiltonian $H = H_{123} + H_4$ using exact diagonalization, considering systems of $N = 12, 16, 20, 24$ sites. Periodic boundary conditions have been imposed, and we have employed the well-known Lanczos algorithm to calculate a few low-energy eigenstates.

Figure 5 shows the total spin $S$ of the ground state as a function of the effective couplings $\tilde{J}_1/J_4$ and $\tilde{J}_2/J_4$ for the largest system considered, one with $N = 24$ sites. The darkest region corresponding to maximal total spin is the ferromagnetic phase, which occurs for large negative couplings in direct analogy to the phase diagram for the system without four-spin interactions (see Fig. 2). For all system sizes that we have considered, the obtained phase boundary is almost independent of the system size and agrees very well with the conditions for ferromagnetism
$$\tilde{J}_1 + 2J_4 < 0, \qquad (13)$$
$$\tilde{J}_1 + 4\tilde{J}_2 + 10J_4 < 0, \qquad (14)$$
derived by treating the four-spin terms in the Hamiltonian (12) on a mean field level near the ferromagnetic state.

FIG. 5: Total spin $S$ of the ground state for a chain of $N = 24$ sites as a function of the effective couplings $\tilde{J}_1/J_4$ and $\tilde{J}_2/J_4$.

A new phase of partial spin polarization appears adjacent to the ferromagnetic phase. The partially polarized phase possesses a ground state total spin of $S = 2$ for $N = 12$, $S = 2$ or 4 for $N = 16, 20$, and $S = 4$ for $N = 24$; it appears that total spin of one third of the saturated magnetization $N/2$ prevails throughout most of that phase. The phase persists, to a significant extent, in range and form as $N$ increases. Therefore, we believe it is not a finite size effect.
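The mean-field origin of conditions (13) and (14) can be made explicit. The sketch below is our reconstruction, not the authors' calculation. It assumes a four-spin Hamiltonian with two-spin coefficients $(4-l)/2$ for separations $l = 1, 2, 3$ and four-spin terms of weight $+2$, $+2$, $-2$ (the form that follows from $P_{ij} = \tfrac12 + 2\,\mathbf{S}_i\cdot\mathbf{S}_j$ with an overall factor $1/2$ in the permutation Hamiltonian), the decoupling $AB \to \langle A\rangle B + A\langle B\rangle$ with $\langle \mathbf{S}_i\cdot\mathbf{S}_j\rangle = 1/4$ in the fully polarized state, and the ferromagnetic criterion $\tilde{J}_1 < \min\{0, -4\tilde{J}_2\}$ of the two-coupling chain.

```latex
% Decouple each four-spin term around the polarized state:
2(\mathbf{S}_j\cdot\mathbf{S}_{j+1})(\mathbf{S}_{j+2}\cdot\mathbf{S}_{j+3})
  \to \tfrac12\bigl(\mathbf{S}_j\cdot\mathbf{S}_{j+1}
      + \mathbf{S}_{j+2}\cdot\mathbf{S}_{j+3}\bigr), \\
2(\mathbf{S}_j\cdot\mathbf{S}_{j+2})(\mathbf{S}_{j+1}\cdot\mathbf{S}_{j+3})
  \to \tfrac12\bigl(\mathbf{S}_j\cdot\mathbf{S}_{j+2}
      + \mathbf{S}_{j+1}\cdot\mathbf{S}_{j+3}\bigr), \\
-2(\mathbf{S}_j\cdot\mathbf{S}_{j+3})(\mathbf{S}_{j+1}\cdot\mathbf{S}_{j+2})
  \to -\tfrac12\bigl(\mathbf{S}_j\cdot\mathbf{S}_{j+3}
      + \mathbf{S}_{j+1}\cdot\mathbf{S}_{j+2}\bigr).

% Collect coefficients per bond after summing over j (in units of J_4):
\text{separation 1: } \tfrac32 + 1 - \tfrac12 = 2, \qquad
\text{separation 2: } 1 + 1 = 2, \qquad
\text{separation 3: } \tfrac12 - \tfrac12 = 0.

% Effective two-coupling chain:
H \approx \sum_j \bigl[(\tilde{J}_1 + 2J_4)\,\mathbf{S}_j\cdot\mathbf{S}_{j+1}
      + (\tilde{J}_2 + 2J_4)\,\mathbf{S}_j\cdot\mathbf{S}_{j+2}\bigr] + \text{const}.

% Ferromagnetic criterion applied to the shifted couplings:
\tilde{J}_1 + 2J_4 < 0, \qquad
(\tilde{J}_1 + 2J_4) + 4(\tilde{J}_2 + 2J_4)
  = \tilde{J}_1 + 4\tilde{J}_2 + 10J_4 < 0 .
```

Under these assumptions the third-neighbor coupling cancels at the mean-field level, so the criterion of the two-coupling chain applies unchanged and reproduces both inequalities.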
We note here that it has been shown rigorously that a model described by a Hamiltonian having a similar form to ours also exhibits a ground state with partial spin polarization.51 On the other hand, the scattered points corresponding to non-zero total spin in the first quadrant (J̃1, J̃2 > 0) appear to shift position as N increases and the size of the total spin remains small, S ≤ 2, for all system sizes considered. We cannot ascertain at this point whether they persist in a larger system.

At large values of |J̃1|/J4 and |J̃2|/J4, one would expect to recover the phases present in the absence of J4. Thus, the large white area in Fig. 5 corresponding to total spin S = 0 should contain the antiferromagnetic phase, analogues of the dimer phases observed in the system without four-spin interactions, and possibly entirely new phases as well. In order to distinguish between these phases, we first calculate the overlap between the ground state wavefunctions in our model and the ones representing the dimer and antiferromagnetic phases in the well-studied model with J4 = 0. The representative ground state wavefunctions are obtained for the chain with J4 = 0 and typical parameter sets of (J̃1, J̃2) chosen deep in the dimer and antiferromagnetic phases of the phase diagram shown in Fig. 2. The results for the chain with N = 24 sites are shown in Fig. 6.

FIG. 6: Overlaps of the ground state wavefunctions in the presence of the 4-particle ring exchange with the wavefunctions representing (a) the dimer and (b) the antiferromagnetic phase for J4 = 0. The representative ground states (a) and (b) are obtained for (J̃1, J̃2, J4) = (1, 10, 0) and (J̃1, J̃2, J4) = (1, −10, 0), respectively.
As can be seen from the figure, the ground states for a broad region of large positive J̃2/J4 have a significant overlap with the representative ground state of the dimer phase while the ground states for large positive J̃1/J4 and/or negative J̃2/J4 resemble very much the one belonging to the antiferromagnetic phase. This behavior indicates the appearance of the expected dimer and antiferromagnetic phases for large effective couplings |J̃1|/J4 and |J̃2|/J4. We have confirmed the existence of these phases in the corresponding parameter regimes by studying the associated structure factors.

In order to study and clarify the properties of the system in more detail, we have calculated the excitation energies

$$ \Delta E_n(S,Q) = E_n(S,Q) - E_{gs}, \qquad (15) $$

where En(S,Q) is the energy of the n-th lowest level in the subspace characterized by the total spin S and the momentum Q, and Egs is the ground state energy. Figure 7 shows the results for the system of size N = 24, obtained along the vertical line in the phase diagram given by J̃1/J4 = 2. At large positive J̃2/J4, the ground and first-excited states belong to the subspace (S,Q) = (0, 0) and (0, π), respectively.52 These states are expected to form the ground state doublet of the dimer phase in the thermodynamic limit. For J̃2/J4 > (J̃2/J4)c,dim ∼ 3.5, one of the dimer doublet states is the ground state and the system is in the dimer phase.

FIG. 7: Excitation energies ∆En(S,Q) in the system of N = 24 sites for J̃1/J4 = 2 as functions of J̃2/J4. The two lowest levels are plotted for the subspaces of (S,Q) = (0, 0) and (0, π) while only the lowest one is shown for all other subspaces. The energies for (S,Q) = (0, 0), (0, π), and (1, π) are plotted by thick solid, dotted, and dashed curves, respectively. The energies of the levels belonging to other subspaces are shown by thin gray curves.
At smaller J̃2/J4, both states of the dimer doublet shift upward and move away from the low-energy regime, while other states decrease steeply in energy and eventually become the ground state. We therefore take the point (J̃2/J4)c,dim as the boundary of the dimer phase. After the transition, the system enters a region with exotic ground states and a large number of low-lying excitations. We have numerically checked that these exotic ground states have no or, at most, negligibly small overlap with the ground state of either the dimer or antiferromagnetic phases. When J̃2/J4 decreases further, the exotic states leave the low-energy regime and the system predictably enters the antiferromagnetic phase, which occurs for J̃2/J4 < (J̃2/J4)c,AF ∼ 0.1.

Performing the same type of analysis for several parameter lines, we can estimate the phase boundaries (J̃2/J4)c,dim and (J̃2/J4)c,AF as functions of J̃1/J4. In the limit of large negative coupling J̃1/J4 → −∞, the boundary of the dimer phase (J̃2/J4)c,dim approaches the line J̃1 = −0.38J̃2, suggesting a smooth connection to the behavior for J̃1 < 0 and J4 = 0 (cf. Ref. 48). In a similar fashion, at large positive coupling J̃1/J4, we find no indication for the appearance of exotic phases for J̃1/J4 ≥ 6; the data of the energy spectrum and the wavefunction overlaps show essentially the same behavior as at J̃1/J4 → ∞. We therefore conclude that there occurs a direct transition between the dimer and antiferromagnetic phases and estimate the transition line using the method of level spectroscopy, according to which the transition point is determined by the level crossing between the first-excited states in the dimer and antiferromagnetic phases.43

FIG. 8: The phase diagram of the Heisenberg chain including nearest neighbor, next-nearest neighbor, and 4-particle ring exchanges. The expected phases consist of a ferromagnetic and an antiferromagnetic phase as well as a dimer phase. In addition, a novel region (4P) dominated by the 4-particle ring exchange appears. The latter includes a phase of partial spin polarization (M). Triangles, squares and circles correspond to the boundaries obtained for N = 16, 20, and 24 sites, respectively. We note that although the phase of partial spin polarization persists as the system size is increased, its boundary with the 4P phase has a rather irregular size dependence and is represented approximately in the figure.

Combining all these phase boundaries and including the boundaries of the ferromagnetic and partially spin polarized phases which were obtained using the total spin of the ground state as a criterion, we determine the phase diagram in the J̃1/J4 versus J̃2/J4 plane. The result is shown in Fig. 8. The phase diagram has similarities to the one obtained without the four-spin interaction term, see Fig. 2. In particular, the expected ferromagnetic, antiferromagnetic, and dimer phases appear for large values of the effective couplings, |J̃1|/J4 and |J̃2|/J4. But more importantly, at not too large values of the effective couplings, new phases appear as a direct result of the new interaction term. We can identify a phase with partial spin polarization and a region occupied by one or several novel phases with total spin S = 0. In the region where J4 dominates, the ground state has no similarity at the level of wavefunctions with that of the conventional phases. It is important to note that the region occupied by the new phases becomes broader as the system size N grows, indicating that it survives even in the thermodynamic limit. From the analysis of the wavefunction overlaps between the ground states, there are strong indications that the novel unpolarized region might consist of several different phases.
Unfortunately, it has proven difficult to clarify the nature of the new phases and, in particular, discover the order parameters that characterize them based solely on the analysis of small systems. Therefore, the issue is relegated to future studies. In the absence of detailed understanding of its properties, we collectively dub the region of the phase diagram the “4P” phase.

VI. PHASE DIAGRAM FOR REALISTIC QUANTUM WIRES

Having identified possible phases of the zigzag chain, the most interesting question is which of the various phases appearing in the phase diagram Fig. 8 are accessible in quantum wires. At finite rΩ, the calculations of the exchange constants discussed in Sec. IV have to be completed in an important way by computing the prefactors J∗l in Eq. (10). To that effect it is necessary to take into account Gaussian fluctuations around the classical exchange path. We employ the method introduced by Voelker and Chakravarty38 which, for the sake of completeness, is outlined in the Appendix. The prefactors have the form

J∗l = AlFl rΩ , (16)

where Fl is density dependent. The factor Al is used to account for multiple classical trajectories corresponding to the same exchange process (see Appendix). Table II contains the values of Fl we calculated for the various exchanges considered in this work. Note that, in order to achieve a comparable level of convergence, a more accurate determination of the instanton trajectories was required for the calculation of the prefactors J∗l than for the calculation of the exponents ηl. By including up to 28 moving spectators on either side of the exchanging particles, we have been able to achieve an accuracy better than 2%.

We are now in a position to map out the areas of the phase diagram of Fig. 8 that are encountered as one traverses the density region of interest for a given rΩ. The resulting phase diagram obtained with the calculated exchange energies is shown in Fig. 9.
Since the semiclassical approximation is applicable only at rΩ ≫ 1, we do not extend the phase diagram to values of rΩ < 10. It turns out that the spin polarized phases are only realized at rΩ ≳ 50. On the other hand, the novel “4P” phase is expected to appear in a certain density range as long as rΩ ≫ 1.

ν      F1     F2     F3     F4
1.0    1.12   ≃ 6    1.22   2.44
1.1    1.04   ≃ 4    1.03   1.73
1.2    1.05   2.38   0.97   1.28
1.3    1.08   1.86   0.97   1.15
1.4    1.19   1.71   1.02   1.13
1.5    1.40   1.63   1.14   1.18
1.6    1.80   1.51   1.26   1.19
1.7    2.07   1.07   0.81   0.50

TABLE II: The numerically calculated values of the density dependent part Fl of the exchange energy prefactor J∗l, see Eq. (16), calculated with mobile spectators. For all the numbers reported, the accuracy is better than 2%, except for F2 at ν = 1.0, 1.1, for which extrapolated values with an estimated error of ∼ 10% are shown.

FIG. 9: The phase diagram as a function of the dimensionless density ν and interaction strength rΩ. The various phases were obtained by first calculating the effective couplings J̃1/J4 and J̃2/J4 for a given point; subsequently, the corresponding phase was determined utilizing the calculated boundaries shown in Fig. 8 for a system of N = 24 sites.

VII. DISCUSSION

In the preceding sections we have studied the coupling of spins of electrons forming a zigzag Wigner crystal in a parabolic confining potential. We have found that apart from the 2-particle exchange couplings between the nearest and next-nearest neighbor spins, the 3- and 4-particle ring exchange processes have to be taken into account. At relatively low electron densities, when the transverse displacement of electrons is small compared to the distance between particles, Fig. 1(b), the nearest-neighbor 2-particle exchange dominates. In this regime the spins form an antiferromagnetic ground state, with low-energy excitations described by the Tomonaga-Luttinger theory.
At relatively high densities, when the transverse displacements are large, Fig. 1(c), the 4-particle ring exchange processes dominate. Since the ring exchange processes involving even numbers of particles favor spin-unpolarized states, the ground state of the system in this regime has zero total spin. Finally, if the confining potential is sufficiently shallow, so that the parameter rΩ ≳ 50, there is an intermediate density range in which the 3-particle exchange processes are important, and the ground state is spontaneously spin-polarized. These results are summarized in Fig. 9.

We expect that the zigzag Wigner crystal state can be realized in quantum wires. In order for the zigzag crystal to form, the confining potential of the wire should be rather shallow, so that large values rΩ ≫ 1 of the parameter (8) could be achieved. The exact shape of the confining potential in existing wires is not well known. Using the quoted value of subband spacing ∼ 20 meV we estimate that the parameter rΩ is of order unity in cleaved-edge-overgrowth wires.53 The confining potential in split-gate quantum wires tends to be more shallow. For a typical value 1 meV of subband spacing we estimate rΩ ≈ 6. Finally, for p-type quantum wires13,54 with subband spacing ∼ 300 µeV we estimate rΩ ≈ 20. These hole systems are the most promising devices for observation of the zigzag Wigner crystal.

Given the relatively modest values of rΩ ≲ 20 in the existing quantum wire structures, we do not expect that the spontaneously spin-polarized ground state will be easily observed in experiments. Instead, we expect that as the density of charge carriers is increased, a transition from antiferromagnetism to a state dominated by 4-particle ring exchanges will occur. We have found that the ground state in this phase has a complicated size dependence, which makes it very difficult to identify its nature by exact diagonalization of finite-size chains.
To fully understand the spin properties in the high density regime, further studies of zigzag ladders with ring exchange coupling are needed.

Acknowledgments

We acknowledge helpful discussions with A. Läuchli and T. Momoi. This work was supported by the U. S. Department of Energy, Office of Science, under Contract No. DE-AC02-06CH11357. T.H. was supported in part by a Grant-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan (Grant Nos. 16740213 and 18043003). Part of the calculations were performed at the Ohio Supercomputer Center thanks to a grant of computing time.

APPENDIX: CALCULATION OF THE PREFACTORS

In order to find the prefactors J∗l in the expressions for the exchange constants, fluctuations around the instanton trajectory have to be taken into account. The Euclidean (imaginary time) path integral for the propagator G(R1,R2;T) = ⟨R1|e−TH|R2⟩ can be written as

$$ G(R_1,R_2;T) = \int_{R(0)=R_1}^{R(T)=R_2} \mathcal{D}R\; e^{-\frac{1}{\hbar}S[R]}, \qquad (A.1) $$

where the Euclidean action is given by

S[R] = + V (R) . (A.2)

Here R is an M-dimensional position vector, where M/2 is the total number of moving particles, including the exchanging particles as well as the spectators. In the semiclassical limit, the integral is dominated by the classical path Rcl(τ) that extremizes the action S for a given exchange process. (The exponents η are given as η = S[Rcl]/(ħ rΩ).) The Gaussian quantum fluctuations about the classical path can be taken into account by defining fluctuation coordinates u(τ) ≡ R(τ) − Rcl(τ) and subsequently expanding the action to second order. We obtain for the propagator

$$ G(R_1,R_2;T) = F[R_{cl}]\, e^{-\frac{1}{\hbar}S[R_{cl}]}, \qquad (A.3) $$

$$ F[R_{cl}] = \int_{u(0)=0}^{u(T)=0} \mathcal{D}u(\tau)\; e^{-\frac{1}{\hbar}\delta S[u(\tau)]}, \qquad (A.4) $$

δS[u(τ)] = 2(τ) + uT (τ)H(τ)u(τ) , (A.5)

$$ H_{kp}(\tau) = \left.\frac{\partial^2 V(R)}{\partial R_k\,\partial R_p}\right|_{R=R_{cl}(\tau)}. \qquad (A.6) $$

In the preceding formulas, R1 and R2 correspond to two configurations of electrons that minimize the electrostatic potential V(R) describing electron-electron interactions as well as the confining potential. The exchange constant is related to the ratio of the propagator for a particular exchange process R1 → R2, divided by the propagator for the trivial path Rcl(τ) = R1:

F [Rcl] F [R1] S[Rcl]. (A.7)

We start from the expression for the propagator in the semiclassical limit and proceed by partitioning the time interval [0, T] into N subintervals (τ0, τ1), (τ1, τ2), . . . , (τN−1, τN), with τ0 = 0, τN = T. The partition is chosen sufficiently fine as to enable the approximation that in each subinterval, the Hessian matrix H(τ) of the second derivative of the potential can be considered time independent, H(τ) ≃ H(τν) ≡ Hν. (In what follows, we use the convention that for the fluctuation coordinates, superscripts denote time subinterval, while subscripts denote spatial coordinate.) Subsequently the path integral is calculated as a product of path integrals over the partitioned interval. Moreover, each individual path integral is that of a multidimensional harmonic oscillator, for which analytic results exist. We then have

$$ F[R_{cl}] = \int du^1\, G_1(u^1,u^0;\tau_1-\tau_0) \cdots \int du^{N-1}\, G_{N-1}(u^{N-1},u^{N-2};\tau_{N-1}-\tau_{N-2})\, G_N(u^N,u^{N-1};\tau_N-\tau_{N-1}), \qquad (A.8) $$

and the propagator for each subinterval is

ν ,uν−1; τν − τν−1) = ∫ u(τν)=uν u(τν−1)=uν−1 Du(τ) exp 2(τ) + uT (τ)Hνu(τ) . (A.9)

Within each imaginary time subinterval, we define orthonormal eigenvectors $q^\nu_\mu = \sum_{k=1}^{M} U^\nu_{k\mu} u_k$. The unitary matrix Uν is such that Hν = UνΛν(Uν)T, with Λν a diagonal matrix of eigenvalues (λνµ)², µ = 1 . . . M, where M is the number of spatial coordinates. Then one immediately obtains

ν ,qν−1; τν − τν−1) = ∫ q(τν)=qν q(τν−1)=qν−1 Dq(τ) exp 2(τ) + qT (τ)Λνq(τ) = F̄ [qcl]e δS[qcl], (A.10)

where qcl is the classical trajectory connecting qν−1 and qν.
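For orientation, the analytic result invoked above for each subinterval is, mode by mode, the textbook imaginary-time propagator of a one-dimensional harmonic oscillator. The expression below is a sketch in a notation that is an assumption of this note (explicit mass m, frequency ω, endpoints q_a and q_b, duration Δτ); the appendix may normalize the kinetic term differently, in which case the factors of m are absorbed accordingly.

```latex
% Euclidean action of a single mode (assumed normalization):
%   S_E[q] = \int_0^{\Delta\tau} d\tau \left[ \tfrac{m}{2}\dot{q}^2 + \tfrac{m\omega^2}{2} q^2 \right]
\begin{equation*}
G(q_b, q_a; \Delta\tau)
  = \sqrt{\frac{m\omega}{2\pi\hbar\,\sinh(\omega\Delta\tau)}}\;
    \exp\!\left\{ -\frac{m\omega}{2\hbar\,\sinh(\omega\Delta\tau)}
      \left[ \left(q_a^2 + q_b^2\right)\cosh(\omega\Delta\tau) - 2\,q_a q_b \right] \right\}
\end{equation*}
```

The square-root prefactor is the fluctuation factor for paths pinned at zero at both ends, and the exponent is the classical action between q_a and q_b. Both the 1/sinh structure and the cosh and cross terms reappear in the expressions for the subinterval propagators.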
Considering the fluctuation part first, we obtain an elementary path integral

F̄ [qcl] = ∫ q(τν)=0 q(τν−1)=0 Dq(τ) exp dτ qT (τ) , (A.11)

where

Bνµ = ħ sinh(λνµ∆τν) , (A.12)

and ∆τν = τν − τν−1. The exponent δS[qcl] can now be calculated explicitly

δS[qcl] [(qνµ) cl + (q cl] cosh(λ µ∆τν) −2(qνµ)cl(qν−1µ )cl . (A.13)

The subscript “cl” used for notational clarity will be subsequently dropped from all expressions. With some additional algebra, the remaining integral is easily evaluated. With the following definitions

Γνkp = ħ tanh(λνµ∆τν) Uνpµ (A.14)

∆νkp = ħ sinh(λνµ∆τν) Uνpµ, (A.15)

we find

F [Rcl] = (2π) , (A.16)

where the M(N − 1) × M(N − 1) matrix Ωνλkp has components

Ωνλkp = (Γ kp + Γ kp )δ ν,λ −∆νkpδν,(λ+1) −∆λkpδν,(λ−1). (A.17)

The calculation of F[R1] is carried out in an identical manner and the subscript “0” will be used to distinguish the results pertaining to that calculation.

Finally, one has to account for the existence of an eigenvalue of the matrix Ω which is identically zero in the continuum limit and corresponds to the zero mode associated with uniform translation of the instanton in imaginary time. The procedure is standard55 and we simply report the result for the prefactor here. One obtains

G = T Bνµ,0 detΩ0 det′ Ω , (A.18)

where the primed determinant implies the exclusion of the eigenvalue corresponding to the zero mode. Reverting to the system of units used in this work, the prefactor of the exchange energy is given by

J∗l = Al rΩ Bνµ,0 detΩ0 det′ Ω (A.19)

The additional factor Al is used to account for multiple classical trajectories corresponding to the same exchange process, as happens for the case of nearest and next-nearest neighbor exchanges (i.e., A1 = A2 = 2, whereas Al = 1 for l ≥ 3).

The numerical implementation of the method outlined above is straightforward. In particular, the quantity that needs to be numerically calculated, once for each type of exchange at all densities of interest, is

Bνµ,0 detΩ0 det′ Ω .
(A.20)

We note here that the eigenvalue corresponding to the zero mode is easily calculated with the same procedure used by Voelker and Chakravarty38. In the definition of the prefactor, see Eqs. (A.4) and (A.5), one replaces H(τ) with H(τ) − λ, with λ a free parameter. Subsequently, a numerical search for the smallest eigenvalue that results in 1/F(λ) = 0 is carried out. The smallest eigenvalue corresponds to the zero mode, and for a finite partition of the imaginary time interval it is a small but finite number.

1 K. J. Thomas, J. T. Nicholls, M. Y. Simmons, M. Pepper, D. R. Mace, and D. A. Ritchie, Phys. Rev. Lett. 77, 135 (1996).
2 A. Kristensen, J. B. Jensen, M. Zaffalon, C. B. Sørensen, S. M. Reimann, P. E. Lindelof, M. Michel, and A. Forchel, J. Appl. Phys. 83, 607 (1998).
3 A. Kristensen, H. Bruus, A. E. Hansen, J. B. Jensen, P. E. Lindelof, C. J. Marckmann, J. Nygård, C. B. Sørensen, F. Beuscher, A. Forchel, and M. Michel, Phys. Rev. B 62, 10950 (2000).
4 K. J. Thomas, J. T. Nicholls, N. J. Appleyard, M. Y. Simmons, M. Pepper, D. R. Mace, W. R. Tribe, and D. A. Ritchie, Phys. Rev. B 58, 4846 (1998).
5 B. E. Kane, G. R. Facer, A. S. Dzurak, N. E. Lumpkin, R. G. Clark, L. N. Pfeiffer, and K. W. West, Appl. Phys. Lett. 72, 3506 (1998).
6 K. J. Thomas, J. T. Nicholls, M. Pepper, W. R. Tribe, M. Y. Simmons, and D. A. Ritchie, Phys. Rev. B 61, R13365 (2000).
7 D. J. Reilly, G. R. Facer, A. S. Dzurak, B. E. Kane, R. G. Clark, P. J. Stiles, R. G. Clark, A. R. Hamilton, J. L. O’Brien, N. E. Lumpkin, L. N. Pfeiffer, and K. W. West, Phys. Rev. B 63, 121311(R) (2001).
8 S. M. Cronenwett, H. J. Lynch, D. Goldhaber-Gordon, L. P. Kouwenhoven, C. M. Marcus, K. Hirose, N. S. Wingreen, and V. Umansky, Phys. Rev. Lett. 88, 226805 (2002).
9 D. J. Reilly, T. M. Buehler, J. L. O’Brien, A. R. Hamilton, A. S. Dzurak, R. G. Clark, B. E. Kane, L. N. Pfeiffer, and K. W. West, Phys. Rev. Lett. 89, 246801 (2002).
10 R. Crook, J. Prance, K. J. Thomas, S. J. Chorley, I. Farrer, D.
A. Ritchie, M. Pepper, and C. G. Smith, Science 312, 1359 (2006).
11 R. de Picciotto, L. N. Pfeiffer, K. W. Baldwin, and K. W. West, Phys. Rev. B 72, 033319 (2005).
12 R. Danneau, W. R. Clarke, O. Klochan, A. P. Micolich, A. R. Hamilton, M. Y. Simmons, M. Pepper, and D. A. Ritchie, Appl. Phys. Lett. 88, 012107 (2006).
13 O. Klochan, W. R. Clarke, R. Danneau, A. P. Micolich, L. H. Ho, A. R. Hamilton, K. Muraki, and Y. Hirayama, Appl. Phys. Lett. 89, 092105 (2006).
14 L. P. Rokhinson, L. N. Pfeiffer, and K. W. West, Phys. Rev. Lett. 96, 156602 (2006).
15 C.-K. Wang and K.-F. Berggren, Phys. Rev. B 54, R14257 (1996); 57, 4552 (1998); A. A. Starikov, I. I. Yakimenko, and K.-F. Berggren, Phys. Rev. B 67, 235319 (2003).
16 B. Spivak and F. Zhou, Phys. Rev. B 61, 16730 (2000).
17 V. V. Flambaum and M. Yu. Kuchiev, Phys. Rev. B 61, R7869 (2000).
18 T. Rejec, A. Ramšak, and J. H. Jefferson, Phys. Rev. B 62, 12985 (2000).
19 H. Bruus, V. V. Cheianov, and K. Flensberg, Physica E 10, 97 (2001).
20 K. Hirose, S. S. Li, and N. S. Wingreen, Phys. Rev. B 63, 033315 (2001).
21 O. P. Sushkov, Phys. Rev. B 64, 155319 (2001); Phys. Rev. B 67, 195318 (2003).
22 Y. Meir, K. Hirose, and N. S. Wingreen, Phys. Rev. Lett. 89, 196802 (2002).
23 Y. Tokura and A. Khaetskii, Physica E 12, 711 (2002).
24 K. A. Matveev, Phys. Rev. Lett. 92, 106801 (2004); Phys. Rev. B 70, 245319 (2004).
25 T. Rejec and Y. Meir, Nature 442, 900 (2006).
26 H. Bruus and K. Flensberg, Semicond. Sci. Technol. 13, A30 (1998).
27 G. Seelig and K. A. Matveev, Phys. Rev. Lett. 90, 176804 (2003).
28 E. Lieb and D. Mattis, Phys. Rev. 125, 164 (1962).
29 A. D. Klironomos, J. S. Meyer and K. A. Matveev, Europhys. Lett. 74, 679 (2006).
30 H. J. Schulz, Phys. Rev. Lett. 71, 1864 (1993).
31 R. W. Hasse and J. P. Schiffer, Ann. Phys. 203, 419 (1990).
32 G. Piacente, I. V. Schweigert, J. J. Betouras, and F. M. Peeters, Phys. Rev. B 69, 045324 (2004).
33 W. Häusler, Z. Phys. B 99, 551 (1996).
34 A. D. Klironomos, R. R.
Ramazashvili, and K. A. Matveev, Phys. Rev. B 72, 195343 (2005).
35 M. M. Fogler and E. Pivovarov, Phys. Rev. B 72, 195344 (2005); J. Phys.: Condens. Matter 18, L7 (2006).
36 M. Roger, Phys. Rev. B 30, 6432 (1984).
37 M. Katano and D. S. Hirashima, Phys. Rev. B 62, 2573 (2000).
38 K. Voelker and S. Chakravarty, Phys. Rev. B 64, 235125 (2001).
39 B. Bernu, L. Candido, and D. M. Ceperley, Phys. Rev. Lett. 86, 870 (2001).
40 D. J. Thouless, Proc. Phys. Soc. London 86, 893 (1965).
41 C. K. Majumdar and D. K. Ghosh, J. Math. Phys. 10, 1388 (1969); 10, 1399 (1969).
42 F. D. M. Haldane, Phys. Rev. B 25, R4925 (1982).
43 K. Okamoto and K. Nomura, Phys. Lett. A 169, 433 (1992).
44 S. Eggert, Phys. Rev. B 54, R9612 (1996).
45 S. R. White and I. Affleck, Phys. Rev. B 54, 9862 (1996).
46 T. Hamada, J. Kane, S. Nakagawa, and Y. Natsume, J. Phys. Soc. Jpn. 57, 1891 (1988).
47 T. Tonegawa and I. Harada, J. Phys. Soc. Jpn. 58, 2902 (1989).
48 A. V. Chubukov, Phys. Rev. B 44, 4693 (1991).
49 D. Allen, F. H. L. Essler, and A. A. Nersesyan, Phys. Rev. B 61, 8871 (2000).
50 C. Itoi and S. Qin, Phys. Rev. B 63, 224423 (2001).
51 N. Muramoto and M. Takahashi, J. Phys. Soc. Jpn. 68, 2098 (1999).
52 To be precise, we have found that the ground state at large J̃2/J4 belongs to the subspace (S,Q) = (0, 0) [(0, π)] for N = 8m [8m + 4], where m is an integer, while the first-excited state belongs to the subspace (S,Q) = (0, π) [(0, 0)].
53 A. Yacoby, H. L. Stormer, N. S. Wingreen, L. N. Pfeiffer, K. W. Baldwin, and K. W. West, Phys. Rev. Lett. 77, 4612 (1996).
54 A. J. Daneshvar, C. J. B. Ford, A. R. Hamilton, M. Y. Simmons, M. Pepper, and D. A. Ritchie, Phys. Rev. B 55, R13409 (1997).
55 S. Coleman, Aspects of Symmetry (Cambridge University Press, New York, 1988).

ABSTRACT

We consider interacting electrons in a quantum wire in the case of a shallow confining potential and low electron density.
In a certain range of densities, the electrons form a two-row (zigzag) Wigner crystal whose spin properties are determined by nearest and next-nearest neighbor exchange as well as by three- and four-particle ring exchange processes. The phase diagram of the resulting zigzag spin chain has regions of complete spin polarization and partial spin polarization in addition to a number of unpolarized phases, including antiferromagnetism and dimer order as well as a novel phase generated by the four-particle ring exchange.

<|endoftext|><|startoftext|>

arXiv:0704.0777v1 [hep-th] 5 Apr 2007

CALT-68-2636
DAMTP-2007-25
UT-07-11

Decoupling Supergravity from the Superstring

Michael B. Green,1 Hirosi Ooguri,2,3 and John H. Schwarz2

1 Department of Applied Mathematics and Theoretical Physics, Cambridge University, Cambridge CB3 0WA, UK
2 California Institute of Technology, Pasadena, CA 91125, USA
3 Department of Physics, University of Tokyo, Tokyo 113-0033, Japan

Abstract

We consider the conditions necessary for obtaining perturbative maximal supergravity in d dimensions as a decoupling limit of type II superstring theory compactified on a (10 − d)-torus. For dimensions d = 2 and d = 3 it is possible to define a limit in which the only finite-mass states are the 256 massless states of maximal supergravity. However, in dimensions d ≥ 4 there are infinite towers of additional massless and finite-mass states. These correspond to Kaluza–Klein charges, wound strings, Kaluza–Klein monopoles or branes wrapping around cycles of the toroidal extra dimensions. We conclude that perturbative supergravity cannot be decoupled from string theory in dimensions ≥ 4. In particular, we conjecture that pure N = 8 supergravity in four dimensions is in the Swampland.

March, 2007

http://arxiv.org/abs/0704.0777v1

There has recently been some speculation that four-dimensional N = 8 supergravity might be ultraviolet finite to all orders in perturbation theory [1,2,3].
If true, this would raise the question of whether N = 8 supergravity might be a consistent theory that is decoupled from its string theory extension. A related issue is whether N = 8 supergravity can be obtained as a well-defined limit of superstring theory. Here we argue that such a supergravity limit of string theory does not exist in four or more dimensions, irrespective of whether or not the perturbative approximation is free of ultraviolet divergences.

In this paper, we will study limits of Type IIA superstring theory on a (10 − d)-dimensional torus $T^{10-d}$ for various d. One may regard the following analysis as analogous to the study of the decoupling limit on Dp-branes (the limit where field theories on branes decouple from closed string degrees of freedom in the bulk) for various p [4,5]. The decoupling limit on Dp-branes is known to exist for p ≤ 5. On the other hand, subtleties have been found for p ≥ 6, where infinitely many new world-volume degrees of freedom appear in the limit. This has been regarded as a sign that a field theory decoupled from the bulk does not exist on Dp-branes for p ≥ 6. We will find similar subtleties for Type IIA theory on $T^{10-d} \times R^d$ for d ≥ 4.

It will be sufficient for our purposes to consider the torus $T^{10-d}$ to be the product of (10 − d) circles, each of which has radius R. Numerical factors, such as powers of 2π, are irrelevant to the discussion that follows and therefore will be dropped. In ten dimensions, Newton’s constant is given by $G_{10} = g^2\ell_s^8$, where ℓs is the string scale and g is the string coupling constant. Thus, the effective Newton constant in d dimensions is given by

$$ G_d \equiv \ell_d^{\,d-2} = \frac{G_{10}}{R^{10-d}} = \frac{g^2\ell_s^8}{R^{10-d}}, \qquad (1) $$

where ℓd is the d-dimensional Planck length, so that

$$ g = \frac{R^{\frac{10-d}{2}}\,\ell_d^{\frac{d-2}{2}}}{\ell_s^4}. \qquad (2) $$

We are interested in whether there is a limit of string theory that reduces to maximal supergravity, which is defined purely in terms of the dynamics of the 256 states in the massless supermultiplet.
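As a worked illustration of how Eq. (1) ties the string coupling to the compactification radius, one can solve it for g and specialize to the low dimensions considered in this paper. This is a short sketch that uses only Eq. (1), with numerical factors dropped as in the text:

```latex
\begin{align*}
g^2 &= \frac{\ell_d^{\,d-2}\, R^{\,10-d}}{\ell_s^{8}}
\quad\Longrightarrow\quad
g = \left(\frac{R}{\ell_s}\right)^{\frac{10-d}{2}}
    \left(\frac{\ell_d}{\ell_s}\right)^{\frac{d-2}{2}}, \\[4pt]
d = 2:&\quad g = \left(\frac{R}{\ell_s}\right)^{4}, \qquad
d = 3:\quad g = \left(\frac{R}{\ell_s}\right)^{7/2}\left(\frac{\ell_3}{\ell_s}\right)^{1/2}, \qquad
d = 4:\quad g = \left(\frac{R}{\ell_s}\right)^{3}\frac{\ell_4}{\ell_s}.
\end{align*}
```

The exponents of 1/ℓs add up to 4 in every case, as they must for g to be dimensionless.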
In other words, we are interested in the limit in which all the excited string states, together with the Kaluza–Klein excitations and string winding states associated with the (10 − d)-torus, decouple. A necessary condition for this to happen is that these states are all infinitely massive compared to the d-dimensional Planck scale ℓd. This is achieved by taking

$$ \frac{R}{\ell_d} \to 0 \quad\text{and}\quad \frac{\ell_s^2}{R\,\ell_d} \to 0, \qquad (3) $$

with ℓd fixed. This is compatible with keeping g fixed for d < 6. If the extra states do decouple then the surviving states are the 256 massless states of maximal supergravity, which is N = 8 supergravity when d = 4.

1 In this limit, the string length ℓs provides a regularization scale for supergravity. Thus, if string amplitudes depend sensitively on ℓs, it can be taken as evidence for ultraviolet divergences in supergravity. This is seen explicitly, for example, in the one-loop four graviton amplitude, which is ultraviolet divergent in nine dimensions. The corresponding string expression is finite and its low-energy limit is sensitive to the presence of these massive states with momenta ∼ 1/ℓs.

Let us now consider the spectrum of nonperturbative superstring excitations in this limit. First consider a Dp-brane wrapping a p cycle of the torus. The mass of such a state in d dimensions is

$$ M_p = \frac{R^p}{g\,\ell_s^{p+1}} = \frac{R^{\,p-5+\frac{d}{2}}}{\ell_s^{\,p-3}} \cdot \ell_d^{\,1-\frac{d}{2}}. \qquad (4) $$

When d ≤ 5, we also need to consider a NS5-brane wrapping a 5 cycle. This has a mass given by

$$ M_{NS5} = \frac{R^5}{g^2\ell_s^6} = \frac{\ell_s^2}{R^{\,5-d}} \cdot \ell_d^{\,2-d}. \qquad (5) $$

In order to obtain the pure supergravity theory with 32 supercharges in d dimensions, these nonperturbative states also need to decouple, so their masses must satisfy Mp, MNS5 ≫ 1/ℓd. In the case of d = 4 the nonperturbative BPS particle spectrum also includes Kaluza–Klein monopoles, which are discussed in the next paragraph.

Before studying the limit in any dimension, d, we will discuss what to expect on general grounds. A Kaluza–Klein momentum state and a wrapped string state have masses 1/R and $R/\ell_s^2$, respectively, and they are half-BPS objects that carry a single unit of a conserved charge. In d dimensions, their magnetic duals are (d − 4)-branes. The BPS saturation condition together with the Dirac quantization condition implies quite generally that the mass m of a BPS particle and the tension T of its magnetic dual (d − 4)-brane are related by

$$ mT \sim \frac{1}{G_d}. \qquad (6) $$

Applying this to d = 4, we immediately conclude that there is no limit in four dimensions where we can keep all BPS particles heavier than the Planck scale. In particular, magnetic duals of Kaluza–Klein excitations, which are the well-known Kaluza–Klein monopoles, are BPS states with masses $\sim R/\ell_4^2 \to 0$.2 Similarly, magnetic duals of wrapped strings are NS5-branes wrapping 5-cycles of $T^6$, and their masses go as $\ell_s^2/(R\,\ell_4^2) \to 0$. Later, we will discuss implications of these light states.

When d ≥ 5, at least a subset of the BPS branes become tensionless in the limit (3). By contrast, in three dimensions it is possible to define a limit where all BPS particles become infinitely massive simultaneously. In this case, magnetic duals of BPS particles are (−1)-branes, namely instantons, and their Euclidean actions vanish in the limit. Thus, one would expect nonperturbative effects to be very large in three dimensions even though no singularity is apparent from the spectrum. In two dimensions, there are no magnetic duals of BPS particles, and we expect that there is a smooth limit where all BPS particles are massive and instanton actions remain non-vanishing.

Now, let us look at each case in more detail.

When d = 2, the conditions we want to impose are and MNS5 = → ∞. (7)

On the other hand, the string coupling constant is given by . (8)

Thus, the desired limit can be taken by sending R → 0 while keeping the string coupling constant finite. In this limit, all particle masses are much higher than the Planck mass, except for the massless two-dimensional N = 16 supergravity states [6].
However, Dp-brane and NS5-brane instantons wrapping T^8 have Euclidean actions proportional to (ℓs/R) 3−p ∼ 8 and (ℓs/R) 2 ∼ g− 14 , respectively. Though the actions all remain finite and non-zero in the limit, their effects are not uniformly suppressed for small g. Thus, the resulting theory may not have a weak coupling limit that is dominated by the perturbative contribution.

When d = 3, the conditions we need to impose are and MNS5 = → ∞. (9) Since we now have · ℓ3, (10) we can rewrite (9) as and MNS5 = → ∞. (11) Since p = 0, 2, 4, 6 in Type IIA theory, this can again be arranged by taking R → 0 keeping g finite (see footnote 3). This is also compatible with the limit (3). Thus, all particle states develop large masses and may decouple, except for those in three-dimensional N = 16 supergravity theory [7]. However, Dp-brane and NS5-brane instanton actions, which are given by g 4 (R/ℓ3) 8 and g− 2 (R/ℓ3) 4 , vanish in the limit R → 0 for any finite value of g. This means that nonperturbative effects are strong and it may be difficult to determine the properties of the resulting three-dimensional supergravity.

2 If the torus has six independent radii R_i, the Kaluza–Klein monopole mass spectrum has the form M2 = (niRi/ℓ

In view of these observations, it is interesting that gravity theories formulated in terms of a finite number of fields are known to exist in two and three dimensions. In three dimensions, the relation with Chern–Simons gauge theory [8] suggests that pure Einstein gravity is finite to all orders in perturbation theory. However, this theory has no propagating degrees of freedom, and it is not known whether there is a finite quantum gravity theory in three dimensions that includes propagating (scalar or spin-1/2) degrees of freedom. Such degrees of freedom are present, of course, in the examples considered here.
The fact that we find limits of string theory compactifications with a finite number of such propagating degrees of freedom in these dimensions may be encouraging, though the implications of the nonperturbative instanton contributions need to be understood.

When d = 4, the conditions, (3), necessary for the extra modes to have infinite masses and MNS5 = → ∞. (12) Clearly, this cannot be realized simultaneously for all p = 0, 2, 4, 6. This is in accord with the general argument given earlier, since a wrapped Dp-brane and a wrapped D(6 − p)-brane are electric–magnetic duals. Similarly, the magnetic duals of Kaluza–Klein excitations and wrapped strings are Kaluza–Klein monopoles and wrapped NS5-branes, whose masses behave as R/ℓ_4^2 and ℓ_s^2/(Rℓ_4^2), respectively. There are infinitely many such states since they have arbitrary integer charges. In the limit R, ℓ_s^2/R → 0, there is no mass gap and the spectrum becomes continuous.

3 Note that, in the Type IIB theory, a wrapped D7-brane cannot be made heavy unless g ≫ 1. This is not in contradiction with T-duality since g transforms under T-duality in such a way that ℓ_p given by (1) remains invariant. T-duality along one of the circles on T^{10−d} transforms the coupling g → gℓ_s/R, so it diverges in the limit R → 0 with the original coupling constant, given by (10), kept finite.

To understand the implications of these infinitely many light states, we note that among the elements of the four-dimensional U-duality group E_7(Z) is the four-dimensional S-duality transformation that interchanges the 28 types of electric charge with the corresponding magnetic charges [9,10]. This duality is described by the following transformations of the moduli, S : R → R̃ = ℓ and ℓs → ℓ̃s = . (13) Note that this transformation inverts the radius R in four-dimensional Planck units (in contrast to T-duality, which inverts R in string units).
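The contrast between the two inversions can be checked with exact arithmetic. The sketch below uses the masses quoted in the text (1/R for a Kaluza–Klein momentum state, R/ℓ_s^2 for a wrapped string, R/ℓ_4^2 for a Kaluza–Klein monopole). The explicit maps R → ℓ_s^2/R and R → ℓ_4^2/R are assumptions: they are one reading of "inverts R in string units" and "inverts R in four-dimensional Planck units", and all function names are illustrative.

```python
from fractions import Fraction


def kk_mass(R):
    """Mass 1/R of a Kaluza-Klein momentum state with one unit of charge."""
    return 1 / R


def winding_mass(R, ell_s):
    """Mass R / ell_s^2 of a string wound once around the circle."""
    return R / ell_s**2


def kk_monopole_mass(R, ell_4):
    """Mass scale R / ell_4^2 of a Kaluza-Klein monopole in d = 4."""
    return R / ell_4**2


def t_dual_radius(R, ell_s):
    """Assumed form of T-duality: inversion of R in string units."""
    return ell_s**2 / R


def s_dual_radius(R, ell_4):
    """Assumed form of the S-duality map: inversion of R in Planck units."""
    return ell_4**2 / R


# Arbitrary sample values; Fractions keep every comparison exact.
R, ell_s, ell_4 = Fraction(1, 10), Fraction(3), Fraction(7)

# T-duality trades the momentum state for the wrapped string.
t_swap = kk_mass(t_dual_radius(R, ell_s)) == winding_mass(R, ell_s)

# The Planck-unit inversion trades it for the Kaluza-Klein monopole instead.
s_swap = kk_mass(s_dual_radius(R, ell_4)) == kk_monopole_mass(R, ell_4)
```

Under these assumptions both comparisons hold identically in R, which matches the statement that a Kaluza–Klein excitation is related to a wrapped string by T-duality but to a Kaluza–Klein monopole by the S-duality transformation.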
Since g is related to R and ℓ_s by (2), this transformation acts as the inversion g → g̃ = 1/g, which maps BPS states into each other. For example, a wrapped Dp-brane is interchanged with a wrapped D(6 − p)-brane. Similarly, a Kaluza–Klein excitation is interchanged with a Kaluza–Klein monopole (whereas T-duality would relate it to a wrapped F-string). Thus, in the dual frame in which the compactification scale R̃ → ∞, the six-torus is decompactified. This explains the continuous spectrum in the limit (3).

The fact that an infinite set of states from the nonperturbative sector become massless shows that the limit of interest does not result in pure N = 8 supergravity in four dimensions. Rather, it results in 10-dimensional decompactified string theory with the string coupling constant inverted. This is true in both the type IIA and type IIB cases. The only way of avoiding this would be to relax (3), in which case there would instead be extra finite-mass Kaluza–Klein or winding number states, which would therefore not decouple.

One may regard our results on the limit of superstring compactification on T^{10−d} as examples illustrating the conjectures formulated in [11,12] on the geometry of continuous moduli parameterizing the string landscape. The conjectures concern consistent quantum gravity theories with finite Planck scale in four or more dimensions. Among the conjectures are the statements that, if a theory has continuous moduli, there are points in the moduli space that are infinitely far away from each other, and an infinite tower of modes becomes massless as a point at infinity is approached [12]. Since the limit considered in this paper corresponds to a point in the moduli space of string compactifications at infinite distance from a generic point in the middle of moduli space, the conjectures predict that an infinite number of particles become massless in the limit.
For d = 4, we have found that among such particles are Kaluza–Klein monopoles, i.e., Kaluza–Klein modes on T^6 in the dual frame in the limit R̃ → ∞. On the other hand, the moduli space of pure N = 8 supergravity also contains infinite distance points, but it does not take account of the new light particles that appear near these points. If the BPS particles required by string theory were included one would have string theory and not N = 8 supergravity (see footnote 4). Thus, the conjectures of [12] imply that N = 8 supergravity is in the Swampland. Similarly, there are many superstring compactifications with N < 8 supersymmetry, and discarding stringy states in these compactifications results in further supergravity theories in the Swampland.

It is interesting to see how scattering amplitudes behave in the limit (3). Consider a four-dimensional graviton scattering amplitude where the graviton momenta are below the four-dimensional Planck scale. According to (1) and (2), the ten-dimensional Planck length, ℓ_10, is given by ℓ_10 = g^{1/4} ℓ_s = R . (14) After the S-duality transformation (13), the limit R → 0 turns into R̃ → ∞. Thus, we have ℓ̃10 = R̃ → ∞ in ten dimensions. Since ℓ̃_10 ≪ R̃, the extra dimensions decompactify and the theory is effectively ten-dimensional. Furthermore, if we take this limit keeping the graviton momenta fixed (in units of the four-dimensional Planck mass), the scattering process becomes trans-Planckian. Generically, we expect that it will involve formation and evaporation of virtual black holes in ten dimensions.

The original motivation of this work was to investigate the relation between superstring theory and N = 8 supergravity to see, in particular, under what conditions supergravity might be ultraviolet finite. What we have found is that in four or more dimensions (d ≥ 4) there is no limit of compactified superstring theory in which the stringy effects decouple and only the 256 massless supergravity fields survive below the four-dimensional Planck scale.
This is true whether or not there are ultraviolet divergences in supergravity perturbation theory. Of course, there is a well-defined procedure for extracting UV finite four-dimensional scattering amplitudes from perturbative string theory. This involves taking g → 0 first, before taking the limit (3). However, this procedure does not keep ℓ_4 fixed, and therefore it does not correspond to the limit considered in this paper.

4 One can imagine an alternative history in which type II superstring theory and M-theory were discovered by properly interpreting the BPS solitons of N = 8 supergravity.

It might be instructive to compare the situation to that of the conifold limit of Calabi–Yau compactified type II superstring theory studied by Strominger [13]. In that case, certain terms in the low-energy effective theory that are independent of the string coupling constant g, due to the decoupling of vector and hypermultiplet fields, can be computed in string perturbation theory. One can estimate the singularity of these terms using the fact that a brane wrapping a vanishing cycle describes a nonperturbative BPS particle that becomes massless in the conifold limit. If one could identify analogous terms in N = 8 supergravity, one could transform the Feynman diagram computation in four-dimensional supergravity into a corresponding computation in ten dimensions, which might give insight into the question of ultraviolet finiteness.

The situation is qualitatively different in two and three dimensions (d = 2, 3), where all non-supergravity states develop masses larger than the Planck scale in the limit (3), and therefore they can decouple. In these cases only the 256 massless supergravity states survive, and a self-contained quantum gravity theory may well exist decoupled from string theory. We have found, however, that in the d = 3 case there are instantons with zero action, which give rise to large nonperturbative contributions.
In the d = 2 case the instanton actions do not vanish in the limit (3), but not all of them are small when g is small. Therefore, in this case too, the amplitudes may not be dominated by the perturbative contribution.

Acknowledgments

We thank Z. Bern, N. Dorey, C. Hull, J. Russo, N. Seiberg, A. Sen, M. Shigemori, Y. Tachikawa, D. Tong, P. Vanhove and E. Witten for discussions. H.O. thanks the particle theory group of the University of Tokyo for its hospitality. H.O. and J.H.S. are supported in part by the DOE grant DE-FG03-92-ER40701. The research of H.O. is also supported in part by the NSF grant OISE-0403366 and by the 21st Century COE Program at the University of Tokyo.

References

[1] M. B. Green, J. G. Russo and P. Vanhove, “Non-renormalisation conditions in type II string theory and maximal supergravity,” JHEP 0702, 099 (2007) [arXiv:hep-th/0610299].
[2] Z. Bern, L. J. Dixon and R. Roiban, “Is N = 8 supergravity ultraviolet finite?,” Phys. Lett. B 644, 265 (2007) [arXiv:hep-th/0611086].
[3] Z. Bern, J. J. Carrasco, L. J. Dixon, H. Johansson, D. A. Kosower and R. Roiban, “Three-loop superfiniteness of N = 8 supergravity,” arXiv:hep-th/0702112.
[4] A. Sen, “D0-branes on T^n and matrix theory,” Adv. Theor. Math. Phys. 2, 51 (1998) [arXiv:hep-th/9709220].
[5] N. Seiberg, “Why is the matrix model correct?,” Phys. Rev. Lett. 79, 3577 (1997) [arXiv:hep-th/9710009].
[6] H. Nicolai and N. P. Warner, “The structure of N = 16 supergravity in two dimensions,” Commun. Math. Phys. 125, 369 (1989).
[7] N. Marcus and J. H. Schwarz, “Three-dimensional supergravity theories,” Nucl. Phys. B 228, 145 (1983).
[8] E. Witten, “(2+1)-dimensional gravity as an exactly soluble system,” Nucl. Phys. B 311, 46 (1988).
[9] C. M. Hull and P. K. Townsend, “Unity of superstring dualities,” Nucl. Phys. B 438, 109 (1995) [arXiv:hep-th/9410167].
[10] C. M. Hull, “String dynamics at strong coupling,” Nucl. Phys. B 468, 113 (1996) [arXiv:hep-th/9512181].
[11] C.
Vafa, “The string landscape and the swampland,” arXiv:hep-th/0509212.
[12] H. Ooguri and C. Vafa, “On the geometry of the string landscape and the swampland,” [arXiv:hep-th/0605264].
[13] A. Strominger, “Massless black holes and conifolds in string theory,” Nucl. Phys. B 451, 96 (1995) [arXiv:hep-th/9504090].

ABSTRACT

We consider the conditions necessary for obtaining perturbative maximal supergravity in d dimensions as a decoupling limit of type II superstring theory compactified on a (10 − d)-torus. For dimensions d = 2 and d = 3 it is possible to define a limit in which the only finite-mass states are the 256 massless states of maximal supergravity. However, in dimensions d > 3 there are infinite towers of additional massless and finite-mass states. These correspond to Kaluza–Klein charges, wound strings, Kaluza–Klein monopoles or branes wrapping around cycles of the toroidal extra dimensions. We conclude that perturbative supergravity cannot be decoupled from string theory in dimensions d > 3. In particular, we conjecture that pure N = 8 supergravity in four dimensions is in the Swampland.
<|endoftext|><|startoftext|>
XUHUA HE AND JESPER FUNCH THOMSEN, http://arxiv.org/abs/0704.0778v2

Introduction

Let G denote a connected and reductive group over an algebraically closed field k, and let B denote a Borel subgroup of G. An equivariant embedding X of G is a G × G-variety which contains G = (G × G)/diag(G) as an open G × G-invariant subset, where diag(G) is the diagonal image of G in G × G. Any equivariant embedding X of G contains finitely many B × B-orbits. In recent years the geometry of closures of B × B-orbits has been studied by several authors. The most general result was obtained in [H-T2], where it was proved that B × B-orbit closures are normal, Cohen-Macaulay and have (F-)rational singularities (actually, even stronger results were obtained). In the present paper we will study (closed) subvarieties in X of the form diag(G) · V, where V denotes the closure of a B × B-orbit. Subvarieties of equivariant embeddings of G of this form will be called G-Schubert varieties.

When G is a semisimple group of adjoint type there exists a canonical equivariant embedding X of G which is called the wonderful compactification. The wonderful compactifications are of primary interest in this paper. Actually, this work arose from the question of describing the closures of the so-called G-stable pieces of X. The G-stable pieces make up a decomposition of X into locally closed subsets. They were introduced by Lusztig in [L], where they were used to construct and study a class of perverse sheaves which generalizes his theory of character sheaves on reductive groups. More precisely, these perverse sheaves are the intermediate extensions of the so-called “character sheaves” on a G-stable piece. This motivates the study of closures of G-stable pieces, which turn out to coincide with the set of G-Schubert varieties.

Before discussing the closures of G-stable pieces in detail, let us make a short digression and discuss some other motivations for studying G-stable pieces and G-Schubert varieties (in wonderful compactifications):

(1) When G is a simple group, the boundary of the closure of the unipotent subvariety of G in the wonderful compactification X is a union of certain G-Schubert varieties (see [He] and [H-T]). Thus knowing the geometry of these G-Schubert varieties will help us to understand the geometry of the closure of the unipotent variety within X.
(2) Let Lie(G) denote the Lie algebra of a simple group G over a field of characteristic zero. Let ≪ , ≫ denote a fixed symmetric non-degenerate ad-invariant bilinear form. Let ⟨ , ⟩ be the bilinear form on Lie(G) ⊕ Lie(G) defined by ⟨(x, y), (x′, y′)⟩ = ≪x, x′≫ − ≪y, y′≫. In [E-L], Evens and Lu showed that each splitting Lie(G) ⊕ Lie(G) = l ⊕ l′, where l and l′ are Lagrangian subalgebras of Lie(G) ⊕ Lie(G), gives rise to a Poisson structure Π_{l,l′} on X. If, moreover, one starts with the Belavin-Drinfeld splitting, then all the G-stable pieces/G-Schubert varieties and B × B^−-orbits of X are Poisson subvarieties, where B^− is a Borel subgroup opposite to B. Thus to understand the Poisson structure on X corresponding to the Belavin-Drinfeld splitting, one needs to understand the geometry of the G-stable pieces/G-Schubert varieties. If we start with another splitting, then we obtain a different Poisson structure on X, and in order to understand these Poisson structures one needs to study the R-stable pieces [L-Y] instead (see Section 12), which generalize both the G-stable pieces and the B × B^−-orbits.

The main technical ingredient in this paper is the positive characteristic notion of Frobenius splitting. Frobenius splitting is a powerful tool which has proved very useful in obtaining strong geometric conclusions for e.g. Schubert varieties and closures of B × B-orbits in equivariant embeddings. In the present paper we obtain two types of results related to G-Schubert varieties over fields of positive characteristic. First of all, if we fix an equivariant embedding X of a reductive group G then we prove that all G-Schubert varieties in X are simultaneously compatibly Frobenius split by a Frobenius splitting of X. Secondly, concentrating on a single G-Schubert variety X, in a smooth projective and toroidal embedding X, we prove that this admits a stable Frobenius splitting along an ample divisor.
Statements of this form put strong conditions on the intertwined behavior of cohomology groups of line bundles on X and its G-Schubert varieties. As this is related to geometric properties, it seems natural to expect that G-Schubert varieties should have nice singularities. It therefore comes as a complete surprise that G-Schubert varieties, in general, are not even normal. We only provide a single example of this phenomenon (in the wonderful compactification of a group of type G_2), but expect that this absence of normality is the general picture.

In obtaining the Frobenius splitting results mentioned above, we have developed some general theory of how to construct Frobenius splittings of varieties of the form G ×_P X (see Section 4.2 for the definition). This part of the paper is influenced by the theory of B-canonical Frobenius splittings as discussed in [B-K, Chap.4]; in particular the proof of [B-K, Prop.4.1.17]. The presentation we provide is more general and makes it possible to extract even better results from the ideas of B-canonical Frobenius splittings. This theory is presented in Section 5 in a generality which is more than necessary for obtaining the described Frobenius splitting results for G-Schubert varieties. However, we hope that this theory could be useful elsewhere and we certainly consider it to be of independent interest.

This paper is organized in the following way. In Section 2 we introduce notation, and in Section 3 we briefly define Frobenius splitting and explain its fundamental ideas. Section 4 is devoted to some results on linearized sheaves which should all be well known. In Section 5 we study the Frobenius splitting of varieties of the form G ×_P X for a variety X with an action by a parabolic subgroup P. The main idea is to decompose the Frobenius morphism on G ×_P X into maps associated to the Frobenius morphism on the base G/P and the fiber X of the natural morphism G ×_P X → G/P.
In Section 6 we relate B-canonical Frobenius splittings to the material in Section 5. Section 7 contains applications of Section 5 to general G × G-varieties. In Section 8 we define the G-stable pieces and G-Schubert varieties. In Section 9 we apply the material of the previous sections to the class of equivariant embeddings and obtain Frobenius splitting results for G-Schubert varieties. Section 10 contains results related to cohomology of line bundles on G-Schubert varieties. Section 11 contains an example of a non-normal G-Schubert variety. Finally, Section 12 contains generalizations and variations of the previous sections.

We would like to thank the referee for a careful reading of this paper and for numerous suggestions concerning the presentation.

2. Notation

We will work over a fixed algebraically closed field k. The characteristic of k will depend on the application. By a variety we mean a reduced and separated scheme of finite type over k. In particular, we allow a variety to have several irreducible components.

2.1. Group setup. We let G denote a connected linear algebraic group over k. We fix a Borel subgroup B and a maximal torus T ⊂ B. The notation P is used for a parabolic subgroup of G containing B. The set of T-characters is denoted by X^∗(T) and we identify this set with the set X^∗(B) of B-characters.

2.2. Reductive case. In many cases we will specialize to the case where G is reductive. In this case we will also use the following notation: the set of roots determined by T is denoted by R ⊆ X^∗(T), while the set of positive roots determined by (B, T) is denoted by R^+. The simple roots are denoted by α_1, . . . , α_l, and we let ∆ = {1, . . . , l} denote the associated index set. The simple reflection associated to the simple root α_i is then denoted by s_i. The Weyl group W = N_G(T)/T is generated by the simple reflections s_i, for i ∈ ∆. The length of w ∈ W will be denoted by l(w).
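For a concrete instance of this notation, the following sketch works in type A_2 (for example G = GL_3), where the Weyl group can be identified with the symmetric group S_3 and the simple reflections with the adjacent transpositions. It computes l(w) as the length of a shortest word in the simple reflections and compares it with the number of inversions. The identification with S_n and all function names are assumptions made for illustration; nothing here is taken from the paper beyond the definitions of s_i and l(w).

```python
from collections import deque
from itertools import combinations


def simple_reflections(n):
    """Adjacent transpositions s_1, ..., s_{n-1} of S_n in one-line notation."""
    gens = []
    for i in range(n - 1):
        perm = list(range(n))
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
        gens.append(tuple(perm))
    return gens


def compose(u, v):
    """Product u*v, acting as (u*v)(i) = u(v(i))."""
    return tuple(u[v[i]] for i in range(len(v)))


def word_lengths(n):
    """Breadth-first search from the identity in the Cayley graph of S_n.

    The depth at which w is first reached is the length of a shortest
    expression of w as a product of simple reflections, i.e. l(w).
    """
    identity = tuple(range(n))
    lengths = {identity: 0}
    queue = deque([identity])
    gens = simple_reflections(n)
    while queue:
        w = queue.popleft()
        for s in gens:
            ws = compose(w, s)
            if ws not in lengths:
                lengths[ws] = lengths[w] + 1
                queue.append(ws)
    return lengths


def inversions(w):
    """Number of pairs i < j with w(i) > w(j)."""
    return sum(1 for i, j in combinations(range(len(w)), 2) if w[i] > w[j])


lengths = word_lengths(3)
longest = max(lengths, key=lengths.get)  # the unique element of maximal length
```

In this small case the search reaches all six elements, so the two simple reflections do generate the group, and the unique element of maximal length is the order-reversing permutation.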
For J ⊂ ∆, let W_J denote the subgroup of W generated by the simple reflections associated with the elements in J, and let W^J (resp. ^J W) denote the set of minimal length coset representatives for W/W_J (resp. W_J\W). The element in W of maximal length will be denoted by w_0, while w_0^J is used for the same kind of element in W_J. For any w ∈ W, we let ẇ denote a representative of w in N_G(T).

For J ⊂ ∆, let P_J ⊃ B denote the corresponding standard parabolic subgroup and P_J^− ⊃ B^− denote its opposite parabolic. Let L_J = P_J ∩ P_J^− be the common Levi subgroup of P_J and P_J^− containing T. Let U_J (resp. U_J^−) denote the unipotent radical of P_J (resp. P_J^−). When J = ∅ we also use the notation U and U^− for U_J and U_J^− respectively.

When G is semisimple and simply connected we may associate a fundamental character ω_i to each simple root α_i. The sum of the fundamental characters is then denoted by ρ. Then ρ also equals half the sum of the positive roots.

3. The relative Frobenius morphism

In this section we collect some results related to the Frobenius morphism and to the concept of Frobenius splitting. Compared to other presentations on the same subject, this presentation differs only in its emphasis on the set Hom_{O_{X′}}((F_X)_∗O_X, O_{X′}) (to be defined below) and not just the set of Frobenius splittings. Thus, the obtained results are only small variations of already known results, as can be found in e.g. [B-K].

3.1. The Frobenius morphism. By definition a variety X comes with an associated morphism p_X : X → Spec(k) of schemes. Assume that the field k has positive characteristic p > 0. Then the Frobenius morphism on Spec(k) is the morphism of schemes F_k : Spec(k) → Spec(k), which on the level of coordinate rings is defined by a ↦ a^p. As k is assumed to be algebraically closed, the morphism F_k is actually an isomorphism and we let F_k^{−1} denote the inverse morphism. Composing p_X with F_k^{−1} we obtain a new variety p′_X : X → Spec(k), with underlying scheme X.
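The reason a ↦ a^p defines a morphism at all is that the p-th power map is additive in characteristic p. The sketch below checks this on polynomials with coefficients in Z/m, where the identity (f + g)^m = f^m + g^m holds for prime m and can fail otherwise. Working with Z/m[x] instead of a general ring over an algebraically closed field k is a simplifying assumption, and the helper names are illustrative.

```python
def trim(f):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_add(f, g, m):
    """Sum of two coefficient lists (lowest degree first), reduced mod m."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([(a + b) % m for a, b in zip(f, g)])


def poly_mul(f, g, m):
    """Product of two coefficient lists, reduced mod m."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % m
    return trim(out)


def poly_pow(f, n, m):
    """n-th power by repeated multiplication, reduced mod m."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, f, m)
    return result


def power_map_is_additive(f, g, m):
    """Check (f + g)^m == f^m + g^m in (Z/m)[x] for the given f and g."""
    lhs = poly_pow(poly_add(f, g, m), m, m)
    rhs = poly_add(poly_pow(f, m, m), poly_pow(g, m, m), m)
    return lhs == rhs


# f = 1 + 2x and g = x + x^2, with the coefficient of x^i at index i.
f, g = [1, 2], [0, 1, 1]
additive_mod_3 = power_map_is_additive(f, g, 3)
additive_mod_5 = power_map_is_additive(f, g, 5)
# For the composite modulus 4 the cross terms survive: (1 + x)^4 has a 2x^2 term.
additive_mod_4 = power_map_is_additive([1], [0, 1], 4)
```

The failure for the modulus 4 shows that primality is doing real work here: the cross terms are binomial coefficients, and they vanish modulo m only when m is prime.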
In the following we suppress the morphism p_X from the notation and simply use X as the notation for the variety defined by p_X. The variety defined by p′_X is then denoted by X′. The relative Frobenius morphism on X is then the morphism of varieties
F_X : X → X′
which as a morphism of schemes is the identity map on the level of points and where the associated map of sheaves
F_X^♯ : O_{X′} → (F_X)_∗O_X
is the p-th power map. A key property of the Frobenius morphism is the relation
(1) (F_X)^∗L′ ≃ L^p
which is satisfied for every line bundle L on X (here L′ denotes the corresponding line bundle on X′).

3.2. Frobenius splitting. A variety X is said to be Frobenius split if the O_{X′}-linear map of sheaves
F_X^♯ : O_{X′} → (F_X)_∗O_X
has a section; i.e. if there exists an element s ∈ Hom_{O_{X′}}((F_X)_∗O_X, O_{X′}) such that the composition s ◦ F_X^♯ is the identity endomorphism of O_{X′}. The section s will be called a Frobenius splitting of X.

3.3. Compatibility with line bundles and closed subvarieties. Fix a line bundle L on X and a closed subvariety Y in X with sheaf of ideals I_Y. Let Y′ denote the closed subvariety of X′ associated to Y, with sheaf of ideals denoted by I_{Y′}. The kernel of the natural morphism
Hom_{O_{X′}}((F_X)_∗L, O_{X′}) → Hom_{O_{X′}}((F_X)_∗(L ⊗ I_Y), O_{Y′})
induced by the inclusion L ⊗ I_Y ⊂ L and the projection O_{X′} → O_{Y′} will be denoted by ℰnd_F^L(X, Y). The associated space of global sections will be denoted by End_F^L(X, Y). When Y = X we simply denote ℰnd_F^L(X, Y) (resp. End_F^L(X, Y)) by ℰnd_F^L(X) (resp. End_F^L(X)). The sheaf ℰnd_F^L(X, Y) is a subsheaf of ℰnd_F^L(X) consisting of the elements compatible with Y. Moreover, there is a natural morphism
ℰnd_F^L(X, Y)|_Y → ℰnd_F^L(Y),
where the notation |_Y means restriction to Y. If Y_1, Y_2, . . . , Y_m is a collection of closed subvarieties of X then the notation ℰnd_F^L(X, Y_1, . . . , Y_m) (or sometimes ℰnd_F^L(X, {Y_i}_{i=1}^m)) will denote the intersection of the subsheaves ℰnd_F^L(X, Y_i) for i = 1, . . . , m.
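A standard small example may help fix these definitions: the affine line. The sketch below takes the coordinate ring to be F_p[x] with p = 3 (an assumption made so that X′ can be identified with X and p-th roots of coefficients are trivial; the paper works over an algebraically closed field). The map sending x^m to x^{m/p} when p divides m, and to 0 otherwise, is then checked to be a section of the p-th power map, to be linear in the twisted sense s(f^p g) = f s(g), and to preserve the ideal of the origin. Polynomials are dictionaries from exponents to coefficients, and all names are illustrative.

```python
P = 3  # the characteristic; any prime works in this sketch


def normalize(f):
    """Reduce coefficients mod P and drop zero terms."""
    return {e: c % P for e, c in f.items() if c % P}


def mul(f, g):
    """Product of two polynomials given as {exponent: coefficient}."""
    out = {}
    for e1, c1 in f.items():
        for e2, c2 in g.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return normalize(out)


def power(f, n):
    """n-th power by repeated multiplication."""
    result = {0: 1}
    for _ in range(n):
        result = mul(result, f)
    return result


def split(f):
    """Candidate splitting: x^m -> x^(m/P) if P divides m, else 0.

    Over F_P every coefficient is its own P-th root, so coefficients are
    left unchanged; over a larger field one would take P-th roots.
    """
    return normalize({e // P: c for e, c in f.items() if e % P == 0})


f = normalize({0: 1, 1: 2})        # 1 + 2x
g = normalize({0: 2, 2: 1, 3: 1})  # 2 + x^2 + x^3
x = {1: 1}

is_section = split(power(f, P)) == f               # s(f^p) = f, so s(1) = 1
is_linear = split(mul(power(f, P), g)) == mul(f, split(g))
keeps_origin_ideal = split(mul(x, g)).get(0, 0) == 0  # s maps (x) into (x)
```

The last check is the compatibility condition of Section 3.3 for Y the origin, in the special case L = O_X: every monomial x^m with m ≥ 1 is sent either to 0 or to x^{m/p} with m/p ≥ 1.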
The set of global sections of the sheaf ℰnd_F^L(X, Y_1, . . . , Y_m) will be denoted by End_F^L(X, Y_1, . . . , Y_m). When L = O_X we remove L from all of the above notation. In particular, the vector space End_F(X) denotes the set of morphisms from (F_X)_∗O_X to O_{X′} and thus contains the set of Frobenius splittings of X. A Frobenius splitting s of X contained in End_F(X, {Y_i}_i) is said to be compatible with the subvarieties Y_1, . . . , Y_m. When s is compatible in this sense it induces a Frobenius splitting of each Y_i for i = 1, . . . , m. In this case we also say that s compatibly Frobenius splits Y_1, . . . , Y_m. In concrete terms, this is equivalent to
s((F_X)_∗I_{Y_i}) ⊂ I_{Y′_i}
for all i.

Lemma 3.1. Let Y and Z denote closed subvarieties in X and let s denote a global section of ℰnd_F^L(X, Z, Y).
(1) s ∈ End_F^L(X, Y_1) for every irreducible component Y_1 of Y.
(2) If the scheme theoretic intersection Z ∩ Y is reduced then s is contained in End_F^L(X, Y ∩ Z).

Proof. Let Y_1 denote an irreducible component of Y and let
J = s((F_X)_∗(I_{Y_1} ⊗ L)) ⊂ O_{X′}.
Let U denote the open complement (in X′) of the irreducible components of Y′ which are different from Y′_1. Then I_{Y′_1} coincides with I_{Y′} on U and consequently J|_U ⊂ (I_{Y′})|_U, as s is compatible with Y. In particular, J|_U ⊂ (I_{Y′_1})|_U. We claim that this implies that J ⊂ I_{Y′_1}: let V denote an open subset of X′ and let f be a section of J over V. As J is a subsheaf of O_{X′}, we may consider f as a function on V, and it suffices to prove that f vanishes on Y′_1 ∩ V. If Y′_1 ∩ V is empty then this is clear. Otherwise, U ∩ V ∩ Y′_1 is a dense subset of Y′_1 ∩ V and it suffices to prove that f vanishes on this set. But this follows from the inclusion J|_U ⊂ (I_{Y′_1})|_U. As a consequence s is compatible with Y_1. The second claim follows as the sheaf of ideals of the intersection Z ∩ Y is I_Y + I_Z. □

The condition that Z ∩ Y is reduced, in Lemma 3.1, only ensures that Z ∩ Y is a variety.
When L = O_X and s is a Frobenius splitting this is always satisfied [B-K, Prop.1.2.1].

3.4. The evaluation map. Let k[X′] denote the space of global regular functions on X′. Evaluating an element s : (F_X)_∗O_X → O_{X′} of End_F(X) at the constant global function 1 on X defines an element in k[X′] which we denote by ev_X(s). This defines a morphism
ev_X : End_F(X) → k[X′]
with the property that ev_X(s) = 1 if and only if s is a Frobenius splitting of X.

3.5. Frobenius D-splittings. Consider an effective Cartier divisor D on X, and let σ_D denote the associated global section of the associated line bundle O_X(D). A Frobenius splitting s of X is said to be a Frobenius D-splitting if s factorizes as
s : (F_X)_∗O_X → (F_X)_∗O_X(D) → O_{X′},
where the first map is (F_X)_∗σ_D and the second map is s_D, for some element s_D in End_F^{O_X(D)}(X). We furthermore say that the Frobenius D-splitting s is compatible with a subvariety Y if s_D is compatible with Y. The following result assures that, in this case, the compatibility with closed subvarieties agrees with the usual definition [R, Defn.1.2].

Lemma 3.2. Assume that s defines a Frobenius D-splitting of X. Then s_D is compatible with Y if and only if (i) s compatibly Frobenius splits Y and (ii) the support of D does not contain any irreducible components of Y.

Proof. The if part of the statement follows from [R, Prop.1.4]. So assume that s_D is compatible with Y. Then s_D induces a morphism
s_D : (F_Y)_∗O_X(D)|_Y → O_{Y′},
such that s_D((σ_D)|_Y) is the constant function 1 on Y′. As a consequence (σ_D)|_Y does not vanish on any of the irreducible components of Y. This proves part (ii) of the statement. Part (i) is clearly satisfied. □

It follows that if s is compatible with Y and, moreover, defines a Frobenius D-splitting of X then D ∩ Y makes sense as an effective Cartier divisor on Y and, in this case, s induces a Frobenius D ∩ Y-splitting of Y.

3.6. Stable Frobenius splittings along divisors.
Let X^{(0)} = X and define recursively X^{(n)} = (X^{(n−1)})′ for n ≥ 1. Composing the Frobenius morphisms on X^{(i)} for i = 0, . . . , n − 1, we obtain a morphism
F_X^{(n)} : X → X^{(n)}
with an associated map of sheaves
(F_X^{(n)})^♯ : O_{X^{(n)}} → (F_X^{(n)})_∗O_X.
Let, as in Section 3.5, D denote an effective Cartier divisor on X with associated canonical section σ_D of O_X(D). We say that X admits a stable Frobenius splitting along D if there exists a positive integer n and an element
s ∈ Hom_{O_{X^{(n)}}}((F_X^{(n)})_∗O_X(D), O_{X^{(n)}})
such that the composed map
O_{X^{(n)}} → (F_X^{(n)})_∗O_X → (F_X^{(n)})_∗O_X(D) → O_{X^{(n)}},
given by (F_X^{(n)})^♯, followed by (F_X^{(n)})_∗σ_D, followed by s, is the identity map on O_{X^{(n)}}. The element s is called a stable Frobenius splitting of X along D. When Y is a closed subvariety of X we say that the stable Frobenius splitting s is compatible with Y if
s((F_X^{(n)})_∗(I_Y ⊗ O_X(D))) ⊂ I_{Y^{(n)}}.
Notice that this condition necessarily implies that the support of D does not contain any of the irreducible components of Y (cf. proof of Lemma 3.2). Notice also that if X admits a Frobenius D-splitting which is compatible with Y then X admits a stable Frobenius splitting along D which is compatible with Y. The following is well known (see e.g. [T, Lem.4.4]).

Lemma 3.3. Let D_1 and D_2 denote effective Cartier divisors on X and let Y denote a closed subvariety of X. Then X admits stable Frobenius splittings along D_1 and D_2 which are compatible with Y if and only if X admits a stable Frobenius splitting along D_1 + D_2 which is compatible with Y.

The following result explains one of the main applications of (stable) Frobenius splitting. Remember that a line bundle L is nef if L ⊗ M is ample whenever M is ample.

Proposition 3.4. Assume that X admits a stable Frobenius splitting along an effective Cartier divisor D. Then there exists a positive integer n such that for each line bundle L on X we have an inclusion of abelian groups
H^i(X, L) ⊂ H^i(X, L^{p^n} ⊗ O_X(D)).
In particular, if D is ample and L is nef, then H^i(X, L) = 0 for i > 0.
Moreover, if the stable Frobenius splitting of X is compatible with a subvariety Y, D is ample and L is nef then the restriction morphism
H^0(X, L) → H^0(Y, L)
is surjective.

Proof. Argue as in the proof of [R, Prop.1.13(i)]. □

3.7. Duality for F_X. By duality (see [Har2, Ex.III.6.10]) for the finite morphism F_X we may to each quasi-coherent O_{X′}-module F associate an O_X-module denoted by (F_X)^!F and satisfying
(F_X)_∗(F_X)^!F = Hom_{O_{X′}}((F_X)_∗O_X, F).
Actually, as F_X is the identity on the level of points, we may define (F_X)^!F as the sheaf of abelian groups Hom_{O_{X′}}((F_X)_∗O_X, F) with O_X-module structure defined by
(g · φ)(f) = φ(gf),
for g, f ∈ O_X and φ ∈ Hom_{O_{X′}}((F_X)_∗O_X, F). When F = O_{X′} we will also use the notation ℰnd^!_F(X) for (F_X)^!O_{X′}. This sheaf is particularly nice when X is smooth, as (F_X)^!O_{X′} then coincides with the line bundle ω_X^{1−p}, where ω_X denotes the dualizing sheaf of X (see e.g. [B-K, Sect.1.3]). If Y_1, Y_2, . . . , Y_m is a collection of closed subvarieties of X then ℰnd^!_F(X, Y_1, . . . , Y_m) (or ℰnd^!_F(X, {Y_i}_{i=1}^m)) will denote the subsheaf of ℰnd^!_F(X) consisting of the elements mapping the sheaf of ideals I_{Y_i} to I_{Y′_i} for all i = 1, . . . , m. We say that ℰnd^!_F(X, {Y_i}_{i=1}^m) is the subsheaf of elements compatible with Y_1, . . . , Y_m.

More generally, duality for F_X implies that we have a natural identification
(F_X)_∗Hom_{O_X}(G, (F_X)^!F) ≃ Hom_{O_{X′}}((F_X)_∗G, F)
whenever G (resp. F) is a quasi-coherent sheaf on X (resp. X′). This leads to the identification
Hom_{O_X}(G, (F_X)^!F) ≃ Hom_{O_{X′}}((F_X)_∗G, F)
where a morphism η : G → (F_X)^!F is identified with the composed morphism
η′ : (F_X)_∗G → (F_X)_∗(F_X)^!F ≃ Hom_{O_{X′}}((F_X)_∗O_X, F) → F,
whose first map is (F_X)_∗η. Here the latter map is the natural evaluation map at the element 1 in O_X. From now on we will specialize to the case where F = O_{X′} and G equals a line bundle L on X. In this case, an element in Hom_{O_X}(L, ℰnd^!_F(X)) may also be considered as a global section of the sheaf ℰnd^!_F(X) ⊗ L^{−1}. For later use we emphasize

Lemma 3.5.
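Let $\eta$ be an element in $\mathrm{Hom}_{\mathcal{O}_X}(\mathcal{L}, \mathcal{E}nd^!_F(X))$ and let $\eta'$ denote the corresponding element in $\mathrm{Hom}_{\mathcal{O}_{X'}}((F_X)_*\mathcal{L}, \mathcal{O}_{X'})$ by the above identification. Then $\eta'$ factors through the morphism
\[ (F_X)_*\mathcal{L} \xrightarrow{(F_X)_*\eta} (F_X)_*\mathcal{E}nd^!_F(X). \]
Moreover, the element $\eta'$ is compatible with a collection of closed subvarieties $Y_1, \dots, Y_m$ of $X$ if and only if the image of $\eta$ is contained in $\mathcal{E}nd^!_F(X, Y_1, \dots, Y_m)$.

Proof. The first part of the statement follows directly from the discussion above. To prove the second statement we may assume that $m = 1$. We use the notation $Y = Y_1$. Let $\sigma$ denote a section of $\mathcal{L}$ over an open subset $U$ of $X$, and consider $s = \eta(\sigma)$ as a map
\[ s : \mathcal{O}_X(U) \to \mathcal{O}_{X'}(U'). \]
That $s$ is compatible with $Y$ means that $s(f)$ vanishes on $Y'$ whenever $f$ vanishes on $Y$ for a function $f$ on $U$. Alternatively, the evaluation of $f \cdot s$ at $1$, which coincides with $\eta'(f \cdot \sigma)$, should vanish on $Y'$. In particular, the image of $\eta$ is contained in $\mathcal{E}nd^!_F(X, Y)$ if and only if the restriction of $\eta'$ to $(F_X)_*(\mathcal{I}_Y \otimes \mathcal{L})$ maps into $\mathcal{I}_{Y'}$. This ends the proof. □

We will also need the following remark.

Lemma 3.6. Let $D$ denote a reduced effective Cartier divisor on $X$ and $\mathcal{L}$ denote a line bundle on $X$. Let $\mathcal{M} = \mathcal{O}_X((p-1)D) \otimes \mathcal{L}$ and assume that we have an $\mathcal{O}_X$-linear morphism
\[ \eta : \mathcal{M} \to \mathcal{E}nd^!_F(X). \]
Let $\sigma_D$ denote the canonical section of $\mathcal{O}_X(D)$ and consider the map
\[ \eta_D : \mathcal{L} \to \mathcal{E}nd^!_F(X) \]
induced by $\sigma_D^{p-1}$. Then the element $\eta_D' \in \mathrm{Hom}_{\mathcal{O}_{X'}}((F_X)_*\mathcal{L}, \mathcal{O}_{X'})$ induced by $\eta_D$ is compatible with the support of $D$. In particular, the image of $\eta_D$ is contained in $\mathcal{E}nd^!_F(X, D)$.

Proof. Notice that $\eta_D'$ is the composition
\[ \eta_D' : (F_X)_*\mathcal{L} \xrightarrow{(F_X)_*\sigma_D^{p-1}} (F_X)_*\mathcal{M} \xrightarrow{\;\eta'\;} \mathcal{O}_{X'}, \]
where $\eta'$ is the element corresponding to $\eta$. Hence, the restriction of $\eta_D'$ to $(F_X)_*(\mathcal{L} \otimes \mathcal{O}_X(-D))$ coincides with the map
\[ (F_X)_*\big(\mathcal{L} \otimes \mathcal{O}_X(-D)\big) \xrightarrow{(F_X)_*\sigma_D^{p-1}} (F_X)_*\mathcal{M} \xrightarrow{\;\eta'\;} \mathcal{O}_{X'}, \]
in which multiplication by $\sigma_D^{p-1}$ sends $\mathcal{L} \otimes \mathcal{O}_X(-D)$ into $\mathcal{O}_X(-pD) \otimes \mathcal{M}$. But the restriction of $\eta'$ to (cf. (1))
\[ (F_X)_*\big(\mathcal{O}_X(-pD) \otimes \mathcal{M}\big) \simeq \mathcal{O}_{X'}(-D') \otimes (F_X)_*\mathcal{M} \]
maps by linearity into $\mathcal{O}_{X'}(-D')$. The "in particular" part follows by Lemma 3.5. □

3.8. Push-forward operation.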
Assume that $f : X \to Z$ is a morphism of varieties satisfying that the associated map $f^\sharp : \mathcal{O}_Z \to f_*\mathcal{O}_X$ is an isomorphism. Let $f' : X' \to Z'$ denote the associated morphism. Then $f'_*$ induces a morphism
\[ f'_*\mathcal{E}nd_F(X) \to \mathcal{E}nd_F(Z). \]
If $Y \subset X$ is a closed subset then the subsheaf $f'_*\mathcal{E}nd_F(X, Y)$ is mapped to $\mathcal{E}nd_F(Z, f(Y))$, where $f(Y)$ denotes the variety associated to the closure of the image of $Y$. On the level of global sections this means that every Frobenius splitting $s$ of $X$ induces a Frobenius splitting $f'_*s$ of $Z$ such that when $s$ is compatible with $Y$ then $f'_*s$ is compatible with $f(Y)$. Likewise

Lemma 3.7. With notation as above, let $\mathcal{L}$ denote a line bundle on $Z$ and let $s$ be an element of $\mathrm{End}^{f^*(\mathcal{L})}_F(X)$. Then $f'_*s$ is an element of $\mathrm{End}^{\mathcal{L}}_F(Z)$. Moreover, if $s$ is compatible with a closed subvariety $Y$ of $X$ then $f'_*s$ is compatible with $f(Y)$.

Proof. This follows easily from the fact that the sheaf of ideals of $f(Y)$ coincides with $f_*\mathcal{I}_Y$ [B-K, Lem.1.1.8]. □

4. Linearized sheaves

In this section we collect a number of well known facts about linearized sheaves. The chosen presentation follows rather closely the presentation in [Bri, Sect.2].

Let $H$ denote a linear algebraic group over the field $k$ and let $X$ denote a $H$-variety with $H$-action defined by $\sigma : H \times X \to X$. We let $p_2 : H \times X \to X$ denote projection on the second coordinate. A $H$-linearization of a quasi-coherent sheaf $\mathcal{F}$ on $X$ is an $\mathcal{O}_{H \times X}$-linear isomorphism
\[ \phi : \sigma^*\mathcal{F} \to p_2^*\mathcal{F}, \]
satisfying the relation

(2) $(\mu \times 1_X)^*\phi = p_{23}^*\phi \circ (1_H \times \sigma)^*\phi$

as morphisms of sheaves on $H \times H \times X$. Here $\mu : H \times H \to H$ (resp. $p_{23} : H \times H \times X \to H \times X$) denotes the multiplication on $H$ (resp. the projection on the second and third coordinate). Based on the fact that $\sigma^*\mathcal{O}_X = p_2^*\mathcal{O}_X$ we see that the sheaf $\mathcal{O}_X$ admits a canonical linearization. In the following we will always assume that $\mathcal{O}_X$ is equipped with this canonical linearization.
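As a reading aid, the cocycle relation (2) can be unwound on fibres. The following is only a sketch under an assumption not made in the text: we write $\mathcal{F}_y$ for the fibre of $\mathcal{F}$ at a point $y$ and $\phi_{(g,x)}$ for the fibre of $\phi$ at $(g, x) \in H \times X$ (both pieces of notation are introduced here for illustration).

```latex
% Fibre of phi at (g, x): sigma^* F has fibre F_{g.x}, p_2^* F has fibre F_x.
\[
  \phi_{(g,x)} : \mathcal{F}_{g \cdot x} \longrightarrow \mathcal{F}_{x}.
\]
% Taking fibres of relation (2) at the point (g, h, x) of H x H x X:
%   (mu x 1_X)^* phi      has fibre  phi_{(gh, x)},
%   p_{23}^* phi          has fibre  phi_{(h, x)},
%   (1_H x sigma)^* phi   has fibre  phi_{(g, h.x)}.
\[
  \phi_{(gh,\,x)} \;=\; \phi_{(h,\,x)} \circ \phi_{(g,\,h \cdot x)}
  \;:\;
  \mathcal{F}_{gh \cdot x} \longrightarrow \mathcal{F}_{h \cdot x}
  \longrightarrow \mathcal{F}_{x}.
\]
% For F = O_X with its canonical linearization every phi_{(g,x)} is the
% identity of k, so the relation holds trivially.
```

Read this way, a linearization is a compatible family of identifications between the fibres of $\mathcal{F}$ along each $H$-orbit.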
A morphism $\psi : \mathcal{F} \to \mathcal{F}'$ of $H$-linearized sheaves is a morphism of $\mathcal{O}_X$-modules commuting with the linearizations $\phi$ and $\phi'$ of $\mathcal{F}$ and $\mathcal{F}'$, i.e. $\phi' \circ \sigma^*(\psi) = p_2^*(\psi) \circ \phi$. Linearized sheaves on $X$ form an abelian category which we denote by $\mathrm{Sh}_H(X)$.

4.1. Quotients and linearizations. Assume that the quotient $q : X \to X/H$ exists and that $q$ is a locally trivial principal $H$-bundle. Then for $\mathcal{G} \in \mathrm{Sh}(X/H)$, $q^*\mathcal{G}$ is naturally a $H$-linearized sheaf on $X$. This defines a functor $q^* : \mathrm{Sh}(X/H) \to \mathrm{Sh}_H(X)$. On the other hand, for $\mathcal{F} \in \mathrm{Sh}_H(X)$, $q_*\mathcal{F}$ has a natural action of $H$. Define a functor $q^H_* : \mathrm{Sh}_H(X) \to \mathrm{Sh}(X/H)$ by $q^H_*(\mathcal{F}) = (q_*\mathcal{F})^H$, the subsheaf of $H$-invariants of $q_*\mathcal{F}$. It is known that the functor $q^* : \mathrm{Sh}(X/H) \to \mathrm{Sh}_H(X)$ is an equivalence of categories with inverse functor $q^H_*$.

In general, if $H$ is a closed normal subgroup of $G$ and $X$ is a $G$-variety such that the quotient $X/H$ exists (as above), then $X/H$ is a $G/H$-variety and the functor $q^* : \mathrm{Sh}_{G/H}(X/H) \to \mathrm{Sh}_G(X)$ is an equivalence of categories with inverse functor $q^H_* : \mathrm{Sh}_G(X) \to \mathrm{Sh}_{G/H}(X/H)$.

4.2. Induction equivalence. Consider now a connected linear algebraic group $G$ and a parabolic subgroup $P$ in $G$. Let $X$ denote a $P$-variety. Then $G \times X$ is a $G \times P$-variety by the action
\[ (g, p)(h, x) = (ghp^{-1}, px), \]
for $g, h \in G$, $p \in P$ and $x \in X$. Then the quotient, denoted by $G \times_P X$, of $G \times X$ by $P$ exists and the associated quotient map $q : G \times X \to G \times_P X$ is a locally trivial principal $P$-bundle. The quotient of $G \times X$ by $G$ also exists and may be identified with the projection $p_2 : G \times X \to X$. In particular, we may apply the above consideration to obtain equivalences of the categories $\mathrm{Sh}_P(X)$, $\mathrm{Sh}_{G \times P}(G \times X)$ and $\mathrm{Sh}_G(G \times_P X)$. Notice that under this equivalence a $P$-linearized sheaf $\mathcal{F}$ on $X$ corresponds to the $G$-linearized sheaf $\mathrm{Ind}_P^G(\mathcal{F}) = (q_*p_2^*\mathcal{F})^P$. In particular, the space of global sections of $\mathrm{Ind}_P^G(\mathcal{F})$ equals

(3) $\mathrm{Ind}_P^G(\mathcal{F})(G \times_P X) = \big(p_2^*\mathcal{F}(G \times X)\big)^P = \big(k[G] \otimes_k \mathcal{F}(X)\big)^P = \mathrm{Ind}_P^G(\mathcal{F}(X))$,

where the second equality follows by the Künneth formula.
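For orientation, the $P$-invariants in the identity for global sections can be written out as a space of functions. This is a sketch only: $M$ stands for a rational $P$-module such as $\mathcal{F}(X)$, and the formulas for the actions on functions are one common convention, assumed here to match the action $(g, p)(h, x) = (ghp^{-1}, px)$ used above.

```latex
% Elements of k[G] (x) M are regarded as regular maps f : G -> M.
% The subgroup 1 x P acts on G x X by (h, x) |-> (h p^{-1}, p x),
% which on functions reads (p . f)(h) = p . f(h p).
\[
  \big(k[G] \otimes_k M\big)^{P}
  \;=\;
  \{\, f : G \to M \ \text{regular} \;\mid\; f(hp) = p^{-1} \cdot f(h)
       \ \text{ for all } h \in G,\ p \in P \,\}.
\]
% The remaining G-action is by left translation:
\[
  (g \cdot f)(h) \;=\; f(g^{-1}h), \qquad g, h \in G .
\]
```

In this form the right-hand side is the usual induced module of $M$ from $P$ to $G$.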
This also explains the notation $\mathrm{Ind}_P^G(\mathcal{F})$.

Similarly, starting with a $G$-linearized sheaf $\mathcal{G}$ on $G \times_P X$ then the associated $P$-linearized sheaf on $X$ equals $\mathcal{G}' = ((p_2)_*q^*\mathcal{G})^G$. However, by [Bri, Lemma 2(1)] the latter also equals the simpler pull back $i^*\mathcal{G}$ by the $P$-equivariant map
\[ i : X \to G \times_P X, \]
sending $x$ to $q(1, x)$. In particular, we conclude that the functor $i^* : \mathrm{Sh}_G(G \times_P X) \to \mathrm{Sh}_P(X)$ is an equivalence of categories with inverse functor $\mathrm{Ind}_P^G$. Notice also that the space of global sections of $\mathcal{G}$ is $G$-equivariantly isomorphic to
\[ \mathcal{G}(G \times_P X) = \mathrm{Ind}_P^G\big((i^*\mathcal{G})(X)\big), \]
which follows by (3) above.

4.3. Duality. Assume that the field $k$ has positive characteristic $p > 0$. Regard $X'$ as a $H$-variety in the canonical way and let $\mathcal{F}$ denote a $H$-linearized sheaf on $X'$. The sheaf $(F_X)^!\mathcal{F}$, defined in Section 3.7, is then naturally a $H$-linearized sheaf on $X$. Moreover, the induced $H$-linearization of $(F_X)_*(F_X)^!\mathcal{F}$ coincides with the natural $H$-linearization of
\[ \mathcal{H}om_{\mathcal{O}_{X'}}\big((F_X)_*\mathcal{O}_X, \mathcal{F}\big). \]
When $X$ is smooth the sheaf $(F_X)^!\mathcal{O}_{X'}$ is canonically isomorphic to $\omega_X^{1-p}$ (cf. Section 3.7). We may use this isomorphism to define a $H$-linearization of $\omega_X^{1-p}$. Alternatively we may consider the natural $H$-linearization of the dualizing sheaf $\omega_X$ of $X$ and use this to define a $H$-linearization of $\omega_X^{1-p}$. It may be checked that the two stated ways of defining a $H$-linearization of $\omega_X^{1-p}$ coincide.

5. Frobenius splitting of $G \times_P X$

Let $G$ denote a connected linear algebraic group over an algebraically closed field $k$ of characteristic $p > 0$. Let $P$ denote a parabolic subgroup of $G$ and let $X$ denote a $P$-variety. In this section we want to consider Frobenius splittings of the quotient $Z = G \times_P X$ of $G \times X$ by $P$. We let $\pi : Z \to G/P$ denote the morphism induced by the projection of $G \times X$ on the first coordinate. When $g \in G$ and $x \in X$ we use the notation $[g, x]$ to denote the element in $Z$ represented by $(g, x)$.

5.1. Decomposing the Frobenius morphism. The Frobenius morphism $F_Z$ admits a decomposition $F_Z = F_b \circ F_f$ where $F_b$ (resp.
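$F_f$) is related to the Frobenius morphism on the base (resp. fiber) of $\pi$. More precisely, define $\hat{Z}$ and the morphisms $\hat{\pi}$ and $F_b$ as part of the fiber product diagram

(4)
\[ \begin{array}{ccc} \hat{Z} & \xrightarrow{\;F_b\;} & Z' \\ \downarrow{\scriptstyle\hat{\pi}} & & \downarrow{\scriptstyle\pi'} \\ G/P & \xrightarrow{\;F_{G/P}\;} & (G/P)' \end{array} \]

A local calculation shows that we may identify $\hat{Z}$ with the quotient $G \times_P X'$, where the $P$-action on the Frobenius twist $X'$ of $X$ is the natural one. With this identification $\hat{\pi} : G \times_P X' \to G/P$ is just the map $[g, x'] \mapsto gP$. It also follows that the natural morphism (induced by the Frobenius morphism on $X$)
\[ F_f : G \times_P X \to G \times_P X' \]
makes the following diagram commutative
\[ \begin{array}{ccccc} Z & \xrightarrow{\;F_f\;} & \hat{Z} & \xrightarrow{\;F_b\;} & Z' \\ \downarrow{\scriptstyle\pi} & & \downarrow{\scriptstyle\hat{\pi}} & & \downarrow{\scriptstyle\pi'} \\ G/P & = & G/P & \xrightarrow{\;F_{G/P}\;} & (G/P)' \end{array} \]

5.2. Let $\mathcal{M}$ denote a $P$-linearized line bundle on $X$ and let $\mathcal{M}_Z = \mathrm{Ind}_P^G(\mathcal{M})$ denote the associated $G$-linearized line bundle on $Z$. The main aim of this section is to construct global sections of the sheaf
\[ \mathcal{E}nd^{\mathcal{M}_Z}_F(Z) = \mathcal{H}om_{\mathcal{O}_{Z'}}\big((F_Z)_*\mathcal{M}_Z, \mathcal{O}_{Z'}\big). \]
To this end we fix a $P$-character $\lambda$ and let $\mathcal{L}$ denote the associated line bundle on $G/P$ (cf. Section 4). The pull back $\hat{\pi}^*\mathcal{L}$ of $\mathcal{L}$ to $\hat{Z}$ is then denoted by $\mathcal{L}_{\hat{Z}}$. We then define the following sheaves
\[ \mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f := \mathcal{H}om_{\mathcal{O}_{\hat{Z}}}\big((F_f)_*\mathcal{M}_Z, \mathcal{O}_{\hat{Z}}\big), \]
\[ \mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z)_b := \mathcal{H}om_{\mathcal{O}_{Z'}}\big((F_b)_*\mathcal{L}_{\hat{Z}}, \mathcal{O}_{Z'}\big), \]
with spaces of global sections denoted by $\mathrm{End}^{\mathcal{M}_Z}_F(Z)_f$ and $\mathrm{End}^{\mathcal{L}_{\hat{Z}}}_F(Z)_b$. Notice that when $\mathcal{M}$ is substituted with the $P$-linearized twist $\mathcal{M}(-\lambda) := \mathcal{M} \otimes k_{-\lambda}$ then
\[ \mathcal{M}(-\lambda)_Z = \mathcal{M}_Z \otimes \pi^*(\mathcal{L}^{-1}) = \mathcal{M}_Z \otimes (F_f)^*(\mathcal{L}_{\hat{Z}}^{-1}), \]
and thus by the projection formula

(5) $\mathcal{E}nd^{\mathcal{M}(-\lambda)_Z}_F(Z)_f = \mathcal{H}om_{\mathcal{O}_{\hat{Z}}}\big((F_f)_*\mathcal{M}_Z, \mathcal{L}_{\hat{Z}}\big)$.

Sections of $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)$ are then constructed as compositions of global sections of the sheaves $\mathcal{E}nd^{\mathcal{M}(-\lambda)_Z}_F(Z)_f$ and $\mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z)_b$. More precisely, if
\[ v \in \mathrm{Hom}_{\mathcal{O}_{\hat{Z}}}\big((F_f)_*\mathcal{M}_Z, \mathcal{L}_{\hat{Z}}\big), \qquad u \in \mathrm{Hom}_{\mathcal{O}_{Z'}}\big((F_b)_*\mathcal{L}_{\hat{Z}}, \mathcal{O}_{Z'}\big) \]
are global sections of the latter sheaves, then the composition $u \circ (F_b)_*v$ defines a global section of $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)$.

5.3. An equivariant setup. We now give equivariant descriptions of the sheaves $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f$ and $\mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z)_b$.

5.3.1. A description of $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f$. Now $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f$ is a $G$-linearized sheaf on $\hat{Z} = G \times_P X'$. Let $Y$ denote a $P$-stable subvariety of $X$ and let $Z_Y = G \times_P Y$ denote the associated subvariety of $Z$ with sheaf of ideals $\mathcal{I}_{Z_Y} \subset \mathcal{O}_Z$.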
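Let $\hat{Z}_Y$ denote the subvariety $G \times_P Y'$ of $\hat{Z}$. Then there is a natural morphism of $G$-linearized sheaves
\[ \mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f \to \mathcal{H}om_{\mathcal{O}_{\hat{Z}}}\big((F_f)_*(\mathcal{M}_Z \otimes \mathcal{I}_{Z_Y}), \mathcal{O}_{\hat{Z}_Y}\big) \]
induced by the inclusion $\mathcal{I}_{Z_Y} \subset \mathcal{O}_Z$ and the projection $\mathcal{O}_{\hat{Z}} \to \mathcal{O}_{\hat{Z}_Y}$. We let $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z, Z_Y)_f$ denote the kernel of the above map and arrive at a left exact sequence of $G$-linearized sheaves

(6) $0 \to \mathcal{E}nd^{\mathcal{M}_Z}_F(Z, Z_Y)_f \to \mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f \to \mathcal{H}om_{\mathcal{O}_{\hat{Z}}}\big((F_f)_*(\mathcal{M}_Z \otimes \mathcal{I}_{Z_Y}), \mathcal{O}_{\hat{Z}_Y}\big)$.

In particular, the space of global sections of $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z, Z_Y)_f$ is identified with the set of elements in $\mathrm{End}^{\mathcal{M}_Z}_F(Z)_f$ which map $(F_f)_*(\mathcal{M}_Z \otimes \mathcal{I}_{Z_Y})$ to $\mathcal{I}_{\hat{Z}_Y} \subset \mathcal{O}_{\hat{Z}}$.

Using the observations in Section 4.2 we can give another description of the space of global sections of $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z, Z_Y)_f$. Let
\[ i' : X' \to G \times_P X' \]
denote the morphism $i'(x') = [1, x']$. Then the functor $(i')^*$ is exact on the category of $G$-linearized sheaves. We want to apply this fact to the left exact sequence (6) above: notice first that
\[ (i')^*\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f = \mathcal{H}om_{\mathcal{O}_{X'}}\big((i')^*(F_f)_*\mathcal{M}_Z, \mathcal{O}_{X'}\big), \]
where, moreover, $(i')^*(F_f)_*\mathcal{M}_Z = (F_X)_*\mathcal{M}$. Thus $(i')^*\mathcal{E}nd^{\mathcal{M}_Z}_F(Z)_f = \mathcal{E}nd^{\mathcal{M}}_F(X)$. Similarly,
\[ (i')^*\mathcal{H}om_{\mathcal{O}_{\hat{Z}}}\big((F_f)_*(\mathcal{M}_Z \otimes \mathcal{I}_{Z_Y}), \mathcal{O}_{\hat{Z}_Y}\big) = \mathcal{H}om_{\mathcal{O}_{X'}}\big((F_X)_*(\mathcal{M} \otimes \mathcal{I}_Y), \mathcal{O}_{Y'}\big). \]
In particular, we see that the $P$-linearized sheaf on $X'$ corresponding to the $G$-linearized sheaf $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z, Z_Y)_f$ equals the kernel of the natural map
\[ \mathcal{E}nd^{\mathcal{M}}_F(X) \to \mathcal{H}om_{\mathcal{O}_{X'}}\big((F_X)_*(\mathcal{M} \otimes \mathcal{I}_Y), \mathcal{O}_{Y'}\big), \]
i.e. it equals $\mathcal{E}nd^{\mathcal{M}}_F(X, Y)$. By Section 4.2 the space of global sections $\mathrm{End}^{\mathcal{M}_Z}_F(Z, Z_Y)_f$ of $\mathcal{E}nd^{\mathcal{M}_Z}_F(Z, Z_Y)_f$ is then $G$-equivariantly isomorphic to $\mathrm{Ind}_P^G\big(\mathrm{End}^{\mathcal{M}}_F(X, Y)\big)$. Applying the above conclusions to the sheaf $\mathcal{M}(-\lambda)$ we find:

Proposition 5.1. There exists a $G$-equivariant isomorphism
\[ \mathrm{End}^{\mathcal{M}(-\lambda)_Z}_F(Z)_f \simeq \mathrm{Ind}_P^G\big(\mathrm{End}^{\mathcal{M}}_F(X) \otimes k_\lambda\big) \]
such that when $Y$ is a closed $P$-stable subvariety of $X$ then the subset of elements of $\mathrm{End}^{\mathcal{M}(-\lambda)_Z}_F(Z)_f$ which map $(F_f)_*(\mathcal{M}_Z \otimes \mathcal{I}_{Z_Y})$ to $(\mathcal{I}_{\hat{Z}_Y} \otimes \mathcal{L}_{\hat{Z}}) \subset \mathcal{L}_{\hat{Z}}$ (cf. equation (5)) is identified with
\[ \mathrm{End}^{\mathcal{M}(-\lambda)_Z}_F(Z, Z_Y)_f \simeq \mathrm{Ind}_P^G\big(\mathrm{End}^{\mathcal{M}}_F(X, Y) \otimes k_\lambda\big). \]

5.3.2. A description of $\mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z)_b$. As $\pi'$ in the fibre-diagram (4) is flat the natural morphism $(\pi')^*(F_{G/P})_*\mathcal{L} \to (F_b)_*\hat{\pi}^*\mathcal{L}$ is an isomorphism ([Har2, Prop.III.9.3]).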
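Thus there is a natural isomorphism of $G$-linearized sheaves
\[ \mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z)_b \simeq (\pi')^*\mathcal{H}om_{\mathcal{O}_{(G/P)'}}\big((F_{G/P})_*\mathcal{L}, \mathcal{O}_{(G/P)'}\big) = (\pi')^*\mathcal{E}nd^{\mathcal{L}}_F(G/P). \]
Let $V$ denote a closed subvariety of $G/P$. Then $\mathcal{E}nd^{\mathcal{L}}_F(G/P, V)$ is the kernel of the natural map
\[ \mathcal{E}nd^{\mathcal{L}}_F(G/P) \to \mathcal{H}om_{\mathcal{O}_{(G/P)'}}\big((F_{G/P})_*(\mathcal{I}_V \otimes \mathcal{L}), \mathcal{O}_{V'}\big). \]
In particular, $(\pi')^*\mathcal{E}nd^{\mathcal{L}}_F(G/P, V)$ maps into the kernel of the induced morphism

(7) $\mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z)_b \to (\pi')^*\mathcal{H}om_{\mathcal{O}_{(G/P)'}}\big((F_{G/P})_*(\mathcal{I}_V \otimes \mathcal{L}), \mathcal{O}_{V'}\big)$.

Let $q : G \to G/P$ denote the quotient map. Then $\hat{\pi}^{-1}(V)$ identifies with the quotient $q^{-1}(V) \times_P X'$. Moreover, as $\pi'$ is locally trivial it follows that $\hat{\pi}^*(\mathcal{I}_V) = \mathcal{I}_{q^{-1}(V) \times_P X'}$. In particular, the sheaf
\[ (\pi')^*\mathcal{H}om_{\mathcal{O}_{(G/P)'}}\big((F_{G/P})_*(\mathcal{I}_V \otimes \mathcal{L}), \mathcal{O}_{V'}\big) \]
is isomorphic to
\[ \mathcal{H}om_{\mathcal{O}_{Z'}}\big((F_b)_*(\mathcal{I}_{q^{-1}(V) \times_P X'} \otimes \mathcal{L}_{\hat{Z}}), \mathcal{O}_{(q^{-1}(V) \times_P X)'}\big). \]
Thus we see that the kernel of (7) is the subsheaf $\mathcal{E}nd^{\mathcal{L}_{\hat{Z}}}_F(Z, \pi^{-1}(V))_b$ of elements which map $(F_b)_*(\mathcal{I}_{q^{-1}(V) \times_P X'} \otimes \mathcal{L}_{\hat{Z}})$ to $\mathcal{I}_{(q^{-1}(V) \times_P X)'}$. The space of global sections of this subsheaf is denoted by $\mathrm{End}^{\mathcal{L}_{\hat{Z}}}_F(Z, \pi^{-1}(V))_b$. In conclusion

Proposition 5.2. The map $\pi'$ induces a $G$-equivariant morphism
\[ (\pi')^* : \mathrm{End}^{\mathcal{L}}_F(G/P) \to \mathrm{End}^{\mathcal{L}_{\hat{Z}}}_F(Z)_b. \]
Moreover, when $V$ is a closed subvariety of $G/P$ then $(\pi')^*$ maps the subset $\mathrm{End}^{\mathcal{L}}_F(G/P, V)$ into $\mathrm{End}^{\mathcal{L}_{\hat{Z}}}_F(Z, q^{-1}(V) \times_P X)_b$.

The following is also useful.

Lemma 5.3. Let $Y$ denote a closed $P$-stable subvariety of $X$ and fix notation as above. Then each element of $\mathrm{End}^{\mathcal{L}_{\hat{Z}}}_F(Z)_b$ maps $(F_b)_*(\mathcal{I}_{\hat{Z}_Y} \otimes \mathcal{L}_{\hat{Z}})$ to $\mathcal{I}_{(Z_Y)'}$.

Proof. It suffices to show that the natural morphism
\[ \mathcal{H}om_{\mathcal{O}_{Z'}}\big((F_b)_*\mathcal{L}_{\hat{Z}}, \mathcal{O}_{Z'}\big) \to \mathcal{H}om_{\mathcal{O}_{Z'}}\big((F_b)_*(\mathcal{I}_{\hat{Z}_Y} \otimes \mathcal{L}_{\hat{Z}}), \mathcal{O}_{(Z_Y)'}\big) \]
is zero. By linearity, this will follow if the natural morphism
\[ \mathcal{I}_{(Z_Y)'} \otimes (F_b)_*\mathcal{L}_{\hat{Z}} \to (F_b)_*(\mathcal{I}_{\hat{Z}_Y} \otimes \mathcal{L}_{\hat{Z}}) \]
is an isomorphism, which can be checked by a local calculation. □

5.4. Conclusions. By Proposition 5.1 an element $v$ in the vector space $\mathrm{Ind}_P^G\big(\mathrm{End}^{\mathcal{M}}_F(X) \otimes k_\lambda\big)$ defines an element in $\mathrm{End}^{\mathcal{M}(-\lambda)_Z}_F(Z)_f$. Moreover, by Proposition 5.2, an element $u \in \mathrm{End}^{\mathcal{L}}_F(G/P)$ defines an element $(\pi')^*(u)$ in $\mathrm{End}^{\mathcal{L}_{\hat{Z}}}_F(Z)_b$.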
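To keep track of where the two ingredients live, the following sketch follows sources and targets through the composition of Section 5.2. It uses only $F_Z = F_b \circ F_f$ and the definitions above; the superscripts on the $\mathrm{End}$-spaces are suppressed here because they are partly lost in the source, so the display should be read as an illustration rather than as the authors' notation.

```latex
\begin{align*}
  v &: (F_f)_*\mathcal{M}_Z \longrightarrow \mathcal{L}_{\hat{Z}}
      && \text{a morphism of sheaves on } \hat{Z}, \\
  (F_b)_*v &: (F_b)_*(F_f)_*\mathcal{M}_Z = (F_Z)_*\mathcal{M}_Z
      \longrightarrow (F_b)_*\mathcal{L}_{\hat{Z}}
      && \text{pushed forward to } Z', \\
  (\pi')^*u &: (F_b)_*\mathcal{L}_{\hat{Z}} \longrightarrow \mathcal{O}_{Z'}
      && \text{pulled back from } (G/P)', \\
  (\pi')^*u \circ (F_b)_*v &: (F_Z)_*\mathcal{M}_Z \longrightarrow \mathcal{O}_{Z'}
      && \text{a morphism of sheaves on } Z'.
\end{align*}
```

The last line has exactly the shape of a global section of the sheaf of homomorphisms from $(F_Z)_*\mathcal{M}_Z$ to $\mathcal{O}_{Z'}$ that Section 5.2 set out to construct.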
Thus by the discussion in Section 5.2 we obtain a G-equivariant map M,λ : End G/P)⊗ IndGP EndMF (X)⊗ kλ → End F (G×P X), defined by M,λ(u⊗ v) = (π ′)∗u ◦ (Fb)∗v. We can now prove Theorem 5.4. Let X denote a P -variety and M denote a P -linearized line bundle on X. Let L denote the equivariant line bundle on G/P associated to the P -character λ. Then the G-equivariant map Φ1 defined above, satisfies (1) When Y is a P -stable closed subvariety of X then the restriction of Φ1 M,λ to the subspace : EndLF ( G/P)⊗ IndGP EndMF (X, Y )⊗ kλ maps to End F (G×P X,G×P Y ). (2) When V denotes a closed subvariety of G/P then the restriction of Φ1 M,λ to the subspace EndLF ( G/P , V )⊗ IndGP EndMF (X)⊗ kλ maps to End F (G ×P X, q −1(V ) ×P X), where q : G → G/P denotes the quotient map. Proof. The first statement follows from Proposition 5.1 and Lemma 5.3. The second statement follows from Proposition 5.2 and Lemma 5.5 below. � Lemma 5.5. Let V denote a closed subset of G/P . Then every element of End M(−λ)Z F (Z)f will map (Ff)∗(MZ ⊗ Iπ−1(V )) to I(π̂)−1(V ) ⊗ LẐ . Proof. It suffices to prove that the natural morphism I(π̂)−1(V ) ⊗ (Ff)∗MZ → (Ff)∗ Iπ−1(V ) ⊗MZ is an isomorphism, which can be checked by a local calculation. � 5.5. Identify IndGP with the space of global sections of MZ (cf. Equation (3)). Then we can define a G-equivariant morphism (8) End F (G×P X)⊗ Ind → EndF (G×P X), by mapping s⊗σ, for σ a global section of MZ and s : (FZ)∗MZ → OZ′ , to the element (FZ)∗OZ (FZ )∗σ −−−−→ (FZ)∗MZ −→ OZ′ , 18 XUHUA HE AND JESPER FUNCH THOMSEN in EndF (G×PX). Combining Φ M,λ with the morphism in (8) we obtain a G-equivariant map ΦM,λ : End G/P)⊗ IndGP EndMF (X)⊗kλ ⊗ IndGP → EndF (Z), where an element u⊗ v⊗ σ in the domain is mapped to the composed (9) (FZ)∗OZ (FZ)∗σ −−−−→ (FZ)∗MZ (Fb)∗v −−−→ (Fb)∗LẐ (π′)∗u −−−→ OZ′. Notice that by Lemma 3.5 the map u ∈ EndLF ( G/P) factors as (10) (FG/P )∗L (FG/P )∗u −−−−−→ (FG/P )∗ω → O(G/P)′ , where u! 
is some global section of the line bundle Ľ := ω ⊗L−1 as- sociated to u (cf. Section 3.7), and the rightmost map is the evaluation map with domain (FG/P )∗ω = EndF (G/P). It follows that we may extend (9) into a commutative diagram (FZ)∗OZ (FZ)∗σ // (Fb)∗π̂ ∗(u!) (FZ)∗MZ (Fb)∗v // (Fb)∗π̂ ∗(u!) (Fb)∗LẐ (π′)∗u (Fb)∗π̂ ∗(u!) (FZ)∗π ∗Ľ // (FZ)∗(MZ ⊗ π ∗Ľ) //// (Fb)∗(π̂ rrrrrrrrrrr where all the vertical maps are induced by multiplication by π̂∗(u!). Likewise the lower horizontal maps are induced from the upper hori- zontal maps by multiplication with π̂∗(u!). The triangle on the right is induced from (10) by pull-back to Z ′. Theorem 5.6. Let X denote a P -variety and M denote a P -linearized line bundle on X. Let L denote the equivariant line bundle on G/P associated to the P -character λ. Then the G-equivariant map ΦM,λ, defined above, satisfies (1) When Y is a P -stable closed subvariety of X then the restriction of ΦM,λ to the subspace : EndLF ( G/P)⊗ IndGP EndMF (X, Y )⊗ kλ ⊗ IndGP maps to EndF (G×P X,G×P Y ). (2) When V denotes a closed subvariety of G/P then the restriction of ΦM,λ to the subspace : EndLF ( G/P , V )⊗ IndGP EndMF (X)⊗ kλ ⊗ IndGP maps to EndF (G ×P X, q −1(V ) ×P X), where q : G → G/P denotes the quotient map. Moreover, let u ∈ EndLF ( G/P), v ∈ IndGP EndMF (X) ⊗ kλ and σ ∈ IndGP . Then the element ΦM,λ(u⊗ v ⊗ σ) factorizes both as (FZ)∗OZ (FZ)∗σ −−−−→ (FZ)∗MZ −→ OZ′, and as (FZ)∗OZ (FZ)∗(σ⊗π −−−−−−−−→ (FZ)∗(MZ ⊗ π −→ OZ′, where s1 and s2 satisfies i) If v is contained in IndGP EndMF (X, Y ) ⊗ kλ then s1 and s2 are compatible with G×P Y . ii) If u is contained in EndLF ( G/P , V ) then s1 is compatible with q −1(V )×P Proof. Part (1) and (2) follows directly from Theorem 5.4 and the defi- nition of ΦM,λ. The existence of s1 and s2 follows by the diagram (11). Finally the claims about the compatibility of s1 and s2 follows from Theorem 5.4 and Lemma 5.3. � 5.6. 
We will now describe when an element in the image of ΦM,λ defines a Frobenius splitting of Z. For this we consider the composed map evZ◦ΦM,λ. Recall that an element s ∈ EndF (Z) is a Frobenius splitting of Z if and only if evZ(s) is the constant function 1 on Z Let u ∈ EndLF ( G/P), v ∈ IndGP EndMF (X)⊗kλ and σ ∈ IndGP By Equation (9) the image of u⊗v⊗σ under evZ ◦ΦM,λ coincides with the global section of OZ′ determined by the composed map (12) OZ′ −→ (FZ)∗OZ (FZ )∗σ −−−−→ (FZ)∗MZ (Fb)∗v −−−→ (Fb)∗LẐ (π′)∗u −−−→ OZ′ . We may divide this composition into two parts. The first part −→ (FZ)∗OZ (FZ)∗σ −−−−→ (FZ)∗MZ (Fb)∗v −−−→ (Fb)∗LẐ is defined by σ and v and defines a global section of LẐ . The corre- sponding map M,λ : Ind EndMF (X)⊗ kλ ⊗ IndGP → IndGP k[X ′]⊗ kλ is the map induced by the morphism (13) EndMF (X)⊗M(X) → k[X mapping s : (FX)∗M → OX′ and τ a global section of M, to s(τ). Notice that we here identify IndGP k[X ′]⊗ kλ with the space of global sections of LẐ (cf. Equation (3)). The second part takes a global section τ̃ of LẐ and an element u in End G/P) to the global section of OZ′ defined by −→ (Fb)∗OẐ (Fb)∗τ̃ −−−→ (Fb)∗LẐ (π′)∗u −−−→ OZ′. 20 XUHUA HE AND JESPER FUNCH THOMSEN The corresponding map is Φλ : End G/P)⊗ IndGP k[X ′]⊗ kλ → k[Z ′], which maps u⊗ τ̃ , to ((π′)∗u)(τ̃) (cf. Proposition 5.2). The restriction of Φλ : (14) φλ : End G/P)⊗ IndGP is the map corresponding to Φλ in case X is the one point space Spec(k) (in which case k[X ′] is just k). In combination this defines us a com- mutative diagram EndLF ( G/P)⊗ IndGP EndMF (X)⊗ kλ ⊗ IndGP Id⊗Φ2 ΦM,λ // EndF (Z) EndLF ( G/P)⊗ IndGP k[X ′]⊗ kλ ) Φλ // k[Z ′] EndLF ( G/P)⊗ IndGP φλ // k evG/P 33ggggggggggggggggggggggggggggggg wheremλ is the natural map which makes the lower part of the diagram commutative. Notice that when k[X ′] = k, e.g. if X ′ is a complete and irreducible variety, then φλ and Φλ coincides. Let χ denote the P - character associated to the canonical G-linearization of ω−1G/P (cf. 
Sec- tion 4.3). Then as noted earlier (Section 5.5) the G-module EndLF ( coincides with the space of global sections of Ľ = ω ⊗L−1 and thus coincides with (16) EndLF ( G/P) = IndGP (p− 1)χ− λ where we abuse notation and write (p− 1)χ− λ for the 1-dimensional P -representation associated with the character (p− 1)χ−λ. It follows that mλ is the natural multiplication map (17) mλ : Ind (p− 1)χ− λ ⊗ IndGP → IndGP (p− 1)χ which is surjective if the domain is nonzero, i.e. if L and ω ⊗ L−1 are effective line bundles on G/P [R-R, Thm.3]. The commutativity of the diagram (15) then implies: Proposition 5.7. Let Ξ denote an element in the domain of ΦM,λ, and assume that the image (Id⊗Φ2 M,λ)(Ξ) is contained in the subspace EndLF ( G/P)⊗IndGP (cf. diagram (15)). Then ΦM,λ(Ξ) is a Frobenius splitting of Z if and only if φλ((Id ⊗ Φ M,λ)(Ξ)) equals the constant 1. In particular, if EndLF ( G/P) ⊗ IndGP is nonzero and IndGP contained in the image of Φ2 M,λ, then Z admits a Frobenius splitting. Proof. The first part of the proof is just a restatement of the fact that the diagram (15) is commutative. The second part follows by the sur- jectivity of mλ and the fact that G/P admits a Frobenius splitting. Corollary 5.8. Assume that X is irreducible and complete. If both IndGP and IndGP (p − 1)χ − λ are nonzero and Φ2 M,λ is surjective, then Z admits a Frobenius splitting. 5.7. In many concrete situation the existence of a P -invariant ele- ment in EndMF (X) ⊗ kλ is given. Notice that this is equivalent to a G-invariant element v in IndGP EndMF (X) ⊗ kλ and thus ΦM,λ defines a G-equivariant map (18) EndLF ( G/P)⊗ IndGP → EndF (Z), u⊗ σ 7→ ΦM,λ(u⊗ v ⊗ σ). Similarly Φ2 M,λ defines a G-equivariant morphism (19) IndGP → IndGP k[X ′]⊗ kλ which makes the diagram (20) EndLF ( G/P)⊗ IndGP // EndF (Z) EndLF ( G/P)⊗ IndGP k[X ′]⊗ kλ ) Φλ // k[Z ′] commutative. We also note Corollary 5.9. Assume that X is irreducible and complete and let v denote a P -invariant element of EndMF (X)⊗ kλ. 
If the induced map
\[ (\Phi^2_{\mathcal{M},\lambda})|_{v \otimes \mathrm{Ind}_P^G(\mathcal{M}(X))} : \mathrm{Ind}_P^G(\mathcal{M}(X)) \to \mathrm{Ind}_P^G(\lambda) \]
is surjective then $Z$ admits a Frobenius splitting. In particular, if $\mathrm{Ind}_P^G(\lambda)$ is an irreducible $G$-representation then for $Z$ to be Frobenius split it suffices that the latter map is nonzero.

Proof. Apply Corollary 5.8. □

6. B-Canonical Frobenius splittings

In this section we continue the study of the Frobenius splitting properties of $Z = G \times_P X$. The notation is kept as in Section 5 but we restrict ourselves to the case where $G$ is a connected, semisimple and simply connected linear algebraic group. Moreover, we fix $P = B$, $\mathcal{M} = \mathcal{O}_X$ and $\lambda = -(p-1)\rho$. Recall that, in this setup, the dualizing sheaf $\omega_{G/B}$ is the $G$-linearized sheaf associated to the $B$-character $2\rho$. Thus, with the notation in Section 5.6, we have $\chi = -2\rho$. Recall also the $G$-equivariant identity (see (16))

(21) $\mathrm{End}^{\mathcal{L}}_F(G/B) \simeq \mathrm{Ind}_B^G((p-1)\chi - \lambda) = \mathrm{Ind}_B^G(\lambda) = \mathrm{Ind}_B^G((1-p)\rho)$,

where the middle equality holds because $(p-1)\chi - \lambda = -2(p-1)\rho + (p-1)\rho = (1-p)\rho = \lambda$. The latter $G$-module is called the Steinberg module of $G$ and will be denoted by $\mathrm{St}$. The Steinberg module is a simple and selfdual $G$-module. A $B$-canonical Frobenius splitting of $X$ is then a $B$-equivariant map

(22) $\theta : \mathrm{St} \otimes k_{(p-1)\rho} \to \mathrm{End}_F(X)$,

containing a Frobenius splitting in its image. Notice that a $B$-canonical Frobenius splitting of $X$ is not a Frobenius splitting as defined in Section 3.2. However, there exists a unique nonzero lowest weight vector $v_-$ of $\mathrm{St}$ such that $\theta(v_-)$ is a Frobenius splitting in the sense of Section 3.2. Moreover, as $\mathrm{St}$ is a simple $G$-module the map $\theta$ is uniquely determined by $\theta(v_-)$, and we may thus identify $\theta$ with $\theta(v_-)$. In this way $\theta(v_-)$ will also be called a $B$-canonical Frobenius splitting of $X$.

The importance of $B$-canonical Frobenius splittings was first observed by O. Mathieu in connection with good filtrations of $G$-modules. We refer to [B-K, Chapter 4] for a general reference on $B$-canonical Frobenius splittings.

6.1. Consider a $B$-canonical Frobenius splitting as in (22).
By Frobe- nius reciprocity this defines a map St → IndGB EndF (X)⊗ kλ and as IndGB k[X ] contains k we may consider the inducedG-equivariant morphism θ̃ : St → IndGB EndF (X)⊗ kλ ⊗ IndGB k[X ] Composing θ̃ with the map Φ2 M,λ of Section 5.6 we end up with a map M,λ ◦ θ̃ : St → Ind k[X ′]⊗ kλ We claim Lemma 6.1. The composed map Φ2 M,λ ◦ θ̃ is an isomorphism on its image IndGB Proof. We first prove that the image of Φ2 M,λ◦θ̃ is contained in Ind For this let EndF (X)c denote the inverse image of k ⊂ k[X ′] under the evaluation map evX . It suffices to prove that the image of θ is contained in EndF (X)c. Notice that EndF (X)c is a B-submodule of EndF (X) containing the set of Frobenius splittings of X . In particular, the image of the lowest weight space of St under θ is contained in EndF (X)c. Moreover, as St is an irreducible G-module it is generated by the lowest weight space as a B-module. Thus, the image of θ will be contained in the B-module EndF (X)c. Now Φ2 M,λ ◦ θ̃ is a map from St to Ind B(λ) = St. Thus, by Frobenius reciprocity, it suffices to prove that Φ2 M,λ ◦ θ̃ is nonzero which is the case as θ contains a Frobenius splitting in its image. � Using Lemma 6.1 we can now combine the diagram (15) with the map Φ2 M,λ ◦ θ̃ and obtain a commutative and G-equivariant diagram (23) St⊗ St ≃ Id⊗(Φ2 Θ // EndF (G×B X) evZ // k[Z ′] EndLF ( G/B)⊗ IndGB φλ // k evG/B 88qqqqqqqqqqqqqq where Θ is the map induced by θ̃ and ΦM,λ. By Proposition 5.7 it follows that Θ(Ξ), for Ξ in St⊗ St, is a Frobenius splitting of Z if and only if the image of Ξ under φλ and Id ⊗ (Φ M,λ ◦ θ̃) equals 1. The latter map from St⊗St to k will be denoted by φ. By construction φ is G-equivariant. Moreover, mλ is surjective and evG/B is nonzero (as G/B admits a Frobenius splitting) and thus φ is nonzero. As St is a simple G-module it follows that (24) φ : St⊗ St → k, defines a nondegenerate G-invariant bilinear form on St. 
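The step from "nonzero and $G$-invariant" to "nondegenerate" for the form $\phi$ in (24) can be spelled out as follows. This is a short sketch of the standard argument; the symbol $R$ for the left radical is notation introduced here for illustration.

```latex
\begin{align*}
  R &:= \{\, v \in \mathrm{St} \;:\; \phi(v \otimes w) = 0
        \ \text{ for all } w \in \mathrm{St} \,\}
     && \text{(left radical of } \phi\text{)}, \\
  \phi(gv \otimes w) &= \phi(v \otimes g^{-1}w) = 0
     \quad \text{for } v \in R,\ g \in G,\ w \in \mathrm{St}
     && \text{so } R \text{ is a } G\text{-submodule}, \\
  \phi \neq 0 \;&\Longrightarrow\; R \neq \mathrm{St}
     \;\Longrightarrow\; R = 0
     && \text{since } \mathrm{St} \text{ is simple}.
\end{align*}
```

The same argument applied to the second variable shows that the right radical vanishes as well, so $\phi$ identifies $\mathrm{St}$ with its dual.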
By Frobenius reciprocity such a form is uniquely determined up to a nonzero constant. In particular, this provides a very useful way to construct many Frobenius splittings of $Z$.

Corollary 6.2. Let $\theta : \mathrm{St} \otimes k_{(p-1)\rho} \to \mathrm{End}_F(X)$ denote a $B$-canonical Frobenius splitting of $X$. Then the induced morphism (defined above)
\[ \Theta : \mathrm{St} \otimes \mathrm{St} \to \mathrm{End}_F(G \times_B X) \]
satisfies the following.
(1) The image $\Theta(\nu)$ of an element $\nu$ in $\mathrm{St} \otimes \mathrm{St}$ defines a Frobenius splitting of $G \times_B X$ up to a nonzero constant if and only if $\phi(\nu)$ is nonzero.
(2) If the image of $\theta$ is contained in $\mathrm{End}_F(X, Y)$ for a $B$-stable closed subvariety $Y$ of $X$, then the image of $\Theta$ is contained in $\mathrm{End}_F(G \times_B X, G \times_B Y)$.
(3) Let $v$ denote an element of $\mathrm{St} = \mathrm{End}^{\mathcal{L}}_F(G/B)$ which is compatible with a closed subvariety $V$ of $G/B$. For any element $v' \in \mathrm{St}$ we have
\[ \Theta(v \otimes v') \in \mathrm{End}_F(G \times_B X, q^{-1}(V) \times_B X), \]
with $q : G \to G/B$ denoting the quotient map.
(4) Any element of the form $\Theta(v \otimes v')$ factorizes as
\[ (F_Z)_*\mathcal{O}_Z \xrightarrow{(F_Z)_*\pi^*(v^!)} (F_Z)_*\pi^*\mathcal{L} \xrightarrow{\;s\;} \mathcal{O}_{Z'}, \]
where $Z = G \times_B X$ and $\mathcal{L}$ is the line bundle on $G/B$ associated to the $B$-character $(1-p)\rho$. Moreover, if the image of $\theta$ is contained in $\mathrm{End}_F(X, Y)$ then $s$ is compatible with $G \times_B Y$.

Proof. All statements follow directly from Theorem 5.6 and the considerations above. □

The first part, (1) and (2), of the above result is well known (see e.g. [B-K, Ex. 4.1.E(4)]). However, the second part, (3) and (4), seems to be new.

6.2. B-canonical Frobenius splitting when G is not semisimple. Although Corollary 6.2 is only stated for connected, semisimple and simply connected groups it also applies in other cases: assume that $G$ is a connected linear algebraic group containing a connected semisimple subgroup $H$ such that the induced map $H/(H \cap B) \to G/B$ is an isomorphism. E.g. this is satisfied for any parabolic subgroup of a reductive connected linear algebraic group. Let $q_{sc} : H_{sc} \to H$ denote a simply connected cover of $H$. Then $X$ admits an action of the parabolic subgroup $B_{sc} := q_{sc}^{-1}(B \cap H)$ of $H_{sc}$.
Furthermore, the natural morphism
\[ H_{sc} \times_{B_{sc}} X \to G \times_B X \]
is then an isomorphism. We then say that $X$ admits a $B$-canonical Frobenius splitting if $X$, as a $B_{sc}$-variety, admits a $B_{sc}$-canonical Frobenius splitting. In this case we may apply Corollary 6.2 to obtain Frobenius splitting properties of $G \times_B X$.

6.3. Restriction to Levi subgroups. Return to the situation where $G$ is connected, semisimple and simply connected. Let $J$ be a subset of the set of simple roots $\Delta$ and let $G_J$ denote the commutator subgroup of $L_J$. Then $G_J$ is a connected, semisimple and simply connected linear algebraic group with Borel subgroup $B_J = G_J \cap B$ and maximal torus $T_J = T \cap G_J$. We let $\mathrm{St}_J$ denote the associated Steinberg module. Notice that $\mathrm{St}_J = \mathrm{Ind}_{B_J}^{G_J}((1-p)\rho_J)$ where $\rho_J$ denotes the restriction of $\rho$ to $B_J$. The following should be well known but we do not know a good reference.

Lemma 6.3. There exists a $G_J$-equivariant morphism
\[ \mathrm{St}_J \to \mathrm{St} \]
such that the $B_J^-$-invariant line of $\mathrm{St}_J$ maps surjectively to the $B^-$-invariant line of $\mathrm{St}$. In particular, if $X$ is a $G$-variety admitting a $B$-canonical Frobenius splitting then $X$ admits a $B_J$-canonical Frobenius splitting as a $G_J$-variety.

Proof. Let $M$ denote the $T$-stable complement to the $B$-stable line in $\mathrm{St}$. Then $M$ is $B^-$-invariant and thus also $B_J^-$-invariant. The translate $\dot{w}_0^J M$ is then invariant under $B_J$ and we obtain a $B_J$-equivariant morphism
\[ \mathrm{St} \to \mathrm{St}/(\dot{w}_0^J M) \simeq k_{(1-p)\rho_J}. \]
By Frobenius reciprocity this defines a $G_J$-equivariant map $\mathrm{St} \to \mathrm{St}_J$ such that the $B$-stable line of $\mathrm{St}$ maps onto the $B_J$-stable line of $\mathrm{St}_J$. Now apply the selfduality of $\mathrm{St}_J$ and $\mathrm{St}$ to obtain the desired map. This proves the first part of the statement. The second part follows easily by composing the obtained morphism $\mathrm{St}_J \to \mathrm{St}$ with the $B$-canonical Frobenius splitting
\[ \mathrm{St} \to \mathrm{End}_F(X) \otimes k_{(1-p)\rho} \]
of $X$ and noticing that the restriction of $\rho$ to $B_J$ is $\rho_J$.

7. Applications to $G \times G$-varieties

In this section we consider a linear algebraic group $G$ satisfying the conditions of Section 6.2, i.e.
we assume that G contains a closed connected semisimple subgroup H such that H/H∩B → G/B is an iso- morphism. We also let Hsc denote the simply connected version of H and let Bsc denote the associated Borel subgroup. 7.1. A well known result. Consider for a moment (i.e. in this sub- section) the case where G is semisimple and simply connected. Re- member that the G-linearized line bundle on G/B associated to the B- character 2ρ coincides with the dualizing sheaf ωG/B. Let L denote the line bundle on G/B associated to the B-character (1−p)ρ and recall from Section 6 the notation St = IndGB((1−p)ρ) for the Steinberg module. As the Steinberg module is a selfdual G-module we may fix a G-invariant nonzero element v∆ in the tensorproduct St⊗ St. We may think of v∆ as a global section of the line bundle L⊠ L on (G/B)2 = G/B × G/B. Identify G/B × G/B with G×B G/B by the isomorphism G×B G/B → G/B × G/B, [g, hB] 7→ (gB, ghB), and let D denote the subvariety of G/B × G/B corresponding to G ×B ∂(G/B), where ∂(G/B) denotes the union of the codimension 1 Schubert varieties in G/B. Then, by [B-K, proof of Thm.2.3.8], the zero scheme of v∆ equals (p− 1)D. Consider then the natural morphism : η : (L⊠ L)⊗ (L⊠ L) → ω (G/B)2 = End!F (( G/B)2) and define ηD : (L⊠ L) → End G/B)2), 26 XUHUA HE AND JESPER FUNCH THOMSEN as in Lemma 3.6, using the identification L ⊠ L = O(G/B)2 (p − 1)D Then by Lemma 3.6 the image of ηD is contained in End G/B)2, D) and thus the associated element η′D ∈ HomO((G/B)2)′ (F(G/B)2)∗(L⊠ L),O((G/B)2)′ is compatible with D. It follows Lemma 7.1. The element in EndL⊠LF (( G/B)2) ≃ St⊠ St defined by v∆ is compatible with the diagonal diag(G/B) in G/B × G/B. Proof. We have to prove that η′D, defined above, is compatible with the diagonal diag(G/B). As η′D is compatible with D it suffices to show that EndL⊠LF (( G/B)2, D) is contained in EndL⊠LF (( G/B)2, diag(G/B) . 
This follows by an application of Lemma 3.1 and an argument as at the end of the proof of [B-K, Thm.2.3.1]. □

7.2. We return to the setup as in the beginning of this section. We want to apply the results of the preceding sections to the case when the group equals $G \times G$. So let $X$ denote a $B \times B$-variety and assume that $X$ admits a $B_{sc} \times B_{sc}$-canonical Frobenius splitting defined by
\[ \theta : (\mathrm{St} \boxtimes \mathrm{St}) \otimes (k_{(p-1)\rho} \boxtimes k_{(p-1)\rho}) \to \mathrm{End}_F(X), \]
which is compatible with certain $B \times B$-stable subvarieties $X_1, \dots, X_m$, i.e. the image of $\theta$ is contained in $\mathrm{End}_F(X, X_i)$ for all $i$. Then

Theorem 7.2. The variety $(G \times G) \times_{(B \times B)} X$ admits a $\mathrm{diag}(B_{sc})$-canonical Frobenius splitting which compatibly Frobenius splits the subvarieties $\mathrm{diag}(G) \times_{\mathrm{diag}(B)} X$ and $(G \times G) \times_{(B \times B)} X_i$ for all $i$.

Proof. It suffices to consider the case where $G = H_{sc}$ (cf. discussion in Section 6.2). By Corollary 6.2 there exists a $G \times G$-equivariant morphism
\[ \Theta : (\mathrm{St} \boxtimes \mathrm{St}) \otimes (\mathrm{St} \boxtimes \mathrm{St}) \to \mathrm{End}_F\big((G \times G) \times_{(B \times B)} X\big), \]
satisfying certain compatibility conditions. Let $v_\Delta \in \mathrm{St} \boxtimes \mathrm{St}$ be a nonzero $\mathrm{diag}(G)$-invariant element and let $v \in \mathrm{St} \boxtimes \mathrm{St}$ be arbitrary. Then by Corollary 6.2 and Lemma 7.1 the element $\Theta(v_\Delta \otimes v)$ is compatible with $\mathrm{diag}(G) \times_{\mathrm{diag}(B)} X$ and $(G \times G) \times_{(B \times B)} X_i$ for all $i$. In particular, if we define the $\mathrm{diag}(G)$-equivariant morphism
\[ \Theta_\Delta : \mathrm{St} \otimes \mathrm{St} \to \mathrm{End}_F\big((G \times G) \times_{(B \times B)} X\big) \]
by $\Theta_\Delta(v) = \Theta(v_\Delta \otimes v)$, then every element in the image of $\Theta_\Delta$ is compatible with $\mathrm{diag}(G) \times_{\mathrm{diag}(B)} X$ and $(G \times G) \times_{(B \times B)} X_i$ for all $i$. Consider $k_{(p-1)\rho}$ as the highest weight line in $\mathrm{St}$. Then the restriction of $\Theta_\Delta$ to $\mathrm{St} \otimes k_{(p-1)\rho}$ defines a $\mathrm{diag}(B)$-canonical Frobenius splitting of $(G \times G) \times_{(B \times B)} X$ with the desired properties. □

Notice that by the general machinery of canonical Frobenius splittings (see e.g. [B-K, Prop.4.1.17]) the existence of a Frobenius splitting of $\mathrm{diag}(G) \times_{\mathrm{diag}(B)} X$ follows if $X$ admits a $\mathrm{diag}(B_{sc})$-canonical Frobenius splitting. In the above setup $X$ only admits a $B_{sc} \times B_{sc}$-canonical Frobenius splitting, which is less restrictive.
However, in contrast to the situation when X admits a diag(B_sc)-canonical Frobenius splitting, the present Frobenius splitting is not necessarily compatible with subvarieties of the form \overline{BẇB} ×_B X, with w denoting an element of the Weyl group and \overline{BẇB} denoting the closure of BẇB in G.

8. G-Schubert varieties in equivariant embeddings

From now on, unless otherwise stated, we assume that G is a connected reductive group.

8.1. Equivariant embeddings. Consider G as a G × G-variety by left and right translation. An equivariant embedding X of G is then a normal irreducible G × G-variety containing an open dense subset which is G × G-equivariantly isomorphic to G. In particular, we may identify G with an open subset of X, and the complement X \ G is then called the boundary. As G is an affine variety the boundary is of pure codimension 1 in X [Har, Prop.3.1]. Any equivariant embedding of G is a spherical variety (with respect to the induced B × B-action) and thus X contains finitely many B × B-orbits.

8.2. Wonderful compactifications. When G = G_ad is of adjoint type there exists a distinguished equivariant embedding 𝐗 of G which is called the wonderful compactification (see e.g. [B-K, 6.1]). The boundary 𝐗 \ G is a union of irreducible divisors 𝐗_j, j ∈ ∆, which intersect transversely. For a subset J ⊂ ∆, we denote the intersection ∩_{j∈J} 𝐗_j by 𝐗_J. As a (G × G)-variety, 𝐗_J is isomorphic to the variety
\[ (G \times G) \times_{P^-_{\Delta\setminus J} \times P_{\Delta\setminus J}} \mathbf{Y}, \]
where 𝐘 denotes the wonderful compactification of the group of adjoint type associated to L_{∆\J}. Here the P^-_{∆\J} × P_{∆\J}-action on 𝐘 is defined by the quotient maps P_{∆\J} → L_{∆\J} and P^-_{∆\J} → L_{∆\J}. In particular, 𝐗_∆ is G × G-equivariantly isomorphic to the variety G/B^- × G/B.

8.3. Toroidal embeddings. Let G_ad denote the group of adjoint type associated to G, and let 𝐗 denote the wonderful compactification of G_ad. An embedding X of the reductive group G is then called toroidal if the canonical map G → G_ad admits an extension X → 𝐗.

8.4. G-Schubert varieties.
By a G-Schubert variety in an equivariant embedding X we will mean a subvariety of the form diag(G) · V, for some B × B-orbit closure V. Notice that diag(G) · V is the image of diag(G) ×_{diag(B)} V under the proper map
\[ \operatorname{diag}(G) \times_{\operatorname{diag}(B)} X \to X, \qquad [g, x] \mapsto g \cdot x, \]
and thus G-Schubert varieties are closed diag(G)-stable subvarieties of X.

If G = G_ad and X = 𝐗 is the wonderful compactification then a G-Schubert variety in 𝐗_∆ is diag(G)-equivariantly isomorphic to a variety of the form G ×_B X(w), where X(w) denotes a Schubert variety in G/B. In particular, this explains the name G-Schubert varieties as this is the name used for varieties of the form G ×_B X(w).

In the rest of this section, we will relate G-Schubert varieties to closures of so-called G-stable pieces. Our primary interest is in G-stable pieces in wonderful compactifications but below we will also describe the toroidal case in general.

8.5. G-stable pieces in the wonderful compactification. Let G = G_ad denote a group of adjoint type and let 𝐗 denote its wonderful compactification. Let J ⊂ ∆ and identify 𝐗_J with (G × G) ×_{P^-_{∆\J} × P_{∆\J}} 𝐘 as in Section 8.2. Using this identification it easily follows that there exists a unique element in 𝐗_J which is invariant under U^-_J × U_J and diag(L_J). We denote this element by h_J and note that as an element of (G × G) ×_{P^-_{∆\J} × P_{∆\J}} 𝐘 it equals [(e, e), e_J], where e (resp. e_J) denotes the identity element of G (resp. the adjoint group associated to L_{∆\J}). For w ∈ W^{∆\J}, we then let
\[ X_{J,w} = \operatorname{diag}(G)(Bw, 1) \cdot h_J, \]
and call X_{J,w} a G-stable piece of 𝐗. A G-stable piece is a locally closed subset of 𝐗 and by [L, section 12] and [He, section 2], we can use them to decompose 𝐗 as follows
\[ \mathbf{X} = \bigsqcup_{J \subset \Delta,\ w \in W^{\Delta\setminus J}} X_{J,w}. \]
Moreover, by the proof of [He2, Theorem 4.5], any G-Schubert variety is a finite union of G-stable pieces. In particular, we may think of G-Schubert varieties as closures of G-stable pieces.

8.6. G-stable pieces in arbitrary toroidal embeddings.
We fix a toroidal embedding X of G. The irreducible components of the boundary X \ G will be denoted by X_1, ..., X_n. For each G × G-orbit closure Y in X we then associate the set
\[ K_Y = \{ i \in \{1, \dots, n\} \mid Y \subset X_i \}, \]
where by definition K_Y = ∅ when Y = X. Then by [B-K, Prop.6.2.3], Y = ∩_{i∈K_Y} X_i. Moreover, we define
\[ I = \{ K_Y \subset \{1, \dots, n\} \mid Y \text{ a } G \times G\text{-orbit closure in } X \}, \]
and write X_K := ∩_{i∈K} X_i for K ∈ I. Then (X_K)_{K∈I} is the set of closures of G × G-orbits in X.

Let now π_X : X → 𝐗 denote the given extension of G → G_ad. Then the closure of π_X(X_K) equals 𝐗_{P(K)} for some unique subset P(K) of ∆. This defines a map P : I → 𝒫(∆), where 𝒫(∆) denotes the set of subsets of ∆. As in [H-T2, 5.4], for K ∈ I we may choose a base point h_K in the open G × G-orbit of X_K which maps to h_{P(K)}. By [H-T2, Proposition 5.3], X_K is then naturally isomorphic to
\[ (G \times G) \times_{P^-_{\Delta\setminus J} \times P_{\Delta\setminus J}} \overline{L_{\Delta\setminus J} \cdot h_K}, \]
where J = P(K) and \overline{L_{∆\J} · h_K} is a toroidal embedding of a quotient L_{∆\J}/H by some subgroup H of the center of L_{∆\J}. For K ∈ I and w ∈ W^{∆\P(K)}, we then define
\[ X_{K,w} = \operatorname{diag}(G)(Bw, 1) \cdot h_K, \]
and call X_{K,w} a G-stable piece of X. One can then show, in the same way as in [He2, 4.3], that
\[ X = \bigsqcup_{K \in I,\ w \in W^{\Delta\setminus P(K)}} X_{K,w}. \]
Also similar to the proof of [He2, Theorem 4.5], for any B × B-orbit closure V in X, the G-Schubert variety diag(G) · V is a finite union of G-stable pieces. In particular, G-Schubert varieties are closures of G-stable pieces.

9. Frobenius splitting of G-Schubert varieties

In this section, we assume that X is an equivariant embedding of G. Let G_sc denote a simply connected cover of the semisimple commutator subgroup (G,G) of G. We fix a Borel subgroup B_sc of G_sc which is compatible with the Borel subgroup B in G. Similarly we fix a maximal torus T_sc ⊂ B_sc. Let X_1, ..., X_n denote the boundary divisors of X. The closure within X of the B × B-orbit B s_j w_0 B ⊂ G will be denoted by D_j. Then D_j is of codimension 1 in X. The translate (w_0, w_0)D_j of D_j will be denoted by D̃_j.
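As a sanity check on the notation D_j and D̃_j, the following display spells out the smallest case. It is only an illustrative sketch under two added assumptions that are not taken from the text: that G has semisimple rank one (so ∆ = {1} and w_0 = s_1), and that the G × G-action on G by left and right translation is written (g_1, g_2) · g = g_1 g g_2^{-1}. All closures are taken in X.

```latex
\[
  s_1 w_0 = s_1^2 = e
  \quad\Longrightarrow\quad
  D_1 = \overline{B s_1 w_0 B} = \overline{B},
  \qquad
  \tilde D_1 = (w_0, w_0) D_1
             = \overline{\dot w_0 B \dot w_0^{-1}}
             = \overline{B^-},
\]
\[
  \dim G - \dim B = (\dim T + 2) - (\dim T + 1) = 1 .
\]
```

Under these assumptions D_1 and D̃_1 are the closures of the two opposite Borel subgroups, and the dimension count matches the statement that D_j has codimension 1 in X.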
By earlier work we know

Theorem 9.1. [H-T2, Prop.7.1] The equivariant embedding X admits a B_sc × B_sc-canonical Frobenius splitting which compatibly Frobenius splits the closure of every B × B-orbit.

As a direct consequence of Theorem 7.2 we then obtain

Corollary 9.2. The variety (G × G) ×_{B×B} X admits a diag(B_sc)-canonical Frobenius splitting which is compatible with all subvarieties of the form (G × G) ×_{B×B} Y and diag(G) ×_{diag(B)} Y, for a B × B-orbit closure Y in X.

Proposition 9.3. The equivariant embedding X admits a diag(B_sc)-canonical Frobenius splitting which compatibly splits all G-Schubert varieties in X.

Proof. By Corollary 9.2 the variety Z = diag(G) ×_{diag(B)} X admits a diag(B_sc)-canonical Frobenius splitting which is compatible with all subvarieties of the form diag(G) ×_{diag(B)} Y, with Y denoting a B × B-orbit closure in X. As X is diag(G)-stable we may identify Z with G/B × X using the isomorphism
\[ G \times_B X \to G/B \times X, \qquad [g, x] \mapsto (gB, gx). \]
In particular, we see that the morphism
\[ \pi : Z = \operatorname{diag}(G) \times_{\operatorname{diag}(B)} X \to X, \qquad [g, x] \mapsto g \cdot x, \]
is projective and that π_*(O_Z) = O_X. As a consequence (see Section 3.8) the diag(B_sc)-canonical Frobenius splitting of Z induces a diag(B_sc)-canonical Frobenius splitting of X which is compatible with all subvarieties of the form
\[ \pi\big(\operatorname{diag}(G) \times_{\operatorname{diag}(B)} Y\big) = \operatorname{diag}(G) \cdot Y, \]
i.e. with all the G-Schubert varieties in X. This ends the proof. □

As a direct consequence of Proposition 9.3, we conclude the following vanishing result (see [B-K, Theorem 1.2.8]).

Corollary 9.4. Let X denote a projective equivariant embedding of G. Let 𝔛 denote a G-Schubert variety in X and let L denote an ample line bundle on 𝔛. Then
\[ H^i(\mathfrak{X}, L) = 0, \qquad i > 0. \]
Moreover, if 𝔛̃ ⊂ 𝔛 is another G-Schubert variety, then the restriction
\[ H^0(\mathfrak{X}, L) \to H^0(\tilde{\mathfrak{X}}, L) \]
is surjective.

Later (i.e. Cor. 10.5) we will generalize the vanishing part of this result to nef line bundles.

9.1. F-splittings along ample divisors.
In this subsection we assume that X is toroidal. The following structural properties of toroidal embeddings can all be found in [B-K, Sect.6.2]. Let X_0 denote the complement in X of the union of the subsets \overline{B s_i B^-} for i ∈ ∆. If we let T̄ denote the closure of T in X, then X_0 admits a decomposition defined by the following isomorphism
\[ U \times U^- \times (\bar T \cap X_0) \to X_0, \qquad (x, y, z) \mapsto (x, y) \cdot z. \tag{25} \]
Moreover, every G × G-orbit in X intersects (T̄ ∩ X_0) in a unique orbit under the left action of T. Notice here that as T is commutative the T × T-orbits and the (left) T-orbits in T̄ will coincide.

Lemma 9.5. Let X denote a projective toroidal equivariant embedding of G and let Y denote a G × G-orbit closure in X. Let K denote the subset of {1, ..., n} consisting of those j such that Y is contained in the boundary component X_j. Then
\[ Y \cap \Big( \bigcup_{j \notin K} X_j \ \cup\ \bigcup_{i \in \Delta} (1, w_0) D_i \Big) \]
has pure codimension 1 in Y and contains the support of an ample effective Cartier divisor on Y.

Proof. Let X^K = ∪_{j∉K} X_j. We claim that Y \ X^K coincides with the open G × G-orbit Y_0 of Y. Clearly Y_0 is contained in Y \ X^K. On the other hand, let U be a G × G-orbit in Y \ X^K. Then X_j contains U if and only if j ∈ K. But every G × G-orbit closure in X is the intersection of those X_j which contain it [B-K, Prop.6.2.3]. It follows that the closures of Y_0 and U coincide and thus U = Y_0.

As X is normal we may choose a G × G-linearized very ample line bundle L on X. Then H^0(Y, L) is a finite dimensional (nonzero) representation of G × G, and it thus contains a nonzero element v which is B × B^- -invariant up to constants. The support of v is then the union of B × B^- -invariant divisors on Y. As Y_0 ∩ (T̄ ∩ X_0) is a single T × T-orbit it follows that
\[ Y_0 \cap X_0 \simeq U \times U^- \times \big( Y_0 \cap (\bar T \cap X_0) \big) \]
is an affine variety and a single B × B^- -orbit. In particular, as (1, w_0)D_i is the closure of B s_i w_0 B w_0^{-1} = B s_i B^-, the support of v is contained in
\[ Y \setminus (Y_0 \cap X_0) = Y \cap \Big( X^K \cup \bigcup_{i \in \Delta} (1, w_0) D_i \Big). \]
This shows the second part of the statement. The first part follows as Y_0 ∩ X_0 is affine [Har, Prop.3.1].
□

Let now X denote a smooth projective toroidal embedding of G. As the line bundles O_X(D_i) and O_X(D̃_i) are isomorphic it follows by [B-K, Prop.6.2.6] that
\[ \omega_X^{-1} \simeq O_X\Big( \sum_{i \in \Delta} (D_i + \tilde D_i) + \sum_{j=1}^{n} X_j \Big). \tag{26} \]
Recall that as X is normal and G_sc is semisimple and simply connected, any line bundle on X will admit a unique G^2_sc = G_sc × G_sc-linearization. In particular, if we let τ_i denote the canonical section of the line bundle O_X(D_i), then we may consider τ_i as a B^2_sc = B_sc × B_sc-eigenvector of the space of global sections of O_X(D_i). As in the proof of [B-K, Prop.6.1.11] we find that the associated weight of τ_i equals ω_i ⊠ −w_0ω_i, where ω_i denotes the i-th fundamental weight. Similarly, we may consider the canonical section σ_j of O_X(X_j) as a G^2_sc-invariant element.

Let V denote a B × B-orbit closure in X. As V is B × B-stable the subset Y = (G × G) · V is closed in X. Thus we may consider Y as the smallest G × G-invariant subvariety of X containing V. Now define K as in Lemma 9.5 and let M denote the line bundle
\[ M = O_X\Big( (p-1)\Big( \sum_{i \in \Delta} \tilde D_i + \sum_{j \notin K} X_j \Big) \Big). \]
By Equation (26) and Lemma 3.6 it then follows that multiplication with τ_i^{p−1}, for i ∈ ∆, and σ_j^{p−1}, for j ∈ K, defines a morphism of B^2_sc-linearized line bundles
\[ M \to \operatorname{End}^{!}_{F}\big( X, \{D_i, X_j\}_{i \in \Delta,\, j \in K} \big) \otimes k_{\lambda \boxtimes \lambda}, \]
where λ = (1−p)ρ. By [H-T2, Prop.6.5] and Lemma 3.1 any element in End^!_F(X) which is compatible with the closed subvarieties D_i, i ∈ ∆, and X_j, j ∈ K, is also compatible with V and Y. In particular, we have defined a B^2_sc-equivariant map
\[ \eta : M \to \operatorname{End}^{!}_{F}(X, Y, V) \otimes k_{\lambda \boxtimes \lambda}, \tag{27} \]
which, by Lemma 3.5, is the same as a B^2_sc-invariant element
\[ \eta' \in \operatorname{End}^{M}_{F}(X, Y, V) \otimes k_{\lambda \boxtimes \lambda}. \]
In particular, this defines an element
\[ v \in \operatorname{Ind}_{B^2_{sc}}^{G^2_{sc}}\big( \operatorname{End}^{M}_{F}(X, Y, V) \otimes k_{\lambda \boxtimes \lambda} \big) \tag{28} \]
which is G^2_sc-invariant. We are then ready to use the ideas explained in Section 5.7. First we use (18) to construct a morphism
\[ \operatorname{End}^{L \boxtimes L}_{F}\big( (G_{sc}/B_{sc})^2 \big) \otimes M(X) \to \operatorname{End}_F\big( G^2_{sc} \times_{B^2_{sc}} X \big), \qquad (u, \sigma) \mapsto \Phi_{M, \lambda \boxtimes \lambda}(u \otimes v \otimes \sigma), \tag{29} \]
where L is the G_sc-linearized line bundle on G_sc/B_sc associated to the character λ = (1−p)ρ.
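Notice that we here have used that M(X) is a G^2_sc-module.

Lemma 9.6. There exists a G^2_sc-equivariant map
\[ \mathrm{St} \boxtimes \mathrm{St} \to M(X), \tag{30} \]
which maps the B^-_sc × B^-_sc-invariant line in St ⊠ St to a nonzero multiple of the global section
\[ \tilde\sigma = \prod_{i \in \Delta} \tilde\tau_i^{\,p-1} \prod_{j \notin K} \sigma_j^{\,p-1} \in M(X), \]
where τ̃_i denotes the canonical section of O_X(D̃_i).

Proof. As O_X(D̃_i) and O_X(D_i) are isomorphic as line bundles we may consider the element
\[ \sigma = \prod_{i \in \Delta} \tau_i^{\,p-1} \prod_{j \notin K} \sigma_j^{\,p-1} \]
as a global section of M. Then σ is a B^2_sc-eigenvector in M(X) of weight (p−1)ρ ⊠ (p−1)ρ. In particular, σ induces a B_sc × B_sc-equivariant map
\[ k_{(p-1)\rho} \boxtimes k_{(p-1)\rho} \to M(X). \]
Applying Frobenius reciprocity and the selfduality of the Steinberg module St, this defines the desired map St ⊠ St → M(X), with the stated properties. □

Combining the map (29) with the map (30) in Lemma 9.6 we obtain a G^2_sc-equivariant map
\[ \Theta : \operatorname{End}^{L \boxtimes L}_{F}\big( (G_{sc}/B_{sc})^2 \big) \otimes (\mathrm{St} \boxtimes \mathrm{St}) \to \operatorname{End}_F\big( G^2_{sc} \times_{B^2_{sc}} X \big). \tag{31} \]
We will now study when the map (31) describes a Frobenius splitting of G^2_sc ×_{B^2_sc} X. Consider the G^2_sc-equivariant map
\[ M(X) \to \mathrm{St} \boxtimes \mathrm{St}, \qquad \sigma \mapsto \Phi^2_{M, \lambda \boxtimes \lambda}(v \otimes \sigma), \tag{32} \]
defined as the map (19) in Section 5.7. We claim

Lemma 9.7. The composition of the map (30) in Lemma 9.6 and the map in (32) is an isomorphism on St ⊠ St.

Proof. By Frobenius reciprocity it suffices to show that the described composed map is nonzero. In particular, it suffices to show that
\[ \Phi^2_{M, \lambda \boxtimes \lambda}(v \otimes \tilde\sigma) \neq 0, \]
where σ̃ denotes the global section of M defined in Lemma 9.6. For this we use the fact that the global section
\[ \prod_{i \in \Delta} (\tau_i \tilde\tau_i)^{p-1} \prod_{j=1}^{n} \sigma_j^{\,p-1} \]
of ω_X^{1−p} defines a Frobenius splitting of X (see e.g. [B-K, proof of Thm.6.2.7]). As a consequence η(σ̃) is a Frobenius splitting of X, where η is the map defined in (27). Equivalently, the natural G^2_sc-equivariant morphism
\[ \operatorname{End}^{M}_{F}(X) \otimes M(X) \to k[X'] = k, \]
defined in (13), will map η' ⊗ σ̃ to 1. This induces a commutative diagram
\[
\begin{array}{ccc}
\operatorname{Ind}_{B^2_{sc}}^{G^2_{sc}}\big( \operatorname{End}^{M}_{F}(X) \otimes k_{\lambda \boxtimes \lambda} \big) \otimes M(X) & \xrightarrow{\ \Phi^2_{M, \lambda \boxtimes \lambda}\ } & \mathrm{St} \boxtimes \mathrm{St} \\
\downarrow & & \downarrow \\
\operatorname{End}^{M}_{F}(X) \otimes k_{\lambda \boxtimes \lambda} \otimes M(X) & \longrightarrow & k_{\lambda \boxtimes \lambda}
\end{array}
\tag{33}
\]
where the image of v ⊗ σ̃ under the diagonal map is nonzero. This ends the proof. □

Proposition 9.8.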
Let Θ denote the map defined in (31). The image Θ(ν) of an element ν defines, up to a nonzero constant, a Frobenius splitting of G^2_sc ×_{B^2_sc} X if and only if the image of ν under the map
\[ \phi_{\lambda \boxtimes \lambda} : \operatorname{End}^{L \boxtimes L}_{F}\big( (G_{sc}/B_{sc})^2 \big) \otimes (\mathrm{St} \boxtimes \mathrm{St}) \to k, \tag{34} \]
defined in Section 5.6, is nonzero.

Proof. Apply Proposition 5.7 and Lemma 9.7.

With the identification End^{L⊠L}_F((G_sc/B_sc)^2) ≃ St ⊠ St the map φ_{λ⊠λ}, defined in (34), must necessarily (up to a nonzero constant) be the G^2_sc-invariant form on St ⊠ St mentioned in Section 6.1. Let v_∆ denote the diag(G)-invariant element in End^{L⊠L}_F((G_sc/B_sc)^2) defined in Section 7.1. Then the diag(G)-equivariant map
\[ \mathrm{St} \otimes \mathrm{St} \to k, \qquad \nu \mapsto \phi_{\lambda \boxtimes \lambda}(v_\Delta \otimes \nu), \]
is nonzero and thus it must coincide (up to a nonzero constant) with the G_sc-invariant form φ on St defined in (24).

Proposition 9.9. Fix notation as above and let D denote the effective Cartier divisor
\[ (p-1)\Big( \sum_{i \in \Delta} (1, w_0) D_i + \sum_{j \notin K} X_j \Big) \]
on X. Then X admits a Frobenius D-splitting which is compatible with the subvariety Y and the G-Schubert variety diag(G) · V.

Proof. Consider the diag(G)-equivariant morphism
\[ \Theta_\Delta : \mathrm{St} \boxtimes \mathrm{St} \to \operatorname{End}_F\big( G^2_{sc} \times_{B^2_{sc}} X \big), \qquad \nu \mapsto \Theta(v_\Delta \otimes \nu), \]
where Θ is the map in (31). By Proposition 9.8 the image Θ_∆(ν) of an element ν ∈ St ⊗ St is a Frobenius splitting, up to a nonzero constant, if and only if φ(ν) is nonzero. Here φ is the map defined in (24). Let v_+ (resp. v_−) denote a nonzero B (resp. B^-)-eigenvector of St and let ν = v_+ ⊗ v_−. After possibly multiplying v_+ with a constant we may assume that s = Θ_∆(ν) defines a Frobenius splitting of Z = G^2_sc ×_{B^2_sc} X. As v is compatible with Y and V (cf. (28)) it follows by Theorem 5.6 and Lemma 7.1 that s factorizes as
\[ s : (F_Z)_* O_Z \xrightarrow{\ (F_Z)_* \sigma\ } (F_Z)_* M_Z \xrightarrow{\ s_1\ } O_{Z'}, \tag{35} \]
where s_1 is compatible with the subvarieties G^2_sc ×_{B^2_sc} V, G^2_sc ×_{B^2_sc} Y and diag(G_sc) ×_{diag(B_sc)} X. Here M_Z is the G^2_sc-linearized line bundle on Z associated with the B^2_sc-linearized line bundle M on X as explained in Section 5.2, and σ is the global section of M_Z defined as the image of ν under the map (30) in Lemma 9.6.
Notice that as M is a G^2_sc-linearized line bundle on the G^2_sc-variety X we may identify the global sections of M and M_Z. Actually, as X is a G^2_sc-variety the morphism
\[ G^2_{sc} \times_{B^2_{sc}} X \to G_{sc}/B_{sc} \times G_{sc}/B_{sc} \times X, \qquad [(g_1, g_2), x] \mapsto (g_1 B, g_2 B, (g_1, g_2) \cdot x), \]
is an isomorphism. Moreover, under this isomorphism, the line bundle M_Z is just the pull back of M under the projection p_X on the third coordinate. Thus, by Lemma 9.6 it follows that σ is the pull back from X of the canonical section σ_D of the effective Cartier divisor
\[ D = (p-1)\Big( \sum_{i \in \Delta} (1, w_0) D_i + \sum_{j \notin K} X_j \Big). \]
Applying the functor (p_X)_* to (35) we obtain the Frobenius D-splitting
\[ (p_X)_* s : (F_X)_* O_X \xrightarrow{\ (F_X)_* \sigma_D\ } (F_X)_* O_X(D) \xrightarrow{\ (p_X)_* s_1\ } O_{X'} \]
of X where (p_X)_* s_1 is compatible with the subvarieties p_X(G^2_sc ×_{B^2_sc} Y) = Y and p_X(diag(G_sc) ×_{diag(B_sc)} V) = diag(G) · V (by Lemma 3.7). This ends the proof. □

Corollary 9.10. Let 𝔛 denote a G-Schubert variety in a smooth projective toroidal embedding of a reductive group G. Then 𝔛 admits a stable Frobenius splitting along an ample divisor.

Proof. Apply Proposition 9.9, Lemma 9.5 and Lemma 3.3. □

10. Cohomology of line bundles

The main aim of this section is to obtain a generalization of the vanishing part of Corollary 9.4 to nef line bundles. The concept of a rational morphism is here central and for this we use [B-K, Sect.3.3] as a general reference. First we recall:

Definition 10.1. A morphism f : Y → Z of varieties is called a rational morphism if the induced map f^♯ : O_Z → f_*O_Y is an isomorphism and R^i f_*O_Y = 0, i > 0.

The following criterion for a morphism to be rational will be very useful ([R, Lem.2.11]).

Lemma 10.2. Let f : Y → Z denote a projective morphism of irreducible varieties and let Ŷ denote a closed irreducible subvariety of Y. Consider the image Ẑ = f(Ŷ) as a closed subvariety of Z. Let L denote an ample line bundle on Z and assume
(1) f^♯ : O_Z → f_*O_Y is an isomorphism.
(2) H^i(Y, f^*L^n) = H^i(Ŷ, f^*L^n) = 0, for i > 0 and n ≫ 0.
(3) The restriction map H^0(Y, f^*L^n) → H^0(Ŷ, f^*L^n) is surjective for n ≫ 0.
Then the induced map f̂ : Ŷ → Ẑ is a rational morphism.

10.1. Toric variety. An equivariant embedding Z of the (reductive) group T is called a toric variety (wrt. T). Notice that, as T is commutative, we may consider the T × T-action on Z as just a T-action. The following result should be well known but, as we do not know a good reference, we include a proof.

Lemma 10.3. Let f : Y → Z denote a projective surjective morphism of equivariant embeddings of T. Let T · z denote a T-orbit in Z and let T · y denote a T-orbit in f^{-1}(T · z) of minimal dimension. Then the map T · y → T · z, induced by f, is an isomorphism.

Proof. Let \overline{T · z} and \overline{T · y} denote the closures of T · z and T · y in Z and Y respectively. Then the induced map
\[ \hat f : \overline{T \cdot y} \to \overline{T \cdot z} \]
is a projective morphism. Moreover, by the minimality assumption on T · y, the inverse image f̂^{-1}(T · z) equals T · y. In particular, the induced morphism T · y → T · z is projective. But any T-orbit in a toric variety (wrt. T) is isomorphic to a torus T_1 satisfying that the cokernel of the induced map of character groups X^*(T_1) → X^*(T) is a free abelian group ([Ful, Sect.3.1]). In particular, the varieties T · y and T · z are tori and the cokernel of the induced map of character groups X^*(T · z) → X^*(T · y) is a free abelian group. But T · y → T · z is an affine projective morphism and thus it must be a finite morphism. Thus the cokernel of X^*(T · z) → X^*(T · y) is a finite group and, as it is already a free group, it must be trivial. This ends the proof as tori are determined by their character groups. □

Lemma 10.4. Let X denote a projective embedding of a reductive group G and let Y denote a G × G-orbit closure of X.
Then there exists a smooth toroidal embedding X̂ of G, a projective G-equivariant morphism f : X̂ → X and a G × G-orbit closure Ŷ in X̂ such that the induced morphism f : Ŷ → Y is a rational morphism.

Proof. Assume first that X is toroidal. By [B-K, Prop.6.2.5] there exists a smooth toroidal embedding X̂ of G with a projective morphism f : X̂ → X. Let X_0 denote the open subset of X introduced in the beginning of Section 9.1, and let X̂_0 denote the corresponding subset of X̂. Then the inverse image f^{-1}(X_0) coincides with X̂_0 [B-K, Prop.6.2.3(i)]. Let T̄ (resp. T̂) denote the closure of T in X (resp. X̂). Then T̄ and T̂ are toric varieties [B-K, Prop.6.2.3], and the induced map f : T̂ → T̄ is a projective morphism of toric varieties. Thus also the induced map
\[ \hat X_0 \cap \hat T \to X_0 \cap \bar T \]
is a projective morphism of toric varieties. As mentioned in Section 9.1 every G × G-orbit in X will intersect X_0 ∩ T̄ in a unique T-orbit. We let T · x denote the open T-orbit in the intersection of Y with X_0 ∩ T̄. By Lemma 10.3 we may find a T-orbit T · x̂ in X̂_0 ∩ T̂ which by f is isomorphic to T · x, and we then define Ŷ to be the closure of the G × G-orbit through x̂. By the isomorphism (25) we then conclude that f induces a projective birational morphism Ŷ → Y. By [H-T2, Cor.8.4] the orbit closure Y is normal and thus, by Zariski's main theorem, we conclude f_*O_Ŷ = O_Y. By Lemma 10.2 (used on the morphism Ŷ → Y and the closed non-proper subvariety Ŷ of Ŷ) it now suffices to prove
\[ H^i(\hat Y, f^* L) = 0, \qquad i > 0, \]
for a very ample line bundle L on Y. This follows from [H-T2, Prop.7.2] and ends the proof in the case when X is toroidal.

Consider now an arbitrary projective equivariant embedding X of G. Let X̂ denote the normalization of the closure of the image of the natural G × G-equivariant embedding
\[ G \to X \times \mathbf{X}, \]
where 𝐗 denotes the wonderful compactification of G_ad. Then X̂ is a toroidal embedding of G with an induced projective equivariant morphism f : X̂ → X.
Let Ŷ denote any G × G-orbit closure in X̂. Then f : Ŷ → f(Ŷ) is a rational morphism [H-T2, Lem.8.3]. In particular, we may find a G × G-orbit closure Ŷ of X̂ with an induced rational morphism f : Ŷ → Y. Finally we may apply the first part of the proof to Ŷ and X̂ and use that a composition of rational morphisms is again a rational morphism. □

Corollary 10.5. Let X denote a projective embedding of a reductive group G and let 𝔛 denote a G-Schubert variety in X. Let Y = (G × G) · 𝔛 denote the minimal G × G-orbit closure of X containing 𝔛. When L is a nef line bundle on 𝔛 then
\[ H^i(\mathfrak{X}, L) = 0, \qquad i > 0. \]
Moreover, when L is a nef line bundle on Y then the restriction morphism
\[ H^0(Y, L) \to H^0(\mathfrak{X}, L) \]
is surjective.

Proof. Assume first that X is smooth and toroidal. Then by Proposition 9.9, Lemma 9.5 and Lemma 3.3 the variety Y admits a stable Frobenius splitting along an ample divisor which is compatible with 𝔛. Thus the statement follows in this case by Proposition 3.4.

Let now X denote an arbitrary projective equivariant embedding of G. Choose, using Lemma 10.4, a smooth projective toroidal embedding X̂ with a projective equivariant morphism f : X̂ → X onto X, and a G × G-orbit closure Ŷ in X̂ with an induced rational morphism onto Y. Let V denote a B × B-orbit closure in Y such that 𝔛 = diag(G) · V. As Y is the minimal G × G-orbit closure containing 𝔛 it follows that V will intersect the open G × G-orbit of Y. In particular, there exists a B × B-orbit closure V̂ in X̂ which intersects the open G × G-orbit of Ŷ and which maps onto V. In particular,
\[ \hat{\mathfrak{X}} := \operatorname{diag}(G) \cdot \hat V \]
is a G-Schubert variety in X̂ which by f maps onto 𝔛. Moreover, Ŷ is the minimal G × G-orbit closure containing 𝔛̂. We claim that the induced morphism 𝔛̂ → 𝔛 is a rational morphism. To prove this we apply Lemma 10.2 to the rational morphism f : Ŷ → Y. Choose an ample line bundle M on Y.
Then it suffices to prove
\[ H^i(\hat Y, f^* M^n) = H^i(\hat{\mathfrak{X}}, f^* M^n) = 0, \qquad i > 0,\ n > 0, \tag{36} \]
and that the restriction map
\[ H^0(\hat Y, f^* M^n) \to H^0(\hat{\mathfrak{X}}, f^* M^n) \tag{37} \]
is surjective for n > 0. But M^n is an ample, and thus nef, line bundle on Y and therefore the pull back f^*M^n is a nef line bundle on Ŷ ([Laz, Ex. 1.4.4]). As X̂ is smooth and toroidal, the conclusion of the first part of this proof then shows that conditions (36) and (37) are satisfied.

Now both 𝔛̂ → 𝔛 and Ŷ → Y are rational morphisms. In particular, we have identifications
\[ H^i(\hat Y, f^* L) \simeq H^i(Y, L), \quad i \geq 0, \qquad H^i(\hat{\mathfrak{X}}, f^* L) \simeq H^i(\mathfrak{X}, L), \quad i \geq 0, \]
for any line bundle L on Y or, in the second equation, on 𝔛. When L is a nef line bundle the pull back f^*L is also nef ([Laz, Ex. 1.4.4]). Thus as we have already completed the proof of the statement for smooth toroidal embeddings, in particular for X̂, this now ends the proof. □

By the proof of the above result we also find that any G-Schubert variety 𝔛 in a projective equivariant embedding of G will admit a G-equivariant rational morphism f : 𝔛̂ → 𝔛 by a G-Schubert variety 𝔛̂ of some smooth projective toroidal embedding of G.

Remark 10.6. When X = 𝐗 is the wonderful compactification of a group G of adjoint type and L is a nef line bundle on 𝐗, then the restriction morphism
\[ H^0(\mathbf{X}, L) \to H^0(Y, L) \]
to any closed G × G-stable irreducible subvariety Y of 𝐗 is surjective. In particular, also the restriction morphism
\[ H^0(\mathbf{X}, L) \to H^0(\mathfrak{X}, L) \]
to any G-Schubert variety 𝔛 is surjective by the above result. We do not know if the latter is true for arbitrary equivariant embeddings.

11. Normality questions

The obtained Frobenius splitting properties of G-Schubert varieties in Section 9 and the cohomology vanishing results in Corollary 10.5 should be expected to have strong implications on the geometry of these varieties. However, in this section we provide an example of a G-Schubert variety in the wonderful compactification of a group of type G_2 which is not even normal.
In fact, it seems that there are plenty of such examples.

11.1. Some general theory. We keep the notations as in Section 8.5. For J ⊂ ∆ and w ∈ W^{∆\J}, we let \overline{X_{J,w}} denote the closure of X_{J,w} in 𝐗. Let
\[ K = \max\{ K' \subset \Delta \setminus J \ ;\ wK' \subset K' \}. \]
By [He2, Prop. 1.12], we have a diag(G)-equivariant isomorphism
\[ \operatorname{diag}(G) \times_{\operatorname{diag}(P_K)} (P_K \dot w, P_K) h_J \simeq X_{J,w} \]
induced by the inclusion of (P_Kẇ, P_K)h_J in 𝐗. Let V denote the closure of (P_Kẇ, P_K)h_J within 𝐗. Then V is the closure of a B × B-orbit and we find that the induced map
\[ f : \operatorname{diag}(G) \times_{\operatorname{diag}(P_K)} V \to \overline{X_{J,w}} \tag{38} \]
is a birational and projective morphism. Thus, by Zariski's Main Theorem, a necessary condition for \overline{X_{J,w}} to be normal is that the fibers of f are connected. Actually, in positive characteristic, connectedness of the fibers is also sufficient for \overline{X_{J,w}} to be normal. This follows as \overline{X_{J,w}} is Frobenius split (Prop. 9.3) and thus weakly normal [B-K, Prop.1.2.5].

11.2. An example of a non-normal closure. Let now, furthermore, G be a group of type G_2. Let α_1 denote the short simple root and α_2 denote the long simple root. The associated simple reflections are denoted by s_1 and s_2. Let J = {α_2} and w = s_1s_2 ∈ W^{∆\J}. In this case K = ∅ and we obtain a birational map
\[ f : \operatorname{diag}(G) \times_{\operatorname{diag}(B)} V \to \overline{X_{J,w}} \]
where V is the closure of (Bẇ, B)h_J. By [Sp, Prop. 2.4], the part of V which intersects the open G × G-orbit of 𝐗_J equals
\[ \bigcup_{w \leq w'} (B \dot w', B) h_J \ \cup \bigcup_{w s_1 \leq w'} (B \dot w', B \dot s_1) h_J. \tag{39} \]
In particular, x := (v̇, 1)h_J is an element of V, where v = s_2s_1s_2. We claim that the fiber of f over x is not connected. To see this let y denote a point in the fiber over x. Then we may find g ∈ G and x̃ ∈ V such that y = [g, x̃]. By (39), x̃ = (bẇ', b')h_J for some b ∈ B, b' ∈ P_{∆\J} and w' ≥ w. Then
\[ (g b \dot w', g b') h_J = (\dot v, 1) h_J. \]
It follows that (v̇^{-1}gbẇ', gb') lies in the stabilizer of h_J. In particular, gb' ∈ P_{∆\J} and thus also g ∈ P_{∆\J}. If g ∈ B then y = [1, x]. So assume that g = u_1(t)ṡ_1 where u_1 is the root homomorphism associated to α_1. Assume that t ≠ 0.
Then we may find b_1 ∈ B and s ∈ k such that g = u_{−1}(s)b_1 where u_{−1} is the root homomorphism associated to −α_1. Then
\[ \tilde x = (g^{-1}, g^{-1})(\dot v, 1) h_J = (b_1^{-1} u_{-1}(-s) \dot v, g^{-1}) h_J = (b_1^{-1} \dot v, g^{-1}) h_J \in (B \dot v, B \dot s_1) h_J, \]
where the third equality follows as v̇^{-1}u_{−1}(−s)v̇ is contained in the unipotent radical of P^-_{∆\J}. But (Bv̇, Bṡ_1)h_J has empty intersection with V (by (39)) which contradicts the assumption that t ≠ 0. It follows that the only possibilities for y are [1, x] and [ṡ_1, (ṡ_1^{-1}v̇, ṡ_1^{-1})h_J]. As (ṡ_1^{-1}v̇, ṡ_1^{-1})h_J is contained in V (by (39)) we conclude that the fiber of f over x consists of 2 points; in particular the fiber is not connected and thus \overline{X_{J,w}} is not normal.

Remark 11.1. It seems likely that normalizations of G-Schubert varieties should have nice singularities: If we let Z_{J,w} denote the normalization of the closure of X_{J,w}, then the map (38) induces a birational and projective morphism
\[ \tilde f : \operatorname{diag}(G) \times_{\operatorname{diag}(P_K)} V \to Z_{J,w}. \]
We expect that f̃ can be used to obtain global F-regularity of Z_{J,w} (see [S] for an introduction to global F-regularity). In fact, by the results in [H-T2] the B × B-orbit closure V is globally F-regular. Thus diag(G) ×_{diag(P_K)} V is locally strongly F-regular, and as
\[ \tilde f_* O_{\operatorname{diag}(G) \times_{\operatorname{diag}(P_K)} V} = O_{Z_{J,w}}, \]
it seems likely that Z_{J,w} is also locally strongly F-regular. Moreover, similarly to Corollary 9.10 one may conclude that Z_{J,w} admits a stable Frobenius splitting along an ample divisor. Thus Z_{J,w} is globally F-regular if it is locally strongly F-regular. At the moment we do not know if Z_{J,w} is locally strongly F-regular.

12. Generalizations

Fix notation as in Section 2. An admissible triple of G × G is by definition a triple C = (J_1, J_2, θ_δ) consisting of J_1, J_2 ⊂ ∆, a bijection δ : J_1 → J_2 and an isomorphism θ_δ : L_{J_1} → L_{J_2} that maps T to T and the root subgroup U_{α_i} to the root subgroup U_{α_δ(i)} for i ∈ J_1.
To each admissible triple C = (J_1, J_2, θ_δ), we associate the subgroup R_C of G × G defined by
\[ R_C = \{ (p, q) : p \in P_{J_1},\ q \in P_{J_2},\ \theta_\delta(\pi_{J_1}(p)) = \pi_{J_2}(q) \}, \]
where π_J : P_J → L_J, for a subset J ⊂ ∆, denotes the natural quotient map.

Let X denote an equivariant embedding of the reductive group G. An R_C-Schubert variety of X is then a subset of the form R_C · V for some B × B-orbit closure V in X. When G = G_ad is a group of adjoint type and X = 𝐗 is the associated wonderful compactification the set of R_C-Schubert varieties coincides with closures of the set of R_C-stable pieces. By definition [L-Y, section 7], an R_C-stable piece in the wonderful compactification 𝐗 of G_ad is a subvariety of the form R_C · Y, where Y = (Bv_1, Bv_2) · h_J for some J ⊂ ∆, v_1 ∈ W^J and v_2 ∈ (notation as in Section 8.5). Notice that when J_1 = J_2 = ∆ and θ_δ is the identity map then an R_C-stable piece is the same as a G-stable piece. On the other hand, when J_1 = J_2 = ∅, then an R_C-stable piece is the same as a B × B-orbit. Moreover, any R_C-Schubert variety is a finite union of R_C-stable pieces [L-Y, Section 7].

The following is a generalization of Proposition 9.3 and Proposition 9.9.

Proposition 12.1. Let C = (J_1, J_2, θ_δ) denote an admissible triple of G × G and let X denote an equivariant embedding of G. Then X admits a Frobenius splitting which compatibly splits all R_C-Schubert varieties in X. If, moreover, X is a smooth, projective and toroidal embedding and Y = X_K = (G × G) · V, for some B × B-orbit closure V in X, then X admits a Frobenius splitting along the Cartier divisor
\[ D = (p-1)\Big( \sum_{i \in \Delta} (w_0^{J_1}, 1) \tilde D_i + \sum_{j \notin K} X_j \Big) \]
which is compatible with Y and R_C · V.

Proof. As the proof is similar to the proof of Proposition 9.3 and Proposition 9.9 we only sketch the proof. In the following G_J, for a subset J ⊂ ∆, denotes the commutator of the Levi subgroup in G_sc associated to J. The Borel subgroup G_J ∩ B_sc of G_J is denoted by B_J.
Define X_C to be the G^2_{J_1}-variety which as a variety is X but where the action is twisted by the morphism
\[ G_{J_1} \times G_{J_1} \longrightarrow G_{J_1} \times G_{J_2}. \]
Then the B_{J_1} × B_{J_2}-canonical Frobenius splitting of X defined by Theorem 9.1 and Lemma 6.3 induces a B^2_{J_1}-canonical Frobenius splitting of X_C. In particular, all subvarieties of X_C which correspond to B × B-orbit closures in X will be compatibly Frobenius split by this canonical Frobenius splitting. Now apply an argument as in the proof of Proposition 9.3 and use the identification of R_C · V ⊂ X with diag(G_{J_1}) · V ⊂ X_C. This ends the proof of the first statement.

Assume now that X is a smooth, projective and toroidal embedding and consider the B^2_sc-equivariant morphism
\[ \eta : M \to \operatorname{End}^{!}_{F}(X, Y, V) \otimes k_{(1-p)\rho \boxtimes (1-p)\rho}, \]
defined in (27). Let Y_C and V_C be defined similarly to X_C. Then η induces a B^2_{J_1}-equivariant morphism
\[ \eta_C : M \to \operatorname{End}^{!}_{F}(X_C, Y_C, V_C) \otimes k_{(1-p)\rho_{J_1} \boxtimes (1-p)\rho_{J_1}}. \]
Similar to the definition of v in (28) we obtain from η_C an element
\[ v_C \in \operatorname{Ind}_{B^2_{J_1}}^{G^2_{J_1}}\big( \operatorname{End}^{M}_{F}(X_C, Y_C, V_C) \otimes k_{(1-p)\rho_{J_1} \boxtimes (1-p)\rho_{J_1}} \big) \]
and from this a G^2_{J_1}-equivariant morphism
\[ \operatorname{End}^{L_{J_1} \boxtimes L_{J_1}}_{F}\big( (G_{J_1}/B_{J_1})^2 \big) \otimes M(X_C) \to \operatorname{End}_F\big( G^2_{J_1} \times_{B^2_{J_1}} X_C \big) \tag{40} \]
similar to (29). Here L_{J_1} is the line bundle on G_{J_1}/B_{J_1} associated to the character (1−p)ρ_{J_1}. Combining Lemma 6.3 and Lemma 9.6 we also obtain a map
\[ \mathrm{St}_{J_1} \boxtimes \mathrm{St}_{J_1} \to M(X_C), \tag{41} \]
with properties similar to the ones described in Lemma 9.6. As in (32) we may also use v_C to construct a morphism
\[ M(X_C) \to \mathrm{St}_{J_1} \boxtimes \mathrm{St}_{J_1}, \]
such that the composition with (41) is an isomorphism on St_{J_1} ⊠ St_{J_1}. Finally we may construct
\[ \Theta_C : \operatorname{End}^{L_{J_1} \boxtimes L_{J_1}}_{F}\big( (G_{J_1}/B_{J_1})^2 \big) \otimes (\mathrm{St}_{J_1} \boxtimes \mathrm{St}_{J_1}) \to \operatorname{End}_F\big( G^2_{J_1} \times_{B^2_{J_1}} X_C \big) \]
similar to (31). In particular, a statement equivalent to Proposition 9.8 is satisfied for Θ_C. Let v_+^{J_1} (resp. v_−^{J_1}) denote a highest (resp. lowest) weight vector in St_{J_1} and let v_∆^{J_1} denote the diag(G_{J_1})-invariant element of End^{L_{J_1}⊠L_{J_1}}_F((G_{J_1}/B_{J_1})^2). Imitating the proof of Proposition 9.9 we then find that Θ_C(v_∆^{J_1} ⊗ (v_+^{J_1} ⊗ v_−^{J_1})) is a Frobenius splitting of G^2_{J_1} ×_{B^2_{J_1}} X_C (up to a nonzero constant).
Moreover, the push forward of this Frobenius splitting to X has the desired properties. We only have to note that the effective Cartier associated to the image of vJ1+ ⊗ v under the map (41) equals D = (p− 1) (wJ10 , 1)D̃i + j /∈K This ends the proof. � We may also argue as in Corollary 10.5 to obtain Corollary 12.2. Let X denote a projective embedding of a reductive group G and let V denote the closure of a B × B-orbit in X. Let Y = (G×G) · V and XC = RC · V . When L is a nef line bundle on XC Hi(XC,L) = 0, i > 0. Moreover, when L is a nef line bundle on Y then the restriction mor- phism H0(Y,L) → H0(XC,L), is surjective. Remark 12.3. In the case where k = C and X is the wonderful com- pactification, the subvarieties (wJ10 , 1)D̃i, Xj and all the RC-Schubert varieties are Poisson subvarieties with respect to the Poisson structure on X corresponding to the splitting Lie(G)⊕ Lie(G) = l1 ⊕ l2, where l1 = Lie(RC) and l2 is a certain subalgebra of Ad(w 0 )Lie(B Lie(B−). See [L-Y2, 4.5]. References [Bri] M. Brion, Multiplicity-free subvarieties of flag varieties, Contemp. Math. 331 (2003), 13–23. [B-K] M. Brion and S. Kumar, Frobenius Splittings Methods in Geometry and Representation Theory, Progress in Mathematics (2004), Birkhäuser, Boston. [B-T] M. Brion and J. F. Thomsen, F -regularity of large Schubert varieties, Amer. J. Math. 128 (2006), 949–962. [E-L] S. Evens and J.-H. Lu, On the variety of Lagrangian subalgebras, I, II, Ann. Sci. cole Norm. Sup. (4) 34 (2001), no. 5, 631–668; 39 (2006), no. 2, 347–379. [Ful] W. Fulton, Introduction to Toric Varieties, Ann. Math. Studies, 131 (1993), Princeton University Press. [Har] R. Hartshorne, Ample subvarieties of algebraic varieties, Lecture Notes in Math. 156 (1970), Springer-Verlag. [Har2] R. Hartshorne, Algebraic Geometry, GTM 52 (1977), Springer-Verlag. [He] X. He, Unipotent variety in the group compactification, Adv. in Math. 203 (2006), 109-131. [He2] X. 
He, The G-stable pieces of the wonderful compactification, Trans. Amer. Math. Soc. 359 (2007), 3005–3024.
[H-T] X. He and J. F. Thomsen, On the closure of Steinberg fibers in the wonderful compactification, Transformation Groups 11 (2006), no. 3, 427–438.
[H-T2] X. He and J. F. Thomsen, Geometry of B×B-orbit closures in equivariant embeddings, math.RT/0510088.
[Laz] R. Lazarsfeld, Positivity in Algebraic Geometry I, classical setting: line bundles and linear series, Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge (2004), Springer-Verlag, Berlin.
[L] G. Lusztig, Parabolic character sheaves, I, II, Mosc. Math. J. 4 (2004), no. 1, 153–179; no. 4, 869–896.
[L-Y] J.-H. Lu and M. Yakimov, Partitions of the wonderful group compactification, math.RT/0606579.
[L-Y2] J.-H. Lu and M. Yakimov, Group orbits and regular partitions of Poisson manifolds, math.SG/0609732.
[M-R] V. B. Mehta and A. Ramanathan, Frobenius splitting and cohomology vanishing for Schubert varieties, Ann. of Math. 122 (1985), 27–40.
[R] A. Ramanathan, Equations defining Schubert varieties and Frobenius splitting of diagonals, Inst. Hautes Études Sci. Publ. Math. 65 (1987), 61–90.
[R-R] S. Ramanan and A. Ramanathan, Projective normality of flag varieties and Schubert varieties, Invent. Math. 79 (1985), 217–224.
[S] K. E. Smith, Globally F-regular varieties: Applications to vanishing theorems for quotients of Fano varieties, Michigan Math. J. 48 (2000), 553–
[Sp] T. A. Springer, Intersection cohomology of B×B-orbits closures in group compactifications, J. Alg. 258 (2002), 71–111.
[T] J. F. Thomsen, Frobenius splitting of equivariant closures of regular conjugacy classes, Proc. London Math. Soc. 93 (2006), 570–592.
Department of Mathematics, Stony Brook University, Stony Brook, NY 11794, USA E-mail address : hugo@math.sunysb.edu Institut for matematiske fag, Aarhus Universitet, 8000 Århus C, Denmark E-mail address : funch@imf.au.dk ABSTRACT Let $X$ be an equivariant embedding of a connected reductive group $G$ over an algebraically closed field $k$ of positive characteristic. Let $B$ denote a Borel subgroup of $G$. A $G$-Schubert variety in $X$ is a subvariety of the form $\diag(G) \cdot V$, where $V$ is a $B \times B$-orbit closure in $X$. In the case where $X$ is the wonderful compactification of a group of adjoint type, the $G$-Schubert varieties are the closures of Lusztig's $G$-stable pieces. We prove that $X$ admits a Frobenius splitting which is compatible with all $G$-Schubert varieties. Moreover, when $X$ is smooth, projective and toroidal, then any $G$-Schubert variety in $X$ admits a stable Frobenius splitting along an ample divisors. Although this indicates that $G$-Schubert varieties have nice singularities we present an example of a non-normal $G$-Schubert variety in the wonderful compactification of a group of type $G_2$. Finally we also extend the Frobenius splitting results to the more general class of $\mathcal R$-Schubert varieties. <|endoftext|><|startoftext|> Introduction Model setup Orphan TeV flares ABSTRACT With the anticipated launch of GLAST, the existing X-ray telescopes, and the enhanced capabilities of the new generation of TeV telescopes, developing tools for modeling the variability of high energy sources such as blazars is becoming a high priority. We point out the serious, innate problems one zone synchrotron-self Compton models have in simulating high energy variability. We then present the first steps toward a multi zone model where non-local, time delayed Synchrotron-self Compton electron energy losses are taken into account. 
By introducing only one additional parameter, the length of the system, our code can simulate variability properly at Compton dominated stages, a situation typical of flaring systems. As a first application, we were able to reproduce variability similar to that observed in the case of the puzzling `orphan' TeV flares that are not accompanied by a corresponding X-ray flare. <|endoftext|><|startoftext|> Fusion of radioactive Sn with J. F. Liang, D. Shapira, J. R. Beene, C. J. Gross, R. L. Varner, A. Galindo-Uribarri, J. Gomez del Campo, P. A. Hausladen, P. E. Mueller, D. W. Stracener Physics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831 H. Amro, J. J. Kolata Department of Physics, University of Notre Dame, Notre Dame, IN 46556 J. D. Bierman Physics Department AD-51, Gonzaga University, Spokane, Washington 99258-0051 A. L. Caraley Department of Physics, State University of New York at Oswego, Oswego, NY 13126 K. L. Jones Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854 Y. Larochelle Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee 37966 W. Loveland, D. Peterson Department of Chemistry, Oregon State University, Corvallis, Oregon 97331 (Dated: October 28, 2018) Evaporation residue and fission cross sections of radioactive 132Sn on 64Ni were measured near the Coulomb barrier. A large sub-barrier fusion enhancement was observed. Coupled-channel calculations including inelastic excitation of the projectile and target, and neutron transfer are in good agreement with the measured fusion excitation function. When the change in nuclear size and shift in barrier height are accounted for, there is no extra fusion enhancement in 132Sn+64Ni with respect to stable Sn+64Ni. A systematic comparison of evaporation residue cross sections for the fusion of even 112−124Sn and 132Sn with 64Ni is presented. PACS numbers: 25.60.-t, 25.60.Pj I. 
INTRODUCTION

Fusion of heavy ions has been a topic of interest for several decades[1]. One motivation is to understand the reaction mechanisms so that the production yield of heavy elements can be better estimated by model calculations. The formation of a compound nucleus is a complex process. The projectile and target have to be captured inside the Coulomb barrier and subsequently evolve into a compact shape. In heavy systems, the dinuclear system can separate during shape equilibration prior to passing the saddle point. This quasifission process is considered the primary cause of fusion hindrance[2, 3, 4]. At energies near and below the Coulomb barrier, the structure of the participants plays an important role in influencing the fusion cross section[5, 6, 7]. Sub-barrier fusion enhancement due to nuclear deformation and inelastic excitation has been observed[8, 9, 10, 11, 12]. Coupled-channel calculations have successfully reproduced experimental data by including nuclear deformation and inelastic excitation. Nucleon transfer is another important channel to be considered[13, 14].

Recently available radioactive ion beams offer the opportunity to study fusion under the influence of strong nucleon transfer reactions. Several theoretical works have predicted large enhancement of sub-barrier fusion involving neutron-rich radioactive nuclei[15, 16, 17, 18, 19]. In addition, the compound nucleus produced in such reactions is predicted to have a higher survival probability and longer lifetimes. This is encouraging for superheavy element research. If high-intensity, neutron-rich radioactive beams become available in the future, new neutron-rich heavy nuclei may be synthesized with enhanced yields. The longer lifetime of new isotopes of heavy elements would enable the study of their atomic and chemical properties[20]. However, the current intensity of the radioactive beams is several orders of magnitude lower than that of stable beams.
It is thus not practical to use such beams for heavy element synthesis experiments, but they do provide excellent opportunities for studying reaction mechanisms of fusion involving neutron-rich radioactive nuclei.

Fusion enhancement, with respect to a one-dimensional barrier penetration model prediction, has been observed in experiments performed with neutron-rich radioactive ion beams at sub-barrier energies[21, 22, 23, 24, 25]. For instance, the effect of large neutron excess on fusion enhancement can be seen in 29,31Al+197Au[23]. However, when comparing reactions involving stable isotopes of the projectile or target, the fusion excitation functions are very similar if the change in nuclear sizes is accounted for.

This paper reports results of fusion excitation functions measured with radioactive 132Sn on 64Ni. The doubly magic (Z=50, N=82) 132Sn has eight neutrons more than the heaviest stable 124Sn. Its N/Z ratio (1.64) is larger than that of stable doubly magic nuclei 48Ca (1.4) and 208Pb (1.54) which are commonly used for heavy element production[26]. Evaporation residue (ER) and fission cross sections were measured. The sum of ER and fission cross sections is taken as the fusion cross section. The experimental apparatus is described in Sect. II and data reduction procedures in Sect. III. The results and comparison with model calculations are presented in Sect. IV. In Sect. V a comparison of ER and fusion cross sections with those resulting from stable Sn isotopes on 64Ni is discussed. A summary is given in Sect. VI.

II. EXPERIMENTAL METHODS

The experiment was carried out at the Holifield Radioactive Ion Beam Facility. A 42 MeV proton beam produced by the Oak Ridge Isochronous Cyclotron was used to bombard a uranium carbide target. The fission fragments were ionized by an electron beam plasma ion source. The largest yield of mass A=132 fragments was 132Te. Therefore, it was necessary to suppress 132Te.
This was accomplished by introducing sulfur into the ion source and then selecting the mass 164 XS+ molecular ions from the extracted beam. The 132Te to 132Sn ratio in the ion beam was found to be suppressed by a large factor (∼ 7 × 10^4) compared to that observed with the mass 132 atomic beam. The mass 164 SnS+ beam was converted into a Sn− beam by passing it through a Cs vapor cell where the molecular ion underwent breakup and charge exchange[27]. The negatively charged Sn was subsequently injected into the 25 MV electrostatic tandem accelerator to accelerate the beam to high energies. The measurement was performed at energies between 453 and 620 MeV. The average beam intensity was 50,000 particles per second (pps) with a maximum of 72,000 pps. The ER cross sections measured between 453 and 560 MeV have been reported previously[24].

The purity of the Sn beam was measured by an ionization chamber mounted at zero degrees. Figure 1 displays the energy loss spectra of a 560 MeV A=132 beam with and without the sulfur purification. The dashed curves are the results of fitting the spectrum with Gaussian distributions to estimate the composition of the beam. In the upper panel, the beam is primarily 132Te without sulfur in the ion source. When sulfur was introduced in the ion source, the beam was 96% 132Sn, as shown in the lower panel. The small amount of Sb and Te had a negligible impact on the measurement because their atomic numbers are higher. Fusion of the target with these isobaric contaminants at sub-barrier energies should have been suppressed due to the higher Coulomb barriers.

FIG. 1: (Color online) Composition of a 560 MeV mass A=132 beam measured by the ionization chamber. Top panel: The mass A=132 beam without purification where Te and Sb are the major components of the beam. Bottom panel: Sulfur was introduced into the target ion source and SnS was selected by the mass separator. The dashed curves are results of fitting the spectrum with three Gaussian distributions. The isobar contaminants 132Sb and 132Te were suppressed considerably.

The apparatus for the fusion measurement is shown in Fig. 2. A thick 64Ni target (1.0 mg/cm2) was used to compensate for the low beam intensity. Since the compound nucleus decays by particle evaporation and fission, the evaporation residue (ER) and fission cross sections were measured. The ERs were detected by the ionization chamber at zero degrees and the fission fragments were detected by an annular double-sided silicon strip detector.

FIG. 2: (Color online) Apparatus for measuring fission and evaporation residues cross sections induced by low intensity beams in inverse kinematics.

The ERs were identified by the time-of-flight measured with the microchannel plate timing detector located in front of the ionization chamber and by energy loss in the ionization chamber. The two microchannel plate timing detectors located before the target were used to monitor the beam intensity and to provide the timing reference for the time-of-flight measurement. The microchannel plate timing detector in front of the ionization chamber was position sensitive and was used to monitor the beam position. It was located 200 mm from the target and had a 25 mm diameter Mylar foil. The ionization chamber was filled with CF4 gas so that it could function at rates up to 50,000 pps. Higher beam intensities occurred in some of the fission measurements, requiring the ionization chamber to be turned off. The data acquisition was triggered by either the beam signal rate down scaled by a factor of 1000, the coincidence of the delayed beam signal and ER signal, or the silicon detector signal. A 350 MeV Au beam that resembled ERs was measured by the ionization chamber to calibrate the energy loss spectrum.
The ER cross section was obtained by taking the ratio of the ER yield to the target thickness and the integrated beam particles in the ionization chamber. A detailed de- scription of the ER measurement technique used in this experiment can be found in Ref. [28]. The annular double-sided silicon strip detector (Mi- cron Semiconductor Design S2) was located 42 mm from the target. It had 48 concentric strips on one side and 16 pie-shaped sectors on the other side. The inner diame- ter was 35 mm and the outer diameter was 70 mm. The thickness of the detector was 300 µm. The detection an- gles spanned 15.6◦ to 39.6◦. The fission fragments were identified by requiring a coincidence of events in the Si detector and by the folding angle distributions of the de- tected particles. III. DATA REDUCTION PROCEDURES A. Evaporation residues Since this was an inverse kinematics reaction, the ERs recoiled in the forward direction in a narrow cone. The apparatus was designed to have high efficiency for detect- ing ERs. The efficiency of the apparatus was estimated by Monte Carlo simulations. The angular distribution of the ERs was generated by statistical model calculations using the code PACE2[29]. The input parameters for the statistical model calculations will be discussed later in this paper. The calculated efficiency for the lowest bom- barding energy is 93±1%. It increases as the reaction energy increases and reaches 98±1% at the highest en- ergy. A relatively thick target was used in this experiment. The beam lost approximately 40 MeV after passing through the target (13 MeV in the center of mass). For this reason, the measured cross section is an average of the contributions from the beam interacting throughout the thickness of the target. The variation of ER cross sections is not very large at energies above the Coulomb barrier because the shape of the excitation function is al- most flat. 
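The bookkeeping behind these numbers can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the nonrelativistic lab-to-center-of-mass conversion uses mass numbers in place of atomic masses, and the molar mass of 64Ni is approximated as 64 g/mol. It reproduces the quoted conversion of a 40 MeV laboratory energy loss to about 13 MeV in the center of mass, and forms a cross section as the ER yield divided by the beam particles, the areal density of target atoms, and the detection efficiency.

```python
AVOGADRO = 6.02214076e23  # atoms per mole


def lab_to_cm(e_lab_mev, a_proj, a_targ):
    """Nonrelativistic lab -> center-of-mass energy (mass numbers as masses)."""
    return e_lab_mev * a_targ / (a_proj + a_targ)


def er_cross_section_mb(n_er, n_beam, thickness_mg_cm2, a_targ, efficiency=1.0):
    """ER cross section in mb from yield, beam particles and target thickness.

    Assumes the molar mass of the target equals its mass number (g/mol).
    """
    atoms_per_cm2 = thickness_mg_cm2 * 1.0e-3 * AVOGADRO / a_targ
    sigma_cm2 = n_er / (n_beam * atoms_per_cm2 * efficiency)
    return sigma_cm2 / 1.0e-27  # 1 mb = 1e-27 cm^2


if __name__ == "__main__":
    # 40 MeV lost by 132Sn in the 64Ni target, expressed in the center of mass
    print(round(lab_to_cm(40.0, 132, 64), 1))
    # Hypothetical counts, for illustration only
    print(er_cross_section_mb(100, 1.0e9, 1.0, 64, efficiency=0.93))
```

The counts in the usage lines are made up for illustration; only the 1.0 mg/cm2 target thickness, the 40 MeV energy loss, and the 93% efficiency come from the text.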
Therefore, the measured cross section is close to that which would be measured at an energy corresponding to the middle of the target. However, at energies below the barrier the ER cross section falls off exponentially. The cross section near the entrance of the target has more weight than that near the exit. Smooth curves fitting the excitation function in this rapidly varying region were used to determine the reaction energy associated with the measured cross section.

An iterative method was used to determine the effective reaction energy for the thick target measurement. First, the measured cross sections and the beam energies calculated at the middle of the target were fitted by a tensioned spline[30] where the smoothness of the curve could be adjusted. The resulting curve was then used to calculate the thick target cross section for each measurement, according to

$$\bar{\sigma} = \frac{1}{\rho} \int_{E_{\rm exit}}^{E_{\rm in}} \frac{\sigma(E)}{dE/dx}\, dE ,$$

where σ(E) is the curve generated by the spline fit, dE/dx is the stopping power of 132Sn in 64Ni, and ρ is the target thickness. The integration limits were the energies of the beam at the exit of the target and at the entrance of the target. The energy, Ei, corresponding to the cross section, σi, was obtained by interpolation using the fitted curve. This set of energies was used as the input for the next iteration of the fit. The result converged very quickly. After five iterations, the energies differed from the previous iteration energies by less than 0.2 MeV. The validity of this method was checked by generating data from a known function such as the Wong formula [31] and folding in the effects of target thickness.

Compared with the cross-section-weighted-average method described in Ref. [28], the method described here gives effective energies that do not differ noticeably at high energies, because the excitation function is fairly flat.
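The validity check mentioned above (folding a known excitation function with the target thickness) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the Wong-formula parameters (barrier 156 MeV, barrier radius 12 fm, curvature 4 MeV) are placeholders rather than fitted values, the stopping power defaults to a constant across the target, and the energy at which the known curve equals the thick-target average is found by bisection instead of interpolating a tensioned spline. All function names are hypothetical.

```python
import math


def wong_mb(e_cm, vb=156.0, rb=12.0, hw=4.0):
    """Wong formula in mb; vb, hw in MeV and rb in fm are illustrative values."""
    x = 2.0 * math.pi * (e_cm - vb) / hw
    # Numerically stable evaluation of ln(1 + exp(x))
    log_term = x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))
    return 10.0 * hw * rb ** 2 / (2.0 * e_cm) * log_term  # 1 fm^2 = 10 mb


def thick_target_average(sigma, e_exit, e_in, stopping=lambda e: 1.0, n=4000):
    """(1/rho) * integral of sigma(E) / (dE/dx) dE, with rho = integral of dE / (dE/dx)."""
    h = (e_in - e_exit) / n
    num = den = 0.0
    for i in range(n):
        e = e_exit + (i + 0.5) * h  # midpoint rule
        w = h / stopping(e)
        num += sigma(e) * w
        den += w
    return num / den


def effective_energy(sigma, e_exit, e_in, stopping=lambda e: 1.0, tol=1e-6):
    """Energy at which sigma equals the thick-target average (sigma must be increasing)."""
    target = thick_target_average(sigma, e_exit, e_in, stopping)
    lo, hi = e_exit, e_in
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigma(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weighted_mean_energy(sigma, e_exit, e_in, n=4000):
    """Cross-section-weighted average energy over the target, for comparison."""
    h = (e_in - e_exit) / n
    num = den = 0.0
    for i in range(n):
        e = e_exit + (i + 0.5) * h
        num += e * sigma(e)
        den += sigma(e)
    return num / den


if __name__ == "__main__":
    # A 13 MeV (center of mass) thick target, below and above the assumed barrier
    for e_exit, e_in in [(137.0, 150.0), (177.0, 190.0)]:
        print(e_exit, e_in,
              effective_energy(wong_mb, e_exit, e_in),
              weighted_mean_energy(wong_mb, e_exit, e_in))
```

With these placeholder parameters the effective energy sits close to mid-target above the barrier, while below the barrier it moves toward the entrance energy but stays below the cross-section-weighted mean energy, which is the qualitative behavior the text describes.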
However, at energies below the barrier, the energy determined by the cross-section-weighted-average method is larger than that determined by the method described above and disagrees with the measurement in Ref. [32], as can be seen in Fig. 3. Furthermore, it is found that using data generated from a known function the effective energy obtained by the cross- section-weighted-average method is shifted to too high an energy in the exponential falloff region. The uncertainty of the energy determination was esti- mated by comparison with the method using the cross section weighted average. The average uncertainty of the effective reaction energy is 2.3 MeV in the region where the excitation function is almost flat and increases to 3.9 MeV in the exponential fall off region. The uncer- tainty is larger, 5.8 MeV, for the lowest energy data point because an extrapolation is required for calculating the thick target cross section and the extrapolation region is influenced by the location of the next higher energy point. To verify our measurement technique, the ER cross sections for 124Sn+64Ni in inverse kinematics were mea- sured and compared to those published by Freeman et al. measured with a thin target[32]. It is noted that some of our measurements were performed at energies differ- ent from those of Ref. [32]. The comparison is shown in Fig. 3. Our data (open triangles) are in good agreement with those measured by Freeman et al. [32] (filled stars). FIG. 3: (Color online) Comparison of ER cross sections for 64Ni+124Sn measured in this work and by Freeman et al.[32] (filled stars). The filled circles are for energies determined by the method described in Ref. [28] and the open triangles are for energies determined by the method described in this paper. The solid circles are for energy determined by the cross- section-weighted-average method described in Ref. [28]. B. 
Fission

Fission fragments were identified by requiring a coincidence of two particles detected by the pie-shaped sectors of the Si strip detector on either side of the beam. Figure 4(a) and (b) present two-dimensional histograms of particle energy and strip number of the Si detector for coincident events taken from 560 MeV and 620 MeV 132Sn+64Ni, respectively. They were compared to the kinematics calculation displayed in Fig. 4(c) and (d) where the fission fragments, elastically scattered Sn and Ni are shown by the solid, dash-dotted and dotted curves, respectively. The angular range of the Si strip detector is between the two vertical dashed lines. The elastically scattered Ni and Sn appear in the upper right hand corner and center of the histogram, respectively. The fission events are located in the gated area.

The folding angle distributions of the fragments were used to distinguish fission from other reactions, such as deep inelastic reactions. Since there are two solutions for the kinematics of the inverse reaction, as shown in Fig. 4(c) and (d), the fragment angular correlation is not as simple as that in normal kinematics. Monte Carlo simulations were performed to provide guidance. It was assumed that only fusion-fission results from a full momentum transfer. The width of the mass distribution was taken from the 58Ni+124Sn measurement[33]. The width of the mass distribution was varied to estimate the uncertainty of the simulation. The transition state model[34] was used to predict the fission fragment angular distribution. In Fig. 5 the simulated fission fragment folding angle distributions for 550 MeV 124Te+64Ni are compared with a stable beam test measurement. The folding angle distributions for one of the fragments detected in strip 2 (16.2°), strip 22 (27.7°), and strip 41 (36.8°) are shown. The gaps in the spectra at strips 14, 30, 44, 46, and 47 are due to malfunctioning strips in the detector.

FIG. 4: (Color online) (a) and (b) Two dimensional histograms of energy and strip number for coincident events from 560 and 620 MeV 132Sn+64Ni, respectively, measured by the annular double-sided silicon strip detector. The gated area shows events from fission and other reactions. (c) and (d) Kinematics of energy as a function of scattering angle for elastic scattering and fission fragments in 560 and 620 MeV 132Sn+64Ni, respectively. The dash-dotted and dotted curves are for the elastically scattered Sn and Ni, respectively, whereas the solid curve is for the fission fragments. The angular range of the Si strip detector is between the two vertical dashed lines.

The Monte Carlo simulated folding angle distributions for fission are shown in the middle panels of Fig. 5 and compared to those of measurements shown in the left panels. For one of the fragments detected at forward angles, strip 2 for example, the predicted angular distribution of the other fragment is similar to that of the measurement. Most of these events are considered as resulting from fission. For one of the fragments detected near the middle part of the detector, strip 22 for instance, there are differences between measurement and simulation in the shapes of the angular distributions of the other fragment. It is predicted that the other fission fragment is distributed around strip 40. The measured distribution spreads to more forward angles. For one of the fragments detected at the backward angles, the yield of the other fragment is predicted to be small and they are equally distributed between the middle part of the detector and the outer edge of the detector. But the measured events appear in the middle part of the detector. There are no events in the region where fission events are expected. These differences are attributed to the contribution from other reaction mechanisms, most likely deep inelastic collisions.

FIG. 5: (Color online) Left panels: Folding angle distributions for 550 MeV 124Te+64Ni for one of the fragments detected at 16.2° (strip 2), 27.7° (strip 22), and 36.8° (strip 41) by the annular double-sided silicon strip detector. The elastic scattering events are excluded. The dotted and dashed histograms are the results of fitting the data with simulated fission and deep inelastic collisions with Q=–20 MeV, respectively (see text). Middle panels: Results of Monte Carlo simulations for fission events. Right panels: Results of Monte Carlo simulations for deep inelastic scattering events. The solid curves are for reaction Q value of –10 MeV, the dashed curves are for Q=–20 MeV, and the dotted curves are for Q=–40 MeV.

An attempt was made to simulate these deep inelastic collision events. It was assumed that the masses of these products were projectile- and target-like and the angular distribution at forward angles followed a 1/sin(θ) dependence. The right panels of Fig. 5 show the results of simulations performed for reaction Q values of –10 (solid), –20 (dashed), and –40 MeV (dotted). It can be seen that the overlap of fission and deep inelastic collisions becomes larger at more backward angles. At strip 41 (36.8°), deep inelastic collisions account for all the events.

The relative contributions of fission and deep inelastic collisions were obtained by fitting the simulated folding angle distributions to the measured distributions for all the detector strips using the CERN library program MINUIT[35]. In the fits, the normalization coefficients for the simulated distributions were the only two variable parameters.
The results of the fits are shown in the left panels of Fig. 5 by the dotted and dashed histograms for fission and deep inelastic collisions with Q=–20 MeV, re- spectively. The number of fission events in the measured distributions were taken as the summed events in each strip multiplied by the relative contribution of fission. The folding angle distributions for 132Sn+64Ni are shown in Fig. 6. Due to the low statistics, it was not prac- tical to extract the fission events by fitting the folding angle distributions. As an alternative, the fission events were extracted by setting gates on the folding angle dis- tributions using the simulated distributions as references. This gating method was also tested with the 124Te+64Ni measurement. The fission cross sections obtained by the fitting method and the gating method agreed within 10%. FIG. 6: (Color online) Left panels: Folding angle distributions for 560 MeV 132Sn+64Ni for one of the fragments detected at 16.2◦ (strip 2), 27.7◦ (strip 22), and 36.8◦ (strip 41) by the annular double-sided silicon strip detector. The elastic scat- tering events are excluded. Middle panels: Results of Monte Carlo simulations for fission events. Right panels: Results of Monte Carlo simulations for deep inelastic scattering events. The solid curves are for reaction Q value of –10 MeV, and the dotted curves are for Q=–40 MeV. The Monte Carlo simulation was also employed to cal- culate the coincidence efficiency of the detector. The effi- ciency increased from 5.7±0.9% at 530 MeV to 7.6±0.8% at 620 MeV bombarding energy. In the present work, the dynamic range of the ampli- fiers was not sufficiently large resulting in the distortion of the high energy signals. In the future, new amplifiers that are more suitable for measuring the energy of fission fragments will be used so that the mass ratio of reac- tion products can be obtained to help distinguish fission events from other reaction channels. 
The formation of a compound nucleus depends on whether the interacting nuclei are captured inside the fusion barrier and whether the dinuclear system can subsequently evolve into a compact shape. Quasifission occurs when the dinuclear system fails to cross the saddle point to reach shape equilibrium. Since the beam intensity was several orders of magnitude lower than that of stable beams and the reaction was in inverse kinematics, making separation of fusion-fission and quasifission very difficult, there was no attempt to distinguish quasifission from fusion-fission in this work. Furthermore, the experimental results are compared to barrier penetration models which describe the capture process, making it unnecessary to separate these two processes.

TABLE I: Input parameters for statistical model calculations.
level density parameter (a)              A/8 MeV−1
af/an                                    1.04
diffuseness of spin distribution (∆l)    4 h̄
fission barrier                          Sierk[45]

IV. COMPARISON WITH MODEL CALCULATIONS

A. Statistical model

The compound nucleus formed in 132Sn+64Ni decays by particle evaporation and fission. Statistical models have successfully described compound nucleus decay for a wide range of fusion reactions. The measured ER and fission cross sections are compared with the predictions of the statistical model code PACE2[29]. The input parameters were obtained by simultaneously fitting the data from stable Sn on 64Ni[32, 36] and the measured fusion cross sections[36] were used for the calculations. Figure 7(a), (b), and (c) display the comparison of calculations and data for 112,118,124Sn+64Ni, respectively. The calculations reproduce the measurements well except for the ER cross sections of 112Sn+64Ni. Table I lists the input parameters for the calculations. Without adjusting the parameters, calculations for 132Sn+64Ni were performed. The results are shown in Fig. 7(d). Very good agreement between the calculation and the data can be seen.
It is noted that some of the parameters used in our calculations are different from those used by Lesko et al. [36]. In their calculations, the code CASCADE[37] was used. The mass of the nuclei in the decay chain was calculated using the Myers droplet model[38]. The dif- fuseness of the spin distribution was ∆l = 15h̄ and the ratio of level density at the saddle point to the ground state, af/an, was set to 1.0. In this work, a compilation of measured masses[39], ∆l = 4h̄, and af/an = 1.04 were used. FIG. 7: (Color online) Comparison of measured ER (filled circles) and fission (open circles) cross sections with statistical model calculations. The solid and the dotted curves are the calculated fission and ER cross sections, respectively, using the measured fusion cross sections as input. (a) 64Ni+112Sn, (b) 64Ni+118Sn, (c) 64Ni+124Sn (Freeman et al.[32]), and (d) 132Sn+64Ni (this work) B. Coupled-channel calculation In general, sub-barrier fusion enhancement can be de- scribed by coupled-channel calculations. The fusion cross section of 132Sn+64Ni, the sum of ER and fission cross sections, is compared with coupled-channel calculations using the code CCFULL[40]. The interaction potential (V◦=82.46 MeV, r◦=1.18 fm, and a=0.691 fm) was taken from the systematics of Broglia and Winther[41]. The result of the calculations are compared with the data in Fig. 8. The dotted curve is the prediction of a one- dimensional barrier penetration model and it can be seen that it substantially underpredicts the sub-barrier cross sections. The coupled-channel calculation including in- elastic excitation of 64Ni to the first 2+ and 3− states and 132Sn to the first 2+ state is shown by the dashed curve. The transition matrix elements, B(Eλ), of 64Ni were obtained from Ref. [42, 43] and the B(E2) of 132Sn was obtained from a recent measurement by Varner et al.[44]. This calculation overpredicts the data at energies near the barrier and underpredicts the data well below the barrier. 
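To give a feel for the quoted potential parameters, the following Python sketch locates the barrier of a Woods-Saxon nuclear potential plus a point-charge Coulomb potential for 132Sn+64Ni. It is a rough illustration, not CCFULL: the radius convention R = r0(Ap^(1/3) + At^(1/3)) and the point-charge Coulomb term are assumptions made here, no channel couplings are included, and the function names are hypothetical.

```python
import math

E2 = 1.43996  # e^2 / (4 pi eps0) in MeV fm


def total_potential(r, zp, zt, ap, at, v0=82.46, r0=1.18, a=0.691):
    """Woods-Saxon nuclear plus point-charge Coulomb potential in MeV (r in fm)."""
    radius = r0 * (ap ** (1.0 / 3.0) + at ** (1.0 / 3.0))  # assumed convention
    nuclear = -v0 / (1.0 + math.exp((r - radius) / a))
    coulomb = zp * zt * E2 / r
    return nuclear + coulomb


def fusion_barrier(zp, zt, ap, at, r_max=20.0, r_min=8.0, step=0.001):
    """Walk inward from r_max and return (r, V) at the first local maximum."""
    r = r_max
    v = total_potential(r, zp, zt, ap, at)
    while r - step > r_min:
        v_next = total_potential(r - step, zp, zt, ap, at)
        if v_next < v:  # potential starts to drop: the barrier top was just passed
            return r, v
        r -= step
        v = v_next
    raise ValueError("no barrier found in the scanned range")


if __name__ == "__main__":
    r_b, v_b = fusion_barrier(50, 28, 132, 64)
    print(f"barrier radius ~ {r_b:.2f} fm, barrier height ~ {v_b:.1f} MeV")
```

The inward walk is used instead of a global maximum search because the point-charge Coulomb term keeps rising at small radii, so the largest value on a wide grid would not be the fusion barrier.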
The neutron transfer reactions have positive Q values for transferring two to six neutrons from 132Sn to 64Ni. Since there is no neutron transfer data available for this reaction, the transfer coupling form factor is unknown. Thus, the coupled-channel calculation including transfer and inelastic excitation was performed with one effective transfer channel using the Q value for two-neutron transfer. The coupling constant was adjusted to fit the data. The calculation with the coupling constant set to 0.48 is shown by the solid curve. It reproduces the data very well except for the lowest energy data point, which has large uncertainties in energy and in cross section. A better treatment of the transfer channels based on experimental transfer data would help improve understanding of the influence of transfer on fusion. Experimental neutron transfer data on 132Sn+64Ni in the future would be very useful.

FIG. 8: (Color online) Comparison of 132Sn+64Ni fusion data (filled circles) with a one-dimensional barrier penetration model calculation (dotted curve). The coupled-channel calculation including inelastic excitation of the projectile and target is shown by the dashed curve and the calculation including inelastic excitation and neutron transfer is shown by the solid curve.

V. DISCUSSION

The ER cross section can be described by $\sigma_{ER} = \pi\lambda^2 \sum_{l=0}^{l_c} (2l+1)\,\sigma_l$, where λ is the de Broglie wavelength, $l_c$ the maximum angular momentum for ER formation, and $\sigma_l$ the partial cross section. The reduced ER cross sections for 64Ni on stable even Sn isotopes[32] are compared with that for 132Sn+64Ni in Fig. 9. The reduced ER cross section is defined as the ER cross section divided by the kinematic factor $\pi\lambda^2$. It can be seen that the ER cross sections saturate at high energies as fission becomes a significant fraction of the fusion cross section. In addition, the saturation value increases as the neutron excess in Sn increases.
This is consistent with the fact that the fission barrier height increases for the more neutron-rich compound nuclei.

FIG. 9: (Color online) The reduced ER cross section as a function of the center of mass energy for 64Ni on stable even Sn isotopes[32] and radioactive 132Sn.

In Fig. 10, the measured reduced ER cross sections for Ni+Sn as a function of the calculated average mass of the ERs, predicted by PACE2, are presented. In the same reaction, the higher mass ERs are produced at lower beam energies because of the lower excitation energies of the compound nucleus. As the neutron excess in the compound nucleus increases, neutron evaporation becomes the dominant decay channel. The PACE2 calculation predicts that a compound nucleus made with Sn isotopes of mass number greater than 120 decays essentially 100% by neutron evaporation and Pt isotopes are the primary ERs. The mass of the compound nucleus is different when it is produced with different Sn isotopes. However, it can be seen that Pt of a particular mass can be produced with different Sn isotopes if different numbers of neutrons are evaporated. The reaction with a more neutron-rich Sn produces the same Pt isotope at a higher rate. With 132Sn as the projectile, the ERs are so neutron-rich that they cannot be produced by stable Sn induced reactions. This suggests that it may be beneficial to use neutron-rich radioactive ion beams to produce new isotopes of heavy elements.

The fusion excitation functions of 64Ni on stable even Sn isotopes[36] are compared with that of 132Sn+64Ni in Fig. 11. In order to remove the effects of the difference in nuclear sizes, the cross section is divided by $\pi R^2$ with $R = 1.2\,(A_p^{1/3} + A_t^{1/3})$ fm, where $A_p$ ($A_t$) is the mass number of the projectile (target). The reaction energy in the center of mass is divided by the barrier height predicted by the Bass model[46].
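The size and barrier scaling described above can be sketched in a few lines. The snippet assumes the usual form R = 1.2 (A_p^{1/3} + A_t^{1/3}) fm for the size parameter; the function names are hypothetical, and the barrier height must be supplied by the caller because the Bass-model value is not reproduced here.

```python
import math


def size_radius_fm(a_p, a_t, r0=1.2):
    """Size parameter R = r0 * (A_p^(1/3) + A_t^(1/3)) in fm."""
    return r0 * (a_p ** (1.0 / 3.0) + a_t ** (1.0 / 3.0))


def reduced_point(sigma_mb, e_cm, a_p, a_t, v_barrier):
    """Return (E_cm / V_barrier, sigma / (pi R^2)) for one data point.

    sigma_mb is in mb, e_cm and v_barrier in MeV. v_barrier is an input
    (for example a Bass-model barrier); no value is assumed here.
    """
    r = size_radius_fm(a_p, a_t)
    area_mb = 10.0 * math.pi * r ** 2  # 1 fm^2 = 10 mb
    return e_cm / v_barrier, sigma_mb / area_mb


# Size parameter for 132Sn + 64Ni (mass numbers from the text).
print(f"R = {size_radius_fm(132, 64):.2f} fm")
```

Dividing by πR² and by the barrier height puts systems of different size on a common scale, so any enhancement that remains after scaling cannot be attributed to the larger radius alone.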
It can be seen that the fusion of 132Sn and 64Ni is not enhanced with respect to the stable even Sn isotopes when the difference in nuclear sizes is considered.

FIG. 10: (Color online) The reduced ER cross section as a function of the calculated average mass of ERs predicted by PACE2[29] for 64Ni on stable even Sn isotopes[36] and radioactive 132Sn.

FIG. 11: (Color online) Comparison of fusion excitation functions for 64Ni on stable even Sn isotopes[36] and radioactive 132Sn. The change in nuclear sizes is corrected by factoring out the area and the Bass barrier height[46] in the cross section and energy, respectively.

The lowest energy data point has large uncertainties. The cross section seems enhanced compared to the stable beam measurements in Fig. 9 and Fig. 11. A more pronounced enhancement appears when the data point is compared to our coupled-channel calculations (Fig. 8) and to a time-dependent Hartree-Fock calculation[47]. To further explore whether fusion is enhanced in this low energy region, we plan to repeat the measurement with an improved apparatus where the thickness of the Mylar foil in the microchannel plate timing detector located in front of the ionization chamber will be reduced. This will allow a better separation of the energy loss signals from ERs and scattered beams in the ionization chamber at low bombarding energies.

The Q values for transferring two to six neutrons from 132Sn to 64Ni are positive. It is necessary to include neutron transfer in coupled-channel calculations to reproduce experimental results. As the neutron excess in the Ni isotopes decreases, the number of neutron transfer channels with positive Q values increases for 132Sn+Ni. In 132Sn+58Ni, the Q values for transferring one to sixteen neutrons from 132Sn to 58Ni are positive and range from 1.7 to 17.4 MeV. A large sub-barrier fusion enhancement due to the coupling to neutron transfer is expected to occur in 132Sn+58Ni.
An experiment to measure the fusion excitation function of 132Sn on 58Ni is in preparation.

Although 132Sn is unstable, its neutron separation energy is 7.3 MeV. This is not very low compared to stable nuclei. The sub-barrier fusion enhancement observed in 132Sn+64Ni with respect to stable Sn nuclei can be accounted for by the change in nuclear sizes. No extra enhancement was found. However, an increased ER yield at energies above the barrier was observed as compared to stable Sn. As the shell closure is crossed, the binding energy for 133Sn decreases by a factor of two. The nuclear surface of 133Sn and even more neutron-rich Sn may be more diffuse. The number of neutron transfer channels with positive Q values increases by a factor of two or more. Larger sub-barrier fusion enhancement beyond the nuclear size effect may be expected.

VI. SUMMARY

Neutron-rich radioactive 132Sn beams were incident on a 64Ni target to measure fusion cross sections near the Coulomb barrier. With beams of an average intensity of $5\times10^{4}$ pps and a high efficiency apparatus for ER detection, the uncertainty of the measured ER cross section is small and comparable to that achieved in stable beam experiments. The efficiency for fission fragment detection was low but the detector had a very fine granularity. By requiring a coincident detection of the fission fragments and performing folding angle distribution analysis, fission events were identified. The excitation functions of ER and fission can be described by statistical model calculations using parameters that simultaneously fit the stable even Sn isotopes on 64Ni fusion data. A large sub-barrier fusion enhancement with respect to a one-dimensional barrier penetration model prediction was observed. The enhancement is attributed to the coupling of the projectile and target inelastic excitation and neutron transfer.
The reduced ER cross sections at energies above the barrier are larger for the 132Sn induced reaction than those induced by stable Sn nuclei, as expected from the higher fission barrier of the more neutron-rich compound nucleus. For a specific mass of ER, reactions with a more neutron-rich Sn have higher cross sections. When the fusion excitation functions are compared on a reduced scale, where the effects of nuclear size and barrier height are factored out, no extra fusion enhancement is observed in 132Sn+64Ni with respect to stable Sn induced fusion. The fusion cross section measured at the lowest energy seems to be enhanced. Experiments to investigate this with an improved apparatus are planned.

VII. ACKNOWLEDGMENT

We would like to thank D. J. Hinde for helpful and stimulating discussions. We wish to thank the HRIBF staff for providing excellent radioactive beams and technical support. Research at the Oak Ridge National Laboratory is supported by the U.S. Department of Energy under contract DE-AC05-00OR22725 with UT-Battelle, LLC. W.L. and D.P. are supported by the U.S. Department of Energy under grant no. DE-FG06-97ER41026.

[1] W. Reisdorf, J. Phys. G 20, 1297 (1994). [2] B. B. Back, Phys. Rev. C 31, 2104 (1985). [3] J. Tōke et al., Nucl. Phys. A440, 327 (1985). [4] D. J. Hinde and M. Dasgupta, Phys. Lett. B 622, 23 (2005). [5] M. Beckerman, Rep. Prog. Phys. 51, 1047 (1988). [6] M. Dasgupta, D. J. Hinde, N. Rowley, and A. M. Stefanini, Annu. Rev. Nucl. Part. Sci. 48, 401 (1998). [7] A. B. Balantekin and N. Takigawa, Rev. Mod. Phys. 70, 77 (1998). [8] J. R. Leigh et al., Phys. Rev. C 52, 3151 (1995). [9] J. D. Bierman, P. Chan, J. F. Liang, M. P. Kelly, A. A. Sonzogni, and R. Vandenbosch, Phys. Rev. Lett. 76, 1587 (1996); Phys. Rev. C 54, 3068 (1996). [10] C. R. Morton et al., Phys. Rev. Lett. 72, 4074 (1994). [11] A. M. Stefanini et al., Phys. Rev. Lett. 74, 864 (1995). [12] A. A. Sonzogni, J. D. Bierman, M. P. Kelly, J. P. Lestone, J. F.
Liang, and R. Vandenbosch, Phys. Rev. C 57, 722 (1998). [13] A. M. Stefanini, D. Ackermann, L. Corradi, J. H. He, G. Montagnoli, S. Beghini, F. Scarlassara, and G. F. Segato, Phys. Rev. C 52, R1727 (1995). [14] H. Timmers et al., Nucl. Phys. A633, 421 (1998). [15] N. Takigawa, H. Sagawa, and T. Shinozuka, Nucl. Phys. A538, 221c (1992). [16] M. S. Hussein, Nucl. Phys. A531, 192 (1991). [17] C. H. Dasso and R. Donangelo, Phys. Lett. B 276, 1 (1992). [18] V. Yu. Denisov, Eur. Phys. J. A 7, 87 (2000). [19] V. I. Zagrebaev, Phys. Rev. C 67, 061601(R) (2003). [20] S. Hofmann, Prog. Part. Nucl. Phys. 46, 293 (2001). [21] K. E. Zyromski et al., Phys. Rev. C 55, R562 (1997). [22] J. J. Kolata et al., Phys. Rev. Lett. 81, 4580 (1998). [23] Y. X. Watanabe et al., Eur. Phys. J. A 10, 373 (2001). [24] J. F. Liang et al., Phys. Rev. Lett. 91, 152701 (2003); Phys. Rev. Lett. 96, 029903(E) (2006). [25] J. F. Liang and C. Signorini, Int. J. Mod. Phys. E 14, 1121 (2005). [26] S. Hofmann and G. Münzenberg, Rev. Mod. Phys. 72, 733 (2000). [27] D. W. Stracener, Nucl. Instrum. and Methods B 204, 42 (2003). [28] D. Shapira et al., Nucl. Instrum. and Methods A 551, 330 (2005). [29] A. Gavron, Phys. Rev. C 21, 230 (1980). [30] http://www.netlib.org/fitpack/. [31] C. Y. Wong, Phys. Rev. Lett. 31, 766 (1973). [32] W. S. Freeman et al., Phys. Rev. Lett. 50, 1563 (1983). [33] F. L. H. Wolfs, Phys. Rev. C 36, 1379 (1987). [34] R. Vandenbosch and J. R. Huizenga, Nuclear Fission, Academic Press, New York, (1973). [35] F. James, MINUIT reference manual (Version 94.1), Program Library D506, CERN, (1998). [36] K. T. Lesko et al., Phys. Rev. C 34, 2155 (1986). [37] F. Pühlhofer, Nucl. Phys. A280, 267 (1977). [38] W. D. Myers, Droplet Model of the Atomic Nucleus (IFI/Plenum, New York, 1977). [39] A. H. Wapstra, G. Audi, and C. Thibault, Nucl. Phys. A729, 129 (2003). [40] K. Hagino, N. Rowley, and A. T. Kruppa, Comput. Phys. Commun. 123, 143 (1999). [41] R. A. Broglia and A.
Winther, Heavy Ion Reactions, Addison-Wesley, (1991). [42] S. Raman et al., At. Data Nucl. Tables 36, 1 (1987). [43] R. H. Spear, At. Data Nucl. Tables 42, 55 (1989). [44] R. L. Varner et al., Eur. Phys. J. A 25, s01, 391 (2005). [45] A. J. Sierk, Phys. Rev. C 33, 2039 (1986). [46] R. Bass, Nucl. Phys. A231, 45 (1974). [47] A. S. Umar and V. E. Oberacker, Phys. Rev. C 74, 061601(R) (2006). ABSTRACT Evaporation residue and fission cross sections of radioactive $^{132}$Sn on $^{64}$Ni were measured near the Coulomb barrier. A large sub-barrier fusion enhancement was observed. Coupled-channel calculations including inelastic excitation of the projectile and target, and neutron transfer are in good agreement with the measured fusion excitation function. When the change in nuclear size and shift in barrier height are accounted for, there is no extra fusion enhancement in $^{132}$Sn+$^{64}$Ni with respect to stable Sn+$^{64}$Ni. A systematic comparison of evaporation residue cross sections for the fusion of even $^{112-124}$Sn and $^{132}$Sn with $^{64}$Ni is presented. <|endoftext|><|startoftext|> Tri-layer superlattices: A route to magnetoelectric multiferroics? Alison J. Hatt and Nicola A. Spaldin Materials Department, University of California (Dated: October 22, 2018) We explore computationally the formation of tri-layer superlattices as an alternative approach for combining ferroelectricity with magnetism to form magnetoelectric multiferroics. We find that the contribution to the superlattice polarization from tri-layering is small compared to typical polarizations in conventional ferroelectrics, and the switchable ferroelectric component is negligible. In contrast, we show that epitaxial strain and “negative pressure” can yield large, switchable polarizations that are compatible with the coexistence of magnetism, even in materials with no active ferroelectric ions.
The simultaneous presence of ferromagnetism and ferroelectricity in magnetoelectric multiferroics suggests tremendous potential for innovative device applications and exploration of the fundamental physics of coupled phenomena. However, the two properties are chemically contra-indicated, since the transition metal d electrons which are favorable for ferromagnetism disfavor the off-centering of cations required for ferroelectricity [1]. Continued progress in this burgeoning field rests on the identification of alternative mechanisms for ferroelectricity which are compatible with the existence of magnetism [2, 3]. Mechanisms discovered to date include the incorporation of stereochemically active lone pair cations, for example in BiMnO3 [4, 5] and BiFeO3 [6, 7], geometric ferroelectricity in YMnO3 [8], BaNiF4 [9, 10] and related compounds, charge ordering as in LuFe2O4 [11, 12], and polar magnetic spin-spiral states, of which TbMnO3 is the prototype [13]. However, there are currently no single phase multiferroics with large and robust magnetization and polarization at or near room temperature [14].

The study of ferroelectrics has been invigorated over the last few years by tremendous improvements in the ability to grow high quality ferroelectric thin films with precisely controlled composition, atomic arrangements and interfaces. In particular, the use of compositional ordering that breaks inversion symmetry, such as the layer-by-layer growth of three materials in an A-B-C-A-B-C... arrangement, has produced systems with enhanced polarizations and large non-linear optical responses [15, 16, 17, 18]. Here we explore computationally this tri-layering approach as an alternative route to magnetoelectric multiferroics. Our hypothesis is that the magnetic ions in such a tri-layer superlattice will be constrained in a polar, ferroelectric state by the symmetry of the system, in spite of their natural tendency to remain centrosymmetric.
We note, however, that in previous tri-layering studies, at least one of the constituents has been a strong ferroelectric in its own right, and the other constituents have often contained so-called second-order Jahn-Teller ions such as Ti4+, which have a tendency to off-center. Therefore factors such as electrostatic effects from internal electric fields originating in the strong ferroelectric layers [19], or epitaxial strain, which is well established to enhance or even induce ferroelectric properties in thin films with second-order Jahn-Teller ions [6, 20, 21], could have been responsible for the enhanced polarization in those studies.

We choose a [001] tri-layer superlattice of perovskite-structure LaAlO3, LaFeO3 and LaCrO3 as our model system (see Fig. 1, inset). Our choice is motivated by three factors. First, all of the ions are filled shell or filled sub-shell, and therefore insulating behavior, a prerequisite for ferroelectricity, is likely. Second, the Fe3+ and Cr3+ will introduce magnetism. And third, none of the parent compounds are ferroelectric or even contain ions that have a tendency towards ferroelectric distortions, allowing us to test the influence of trilayering alone as the driving force for ferroelectricity. For all calculations we use the LDA+U method [22] of density functional theory as implemented in the Vienna Ab-initio Simulation Package (VASP) [23]. We use the projector augmented wave (PAW) method [24, 25] with the default VASP potentials (La, Al, Fe_pv, Cr_pv, O), a 6x6x2 Monkhorst-Pack mesh and a plane-wave energy cutoff of 450 eV. Polarizations are obtained using the standard Berry phase technique [26, 27] as implemented in VASP. We find that U/J values of 6/0.6 eV and 5/0.5 eV on the Fe and Cr ions respectively, are required to obtain insulating band structures; smaller values of U lead to metallic ground states.
These values have been shown to give reasonable agreement with experimental band gaps and magnetic moments in related systems [28] but are somewhat lower than values obtained for trivalent Fe and Cr using a constrained LDA approach [29]. We therefore regard them as a likely lower limit of physically meaningful U/J values. (Correspondingly, since increasing U often decreases the covalency of a system, our calculated polarizations likely provide upper bounds to the experimentally attainable polarizations).

We begin by constraining the in-plane a lattice constant to the LDA lattice constant of cubic SrTiO3 (3.85 Å) to simulate growth on a substrate, and adjust the out-of-plane c lattice constant until the stress is minimized, with the ions constrained in each layer to the ideal, high-symmetry perovskite positions. We refer to this as our reference structure. (The LDA (LDA+U) lattice constants for cubic LaAlO3 (LaFeO3, LaCrO3) are 3.75, 3.85 and 3.84 Å, respectively. Thus, LaAlO3 is under tensile strain and LaFeO3/LaCrO3 are unstrained.) The calculated total density of states, and the local densities of states on the magnetic ions, are shown in Figure 2; a band gap of 0.32 eV is clearly visible. The polarization of this reference structure differs from that of the corresponding non-polar single-component material (for example pure LaAlO3) at the same lattice parameters by 0.21 µC cm−2. Note, however, that this polarization is not switchable by an electric field since it is a consequence of the tri-layered arrangement of the B-site cations.

http://arxiv.org/abs/0704.0781v3

Next, we remove the constraint on the high symmetry ionic positions, and relax the ions to their lowest energy positions along the c axis by minimizing the Hellmann-Feynman forces, while retaining tetragonal symmetry. We obtain a ground state that is significantly (0.14 eV) lower in energy than the reference structure, but which has a similar value of polarization.
Two stable ground states with different and opposite polarizations from the reference structure, the signature of a ferroelectric, are not obtained. Thus it appears that tri-layering alone does not lead to a significant switchable polarization in the absence of some additional driving force for ferroelectricity. In all cases, the magnetic ions are high spin with negligible energy differences between ferro- and ferri-magnetic orderings of the Fe and Cr ions; both arrangements lead to substantial magnetizations of 440 and 110 emu/cm3 respectively. Such magnetic tri-layer systems could prove useful in non-linear-optical applications, where a breaking of the inversion center is required, but a switchable polarization is not.

Since epitaxial strain has been shown to have a strong influence on the polarization of some ferroelectrics (such as increasing the remanent polarization and Curie temperature of BaTiO3 [20] and inducing room temperature ferroelectricity in otherwise paraelectric SrTiO3 [21]) we next explore the effect of epitaxial strain on the polarization of La(Al,Fe,Cr)O3. To simulate the effects of epitaxial strain we constrain the value of the in-plane lattice parameter, adjust the out-of-plane parameter so as to maintain a constant cell volume, and relax the atomic positions. The volume maintained is that of the calculated fully optimized structure, 167 Å3, which has an in-plane lattice constant of 3.82 Å. As shown in Figure 3, we find that La(Al,Fe,Cr)O3 undergoes a phase transition to a polar state at an in-plane lattice constant of 3.76 Å, which corresponds to a (compressive) strain of -0.016 (calculated from (a‖−a0)/a0 where a‖ is the in-plane lattice constant and a0 is the calculated equilibrium lattice constant). A compressive strain of -0.016 is within the range attainable by growing a thin film on a substrate with a suitably reduced lattice constant.
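The strain quoted above follows directly from the two lattice constants given in the text. The short sketch below reproduces that arithmetic and, as an additional assumption, treats the superlattice cell as a tetragonal a × a × c box so that the constant-volume constraint fixes c = V/a²; the function names are illustrative only.

```python
def epitaxial_strain(a_inplane, a_equilibrium):
    """Strain (a_par - a0) / a0; negative values are compressive."""
    return (a_inplane - a_equilibrium) / a_equilibrium


def c_at_constant_volume(a_inplane, volume):
    """Out-of-plane constant for a tetragonal a x a x c cell of fixed volume.

    Assumes the cell volume is a^2 * c (an assumption of this sketch).
    """
    return volume / a_inplane ** 2


A0 = 3.82            # Angstrom, in-plane constant of the fully optimized structure
A_TRANSITION = 3.76  # Angstrom, in-plane constant at the polar transition
VOLUME = 167.0       # Angstrom^3, cell volume held fixed

strain = epitaxial_strain(A_TRANSITION, A0)
print(round(strain, 3))  # -0.016

# Out-of-plane constant implied by the constant-volume constraint.
print(f"c = {c_at_constant_volume(A_TRANSITION, VOLUME):.2f} Angstrom")
```

Keeping the volume fixed means that compressing the in-plane constant necessarily elongates the cell along c, which is the direction in which the polarization develops.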
We find that significant ferroelectric polarizations can be induced in La(Al,Fe,Cr)O3 at even smaller strain values by using negative pressure conditions. We simulate negative pressure by increasing all three lattice constants and imposing the constraint a=b=c/3; such a growth condition might be realized experimentally by growing the film in small cavities on the surface of a large-lattice-constant substrate, such that epitaxy occurs both horizontally and vertically. As in the planar epitaxial strain state, the system becomes strongly polar; this time the phase transition to the polar state occurs at a lattice constant of 3.85 Å, at which the strain is a negligible 0.001 relative to the lattice constant of the fully optimized system.

In Fig. 1 we show the calculated energy versus distortion profile and polarization for negative pressure La(Al,Fe,Cr)O3 with in-plane lattice constant = 3.95 Å, well within the ferroelectric region of the phase diagram shown in Fig. 3. The system has a characteristic ferroelectric double well potential which is almost symmetric in spite of the tri-layering; the two ground states have polarizations of 38.9 and -39.9 µC cm−2 respectively, relative to the reference structure at the same lattice constant. Since the energies of the two minima are almost identical, the effective electric field Eeff=∆E/∆P, introduced in Ref. [15], is close to zero and there is no tendency to self pole. The origin of the symmetry is seen in the calculated Born effective charges (3.6, 3.5 and 3.3 for Al, Fe and Cr respectively) which show that the system is largely ionic, with the ions showing very similar trivalent cationic behavior. A similar profile is observed under planar epitaxial strain, although the planar strained system is around 0.15 eV lower in energy than the negative pressure system for the same in-plane lattice constant.
To decouple the effects of interfacial strain and tri-layering we calculate the polarization as a function of strain and negative pressure for the individual components, LaAlO3, LaFeO3 and LaCrO3. We find that all three single-phase materials become polar at planar epitaxial strains of -0.03 (LaAlO3), -0.02 (LaFeO3), and -0.01 (LaCrO3). Likewise, all three components become polar at negative pressure, under strains of +0.03 (LaAlO3), +0.001 (LaFeO3), and +0.001 (LaCrO3). (The higher strains required in LaAlO3 reflect its smaller equilibrium lattice constant.)

These results confirm our earlier conclusion that the large polarizations obtained in strained and negative pressure La(Al,Fe,Cr)O3 are not a result of the tri-layering. We therefore suggest that many perovskite oxides should be expected to show ferroelectricity provided that two conditions imposed in our calculations are met: First, the ionic radii of the cation sites in the high symmetry structure are larger than the ideal radii, so that structural distortions are desirable in order to achieve an optimal bonding configuration. This can be achieved by straining the system epitaxially or in a “negative pressure” configuration. And second, non-polar structural distortions, such as Glazer tiltings [30], are de-activated relative to polar, off-centering distortions. These have been prohibited in our calculations by the imposition of tetragonal symmetry; we propose that the symmetry constraints provided experimentally by hetero-epitaxy in two or three dimensions should also disfavor non-polar tilting and rotational distortions. A recent intriguing theoretical prediction that disorder can be used to disfavor cooperative tilting modes is awaiting experimental verification [31].

Finally, we compare the polarization of the tri-layered La(Al,Fe,Cr)O3 with that of its individual components.
Calculated separately, the remanent polarizations of LaAlO3, LaFeO3 and LaCrO3, all at negative pressure with a=c=3.95 Å, average to 40.4 µC cm−2. This is only slightly larger than the calculated polarizations of the heterostructure, 38.9 and 39.9 µC cm−2, indicating that tri-layering has a negligible effect on the polarity. This surprising result warrants further investigation into how the layering geometry modifies the overall polarization.

In conclusion, we have shown that asymmetric layering alone is not sufficient to produce a significant switchable polarization in a La(Al,Fe,Cr)O3 superlattice, and we suggest that earlier reports of large polarizations in other tri-layer structures may have resulted from the intrinsic polarization of one of the components combined with epitaxial strain. We find instead that La(Al,Fe,Cr)O3 and its parent compounds can become strongly polar under reasonable values of epitaxial strain and symmetry constraints, and that tri-layering serves to modify the resulting polarization. Finally, we suggest “negative pressure” as an alternative route to ferroelectricity and hope that our prediction motivates experimental exploration of such growth techniques.

This work was funded by the NSF IGERT program, grant number DGE-9987618, and the NSF Division of Materials Research, grant number DMR-0605852. The authors thank Massimiliano Stengel and Claude Ederer for helpful discussions.

[1] N. A. Hill, J. Phys. Chem. B 104, 6694 (2000). [2] C. Ederer and N. A. Spaldin, Curr. Opin. Solid State Mater. Sci. 9, 128 (2005). [3] M. Fiebig, J. Phys. D: Appl. Phys. 38, R1 (2005). [4] R. Seshadri and N. A. Hill, Chem. Mater. 13, 2892 (2001). [5] A. M. dos Santos, S. Parashar, A. R. Raju, Y. S. Zhao, A. K. Cheetham, and C. N. R. Rao, Solid State Commun. 122, 49 (2002). [6] J. Wang, J. B. Neaton, H. Zheng, V. Nagarajan, S. B. Ogale, B. Liu, D. Viehland, V. Vaithyanathan, D. G. Schlom, U. V. Waghmare, et al., Science 299, 1719 (2003). [7] J. B.
Neaton, C. Ederer, U. V. Waghmare, N. A. Spaldin, and K. M. Rabe, Phys. Rev. B 71, 014113 (2005). [8] B. B. van Aken, T. T. M. Palstra, A. Filippetti, and N. A. Spaldin, Nat. Mater. 3, 164 (2004). [9] D. L. Fox and J. F. Scott, J. Phys. C 10, L329 (1977). [10] C. Ederer and N. A. Spaldin, Phys. Rev. B 74, 024102 (2006), URL http://link.aps.org/abstract/PRB/v74/e024102. [11] N. Ikeda, H. Ohsumi, K. Ohwada, K. Ishii, T. Inami, K. Kakurai, Y. Murakami, K. Yoshii, S. Mori, Y. Horibe, et al., Nature 436, 1136 (2005). [12] M. A. Subramanian, H. Tao, C. Jiazhong, N. S. Rogado, T. G. Calvarese, and A. W. Sleight, Adv. Mater. 18, 1737 (2006). [13] T. Kimura, T. Goto, H. Shintani, K. Ishizaka, T. Arima, and Y. Tokura, Nature 426, 55 (2003). [14] R. Ramesh and N. A. Spaldin, Nat. Mater. 6, 21 (2007). [15] N. Sai, B. Meyer, and D. Vanderbilt, Phys. Rev. Lett. 84, 5636 (2000). [16] H. N. Lee, H. M. Christen, M. F. Chisholm, C. M. Rouleau, and D. H. Lowndes, Nature 433, 395 (2005). [17] M. P. Warusawithana, E. V. Colla, J. N. Eckstein, and M. B. Weissman, Phys. Rev. Lett. 90, 036802 (2003). [18] Y. Ogawa, H. Yamada, T. Ogasawara, T. Arima, H. Okamoto, M. Kawasaki, and Y. Tokura, Phys. Rev. Lett. 90, 217403 (2003). [19] J. B. Neaton and K. M. Rabe, Appl. Phys. Lett. 82, 1586 (2003). [20] K. J. Choi, M. Biegalski, Y. L. Li, A. Sharan, J. Schubert, R. Uecker, P. Reiche, Y. B. Chen, X. Q. Pan, V. Gopalan, et al., Science 306, 1005 (2004). [21] J. H. Haeni, P. Irvin, W. Chang, R. Uecker, P. Reiche, Y. L. Li, S. Choudhury, W. Tian, M. E. Hawley, B. Craigo, et al., Nature 430, 758 (2004). [22] V. I. Anisimov, F. Aryasetiawan, and A. I. Liechtenstein, J. Phys.: Condens. Matter 9, 767 (1997). [23] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169 (1996). [24] P. E. Blöchl, Phys. Rev. B 50, 17953 (1994). [25] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999). [26] R. D. King-Smith and D. Vanderbilt, Phys. Rev.
B 47, 1651 (1993). [27] D. Vanderbilt and R. D. King-Smith, Phys. Rev. B 48, 4442 (1993). [28] Z. Yang, Z. Huang, L. Ye, and X. Xie, Phys. Rev. B 60, 15674 (1999). [29] I. Solovyev, N. Hamada, and K. Terakura, Phys. Rev. B 53, 7158 (1996). [30] A. M. Glazer, Acta Crystallogr. B 28, 3384 (1972). [31] D. I. Bilc and D. J. Singh, Phys. Rev. Lett. 96, 147602 (2006).

FIG. 1: Energy and polarization as a function of displacement from the centrosymmetric structure for La(Al,Fe,Cr)O3 under negative pressure with a = c/3 = 3.95 Å. Inset: Schematic representation of the centrosymmetric unit cell (center) and displacements of the metal cations corresponding to the energy minima. Displacements are exaggerated for clarity.

FIG. 2: Density of states for Fe and Cr ions in La(Al,Fe,Cr)O3 with U/J values of 6/0.6 eV and 5/0.5 eV respectively. The dashed line at 0 eV indicates the position of the Fermi energy.

FIG. 3: Calculated polarizations of negative pressure (circles) and epitaxially strained (triangles) La(Al,Fe,Cr)O3 as a function of change in (a) in-plane and (b) out-of-plane lattice constants relative to the lattice constants of the fully relaxed structures. The polarizations are reported relative to the appropriate corresponding reference structures in each case.

ABSTRACT We explore computationally the formation of tri-layer superlattices as an alternative approach for combining ferroelectricity with magnetism to form magnetoelectric multiferroics. We find that the contribution to the superlattice polarization from tri-layering is small compared to typical polarizations in conventional ferroelectrics, and the switchable ferroelectric component is negligible. In contrast, we show that epitaxial strain and ``negative pressure'' can yield large, switchable polarizations that are compatible with the coexistence of magnetism, even in materials with no active ferroelectric ions.
<|endoftext|><|startoftext|> Introduction A fundamental problem in numerical relativity is the need to solve Einstein’s equations on spatially unbounded domains with finite computer resources. There are various ways of addressing this issue. Most often, the spatial domain is truncated at a finite distance and suitable boundary conditions are imposed at the artificial boundary. A different approach is to compactify the domain by using spatial coordinates that bring spatial infinity to a finite location on the computational grid. Another method often used for wave-like problems (although it is not commonly used in numerical relativity) includes so-called sponge layers which damp the waves near the outer boundary of the computational domain. The purpose of this paper is to compare these various methods by testing their ability to accurately reproduce dynamical solutions of Einstein’s equations. An ideal boundary treatment would produce a solution to Einstein’s equations that is identical (within the computational domain) to the corresponding solution obtained on an unbounded domain. In particular, no spurious gravitational radiation or constraint violations should enter the computational domain through the artificial Testing outer boundary treatments for the Einstein equations 2 boundary. We can use this principle to test the various boundary treatments in the following way. First we compute a reference solution using a very large computational domain, large enough that its boundary remains out of causal contact with the interior spacetime region where comparisons are being made. Next we compute the same solution using a domain truncated at a smaller distance where one of the boundary treatments is used: we either impose boundary conditions there, compactify spatial infinity, or add a sponge layer. Finally we compare the solution on the smaller domain with the reference solution, measuring the reflections and constraint violations caused by the boundary treatment. 
Assessing boundary conditions by comparing with a reference solution on a much larger domain or a known analytic solution is a common practice in computational science. For applications to numerical relativity see e.g. [1], chapter 8 of [2], and [3, 4, 5]. The particular test problem used in this paper is a Schwarzschild black hole with an outgoing gravitational wave perturbation. The interior of the black hole is excised; all the characteristic fields propagate into the black hole (and out of the computational domain) at the inner boundary and hence no boundary conditions are needed there. Our numerical implementation uses a pseudo-spectral collocation method. See Appendix A for details on the initial data, the numerical methods, and the quantities that we compare between the solutions. We perform all of these tests using a first-order generalized harmonic formulation of the Einstein equations (see [6] and references therein). In section 2 we discuss the construction of boundary conditions for this system that prevent the influx of constraint violations, and that limit the spurious incoming gravitational radiation by controlling the Newman-Penrose scalar Ψ0 at the boundary. We also improve the boundary conditions on the gauge degrees of freedom by studying small gauge perturbations of flat spacetime. We then evaluate the performance of these boundary conditions on our test problem: measuring the reflections and constraint violations caused by the computational boundary, and determining how these reflections vary with the radius of the boundary. Section 3 evaluates the performance of a variety of other widely used boundary conditions on our test problem. First we test the simple boundary conditions that freeze all the incoming characteristic fields at the boundary. 
We also test the commonly used variant of this, the Sommerfeld boundary conditions, used in many binary black hole simulations [7, 8, 9, 10, 11] based on the BSSN [12, 13] formulation of Einstein’s equations. Finally in section 3 we evaluate the constraint-preserving boundary conditions proposed by Kreiss and Winicour [14], which differ from those discussed in section 2 mainly by our use of a physical boundary condition that controls Ψ0.

In section 4 we evaluate two boundary treatments that are alternatives to imposing local boundary conditions at a finite outer boundary. The first is the spatial compactification method used e.g. by Pretorius [15, 16, 17] in his groundbreaking binary black hole evolutions. In this treatment a coordinate transformation maps spatial infinity to a finite location on the computational grid. As waves travel out, they become increasingly blue-shifted with respect to the compactified coordinates and ultimately they fail to be resolved. Hence numerical dissipation is applied, which damps away these short-wavelength features. We measure the reflections and the constraint violations generated by the waves in our test problem as they interact with this boundary treatment. Finally in section 4 we implement and test a sponge layer method for Einstein’s equations.

One of the main objectives of current binary black hole simulations is the computation of reliable waveforms for gravitational wave data analysis. Therefore it is important to evaluate how the various boundary treatments affect the accuracy of the extracted waveforms. In section 5, we compute the Newman-Penrose scalar Ψ4 (which describes the outgoing waves) on an extraction sphere close to the outer boundary (or compactified region, or sponge layer, respectively) and compare it with the analogous Ψ4 from the reference solution.
We also compare the measured reflections caused by our Ψ0-controlling boundary condition with the analytical predictions of these reflections made by Buchman and Sarbach [18, 19]. Finally we discuss the implications of our results in section 6, and we also describe briefly a number of other boundary treatments which we do not test here.

2. Constraint-preserving boundary conditions

In this section, we briefly review the generalized harmonic form of the Einstein evolution system used in our tests. The method of constructing constraint-preserving boundary conditions (CPBCs) for this system is also discussed, and an improved boundary condition for the gauge degrees of freedom is derived. The numerical performance of these boundary conditions is evaluated using our test problem, and the dependence of the spurious reflections on the boundary radius is measured.

2.1. The generalized harmonic evolution system

The formulation of Einstein’s equations employed here uses generalized harmonic gauge conditions, in which the coordinates $x^a$ obey the wave equation

$$\Box x^a = H^a(x, \psi), \tag{1}$$

where $\Box = \psi^{ab}(\partial_a \partial_b - \Gamma^c{}_{ab}\, \partial_c)$ is the covariant scalar wave operator, with $\psi_{ab}$ the spacetime metric and $\Gamma^c{}_{ab}$ the associated metric connection. In this formulation of the Einstein system the gauge source function $H^a$ may be chosen freely as a function of the coordinates and of the spacetime metric $\psi_{ab}$ (but not derivatives of $\psi_{ab}$). As is well known, the Einstein equations reduce to a set of coupled wave equations when the gauge is specified by equation (1). We write this system in first-order form, both in time and space, by introducing the additional variables $\Phi_{iab} \equiv \partial_i \psi_{ab}$ and $\Pi_{ab} \equiv -t^c \partial_c \psi_{ab}$, where $t^c$ is the future directed unit normal to the t = const. hypersurfaces. Here lower-case Latin indices from the beginning of the alphabet denote four-dimensional spacetime quantities, whereas lower-case Latin indices from the middle of the alphabet are spatial.
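The first-order reduction can be made concrete in a toy setting. The sketch below is not the paper's system: it assumes a single scalar field on 1+1 dimensional flat spacetime with unit lapse and zero shift, where the definitions above reduce to Π = −∂tψ and Φ = ∂xψ, and the wave equation ∂t²ψ = ∂x²ψ becomes the pair ∂tΠ = −∂xΦ, ∂tΦ = −∂xΠ. The wave number `K`, the finite-difference step, and all function names are illustrative assumptions.

```python
import math

K = 2.0  # hypothetical wave number of the toy solution (assumption)

def psi(t, x):
    """Toy right-moving solution psi = sin(K (x - t)) of the flat 1+1 wave equation."""
    return math.sin(K * (x - t))

def d_dt(f, t, x, h=1e-4):
    """Central finite difference in t."""
    return (f(t + h, x) - f(t - h, x)) / (2.0 * h)

def d_dx(f, t, x, h=1e-4):
    """Central finite difference in x."""
    return (f(t, x + h) - f(t, x - h)) / (2.0 * h)

def Pi(t, x):
    # Pi = -t^c d_c psi reduces to -d_t psi for unit lapse and zero shift.
    return -d_dt(psi, t, x)

def Phi(t, x):
    # Phi_i = d_i psi reduces to d_x psi in one spatial dimension.
    return d_dx(psi, t, x)

def first_order_residuals(t, x):
    """Residuals of d_t Pi = -d_x Phi and d_t Phi = -d_x Pi (both should be ~0)."""
    r_pi = d_dt(Pi, t, x) + d_dx(Phi, t, x)
    r_phi = d_dt(Phi, t, x) + d_dx(Pi, t, x)
    return r_pi, r_phi

if __name__ == "__main__":
    print(first_order_residuals(0.3, 0.7))
```

The second residual vanishes identically because partial derivatives commute; this is the toy analogue of the statement that Φ must remain the spatial gradient of ψ, which the full system enforces through a constraint.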
The principal parts of these evolution equations are given by ‡

$$\begin{aligned}
\partial_t \psi_{ab} &\simeq 0, \\
\partial_t \Pi_{ab} &\simeq N^k \partial_k \Pi_{ab} - N g^{ki} \partial_k \Phi_{iab} - \gamma_2 N^k \partial_k \psi_{ab}, \\
\partial_t \Phi_{iab} &\simeq N^k \partial_k \Phi_{iab} - N \partial_i \Pi_{ab} + N \gamma_2 \partial_i \psi_{ab},
\end{aligned} \tag{2}$$

where $\simeq$ indicates that purely algebraic terms have been omitted, $g_{ij}$ is the spatial metric of the t = const. slices, and $N$ and $N^i$ are the lapse function and shift vector, respectively. The parameter $\gamma_2$ was introduced in [6] in order to damp violations of the three-index constraint

$$C_{iab} \equiv \partial_i \psi_{ab} - \Phi_{iab} = 0. \tag{3}$$

We also include terms of lower derivative order that are designed to damp violations of the harmonic gauge constraint [20]

$$C_a \equiv -\Box x_a + H_a = \psi^{bc}\Gamma_{abc} + H_a = 0. \tag{4}$$

‡ The parameter γ1 of [6] is chosen to be −1, which ensures that the equations are linearly degenerate.

The system (2) is symmetric hyperbolic. The characteristic fields in the direction $n_i$ (where $n_a t^a = 0$) are given by

$$u^0_{ab} = \psi_{ab}, \qquad \text{speed } 0, \tag{5}$$
$$u^{1\pm}_{ab} = \Pi_{ab} \pm \Phi_{nab} - \gamma_2 \psi_{ab}, \qquad \text{speed } -N^n \pm N, \tag{6}$$
$$u^2_{Aab} = \Phi_{Aab}, \qquad \text{speed } -N^n. \tag{7}$$

For future reference, we also define

$$\tilde u^{1\pm}_{ab} \equiv \Pi_{ab} \pm \Phi_{nab}. \tag{8}$$

Here and in the following, an index n denotes contraction with $n^i$, while upper-case Latin indices A, B, . . . are orthogonal to n, e.g. $v_A = P_A{}^i v_i$ where $P_{ab} \equiv \psi_{ab} - n_a n_b + t_a t_b$. For further details, we refer the interested reader to [6].

2.2. Construction of boundary conditions

Our construction of boundary conditions for the generalized harmonic evolution system can be divided into three parts: constraint-preserving, physical, and gauge boundary conditions. In order to impose constraint-preserving boundary conditions, we derive the subsidiary evolution system that the constraints (3) and (4) obey as a consequence of the main evolution equations (2). The incoming modes of the subsidiary system are then required to vanish at the boundary (cf. [21, 22, 23, 24, 25, 26, 27, 28, 29]).
For instance, the harmonic gauge constraint (4) obeys a wave equation

$$\Box C_a = (\text{lower-order terms homogeneous in the constraints}) \tag{9}$$

and the corresponding incoming fields will involve first derivatives of $C_a$. In terms of the incoming modes $u^{1-}_{ab}$ (6) of the main evolution equations, the resulting constraint-preserving boundary conditions can be written in the form

$$P^{\rm C}{}_{ab}{}^{cd}\, \partial_n u^{1-}_{cd} \equiv \left( \tfrac{1}{2} P_{ab} P^{cd} - 2\, l_{(a} P_{b)}{}^{(c} k^{d)} + l_a l_b k^c k^d \right) \partial_n u^{1-}_{cd} \doteq (\text{tangential derivatives}), \tag{10}$$

where $P^{\rm C}$ is a projection operator of rank 4 (cf. [6]). Here $n^i$ now refers to the outward-pointing unit spatial normal to the boundary, $l^a = (t^a + n^a)/\sqrt{2}$, $k^a = (t^a - n^a)/\sqrt{2}$, and $\doteq$ denotes equality at the boundary. If the shift vector points towards the exterior at the boundary ($N^n \mathrel{\dot{>}} 0$), the fields $u^2_{Aab}$ (7) are incoming as well and we obtain a boundary condition on them by requiring the components $C_{nAab}$ of the four-index constraint

$$C_{ijab} \equiv -2\, \partial_{[i} \Phi_{j]ab} \tag{11}$$

to vanish at the boundary.

An acceptable physical boundary condition should require that no gravitational radiation enter the computational domain from the outside (except for backscatter off the spacetime curvature, an effect that is a first-order correction in M/R). Gravitational radiation may be described by the evolution system that the Weyl tensor obeys by virtue of the Bianchi identities (see e.g. [27]). Our boundary condition requires the incoming characteristic fields of this system to vanish at the outer boundary. These incoming fields are proportional to the Newman-Penrose scalar $\Psi_0$ (evaluated for a Newman-Penrose null tetrad containing the vectors $l^a$ and $k^a$). Hence the physical boundary condition we use is [27, 22, 30, 29, 31]

$$\partial_t \Psi_0 \doteq 0, \tag{12}$$

which can be written in a form similar to (10),

$$P^{\rm P}{}_{ab}{}^{cd}\, \partial_n u^{1-}_{cd} \equiv \left( P_a{}^c P_b{}^d - \tfrac{1}{2} P_{ab} P^{cd} \right) \partial_n u^{1-}_{cd} \doteq (\text{tangential derivatives}). \tag{13}$$

Here $P^{\rm P}$ is a projection operator of rank 2 that is orthogonal to $P^{\rm C}$ [6].
We remark that (12) still causes some, albeit very small, spurious reflections of gravitational radiation. It can be viewed as the lowest level in a hierarchy of perfectly absorbing boundary conditions for linearized gravity [18, 19]. The constraint-preserving (10) and physical (13) boundary conditions together constrain six components of the main incoming fields $u^{1-}_{ab}$. The remaining four components correspond to gauge degrees of freedom. In the past we chose simply to freeze those components in time [6],

$$P^{\rm G}{}_{ab}{}^{cd}\, \partial_t u^{1-}_{cd} \doteq 0, \tag{14}$$

where $P^{\rm G} \equiv I - P^{\rm C} - P^{\rm P}$. The initial-boundary value problem (IBVP) for the boundary conditions discussed so far was shown in [32] to be boundary-stable, which is a (rather strong) necessary condition for well posedness. These boundary conditions have been successfully used in long-term stable evolutions of single and binary black hole spacetimes [6, 33, 34]. In the following subsection, we present an improvement to the gauge boundary condition (14) motivated by the evolution of gauge perturbations about flat spacetime.

2.3. Improved gauge boundary condition

Let us assume that near the outer boundary, the spacetime is close to Minkowski space in standard coordinates ($H^a = 0$) so that the Einstein equations may be linearized about that background. This assumption is reasonable because for the dominant wavenumber of the outgoing pulse (k = 1.6/M) and the boundary radius we typically consider (R = 41.9M), we have $kR \gg 1$ and $R \gg M$. Furthermore, we assume that the outer boundary is a coordinate sphere of radius r = R. We begin by noting that harmonic gauge does not fix the coordinates completely: infinitesimal coordinate transformations

$$x^a \to x^a + \xi^a \tag{15}$$

are still allowed provided the displacement vector satisfies the wave equation,

$$\Box \xi^a = 0. \tag{16}$$

Under such a coordinate transformation, the metric changes by $\delta\psi_{ab} = -2\, \partial_{(a} \xi_{b)}$.
(17)

A closer inspection [32] of the projection operator $P^{\rm G}$ in (14) shows that the gauge boundary conditions control the components $l^a \delta\psi_{ab}$ of the perturbations, where $l^a \equiv (t^a + n^a)/\sqrt{2}$ is the outgoing null vector normal to the boundary. It is interesting to observe that these components vanish in the ingoing radiation gauge [35]. However, imposing radiation gauge on the entire spacetime is not possible in spacetimes containing strong-field regions, which will always generate perturbations $l^a \delta\psi_{ab}$ that propagate into the far field. A reasonable condition to require then is that these perturbations pass through the boundary without causing strong reflections.

Each Cartesian component of the vector $l^a \delta\psi_{ab}$ obeys the scalar wave equation

$$\Box \psi = 0. \tag{18}$$

Solutions to this equation can be written in the form

$$Y_{lm}(\theta, \phi)\, \psi_l(t, r), \tag{19}$$

where the $Y_{lm}$ are the standard spherical harmonics and the $\psi_l$ are linear combinations of outgoing (+) and incoming (−) solutions

$$\psi^\pm_l(t, r) = r^l \left( \frac{1}{r} \frac{\partial}{\partial r} \right)^l \left[ \frac{F^\pm_l(r \mp t)}{r} \right], \tag{20}$$

$F^\pm_l(x)$ being arbitrary functions. A boundary condition is needed on ψ that eliminates the incoming part of these solutions. In [36], a hierarchy of boundary conditions is constructed that accomplishes this task for all $l \leqslant L$. This idea was applied to the evolution of the Weyl curvature in [18] in order to construct improved physical boundary conditions. For the gauge boundary conditions considered here, we restrict ourselves to the L = 0 member of the hierarchy, which corresponds to the Sommerfeld condition §

$$(\partial_t + \partial_r + r^{-1})\, \psi \doteq 0. \tag{21}$$

In contrast, our old gauge boundary condition that froze the incoming characteristic field, as in (14), is given by

$$(\partial_t + \partial_r + \gamma_2)\, \psi \doteq 0, \tag{22}$$

where $\gamma_2$ is the constraint damping parameter. This Sommerfeld boundary condition (21) is much less reflective than the freezing condition (22).
To see this, we consider a solution of the form

$$\psi_l = \psi^+_l + \rho_l\, \psi^-_l \tag{23}$$

with generating functions

$$F^\pm_l(x) = e^{\pm i k x}, \tag{24}$$

where $k \in \mathbb{R}$ is the wave number. Substituting this solution into the boundary conditions (21) and (22), respectively, we solve for the reflection coefficient $\rho_l$. Figure 1 shows $|\rho_l|$ for a typical range of wave numbers k and outer boundary radii R used for the numerical tests in this paper. (The dominant wave number of the outgoing pulse is k ≈ 1.6/M and in most cases, we place the outer boundary at R = 41.9M.) We see that $|\rho_l|$ is much smaller (by about 3 orders of magnitude) for the Sommerfeld condition than for the freezing condition.

§ To avoid confusion, we remark that in [5, 14], the term ‘Sommerfeld condition’ is used in reference to a condition of the form $(\partial_t + \partial_r) u = 0$, i.e. without the extra $r^{-1}$ term due to our polar coordinates.

Figure 1. Predicted reflection coefficients $\rho_l$ for freezing (dotted) and Sommerfeld (solid) boundary conditions as functions of wave number k and outer boundary radius R. The curves for different l are visually indistinguishable in the freezing case. Note also that $\rho_0 = 0$ for the Sommerfeld condition. (Panel labels: l = 1, R = 41.9 M; l = 1, k = 1/M, horizontal axis R/M.)

In the notation of the previous subsection, the improved gauge boundary condition (21) reads (after taking a time derivative),

$$P^{\rm G}{}_{ab}{}^{cd}\, \partial_t \left[ u^{1-}_{cd} + (\gamma_2 - r^{-1})\, \psi_{cd} \right] \doteq 0. \tag{25}$$

We remark that the extra terms in (25) as compared with the old condition (14) are of lower derivative order, so that the high-frequency stability result of [32] extends immediately to these modified gauge boundary conditions.

2.4. Numerical results

The numerical tests of the various boundary conditions performed in this paper are described in some detail in Appendix A. Figure 2 compares the numerical performance of our new CPBCs (10), (11), (13), (25) with our old ones (10), (11), (13), (14).
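As an aside on the reflection-coefficient estimate of section 2.3, the calculation can be sketched numerically for l = 0 and l = 1. The sketch assumes the radial modes are ψ±_l = r^l (r⁻¹∂r)^l [F±_l(r ∓ t)/r] with F±_l(x) = e^{±ikx}, so that both modes share the time dependence e^{−ikt}, and it imposes a condition of the form (∂t + ∂r + c)ψ = 0 at r = R, with c = 1/R for the Sommerfeld case and c = γ2 for the freezing case. The value γ2 = 1/M used in the example call is purely hypothetical (this section does not state it), units are M = 1, and the function names are illustrative.

```python
import cmath

def radial_modes(l, k, r):
    """Return (g+, dg+/dr, g-, dg-/dr), where psi^± = g^±(r) * exp(-i k t).

    Only l = 0 and l = 1 are implemented, using closed forms of
    r^l (1/r d/dr)^l [exp(± i k r) / r].
    """
    out = []
    for s in (+1, -1):
        e = cmath.exp(1j * s * k * r)
        if l == 0:
            g = e / r
            dg = (1j * s * k - 1.0 / r) * g
        elif l == 1:
            g = (1j * s * k / r - 1.0 / r**2) * e
            dg = (-k**2 / r - 2j * s * k / r**2 + 2.0 / r**3) * e
        else:
            raise ValueError("only l = 0 and l = 1 are implemented in this sketch")
        out.extend([g, dg])
    return tuple(out)

def reflection_coefficient(l, k, R, c):
    """Solve (d_t + d_r + c)(psi^+ + rho * psi^-) = 0 at r = R for rho."""
    gp, dgp, gm, dgm = radial_modes(l, k, R)

    def boundary_op(g, dg):
        # d_t acts as multiplication by -i k on both modes.
        return -1j * k * g + dg + c * g

    return -boundary_op(gp, dgp) / boundary_op(gm, dgm)

if __name__ == "__main__":
    k, R = 1.6, 41.9           # values quoted in the text, in units of M
    gamma2 = 1.0               # hypothetical damping parameter (assumption)
    for l in (0, 1):
        somm = abs(reflection_coefficient(l, k, R, 1.0 / R))
        frz = abs(reflection_coefficient(l, k, R, gamma2))
        print(f"l={l}: |rho| Sommerfeld = {somm:.3e}, freezing = {frz:.3e}")
```

For l = 1 the Sommerfeld case reduces analytically to |ρ1| = 1/|1 − 2k²R² + 2ikR|, which is small whenever kR ≫ 1, while the size of the freezing coefficient depends on the assumed value of γ2.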
The outer boundary is placed at radius R = 41.9M for these particular tests. Shown are the discrete L∞ and L2 norms of the difference ∆U between the numerical solution and the reference solution, and also the violations of the constraints C (see Appendix A.4 for precise definitions of these quantities). The reference solution has an outer boundary at radius 961.9M and is computed using our old CPBCs; thus for t < 920M the outer boundary of the reference solution is out of causal contact with the region where ∆U and C are computed. In the difference ∆U we see a reflection that originates when the wave reaches the boundary at t ≈ R and then amplifies as it moves inward in the spherical geometry, assuming its maximum at t ≈ 2R. This feature is much more prominent in the L∞ norm than in the L2 norm, which is why we display only the L∞ norm in subsequent plots. The reflection is much smaller (by a factor of ≈ R/M) for the new boundary conditions as compared with the old ones. Even at later times, the new boundary conditions result in a smaller ∆U, which in contrast to the old conditions appears to decrease as resolution is increased.

Figure 2. Old (solid) vs. new (dotted) CPBCs. Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The outer boundary is at R = 41.9M. (Horizontal axes: t / M.)

We would like to stress that ∆U is a coordinate dependent quantity. Hence a smaller ∆U does not necessarily mean that the boundary treatment is ‘better’ in a physically meaningful sense. If however the aim is to produce a solution that is as
close to the reference solution in the same coordinates, the choice of gauge boundary conditions does become important. Gauge reflections can in principle also impair the numerical accuracy of gauge-invariant quantities because much numerical resolution is wasted on resolving the gauge reflections. This is particularly the case when the gauge excitations in question are high-frequency modes such as those produced along with the so-called ‘junk radiation’ in binary black hole initial data. There is no discernible difference between the two sets of boundary conditions as far as constraint violations are concerned, which is what we expect because both of them are constraint-preserving.

We close this section by investigating the dependence of the reflections on the radius of the outer boundary (figure 3). The amplitude of the first peak in ||∆U||∞ decreases as the boundary is moved outward, roughly like 1/R. At late times, there appears to be a power-law growth of that quantity at a rate that increases slightly with resolution. Inspection of the constraints (also in figure 3) and Ψ4 (figure 10) suggests that this is a pure gauge effect. This blow-up is completely dominated by the innermost domain, which contains a long-wavelength feature that is growing in time. We speculate that this problem might be cured by a more clever choice of gauge source function close to the black hole horizon.

Figure 3. New CPBCs at different radii. Top half: all radii at the highest resolution, bottom half: R = 121.9M at all resolutions. In the top right panel, curves for all outer boundary radii coincide. (Horizontal axes: t / M.)

3.
Alternate boundary conditions

In this section, we consider several alternate boundary conditions that are often used in numerical relativity. All of these are local conditions imposed at a finite boundary radius; in section 4 we then consider some additional non-local boundary treatments. We run the alternate boundary conditions on our test problem and compare the results with the CPBCs (using the new gauge boundary condition (25)).

3.1. Freezing the incoming fields

A very simple boundary condition is obtained by freezing in time all the incoming fields at the boundary, i.e.,

$$\partial_t u^{1-}_{ab} \doteq 0 \quad (\text{and } \partial_t u^2_{Aab} \doteq 0 \text{ if } N^n \mathrel{\dot{>}} 0). \tag{26}$$

This boundary condition is attractive from a mathematical point of view because it is of maximally dissipative type and hence, together with the symmetric hyperbolic evolution equations (2), yields a strongly well-posed IBVP [37, 38, 39]. However, in general this boundary condition is not compatible with the constraints.

Figure 4. Freezing (solid) vs. new CPBCs (dotted). Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). For freezing boundary conditions, both ||∆U|| and C converge to a nonzero function with increasing resolution. The outer boundary is at R = 41.9M. (Horizontal axes: t / M.)

The left side of figure 4 demonstrates that freezing boundary conditions cause a significantly larger (by ≈ 3 orders of magnitude) initial reflection than our CPBCs. The difference with respect to the reference solution remains large in the subsequent evolution and, unlike for the CPBCs, does not decrease with increasing resolution. Furthermore, the violations of the constraints (right side of figure 4) do not converge away. This means that a solution to the Einstein equations is not obtained in the continuum limit.

3.2.
Sommerfeld boundary conditions

A boundary condition that is often imposed in conjunction with the BSSN [12, 13] formulation of the Einstein equations is a Sommerfeld condition on all the components of the spatial metric $g_{ij}$ and extrinsic curvature $K_{ij}$,

$$(\partial_t + \partial_r + r^{-1}) \begin{pmatrix} g_{ij} - \delta_{ij} \\ K_{ij} \end{pmatrix} \doteq 0. \tag{27}$$

This condition has been used for example in many recent binary black hole simulations [7, 8, 9, 10, 11]. We cannot impose precisely the conditions (27) in our simulations because there is no one-to-one relationship between $g_{ij}$ and $K_{ij}$, and the incoming characteristic fields of our generalized harmonic formulation of Einstein’s equations. Instead we consider the similar condition

$$(\partial_t + \partial_r + r^{-1})(\psi_{ab} - \eta_{ab}) \doteq 0 \tag{28}$$

on all the components of the spacetime metric ($\eta_{ab}$ being the Minkowski metric). A very similar boundary condition (without the $r^{-1}$ term) has recently been used in the generalized harmonic evolutions of [40]. In our formulation, boundary conditions are required not on the spacetime metric itself but only on certain combinations of its derivatives. By taking a time derivative of (28) and rewriting in terms of incoming characteristic fields, we obtain

$$\partial_t \left[ u^{1-}_{ab} + (\gamma_2 - r^{-1})\, \psi_{ab} \right] \doteq 0. \tag{29}$$

Figure 5. Sommerfeld (solid) vs. new CPBCs (dotted). Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The outer boundary is at R = 41.9M. (Horizontal axes: t / M.)

This then is our version of the Sommerfeld boundary condition (cf. (25)), to be imposed on a spherical boundary in the far field (where linearized theory is assumed to be valid). Because the BSSN formulations using (27) are usually second-order in space, there is no analogue of our three-index constraint (3) in that system.
To mimic this situation in our tests of equation (29), we also impose a CPBC on $u^2_{Aab}$ as discussed in section 2.2, which together with our constraint damping terms ensures that violations of the three-index constraint (3) are exponentially damped. Our version of Sommerfeld boundary conditions performs similarly on our test problem (figure 5) to the freezing boundary conditions (26) (figure 4). The initial pulse of reflections is smaller by ≈ 2 orders of magnitude, but later ||∆U|| grows to a similar level as for freezing boundary conditions. Again the constraints do not converge away, although this non-convergence appears only at somewhat higher resolutions than in the freezing case.

3.3. Kreiss-Winicour boundary conditions

Recently, Kreiss and Winicour [14] proposed a set of ‘Sommerfeld-like’ CPBCs for the harmonic Einstein equations and showed that they result in an IBVP that is well-posed in the generalized sense. Their boundary conditions were implemented and tested in [5]; here we compare their performance with the various other boundary treatments. The Kreiss-Winicour boundary conditions are obtained by requiring the harmonic constraint to vanish at the boundary,

$$C_a \doteq 0. \tag{30}$$

In our notation, this can be written as an algebraic condition on part of the incoming fields $u^{1-}$, = Fa, (31) where [2k(cδad) − kaψcd], lbu1+ab − bcu1+bc + P iju2ija − 12Pa iψbcu2ibc − γ2ta + Ha. (32) The range of the projection operator PC is identical with that of PC defined in (10). For the unconstrained incoming fields $\tilde u^{1-}$ (i.e. $u^{1-}$ without the $\gamma_2$ term, equation (8)), Kreiss and Winicour [14] specify certain free boundary data $q^{\rm P}_{ab}$ and $q^{\rm G}_{ab}$. In our notation,

$$P^{\rm P}{}_{ab}{}^{cd}\, \tilde u^{1-}_{cd} \doteq q^{\rm P}_{ab}, \qquad P^{\rm G}{}_{ab}{}^{cd}\, \tilde u^{1-}_{cd} \doteq q^{\rm G}_{ab}. \tag{33}$$

In the linearized wave and gauge wave tests of [5], these boundary data are obtained from the known exact solutions.
In the absence of an exact solution, it is suggested that the data could be obtained from an exterior Cauchy-characteristic or Cauchy-perturbative code. However, since we do not have such an exterior code, we compute the boundary data from the background solution, i.e. Schwarzschild spacetime. As in the Sommerfeld case (section 3.2), we use a constraint-preserving boundary condition on $u^2_{Aab}$ to emulate the second-order formulation of [5, 14], and this value of $u^2_{Aab}$ is then used to compute $F_a$ in (32).

Figure 6 shows the numerical results for our test problem. The magnitude of the initial reflections lies between that of freezing and Sommerfeld boundary conditions and is somewhat smaller at later times, though still larger than for our CPBCs at the higher resolutions. The constraints converge away with increasing resolution, as they should for a boundary condition that is consistent with the constraints.

In a numerical simulation, violations of the constraints are in general present in the interior of the computational domain. These propagate as described by the constraint evolution system (9) and some may hit the outer boundary. The Dirichlet boundary conditions (30) might be expected to cause more reflections of constraint violations than our no-incoming-field conditions (10); however, no indications of this are seen in figure 6. Probably the constraint damping we use is sufficiently effective in eliminating the source of these reflections.

We shall see in section 5.1 that the Kreiss-Winicour boundary conditions also cause larger errors in the physical degrees of freedom than our CPBCs. Since the main difference between the two sets of boundary conditions is our use of a physical boundary condition ∂tΨ0 = 0, we conclude that such a condition is crucial in reducing the reflections from the outer boundary.

4.
Alternate approaches

So far we have only considered boundary conditions that are local algebraic or differential conditions imposed at the boundary of some finite computational domain. There are of course many ways of treating the outer boundary that do not fall into that category. In this section, we evaluate two such approaches: spatial compactification and sponge layers.

Figure 6. Kreiss-Winicour (solid) vs. new CPBCs (dotted). Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The outer boundary is at R = 41.9M. (Horizontal axes: t / M.)

4.1. Spatial compactification

Spatial compactification is a method that has been widely used in numerical relativity, for instance in [41, 42] or more recently in the generalized harmonic binary black hole simulations of Pretorius [15, 16, 17]. The basic idea is to introduce spatial coordinates that map spacelike infinity to a finite location. Here we consider mappings that are functions of coordinate radius only (whereas Pretorius applies the mapping to each Cartesian coordinate separately). We have used two such mappings, named Tan and Inverse, as detailed in Appendix B.1. Each map has a scale R across which the mapping is (essentially) linear. The outermost grid point is placed at a very large but finite uncompactified radius ($r = 10^{17} M$). With respect to the compactified radial coordinate, the characteristic speeds are below numerical roundoff there and hence no boundary condition should be needed. The following results were produced using constraint-preserving boundary conditions; we have checked for one simulation that using no boundary condition at all yields results that are visually indistinguishable from the ones presented here on the scales of figures 7, 8, and 10.
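To illustrate how a radial compactification compresses outgoing waves on the computational grid, the following sketch uses a generic tangent map r = s tan(x/s). This is only a stand-in: the actual Tan and Inverse mappings are defined in Appendix B.1 and may differ in form and normalization, and the names `r_of_x`, `x_of_r`, `coordinate_wavelength` and the parameter `scale` are hypothetical. For this assumed map, dx/dr = 1/(1 + (r/s)²), so a wave of fixed physical wavelength occupies an ever shorter interval of the compactified coordinate x as r grows.

```python
import math

def r_of_x(x, scale):
    """Hypothetical tangent map from compactified coordinate x to radius r."""
    return scale * math.tan(x / scale)

def x_of_r(r, scale):
    """Inverse of the hypothetical tangent map; x -> pi * scale / 2 as r -> infinity."""
    return scale * math.atan(r / scale)

def coordinate_wavelength(r, physical_wavelength, scale):
    """Local wavelength measured in x, using dx/dr = 1 / (1 + (r/scale)^2)."""
    return physical_wavelength / (1.0 + (r / scale) ** 2)

if __name__ == "__main__":
    scale = 41.9                   # compactification scale in units of M (from the text)
    lam = 2.0 * math.pi / 1.6      # wavelength for the dominant wave number k = 1.6/M
    for r in (0.0, scale, 10 * scale, 100 * scale):
        print(r, x_of_r(r, scale), coordinate_wavelength(r, lam, scale))
```

With a fixed number of grid points in x, the shrinking coordinate wavelength means the wave eventually falls below the grid resolution, which is why some form of dissipation or filtering has to accompany the mapping.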
As the waves travel outward, they become more and more blue-shifted with respect to the computational grid and are eventually no longer properly resolved. However, some form of artificial numerical dissipation is applied that acts as a low-pass filter and causes the waves to be damped as they become increasingly distorted. We have experimented with various such filters; see Appendix B.1 for details. One of them (referred to as number 2 in the following) is designed to emulate as closely as possible the fourth-order Kreiss-Oliger dissipation used by Pretorius.

In the following numerical comparisons, we evaluate the differences with respect to the reference solution only in the part of the domain where the compactification map is essentially linear, i.e. for $r \leqslant R$. First we compare the various filtering methods at a fixed resolution, using the Tan compactification mapping (figure 7). The filters that are applied to the right side of the evolution equations (numbers 1 and 3, cf. table B1) do somewhat better than those applied to the solution itself (numbers 2 and 4), and the Exponential filters (numbers 3 and 4) are slightly better than the Kreiss-Oliger filters (numbers 1 and 2). All of them are outperformed by the CPBCs (imposed at r = R). For our closest approximation to the dissipation used by Pretorius (number 2), ||∆U|| is comparable to constraint-preserving boundary conditions at the peak of reflections (at t ≈ 2R) but becomes larger by about 2 orders of magnitude at later times. The compactification methods also generate considerable constraint violations.

Next we focus on the best filter (number 4) of the previous test but vary the resolution (figure 8). We do see convergence of ||∆U|| initially but the convergence degrades at later times. This is surprising at first because with increasing resolution,
This is surprising at first because with increasing resolution, Testing outer boundary treatments for the Einstein equations 14 0 200 400 600 800 1000 t / M New CPBC TAN, Filter 1 TAN, Filter 2 TAN, Filter 3 TAN, Filter 4 0 200 400 600 800 1000 t / M New CPBC TAN, Filter 1 TAN, Filter 2 TAN, Filter 3 TAN, Filter 4 Figure 7. Tan compactification with various filters vs. new CPBCs. Only the highest resolution (Nr, L) = (51, 14) is shown. The compactification scale (and the radius of the outer boundary in the CPBC case) is R = 41.9M . 0 200 400 600 800 1000 t / M ,L)=(21,8) (31,10) (41,12) (51,14) 0 200 400 600 800 1000 t / M ,L)=(21,8) (31,10) (41,12) (51,14) Figure 8. Tan compactification with filter 4 (solid) vs. new CPBCs (dotted). Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The compactification scale (and the radius of the outer boundary in the CPBC case) is R = 41.9M . the waves travel a longer distance before they fail to be resolved. Note however that the high-frequency filter is applied at each time step, as is done in the simulations of Pretorius. For higher resolutions, the time steps are smaller because of the CFL condition and the filter is applied more often, thus leading to a stronger damping of the waves. This may well lead to the observed loss of convergence with increasing resolution. The constraints appear to converge away in this test, although from figure 8 it appears that this will not persist for even higher resolutions. We have also evaluated the Inverse mapping described in Appendix B.1. The results are similar, but somewhat worse than the Tan mapping results shown here. Testing outer boundary treatments for the Einstein equations 15 4.2. Sponge layers A method that has been used for a long time in computational science, in particular for spectral methods (see e.g. section 17.2.3 of [43] and references therein), involves so-called sponge layers. 
A sponge layer is introduced by modifying the evolution equations according to

$$ \partial_t u = \ldots - \gamma(r)\,(u - u_0), \tag{34} $$

where u0 refers to the unperturbed background solution (Schwarzschild spacetime in our case) and the smooth sponge function γ(r) > 0 is large only close to the outer boundary of the computational domain. (Here we use uncompactified coordinates as in sections 2 and 3.) In this way, the waves are damped exponentially as they approach the outer boundary. Details on our particular choice of γ(r) can be found in Appendix B.2.

We compare the sponge layer method with our CPBCs in figure 9. For the CPBCs, the boundary is either placed at R = 41.9M (the outer edge of the sponge-free region) or at R = 121.9M (the outer edge of the sponge). At early times (t ≲ 2R), the ||∆U||∞ of the sponge layer method lies between that of the CPBCs for the two choices of outer boundary radius, whereas at later times, it is much larger than both versions of CPBCs. The constraint violations in the sponge runs do not converge away.

5. Physical gravitational waves

Perhaps the most important predictions of numerical relativity simulations at the present time are the gravitational waveforms produced by astrophysical systems like binary black holes. It is important therefore to understand how the accuracy of these waveforms is affected by the choice of boundary treatment.

Physical gravitational radiation can be described by the Newman-Penrose scalars Ψ4 and Ψ0. The scalar Ψ4 is dominated by the outgoing radiation (its ingoing part is suppressed by a factor of (kr)^4, where k is the wavenumber), whereas Ψ0 is dominated by the ingoing radiation (its outgoing part is suppressed by a factor of (kr)^4). In this section we compare the gravitational waves extracted from the various boundary treatment solutions on a sphere of radius r = Rex, using the methods described in Appendix A.5. We note that Ψ4 (Ψ0) has a coordinate-invariant meaning only in the limit as future (past) null infinity is approached.
The quantities computed at finite radius r will differ from those observed at infinity by terms of the order O(1/r). In the particular case of perturbed Schwarzschild spacetime considered here, a gauge-invariant wave extraction method does exist even at finite radius (see e.g. [44] and references therein) but we do not adopt it here. Our purpose in this paper is merely to measure the effects on Ψ4 caused by the various boundary treatments.

5.1. Difference of Ψ4 with respect to the reference solution

We begin by evaluating ∆Ψ4 ≡ Ψ4 − Ψ4^ref, where Ψ4 is the Newman-Penrose scalar computed using one of the various boundary methods and Ψ4^ref is the same quantity computed from the reference solution at the same extraction radius. The curves shown in figure 10 plot the maximum value of |∆Ψ4| over time intervals of length 20M (this time filtering averages over the high frequency quasi-normal oscillations of the black hole), normalized by the maximum value of |∆Ψ4| over the entire evolution. The radius of the outer boundary (or the compactification scale, or the size of the sponge-free region, respectively) used for these comparisons is R = 41.9M, and the radiation is extracted nearby at Rex = 40M.

Figure 9. Sponge layer method (solid) vs. new CPBCs at two different radii (dotted). Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The size of the sponge-free region is R = 41.9M and ||∆U||∞ is only computed for r ≤ R. [Four panels; horizontal axis t/M from 0 to 1000; panel labels: CPBC at R = 41.9M (two panels) and CPBC at R = 121.9M (two panels).]
The first peak in |∆Ψ4| seen in figure 10 arises as the wave in our test problem passes outward through the extraction sphere at t ≈ Rex. This peak is caused by a presently unknown (probably gauge) interaction between the outer boundary (or compactified region etc.) and the spacetime near the extraction sphere. We have verified that this interaction and its influence on the peak in ∆Ψ4 go away if we move the outer boundary (or the extraction surface) so that they are not in causal contact as the outgoing wave pulse passes the extraction surface.

Some of the outgoing radiation is reflected off the boundary. Most of this reflected radiation is subsequently absorbed by the black hole, but some of it excites the hole, which then emits quasi-normal mode radiation of exponentially decaying amplitude. This exponential decay can be clearly seen for most of the boundary treatments.

In the case of freezing boundary conditions, nearly all of the outgoing quasi-normal mode radiation is reflected from the boundary because the reflection coefficient is nearly 1 for the wave number of the dominant mode, k = 0.376/M (cf. figure 1). It then re-excites the black hole, which again radiates and so forth. On average the amplitude of the reflections remains roughly constant in time for this case. This behaviour is consistent with the result shown in figure 3 of [6] for a similar perturbed black hole simulation.

For the Sommerfeld and Kreiss-Winicour boundary conditions, the reflections are much smaller but still considerably larger (by 2 to 3 orders of magnitude) than for our CPBCs. We attribute this difference largely to our use of the physical boundary condition (12).

The spatial compactification method has the largest difference |∆Ψ4|, particularly at early times t ∼ R (about 4 orders of magnitude larger than for the CPBCs).
We suspect that this may be a consequence of the use of artificial dissipation, as discussed in section 4.1.

The sponge layer method has the smallest errors at early times. This is not surprising because the outer boundary of the sponge layer is much further out at R = 121.9M. However at later times when the waves begin to interact with the sponge layer, this method causes reflections comparable in amplitude to those using Sommerfeld boundary conditions.

We also note that at late times the level of |∆Ψ4| decreases significantly with resolution for the CPBCs, but not generally for the other boundary treatments.

We think it is remarkable that the maximum relative error in the extracted physical radiation is quite small (10^{-5} to 10^{-3}) in these tests, even for the less sophisticated boundary treatments such as the freezing or Sommerfeld boundary conditions. This success is due in part to the fact that the extraction radius, Rex = 40M, for this test problem is about ten wavelengths (of the initial radiation pulse) away from the central black hole. Our results are likely to be more accurate than those from typical binary black hole simulations, which place the outer boundary at two or three wavelengths. This suggests that current binary black hole codes using, for instance, Sommerfeld boundary conditions, can still produce waveforms that are useful for some aspects of gravitational wave data analysis provided the outer boundary is placed sufficiently far out. Data analysis applications needing high precision waveforms, however, such as source parameter measurement or high-amplitude supermassive binary black hole signal subtraction for LISA, will need to use a more sophisticated boundary treatment that produces smaller errors in Ψ4.

5.2. Comparison with the predicted reflection coefficient

Buchman and Sarbach [18, 19] have recently developed a hierarchy of increasingly absorbing physical boundary conditions for the Einstein equations by analyzing the equations describing the evolution of the Weyl curvature on both a flat and a Schwarzschild background spacetime. Their analysis predicts, in particular, the reflection coefficient ρ (defined as the ratio of the ingoing to the outgoing parts of the solution) that arises from the ∂tΨ0 = 0 physical boundary condition that we use. For quadrupolar radiation (as in our numerical tests), this reflection coefficient is given by equation (89) of [18],

$$ \rho(kR) = \frac{3}{2}\,(kR)^{-4} + O(kR)^{-5}, \tag{35} $$

where k is the wave number of the gravitational radiation and R is the boundary radius. (As explained at the beginning of section 2.3, we assume the background spacetime to be flat; effects due to the backscattering would only enter at O(M/R).)

Figure 10. Difference of Ψ4 for the various alternate methods (solid) vs. the new CPBCs (dotted). Two resolutions are shown: (Nr, L) = (31, 10) and (51, 14). The radius of the outer boundary (or the compactification scale, or the size of the sponge-free region, respectively) is R = 41.9M and the waves are extracted at Rex = 40M. [Five panels; horizontal axis t/M from 0 to 1000; panel labels: Freezing, Sommerfeld, Kreiss-Winicour, TAN compactification (filter 4), Sponge layer.]

Figure 11. Comparison of the time Fourier transform of the measured Ψ0(t) with (3/2)(kR)^{-4} Ψ4, which is the predicted value using the reflection coefficient of [18]. [Two panels, for R = 21.9M and R = 121.9M; horizontal axis k from 0 to 7.]
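As a rough numerical illustration of the (kR)^{-4} scaling in equation (35), the following Python sketch evaluates the leading-order term and the factor by which it drops when the boundary is moved outward. The function names are hypothetical, the prefactor is passed in explicitly rather than fixed, and the O(kR)^{-5} corrections are ignored, so this is only an order-of-magnitude aid and not part of the analysis of [18].

```python
def leading_order_reflection(k, R, prefactor):
    """Leading-order term prefactor * (k R)**-4 of the reflection coefficient.

    k is the wave number in units of 1/M, R the boundary radius in units
    of M, and `prefactor` the numerical coefficient of the (kR)^-4 term.
    Higher-order terms in 1/(kR) are neglected (assumption of this sketch).
    """
    return prefactor * (k * R) ** -4


def suppression_between_radii(R_small, R_large):
    """Factor by which the leading-order reflection decreases when the
    boundary is moved from R_small to R_large at fixed wave number."""
    return (R_large / R_small) ** 4


if __name__ == "__main__":
    # The two boundary radii shown in figure 11 (in units of M).
    for R in (21.9, 121.9):
        print(R, leading_order_reflection(k=1.0, R=R, prefactor=1.5))
    print(suppression_between_radii(21.9, 121.9))
```

Because only the ratio of radii enters `suppression_between_radii`, the comparison between the two panels of figure 11 does not depend on the value chosen for the prefactor.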
By evaluating Ψ0 and Ψ4 at the extraction radius of our test, we find that the ratio Ψ0/Ψ4 agrees with their predicted ρ to leading order in 1/(kR). We note that the tetrad we use for wave extraction (Appendix A.5) does not agree exactly with that of [18]. However, the tetrads do agree for the unperturbed Schwarzschild solution, so that the errors introduced into Ψ0 and Ψ4 due to our different choice of tetrad are second-order small in perturbation theory and hence the comparison with [18] is consistent.

For a numerical solution using our new CPBCs, we evaluate the Newman-Penrose scalars Ψ0(t) and Ψ4(t) on extraction spheres located 1.9M inside the outer boundary. In figure 11 we plot the time Fourier transforms of these quantities. We also plot (3/2)(kR)^{-4} Ψ4, which by the above argument should agree with Ψ0 to leading order in 1/(kR). Figure 11 shows that the numerical agreement is reasonably good: roughly at the expected level of accuracy. The overall dependence of the predicted reflection coefficient ρ on k and R is captured very well. We surmise that the levelling off of our numerical Ψ0 for k ≳ 3 is due to numerical roundoff effects. (Note the magnitude of Ψ0 at those frequencies.) For radii R ≳ 200M, Ψ0 is at the roundoff level for all frequencies.

6. Discussion

The purpose of this paper is to compare various methods of treating the outer boundary of the computational domain. We evaluate the performance of several often-used boundary treatments in numerical relativity by measuring the amount of spurious reflections and constraint violations they generate. To this end, we consider as a test problem an outgoing gravitational wave superimposed on a Schwarzschild black hole spacetime. First we compute this numerical solution on a reference domain, large enough that the influence of the outer boundary can be neglected.
Then we repeat the evolution on smaller domains using one of the boundary treatments, either imposing local boundary conditions, compactifying the domain using a radial coordinate map, or installing a sponge layer. We use a first-order generalized harmonic formulation of the Einstein equations, although these boundary methods can be applied to other formulations as well. We believe our results are fairly independent of the particular formulation used.

Our main conclusion is that our version of constraint-preserving boundary conditions performs better than any of the alternate treatments that we tested. Our boundary conditions include a limitation on the influx of spurious gravitational waves by freezing the Newman-Penrose scalar Ψ0 at the boundary. We also introduce and test an improved boundary condition for the gauge degrees of freedom.

For some of the simple boundary conditions, such as freezing or Sommerfeld conditions, we find constraint violations that do not converge away with increasing resolution. The continuum limit does not satisfy Einstein's equations in these cases. Most of the alternate boundary conditions also generate considerable reflections as measured by ∆U, the norm of the difference with respect to the reference solution. In many cases, these reflections do not decrease significantly with increasing resolution.

The difference norm ∆U that we use to measure boundary reflections includes the entire spacetime metric, not just the physical degrees of freedom. It is important then to evaluate separately the effects of the various boundary treatments on the physical degrees of freedom. We use the extracted outgoing radiation as approximated by the Newman-Penrose scalar Ψ4 for this purpose. Here our conclusions are somewhat different. Rather surprisingly, most of the boundary methods we consider generate relatively small errors in Ψ4.
This suggests that if gravitational waveforms are only needed to an accuracy of, say, 1% (which is comparable to the discrepancies between recent binary black hole simulations [45]) then even the simple Sommerfeld conditions might be good enough. (For those, we find relative errors ∼ 10^{-5}.) The largest relative errors in Ψ4 we find (∼ 10^{-2}) occur with our implementation of the spatial compactification method used by Pretorius [15, 16, 17]. We attribute these largely to the use of artificial dissipation. Undesirable effects of dissipation might be somewhat less severe in binary black hole evolutions, which have much larger wavelengths (λ ∼ 20M to 100M) than ours (λ ∼ 4M). Our tests suggest that the errors in Ψ4 can be made to decrease significantly with resolution only by using more sophisticated constraint-preserving and physical boundary conditions. The importance of using a physical boundary condition on Ψ0 is illustrated in particular by the difference between the performance of our boundary conditions and those of Kreiss and Winicour [14].

Some caveats regarding the interpretation of our results must be stated. First, the ratio of the dominant wavelength to the radius of the outer boundary is typically much larger for binary black hole evolutions (where λ/R ≳ 0.5) than for the simple test problem considered here (where λ/R ∼ 0.1). Boundary treatments generally work better for smaller λ/R, i.e. when the boundary is well out in the wave zone. Hence the results presented here are likely to be more accurate than those from typical binary black hole simulations.

Second, we use spectral methods rather than finite-difference methods, which are more commonly used in numerical relativity at this time. This complicates the implementation of the kind of numerical dissipation that is crucial for the spatial compactification method to work.
While we have attempted to construct a filter that mimics the finite-difference dissipation as closely as possible, a direct comparison is clearly impossible. In finite-difference methods, the error introduced by the type of numerical dissipation considered here is below the truncation error. Hence tests similar to ours but performed with a finite-difference method would not be able to detect the effect of dissipation.

There are several directions in which the present work could be extended. For large values of the outer boundary radius, we observe a non-convergent power-law growth of the error in our test problem when constraint-preserving boundary conditions are used; the origin of this growth should be investigated further. It would be interesting to implement and test the hierarchy of physical boundary conditions that are perfectly absorbing for linearized gravity (including leading-order corrections due to the curvature and backscatter) found recently by Buchman and Sarbach [18, 19]. Our boundary conditions could also be tested using known exact solutions such as gauge waves, and comparisons could be made with the results found in [5].

For completeness we also mention a number of additional outer boundary approaches that were not addressed in this paper, but would also be interesting future extensions of this research. In [46, 47], boundary conditions for the full nonlinear Einstein equations on a finite domain are obtained by matching to exact solutions of the linearized field equations at the boundary. Alternatively, the interior code could be matched to an 'outer module' that solves the linearized field equations numerically [48, 49, 50, 51]. Other approaches involve matching the interior nonlinear Cauchy code to an outer characteristic code (see [52] for a review) or using hyperboloidal spacetime slices that can be compactified towards null infinity (see [53] for a review).

Appendix A. Details on the numerical test problem

Appendix A.1. Initial data

The initial data used for our numerical tests are the same as in [27]. The background solution is a Schwarzschild black hole in Kerr-Schild coordinates,

$$ ds^2 = -dt^2 + \frac{2M}{r}\,(dt + dr)^2 + dr^2 + r^2\, d\Omega^2. \tag{A.1} $$

Throughout the paper, M refers to the bare black hole mass of the unperturbed background. We superpose an odd-parity outgoing quadrupolar wave perturbation constructed using Teukolsky's method [54]. Its generating function is taken to be a Gaussian G(r) = A exp[−(r − r0)^2/w^2] with A = 4 × 10^{-3}, r0 = 5M, and w = 1.5M. The full non-linear initial value equations in the conformal thin sandwich formulation are then solved to obtain initial data that satisfy the constraints [55]. This yields initial values for the spatial metric, extrinsic curvature, lapse function, and shift vector. We note that after the superposition, the resulting solution is still nearly but not completely outgoing.

Our generalized harmonic formulation of Einstein's equations requires initial data for the full spacetime metric and its first time derivative. These can be computed from the 3+1 quantities obtained above, provided we also choose initial values for the time derivatives of the lapse function and shift vector. These initial time derivatives are freely specifiable and are equivalent to the initial choice of the gauge source function H_a; we choose ∂_t N = 0 and ∂_t N^i = 0 at t = 0.

Appendix A.2. Numerical method

We use a pseudospectral collocation method as described for example in [27]. The computational domain for the test problem considered here is taken to be a spherical shell extending from r = 1.9M (just inside the horizon) out to some r = R. This domain is subdivided into spherical-shell subdomains of extent ∆r = 10M.
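The domain layout just described can be sketched in a few lines of Python. This is only an illustration under the stated geometry (inner radius 1.9M, shells of fixed width 10M); the constant and function names are hypothetical and not taken from the code used for the simulations.

```python
# All radii are in units of the black hole mass M.
INNER_RADIUS = 1.9   # inner edge of the domain, just inside the horizon
SHELL_WIDTH = 10.0   # radial extent of each spherical-shell subdomain


def shell_edges(n_subdomains):
    """Radii of the subdomain boundaries for a domain built from
    n_subdomains shells of equal width."""
    return [INNER_RADIUS + i * SHELL_WIDTH for i in range(n_subdomains + 1)]


def outer_radius(n_subdomains):
    """Outer boundary radius R of a domain with n_subdomains shells."""
    return shell_edges(n_subdomains)[-1]


def shares_inner_shells(n_small, n_large):
    """True if the smaller domain coincides with the innermost shells of
    the larger one, which is what fixing the shell width guarantees."""
    small = shell_edges(n_small)
    return shell_edges(n_large)[: len(small)] == small


if __name__ == "__main__":
    print(outer_radius(4))    # four shells
    print(outer_radius(12))   # twelve shells
    print(shares_inner_shells(4, 12))
```

Keeping the shell width fixed means that a smaller domain reuses exactly the same collocation grid as the inner part of a larger one, so differences between runs can be attributed to the boundary treatment rather than to the grid.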
On each subdomain, the numerical solution is expanded in Chebyshev polynomials in the radial direction and in spherical harmonics in the angular directions (where each Cartesian tensor component is expanded in the standard scalar spherical harmonics). Typical resolutions are Nr ∈ {21, 31, 41, 51} coefficients per subdomain for the Chebyshev series and l ≤ L with L ∈ {8, 10, 12, 14} for the spherical harmonics. We change the outer boundary radius R by changing the number of subdomains while keeping the width ∆r of each subdomain fixed; this facilitates direct comparisons between runs with different values of R. For example, the innermost four subdomains of the reference solution (which has a total of 96 subdomains and R = 961.9M) are identical to the four subdomains used to compute the solution with R = 41.9M.

The evolution equations are integrated in time using a fourth-order Runge-Kutta scheme, with a Courant factor ∆t/∆xmin of at most 2.25, where ∆xmin is the smallest distance between two neighbouring collocation points. As described in [27], the top four coefficients in the tensor spherical harmonic expansion of each of our evolved quantities are set to zero after each time step; this eliminates an instability associated with the inconsistent mixing of tensor spherical harmonics in our approach.

We use two methods of numerically implementing boundary conditions; the choice of method depends on the type of boundary conditions. Boundary conditions that can be expressed as algebraic relations involving the characteristic fields are implemented using a penalty method (see [56] and references therein; in the context of finite-difference methods see also [57] and references therein). In particular, we use a penalty method to implement the Kreiss-Winicour boundary conditions (cf. section 3.3) and to impose boundary conditions at the internal boundaries between neighbouring subdomains.
Boundary conditions that are expressed in terms of the time derivatives of the characteristic fields are implemented using the method of Bjørhus [58], where the time derivatives of the incoming characteristic fields are replaced at the boundary with the relevant boundary condition. All boundary conditions in this paper besides those mentioned above are implemented using the Bjørhus method.

Appendix A.3. Gauge source functions

Our generalized harmonic formulation [6] of Einstein's equations allows for gauge source functions that depend arbitrarily on the coordinates and the spacetime metric: H_a = H_a(t, x, ψ). The generalized harmonic evolution equations are equivalent to Einstein's equations only if the constraint (4) remains satisfied. We choose the time derivatives of lapse and shift to be zero at the beginning of the simulation; this determines the initial value of H_a via the constraint (4). For the subsequent evolution, we hold this H_a fixed in time.

Appendix A.4. Error quantities

We use two different measurements of the errors in our solutions, which we monitor during our numerical evolutions. First, given a numerical solution (ψ_ab, Π_ab, Φ_iab), the difference between that solution and the reference solution (ψ^(ref)_ab, Π^(ref)_ab, Φ^(ref)_iab) is computed with the following norm at each point in space,

$$ (\Delta U)^2 \equiv \delta^{ab}\delta^{cd}\left( M^{-2}\,\Delta\psi_{ac}\Delta\psi_{bd} + \Delta\Pi_{ac}\Delta\Pi_{bd} + g^{ij}\,\Delta\Phi_{iac}\Delta\Phi_{jbd} \right), \tag{A.2} $$

where ∆ψ_ab means ψ_ab − ψ^(ref)_ab, and similarly for ∆Π_ab and ∆Φ_iab. Second, we define a quantity C that measures the violations in all of the constraints of our system,

$$ C^2 \equiv \delta^{ab}\left[ F_a F_b + g^{ij}\left( C_{ia}C_{jb} + g^{kl}\delta^{cd}\, C_{ikac}C_{jlbd} \right) + M^{-2}\left( C_a C_b + g^{ij}\delta^{cd}\, C_{iac}C_{jbd} \right) \right], \tag{A.3} $$

where F_a and C_ia are first derivatives of C_a defined in [6].

To compute global error measures, a spatial norm ||·||, either the L∞ norm or the L2 norm, is applied separately to ∆U and C. The question often arises as to the significance of particular values of ||∆U|| and ||C||.
For example, is a simulation with ||C|| = 10^{-2} good to one percent accuracy? To make it easier to answer such questions, we normalize both ||∆U|| and ||C|| as follows, and we always plot normalized quantities. We divide ||∆U|| by a normalization factor ||∆U0||, defined as the difference between a given solution at t = 0 and the unperturbed Schwarzschild background; i.e., the quantity ||∆U0|| is computed from (A.2) using the unperturbed Schwarzschild solution instead of the reference solution. Since ||∆U0|| is evaluated at t = 0, it depends only on the initial data used in the simulation, and is a measure of the amplitude of the superposed gravitational wave perturbation. For the initial data used here, ||∆U0||∞ = 6 × 10^{-3} and ||∆U0||2 = 1.4 × 10^{-4}. The quantity ||∆U||/||∆U0|| is more easily interpreted than ||∆U||; for example, ||∆U||/||∆U0|| is unity when the difference from the reference solution is of the same size as the initial perturbation.

Similarly, the constraint energy norm ||C|| is divided by the norm of the first derivatives ||∂U|| (at the respective time),

$$ (\partial U)^2 \equiv g^{ij}\delta^{ab}\delta^{cd}\left( M^{-2}\,\partial_i\psi_{ac}\,\partial_j\psi_{bd} + \partial_i\Pi_{ac}\,\partial_j\Pi_{bd} + g^{kl}\,\partial_i\Phi_{kac}\,\partial_j\Phi_{lbd} \right). \tag{A.4} $$

The constraints for our system are linear combinations of the first derivatives of the fields, hence ||C||/||∂U|| ∼ 1 corresponds to a complete violation of the constraints.

Appendix A.5. Wave extraction

For evaluating gravitational waveforms, we compute the Newman-Penrose scalars

$$ \Psi_0 = -C_{abcd}\, l^a m^b l^c m^d, \qquad \Psi_4 = -C_{abcd}\, k^a \bar{m}^b k^c \bar{m}^d, \tag{A.5} $$

where C_abcd is the Weyl tensor, l^a and k^a are outgoing and ingoing null vectors normalized according to l^a k_a = −1, m^a is a complex unit null spatial vector orthogonal to l^a and k^a, and m̄^a is the complex conjugate of m^a. For perturbations of flat spacetime, there is a standard choice for the vectors l^a, k^a, and m^a. In general curved spacetimes, however, no such prescription for the tetrad exists that would produce coordinate-independent quantities Ψ0 and Ψ4 at finite radius.
We choose the null vectors according to

$$ l^a = \frac{1}{\sqrt{2}}\,(t^a + n^a), \qquad k^a = \frac{1}{\sqrt{2}}\,(t^a - n^a), \tag{A.6} $$

where t^a is the future-pointing unit timelike normal to the t = const. slices and n^a is the unit spacelike normal to the extraction sphere. Finally, we choose

$$ m^a = \frac{1}{\sqrt{2}\,r}\left( \frac{\partial}{\partial\theta} + \frac{i}{\sin\theta}\,\frac{\partial}{\partial\phi} \right)^a, \tag{A.7} $$

where (r, θ, φ) are spherical coordinates on the r = Rex = const. extraction sphere. Note that our choice of m^a is not exactly null nor of unit magnitude at finite extraction radius. However, the tetrad is orthonormal for the unperturbed Schwarzschild solution, so that the errors introduced into Ψ0 and Ψ4 because of the lack of tetrad orthonormality will be second-order small in perturbation theory.

The quantity Ψ4 corresponds to outgoing radiation in the limit of r → ∞, t − r = const., i.e. as future null infinity is approached. Similarly Ψ0 corresponds to ingoing radiation as past null infinity is approached. At finite extraction radius, Ψ4 and Ψ0 will disagree with the waveforms observed at infinity by terms of the order O(Rex^{-1}).

We decompose the quantities Ψ4 and Ψ0 in terms of spin-weighted spherical harmonics of spin-weight −2 on the extraction surface. Since our perturbation is an odd-parity quadrupole wave, the imaginary part of the (l = 2, m = 0) spherical harmonic is by far the dominant contribution to Ψ4, and we only display that mode in our plots. We normalize the curves in our graphs by the maximum (in time) value of |Ψ4| at the extraction radius Rex, which for Rex = 40M is max |Ψ4| = 6 × 10^{-4}.

Appendix B. Details of the alternate approaches

In this appendix, we provide some more details on the alternate boundary treatments discussed in section 4: spatial compactification and sponge layers.

Appendix B.1. Spatial compactification

We implement spatial compactification by introducing a radial coordinate transformation x → r(x) that maps a compact ball on the computational grid with x ∈ [0, xmax] to the full unbounded physical slice with r ∈ [0, ∞]. We consider two such mappings. The Tan mapping is similar to the one used by Pretorius [15, 16, 17] and is given by

$$ r_{\rm Tan}(x) = R \tan\left( \frac{\pi x}{4R} \right), \qquad 0 \le x < 2R. \tag{B.1} $$

The scale R determines the range in physical radius r across which the map is essentially linear (see figure B1). When comparing compactification with other boundary treatments, we compare quantities only in the region r < R. (The scale R is equal to unity in the work of Pretorius. He uses mesh refinement to obtain the appropriate resolution close to the origin, while we fix the resolution and choose the scale R appropriately.) We also tested an Inverse map defined by

$$ r_{\rm Inverse}(x) = \begin{cases} x, & 0 \le x \le R, \\ \dfrac{R^2}{2R - x}, & R < x < 2R, \end{cases} \tag{B.2} $$

see figure B1. This map is only C^1 at x = R, but we maintain spectral accuracy in our tests by placing this surface at the boundary between spectral subdomains.

Figure B1. Compactification mappings (left) and filter functions (right). The dashed line indicates the boundary of the region in which the compactification mapping is (essentially) linear. [Left panel: the Tan and Inverse maps, horizontal axis x/R from 0 to 2. Right panel: filters 1 and 2, and filters 3 and 4, horizontal axis k/kmax from 0 to 1.]

Dissipation is needed to remove the short wavelength components of the waves as they travel outward on the compactified computational grid and become unresolved. We apply this dissipation only in the radial direction, but everywhere in the computational domain. In spectral methods, dissipation can be conveniently implemented in the form of a spectral filter. This filter is applied by multiplying each spectral expansion coefficient of index k by a function f(k). (See Appendix A.2 for details on the pseudospectral method we use.) Higher values of k correspond to shorter wavelengths in the numerical approximation; let kmax be the highest index used in the spectral expansion.

The first filter function we consider is the closest analogue in the context of our spectral methods to Kreiss-Oliger [59] dissipation,

$$ f_{\rm Kreiss\text{-}Oliger}(k) = 1 - \epsilon \sin^4\left( \frac{\pi k}{2 k_{\max}} \right), \qquad 0 \le \epsilon \le 1. \tag{B.3} $$

Typical values of the parameter ε used by Pretorius are ε ∈ [0.2, 0.5]; we use ε = 0.25.

This filter was derived via a comparison with finite-difference methods as follows. In the finite-difference approach, a numerical solution u is represented on a set of equidistant grid points x_j. (It suffices to consider the one-dimensional case here.) Some form of numerical dissipation is usually required for the finite-difference method to be stable. The one that is most often used for second-order accurate methods is fourth-order Kreiss-Oliger dissipation [59]. One possible implementation of this, used e.g. by Pretorius, amounts to replacing

$$ u \to F[u] \equiv \left( 1 - \frac{\epsilon}{16}\, h^4 D_4 \right) u \tag{B.4} $$

at each time step, where h is the grid spacing and D_4 is the second-order accurate centred finite difference operator approximating the fourth derivative,

$$ D_4 u_j = h^{-4}\left( u_{j-2} - 4u_{j-1} + 6u_j - 4u_{j+1} + u_{j+2} \right). \tag{B.5} $$

Taking u to be a Fourier mode u^(k)_j = exp(i k x_j), we have h^4 D_4 u^(k) = 16 sin^4(kh/2) u^(k), so the mode is damped by a frequency-dependent factor,

$$ u^{(k)} \to F[u^{(k)}] \equiv \left[ 1 - \epsilon \sin^4\left( \frac{\pi k}{2 k_{\max}} \right) \right] u^{(k)}, \tag{B.6} $$

where kmax = π/h is the Nyquist frequency. Thus we obtain the filter function (B.3).

Strictly speaking, the above analysis only applies to Fourier expansions and not to the Chebyshev expansions we use. Nevertheless, we apply the filter in the form (B.3) to our Chebyshev expansion coefficients. Note that in (B.6), each spectral coefficient u^(k) is filtered separately; this is not true for the analogous calculation for a Chebyshev expansion.

We also use a different filter function, which we call the Exponential filter, that is often used in spectral methods (see [60] and references therein),

$$ f_{\rm Exponential}(k) = \exp\left[ -\left( \frac{k}{\sigma k_{\max}} \right)^p \right]. \tag{B.7} $$

Typical values of the parameters are σ = 0.76 and p = 13. This choice of parameters gives less dissipation at small values of k than the Kreiss-Oliger filter, and also ensures that f(kmax) ≈ 10^{-16} is at the level of the numerical roundoff error.

There are various ways the filters can be applied in a numerical evolution. We have experimented with two different methods. In the first method, the filter is applied to the right side of the equations, i.e. the evolution equations ∂_t u = S are modified according to ∂_t u = F[S], where F[S] is the filtered right side. In the second method, the filter is instead applied to the solution itself, i.e. after each substep of the time integrator (cf. Appendix A.2), the numerical solution u is replaced with its filtered version F[u]. This second method is closest to how the Kreiss-Oliger filter is applied by Pretorius.

For our numerical tests, we have used four different combinations of the various options described above. They are summarized in table B1.

Table B1. Details of the filtering methods

No.   Type            Parameters          Applied to
1     Kreiss-Oliger   ε = 0.25            right side
2     Kreiss-Oliger   ε = 0.25            solution
3     Exponential     σ = 0.76, p = 13    right side
4     Exponential     σ = 0.76, p = 13    solution

Appendix B.2. Sponge layers

For sponge layers we must specify a sponge profile function γ(r), as defined in (34). We choose γ(r) to be nonzero only outside some sponge-free region of radius R, and when comparing sponge layers with other boundary treatments, we compare quantities only in the sponge-free region r < R. The sponge profile function γ(r) we use is a Gaussian centred at the outer boundary, which we choose to place at r = 3R,

$$ \gamma(r) = \gamma_0 \exp\left[ -\left( \frac{r - 3R}{\sigma} \right)^2 \right]. \tag{B.8} $$

The amplitude of the Gaussian is taken to be γ0 = 1. The width σ is chosen so that γ(r) ≤ 10^{-16} (the numerical roundoff error) for r ≤ R, which requires σ ≲ R/3. In our numerical example, we take R = 41.9M and σ = 13.3M.
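The two roundoff-level statements in this appendix can be checked with a short Python sketch. It assumes the functional forms 1 − ε sin^4(πk/(2 kmax)) for the Kreiss-Oliger filter, exp[−(k/(σ kmax))^p] for the Exponential filter, and γ0 exp[−((r − 3R)/σ)^2] for the sponge profile; these forms and all function names are assumptions made for illustration, not the implementation used for the simulations.

```python
import math


def kreiss_oliger_filter(k, k_max, eps=0.25):
    """Assumed Kreiss-Oliger-type filter: 1 - eps * sin^4(pi k / (2 k_max))."""
    return 1.0 - eps * math.sin(math.pi * k / (2.0 * k_max)) ** 4


def exponential_filter(k, k_max, sigma=0.76, p=13):
    """Assumed Exponential filter: exp(-(k / (sigma k_max))^p)."""
    return math.exp(-((k / (sigma * k_max)) ** p))


def sponge_profile(r, R, sigma, gamma0=1.0):
    """Assumed Gaussian sponge profile centred at the outer boundary r = 3R."""
    return gamma0 * math.exp(-(((r - 3.0 * R) / sigma) ** 2))


if __name__ == "__main__":
    k_max = 50
    # Damping of the highest mode for both filter types.
    print(kreiss_oliger_filter(k_max, k_max), exponential_filter(k_max, k_max))
    # Damping of a mode in the middle of the spectrum.
    print(kreiss_oliger_filter(k_max / 2, k_max), exponential_filter(k_max / 2, k_max))
    # Sponge strength at the edge of the sponge-free region and at r = 3R.
    R, sigma = 41.9, 13.3  # in units of M
    print(sponge_profile(R, R, sigma), sponge_profile(3.0 * R, R, sigma))
```

With these forms, the Exponential filter leaves mid-range modes almost untouched while reducing the highest mode to the roundoff level, and the sponge strength at r = R falls below 10^{-16} for the quoted width.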
Hence σ is considerably larger than the wavelength λ ≈ 4M of the gravitational wave, which is required in order to avoid reflections from the sponge layer (cf. section 17.2.3 of [43]). Figure B2 shows a plot of this sponge profile.

[Figure B2. The sponge profile function γ(r), plotted against r/R. The dashed line indicates the boundary of the region where γ is below the numerical roundoff error.]

Acknowledgments

We thank Luisa Buchman, Jan Hesthaven, Larry Kidder, Harald Pfeiffer, Olivier Sarbach, and Jeff Winicour for helpful discussions concerning this work. The numerical simulations presented here were performed using the Spectral Einstein Code (SpEC) developed at Caltech and Cornell primarily by Larry Kidder, Mark Scheel and Harald Pfeiffer. This work was supported in part by grants from the Sherman Fairchild Foundation, and from the Brinson Foundation; by NSF grants PHY-0099568, PHY-0244906, PHY-0601459, DMS-0553302 and NASA grants NAG5-12834, NNG05GG52G.

References

[1] Novak J and Bonazzola S 2004 Absorbing boundary conditions for simulation of gravitational waves with spectral methods in spherical coordinates J. Comput. Phys. 197 86–196
[2] Rinne O 2005 Axisymmetric Numerical Relativity Ph.D. thesis Univ. of Cambridge Preprint http://www.arxiv.org/abs/gr-qc/0601064
[3] Lau S L 2004 Rapid evaluation of radiation boundary kernels for time-domain wave propagation on black holes: implementation and numerical tests Class. Quantum Grav. 21 4147–4192
[4] Babiuc M C, Szilágyi B and Winicour J 2006 Harmonic initial-boundary evolution in general relativity Phys. Rev. D 73 064017
[5] Babiuc M C, Kreiss H O and Winicour J 2007 Constraint-preserving Sommerfeld conditions for the harmonic Einstein equations Phys. Rev. D 75 044002
[6] Lindblom L, Scheel M A, Kidder L E, Owen R and Rinne O 2006 A new generalized harmonic evolution system Class. Quantum Grav.
23 S447–S462
[7] Brügmann B, Tichy W and Jansen N 2004 Numerical simulation of orbiting black holes Phys. Rev. Lett. 92 211101
[8] Campanelli M, Lousto C O, Marronetti P and Zlochower Y 2006 Accurate evolutions of orbiting black-hole binaries without excision Phys. Rev. Lett. 96 111101
[9] Baker J G, Centrella J, Choi D I, Koppitz M and van Meter J 2006 Gravitational-wave extraction from an inspiraling configuration of merging black holes Phys. Rev. Lett. 96 111102
[10] Diener P, Herrmann F, Pollney D, Schnetter E, Seidel E, Takahashi R, Thornburg J and Ventrella J 2006 Accurate evolution of orbiting binary black holes Phys. Rev. Lett. 96 121101
[11] Herrmann F, Hinder I, Shoemaker D and Laguna P 2007 Unequal-mass binary black hole plunges and gravitational recoil Class. Quantum Grav. 24 S33–S42
[12] Shibata M and Nakamura T 1995 Evolution of three-dimensional gravitational waves: Harmonic slicing case Phys. Rev. D 52 5428
[13] Baumgarte T W and Shapiro S L 1998 Numerical integration of Einstein’s field equations Phys. Rev. D 59 024007
[14] Kreiss H O and Winicour J 2006 Problems which are well-posed in a generalized sense with applications to the Einstein equations Class. Quantum Grav. 16 S405–S420
[15] Pretorius F 2005 Numerical relativity using a generalized harmonic decomposition Class. Quantum Grav. 22 425–452
[16] Pretorius F 2005 Evolution of binary black hole spacetimes Phys. Rev. Lett. 95 121101
[17] Pretorius F 2006 Simulation of binary black hole spacetimes with a harmonic evolution scheme Class. Quantum Grav. 23 S529–S552
[18] Buchman L T and Sarbach O C A 2006 Towards absorbing outer boundaries in general relativity Class. Quantum Grav. 23 6709–6744
[19] Buchman L T and Sarbach O C A 2007 Improved outer boundary conditions for Einstein’s field equations Class. Quantum Grav. 24 S307–S326
[20] Gundlach C, Calabrese G, Hinder I and Martín-García J M 2005 Constraint damping in the Z4 formulation and harmonic gauge Class. Quantum Grav. 22 3767–3774
[21] Stewart J M 1998 The Cauchy problem and the initial boundary value problem in numerical relativity Class. Quantum Grav. 15 2865–2889
[22] Friedrich H and Nagy G 1999 The initial boundary value problem for Einstein’s vacuum field equations Comm. Math. Phys. 201 619–655
[23] Iriondo M S and Reula O A 2002 Free evolution of self-gravitating, spherically symmetric waves Phys. Rev. D 65 044024
[24] Calabrese G, Lehner L and Tiglio M 2002 Constraint-preserving boundary conditions in numerical relativity Phys. Rev. D 65 104031
[25] Calabrese G and Sarbach O 2003 Detecting ill-posed boundary conditions in general relativity J. Math. Phys. 44 3888–3899
[26] Calabrese G, Pullin J, Reula O, Sarbach O and Tiglio M 2003 Well posed constraint-preserving boundary conditions for the linearized Einstein equations Comm. Math. Phys. 240 377–395
[27] Kidder L E, Lindblom L, Scheel M A, Buchman L T and Pfeiffer H P 2005 Boundary conditions for the Einstein evolution system Phys. Rev. D 71 064020
[28] Bona C, Ledvinka T, Palenzuela-Luque C and Žáček M 2005 Constraint-preserving boundary conditions in the Z4 numerical relativity formalism Class. Quantum Grav. 22 2615–2634
[29] Sarbach O and Tiglio M 2005 Boundary conditions for Einstein’s field equations: Analytical and numerical analysis J. Hyp. Diff. Eq. 2 839–883
[30] Bardeen J M and Buchman L T 2002 Numerical tests of evolution systems, gauge conditions, and boundary conditions for 1D colliding gravitational plane waves Phys. Rev. D 65 064037
[31] Nagy G and Sarbach O 2006 A minimization problem for the lapse and the initial-boundary value problem for Einstein’s field equations Class. Quantum Grav. 16 S477–S504
[32] Rinne O 2006 Stable radiation-controlling boundary conditions for the generalized harmonic Einstein equations Class. Quantum Grav. 23 6275–6300
[33] Scheel M A, Pfeiffer H P, Lindblom L, Kidder L E, Rinne O and Teukolsky S A 2006 Solving Einstein’s equations with dual coordinate frames Phys. Rev. D 74 104006
[34] Pfeiffer H P, Brown D A, Kidder L E, Lindblom L, Lovelace G and Scheel M A 2007 Reducing orbital eccentricity in binary black hole simulations Class. Quantum Grav. 24 S59–S81
[35] Chrzanowski P L 1975 Vector potential and metric perturbations of a rotating black hole Phys. Rev. D 11 2042–2062
[36] Bayliss A and Turkel E 1980 Radiation boundary conditions for wave-like equations Comm. Pure Appl. Math. 33 707–725
[37] Rauch J 1985 Symmetric positive systems with boundary characteristics of constant multiplicity Trans. Am. Math. Soc. 291 167–187
[38] Secchi P 1996 The initial boundary value problem for linear symmetric hyperbolic systems with characteristic boundary of constant multiplicity Diff. Int. Eq. 9 671–700
[39] Secchi P 1996 Well-posedness of characteristic symmetric hyperbolic systems Arch. Rat. Mech. Anal. 134 155–197
[40] Szilágyi B, Pollney D, Rezzolla L, Thornburg J and Winicour J 2007 An explicit harmonic code for black-hole evolution using excision Class. Quantum Grav. 24 S275–S293
[41] Garfinkle D and Duncan G 2001 Numerical evolution of Brill waves Phys. Rev. D 63 044011
[42] Choptuik M, Lehner L, Olabarrieta I, Petryk R, Pretorius F and Villegas H 2003 Towards the final fate of an unstable black string Phys. Rev. D 68 044001
[43] Boyd J P 2001 Chebyshev and Fourier Spectral Methods 2nd ed (Dover publications)
[44] Pazos E, Dorband E N, Nagar A, Palenzuela C, Schnetter E and Tiglio M 2007 How far away is far enough for extracting numerical waveforms, and how much do they depend on the extraction method? Class. Quantum Grav. 24 S341–S368
[45] Baker J G, Campanelli M, Pretorius F and Zlochower Y 2007 Comparisons of binary black hole merger waveforms Class. Quantum Grav. 24 S25–S31
[46] Abrahams A M and Evans C R 1988 Reading off the gravitational radiation waveforms in numerical relativity calculations: Matching to linearized gravity Phys. Rev. D 37 318
[47] Abrahams A M and Evans C R 1990 Gauge-invariant treatment of gravitational radiation near the source: Analysis and numerical simulations Phys. Rev. D 42 2585
[48] Abrahams A M et al. 1998 Gravitational wave extraction and outer boundary conditions by perturbative matching Phys. Rev. Lett. 80 1812–1815
[49] Rupright M E, Abrahams A M and Rezzolla L 1998 Cauchy-perturbative matching and outer boundary conditions I: Methods and tests Phys. Rev. D 58 044005
[50] Rezzolla L, Abrahams A M, Matzner R A, Rupright M E and Shapiro S L 1999 Cauchy-perturbative matching and outer boundary conditions: Computational studies Phys. Rev. D 59 064001
[51] Zink B, Pazos E, Diener P and Tiglio M 2006 Cauchy-perturbative matching reexamined: Tests in spherical symmetry Phys. Rev. D 73 084011
[52] Winicour J 2005 Characteristic evolution and matching Living Rev. Relativity 8(10)
[53] Frauendiener J 2004 Conformal infinity Living Rev. Relativity 7(1)
[54] Teukolsky S A 1982 Linearized quadrupole waves in general relativity and the motion of test particles Phys. Rev. D 26 745–750
[55] Pfeiffer H P, Kidder L E, Scheel M A and Shoemaker D 2005 Initial data for Einstein’s equations with superposed gravitational waves Phys. Rev. D 71 024020
[56] Hesthaven J S 2000 Spectral penalty methods Appl. Numer. Math. 33 23–41
[57] Schnetter E, Diener P, Dorband E N and Tiglio M 2006 A multi-block infrastructure for three-dimensional time-dependent numerical relativity Class. Quantum Grav. 23 S553–S578
[58] Bjørhus M 1995 The ODE formulation of hyperbolic PDEs discretized by the spectral collocation method SIAM J. Sci. Comput.
16 542–557 [59] Kreiss H O and Oliger J 1973 Methods for the approximate solution of time dependent problems Global Atmospheric Research Programme (Publication Series No. 10) [60] Gottlieb D and Hesthaven J S 2001 Spectral methods for hyperbolic problems J. Comput. Appl. Math. 128 83–131 Introduction Constraint-preserving boundary conditions The generalized harmonic evolution system Construction of boundary conditions Improved gauge boundary condition Numerical results Alternate boundary conditions Freezing the incoming fields Sommerfeld boundary conditions Kreiss-Winicour boundary conditions Alternate approaches Spatial compactification Sponge layers Physical gravitational waves Difference of 4 with respect to the reference solution Comparison with the predicted reflection coefficient Discussion Details on the numerical test problem Initial data Numerical method Gauge source functions Error quantities Wave extraction Details of the alternate approaches Spatial compactification Sponge layers ABSTRACT Various methods of treating outer boundaries in numerical relativity are compared using a simple test problem: a Schwarzschild black hole with an outgoing gravitational wave perturbation. Numerical solutions computed using different boundary treatments are compared to a `reference' numerical solution obtained by placing the outer boundary at a very large radius. For each boundary treatment, the full solutions including constraint violations and extracted gravitational waves are compared to those of the reference solution, thereby assessing the reflections caused by the artificial boundary. These tests use a first-order generalized harmonic formulation of the Einstein equations. Constraint-preserving boundary conditions for this system are reviewed, and an improved boundary condition on the gauge degrees of freedom is presented. 
Alternate boundary conditions evaluated here include freezing the incoming characteristic fields, Sommerfeld boundary conditions, and the constraint-preserving boundary conditions of Kreiss and Winicour. Rather different approaches to boundary treatments, such as sponge layers and spatial compactification, are also tested. Overall the best treatment found here combines boundary conditions that preserve the constraints, freeze the Newman-Penrose scalar Psi_0, and control gauge reflections.
<|endoftext|><|startoftext|>
Introduction

We consider the flow of an incompressible fluid in a polyhedral set $\Omega \subset R^2$ during the time interval [0, T]. The velocity field $u : \Omega \times [0, T] \to R^2$ and the pressure field $p : \Omega \times [0, T] \to R$ satisfy the Navier-Stokes equations

$u_t - \frac{1}{Re} \Delta u + (u \cdot \nabla) u + \nabla p = f$, (1.1)
$\mathrm{div}\, u = 0$, (1.2)

with the boundary and initial conditions

$u|_{\partial\Omega} = 0$, $u|_{t=0} = u_0$.

The terms ∆u and (u·∇)u are associated with the physical phenomena of diffusion and convection, respectively. The Reynolds number Re measures the influence of convection in the flow. For equations (1.1)–(1.2), finite element and finite difference methods are well known and mathematical studies are available (see [9] for example).

Keywords and phrases: Incompressible fluids, Navier-Stokes equations, projection methods, finite volume.
17 rue Barrème - 69006 LYON. e-mail: Sebastien.Zimmermann@ec-lyon.fr
© EDP Sciences, SMAI 1999
http://arxiv.org/abs/0704.0783v2
TITLE WILL BE SET BY THE PUBLISHER

For finite volume schemes, numerous computations have been conducted ([12] and [1] for example). However, few mathematical results are available in this case. Let us cite Eymard and Herbin [6] and Eymard, Latché and Herbin [7]. In order to deal with the incompressibility constraint (1.2), these works use a penalization method. Another way is to use the projection methods which have been introduced by Chorin [4] and Temam [13]. This is the case in Faure [8] where the mesh is made of squares.
In Zimmermann [14] the mesh is made of triangles, so that more complex geometries can be considered. In the present paper the mesh is also made of triangles, but we consider a different discretization for the pressure. It leads to a linear system with a better-conditioned matrix.

The layout of the article is the following. We first introduce in section 2 the discrete setting. We state (section 2.1) some notations and hypotheses on the mesh. We define (section 2.2) the spaces we use to approximate the velocity and pressure. We also define (section 2.3) the operators we use to approximate the differential operators in (1.1)–(1.2). Combining this with a projection method, we build the scheme in section 3. In order to provide a mathematical analysis, we show in section 4 that the differential operators in (1.1)–(1.2) and their discrete counterparts share similar properties. In particular, the discrete operators for the gradient and the divergence are adjoint. The discrete operator for the convection term is positive, stable and consistent. The discrete operator for the divergence satisfies an inf-sup (Babuška-Brezzi) condition. From these properties we deduce in section 5 the stability of the scheme.

We conclude with some notations. The spaces $(L^2, |\cdot|)$ and $(L^\infty, \|\cdot\|_\infty)$ are the usual Lebesgue spaces and we set $L^2_0 = \{q \in L^2 ;\ \int_\Omega q(x)\, dx = 0\}$. Their vectorial counterparts are $(\mathbf{L}^2, |\cdot|)$ and $(\mathbf{L}^\infty, \|\cdot\|_\infty)$ with $\mathbf{L}^2 = (L^2)^2$ and $\mathbf{L}^\infty = (L^\infty)^2$. For $k \in N^*$, $(H^k, \|\cdot\|_k)$ is the usual Sobolev space. Its vectorial counterpart is $(\mathbf{H}^k, \|\cdot\|_k)$ with $\mathbf{H}^k = (H^k)^2$. For k = 1, the functions of $\mathbf{H}^1$ with a null trace on the boundary form the space $\mathbf{H}^1_0$. Also, we set $\nabla u = (\nabla u_1, \nabla u_2)^T$ if $u = (u_1, u_2) \in \mathbf{H}^1$. If $X \subset \mathbf{L}^2$ is a Banach space, we define C(0, T; X) (resp. $L^2(0, T; X)$) as the set of the mappings $g : [0, T] \to X$ such that $t \to |g(t)|$ is continuous (resp. square integrable). The norm $\|\cdot\|_{C(0,T;X)}$ is defined by $\|g\|_{C(0,T;X)} = \sup_{s \in [0,T]} |g(s)|$.
In all calculations, C is a generic positive constant, depending only on Ω, $u_0$ and f.

2. Discrete setting

First, we introduce the spaces and the operators needed to build the scheme.

2.1. The mesh

Let $T_h$ be a triangular mesh of Ω. The circumscribed circle of a triangle $K \in T_h$ is centered at $x_K$ and has the diameter $h_K$. We set $h = \max_{K \in T_h} h_K$. We assume that all the interior angles of the triangles of the mesh are less than π/2, so that $x_K \in K$. The set of the edges of the triangle $K \in T_h$ is $E_K$. The symbol $n_{K,\sigma}$ denotes the unit vector normal to an edge $\sigma \in E_K$ and pointing outward from K. We denote by $E_h$ the set of the edges of the mesh. We distinguish the subset $E_h^{int} \subset E_h$ (resp. $E_h^{ext}$) of the edges located inside Ω (resp. on ∂Ω). The middle of an edge $\sigma \in E_h$ is $x_\sigma$ and its length is |σ|. For each edge $\sigma \in E_h^{int}$, let $K_\sigma$ and $L_\sigma$ be the two triangles having σ in common. We set $d_\sigma = d(x_{K_\sigma}, x_{L_\sigma})$. For all $\sigma \in E_h^{ext}$, only the triangle $K_\sigma$ located inside Ω is defined and we set $d_\sigma = d(x_{K_\sigma}, x_\sigma)$. Then for all $\sigma \in E_h$ we set $\tau_\sigma = |\sigma| / d_\sigma$. As in [5] we assume the following on the mesh: there exists C > 0 such that

$\forall \sigma \in E_h$, $d_\sigma \geq C |\sigma|$ and $|\sigma| \geq C h$.

It implies that there exists C > 0 such that

$\forall \sigma \in E_h^{int}$, $\tau_\sigma = |\sigma| / d_\sigma \geq C$. (2.1)

2.2. The discrete spaces

We first define

$P_0 = \{q \in L^2 ;\ \forall K \in T_h,\ q|_K \text{ is a constant}\}$, $\mathbf{P}_0 = (P_0)^2$.

For the sake of conciseness, we set for all $q_h \in P_0$ (resp. $v_h \in \mathbf{P}_0$) and all triangles $K \in T_h$: $q_K = q_h|_K$ (resp. $v_K = v_h|_K$). Although $\mathbf{P}_0 \not\subset \mathbf{H}^1$, we define the discrete equivalent of an $H^1$ norm as follows. For all $v_h \in \mathbf{P}_0$ we set

$\|v_h\|_h = \left( \sum_{\sigma \in E_h^{int}} \tau_\sigma |v_{L_\sigma} - v_{K_\sigma}|^2 + \sum_{\sigma \in E_h^{ext}} \tau_\sigma |v_{K_\sigma}|^2 \right)^{1/2}$. (2.2)

We have [5] a Poincaré-like inequality: there exists C > 0 such that for all $v_h \in \mathbf{P}_0$

$|v_h| \leq C \|v_h\|_h$. (2.3)

We also have [14] an inverse inequality: there exists C > 0 such that for all $v_h \in \mathbf{P}_0$

$h \|v_h\|_h \leq C |v_h|$. (2.4)

From the norm $\|\cdot\|_h$ we deduce a dual norm. For all $v_h \in \mathbf{P}_0$ we set

$\|v_h\|_{-1,h} = \sup_{\psi_h \in \mathbf{P}_0 \setminus \{0\}} \frac{(v_h, \psi_h)}{\|\psi_h\|_h}$.
(2.5) For all uh ∈ P0 and vh ∈ P0 we have (uh,vh) ≤ ‖uh‖−1,h ‖vh‖h. We define the projection operator ΠP0 : L2 → P0 as follows. For all w ∈ L 2, ΠP0w ∈ P0 is given by ∀K ∈ Th , (ΠP0w)|K = w(x) dx. (2.6) We easily check that for all w ∈ L2 and vh ∈ P0 we have (ΠP0w,vh) = (w,vh). We deduce from this that ΠP0 is stable for the L2 norm. We define also the operator Π̃P0 : H 2 → P0. For all w ∈ H 2, Π̃P0w ∈ P0 is given by ∀K ∈ Th , Π̃P0w|K = w(xK). According to the Sobolev embedding theorem, w ∈ H2 is a.e. equal to a continuous function. Therefore the definition above makes sense. We introduce also the finite element spaces P d1 = {v ∈ L 2 ; ∀K ∈ Th, v|K is affine} , Pnc1 = {vh ∈ P 1 ; ∀σ ∈ E h , vh|Kσ (xσ) = vh|Lσ(xσ) , Pc1 = {vh ∈ (P 2 ; vh is continuous and vh|∂Ω = 0}. We have Pc1 ⊂ H 0. We define ΠPc1 : H 0 → P 1. For all v = (v1, v2) ∈ H 0, ΠPc1v = (v h) ∈ P 1 is given by ∀φh = (φ h) ∈ P ∇vih,∇φ ∇vi,∇φ The operator ΠPc is stable for the H1 norm. One checks ( [2] p. 110) that there exists C > 0 such that for all v ∈ H1 |v −ΠPc v| ≤ C h ‖v‖1. (2.7) Let us address now the space Pnc1 . If qh ∈ P 1 , we have usually ∇qh 6∈ L 2. Thus we define the operator ∇h : P 1 → P0 by setting for all qh ∈ P0 and all triangle K ∈ Th ∇hqh|K = ∇qh dx. (2.8) 4 TITLE WILL BE SET BY THE PUBLISHER The associated norm is defined by ‖qh‖1,h = 2 + |∇hqh| We have a Poincaré-like inequality : there exists C > 0 such that for all qh ∈ P 1 ∩ L |qh| ≤ C |∇hqh|. (2.9) We define the projection operator ΠPnc . For all q ∈ H1, ΠPnc q is given by ∀σ ∈ Eh , (ΠPnc q) dσ = q dσ. One checks ( [2] p.110) that there exists C > 0 such that |p−ΠPnc p| ≤ C h ‖p‖1 , ∣∣∣∇̃h(p−ΠPnc ∣∣∣ ≤ C ‖p‖1. (2.10) Finally, we use the Raviart-Thomas spaces (see [3]) = {vh ∈ P 1 ; ∀σ ∈ EK , vh|K · nK,σ is a constant, and vh · n|∂Ω = 0} , RT0 = {vh ∈ RT ; ∀K ∈ Th, ∀σ ∈ EK , vh|Kσ · nKσ ,σ = vh|Lσ · nKσ ,σ}. For all vh ∈ RT0, K ∈ Th and σ ∈ EK we set (vh ·nK,σ)σ = vh|K ·nK,σ. We define the operator ΠRT0 : H RT0. 
For all v ∈ H 1, ΠRT0v ∈ RT0 is given by ∀K ∈ Th , ∀σ ∈ EK , (ΠRT0v · nK,σ)σ = v dσ. (2.11) 2.3. The discrete operators The equations (1.1)–(1.2) use the differential operators gradient, divergence and laplacian. Using the spaces of section 2.2, we define their discrete counterparts. The discrete gradient ∇h : P 1 → P0 is defined by (2.8). The discrete divergence operator divh : P0 → P 1 is built so that it is adjoint to the operator ∇h. We set for all vh ∈ P0 and all triangle K ∈ Th ∀σ ∈ E inth , (divh vh)(xσ) = 3 |σ| |Kσ|+ |Lσ| (vLσ − vKσ ) · nK,σ ; ∀σ ∈ Eexth , (divh vh)(xσ) = − 3 |σ| |Kσ|+ |Lσ| vKσ · nK,σ. (2.12) The first discrete laplacian ∆h : P 1 → P 1 ensures that the incompressibility constraint (1.2) is satisfied in a discrete sense (see the proof of proposition 3.1 below). We set for all qh ∈ P ∆hqh = divh(∇hqh). The second discrete laplacian ∆̃h : P0 → P0 is the usual operator in finite volume schemes [5]. We set for all vh ∈ P0 and all triangle K ∈ Th ∆̃hvh|K = σ∈EK∩E τσ (vLσ − vKσ )− σ∈EK∩E τσ vKσ . TITLE WILL BE SET BY THE PUBLISHER 5 In order to approximate the term (u · ∇)u in (1.1) we define a bilinear form b̃h : RT0 × P0 → P0 using the well-known upwind scheme [5]. For all uh ∈ P0, vh ∈ P0, and all triangle K ∈ Th we set b̃h(uh,vh) σ∈EK∩E (u · nK,σ) σ vK + (u · nK,σ) σ vLσ . (2.13) We have set a+ = max(a, 0), a− = min(a, 0) for all a ∈ R. Lastly, we define the trilinear form bh : RT0 ×P0 × P0 → R 2 as follows. For all uh ∈ RT0, vh ∈ P0, wh ∈ P0, we set bh(uh,vh,wh) = |K|wK · b̃h(uh,vh) . (2.14) 3. The scheme In order to deal with the incompressibility constraint (1.2) we use a projection method. This kind of method has been introduced by Chorin [4] and Temam [13]. The basic idea is the following. The time interval [0, T ] is split with a time step k: [0, T ] = n=0[tn, tn+1] with N ∈ N ∗ and tn = n k for all n ∈ {0, . . . , N}. For all m ∈ {2, . . . 
, N}, we compute (see equation (3.2) below) a first velocity field ũmh ≃ u(tm) using only equation (1.1). We use a second-order BDF scheme for the discretization in time. We then project ũmh (see equation (3.4) below) over a subspace of P0. We get a a pressure field p h ≃ p(tm) and a second velocity field u h ≃ u(tm), which fulfills the incompressibility constraint (1.2) in a discrete sense. The algorithm goes as follows. For all m ∈ {0, . . . , N}, we set fmh = ΠP0 f(tm). Since the operator ΠP0 is stable for the L 2-norm we get |fmh | = |ΠP0 f(tm)| ≤ |f(tm)| ≤ ‖f‖C(0,T ;L2). (3.1) We start with the initial values u0h ∈ P0 ∩RT0 , u h ∈ P0 ∩RT0 p h ∈ P0 ∩ L For all n ∈ {1, . . . , N}, (ũn+1h , p h ) is deduced from (ũ h , p h) as follows. • ũn+1h ∈ P0 is given by 3 ũn+1h − 4u h + u ∆̃hũ h + b̃h(2u h − u h , ũ h ) +∇hp h = f h , (3.2) • pn+1h ∈ P 1 ∩ L 0 is the solution of h − p divh ũ h , (3.3) • un+1h ∈ P0 is deduced by un+1h = ũ h − p h). (3.4) Existence and unicity of a solution to equation (3.2) is classical ( [5] for example). The convection term in (3.2) is well defined thanks to the following result. Proposition 3.1. For all m ∈ {0, . . . , N} we have umh ∈ RT0 . Proof. If m ∈ {0, 1} the result holds by definition. If m ∈ {2, . . . , N} we apply the operator divh to (3.3) and compare with (3.4). We get divh u h = 0. Using definition (2.12) we get u h ∈ RT0. 6 TITLE WILL BE SET BY THE PUBLISHER Let us show that equation (3.3) also has a unique solution. Let qh ∈ P 1 ∩ L 0 such that ∆hqh = 0. According to proposition 4.4 we have for all qh ∈ P0 −(∆hqh, qh) = − divh(∇hqh), qh = (∇hqh,∇hqh) = |∇hqh| Therefore we have ∇hqh = 0, so that qh = 0 since qh ∈ L 0. We have thus proved the unicity of a solution for (3.3). It is also the case for the associated linear system. It implies that this linear system has indeed a solution. Hence it is also the case for equation (3.3). Note finally that since umh ∈ P0 ∩RT0, we have divu h = 0 for all m ∈ {0, . . . , N}. 
Hence the incompressibility condition (1.2) is fulfilled. 4. Properties of the discrete operators We show that the differential operators in (1.1)–(1.2) and the operators defined in section 2.3 share similar properties. 4.1. Properties of the discrete convective term We define b̃ : H1 ×H1 → L2. For all u ∈ H1 and v = (v1, v2) ∈ H 1 we set b̃(u,v) = div(v1 u), div(v2 u) We show that the operator b̃h is a consistent approximation of b̃. Proposition 4.1. There exists a constant C > 0 such that for all v ∈ H2 and all u ∈ H2 ∩ H10 satisfying divu = 0 ‖ΠP0 b̃(u,v)− b̃h(ΠRT0u, Π̃P0v)‖−1,h ≤ C h ‖u‖2 ‖v‖1. Proof. We set uh = ΠRT0u and vh = Π̃P0v. Let K ∈ Th. According to the divergence formula and (2.6) we ΠP0 b̃(u,v)|K = σ∈EK∩E v (u · n) dσ. On the other hand, let us rewrite b̃h(uh,vh). Let σ ∈ EK ∩ E h . Setting vK,Lσ = vK si (uh · nK,σ)σ ≥ 0 vLσ si (uh · nK,σ)σ < 0 one checks that vK (uh · nK,σ) σ + vLσ (uh · nK,σ) σ = vK,Lσ (uh · nK,σ)σ. Using (2.11), we deduce from (2.13) b̃h(uh,vh)|K = σ∈EK∩E vK,Lσ (uh · nK,σ) dσ. ΠP0 b̃(u,v)− b̃h(uh,vh) σ∈EK∩E (v − vK,Lσ) (uh · n) dσ. Let ψh ∈ P0. We have ΠP0 b̃(u,v) − b̃h(uh,vh),ψh σ∈EK∩E (v − vK,Lσ) (uh · n) dσ σ∈Eint (ψKσ −ψLσ) (v − vKσ ,Lσ) (uh · n) dσ. TITLE WILL BE SET BY THE PUBLISHER 7 Let σ ∈ E inth . We consider the quadrilateral Dσ defined by xKσ , xLσ and the vertex of σ. We set DK,Lσ = Dσ ∩K si (uh · nK,σ)σ ≥ 0 Dσ ∩ Lσ si (uh · nK,σ)σ < 0 Using a Taylor expansion and a density argument (see [14]) one checks that |v − vKσ ,Lσ | dσ ≤ C h DKσ,Lσ |∇v (y)|2 dy ΠP0 b̃(u,v) − b̃h(ΠRT0u, Π̃P0v),ψh ≤ C h ‖u‖H2 σ∈Eint |ψLσ −ψKσ | σ∈Eint DKσ,Lσ |∇v (y)|2 dy so that ΠP0b̃(u,v)− b̃h(ΠRT0u, Π̃P0v),ψh )∣∣∣ ≤ C h ‖u‖H2 ‖ψh‖1,h ‖v‖1. Using then definition (2.5), we get the result. Let v ∈ L∞ ∩H1 and u ∈ H1 with divu ≥ 0 a.e. in Ω. Integrating by parts one checks that v · b̃(u,v) dx = divu dx ≥ 0. The operator bh shares a similar property. Proposition 4.2. Let uh ∈ RT0 such that divuh ≥ 0. For all vh ∈ P0 we have bh(uh,vh,vh) ≥ 0. Proof. 
Remember that for all edges σ ∈ E inth , two triangles Kσ et Lσ share σ as an edge. We denote by Kσ the one such that uσ · nKσ ,σ ≥ 0. Using the algebraic identity 2 a (a− b) = a 2 − b2 + (a − b)2 we deduce from (2.14) 2 bh(uh,vh,vh) = 2 σ∈Eint |σ|vKσ · (vKσ − vLσ) (uh · nKσ,σ) σ∈Eint |vKσ| 2 − |vLσ | 2 + |vKσ − vLσ | (uh · nKσ,σ) so that 2 bh(uh,vh,vh) ≥ σ∈Eint |vKσ| 2 − |vLσ | (uh · nKσ,σ). This sum can be written as a sum over the triangles of the mesh. We get 2 bh(uh,vh,vh) ≥ |vKσ | σ∈EK∩E |σ| (uh · nKσ,σ). Using finally the divergence formula we get 2 bh(uh,vh,vh) ≥ |K| |vK | divuh dx ≥ 0. The following result states that the operator bh is stable for suitable norms. 8 TITLE WILL BE SET BY THE PUBLISHER Proposition 4.3. There exists a constant C > 0 such that for all vh ∈ P0, wh ∈ P0, uh ∈ P0 satisfying divuh = 0 |bh(uh,vh,vh)| ≤ C |uh| ‖vh‖h ‖vh‖h. Proof. For all triangle K ∈ Th and all edge σ ∈ EK ∩ E h , we have (uh · nK,σ) σ vK + (uh · nK,σ) σ vLσ = (uh · nK,σ)σ vK − |(uh · nK,σ)σ| (vLσ − vK). Using this splitting, we deduce from (2.14) bh(uh,vh,wh) = S1 + S2 with vK ·wK σ∈EK∩E |σ| (uh · nK,σ)σ , S2 = − σ∈EK∩E |σ| |(uh · nK,σ)σ| (vLσ − vK). By writing the sum over the edges as a sum over the triangles we have S2 = − σ∈Eint |σ| |(uh · nK,σ)σ| (vLσ − vK) · (wLσ −wK). Using the Cauchy-Schwarz inequality we get |S2| ≤ h ‖uh‖∞ σ∈Eint |vLσ − vKσ | 1/2  σ∈Eint |wLσ −wKσ | Since uh ∈ RT0 we have [5] the inverse inequality h ‖uh‖∞ ≤ C |uh|. Using (2.1) and (2.2) we get σ∈Eint |vLσ − vKσ | 2 ≤ C σ∈Eint τσ |vLσ − vKσ | 2 ≤ C ‖vh‖ and in a similar way σ∈Eint |wLσ −wKσ | 2 ≤ C ‖wh‖ h. Thus |S2| ≤ C |uh| ‖vh‖h ‖wh‖h. On the other hand, according to the divergence formula |K| (vK ·wK) divuh dx = 0. By gathering the estimates for S1 and S2 we get the result. 4.2. Properties of the discrete divergence The operators gradient and divergence are adjoint: if q ∈ H1 , v ∈ H1 with v · n|∂Ω = 0, we get (v,∇q) = −(q, divv) by integrating by parts. 
For ∇h and divh we state the following. Proposition 4.4. For all vh ∈ P0 and qh ∈ P 1 we have: (vh,∇hqh) = −(qh, divh vh). Proof. According to (2.8) (vh,∇hqh) = |K|vK · ∇hqh|K = |σ|qh(xσ)nK,σ TITLE WILL BE SET BY THE PUBLISHER 9 By writing this sum as a sum over the edges we get (vh,∇hqh) = − σ∈Eint |σ| qh(xσ) (vLσ − vKσ) · nKσ,σ + σ∈Eext |σ| qh(xσ)vKσ · nKσ,σ. (4.1) On the other hand, using a quadrature formula −(qh, divh vh) = − qh(xσ) (divh vh)(xσ). By writing this sum as a sum over the edges of the mesh we get −(qh, divh vh) = − σ∈Eint ( |Kσ| qh(xσ) (divhvh)(xσ)− σ∈Eext qh(xσ) (divh vh)(xσ). Using definition (2.12) and comparing with (4.1) we get the result. The divergence operator and the spaces L20, H 0 satisfy the following property, called inf-sup (or Babuška-Brezzi) condition (see [9] for example). There exists a constant C > 0 such that (q, divv) ‖v‖1|q| ≥ C. (4.2) We will now show that the operator divh and the spaces P0 ∩ L 0, P0 satisfy an analogous property. The proof uses the following lemma. Lemma 4.1. There exists a constant C > 0 such that ∀ qh ∈ P 1 ∩ L 0 , sup vh∈P0\{0} (qh, divh vh) ‖vh‖h ≥ C h ‖qh‖1,h. Proof. If qh = 0 the result is trivial. Let qh ∈ P 0\{0}. Let vh = ∇hqh ∈ P0\{0}. Using proposition 4.4 we have −(qh, divhvh) = (vh,∇hqh) = |∇hqh| 2 = |∇hqh| |vh|. Using (2.3) and (2.4) we get −(qh, divhvh) ≥ C h ‖qh‖1,h ‖vh‖h. We now state the result. Proposition 4.5. There exists a constant C > 0 such that for all qh ∈ P 1 ∩ L vh∈P0\{0} (qh, divh vh) ‖vh‖h ≥ C |qh|. Proof. If qh = 0 the result is trivial. Let qh ∈ P 0\{0}. According to (4.2) there exists v ∈ H 0 such that divv = −qh and ‖v‖1 ≤ C |qh|. (4.3) We set vh = ΠPc v. We want to estimate − qh, divh(ΠP0vh) . Since ∇hqh ∈ P0 we deduce from proposition qh, divh(ΠP0vh) = (ΠP0vh,∇hqh) = (vh,∇hqh). By splitting the last term we get qh, divh(ΠP0vh) = (v,∇hqh)− (v − vh,∇hqh). (4.4) 10 TITLE WILL BE SET BY THE PUBLISHER We bound the right-hand side of (4.4). 
Using (2.7) and (4.3) we have |v − vh| = |v −ΠPc v| ≤ C h ‖v‖1 ≤ C h |qh|. Thus, using the Cauchy-Schwarz inequality, we get |(v − vh,∇hqh)| ≤ C h |qh| |∇hqh| ≤ C h |qh| ‖qh‖1,h. We estimate the other term as follows. Integrating by parts we get (v,∇hqh) = −(qh, divv) + qh (v · nK,σ) dσ. We have −(qh, divv) = |qh| 2 thanks to (4.3). On the other hand qh (v · nK,σ) dσ = σ∈Eint qh (v · nKσ,σ) dσ since v|∂Ω = 0. Using [2] p.269 and (4.3) we have ∣∣∣∣∣ qh (v · nK,σ) dσ ∣∣∣∣∣ ≤ C h ‖v‖1 ‖qh‖1,h ≤ C h |qh| ‖qh‖1,h. Hence we get (v,∇hqh) ≥ (|qh| − C h ‖qh‖1,h) |qh|. Thus we deduce from (4.4) qh, divh(ΠP0vh) ≥ (|qh| − C h ‖qh‖1,h) |qh|. (4.5) We now introduce the norm ‖.‖h. We have vh = ΠPc v ∈ Pc1 ⊂ H 1. From [5] p. 776 we deduce ‖ΠP0vh‖h ≤ C ‖vh‖1. Since ΠPc is stable for the H1 norm, using (4.3), we get ‖vh‖1 = ‖ΠPc v‖1 ≤ ‖v‖1 ≤ C |qh|. Therefore ‖ΠP0vh‖h ≤ C |qh|. Using this inequality in (4.5) we obtain that there exists C1 > 0 and C2 > 0 such that qh, divh(ΠP0vh) ≥ (C1 |qh| − C2 h ‖qh‖1,h) ‖ΠP0vh‖h. We deduce from this vh∈P0\{0} (qh, divh vh) ‖vh‖h ≥ C1 |qh| − C2 h ‖qh‖1,h. Let us combine this result with lemma 4.1. Since ∀ t ≥ 0 , max C t , C1 |qh| − C2 t C + C2 |qh| , we finally get the result. 4.3. Properties of the discrete laplacian We recall from [14] the coercivity of the laplacian operator. Proposition 4.6. For all uh ∈ P0 and vh ∈ P0 we have −(∆̃huh,uh) = ‖uh‖ h , −(∆̃huh,vh) ≤ ‖uh‖h ‖vh‖h. TITLE WILL BE SET BY THE PUBLISHER 11 5. Stability of the scheme We first prove an estimate for the computed velocity (theorem 5.1). We show a similar result for the increments in time (lemma 5.2). Using the inf-sup condition (proposition 4.5), we infer from it some estimates on the pressure (theorem 5.2). Lemma 5.1. For all m ∈ {0, . . . , N} et n ∈ {0, . . . , N} we have (umh ,∇hp h) = 0 , |u 2 − |ũmh | 2 + |umh − ũ 2 = 0. Proof. First, using propositions 3.1 and 4.4, we get (umh ,∇hp h) = −(p h, divhu h ) = 0. 
Also, we deduce from (3.4) 2 (umh ,u h − ũ h ) = − umh ,∇h(p h − p Using the algebraic identity 2 a (a− b) = a2 − b2 + (a− b)2 we get 2 (umh ,u h − ũ h ) = |u 2 − |ũmh | 2 + |umh − ũ 2 = 0. We introduce the following hypothesis on the initial data. (H1) There exists C > 0 such that |u0h|+ |u h|+ k|∇hp h| ≤ C. Hypothesis (H1) is fulfilled if we set u0h = ΠRT0u0 and we use a semi-implicit Euler scheme to compute u We have the following stability result. Theorem 5.1. We assume that the initial values of the scheme fulfill (H1). For all m ∈ {2, . . . , N} we have |umh | 2 + k ‖ũnh‖ h ≤ C. (5.1) Proof. Let m ∈ {2, . . . , N} and n ∈ {1, . . . ,m− 1}. Taking the scalar product of (3.2) with 4 k ũn+1h we get 3 ũn+1h − 4u h + u , 4 k ũn+1h (∆̃hũ h , ũ +4 k bh(2u h − u h , ũ h , ũ h ) + 4 k (∇hp h, ũ h ) = 4 k (f h , ũ h ). (5.2) First of all, using lemma 5.1 and proceeding as in [10], we get ũn+1h , 3 ũn+1h − 4u h + u = |un+1h | 2 − |unh| 2 + |2un+1h − u 2 − |2unh − u + |un+1h − 2u h + u 2 + 6 |ũn+1h − u According to proposition 4.6 we have − 4 k (∆̃hũ h , ũ h ) = ‖ũn+1h ‖ h. Also, according to lemma 5.1 and (3.4) 4 k (∇hp h, ũ h ) = 4 k (∇hp h, ũ h − u (|∇pn+1h | 2 − |∇pnh| 2 − |∇pn+1h −∇p 12 TITLE WILL BE SET BY THE PUBLISHER Multiplying equation (3.4) by 4 k∇h(p h − p h) and using the Young inequality we get |∇(pn+1h − p 2 ≤ 3 |un+1h − ũ According to proposition 4.2, we have 4 k bh(2u h − u h , ũ h , ũ h ) ≥ 0. At last using the Cauchy-Schwarz inequality, (2.3) and (3.1) we have 4 k (fn+1h , ũ h ) ≤ 4 k |f h | |ũ h | ≤ C k ‖f‖C(0,T ;L2) ‖ũ h ‖h. Using the Young inequality we get 4 k (fn+1h , ũ h ) ≤ 3 k ‖ũ h + C k ‖f‖ C(0,T ;L2). Thus we deduce from (5.2) |un+1h | 2 − |unh | 2 + |2un+1h − u 2 − |2unh − u 2 + |un+1h − 2u h + u +3 |ũn+1h − u 2 + k ‖ũn+1h ‖ (|∇hp 2 − |∇hp 2) ≤ C k. Summing from n = 1 to m− 1 we have |umh | 2 + |2umh − u 2 + 3 |ũn+1h − u 2 + k ‖ũn+1h ‖ ≤ C + 4 |u1h| 2 + |2u1h − u 2 + k2 |∇hp Using hypothesis (H1) we get (5.1). 
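The first step of the proof of theorem 5.1 rests on a purely algebraic identity for the second-order BDF difference. As a sanity check, the Python sketch below verifies numerically the standard single-sequence form of that identity, 2 a·(3a - 4b + c) = |a|^2 - |b|^2 + |2a - b|^2 - |2b - c|^2 + |a - 2b + c|^2, on random vectors. This is a simplified illustration: the estimate in the proof mixes the two velocity fields and picks up an extra term through lemma 5.1. The function names and the test vectors are hypothetical.

```python
import random


def dot(x, y):
    """Euclidean inner product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm_sq(x):
    """Squared Euclidean norm."""
    return dot(x, x)


def bdf2_identity_sides(a, b, c):
    """Return both sides of
    2 a.(3a - 4b + c) = |a|^2 - |b|^2 + |2a - b|^2 - |2b - c|^2 + |a - 2b + c|^2.
    """
    bdf = [3 * ai - 4 * bi + ci for ai, bi, ci in zip(a, b, c)]
    lhs = 2 * dot(a, bdf)
    rhs = (
        norm_sq(a)
        - norm_sq(b)
        + norm_sq([2 * ai - bi for ai, bi in zip(a, b)])
        - norm_sq([2 * bi - ci for bi, ci in zip(b, c)])
        + norm_sq([ai - 2 * bi + ci for ai, bi, ci in zip(a, b, c)])
    )
    return lhs, rhs


random.seed(0)  # fixed seed so the check is reproducible
n = 8
a, b, c = ([random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(3))
lhs, rhs = bdf2_identity_sides(a, b, c)
print(abs(lhs - rhs))  # difference at the level of floating-point roundoff
```

With a, b, c standing for three consecutive time levels, the first four terms on the right-hand side telescope when the identity is summed over n, which is the mechanism behind the bound (5.1).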
We now want to estimate the computed pressure. From now on, we make the following hypothesis on the data f ∈ C(0, T ;L2) , ft ∈ L 2(0, T ;L2) , u0 ∈ H 2 ∩H10 , divu0 = 0. One shows that if the data u0 and f fulfill a compatibility condition [11] there exists a solution (u, p) to the equations (1.1)–(1.2) such that u ∈ C(0, T ;H2) , ut ∈ C(0, T ;L 2) , ∇p ∈ C(0, T ;L2). We introduce the following hypothesis on the initial values of the scheme: there exists a constant C > 0 such (H2) |u0h − u0|+ ‖u1h − u(t1)‖∞ + |p h − p(t1)| ≤ C h , |u h − u h| ≤ C k. One checks easily that this hypothesis implies (H1). We have the following result. Lemma 5.2. We assume that the initial values of the scheme fulfill (H2). Then there exists a constant C > 0 such that for all m ∈ {1, . . . , N} |umh − u h | ≤ C. Proof. Using proposition 4.1 one proceeds as in [14]. The difference lies in the way we bound the term ∇hp We use the splitting p1h = (p h −ΠPnc1 p(t1)) + (ΠP p(t1)− p(t1)) + p(t1). TITLE WILL BE SET BY THE PUBLISHER 13 Using an inverse inequality [2] we have p1h −ΠPnc1 p(t1) )∣∣ ≤ ∣∣p1h −ΠPnc1 p(t1) (∣∣p1h − p(t1) ∣∣p(t1)−ΠPnc p(t1) ∣∣) . Using (2.10) and hypothesis (H2) we get p1h −ΠPnc1 p(t1) )∣∣ ≤ C ‖p(t1)‖1 ≤ C ‖p‖C(0,T ;H1). According to (2.10) we also have ∣∣∇h(p(t1)−ΠPnc p(t1)) ∣∣ ≤ C ‖p(t1)‖1 ≤ C ‖p‖C(0,T ;H1). Lastly |∇p(t1)| ≤ ‖p‖C(0,T ;H1). Thus we get |∇hp h| ≤ C. Theorem 5.2. We assume that the initial values of the scheme fulfull (H2). There exists a constant C > 0 such that for all m ∈ {2, . . . , N} |pnh| 2 ≤ C. Proof. Let m ∈ {2, . . . , N}. We set n = m− 1. Using the inf-sup condition (4.5) and proposition 4.4, we get that there exists vh ∈ P0\{0} such that C ‖vh‖h |p h | ≤ −(p h , divh vh) = (∇hp h ,vh). (5.3) Plugging (3.4) into (3.2) we have h = − 3un+1h − 4u h + u ∆̃hũ h − b̃h(2u h − u h , ũ h ) + f so that h ,vh) = − 3un+1h − 4u h + u ∆̃hũ h ,vh − bh(2u h − u h , ũ h ,vh) + (f h ,vh). 
Thanks to proposition 4.3 and theorem 5.1 we have ∣∣bh(2unh − u h , ũ h ,vh) 2 |unh|+ |u ‖ũn+1h ‖h ‖vh‖h ≤ C ‖ũ h ‖h ‖vh‖h. According to proposition 4.6 we have ∆̃hũ h ,vh ≤ ‖ũn+1h ‖h ‖vh‖h. Using the Cauchy-Schwarz inequality, (2.3) and (3.1) we have (fn+1h ,vh) ≤ |f h | |vh| ≤ C |vh| ≤ C ‖vh‖h and in a similar way 3un+1h − 4u h + u )∣∣∣∣ ≤ C 3un+1h − 4u h + u ∣∣∣∣ ‖vh‖h. Thus we get h ,vh) ≤ C + C |3un+1h − 4u h + u + ‖ũn+1h ‖h ‖vh‖h. By comparing with (5.3) we get |pn+1h | ≤ C + C |3un+1h − 4u h + u + ‖ũn+1h ‖h 14 TITLE WILL BE SET BY THE PUBLISHER Squaring and summing from n = 1 to m− 1 we obtain |pnh| 2 ≤ C + C k |3un+1h − 4u h + u + C k ‖ũn+1h ‖ The last term on the right-hand side is bounded, thanks to theorem 5.1. And since 3un+1h − 4u h + u h = 3(u h − u h)− (u h − u h ) = 3 δu h − δu we deduce from lemma 5.2 |3un+1h − 4u h + u ≤ C k |δunh | References [1] S. Boivin, F. Cayre, J. M Herard, A finite volume method to solve the Navier-Stokes equations for incompressible flows on unstructured meshes, Int. J. Therm. Sci. 39 (2000) 806–825. [2] S. C. Brenner and L. R. Scott, The mathematical theory of finite element methods, Springer, 2002. [3] F. Brezzi and M. Fortin, Mixed and hybrid finite element methods, Springer-Verlag, 1991. [4] J. Chorin, On the convergence of discrete approximations to the Navier-Stokes equations, Math. Comp. 23 (1969) 341–353. [5] R. Eymard, T. Gallouët and R. Herbin, Finite volume methods, P.G. Ciarlet and J.L. Lions eds, North-Holland, 2000. [6] R. Eymard and R. Herbin, A staggered finite volume scheme on general meshes for the Navier-Stokes equations in two space dimensions, Int.J. Finite Volumes (2005). [7] R. Eymard, J. C. Latché and R. Herbin, Convergence analysis of a colocated finite volume scheme for the incompressible Navier-Stokes equations on general 2 or 3D meshes, SIAM J. Numer. Anal. 45(1) (2007) 1–36. [8] S. Faure, Stability of a colocated finite volume scheme for the Navier-Stokes equations, Num. Meth. 
PDE 21(2) (2005) 242–271.
[9] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations: Theory and Algorithms, Springer, 1986.
[10] J. L. Guermond, Some implementations of projection methods for Navier-Stokes equations, M2AN 30(5) (1996) 637–667.
[11] J. G. Heywood and R. Rannacher, Finite element approximation of the nonstationary Navier-Stokes problem. I. Regularity of solutions and second-order error estimates for spatial discretization, SIAM J. Numer. Anal. 19(26) (1982) 275–311.
[12] D. Kim and H. Choi, A second-order time-accurate finite volume method for unsteady incompressible flow on hybrid unstructured grids, J. Comp. Phys. 162 (2000) 411–428.
[13] R. Temam, Sur l’approximation de la solution des équations de Navier-Stokes par la méthode de pas fractionnaires II, Arch. Rat. Mech. Anal. 33 (1969) 377–385.
[14] S. Zimmermann, Stability of a colocated finite volume for the incompressible Navier-Stokes equations, arXiv:0704.0772 (2006).
[15] S. Zimmermann, Étude et implémentation de méthodes de volumes finis pour les fluides incompressibles, PhD, Blaise Pascal University, France (2006).

ABSTRACT
We introduce a finite volume scheme for the two-dimensional incompressible Navier-Stokes equations. We use a triangular mesh. The unknowns for the velocity and pressure are respectively piecewise constant and affine. We use a projection method to deal with the incompressibility constraint. We show that the differential operators in the Navier-Stokes equations and their discrete counterparts share similar properties. In particular we state an inf-sup (Babuska-Brezzi) condition. Using these properties we infer the stability of the scheme.
<|endoftext|><|startoftext|>
1 Introduction
2 The Gauge Theory
3 D(-1) Instantons
3.1 D3-D(-1) in flat space
3.2 D(-1)-D3 at the $\mathbb{C}^3/\mathbb{Z}_3$-orientifold
4 ADS-like superpotential
4.1 D3-D(-1) one-loop vacuum amplitudes
4.2 Sp(6) × U(2) superpotential
4.3 U(4) superpotential
5 ED3-instantons
5.1 D3-ED3 one-loop vacuum amplitudes
5.2 The superpotential
6 ADS superpotentials: a general analysis
7 Conclusions

1 Introduction

Our understanding of non-perturbative effects in four-dimensional supersymmetric gauge theories (SYM) has dramatically improved in recent years. This is due mainly to the observation that integrals over the moduli space of gauge connections localize around a finite number of points [1]. These techniques have been applied to the study of multi-instanton corrections to $\mathcal{N} = 1, 2, 4$ supersymmetric gauge theories in $\mathbb{R}^4$ [2, 3, 4, 5, 6, 7, 8, 9, 10] (see [11, 12] for reviews of multi-instanton techniques before localization and complete lists of references).

In the D-brane language, the dynamics of the gauge theory around the instanton background is described by an effective theory governing the interactions of the lowest energy excitations of open strings ending on a bound state of Dp-D(p+4) branes. For the case of $\mathcal{N} = 2, 4$ SYM the multi-instanton action has been derived via string techniques in [13, 14]. In [15], D-brane techniques have been applied to the computation of the Affleck, Dine and Seiberg (ADS) superpotential [16, 17] for $\mathcal{N} = 1$ SQCD with gauge group $SU(N_c)$ and $N_f = N_c - 1$ massless flavours, and $Sp(2N_c)$ with $2N_f = 2N_c$ flavours. The $\mathcal{N} = 1$ gauge theory is realized on the four-dimensional intersection of $N_c$ colour and $N_f$ flavour D6 branes. Chiral matter comes from strings connecting the flavour and colour D6 branes. Instantons in the $U(N_c)$ gauge theory are realized in terms of ED2 branes parallel to the stack of $N_c$ D6-branes.
By carefully integrating over the supermoduli (massless strings with at least one end on the ED2), the precise form of the ADS superpotential was reproduced in the low-energy field theory limit $\alpha' \to 0$. In the recent literature ED2-brane instantons in intersecting D6-brane models have received particular attention in connection with the possibility of generating a Majorana mass for right-handed neutrinos and their superpartners [18, 19, 20, 21, 22]. The field theory interpretation of this new instanton effect is far from clear and is the subject of active investigation. In this paper we present a detailed derivation of these new non-perturbative superpotentials in $\mathcal{N} = 1$ $\mathbb{Z}_3$-orientifold models. Investigations of stringy instantons on $\mathcal{N} = 1$ $\mathbb{Z}_2 \times \mathbb{Z}_2$ orientifold singularities appeared recently in [23].

We study SYM gauge theories living on D3 branes located at a $\mathbb{Z}_3$-orientifold singularity. There are two choices for the orientifold projection [24, 25, 26, 27, 28], realized by two types of O3-planes^1. They lead to anomaly-free^2 chiral $\mathcal{N} = 1$ gauge theories with gauge groups $SO(N-4) \times U(N)$ or $Sp(N+4) \times U(N)$ and three generations of chiral matter in the bifundamental and anti/symmetric representation of $U(N)$. The archetype of this class can be realized as a stack of $3N + 4$ D3-branes and one O3$^-$-plane sitting on top of an $\mathbb{R}^6/\mathbb{Z}_3$ singularity. This system can be thought of as a T-dual local description (near the origin) of the $T^6/\mathbb{Z}_3$ type I string vacuum found in [40]. The lowest choices of $N$ lead to $U(4)$ or $U(5)$ gauge theories with three generations of chiral matter in the $\mathbf{6}$ and $\mathbf{10} + \mathbf{5}^*$ that are clearly of phenomenological interest in unification scenarios [45, 46, 47, 48]^3.

^1 We will only consider O3$^\pm$-planes, not the more exotic $\widetilde{\mathrm{O3}}$-planes [29, 30, 31].
^2 Factorizable U(1) anomalies are cancelled by a generalization of the Green-Schwarz mechanism [32, 33, 34, 35, 36, 37, 38, 37] that may require the introduction of generalized Chern-Simons couplings [39].
^3 Only the $U(4)$ case can be realized in the compact $\mathbb{Z}_3$ orientifold. In general the Chan-Paton group is $SO(8-2n) \times U(12-2n) \times H_n$, where $H_n = U(n)^3$, $SO(2n)$, $U(n)$, $U(1)^n$ depending on the choice of Wilson lines [28, 41, 42, 43, 44].

In [49] the $U(4)$ case was studied and the form of the ADS-like superpotential was determined by combining holomorphicity, U(1) anomaly, dimensional analysis and flavour symmetry. Stringy instanton effects were also considered. Very much as for worldsheet instantons in heterotic strings [50, 51, 52, 53, 54], these genuinely stringy instantons give rise to superpotentials that do not vanish at large VEVs of the open string (charged) ‘moduli’.

Here we derive the non-perturbative superpotentials from a direct integration over the D-instanton super-moduli space. Gauge instantons are described in terms of open strings ending on D(-1) branes, while stringy instantons are given by open strings ending on Euclidean ED3 branes wrapping a four-cycle inside the Calabi-Yau. The open strings connecting the stack of D3 branes to D(-1) and ED3 branes have four and eight mixed Neumann-Dirichlet directions respectively. This ensures that the bound state is supersymmetric. The superpotential receives contributions from disk, one-loop annulus and Möbius amplitudes ending on the D(-1) or ED3 branes. We find that ADS superpotentials are generated only for two gauge theory choices, $U(4)$ and $Sp(6) \times U(2)$, inside the $\mathbb{Z}_3$-orientifold class. Stringy instantons lead to Majorana masses in the $U(4)$ case, Yukawa couplings in the $U(6) \times SO(2)$ gauge theory and non-renormalizable couplings for $U(2N+4) \times SO(2N)$ gauge theories with $N > 3$.

The plan of the paper is as follows. In section 2 we review the gauge theories coming from a stack of D3 branes at a $\mathbb{C}^3/\mathbb{Z}_3$ orientifold singularity.
In Section 3 we consider non-perturbative effects generated by D(-1) gauge instantons, corresponding to ADS-like superpotentials in the low energy limit. A detailed analysis of one-loop vacuum amplitudes and the integrals over the supermoduli is presented for SYM theories with gauge groups $Sp(6) \times U(2)$ and $U(4)$. In section 5, we consider stringy instanton effects generated by ED3-branes. Once again a detailed analysis of the one-loop string amplitudes and the integrals over the supermoduli is presented. In section 6 we present a “complete” list of $\mathcal{N} = 1$ SYM theories with matter in the adjoint, fundamental, symmetric and antisymmetric representations of the gauge groups (U, SO, Sp) which exhibit a non-perturbatively generated ADS superpotential. We conclude with some comments and directions for future investigation in Section 7.

2 The Gauge Theory

The low energy dynamics of the open strings living on a stack of $N$ D3-branes in flat space is described by an $\mathcal{N} = 4$ $U(N)$ SYM gauge theory. In the $\mathcal{N} = 1$ language the fields are grouped into a vector multiplet $V = (A_\mu, \lambda_\alpha, \bar\lambda_{\dot\alpha})$ and 3 chiral multiplets $\Phi^I = (\phi^I, \psi^I_\alpha)$, $I = 1, 2, 3$, all in the adjoint of the gauge group.

We consider the D3-brane system at an $\mathbb{R}^6/\mathbb{Z}_3$ singularity. At the singularity the $N$ D3-branes group into stacks of $N_n$ fractional branes, with $n = 0, 1, 2$ labelling the conjugacy classes of $\mathbb{Z}_3$. The gauge group $U(N)$ decomposes as $\prod_n U(N_n)$. More precisely, denoting by $\gamma_{\theta,N}$ the projective embedding of the orbifold group element $\theta \in \mathbb{Z}_3$ in the Chan-Paton group and imposing $\gamma_{\theta,N}^3 = 1$ and $\gamma_{\theta,N}^\dagger = \gamma_{\theta,N}^{-1}$, one can write
$$ \gamma_{\theta,N} = \left( \mathbb{1}_{N_0 \times \bar N_0},\; \omega\, \mathbb{1}_{N_1 \times \bar N_1},\; \bar\omega\, \mathbb{1}_{N_2 \times \bar N_2} \right) \tag{2.1} $$
with $N = \sum_n N_n$.
The resulting gauge theory can be found by projecting the $\mathcal{N} = 4$ $U(N)$ gauge theory under the $\mathbb{Z}_3$ orbifold group action:
$$ V \to \gamma\, V\, \gamma^{-1}, \qquad \Phi^I \to \omega\, \gamma\, \Phi^I\, \gamma^{-1}, \qquad \omega = e^{2\pi i/3} \tag{2.2} $$
Keeping only the components invariant under (2.2) one finds the $\mathcal{N} = 1$ quiver gauge theory
$$ V:\; N_0 \bar N_0 + N_1 \bar N_1 + N_2 \bar N_2, \qquad \Phi^I:\; 3 \times \left[ N_0 \bar N_1 + N_1 \bar N_2 + N_2 \bar N_0 \right] \tag{2.3} $$
with gauge group $\prod_n U(N_n)$ and three generations of bifundamentals. More precisely, $V$ and $\Phi^I$ are $N \times N$ block matrices ($N = \sum_n N_n$) with non-trivial $N_n \times \bar N_m$ blocks given by (2.3). Under $\mathbb{Z}_3$ a block $N_n \times \bar N_m$ transforms as $\omega^{n-m}$. These non-trivial transformation properties are compensated by the space-time eigenvalues of the corresponding field ($\omega^0$ for $V$ and $\omega$ for $\Phi^I$), making the corresponding component invariant under $\mathbb{Z}_3$.

Next we consider the effect of introducing an O3$^\pm$-plane. Worldsheet parity $\Omega$ flips open string orientations and acts on Chan-Paton indices as $N_n \leftrightarrow \bar N_{-n}$, where subscripts are always understood mod 3. This prescription leads to
$$ \Omega:\; N_0 \leftrightarrow \bar N_0, \qquad N_1 \leftrightarrow \bar N_2 \tag{2.4} $$
The choices of O3$^\pm$-planes correspond to keeping states with eigenvalues $\Omega = \pm 1$ and lead to symplectic or orthogonal gauge groups^4. We start by considering the O3$^-$ case. Keeping the $\Omega = -$ components from (2.3) one finds
$$ V:\; \tfrac{1}{2} N_0 (N_0 - 1) + N_1 \bar N_1, \qquad \Phi^I:\; 3 \times \left[ N_0 \bar N_1 + \tfrac{1}{2} N_1 (N_1 - 1) \right] \tag{2.5} $$
This follows from (2.3) after identifying the mirror images $\bar N_0 = N_0$, $\bar N_2 = N_1$, and antisymmetrizing the resulting block matrix. Equation (2.5) describes the field content of an $\mathcal{N} = 1$ SYM with gauge group $SO(N_0) \times U(N_1)$ and three chiral multiplets in the $(\square, \bar\square) + (\bullet, \text{antisymmetric})$, i.e. the bifundamental plus the antisymmetric of $U(N_1)$.

For general $N_0$, $N_1$ the $U(N_1)$ gauge theory is anomalous. The anomaly is a signal of the presence of a twisted RR tadpole [34, 35]. Focusing on a local description near the orientifold singularity one can relax the global tadpole cancellation condition [55, 56]. These models can be thought of as local descriptions of a more complicated Calabi-Yau near a $\mathbb{Z}_3$ singularity.
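The orbifold projection (2.2)–(2.3) amounts to a simple selection rule on Chan-Paton blocks: a block $N_n \times \bar N_m$ picks up $\omega^{n-m}$, and it survives when this phase cancels the space-time phase of the field. The following minimal sketch enumerates the surviving blocks under that rule; the function name and the integer encoding of the phases are illustrative assumptions, not notation from the text.

```python
def surviving_blocks(spacetime_charge, order=3):
    """Return the (n, m) labels of the blocks N_n x Nbar_m left invariant.

    A block transforms with omega**(n - m); the field itself carries
    omega**spacetime_charge (0 for V, 1 for Phi^I). Invariance requires
    the total exponent to vanish modulo the orbifold order.
    """
    return [(n, m)
            for n in range(order)
            for m in range(order)
            if (n - m + spacetime_charge) % order == 0]


# Vector multiplet V: only diagonal blocks survive (adjoint of each U(N_n)).
print(surviving_blocks(0))  # [(0, 0), (1, 1), (2, 2)]
# Chiral multiplets Phi^I: bifundamentals N_0 Nbar_1, N_1 Nbar_2, N_2 Nbar_0.
print(surviving_blocks(1))  # [(0, 1), (1, 2), (2, 0)]
```

The orientifold identification (2.4) then acts on this list by pairing mirror blocks; that step is not modelled here.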
Cancellation of the twisted RR tadpole can be written as [40]
$$ \mathrm{tr}\, \gamma_{\theta,N} = -4 \;\Rightarrow\; N_0 = N_1 - 4 \tag{2.6} $$
and ensures the cancellation of the irreducible four-dimensional anomaly
$$ I(F) \sim \left[ -N_0 + (N_1 - 4) \right] \mathrm{tr} F^3 = 0 \tag{2.7} $$
Finally, the running of the gauge coupling constants is governed by the β functions with one-loop coefficients^5
$$ \beta_0 = 3\,\ell\!\left(\tfrac{1}{2} N_0 (N_0 - 1)\right) - 3 N_1\, \ell(N_0) = \tfrac{3}{2} (N_0 - N_1 - 2) = -9 \quad \text{(IR free)} $$
$$ \beta_1 = 3\,\ell(N_1 \bar N_1) - 3 N_0\, \ell(\bar N_1) - 3\,\ell\!\left(\tfrac{1}{2} N_1 (N_1 - 1)\right) = \tfrac{3}{2} (-N_0 + N_1 + 2) = +9 \quad \text{(UV free)} \tag{2.8} $$
with $\beta_n$ referring to the $n$-th gauge group. The last equalities arise after imposing the anomaly cancellation (2.6). As expected, $\beta_0 + \beta_1 = 0$ since the ten-dimensional dilaton does not run.

^4 In the compact case, realized in terms of D9-branes and O9-plane on $T^6/\mathbb{Z}_3$, the orthogonal choice is dictated by global tadpole cancellation. Turning on a quantized NS-NS antisymmetric tensor [28, 41, 29] leads to symplectic groups.
^5 Here $\mathrm{tr}_R\, T^a T^b = \ell(R)\, \delta^{ab}$, i.e. $\ell(N) = \tfrac{1}{2}$, $\ell(N \bar N) = N$ and $\ell\!\left(\tfrac{1}{2} N (N \pm 1)\right) = \tfrac{1}{2} (N \pm 2)$.

The case $\Omega = +$ works in a similar way. The resulting $\mathcal{N} = 1$ quiver has gauge group $Sp(N_0) \times U(N_1)$ and three chiral multiplets in the $[(\square, \bar\square) + (\bullet, \text{symmetric})]$. The $U(N_1)$ is anomaly free for $N_0 = N_1 + 4$ and the one-loop β function coefficients are given by $\beta_0 = +9$ (UV free) and $\beta_1 = -9$ (IR free).

3 D(-1) Instantons

There are two sources of supersymmetric instanton corrections in the D3 brane gauge theory: D(-1)-instantons and Euclidean ED3-branes wrapping four-cycles on $T^6/\mathbb{Z}_3$. Both are point-like configurations in space-time and can be thought of as D(-1)-D3 and ED3-D3 bound states with four and eight directions with mixed Neumann-Dirichlet boundary conditions.

3.1 D3-D(-1) in flat space

Gauge instantons in SYM can be efficiently described in terms of D(-1)-branes living on the world-volume of D3-branes [57]. As before, we start from the $\mathcal{N} = 4$ case: a bound state of $N$ D3 and $k$ D(-1) branes in flat space.
In this formalism instanton moduli are described by the lowest energy modes of open strings with at least one end on the D(-1)-brane stack. The gauge theory dynamics around the instanton background can be described in terms of the U(k) × U(N) 0-dimensional matrix theory living on the D- instanton world-volume. In particular, the ADHM constraints [58] defining the moduli space of self-dual YM connections follow from the F- and D- flatness condition in the matrix theory [57]. The instanton moduli space is given by the D(-1)D3 field content (aµ, θ α , χa, D c, θ̄Aα̇) kk̄ (wα̇, ν A) kN̄ (w̄α̇, ν̄ A) Nk̄ (3.1) with µ = 1, . . . , 4, α, α̇ = 1, 2 (vector/spinor indices of SO(4)), a = 1, . . . , 6, A = 1, . . . , 4 (vector/spinor indices of SO(6)R), c = 1, . . . , 3. The matrices aµ, χa describe the positions of the instanton in the directions parallel and perpendicular to the D3-brane respectively, wα is given by the NS open D3- D(-1) string (instanton sizes and orientations), Dc are auxiliary fields and θAα , θ̄Aα̇, ν A are the fermionic superpartners. The D3-D(-1) action can be written as [59] Sk,N = trk SG + SK + SD (3.2) SG = −[χa, χb] 2 + iθ̄α̇A[χ AB, θ̄ cDc (3.3) SK = −[χa, aµ] 2 + χaw̄ α̇wα̇χa − iθ αA[χABθ α ] + 2iχAB ν̄ SD = i −[aαα̇, θ αA] + ν̄Awα̇ + w̄α̇ν θ̄α̇A +D w̄σcw − iη̄cµν [a µ, aν ] with χAB ≡ T aABχa, T AB = (η AB, iη̄ AB) given in terms of the t’Hooft symbols and g20 = 4π(4π 2α′)−2 gs. The action (3.3) follows from the dimensional reduction of the D5-D9 action in six dimensions down to zero dimension. As a consequence our subsequent results hold up to some computable non vanishing numerical constant. In the presence of a v.e.v. 
for the six U(N) scalars ϕa in the D3-D3 open string sector we must add to Sk,N Sϕ = trk w̄α̇(ϕaϕa + 2χaϕa)wα̇ + 2iν̄ AϕABν (3.4) The multi-instanton partition function is Zk,N = e−Sk,N−Sϕ = VolU(k) dχ dD da dθ dθ̄dw dν e−Sk,N−Sϕ In the limit g0 ∼ (α ′)−1 → ∞, gravity decouples from the gauge theory and the contributions coming from SG are suppressed. The fields θ̄α̇A, D c become Lagrange multipliers implementing the super ADHM constraints θ̄α̇A : ν̄ Awα̇ + w̄α̇ν A − [aαα̇, θ αA] = 0 Dc : w̄σcw − iη̄cµν [a µ, aν ] = 0 (3.5) 3.2 D(-1)-D3 at the C3/Z3-orientifold Let us now consider in turn the Ω and then the Z3 projection. The effect of introducing an O3±-plane in the D(-1)-D3 system corre- sponds to keep open string states with eigenvalue ΩI = ±, Ω being the worldsheet parity and I a reflection along the Neumann-Dirichlet directions of the Dp-O3 system [8]. On D(-1) string modes, I acts as a reflection in the spacetime plane I : aµ → −aµ θ α → −θ α (3.6) leaving all other moduli invariant. In addition consistency with the D3-O3 projection requires that the D(-1) strings are projected in the opposite way with respect to the D3-branes[60] . From the gauge theory point of view this corresponds to the well known fact that SO(N) and Sp(N) gauge instantons have ADHM constraints invariant under Sp(k) and SO(k) respectively. We start by considering the O3− case. After the ΩI projection the sur- viving fields are (aµ, θ k(k− 1) (Dc, χI , χ̄I , θ̄Aα̇) k(k+ 1) (wα̇, ν) kN . (3.7) Since we are dealing with a SO(N) gauge theory the Dc moduli are projected in the adjoint of Sp(k). This is also the case for all the other moduli even under I while the odd ones, (aµ, θ α ), turn out to be antisymmetric. Let us now consider the Z3 projection. Out of the six χa one can form three complex fields χI with eigenvalues ω under Z3 and their conjugate χ̄I . To embed the Z3 projection into SU(4) we decompose the spinor index A = (0, I), with I = 1, . . . 
, 3 and the zeroth direction along the surviving N = 1 supersymmetry. The D3 and D(-1) gauge groups SO(N) and Sp(k) break into SO(N0)×U(N1) and Sp(k0)×U(k1) respectively with N0 (k0) the number of fractional D3 (D(-1)) branes invariant under Z3 and N1 (k1) those transforming with eigenvalue ω. More precisely, the projective embedding of the Z3 basic orbifold group element θ in the Chan-Paton group can be written N0×N0 , ωh 1 N1×N̄1 , ω̄h 1 N̄1×N1 k0×k0 , ωh 1 k1×k̄1 , ω̄h 1 k̄1×k1 ) (3.8) After projecting under Z3 the symmetric/antisymmetric matrices in (3.7) break into km × k̄n, km × N̄n or Nm × k̄n each transforming with eigenvalue ωm−n. In addition fields with up(down) index I transform like ω(ω̄). Keeping only the invariant components one finds (aµ; θ k0(k0 − 1) + k1k̄1 k1(k1 − 1) + k0k̄1 (Dc; θ̄0α̇) k0(k0 + 1) + k1k̄1 (χ̄I ; θ̄Iα̇) k̄1(k̄1 + 1) + k0k1 k1(k1 + 1) + k0k̄1 (wα̇; ν0) k0N0 + k1N̄1 + k̄1N1 νI k0N̄1 + k̄1N0 + k1N1 (3.9) Notice that the Z3 eigenvalues of the Chan-Paton indices in the r.h.s. of (3.9) compensate for those of the moduli in the l.h.s. making the field invariant under Z3. In addition (odd)even components under I are (anti)symmetrized ensuring the invariance under ΩI. The multi-instanton action follows from that of N = N0+2N1 D3 branes and k = k0 + 2k1 D(-1) instanton in flat space (3.2) with U(N) and U(k) matrices restricted to the invariant blocks (3.9). The results for O3+ can be read off from (3.9) by exchanging symmetric and antisymmetric representations. 4 ADS-like superpotential 4.1 D3-D(-1) one-loop vacuum amplitudes Non-perturbative superpotentials can be computed from the instanton mod- uli space integral [11, 13, 18] SW = e 〈1〉D+〈1〉A+〈1〉M µβnkn e−Sk,N−Sϕ (4.1) The integration is over the instanton moduli space, M, 〈1〉D is the disk amplitude and 〈1〉A,M are the one-loop vacuum amplitudes with at least one end on the D(-1)-instanton. 
The factor µβnkn , µ being the energy scale, comes from the quadratic fluctuations around the instanton background and as we will see it combines with a similar contribution coming from the moduli measure to give a dimensionless SW . The terms in front of the integral in (4.1) combine into SW = Λ e−Sk,N−Sϕ (4.2) Λknβn = e2πiknτn(µ) µβnkn τn(µ) = τn − (4.3) the one-loop renormalization group invariant and the running coupling con- stant respectively and τn refers to the complexified coupling constant of the nth gauge group. µ0 is a reference scale. More precisely, the disk amplitude and one-loop amplitudes yields e〈1〉D = e2πiknτn τn = e〈1〉A+〈1〉M = )−βnκn + . . . (4.4) with dots refering to threshold corrections that will not be considered here. To see (4.4) we should compute the following one-loop amplitudes 〈1〉A = − Tr[(1 + (−)F )(1 + θ + θ2) qL0−a] AD(−1)D3 = −A0,D(−1)D3 ln + . . . 〈1〉M = − Tr[ ΩI (1 + (−)F )(1 + θ + θ2) qL0−a] MD(−1) = −M0,D(−1) ln + . . . (4.5) In the above formula µ enters as a UV regulator in the open string chan- nel (see [61] for details) and A0,M0 are the massless contributions to the amplitudes. We start by considering the O3− projection. It is important to notice that only the annulus with one end on the D(-1) and one on the D3 contributes to these amplitudes since D(-1)-D(-1) amplitudes cancel due to the Riemann identity. One finds AD(−1),D3 = trγθ,ktrγθ,N ϑ[αβ ] ϑ[αβ+hi ] (k0 − k1)(N0 −N1) + . . . MD(−1) = trγθ2,k ϑ[αβ ] ϑ[αβ+hi ] = −3(k0 − k1) + . . . (4.6) The sum runs over the even spin structures and cαβ = (−) 2(α+β). The term comes from the (b, c) and (β, γ) ghosts while the extra five thetas in the numerator and denominator describe the contributions of the ten fermionic and bosonic worldsheet degrees of freedom. We adopt the shorthand notation h ] ≡ ϑ[ h ]/(2 cosπh) to describe the massive contribution of a periodic boson to the partition function. 
hi = ( ) denote the Z3-twists while the extra 1 -shifts in the annulus account for the D(-1)-D3 open string twist along Neuman-Dirichlet directions while 1 twists in the Möbius come from the I-projection. In addition we used the fact that the contribution of the unprojected sector is zero after using the Riemann identity while that of the θ- and θ2-projected sectors are identical explaining the overall factor of 2. The extra factor of 2 in the annulus comes from the two orientations of the string. The second line displays the massless contributions. We use the Chan Paton traces 1,k = k0 + 2k1 trγθ,k = k0 − k1 1,N = N0 + 2N1 trγθ,N = N0 −N1 (4.7) that follows from (2.1) and the first few terms in the theta expansions ϑ[0h] = 1 + q 2 2 cos 2πh+ . . . ϑ[ h ] = q 8 2 cosπh+ . . . η = q 24 + . . . (4.8) From (4.6) one finds A0 +M0 = (k0 − k1)(N0 −N1 − 2) = knβn (4.9) with βn the one-loop β coefficients given in (2.8). Plugging (4.9) into (4.5) results into (4.4). The fact that the β function coefficients are reproduced by the instanton vacuum amplitudes is a nice test of the instanton field content (3.9). Now let us determine the dependence of the instanton measure on the string scaleMs ∼ α ′ −1/2. The scaling of the various instanton moduli follows from (3.3): D, g0 ∼M s χa, ϕa ∼ Ms wα̇, aµ ∼M νA, θAα ∼M s θ̄Aα̇ ∼M s (4.10) Collecting from (3.9) the number of components of the various moduli enter- ing in the instanton measure one finds 6 e−Sk,N−Sϕ ∼ M−βnkns knβn = −2nD − nχ + na + nw + nθ̄ − (k0 − k1)(N0 −N1 − 2) (4.11) Notice that this factor precisely combines with that in (4.2) leading to a di- mensionless SW as expected. This simple dimensional analysis can be used to determine the form of the allowed ADS superpotentials in the gauge theory. A superpotential is generated if and only if the integral over the instanton moduli space reduce to an integral over x 0 describing the center of the in- stanton and θα its superpartner. 
More precisely
$$ S_W = \Lambda^{k_n \beta_n} \int d\mathcal{M}\; e^{-S_{k,N} - S_\varphi} = c \int d^4x_0\, d^2\theta\; \frac{\Lambda^{k_n \beta_n}}{\varphi^{k_n \beta_n - 3}} \tag{4.12} $$
where $c$ is a numerical constant. Whether $c$ is zero or not depends on the presence or absence of extra fermionic zero modes besides $\theta$. Notice that the power of $\varphi$ is completely fixed by requiring that $S_W$ is dimensionless. The precise form of the superpotential requires the evaluation of the moduli space integral and will be the subject of the next section. The superpotential follows from (4.12) after promoting $\varphi^I$ to the chiral superfield $\Phi^I$ and $x_0$, $\theta_\alpha$ to the measure of the superspace:
$$ S_W = c \int d^4x\, d^2\theta\; \frac{\Lambda^{k_n \beta_n}}{\Phi^{k_n \beta_n - 3}} \tag{4.13} $$

^6 We recall that fermionic differentials scale as the inverse of the dimension of the fermion itself. This explains the extra minus sign in (4.11).

A superpotential of type (4.13) is generated whenever [63, 64, 16, 17]
$$ \langle \lambda^2\, \varphi^{k_n \beta_n - 3} \rangle \neq 0 \tag{4.14} $$
Each scalar $\varphi$ soaks up two fermionic zero modes and each gaugino $\lambda$ one zero mode^7. The condition (4.14) translates into
$$ \dim \mathcal{M}_F = 2 k_n \beta_n - 4 \tag{4.15} $$
with $\dim \mathcal{M}_F$ the fermionic dimension of the instanton super-moduli space. The number of fermionic zero modes can be read off from (3.9):
$$ \dim \mathcal{M}_F = n_\theta + n_\nu - n_{\bar\theta} = k_0 (3 N_1 + N_0 - 2) + k_1 \left[ 2 N_1 + 3 (N_0 + N_1 - 2) \right] = k_0 (4 N_0 + 10) + k_1 (8 N_0 + 14) \tag{4.16} $$
where we used the fact that the $\bar\theta_{\dot\alpha A}$ enter as Lagrange multipliers imposing the fermionic ADHM constraint and therefore subtract degrees of freedom. The last line in (4.16) follows from using the anomaly cancellation condition $N_1 = N_0 + 4$.
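The last equality in (4.16) is a one-line substitution, and it can be confirmed by brute force. The sketch below is an illustration only; the function names are hypothetical and the loop bound is an arbitrary choice.

```python
def dim_mf_general(k0, k1, n0, n1):
    """Middle expression of (4.16): zero-mode count for generic N0, N1."""
    return k0 * (3 * n1 + n0 - 2) + k1 * (2 * n1 + 3 * (n0 + n1 - 2))


def dim_mf_anomaly_free(k0, k1, n0):
    """Last expression of (4.16), valid once N1 = N0 + 4 is imposed."""
    return k0 * (4 * n0 + 10) + k1 * (8 * n0 + 14)


def substitution_holds(max_value=12):
    """Compare both expressions on a grid of non-negative integers."""
    return all(
        dim_mf_general(k0, k1, n0, n0 + 4) == dim_mf_anomaly_free(k0, k1, n0)
        for k0 in range(max_value)
        for k1 in range(max_value)
        for n0 in range(max_value)
    )


print(substitution_holds())            # True
print(dim_mf_anomaly_free(0, 1, 0))    # 14
```

The second value is the count for $k_0 = 0$, $k_1 = 1$, $N_0 = 0$, the case singled out later in this section.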
The result (4.16) is consistent with the Atiyah-Singer index theorem, which states
$$ \dim \mathcal{M}_F = 2 k_0 \left[ \ell\!\left(\tfrac{1}{2} N_0 (N_0 - 1)\right) + 3 N_1\, \ell(N_0) \right] + 2 k_1 \left[ \ell(N_1 \bar N_1) + 3 N_0\, \ell(N_1) + 3\, \ell\!\left(\tfrac{1}{2} N_1 (N_1 - 1)\right) \right] = k_0 (3 N_1 + N_0 - 2) + k_1 \left[ 2 N_1 + 3 (N_0 + N_1 - 2) \right] \tag{4.17} $$
Combining (4.15) and (4.16) one finds
$$ N_0 = \frac{k_1 - 7 k_0 - 1}{k_0 + 2 k_1} \tag{4.18} $$
One can easily see that the only non-negative solution for $N_0$ is
$$ N_0 = 0, \qquad k_0 = 0, \qquad k_1 = 1 $$
We conclude that in the class of $U(N_0 + 4) \times SO(N_0)$ SYM theories describing the low-energy dynamics of D3-branes on the $\mathbb{Z}_3$ orientifold, only the $U(4)$ theory with three chiral multiplets in the antisymmetric leads to an ADS-like superpotential generated by gauge instantons.

^7 This can be seen by explicitly solving the equations of motion of the gaugino and the $\varphi$-field in the instanton background [59]. In particular the source for the scalar field comes from the Yukawa coupling $L_{Yuk} = g_{YM}\, \varphi^\dagger \psi \lambda$ in the gauge theory action.

The counting can be easily repeated for the $Sp(N_1 + 4) \times U(N_1)$ cases by exchanging symmetric and antisymmetric representations in (3.9), as required by the presence of the O3$^+$-plane. The results are
$$ k_n \beta_n = 9 (k_0 - k_1), \qquad \dim \mathcal{M}_F = k_0 (4 N_1 + 6) + k_1 (8 N_1 + 18), \qquad N_1 = \frac{3 k_0 - 9 k_1 - 1}{k_0 + 2 k_1} \tag{4.19} $$
One can easily see that the only non-negative solution is
$$ N_1 = 2, \qquad k_0 = 1, \qquad k_1 = 0 $$
We conclude that in this class, only the gauge theory $Sp(6) \times U(2)$ with three chiral multiplets in the $(\square, \bar\square) + (\bullet, \text{symmetric})$ admits an ADS-like superpotential generated by instantons. The aim of the rest of this section is to compute $S_W$. The integral (4.12) will be evaluated in turn for the $Sp(6) \times U(2)$ and $U(4)$ cases.

4.2 Sp(6) × U(2) superpotential

We first consider the O3$^+$ case, i.e. the $Sp(6) \times U(2)$ gauge theory with three chiral multiplets in the $[(\mathbf{6}, \bar{\mathbf{2}}) + (\mathbf{1}, \mathbf{3})]$. The instanton moduli space is given by (3.9) after flipping symmetric/antisymmetric representations in order to deal with the symplectic projection.
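Before evaluating the integral, the zero-mode counting of section 4.1 that singled out the two cases can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions: it evaluates the right-hand sides of (4.18) and (4.19) at the quoted instanton numbers and confirms that the condition (4.15) is met there, taking $k_n \beta_n = 9 (k_1 - k_0)$ for the O3$^-$ case (from $\beta_0 = -9$, $\beta_1 = +9$ in (2.8)) and $9 (k_0 - k_1)$ for the O3$^+$ case. All function names are hypothetical.

```python
from fractions import Fraction


def n0_from_418(k0, k1):
    """Right-hand side of (4.18) for the U(N0+4) x SO(N0) class."""
    return Fraction(k1 - 7 * k0 - 1, k0 + 2 * k1)


def n1_from_419(k0, k1):
    """Right-hand side of (4.19) for the Sp(N1+4) x U(N1) class."""
    return Fraction(3 * k0 - 9 * k1 - 1, k0 + 2 * k1)


def balance_o3_minus(k0, k1, n0):
    """dim M_F - (2 k_n beta_n - 4) for O3-; zero when (4.15) holds."""
    dim_mf = k0 * (4 * n0 + 10) + k1 * (8 * n0 + 14)
    return dim_mf - (2 * 9 * (k1 - k0) - 4)


def balance_o3_plus(k0, k1, n1):
    """dim M_F - (2 k_n beta_n - 4) for O3+; zero when (4.15) holds."""
    dim_mf = k0 * (4 * n1 + 6) + k1 * (8 * n1 + 18)
    return dim_mf - (2 * 9 * (k0 - k1) - 4)


# U(4): k0 = 0, k1 = 1 gives N0 = 0.
print(n0_from_418(0, 1), balance_o3_minus(0, 1, 0))   # 0 0
# Sp(6) x U(2): k0 = 1, k1 = 0 gives N1 = 2.
print(n1_from_419(1, 0), balance_o3_plus(1, 0, 2))    # 2 0
```

In both cases the fermionic dimension equals 14, which matches $2 k_n \beta_n - 4$ with $k_n \beta_n = 9$.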
Plugging k0 = 1, k1 = 0, N0 = 6, N1 = 2 into (3.9) one finds the the surviving fields aµ, w , θ0α, ν 0u0 , νIu1 (4.20) with u0 = 1, ..6, and u1 = 1, 2, whose position from lower to upper has been switched in this section for notational convenience as we will momentarily see. In particular both θ̄0α̇ andD c are projected out (the D(-1) “gauge” group is O(1) ≈ Z2 in this case) and therefore no ADHM constraint survives. The instanton action reduces then to S = SK + Sϕ = w α̇ ϕ̄Iu0u1ϕ I u1v0wα̇v0 + ν Iu1ν0u0ϕ̄Iu0u1 (4.21) Here and below we omit numerical coefficients that can be always reabsorbed at the end in the definition of the scale. The integrations over wα̇u0 , ν , νIu1 are gaussian and the final result, up to a non vanishing numerical constant, can be written as SW = Λ d4ad2θ det6×6 (ϕ̄Iu1,u0) det6×6 (ϕ̄Iu1,u0ϕ Iu1,v0) d4ad2θ det6×6 (ϕIu1,u0) (4.22) where we have exploited the possibility of combining I and u1 in one ‘bi- index’ Iu1 so as to get a range of six values. For the sake of simplicity we have dropped the subscript 0 denoting bare scalar fields. In the following scalar fields entering in formulae involving Λ will be always understood to be bare. The last step makes use of det(AB) = det(A)det(B). 4.3 U(4) superpotential We now consider the O3− case, i.e. the U(4) gauge theory with three chiral multiplets in the 6. Setting k0 = 0, k1 = 1, N0 = 0, N1 = 4 in (3.9) the surviving fields can be written as I[uv] , ϕ̄I[uv](0) , aµ(0) , χ̄I(−2) , χ (+2) , D (0) , w u(+1) , w̄ α̇(−1) θ0α(0) , θ̄0α̇(0) , θ̄α̇I(−1) ; ν u(+1) , ν̄ (−1) , ν (+1) (4.23) with u = 1, ..4 and the charge q under U(1)k1 is denoted in parentheses. Plugging into (3.3) (after taking α′ → 0) one finds S = SB + SF (4.24) where ν̄0uwuα̇ + w̄ θ̄α̇0 + ν Iu wuα̇ θ̄ I + χ̄Iν Iu + νIuϕ̄Iuvν̄ SB = w̄ α̇ϕ̄Iuwϕ Iwvwα̇v + ϕ Iuvwα̇uwvα̇χ̄I + ϕ̄Iuvw̄ uα̇w̄vα̇χ I + w̄uα̇wuα̇χ̄Iχ +Dc w̄σcw (4.25) As before we omit numerical coefficients. 
The integral over Dc leads to a δ function on the ADHM constraints d8wd8w̄δ3(w̄σcw) = dρ ρ9d12U (4.26) In the r.h.s of (4.26) we have solved the ADHM constraints in favor of w and U defined by wuα̇ = ρUuα̇ w̄ uα̇ = ρ Ūuα̇ Ūuα̇Uuβ̇ = δ (4.27) The coset representatives Uα̇u parameterizes the SU(4)/SU(2) orientations of the instanton inside the gauge group. The fermionic integrations lead to the determinant ∆F = ρ 8 ǫu1u2u3u4ǫv1v2u5u6ǫv3v4v5v6Xu1v1u2v2 Xu3v3u4v4 Yu5v5 Yu6v6 (4.28) Xu1v1u2v2 = ǫ I1I2I3χ̄I1ϕ̄I2u1v1ϕ̄I3u2v2 Yuv = U u Uα̇u (4.29) The bosonic integrals are more involved. For arbitrary choices of the scalar VEV’s ϕ̄I and ϕ I , even along the flat directions of the potential, the inte- gration over U represents a challenging if not a prohibitive task. Fortunately choosing ϕIuv = ϕηIuv, ϕ̄Iuv = ϕ̄η Iuv, the full ϕ-dependence can be factor- ized. SU(4) gauge and SU(3) ‘flavor’ invariance can then be used to recover the full answer. After the rescaling ρ2 → ρ2/(ϕϕ̄) χI → ϕχI χ̄I → ϕ̄χ̄I (4.30) The integral becomes SW = Λ d4x0d (4.31) with I the ϕ-independent integral dρρ9 d12U d3χd3χ̄∆F e S̃B = −ρ 2(1 + ηIuvYuvχI + η̄IuvȲ uvχI + χ̄Iχ I) (4.32) and ∆F given again by (4.28) but now in terms of Xu1v1u2v2 = ǫ I1I2I3χ̄I1 η̄I2u1v1 η̄I3u2v2 Finally one can restore the gauge covariance of (4.31) by noticing that there is a unique SU(4)c × SU(3)f singlet in the symmetric tensor of six ϕ det3×3[ǫu1..u4ϕ Iu1u2ϕJu3u4] Therefore one can replace ϕ6 in (4.31) by this singlet. The superpotential follows after replacing ϕI → ΦI SW = c d4xd2θ det3×3[ǫu1..u4Φ Iu1u2ΦJu3u4] (4.33) where c is a computable non-zero numerical coefficient. 5 ED3-instantons Let us now consider the ED3-D3 system. We restrict ourselves to the compact case T 6/Z3 and consider the ED3 fractional instanton wrapping a four-cycle Cn inside T 6/Z3. We start by considering the O3 −-orientifold projection. 
The zero modes of the Yang-Mills fields in the instanton background can be described as before in terms of open strings with at least one end on the ED3. Open strings connecting ED3 and D3 branes have 8 Neumann- Dirichlet directions therefore the zero-mode dynamics of the ED3-D3 system is equivalent to that of the D7-D(-1) bound state. The instanton action can be found starting from that of the N = (8, 0) sigma model describing the low energy dynamics of a D1-D9 bound state in type I [65] reduced down to zero dimensions. In flat space the D(-1)-D7 action reads S = trk Sg + SK + SD (5.1) Sg = −[χ, χ̄] 2 + Θ̃ȧχΘ̃ȧ +DcDc SK = −[χ,Xm][χ̄, Xm] + Θ aχ̄Θa + ν(χ + ϕ)ν SD = Θ̃ ȧXmΓ a +DcΓ̂cmn[Xm, Xn] (5.2) with m = 1, . . . , 8v, a = 1, . . . , 8s, ȧ = 1, . . . , 8c, c = 1, , . . . , 7. We denote by ϕ = mI(Cn)ϕ I , the gauge scalar parametrizing the position of the D3- brane along the direction perpendicular to the 4-cycle Cn. Here Γ ȧa, Γ̂ are gamma matrices of SO(8) and SO(7) respectively. The introduction of the auxiliary fields Dc has broken the manifest SO(8) invariance of the action that will be further broken by the Z3-projection. In (5.2), Xm and χ, χ̄ describe the position of the D(-1)-instanton in the directions longitudinal and perpendicular to the D7-brane respectively while Θa, Θ̃ȧ are the fermionic superpartners grouped according to the their chirality along the Dirichlet- Dirichlet χ-plane. Unlike the D(-1)-D3 case, in the case of 8 Neumann- Dirichlet directions Ω acts in the same way on the D(-1) and D7 Chan-Paton indices. This implies that Dc transform in the adjoint of SO(k) if we take the D7 gauge symmetry to be SO(N). In addition I acts as I : Xm → −Xm Θ a → −Θa (5.3) Fields with eigenvalues ΩI = − are then in the following representations of SO(k)× SO(N) (χ, χ̄, Dc, Θ̃ȧ) 1 k(k− 1) (Xm,Θ k(k+ 1) ν kN (5.4) Fields even under I transform in the adjoint of SO(k) while odd fields tran- form in the symmetric representation. 
For k = 1, N = 32 the D(-1)-D7 system or equivalently the D1-D9 bound state describes the S-dual version of the fundamental heterotic string on T 2. k > 1 bound states correspond to multiple windings of the heterotic string [65]. The field Dc implements the one-real D and three complex F flatness conditions V = − DcDc = −g20 m,n=1 [Xm, Xn] 2 = 0 (5.5) Dc = −1 mn[Xm, Xn] (5.6) An explicit choice of Γ matrices in D = 7 is given by (a = 1, 2, 3) Γa8×8 = iσ1 ⊗ η 4×4 Γ 8×8 = iσ3 ⊗ η̄ 4×4 Γ 8×8 = iσ2 ⊗ 14×4 (5.7) As in section 3 Z3 acts both on spacetime and Chan-Paton indices. Chan- Paton indices decompose as N → N0 + N1 + N̄1 and k → k0 + k1 + k̄1. Spacetime indices on the other hand decompose as 8v = 4 + 2ω + 2ω̄ 8s = 2 + 2ω + 4ω̄ 8c = 2 + 2ω̄ + 4ω 7 = 3 + 2ω + 2ω̄ (5.8) In addition χ, ν transform with eigenvalue ω under Z3. Combining with (5.4) one finds the Z3-invariant components χ, χ̄ 1 k1(k1 − 1) + k0k̄1 + h.c. Dc 3(1 k0(k0 − 1) + k1k̄1) + 2 k1(k1 − 1) + k0k̄1 + h.c. Θ̃ȧ 2 k0(k0 − 1) + k1k̄1 k̄1(k̄1 − 1) + k0k1) k1(k1 − 1) + k0k̄1) k0(k0 + 1) + k1k̄1 k1(k1 + 1) + k0k̄1 + h.c. k0(k0 + 1) + k1k̄1 k1(k1 + 1) + k0k̄1) k̄1(k̄1 + 1) + k0k1) ν k0N̄1 + k1N1 + k̄1N0 (5.9) 5.1 D3-ED3 one-loop vacuum amplitudes ED3 generated superpotentials can be computed following the same steps as in section 4.1. The disk amplitude can be written as e〈1〉D = e2πiknτ̃n τ̃n = i 4πV4(Cn) g2n α (C4 + C0 ∧R ∧ R) (5.10) τ̃n describes the coupling of closed string moduli to the ED3 instanton wrap- ping the 4-cycle Cn with volume V4(Cn). We remark that closed string states in the Z3-twisted sectors flow in the ED3-ED3 cylinder amplitude and there- fore τ̃n is function of both untwisted and twisted closed twisted moduli. This is not surprising since the volume of the cycle depends also on the volume of the exceptional cycles that the ED3 wraps. The annulus and Möbius amplitudes are given by AED3,D3 = ϑ[αβ ] 2trγθ,ktrγθ,N ϑ[αβ−2h1 ] + trγ 1,ktrγ1,N ϑ[αβ ] k0N1 − k1(N0 +N1) + . . . 
MED3 = − ϑ[αβ ] 2trγθ2,k ϑ[αβ−2h1 ] + trγ ϑ[αβ ] = 3k0 + k1 + . . . (5.11) The origin of the various contributions is the same that in the D(-1)-D3 system. Now the D3-ED3 open strings have 8 Neumann-Dirichlet directions explaining the extra 1 twists in the annulus amplitude. On the other side, the I projection accounts for the 1 -shift in the Möbius amplitude. Notice that unlike the D(-1)-D3 case, the unprojected amplitude tr1, now gives a non-trivial contribution. Collecting the contributions from (5.11) one finds Λ̃knbn = µknbn e〈1〉D+〈1〉A+〈1〉M = µknbn e2πikn τ̃n(µ) (5.12) τ̃n(µ) = τ̃n − (5.13) A0 +M0 = knbn = k0(6−N1) + k1(2−N0 −N1) (5.14) The interpretation of the bn as the one-loop β function coefficients of the τ̃n coupling, though tantalizing, is not clear to us. We will now check that knbn reproduces the right scale dependence of the instanton measure. The scaling of the various instanton moduli follows from (5.2): D, g0 ∼M s χ, χ̄, ϕ ∼Ms Xm ∼ M ν, Θa ∼M−1/2s Θ̃ ȧ ∼ M3/2s (5.15) Collecting from (5.9) the number of degrees of freedom entering in the instanton supermoduli measure one finds e−Sk,N ∼ M−knbns knbn = −2nD − nχ + nX + nΘ̃ − k0(6−N1) + k1(2−N0 −N1) (5.16) As in the previous case we write the instanton generated superpotential as the moduli space integral SW = Λ̃ e−Sk,N−Sϕ = d4x0d 2θ Λ̃knbn ϕ−knbn+3 (5.17) After promoting ϕ→ Φ and x0, θα to the measure of the superspace one finds the ED3 generated superpotential d4xd2θ Λ̃knbn Φ−knbn+3 (5.18) The main difference with respect to the D(-1) instantons is that now ϕ enters into Sϕ (5.2) only through the coupling to the ν-fermions. This implies that in order to get a non zero result from the fermionic integral in (5.17) only the ν’s and the two fermionic zero modes θα ∈ Θ a should survive the orientifold projections. From (5.9) one can easily see that this implies k0 = 1, k1 = 0. The same counting shows that no solutions are allowed in the Sp(N) case. 
5.2 The superpotential Here we evaluate the instanton moduli space integral for the SO(N0)×U(N1) case. From our analysis above the relevant cases are k0 = 1, k1 = 0. The surviving fields in (5.9) are θα ∈ Θ 0 ∈ Xm νu (5.19) with u = 1, ...N1. The instanton action reduces to S = νuϕ uvνv (5.20) The superpotential is then given by the integral SW = Λ̃ d4x d2θ dN1ν e−νϕν (5.21) After integration over ν and lifting ϕ→ Φ to the superfield one finds SW = c Λ̃ d4x d2θ ǫu1....uN1Φ u1u2Φu3u4...ΦuN1−1uN1 (5.22) where c is a non vanishing numerical constant. Notice that the result is non-trivial only when N1 is even. The superpotentials (5.22) are non- renormalizable for N1 > 6 and grow for large vacuum expectation values where the low energy approximation breaks down. The only exceptions are Majorana masses U(4) + 3 Yukawa couplings SO(2)× U(6) + 3 ( , ¯) + 3 (•, ) (5.23) Notice that both instanton generated Yukawa couplings involve only the mat- ter in the antisymmetric representation. 6 ADS superpotentials: a general analysis Here we consider a general N = 1 gauge theory with gauge group U(N) and nAdj, nf/n̄f , nS/n̄S, nA/n̄A number of chiral multiplets in the adjoint, fun- damental, symmetric and anti-symmetric representations (and their complex conjugates) respectively. 
The cubic chiral anomaly, the one-loop β function and the number of fermionic zero modes in the instanton background of the gauge theory can be written as

$$ I_{\rm anom} = n_{f-} + n_{S-}(N+4) + n_{A-}(N-4) = 0 \tag{6.1} $$

$$ \beta_{\rm 1-loop} = 3N - N n_{\rm Adj} - \tfrac{1}{2} n_{f+} - \tfrac{1}{2} n_{S+}(N+2) - \tfrac{1}{2} n_{A+}(N-2) $$
$$ \dim \mathcal{M}_F = k\,[\,2N + 2N n_{\rm Adj} + n_{f+} + n_{S+}(N+2) + n_{A+}(N-2)\,] \tag{6.2} $$

with $n_{f\pm} = n_f \pm \bar n_f$, $n_{S\pm} = n_S \pm \bar n_S$ and $n_{A\pm} = n_A \pm \bar n_A$. The condition for an Affleck, Dine and Seiberg like superpotential [16, 17] to be generated was determined in section 4.1 to be

$$ \dim \mathcal{M}_F = 2k\,\beta_{\rm 1-loop} - 4 \tag{6.3} $$

Combining (6.1) and (6.3) one finds

$$ \beta_{\rm 1-loop} = 2N + \frac{1}{k} $$
$$ n_{f-} = -n_{S-}(N+4) - n_{A-}(N-4) $$
$$ n_{f+} = 2N - \frac{2}{k} - 2N n_{\rm Adj} - n_{S+}(N+2) - n_{A+}(N-2) \tag{6.4} $$

Remarkably, the β function in a theory admitting an instanton generated superpotential depends only on the rank of the gauge group. A simple inspection shows that a superpotential is generated only for k = 1 and n_Adj = 0. The complete list follows from a scan of any choice of n_{S±}, n_{A±} such that n_+ ≥ |n_-| and n_+ ≥ 0. Writing F and A for the fundamental and antisymmetric representations and F̄ for the conjugate of F, one finds

U(N) + N_f (F + F̄)                      N_f ≤ N − 1
U(N) + A + (N − 4) F̄ + N_f (F + F̄)      N_f ≤ 2
U(4) + 2 A + N_f (F + F̄)                N_f ≤ 1
U(4) + 3 A
U(5) + 2 A + 2 F̄                                        (6.5)

The inequalities are saturated for gauge theories satisfying (6.3) and (6.4), while the lower cases are found by decoupling quark-antiquark pairs via mass deformations. The generalization to SO(N)/Sp(N) gauge groups is straightforward. In these cases there is no restriction coming from anomalies, since the representations are real. The β function and the number of fermionic zero modes in the instanton background are given by

$$ \beta_{\rm 1-loop} = \tfrac{3}{2}(N \pm 2) - \tfrac{1}{2} n_f - \tfrac{1}{2} n_S (N+2) - \tfrac{1}{2} n_A (N-2) $$
$$ \dim \mathcal{M}_F = k\,[\,N \pm 2 + n_f + n_S(N+2) + n_A(N-2)\,] $$

with the upper sign for Sp(N) and the lower sign for SO(N) gauge groups.
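The counting in (6.1)-(6.3) can be checked numerically. The following is a minimal sketch rather than part of the original analysis: it assumes the β function is normalised with Dynkin indices 1/2, (N+2)/2 and (N−2)/2 for the fundamental, symmetric and antisymmetric representations, and all function and argument names are hypothetical. It tests whether the U(N) theories of the list (6.5), taken at the saturating value of N_f and labelled with F, F̄ and A for the fundamental, antifundamental and antisymmetric representations, are anomaly free and satisfy dim M_F = 2kβ − 4 for k = 1.

```python
from fractions import Fraction


def u_n_counting(N, k=1, n_adj=0, nf=0, nfb=0, nS=0, nSb=0, nA=0, nAb=0):
    """Return (anomaly, beta, dim_MF) for U(N) with the given chiral matter.

    nf/nfb, nS/nSb, nA/nAb count fundamentals, symmetrics and antisymmetrics
    and their conjugates (hypothetical argument names, not from the paper).
    """
    nf_p, nf_m = nf + nfb, nf - nfb
    nS_p, nS_m = nS + nSb, nS - nSb
    nA_p, nA_m = nA + nAb, nA - nAb
    # Cubic anomaly, eq. (6.1)
    anomaly = nf_m + nS_m * (N + 4) + nA_m * (N - 4)
    # Matter contribution shared by beta and dim M_F, eq. (6.2)
    matter = nf_p + nS_p * (N + 2) + nA_p * (N - 2)
    # Assumed normalisation: each matter term enters beta with a factor 1/2
    beta = 3 * N - N * n_adj - Fraction(matter, 2)
    dim_mf = k * (2 * N + 2 * N * n_adj + matter)
    return anomaly, beta, dim_mf


def generates_ads(N, k=1, **matter):
    """True if the theory is anomaly free and satisfies eq. (6.3)."""
    anomaly, beta, dim_mf = u_n_counting(N, k=k, **matter)
    return anomaly == 0 and dim_mf == 2 * k * beta - 4


N = 7  # arbitrary illustrative rank for the two U(N) families
saturating = {
    "U(N) + (N-1)(F + Fbar)": dict(N=N, nf=N - 1, nfb=N - 1),
    "U(N) + A + (N-4) Fbar + 2 (F + Fbar)": dict(N=N, nA=1, nf=2, nfb=N - 2),
    "U(4) + 2 A + (F + Fbar)": dict(N=4, nA=2, nf=1, nfb=1),
    "U(4) + 3 A": dict(N=4, nA=3),
    "U(5) + 2 A + 2 Fbar": dict(N=5, nA=2, nfb=2),
}

for name, kwargs in saturating.items():
    print(name, generates_ads(**kwargs))
```

Exact rational arithmetic (`Fraction`) is used so that the half-integer values of β compare without rounding issues.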
Imposing (6.3) one finds

$$ \beta_{\rm 1-loop} = N \pm 2 + \frac{1}{k} $$
$$ n_f = N \pm 2 - \frac{2}{k} - n_S(N+2) - n_A(N-2) \tag{6.6} $$

The list of solutions is even shorter. Writing F, A and S for the fundamental, antisymmetric and symmetric representations, one finds

SO(N) + N_f F      N_f ≤ N − 3      k = 2
Sp(N) + N_f F      N_f ≤ N          k = 1
Sp(N) + A + 2 F                     k = 1        (6.7)

Notice that k = 1, respectively k = 2, are the basic instantons in Sp(N), respectively SO(N), since the instanton symmetry groups are in these cases SO(k), respectively Sp(k).

7 Conclusions

In the present paper, we have given a detailed microscopic derivation of non-perturbative superpotentials for chiral N = 1 D3-brane gauge theories living at Z3-orientifold singularities. We considered both unoriented projections, leading to SO(N1 − 4) × U(N1) and Sp(N1 + 4) × U(N1) gauge theories with three generations of chiral matter in the representations (F, F̄) + (•, A) and (F, F̄) + (•, S) respectively. The U(4) case was studied in detail in [49] and describes the local physics of type I theory near the origin of T^6/Z_3 with SO(8) × U(12) gauge group broken by Wilson lines. In the present T-dual setting, there are two sources of non-perturbative effects: D(-1) and ED3 instantons. The former realize the standard gauge instantons and lead to Affleck, Dine and Seiberg like superpotentials. The latter lead to Majorana masses or non-renormalizable superpotentials and were ignored until very recently [18, 19, 20, 21, 15, 49, 22, ...]. Our explicit instanton computations confirm the form of ADS and stringy superpotentials proposed in [49] on the basis of holomorphicity, dimensional analysis, U(1) anomaly and flavour symmetry. We show that ADS superpotentials are generated only for the U(4) and Sp(6) × U(2) gauge theories in the Z3-orientifold list. The precise form of the superpotential is derived from an integration over the instanton super-moduli space. As in [15], the β function running of the gauge couplings is reproduced from vacuum amplitudes given in terms of annulus and Möbius amplitudes ending on the instantons.
The same analysis is performed for “stringy instantons” generated by Euclidean ED3-branes (dual to ED1-strings in type I theory) wrapping holomorphic four-cycles on T^6/Z_3. A detailed microscopic analysis of the multi-instanton super-moduli space encompasses massless open string states with at least one end on the ED3-instanton. We show the generation of Majorana mass terms for the open string chiral multiplets in the U(4) case, Yukawa couplings for the SO(2) × U(6) gauge theory and non-renormalizable superpotentials for SO(N0) × U(N0 + 4) gauge theories. The field theory interpretation of the β function coefficients generated by the one-loop vacuum amplitudes for open strings ending on the ED3-instantons is one of the most interesting open questions left by our instanton super-moduli space analysis. As previously observed, the invariance under anomalous U(1)’s results from a detailed balance between the charges of the open strings involved and the axionic shift of a closed string R-R modulus from the twisted sector. Our present analysis has some analogies with the recent ones [18, 19, 20, 21, 15, 22, 23] that have focussed on ED2-branes at D6-brane intersections. As stressed in [49], one immediate advantage of the viewpoint advocated here is the consistency of the local description. Indeed, imposing twisted tadpole cancellation [34, 35], the models presented here and all closely related settings of D-branes at singularities (not necessarily of the Zn kind) give rise to anomaly free theories, while this is not necessarily the case for the ‘local’ models with intersecting D-branes. We can envisage the possibility of extending our analysis to other Zn singularities [55, 56] or even to Gepner models [66, 67, 68], where many if not all ingredients, such as the brane actions from gauge kinetic functions including one-loop threshold effects [69, 70, 71, 72], are available.
In the present paper we have not addressed the phenomenological implications of the stringy instanton effects we have analyzed in detail. We hope to be able to investigate these issues in this or similar contexts with D-branes at singularities, where the rigidity of the cycles is well understood and allows for the correct number of fermionic zero-modes. Clearly, additional (closed string) fluxes needed for moduli stabilization [73, 74] may change some of our present conclusions.

Acknowledgments

It is a pleasure to thank P. Anastasopoulos, R. Argurio, C. Bachas, M. Bertolini, M. Billo, G. Ferretti, M.L. Frau, A. Kumar, E. Kiritsis, I. Klebanov, S. Kovacs, A. Lerda, L. Martucci, I. Pesandro, R. Russo and M. Wijnholt for valuable discussions. Special thanks go to G. Pradisi for collaboration on the computation of the string amplitudes and useful exchanges. During completion of this work, M.B. was visiting the Galileo Galilei Institute in Arcetri (FI) and thanks INFN for hospitality and support. M.B. is very grateful to the organizers and participants of the workshop “String and M theory approaches to particle physics and cosmology” for creating a very stimulating atmosphere. This work was supported in part by the CNRS PICS no. 2530 and 3059, INTAS grant 03-516346, MIUR-COFIN 2003-023852, NATO PST.CLG.978785, the RTN grants MRTN-CT-2004-503369, EU MRTN-CT-2004-512194, MRTN-CT-2004-005104 and by a European Union Excellence Grant, MEXT-CT-2003-509661.

References

[1] N. A. Nekrasov, Seiberg-witten prepotential from instanton counting, Adv. Theor. Math. Phys. 7 (2004) 831–864, [hep-th/0206161]. [2] R. Flume and R. Poghossian, An algorithm for the microscopic evaluation of the coefficients of the seiberg-witten prepotential, Int. J. Mod. Phys. A18 (2003) 2541, [hep-th/0208176]. [3] U. Bruzzo, F. Fucito, J. F. Morales, and A. Tanzini, Multi-instanton calculus and equivariant cohomology, JHEP 05 (2003) 054, [hep-th/0211108]. [4] A. S. Losev, A. Marshakov, and N. A.
Nekrasov, Small instantons, little strings and free fermions, hep-th/0302191. [5] N. Nekrasov and A. Okounkov, Seiberg-witten theory and random partitions, hep-th/0306238. [6] R. Flume, F. Fucito, J. F. Morales, and R. Poghossian, Matone’s relation in the presence of gravitational couplings, JHEP 04 (2004) 008, [hep-th/0403057]. [7] M. Marino and N. Wyllard, A note on instanton counting for n = 2 gauge theories with classical gauge groups, JHEP 05 (2004) 021, [hep-th/0404125]. [8] F. Fucito, J. F. Morales, and R. Poghossian, Instantons on quivers and orientifolds, JHEP 10 (2004) 037, [hep-th/0408090]. [9] F. Fucito, J. F. Morales, R. Poghossian, and A. Tanzini, N = 1 superpotentials from multi-instanton calculus, JHEP 01 (2006) 031, [hep-th/0510173]. [10] S. Fujii, H. Kanno, S. Moriyama, and S. Okada, Instanton calculus and chiral one-point functions in supersymmetric gauge theories, hep-th/0702125. [11] N. Dorey, T. J. Hollowood, V. V. Khoze, and M. P. Mattis, The calculus of many instantons, Phys. Rept. 371 (2002) 231–459, [hep-th/0206063]. [12] M. Bianchi, S. Kovacs and G. Rossi, “Instantons and supersymmetry”, hep-th/0703142. [13] M. Billo et. al., Classical gauge instantons from open strings, JHEP 02 (2003) 045, [hep-th/0211250]. [14] M. Billo, M. Frau, F. Fucito, and A. Lerda, Instanton calculus in r-r background and the topological string, JHEP 11 (2006) 012, [hep-th/0606013]. [15] N. Akerblom, R. Blumenhagen, D. Lust, E. Plauschinn, and M. Schmidt-Sommerfeld, Non-perturbative sqcd superpotentials from string instantons, hep-th/0612132. [16] I. Affleck, M. Dine, and N. Seiberg, Supersymmetry breaking by instantons, Phys. Rev. Lett. 51 (1983) 1026. [17] I. Affleck, M. Dine, and N. Seiberg, Dynamical supersymmetry breaking in supersymmetric qcd, Nucl. Phys. B241 (1984) 493–534. [18] R. Blumenhagen, M. Cvetic, and T. Weigand, Spacetime instanton corrections in 4d string vacua - the seesaw mechanism for d-brane models, hep-th/0609191. [19] M. Haack, D. Krefl, D. 
Lust, A. Van Proeyen, and M. Zagermann, Gaugino condensates and d-terms from d7-branes, JHEP 01 (2007) 078, [hep-th/0609211]. [20] L. E. Ibanez and A. M. Uranga, Neutrino majorana masses from string theory instanton effects, JHEP 03 (2007) 052, [hep-th/0609213]. [21] B. Florea, S. Kachru, J. McGreevy, and N. Saulina, Stringy instantons and quiver gauge theories, hep-th/0610003. [22] M. Cvetic, R. Richter, and T. Weigand, Computation of d-brane instanton induced superpotential couplings: Majorana masses from string theory, hep-th/0703028. [23] R. Argurio, M. Bertolini, G. Ferretti, A. Lerda and C. Petersson, Stringy instantons at orbifold singularities, hep-th/0704.0262. [24] A. Sagnotti, Open strings and their symmetry groups, hep-th/0208020. [25] G. Pradisi and A. Sagnotti, Open string orbifolds, Phys. Lett. B216 (1989) 59. [26] M. Bianchi and A. Sagnotti, On the systematics of open string theories, Phys. Lett. B247 (1990) 517–524. [27] M. Bianchi and A. Sagnotti, Twist symmetry and open string wilson lines, Nucl. Phys. B361 (1991) 519–538. [28] M. Bianchi, G. Pradisi, and A. Sagnotti, Toroidal compactification and symmetry breaking in open string theories, Nucl. Phys. B376 (1992) 365–386. [29] E. Witten, Toroidal compactification without vector structure, JHEP 02 (1998) 006, [hep-th/9712028]. [30] E. Dudas, Theory and phenomenology of type i strings and m-theory, Class. Quant. Grav. 17 (2000) R41–R116, [hep-ph/0006190]. [31] C. Angelantonj and A. Sagnotti, Open strings, Phys. Rept. 371 (2002) 1–150, [hep-th/0204089]. [32] M. B. Green and J. H. Schwarz, Anomaly cancellation in supersymmetric d=10 gauge theory and superstring theory, Phys. Lett. B149 (1984) 117–122. [33] A. Sagnotti, A note on the green-schwarz mechanism in open string theories, Phys. Lett. B294 (1992) 196–203, [hep-th/9210127]. [34] L. E. Ibanez, R. Rabadan, and A. M. Uranga, Anomalous u(1)’s in type i and type iib d = 4, n = 1 string vacua, Nucl. Phys. B542 (1999) 112–138, [hep-th/9808139]. 
[35] M. Bianchi and J. F. Morales, Anomalies and tadpoles, JHEP 03 (2000) 030, [hep-th/0002149]. [36] I. Antoniadis, E. Kiritsis, and J. Rizos, Anomalous u(1)s in type i superstring vacua, Nucl. Phys. B637 (2002) 92–118, [hep-th/0204153]. [37] P. Anastasopoulos, 4d anomalous u(1)’s, their masses and their relation to 6d anomalies, JHEP 08 (2003) 005, [hep-th/0306042]. [38] P. Anastasopoulos, Anomalous u(1)s masses in non-supersymmetric open string vacua, Phys. Lett. B588 (2004) 119–126, [hep-th/0402105]. [39] P. Anastasopoulos, M. Bianchi, E. Dudas, and E. Kiritsis, Anomalies, anomalous u(1)’s and generalized chern-simons terms, JHEP 11 (2006) 057, [hep-th/0605225]. [40] C. Angelantonj, M. Bianchi, G. Pradisi, A. Sagnotti, and Y. S. Stanev, Chiral asymmetry in four-dimensional open- string vacua, Phys. Lett. B385 (1996) 96–102, [hep-th/9606169]. [41] M. Bianchi, A note on toroidal compactifications of the type i superstring and other superstring vacuum configurations with 16 supercharges, Nucl. Phys. B528 (1998) 73–94, [hep-th/9711201]. [42] M. Cvetic, L. L. Everett, P. Langacker, and J. Wang, Blowing-up the four-dimensional z(3) orientifold, JHEP 04 (1999) 020, [hep-th/9903051]. [43] M. Cvetic and P. Langacker, D = 4 n = 1 type iib orientifolds with continuous wilson lines, moving branes, and their field theory realization, Nucl. Phys. B586 (2000) 287–302, [hep-th/0006049]. [44] M. Cvetic, A. M. Uranga, and J. Wang, Discrete wilson lines in n = 1 d = 4 type iib orientifolds: A systematic exploration for z(6) orientifold, Nucl. Phys. B595 (2001) 63–92, [hep-th/0010091]. [45] A. M. Uranga, Chiral four-dimensional string compactifications with intersecting d-branes, Class. Quant. Grav. 20 (2003) S373–S394, [hep-th/0301032]. [46] E. Kiritsis, D-branes in standard model building, gravity and cosmology, Fortsch. Phys. 52 (2004) 200–263, [hep-th/0310001]. [47] R. Blumenhagen, M. Cvetic, P. Langacker, and G. Shiu, Toward realistic intersecting d-brane models, Ann. Rev. 
Nucl. Part. Sci. 55 (2005) 71–139, [hep-th/0502005]. [48] R. Blumenhagen, B. Kors, D. Lust, and S. Stieberger, Four-dimensional string compactifications with d-branes, orientifolds and fluxes, hep-th/0610327. [49] M. Bianchi and E. Kiritsis, Non-perturbative and flux superpotentials for type i strings on the z(3) orbifold, hep-th/0702015. [50] M. Dine, N. Seiberg, X. G. Wen, and E. Witten, Nonperturbative effects on the string world sheet, Nucl. Phys. B278 (1986) 769. [51] M. Dine, N. Seiberg, X. G. Wen, and E. Witten, Nonperturbative effects on the string world sheet. 2, Nucl. Phys. B289 (1987) 319. [52] E. Witten, World-sheet corrections via d-instantons, JHEP 02 (2000) 030, [hep-th/9907041]. [53] C. Beasley and E. Witten, Residues and world-sheet instantons, JHEP 10 (2003) 065, [hep-th/0304115]. [54] C. Beasley and E. Witten, New instanton effects in string theory, JHEP 02 (2006) 060, [hep-th/0512039]. [55] G. Aldazabal, L. E. Ibanez, F. Quevedo, and A. M. Uranga, D-branes at singularities: A bottom-up approach to the string embedding of the standard model, JHEP 08 (2000) 002, [hep-th/0005067]. [56] M. Buican, D. Malyshev, D. R. Morrison, M. Wijnholt, and H. Verlinde, D-branes at singularities, compactification, and hypercharge, JHEP 01 (2007) 107, [hep-th/0610007]. [57] M. R. Douglas, Branes within branes, hep-th/9512077. [58] M. F. Atiyah, N. J. Hitchin, V. G. Drinfeld, and Y. I. Manin, Construction of instantons, Phys. Lett. A65 (1978) 185–187. [59] N. Dorey, T. J. Hollowood, V. V. Khoze, M. P. Mattis, and S. Vandoren, Multi-instanton calculus and the ads/cft correspondence in n = 4 superconformal field theory, Nucl. Phys. B552 (1999) 88–168, [hep-th/9901128]. [60] E. G. Gimon and J. Polchinski, Consistency conditions for orientifolds and d-manifolds, Phys. Rev. D54 (1996) 1667–1676, [hep-th/9601038]. [61] M. Bianchi and J. F. Morales, Rg flows and open/closed string duality, JHEP 08 (2000) 035, [hep-th/0006176]. [62] F. Fucito, J. F. Morales, and A. 
Tanzini, D-instanton probes of non-conformal geometries, JHEP 07 (2001) 012, [hep-th/0106061]. [63] G. Veneziano and S. Yankielowicz, An effective lagrangian for the pure n=1 supersymmetric yang-mills theory, Phys. Lett. B113 (1982) 231. [64] T. R. Taylor, G. Veneziano, and S. Yankielowicz, Supersymmetric qcd and its massless limit: An effective lagrangian analysis, Nucl. Phys. B218 (1983) 493. [65] E. Gava, J. F. Morales, K. S. Narain, and G. Thompson, Bound states of type I D-strings, Nucl. Phys. B528 (1998) 95–108, [hep-th/9801128]. [66] C. Angelantonj, M. Bianchi, G. Pradisi, A. Sagnotti, and Y. S. Stanev, Comments on gepner models and type i vacua in string theory, Phys. Lett. B387 (1996) 743–749, [hep-th/9607229]. [67] T. P. T. Dijkstra, L. R. Huiszoon, and A. N. Schellekens, Supersymmetric standard model spectra from rcft orientifolds, Nucl. Phys. B710 (2005) 3–57, [hep-th/0411129]. [68] P. Anastasopoulos, T. P. T. Dijkstra, E. Kiritsis, and A. N. Schellekens, Orientifolds, hypercharge embeddings and the standard model, Nucl. Phys. B759 (2006) 83–146, [hep-th/0605226]. [69] I. Antoniadis, C. Bachas, and E. Dudas, Gauge couplings in four-dimensional type i string orbifolds, Nucl. Phys. B560 (1999) 93–134, [hep-th/9906039]. [70] D. Lust and S. Stieberger, Gauge threshold corrections in intersecting brane world models, hep-th/0302221. [71] M. Bianchi and E. Trevigne, Gauge thresholds in the presence of oblique magnetic fluxes, JHEP 01 (2006) 092, [hep-th/0506080]. [72] P. Anastasopoulos, M. Bianchi, G. Sarkissian, and Y. S. Stanev, On gauge couplings and thresholds in type i gepner models and otherwise, JHEP 03 (2007) 059, [hep-th/0612234]. [73] D. Lust, S. Reffert, E. Scheidegger, W. Schulgin, and S. Stieberger, Moduli stabilization in type iib orientifolds. ii, hep-th/0609013. [74] D. Lust, S. Reffert, E. Scheidegger, and S. Stieberger, Resolved toroidal orbifolds and their orientifolds, hep-th/0609014. 
ABSTRACT We give a detailed microscopic derivation of gauge and stringy instanton generated superpotentials for gauge theories living on D3-branes at Z_3-orientifold singularities. Gauge instantons are generated by D(-1)-branes and lead to Affleck, Dine and Seiberg (ADS) like superpotentials in the effective N=1 gauge theories with three generations of bifundamental and anti/symmetric matter. Stringy instanton effects are generated by Euclidean ED3-branes wrapping four-cycles on T^6/Z_3. They give rise to Majorana masses in one case and non-renormalizable superpotentials in the other cases. Finally we determine the conditions under which ADS like superpotentials are generated in N=1 gauge theories with adjoint, fundamental, symmetric and antisymmetric chiral matter. <|endoftext|><|startoftext|> Introduction

Diamond-Like Carbon (DLC) films have been shown to demonstrate various tribological behaviors: in ultra-high vacuum (UHV), they show either friction coefficients as low as 0.01 or less with very mild wear, or very high friction coefficients (>0.4) with drastic wear. These behaviors depend notably on the gaseous environment, on the hydrogen content of the film [1], and on its viscoplastic properties [2,3]. A relation between superlow friction in UHV and viscoplasticity has indeed been established for a-C:H films and confirmed for a fluorinated sample (a-C:F:H). In this study, nanoindentation and nanoscratch tests were conducted in ambient air, using a nanoindentation apparatus, in order to evaluate the tribological behaviors, as well as the mechanical and viscoplastic properties, of different amorphous carbon films.

Experimental

The samples were deposited on a Si (100) substrate by a Plasma Enhanced Chemical Vapor Deposition (PECVD) process at different bias voltages, either from acetylene or cyclohexane precursors by d.c.-PECVD, or from hexafluorobenzene mixed with hydrogen by r.f.-PECVD for the fluorinated sample (a-C:F:H, noted FDLC).
Details of the deposition process can be found in [4,5]. The thickness of the coating is 1 µm, except for the FDLC, which is 0.4 µm. Nanoindentation and nanoscratch tests were carried out in ambient air, at room temperature, with an MTS NanoIndenter® XP apparatus. A spherical (radius 10 µm) and a Berkovich diamond indenter were used. Mechanical and viscoplastic properties were evaluated from nanoindentation tests in continuous stiffness mode, using the Berkovich diamond indenter, with a maximum load of 100 mN. As the load P is applied exponentially as a function of time, the ratio between the loading rate P’ and the load P is kept constant during indentation, and thus the strain rate $\dot{\varepsilon}$ is also constant. Five different ratios P’/P, from 3×10^-3 up to 3×10^-1 Hz, were used. Nanotribological evaluation of the samples was conducted from nanoscratch tests at ramping load (0.1 to 10 mN, 3 passes) and at constant load (5 mN, 10 passes) with the spherical diamond indenter.

Results

The strain rate sensitivity of the materials is estimated and fitted by a Norton-Hoff law:

$$ H = H_0 \cdot \dot{\varepsilon}^{\,x} $$

where H is the hardness, H0 a constant, $\dot{\varepsilon}$ the strain rate and x a constant called the viscoplastic exponent (Table 1). Contrary to UHV, no evidence of a correlation between friction coefficients in ambient air and viscoplasticity can be found. But even in this environment, some very low friction coefficient values, as low as 0.04 (FDLC), with very mild wear have been evidenced (Figure 1). The hardness H0 seems to be the key parameter: wear resistance in air is improved with higher H0 and the friction coefficient decreases with H0. Note that H0 is also roughly linked with the hydrogen content of the coating for the non-fluorinated samples, as has been shown in [2]. The number of passes also seems to lead to a decrease of the friction coefficient.

Sample   H content (at. %)   H0 (GPa)   x       µ ramping load   µ constant load   Wear
FDLC     5/18(F)             16         0.060   0.04-0.14        0.080             ~ none
AC8      34                  13         0.014   0.06-0.11        0.083             ~ none
AC5      40                  11         0.068   0.05-0.16        0.078             mild
CY6.5    42                  6.8        0.028   0.07-0.13        0.083             mild
CY5      42                  1.3        0.076   0.10-0.22        0.183             severe

Table 1: Summary of nanofriction test results and viscoplastic properties

Conclusion

This study shows that in ambient air, the wear resistance and frictional behavior of a-C:H and a-C:F:H samples are improved with hardness H0. In UHV, the achievement of super-low friction is linked with the viscoplastic character. Thus, an intermediate coating with both high hardness and a high viscoplastic exponent, such as a-C:F:H, will demonstrate satisfactory tribological behavior both in ambient air and in UHV.

References

[1] C. Donnet et al., Tribo. Lett., 9 (2000) 137. [2] J. Fontaine et al., Tribo. Lett., 17 (2004) 709. [3] J. Fontaine et al., Thin Solid Films, (2005) in press. [4] C. Donnet et al., Surf. Coat. Tech., 94-95 (1997) 456. [5] C. Donnet et al., Surf. Coat. Tech., 94-95 (1997) 531.

Figure 1: Constant load scratch micrographs (panels: FDLC, CY5)

ABSTRACT Diamond-Like Carbon (DLC) films have been shown to demonstrate various tribological behaviors: in ultra-high vacuum (UHV), with either friction coefficients as low as 0.01 or less and very mild wear, or very high friction coefficients (>0.4) and drastic wear. These behaviors depend notably on gaseous environment, hydrogen content of the film [1], and on its viscoplastic properties [2,3]. A relation between superlow friction in UHV and viscoplasticity has indeed been established for a-C:H films and confirmed for a fluorinated sample (a-C:F:H). In this study, nanoindentation and nanoscratch tests were conducted in ambient air, using a nanoindentation apparatus, in order to evaluate tribological behaviors, as well as mechanical and viscoplastic properties of different amorphous carbon films.
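The Norton-Hoff fit used in the Results section, hardness as a power law of the strain rate with viscoplastic exponent x, reduces to a straight-line fit in log-log coordinates. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the three intermediate rate values are invented to fill the stated range of five ratios between 3×10^-3 and 3×10^-1 Hz, the hardness values are synthetic (generated from the FDLC entries H0 = 16 GPa and x = 0.060 of Table 1), and P'/P is used as a stand-in for the strain rate.

```python
import math


def fit_norton_hoff(strain_rates, hardness):
    """Least-squares fit of log H = log H0 + x * log(strain rate).

    Returns (H0, x). Hypothetical helper, not taken from the paper.
    """
    log_rate = [math.log(e) for e in strain_rates]
    log_h = [math.log(h) for h in hardness]
    n = len(log_rate)
    mean_r = sum(log_rate) / n
    mean_h = sum(log_h) / n
    s_rr = sum((r - mean_r) ** 2 for r in log_rate)
    s_rh = sum((r - mean_r) * (h - mean_h) for r, h in zip(log_rate, log_h))
    x = s_rh / s_rr                       # slope = viscoplastic exponent
    h0 = math.exp(mean_h - x * mean_r)    # intercept gives H0
    return h0, x


# Five loading-rate ratios P'/P in Hz; only the end points come from the text.
rates = [3e-3, 1e-2, 3e-2, 1e-1, 3e-1]
# Synthetic hardness values in GPa generated from the FDLC row of Table 1.
hardness = [16.0 * r ** 0.060 for r in rates]

h0_fit, x_fit = fit_norton_hoff(rates, hardness)
print(f"H0 = {h0_fit:.2f} GPa, x = {x_fit:.3f}")
```

If the true strain rate is only proportional to P'/P, the fitted exponent x is unchanged and only H0 is rescaled by a constant factor.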
<|endoftext|><|startoftext|> IPPP/07/10 DCPT/07/20

Implication of the D0 Width Difference On CP-Violation in D0-D̄0 Mixing

Patricia Ball
IPPP, Department of Physics, University of Durham, Durham DH1 3LE, UK

Abstract

Both BaBar and Belle have found evidence for a non-zero width difference in the D0-D̄0 system. Although there is no direct experimental evidence for CP-violation in D mixing (yet), we show that the measured values of the width difference y ∼ ∆Γ already imply constraints on the CP-odd phase in D mixing, which, if significantly different from zero, would be an unambiguous signal of new physics.

∗Patricia.Ball@durham.ac.uk http://arxiv.org/abs/0704.0786v2

The highlight of this year’s Moriond conference on electroweak interactions and unified theories arguably was the announcement by BaBar and Belle of experimental evidence for D0-D̄0 mixing [1, 2, 3], which was quickly followed by a number of theoretical analyses [4, 5, 6, 7, 8, 9]. While Refs. [4, 7, 8, 9] focused on the constraints imposed by the experimental results on various new-physics models, Ref. [5] presented a first analysis of the implications of these results for the fundamental parameters describing D mixing. The purpose of this letter is to show that the present experimental results already imply constraints on a sizeable CP-odd phase in D mixing, which could only be due to new physics (NP). To start with, let us briefly review the theoretical formalism of D mixing and the experimental results; see Refs. [10, 11] for more detailed reviews. In complete analogy to B mixing, D mixing in the SM is due to box diagrams with internal quarks and W bosons. In contrast to B, though, the internal quarks are down-type.
Also in contrast to B mixing, the GIM mechanism is much more effective, as the contribution of the heaviest down-type quark, the b, comes with a relative enhancement factor $(m_b^2 - m_{s,d}^2)/(m_s^2 - m_d^2)$, but also a large CKM-suppression factor $|V_{ub}V_{cb}^*|^2/|V_{us}V_{cs}^*|^2 \sim \lambda^8$, which renders its contribution to D mixing ∼ 1% and hence negligible. As a consequence, D mixing is very sensitive to the potential intervention of NP. On the other hand, it is also rather difficult to calculate the SM “background” to D mixing, as the loop diagrams are dominated by s and d quarks and hence sensitive to the intervention of resonances and non-perturbative QCD. The quasi-decoupling of the 3rd quark generation also implies that CP violation in D mixing is extremely small in the SM, and hence any observation of CP violation will be an unambiguous signal of new physics, independently of hadronic uncertainties.

The theoretical parameters describing D mixing can be defined in complete analogy to those for B mixing: the time evolution of the D0 system is described by the Schrödinger equation
$$ i\,\frac{d}{dt}\begin{pmatrix} D^0(t) \\ \bar D^0(t) \end{pmatrix} = \left( M - \frac{i}{2}\,\Gamma \right) \begin{pmatrix} D^0(t) \\ \bar D^0(t) \end{pmatrix} \tag{1} $$
with Hermitian matrices M and Γ. The off-diagonal elements of these matrices, $M_{12}$ and $\Gamma_{12}$, describe, respectively, the dispersive and absorptive parts of D mixing. The flavour-eigenstates D0 = (cū), D̄0 = (uc̄) are related to the mass-eigenstates $D_{1,2}$ by
$$ |D_{1,2}\rangle = p\,|D^0\rangle \pm q\,|\bar D^0\rangle \tag{2} $$
with
$$ \left(\frac{q}{p}\right)^2 = \frac{M_{12}^* - \frac{i}{2}\,\Gamma_{12}^*}{M_{12} - \frac{i}{2}\,\Gamma_{12}}\,; \tag{3} $$
$|p|^2 + |q|^2 = 1$ by definition. The basic observables in D mixing are the mass and lifetime difference of $D_{1,2}$, which are usually normalised to the average decay width $\Gamma = (\Gamma_1 + \Gamma_2)/2$:
$$ x \equiv \frac{\Delta M}{\Gamma} = \frac{M_2 - M_1}{\Gamma}\,, \qquad y \equiv \frac{\Delta\Gamma}{2\Gamma} = \frac{\Gamma_2 - \Gamma_1}{2\Gamma}\,. \tag{4} $$
In this letter we follow the sign convention of Ref. [5], according to which x is positive by definition. The sign of y then has to be determined from experiment. In addition, if there is CP-violation in the D system, one also has
$$ \left|\frac{q}{p}\right| \neq 1, \qquad \phi \equiv \arg(M_{12}/\Gamma_{12}) \neq 0. \tag{5} $$

While previously only bounds on x and y were known, both BaBar and Belle have now found evidence for non-vanishing mixing in the D system. BaBar has obtained this evidence from the measurement of the doubly Cabibbo-suppressed decay D0 → K+π− (and its CP conjugate), yielding
$$ y' = (0.97 \pm 0.44(\mathrm{stat}) \pm 0.31(\mathrm{syst})) \times 10^{-2}, \qquad x'^2 = (-0.022 \pm 0.030(\mathrm{stat}) \pm 0.021(\mathrm{syst})) \times 10^{-2}, \tag{6} $$
while Belle obtains
$$ y_{CP} = (1.31 \pm 0.32(\mathrm{stat}) \pm 0.25(\mathrm{syst})) \times 10^{-2} \tag{7} $$
from D0 → K+K−, π+π− and
$$ x = (0.80 \pm 0.29(\mathrm{stat}) \pm 0.17(\mathrm{syst})) \times 10^{-2}, \qquad y = (0.33 \pm 0.24(\mathrm{stat}) \pm 0.15(\mathrm{syst})) \times 10^{-2} \tag{8} $$
from a Dalitz-plot analysis of $D^0 \to K^0_S \pi^+\pi^-$. Here $y_{CP} \to y$ in the limit of no CP violation in D mixing, while the primed quantities x′, y′ are related to x, y by a rotation by a strong phase $\delta_{K\pi}$:
$$ y' = y \cos\delta_{K\pi} - x \sin\delta_{K\pi}, \qquad x' = x \cos\delta_{K\pi} + y \sin\delta_{K\pi}. \tag{9} $$
Limited experimental information on this phase has been obtained at CLEO-c [12]:
$$ \cos\delta_{K\pi} = 1.09 \pm 0.66\,, \tag{10} $$
which can be translated into $\delta_{K\pi} = (0 \pm 65)^\circ$. An analysis with a larger data-set is underway at CLEO-c, with an expected uncertainty of $\Delta\cos\delta_{K\pi} \approx 0.1$ in the next couple of years [13]; BES-III is expected to reach $\Delta\cos\delta_{K\pi} \approx 0.04$ after 4 years of running [14]. The experimental result (10) agrees with theoretical expectations, $\delta_{K\pi} = 0$ in the SU(3)-limit and $|\delta_{K\pi}| \lesssim 15^\circ$ from a calculation of the amplitudes in QCD factorisation [15]. Based on these experimental results, a preliminary HFAG-average was presented at the 2007 CERN workshop “Flavour in the Era of the LHC” [13]:
$$ x = (8.5^{+3.2}_{-3.1}) \times 10^{-3}, \qquad y = (7.1^{+2.0}_{-2.3}) \times 10^{-3}. \tag{11} $$
Adding errors in quadrature, this implies
$$ \frac{x}{y} = 1.2 \pm 0.6. \tag{12} $$
The exact relations between ∆M, ∆Γ, $M_{12}$ and $\Gamma_{12}$ are given by
$$ (\Delta M)^2 - \frac{1}{4}(\Delta\Gamma)^2 = 4|M_{12}|^2 - |\Gamma_{12}|^2, \qquad (\Delta M)(\Delta\Gamma) = 4\,\mathrm{Re}(M_{12}^*\Gamma_{12}) = 4|M_{12}||\Gamma_{12}|\cos\phi\,. \tag{13} $$
Eq. (13) implies x/y > 0 for |φ| < π/2 and x/y < 0 for π/2 < |φ| < 3π/2. In view of the above experimental results, we assume |φ| < π/2 from now on.
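The ratio quoted in Eq. (12) can be reproduced from the averages in Eq. (11) with standard error propagation. The sketch below assumes that the asymmetric errors are symmetrised by simple averaging and that x and y are uncorrelated; neither assumption is stated in the text, so the result should be read as a plausibility check only.

```python
import math

# HFAG preliminary averages from Eq. (11), in units of 1e-3.
x, x_up, x_dn = 8.5, 3.2, 3.1
y, y_up, y_dn = 7.1, 2.0, 2.3

# Assumption: symmetrise the asymmetric errors by averaging them.
sigma_x = 0.5 * (x_up + x_dn)
sigma_y = 0.5 * (y_up + y_dn)

ratio = x / y
# Assumption: x and y uncorrelated, relative errors added in quadrature.
sigma_ratio = ratio * math.hypot(sigma_x / x, sigma_y / y)

print(f"x/y = {ratio:.2f} +/- {sigma_ratio:.2f}")
```

Rounded to one decimal place, this agrees with the value given in Eq. (12).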
As for the CP-violating observables, |q/p| ≠ 1 characterises CP-violation in mixing and can be measured for instance in flavour-specific decays D0 → f, where D̄0 → f is possible only via mixing. The prime example is semileptonic decays with
$$ A_{SL} = \frac{\Gamma(D^0 \to \ell^- X) - \Gamma(\bar D^0 \to \ell^+ X)}{\Gamma(D^0 \to \ell^- X) + \Gamma(\bar D^0 \to \ell^+ X)} = \frac{|q/p|^2 - |p/q|^2}{|q/p|^2 + |p/q|^2}\,. \tag{14} $$
Although the B factories may have some sensitivity to this asymmetry, its measurement is severely impaired by the fact that D mixing proceeds only very slowly, resulting in a large suppression factor of the mixed vs. the unmixed rate:
$$ \frac{\Gamma(D^0 \to \ell^- X)}{\Gamma(D^0 \to \ell^+ X)} = \frac{x^2 + y^2}{2 + x^2 + y^2} \approx 6 \times 10^{-5}. \tag{15} $$
Both in the K and the B system the quantity
$$ \left|\frac{q}{p}\right|^2 - 1 \tag{16} $$
is very small, which however need not necessarily be the case for D’s. From (3) one derives the general expression
$$ \left|\frac{q}{p}\right|^2 = \left( \frac{4 + r^2 + 4r\sin\phi}{4 + r^2 - 4r\sin\phi} \right)^{1/2} \tag{17} $$
with $r = |\Gamma_{12}/M_{12}|$ and the weak phase φ defined in (5). In the B system, one has r ≪ 1 (the current up-to-date numbers are $r \approx 7 \times 10^{-3}$ for $B_d$ and $r \approx 5 \times 10^{-3}$ for $B_s$ [16]), so that upon expansion in r
$$ \left|\frac{q}{p}\right|^2 = 1 + r\sin\phi + O(r^2). \tag{18} $$
Note that this formula refers to the definition $\phi = \arg(M_{12}/\Gamma_{12})$, which differs by +π from the one used in Ref. [16], $\phi = \arg(-M_{12}/\Gamma_{12})$. For the K system, one finds r ≈ |∆Γ/∆M| ≈ 2 from experiment, but now the phase φ turns out to be small, so that
$$ \left|\frac{q}{p}\right|^2 = 1 + \frac{4r}{4 + r^2}\,\phi + O(\phi^2) \approx 1 + \phi. \tag{19} $$
In both cases, |q/p| ≈ 1 to a very good approximation. In the D system, however, there is no natural hierarchy r ≪ 1, and of course one hopes that NP-effects induce |φ| ≫ 0.

Figure 1: $|q/p|^2$, Eq. (20), as a function of the CP-odd phase φ for the central experimental value r̃ = 7.1/8.5. Solid line: full expression, dashed line: first order expansion around φ = 0.

In this case, and because x and y have been measured, while $|M_{12}|$ and $|\Gamma_{12}|$ are difficult to calculate, it is convenient to express $|q/p|_D$ in terms of x, y, φ, using the exact relations (13).
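The numerical statements around Eqs. (15) to (19) are easy to check. The following sketch evaluates the suppression factor of Eq. (15) for the central values of Eq. (11) and compares the general expression (17) with its two expansions. The function name `qp2` and the test phases (0.8 rad and 0.01 rad) are illustrative choices, not values taken from the text.

```python
import math

def qp2(r, phi):
    """|q/p|^2 as in Eq. (17); r = |Gamma12/M12|, phi = arg(M12/Gamma12)."""
    s = 4.0 * r * math.sin(phi)
    return math.sqrt((4.0 + r * r + s) / (4.0 + r * r - s))

# Eq. (15): mixed vs. unmixed rate for the central values of Eq. (11).
x, y = 8.5e-3, 7.1e-3
suppression = (x * x + y * y) / (2.0 + x * x + y * y)

# Eq. (18): B-like case, r << 1 (phase chosen arbitrarily for the test).
r_b, phi_b = 7e-3, 0.8
err_b = abs(qp2(r_b, phi_b) - (1.0 + r_b * math.sin(phi_b)))

# Eq. (19): K-like case, r ~ 2 with a small phase (again arbitrary).
r_k, phi_k = 2.0, 0.01
err_k = abs(qp2(r_k, phi_k) - (1.0 + 4.0 * r_k / (4.0 + r_k * r_k) * phi_k))

print(f"mixed/unmixed rate: {suppression:.1e}")
print(f"expansion errors: B-like {err_b:.1e}, K-like {err_k:.1e}")
```

Both expansion errors are of second order in the small parameter, as expected from (18) and (19).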
From (3), and defining r̃ = y/x, we then obtain
$$ \left|\frac{q}{p}\right|^2 = \frac{1}{\sqrt{2}\,(1 + \tilde r^2)} \left\{ 2(1 + \tilde r^2)^2 + 16\,\tilde r^2 \tan^2\phi + 8\,\tilde r \tan\phi \sec\phi \sqrt{(1 + \tilde r^2)^2 - (1 - \tilde r^2)^2 \sin^2\phi} \right\}^{1/2}. \tag{20} $$
Note that for finite xy and φ = ±π/2, |q/p| diverges because xy → 0 for φ → ±π/2 from (13). In Fig. 1 we plot $|q/p|^2$ as function of φ, for the central experimental value from HFAG, r̃ = 7.1/8.5, Eq. (11). It is obvious that even for moderate values of φ the small-φ expansion is not really reliable.

What is the currently available experimental information on CP-violation in D mixing, i.e. |q/p| and φ? As already mentioned, the semileptonic CP-asymmetry (14) has not been measured yet. What has been measured, though, is the effect of CP-violation on the time-dependent rates of D0 → K+π− and D̄0 → K−π+. The BaBar collaboration has parametrised these rates as
$$ \Gamma(D^0(t) \to K^+\pi^-) \propto e^{-\Gamma t} \left[ R_D + \sqrt{R_D}\, y'_+\, \Gamma t + \frac{x'^2_+ + y'^2_+}{4}\,(\Gamma t)^2 \right], $$
$$ \Gamma(\bar D^0(t) \to K^-\pi^+) \propto e^{-\Gamma t} \left[ R_D + \sqrt{R_D}\, y'_-\, \Gamma t + \frac{x'^2_- + y'^2_-}{4}\,(\Gamma t)^2 \right] \tag{21} $$
and fit the D0 and D̄0 samples separately. They find [2]
$$ y'_+ = (9.8 \pm 6.4(\mathrm{stat}) \pm 4.5(\mathrm{syst})) \times 10^{-3}, \qquad y'_- = (9.6 \pm 6.1(\mathrm{stat}) \pm 4.3(\mathrm{syst})) \times 10^{-3}. \tag{22} $$
Adding errors in quadrature, this means $y'_+/y'_- = 1.0 \pm 1.1$. BaBar also obtains values for $x'^2_\pm$ which we do not quote here, because the sensitivity to the quadratic term in (21) is less than that to the linear term in $y'_\pm$.

Figure 2: Left: $y'_+/y'_-$ as function of φ for x/y = 1.2 (solid line) and x/y = {0.6, 1.8} (dashed lines), from Eq. (11). $\delta_{K\pi} = 0$. Right: $y'_+/y'_-$ as function of φ for x/y = 1.2 for $\delta_{K\pi} = 0$ (solid line) and $\delta_{K\pi} = \pm 65^\circ$ (dashed lines).

$\sqrt{R_D}$ is the ratio of the doubly Cabibbo-suppressed to the Cabibbo-favoured amplitude, $\sqrt{R_D} = |A(D^0 \to K^+\pi^-)/A(D^0 \to K^-\pi^+)|$. $\delta_{K\pi}$ is the relative strong phase in the Cabibbo-favoured and suppressed amplitudes:
$$ \frac{A(D^0 \to K^+\pi^-)}{A(\bar D^0 \to K^+\pi^-)} = -\sqrt{R_D}\, e^{-i\delta_{K\pi}}\,; \tag{23} $$
the minus-sign comes from the relative sign between the CKM matrix elements $V_{cd}$ and $V_{us}$.
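As a consistency check, Eq. (20) can be compared numerically with Eq. (17) once r is expressed through x, y and φ by solving the relations (13). The sketch below does this for a few phases with |φ| < π/2. The helper names are ours, and the closed forms are typed in as we read them from Eqs. (13), (17) and (20); any transcription slip would show up as a failed comparison.

```python
import math

def qp2_from_r(r, phi):
    """|q/p|^2 from Eq. (17)."""
    s = 4.0 * r * math.sin(phi)
    return math.sqrt((4.0 + r * r + s) / (4.0 + r * r - s))

def r_from_xy(x, y, phi):
    """Solve Eq. (13) for r = |Gamma12/M12| in units where Gamma = 1.

    With a = 2|M12| and b = |Gamma12|, Eq. (13) reads
    a^2 - b^2 = x^2 - y^2 and a*b = x*y/cos(phi).
    """
    d = x * x - y * y
    p = x * y / math.cos(phi)
    a = math.sqrt(0.5 * (d + math.sqrt(d * d + 4.0 * p * p)))
    b = p / a
    return 2.0 * b / a

def qp2_from_xy(rt, phi):
    """|q/p|^2 from Eq. (20), with rt = y/x."""
    t, sec, s = math.tan(phi), 1.0 / math.cos(phi), math.sin(phi)
    inner = math.sqrt((1 + rt * rt) ** 2 - (1 - rt * rt) ** 2 * s * s)
    outer = 2 * (1 + rt * rt) ** 2 + 16 * rt * rt * t * t + 8 * rt * t * sec * inner
    return math.sqrt(outer) / (math.sqrt(2.0) * (1 + rt * rt))

x, y = 8.5, 7.1  # central values of Eq. (11); only the ratio matters here
for phi in (-1.2, -0.5, 0.0, 0.3, 1.0):
    lhs = qp2_from_r(r_from_xy(x, y, phi), phi)
    rhs = qp2_from_xy(y / x, phi)
    print(f"phi = {phi:+.1f}: Eq.(17) -> {lhs:.6f}, Eq.(20) -> {rhs:.6f}")
```

The two columns agree to numerical precision, which supports reading Eq. (20) as Eq. (17) with $|M_{12}|$ and $|\Gamma_{12}|$ eliminated via (13).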
In the limit of no CP-violation in the decay amplitude, one has |A(D0 → K−π+)| = |A(D̄0 → K+π−)|, which is expected to be a very good approximation, in view of the fact that the decay is solely due to a tree-level amplitude. Then the relation of $y'_\pm$ to x, y and φ is given by
$$ y'_+ = \left|\frac{q}{p}\right| \left\{ (y \cos\delta_{K\pi} - x \sin\delta_{K\pi}) \cos\phi + (x \cos\delta_{K\pi} + y \sin\delta_{K\pi}) \sin\phi \right\}, $$
$$ y'_- = \left|\frac{p}{q}\right| \left\{ (y \cos\delta_{K\pi} - x \sin\delta_{K\pi}) \cos\phi - (x \cos\delta_{K\pi} + y \sin\delta_{K\pi}) \sin\phi \right\}. \tag{24} $$
Presently, the experimental result for $y'_+/y'_-$ is compatible with 1, although with considerable uncertainties. Any significant deviation from 1 would be a sign of new physics. In Fig. 2 we plot $y'_+/y'_-$ as function of φ, for different values of x/y and $\delta_{K\pi}$. The figures clearly show that the value of $y'_+/y'_-$ is very sensitive to the phase φ, at least if $\delta_{K\pi}$ is not too close to −65°, which corresponds to the nearly constant dashed line in Fig. 2b. The reason for this dependence on $\delta_{K\pi}$ becomes clearer if $y'_+/y'_-$ is expanded to first order in φ:
$$ \frac{y'_+}{y'_-} = 1 - 2\phi\, \frac{x(x^2 + 2y^2) \cos\delta_{K\pi} + y^3 \sin\delta_{K\pi}}{(x^2 + y^2)(x \sin\delta_{K\pi} - y \cos\delta_{K\pi})} + O(\phi^2)\,. \tag{25} $$
For the central values of x and y, Eq. (11), this amounts to 1 + 3.4φ for $\delta_{K\pi} = 0$, 1 − 3.3φ for $\delta_{K\pi} = +65^\circ$ and 1 + 0.45φ for $\delta_{K\pi} = -65^\circ$, which explains the shape of the curves in Fig. 2b. Evidently it is important to reduce the uncertainty of $\delta_{K\pi}$, which, as mentioned earlier, will be achieved within the next few years. On the other hand, as shown in Fig. 2a, $y'_+/y'_-$, which depends only on the ratio x/y, but not x and y separately, is not very sensitive to the precise value of that ratio, but very much so to φ. The conclusion is that, even if x/y itself cannot be determined very precisely, $y'_+/y'_-$ will nonetheless be a powerful tool to constrain φ, at least once $\delta_{K\pi}$ is known more precisely. Already now very large values φ ∼ π/2 are excluded.

Figure 3: Plot of $|\Delta\Gamma/\Delta\Gamma_{SM}|$, Eq. (26), as a function of $x/y_{SM}$ and φ.

Another, more theory-dependent constraint on φ can be derived from the value of y.
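The three slopes quoted after Eq. (25) can be reproduced directly. The sketch below evaluates the first-order coefficient of Eq. (25) for the central values of Eq. (11) and cross-checks it against a finite-difference derivative of the full ratio $y'_+/y'_-$ built from Eqs. (20) and (24). The function names and the step size are our own choices.

```python
import math

X, Y = 8.5, 7.1  # central values of Eq. (11), in units of 1e-3

def slope_eq25(delta):
    """Coefficient of phi in Eq. (25) for strong phase delta (radians)."""
    c, s = math.cos(delta), math.sin(delta)
    num = X * (X * X + 2 * Y * Y) * c + Y ** 3 * s
    den = (X * X + Y * Y) * (X * s - Y * c)
    return -2.0 * num / den

def qp2(phi):
    """|q/p|^2 from Eq. (20) with rt = y/x."""
    rt = Y / X
    t, sec, s = math.tan(phi), 1.0 / math.cos(phi), math.sin(phi)
    inner = math.sqrt((1 + rt * rt) ** 2 - (1 - rt * rt) ** 2 * s * s)
    outer = 2 * (1 + rt * rt) ** 2 + 16 * rt * rt * t * t + 8 * rt * t * sec * inner
    return math.sqrt(outer) / (math.sqrt(2.0) * (1 + rt * rt))

def ratio(phi, delta):
    """y'_+ / y'_- from Eq. (24); the prefactors combine to |q/p|^2."""
    yp = Y * math.cos(delta) - X * math.sin(delta)
    xp = X * math.cos(delta) + Y * math.sin(delta)
    plus = yp * math.cos(phi) + xp * math.sin(phi)
    minus = yp * math.cos(phi) - xp * math.sin(phi)
    return qp2(phi) * plus / minus

def numeric_slope(delta, step=1e-4):
    """Central finite difference of the full ratio at phi = 0."""
    return (ratio(step, delta) - ratio(-step, delta)) / (2 * step)

for deg in (0.0, 65.0, -65.0):
    d = math.radians(deg)
    print(f"delta = {deg:+.0f} deg: Eq.(25) slope {slope_eq25(d):+.2f}, "
          f"finite difference {numeric_slope(d):+.2f}")
```

The weak sensitivity at $\delta_{K\pi} = -65^\circ$ arises because the two first-order contributions, one from $|q/p|^2$ and one from the ratio of the curly brackets in (24), nearly cancel there.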
This argument centers around the fact that (a) the experimental result (11) is at the top end of theoretical predictions $y_{SM} \sim 1\%$ [17] and (b) new physics indicated by a non-zero value of φ always reduces the lifetime difference, independently of the value of x. This observation is similar to what was found, some time ago, for the $B_s$ system [18]. In order to derive it, we assume that new physics does not affect $\Gamma_{12}$,¹ so that $\Gamma_{12} = \Gamma_{12}^{SM}$. We then have $2|\Gamma_{12}| = \Delta\Gamma_{SM}$ and hence $|y_{SM}| = |\Gamma_{12}|/\Gamma$. Using the relations (13), we can then express the ratio $|\Delta\Gamma/\Delta\Gamma_{SM}|$ in terms of $y_{SM}$, x and φ:
$$ \left| \frac{\Delta\Gamma}{\Delta\Gamma_{SM}} \right| = \left( \frac{y_{SM}^2 + x^2}{y_{SM}^2 + x^2/\cos^2\phi} \right)^{1/2}. \tag{26} $$
This implies that new physics always reduces the lifetime difference, independently of the value of x (and any new physics in the mass difference). In particular one has y = 0 for φ = ±π/2 and x ≠ 0, which follows from the 2nd relation (13). Eq. (26) is the manifestation of the fact that one does not need to observe CP-violation in order to constrain it. A famous example for this is the unitarity triangle in B physics, whose sides are determined from CP-conserving quantities only, but nonetheless allow a precise measurement of the size of CP-violation in the SM, via the angles and the area of the triangle. In Fig. 3, we plot $|\Delta\Gamma/\Delta\Gamma_{SM}|$ as a function of $r = x/y_{SM}$ and φ. The zero at φ = ±π/2 is clearly visible. The experimental value $|y/y_{SM}| = O(1)$ then excludes phases φ close to ±π/2. In order to make more quantitative statements, a more precise calculation of $y_{SM}$ is clearly needed.

¹ See, however, Ref. [19] for a discussion of the effect of tiny NP admixtures to $\Gamma_{12}$.

Figure 4: Left: $y_{CP}/y$ as function of φ, for x/y = 1.2 (solid line) and x/y = {0.6, 1.8} (dashed lines), see Eq. (12). Right: $A_\Gamma/y_{CP}$ as function of φ.
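The behaviour described for Eq. (26) can be tabulated in a few lines. The sketch below evaluates the ratio for an illustrative value $x/y_{SM} = 1.2$; this number is an assumption made only to have something to plot, since the text gives $y_{SM}$ only as an order of magnitude.

```python
import math

def width_ratio(x_over_ysm, phi):
    """|DeltaGamma / DeltaGamma_SM| from Eq. (26), for |phi| < pi/2."""
    r2 = x_over_ysm ** 2
    return math.sqrt((1.0 + r2) / (1.0 + r2 / math.cos(phi) ** 2))

# Illustrative input: x / y_SM = 1.2 (assumed, not taken from the text).
values = [(phi, width_ratio(1.2, phi)) for phi in (0.0, 0.5, 1.0, 1.5)]
for phi, ratio in values:
    print(f"phi = {phi:.1f}: |DeltaGamma/DeltaGamma_SM| = {ratio:.3f}")
```

The ratio equals 1 at φ = 0, never exceeds 1, and falls towards zero as φ approaches π/2, in line with the statement that new physics can only reduce the width difference.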
Two more CP-sensitive observables related to D0 → K+K− have been measured by the Belle collaboration [3]:
$$ y_{CP} = \frac{1}{2\Gamma}\left[ \Gamma(D^0 \to K^+K^-) + \Gamma(\bar D^0 \to K^+K^-) \right] - 1 = \frac{1}{2}\left( \left|\frac{q}{p}\right| + \left|\frac{p}{q}\right| \right) y \cos\phi + \frac{1}{2}\left( \left|\frac{q}{p}\right| - \left|\frac{p}{q}\right| \right) x \sin\phi, \tag{27} $$
$$ A_\Gamma = \frac{1}{2\Gamma}\left[ \Gamma(D^0 \to K^+K^-) - \Gamma(\bar D^0 \to K^+K^-) \right] = \frac{1}{2}\left( \left|\frac{q}{p}\right| - \left|\frac{p}{q}\right| \right) y \cos\phi + \frac{1}{2}\left( \left|\frac{q}{p}\right| + \left|\frac{p}{q}\right| \right) x \sin\phi. \tag{28} $$
The present experimental value of $y_{CP}$ is given in (7), that for $A_\Gamma$ is $(0.01 \pm 0.30(\mathrm{stat}) \pm 0.15(\mathrm{syst})) \times 10^{-2}$. Again, we can study the dependence of these observables on φ. In Fig. 4a we plot the ratio $y_{CP}/y$, which depends on x/y and φ, as a function of φ. As it turns out, this quantity is far less sensitive to φ than $y'_+/y'_-$, the reason being that its deviation from 1 is only a second-order effect in φ:
$$ y_{CP} = y \left\{ 1 + \phi^2\, \frac{x^4 + x^2 y^2 - y^4}{2(x^2 + y^2)^2} + O(\phi^4) \right\}. \tag{29} $$
Hence, unless the experimental accuracy is dramatically increased, and because the results on $y'_+/y'_-$ and $y/y_{SM}$ already exclude a large CP-odd phase φ ≈ ±π/2, it is safe to interpret $y_{CP}$ as measurement of y. In Fig. 4b we plot the quantity $A_\Gamma/y_{CP}$. Also here there is a distinctive dependence on φ, with $A_\Gamma/y \propto \phi$ for small φ, but the effect is less dramatic than that in $y'_+/y'_-$.

In conclusion, we find that the experimental results on D mixing reported by BaBar and Belle already exclude extreme values of the CP-odd phase φ close to ±π/2. This follows from the result for y, which is close to the top end of theoretical predictions and can only be reduced by new physics, and from $y'_+/y'_- \sim 1$. While $y'_+/y'_- - 1$ vanishes in the limit of no CP-violation, y ∼ ∆Γ is a CP-conserving observable, which demonstrates the usefulness of such quantities in constraining CP-odd phases. Also $y_{CP}$, $A_\Gamma$ and the ratio $A_\Gamma/y_{CP}$ can be useful in constraining φ. As long as there is no major breakthrough in theoretical predictions for D mixing, which are held back by the fact that the D meson is at the same time too heavy and too light for current theoretical tools to get a proper grip on the problem, the long-distance SM contributions to x will completely obscure any NP contributions and their detection.
The observation of CP violation, however, presents a theoretically clean way for NP to manifest itself and it is to be hoped that in the near future, i.e. at the B factories or the LHC, at least one of the plentiful opportunities for NP to show up in CP violation [20] will be realised.

Acknowledgments

This work was supported in part by the EU networks contract Nos. MRTN-CT-2006-035482, Flavianet, and MRTN-CT-2006-035505, Heptools.

References

[1] M. Staric (Belle), talk given at 42nd Rencontres de Moriond, Electroweak Interactions and Unified Theories, La Thuile, Italy, March 2007; K. Flood (BaBar), talk given at the same conference.
[2] B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0703020.
[3] K. Abe [Belle Collaboration], arXiv:hep-ex/0703036.
[4] M. Ciuchini et al., arXiv:hep-ph/0703204.
[5] Y. Nir, arXiv:hep-ph/0703235.
[6] P. Ball, arXiv:hep-ph/0703245.
[7] M. Blanke et al., arXiv:hep-ph/0703254.
[8] X. G. He and G. Valencia, arXiv:hep-ph/0703270.
[9] C. H. Chen, C. Q. Geng and T. C. Yuan, arXiv:0704.0601.
[10] G. Burdman and I. Shipsey, Ann. Rev. Nucl. Part. Sci. 53, 431 (2003) [arXiv:hep-ph/0310076].
[11] D. Asner, review on D mixing in W. M. Yao et al. [Particle Data Group], J. Phys. G 33 (2006) 1; I. Shipsey, Int. J. Mod. Phys. A 21 (2006) 5381 [arXiv:hep-ex/0607070]; A. A. Petrov, Int. J. Mod. Phys. A 21 (2006) 5686 [arXiv:hep-ph/0611361].
[12] W. M. Sun [CLEO Collaboration], AIP Conf. Proc. 842 (2006) 693 [arXiv:hep-ex/0603031]; D. Asner et al. [CLEO Collaboration], Int. J. Mod. Phys. A 21 (2006) 5456 [arXiv:hep-ex/0607078].
[13] D.
Asner, talk given at workshop Flavour Physics in the Era of the LHC, CERN, March 07, http://mlm.home.cern.ch/mlm/FlavLHC.html.
[14] X. D. Cheng et al., arXiv:0704.0120.
[15] D. N. Gao, Phys. Lett. B 645 (2007) 59 [arXiv:hep-ph/0610389].
[16] A. Lenz and U. Nierste, arXiv:hep-ph/0612167.
[17] H. Georgi, Phys. Lett. B 297, 353 (1992) [arXiv:hep-ph/9209291]; T. Ohl, G. Ricciardi and E. H. Simmons, Nucl. Phys. B 403, 605 (1993) [arXiv:hep-ph/9301212]; I. I. Y. Bigi and N. G. Uraltsev, Nucl. Phys. B 592, 92 (2001) [arXiv:hep-ph/0005089]; A. F. Falk, Y. Grossman, Z. Ligeti and A. A. Petrov, Phys. Rev. D 65, 054034 (2002) [arXiv:hep-ph/0110317]; A. F. Falk et al., Phys. Rev. D 69, 114021 (2004) [arXiv:hep-ph/0402204].
[18] Y. Grossman, Phys. Lett. B 380 (1996) 99 [arXiv:hep-ph/9603244].
[19] E. Golowich, S. Pakvasa and A. A. Petrov, arXiv:hep-ph/0610039.
[20] P. Ball, J. M. Frere and J. Matias, Nucl. Phys. B 572 (2000) 3 [arXiv:hep-ph/9910211]; P. Ball and R. Zwicky, JHEP 0604 (2006) 046 [arXiv:hep-ph/0603232]; P. Ball and R. Fleischer, Eur. Phys. J. C 48, 413 (2006) [arXiv:hep-ph/0604249]; P. Ball and R. Zwicky, Phys. Lett. B 642 (2006) 478 [arXiv:hep-ph/0609037]; P. Ball, G. W. Jones and R. Zwicky, Phys. Rev. D 75 (2007) 054004 [arXiv:hep-ph/0612081].
http://arxiv.org/abs/hep-ex/0603031 http://arxiv.org/abs/hep-ex/0607078 http://mlm.home.cern.ch/mlm/FlavLHC.html http://arxiv.org/abs/0704.0120 http://arxiv.org/abs/hep-ph/0610389 http://arxiv.org/abs/hep-ph/0612167 http://arxiv.org/abs/hep-ph/9209291 http://arxiv.org/abs/hep-ph/9301212 http://arxiv.org/abs/hep-ph/0005089 http://arxiv.org/abs/hep-ph/0110317 http://arxiv.org/abs/hep-ph/0402204 http://arxiv.org/abs/hep-ph/9603244 http://arxiv.org/abs/hep-ph/0610039 http://arxiv.org/abs/hep-ph/9910211 http://arxiv.org/abs/hep-ph/0603232 http://arxiv.org/abs/hep-ph/0604249 http://arxiv.org/abs/hep-ph/0609037 http://arxiv.org/abs/hep-ph/0612081 ABSTRACT Both BaBar and Belle have found evidence for a non-zero width difference in the $D^0$-$\bar D^0$ system. Although there is no direct experimental evidence for CP-violation in $D$ mixing (yet), we show that the measured values of the width difference $y\sim \Delta\Gamma$ already imply constraints on the CP-violating phase in $D$ mixing, which, if significantly different from zero, would be an unambiguous signal of new physics. <|endoftext|><|startoftext|> Introduction We consider the flow of an incompressible fluid in a open bounded polyhedral set Ω ⊂ R2 during the time interval [0, T ]. The velocity field u : Ω× [0, T ] → 2 and the pressure field p : Ω × [0, T ] → R satisfy the Navier-Stokes equa- tions ∆u+ (u ·∇)u+∇p = f , (1) div u = 0 , (2) with the boundary and initial condition u|∂Ω = 0 , u|t=0 = u0. The terms ∆u and (u ·∇)u are associated with the physical phenomena of diffusion and convection, respectively. The Reynolds number Re measures the S. Zimmermann 17 rue Barrème, 69006 Lyon - FRANCE Tel.: (+33)0472820337 E-mail: Sebastien.Zimmermann@ec-lyon.fr http://arxiv.org/abs/0704.0787v1 2 Sébastien Zimmermann influence of convection in the flow. For equations (1)–(2), finite element and finite difference methods are well known and mathematical studies are avail- able (see [9] for example). 
For finite volume schemes, numerous computations have been conducted ([12] and [1] for example). However, few mathematical results are available in this case. Let us cite Eymard and Herbin [5] and Eymard, Latché and Herbin [6]. In order to deal with the incompressibility constraint, these works use a penalization method. Another way is to use the projection methods which have been introduced by Chorin [4] and Temam [13]. This is the case in Faure [8] where the mesh is made of squares. In Zimmermann [14] the mesh is made of triangles, which allows more complex geometries. In the present paper the mesh is also made of triangles, but we consider a different discretisation for the pressure. It leads to a linear system with a better-conditioned matrix.

The layout of the article is the following. We first introduce (section 2.1) some notations and hypotheses on the mesh. We define (section 2.2) the spaces we use to approximate the velocity and pressure. We define also (section 2.3) the operators we use to approximate the differential operators in (1)–(2). By combining this with a projection method, we build the scheme in section 3. In order to provide a mathematical analysis, we state in section 4 that the differential operators in (1)–(2) and their discrete counterparts share similar properties. In particular, the discrete operators for the gradient and the divergence are adjoint. We then prove in section 5 the convergence of the scheme.

We conclude with some notations. We denote by $\chi_I$ the characteristic function of an interval $I \subset \mathbb{R}$. We denote by $C_0^\infty = C_0^\infty(\Omega)$ the set of the smooth functions with a compact support in Ω. The spaces $(L^2, |.|)$ and $(L^\infty, \|.\|_\infty)$ are the usual Lebesgue spaces and we set $L^2_0 = \{q \in L^2 \,;\, \int_\Omega q \, dx = 0\}$. Their vectorial counterparts are $(\mathbf{L}^2, |.|)$ and $(\mathbf{L}^\infty, \|.\|_\infty)$ with $\mathbf{L}^2 = (L^2)^2$ and $\mathbf{L}^\infty = (L^\infty)^2$. For $k \in \mathbb{N}^*$, $(H^k, \|\cdot\|_k)$ is the usual Sobolev space. Its vectorial counterpart is $(\mathbf{H}^k, \|.\|_k)$ with $\mathbf{H}^k = (H^k)^2$.
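For k = 1, the functions of $\mathbf{H}^1$ with a null trace on the boundary form the space $\mathbf{H}^1_0$. Also, we set $\nabla\mathbf{u} = (\nabla u_1, \nabla u_2)^T$ if $\mathbf{u} = (u_1, u_2) \in \mathbf{H}^1$. If $X \subset \mathbf{L}^2$ is a Banach space, we define $C(0,T;X)$ (resp. $L^2(0,T;X)$) as the set of the mappings $g : [0,T] \to X$ such that $t \mapsto |g(t)|$ is continuous (resp. square integrable). The norm $\|.\|_{C(0,T;X)}$ is defined by $\|g\|_{C(0,T;X)} = \sup_{s \in [0,T]} |g(s)|$. Finally in all calculations, C is a generic positive constant, depending only on Ω, $\mathbf{u}_0$ and $\mathbf{f}$.

2 Discrete setting

First, we introduce the spaces and operators needed to build the scheme.

2.1 The mesh

Let $\mathcal{T}_h$ be a triangular mesh of Ω: $\overline{\Omega} = \cup_{K \in \mathcal{T}_h} K$. For each triangle $K \in \mathcal{T}_h$, we denote by |K| its area and $\mathcal{E}_K$ the set of its edges. If $\sigma \in \mathcal{E}_K$, $\mathbf{n}_{K,\sigma}$ is the unit vector normal to σ pointing outwards of K. The set of edges of the mesh is $\mathcal{E}_h = \cup_{K \in \mathcal{T}_h} \mathcal{E}_K$. The length of an edge $\sigma \in \mathcal{E}_h$ is |σ|. The set of edges inside Ω (resp. on the boundary) is $\mathcal{E}_h^{int}$ (resp. $\mathcal{E}_h^{ext}$): $\mathcal{E}_h = \mathcal{E}_h^{int} \cup \mathcal{E}_h^{ext}$.

Convergence of a finite volume scheme for the incompressible fluids

If $\sigma \in \mathcal{E}_h^{int}$, $K_\sigma$ and $L_\sigma$ are the triangles sharing σ as an edge. If $\sigma \in \mathcal{E}_h^{ext}$, only the triangle $K_\sigma$ inside Ω is defined. We denote by $\mathbf{x}_K$ the circumcenter of a triangle K. We assume that all interior angles of the triangles of the mesh are below π/2, so that $\mathbf{x}_K \in K$. If $\sigma \in \mathcal{E}_h^{int}$ (resp. $\sigma \in \mathcal{E}_h^{ext}$) we set $d_\sigma = d(\mathbf{x}_{K_\sigma}, \mathbf{x}_{L_\sigma})$ (resp. $d_\sigma = d(\mathbf{x}_\sigma, \mathbf{x}_{K_\sigma})$). We define for all edges $\sigma \in \mathcal{E}_h$: $\tau_\sigma = |\sigma|/d_\sigma$. The maximum circumradius of the triangles of the mesh is h. We assume that there exists C > 0 such that
$$ \forall \sigma \in \mathcal{E}_h, \qquad d(\mathbf{x}_{K_\sigma}, \sigma) \ge C|\sigma| \quad \text{and} \quad |\sigma| \ge C h. $$
It implies that there exists a constant C > 0 such that for all edges $\sigma \in \mathcal{E}_h$
$$ \tau_\sigma \ge C \tag{3} $$
and for all triangles $K \in \mathcal{T}_h$ we have (with $\sigma \in \mathcal{E}_K$ and $h_{K,\sigma}$ the matching altitude)
$$ |K| = \frac{1}{2}\,|\sigma|\, h_{K,\sigma} \ge \frac{1}{2}\,|\sigma|\, d(\mathbf{x}_K, \mathbf{x}_\sigma) \ge C h^2. \tag{4} $$

2.2 The discrete spaces

We first define
$$ P_0 = \{q \in L^2 \,;\, \forall K \in \mathcal{T}_h,\ q|_K \text{ is a constant}\}, \qquad \mathbf{P}_0 = (P_0)^2. $$
For the sake of concision, we set for all $q_h \in P_0$ (resp. $\mathbf{v}_h \in \mathbf{P}_0$) and all triangles $K \in \mathcal{T}_h$: $q_K = q_h|_K$ (resp.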
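$\mathbf{v}_K = \mathbf{v}_h|_K$). Although $\mathbf{P}_0 \not\subset \mathbf{H}^1$, we define the discrete equivalent of an $H^1$ norm as follows. For all $\mathbf{v}_h \in \mathbf{P}_0$ we set
$$ \|\mathbf{v}_h\|_h = \Big( \sum_{\sigma \in \mathcal{E}_h^{int}} \tau_\sigma\, |\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}|^2 + \sum_{\sigma \in \mathcal{E}_h^{ext}} \tau_\sigma\, |\mathbf{v}_{K_\sigma}|^2 \Big)^{1/2}. \tag{5} $$
We have [7] a discrete Poincaré inequality for $\mathbf{P}_0$: there exists C > 0 such that for all $\mathbf{v}_h \in \mathbf{P}_0$
$$ |\mathbf{v}_h| \le C\, \|\mathbf{v}_h\|_h. \tag{6} $$
From the norm $\|.\|_h$ we deduce a dual norm. For all $\mathbf{v}_h \in \mathbf{P}_0$ we set
$$ \|\mathbf{v}_h\|_{-1,h} = \sup_{\boldsymbol{\psi}_h \in \mathbf{P}_0} \frac{(\mathbf{v}_h, \boldsymbol{\psi}_h)}{\|\boldsymbol{\psi}_h\|_h}. \tag{7} $$
For all $\mathbf{u}_h \in \mathbf{P}_0$ and $\mathbf{v}_h \in \mathbf{P}_0$ we have $(\mathbf{u}_h, \mathbf{v}_h) \le \|\mathbf{u}_h\|_{-1,h}\, \|\mathbf{v}_h\|_h$. We define the projection operator $\Pi_{\mathbf{P}_0} : \mathbf{L}^2 \to \mathbf{P}_0$ as follows. For all $\mathbf{w} \in \mathbf{L}^2$, $\Pi_{\mathbf{P}_0}\mathbf{w} \in \mathbf{P}_0$ is given by
$$ \forall K \in \mathcal{T}_h, \qquad (\Pi_{\mathbf{P}_0}\mathbf{w})|_K = \frac{1}{|K|} \int_K \mathbf{w}(\mathbf{x})\, d\mathbf{x}. \tag{8} $$
We easily check that for all $\mathbf{w} \in \mathbf{L}^2$ and $\mathbf{v}_h \in \mathbf{P}_0$ we have $(\Pi_{\mathbf{P}_0}\mathbf{w}, \mathbf{v}_h) = (\mathbf{w}, \mathbf{v}_h)$. We deduce from this that $\Pi_{\mathbf{P}_0}$ is stable for the $\mathbf{L}^2$ norm. We define also the operator $\tilde\Pi_{\mathbf{P}_0} : \mathbf{H}^2 \to \mathbf{P}_0$. For all $\mathbf{v} \in \mathbf{H}^2$, $\tilde\Pi_{\mathbf{P}_0}\mathbf{v} \in \mathbf{P}_0$ is given by
$$ \forall K \in \mathcal{T}_h, \qquad \tilde\Pi_{\mathbf{P}_0}\mathbf{v}|_K = \mathbf{v}(\mathbf{x}_K). $$
According to the Sobolev embedding theorem, $\mathbf{v} \in \mathbf{H}^2$ is a.e. equal to a continuous function. Therefore the definition above makes sense. One checks [14] that there exists C > 0 such that
$$ |\mathbf{v} - \tilde\Pi_{\mathbf{P}_0}\mathbf{v}| \le C h\, \|\mathbf{v}\|_2 \tag{9} $$
for all $\mathbf{v} \in \mathbf{H}^2$. We introduce also the finite element spaces
$$ P_1^d = \{v \in L^2 \,;\, \forall K \in \mathcal{T}_h,\ v|_K \text{ is affine}\}, $$
$$ P_1^{nc} = \{v_h \in P_1^d \,;\, \forall \sigma \in \mathcal{E}_h^{int},\ v_h|_{K_\sigma}(\mathbf{x}_\sigma) = v_h|_{L_\sigma}(\mathbf{x}_\sigma)\}, \qquad \mathbf{P}_1^{nc} = (P_1^{nc})^2. $$
If $q_h \in P_1^{nc}$, we have usually $\nabla q_h \notin \mathbf{L}^2$. Therefore we define the operator $\nabla_h : P_1^{nc} \to \mathbf{P}_0$ by setting for all $q_h \in P_1^{nc}$ and all triangles $K \in \mathcal{T}_h$
$$ \nabla_h q_h|_K = \frac{1}{|K|} \int_K \nabla q_h\, d\mathbf{x}. \tag{10} $$
We define the projection operator $\Pi_{P_1^{nc}}$. For all $q \in H^1$, $\Pi_{P_1^{nc}} q$ is given by
$$ \forall \sigma \in \mathcal{E}_h, \qquad \int_\sigma (\Pi_{P_1^{nc}} q)\, d\sigma = \int_\sigma q\, d\sigma. \tag{11} $$
We also set $\Pi_{\mathbf{P}_1^{nc}} = (\Pi_{P_1^{nc}})^2$. One checks that there exists C > 0 such that
$$ \big| \nabla q - \nabla_h(\Pi_{P_1^{nc}} q) \big| \le C h\, \|q\|_2\,, \qquad |\mathbf{v} - \Pi_{\mathbf{P}_1^{nc}} \mathbf{v}| \le C h\, \|\mathbf{v}\|_1\,, \tag{12} $$
for all $q \in H^2$ and $\mathbf{v} \in \mathbf{H}^1$. We also use the Raviart-Thomas spaces
$$ \mathbf{RT}_0^d = \{\mathbf{v}_h \in \mathbf{P}_1^d \,;\, \forall K \in \mathcal{T}_h,\ \forall \sigma \in \mathcal{E}_K,\ \mathbf{v}_h|_K \cdot \mathbf{n}_{K,\sigma} \text{ is constant, and } \mathbf{v}_h \cdot \mathbf{n}|_{\partial\Omega} = 0\}, $$
$$ \mathbf{RT}_0 = \{\mathbf{v}_h \in \mathbf{RT}_0^d \,;\, \forall K \in \mathcal{T}_h,\ \forall \sigma \in \mathcal{E}_K,\ \mathbf{v}_h|_{K_\sigma} \cdot \mathbf{n}_{K_\sigma,\sigma} = \mathbf{v}_h|_{L_\sigma} \cdot \mathbf{n}_{K_\sigma,\sigma}\}. $$
For all $\mathbf{v}_h \in \mathbf{RT}_0$, $K \in \mathcal{T}_h$ and $\sigma \in \mathcal{E}_K$ we set $(\mathbf{v}_h \cdot \mathbf{n}_{K,\sigma})_\sigma = \mathbf{v}_h|_K \cdot \mathbf{n}_{K,\sigma}$. We define the operator $\Pi_{\mathbf{RT}_0} : \mathbf{H}^1 \to \mathbf{RT}_0$.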
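For all $\mathbf{v} \in \mathbf{H}^1$, $\Pi_{\mathbf{RT}_0}\mathbf{v} \in \mathbf{RT}_0$ is given by
$$ \forall K \in \mathcal{T}_h,\ \forall \sigma \in \mathcal{E}_K, \qquad (\Pi_{\mathbf{RT}_0}\mathbf{v} \cdot \mathbf{n}_{K,\sigma})_\sigma = \frac{1}{|\sigma|} \int_\sigma \mathbf{v} \cdot \mathbf{n}_{K,\sigma}\, d\sigma. \tag{13} $$
One checks [3] that there exists a constant C > 0 such that for all $\mathbf{v} \in \mathbf{H}^1$
$$ |\mathbf{v} - \Pi_{\mathbf{RT}_0}\mathbf{v}| \le C h\, \|\mathbf{v}\|_1. \tag{14} $$

2.3 The discrete operators

The equations (1)–(2) use the differential operators gradient, divergence and laplacian. Using the spaces of section 2.2, we now define their discrete counterparts. The discrete gradient $\nabla_h : P_1^{nc} \to \mathbf{P}_0$ is defined by (10). The discrete divergence operator $\mathrm{div}_h : \mathbf{P}_0 \to P_1^{nc}$ is built so that it is adjoint to the operator $\nabla_h$ (proposition 3 below). We set for all $\mathbf{v}_h \in \mathbf{P}_0$
$$ \forall \sigma \in \mathcal{E}_h^{int}, \quad (\mathrm{div}_h \mathbf{v}_h)(\mathbf{x}_\sigma) = \frac{3\,|\sigma|}{|K_\sigma| + |L_\sigma|}\, (\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}) \cdot \mathbf{n}_{K_\sigma,\sigma}\,; \qquad \forall \sigma \in \mathcal{E}_h^{ext}, \quad (\mathrm{div}_h \mathbf{v}_h)(\mathbf{x}_\sigma) = -\frac{3\,|\sigma|}{|K_\sigma| + |L_\sigma|}\, \mathbf{v}_{K_\sigma} \cdot \mathbf{n}_{K_\sigma,\sigma}. \tag{15} $$
The first discrete laplacian $\Delta_h : P_1^{nc} \to P_1^{nc}$ is given by
$$ \forall q_h \in P_1^{nc}, \qquad \Delta_h q_h = \mathrm{div}_h(\nabla_h q_h). $$
The second discrete laplacian $\tilde\Delta_h : \mathbf{P}_0 \to \mathbf{P}_0$ is the usual operator in finite volume schemes [7]. We set for all $\mathbf{v}_h \in \mathbf{P}_0$ and all triangles $K \in \mathcal{T}_h$
$$ \tilde\Delta_h \mathbf{v}_h|_K = \frac{1}{|K|} \Big( \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}_h^{int}} \tau_\sigma\, (\mathbf{v}_{L_\sigma} - \mathbf{v}_{K_\sigma}) - \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}_h^{ext}} \tau_\sigma\, \mathbf{v}_{K_\sigma} \Big). \tag{16} $$
In order to approximate the convection term $(\mathbf{u} \cdot \nabla)\mathbf{u}$ in (1), we define a bilinear form $\tilde{\mathbf{b}}_h : \mathbf{RT}_0 \times \mathbf{P}_0 \to \mathbf{P}_0$ using the well-known [7] upwind scheme. For all $\mathbf{u}_h \in \mathbf{RT}_0$, $\mathbf{v}_h \in \mathbf{P}_0$, and all triangles $K \in \mathcal{T}_h$ we set
$$ \tilde{\mathbf{b}}_h(\mathbf{u}_h, \mathbf{v}_h)\big|_K = \frac{1}{|K|} \sum_{\sigma \in \mathcal{E}_K \cap \mathcal{E}_h^{int}} |\sigma| \Big( (\mathbf{u}_h \cdot \mathbf{n}_{K,\sigma})^+_\sigma\, \mathbf{v}_K + (\mathbf{u}_h \cdot \mathbf{n}_{K,\sigma})^-_\sigma\, \mathbf{v}_{L_\sigma} \Big). $$
We have set $a^+ = \max(a, 0)$, $a^- = \min(a, 0)$ for all $a \in \mathbb{R}$. Lastly, we define the trilinear form $b_h : \mathbf{RT}_0 \times \mathbf{P}_0 \times \mathbf{P}_0 \to \mathbb{R}$ as follows. For all $\mathbf{u}_h \in \mathbf{RT}_0$, $\mathbf{v}_h \in \mathbf{P}_0$, $\mathbf{w}_h \in \mathbf{P}_0$, we set
$$ b_h(\mathbf{u}_h, \mathbf{v}_h, \mathbf{w}_h) = \sum_{K \in \mathcal{T}_h} |K|\, \mathbf{w}_K \cdot \tilde{\mathbf{b}}_h(\mathbf{u}_h, \mathbf{v}_h)\big|_K. $$

3 The scheme

We have defined in section 2 the discretization in space. We now have to define the discretization in time, and treat the incompressibility constraint (2). We use a projection method to this end. This kind of method has been introduced by Chorin [4] and Temam [13]. The time interval [0, T] is split with a time step k: $[0, T] = \cup_{n=0}^{N-1} [t_n, t_{n+1}]$ with $N \in \mathbb{N}^*$ and $t_n = n\,k$ for all $n \in \{0, \dots, N\}$.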
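As a sanity check on definitions (5) and (16), the following sketch builds their one-dimensional analogue and verifies two algebraic identities numerically: that minus the discrete laplacian tested against the same field gives the squared discrete norm, and that the operator is symmetric for the $L^2$ inner product. The 1D setting is our simplification (cells of equal size h replace triangles, cell midpoints replace circumcenters, edges reduce to points with |σ| = 1, and the field is scalar), so this illustrates the structure of the operators and is not the scheme of the paper.

```python
import random

def fv_laplacian_1d(u, h):
    """1D analogue of (16) on uniform cells of size h, zero boundary data."""
    n = len(u)
    tau_int = 1.0 / h   # |sigma| / d_sigma with d_sigma = h between midpoints
    tau_ext = 2.0 / h   # boundary edge: d_sigma = h / 2
    out = []
    for i in range(n):
        left = tau_int * (u[i - 1] - u[i]) if i > 0 else -tau_ext * u[i]
        right = tau_int * (u[i + 1] - u[i]) if i < n - 1 else -tau_ext * u[i]
        out.append((left + right) / h)   # division by the cell measure |K| = h
    return out

def discrete_h1_norm_sq(u, h):
    """Square of the 1D analogue of the norm (5)."""
    interior = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1)) / h
    boundary = 2.0 * (u[0] ** 2 + u[-1] ** 2) / h
    return interior + boundary

def l2_inner(u, v, h):
    """L2 inner product of two piecewise constant functions."""
    return h * sum(a * b for a, b in zip(u, v))

random.seed(0)
n_cells = 20
h = 1.0 / n_cells
u = [random.uniform(-1.0, 1.0) for _ in range(n_cells)]
v = [random.uniform(-1.0, 1.0) for _ in range(n_cells)]

energy = -l2_inner(fv_laplacian_1d(u, h), u, h)
print(f"-(Lap u, u) = {energy:.6f}, ||u||_h^2 = {discrete_h1_norm_sq(u, h):.6f}")
print(f"(Lap u, v) = {l2_inner(fv_laplacian_1d(u, h), v, h):.6f}, "
      f"(u, Lap v) = {l2_inner(u, fv_laplacian_1d(v, h), h):.6f}")
```

Both identities follow from rewriting the sum over cells as a sum over edges, which is the same manipulation used for the triangular mesh.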
We start with the initial values u0h ∈ P0 ∩RT0 , u1h ∈ P0 ∩RT0 , p1h ∈ Pnc1 ∩ L20. For all n ∈ {1, . . . , N}, (ũn+1h , p h ) is deduced from (ũ h) as follows. 6 Sébastien Zimmermann – ũn+1h ∈ P0 is given by 3 ũn+1h − 4unh + u ∆̃hũ + b̃h(2u h − un−1h , ũ h ) +∇hp h = f h , (17) – pn+1h ∈ Pnc1 ∩ L20 is the solution of h − p divh ũ – un+1h ∈ P0 is given by un+1h = ũ ∇h(pn+1h − p h). (18) We have proven in [14] that the scheme is well defined. In particular the term b̃h(2u h − u h ) in (17) is defined thanks to the following result. Proposition 1 For m ∈ {0, . . . , N} we have umh ∈ RT0. Note also that for m ∈ {0, . . . , N} we have divumh = 0, since umh ∈ P0. Thus the incompressibility condition (2) is fullfilled. 4 Properties of the discrete operators The operators defined in section 2.3 have the following properties [14]. Proposition 2 There exists a constant C > 0 such that for all uh ∈ RT0 satisfying divuh = 0, vh ∈ P0, wh ∈ P0: |bh(uh,vh,vh)| ≤ C |uh| ‖vh‖h ‖wh‖h. Proposition 3 For all vh ∈ P0 and qh ∈ Pnc1 : (vh,∇hqh) = −(qh, divh vh). Proposition 4 For all uh ∈ P0 and vh ∈ P0: −(∆̃huh,uh) = ‖uh‖2h and −(∆̃huh,vh) ≤ ‖uh‖h ‖vh‖h. If v ∈ H1 we have |divv| ≤ ‖v‖1. The operator divh has a similar property. Proposition 5 There exists a constant C > 0 such that for all vh ∈ P0 |divh vh| ≤ C ‖vh‖h. Proof. Using a quadrature formula we have |divh vh|2 = |(divh vh)(xσ)|2 . Convergence of a finite volume scheme for the incompressible fluids 7 Let K ∈ Th. Using definition and (4) we have ∀σ ∈ EK ∩ E inth , |(divh vh)(xσ)| |K| |vLσ − vK | ∀σ ∈ EK ∩ Eexth , |(divh vh)(xσ)| 2 ≤ C |vK |2. Thus: |divh vh|2 ≤ C σ∈EK∩E |vLσ − vK |2 + σ∈EK∩E |vK |2 Writing the sum over the triangles as a sum over the edges, we get |divh vh|2 ≤ C σ∈Eint τσ |vLσ − vK |2 + σ∈Eext τσ |vK |2  ≤ C ‖vh‖2h. Proposition 6 If uh ∈ P0 and vh ∈ P0 we have (∆̃huh,vh) = (uh, ∆̃hvh). Proof. 
Using definition (2.3) one checks that (∆̃huh,vh) = σ∈Eint τσ (vLσ − vKσ ) · (uLσ − uKσ)− σ∈Eext τσ vKσ · uKσ = (∆̃hvh,uh). Proposition 7 There exists C > 0 such that for all v ∈ H2 satisfying ∇v · n|∂Ω = 0 ‖ΠP0(∆v)−∆h(Π̃P0v)‖−1,h ≤ C h ‖v‖2. Proof. Let ψh ∈ P0. We have ΠP0(∆q)−∆h(Π̃P0v),ψh ΠP0(∆q)−∆h(Π̃P0v) ·ψK . For all K ∈ Th, using (2.3) and the divergence formula, we get ΠP0(∆v) −∆h(Π̃P0v) ) ∣∣∣ σ∈EK∩E ∇v · nK,σ dσ − τσ v(xLσ )− v(xK) Thus, by writing the sum over the triangles as a sum over the edges, we get ΠP0(∆v)−∆h(Π̃P0v),ψh σ∈Eint (ψLσ −ψKσ)Rσ with Rσ = ∇v · nKσ,σ − 1dσ v(xLσ ) − v(xKσ ) dσ. We denote by Dσ the quadrilatere defined by xKσ , xLσ and the endpoints of σ. Using a Taylor expansion and a density argument, we get as in [7] that |Rσ| ≤ C h |H(vi)(y)|2 dy 8 Sébastien Zimmermann Thus, using the Cauchy-Schwarz inequality, we get ΠP0(∆q)−∆h(Π̃P0v),ψh ≤ C h σ∈Eint |ψLσ −ψKσ | σ∈Eint |H(vi)(y)|2 dy According to (3) σ∈Eint |ψLσ −ψKσ | 2 ≤ C σ∈Eint τσ |ψLσ −ψKσ | 2 ≤ C ‖ψ‖2h. ΠP0(∆v)−∆h(Π̃P0v),ψh )∣∣∣ ≤ C h ‖ψh‖h ‖v‖2. Using (7) we get the result. 5 Convergence of the scheme We first recall the stability result that has been proven in [15]. We deduce from it an estimate on the Fourier transform of the computed velocity (lemma 1). Using a result on space P0, we infer from it the convergence of the scheme (theorem 2). One shows that if the data u0 et f fulfill a compati- bility condition [11], there exists a solution (u, p) to equations (1)–(2) such that u ∈ C(0, T ;H2) , ∇p ∈ C(0, T ;L2). We assume from now on that there exists C > 0 such that (HI) |u0h−u0|+ ‖u1h−u(t1)‖∞+|p1h−p(t1)| ≤ C h , |u1h−u0h| ≤ C k. Let us recall the following result [15]. Theorem 1 We assume that the initial values of the scheme fulfill (HI). There exists a constant C > 0 such that for all m ∈ {2, . . . , N} |umh |+ k ‖ũnh‖2h + |umh − um−1h |+ ‖ũnh − ũn−1h ‖ h ≤ C |pnh|2 + k |∇hpmh |+ |∇h(pmh − pm−1h )| ≤ C. From now on we set ũ1h = u h for the sake of conveniance. 
One deduces from hypothesis (HI) [14] that |p1h| ≤ C and ‖ũ1h‖h ≤ C. Now, let ε = max(h, k). We study the behaviour of the scheme as ε → 0. We define the applications uε : R → P0 , ũε : R → P0 , ũcε : R → P0, pε : R → Pnc1 and fε : R → P0 as follows. For all n ∈ {0, . . . , N − 1} and all t ∈ [tn, tn+1] we set uε(t) = u h , ũε(t) = ũ h , ũ ε(t) = ũ (t− tn) (ũn+1h − ũ pε(t) = p h, fε(t) = ΠP0 f(tn+1) , Convergence of a finite volume scheme for the incompressible fluids 9 and for all t 6∈ [0, T ] we set uε(t) = ũε(t) = ũcε(t) = fε(t) = 0, pε(t) = 0. We recall that the Fourier transform v̂ of a function v ∈ L1(R) is defined by ∀ τ ∈ R, v̂(τ) = e−2iπτt v(t) dt. (19) We have the following result. Lemma 1 Let 0 < γ < 1 . There exists C > 0 such that for all ε > 0 |τ |2γ |̂̃uε(τ)|2 dτ ≤ C. Proof. Let χI be the characteristic function of an interval I ⊂ R. We define the application gε : R → P0 as follows. For all t 6∈ [t1, T ] we set gε(t) = 0. For all t ∈ [t1, T ], gε(t) ∈ P0 is the solution of ∆̃hgε = ∆̃hũε + fε − b̃h 2uε − uε(t− k), ũε with Pε = −∇hpε− 2 ũε(t−k)−uεk + ũε(t−2 k)−uε(t−k) χ[t2,T ]. We have omitted most of the time dependancies for the sake of concision. Let us estimate gε. We have −(∆̃hgε,gε) = −(∆̃hũε,gε)− (fε,gε) (20) 2uε − uε(t− k), ũε,gε + (Pε,gε). According to proposition 4 we have −(∆̃hgε,gε) = ‖gε‖2h , −(∆̃hũε,gε) ≤ ‖ũε‖h ‖gε‖h. Using the Cauchy-Schwarz inequality and (6) we have −(fε,gε) ≤ |fε| |gε| ≤ C |fε| ‖gε‖h. According to proposition 2 and theorem 1 2uε − uε(t− k), ũε,gε ≤ C |2uε − uε(t− k)| ‖ũε‖h ‖gε‖h ≤ C ‖ũε‖h ‖gε‖h. Using (18) we have Pε = − χ[t3,T ] ∇hpε+ χ[t3,T ]∇hpε(t−k)− χ[t3,T ] ∇hpε(t−2 k). Using proposition 3 and the Cauchy-Schwarz inequality we get |(Pε,gε)| ≤ C |pε|+ χ[t3,T ] |pε(t− k)|+ χ[t3,T ] |pε(t− 2 k)| |divh gε|. Using proposition 5 we have |(Pε,gε)| ≤ C |pε|+ χ[t3,T ] |pε(t− k)|+ χ[t3,T ] |pε(t− 2 k)| ‖gε‖h. 10 Sébastien Zimmermann Let us plug these estimates into (20). 
By simplifying by ‖gε‖h and integrating from t = t1 to T we get ‖gε‖h dt ≤ C + C |pε| dt+ |fε| dt+ ‖ũε‖h dt According to the Cauchy-Schwarz inequality and theorem 1 |pε(t)| dt ≤ |pε(t)|2 dt |pnh|2 Thanks to the stability of ΠP0 for the L 2 norm we have |fε(t)| dt = k |ΠP0 f(tn)| ≤ k |f(tn)| ≤ k ‖f‖C(0,T ;L2) ≤ C. And thanks to the Cauchy-Schwarz inequality and theorem 1 ‖ũε(t)‖h dt ≤ ‖ũε(t)‖2h dt ≤ C k ‖ũnh‖2h ≤ C. Thus, since gε(t) = 0 for t ∈ [0, t1], we get ‖gε(t)‖h = ‖gε(t)‖h dt ≤ C. Using definition (19) we obtain finally ∀ τ ∈ R , ‖ĝε(τ)‖h ≤ C. (21) With this estimate we can now prove the result. Since the function ũcε is piecewise C1 on R, and discontinous for t = 0 and t = T , equation (17) reads dũcε dũcε (t− k) = ∆hgε + (ũ0h δ0 − ũNh δT )− (ũ1h δt1 − ũNh δT+k) ũ1h − ũ0h χ[0,t1] − ũNh − ũ χ[T,T+k] where δ0, δt1 , δT and δT+k are Dirac distributions located respectively in 0, t1, T and T + k. Let τ ∈ R. Applying the Fourier transform we get −2iπτ −2iπτk ̂̃uε(τ) = ∆hĝε(τ) +α (ũ0h − ũNh e−2iπT )− (ũ1h − ũNh e−2iπT ) e−2iπk − 1 ũ1h − ũ0h ũNh − ũ −2iπT e−2iπk − 1 Convergence of a finite volume scheme for the incompressible fluids 11 Taking the scalar product with i ̂̃uε(τ) we get e−2iπτk |̂̃uε(τ)|2 = i ∆hĝε(τ), ̂̃uε(τ) α, ̂̃uε(τ) Let us bound the right-hand side. According to proposition 4 and (21) ∆hĝε(τ), ̂̃uε(τ) )∣∣∣ ≤ ‖ĝε(τ)‖h ‖̂̃uε(τ)‖h ≤ C ‖̂̃uε(τ)‖h. On the other hand, using theorem 1, one checks that α is bounded. Thus, according to the Cauchy-Schwarz inequality and (6) α, ̂̃uε(τ) )∣∣∣ ≤ |α| |̂̃uε(τ)| ≤ C |α| ‖̂̃uε(τ)‖h ≤ C ‖̂̃uε(τ)‖h. Hence we have ∀ τ ∈ R, |τ | |̂̃uε(τ)|2 ≤ C ‖̂̃uε(τ)‖h. If τ 6= 0, multiplying this estimate by |τ |2 γ−1, we get |τ |2 γ |̂̃uε(τ)|2 ≤ C |τ |2 γ−1 ‖̂̃uε(τ)‖h. Using the Young inequality and integrating over {τ ∈ R ; |τ | > 1} we obtain |τ |>1 |τ |2 γ |̂̃uε(τ)|2 dτ ≤ |τ |>1 |τ |4 γ−2 dτ + C |τ |>1 ‖̂̃uε(τ)‖2h dτ. For |τ | ≤ 1 we have |τ |2 γ |̂̃uε(τ)|2 ≤ |̂̃uε(τ)|2 ≤ C ‖̂̃uε(τ)‖2h thanks to (6). 
|τ |2 γ |̂̃uε(τ)|2 dτ ≤ |τ |>1 |τ |4 γ−2 dτ + C ‖̂̃uε(τ)‖2h dτ. Since 4 γ−2 < −1 we have |τ |>1 |τ |4 γ−2 dτ ≤ C. On the other hand, thanks to the Parseval theorem and thorem 1 ‖̂̃uε(τ)‖2h dτ ≤ ‖̂̃uε(τ)‖2h dt ≤ k ‖ũnh‖2h ≤ C. Hence the result. We introduce the following spaces H = {v ∈ L2 ; divv ∈ L2 et v · n|∂Ω = 0} , V = {v ∈ H10 ; divv = 0}. We also set ((u,v)) = (∇ui,∇vi) , b(u,v,w) = − (vi,u · ∇wi) for all u = (u1, u2) ∈ H1, v = (v1, v2) ∈ H1, w = (w1, w2) ∈ H1. We have the following result. 12 Sébastien Zimmermann Theorem 2 We assume that the initial values of the scheme fulfill hypothesis (HI). We also assume that the space step h and the time step k are such that h ≤ C kα with α > 1. Then we have uε → u in L2(0, T ;L2) with u ∈ C(0, T ;H) ∩ L2(0, T ;V) , ∈ L2(0, T ;L2). (22) We also have u(0) = u0 and for all ψ ∈ C∞0 ([0, T ]) ∀v ∈ V, (u,v) + ((u,v)) + b(u,u,v) − (f ,v) dt = 0. (23) Proof. In what follows, sub-sequences of a sequence (vε)ε>0 will still be noted (vε)ε>0 for the sake of convenience. All the limits are for ε → 0. According to theorem 1 and hypothesis (HI) we have ‖uε‖2L2(0,T ;L2) = k (|u h|2 + |u1h|2) + k |unh|2 ≤ C. We also deduce from (6), hypothesis (HI) and theorem 1 ‖ũε‖2L2(0,T ;L2) = k |u h|2 + k |ũnh|2 ≤ C + C k ‖ũnh‖2h ≤ C. A simple computation shows that there exists C > 0 such that ‖ũcε‖L2(0,T ;L2) ≤ C ‖ũε‖L2(0,T ;L2) ≤ C. Thus the sequences (uε)ε>0, (ũε)ε>0 and (ũ ε)ε>0 are bounded in L 2(0, T ;L2). Therefore there exists u ∈ L2(0, T ;L2), ũ ∈ L2(0, T ;L2) and ũc ∈ L2(0, T ;L2) such that, up to a sub-sequence, we have uε ⇀ u , ũε ⇀ ũ , ũ ε ⇀ ũ c weakly in L2(0, T ;L2). We claim that the limits u, ũ, ũc are the same. Indeed, let us consider uε−ũε. Since unh − ũ h = (u h − u h ) + (u h − ũ h ) we have ‖uε − ũε‖2L2(0,T ;L2) ≤ 2 k |unh − un+1h | 2 + 2 k |un+1h − ũ According to theorem 1 we have k n=0 |unh−u h |2 ≤ C n=0 k 3 ≤ C k2. Thanks to (18) we also have |un+1h − ũ |∇h(pn+1h − p h)|2 ≤ C 3 ≤ C k2. 
Thus ‖uε − ũε‖L2(0,T ;L2) → 0 and u = ũ. One checks in a simililar way that ũ = ũc. Now, using the Fourier transform, we prove the strong convergence Convergence of a finite volume scheme for the incompressible fluids 13 of the sequence (uε)ε>0 in L 2(0, T ;L2). We set vε = uε −u. Let M > 0. We use the splitting |v̂ε(τ)|2 dτ = |τ |≤M |v̂ε(τ)|2 dτ + |τ |>M |v̂ε(τ)|2 dτ = IMε + JMε . Let us estimate JMε . Since |v̂ε(τ)|2 ≤ 2 |ûε(τ)|2 + 2 |û(τ)|2 we have JMε ≤ 2 |τ |>M |ûε(τ)|2 dτ + 2 |τ |>M |û(τ)|2 dτ. According to lemma 1 we have |τ |>M |ûε(τ)|2 dτ ≤ |τ |>M |τ |2γ |ûε(τ)|2 dτ ≤ |τ |>M |û(τ)|2 dτ. Therefore, for all ε > 0, we have JMε → 0 when M → ∞. We now consider IMε . Let τ ∈ R. Since uε ⇀ u in L2(0, T ;L2), we deduce from definition (19) ̂̃uε(τ)⇀ û(τ) weakly in L2. For all t ∈ R we have ũε(t) ∈ P0. From definition (19) we infer that ̂̃uε(τ) ∈ P0. Now, prolonging ̂̃uε(τ) by 0 outside Ω, we deduce from lemma 4 in [7] that there exists a constant C > 0 such that ∀η∈ R2 , |̂̃uε(τ)(· + η)− ̂̃uε(τ)|2 ≤ ‖̂̃uε(τ)‖2h |η| (|η|+ C h). Using definition (19), the Cauchy-Schwarz inequality and theorem 1, we have ‖̂̃uε(τ)‖2h ≤ C ‖ũε(t)‖2h dt ≤ C k ‖ũnh‖2h ≤ C. Thus, using the compactness criterium given by theorem 1 in [7], we get ̂̃uε(τ) → û(τ) in L2. Thus ̂̃vε(τ) = ̂̃uε(τ)− û(τ) → 0 in L2. Therefore for all M > 0 we have IMε → 0. Using the Parseval inequality, and gathering the limits for IMε and J ε , we get |v̂ε(τ)|2 dτ = |vε|2 dt = |uε − u|2 dτ → 0. We have proven that uε → u in L2(0, T ;L2). We now check the properties of u. First, proceeding as in [7], one checks easily that u ∈ L2(0, T ;H10). Now let q ∈ L2(0, T ; C∞0 ). According to (12) we have ∇h(ΠPnc q) → ∇q in L2(0, T ;L2). Since uε → u in L2(0, T ;L2) we get ∇h(ΠPnc q),uε → (∇q,u) = −(q, divu). 14 Sébastien Zimmermann On the other hand, according to propositions 1 and 3, we have for all ε > 0 ∇h(ΠPnc q),uε = −(ΠPnc q, divh uε) = 0. Thus we have q divu dt = 0 for all q ∈ L2(0, T ; C∞0 ). 
Since the space C∞0 is dense in L2, we get divu = 0. Hence u ∈ L2(0, T ;V). Let us now check the regularity of . Using hypothesis (HI), (6) and theorem 1, we have dũcε L2(0,T ;L2) |u1h − u0h|2 + k |δũnh|2 ≤ C + C k ‖δũnh‖2h Thus the sequence dũcε is bounded in L2(0, T ;L2). Since uε → u in L2(0, T ;L2) with u ∈ L2(0, T ;H1), proceeding as in, we get dũcε weakly ∈ L2(0, T ;L2) and u ∈ C(0, T ;H). Let us now prove that u satisfies (23). For the sake of simplicity, we omit to note some time dependencies. According to (17) we have for all t ∈ [t1, T ] dũcε dũcε (t− k)− 1 ∆hũε + b̃h 2uε − uε(t− k), ũε χ[t3,T ] ∇hpε + χ[t3,T ] ∇hpε(t− k)− χ[t3,T ] ∇hpε(t− 2 k). Let v ∈ V ∩ (C∞0 )2 and ψ ∈ C∞([0, T ]) with ψ(T ) = 0. We set vh = Π̃P0v. Multiplying the former equation by ψ vh and integrating over [t1, T ] we get dũcε dũcε (t− k),vh dt− 1 ψ (∆̃hũε,vh) dt 2uε − uε(t− k), ũε,vh ψ (fε,vh) dt χψ (∇hpε,vh) dt (24) with χ = −χ[t3,T−2 k] + 13 χ[t2,t3] − χ[t1,t2] − 73 χ[T−k,T ] − χ[T−2 k,T−k]. We now check the limits of the terms in this equation. First, according to (9), we have vh → v in L2. We will use this limit in the computations below without mentioning it. Since ψ(T ) = 0 we obtain by integrating by parts dũcε dt− ψ(t1) (ũ1h,vh)− ψ′ (ũcε,vh) dt dũcε (t− k),vh dt = −ψ(0) (ũ0h,vh)− ∫ T−k ′(t+ k) (ũcε,vh) dt. Convergence of a finite volume scheme for the incompressible fluids 15 According to hypothesis (HI) we have ũ0h = u h → u0 in L2 and ũ1h = u1h → u0 in L2. It implies that (u0h,vh) → (u0,v) and ψ(t1) (ũ1h,vh) = ψ(k) (ũ1h,vh) → ψ(0) (u0,v). On the other hand ψ′ (ũcε,vh) dt = χ[t1,T ] ψ ′ (ũcε,vh) dt→ ψ′ (u,v) dt and since χ[0,T−k] ψ ′(·+ k) → ψ′ in L∞(0, T ) ∫ T−k ′(t+k) (ũcε,vh) dt = χ[0,T−k] ψ ′(t+k) (ũcε,vh) dt→ ′ (u,v) dt. Thus we have dũcε dũcε (t− k),vh dt→ −ψ(0) (u,v)− ψ′ (u,v) dt. (25) Let us now consider the discrete laplacian. 
Using proposition 6 and the split- ting ∆̃hvh = ∆̃hvh −ΠP0(∆̃v) +ΠP0(∆̃v) we have ∆̃hũε,vh) dt = Aε +Bε with Aε = ũε, ∆̃h(Π̃P0v)−ΠP0(∆̃v) dt,Bε = ũε, ΠP0(∆v) Since |Aε| ≤ ‖∆̃h(Π̃P0v) −ΠP0(∆̃v)‖−1,h ψ ‖ũε‖h dt , using proposition 7 and the Cauchy-Schwarz inequality, we get |Aε| ≤ C h ‖v‖2 ψ ‖ũε‖h dt ≤ C h ψ2 dt )1/2(∫ T ‖ũε‖2h dt Therefore, using theorem 1: |Aε| ≤ C h n=1 ‖ũnh‖2h ≤ C h. Hence Aε → 0. On the other hand, using an integration by parts, we have ψ (ũε, ∆v) dt → ψ (u, ∆v) dt = − ψ ((u,v)) dt. By gathering the limits for Aε and Bε we get ψ (∆̃hũε,vh) dt → − ψ ((u,v)) dt. Let us now consider the pressure. We use the splitting (∇hpε,vh) = (∇hpε,vh − v) + (∇hpε,v −ΠRT0v) + (∇hpε, ΠRT0v). (26) 16 Sébastien Zimmermann First, integrating by parts, we have (∇hpε, ΠRT0v) = − pε, div (ΠRT0v) pε (ΠRT0v · nK,σ). Since divv = 0, using the divergence formula and definition (13), one checks that div (ΠRT0v) = 0. Thus − pε, div (ΠRT0v) = 0. On the other hand pε (ΠRT0v ·nK,σ) = σ∈Eint (ΠRT0v)σ ·nKσ ,σ) (pε|Lσ −pε|Kσ ) dσ and since pε ∈ Pnc1 we get pε (ΠRT0v · nK,σ) = 0. Thus the last term in (26) vanishes. To bound the other terms, we use the Cauchy-Schwartz inequality together with estimates (9), (14) and theorem 1. We get |(∇hpε,v − vh)|+ |(∇hpε,v −ΠRT0v)| ≤ C h |∇hpε| ‖v‖2 ≤ C ‖v‖2. Plugging these estimates into (26) we get |χψ (∇hpε,vh)| dt ≤ C hk . By hypothesis we have h ≤ kα−1 with α− 1 > 0. Thus for ε = max(h, k) → 0 χψ (∇hpε,vh) dt→ 0. Let us now consider the convection term. We set uε = 2uε − uε(t − k) and want to find the limit of ψ bh(uε, ũε,vh) dt. We use the splitting −bh(uε, ũε,vh) + b(u,u,v) = Aε1 +Aε2 +Aε3 with Aε1 = b(u− uε,u,v) , Aε2 = b(uε,u,v) − div(ui uε), v Aε3 = div(ui uε), v − bh(uε,uε,vh). By definition Aε1 = − ui, (u − uε) · ∇vi . Using the Cauchy-Schwarz inequality we get ψ |Aε1| dt ≤ ‖ψ‖∞ ‖v‖W1,∞ ‖u‖L2(0,T ;L2) ‖u− uε‖L2(0,T ;L2). Since uε → u in L2(0, T ;L2) we also have ‖u − uε‖L2(0,T ;L2) → 0. Thus∫ T ψAε1 dt → 0. Let us now consider Aε2. 
Since uε · n|∂Ω = 0 we obtain by integrating by parts b(uε,u,v) = vi, div(ui uε) . Thus Aε2 = vi − vih, div(ui uε) vi − vih,uε · ∇ui Convergence of a finite volume scheme for the incompressible fluids 17 Using the Cauchy-Schwarz inequality we get ψ |Aε2| dt ≤ C ‖ψ‖L∞ ‖v − vh‖L∞ ‖uε‖L2(0,T ;L2) ‖u‖L2(0,T ;H1). Using a Taylor expansion one checks that ‖v − vh‖L∞ ≤ ‖v‖W1,∞ h. We recall also that ‖uε‖L2(0,T ;L2) ≤ C. Therefore ψAε2 dt → 0. Let us now bound Aε3. For all triangle K ∈ Th and all edge σ ∈ EK ∩ E inth , we set ũεK,Lσ = ũε|K if (uε · nK,σ) ≥ 0 ũε|Lσ if (uε · nK,σ) < 0 Using the divergence formula one checks that Aε3 = σ∈EK∩E (u− ũεK,Lσ ) (uε · nK,σ) dσ. By writing this sum as a sum on the edges we get Aε3 = σ∈Eint (vKσ − vLσ) · (u− ũεKσ ,Lσ) (uε · nKσ,σ) dσ. Thus, using definition (11) and a quadrature formula σ∈Eint (vKσ − vLσ ) (ΠPnc u− ũεKσ,Lσ) (uε · nKσ,σ) dσ σ∈Eint (vKσ − vLσ ) |σ| (ΠPnc u)(xσ)− ũεKσ,Lσ (uε · nKσ,σ)σ. We have |σ| ≤ h and, using a Taylor expansion, one checks that |vKσ−vLσ | ≤ h ‖v‖W1,∞ . Thus, thanks to the Cauchy-Schwarz inequality, we get |Aε3| ≤ C h2 σ∈Eint |uε(xσ)|2 σ∈Eint |(ΠPnc u)(xσ)− ũεKσ ,Lσ | Using (4) we get |Aε3| ≤ C σ∈EK∩E |uε(xσ)|2 σ∈EK∩E |(ΠPnc u)(xσ)− ũε|K |2 Therefore, using a quadrature formula |Aε3| ≤ C |uε| |ΠPnc1 u− uε| ≤ C |uε| |ΠPnc1 u− uε|. 18 Sébastien Zimmermann By writing ΠPnc u − uε = (ΠPnc u − u) + (u − uε) and using (12), we get |Aε3| ≤ C |uε| (h ‖u‖1+ |u−uε|). Thus, using the Cauchy-Schwarz inequality, we have |ψ| |Aε3| dt ≤ C ‖uε‖L2(0,T ;L2) (h ‖u‖L2(0,T ;H1) + ‖u− uε‖L2(0,T ;L2)). ψAε3 dt → 0. By gathering the limits for Aε1, Aε2, Aε3 we obtain∫ T bh(uε,uε,vh)− b(u,u,v) dt→ 0. Since b(u,u,v) dt → 0, we get bh(uε,uε,vh), dt → ψ b(u,u,v) dt. Finally, since vh ∈ P0, we have: (fε,vh) = (ΠP0 f ,vh) = (f ,vh). Therefore ψ (fε,vh) dt = χ[t1,T ] ψ (f ,vh) dt→ ψ (f ,v) dt. We now gather the limits we have obtained into (24). The space V∩(C∞0 )2 is dense in V. 
Hence we obtain for all v ∈ V and ψ ∈ C∞([0, T ]) with ψ(T ) = 0 −ψ(0) (u0,v)− ′ (u,v) dt+ ((u,v)) +b(u,u,v)− (f ,v) dt = 0. Taking ψ = φ ∈ C∞0 ([0, T ]), we have φ(0) = 0 and from the definition of the derivative in the distributional sense φ′ (u,v) dt = − (u,v) dt. Thus we have proven (23). At last, let us show that the initial condition holds. We have proven before that dũcε weakly in L2(0, T ;L2). Let v ∈ V ∩ (C∞0 )2 and ψ ∈ C∞([0, T ]) such that ψ(T ) = 0. We have dũcε dũcε (t− k),vh Integrating by parts the limit we get dũcε dũcε (t− k),vh dt → −ψ(0) u(0),v ψ′ (u,v) dt. By comparing this limit with (25), we get ψ(0) (u(0) − u0,v) = 0 for all ψ ∈ C∞([0, T ]) with ψ(T ) = 0. Therefore u(0) = u0. At last, note that we have proven so far the convergence of a sub-sequence of (uε)ε>0 towards u. But the application u such that (22), (23) and u(0) = u0 hold is unique ([13], p. 254). Thus the whole sequence (uε)ε>0 converges towards u. Convergence of a finite volume scheme for the incompressible fluids 19 References 1. Boivin, S., Cayre, F., Herard, J. M.: A finite volume method to solve the Navier- Stokes equations for incompressible flows on unstructured meshes. Int. J. Therm. Sci. 39 806–825 (2000). 2. Brenner, S. C., Scott, L.R.: The mathematical theory of finite element methods. Springer, 2002. 3. Brezzi, F., Fortin, M.: Mixed and hybrid finite element methods. Springer- Verlag, 1991. 4. Chorin, J.: On the convergence of discrete approximations to the Navier-Stokes equations. Math. Comp. 23 341–353 (1969). 5. Eymard, R., Herbin, R.: A staggered finite volume scheme on general meshes for the Navier-Stokes equations in two space dimensions. Int.J. Finite Volumes (2005). 6. Eymard, R., Latché, J. C., Herbin, R.: Convergence analysis of a colocated finite volume scheme for the incompressible Navier-Stokes equations on general 2 or 3D meshes. preprint LATP (2004). 7. Eymard, R., Gallouët, T., Herbin, R.: Finite volume methods. P.G. Ciarlet and J.L. 
Lions eds, North-Holland, 2000.
8. Faure, S.: Stability of a colocated finite volume scheme for the Navier-Stokes equations. Num. Meth. PDE 21(2) 242–271 (2005).
9. Girault, V., Raviart, P. A.: Finite Element Methods for Navier-Stokes equations: Theory and Algorithms. Springer (1986).
10. Guermond, J. L.: Some implementations of projection methods for Navier-Stokes equations. M2AN 30(5) 637–667 (1996).
11. Heywood, J. G., Rannacher, R.: Finite element approximation of the nonstationary Navier-Stokes problem. I. Regularity of solutions and second-order error estimates for spatial discretization. SIAM J. Numer. Anal. 19(26) 275–311 (1982).
12. Kim, D., Choi, H.: A second-order time-accurate finite volume method for unsteady incompressible flow on hybrid unstructured grids. J. Comp. Phys. 162, 411–428 (2000).
13. Temam, R.: Sur l’approximation de la solution des équations de Navier-Stokes par la méthode de pas fractionnaires II. Arch. Rat. Mech. Anal. 33 377–385 (1969).
14. Zimmermann, S.: Stability of a colocated finite volume for the incompressible Navier-Stokes equations. preprint (2006).
15. Zimmermann, S.: Stability of a finite volume scheme for the incompressible fluids. preprint (2006).

ABSTRACT
We consider a finite volume scheme for the two-dimensional incompressible Navier-Stokes equations. We use a triangular mesh. The unknowns for the velocity and pressure are respectively piecewise constant and affine. We use a projection method to deal with the incompressibility constraint. In a former paper, the stability of the scheme has been proven. We infer from it its convergence.
<|endoftext|><|startoftext|>
COMBINING SEVERAL ALGORITHMS INTO A SUPERIOR ONE
OPTIMAL SYNTHESIS OF MULTIPLE ALGORITHMS
KERRY M. SOILEAU
ksoileau@yahoo.com
JULY 27, 2004

ABSTRACT
In this paper we give a definition of “algorithm,” “finite algorithm,” “equivalent algorithms,” and what it means for a single algorithm to dominate a set of algorithms.
We define a derived algorithm which may have a smaller mean execution time than any of its component algorithms. We give an explicit expression for the mean execution time (when it exists) of the derived algorithm. We give several illustrative examples of derived algorithms with two component algorithms. We include mean execution time solutions for two-algorithm processors whose joint density of execution times are of several general forms. For the case in which the joint density for a two-algorithm processor is a step function, we give a maximum-likelihood estimation scheme with which to analyze empirical processing time data.

1 INTRODUCTION

It can categorically be said that no algorithm is unique. By this we mean that for a given task, invariably more than one algorithm exists which will accomplish that task. One strategy is to select one algorithm deemed generally superior to the rest, and to use that algorithm exclusively. This paper examines an alternative strategy. We ask, given two or more equivalent algorithms, is it ever possible to create a new derived algorithm whose mean execution time is less than that of all of the original algorithms? If so, how can such an algorithm be derived? First we define clearly what we mean by the term “algorithm:”

Algorithm: An algorithm $\alpha$ is a pair $(\rho_\alpha, \pi_\alpha)$, where $\rho_\alpha : \Omega \to \Gamma$ is a Turing-computable mapping of a countable set $\Omega$ (tasks) into a countable set $\Gamma$ (outputs), and $\pi_\alpha : \Omega \to \mathbb{R}^{+}$ is a mapping of $\Omega$ into the positive real numbers. The function $\rho_\alpha$ specifies the algorithm’s output $\rho_\alpha(\omega)$ when presented with the task $\omega \in \Omega$. The function $\pi_\alpha$ specifies the execution time $\pi_\alpha(\omega)$ required to compute the output $\rho_\alpha(\omega)$.

Note that under this definition, given a task $\omega \in \Omega$, an algorithm will always produce a definite output, namely $\rho_\alpha(\omega)$, and will always produce this output after a definite amount of time has passed, namely $\pi_\alpha(\omega)$.
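We do not address procedures which are nondeterministic or whose execution time is unpredictable.

Definition: We say that an algorithm $\alpha = (\rho_\alpha, \pi_\alpha)$ is finite if and only if $0 < \pi_\alpha(\omega) < \infty$ for every $\omega \in \Omega$. Note that “$\alpha = (\rho_\alpha, \pi_\alpha)$ is finite” does not imply “$\pi_\alpha$ is bounded.” For example, Quicksort and Bubblesort are finite.

Definition: We say that two algorithms $\alpha = (\rho_\alpha, \pi_\alpha)$ and $\beta = (\rho_\beta, \pi_\beta)$ are equivalent if and only if $\mathrm{Dom}\,\rho_\alpha = \mathrm{Dom}\,\rho_\beta$ and $\rho_\alpha(\omega) = \rho_\beta(\omega)$ for every $\omega \in \Omega$. Notice that equivalent algorithms may require different times to process a given task. For example, Quicksort and Bubblesort are equivalent.

Definition: Let $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$ be a set of equivalent algorithms. We say that $\alpha_n$ dominates $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$ if and only if for every $\omega \in \Omega$, $\pi_{\alpha_n}(\omega) \le \pi_{\alpha_i}(\omega)$ for every $i \in \{1, 2, \ldots, N\}$.

Now suppose we are given a set of finite equivalent algorithms $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$ such that no $\alpha_n$ dominates $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. Suppose further that there exists a probability space $(\Omega, \Im, P)$ over $\Omega$ such that $\pi_{\alpha_1}, \pi_{\alpha_2}, \ldots, \pi_{\alpha_N}$ are random variables. Let $f_{\pi_{\alpha_1}, \pi_{\alpha_2}, \ldots, \pi_{\alpha_N}}$ be the joint density of the random variables $\pi_{\alpha_1}, \pi_{\alpha_2}, \ldots, \pi_{\alpha_N}$.

Definition of Derived Algorithm: From a set of finite equivalent algorithms $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$, and a given point $(\tau_1, \tau_2, \ldots, \tau_{N-1}) \in [0, \infty)^{N-1}$, the function $[\alpha_1 |_{\tau_1} \alpha_2 |_{\tau_2} \cdots \alpha_{N-1} |_{\tau_{N-1}} \alpha_N] : \Omega \to \Gamma$ is defined as follows. For each $(\tau_1, \tau_2, \ldots, \tau_{N-1}) \in [0, \infty)^{N-1}$, we define the random variable $T_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \tau_2, \ldots, \tau_{N-1}) : \Omega \to \mathbb{R}^{+}$ as follows:

$$T_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1})(\omega) = \begin{cases} \pi_{\alpha_1}(\omega), & \omega \in \Omega \sim S_1 \\ \tau_1 + \pi_{\alpha_2}(\omega), & \omega \in S_1 \sim S_2 \\ \tau_1 + \tau_2 + \pi_{\alpha_3}(\omega), & \omega \in S_2 \sim S_3 \\ \quad\vdots \\ \tau_1 + \tau_2 + \cdots + \tau_{N-2} + \pi_{\alpha_{N-1}}(\omega), & \omega \in S_{N-2} \sim S_{N-1} \\ \tau_1 + \tau_2 + \cdots + \tau_{N-1} + \pi_{\alpha_N}(\omega), & \omega \in S_{N-1} \end{cases} \qquad (1)$$

$$[\alpha_1 |_{\tau_1} \alpha_2 |_{\tau_2} \cdots \alpha_{N-1} |_{\tau_{N-1}} \alpha_N](\omega) = \begin{cases} \rho_{\alpha_1}(\omega), & \omega \in \Omega \sim S_1 \\ \rho_{\alpha_2}(\omega), & \omega \in S_1 \sim S_2 \\ \rho_{\alpha_3}(\omega), & \omega \in S_2 \sim S_3 \\ \quad\vdots \\ \rho_{\alpha_{N-1}}(\omega), & \omega \in S_{N-2} \sim S_{N-1} \\ \rho_{\alpha_N}(\omega), & \omega \in S_{N-1} \end{cases} \qquad (2)$$

where $S_n = \{\omega \in \Omega ;\ \tau_1 < \pi_{\alpha_1}(\omega),\ \tau_2 < \pi_{\alpha_2}(\omega),\ \ldots,\ \tau_n < \pi_{\alpha_n}(\omega)\}$ for $1 \le n \le N-1$.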
Each $S_n$ is the event consisting of the points $\omega \in \Omega$ on which none of the algorithms $\alpha_1, \alpha_2, \ldots, \alpha_n$ completes processing within each algorithm’s permitted run time limit. The derived algorithm is then defined to be the pair

$$\left( [\alpha_1 |_{\tau_1} \alpha_2 |_{\tau_2} \cdots \alpha_{N-1} |_{\tau_{N-1}} \alpha_N],\ T_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \tau_2, \ldots, \tau_{N-1}) \right).$$

$T_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \tau_2, \ldots, \tau_{N-1})(\omega)$ represents the time taken for the derived algorithm to execute when presented with the task $\omega$, and $[\alpha_1 |_{\tau_1} \alpha_2 |_{\tau_2} \cdots \alpha_{N-1} |_{\tau_{N-1}} \alpha_N](\omega)$ represents the derived algorithm’s output when presented with the task $\omega$.

We may envision an implementation of this algorithm as follows. When presented with a task $\omega \in \Omega$, a timer is started, and $\alpha_1$ is applied. If $\alpha_1$ has not completed by time $\tau_1$, $\alpha_1$ is abandoned and $\alpha_2$ is applied. If $\alpha_2$ has not completed by time $\tau_1 + \tau_2$, $\alpha_2$ is abandoned and $\alpha_3$ is applied, and so on. If $\alpha_{N-1}$ has not completed by time $\tau_1 + \tau_2 + \cdots + \tau_{N-1}$, $\alpha_{N-1}$ is abandoned and $\alpha_N$ is applied and (unlike the other algorithms) is allowed to run without time limit. $\rho_{\alpha_i}(\omega)$ is returned as output, where $\alpha_i$ is the algorithm which completed execution on the task $\omega \in \Omega$.
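The implementation envisioned above can be sketched as a small simulation. This is only an illustrative sketch, not code from the paper: it assumes each algorithm is supplied as a pair of Python callables standing in for the output map and the execution-time map, and the names `Algorithm`, `run_derived`, `fast_on_short`, and `steady` (with their time models) are hypothetical. No real timers or process cancellation are involved; the run times are read from the model, as in the definition.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence, Tuple


@dataclass(frozen=True)
class Algorithm:
    """An algorithm modelled as a pair (rho, pi): output map and execution-time map."""
    rho: Callable[[Hashable], object]   # output produced for a task
    pi: Callable[[Hashable], float]     # time needed to produce that output


def run_derived(algorithms: Sequence[Algorithm],
                taus: Sequence[float],
                task: Hashable) -> Tuple[object, float]:
    """Simulate the derived algorithm on one task.

    Algorithm n (n < N) is given a budget taus[n]; if it would need longer,
    it is abandoned after exactly that budget and the next one is tried.
    The last algorithm runs without a limit.
    Returns (output, total elapsed time).
    """
    if len(taus) != len(algorithms) - 1:
        raise ValueError("need exactly N-1 time limits for N algorithms")
    elapsed = 0.0
    # zip stops after the first N-1 algorithms because taus has N-1 entries.
    for alg, tau in zip(algorithms, taus):
        needed = alg.pi(task)
        if needed <= tau:
            return alg.rho(task), elapsed + needed
        elapsed += tau                  # budget used up, algorithm abandoned
    last = algorithms[-1]
    return last.rho(task), elapsed + last.pi(task)


# Hypothetical example: two equivalent "sorting" algorithms with made-up time models.
fast_on_short = Algorithm(rho=lambda t: tuple(sorted(t)),
                          pi=lambda t: 1.0 if len(t) < 3 else 10.0)
steady = Algorithm(rho=lambda t: tuple(sorted(t)),
                   pi=lambda t: 4.0)

print(run_derived([fast_on_short, steady], [2.0], (3, 1)))      # ((1, 3), 1.0)
print(run_derived([fast_on_short, steady], [2.0], (3, 1, 2)))   # ((1, 2, 3), 6.0)
```

In the second call the first algorithm would need 10 time units, so it is abandoned after its limit of 2 and the second algorithm adds 4, giving 6 in total.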
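The expected value (if it exists) of the random variable $T_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \tau_2, \ldots, \tau_{N-1})$ is given by the following

Theorem 1:

$$ET_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1}) = E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) + \sum_{n=1}^{N-2} E(\pi_{\alpha_{n+1}} \mid S_n \sim S_{n+1})\,P(S_n \sim S_{n+1}) + E(\pi_{\alpha_N} \mid S_{N-1})\,P(S_{N-1}) + \sum_{n=1}^{N-1} \tau_n P(S_n)$$

Proof: Recall that

$$T_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1})(\omega) = \begin{cases} \pi_{\alpha_1}(\omega), & \omega \in \Omega \sim S_1 \\ \tau_1 + \pi_{\alpha_2}(\omega), & \omega \in S_1 \sim S_2 \\ \tau_1 + \tau_2 + \pi_{\alpha_3}(\omega), & \omega \in S_2 \sim S_3 \\ \quad\vdots \\ \tau_1 + \tau_2 + \cdots + \tau_{N-2} + \pi_{\alpha_{N-1}}(\omega), & \omega \in S_{N-2} \sim S_{N-1} \\ \tau_1 + \tau_2 + \cdots + \tau_{N-1} + \pi_{\alpha_N}(\omega), & \omega \in S_{N-1} \end{cases}$$

It follows immediately that

$$\begin{aligned} ET_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1}) ={}& E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) \\ &+ E(\tau_1 + \pi_{\alpha_2} \mid S_1 \sim S_2)\,P(S_1 \sim S_2) \\ &+ E(\tau_1 + \tau_2 + \pi_{\alpha_3} \mid S_2 \sim S_3)\,P(S_2 \sim S_3) \\ &+ \cdots \\ &+ E(\tau_1 + \tau_2 + \cdots + \tau_{N-2} + \pi_{\alpha_{N-1}} \mid S_{N-2} \sim S_{N-1})\,P(S_{N-2} \sim S_{N-1}) \\ &+ E(\tau_1 + \tau_2 + \cdots + \tau_{N-1} + \pi_{\alpha_N} \mid S_{N-1})\,P(S_{N-1}) \end{aligned} \qquad (4)$$

This may be written as

$$\begin{aligned} ET_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1}) ={}& E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) \\ &+ \tau_1 P(S_1 \sim S_2) + E(\pi_{\alpha_2} \mid S_1 \sim S_2)\,P(S_1 \sim S_2) \\ &+ (\tau_1 + \tau_2)\,P(S_2 \sim S_3) + E(\pi_{\alpha_3} \mid S_2 \sim S_3)\,P(S_2 \sim S_3) \\ &+ \cdots \\ &+ (\tau_1 + \tau_2 + \cdots + \tau_{N-2})\,P(S_{N-2} \sim S_{N-1}) + E(\pi_{\alpha_{N-1}} \mid S_{N-2} \sim S_{N-1})\,P(S_{N-2} \sim S_{N-1}) \\ &+ (\tau_1 + \tau_2 + \cdots + \tau_{N-1})\,P(S_{N-1}) + E(\pi_{\alpha_N} \mid S_{N-1})\,P(S_{N-1}) \end{aligned}$$

Telescoping sums yield

$$\begin{aligned} ET_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1}) ={}& E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) + E(\pi_{\alpha_2} \mid S_1 \sim S_2)\,P(S_1 \sim S_2) + E(\pi_{\alpha_3} \mid S_2 \sim S_3)\,P(S_2 \sim S_3) \\ &+ \cdots + E(\pi_{\alpha_{N-1}} \mid S_{N-2} \sim S_{N-1})\,P(S_{N-2} \sim S_{N-1}) + E(\pi_{\alpha_N} \mid S_{N-1})\,P(S_{N-1}) \\ &+ \tau_1 \big( P(S_1 \sim S_2) + P(S_2 \sim S_3) + \cdots + P(S_{N-2} \sim S_{N-1}) + P(S_{N-1}) \big) \\ &+ \tau_2 \big( P(S_2 \sim S_3) + \cdots + P(S_{N-2} \sim S_{N-1}) + P(S_{N-1}) \big) \\ &+ \cdots + \tau_{N-1} P(S_{N-1}) \end{aligned}$$

Next,

$$\begin{aligned} ET_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1}) ={}& E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) + E(\pi_{\alpha_2} \mid S_1 \sim S_2)\,P(S_1 \sim S_2) + E(\pi_{\alpha_3} \mid S_2 \sim S_3)\,P(S_2 \sim S_3) \\ &+ \cdots + E(\pi_{\alpha_{N-1}} \mid S_{N-2} \sim S_{N-1})\,P(S_{N-2} \sim S_{N-1}) + E(\pi_{\alpha_N} \mid S_{N-1})\,P(S_{N-1}) \\ &+ \tau_1 P(S_1) + \tau_2 P(S_2) + \cdots + \tau_{N-1} P(S_{N-1}) \end{aligned}$$

whence

$$ET_{\pi_{\alpha_1}, \ldots, \pi_{\alpha_N}}(\tau_1, \ldots, \tau_{N-1}) = E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) + \sum_{n=1}^{N-2} E(\pi_{\alpha_{n+1}} \mid S_n \sim S_{n+1})\,P(S_n \sim S_{n+1}) + E(\pi_{\alpha_N} \mid S_{N-1})\,P(S_{N-1}) + \sum_{n=1}^{N-1} \tau_n P(S_n)$$

as desired.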
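As a quick sanity check of Theorem 1, the sketch below compares the mean of $T$ computed directly from its case-by-case definition with the right-hand side of the theorem. It rests on assumptions that are not in the paper: the task set is finite with every task equally likely, the execution times are made-up numbers, and the function names `mean_time_direct` and `mean_time_theorem1` are hypothetical. Each product $E(X \mid A)P(A)$ is evaluated as the average of $X$ over tasks in $A$ divided by the total number of tasks.

```python
from typing import Sequence


def mean_time_direct(times: Sequence[Sequence[float]], taus: Sequence[float]) -> float:
    """Average of T over equally likely tasks, using the definition of T.

    times[w][n] is the execution time of algorithm n+1 on task w;
    taus holds the N-1 time limits.
    """
    total = 0.0
    for row in times:
        elapsed = 0.0
        for pi_n, tau_n in zip(row, taus):    # first N-1 algorithms only
            if pi_n <= tau_n:
                total += elapsed + pi_n
                break
            elapsed += tau_n
        else:                                 # nobody finished in time: last one runs freely
            total += elapsed + row[-1]
    return total / len(times)


def mean_time_theorem1(times: Sequence[Sequence[float]], taus: Sequence[float]) -> float:
    """Right-hand side of Theorem 1 for equally likely tasks."""
    n_alg = len(taus) + 1

    def in_s(row: Sequence[float], n: int) -> bool:
        # Membership in S_n; S_0 is taken to be the whole task set.
        return all(taus[i] < row[i] for i in range(n))

    total = 0.0
    for row in times:
        # Terms E(pi_{n+1} | S_n ~ S_{n+1}) P(S_n ~ S_{n+1}), n = 0 .. N-2.
        for n in range(n_alg - 1):
            if in_s(row, n) and not in_s(row, n + 1):
                total += row[n]
        # Term E(pi_N | S_{N-1}) P(S_{N-1}).
        if in_s(row, n_alg - 1):
            total += row[-1]
        # Terms tau_n P(S_n), n = 1 .. N-1.
        for n in range(1, n_alg):
            if in_s(row, n):
                total += taus[n - 1]
    return total / len(times)


# Made-up execution times for three algorithms on four tasks.
example_times = [(1, 5, 2), (4, 1, 3), (6, 7, 1), (2, 2, 2)]
example_taus = (3, 2)
print(mean_time_direct(example_times, example_taus))     # 3.25
print(mean_time_theorem1(example_times, example_taus))   # 3.25
```

On this toy data both computations give 3.25, which matches a hand calculation: the four tasks take 1, 4, 6 and 2 time units under the derived algorithm.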
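2 CASE $N = 2$ (TWO ALGORITHMS)

In this case

$$T_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1)(\omega) = \begin{cases} \pi_{\alpha_1}(\omega), & \pi_{\alpha_1}(\omega) \le \tau_1 \\ \tau_1 + \pi_{\alpha_2}(\omega), & \tau_1 < \pi_{\alpha_1}(\omega) \end{cases}$$

$$ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1) = E(\pi_{\alpha_1} \mid \Omega \sim S_1)\,P(\Omega \sim S_1) + E(\pi_{\alpha_2} \mid S_1)\,P(S_1) + \tau_1 P(S_1)$$

and

$$S_1 = \{\omega \in \Omega ;\ \tau_1 < \pi_{\alpha_1}(\omega)\}, \qquad (11)$$

$$ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1) = E(\pi_{\alpha_1} \mid \pi_{\alpha_1} \le \tau_1)\,P(\pi_{\alpha_1} \le \tau_1) + \big( E(\pi_{\alpha_2} \mid \tau_1 < \pi_{\alpha_1}) + \tau_1 \big)\,P(\tau_1 < \pi_{\alpha_1}) \qquad (12)$$

2.1 $ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1)$ WHEN JOINT DENSITY IS $f_{\pi_{\alpha_1}, \pi_{\alpha_2}}(x, y)$

In this case,

$$ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1) = \int_0^{\tau_1} \int_0^{\infty} x\, f_{\pi_{\alpha_1}, \pi_{\alpha_2}}(x, y)\, dy\, dx + \int_{\tau_1}^{\infty} \int_0^{\infty} (y + \tau_1)\, f_{\pi_{\alpha_1}, \pi_{\alpha_2}}(x, y)\, dy\, dx \qquad (13)$$

2.11 EXAMPLE

Suppose the joint density of completion times for the two algorithms is given by ( ) ( )( , , 12 exp )f x y xy x yα απ π = − + (14)

Figure 1. $f_{\pi_{\alpha_1}, \pi_{\alpha_2}}$

Then ( ) ( )( )( ) ( ) ( ) 4 2 3 23 3 , 1 1 1 1 1 1 12 8erf 1 expET α απ π τ π τ τ τ τ τ τ= − + + + − + π (15)

Figure 2. $ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}$

2.12 EXAMPLE

Suppose the joint density of completion times for the two algorithms is given by

$$f_{\pi_{\alpha_1}, \pi_{\alpha_2}}(x, y) = 48\, x y \exp(-4x^2 - 3y^2)$$

Figure 3. $f_{\pi_{\alpha_1}, \pi_{\alpha_2}}$

Then

$$ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1) = \frac{\sqrt{\pi}}{4}\, \mathrm{erf}(2\tau_1) + \frac{\sqrt{3\pi}}{6} \exp(-4\tau_1^2) \qquad (17)$$

Figure 4. $ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}$

2.13 EXAMPLE

Suppose the joint density of completion times for the two algorithms is given by

$$f_{\pi_{\alpha_1}, \pi_{\alpha_2}}(x, y) = .022179119694367830844\; x y \left( \exp\!\big(-(x-1)^2 - (y-7)^2\big) + \exp\!\big(-(x-7)^2 - (y-1)^2\big) \right) \qquad (18)$$

Figure 5. $f_{\pi_{\alpha_1}, \pi_{\alpha_2}}$

Figure 6. $ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}$

The minimum occurs at $\tau_1 \approx 2.492$. Note that if $\tau_1 \approx 2.492$, then $ET_{\pi_{\alpha_1}, \pi_{\alpha_2}}(\tau_1) \approx 2.854$, while $E\pi_{\alpha_1} \approx 4.260$ and $E\pi_{\alpha_2} \approx 4.260$. In this case the derived algorithm has better mean execution time than either of the original algorithms. Its mean execution time is approximately 33% less than that of either of the original algorithms.

Notation: In the following, wherever $B_1, B_2, \ldots, B_M$ is found in a context requiring a Boolean expression, it means the conjunction $B_1 \wedge B_2 \wedge \cdots \wedge B_M$ of the Boolean expressions $B_1, B_2, \ldots, B_M$.

Notation: In the following, if $B$ is a Boolean expression, then $(B) \equiv$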
1 is tru 0 is fals In particular we define ( ) a x b a x x b ≤ < ≡ ≤ <⎨ 2.14 EXAMPLE Suppose the joint density of completion times for the two algorithms is given by ( ) ( )( ) ( )( ) , 1 3 4 5 8 2 4 f x y x y α απ π = ≤ < ≤ < + ≤ < ≤ < (19) Figure 7. ,f α απ π Figure 8. ,ET α απ π The minimum occurs at 1 3τ = . Note that if 1 3τ = , then ( ) , 1 4ET α απ π τ = , while 4.25E απ = and 2 4.25E απ = . In this case the derived algorithm has better mean execution time than either of the original algorithms. Its mean execution time is approximately 6% less than that of either of the original algorithms. 2.15 EXAMPLE Suppose the joint density of completion times for the two algorithms is given by ( ) ( ) , , expf x y x yα απ π = − − Figure 9. ,f α απ π Figure 10. ,ET α απ π Note that for any choice of 1τ , then ( ) , 1 1ET α απ π τ = , while 1 1E απ = and 2 1E απ = . In this case the derived algorithm has exactly the same mean execution time as do the original algorithms, so a derived algorithm would be of no benefit. 2.16 ( ) , 1ET α απ π τ DOES NOT ALWAYS EXIST If we take ( ) ( ) ( )1 2 f x y x yα α π π = , then ( ) ( ) ( ) 1 2 1 2 , 1 , 0 0 0 ,xf x y dydx y f x y dydx α α α α π π π π ∞ ∞ ∞ + + =∫ ∫ ∫ ∫ (21) so in this case ( ) , 1ET α απ π τ does not exist. 2.2 ( ) , 1ET α απ π τ WHEN JOINT DENSITY IS OF THE FORM ( ) ( ) ( 1 2 1 2 , , )f x y f x f yα α α απ π π π= ( ) ( ) ( )( ) ( ) ( ) ( ) ( 1 21 2 1 1 1 1 2 1 , 1 1 1 1 1 1 1 ET xf x dx P E E P E P α α α π π π α α α α α α α τ τ π τ π π τ π τ τ π τ π = + < + = ≤ ≤ + + < 2.3 ( ) , 1ET α απ π τ WHEN JOINT DENSITY IS OF THE FORM ( ) ( ) 1 2 2 ax by cx dy xpf x y x y a b c dα απ π + + + + + + + + If then 0 , , ,a b c d≤ ( ) ( 1 2 2 ax by cx dy )xpf x y x y a b c dα απ π + + + + = − − + + + + is a density function over ( ) . 
Accordingly, [ ) [ ), 0, 0,x y ∈ ∞ × ∞ ( ) ( ) 1 2 6 2 2 4 4 exp 1 2 2 a b c d c a b c d a b c dα απ π + + + + − + − + − − + + + + (23) 2.4 ( ) , 1ET α απ π τ WHEN JOINT DENSITY IS OF THE FORM ( ) ( ) ( )( ) ( ) ( )1 2 , 2 3 3 1 1 1 1 f x y d m n x y x yα α + + + + If , 0 , , then ( )0 ,d m n≤ c≤ ( ) c d m n + =∑∑ 1 ( ) ( ) ( )( ) ( ) ( )1 2 , 2 3 3 1 1 1 1 f x y d m n x y x yα α + + + + is a density function over ( ) [ ) [ ), 0, 0,x y ∈ ∞ × ∞ . A straightforward calculation yields ( ) ( ) ( ) ( ) ( ) ( ) ( ) 0 2 0 01 1 1, , 0, 2 1 1 1 1 1 1 2, 1, ln 1 n m n n ET d n d m n d d m n d m n c ∞ ∞ ∞ ∞ = = = = = + + + + + + + − − − +⎜ ⎟+⎝ ⎠+ ∑ ∑ ∑ ∑ (25) 2.5 ( ) , 1ET α απ π τ WHEN JOINT DENSITY IS OF THE FORM ( ) ( )( n n n n n )f x y k a x b c y d α απ π = ≤ < ≤ <∑ Theorem 2: If and ( )( ) n n n n n k b a d c − −∑ = 0 nk< for 1, 2, ,n N= , and ( ) ( )( n n n n n )f x y k a x b c y d α απ π = ≤ < ≤ <∑ , then ( ) ( ) ( )( ) ( )( )( ) ( )( ) ( )( )( ) ( )( ) , 1 1 2 n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n ET k d c k b a d c k b d c d c k b a d c k b d c a d c k b a d c α απ π ⎜ ⎟= − −⎜ ⎟⎜ ⎟ − −⎜ ⎟ + ⎜ ⎟ ⎜ ⎟+ − + −⎜ ⎟ + − − + + − + − − (26) Notation: In the following, if is a Boolean expression, then . ( )B n ( )( ) s B n ≡∑ ∑ s Proof: If and 0( )( ) n n n n n k b a d c − −∑ = nk< for 1, 2, ,n N= , then ( ) ( )( n n n n n )f x y k a x b c y d α απ π = ≤ < ≤ <∑ is a density function over . 
Now note that ( ) [ ) [ ), 0, 0,x y ∈ ∞ × ∞ ( ) ( ) ( ) ( ) 1 2 1 2 1 2 , 1 , 1 , 0 0 0 ,ET xf x y dydx y f x y dydx α α α α α α π π π π π π ∞ ∞ ∞ = + +∫ ∫ ∫ ∫ , ( )( ) ( ) ( )( ) n n n n n n n n n n x k a x b c y d dydx y k a x b c y d dydx = ≤ < ≤ < + + ≤ < ≤ < (27) ( )( ) ( )( )( ) ( ) ( ) ( ) ( ) 1 0 0 n n n n n n n n n n n n n n n n n n k x a x b c y d dydx k y a x b c y d dy k d c x a x b dx k a x b dx y dy = ≤ < ≤ < + + ≤ < ≤ < = − ≤ < + ≤ < + ∑ ∫ ∫ (28) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n k d c x a x b dx k a x b dx k d c x a x b dx k d c x a x b dx k d c x a x b dx k d c c d a x b dx = − ≤ < ⎛ ⎞+ + ⎜ ⎟+ ≤ < − = − ≤ < + − ≤ < + − ≤ < + − + + ≤ < ( )( ) ( ) ( )( ) ( ) n n n n n n n n n n n n n n k d c c d a x b dx k d c c d a x b dx + − + + ≤ < + − + + ≤ < (29) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( )( ) ( ) n n n n n n n n n n n n n n n n n n n n n n n n n n n n n ET k d c x a x b dx k d c x a x b dx k d c c d a x b dx k d c c d a x b dx k d c xdx k d = − ≤ < + − ≤ < + − + + ≤ < + − + + ≤ < = − + − ∑ ∫ ( ) ( )( ) ( )( ) ( )( ) ( ) ( ) ( )( )( ) 2 2 21 1 1 12 2 2 n n n n n n n n n n n n n n n n n n n n n n a b b n n n n n n n n n n c xdx k d c c d dx k d c c d dx k d c a k d c b a k d c c d b a k d c < < ≤ + − + + + − + + = − − + − + − + + − ( )( ) n n n c d b + + −∑ ( )( ) ( )( )( ) ( )( ) ( )( )( ) ( )( ) n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n k d c k b a d c k b d c d c k b a d c k b d c a d c k b a d c ⎜ ⎟= − −⎜ ⎟⎜ ⎟ − −⎜ ⎟ + ⎜ ⎟ ⎜ ⎟+ − + −⎜ ⎟ + − − + + − + − − (31) ( ) ( ) ( )( ) ( )( )( ) ( )( ) ( )( )( ) ( )( ) , 1 1 2 n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n n ET k d c k b a d c k b d c d c k b a d c k b d c a d c k b a d c α απ π ⎜ ⎟= − −⎜ ⎟⎜ ⎟ − −⎜ ⎟ + ⎜ ⎟ ⎜ ⎟+ − + −⎜ ⎟ + − − + + − + − − (32) In particular, ( ) ( ) ( )( ) ( )( )( ) ( )( ) ( )( )( ) ( )( ) n i n n i n n i n i i n n n a a b n n n n n n n n n n n a a b n n n n n n n n n n 
n n a a b n n n n n ET a a k d c k b a d c k b d c d c k b a d c k b d c a d c k b a d c α απ π ⎜ ⎟= − −⎜ ⎟⎜ ⎟ − −⎜ ⎟ + ⎜ ⎟ ⎜ ⎟+ − + −⎜ ⎟ + − − + + − + − − (33) ( ) ( ) ( )( ) ( )( )( ) ( )( ) ( )( )( ) ( )( ) n i n n i n n i n i i n n n a b b n n n n n n n n n n n a b b n n n n n n n n n n n n a b b n n n n n ET b b k d c k b a d c k b d c d c k b a d c k b d c a d c k b a d c α απ π ⎜ ⎟= − −⎜ ⎟⎜ ⎟ − −⎜ ⎟ + ⎜ ⎟ ⎜ ⎟+ − + −⎜ ⎟ + − − + + − + − − (34) It is straightforward to show that ( ) , 1ET α απ π τ attains a global minimum at one of the points { }1 2 1 2, , , , ,Na a a b b bN . Indeed, ( ) , 1ET α απ π τ is a continuous, piecewise quadratic function. Notice that the set of points of connection of the pieces is a subset of { }1 2 1 2, , , , ,Na a a b b bN . Each piece is either linear or is quadratic with a negative second derivative. We can thus replace each quadratic piece with a linear piece connecting the endpoints of the quadratic piece, without altering the global minimum of , 1ET α απ π τ . After replacing each quadratic piece with the appropriate linear piece, we then have a continuous piecewise linear function whose global minimum is the same as that of ( ) , 1ET α απ π τ . But of course the global minimum of a continuous piecewise linear function is attained at one of its vertices. These vertices are a subset of the set of points of connection { }1 2 1 2, , , , ,Na a a b b bN , as desired. This global minimum is given by ( ) ( ) ( ) ( ) ( ) ( ) 1 2 1 2 1 2 1 2 1 2 1 2 , 1 , 2 , , 1 , 2 , , , , , , , , ET a ET a ET a ET b ET b ET b α α α α α α α α α α α α π π π π π π π π π π π π ⎪ ⎪⎩ ⎭ ⎬ (35) This minimum can be computed in ( )2O N time. MAXIMUM LIKELIHOOD ESTIMATION Suppose ( ) ( )( n n n n n )f x y k a x b c y d α απ π = ≤ < ≤ <∑ with the following conditions: 1. for 1 , 0nk > n N≤ ≤ 2. for 1 , na b< n n N≤ ≤ 3. for 1 , nc d< n N≤ ≤ 4. The boxes for 1[ ) [ ), ,n n n n nB a b c d≡ × n N≤ ≤ are disjoint, 5. . 
( )( ) n n n n n k b a d c − −∑ = Then ,f α απ π is a joint density function. Suppose next that we have observed the performance of two equivalent algorithms α and β over a (finite) sample set sΩ ⊂ Ω . That is, for each task sω∈Ω we have observed the values ( )απ ω and ( )βπ ω representing the time that algorithms α and β actually took to process the task ω . We now present a maximum-likelihood procedure to find the “best fitting” joint density function of the form ( ) ( )( n n n n n )f x y k a x b c y d α απ π = ≤ < ≤ <∑ , subject to the five conditions above. Let ( ) ( ) ( ){ }1 1 2 2, , , , , ,P Px y x y x y be the data observed, where jx and are the durations required by algorithms jy α and β respectively, to process j sω ∈Ω , for 1 j P≤ ≤ , with sP = Ω . Our performance function is defined as ( ) ( ) ( )( ) 1 2 1 2 , 1 2 , , , , N P P N N m m n n m n n m n g k k k f x y k a x b c y d k k k α απ π ≡ = ≤ < ≤ < =∑∏ ∏ N where ( ) ( ) ( ) ( ){ }{ }1 1 2 2, , , , , , , ; ,j P P j j j jS x y x y x y x y a x b c y d≡ ∈ ≤ < ≤ < We form as usual the Lagrange multiplier equations ( )( )1 0j j j j j S b a d c λ + − − = for 1 j P≤ ≤ . 
We have immediately that ( )( ) j j j j b a d c and recalling the constraint n n n n n k b a d c = − −∑ ) we infer S Pλ λ = − = −∑ whence λ = − thus ( )( ) j j j j P b a d c Substituting into (32), we get ( )( ) , 1 1 2 n n n n n ET d c P b a d cα απ π ⎜ ⎟= − −⎜ ⎟− −⎜ ⎟ ( )( ) ( )( ) ( )( ) ( )( )( ) n n n n n n n n n n n n n n n n n n n b a d c P b a d c b d c d c P b a d c − −⎜ ⎟− −⎜ ⎟ + ⎜ ⎟ ⎜ ⎟+ − +⎜ ⎟− −⎜ ⎟ ( )( ) ( )( ) n n n n n n n n n b a d c P b a d c + − − − −∑ ( )( ) ( )( )( ) n n n n n n n n n n n b d c a d c P b a d c − −∑ − − ( )( ) ( )( ) n n n n n n n n n b a d c P b a d c − −∑ − ( )( ) n n n n a b n n n n n n P b a S b d c P b a ⎜ ⎟⎛ ⎞ ⎜ ⎟⎜ ⎟= − + ⎜ ⎟⎜ ⎟− ⎜ ⎟⎜ ⎟ + −⎝ ⎠ ⎜ ⎟−⎜ ⎟ ( )( ) ( ) 1 1 1 21 1 1 2 2 2 1 1 1 n n n n N N N n n n n n n n n n n nn n a a b b d c b d c a b a P P b a P τ τ τ = = = ≤ < < ≤ + + + + − + ∑ ∑ ∑ n 1 1 1 , 1 1 12 1 1 1 n n n n n N N N n n nn n n nn n n n a b a a b b d cS ET S S b a P b aα απ π τ τ τ τ τ τ = = = < < ≤ < < − +⎜ ⎟= − + +⎜ ⎟− −⎜ ⎟ ∑ ∑ ∑ ( )( ) ( ) ( ) 2 2 2 n n n n n n n n n n n n n nP P n nn n a b a b b d c a S d c S b a < < ≤ ≤ + + − + + + 3 CONCLUSIONS In this paper, we asked the following questions: Given two or more equivalent algorithms, is it ever possible to create a new derived algorithm whose mean execution time is less than that of all of the original algorithms? If so, how can such an algorithm be derived? By giving examples in Section 2, we have shown that the answer to the first question is “yes.” In Section 1, we gave an explicit construction of the derived algorithm. ABSTRACT In this paper we give a definition of "algorithm," "finite algorithm," "equivalent algorithms," and what it means for a single algorithm to dominate a set of algorithms. We define a derived algorithm which may have a smaller mean execution time than any of its component algorithms. We give an explicit expression for the mean execution time (when it exists) of the derived algorithm. 
We give several illustrative examples of derived algorithms with two component algorithms. We include mean execution time solutions for two-algorithm processors whose joint densities of execution times are of several general forms. For the case in which the joint density for a two-algorithm processor is a step function, we give a maximum-likelihood estimation scheme with which to analyze empirical processing time data. <|endoftext|><|startoftext|> Draft version October 22, 2018
Preprint typeset using LaTeX style emulateapj v. 08/22/09

NEW CLOSE BINARY SYSTEMS FROM THE SDSS–I (DATA RELEASE FIVE) AND THE SEARCH FOR MAGNETIC WHITE DWARFS IN CATACLYSMIC VARIABLE PROGENITOR SYSTEMS

Nicole M. Silvestri, Mara P. Lemagie, Suzanne L. Hawley, Andrew A. West, Gary D. Schmidt, James Liebert, Paula Szkody, Lee Mannikko, Michael A. Wolfe, J. C. Barentine, Howard J. Brewington, Michael Harvanek, Jurik Krzesinski, Dan Long, Donald P. Schneider, and Stephanie A. Snedden

(Received 2007 February 9)

ABSTRACT

We present the latest catalog of more than 1200 spectroscopically–selected close binary systems observed with the Sloan Digital Sky Survey through Data Release Five. We use the catalog to search for magnetic white dwarfs in cataclysmic variable progenitor systems. Given that approximately 25% of cataclysmic variables contain a magnetic white dwarf, and that our large sample of close binary systems should contain many progenitors of cataclysmic variables, it is quite surprising that we find only two potential magnetic white dwarfs in this sample. The candidate magnetic white dwarfs, if confirmed, would possess relatively low magnetic field strengths (B_WD < 10 MG) that are similar to those of intermediate–Polars but are much less than the average field strength of the current Polar population. Additional observations of these systems are required to definitively cast the white dwarfs as magnetic.
Even if these two systems prove to be the first evidence of detached magnetic white dwarf + M dwarf binaries, there is still a large disparity between the properties of the presently known cataclysmic variable population and the presumed close binary progenitors.

Subject headings: binaries: close — cataclysmic variables — stars: low-mass — stars: magnetic fields — stars: white dwarfs

1 Department of Astronomy, University of Washington, Box 351580, Seattle, WA 98195, USA; nms@astro.washington.edu, mlemagie@u.washington.edu, slh@astro.washington.edu, szkody@astro.washington.edu, leeman@u.washington.edu, maw2323@u.washington.edu.
2 Astronomy Department, 601 Campbell Hall, University of California, Berkeley, CA 94720, USA; awest@astro.berkeley.edu
3 Department of Astronomy and Steward Observatory, University of Arizona, Tucson, AZ 85721, USA; schmidt@as.arizona.edu, jliebert@as.arizona.edu.
4 Apache Point Observatory, P.O. Box 59, Sunspot, NM 88349, USA; jcb@apo.nmsu.edu, hbrewington@apo.nmsu.edu, harvanek@apo.nmsu.edu, long@apo.nmsu.edu, snedden@apo.nmsu.edu.
5 Mt. Suhora Observatory, Cracow Pedagogical University, ul. Podchorazych 2, 30-084 Cracow, Poland; jurek@apo.nmsu.edu.
6 Department of Astronomy, Penn State University, PA 16802 USA; dps@astro.psu.edu

1. INTRODUCTION

The evolution of stars in close binary systems leads to interesting stellar end-products such as cataclysmic variables (CVs), Type Ia supernovae, and helium–core white dwarfs (WDs). The period in which an evolved star ascends the asymptotic giant branch and engulfs a close companion in its evolving atmosphere, referred to as the common envelope phase, probably plays a dominant role in the evolution of these systems and is as yet poorly understood. The angular momentum of the system is believed to aid in the eventual ejection of the common envelope to reveal the remnant WD and close companion. After the common envelope has been ejected, gravitational and magnetic braking work to decrease the orbital separation of the detached system (de Kool & Ritter 1993). This orbital evolution continues through to the CV phase. The effect of the common envelope on the secondary star in these systems is another aspect of close binary evolution which is not well characterized. Plausible scenarios for the secondary companion range from accreting as much as 90% of its mass during this phase to escaping relatively unscathed from the common envelope, emerging in the same state as it entered (see Livio 1996, and references therein).

Recently, studies of close binary systems with WD companions (see for example Farihi et al. 2005b,a; Pourbaix et al. 2005; Silvestri et al. 2006) have revealed yet another puzzling property of these systems. None of the WDs in close binary systems with low–mass, main sequence companions appear to be magnetic (Liebert et al. 2005). Close, non–interacting binary systems with WD primaries are quite common and are believed to be the direct progenitors to CVs (Langer et al. 2000, and references therein). Magnetic WDs, stellar remnants with magnetic fields in excess of ∼ 1 MG, comprise only a small percentage of the isolated WD population (∼ 2%, Liebert et al. 2005). Note that the 2% magnetic WD fraction applies to magnitude–limited samples like the Palomar–Green (Liebert et al. 1988). However, the same paper notes that magnetic WDs may generally have smaller radii than non–magnetic white dwarfs, due to higher mass. In a given volume, the density of magnetic WDs may be ∼ 10% of all WDs (Liebert et al. 2003). The SDSS is also a magnitude–limited sample, so we assume a similar expected value for the close binaries. Our sample (as discussed in detail in §2) contains 1253 potential close binary systems. Therefore we would expect approximately 24 of these binaries to harbor a magnetic WD. Possible implications of the small radii for magnetic WD + main sequence pairs will be discussed in §5.
However, more than 25% of the WDs in the currently identified CV population are classified as magnetic, and many have magnetic fields in excess of 10 MG (see Wickramasinghe & Ferrario 2000).

http://arxiv.org/abs/0704.0789v1

Holberg et al. (2002) have compiled a list of 109 known WDs within 20 pc (and complete to within 13 pc) that have nearly complete information about the presence of a companion. Of the 109 WDs in their sample, 19 ± 4 have nondegenerate companions. Table 7 in Kawka et al. (2007) lists all known magnetic WDs as of June 2006. Of the magnetic WDs listed in their table, 149 have field strengths identifiable in SDSS-resolution spectra (B_WD ≥ 3 MG). If the magnetic WDs in the Kawka et al. (2007) sample are assumed to be drawn from a similar sample, then 28 ± 5.3 would be expected to have nondegenerate companions, and yet none have been detected in the Kawka et al. (2007) sample. This is nearly a 5σ deficit in magnetic WDs with nondegenerate companions.

Holberg & Magargal (2005) looked at the 2MASS JHK_s photometry of 347 WDs in the Palomar–Green sample. Of the 347 WDs, 254 had reliable infrared measurements of at least J magnitude. Of these, 59 had excesses indicative of a nondegenerate companion and another 15 showed “probable” excesses (Liebert et al. 2005). This gives a WD+dM fraction of 23% (definite excess) and 29% (including all probable excesses). If the Kawka et al. (2007) sample had the same frequency of nondegenerate companions as the Palomar–Green sample, it should contain 34 and 43 such systems, respectively. This is nearly a 6σ deficit!

This apparent lack of magnetic WDs with main sequence companions is not restricted to studies of close binaries. Low resolution spectroscopic surveys of more than 500 common proper motion binary systems discovered by Luyten et al. (1964); Luyten (1968, 1972) and Giclas et al.
(1971, 1978) revealed no magnetic WDs paired with main sequence companions in these wide pairs (Smith 1997; Silvestri et al. 2005). In addition, Schmidt et al. (2003) and Vanlandingham et al. (2005) have identified over 100 magnetic WDs in the Sloan Digital Sky Survey (SDSS, Gunn et al. 1998; York et al. 2000; Stoughton et al. 2002; Pier et al. 2003; Gunn et al. 2006). As discussed by Liebert et al. (2005), this implies essentially no overlap between the close binary and magnetic WD samples.

A new class of short–period, low accretion–rate polars (LARPS) identified by Schmidt et al. (2005b) may explain, in part, these “missing” magnetic WD systems. In these systems, the donor star has not filled its Roche lobe. The WD accretes material by capturing the stellar wind of the secondary. These CVs have accretion rates that are less than 1% of the accretion rates normally associated with CVs. The discovery of these systems sheds some light on the whereabouts of magnetic WD binaries, though as Schmidt et al. (2005b) point out, this still does not explain the apparent lack of long–period, detached magnetic WD systems. Thought to be the first detached binary with a magnetic WD, SDSS J121209.31+013627.7, a magnetic WD with a probable brown dwarf (L dwarf) companion (Schmidt et al. 2005a), has been shown to be one of these LARP systems (Debes et al. 2006; Koen & Maxted 2006; Burleigh). To date, magnetic WDs have only been found as isolated objects, in binaries with another degenerate object (WD or neutron star companion), or in CVs; none have a clearly main sequence companion.

In this study, we investigate a new large sample of close binary systems in an effort to uncover these “missing” magnetic WD binary systems. The sample comprises more than 1200 close binary systems containing a WD and main sequence star drawn from the SDSS, many of which were originally presented in Silvestri et al. (2006, hereafter, S06).
We find that only two of the WDs in these pairs appear to be magnetic. Even if confirmed, neither of these WDs has magnetic field strength comparable to those observed in the majority of magnetic (Polar) CV systems. We confirm that the current CV and close binary populations are indeed disparate and show that more work is necessary to unravel this mystery.

In §2 we introduce the catalog of close binary systems through the public SDSS Data Release Five (DR5; Adelman-McCarthy et al. 2007). We discuss our analysis techniques in §3 and we present our results in §4. Our discussion and concluding remarks are given in §5 and §6, respectively.

2. THE SDSS CLOSE BINARY CATALOG THROUGH DR5

The combined properties of the majority of close binaries in this paper are discussed in detail in Raymond et al. (2003) and S06. The S06 catalog was based on a preliminary list of spectroscopic plates released internally to the collaboration and as such does not include objects from ∼ 200 plates released with the final public Data Release Four (DR4; Adelman-McCarthy et al. 2006). The additional systems from both DR4 and DR5 do not change the overall results from the analysis performed in S06, hence no new analysis is presented here. We include this list in its entirety to complete the DR4 catalog introduced by S06 and add over 300 new systems from the now public DR5 (Adelman-McCarthy et al. 2007). This completes the catalog of close binary systems with a WD identified through SDSS–I. More close binaries are being targeted in the SDSS–II (SEGUE) survey, which will continue to increase the sample through 2008.

The list of 1253 potential close binary systems given in Table 1 includes objects from all plates released with the public DR5, thereby superseding the S06 DR4 catalog. The technique used to search for these objects is the same as described in S06.
As with that study, we do not include systems with low signal-to-noise ratios (S/N < 5) and do not search for systems with non–DA WDs. We emphasize that our sample is not complete (or bias free) due to the selection effects imposed by our detection methods and due to the sporadic targeting of these objects in the SDSS spectroscopic survey as discussed in S06. Thus, our sample represents primarily bright, DA WD + M dwarf binary systems. As evidenced by Smolčić et al. (2004), there are potentially thousands more WD + M dwarf binaries observed photometrically in the SDSS but not targeted for spectroscopy. Our catalog represents an interesting and statistically significant sampling of these systems, the properties of which can be used to test models of close binary evolution (see Politano & Weiler 2006, for example).

The list of plate numbers from which this sample has been drawn can be found at http://das.sdss.org/DR5/data/spectro/1d_23/. This plate list includes both “extra” and “special” plates. The extra plates are repeat observations of survey plates taken during normal operation. The special plates are observations for special programs (e.g. SEGUE, F stars, main sequence turnoff stars, quasar selection efficiency, etc.) that are not part of the original SDSS–I survey.

Fig. 1.— Example of an M dwarf with excess blue flux (:+dM) from Table 1. The companion is seen as little more than excess blue flux in the M dwarf spectrum. Follow-up spectroscopy to resolve the companion is necessary to rule out the presence of a magnetic WD. Note: spectrum has been boxcar smoothed with a filter size of seven.

The first four columns of Table 1 list the SDSS identifier, the plate number, fiber identification, and modified Julian date (MJD) of the observation, followed by the spectral type of the components (determined visually) where Sp1 represents the blue object and Sp2 is the red object.
Columns 6 and 7 give the J2000 coordinates (in decimal degrees) for the object. The next 15 columns give the ugriz PSF photometry (Fukugita et al. 1996; Hogg et al. 2001; Ivezić et al. 2004; Smith et al. 2002; Tucker et al. 2006), photometric uncertainties (σ_ugriz), and reddening (A_ugriz). The magnitudes are not corrected for Galactic extinction. Column 23 lists the SDSS data release in which the object was discovered as well as additional references in the literature. Additional notes for the objects are listed in column 24.

The objects identified in Table 1 as :+dM are likely M dwarfs with faint, cool WD companions. The discovery spectra for these objects reveal little more than excess blue flux at wavelengths shorter than 5000 Å, as shown in Figure 1. It is possible that some of these pairs may contain a magnetic WD; however, much higher S/N spectra are required to adequately characterize the blue component of these systems.

Similarly, the thirty-nine objects identified as WD+: or WD+:e (see Figure 8 of S06) have either some excess flux in the red or have emission at Balmer wavelengths indicative of a faint, active, low–mass or sub–stellar companion. The companion to the magnetic WD in Schmidt et al. (2005a) was first identified by emission at Hα in the SDSS discovery spectrum. Other than the emission at Hα this object had no other optical signature of a companion. We are performing followup observations using the ARC 3.5–m telescope at Apache Point Observatory to obtain radial velocities and near–infrared imaging of these objects to measure the orbital periods and categorize the probable low–mass companion’s spectral type. We have already confirmed that none of these systems contain a magnetic WD.

3. THE SEARCH FOR MAGNETIC WDS

Schmidt et al. (2003) and Vanlandingham et al. (2005) demonstrated that magnetic WDs with field strengths as low as ∼ 3 MG can be effectively measured using SDSS spectra.
Visual inspection of the systems in our sample reveals no obvious magnetic WDs in spectra with good S/N (> 10) (Lemagie et al. 2004). Most are classical WD + M dwarf close binaries as shown in Figure 2. Of interest are the lower quality spectra, where the features of the WD are less obvious because of low S/N and/or contamination by the spectral features of the close M dwarf companion. These effects make it difficult to identify small magnetic field effects on the WD absorption features. Thus, relatively low magnetic fields (B_WD < 10 MG) are not easily recognized in the combined spectrum.

Fig. 2.— A Typical WD+dM System: SDSS J140723.03+003841.7, the superposition of a DA (hydrogen atmosphere) WD and an M4 red dwarf star. Hα emission is visible in many of these systems and is a result of chromospheric activity on the surface of the M star, perhaps enhanced due to the influence of the WD. The lack of broad Zeeman absorption features in the hydrogen lines indicates that the magnetic field strength of the WD is very low (compare with Figure 4).

3.1. The Simulated Magnetic Binary Systems

Given the difficulties associated with visually identifying features in these systems, we developed a method to search for the characteristic Zeeman splitting of the DA WD absorption features that is also sensitive to low magnetic field WDs. We use a program that attempts to match absorption features in magnetic DA WD models (see Kemic 1974b,a; Schmidt et al. 2003, and references therein for details on the models) through an iterative method of smoothing and searching the stellar spectrum. To develop a robust program to search for magnetic WDs in close binaries we first tested our program on WDs of known magnetic field strength. We used the magnetic DA WDs with field strengths between 1.5 MG ≤ B_WD ≤ 30 MG from Schmidt et al. (2003) and Vanlandingham et al. (2005) as our test sample.
We then constructed model spectra at every half–MG between 1.5 MG ≤ B_WD ≤ 30 MG, each with magnetic field inclinations of 30°, 60°, and 90°. The program was able to match (using a χ² minimization) the magnetic field strength of each of the magnetic WDs to within ±5 MG of the value quoted in Schmidt et al. (2003).

We then constructed a sample of simulated SDSS spectra of magnetic binary systems. The simulated binaries were created by adding the spectra of magnetic WDs used in our initial test from Schmidt et al. (2003) and Vanlandingham et al. (2005) to the M star templates of Hawley et al. (2002). We first normalized all spectra at a wavelength of 6500 Å, and then combined them with flux ratios of 1:4 (WD:M dwarf) to 4:1 to replicate the range of flux ratios observed in the close binary sample (see Figure 3)^7. This created a sample of binaries which represent the average brightness and spectral type distribution of the majority of the systems in Table 1 (i.e. DA WDs and M0–M5 dwarfs). Figure 4 is an example of one of the simulated magnetic binary systems.

7 Note that 6500 Å is the midpoint of the SDSS combined blue and red spectra, as plotted in Figure 3. In reality the SDSS spectra extend to below 3900 Å and to nearly 10000 Å.

Fig. 3.— Comparison of simulated and observed pre–cataclysmic variable (PCV) systems. Left Hand Column: Simulated magnetic PCVs produced by adding WD spectra from Schmidt et al. (2003) to M dwarf spectra from Hawley et al. (2002) with brightness ratios as specified at 6500 Å. Right Hand Column: Observed PCVs from Silvestri et al. (2006).

Fig. 4.— A Simulated System. Top Left Panel: A 13 MG magnetic WD from Schmidt et al. (2003). Top Right Panel: Template M4 dwarf star from Hawley et al. (2002). Bottom Panel: addition of the magnetic WD and template M dwarf, assuming equal flux density at 6500 Å.
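The construction of the simulated binaries and the grid search over field strength described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the program used in this paper: the magnetic WD "model" is a flat continuum with three Gaussian components per Balmer line whose separation grows linearly with field strength, the M dwarf "template" is a smooth red continuum with one broad dip, field inclination is ignored, the flux ratio and M dwarf template are treated as known, and uniform flux uncertainties are assumed. The names `toy_wd`, `toy_m_dwarf`, `SPLIT_PER_MG` and all numerical values other than the 6500 Å normalization, the WD:dM flux ratios, and the 1.5–30 MG half–MG grid are hypothetical.

```python
import math
import random

BALMER = (6563.0, 4861.0, 4340.0)  # approximate rest wavelengths (Angstrom)
SPLIT_PER_MG = 20.0                # hypothetical linear splitting per MG (toy value)
REF_WAVE = 6500.0                  # normalization wavelength quoted in the text


def toy_wd(wave, b_mg, depth=0.1, width=15.0):
    """Flat continuum with a three-component absorption pattern at each Balmer line."""
    shifts = (-SPLIT_PER_MG * b_mg, 0.0, SPLIT_PER_MG * b_mg)
    return [
        1.0 - sum(depth * math.exp(-0.5 * ((w - c - s) / width) ** 2)
                  for c in BALMER for s in shifts)
        for w in wave
    ]


def toy_m_dwarf(wave):
    """Red continuum with one broad band-like dip (purely illustrative)."""
    return [
        (w / REF_WAVE) ** 3
        * (1.0 - 0.3 * math.exp(-0.5 * ((w - 7100.0) / 80.0) ** 2))
        for w in wave
    ]


def normalize(wave, flux, ref=REF_WAVE):
    """Scale a spectrum to unit flux at the grid point closest to `ref`."""
    i_ref = min(range(len(wave)), key=lambda i: abs(wave[i] - ref))
    return [f / flux[i_ref] for f in flux]


def composite(wave, wd_flux, dm_flux, ratio):
    """Add WD and M dwarf spectra with WD:dM flux ratio `ratio` at 6500 A."""
    wd_n = normalize(wave, wd_flux)
    dm_n = normalize(wave, dm_flux)
    return [ratio * a + b for a, b in zip(wd_n, dm_n)]


def fit_field(wave, observed, dm_flux, ratio, sigma=0.02):
    """Return (best field in MG, chi-square) over the 1.5-30 MG half-MG grid."""
    best_b, best_chi2 = None, math.inf
    for step in range(58):  # 1.5, 2.0, ..., 30.0 MG
        b_mg = 1.5 + 0.5 * step
        model = composite(wave, toy_wd(wave, b_mg), dm_flux, ratio)
        chi2 = sum(((o - m) / sigma) ** 2 for o, m in zip(observed, model))
        if chi2 < best_chi2:
            best_b, best_chi2 = b_mg, chi2
    return best_b, best_chi2


if __name__ == "__main__":
    wave = [3900.0 + 5.0 * i for i in range(1061)]  # 3900-9200 A in 5 A steps
    dm = toy_m_dwarf(wave)
    rng = random.Random(0)
    # A 13 MG toy WD combined 1:1 with the toy M dwarf, plus Gaussian noise.
    noisy = [f + rng.gauss(0.0, 0.02)
             for f in composite(wave, toy_wd(wave, 13.0), dm, 1.0)]
    print(fit_field(wave, noisy, dm, 1.0))
```

Because the toy line profiles carry none of the real model physics, the sketch says nothing about the ±5 MG accuracy quoted above; it only shows the bookkeeping of normalizing at 6500 Å, scaling by the flux ratio, and selecting the grid model with the smallest χ².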
The upper left hand panel is the SDSS spectrum of a 13 MG magnetic WD, the upper right hand panel is the spectrum of a template M4 dwarf star. The bottom panel is the addition (superposition) of the two spectra with a flux ratio of 1:1 at 6500 Å. As shown, this WD with a relatively moderate magnetic field, when combined with the spectrum of an average M dwarf, is clearly detected at the resolution of the SDSS spectra (R ∼ 1800).

3.2. Results from the Simulated Systems

We found that detecting the presence of a WD magnetic field depends most strongly on the spectral type and relative flux of the M dwarf companion. Due to the selection effects of the close binary sample (see S06 for details), the majority of the M dwarfs in these binaries have spectral sub–types between M0–M4. In SDSS spectra, early M dwarf spectral types contribute nearly as much flux in the blue portion of the spectrum (4000–7000 Å) as they do in the red (7000–10000 Å). The spectrum of the blue magnetic WD is then superimposed onto the numerous blue molecular features of the M dwarf. This makes the small absorption features stemming from the subtle influence of a weak magnetic field difficult to detect. We plot a subset of our simulated pairs to demonstrate some of these issues in Figure 5 and Figure 6.

In Figure 5 we selected four early–type template M dwarfs (WD+M0 = open squares, WD+M1 = open circles, WD+M2 = open triangles, and WD+M3 = crosses) from Hawley et al. (2002) and added them to a range of magnetic WDs from Schmidt et al. (2003) and Vanlandingham et al. (2005). The quoted value from Schmidt et al. (2003) for the magnetic field strength of each of these WDs represents the “Literature B_WD” value on the x–axis. The “Measured B_WD” is the value returned by the program. Values returned by the program that matched the literature values fall along the solid line. The dashed lines represent ±5 MG of the literature value. Figure 6 is the same except we add the same magnetic WDs to later–type M dwarf templates (WD+M4 = open squares, WD+M5 = open circles, WD+M6 = open triangles, and WD+M7 = crosses). The solid triangles represent the tests using the isolated WD spectra.

Fig. 5.— Left Hand Panel: Subset of the simulated binary systems comprised of early–M dwarfs from Hawley et al. (2002) paired with magnetic WDs and literature values from Schmidt et al. (2003) and Vanlandingham et al. (2005) with B_WD ≤ 10 MG. Right Hand Panel: Same M dwarfs from Left Panel paired with magnetic WDs with B_WD ≥ 10 MG. The measured values are from our program. In both panels, the filled triangles represent single WDs, open squares are WD+M0, open circles are WD+M1, open triangles are WD+M2, and crosses are WD+M3. The solid line has a slope of one and the dashed lines are ±5 MG. Refer to § 3.2 of the text for details.

Fig. 6.— Left Hand Panel: Subset of the simulated binary systems comprised of late–M dwarfs from Hawley et al. (2002) paired with magnetic WDs and literature values from Schmidt et al. (2003) and Vanlandingham et al. (2005) with B_WD ≤ 10 MG. Right Hand Panel: Same M dwarfs from Left Panel paired with magnetic WDs with B_WD ≥ 10 MG. The measured values are from our program. In both panels, the filled triangles represent single WDs, open squares are WD+M4, open circles are WD+M5, open triangles are WD+M6, and crosses are WD+M7. The solid line has a slope of one and the dashed lines are ±5 MG. Refer to § 3.2 of the text for details.

In both Figures 5 and 6 the program returns the value of the single WD to within ∼ ±2 MG for the large majority of the systems. The uncertainty of the fitted value and the spread in values increases for magnetic fields of 3 MG or less when the magnetic WD is paired with an M dwarf of comparable brightness. The flux minima associated with the Zeeman features for such low field strengths are just barely resolvable in high S/N spectra of isolated SDSS WDs (see Schmidt et al. 2003).
The added complexity of the M dwarf molecular features and the generally lower S/N spectra make it difficult to measure the magnetic features for low magnetic field strengths. However, WDs with magnetic fields ≥ 4 MG were easily measured at all M dwarf spectral types.

In both Figures, the largest discrepancies between the literature and measured values occur when the WD’s magnetic field is between 12 MG ≤ B_WD ≤ 18 MG; this is true when the WD is paired with both early– and late–type M dwarfs. Inspection of the model results indicates that at these field strengths, the Zeeman features overlap in wavelength with strong M dwarf molecular features, causing confusion in the identification of the feature. However, WD spectra with these and larger field strengths are quite easily recognized visually, so we are confident that no systems with ≥ 10 MG have escaped notice, though the exact value of the field strength would be more uncertain.

Fig. 7.— Here, we plot the flux ratio (WD flux/M dwarf [dM]) versus the difference between the literature value (from Schmidt et al. 2003; Vanlandingham et al. 2005) of the magnetic field strength (B_Lit) and the measured magnetic field strength (B_Mea) as determined from the WD Hα (top panel), Hβ (center panel), and Hγ (bottom panel) absorption features. Error bars are from the χ² fit. Refer to § 3.2 of the text for details.

In Figure 7, we demonstrate the effect of the relative flux ratio (WD:M dwarf [dM]) on the identification of the magnetic field strength of WDs in the simulated binary sample. The Figure gives the relative flux ratio versus the difference between the magnetic fields quoted in the literature and those returned by the program. We use the same B_WD distribution in Figure 7 as used in Figure 5 and Figure 6. The literature values (B_Lit) are from Schmidt et al. (2003) and Vanlandingham et al. (2005).
The three panels show ratios determined using Hα (top), Hβ (center), and Hγ (bottom). The program consistently returns the quoted B_WD as determined from Hβ until the flux contribution from the M dwarf is nearly double the flux contribution from the WD. The program returns the magnetic field from the Hα feature to within ±5 MG until the flux contribution from the M dwarf is nearly 1.5× the flux from the WD. The B_WD as measured by Hγ is consistently 15–25 MG larger than the B_WD value in the literature at any flux ratio. The contribution of a relatively clean spectral region near Hβ, together with the fairly strong Zeeman signal at this wavelength, makes Hβ a reliable indicator of WD magnetic field strength for binaries with flux ratios up to 1:2.

4. TWO POSSIBLE MAGNETIC WDS IN THE DR5 CLOSE BINARY SAMPLE

The method employed by S06 to split the binary system into its two component spectra through an iterative method of fitting and subtracting WD model atmospheres and template M dwarf spectra was not used on these objects. There are no obviously strong magnetic WDs in the sample, suggesting that any possibly magnetic WDs must possess relatively weak fields. The subsequent fitting and subtraction of model WDs and template M dwarfs adds noise to the spectrum which would make detection of an already weak magnetic field even more difficult. Also, we would be subtracting a non–magnetic WD model from the spectrum of a potentially magnetic WD in our attempt to improve the M dwarf template fit. This adds absorption features where none actually exist, further corrupting the WD spectrum. Given these complications, we chose to work with the original composite SDSS discovery spectra.

Fig. 8.— Two potential magnetic DA WD + M dwarf pairs as identified by our program. The tentative magnetic field strengths are 8 MG ±5 MG (top) and 3 MG +5/−3 MG (bottom) as determined from the Hα and Hβ WD absorption features.
Table 2 lists the properties of the only two close binary systems flagged by our program as containing potential magnetic WDs: SDSS J082828.18+471737.9 and SDSS J125250.03−020608.1. The first four columns are the same as for Table 1, followed by the R.A. and Decl. (J2000 coordinates). The tentative magnetic field strengths (in MG), inclination of the WD magnetic field to the line of sight (in degrees) and the spectral types of the components are listed in Columns 7–9. For each of these systems the magnetic field strength estimate is based upon a match to at least two of the three Balmer features (Hα, Hβ, and/or Hγ) to within ±5 MG of the model minima. The last six columns give the ugriz photometry and the SDSS data release for the objects. Refer to Table 1 for a full listing of photometric errors, reddening and alternate literature sources.

Figure 8 displays the spectra of these two objects, which have relatively low S/N (∼ 5 at Hα). The identification of the magnetic field strength was determined from the Hα and Hβ features in each spectrum, which upon closer inspection may show some Zeeman splitting. The best fit model for SDSS J082828.18+471737.9 has a magnetic field strength of 8 MG and an inclination of 90°, while the best fit model for SDSS J125250.03−020608.1 has a magnetic field strength of 3 MG and an inclination of 90°. Hβ appears to be distorted in both systems, indicating potential broadening by a field of a few MG; however, Hγ and Hδ would show more splitting than Hβ, yet both appear to be relatively sharp in comparison. Hβ may be affected by TiO features from the M dwarf, and there does appear to be a minor glitch in the blue portion of the spectrum, indicating difficulty with SDSS flux calibration.

5. DISCUSSION

Of the 1253 potential close binary systems in the DR5 catalog, there were 168 systems that we could not measure with our program. These include the :+dM systems and binaries with non-DA WDs.
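The flagging criterion stated in §4 for the Table 2 candidates (agreement of at least two of the three Balmer features to within ±5 MG) can be written as a small check. This is a sketch of one reading of that criterion, not the program used here: the function name `is_magnetic_candidate`, the idea of a separate field estimate per line, and the example numbers are all hypothetical.

```python
def is_magnetic_candidate(line_fields_mg, model_field_mg, tol_mg=5.0, min_lines=2):
    """Flag a system when enough Balmer lines agree with the model field strength.

    line_fields_mg maps a line name to the field (MG) inferred from that line,
    or None when the line could not be measured.
    Returns (flagged, sorted list of agreeing line names).
    """
    agreeing = sorted(
        name for name, b_mg in line_fields_mg.items()
        if b_mg is not None and abs(b_mg - model_field_mg) <= tol_mg
    )
    return len(agreeing) >= min_lines, agreeing


# Hypothetical measurements: Halpha and Hbeta agree with an 8 MG model, Hgamma does not.
flagged, lines = is_magnetic_candidate(
    {"Halpha": 9.0, "Hbeta": 6.5, "Hgamma": 27.0}, model_field_mg=8.0
)
```

The example lets Hγ disagree on purpose, since the simulations of §3.2 found the Hγ estimate to be biased high at every flux ratio; requiring two of three lines keeps such a system in the candidate list.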
We were not able to unambiguously determine if the :+dM systems have a magnetic or a non–magnetic WD, as the blue component is barely visible in most of the :+dM SDSS spectra. Until we can identify the companion, we cannot make any statement about magnetism in these objects. The :+dM cases, in which a blue component that must be a WD is seen in the spectrum but is too faint even to classify, may include (a) cases where the WD is simply very cool, but also (b) magnetic WDs of suitably warmer effective temperature but with smaller radii. These need to be reobserved in the blue with a spectrograph and telescope of large aperture. We made no attempt to measure the DB WDs because we lack viable magnetic DB WD models; however, all of the DB spectra matched well to non–magnetic DB WD models, so we believe it is unlikely that any of the WDs in these pairs are magnetic. We could not measure the pairs with DC WDs because there are no features with which we can detect a magnetic field; we therefore cannot rule out magnetism without employing polarimetry or other methods of identifying a magnetic field in these objects.

Of the remaining binary systems, we find only two that may contain WDs with weak magnetic fields. Our automatic detection methods are sensitive to magnetic fields between 3 MG ≤ B_WD ≤ 30 MG; field strengths larger than this are easily identified by visual inspection. Therefore, there is a significant shortage of close binary systems that could be the progenitors of the large Intermediate–Polar and Polar CV populations.

As mentioned in §1, Schmidt et al. (2005b) discuss six newly identified low accretion rate magnetic binary systems as being the probable progenitors to magnetic CVs. The magnetic field strengths of the WDs in these systems are fairly high, with most around 60 MG. These objects are clearly pre-Polars and provide an obvious link between post–common envelope, detached binaries and Polars.
The existence of these objects, however, only adds to the mystery. If observations of these objects are possible, then why have no detached binary systems with large magnetic field WDs been detected?

Perhaps selection effects are to blame. Schmidt et al. (2005b) discuss the various selection effects associated with targeting these pre–Polars with the SDSS. As is the case with the majority of the close binary systems, the pre–Polars were targeted by the SDSS QSO targeting pipeline (Richards et al. 2002), which accounts for the narrow range of magnetic field strengths found in these objects. In the case of significantly lower or higher magnetic field strengths, the pre–Polars resemble an ordinary WD + M dwarf binary in color–color space and are rejected by the QSO targeting algorithm. It is possible that this selection effect accounts for the lack of close binary magnetic systems targeted by the SDSS as well. Arguing against this explanation is the large number of detached close binary systems in our sample, and the fact that the pre–Polars were observed by the SDSS. It is quite surprising that a detached system with a WD magnetic field in the range required to detect these pre–Polars has not been observed, if such objects exist.

Another selection effect discussed by Liebert et al. (2005) argues that magnetic WDs, on average, are more massive than non–magnetic WDs; this implies smaller WD radii and therefore less luminous WDs. Faint, massive WDs in competition with the flux from an M star companion might go undetected in an optical survey because they are hidden by the more luminous, non–degenerate companion. This would imply an unusually small mass ratio (q = M2/M1) for the initial binary if the progenitor of the magnetic white dwarf were massive (3–8 M⊙). Thus, the magnetics may usually have been paired with an A–G star. However, the vast majority of polars and intermediate polars with strongly magnetic primaries have M dwarf companions.
Perhaps they were whittled down from more massive stars by mass transfer. The LARPS are selected for spectroscopy because of their peculiar colors, which arise because of the isolated cyclotron harmonics. As Schmidt et al. (2005b, 2007) point out, the WDs in LARPS are generally rather faint (cool) and, in one case, undetected. So the large mass/small radius selection effect would also apply to the pre–Polars which have been observed by SDSS.

6. CONCLUSIONS

We present a new sample of close binary systems through the Data Release Five of the SDSS. This catalog includes more than 1200 WD + M dwarf binary systems and represents the largest catalog of its kind to date.

We have fit magnetic DA WD models (see Schmidt et al. 2003, and references therein) to the 1100 DA WD + M dwarf close binaries in the DR5 sample. Only two have been found to potentially harbor a magnetic DA WD of low (B_WD < 10 MG) magnetic field strength. Neither of these potential magnetic WDs is a convincing case, though follow–up spectroscopy to improve the S/N, or polarimetry on these objects, should be performed to completely rule out the presence of a magnetic field.

The remaining ∼100 close binaries, comprising M dwarfs with excess blue flux (:+dM) and binaries with non–DA WDs, require other means of detecting magnetic fields. Methods that are sensitive to magnetic fields weaker than 3 MG should also be employed on this sample to detect possible Intermediate–Polar progenitors that may have escaped detection with our methods.

Even if future spectroscopic or polarimetric observations reveal the two DA WD candidates to be magnetic, their field strengths will likely prove to be quite low. A sample of two detached, low magnetic field WD binaries is not representative of the majority of known magnetic WDs in CVs, nor would it comprise an adequate progenitor population for the newly discovered magnetic pre–Polars described in Schmidt et al. (2005b).
The question of where the progenitors to magnetic CVs are remains unanswered by the current spectroscopically identified close binary population.

This work was supported by NSF Grant AST 02–05875 (NMS, SLH), a University of Washington undergraduate research grant (MPL), NSF grant AST 03–06080 (GDS), and NSF grant AST 03–07321 (JL).

Funding for the SDSS and SDSS–II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max–Planck–Institute for Astronomy (MPIA), the Max–Planck–Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

REFERENCES

Abazajian, K., et al. 2003, AJ, 126, 2081
Abazajian, K., et al. 2004, AJ, 128, 502
Abazajian, K., et al. 2005, AJ, 129, 1755
Adelman-McCarthy, J. K., et al. 2006, ApJS, 162, 38
Adelman-McCarthy, J. K., et al. 2007, ApJS, submitted
Burleigh, M. R., et al. 2006, MNRAS, accepted [astro-ph/0609366]
de Kool, M., & Ritter, H. 1993, A&A, 267, 397
Debes, J. H., López-Morales, M., Bonanos, A. Z., & Weinberger, A. J. 2006, ApJ, 647, L147
Eisenstein, D., et al. 2006, AJ, accepted [astro-ph/0606700]
Farihi, J., Becklin, E. E., & Zuckerman, B. 2005a, ApJS, 161, 394
Farihi, J., Zuckerman, B., & Becklin, E. E. 2005b, Astronomische Nachrichten, 326, 964
Fukugita, M., Ichikawa, T., Gunn, J. E., Doi, M., Shimasaku, K., & Schneider, D. P. 1996, AJ, 111, 1748
Giclas, H. L., Burnham, R., & Thomas, N. G. 1971, Lowell proper motion survey Northern Hemisphere. The G numbered stars. 8991 stars fainter than magnitude 8 with motions ≥0′′.26/year (Flagstaff, Arizona: Lowell Observatory)
Giclas, H. L., Burnham, Jr., R., & Thomas, N. G. 1978, Lowell Observatory Bulletin, 8, 89
Gunn, J. E., et al. 1998, AJ, 116, 3040
Gunn, J. E., et al. 2006, AJ, 131, 2332
Hawley, S. L., et al. 2002, AJ, 123, 3409
Hogg, D. W., Finkbeiner, D. P., Schlegel, D. J., & Gunn, J. E. 2001, AJ, 122, 2129
Holberg, J. B., & Magargal, K. 2005, in ASP Conf. Ser. 334: 14th European Workshop on White Dwarfs, ed. D. Koester & S. Moehler, 419
Holberg, J. B., Oswalt, T. D., & Sion, E. M. 2002, ApJ, 571, 512
Ivezić, Ž., et al. 2004, Astronomische Nachrichten, 325, 583
Kawka, A., Vennes, S., Schmidt, G. D., Wickramasinghe, D. T., & Koch, R. 2007, ApJ, 654, 499
Kemic, S. B. 1974a, ApJ, 193, 213
Kemic, S. B. 1974b, ApJ, 193, 213
Kleinman, S. J., et al. 2004, ApJ, 607, 426
Koen, C., & Maxted, P. F. L. 2006, MNRAS, 371, 1675
Langer, N., Deutschmann, A., Wellstein, S., & Höflich, P. 2000, A&A, 362, 1046
Lemagie, M. P., Silvestri, N. M., Hawley, S. L., Schmidt, G. D., Liebert, J., & Wolfe, M. A. 2004, in Bulletin of the American Astronomical Society, 1515
Liebert, J., Bergeron, P., & Holberg, J. B. 2003, AJ, 125, 348
Liebert, J., et al. 2005, AJ, 129, 2376
Liebert, J., et al. 1988, PASP, 100, 1302
Livio, M. 1996, in ASP Conf. Ser. 90: The Origins, Evolution, and Destinies of Binary Stars in Clusters, ed. E. F. Milone & J.-C. Mermilliod, 291
Luyten, W. J. 1968, Univ. Minnesota, Minneapolis, fasc. 1-57, 1963-81, 13, 1
Luyten, W. J. 1972, Proper Motion Survey with the 48-inch Telescope, Univ. Minnesota, 29, 1
Luyten, W. J., Anderson, J. H., & University of Minnesota Observatory. 1964, Publications of the Astronomical Observatory University of Minnesota
Pier, J. R., Munn, J. A., Hindsley, R. B., Hennessy, G. S., Kent, S. M., Lupton, R. H., & Ivezić, Ž. 2003, AJ, 125, 1559
Politano, M., & Weiler, K. P. 2006, ApJ, 641, L137
Pourbaix, D., et al. 2005, A&A, 444, 643
Raymond, S. N., et al. 2003, AJ, 125, 2621
Richards, G. T., et al. 2002, AJ, 123, 2945
Schmidt, G. D., Szkody, P., Henden, A., Anderson, S. F., Lamb, D. Q., Margon, B., & Schneider, D. P. 2007, ApJ, 654, 521
Schmidt, G. D., Szkody, P., Silvestri, N. M., Cushing, M. C., Liebert, J., & Smith, P. S. 2005a, ApJ, 630, L173
Schmidt, G. D., et al. 2003, ApJ, 595, 1101
Schmidt, G. D., et al. 2005b, ApJ, 630, 1037
Schuh, S., & Nagel, T. 2006, in ASP Conf. Ser., The 15th European Workshop on White Dwarfs, ed. R. Napiwotzki & M. Burleigh, accepted [astro-ph/0610324]
Silvestri, N. M., Hawley, S. L., & Oswalt, T. D. 2005, AJ, 129,
Silvestri, N. M., et al. 2006, AJ, 131, 1674
Smith, J. A. 1997, Ph.D. Thesis, Florida Institute of Technology
Smith, J. A., et al. 2002, AJ, 123, 2121
Smolčić, V., et al. 2004, ApJ, 615, L141
Stoughton, C., et al. 2002, AJ, 123, 485
Tucker, D. L., et al. 2006, Astronomische Nachrichten, 327, 821
van den Besselaar, E. J. M., Roelofs, G. H. A., Nelemans, G. A., Augusteijn, T., & Groot, P. J. 2005, A&A, 434, L13
Vanlandingham, K. M., et al. 2005, AJ, 130, 734
Wickramasinghe, D. T., & Ferrario, L. 2000, PASP, 112, 873
York, D. G., et al. 2000, AJ, 120, 1579

TABLE 1
The SDSS–I DR5 Catalog of Close Binary Systems.

Identifier          Plate  FiberID  MJD    Sp1+Sp2(a)  R.A.(b)  Decl.      u_psf  σ_u   A_u   g_psf  σ_g   A_g   r_psf  σ_r   A_r   i_psf  σ_i   A_i   z_psf  σ_z   A_z   Refs(c)  Notes(d)
(SDSS J)                                               (deg)    (deg)
(1)                 (2)    (3)      (4)    (5)         (6)      (7)        (8)    (9)   (10)  (11)   (12)  (13)  (14)   (15)  (16)  (17)   (18)  (19)  (20)   (21)  (22)  (23)     (24)
001029.87+003126.2  0388   545      51793  DZ:+dM      2.62448  00.52396   21.93  0.19  0.14  20.85  0.04  0.10  19.98  0.03  0.08  19.00  0.02  0.06  18.42  0.04  0.04  EDR
001726.63−002451.2  0687   153      52518  DA+dMe      4.36099  −00.41422  19.68  0.04  0.14  19.29  0.03  0.10  19.03  0.02  0.07  18.19  0.02  0.06  17.54  0.03  0.04  R03
001733.59+004030.4  0389   614      51795  DA+dM       4.38996  00.67511   22.10  0.40  0.13  20.79  0.14  0.10  19.59  0.03  0.07  18.17  0.02  0.05  17.39  0.02  0.04  EDR/R03
001749.24−000955.3  0389   112      51795  DA+dMe      4.45519  −00.16539  16.57  0.02  0.13  16.87  0.02  0.10  17.03  0.01  0.07  16.78  0.01  0.05  16.47  0.02  0.04  EDR/R03
002620.41+144409.5  0753   079      52233  DA+dMe      6.58505  14.73597   17.57  0.01  0.27  17.35  0.01  0.20  17.34  0.02  0.15  16.65  0.01  0.11  16.04  0.02  0.08  DR2

Note: Table 1 is published in its entirety in the electronic edition of the AJ. A portion is shown here for guidance regarding its form and content. ugriz photometry has not been corrected for Galactic extinction.
(a) Sp1: Spectral type of the WD, Sp2: Spectral type of the low–mass dwarf (see Silvestri et al. 2006, for details on Sp determination); e: emission detected visually.
(b) R.A. and Decl. are J2000.0 equinox.
(c) EDR: Stoughton et al. (2002); DR[1,2,3]: Abazajian et al. (2003, 2004, 2005); DR[4,5]: Adelman-McCarthy et al. (2006, 2007); R03: published in Raymond et al. (2003); K04: published in Kleinman et al. (2004); B05: published in van den Besselaar et al. (2005); Sc05: published in Schmidt et al. (2005a); S06: published in Silvestri et al. (2006); E06: published in Eisenstein et al. (2006); P05: published in Pourbaix et al. (2005); KM: published in Koen & Maxted (2006); SN: published in Schuh & Nagel (2006); da06: R. da Silva (priv. comm., 2006).
(d) low: potential low gravity (log g < 7) white dwarf.

TABLE 2
Two Potential Magnetic White Dwarf Binary Systems.

Identifier          Plate  Fiber  MJD    R.A.       Decl.      B     i      Sp1+Sp2  u      g      r      i      z      Release
(SDSS J)                                 (deg)      (deg)      (MG)  (deg)
(1)                 (2)    (3)    (4)    (5)        (6)        (7)   (8)    (9)      (10)   (11)   (12)   (13)   (14)   (15)
082828.18+471737.9  0549   338    51981  127.11742  +47.29387  8     90     DA+dM    20.41  20.35  20.33  19.58  19.02  DR1
125250.03−020608.1  0338   343    51694  193.20846  −02.10227  3     90     DA+dM    19.25  19.12  18.89  18.31  17.82  DR1

ABSTRACT

We present the latest catalog of more than 1200 spectroscopically-selected close binary systems observed with the Sloan Digital Sky Survey through Data Release Five. We use the catalog to search for magnetic white dwarfs in cataclysmic variable progenitor systems. Given that approximately 25% of cataclysmic variables contain a magnetic white dwarf, and that our large sample of close binary systems should contain many progenitors of cataclysmic variables, it is quite surprising that we find only two potential magnetic white dwarfs in this sample. The candidate magnetic white dwarfs, if confirmed, would possess relatively low magnetic field strengths (B_WD < 10 MG) that are similar to those of intermediate-Polars but are much less than the average field strength of the current Polar population. Additional observations of these systems are required to definitively cast the white dwarfs as magnetic. Even if these two systems prove to be the first evidence of detached magnetic white dwarf + M dwarf binaries, there is still a large disparity between the properties of the presently known cataclysmic variable population and the presumed close binary progenitors.
<|endoftext|><|startoftext|> arXiv:0704.0790v3 [hep-th] 15 Nov 2007

Dynamical Casimir effect for gravitons in bouncing braneworlds

Marcus Ruser∗ and Ruth Durrer†
Département de Physique Théorique, Université de Genève, 24 quai Ernest Ansermet, 1211 Genève 4, Switzerland.

We consider a two-brane system in five-dimensional anti-de Sitter space-time. We study particle creation due to the motion of the physical brane, which first approaches the second static brane (contraction) and then recedes from it (expansion). The spectrum and the energy density of the generated gravitons are calculated. We show that the massless gravitons have a blue spectrum and that their energy density satisfies the nucleosynthesis bound with very mild constraints on the parameters. We also show that the Kaluza-Klein modes cannot provide the dark matter in an anti-de Sitter braneworld. However, for natural choices of parameters, backreaction from the Kaluza-Klein gravitons may well become important. The main findings of this work have been published in the form of a Letter [R. Durrer and M. Ruser, Phys. Rev. Lett. 99, 071601 (2007), arXiv:0704.0756].

PACS numbers: 04.50.+h, 11.10.Kk, 98.80.Cq

I. INTRODUCTION

In recent times, the possibility that our observed Universe might represent a hypersurface in a higher-dimensional space-time has received considerable attention. The main motivation for this idea is the fact that string theory [1, 2], which is consistent only in ten space-time dimensions (or 11 for M–theory), allows for solutions where the standard model particles (like fermions and gauge bosons) are confined to some hypersurface, called the brane, and only the graviton can propagate in the whole space-time, the bulk [2, 3]. Since gravity is not well constrained at small distances, the dimensions normal to the brane, the extra dimensions, can be as large as 0.1 mm.
Based on this feature, Arkani-Hamed, Dimopoulos and Dvali (ADD) proposed a braneworld model where the presence of two or more flat extra dimensions can provide a solution to the hierarchy problem, the problem of the huge difference between the Planck scale and the electroweak scale [4, 5]. In 1999 Randall and Sundrum (RS) introduced a model with one extra dimension, where the bulk is a slice of five-dimensional anti-de Sitter (AdS) space. Such curved extra dimensions are also referred to as warped extra dimensions. While in the RS I model [6], with two flat branes of opposite tension at the edges of the bulk, the warping leads to an interesting solution of the hierarchy problem, it localizes four-dimensional gravity on a single positive tension brane in the RS II model [7].

Within the context of warped braneworlds, cosmological evolution, i.e., the expansion of the Universe, can be understood as the motion of the brane representing our Universe through the AdS bulk. Thereby the Lanczos-Sen-Darmois-Israel junction conditions [8, 9, 10, 11] relate the energy-momentum tensor on the brane to the extrinsic curvature and hence to the brane motion, which is described by a modified Friedmann equation. At low energy, however, the usual Friedmann equations for the expansion of the Universe are recovered [12, 13].

∗Electronic address: marcus.ruser@physics.unige.ch
†Electronic address: ruth.durrer@physics.unige.ch

Since gravity probes the extra dimension, gravitational perturbations on the brane, i.e. in our Universe, carry five-dimensional effects in the form of massive four-dimensional gravitons, the so-called Kaluza-Klein (KK) tower. Depending on the particular brane trajectory, these perturbations may be significantly amplified, leading to observable consequences, for example, a stochastic gravitational wave background. (For a review of stochastic gravitational waves see [14].)
This amplification mechanism is identical to the dynamical Casimir effect for the electromagnetic field in cavities with dynamical walls (moving mirrors); see [15, 16, 17] and references therein. In the language of quantum field theory, such an amplification corresponds to the creation of particles out of vacuum fluctuations. Hence, in the same way that a moving mirror leads to the production of photons, the brane moving through the bulk causes the creation of gravitons. Thereby, not only the usual four-dimensional graviton might be produced, but gravitons of the KK tower can also be excited. Those massive gravitons are of particular interest, since their energy density could dominate the energy density of the Universe and spoil the phenomenology if their production is sufficiently copious.

The evolution of cosmological perturbations under the influence of a moving brane has been the subject of many studies during recent years. Since one has to deal with partial differential equations and time-dependent boundary conditions, the investigation of the evolution of perturbations in the background of a moving brane is quite complicated. Analytical progress has been made based on approximations like the “near brane limit” and a slowly moving brane [18, 19, 20, 21]. The case of de Sitter or quasi-de Sitter inflation on the brane has been investigated analytically in [22, 23, 24, 25, 26]. In [25] it is demonstrated that during slow-roll inflation (modeled as a period of quasi-de Sitter expansion) the standard four-dimensional result for the amplitude of perturbations is recovered at low energies, while it is enhanced at high energies. However, most of the effort has gone into numerical simulations [27, 28, 29, 30, 31, 32, 33, 34, 35, 36], in particular in order to investigate the high-energy regime. Thereby different coordinate systems have been used for which the brane is at rest, and different numerical evolution schemes have been employed in order to solve the partial differential equation.

FIG. 1: Two branes in an AdS5 spacetime, with y denoting the fifth dimension and L the AdS curvature scale. The physical brane is on the left at the time-dependent position y_b(t). While it is approaching the static brane its scale factor is decreasing, and when it moves away from the static brane it is expanding [cf. Eq. (2.3)]. The value of the scale factor of the brane metric as a function of the extra dimension y is also indicated.

In this work we choose a different way of looking at the problem. We shall apply a formalism used to describe the dynamical Casimir effect to study the production of gravitons in braneworld cosmology. This approach and its numerical implementation offer many advantages. The most important one is that this approach deals directly with the mode couplings that appear, by means of coupling matrices. (In [19] a similar approach involving coupling matrices has been used, however only perturbatively and not in the complexity presented here.) Hence, the interaction between the four-dimensional graviton and the KK modes is not hidden within a numerical simulation but can be investigated directly, making it possible to reveal the underlying physics in a very transparent way.

We consider a five-dimensional anti-de Sitter spacetime with two branes in it: a moving positive tension brane representing our Universe and a second brane which, for definiteness, is kept at rest. This setup is depicted in Fig. 1. For this model we have previously shown that in a radiation dominated Universe, where the second, fixed brane is arbitrarily far away, no gravitons are produced [37].

The particular model which we shall consider is strongly motivated by the ekpyrotic or cyclic Universe and similar ideas [38, 39, 40, 41, 42, 43, 44, 45, 46].
In this model, roughly speaking, the hot big bang corresponds to the collision of two branes: a moving bulk brane which hits “our” brane, i.e. the observable Universe. Within such a model, it seems to be possible to address all major cosmological problems (homogeneity, origin of density perturbations, monopole problem) without invoking the paradigm of inflation. For more details see [38], but also [39] for critical comments.

One important difference between the ekpyrotic model and standard inflation is that in the latter tensor perturbations have a nearly scale invariant spectrum. The ekpyrotic model, on the other hand, predicts a strongly blue gravitational wave spectrum with spectral tilt n_T ≃ 2 [38]. This blue spectrum is a key test for the ekpyrotic scenario, since inflation always predicts a slightly red spectrum for gravitational waves. One method to detect a background of primordial gravitational waves of wavelengths comparable to the Hubble horizon today is the polarization of the cosmic microwave background. Since a strongly blue spectrum of gravitational waves is unobservably small for large length scales, the detection of gravitational waves in the cosmic microwave background polarization would falsify the ekpyrotic model [38].

Here we consider a simple specific model which is generic enough to cover the important main features of the generation and evolution of gravitational waves in the background of a moving brane whose trajectory involves a bounce. First, the physical brane moves towards the static brane; initially the motion is very slow. During this phase our Universe is contracting, i.e. the scale factor on the brane decreases, the energy density on the brane increases and the motion becomes faster.
We suppose that the evolution of the brane is driven by a radiation component on the brane, and that at some more or less close encounter of the two branes, which we call the bounce, some high-energy mechanism which we do not want to specify in any detail turns around the motion of the brane, leading to an expanding Universe. Modeling the transition from contraction to subsequent expansion in any detail would require assumptions about unknown physics. We shall therefore ignore results which depend on the details of the transition. Finally the physical brane moves away from the static brane back towards the horizon, with the expansion first fast and then becoming slower as the energy density drops.

This model is more similar to the pyrotechnic Universe of Kallosh, Kofman and Linde [39], where the observable Universe is also represented by a positive tension brane, than to the ekpyrotic model, where our brane has negative tension.

We address the following questions: What is the spectrum and energy density of the produced gravitons, the massless zero mode and the KK modes? Can the graviton production in such a brane Universe lead to limits, e.g. on the AdS curvature scale via the nucleosynthesis bound? Can the KK modes provide the dark matter or lead to stringent limits on these models? Similar results could be obtained for the free gravi-photon and gravi-scalar, i.e. when we neglect the perturbations of the brane energy-momentum tensor, which also couple to these gravity wave modes, which have spin 1 and spin 0, respectively, on the brane.

The remainder of the paper is organized as follows. After reviewing the basic equations of braneworld cosmology and tensor perturbations in Sec. II, we discuss the dynamical Casimir effect approach in Sec. III. In Sec. IV we derive expressions for the energy density and the power spectrum of gravitons. Thereby we show that, very generically, KK gravitons cannot play the role of dark matter in warped braneworlds.
This is explained by the localization of gravity on the moving brane, which we discuss in detail. Section V is devoted to the presentation and discussion of our numerical results. In Sec. VI we reproduce some of the numerical results with analytical approximations and we derive fits for the number of produced gravitons. We discuss our main results and their implications for bouncing braneworlds in Sec. VII and conclude in Sec. VIII. Some technical aspects are collected in appendices. The main and most important results of this rather long and technical paper are published in the Letter [47].

II. GRAVITONS IN MOVING BRANEWORLDS

A. A moving brane in AdS5

We consider an AdS5 spacetime. In Poincaré coordinates, the bulk metric is given by
$$ds^2 = g_{AB}\,dx^A dx^B = \frac{L^2}{y^2}\left[-dt^2 + \delta_{ij}\,dx^i dx^j + dy^2\right]. \tag{2.1}$$
The physical brane (our Universe) is located at some time-dependent position $y = y_b(t)$, while the second brane is at the fixed position $y = y_s$ (see Fig. 1). The induced metric on the physical brane is given by
$$ds^2 = \frac{L^2}{y_b^2(t)}\left[-\left(1-\left(\frac{dy_b}{dt}\right)^2\right)dt^2 + \delta_{ij}\,dx^i dx^j\right] = a^2(\eta)\left[-d\eta^2 + \delta_{ij}\,dx^i dx^j\right], \tag{2.2}$$
where
$$a(\eta) = \frac{L}{y_b(t)} \tag{2.3}$$
is the scale factor and $\eta$ denotes the conformal time of an observer on the brane,
$$d\eta = \sqrt{1-\left(\frac{dy_b}{dt}\right)^2}\,dt \equiv \gamma^{-1}dt. \tag{2.4}$$
We have introduced the brane velocity
$$v \equiv \frac{dy_b}{dt} = -\frac{LH}{\sqrt{1+L^2H^2}} \quad\text{and} \tag{2.5}$$
$$\gamma = \frac{1}{\sqrt{1-v^2}} = \sqrt{1+L^2H^2}. \tag{2.6}$$
Here $H$ is the usual Hubble parameter,
$$H \equiv \dot a/a^2 \equiv a^{-1}\mathcal{H} = -L^{-1}\gamma v, \tag{2.7}$$
and an overdot denotes the derivative with respect to conformal time $\eta$. The bulk cosmological constant $\Lambda$ is related to the curvature scale $L$ by $\Lambda = -6/L^2$. The junction conditions on the brane lead to [37, 48]
$$\kappa_5(\rho+T) = \frac{6\sqrt{1+L^2H^2}}{L}, \tag{2.8}$$
$$\kappa_5(\rho+P) = -\frac{2L\dot H}{a\sqrt{1+L^2H^2}}. \tag{2.9}$$
Here $T$ is the brane tension and $\rho$ and $P$ denote the energy density and pressure of the matter confined on the brane. Combining (2.8) and (2.9) results in
$$\dot\rho = -3Ha(\rho+P), \tag{2.10}$$
while taking the square of (2.8) leads to
$$H^2 = \frac{\kappa_5^2}{18}\,T\rho\left(1+\frac{\rho}{2T}\right) + \frac{\kappa_5^2T^2}{36} - \frac{1}{L^2}. \tag{2.11}$$
These equations form the basis of brane cosmology and have been discussed at length in the literature (for reviews see [49, 50]).
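As a short consistency check, the step from the first junction condition to the modified Friedmann equation can be written out explicitly. This is only a sketch: it assumes that Eq. (2.8) reads $\kappa_5(\rho+T) = 6\sqrt{1+L^2H^2}/L$, a reading of the prefactors that is fixed here by requiring agreement with the RS fine tuning $\kappa_5 = \kappa_4 L$ quoted below, and it should be compared with [37, 48]. Squaring and solving for $H^2$ gives

```latex
\begin{align}
\kappa_5^2(\rho+T)^2 &= \frac{36}{L^2}\left(1+L^2H^2\right), \\
H^2 &= \frac{\kappa_5^2}{36}\left(\rho^2 + 2\rho T + T^2\right) - \frac{1}{L^2} \\
    &= \frac{\kappa_5^2}{18}\,T\rho\left(1+\frac{\rho}{2T}\right)
       + \frac{\kappa_5^2T^2}{36} - \frac{1}{L^2}.
\end{align}
```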
The last equation is called the modified Friedmann equation for brane cosmology [13]. For usual matter with $\rho + P > 0$, $\rho$ decreases during expansion and at sufficiently late time $\rho \ll T$. The ordinary four-dimensional Friedmann equation is then recovered if
$$\frac{\kappa_5^2T^2}{36} = \frac{1}{L^2} \quad\text{and we set}\quad \kappa_4 = 8\pi G_4 = \frac{\kappa_5^2T}{6}. \tag{2.12}$$
Here we have neglected a possible four-dimensional cosmological constant. The first of these equations is the RS fine tuning, implying
$$\kappa_5 = \kappa_4 L. \tag{2.13}$$
Defining the string and Planck scales by
$$\kappa_5 = \frac{1}{M_5^3} = L_s^3, \qquad \kappa_4 = \frac{1}{M_{\rm Pl}^2} = L_{\rm Pl}^2, \tag{2.14}$$
respectively, the RS fine tuning condition leads to
$$\frac{L}{L_s} = \left(\frac{L_s}{L_{\rm Pl}}\right)^2. \tag{2.15}$$
As outlined in the introduction, we shall be interested mainly in a radiation dominated low-energy phase, hence in the period where
$$\rho \ll T \quad\text{and}\quad |v| \ll 1 \quad\text{so that}\quad \gamma \simeq 1, \quad d\eta \simeq dt. \tag{2.16}$$
In such a period, the solutions to the above equations are of the form
$$a(t) = \frac{|t|+t_b}{L}, \tag{2.17}$$
$$y_b(t) = \frac{L^2}{|t|+t_b}, \tag{2.18}$$
$$v(t) = -\frac{{\rm sgn}(t)\,L^2}{(|t|+t_b)^2} \simeq -HL. \tag{2.19}$$
Negative times ($t < 0$) describe a contracting phase, while positive times ($t > 0$) describe radiation dominated expansion. At $t = 0$, the scale factor exhibits a kink and the evolution equations are singular. This is the bounce, which we shall not model in detail, but we will have to introduce a cutoff in order to avoid ultraviolet divergences in the total particle number and energy density which are due to this unphysical kink. We shall show that when the kink is smoothed out at some length scale, the production of particles (KK gravitons) of masses larger than this scale is exponentially suppressed, as is expected. The (free) parameter $t_b > 0$ determines the value of the scale factor at the bounce $a_b$, i.e. the minimal interbrane distance, as well as the velocity at the bounce $v_b$,
$$a_b = a(0) = \frac{t_b}{L}, \qquad |v(0)| \equiv v_b = \frac{L^2}{t_b^2}. \tag{2.20}$$
Apparently we have to demand $t_b > L$, which implies $y_b(t) < L$.

B. Tensor perturbations in AdS5

We now consider tensor perturbations on this background.
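For concreteness, the low-energy background of Eqs. (2.17)-(2.20) can be tabulated with a few lines of Python. This is a minimal sketch added for illustration, not part of the original analysis: the function names and the default units L = 1 are assumptions made here, and the formulas are the radiation-era expressions a = (|t| + t_b)/L, y_b = L²/(|t| + t_b) and v = dy_b/dt in the limit dη ≃ dt.

```python
import math


def sgn(t):
    """Sign of t (0 at the bounce, where the trajectory has a kink)."""
    return (t > 0) - (t < 0)


def scale_factor(t, t_b, L=1.0):
    """a(t) = (|t| + t_b) / L, Eq. (2.17)."""
    return (abs(t) + t_b) / L


def brane_position(t, t_b, L=1.0):
    """y_b(t) = L / a(t) = L**2 / (|t| + t_b), Eq. (2.18)."""
    return L**2 / (abs(t) + t_b)


def brane_velocity(t, t_b, L=1.0):
    """v(t) = dy_b/dt for t != 0, Eq. (2.19)."""
    return -sgn(t) * L**2 / (abs(t) + t_b) ** 2


def hubble(t, t_b, L=1.0):
    """H = (da/dt) / a**2, using d(eta) ~ dt in the low-energy limit."""
    return sgn(t) * L / (abs(t) + t_b) ** 2


def bounce_values(t_b, L=1.0):
    """(a_b, v_b) at the bounce, Eq. (2.20)."""
    return t_b / L, L**2 / t_b**2


if __name__ == "__main__":
    t_b, L = 10.0, 1.0  # arbitrary illustrative values with t_b > L
    print("a_b, v_b =", bounce_values(t_b, L))
    for t in (-50.0, -5.0, 5.0, 50.0):
        print(t, scale_factor(t, t_b, L), brane_position(t, t_b, L),
              brane_velocity(t, t_b, L), -hubble(t, t_b, L) * L)
```

The last two printed columns agree, which is the statement v ≃ −HL of Eq. (2.19); the sign of v flips at t = 0, reflecting the kink that the text proposes to regularize with a cutoff.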
Allowing for tensor perturbations $h_{ij}(t,\mathbf{x},y)$ of the spatial three-dimensional geometry at fixed $y$, the bulk metric reads
$$ds^2 = \frac{L^2}{y^2}\left[-dt^2 + (\delta_{ij}+2h_{ij})\,dx^i dx^j + dy^2\right]. \tag{2.21}$$
Tensor modes satisfy the traceless and transverse conditions, $h^i_i = \partial_i h^i_j = 0$. These conditions imply that $h_{ij}$ has only two independent degrees of freedom, the two polarization states $\bullet = \times, +$. We decompose $h_{ij}$ into spatial Fourier modes,
$$h_{ij}(t,\mathbf{x},y) = \int\frac{d^3k}{(2\pi)^{3/2}}\sum_{\bullet=+,\times} e^{i\mathbf{k}\cdot\mathbf{x}}\,e^{\bullet}_{ij}(\mathbf{k})\,h_\bullet(t,y;\mathbf{k}), \tag{2.22}$$
where $e^{\bullet}_{ij}(\mathbf{k})$ are unitary constant transverse-traceless polarization tensors which form a basis of the two polarization states $\bullet = \times, +$. For $h_{ij}$ to be real we require
$$h^*_\bullet(t,y;\mathbf{k}) = h_\bullet(t,y;-\mathbf{k}). \tag{2.23}$$
The perturbed Einstein equations yield the equation of motion for the mode functions $h_\bullet$, which obey the Klein-Gordon equation for minimally coupled massless scalar fields in AdS5 [25, 51, 52]
$$\left[\partial_t^2 + k^2 - \partial_y^2 + \frac{3}{y}\partial_y\right]h_\bullet(t,y;\mathbf{k}) = 0. \tag{2.24}$$
In addition to the bulk equation of motion, the modes also satisfy a boundary condition at the brane coming from the second junction condition,
$$\left.\left[LH\,\partial_t h_\bullet - \sqrt{1+L^2H^2}\,\partial_y h_\bullet\right]\right|_{y_b} = \left.-\gamma\,(v\,\partial_t + \partial_y)\,h_\bullet\right|_{y_b} = \frac{\kappa_5}{2}\,a P\,\Pi^{(T)}_\bullet. \tag{2.25}$$
Here $\Pi^{(T)}_\bullet$ denotes possible anisotropic stress perturbations in the brane energy-momentum tensor. We are interested in the quantum production of free gravitons, not in the coupling of gravitational waves to matter. Therefore we shall set $\Pi^{(T)}_\bullet = 0$ in the sequel, i.e. we make the assumption that the Universe is filled with a perfect fluid. Then, (2.25) reduces to (see footnote 1)
$$\left.(v\,\partial_t + \partial_y)\,h_\bullet\right|_{y_b(t)} = 0. \tag{2.26}$$
This is not entirely correct for the evolution of gravity modes, since at late times matter on the brane is no longer a perfect fluid (e.g., free-streaming neutrinos) and anisotropic stresses develop which slightly modify the evolution of gravitational waves. We neglect this subdominant effect in our treatment. (Some of the difficulties which appear when $\Pi^{(T)}_\bullet \neq 0$ are discussed in [48].)
The wave equation (2.24) together with the boundary condition (2.26) can also be obtained by variation of the action
$$S_h = 2\,\frac{L^3}{2\kappa_5}\sum_\bullet\int dt\int d^3k\int_{y_b(t)}^{y_s}\frac{dy}{y^3}\left[|\partial_t h_\bullet|^2 - |\partial_y h_\bullet|^2 - k^2|h_\bullet|^2\right], \tag{2.27}$$
which follows from the second order perturbation of the gravitational Lagrangian. The factor 2 in the action is due to $Z_2$ symmetry. Indeed, Equation (2.26) is the only boundary condition for the perturbation amplitude $h_\bullet$ which is compatible with the variational principle $\delta S_h = 0$, except if $h_\bullet$ is constant on the brane. Since this issue is important in the following, it is discussed in more detail in Appendix A.

Footnote 1: In Equations (4) and (8) of our Letter [47] two sign mistakes have crept in.

C. Equations of motion in the late time/low energy limit

In this work we restrict ourselves to relatively late times, when
$$\rho T \gg \rho^2 \quad\text{and therefore}\quad |v| \ll 1. \tag{2.28}$$
In this limit the conformal time on the brane agrees roughly with the 5D time coordinate, $d\eta \simeq dt$, and we shall therefore not distinguish these times; we set $t = \eta$.

We want to study the quantum mechanical evolution of tensor perturbations within a canonical formulation similar to the dynamical Casimir effect for the electromagnetic field in dynamical cavities [15, 16, 17]. In order to pave the way for canonical quantization, we have to introduce a suitable set of functions allowing the expansion of the perturbation amplitude $h_\bullet$ in canonical variables. More precisely, we need a complete and orthonormal set of eigenfunctions $\phi_\alpha$ of the spatial part $-\partial_y^2 + \frac{3}{y}\partial_y = -y^3\partial_y\left[y^{-3}\partial_y\right]$ of the differential operator (2.24). The existence of such a set depends on the boundary conditions and is ensured if the problem is of Sturm-Liouville type (see, e.g., [53]). For the junction condition (2.26), such a set unfortunately does not exist, due to the time derivative. One way to proceed would be to introduce other coordinates along the lines of [54] for which the junction condition reduces to a simple Neumann boundary condition, leading to a problem of Sturm-Liouville type.
This transformation is, however, relatively complicated to implement without approximations and is the subject of future work. Here we shall proceed otherwise, harnessing the fact that we are interested in low energy effects only, i.e. in small brane velocities. Assuming that one can neglect the time derivative in the junction condition since $|v| \ll 1$, Eq. (2.25) reduces to a simple Neumann boundary condition. We shall therefore work with the boundary conditions
$$\left.\partial_y h_\bullet\right|_{y_b} = \left.\partial_y h_\bullet\right|_{y_s} = 0. \tag{2.29}$$
Then, at any time $t$ the eigenvalue problem for the spatial part of the differential operator (2.24)
$$\left[-\partial_y^2 + \frac{3}{y}\partial_y\right]\phi_\alpha(t,y) = -y^3\partial_y\left[y^{-3}\partial_y\phi_\alpha(t,y)\right] = m_\alpha^2(t)\,\phi_\alpha(t,y) \tag{2.30}$$
is of Sturm-Liouville type if we demand that the $\phi_\alpha$'s are subject to the boundary conditions (2.29). Consequently, the set of eigenfunctions $\{\phi_\alpha(t,y)\}_{\alpha=0}^{\infty}$ is complete,
$$2\sum_\alpha \phi_\alpha(t,y)\,\phi_\alpha(t,\tilde y) = \delta(y-\tilde y)\,y^3, \tag{2.31}$$
and orthonormal with respect to the inner product
$$(\phi_\alpha,\phi_\beta) = 2\int_{y_b(t)}^{y_s}\frac{dy}{y^3}\,\phi_\alpha(t,y)\,\phi_\beta(t,y) = \delta_{\alpha\beta}. \tag{2.32}$$
Note the factor 2 in front of both expressions, which is necessary in order to take the $Z_2$ symmetry properly into account.

The eigenvalues $m_\alpha(t)$ are time-dependent and discrete due to the time-dependent but finite distance between the branes, and the eigenfunctions $\phi_\alpha(t,y)$ are time-dependent in particular because of the time dependence of the boundary conditions (2.29). The case $\alpha = 0$ with $m_0 = 0$ is the zero mode, i.e. the massless four-dimensional graviton. Its general solution in accordance with the boundary conditions is just a constant with respect to the extra dimension, $\phi_0(t,y) = \phi_0(t)$, and is fully determined by the normalization condition $(\phi_0,\phi_0) = 1$:
$$\phi_0(t) = \frac{y_s\,y_b(t)}{\sqrt{y_s^2 - y_b^2(t)}}. \tag{2.33}$$
For $\alpha = i \in \{1, 2, 3, \cdots\}$ with eigenvalues $m_i > 0$, the general solution of (2.30) is a combination of the Bessel functions $J_2(m_i(t)\,y)$ and $Y_2(m_i(t)\,y)$. Their particular combination is determined by the boundary condition at the moving brane.
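The normalization of the zero mode can be checked numerically. The sketch below is an illustration added here, not the authors' code: it assumes the inner product (f, g) = 2 ∫ dy y⁻³ f g between y_b and y_s, as in Eq. (2.32), and the constant profile φ_0 = y_s y_b / sqrt(y_s² − y_b²) of Eq. (2.33); the helper names, the composite Simpson rule and the brane positions are choices made for this example.

```python
import math


def zero_mode(y_b, y_s):
    """Constant zero-mode profile phi_0 of Eq. (2.33)."""
    return y_s * y_b / math.sqrt(y_s**2 - y_b**2)


def inner_product(f, g, y_b, y_s, n=2000):
    """(f, g) = 2 * int_{y_b}^{y_s} dy y**-3 f(y) g(y), composite Simpson rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (y_s - y_b) / n
    total = 0.0
    for j in range(n + 1):
        y = y_b + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * f(y) * g(y) / y**3
    return 2.0 * total * h / 3.0  # the factor 2 accounts for the Z2 symmetry


if __name__ == "__main__":
    y_b, y_s = 0.1, 1.0  # arbitrary illustrative brane positions, y_b < y_s
    phi0 = zero_mode(y_b, y_s)
    norm = inner_product(lambda y: phi0, lambda y: phi0, y_b, y_s)
    print(f"(phi_0, phi_0) = {norm:.8f}")
```

The same `inner_product` helper could in principle be reused to test the orthonormality of the massive modes once their Bessel-function profiles are supplied.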
The remaining boundary condition at the static brane selects the possible values for the eigenvalues $m_i(t)$, the KK masses. For any three-momentum $\mathbf{k}$ these masses build up an entire tower of momenta in the $y$-direction, the fifth dimension. Explicitly, the solutions $\phi_i(t,y)$ for the KK modes read

$$\phi_i(t,y) = N_i(t)\, y^2\, C_2(m_i(t)\, y)$$ (2.34)

with

$$C_\nu(m_i y) = Y_1(m_i y_b)\, J_\nu(m_i y) - J_1(m_i y_b)\, Y_\nu(m_i y) .$$ (2.35)

The normalization reads

$$N_i(t, y_b, y_s) = \left[ \frac{1}{y_s^2\, C_2^2(m_i y_s) - \left( 2/(m_i \pi) \right)^2} \right]^{1/2} ,$$ (2.36)

where we have used that

$$C_2(m_i y_b) = \frac{2}{\pi m_i y_b} .$$ (2.37)

Footnote 2: Note that we have changed the parameterization of the solutions with respect to [37] for technical reasons. There, we also did not take into account the factor 2 related to $Z_2$ symmetry.

The normalization can be simplified further by using

$$C_2(m_i y_s) = \frac{Y_1(m_i y_b)}{Y_1(m_i y_s)}\, \frac{2}{\pi m_i y_s} ,$$ (2.38)

leading to

$$N_i = \frac{m_i \pi}{2} \left[ \frac{Y_1^2(m_i y_s)}{Y_1^2(m_i y_b) - Y_1^2(m_i y_s)} \right]^{1/2} .$$ (2.39)

Note that it is possible to have $Y_1^2(m_i y_s) - Y_1^2(m_i y_b) = 0$. But then both $Y_1^2(m_i y_s) = Y_1^2(m_i y_b) = 0$, and Eq. (2.39) has to be understood as a limit. For that reason, the expression (2.36) for the normalization is used in the numerical simulations later on. Its denominator always remains finite.

The time-dependent KK masses $\{m_i(t)\}_{i=1}^{\infty}$ are determined by the condition

$$C_1(m_i(t)\, y_s) = 0 .$$ (2.40)

Because the zeros of the cross product of the Bessel functions $J_1$ and $Y_1$ are not known analytically in closed form, the KK spectrum has to be determined by solving Eq. (2.40) numerically (footnote 3). An important quantity which we need below is the rate of change $\dot m_i / m_i$ of a KK mass, given by

$$\hat m_i \equiv \frac{\dot m_i}{m_i} = \hat y_b\, \frac{4}{m_i^2 \pi^2}\, N_i^2 ,$$ (2.41)

where the rate of change of the brane motion $\hat y_b$ is just the Hubble parameter on the brane,

$$\hat y_b(t) \equiv \frac{\dot y_b(t)}{y_b(t)} \simeq -Ha = -\frac{\dot a}{a} \equiv -\mathcal{H} .$$ (2.42)

On account of the completeness of the eigenfunctions $\phi_\alpha(t,y)$, the gravitational wave amplitude $h_\bullet(t,y;\mathbf{k})$ subject to the boundary conditions (2.29) can now be expanded as

$$h_\bullet(t,y;\mathbf{k}) = \sqrt{\frac{\kappa_5}{L^3}} \sum_{\alpha=0}^{\infty} q_{\alpha,\mathbf{k},\bullet}(t)\, \phi_\alpha(t,y) .$$
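(2.43) The coefficients $q_{\alpha,\mathbf{k},\bullet}(t)$ are canonical variables describing the time evolution of the perturbations, and the factor $\sqrt{\kappa_5/L^3}$ has been introduced in order to render the $q_{\alpha,\mathbf{k},\bullet}$'s canonically normalized. In order to satisfy (2.23) we have to impose the same condition for the canonical variables, i.e.

$$q^*_{\alpha,\mathbf{k},\bullet} = q_{\alpha,-\mathbf{k},\bullet} .$$ (2.44)

Footnote 3: Approximate expressions for the zeros can be found in [55].

One could now insert the expansion (2.43) into the wave equation (2.24), multiply it by $\phi_\beta(t,y)$ and integrate out the $y$-dependence by using the orthonormality to derive the equations of motion for the variables $q_{\alpha,\mathbf{k},\bullet}$. However, as we explain in Appendix A, a Neumann boundary condition at a moving brane is not compatible with a free wave equation. The only consistent way to implement the boundary conditions (2.29) is therefore to consider the action (2.27) of the perturbations as the starting point to derive the equations of motion for $q_{\alpha,\mathbf{k},\bullet}$. Inserting (2.43) into (2.27) leads to the canonical action

$$S = \frac{1}{2} \sum_\bullet \int dt \int d^3k\, \Big\{ \sum_\alpha \left[ |\dot q_{\alpha,\mathbf{k},\bullet}|^2 - \omega_{\alpha,k}^2 |q_{\alpha,\mathbf{k},\bullet}|^2 \right] + \sum_{\alpha\beta} \left[ M_{\alpha\beta} \left( q_{\alpha,\mathbf{k},\bullet}\, \dot q_{\beta,-\mathbf{k},\bullet} + q_{\alpha,-\mathbf{k},\bullet}\, \dot q_{\beta,\mathbf{k},\bullet} \right) + N_{\alpha\beta}\, q_{\alpha,\mathbf{k},\bullet}\, q_{\beta,-\mathbf{k},\bullet} \right] \Big\} .$$ (2.45)

We have introduced the time-dependent frequency of a graviton mode,

$$\omega_{\alpha,k}^2 = k^2 + m_\alpha^2 , \qquad k = |\mathbf{k}| ,$$ (2.46)

and the time-dependent coupling matrices

$$M_{\alpha\beta} = (\partial_t \phi_\alpha, \phi_\beta) ,$$ (2.47)

$$N_{\alpha\beta} = (\partial_t \phi_\alpha, \partial_t \phi_\beta) = \sum_\gamma M_{\alpha\gamma} M_{\beta\gamma} ,$$ (2.48)

which are given explicitly in Appendix B (see also [37]). Consequently, the equations of motion for the canonical variables are

$$\ddot q_{\alpha,\mathbf{k},\bullet} + \omega_{\alpha,k}^2\, q_{\alpha,\mathbf{k},\bullet} + \sum_\beta \left[ M_{\beta\alpha} - M_{\alpha\beta} \right] \dot q_{\beta,\mathbf{k},\bullet} + \sum_\beta \left[ \dot M_{\alpha\beta} - N_{\alpha\beta} \right] q_{\beta,\mathbf{k},\bullet} = 0 .$$ (2.49)

The motion of the brane through the bulk, i.e. the expansion of the Universe, is encoded in the time-dependent coupling matrices $M_{\alpha\beta}$, $N_{\alpha\beta}$. The mode couplings are caused by the time-dependent boundary condition $\partial_y h_\bullet(t,y)|_{y_b} = 0$, which forces the eigenfunctions $\phi_\alpha(t,y)$ to be explicitly time-dependent. In addition, the frequency $\omega_{\alpha,k}$ of a KK mode is also time-dependent since the distance between the two branes changes when the brane is in motion.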
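The second equality in Eq. (2.48) follows in one step from completeness. A sketch, assuming that $\partial_t \phi_\alpha$ can be expanded in the instantaneous eigenfunctions with respect to the inner product (2.32):

```latex
\begin{align}
\partial_t \phi_\alpha
  &= \sum_\gamma (\partial_t \phi_\alpha, \phi_\gamma)\, \phi_\gamma
   = \sum_\gamma M_{\alpha\gamma}\, \phi_\gamma , \\
N_{\alpha\beta}
  &= (\partial_t \phi_\alpha, \partial_t \phi_\beta)
   = \sum_{\gamma\delta} M_{\alpha\gamma} M_{\beta\delta}\, (\phi_\gamma, \phi_\delta)
   = \sum_\gamma M_{\alpha\gamma} M_{\beta\gamma} .
\end{align}
```

In this form only the matrix $M_{\alpha\beta}$ has to be computed; $N_{\alpha\beta}$ is then a matrix product.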
Both time-dependencies can lead to the amplification of tensor perturbations and, within a quantum theory which is developed in the next section, to graviton production from vacuum.

Because of translation invariance with respect to the directions parallel to the brane, modes with different $\mathbf{k}$ do not couple in (2.49). The three-momentum $\mathbf{k}$ enters the equation of motion for the perturbation only via the frequency $\omega_{\alpha,k}$, i.e. as a global quantity. Equation (2.49) is similar to the equation describing the time evolution of electromagnetic field modes in a three-dimensional dynamical cavity [16] and may effectively be described by a massive scalar field on a time-dependent interval [17]. For the electromagnetic field, the dynamics of the cavity, or more precisely the motion of one of its walls, leads to photon creation from vacuum fluctuations. This phenomenon is usually referred to as the dynamical Casimir effect. Inspired by this, we shall refer to the production of gravitons by the moving brane as the dynamical Casimir effect for gravitons.

D. Remarks and comments

In [37] we have already shown that in the limit where the fixed brane is sent off to infinity, $y_s \to \infty$, only the $M_{00}$ matrix element survives, with $M_{00} = -\mathcal{H}[1 + O(\epsilon)]$ and $\epsilon = y_b/y_s$. $M_{00}$ expresses the coupling of the zero mode to the brane motion. Since all other couplings disappear for $\epsilon \to 0$, all modes decouple from each other and, in addition, the canonical variables for the KK modes decouple from the brane motion itself. This has led to the result that at late times and in the limit $y_s \gg y_b$, the KK modes with non-vanishing mass evolve trivially, and only the massless zero mode is coupled to the brane motion:

$$\ddot q_{0,\mathbf{k},\bullet} + \left[ k^2 - \dot{\mathcal{H}} - \mathcal{H}^2 \right] q_{0,\mathbf{k},\bullet} = 0 .$$ (2.50)

Since $\phi_0 \propto 1/a$ [cf. Eqs.
(4.2),(4.5)] we have found in [37] that the gravitational zero mode on the brane $h_{0,\bullet}(t;\mathbf{k}) \equiv \sqrt{\kappa_5/L^3}\, q_{0,\mathbf{k},\bullet}\, \phi_0(t,y_b)$ evolves according to

$$\ddot h_{0,\bullet}(t;\mathbf{k}) + 2\mathcal{H}\, \dot h_{0,\bullet}(t;\mathbf{k}) + k^2 h_{0,\bullet}(t;\mathbf{k}) = 0 ,$$ (2.51)

which explicitly demonstrates that at low energies (late times) the homogeneous tensor perturbation equation in brane cosmology reduces to the four-dimensional tensor perturbation equation.

An important comment is in order here concerning the RS II model. In the limit $y_s \to \infty$ the fixed brane is sent off to infinity and one ends up with a single positive tension brane in AdS, i.e. the RS II model. Even though we have shown that all couplings except $M_{00}$ vanish in this limit, that does not imply that this is necessarily the case for the RS II setup. Strictly speaking, the above arguments are only valid in a two brane model with $y_s \gg 1$. Starting with the RS II model from the beginning, the coupling matrices do in general not vanish when calculated with the corresponding eigenfunctions, which can be found in, e.g., [22]. One just has to be careful when taking those limits. What the above consideration does demonstrate is that, if the couplings of the zero mode to the KK modes vanish, as in the $y_s \gg 1$ limit or in the low energy RS II model as observed in numerical simulations (see below), the standard evolution equation for the zero mode emerges automatically from five-dimensional perturbation theory.

Starting from five-dimensional perturbation theory, our formalism does imply the usual evolution equation for the four-dimensional graviton in a FLRW Universe in the limit of vanishing couplings. This serves as a very strong indication (but certainly not proof!) that the approach based on the approximation (2.29) and the expansion of the action in canonical variables, rather than the wave equation, is consistent and leads to results which should reflect the physics at low energies.
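The step from Eq. (2.50) to Eq. (2.51) can be checked in a few lines. The sketch below assumes $y_s \gg y_b$ and $y_b = L/a$, so that $\phi_0(t,y_b) \simeq L/a$, and writes $C$ for the resulting constant prefactor, a symbol introduced here only for bookkeeping:

```latex
\begin{align}
q_0 &= C\, a\, h_0 , \qquad
\ddot q_0 = C \left( \ddot a\, h_0 + 2 \dot a\, \dot h_0 + a\, \ddot h_0 \right) , \qquad
\dot{\mathcal{H}} + \mathcal{H}^2 = \frac{\ddot a}{a} , \\
0 &= \ddot q_0 + \left[ k^2 - \dot{\mathcal{H}} - \mathcal{H}^2 \right] q_0
   = C\, a \left( \ddot h_0 + 2 \mathcal{H}\, \dot h_0 + k^2 h_0 \right) .
\end{align}
```

The $\ddot a\, h_0$ term produced by differentiating $a h_0$ twice cancels against $-(\ddot a / a)\, a\, h_0$, which is why the effective mass term in Eq. (2.50) turns into the friction term $2\mathcal{H}\dot h_0$.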
As already outlined, if one expanded the wave equation (2.24) in the set of functions $\phi_\alpha$, the resulting equation of motion for the corresponding canonical variables would differ from Eq. (2.49) and could not be derived from a Lagrangian or Hamiltonian (see Appendix A). Moreover, in [30] the low energy RS II scenario has been studied numerically including the full junction condition (2.26) without approximations (see also [27]). Those numerical results show that the evolution of tensor perturbations on the brane is four-dimensional, i.e. described by Eq. (2.51) derived here analytically. Combining these observations gives us confidence that the approach used here, based on the Neumann boundary condition approximation and the action as starting point for the canonical formulation, is adequate for the study of tensor perturbations in the low energy limit. The many benefits this approach offers will become visible in the following.

III. QUANTUM GENERATION OF TENSOR PERTURBATIONS

A. Preliminary remarks

We now introduce a treatment of quantum generation of tensor perturbations. This formalism extends the method presented in [15, 16, 17] for the dynamical Casimir effect for a scalar field and the electromagnetic field to gravitational perturbations in the braneworld scenario. The following method is very general and not restricted to a particular brane motion as long as it complies with the low energy approach [cf. Eq. (2.28)]. We assume that asymptotically, i.e. for $t \to \pm\infty$, the physical brane approaches the Cauchy horizon ($y_b \to 0$), moving very slowly. Then, the coupling matrices vanish and the KK masses are constant (for $y_b$ close to zero, Eq. (2.40) reduces to $J_1(m_i y_s) = 0$):

$$\lim_{t \to \pm\infty} M_{\alpha\beta}(t) = 0 , \qquad \lim_{t \to \pm\infty} m_\alpha(t) = \text{const.} \quad \forall\, \alpha, \beta .$$ (3.1)

In this limit, the system (2.49) reduces to an infinite set of uncoupled harmonic oscillators. This allows us to introduce an unambiguous and meaningful particle concept, i.e. a notion of (massive) gravitons.
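In the asymptotic regime just described, the KK masses follow from the zeros of $J_1$. The following is a minimal sketch using only the Python standard library, with $J_1$ evaluated from Bessel's integral representation and the zeros located by bisection. The function names, the bracketing intervals and the example value $y_s = 10$ (in units of $L$) are illustrative assumptions, not the numerical method of the paper.

```python
import math


def bessel_j1(x, n=400):
    """J_1(x) from Bessel's integral (1/pi) * int_0^pi cos(tau - x sin tau) dtau.

    Evaluated with the composite Simpson rule on n (even) subintervals.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        tau = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * math.cos(tau - x * math.sin(tau))
    return total * h / (3.0 * math.pi)


def j1_zero(n, tol=1e-12):
    """n-th positive zero of J_1, bracketed in [n*pi, (n + 1/2)*pi] and bisected."""
    lo, hi = n * math.pi, (n + 0.5) * math.pi
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def asymptotic_kk_masses(y_s, count):
    """Masses m_n solving J_1(m_n * y_s) = 0, valid only for y_b -> 0."""
    return [j1_zero(n) / y_s for n in range(1, count + 1)]


if __name__ == "__main__":
    for n, m in enumerate(asymptotic_kk_masses(10.0, 5), start=1):
        print(n, m)
```

For finite $y_b$ the full condition (2.40) involves $Y_1$ as well, which this sketch does not cover.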
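As a matter of fact, in the numerical simulations, the brane motion has to be switched on and off at finite times. These times are denoted by $t_{in}$ and $t_{out}$, respectively. We introduce vacuum states with respect to times $t < t_{in} < 0$ and $t > t_{out} > 0$. In order to avoid spurious effects influencing the particle creation, we have to choose $t_{in}$ sufficiently early and $t_{out}$ sufficiently late such that the couplings are effectively zero at these times. Checking that the numerical results are independent of the choice of $t_{in}$ and $t_{out}$ guarantees that these times correspond virtually to the real asymptotic states of the brane configuration.

B. Quantization, initial and final state

Canonical quantization of the gravity wave amplitude is performed by replacing the canonical variables $q_{\alpha,\mathbf{k},\bullet}$ by the corresponding operators $\hat q_{\alpha,\mathbf{k},\bullet}$:

$$\hat h_\bullet(t,y;\mathbf{k}) = \sqrt{\frac{\kappa_5}{L^3}} \sum_\alpha \hat q_{\alpha,\mathbf{k},\bullet}(t)\, \phi_\alpha(t,y) .$$ (3.2)

Adopting the Heisenberg picture to describe the quantum time evolution, it follows that $\hat q_{\alpha,\mathbf{k},\bullet}$ satisfies the same equation (2.49) as the canonical variable $q_{\alpha,\mathbf{k},\bullet}$.

Under the assumptions outlined above, the operator $\hat q_{\alpha,\mathbf{k},\bullet}$ can be written for times $t < t_{in}$ as

$$\hat q_{\alpha,\mathbf{k},\bullet}(t < t_{in}) = \frac{1}{\sqrt{2\omega^{in}_{\alpha,k}}} \left[ \hat a^{in}_{\alpha,\mathbf{k},\bullet}\, e^{-i \omega^{in}_{\alpha,k} t} + \hat a^{in\,\dagger}_{\alpha,-\mathbf{k},\bullet}\, e^{i \omega^{in}_{\alpha,k} t} \right] ,$$ (3.3)

where we have introduced the initial-state frequency

$$\omega^{in}_{\alpha,k} \equiv \omega_{\alpha,k}(t < t_{in}) .$$ (3.4)

This expansion ensures that Eq. (2.44) is satisfied. The set of annihilation and creation operators $\{\hat a^{in}_{\alpha,\mathbf{k},\bullet}, \hat a^{in\,\dagger}_{\alpha,\mathbf{k},\bullet}\}$ corresponding to the notion of gravitons for $t < t_{in}$ is subject to the usual commutation relations

$$\left[ \hat a^{in}_{\alpha,\mathbf{k},\bullet}, \hat a^{in\,\dagger}_{\alpha',\mathbf{k}',\bullet'} \right] = \delta_{\alpha\alpha'}\, \delta_{\bullet\bullet'}\, \delta^{(3)}(\mathbf{k} - \mathbf{k}') ,$$ (3.5)

$$\left[ \hat a^{in}_{\alpha,\mathbf{k},\bullet}, \hat a^{in}_{\alpha',\mathbf{k}',\bullet'} \right] = \left[ \hat a^{in\,\dagger}_{\alpha,\mathbf{k},\bullet}, \hat a^{in\,\dagger}_{\alpha',\mathbf{k}',\bullet'} \right] = 0 .$$ (3.6)

For times $t > t_{out}$, i.e. after the motion of the brane has ceased, the operator $\hat q_{\alpha,\mathbf{k},\bullet}$ can be expanded in a similar manner,

$$\hat q_{\alpha,\mathbf{k},\bullet}(t > t_{out}) = \frac{1}{\sqrt{2\omega^{out}_{\alpha,k}}} \left[ \hat a^{out}_{\alpha,\mathbf{k},\bullet}\, e^{-i \omega^{out}_{\alpha,k} t} + \hat a^{out\,\dagger}_{\alpha,-\mathbf{k},\bullet}\, e^{i \omega^{out}_{\alpha,k} t} \right] ,$$ (3.7)

with final-state frequency

$$\omega^{out}_{\alpha,k} \equiv \omega_{\alpha,k}(t > t_{out}) .$$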
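(3.8) The annihilation and creation operators $\{\hat a^{out}_{\alpha,\mathbf{k},\bullet}, \hat a^{out\,\dagger}_{\alpha,\mathbf{k},\bullet}\}$ correspond to a meaningful definition of final-state gravitons (they are associated with positive and negative frequency solutions for $t \ge t_{out}$) and satisfy the same commutation relations as the initial-state operators.

Initial $|0, in\rangle \equiv |0, t < t_{in}\rangle$ and final $|0, out\rangle \equiv |0, t > t_{out}\rangle$ vacuum states are uniquely defined via (footnote 4)

$$\hat a^{in}_{\alpha,\mathbf{k},\bullet} |0, in\rangle = 0 , \qquad \hat a^{out}_{\alpha,\mathbf{k},\bullet} |0, out\rangle = 0 , \qquad \forall\, \alpha, \mathbf{k}, \bullet .$$ (3.9)

The operators counting the number of particles defined with respect to the initial and final vacuum state, respectively, are

$$\hat N^{in}_{\alpha,\mathbf{k},\bullet} = \hat a^{in\,\dagger}_{\alpha,\mathbf{k},\bullet}\, \hat a^{in}_{\alpha,\mathbf{k},\bullet} , \qquad \hat N^{out}_{\alpha,\mathbf{k},\bullet} = \hat a^{out\,\dagger}_{\alpha,\mathbf{k},\bullet}\, \hat a^{out}_{\alpha,\mathbf{k},\bullet} .$$ (3.10)

The number of gravitons created during the motion of the brane for each momentum $\mathbf{k}$, quantum number $\alpha$ and polarization state $\bullet$ is given by the expectation value of the number operator $\hat N^{out}_{\alpha,\mathbf{k},\bullet}$ of final-state gravitons with respect to the initial vacuum state $|0, in\rangle$:

$$\mathcal{N}^{out}_{\alpha,\mathbf{k},\bullet} = \langle 0, in| \hat N^{out}_{\alpha,\mathbf{k},\bullet} |0, in\rangle .$$ (3.11)

If the brane undergoes a non-trivial dynamics between $t_{in} < t < t_{out}$, one has $\hat a^{out}_{\alpha,\mathbf{k},\bullet} |0, in\rangle \neq 0$ in general, i.e. graviton production from vacuum fluctuations takes place.

From (2.22), the expansion (3.2) and Eqs. (3.3), (3.7) it follows that the quantized tensor perturbation with respect to the initial and final state can be written as

$$\hat h_{ij}(t < t_{in}, \mathbf{x}, y) = \sqrt{\frac{\kappa_5}{L^3}} \sum_{\bullet\alpha} \int \frac{d^3k}{(2\pi)^{3/2}} \left[ \frac{\hat a^{in}_{\alpha,\mathbf{k},\bullet}\, e^{-i \omega^{in}_{\alpha,k} t}}{\sqrt{2\omega^{in}_{\alpha,k}}}\, u^\bullet_{ij,\alpha}(t < t_{in}, \mathbf{x}, y, \mathbf{k}) + \text{h.c.} \right] ,$$ (3.12)

$$\hat h_{ij}(t > t_{out}, \mathbf{x}, y) = \sqrt{\frac{\kappa_5}{L^3}} \sum_{\bullet\alpha} \int \frac{d^3k}{(2\pi)^{3/2}} \left[ \frac{\hat a^{out}_{\alpha,\mathbf{k},\bullet}\, e^{-i \omega^{out}_{\alpha,k} t}}{\sqrt{2\omega^{out}_{\alpha,k}}}\, u^\bullet_{ij,\alpha}(t > t_{out}, \mathbf{x}, y, \mathbf{k}) + \text{h.c.} \right] .$$ (3.13)

We have introduced the basis functions

$$u^\bullet_{ij,\alpha}(t, \mathbf{x}, y, \mathbf{k}) = e^{i \mathbf{k}\cdot\mathbf{x}}\, e^\bullet_{ij}(\mathbf{k})\, \phi_\alpha(t,y) ,$$ (3.14)

which, on account of $(e^\bullet_{ij}(\mathbf{k}))^* = e^\bullet_{ij}(-\mathbf{k})$, satisfy $(u^\bullet_{ij,\alpha}(t, \mathbf{x}, y, \mathbf{k}))^* = u^\bullet_{ij,\alpha}(t, \mathbf{x}, y, -\mathbf{k})$.

Footnote 4: Note that the notations $|0, t < t_{in}\rangle$ and $|0, t > t_{out}\rangle$ do not mean that the states are time-dependent; states do not evolve in the Heisenberg picture.

C.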
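Time evolution

During the motion of the brane the time evolution of the field modes is described by the system of coupled differential equations (2.49). To account for the inter-mode couplings mediated by the coupling matrix $M_{\alpha\beta}$, the operator $\hat q_{\alpha,\mathbf{k},\bullet}$ is decomposed as

$$\hat q_{\alpha,\mathbf{k},\bullet}(t) = \sum_\beta \frac{1}{\sqrt{2\omega^{in}_{\beta,k}}} \left[ \hat a^{in}_{\beta,\mathbf{k},\bullet}\, \epsilon^{(\beta)}_{\alpha,k}(t) + \hat a^{in\,\dagger}_{\beta,-\mathbf{k},\bullet}\, \epsilon^{(\beta)*}_{\alpha,k}(t) \right] .$$ (3.15)

The complex functions $\epsilon^{(\beta)}_{\alpha,k}(t)$ also satisfy the system of coupled differential equations (2.49). With the ansatz (3.15) the quantized tensor perturbation at any time during the brane motion reads

$$\hat h_{ij}(t, \mathbf{x}, y) = \sqrt{\frac{\kappa_5}{L^3}} \sum_{\bullet\alpha\beta} \int \frac{d^3k}{(2\pi)^{3/2}} \left[ \frac{\hat a^{in}_{\beta,\mathbf{k},\bullet}}{\sqrt{2\omega^{in}_{\beta,k}}}\, \epsilon^{(\beta)}_{\alpha,k}(t)\, u^\bullet_{ij,\alpha}(t, \mathbf{x}, y, \mathbf{k}) + \text{h.c.} \right] .$$ (3.16)

Due to the time dependence of the eigenfunctions $\phi_\alpha$, the time derivative of the gravity wave amplitude contains additional mode coupling contributions. Using the completeness and orthonormality of the $\phi_\alpha$'s it is readily shown that

$$\dot{\hat h}_\bullet(t, y; \mathbf{k}) = \sqrt{\frac{\kappa_5}{L^3}} \sum_\alpha \hat p_{\alpha,-\mathbf{k},\bullet}(t)\, \phi_\alpha(t,y) ,$$ (3.17)

where

$$\hat p_{\alpha,-\mathbf{k},\bullet}(t) = \dot{\hat q}_{\alpha,\mathbf{k},\bullet}(t) + \sum_\beta M_{\beta\alpha}\, \hat q_{\beta,\mathbf{k},\bullet}(t) .$$ (3.18)

The coupling term arises from the time dependence of the mode functions $\phi_\alpha$. Accordingly, the time derivative $\dot{\hat h}_{ij}$ reads

$$\dot{\hat h}_{ij}(t, \mathbf{x}, y) = \sqrt{\frac{\kappa_5}{L^3}} \sum_{\bullet\alpha\beta} \int \frac{d^3k}{(2\pi)^{3/2}} \left[ \frac{\hat a^{in}_{\beta,\mathbf{k},\bullet}}{\sqrt{2\omega^{in}_{\beta,k}}}\, f^{(\beta)}_{\alpha,k}(t)\, u^\bullet_{ij,\alpha}(t, \mathbf{x}, y, \mathbf{k}) + \text{h.c.} \right] ,$$ (3.19)

where we have introduced the function

$$f^{(\beta)}_{\alpha,k}(t) = \dot\epsilon^{(\beta)}_{\alpha,k}(t) + \sum_\gamma M_{\gamma\alpha}(t)\, \epsilon^{(\beta)}_{\gamma,k}(t) .$$ (3.20)

By comparing Eq. (3.12) and its time derivative with Eqs. (3.16) and (3.19) at $t = t_{in}$ one can read off the initial conditions for the functions $\epsilon^{(\beta)}_{\alpha,k}$:

$$\epsilon^{(\beta)}_{\alpha,k}(t_{in}) = \delta_{\alpha\beta}\, \Theta^{in}_{\alpha,k} ,$$ (3.21)

$$\dot\epsilon^{(\beta)}_{\alpha,k}(t_{in}) = \left[ -i \omega^{in}_{\alpha,k}\, \delta_{\alpha\beta} - M_{\beta\alpha}(t_{in}) \right] \Theta^{in}_{\beta,k} ,$$ (3.22)

with phase

$$\Theta^{in}_{\alpha,k} = e^{-i \omega^{in}_{\alpha,k} t_{in}} .$$ (3.23)

The choice of this phase for the initial condition is in principle arbitrary; we could as well set $\Theta^{in}_{\alpha,k} = 1$. But with this choice, $\epsilon^{(\beta)}_{\alpha,k}(t)$ is independent of $t_{in}$ for $t < t_{in}$, and therefore it is also independent of $t_{in}$ at later times, provided we choose $t_{in}$ sufficiently early. This is especially useful for the numerical work.

D.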
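Bogoliubov transformations

The two sets of annihilation and creation operators $\{\hat a^{in}_{\alpha,\mathbf{k},\bullet}, \hat a^{in\,\dagger}_{\alpha,\mathbf{k},\bullet}\}$ and $\{\hat a^{out}_{\alpha,\mathbf{k},\bullet}, \hat a^{out\,\dagger}_{\alpha,\mathbf{k},\bullet}\}$ corresponding to the notion of initial-state and final-state gravitons are related via a Bogoliubov transformation. Matching the expression for the tensor perturbation Eq. (3.16) and its time derivative Eq. (3.19) with the final-state expression Eq. (3.13) and its corresponding time derivative at $t = t_{out}$, one finds

$$\hat a^{out}_{\beta,\mathbf{k},\bullet} = \sum_\alpha \left[ A_{\alpha\beta,k}(t_{out})\, \hat a^{in}_{\alpha,\mathbf{k},\bullet} + B^*_{\alpha\beta,k}(t_{out})\, \hat a^{in\,\dagger}_{\alpha,-\mathbf{k},\bullet} \right]$$ (3.24)

with

$$A_{\beta\alpha,k}(t_{out}) = \frac{\Theta^{out\,*}_{\alpha,k}}{2} \sqrt{\frac{\omega^{out}_{\alpha,k}}{\omega^{in}_{\beta,k}}} \left[ \epsilon^{(\beta)}_{\alpha,k}(t_{out}) + \frac{i}{\omega^{out}_{\alpha,k}}\, f^{(\beta)}_{\alpha,k}(t_{out}) \right] ,$$ (3.25)

$$B_{\beta\alpha,k}(t_{out}) = \frac{\Theta^{out}_{\alpha,k}}{2} \sqrt{\frac{\omega^{out}_{\alpha,k}}{\omega^{in}_{\beta,k}}} \left[ \epsilon^{(\beta)}_{\alpha,k}(t_{out}) - \frac{i}{\omega^{out}_{\alpha,k}}\, f^{(\beta)}_{\alpha,k}(t_{out}) \right] ,$$ (3.26)

where we shall stick to the phase $\Theta^{out}_{\alpha,k}$, defined like $\Theta^{in}_{\alpha,k}$ in (3.23), for completeness. Performing the matching at $t_{out} = t_{in}$ the Bogoliubov transformation should become trivial, i.e. the Bogoliubov coefficients are subject to vacuum initial conditions

$$A_{\alpha\beta,k}(t_{in}) = \delta_{\alpha\beta} , \qquad B_{\alpha\beta,k}(t_{in}) = 0 .$$ (3.27)

Evaluating the Bogoliubov coefficients (3.25) and (3.26) for $t_{out} = t_{in}$ by making use of the initial conditions (3.21) and (3.22) shows the consistency. Note that the Bogoliubov transformation (3.24) is not diagonal due to the inter-mode coupling. If during the motion of the brane the graviton field departs from its vacuum state, one has $B_{\alpha\beta,k}(t_{out}) \neq 0$, i.e. gravitons have been generated.

By means of Eq. (3.24) the number of generated final-state gravitons (3.11), which is the same for every polarization state, is given by

$$\mathcal{N}^{out}_{\alpha,k}(t \ge t_{out}) = \sum_{\bullet=+,\times} \langle 0, in| \hat N^{out}_{\alpha,\mathbf{k},\bullet} |0, in\rangle = 2 \sum_\beta |B_{\beta\alpha,k}(t_{out})|^2 .$$ (3.28)

Later we will sometimes interpret $t_{out}$ as a continuous variable $t_{out} \to t$ such that $\mathcal{N}^{out}_{\alpha,k} \to \mathcal{N}_{\alpha,k}(t)$, i.e. it becomes a continuous function of time. We shall call $\mathcal{N}_{\alpha,k}(t)$ the instantaneous particle number [see Appendix C 2]; however, a physical interpretation should be made with caution.

E.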
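The first order system

From the solutions of the system of differential equations (2.49) for the complex functions $\epsilon^{(\beta)}_{\alpha,k}$, the Bogoliubov coefficient $B_{\alpha\beta,k}$, and hence the number of created final-state gravitons (3.28), can now be calculated. It is however useful to introduce auxiliary functions $\xi^{(\beta)}_{\alpha,k}(t)$, $\eta^{(\beta)}_{\alpha,k}(t)$ through

$$\xi^{(\beta)}_{\alpha,k}(t) = \epsilon^{(\beta)}_{\alpha,k}(t) + \frac{i}{\omega^{in}_{\alpha,k}}\, f^{(\beta)}_{\alpha,k}(t) ,$$ (3.29)

$$\eta^{(\beta)}_{\alpha,k}(t) = \epsilon^{(\beta)}_{\alpha,k}(t) - \frac{i}{\omega^{in}_{\alpha,k}}\, f^{(\beta)}_{\alpha,k}(t) .$$ (3.30)

These are related to the Bogoliubov coefficients via

$$A_{\beta\alpha,k}(t_{out}) = \frac{\Theta^{out\,*}_{\alpha,k}}{2} \sqrt{\frac{\omega^{out}_{\alpha,k}}{\omega^{in}_{\beta,k}}} \left[ \Delta^+_{\alpha,k}(t_{out})\, \xi^{(\beta)}_{\alpha,k}(t_{out}) + \Delta^-_{\alpha,k}(t_{out})\, \eta^{(\beta)}_{\alpha,k}(t_{out}) \right] ,$$ (3.31)

$$B_{\beta\alpha,k}(t_{out}) = \frac{\Theta^{out}_{\alpha,k}}{2} \sqrt{\frac{\omega^{out}_{\alpha,k}}{\omega^{in}_{\beta,k}}} \left[ \Delta^-_{\alpha,k}(t_{out})\, \xi^{(\beta)}_{\alpha,k}(t_{out}) + \Delta^+_{\alpha,k}(t_{out})\, \eta^{(\beta)}_{\alpha,k}(t_{out}) \right] ,$$ (3.32)

where we have defined

$$\Delta^\pm_{\alpha,k}(t) = \frac{1}{2} \left[ 1 \pm \frac{\omega^{in}_{\alpha,k}}{\omega_{\alpha,k}(t)} \right] .$$ (3.33)

Using the second order differential equation for $\epsilon^{(\beta)}_{\alpha,k}$, it is readily shown that the functions $\xi^{(\beta)}_{\alpha,k}(t)$, $\eta^{(\beta)}_{\alpha,k}(t)$ satisfy the following system of first order differential equations:

$$\dot\xi^{(\beta)}_{\alpha,k}(t) = -i \left[ a^+_{\alpha\alpha,k}(t)\, \xi^{(\beta)}_{\alpha,k}(t) - a^-_{\alpha\alpha,k}(t)\, \eta^{(\beta)}_{\alpha,k}(t) \right] - \sum_\gamma \left[ c^-_{\alpha\gamma,k}(t)\, \xi^{(\beta)}_{\gamma,k}(t) + c^+_{\alpha\gamma,k}(t)\, \eta^{(\beta)}_{\gamma,k}(t) \right] ,$$ (3.34)

$$\dot\eta^{(\beta)}_{\alpha,k}(t) = -i \left[ a^-_{\alpha\alpha,k}(t)\, \xi^{(\beta)}_{\alpha,k}(t) - a^+_{\alpha\alpha,k}(t)\, \eta^{(\beta)}_{\alpha,k}(t) \right] - \sum_\gamma \left[ c^+_{\alpha\gamma,k}(t)\, \xi^{(\beta)}_{\gamma,k}(t) + c^-_{\alpha\gamma,k}(t)\, \eta^{(\beta)}_{\gamma,k}(t) \right] ,$$ (3.35)

with

$$a^\pm_{\alpha\alpha,k}(t) = \frac{\omega^{in}_{\alpha,k}}{2} \left\{ 1 \pm \left[ \frac{\omega_{\alpha,k}(t)}{\omega^{in}_{\alpha,k}} \right]^2 \right\} ,$$ (3.36)

$$c^\pm_{\gamma\alpha,k}(t) = \frac{1}{2} \left[ M_{\alpha\gamma}(t) \pm \frac{\omega^{in}_{\alpha,k}}{\omega^{in}_{\gamma,k}}\, M_{\gamma\alpha}(t) \right] .$$ (3.37)

The vacuum initial conditions (3.27) entail the initial conditions

$$\xi^{(\beta)}_{\alpha,k}(t_{in}) = 2\, \delta_{\alpha\beta}\, \Theta^{in}_{\alpha,k} , \qquad \eta^{(\beta)}_{\alpha,k}(t_{in}) = 0 .$$ (3.38)

With the aid of Eq. (3.32), the coefficient $B_{\alpha\beta,k}(t_{out})$, and therefore the number of produced gravitons, can be directly deduced from the solutions to this system of coupled first order differential equations, which can be solved using standard numerics.

In the next section we will show how interesting observables like the power spectrum and the energy density of the amplified gravitational waves are expressed in terms of the number of created gravitons. The system (3.34, 3.35) of coupled differential equations forms the basis of our numerical simulations. Details of the applied numerics are collected in Appendix D.

IV. POWER SPECTRUM, ENERGY DENSITY AND LOCALIZATION OF GRAVITY

A.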
Perturbations on the brane

By solving the system of coupled differential equations formed by Eqs. (3.34) and (3.35), the time evolution of the quantized tensor perturbation $\hat h_{ij}(t, \mathbf{x}, y)$ can be completely reconstructed at any position $y$ in the bulk. Accessible to observations is the imprint which the perturbations leave on the brane, i.e. in our Universe. Of particular interest is therefore the part of the tensor perturbation which resides on the brane. It is given by evaluating Eq. (2.22) at the brane position $y = y_b$ (see also [36]):

$$\hat h_{ij}(t, \mathbf{x}, y_b) = \int \frac{d^3k}{(2\pi)^{3/2}} \sum_{\bullet=+,\times} e^{i \mathbf{k}\cdot\mathbf{x}}\, e^\bullet_{ij}(\mathbf{k})\, \hat h_\bullet(t, y_b, \mathbf{k}) .$$ (4.1)

The motion of the brane (expansion of the Universe) enters this expression via the eigenfunctions $\phi_\alpha(t, y_b(t))$. We shall take (4.1) as the starting point to define observables on the brane.

The zero-mode function $\phi_0(t)$ [cf. Eq. (2.33)] does not depend on the extra dimension $y$. Using Eq. (2.37), one reads off from Eq. (2.34) that the eigenfunctions on the brane $\phi_\alpha(t, y_b)$ are

$$\phi_\alpha(t, y_b) = y_b\, \mathcal{Y}_\alpha(y_b) = \frac{L}{a}\, \mathcal{Y}_\alpha(a) ,$$ (4.2)

where we have defined

$$\mathcal{Y}_0(a) = \sqrt{\frac{y_s^2}{y_s^2 - y_b^2}}$$ (4.3)

and

$$\mathcal{Y}_n(a) = \sqrt{\frac{Y_1^2(m_n y_s)}{Y_1^2(m_n y_b) - Y_1^2(m_n y_s)}}$$ (4.4)

for the zero mode and the KK modes, respectively. One is immediately confronted with an interesting observation: the function $\mathcal{Y}_\alpha(a)$ behaves differently with the expansion of the Universe for the zero mode $\alpha = 0$ and the KK modes $\alpha = n$. This is evident in particular in the asymptotic regime $y_s \gg y_b$, i.e. $y_b \to 0$ ($|t|, a \to \infty$) where, exploiting the asymptotics of $Y_1$ (see [55]), one finds

$$\mathcal{Y}_0(a) \simeq 1 , \qquad \mathcal{Y}_n(a) \simeq \frac{L}{a}\, \frac{\pi m_n}{2}\, |Y_1(m_n y_s)| .$$ (4.5)

Ergo, $\mathcal{Y}_0$ is constant while $\mathcal{Y}_n$ decays with the expansion of the Universe as $1/a$. For large $n$ one can approximate $m_n \simeq n\pi/y_s$ and $Y_1(m_n y_s) \simeq Y_1(n\pi) \simeq (1/\pi)\sqrt{2/n}$ [55], so that

$$\mathcal{Y}_n(a) \simeq \frac{L\, m_n}{2a} \sqrt{\frac{2}{n}} , \qquad \mathcal{Y}_n^2(a) \simeq \frac{\pi L^2 m_n}{2\, y_s\, a^2} .$$ (4.6)

In summary, the amplitude of the KK modes on the brane decreases faster with the expansion of the Universe than the amplitude of the zero mode.
This leads to interesting consequences for the observable power spectrum and energy density and has a clear physical interpretation: it manifests the localization of usual gravity on the brane. As we shall show below, KK gravitons, which are traces of the five-dimensional nature of gravity, escape rapidly from the brane.

B. Power spectrum

We define the power spectrum $\mathcal{P}(k)$ of gravitational waves on the brane as in four-dimensional cosmology by using the restriction of the tensor amplitude to the brane position (4.1):

$$\frac{(2\pi)^3}{k^3}\, \mathcal{P}(k)\, \delta^{(3)}(\mathbf{k} - \mathbf{k}') = \sum_{\bullet=\times,+} \left\langle 0, in \left| \hat h_\bullet(t, y_b; \mathbf{k})\, \hat h^\dagger_\bullet(t, y_b; \mathbf{k}') \right| 0, in \right\rangle ,$$ (4.7)

i.e. we consider the expectation value of the field operator $\hat h_\bullet$ with respect to the initial vacuum state at the position of the brane $y = y_b(t)$. In order to get a physically meaningful power spectrum, averaging over several oscillations of the gravitational wave amplitude has to be performed. Equation (4.7) describes the observable power spectrum imprinted in our Universe by the four-dimensional spin-2 graviton component of the five-dimensional tensor perturbation.

The explicit calculation of the expectation value, involving a "renormalization" of a divergent contribution, is carried out in detail in Appendix C 2. The final result reads

$$\mathcal{P}(k) = \frac{1}{a^2}\, \frac{k^3}{(2\pi)^3}\, \frac{\kappa_5}{L} \sum_\alpha \mathcal{R}_{\alpha,k}(t)\, \mathcal{Y}_\alpha^2(a) .$$ (4.8)

The function $\mathcal{R}_{\alpha,k}(t)$ can be expressed in terms of the Bogoliubov coefficients (3.25) and (3.26) if one considers $t_{out}$ as a continuous variable $t$:

$$\mathcal{R}_{\alpha,k}(t) = \frac{\mathcal{N}_{\alpha,k}(t) + \mathcal{O}^{N}_{\alpha,k}(t)}{\omega_{\alpha,k}(t)} .$$ (4.9)

$\mathcal{N}_{\alpha,k}(t)$ is the instantaneous particle number [cf. Appendix C 1] and the function $\mathcal{O}^{N}_{\alpha,k}(t)$ is defined in Eq. (C9). It is important to recall that $\mathcal{N}_{\alpha,k}(t)$ can in general not be interpreted as a physical particle number. For example, zero modes with wave numbers such that $kt < 1$ cannot be considered as particles. They have not performed several oscillations, and their energy density cannot be defined in a meaningful way.
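To make the instantaneous particle number concrete, the following toy integration solves the first order system (3.34), (3.35) for a single uncoupled mode and evaluates Eqs. (3.31), (3.32) and (3.28). It is a sketch under strong assumptions: all couplings $M_{\alpha\beta}$ are set to zero, the frequency profile is a hypothetical Gaussian bump rather than a brane trajectory, and a fixed-step Runge-Kutta scheme stands in for the numerics of Appendix D, which is not reproduced here. All function names are illustrative.

```python
import cmath
import math


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for a list-valued state y."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def evolve_single_mode(omega, t_in, t_out, steps=20000):
    """Integrate (xi, eta) for one mode with M = 0, cf. Eqs. (3.34)-(3.38)."""
    w_in = omega(t_in)

    def rhs(t, y):
        xi, eta = y
        ratio2 = (omega(t) / w_in) ** 2
        a_plus = 0.5 * w_in * (1.0 + ratio2)    # Eq. (3.36), upper sign
        a_minus = 0.5 * w_in * (1.0 - ratio2)   # Eq. (3.36), lower sign
        return [-1j * (a_plus * xi - a_minus * eta),
                -1j * (a_minus * xi - a_plus * eta)]

    y = [2.0 * cmath.exp(-1j * w_in * t_in), 0j]  # Eq. (3.38)
    h = (t_out - t_in) / steps
    for i in range(steps):
        y = rk4_step(rhs, t_in + i * h, y, h)
    return y


def bogoliubov(xi, eta, w_in, w_out, t_out):
    """Single-mode version of Eqs. (3.31)-(3.33)."""
    theta = cmath.exp(-1j * w_out * t_out)
    pref = 0.5 * math.sqrt(w_out / w_in)
    d_plus = 0.5 * (1.0 + w_in / w_out)
    d_minus = 0.5 * (1.0 - w_in / w_out)
    A = theta.conjugate() * pref * (d_plus * xi + d_minus * eta)
    B = theta * pref * (d_minus * xi + d_plus * eta)
    return A, B


def particle_number(omega, t_in=-10.0, t_out=10.0):
    """Return (A, B, N) with N = 2 |B|^2, the factor 2 counting polarizations."""
    xi, eta = evolve_single_mode(omega, t_in, t_out)
    A, B = bogoliubov(xi, eta, omega(t_in), omega(t_out), t_out)
    return A, B, 2.0 * abs(B) ** 2


if __name__ == "__main__":
    bump = lambda t: 1.0 + 0.5 * math.exp(-t * t)  # hypothetical profile
    A, B, N = particle_number(bump)
    print("N =", N, " |A|^2 - |B|^2 =", abs(A) ** 2 - abs(B) ** 2)
```

For a single mode with real frequency the combination $|A|^2 - |B|^2$ is conserved, which provides a simple check on the integration; a constant frequency leaves $\eta = 0$ and therefore produces no particles.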
Equivalently, expressed in terms of the complex functions $\epsilon^{(\beta)}_{\alpha,k}$, one finds

$$\mathcal{R}_{\alpha,k}(t) = \sum_\beta \frac{|\epsilon^{(\beta)}_{\alpha,k}(t)|^2}{\omega^{in}_{\beta,k}} - \frac{1}{\omega_{\alpha,k}(t)} + \mathcal{O}^{\epsilon}_{\alpha,k}(t) ,$$ (4.10)

with $\mathcal{O}^{\epsilon}_{\alpha,k}$ given in Eq. (C10). Equation (4.8) together with (4.9) or (4.10) holds at all times.

If one is interested in the power spectrum at early times $kt \ll 1$, it is not sufficient to take only the instantaneous particle number $\mathcal{N}_{\alpha,k}(t)$ in Eq. (4.9) into account. This is due to the fact that even if the mode functions $\epsilon^{(\beta)}_{\alpha,k}$ are already oscillating, the coupling matrix entering the Bogoliubov coefficients might still undergo a non-trivial time dependence [cf. Eq. (6.16)]. In the next section we shall show explicitly that, in a radiation dominated bounce, particle creation, especially of the zero mode, only stops on sub-Hubble times, $kt > 1$, even if the mode functions are plane waves right after the bounce [cf., e.g., Figs. 6, 7, 9]. Therefore, in order to determine the perturbation spectrum of the zero mode, one has to make use of the full expression (4.10) and may not use (4.11), given below.

At late times, $kt \gg 1$ ($t \ge t_{out}$), when the brane moves slowly, the couplings $M_{\alpha\beta}$ go to zero and particle creation has come to an end, neither of the functions $\mathcal{O}^{N}_{\alpha,k}$ and $\mathcal{O}^{\epsilon}_{\alpha,k}$ contributes to the observable power spectrum after averaging over several oscillations. Furthermore, the instantaneous particle number then equals the (physically meaningful) number of created final-state gravitons $\mathcal{N}^{out}_{\alpha,k}$, and the KK masses are constant. Consequently, the observable power spectrum at late times takes the form

$$\mathcal{P}(k, t \ge t_{out}) = \frac{\kappa_4}{a^2}\, \frac{k^3}{(2\pi)^3} \sum_\alpha \frac{\mathcal{N}^{out}_{\alpha,k}}{\omega^{out}_{\alpha,k}}\, \mathcal{Y}_\alpha^2(a) ,$$ (4.11)

where we have used that $\kappa_5/L = \kappa_4$. Its dependence on the wave number $k$ is completely determined by the spectral behavior of the number of created gravitons $\mathcal{N}^{out}_{\alpha,k}$.

It is useful to decompose the power spectrum into its zero-mode and KK contributions:

$$\mathcal{P} = \mathcal{P}_0 + \mathcal{P}_{KK} .$$ (4.12)

In the late time regime, using Eqs.
(4.11) and (4.5), the zero-mode power spectrum reads

$$\mathcal{P}_0(k, t \ge t_{out}) = \frac{\kappa_4}{a^2}\, \frac{k^2}{(2\pi)^3}\, \mathcal{N}^{out}_{0,k} .$$ (4.13)

As expected for a usual four-dimensional tensor perturbation (massless graviton), on sub-Hubble scales the power spectrum decreases with the expansion of the Universe as $1/a^2$. In contrast, the KK-mode power spectrum for late times, given by

$$\mathcal{P}_{KK}(k, t \ge t_{out}) = \frac{\kappa_4}{a^4}\, \frac{k^3}{(2\pi)^3}\, \frac{\pi^2 L^2}{4} \sum_n \frac{\mathcal{N}^{out}_{n,k}}{\omega^{out}_{n,k}}\, m_n^2\, Y_1^2(m_n y_s) ,$$ (4.14)

decreases as $1/a^4$, i.e. faster than $\mathcal{P}_0$ by a factor $1/a^2$. The gravity wave power spectrum at late times is therefore dominated by the zero-mode power spectrum and looks four-dimensional. Contributions to it arising from five-dimensional effects are scaled away rapidly as the Universe expands, due to the $1/a^4$ behavior of $\mathcal{P}_{KK}$. In the limit of large masses $m_n y_s \gg 1$, $n \gg 1$ and for wave numbers $k \ll m_n$ such that $\omega_{n,k} \simeq m_n$, the late-time KK-mode power spectrum can be approximated by

$$\mathcal{P}_{KK}(k, t \ge t_{out}) \simeq \frac{\kappa_4 L^2}{a^4}\, \frac{k^3}{16 \pi^2 y_s} \sum_n \mathcal{N}^{out}_{n,k} ,$$ (4.15)

where we have inserted Eq. (4.6) for $\mathcal{Y}_n^2(a)$.

Note that the formal summations over the particle number might be ill defined if the brane trajectory contains unphysical features like discontinuities in the velocity. An appropriate regularization is then necessary, for example, by introducing a physically motivated cutoff.

C. Energy density

For a usual four-dimensional tensor perturbation $h_{\mu\nu}$ on a background metric $g_{\mu\nu}$, an associated effective energy momentum tensor can be defined unambiguously by (see, e.g., [14, 56])

$$T_{\mu\nu} = \frac{1}{\kappa_4} \left\langle h_{\alpha\beta\|\mu}\, h^{\alpha\beta}{}_{\|\nu} \right\rangle ,$$ (4.16)

where the bracket stands for averaging over several periods of the wave and "$\|$" denotes the covariant derivative with respect to the unperturbed background metric. The energy density of gravity waves is the 00-component of the effective energy momentum tensor. We shall use the same effective energy momentum tensor to calculate the energy density corresponding to the four-dimensional spin-2 graviton component of the five-dimensional tensor perturbation on the brane, i.e. for the perturbation $h_{ij}(t, \mathbf{x}, y_b)$ given by Eq.
(4.1). For this it is important to remember that in our low energy approach, and in particular at the very late times for which we want to calculate the energy density, the conformal time $\eta$ on the brane is identical to the conformal bulk time $t$. The energy density of four-dimensional spin-2 gravitons on the brane produced during the brane motion is then given by [see also [36]]

$$\rho = \frac{1}{\kappa_4 a^2} \left\langle \left\langle 0, in \right| \dot{\hat h}_{ij}(t, \mathbf{x}, y_b)\, \dot{\hat h}^{ij}(t, \mathbf{x}, y_b) \left| 0, in \right\rangle \right\rangle .$$ (4.17)

Here the outer bracket denotes averaging over several oscillations, which (in contrast to the power spectrum) we embrace from the very beginning. The factor $1/a^2$ comes from the fact that an over-dot indicates the derivative with respect to $t$. A detailed calculation is carried out in Appendix C 3, leading to

$$\rho = \frac{1}{a^4} \sum_\alpha \int \frac{d^3k}{(2\pi)^3}\, \omega_{\alpha,k}\, \mathcal{N}_{\alpha,k}(t)\, \mathcal{Y}_\alpha^2(a) ,$$ (4.18)

where again $\mathcal{N}_{\alpha,k}(t)$ is the instantaneous particle number. At late times $t > t_{out}$, after particle creation has ceased, the energy density is therefore given by

$$\rho = \frac{1}{a^4} \sum_\alpha \int \frac{d^3k}{(2\pi)^3}\, \omega^{out}_{\alpha,k}\, \mathcal{N}^{out}_{\alpha,k}\, \mathcal{Y}_\alpha^2(a) .$$ (4.19)

This expression looks at first sight very similar to a "naive" definition of energy density as integration over momentum space and summation over all quantum numbers $\alpha$ of the energy $\omega^{out}_{\alpha,k}\, \mathcal{N}^{out}_{\alpha,k}$ of created gravitons. (Note that the graviton number $\mathcal{N}^{out}_{\alpha,k}$ already contains the contributions of both polarizations [see Eq. (3.28)].) However, the important difference is the appearance of the function $\mathcal{Y}_\alpha^2(a)$, which exhibits a different dependence on the scale factor for the zero mode compared to the KK modes.

Let us decompose the energy density into zero-mode and KK contributions,

$$\rho = \rho_0 + \rho_{KK} .$$ (4.20)

For the energy density of the massless zero mode one then obtains

$$\rho_0 = \frac{1}{a^4} \int \frac{d^3k}{(2\pi)^3}\, k\, \mathcal{N}^{out}_{0,k} .$$ (4.21)

This is the expected behavior; the energy density of standard four-dimensional gravitons scales like radiation.

In contrast, the energy density of the KK modes at late times is found to be

$$\rho_{KK} = \frac{L^2}{a^6}\, \frac{\pi^2}{4} \sum_n \int \frac{d^3k}{(2\pi)^3}\, \omega^{out}_{n,k}\, \mathcal{N}^{out}_{n,k}\, m_n^2\, Y_1^2(m_n y_s) ,$$ (4.22)

which decays like $1/a^6$.
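How sensitive the mode sum in Eq. (4.22) is to the high-mass tail can be probed with a toy fall-off law. The sketch below assumes the large-$n$ approximations $m_n \simeq n\pi/y_s$ and $Y_1^2(m_n y_s) \simeq 2/(\pi m_n y_s)$ from Eq. (4.6), so that each term is proportional to $\omega_{n,k}\, \mathcal{N}_{n,k}\, m_n$, together with a hypothetical power law $\mathcal{N}_{n,k} = m_n^{-p}$ that is not taken from the simulations. Constant prefactors are dropped and all parameter values are illustrative.

```python
import math


def kk_sum(p, n_max, y_s=10.0, k=0.01):
    """Partial sum over n of omega_n * N_n * m_n with the toy law N_n = m_n**(-p).

    Uses m_n = n*pi/y_s (large-n approximation) and omega_n = sqrt(k^2 + m_n^2).
    """
    total = 0.0
    for n in range(1, n_max + 1):
        m = n * math.pi / y_s
        omega = math.sqrt(k * k + m * m)
        total += m ** (-p) * omega * m
    return total


def decade_increments(p, cutoffs=(10**3, 10**4, 10**5)):
    """Growth of the partial sum between successive cutoffs in n."""
    sums = [kk_sum(p, n) for n in cutoffs]
    return [later - earlier for earlier, later in zip(sums, sums[1:])]


if __name__ == "__main__":
    for p in (3.0, 3.5):
        print("p =", p, "increments per decade:", decade_increments(p))
```

With $p = 3$ every decade in $n$ adds roughly the same amount, i.e. the sum grows logarithmically with the cutoff, whereas for $p = 3.5$ the increments shrink from decade to decade.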
As the Universe expands, the energy density of massive gravitons on the brane is therefore rapidly diluted. The total energy density of gravitational waves in our Universe at late times is dominated by the standard four-dimensional graviton (massless zero mode). In the large mass limit $m_n y_s \gg 1$, $n \gg 1$ the KK energy density can be approximated by

$$\rho_{KK} \simeq \frac{\pi L^2}{2 a^6 y_s} \sum_n \int \frac{d^3k}{(2\pi)^3}\, \mathcal{N}^{out}_{n,k}\, \omega^{out}_{n,k}\, m_n .$$ (4.23)

Due to the factor $m_n$ coming from the function $\mathcal{Y}_n^2$, i.e. from the normalization of the functions $\phi_n(t,y)$, for the summation over the KK tower to converge the number of produced gravitons $\mathcal{N}^{out}_{n,k}$ has to decrease faster than $1/m_n^3$ for large masses, and not just faster than $1/m_n^2$ as one might naively expect.

D. Escaping of massive gravitons and localization of gravity

As we have shown, the power spectrum and energy density of the KK modes scale, at late times when particle production has ceased, with the expansion of the Universe like

$$\mathcal{P}_{KK} \propto 1/a^4 , \qquad \rho_{KK} \propto 1/a^6 .$$ (4.24)

Both quantities decay by a factor $1/a^2$ faster than the corresponding expressions for the zero-mode graviton. In particular, the energy density of the KK particles on the brane behaves effectively like stiff matter. Mathematically, this difference arises from the distinct behavior of the functions $\mathcal{Y}_0(a)$ and $\mathcal{Y}_n(a)$ [cf. Eq. (4.5)] and is a direct consequence of the warping of the fifth dimension. But what is the underlying physics? As we shall discuss now, this scaling behavior for the KK particles has indeed a very appealing physical interpretation which is in the spirit of the RS model.

First, the mass $m_n$ is a comoving mass. The (instantaneous) "comoving" frequency or energy of a KK graviton is $\omega_{n,k} = \sqrt{k^2 + m_n^2}$, with comoving wave number $k$. The physical mass of a KK mode measured by an observer on the brane with cosmic time $d\tau = a\, dt$ is therefore $m_n/a$, i.e. the KK masses are redshifted with the expansion of the Universe.
This comes from the fact that $m_n$ is the wave number corresponding to the $y$-direction with respect to the bulk time $t$, which corresponds to conformal time $\eta$ on the brane and not to physical time. It implies that the energy of KK particles on a moving AdS brane is redshifted like that of massless particles. From this alone one would expect that the energy density of KK modes on the brane decays like $1/a^4$ (see also Appendix D of [22]).

Now, let us define the "wave function" for a graviton,

$$\Psi_\alpha(t,y) = \frac{\phi_\alpha(t,y)}{y^{3/2}} ,$$ (4.25)

which, by virtue of $(\phi_\alpha, \phi_\alpha) = 1$, satisfies

$$2 \int_{y_b}^{y_s} dy\, \Psi_\alpha^2(t,y) = 1 .$$ (4.26)

From the expansion of the gravity wave amplitude Eq. (2.43) and the normalization condition it is clear that $\Psi_\alpha^2(t,y)$ gives the probability to find a graviton of mass $m_\alpha$ for a given (fixed) time $t$ at position $y$ in the $Z_2$-symmetric AdS bulk. Since $\phi_\alpha$ satisfies Equation (2.30), the wave function $\Psi_\alpha$ satisfies the Schrödinger-like equation

$$-\partial_y^2 \Psi_\alpha + \frac{15}{4 y^2}\, \Psi_\alpha = m_\alpha^2\, \Psi_\alpha ,$$ (4.27)

and the junction conditions (2.29) translate into

$$\left( \partial_y + \frac{3}{2y} \right) \Psi_\alpha \Big|_{y = \{y_b, y_s\}} = 0 .$$ (4.28)

In Fig. 2 we plot the evolution of $\Psi_1^2(t,y)$ under the influence of the brane motion Eq. (2.18) with $v_b = 0.1$. For this motion, the physical brane, starting at $y_b \to 0$ for $t \to -\infty$, moves towards the static brane, corresponding to a contracting Universe. After a bounce, it moves back to the Cauchy horizon, i.e. the Universe expands. The second brane is placed at $y_s = 10L$ and $y$ ranges from $y_b(t)$ to $y_s$. We set $\Psi_1^2 \equiv 0$ for $y < y_b(t)$. The time-dependent KK mass $m_1$ is determined numerically from Eq. (2.40). As is evident from this figure, $\Psi_1^2$ is effectively localized close to the static brane, i.e. the weight of the KK-mode wave function lies in the region of less warping, far from the physical brane. Thus the probability to find a KK mode is larger in the region with less warping. Since the effect of the brane motion on $\Psi_1^2$ is hardly visible in Fig. 2, we show the behavior of $\Psi_1^2$ close to the physical brane in Fig. 3.
This shows that $\Psi_1^2$ also peaks at the physical brane, but with an amplitude roughly ten times smaller than the amplitude at the static brane. While the brane, coming from $t \to -\infty$, approaches the point of closest encounter, $\Psi_1^2$ slightly increases and peaks at the bounce $t = 0$ where, as we shall show in the next Section, the production of KK particles takes place. Afterwards, for $t \to \infty$, when the brane is moving back towards the Cauchy horizon, the amplitude $\Psi_1^2$ decreases again and so does the probability to find a KK particle at the position of the physical brane, i.e. in our Universe. The parameter settings used in Figures 2 and 3 are typical of those used in the numerical simulations described later on. However, the effect is illustrated much better if the second brane is closer to the moving brane. In Figure 4 we show $\Psi_1^2$ for the same parameters as in Figures 2 and 3 but now with $y_s = L$. In this case, the probability to find a KK particle on the physical brane is of the same order as in the region close to the second brane during times close to the bounce. However, as the Universe expands, $\Psi_1^2$ rapidly decreases at the position of the physical brane. From Eqs. (4.2) and (4.5) it follows that $\Psi_n^2(t, y_b) \propto 1/a$.

The behavior of the KK-mode wave function suggests the following interpretation: if KK gravitons are created on the brane, or equivalently in our Universe, they escape from the brane into the bulk as the brane moves back to the Cauchy horizon, i.e. when the Universe undergoes expansion. This is the reason why the power spectrum and the energy density imprinted by the KK modes on the brane decrease faster with the expansion of the Universe than for the massless zero mode. The zero mode, on the other hand, is localized at the position of the moving brane. The profile of $\phi_0$ does not depend on the extra dimension, but the zero-mode wave function $\Psi_0$ does.
Its square is

\[ \Psi_0^2(t,y) = \frac{y_s^2\, y_b^2}{y_s^2 - y_b^2}\,\frac{1}{y^3} \;\to\; \frac{y_b^2}{y^3} \quad \text{if } y_s \gg y_b , \tag{4.29} \]

such that on the brane ($y = y_b$) it behaves as

\[ \Psi_0^2(t, y_b) \simeq \frac{1}{y_b} = \frac{a}{L} . \tag{4.30} \]

Equation (4.29) shows that, at any time, the zero mode is localized at the position of the moving brane. For a better illustration we show Eq. (4.29) in Fig. 5 for the same parameters as in Fig. 4. This is the "dynamical analog" of the localization mechanism for four-dimensional gravity discussed in [7]. To establish contact with [7] and to obtain an intuitive physical description, we rewrite the boundary value problem (4.27), (4.28) as a Schrödinger-like equation

\[ -\partial_y^2 \Psi_\alpha(t,y) + V(y,t)\,\Psi_\alpha(t,y) = m_\alpha^2(t)\,\Psi_\alpha(t,y) \tag{4.31} \]

with

\[ V(y,t) = \frac{15}{4y^2} - \frac{3}{y_b(t)}\,\delta(|y| - y_b(t)) = \frac{15}{4y^2} - \frac{3a(t)}{L}\,\delta(|y| - y_b(t)) , \tag{4.32} \]

where we have absorbed the boundary condition at the moving brane into the (instantaneous) volcano potential $V(y,t)$ and made use of $Z_2$ symmetry. Similar to the static case [7], at any time the potential (4.32) supports a single bound state, the four-dimensional graviton (4.29), and acts as a barrier for the massive KK modes. The potential, which ensures localization of four-dimensional gravity on the brane and the repulsion of KK modes, moves together with the brane through the fifth dimension. Note that with the expansion of the Universe the "depth of the delta-function" becomes larger, expressing the fact that the localization of four-dimensional gravity becomes stronger at late times [cf. Eq. (4.30), Fig. 5].

In summary, the different scaling behavior of the zero mode and the KK modes on the brane is entirely a consequence of the geometry of the bulk space-time, i.e. of the warping $L^2/y^2$ of the metric (2.1) (see footnote 5). It is simply a manifestation of the localization of gravity on the brane: as time evolves, the KK gravitons, which are traces of the five-dimensional nature of gravity, escape into the bulk, and only the zero mode, which corresponds to the usual four-dimensional graviton, remains on the brane.
This, and in particular the scaling behavior (4.24), remains true if the second brane is removed, i.e. in the limit $y_s \to \infty$, which leads to the original RS II model. Looking at (4.15) and (4.23), one could at first think that the KK-power spectrum and energy density then vanish and that no traces of the KK gravitons could be observed on the brane, since both expressions behave as $1/y_s$. But this is not the case, because the spectrum of KK masses becomes continuous. In the continuum limit $y_s \to \infty$ the summation over the discrete spectrum $m_n$ has to be replaced by an integration over continuous masses $m$ in the following way:

\[ \frac{1}{y_s} \sum_n f(m_n) \;\longrightarrow\; \frac{1}{\pi} \int dm\, f(m) . \tag{4.33} \]

Here $f$ is some function depending on the spectrum, for example $f(m_n) = N^{out}_{n,k}$. The pre-factor $1/y_s$ in (4.15) and (4.23) therefore ensures the existence of the proper continuum limit of both expressions. Another way of seeing this is to repeat the same calculations using, from the beginning, the eigenfunctions for the case with only one brane. Those are $\delta$-function normalized and can be found in, e.g., [22]. They are basically the same as (2.34), except that the normalization is different since it depends on whether the fifth dimension is compact or not. In particular, on the brane, they have the same scale factor dependence as (4.2).

In the end, the behavior found for the KK modes should not come as a surprise, since the RS II model has attracted a lot of attention for exactly this reason: it localizes the usual four-dimensional gravity on the brane. As we have shown here, localization of standard four-dimensional gravity on a moving brane via a warped geometry automatically ensures that the KK modes escape into the bulk as the Universe expands, because their wave function has its weight in the region of less warping. This results in a KK-mode energy density on the brane which scales like stiff matter.
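The different dilution rates can be made concrete with a short numerical sketch. It assumes pure power laws $\rho \propto a^{-p}$, with $p = 6$ for the KK modes and $p = 4$ for the zero mode, as implied by Eq. (4.24) and the stated extra factor $1/a^2$. The value $p = 3$ for ordinary pressureless matter is the standard result and is not derived here. The function name and the common normalization at $a = 1$ are illustrative assumptions.

```python
def diluted_density(a, power, rho_initial=1.0):
    """Energy density after expansion by scale factor a, assuming rho ∝ a**(-power)."""
    return rho_initial * a ** (-power)


# Exponents: KK modes and zero mode from the scaling discussed in the text;
# the matter exponent is the standard value for pressureless matter (assumption).
SCALING_EXPONENTS = {
    "KK modes (stiff-matter-like)": 6,
    "zero mode (radiation-like)": 4,
    "ordinary matter": 3,
}

if __name__ == "__main__":
    for a in (1.0, 10.0, 100.0):
        for label, power in SCALING_EXPONENTS.items():
            print(f"a = {a:6.1f}  {label:30s} rho/rho_0 = {diluted_density(a, power):.3e}")
```

Under these assumptions the KK-mode energy density is suppressed relative to ordinary matter by a further factor $a^{-3}$ for every expansion by a factor $a$.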
An immediate consequence of this particular scaling behavior is that KK gravitons in an AdS braneworld cannot play the role of dark matter. Their energy density in our Universe decays much faster with the expansion than that of ordinary matter, which is restricted to reside on the brane.

Footnote 5: Note that it does not depend on a particular type of brane motion and is expected to be true also in the high-energy case, which we do not consider here.

FIG. 2: Evolution of $\Psi_1^2(t,y) = \phi_1^2(t,y)/y^3$, corresponding to the probability to find the first KK graviton at time $t$ at the position $y$ in the AdS bulk. The static brane is at $y_s = 10L$ and the maximal brane velocity is given by $v_b = 0.1$.

FIG. 3: Evolution of $\Psi_1^2(t,y)$ as in Fig. 2 but zoomed into the bulk region close to the moving brane.

V. NUMERICAL SIMULATIONS

A. Preliminary remarks

In this section we present results of numerical simulations for the bouncing model described by Eqs. (2.17)-(2.19). In the numerical simulations we set $L = 1$, i.e. all dimensionful quantities are measured in units of the AdS$_5$ curvature scale. Starting at an initial time $t_{in} \ll 0$, where the initial vacuum state $|0, in\rangle$ is defined, the system (3.34), (3.35) is evolved numerically up to the final time $t_{out}$. We set $t_{in} = -2\pi N_{in}/k$ with $1 \le N_{in} \in \mathbb{N}$, such that $\Theta^{in}_{0,k} = 1$ [cf. Eq. (3.23)]. This implies $\xi^{(0)}_0(t_{in}) = 2$, i.e. independent of the three-dimensional momentum $k$, a (plane wave) zero-mode solution always performs a fixed number of oscillations between $t_{in}$ and the bounce at $t = 0$ [cf. Eq. (3.38)]. The final graviton spectrum $N^{out}_{\alpha,k}$ is calculated at late times $t_{out} \gg 1$, when the brane approaches the Cauchy horizon and graviton creation has ceased.

FIG. 4: Evolution of $\Psi_1^2(t,y)$ for $y_s = L$ and $v_b = 0.1$.

FIG. 5: Localization of four-dimensional gravity on a moving brane: evolution of $\Psi_0^2(t,y)$ for $y_s = L = 1$ and $v_b = 0.1$, which should be compared with $\Psi_1^2(t,y)$ shown in Fig. 4.
This quantity is physically well defined and leads to the late-time power spectrum (4.11) and energy density (4.19) on the brane. For illustrative purposes, we also plot the instantaneous particle number $N_{\alpha,k,\bullet}(t)$, which also determines the power spectrum at all times [cf. Eq. (4.9)]. In this section we shall use the terms particle number and graviton number for both the instantaneous particle number $N_{\alpha,k,\bullet}(t)$ and the final-state graviton number $N^{out}_{\alpha,k,\bullet}$, keeping in mind that only the latter is physically meaningful.

There are two physical input parameters for the numerical simulation: the maximal brane velocity $v_b$ (i.e. $t_b$) and the position of the static brane $y_s$. The latter determines the number of KK modes which fall within a particular mass range. On the numerical side one has to specify $N_{in}$ and $t_{out}$, as well as the maximum number of KK modes $n_{max}$ which one takes into account, i.e. the KK mode after which the system of differential equations is truncated. The independence of the numerical results from the choice of the time parameters is checked, and the convergence of the particle spectrum with increasing $n_{max}$ is investigated. More detailed information on numerical issues, including accuracy considerations, is collected in Appendix D.

One strong feature of the brane motion (2.18) is its kink at the bounce $t = 0$. In order to study how particle production depends on the kink, we shall compare the motion (2.18) with the following motion, which has a smooth transition from contraction to expansion ($L = 1$):

\[ y_b(t) = \begin{cases} (|t| + t_b - t_s)^{-1} & \text{if } |t| > t_s \\ a + (b/2)\,t^2 + (c/4)\,t^4 & \text{if } |t| \le t_s \end{cases} \tag{5.1} \]

with the new parameter $t_s$ in the range $0 < t_s < t_b$. This motion is constructed such that its velocity at $|t| = t_s$ is the same as the velocity of the kink motion at the bounce. This will be the important quantity determining the number of produced gravitons. For $t_s \to 0$ the motion with smooth transition approaches (2.18).
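The coefficients of the smooth motion (5.1) follow from requiring that $y_b$, $\dot y_b$ and $\ddot y_b$ are continuous at $|t| = t_s$. The sketch below solves these three conditions in closed form. It assumes $L = 1$ and $t_b = 1/\sqrt{v_b}$, which corresponds to a kink motion of the form $y_b = (|t| + t_b)^{-1}$ with bounce velocity $v_b$; Eq. (2.18) is not reproduced in this section, so this form is an assumption. The function names are hypothetical.

```python
import math


def smooth_bounce_coefficients(v_b, t_s):
    """Coefficients (a, b, c) of the polynomial part of Eq. (5.1).

    Obtained by matching value, first and second derivative of
    a + (b/2) t^2 + (c/4) t^4 to 1/(t + t_b - t_s) at t = t_s.
    Assumes L = 1 and t_b = 1/sqrt(v_b) (assumption, see lead-in).
    """
    t_b = 1.0 / math.sqrt(v_b)
    c = (2.0 * t_s + t_b) / (2.0 * t_b**3 * t_s**3)
    b = -(2.0 * t_s + 3.0 * t_b) / (2.0 * t_b**3 * t_s)
    a = 1.0 / t_b - 0.5 * b * t_s**2 - 0.25 * c * t_s**4
    return a, b, c


def y_b_smooth(t, v_b, t_s):
    """Brane position for the motion with smooth transition, Eq. (5.1)."""
    t_b = 1.0 / math.sqrt(v_b)
    if abs(t) > t_s:
        return 1.0 / (abs(t) + t_b - t_s)
    a, b, c = smooth_bounce_coefficients(v_b, t_s)
    return a + 0.5 * b * t**2 + 0.25 * c * t**4


def max_smooth_velocity(v_b, t_s):
    """Largest |dy_b/dt| inside |t| <= t_s (attained where the second derivative vanishes)."""
    _, b, c = smooth_bounce_coefficients(v_b, t_s)
    t_peak = min(math.sqrt(-b / (3.0 * c)), t_s)
    return abs(b * t_peak + c * t_peak**3)


if __name__ == "__main__":
    print(smooth_bounce_coefficients(0.1, 0.05))
    print(max_smooth_velocity(0.1, 0.05))
```

Within this sketch the velocity inside the transition region slightly exceeds $v_b$ and approaches $v_b$ as $t_s \to 0$, which illustrates why $t_s$ has to be chosen small for the two motions to be comparable.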
The parameters $a$, $b$ and $c$ are obtained by matching the two branches of the motion and their first and second derivatives at $|t| = t_s$. Matching the second derivative as well guarantees that possible spurious effects contributing to particle production are avoided. The parameter $t_s$ has to be chosen small enough, $t_s \ll 1$, such that the maximal velocity of the smooth motion is not much larger than $v_b$, in order to have comparable situations. For reasons which will become obvious in the next two sections, we shall discuss the cases of long wavelengths, $k \ll 1$, and short wavelengths, $k \gg 1$, separately.

B. Generic results and observations for long wavelengths $k \ll 1$

Figure 6 displays the results of a numerical simulation for three-momentum $k = 0.01$, static brane position $y_s = 10$ and maximal brane velocity $v_b = 0.1$. Depicted is the graviton number for one polarization, $N_{\alpha,k,\bullet}(t)$, for the zero mode and the first ten KK modes, as well as the evolution of the scale factor $a(t)$ and the position of the physical brane $y_b(t)$. Initial and final times are $N_{in} = 5$ and $t_{out} = 2000$, respectively. The KK-particle spectrum will be discussed in detail below. One observes that the zero-mode particle number increases slightly with the contraction of the Universe towards the bounce at $t = 0$. Close to the bounce $N_{0,k,\bullet}(t)$ increases drastically, shows a local peak at the bounce and, after a short decrease, grows again until the mode is sub-horizon ($kt \gg 1$). Inside the horizon $N_{0,k,\bullet}(t)$ oscillates around a mean value with diminishing amplitude. This mean value, which is reached asymptotically for $t \to \infty$, corresponds to the number of generated final-state zero-mode gravitons $N^{out}_{0,k,\bullet}$.

FIG. 6: Evolution of the graviton number $N_{\alpha,k,\bullet}(t)$ for the zero mode and the first ten KK modes for three-momentum $k = 0.01$ and $v_b = 0.1$, $y_s = 10$.

FIG. 7: $N_{n,k,\bullet}(t)$ for the zero mode and the first ten KK modes for the parameters of Fig. 6, but without coupling of the zero mode to the KK modes, i.e. $M_{i0} \equiv 0$.
Production of KK-mode gravitons effectively takes place only at the bounce, in a step-like manner, and the graviton number remains constant right after the bounce. In Fig. 7 we show the numerical results obtained for the same parameters as in Fig. 6 but without coupling of the zero mode to the KK modes, i.e. $M_{i0} = 0$ (and thus also $N_{i0} = N_{0i} = 0$). One observes that the production of zero-mode gravitons is virtually unaffected by the artificial decoupling (see footnote 6). Note that even if $M_{0j} \equiv 0$ (see Eqs. B2), which is in general true for Neumann boundary conditions, the zero mode $q_{0,k,\bullet}$ couples in Eq. (2.49) to the KK modes via $N_{0j} = M_{00} M_{j0}$ and through the anti-symmetric combination $M_{\alpha\beta} - M_{\beta\alpha}$. In contrast, the production of the first ten KK modes is heavily suppressed if $M_{i0} = 0$. The corresponding final-state graviton numbers $N^{out}_{n,k,\bullet}$ are reduced by four orders of magnitude. This shows that the coupling to the zero mode is essential for the production of massive gravitons. Later we will see that this is true for light KK gravitons only. If the KK masses exceed $m_i \sim 1$, they evolve independently of the four-dimensional graviton and their evolution is entirely driven by the intermode couplings $M_{ij}$. It will also turn out that the time-dependence of the KK mass $m_i$ plays only a minor role in the generation of massive KK modes.

On the other hand, the effective decoupling of the evolution of the zero mode from the KK modes occurs in general as long as $k \ll 1$ is satisfied, i.e. for long wavelengths. We will see that it is no longer true for short wavelengths $k \gg 1$. The effective decoupling of the zero-mode evolution from the KK modes makes it possible to derive analytical expressions for the number of zero-mode gravitons, their power spectrum and energy density.
The calculations are carried out in Section VI A. In summary, we emphasize the important observation that for long wavelengths the amplification of the four-dimensional gravity wave amplitude during the bounce is not affected by the evolution of the KK gravitons. We can therefore study the zero mode separately from the KK modes in this case.

Footnote 6: Quantitatively, $N_{0,k,\bullet}(t = 2000) = 965.01$ with and $N_{0,k,\bullet}(t = 2000) = 965.06$ without $M_{i0}$. Note that this difference lies within the accuracy of our numerical simulations (see Appendix D).

C. Zero mode: long wavelengths $k \ll 1$

In Figure 8 we show the numerical results for the number of generated zero-mode gravitons $N_{0,k,\bullet}(t)$ and the evolution of the corresponding power spectrum $P_0(k)$ on the brane for momentum $k = 0.01$, position of the static brane $y_s = 10$ and maximal brane velocity $v_b = 0.1$. The results have been obtained by solving the equations for the zero mode alone, i.e. without the couplings to the KK modes, since, as we have just shown, the evolution of the four-dimensional graviton for long wavelengths is not influenced by the KK modes. The power spectrum is shown before and after averaging over several oscillations, i.e. employing Eq. (4.9) with and without the term $O^{N}_{0,k}$, respectively. Right after the bounce, where the generation of gravitons is initiated and which is responsible for the peak in $N_{0,k,\bullet}$ at $t = 0$, the number of gravitons first decreases again. Afterwards $N_{0,k,\bullet}$ grows further until the mode enters the horizon at $kt = 1$. Once on sub-horizon scales, $kt \gg 1$, the number of produced gravitons oscillates with a diminishing amplitude and asymptotically approaches the final-state graviton number $N^{out}_{0,k,\bullet}$. During the growth of $N_{0,k,\bullet}$ after the bounce, the power spectrum remains practically constant. Within its range of validity it is in good agreement with the analytical prediction (6.22), which yields $(L^2 (2\pi)^3/\kappa_4)\, P_0(k,t) = 4 v_b (kL)^2$.
When particle creation has ceased, the full power spectrum Eq. (4.8) starts to oscillate with a decreasing amplitude. The time-averaged power spectrum obtained by using Eq. (4.9) without the $O^{N}_{0,k}$ term is in perfect agreement with the analytical expression Eq. (6.20), which gives $(L^2 (2\pi)^3/\kappa_4)\, P_0(k,t) = 2 v_b/t^2$. Note that at early times the time-averaged power spectrum does not behave in the same way as the full one, which demonstrates the importance of the term $O^{N}_{0,k}$.

Figure 9 shows a summary of numerical results for the number of created zero-mode gravitons $N_{0,k,\bullet}(t)$ for different values of the three-momentum $k$. The maximum velocity at the bounce is $v_b = 0.1$ and the second brane is at $y_s = 10$. These values are representative. Other values in accordance with the considered low-energy regime do not lead to a qualitatively different behavior. Note that the evolution of the zero mode is virtually independent of the value of $y_s$ as long as $y_s \gg y_b(0)$ (see below). Initial and final integration times are given by $N_{in} = 5$ and $t_{out} = 20000$, respectively. For sub-horizon modes we compare the final graviton spectra with the analytical prediction (6.17). Both are in perfect agreement. On super-horizon scales, where particle creation has not yet ceased, $N_{0,k,\bullet}$ is independent of $k$. The corresponding time evolution of the power spectra $P_0(k,t)$ is depicted in Fig. 10. For the sake of clarity, only the results for $t > 0$, i.e. after the bounce, are shown in both figures.

The numerical simulations and the calculations of Section VI A reveal that the power spectrum of the four-dimensional graviton for long wavelengths is blue on super-horizon scales, as expected for an ekpyrotic scenario. The analytical calculations performed in Section VI A rely on the assumption that $y_b \ll y_s$ and $t_{in} \to -\infty$. Figure 11 shows how the number of generated zero-mode gravitons of momentum $k = 0.01$ depends on the inter-brane distance and the initial integration time.
The brane velocity at the bounce is $v_b = 0.1$, which implies that at the bounce the moving brane is at $y_b(0) = \sqrt{v_b} \simeq 0.316$ ($L = 1$). In the case of a close encounter of the two branes, as for $y_s = 0.35$, the production of massless gravitons is strongly enhanced compared to the analytical result. But as soon as $y_s \ge 1$ (i.e. $y_s \ge L$), the numerical result is very well described by the analytical expression Eq. (6.16), derived under the assumption $y_s \gg y_b$. For $y_s \ge 10$ the agreement between both is very good. From panels (b) and (c) one infers that the numerical result indeed becomes independent of the initial integration time when increasing $N_{in}$. Note that in the limit $N_{in} \gg 1$ the numerical result is slightly larger than the analytical prediction, but the difference between both is negligibly small. This confirms the correctness and accuracy of the analytical expressions derived in Section VI A for the evolution of the zero-mode graviton.

FIG. 8: Time evolution of the number of created zero-mode gravitons $N_{0,k,\bullet}(t)$ and of the zero-mode power spectrum (4.8): (a) for the entire integration time; (b) for $t > 0$ only. Parameters are $k = 0.01$, $y_s = 10$ and $v_b = 0.1$. Initial and final time of integration are given by $N_{in} = 10$ and $t_{out} = 4000$, respectively. The power spectrum is shown with and without the term $O^{N}_{0,k,\bullet}$, i.e. before and after averaging, respectively, and compared with the analytical results.

FIG. 9: Numerical results for the time evolution of the number of created zero-mode gravitons $N_{0,k,\bullet}(t)$ after the bounce, $t > 0$, for different three-momenta $k$. The maximal brane velocity at the bounce is $v_b = 0.1$ and the second brane is positioned at $y_s = 10$. In the final particle spectrum the numerical values are compared with the analytical prediction Eq. (6.17). Initial and final time of integration are given by $N_{in} = 5$ and $t_{out} = 20000$, respectively.

FIG. 10: Evolution of the zero-mode power spectrum after the bounce, $t > 0$, corresponding to the values and parameters of Fig. 9. The numerical results are compared to the analytical predictions Eqs. (6.20) and (6.22).

FIG. 11: Dependence of the zero-mode particle number on inter-brane distance and initial integration time for momentum $k = 0.01$ and maximal brane velocity $v_b = 0.1$, in comparison with the analytical expression Eq. (6.16). (a) Evolution of the instantaneous particle number $N_{0,k,\bullet}(t)$ with initial integration time given by $N_{in} = 5$ for $y_s = 0.35$, 0.5 and 1. (b) Final zero-mode graviton spectrum $N_{0,k,\bullet}(t_{out} = 2000)$ for various values of $y_s$ and $N_{in}$. (c) Close-up view of (b) for large $y_s$.

FIG. 12: Final-state KK-graviton spectra for $k = 0.001$, $y_s = 100$, different maximal brane velocities $v_b$ and $N_{in} = 1$, $t_{out} = 400$. The numerical results are compared with the analytical prediction Eq. (6.34) (dashed line).

D. Kaluza-Klein modes: long wavelengths $k \ll 1$

Because the creation of KK gravitons ceases right after the bounce [cf. Fig. 6], one can stop the numerical simulation and read out the number of produced KK gravitons $N^{out}_{n,k,\bullet}$ at times for which the zero mode is still super-horizon. Even though Eq. (2.40) cannot be solved analytically, the KK masses can be approximated by $m_n \simeq n\pi/y_s$. This approximation improves with increasing mass. Consequently, for the massive modes the position of the second brane $y_s$ determines how many KK modes belong to a particular mass range $\Delta m$. In Figure 12 we show the KK-graviton spectra $N^{out}_{n,k,\bullet}$ for three-momentum $k = 0.001$ and second brane position $y_s = 100$ for maximal brane velocities $v_b = 0.1$, 0.3 and 0.5. For each velocity $v_b$, two spectra obtained with $n_{max} = 60$ and $n_{max} = 80$ KK modes taken into account in the simulation are compared to each other.
This reveals that the numerical results are stable up to a KK mass $m_n \simeq 1$. One infers that $N^{out}_{n,k,\bullet}$ first grows with increasing mass until a maximum is reached. The position of the maximum shifts slightly towards larger masses with increasing brane velocity $v_b$. Afterwards, $N^{out}_{n,k,\bullet}$ declines with growing mass. Until the maximum is reached, the numerical results for the KK-particle spectrum are very stable. This already indicates that the KK-intermode couplings mediated by $M_{ij}$ are not very strong in this mass range. In Figure 13 we show the final KK-particle spectrum for the same parameters as in Fig. 12 but for three-momentum $k = 0.01$ and the additional velocity $v_b = 0.9$ (see footnote 7). We observe the same qualitative behavior as in Fig. 12. In addition we show numerical results obtained for $v_b = 0.3$ and 0.5 without the KK-intermode and self-couplings, i.e. we have set $M_{ij} \equiv 0\ \forall\, i, j$ by hand. One infers that for KK masses up to a value that depends slightly on the velocity $v_b$, but is at least $m_n \simeq 1$, the numerical results for the spectra do not change when the KK-intermode coupling is switched off. Consequently, the evolution of light KK gravitons, i.e. $m_n \lesssim 1$, is virtually unaffected by the KK-intermode coupling. In addition we find that the time-dependence of the KK masses is also not important for the production of light KK gravitons, which is explicitly demonstrated below. Thus, production of light KK gravitons is driven by the zero-mode evolution only. This allows us to find an analytical expression, Eq. (6.34), for the number of produced light KK gravitons in terms of exponential integrals.

FIG. 13: Final-state KK-graviton spectra for $k = 0.01$, $y_s = 100$, different $v_b$ and $N_{in} = 1$, $t_{out} = 400$. The numerical results are compared with the analytical prediction Eq. (6.34) (dashed line). For $v_b = 0.3$, 0.5 the spectra obtained without KK-intermode and self-couplings ($M_{ij} \equiv 0\ \forall\, i, j$) are shown as well.
The calculations, which are based on several approximations, are performed in Section VI C. In Figs. 12 and 13 the analytical prediction (6.34) for the spectrum of final-state gravitons has already been included (dashed lines). Within its range of validity it is in excellent agreement with the numerical results obtained by including the full KK-intermode coupling. It perfectly describes the dependence of $N^{out}_{n,k,\bullet}$ on the three-momentum $k$ and the maximal velocity $v_b$. For small velocities $v_b \lesssim 0.1$ it is also able to reproduce the position of the maximum. This reveals that the KK-intermode coupling is negligible for light KK gravitons and that their production is entirely driven by their coupling to the four-dimensional graviton.

Footnote 7: Such a high brane velocity is of course not consistent with a Neumann boundary condition Eq. (2.29) at the position of the moving brane.

The analytical prediction is very valuable for testing the suitability of the parameters used in the simulations, in particular the initial time $t_{in}$ (respectively $N_{in}$). Since it has been derived for truly asymptotic initial conditions, $t_{in} \to -\infty$, its perfect agreement with the numerical results demonstrates that the values for $N_{in}$ used in the numerical simulations are large enough. No spurious initial effects contaminate the numerical results.

Note that the numerical values for $N^{out}_{n,k,\bullet}$ in the examples shown are all smaller than one. However, for values of $k$ smaller than the ones which we consider here for purely numerical reasons, the number of generated KK-mode particles is enhanced, since $N^{out}_{n,k,\bullet} \propto 1/k$, as can be inferred from Eq. (6.34) in the limit $k \ll m_n$. If we go to smaller values of $y_s$, fewer KK modes belong to a particular mass range. Hence, with the same or a similar number of KK modes as taken into account in the simulations so far, we can study the behavior of the final particle spectrum for larger masses.
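How the position of the second brane controls the number of modes per mass interval can be illustrated with the approximation $m_n \simeq n\pi/y_s$ quoted earlier. The sketch below counts the modes in a half-open mass interval under that approximation only. For the lightest modes the exact masses determined by Eq. (2.40) can differ noticeably, so the counts at small $m$ are indicative rather than exact. The function names are hypothetical.

```python
import math


def approx_kk_mass(n, y_s):
    """Approximate KK mass m_n ≈ n*pi/y_s (L = 1); more accurate for larger masses."""
    return n * math.pi / y_s


def modes_in_mass_range(y_s, m_low, m_high):
    """Number of modes n >= 1 with m_low <= n*pi/y_s < m_high."""
    n_min = max(1, math.ceil(m_low * y_s / math.pi))
    n_max = math.ceil(m_high * y_s / math.pi) - 1
    return max(0, n_max - n_min + 1)


if __name__ == "__main__":
    for y_s in (3.0, 10.0, 30.0, 100.0):
        count = modes_in_mass_range(y_s, 1.0, 2.0)
        print(f"y_s = {y_s:5.1f}: {count} modes with 1 <= m < 2 (approximation)")
```

In this approximation the number of modes per unit mass is $y_s/\pi$, so a simulation with fixed $n_{max}$ reaches masses up to roughly $n_{max}\pi/y_s$.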
These simulations shall reveal the asymptotic behavior of $N^{out}_{n,k,\bullet}$ for $m_n \to \infty$ and therefore the behavior of the total graviton number and energy density. Due to the kink in the brane motion we cannot expect the energy density of produced KK-mode gravitons to be finite when summing over arbitrarily high frequency modes. Eventually, we will have to introduce a cutoff setting the scale at which the kink approximation [cf. Eqs. (2.17)-(2.19)] is no longer valid. This is the scale where the effects of the underlying unspecified high-energy physics which drive the transition from contraction to expansion become important. The dependence of the final particle spectrum on the kink will be studied in detail later on in this section.

In Figures 14 and 15 we show final KK-graviton spectra for $y_s = 10$ and three-momentum $k = 0.01$ and $k = 0.1$. The analytical expression Eq. (6.34) is depicted as well, and the spectra are always shown for at least two values of $n_{max}$ to indicate up to which KK mass stability of the numerical results is guaranteed. Now, only two KK modes are lighter than $m = 1$. For these modes the analytical expression Eq. (6.34) is valid and in excellent agreement with the numerical results, in particular for small brane velocities $v_b \sim 0.1$. As before, the larger the velocity $v_b$, the more visible is the effect of the truncation of the system of differential equations at $n_{max}$. For $k = 0.01$ the spectrum seems to follow a power-law decrease right after the maximum. In the case of $v_b = 0.1$ the spectrum is numerically stable up to masses $m_n \simeq 20$. In the region $5 \lesssim m_n \lesssim 20$ the spectrum is very well fitted by a power law $N^{out}_{n,k,\bullet} \propto m_n^{-2.7}$. Also for larger velocities the decline of the spectrum is given by the same power within the mass ranges where the spectrum is numerically stable. For $k = 0.1$, however, the decreasing spectrum bends over at a mass around $m_n \simeq 10$ towards a less steep decline. This is particularly visible in the two cases with $v_b = 0.1$ and 0.3, where the first 100 KK modes have been taken into account in the simulation.

The behavior of the KK-mode particle spectrum can therefore not be described by a single power-law decline for masses $m_n > 1$. Instead, it shows more complicated features which depend on the parameters. We shall demonstrate that this bending over of the decline is related to the coupling properties of the KK modes and to the kink in the brane motion. But before we come to a detailed discussion of these issues, let us briefly compare numerical results for different $y_s$ to demonstrate a scaling behavior.

In the upper panels of Figures 16 and 17 we compare the final KK spectra for several positions of the second brane, $y_s = 3$, 10, 30 and 100, obtained for a maximal brane velocity $v_b = 0.1$ for $k = 0.01$ and 0.1, respectively. One observes that the shapes of the spectra are identical. The bending over in the decline of the spectrum at masses $m_n \sim 1$ is clearly visible for $k = 0.1$ and $y_s = 3$, 10. For a given KK mode $n$, the number of particles produced in this mode is larger the smaller $y_s$ is.

FIG. 14: Final-state KK-graviton spectra for $k = 0.01$, $y_s = 10$, different maximal brane velocities $v_b$ and $N_{in} = 2$, $t_{out} = 400$. The numerical results are compared with the analytical prediction Eq. (6.34) (dashed line).

FIG. 15: Final-state KK-graviton spectra for $k = 0.1$, $y_s = 10$, different maximal brane velocities $v_b$ and $N_{in} = 2$, $t_{out} = 400$. The numerical results are compared with the analytical prediction Eq. (6.34) (dashed line).

FIG. 16: Upper panel: Final-state KK-particle spectra for $k = 0.01$, $v_b = 0.1$ and different $y_s = 3$, 10, 30 and 100. The analytical prediction Eq. (6.34) is shown as well (dashed line). Lower panel: Energy $\omega^{out}_{n,k} N^{out}_{n,k,\bullet}$ of the produced final-state gravitons binned in mass intervals $\Delta m = 1$ for $y_s = 10$, 30, 100.
But the smaller $y_s$, the fewer KK modes belong to a given mass interval $\Delta m$. The energy transferred into the system by the moving brane, which is determined by the maximum brane velocity $v_b$, is the same in all cases. Therefore, the total energy of the produced final-state KK gravitons in a given mass interval $\Delta m$ should also be the same, independent of how many KK modes contribute to it. This is demonstrated in the lower panels of Figs. 16 and 17, where the energy $\omega^{out}_{n,k} N^{out}_{n,k,\bullet}$ (in units of $L$) of the generated KK gravitons binned in mass intervals $\Delta m = 1$ is shown (see footnote 8). One observes that, as expected, the energy transferred into the production of KK gravitons of a particular mass range is the same (within the region where the numerical results are stable), independent of the number of KK modes lying in the interval. This is particularly evident for $y_s = 30$, 100. The discrepancy for $y_s = 10$ is due to the binning.

As we shall discuss in detail below, the particle spectrum can be split into two different parts. The first part is dominated by the coupling of the zero mode to the KK modes (as shown above), whereas the second part is dominated by the KK-intermode couplings and is virtually independent of the wave number $k$. As long as the coupling of the zero mode to the KK modes is the dominant contribution to KK-particle production, $N^{out}_{n,k,\bullet} \propto 1/k$ [cf. Eq. (6.34)]. Hence, $E^{out}_{n,k,\bullet} = \omega^{out}_{n,k} N^{out}_{n,k,\bullet} \propto 1/k$ if $m_n \gg k$. This explains why the energy per mass interval $\Delta m$ is one order of magnitude larger for $k = 0.01$ (cf. Fig. 16) than for $k = 0.1$ (cf. Fig. 17).

Footnote 8: The energy for the case $y_s = 3$ is not shown because no KK mode belongs to the first mass interval.

FIG. 17: Upper panel: Final-state KK-particle spectra for $k = 0.1$, $v_b = 0.1$ and different $y_s = 3$, 10, 30 and 100. The analytical prediction Eq. (6.34) is shown as well (dashed line). Lower panel: Energy $\omega^{out}_{n,k} N^{out}_{n,k,\bullet}$ of the produced final-state gravitons binned in mass intervals $\Delta m = 1$ for $y_s = 10$, 30, 100.
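The factor of ten between the two momenta can be checked with a few lines of arithmetic. The sketch assumes $N^{out}_{n,k,\bullet} = A/k$ with an amplitude $A$ that does not depend on $k$ at fixed mass. This is an illustrative stand-in for the regime $k \ll m_n$ of Eq. (6.34), whose full form is not reproduced here. The function names are hypothetical.

```python
import math


def energy_per_mode(k, m_n, amplitude=1.0):
    """Energy omega * N of one KK mode, assuming N = amplitude / k (valid only for k << m_n)."""
    omega = math.sqrt(k * k + m_n * m_n)
    return omega * amplitude / k


def energy_ratio(k_small, k_large, m_n):
    """Ratio of per-mode energies at two momenta for the same mass m_n."""
    return energy_per_mode(k_small, m_n) / energy_per_mode(k_large, m_n)


if __name__ == "__main__":
    for m_n in (0.5, 1.0, 3.0):
        print(f"m_n = {m_n}: E(k=0.01)/E(k=0.1) = {energy_ratio(0.01, 0.1, m_n):.3f}")
```

For $m_n \gg k$ the frequency is essentially $m_n$ for both momenta, so the ratio approaches $k_{large}/k_{small} = 10$.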
Let us now discuss the KK spectrum for large masses. The qualitative behavior of the spectrum $N^{out}_{n,k,\bullet}$ and the mass at which the decline of the spectrum changes are independent of $y_s$. This is demonstrated in Figure 18, where KK spectra for $v_b = 0.1$, $k = 0.1$, $y_s = 10$ [cf. Fig. 15] and $y_s = 3$ [cf. Fig. 17] are shown. The results obtained by taking the full intermode coupling into account are compared to results of simulations where we have switched off the coupling of the KK modes to each other as well as their self-coupling ($M_{ij} \equiv 0\ \forall\, i, j$). Furthermore, we display the results for the KK spectrum obtained by taking only the KK-intermode couplings into account, i.e. $M_{i0} = M_{ii} = 0\ \forall\, i$. One infers that for the lowest masses the spectra obtained with all couplings are identical to the ones obtained without the KK-intermode couplings ($M_{ij} = 0$, $i \neq j$) and self-couplings ($M_{ii} = 0$). Hence, as already seen before, the primary source for the production of light KK gravitons is their coupling to the evolution of the four-dimensional graviton. In this mass range, the contribution to the particle creation coming from the KK-intermode couplings is strongly suppressed and negligibly small.

For masses $m_n \simeq 4$ a change in the decline of the spectrum sets in, and the spectrum obtained without the coupling of the KK modes to the zero mode starts to diverge from the spectrum computed by taking all the couplings into account. While the spectrum without the KK-intermode couplings decreases roughly like a power law $N^{out}_{n,k,\bullet} \propto m_n^{-3}$, the spectrum corresponding to the full coupling case changes its slope towards a less steep power-law decline.

FIG. 18: KK-particle spectra for three-momentum $k = 0.1$, maximum brane velocity $v_b = 0.1$ and $y_s = 3$ and 10 with different couplings taken into account. The dashed lines again indicate the analytical expression Eq. (6.34).
At this point the KK-intermode couplings gain importance and the coupling of the KK modes to the zero mode loses influence. At a particular mass $m_c \simeq 9$ the spectrum obtained by including only the KK-intermode couplings crosses the spectrum calculated by taking into account exclusively the coupling of the KK modes to the zero mode. After the crossing, the spectrum obtained by using only the KK-intermode couplings approaches the spectrum of the full coupling case. Both agree for large masses. Thus, for large masses $m_n > m_c$ the production of KK gravitons is dominated by the couplings of the KK modes to each other and is no longer influenced by the evolution of the four-dimensional graviton. This crossing defines the transition between the two regimes mentioned before: for masses $m_n < m_c$ the production of KK gravitons takes place due to their coupling to the zero mode, $M_{i0}$, while it is entirely caused by the intermode couplings $M_{ij}$ for masses $m_n > m_c$.

Decoupling of the evolution of the KK modes from the dynamics of the four-dimensional graviton for large masses implies that KK-spectra obtained for the same maximal velocity are independent of the three-momentum $k$. This is demonstrated in Fig. 19, where we compare spectra obtained for $v_b = 0.1$ and $y_s = 3$ but different $k$. As expected, all spectra converge towards the same behavior for masses $m_n > m_c$.

FIG. 19: Comparison of KK-particle spectra for $y_s = 3$, $v_b = 0.1$ and three-momentum $k = 0.01, 0.03, 0.1$ and $1$, demonstrating the independence of the spectrum on $k$ for large masses. $n_{\max} = 60$ KK modes have been taken into account in the simulations.

FIG. 20: KK-particle spectra for three-momentum $k = 0.1$, maximum brane velocity $v_b = 0.1$ and $y_s = 3$ for $n_{\max} = 40$, obtained for different coupling combinations.

Figure
20 shows KK-particle spectra for $k = 0.1$, $v_b = 0.1$ and $y_s = 3$ obtained for different couplings. This plot visualizes how each particular coupling combination contributes to the production of KK gravitons. It shows, as mentioned before but not shown explicitly, that the $M_{ii}$ coupling, which is the rate of change of the corresponding KK mass [cf. Eqs. (2.41) and (B4)], is not important for the production of KK gravitons. Switching it off does not affect the final graviton spectrum. We also show the result obtained with all couplings but with $\alpha^{+}_{ii}(t) = \omega_{i,k}$ and $\alpha^{-}_{ii}(t) = 0$, i.e. the time dependence of the frequency [cf. Eq. (3.36)] has been neglected. One observes that in this case the spectrum for larger masses is quantitatively slightly different but has an identical qualitative behavior. If, on the other hand, all the couplings are switched off, $M_{\alpha\beta} \equiv 0\ \forall\, \alpha, \beta$, and only the time dependence of the frequency $\omega_{i,k}$ is taken into account, the spectrum changes drastically. Not only is the number of produced gravitons now orders of magnitude smaller, but the spectral tilt also changes. For large masses it behaves as $N_{n,k,\bullet} \propto m_n^{-2}$. Consequently, the time dependence of the graviton frequency itself plays only a minor role in the production of KK gravitons.

FIG. 21: KK-particle spectra for $y_s = 10$, $v_b = 0.1$, $n_{\max} = 100$ and three-momentum $k = 0.01$ and $0.1$ with different couplings taken into account. The thin dashed lines indicate Eq. (6.34) and the thick dashed line Eq. (5.4).

The bottom line is that the main sources of the production of KK gravitons are their coupling to the evolution of the four-dimensional graviton ($M_{i0}$) and their couplings to each other ($M_{ij}$, $i \neq j$) for small and large masses, respectively. Both are caused by the time-dependent boundary condition. The time dependence of the oscillator frequency $\omega_{j,k} = \sqrt{m_j^2(t) + k^2}$ is virtually irrelevant.
Note that this situation is very different from ordinary inflation, where there are no boundaries and particle production is due entirely to the time dependence of the frequency $^{9}$.

$^{9}$ Note, however, that the time-dependent KK mass $m_j(t)$ enters the intermode couplings.

The behavior of the KK-spectrum, in particular the mass $m_c$ at which the KK-intermode couplings start to dominate over the coupling of the KK modes to the zero mode, depends only on the three-momentum $k = |\mathbf{k}|$ and the maximal brane velocity $v_b$. This is now discussed.

In Fig. 21 we show KK-particle spectra for $y_s = 10$, $v_b = 0.1$, $n_{\max} = 100$ and three-momenta $k = 0.01$ and $0.1$. Again, the spectra obtained by taking all the couplings into account are compared to the case where only the coupling to the zero mode is switched on. One observes that for $k = 0.01$ the spectrum is dominated by the coupling of the KK modes to the zero mode up to larger masses than is the case for $k = 0.1$. For $k = 0.01$ the spectrum obtained taking into account $M_{i0}$ only is identical to the spectrum obtained with the full coupling up to $m_n \simeq 10$. For $k = 0.1$, instead, the spectrum is purely zero-mode dominated only up to $m_n \simeq 5$. Hence, the smaller the three-momentum $k$, the larger is the mass range for which the KK-intermode coupling is suppressed and the coupling of the zero mode to the KK modes is the dominant source for the production of KK gravitons.

FIG. 22: KK-particle spectra for three-momentum $k = 0.1$, $y_s = 10$ and maximum brane velocities $v_b = 0.03, 0.1$ and $0.3$ with $n_{\max} = 100$. As in Fig. 21, different couplings have been taken into account; the thin dashed lines indicate Eq. (6.34) and the thick dashed line Eq. (5.4).

As long as the coupling to the zero mode is the primary source of particle production, the spectrum declines with a power law $\propto m_n^{-3}$.
Therefore, in the limiting case $k \to 0$, when the coupling of the zero mode to the KK modes dominates particle production also for very large masses, one has $N^{\rm out}_{n\gg 1,\,k\to 0,\,\bullet} \propto 1/m_n^3$.

Figure 22 shows KK-graviton spectra obtained for the same parameters as in Fig. 21 but for fixed $k = 0.1$ and different maximal brane velocities $v_b$. Again, the spectra obtained by taking all the couplings into account are compared with the spectra to which only the coupling of the KK modes to the zero mode contributes. The mass up to which the spectra obtained with different couplings are identical changes only slightly with the maximal brane velocity $v_b$. Therefore, the dependence of $m_c$ on the velocity is rather weak even if $v_b$ is changed by an order of magnitude, but nevertheless evident.

FIG. 23: KK-particle spectra for three-momentum $k = 0.01, 0.03, 0.1$ and $1$ for $y_s = 3$ and maximum brane velocity $v_b = 0.1$ with different couplings taken into account, where the notation is as in Fig. 22. From the crossing of the $M_{ii} = M_{ij} = 0$ and $M_{ii} = M_{i0} = 0$ results we determine the $k$-dependence of $m_c(k, v_b)$. The thick dashed line indicates Eq. (5.4). (Crossing masses quoted in the panels: 25.54, 14.11, 8.24 and 7.31.)

This behavior of the spectrum can indeed be understood qualitatively. In Section VI C we demonstrate that the coupling strength of the KK modes to the zero mode at the bounce $t = 0$, where production of KK gravitons takes place, is proportional to

$\dfrac{\sqrt{v_b}}{k}$ . (5.2)

The larger this term, the stronger is the coupling of the KK modes to the zero mode, and thus the larger is the mass up to which this coupling dominates over the KK-intermode couplings. Consequently, the mass at which the tilt of the KK-particle spectrum changes depends strongly on the three-momentum $k$ but only weakly on the maximal brane velocity, due to the square-root behavior of the coupling strength. This explains qualitatively the behavior obtained from the numerical simulations.
An approximate expression for $m_c(k, v_b)$ can be obtained from the numerical simulations. In Fig. 23 we depict the KK-particle spectra for three-momentum $k = 0.01, 0.03, 0.1$ and $1$ for $y_s = 3$ and maximum brane velocity $v_b = 0.1$ with different couplings taken into account. The legend is as in Fig. 22. From the crossings of the $M_{ij} = 0$, $i \neq j$ and $M_{ii} = M_{i0} = 0$ results one can determine the $k$-dependence of $m_c$. Note that the spectra are not numerically stable for large masses, but they are stable in the range where $m_c$ lies [cf., e.g., Fig. 25, for $k = 0.1$]. Using the data for $k = 0.01, 0.03$ and $0.1$ one finds $m_c(k, v_b) \propto 1/\sqrt{k}$.

In Fig. 24 KK-graviton spectra are displayed for $k = 0.1$, $y_s = 3$ and maximal brane velocities $v_b = 0.3, 0.2, 0.1, 0.08, 0.05$ and $0.03$ with different couplings taken into account. It is in principle possible to determine the $v_b$-dependence of $m_c$ from the crossings of the $M_{ij} = 0$, $i \neq j$ and $M_{ii} = M_{i0} = 0$ results, as done for the $k$-dependence. However, the values for $m_c$ displayed in the figures indicate that the dependence of $m_c$ on $v_b$ is very weak. From the given data it is not possible to obtain a good fitting formula (such as a simple power law) for the $v_b$-dependence of $m_c$. (In the range $0.1 \leq v_b \leq 0.3$ a very good fit is $m_c = 1.12\,\pi\, v_b^{0.13}/\sqrt{k}$.)

FIG. 24: KK-graviton spectra for three-momentum $k = 0.1$, $y_s = 3$ and maximum brane velocities $v_b = 0.3, 0.2, 0.1, 0.08, 0.05$ and $0.03$ with different couplings taken into account, where the notation is as in Fig. 22. From the crossing of the $M_{ii} = M_{ij} = 0$ and $M_{ii} = M_{i0} = 0$ results we determine the $v_b$-dependence of $m_c$. (Crossing masses quoted in the panels range from 7.52 to 9.50.)

The reason is twofold. First of all, given the complicated coupling structure, it is a priori not clear that a simple power-law dependence exists. Recall that the analytical expression for the particle number, Eq. (6.34), does not have a simple power-law velocity dependence either.
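The scaling of $m_c$ with $k$ can be cross-checked with a log-log least-squares fit. The sketch below assumes that the crossing masses printed in the panels of Fig. 23 (25.54, 14.11, 8.24) belong to $k = 0.01, 0.03, 0.1$ in that order, which is our reading of the figure labels rather than a statement made in the text; the function name `power_law_exponent` is likewise ours.

```python
import math

def power_law_exponent(xs, ys):
    """Least-squares slope of ln(y) versus ln(x), i.e. p in y ~ x**p."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

if __name__ == "__main__":
    k_values = [0.01, 0.03, 0.1]
    m_c_values = [25.54, 14.11, 8.24]  # assumed pairing with k_values
    print(f"fitted exponent: {power_law_exponent(k_values, m_c_values):.2f}")
```

With this pairing the fitted exponent comes out close to $-1/2$, i.e. an inverse square-root dependence of $m_c$ on $k$.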
Moreover, for the number of modes taken into account ($n_{\max} = 40$) the numerical results are not stable enough to resolve the weak dependence of $m_c$ on $v_b$ with sufficient accuracy. (They are, however, accurate enough to resolve the $k$-dependence.) The reason for the slow convergence of the numerics will become clear below. As we shall see, the corresponding energy density is dominated by masses much larger than $m_c$. Consequently, the weak dependence of $m_c$ on $v_b$ is not very important in that respect and does not need to be determined more precisely. However, combining all the data we can give as a fair approximation

$m_c(k, v_b) \simeq \dfrac{\pi\, v_b^{\alpha}}{L\sqrt{kL}}$ , with $\alpha \simeq 0.1$. (5.3)

Taking $\alpha = 0.13$ for $0.1 \leq v_b \leq 0.3$ and $\alpha = 0.08$ for $0.03 \leq v_b \leq 0.1$ fits the given data reasonably well.

As we have seen, as long as the zero mode is the dominant source of KK-particle production, the final KK-graviton spectrum can be approximated by a power-law decrease $m_n^{-3}$. We can combine the presented numerical results to obtain a fitting formula valid in this regime:

$N^{\rm out}_{n\gg 1,\,k\ll 1,\,\bullet} = \dfrac{[\ldots]}{(L m_n)^3}$ , for $[\ldots] < m_n < m_c$. (5.4)

This fitting formula is shown in Figs. 21, 22 and 23 and is in reasonably good agreement with the numerical results. Since Eq. (5.4) together with (5.3) is an important result, we have reintroduced dimensions, i.e. the AdS scale $L$, which is set to one in the simulations, in both expressions.

Let us now investigate the slope of the KK-graviton spectrum for masses $m_n \to \infty$, since it determines the contribution of the heavy KK modes to the energy density. In Fig. 25 we show KK-graviton spectra obtained for three-momentum $k = 0.1$, second brane position $y_s = 3$ and maximal brane velocities $v_b = 0.01, 0.03$ and $0.1$. Up to $n_{\max} = 100$ KK modes have been taken into account in the simulations. One is immediately confronted with the observation that the convergence of the KK-graviton spectra for large $m_n$ is very slow.
This is because those modes, which are decoupled from the evolution of the four-dimensional graviton, are strongly affected by the kink in the brane motion. Recall that the production of light KK gravitons with masses $m_n \ll m_c$ is driven virtually entirely by the evolution of the massless mode. Those light modes are not very sensitive to the discontinuity in the velocity of the brane motion. To be more precise, their primary source of excitation is the evolution of the four-dimensional graviton and not the kink which, as we shall discuss now, is responsible for the production of heavy KK gravitons with $m_n \gg m_c$.

A discontinuity in the velocity will always lead to a divergent total particle number. Arbitrarily high-frequency modes are excited by the kink since the acceleration diverges there. Due to the excitation of KK gravitons of arbitrarily high masses, one cannot expect the numerical simulations to show a satisfactory convergence behavior that would allow one to determine the slope by fitting the data. It is nevertheless possible to give a quantitative expression for the behavior of the KK-graviton spectrum for large masses. The studies of the usual dynamical Casimir effect on a time-dependent interval are very useful for this purpose.

For the usual dynamical Casimir effect it has been shown analytically that a discontinuity in the velocity will lead to a divergent particle number [57, 58]. In Appendix E we discuss in detail the model of a massless real scalar field on a time-dependent interval $[0, y(t)]$ for the boundary motion $y(t) = y_0 + v t$ with $v = {\rm const}$, and present numerical results for final particle spectra (Fig. 34). For this motion it was shown in [58] that the particle spectrum behaves as $\propto v^2/\omega_n$, where $\omega_n = n\pi/y_0$ is the frequency of a massless scalar particle.
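The divergence caused by a velocity discontinuity can be made explicit for the interval model just described. Assuming the spectrum is exactly $N_n = v^2/\omega_n$ with $\omega_n = n\pi/y_0$ (the text only gives the proportionality, so the unit prefactor is an assumption, as is the function name), the total particle number up to mode $n_{\max}$ is a harmonic sum that grows logarithmically with $n_{\max}$:

```python
import math

def total_particle_number(v, y0, n_max):
    """Sum of N_n = v**2 / omega_n with omega_n = n*pi/y0 (prefactor 1 assumed)."""
    return sum(v**2 * y0 / (n * math.pi) for n in range(1, n_max + 1))

if __name__ == "__main__":
    v, y0 = 0.1, 1.0
    for n_max in (10**2, 10**3, 10**4):
        print(n_max, total_particle_number(v, y0, n_max))
```

Each additional decade in $n_{\max}$ adds approximately $(v^2 y_0/\pi)\ln 10$ to the total, so the sum never converges. This is the same mechanism that makes the numerical spectra converge slowly when more modes are included.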
This divergent behavior is due to the discontinuities in the velocity when the motion is switched on and off, which are responsible for the slow convergence of the numerical results shown in Fig. 34 for this scenario. At the kink in the brane motion the total change of the velocity is $2v_b$, similar to the case of the linear motion where the discontinuous change of the velocity is $2v$. Consequently, we may conclude that for large KK masses $m_n \gg m_c$, for which the evolution of the KK modes is no longer affected by their coupling to the four-dimensional graviton, the KK-graviton spectrum behaves as $^{10}$

$N^{\rm out}_{n,k,\bullet} \propto \dfrac{v_b^2}{m_n}$ for $m_n \gg m_c$ . (5.5)

If we assume that the spectrum declines like $1/m_n$ and use that the numerical results for masses $m_n \simeq 20$ are virtually stable, one finds $N^{\rm out}_{n,k,\bullet} \propto v_b^{2.08}/m_n$, which describes the asymptotics of the numerical results well. As for the dynamical Casimir effect for a uniform motion discussed in Appendix E [cf. Fig. 34], the slow convergence of the numerical results towards the $1/m_n$ behavior is clearly visible for large masses $m_n \gg m_c$, which no longer couple to the four-dimensional graviton. This is a strong indication that the final graviton spectrum for large masses indeed behaves like (5.5). It is therefore possible to give a single simple expression for the final KK-particle spectrum for large masses which captures all the features of the spectrum reasonably well, even quantitatively [cf. dashed lines in Fig. 25]:

$N^{\rm out}_{n,k,\bullet} \simeq 0.2\, \dfrac{v_b^2}{\omega^{\rm out}_{n,k}\, y_s}$ for $m_n \gg m_c$ . (5.6)

The $1/y_s$-dependence is compelling. It follows immediately from the considerations on the energy and the scaling behavior discussed above [cf. Figs. 16 and 17]. For completeness we now write $1/\omega^{\rm out}_{n,k}$ instead of the KK mass $m_n$ only, since what matters is the total energy of a mode. Throughout this section this has not been important, since we considered only $k \ll 1$ such that $\omega^{\rm out}_{n,k}$ becomes independent of $k$ for large masses $m_n \gg k$ [cf. Fig. 19].

$^{10}$ Note that the discussion in Appendix E refers to Dirichlet boundary conditions. For the Neumann boundary conditions considered here, the zero mode and its asymmetric coupling certainly play a particular role. However, as we have shown, for large masses only the KK-intermode couplings are important. Consequently, there is no reason to expect that the qualitative behavior of the spectrum for large masses depends on the particular kind of boundary condition.

FIG. 25: KK-particle spectra for $k = 0.1$, $y_s = 3$ and maximal brane velocities $v_b = 0.01, 0.03, 0.1$ up to KK masses $m_n \simeq 100$, compared with a $1/m_n$ decline. The dashed lines indicate the approximate expression (5.6), which describes the asymptotic behavior of the final KK-particle spectra reasonably well, in particular for $v_b < 0.1$. (Curves are shown for $n_{\max} = 40, 60$ and $80$.)

FIG. 26: Evolution of the zero-mode particle number $N_{0,k,\bullet}(t)$ and final KK-graviton spectra $N^{\rm out}_{n,k,\bullet}$ for $y_s = 3$, maximal brane velocity $v_b = 0.1$ and three-momenta $k = 10$ and $30$. The dashed lines in the upper plots indicate Eq. (6.17) (divided by two), demonstrating the value of the number of produced zero-mode gravitons without coupling to the KK modes.

E. Short wavelengths k ≫ 1

For short wavelengths $k \gg 1$ (short compared to the AdS-curvature scale $L$, set to one in the simulations) a completely new and very interesting effect appears. The behavior of the four-dimensional graviton mode changes drastically. We find that the zero mode now couples to the KK gravitons and no longer evolves virtually independently of the KK modes, in contrast to the behavior for long wavelengths. In Fig. 26 we show the evolution of the zero-mode graviton number $N_{0,k,\bullet}(t)$ and final KK-graviton spectra $N^{\rm out}_{n,k,\bullet}$ for $y_s = 3$, maximal brane velocity $v_b = 0.1$ and three-momenta $k = 10$ and $30$.
One observes that the evolution of the four-dimensional graviton depends on the number of KK modes $n_{\max}$ taken into account, i.e. the zero mode couples to the KK gravitons. For $k = 10$ the first 60 KK modes have to be included in the simulation in order to obtain a numerically stable result for the zero mode. In the case of $k = 30$ one already needs $n_{\max} \simeq 100$ in order to achieve numerical stability for the zero mode.

FIG. 27: 4D-graviton number $N_{0,k,\bullet}(t)$ for $k = 3, 5, 10, 20$ and $30$ with $y_s = 3$ and maximal brane velocity $v_b = 0.1$. The small plot shows the final graviton spectrum $N^{\rm out}_{0,k,\bullet}$ together with a fit to the inverse law $a/(k + b)$ [dashed line] and the analytical fitting formula Eq. (6.23) [solid line]. For $k = 10$ and $30$ the corresponding KK-graviton spectra are shown in Fig. 26.

Figure 27 displays the time evolution of the number of produced zero-mode gravitons $N_{0,k,\bullet}(t)$ for $y_s = 3$ and $v_b = 0.1$. For large $k$ the production of massless gravitons takes place only at the bounce, since these short-wavelength modes are sub-horizon right after the bounce. Corresponding KK-particle spectra for $k = 10, 30$ are depicted in Figs. 26 and 28. The inset in Fig. 27 shows the resulting final four-dimensional graviton spectrum $N^{\rm out}_{0,k,\bullet}$, which is very well fitted by an inverse law $N^{\rm out}_{0,k,\bullet} = 0.02/(k - 1.8)$ $^{11}$. Consequently, for $k \gg 1$ the zero-mode particle number $N^{\rm out}_{0,k,\bullet}$ declines only like $1/k$, in contrast to the $1/k^2$ behavior found for $k \ll 1$.

The dependence of $N^{\rm out}_{0,k,\bullet}$ on the maximal brane velocity $v_b$ also changes. In Fig. 28 we show $N_{0,k,\bullet}(t)$ together with the corresponding KK-graviton spectra for $y_s = 3$, $k = 5$ and $10$, in each case for different $v_b$. Using $n_{\max} = 60$ KK modes in the simulations guarantees numerical stability for the zero mode. The velocity dependence of $N^{\rm out}_{0,k,\bullet}$ is not given by a simple power law, as is the case for $k \ll 1$.
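As a quick consistency check of the two fits quoted for the short-wavelength zero-mode spectrum (the inverse law $0.02/(k - 1.8)$ and, in the accompanying footnote, a power law with exponent close to $-1.1$ for $k = 20, 30, 40$), one can compute the effective logarithmic slope of the inverse law over that range. This is a sketch that uses only the fit formula stated in the text, not the simulation data; the function names are ours.

```python
import math

def n_zero_mode_fit(k):
    """Inverse-law fit quoted in the text for y_s = 3, v_b = 0.1: N = 0.02 / (k - 1.8)."""
    return 0.02 / (k - 1.8)

def effective_slope(k_low, k_high):
    """Average logarithmic slope d ln N / d ln k between two momenta."""
    delta_ln_n = math.log(n_zero_mode_fit(k_high)) - math.log(n_zero_mode_fit(k_low))
    return delta_ln_n / (math.log(k_high) - math.log(k_low))

if __name__ == "__main__":
    print(f"slope between k = 20 and 40: {effective_slope(20, 40):.2f}")
```

The effective slope between $k = 20$ and $k = 40$ is slightly steeper than $-1$, which is compatible with the power-law exponent of about $-1.1$ and approaches $-1$ as $k$ grows.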
This is not very surprising since now the zero mode couples strongly to the KK modes [cf. Fig. 26]. For $k = 10$, for example, one finds $N^{\rm out}_{0,k,\bullet} \propto v_b^{1.4}$ if $v_b \lesssim 0.1$.

$^{11}$ The momenta $k = 5, 10, 20, 30$ and $40$ have been used to obtain the fit. Fitting the spectrum for $k = 20, 30$ and $40$ to a power law gives $N^{\rm out}_{0,k,\bullet} \propto k^{-1.1}$.

FIG. 28: Zero-mode particle number $N_{0,k,\bullet}(t)$ and corresponding final KK-particle spectra $N^{\rm out}_{n,k,\bullet}$ for $y_s = 3$, $k = 5, 10$ and different maximal brane velocities $v_b$. $n_{\max} = 60$ guarantees numerically stable solutions for the zero mode.

As in the long-wavelength case, the zero-mode particle number does not depend on the position of the static brane $y_s$, even though the zero mode now couples to the KK modes. This is demonstrated in Fig. 29, where the evolution of the zero-mode particle number $N_{0,k,\bullet}(t)$ and the corresponding KK-graviton spectra with $k = 10$, $v_b = 0.1$ are shown for the two values $y_s = 3$ and $10$. One needs $n_{\max} = 60$ for $y_s = 3$ in order to obtain a stable result for the zero mode, which is not sufficient in the case $y_s = 10$. Only for $n_{\max} \simeq 120$ does the zero-mode solution approach the stable result, which is identical to the result obtained for $y_s = 3$. What is important is not the number of KK modes the four-dimensional graviton couples to, but rather a particular mass $m_{\rm zm} \simeq k$. The zero mode couples to all KK modes of masses below $m_{\rm zm}$, no matter how many KK modes are lighter. Recall that the value of $y_s$ just determines how many KK modes belong to a given mass interval $\Delta m$ since, roughly, $m_n \simeq n\pi/y_s$. The KK-spectra for $k \geq 1$ show the same scaling behavior as demonstrated for long wavelengths in Figs. 16 and 17.

The production of four-dimensional gravitons of short wavelengths takes place at the expense of the KK modes. In Fig.
30 we show the numerical results for the final KK-particle spectra with $v_b = 0.1$, $y_s = 3$ and $k = 3, 5, 10$ and $30$ obtained for different coupling combinations. These spectra should be compared with those shown in Fig. 23 for the long-wavelength case. For $k \gtrsim 10$ the number of produced lightest KK gravitons is smaller in the full coupling case than in the situation where only the KK-intermode coupling is taken into account. In the case $k = 30$, for instance, the numbers of produced gravitons for the first four KK modes are smaller for the full coupling case. This indicates that the lightest KK modes couple strongly to the zero mode. Their evolution is damped and graviton production in those modes is suppressed. The production of zero-mode gravitons, on the other hand, is enhanced compared to the long-wavelength case. For short wavelengths, the evolution of the KK modes therefore contributes to the production of zero-mode gravitons. This may be interpreted as creation of zero-mode gravitons out of KK-mode vacuum fluctuations.

FIG. 29: Zero-mode particle number $N_{0,k,\bullet}(t)$ and corresponding KK-graviton spectra for $k = 10$, $v_b = 0.1$ and second brane positions $y_s = 3$ and $10$.

FIG. 30: Final KK-particle spectra $N^{\rm out}_{n,k,\bullet}$ for $v_b = 0.1$, $y_s = 3$ and $k = 3, 5, 10$ and $30$ and different couplings. Circles correspond to the full coupling case, squares indicate the results if $M_{ij} = M_{ii} = 0$, i.e. no KK-intermode couplings, and diamonds correspond to $M_{i0} = 0$, i.e. no coupling of KK modes to the zero mode.

FIG. 31: Final KK-particle spectra $N^{\rm out}_{n,k,\bullet}$ for $v_b = 0.1$, $y_s = 3$ and $k = 5, 10, 20, 30$ and $40$, plotted against the frequency $\omega$. The dashed lines indicate Eq. (6.35) for $k = 10, 20, 30$ and $40$. For $k \geq 20$, the simple analytical expression (6.35) agrees quite well with the numerical results.

As in the long-wavelength case, the KK-particle spectrum becomes independent of $k$ if $m_n \gg k$ and the evolution of the KK modes is dominated by the KK-intermode coupling. This is visible in Fig. 30 for $k = 3$ and $5$. The bend in the spectrum where the KK-intermode coupling starts to dominate is also observable. For $k = 10$ and $30$ this regime with $m_n \gg k$ is not reached.

As we have shown before, in the regime $m_n \gg k$ the KK-particle spectrum behaves as $1/\omega^{\rm out}_{n,k}$, which will dominate the energy density of produced KK gravitons. If $1 \ll m_n \lesssim k$, however, the zero mode couples to the KK modes and the KK-graviton spectrum does not decay like $1/\omega^{\rm out}_{n,k}$. This is demonstrated in Fig. 31, where the number of produced final-state gravitons $N^{\rm out}_{n,k,\bullet}$ is plotted as a function of their frequency $\omega^{\rm out}_{n,k}$ for parameters $v_b = 0.1$, $y_s = 3$ and $k = 5, 10, 20, 30$ and $40$. While for $k = 5$ the KK-intermode coupling dominates for large masses [cf. Fig. 30], leading to a bend in the spectrum and eventually to a $1/\omega^{\rm out}_{n,k}$ decay, the spectra for $k = 20, 30$ and $40$ show a different behavior. All the modes are still coupled to the zero mode, leading to a power-law decrease $\propto 1/(\omega^{\rm out}_{n,k})^{\alpha}$ with $\alpha \simeq 2$. The case $k = 10$ corresponds to an intermediate regime. Also shown is the simple analytical expression given in Eq. (6.35), which describes the spectra reasonably well for large $k$ (dashed line).

The KK-particle spectra in the region $1 \ll m_n \lesssim k$ will also contribute to the energy density, since the cutoff scale is the same for the integration over $k$ and the summation over the KK tower (see Section VI D below).

FIG. 32: KK-particle spectrum for $y_s = 3$, $v_b = 0.1$ and $k = 0.1$ for the bouncing as well as smooth motions with $t_s = 0.005, 0.015$ and $0.05$, to demonstrate the influence of the bounce. $n_{\max} = 60$ KK modes have been taken into account in the simulations and the result for the kink motion ($t_s = 0$) is shown as well. (The legend also quotes an exponential fit with decay constant 0.1315 in $m_n$.)

F.
A smooth transition

Let us finally investigate how the KK-graviton spectrum changes when the kink motion (2.18) is replaced by the smooth motion (5.1). In Fig. 32 we show the numerical results for the final KK-graviton spectrum for $y_s = 3$, $v_b = 0.1$ and $k = 0.1$ for the smooth motion (5.1) with $t_s = 0.05, 0.015$ and $0.005$. $n_{\max} = 60$ modes have been taken into account in the simulation and the results are compared to the spectrum obtained with the kink motion (2.18). The parameter $t_s$ defines the scale $L_s \simeq 2t_s$ at which the kink is smoothed, i.e. $L_s$ corresponds to the width of the transition from contraction to expansion. The numerical results reveal that KK gravitons of masses smaller than $m_s \simeq 1/L_s$ are not affected, but the production of KK particles of masses larger than $m_s$ is exponentially suppressed. This is particularly evident for $t_s = 0.05$, where the particle spectrum for masses $m_n > 10$ has been fitted to an exponential decrease. Going to smaller values of $t_s$, the suppression of KK-mode production sets in at larger masses. For the example with $t_s = 0.005$ the KK-particle spectrum is identical to the one obtained with the kink motion within the depicted mass range. In this case the exponential suppression of particle production sets in only for masses $m_n > 100$.

Note that the exponential decay of the spectrum for the smooth transition from contraction to expansion also shows that no additional spurious effects occur due to the discontinuities in the velocity when switching the brane dynamics on and off. Consequently, $t_{\rm in}$ and $t_{\rm out}$ are appropriately chosen.

VI. ANALYTICAL CALCULATIONS AND ESTIMATES

A. The zero mode: long wavelengths k ≪ 1/L

The numerical simulations show that the evolution of the zero mode at large wavelengths is not affected by the KK modes. To find an analytical approximation to the numerical result for the zero mode, we neglect all the couplings of the KK modes to the zero mode by setting $M_{ij} = 0\ \forall\, i, j$ and keeping $M_{00}$ only.
Then only the evolution equation for $\epsilon^{(0)}_{\alpha} \equiv \delta_{\alpha 0}\,\epsilon$ is important; it decouples and reduces to

$\ddot{\epsilon} + [k^2 + V(t)]\,\epsilon = 0$ , (6.1)

with "potential"

$V = \dot{M}_{00} - M_{00}^2$ . (6.2)

The corresponding vacuum initial conditions are [cf. Eqs. (3.21), (3.22); here we do not consider the unimportant phase]

$\lim_{t\to -\infty} \epsilon = 1$ , $\lim_{t\to -\infty} \dot{\epsilon} = -ik$ . (6.3)

A brief calculation using the expression for $M_{00}$ (cf. Appendix B) leads to

V = y y2s − y2b 3y2b − 2y2s y2s − y2b (6.4) = − y y2s − y2b y2s − y2b . (6.5)

If one assumes that the static brane is much further away from the Cauchy horizon than the physical brane, $y_s \gg y_b$, it is simply

$V = -H^2 - \dot{H}$ , (6.6)

and one recovers Eq. (2.50). For the particular scale factor (2.17) one obtains

$H = \dfrac{\dot{a}}{a} = \dfrac{{\rm sgn}(t)}{|t| + t_b}$ and (6.7)

$\dot{H} = \dfrac{2\delta(t)}{t_b} - \dfrac{1}{(|t| + t_b)^2}$ (6.8)

such that

$\dot{H} + H^2 = \dfrac{2\delta(t)}{t_b}$ . (6.9)

The $\delta$-function in the last equation models the bounce. Without the bounce, i.e. for an eternally radiation-dominated dynamics, one has $V = 0$ and the evolution equation for $\epsilon$ would be trivial. With the bounce, the potential is just a delta-function potential with "height" proportional to $-2\sqrt{v_b}/L$,

$V = -\dfrac{2\sqrt{v_b}}{L}\,\delta(t)$ , (6.10)

where $v_b$ is given in Eq. (2.20). Equation (6.1) with potential (6.10) can be considered as a Schrödinger equation with a $\delta$-function potential. Its solution is a classical textbook problem. Since the approximated potential $V$ vanishes for all $t < 0$ one has, with the initial condition (6.3),

$\epsilon(t) = e^{-ikt}$ , $t < 0$ . (6.11)

Assuming continuity of $\epsilon$ through $t = 0$ and integrating the differential equation over a small interval $t \in [0_-, 0_+]$ around $t = 0$ gives

$0 = \int_{0_-}^{0_+} dt \left[\ddot{\epsilon} + \left(k^2 - \dfrac{2\sqrt{v_b}}{L}\,\delta(t)\right)\epsilon\right]$ (6.12)

$\phantom{0} = \dot{\epsilon}(0_+) - \dot{\epsilon}(0_-) - \dfrac{2\sqrt{v_b}}{L}\,\epsilon(0)$ . (6.13)

The jump of the derivative $\dot{\epsilon}$ at $t = 0$ leads to particle creation. Using $\epsilon(0_+) = \epsilon(0) = \epsilon(0_-)$ and $\dot{\epsilon}(0_+) = \dot{\epsilon}(0_-) + \frac{2\sqrt{v_b}}{L}\,\epsilon(0)$ as initial conditions for the solution for $t > 0$, one obtains

$\epsilon(t) = A e^{-ikt} + B e^{ikt}$ , $t > 0$ (6.14)

$A = 1 + i\,\dfrac{\sqrt{v_b}}{kL}$ , $B = -i\,\dfrac{\sqrt{v_b}}{kL}$ .
(6.15)

The Bogoliubov coefficient $B_{00}$ after the bounce is then given by

$B_{00}(t \geq 0) = \dfrac{e^{-ikt}}{2}\left[\left(1 + i\,\dfrac{H}{k}\right)\epsilon(t) - \dfrac{i}{k}\,\dot{\epsilon}(t)\right]$ (6.16)

where we have used that $M_{00} = -H$ if $y_s \gg y_b$. At this point the importance of the coupling matrix $M_{00}$ becomes obvious. Even though the solution $\epsilon$ to the differential equation (6.1) is a plane wave right after the bounce, $|B_{00}(t)|^2$ is not a constant due to the motion of the brane itself. Only once the mode is inside the horizon, i.e. $H/k \ll 1$, is $|B_{00}(t)|^2$ constant, and the number of generated final-state gravitons (for both polarizations) is given by

$N^{\rm out}_{0,k} = 2\,|B_{00}(kt \gg 1)|^2 = \dfrac{1}{2}\left[|\epsilon|^2 + \dfrac{|\dot{\epsilon}|^2}{k^2}\right] - 1 = \dfrac{2 v_b}{(kL)^2}$ (6.17)

where we have used that the Wronskian of $\epsilon$, $\epsilon^*$ is $2ik$. As illustrated in Fig. 9, the expression (6.17) is indeed in excellent agreement with the (full) numerical results: not only the $k$-dependence but also the amplitude agrees without any fudge factor. The evolution of the four-dimensional graviton mode and the associated generation of massless gravitons with momentum $k < 1/L$ can therefore be understood analytically.

Note that the approximation employed here is only valid if $y_s^2 - y_b(0)^2 \gg y_b(0)^2$. In the opposite limit, $\Delta y \equiv y_s - y_b(0) \ll y_b(0)$, one can also derive an analytical approximation along the same lines. For $k \leq 1/\Delta y$ one obtains instead of Eq. (6.17)

$N^{\rm out}_{0,k} = \dfrac{[\ldots]}{2(k\Delta y)^2}$ , (6.18)

if $\Delta y \equiv y_s - y_b(0) \ll y_b(0)$ , $k\Delta y \lesssim 1$ .

In order to calculate the energy density, we have to take into account that the approximation of an exactly radiation-dominated Universe with an instant transition breaks down on small scales. We assume this breakdown to occur at the string scale $L_s$, much smaller than $L$ [cf. Eqs. (2.14), (2.15)]. $L_s$ is the true width of the transition from collapse to expansion, which we have set to zero in the treatment. Modes with mode numbers $k \gg (2\pi)/L_s$ will not "feel" the potential and are not generated. We therefore choose $k_{\max} = (2\pi)/L_s$ as the cutoff scale.
Then, with Eq. (4.21), one obtains for the energy density

2 π2a4 ∫ 2π/Ls dkk3N0,k . (6.19)

For small wave numbers, $k < 1/L$, we can use the above analytical result for the zero-mode particle number. However, as the numerical simulations have revealed, as soon as $k \gtrsim 1/L$ the coupling of the four-dimensional graviton to the KK modes becomes important, and for large wave numbers $N^{\rm out}_{0,k}$ decays only like $1/k$. Hence the integral (6.19) is entirely dominated by the upper cutoff. The contributions from long wavelengths to the energy density are negligible.

For the power spectrum, on the other hand, we are interested in cosmologically large scales, $1/k \simeq$ several Mpc or more, and not in the short wavelengths $kL \gg 1$ that dominate the energy density. Inserting the expression for the number of produced long-wavelength gravitons (6.17) into (4.11), the gravity wave power spectrum at late times becomes

P0(k) = (2π)3 for kt ≫ 1. (6.20)

This is the asymptotic power spectrum, when $\epsilon$ starts oscillating, hence inside the Hubble horizon, $kt \gg 1$. On super-Hubble scales, $kt \ll 1$, when the asymptotic out-state of the zero mode is not yet reached, one may use Eq. (4.10) with

$R_{0,k}(t) = |\epsilon(t)|^2 - 1 \simeq 4 v_b a^2$ . (6.21)

For the $\simeq$ sign we assume $t \gg L$ and $t \gg t_b$, so that one may neglect terms of order $t/L$ in comparison to $\sqrt{v_b}\,(t/L)^2$. We have also approximated $a = (t + t_b)/L \simeq t/L$. Inserting this in Eq. (4.8) yields

P0(k) = 2 , kt ≪ 1 . (6.22)

Both expressions (6.20) and (6.22) are in very good agreement with the corresponding numerical results, see Figs. 9, 10 and 11.

B. The zero mode: short wavelengths k ≫ 1/L

As we have demonstrated with the numerical analysis, as soon as $k \gtrsim 1/L$ the coupling of the zero mode to the KK modes becomes important, and for large wave numbers $N^{\rm out}_{0,k,\bullet} \propto 1/k$. We obtain a good asymptotic behavior for the four-dimensional graviton spectrum if we set

$N^{\rm out}_{0,k,\bullet} \simeq \dfrac{v_b}{5\,(kL)}$ . (6.23)

This function and Eq. (6.17) (divided by two for one polarization) meet at $kL = 5$.
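The matching of the two zero-mode approximations and the dominance of the upper cutoff can be checked with a few lines of arithmetic. The sketch below assumes the per-polarization forms $N = v_b/(kL)^2$ for long wavelengths (Eq. (6.17) divided by two) and $N = v_b/(5kL)$ for short wavelengths (Eq. (6.23)), sets $L = 1$, drops all prefactors of the energy integral, and applies the long-wavelength form all the way up to the matching point. The cutoff value and the function names are hypothetical.

```python
def n_long(k, v_b):
    """Long-wavelength zero-mode number per polarization, assumed form v_b / k**2 (L = 1)."""
    return v_b / k**2

def n_short(k, v_b):
    """Short-wavelength fit, assumed form v_b / (5 k) (L = 1)."""
    return v_b / (5.0 * k)

def energy_integral_parts(k_max, k_match=5.0):
    """Closed-form pieces of the integral of k^3 N(k) dk, divided by v_b.

    Below k_match: integral of k dk = k_match^2 / 2.
    Above k_match: integral of k^2 / 5 dk = (k_max^3 - k_match^3) / 15.
    """
    low = k_match**2 / 2.0
    high = (k_max**3 - k_match**3) / 15.0
    return low, high

if __name__ == "__main__":
    v_b = 0.1
    print(n_long(5.0, v_b), n_short(5.0, v_b))       # the two forms agree at k L = 5
    low, high = energy_integral_parts(k_max=1.0e3)   # hypothetical cutoff 2*pi/L_s
    print(f"fraction from k < 5: {low / (low + high):.2e}")
```

Because the high-momentum piece grows like $k_{\max}^3$, the long-wavelength contribution becomes a negligible fraction of the total as soon as the cutoff lies well above $kL = 5$.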
Even though the approxi- mation is not good in the intermediate regime it is very reasonable for large k [cf. Fig. 27]. Inserting this approximation into Eq (6.19) for the energy density, one finds that the integral is dominated entirely by the upper cutoff, i.e. by the blue, high energy modes: . (6.24) The power spectrum associated with the short wave- lengths k ≫ 1/L is not of interest since the gravity wave spectrum is measured on cosmologically large scales only, k ≪ 1/L. C. Light Kaluza-Klein modes and long wavelengths k ≪ 1/L The numerics indicates that light (mn < 1) long wave- length KK modes become excited mainly due to their coupling to the zero mode. Let us take only this cou- pling into account and neglect also the time-dependence of the frequency, setting ωn,k(t) ≡ ωoutn,k = ωinn,k since it plays an inferior role as shown by the numerics. The Bogoliubov coefficients are then determined by the equations ξ̇n,k + iω n,kξn,k = 2ωoutn,k Sn(t; k) (6.25) η̇n,k − iωoutn,kηn,k = − 2ωoutn,k Sn(t; k) (6.26) with the “source” Sn(t; k) = (ξ0 − η0)Mn0 . (6.27) We have defined ξn,k ≡ ξ(0)n,k, ηn,k ≡ η n,k, ξ0 ≡ ξ 0,k, and η0 ≡ η(0)0,k. This source is known, since the evolution of the four-dimensional graviton is know. From the result for ǫ above and the definition of ξ0 and η0 in terms of ǫ and ǫ̇ one obtains ξ0 − η0 = −ik + 1|t|+ tb e−itk , t < 0 (6.28) ξ0 − η0 = 2 1− iktb k2tb(t+ tb) e−itk k2tb(t+ tb) eitk , t > 0 .(6.29) Furthermore, if ys ≫ yb, one has [cf. Eq. (B3)] Mn0 = 2 Y1(mnys)2 Y1(mnyb)2 − Y1(mnys)2 . (6.30) Assuming ysmn ≫ 1 and ybmn ≪ 1 one can expand the Bessel functions and arrives at Mn0 ≃ ẏb = − πmnL2 L sgn(t) (|t|+ tb)2 To determine the number of created final state gravi- tons we only need to calculate ηn,k [cf. Eq. 
(3.32) with ∆+n,k(|t| → ∞) = 1 and ∆ n,k(|t| → ∞) = 0], N outn,k,• = |B0n,k(tout)|2 = ωoutn,k |ηn,k|2 (6.31) The vacuum initial conditions require limt→−∞ ηn,k = 0 so that ηn,k is given by the particular solution ηn,k(t) = ωoutn,k ′; k)e−it ′ωoutn,kdt′ , (6.32) and therefore N outn,k,• = 4ωoutn,k Sn(t; k)e −itωout n,kdt (6.33) where the integration range has been extended from −∞ to +∞ since the source is very localized around the bounce. This integral can be solved exactly. A some- what lengthy but straight forward calculation gives N outn,k,• = πm5nL 2ωoutn,kkys ∣∣∣2iRe +k)tbE1(i(ω n,k + k)tb) +(ktb) −1ei(ω n,k−k)tbE1(i(ω n,k − k)tb) −ei(ω +k)tbE1(i(ω n,k + k)tb) . (6.34) Here E1 is the exponential integral, E1(z) ≡∫∞ t−1e−tdt . This function is holomorphic in the com- plex plane with a cut along the negative real axis, and the above expression is therefore well defined. Note that this expression does not give rise to a simple dependence of N outn,k on the velocity vb = (L/tb)2. In the preceding section we have seen that, within its range of validity, Eq. (6.34) is in excellent agreement with the numerical results (cf., for instance, Figs. 12 and 13). As already mentioned before, this excellent agreement between the numerics and the analytical approximation demonstrates that the numerical results are not contam- inated by any spurious effects. D. Kaluza-Klein modes: asymptotic behavior and energy density The numerical simulations show that the asymptotic KK-graviton spectra (i.e. for masses mn ≫ 1) decay like 1/ωoutn,k if mn ≫ k and like 1/ωoutn,k with α ≃ 2 if mn <∼ k. The corresponding energy density on the brane is given by the summation of Eq. (4.23) over all KK modes up to the cutoff. Since the mass mn is simply the momentum into the extra dimension, it is plausible to choose the same cutoff scale for both, the k-integral and the summation over the KK modes, namely 2π/Ls. 
The main contribution to the four-dimensional particle density and energy density comes from mn ∼ 2π/Ls and k ∼ 2π/Ls, i.e. the blue end of the spectrum. The large-frequency behavior of the final KK-spectrum can be approximated by N outn,k,• ≃ 0.2v2b   ωoutn,k if 1/L <∼ k <∼ mn 2(α−1)/2 (ωoutn,k) if mn <∼ k <∼ 2π/Ls (6.35) with α ≃ 2 which is particularly good for large k. Both expression match at mn = k and are indicated in Figures 25 and 31 as dashed lines. Given the complicated coupling structure of the problem and the multitude of features visible in the particle spectra, these compact expressions describe the numerical results reasonable well for all parameters. The deviation from the numer- ical results is at most a factor of two. This accuracy is sufficient in order to obtain a useful expression for the energy density from which bounds on the involved energy scales can be derived. The energy density on the brane associated with the KK gravitons is given by [cf. Eq. (4.23)] ρKK ≃ πa6ys dkk2 N outn,k,• ωoutn,k mn . (6.36) Splitting the momentum integration into two integrations from 0 to mn and mn to the cutoff 2π/Ls, and replacing the sum over the KK masses by an integral one obtains ρKK ≃ C(α) π5v2b . (6.37) The power α in Eq. (6.35) enters the final result for the energy density only through the pre-factor C(α) which is of order unity. VII. DISCUSSION The numerical simulations have revealed many inter- esting effects related to the interplay between the evolu- tion of the four-dimensional graviton and the KK modes. All features observed in the numerical results have been interpreted entirely on physical grounds and many of them are supported by analytical calculations and ar- guments. Having summarized the results for the power spectrum and energy densities in the preceding section, we are now in the position to discuss the significance of these findings for brane cosmology. A. 
The zero mode For the zero-mode power-spectrum we have found that P0(k) = k2 if kt ≪ 1 (La)−2 if kt ≫ 1 . (7.1) Therefore, the gravity wave spectrum on large, super Hubble scales is blue with spectral tilt nT = 2 , (7.2) a common feature of ekpyrotic and pre-big-bang models. The amplitude of perturbations on scales at which fluctuations of the Cosmic Microwave Background (CMB) are observed is of the order of (H0/mPl) 2, i.e. very suppressed on scales relevant for the anisotropies of the CMB. The fluctuations induced by these Casimir gravitons are much too small to leave any observable imprint on the CMB. For the zero-mode energy density at late times, kt ≫ 1, we have obtained [cf Eq. (6.24)] ρh0 ≃ . (7.3) In this section we denote the energy density of the zero mode by ρh0 in order not to confuse it with the 12 Note that even the transition from the summation over the KK- tower to an integration according to (4.33) “eats up” the 1/ys term in (6.36), the final energy density (6.37) depends on ys since it explicitly enters the particle number. present density of the Universe. Recall that Ls is the scale at which our kinky approximation (2.17) of the scale factor breaks down, i.e. the width of the bounce. If this width is taken to zero, the energy density of gravitons is very blue and diverges. This is not so surprising, since the kink in a(t) leads to the generation of gravitons of arbitrary high energies. However, as the numerical simulations have shown, when we smooth the kink at some scale Ls, the production of modes with energies larger than ≃ 1/Ls is exponentially suppressed [cf. Fig. 32]. This justifies the introduction of Ls as a cutoff scale. In the following we shall determine the density pa- rameter of the generated gravitons today and compare it to the Nucleosynthesis bound. For this we need the quantities ab given in Eq (2.20) and Here ab is the minimal scale factor andHb is the maximal Hubble parameter, i.e. the Hubble parameter right after the bounce. 
(Recall that in the low energy approximation t = η.) During the radiation era, curvature and/or a cos- mological constant can be neglected so that the density ρrad = a−4 = . (7.4) In order to determine the density parameter of the gen- erated gravitons today, i.e., at t = t0, we use Ωh0 = ρh0(t0) ρcrit(t0) ρh0(t0) ρrad(t0) ρrad(t0) ρcrit(t0) ρh0(t0) ρrad(t0) Ωrad. (7.5) The second factor Ωrad is the present radiation density parameter. For the factor ρh0/ρrad, which is time inde- pendent since both ρh0 and ρrad scale like 1/a 4, we insert the above results and obtain Ωh0 = Ωrad = Ωrad (7.6) Ωrad . (7.7) The nucleosynthesis bound [14] requests that Ωh0 <∼ 0.1Ωrad , (7.8) which translates into the relation (LPl/Ls) (L/Ls) <∼ 0.1 (7.9) which, at first sight, relates the different scales involved. But since we have chosen the cutoff scale Ls to be the higher-dimensional fundamental scale (string scale), Equation (7.9) reduces to vb <∼ 0.2 (7.10) by virtue of Equation (2.15). All one has to require to be consistent with the nucleosynthesis bound is a small brane velocity which justifies the low energy approach. In all, we conclude that the model is not severely con- strained by the zero mode. This result itself is remark- able. If there would be no coupling of the zero mode to the KK modes for small wavelengths the number of produced high energy zero-mode gravitons would behave as ∝ k−2 as it is the case for long wavelengths. The production of high energy zero-mode gravitons from KK gravitons enhances the total energy density by a factor of about L/Ls. Without this enhancement, the nucleosyn- thesis bound would not lead to any meaningful constraint and would not even require vb < 1. B. The KK modes As derived above, the energy density of KK gravitons on the brane is dominated by the high energy gravitons and can be approximated by [cf. Eq. (6.37)] ρKK ≃ π5v2b . 
(7.11)

Let us evaluate the constraint that follows from requiring that the KK energy density on the brane be smaller than the radiation density, $\rho_{\rm KK}(t) < \rho_{\rm rad}(t)$, at all times. If this is not satisfied, back-reaction cannot be neglected and our results are no longer valid. Clearly, this condition is more stringent at early times than at late times, since $\rho_{\rm KK}$ decays faster than $\rho_{\rm rad}$. Inserting the value of the scale factor directly after the bounce, where the production of KK gravitons takes place, $a_b^{-2} = v_b$, and using again the RS fine-tuning condition (2.15), one finds

$$\frac{\rho_{\rm KK}}{\rho_{\rm rad}} \simeq 100\, v_b^3\, \frac{L}{y_s}\left(\frac{L}{L_s}\right)^2 . \qquad (7.12)$$

If we use the largest value of the brane velocity admitted by the nucleosynthesis bound, $v_b \simeq 0.2$, and require that $\rho_{\rm KK}/\rho_{\rm rad}$ be (much) smaller than one for back-reaction effects to be negligible, we obtain the very stringent condition

$$\frac{y_s}{L} > \left(\frac{L}{L_s}\right)^2 . \qquad (7.13)$$

Let us first discuss the largest allowed value, $L \simeq 0.1\,$mm. The RS fine-tuning condition (2.15) then determines $L_s = (L L_{\rm Pl}^2)^{1/3} \simeq 10^{-22}\,{\rm mm} \simeq 1/(10^6\,{\rm TeV})$. In this case the brane tension is $T = 6\kappa_4/\kappa_5^2 = 6L_{\rm Pl}^2/L_s^6 = 6/(L L_s^3) \sim (10\,{\rm TeV})^4$. Furthermore, we have $(L/L_s)^2 \simeq 10^{42}$, so that $y_s > L(L/L_s)^2 \simeq 10^{41}\,{\rm mm} \simeq 3 \times 10^{15}\,$Mpc, which is about 12 orders of magnitude larger than the present Hubble scale. Also, since $y_b(t) \ll L$ in the low energy regime and $y_s \gg L$ according to the inequality (7.13), the physical brane and the static brane are very far apart at all times. Note that the distance between the physical and the static brane is $d_y = L \log(y_s/y_b) \gtrsim L \gg L_s$.

This situation is probably not very realistic. Some high energy, stringy effects are needed to provoke the bounce, and one expects these to be relevant only when the branes are sufficiently close, i.e. at a distance of order $L_s$. But in this case the constraint (7.13) is violated, which implies that back-reaction is relevant. On the other hand, if one wants $y_s \simeq L$ and back-reaction to remain unimportant, then Eq.
(7.12) implies that the bounce velocity has to be exceedingly small, vb <∼ 10−15. A way out of this conclusion is to assume that the brane distance at the bounce, ∆y = ys − yb(0), becomes of the order of the cutoff Ls or smaller. Then the pro- duction of KK gravitons is suppressed. However, then the approximation (6.18) has to be used to determine the energy density of zero-mode gravitons which then becomes ρh0 ≃ (Ls∆y) Setting ∆y ≃ Ls, the nucleosynthesis bound, ρh0 <∼ 0.1ρrad, then yields the much more stringent limit on the brane velocity, v2b < . (7.14) One might hope to find a way out of these conclusions by allowing the bounce to happen in the high energy regime. But then vb ≃ 1 and the nucleosynthesis bound is violated since too many zero-mode gravitons are pro- duced. Even if one disregards this limit for a moment, saying that the calculation presented here only applies in the low energy regime, vb ≪ 1, the modification coming from the high energy regime are not expected to allevi- ate the bounds. In the high energy regime one may of course have yb(t) ≫ L and therefore the physical brane can approach the static brane arbitrarily closely without the latter having to violate (7.13). Those results suggest that even in the scenario of a bounce at low energies, the back reaction from KK gravitons has to be taken into account. But this does not need to exclude the model. VIII. CONCLUSIONS We have studied the evolution of tensor perturbations in braneworld cosmology using the techniques developed for the standard dynamical Casimir effect. A model consisting of a moving and a fixed 3-brane embedded in a five-dimensional static AdS bulk has been considered. Applying the dynamical Casimir effect formulation to the study of tensor perturbations in braneworld cosmology represents an interesting alternative to other approaches existing in the literature so far and provides a new perspective on the problem. 
The explicit use of coupling matrices allows us to obtain detailed information about the effects of the intermode couplings generated by the time-dependent boundary conditions, i.e. the brane motion. Based on the expansion of the tensor perturbations in instantaneous eigenfunctions, we have introduced a consistent quantum mechanical formulation of graviton production by a moving brane. Observable quantities like the power spectrum and energy density can be deduced directly from quantum mechanical expectation values, in particular the number of gravitons created from vacuum fluctuations.

The most surprising and at the same time most interesting fact which this approach has revealed is that the energy density of the massive gravitons decays like $1/a^6$ with the expansion of the Universe. This is a direct consequence of the localization of gravity: five-dimensional aspects of it, like the KK gravitons, become less and less 'visible' on the brane with the expansion of the Universe. The $1/a^6$ scaling behavior remains valid when the fixed brane is sent off to infinity and one ends up with a single braneworld in AdS, as in the original RS II scenario. Consequently, KK gravitons on a brane moving through an AdS bulk cannot play the role of dark matter.

As an explicit example, we have studied graviton production in a generic, ekpyrotic-inspired model of two branes bouncing at low energies, assuming that the energy density on the moving brane is dominated by a radiation component. The numerical results have revealed a multitude of interesting effects. For long wavelengths, $kL \ll 1$, the zero mode evolves virtually independently of the KK modes. Zero-mode gravitons are generated by the self coupling of the zero mode to the moving brane. For the number of produced massless gravitons we have found the simple analytical expression $2v_b/(kL)^2$. These long wavelength modes are the ones which are of interest for the gravitational wave power spectrum.
As one expects for an ekpyrotic scenario, the power spectrum is blue on super-horizon scales with spectral tilt $n_T = 2$. Hence, the spectrum of these Casimir gravitons has much too little power on large scales to affect the fluctuations of the cosmic microwave background.

The situation changes completely for short wavelengths, $kL \gg 1$. In this wavelength range, the evolution of the zero mode couples strongly to the KK modes. Production of zero-mode gravitons takes place at the expense of KK-graviton production. The numerical simulations have revealed that the number of produced short-wavelength massless gravitons is given by $2v_b/(5kL)$. It decays only like $1/k$ instead of the $1/k^2$ behavior found for long wavelengths. These short wavelength gravitons dominate the energy density. Comparing the energy density with the nucleosynthesis bound and taking the cutoff scale to be the string scale $L_s$, we have shown that the model is not constrained by the zero mode. As long as $v_b \lesssim 0.2$, i.e. for a low energy bounce, the nucleosynthesis bound is not violated.

More stringent bounds on the model come from the KK modes. Their energy density is dominated by the high energy modes, which are produced due to the kink that models the transition from contraction to expansion. Imposing the reasonable requirement that the energy density of the KK modes on the brane be (much) smaller than the radiation density at all times, so that back reaction effects are negligible, has led to two cases. On the one hand, allowing the largest values for the AdS curvature scale, $L \simeq 0.1\,$mm, and the bounce velocity, $v_b \simeq 0.2$, back reaction can only be neglected if the fixed brane is very far away from the physical brane, $y_s \sim 10^{41}\,$mm. As we have argued, this is not very realistic, since some high energy, stringy effects provoking the bounce are expected to be relevant only when the branes are sufficiently close, i.e. $y_s \sim L_s$.
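The orders of magnitude quoted for this first case can be checked with a few lines of arithmetic. The sketch below assumes that the back-reaction ratio of Eq. (7.12) has the form $\rho_{\rm KK}/\rho_{\rm rad} \simeq 100\,v_b^3\,(L/y_s)(L/L_s)^2$, which is consistent with the numbers given in the text, and uses $L_s \simeq 10^{-22}\,$mm as stated there. The function names and the conversion constant are illustrative.

```python
MM_PER_MPC = 3.0857e25  # millimetres in one megaparsec


def kk_to_radiation_ratio(vb, L, Ls, ys):
    """Assumed form of Eq. (7.12): rho_KK / rho_rad right after the bounce."""
    return 100.0 * vb**3 * (L / ys) * (L / Ls) ** 2


def min_static_brane_position(vb, L, Ls):
    """Smallest y_s (same units as L) for which the assumed ratio drops below one."""
    return 100.0 * vb**3 * L * (L / Ls) ** 2


def max_bounce_velocity(L, Ls, ys):
    """Largest v_b for which the assumed ratio stays below one at given y_s."""
    return (ys / (100.0 * L * (L / Ls) ** 2)) ** (1.0 / 3.0)


if __name__ == "__main__":
    L, Ls = 0.1, 1e-22  # AdS curvature scale and string scale in mm
    ys_min = min_static_brane_position(0.2, L, Ls)
    print("y_s >", ys_min, "mm =", ys_min / MM_PER_MPC, "Mpc")
    print("v_b <", max_bounce_velocity(L, Ls, ys=L), "for y_s = L")
```

With these inputs the first case gives $y_s$ of order $10^{41}\,$mm, i.e. a few times $10^{15}\,$Mpc, while setting $y_s = L$ gives a bounce velocity of order $10^{-15}$.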
On the other hand, requiring only that $y_s \simeq L \gg L_s$, the bounce velocity already has to be exceedingly small, $v_b \lesssim 10^{-15}$, for back reaction to be unimportant. Therefore, one of the main conclusions to take away from this work is that back reaction of massive gravitons has to be taken into account for a realistic bounce.

Many of the results presented here are based on numerical calculations. However, since the approach used makes it possible to artificially switch the mode couplings on and off, we were able to identify the primary sources driving the time evolution of the perturbations in different wavelength and KK mass ranges. This has allowed us to understand many of the features observed in the numerical results on analytical grounds. On the other hand, it is fair to say that most of the presented results rely on the low energy approach, i.e. on the approximation of the junction condition (generalized Neumann boundary condition) by a Neumann boundary condition. Even though we have given arguments for the validity of this approximation, it eventually has to be confirmed by calculations which take the exact boundary condition into account. This is the subject of future work.

Acknowledgment

We thank Cyril Cartier, who participated in the early stages of this work, and Kazuya Koyama and David Langlois for discussions. We are grateful for the use of the 'Myrinet' cluster of Geneva University, on which most of the quite intensive numerical computations have been performed. This work is supported by the Swiss National Science Foundation.

APPENDIX A: VARIATION OF THE ACTION

Let us consider the variation of the action (2.27) with respect to $h_\bullet$. It is sufficient to study the action for a fixed wave number $k$ and polarization $\bullet$,

$$S_{h_\bullet}(k) = \int dt \int_{y_b(t)}^{y_s} \frac{dy}{y^3} \left[ |\partial_t h_\bullet|^2 - |\partial_y h_\bullet|^2 - k^2 |h_\bullet|^2 \right] , \qquad (A1)$$

where we omit the normalization factor $L^3/\kappa_5$ as well as the factor two related to $Z_2$ symmetry. The variation of (A1) reads

$$\delta S_{h_\bullet}(k) = \int_T dt \int_{y_b(t)}^{y_s} \frac{dy}{y^3} \left[ (\partial_t h_\bullet)(\partial_t \delta h_\bullet^*) - (\partial_y h_\bullet)(\partial_y \delta h_\bullet^*) - k^2 h_\bullet \delta h_\bullet^* + {\rm h.c.} \right] . \qquad (A2)$$
Here, T denotes a time interval within the variation is performed and it is assumed in the following that the variation vanishes at the boundaries of the time interval T . Performing partial integrations and demanding that the variation of the action vanishes leads to 0 = (A3) yb(t) − ∂2t h• + y3 − k2h• [(v∂t + ∂y)h•] δh •|yb(t) − (∂yh•)δh with v = dyb(t)/dt. The first term in curly brackets is the wave operator (2.24). In order for h• to satisfy the free wave equation (perturbation equation) (2.24) the term in curly brackets in the second integral has to vanish. Allowing for an evolution of h• on the branes, i.e. in general δh•|brane 6= 0, enforces the boundary conditions (v∂t + ∂y)h•|yb(t) = 0 and ∂yh•|ys = 0 , (A4) hence, the junction condition (2.26). Consequently, any other boundary conditions than (A4) are not compati- ble with the free perturbation equation (2.24) under the influence of a moving brane (provided δh• 6= 0 at the branes). APPENDIX B: COUPLING MATRICES The use of several identities of Bessel functions leads M00 = ŷb y2s − y2b , (B1) M0j = 0 , (B2) Mi0 = φ0 = ŷb y2s − y2b , (B3) Mii = m̂i , (B4) Mij = M ij +M ij (B5) MAij = (ŷb + m̂i)yb 2m2iNiNj m2j −m2i × (B6) × [ys C2(mjys)J1(miys)− yb C2(mjyb)J1(miyb)] where J1(mi y) = [J2(miyb)Y1(miy)− Y2(miyb)J1(miy)] MNij = NiNjmim̂i dyy2C1(miy)C2(mjy). (B8) This integral has to be solved numerically. Note that, because of the boundary conditions, one has the identity dyy2C1(miy)C2(mjy) = − dyy2C1(miy)C0(mjy). Furthermore, one can simplify J1(mi yb) = πmiyb , J1(mi ys) = πmiyb Y1(miys) Y1(miyb) (B10) where the limiting value has to be taken for the last term whenever Y1(miyb) = Y1(miys) = 0. APPENDIX C: ON POWER SPECTRUM AND ENERGY DENSITY CALCULATION 1. Instantaneous vacuum In Section III the in - out state approach to particle creation has been presented. The definitions of the in - and out- vacuum states Eq. (3.9) are unique and the particle concept is well defined and meaningful. 
If we interpret tout as a continuous time variable t, we can write the Bogoliubov transformation Eq. (3.24) âα,k,•(t) = Aβα,k(t)âinβ,k,• + B∗βα,k(t)â β,−k,• where at any time we have introduced a set of operators {âα,k•(t), â†α,k,•(t)}. Vacuum states defined at any time can be associated with these operators via âα,k,•(t)|0, t〉 = 0 ∀ α,k • . (C2) Similar to Eq. (3.11) a ”particle number” can be intro- duced through Nα,k(t) = 〈0, in|â†α,k•(t)âα,k,•(t)|0, in〉 |Bβα,k(t)|2 . (C3) We shall denote |0, t〉 as the instantaneous vacuum state and the quantity Nα,k(t) as instantaneous particle num- ber 13. However, even if we call it ”particle number” and plot it in section V for illustrative reasons, we consider only the particle definitions for the initial and final state (asymptotic regions) outlined in section III as physically meaningful. 2. Power spectrum In order to calculate the power spectrum Eq. (4.7) we need to evaluate the expectation value (t, yb,k)ĥ (t, yb,k ′)〉in = (C4) φα(t, yb)φα′ (t, yb)〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in where we have introduced the shortcut 〈...〉in = 〈0, in|...|0, in〉. Using the expansion (3.15) of q̂α′,k′,•(t) in initial state operators and complex functions ǫ α,k(t) one finds 〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in = α,k(t) ǫ α′,k (t) 2ωinβ,k δ(3)(k− k′). From the initial conditions (3.21) it follows that the sum in (C4) diverges at t = tin. This divergence is related to the usual normal ordering problem and can be removed by a subtraction scheme. However, in order to obtain a well defined power spectrum at all times, it is not suffi- cient just to subtract the term (1/2)(δαα′/ω α,k)δ (3)(k − ′) which corresponds to 〈q̂α,k,•(tin)q̂†α′,k′,•(tin)〉in in the above expression. In order to identify all terms contained in the power spectrum we use the instantaneous particle concept which allows us to treat the Bogoliubov coeffi- cients (3.25) and (3.26) as continuous functions of time. 
First we express the complex functions ǫ α,k in (C5) in terms of Aγα,k(t) and Bγα,k(t). This is of course equiv- alent to calculating the expectation value (C5) using [cf. Eq.(3.7)] q̂α,k,•(t) = 2ωα,k(t) âα,k,•(t)Θα,k(t) α,−k,•(t)Θ α,k(t) and the Bogoliubov transformation Eq. (C1). The result consists of terms involving the Bogoliubov coefficients and the factor (1/2)(δαα′/ωα,k(t))δ (3)(k − k′), leading potentially to a divergence at all times. This term cor- responds to 〈0, t|q̂α,k,•(t)q̂†α′,k′,•(t)|0, t〉, and is related to 13 It could be interpreted as the number of particles which would have been created if the motion of the boundary (the brane) stops at time t. the normal ordering problem (zero-point energy) with re- spect to the instantaneous vacuum state |0, t〉. It can be removed by the subtraction scheme 〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in,phys (C7) = 〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in − 〈0, t|q̂α,k,•(t)q̂ α′,k′,•(t)|0, t〉 where we use the subscript “phys” to denote the physi- cally meaningful expectation value. Inserting this expectation value into (C4), and using Eq. (4.2), we find 〈ĥ•(t, yb,k)ĥ•(t, yb,k′)〉in (C8) Rα,k(t)Y2α(a)δ(3)(k− k′) with Rα,k(t) defined in Eq. (4.9). The function ONα,k appearing in Eq. (4.9) is explicitely given by ONα,k = 2ℜ Θ2α,kAβα,kB∗βα,k +Θα,k α′ 6=α ωα′,k Yα′ (a) Yα(a) Θ∗α′,kB∗βαBβα′ +Θα′,kAβαB∗βα′ and Oǫα,k appearing in Eq. (4.10) reads Oǫα,k = β,α′ 6=α Yα′(a) Yα(a) ωinβ,k . (C10) 3. Energy density In order to calculate the energy density we need to evaluate the expectation value 〈 ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)〉in. Using (2.22) and the relation e•ij(−k) = (e•ij(k))∗ we ob- 〈 ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)〉in = (2π)3/2 (2π)3/2 (C11) × 〈 ˙̂h (t, yb,k) ′(t, yb,k ′)〉inei(k−k ′)·xe•ij(k) ′ ij(k′) By means of the expansion (3.17) the expectation value 〈 ˙̂h (t, yb,k) ′(t, yb,k ′)〉in becomes 〈 ˙̂h (t, yb,k) ′(t, yb,k ′)〉in (C12) 〈p̂α,k,•(t)p̂†α′,k′,•′(t)〉inφα(t, yb)φα′ (t, yb). From the definition of p̂α,k,•(t) in Eq. 
(3.18) it is clear that this expectation value will in general contain terms proportional to the coupling matrix and its square when expressed in terms of ǫ α,k. However, we are interested in the expectation value at late times only when the brane moves very slowly such that the mode couplings go to zero and a physical meaningful particle definition can be given. In this case we can set 〈p̂α,k,•(t)p̂†α′,k′,•′(t)〉in = ˙̂qα,k,•(t) ˙̂q α′,k′,•′(t) .(C13) Calculating this expectation value by using Eq. (3.15) leads to an expression which, as for the power spec- trum calculation before, has a divergent part related to the zero-point energy of the instantaneous vacuum state (normal ordering problem). We remove this part by a subtraction scheme similar to Eq (C7). The final result reads 〈 ˙̂qα,k,•(t) ˙̂q†α′,k′,•′(t)〉in,phys (C14) α,k(t)ǫ̇ α′,k′(t)√ ωinβ,kω − ωα,k(t)δαα′ ′δ(3)(k− k′). Inserting this result into Eq. (C12), splitting the summa- tions in sums over α = α′ and α 6= α′ and neglecting the oscillating α 6= α′ contributions (averaging over several oscillations), leads to 〈 ˙̂h (t, yb,k) ′(t, yb,k ′)〉in (C15) Kα,k(t)Y2α(a)δ••′δ(3)(k− k′) where the function Kα,k(t) is given by Kα,k(t) = |ǫ̇(β)α,k(t)|2 ωinβ,k − ωα,k(t) = ωα,k(t)Nα,k(t) , (C16) and we have made use of Eq. (4.2). The relation be- tween β |ǫ̇ α,k(t)|2/ωinβ,k and the number of created par- ticles can easily be established. Using this expression in Eq. (C11) leads eventually to 〈 ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)〉in (C17) (2π)3 Kα,k(t)Y2α(a) where we have used that the polarization tensors satisfy e•ij(k) e• ij(k) = 2. (C18) The final expression for the energy density Eq. (4.18) is then obtained by exploiting that κ5/L = κ4. APPENDIX D: NUMERICS In order to calculate the number of produced gravitons the system of coupled differential equations (3.34) and (3.35) is solved numerically. The complex functions ξ α,k are decomposed into their real and imaginary parts: α,k = u α,k + iv α,k , η α,k = x α,k + iy α,k. 
(D1)

The system of coupled differential equations can then be written in the form (cf. Eq. (A2) of [16])

$$\dot{X}^{(\beta)}_k(t) = W_k(t)\, X^{(\beta)}_k(t) , \qquad (D2)$$

where

$$X^{(\beta)}_k = \left( u^{(\beta)}_{0,k} \ldots u^{(\beta)}_{n_{\max},k},\; x^{(\beta)}_{0,k} \ldots x^{(\beta)}_{n_{\max},k},\; v^{(\beta)}_{0,k} \ldots v^{(\beta)}_{n_{\max},k},\; y^{(\beta)}_{0,k} \ldots y^{(\beta)}_{n_{\max},k} \right)^{T} . \qquad (D3)$$

The matrix $W_k(t)$ is given by Eq. (A4) of [16], but here indices start at zero. The number of produced gravitons can be calculated directly from the solutions to this system using Eqs. (3.28) and (3.32). Note that for a given truncation parameter $n_{\max}$ the above system of size $4(n_{\max}+1) \times 4(n_{\max}+1)$ has to be solved $n_{\max}+1$ times, each time with different initial conditions (3.38).

The main difficulty in the numerical simulations is that most of the entries of the matrix $W_k(t)$ [Eq. (A4) of [16]] are not known analytically. This is due to the fact that Eq. (2.40), which determines the time-dependent KK masses $m_i(t)$, does not have an (exact) analytical solution. Only the 00-component of the coupling matrix $M_{\alpha\beta}$ is known analytically. We therefore have to determine the time-dependent KK spectrum $\{m_i(t)\}_{i=1}^{n_{\max}}$ by solving Eq. (2.40) numerically. In addition, the part $M^N_{ij}$ [Eq. (B8)] has to be calculated numerically, since the integral over the particular combination of Bessel functions cannot be found analytically.

We numerically evaluate the KK spectrum and the integral $M^N_{ij}$ for discrete time values $t_i$ and use spline routines to assemble $W_k(t)$. The system (D2) can then be solved using standard routines. We choose the $t_i$ non-uniformly, with a denser mesh close to the bounce and a coarser mesh at early and late times. We have checked that the numerical results do not depend on the distribution of the $t_i$. In order to implement the bounce as realistically as possible, we do not spline the KK spectrum very close to the bounce but recalculate it numerically at every time $t$ needed in the differential equation solver. This minimizes possible artificial effects caused by using a spline in the direct vicinity of the bounce.
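The mesh strategy described above can be sketched as follows. This is only an illustration under stated assumptions: the sinh mapping used to cluster the $t_i$ near the bounce, the function names, and the linear interpolation standing in for the spline routines are hypothetical choices, since the text does not specify how the $t_i$ are distributed.

```python
import bisect
import math


def bounce_mesh(t_max, n_points, stretch=4.0):
    """Symmetric time mesh on [-t_max, t_max], denser near the bounce at t = 0."""
    if n_points < 3:
        raise ValueError("need at least three mesh points")
    mesh = []
    for i in range(n_points):
        u = -1.0 + 2.0 * i / (n_points - 1)  # uniform parameter in [-1, 1]
        mesh.append(t_max * math.sinh(stretch * u) / math.sinh(stretch))
    return mesh


def make_hybrid(exact, mesh, t_exact):
    """Return f(t): exact(t) for |t| < t_exact, interpolated table values otherwise."""
    table = [exact(t) for t in mesh]

    def f(t):
        if abs(t) < t_exact:
            return exact(t)  # recompute near the bounce, no interpolation
        j = bisect.bisect_right(mesh, t)
        j = min(max(j, 1), len(mesh) - 1)  # clamp to a valid interval
        t0, t1 = mesh[j - 1], mesh[j]
        w = (t - t0) / (t1 - t0)
        return (1.0 - w) * table[j - 1] + w * table[j]

    return f


if __name__ == "__main__":
    mesh = bounce_mesh(t_max=100.0, n_points=201)
    # Stand-in for an expensive time-dependent quantity with a kink at t = 0.
    quantity = make_hybrid(lambda t: abs(t) + 1.0, mesh, t_exact=1.0)
    print(mesh[101] - mesh[100], mesh[200] - mesh[199], quantity(0.3), quantity(50.0))
```

An odd number of mesh points places one point exactly at the bounce, so that no interpolation interval straddles the kink.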
The same was done for $M^N_{ij}$, but we found that splining $M^N_{ij}$ when propagating through the bounce does not affect the numerical results. Routines provided by the GNU Scientific Library (GSL) [59] have been employed. Different routines for root finding and integration have been compared. The code has been parallelized (MPI) in order to deal with the intensive numerical computations.

FIG. 33: Comparison of the final KK-graviton spectrum $N^{\rm out}_{n,k,\bullet}$ with the expression $d_{n,k}(t_{\rm out})$ describing to what accuracy the diagonal part of the Bogoliubov relation (D4) is satisfied, plotted against the KK mass. Left panel: $y_s = 3$, $k = 0.1$, $v_b = 0.03$ and $n_{\max} = 100$ [cf. Fig. 25]. Right panel: $y_s = 3$, $k = 30$, $v_b = 0.1$ and $n_{\max} = 100$ [cf. Fig. 26].

The accuracy of the numerical simulations can be assessed by checking the validity of the Bogoliubov relations

$$\sum_\beta \left[ A_{\beta\alpha,k}(t) A^*_{\beta\gamma,k}(t) - B^*_{\beta\alpha,k}(t) B_{\beta\gamma,k}(t) \right] = \delta_{\alpha\gamma} , \qquad (D4)$$

$$\sum_\beta \left[ A_{\beta\alpha,k}(t) B^*_{\beta\gamma,k}(t) - B^*_{\beta\alpha,k}(t) A_{\beta\gamma,k}(t) \right] = 0 . \qquad (D5)$$

In the following we demonstrate the accuracy of the numerical simulations by considering the diagonal part of (D4). The deviation of the quantity

$$d_{\alpha,k}(t) = 1 - \sum_\beta \left[ |A_{\beta\alpha,k}(t)|^2 - |B_{\beta\alpha,k}(t)|^2 \right]$$

from zero gives a measure of the accuracy of the numerical result. We consider this quantity at final times $t_{\rm out}$ and compare it with the corresponding final particle spectrum. In Fig. 33 we compare the final KK-graviton spectrum $N^{\rm out}_{n,k,\bullet}$ with the expression $d_{n,k}(t_{\rm out})$ for two different cases. This shows that the accuracy of the numerical simulations is very good. Even if the expectation value for the particle number is only of order $10^{-7}$ to $10^{-6}$, the deviation of $d_{n,k}(t_{\rm out})$ from zero is at least one order of magnitude smaller. This demonstrates the reliability of our numerical simulations and that we can trust the numerical results presented in this work.

APPENDIX E: DYNAMICAL CASIMIR EFFECT FOR A UNIFORM MOTION

We consider a real massless scalar field on a time-dependent interval $[0, y(t)]$.
The time evolution of its mode functions is described by a system of differential equations like (2.49), where the specific form of $M_{\alpha\beta}$ depends on the particular boundary condition the field is subject to. In [15, 17] a method has been introduced to study particle creation due to the motion of the boundary $y(t)$ (i.e. the dynamical Casimir effect) fully numerically. We refer the reader to these publications for further details.

If the boundary undergoes a uniform motion $y(t) = 1 + vt$ (in units of some reference length), it was shown in [57, 58] that the total number of created scalar particles diverges because of the discontinuities in the velocity at the beginning and the end of the motion. In particular, for Dirichlet boundary conditions (no zero mode), it was found in [58] that $\langle 0, {\rm in}|\hat{N}^{\rm out}_n|0, {\rm in}\rangle \propto v^2/n$ if $n > 6$ and $v \ll 1$. Here the in- and out-vacuum states are defined as in the present work, and the frequency of a mode function is given by $\omega_n = \pi n$, $n = 1, 2, \ldots$. In Figure 34 we show spectra of created scalar particles obtained numerically with the method of [17] for this particular case. One observes that, as for our bouncing motion, the convergence is very slow, since the discontinuities in the velocity lead to the excitation of arbitrarily high frequency modes. Nevertheless, it is evident from Fig. 34 that the numerically calculated spectra approach the analytical prediction.

The linear motion discussed here and the brane motion (2.18) are very similar with respect to the discontinuities in the velocity. The total discontinuous change of the velocity is $2v$ and $2v_b$, respectively. The resulting divergence of the acceleration is responsible for the excitation, and therefore the creation, of particles in all frequency modes. Consequently, we expect the same $\propto v^2/\omega_n$ behavior for the bouncing motion (2.18). Indeed, comparing the convergence behavior of the final graviton spectrum for $v_b = 0.01$ shown in Fig.
25 with the one of the scalar particle spectrum for $v = 0.01$ depicted in Fig. 34 shows that both are very similar.

FIG. 34: Spectra of massless scalar particles produced under the influence of the uniform motion $y(t) = 1 + vt$ for velocities $v$ = 0.01, 0.02, 0.05 and 0.1, plotted against the frequency mode $n$. The numerical results are compared to the expression $N_n = 0.035\,v^2/n$ (dashed lines), which agrees with the analytical prediction $N_n \propto v^2/n$.

[1] J. Polchinski, String theory. An introduction to the bosonic string, Vol. I (Cambridge University Press, Cambridge, UK, 1998).
[2] J. Polchinski, String theory. Superstring theory and beyond, Vol. II (Cambridge University Press, Cambridge, UK, 1998).
[3] J. Polchinski, Phys. Rev. Lett. 75, 4724 (1995), hep-th/9510017.
[4] N. Arkani-Hamed, S. Dimopoulos, and G. R. Dvali, Phys. Lett. B429, 263 (1998), hep-ph/9803315.
[5] N. Arkani-Hamed, S. Dimopoulos, and G. R. Dvali, Phys. Rev. D 59, 086004 (1999), hep-ph/9807344.
[6] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370 (1999), hep-ph/9905221.
[7] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 4690 (1999), hep-th/9906064.
[8] C. Lanczos, Ann. Phys. (Leipzig) 74, 518 (1924).
[9] N. Sen, Ann. Phys. (Leipzig) 73, 365 (1924).
[10] G. Darmois, Mémorial des sciences mathématiques, fascicule 25 chap. 5 (Gauthier-Villars, Paris, 1927).
[11] W. Israel, Nuovo Cimento B44, 1 (1966).
[12] P. Kraus, JHEP 12, 011 (1999), hep-th/9910149.
[13] P. Binetruy, C. Deffayet, U. Ellwanger, and D. Langlois, Phys. Lett. B477, 285 (2000), hep-th/9910219.
[14] M. Maggiore, Phys. Rept. 331, 283 (2000), gr-qc/9909001.
[15] M. Ruser, J. Opt. B: Quantum Semiclass. Opt. 7, S100 (2005), quant-ph/0408142.
[16] M. Ruser, Phys. Rev. A 73, 043811 (2006), quant-ph/0509030.
[17] M. Ruser, J. Phys. A 39, 6711 (2006), quant-ph/0603097.
[18] R. A. Battye, C. van de Bruck, and A. Mennim, Phys. Rev. D 69, 064040 (2004), hep-th/0308134.
[19] R. A. Battye and A. Mennim, Phys. Rev.
D 70, 124008 (2004), hep-th/0408101. [20] R. Easther, D. Langlois, R. Maartens, and D. Wands, J. Cosmol. Astropart. Phys. 10 (2003) 014, hep-th/0308078. [21] T. Kobayashi and T. Tanaka, J. Cosmol. Astropart. Phys. 10 (2004) 015. [22] D. S. Gorbunov, V. A. Rubakov, and S. M. Sibiryakov, JHEP 10, 015 (2001), hep-th/0108017. [23] T. Kobayashi, H. Kudoh, and T. Tanaka, Phys. Rev. D 68, 044025 (2003), gr-qc/0305006. [24] R. Maartens, D. Wands, B. A. Bassett, and I. P. C. Heard, Phys. Rev. D 62, 041301 (2000), hep- ph/9912464. [25] D. Langlois, R. Maartens, and D. Wands, Phys. Lett. B 489, 259 (2000), hep-th/0006007. [26] A. V. Frolov and L. Kofman (2002), hep-th/0209133. [27] T. Hiramatsu, K. Koyama, and A. Taruya, Phys. Lett. B 578, 269 (2004), hep-th/0308072. [28] T. Hiramatsu, K. Koyama, and A. Taruya, Phys. Lett. B 609, 133 (2005), hep-th/0410247. [29] T. Hiramatsu, Phys. Rev. D 73, 084008 (2006), hep- th/0601105. [30] K. Koyama, J. Cosmol. Astropart. Phys. 09, 10, (2004) astro-ph/0407263. [31] K. Ichiki and K. Nakamura, Phys. Rev. D 70, 064017 (2004), hep-th/0310282. [32] K. Ichiki and K. Nakamura, astro-ph/0406606 (2004). [33] T. Kobayashi and T. Tanaka, Phys. Rev. D 71, 124028 (2005), hep-th/0505065. [34] T. Kobayashi and T. Tanaka, Phys. Rev. D 73, 044005 (2006), hep-th/0511186. [35] T. Kobayashi and T. Tanaka Phys. Rev. D 73, 124031 (2006). [36] S. Seahra, Phys. Rev. D 74, 044010 (2006), hep- th/0602194. [37] C. Cartier, R. Durrer, M. Ruser, Phys. Rev.D72, 104018 (2005), hep-th/0510155. [38] J. Khoury, B. A. Ovrut, P.J. Steinhardt, and N. Turok, Phys. Rev. D 64 123522 (2001), hep-th/0103239. [39] R. Kallosh, L. Kovman and A. Linde, Phys. Rev. D 64 123523 (2001), hep-th/0104073. [40] A. Neronov, J. High Energy Phys. 11, 007 (2001), hep- th/0109090. [41] P.J. Steinhardt, and N. Turok, Phys. Rev. D 65 126003 (2002), hep-th/0111098. [42] J. Khoury, B. A. Ovrut, N. Seiberg, P.J. Steinhardt and N. Turok, Phys. Rev. D 65 086007 (2002), hep- th/0108187. [43] J. 
Khoury, B. A. Ovrut, P.J. Steinhardt and N. Turok, Phys. Rev. D 66 046005 (2002), hep-th/0109050. [44] J. Khoury, P.J. Steinhardt and N. Turok, Phys. Rev. Lett. 91 161301 (2003), astro-ph/0302012. [45] J. Khoury, P.J. Steinhardt and N. Turok, Phys. Rev. Lett. 92 031302 (2004), hep-th/0307132. [46] A. Tolley, N. Turok, and P.J. Steinhardt, Phys. Rev. D 69 106005 (2004), hep-th/0306109. [47] R. Durrer and M. Ruser, Phys. Rev. Lett. 99, 071601 (2007), arXiv:0704.0756. [48] C. Cartier and R. Durrer, Phys. Rev. D71, 064022 (2005), hep-th/0409287. [49] R. Maartens, Living Rev. Rel. 7, 7 (2004), gr-qc/0312059. [50] R. Durrer, Braneworlds, at the XI Brazilian School of Cosmology and Gravitation, Edt. M. Novello and S.E. Perez Bergliaffa, AIP Conference Proceedings 782 (2005), hep-th/0507006. [51] S. W. Hawking, T. Hertog, and H. S. Reall, Phys. Rev. D62, 043501 (2000), hep-th/0003052. [52] S. W. Hawking, T. Hertog, and H. S. Reall, Phys. Rev. D63, 083504 (2001), hep-th/0010232. [53] M. A. Pinsky, Partial Differential Equations and Boundary-Value Problems with Applications, McGraw- Hill, inc. New York (1991). [54] M. Crocce, D.A.R. Dalvit and F.D. Mazzitelli, Phys. Rev. A66, 033811 (2002), quant-ph/0205104. [55] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, 9th Edition (Dover Publications, NY, 1970). [56] N. Straumann, Ann. Phys. (Leipzig), Volume 15, Issue 10-11 , 701 (2006), hep-ph/0505249. [57] G. T. Moore, J. Math. Phys. 11, 2679 (1970). [58] M. Castagnino and R. Ferraro, Ann. Phys. 154, 1 (1984). [59] http://www.gnu.org/software/gsl ABSTRACT We consider a two-brane system in a five-dimensional anti-de Sitter spacetime. We study particle creation due to the motion of the physical brane which first approaches the second static brane (contraction) and then recedes from it(expansion). The spectrum and the energy density of the generated gravitons are calculated. 
We show that the massless gravitons have a blue spectrum and that their energy density satisfies the nucleosynthesis bound with very mild constraints on the parameters. We also show that the Kaluza-Klein modes cannot provide the dark matter in an anti-de-Sitter braneworld. However, for natural choices of parameters, backreaction from the Kaluza-Klein gravitons may well become important. The main findings of this work have been published in the form of a Letter [R. Durrer and M. Ruser, Phys. Rev. Lett. 99, 071601 (2007), arXiv:0704.0756]. <|endoftext|><|startoftext|> Introduction Spectral analysis The sample Spectral fits Errors from the fit and error propagation Results Bursts detected also by other instruments Peak energy vs spectral index Correlation between Epk and E,iso Summary and Conclusions The observed spectra ABSTRACT We study the spectral and energetics properties of 47 long-duration gamma-ray bursts (GRBs) with known redshift, all of them detected by the Swift satellite. Due to the narrow energy range (15-150 keV) of the Swift-BAT detector, the spectral fitting is reliable only for fitting models with 2 or 3 parameters. As high uncertainty and correlation among the errors is expected, a careful analysis of the errors is necessary. We fit both the power law (PL, 2 parameters) and cut--off power law (CPL, 3 parameters) models to the time-integrated spectra of the 47 bursts, and present the corresponding parameters, their uncertainties, and the correlations among the uncertainties. The CPL model is reliable only for 29 bursts for which we estimate the nuf_nu peak energy Epk. For these GRBs, we calculate the energy fluence and the rest- frame isotropic-equivalent radiated energy, Eiso, as well as the propagated uncertainties and correlations among them. We explore the distribution of our homogeneous sample of GRBs on the rest-frame diagram E'pk vs Eiso. 
We confirm a significant correlation between these two quantities (the "Amati" relation) and we verify that, within the uncertainty limits, no outliers are present. We also fit the spectra to a Band model with the high energy power law index frozen to -2.3, obtaining a rather good agreement with the "Amati" relation of non-Swift GRBs. <|endoftext|><|startoftext|> Accepted for publication in the Astrophysical Journal, July 16, 2007 Preprint typeset using LATEX style emulateapj v. 08/22/09 THE RELATIONSHIP BETWEEN MOLECULAR GAS TRACERS AND KENNICUTT-SCHMIDT LAWS Mark R. Krumholz and Todd A. Thompson Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544 Accepted for publication in the Astrophysical Journal, July 16, 2007 ABSTRACT We provide a model for how Kennicutt-Schmidt (KS) laws, which describe the correlation between star formation rate and gas surface or volume density, depend on the molecular line chosen to trace the gas. We show that, for lines that can be excited at low temperatures, the KS law depends on how the line critical density compares to the median density in a galaxy’s star-forming molecular clouds. High critical density lines trace regions with similar physical properties across galaxy types, and this produces a linear correlation between line luminosity and star formation rate. Low critical density lines probe regions whose properties vary across galaxies, leading to a star formation rate that varies superlinearly with line luminosity. We show that a simple model in which molecular clouds are treated as isothermal and homogenous can quantitatively reproduce the observed correlations between galactic luminosities in far infrared and in the CO(1 → 0) and HCN(1 → 0) lines, and naturally explains why these correlations have different slopes. We predict that IR-line luminosity correlations should change slope for galaxies in which the median density is close to the line critical density. 
This prediction may be tested by observations of lines such as HCO+(1 → 0) with intermediate critical densities, or by HCN(1 → 0) observations of intensely star-forming high redshift galaxies with very high densities. Recent observations by Gao et al. hint at just such a change in slope. We argue that deviations from linearity in the HCN(1 → 0)−IR correlation at high luminosity are consistent with the assumption of a constant star formation efficiency. Subject headings: ISM: clouds — ISM: molecules — stars: formation — galaxies: ISM — radio lines: 1. INTRODUCTION Schmidt (1959, 1963) first proposed that the rate at which a gas forms stars might follow a simple power law correlation of the form ρ̇∗ ∝ ρNg , where ρ̇∗ is the star formation rate per unit volume, ρg is the gas den- sity, and N is generally taken to be in the range 1 − 2. In the decades since, observations have revealed two strong correlations that appear to be evidence for this hypothesis. First, galaxy surveys reveal that the in- frared luminosity of a galaxy, which traces the star for- mation rate, varies with its luminosity in the CO(1 → 0) line, which traces the total mass of molecular gas, as LFIR ∝ L1.4−1.6CO (Gao & Solomon 2004a,b; Greve et al. 2005; Riechers et al. 2006a). Kennicutt (1998a,b) iden- tified the closely-related correlation between gas surface density Σg and star formation rate surface density Σ̇∗, Σ̇∗ ∝ Σ1.4±0.15g , a relation that has come to be known as the Kennicutt Law. Since over the bulk of the dy- namic range of Kennicutt’s data galaxies are predomi- nantly molecular, this is effectively a correlation between molecular gas, as traced by CO(1 → 0) line emission, and star formation. Spatially resolved observations of galaxies confirm that, at least for molecule-rich galax- ies where resolved CO(1 → 0) observations are possible, star formation is more closely coupled with gas traced by CO(1 → 0) than with atomic gas (Wong & Blitz 2002; Heyer et al. 2004; Komugi et al. 
2005; Kennicutt et al. 2007) Electronic address: krumholz@astro.princeton.edu, thomp@astro.princeton.edu 1 Hubble Fellow 2 Lyman Spitzer Jr. Fellow Second, Gao & Solomon (2004a,b) find that there is a strong correlation between the IR luminosity of galax- ies and emission in the HCN(1 → 0) line, which mea- sures the mass at densities significantly greater than that probed by CO(1 → 0). However, they find that their correlation, which covers nearly three decades in total galactic star formation rate, is linear: LFIR ∝ LHCN. Wu et al. (2005) show that this correlation ex- tends down to individual star-forming clumps of gas in the Milky Way, provided that their infrared luminosities are >∼ 104.5 L⊙. Interestingly, however, Gao et al. (2007) find a deviation from linearity in the IR-HCN correlation for a sample of intensely star-forming high redshift galax- ies. These sources show small but significant excesses of infrared emission for their observed HCN emission. The difference in power law indices between the LFIR− LCO and LFIR −LHCN correlations is statistically signif- icant, and, on its face, puzzling. An index near N = 1.5 seems natural if one supposes that a roughly constant fraction of the gas present in molecular clouds will be converted into stars each free-fall time. In this case one expects ρ̇∗ ∝ ρ1.5g (Madore 1977; Elmegreen 1994). If gas scale heights do not vary strongly from galaxy to galaxy, this implies Σ̇∗ ∝ Σ1.5g as well, which is consis- tent with the observed Kennicutt law. More generally, since the dynamical timescale in a marginally Toomre- stable (Q ≈ 1; see Martin & Kennicutt 2001) galactic disk is of order Ω−1 ∝ (Gρg)−1/2, where Ω is the angular frequency of the disk, an index close to N = 1.5 is ex- pected if star formation is regulated by any phenomenon that converts a fixed fraction of the gas into stars on this time scale (Elmegreen 2002). 
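The origin of the index N = 1.5 can be made explicit with a short worked derivation. The symbol $\epsilon_{\rm ff}$ is introduced here only for illustration: it stands for the assumed constant fraction of gas converted into stars per free-fall time, and it is not notation taken from the papers cited above.

```latex
t_{\rm ff} = \sqrt{\frac{3\pi}{32\,G\,\rho_g}}, \qquad
\dot\rho_* = \epsilon_{\rm ff}\,\frac{\rho_g}{t_{\rm ff}}
           = \epsilon_{\rm ff}\,\sqrt{\frac{32\,G}{3\pi}}\;\rho_g^{3/2}
\quad\Longrightarrow\quad \dot\rho_* \propto \rho_g^{1.5}.
```

One power of $\rho_g$ comes from the amount of gas available, and the remaining half power from the density dependence of the free-fall time.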
http://arxiv.org/abs/0704.0792v2 mailto:krumholz@astro.princeton.edu, thomp@astro.princeton.edu On the other hand, Wu et al. (2005) suggest a simple interpretation of the linear IR-HCN correlation. They ar- gue that the individual HCN-emitting molecular clumps that they identify in the Milky Way represent a funda- mental unit of star formation. The linear correlation between star formation rate and HCN luminosity across galaxies arises because a measurement of the HCN lu- minosity for a galaxy simply counts the number of such structures present within it, each of which forms stars at some roughly fixed rate regardless of its galactic environ- ment. However, in this interpretation it is unclear why the structures traced by HCN(1 → 0) emission should form stars at the same rate in any galaxy. After all, one could equally well argue that molecular clouds traced by CO(1 → 0) are fundamental units of star formation, but the non-linear IR-CO correlation clearly shows that these objects do not form stars at a fixed rate per unit mass. Moreover, the evidence presented by Gao et al. (2007) that the linear IR-HCN correlation varies in extremely luminous high redshift galaxies suggests that the rela- tionship between HCN emission and star-formation may be somewhat more complex. In this paper we attempt to explain the origin of the difference in slope between the CO and HCN correla- tions with star formation rate, and more generally to give a theoretical framework for understanding how cor- relations between star formation rate and line luminosity, which we generically refer to as Kennicutt-Schmidt (KS) laws, depend on the tracer used to define them. Our cen- tral argument is conceptually quite simple, and in some sense represents a combination of the intuitive arguments for CO and HCN given above. Consider an observation of a galaxy in a molecular tracer with critical density ncrit, which essentially mea- sures the mass of gas at densities of ncrit or more, i.e. 
the gas that is dense enough for that particular transition to be excited. In galaxies where the median density of the molecular gas is significantly larger than ncrit, this means that the observation will detect the majority of the gas, and the bulk of the emission will come from gas whose density is near the median density. Since the gas den- sity will vary from galaxy to galaxy, the star formation rate per unit gas mass will vary as roughly ρ1.5g , with one factor of ρg coming from the amount of gas available for star formation, and an additional factor of ρ0.5g coming from the dependence of the free-fall or dynamical time on the density. On the other hand, in galaxies where the median gas density is small compared to the critical density for the chosen transition, observations will pick out only high density peaks. Since the density in these peaks is set by ncrit, and not by the conditions in the galaxy, these peaks are at essentially the same density in any galaxy where they are observed, and the corresponding free-fall times in these regions are constant as well. As a result, the star formation rate per unit mass of gas traced by that line is approximately the same in every galaxy, because the corresponding free-fall time is the same in every galaxy. In the rest of this paper, we give a quantitative version of this intuitive argument, and then discuss its conse- quences. In § 2 we develop a simple formalism to com- pute the star formation rate and the molecular line lu- minosity of galaxies, and in § 3 we use this formalism to predict the correlation between star formation rate and luminosity. We show that our predictions provide a very good fit for a variety of observations, and make predic- tions for future observations. We discuss the implications of our work and its limitations in § 4, and summarize our conclusions is § 5. 2. STAR FORMATION RATES AND LINE LUMINOSITIES 2.1. 
Cloud Properties

Consider a galaxy in which the star-forming molecular clouds have a volume-averaged mean molecular hydrogen number density $\bar n = \rho_g/\mu_{\rm H_2}$, where $\rho_g$ is the volume-averaged mass density of the molecular clouds in the galaxy and $\mu_{\rm H_2} = 3.9 \times 10^{-24}$ g is the mean mass per hydrogen molecule for a gas of standard cosmic composition. Observations indicate that $\bar n$ varies by two to three decades over the galaxies for which the Kennicutt and Gao & Solomon correlations are measured, from $\bar n \approx 50$ cm$^{-3}$ in normal spirals like the Milky Way (McKee 1999) up to $\bar n \approx 10^4$ cm$^{-3}$ in the strongest starburst systems in the local universe (e.g. Downes & Solomon 1998). There is strong evidence that densities in molecular clouds follow a lognormal probability distribution function (PDF; see reviews by Mac Low & Klessen 2004 and Elmegreen & Scalo 2004)

$$\frac{dp}{d\ln x} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln x - \overline{\ln x})^2}{2\sigma^2}\right], \tag{1}$$

where $x = n/\bar n$ is the molecular hydrogen number density $n$ relative to the average density, $\sigma$ is the width of the lognormal, and $\overline{\ln x} = -\sigma^2/2$. For this distribution the median density is $n_{\rm med} = \bar n \exp(\sigma^2/2)$. Numerical experiments show that for supersonic isothermal turbulence $\sigma^2 \approx \ln(1 + 3\mathcal{M}^2/4)$, where $\mathcal{M}$ is the 1D Mach number of the turbulence (Nordlund & Padoan 1999; Ostriker et al. 1999; Padoan & Nordlund 2002). Mach numbers in star-forming molecular clouds range from $\mathcal{M} \sim 30$ (McKee 1999) in normal spirals to $\mathcal{M} \sim 100$ in strong starbursts (Downes & Solomon 1998), implying that median densities in molecular clouds range from $\sim 10^3$ cm$^{-3}$ in normal spirals to $\sim 10^6$ cm$^{-3}$ in starbursts. Star-forming clouds within a galaxy are approximately isothermal, except very near strong sources of stellar radiation, so we assume a fixed temperature $T$ for the clouds. Observationally, $T$ ranges from roughly 10 K in normal spirals (McKee 1999) up to about 50 K in strong starbursts (Downes & Solomon 1998; Gao & Solomon 2004b).

2.2. Star Formation Rates

First let us ask how quickly stars form in such a medium.
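The cloud model of § 2.1 is compact enough to evaluate directly. The following is a minimal sketch of the lognormal PDF, its width as a function of Mach number, and the ratio of median to mean density quoted in the text; the function names are illustrative choices and not part of any published code.

```python
import math

def lognormal_sigma2(mach):
    """Variance of ln(x) for supersonic isothermal turbulence: ln(1 + 3 M^2 / 4)."""
    return math.log(1.0 + 0.75 * mach ** 2)

def median_to_mean_density(mach):
    """Ratio n_med / n_bar = exp(sigma^2 / 2), as quoted in the text."""
    return math.exp(0.5 * lognormal_sigma2(mach))

def dp_dlnx(x, mach):
    """Lognormal PDF per unit ln(x), with mean of ln(x) equal to -sigma^2 / 2."""
    s2 = lognormal_sigma2(mach)
    arg = (math.log(x) + 0.5 * s2) ** 2 / (2.0 * s2)
    return math.exp(-arg) / math.sqrt(2.0 * math.pi * s2)

if __name__ == "__main__":
    # Mach numbers spanning normal spirals to strong starbursts
    for mach in (30.0, 50.0, 100.0):
        print(mach, median_to_mean_density(mach))
```

For a Mach number of 50 the ratio evaluates to about 43, so mean densities of 50 to $10^4$ cm$^{-3}$ correspond to median densities a few tens to roughly a hundred times larger.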
Krumholz & McKee (2005) give a model for star formation regulated by supersonic turbulence in which a population of molecular clouds of total mass $M_{\rm cl}$ form stars at a rate $\dot M_* = {\rm SFR_{ff}}\,M_{\rm cl}/t_{\rm ff}(\bar n)$, where $t_{\rm ff}(\bar n)$ is the free-fall time evaluated at the mean density and ${\rm SFR_{ff}}$ is a number of order $10^{-2}$ that depends weakly on $\mathcal{M}$. We therefore estimate the star formation rate per unit volume as a function of the mean density given by

$$\dot\rho_* \approx {\rm SFR_{ff}} \sqrt{\frac{32\,G\,\mu_{\rm H_2}^3\,\bar n^3}{3\pi}}. \tag{2}$$

We adopt the Krumholz & McKee result ${\rm SFR_{ff}} \approx 0.014(\mathcal{M}/100)^{-0.32}$ for clouds with a fiducial virial ratio of $\alpha_{\rm vir} = 1.3$. Alternately, Krumholz & Tan (2007) point out that observed correlations between the star formation rate and the luminosity in different density tracers imply that over a 3-4 decade range in density $n$,

$$\dot M_* \approx 10^{-2}\,\frac{M_{\rm cl}(>n)}{t_{\rm ff}(n)}, \tag{3}$$

where $M_{\rm cl}(>n)$ is the mass of gas with a density of $n$ or higher, and $M_{\rm cl} = M_{\rm cl}(>0)$. For a given choice of $n$ this provides an alternative estimate of the star formation rate which is purely empirical, and independent of any particular theoretical model. However, the difference between the star formation rates predicted by (2) and (3) is small. For gas with a lognormal PDF,

$$M_{\rm cl}(>n) = \frac{M_{\rm cl}}{2}\left[1 + {\rm erf}\left(\frac{-2\ln x + \sigma^2}{2^{3/2}\sigma}\right)\right], \tag{4}$$

and using this to evaluate equation (3) indicates that, for Mach numbers in the observed range, the two prescriptions (2) and (3) give about the same star formation rate over a very broad range in $x$. For example, at $\mathcal{M} = 30$ the two estimates agree to within a factor of 3 for densities in the range $0.2 < x < 4 \times 10^4$. Given the scatter inherent in observational estimates of the star formation rate, a factor of 3 difference is not particularly significant, so it matters little which prescription we adopt. In practice, we will use equation (2).

2.3. Line Luminosities

Now we must compute the luminosity of molecular line emission from the galaxy.
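The star formation prescriptions of § 2.2 can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the standard free-fall time $t_{\rm ff} = \sqrt{3\pi/(32 G\rho)}$, the value of $\mu_{\rm H_2}$ and the SFR$_{\rm ff}$ scaling quoted in the text, a cgs value of $G$, and function names chosen here for readability.

```python
import math

G = 6.674e-8       # gravitational constant in cgs units
MU_H2 = 3.9e-24    # mean mass per H2 molecule in g (value quoted in the text)

def sfr_ff(mach):
    """Dimensionless star formation rate per free-fall time, 0.014 (M/100)^-0.32."""
    return 0.014 * (mach / 100.0) ** -0.32

def free_fall_time(n):
    """Free-fall time in seconds at H2 number density n (cm^-3)."""
    return math.sqrt(3.0 * math.pi / (32.0 * G * MU_H2 * n))

def sfr_density(n_bar, mach):
    """Equation (2): star formation rate per unit volume in g cm^-3 s^-1."""
    return sfr_ff(mach) * MU_H2 * n_bar / free_fall_time(n_bar)

def mass_fraction_above(x, mach):
    """Equation (4) divided by M_cl: fraction of mass at density >= x * n_bar."""
    s2 = math.log(1.0 + 0.75 * mach ** 2)
    sigma = math.sqrt(s2)
    return 0.5 * (1.0 + math.erf((s2 - 2.0 * math.log(x)) / (2.0 ** 1.5 * sigma)))

if __name__ == "__main__":
    print(sfr_density(50.0, 30.0))          # Milky Way-like mean density
    print(mass_fraction_above(100.0, 30.0)) # mass fraction above 100 times the mean
```

Because the free-fall time scales as $\bar n^{-1/2}$, `sfr_density` scales as $\bar n^{1.5}$, which is the behavior the rest of the argument relies on.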
Even for a cloud that is not in local thermodynamic equilibrium (LTE), for optically thin emission this calculation is straightforward. However, the molecular lines used most often in galaxy surveys are generally optically thick. To handle the effect of finite optical depth on molecule level populations and line luminosities, we adopt an escape probability approximation and treat clouds as homogeneous spheres. This is not fully consistent with our assumption that clouds have lognormal density PDFs, since the escape probability formalism assumes a uniform level population throughout the cloud, and the essence of our argument in this paper turns on how the level population varies with density. However, this approach gives us an approximate way of incorporating the optical thickness of star-forming clouds into our model, the only alternative to which for turbulent media is full numerical simulation (e.g. Juvela et al. 2001). We therefore proceed by treating clouds as homogeneous in order to determine their escape probabilities, and we then relax the assumption of homogeneity, while keeping the escape probabilities fixed, in order to determine level populations and cloud luminosities as a function of density.

Consider a cloud of radius $R$ in statistical equilibrium but not necessarily in LTE. In the escape probability approximation, the fraction $f_i$ of molecules of species $S$ in state $i$ is given implicitly by the linear system

$$\sum_j \left(n q_{ji} + \beta_{ji} A_{ji}\right) f_j = \left[\sum_j \left(n q_{ij} + \beta_{ij} A_{ij}\right)\right] f_i \tag{5}$$

$$\sum_i f_i = 1, \tag{6}$$

where $q_{ij}$ is the collision rate for transitions from state $i$ to state $j$, $A_{ij}$ is the Einstein spontaneous emission coefficient for this transition, $\beta_{ij}$ is the cloud-averaged escape probability for photons emitted in this transition, the sums are over all quantum states, and we understand that $A_{ij} = 0$ for $i \le j$ and $q_{ij} = 0$ for $i = j$. Equations (5) and (6) allow us to compute the level populations $f_i$ for given values of $\beta_{ij}$.
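The linear system (5)-(6) can be solved with ordinary Gaussian elimination once the escape probabilities are fixed. The sketch below is a minimal illustration, not the IDL code referred to later in the paper: the function names are hypothetical, and the two-level rate coefficients in the usage example are made-up numbers chosen only to exercise the solver.

```python
def gauss_solve(matrix, rhs):
    """Solve matrix * f = rhs by Gaussian elimination with partial pivoting."""
    m = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    f = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * f[c] for c in range(r + 1, m))
        f[r] = (aug[r][m] - s) / aug[r][r]
    return f

def solve_level_populations(n, q, A, beta):
    """Level fractions f_i from equations (5)-(6) at fixed escape probabilities.

    q[i][j], A[i][j] and beta[i][j] all refer to the transition i -> j,
    with A[i][j] = 0 for i <= j and q[i][i] = 0.
    """
    m = len(q)
    # rate[i][j]: total rate per molecule for transitions i -> j
    rate = [[(n * q[i][j] + beta[i][j] * A[i][j]) if i != j else 0.0
             for j in range(m)] for i in range(m)]
    # Row i expresses (gains into state i) - (losses out of state i) = 0
    matrix = [[rate[j][i] for j in range(m)] for i in range(m)]
    for i in range(m):
        matrix[i][i] = -sum(rate[i])
    rhs = [0.0] * m
    # The balance equations are linearly dependent; replace the last by sum(f) = 1
    matrix[-1] = [1.0] * m
    rhs[-1] = 1.0
    return gauss_solve(matrix, rhs)

if __name__ == "__main__":
    # Hypothetical two-level molecule: state 0 is the ground state
    q = [[0.0, 1e-11], [3e-11, 0.0]]   # collision rate coefficients, cm^3 s^-1
    A = [[0.0, 0.0], [1e-7, 0.0]]      # Einstein A coefficient, s^-1
    beta = [[1.0, 1.0], [0.5, 1.0]]    # escape probabilities
    print(solve_level_populations(100.0, q, A, beta))
```

For the two-level case the result reduces to $f_1/f_0 = n q_{01}/(n q_{10} + \beta_{10} A_{10})$, which makes explicit how radiative trapping (small $\beta_{10}$) lowers the density needed to populate the upper level.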
To completely specify the system, we must add an additional consistency condition relating the values of $\beta_{ij}$ to the level populations. For a homogeneous spherical cloud, the escape probability for a given line is related to the optical depth from the center to the edge of the cloud $\tau_{ij}$ by (B. Draine, 2007, private communication)

$$\beta_{ij} \approx \frac{1}{1 + 0.5\tau_{ij}}, \tag{7}$$

where $\tau_{ij}$ is computed at the central frequency of the line. In turn, the optical depth is related to the level populations by

$$\tau_{ij} = \frac{g_i}{g_j}\,\frac{A_{ij}\lambda_{ij}^3}{4(2\pi)^{3/2}\mathcal{M}c_s}\,n X(S) f_j R, \tag{8}$$

where $\lambda_{ij}$ is the wavelength of transition $i \to j$, $g_i$ and $g_j$ are the statistical weights of states $i$ and $j$, $c_s$ is the isothermal sound speed of the gas, and $X(S)$ is the abundance of molecules of species $S$. Note that this equation implicitly assumes that the cloud has a uniform Maxwellian velocity distribution with 1D dispersion $\mathcal{M}c_s$, consistent with our treatment of the clouds as homogeneous spheres. One additional complication is that we do not directly know cloud radii for most external galaxies, where observations cannot resolve individual molecular clouds. However, we often can diagnose the optical depths of transitions by comparing line ratios of molecular isotopomers of different abundances. We therefore take $\tau_{10}$, the optical depth of the transition between the first excited state and the ground state, as known. For a given level population this fixes the value of $R$.

We solve equations (5)-(8) using Newton-Raphson iteration. In this procedure, we guess an initial set of escape probabilities $\beta_{ij}$, and solve the linear system (5) and (6) to find the corresponding initial level populations $f_i$. We then compute the optical depths $\tau_{ij}$ from equation (8).
The guessed escape probabilities $\beta_{ij}$ and the corresponding optical depths $\tau_{ij}$ generally will not satisfy the consistency condition (7), so we then iterate over $\beta_{ij}$ values using a Newton-Raphson approach, seeking $\beta_{ij}$ for which the level populations give optical depths $\tau_{ij}$ such that all elements of the matrix $\beta_{ij} - 1/(1 + 0.5\tau_{ij})$ are equal to zero within some specified tolerance. We use the LTE level populations and escape probabilities for our initial guess, so that the iteration converges rapidly when the system is close to LTE.

Once we have determined the escape probabilities $\beta_{ij}$, we compute the luminosity by holding the $\beta_{ij}$ values fixed but allowing the level populations to vary with density, then integrating over the PDF. Thus, the total luminosity per unit volume in a particular line is

$$L_{ij} = X(S)\,\beta_{ij} A_{ij} h\nu_{ij} \int_{-\infty}^{\infty} f_i\, n\, \frac{dp}{d\ln x}\, d\ln x, \tag{9}$$

where $\nu_{ij}$ is the line frequency, $f_i$ is an implicit function of $n$ given by the solution to equations (5) and (6), and we assume that the abundance $X(S)$ is independent of $n$. The line luminosity per unit mass is $L_{ij}/(\mu_{\rm H_2}\bar n)$. An IDL code that implements this calculation is available for public download from http://www.astro.princeton.edu/~krumholz/astronomy.html.

TABLE 1
Model Parameters

Parameter         Normal galaxy   Intermediate   Starburst   Reference
T (K)             10              20             50          1-4
M (Mach number)   30              50             80          1-4
X(CO)             2 × 10^-4       4 × 10^-4      8 × 10^-4   5
X(HCO+)           2 × 10^-9       4 × 10^-9      8 × 10^-9   6, 7
X(HCN)            1 × 10^-8       2 × 10^-8      4 × 10^-8   6-8
τ CO(1→0)         10              20             40          9
τ HCO+(1→0)       0.5             1.0            2.0         6, 7
τ HCN(1→0)        0.5             1.0            2.0         6, 7
OPR               0.25            0.25           0.25        10

Note: OPR = H2 ortho- to para-ratio. References: 1 – Solomon et al. (1987), 2 – Gao & Solomon (2004b), 3 – Downes & Solomon (1998), 4 – Wu et al. (2005), 5 – Black (2000), 6 – Nguyen et al. (1992), 7 – Wild et al. (1992), 8 – Lahuis & van Dishoeck (2000), 9 – Combes (1991), 10 – Neufeld et al. (2006)

3. CORRELATIONS AND KENNICUTT-SCHMIDT LAWS
3.1.
Lines and Parameters Using the formalism of § 2, we can now predict the correlation between the star formation rate and the luminosity of a galaxy in molecular lines. We make these predictions for three representative molecular lines: CO(1 → 0), HCO+(1 → 0), and HCN(1 → 0). For the first and last of these transitions, there are exten- sive observational surveys. We select HCO+(1 → 0) in addition to these two because there is some obser- vational data for it, and because its critical density of ncrit = βHCO+4.6 × 104 cm−3 makes it intermediate between CO(1 → 0), with ncrit = βCO560 cm−3, and HCN(1 → 0), with ncrit = βHCN2.8 × 105 cm−3.3 Here βS is the escape probability for the 1 → 0 transition of species S. These critical densities are for T = 20 K. All molecular data are taken from the Leiden Atomic and Molecular Database4 (Schöier et al. 2005). We make our calculations for three sets of fiducial pa- rameters which we summarize in Table 1. The three sets correspond roughly to typical conditions in normal disk galaxies like the Milky Way, to starburst galaxies like Arp 220, and to a case intermediate between the two. We have selected parameters for each case to roughly model the systematic variation of ISM parameters as one moves 3 Note that our critical density for HCN(1 → 0) is somewhat larger than the value quoted by Gao & Solomon (2004a,b), proba- bly because their calculation is based on somewhat different as- sumptions about how to extrapolate from calculated rate coef- ficients for HCN collisions with He to collisions with H2. See Schöier et al. (2005) for details. 4 http://www.strw.leidenuniv.nl/∼moldata/ from normal disk galaxies to starbursts. Thus, we vary the ISM temperature from 10− 50 K and the molecular cloud Mach number from 30−80 as we move from Milky Way-like molecular clouds to temperatures and Mach numbers typical of starbursts (e.g. Downes & Solomon 1998). 
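To make the effect of radiative trapping on the critical densities concrete, the sketch below combines the optically thin critical densities quoted above (for T = 20 K) with the escape probability of equation (7) and the 1 → 0 optical depths of the intermediate model in Table 1. Pairing these particular numbers is an illustrative assumption, and the dictionary and function names are hypothetical.

```python
def escape_probability(tau):
    """Equation (7): beta ~ 1 / (1 + 0.5 tau) for a homogeneous sphere."""
    return 1.0 / (1.0 + 0.5 * tau)

# Critical densities before the escape-probability correction, cm^-3 (T = 20 K)
N_CRIT_THIN = {"CO": 560.0, "HCO+": 4.6e4, "HCN": 2.8e5}

# 1 -> 0 optical depths of the intermediate model in Table 1
TAU_10 = {"CO": 20.0, "HCO+": 1.0, "HCN": 1.0}

def effective_critical_density(species):
    """Critical density reduced by radiative trapping, n_crit = beta_S * n_thin."""
    return escape_probability(TAU_10[species]) * N_CRIT_THIN[species]

if __name__ == "__main__":
    for species in ("CO", "HCO+", "HCN"):
        print(species, effective_critical_density(species))
```

With these inputs the optically thick CO line ends up with an effective critical density of order 50 cm$^{-3}$, while HCO+ and HCN are reduced by only a factor of 1.5, so the ordering CO < HCO+ < HCN is preserved.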
Similarly, starbursts, which preferentially occur at galactic centers, have systematically larger metallici- ties than galaxies like the Milky Way (e.g. Zaritsky et al. 1994; Yao et al. 2003; Netzer et al. 2005). To explore this effect, we use abundances and 1 → 0 optical depths are twice and four times as large for our intermediate and starburst models, respectively, as for our normal galaxy model. 3.2. Kennicutt-Schmidt Laws We first plot, in Figure 1, the quantities L−1[dL(< n)/d lnn] (solid lines) and M−1[dM(< n)/d lnn] (dot- ted lines) as a function of density n for galaxies with mean densities n = 102, 103, and 104 cm−3, for the tracers CO(1 → 0), HCO+(1 → 0), and HCN(1 → 0), and for the Mach number and temperature corre- sponding to our intermediate case in Table 1. Here L(< n) and M(< n) are the luminosity and mass per unit volume contributed by gas of density n or less, i.e. L(< n) = X(S)βijAijhνij ∫ lnn fin(dp/d lnn)d lnn, M(< n) = ∫ lnn µH2n(dp/d lnn)d lnn, L = L(< ∞), and M = M(< ∞). Physically, L−1[dL(< n)/d lnn] and M−1[dM(< n)/d lnn] represent the fractional contribu- tion to the total line luminosity and the total mass that comes from each unit interval in the logarithm of den- sity. The plot shows what density range provides the dominant contribution to the line luminosity in differ- ent lines and for galaxies of differing mean densities, and how the gas contributing light compares to the gas con- tributing mass. Because the mass distribution is entirely specified by n and M, the dotted lines are the same in each of the three panels. Additionally, because of our choice M = 50 (Table 1), the median density (the den- sity corresponding to the peak in M−1[dM(< n)/d lnn]) is nmed ≈ 43n. In each panel, the critical density for each molecule is identified by a vertical dashed line. The top panel clearly shows that for the CO line, the light and the mass track one another very closely, even at the lowest densities. 
Thus, because nmed > ncrit, the http://www.astro.princeton.edu/~krumholz/ Molecular Gas and Kennicutt-Schmidt Laws 5 Fig. 1.— Fractional contribution to the total luminosity L−1[dL(< n)/d lnn] (solid lines) and mass M−1[dM(< n)/d lnn] (dotted lines) versus density n for the lines CO(1 → 0) (top panel), HCO+(1 → 0) (middle panel), and HCN(1 → 0) (bottom panel). The three curves show the cases n = 102 cm−3, 103 cm−3, and 104 cm−3, from leftmost to rightmost. We also show the critical density of each molecule, corrected for radiative trapping (dashed vertical lines). These calcula- tions use the parameters for the intermediate case listed in Table 1. solid lines move in lock-step with the dashed lines as n increases. In contrast, for HCN most of the luminosity comes from densities near the critical density regardless of the mass distribution. For the lowest n this means that the line luminosity is entirely dominated by the high den- sity tail of the mass distribution. As the median density nmed varies by a factor of 100 (from 4.3× 103− 4.3× 105 cm−3), the peak of L−1[dL(< n)/d lnn] moves by just a factor of a few in n. The HCO+ line is intermedi- ate between CO and HCN. For n = 102 cm−3 and 103 cm−3, nmed <∼ ncrit, and as with HCN most of the emis- sion comes from near the critical density. For n = 104 cm−3, nmed > ncrit, and the light starts to follow the mass, in a pattern similar to that for CO. Although Fig- ure 1 shows only the intermediate case, the normal galaxy and starburst cases give qualitatively identical results. This confirms the intuitive argument given in § 1: high critical density transitions trace regions of similar den- sity in every galaxy, while low critical density transitions trace regions whose density is close to the median density. Now consider how the luminosity in a given line corre- lates with the star formation rate in galaxies of varying mean densities. 
For a given $\overline{n}$, we can compute the volume density of star formation from equation (2) and the line luminosity density from equation (9). To facilitate comparison with observations, rather than considering the total line luminosity, we use the quantity $L'$ (Solomon et al. 1997), which is related to the luminosity $L$ by

$L' = \frac{c^3}{8\pi k_B \nu^3}\, L$, (10)

converted to the units K km s$^{-1}$ pc$^2$.

Similarly, we can estimate the far infrared luminosity from the star formation rate. There is a tight correlation between far-IR emission and star formation, particularly for dense, dusty galaxies like those that make up most of the dynamic range of the Kennicutt (1998a) sample. To the extent that most or all of the light from young stars is re-processed by dust before escaping the galaxy, the bolometric luminosity integrated over the wavelength range 8 − 1000 µm, which we define as $L_{\rm FIR}$, simply provides a calorimetric measurement of the total energy output by young stars, and is therefore an excellent tracer of recent star formation (Sanders & Mirabel 1996; Rowan-Robinson et al. 1997; Kennicutt 1998a,b; Hirashita et al. 2003; Bell 2003; Iglesias-Páramo et al. 2006). We therefore estimate the FIR luminosity from the star formation rate via

$L_{\rm FIR} = \epsilon \dot{M}_* c^2$, (11)

where $\epsilon$ is an IMF-dependent constant. For consistency with Kennicutt (1998a,b), we take $\epsilon = 3.8\times10^{-4}$. To be precise and to facilitate comparison with observations, we adopt the Sanders & Mirabel (1996) definition of $L_{\rm FIR}$ as a weighted sum of the luminosity in the 60 and 100 µm IRAS bands. This definition of the infrared luminosity generically underestimates the total infrared luminosity over [8 − 1000] µm by a factor of 1.5 − 2 (Calzetti et al. 2000; Dale et al. 2001; Bell 2003). However, we use the $\epsilon$ value appropriate for $L_{\rm FIR}$ rather than for the total IR luminosity because some of the observations to which we wish to compare our model (see § 3.3) provide only $L_{\rm FIR}$.
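As a quick numerical illustration of equation (11), the sketch below converts a star formation rate into a far-infrared luminosity using the efficiency $\epsilon = 3.8\times10^{-4}$ adopted in the text. The physical constants are rounded standard cgs values, and the function and variable names are illustrative rather than part of the paper's code.

```python
# Physical constants in cgs units (rounded standard values).
C_LIGHT = 2.998e10          # speed of light, cm / s
M_SUN = 1.989e33            # solar mass, g
L_SUN = 3.828e33            # solar luminosity, erg / s
SECONDS_PER_YEAR = 3.156e7

EPSILON = 3.8e-4            # efficiency adopted in the text for L_FIR


def fir_luminosity(sfr_msun_per_yr: float) -> float:
    """Far-infrared luminosity in erg/s from L_FIR = epsilon * Mdot * c^2."""
    mdot_cgs = sfr_msun_per_yr * M_SUN / SECONDS_PER_YEAR  # g / s
    return EPSILON * mdot_cgs * C_LIGHT**2


l_fir = fir_luminosity(1.0)
print(f"L_FIR = {l_fir:.2e} erg/s = {l_fir / L_SUN:.2e} L_sun")
```

With these constants a star formation rate of one solar mass per year corresponds to roughly $2\times10^{43}$ erg s$^{-1}$, and because the relation is linear the result scales directly with the star formation rate.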
Note that this choice for the connection between the star formation rate and the infrared luminosity is not fully consistent with our choice of the gas temperature for the three sets of parameters (normal, intermediate, and starburst) listed in Table 1, an issue we discuss in more detail in § 4.3.

We plot the ratio of star formation rate to line luminosity, and infrared luminosity to line luminosity, as a function of $\overline{n}$ in Figure 2. First consider the top panel, which shows all three lines computed for the intermediate case. This again confirms our intuitive argument. Since the luminosity per unit volume in the CO line is roughly proportional to the mass density, and the star formation rate (and hence the IR luminosity) is proportional to mass density to the 1.5 power, the ratios $\dot{M}_*/L'$ and $L_{\rm FIR}/L'$ vary roughly as $\overline{n}^{0.5}$. A powerlaw fit to the data over the range shown in Figure 2 gives an index of 0.57. In contrast, the ratio of star formation density to HCN luminosity density is nearly constant for galaxies with $\overline{n} < 10^3$ cm$^{-3}$, and varies quite weakly with $\overline{n}$ up to densities of $10^4$ cm$^{-3}$, values found in the densest starbursts. A powerlaw fit from 10 cm$^{-3}$ to $10^4$ cm$^{-3}$ gives an index of 0.17; from 10 cm$^{-3}$ to $10^3$ cm$^{-3}$, the best fit powerlaw index is 0.08. As in Figure 1, the slope of the $\dot{M}_*/L'$ curve for HCO$^+$ represents an intermediate case, with a roughly constant ratio of $\dot{M}_*/L'$ and $L_{\rm FIR}/L'$ at low $\overline{n}$, rising to a slope comparable to that for CO at high values of $\overline{n}$.

Fig. 2. Ratio of star formation rate or infrared luminosity to line luminosity, as a function of mean density $\overline{n}$. In the top panel we show the lines CO(1 → 0) (solid line), HCO$^+$(1 → 0) (dot-dashed line), and HCN(1 → 0) (dashed line) for the intermediate case in Table 1. In the next three panels we show the CO(1 → 0), HCO$^+$(1 → 0), and HCN(1 → 0) lines for the normal galaxy case (dot-dashed line), intermediate case (solid line), and starburst case (dashed line).

Now consider the bottom three panels in Figure 2. Each panel shows the ratio of star formation rate and infrared luminosity to line luminosity for a single line, computed for each of the three galaxy models. The most important point to take from these plots is that the choice of galaxy model has little effect in most cases. The largest differences are for HCN, where at $\overline{n} = 10$ cm$^{-3}$ the IR to line ratio predicted for the intermediate case differs from the normal galaxy case by a factor of 6.1, and from the starburst case by a factor of 4.1. This variation comes primarily from changes in the Mach number and the optical depth between models. The higher Mach number of the starburst model significantly increases the amount of mass in the high overdensity tail of the probability distribution, while the higher optical depth lowers the effective critical density. Both of these effects increase the amount of mass dense enough to emit in HCN(1 → 0) and reduce $\dot{M}_*/L'$. At higher mean densities these effects become less important and the models converge, so that by $\overline{n} = 10^4$ cm$^{-3}$ the range in $\dot{M}_*/L'$ from the normal to the starburst case is only a factor of 3.5.

Most importantly, our central conclusion that $\dot{M}_*/L'_{\rm HCN}$ is roughly constant across galaxies, while $\dot{M}_*/L'_{\rm CO}$ rises as roughly $[L'_{\rm CO}]^{0.5}$, still holds when we consider how conditions vary across galaxies. Galaxies with low mean densities $\overline{n}$ are generally closest to the normal galaxy case, while those with high mean densities should be closest to the starburst case, and this systematic variation in galaxy properties with $\overline{n}$ still leaves $\dot{M}_*/L'$ relatively flat for HCN, and varying with a slope close to 0.5 for CO. From the normal galaxy case at $\overline{n} = 10$ cm$^{-3}$ to the starburst case at $\overline{n} = 10^4$ cm$^{-3}$, the value of $\dot{M}_*/L'$ varies by more than a factor of 50 for CO(1 → 0), but by less than a factor of 3 for HCN(1 → 0).

3.3.
Comparison with Observations

The calculations illustrated in Figure 2 demonstrate the basic argument that one expects a roughly constant star formation rate per unit line luminosity for high density tracers (e.g., HCN), and a star formation rate per unit luminosity that rises like luminosity to the ∼ 0.5 power for low density tracers (e.g., CO). However, in large surveys one cannot always determine the mean density in a galaxy, which would be required to construct an observational analog to Figure 2. Instead, we can use our calculated dependence of star formation rate and line luminosity on density to compare to observations as follows. Equation (9) gives the total molecular line luminosity per unit volume and equation (2) gives the star formation rate, which we convert to an IR luminosity via equation (11). For a fixed assumed volume of molecular star-forming gas ($V_{\rm mol}$) we can then predict the expected correlations between $L'$ in a given molecular line and $L_{\rm FIR}$. The three panels of Figure 3 show our results for $L_{\rm FIR}$ as a function of $L'_{\rm CO}$, $L'_{\rm HCN}$, and $L'_{\rm HCO^+}$ for the intermediate model (see Table 1) and for several values of $V_{\rm mol}$. Figure 4 shows how our results vary as a function of the assumed $T$ and $\mathcal{M}$. There, for fixed $V_{\rm mol}$, we compare our predictions for the intermediate model with the normal and starburst models. In both figures we compare our models to data culled from the literature.

From the work of Gao & Solomon (2004a,b), Greve et al. (2005, their Fig. 7), Riechers et al. (2006b, their Fig. 5), and Gao et al. (2007), as well as the theoretical arguments in the preceding sections, we expect a strong, but not linear, correlation between the CO luminosity and the star formation rate, as measured by $L_{\rm FIR}$, with the approximate form $L_{\rm FIR} \propto [L'_{\rm CO}]^{1.5}$.
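The scaling argument behind the expected CO slope can be made concrete with a toy calculation. The sketch below is not the model of this paper: it assumes only the idealized limits described in the text, namely that at fixed $V_{\rm mol}$ the infrared luminosity scales as $V_{\rm mol}\,\overline{n}^{1.5}$ while the luminosity of a low-critical-density line scales as $V_{\rm mol}\,\overline{n}$, with all normalizations set to one. A least-squares fit in log space, the same kind of powerlaw fit quoted for Figure 2, then recovers the slope. All function names are illustrative.

```python
import math


def loglog_slope(x, y):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


def toy_luminosities(mean_densities, v_mol=1.0):
    """Idealized scalings with arbitrary normalization (assumed, not fitted):
    L_FIR ~ V_mol * n^1.5 and L'_line ~ V_mol * n for a low-critical-density line."""
    l_fir = [v_mol * n**1.5 for n in mean_densities]
    l_line = [v_mol * n for n in mean_densities]
    return l_line, l_fir


# Mean densities from 10 to 10^4 cm^-3, evenly spaced in the logarithm.
densities = [10.0**e for e in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)]
l_line, l_fir = toy_luminosities(densities)
slope = loglog_slope(l_line, l_fir)
print(f"d log L_FIR / d log L'_line = {slope:.2f}")
```

In this toy limit $L_{\rm FIR} = [L']^{1.5}\,V_{\rm mol}^{-1/2}$, so passing a different `v_mol` shifts the relation without changing its slope.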
The left panel of Figure 3 shows the CO data, the approximate correlation expected (solid line segment; offset from the data for clarity) and the theoretical prediction (solid lines) for a total volume of molecular gas of $V_{\rm mol} = 10^7$, $10^8$, and $10^9$ pc$^3$. Because at fixed $L_{\rm FIR}$ galaxies exhibit a dispersion in $V_{\rm mol}$, we expect there to be intrinsic scatter in this correlation, roughly bracketed by the range of $V_{\rm mol}$ plotted.

The middle and right panels of Figure 3 show the same prediction for $L'_{\rm HCO^+}$ and $L'_{\rm HCN}$. In these cases, because the molecular line luminosity is nearly linearly proportional to $L_{\rm FIR}$, the dependence on $V_{\rm mol}$ is much weaker than for $L'_{\rm CO}$. However, systematic changes or differences in the fiducial parameters for the calculation (see Table 1) introduce uncertainty and scatter into the correlation. Figure 4 assesses this dependence. It compares the predictions of our model for normal (dot-dashed lines), intermediate (solid lines), and starburst (dashed lines) galaxies, as defined in Table 1, for fixed $V_{\rm mol} = 10^8$ pc$^3$. Our simple model reproduces the data rather well, and it predicts that generically there may be more intrinsic scatter in the $L'_{\rm CO} - L_{\rm FIR}$ correlation than in either $L'_{\rm HCN} - L_{\rm FIR}$ or $L'_{\rm HCO^+} - L_{\rm FIR}$.

Fig. 3. $L_{\rm FIR}$ ($L_\odot$) versus $L'_{\rm CO}$(1 → 0) (left panel), $L'_{\rm HCO^+}$(1 → 0) (middle panel), and $L'_{\rm HCN}$(1 → 0) (K km s$^{-1}$ pc$^2$; right panel). The lines in each panel derive from the model presented in this paper with a constant total volume of molecular material of $10^7$, $10^8$, and $10^9$ pc$^3$ (lowest to highest). The thick solid line segment shows power-law slopes to guide the eye. Data in the left and right panels are from Gao & Solomon (2004a,b) (circles) and Gao et al. (2007) (open squares for detections, arrows for upper limits). The middle panel combines data from Nguyen et al. (1992) (small circles with lines), Graciá-Carpio et al. (2006) (big circles), and Riechers et al. (2006b) (open square; using the Gao et al.
(2007) FIR luminosity and magnification factor). For all data, $L_{\rm FIR}$ is defined based on a weighted sum of the galaxy luminosity in the 60 and 100 µm IRAS bands, as described by Sanders & Mirabel (1996). For the Nguyen et al. (1992) data, the uncertainties in $L'_{\rm HCO^+}$ indicated by the lines arise because Nguyen et al. provide both HCN(1 → 0) and HCO$^+$(1 → 0) intensities, but the values for $L'_{\rm HCN}$ derived from their work generally fall a factor of 2 − 3 below the $L'_{\rm HCN}$ from Gao & Solomon (2004a,b) for the same systems. This is probably because Nguyen et al. use a single beam pointing rather than integrating fully over extended sources, and therefore miss some of the flux. We therefore show two values of $L'_{\rm HCO^+}$, connected by a line, for each Nguyen et al. data point: a smaller value calculated directly from the data listed in their Table 2, and a larger value obtained by multiplying the $L'_{\rm HCN}$ value of Gao & Solomon for that galaxy by the ratio $I_{\rm HCO^+}/I_{\rm HCN}$ measured by Nguyen et al. If this ratio is constant over the source, this estimate should correctly account for the flux outside the beam in the Nguyen et al. HCO$^+$ observation.

Note that in both the middle and right panels of Figures 3 and 4, one expects a turn upward in the correlation at high $L_{\rm FIR}$, a deviation from linearity. This follows from the fact that in our model, at fixed $V_{\rm mol}$, systems with higher $L_{\rm FIR}$ have higher average densities. At sufficiently high $L_{\rm FIR}$ we thus expect $L'_{\rm HCN} - L_{\rm FIR}$ and $L'_{\rm HCO^+} - L_{\rm FIR}$ to steepen, in analogy with the $L'_{\rm CO} - L_{\rm FIR}$ correlation. The data points with very high $L_{\rm FIR}$ in Figures 3 and 4, which might be used to test this prediction of our model, are gravitationally lensed, at high redshift, and contaminated by bright AGN. It is therefore unclear if the deviation from linearity implied particularly by the upper limits in $L'_{\rm HCN}$ in the right panels of Figures 3 and 4 is a result of enhanced $L_{\rm FIR}$, caused by the AGN emission (Carilli et al.
2005), or is instead a result of less molecular line emission per unit star formation, as our model implies (Fig. 2). Gao et al. (2007) note, however, that in the three systems for which the contribution from the AGN has been estimated (F10214+4724, D. Downes & P. Solomon 2007, in preparation; Cloverleaf, Weiß et al. 2003; APM 08279+5255, Weiß et al. 2005, 2007) the corrections are only significant for APM 08279+5255. This suggests that the data are so far consistent with our interpretation, but clearly much more data at high $L_{\rm FIR}$ (or, more precisely for our purposes, at high density) are required to test our predictions. We discuss the issue of AGN contamination further in § 4.3.

As a final note, the data so far do support the use of HCO$^+$ as a tracer of dense gas. Papadopoulos (2007) has argued against the utility of HCO$^+$ as a faithful tracer of mass in starbursts on the basis that, since it is an ion, its abundance is strongly dependent on the free-electron abundance and might therefore vary strongly between galaxies with different ionizing radiation backgrounds. We cannot rule out this possibility given the limited data set available, but we see no strong evidence in favor of it from the data shown in Figures 3 and 4. As we have argued, HCO$^+$(1 → 0) is particularly useful because its critical density is between that of CO(1 → 0) and HCN(1 → 0) and, thus, as Figures 3 and 4 show, the correlation between $L_{\rm FIR}$ and $L'_{\rm HCO^+}$ should steepen from linear to super-linear over the range of galaxies presented in the CO panels. A careful, large-scale HCO$^+$(1 → 0) survey similar to the work of Gao & Solomon (2004a) on HCN(1 → 0) should reveal these trends. Lines with similarly low excitation temperatures and intermediate critical densities like CS(1 → 0) should behave analogously.

4. DISCUSSION

4.1.
Implications for Kennicutt-Schmidt Laws and Star Formation Efficiencies

Our results suggest that KS laws in different tracers naturally fall into two regimes, although there is a broad range of molecular tracers that are intermediate between the two extremes. Tracers for which the critical density is small compared to the median density in a galaxy represent one limit. In these tracers, the light faithfully follows the mass, so the KS law measures a relationship between total mass and star formation. In any model in which star formation occurs at a roughly constant rate per dynamical time, this must produce a KS law in which the star formation rate rises with density to a power of near 1.5, and the ratio of star formation to luminosity rises as density to the 0.5 power. In terms of surface rather than volume densities, this implies $\dot{\Sigma}_* \propto \Sigma_g^{1.5} h^{-1/2}$. If we further add the observation that the scale heights $h$ of the star-forming molecular layers of galaxies are roughly constant across galaxy types, one form of the observed Kennicutt (1998a,b) star formation law follows immediately (Elmegreen 2002). Moreover, in a galactic disk, $h \propto \Sigma_g/\overline{n}$ and $\overline{n} \propto \Omega^2/Q$ (e.g. Thompson et al. 2005); since in star-forming disks the Toomre $Q$ is about unity (Martin & Kennicutt 2001), substituting for $h$ immediately gives $\dot{\Sigma}_* \propto \Sigma_g \Omega$, the alternate form of the Kennicutt (1998a,b) law.

Fig. 4. The same as Figure 3, but with constant $V_{\rm mol} = 10^8$ pc$^3$, and for the model parameters corresponding to “starburst” (dashed), “intermediate” (solid), and “normal” (dot-dashed) (see Table 1). Therefore, the middle solid line in each panel of Figure 3 is the same as the solid line in each panel in this Figure.

The other limit is tracers for which the critical density is large compared to the median galactic density.
These tracers pick out a particular density independent of the mean or median density in the galaxy, and thus all the regions they identify have the same dynamical time regardless of galactic environment. In this case the star formation rate will simply be proportional to the total mass of the observed regions, yielding a constant ratio of star formation rate to mass, as is observed for HCN in the local universe (Fig. 3, right panel; Gao & Solomon 2004a,b; Wu et al. 2005).

We predict that there should be a transition between linear and super-linear scaling of $L_{\rm FIR}$ with line luminosity at the point where galaxies transition from median densities that are smaller than the line critical density to median densities larger than the critical density. The HCO$^+$(1 → 0) line, and other lines with similar critical densities, e.g. CS(1 → 0) and SO(1 → 0), should show this behavior for galaxies in the local universe. The observed correlation between $L'_{\rm HCO^+}$ and $L_{\rm FIR}$ appears to be consistent with our prediction, although at present the data are not of sufficient quality to distinguish between a break and a single powerlaw relation. There are hints that the very highest luminosity star-forming galaxies, which all reside at high redshift and may well reach ISM densities not found in any local systems, show such a break in the IR-HCN correlation.

One important point to emphasize in this analysis is that we have been able to explain the observed correlations between line and infrared luminosities, and hence between gas masses at various densities and star formation rates, without resorting to the hypothesis that the star formation process is fundamentally different in galaxies of different properties. Although uncertainties in both our model and the observations do not preclude an order-unity change in the star formation efficiency or ${\rm SFR}_{\rm ff}$ as a function of $L_{\rm FIR}$, there is currently no evidence for such a change in the data, contrary to claims made by, e.g.
Graciá-Carpio et al. (2006). In fact, all of the observational trends are predicted by our simple model with constant star formation efficiency. This is consistent with other lines of evidence that the fraction of mass at a given density that turns into stars is roughly 1% per free-fall time independent of density (Krumholz & Tan 2007).

4.2. Does Star Formation Have a Fundamental Size or Density Scale?

Based on the linear correlation between HCN(1 → 0) luminosity and star formation rate, seen both in external galaxies and in individual molecular clumps in the Milky Way, Gao & Solomon (2004a,b) and Wu et al. (2005) propose that HCN(1 → 0) emission traces a fundamental unit of star formation. They explain the linear IR-HCN correlation as a product of this; in their model, HCN luminosity correlates linearly with star formation rate because HCN luminosity simply counts the number of such units.

Based on our analysis, we argue that this hypothesis is only partially correct. We concur with Gao & Solomon and Wu et al. that the HCN(1 → 0) luminosity of a galaxy does simply reflect the mass of gas that is dense enough to excite the HCN(1 → 0) line. However, our analysis shows that this does not necessarily imply that this density represents a special density in the star formation process, or that objects traced by HCN(1 → 0) represent a physically distinct class. We show that a linear correlation between star formation rate and line luminosity is expected for any line with a critical density comparable to or larger than the median molecular cloud density in the galaxies used to define the correlation. It is possible that HCN(1 → 0)-emitting regions represent a physically distinct scale of star formation as Wu et al. propose, but one can explain the linear IR-HCN correlation equally well if they are just part of the same continuous medium as the regions traced by CO(1 → 0) and by other transitions.
Even the star-forming clouds themselves may simply be parts of a continuous distribution of ISM structures occupying the entire galaxy, as argued by Wada & Norman (2007). In this case there need be no special density scales other than the mean and median densities for the star-forming clouds on their largest scales, and the density at which star formation becomes rapid, converting the mass into stars in of order a free-fall time. This transition scale is unknown, but must be considerably larger than the density traced by HCN (Krumholz & Tan 2007).

4.3. Limitations and Cautions

4.3.1. Self-Consistency

As mentioned in § 3.2, our approach of leaving the gas temperature $T$ and Mach number $\mathcal{M}$ as free parameters is not entirely consistent with our calculation of the IR luminosity, since the IR luminosity and temperature are of course related. In principle, with a model of how the energy output from stars heats the dust and gas, together with a structural model connecting the energy and momentum output from stars to the generation of turbulence, it should be possible to self-consistently compute both the gas temperature and the Mach number from the volumetric star formation rate (see, e.g., Thompson et al. 2005). Such a model would return $T$ and $\mathcal{M}$ as a function of $\overline{n}$ and possibly other galaxy properties, while simultaneously predicting a set of Kennicutt-Schmidt laws.

If the line luminosity depended strongly on $T$ or $\mathcal{M}$, or if one required knowledge of the temperature to compute the infrared luminosity of a galaxy, we would have no alternative to constructing such a model if we wished to explain the observed IR-line luminosity correlation. However, we can avoid this by relying on the observationally-calibrated star formation-IR correlation, and because, as we show in Figures 2, 3, and 4, the line luminosity varies quite weakly over a reasonable range of $T$ and $\mathcal{M}$ for our chosen lines.
For this reason, any model for computing $T$ and $\mathcal{M}$ as a function of galaxy properties, if it were consistent with observations, would not significantly alter the IR-line luminosity correlation we derive. This is true, however, only for lines that require low temperatures to excite. As we discuss in § 4.3.2, lines that require higher temperatures to excite do depend sensitively on the temperature in the galaxy, and a model capable of predicting the IR-line luminosity correlation for these lines must also include a calculation of the temperature structure of the galaxy.

4.3.2. Isothermality

Our assumption of isothermality means that our analysis will only apply to molecular lines for which the temperature $T_{\rm up}$ corresponding to the upper state energy is < 10 K, low enough to be excited even in the coolest molecular clouds in normal spiral galaxies. The reason for this is that at temperatures larger than $T_{\rm up}$, the luminosity in a line generally varies at most linearly with the temperature. As the similarity between the results with our different galaxy models illustrates, changing the temperature within the range of ∼ 10 − 50 K produces only a factor of a few change in the luminosity of the lines we have studied. In contrast, line luminosity responds exponentially to temperature changes when the temperature is below the value corresponding to the upper state’s energy. This means that lines sensitive to high temperatures pick out primarily the regions that are warm enough for the line to be excited. Density has only a secondary effect. The emission will therefore reflect the temperature distribution in star-forming clouds more than the density distribution, an effect that our isothermal assumption precludes us from treating.
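The contrast between the weak and the exponential temperature regimes can be illustrated with the Boltzmann factor $\exp(-T_{\rm up}/T)$ alone. This is a deliberately crude sketch: it ignores the partition function, optical depth, and sub-thermal excitation, so it shows only the trend and not actual line luminosities. The value $T_{\rm up} \approx 5.5$ K for CO(1 → 0) is a standard figure that is not quoted in this section, and 154 K is the value this subsection cites for CO(7 → 6). Function names are illustrative.

```python
import math


def boltzmann_factor(t_up: float, t_gas: float) -> float:
    """Relative upper-level population weight exp(-T_up / T)."""
    return math.exp(-t_up / t_gas)


def gain_from_heating(t_up: float, t_cold: float = 10.0, t_warm: float = 50.0) -> float:
    """Factor by which the Boltzmann weight grows when gas warms from t_cold to t_warm."""
    return boltzmann_factor(t_up, t_warm) / boltzmann_factor(t_up, t_cold)


# T_up in kelvin. CO(1-0): standard value, an assumption in this sketch.
# CO(7-6): the 154 K quoted in the text.
for label, t_up in (("CO(1-0)", 5.5), ("CO(7-6)", 154.0)):
    gain = gain_from_heating(t_up)
    print(f"{label}: T_up = {t_up:5.1f} K, gain from 10 K to 50 K = {gain:.3g}")
```

Under these assumptions the low-lying line changes by less than a factor of two across 10 to 50 K, while the high-lying line changes by more than five orders of magnitude, which is the sense in which such lines trace warm gas rather than dense gas.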
KS laws in high temperature tracers are likely to find linear relationships between star formation rate and mass regardless of the critical density of the molecule in question because they will simply be correlating the mass of dust warmed to $\gtrsim 100$ K, which is essentially what is measured by $L_{\rm FIR}$, with the mass of gas warmed to temperatures above $T_{\rm up}$. However, our model will not apply to these lines, and for this reason we do not attempt to compare to observations using higher transitions of CO (3 → 2, 4 → 3, 5 → 4, 6 → 5, and 7 → 6, which have $T_{\rm up}$ = 33, 55, 83, 116, and 154 K, respectively; Greve et al. 2005, Solomon & Vanden Bout 2005), CS(5 → 4) ($T_{\rm up}$ = 35 K; Plume et al. 1997), or other high temperature tracers.

4.3.3. Molecular Abundances

We have not considered density-dependent variations in molecule abundances. One potential source of variation in molecular abundance is freeze-out onto grain surfaces at high densities and low temperatures (e.g. Tafalla et al. 2004a,b). Chemodynamical models suggest that freeze-out is not likely to become significant for either carbonaceous or nitrogenous species until densities $n \gtrsim 10^6$ cm$^{-3}$ (Flower et al. 2006), but may become severe at higher densities, so whether depletion is significant depends on what fraction of the total luminosity would be contributed by gas of this density or higher were there no freeze-out. Figure 1 suggests that freeze-out is likely to modify the total galactic luminosity of CO, HCO$^+$, and HCN fairly little even at a mean ISM density of $\overline{n} = 10^4$ cm$^{-3}$, but may have significant effects for galaxies of larger mean densities or for lines for which the critical density is comparable to the freeze-out density. If freeze-out is significant, our conclusions will be modified.

4.3.4. Atomic Gas

In the simple model developed here, we have neglected the role of atomic gas entirely.
Whether the density or surface density of atomic gas plays a role in controlling the star formation rate is subject to debate on both observational and theoretical grounds (Kennicutt 1998a,b; Wong & Blitz 2002; Heyer et al. 2004; Komugi et al. 2005; Krumholz & McKee 2005; Kennicutt et al. 2007), so it is unclear how much of a limitation this omission really is. We can say with confidence that in molecule-rich galaxies, which provide almost all the dynamic range of both the Kennicutt (1998a,b) correlation and the correlations illustrated in Figures 3 and 4, the atomic gas plays almost no role simply because there is so little of it. Thus, our predictions should be quite robust, except perhaps at the very low luminosity ends of Figures 3 and 4.

4.3.5. AGN Contributions

A final point is not so much a limitation of our work as a cautionary note about comparing our model with observations. We have included in our model IR luminosity only from star formation, and molecular line luminosity only from molecules in cold star-forming clouds. However, an AGN may make a significant contribution to a galaxy’s luminosity in the far infrared by direct heating of dust grains, and in molecular lines via an X-ray dissociation region. Indeed, several of the systems with the highest IR luminosities in Figures 3 and 4 are contaminated by AGN. As noted in § 3.3, this complicates an assessment of our prediction of an up-turn in the $L'_{\rm HCN} - L_{\rm FIR}$ and $L'_{\rm HCO^+} - L_{\rm FIR}$ correlations at high luminosity. This deviation from linearity at high gas density (at fixed $V_{\rm mol}$, high $L_{\rm FIR}$) is an essential prediction of our model, but testing it relies on a careful separation of the contribution of the AGN to both the IR and line luminosities (e.g. Maloney et al. 1996). In fact, Carilli et al. (2005) discuss the possibility that the AGN’s contribution to the IR luminosity in these systems causes them to be above the local linear $L'_{\rm HCN} - L_{\rm FIR}$ correlation.
Such a contamination would mimic the prediction of our model. However, Gao et al. (2007) argue that the sub-millimeter galaxies in their sample are not AGN dominated and that just one of three quasars in their sample (APM 08279+5255) has a large AGN IR component. See Gao et al. (2007) for more discussion. For these reasons we contend that although our model is consistent with the existing data, the current evidence for a break in the $L'_{\rm HCN} - L_{\rm FIR}$ correlation should be viewed with caution, and more data in high density/luminosity systems are clearly required to understand the role of AGN contamination in shaping the correlation.

5. CONCLUSIONS

We provide a simple model for understanding how Kennicutt-Schmidt laws, which relate the star formation rate to the mass or surface density of gas as inferred from some particular line, depend on the line chosen to define the correlation. We show that for a turbulent medium the luminosity per unit volume in a given line, provided that line can be excited at temperatures lower than the mean temperature in a galaxy’s molecular clouds, increases faster than linearly with the density for molecules with critical densities larger than the median gas density. The star formation rate also rises super-linearly with the gas density, and the combination of these two effects produces a close to linear correlation between star formation rate and line luminosity. In contrast, the line luminosity rises only linearly with density for lines with low critical densities, producing a correlation between star formation rate and line luminosity that is super-linear.

Based on this analysis, we construct a model for the correlation between a galaxy’s infrared luminosity and its luminosity in a particular molecular line.
Our model is extremely simple, in that it relies on an observationally-calibrated IR-star formation rate correlation, it treats molecular clouds as having homogeneous density and velocity distributions, temperatures, and chemical compositions, and it only very crudely accounts for variations in molecular cloud properties across galaxies. Despite these approximations, the model naturally explains why some observed correlations between infrared luminosity and line luminosity in galaxies are linear, and some are super-linear. Using it, we are able to compute quantitatively the correlation between infrared and HCN(1 → 0) line luminosity, and between IR and CO(1 → 0) line luminosity. We show that our model provides a very good fit to observations in these lines, and we are able to make similar predictions for any molecular line that can be excited at low temperatures, as we demonstrate for the example of HCO$^+$(1 → 0). Moreover, we are able to explain the observed data without recourse to the hypothesis that the star formation process is somehow different, either more or less efficient, in different types of galaxies or for media of different densities. Instead, our model is able to explain the observed correlations using a simple, universal star formation law.

One strong prediction of our model is that there should be a break from linear to non-linear scaling in the HCN-IR correlation at very high IR luminosity, and a similar break in the HCO$^+$-IR correlation at somewhat lower luminosity. The data for HCO$^+$ are consistent with this prediction but do not yet strongly favor a break over pure powerlaw behavior.
However, there is some preliminary evidence for a break in the IR-HCN correlation in high redshift galaxies more luminous than any found in the local universe, although with these high redshift observations it is difficult to rule out the alternative explanation of the break as arising due to a progressively rising AGN contribution to the IR luminosity (see § 3.3 and § 4.3.5). Future galaxy surveys both in the local universe and at high redshift may be used to test our predictions for HCO$^+$(1 → 0), HCN(1 → 0), and other molecular lines.

We thank L. Blitz, B. Draine, A. Leroy, E. Rosolowsky, and A. Socrates for helpful discussions, N. Evans and the anonymous referee for useful comments on the manuscript, and R. Kennicutt for kindly providing a preprint of his submitted paper. We thank Y. Gao for providing $L_{\rm FIR}$ for the systems used in Figures 3 and 4. MRK acknowledges support from NASA through Hubble Fellowship grant #HSF-HF-01186 awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS 5-26555. TAT acknowledges support from a Lyman Spitzer, Jr. Fellowship.

REFERENCES

Bell, E. F. 2003, ApJ, 586, 794
Black, J. H. 2000, in Astronomy, physics and chemistry of H
Calzetti, D., Armus, L., Bohlin, R. C., Kinney, A. L., Koornneef, J., & Storchi-Bergmann, T. 2000, ApJ, 533, 682
Carilli, C. L., Solomon, P., Vanden Bout, P., Walter, F., Beelen, A., Cox, P., Bertoldi, F., Menten, K. M., Isaak, K. G., Chandler, C. J., & Omont, A. 2005, ApJ, 618, 586
Combes, F. 1991, ARA&A, 29, 195
Dale, D. A., Helou, G., Contursi, A., Silbermann, N. A., & Kolhatkar, S. 2001, ApJ, 549, 215
Downes, D. & Solomon, P. M. 1998, ApJ, 507, 615
Elmegreen, B. G. 1994, ApJ, 425, L73
Elmegreen, B. G. 2002, ApJ, 577, 206
Elmegreen, B. G. & Scalo, J. 2004, ARA&A, 42, 211
Flower, D. R., Pineau Des Forêts, G., & Walmsley, C. M. 2006, A&A, 456, 215
Gao, Y., Carilli, C. L., Solomon, P. M., & Vanden Bout, P. A.
2007, ApJ, in press, astro-ph/0703548 Gao, Y. & Solomon, P. M. 2004a, ApJS, 152, 63 —. 2004b, ApJ, 606, 271 Graciá-Carpio, J., García-Burillo, S., Planesas, P., & Colina, L. 2006, ApJ, 640, L135 Greve, T. R., Bertoldi, F., Smail, I., Neri, R., Chapman, S. C., Blain, A. W., Ivison, R. J., Genzel, R., Omont, A., Cox, P., Tacconi, L., & Kneib, J.-P. 2005, MNRAS, 359, 1165 Heyer, M. H., Corbelli, E., Schneider, S. E., & Young, J. S. 2004, ApJ, 602, 723 Hirashita, H., Buat, V., & Inoue, A. K. 2003, A&A, 410, 83 Iglesias-Páramo, J., Buat, V., Takeuchi, T. T., Xu, K., Boissier, S., Boselli, A., Burgarella, D., Madore, B. F., Gil de Paz, A., Bianchi, L., Barlow, T. A., Byun, Y.-I., Donas, J., Forster, K., Friedman, P. G., Heckman, T. M., Jelinski, P. N., Lee, Y.-W., Malina, R. F., Martin, D. C., Milliard, B., Morrissey, P. F., Neff, S. G., Rich, R. M., Schiminovich, D., Seibert, M., Siegmund, O. H. W., Small, T., Szalay, A. S., Welsh, B. Y., & Wyder, T. K. 2006, ApJS, 164, 38 Juvela, M., Padoan, P., & Nordlund, Å. 2001, ApJ, 563, 853 Kennicutt, R. C. 1998a, ARA&A, 36, 189 —. 1998b, ApJ, 498, 541 Kennicutt, R. C., Calzetti, D., Walter, F., Helou, G., Hollenbach, D. J., Armus, L., Bendo, G., Dale, D. A., Draine, B. T., Engelbracht, C. W., Gordon, K. D., Prescott, M. K. M., Regan, M. W., Thornley, M. D., Bot, C., Brinks, E., de Blok, E., de Mello, D., Meyer, M., Moustakas, J., Murphy, E. J., Sheth, K., & Smith, J. D. T. 2007, ApJ, submitted Komugi, S., Sofue, Y., Nakanishi, H., Onodera, S., & Egusa, F. 2005, PASJ, 57, 733 Krumholz, M. R. & McKee, C. F. 2005, ApJ, 630, 250 Krumholz, M. R. & Tan, J. C. 2007, ApJ, 654, 304 Lahuis, F. & van Dishoeck, E. F. 2000, A&A, 355, 699 Mac Low, M. & Klessen, R. S. 2004, Reviews of Modern Physics, 76, 125 Madore, B. F. 1977, MNRAS, 178, 1 Maloney, P. R., Hollenbach, D. J., & Tielens, A. G. G. M. 1996, ApJ, 466, 561 Martin, C. L. & Kennicutt, R. C. 2001, ApJ, 555, 301 McKee, C. F.
1999, in NATO ASIC Proc. 540: The Origin of Stars and Planetary Systems, 29 Netzer, H., Lemze, D., Kaspi, S., George, I. M., Turner, T. J., Lutz, D., Boller, T., & Chelouche, D. 2005, ApJ, 629, 739 Neufeld, D. A., Melnick, G. J., Sonnentrucker, P., Bergin, E. A., Green, J. D., Kim, K. H., Watson, D. M., Forrest, W. J., & Pipher, J. L. 2006, ApJ, 649, 816 Nguyen, Q.-R., Jackson, J. M., Henkel, C., Truong, B., & Mauersberger, R. 1992, ApJ, 399, 521 Nordlund, Å. K. & Padoan, P. 1999, in Interstellar Turbulence, Ostriker, E. C., Gammie, C. F., & Stone, J. M. 1999, ApJ, 513, Padoan, P. & Nordlund, Å. 2002, ApJ, 576, 870 Papadopoulos, P. P. 2007, ApJ, 656, 792 Plume, R., Jaffe, D. T., Evans, N. J., Martin-Pintado, J., & Gomez-Gonzalez, J. 1997, ApJ, 476, 730 Riechers, D. A., Walter, F., Carilli, C. L., Knudsen, K. K., Lo, K. Y., Benford, D. J., Staguhn, J. G., Hunter, T. R., Bertoldi, F., Henkel, C., Menten, K. M., Weiss, A., Yun, M. S., & Scoville, N. Z. 2006a, ApJ, 650, 604 Riechers, D. A., Walter, F., Carilli, C. L., Weiss, A., Bertoldi, F., Menten, K. M., Knudsen, K. K., & Cox, P. 2006b, ApJ, 645, Rowan-Robinson, M., Mann, R. G., Oliver, S. J., Efstathiou, A., Eaton, N., Goldschmidt, P., Mobasher, B., Serjeant, S. B. G., Sumner, T. J., Danese, L., Elbaz, D., Franceschini, A., Egami, E., Kontizas, M., Lawrence, A., McMahon, R., Norgaard-Nielsen, H. U., Perez-Fournon, I., & Gonzalez-Serrano, J. I. 1997, MNRAS, 289, 490 Sanders, D. B. & Mirabel, I. F. 1996, ARA&A, 34, 749 Schmidt, M. 1959, ApJ, 129, 243 —. 1963, ApJ, 137, 758 Schöier, F. L., van der Tak, F. F. S., van Dishoeck, E. F., & Black, J. H. 2005, A&A, 432, 369 Solomon, P. M., Downes, D., Radford, S. J. E., & Barrett, J. W. 1997, ApJ, 478, 144 Solomon, P. M., Rivolo, A. R., Barrett, J., & Yahil, A. 1987, ApJ, 319, 730 Solomon, P. M. & Vanden Bout, P. A. 2005, ARA&A, 43, 677 Tafalla, M., Myers, P. C., Caselli, P., & Walmsley, C. M. 2004a, A&A, 416, 191 —. 2004b, Ap&SS, 292, 347 Thompson, T. 
A., Quataert, E., & Murray, N. 2005, ApJ, 630, Wada, K. & Norman, C. 2007, ApJ, in press, astro-ph/0701595 Weiß, A., Downes, D., Neri, R., Walter, F., Henkel, C., Wilner, D. J., Wagg, J., & Wiklind, T. 2007, A&A, 467, 955 Weiß, A., Downes, D., Walter, F., & Henkel, C. 2005, A&A, 440, Weiß, A., Henkel, C., Downes, D., & Walter, F. 2003, A&A, 409, Wild, W., Harris, A. I., Eckart, A., Genzel, R., Graf, U. U., Jackson, J. M., Russell, A. P. G., & Stutzki, J. 1992, A&A, 265, 447 Wong, T. & Blitz, L. 2002, ApJ, 569, 157 Wu, J., Evans, N. J., Gao, Y., Solomon, P. M., Shirley, Y. L., & Vanden Bout, P. A. 2005, ApJ, 635, L173 Yao, L., Seaquist, E. R., Kuno, N., & Dunne, L. 2003, ApJ, 588, Zaritsky, D., Kennicutt, Jr., R. C., & Huchra, J. P. 1994, ApJ, 420, 87 ABSTRACT We provide a model for how Kennicutt-Schmidt (KS) laws, which describe the correlation between star formation rate and gas surface or volume density, depend on the molecular line chosen to trace the gas. We show that, for lines that can be excited at low temperatures, the KS law depends on how the line critical density compares to the median density in a galaxy's star-forming molecular clouds. High critical density lines trace regions with similar physical properties across galaxy types, and this produces a linear correlation between line luminosity and star formation rate. Low critical density lines probe regions whose properties vary across galaxies, leading to a star formation rate that varies superlinearly with line luminosity. We show that a simple model in which molecular clouds are treated as isothermal and homogenous can quantitatively reproduce the observed correlations between galactic luminosities in far infrared and in the CO(1->0) and HCN(1->0) lines, and naturally explains why these correlations have different slopes. We predict that IR-line luminosity correlations should change slope for galaxies in which the median density is close to the line critical density. 
This prediction may be tested by observations of lines such as HCO^+(1->0) with intermediate critical densities, or by HCN(1->0) observations of intensely star-forming high redshift galaxies with very high densities. Recent observations by Gao et al. hint at just such a change in slope. We argue that deviations from linearity in the HCN(1->0)-IR correlation at high luminosity are consistent with the assumption of a constant star formation efficiency. <|endoftext|><|startoftext|> Friedmann Equations and Thermodynamics of Apparent Horizons

Yungui Gong1, 2, ∗ and Anzhong Wang2, †
1 College of Mathematics and Physics, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 GCAP-CASPER, Department of Physics, Baylor University, Waco, TX 76798, USA

With the help of a masslike function which has the dimension of energy and is equal to the Misner-Sharp mass at the apparent horizon, we show that the first law of thermodynamics of the apparent horizon, $dE = T_A dS_A$, can be derived from the Friedmann equation in various theories of gravity, including the Einstein, Lovelock, nonlinear, and scalar-tensor theories. This result strongly suggests that the relationship between the first law of thermodynamics of the apparent horizon and the Friedmann equation is not just a simple coincidence, but rather a more profound physical connection.

PACS numbers: 98.80.-k, 04.20.Cv, 04.70.Dy

The derivation of the thermodynamic laws of black holes from the classical Einstein equation suggests a deep connection between gravitation and thermodynamics [1]. The discovery of quantum Hawking radiation [2] and of black hole entropy, which is proportional to the area of the event horizon of the black hole [3], further supports this connection and the thermodynamic (physical) interpretation of geometric quantities.
The interesting relation between thermodynamics and gravitation became manifest when Jacobson derived the Einstein equation from the first law of thermodynamics by assuming the proportionality of the entropy and the horizon area for all local acceleration horizons [4].

In cosmology, as for black holes, the cosmological model with a cosmological constant (called de Sitter space) also has a Hawking temperature and an entropy associated with the cosmological event horizon, together with thermodynamic laws of the cosmological event horizon [5]. In de Sitter space, the event horizon coincides with the apparent horizon (AH). For more general cosmological models, the event horizon may not exist, but the AH always exists, so it is possible to have a Hawking temperature and an entropy associated with the AH. The connection between the first law of thermodynamics of the AH and the Friedmann equation was shown in [6]. Now, we must ask whether this interesting relation between gravitation and thermodynamics exists in more general theories of gravity, like Brans-Dicke (BD) theory and nonlinear gravitational theory. In [7], the gravitational field equations for the nonlinear theory of gravity were derived from the first law of thermodynamics by adding some nonequilibrium corrections. In this Letter, we show that equilibrium thermodynamics indeed exists for more general theories of gravity, provided that a new masslike function is introduced.

To show our claim, we begin by reviewing the thermodynamics of the AH with the use of the Misner-Sharp (MS) mass in Einstein and BD theories of gravity, whereby we find that the equilibrium thermodynamics fails to hold for the BD theory. The Einstein equation can be rewritten as the mass formulas with the help of the MS mass $M$. The energy flow $dE$ through the AH is related to the MS mass.
Since the MS mass $M$, the Hawking temperature $T_A$, and the entropy $S_A$ of the AH are geometric quantities, the first law of thermodynamics of the AH can be thought of as a geometric relation. Therefore, we expect the geometric relation to hold in other gravitational theories if it holds in Einstein theory. To achieve this, we replace the MS mass $M$ by a masslike function $\mathcal{M}$ which is equal to the MS mass $M$ at the AH; we then show that the connection between the first law of thermodynamics of the AH and the gravitational equations holds in scalar-tensor and nonlinear theories of gravity without adding a nonequilibrium correction.

For a spherically symmetric space-time with the metric $ds^2 = g_{ab}dx^a dx^b + \tilde r^2 d\Omega^2$, using the MS mass $M = \tilde r(1 - g^{ab}\tilde r_{,a}\tilde r_{,b})/2G$ [8], the $a-b$ components of the Einstein equation give the mass formulas [9, 10]
$$M_{,a} = 4\pi\tilde r^2\left(T_a^b - \delta_a^b T\right)\tilde r_{,b}, \tag{1}$$
where the unit spherical metric is given by $d\Omega^2 = d\theta^2 + \sin^2\theta\, d\varphi^2$ and $T = T^a_a$. From now on, all the indices are raised and lowered by the metric $g_{ab}$ and the covariant derivative is with respect to $g_{ab}$. The AH is
$$\tilde r_A = a r_A = \left(H^2 + k/a^2\right)^{-1/2}. \tag{2}$$
At the AH, the MS mass is $M = 4\pi\tilde r_A^3\rho/3$, which can be interpreted as the total energy inside the AH. Now we use the (approximate) generator $k^a = (1, -Hr)$ of the AH, which is null at the horizon, to project the mass formulas. Since $k^a\tilde r_{,a} = 0$, at the AH we find that
$$-dE = -k^a\nabla_a M\,dt = d(\tilde r_A)/G = T_A dS_A, \tag{3}$$
where the horizon temperature is $T_A = 1/(2\pi\tilde r_A)$ and the horizon entropy is $S_A = \pi\tilde r_A^2/G$. On the other hand, using the mass formulas (1), we get the energy flow through the AH
$$-dE = -k^a\nabla_a M\,dt = -4\pi\tilde r^2 T_a^b\tilde r_{,b}k^a\,dt = 4\pi\tilde r_A^3 H(\rho + p)\,dt. \tag{4}$$
Therefore, the Friedmann equation gives rise to the first law of thermodynamics $-dE = T_A dS_A$ of the AH. From the above definitions, we see that the relation $-dE = T_A dS_A$ is a geometrical relation which depends only on the assumption of the Robertson-Walker metric.
To connect the geometrical quantity $dE$ with the energy flow through the AH, we need to use the Friedmann equations. Therefore, for any gravitational theory, if we can write the gravitational field equation as $G_{\mu\nu} = 8\pi G T_{\mu\nu}$ and regard the right-hand side as the effective energy-momentum tensor, then we find the energy flow through the AH, whereby we derive the first law of thermodynamics of the AH, $-dE = T_A dS_A$. For example, in the Jordan frame of the scalar-tensor theory of gravity, if we take the right-hand side of the gravitational field equation as the total effective energy-momentum tensor, then the Friedmann equation can be regarded as a thermodynamic identity at the AH [11].

The connection between the first law of thermodynamics and the Friedmann equation at the AH was also found for gravity with a Gauss-Bonnet term, the Lovelock theory of gravity [6], and braneworld cosmology [12]. For general static spherically symmetric and stationary axisymmetric space-times, it was shown that the Einstein equation at the horizon gives rise to the first law of thermodynamics [13, 14]. For Lovelock gravity, the interpretation of the gravitational field equation as a thermodynamic identity was proposed in [15].

Alternatively, the mass formulas (1) can be written as the so-called unified first law $\nabla_a M = A\Psi_a + W\nabla_a V$ [16, 17], where $W = (\rho - p)/2$ and $\Psi_a = T_a^b\tilde r_{,b} + W\tilde r_{,a}$. Projecting the unified first law along the direction tangent to the AH (or trapping horizon in Hayward's terminology), the first law of thermodynamics $dM = TdS + WdV$ can be derived, where the horizon temperature and entropy are given, respectively, by $T = \Box\tilde r/(4\pi)$ and $S = A/(4G)$. Based on this result, the connection between the Friedmann equation and the first law of thermodynamics of the AH with the work term was widely discussed for Einstein gravity, Lovelock gravity, the scalar-tensor theory of gravity, the nonlinear theory of gravity, and the braneworld scenario [18, 19, 20, 21, 22, 23].
This connection between the Friedmann equation and the first law of thermodynamics of the AH suggests the unique role of the AH in the thermodynamics of cosmology. This may be used to probe the properties of dark energy [10, 24]. For example, if we assume that the temperature of the dark components is $T = bT_A$ and use the relation $T = (\rho + p)/s = (\rho + p)a^3/\sigma$, we find that the total energy density of the dark components is given by
$$\rho = \rho_\Lambda + \rho_0\left(\frac{a_0}{a}\right)^6 + 2\sqrt{\rho_\Lambda\rho_0}\left(\frac{a_0}{a}\right)^3, \tag{5}$$
where $\rho_0 = \sigma^2 b^2 G a_0^{-6}/(6\pi)$, $\rho_\Lambda = 3\Lambda/(8\pi G)$ is the energy density of the cosmological constant, $\sigma$ is the constant comoving entropy density, and $s$ is the physical entropy density. The right-hand side of the above equation contains three different terms, which correspond to, respectively, the cosmological constant, the stiff fluid, and the pressureless matter. However, the coefficients of these terms are not all independent. In fact, the current observational constraints tell us that the stiff fluid is negligibly small, for which we must assume $\rho_0 \ll 1$. This in turn implies that the pressureless matter given by the last term is also negligibly small. So the pressureless matter in the last term cannot account for dark matter. In other words, the dark matter must not be in equilibrium with the AH.

For the BD theory [25]
$$\mathcal{L} = \frac{1}{16\pi}\left[-\phi R + \omega g^{\mu\nu}\frac{\partial_\mu\phi\,\partial_\nu\phi}{\phi}\right], \tag{6}$$
the BD scalar $\phi$ plays the role of the gravitational constant. The MS mass is [26]
$$M = \phi\tilde r\left(1 - g^{ab}\tilde r_{,a}\tilde r_{,b}\right)/2. \tag{7}$$
At the AH, $M = \phi\tilde r_A/2$. The horizon entropy is $S_A = \pi\tilde r_A^2\phi$, so we get
$$T_A dS_A = \tilde r_A d\phi/2 + \phi\,d\tilde r_A. \tag{8}$$
On the other hand, we have
$$-dE = -k^a\nabla_a M\,dt = -\tilde r_A d\phi/2 + \phi\,d\tilde r_A. \tag{9}$$
Comparing Eq. (8) with Eq. (9), we find that the equilibrium thermodynamics $-dE = T_A dS_A$ fails to hold for the BD theory. Similarly, it can be shown that $-dE = T_A dS_A$ does not hold in the nonlinear and scalar-tensor theories of gravity. It is exactly because of this that it was argued that a nonequilibrium treatment might be needed.
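As a worked check of Eqs. (8) and (9), the mismatch can be made explicit using only the quantities already defined above, namely $T_A = 1/(2\pi\tilde r_A)$ and $S_A = \pi\tilde r_A^2\phi$. The sketch below is an illustration, not part of the original derivation:

```latex
\begin{align*}
T_A\,dS_A &= \frac{1}{2\pi\tilde r_A}\,d\!\left(\pi\tilde r_A^{2}\phi\right)
           = \frac{1}{2\pi\tilde r_A}\left(2\pi\tilde r_A\phi\,d\tilde r_A
             + \pi\tilde r_A^{2}\,d\phi\right)
           = \phi\,d\tilde r_A + \tfrac{1}{2}\tilde r_A\,d\phi ,\\[4pt]
T_A\,dS_A - (-dE) &= \left(\phi\,d\tilde r_A + \tfrac{1}{2}\tilde r_A\,d\phi\right)
                   - \left(\phi\,d\tilde r_A - \tfrac{1}{2}\tilde r_A\,d\phi\right)
                   = \tilde r_A\,d\phi .
\end{align*}
```

The two sides therefore differ by $\tilde r_A\,d\phi$, which vanishes only when $\phi$ is constant along the horizon generator.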
As mentioned above, the mass, temperature and entropy of the AH are all geometrical quantities, and the first law of thermodynamics of the AH can be regarded as a geometric relation. Now, the important question is whether a mass function exists that serves as the bridge between the Friedmann equation and the first law of thermodynamics of the AH without a nonequilibrium correction. In the following, we show that the answer is affirmative. Such a function has exactly the dimension of energy, and is equal to the MS mass at the AH. To distinguish it from the MS mass, we call it the masslike function.

To show our above claim, let us write the $a-b$ components of the Einstein equation as
$$\mathcal{M}_{,a} = -4\pi\tilde r^2\left(T_a^b - \delta_a^b T\right)\tilde r_{,b} + \frac{1}{G}\tilde r_{,a}, \tag{10}$$
where the masslike function $\mathcal{M}$ is defined as
$$\mathcal{M} \equiv \frac{\tilde r}{2G}\left(1 + g^{ab}\tilde r_{,a}\tilde r_{,b}\right). \tag{11}$$
At the AH, $g^{ab}\tilde r_{,a}\tilde r_{,b} = 0$ and the masslike function is $\mathcal{M} = \tilde r_A/2G$, which is equal to the MS mass. For the Robertson-Walker metric we have $g_{tt} = -1$, $g_{rr} = a^2/(1 - kr^2)$ and $\tilde r = ar$. Then, the mass formulas (10) yield the Friedmann equations
$$H^2 + \frac{k}{a^2} = \frac{8\pi G}{3}\rho, \tag{12}$$
$$\frac{\ddot a}{a} = -\frac{4\pi G}{3}(\rho + 3p). \tag{13}$$
Combining Eqs. (12) and (13), we can derive the energy conservation law $\dot\rho + 3H(\rho + p) = 0$. Thus, the mass formulas (10) give rise to the full set of the cosmological equations. At the AH, the masslike function is $\mathcal{M} = 4\pi\tilde r_A^3\rho/3$, which is the total energy inside the AH. The energy flow is
$$dE = k^a\nabla_a\mathcal{M}\,dt = d(\tilde r_A)/G = T_A dS_A. \tag{14}$$
On the other hand, using the mass formulas (10), we get the energy flow through the AH
$$dE = k^a\nabla_a\mathcal{M}\,dt = -4\pi\tilde r^2 T_a^b\tilde r_{,b}k^a\,dt = 4\pi\tilde r_A^3 H(\rho + p)\,dt. \tag{15}$$
Therefore, the Friedmann equation gives rise to the first law of thermodynamics $dE = T_A dS_A$ of the AH. While this result is the same as that obtained by using the MS mass, we show below that the equilibrium thermodynamics can be derived for BD and nonlinear gravities by using our newly defined masslike function, although it cannot be done by using the MS mass, as shown above.

For the BD theory, the masslike function is defined as $\mathcal{M} \equiv \phi\tilde r\left(1 + g^{ab}\tilde r_{,a}\tilde r_{,b}\right)/2$.
(16)

At the AH, it reduces to the MS mass, $\mathcal{M} = M = \phi\tilde r_A/2$. The $a-b$ components of the gravitational field equation become M,a =− 4πr̃2(T ba − δ aT )r̃,b + 2πr̃ + (φr̃),a ω + 2 r̃2φ,aφ,br̃ r̃2r̃,aφ,bφ ;b − r̃r̃,ar̃,bφ;b r̃2φ;abr̃ r̃2✷r̃φ,a − φ,a✷φ. (17) Applying to the Robertson-Walker metric, the above equation gives the Friedmann equations , (18) (ρ+ 3p)− . (19) The mass formulas (17) or Eqs. (18) and (19) are not sufficient to describe the full dynamics of the BD cosmology. In the BD cosmology, we also need the equation of motion of the BD scalar field $\phi$ in addition to Eqs. (18) and (19), which is given by
$$\ddot\phi + 3H\dot\phi = \frac{8\pi}{3 + 2\omega}(\rho - 3p). \tag{20}$$
From the definition of the masslike function (16), at the AH we find
$$dE = \mathcal{M}_{,a}k^a\,dt = \tilde r_A d\phi/2 + \phi\,d\tilde r_A = T_A dS_A, \tag{21}$$
where the entropy now is $S_A = \pi\tilde r_A^2\phi$. Using the mass formulas (17), we get the energy flow through the AH 3 + 2ω r̃3AH [(ω + 2)ρ+ ωp] + r̃3AH − 2r̃3AH r̃Aφ̇, (22) where we used Eq. (20) in deriving the above equation. From Eqs. (18)-(20), the right-hand side of Eq. (22) can be written as $\tfrac{1}{2}\tilde r_A\dot\phi + \phi\dot{\tilde r}_A$. Therefore, we see that in BD theory, the first law of thermodynamics of the AH, $dE = T_A dS_A$, can be derived from the Friedmann equation.

The thermodynamic prescription can be easily extended to the general scalar-tensor theory of gravity with the Lagrangian
$$\mathcal{L} = f(\phi)R - g^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi/2 - V(\phi). \tag{23}$$
In this case, $f(\phi)$ plays the role of the gravitational constant, so now we can define the masslike function as
$$\mathcal{M} \equiv f(\phi)\tilde r\left(1 + g^{ab}\tilde r_{,a}\tilde r_{,b}\right)/2, \tag{24}$$
and the horizon entropy as $S_A = \pi\tilde r_A^2 f(\phi)$. Then, using these definitions, we can show that $dE = \mathcal{M}_{,a}k^a\,dt = T_A dS_A$. For the nonlinear theory of gravity $f(R)$, we can define the masslike function as
$$\mathcal{M} \equiv \frac{1}{2}f'(R)\tilde r\left(1 + g^{ab}\tilde r_{,a}\tilde r_{,b}\right), \tag{25}$$
and the horizon entropy as $S_A = \pi\tilde r_A^2 f'(R)$, where $f'(R) = df/dR$. Again, it is easy to show that $dE = \mathcal{M}_{,a}k^a\,dt = T_A dS_A$. Therefore, the thermodynamics of the AH holds for both the general scalar-tensor theory of gravity and the nonlinear theory of gravity.
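The identity $dE = T_A dS_A$ quoted for the BD, scalar-tensor and $f(R)$ cases has the same algebraic shape in each: an effective coupling $F$ (standing for $\phi$, $f(\phi)$ or $f'(R)$) multiplies both the entropy and the masslike function. A minimal numerical sketch of that shared structure follows. The function names, the sample values and the finite-difference approach are illustrative assumptions and are not taken from the paper; the energy-flow expressions are simply the forms quoted in Eqs. (21) and (9).

```python
import math


def horizon_temperature(r_a):
    """Horizon temperature T_A = 1 / (2 pi r_A)."""
    return 1.0 / (2.0 * math.pi * r_a)


def horizon_entropy(r_a, coupling):
    """Horizon entropy S_A = pi r_A^2 F, with F standing for phi, f(phi) or f'(R)."""
    return math.pi * r_a ** 2 * coupling


def t_ds(r_a, coupling, d_r, d_coupling):
    """T_A dS_A from a central finite difference of S_A along (d_r, d_coupling)."""
    s_plus = horizon_entropy(r_a + 0.5 * d_r, coupling + 0.5 * d_coupling)
    s_minus = horizon_entropy(r_a - 0.5 * d_r, coupling - 0.5 * d_coupling)
    return horizon_temperature(r_a) * (s_plus - s_minus)


def masslike_energy_flow(r_a, coupling, d_r, d_coupling):
    """dE = r_A dF / 2 + F dr_A, the masslike-function form quoted in Eq. (21)."""
    return 0.5 * r_a * d_coupling + coupling * d_r


def ms_energy_flow(r_a, coupling, d_r, d_coupling):
    """-dE = -r_A dF / 2 + F dr_A, the MS-mass form quoted in Eq. (9)."""
    return -0.5 * r_a * d_coupling + coupling * d_r


if __name__ == "__main__":
    # Hypothetical sample point and small increments (arbitrary units).
    r_a, coupling, d_r, d_coupling = 2.0, 1.5, 1e-6, 2e-6
    lhs = t_ds(r_a, coupling, d_r, d_coupling)
    print("T_A dS_A            :", lhs)
    print("masslike dE         :", masslike_energy_flow(r_a, coupling, d_r, d_coupling))
    print("MS-mass -dE         :", ms_energy_flow(r_a, coupling, d_r, d_coupling))
    print("MS mismatch / r dF  :", (lhs - ms_energy_flow(r_a, coupling, d_r, d_coupling))
          / (r_a * d_coupling))
```

With a constant coupling (`d_coupling = 0`) the two energy-flow expressions coincide, which mirrors the statement that the MS mass suffices in Einstein theory.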
Now we show how to derive the first law of thermodynamics of the AH from the Friedmann equation in Lovelock gravity. The Lovelock Lagrangian is $\mathcal{L} = \sum_{n=0} c_n\mathcal{L}_n$ [27], where
$$\mathcal{L}_n = 2^{-n}\,\delta^{\mu_1\nu_1\cdots\mu_n\nu_n}_{\alpha_1\beta_1\cdots\alpha_n\beta_n}R^{\alpha_1\beta_1}{}_{\mu_1\nu_1}\cdots R^{\alpha_n\beta_n}{}_{\mu_n\nu_n}.$$
Using the Robertson-Walker metric, we obtain the Friedmann equations in $(N+1)$-dimensional space-time
$$\sum_i \hat c_i\left(H^2 + \frac{k}{a^2}\right)^i = \frac{16\pi G}{N(N-1)}\rho, \tag{26}$$
$$\sum_i i\,\hat c_i\left(H^2 + \frac{k}{a^2}\right)^{i-1}\left(\dot H - \frac{k}{a^2}\right) = -\frac{8\pi G}{N-1}(\rho + p), \tag{27}$$
where $\hat c_0 = c_0/[N(N-1)]$, $\hat c_1 = 1$ and $\hat c_i = c_i\prod_{j=3}^{2i}(N + 1 - j)$ for $i > 1$. The masslike function can now be defined as
$$\mathcal{M} = \frac{N(N-1)\Omega_N\tilde r^N}{16\pi G}\sum_i \hat c_i\left[2\tilde r^{-2i} - \left(\frac{1 - g^{ab}\tilde r_{,a}\tilde r_{,b}}{\tilde r^2}\right)^i\right] = \Omega_N\tilde r_A^N\rho, \tag{28}$$
where $\Omega_N$ is the volume of the unit $N$-dimensional sphere and the last equality is evaluated at the AH. Note that although the geometric form is different, the masslike function at the AH has the same value as that in the Einstein theory of gravity, which is the total energy inside the AH. The entropy of the AH is
$$S_A = \frac{N\Omega_N}{4G}\sum_i \frac{i(N-1)}{N - 2i + 1}\hat c_i\tilde r_A^{N+1-2i}. \tag{29}$$
From Eqs. (28) and (29), we can easily check that $dE = \mathcal{M}_{,a}k^a\,dt = T_A dS_A$ holds with the horizon temperature $T_A = 1/(2\pi\tilde r_A)$. Using the Friedmann Eqs. (26) and (27), we find that the energy flow through the AH is $dE = N\Omega_N H\tilde r_A^N(\rho + p)\,dt$, which is the same as that in Einstein gravity.

By properly defining the masslike function in each theory of gravity, we find that the corresponding Friedmann equations can be written in the form $dE = T_A dS_A$ of the first law of thermodynamics at the AH. In other words, the thermodynamic description of the gravitational dynamics is manifest through the mass formulas. Therefore, the gravitational dynamics can be considered as the thermodynamic identity $dE = T_A dS_A$. This is true for a variety of theories of gravity, including the Einstein, Lovelock, nonlinear, and scalar-tensor theories. This non-trivial connection between the thermodynamics of the AH and the Friedmann equation may represent a generic connection, and it suggests the unique role that the AH can play in the thermodynamics of cosmology.
Such a thermodynamic description of the AH can also be used to probe other physical systems and properties, such as the nature of dark energy and the thermodynamics of black holes in each of these theories.

Finally, we would like to note that, although the newly defined masslike function reduces to the MS mass at the AH, the corresponding energy flows passing through the horizon are different. This explains why our masslike function gives rise to the first law of thermodynamics in various theories of gravity, while the MS mass does not. Because of the masslike function, the energy-momentum tensor includes the contribution of gravitational fields such as BD scalars, or curvature scalars in the nonlinear theory of gravity, in addition to the matter fields. This treatment allows a reinterpretation of the nonequilibrium correction introduced in [7]. Studies of other properties of the newly defined masslike function, including the physical and geometrical differences between it and the MS mass, are important and will be reported elsewhere.

Y.G. Gong is supported by NNSFC under Grants No. 10447008 and 10605042, CMEC under Grant No. KJ060502, and SRF for ROCS, State Education Ministry. A. Wang's work was partially supported by a VPR fund from Baylor University.

∗ gongyg@cqupt.edu.cn † anzhong_wang@baylor.edu
[1] J.M. Bardeen, B. Carter and S.W. Hawking, Commun. Math. Phys. 31, 161 (1973). [2] S.W. Hawking, Commun. Math. Phys. 43, 199 (1975); 46, 206(E) (1976). [3] J.D. Bekenstein, Phys. Rev. D 7, 2333 (1973). [4] T. Jacobson, Phys. Rev. Lett. 75, 1260 (1995). [5] G.W. Gibbons and S.W. Hawking, Phys. Rev. D 15, 2738 (1977). [6] R.G. Cai and S.P. Kim, J. High Energy Phys. 02, 050 (2005). [7] C. Eling, R. Guedens and T. Jacobson, Phys. Rev. Lett. 96, 121301 (2006). [8] C.M. Misner and D.H. Sharp, Phys. Rev. 136, B571 (1964). [9] E. Poisson and W. Israel, Phys. Rev. D 41, 1796 (1990). [10] Y.G. Gong, B. Wang and A. Wang, J. Cosmol. Astropart. Phys.
01, 024 (2007). [11] M. Akbar and R.G. Cai, Phys. Lett. B 635, 7 (2006). [12] X.-H. Ge, Phys. Lett. B 651, 49 (2007). [13] T. Padmanabhan, Class. Quantum Grav. 19, 5387 (2002). [14] D. Kothawala, S. Sarkar and T. Padmanabhan, gr-qc/0701002. [15] A. Paranjape, S. Sarkar and T. Padmanabhan, Phys. Rev. D 74, 104015 (2006). [16] S.A. Hayward, Class. Quantum Grav. 15, 3147 (1998). [17] S. A. Hayward, S. Mukohyama and M.C. Ashworth, Phys. Lett. A 256, 347 (1999). [18] M. Akbar and R.G. Cai, Phys. Rev. D 75, 084003 (2007). [19] R.G. Cai and L.M. Cao, Phys. Rev. D 75, 064008 (2007). [20] R.G. Cai and L.M. Cao, Nucl. Phys. B 785, 135 (2007). [21] M. Akbar and R.G. Cai, Phys. Lett. B 648, 243 (2007). [22] A. Sheykhi, B. Wang and R.G. Cai, Nucl. Phys. B 779, 1 (2007). [23] A. Sheykhi, B. Wang and R.G. Cai, Phys. Rev. D 76, 023515 (2007). [24] Y.G. Gong, B. Wang and A. Wang, Phys. Rev. D 75, 123516 (2007). [25] C. Brans and R.H. Dicke, Phys. Rev. 124, 925 (1961). [26] N. Sakai and J.D. Barrow, Class. Quantum Grav. 18, 4717 (2001). [27] D. Lovelock, J. Math. Phys. 12, 498 (1971). ABSTRACT With the help of a masslike function which has dimension of energy and equals to the Misner-Sharp mass at the apparent horizon, we show that the first law of thermodynamics of the apparent horizon $dE=T_AdS_A$ can be derived from the Friedmann equation in various theories of gravity, including the Einstein, Lovelock, nonlinear, and scalar-tensor theories. This result strongly suggests that the relationship between the first law of thermodynamics of the apparent horizon and the Friedmann equation is not just a simple coincidence, but rather a more profound physical connection. <|endoftext|><|startoftext|> Constraints on the Interactions between Dark Matter and Baryons from the X-ray Quantum Calorimetry Experiment Adrienne L. Erickcek1,2, Paul J. Steinhardt2,3, Dan McCammon4, and Patrick C.
McGuire5 Division of Physics, Mathematics, & Astronomy, California Institute of Technology, Mail Code 103-33, Pasadena, CA 91125, USA Department of Physics, Princeton University, Princeton, NJ 08544, USA Princeton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544, USA Department of Physics, University of Wisconsin, Madison, WI 53706, USA and McDonnell Center for the Space Sciences, Washington University, St. Louis, MO 63130, USA

Although the rocket-based X-ray Quantum Calorimetry (XQC) experiment was designed for X-ray spectroscopy, the minimal shielding of its calorimeters, its low atmospheric overburden, and its low-threshold detectors make it among the most sensitive instruments for detecting or constraining strong interactions between dark matter particles and baryons. We use Monte Carlo simulations to obtain the precise limits the XQC experiment places on spin-independent interactions between dark matter and baryons, improving upon earlier analytical estimates. We find that the XQC experiment rules out a wide range of nucleon-scattering cross sections centered around one barn for dark matter particles with masses between 0.01 and 10^5 GeV. Our analysis also provides new constraints on cases where only a fraction of the dark matter strongly interacts with baryons.

PACS numbers: 95.35.+d, 12.60.-i, 29.40.Vj

I. INTRODUCTION

From Vera Rubin's discovery that the rotation curves of galaxies remain level to radii much greater than predicted by Keplerian dynamics [1] to the Wilkinson Microwave Anisotropy Probe (WMAP) measurement of the cosmic microwave background (CMB) temperature anisotropy power spectrum [2], observations indicate that the luminous matter we see is only a fraction of the mass in the Universe. The three-year WMAP CMB anisotropy spectrum is best fit by a cosmological model with Ωm = 0.241 ± 0.034 and a baryon density that is less than one fifth of the total mass density.
The cold collisionless dark matter (CCDM) model has emerged as the predominant paradigm for discussing the missing mass problem. The dark matter is assumed to consist of non-relativistic, non-baryonic, weakly interacting particles, often referred to as Weakly Interacting Massive Particles (WIMPs).

Although the CCDM model successfully predicts observed features of large-scale structure at scales greater than one megaparsec [3], there are indications that it may fail to match observations on smaller scales. Numerical simulations of CCDM halos [4, 5, 6, 7, 8, 9, 10, 11, 12] imply that CCDM halos have a density profile that increases sharply at small radii ($\rho \sim r^{-1.2}$ according to Ref. [12]). These predictions conflict with lensing observations of clusters [13, 14] that indicate the presence of constant-density cores. X-ray observations of clusters have found cores in some clusters, although density cusps have also been observed [15, 16, 17]. On smaller scales, observations of dwarf and low-surface-brightness galaxies [18, 19, 20, 21, 22, 23, 24] indicate that these dark matter halos have constant-density cores with lower densities than predicted by numerical simulations. Observations also indicate that cores are predominant in spiral galaxies as well, including the Milky Way [25, 26, 27]. Numerical simulations of CCDM halos also predict more satellite halos than are observed in the Local Group [28, 29] and fossil groups [30].

Astrophysical explanations for the discord between the density profiles predicted by CCDM simulations and observations have been proposed: for instance, dynamical friction may transform density cusps into cores in the inner regions of clusters [31], and the triaxiality of galactic halos may mask the true nature of their inner density profiles [32]. There are also models of substructure formation that explain the observed paucity of satellite halos [33, 34, 35, 36].
Another possible explanation for the apparent failure of the CCDM model to describe the observed features of dark matter halos is that dark matter particles scatter strongly off one another. The discrepancies between observations and the CCDM model are alleviated if one introduces a dark matter self-interaction that is comparable in strength to the interaction cross section between neutral baryons [37, 38]:
$$\frac{\sigma_{DD}}{m_{\rm dm}} = 8\times10^{-25}\text{--}1\times10^{-23}\ {\rm cm^2\ GeV^{-1}}, \tag{1}$$
where $\sigma_{DD}$ is the cross section for scattering between dark matter particles and $m_{\rm dm}$ is the mass of the dark matter particle. Numerical simulations have shown that introducing dark matter self-interactions within this range reduces the central slope of the halo density profile and reduces the central densities of halo cores, in addition to destroying the extra substructure [39, 40].

The numerical coincidence between this dark matter self-interaction cross section and the known strong-interaction cross section for neutron-neutron or neutron-proton scattering has reinvigorated interest in the possibility that dark matter interacts with itself and with baryons through the strong nuclear force. We refer to dark matter of this type as "strongly interacting dark matter," where "strong" refers specifically to the strong nuclear force. Strongly interacting dark matter candidates include the dibaryon [41, 42], the Q-ball [43], and O-helium [44].

Surprisingly, the possibility that the dark matter may be strongly interacting is not ruled out. While there are numerous experiments searching for WIMPs, they are largely insensitive to dark matter that interacts strongly with baryons.
The reason is that WIMP searches are typically conducted at or below ground level based on the fact that WIMPs can easily penetrate the atmosphere or the Earth, whereas strongly interacting dark matter is multiply scattered and thermalized by the time it reaches ground level, and its thermal kinetic energy is too small to produce detectable collisions with baryons in WIMP detectors. Consequently, there are few experiments capable of detecting strongly interacting dark matter directly. Starkman et al. [45] summarized the constraints on strongly interacting dark matter from experiments prior to 1990, and these constraints were later refined [38, 46, 47, 48]. The strength of dark matter interactions with baryons may also be constrained by galactic dynamics [45], cosmic rays [45, 49], Big Bang nucleosynthesis (BBN) [49], the CMB [50], and large-scale structure [50].

The X-ray Quantum Calorimetry (XQC) project launched a rocket-mounted micro-calorimeter array in 1999 [51]. At altitudes above 165 km, the XQC detector collected data for a little less than two minutes. Although its primary purpose was X-ray spectroscopy, the limited amount of shielding in front of the calorimeters and the low atmospheric overburden make the XQC experiment a sensitive detector of strongly interacting dark matter.

In this article, we present a new numerical analysis of the constraints on spin-independent interactions between dark matter particles and baryons from the XQC experiment using Monte Carlo simulations of dark matter particles interacting with the XQC detector and the atmosphere above it. Our work is a significant improvement upon the earlier analytic estimates presented by some of us in Refs. [38, 48] because it accurately models the dark matter particle's interactions with the atmosphere and the XQC instrument.
Our calculation here also supersedes the analytic estimate by Zaharijas and Farrar [52] because they only considered a small portion of the XQC data and did not include multiple scattering events or the overburden of the XQC detector. We restrict our analysis to spin-independent interactions because the XQC calorimeters are not highly sensitive to spin-dependent interactions. Only a small fraction of the target nuclei in the calorimeters have non-zero spin; consequently, the bound on spin-dependent interactions between baryons and dark matter from the XQC experiment is about four orders of magnitude weaker than the bound on spin-independent interactions [52].

This article is organized as follows. In Section II we summarize the specifications of the XQC detector. We then review dark matter detection theory in Section III. This section includes a discussion of coherent versus incoherent scattering and how we account for the loss of coherence in our analysis. A complete description of our analysis follows in Section IV, and our results are presented in Section V. Finally, in Section VI, we summarize our findings and compare the constraints on strongly interacting dark matter from the XQC experiment to those from other experiments.

II. THE XQC EXPERIMENT

Calorimetry is the use of temperature deviations to measure changes in the internal energy of a material. By drastically reducing the specific heat of the absorbing material, the use of cryogenics in calorimetry allows the absorbing object to have a macroscopic volume and still be sensitive to minute changes in energy. These detectors are sensitive enough to register the energy deposited by a single photon or particle and gave birth to the technique of “quantum calorimetry,” the thermal measurement of energy quanta.
The quantum calorimetry experiment [51] we use to constrain interactions between dark matter particles and baryons is the second rocket-borne experiment in the XQC (X-ray Quantum Calorimetry) Project, a joint undertaking of the University of Wisconsin and the Goddard Space Flight Center [53, 54]. It launched on March 28, 1999 and collected about 100 seconds of data at altitudes between 165 and 225 km above the Earth’s surface. The detector consisted of thirty-four quantum calorimeters operating at a temperature of 0.06 K; for detailed information on the XQC detector functions, please refer to Refs. [51, 54]. These detectors were separated from the exterior of the rocket by five thin filter panes [51]. The small atmospheric overburden at this altitude and the minimal amount of shielding in front of the calorimeters make this experiment a promising probe of strongly interacting dark matter.

The absorbers in the XQC calorimeters are composed of a thin film of HgTe (0.96 µm thick) deposited on a silicon (Si) substrate that is 14 µm thick. The absorbers rest on silicon spacers and silicon pixel bodies. Figs. 1 and 2 show side and top views of the detectors with the dimensions of each layer. Temperature changes in all four components are measured by the calorimeter’s internal thermometer. The calorimeters report the average temperature over an integration time of 7 ms in order to reduce the effect of random temperature fluctuations on the measurement. Multiple scatterings by a dark matter particle will register as a single event because the time it takes the dark matter particle to make its way through the calorimeter is small compared to the integration time. The detector array consists of two rows of detectors, with seventeen active calorimeters and one inactive calorimeter in each row, and is located at the bottom of a conical detector chamber.
Within a 32-degree angle from the detector normal, the incoming particles only pass through the aforementioned filters. The five filters are located 2 mm, 6 mm, 9 mm, 11 mm and 28 mm above the detectors. Each filter consists of a thin layer of aluminum (150 Å) supported on a parylene (CH) substrate (1380 Å). The pressure inside the chamber is less than $10^{-6}$ Torr. At this level of evacuation, a dark matter particle with a mass of $10^{6}$ GeV and a baryon interaction cross section of $10^{6}$ barns would have less than a 20% chance of colliding with an air atom in the chamber. Therefore, we assume that the chamber is a perfect vacuum in our analysis.

FIG. 1: A vertical cross section of an XQC calorimeter. The relative thicknesses of the layers are drawn to scale, as are their relative lengths, but the two scales are not the same. To facilitate the display of the layers, the vertical dimension has been stretched relative to the horizontal dimension. [Figure labels: Si substrate, 0.5 mm x 2.0 mm; thickness marks 14 µm and 12 µm; HgTe absorber, 0.96 µm; Si pixel body, 0.25 mm x 1.0 mm; Si spacer, 0.245 mm x 0.245 mm.]

FIG. 2: A top view of an XQC calorimeter. The absorber is the top layer and underneath it lies the spacer, followed by the pixel body. These dimensions are drawn to scale. [Figure labels: dimension marks 2.0 mm and 1.0 mm; absorber panel (HgTe on Si); pixel body (Si); spacer (Si), 0.245 mm x 0.245 mm.]

While the atmospheric pressure at the altitudes at which the detector operated is about $10^{-8}$ times the atmospheric pressure at sea level, the atmospheric overburden of the XQC detector is still sufficient to scatter incoming strongly interacting dark matter particles. Simulating a dark matter particle’s path through the atmosphere requires number-density profiles for all the molecules in the atmosphere. These profiles were obtained using the MSIS-E-90 model [footnote 1] for the time (1999 March 28 9:00 UT) and location (White Sands Missile Range, New Mexico) of the XQC rocket launch. During the data collection period, the average altitude of the XQC rocket was 201.747 km. At this altitude and above, the primary constituents of the atmosphere are molecular and atomic oxygen, molecular and atomic nitrogen, helium, atomic hydrogen, and argon. The MSIS-E-90 model provides tables of the number densities of each of these seven chemical species. In our analysis, computational efficiency demanded that we fit analytic functions to these data. We found exponential fits for the density profiles in three altitude ranges: 200-300 km, 300-500 km and 500-1000 km. The error in the probability of a collision between a dark matter particle and an element of the atmosphere introduced by using these fits instead of the original data is 0.02%. Fig. 3 shows the number density profiles provided by the MSIS-E-90 model and the exponential fits used to model the data.

[Footnote 1] Available at http://modelweb.gsfc.nasa.gov/models/msis.html

FIG. 3: The points depict the MSIS-E-90 density profiles for the seven most prevalent constituents of the atmosphere above the XQC detector, and the lines show the piecewise exponential fits used in our analysis.

The XQC detector collected data for a total of 150 seconds. During these 150 seconds of activity, the thirty-four individual calorimeters were not all operational at all times. Furthermore, events that could not be accurately measured by the calorimeters and events attributed to cosmic rays hitting the base of the detector array were removed from the XQC spectrum, and these cuts also contribute to the dead time of the system. Specifically, events that arrived too close together for the calorimeters to accurately measure distinct energies were discarded. This criterion removed 12% of the observed events, and the resulting loss of sensitivity was included in the dead time of the calorimeters. When a cosmic ray penetrates the silicon base of the detector array, the resulting temperature increase is expected to register as multiple, nearly simultaneous, low-energy events on nearby calorimeters. To remove these events from the spectrum, we cut out events that were part of either a pair of events in adjacent detectors or a trio of events in any of the detectors that arrived within 3 ms of each other and had energies less than 2.5 keV. This procedure was expected to remove more than 97% of the events that resulted from cosmic rays hitting the base of the array. Nearly all of the events attributed to heating from cosmic rays had energies less than 300 eV, and a high fraction of the observed low-energy events were included in this cut. For example, seventeen of the observed twenty-four events with energies less than 100 eV were removed. The expected loss of sensitivity due to events being falsely attributed to cosmic rays was included in the calculated dead time of the calorimeters. Once all the dead time is accounted for, the 150 seconds of data collection is equivalent to 100.7 seconds of observation with all thirty-four calorimeters operational.

FIG. 4: Top panel: The XQC energy spectrum from 0 - 4 keV in 5 eV bins. This spectrum does not have non-linearity corrections applied (see Ref. [51]), so the calibration lines at 3312 eV and 3590 eV appear slightly below their actual energies. The cluster of counts to the left of each calibration peak results from X-rays passing through the HgTe layer and being absorbed in the Si components, where up to 12% of the energy may then be trapped in metastable states. Bottom panel: The XQC energy spectrum from 0 - 2.5 keV in 5 eV bins. This spectrum, combined with the over-saturation rate of 0.6 events per second with energies greater than 4000 eV, was used in our analysis.
The XQC calorimeters are capable of detecting energy deposits that exceed 20 eV, but full sensitivity is not reached until the energy surpasses 36 eV, and for approximately half of the detection time, the detector’s lower threshold was set to 120 eV. The calorimeters cannot resolve energies above 4 keV, and the 2.5-4 keV spectrum is dominated by the detector’s interior calibration source: a ring of 2 µCi $^{41}$Ca that generates K$\alpha$ and K$\beta$ lines at 3312 eV and 3590 eV, respectively. We refer the reader to Ref. [51] for a complete discussion of the calibration of the detector. These limitations restrict the useful portion of the XQC spectrum to 0.03-2.5 keV. This spectrum is shown in Fig. 4, along with the full spectrum from 0-4 keV. The XQC field of view was centered on a region of the sky known to have an enhanced X-ray background in the 100-300 eV range, possibly due to hot gas in the halo, and this surge in counts can be seen in Fig. 4.

In addition to the information present in this spectrum, we know that the XQC detector observed an average over-saturation event rate of 0.6 per second. This corresponds to a total of 60 events that deposited more than 4000 eV in a calorimeter. In Section IVB, we describe how we use the observed spectrum between 29 eV and 2500 eV and the integrated over-saturation rate to constrain the total cross section for elastic scattering between dark matter particles and nucleons.

III. DETECTING DARK MATTER

A. Incidence of dark matter particles

The expected flux of dark matter particles into the detector depends on the density of the dark matter halo in the Solar System. Unfortunately, the local dark matter density is unknown and the range of theoretical predictions is wide. By constructing numerous models of our galaxy with various dark matter density profiles and halo characteristics, rejecting those models that contradict observations, and finding the distribution of local dark matter densities in the remaining viable models, Ref.
[55] predicted that the local dark matter density is between 0.3 and 0.7 GeV cm$^{-3}$ assuming that the dark matter halo is flattened, and the predicted local density decreases as the halo is taken to be more spherical. Another approach [56] used numerical simulations of galaxies similar to our own to find the dark matter density profile and then fit the profile parameters to Galactic observations, predicting a mean local dark matter density between 0.18 GeV cm$^{-3}$ and 0.30 GeV cm$^{-3}$. Given that it lies in the intersection of these two ranges, we use the standard value of 0.3 GeV cm$^{-3}$ for the local dark matter density in our primary analysis. This assumption ignores the possible presence of dark matter streams or minihalos, which do occur in numerical simulations [56] and could lead to local deviations from the mean dark matter density.

We also assume that the velocities of the dark matter particles with respect to the halo are isotropic and have a bounded Maxwellian distribution: the probability that a particle has a velocity within a differential volume in velocity-space centered around a given velocity $\vec{v}$ is

$$P(\vec{v}) = \begin{cases} \dfrac{1}{k}\exp\left(-\dfrac{v^2}{v_0^2}\right) d^3\vec{v} & \text{if } v \le v_{esc}, \\ 0 & \text{if } v > v_{esc}, \end{cases} \tag{2}$$

where $v_0$ is the dispersion velocity of the halo, $v_{esc}$ is the Galactic escape velocity at the Sun’s position, and $k$ is a normalization factor [57]:

$$k = (\pi v_0^2)^{3/2}\left[\mathrm{erf}\left(\frac{v_{esc}}{v_0}\right) - \frac{2}{\sqrt{\pi}}\,\frac{v_{esc}}{v_0}\exp\left(-\frac{v_{esc}^2}{v_0^2}\right)\right]. \tag{3}$$

Numerical simulations indicate that dark matter particle velocities may not have an isotropic Maxwellian distribution [56]. Ref. [58] examines how assuming a more complicated velocity distribution would alter the flux of dark matter particles into an Earth-based detector.

Given the flat rotation curve of the spiral disk at the Sun’s radius and beyond and assuming a spherical halo, the local dispersion speed $v_0$ is the maximum rotational velocity of the Galaxy $v_c$ [59]. Reported values for the rotational speed include 222 ± 20 km s$^{-1}$ [60], 228 ± 19 km s$^{-1}$ [61], 184 ± 8 km s$^{-1}$ [62] and 230 ± 30 km s$^{-1}$ [63].
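As a rough illustration of how velocities might be drawn from a bounded Maxwellian of this kind, the Python sketch below uses rejection sampling. It assumes the distribution is proportional to $\exp(-v^2/v_0^2)$ inside the escape speed; the function names and the example escape speed of 600 km s$^{-1}$ are illustrative choices, not values or code taken from the analysis itself.

```python
import math
import random


def normalization(v0, v_esc):
    """Integral of exp(-v^2 / v0^2) over all velocities with |v| <= v_esc."""
    z = v_esc / v0
    return (math.pi * v0 ** 2) ** 1.5 * (
        math.erf(z) - 2.0 / math.sqrt(math.pi) * z * math.exp(-z * z)
    )


def sample_halo_velocity(rng, v0, v_esc):
    """Draw one halo-frame velocity vector (same units as v0) by rejection."""
    # exp(-v^2 / v0^2) corresponds to a per-axis standard deviation v0 / sqrt(2).
    sigma = v0 / math.sqrt(2.0)
    while True:
        v = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        if math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2) <= v_esc:
            return v


# Illustrative usage: v0 = 220 km/s, hypothetical escape speed of 600 km/s.
rng = random.Random(1)
velocities = [sample_halo_velocity(rng, 220.0, 600.0) for _ in range(1000)]
```

Rejection is cheap here because only a small fraction of the unbounded Gaussian lies beyond an escape speed a few times larger than $v_0$.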
Recent measurements of the Galaxy’s angular velocity have yielded values of $\Omega_{gal}$ = 28 ± 2 km s$^{-1}$ kpc$^{-1}$ [64] and 32.8 ± 2 km s$^{-1}$ kpc$^{-1}$ [65]. If the Sun is located 8.0 kpc from the Galactic center, these angular velocities correspond to tangential velocities of 224 ± 16 km s$^{-1}$ and 262 ± 16 km s$^{-1}$, respectively. We adopt $v_c$ = 220 ± 30 km s$^{-1}$ as a centrally conservative value for the Galaxy’s circular velocity at the Sun’s location.

The final parameter we need to obtain the dark matter’s velocity distribution is the escape velocity in the Solar vicinity. The largest observed stellar velocity at the Sun’s radius in the Milky Way is 475 km s$^{-1}$, which establishes a lower bound for the local escape velocity [66]. Ref. [67] used the radial motion of Carney-Latham stars to determine that the escape velocity is between 450 and 650 km s$^{-1}$ to 90% confidence, and Ref. [68] obtained a 90% confidence interval of 498 to 608 km s$^{-1}$ from observations of high-velocity stars. A kinematic derivation of the escape velocity [59] gives

$$v_{esc}^2 = 2v_c^2\left[1 + \ln\left(\frac{R_{gal}}{R_0}\right)\right], \tag{4}$$

where $R_0$ is the distance from the Sun to the center of the Galaxy, and $R_{gal}$ is the radius of the Galaxy. Observations of other galaxies suggest that our galaxy extends to about 100 kpc [59], and observations of Galactic satellites indicate that the Galaxy’s flat rotation curve extends to at least 110 kpc [63]. The commonly accepted value for the Solar radius is $R_0$ = 8.0 kpc [69]. Recent measurements include $R_0$ = 7.9 ± 0.3 kpc [70] and $R_0$ = 8.01 ± 0.44 kpc [71], and a compilation of measurements over the past decade [71] yields an average value of $R_0$ = 7.80 ± 0.33 kpc. To estimate the escape velocity, we use 100 kpc as a conservative estimate of the Galactic radius and the standard value $R_0$ = 8.0 kpc. These parameters, combined with $v_c$ = 220 km s$^{-1}$, predict an escape velocity of 584 km s$^{-1}$, which falls near the middle of the ranges proposed in Refs. [67, 68].

The isotropic Maxwellian velocity distribution given by Eq.
(2) specifies the dark matter particles’ motion relative to the halo. However, we are interested in their motion relative to the XQC detector: $\vec{v}_{observed} = \vec{v}_{dm} - \vec{v}_{detector}$, where the latter two velocities are measured with respect to the halo. The velocity of the detector with respect to the halo has three components: the velocity of the Sun relative to the halo, the velocity of the Earth with respect to the Sun, and the velocity of the detector with respect to the Earth.

When discussing these velocities, it is useful to define a Galactic Cartesian coordinate system. In Galactic coordinates, the Sun is located at the origin, and the xy-plane is defined by the Galactic disk. The x-axis points toward the center of the Galaxy, and the y-axis points in the direction of the Sun’s tangential velocity as it revolves around the Galactic center. The z-axis points toward the north Galactic pole and is antiparallel to the angular momentum of the rotating disk.

The motion of the Sun through the halo has two components. First, there is the Sun’s rotational velocity as it orbits the Galactic center: $v_c$ in the y direction. Second, there is the motion of the Sun relative to the spiral disk [72]: $\vec{v}_\odot$ = (10.00 ± 0.36, 5.25 ± 0.62, 7.17 ± 0.38) km s$^{-1}$ in Galactic Cartesian coordinates. When the Earth’s motion through the Solar System during its annual orbit of the Sun is expressed in Galactic coordinates [57], the resulting velocity at the time of the XQC experiment (7.3 days after the vernal equinox) is $\vec{v}_{Earth}$ = (29.14, 5.330, −3.597) km s$^{-1}$.

The final consideration is the velocity of the detector relative to the Earth. The maximum velocity attained by the XQC rocket was less than 1.2 km s$^{-1}$. This velocity is insignificant compared to the motion of the Sun relative to the halo. Moreover, the XQC detector collected data while the rocket rose and while it fell, and the average velocity of the rocket was only 0.104 km s$^{-1}$.
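The kinematic inputs quoted so far can be cross-checked with a few lines of arithmetic. The sketch below reads Eq. (4) as $v_{esc}^2 = 2v_c^2[1 + \ln(R_{gal}/R_0)]$, which reproduces the quoted 584 km s$^{-1}$, and adds the Sun and Earth velocity components listed above; the variable and function names are illustrative.

```python
import math

V_C = 220.0     # km/s, circular speed adopted in the text
R_0 = 8.0       # kpc, distance from the Sun to the Galactic center
R_GAL = 100.0   # kpc, adopted Galactic radius


def escape_speed(v_c, r_0, r_gal):
    """Escape speed from v_esc^2 = 2 v_c^2 [1 + ln(r_gal / r_0)]."""
    return v_c * math.sqrt(2.0 * (1.0 + math.log(r_gal / r_0)))


# Velocity components in Galactic Cartesian coordinates (km/s), from the text.
SUN_ROTATION = (0.0, V_C, 0.0)
SUN_PECULIAR = (10.00, 5.25, 7.17)
EARTH_ORBIT = (29.14, 5.330, -3.597)


def detector_velocity():
    """Component-wise sum of the Sun and Earth contributions (rocket neglected)."""
    return tuple(sum(parts) for parts in zip(SUN_ROTATION, SUN_PECULIAR, EARTH_ORBIT))


v_esc = escape_speed(V_C, R_0, R_GAL)
v_det = detector_velocity()
detector_speed = math.sqrt(sum(c * c for c in v_det))
print(f"v_esc = {v_esc:.1f} km/s, |v_detector| = {detector_speed:.1f} km/s")
```

With these inputs the detector moves through the halo at roughly 234 km s$^{-1}$, so the rocket’s average speed of 0.104 km s$^{-1}$ is less than 0.05% of the total.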
Therefore, we neglect the motion of the rocket in the calculation of the dark matter wind. Combining the motion of the Sun and the Earth then gives the total velocity of the XQC detector with respect to the halo during the experiment in Galactic Cartesian coordinates: $\vec{v}_{detector}$ = (39.14 ± 0.36, 230.5 ± 30, 3.573 ± 0.38) km s$^{-1}$.

Subtracting the velocity vector of the detector relative to the halo from the velocity vector of the dark matter relative to the halo gives the dark matter’s velocity relative to the detector in Galactic coordinates. However, we want the dark matter particles’ velocities in the coordinate frame defined by the detector, where the z-axis is the field-of-view vector. The XQC field of view was centered on $l = 90^\circ$, $b = +60^\circ$ in Galactic longitude and latitude [51], so the rotation from Galactic coordinates to detector coordinates may be described as a clockwise 30° rotation of the z-axis around the x-axis, which is taken to be the same in both coordinate systems.

B. Dark Matter Interactions

Calorimetry measures the kinetic energy transferred from the dark matter to the absorbing material without regard for the specific mechanism of the scattering or any other interactions. Consequently, the dark matter detection rate for a calorimeter depends only on the mass of the dark matter particle and the total cross section for elastic scattering between the dark matter particle and an atomic nucleus of mass number $A$, which is proportional to the cross section for dark matter interactions with a single nucleon ($\sigma_{Dn}$). The calorimeter measures the recoil energy of the target nucleus (mass $m_T$),

$$E_{rec} = \frac{1}{2}m_{dm}v_{dm}^2\,\frac{2m_Tm_{dm}}{(m_T + m_{dm})^2}\,(1 - \cos\theta_{CM}), \tag{5}$$

where $m_{dm}$ and $v_{dm}$ are the dark matter particle’s mass and velocity prior to the collision in the rest frame of the target nucleus and $\theta_{CM}$ is the scattering angle in the center-of-mass frame.
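The recoil-energy relation is standard two-body elastic kinematics, and a short sketch makes its scaling easy to explore. The version below is an illustration only: it approximates the nuclear mass as the mass number times one atomic mass unit, which is an assumption made here for convenience, and the function name is hypothetical.

```python
C_KM_S = 299_792.458   # speed of light in km/s
AMU_GEV = 0.9315       # atomic mass unit in GeV (approximate)


def recoil_energy_ev(m_dm_gev, mass_number, v_km_s, cos_theta_cm):
    """Recoil energy in eV for elastic scattering off a nucleus at rest.

    Implements E_rec = (1/2) m_dm v^2 * 2 m_T m_dm / (m_T + m_dm)^2 * (1 - cos theta),
    with the nuclear mass m_T approximated as mass_number * AMU_GEV (assumption).
    """
    m_t = mass_number * AMU_GEV
    kinetic_gev = 0.5 * m_dm_gev * (v_km_s / C_KM_S) ** 2
    transfer = 2.0 * m_t * m_dm_gev / (m_t + m_dm_gev) ** 2
    return kinetic_gev * transfer * (1.0 - cos_theta_cm) * 1.0e9


# Largest possible deposit (back-scattering, cos theta = -1) for a few cases.
for m_dm, a in [(1.0, 28), (10.0, 200), (100.0, 200)]:
    print(m_dm, a, recoil_energy_ev(m_dm, a, 300.0, -1.0))
```

The printed values depend on the nuclear masses assumed, so they should be read as order-of-magnitude guides rather than as the exact figures used in the analysis.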
If the momentum $q$ transferred to the nucleus, with $q^2 = 2m_TE_{rec}$, is small enough that the corresponding de Broglie wavelength is larger than the radius $R$ of the nucleus ($qR \ll \hbar$), then the scattering is coherent. In coherent scattering, the scattering amplitudes for each individual component in the conglomerate body are added prior to the calculation of the cross section, so the total cross section is proportional to the square of the mass number of the target nucleus. Including kinematic factors [45, 73], the cross section for coherent scattering off a nucleus is given by

$$\sigma_{coh}(A) = A^2\left[\frac{m_{red}(DM,Nuc)}{m_{red}(DM,n)}\right]^2\sigma_{Dn}, \tag{6}$$

where $m_{red}(DM,Nuc)$ is the reduced mass of the nucleus and the dark matter particle, $m_{red}(DM,n)$ is the reduced mass of a nucleon and the dark matter particle, and $A$ is the mass number of the nucleus. Coherent scattering is isotropic in the center-of-mass frame of the collision.

Dark matter particles may be massive and fast-moving enough that the scattering is not completely coherent when the target nucleus is large [74]. When the scattering is incoherent, the dark matter particle “sees” the internal structure of the nucleus, and the cross section for scattering is reduced by a “form factor,” which is a function of the momentum transferred to the nucleus during the collision ($q$) and the nuclear radius ($R$):

$$\frac{d\sigma}{d\Omega} = \frac{\sigma_{coh}}{4\pi}F^2(q,R). \tag{7}$$

Since $q$ depends on the recoil energy, which in turn depends on the scattering angle, incoherent scattering is not isotropic.

In this discussion of coherence, we have neglected the possible effects of the dark matter particle’s internal structure by assuming that $\sigma_{Dn}$ is independent of recoil energy. If the dark matter particle is not point-like, then $\sigma_{Dn}$ decreases as the recoil momentum increases due to a loss of coherence within the dark matter particle. Incoherence within the dark matter particle has observational consequences [75], but these effects depend on the size of the dark matter particle.
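To make the scaling of the coherent cross section concrete, the sketch below evaluates $A^2$ times the squared ratio of reduced masses for a given nucleon cross section. The nucleon mass value and the function names are assumptions introduced for illustration; the nuclear mass is passed in explicitly so that no particular mass table is implied.

```python
M_NUCLEON_GEV = 0.939  # approximate nucleon mass in GeV (assumption)


def reduced_mass(m1, m2):
    """Reduced mass of a two-body system."""
    return m1 * m2 / (m1 + m2)


def coherent_cross_section(sigma_dn, m_dm, mass_number, m_nucleus,
                           m_nucleon=M_NUCLEON_GEV):
    """Coherent nuclear cross section: A^2 * (mu_nucleus / mu_nucleon)^2 * sigma_dn.

    sigma_dn is the dark matter-nucleon cross section; the result has the same units.
    Masses are in GeV.
    """
    ratio = reduced_mass(m_dm, m_nucleus) / reduced_mass(m_dm, m_nucleon)
    return mass_number ** 2 * ratio ** 2 * sigma_dn


# Enhancement factor sigma_coh / sigma_dn for silicon-like and mercury-like targets.
for m_dm in (1.0, 10.0, 100.0):
    si = coherent_cross_section(1.0, m_dm, 28, 28 * 0.9315)
    hg = coherent_cross_section(1.0, m_dm, 200, 200 * 0.9315)
    print(m_dm, si, hg)
```

For dark matter much lighter than the nucleon the enhancement tends to $A^2$, while for dark matter much heavier than the nucleus the reduced-mass ratio approaches $A$ and the enhancement tends to $A^4$.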
To avoid restricting ourselves to a particular dark matter model, we assume that the dark matter particle is small enough that nucleon scattering is always coherent; when we discuss incoherence, we are referring to the effects of the nucleus’s internal structure.

According to the Born approximation, the form factor for nuclear scattering defined in Eq. (7) is the Fourier transform of the nuclear ground-state mass density [57, 76]. The most common choice for the form factor [74, 77] is $F^2(q,R) = \exp[-(qR_{rms})^2/(3\hbar^2)]$, where $R_{rms}$ is the root-mean-square radius of the nucleus. For a solid sphere, $R_{rms}^2 = (3/5)R^2$, so this form factor is equivalent to the form factor used in Ref. [52]. This form factor is an accurate approximation of the Fourier transform of a solid sphere for $qR/\hbar \lesssim 2$, but it grossly underestimates the reduction in $\sigma$ for larger values of $q$ [57]. The maximum speed of a dark matter particle with respect to the XQC detector is ∼ 800 km s$^{-1}$ (escape velocity + detector velocity), and at that speed, the maximum possible value of $qR/\hbar$ for a collision with a Hg nucleus ($A$ = 200) is nearly ten for a 100 GeV dark matter particle, and the maximum possible value of $qR/\hbar$ increases as the mass of the dark matter particle increases. Clearly, this approximation is not appropriate for a large portion of the dark matter parameter space probed by the XQC experiment.

Furthermore, a solid sphere is not a very realistic model of the nucleus. A more accurate model of the nuclear mass density is $\rho(r) = \int d^3r'\,\rho_0(r')\rho_1(r - r')$, where $\rho_0$ is constant inside a radius $R_0^2 = R^2 - 5s^2$ and zero beyond that radius and $\rho_1 = \exp[-r^2/(2s^2)]$, where $s$ is a “skin thickness” for the nucleus [78]. The resulting form factor is

$$F(q,R) = 3\,\frac{\sin(qR/\hbar) - (qR/\hbar)\cos(qR/\hbar)}{(qR/\hbar)^3}\times\exp\left[-\frac{1}{2}\left(\frac{qs}{\hbar}\right)^2\right]. \tag{8}$$

We follow Ref. [57] in setting the parameters in Eq. (8): $s$ = 0.9 fm and

$$R^2 = \left[(1.23A^{1/3} - 0.6)^2 + 0.631\pi^2 - 5s^2\right]\ \mathrm{fm}^2, \tag{9}$$

where $A$ is the mass number of the target nucleus.
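The two form factors discussed above can be compared numerically with a short sketch. It assumes the skin factor in Eq. (8) is $\exp[-(qs/\hbar)^2/2]$, so that $F^2$ carries $\exp[-(qs/\hbar)^2]$, and it evaluates the Gaussian form with the same radius $R$ from Eq. (9); both choices, like the function names, are assumptions made for illustration.

```python
import math

HBAR_C_GEV_FM = 0.1973269804   # hbar * c in GeV fm
SKIN_FM = 0.9                  # skin thickness s from the text


def nuclear_radius_fm(mass_number):
    """Radius R in fm from R^2 = (1.23 A^(1/3) - 0.6)^2 + 0.631 pi^2 - 5 s^2."""
    r_squared = ((1.23 * mass_number ** (1.0 / 3.0) - 0.6) ** 2
                 + 0.631 * math.pi ** 2 - 5.0 * SKIN_FM ** 2)
    return math.sqrt(r_squared)


def form_factor_sq(q_gev, mass_number):
    """Squared form factor for a uniform sphere smeared by a Gaussian skin."""
    x = q_gev * nuclear_radius_fm(mass_number) / HBAR_C_GEV_FM   # qR / hbar
    if x < 1e-3:
        sphere = 1.0 - x * x / 10.0          # series limit avoids 0/0 at small x
    else:
        sphere = 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3
    skin = math.exp(-(q_gev * SKIN_FM / HBAR_C_GEV_FM) ** 2)
    return sphere ** 2 * skin


def gaussian_form_factor_sq(q_gev, mass_number):
    """exp[-(q R_rms)^2 / (3 hbar^2)] with R_rms^2 = (3/5) R^2."""
    x = q_gev * nuclear_radius_fm(mass_number) / HBAR_C_GEV_FM
    return math.exp(-x * x / 5.0)


# Compare the two forms for a mercury-like nucleus (A = 200), q in GeV/c.
for q in (0.01, 0.03, 0.06, 0.1, 0.3):
    print(q, form_factor_sq(q, 200), gaussian_form_factor_sq(q, 200))
```

The small-argument branch is only a numerical safeguard; the two branches agree to well below floating-point noise at the switch point.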
Despite its simple analytic form, the form factor given by Eq. (8) is computationally costly to evaluate repeatedly. We use an approximation:

$$F^2 = \begin{cases} \exp\left[-\dfrac{1}{5}\left(\dfrac{qR}{\hbar}\right)^2\right]\exp\left[-\left(\dfrac{q\,(0.9\ \mathrm{fm})}{\hbar}\right)^2\right] & \text{if } qR/\hbar \le 2, \\[2ex] \dfrac{9(0.81)}{(qR/\hbar)^4}\exp\left[-\left(\dfrac{q\,(0.9\ \mathrm{fm})}{\hbar}\right)^2\right] & \text{if } qR/\hbar > 2. \end{cases} \tag{10}$$

The low-q approximation combines the standard approximation for the solid sphere with the factor accounting for the skin depth of the nucleus. The high-q approximation was derived from the asymptotic form of the first spherical Bessel function and normalized so that the total cross section is as close as possible to the exact result. The error in the total cross section due to the use of the approximation is less than 1% for nearly all dark matter masses; the sole exception is $m_{dm} \sim 10 - 100$ GeV, and even then the error is less than 5%. Unless otherwise noted, we use this approximation for the form factor throughout this analysis. We also assume that the dark matter particle does not interact with nuclei in any way other than elastic scattering.

IV. ANALYSIS OF XQC CONSTRAINTS

To obtain an accurate description of the XQC experiment’s ability to detect strongly interacting dark matter particles, we turned to Monte Carlo simulations. The Monte Carlo code we wrote to analyze the XQC experiment simulates a dark matter particle’s journey through the atmosphere to the XQC detector chamber, its path through the detector chamber to a calorimeter, and its interaction with the sensitive components of the calorimeter. This latter portion of the code also records how much energy the particle deposits in the calorimeter through scattering. The results of several such simulations for the same set of dark matter properties may be used to predict the likelihood that a given dark matter particle will deposit a particular amount of energy into the calorimeter. These probabilities of various energy deposits predict the recoil-energy spectrum the XQC detector would observe if the dark matter particles have a given mass and nucleon-scattering cross section.
This simulated spectrum may then be compared to the XQC data to find which dark matter parameters are excluded by the XQC experiment.

A. Generating Simulated Energy-Recoil Spectra

The basic subroutine in our Monte Carlo algorithm is the step procedure. The step procedure begins with a particle with a certain velocity vector and position in a given material and moves the particle a certain distance in the material, returning its new position and velocity. The step procedure also determines whether or not a scattering event occurred during the particle’s trek and updates the velocity accordingly. The number of expected collisions in a step of length $l$ through a material with target number density $n$ is $n \times \sigma_{tot} \times l$, where $\sigma_{tot}$ is the total scattering cross section obtained by integrating Eq. (7) over the scattering angle, or equivalently, the recoil momentum $q$:

$$\sigma_{tot} = \frac{\sigma_{coh}}{q_{max}^2}\int_0^{q_{max}^2}F^2(q,R)\,dq^2, \tag{11}$$

where $q_{max}$ is the maximum possible recoil momentum. The step length $l$ is chosen so that it is at most a tenth of the mean free path through the material, so the number of expected collisions is less than one and represents the probability of a collision. After each step, a random number between zero and one is generated using the “Mersenne Twister” (MT) algorithm [79], and if that random number is less than the probability of a collision, the particle’s energy and trajectory are updated. First, a recoil momentum is selected according to the probability distribution $P(q^2) = F^2(q,R)\,\sigma_{coh}/(q_{max}^2\,\sigma_{tot})$, where the exact form factor is used for $qR/\hbar > 2$ so that the oscillatory nature of the form factor is not lost. The recoil momentum determines the recoil energy and the scattering angle in the center-of-mass frame through Eq. (5). The scattering is axisymmetric around the scattering axis, so the azimuthal angle is assigned a random value between 0 and $2\pi$.
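The logic of a single step can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the analysis code: positions and energy updates are omitted, the form factor is supplied by the caller as a function of $u = q^2/q_{max}^2$ with values between 0 and 1, and the recoil momentum is drawn by rejection sampling, which is one of several ways to sample $P(q^2)$. Python’s `random.Random` is itself a Mersenne Twister generator; all function names are hypothetical.

```python
import math
import random


def step_length(number_density, sigma_tot, max_step):
    """Step no longer than a tenth of the mean free path 1 / (n * sigma_tot)."""
    rate = number_density * sigma_tot
    if rate == 0.0:
        return max_step
    return min(max_step, 0.1 / rate)


def collision_probability(number_density, sigma_tot, step):
    """Expected number of collisions in the step, read as a probability."""
    return number_density * sigma_tot * step


def sample_q2_fraction(rng, form_factor_sq):
    """Rejection-sample u = q^2 / q_max^2 with density proportional to F^2(u)."""
    while True:
        u = rng.random()
        if rng.random() < form_factor_sq(u):   # valid because 0 <= F^2 <= 1
            return u


def take_step(rng, number_density, sigma_tot, max_step, form_factor_sq):
    """Advance one step; return (step, None) or (step, (cos_theta_cm, phi))."""
    step = step_length(number_density, sigma_tot, max_step)
    if rng.random() >= collision_probability(number_density, sigma_tot, step):
        return step, None
    u = sample_q2_fraction(rng, form_factor_sq)
    cos_theta_cm = 1.0 - 2.0 * u               # from q^2 = q_max^2 (1 - cos theta) / 2
    phi = 2.0 * math.pi * rng.random()         # azimuth uniform in [0, 2 pi)
    return step, (cos_theta_cm, phi)


# Illustrative usage with fully coherent scattering (F^2 = 1 everywhere).
rng = random.Random(0)
results = [take_step(rng, 1.0, 1.0, 10.0, lambda u: 1.0) for _ in range(1000)]
hits = [angles for _, angles in results if angles is not None]
```

With $F^2 = 1$ the sampled $\cos\theta_{CM}$ is uniform on $[-1, 1]$, which is the isotropic limit expected for coherent scattering.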
The scattering angles are used to update the particle’s trajectory, and its speed is decreased in accordance with the kinetic energy transferred to the target nucleus. The step subroutine repeats until the particle exits the simulation, or its kinetic energy falls below 0.1 eV, or the energy deposited in the calorimeter exceeds the saturation point of 4000 eV.

Our simulation treats the atmosphere as a 4.6×4.6 cm square column with periodic boundary conditions, the bottom face of which covers the top of the conical detector chamber described in Section II. This implementation assumes that for every particle that exits one side of the column, there is a particle that enters the column from the opposite side with the same velocity. The infinite extent of the atmosphere and its translational invariance make this assumption reasonable. The atmosphere column extends to an altitude of 1000 km; increasing the atmosphere height beyond 1000 km has a negligible effect on the total number of collisions in the atmosphere.

The simulation begins with a dark matter particle at the top of the atmosphere column at a random initial position on the 4.6×4.6 cm square. Its initial velocity with respect to the dark matter halo is selected according to the isotropic Maxwellian velocity distribution function given by Eq. (2), and then the velocity relative to the detector is found via the procedure described in Section IIIA. The dark matter particle’s path from the top of the atmosphere to the detector is modeled using the step procedure described above. The simulation of the particle’s interaction with the atmosphere ends if the particle’s altitude exceeds 1000 km or if the particle falls below the height of the XQC rocket. We use the time-averaged altitude (201.747 km) as the constant altitude of the rocket.
We made this simplification because it allows us to ignore the periodic inactivity of each calorimeter and treat the detector as thirty-four calorimeters that are active for 100.7 seconds. When the dark matter particle hits the rocket, its path through the five filter layers is also modeled using the step procedure, as is its path through the calorimeters. In addition to being smaller than the mean free path, the step length is chosen so that the particle’s position relative to the boundaries of the detectors is accurately modeled. The simulation ends when the dark matter particle’s random-walk trajectory takes it out of the detector chamber. As mentioned in Section II, the calorimeter detects the sum of all the recoil energies if the dark matter particle is scattered multiple times.

When the dark matter particle is unlikely to experience more than one collision in the calorimeter, this simulation is far more detailed than is required to accurately predict the energy deposited by the dark matter particle. This is the case for the lightest ($m_{dm} \le 10^{2}$ GeV) and weakest-interacting ($\sigma_{Dn} \le 10^{-26}$ cm$^2$) dark matter particles that the XQC calorimeters are capable of detecting. Since the lightest dark matter particles are also the most numerous, many Monte Carlo trials are required to sample all the possible outcomes of a dark matter particle’s encounter with the detector. The simulation described above is too computationally intensive to run that many trials, so we used a faster and simpler simulation to model the interactions of these dark matter particles. This simulation assumes that the particle will experience at most one collision in the atmosphere and at most one collision in each filter layer and each layer of the calorimeter. The simulation ends if the probability of two scattering events in either the atmosphere or any of the filter layers exceeds 0.1.
Instead of tracking the dark matter particle’s path through the atmosphere, the total overburden for the atmosphere is used to determine the probability that the dark matter particle scatters in the atmosphere, and the particle only reaches the detector if its velocity vector points toward the detector after the one allowed scattering event. Also, instead of the small step lengths required to accurately model the random walk of a strongly interacting particle, each layer is crossed with a single step. These simplifications reduce the runtime of the simulation by a factor of 100, making it possible to run $10^{10}$ trials in less than one day.

FIG. 5: Simulated event spectra for dark matter particles with masses of 1, 10 and 100 GeV and a total nucleon-scattering cross section of $10^{-27.3}$ cm$^2$. In addition to the events depicted in these spectra, the simulations predict 1300 ± 160 events with energies greater than 4000 eV when $m_{dm}$ = 10 GeV and 10,000 ± 1200 such events when $m_{dm}$ = 100 GeV. The histogram represents the XQC observations.

B. Comparing the Simulations to the XQC Data

In order to compare the probability spectra produced by our Monte Carlo routine to the results of the XQC experiment, we must multiply the probabilities by the number of dark matter particles that are encountered by the initial surface of the Monte Carlo routine. When the initial velocity of the dark matter particle is chosen, the initial velocity may point toward or away from the detector; in the latter case, the trial ends immediately. Consequently, the Monte Carlo probability that the particle deposits no energy in the calorimeter already includes the probability that the dark matter particle does not have a halo trajectory that takes it into the atmosphere.
Therefore, the probabilities resulting from the Monte Carlo routine should be multiplied by the number of particles in the volume swept out by the initial 4.6 × 4.6 cm^2 square surface during the 100.7 f(E) seconds of observation time, where f is the fraction of the observing time that the XQC detector was sensitive to deposits of energy E. For energies between 36 and 88 eV, f is 0.5083, and the value of f increases to one over energies between 88 and 128 eV. The detector was also slightly sensitive to lower energies: between 29 and 35 eV, f increases from 0.3815 to 0.5083. The normal of the initial surface points along the detector's field of view, and the surface moves with the detector; using the detector velocity given in Section III A, the number of dark matter particles encountered by the initial surface is Ndm = f × (ρdm/mdm) × [(2.5 ± 0.3) × 10^10 cm^3], where ρdm is the local dark matter density.

FIG. 6: Simulated event spectra for 10-GeV dark matter particles with total nucleon-scattering cross sections of 10^-21.6, 10^-27.3 and 10^-28.3 cm^2. In addition to the events depicted in these spectra, the simulations predict 140 ± 37 events with energies greater than 4000 eV when σDn = 10^-21.6 cm^2, 1300 ± 160 such events when σDn = 10^-27.3 cm^2, and 120 ± 15 such events when σDn = 10^-28.3 cm^2. The histogram represents the XQC observations.

The simulated event spectra produced by our Monte Carlo routine indicate that particles with masses less than 1 GeV very rarely deposit more than 100 eV inside the XQC calorimeters. Conversely, particles with masses greater than 100 GeV nearly always deposit more than 4000 eV when they interact with the XQC calorimeters, so constraints on σDn for these mdm values arise from the over-saturation (E ≥ 4000 eV) event rate. Fig. 5 shows simulated spectra for three mdm values that lie between these two extremes, along with a histogram that depicts the XQC observations.
Given an initial velocity of 300 km s^-1 relative to the XQC detector, a 1-GeV particle can only deposit up to 66 eV in a single collision with an Si nucleus, so the spectrum for these particles is confined to very low energies. Meanwhile, a 10-GeV particle and a 100-GeV particle with the same initial velocity can deposit up to 900 eV and 44,000 eV, respectively, in a single collision with an Hg nucleus. In fact, ignoring any loss of coherence, all recoil energies between 0 and 44,000 eV are equally likely during a collision between an Hg nucleus and a 100-GeV dark matter particle. That is why the mdm = 100 GeV spectrum in Fig. 5 is flat below 2500 eV and why the simulations predict 10,000 events with energies greater than 4000 eV for this value of mdm and σDn.

Fig. 6 shows how changing the total cross section for elastic scattering off a nucleon affects the simulated spectra generated by our Monte Carlo routine for a single dark matter particle mass (mdm = 10 GeV). We see that increasing σDn from 10^-28.3 cm^2 to 10^-27.3 cm^2 increases all of the counts by a factor of ten but leaves the basic shape of the spectrum unchanged. For much larger values of σDn, however, the particle loses a considerable amount of its energy while traveling through the atmosphere. Consequently, high-energy recoil events become less frequent, as shown by the spectrum for σDn = 10^-21.6 cm^2. For larger values of σDn, too much energy is lost in the atmosphere for the particle to be detectable by the XQC experiment.

Energy Range (eV)   Counts      Energy Range (eV)   Counts
29 - 36             0           945 - 1100          31
36 - 128            11          1100 - 1310         30
128 - 300           129         1310 - 1500         29
300 - 540           80          1500 - 1810         32
540 - 700           90          1810 - 2505         15
700 - 800           32          ≥ 4000              60
800 - 945           48

TABLE I: The binned XQC results used for comparison with our Monte Carlo simulations.

When comparing the simulated measurements to the XQC data, we group the events into the thirteen energy bins given in Table I.
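The single-collision recoil limits quoted above for Si and Hg targets follow from standard elastic-scattering kinematics, E_max = 2 μ^2 v^2 / m_N, where μ is the reduced mass of the dark matter particle and the nucleus. The sketch below is a rough check rather than the authors' code. It approximates the nuclear mass as A GeV, an assumption that happens to reproduce the quoted numbers, and takes A ≈ 28 for Si and A ≈ 200.59 for natural Hg. The function name is hypothetical.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def max_recoil_ev(m_dm_gev: float, mass_number: float,
                  v_km_s: float = 300.0) -> float:
    """Maximum recoil energy (eV) in one elastic collision with a nucleus.

    Assumption: the nuclear mass is approximated as `mass_number` GeV.
    """
    m_n = float(mass_number)                    # nuclear mass in GeV
    mu = m_dm_gev * m_n / (m_dm_gev + m_n)      # reduced mass in GeV
    beta = v_km_s / C_KM_S                      # speed in units of c
    return 2.0 * mu**2 * beta**2 / m_n * 1e9    # GeV -> eV


if __name__ == "__main__":
    print(max_recoil_ev(1.0, 28))        # 1 GeV on Si
    print(max_recoil_ev(10.0, 200.59))   # 10 GeV on Hg
    print(max_recoil_ev(100.0, 200.59))  # 100 GeV on Hg
```

With these inputs the three values come out close to the 66 eV, 900 eV and 44,000 eV stated in the text.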
We generally use large bins because doing so reduces the fractional error in the probabilities generated by our Monte Carlo routine by increasing the probability of each bin: δpi/pi = 1/√(pi t), where t is the number of trials and pi is the probability of an energy deposit in the ith bin. Given that the number of trials is limited by runtime constraints, increasing the bin size is often the only way to obtain bin probabilities with δpi/pi values much less than one. When choosing our binning scheme, we attempted to maximize bin size while preserving as many features of the observed spectrum as possible. We also grouped all energies for which f ≠ 1 into two bins; we ignore the variation in f within these bins and set f = 0.3815 in the lowest-energy bin and f = 0.5083 in the next-to-lowest bin.

Unfortunately, we do not know the number of X-ray events in any of the bins listed in Table I. We considered using a model to subtract off the X-ray background but, given any model's questionable accuracy, we decided not to use it in our analysis. Our ignorance of the X-ray background forces us to treat the number of observed counts in each bin as an upper limit on the number of dark matter events in that energy range. Consequently, we define a parameter X^2 that measures the extent of the discrepancy between the simulated results for a given mdm and σDn and the XQC observations while ignoring bins in which the observed event count exceeds the predicted contribution from dark matter:

$$X^2 \equiv \sum_{i=1}^{\#\,\text{of bins}} (E_i - U_i)^2 \quad \text{with } U_i < E_i, \qquad (12)$$

where Ei = Ndm × pi is the number of counts in the ith bin predicted by the Monte Carlo simulation and Ui is the number of observed counts in the same bin. We use a second Monte Carlo routine to determine how likely it is that a set of observations would give a value of X^2 as large or larger than the one derived from the XQC data given a mean signal described by the set of Ei derived from the simulation Monte Carlo.
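The binning trade-off and the one-sided statistic of Eq. (12) can be sketched in a few lines. This is an illustrative reading of the text, not the authors' code: the summand follows Eq. (12) as it appears here, only bins with Ui < Ei contribute, and the function names and example numbers are hypothetical.

```python
import math
from typing import Sequence


def fractional_error(p_i: float, trials: float) -> float:
    """Fractional Monte Carlo error on a bin probability: 1 / sqrt(p_i * t)."""
    return 1.0 / math.sqrt(p_i * trials)


def x2_statistic(expected: Sequence[float], observed: Sequence[float]) -> float:
    """One-sided discrepancy: sum of (E_i - U_i)^2 over bins with U_i < E_i.

    Bins where the observed count meets or exceeds the prediction are ignored,
    because observed counts are treated as upper limits on the signal.
    """
    return sum((e - u) ** 2 for e, u in zip(expected, observed) if u < e)


if __name__ == "__main__":
    # Hypothetical predicted counts against three observed bin counts.
    print(x2_statistic([10.0, 5.0, 40.0], [11, 5, 30]))
    # A bin probability of 1e-8 sampled with 1e10 trials.
    print(fractional_error(1e-8, 1e10))
```

In the first call only the third bin contributes, since the first two bins have observed counts at or above the prediction.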
In the comparison Monte Carlo, a trial begins by generating a new set of Ei by sampling the error distributions of Ndm and pi. The distribution of Ndm values is assumed to be Gaussian with the mean and standard deviation given above. The probability pi is derived from pi × t events in the simulation Monte Carlo (recall that t is the number of trials), so a new value for pi is generated by sampling a Poisson distribution with a mean of pi × t and dividing the resulting number by t. Once a new set of Ei has been found, the routine generates a simulated number of observed counts for each bin according to a Poisson distribution with a mean of Ei. The value of the X^2 parameter for the new Ei and Ui is computed and compared to the value for the original Ei and the XQC observations, X^2_XQC. The number of trials needed to accurately measure the probability P(X) that X^2 ≥ X^2_XQC is determined by requiring that the variation in the mean value of X^2 over ten Monte Carlo simulations does not exceed (100 − C)%, where C% is the desired confidence level, and that the range P(X) ± (5 × the variation in P(X)) does not contain (100 − C)/100.

V. RESULTS AND DISCUSSION

The XQC experiment rules out the enclosed region in (mdm, σDn) parameter space shown in Fig. 7. The overburden from the atmosphere and the filtering layers ensures that there will be a limit to how strongly a dark matter particle can interact with baryons and still reach the XQC calorimeters; this overburden is responsible for the top edge of the exclusion region. Conversely, if σDn is too small, the dark matter particles will pass through the calorimeters without interacting. The low-energy threshold of the XQC calorimeters places a lower bound on the excluded dark matter particle masses; if mdm is too small, then the recoil energies are undetectable.
On the other side of the mass range, the XQC detector is not sensitive to mdm ≳ 10^5 GeV because the number density of such massive dark matter particles is too small for the XQC experiment to detect.

FIG. 7: The region of dark matter parameter space excluded by the XQC experiment; σDn is the total cross section for scattering off a nucleon and mdm is the mass of the dark matter particle. This exclusion region follows from the assumption that the local dark matter density is 0.3 GeV cm^-3 and that all of the dark matter shares the same value of σDn.

The exclusion region shown in Fig. 7 has a complicated shape, but its features are readily explicable. As mdm increases, the range of excluded σDn values shifts to lower values and then moves up again. The downward shift for mdm between 0.1 GeV and 100 GeV is due to the effects of coherent nuclear scattering. Since σcoh increases with increasing mdm for fixed σDn, a 100-GeV particle interacts more strongly in the atmosphere and in the detector than a 1-GeV particle with the same σDn. Consequently, both the upper and lower boundaries of the excluded region decrease with increasing mass for mdm ≲ 100 GeV. The scattering of dark matter particles with larger masses is incoherent, and the form factor discussed in Section III B causes σtot to decrease as mass increases for fixed σDn. Moreover, particles that are more massive than the target nuclei have straighter trajectories than lighter dark matter particles due to smaller scattering angles in the detector rest frame. The loss of coherence also contributes because incoherent scattering makes small scattering angles more probable. A straight trajectory is shorter than a random walk, so the more massive particles interact less in the atmosphere and the detector than the more easily deflected lighter particles. Due to both of these effects, the upper and lower boundaries of the exclusion region increase with increasing mdm for mdm ≳ 100 GeV.
The lower left corner of the exclusion region also has two interesting features. First, the lower bound on the excluded value of σDn decreases sharply as mdm increases from 0.1 GeV to 0.5 GeV. A dark matter particle with the maximum possible velocity with respect to the detector (800 km s^-1) must have a mass greater than 0.24 GeV to be capable of depositing 29 eV in the calorimeter in a single collision. Lighter particles are only detectable if they scatter multiple times inside the calorimeter, and multiple scatters require a higher value of σDn. Since the analysis in Ref. [52] does not allow multiple collisions, the XQC exclusion region found there does not extend to masses lower than 0.3 GeV for any value of σDn. Second, there is a kink in the lower boundary at mdm = 10 GeV; the constraint on σDn is not as strong for this mass. The simulated spectra produced by our Monte Carlo routine for mdm = 10 GeV and σDn ≲ 10^-25 cm^2 reveal that the particle is most likely to deposit between 100 and 600 eV, as exemplified by the spectra depicted in Fig. 6. The background in this energy range is very high, so the XQC constraints are not as strict at these energies.

FIG. 8: The region of dark matter parameter space excluded to 90% confidence by the XQC experiment for several values of the local density of dark matter with a total nucleon scattering cross section σDn and mass mdm. The four densities shown are 0.3 GeV cm^-3 (solid line), 0.15 GeV cm^-3 (long dashed line), 0.075 GeV cm^-3 (short dashed line) and 0.03 GeV cm^-3 (dotted line).

Altering the local density of dark matter that strongly interacts with baryons changes the exclusion region. Fig. 8 shows the 90%-confidence exclusion regions for four values of the local density of dark matter particles with strong baryon interactions: 0.3 GeV cm^-3 (solid line), 0.15 GeV cm^-3 (long dashed line), 0.075 GeV cm^-3 (short dashed line) and 0.03 GeV cm^-3 (dotted line).
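The single-collision mass threshold quoted above can be checked by inverting the elastic-scattering limit E_max = 2 μ^2 v^2 / m_N for the dark matter mass. The sketch below is an illustration under stated assumptions, not the authors' calculation: it takes silicon as the target and approximates its nuclear mass as 28 GeV, and the function name is hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def min_mass_gev(e_threshold_ev: float, mass_number: float = 28.0,
                 v_km_s: float = 800.0) -> float:
    """Smallest dark matter mass (GeV) able to deposit e_threshold_ev in one hit.

    Assumption: nuclear mass approximated as `mass_number` GeV (Si by default).
    """
    m_n = float(mass_number)
    beta = v_km_s / C_KM_S
    # Reduced mass at which E_max equals the threshold.
    mu_min = math.sqrt(e_threshold_ev * 1e-9 * m_n / (2.0 * beta**2))
    if mu_min >= m_n:
        return math.inf  # the reduced mass can never reach m_n
    # Invert mu = m * m_n / (m + m_n) for the dark matter mass m.
    return mu_min * m_n / (m_n - mu_min)


if __name__ == "__main__":
    print(min_mass_gev(29.0))  # current XQC threshold
    print(min_mass_gev(15.0))  # a 15 eV threshold
```

For a 29 eV threshold and 800 km s^-1 this gives about 0.24 GeV, and a 15 eV threshold gives about 0.17 GeV, matching the figures quoted in the paper.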
These different local densities could arise due to variations in the local dark matter density due to mini-halos or streams. They also describe models where the dark matter does not consist of a single particle species and the dark matter that strongly interacts with baryons is a fraction fd of the local dark matter. In that case, the four exclusion regions in Fig. 8 correspond to fd = 1, 0.5, 0.25, and 0.1.

Fig. 8 indicates that the top and left boundaries of the XQC exclusion region are not highly sensitive to the dark matter density. In particular, the upper left corner of the exclusion region (0.01 ≤ mdm ≤ 0.1 GeV) is nearly unaffected by lowering the dark matter density. This consistency indicates our Monte Carlo-generated exclusion region is smaller than the true exclusion region in this corner. If the dark matter is light (mdm ≲ 0.1 GeV), then the number of dark matter particles encountered by the XQC detector is very large (Ndm ≳ 7 × 10^10). As previously mentioned, the upper left corner of the XQC exclusion region results from multiple scattering events, so the simpler version of our Monte Carlo code described in Section IV A is not applicable. Consequently, it is not possible to run more than 10^9 trials in a week, so each scattering event in the simulation corresponds to more than one scattering event in the detector for all the densities shown in Fig. 8. Therefore, decreasing the density does not change the result. If it were possible to run 10^11 trials, then the upper left corner of the exclusion region would expand and differences between the different density contours would emerge. Since the upper left corner of the XQC exclusion region is already ruled out by astrophysical constraints (see Fig. 9), we have not invested in the computational time necessary to expand this corner.
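The argument about trial counts can be made concrete with the expression for Ndm from Section IV B. The sketch below uses the central value of the swept volume (2.5 × 10^10 cm^3) and the four densities of Fig. 8; the function names are hypothetical, and the ratio Ndm / t is only a rough stand-in for how many real particles each simulated trial represents.

```python
SWEPT_VOLUME_CM3 = 2.5e10  # central value quoted in the text


def n_dm(m_dm_gev: float, rho_gev_cm3: float = 0.3, f: float = 1.0) -> float:
    """Number of dark matter particles encountered by the initial surface."""
    return f * (rho_gev_cm3 / m_dm_gev) * SWEPT_VOLUME_CM3


def particles_per_trial(m_dm_gev: float, rho_gev_cm3: float,
                        trials: float) -> float:
    """Real particles represented by each Monte Carlo trial (Ndm / t)."""
    return n_dm(m_dm_gev, rho_gev_cm3) / trials


if __name__ == "__main__":
    # Light dark matter (0.1 GeV) with 1e9 trials at the four densities of Fig. 8.
    for rho in (0.3, 0.15, 0.075, 0.03):
        print(rho, particles_per_trial(0.1, rho, 1e9))
```

For mdm = 0.1 GeV and 10^9 trials the ratio stays above one at all four densities, which is the situation described above in which lowering the density does not change the result.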
The upper boundary of the exclusion region is also not greatly affected by decreasing the particle density, even when Ndm is small enough that the Monte Carlo routine is capable of running more than Ndm trials (mdm ≥ 100 GeV). This robustness indicates that the overburden of the XQC experiment effectively prevents all dark matter particles with σDn values greater than the upper bound of the exclusion region from reaching the detector, so it does not matter how many particles are encountered. Finally, as discussed previously, the lower portion of the exclusion region's left boundary (σDn ≤ 10^-23 cm^2) is set by the energy threshold for detection and is therefore independent of Ndm.

Examining the features of the excluded region allows us to predict how the region may be expanded by a future XQC-like experiment. Decreasing the overburden by either increasing the rocket's altitude or reducing the filtering will push the top boundary of the excluded region upwards. Decreasing the energy detection threshold will extend the excluded region to lower masses. It may also extend the exclusion region to higher values of σDn for all masses since strongly interacting particles lose much of their energy in the atmosphere and arrive at the calorimeter with too little energy to produce a detectable signal. Increasing the size or number of calorimeters would increase the sensitivity and extend the excluded region to lower values of σDn. Finally, increasing the observation time would increase Ndm, and that would extend the right and bottom boundaries of the excluded region.

FIG. 9: Plot of the scattering cross section for dark matter particles and nucleons (σDn) versus dark matter particle mass (mdm) showing the new XQC limits along with other current experimental limits. The red XQC exclusion region is the same as shown in Fig. 7, and the other experiments are discussed in the text. The dark gray region shows the maximal range of dark matter self-interaction cross section consistent with the strongly self-interacting dark matter model of structure formation [37, 38]. The square marks the value of the scattering cross section for neutron-nucleon interactions.

VI. CONCLUSION

Owing to its high altitude and minimal shielding, the X-ray Quantum Calorimetry (XQC) experiment is a powerful detector of dark matter that interacts strongly with baryons. The XQC measurements rule out a large range of hitherto unconstrained dark matter masses and scattering cross sections. The excluded range was first derived in Refs. [38, 48] based on rough analytic estimates. In this paper, we have improved upon these results using detailed Monte Carlo simulations to predict how a dark matter particle of a given mass and cross section for nucleon scattering would interact with the XQC calorimeters. Unlike Ref. [52], our analysis includes the atmosphere and the shielding of the detector, so our result includes the upper limit on excluded σDn values, which had not yet been accurately determined. Our simulation also models the internal geometry of the XQC detector and the random walk of particles through it, which is not possible using the analytical approaches of Refs. [38, 48, 52].

The resulting exclusion region is significantly different from its analytical predecessors. When multiple scatterings are included, the XQC experiment is sensitive to dark matter particles with masses below 0.3 GeV and cross sections for nucleon scattering between 10^-24 and 10^-20 cm^2. Unlike Ref. [52], we find that the XQC exclusion region does not include σDn < 10^-29 cm^2 for dark matter masses less than 10 GeV. Ref. [52] obtained a more restrictive upper bound because they assumed a specific X-ray background while we treat all events as potential dark matter interactions.
At higher masses, the lower boundary of our exclusion region is much higher than in Refs. [38, 48] because they over-estimated the XQC sensitivity by assuming coherent scattering. It also appears that Refs. [38, 48] underestimated the atmospheric and shielding overburden for the XQC detector because our exclusion region does not extend to values of σDn as large as those included in their exclusion region. We also assume a lower local dark matter density than Refs. [38, 48] (0.3 instead of 0.4 GeV cm^-3), so some of the shrinkage of the exclusion region may be attributed to the reduction in the assumed number density of dark matter particles.

Fig. 9 shows how the XQC exclusion region depicted in Fig. 7 complements the exclusion regions from other experiments that are sensitive to similar values of σDn and mdm. For a summary of some of the other experimental constraints as of 1994, see Ref. [46]. The constraints on σDn from Pioneer 11 [80], Skylab [81], and IMP7/8 [82] were interpreted by Refs. [38, 46, 48]. There have been two balloon-borne searches for dark matter, the IMAX experiment [46, 47] and the Rich, Rocchia & Spiro (RRS) [83] experiment. Although underground detectors are designed to detect WIMPs, DAMA [84, 85] does exclude σDn values within the range of interest, and relevant constraints may be derived from Edelweiss (EDEL) and CDMS [86, 87].

All of the exclusion regions shown in Fig. 9 were derived assuming that all the dark matter is strongly interacting. A local dark matter density of 0.4 GeV cm^-3 was assumed in the analysis of the exclusion regions from Pioneer 11, Skylab and the RRS experiment, while all the other exclusion regions were derived assuming a local dark matter density of 0.3 GeV cm^-3. Furthermore, the derivations of all the shown exclusion regions other than the XQC region and the EDEL+CDMS region assume that the scattering between dark matter particles and nuclei is coherent.
Therefore, these exclusion regions are likely too broad because they over-estimate the cross section for nuclear scattering. A comparison of the XQC exclusion region reported in Refs. [38, 48] and our exclusion region indicates that assuming coherent scattering extends the exclusion region for mdm ≥ 1000 GeV to σDn values that are roughly A× smaller than the lower boundary of our exclusion region, where A is the mass number of the largest target nucleus.

Fig. 9 also shows the bound on σDn from the CMB and large-scale structure (LSS) obtained when one assumes prior knowledge of the Hubble constant H0 and the cosmic baryon fraction (from BBN) [50]. This bound is nominally stronger than the bound from disk stability [45], but it is less direct in that it requires combining different measurements and depends on the cosmological model; consequently, we show both bounds in Fig. 9. Measurements of primordial element abundances give an upper limit of σDn/mdm ≲ 4 × 10^-16 cm^2 GeV^-1 [49]. Since this upper bound lies well beyond the upper bound from disk stability, we do not include it in Fig. 9. We also do not display the constraints from cosmic rays [49] because they are derived from inelastic interactions that are model-dependent.

As shown in Fig. 9, the XQC experiment rules out a wide region of (mdm, σDn) parameter space that was not probed by prior dark matter searches. Of particular interest is the darkly shaded range of σDn values that corresponds to the maximal range of dark matter self-interaction cross sections consistent with the strongly self-interacting dark matter model of structure formation [37, 38]. If the dark matter consists of exotic hadrons whose interactions with nucleons are comparable to their self-interactions, then σDn for these particles would lie in or near the darkly shaded region in Fig. 9.
Previous estimates of the XQC exclusion region [38, 48] indicated that the XQC experiment rules out all the darkly shaded σDn values for 1 ≲ mdm ≲ 10^4 GeV. Our analysis reveals that this is not the case; portions of the darkly shaded region for mdm ≳ 20 GeV are not excluded by the XQC experiment, although they are ruled out by observations of LSS and the CMB. The mass-σ combination corresponding to nucleon-neutron scattering (the square in Fig. 9) lies within the exclusion region of the XQC experiment, and the only portion of the darkly shaded region that is unconstrained corresponds to dark matter masses smaller than 0.25 GeV.

It is important to note, however, that the cross section for dark matter self-interactions need not be comparable to the cross section for nucleon scattering; σDn could differ by a few orders of magnitude from the self-interaction cross section (as is the case for Q-balls). Furthermore, no interactions with baryons are required for self-interacting dark matter to resolve the tension between the collisionless dark matter model and observations of small-scale structure.

Another XQC detector is scheduled to launch in the upcoming year. This experiment will have twice the observing time of the XQC experiment used in this analysis. As discussed in Section V, increasing the observing time will extend the exclusion region to higher masses and weaker interactions. The future XQC experiment will also have a lower energy threshold (15 eV) and will maintain sensitivity to all energies above this threshold throughout the run. The increased sensitivity to low energies will shift the lower (σDn ≤ 10^-23 cm^2) left boundary of the exclusion region to lower masses. A lower energy threshold of 15 eV will make the experiment sensitive to single recoil events involving dark matter particles more massive than 0.17 GeV, as discussed in Section V.
Clearly, the next-generation XQC experiment will be an even more powerful probe of interactions between dark matter particles and baryons than its predecessor.

Acknowledgments

A. L. E. would like to thank Robert Lupton and Michael Ramsey-Musolf for useful discussions. D. M. thanks the Wallops Flight Facility launch support team and the many undergraduate and graduate students who made this pioneering experiment possible. The authors also thank Randy Gladstone for his assistance with the atmosphere model. A. L. E. acknowledges the support of an NSF Graduate Fellowship. P. J. S. is supported in part by US Department of Energy grant DE-FG02-91ER40671. P. C. M. acknowledges current support for this project from a Robert M. Walker Senior Research Fellowship in Experimental Space Science from the McDonnell Center for the Space Sciences, as well as prior institutional support for this project from the Instituto Nacional de Técnica Aeroespacial (INTA) in Spain, from the University of Bielefeld in Germany, and from the University of Arizona.

[1] V. C. Rubin, N. Thonnard, and W. K. Ford, The Astrophysical Journal 238, 471 (1980).
[2] D. N. Spergel, R. Bean, O. Doré, M. R. Nolta, C. L. Bennett, G. Hinshaw, N. Jarosik, E. Komatsu, L. Page, H. V. Peiris, et al., ArXiv Astrophysics e-prints (2006), astro-ph/0603449.
[3] N. A. Bahcall, J. P. Ostriker, S. Perlmutter, and P. J. Steinhardt, Science 284, 1481 (1999).
[4] J. F. Navarro, C. S. Frenk, and S. D. M. White, The Astrophysical Journal 462, 563 (1996).
[5] A. V. Kravtsov, A. A. Klypin, J. S. Bullock, and J. R. Primack, The Astrophysical Journal 502, 48 (1998).
[6] B. Moore, T. Quinn, F. Governato, J. Stadel, and G. Lake, Mon. Not. R. Astron. Soc. 310, 1147 (1999).
[7] S. Ghigna, B. Moore, F. Governato, G. Lake, T. Quinn, and J. Stadel, The Astrophysical Journal 544, 616 (2000).
[8] C. Power, J. F. Navarro, A. Jenkins, C. S. Frenk, S. D. M. White, V. Springel, J. Stadel, and T. Quinn, Mon. Not. R. Astron. Soc.
338, 14 (2003), astro-ph/0201544.
[9] J. F. Navarro, E. Hayashi, C. Power, A. R. Jenkins, C. S. Frenk, S. D. M. White, V. Springel, J. Stadel, and T. R. Quinn, Mon. Not. R. Astron. Soc. 349, 1039 (2004), astro-ph/0311231.
[10] E. Hayashi, J. F. Navarro, C. Power, A. Jenkins, C. S. Frenk, S. D. M. White, V. Springel, J. Stadel, and T. R. Quinn, Mon. Not. R. Astron. Soc. 355, 794 (2004), astro-ph/0310576.
[11] J. Diemand, B. Moore, and J. Stadel, Mon. Not. R. Astron. Soc. 353, 624 (2004), astro-ph/0402267.
[12] J. Diemand, M. Zemp, B. Moore, J. Stadel, and C. M. Carollo, Mon. Not. R. Astron. Soc. 364, 665 (2005), astro-ph/0504215.
[13] J. A. Tyson, G. P. Kochanski, and I. P. dell'Antonio, The Astrophysical Journal Letters 498, L107+ (1998).
[14] D. J. Sand, T. Treu, G. P. Smith, and R. S. Ellis, The Astrophysical Journal 604, 88 (2004), astro-ph/0310703.
[15] H. Katayama and K. Hayashida, Advances in Space Research 34, 2519 (2004), astro-ph/0405363.
[16] E. Pointecouteau, M. Arnaud, and G. W. Pratt, Astronomy and Astrophysics 435, 1 (2005), astro-ph/0501635.
[17] L. M. Voigt and A. C. Fabian, Mon. Not. R. Astron. Soc. 368, 518 (2006), astro-ph/0602373.
[18] R. A. Flores and J. R. Primack, The Astrophysical Journal Letters 427, L1 (1994).
[19] B. Moore, Nature 370, 629 (1994).
[20] A. Burkert, The Astrophysical Journal Letters 447, L25+ (1995).
[21] W. J. G. de Blok and S. S. McGaugh, Mon. Not. R. Astron. Soc. 290, 533 (1997).
[22] S. S. McGaugh and W. J. G. de Blok, The Astrophysical Journal 499, 41 (1998), astro-ph/9801123.
[23] W. J. G. de Blok, S. S. McGaugh, and V. C. Rubin, The Astronomical Journal 122, 2396 (2001).
[24] D. Marchesini, E. D'Onghia, G. Chincarini, C. Firmani, P. Conconi, E. Molinari, and A. Zacchei, The Astrophysical Journal 575, 801 (2002), astro-ph/0202075.
[25] J. J. Binney and N. W. Evans, Mon. Not. R. Astron. Soc. 327, L27 (2001), astro-ph/0108505.
[26] P. Salucci, Mon. Not. R. Astron. Soc. 320, L1 (2001), astro-ph/0007389.
[27] J. D. Simon, A. D. Bolatto, A. Leroy, L. Blitz, and E. L. Gates, The Astrophysical Journal 621, 757 (2005), astro-ph/0412035.
[28] B. Moore, S. Ghigna, F. Governato, G. Lake, T. Quinn, J. Stadel, and P. Tozzi, The Astrophysical Journal Letters 524, L19 (1999).
[29] A. Klypin, A. V. Kravtsov, O. Valenzuela, and F. Prada, The Astrophysical Journal 522, 82 (1999), astro-ph/9901240.
[30] E. D'Onghia and G. Lake, The Astrophysical Journal 612, 628 (2004), astro-ph/0309735.
[31] A. A. El-Zant, Y. Hoffman, J. Primack, F. Combes, and I. Shlosman, The Astrophysical Journal Letters 607, L75 (2004), astro-ph/0309412.
[32] E. Hayashi and J. F. Navarro, Mon. Not. R. Astron. Soc. 373, 1117 (2006), astro-ph/0608376.
[33] J. S. Bullock, A. V. Kravtsov, and D. H. Weinberg, The Astrophysical Journal 539, 517 (2000), astro-ph/0002214.
[34] A. J. Benson, C. S. Frenk, C. G. Lacey, C. M. Baugh, and S. Cole, Mon. Not. R. Astron. Soc. 333, 177 (2002), astro-ph/0108218.
[35] A. V. Kravtsov, O. Y. Gnedin, and A. A. Klypin, The Astrophysical Journal 609, 482 (2004), astro-ph/0401088.
[36] B. Moore, J. Diemand, P. Madau, M. Zemp, and J. Stadel, Mon. Not. R. Astron. Soc. 368, 563 (2006), astro-ph/0510370.
[37] D. N. Spergel and P. J. Steinhardt, Physical Review Letters 84, 3760 (2000).
[38] B. D. Wandelt, R. Davé, G. R. Farrar, P. C. McGuire, D. N. Spergel, and P. J. Steinhardt, in Sources and Detection of Dark Matter and Dark Energy in the Universe, edited by D. B. Cline (Springer-Verlag, Berlin, New York, 2001), p. 263, astro-ph/0006344.
[39] R. Davé, D. N. Spergel, P. J. Steinhardt, and B. D. Wandelt, The Astrophysical Journal 547, 574 (2001).
[40] K. Ahn and P. R. Shapiro, Mon. Not. R. Astron. Soc. 363, 1092 (2005), astro-ph/0412169.
[41] G. R. Farrar, Int. J. Theor. Phys. 42, 1211 (2003).
[42] G. R. Farrar and G. Zaharijas, Physical Review Letters 96, 041302 (2006), hep-ph/0510079.
[43] A. Kusenko and P. J.
Steinhardt, Physical Review Letters 87, 141301 (2001), astro-ph/0106008.
[44] M. Y. Khlopov, Pisma Zh. Eksp. Teor. Fiz. 83, 3 (2006), astro-ph/0511796.
[45] G. D. Starkman, A. Gould, R. Esmailzadeh, and S. Dimopoulos, Physical Review D 41, 3594 (1990).
[46] P. C. McGuire, Ph.D. thesis, University of Arizona (1994).
[47] P. C. McGuire, T. Bowen, D. L. Barker, P. G. Halverson, K. R. Kendall, T. S. Metcalfe, R. S. Norton, A. E. Pifer, L. M. Barbier, E. R. Christian, et al., in AIP Conf. Proc. 336: Dark Matter, edited by S. S. Holt and C. L. Bennett (1995), p. 53.
[48] P. C. McGuire and P. J. Steinhardt, in Proceedings of the 27th International Cosmic Ray Conference, Hamburg, Germany (2001), p. 1566, astro-ph/0105567.
[49] R. H. Cyburt, B. D. Fields, V. Pavlidou, and B. Wandelt, Physical Review D 65, 123503 (2002), astro-ph/0203240.
[50] X. Chen, S. Hannestad, and R. J. Scherrer, Physical Review D 65, 123515 (2002), astro-ph/0202496.
[51] D. McCammon, R. Almy, E. Apodaca, W. Bergmann Tiest, W. Cui, S. Deiker, M. Galeazzi, M. Juda, A. Lesser, T. Mihara, et al., The Astrophysical Journal 576, 188 (2002).
[52] G. Zaharijas and G. R. Farrar, Physical Review D 72, 083502 (2005), astro-ph/0406531.
[53] D. McCammon, R. Almy, S. Deiker, J. Morgenthaler, R. L. Kelley, F. J. Marshall, S. H. Moseley, C. K. Stahle, and A. E. Szymkowiak, Nuclear Instruments and Methods in Physics Research A 370, 266 (1996).
[54] C. K. Stahle, R. L. Kelley, D. McCammon, S. H. Moseley, and A. E. Szymkowiak, Nuclear Instruments and Methods in Physics Research A 370, 173 (1996).
[55] E. I. Gates, G. Gyuk, and M. S. Turner, The Astrophysical Journal Letters 449, L123+ (1995).
[56] B. Moore, C. Calcáneo-Roldán, J. Stadel, T. Quinn, G. Lake, S. Ghigna, and F. Governato, Phys. Rev. D 64, 063508 (2001), astro-ph/0106271.
[57] J. D. Lewin and P. F. Smith, Astroparticle Physics 6, 87 (1996).
[58] A. M. Green, Phys. Rev. D 68, 023004 (2003), astro-ph/0304446.
[59] A. K. Drukier, K. Freese, and D.
N. Spergel, Physical Review D 33, 3495 (1986).
[60] F. J. Kerr and D. Lynden-Bell, Mon. Not. R. Astron. Soc. 221, 1023 (1986).
[61] J. A. R. Caldwell and I. M. Coulson, The Astronomical Journal 93, 1090 (1987).
[62] R. P. Olling and M. R. Merrifield, Mon. Not. R. Astron. Soc. 297, 943 (1998).
[63] C. S. Kochanek, Astrophys. J. 457, 228 (1996), astro-ph/9505068.
[64] M. J. Reid, A. C. S. Readhead, R. C. Vermeulen, and R. N. Treuhaft, The Astrophysical Journal 524, 816 (1999).
[65] R. P. Olling and W. Dehnen, The Astrophysical Journal 599, 275 (2003), arXiv:astro-ph/0301486.
[66] K. M. Cudworth, The Astronomical Journal 99, 590 (1990).
[67] P. J. T. Leonard and S. Tremaine, Astrophys. J. 353, 486 (1990).
[68] M. C. Smith, G. R. Ruchti, A. Helmi, R. F. G. Wyse, J. P. Fulbright, K. C. Freeman, J. F. Navarro, G. M. Seabroke, M. Steinmetz, M. Williams, et al., Mon. Not. R. Astron. Soc. 379, 755 (2007), arXiv:astro-ph/0611671.
[69] M. J. Reid, Annual Review of Astronomy and Astrophysics 31, 345 (1993).
[70] D. H. McNamara, J. B. Madsen, J. Barnes, and B. F. Ericksen, The Publications of the Astronomical Society of the Pacific 112, 202 (2000).
[71] V. S. Avedisova, Astronomy Reports 49, 435 (2005).
[72] W. Dehnen and J. J. Binney, Mon. Not. R. Astron. Soc. 298, 387 (1998), astro-ph/9710077.
[73] M. W. Goodman and E. Witten, Physical Review D 31, 3059 (1985).
[74] A. Gould, The Astrophysical Journal 321, 571 (1987).
[75] G. Gelmini, A. Kusenko, and S. Nussinov, Physical Review Letters 89, 101302 (2002), hep-ph/0203179.
[76] J. Engel, Physics Letters B 264, 114 (1991).
[77] D. Z. Freedman, Physical Review D 9, 1389 (1974).
[78] R. H. Helm, Physical Review 104, 1466 (1956).
[79] M. Matsumoto and T. Nishimura, ACM Transactions on Modeling and Computer Simulation 8, 3 (1998).
[80] J. A. Simpson, T. S. Bastian, D. L. Chenette, R. B. McKibben, and K. R. Pyle, Journal of Geophysical Research 85, 5731 (1980).
[81] E. K. Shirk and P. B. Price, Astrophys. J. 220, 719 (1978).
[82] R. A. Mewaldt, A. W. Labrador, C. Lopate, and R. B. McKibben (2001), private communication. [83] J. Rich, R. Rocchia, and M. Spiro, Physics Letters B 194, 173 (1987). [84] C. Bacci, P. Belli, R. Bernabei, C. Dai, L. Ding, W. di Nicolantonio, E. Gaillard, G. Gerbier, H. Kuang, A. In- cicchitti, et al., Astroparticle Physics 2, 13 (1994). [85] R. Bernabei, P. Belli, R. Cerulli, F. Montecchia, M. Am- ato, G. Ignesti, A. Incicchitti, D. Prosperi, C. J. Dai, H. L. He, et al., Physical Review Letters 83, 4918 (1999). [86] I. F. M. Albuquerque and L. Baudis, Physical Review Letters 90, 221301 (2003), astro-ph/0301188. [87] I. F. M. Albuquerque and L. Baudis, Physical Review Letters 91, 229903(E) (2003). ABSTRACT Although the rocket-based X-ray Quantum Calorimetry (XQC) experiment was designed for X-ray spectroscopy, the minimal shielding of its calorimeters, its low atmospheric overburden, and its low-threshold detectors make it among the most sensitive instruments for detecting or constraining strong interactions between dark matter particles and baryons. We use Monte Carlo simulations to obtain the precise limits the XQC experiment places on spin-independent interactions between dark matter and baryons, improving upon earlier analytical estimates. We find that the XQC experiment rules out a wide range of nucleon-scattering cross sections centered around one barn for dark matter particles with masses between 0.01 and 10^5 GeV. Our analysis also provides new constraints on cases where only a fraction of the dark matter strongly interacts with baryons. 
<|endoftext|><|startoftext|> Introduction Heavy-Light Staggered Chiral Perturbation Theory Generalizing Continuum PQ$\chi$PT to S$\chi$PT Form Factors for $B\to P$ Decay Form factors for 3-flavor partially quenched S$\chi$PT Full QCD Results Analytic terms Finite Volume Effects Conclusions Feynman Rules Integrals Wavefunction Renormalization Factors References ABSTRACT We calculate the form factors for the semileptonic decays of heavy-light pseudoscalar mesons in partially quenched staggered chiral perturbation theory (S$\chi$PT), working to leading order in $1/m_Q$, where $m_Q$ is the heavy quark mass. We take the light meson in the final state to be a pseudoscalar corresponding to the exact chiral symmetry of staggered quarks. The treatment assumes the validity of the standard prescription for representing the staggered ``fourth root trick'' within S$\chi$PT by insertions of factors of 1/4 for each sea quark loop. Our calculation is based on an existing partially quenched continuum chiral perturbation theory calculation with degenerate sea quarks by Becirevic, Prelovsek and Zupan, which we generalize to the staggered (and non-degenerate) case. As a by-product, we obtain the continuum partially quenched results with non-degenerate sea quarks. We analyze the effects of non-leading chiral terms, and find a relation among the coefficients governing the analytic valence mass dependence at this order. Our results are useful in analyzing lattice computations of form factors $B\to\pi$ and $D\to K$ when the light quarks are simulated with the staggered action. <|endoftext|><|startoftext|> Introduction Increasing attention is being paid to the dynamics of open quantum systems, that is, to quantum systems acted on by an environment. Such systems are of interest for studies of dissipative phenomena, decoherence, backgrounds to quantum computers and to precision measurements, and theories of quantum measurement.
A principal tool in studying open quantum systems is the reduced density matrix, obtained from the pure state density matrix by tracing over environment degrees of freedom, or in stochastic models where the environment is represented by a noise term in the Schrödinger equation, by averaging over the noise. As is well-known, this transition from the pure state density matrix to the reduced density matrix is not one-to-one, since information about the total system is lost. For example, in stochastic models, there is known to be a continuum of different unravelings, or pure state density matrix stochastic evolutions, that yield the same master equation for the reduced density matrix. The question that we investigate here is the extent to which one can form objects that refer only to the basis vectors of the system Hilbert space, but that nonetheless recapture information that is lost in passing to the reduced density matrix. In the first part of this paper (Sections 2 through 5), we discuss classical noise arising from fluctuations defined by classical probability distributions. In the second part (Sections 6 through 9), we give an analogous discussion of quantum noise, which appears in the physically important case of a quantum system coupled to a quantum environment in an overall pure state. We also give an extension, making contact with the discussion of the first part, to the case in which the overall system is in a mixed state superposition of pure states. The final section contains a discussion of quantum measurements that relates the material in the first and second parts. For the case of classical probability distributions, a relevant discussion appears in Chapter 5 of the book The Theory of Open Quantum Systems by Breuer and Petruccione [1], following up on earlier papers by those authors [2], by Wiseman [3] and by Mølmer, Castin, and Dalibard [4].
In simplified form, Breuer and Petruccione introduce an ensemble of pure state vectors $|\psi_\alpha\rangle$, each drawn from the same system Hilbert space $\mathcal{H}_S$, with each vector assumed to occur in the ensemble with probability $w_\alpha$, $\sum_\alpha w_\alpha = 1$. Measurement of a general self-adjoint operator $R$ for a system prepared in $|\psi_\alpha\rangle$ typically gives a range of values, the mean of which is given by $\langle\psi_\alpha|R|\psi_\alpha\rangle$. The mean or expectation over the ensemble of pure state vectors is then given by
$$\sum_\alpha w_\alpha \langle\psi_\alpha|R|\psi_\alpha\rangle = \mathrm{Tr}\,\rho R , \tag{1a}$$
with $\rho$ the mixed state or reduced density matrix defined by
$$\rho = \sum_\alpha w_\alpha |\psi_\alpha\rangle\langle\psi_\alpha| . \tag{1b}$$
Breuer and Petruccione point out that there are three variances that are relevant. The variance of measurements of $R$ over all pure states in the ensemble is given by
$$\mathrm{Var}(R) = \mathrm{Tr}\,\rho (R - \mathrm{Tr}\,\rho R)^2 = \mathrm{Tr}\,\rho R^2 - (\mathrm{Tr}\,\rho R)^2 . \tag{2a}$$
This can be written as the sum of two non-negative terms,
$$\mathrm{Var}(R) = \mathrm{Var}_1(R) + \mathrm{Var}_2(R) , \tag{2b}$$
with $\mathrm{Var}_1(R)$ the ensemble average of the variances of $R$ within each pure state of the ensemble,
$$\mathrm{Var}_1(R) = \sum_\alpha w_\alpha \left[\langle\psi_\alpha|R^2|\psi_\alpha\rangle - \langle\psi_\alpha|R|\psi_\alpha\rangle^2\right] , \tag{2c}$$
and with $\mathrm{Var}_2(R)$ the variance of the pure state means of $R$ over the ensemble,
$$\mathrm{Var}_2(R) = \sum_\alpha w_\alpha \langle\psi_\alpha|R|\psi_\alpha\rangle^2 - \Big[\sum_\alpha w_\alpha \langle\psi_\alpha|R|\psi_\alpha\rangle\Big]^2 . \tag{2d}$$
Thus, $\mathrm{Var}_1(R)$ is an ensemble average of the quantum variances of $R$, while $\mathrm{Var}_2(R)$ is a measure of the spread of the average values of $R$ resulting from the statistical properties of the ensemble. As Breuer and Petruccione note, neither of the subsidiary variances $\mathrm{Var}_{1,2}$ can be expressed as the density matrix expectation of some self-adjoint operator. Our aim in the first part of this paper is to extend the formalism of ref [1] by utilizing a density tensor hierarchy, which captures the statistical information that is lost in forming the reduced density matrix of Eq. (1b). A density tensor, defined as an ensemble average of density matrices, was first introduced by Mielnik [5], and was applied to discussions of density functions on the space of quantum states and their application to thermalization of quantum systems by Brody and Hughston [6].
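The variance decomposition of Eqs. (2a) through (2d) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the ensemble (five random states in a three-dimensional Hilbert space), the weights, and the observable are arbitrary assumptions, and all function names are hypothetical. The total variance is computed from the reduced density matrix alone, while the two subsidiary variances require the individual pure states.

```python
import math
import random


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def random_state(dim, rng):
    """Unit-normalized random complex vector (an illustrative ensemble member)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def projector(psi):
    """Pure state density matrix |psi><psi|."""
    return [[a * b.conjugate() for b in psi] for a in psi]


def variance_decomposition(states, weights, R):
    """Return (Var, Var1, Var2) of Eqs. (2a), (2c), (2d) for Hermitian R."""
    dim = len(R)
    R2 = matmul(R, R)
    rho = [[0j] * dim for _ in range(dim)]
    means, means_sq = [], []
    for psi, w in zip(states, weights):
        p = projector(psi)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += w * p[i][j]                # Eq. (1b)
        means.append(trace(matmul(p, R)).real)       # <psi|R|psi>
        means_sq.append(trace(matmul(p, R2)).real)   # <psi|R^2|psi>
    ens_mean = sum(w * m for w, m in zip(weights, means))
    # Var uses only the reduced density matrix rho.
    var = trace(matmul(rho, R2)).real - trace(matmul(rho, R)).real ** 2
    var1 = sum(w * (m2 - m * m) for w, m, m2 in zip(weights, means, means_sq))
    var2 = sum(w * m * m for w, m in zip(weights, means)) - ens_mean ** 2
    return var, var1, var2


rng = random.Random(0)
dim = 3
states = [random_state(dim, rng) for _ in range(5)]
raw = [rng.random() for _ in states]
weights = [x / sum(raw) for x in raw]
M = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
     for _ in range(dim)]
R = [[M[i][j] + M[j][i].conjugate() for j in range(dim)] for i in range(dim)]

var, var1, var2 = variance_decomposition(states, weights, R)
print("Var =", var, " Var1 + Var2 =", var1 + var2)
```

The two printed numbers should agree to rounding error, whereas `var1` and `var2` separately change if the same reduced density matrix is realized by a different ensemble.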
These papers, in addition to introducing the concept of a density tensor which is developed further here, also contain the important result that in the case of a continuum probability distribution, the density tensor hierarchy gives all of the information needed to reconstruct the probability function wα. In particular, the variances Var1,2 for any observable, and more general statistical properties of the ensemble as well, can be expressed as contractions of density tensor matrix elements with appropriate matrix elements of the observable(s) of interest. The basic construction of the density tensor hierarchy corresponding to a classical probability distribution {wα} is given in Sec. 2. Here we generalize the reduced density matrix of Eq. (1b) to a density tensor, formed by taking a product of pure state density matrix elements, and averaging over the ensemble of pure states. When the ψα are independent of α, this tensor reduces to an n-fold product of reduced density matrices, and so the difference between the density tensor and this product is a measure of the statistical fluctuations in the ensemble. In the generic case of non-trivial dependence of ψα on α, there are some general statements that can be made. First of all, the order n density tensor is a symmetric tensor in its pair indices, and it can be considered as a matrix operator acting on the n-fold tensor product of the system Hilbert space HS with itself. The symmetry of the density tensor allows construction of a generating function that on expansion gives the density tensors of all orders. Additionally, as a consequence of the unit trace and idempotence conditions obeyed by the pure state density matrix, the density tensor hierarchy satisfies a system of descent equations, relating the order n tensor to the order n−1 tensor when any row index is contracted with any column index. 
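The descent property just described can be checked on a small example. The sketch below is an illustration under stated assumptions, not the construction used later in the paper: it takes the order two tensor to be the ensemble average of products of two pure state matrix elements, uses an arbitrary random ensemble, and introduces hypothetical helper names. Contracting a row index with its own column index relies on the unit trace condition, while contracting the column index of the first pair with the row index of the second relies on idempotence.

```python
import math
import random


def random_state(dim, rng):
    """Unit-normalized random complex vector (illustrative ensemble member)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def projector(psi):
    """Pure state density matrix |psi><psi|."""
    return [[a * b.conjugate() for b in psi] for a in psi]


def density_tensors(states, weights):
    """Order one and order two tensors, indexed as rho1[i][j] and
    rho2[i1][j1][i2][j2], built as weighted ensemble averages."""
    dim = len(states[0])
    idx = range(dim)
    rho1 = [[0j] * dim for _ in idx]
    rho2 = [[[[0j] * dim for _ in idx] for _ in idx] for _ in idx]
    for psi, w in zip(states, weights):
        p = projector(psi)
        for i1 in idx:
            for j1 in idx:
                rho1[i1][j1] += w * p[i1][j1]
                for i2 in idx:
                    for j2 in idx:
                        rho2[i1][j1][i2][j2] += w * p[i1][j1] * p[i2][j2]
    return rho1, rho2


def contract_same_pair(rho2):
    """Contract i1 with j1; the unit trace condition gives rho1[i2][j2]."""
    idx = range(len(rho2))
    return [[sum(rho2[k][k][i2][j2] for k in idx) for j2 in idx] for i2 in idx]


def contract_adjacent(rho2):
    """Contract j1 with i2; idempotence gives rho1[i1][j2]."""
    idx = range(len(rho2))
    return [[sum(rho2[i1][k][k][j2] for k in idx) for j2 in idx] for i1 in idx]


def max_abs_diff(a, b):
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


rng = random.Random(0)
states = [random_state(3, rng) for _ in range(4)]
weights = [0.4, 0.3, 0.2, 0.1]
rho1, rho2 = density_tensors(states, weights)
print(max_abs_diff(contract_same_pair(rho2), rho1))
print(max_abs_diff(contract_adjacent(rho2), rho1))
```

Both printed differences should vanish to rounding error. Replacing the pure states by mixed ones would leave the first check intact but break the second, since idempotence no longer holds.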
We show that the variances Var1,2 defined by Breuer and Petruccione can be expressed in terms of appropriate contractions of density tensor elements with operator matrix elements.

In subsequent sections we develop some concrete applications of the general formalism for classical probability distributions. In Sec. 3, we consider an isotropic ensemble of spin-1/2 pure state density matrices, construct the density tensors through order 3, verify the descent equations, and calculate the generating function. In Sec. 4 we apply the formalism to a quantum system evolving under the influence of noise as described by a stochastic Schrödinger equation, with the ensemble defined as the set of all histories of an initial quantum state under the influence of the noise. Assuming white noise described by the Itô calculus, we give the dynamics of the general density tensor in terms of the general unraveling of the Lindblad equation constructed by Wiseman and Diósi [7], and show that the order two and higher density tensors distinguish between inequivalent unravelings that give the same reduced density matrix (i.e., the same order one density tensor). In Sec. 5 we develop an analogous formalism for the case of jump (piecewise deterministic process) unravelings of the Lindblad equation.

We turn next to an analysis of a quantum system coupled to a quantum environment, rather than to an external classical noise source. Here, one is confronted with the problem of discussing the system dissipation associated with the system-environment interaction within a single overall pure state of system plus environment (or in a thermal state that is a weighted average of such pure states). Typically, in master equation derivations, the system-environment interaction$^1$ $H$ has vanishing expectation in the environment, but its square $H^2$ does not have a vanishing expectation, because the environment is not in an eigenstate of $H$.
The associated variance is then a measure of quantum fluctuations associated with the environment state, and is the source of quantum “noise” driving the system dissipation. Our aim in the second part of this paper is to generalize the formalism of the first part to recapture information about this noise that is lost in the passage to the system reduced density matrix. We do this in Sec. 6 by defining a density tensor hierarchy as the trace over the environment of a product of environment operators constructed as the system matrix elements of the total density matrix. Unlike the classical noise construction, which uses only the system density matrix, the construction in the quantum noise case requires knowledge of the full system plus environment density matrix, and so (except for the order one case) does not give a system observable. It is nonetheless computable in any theory of the system plus environment, and is of theoretical, rather than empirical, interest. [Footnote 1: What we call $H$ is usually denoted by $H_I$ in the open systems literature. To avoid confusion, all other Hamiltonians will carry subscripts, e.g., $H_S$ and $H_E$ for the system and environment Hamiltonians, $H_{\rm TOT}$ for the total Hamiltonian, etc.] Because the environment operators entering the construction are non-commutative, this hierarchy is no longer totally symmetric in its system index pairs, but by the cyclic permutation property of the trace, it is symmetric under cyclic permutation of the system index pairs. Also, because the system trace of these environment operators gives only the reduced environment density matrix, rather than unity, there is in general no descent equation associated with taking this trace. However, when indices of adjacent system operators are contracted, one gets the square of the overall density matrix, and so there remains a set of descent relations connecting the order (n) tensor to the order (n− 1) tensor.
Finally, in the case of thermal (or other mixed) overall states, we define the appropriate tensor as a weighted sum of pure state tensors, in analogy with the definition of Sec. 2. In subsequent sections, we give applications of the trace hierarchy formalism to several classic problems discussed in the theory of quantum master equations. In Sec. 7 we consider the quantum Brownian motion (and resulting decoherence) of a massive Brownian particle in interaction with an independent particle bath of scatterers. In Sec. 8 we discuss the tensor hierarchy corresponding to the weak coupling Born–Markov master equation, and its specialization to the quantum optical master equation. Finally in Sec. 9, we give an analogous discussion for the Caldeira–Leggett model of a particle in interaction with a system of environmental oscillators.

We conclude with a discussion that bridges the considerations of the classical noise and the quantum noise cases. In Sec. 10, we contrast two different Itô stochastic Schrödinger equations, both of which have the same Lindblad, but only one of which leads to state vector reduction. We relate this to the fact that the equation giving the time derivative of the stochastic expectation of operator variances involves the order two density tensor, which differs for the two cases. We discuss the analogous equation for the time dependence of the variance of a “pointer operator” in the case of a quantum system coupled to a quantum environment, and show why this does not lead to state vector reduction. Thus we see no mechanism for quantum “noise” in a closed quantum system plus environment to provide a resolution of the quantum measurement problem.

2. The density tensor for classical noise and its kinematical properties

We proceed to establish our notation and to define the density tensor hierarchy in the classical noise case.
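We denote the pure state density matrix formed from the unit normalized state $|\psi_\alpha\rangle$ by $\rho_\alpha$, with
$$\rho_\alpha = |\psi_\alpha\rangle\langle\psi_\alpha| , \tag{3a}$$
and its general matrix element between states $|i\rangle$ and $|j\rangle$ of $\mathcal{H}_S$ by
$$\rho_{\alpha;ij} \equiv \langle i|\rho_\alpha|j\rangle . \tag{3b}$$
The unit trace condition on $\rho_\alpha$ states that
$$\mathrm{Tr}\,\rho_\alpha = \langle\psi_\alpha|\psi_\alpha\rangle = 1 , \tag{3c}$$
and the idempotence condition on $\rho_\alpha$ states that
$$\rho_\alpha^2 = |\psi_\alpha\rangle\langle\psi_\alpha|\psi_\alpha\rangle\langle\psi_\alpha| = |\psi_\alpha\rangle\langle\psi_\alpha| = \rho_\alpha . \tag{3d}$$
We now define the order $n$ density tensor by
$$\rho^{(n)}_{i_1j_1,i_2j_2,\ldots,i_nj_n} = \sum_\alpha w_\alpha \rho_{\alpha;i_1j_1}\rho_{\alpha;i_2j_2}\cdots\rho_{\alpha;i_nj_n} = E[\rho_{\alpha;i_1j_1}\rho_{\alpha;i_2j_2}\cdots\rho_{\alpha;i_nj_n}] , \tag{4a}$$
with $E[F_\alpha]$ a shorthand for
$$E[F_\alpha] = \sum_\alpha w_\alpha F_\alpha . \tag{4b}$$
Since
$$\rho^{(1)}_{ij} = \sum_\alpha w_\alpha \rho_{\alpha;ij} = \sum_\alpha w_\alpha \langle i|\rho_\alpha|j\rangle , \tag{5a}$$
we see that this is just the $|i\rangle$ to $|j\rangle$ matrix element of the reduced density matrix $\rho$ defined in Eq. (1b),
$$\rho^{(1)}_{ij} = \langle i|\rho|j\rangle , \tag{5b}$$
and so the density tensor of Eq. (4a) is a natural generalization of the usual reduced density matrix. When the states $|\psi_\alpha\rangle$ are independent of the label $\alpha$, the definition of Eq. (4a) simplifies to
$$\rho^{(n)}_{i_1j_1,i_2j_2,\ldots,i_nj_n} = \rho_{i_1j_1}\rho_{i_2j_2}\cdots\rho_{i_nj_n} , \tag{5c}$$
and so the difference between Eq. (4a) and a product of reduced density matrix elements is a reflection of the statistical structure of the ensemble. Since the factors within the expectation $E[\ldots]$ on the right of Eq. (4a) are just ordinary complex numbers, the density tensor is symmetric under interchange of any index pair $i_lj_l$ with any other index pair $i_mj_m$. Consequently, we can define a generating function for the density tensor by
$$G[a_{ij}] = E[e^{\rho_{\alpha;ij}a_{ij}}] = \sum_{n=0}^{\infty} \frac{1}{n!}\, a_{i_1j_1}\cdots a_{i_nj_n}\, \rho^{(n)}_{i_1j_1,\ldots,i_nj_n} , \tag{5d}$$
where repeated indices $i, j$ are summed. It will often be convenient to abbreviate $\rho_{\alpha;ij}a_{ij}$ by $\rho_\alpha\cdot a$, so that the generating function becomes in this notation $G[a] = E[e^{\rho_\alpha\cdot a}]$. Although the density tensor for $n > 1$ is not an operator on $\mathcal{H}_S$, it clearly has the structure of an operator on the $n$-fold tensor product $\mathcal{H}_S \otimes \mathcal{H}_S \otimes \cdots \otimes \mathcal{H}_S$. Motivated by this, we will often find it convenient to write the definition of Eq.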
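(4a) as
$$\rho^{(n)} = E\Big[\prod_{\ell=1}^{n} \rho_{\alpha;\ell}\Big] , \tag{5e}$$
with each factor $\rho_{\alpha;\ell}$ acting on a distinct factor Hilbert space $\mathcal{H}_{S;\ell}$ in the tensor product $\bigotimes_{\ell=1}^{n} \mathcal{H}_{S;\ell}$. One can pass easily back and forth from this notation to one in which the system matrix indices are displayed explicitly. Let us consider next the result of contracting any row index $i_l$ with any column index $j_k$. There are two basic cases: (i) one can contract a row index $i_l$ with its corresponding column index $j_l$, and (ii) one can contract a row index $i_l$ with a column index $j_k$ with $k \neq l$. Since the density tensor is symmetric in its index pairs, it suffices to consider only one example of each case, since all others can be obtained by permutation. For the contraction of $i_1$ with $j_1$ we find
$$\delta_{i_1j_1}\rho^{(n)}_{i_1j_1,i_2j_2,\ldots,i_nj_n} = E[(\mathrm{Tr}\,\rho_\alpha)\rho_{\alpha;i_2j_2}\cdots\rho_{\alpha;i_nj_n}] = E[\rho_{\alpha;i_2j_2}\cdots\rho_{\alpha;i_nj_n}] = \rho^{(n-1)}_{i_2j_2,\ldots,i_nj_n} , \tag{6a}$$
where we have used the unit trace condition of Eq. (3c). For the contraction of $j_1$ with $i_2$, we find
$$\delta_{j_1i_2}\rho^{(n)}_{i_1j_1,i_2j_2,\ldots,i_nj_n} = E[(\rho_\alpha^2)_{i_1j_2}\rho_{\alpha;i_3j_3}\cdots\rho_{\alpha;i_nj_n}] = E[\rho_{\alpha;i_1j_2}\rho_{\alpha;i_3j_3}\cdots\rho_{\alpha;i_nj_n}] = \rho^{(n-1)}_{i_1j_2,i_3j_3,\ldots,i_nj_n} , \tag{6b}$$
where now we have used the idempotence condition of Eq. (3d). As an illustration of how this works when all possible index pair contractions are considered, we give the complete set of contractions reducing the second order density tensor to a first order density tensor,
$$\delta_{i_1j_1}\rho^{(2)}_{i_1j_1,i_2j_2} = \rho^{(1)}_{i_2j_2} , \quad \delta_{i_2j_2}\rho^{(2)}_{i_1j_1,i_2j_2} = \rho^{(1)}_{i_1j_1} , \quad \delta_{j_1i_2}\rho^{(2)}_{i_1j_1,i_2j_2} = \rho^{(1)}_{i_1j_2} , \quad \delta_{j_2i_1}\rho^{(2)}_{i_1j_1,i_2j_2} = \rho^{(1)}_{i_2j_1} . \tag{7a}$$
Referring to the generating function of Eq. (5d), the general descent equations can be summarized compactly by the two identities,
$$\frac{\partial G[a_{ij}]}{\partial a_{mm}} = E[(\mathrm{Tr}\,\rho_\alpha)e^{\rho_{\alpha;ij}a_{ij}}] = G[a_{ij}] , \qquad \delta_{rp}\frac{\partial^2 G[a_{ij}]}{\partial a_{mr}\partial a_{pq}} = E[\rho_{\alpha;mr}\rho_{\alpha;rq}e^{\rho_{\alpha;ij}a_{ij}}] = E[\rho_{\alpha;mq}e^{\rho_{\alpha;ij}a_{ij}}] = \frac{\partial G[a_{ij}]}{\partial a_{mq}} . \tag{7b}$$
When the density matrix $\rho$ used to define the density tensor is a mixed state density matrix, the trace descent relation of Eq. (6a) is unchanged, while the idempotence relation of Eq.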
(6b) relates the contraction of an order $(n)$ tensor to an order $(n-1)$ tensor in which one factor $\rho$ is replaced by $\rho^2$; this is not a member of the original hierarchy, but still gives a useful relation for checking calculations. To conclude this section, let us return to the variances introduced by Breuer and Petruccione. In terms of the order one and order two density tensors, we evidently have
$$\mathrm{Var}_1(R) = \rho^{(1)}_{i_1j_1}(R^2)_{j_1i_1} - \rho^{(2)}_{i_1j_1,i_2j_2}R_{j_1i_1}R_{j_2i_2} , \quad \mathrm{Var}_2(R) = \rho^{(2)}_{i_1j_1,i_2j_2}R_{j_1i_1}R_{j_2i_2} - (\rho^{(1)}_{i_1j_1}R_{j_1i_1})^2 , \quad \mathrm{Var}(R) = \rho^{(1)}_{i_1j_1}(R^2)_{j_1i_1} - (\rho^{(1)}_{i_1j_1}R_{j_1i_1})^2 , \tag{8a}$$
with $R_{ji} = \langle j|R|i\rangle$. Clearly, other statistical properties of the ensemble are readily expressed in terms of the density tensor hierarchy. For example, the ensemble average of the product of the expectations of two different operators $R$ and $S$ is given by
$$\sum_\alpha w_\alpha \langle\psi_\alpha|R|\psi_\alpha\rangle\langle\psi_\alpha|S|\psi_\alpha\rangle = \rho^{(2)}_{i_1j_1,i_2j_2}R_{j_1i_1}S_{j_2i_2} , \tag{8b}$$
which can be used, together with information obtained from $\rho^{(1)}$, to calculate the covariance and correlation of $R$ and $S$.

3. Isotropic spin-1/2 ensemble

As a simple example of the density tensor formalism, let us follow Breuer and Petruccione [1] and consider the case of an isotropic spin-1/2 ensemble. Let $\vec v$ be a vector in three dimensions, and consider the ensemble of spin-1/2 pure state density matrices
$$\rho(\vec v\,) = \frac{1}{2}(1 + \vec v\cdot\vec\sigma) , \tag{9a}$$
with $\vec\sigma = (\sigma_1, \sigma_2, \sigma_3)$ the standard Pauli matrices, and with a uniform probability distribution of $\vec v$ over the unit sphere $|\vec v\,| = 1$ specified by
$$w(\vec v\,) = \frac{1}{4\pi}\delta(|\vec v\,| - 1) . \tag{9b}$$
(Clearly, $\vec v$ has the same significance as the label $\alpha$ used in the preceding section.) Defining
$$E[P(\vec v\,)] = \int d^3v\, w(\vec v\,)P(\vec v\,) , \tag{10a}$$
a standard calculation gives
$$E[1] = 1 , \quad E[v_sv_t] = \frac{1}{3}\delta_{st} , \quad \ldots , \tag{10b}$$
with all averages of odd powers of $\vec v$ vanishing. From Eq. (9a), we have
$$\rho(\vec v\,)_{ij} = \frac{1}{2}(\delta_{ij} + v_r\sigma^r_{ij}) , \tag{11a}$$
and the general density tensor over this ensemble is defined by
$$\rho^{(n)}_{i_1j_1,\ldots,i_nj_n} = E[\rho(\vec v\,)_{i_1j_1}\cdots\rho(\vec v\,)_{i_nj_n}] . \tag{11b}$$
From Eq.
(10b), the first three tensors in this hierarchy are now easily found to be
$$\rho^{(1)}_{i_1j_1} = \frac{1}{2}\delta_{i_1j_1} , \quad \rho^{(2)}_{i_1j_1,i_2j_2} = \frac{1}{4}\Big(\delta_{i_1j_1}\delta_{i_2j_2} + \frac{1}{3}\vec\sigma_{i_1j_1}\cdot\vec\sigma_{i_2j_2}\Big) , \quad \rho^{(3)}_{i_1j_1,i_2j_2,i_3j_3} = \frac{1}{8}\Big[\delta_{i_1j_1}\delta_{i_2j_2}\delta_{i_3j_3} + \frac{1}{3}\big(\delta_{i_1j_1}\vec\sigma_{i_2j_2}\cdot\vec\sigma_{i_3j_3} + \delta_{i_2j_2}\vec\sigma_{i_1j_1}\cdot\vec\sigma_{i_3j_3} + \delta_{i_3j_3}\vec\sigma_{i_1j_1}\cdot\vec\sigma_{i_2j_2}\big)\Big] . \tag{12}$$
Using the relations $\mathrm{Tr}\,\vec\sigma = 0$ and $(\vec\sigma^{\,2})_{ij} = 3\delta_{ij}$, it is now easy to verify that the descent relations of Eqs. (6a) and (6b) are satisfied by Eq. (12). For the isotropic spin-1/2 ensemble, the generating function of Eq. (5d) becomes
$$G[a_{ij}] = E[e^{\rho(\vec v\,)_{ij}a_{ij}}] , \tag{13}$$
with $\rho(\vec v\,)_{ij}$ given by Eq. (11a). Defining the vector $\vec A$ by
$$\vec A = \frac{1}{2}\vec\sigma_{ij}a_{ij} , \tag{14a}$$
a simple calculation gives
$$G[a_{ij}] = \exp\Big(\frac{1}{2}\mathrm{Tr}\,a\Big)\frac{\sinh|\vec A\,|}{|\vec A\,|} = \exp\Big(\frac{1}{2}\mathrm{Tr}\,a\Big)\Big[1 + \frac{1}{6}\vec A^{\,2} + \frac{1}{120}(\vec A^{\,2})^2 + \ldots\Big] , \tag{14b}$$
from which one can read off the values of the low order density tensors given in Eq. (12). The verification of the descent relations of Eq. (7b) for the generating function of Eq. (14b) is given in Appendix A.

4. Itô stochastic Schrödinger equation

We consider next a state vector $|\psi\rangle$ with a time evolution described by a stochastic Schrödinger equation, which is a frequently used model approximation to open system dynamics. In this case the state vector and the corresponding pure state density matrix $\rho = |\psi\rangle\langle\psi|$ are implicit functions of the noise, which takes a different sequence of values for each history of the system. In the notation of Sec. 2, the different histories are labeled by the subscript $\alpha$, and the expectation of Eq. (4b) is an average over all possible histories. It is customary, however, in discussing stochastic Schrödinger equations to omit the subscript $\alpha$, treating the history dependence of $\rho$ as understood. So in this context, the definition of Eq. (4a) becomes
$$\rho^{(n)}_{i_1j_1,\ldots,i_nj_n} = E[\rho_{i_1j_1}\cdots\rho_{i_nj_n}] , \tag{15}$$
with $E[\ldots]$ the stochastic expectation, and the generating function $G[a_{ij}]$ takes the same form as given in Eq. (5d) but with the subscript $\alpha$ omitted.
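Before turning to the equation of motion, the isotropic spin-1/2 tensors of Sec. 3 offer a convenient test case. The sketch below is illustrative only and rests on one labeled assumption: the uniform average over the unit sphere is replaced by an equal-weight average over the six directions along the coordinate axes, which reproduces the moments of the unit vector through second order (and has vanishing odd moments), and therefore suffices for the order two tensor. The function names are hypothetical. The code compares this average with the order two tensor of the isotropic ensemble, namely one quarter of the product of Kronecker deltas plus one third of the Pauli matrix dot product, and then checks both descent contractions.

```python
# Pauli matrices sigma_1, sigma_2, sigma_3 as nested tuples.
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)
# Assumption: six axis directions stand in for the uniform sphere average.
AXES = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
IDX = range(2)


def delta(i, j):
    return 1.0 if i == j else 0.0


def rho_of_v(v):
    """Pure state density matrix (1 + v.sigma)/2 for a unit vector v."""
    return [[0.5 * (delta(i, j) + sum(v[r] * SIGMA[r][i][j] for r in range(3)))
             for j in IDX] for i in IDX]


def rho2_from_axes():
    """Order two tensor as an equal-weight average over the six axis states."""
    out = [[[[0j] * 2 for _ in IDX] for _ in IDX] for _ in IDX]
    for v in AXES:
        p = rho_of_v(v)
        for i1 in IDX:
            for j1 in IDX:
                for i2 in IDX:
                    for j2 in IDX:
                        out[i1][j1][i2][j2] += p[i1][j1] * p[i2][j2] / len(AXES)
    return out


def rho2_closed_form():
    """(1/4)[delta delta + (1/3) sigma . sigma], indexed [i1][j1][i2][j2]."""
    return [[[[0.25 * (delta(i1, j1) * delta(i2, j2)
                       + sum(SIGMA[r][i1][j1] * SIGMA[r][i2][j2]
                             for r in range(3)) / 3.0)
               for j2 in IDX] for i2 in IDX] for j1 in IDX] for i1 in IDX]


closed = rho2_closed_form()
sampled = rho2_from_axes()
print(max(abs(closed[a][b][c][d] - sampled[a][b][c][d])
          for a in IDX for b in IDX for c in IDX for d in IDX))
# Descent checks: both contractions should return delta_ij / 2.
print([[sum(closed[k][k][i][j] for k in IDX) for j in IDX] for i in IDX])
print([[sum(closed[i][k][k][j] for k in IDX) for j in IDX] for i in IDX])
```

The same six-direction average would not be adequate for tensors of order four and higher, where fourth moments of the unit vector enter.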
Our aim in this section is to derive an equation of motion for the generating function, which on expansion yields equations of motion for all density tensors $\rho^{(n)}$, taking as input the general pure state density matrix evolution constructed by Wiseman and Diósi [7], that corresponds to a given Lindblad form [8,9] for the time evolution of the reduced density matrix $\rho^{(1)} = E[\rho]$. We begin by recapitulating the results of ref [7]. The most general evolution of a density matrix $\rho$ that preserves $\mathrm{Tr}\,\rho = 1$ and obeys the complete positivity condition is the Lindblad form
$$d\rho = dt\,\mathcal{L}\rho , \tag{16a}$$
$$\mathcal{L}\rho \equiv -i[H_{\rm TOT}, \rho] + c_k\rho c_k^\dagger - \frac{1}{2}\{c_k^\dagger c_k, \rho\} , \tag{16b}$$
with $\{\,,\}$ denoting the anticommutator, and with the repeated index $k$ summed. The set of Lindblad operators $c_k$ describes the effects on the system of the reservoir or environment that is modeled by an external classical noise. Wiseman and Diósi show that the most general evolution of the pure state density matrix $\rho$ for which $E[d\rho]$ reduces to Eqs. (16a) and (16b) takes the form
$$d\rho = dt\,\mathcal{L}\rho + |d\phi\rangle\langle\psi| + |\psi\rangle\langle d\phi| . \tag{17a}$$
Here $|d\phi\rangle$ is a state vector that is a pure noise term, so that
$$E[|d\phi\rangle] = 0 , \tag{17b}$$
that is orthogonal to $|\psi\rangle$, so that
$$\langle\psi|d\phi\rangle = 0 , \tag{17c}$$
and that obeys
$$|d\phi\rangle\langle d\phi| = dt\,W . \tag{17d}$$
The operator $W$ is the Diósi transition rate operator [7] given by
$$W = \mathcal{L}\rho - \{\rho, \mathcal{L}\rho\} + \rho\,\mathrm{Tr}(\rho\mathcal{L}\rho) = (c_k - \langle c_k\rangle)\rho(c_k - \langle c_k\rangle)^\dagger , \tag{18}$$
where $\langle c_k\rangle$ is a shorthand for the quantum state expectation $\langle\psi|c_k|\psi\rangle = \mathrm{Tr}\,\rho c_k$. Although $|d\phi\rangle\langle d\phi|$ is completely fixed, Wiseman and Diósi show that $|d\phi\rangle|d\phi\rangle$ is free, with different choices for this and different phase choices for the $c_k$ corresponding to different pure state evolutions (or “unravelings”) that yield the same evolution of Eqs. (16a) and (16b) for the reduced density matrix $\rho$.
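The equality of the two expressions for the transition rate operator in Eq. (18) holds for pure states and can be checked numerically. The following is a minimal sketch under stated assumptions: the Hamiltonian, the two Lindblad operators, and the state are random matrices chosen only for illustration, and all helper names are hypothetical.

```python
import math
import random


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(s, a):
    return [[s * x for x in row] for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_matrix(dim, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
            for _ in range(dim)]


def random_pure_rho(dim, rng):
    """Projector onto a random unit vector."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    v = [x / norm for x in v]
    return [[a * b.conjugate() for b in v] for a in v]


def lindblad(rho, H, cs):
    """L rho = -i[H, rho] + c_k rho c_k^dagger - (1/2){c_k^dagger c_k, rho}."""
    out = scale(-1j, add(matmul(H, rho), scale(-1, matmul(rho, H))))
    for c in cs:
        cd = dagger(c)
        cdc = matmul(cd, c)
        out = add(out, matmul(matmul(c, rho), cd))
        out = add(out, scale(-0.5, add(matmul(cdc, rho), matmul(rho, cdc))))
    return out


def w_from_lindblad(rho, H, cs):
    """First form of Eq. (18): L rho - {rho, L rho} + rho Tr(rho L rho)."""
    L = lindblad(rho, H, cs)
    anti = add(matmul(rho, L), matmul(L, rho))
    return add(add(L, scale(-1, anti)), scale(trace(matmul(rho, L)), rho))


def w_direct(rho, cs):
    """Second form of Eq. (18): sum_k (c_k - <c_k>) rho (c_k - <c_k>)^dagger."""
    dim = len(rho)
    out = [[0j] * dim for _ in range(dim)]
    for c in cs:
        mean = trace(matmul(rho, c))  # <c_k> = Tr rho c_k
        shifted = [[c[i][j] - (mean if i == j else 0) for j in range(dim)]
                   for i in range(dim)]
        out = add(out, matmul(matmul(shifted, rho), dagger(shifted)))
    return out


rng = random.Random(0)
dim = 3
M = random_matrix(dim, rng)
H = add(M, dagger(M))                      # Hermitian Hamiltonian
cs = [random_matrix(dim, rng) for _ in range(2)]
rho = random_pure_rho(dim, rng)
W1, W2 = w_from_lindblad(rho, H, cs), w_direct(rho, cs)
print(max(abs(W1[i][j] - W2[i][j]) for i in range(dim) for j in range(dim)))
```

The printed difference should vanish to rounding error. The Hamiltonian drops out of the first form because its contribution is cancelled by the anticommutator term when the state is idempotent.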
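Wiseman and Diósi further show that $|d\phi\rangle$ can be parameterized by complex Wiener processes by writing
$$|d\phi\rangle = (c_k - \langle c_k\rangle)|\psi\rangle\, d\xi_k^* , \tag{19a}$$
with
$$E[d\xi_k] = E[d\xi_k^*] = 0 \tag{19b}$$
and with
$$d\xi_j(t)\,d\xi_k^*(t) = dt\,\delta_{jk} , \qquad d\xi_j(t)\,d\xi_k(t) = dt\,u_{jk} , \tag{19c}$$
where $u_{kj} = u_{jk}$ is a set of arbitrary complex numbers subject to the condition that the norm of the complex matrix $u \equiv [u_{jk}]$ be less than or equal to 1. (See Eqs. (4.10) and (4.11) of ref. [7].) In terms of this parameterization of $|d\phi\rangle$, the pure state evolution of Eq. (17a) takes the form
$$d\rho = dt\,\mathcal{L}\rho + (c_k - \langle c_k\rangle)\rho\, d\xi_k^* + \rho(c_k - \langle c_k\rangle)^\dagger d\xi_k , \tag{19d}$$
and the corresponding stochastic Schrödinger equation for the wave function is [7]
$$d|\psi\rangle = -iH_\psi\, dt\,|\psi\rangle + (c_k - \langle c_k\rangle)\, d\xi_k^*\,|\psi\rangle , \qquad -iH_\psi = -iH_{\rm TOT} - \frac{1}{2}\left[c_k^\dagger c_k - 2\langle c_k\rangle^* c_k + \langle c_k\rangle^*\langle c_k\rangle\right] . \tag{19e}$$
We proceed now to use the pure state evolution of Eq. (19d) to calculate the evolution equation for the generating function
$$G[a_{ij}] = E[\exp(\rho_{ij}a_{ij})] . \tag{20a}$$
To calculate the differential of Eq. (20a), we use the Itô stochastic calculus rule for the differential of a function $f(w)$ of a stochastic variable $w$,
$$df(w) = dw\,f'(w) + \frac{1}{2}(dw)^2 f''(w) . \tag{20b}$$
Applying this to Eq. (20a), we get
$$dG[a_{ij}] = E\Big[\Big(d\rho_{mr}a_{mr} + \frac{1}{2}d\rho_{mr}a_{mr}\,d\rho_{pq}a_{pq}\Big)\exp(\rho_{ij}a_{ij})\Big] . \tag{20c}$$
Substituting Eq. (19d) for $d\rho$, and using Eqs. (19a-c), together with the Itô calculus rule $E[dw\,f(w)] = 0$, we get
$$dG[a_{ij}] = dt\,E\Big[\Big(a_{mr}(\mathcal{L}\rho)_{mr} + \frac{1}{2}a_{mr}a_{pq}C_{mr,pq}\Big)\exp(\rho_{ij}a_{ij})\Big] , \tag{21a}$$
with the coefficient of the quadratic term in $a_{ij}$ given by
$$C_{mr,pq} = C_{pq,mr} = \frac{d\rho_{mr}\,d\rho_{pq}}{dt} = \langle m|(c_k - \langle c_k\rangle)\rho|r\rangle\langle p|\rho(c_k - \langle c_k\rangle)^\dagger|q\rangle + \langle m|\rho(c_k - \langle c_k\rangle)^\dagger|r\rangle\langle p|(c_k - \langle c_k\rangle)\rho|q\rangle + \langle m|(c_k - \langle c_k\rangle)\rho|r\rangle\langle p|(c_\ell - \langle c_\ell\rangle)\rho|q\rangle u^*_{k\ell} + \langle m|\rho(c_k - \langle c_k\rangle)^\dagger|r\rangle\langle p|\rho(c_\ell - \langle c_\ell\rangle)^\dagger|q\rangle u_{k\ell} . \tag{21b}$$
This expression can be rearranged by using the identity, valid for general operators $A, B$, general states $|r\rangle, |m\rangle$, and general pure state (idempotent) density matrix $\rho$,
$$\rho A|r\rangle\langle m|B\rho = \rho\,\langle m|B\rho A|r\rangle , \tag{22a}$$
giving an alternative result for $C_{mr,pq}$,
$$C_{mr,pq} = W_{mq}\rho_{pr} + W_{pr}\rho_{mq} + [(c_k - \langle c_k\rangle)\rho]_{mq}\,u^*_{k\ell}\,[(c_\ell - \langle c_\ell\rangle)\rho]_{pr} + [\rho(c_k - \langle c_k\rangle)^\dagger]_{pr}\,u_{k\ell}\,[\rho(c_\ell - \langle c_\ell\rangle)^\dagger]_{mq} , \tag{22b}$$
where we have used Eq.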
(18) defining the operator W , and where we use the subscript notation of Eq. (3b) for matrix elements, so that in general Amr = 〈m|A|r〉. From the evolution equation of Eqs. (21a,b) and (22b) for the generating function, by expansion in powers of a we can read off the evolution equation for the general density tensor of order n. Employing now the condensed notation of Eq. (5e), in which matrix indices are not indicated explicitly, we have dρ(n) =dtE[ (ρ1...ρn)ℓ(Lρ)ℓ ℓ 2 to n − 1. For the n = 2 density tensor time derivative, writing out all terms in Eq. (50a) explicitly, and using the fact that since operators labeled with subscripts 2 and 1 act on different Hilbert spaces, the order in which they are written is irrelevant, we have dρ(2)(t)/dt =i[ρ 1 (t), Sαβ(ω)A 1α(ω)A1β(ω)]ρ 2 (t) 1 (t)i[ρ 2 (t), Sαβ(ω)A 2α(ω)A2β(ω)] γαβ(ω) 1α(ω)A1β(ω), ρ 1 (t)}ρ 2 (t) γαβ(ω)ρ 1 (t) 2α(ω)A2β(ω), ρ 2 (t)} γαβ(ω)[ρ 1 (t)A 1α(ω)A2β(ω)ρ 2 (t) + A1β(ω)ρ 1 (t)ρ 2 (t)A 2α(ω)] . (D1a) Contracting the column index associated with the subscript 1 with the row index associated with the subscript 2, and dropping the subscripts since all operators now act in the same Hilbert space, we get dρ(2)(t)/dt→i[ρ(1)(t), Sαβ(ω)A α(ω)Aβ(ω)]ρ (1)(t) +ρ(1)(t)i[ρ(1)(t), Sαβ(ω)A α(ω)Aβ(ω)] γαβ(ω) {A†α(ω)Aβ(ω), ρ (1)(t)}ρ(1)(t) γαβ(ω)ρ (1)(t) {A†α(ω)Aβ(ω), ρ (1)(t)} γαβ(ω)[ρ (1)(t)A†α(ω)Aβ(ω)ρ (1)(t) + Aβ(ω)(ρ (1)(t))2A†α(ω)] =i[(ρ(1)(t))2, Sαβ(ω)A α(ω)Aβ(ω)] γαβ(ω) Aβ(ω)(ρ (1)(t))2A†α(ω)− {(ρ(1)(t))2, A†α(ω)Aβ(ω)} (D1b) which has the structure of dρ(1)(t)/dt and so verifies the 2 → 1 descent. To verify the n→ n− 1 descent we make some simplifications in notation. We omit all superscripts (1), since this leads to no ambiguities, as well as all time arguments (t) and all frequency arguments (ω). We also abbreviate Sαβ(ω)A (ω)Aℓβ(ω) , γαβ(ω)A (ω)Aℓβ(ω) . 
(D2a) Our general strategy is to split the sum ℓ=1 containing (ρ1...ρn)ℓ into ℓ=2 plus the ℓ = 1 and the ℓ = n terms, and to split the sum ℓ=1 containing (ρ1...ρn)ℓℓ+1 into ℓ=2 plus the ℓ = 1, ℓ = n− 1, and ℓ = n terms. For the part of dρ(n)/dt involving Lℓ, we have (ρ1...ρn−1)ℓρni[ρℓ, Lℓ] + (ρ2...ρn)i[ρ1, L1] + (ρ1...ρn−1)i[ρn, Ln] , (D2b) which on contracting the column index associated with the subscript n with the row index associated with the subscript 1, and relabeling all quantities that had subscript n with subscript 1, since they act now in the same Hilbert space, gives (ρ21ρ2...ρn−1)ℓi[ρℓ, Lℓ] + ρ2...ρn−1i(ρ1[ρ1, L1] + [ρ1, L1]ρ1) (ρ21ρ2...ρn−1)ℓi[ρℓ, Lℓ] + ρ2...ρn−1i[ρ 1, L1] , (D2c) which has the correct structure for the corresponding part of dρ(n−1)/dt, with ρ1 replaced by ρ21. The remainder of dρ (n)/dt is (ρ1...ρn)ℓ {Mℓ, ρℓ} − (ρ2...ρn) {M1, ρ1} − (ρ1...ρn−1) {Mn, ρn} (ρ1...ρn)ℓℓ+1ρℓA Aℓ+1βρℓ+1 + ρ3...ρnρ1A 1αA2βρ2 + ρ1...ρn−2ρn−1A n−1αAnβρn + ρ2...ρn−1ρnA nαA1βρ1 (D3a) Again, contracting the column index associated with the subscript n with the row index associated with the subscript 1, and relabeling all quantities that had subscript n with subscript 1, since they act now in the same Hilbert space, gives (ρ21...ρn−1)ℓ {Mℓ, ρℓ} − (ρ2...ρn−1) ρ1M1ρ1(∗) + {M1, ρ (ρ21...ρn−1)ℓℓ+1ρℓA Aℓ+1βρℓ+1 + (ρ3...ρn−1)ρ 1αA2βρ2 + ρ2...ρn−2ρn−1A n−1αA1βρ 1 + ρ2...ρn−1ρ1A 1αA1βρ1(∗) (D3b) which on canceling the terms marked with (∗) gives the corresponding part of dρ(n−1)/dt, with ρ1 replaced by ρ 1. This completes the verification of the n→ n− 1 descent. Appendix E: Descent equations for the Caldeira–Leggett model We verify here that Eqs. (56a) and (56b) obey the descent equations of Eq. (34). As in the preceding appendix, we simplify the notation by omitting all superscripts (1) and all time arguments (t). We first verify the n = 2 to n = 1 descent. For the n = 2 case of Eq. 
(56b), we have Dρ(2)/dt =ρ2[−2mγkBT (x 1ρ1 + ρ1x 1) + iγ(ρ1p1x1 − x1p1ρ1)] +ρ1[−2mγkBT (x 2ρ2 + ρ2x 2) + iγ(ρ2p2x2 − x2p2ρ2)] +4mγkBT (ρ1x1x2ρ2 + ρ2x2x1ρ1) + iγ[ρ1(x1p2 − p1x2)ρ2 + ρ2(x2p1 − p2x1)ρ1] . (E1A) Contracting the column index associated with the subscript 1 with the row index associated with the subscript 2, and dropping subscripts since all operators now act in the same Hilbert space, we get Dρ(2)/dt→− 2mγkBT (x 2ρ2 + ρx2ρ) + iγ(ρpxρ− xpρ2) − 2mγkBT (ρx 2ρ+ ρ2x2) + iγ(ρ2px− ρxpρ) +4mγkBT (ρx 2ρ+ xρ2x) + iγ[ρ(xp− px)ρ+ pρ2x− xρ2p] . (E1B) We see that the terms that have an operator sandwiched between two factors of ρ cancel, leaving only terms involving ρ2, which have the form of Eq. (56a) with ρ replaced by ρ2. To check the n > 2 to n−1 descent, we split the sums that occur in the same manner as in Appendix D. We thus write Eq. (56b) in the form Dρ(n)/dt = (ρ1...ρn)ℓ[−2mγkBT{x ℓ , ρℓ}+ iγ(ρℓpℓxℓ − xℓpℓρℓ)] +ρ2...ρn[−2mγkBT{x 1, ρ1}+ iγ(ρ1p1x1 − x1p1ρ1)] +ρ1...ρn−1[−2mγkBT{x n, ρn}+ iγ(ρnpnxn − xnpnρn)] (ρ1...ρn)ℓℓ+1[4mγkBTρℓxℓxℓ+1ρℓ+1 + iγρℓ(xℓpℓ+1 − pℓxℓ+1)ρℓ+1] +ρ3...ρn[4mγkBTρ1x1x2ρ2 + iγρ1(x1p2 − p1x2)ρ2] +ρ1...ρn−2[4mγkBTρn−1xn−1xnρn + iγρn−1(xn−1pn − pn−1xn)ρn] +ρ2...ρn−1[4mγkBTρnxnx1ρ1 + iγρn(xnp1 − pnx1)ρ1] . We now contract the column index associated with the subscript n with the row index associated with the subscript 1, and relabel all quantities that had subscript n with subscript 1, since they act now in the same Hilbert space. As is readily seen by inspection of Eq. (E2), this gives Eq. (56b) with n replaced by n−1 and with ρ1 replaced by ρ 1, together with terms of the wrong structure, that grouped together give (4−2−2)ρ2...ρn−1mγkBTρ1x 1ρ1 = 0 and (1 − 1)ρ2...ρn−1iγρ1(x1p1 − p1x1)ρ1 = 0, which thus vanish. This completes the verification of the descent equation for Eq. (56b). 
References
[1] Breuer H-P and Petruccione F (2002) The Theory of Open Quantum Systems (Oxford: Oxford University Press)
[2] Breuer H-P and Petruccione F (1996) Phys. Rev. A 54 1146
[3] Wiseman H M (1993) Phys. Rev. A 47 5180
[4] Mølmer K, Castin Y and Dalibard J (1993) J. Opt. Soc. Am. B 10 524
[5] Mielnik, B (1974) Commun. Math. Phys. 37 221; see especially p 240. I wish to thank Lane Hughston for bringing this reference, and ref [6] as well, to my attention.
[6] Brody, D C and Hughston, L P (1999) J. Math. Phys. 40 12, Eqs. (31) and (32); Brody, D C and Hughston, L P (1999) Proc. Roy. Soc. A 455 1683, Sec. 2(e); Brody, D C and Hughston, L P (2000) J. Math. Phys. 41, 2586, Eq. (9) and subsequent discussion.
[7] Wiseman H M and Diósi L (2001) Chem. Phys. 268 91. See also Diósi L (1986) Phys. Lett. A 114 451 for the transition rate operator.
[8] Lindblad G (1976) Commun. Math. Phys. 48 119
[9] Gorini V, Kossakowski A and Sudarshan E C G (1976) J. Math. Phys. 17 821
[10] Schack R and Brun T A (1997) Comp. Phys. Commun. 102 210
[11] Gallis M R and Fleming G N (1990) Phys. Rev. A 42 38
[12] Diósi L (1995) Europhys. Lett. 30 63; Dodd P J and Halliwell J J (2003) Phys. Rev. D 67 105018; Hornberger K and Sipe J E (2003) Phys. Rev. A 68 012105; Adler S L (2006) J. Phys. A: Math. Gen. 39 14067
[13] Hornberger K (2006) Introduction to decoherence theory, arXiv:quant-ph/0612118
[14] Caldeira A O and Leggett A J (1983) Physica A 121 587
[15] Ghirardi G C, Pearle P and Rimini A (1990) Phys. Rev. A 42 78; Hughston L P (1996) Proc. Roy. Soc. A 452 953; Adler S L and Horwitz L P (2000) J. Math. Phys. 41 2485; Adler S L, Brody D C, Brun T A and Hughston L P (2001) J. Phys. A: Math. Gen. 34 8795; Adler S L (2004) Quantum Theory as an Emergent Phenomenon (Cambridge UK: Cambridge University Press) Sec. 6.2
[16] Bassi A and Ghirardi G C Phys. Lett. A 275 373
[17] Zurek W H (1981) Phys. Rev. D 24 1516; Schlosshauer M (2004) Rev. Mod. Phys.
75 1267, p. 1280 [18] For reviews of stochastic reduction models, see Bassi A and Ghirardi G C (2003) Phys. Reports 379 257; Pearle P (1999) Collapse models, in Open Systems and Measurements in Relativistic Quantum Field Theory, Lecture Notes in Physics 526, Breuer H-P and Petruc- cione F eds. (Berlin: Springer-Verlag) ABSTRACT We introduce a density tensor hierarchy for open system dynamics, that recovers information about fluctuations lost in passing to the reduced density matrix. For the case of fluctuations arising from a classical probability distribution, the hierarchy is formed from expectations of products of pure state density matrix elements, and can be compactly summarized by a simple generating function. For the case of quantum fluctuations arising when a quantum system interacts with a quantum environment in an overall pure state, the corresponding hierarchy is defined as the environmental trace of products of system matrix elements of the full density matrix. Only the lowest member of the quantum noise hierarchy is directly experimentally measurable. The unit trace and idempotence properties of the pure state density matrix imply descent relations for the tensor hierarchies, that relate the order $n$ tensor, under contraction of appropriate pairs of tensor indices, to the order $n-1$ tensor. As examples to illustrate the classical probability distribution formalism, we consider a quantum system evolving by It\^o stochastic and by jump process Schr\"odinger equations. As examples to illustrate the corresponding trace formalism in the quantum fluctuation case, we consider collisional Brownian motion of an infinite mass Brownian particle, and the weak coupling Born-Markov master equation. In different specializations, the latter gives the hierarchies generalizing the quantum optical master equation and the Caldeira--Leggett master equation. 
As a further application of the density tensor, we contrast stochastic Schrödinger equations that reduce and that do not reduce the state vector, and discuss why a quantum system coupled to a quantum environment behaves like the latter. <|endoftext|><|startoftext|> Scalar self-force on eccentric geodesics in Schwarzschild spacetime: A time-domain computation

Roland Haas
Department of Physics, University of Guelph, Guelph, Ontario, Canada N1G 2W1
(Dated: April 3, 2007)

We calculate the self-force acting on a particle with scalar charge moving on a generic geodesic around a Schwarzschild black hole. This calculation requires an accurate computation of the retarded scalar field produced by the moving charge; this is done numerically with the help of a fourth-order convergent finite-difference scheme formulated in the time domain. The calculation also requires a regularization procedure, because the retarded field is singular on the particle's world line; this is handled mode-by-mode via the mode-sum regularization scheme first introduced by Barack and Ori. This paper presents the numerical method, various numerical tests, and a sample of results for mildly eccentric orbits as well as "zoom-whirl" orbits.

PACS numbers: 04.25.-g, 04.40.-b, 41.60.-m, 45.50.-j, 02.60.Cb, 02.70.Bf

I. INTRODUCTION

The inspiral and capture of solar-mass compact objects by supermassive black holes is one of the most promising and interesting sources of gravitational radiation to be detected by the future space-based gravitational-wave antenna LISA [1]. For these extreme mass-ratio inspirals, one can treat the compact object as a point mass and describe its influence on the spacetime perturbatively. Going beyond the test mass limit, its motion is no longer along a geodesic of the unperturbed spacetime of the central black hole; it is a geodesic of the perturbed spacetime created by the presence of the moving body.
When viewed from the unperturbed spacetime, the small body is said to move under the influence of its gravitational self-force. The self-force induces radiative losses of energy and angular momentum, which will eventually drive the object into the black hole. To describe the motion of the body, including its inspiral toward the black hole, we seek to evaluate the self-force and calculate its effect on the motion. One way of doing this uses the mode-sum regularization procedure introduced by Barack and Ori [2]. (For a comprehensive introduction to the problem, see the special issue of Classical and Quantum Gravity [3].)

In this paper, in an effort to build expertise to calculate the gravitational self-force, we retreat to the technically simpler problem of a point particle of mass $m$ endowed with a scalar charge $q$ orbiting a Schwarzschild black hole of mass $M$. Following up on a previous paper [4], we implement the numerical part of the regularization procedure for generic orbits with a time-domain integration of the scalar-wave equation.

A. The problem

Our goal is to calculate the regularized self-force acting on a scalar point charge in orbit around a Schwarzschild black hole. In analogy with the gravitational case, where in a first-order (in $m/M$) perturbative calculation the particle moves on a geodesic of the background spacetime, we take the orbit of the particle to be a geodesic and calculate the self-force as a vector field on this geodesic.

We start by writing the Schwarzschild metric using the tortoise coordinate $r^* = r + 2M\ln\left(\frac{r}{2M} - 1\right)$:

$$ds^2 = f\left(-dt^2 + dr^{*2}\right) + r^2 d\Omega^2, \tag{1.1}$$

where $f = 1 - \frac{2M}{r}$, $d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ is the metric on a two-sphere, and $t$, $r$, $\theta$ and $\phi$ are the usual Schwarzschild coordinates.
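A time-domain code works on a grid in $r^*$, so it needs $r$ (and $f$) at given $r^*$. The following is a minimal sketch of one way to do this, not the routine used in the paper: the function names are hypothetical, and the inversion uses a Newton iteration on $w = \ln(r/2M - 1)$, a choice made here because it stays well behaved close to the horizon.

```python
import math

def tortoise(r, M=1.0):
    """Tortoise coordinate r* = r + 2M ln(r/2M - 1), valid for r > 2M."""
    return r + 2.0 * M * math.log(r / (2.0 * M) - 1.0)

def radius_from_tortoise(rstar, M=1.0, tol=1e-14, max_iter=100):
    """Invert r*(r); returns (r, f) with f = 1 - 2M/r.

    With y = r/2M - 1 the definition reads y + ln y = r*/2M - 1 =: c.
    In terms of w = ln y this is exp(w) + w - c = 0, which is increasing
    and convex in w, so Newton's method converges monotonically when
    started to the right of the root.
    """
    c = rstar / (2.0 * M) - 1.0
    w = c if c < 0.0 else math.log1p(c)   # both choices lie right of the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (ew + w - c) / (ew + 1.0)
        w -= step
        if abs(step) <= tol * max(1.0, abs(w)):
            break
    y = math.exp(w)                        # y = r/2M - 1 > 0 by construction
    # f is computed as y/(1+y) to avoid cancellation in 1 - 2M/r near r = 2M.
    return 2.0 * M * (1.0 + y), y / (1.0 + y)
```

For example, `radius_from_tortoise(tortoise(10.0))` recovers $r = 10M$ and $f = 0.8$ (with $M = 1$), while large negative $r^*$ gives values of $f$ that are tiny but still positive.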
Our task is to solve the scalar wave equation

$$g^{\alpha\beta}\nabla_\alpha\nabla_\beta\Phi(x) = -4\pi\mu(x), \tag{1.2}$$

$$\mu(x) = q\int_\gamma \delta_4(x, z(\tau))\,d\tau, \tag{1.3}$$

where $\nabla_\alpha$ is the covariant derivative compatible with the metric $g_{\alpha\beta}$, and $\Phi(x)$ is the scalar field created by a scalar charge $q$ which moves along a world line $\gamma: \tau \mapsto z(\tau)$ parametrized by proper time $\tau$. The source term $\mu(x)$ appearing on the right-hand side is written in terms of a scalarized four-dimensional Dirac $\delta$-function $\delta_4(x, x') := \delta(x^0 - x'^0)\,\delta(x^1 - x'^1)\,\delta(x^2 - x'^2)\,\delta(x^3 - x'^3)/\sqrt{-\det(g_{\alpha\beta})}$.

Because of the singularity in the source term, the retarded solution to Eq. (1.2) is singular on the world line, and the naive expression for the self-force,

$$F_\alpha(\tau) = q\nabla_\alpha\Phi(z(\tau)), \tag{1.4}$$

must be regularized. Following DeWitt and Brehme [5], Mino, Sasaki, and Tanaka [6], Quinn and Wald [7], and Quinn [8] carried out this regularization for the electromagnetic, scalar and gravitational radiation reaction. In later work, Detweiler and Whiting [9] introduced a very useful decomposition of the retarded solution of Eq. (1.2) in terms of a singular part $\Phi^S$ and a regular remainder $\Phi^R$:

$$\Phi = \Phi^S + \Phi^R. \tag{1.5}$$

$\Phi^R$ is regular and differentiable at the position of the particle, satisfies the homogeneous wave equation associated with Eq. (1.2), and is solely responsible for the self-force acting on the particle. $\Phi^S$, on the other hand, satisfies Eq. (1.2), is just as singular at the particle's position as the retarded solution, and produces no force on the particle. Rearranging Eq. (1.5) and differentiating once, we can write the regularized self-force as

$$F_\alpha := q\nabla_\alpha\Phi^R = q\left(\nabla_\alpha\Phi - \nabla_\alpha\Phi^S\right). \tag{1.6}$$

In a previous paper [4], we described our implementation of the regularization procedure to find a mode-sum representation of $\nabla_\alpha\Phi^S$ along a generic geodesic of the Schwarzschild spacetime.
Schematically, we introduce a tetrad $e^\alpha_{(\mu)}$ and decompose the tetrad components $\Phi_{(\mu)} := e^\alpha_{(\mu)}\nabla_\alpha\Phi$ of the field gradient in terms of ordinary scalar spherical harmonics $Y_{\ell m}$:

$$\Phi_{(\mu)}(t, r, \theta, \phi) = \sum_{\ell, m}\Phi^{\ell m}_{(\mu)}(t, r)\,Y_{\ell m}(\theta, \phi). \tag{1.7}$$

Each mode $\Phi^{\ell m}_{(\mu)}(t, r)$ is finite at the position of the particle, but their sum diverges on the world line. In [4], we derive analytic expressions for the mode-sum decomposition of $\Phi^S_{(\mu)}$,

$$\Phi^S_{(\mu)} = \sum_\ell \Phi^S_{(\mu),\ell}, \tag{1.8}$$

$$\Phi^S_{(\mu),\ell} = q\left[\left(\ell + \tfrac{1}{2}\right)A_{(\mu)} + B_{(\mu)} + \frac{C_{(\mu)}}{\ell + \tfrac{1}{2}} + \frac{D_{(\mu)}}{\left(\ell - \tfrac{1}{2}\right)\left(\ell + \tfrac{3}{2}\right)} + \cdots\right], \tag{1.9}$$

where the coefficients $A_{(\mu)}$, $B_{(\mu)}$, $C_{(\mu)}$, and $D_{(\mu)}$ are independent of $\ell$; they are listed in Appendix B for convenience. As each mode of $\Phi$ is finite, it is straightforward to compute the modes of the retarded solution using numerical methods, and we will describe how this was done in Sec. IV. We use the numerical solutions in Eq. (1.6) to calculate the regularized self-force, regularizing mode-by-mode:

$$\Phi^R_{(\mu)} = \sum_\ell\left[\Phi_{(\mu),\ell} - \Phi^S_{(\mu),\ell}\right], \tag{1.10}$$

where $\Phi_{(\mu),\ell} := \sum_m \Phi^{\ell m}_{(\mu)} Y_{\ell m}$ (no summation over $\ell$ implied).

For numerical purposes it is convenient to define $\psi_{\ell m}$ by

$$\Phi(x) = \sum_{\ell, m}\frac{1}{r}\,\psi_{\ell m}(t, r)\,Y_{\ell m}(\theta, \phi), \tag{1.11}$$

where $Y_{\ell m}$ are the usual scalar spherical harmonics. After substituting in Eq. (1.2), this yields a reduced wave equation for the multipole moments $\psi_{\ell m}$:

$$-\partial_t^2\psi_{\ell m} + \partial_{r^*}^2\psi_{\ell m} - V_\ell\,\psi_{\ell m} = -4\pi q\,\frac{f}{rE}\,\bar{Y}_{\ell m}(\pi/2, \phi_0)\,\delta(r^* - r^*_0), \tag{1.12}$$

where

$$V_\ell = f\left[\frac{2M}{r^3} + \frac{\ell(\ell+1)}{r^2}\right]. \tag{1.13}$$

An overbar denotes complex conjugation, $E = -u_t$ is the particle's conserved energy per unit mass, and $u^\alpha = dz^\alpha/d\tau$ is its four-velocity. Quantities bearing a subscript "0" are evaluated at the particle's position; they are functions of $\tau$ that are obtained by solving the geodesic equation

$$u^\beta\nabla_\beta u^\alpha = 0 \tag{1.14}$$

in the background spacetime. Without loss of generality, we have confined the motion of the particle to the equatorial plane $\theta = \pi/2$.

Once we have numerically solved Eq. (1.12), we extract numerical estimates for $\psi_{\ell m}$, $\partial_t\psi_{\ell m}$ and $\partial_{r^*}\psi_{\ell m}$, which can then be used to find $\Phi_{\ell m}$, $\partial_t\Phi_{\ell m}$ and $\partial_r\Phi_{\ell m}$. These, together with the translation table displayed in Eqs. (1.23)–(1.26) of [4] and reproduced in Appendix A, allow us to find the tetrad components $\Phi_{(\mu)\ell m}$ with respect to the tetrad defined by Eqs. (1.18)–(1.21) of [4]. Eventually we regularize the multipole coefficients

$$\Phi_{(\mu)\ell} = \sum_m \Phi_{(\mu)\ell m}(t_0, r_0)\,Y_{\ell m}(\pi/2, \phi_0) \tag{1.15}$$

using Eq. (1.10); this involves the regularization parameters listed in Eqs. (1.30)–(1.45) of [4], which are reproduced in Appendix B.

B. Organization of this paper

In Sec. II we introduce the main ideas behind the discretization scheme used in the numerical simulation. Sec. III describes the choices we make in order to handle the problems of specifying initial data and proper boundary conditions. Sec. IV provides details on the concrete implementation of the ideas put forth in Secs. II and III. In Sec. V we describe the tests we performed in order to validate our implementation of the numerical method. Finally, Sec. VI presents sample results for a small number of representative simulations.

C. Future work

This work, which deals with a scalar charge moving in the Schwarzschild spacetime, is not intended to produce physically or astrophysically interesting results. Instead, its goal is to help us evaluate the merits of several strategies that could be used to tackle the more interesting (and difficult) problems of electromagnetism and gravity.

One future project we are currently exploring is to apply the formalism developed so far to the electromagnetic self-force acting on an electric charge. Beyond the technical complication of having to deal with a vector field instead of a single scalar quantity, we are also faced with the reality of having to impose a gauge (in our case: the Lorenz gauge) and to eliminate (or at least control) gauge violations in the numerical simulation. The first step, namely, the calculation of the regularization parameters $A_{(\mu)}$, $B_{(\mu)}$, $C_{(\mu)}$, and $D_{(\mu)}$ for the self-force, is currently underway.
Also underway is the calculation of the regularization parameters for the gravitational self-force.

Another project is the implementation of a scheme to use the calculated self-force to update the orbital parameters of a particle on its inspiral toward the black hole. The standard proposed approach to this problem in the past has been to calculate the self-force on a set of geodesics which are momentarily tangent to the particle's trajectory. The self-force calculated in this way is then used to update the orbital elements. This "after the fact" calculation of the motion requires one to build (in advance) a large database of self-force values for the anticipated set of orbital parameters that the particle's trajectory will assume during its inspiral. Alternatively, and conceptually more simply, the self-force could be calculated self-consistently along the real, accelerated trajectory. Such an approach requires changes in the expressions of the regularization parameters, which so far have been derived only for geodesic orbits. We are currently investigating the merits of such an approach.

II. NUMERICAL METHOD

In this section we describe the algorithm used to integrate the reduced wave equation [Eq. (1.12)] numerically. For the most part we use the fourth-order algorithm introduced by Lousto [10], with some modifications to suit our needs. We choose to implement a fourth-order convergent code because second-order convergence for the potential $\Phi$, while much easier to achieve, would guarantee only first-order convergence for $\nabla_\alpha\Phi$, the quantity in which we are ultimately interested. With a fourth-order convergent code we can expect to achieve third-order convergence for $\nabla_\alpha\Phi$, which is required for an accurate estimation of the self-force. Numerical experiments, however, show that in practice we do achieve fourth-order convergence for the derivatives of $\Phi$, a fortunate outcome that we exploit but cannot explain.
From now on, we will suppress the subscripts $\ell$ and $m$ on $V_\ell$ and $\psi_{\ell m}$ for convenience of notation. The wave equation consists of three parts: the wave-operator term $(\partial_{r^*}^2 - \partial_t^2)\psi$ and the potential term $V\psi$ on the left-hand side, and the source term on the right-hand side of the equation. Of these, the wave operator turns out to be easiest to handle, and the source term does not create a substantial difficulty. The term involving the potential $V$ turns out to be the most difficult one to handle.

Following Lousto we introduce a staggered grid with step sizes $\Delta t = \tfrac{1}{2}\Delta r^* \equiv h$, which follows the characteristic lines of the wave operator in Schwarzschild spacetime; see Fig. 1 for a sketch of a typical grid cell. The basic idea behind the method is to integrate the wave equation over a unit cell of the grid, which nicely deals with the Dirac-$\delta$ source term on the right-hand side. To this end, we introduce the Eddington-Finkelstein null coordinates $v = t + r^*$ and $u = t - r^*$ and use them as integration variables.

A. Differential operator

Rewriting the wave operator in terms of $u$ and $v$, we find $-\partial_t^2 + \partial_{r^*}^2 = -4\partial_u\partial_v$, which allows us to evaluate the integral involving the wave operator exactly. We find

$$\iint_{\text{cell}} -4\,\partial_u\partial_v\psi\; du\, dv = -4\left[\psi(t + h, r^*) + \psi(t - h, r^*) - \psi(t, r^* - h) - \psi(t, r^* + h)\right]. \tag{2.1}$$

B. Source term

If we integrate over a cell traversed by the particle, then the source term on the right-hand side of the equation will have a non-zero contribution. Writing the source term as $G(t, r^*)\,\delta(r^* - r^*_0(t))$ with

$$G(t, r^*) = -4\pi q\,\frac{f}{rE}\,\bar{Y}_{\ell m}(\pi/2, \phi_0), \tag{2.2}$$

we find

$$\iint_{\text{cell}} G\,\delta(r^* - r^*_0(t))\; du\, dv = -\frac{8\pi q}{E}\int_{t_1}^{t_2}\frac{f_0(t)}{r_0(t)}\,\bar{Y}_{\ell m}(\pi/2, \phi_0(t))\; dt, \tag{2.3}$$

where $t_1$ and $t_2$ are the times at which the particle enters and leaves the cell, respectively.
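The exactness of the wave-operator integral over a cell can be checked numerically. The sketch below is illustrative only: the polynomial test field and all function names are assumptions made here, and the comparison uses NumPy's Gauss-Legendre nodes, which integrate this particular integrand exactly up to rounding.

```python
import numpy as np

def cell_integral_of_wave_operator(d2psi_dudv, u0, v0, h, order=4):
    """Gauss-Legendre estimate of the integral of -4 d_u d_v psi over the
    cell [u0 - h, u0 + h] x [v0 - h, v0 + h] in null coordinates."""
    x, w = np.polynomial.legendre.leggauss(order)
    U, V = np.meshgrid(u0 + h * x, v0 + h * x, indexing="ij")
    weights = np.outer(w, w) * h * h          # Jacobian of the map to [-1, 1]^2
    return -4.0 * np.sum(weights * d2psi_dudv(U, V))

def corner_formula(psi, t0, r0, h):
    """Four-corner expression for the same integral, psi given in (t, r*)."""
    return -4.0 * (psi(t0 + h, r0) + psi(t0 - h, r0)
                   - psi(t0, r0 - h) - psi(t0, r0 + h))

# Hypothetical smooth test field, psi = u^2 v^3 with u = t - r*, v = t + r*.
def psi_uv(u, v):
    return u**2 * v**3

def d2psi_dudv(u, v):
    return 6.0 * u * v**2

def psi_tr(t, r):
    return psi_uv(t - r, t + r)

t0, r0, h = 1.3, 0.4, 0.25
quadrature = cell_integral_of_wave_operator(d2psi_dudv, t0 - r0, t0 + r0, h)
corners = corner_formula(psi_tr, t0, r0, h)
```

The two numbers `quadrature` and `corners` agree to rounding error, which is the content of the exact corner formula: only the field values at the four tips of the cell enter.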
While we do not have an analytic expression for the trajectory of the particle (except when the particle follows a circular orbit), we can numerically integrate the first-order ordinary differential equations that govern the particle's motion to a precision that is much higher than that of the partial differential equation governing $\psi$. In this sense we treat the integral over the source term as exact. To evaluate the integral we adopt a four-point Gauss-Legendre scheme, which has an error of order $h^8$.

C. Potential term

The most problematic term, from the point of view of implementing an approximation of sufficiently high order in $h$, turns out to be the term $V\psi$ in Eq. (1.12). Since this term does not contain a $\delta$-function, we have to approximate the double integral

$$\iint_{\text{cell}} V\psi\; du\, dv \tag{2.4}$$

up to terms of order $h^6$ for a generic cell in order to achieve an overall $O(h^4)$ convergence of the scheme. Here we have to treat cells traversed by the particle ("sourced" cells) differently from the generic ("vacuum") cells. While much of the algorithm can be transferred from the vacuum cells to the sourced cells, some modifications are required. We will describe each case separately in the following subsections.

FIG. 1: Points used to calculate the integral over the potential term for vacuum cells. Grid points are indicated by blue circles while red cross-hairs indicate points in between two grid points. We calculate field values at points that do not lie on the grid by employing the second-order algorithm described in [10].

1. Vacuum case

To implement Lousto's algorithm to evolve the field across the vacuum cells, we use a double Simpson rule to compute the integral Eq. (2.4). We introduce the notation

$$g(t, r^*) = V(r^*)\,\psi(t, r^*) \tag{2.5}$$

and label our points in the same manner (see Fig. 1) as in [10]:

$$\iint_{\text{cell}} g\; du\, dv = \left(\frac{h}{3}\right)^2\left[g_1 + g_2 + g_3 + g_4 + 4(g_{12} + g_{24} + g_{34} + g_{13}) + 16 g_0\right] + O(h^6). \tag{2.6}$$

Here, for example, $g_1$ is the value of $g$ at the grid point labeled 1, and $g_{12}$ is the value of $g$ at the off-grid point labeled 12, etc. Deviating from Lousto's algorithm, we choose to calculate $g_0$ using an expression different from that derived in [10]. Unlike Lousto's approach, our expression exclusively involves points that are within the past light cone of the current cell. We find

$$g_0 = \frac{1}{16}\left[8V_4\psi_4 + 8V_1\psi_1 + 8V_2\psi_2 - 4V_6\psi_6 - 4V_5\psi_5 + V_{10}\psi_{10} + V_7\psi_7 - V_9\psi_9 - V_8\psi_8\right] + O(h^4). \tag{2.7}$$

In order to evaluate the term in parentheses in Eq. (2.6), we again use a variant of the equations given in [10]. Lousto's equations (33) and (34),

$$g_{13} + g_{12} = V(r^*_0 - h/2)\,(\psi_1 + \psi_0)\left[1 - \frac{1}{2}\left(\frac{h}{2}\right)^2 V(r^*_0 - h/2)\right] + O(h^4), \tag{2.8}$$

$$g_{24} + g_{34} = V(r^*_0 + h/2)\,(\psi_0 + \psi_4)\left[1 - \frac{1}{2}\left(\frac{h}{2}\right)^2 V(r^*_0 + h/2)\right] + O(h^4), \tag{2.9}$$

contain isolated occurrences of $\psi_0$, the value of the field at the central point. Since Eq. (2.7) only allows us to find $g_0 = V_0\psi_0$, finding $\psi_0$ would involve a division by $V_0$, which will be numerically unstable very close to the event horizon where $V_0 \approx 0$. Instead we choose to express the potential term appearing in the square brackets as a Taylor series around $r^*_0$. This allows us to eliminate the isolated occurrences of $\psi_0$, and we find

$$\begin{aligned} g_{13} + g_{12} + g_{24} + g_{34} ={}& 2V(r^*_0)\,\psi_0\left[1 - \frac{1}{2}\left(\frac{h}{2}\right)^2 V(r^*_0)\right] \\ &+ V(r^*_0 - h/2)\,\psi_1\left[1 - \frac{1}{2}\left(\frac{h}{2}\right)^2 V(r^*_0 - h/2)\right] \\ &+ V(r^*_0 + h/2)\,\psi_4\left[1 - \frac{1}{2}\left(\frac{h}{2}\right)^2 V(r^*_0 + h/2)\right] \\ &+ \frac{1}{2}\left[V(r^*_0 - h/2) - 2V(r^*_0) + V(r^*_0 + h/2)\right](\psi_1 + \psi_4) + O(h^4). \end{aligned} \tag{2.10}$$

Because of the $(h/3)^2$ factor in Eq. (2.6), this allows us to reach the required $O(h^6)$ convergence for a generic vacuum cell. Given that there are of order $N = 1/h^2$ such cells, this yields the desired overall $O(h^4)$ convergence of the full algorithm at the end of the $N$ steps required to finish the simulation.

2. Sourced cells

For vacuum cells, the algorithm described above is the complete algorithm used to evolve the field forward in time. For cells traversed by the particle, however, we have to reconsider the assumptions used in deriving Eqs. (2.7) and (2.10). When deriving Eq. (2.10) we have employed the second-order evolution algorithm (see [10]), in which the single step equation

$$\psi_3 = -\psi_2 + (\psi_1 + \psi_4)\left[1 - \frac{1}{2}h^2 V_0\right] \tag{2.11}$$

is accurate only to $O(h^3)$ for cells traversed by the particle. For these cells, therefore, the error term in Eq. (2.10) is $O(h^3)$ instead of $O(h^4)$. As there are of order $N' = 1/h$ cells that are traversed by the particle in a simulation run, the overall error, after including the $(h/3)^2$ factor in Eq. (2.6), is of order $h^4$. We can therefore afford this reduction of the convergence order in Eq. (2.10).

FIG. 2: Cells affected by the passage of the particle, showing the reduced order of the single step equation.

Equation (2.7), however, is accurate only to $O(h)$ for cells traversed by the particle. Again taking the $(h/3)^2$ factor into account, this renders the overall algorithm $O(h^2)$. Figure 2 shows the cells affected by the particle's traversal and the reduced order of the single step equation for each cell. Cells whose convergence order is $O(h^5)$ or higher do not need modifications, since there is only a number $N' = 1/h$ of such cells in the simulation. We are therefore concerned about cells neighboring the particle's trajectory and those traversed by the particle.

a. Cells neighboring the particle

These cells are not traversed by the particle, but the particle might have traversed cells in their past light-cone, which are used in the calculation of $g_0$ in Eq. (2.7). For these cells, we use a one-dimensional Taylor expansion of $g(t, r^*)$ within the current time-slice $t = t_0$,

$$\begin{aligned} g_0 ={}& \frac{1}{16}\left[5V(r^*_0 - h)\,\psi(t_0, r^*_0 - h) + 15V(r^*_0 - 3h)\,\psi(t_0, r^*_0 - 3h)\right. \\ &\left. - 5V(r^*_0 - 5h)\,\psi(t_0, r^*_0 - 5h) + V(r^*_0 - 7h)\,\psi(t_0, r^*_0 - 7h)\right] + O(h^4) \end{aligned} \tag{2.12}$$

for the cell on the left-hand side, and

$$\begin{aligned} g_0 ={}& \frac{1}{16}\left[5V(r^*_0 + h)\,\psi(t_0, r^*_0 + h) + 15V(r^*_0 + 3h)\,\psi(t_0, r^*_0 + 3h)\right. \\ &\left. - 5V(r^*_0 + 5h)\,\psi(t_0, r^*_0 + 5h) + V(r^*_0 + 7h)\,\psi(t_0, r^*_0 + 7h)\right] + O(h^4) \end{aligned} \tag{2.13}$$

for the cell on the right-hand side, where $(t_0, r^*_0)$ is the center of the cell traversed by the particle.
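The integer coefficients 5, 15, −5 and 1 in these one-sided expansions can be reproduced as cubic Lagrange weights. The sketch below rests on an assumption not spelled out in the text: that the center of the neighboring cell lies a distance $2h$ from $r^*_0$ on the slice $t = t_0$, so that the four grid points at offsets $\mp h, \mp 3h, \mp 5h, \mp 7h$ all lie on the far side of the particle. The helper name is hypothetical.

```python
from fractions import Fraction

def lagrange_weights(nodes, x):
    """Weights w_i with p(x) = sum_i w_i * p(nodes[i]) for every polynomial p
    of degree below len(nodes); exact rational arithmetic."""
    weights = []
    for i, xi in enumerate(nodes):
        w = Fraction(1)
        for j, xj in enumerate(nodes):
            if j != i:
                w *= Fraction(x - xj, xi - xj)
        weights.append(w)
    return weights

# Offsets from r*_0 in units of h. The evaluation points -2 and +2 are the
# assumed centers of the left and right neighboring cells.
left_weights = lagrange_weights([-1, -3, -5, -7], -2)
right_weights = lagrange_weights([1, 3, 5, 7], 2)
```

Under this assumption both lists come out as 5/16, 15/16, −5/16 and 1/16, so the stencil is a cubic interpolation of $g = V\psi$ that never reaches across the particle's position.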
Both of these are more accurate than is strictly necessary; we would need error terms of order $h^3$ to achieve the desired overall $O(h^4)$ convergence of the algorithm. Keeping the extra terms, however, improves the numerical convergence slightly.

FIG. 3: Typical cell traversal of the particle. We split the domain into sub-parts indicated by the dotted line based on the time the particle enters (at $t_1$) and leaves (at $t_2$) the cell. The integral over each sub-part is evaluated using an iterated two-by-two point Gauss-Legendre rule.

b. Cell traversed by the particle

We choose not to implement a fully explicit algorithm to handle cells traversed by the particle, because this would increase the complexity of the algorithm by a significant factor. Instead we use an iterative approach to evolve the field using the integrated wave equation

$$-4(\psi_3 + \psi_2 - \psi_1 - \psi_4) - \iint_{\text{cell}} V\psi\; du\, dv = -\frac{8\pi q}{E}\int_{t_1}^{t_2}\frac{f_0(t)}{r_0(t)}\,\bar{Y}_{\ell m}(\pi/2, \phi_0(t))\; dt. \tag{2.14}$$

In this equation the integral involving the source term can be evaluated to any desired accuracy at the beginning of the iteration, because the motion of the particle is determined by a simple system of ordinary differential equations, which are easily integrated with reliable numerical methods. It remains to evaluate the integral over the potential term, which we do iteratively. Schematically the method works as follows:

• Make an initial guess for $\psi_3$ using the second-order algorithm. This guess is correct up to terms of $O(h^3)$.
• Match a second-order piecewise interpolation polynomial to the six points that make up the past light-cone of the future grid point, including the future point itself.
• Use this approximation for $\psi$ to numerically calculate $\iint_{\text{cell}} V\psi\; du\, dv$, using two-by-two point Gauss-Legendre rules for the six sub-parts indicated in Fig. 3.
• Update the future value of the field and repeat the process until the iteration has converged to a required degree of accuracy.
FIG. 4: Numerical domain evolved during the simulation. We impose an inner boundary condition close to the black hole where we can implement it easily to the accuracy of the underlying floating point format. Far away from the black hole, we evolve the full domain of dependence of the initial data domain without imposing boundary conditions.

III. INITIAL VALUES AND BOUNDARY CONDITIONS

As is typical for numerical simulations, we have to pay careful attention to specifying initial data and appropriate boundary conditions. These aspects of the numerical method are highly non-trivial problems in full numerical relativity, but they can be solved or circumvented with moderate effort in the present work.

A. Initial data

In this work we use a characteristic grid consisting of points lying on characteristic lines of the wave operator to evolve $\psi$ forward in time. As such, we need to specify characteristic initial data on the lines $u = u_0$ and $v = v_0$ shown in Fig. 4. We choose not to worry about specifying "correct" initial data, but instead arbitrarily choose $\psi$ to vanish on $u = u_0$ and $v = v_0$:

$$\psi(u = u_0, v) = \psi(u, v = v_0) = 0. \tag{3.1}$$

This is equivalent to adding spurious initial waves in the form of a homogeneous solution of Eq. (1.12) to the correct solution. This produces an initial wave burst that moves away from the particle with the speed of light, and quickly leaves the numerical domain. Any remaining tails of the spurious initial data decay as $t^{-(2\ell+2)}$, as shown in [11], and become negligible after a short time. We conclude that the influence of the initial-wave content on the self-force becomes negligible after a time of the order of the light-crossing time of the particle's orbit.

B. Boundary conditions

On the analytical side we would like to impose ingoing boundary conditions at the event horizon $r^* \to -\infty$ and outgoing boundary conditions at spatial infinity $r^* \to \infty$,

$$\lim_{r^* \to -\infty}\partial_u\psi = 0, \qquad \lim_{r^* \to \infty}\partial_v\psi = 0. \tag{3.2}$$

Because of the finite resources available to a computer we can only simulate a finite region of the spacetime, and are faced with the reality of implementing boundary conditions at finite values of $r^*$. Two solutions to this problem present themselves:

1. Choose the numerical domain to be the domain of dependence of the initial data surface. Since the effect of the boundary condition can only propagate forward in time with at most the speed of light, this effectively hides any influence of the boundary. This is what we choose to do in order to deal with the outer boundary condition.

2. Implement boundary conditions sufficiently "far out" so that numerically there is no difference between imposing the boundary condition there or at infinity. Since the boundary conditions depend on the vanishing of the potential $V(r)$ appearing in the wave equation, this will happen once $1 - 2M/r \approx 0$. Near the horizon $r \approx 2M(1 + \exp(r^*/2M))$, so this will happen, to numerical accuracy, for modestly large (negative) values of $r^* \approx -73M$. We choose to implement the ingoing waves condition $\partial_u\psi_{\ell m} = 0$ there.

IV. IMPLEMENTATION

Making more precise the ideas developed in the preceding sections, we implement the following numerical scheme.

A. Particle motion

Following Darwin [12] we introduce the dimensionless semi-latus rectum $p$ and the eccentricity $e$ such that for a bound orbit around a Schwarzschild black hole of mass $M$,

$$r_1 = \frac{pM}{1 + e}, \qquad r_2 = \frac{pM}{1 - e} \tag{4.1}$$

are the radial positions of the periastron and apastron, respectively. Energy per unit mass and angular momentum per unit mass are then given by

$$E^2 = \frac{(p - 2 - 2e)(p - 2 + 2e)}{p\,(p - 3 - e^2)}, \qquad L^2 = \frac{p^2 M^2}{p - 3 - e^2}. \tag{4.2}$$

Together with these definitions it is useful to introduce an orbital parameter $\chi$ such that along the trajectory of the particle,

$$r(\chi) = \frac{pM}{1 + e\cos\chi}, \tag{4.3}$$

where $\chi$ is single-valued along the orbit.
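The orbital parametrization lends itself to a quick consistency check. The sketch below assumes the standard Darwin expressions for bound Schwarzschild geodesics, namely $r = pM/(1 + e\cos\chi)$, $E^2 = (p-2-2e)(p-2+2e)/[p(p-3-e^2)]$ and $L^2 = p^2M^2/(p-3-e^2)$; the function names and the validity check $p \ge 6 + 2e$ are illustrative choices made here.

```python
import math

def orbit_constants(p, e, M=1.0):
    """Return (E2, L2, r_peri, r_apo) for semi-latus rectum p and eccentricity e.

    E2 and L2 are the squared energy and angular momentum per unit mass.
    """
    if not (0.0 <= e < 1.0 and p >= 6.0 + 2.0 * e):
        raise ValueError("need 0 <= e < 1 and p >= 6 + 2e for a bound orbit")
    E2 = (p - 2.0 - 2.0 * e) * (p - 2.0 + 2.0 * e) / (p * (p - 3.0 - e * e))
    L2 = (p * M) ** 2 / (p - 3.0 - e * e)
    return E2, L2, p * M / (1.0 + e), p * M / (1.0 - e)

def radius(chi, p, e, M=1.0):
    """Orbital radius as a function of the parameter chi."""
    return p * M / (1.0 + e * math.cos(chi))

def radial_turning_residual(r, E2, L2, M=1.0):
    """E^2 - f (1 + L^2/r^2), which vanishes wherever dr/dtau = 0."""
    return E2 - (1.0 - 2.0 * M / r) * (1.0 + L2 / r**2)

E2, L2, r_peri, r_apo = orbit_constants(7.0, 0.3)
```

For $p = 7$, $e = 0.3$ (the orbit used in the convergence test of Sec. V B) the residual vanishes at both turning points, and for $p = 6$, $e = 0$ the formulas reduce to the familiar values $E^2 = 8/9$ and $L^2 = 12M^2$ of the innermost stable circular orbit.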
We can then write down first-order differential equations for $\chi(t)$ and the azimuthal angle $\phi(t)$ of the particle,

$$\frac{d\chi}{dt} = \frac{(p - 2 - 2e\cos\chi)(1 + e\cos\chi)^2}{M p^2}\sqrt{\frac{p - 6 - 2e\cos\chi}{(p - 2 - 2e)(p - 2 + 2e)}}, \tag{4.4}$$

$$\frac{d\phi}{dt} = \frac{(p - 2 - 2e\cos\chi)(1 + e\cos\chi)^2}{p^{3/2} M\sqrt{(p - 2 - 2e)(p - 2 + 2e)}}. \tag{4.5}$$

We use the embedded Runge-Kutta-Fehlberg (4, 5) algorithm provided by the GNU Scientific Library routine gsl_odeiv_step_rkf45 and an adaptive step-size control to evolve the position of the particle forward in time. Intermediate values of the particle's position are found using a Hermite interpolation of the nearest available calculated positions.

B. Initial data

We do not specify initial data. The field is set to zero on the initial characteristic slices, $u = u_0$ and $v = v_0$.

C. Boundary conditions

We adjust the outer boundary of the numerical domain at each time-step so that we cover the domain of dependence of the initial characteristic surfaces and the particle's world line. The resulting numerical domain was already shown in Fig. 4. Near the event horizon, at $r^* \approx -73M$, we implement an ingoing-wave boundary condition by imposing

$$\psi(t + h, r^*) = \psi(t, r^* - h). \tag{4.6}$$

This allows us to drastically reduce the number of cells in the numerical domain, and consequently the running time of the simulation.

D. Evolution in vacuum

Cells not traversed by the particle are evolved using Eqs. (2.1) and (2.6)–(2.10). Explicitly written out, we use

$$\begin{aligned} \psi_3 ={}& -\psi_2 + \left[1 - \frac{1}{4}\left(\frac{h}{3}\right)^2(V_0 + V_1) + \frac{1}{16}\left(\frac{h}{3}\right)^4 V_0(V_0 + V_1)\right]\psi_1 \\ &+ \left[1 - \frac{1}{4}\left(\frac{h}{3}\right)^2(V_0 + V_4) + \frac{1}{16}\left(\frac{h}{3}\right)^4 V_0(V_0 + V_4)\right]\psi_4 \\ &- \left[1 - \frac{1}{4}\left(\frac{h}{3}\right)^2 V_0\right]\left(\frac{h}{3}\right)^2(g_{12} + g_{24} + g_{34} + g_{13} + 4g_0), \end{aligned} \tag{4.7}$$

where $g_0$ is given by Eq. (2.7) and the sum $g_{12} + g_{24} + g_{34} + g_{13}$ is given by Eq. (2.10).

E. Cells next to the particle

Vacuum cells close to the current position of the particle require a different approach to calculate $g_0$, since the cells in their past light cone could have been traversed by the particle. We use Eqs. (2.12) and (2.13) to find $g_0$ in this case. Other than this modification, the same algorithm as for generic vacuum cells is used.

F. Cells traversed by the particle

We evolve cells traversed by the particle using the iterative algorithm described in Sec. II C 2. Here, in agreement with Eq. (2.14),

$$\psi_3 = -\psi_2 + \psi_1 + \psi_4 - \frac{1}{4}\iint_{\text{cell}} V\psi\; du\, dv + \frac{2\pi q}{E}\int_{t_1}^{t_2}\frac{f_0(t)}{r_0(t)}\,\bar{Y}_{\ell m}(\pi/2, \phi_0(t))\; dt, \tag{4.8}$$

where the initial guess for the iterative evolution of $\iint V\psi\; du\, dv$ is obtained using the second-order algorithm of Lousto and Price [13],

$$\psi_3 = -\psi_2 + \left[1 - \frac{h^2}{2}V_0\right]\left[\psi_1 + \psi_4\right] + \frac{2\pi q}{E}\int_{t_1}^{t_2}\frac{f_0(t)}{r_0(t)}\,\bar{Y}_{\ell m}(\pi/2, \phi_0(t))\; dt. \tag{4.9}$$

Successive iterations use a four-point Gauss-Legendre rule to evaluate the integral of $V\psi$; this requires a second-order polynomial interpolation of the current field values as described in Appendix C.

G. Extraction of the field data at the particle

In order to extract the value of the field and its first derivatives at the position of the particle, we again use a polynomial interpolation at the points surrounding the particle's position. Using a fourth-order polynomial, as described in Appendix C, we can estimate $\psi$, $\partial_t\psi$, and $\partial_{r^*}\psi$ at the position of the particle up to errors of order $h^4$. As was briefly mentioned in Sec. II, we would expect an error term of order $h^3$ for $\partial_t\psi$ and $\partial_{r^*}\psi$. The $O(h^4)$ accuracy we actually achieve by using a fourth-order (instead of a third-order) piecewise polynomial shows up clearly in a regression plot such as Fig. 7.

H. Regularization of the mode sum

We use the calculated multipole moments $\psi_{\ell m}$ to construct the multipole moments $\Phi_{\ell m}$, and first derivatives $\partial_t\Phi_{\ell m}$ and $\partial_r\Phi_{\ell m}$, of the scalar field. These, in turn, are used to calculate the tetrad components $\Phi_{(0)\ell m}$, $\Phi_{(+)\ell m}$, $\Phi_{(-)\ell m}$, and $\Phi_{(3)\ell m}$ of the field gradient according to Eqs. (1.23)–(1.26) of [4], which are reproduced in Appendix A. These multipoles then give rise to the multipole coefficients of the retarded field,

$$\Phi_{(\mu)\ell}(t, r, \theta, \phi) = \sum_m \Phi_{(\mu)\ell m}(t, r)\,Y_{\ell m}(\theta, \phi), \tag{4.10}$$

which are subjected to the regularization procedure described by Eq. (1.29) of [4],

$$\Phi^R_{(\mu)}(t, r_0, \pi/2, \phi_0) = \lim_{\Delta \to 0}\sum_\ell\left\{\Phi_{(\mu)\ell}(t, r_0 + \Delta, \pi/2, \phi_0) - q\left[(\ell + 1/2)A_{(\mu)} + B_{(\mu)} + \frac{C_{(\mu)}}{\ell + 1/2} + \frac{D_{(\mu)}}{(\ell - 1/2)(\ell + 3/2)} + \cdots\right]\right\}, \tag{4.11}$$

using the regularization parameters $A_{(\mu)}$, $B_{(\mu)}$, $C_{(\mu)}$, and $D_{(\mu)}$ tabulated in Appendix B. Finally we reconstruct the vector components of the field gradient using Eqs. (1.47)–(1.48) of [4],

$$\Phi^R_t = \sqrt{f_0}\,\Phi^R_{(0)}, \tag{4.12}$$

$$\Phi^R_r = \frac{1}{2\sqrt{f_0}}\left(\Phi^R_{(+)}e^{-i\phi_0} + \Phi^R_{(-)}e^{i\phi_0}\right), \tag{4.13}$$

$$\Phi^R_\theta = -r_0\,\Phi^R_{(3)}, \tag{4.14}$$

$$\Phi^R_\phi = -\frac{i r_0}{2}\left(\Phi^R_{(+)}e^{-i\phi_0} - \Phi^R_{(-)}e^{i\phi_0}\right), \tag{4.15}$$

and calculate the self-force

$$F_\alpha = q\,\Phi^R_\alpha. \tag{4.16}$$

We recall the discussion in Sec. I A concerning the definition of $\Phi^R$, its connection to the self-force acting on the particle, and its regularity at the particle's position.

V. NUMERICAL TESTS

In this section we present the tests we have performed to validate our numerical evolution code. First, in order to check the fourth-order convergence rate of the code, we perform regression runs with increasing resolution for both a vacuum test case, where we seeded the evolution with a Gaussian wave packet, and a case where a particle is present. As a second test, we compute the regularized self-force for several different combinations of orbital elements $p$ and $e$ and check that the multipole coefficients decay with $\ell$ as expected. This provides a very sensitive check on the overall implementation of the numerical scheme, as well as the analytical calculations that lead to the regularization parameters. Finally, we calculate the self-force for a particle on a circular orbit and show that it agrees with the results presented in [4, 14].

A. Convergence tests: Vacuum

As a first test of the validity of our numerical code we estimate the convergence order by removing the particle and performing regression runs for several resolutions. We use a Gaussian wave packet as initial data,

$$\psi(u = u_0, v) = \exp\left(-[v - v_p]^2/[2\sigma^2]\right), \tag{5.1}$$

$$\psi(u, v = v_0) = 0, \tag{5.2}$$

where $v_p = 75M$ and $\sigma = 10M$, $v_0 = -u_0 = 6M + 2M\ln 2$, and we extract the field values at $r^* = 20M$.
Several such runs were performed, with varying resolution of 2, 4, 8, 16, and 32 grid points per M. Figure 5 shows ψ(2h) − ψ(h) rescaled by appropriate powers of 2, so that in the case of fourth-order convergence the curves would lie on top of each other. As can be seen from the plots, they do, and the vacuum portion of the code is indeed fourth-order convergent.

B. Convergence tests: Particle

While the convergence test described in Sec. V A clearly shows that the desired convergence is achieved for vacuum evolution, it does not test the parts of the code that are used in the integration of the inhomogeneous wave equation. To test these we perform a second set of regression runs, this time using a non-zero charge q. We extract the field at the position of the particle, thus also testing the implementation of the extraction algorithm described in Sec. IV G. For this test we choose the ℓ = 6, m = 4 mode of the field generated by a particle on a mildly eccentric geodesic orbit with p = 7, e = 0.3. As shown in Fig. 6 the convergence is still of fourth order, but the two curves no longer lie precisely on top of each other at all times. The region before t ≈ 100M is dominated by the initial wave burst and therefore does not scale as expected, yielding two very different curves. In the region 300M ≲ t ≲ 400M the two curves lie on top of each other, as expected for a fourth-order convergent algorithm. In the region between t ≈ 200M and t ≈ 300M, however, the dashed curves

FIG. 5: Convergence test of the numerical algorithm in the vacuum case. We show differences between simulations using different step sizes h = 0.5M (ψ2), h = 0.25M (ψ4), h = 0.125M (ψ8), h = 0.0625M (ψ16), and h = 0.03125M (ψ32).
Displayed are the rescaled differences δ4−2 = ψ4 − ψ2, δ8−4 = 2^4 (ψ8 − ψ4), δ16−8 = 4^4 (ψ16 − ψ8), and δ32−16 = 8^4 (ψ32 − ψ16) for the real part of the ℓ = 2, m = 2 mode at r∗ ≈ 20M. The maximum value of the field itself is of the order of 0.1, so that the errors in the field values are roughly five orders of magnitude smaller than the field values themselves. We can see that the convergence is in fact of fourth order, as the curves lie nearly on top of each other, with only the lowest resolution curve δ4−2 deviating slightly.

have slightly smaller amplitudes than the solid one, indicating an order of convergence different from (but close to) four.

To explain this behavior we have to examine the terms that contribute significantly to the error in the simulation. The numerical error is almost completely dominated by that of the approximation of the potential term ∫∫ V ψ du dv in the integrated wave equation. For vacuum cells the error in this approximation scales as h^6, where h is the step size. For cells traversed by the particle, on the other hand, the approximation error depends also on the difference t2 − t1 of the times at which the particle enters and leaves the cell. This difference is bounded by h but does not necessarily scale as h. For example, if a particle enters a cell at its very left, then scaling h by 1/2 would not change t2 − t1 at all, thus leading to a scaling behavior that differs from expectation.

To investigate this further we conducted test runs of the simulation for a particle on a circular orbit at r = 6M. In order to observe the expected scaling behavior, we have to make sure that the particle passes through the tips of the cell it traverses. When this is the case, then t2 − t1 ≡ h and a plot similar to the one shown in Fig. 6 shows the proper scaling behavior. As a further test we artificially reduced the convergence order of the vacuum algorithm to two by implementing the second-order algorithm described in [10].
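The rescaling used in the convergence plots can be sketched as follows. This is a minimal illustration on synthetic data, assuming a pure h^4 error term; the helper names `rescaled_differences` and `observed_order`, and the numbers used, are hypothetical and are not part of the code described in this paper.

```python
import math

def rescaled_differences(psi, order):
    """Rescale differences between runs at successively doubled resolution.

    psi maps the number of grid points per M (n, so h = 1/n) to a list of field
    samples taken at common times. For consecutive resolutions n and 2n the
    difference psi_2n - psi_n is multiplied by (n / n_min)**order, so that all
    curves coincide if the scheme converges at the given order.
    """
    ns = sorted(psi)
    n_min = ns[0]
    deltas = {}
    for coarse, fine in zip(ns, ns[1:]):
        scale = (coarse / n_min) ** order
        deltas[(fine, coarse)] = [scale * (a - b) for a, b in zip(psi[fine], psi[coarse])]
    return deltas

def observed_order(psi_h, psi_h2, psi_h4):
    """Estimate the convergence order from three runs with step h, h/2, h/4."""
    return math.log2(abs(psi_h2 - psi_h) / abs(psi_h4 - psi_h2))

# Synthetic data: exact value plus a pure fourth-order error term (assumption)
exact, coeff = 0.1, 3.0
psi = {n: [exact + coeff * (1.0 / n) ** 4] for n in (2, 4, 8, 16, 32)}

deltas = rescaled_differences(psi, order=4)
order = observed_order(psi[2][0], psi[4][0], psi[8][0])
```

For real data the error also contains higher-order terms (and, for sourced cells, the t2 − t1 dependence discussed above), so the rescaled curves agree only approximately.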
By keeping the algorithm that deals with sourced cells unchanged, we reduced the relative impact on the numerical error. This, too, allows us to recover the expected (second-order) convergence. Figures 8 and 9 illustrate the effects of the measures taken to control the convergence behavior.

FIG. 6: Convergence test of the numerical algorithm in the sourced case. We show differences between simulations using different step sizes of 4 (ψ4), 8 (ψ8), 16 (ψ16), and 32 (ψ32) cells per M. Displayed are the rescaled differences δ8−4 = ψ8 − ψ4, etc. (see caption of Fig. 5 for definitions) of the field values at the position of the particle for a simulation with ℓ = 6, m = 4 and p = 7, e = 0.3. We see that the convergence is approximately fourth-order.

FIG. 7: Convergence test of the numerical algorithm in the sourced case. We show differences between ∂rΦ for simulations using different step sizes of 4 (Φr,4), 8 (Φr,8), 16 (Φr,16), and 32 (Φr,32) cells per M. Displayed are the rescaled differences δ8−4 = Φr,8 − Φr,4 etc. of the values at the position of the particle for a simulation with ℓ = 6, m = 4 and p = 7, e = 0.3. Although there is much noise caused by the piecewise polynomials used to extract the data, we can see that the convergence is approximately fourth-order.

FIG. 8: Behavior of convergence tests for a particle in circular orbit at r = 6M. We show differences between simulations of the ℓ = 2, m = 2 multipole moment using different step sizes of 2 (ψ2), 4 (ψ4), 8 (ψ8), 16 (ψ16), 32 (ψ32) and 64 (ψ64) cells per M. Displayed are the real part of the rescaled differences δ4−2 = (ψ4 − ψ2) etc.
of the field values at the position of the particle, defined as in Fig. 5. The values have been rescaled so that, for fourth-order convergence, the curves should all coincide. The upper panel corresponds to a set of simulations where the particle traverses the cells away from their tips. The curves do not coincide perfectly with each other, seemingly indicating a failure of the convergence. The lower panel was obtained in a simulation where the particle was carefully positioned so as to pass through the tips of each cell it traverses. This set of simulations passes the convergence test more convincingly.

C. High-ℓ behavior of the multipole coefficients

Inspection of Eq. (4.11) reveals that a plot of Φ(µ)ℓ as a function of ℓ (for a selected value of t) should display a linear growth in ℓ for large ℓ. Removing the A(µ) term should produce a constant curve, removing the B(µ) term (given that C(µ) = 0) should produce a curve that decays as ℓ^−2, and finally, removing the D(µ) term should produce a curve that decays as ℓ^−4. It is a powerful test of the numerical methods to check whether these expectations are borne out by the numerical data. Fig. 10 plots the remainders as obtained from our numerical simulation, demonstrating the expected behavior. It displays, on a logarithmic scale, the absolute value of ReΦR(+)ℓ, the real part of the (+) component of the self-force. The orbit is eccentric (p = 7.2, e = 0.5), and all components of the self-force require regularization. The first curve (in triangles) shows the unregularized multipole coefficients that increase linearly in ℓ, as confirmed by fitting a straight line to the data. The second curve (in squares) shows partially regularized coefficients, obtained after the removal of (ℓ + 1/2)A(µ); this clearly approaches a constant for large values of ℓ.
The curve made up of diamonds shows the behavior after removal of B(µ); because C(µ) = 0, it decays as ℓ^−2, a behavior that is confirmed by a fit to the ℓ ≥ 5 part of the curve. Finally, after removal of D(µ)/[(ℓ − 1/2)(ℓ + 3/2)] the terms of the sum decrease in magnitude as ℓ^−4 for large values of ℓ, as derived in [15]. Each one of the last two curves would result in a converging sum, but the convergence is much faster after subtracting the D(µ) terms.

FIG. 9: Behavior of convergence tests for a particle in circular orbit at r = 6M. We show differences between simulations of the ℓ = 2, m = 2 multipole moment using different step sizes of 8 (ψ8), 16 (ψ16), 32 (ψ32), and 64 (ψ64) cells per M. Displayed are the real part of the rescaled differences δ16−8 = ψ16 − ψ8 etc. of the field values at the position of the particle, defined as in Fig. 5. The values have been rescaled so that, for second-order convergence, the curves should all coincide. The upper two panels correspond to simulations where the second-order algorithm was used throughout. For the topmost one, care was taken to ensure that the particle passes through the tip of each cell it traverses, while in the middle one no such precaution was taken. Clearly the curves in the middle panel do not coincide with each other, indicating a failure of the second-order convergence of the code. The lower panel was obtained in a simulation using the mixed-order algorithm described in the text. While the curves still do not coincide precisely, the observed behavior is much closer to the expected one than for the purely second-order algorithm.
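The successive removal of the regularization terms described above can be sketched as follows. This is an illustration on synthetic multipole coefficients, assuming the structure of Eq. (4.11) with an overall factor q multiplying the regularization terms; the parameter values, the synthetic ℓ^−4 tail, and the function names are hypothetical.

```python
def regularization_stages(phi, A, B, D, C=0.0, q=1.0):
    """Return the multipole coefficients after each stage of regularization.

    phi[l] is the unregularized coefficient for multipole l. The stages remove
    q*(l + 1/2)*A, then q*(B + C/(l + 1/2)), then q*D/((l - 1/2)*(l + 3/2)).
    """
    stages = {"raw": [], "-A": [], "-A-B": [], "-A-B-D": []}
    for l, value in enumerate(phi):
        minus_a = value - q * (l + 0.5) * A
        minus_ab = minus_a - q * (B + C / (l + 0.5))
        minus_abd = minus_ab - q * D / ((l - 0.5) * (l + 1.5))
        stages["raw"].append(value)
        stages["-A"].append(minus_a)
        stages["-A-B"].append(minus_ab)
        stages["-A-B-D"].append(minus_abd)
    return stages

def tail(l, amplitude=1.0e-3):
    """Synthetic remainder with the l^-4 falloff expected after removing D."""
    return amplitude / ((2 * l - 3) * (2 * l - 1) * (2 * l + 3) * (2 * l + 5))

# Hypothetical regularization parameters and synthetic data for l = 0..14
A, B, D = 0.02, -0.01, 0.3
phi = [(l + 0.5) * A + B + D / ((l - 0.5) * (l + 1.5)) + tail(l) for l in range(15)]

stages = regularization_stages(phi, A, B, D)
regularized_sum = sum(stages["-A-B-D"])  # truncated mode sum
```

Plotting the absolute value of each stage against ℓ on a logarithmic scale would reproduce the qualitative picture described in the text: linear growth, a constant, an ℓ^−2 decay, and an ℓ^−4 decay.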
We thereby gain more than 2 orders of magnitude in the accuracy of the estimated sum.

Figure 10 provides a sensitive test of the implementation of both the numerical and analytical parts of the calculation. Small mistakes in either one will cause the difference in Eq. (4.11) to have a vastly different behavior.

FIG. 10: Multipole coefficients of the dimensionless self-force ReΦR(+) for a particle on an eccentric orbit (p = 7.2, e = 0.5). The coefficients are extracted at t = 500M along the trajectory shown in Fig. 12. The plots show several stages of the regularization procedure (curves labeled ReΦ(+), ReΦ(+)−A, ReΦ(+)−A−B, and ReΦ(+)−A−B−D), with a closer description of the curves to be found in the text.

FIG. 11: Multipole coefficients of ΦR(0) for a particle on a circular orbit. Note that ΦR(0)ℓ is linked to ΦRt via ΦRt = √f0 ΦR(0). The multipole coefficients decay exponentially with ℓ until ℓ ≈ 16, at which point numerical errors start to dominate.

D. Self-force on a circular orbit

For the case of a circular orbit, the regularization parameters A(0), B(0), and D(0) all vanish identically, so that the (0) (or alternatively the t) component of the self-force does not require regularization. Figure 11 thus shows only one curve, with the magnitude of the multipole coefficients decaying exponentially with increasing ℓ.

As a final test, in Table I we compare our result for the self-force on a particle in a circular orbit at r = 6M to those obtained in [4, 14] using a frequency-domain code. For a circular orbit, a calculation in the frequency domain

TABLE I: Results for the self-force on a scalar particle with scalar charge q on a circular orbit at r0 = 6M. The first column lists the results as calculated in this work using time-domain numerical methods, while the second and third columns list the results as calculated in [4, 14] using frequency-domain methods. For the t and φ components the number of digits is limited by numerical roundoff error.
For the r component the number of digits is limited by the truncation error of the sum of multipole coefficients.

        This work:          Previous work:            Diaz-Rivera
        time-domain         frequency-domain [4]      et al. [14]
ΦRt     3.60339 × 10^−4     3.60907254 × 10^−4
ΦRr     1.6767 × 10^−4      1.67730 × 10^−4           1.6772834 × 10^−4
ΦRφ     −5.30424 × 10^−3    −5.30423170 × 10^−3

is more efficient, and we expect the results of [4, 14] to be much more accurate than our own results. This fact is reflected in the number of regularization coefficients we can reliably extract from the numerical data, before being limited by the accuracy of the numerical method: the frequency-domain calculation found usable multipole coefficients up to ℓ = 20, whereas our data for ΦR(0)ℓ is dominated by noise by the time ℓ reaches 16. Figure 11 shows this behavior.

E. Accuracy of the numerical method

Several figures of merit can be used to estimate the accuracy of numerical values for the self-force.

An estimate for the truncation error arising from cutting short the summation in Eq. (4.11) at some ℓmax can be calculated by considering the behavior of the remaining terms for large ℓ. Detweiler et al. [15] showed that the remaining terms scale as ℓ^−4 for large ℓ. They find the functional form of the terms to be

$$\frac{E\,P_{3/2}}{(2\ell-3)(2\ell-1)(2\ell+3)(2\ell+5)}, \tag{5.3}$$

where P3/2 = 36√2. We fit a function of this form to the tail end of a plot of the multipole coefficients to find the coefficient E in Eq. (5.3). Extrapolating to ℓ → ∞ we find that the truncation error is

$$\sum_{\ell=\ell_{\max}}^{\infty}\left[\text{Eq. (5.3)}\right] \tag{5.4}$$
$$= \frac{12\sqrt{2}\,E\,\ell_{\max}}{(2\ell_{\max}+3)(2\ell_{\max}+1)(2\ell_{\max}-1)(2\ell_{\max}-3)}, \tag{5.5}$$

where ℓmax is the value at which we cut the summation short. For all but the special case of the (0) component for a circular orbit, for which all regularization parameters vanish identically, we use this approach to calculate an estimate for the truncation error.

A second source of error lies in the numerical calculation of the retarded solution to the wave equation.
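Returning briefly to the truncation error estimate, the fit of the coefficient E and the tail sum can be sketched as follows. The closed form used here, 12√2 E ℓmax / [(2ℓmax + 3)(2ℓmax + 1)(2ℓmax − 1)(2ℓmax − 3)], follows from telescoping the sum of Eq. (5.3) with P3/2 = 36√2; the single-parameter least-squares fit, the synthetic data, and all names are assumptions made for this illustration.

```python
import math

P32 = 36.0 * math.sqrt(2.0)  # P_{3/2} in Eq. (5.3)

def tail_shape(l):
    """Functional form of Eq. (5.3) with the coefficient E divided out."""
    return P32 / ((2 * l - 3) * (2 * l - 1) * (2 * l + 3) * (2 * l + 5))

def fit_E(ls, coeffs):
    """Least-squares fit of coeffs[i] ~ E * tail_shape(ls[i]) for the single unknown E."""
    num = sum(c * tail_shape(l) for l, c in zip(ls, coeffs))
    den = sum(tail_shape(l) ** 2 for l in ls)
    return num / den

def truncation_error(l_max, E):
    """Sum of Eq. (5.3) over l = l_max, l_max + 1, ... in closed form."""
    denom = (2 * l_max + 3) * (2 * l_max + 1) * (2 * l_max - 1) * (2 * l_max - 3)
    return 12.0 * math.sqrt(2.0) * E * l_max / denom

# Synthetic tail of regularized multipole coefficients (hypothetical numbers)
E_true = 2.5e-3
ls = list(range(10, 15))
coeffs = [E_true * tail_shape(l) for l in ls]

E_fit = fit_E(ls, coeffs)
closed = truncation_error(15, E_fit)
direct = sum(E_fit * tail_shape(l) for l in range(15, 20000))  # brute-force check
```

Comparing `closed` with the brute-force partial sum `direct` is a quick consistency check on the closed form before it is used to assign error bars.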
This error depends on the step size h used to evolve the field forward in time. For a numerical scheme of a given convergence order, we can estimate this discretization error by extrapolating the differences of simulations using different step sizes down to h = 0. This is what was done in the graphs shown in Sec. V B. We display results for mildly eccentric orbits. A high eccentricity causes ∂rΦ (displayed in Fig. 7) to be plagued by high-frequency noise produced by effects similar to those described in Sec. V B. This makes it impossible to reliably estimate the discretization error for these orbits. We do not expect this to be very different from the errors for mildly eccentric orbits.

Finally we compare our final results for the self-force Fα to "reference values". For circular orbits, frequency-domain calculations are much more accurate than our time-domain computations. We thus compare our results to the results obtained in [4]. Table II lists typical values for the various errors listed above.

error estimation                        mildly eccentric orbit
truncation error (M Φ(+))               ≈ 2 × 10
discretization error (M ∂rΦℓm)          ≈ 10

comparison with reference values        circular orbit
Ft                                      0.2%
Fr                                      0.04%
Fφ                                      2 × 10

TABLE II: Estimated values for the various errors in the components of the self-force as described in the text. We show the truncation and discretization errors for a mildly eccentric orbit and the total error for a circular orbit. The truncation error is calculated using a plot similar to the one shown in Fig. 16. The discretization error is estimated using a plot similar to that in Fig. 7 for the ℓ = 2, m = 2 mode, and the total error is estimated as the difference between our values and those of [4]. We use p = 7.2, e = 0.5 for the mildly eccentric orbit. Note that we use the tetrad component Φ(+) for the truncation error and the vector component ∂rΦ for the discretization error. Since both are related by the translation table, Eqs.
(A6)–(A9), we expect corresponding errors to be comparable for Φ(+) and ∂rΦ.

VI. SAMPLE RESULTS

In this section we describe some results of our numerical calculation.

A. Mildly eccentric orbit

We choose a particle on an eccentric orbit with p = 7.2, e = 0.5 which starts at r = pM/(1 − e^2), halfway between periastron and apastron. The field is evolved for 1000M with a resolution of 16 grid points per M, both in the t and r∗ directions, for ℓ = 0. Higher values of ℓ (and thus m) require a corresponding increase in the number of grid points used to achieve the same fractional accuracy.

FIG. 12: Trajectory of a particle with p = 7.2, e = 0.5. The cross-hair indicates the point where the data for Fig. 10 was extracted.

FIG. 13: Regularized dimensionless self-force M and M Fφ on a particle on an eccentric orbit with p = 7.2, e = 0.5. (Horizontal axis: time/M.)

Multipole coefficients for 0 ≤ ℓ ≤ 15 are calculated and used to reconstruct the regularized self-force Fα along the geodesic. Figure 13 shows the result of the calculation. For the choice of parameters used to calculate the force shown in Fig. 13, the error bars corresponding to the truncation error (which are already much larger than the discretization error) would be of the order of the line thickness and have not been drawn.

Already for this small eccentricity, we see that the self-force is most important when the particle is closest to the black hole (i.e. for 200M ≲ t ≲ 400M and 600M ≲ t ≲ 800M); the self-force acting on the particle is very small once the particle has moved away to r ≈ 15M.

FIG. 14: Trajectory of a particle on a zoom-whirl orbit with p = 7.8001, e = 0.9. The cross-hairs indicate the positions where the data shown in Figs. 16 and 17 was extracted.

B.
Zoom-whirl orbit

Highly eccentric orbits are of most interest as sources of gravitational radiation. For nearly parabolic orbits with e ≲ 1 and p ≳ 6 + 2e, a particle revolves around the black hole a number of times, moving on a nearly circular trajectory close to the event horizon ("whirl phase"), before moving away from the black hole ("zoom phase"). During the whirl phase the particle is in the strong field region of the black hole, emitting copious amounts of radiation. Figures 14 and 15 show the trajectory of a particle and the force on such an orbit with p = 7.8001, e = 0.9. Even more so than for the mildly eccentric orbit discussed in Sec. VI A, the self-force (and thus the amount of radiation produced) is much larger while the particle is close to the black hole than when it zooms out.

Defining energy E per unit mass and angular momentum L per unit mass in the usual way,

$$E = -\left(\frac{\partial}{\partial t}\right)^{\alpha}u_\alpha, \qquad L = \left(\frac{\partial}{\partial \phi}\right)^{\alpha}u_\alpha, \tag{6.1}$$

and following e.g. the treatment of Wald [16], Appendix C, it is easy to see that the rates of change Ė and L̇ (per unit proper time) are directly related to components of the acceleration aα (and therefore force) experienced by the particle via

$$\dot{E} = -a_t, \qquad \dot{L} = a_\phi. \tag{6.2}$$

The self-force shown in Fig. 15 therefore confirms our naïve expectation that the self-force should decrease both the energy and angular momentum of the particle as radiation is emitted.

FIG. 15: Self-force acting on a particle. Shown is the dimensionless self-force M Fr and Fφ on a zoom-whirl orbit with p = 7.8001, e = 0.9. (Horizontal axis: time/M.) The inset shows a magnified view of the self-force when the particle is about to enter the whirl phase. No error bars are shown, as the errors listed e.g. in Table II are too small to show up on the graph. Notice that the self-force is essentially zero during the zoom phase 500M ≲ t ≲
2000M and reaches a constant value very quickly after the particle enters into the whirl phase.

It is instructive to have a closer look at the force acting on the particle when it is within the zoom phase, and also when it is moving around the black hole on the nearly circular orbit of the whirl phase. In Fig. 16 and Fig. 17 we show plots of Φ(0)ℓ vs. ℓ after the removal of the A(µ), B(µ), and D(µ) terms. While the particle is still zooming in toward the black hole, Φ(0)ℓ behaves exactly as for the mildly eccentric orbit described in Sec. VI A over the full range of ℓ plotted; i.e. the magnitude of each term scales as ℓ^0, ℓ^−2 and ℓ^−4, after removal of the A(µ), B(µ), and D(µ) terms respectively. Close to the black hole, on the other hand, the particle moves along a nearly circular trajectory. If the orbit were perfectly circular for all times, i.e. ṙ ≡ 0, then the (0) component would not require regularization at all, and the multipole coefficients would decay exponentially, resulting in a straight line on the semi-logarithmic plot shown in Fig. 17. As the real orbit is not precisely circular, curves eventually deviate from a straight line. Removal of the A(µ) term is required almost immediately (beginning with ℓ ≈ 3), while the D(µ) term starts to become important only after ℓ ≈ 11. This shows that there is a smooth transition from the self-force on a circular orbit, which does not require regularization for the t and φ components, to that of a generic orbit, for which all components of the self-force require regularization.

FIG. 16: Multipole coefficients of M ReΦR(0) for a particle on a zoom-whirl orbit (p = 7.8001, e = 0.9), with curves labeled Φ(0)−A, Φ(0)−A−B, and Φ(0)−A−B−D. The coefficients are extracted at t = 2000M as the particle is about to enter the whirl phase. As ṙ is non-zero, all components of the self-force require regularization and we see that the dependence of the multipole coefficients on ℓ is as predicted by Eq. 1.9.
After the removal of the regularization parameters A(µ), B(µ), and D(µ) the remainder is proportional to ℓ0, ℓ−2 and ℓ−4 respectively. 0 2 4 6 8 10 12 14 Φ(0)-A Φ(0)-A-B Φ(0)-A-B-D FIG. 17: Multipole coefficients of ReΦR(0) for a particle on a zoom-whirl orbit (p = 7.8001, e = 0.9). The coefficients are extracted at t = 2150M while the particle is in the whirl phase. The orbit is nearly circular at this time, causing the dependence on ℓ after removal of the regularization parame- ters to approximate that of a true circular orbit. Acknowledgments We thank Eric Poisson and Eran Rosenthal for useful discussions and suggestions. This work was supported by the Natural Sciences and Engineering Council of Canada. APPENDIX A: TRANSLATION TABLES We quote the results of [4] for the translation table be- tween the modes Φℓm and the tetrad components Φ(µ)ℓm with respect to the pseudo-Cartesian basis eα(0) = , 0, 0, 0 , (A1) eα(1) = f sin θ cosφ, cos θ cosφ,− sinφ r sin θ , (A2) eα(2) = f sin θ sinφ, cos θ sinφ, r sin θ , (A3) eα(3) = f cos θ,−1 sin θ, 0 , (A4) and the complex combinations eα := eα ± ieα eα(±) = f sin θe±iφ, cos θe±iφ, ±ie±iφ r sin θ . (A5) With these, the spherical-harmonic modes Φ(µ)ℓm(t, r) are given in terms of Φℓm(t, r) by Φ(0)ℓm = Φℓm, (A6) Φ(+)ℓm =− (ℓ+m− 1)(ℓ +m) (2ℓ− 1)(2ℓ+ 1) − ℓ− 1 Φℓ−1,m−1 (ℓ−m+ 1)(ℓ −m+ 2) (2ℓ+ 1)(2ℓ+ 3) Φℓ+1,m−1, (A7) Φ(−)ℓm = (ℓ −m− 1)(ℓ−m) (2ℓ− 1)(2ℓ+ 1) − ℓ− 1 Φℓ−1,m+1 (ℓ+m+ 1)(ℓ +m+ 2) (2ℓ+ 1)(2ℓ+ 3) Φℓ+1,m+1, (A8) Φ(3)ℓm = (ℓ −m)(ℓ+m) (2ℓ− 1)(2ℓ+ 1) Φℓ−1,m (ℓ−m+ 1)(ℓ +m+ 1) (2ℓ+ 1)(2ℓ+ 3) Φℓ+1,m. (A9) APPENDIX B: REGULARIZATION PARAMETERS For completeness we list the regularization parameters as calculated in [4]. Quantities bearing a subscript “0” are evaluated at the particle’s position. A(0) = 0 + L sign(∆), (B1) A(+) = −eiφ0 0 + L sign(∆), (B2) A(3) = 0, (B3) where f0 := 1 − 2M/r0 and sign(∆) is equal to +1 if ∆ > 0 and to −1 if ∆ < 0. We have, in addition, A(−) = Ā(+), A(1) = Re[A(+)], and A(2) = Im[A(+)]. 
We also use B(0) = − Er0ṙ0√ 0 + L 2)3/2 E + Er0ṙ0 0 + L 2)3/2 B(+) = e Bc(+) − iB , (B5) Bc(+) = 0 + L 2)3/2 r20 + L 0 + L 2)3/2 f0 − 1 r20 + L K, (B6) Bs(+) = − f0)ṙ0 r20 + L E + (2− f0)ṙ0 r20 + L B(3) = 0. (B8) In addition, B(−) = B̄(+), B(1) = Re[B(+)] = cosφ0 + B sinφ0, and B(2) = Im[B(+)] = Bc(+) sinφ0 −B (+) cosφ0. Here, the rescaled elliptic integrals E and K are defined E := 2 ∫ π/2 (1− k sin2 ψ)1/2 dψ = F ; 1; k K := 2 ∫ π/2 (1− k sin2 ψ)−1/2 dψ = F ; 1; k (B10) in which k := L2/(r20 + L We also use C(µ) = 0 (B11) D(0) = − Er30(r 0 − L2)ṙ30 0 + L 2)7/2 E(r70 + 30Mr 0 − 7L2r50 + 114ML2r40 + 104ML4r20 + 36ML6)ṙ0 16r40 0 + L 2)5/2 Er30(5r 0 − 3L2)ṙ30 0 + L 2)7/2 E(r50 + 16Mr 0 − 3L2r30 + 42ML2r20 + 18ML4)ṙ0 16r20 0 + L 2)5/2 K, (B12) D(+) = e Dc(+) − iD , (B13) Dc(+) = r30(r 0 − L2)ṙ40 0 + L 2)7/2 − r0ṙ 4(r20 + L 2)3/2 (3r70 + 6Mr 0 − L2r50 + 31ML2r40 + 26ML4r20 + 9ML6)ṙ20 0 + L 2)5/2 (3r70 + 8Mr 0 + L 2r50 + 26ML 2r40 + 22ML 4r20 + 8ML 16r60(r 0 + L 2)3/2 0 + 2Mr 0 + 4ML r20 + L 0 − 3L2)ṙ40 0 + L 2)7/2 8(r20 + L 2)3/2 − (7r 0 + 12Mr 0 − L2r30 + 46ML2r20 + 18ML4)ṙ20 16r20 0 + L 2)5/2 − (7r 0 + 6Mr 0 + 6L 2r30 + 12ML 2r20 + 4ML 16r40(r 0 + L 2)3/2 r20 + L K, (B14) Ds(+) = r20(r 0 − 7L2)( f0 − 2)ṙ30 0 + L 2)5/2 − (2r 0 +Mr 0 + 5L 2r50 + 10ML 2r40 + 29ML 4r20 + 14ML 6)ṙ0 8r50L(r 0 + L 2)3/2 (r50 −Mr40 + 4L2r30 − 5ML2r20 + 2ML4)ṙ0 4r30L 0 + L 2)3/2 r20(r 0 − 3L2)( f0 − 2)ṙ30 0 + L 2)5/2 (4r50 + 2Mr 0 + 7L 2r30 + 10ML 2r20 + 14ML 4)ṙ0 16r30L(r 0 + L 2)3/2 − (2r 0 − 2Mr20 + 5L2r0 − 8ML2)ṙ0 0 + L 2)3/2 K, (B15) D(3) = 0. (B16) And finally, D(−) = D̄(+), D(1) = Re[D(+)] = cosφ0 + D sinφ0, and D(2) = Im[D(+)] = Dc(+) sinφ0 −D (+) cosφ0. APPENDIX C: PIECEWISE POLYNOMIALS In two places in the numerical simulation we introduce piecewise polynomials to approximate the scalar field ψℓm across the world line, where it is continuous but not dif- ferentiable. 
By a piecewise polynomial we mean a polynomial of the form

$$p(t,r^*) = \begin{cases}\displaystyle\sum_{n,m=0}^{N} c_{nm}\,u^n v^m & \text{if } r^*(u,v) > r^*_0,\\[2ex] \displaystyle\sum_{n,m=0}^{N} c'_{nm}\,u^n v^m & \text{if } r^*(u,v) < r^*_0,\end{cases} \tag{C1}$$

where u = t − r∗, v = t + r∗ are characteristic coordinates, r∗0 is the position of the particle at the time t(u, v), and N is the order of the polynomial, which for our purposes is N = 4 or less. The two sets of coefficients cnm and c′nm are not independent of each other, but are linked via jump conditions that can be derived from the wave equation [Eq. (1.12)]. To do so, we rewrite the wave equation in the characteristic coordinates u and v and reintroduce the integral over the world line on the right-hand side,

$$-4\,\partial_u\partial_v\psi - V\psi = \int_\gamma \hat{S}(\tau)\,\delta(u-u_p)\,\delta(v-v_p)\,d\tau, \tag{C2}$$

where Ŝ(τ) = −8πq Ȳℓm(π/2, φp(τ))/rp(τ) is the source term and quantities bearing a subscript p are evaluated on the world line at proper time τ. Here and in the following we use the notation

$$[\partial_u^n\partial_v^m\psi] = \lim_{\epsilon\to 0^+}\left[\partial_u^n\partial_v^m\psi(t_0,r^*_0+\epsilon) - \partial_u^n\partial_v^m\psi(t_0,r^*_0-\epsilon)\right]$$

to denote the jump in ∂u^n ∂v^m ψ across the world line. First, we notice that the source term does not contain any derivatives of the Dirac δ-function, causing the solution ψ to be continuous. This means that the zeroth-order jump vanishes: [ψ] = 0. Our task is then to find the remaining jump conditions at a point (t0, r∗0) for n, m ≤ 4. Alternatively, instead of crossing the world line along a line t = t0 = const we can also choose to cross along lines of u = u0 = const or v = v0 = const, noting that for a line of constant v the coordinate u runs from u0 + ǫ to u0 − ǫ to cross from the left to the right of the world line. Figure 18 provides a clearer description of the paths taken.

FIG. 18: Paths taken in the calculation of the jump conditions. (u0, v0) denotes an arbitrary but fixed point along the world line γ, with (t0, r∗0) = (u0, v0); the sketch marks the points (u0 − ǫ, v0), (u0, v0 + ǫ), (u0 + ǫ, v0), and (u0, v0 − ǫ). The wave equation is integrated along the lines of constant u or v indicated in the sketch.
Note that in order to move from the domain on the left to the domain on the right, u has to run from u0 + ǫ to u0 − ǫ. Where appropriate we label quantities connected to the domain on the left by a subscript “−” and quantities connected to the domain on the right by “+”. In order to find the jump [∂uψ] we integrate the wave equation along the line u = u0 from v0 − ǫ to v0 + ǫ ∫ v0+ǫ ∂u∂vψdv − ∫ v0+ǫ V ψdv = Ŝ(τ)δ(u0 − up) ∫ v0+ǫ δ(v − vp)dv dτ , which, after involving ∫ v0+ǫ δ(v − vp)dv = θ(vp − v0 + ǫ)θ(v0 − vp + ǫ) and δ(g(x)) = δ(x− x0)/ |g′(x0)|, yields [∂uψ] = − E − ṙ0 Ŝ(τ0), (C5) where the overdot denotes differentiation with respect to proper time τ . Similarly, after first taking a derivative of the wave equation with respect to v and integrating from u0+ ǫ to u0 − ǫ, we obtain ∫ u0−ǫ vψdu− ∫ u0−ǫ V ψdu = Ŝ(τ) ∫ u0−ǫ δ(u− up)du δ′(v0 − vp)dτ . We find E + ṙ0 E + ṙp Ŝ(τ) |τ=τ0 . (C7) Systematically repeating this procedure we find expres- sions for the jumps in all the derivatives that are purely in the u or v direction. Table III lists these results. Jump [ψ] =0 [∂uψ] =− bS(τ0), [∂vψ] = bS(τ0) bS(τ ) |τ=τ0 bS(τ ) |τ=τ0 V ξ0ξ̄ 0 [∂uψ]− bS(τ ) |τ=τ0 V ξ̄0ξ 0 [∂vψ] + bS(τ ) |τ=τ0 |τ=τ0 + 3ξ0ξ̄ 0 ∂uV + ξ 0 ∂vV [∂uψ] + bS(τ ) |τ=τ0 |τ=τ0 + 3ξ̄0ξ 0 ∂vV + ξ̄ 0 ∂uV [∂vψ]− bS(τ ) |τ=τ0 TABLE III: Jump conditions for the derivatives purely in the u or v directions. ṙ and r̈ are the particle’s radial velocity and acceleration, respectively. They are obtained from the equa- tion of motion for the particle. ξ̄ := E−ṙ and ξ := E+ṙ introduced for notational convenience. Quantities bearing a subscript p are evaluated on the particle’s world line, while quantities bearing a subscript 0 are evaluated at the parti- cle’s current position. Derivatives of V with respect to either u or v are evaluated as ∂uV = − f∂rV and ∂vV = f∂rV , respectively. conditions for derivatives involving both u and v are ob- tained directly from the wave equation [Eq. (C2)]. 
We see that

$$[\partial_u\partial_v\psi] = 0, \tag{C8}$$

and taking an additional derivative with respect to u on both sides reveals that

$$[\partial_u^2\partial_v\psi] = -\frac{1}{4}V\,[\partial_u\psi]. \tag{C9}$$

Systematically repeating this procedure we can find jump conditions for each of the mixed derivatives by evaluating

$$[\partial_u^{n+1}\partial_v^{m+1}\psi] = -\frac{1}{4}\,[\partial_u^n\partial_v^m(V\psi)], \tag{C10}$$

where n, m ≥ 0 and derivatives of V with respect to either u or v are evaluated as ∂uV = −(1/2) f ∂rV and ∂vV = (1/2) f ∂rV, respectively.

The results of Table III and Eq. (C10) allow us to express the coefficients of the left-hand polynomial in Eq. (C1) in terms of the jump conditions and the coefficients of the right-hand side:

$$c'_{nm} = c_{nm} - [\partial_u^n\partial_v^m\psi]. \tag{C11}$$

For N = 4 this leaves us with 25 unknown coefficients cnm which can be uniquely determined by demanding that the polynomial match the value of the field on the 25 grid points surrounding the particle. When we are interested in integrating the polynomial, as in the case of the potential term in the fourth-order algorithm, we do not need all these terms. Instead, in order to calculate e.g. the integral ∫∫ V ψ du dv up to terms of order h^5, as is needed to achieve overall O(h^4) convergence, it is sufficient to include only terms such that n + m ≤ 2, thus reducing the number of unknown coefficients to 6. In this case Eq. (C1) becomes

$$p(t,r^*) = \begin{cases}\displaystyle\sum_{m+n\le 2} c_{nm}\,u^n v^m & \text{if } r^*(u,v) > r^*_0,\\[2ex] \displaystyle\sum_{m+n\le 2} c'_{nm}\,u^n v^m & \text{if } r^*(u,v) < r^*_0.\end{cases} \tag{C12}$$

The six coefficients can then be determined by matching the polynomial to the field values at the six grid points which lie within the past light cone of the grid point whose field value we want to calculate.

[1] The LISA web site is located at http://lisa.jpl.nasa.gov/.
[2] L. Barack and A. Ori, Phys. Rev. D 61, 061502 (2000), gr-qc/9912010.
[3] C. Lousto, Class. Quantum Grav. 22, S543 (2005).
[4] R. Haas and E. Poisson, Phys. Rev. D 74, 044009 (pages 29) (2006), gr-qc/0605077, URL http://link.aps.org/abstract/PRD/v74/e044009.
[5] B. S. DeWitt and R. W. Brehme, Annals of Physics 9, 220 (1960).
[6] Y. Mino, M. Sasaki, and T. Tanaka, Phys. Rev.
D 55, 3457 (1997), gr-qc/9606018.
[7] T. C. Quinn and R. M. Wald, Phys. Rev. D 56, 3381 (1997), gr-qc/9610053.
[8] T. C. Quinn, Phys. Rev. D 62, 064029 (2000), gr-qc/0005030.
[9] S. Detweiler and B. F. Whiting, Phys. Rev. D 67, 024025 (2003), gr-qc/0202086.
[10] C. O. Lousto, Class. Quant. Grav. 22, S543 (2005), gr-qc/0503001.
[11] R. H. Price, Phys. Rev. D 5, 2419 (1972).
[12] C. G. Darwin, Proc. R. Soc. A 249, 180 (1959).
[13] C. O. Lousto and R. H. Price, Phys. Rev. D 56, 6439 (1997), gr-qc/9705071.
[14] L. M. Diaz-Rivera, E. Messaritaki, B. F. Whiting, and S. Detweiler, Physical Review D (Particles, Fields, Gravitation, and Cosmology) 70, 124018 (pages 14) (2004), gr-qc/0410011, URL http://link.aps.org/abstract/PRD/v70/e124018.
[15] S. Detweiler, E. Messaritaki, and B. F. Whiting, Phys. Rev. D 67, 104016 (2003), gr-qc/0205079.
[16] R. M. Wald, General relativity (University of Chicago Press, Chicago, 1984), ISBN 0226870324.

ABSTRACT We calculate the self-force acting on a particle with scalar charge moving on a generic geodesic around a Schwarzschild black hole. This calculation requires an accurate computation of the retarded scalar field produced by the moving charge; this is done numerically with the help of a fourth-order convergent finite-difference scheme formulated in the time domain. The calculation also requires a regularization procedure, because the retarded field is singular on the particle's world line; this is handled mode-by-mode via the mode-sum regularization scheme first introduced by Barack and Ori. This paper presents the numerical method, various numerical tests, and a sample of results for mildly eccentric orbits as well as ``zoom-whirl'' orbits. <|endoftext|><|startoftext|> Introduction In [6], the authors and Bruner described a proof of the following theorem, along with some additional nonimmersion results.
Theorem 1.1. ([6, 1.1]) Assume that $M$ is divisible by the smallest 2-power greater than or equal to $h$.
• If $\alpha(M) = 4h - 1$, then $P^{8M+8h+2}$ cannot be immersed in ($\not\subseteq$) $\mathbb{R}^{16M-8h+10}$.
• If $\alpha(M) = 4h - 2$, then $P^{8M+8h} \not\subseteq \mathbb{R}^{16M-8h+12}$.

Here and throughout, $\alpha(M)$ denotes the number of 1's in the binary expansion of $M$, and $P^n$ denotes real projective space.

Date: April 5, 2007. 2000 Mathematics Subject Classification. 57R42, 55N20. Key words and phrases. immersion, projective space, elliptic cohomology. We thank Steve Wilson for causing us to take a look at these matters. http://arxiv.org/abs/0704.0798v1
DONALD M. DAVIS AND MARK MAHOWALD

In [6], the theorem is followed by a comment that this is new provided $\alpha(M) \ge 6$, i.e., $h \ge 2$, and the first new result occurs for $P^{1536}$. In this note, we point out that 1.1 is valid when $h = 1$, and these results are new when $M$ is even, including new nonimmersions of $P^n$ for $n$ as small as 56. A remark in [6, p.66] that the nonimmersions when $h = 1$ were implied by earlier work of the authors was incorrect. Letting $h = 1$ in 1.1, we have the following result.

Corollary 1.2.
a. If $\alpha(M) = 3$, then $P^{8M+10} \not\subseteq \mathbb{R}^{16M+2}$.
b. If $\alpha(M) = 2$, then $P^{8M+8} \not\subseteq \mathbb{R}^{16M+4}$.

Part (a) is new when $M$ is even. It is 2 better than the previous best result, proved in [4], and the nonembedding result that it implies is also new, 1 better than the previous best, proved in [3]. In [7], a table of known nonimmersions, immersions, nonembeddings, and embeddings of $P^n$ is presented, arranged according to $n = 2^i + d$ with $0 \le d < 2^i$ and $d < 64$. Part (a) enters the table with a new result for $d = 58$, applying first to $P^{122}$. If $M$ is even, 1.2.b is new, 1 better than the previous best result, of [12], and the nonembedding result implied is also new. It enters [7] at $d = 24$ and $40$, with a new result for $P^n$ with $n$ as small as 56. The result of 1.2.b with $M = 2^i + 1$ was also proved very recently by Kitchloo and Wilson in [15].
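The numerical claims attached to Corollary 1.2 (first new cases $P^{122}$ and $P^{56}$, table offsets $d = 58$, $24$, $40$, and $n = 48$ for $M = 2^i + 1$) can be tabulated directly from the statement. The following Python sketch is only an illustration; the function names `alpha`, `corollary_1_2` and `table_offset` are ours, not notation from [6] or [7].

```python
def alpha(m: int) -> int:
    """Number of 1's in the binary expansion of m."""
    return bin(m).count("1")


def corollary_1_2(max_m: int):
    """Yield (part, M, n, target): Corollary 1.2 says P^n does not immerse in R^target."""
    for m in range(1, max_m + 1):
        if alpha(m) == 3:
            yield ("a", m, 8 * m + 10, 16 * m + 2)
        elif alpha(m) == 2:
            yield ("b", m, 8 * m + 8, 16 * m + 4)


def table_offset(n: int) -> int:
    """Return d in the decomposition n = 2^i + d with 0 <= d < 2^i."""
    return n - (1 << (n.bit_length() - 1))


# The corollary is stated to be new when M is even; list those cases for M <= 16.
even_cases = [row for row in corollary_1_2(16) if row[1] % 2 == 0]
for part, m, n, target in even_cases:
    print(f"1.2.{part}: M={m:2d}  P^{n} does not immerse in R^{target}  (d={table_offset(n)})")
```

For even $M \le 16$ this lists $M = 6, 10, 12$ under part (b) and $M = 14$ under part (a), in agreement with the offsets quoted from the table of [7].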
The case $M = 2^i + 1$ of 1.2.b, a result for $P^{2^k+16}$, is 2 better than the previous result of [4] and is also new as a nonembedding; it enters [7] at $d = 16$, and applies for $n$ as small as 48.

In Section 2, we present a self-contained proof of Corollary 1.2. The primary reason for doing this, which amounts to a reproof of part of [6, 1.1], is that the proof of the general case in [6] requires some extremely elaborate arguments and calculations. Our proof here, which is just for the case $h = 1$, is much more comprehensible.

The proof in [6] contained an oversight which we shall correct here. The argument there was that an immersion of $RP^n$ in $\mathbb{R}^{n+k}$ implies existence of an axial map $P^n \times P^m \xrightarrow{f} P^{m+k}$ for an appropriate value of $m$, and obtains a contradiction for certain $n$, $m$, and $k$ by consideration of tmf$^*(f)$. Here tmf is the spectrum of topological modular forms, which was discussed in [6]. A class $X \in$ tmf$^8(P^n)$ was described, along with $X_1 = X \times 1$ and $X_2 = 1 \times X$ in tmf$^8(P^n \times P^m)$. It was asserted that $f^*(X) = X_1 + X_2$, and a contradiction obtained by showing that, for certain values of the parameters, we might have $X^\ell = 0$ but $(X_1 + X_2)^\ell \ne 0$.

NONIMMERSIONS IMPLIED BY TMF, REVISITED

We recently realized that it is conceivable that $f^*(X)$ might contain other terms coming from tmf$^8(P^n \wedge P^m)$. In Section 3 (see Theorem 3.7) we perform a complete calculation of tmf$^*(P^\infty \times P^\infty)$ in positive gradings divisible by 8, and in Section 4 we use it to show that effectively $f^*(X) = u(X_1 + X_2)$, where $u$ is a unit in tmf$^*(P^\infty \times P^\infty)$, which enables us to retrieve all the nonimmersions of [6].

In Section 5, we compute tmf$^*(CP^\infty \times CP^\infty)$ in positive gradings. The original purpose of doing this was, prior to our obtaining the argument of Section 4, to see whether we might mimic the argument of [2] and [8] to conclude that if $f$ is an axial map, then $f^*(X)$ might necessarily equal $u(X_1 - X_2)$, where $u$ is a unit in tmf$^*(CP \times CP)$.
This approach to retrieving the nonimmersions of [6] did not yield the desired result, but the later approach given in Section 4 did. Nevertheless the nice result for tmf$^*(CP^\infty \times CP^\infty)$ obtained in Theorem 5.19 should be of independent interest.

2. Proof of Corollary 1.2

We begin by proving 1.2.a. The following standard reduction goes back at least to [14]. If $P^{8M+10} \subseteq \mathbb{R}^{16M+2}$, then $\mathrm{gd}((2^{L+3} - 8M - 11)\xi_{8M+10}) \le 8M - 8$, hence this bundle has $2^{L+3} - 16M - 3$ linearly independent sections, and thus there is an axial map

$$P^{8M+10} \times P^{2^{L+3}-16M-4} \xrightarrow{f} P^{2^{L+3}-8M-12}.$$

The bundle here is the stable normal bundle, $L$ is a sufficiently large integer, and gd refers to geometric dimension. Let $X$, $X_1$, and $X_2$ be elements of tmf$^8(-)$ described in [6] and also in Section 1. In Section 4, we will show that we may assume that $f^*(X) = X_1 + X_2$, as was done in [6], since this is true up to multiplication by a unit. Since tmf$^{2^{L+3}-8M-8}(P^{2^{L+3}-8M-12}) = 0$, we have

$$0 = f^*(0) = f^*(X^{2^L-M-1}) = (X_1 + X_2)^{2^L-M-1} \in \mathrm{tmf}^{2^{L+3}-8M-8}(P^{8M+10} \times P^{2^{L+3}-16M-4}).$$

Expanding, we obtain

$$\binom{2^L-M-1}{M+1} X_1^{M+1} X_2^{2^L-2M-2} + \binom{2^L-M-1}{M} X_1^{M} X_2^{2^L-2M-1}$$

as the only terms which are possibly nonzero. Next we note that, with all $u$'s representing odd integers,

$$\binom{2^L-M-1}{M+1} = 2^{\alpha(M)-\nu(M+1)} u_2 = 2^{3-\nu(M+1)} u_2,$$

where we have used $\alpha(M) = 3$ at the last step. Here and throughout, $\nu(2^e u) = e$. Similarly,

$$\binom{2^L-M-1}{M} = 2^{\alpha(M)} u_4 = 2^3 u_4.$$

Thus an immersion implies that in tmf$^{2^{L+3}-8M-8}(P^{8M+10} \times P^{2^{L+3}-16M-4})$, we have

$$2^{3-\nu(M+1)} u_2 X_1^{M+1} X_2^{2^L-2M-2} + 2^3 u_4 X_1^{M} X_2^{2^L-2M-1} = 0. \tag{2.1}$$

We recall [6, 2.6], which states that there is an equivalence of spectra $P_{b+8}^{k+8} \wedge \mathrm{tmf} \simeq \Sigma^8 P_b^k \wedge \mathrm{tmf}$. Combining this with duality, we obtain tmf$^{8M+8}(P^{8M+10}) \approx$ tmf$_{-1}(P_{-3}) \approx \mathbb{Z}/8$, and so $8 X_1^{M+1} X_2^{2^L-2M-2} = 0$. Here and throughout, $P_n = RP^\infty/RP^{n-1}$. Similarly tmf$^{2^{L+3}-16M-8}(P^{2^{L+3}-16M-4}) \approx$ tmf$_7(P_3) \approx \mathbb{Z}/16$, and hence $16 X_1^{M} X_2^{2^L-2M-1} = 0$. Duality also implies tmf$^{2^{L+3}-8M-8}(P^{8M+10} \times P^{2^{L+3}-16M-4}) \approx$ tmf$_{14}(P_{-3} \wedge P_3)$.
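The two 2-adic valuations of binomial coefficients used to derive (2.1) can be spot-checked numerically. The Python sketch below is an illustration only; the helper names `alpha`, `nu`, `valuations` and `check` are ours, and "$L$ sufficiently large" is taken here to mean $2M + 2 < 2^L$, which is an assumption made for the check and not a condition stated in the text.

```python
from math import comb


def alpha(m: int) -> int:
    """Number of 1's in the binary expansion of m."""
    return bin(m).count("1")


def nu(n: int) -> int:
    """Exponent of 2 in the positive integer n, so that nu(2^e * u) = e for odd u."""
    return (n & -n).bit_length() - 1


def valuations(M: int, L: int):
    """2-exponents of the coefficients of X1^(M+1) and X1^M in (X1 + X2)^(2^L - M - 1)."""
    top = 2**L - M - 1
    return nu(comb(top, M + 1)), nu(comb(top, M))


def check(M: int, L: int) -> bool:
    """Compare with the closed forms alpha(M) - nu(M+1) and alpha(M) used in the text."""
    v_first, v_second = valuations(M, L)
    return v_first == alpha(M) - nu(M + 1) and v_second == alpha(M)


print(all(check(M, 12) for M in range(1, 200)))
print(valuations(14, 12))  # M = 14 has alpha(M) = 3, the case of Corollary 1.2.a
```

For $M = 14$ both exponents equal 3, so both terms of (2.1) carry the coefficient 8 up to odd multiples.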
Calculations such as $E_2(\mathrm{tmf}_*(P_{-3} \wedge P_3))$, the $E_2$-term of the Adams spectral sequence (ASS), were made by Bruner's minimal-resolution computer programs in our work on [6]. This one is in a small enough range to actually do by hand. The result is given in Diagram 2.2.

Diagram 2.2. $E_2(\mathrm{tmf}_*(P_{-3} \wedge P_3))$, $* \le 15$ (Adams chart; horizontal axis ticks at 0, 3, 7, 11, 15)

The $\mathbb{Z}/8 \oplus \mathbb{Z}/16$ arising from filtration 0 in grading 14 in 2.2 is not hit by a differential from the class in $(15, 0)$ because, as explained in the last paragraph of page 54 of [6], the class in $(15, 0)$ corresponds to an easily-constructed nontrivial map. The monomials $X_1^{M+1} X_2^{2^L-2M-2}$ and $X_1^{M} X_2^{2^L-2M-1}$ are detected in mod-2 cohomology, and so their duals emanate from filtration 0. We saw in the previous paragraph that 8 and 16, respectively, annihilate these monomials, and hence also their duals. Since the chart shows that the subgroup of tmf$_{14}(P_{-3} \wedge P_3)$ generated by classes of filtration 0 is $\mathbb{Z}/8 \oplus \mathbb{Z}/16$, we conclude that 8 and 16, respectively, are the precise orders of the monomials. In particular, the order of $X_1^{M} X_2^{2^L-2M-1}$ is 16, and hence the class in (2.1) is nonzero since it has a term $8u X_1^{M} X_2^{2^L-2M-1}$, and so (2.1) contradicts the hypothesized immersion.

Part b of 1.2 is proved similarly. If $P^{8M+8}$ immerses in $\mathbb{R}^{16M+4}$, then there is an axial map

$$P^{8M+8} \times P^{2^{L+3}-16M-6} \xrightarrow{f} P^{2^{L+3}-8M-10},$$

and hence, up to odd multiples,

$$2^{2-\nu(M+1)} X_1^{M+1} X_2^{2^L-2M-2} + 2^2 X_1^{M} X_2^{2^L-2M-1} = 0 \in \mathrm{tmf}^{2^{L+3}-8M-8}(P^{8M+8} \wedge P^{2^{L+3}-16M-6}), \tag{2.3}$$

since $\alpha(M) = 2$. We have tmf$^{8M+8}(P^{8M+8}) \approx$ tmf$_{-1}(P_{-1}) \approx \mathbb{Z}/2$, and tmf$^{2^{L+3}-16M-8}(P^{2^{L+3}-16M-6}) \approx$ tmf$_{-1}(P_{-3}) \approx \mathbb{Z}/8$. Thus the two monomials in (2.3) have order at most 2 and 8, respectively. On the other hand, the group in (2.3) is isomorphic to tmf$_6(P_{-1} \wedge P_{-3})$. A minimal resolution calculation easier than the one in Diagram 2.2 shows that tmf$_6(P_{-1} \wedge P_{-3})$ has $\mathbb{Z}/2 \oplus \mathbb{Z}/8$ emanating from filtration 0 (and another $\mathbb{Z}/2 \oplus \mathbb{Z}/8$ in higher filtration).
The monomials of (2.3) are generated in filtration 0, and since the above upper bound for their orders equals the order of the subgroup generated by filtration-0 classes, we conclude that the orders of the monomials in (2.3) are precisely 2 and 8, respectively, and so the term $4 X_1^{M} X_2^{2^L-2M-1}$ in (2.3) is nonzero, contradicting the immersion.

3. tmf-cohomology of $P^\infty \times P^\infty$

In this section, we compute tmf$^*(P^\infty)$ and tmf$^{8*}(P^\infty \times P^\infty)$ in positive gradings. These will be used in the next section in studying the axial class in tmf-cohomology. There is an element $c_4 \in \pi_8(\mathrm{tmf})$ which reduces to $v_1^4 \in \pi_8(bo)$; it has Adams filtration 4. It acts on tmf$^*(X)$ with degree $-8$. Recall also that $\pi_*(bo) = bo_*$ is as depicted in 5.1. We denote $bo^* = bo_{-*}$. We use $P_1$ and $P^\infty$ interchangeably.

Theorem 3.1. There is an element $X \in$ tmf$^8(P_1)$ of Adams filtration 0, described in [6], such that, in positive dimensions divisible by 8, tmf$^*(P_1)$ is isomorphic as an algebra over $\mathbb{Z}_{(2)}[c_4]$ to $\mathbb{Z}_{(2)}[c_4][X]$. In particular, each tmf$^{8i}(P_1)$ with $i > 0$ is a free abelian group with basis $\{c_4^j X^{i+j} : j \ge 0\}$. There is a class $L \in$ tmf$^0(P_1)$ such that
• tmf$^0(P_1)$ is a free abelian group with basis $\{L, c_4^j X^j : j \ge 1\}$,
• $L^2 = 2L$ and $LX = 2X$.
Moreover, in positive dimensions tmf$^*(P_1)$ is isomorphic as a graded abelian group to $bo^*[X]$, and is depicted in Diagram 3.6.

Remark 3.2. A complete description of tmf$^*(P_1)$ as a graded abelian group could probably be obtained using the analysis in the proof which follows, together with the computation of the $E_2$-term of the ASS converging to tmf$_*(P_{-1})$, which was given in [10]. However, this is quite complicated and unnecessary for this paper, and so will be omitted.

Proof. We begin with the structure as graded abelian group. There are isomorphisms

$$\mathrm{tmf}^*(P_1) \approx \varprojlim \mathrm{tmf}^*(P_1^n) \approx \varprojlim \mathrm{tmf}_{-*-1}(P_{-n-1}^{-2}) = \mathrm{tmf}_{-*-1}(P_{-\infty}^{-2}). \tag{3.3}$$

Since $H^*(\mathrm{tmf}; \mathbb{Z}_2) \approx A/\!/A_2$, there is a spectral sequence converging to tmf$_*(X)$ with $E_2(X) = \mathrm{Ext}_{A_2}(H^*X, \mathbb{Z}_2)$.
Here A2 is the subalgebra of the mod 2 Steenrod algebra A generated by Sq1, Sq2, and Sq4. Also Z2 = Z/2. We compute E2(P −∞) from the exact sequence → Es−1,t2 (P∞−1)→ E −∞)→ E q∗−→ Es,t2 (P∞−1)→ . (3.4) It was proved in [17] that ExtA2(P −∞,Z2) ≈ ExtA1(Σ Z2,Z2). Here we have initiated a notation that Pmn := H ∗(Pmn ). A complete calculation of ExtA2(P −1,Z2) was performed in [10], but all we need here are the first few groups. We can now form a chart for E2(P −∞) from (3.4), as in Diagram 3.5, where ◦ indicate elements of ExtA2(P −1,Z2) suitably positioned, and lines of negative slope correspond to cases of q∗ 6= 0 in (3.4). NONIMMERSIONS IMPLIED BY TMF, REVISITED 7 Diagram 3.5. tmf∗(P −∞), −17 ≤ ∗ ≤ 2 −17 −9 −1 · · · ✻ ✻ ✻ r r r r r r ✻ ✻ ✻ r r r r r r r r r r r r r r r Dualizing, we obtain Diagram 3.6 for the desired tmf∗(P∞1 ). Diagram 3.6. tmf∗(P∞1 ), ∗ ≥ −2 0 8 16 ✻ ✻ ✻✻ ✻ ✻ r r r r r r · · · Naming of the generators X i is clear since X has filtration 0. The free action of c4 is also clear. The class L is (up to sign) the composite P1 λ−→ S0 → tmf, where λ is the well-known Kahn-Priddy map. Thus L is the image of a class L̂ ∈ π0(P1). Lin’s theorem ([16]) says that π0(P1) ≈ Z∧2 , generated by L̂. Since π0(P1)→ ko0(P1) is an isomorphism, and, since (1 − ξ)2 = 2(1 − ξ) for a generator (1 − ξ) of ko0(P1), we obtain L̂2 = 2L̂, and hence also for L. We chose the generator to be (1 − ξ) rather than (ξ − 1) to avoid minus signs later in the paper. 8 DONALD M. DAVIS AND MARK MAHOWALD To prove the claim about LX , first note that, by the structure of tmf8(P1), we must have LX = p(c4X)X for some polynomial p. Multiply both sides by L and apply the result about L2 to get 2LX = p(c4X)LX , hence 2p = p 2, from which we conclude p = 2. In tmf∗(P1 × P1), for i = 1, 2, let Li and Xi denote the classes L and X in the ith factor. Note that there is an isomorphism as tmf∗-modules, but not as rings, tmf∗(P1 × P1) ≈ tmf∗(P1 ∧ P1)⊕ tmf∗(P1 × ∗)⊕ tmf∗(∗ × P1). Theorem 3.7. 
In positive dimensions divisible by 8, tmf∗(P1 ∧ P1) is isomorphic as a graded abelian group to a free abelian group on monomials X i1X 2 with i, j > 0 direct sum with a free Z[c4]-module with basis {L1X i2, X i1L2 : i ≥ 1}. The product and Z[c4]-module structure is determined from 3.1 and c4(X1X2) = (c4X1)X2 = X1(c4X2) = 4(L1X 1 L2), for certain integers γi with γ0 divisible by 8. The proof of this theorem involves a number of subsidiary results. They and it occupy the remainder of this section. We will use duality and exact sequences similar to (3.4). But to get started, we need ExtA2(P ⊗ P,Z2). Here we have begun to abbreviate P := P∞−∞. We begin with a simple lemma. Throughout this section, x1 and x2 denote nonzero elements coming from the factors in H 1(RP × RP ;Z2). Lemma 3.8. ([9]) There is a split short exact sequence of A-modules 0→ Z2 ⊗P→ P⊗P→ (P/Z2)⊗P→ 0. Proof. The Z2 is, of course, the subgroup generated by x 0, which is an A-submodule. A splitting morphism P⊗P g−→Z2 ⊗P is defined by g(xi1 ⊗ x 2) = x 1 ⊗ x 2 . This is A-linear since g(Sqk(xi1⊗x 2)) = x01⊗x i+j+k x01⊗x i+j+k 2 = Sq k g(xi1⊗x The following result is more substantial. We will prove it at the end of this section. NONIMMERSIONS IMPLIED BY TMF, REVISITED 9 Proposition 3.9. There is a short exact sequence of A2-modules 0→ C → (P/Z2)⊗P→ B → 0, where C has a filtration with Fp(C)/Fp−1(C) ≈ Σ8pA2/ Sq2, p ∈ Z, and B has a filtration with Fp(B)/Fp−1(B) ≈ Z copies Σ4p−2A2/ Sq 1, p ∈ Z. The generator of Fp(C)/Fp−1(C) is x 2 ; a basis over Z2 for C is {x21xi+22 +x41xi2, x41xi2+x81xi−42 , i ∈ Z}∪{x11xi−12 +x21xi−22 , i 6≡ 0 (8)}∪{x11x 2 , p ∈ Z}. A minimal set of generators as an A2-module for the filtration quotients of B is {x8i−11 x 2 : i, j ∈ Z}. Corollary 3.10. A chart for Ext (P ⊗ P,Z2) in 8p − 3 ≤ t − s ≤ 8p + 4 is as suggested in Diagram 3.11, for all integers p. The big batch of towers in each grading ≡ 2 (4) represents an infinite family of towers. 
The pattern of the other classes is repeated with vertical period 4. Thus, for example, in 8p−1 there is an infinite tower emanating from filtration 4i for each i ≥ 0. 10 DONALD M. DAVIS AND MARK MAHOWALD Diagram 3.11. Ext (P⊗P,Z2) in 8p− 3 ≤ t− s ≤ 8p+ 4 8p+ −2 0 2 4 ✻✻✻✻✻✻✻✻✻✻✻ ✻✻✻✻✻✻✻✻✻✻✻✻ ✻✻ ✻✻ ✻✻ Proof of Corollary 3.10. We first note that ExtA2(P,Z2) is identical to the left portion of Diagram 3.5 extended periodically in both directions. Also, ExtA2(A2/ Sq 1,Z2) ≈ ExtA0(Z2,Z2) is just an infinite tower, and ExtA2(A2/ Sq 2,Z2) ≈ ExtA1(A1/ Sq2,Z2) is given as in Diagram 3.14. We will show at the end of this proof that ExtA2(C,Z2) ≈ ExtA2(Σ 8pA2/ Sq 2,Z2) (3.12) and similarly ExtA2(B,Z2) ≈ ExtA2(Σ 4p−2A2/ Sq 1,Z2). These would follow by induction on p once you get started, but since p ranges over all integers, that is not automatic. Thus ExtA2(P⊗P,Z2) is formed from ExtA2(P,Z2)⊕ ExtA2(Σ 8pA2/ Sq 2,Z2)⊕ ExtA2(Σ 4p−2A2/ Sq 1,Z2), NONIMMERSIONS IMPLIED BY TMF, REVISITED 11 using the sequences in 3.8 and 3.9. The Ext sequence of 3.8 must split, and there are no possible boundary morphisms in the Ext sequence of 3.9, yielding the claim of the corollary. To prove (3.12), let (s, t) be given, and choose p0 so that 8p0 < t− 23s+ 2. Since the highest degree element in A2 is in degree 23, Ext (Fp0(C),Z2) = 0. Actually a much sharper lower vanishing line can be established, but this is good enough for our purposes. Thus, for this (s, t), (Fp1(C),Z2) ≈ (Σ8p−2A2/ Sq 2) (3.13) for p1 ≤ p0, as both are 0. Let p1 be minimal such that (3.13) does not hold. Then comparison of exact sequences implies that s−1,t (Fp1−1(C),Z2)→ Ext (Fp1(C)/Fp1−1(C),Z2) must be nonzero. But one or the other of these groups is always 0,1 as both charts (Fp1−1(C),Z2) and Ext (Fp1(C)/Fp1−1(C),Z2) are copies of Diagram 3.14 dis- placed by 4 vertical units from one another. Thus (3.13) is true for all p1, and hence (3.12) holds. A similar proof works when C is replaced by B. Diagram 3.14. 
ExtA2(A2/ Sq 2,Z2) · · · Now we can prove a result which will, after dualizing, yield Theorem 3.7. The groups ExtA1(Z2,Z2) to which it alludes are depicted in 5.1. The content of this result is pictured in Diagram 3.18. Proposition 3.15. In dimensions t− s ≡ 2 mod 4 with t− s ≤ −10, ExtA2(P−2−∞⊗ P−2−∞,Z2) consists of i infinite towers emanating from filtration 0 in dimensions −8i− 1Actually this is not quite true; for one family of elements we need to use h0- naturality. 12 DONALD M. DAVIS AND MARK MAHOWALD 6 and −8i − 10, together with the relevant portion of two copies of ExtA1(Z2,Z2) beginning in filtration 1 in each dimension −8i − 2. The generators of the towers in −8i− 10 correspond to cohomology classes x−91 x−8i−12 , . . . , x−8i−11 x−92 . The generators of the two copies of ExtA1(Z2,Z2) in −8i−2 arise from h0 times classes corresponding to x−11 x 2 and x −8i−1 Proof. Using exact sequences like (3.4) on each factor, we build Ext (P−2−∞⊗P−2−∞,Z2) fromA := Ext (P⊗P,Z2), B := Ext∗−1,∗A2 (P −1⊗P,Z2), C := Ext ∗−1,∗ (P⊗P∞−1,Z2), and D := Ext ∗−2,∗ (P∞−1 ⊗ P∞−1,Z2), with possible d1-differential from A and into D. In the range of concern, t− s ≤ −9, the D-part will not be present, and the part of Diagram 3.11 in dimension 6≡ 2 mod 4 will not be involved in d1. Using [17] for B and C, the relevant part, namely the portion of A in dimension ≡ 2 mod 4, together with B and C, is pictured in Diagram 3.16. Diagram 3.16. Portion of A+B+C ✻✻✻✻✻✻✻✻✻✻ ✻✻✻✻✻✻✻✻✻✻ ✻✻✻✻✻✻✻✻✻✻ −2 2 68p+ rr rrr In dimension 8p−2, the towers in A arise from all cohomology classes x−8i−11 x −8j−1 with i+ j = −p, while in dimension 8p+ 2, they arise from x8i−11 x 2 ∼ x8i+31 x The finite towers in B arise from x4i−11 x 2 with i ≥ 0, and those from C from x8i−11 x 2 with j ≥ 0. The homomorphism Ext0A2(P⊗P,Z2)→ Ext (P∞−1 ⊗P,Z2)⊕ Ext0A2(P⊗P −1,Z2), NONIMMERSIONS IMPLIED BY TMF, REVISITED 13 which is equivalent to the d1-differential mentioned above, sends classes to those with the same name. 
In dimension ≤ −10, this is surjective, with kernel spanned by classes with both components < −1. In dimension −8i−6 and −8i−10, there will be i such classes. We illustrate by listing the classes in the first few gradings: −14 : x−91 x−52 ∼ x−51 x−92 −18 : x−91 x−92 −22 : x−171 x−52 ∼ x−131 x−92 , x−91 x−132 ∼ x−51 x−172 −26 : x−171 x−92 , x−91 x−172 . These kernel classes yield infinite towers emanating from filtration 0. For each p < 0, the towers arising from x 2 , j ≥ 0, in A combine with those in the p-summand of ExtA1(Σ 8p−1P∞−1,Z2) as in Diagram 3.17 to yield one of the copies of ExtA1(Z2,Z2) arising from filtration 1. An identical picture results when the factors are reversed. Diagram 3.17. Part of ExtA2(P −∞ ⊗P−2−∞,Z2) ✻ ✻ ✻ Putting things together, we obtain that in dimensions less than −8, ExtA2(P−2−∞⊗ P−2−∞,Z2) consists of a chart described in Proposition 3.15 and partially illustrated in Diagram 3.18 together with the classes in Diagram 3.11 which are not part of the infinite sums of towers in dimension ≡ 2 mod 4. 14 DONALD M. DAVIS AND MARK MAHOWALD Diagram 3.18. Illustration of Proposition 3.15 −26 −18 −10 ✻✻✻✻✻✻ ✻ ✻✻✻ ✻✻✻✻ ✻✻ The only possible differentials in the Adams spectral sequence of P−2−∞∧P−2−∞∧ tmf involving the classes in dimensions 8p − 2 with p < 0 are from the towers in 8p − 1 in Diagram 3.11, but these differentials are shown to be 0 as in [6, p.54]. Similarly to (3.3), we have tmf∗(P1 ∧ P1) ≈ tmf−∗−2(P−2−∞ ∧ P−2−∞), and so we obtain a turned-around version of Diagram 3.18, of the same general sort as Diagram 3.6, as a depiction of a relevant portion of tmf∗(P1∧P1), with the labeled columns in Diagram 3.18 corresponding to cohomology gradings 24, 16, and 8. The classes X i1X 2 described in Theorem 3.7 are detected by the S-duals of the classes from which the filtration-0 towers in dimensions 8p− 2 in Diagram 3.18 arise, and so they can be chosen to be the corresponding elements of tmf8∗(P1 ∧P1). 
Simi- larly the classes L1X 2 and X 1L2 have Adams filtration 1, and so one would anticipate that they represent the duals of the generators of the two towers in dimension 8p− 2 with p < 0 in Diagram 3.18. This seems a bit harder to prove using the Adams spectral sequence; however, the Atiyah-Hirzebruch spectral sequence shows this quite clearly. The class X i1 is detected by H 8i(P1; π0(tmf)), while L is detected by H 1(P1; π1(tmf)). NONIMMERSIONS IMPLIED BY TMF, REVISITED 15 Under the pairing, their product is detected in H8i+1(P1; π1(tmf)), clearly of Adams filtration 1. The last part of Theorem 3.7 deals with the action of c4 on the monomials X Since tmf is a commutative ring spectrum, tmf∗(P1 ∧ P1) is a graded commutative algebra over tmf∗. The action c4(X1X2) must be of the form i≥0 γic 4(L1X as these are the only elements in tmf8(P1∧P1), and the class must be invariant under reversing factors. The divisibility of γ0 by 8 follows since c4 has Adams filtration 4. Having just completed the proof of Theorem 3.7, we conclude this section with the postponed proof of Proposition 3.9. Proof of Proposition 3.9. Let C denote the A2-submodule of (P/Z2) ⊗ P generated by all x11x 2 , p ∈ Z. Note that Sq2(x11x 2 ) = Sq 4 Sq6(x11x 2 ). Thus a basis of A2/ Sq 2 acting on all x11x 2 spans C. The 24 elements in a basis of A/ Sq acting on x11x 2 yield x 2 + x 2 + x 2 + x 2 , x 2 + x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x 2 , x and x41x 2 + x 2 . These classes with second components shifted by all multiples of 8 exactly comprise the basis for C described in the proposition. The procedure to establish the structure of B = ((P/Z2)⊗P)/C is similar but more elaborate. For the 32 elements θ in a basis of A2/ Sq 1, we list θ(x−11 x 2 ) and θ(x Then we show that these, with each component allowed to vary by multiples of 8, together with C, fill out all of (P/Z2)⊗P. 
It is convenient to let Q denote the quotient of (P/Z2)⊗P by C and all elements θ(x8i−11 x 2 ) and θ(x 2 ). We will show Q = 0. This will complete the proof of Proposition 3.9, implying in particular that Sq1(x8i−11 x 2 ) and Sq 1(x8i−11 x 2 ) are decomposable over A2. A separate calculation is performed for each mod 8 value of the degree. Here we use repeatedly that the A2-action on x i depends only on i mod 8. We illustrate with the case in which degree ≡ 0 mod 8. The other 7 congruences are handled similarly, although some are a bit more complicated. A basis of A2/ Sq 1 in degree ≡ 2 mod 8 acting on x−11 x−12 yields the following elements: x−11 x 2 + x 2 + x 2 , x 2 + x 2 + x 2 + x 2 + x 2 + x 16 DONALD M. DAVIS AND MARK MAHOWALD and x41x 2 + x 2. A basis of A2/ Sq 1 in degree ≡ 6 mod 8 acting on x−11 x32 yields the following elements: x21x 2 + x 2 + x 2 + x 2 + x 2 + x 2 + x 2 + x 2, and x 2 + x 2. Because we allow both components to vary by multiples of 8, we will list just the first component of the ordered pairs. These are considered as relations in Q. Thus the relation R1 below really means that all x8i−11 x 2 + x 2 + x 2 become 0 in Q. R1 : X−1 +X0 +X1, R2 : X2 +X6, R3 : X−1 +X3 +X4 +X5 +X9, R4 : X4 +X12, R5 : X2 +X3 +X4 +X5, R6 : X−1 +X2 +X5, R7 : X4 +X6 +X10 +X12, R8 : X8 +X16. We will use these relations to show that all classes (in degree ≡ 0 mod 8) are 0 in Q. First, R8 implies that all classes X8i are congruent to one another. Since X0 is 0 in the quotient due to P/Z2, we conclude that all classes X8i are 0 in Q. Next, R4 implies that all X8i+4 are congruent to one another. Since X4 +X8 ∈ C, and we have just shown that X8 ≡ 0 in Q, we deduce that all X8i+4 are 0 in Q. Now we use R2 + R7 to see that all X8i+2 + X8i+4 are congruent to one another, then that X2 +X4 ∈ C to deduce all X8i+2 +X8i+4 ≡ 0, and finally the result of the previous sentence to conclude all X8i+2 ≡ 0. Then R2 implies all X8i+6 ≡ 0. 
Now R1+R3+R5, together with relations previously obtained, implies all X8i+1 are congruent to one another, and since X1 ∈ C, we conclude all X8i+1 ≡ 0. Finally R1 implies X8i−1 ≡ 0, R6 implies X8i+5 ≡ 0, and then R3 implies X8i+3 ≡ 0. 4. Careful treatment of axial class In this section, we fill the gap in the proof in [6] of its Theorem 1.1 by careful consid- eration of the possible “other terms” in the axial class discussed in the Introduction. We show that, at least as far as the monomials cX i1X 2 in its powers are concerned, NONIMMERSIONS IMPLIED BY TMF, REVISITED 17 the axial class equals u(X1+X2), where u is a unit in tmf 0(RP∞×RP∞). Thus the ℓth power of the axial class is nonzero in tmf8ℓ(RP n×RPm) if and only if (X1+X2)ℓ is nonzero there, and the latter is the condition which yielded the nonimmersions of [6, 1.1]. Thus we have a complete proof of [6, 1.1]. If P n × Pm f−→Pm+k is an axial map, then there is a commutative diagram P n × Pm f−−−→ Pm+k P∞ × P∞ g−−−→ P∞, where g is the standard multiplication of P∞, since P∞ = K(Z2, 1). Since X ∈ tmf8(Pm+k) has been chosen to extend over P∞, we obtain that f ∗(X) is the restric- tion of g∗(X). By Theorem 3.7 and the symmetry of g, we must have g∗(X) = X1 +X2 + 4(L1X 1 L2), (4.1) for some integers κi. This is what we call the “axial class.” Then g ∗(Xℓ) equals the ℓth power of (4.1). Using the formulas for L2i , LiXi, and c4(X1X2) in 3.1 and 3.7 and the binomial theorem, this ℓth power can be written in terms of the basis described in 3.7. If some κi’s are nonzero, the coefficients of X 2 in g ∗(Xℓ) will not equal , as was claimed in [6]. We will study this possible deviation carefully. One simplification is to treat L1 and L2 as being just 2. Note that Li acts like 2 when multiplying by Xi, and if, for example, L1 is present without X1, then the terms ci4L1X 2 cannot cancel our X 2-classes because both are separate parts of the basis. 
You have to carry the terms along, because they might get multiplied by an X1, and then it is as if L1 = 2. We will incorporate this important simplification throughout the remainder of this section. For example, one easily checks that, using L21 = 2L1 and L1X1 = 2X1, we obtain (X1 +X2 + L1X2) 4 = (X1 + 3X2) 4 − 80X42 + 40L1X42 . The exponent of 2 in each monomial of (X1 + 3X2) 4 − 80X42 is the same as that in (X1 +X2) 4, and L1X 2 is a separate basis element. With this simplification, the axial class in (4.1) becomes X1 +X2 + 2 2 ) (4.2) 18 DONALD M. DAVIS AND MARK MAHOWALD for some integers κi. There was another term 2κ0(X1+X2), but it can be incorporated into the leading (X1 +X2). The odd multiple that it can create is not important. From Theorem 3.7, we have c4(X1X2) = 16(X1 +X2) + 2 (4.3) for some integers γk. The 16 comes from γ0 = 8 and Li = 2. Actually we don’t really know that γ0 = 8, even just up to multiplication by a unit, but it is divisible by 8 and the possibility of equality must be allowed for. This gives 2 ) = 16(X 2 ) + 2 i+k+1 j+k+1 (4.4) Here we use that in a graded tmf∗-algebra tmf ∗(X) with even-degree elements, c(xy) = cx · y, for c ∈ tmf∗ and x, y ∈ tmf∗(X). There is an iterative nature to the action of c4 in (4.4), but the leading coefficient 16 enables us to keep track of 2-exponents of leading terms in the iteration. (As observed above, the leading coefficient might be an even multiple of 16, which would make the terms even more highly 2-divisible. We assume the worst, that it equals 16.) We obtain the following key result about the action of c4 on monomials in X1 and X2. Theorem 4.5. There are 2-adic integers Ai such that 24+iAi Remark 4.6. This formula will be evaluated on (i.e. multiplied by) monomialsXk1X One might worry that the negative powers of X1 or X2 in 4.5 will cause nonsensical negative powers in c4X 2. 
This will, in fact, not occur because the monomials on which we act always have total degree greater than the dimension of either factor. Thus if, after multiplication by c4, a term with negative exponent of Xi appears, then the accompanying X 3−i-term will be 0 for dimensional reasons. Proof of Theorem 4.5. The defining equation (4.3) may be written as, with θ = X1X2 and z = X1/X2, θ = 16(z + z−1) + i(zi+1 + z−(i+1)). (4.7) NONIMMERSIONS IMPLIED BY TMF, REVISITED 19 Let pi = z i + z−i. We will show that 24+iAip2i+1 (4.8) for certain 2-adic integers Ai, which interprets back to the claim of 4.5. Note that pipj = pi+j + p|i−j|, and hence pe11 · · · p k = pΣiei + L, where L is a sum of integer multiples of pj with j < iei and j ≡ iei mod 2. We will ignore for awhile the coefficients γi which occur in (4.7). This is allowable if we agree that when collecting terms, we only make crude estimates about their 2-divisibility. We have θ = 16p1 + 2θp2 + 2θ 2p3 + 2θ 3p4 + · · · = 16p1 + 2p2(16p1 + 2p2(16p1 + · · · ) + 2p3(16p1 + · · · )2 + · · · ) +2p3(16p1 + 2p2(16p1 + · · · ) + · · · )2 + · · · . Note that the only terms that actually get evaluated must end with a 16p1 factor. Now let T1 = 16p1 and, for i ≥ 2, let Ti = 2θi−1pi. Each term in the expansion of θ involves a sequence of choices. First choose Ti for some i ≥ 1, and then if i > 1 choose (i−1) factors Tj , one from each factor of θi−1. For each of these Tj with j > 1, choose j − 1 additional factors, and continue this procedure. This builds a tree, and we don’t get an explicit product term until every branch ends with T1. Each selected factor Tj with j > 1 contributes a factor 2pj. There will also be binomial coefficients and the omitted γi’s occurring as additional factors. For example, Diagram 4.9 illustrates the choices leading to one term in the expan- sion of θ. This yields the term 2p2 · 2p4 · 16p1 · 2p2 · 16p1 · 2p3 · 16p1 · 2p2 · 16p1, which equals 221(p17 +L), where L is a sum of pi with i < 17 and i odd. 
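The example term just computed, and the product rule $p_i p_j = p_{i+j} + p_{|i-j|}$ behind it, can be verified with exact Laurent-polynomial arithmetic in $z$. The Python sketch below is an illustration only: polynomials are stored as dictionaries from exponent to integer coefficient, and the helper names `p`, `scale`, `mul` and `add` are ours.

```python
from functools import reduce


def p(i: int) -> dict:
    """The Laurent polynomial p_i = z^i + z^(-i), stored as {exponent: coefficient}."""
    return {0: 2} if i == 0 else {i: 1, -i: 1}


def scale(c: int, f: dict) -> dict:
    """Multiply every coefficient of f by the integer c."""
    return {e: c * a for e, a in f.items()}


def mul(f: dict, g: dict) -> dict:
    """Product of two Laurent polynomials."""
    out = {}
    for e1, a in f.items():
        for e2, b in g.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + a * b
    return out


def add(f: dict, g: dict) -> dict:
    """Sum of two Laurent polynomials."""
    out = dict(f)
    for e, b in g.items():
        out[e] = out.get(e, 0) + b
    return out


# The term 2p2 * 2p4 * 16p1 * 2p2 * 16p1 * 2p3 * 16p1 * 2p2 * 16p1 from the text,
# written as (coefficient, subscript) pairs.
factors = [(2, 2), (2, 4), (16, 1), (2, 2), (16, 1), (2, 3), (16, 1), (2, 2), (16, 1)]
term = reduce(mul, (scale(c, p(i)) for c, i in factors))

print(max(term), term[17] == 2**21)
```

The top exponent is 17 with coefficient $2^{21}$, every exponent that occurs is odd, and the coefficients are symmetric under $z \mapsto z^{-1}$, so the term has the form $2^{21}(p_{17} + L)$ with $L$ a combination of $p_i$ for odd $i < 17$.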
By induction, one sees in general that the sum of the subscripts emanating from any node, including the subscript of the node itself, is odd. 20 DONALD M. DAVIS AND MARK MAHOWALD Diagram 4.9. A possible choice of terms T2 T4 T2 T1 T2 T1 The important terms are those in which T2 is chosen k times (k ≥ 0) and then T1 is chosen. These give (2p2) kp1 with no binomial coefficient. This term is 2 k+4(p2k+1+L). Note that a term 2k+4p2i+1 with i < k obtained from L will be more 2-divisible than the 2i+4p2i+1 term that was previously obtained. Thus it may be incorporated into the coefficient of that term. All other terms will be more highly 2-divisible than these. For example, the first would arise from choosing T3 then two copies of T1. This would give 2p3 ·24p1 ·24p1 = 29p5+L, and the 29p5 can be combined with the 26p5 obtained from choosing T2 then T2 then T1. Incorporating γi’s may make terms even more divisible, but the claim of (4.8) is only that p2i+1 occurs with coefficient divisible by 2 Now we incorporate 4.5 into (4.2) to obtain the following key result, which we prove at the end of the section. Theorem 4.10. The monomials ciX 2 in the nth power of the axial class in tmf8n(RP∞ × RP∞) are equal to those in the nth power of (X1 +X2) 24+iαi , (4.11) where u is an odd 2-adic integer and αi are 2-adic integers. The factor which accompanies (X1+X2) in (4.11) is a unit in tmf ∗(RP∞×RP∞); we referred to it earlier as u. Indeed, its inverse is a series of the same form, obtained by solving a sequence of equations. This justifies the claim in the first paragraph of this section regarding retrieval of the nonimmersions of [6, 1.1]. We must also observe that restriction to tmf8ℓ(RP n × RPm) of the non-X i1Xℓ−i2 parts of the basis of tmf8ℓ(RP∞ × RP∞) cannot cancel the X i1Xℓ−i2 terms essential for the nonimmersion. 
This is proved by noting that these elements such as L1X NONIMMERSIONS IMPLIED BY TMF, REVISITED 21 and ci4L1X 2 will restrict to a class of the same name in tmf 8ℓ(RP n × RPm), and will be 0 there for dimensional reasons, since 8ℓ > n. Proof of Theorem 4.10. Let g∗(X) denote the axial class as in (4.1). From (4.2) and 4.5, the difference g∗(X)− (X1 +X2) equals We let z = X1/X2 and pj = z j + z−j as in the proof of 4.5. The summand with i = 2t becomes 2κi(X1 +X2) X t1X 2jAjp2j+1 = 2κi(X1 +X2)(p2t + L)24i k(p2k+i + L). Here k is a sum of j-values taken from the various factors in the ith power. Also, in pj + L, L denotes a combination of pt’s with t < j. Noting (p2t + L)(p2k+i + L) = p2k+2i + L, this becomes 2(X1 +X2)2 k(p2k+2i + L). (4.12) The argument when i = 2t + 1 is similar but slightly more complicated because (X i+11 +X 2 ) is not divisible by (X1 +X2). We obtain X i+11 +X X1X2)2t+1 2jAjp2j+1 For one of the factors of the ith power, say the first, we treat p2j+1 as X1+X2√ (p2j +L). The expression then becomes 2(X1 +X2)pi+12 k(p2k+i−1 + L), where k is obtained as in the previous case. We again obtain (4.12). Thus when g∗(X) − (X1 +X2) is written as (X1 + X2) βjp2j , the coefficient βj satisfies ν(βj) ≥ (j − 1) + 4 + 1. Here the (j − 1) + 4 comes from the case i = 1, k = j− 1 in (4.12), and the extra +1 is the factor 2 which has been present all along. This yields the claim of (4.11). 22 DONALD M. DAVIS AND MARK MAHOWALD 5. tmf-cohomology of CP∞ × CP∞ In [2], [4], and [8], it was noted, first by Astey, that the axial class using BP (or BP 〈2〉) was u(X2 − X1), where u is a unit in BP ∗(P∞ ∧ P∞). In this section, we review that argument and consider the possibility that it might be true when BP is replaced by tmf, which would render the considerations of the previous section unnecessary. To do this, we calculate tmf∗(CP∞) and tmf∗(CP∞×CP∞) in positive dimensions. (See Theorems 5.15 and 5.19.) 
Although our conclusion will be that Astey’s BP -argument cannot be adapted to tmf, nevertheless these calculations may be of independent interest. We begin by reviewing Astey’s argument. There is a commutative diagram, in which RP = RP∞ and CP = CP∞ dR−−−→ RP × RP mR−−−→ RP dC−−−→ CP × CP CP 1×(−1) CP × CP mC−−−→ CP The generator XR ∈ BP 2(RP ) satisfies XR = h∗(X). We also have that mC ◦ (1 × (−1))◦dC is null-homotopic. The key fact, which will fail for tmf, is BP ∗(CP×CP ) ≈ BP ∗[X1, X2]. The axial class is m∗R(XR). It equals (h× h)∗(1× (−1))∗m∗C(X). But (1× (−1))∗m∗C(X) ∈ ker(d∗C). By the above “key fact,” d∗C is the projection BP ∗[X1, X2]→ BP ∗[X ] in which each Xi 7→ X . The kernel of this projection is the ideal (X2 −X1). To see this, just note that in grading 2n a kernel element must be 2 with ci = 0, and hence is 2 −Xn1 ) = 1(X2 −X1) n−i−1−j Thus (1 × (−1))∗m∗C(X) = (X2 − X1)u for some u ∈ BP ∗(CP × CP ). This u is a unit by consideration of its reduction to H∗(−;Z), as in [2]. Since h∗(u) will then be a unit in BP ∗(RP ×RP ) and h∗(Xi) = XRi, we obtain the claim about the axial class being a unit times XR2 −XR1. NONIMMERSIONS IMPLIED BY TMF, REVISITED 23 In order to see if there is any chance of adapting this to tmf, we compute tmf∗(CP∞) and tmf∗(CP∞ × CP∞) in positive gradings. We begin with the relevant Ext calcu- lations. Let bo = Ext (Z2,Z2). Recall that a chart for this is given as in Diagram 5.1, extended with period (t− s, s) = (8, 4). Diagram 5.1. Ext (Z2,Z2) 0 4 8 · · · Let M10 denote the A2-module 〈1, Sq4, Sq2 Sq4, Sq4 Sq2 Sq4〉. Lemma 5.2. There is an additive isomorphism (M10,Z2) ≈ bo[v2], where v2 ∈ Ext1,7(−). Thus the chart for Ext (M10,Z2) consists of a copy of bo shifted by (t − s, s) = (6i, i) units for each i ≥ 0. Proof. There is a short exact sequence of A2-modules 0→ Σ7M10 → A2//A1 → M10 → 0. This yields a spectral sequence which builds Ext (M10,Z2) from ∗−i,∗−7i (A2//A1,Z2). 
Since Ext (A2//A1,Z2) ≈ bo, one easily checks that there are no possible differen- tials in this spectral sequence. Let Cmn = H ∗(CPmn ;Z2). 24 DONALD M. DAVIS AND MARK MAHOWALD Theorem 5.3. There is an additive isomorphism (C∞−∞,Z2) ≈ Σ8p−2bo[v2]. Of course Σ applied to a module or an Ext group just means to increase the t-grading by 1. Proof. There is a filtration of C∞−∞ with Fp/Fp−1 ≈ Σ8p−2M10 for p ∈ Z. We have Sq2 ι8p−2 = Sq 4 Sq2 Sq4 ι8p−10. The same argument used in the last paragraph of the proof of Corollary 3.10 works to initiate an inductive proof of the Ext-isomorphism claimed in the theorem. Corollary 5.4. In gradings (t− s) less than −1, (C−2−∞,Z2) ≈ Σ8p−2bo[v2]. Proof. There is an exact sequence → Exts−1,tA2 (C −1,Z2)→ Ext (C−2−∞,Z2)→ Ext (C∞−∞,Z2) q∗−→ Exts,tA2(C −1,Z2)→ . The result is immediate from this and 5.3, since q∗ sends the initial tower in F0/F−1 isomorphically to the initial tower in ExtA2(C −1,Z2). The A-modules C∞1 and Σ 2C−2−∞ are dual. Thus, by [9, Prop 4], (Z2,C 1 ) ≈ Ext (Σ2C−2−∞,Z2). There is a ring structure on Ext (Z2,C 1 ). We deduce the following result, which is pictured in Diagram 5.12. Corollary 5.5. In (t− s) gradings ≤ 0, there is a ring isomorphism (Z2,C 1 ) ≈ bo[v2][X ], where X ∈ Ext0,−8. Proof. We apply the duality isomorphism to 5.4. The multiplicative structure is obtained from the observation that the powers of the class in Ext0,−8 equal the class in Ext0,−8i for each i > 0. NONIMMERSIONS IMPLIED BY TMF, REVISITED 25 The Ext groups computed here are the E2-term of the ASS converging to tmf −∗(CP∞). We will consider the differentials in this spectral sequence after performing the Ext calculation relevant for tmf∗(CP∞ × CP∞). Now we consider C−2−∞⊗C−2−∞. Now x1 and x2 denote elements of H2(CP ;Z2). Let E2 denote the exterior subalgebra generated by the Milnor primitives of grading 1, 3, and 7. Note that A2//E2 has a basis with elements of grading 0, 2, 4, 6, 6, 8, 10, and 12. 
Finally we note that for any j ≡ −2 mod 8 with j ≤ −10, there is a nontrivial A2-morphism C ρ−→ΣjZ2. Lemma 5.6. Let K = ker(C−2−∞ ⊗C−2−∞ ρ−→C−2−∞ ⊗ Σ−10Z2). Let S denote the set of all classes x8i−21 x 2 with i ≤ −1 and j ≤ −2, together with the classes x8i−21 x 2 with i ≤ −1 and j ≤ −1. Then K is the direct sum of a free A2//E2-module on S with a single relation Sq 4 Sq2 Sq4(x−101 x 2 ) = 0. Proof. Since the generators of E2 have odd grading, A2//E2 acts on any element of these evenly-graded modules. The action of A2//E2 on x 2 yields the additional elements x−21 x 2 , x 2 , x 2 , x x−21 x 2 , and x 2. The action of A2//E2 on x 2 yields the additional elements x01x 2 + x 2, and x 2 + x 2. Each exponent can be decreased by any multiple of 8. One can easily check that in each grading all classes in C−2−∞ ⊗C−2−∞ are obtained exactly once from the described elements in K together with C−2−∞ ⊗ Σ−10Z2. There are four cases, for the four even mod 8 values. We illustrate with the case of grading 4 mod 8. We will just consider the specific value −28, but it will be clear that it generalizes to all gradings ≡ 4 mod 8. Letting Xi denote xi1x−28−i2 , we have: (1) From generators in −28, we obtain just X−10 in K. The class X−18 is in C −∞ ⊗ Σ−10Z2. (2) From generators in −32, we obtain X−8 + X−6, X−16 + X−14, and X−24 +X−22. (3) From generators in −36, we obtain X−8+X−4 and X−16+X−12. (4) From generators in −40, we obtain X−4, X−12 +X−8, X−20 + X−16, and X−24. 26 DONALD M. DAVIS AND MARK MAHOWALD Note in (4) that X0 and X−28 do not appear because each component must be ≤ −4 and the components sum to −28. One easily checks that the 11 classes listed above, including X−18, form a basis for the space spanned by X−4, . . . , X−24, in an orderly fashion that clearly generalizes to any grading ≡ 4 mod 8. A similar argument works in the other three congruences. There are some minor variations in the top few dimensions. Now we dualize. 
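As a sanity check on the grading −28 illustration, the claim that the 11 listed classes form a basis for the span of X−4, . . . , X−24 can be verified mechanically over Z2. The following Python sketch is not part of the original argument. The bitmask encoding and the helper names `encode` and `rank_gf2` are illustrative choices introduced here, and the class list is transcribed from items (1) through (4) together with X−18.

```python
def encode(indices):
    """Encode a sum of classes X_i (i even, -24 <= i <= -4) as a bitmask over GF(2)."""
    mask = 0
    for i in indices:
        mask ^= 1 << ((-i - 4) // 2)  # X_{-4} -> bit 0, ..., X_{-24} -> bit 10
    return mask


def rank_gf2(rows):
    """Rank over GF(2) of vectors given as integer bitmasks (XOR elimination)."""
    pivots = {}  # highest set bit -> stored row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


# Classes transcribed from the grading -28 illustration.
classes = [
    [-10], [-18],                              # (1), with X_{-18} from the cokernel part
    [-8, -6], [-16, -14], [-24, -22],          # (2)
    [-8, -4], [-16, -12],                      # (3)
    [-4], [-12, -8], [-20, -16], [-24],        # (4)
]

rank = rank_gf2([encode(c) for c in classes])
print(rank)  # 11 independent classes in an 11-dimensional space
```

Since the span of X−4, . . . , X−24 has dimension 11 (one class for each even index), rank 11 confirms that the listed classes are a basis in this grading.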
There is a pairing ExtA2(Z2,C 1 )⊗ ExtA2(Z2,C∞1 )→ ExtA2(Z2,C∞1 ⊗C∞1 ). Let Xi denote the class in grading −8 coming from the ith factor. Then we obtain Theorem 5.7. The algebra Ext (Z2,C 1 ⊗ C∞1 ) in gradings ≤ −8 is isomorphic to Z2[X1, X2]〈X1X2, y−12〉 with y2−12 = X21X2 + X1X22 . The monomials of the form X i1X 2y−12 are acted on freely by Z2[v0, v1, v2]. Let Sn denote the Z2-vector space with basis the monomials X i1X 2 , and define a homomorphism ǫ : Sn → Z2 by sending each monomial to 1. Then Z2[v0, v1, v2] acts freely on ker(ǫ), while bo[v2] acts freely on Sn/ ker(ǫ). Thus in dimensions t − s ≤ −8 Ext∗,∗A2 (Z2,C 1 ⊗ C∞1 ) has, for each i > 0, i copies of Σ−8i−4Z2[v0, v1, v2] and i copies of Σ −8i−16Z2[v0, v1, v2], and also one copy of Σ−8i−8bo[v2]. Here Z2[X1, X2]〈X1X2, y−12〉means a free Z2[X1, X2]-module on basis {X1X2, y−12} Proof. The structure as graded abelian group is straightforward from Lemma 5.6, Corollary 5.5, and the duality isomorphism (Z2,C 1 ⊗C∞1 ) ≈ Ext ∗,∗−4 (C−2−∞ ⊗C−2−∞,Z2). We use that ExtA2(A2//E2,Z2) ≈ Z2[v0, v1, v2]. The reason that we only assert the structure in dimension ≤ −8 is due to the Σ−10 in the cokernel part of Lemma 5.6, and that Theorem 5.5 was only valid in dimension ≤ 0. In the range under consideration, the relation on the top class in Lemma 5.6 does not affect Ext. The ring structure in filtration 0 comes from HomA2(Z2,C 1 ⊗C∞1 ) being isomorphic to elements of C∞1 ⊗C∞1 annihilated by Sq2 and Sq4, which has as basis all elements x4i1 ⊗ x 2 and (x 1 ⊗ x 2 )(x 1 ⊗ x22 + x21 + x42). NONIMMERSIONS IMPLIED BY TMF, REVISITED 27 Now we show that Ext 1,−8n+2 (Z2,C 1 ⊗ C∞1 ) = Z2, and h1 times each mono- mial in Ext 0,−8n (Z2,C 1 ⊗ C∞1 ) equals the nonzero element here. An element in 1,−8n+2 (Z2,C 1 ⊗C∞1 ) = Z2 is an equivalence class of morphisms Σ2A2 ⊕ Σ4A2 h−→C∞1 ⊗C∞1 which increase grading by 8n− 2, and yield a trivial composite when preceded by Σ4A2 ⊕ Σ8A2 Sq2 Sq6 0 Sq4 −−−−−−−−−→ Σ2A2 ⊕ Σ4A2. 
Morphisms h which can be factored as Σ2A2 ⊕ Σ4A2 Sq2,Sq4−−−−→ A2 k−→C∞1 ⊗C∞1 (5.8) are equivalent to 0 in Ext. We illustrate with the case n = 3. There are A2-morphisms increasing grading by 22 sending either Σ2A2 or Σ 4A2 to any one of the following classes: 2 , x 2 , x (5.9) The classes are listed in this order because any two adjacent monomials are equivalent using as k in (5.8) the morphism sending the generator to the indicated classes in succession: 2 , x For example, (Sq2, Sq4)(x11x 2 ) = (x 2 , x 2 ). Thus all classes in (5.9) are equiva- lent to one another. That h1 times any monomialX 2 equals this nonzero element of Ext 1,8n+2 (Z2,C C∞1 ) follows from usual Yoneda product consideration. If 0 ← Z2 ← C0 ← C1 ← is the beginning of a minimal A2-resolution, with C1 = Σ 1A2 ⊕ Σ2A2 ⊕ Σ4A2, then h1X 2 is represented by the composite C1 → C0 → C∞1 ⊗ C∞1 sending ι2 7→ ι 7→ X i1Xn−i2 , and this is equivalent to the element described in the previous paragraph. Here is a schematic way of picturing Theorem 5.7. We first list the generators in grading greater than −32. Then for each of the two types of generators, we list the structure arising from them in the first 10 dimensions. The bo[v2]-structure in the 28 DONALD M. DAVIS AND MARK MAHOWALD left half of Diagram 5.11 arises from one tower in dimensions −24 and −16, while the Z2[v0, v1, v2]-structure in the right half of diagram 5.11 arises from the other towers in Diagram 5.10. Diagram 5.10. Generators of ExtA2(Z2,C 1 ⊗C∞1 ) −28 −24 −20 −16 −12 ✻✻✻ ✻✻ ✻✻ ✻ ✻ Diagram 5.11. Structure on two types of generators 0 010 10 ✻ ✻ ✻ ✻ ✻ ✻ ✻ ✻✻ ✻ ✻ Now we consider the differentials in the ASS converging to tmf∗(CP∞) and then for tmf∗(CP∞ ∧ CP∞). The gradings are negated when considered as tmf-cohomology groups. Corollary 5.5 gives the E2-term converging to [Σ ∗CP∞1 , tmf] ≈ tmf −∗(CP∞1 ). We will maintain the homotopy gradings until just before the end. 
In diagram 5.12, we depict a portion of the E2-term of this ASS in gradings −16 to 1. There are also classes in higher filtration arising from powers of v41 and v2 acting on generators in lower grading. The elements indicated by •’s are involved in differentials, as explained later. NONIMMERSIONS IMPLIED BY TMF, REVISITED 29 Diagram 5.12. A portion of E2 for [Σ ∗CP∞, tmf] −16 −8 0 We will prove the following key result about differentials in this ASS. Theorem 5.13. The nonzero differentials in the ASS converging to [Σ∗CP∞, tmf], ∗ < 1, are given by −2k+1) = hǫ+11 v for ǫ = 0, 1, i, j ≥ 0, k ≥ 1. Here h1, v 1, and v2 have the usual Ext s,t gradings (s, t) = (1, 2), (4, 12), and (1, 7), respectively. Diagram 5.12 pictures the situation for k = 1 and small values of i and j. The elements indicated by •’s are involved in the differentials. The resulting picture is nicer if the filtrations of all classes built on X−2k+1 are increased by 1. There is a nontrivial extension (multiplication by 2) in dimension −6 due to the preceding differential. This is equivalent to the way that bu∗ is formed from bo∗ and Σ 2bo∗. We obtain Diagram 5.14 from Diagram 5.12 after the differentials, extensions, and filtration shift are taken into account. 30 DONALD M. DAVIS AND MARK MAHOWALD Diagram 5.14. Diagram 5.12 after differentials and filtration shift −16 −8 0 The regular sequence of towers in the chart beginning in filtration 1 in dimension −10 is interpreted as vi1v2, i ≥ 0. After negating dimensions to switch to cohomology indexing, we obtain the follow- ing result, which is immediate from 5.13 after the extensions such as just seen are taken into account. Theorem 5.15. In positive gradings, there is an isomorphism of graded abelian groups tmf∗(CP∞1 ) ≈ Z(2)[Z16](bo∗ ⊕ v2Z(2)[v1, v2]). Here Z16 ∈ tmf16(CP∞1 ), and |v1| = −2 and |v2| = −6. Recall that bo∗ = bo−∗ with bo∗ as suggested in 5.1. 
Much of the ring structure of tmf^*(CP^∞_1) is described in 5.15, since bo^* and v_2 Z_(2)[v_1, v_2] are rings, and it is quite clear how to multiply an element in bo^* by one in v_2 Z_(2)[v_1, v_2]. Because of the filtration shift that led to the identification of some of the classes in v_2 Z_(2)[v_1, v_2], we hesitate to make any complete claims about the ring structure. A complete computation of tmf^*(CP^∞) was made in [5]. See there especially Theorem 7.1 and Diagram 7.1. At first glance, the two descriptions appear quite different, but they seem to be compatible.

Proof of Theorem 5.15. We first prove that there is a nontrivial class in [Σ^{−16}CP, tmf] detected in filtration 0. This is obtained using the virtual bundle 8(H − 1) − (H^3 − H), where H denotes the complex Hopf bundle. Considered as a real bundle θ, this bundle satisfies w_2(θ) = 0 and p_1(θ) = 0. Here we use from [18] that p_1 generates the infinite cyclic summand in H^4(BSO; Z) and satisfies r^*(p_1) = c_1^2 − 2c_2 under r : BU → BSO, and ρ^*(p_1) = 2e_1 under ρ : BSpin → BSO, where H^4(BSpin; Z) is an infinite cyclic group generated by e_1. The total Chern class of 9H − H^3 is (1 + x)^9 (1 + 3x)^{−1} = 1 + 6x + 18x^2 + · · · , and hence r^*(p_1(θ)) = (c_1(9H − H^3))^2 − 2c_2(9H − H^3) = (6x)^2 − 2 · 18x^2 = 0. Thus e_1(θ) = 0, hence the composite CP^∞ → BSpin → K(Z, 4), whose first map is θ, is trivial, and so θ lifts to a map CP^∞ → BO[8]. Hence its Thom spectrum induces a degree-1 map T(θ) → MO[8]. Since ψ^3(H) = H^3 − H, by [19] θ is J(2)-equivalent to 8(H − 1), and hence its Thom spectrum is T(8(H − 1)) = Σ^{−16}CP^∞_8. Using the Ando-Hopkins-Rezk orientation ([1]) MO[8] → tmf, we obtain our desired class as the composite

Σ^{−16}CP^∞_1 → Σ^{−16}CP^∞_8 → MO[8] → tmf, (5.16)

in which the first map is the collapse col and the second is T(θ). We will deduce our differentials from the d_3-differential E_3 → E_3 in the ASS converging to π_*(tmf). This can be seen in [13, p.537] or [11, Thm 2.2]. See Remark 5.17 for additional explanation.
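The Chern class arithmetic used above to show that the first Pontryagin class of θ vanishes can be reproduced with a short truncated power series computation. This Python sketch is only an illustrative check. The function name `chern_coefficients` is hypothetical, and the series is truncated at x^2 because only c_1 and c_2 enter the formula c_1^2 − 2c_2.

```python
from math import comb


def chern_coefficients(order=3):
    """Coefficients of (1 + x)^9 * (1 + 3x)^(-1) up to x^(order - 1)."""
    binomial_part = [comb(9, k) for k in range(order)]   # (1 + x)^9
    inverse_part = [(-3) ** k for k in range(order)]     # 1/(1 + 3x) = sum of (-3x)^k
    # Cauchy product of the two truncated series.
    return [
        sum(binomial_part[i] * inverse_part[k - i] for i in range(k + 1))
        for k in range(order)
    ]


coeffs = chern_coefficients()
c1, c2 = coeffs[1], coeffs[2]      # c_1 = 6x, c_2 = 18x^2
p1_coefficient = c1 ** 2 - 2 * c2  # coefficient of x^2 in c_1^2 - 2 c_2

print(coeffs)          # [1, 6, 18]
print(p1_coefficient)  # 0
```

The output agrees with the expansion 1 + 6x + 18x^2 + · · · and with (6x)^2 − 2 · 18x^2 = 0.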
It is not difficult to show that, with M10 as in 5.2, the morphism (Z2,Z2)→ Exts,tA2(M10,Z2) induced by the nontrivial A2-map M10 → Z2 sends the Z2 in Ext7,23A2 (Z2,Z2) which is not part of the infinite tower to h21v We prefer to think about the ASS for tmf∗(Σ 2CP−2−∞), which, as we have noted, is isomorphic to that of [Σ∗CP∞1 , tmf]. The E2-term was described in 5.4. Let S−16 → Σ2CP−2−∞ ∧ tmf correspond to the map in (5.16). Since E2(CP−2−∞ ∧ tmf) in negative dimensions is built from copies of ExtA2(M10,Z2), we deduce from the previous paragraph that h21v 1v2g−16 in the ASS for tmf∗(Σ 2CP−2−∞) must be hit by a d2- or d3-differential, since it is the image of a class hit by a d3. The only possibility is that it be d2 from h1v 1g−8, as indicated by the dotted line in Diagram 5.12. Naturality 32 DONALD M. DAVIS AND MARK MAHOWALD of differentials with respect to h1 and v 1 implies the differentials of 5.13 for ǫ = 0, 1, all i, j = 0, and k = 1. Using the diagonal map of CP∞1 and the multiplication of tmf, powers of (5.16) give similar nontrivial elements in [Σ−16kCP∞1 , tmf] for all k ≥ 1, and by the argument just presented, we establish the differentials of 5.13 for all k (with j = 0 still). The only possible differentials on v2g−16 would be some dr with r > 2 hitting an element which is acted on nontrivially by h1. However h1v2g−16 has become 0 in E3 since it was hit by a d2-differential. Thus a nonzero differential on v2g−16 would contradict naturality of differentials with respect to h1-action. Hence there is a map S−10 → Σ2CP−2−∞ ∧ tmf hitting v2g−16, and the argument of the previous paragraph implies that d2(h1v 1v2g−8) = h 2g−16 and then other related differentials. This now establishes the differentials of 5.13 when j = 1, and sets in motion an inductive argument to establish these differentials for all j ≥ 1. No further differentials in the spectral sequence are possible, by dimensional and h1-naturality considerations. Remark 5.17. 
The proof of the key d3-differential in the ASS of tmf from the 17- stem to the 16-stem, which was cited above, has not had a thorough proof in the literature. Giambalvo’s original argument was incorrect and his correction merely refers to “a homotopy argument.” The current authors cited Giambalvo’s result in [11] without additional argument. We provide some more detail here regarding this differential. The relevant portion of the ASS of tmf appears in Diagram 5.18. In [13] and [11], this was pictured as the ASS of MO[8], but through dimension 18, ∗(MO[8]),Z2) ≈ Ext∗,∗A2(Z2 ⊕ Σ Z2,Z2). One way of obtaining the differentials from 15 to 14, as in [13], is to note that the [8]- cobordism group of 14-dimensional manifolds is Z2, and so the top two elements must be killed by differentials. It is not difficult to compute in Ext the Massey product formula B = 〈A, h0, h1〉, where A and B are as in Diagram 5.18. This can be seen as v41 times a similar formula between classes in dimensions 6 and 8. Since A is 0 in homotopy, the associated Toda bracket formula says that B must be divisible by η. NONIMMERSIONS IMPLIED BY TMF, REVISITED 33 But only 0 can be divisible by η in dimension 16 here. Thus B must be killed by a differential, and the depicted way is the only way this can happen. Diagram 5.18. Portion of ASS of tmf 14 16 18 The differentials in the ASS converging to tmf∗(CP −∞∧CP−2−∞) are implied by the same considerations that worked for CP−2−∞. The Z2[v0, v1, v2]-parts in Theorem 5.7 cannot support differentials by dimensionality and h1-naturality. For the bo-like part, we prefer thinking about it as [Σ∗+4CP∞1 ∧ CP∞1 , tmf] ≈ tmf−∗−4(CP∞1 ∧ CP∞1 ), where the product structure is more apparent. Let Zn denote the nonzero element of Ext 0,−8n (Z2,C 1 ⊗C∞1 )/ ker(h1). By Theorem 5.7, Zn can be represented by X 2 for any 1 ≤ i < n. If n is even and n ≥ 4, choosing i even, Zn is an infinite cycle because it is an external product of infinite cycles. 
Hence by the proof of Theorem 5.13, 2Z2k−1) = h 2 Z2k for ǫ = 0, 1, i, j ≥ 0, and k ≥ 2. Finally, X1X2 is an infinite cycle since there is nothing that it can hit. Also, h1v2X1X2 and h 1v2X1X2 are not hit by differentials since Ext (Z2,C 1 ⊗C∞1 ) = 0 by Theorem 5.7. We obtain the following. Theorem 5.19. In grading ≥ 10, there is an isomorphism of graded abelian groups tmf∗(CP∞1 ∧CP∞1 ) ≈ yZ(2)[v1, v2, X1, X2]⊕ In·Z(2)[v1, v2]⊕Z(2)[Z](bo∗⊕v2Z(2)[v1, v2]), 34 DONALD M. DAVIS AND MARK MAHOWALD where |y| = 12, |Xi| = 8, |Z| = 16, |v1| = −2, and |v2| = −6. Here In = ker(Fn ǫ−→Z), where Fn is a free abelian group with basis {X i1Xn−i2 : 1 ≤ i < n}, and ǫ(X i1Xn−i2 ) = 1. Thus In consists of all polynomials of grading n with sum of coefficients equal to 0. We could have extended the description in 5.19 down to grading 8, but the description would have been slightly more complicated, since it would include h1v2Z and h 1v2Z. The motivation for this section was to see if perhaps ker(tmf∗(CP∞ × CP∞) d −→ tmf∗(CP∞)) might be something nice like the I(X1 − X2) which was the case for BP ∗(−). In Theorem 5.19, we described tmf∗(CP∞ ∧ CP∞). To obtain tmf∗(CP∞ × CP∞), we add on two copies of tmf∗(CP∞), which was described in 5.15. Denote by Z1 and Z2 the generators in tmf 16(CP∞ × CP∞). Monomials Z i1Zn−i2 should equal Zn of 5.19 plus perhaps elements of I2n of 5.19. The class y of 5.19 plus perhaps a sum of elements of higher filtration is in ker(d∗) and not in the ideal generated by (Z1−Z2). Thus, as expected, ker(d∗) does not have the nice form that it did for BP ∗(−), and so we cannot use this argument to show that the axial class in tmf∗(RP∞ × RP∞) is u(X1 − X2). However, we showed something like this by a completely different method in Theorem 4.10. We feel that the results obtained in Theorems 5.15 and 5.19 should be of independent interest. References [1] M. Ando, M. J. Hopkins, and C. 
Rezk, Multiplicative orientations of KO-theory and of the spectrum of topological modular forms, preprint, www.math.uiuc.edu/~mando/papers/koandtmf.pdf.
[2] L. Astey, Geometric dimension of vector bundles over real projective spaces, Quar Jour Math Oxford 31 (1980) 139-155.
[3] L. Astey, A cobordism obstruction to embedding manifolds, Ill Jour Math 31 (1987) 344-350.
[4] L. Astey and D. M. Davis, Nonimmersions of real projective spaces implied by BP, Bol Soc Mat Mex 25 (1980) 15-22.
[5] T. Bauer, Elliptic cohomology and projective spaces–a computation, preprint, wwwmath.uni-muenster.de/u/tbauer/cpinfty.pdf.
[6] R. R. Bruner, D. M. Davis, and M. Mahowald, Nonimmersions of real projective spaces implied by tmf, Contemp Math 293 (2002) 45-68.
[7] D. M. Davis, Table of immersions and embeddings of real projective spaces, http://www.lehigh.edu/~dmd1/immtable.
[8] D. M. Davis, A strong nonimmersion theorem for real projective spaces, Annals of Math 120 (1984) 517-528.
[9] D. M. Davis, On the Segal Conjecture for Z2 × Z2, Proc Amer Math Soc 83 (1981) 619-622.
[10] D. M. Davis and M. Mahowald, Ext over the subalgebra A2 of the Steenrod algebra for stunted projective spaces, Can Math Soc Conf Proc 2 (1982) 297-
[11] D. M. Davis and M. Mahowald, A new spectrum related to 7-connected cobordism, Springer-Verlag Lecture Notes in Math 1370 (1989) 126-134.
[12] D. M. Davis and V. Zelov, Some new embeddings and nonimmersions of real projective spaces, Proc Amer Math Soc 128 (2000) 3731-3740.
[13] V. Giambalvo, On 〈8〉-cobordism, Ill Jour Math 15 (1971) 533-541. Correction in Ill Jour Math 16 (1972) 704.
[14] I. M. James, On the immersion problem for real projective spaces, Bull Amer Math Soc 69 (1963) 231-238.
[15] N. Kitchloo and W. S. Wilson, The second real Johnson-Wilson theory and nonimmersions of RP^n, preprint.
[16] W. H. Lin, On conjectures of Mahowald, Segal, and Sullivan, Math Proc Camb Phil Soc 87 (1980) 449-458.
[17] W. H. Lin, D. M. Davis, M. Mahowald, and J. F.
Adams, Calculation of Lin’s Ext groups, Math Proc Camb Phil Soc 87 (1980) 459-469. [18] J. Milnor and J. D. Stasheff, Characteristic classes, Princeton Univ Press (1974). [19] D. Sullivan, Genetics of homotopy theory and the Adams conjecture, Annals of Math 100 (1974) 1-79. Lehigh University, Bethlehem, PA 18015, USA E-mail address : dmd1@lehigh.edu Northwestern University, Evanston, IL 60208, USA E-mail address : mark@math.northwestern.edu ABSTRACT In a 2002 paper, the authors and Bruner used the new spectrum tmf to obtain some new nonimmersions of real projective spaces. In this note, we complete/correct two oversights in that paper. The first is to note that in that paper a general nonimmersion result was stated which yielded new nonimmersions for RP^n with n as small as 48, and yet it was stated there that the first new result occurred when n=1536. Here we give a simple proof of those overlooked results. Secondly, we fill in a gap in the proof of the 2002 paper. There it was claimed that an axial map f must satisfy f^*(X)=X_1+X_2. We realized recently that this is not clear. However, here we show that it is true up multiplication by a unit in the appropriate ring, and so we retrieve all the nonimmersion results claimed in the original paper. Finally, we present a complete determination of tmf^{8*}(RP^\infty\times RP^\infty) and tmf^*(CP^\infty\times CP^\infty) in positive dimensions. <|endoftext|><|startoftext|> Spin Evolution of Accreting Neutron Stars: Nonlinear Development of the R-mode Instability Ruxandra Bondarescu, Saul A. Teukolsky, and Ira Wasserman Center for Radiophysics and Space Research, Cornell University, Ithaca, NY 14853 The nonlinear saturation of the r-mode instability and its effects on the spin evolution of Low Mass X-ray Binaries (LMXBs) are modeled using the triplet of modes at the lowest parametric instability threshold. 
We solve numerically the coupled equations for the three mode amplitudes in conjunction with the spin and temperature evolution equations. We observe that very quickly the mode amplitudes settle into quasi-stationary states that change slowly as the temperature and spin of the star evolve. Once these states are reached, the mode amplitudes can be found algebraically and the system of equations is reduced from eight to two equations: spin and temperature evolution. The evolution of the neutron star angular velocity and temperature follows easily calculated trajectories along these sequences of quasi-stationary states. The outcome depends on whether or not the star will reach thermal equilibrium, where the viscous heating by the three modes is equal to the neutrino cooling (H = C curve). If, when the r-mode becomes unstable, the star spins at a frequency below the maximum of the H = C curve, then it will reach a state of thermal equilibrium. It can then either (1) undergo a cyclic evolution with a small cycle size with a frequency change of at most 10%, (2) evolve toward a full equilibrium state in which the accretion torque balances the gravitational radiation emission, or (3) enter a thermogravitational runaway on a very long timescale of ≈ 10^6 years. If the star does not reach a state of thermal equilibrium, then a faster thermal runaway (timescale of ≈ 100 years) occurs and the r-mode amplitude increases above the second parametric instability threshold. Following this evolution requires more inertial modes to be included. The sources of damping considered are shear viscosity, hyperon bulk viscosity and viscosity within the core-crust boundary layer. We vary properties of the star such as the hyperon superfluid transition temperature Tc, the fraction of the star that is above the threshold for direct URCA reactions, and the slippage factor, and map the different scenarios we obtain to ranges of these parameters.
We focus on Tc ≳ 5 × 10^9 K, where nonlinear effects are important. Wagoner [1] has shown that a very low r-mode amplitude arises at smaller Tc. For all our bounded evolutions the r-mode amplitude remains small, ∼ 10^−5. The spin frequency of accreting neutron stars is limited by boundary layer viscosity to νmax ≈ 800 Hz [S_ns/(M_1.4 R_6)]^{4/11} T^{−2/11}. Fast rotators are allowed for [S_ns/(M_1.4 R_6)]^{4/11} T^{−2/11} ∼ 1, and we find that in this case the r-mode instability would be active for about 1 in 1000 LMXBs and that only the gravitational waves from LMXBs in the local group of galaxies could be detected by advanced LIGO interferometers.

PACS numbers: 04.40.Dg, 04.30.Db, 97.10.Sj, 97.60.Jd

I. INTRODUCTION

R-modes are oscillations in rotating fluids that are due to the Coriolis effect. They are subject to the classical Chandrasekhar-Friedman-Schutz (CFS) instability [2, 3], which is driven by the gravitational radiation backreaction force. Andersson [4] and Friedman and Morsink [5] showed that, in the absence of fluid dissipation, r-modes are linearly unstable at all rotation rates. However, in real stars there is a competition between internal viscous dissipation and gravitational driving [6] that depends on the angular velocity Ω and temperature T of the star. Above a critical curve in the Ω−T plane, the n = 3, m = 2 mode, referred to as “the r-mode” in this work, becomes unstable. At first, an unstable r-mode grows exponentially, but soon it may enter a regime where other inertial modes that couple to the r-mode become excited and nonlinear effects become important. Roughly speaking, nonlinear effects first become significant as the amplitude passes its first parametric instability threshold, which is very low (∼ 10^−5). Modeling and understanding the nonlinear effects is crucial in determining (1) the final saturation amplitude of the r-mode and (2) the limiting spin frequency that neutron stars can achieve.
The r-mode amplitude and the duration of the instability are among the main factors that determine whether the associated gravitational radiation could be detectable by laser interferometers on Earth.

The r-mode instability has been proposed as an explanation for the sub-breakup spin rates of both Low Mass X-ray Binaries (LMXBs) [7, 8] and young, hot neutron stars [6, 9]. The idea that gravitational radiation could balance accretion was proposed independently by Bildsten [7] and Andersson et al. [8]. Cook, Shapiro and Teukolsky [10, 11] modeled the recycling of pulsars to millisecond periods via accretion from a Keplerian disk onto a bare neutron star with M = 1.4 M⊙ when Ω = 0. Depending on the equation of state, they found that spin frequencies between ≈ 670 Hz and 1600 Hz could be achieved before mass shedding or radial instability set in (these calculations predated the realization that the r-mode instability could limit the spin frequency). For comparison, the highest observed spin rate of millisecond pulsars is 716 Hz for PSR J1748-2446ad [12, 13].
http://arxiv.org/abs/0704.0799v3
PSR B1937+21, which was discovered in 1982, was the previous fastest known radio pulsar with a spin rate of 642 Hz [14]; that this “speed” record stood for 24 years suggests that neutron stars rotating this fast are rare. Moreover, based on a Bayesian statistical analysis of the spin frequencies of the 11 nuclear-powered millisecond pulsars whose spin periods are known from burst oscillations, Chakrabarty et al. [15] claimed a cutoff limit of νmax = 760 Hz (95% confidence); a more recent analysis, which added two more pulsars to the sample, found νmax = 730 Hz [16]. At first sight, one might conclude that mass shedding or radial instability sets νmax, and that it is just above the record ν = 716 Hz determined for PSR J1748-2446ad.
However, the nuclear equations of state consistent with this picture all have rather large radii ≈ 16-17 km for non-rotating 1.4 M⊙ models; see Table 1 in Cook et al. [10]. For these equations of state, the r-mode instability should lead to νmax somewhat below 716 Hz; see Eq. (33) in Sec. V below. Thus, the r-mode instability may prevent recycling by accretion from reaching mass shedding or radial instability. In other words, the detection of the 716 Hz rotator is consistent with accretion spin-up mitigated by the r-mode instability only for equations of state for which mass shedding or radial instability would permit even faster rotation. Ultimately, this may be turned into useful constraints on nuclear equations of state. However, at present the uncertainty in the physics of internal dissipation is a significant hindrance in establishing such constraints.

Since a physical model to follow the nonlinear phase of the evolution was initially unavailable, Owen et al. [17] proposed a simple one-mode evolution model in which they assumed that nonlinear hydrodynamic effects saturate the r-mode amplitude at some arbitrarily fixed value. According to their model, once this maximum allowed amplitude is achieved, the r-mode amplitude remains constant and the star spins down at this fixed amplitude (see Eqs. (3.16) and (3.17) in Ref. [17]). They used this model to study the impact of the r-mode instability on the spin evolution of young hot neutron stars assuming normal matter. In their calculation they included the effects of shear viscosity and n-p-e bulk viscosity. They found that the star would cool to approximately 10^9 K and spin down from a frequency close to the Kepler frequency to about 100 Hz in a period of ∼ 1 yr [17].

Most subsequent investigations that did not perform direct hydrodynamic simulations used the one-amplitude model of Ref. [17] for studying the r-mode instability.
Levin [18] used this model to study the limiting effects of the r-mode instability on the spin evolution of LMXBs, assuming an r-mode saturation amplitude of ∼ 1; he adopted a modified shear viscosity to match the maximum LMXB spin frequency of 330 Hz known in 1999. Levin found that the neutron star followed a cyclic evolution in the Ω − T phase plane. The star spins up for several million years until it crosses the r-mode stability curve, whereupon the r-mode becomes unstable and the star is viscously heated for a fraction of a year until the r-mode reaches its saturation amplitude (∼ 1). At this point the spin and r-mode amplitude evolution equations are changed, following the prescription of Ref. [17], to ensure constant amplitude. The star then spins down by emitting gravitational radiation for another fraction of a year until it crosses the r-mode stability curve again and the instability shuts off. The time period during which the r-mode is unstable was found to be only about 10^−6 of the spin-up time, and Levin concluded that it is unlikely that any neutron stars in LMXBs in our galaxy are currently spinning down and emitting gravitational radiation. However, following work by Arras et al. [19] showing that nonlinear effects become significant at small r-mode amplitude, Heyl [20] varied the saturation amplitude and found that the duration of the spin-down depends sensitively on it. He predicted that the unstable phase could be as much as 30% of the cyclic evolution for an r-mode saturation amplitude of α ≈ 10^−5, and that this would make some of the fastest spinning LMXBs in our galaxy detectable by interferometers on Earth.

Jones [21] and Lindblom and Owen [22] pointed out that if the star contains exotic particles such as hyperons (massive nucleons where an up or down quark is replaced with a strange quark), internal processes could lead to a very high coefficient of bulk viscosity in the cores of neutron stars.
While this additional high viscosity coefficient could eliminate the instability altogether in newly born neutron stars [21, 22, 23, 24], Nayyar and Owen [24] proposed that it would enhance the probability of detection of gravitational radiation from LMXBs by blocking the thermal runaway.

The cyclic evolution found by Levin [18] and generalized by Heyl [20] arises when shear or boundary layer viscosity dominates the r-mode dissipation. In the evolutionary picture of Nayyar and Owen [24], the r-mode first becomes unstable at a temperature where shear and boundary layer viscosity dominate, but the resulting thermal runaway halts once hyperon bulk viscosity becomes dominant. The key feature behind the runaway is that shear and boundary layer viscosities both decrease with increasing temperature, so the instability speeds up as the star grows hotter. However, if the bulk viscosity is sufficiently large the star can cross the r-mode stability curve at a point where the viscosity is an increasing function of temperature. Such scenarios were studied by Wagoner [1] for hyperon bulk viscosity with low hyperon superfluid transition temperature; similar evolution was found for strange stars by Andersson, Jones and Kokkotas [25]. In this picture, the star evolves near the r-mode stability curve until an equilibrium between accretion spin-up and gravitational radiation spin-down is achieved. The value of the r-mode amplitude remains below the lowest instability threshold found by Brink et al. [26, 27, 28] for modes with $n < 30$, and hence in this regime nonlinear effects may not play a role.

Schenk et al. [29] developed a formalism to study the nonlinear interaction of the r-mode with other inertial modes. They assumed a small r-mode amplitude and treated the oscillations of the modes with weakly nonlinear perturbation theory via three-mode couplings. This assumption was tested by Arras et al. [19] and Brink et al. [26, 27, 28]. Arras et al.
proposed that a turbulent cascade would develop in the strong driving regime. They estimated that the r-mode amplitude was small and could have values between $10^{-1}$ and $10^{-4}$. Brink et al. modeled the star as incompressible and calculated the coupling coefficients analytically. They computed the interaction of about 5000 modes via approximately 1.3 million couplings of the $10^{9}$ possible couplings among the modes with $n \le 30$. The couplings were restricted to mode triplets with a fractional detuning $\delta\omega/(2\Omega) < 0.002$, since near-resonances promote modal excitation at very small amplitudes. Brink et al. showed that the nonlinear evolution saturates at a very small amplitude, generally comparable to the lowest parametric instability threshold that controls the initiation of energy sharing among the sea of inertial modes. However, Brink et al. did not model accretion spin-up or neutrino cooling in their calculation and only included minimal dissipation via shear viscosity.

In this paper we begin a more complete study of the saturation of the r-mode instability including accretion spin-up and neutrino cooling. We use a simple model in which we parameterize uncertain properties of the star such as the rate at which it cools via neutrino emission and the rate at which the energy in inertial modes dissipates via boundary layer effects [30] and bulk viscosity. In order to exhibit the variety of possible nonlinear behaviors, we explore a range of models with different neutrino cooling and viscous heating coefficients by varying the free parameters of our model.
In particular, we vary: (1) the slippage factor $S_{\rm ns}$, which regulates the boundary layer viscosity, between 0 and 1 (see for example [31, 32, 33] for some models of the interaction between the oscillating fluid core and an elastic crust); (2) the fraction of the star that is above the density threshold for direct URCA reactions, $f_{\rm dU}$, which is taken to be between 0 (0% of the star cools via direct URCA) and 1 (100% of the star is subjected to direct URCA reactions), and in general depends on the equation of state used; and (3) the hyperon superfluidity temperature $T_c$, which is believed to be between $10^{9}$ and $10^{10}$ K. (We use a single, effective $T_c$ rather than modelling its spatial variation.) We focus on $T_c \gtrsim 5\times10^{9}$ K, for which nonlinear effects are important. For low $T_c \lesssim 3\times10^{9}$ K, Wagoner [1] showed that the evolution reaches a steady state at amplitudes below the lowest parametric instability threshold found by Brink et al. [28]. It is important to note that all our evolutions start on the part of the r-mode stability curve that decreases with temperature and that the bulk viscosity does not play a role in any of our bound evolutions.

We include three modes: the r-mode at $n = 3$ and the two inertial modes at $n = 13$ and $n = 14$ that become unstable at the lowest parametric instability threshold found by Brink et al. [28]. We evolve the coupled equations for the three-mode system numerically in conjunction with the spin and temperature evolution equations. The lowest parametric instability threshold provides a physical cutoff for the r-mode amplitude. In all cases we investigate, the growth of the r-mode is initially halted by energy transfer to the two daughter modes. We observe that the mode amplitudes settle into a series of quasi-stationary states within a period of a few years after the spin frequency of the star has increased above the r-mode stability curve.
These quasi-stationary states are algebraic solutions of the three-mode amplitude equations (see Eqs. (6)) and change slowly as the spin and the temperature of the star evolve. Using these solutions for the mode amplitudes, one can reduce the eight evolution equations (six for the real and imaginary parts of the mode amplitudes, which are complex [29]; one for the spin; and one for the temperature) to two equations governing the rotational frequency and the temperature of the star. Our work can be regarded as a minimal physical model for modeling amplitude saturation realistically.

The outcome of the evolution is crucially dependent on whether the star can reach a state of thermal equilibrium. This can be predicted by finding the curve where the viscous heating by the three modes balances the neutrino cooling, referred to below as the Heating = Cooling (H = C) curve. The H = C curve can be calculated prior to carrying out an evolution using the quasi-stationary solutions for the mode amplitudes. If the spin frequency of the star upon becoming unstable is below the peak of the H = C curve, then the star will reach a state of thermal equilibrium. When such a state is reached we find several possible scenarios. The star can: (1) undergo a cyclic evolution; (2) reach a true equilibrium in which the accretion torque is balanced by the rate of loss of angular momentum via gravitational radiation; or (3) evolve in thermal equilibrium until it reaches the peak of the H = C curve, which occurs on a timescale of about $10^{6}$ yr, and subsequently enter a regime of thermal runaway. On the other hand, if the star cannot find a state of thermal equilibrium, then it enters a regime of thermogravitational runaway within a few hundred years of crossing the r-mode stability curve. When this happens, the r-mode amplitude increases beyond the second parametric instability, and more inertial modes would need to be included to correctly model the nonlinear effects.
This will be done in a later paper.

This paper focuses on showing how nonlinear mode couplings affect the evolution of the temperature and spin frequency of a neutron star once it becomes prone to the r-mode CFS instability. We do this in the context of three-mode coupling, which may be sufficient for large enough dissipation. To illustrate the types of behavior that arise, we adopt a very specific model in which the mode frequencies and couplings are computed for an incompressible star, modes damp via shear viscosity, boundary layer viscosity and hyperon bulk viscosity, and the star cools via a mixture of fast and slow processes. This model involves several parameters that are uncertain, and we vary these to find ‘phase diagrams’ in which different generic types of behavior are expected. Moreover, the model itself is simplified: (1) A more realistic treatment of the modes could include buoyant forces, and also mixtures of superfluids or of superfluid and normal fluid in different regions. (2) Dissipation rates, particularly from bulk viscosity, depend on the composition of high density nuclear matter, which could differ from what we assume.

Nevertheless, although the quantitative details may differ from what we compute, we believe that many features of our calculations ought to be robust. More sophisticated treatment of the modes of the star will still find a dense set of modes confined to a relatively small range of frequencies. Most importantly, this set will exhibit numerous three-mode resonances, which is the prerequisite for strong nonlinear effects at small mode amplitudes. Thus, whenever the unstable r-mode can pass its lowest parametric instability threshold, it must start exciting its daughters.
Whether or not that occurs depends on the temperature dependence of the dissipation rate of the r-mode; for the models considered here, where bulk viscosity is relatively unimportant, soon after the star becomes unstable its r-mode amplitude passes its first parametric instability threshold. Once that happens, the generic types of behavior we find (cycles, steady states, slow and fast runaway) ought to follow suit. The details of when different behaviors arise will depend on the precise features of the stellar model, but the principles we outline here (parametric instability, quasisteady evolution, competition between heating and cooling) ought to apply quite generally.

In Sec. II we describe the evolution equations of the three modes, the angular frequency and the temperature of the neutron star. We first show how the equations of motion for the modes of Schenk et al. couple to the rotational frequency of the star in the limit of slow rotation. We then give a short review of the parametric instability threshold and the quasi-stationary solutions of the three-mode system. The thermal and spin evolution of the star is discussed next. This is followed by a description of the driving and damping rates used. Sec. III provides an overview of the results, which includes a discussion of each evolution scenario and of the initial conditions and input physics that lead to each scenario. Sec. IV A discusses cyclic evolution in more detail. An evolution that leads to an equilibrium steady state is presented next in Sec. IV B. The two types of thermal runaway are then discussed in Sec. IV C. The prospects for detecting gravitational radiation for the evolutions in which the three-mode system correctly models the nonlinear effects are considered in Sec. V. We summarize the results in the conclusion.
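Appendix A sketches a derivation of the equations of motion for the three modes and Appendix B contains a stability analysis of the evolution equations around the thermal equilibrium state.

II. EVOLUTION EQUATIONS

A. Three mode system: coupling to uniform rotation

In this section we review the equations of motion for the three-mode system in the limit of slow rotation. In terms of the rotational phase $\tau$ as the time variable, with $d\tau = \Omega\,dt$, Eq. (2.49) of Schenk et al. [29] can be rewritten as

$$\begin{aligned}
\frac{dC_\alpha}{d\tau} &= i\tilde\omega_\alpha C_\alpha + \frac{\gamma_\alpha}{\Omega} C_\alpha - \frac{2i\tilde\omega_\alpha\tilde\kappa}{\sqrt{\Omega}}\, C_\beta C_\gamma, \\
\frac{dC_\beta}{d\tau} &= i\tilde\omega_\beta C_\beta - \frac{\gamma_\beta}{\Omega} C_\beta - \frac{2i\tilde\omega_\beta\tilde\kappa}{\sqrt{\Omega}}\, C_\alpha C_\gamma^\star, \\
\frac{dC_\gamma}{d\tau} &= i\tilde\omega_\gamma C_\gamma - \frac{\gamma_\gamma}{\Omega} C_\gamma - \frac{2i\tilde\omega_\gamma\tilde\kappa}{\sqrt{\Omega}}\, C_\alpha C_\beta^\star.
\end{aligned} \tag{1}$$

Here the scaled frequency $\tilde\omega_j$ is defined to be $\tilde\omega_j = \omega_j/\Omega$, the dissipation rates of the daughter modes are $\gamma_\beta$ and $\gamma_\gamma$, $\gamma_\alpha$ is the sum of the driving and damping rates of the r-mode, $\gamma_\alpha = \gamma_{GR} - \gamma_{\alpha v}$, and the dimensionless coupling is $\tilde\kappa = \kappa/(MR^2\Omega^2)$. These amplitude variables are complex and can be written in terms of the variables of Ref. [29] as $C_j(t) = \sqrt{\Omega(t)}\,c_j(t)$ (see Appendix A for a derivation of Eqs. (1)). The index $j$ loops over the three modes $j = \alpha, \beta, \gamma$, where $\alpha$ labels the r-mode or parent mode and $\beta$ and $\gamma$ label the two daughter modes in the mode triplet.

When the daughter mode amplitudes are much smaller than that of the parent mode, one can approximate the parent mode amplitude as constant. Under this assumption one performs a linear stability analysis on Eqs. (1) and finds the r-mode amplitude at which the two daughter modes become unstable (see Eqs. (B5)-(B7) of Ref. [28] for a full derivation). This amplitude is the parametric instability threshold

$$|C_\alpha|^2 = \frac{\gamma_\beta\gamma_\gamma}{4\tilde\kappa^2\tilde\omega_\beta\tilde\omega_\gamma\Omega}\left[1 + \left(\frac{\Omega\,\delta\tilde\omega}{\gamma_\beta + \gamma_\gamma}\right)^2\right], \tag{2}$$

where the fractional detuning is $\delta\tilde\omega = \tilde\omega_\alpha - \tilde\omega_\beta - \tilde\omega_\gamma$. Thorough explorations of the phase space of damped three-mode systems were performed by Dimant [34] and Wersinger et al. [35].

For the three modes at the lowest parametric instability threshold, $\tilde\omega_\alpha \approx 0.66$, $\tilde\omega_\beta \approx 0.44$, $\tilde\omega_\gamma \approx 0.22$, $\tilde\kappa \approx 0.19$ and $|\delta\tilde\omega| \approx 3.82\times10^{-6}$. Note that $\tilde\omega$ is twice the $w$ of Brink et al. [26, 27, 28].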
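As a rough numerical check of the threshold quoted in this section, the sketch below evaluates the parametric instability threshold for the triplet parameters listed above. It assumes the threshold has the form $|C_\alpha|^2 = \gamma_\beta\gamma_\gamma\,[1 + (\Omega\,\delta\tilde\omega/(\gamma_\beta+\gamma_\gamma))^2]/(4\tilde\kappa^2\tilde\omega_\beta\tilde\omega_\gamma\Omega)$ and, for the no-damping limit, that the two daughter damping rates are equal and tend to zero. The function names are illustrative and are not taken from the paper.

```python
import math

def parametric_threshold(kappa, w_beta, w_gamma, detuning, omega, g_beta, g_gamma):
    """Return |C_alpha| at the parametric instability threshold.

    Assumed form (a reading of Eq. (2)):
    |C_alpha|^2 = g_b g_g / (4 kappa^2 w_b w_g Omega)
                  * [1 + (Omega * detuning / (g_b + g_g))^2]
    """
    damping_part = g_beta * g_gamma / (4.0 * kappa**2 * w_beta * w_gamma * omega)
    detuning_part = 1.0 + (omega * detuning / (g_beta + g_gamma)) ** 2
    return math.sqrt(damping_part * detuning_part)

def no_damping_threshold(kappa, w_beta, w_gamma, detuning):
    """Return |C_alpha| / sqrt(Omega) in the limit g_beta = g_gamma -> 0."""
    return abs(detuning) / (4.0 * kappa * math.sqrt(w_beta * w_gamma))

# Triplet parameters quoted in the text for the lowest threshold.
KAPPA, W_BETA, W_GAMMA, DETUNING = 0.19, 0.44, 0.22, 3.82e-6

scaled_threshold = no_damping_threshold(KAPPA, W_BETA, W_GAMMA, DETUNING)
print(f"|C_alpha| / sqrt(Omega) ~ {scaled_threshold:.2e}")
```

With the rounded inputs above this gives a value of order $10^{-5}\sqrt{\Omega}$, the same order as the threshold amplitude quoted from Ref. [28]; the small difference is expected from the rounding of $\tilde\kappa$ and the frequencies.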
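Here $\beta$ labels the mode with $n = 13$, $m = -3$ and $\gamma$ labels the $n = 14$, $m = 1$ mode. The amplitude the r-mode has to reach before exciting these two daughter modes is $|C_\alpha| \approx 1.5\times10^{-5}\sqrt{\Omega}$ [28].

We next rescale the rotational phase $\tau$ by the fractional detuning as $\tilde\tau = \tau|\delta\tilde\omega|$ and the mode amplitudes by

$$|C_\alpha|_0 = \frac{\sqrt{\Omega_c}\,|\delta\tilde\omega|}{4\tilde\kappa\sqrt{\tilde\omega_\beta\tilde\omega_\gamma}}, \qquad
|C_\beta|_0 = \frac{\sqrt{\Omega_c}\,|\delta\tilde\omega|}{4\tilde\kappa\sqrt{\tilde\omega_\alpha\tilde\omega_\gamma}}, \qquad
|C_\gamma|_0 = \frac{\sqrt{\Omega_c}\,|\delta\tilde\omega|}{4\tilde\kappa\sqrt{\tilde\omega_\beta\tilde\omega_\alpha}}, \tag{3}$$

which for the r-mode is, up to a factor of $\sqrt{\Omega/\Omega_c}$, the no-damping limit of the parametric instability threshold below which no oscillations will occur. The coupled equations become

$$\begin{aligned}
\frac{d\bar C_\alpha}{d\tilde\tau} &= \frac{i\tilde\omega_\alpha}{|\delta\tilde\omega|}\bar C_\alpha + \frac{\tilde\gamma_\alpha}{|\delta\tilde\omega|\tilde\Omega}\bar C_\alpha - \frac{i}{2\sqrt{\tilde\Omega}}\bar C_\beta\bar C_\gamma, \\
\frac{d\bar C_\beta}{d\tilde\tau} &= \frac{i\tilde\omega_\beta}{|\delta\tilde\omega|}\bar C_\beta - \frac{\tilde\gamma_\beta}{|\delta\tilde\omega|\tilde\Omega}\bar C_\beta - \frac{i}{2\sqrt{\tilde\Omega}}\bar C_\alpha\bar C_\gamma^\star, \\
\frac{d\bar C_\gamma}{d\tilde\tau} &= \frac{i\tilde\omega_\gamma}{|\delta\tilde\omega|}\bar C_\gamma - \frac{\tilde\gamma_\gamma}{|\delta\tilde\omega|\tilde\Omega}\bar C_\gamma - \frac{i}{2\sqrt{\tilde\Omega}}\bar C_\alpha\bar C_\beta^\star,
\end{aligned} \tag{4}$$

with $\bar C_j = C_j/|C_j|_0$ and $\tilde\gamma_j = \gamma_j/\Omega_c$ being the newly rescaled amplitudes and dissipation/driving rates, respectively.

1. Quasi-Stationary Solution

In terms of amplitude and phase variables, $C_j = |C_j|e^{i\phi_j}$, Eqs. (4) can be rewritten as

$$\begin{aligned}
\frac{d|\bar C_\alpha|}{d\tilde\tau} &= \frac{\tilde\gamma_\alpha}{\tilde\Omega|\delta\tilde\omega|}|\bar C_\alpha| - \frac{\sin\phi\,|\bar C_\beta||\bar C_\gamma|}{2\sqrt{\tilde\Omega}}, \\
\frac{d|\bar C_\beta|}{d\tilde\tau} &= -\frac{\tilde\gamma_\beta}{\tilde\Omega|\delta\tilde\omega|}|\bar C_\beta| + \frac{\sin\phi\,|\bar C_\alpha||\bar C_\gamma|}{2\sqrt{\tilde\Omega}}, \\
\frac{d|\bar C_\gamma|}{d\tilde\tau} &= -\frac{\tilde\gamma_\gamma}{\tilde\Omega|\delta\tilde\omega|}|\bar C_\gamma| + \frac{\sin\phi\,|\bar C_\alpha||\bar C_\beta|}{2\sqrt{\tilde\Omega}}, \\
\frac{d\phi}{d\tilde\tau} &= \frac{\delta\tilde\omega}{|\delta\tilde\omega|} - \frac{\cos\phi}{2\sqrt{\tilde\Omega}}\left(\frac{|\bar C_\beta||\bar C_\gamma|}{|\bar C_\alpha|} - \frac{|\bar C_\alpha||\bar C_\gamma|}{|\bar C_\beta|} - \frac{|\bar C_\beta||\bar C_\alpha|}{|\bar C_\gamma|}\right),
\end{aligned} \tag{5}$$

where we have defined the relative phase difference as $\phi = \phi_\alpha - \phi_\beta - \phi_\gamma$. These equations have the stationary solution

$$\begin{aligned}
|\bar C_\alpha|^2 &= \frac{4\tilde\gamma_\beta\tilde\gamma_\gamma}{\tilde\Omega|\delta\tilde\omega|^2}\left(1 + \frac{1}{\tan^2\phi}\right), \\
|\bar C_\beta|^2 &= \frac{4\tilde\gamma_\alpha\tilde\gamma_\gamma}{\tilde\Omega|\delta\tilde\omega|^2}\left(1 + \frac{1}{\tan^2\phi}\right), \\
|\bar C_\gamma|^2 &= \frac{4\tilde\gamma_\alpha\tilde\gamma_\beta}{\tilde\Omega|\delta\tilde\omega|^2}\left(1 + \frac{1}{\tan^2\phi}\right), \\
\tan\phi &= \frac{\tilde\gamma_\beta + \tilde\gamma_\gamma - \tilde\gamma_\alpha}{\tilde\Omega|\delta\tilde\omega|}.
\end{aligned} \tag{6}$$

Note that in the limit in which $\gamma_\beta + \gamma_\gamma \gg \gamma_\alpha$ the stationary solution for the r-mode amplitude $|C_\alpha|$ is the same as the parametric instability threshold.

B. Temperature and Spin Evolution

The spin evolution equation is obtained from conservation of total angular momentum $J$, where

$$J = I\Omega + J_{\rm phys}. \tag{7}$$

Following Eqs. (K39)-(K42) of Schenk et al. [29], the physical angular momentum of the perturbation can be written as

$$\hat\Omega\cdot\mathbf{J}_{\rm phys} = \sum_{A,B} C_B^\star C_A \int d^3x\,\rho\left[(\hat\Omega\times\xi_B^\star)\cdot(\hat\Omega\times\xi_A) - \frac{i}{2}(\tilde\omega_A + \tilde\omega_B)\,\xi_B^\star\cdot(\hat\Omega\times\xi_A)\right]. \tag{8}$$

Since the eigenvectors $\xi_A \propto e^{im_A\phi}$, the cross-terms will vanish for modes with different magnetic quantum numbers $m$, as $\int_0^{2\pi} e^{i(m_A - m_B)\phi}\,d\phi = 0$ for $m_A \neq m_B$. Eq.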
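(8) can be rewritten for our triplet of modes as

$$J_{\rm phys} = MR^2\left(k_{\alpha\alpha}|C_\alpha|^2 + k_{\beta\beta}|C_\beta|^2 + k_{\gamma\gamma}|C_\gamma|^2\right), \tag{9}$$

where $k_{\alpha\alpha}$ is defined as

$$k_{\alpha\alpha} = \frac{1}{MR^2}\int d^3x\,\rho\left[(\hat\Omega\times\xi_\alpha^\star)\cdot(\hat\Omega\times\xi_\alpha) - i\tilde\omega_\alpha\,\xi_\alpha^\star\cdot(\hat\Omega\times\xi_\alpha)\right], \tag{10}$$

and similarly for $k_{\beta\beta}$ and $k_{\gamma\gamma}$. In terms of the scaled variables $\bar C_j = C_j/|C_j|_0$ (with $|C_j|_0$ defined in Eq. (3)) the angular momentum of the perturbation can be written

$$J_{\rm phys} = \frac{MR^2\Omega_c|\delta\tilde\omega|^2}{(4\tilde\kappa)^2\tilde\omega_\alpha\tilde\omega_\beta\tilde\omega_\gamma}\left(k_{\alpha\alpha}\tilde\omega_\alpha|\bar C_\alpha|^2 + k_{\beta\beta}\tilde\omega_\beta|\bar C_\beta|^2 + k_{\gamma\gamma}\tilde\omega_\gamma|\bar C_\gamma|^2\right). \tag{11}$$

We chose the same normalization for the eigenfunctions as Refs. [19, 26, 27, 28, 29], so that at unit amplitude all modes have the same energy $\epsilon_\alpha = MR^2\Omega^2$. The energy of a mode $\alpha$ is $E_\alpha = MR^2\Omega^2|c_\alpha|^2 = MR^2\Omega|C_\alpha|^2$. The rotating frame energy is the same as the canonical energy and physical energy [29]. The canonical angular momentum and the canonical energy of the perturbation satisfy the general relation $E_c = -(\omega/m)J_c$ [3].

Angular momentum is gained because of accretion and lost via gravitational wave emission,

$$\frac{dJ}{dt} = 2\gamma_{GR}J_{c\,{\rm rmode}} + \dot M\sqrt{GMR}, \tag{12}$$

where $J_{c\,{\rm rmode}} = -(m_\alpha/\omega_\alpha)\epsilon_\alpha|c_\alpha|^2 = -3MR^2\Omega|c_\alpha|^2 = -3MR^2|C_\alpha|^2$. Eq. (12) can be rewritten in terms of the scaled variables $\bar C_j$ as

$$\frac{dJ}{d\tilde\tau} = -\frac{6\tilde\gamma_{GR}}{\tilde\Omega}\,\frac{MR^2\Omega_c|\delta\tilde\omega|}{(4\tilde\kappa)^2\tilde\omega_\beta\tilde\omega_\gamma}|\bar C_\alpha|^2 + \frac{\dot M\sqrt{GMR}}{\Omega_c\tilde\Omega|\delta\tilde\omega|}. \tag{13}$$

Thermal energy conservation gives the temperature evolution equation

$$\begin{aligned}
C(T)\frac{dT}{dt} &= \sum_j 2E_j\gamma_j + K_n\dot Mc^2 - L_\nu(T) \\
&= 2MR^2\Omega\left(\gamma_{\alpha v}|C_\alpha|^2 + \gamma_\beta|C_\beta|^2 + \gamma_\gamma|C_\gamma|^2\right) + K_n\dot Mc^2 - L_\nu(T).
\end{aligned} \tag{14}$$

The three terms on the right hand side of the equation represent viscous heating, nuclear heating and neutrino cooling. The specific heat is taken to be $C(T) \approx 1.5\times10^{38}\,T_8$ erg K$^{-1}$, where $T = T_8\times10^{8}$ K. Nuclear heating occurs because of pycnonuclear reactions and neutron emission in the inner crust [36]. At large accretion rates such as that of the brightest LMXBs, $\dot M \approx 10^{-8}M_\odot$/yr, the accreted helium and hydrogen burn stably and most of the heat released in the crust is conducted into the core of the neutron star, where neutrino emission is assumed to regulate the temperature of the star [36, 37].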
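The nuclear heating constant is taken to be $K_n \approx 1\times10^{-3}$ [36]. Following Ref. [1], we take the neutrino luminosity to be

$$L_\nu = L_{dU}T_8^6R_{dU}(T/T_p) + L_{mU}T_8^8R_{mU}(T/T_p) + L_{e-i}T_8^6 + L_{n-n}T_8^8 + L_{Cp}T_8^7, \tag{15}$$

where the constants for the modified and direct URCA reactions are defined by $L_{mU} = 1.0\times10^{32}$ erg sec$^{-1}$ and $L_{dU} = f_{dU}\times10^{8}L_{mU}$ [38, 39], and the electron-ion bremsstrahlung, neutron-neutron neutrino bremsstrahlung and Cooper pairing of neutrons are given by $L_{e-i} = 9.1\times10^{29}$ erg sec$^{-1}$ [36], $L_{n-n} \approx 0.01L_{mU}$ and $L_{Cp} = 8.9\times10^{31}$ erg sec$^{-1}$ [40]. The fraction of the star $f_{dU}$ that is above the density threshold for direct URCA reactions is in general dependent on the equation of state [41], and in this work we treat $f_{dU}$ as a free parameter with values between 0 and 1.

The proton superfluid reduction factors for the modified and direct URCA reactions are taken from Ref. [39] (see Eqs. (32) and (51) in Ref. [39]):

$$\begin{aligned}
R_{dU}(T/T_p) &= \left[0.2312 + \sqrt{(0.7688)^2 + (0.1438v)^2}\right]^{5.5}\exp\left(3.427 - \sqrt{(3.427)^2 + v^2}\right), \\
R_{mU}(T/T_p) &= \left[0.2414 + \sqrt{(0.7586)^2 + (0.1318v)^2}\right]^{7}\exp\left(5.339 - \sqrt{(5.339)^2 + (2v)^2}\right),
\end{aligned} \tag{16}$$

where the dimensionless gap amplitude $v$ for the singlet type superfluidity is given by

$$v = \sqrt{1 - \frac{T}{T_p}}\left(1.456 - 0.157\sqrt{\frac{T_p}{T}} + 1.764\,\frac{T_p}{T}\right). \tag{17}$$

Similar to Ref. [1], we use $T_p = 5.0\times10^{9}$ K. In terms of the scaled variables Eq. (14) becomes

$$C(T)\frac{dT}{d\tilde\tau} = \frac{2MR^2\Omega_c^2|\delta\tilde\omega|}{(4\tilde\kappa)^2\tilde\omega_\alpha\tilde\omega_\beta\tilde\omega_\gamma}\left(\tilde\omega_\alpha\tilde\gamma_{\alpha v}|\bar C_\alpha|^2 + \tilde\omega_\beta\tilde\gamma_\beta|\bar C_\beta|^2 + \tilde\omega_\gamma\tilde\gamma_\gamma|\bar C_\gamma|^2\right) + \frac{K_n\dot Mc^2 - L_\nu(T)}{\Omega_c\tilde\Omega|\delta\tilde\omega|}. \tag{18}$$

C. Temperature and Spin Evolution with the Mode Amplitudes in Quasi-Stationary States

Assuming that the amplitudes evolve through a series of spin- and temperature-dependent steady states, i.e., $dC_i/d\tilde\tau \approx 0$, the spin and thermal evolution equations can be rewritten by taking $J \approx I\Omega$ and using Eqs. (6) in Eq. (13):

$$\frac{d\tilde\Omega}{d\tilde\tau} = -\frac{6\tilde\gamma_{GR}}{\tilde\Omega^2|\delta\tilde\omega|}\,\frac{\tilde\gamma_\beta\tilde\gamma_\gamma}{4\tilde\kappa^2\tilde I\tilde\omega_\beta\tilde\omega_\gamma}\left(1 + \frac{1}{\tan^2\phi}\right) + \frac{\dot M\sqrt{GMR}}{MR^2\tilde I\,\Omega_c^2\tilde\Omega|\delta\tilde\omega|}, \tag{19}$$

where $\tilde I = I/(MR^2)$.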
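The quasi-stationary amplitudes used in this reduction can be checked numerically. The sketch below assumes the scaled amplitude and phase equations in the form $d|\bar C_\alpha|/d\tilde\tau = \tilde\gamma_\alpha|\bar C_\alpha|/(\tilde\Omega|\delta\tilde\omega|) - \sin\phi\,|\bar C_\beta||\bar C_\gamma|/(2\sqrt{\tilde\Omega})$ (and the analogous daughter and phase equations), with negative detuning so that $\delta\tilde\omega/|\delta\tilde\omega| = -1$, and verifies that the stationary solution makes every right-hand side vanish. The damping and driving rates below are hypothetical illustration values, not rates from the paper.

```python
import math

def stationary_state(g_alpha, g_beta, g_gamma, omega_t, detuning):
    """Stationary amplitudes (|C_a|, |C_b|, |C_c|) and relative phase phi.

    g_* are rates scaled by Omega_c, omega_t = Omega/Omega_c and
    detuning = |delta omega tilde|. Requires g_beta + g_gamma > g_alpha > 0.
    """
    tan_phi = (g_beta + g_gamma - g_alpha) / (omega_t * detuning)
    phi = math.atan(tan_phi)
    # 1 + 1/tan^2(phi) equals 1/sin^2(phi)
    prefactor = 4.0 / (omega_t * detuning**2) * (1.0 + 1.0 / tan_phi**2)
    a = math.sqrt(prefactor * g_beta * g_gamma)
    b = math.sqrt(prefactor * g_alpha * g_gamma)
    c = math.sqrt(prefactor * g_alpha * g_beta)
    return a, b, c, phi

def amplitude_rhs(a, b, c, phi, g_alpha, g_beta, g_gamma, omega_t, detuning,
                  detuning_sign=-1.0):
    """Right-hand sides of the assumed amplitude and phase equations."""
    coupling = 1.0 / (2.0 * math.sqrt(omega_t))
    scale = omega_t * detuning
    da = g_alpha / scale * a - coupling * math.sin(phi) * b * c
    db = -g_beta / scale * b + coupling * math.sin(phi) * a * c
    dc = -g_gamma / scale * c + coupling * math.sin(phi) * a * b
    dphi = detuning_sign - coupling * math.cos(phi) * (b * c / a - a * c / b - a * b / c)
    return da, db, dc, dphi

# Hypothetical scaled rates chosen only to exercise the formulas.
RATES = dict(g_alpha=1e-10, g_beta=4e-8, g_gamma=5e-8, omega_t=0.183, detuning=3.82e-6)

a, b, c, phi = stationary_state(**RATES)
da, db, dc, dphi = amplitude_rhs(a, b, c, phi, **RATES)
print(f"|C_alpha| = {a:.3f}, residuals = {da:.1e}, {db:.1e}, {dc:.1e}, {dphi:.1e}")
```

The residuals should be zero to rounding error, which is a useful sanity check before substituting these amplitudes into the spin and temperature equations.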
The thermal evolution of the system is given by

$$C(T)\frac{dT}{d\tilde\tau} = \frac{2MR^2\Omega_c^2}{(4\tilde\kappa)^2\tilde\omega_\alpha\tilde\omega_\beta\tilde\omega_\gamma}\,\frac{4\tilde\gamma_\alpha\tilde\gamma_\beta\tilde\gamma_\gamma}{\tilde\Omega|\delta\tilde\omega|}\left(\tilde\omega_\alpha\frac{\tilde\gamma_{\alpha v}}{\tilde\gamma_\alpha} + \tilde\omega_\beta + \tilde\omega_\gamma\right)\left(1 + \frac{1}{\tan^2\phi}\right) + \frac{K_n\dot Mc^2 - L_\nu(T)}{\Omega_c\tilde\Omega|\delta\tilde\omega|}. \tag{20}$$

By setting the right hand side of the above equation to zero, one can find the Heating = Cooling (H = C) curve. Below, we find that Eqs. (19)-(20) describe the evolution very well throughout the unstable regime. These equations are a minimal physical model for the effects of nonlinear coupling on r-mode evolution.

D. Sources of Driving and Dissipation

The damping mechanisms are shear viscosity, boundary layer viscosity and hyperon bulk viscosity; for modes $j = \alpha, \beta, \gamma$ we write

$$\gamma_{j\,v}(\Omega, T) = \gamma_{j\,sh}(T) + \gamma_{j\,bl}(\Omega, T) + \gamma_{j\,hb}(\Omega, T). \tag{21}$$

The r-mode is driven by gravitational radiation and damped by these dissipation mechanisms, while the pair of daughter modes ($n = 13$, $m = -3$ labeled as $\beta$ and $n = 14$, $m = 1$ labeled as $\gamma$) is affected only by the viscous damping. Brink et al. [26, 27, 28] determined that this pair of modes is excited at the lowest parametric instability threshold. Their model uses the Bryan [42] modes of an incompressible star, which has the advantage that the mode eigenfrequencies (and eigenfunctions) are known analytically. This enables them to find near resonances efficiently. We are using their results, but we include more realistic effects such as bulk viscosity, whose effect vanishes in the incompressible limit ($\Gamma_1 \to \infty$ in Eq. (29)).

For our benchmark calculations, we adopt the neutron star model of Owen et al. [6] ($n = 1$ polytrope, $M = 1.4M_\odot$, $\Omega_c = 8.4\times10^{3}$ rad sec$^{-1}$ and $R = 12.53$ km) and use their gravitational driving rate and shear viscous damping rate for the r-mode,

$$\gamma_{GR}(\Omega) \simeq [\ldots]\ {\rm sec}^{-1}, \qquad \gamma_{\alpha\,sh}(T) \simeq [\ldots], \tag{22}$$

where $\tau_{sh} = 2.56\times10^{6}$ sec. (In Sec. V we consider approximate scalings with $M$ and $R$.)
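The benchmark value of $\Omega_c$ can be reproduced from the quoted mass and radius. The sketch below assumes $\Omega_c = \sqrt{\pi G\bar\rho}$, with $\bar\rho$ the mean density (the definition used in Sec. III), and standard cgs values for $G$ and $M_\odot$; the function name is illustrative.

```python
import math

G_CGS = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
M_SUN_G = 1.989e33      # solar mass, g

def critical_angular_velocity(mass_g, radius_cm):
    """Return Omega_c = sqrt(pi * G * mean density) in rad/s."""
    volume = 4.0 / 3.0 * math.pi * radius_cm**3
    mean_density = mass_g / volume
    return math.sqrt(math.pi * G_CGS * mean_density)

# Benchmark model from the text: M = 1.4 M_sun, R = 12.53 km.
omega_c = critical_angular_velocity(1.4 * M_SUN_G, 12.53e5)
print(f"Omega_c ~ {omega_c:.0f} rad/s")
```

The result agrees with the quoted $8.4\times10^{3}$ rad sec$^{-1}$ to better than one percent. Since $\pi G\bar\rho = 3GM/(4R^3)$, the same function gives the scaling of $\Omega_c$ with $M$ and $R$ for other stellar models.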
The damping rate due to shear viscosity for the two daughter modes is calculated using the Bryan modes for a star with the same mass and radius:

$$\gamma_{\beta\,sh}(T) \simeq 3.48\times10^{-4}\ {\rm sec}^{-1}, \qquad \gamma_{\gamma\,sh}(T) \simeq 4.52\times10^{-4}\ {\rm sec}^{-1}. \tag{23}$$

The geometric contribution $\gamma_{sh}/\eta$ of the individual modes increases significantly with the degree $n$ of the mode, scaling approximately like $n^3$ for large $n$ (see Eq. (29) of Brink et al. [27] for an analytic fit to the shear damping rates computed for the 5,000 modes in their network), and hence the inertial modes with $n = 13$ and $n = 14$ have shear damping rates about three orders of magnitude larger than that of the r-mode.

The damping due to boundary layer viscosity is calculated using Eq. (4) of Ref. [30],

$$\gamma_{\alpha\,bl}(T, \Omega) \simeq 0.009\ {\rm sec}^{-1}\,S_{ns}^2, \qquad \gamma_{\beta\,bl}(T, \Omega) \simeq 0.028\ {\rm sec}^{-1}\,S_{ns}^2, \qquad \gamma_{\gamma\,bl}(T, \Omega) \simeq 0.021\ {\rm sec}^{-1}\,S_{ns}^2. \tag{24}$$

Analogous to Wagoner [1], we allow the slippage factor $S_{ns}$ to vary. The slippage factor is defined by Refs. [1, 31, 45] to be $S_{ns}^2 = (2S_n^2 + S_s^2)/3$, with $S_n$ being the fractional difference in velocity of the normal fluid between the crust and the core [31] and $S_s$ the fractional degree of pinning of the vortices in the crust [45]. Note that $\gamma_{\beta\,bl}$ and $\gamma_{\gamma\,bl}$ are both greater than $2\times\gamma_{\alpha\,bl}$ and can easily be comparable to $\gamma_{GR}$ in the unstable regime.

The damping rate due to bulk viscosity produced by out-of-equilibrium hyperon reactions for the r-mode is found by fitting the results of Nayyar and Owen [24]. This rate is taken to have a form similar to that taken by Wagoner [1], γα hb = fhb t−20α τ(T )Ω̃ 1 + (ω̃αΩτ(T ))2 , (25) and for the daughter modes γβ hb = fhb t−20β τ(T )ω̃ 1 + (ω̃βΩτ(T ))2 , (26) and similarly for $\gamma_{\gamma\,hb}$. The relaxation timescale τ(T ) = Rhb(T/Tc) The reduction factor is taken to be the product of two single-particle reduction factors [23, 24] Rhb single(T/Tc) = a5/4 + b1/2 0.5068− 0.50682 + y2 where $a = 1 + 0.3118y^2$, $b = 1 + 2.556y^2$ and $y = \sqrt{1.0 - T/T_c}\,\left(1.456 - 0.157\sqrt{T_c/T} + 1.764\,T_c/T\right)$.
The constants $t_1 \approx 10^{-4}$ sec and $t_{0\alpha} \approx 0.00058$ sec are found by fitting the results of Ref. [24]. The factor $f_{hb}$ allows for physical uncertainties; we take $f_{hb} = 1$ throughout the body of the paper since $T_c$, which enters $\gamma_{j\,hb}$ exponentially, is also uncertain.

For the daughter modes, the dissipation energy due to bulk viscosity is calculated using the modes for the incompressible star. In the slow rotation limit, it is given to leading order in $\Gamma_1^{-2}$ by − ĖB j = ξj · ∇p . (29) This approximation was proposed by Cutler and Lindblom [43] and adopted by Kokkotas and Stergioulas [44] for the r-mode and by Brink et al. [27] for the inertial modes. The adiabatic index $\Gamma_1$ is regarded as a parameter; we use $\Gamma_1 \approx 2$. The damping rate is γj hb = − ĖB j , (30) where $\epsilon = MR^2\Omega^2$ is the mode's energy in the rotating frame at unit amplitude and $j = \beta, \gamma$. Using this procedure, we calculate

$$t_{0\beta} \approx 1.4\times10^{-5}\ {\rm sec}, \qquad t_{0\gamma} \approx 1.0\times10^{-5}\ {\rm sec}. \tag{31}$$

III. SUMMARY OF RESULTS

Fig. 1(a) shows possible evolutionary trajectories of a neutron star in the angular velocity-temperature $\tilde\Omega$-$T_8$ plane, where $T = T_8\times10^{8}$ K is the core temperature, and $\tilde\Omega = \Omega/\Omega_c = \Omega/\sqrt{\pi G\bar\rho}$ with $\bar\rho$ the mean density of the neutron star. Fig. 1(b) displays the regions in the $f_{dU}$-$S_{ns}$ plane in which the trajectories occur. Here $f_{dU}$ represents the fraction of the star that is above the density threshold for direct URCA reactions and $S_{ns}$ is the slippage factor that reduces the relative motion between the crust and the core, taking into account the elasticity of the crust [31]. The stability regions are shown at fixed hyperon superfluidity temperature, $T_c = 5.0\times10^{9}$ K. The initial part of the evolution is similar in all scenarios and can be divided into phases.

Phase 0. Spin up below the r-mode stability curve at $T_8 = T_{8\,{\rm in}}$ such that nuclear heating balances neutrino cooling.

Phase 1. Linear regime. The r-mode amplitude grows exponentially. The phase ends when the r-mode reaches the parametric instability.

Phase 2.
The triplet coupling leads to quasi-steady mode amplitudes. The star is secularly heated at approximately constant $\Omega$ because of viscous dissipation in all three modes.

FIG. 1: (a) Typical trajectories for the four observed evolution scenarios are shown in the $\tilde\Omega$-$T_8$ phase space, where $\tilde\Omega = \Omega/\Omega_c$. The dashed lines (H = C curves) represent the points in the $\tilde\Omega$-$T_8$ phase space where the dissipative effects of the heating from the three modes exactly compensate the neutrino cooling for the given set of parameters ($S_{ns}$, $f_{dU}$, $T_c$, ...) of each evolution. (b) The corresponding stability regions for which these scenarios occur are plotted at fixed hyperon superfluidity temperature $T_c = 5.0\times10^{9}$ K while varying $f_{dU}$ and $S_{ns}$. The position of the initial angular velocity and temperature ($\tilde\Omega_{\rm in}$, $T_{8\,{\rm in}}$) with respect to the maximum of the H = C curve determines the stability of the evolution. (I) $\tilde\Omega_{\rm in} > \tilde\Omega_{H=C\,{\rm max}}$. Trajectory R1. Fast Runaway Region. After the r-mode becomes unstable the star heats up, does not find a thermal equilibrium state and continues heating up until a thermogravitational runaway occurs. (II) $\tilde\Omega_{\rm in} < \tilde\Omega_{H=C\,{\rm max}}$. The evolutions are either stable or, if there is a runaway, it occurs on timescales comparable to the accretion timescale. The possible trajectories are (1) Trajectory C. Cycle Region. (2) Trajectories S1 and S2. Steady State Region. (3) Trajectory R2. Slow Runaway Region.

Phase 3. Several trajectories are possible depending on how the previous phase ends.

a. Fast Runaway. The star fails to reach thermal equilibrium when the trajectory passes over the peak of the Heating = Cooling (H = C) curve. This leads to rapid runaway. The daughter modes damp eventually as bulk viscosity becomes important, and the r-mode grows exponentially until the trajectory hits the r-mode stability curve again. This scenario ends as predicted by Nayyar and Owen [24]. However, the r-mode passes its second parametric instability threshold soon after it starts growing again.
This requires the inclusion of more modes to follow the evolution, which is the subject of future work.

b. The star reaches thermal equilibrium. There are then three possibilities:

(i) Cycle. The star cools and spins down slowly, descending the H = C curve until it crosses the r-mode stability curve again. At this point the instability shuts off. The star cools back to $T_{8\,{\rm in}}$ at constant $\tilde\Omega$ and then the cycle repeats itself. At $T_c = 5.0\times10^{9}$ K this scenario occurs for values of $S_{ns} < 0.50$ and large enough values of $f_{dU}$. However, if $T_c$ is larger, the cycle region in the $f_{dU}$-$S_{ns}$ phase space increases dramatically (see Fig. 9(a)). Note that our cycles are different from those obtained by Levin [18] in that the spin-down phase does not start when the r-mode amplitude saturates (or in our case when it reaches the parametric instability threshold), but rather when the system reaches thermal equilibrium. The r-mode amplitude does not grow significantly above its first parametric instability threshold, remaining close to ∼ $10^{-5}$, and so the part of the cycle in which the r-mode is unstable also lasts longer than in Ref. [18]. Also, our cycles are narrow. During spin-down the temperature changes by less than 20% and $\tilde\Omega$ changes by less than 10% of the initial value. (See Sec. IV A for a detailed example.)

(ii) Steady State. For small $S_{ns}$ and large enough $f_{dU}$ ($f_{dU} \gtrsim 5\times10^{-5}$, $S_{ns} \lesssim 0.04$; see Fig. 1(b)) the star evolves towards an $\tilde\Omega$ equilibrium. The trajectory either ascends or descends the H = C curve (spins up and heats or spins down and cools). The evolution stops when the accretion torque equals the gravitational radiation emission.

(iii) Slow Runaway. For small $S_{ns}$ and very small $f_{dU}$ ($S_{ns} \lesssim 0.03$, $f_{dU} < 5\times10^{-5}$) the star ascends the H = C curve until the peak is overcome and subsequently a runaway occurs. The daughter modes eventually damp and the r-mode grows exponentially until it crosses its second parametric instability threshold and more modes need to be included.
Bulk viscosity only affects the runaway evolutions; the cyclic and steady state evolutions found here would be the same if there were no hyperon bulk viscosity. For large $T_c \sim 10^{10}$ K, or for models with no hyperons at all, there would be no runaway region. (See Fig. 9(a) for an $f_{dU}$-$S_{ns}$ scenario space with a larger $T_c = 6.5\times10^{9}$ K, where the fast runaway region has shrunk dramatically and the slow runaway region has disappeared.)

FIG. 2: Two cyclic trajectories in the $\tilde\Omega$-$T_8$ plane are displayed for a star with $T_c = 5.0\times10^{9}$ K and (a) $f_{dU} = 0.15$ and $S_{ns} = 0.10$, and (b) $f_{dU} = 0.142$ and $S_{ns} = 0.35$, which is close to the border between the stable and unstable region (see Fig. 1(b)). The thick solid line labeled as the Heating = Cooling (H = C) curve is the locus of points in this phase space where the neutrino cooling is equal to the viscous heating due to the unstable modes. The other solid line, representing the r-mode stability curve, is defined by setting the gravitational driving rate equal to the viscous damping rate. The part of the curve that decreases with $T_8$ is dominated by boundary layer and shear viscosity, while the part of the curve that has a positive slope is dominated by hyperon bulk viscosity. In portion $a_1 \to b_1$ of the trajectory the star heats up at constant $\tilde\Omega$. Part $b_1 \to c_1$ represents the spin-down stage, which occurs when the viscous heating is equal to the neutrino cooling. $c_1 \to d_1$ shows the star cooling back to the initial $T_8$. Segment $d_1 \to a_1$ displays the accretional spin-up of the star back to the r-mode stability curve. The cycle $a_2 \to d_2$ proceeds in the same way. This cycle is close to the peak of the H = C curve. Configurations above this peak will run away.

IV. POSSIBLE EVOLUTION SCENARIOS

In this section we examine examples of the different types of evolution in more detail. We assume $\dot M = 10^{-8}M_\odot$/yr and $T_c = 5.0\times10^{9}$ K.

A.
Cyclic Evolution

In this sub-section we present the features of typical cyclic trajectories of neutron stars in the angular velocity-temperature plane in more detail. We focus on two cases: (C1) $S_{ns} = 0.10$ and $f_{dU} = 0.15$, and (C2) $S_{ns} = 0.35$ and $f_{dU} = 0.142$. In this scenario the three-mode system is sufficient to model the nonlinear effects and successfully stops the thermal runaway. The numerical evolution is started once the star reaches the r-mode stability curve. The initial temperature of the star is at the point where nuclear heating equals neutrino cooling in Eq. (18), which is approximately $T_{8\,{\rm in}} \approx 3.29$ for both cases. The initial $\Omega$ is the angular velocity that corresponds to this temperature on the r-mode stability curve, which differs for the different $S_{ns}$ ($\tilde\Omega_{\rm in} = 0.183$ for C1 and $\tilde\Omega_{\rm in} = 0.288$ for C2).

Figs. 2(a) and (b) display the cyclic evolution for trajectories C1 and C2 of Fig. 1(b). In leg $a_1 \to b_1$ of the trajectory the r-mode and, once the r-mode amplitude increases above the first parametric instability threshold, the two daughter modes it excites, viscously heat up the star until point $b_1$, where the neutrino cooling balances the viscous dissipation. This part of the evolution occurs at constant angular velocity over a period of $t_{\rm heat-up} \approx 100$ yr and a total temperature change $(\Delta T_8)_{a_1-b_1} \approx 0.80$ (≈ 24% of $T_{8\,{\rm in}}$). The points where the viscous heating compensates the neutrino cooling are represented by the Heating = Cooling (H = C) curve. This is determined by setting the right hand side of Eq. (18) to zero and using the quasi-stationary solutions given by Eqs. (6) for the three modes. The star continues to evolve on the H = C curve for part $b_1 \to c_1$ of the trajectory as it spins down and cools back to the r-mode stability curve. This spin-down stage lasts a time $t_{{\rm spin-down}\,b_1-c_1} \approx 23{,}000$ yr that is much longer than the heat-up period.
This timescale is very sensitive to changes in the slippage factor and can reach 10^6 yr for smaller values of Sns that are close to the boundary of the steady state region. The cycle is very narrow in angular velocity, with a total angular velocity change of less than 4%, (∆Ω̃)b1−c1 ≈ 0.0066. The temperature also changes by only about 2%, (∆T8)b1−c1 ≈ 0.08, in this spin-down period. Segment c1 → d1 represents the cooling of the star to the initial temperature on a timescale of ∼ 2,000 yr. In part d1 → a1 the star spins up by accretion at constant temperature back to the original crossing point on the r-mode stability curve. This last part of the trajectory is the longest-lasting one, taking ≈ 200,000 yr at our chosen Ṁ of 10^{-8} M⊙ yr^{-1}.

The cycle C2 in Fig. 2(b) proceeds in a similar fashion. It is important to note that this configuration is close to the border between the “FAST RUNAWAY” and “CYCLE” regions and therefore close to the peak of the H = C curve. Configurations above this peak (e.g., with the same fdU and higher Sns) will go through a fast runaway.

Fig. 3(a) shows the evolution of the three modes in the first few years after the star first reaches the r-mode stability curve. In this region the r-mode is unstable and initially grows exponentially. Once it has increased above the first parametric instability threshold the daughter modes are excited. The oscillations of the three modes display some of the typical dynamics of a driven three-mode system. When the r-mode transfers energy to the daughter modes they increase exponentially while the r-mode decreases. Similarly, when the daughter modes decrease the r-mode increases. The viscosity damps the oscillations and the r-mode amplitude settles at a value close to the parametric instability threshold. Fig. 3(b) displays the evolution of the r-mode amplitude divided by the parametric instability threshold on a longer timescale.
It can be seen that the r-mode never grows significantly beyond this first threshold. Fig. 3(c) shows the evolution of the parametric instability threshold as a function of time. The threshold increases as the temperature increases and the star is viscously heated by the three modes. When the star spins down in thermal equilibrium, the threshold decreases to a value close to its initial value.

B. Steady State Evolution

This sub-section focuses on evolutions that lead to a steady equilibrium state in which the rate of accretion of angular momentum is balanced by the rate of loss via gravitational radiation emission. This scenario is restricted to stars with small slippage factor (Sns ≲ 0.04, see Fig. 1(b)) and boundary layer viscosity. A typical trajectory of a star that reaches such an equilibrium is shown in Fig. 4. As always, we start the evolution at the point on the r-mode stability curve at which the nuclear heating balances neutrino cooling. Above the r-mode stability curve the gravitational driving rate is greater than the viscous damping rate and the r-mode grows exponentially until nonlinear effects become important. In this case, as in the cyclic evolution, the triplet of modes at the lowest parametric instability threshold is sufficient to stop the thermal runaway. The r-mode remains close to the first instability threshold for the length of the evolution and after a few oscillations the three modes settle into their quasi-stationary states, which change only secularly as the spin and temperature of the star evolve. The modes heat the star viscously at constant Ω̃ in segment a → b of the trajectory for t_heat-up ≈ 1,100 yr. At point b, the star reaches a state of thermal balance.
In leg b → c the star continues its evolution in thermal equilibrium and slowly spins up due to accretion until the angular velocity evolution also reaches an equilibrium. The timescale to reach an equilibrium steady state is t_steady ≈ 3.5 × 10^6 yr for this set of parameters.

FIG. 3: (a) The amplitudes of the r-mode |Cα| and of the n = 13, m = −3 and n = 14, m = 1 inertial modes |Cβ| and |Cγ| are shown as a function of time for a star that executes a cyclic evolution (same parameters as in Fig. 2). The lowest parametric instability threshold is also displayed. (b) The ratio of the r-mode amplitude to the parametric instability threshold is plotted as a function of time. It can be seen that once the r-mode crosses the parametric instability threshold it remains close to it for the rest of the evolution. (c) The parametric instability threshold is displayed as a function of time. Its value changes as the angular velocity and temperature evolve.

FIG. 4: The trajectory of a neutron star in the Ω̃ − T8 phase space is shown for a model with Tc = 5.0 × 10^9 K, fdU = 0.03 and Sns = 0.03 that reaches an equilibrium steady state. The star spins up until it crosses the r-mode stability curve and the r-mode becomes unstable. The r-mode then quickly grows to the first parametric instability threshold and excites the daughter modes. In leg a → b of the trajectory the star is viscously heated by the mode triplet until the system reaches thermal equilibrium. Segment b → c shows the star continuing to heat and spin up in thermal equilibrium until the accretion torque is balanced by the gravitational radiation emission. The r-mode stability curve represents the points in phase space where the viscous damping rate is equal to the gravitational driving rate. The H = C curve is the locus of points where the viscous dissipation due to the mode triplet balances the neutrino cooling.

FIG. 5: The (Ω̃, T8) initial values (region delimited by the solid line) that lead to equilibrium steady states and their corresponding final steady state values (region enclosed by the dashed line) are shown. Since both the initial and final values of T8 are low, these evolutions are roughly independent of Tc.

Fig. 5 displays the possible initial values for the angular velocity Ω̃ and temperature T8 of the star that lead to a balance between the accreted angular momentum and the angular momentum emitted in gravitational waves. The fraction of the star that is above the threshold for direct URCA reactions and the slippage factor are varied within the corresponding “STEADY STATE” region of Fig. 1(b). The final equilibrium values are also displayed and cluster in a narrower region than the initial values. Because viscosity is so small in this regime, the values of Ω also tend to be small. Thus, although an interesting physical regime, this case is most likely not relevant to recycling by accretion to create pulsars with spin frequencies as large as 716 Hz.

Note that a steady state can be achieved when Sns = 0. This is the probable end state of the problem first calculated by Levin [18]. The reason we do not find a cycle at low Sns is twofold: (1) the shear viscosity we are using is lower (shear viscosity in Ref. [18] is amplified by a factor of 244), and (2) the nonlinear couplings keep all mode amplitudes small.

C. Thermal Runaway Evolutions

We now consider evolutions in which the three-mode system is not sufficient to halt the thermal runaway. We observe two such scenarios. In the first scenario, the star is unable to reach thermal equilibrium. The runaway occurs on a timescale much shorter than the accretion timescale and so the whole evolution is at approximately constant angular frequency.
In the second scenario, the star reaches a state of thermal equilibrium but the spin evolution does not reach a steady state. The star continues to spin up by accretion until it climbs to the peak of the H = C curve, thermal equilibrium fails and a runaway occurs.

1. Fast Runaway

A typical trajectory of a star that goes through a rapid thermal runaway is displayed in Fig. 6. This star has Sns = 0.25 and fdU = 0.058. Initially, the growth of the r-mode is halted by the two daughter modes once the lowest parametric instability threshold is crossed, and the three modes settle in the (Ω, T)-dependent quasi-stationary states of Eqs. (6). They viscously heat up the star until hyperon bulk viscosity becomes important for the daughter modes. As the amplitudes of the daughter modes decrease, the coupling is no longer strong enough to drain enough energy to stop the growth of the r-mode. The daughter modes are completely damped and the r-mode increases exponentially. The system goes back to the one-mode evolution described by Ref. [24].

FIG. 6: This plot compares the full evolution resulting from solving Eqs. (4), (13), (18) with the reduced Ω − T evolution that assumes the amplitudes go through a series of steady states, Eqs. (19)-(20), for a model with Tc = 5.0 × 10^9 K, fdU = 0.058 and Sns = 0.25. (a) The temperature is displayed as a function of time for the two different methods. (b) The angular velocity Ω̃ = Ω/Ωc is shown as a function of temperature. The evolution occurs at constant spin frequency. It can be seen that the steady-state amplitude approximation is extremely good. The ‘X’ shows the point at which the r-mode crosses its second lowest parametric instability threshold, where additional dissipation would become operative.

FIG. 7: The trajectory of a neutron star in the Ω̃ − T8 phase space is shown for a model with Tc = 5.0 × 10^9 K, fdU = 4.0 × 10^{-5} and Sns = 0.02 that goes through a slow thermogravitational runaway. Portion a → b of the trajectory shows the mode triplet heating up the neutron star through boundary layer and shear viscosity until the system reaches thermal equilibrium. Segment b → c represents the accretional spin-up of the star in thermal equilibrium. The dotted-dashed line is the locus of points where the viscous dissipation of the mode triplet is equal to the neutrino cooling, and is labeled as the H = C curve. The star reaches the maximum of this curve and fails to reach an equilibrium between the accretion torque and gravitational emission. It then continues heating at constant angular velocity and crosses its second lowest parametric instability threshold, at which point more modes would need to be included to make the evolution accurate. Eventually the star reaches the r-mode stability curve again.

Fig. 6(a) and (b) compare both the temperature evolution and the trajectory in the Ω̃ − T8 plane of the star for a simulation solving the full set of equations to a simulation that assumes quasi-stationary solutions for the three amplitudes and evolves only the angular velocity and temperature of the star. It can be seen that the steady state approximation is very good until the thermal runaway occurs. Afterward, the temperature evolution of the reduced equations is offset slightly from the quasi-steady result and intersects the r-mode instability curve sooner. This evolution is similar to that described by Nayyar and Owen [24]. However, the r-mode crosses its second lowest parametric instability threshold much earlier in the evolution (see the ‘X’ in the figure), and at that point more modes need to be included to model the instability accurately. Thus, we cannot be sure that a runaway must occur in this case.
We shall return to this issue in a subsequent paper.

2. Slow Runaway

In this section we examine evolutions in which the neutron star has both a very small slippage factor, Sns ≲ 0.03, and only a small percentage of the star above the threshold for direct URCA reactions, fdU < 5 × 10^{-5}. A trajectory for this kind of evolution is displayed in Fig. 7. After the star crosses the r-mode stability curve, the r-mode increases beyond the first parametric instability threshold, and its growth is temporarily stopped by energy transfer to the daughter modes. As in the previous scenarios, the star is viscously heated by the mode triplet at constant Ω in part a → b of the trajectory on a timescale of about 5,000 yr. At point b, it reaches thermal equilibrium. In leg b → c of the trajectory, the star continues its evolution by ascending the H = C curve and spinning up because of accretion for about 2 × 10^6 yr without finding an equilibrium state for the angular momentum evolution. Once it reaches the peak of the H = C curve, the cooling is no longer sufficient to stop the temperature from increasing exponentially and a thermal runaway occurs. The cross mark ‘X’ on the trajectory shows the point at which the r-mode amplitude crosses its second lowest parametric instability threshold. At this stage more inertial modes need to be included to model the rest of this evolution correctly. As for the cases that evolve to steady states, these long-timescale runaways tend to occur at low spin rates.

FIG. 8: The spin-down timescale is shown as the slippage factor Sns and the fraction of the star subject to direct URCA reactions fdU are varied for cyclic evolutions at a fixed hyperon critical temperature of Tc = 5.0 × 10^9 K. This timescale dominates the heat-up timescale and hence represents the time the star spends above the r-mode instability curve. It increases as the viscosity is lowered and the star gets closer to the steady state region.

V. PROBABILITY OF DETECTION

Fig.
8 shows how the time the star spends above the r-mode stability curve changes when Sns and fdU are var- ied. For large enough values of Sns the boundary layer viscosity dominates. In this region of phase space the spin-down timescale can be approximated by tspin−down = dΩ̃ (32) FIG. 9: (a)The stability regions are plotted at fixed hyperon superfluidity temperature Tc = 6.5×10 9 K, while varying fdU and Sns. The steady state region remains roughly the same as in Fig. 1(b), the slow run-away region disappears, and the cycle region increases dramatically while shrinking the fast- runaway region. (b) The spin-down timescale is shown for the cyclic evolutions in part (a). Ĩτ0GR (4κ̃)2ω̃βω̃γ |δω̃|2 |C̄α|2 <Ω̃>6 ≈ 250 yr ∆νkHz <νkHz>7 M1.4R |cthα | where M1.4 = M/(1.4M⊙), R6 = R/(10 6cm), νkHz = ν/1kHz, Ĩ = 0.261 [17], the r-mode am- plitude at its parametric instability threshold |cthα | ≈ |δω̃|/(4κ̃ ω̃βω̃γ) ≈ 1.5×10−5, and C̄α = Ω̃|cα|/|cα|th. This approximation agrees with spin-up timescales ob- tained from our simulations to ∼ 25%. The maximum ν is approximately the same as the ini- tial frequency, and can be determined by equating the driving and damping rate of the r-mode, since it is on the r-mode stability curve νmax ≈ 800Hz M1.4R6 )4/11 . (33) Thus, the spin-down timescale is very sensitive to the slippage factor tspin−down ∝ S−24/11ns (∆νkHz/νkHz). The dependences on fdU and accretion rate Ṁ are much weaker; a rough approximation, obtained by matching direct URCA cooling and nuclear heat- ing, is T8 in ∝ Ṁ1/6f−1/6dU R 1.4 , and νmax ∝ dU Ṁ −1/33R −34/99 1.4 . The gravitational wave amplitude measured at distance d [46, 47] is h ≈ 1.6R τ0GRc Ω̃3|cα| (34) ≈ 3× 10−25 10kpc M1.4R Taking ν ≈ νmax gives h ∝ S12/11ns M −1/33 1.4 R dU Ṁ −1/11. (35) The maximum distance at which sources could be de- tected by Advanced LIGO interferometers, assuming hmin = 10 −27, [46] is dmax ≈ 3Mpc 10−27 M1.4R |cthα | ≈ 1.5Mpc 10−27 S12/11ns M −1/11 1.4 R 21/11 × T−6/118 |cthα | Eqs. 
(33) and (36) imply that gravitational radiation from the r-mode instability may only be detectable for sources in the Local Group of galaxies. Eq. (33) implies that for accretion to be able to spin up neutron stars to ν ≳ 700 Hz, we must require [Sns/(M1.4 R6 T8in^{1/2})]^{4/11} ≳ 1. Assuming this to be true, dmax ≲ 1-1.5 Mpc. However, t_spin-down ≈ 1000 yr at most, making detection unlikely for any given source. Moreover, unless Sns can differ substantially from one neutron star to another, only those with ν given by Eq. (33) can be r-mode unstable. Slower rotators, including almost all LMXBs, are still in their stable spin-up phases.

Still more seriously, Fig. 1(b) shows that spin cycles are only possible for Sns ≲ 0.50, assuming Tc ≈ 5.0 × 10^9 K; Eq. (33) then implies ν ≲ 450 Hz. This would restrict detectable gravitational radiation to galactic sources, although the duration of the unstable phase could be longer.

Within the context of our three-mode calculation, Sns > 0.50, which is needed for explaining the fastest pulsars, would imply fast runaway. There are two possible resolutions to this problem. One is that including additional modes prevents the runaway; we shall investigate this in subsequent papers. The second is that Tc is larger, or that neutron stars do not contain hyperons (e.g., because they are insufficiently dense). Fig. 9(a) shows the same phase plane as Fig. 1(b) but with Tc = 6.5 × 10^9 K, and Fig. 9(b) shows the results for t_spin-down analogous to Fig. 8. Larger Tc permits spin cycles for higher values of Sns (and hence ν), but the time spent in the unstable regime is shorter.

VI. CONCLUSIONS

In this paper, we model the nonlinear saturation of unstable r-modes of accreting neutron stars using the triplet of modes formed from the n = 3, m = 2 r-mode and the first two near-resonant modes that become unstable (n = 13, m = −3 and n = 14, m = 1) by coupling to the r-mode.
This is the first treatment of the spin and thermal evolution in which the nonlinear saturation of the r-mode instability provides a physical cutoff by energy transfer to other modes in the system. The model includes neutrino cooling and shear, boundary layer and hyperon bulk viscosity. We allow for uncertainties in poorly understood neutron star physics by varying the superfluid transition temperature, the slippage factor that regulates the boundary layer viscosity, and the fraction of the star that is above the density threshold for direct URCA reactions. In all our evolutions we find that the mode amplitudes quickly settle into a series of quasi-stationary states that can be calculated algebraically, and depend weakly on angular velocity and temperature. The evolution continues along these sequences of quasi-steady states as long as the r-mode is in the unstable regime. The spin and temperature of the neutron star can follow several possible trajectories depending on interior physics. The first part of the evolution is the same for all types of trajectories: the star viscously heats up at constant angular velocity.

If thermal equilibrium is reached, we find several possible scenarios. The star may follow a cyclic evolution, and spin down and cool in thermal equilibrium until the r-mode enters the stable regime. It subsequently cools at constant Ω until it reaches the initial temperature. At this point the star starts spinning up by accretion until the r-mode becomes unstable again and the cycle is repeated. The time the star spends in the unstable regime is found to vary between a few hundred years (large Sns ∼ 1) and 10^6 yr (small Sns ∼ 0.05). Our cycles are different from those previously found by Ref. [18] in that our amplitudes remain small, ∼ 10^{-5}, which slows the viscous heating and causes the star to spend more time in the regime where the r-mode instability is active.
Furthermore, we find that the star stops heating when it reaches thermal equilibrium and not when the r-mode reaches a maximum value. The cycles we find are narrow, with the spin frequency of the star changing less than 10% even in the case of high spin rates ∼ 750 Hz.

Other possible trajectories are an evolution toward a full steady state in which the accretion torque balances the gravitational radiation emission, and a very slow thermogravitational runaway on a timescale of ∼ 10^6 yr. These scenarios occur for very low viscosity (Sns ≲ 0.04). Although theoretically interesting, they do not allow for very fast rotators of ∼ 700 Hz.

Alternatively, if the star does not reach thermal equilibrium, we find that it continues heating up at constant spin frequency until it enters a regime in which the r-mode is no longer unstable. This evolution is similar to that predicted by Nayyar and Owen [24]. However, the r-mode grows above its second parametric instability threshold fairly early in its evolution and at this point more inertial modes should be excited and the three-mode model becomes insufficient. Modeling this scenario accurately is a subject of future work.

We have focused on cases with Tc ≳ 5 × 10^9 K. These are cases for which the nonlinear effects are substantial. In this regime, hyperon bulk viscosity is not important except for thermal runaways, where we expect other mode couplings, ignored here, to play important roles. Fast rotation requires large dissipation, as has long been recognized [18, 30], and these models can only achieve ν ≳ 700 Hz if boundary layer viscosity is very large. Alternatively, at lower Tc ≲ 3 × 10^9 K, large rotation rates can be achieved at r-mode amplitudes below the first parametric instability threshold [1]. Nayyar and Owen found that increasing the mass of the star for the same equation of state makes the hyperon bulk viscosity become important at lower temperatures [24].
Conceivably, there are accreting neutron stars with relatively low masses that have lower central densities and small hyperon populations. These could evolve as detailed here and only spin up to modest frequencies. Hyperons could be more important in more massive neutron stars, leading to larger spin rates and very small steady state r-mode amplitudes, as found by Wagoner [1].

Our models imply small r-mode amplitudes of ∼ 10^{-5} and therefore gravitational radiation detectable by Advanced LIGO interferometers only in the Local Group of galaxies, up to a distance of a few Mpc. The r-mode instability puts a fairly stringent limit on the spin frequencies of accreting neutron stars of νmax ≈ 800 Hz [Sns/(M1.4 R6)]^{4/11} T8^{-2/11}. In order to allow for fast rotators of ≳ 700 Hz in our models, a large boundary layer viscosity with [Sns/(M1.4 R6 T8in^{1/2})]^{4/11} ∼ 1 is required. For slippage factors of order 1, the r-mode is unstable for at most ∼ 1000 yr, which is a factor of about 10^3 shorter than the accretion timescale. This would mean that only about 1 in 1000 LMXBs in the galaxy are possible LIGO sources. Lower slippage factors lead to a longer duration of gravitational wave emission, but also to lower frequencies. We also note that in this model we have considered only very fast accretors with Ṁ ∼ 10^{-8} M⊙ yr^{-1}, and most LMXBs in our galaxy accrete at slower rates. Investigations with more accurate nuclear heating models are a subject for future work.

Our analysis could be made more realistic in several ways, such as by including the effects of magnetic fields, compressibility, multi-fluid composition [48], superfluidity, superconductivity, etc. These features would render the model more realistic, but its generic features ought to persist, since the upshot would still be a dense set of mode frequencies exhibiting three-mode resonances and parametric instabilities with low threshold amplitudes.
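The frequency limit and the detection estimate quoted above lend themselves to a quick numerical check. The following sketch simply evaluates νmax ≈ 800 Hz [Sns/(M1.4 R6)]^{4/11} T8^{-2/11} and its inverse. The function names, the defaults M1.4 = R6 = 1, the choice T8 = 3.29 (the initial temperature of the cyclic examples in Sec. IV A) and the 10^6 yr accretion timescale implied by the factor of 10^3 are illustrative assumptions, not values fixed by the model.

```python
def nu_max_hz(s_ns, t8, m14=1.0, r6=1.0):
    """Spin-frequency limit in Hz from the scaling quoted in the conclusions.

    s_ns: slippage factor; t8: temperature in units of 1e8 K;
    m14: mass in units of 1.4 solar masses; r6: radius in units of 1e6 cm.
    """
    return 800.0 * (s_ns / (m14 * r6)) ** (4.0 / 11.0) * t8 ** (-2.0 / 11.0)


def required_slippage(nu_hz, t8, m14=1.0, r6=1.0):
    """Invert nu_max_hz: slippage factor needed to reach nu_hz at fixed t8."""
    return m14 * r6 * (nu_hz / 800.0 * t8 ** (2.0 / 11.0)) ** (11.0 / 4.0)


def duty_cycle(t_unstable_yr, t_accretion_yr):
    """Fraction of systems expected to be r-mode unstable at a given time."""
    return t_unstable_yr / t_accretion_yr


if __name__ == "__main__":
    t8 = 3.29  # assumed temperature, taken from the cyclic examples
    s_needed = required_slippage(700.0, t8)
    print(f"S_ns needed for 700 Hz at T8 = {t8}: {s_needed:.2f}")
    print(f"round trip: nu_max = {nu_max_hz(s_needed, t8):.1f} Hz")
    # 1000 yr unstable phase against an assumed 1e6 yr accretion timescale
    print(f"duty cycle: {duty_cycle(1.0e3, 1.0e6):.0e}")
```

For T8 = 3.29 the inverse relation gives a slippage factor slightly above 1 for ν = 700 Hz, in line with the requirement of slippage factors of order 1, and a 1000 yr unstable phase in a 10^6 yr cycle reproduces the 1 in 1000 estimate.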
Although the behavior of the star would differ quantitatively in a model different from ours in detail, we expect the qualitative behaviors we have found to be robust, as they are well described by quasi-stationary mode evolutions whose slow variations are determined by competitions between dissipation and neutrino cooling, and accretion spin-up and gravitational radiation spin-down. In our model, it seems that three-mode evolution involving interactions of the r-mode with two daughters at the lowest parametric instability threshold is often sufficient to quench the instability. Our treatment is inadequate to follow what happens when the system runs away; for this, coupling to additional modes is essential. For this regime, a generalization of the work of Brink et al. [26, 27, 28] that includes accretion spin-up, viscous heating and neutrino cooling would be needed. Such a calculation is formidable even in a “simple” model involving coupled inertial modes of an incompressible star.

Acknowledgments

It is a pleasure to thank Jeandrew Brink and Éanna Flanagan for useful discussions. RB would especially like to thank Jeandrew for useful discussions, encouragement and advice at the beginning of this project, without which the project would not have been started. RB is very grateful to Gregory Daues for steady encouragement and support for the duration of this project, and also to Gabrielle Allen and Ed Seidel. This research was funded by grants NSF AST-0307273, NSF AST-0606710 and NSF PHY-0354631.

APPENDIX A

This appendix sketches the derivation of Eqs. (1) from the Lagrangian density. We closely follow Appendix A in Schenk et al., which contains the derivation of the equations of motion for constant Ω. The Lagrangian density as given by Eq. (A1) in Schenk et al.
[29] is L = 1 ξ̇ · ξ̇ + 1 ξ̇ ·B · ξ − 1 ξ ·C · ξ + aext(t) · ξ, (A-1) where the operators B · ξ = 2Ω× ξ and ρ(C · ξ)i = −∇i(Γ1p∇jξj) +∇ip∇jξj + ρ∇iδφ (A-2) − ∇jp∇iξj + ρξj∇j∇iφ+ ρξj∇j∇iφrot with φrot = −(1/2)(Ω × x)2. We are interested in a situation where the uniform angular velocity of the star changes slowly on the timescale of the rotation period itself. In order to remove the time dependence we define the new displacement and time variables , dτ = Ωdt. (A-3) In terms of these new variables the Lagrangian density can be written as L̃ = 1 ξ̃′ · ξ̃′ + 1 ξ̃′ · (B̃ · ξ̃) + ( |ξ̃|2 (A-4) ξ̃ · C̃ · ξ̃ + aext(t) · ξ̃, where the primes denote derivatives with respect to τ , B̃ = Ω−1B and C̃ = Ω−2C. The momentum canonically conjugate to ξ̃ is = ξ̃′ + Ω̂× ξ̃. (A-5) The associated Hamiltonian density is B̃ · ξ̃ |ξ̃|2 + ξ̃ · C̃ · ξ̃ − · ξ̃. (A-6) Hamilton’s equations of motions can be written as ζ̃′ = T · ζ̃ + F(τ), (A-7) where the operator T is T = T0 + T1 with B̃2 − C̃ − 1 Ω)′′√ F(τ) = We assume solutions of the form ζ̃(τ,x) = eiω̃tζ̃(x). Spe- cializing to the case of no forcing term aext = 0 leads to the eigenvalue equation (T0 − iω̃)ζ̃(x) = 0. (A-8) Since the operator T0 is not Hermitian it will have dis- tinct right and left eigenvectors. Similar to Schenk et al. [29] we label the right eigenvectors of T as ζ̃A, and the associated eigenfrequencies as ω̃A = ωA/Ω, and the eigenvalue equation above becomes (T0 − iω̃A)ζ̃A(x) = 0. (A-9) The left eigenvectors χA satisfy 0 − iω̃⋆A)χ̃A = 0, (A-10) where B̃2 − C̃ For simplicity, in this appendix we specialize to the case of no Jordan chains when the set of right eigenvectors forms a complete basis. The orthonormality relation be- tween right and left eigenvectors is χ̃A, ζ̃B d3xρ(x)χ̃ A · ζ̃B = δAB. (A-11) We can expand ζ(τ,x) in this basis as ζ(τ,x) = CA(τ)ζA(x), (A-12) where the coefficients CA are given by the inverse of this mode expansion CA(τ) = χ̃A, ζ̃(τ,x) . (A-13) Using Eqs. (B-2,A-9,A-11) in Eq. 
(A-7) leads to the equa- tions of motion for the mode amplitudes C′A − iω̃ACA = g(τ) (A-14) + 〈χ̃A, F (τ)〉 , where g(τ) = ( Ω)′′/ Ω. Following Sec. IV of Schenk et al. [29] we replace the externally applied acceleration by the nonlinear acceleration given by Eq. (4.2) of Ref. [29]. The inner product can be written in terms of the displacement variable ξ̃. The left eigenvectors are χ̃A = where τ̃A can be chosen to be proportional to ξ̃A because they satisfy the same matrix equation. τ̃A = −iξ̃A/b̃A, (A-15) which corresponds to Eq. (A-45) in Schenk et al. [29] with the proportionality constant b̃A = Ω −1bA = MR2/ω̃A (also given by Eq. (2.36) of Ref. [29]). The equations of motion for the mode amplitudes be- C′A − iω̃ACA = ig(τ) d3xξ̃⋆A · ξ̃B(A-16) κ̃⋆ABCC where the nonlinear coupling κ̃ABC = κABC/(MR and κABC is explicitly give by Eq. (4.20) of Ref. [29]. The g(τ) integral mixes only modes with mA = mB because of the eimφ dependence of the displacement eigenvectors ξ̃. ( dφei(mA−mB)φ = 0 if mA 6= mB.) So, this term will be zero for our mode triplet. Also, in the case of a single mode triplet there is only one coupling and Eqs. (A-16) take the form of Eqs. (1). APPENDIX B In this appendix we study the behavior of the mode amplitudes and temperature near equilibrium assuming constant angular velocity. We are performing a first order expansion of Eqs. (5) and (18). Similar to Ref. [49], each of the five variables is expanded about its equilibrium (Xj)e as follows Xj(τ̃ ) = {|C̄α|, |C̄β |, |C̄γ |, φ, T8} = (Xj)e[1 + ζj(τ̃ )] (B-1) where the perturbation |ζj | << 1 and j = α, β, γ, T . 
The expansion leads to a first order differential equation for each ζj (γ̃α)e Ω̃|δω̃| ζα − ζβ − ζγ − ζφ (B-2) (γ̃β)e Ω̃|δω̃| ζα − ζβ + ζγ + (γ̃γ)e Ω̃|δω̃| ζα + ζβ − ζγ + φe tanφe γ̃α + γ̃β + γ̃γ Ω̃|δω̃| −γ̃α − γ̃β + γ̃γ Ω̃|δω̃| −γ̃α + γ̃β − γ̃γ Ω̃|δω̃| (γ̃α − γ̃β − γ̃γ)e Ω̃|δω̃| MR2Ω2c γ̃αγ̃β γ̃γ 2κ̃2ω̃αω̃βω̃γΩ̃|δω̃|C(Te)T8e tanφ2e γ̃α v ζα + ω̃βζβ + ω̃γζγ + T8e + ω̃β + ω̃γ ΩcΩ̃|δω̃|C(Te) where the equilibrium amplitudes |Cj |e have been written in terms of the corresponding driving and damping rates using Eqs. (6). Eq. (B-2) can be written in matrix form = Aijζi. (B-3) Let ζj ∝ exp(λτ̃ ). The determinant ||Aij − λδij || = 0 leads to the eigenvalue equation λ5 + a4λ 4 + a3λ 3 + a2λ 2 + a1λ+ a0 = 0. (B-4) The coefficients aj with j = 0, 4 are a4 = 2 tanφe = γ̃β + γ̃γ − γ̃α Ω̃|δω̃| , (B-5) tanφ2e γ̃2β + γ̃ γ + γ̃ (Ω̃|δω̃|)2 + tanφ2e − 1, γ̃αγ̃β γ̃γ (Ω̃|δω̃|)3 tanφ2e 4γ̃αγ̃β γ̃γ (Ω̃|δω̃|)3 tanφe + tanφ 2MR2Ω2c κ̃2ω̃αω̃βω̃γC(Te) (γ̃αγ̃β γ̃γ) (Ω̃|δω̃|)4 tanφe tanφ2e 4γ̃αγ̃β γ̃γ (Ω̃|δω̃|)3 tanφe Ω̃|δω̃|C(Te) The eigenvalues can be approximated as λ1,2 ≈ − − ǫ± i ǫ2 + w2 , (B-6) λ3,4 ≈ ǫ± iw, λ5 ≈ − where ǫ = (a2 − a3a4)/a4 and w = a1/a3. The system is unstable when a2 − a3a4 > 0 or a0 < 0. The first two eigenvalues will have a negative real part as long as γ̃β + γ̃γ > γ̃α. If the heating compensates the cooling of the star a0 ≈ 0 and becomes negative if the star can not reach thermal equilibrium. The other critical stability condition a2 − a3a4 = 0 can be written as Ω̃|δω̃| [1+Γβ+Γγ−(Γ2β+Γ2γ)−(Γβ−Γγ)2(Γβ+Γγ)] = 0, (B-7) where Γβ = γβ/γα and Γγ = γγ/γα. Note that we have ignored the smaller terms of orderO([γ̃α/(Ω̃|δω̃|)]5). This condition can be rewritten by defining variables D1 = Γβ + Γγ and D2 = Γβ − Γγ 2 + 2D1 −D21 −D22 − 2D22D1 = 0. (B-8) If D2 = 0 then the equation has one solutionD1 = 1+ for D1 > 2, which corresponds to Γ = Γβ = Γγ = 1.37 and matches the result of Wersinger et al. [35]. For the viscosity we consider (see Sec. II D) a2 − a3a4 < 0. [1] R. 
Wagoner, Astrophys. J. 578, L63 (2002).
[2] S. Chandrasekhar, Phys. Rev. Lett. 24, 611 (1970).
[3] J. L. Friedman and B. F. Schutz, Astrophys. J. 222, 281 (1978); J. L. Friedman and B. F. Schutz, Astrophys. J. 221, 937 (1978).
[4] N. Andersson, Astrophys. J. 502, 708 (1998).
[5] J. Friedman and S. Morsink, Astrophys. J. 502, 714 (1998).
[6] L. Lindblom, B. J. Owen, and S. M. Morsink, Phys. Rev. Lett. 80, 4843 (1998).
[7] L. Bildsten, Astrophys. J. 501, L89 (1998).
[8] N. Andersson, K. D. Kokkotas, and N. Stergioulas, Astrophys. J. 516, 307 (1999).
[9] N. Andersson, K. Kokkotas, and B. F. Schutz, Astrophys. J. 510, 846 (1999).
[10] G. B. Cook, S. L. Shapiro, and S. A. Teukolsky, Astrophys. J. 423, L117 (1994).
[11] G. B. Cook, S. L. Shapiro, and S. A. Teukolsky, Astrophys. J. 424, 823 (1994).
[12] J. W. T. Hessels et al., Science 311, 1901 (2006).
[13] J. E. Grindlay, Science 311, 1876 (2006).
[14] D. C. Backer et al., Nature 300, 615 (1982).
[15] D. Chakrabarty et al., Nature 424, 42 (2003).
[16] D. Chakrabarty, Astron. Soc. Pac. Conf. Series 328, 279 (2005).
[17] B. J. Owen et al., Phys. Rev. D 58, 084020 (1998).
[18] Y. Levin, Astrophys. J. 517, 328 (1999).
[19] P. Arras et al., Astrophys. J. 591, 1129 (2003).
[20] J. Heyl, Astrophys. J. 574, L57 (2002).
[21] P. B. Jones, Astrophys. Lett. 5, 33 (1970); P. B. Jones, Proc. Roy. Soc. (London) A 323, 111 (1971); P. B. Jones, Phys. Rev. Lett. 86, 1384 (2001); P. B. Jones, Phys. Rev. D 64, 084003 (2001).
[22] L. Lindblom and B. J. Owen, Phys. Rev. D 65, 063006 (2002), astro-ph/0110558.
[23] P. Haensel, K. P. Levenfish, and D. G. Yakovlev, Astron. and Astrophys. 381, 1080 (2002), astro-ph/0110575.
[24] M. Nayyar and B. J. Owen, Phys. Rev. D 73, 084001 (2006), astro-ph/0512041.
[25] N. Andersson, D. I. Jones, and K. D. Kokkotas, MNRAS 337, 1224 (2002).
[26] J. Brink, S. A. Teukolsky, and I. Wasserman, Phys. Rev. D 70, 121501 (2004), gr-qc/0406085.
[27] J. Brink, S. A. Teukolsky, and I. Wasserman, Phys. Rev. D 70, 124017 (2004), gr-qc/0409048.
[28] J. Brink, S. A. Teukolsky, and I. Wasserman, Phys. Rev. D 71, 064029 (2005), gr-qc/0410072.
[29] A. K. Schenk, P. Arras, E. E. Flanagan, S. A. Teukolsky, and I. Wasserman, Phys. Rev. D 65, 024001 (2001), gr-qc/0101092.
[30] L. Bildsten and G. Ushomirsky, Astrophys. J. 529, L33 (2000).
[31] Y. Levin and G. Ushomirsky, MNRAS 322, 515 (2001).
[32] S. Yoshida and U. Lee, Astrophys. J. 546, 1121 (2001).
[33] K. Glampedakis and N. Andersson, astro-ph/0607105, astro-ph/0411750.
[34] Y. S. Dimant, Phys. Rev. Lett. 84, 622 (2000).
[35] J. Wersinger, J. Finn, and E. Ott, Phys. Fluids 23, 1142 (1980).
[36] E. F. Brown, Astrophys. J. 531, 988 (2000).
[37] H. Schatz, Phys. Rep. 294, 167 (1998).
[38] D. G. Yakovlev and K. P. Levenfish, Astron. Astrophys. 297, 717 (1995).
[39] D. G. Yakovlev, K. P. Levenfish, and Yu. A. Shibanov, Soviet Phys.-Uspekhi 42, 737 (1999).
[40] D. G. Yakovlev, A. D. Kaminker, and O. Y. Gnedin, A&A 379, L5 (2001).
[41] D. G. Yakovlev and C. J. Pethick, Ann. Rev. Astron. Astrophysics 42, 169 (2004).
[42] G. Bryan, Philos. Trans. R. Soc. London A 180, 187 (1889).
[43] C. Cutler and L. Lindblom, Astrophys. J. 314, 234 (1987).
[44] K. D. Kokkotas and N. Stergioulas, Astron. and Astrophys. 341, 110 (1999).
[45] J. B. Kinney and G. Mendell, Phys. Rev. D 67, 024032 (2003).
[46] P. R. Brady, T. Creighton, C. Cutler, and B. F. Schutz, Phys. Rev. D 57, 2101 (1998), gr-qc/9702050; P. R. Brady and T. Creighton, Phys. Rev. D 61, 082001 (2000), gr-qc/9812014.
[47] B. J. Owen and L. Lindblom, Class. Quant. Grav. 19, 1247-1254 (2002), gr-qc/0111024.
[48] R. Prix, G. L. Comer, and N. Andersson, MNRAS 348, 625 (2004); N. Andersson and G. L. Comer, MNRAS 328, 1129 (2001); N. Andersson, G. L. Comer, and R. Prix, MNRAS 354, 101 (2004).
[49] R. V. Wagoner, J. F. Hennawi, and J. Liu, Proceedings of the 20th Texas Symposium on Relativistic Astrophysics, 781 (2001), astro-ph/0107229.
ABSTRACT

The nonlinear saturation of the r-mode instability and its effects on the spin evolution of Low Mass X-ray Binaries (LMXBs) are modeled using the triplet of modes at the lowest parametric instability threshold. We solve numerically the coupled equations for the three mode amplitudes in conjunction with the spin and temperature evolution equations. We observe that the mode amplitudes very quickly settle into quasi-stationary states. Once these states are reached, the mode amplitudes can be found algebraically and the system of equations is reduced from eight to two equations: spin and temperature evolution. Eventually, the system may reach thermal equilibrium and either (1) undergo a cyclic evolution with a frequency change of at most 10%, (2) evolve toward a full equilibrium state in which the accretion torque balances the gravitational radiation emission, or (3) enter a thermogravitational runaway on a very long timescale of about $10^6$ years. Alternatively, a faster thermal runaway (timescale of about 100 years) may occur. The sources of damping considered are shear viscosity, hyperon bulk viscosity and boundary layer viscosity. We vary properties of the star such as the hyperon superfluid transition temperature T_c, the fraction of the star that is above the threshold for direct URCA reactions, and the slippage factor, and map the different scenarios we obtain to ranges of these parameters. For all our bound evolutions the r-mode amplitude remains small, $\sim 10^{-5}$.
The spin frequency is limited by boundary layer viscosity to $\nu_{max} \sim 800\ \mathrm{Hz}\ [S_{ns}/(M_{1.4} R_6)]^{4/11} T_8^{-2/11}$. We find that for $\nu > 700$ Hz the r-mode instability would be active for about 1 in 1000 LMXBs and that only the gravitational waves from LMXBs in the local group of galaxies could be detected by advanced LIGO interferometers.
<|endoftext|><|startoftext|>
Introduction

Quantum information processing [23] offers potential improvements in a variety of applications. Computational advantages [26, 14] of quantum computers with many qubits have received the most attention but are difficult to implement physically. On the other hand, technology for manipulating and communicating just a few qubits could be sufficient to create new economic mechanisms by altering the information security and strategic incentives of the underlying game. Examples of quantum mechanisms include the prisoner’s dilemma [10, 11, 7, 8], coordination [17, 21] and public goods provisioning [3]. In particular, a quantum mechanism can significantly reduce the free-rider problem without a third-party enforcer or repeated interactions, both in theory and practice [2].

In this paper, we examine quantum mechanisms for another economic scenario: resource allocation by auction [28]. While traditional auction mechanisms can efficiently allocate resources in many cases, quantum auction protocols offer improvements in preserving privacy of the losing bids and dealing with scenarios in which bidders care about what other bidders win when multiple items are auctioned. Specifically, using quantum superpositions to represent bids prevents the auctioneer and other bidders from viewing the bids during the auction without disrupting the auction process. Furthermore, the auction result reveals nothing but the winning bid and allocation.
The first part of the paper introduces a general quantum auction protocol for various pricing and allocation rules, multiple unit auctions, combinatorial auctions and partnership bids. For simplicity, we focus on the sealed-bid first-price auction. In this auction, each bidder has one opportunity to submit a bid. The winner is the highest bidder, who pays the amount bid for the item. This auction has been well studied both theoretically [28] and experimentally [5, 4], and contrasts with iterative auctions in which bidders can incrementally increase their bids depending on how others bid.

If the auction is not well-matched to the bidders’ preferences, it can introduce perverse incentives and result in poor outcomes, such as lost revenue for the seller or economically inefficient allocations where items are not allocated to those who value them most. Thus it is important to examine the incentives introduced by a proposed auction design. In particular, our auction protocol involves quantum search, which introduces incentive issues beyond those examined in prior quantum games [11].

A full analysis of incentive issues is complicated, even for classical auctions. In this paper we focus on two incentive issues arising from the quantum auction protocol. The first incentive issue arises from the possibility of manipulating the search outcome by altering amplitudes associated with different bids. We show how to revise an adiabatic search method to correct this incentive problem, thereby preserving the classical Nash equilibrium. From a quantum algorithm perspective, this construction of the search illustrates how incentive issues affect algorithm design, in contrast to the more common concern with computational efficiency in quantum information processing. Second, the quantum search for the highest bid is probabilistic, i.e., does not always return the highest bid.
While the probability of finding the correct answer can be made as high as one wishes by using more iterations of the search, the small residual probability of awarding the item to someone other than the highest bidder may change bidding behavior. As a step toward addressing the effect of probabilistic outcomes, we show that, with sufficient steps in the quantum search, altering choices from those of the corresponding deterministic auction gives at most a small improvement for that bidder.

The paper is organized as follows. Sec. 2 describes the quantum auction and the bidding language encoding bids in quantum states. Sec. 3 describes the quantum search method to find the maximum bid. After these sections describing the auction protocol, in Sec. 4 we turn to strategic issues raised by the quantum nature of the auction beyond those in the corresponding classical auctions. Then, in Sec. 5 we give a game theory analysis of some of these strategic possibilities and describe how simple modifications of the quantum search improve the auction outcome, in theory. Sec. 6 generalizes the results to auctions of multiple items, including combinatorial auctions. Sec. 7 describes scenarios for which the quantum protocol offers likely economic advantages in terms of information security and ability to compactly express complex dependencies among items and bidders. Finally, Sec. 8 summarizes the quantum auction protocol and highlights a number of remaining economic questions.

2 Quantum Auction Protocol

In our auction protocol, each bidder selects an operator that produces the desired bid from a prespecified initial state. The auctioneer repeatedly asks the bidders to apply their individual operators in a distributed implementation of a quantum search to find the winning bid. More specifically, the quantum auction protocol for sealed-bid auctions involves the following steps:
1.
Auctioneer announces conventional aspects of the auction: type of auction (e.g., first or second price and any reservation prices), the good(s) for sale, the allowed price granularity (e.g., if bids can specify values to the penny, or only to the dollar), and the criterion used to determine the winner(s), e.g., maximizing revenue for the seller.
2. Auctioneer announces how quantum states will be interpreted, i.e., as specifying a price if only one good is for sale, or a combination of price and a set of goods if combinations are for sale; and also announces the initial quantum state. This state uses p qubits for each bidder. Auctioneer announces the quantum search procedure.
3. Each bidder selects an operator on p qubits. Bidders keep their choice of operator private.
4. Auctioneer produces a set of particles implementing p qubits for each bidder, initializing the set to the announced initial state.
5. Auctioneer and bidders perform a distributed search for the winner.

Fig. 1 illustrates this procedure for two bidders and two steps of the search. A realistic search involves a larger number of steps.

In contrast with other quantum games, e.g., public goods, that involve just one round of interaction, the search required to identify the winners involves multiple rounds of interaction among the participants. The required number of iterations depends on the search method. In practice, the auctioneer could pick the number of iterations based on prior experience with similar auctions, or by simulating several test cases using valuations randomly drawn from a plausible distribution of values for the auction items. Alternatively, the auctioneer could repeat the procedure several times (possibly with steps from each repetition interleaved in a random order) and use the best result from these repetitions.
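The benefit of repeating the procedure can be estimated with a short calculation. As a sketch under assumptions not stated in the text, suppose each repetition independently returns the best allocation with probability q, and the auctioneer keeps the best of r measured results as ranked by the announced criterion. The symbols q, r and ε are introduced here only for illustration.

```latex
% Assumption: repetitions are independent and each one succeeds with probability q, 0 < q < 1.
P_{\text{best found}} = 1 - (1 - q)^{r},
\qquad
1 - (1 - q)^{r} \ge 1 - \epsilon
\;\Longleftrightarrow\;
r \ge \frac{\ln \epsilon}{\ln (1 - q)} .
% Example: q = 0.9 and r = 2 give 1 - 0.1^2 = 0.99.
```

Whether interleaved repetitions really behave independently is an assumption of this estimate, not a property established by the protocol.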
[Figure 1 diagram labels: auctioneer; bidder 1; bidder 2; start; measure state; announce result.]

Figure 1: Schematic diagram of distributed search procedure, showing repeated interactions between auctioneer and bidders, in this case two bidders and two steps of the distributed search.

number of bidders                 n
number of items in auction        m
number of qubits per bidder       p
state of qubits for bidder j      \( \psi_j \)
state of all qubits               \( \Psi = \psi_1 \otimes \dots \otimes \psi_n \)

Table 1: Notation for the quantum auction.

This auction protocol uses a distributed search so bidders’ operator choices remain private. Specifically, the search operation requiring input from the bidders is applied locally by each bidder, giving the overall operator

\[ U = U_1 \otimes U_2 \otimes \dots \otimes U_n \tag{1} \]

where n is the number of bidders and \( U_i \) the operator of bidder i.

3 Quantum Auction Implementation

A quantum auction requires finding the winning bid and corresponding bidder. This procedure has two components: the interpretation of the qubits as bids, and the search procedure to find the winner. The following two subsections discuss these components in the context of a single-item auction. Sec. 6 generalizes this discussion to multiple items.

3.1 Creation and interpretation of quantum bids

We define a bid as the amount a bidder indicates he is willing to pay for the item. An allocation is a list of bids, one from each bidder. The quantum auction protocol manipulates superpositions of allocations. We use an allocation rule to indicate how allocations specify a winner and amount paid.

Example 1. Consider an auction of one item with three bidders, willing to pay $1, $3 and $10 for the item, respectively. We represent these bids as |$1〉, |$3〉 and |$10〉, and the corresponding allocation as the product of these states, i.e., |$1, $3, $10〉 with the ordering in the allocation understood to correspond to the bidders. A simple allocation rule selects the highest bidder as the winner, who pays the high bid.
In this example, this rule results in the third bidder winning, and paying $10 for the item.

Each bidder gets p qubits and can only operate on those qubits. Thus each bidder has \( 2^p \) possible bid values, and can create superpositions of these values. A superposition of bids specifies a set of distinct bids, with at most one allowed to win. The amplitudes of the superposition affect the likelihood of various outcomes for the auction. For a single-item auction, a bidder will typically have only one bid. As discussed below, more complicated superpositions are useful for information hiding.

Specifically, bidder j selects an operator \( U_j \) on p qubits to apply to the initial state \( \psi_{\rm init} \) for that bidder’s qubits, as specified by the auctioneer. The resulting state, \( \psi_j = U_j \psi_{\rm init} \), is a superposition of bids, each of the form

\[ |b_i^{(j)}\rangle \tag{2} \]

where \( b_i^{(j)} \) is bidder j’s bid for the item. The subscript i indicates one of the possible bids that can be specified with p qubits according to the announced interpretation of the bits.

We define the subspace used by bidder j as the set of states spanned by the basis eigenvectors in \( \psi_j \). Only these basis vectors appear in allocations relevant for the search. As bidders apply their operators during the search, the superposition of allocations remains within the subspace of each bidder. In this case, where each bidder applies an operator only to their own qubits, the superposition of allocations is always in a factored form, i.e., \( \Psi = \psi_1 \otimes \dots \otimes \psi_n \). More generally, groups of bidders could operate jointly on their qubits, entangling their bids in the allocations as discussed in Sec. 7.

To exploit information hiding properties of superpositions, the state revealed at the end of the search should specify only the bidder who wins the item and the corresponding bid. To achieve this, instead of a direct representation of bids, we interpret bids formed from the p qubits available to a bidder as containing a special null value, ∅, indicating a bid for nothing.
This null bid has additional benefits in multiple item settings, as discussed in Sec. 6 and Sec. 7.

Example 2. Consider bidder j with two qubits and the initial state \( \psi_{\rm init} = |00\rangle \), corresponding to the vector (1, 0, 0, 0), which is interpreted as the null bid. The other bid states are \( |01\rangle \), \( |10\rangle \) and \( |11\rangle \), corresponding to vectors (0,1,0,0), (0,0,1,0) and (0,0,0,1). These three states are interpreted as three bid values in some preannounced way, e.g., $1, $2 and $3, respectively. The operator

\[ U_j = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix} \]

gives the initial state \( \psi_j = U_j \psi_{\rm init} \) as \( (|00\rangle + |10\rangle)/\sqrt{2} \) and specifies the search subspace whose basis is the first and third columns of \( U_j \) in this example. Thus the possible allocations involve only \( |00\rangle \) and \( |10\rangle \) for this bidder, corresponding to the null bid and a bid of $2, respectively.

In the presence of a null bid, we consider an allocation to be feasible if it contains exactly one bid not equal to ∅. The corresponding allocation rule assigns no winner to infeasible allocations and, for feasible allocations, the winner is the single bidder in the allocation whose bid is not ∅, and he pays the amount bid. This allocation rule corresponds to a first-price single-item auction, except there can be no winner, analogous to the situation in auctions with a reservation price when no bidder exceeds that price.

3.2 Distributed Search

The auctioneer must find the best state according to an announced criterion, e.g., maximum revenue. Specifically, the auctioneer has an evaluation function F assigning a quality value to each allocation. The function F assigns a lower value to infeasible allocations than to any feasible one. An example is F equal to the revenue produced by the allocation if it is feasible, and −1 otherwise.

The auctioneer uses quantum search to find the allocation in the subspace selected by the bidders giving the maximum value for F (e.g., a feasible allocation giving the most revenue to the auctioneer).
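The allocation rule and the example evaluation function above can be sketched in a few lines of Python. This is only an illustration: representing the null bid ∅ by `None`, an allocation by a tuple of bids, and the function names are all choices made here rather than part of the protocol.

```python
from typing import Optional, Sequence, Tuple

NULL = None  # stands for the null bid (an assumption of this sketch)


def is_feasible(allocation: Sequence[Optional[int]]) -> bool:
    """An allocation is feasible if exactly one bid is not the null bid."""
    return sum(bid is not NULL for bid in allocation) == 1


def winner(allocation: Sequence[Optional[int]]) -> Optional[Tuple[int, int]]:
    """Return (bidder index, price paid) for a feasible allocation, else None."""
    if not is_feasible(allocation):
        return None
    for j, bid in enumerate(allocation):
        if bid is not NULL:
            return j, bid
    return None


def evaluate(allocation: Sequence[Optional[int]]) -> int:
    """Example F: revenue for a feasible allocation, -1 otherwise."""
    result = winner(allocation)
    return result[1] if result is not None else -1
```

For instance, `evaluate((None, None, 10))` gives the revenue of the allocation in which only the third bidder bids, while an allocation with two or more non-null bids is scored as infeasible.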
This could be done via repeated uses of a decision-problem quantum search [14, 1] as a subroutine within a search for the minimum threshold value of F giving a solution to the decision problem, e.g., with a classical binary search on threshold values or using results of prior iterations of the decision problem [9]. Alternatively, we could use a method giving the maximum value directly (e.g., adiabatic [12] if run for a sufficiently long time or heuristic methods [15, 16] based on some prior knowledge of the distribution of bidders’ values). For definiteness, we focus on the adiabatic method.

The adiabatic search is conventionally described as searching for the minimum cost state. We use this convention by defining a state’s cost to be the negative of the evaluation function F. The adiabatic search procedure, if run sufficiently slowly, changes the initial superposition into a final superposition in such a way that the amplitude in each initial eigenstate maps to the same amplitude in the corresponding final eigenstate, up to a phase factor (for nondegenerate eigenstates). We refer to this mapping of initial to final eigenstates as a perfect search. In practice, with a finite time for the search, there will be some transfer of amplitude among the eigenstates so the search will not be perfect in the sense defined here. Instead the auction outcome is probabilistic: the auction will not always produce the best outcome when starting from the ground state. For example, an auction intending to find the highest bid could sometimes produce the second highest bid instead.

Conventionally, the search operations are chosen so the uniform superposition is the lowest cost initial eigenstate. In our case, bidders are free to choose their operators and need not create uniform superpositions.

A discrete implementation of adiabatic search consists of the following steps:
• The auctioneer selects a number of search steps S and parameter ∆.
These need not be announced to the bidders.
• The auctioneer initializes the state of all np qubits to \( \Psi_{\rm init} = \psi_{\rm init} \otimes \dots \otimes \psi_{\rm init} = |0, \dots, 0\rangle \), with n factors of \( \psi_{\rm init} \) in the product, where \( \psi_{\rm init} = |0\rangle \) is the initial state for the p qubits of a single bidder.
• The auctioneer sends these initialized qubits to the bidders, who apply their individual operators and then return the qubits to the auctioneer, jointly creating the state
\[ \Psi_0 = U \Psi_{\rm init} \tag{3} \]
• For s = 1, . . . , S, the auctioneer and bidders update the state to
\[ \Psi_s = U D(f) U^{\dagger} P(f) \Psi_{s-1} \tag{4} \]
with f = s/S the fraction of steps completed. The bid operator \( U \) and its adjoint \( U^{\dagger} \) are performed by sending qubits to the bidders as described in Sec. 2. The diagonal matrices D(f) and P(f) are described below.
• The auctioneer measures the state \( \Psi_S \), resulting in specific values for all the bits, from which the winner and prices are determined by the allocation rule described in Sec. 3.1.

The diagonal matrix P(f) adjusts the phases of the amplitudes according to the cost associated with each allocation. In particular, using the cost c(x) = −F(x) for allocation \( |x\rangle \), we have

\[ P_{xx}(f) = \exp\left(-i f c(x) \Delta\right) \tag{5} \]

Similarly, the diagonal matrix D(f) adjusts amplitude phases as defined by a function d(x):

\[ D_{xx}(f) = \exp\left(-i (1 - f) d(x) \Delta\right) \tag{6} \]

The key property of d(x) is assigning the smallest value, e.g., 0, to \( |0\rangle \), thereby making the first column of U the ground state eigenvector. Aside from this key property, the choice of d(x) is somewhat arbitrary. The conventional choice in the adiabatic method uses the Hamming weight of the state, i.e., d(x) equal to the number of 1 bits in the binary representation of x. However, as described in Sec. 5, other choices for d(x) can improve the incentive properties of the auction.

The discrete-step implementation of the continuous adiabatic method [12] involves the limits ∆ → 0 and S∆ → ∞, in which case the final state \( \Psi_S \) has high probability to be the lowest cost state.
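The update rule of Eq. (4) can be simulated classically for a very small auction. The sketch below is an illustration under several assumptions made here, not the authors' implementation: two bidders with one qubit each (bit 0 is the null bid, bit 1 is the bidder's single bid), Hadamard matrices as the bid operators, illustrative bids of 1 and 3, Hamming weight for d(x), and arbitrarily chosen values S = 500 and ∆ = 0.2.

```python
import cmath
import math


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def adiabatic_search(bid_ops, cost, d, steps, delta):
    """Iterate Eq. (4) and return the measurement probability of each allocation."""
    U = bid_ops[0]
    for op in bid_ops[1:]:  # Eq. (1): U = U_1 (x) ... (x) U_n
        U = kron(U, op)
    U_dag = dagger(U)
    dim = len(U)
    psi = matvec(U, [1.0] + [0.0] * (dim - 1))  # Eq. (3): Psi_0 = U |0,...,0>
    for s in range(1, steps + 1):
        f = s / steps
        # P(f) of Eq. (5): phases set by the cost of each allocation
        psi = [cmath.exp(-1j * f * cost[x] * delta) * psi[x] for x in range(dim)]
        psi = matvec(U_dag, psi)
        # D(f) of Eq. (6): phases set by d(x)
        psi = [cmath.exp(-1j * (1 - f) * d[x] * delta) * psi[x] for x in range(dim)]
        psi = matvec(U, psi)
    return [abs(amp) ** 2 for amp in psi]


# Hypothetical two-bidder instance (all values below are illustrative).
r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]               # each bidder's operator: uniform superposition
bids = (1, 3)                       # bids of bidder 1 and bidder 2
# Allocations are indexed as x = 2*x1 + x2, i.e. 00, 01, 10, 11.
F = [-1, bids[1], bids[0], -1]      # revenue if feasible, -1 otherwise
cost = [-value for value in F]      # c(x) = -F(x)
d = [0, 1, 1, 2]                    # Hamming weight of x
probs = adiabatic_search([H, H], cost, d, steps=500, delta=0.2)
best = max(range(4), key=lambda x: probs[x])
print(best, [round(p, 3) for p in probs])
```

With these settings the most probable measured allocation should be index 1, the state in which only bidder 2 (the higher bidder) places a bid. Reducing `steps` makes the search less adiabatic and shifts probability to other allocations.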
In practice, this outcome can often be achieved with considerably fewer steps using a fixed value of ∆, corresponding to a discrete version of the adiabatic method [16].

4 Strategies with Quantum Operators

Ideally, an auction achieves the economic objective of its design (e.g., maximum revenue for the seller). In practice, an auction design may not provide incentives for participants to behave so as to achieve this objective. Usually auction designs are examined under the assumption of self-interested rational participants. In conventional auctions, strategic issues include misrepresentation of the true value, collusion among bidders and false name bidding (where a single bidder submits bids under several aliases). Some of these issues can be addressed with suitable auction rules, e.g., second price auctions encourage truthful reporting of values. Developing suitable designs of classical auctions in a wide range of economic contexts remains a challenging problem [28].

Quantum auctions raise strategic issues beyond those of classical auctions. In our case, every step of the adiabatic search requires each bidder to perform an operation on their qubits. Ideally, the bidder should use the same operator for creating the initial superposition in Eq. (3) as in every step of the search in Eq. (4). In addition, bidders should include the null bid in their subspaces. In the classical first-price sealed-bid auction, the bidder makes one choice: the amount to bid. In our quantum setting, this choice amounts to selecting the subspace to use with the quantum search. The remaining freedom to select U, and possibly a different U for each step in the search, are additional choices provided by the quantum auction.

Bidders may be tempted to exploit the flexibility of choosing operators in several general ways. First, they could use a subspace not including the null bid.
Second, they could use a different operator for creating the initial superposition than they use in the rest of the search, thereby producing an altered initial amplitude that is not the ground state eigenvector. Third, they could change operators during the search. If any such changes give significant probability for low bids to win, bidders would be tempted to make such changes and include a low bid in their subspace, hoping to profit significantly by winning the auction with a low bid.

The remainder of this section describes some strategic issues unique to quantum auctions and possible solutions. We further discuss a game theory analysis of some of these issues in Sec. 5.

4.1 Selecting the Subspace

The use of the null bid in our protocol raises the strategic issue illustrated in the following example:

Example 3. Consider an auction of a single item with two bidders, Alice and Bob. Using operators producing uniform amplitudes for the sake of illustration, they ought to apply operators that create

\[ \frac{1}{\sqrt{2}} \left( |\emptyset\rangle + |b_A\rangle \right) \quad \text{and} \quad \frac{1}{\sqrt{2}} \left( |\emptyset\rangle + |b_B\rangle \right) \]

respectively, where \( b_A \) and \( b_B \) are their desired bids. The initial superposition for all the qubits is the product of these individual superpositions, i.e., \( \Psi_0 \) is

\[ \frac{1}{2} \left( |\emptyset,\emptyset\rangle + |b_A,\emptyset\rangle + |\emptyset,b_B\rangle + |b_A,b_B\rangle \right) \]

If bidders use these same operators during the search, the search algorithm finds the highest revenue allocation, i.e., giving the item to the highest bidder.

Suppose instead Bob picks an operator with a one-dimensional subspace, producing an initial state \( |b_B\rangle \) rather than including ∅. The product superposition is then

\[ \frac{1}{\sqrt{2}} \left( |\emptyset,b_B\rangle + |b_A,b_B\rangle \right) \]

Since the search remains in this subspace and the second allocation is infeasible, the search will return \( |\emptyset,b_B\rangle \) no matter what Alice bids. Thus Bob always wins the item, and can win using the lowest possible bid.

This example shows bidders have an incentive to exclude the null set from their subspace.
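The effect in Example 3 can be checked by listing the basis allocations in the joint subspace and picking the one a perfect search would return. The sketch below is a classical enumeration, not a quantum search. The bid values 10 and 1, the use of `None` for ∅, and the function names are assumptions for illustration.

```python
from itertools import product

NULL = None  # stands for the null bid


def evaluate(allocation):
    """Example F: the winning bid if exactly one bid is non-null, else -1."""
    bids = [bid for bid in allocation if bid is not NULL]
    return bids[0] if len(bids) == 1 else -1


def perfect_search(subspaces):
    """Return (allocation, F) maximizing F over the product of the bidders' subspaces."""
    best = max(product(*subspaces), key=evaluate)
    return best, evaluate(best)


b_A, b_B = 10, 1  # hypothetical bids: Alice bids high, Bob bids low

honest = perfect_search([(NULL, b_A), (NULL, b_B)])     # both include the null bid
bob_deviates = perfect_search([(NULL, b_A), (b_B,)])    # Bob drops the null bid
all_deviate = perfect_search([(b_A,), (b_B,)])          # nobody includes the null bid
print(honest, bob_deviates, all_deviate)
```

In this toy case the honest subspaces award the item to Alice, Bob's one-dimensional subspace makes the allocation in which only Bob bids the sole feasible one, and when both exclude ∅ the only allocation left is infeasible.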
If all bidders make this choice, there will be no feasible allocations in the joint subspace and the auction will always give no winner. For auctions with more than two bidders, selecting subspaces excluding ∅ is a weak Nash Equilibrium for the quantum auction because any other choice by a single bidder still results in no feasible allocations.

4.2 Altering Initial Amplitudes

Strategic choices for bidders also arise from the search procedure itself, even when using the correct subspace consisting of ∅ and the desired bid. In particular, the probabilistic outcome of the search means the optimal bid according to the auction criterion (e.g., highest revenue) will not always win. For the adiabatic search method, bidders could try to arrange for especially tiny eigenvalue gaps between the state corresponding to the best outcome and another state allowing them to win with a low bid. A sufficiently small gap could make the number of steps the auctioneer selects insufficient to give the optimal state with high probability and instead give a significant chance of producing the more favorable outcome. However, because the eigenvalues are a complicated function of the operators of all bidders, and individual bidders do not know the choices made by others, it will be difficult for a bidder to determine how to make such especially small gaps and do so in a way that gives a favorable outcome. Nevertheless, even fairly small probabilities for not finding the optimal state could alter the strategic behavior of the bidders.

A more direct way a bidder can arrange for a low bid to win is by altering the initial state of the adiabatic search to start not in the ground state but in an eigenvector corresponding to one of the first few eigenvalues above the ground state. The adiabatic search takes such eigenvectors, with high probability, to an outcome in which a bid lower than the highest wins.
While a single bidder cannot create an arbitrary initial condition, one bidder can ensure that it is not the ground state. For example, a bidder could choose an operator that gives a nonuniform amplitude for the initial state, in particular \( (|\emptyset\rangle - |b_A\rangle)/\sqrt{2} \), while using the uniform state \( (|\emptyset\rangle + |b_A\rangle)/\sqrt{2} \) as the ground state through the remainder of the search in Eq. (4). This can result in significant probability for a low bid to win, and so a bidder is tempted to deviate from the nominal operator choice. Fig. 2 illustrates this behavior.

[Figure 2 plot: horizontal axis f from 0 to 1; final outcomes labeled “high bid wins”, “low bid wins” and “infeasible”.]

Figure 2: Correspondence between initial basis and the possible allocations for a single item auction with two bidders in the standard adiabatic search. During the search, as f increases from 0 to 1, the eigenvalues of the four states change as shown schematically in the figure. The states for f = 0 correspond to both bidders starting with the ground state, \( |00\rangle \), the two states obtained if one of the bidders starts with a different superposition, \( |01\rangle \) and \( |10\rangle \) (“single-bidder deviation states”), and the state of both bidders starting with different superpositions, \( |11\rangle \) (“2-bidder deviation state”).

Instead of starting in the ground state, the bidder’s choice gives the initial state as a linear combination of the ground state and the single-deviation state for that bidder, denoted as \( |01\rangle \) or \( |10\rangle \) for the two bidders in Fig. 2. Here a “single deviation” state is one that a single bidder can create, i.e., by operating on just the qubits available to that bidder. The adiabatic search splits the degeneracy, thereby giving some probability for the lowest bid to win and some probability for an infeasible allocation. More generally, bidder i uses this strategy by selecting two different operators \( U_i^{\rm init} \) and \( U_i \) to use for forming the initial state and during the search, respectively. These choices result in different joint operators, of the form of Eq. (1), being used in Eqs. (3) and (4).
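A short worked decomposition may make the single-deviation state concrete. The sketch below assumes bidder A's subspace is two-dimensional and writes \( |0\rangle \) and \( |1\rangle \) for the two basis states that \( U_A \) maps into it. The coefficients α and β are notation introduced here, not taken from the text.

```latex
% Assumption: U_A acts on the two-dimensional subspace spanned by |\emptyset> and |b_A>.
U_A|0\rangle = \tfrac{1}{\sqrt{2}}\bigl(|\emptyset\rangle + |b_A\rangle\bigr), \qquad
U_A|1\rangle = \tfrac{1}{\sqrt{2}}\bigl(|\emptyset\rangle - |b_A\rangle\bigr)
\quad \text{(fixed up to a phase by unitarity)}

U_A^{\rm init}|0\rangle = \alpha\, U_A|0\rangle + \beta\, U_A|1\rangle, \qquad
\alpha = \langle 0|U_A^{\dagger} U_A^{\rm init}|0\rangle, \quad
\beta = \langle 1|U_A^{\dagger} U_A^{\rm init}|0\rangle

% For the deviation in the text, U_A^{init}|0> = (|\emptyset> - |b_A>)/\sqrt{2}:
\alpha = \tfrac{1}{2}\bigl(\langle\emptyset| + \langle b_A|\bigr)\bigl(|\emptyset\rangle - |b_A\rangle\bigr) = 0, \qquad
\beta = \tfrac{1}{2}\bigl(\langle\emptyset| - \langle b_A|\bigr)\bigl(|\emptyset\rangle - |b_A\rangle\bigr) = 1
```

Under these assumptions the example deviation puts bidder A's qubits entirely in the single-deviation state. A less extreme choice of the initialization operator would give both α and β nonzero.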
As with selecting a subspace without ∅, if many or all bidders make this choice, the initial state will have significant amplitude in eigenvectors corresponding to large eigenvalues, which produce infeasible outcomes and hence a high probability for no winner. Thus with standard adiabatic search, if everyone uses the same operator for both initialization and search, then each bidder is tempted to use a different initialization operator and bid low, gaining a chance to win with a low bid. However, if multiple bidders attempt this, the outcome will most likely be an infeasible state, with no winner.

We can address this problem by reordering the eigenvalues given by the d(x) function in Eq. (6) so that any change in initial operator by a single bidder increases the probability of an infeasible allocation but not the probability of any feasible allocation with a bid lower than the highest bid. This is possible because bidders only have access to their own bits, so can only form initial superpositions from a limited set of basis vectors. Fig. 3 illustrates the resulting situation. We give an analysis of this approach in Sec. 5.2.

[Figure 3 plot: horizontal axis f from 0 to 1; final outcomes labeled “high bid wins”, “low bid wins” and “infeasible”.]

Figure 3: Correspondence between the initial basis and the possible allocations for a single item auction with two bidders in the search with permuted initial eigenvalues.

4.3 Changing Operator During Search

The distributed search of Eq. (4) has each bidder using the same operator for every step of the search. Bidders might gain some advantage by instead altering their operator during the steps of the search. Gradually changing the operator during the search amounts to a different path from initial to final Hamiltonian during the adiabatic search. Thus, provided the auctioneer uses enough steps, such changes will have at most a minor effect on the outcome probabilities unless the bidder can arrange for particularly small eigenvalue gaps among favorable states.
Such an arrangement is difficult, particularly since the bidder does not know the choices of other bidders and the auctioneer could treat the bits from the bidders in an arbitrary, unannounced order.

More significant changes in outcome are possible with sudden, large changes in the operator during the search. Since the use of bidders’ operators gradually decreases during the search (i.e., \( D_{xx}(f) \) given in Eq. (6) approaches the identity operator as f approaches 1), the most problematic situation is an abrupt change in operator at the beginning of the search. After such a change, the adiabatic search continues its gradual change of states, but instead of starting in the ground state, it will have a linear combination of various states obtained by mapping the original basis onto the basis after the change.

5 Quantum Auction Design

In this section, we focus on mechanism design to reduce incentive issues arising from the quantum aspects of the auction. We analyze incentive issues with the Nash equilibrium (NE) concept commonly used to evaluate auctions [28]. A given set of behaviors for the bidders is an equilibrium if no single bidder can gain an advantage (i.e., higher expected payoff) by switching to another behavior.

Specifically, Sec. 5.1 describes an approach to encouraging bidders to include the null set in their bids. In Sec. 5.2 we show that using the ground state eigenvector is an NE provided bidders do not change the operators during the search. Sec. 5.3 then discusses how the auctioneer can discourage bidders from changing operators. Sec. 5.4 describes how the auction can be made symmetric across the different bidders. We focus on single-item auctions in this section, but the ideas extend to quantum combinatorial auctions, as described in Sec. 6.

5.1 Checking for the Null Set

One approach to the incentive to exclude the null set, described in Sec. 4.1, is for the auctioneer to perform a second search: for the allocation with the most ∅ values.
This search uses the same distributed protocol of Eq. (4) but with separate qubits and a different cost function to define $P(f)$, i.e., setting $c(x)$ to the number of non-∅ values in the allocation $x$. Interleaving the additional search in a random order within the steps of the search for the winning bid prevents bidders from knowing which search a given step belongs to, so bidders could not consistently select different operators for the two searches.

If all bidders include ∅ in their selected subspace, this additional search returns $|\emptyset, \emptyset, \ldots\rangle$. Any bidder found not to have included ∅ could be excluded from winning the auction. At this point the auctioneer could either announce there is no winner, or restart the auction for the remaining bidders without announcing this restart. The adiabatic search has a small but nonzero probability of returning the wrong result, which would lead the auctioneer to conclude incorrectly that some bidder did not include ∅. As long as the probability of such errors is smaller than the error probability of the search for the winner, these errors should not greatly affect the incentive structure of the mechanism.

Alternatively, the auctioneer could use a search completing with probability one in a finite number of steps, i.e., with different choices of $D$ and $P$ in Eq. (4), the auctioneer could implement Grover's algorithm [14] to search for the allocation $|\emptyset, \emptyset, \ldots\rangle$ in the joint subspace of the bidders. Since the auctioneer does not know the size of the subspaces selected by the bidders, the auctioneer would need to try various numbers of steps [1] before concluding $|\emptyset, \emptyset, \ldots\rangle$ is not in the selected subspaces. Unlike the adiabatic search, failure would only indicate some bidder had not included ∅, but not which one. Thus the auctioneer's only alternative in this case is to announce the auction has no winner.
While this approach removes the immediate benefit of not including the null bid, its effect on broader strategic issues in the full auction is an open question.

5.2 The First-Price Sealed-Bid Auction

In this section we examine the incentive structure of the auction with permuted eigenstates described in Sec. 4.2. We first review how game theory applies to auctions. We then consider the quantum auction when the search runs long enough to give successful completion almost always (“perfect search”). Finally, we consider the more realistic case of search with small, but not negligible, probability for non-optimal outcomes.

5.2.1 A Game Theory Approach to Auctions

Game theory is a common approach to evaluating auctions [20, 28]. Consider $n$ people bidding for an item, with person $i$ having value $v_i$ for the item. Unlike discrete choice games, such as the prisoner's dilemma, a strategy for a private value auction involves a bidding function $b(v)$, mapping a bidder's value to a corresponding bid. Theoretical analysis of auctions usually involves identifying a NE strategy, if any. This is a strategy for all players such that no bidder gains by changing this strategy given everyone else is using it. This focus on possible changes by a single bidder assumes bidders do not collude.

A primary issue for auction behavior is how much participants know about other bidders' values. Such knowledge can affect the choice of bid. The most popular model of such knowledge is independent private values, where the $v_i$ are independently drawn from the same distribution. Each bidder knows his own value, but not the values of other bidders. However, the distribution from which values come is common knowledge, i.e., known to all bidders, each bidder knows the others know this fact, and so on.

A final ingredient for the analysis is an assumption of bidders' goals.
For illustration, we use the common assumption that bidders are risk neutral expected utility maximizers, and within the context of the auction, utility is proportional to profit.

We illustrate this approach for a first-price sealed-bid auction, in which each bidder submits a single bid without seeing any of the other bids. This corresponds to the auction scenario considered in this paper. The bidder with the highest bid gets the item and pays the amount of his bid. Thus if bidder $i$ bids $b_i$, his profit is $v_i - b_i$ if he wins the auction and zero otherwise. To avoid possibly losing money, bidders should ensure $b_i \le v_i$, and bids are required to be nonnegative.

In the symmetric case where bidders' values all come from the same distribution, a NE is a bidding function $b(v)$. A bidder's expected payoff is $(v - b(v))P(b)$ where $v$ is his value, $b$ is his bidding function and $P(b)$ is the probability of winning if he is using $b(v)$ (which is also the function others use in equilibrium). Let $F$ be the cumulative distribution of values, i.e., the probability a value is at most $v$, and $n$ be the number of bidders. The equilibrium condition leads to a differential equation satisfied by $b(v)$ [28]. As a simple example, when $v$ is uniformly distributed between 0 and 1, $F(v) = v$ and the NE is $b(v) = (n-1)v/n$. Thus, in the equilibrium strategy, a bidder bids somewhat less than his value and the bid gets closer to the value when there is more competition, i.e., larger $n$. If bidders have differing value distributions, a NE involves a set of bidding functions, $\{b_i(v)\}$. An auction may have multiple equilibria.

5.2.2 Behavior with Perfect Search

With perfect search and non-colluding bidders, if bidders use the same operators for every step of the search, including initialization, and pick a subspace with the null bid, then the adiabatic search described in Sec. 3.2 finds the highest revenue state.
We now show that the auctioneer can choose eigenvalues for the search so that bidders have no incentive to create an initial state different from the ground state. This choice corresponds to the auctioneer selecting an appropriate function $d(x)$ in Eq. (6).

Suppose bidder $i$ uses operator $U_i$, giving the overall operator $U$ with Eq. (1). Suppose all bidders except bidder 1 use the same operator to create the initial state as they use for the subsequent search, but bidder 1 uses two operators: $U_1^{\mathrm{init}}$ to form the initial state and $U_1$ for the search. Thus the initial state produced by bidder 1, $\psi_1 = U_1^{\mathrm{init}} \psi_{\mathrm{init}}$, i.e., the first column of $U_1^{\mathrm{init}}$, is not necessarily equal to the first column of $U_1$ that bidder 1 uses for the subsequent search. Instead, $\psi_1$ may have contributions from all columns of $U_1$, i.e.,

$$\psi_1 = \sum_i \alpha_i |i\rangle \tag{7}$$

where $|i\rangle$ corresponds to column $i$, ranging from 0 to $2^p - 1$, of $U_1$. Combining with the initial state of all other bidders, Eq. (3) gives $\Psi_0 = \sum_i \alpha_i |i, 0, \ldots, 0\rangle$, instead of the initial ground state $|0, 0, \ldots, 0\rangle$.

Significantly, because a bidder can only operate on the $p$ qubits from the auctioneer and not on any of the qubits sent to other bidders, a single bidder can only create a limited set of “single-deviation” initial states. In the case of bidder 1, these states all have the form $|i, 0, \ldots, 0\rangle$. Similarly, if bidder $j$ is the one using different initial and search operators, the states all have the form $|\ldots, 0, i, 0, \ldots\rangle$, where only the $j$th position can be nonzero. Thus, among the $2^{np}$ basis states in the full search space, aside from the correct ground state, only $n(2^p - 1)$ are possible states some single bidder can create when all other bidders use the same operator for initialization and search. More generally, $k$ bidders can create superpositions of $(2^p - 1)^k$ basis states in which none of them use the ground state initially, by selecting different operators for initialization and search.
Thus there are

$$\binom{n}{k} (2^p - 1)^k \tag{8}$$

$k$-deviation states that some set of $k$ bidders can create, while the other $n - k$ bidders use the ground state.

Our formulation has $n(2^p - 1)$ feasible allocations, i.e., situations in which exactly one of the bidders has a non-∅ bid while all other bidders have ∅. To see this, each of the $n$ bidders could have the non-∅ bid, and this bid could have any of $2^p - 1$ values (since the remaining value for the bidder's bits represents ∅). The remaining $n - 1$ bidders have only one choice each, i.e., ∅.

Suppose the auctioneer selects $d(x)$ such that $d(|0, \ldots, 0\rangle) = 0$ is the lowest eigenvalue and $d(x)$ for all single-deviation states $x$ is the largest value, with intermediate values for all other states. Provided the number of infeasible allocations is at least equal to the number of single-deviation states, a perfect search will then map every single-deviation state to an infeasible allocation, resulting in no winner for the auction. This condition amounts to

$$2^{np} - n(2^p - 1) \ge n(2^p - 1) \tag{9}$$

The following claim shows that Eq. (9) is always true in an auction scenario.

Claim 1. Eq. (9) is true for all integers $n, p \ge 1$.

Proof. When $p = 1$, Eq. (9) reduces to $2^{n-1} \ge n$, which is true for all $n \ge 1$.

We prove a stronger condition for $p \ge 2$, namely there are enough infeasible states to handle up to $n - 1$ bidders deviating. Using Eq. (8), this stronger condition is

$$2^{np} - n(2^p - 1) \ge \sum_{k=1}^{n-1} \binom{n}{k} (2^p - 1)^k = 2^{np} - 1 - (2^p - 1)^n \tag{10}$$

with the $k = 1$ term in the sum corresponding to the right-hand side of Eq. (9). Writing $x \equiv 2^p - 1$, Eq. (10) becomes $f(x, n) \equiv x^n - nx + 1 \ge 0$. Since $p \ge 2$, we have $x \ge 3$. For this range of $x$ and for $n \ge 1$, $f(x, n)$ is monotonically increasing in both arguments. To see $f$ is monotonic in $x$, the derivative of $f(x, n)$ with respect to $x$ is $n(x^{n-1} - 1)$, which is nonnegative since $n \ge 1$ and $x > 1$. Similarly, the derivative with respect to $n$ is $x(x^{n-1} \ln(x) - 1)$, which is at least $3(\ln(3) - 1) > 0$ since $n \ge 1$ and $x \ge 3$.
Thus for the relevant range of $n$ and $x$, $f(x, n) \ge f(3, 1) = 1$ so Eq. (10) is true for all $n \ge 1$ and $p \ge 2$. Combining these cases for $p = 1$ and $p \ge 2$ establishes the claim.

Using this claim, we demonstrate the permuted eigenvalue choices remove incentives to alter the initial amplitudes:

Theorem 1. If (a) the auctioneer chooses eigenvalues as described above, (b) $\{b_i^*(v)\}_{i=1}^n$ is an equilibrium for the first-price classical auction, and (c) bidders include the null set as part of their bids and use the same operator in each step in the search except, possibly, for the initial state, then the strategy of using bidding functions $\{b_i^*(v)\}_{i=1}^n$ and the same operator for their initial state as they use in the search is a NE for the corresponding quantum auction.

Proof. Without loss of generality, suppose only bidder 1 deviates and all the other bidders use $\{b_i^*(v)\}_{i=2}^n$ and the same operator for initialization and search. Then, as described above, the initial state $\Psi_0$ is $\sum_i \alpha_i |i, 0, \ldots, 0\rangle$ for some choice of amplitudes $\alpha_i$, with $i$ ranging from 0 to $2^p - 1$. A perfect adiabatic search maps each of these states to a corresponding allocation. In particular, with $d(|0, \ldots, 0\rangle)$ having the smallest value of the function $d(x)$, the lowest cost allocation is produced with probability $|\alpha_0|^2$. This allocation corresponds to the highest bid winning. Moreover, each $|i, 0, \ldots, 0\rangle$ with $i \ne 0$ has the largest value of $d(x)$, and so, because of Eq. (9), maps to an infeasible allocation, giving no winner and hence no value to bidder 1. Hence the expected value for bidder 1 is $|\alpha_0|^2 V$ where $V$ is the value of the expected profit of the corresponding classical auction to bidder 1. Since $|\alpha_0|^2 V \le V$, bidder 1 cannot gain from such a deviation. Furthermore, there is no gain from deviating from the bidding function $b_1^*(v)$ since it will only decrease $V$, because, by assumption, $\{b_i^*(v)\}_{i=1}^n$ is a NE for the corresponding classical auction. Because of Eq.
(9), this discussion applies to deviations by any single bidder, not just bidder 1. Thus, using bidding functions $\{b_i^*(v)\}_{i=1}^n$ and using the same operator for their initial state as they use in the search is a NE.

The stronger condition, Eq. (10), shows that the number of infeasible states is enough to give no winner for any choice of initial amplitudes that up to $n - 1$ bidders can produce, provided $p \ge 2$. Thus if an auctioneer implements a collusion-proof classical auction with the quantum protocol and assigns infeasible states as described, then the resulting quantum auction is collusion-proof up to $n - 1$ bidders for initial amplitude deviations.

The choice for $d(x)$ satisfying the above requirements is not unique. As one example, let $x$ be the state index in the full search space, running from 0 to $2^{np} - 1$. Consider $x$ written as a series of $n$ base-$2^p$ numbers, $|x_1, x_2, \ldots, x_n\rangle$. Define

$$d(x) = -r(x) \pmod{n + 1} \tag{11}$$

where $r(x)$ is the number of nonzero values among $x_1, x_2, \ldots, x_n$. The mod operation gives all $d(x)$ values in the range 0 to $n$. For the initial ground state, $x = |0, \ldots, 0\rangle$, $r(x) = 0$ so $d(x) = 0$, and this is the smallest possible value. Single-deviation states have exactly one of the $x_i$ nonzero, giving $r(x) = 1$ and $d(x) = n$, the largest possible value. More generally, all $k$-deviation states with $k \ge 1$ have $r(x) = k$ so $d(x) = n + 1 - k$. This function definition gives values directly from the representation of the state $x$, so, in particular, the auctioneer can implement it without any knowledge of the subspaces selected by the bidders.

The assumption of perfect search is a sufficient but not necessary condition for the proof of Theorem 1. The necessary conditions are more complicated because we only need that every single bidder deviation maps to a linear combination of infeasible states.
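The example eigenvalue function of Eq. (11) and the state counts of Eq. (8) can be checked by brute force for small auctions. The following is a minimal sketch under our own assumptions, not part of the protocol description: the helper names `digits`, `r`, `d` and `deviation_counts` are hypothetical, and the code enumerates all $2^{np}$ state indices, so it is only practical for small $n$ and $p$.

```python
from collections import Counter
from math import comb


def digits(x, n, p):
    """Split state index x into n base-2^p digits (x_1, ..., x_n)."""
    base = 1 << p
    out = []
    for _ in range(n):
        out.append(x % base)
        x //= base
    return out[::-1]


def r(x, n, p):
    """Number of nonzero digits, i.e. how many bidders deviate in state x."""
    return sum(1 for digit in digits(x, n, p) if digit != 0)


def d(x, n, p):
    """Eq. (11): d(x) = -r(x) mod (n + 1)."""
    return (-r(x, n, p)) % (n + 1)


def deviation_counts(n, p):
    """Count the states having each number k of nonzero digits."""
    return Counter(r(x, n, p) for x in range(1 << (n * p)))


if __name__ == "__main__":
    n, p = 3, 2
    counts = deviation_counts(n, p)
    for k in range(n + 1):
        # Columns: k, enumerated count, count predicted by Eq. (8), d value.
        print(k, counts[k], comb(n, k) * (2**p - 1) ** k, (-k) % (n + 1))
```

Because `d` reads only the digits of the state index, it needs no information about the subspaces the bidders select, which is the property the text relies on.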
Thus mixing among different single-deviation states during search (e.g., due to small eigenvalue gaps among those states), or among states corresponding to two or more bidders deviating, does not affect the proof.

5.2.3 Bounded Number of Search Steps

Theorem 1 shows the quantum auction has the same NE as the classical first price auction if the search is perfect and each bidder uses the same operator for every search step of Eq. (4). Since adiabatic search, run for a finite number of steps, is not perfect, we examine the effect on the NE of an imperfect search. We show that the NE for perfect search, i.e., bidding as in a classical first price auction and using the same operator initially and during the search, is an $\epsilon$-equilibrium for the auction with imperfect search. Furthermore, $\epsilon$ converges to zero as the number of search steps goes to infinity.

A strategic profile is an $\epsilon$-equilibrium [24] if for every player, the gain from unilaterally defecting to another strategy is at most $\epsilon$. This weaker equilibrium concept is useful in our case because determining how to exploit imperfect search is computationally difficult. Specifically, with the small eigenvalue gaps and degeneracy it is hard to know whether imperfect search benefits a particular bidder. Thus computational cost will likely outweigh the small possible gain. In this situation, an $\epsilon$-equilibrium is a useful generalization of NE.

We must prove that for any $\epsilon$ there exists an $N$ so that if the search process uses at least $N$ steps, the equilibrium of the game with a perfect search is also an $\epsilon$-equilibrium when using the actual search. To do so, we bound the possible gain from deviation based on prior knowledge of the range of possible bidder values. That is, we assume the distribution of values has a finite upper bound $\bar{v}$. In our context, one such bound is the maximum bid value expressible by the announced interpretation of each bidder's qubits.

Theorem 2.
If the conditions of Theorem 1 are met, and assuming the possible bidder values are bounded by $\bar{v}$, for any $\epsilon > 0$, there exists an $N$ so that the NE in the quantum auction with a perfect search, shown in Theorem 1, is also an $\epsilon$-equilibrium of the same auction with an imperfect search using $N$ search steps.

Proof. Let $p_h$ be the probability that the highest bid wins. Let $p_{\mathrm{inf}}$ be the probability of reaching an infeasible state. Then $p_o = 1 - p_h - p_{\mathrm{inf}}$ is the probability that a bid other than the highest bid wins. With the adiabatic search, with nonzero eigenvalue gaps, the probability of correctly mapping the initial to final states converges to one as the number of search steps increases. Thus for any $\delta > 0$, there always exists an $N$ where $p_o$ is at most $\delta$.

We define an equilibrium expected payoff function for bidder $i$ with value $v$ as $\pi_i^*(v)$, when all bidders use their equilibrium bidding functions. Without loss of generality, from the perspective of bidder $i$ with value $v$, the probability of achieving the equilibrium payoff, $\pi_i^*(v)$, if that bidder does not deviate is $1 - \delta$. Thus the expected payoff of deviating satisfies

$$\pi_i^{\mathrm{deviate}}(v) \le (1 - \delta)\,\pi_i^*(v) + \delta \bar{v}$$

because (a) the most any bidder can gain is bounded by $\bar{v}$, and (b) with probability $1 - \delta$ the auction either produces no profit ($p_{\mathrm{inf}}$) or is identical to a classical auction ($p_h$). The expected gain $g$ from deviating is the expected payoff from deviating minus the expected payoff with no deviation, i.e., $g = \pi_i^{\mathrm{deviate}}(v) - \pi_i^*(v) \le \delta(\bar{v} - \pi_i^*(v))$, which in turn is at most $\delta \bar{v}$. Thus for any choice of $\delta$, there always exists an $N$ where the maximum deviation benefit is at most $\delta \bar{v}$. For any $\epsilon > 0$, using $\delta = \epsilon / \bar{v}$ in the above discussion shows there always exists an $N$ where the deviation benefit is at most $\epsilon$.

5.3 Testing for Changed Operators During Search

One approach to the incentive issue of changing operators during search, described in Sec.
4.3, is for the auctioneer to test the bidders by randomly inserting additional probe steps in the search. Specifically, suppose at any step of the search the auctioneer, with some probability, decides to check a bidder by sending a new set of qubits in a known state $|\phi\rangle$, while storing the qubits for the search until a subsequent step. For the test step, the auctioneer sets $D$ or $P$ to the identity operator. The state returned by the bidder is then $U_i' U_i^\dagger |\phi\rangle$ or $U_i'^\dagger U_i |\phi\rangle$, depending on which part of the search step in Eq. (4) the auctioneer is testing. Without loss of generality, we consider the former case. Ideally, the bidder uses the same operator, so $U_i' = U_i$ and $U_i' U_i^\dagger$ is the identity.

Suppose the test state is formed from some operator $V$, randomly selected by the auctioneer, $|\phi\rangle = V |0\rangle$. If $U_i' U_i^\dagger$ is not the identity, the returned state has the form $\alpha |\phi\rangle + \beta |\phi_\perp\rangle$, where $|\phi_\perp\rangle$ is some state orthogonal to $|\phi\rangle$ and $|\alpha|^2 + |\beta|^2 = 1$. The auctioneer then applies $V^\dagger$, giving

$$\alpha |0\rangle + \beta |a\rangle \tag{12}$$

for some value $a \ne 0$. The auctioneer then measures this state, getting 0 with probability $|\alpha|^2$, indicating the bidder passes the test. Otherwise, the auctioneer observes a different value, indicating the bidder changed the operator.

Hence the chance of getting caught depends on how often the auctioneer checks, and how big a change the bidder makes in the operator. Larger operator changes are more likely to be caught. This testing behavior is appropriate as small changes are not likely to have much effect on the search outcome, and instead simply act as an alternate adiabatic path from initial to final states. This technique is especially useful for risk averse bidders since then even a small chance of being caught might be enough to prevent bidders from wanting to change operators.

5.4 Assigning Eigenvalues to Subspaces

Quantum search acts on the full space of superpositions of the available qubits, i.e., in our case on all $2^{np}$ configurations of items and bids.
In the auction context, bidders choose operators to restrict the search to a subspace of possible bids, namely the ones they wish to make. Conceptually, the search described above is then restricted to the subspace selected by the bidders.

The search can also be viewed as taking place in the full space of $2^{np}$ configurations. The operator $U$ appearing in the search algorithm is block diagonal (up to a permutation of the basis states), with only the block operating on the selected subspace relevant for the search outcome. This view of the search is that of the auctioneer, who has no prior knowledge of the subspace selected by each bidder. The operator $U$ is not known to any single individual: instead its implementation is distributed among the bidders, with each bidder implementing a part of the overall operator.

The auctioneer chooses the eigenvalues for the initial Hamiltonian and the ordering for the qubits assigned to each bidder. These choices, which could change during the search, affect the incentive structure of the auction as described in Sec. 5.2.

This section describes how the auctioneer's choice of $d(x)$ can give the same eigenvalues when restricted to the subspace actually selected for the search. For simplicity, we suppose each bidder uses a 2-dimensional subspace, consisting of ∅ and the desired bid for the single item. While not essential for the NE results discussed above, uniformity with respect to subspace choices means bidders are treated uniformly, so convergence of the search is independent of the order in which the auctioneer considers the bidders.

5.4.1 An Example

Consider $n = 2$ bidders, each with $p = 2$ bits, representing 4 values: ∅ and three bid values 1, 2, 3.
A set of 2-bit operators to form a uniform superposition of the form $(|\emptyset\rangle + |b\rangle)/\sqrt{2}$, where $b$ is the bid value, 1, 2 or 3, is $1/\sqrt{2}$ times

$$
\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}
$$

which we can denote as $A_1, A_2, A_3$, respectively, with the first columns giving the uniform superposition for each of the three possible bid values. If the bidders select bids $b_1, b_2$, respectively, the overall operator for the search is $U = A_{b_1} \otimes A_{b_2}$, used in Eq. (4) to perform each step of the search. Thus in this case there are 9 possible subspaces the two bidders can jointly select. Up to a permutation, $U$ is block diagonal with the block containing the nonzero entries of the first column, and hence all the nonzero amplitude during the search, equal to

$$
V = \frac{1}{2} \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix}
$$

The search using $U$ in the full 4-bit space is thus equivalent to one taking place in the 2-bit subspace selected by the two bidders using this operator.

The auctioneer's choice of eigenvalues, i.e., the function $d(x)$ used in Eq. (6), should ensure the uniform superposition within the subspace defined by the two bidders has the lowest value, say 0, and all other eigenstates have larger values. One possibility is the standard choice for the diagonal values $d(x)$ when searching in the full space of $2^4$ states defined by the $np = 4$ bits, namely the Hamming weight of each state, i.e., the number of 1 bits in its binary representation, ranging from 0 to 4.

An alternative approach is picking $d(x)$ so eigenvalues for the four states appearing in $V$ have the same values as they would have using the Hamming weight for a 2-bit search, ranging from 0 to 2. Doing so requires selecting the eigenvalues to match the corresponding Hamming weights for any choices the bidders make among $A_1, A_2, A_3$.

In this example, each bidder has 2 qubits, so can represent 4 states, which we denote as $|0\rangle, \ldots, |3\rangle$.
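The block structure claimed for this example can be checked numerically. The sketch below is illustrative only: it assumes the 16 entries of each of the three operators are read row by row from the layout above, uses plain nested lists rather than a linear algebra library, and the helper names `operator`, `kron`, `is_orthogonal` and `selected_block` are our own.

```python
from itertools import product
from math import isclose, sqrt

# Entries of A_1, A_2, A_3 before the common factor 1/sqrt(2),
# read row by row (an assumption about the flattened layout).
A = {
    1: [[1, -1, 0, 0], [1, 1, 0, 0], [0, 0, 1, -1], [0, 0, 1, 1]],
    2: [[1, 0, -1, 0], [0, 1, 0, -1], [1, 0, 1, 0], [0, 1, 0, 1]],
    3: [[1, 0, 0, -1], [0, 1, 1, 0], [0, -1, 1, 0], [1, 0, 0, 1]],
}


def operator(b):
    """Return A_b including the 1/sqrt(2) normalization."""
    return [[v / sqrt(2) for v in row] for row in A[b]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices as nested lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def is_orthogonal(m, tol=1e-12):
    """Check m^T m = I; for real matrices this is the unitarity condition."""
    size = len(m)
    return all(
        isclose(sum(m[k][i] * m[k][j] for k in range(size)), float(i == j), abs_tol=tol)
        for i, j in product(range(size), repeat=2)
    )


def selected_block(b1, b2, tol=1e-12):
    """Return the support of the first column of U and U restricted to it."""
    u = kron(operator(b1), operator(b2))
    support = [i for i in range(len(u)) if abs(u[i][0]) > tol]
    block = [[u[i][j] for j in support] for i in support]
    return support, block


if __name__ == "__main__":
    for b1, b2 in product((1, 2, 3), repeat=2):
        support, block = selected_block(b1, b2)
        # Print twice the block so the entries appear as +1 / -1.
        print(b1, b2, support, [[round(2 * v) for v in row] for row in block])
```

Under these assumptions, each of the 9 bid pairs selects a different set of four basis states, while the restricted block has the same entries in every case.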
The states for both bidders are products of these individual states, $|0, 0\rangle, \ldots, |3, 3\rangle$. Examining the 9 possible cases for $U$ shows a consistent set of choices is $d(|x, y\rangle)$ equal to the number of nonzero values among $x, y$. With this $d(x)$, the adiabatic search in the subspace selected by the bidders is identical to the standard adiabatic search for two bits. This choice treats both bidders identically.

In this case we see the auctioneer can arrange the adiabatic search to operate symmetrically no matter what choice of subspace each bidder makes (i.e., no matter what value each bidder decides to bid). Thus from the point of view of the bidders, the search, in effect, takes place within the subspace of possible values defined by their bid selections.

5.4.2 General Case

For arbitrary numbers of bidders $n$ and bits $p$, we consider a single-item auction so each bidder would, ideally, pick an operator giving just two terms, with $b^{(j)}$ the bid of bidder $j$ for the single item and no bits needed to specify which item the bidder is interested in. The choice of $b^{(j)}$ corresponds to the bidder picking a 2-dimensional subspace of the $2^p$ possible states. The product of these subspaces gives a subspace $S$ of all $np$ qubits used in the auction. The subspace $S$ has dimension $2^n$ and its states $x_S$ can be viewed as strings of $n$ bits.

More specifically, we suppose bidder $j$ implements the operator $U_j$ such that the rows and columns corresponding to ∅ and $b^{(j)}$ have nonzero values only for positions ∅ and $b^{(j)}$. That is, the elements of $U_j$ for these two values form a $2 \times 2$ unitary matrix.

If the auctioneer knew the subspace $S$, the eigenvalue function $d(x)$ used in Eq. (6) could be selected to match any desired function $d_S(x_S)$ of the states $x_S \in S$. Without such knowledge, this is possible only for some choices for $d_S$.

Theorem 3.
Provided $d_S(x_S)$ depends only on the Hamming weight of the states $x_S$, a single choice of $d(x)$ in the full space corresponds to $d_S(x_S)$ in all possible subspaces the bidders could select that include the null set.

Proof. Consider the full operator $U$ given by Eq. (1). For the element $U_{x,y}$, express the $np$ bits defining the states $x$ and $y$ as sequences of $p$-bit values, $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, respectively, with each $x_i$ and $y_i$ between 0 and $2^p - 1$. From Eq. (1),

$$U_{x,y} = \prod_{i=1}^{n} (U_i)_{x_i, y_i}$$

The matrix $U$ is of size $2^{np} \times 2^{np}$ while each $U_i$ is of size $2^p \times 2^p$.

Consider the first column of $U$, i.e., $y = 0$. $U_{x,0}$ is nonzero only for those $x$ such that all the $(U_i)_{x_i, 0}$ are nonzero. For this to be the case, each $x_i$ is either 0 (corresponding to $|\emptyset\rangle$ for that bidder's superposition) or $x_i = b^{(i)}$, i.e., the bid value. The same holds for all columns with each $y_i$ equal to 0 or $b^{(i)}$. These values for $x, y$ are precisely the states in the selected subspace of the bidders, $S$.

For these choices of $x_i, y_i$, we can map 0 (i.e., $p$ bits all equal to zero) to the single bit 0, and each $b^{(i)}$ (specified by values for $p$ bits) to the single bit 1. This establishes a one-to-one mapping from states in the full space, of $np$ bits corresponding to the product of bidders' superpositions, to states in the subspace treated as $n$-bit vectors. Thus a function $d_S(x_S)$ applied to the subspace that depends on the Hamming weight, i.e., the number of 1 bits in $x_S$, is the same as a function on the full space depending on the number of nonzero $x_i$ values in $x = x_1, \ldots, x_n$.

We must show that a single choice of function $d(x)$ in the full space maps to the desired $d_S(x_S)$ in any choice of bidder subspaces. To see this is the case, consider any state in the full space $x = x_1, \ldots, x_n$. Among these $x_i$, suppose $h$ are nonzero, denoted by $x_{a_1}, \ldots, x_{a_h}$. This state $x$ will appear in all selected subspaces in which bidder $a_j$ bids $b^{(a_j)} = x_{a_j}$, for $j = 1, \ldots, h$, and the remaining bidders have any choice of bid.
That is, $x$ appears in $(2^p - 1)^{n-h}$ possible subspaces $S$. Since $x$ has exactly $h$ nonzero values, in each of these possible subspaces it maps to a state $x_S$ with exactly $h$ bits equal to 1, i.e., it has the same Hamming weight, $h$, in all possible subspaces in which it appears. Thus any choice of function $d_S(x_S)$ depending only on the Hamming weight of $x_S$ will have the same value in all these possible subspaces. This observation allows the auctioneer to select that common value as the value for $d(x)$, consistently giving the desired eigenvalue function for any possible subspace. Since this holds for all values of $h$, the auctioneer can operate in the full space with identical search behavior no matter what subspace the bidders select.

For the auctioneer to operate without knowledge of the actual subspace selected by the bidders and treat bidders identically, we need $d(x)$ to map to the same function on any subspace selected. In this case, the search proceeds exactly as if the auctioneer did know the subspace choices made by the bidders. The theorem gives one type of function for which this is the case. In particular, Eq. (11) is an example of a function satisfying this theorem.

6 Multiple Items and Combinatorial Auction

While the paper focuses on the single item first-price sealed-bid auction, the quantum protocol can apply to multiple items by changing the interpretation of the bids, i.e., the bidding language. Such changes affect the counting of deviation and feasible states, so we must check the validity of Theorem 1. In the single item case, each bidder uses the $p$ qubits to specify the bid amount. With multiple items, the bid must specify both the items of interest and a bid amount for the items. Various bidding languages can encode this information.

For multiple items, we divide the $p$ qubits allocated to each bidder into two parts: $p_{\mathrm{item}}$ bits to denote a bundle of items and $p_{\mathrm{price}}$ bits to denote bid value (so $p = p_{\mathrm{item}} + p_{\mathrm{price}}$).
Since qubits are expensive, a succinct representation of items is best. Depending on the type of auction, we have various choices with different efficiency in using bits. For example, the $p_{\mathrm{item}}$ item bits could indicate the item in the bid, allowing $p_{\mathrm{item}}$ qubits to specify up to $2^{p_{\mathrm{item}}}$ different items. Another case is multiple units of a single item, so $p_{\mathrm{item}}$ could specify how many units a bidder wants (with the understanding the bid is for all those units, not a partial amount) so the bits could specify $2^{p_{\mathrm{item}}}$ different numbers. In the general case, bids are on arbitrary sets of items or bundles, and we represent a bundle with $m$ bits, 1 if the corresponding item is a part of the bundle and 0 otherwise, i.e., $m = p_{\mathrm{item}}$. We focus on this general case in the remainder of the section. Allowing bids on sets of items is called a combinatorial auction [6].

With multiple items, the bid operator $\psi_j = U_j \psi_{\mathrm{init}}$ gives a superposition of bids of the form $\sum_i |I_i, b_i\rangle$ where $b_i$ is bidder $j$'s bid for a bundle of items $I_i$. In this notation, the null bid is $|\emptyset, b\rangle$, and the specified amount $b$ is irrelevant so we take it to be zero in the examples. A superposition specifies a set of distinct bids, with at most one allowed to win.

Example 4. Consider a combinatorial auction with two items $X$, $Y$ and integer prices ranging from 0 to 3. Using $p = 4$ bits for each bidder, with 2 bits each to specify item bundles and prices, is sufficient to specify the bids. The full space for a bidder has dimension $2^p = 16$, consisting of 4 possible item bundle choices and 4 price choices. Suppose a bidder places a bid

$$|\emptyset, 0\rangle + |X, 1\rangle + |(X, Y), 2\rangle$$

i.e., a bid of 1 for item $X$ alone, and 2 for the bundle of both items. In this case, the bidder is not interested in item $Y$ by itself. The dimension of the subspace of this bid is 3. Another example is the bid

$$|\emptyset, 0\rangle + |X, 1\rangle + |X, 3\rangle + |(X, Y), 4\rangle$$

The dimension of the subspace is 4. This superposition has multiple bids on the same item $X$.
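The split of each bidder's $p$ bits into bundle bits and price bits can be sketched as follows for the setting of Example 4. This is a minimal illustration under our own assumptions: the paper does not fix a bit ordering, so the choice of placing the bundle mask in the high bits and the price in the low bits, as well as the names `ITEMS`, `encode` and `subspace`, are hypothetical.

```python
ITEMS = ("X", "Y")     # item j occupies bit j of the bundle mask (assumed ordering)
P_ITEM = len(ITEMS)    # m = p_item bits for the bundle
P_PRICE = 2            # p_price bits for the bid amount


def encode(bundle, price):
    """Map a (bundle, price) bid to a basis-state index in 0 .. 2^p - 1."""
    if not 0 <= price < (1 << P_PRICE):
        raise ValueError("price does not fit in the price bits")
    mask = 0
    for item in bundle:
        mask |= 1 << ITEMS.index(item)
    # Assumed layout: bundle mask in the high bits, price in the low bits.
    return (mask << P_PRICE) | price


def subspace(bids):
    """Sorted distinct basis-state indices appearing in a superposition of bids."""
    return sorted({encode(bundle, price) for bundle, price in bids})


if __name__ == "__main__":
    # First bid of Example 4: null bid, 1 for X alone, 2 for the bundle (X, Y).
    bid = [((), 0), (("X",), 1), (("X", "Y"), 2)]
    print(subspace(bid), len(subspace(bid)))
```

With this layout the first bid of Example 4 occupies three distinct basis states, matching the stated subspace dimension of 3. The range check reflects the two price bits, which cover prices 0 to 3; the price of 4 in the second bid of Example 4 would not fit and would need a third price bit under this encoding.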
This bidding language is both expressive and compact. For instance, a superposition of bundles of items readily expresses exclusive-or preferences, where a bidder wants at most one of the bundles. It is also compact because superpositions allow the bidder to use exactly the same qubits to place no bid (i.e., ∅) and to place bids on any of the exponentially many bundles in a combinatorial auction.

An allocation, as defined in Sec. 3.1, is a list of bids, one from each bidder. With multiple items, an allocation is feasible if the item sets are pairwise disjoint. As in the single item case, we consider the allocation in which all item sets are empty to be infeasible. The value of a feasible allocation is the sum of the bid values of the bids in the allocation.

The number of feasible states is $((n+1)^m - 1)\,2^{n p_{\text{price}}}$. This is because $m$ items can be assigned among $n$ bidders, where not all items need be allocated, in $(n+1)^m$ ways. The factor $n+1$ allows for some items to remain unallocated. Since the allocation when all bidders place the null bid is an infeasible state, we subtract 1. Each bidder can specify $2^{p_{\text{price}}}$ different prices for the bundle, giving $2^{n p_{\text{price}}}$ possible choices for $n$ bidders. Note that the number of feasible states for a single item, $m = 1$, is different from that in Sec. 5.2 because here we have changed the bidding language to represent items also.

The null bid in our protocol simplifies the evaluation of allocations for combinatorial auctions. To see this, consider a protocol without the null bid. In the single item case, $F(x)$ for any allocation vector $x$ would be the maximum of the bids placed by the different bidders on the item, which is fairly easy to compute. But in the case of multiple items, there could be several allocations for a vector $x$. For example, suppose Alice bids on the set {A,B} and Bob bids on {B,C}. Without the null set, both bids appear in the same state and have to be evaluated by $F(x)$. The possible allocations to the bidders are

1. none to either,
2. {A,B} to Alice,
3. {B,C} to Bob, and
4. {A,B} to Alice and {B,C} to Bob (which is infeasible).

$F(x)$ would have to compute the maximum of the values in all these states. This is computationally complex when there are many items. By contrast, the bidding language with the null bid avoids this combinatorial evaluation within the search function $F(x)$.

As in the case of single item auctions, we restrict ourselves to a one-shot sealed bid classical combinatorial auction that we implement in a quantum setting. The total number of states is $2^{pn}$ and the total number of single-bidder deviation states is $n(2^p - 1)$. These expressions are the same as in the single item case. The condition for all single-deviation states to be mapped to infeasible allocations, resulting in no winner, is

$$2^{np} - \left((n+1)^m - 1\right) 2^{n p_{\text{price}}} \ge n\,(2^p - 1) \qquad (13)$$

This condition holds for cases relevant for auctions, as seen in the following claim.

Claim 2. Eq. (13) is true for all integers $m, p_{\text{price}} \ge 1$ and $n \ge 2$.

Proof. Recall $p = m + p_{\text{price}}$. We prove a stronger condition for integers $n, m \ge 2$, i.e., there exist enough infeasible states to handle joint deviations of up to $n-1$ bidders. The number of $k$-bidder deviation states is the same as in the single-item case, i.e., Eq. (8). Thus this stronger condition, with the same right-hand side as Eq. (10), is

$$2^{np} - \left((n+1)^m - 1\right) 2^{n p_{\text{price}}} \ge 2^{np} - 1 - (2^p - 1)^n \qquad (14)$$

Hence Eq. (14) is true if

$$(2^p - 1)^n \ge \left((n+1)^m - 1\right) 2^{n p_{\text{price}}}$$
$$\Leftrightarrow\quad 2^{p_{\text{price}} n}\left(2^m - 2^{-p_{\text{price}}}\right)^n \ge \left((n+1)^m - 1\right) 2^{n p_{\text{price}}}$$
$$\Leftrightarrow\quad \left(2^m - 2^{-p_{\text{price}}}\right)^n \ge (n+1)^m - 1$$

Since $2^{-p_{\text{price}}} \le 1$, Eq. (14) is true if

$$(2^m - 1)^n \ge (n+1)^m - 1$$

which is true if

$$(2^m - 1)^{1/m} \ge (n+1)^{1/n}$$

Let $f(m) \equiv (2^m - 1)^{1/m}$ and $g(n) \equiv (n+1)^{1/n}$.
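Before the monotonicity argument, the two functions just defined can be made concrete numerically. The sketch below tabulates $f$ and $g$ and spot-checks Eqs. (13) and (14) with exact integer arithmetic over a small parameter grid. The function names and the grid bounds are choices made here for illustration, and a finite check of this kind only complements the proof; it does not replace it.

```python
def f(m):
    """f(m) = (2^m - 1)^(1/m)."""
    return (2 ** m - 1) ** (1 / m)


def g(n):
    """g(n) = (n + 1)^(1/n)."""
    return (n + 1) ** (1 / n)


def infeasible_states(n, m, p_price):
    """Total states minus feasible states: the left side of Eqs. (13) and (14)."""
    p = m + p_price
    feasible = ((n + 1) ** m - 1) * 2 ** (n * p_price)
    return 2 ** (n * p) - feasible


def eq13_holds(n, m, p_price):
    """Enough infeasible states for all single-bidder deviation states."""
    p = m + p_price
    return infeasible_states(n, m, p_price) >= n * (2 ** p - 1)


def eq14_holds(n, m, p_price):
    """Enough infeasible states for joint deviations of up to n - 1 bidders."""
    p = m + p_price
    return infeasible_states(n, m, p_price) >= 2 ** (n * p) - 1 - (2 ** p - 1) ** n


# Tabulate f and g side by side for a few small arguments.
for k in range(2, 7):
    print(k, round(f(k), 4), round(g(k), 4))

# Spot-check the claim on a small grid (grid bounds are arbitrary).
grid = [(n, m, q) for n in range(2, 7) for m in range(1, 7) for q in range(1, 7)]
all_eq13 = all(eq13_holds(n, m, q) for n, m, q in grid)
all_eq14 = all(eq14_holds(n, m, q) for n, m, q in grid if m >= 2)
print(all_eq13, all_eq14)
```

Python integers are arbitrary precision, so the comparisons in `eq13_holds` and `eq14_holds` are exact; only the tabulated values of `f` and `g` are floating point.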
We establish the required inequality, $f(m) \ge g(n)$, by showing that $f(m)$ is increasing in $m$ when $m \ge 2$, that $g(n)$ is decreasing in $n$ when $n \ge 2$, and noting that $f(2) = g(2) = \sqrt{3}$. Taking the derivative of $f(m) = (2^m - 1)^{1/m}$ with respect to $m$, we get

$$\frac{(2^m - 1)^{1/m}}{m^2}\left(\frac{m\, 2^m \ln 2}{2^m - 1} - \ln(2^m - 1)\right)$$

This is positive if and only if

$$\frac{2^m}{2^m - 1} \cdot \frac{m}{\log_2(2^m - 1)} > 1$$

This is true because $\log_2(2^m - 1) < \log_2(2^m) = m$ and hence both fractions in the expression are greater than 1. Thus, $f(m)$ is increasing for all $m \ge 2$. Taking the derivative of $g(n) = (n+1)^{1/n}$ with respect to $n$, we get

$$\frac{(n+1)^{1/n}}{n^2}\left(\frac{n}{1+n} - \ln(1+n)\right)$$

This is negative if and only if

$$\ln(1+n) > \frac{n}{1+n}$$

This is true for $n \ge 2$. Thus $g(n)$ is decreasing in $n$ for $n \ge 2$. Thus we have shown that Eq. (13) is true for $n, m \ge 2$. It can be easily checked that Eq. (13) is not true for $n = 1$ and is true when $n \ge 2$ and $m = 1$.

Thus, if a classical combinatorial auction has a NE, then the corresponding quantum auction protocol also has a NE with respect to initial state deviations. There is also an ε-equilibrium of the same auction with an imperfect search using $N$ search steps. Moreover, the stronger condition of Eq. (14) shows that in auctions with at least two bidders ($n > 1$), there are enough infeasible states to give no winner for any deviation of initial amplitudes that up to $n-1$ bidders can produce. Thus no group of up to $n-1$ bidders can collude to benefit from initial amplitude deviations in the quantum auction.

7 Applications of Quantum Auctions

Two properties of quantum information may provide benefits to auctioneers and bidders: the ability to compactly express complicated combinations of preferences via superpositions and entanglement, and the destruction of the quantum state upon measurement. This section describes some economic scenarios that could benefit from these properties.

As one economic application, quantum auctions provide a natural way to solve the allocative externality problem [18, 25]. In this situation, a bidder's value for an item depends on the items received by other bidders.
For example, consider companies bidding on a big government project requiring multiple companies to work on different parts. Allocative externality refers to the issue that the costs for a company that wins a contract for one part depend on which other companies win other parts. So company A may be willing to bid more aggressively if it knows that company B will work on related parts. Multiple simultaneous auctions for separate parts will not handle these interdependencies and thus will be inefficient. One possible solution is to let companies form partnership bids, that is, joint bids that are accepted together or not at all.

Quantum information processing allows for a natural way of forming partnership bids via entanglement. With the protocol described in Sec. 6, multiple bids can be entangled so that either all are accepted together or none is. Furthermore, quantum auctions may provide more flexibility with respect to information privacy of partnership bids than classical methods.

Specifically, with multiple items, groups of bidders could select joint operators on their combined qubits, allowing them to express joint constraints (e.g., where they either all win their specified items or none of them do) without any of the other bidders or the auctioneer knowing this choice. The bidders do so by creating an entangled state instead of the factored form for their qubits. Thus employing quantum entanglement provides bidders a natural way of expressing any allocative externality. This possibility shows that bidding languages based on qubits are highly expressive and compact, because bidders can use the same bits to express their individual bids and joint bids via entanglement.

Example 5. Alice and Bob could jointly form the state

$$\frac{1}{\sqrt{3}}\left(|\emptyset, 0, \emptyset, 0\rangle + |I_A, b_A, I_B, b_B\rangle + |I_C, b_C, I_D, b_D\rangle\right) \qquad (15)$$

to represent the bidders being willing to pay $b_A$ and $b_B$ for items $I_A$ and $I_B$, or to pay $b_C$ and $b_D$ for items $I_C$ and $I_D$, but not being willing to buy other combinations, such as $I_A$ for Alice and $I_D$ for Bob.

In this scenario, a direct representation of bids, i.e., without a null bid, would not guarantee that the joint preferences are satisfied for all entangled bidders or none of them. That is, without null bids, the superposition could not express the joint preference through entanglement.

A group of $k$ bidders operating jointly on their qubits to form entangled bids could also produce initial amplitudes involving up to $k$-bidder deviation states. However, the discussion with Eq. (14) on multiple item auctions shows that our protocol can handle all deviation states a group of up to $n-1$ bidders can produce, i.e., by mapping them to infeasible outcomes. Thus the additional expressivity used for joint bids does not introduce additional opportunities for collusion to change the outcomes via initial amplitude selection.

A second economic application for quantum auctions arises from their privacy guarantee for losing bids. This property is economically useful when bidders have incentives to hide information. An example is a scenario in which companies are bidding for government contracts year after year. A company's bid usually contains information about its cost structure. If there is a reasonable expectation that the losing bids will be revealed, a company may want to bid less aggressively to reduce the amount of information passed to its competition for use in future auctions. This will lead to a less efficient auction than if bidders reveal their true values. In this situation, a privacy guarantee on the losing bids enables bidders to bid with less inhibition.
More generally, this privacy issue is only relevant when there are additional interactions between these companies after the auction is concluded, such as future auctions or negotiations where participants may be at a disadvantage if their values are known to others.

This strong privacy property is unique to quantum information processing. Privacy can be enforced via cryptographic methods for multi-player computation [13], and in an auction can keep losing bids secret [22]. However, the information on the bids, and the key to decrypt them, remains after the auction completes. People who have access to the key may be legally compelled to reveal the information or choose to sell it. So while cryptography can be secured computationally, it cannot guarantee the integrity of the person(s) who have the means to decrypt the information. On the other hand, the quantum method destroys losing bids during the search for the winning one, and it is physically impossible to reconstruct the bids after the auction process. Similarly, some of the other properties of quantum auctions, such as correlations for partnership bids, can be provided classically [19]. Moreover, quantum mechanisms are readily simulated classically [27] (as long as they involve at most 20 to 30 qubits). However, these classical approaches lack the information security of quantum states. More study is needed to determine scenarios where the privacy property of the quantum protocol is significant.

8 Discussion

This paper describes a quantum protocol for auctions, gives a game theory analysis of some strategic issues the protocol raises, and suggests economic scenarios that could benefit from these auctions. These include the privacy of bids and the possibility of addressing allocative externalities. The search used in our protocol can use arbitrary criteria for evaluating allocations, thereby implementing other types of auctions with quantum states.
Thus while we focus our attention on the first-price sealed-bid auction, the protocol is more general: it can implement other pricing and allocation rules, as well as multiple-unit-multiple-item auctions with combinatorial bids. For example, we can use this protocol in a multiple stage, iterative auction. In fact, the protocol supports general bidding languages.

Encoding bids in quantum states raises new game theory issues because the bidders' strategic choices include specifying amplitudes in the quantum states. The auction is probabilistic, and moreover the winning probability is not just a function of the amount bid. Instead, a bidder can change the probability of winning by altering the amplitudes of the quantum states encoding his bid. For example, in the context of the first-price sealed-bid auction, the auction does not guarantee the allocation of the item to the highest bidder. We show that the correct design of the protocol can solve a specific version of this incentive problem. The salient design feature is an incentive compatible mechanism, so that bidders do not want to cheat, as opposed to an algorithmically secure protocol that prevents bidders from cheating. Thus, our design is an example of a quantum algorithm, in this case adiabatic search, tuned to improve incentive issues rather than the usual focus in quantum information processing on computation or security properties of algorithms.

In addition, we show that the Nash equilibrium of the corresponding classical first-price sealed-bid auction is an ε-equilibrium of the quantum auction and that ε converges to zero when the quantum search associated with the protocol uses an increasing number of steps, under the conditions listed in Theorem 1. This result is with respect to changes in the initial state of the search. It remains to be seen whether other bidder strategies give some unilateral benefit, requiring further adjustments to the auction design.

There are multiple directions for future work.
First, we plan a series of human subject experiments on whether people can indeed bid effectively in the simple quantum auction scenario described in this paper. As with previous experiments with a quantum public goods mechanism [2], such experiments are useful tests of the applicability of game theory in practice, and also suggest useful training and decision support tools. In particular, people's behavior in a quantum auction could differ from game theory predictions that people select a Nash equilibrium based on idealized assumptions of human rationality and full ability to evaluate consequences of strategic choices with uncertainty.

Second, we plan to extend studies of quantum auctions to more complicated economic scenarios, such as one with allocative externality. Our analysis considers a single auction. An interesting extension is to a series of auctions for similar items. If auctions are repeated, the game theory analysis is more complicated [28]. In particular, privacy concerns become more significant since information revealed by a bidder's behavior in one auction may benefit other bidders in later auctions.

The quantum auction destroys all information about the losing bids. As a result, it is not possible to conduct after-the-fact audits to verify that the auction has been conducted correctly. Is there a way to modify the mechanism to enable audits while preserving some of the privacy guarantees?

Security is another interesting issue. For example, there may be third parties, aside from the auctioneer and bidders, who are interested in intercepting and changing bits in transit. Auctioneers may have incentives to detect a bidder's bid or skew auction results. The question is whether we can build security around the protocol to prevent or at least detect these types of attacks.

Similarly, many economics issues surrounding the protocol remain to be resolved.
For example, people behave as if they are risk averse in auction situations [5, 4], which can change the predictions of game theory. Another issue arises from the possibility of multiple Nash equilibria. We have only shown that the desirable outcome is an equilibrium. The quantum protocol can also have other equilibria. Since the Nash equilibrium concept alone does not indicate how people select one equilibrium over another, additional study is needed to determine when the desirable outcome is likely to occur.

Our protocol makes only limited use of quantum states, in particular encoding bids in the subspace selected by the bidders but not using the amplitudes separately. Thus it would be interesting to examine extensions to the protocol exploiting the wider range of options for bidders. For example, a protocol might use amplitudes of superpositions to indicate a bidder's probabilistic preferences, say, as in constructing a portfolio of items with various expected values and risks. Such portfolios could be useful if bidders have some uncertainty in their values (e.g., in bidding for oil field exploration rights) rather than the standard private value framework considered in this paper, where bidders know their own values for the items. With uncertain values, probabilistic bids could allow bidders to match their risk preferences along with their value estimates within the auction process.

As a final note, the number of qubits necessary to conduct an auction is small compared to the requirements of complex computations such as factoring. For example, if each bidder uses 7 bits (corresponding to $2^7$, or about 100, bid values) and there are 3 bidders, about 25 qubits are needed, considerably fewer than the thousands needed for factoring interesting-sized numbers. Thus with the advancement of quantum information processing technologies, economic mechanisms could be early feasible applications.
Acknowledgments

We have benefited from discussions with Raymond Beausoleil, Saikat Guha, Philip Kuekes, Andrew Landahl and Tim Spiller. This work was supported by DARPA funding via the Army Research Office contract #W911NF0530002 to Dr. Beausoleil. This paper does not necessarily reflect the position or the policy of the Government funding agencies, and no official endorsement of the views contained herein by the funding agencies should be inferred.

References

[1] Michel Boyer, Gilles Brassard, Peter Hoyer, and Alain Tapp. Tight bounds on quantum searching. In T. Toffoli et al., editors, Proc. of the Workshop on Physics and Computation (PhysComp96), pages 36–43, Cambridge, MA, 1996. New England Complex Systems Institute.
[2] Kay-Yut Chen and Tad Hogg. How well do people play a quantum prisoner's dilemma? Quantum Information Processing, 5:43–67, 2006.
[3] Kay-Yut Chen, Tad Hogg, and Raymond Beausoleil. A quantum treatment of public goods economics. Quantum Information Processing, 1:449–469, 2002. arxiv.org preprint quant-ph/0301013.
[4] Kay-Yut Chen and Charles R. Plott. Nonlinear behavior in sealed bid first price auctions. Games and Economic Behavior, 25:34–78, 1998.
[5] James C. Cox, Vernon L. Smith, and James M. Walker. Theory and individual behavior of first-price auctions. Journal of Risk and Uncertainty, 1:61–99, 1988.
[6] Peter Cramton, Yoav Shoham, and Richard Steinberg, editors. Combinatorial Auctions. MIT Press, 2006.
[7] Jiangfeng Du et al. Entanglement enhanced multiplayer quantum games. Physics Letters A, 302:229–233, 2002. arxiv.org preprint quant-ph/0110122.
[8] Jiangfeng Du et al. Experimental realization of quantum games on a quantum computer. Physical Review Letters, 88:137902, 2002. arxiv.org preprint quant-ph/0104087.
[9] Christoph Durr and Peter Hoyer. A quantum algorithm for finding the minimum. arxiv.org preprint quant-ph/9607014, 1996.
[10] J. Eisert, M. Wilkens, and M. Lewenstein. Quantum games and quantum strategies. Physical Review Letters, 83:3077–3080, 1999. arxiv.org preprint quant-ph/9806088.
[11] Jens Eisert and Martin Wilkens. Quantum games. J. Modern Optics, 47:2543–2556, 2000. arxiv.org preprint quant-ph/0004076.
[12] Edward Farhi et al. A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem. Science, 292:472–476, 2001.
[13] O. Goldreich. Secure multi-party computation. Working draft version 1.1, 1998. Available at philby.ucsd.edu/cryptolib/books.html.
[14] Lov K. Grover. Quantum mechanics helps in searching for a needle in a haystack. Physical Review Letters, 79:325–328, 1997. arxiv.org preprint quant-ph/9706033.
[15] Tad Hogg. Quantum search heuristics. Physical Review A, 61:052311, 2000. Preprint at publish.aps.org/eprint/gateway/eplist/aps1999oct19 002.
[16] Tad Hogg. Adiabatic quantum computing for random satisfiability problems. Physical Review A, 67:022314, 2003. arxiv.org preprint quant-ph/0206059.
[17] Bernardo A. Huberman and Tad Hogg. Quantum solution of coordination problems. Quantum Information Processing, 2:421–432, 2003. arxiv.org preprint quant-ph/0306112.
[18] Philippe Jehiel and Benny Moldovanu. Allocative and informational externalities in auctions and related mechanisms. Technical Report SFB/TR 15 142, Free University of Berlin. Available at ideas.repec.org/p/trf/wpaper/142.html.
[19] David A. Meyer. Quantum communication in games. In S. M. Barnett et al., editors, Quantum Communication, Measurement and Computing, volume 734, pages 36–39. AIP Conference Proceedings, 2004.
[20] Paul R. Milgrom and Robert J. Weber. A theory of auctions and competitive bidding. Econometrica, 50:1089–1122, 1982.
[21] Pierfrancesco La Mura. Correlated equilibria of classical strategic games with quantum signals. arxiv.org preprint quant-ph/0309033, Sept. 2003.
[22] Moni Naor, Benny Pinkas, and Reuben Sumner. Privacy preserving auctions and mechanism design. In Proc. of the ACM Conference on Electronic Commerce, pages 129–139, NY, 1999. ACM Press.
[23] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge Univ. Press, 2000.
[24] Roy Radner. Collusive behavior in noncooperative epsilon-equilibria of oligopolies with long but finite lives. J. of Economic Theory, 22:136–154, 1980.
[25] Martin Ranger. The generalized ascending proxy auction in the presence of externalities. Technical report, Social Science Research Network, July 2005. Available at ssrn.com/abstract=834785.
[26] Peter W. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In S. Goldwasser, editor, Proc. of the 35th Symposium on Foundations of Computer Science, pages 124–134, Los Alamitos, CA, November 1994. IEEE Press.
[27] S. J. van Enk and R. Pike. Classical rules in quantum games. Physical Review A, 66:024306, 2002.
[28] Robert Wilson. Strategic analysis of auctions. In Robert Aumann and Sergiu Hart, editors, Handbook of Game Theory with Economic Applications, volume 1. Elsevier, 1992. Chapter 8.
ABSTRACT We present a quantum auction protocol using superpositions to represent bids and distributed search to identify the winner(s). Measuring the final quantum state gives the auction outcome while simultaneously destroying the superposition. Thus non-winning bids are never revealed. Participants can use entanglement to arrange for correlations among their bids, with the assurance that this entanglement is not observable by others. The protocol is useful for information hiding applications, such as partnership bidding with allocative externality or concerns about revealing bidding preferences. The protocol applies to a variety of auction types, e.g., first or second price, and to auctions involving either a single item or arbitrary bundles of items (i.e., combinatorial auctions). We analyze the game-theoretical behavior of the quantum protocol for the simple case of a sealed-bid quantum auction, and show how a suitably designed adiabatic search reduces the possibilities for bidders to game the auction. This design illustrates how incentive rather than computational constraints affect quantum algorithm choices. <|endoftext|><|startoftext|> Geometric Phase and Superconducting Flux Quantization

Walter A. Simmons & Sandip S. Pakvasa
Department of Physics and Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822

Abstract

In a ring of s-wave superconducting material the magnetic flux is quantized in units of $\Phi_0 = \frac{h}{2e}$. It is well known from the theory of Josephson junctions that if the ring is interrupted with a piece of d-wave material, then the flux is quantized in one-half of those units due to an additional phase shift of $\pi$. We reinterpret this phenomenon in terms of geometric phase. We consider an idealized hetero-junction superconductor with pure s-wave and pure d-wave electron pairs. We find, for this idealized configuration, that the phase shift of $\pi$ follows from the discontinuity in the geometric phase and is thus a fundamental consequence of quantum mechanics.

Geometric phase has been contained in quantum mechanics since the foundations of the field were set down in the early twentieth century; however, the phase and its importance were not recognized for some time. Pancharatnam [1] discovered the classical geometric phase in optics in 1956, and Berry's important 1984 quantum mechanics paper [2] stimulated the rapid development of the field. By 1992, Anandan [3], in a review article in Nature, was able to conclude that the phase had been convincingly demonstrated.

The first application of geometric phase to Josephson junctions was carried out by Anandan and Pati in 1997. They showed that the zero voltage tunneling supercurrent is geometric in nature and that it is proportional to the speed of the state vector in projective Hilbert space [4].

The appearance of a phase discontinuity of $\pm\pi$ arising from geometric phase, under certain circumstances, was shown [5] to be a general feature of quantum mechanics in 2003, but has so far found only limited application [6].

Here we show that a well known phenomenon [7,8] in superconductivity, the quantization of magnetic flux in one half of the usual unit [9], which is $\Phi_0 = \frac{h}{2e}$, can be interpreted as an effect of the discontinuity in geometric phase.
This phase shift in superconductors has been understood in terms of the physics of Josephson junctions, and the result has been applied to high temperature superconductors [10] in order to test the idea that they involve d-wave electron pairs.

Our application considers an idealized hetero-junction superconductor with pure s-wave and pure d-wave electron pairs. We find, for this idealized configuration, that the phase shift of $\pi$ follows from the discontinuity in the geometric phase [4] and is thus a fundamental consequence of quantum mechanics.

Applications of quantum geometric phase have been made in nearly every branch of physics, from fundamental material science [11,12] to quantum computing with superconducting nanocircuits [13], as well as in chemistry [14], and it has been suggested that the phase may become important in biology [15].

Since the phase has long been present, but not fully recognized, some applications entail reinterpreting known phenomena in terms of the phase. An important and illustrative example is the reinterpretation of the so-called Gouy effect in optics as a geometrical phase [16], which we will summarize below.

An idealized superconducting ring, which consists of a composite of s-wave material and d-wave material, will, in the absence of external electromagnetic fields, exhibit quantization of the magnetic flux in units of one half of the usual unit, $\Phi_0 = \frac{h}{2e}$. This half-unit quantization of the magnetic flux will occur whenever there is an odd number of phase shifts of magnitude $\pi$ in the circuit. The theoretical argument [6] for the $\pi$ phase shift in a composite ring was based upon the dynamics of the Josephson junction and on thermodynamics, and has experimental support.

We next explain the quantum mechanics of the phase shift, and then we proceed to reinterpret the s-wave/d-wave superconductor hetero-junction.
It has long been known [17] that when a light beam converges to a focus, then diverges again, the light experiences a phase shift whose magnitude depends upon the details. For example, for a beam with a Gaussian profile and a very small waist at the focus, an abrupt phase shift of $\pi/2$ occurs for each of the two transverse directions, for a total phase shift of $\pi$. This result, which follows from standard classical electrodynamics, has been reinterpreted in terms of geometric phase [15]. For one transverse direction, the complex curvature of the wave reverses at the focus; the geometric phase, which is directly related to the curvature, changes by $\pi/2$.

That optical example of geometric phase is closely analogous to a well-known [18] phase flip of $\pm\pi$, which occurs in optics when the polarization of a light beam is rotated from some initial state, through and beyond another polarization state that is orthogonal to the initial state.

The latter example of the geometric phase discontinuity has been shown to occur rather generally in quantum mechanics [4]. This can be understood by considering the behavior of a complex quantum state vector as it is impelled through a series of states in Hilbert space by an external force. Suppose the initial state is $|\Psi_i\rangle$, the final state is $|\Psi_f\rangle$, and some intermediate state $|\Psi_0\rangle$ is orthogonal to the initial state, $\langle\Psi_i|\Psi_0\rangle = 0$. In the complex plane, the trajectory is a sequence of projections of each state upon its subsequent state. The trajectory from the initial to the final state passes through the origin (with a positive or negative infinitesimal imaginary part); the phase goes through an inverse-tangent singularity and changes by $\pm\pi$.

Finally, turning to superconductivity, we adopt the theoretical framework [19] in which superconductivity is viewed as a consequence of the breaking of gauge invariance entailing the formation of Cooper pairs. We consider a ring of material in which the supercurrent is carried by s-wave Cooper pairs.
The ring is interrupted by a section of material in which d-wave Cooper pairs carry the supercurrent.

The supercurrent passing through the idealized hetero-junction experiences a shift of $\pm\pi$ in the geometric phase due to the orthogonality of d-wave and s-wave. Since the orientation of the d-wave relative to the s-wave is not meaningful, we have no sum over dimensions as in the optical beam analogy; a shift of $\pm\pi$ is the result of a single inserted section of material, and the half-unit magnetic flux quantization follows.

Since the 1997 work of Anandan and Pati, it has been known that the zero-voltage current in a tunneling supercurrent arises from the geometry of Hilbert space and is independent of the specific Hamiltonian (which is a general feature of geometric phase [20]).

Recently, experiments on hetero-junctions have supported the idea that high-$T_c$ materials are d-wave superconductors. While we are not discussing realistic models of high-$T_c$ materials here, our results show that a phase shift of $\pm\pi$ in s-wave/d-wave hetero-junctions arises from the fundamentals of quantum mechanics.

[1] Pancharatnam, S., The Proceedings of the Indian Academy of Sciences, Vol. XLIV, No. 5, Sec. A, 247 (1956), in Collected Works of S. Pancharatnam, Oxford University Press, London (1975).
[2] Berry, M.V., “Quantal Phase Factors Accompanying Adiabatic Changes”, Proc. R. Soc. Lond. A392, 45 (1984).
[3] Anandan, J., “The geometric phase”, Nature 360, 307 (1992).
[4] Anandan, J. & Pati, A.K., “Geometry of the Josephson effect”, Physics Letters A 231, 29 (1997).
[5] Mukunda, et al, “Bargmann invariants, null phase curves, and a theory of the geometric phase”, Phys. Rev. A 67, 042114 (2003).
[6] Simon, R. and Mukunda, N., “Bargmann Invariant and the Geometry of the Guoy Effect”, Phys. Rev. Letters 70, 880 (1993).
[7] Bulaevskii, L.N., Kuzii, V.V. & Sobyanin, A.A., “Superconducting system with weak coupling to the current in the ground state”, JETP Lett. 25, 290–294 (1977).
[8] Tsuei, C.C., and Kirtley, J.R., “Pairing symmetry in cuprate superconductors”, Rev. Mod. Phys. 72, 969 (2000). For more recent results, see Kirtley, et al, Nature 373, 225 (2005).
[9] Ashcroft, N.W. & Mermin, N.D., Solid State Physics, Holt, Rinehart and Winston (1976).
[10] Hilgenkamp, H., Ariando, Smilde, H.-J. H., Blank, D.H.A., Rijnders, G., Rogalla, H., Kirtley, J.R., and Tsuei, C.C., “Ordering and manipulation of the magnetic moments in large-scale superconducting pi-loop arrays”, Nature 422, 50 (2003).
[11] Zak, J., “Berry's Phase for Energy Bands in Solids”, Phys. Rev. Lett. 62, 2747 (1989).
[12] Resta, R., “Manifestations of Berry's phase in molecules and condensed matter”, J. Phys. Condens. Matter 12, R107 (2000).
[13] Falci, G., Fazio, R., Palma, G.M., Siewert, J., and Vedral, V., “Detection of geometric phases in superconducting nanocircuits”, Nature 407, 355 (2000).
[14] Mead, C.A., “The geometric phase in molecular systems”, Rev. Mod. Phys. 64, 51 (1992).
[15] Kagan, M.L., Kepler, T.B. & Epstein, I.R., “Geometric phase shifts in chemical oscillators”, Nature 349, 506 (1991).
[16] Simon, R. and Mukunda, N., “Bargmann Invariant and the Geometry of the Guoy Effect”, Phys. Rev. Letters 70, 880 (1993).
[17] Siegman, A.E., Lasers, University Science Books, Mill Valley, California (1986).
[18] Bhandari, R., “Polarization of light and the topological phases”, Physics Reports 281, 1 (1997).
[19] Weinberg, S., Quantum Theory of Fields II, Cambridge University Press (1996).
[20] Aharonov, Y. & Anandan, J., “Phase Change during a Cyclic Quantum Evolution”, Phys. Rev. Lett. 58, 1593 (1987).

ABSTRACT In a ring of s-wave superconducting material the magnetic flux is quantized in units of $\Phi_0 = \frac{h}{2e}$. It is well known from the theory of Josephson junctions that if the ring is interrupted with a piece of d-wave material, then the flux is quantized in one-half of those units due to an additional phase shift of $\pi$. We reinterpret this phenomenon in terms of geometric phase.
We consider an idealized hetero-junction superconductor with pure s-wave and pure d-wave electron pairs. We find, for this idealized configuration, that the phase shift of $\pi$ follows from the discontinuity in the geometric phase and is thus a fundamental consequence of quantum mechanics. <|endoftext|><|startoftext|> Equation-Free Implementation of Statistical Moment Closures

Francis J. Alexander and Gregory Johnson
Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM, 87545.

Gregory L. Eyink
Department of Mathematical Sciences, Johns Hopkins University, Baltimore, MD 21218

Ioannis G. Kevrekidis
Department of Chemical Engineering and PACM, Princeton University, Princeton, NJ 08544

We present a general numerical scheme for the practical implementation of statistical moment closures suitable for modeling complex, large-scale, nonlinear systems. Building on recently developed equation-free methods, this approach numerically integrates the closure dynamics, the equations of which may not even be available in closed form. Although closure dynamics introduce statistical assumptions of unknown validity, they can have significant computational advantages as they typically have fewer degrees of freedom and may be much less stiff than the original detailed model. The closure method can in principle be applied to a wide class of nonlinear problems, including strongly-coupled systems (either deterministic or stochastic) for which there may be no scale separation. We demonstrate the equation-free approach for implementing entropy-based Eyink-Levermore closures on a nonlinear stochastic partial differential equation.

PACS numbers:

INTRODUCTION

Accurate, fast simulations of complex, large-scale, nonlinear systems remain a challenge for computational science and engineering, despite extraordinary advances in computing power.
Examples range from molecular dynamics simulations of proteins [1, 2] and glasses [3], to stochastic simulations of cellular biochemistry [4, 5], to global-scale, geophysical fluid dynamics [6]. Often for the systems under consideration there is no obvious scale separation, and their many degrees of freedom are strongly coupled. The complex and multiscale nature of these processes therefore makes them extremely difficult to model numerically. To make matters worse, one is often interested not in a single, time-dependent solution of the equations governing these processes, but rather in ensembles of solutions consisting of multiple realizations (e.g., sampling noise, initial conditions, and/or uncertain parameters). Often real-time answers are needed (e.g., for control, tracking, filtering). These demands can easily exceed the computational resources available not only now but also for the foreseeable future.

In principle, all statistical information for the problem under investigation is contained in solutions to the Liouville (if deterministic)/Kolmogorov (if stochastic) equations. These are partial differential equations in a state space of high (possibly infinite) dimension. A straightforward discretization of the Liouville/Kolmogorov equations is therefore impractical. An ensemble approach to solving these equations can be taken; however, quite often, the practical application of the ensemble approach is also problematic. Generating a sufficient number of independent samples for statistical convergence can be a challenge. For some problems, computing even one realization may be prohibitive.

The traditional approach to making these problems computationally tractable is to replace the Liouville/Kolmogorov equation by a (small) set of equations (PDEs or ODEs) for a few, low order statistical moments of its solution.
When taking this approach for nonlinear systems, one must make an approximation, a closure, for the dependence of higher order moments on lower order moments. Typically the form of the closure equation is based on expert knowledge, empirical data, and/or physical insight. For example, in the superposition approximation and its extensions [7] for dense liquids and plasmas, both quantum and classical, one approximates third order moments as functions of second order moments. Moment closure methods of this type have been applied to a number of areas including fluid turbulence (see [8] and references therein, and also the work of Chorin et al. [9]). Of course, as with any approximation strategy, the quality of the resulting reduced description depends on the approximations made: poor closures lead to poor answers and predictions. Besides replacing the ensemble with a small set of equations for low order moments, the closure yields equations that are typically easier to solve. They are deterministic and generally far less stiff than the original equations.

http://arxiv.org/abs/0704.0804v1

A less exploited variant of this approximation scheme is the probability density function (PDF) based moment-closure approach. For PDF moment closures one makes an ansatz for the system statistics guided by available information (e.g., symmetries). One then uses this ansatz in conjunction with the original dynamical equations to derive moment equations. Such PDF-based closures have been developed for reacting scalars advected by turbulence [10], phase-ordering dynamics [11] and a variety of other systems. This approach to moment closure is a close analogue of the Rayleigh-Ritz method frequently used in solving the quantum-mechanical Schroedinger equation by exploiting an ansatz for the wave function. For a formal development of this point of view, see [12].
One of the obstacles to applying moment closures is that often the closure equations are too complicated to write down explicitly, even with the availability of computer algebra / symbolic computation systems. This is especially true for large-scale, complex systems, e.g. global climate models. Because of their great complexity, even if one could in principle derive the closure equations analytically, this procedure would be extremely difficult and time-intensive. Moreover, each time a model is updated, as climate and ocean models regularly are, the closure equations would have to be rederived. In other cases it may simply be impossible to determine the closure equations analytically. This is especially likely when PDFs are not Gaussian, which is the case for most useful closures. Monte Carlo or other numerical methods may be needed in order to evaluate integrals for the moments [13]. In addition, there may be situations where neither analytic nor numerical/MC integration will yield the closure equations due to the black-box nature of the available numerical simulator, such as a compiled numerical code with an inaccessible source. Clearly, a need exists for a robust approach to the general closure protocol which circumvents analytical difficulties.

We address that need here by combining PDF closures with equation-free modeling [14, 15]. The basic premise of the equation-free method is to use an ensemble of short bursts of simulation of the original dynamical system to estimate, on demand, the time evolution of the closure equations that we may not explicitly have. The equation-free approach extends the applicability of statistical closures beyond the rare cases where they can be expressed in closed form. This hybrid strategy may be faster than the brute-force solution of a large ensemble of realizations of the dynamical equations since the closure version is generally smoother than the original problem.

This paper is organized as follows.
In Section 2 we describe the general features of PDF-based moment closures. In Section 3 we explain how to implement the equation-free approach with these closures. We then, in Section 4, apply these ideas to a specific dynamical system, the stochastic Ginzburg-Landau (GL) equation, using a particular PDF-based closure scheme, the entropy method of Eyink and Levermore [16]. We conclude with a discussion of closure quality, computational issues, and the application of our approach to large-scale systems.

PDF-BASED MOMENT CLOSURES

We consider the very general class of dynamical systems, including maps, formally represented by

$$\dot{X} = U(X(t), N(t), t) \tag{1}$$

$$X_{t+1} = U_t(X_t, N_t) \tag{2}$$

where N(t) is a stochastic process with prescribed statistics. The stochastic component arises from unknown parameters, random forcing, neglected degrees of freedom and/or random initial conditions. This class includes both deterministic and stochastic systems with discrete and/or continuous states. Queueing systems, molecular dynamics, and stochastic PDEs are just some of the many examples that fall into this category.

For concreteness in this paper we restrict ourselves to a special case of equation (2), namely, situations where N(t) is a Markov process (Brownian motion, Poisson process, etc.) and, more specifically still, Itô stochastic differential equations of the form:

$$dX = U(X, t)\,dt + \sqrt{2}\,S(X, t)\,dW(t). \tag{3}$$

The deterministic component of the evolution of the state X is governed by the continuously differentiable vector field $U : \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^N$. For many problems of interest (e.g., climate) U is a highly nonlinear function. The noise component is modeled by the standard Wiener process $W \in \mathbb{R}^N$ (mean 0, covariance matrix I), possibly modulated by a state-dependent matrix $S : \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^{N \times N}$. Equation (3) encompasses a wide class of systems including deterministic (S = 0) ones.
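As a concrete reading of equation (3), the following minimal sketch advances one realization with the Euler-Maruyama scheme. It assumes the noise prefactor in (3) is $\sqrt{2}$ (the choice for which $SS^T$ is the diffusion matrix of the corresponding Fokker-Planck equation); the function and argument names are illustrative and do not come from the paper.

```python
import numpy as np


def euler_maruyama(x0, drift, noise_matrix, t0, dt, n_steps, rng):
    """Integrate dX = U(X, t) dt + sqrt(2) S(X, t) dW(t) with Euler-Maruyama.

    drift(x, t)        -> vector U(X, t), shape (N,)
    noise_matrix(x, t) -> matrix S(X, t), shape (N, N)
    All names are hypothetical; the sqrt(2) prefactor is an assumption.
    """
    x = np.array(x0, dtype=float)
    t = float(t0)
    for _ in range(n_steps):
        # Wiener increment: independent normals with mean 0 and variance dt.
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + drift(x, t) * dt + np.sqrt(2.0) * (noise_matrix(x, t) @ dw)
        t += dt
    return x


# Toy usage: linear drift U = -X with constant S = I (an Ornstein-Uhlenbeck process).
rng = np.random.default_rng(0)
x_final = euler_maruyama(
    x0=[1.0, 2.0],
    drift=lambda x, t: -x,
    noise_matrix=lambda x, t: np.eye(2),
    t0=0.0, dt=1e-3, n_steps=1000, rng=rng,
)
print(x_final)
```

Setting `noise_matrix` to return a zero matrix recovers plain forward Euler for the deterministic case S = 0.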
In many cases one is interested in knowing the low order statistics of equation (3), for example an instantaneous mean value or possibly multi-point covariance of X. These statistics can be obtained by averaging over an ensemble of stochastic systems, solving equation (3). They can also be obtained via the forward Kolmogorov equation for the probability density function P(X, t):

$$\partial_t P = L^*(t) P, \tag{4}$$

where P satisfies the conditions $P(X, t) \ge 0$ and $\int P(X, t)\,dX = 1$, and where $L^*$ is the generator of the Markov process. In the case of equation (3) this operator takes the form

$$L^*(t)\psi(X) = -\nabla_X \cdot (U(X, t)\psi(X)) + \nabla_X^2 : (D(X, t)\psi(X)). \tag{5}$$

The forward Kolmogorov equation then becomes a Fokker-Planck equation

$$\partial_t P + \nabla_X \cdot (UP) = \nabla_X^2 : (DP) \tag{6}$$

where $D(X, t) = S(X, t)S(X, t)^T$ is the nonnegative-definite diffusion matrix arising from the noise term. Unlike the original dynamical equation (3), the forward Kolmogorov equation (FKE) is both linear and deterministic. Dealing with it, therefore, has apparent advantages over the original ensemble of stochastic systems simulations. The price to pay for these advantages is that the FKE lives in a typically high, potentially infinite-dimensional, space. When equation (3) is a nonlinear PDE, numerical solution to the FKE is usually ruled out.

For computational purposes, we would therefore like to reduce the FKE (if possible and useful) to a small system of ordinary differential equations. This reduction should simplify the computation as much as possible while retaining fidelity to the original dynamical processes. The reduction proceeds by taking moments of the FKE with respect to a vector-valued function $\xi(X, t)$ from $\mathbb{R}^N \times \mathbb{R}^+ \to \mathbb{R}^M$. The ξ selected should include the relevant variables in the system (slow modes, conserved quantities, etc.).
The moments µ(t) of ξ(X, t) are defined by

$$\mu(t) = \int \xi(X, t) P(X, t)\,dX \tag{7}$$

and give rise to

$$\dot{\mu}(t) = \int \dot{\xi}(X, t) P(X, t)\,dX, \tag{8}$$

where

$$\dot{\xi}(X, t) = \partial_t \xi(X, t) + L(t)\xi(X, t) \tag{9}$$

and L is the adjoint of $L^*$, or the backward Kolmogorov operator. The result (8) can be obtained by averaging over an ensemble of realizations of the stochastic dynamics (3). In general, however, (8) is not a closed equation for the moments, µ. One can close this equation by choosing a PDF, P(X, t; µ), which itself is a function of the moments µ:

$$\dot{\mu}(t) = V(\mu, t) \equiv \int \dot{\xi}(X, t) P(X, t; \mu)\,dX. \tag{10}$$

Alternatively, one can select a family of probability densities P(X, t; α), specified by parameters α = α(µ, t) rather than directly by the moments µ. This is analogous to specifying the temperature in the canonical ensemble as opposed to the average energy. The equivalence of these approaches is guaranteed provided that the parameters and moments can be determined uniquely from one another. The translation between the parameters and their corresponding moments can be carried out by one of several methods. In some cases one may require Monte Carlo evaluation of the resulting integrals. If the moments and/or parameters are selected judiciously, one hopes that the approximate PDF P(X, t; α(µ(t))) will be close to the exact solution of the Liouville/Kolmogorov equation (4). The mapping closure approach of Chen et al. [10] and the Gaussian mapping method of Yeung et al. [11] are based on this type of parametric PDF closure [19]. In fact, perhaps the most familiar application of the parametric approach is the use of the Rayleigh-Ritz method in quantum mechanical calculations. This is the essential approach of our paper.

EQUATION-FREE COMPUTATION

Although we now have obtained a closed moment equation (equation (10)), we still need to determine the dynamical vector field V. As explained above, this step can be a serious obstacle to the practical implementation of PDF-based moment-closure (PDFMC).
It is desirable to have a method for calculating V that (i) does not require a radical revision each time the underlying code or model changes, and (ii) is relatively insensitive to the complexity of the PDFMC. The equation-free approach of Kevrekidis and collaborators [14] meets those requirements. It permits one to work with much more sophisticated, physically realistic closures.

Equation-free computation is motivated by the simple observation that numerical computations involving the closure equations ultimately do not require closed formulae for the closure equations. Instead, one must only be able to sample an ensemble of system states X distributed according to the closure ansatz P(X, t; α) and then evolve each of these via equation (3) for short intervals of time. Such sampling and subsequent dynamical evolution would be necessary to calculate the statistics of interest even when not using a closure strategy. It is sufficient to have a (possibly black-box) subroutine available which, given a specific state variable X(t) as input, returns the value of the state X(t + δt) after a short time δt. The ensemble of systems, each of which satisfies equation (3), is evolved over a time interval δt. The moments/parameters µ or α are determined at the beginning and end of this interval and the time derivative $\dot{\mu}$ is estimated from the results of these short ensemble runs. This “coarse timestepper” can be used to estimate locally the right-hand side of the closure evolution equations, namely V(µ, t).
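The coarse timestepper just described can be sketched in a few lines. This is only an illustration under stated assumptions: `lift`, `fine_step` and `restrict` are hypothetical callables standing in for sampling the ansatz, calling the (possibly black-box) simulator, and ensemble-averaging the moment function; the toy model at the bottom (a scalar Ornstein-Uhlenbeck process with a unit-variance Gaussian ansatz) is not from the paper.

```python
import numpy as np


def estimate_coarse_derivative(alpha, lift, fine_step, restrict, dt_fine, n_samples, rng):
    """One pass of the coarse timestepper: lift, evolve, restrict, difference.

    lift(alpha, n_samples, rng)  -> ensemble sampled from the ansatz P(X; alpha)
    fine_step(ensemble, dt, rng) -> ensemble advanced by dt (may be a black box)
    restrict(ensemble)           -> moment estimate mu (ensemble average of xi)
    Returns (mu(t), finite-difference estimate of d mu / dt).
    """
    ensemble = lift(alpha, n_samples, rng)
    mu_start = restrict(ensemble)
    ensemble = fine_step(ensemble, dt_fine, rng)
    mu_end = restrict(ensemble)
    return mu_start, (mu_end - mu_start) / dt_fine


# Hypothetical toy ingredients: dX = -X dt + sqrt(2) dW, Gaussian ansatz
# with unit variance whose parameter alpha is the mean.
def lift_gaussian(alpha, n_samples, rng):
    return rng.normal(alpha, 1.0, size=n_samples)


def ou_step(x, dt, rng):
    return x - x * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)


rng = np.random.default_rng(0)
mu, dmu_dt = estimate_coarse_derivative(2.0, lift_gaussian, ou_step, np.mean, 1e-2, 100_000, rng)
print(mu, dmu_dt)  # dmu_dt scatters around -mu, up to sampling noise
```

Note that the noise in the derivative estimate grows as δt shrinks, which is one reason variance reduction matters in practice.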
Coarse projective forward Euler (arguably the simplest of equation-free algorithms), which we will use below, illustrates the approach succinctly. Starting from a set of coarse-grained initial conditions specified by moments µ(t), we first (a) lift to a consistent fine-scale description, that is, sample the PDF ansatz P(X, t; α(t)) to generate ensembles of initial conditions X for equation (3) consistent with the set µ(t); (b) starting with these consistent initial conditions, we evolve the fine-scale description for a (relatively short) time δt; (c) we subsequently restrict back to coarse observables by evaluating the moments µ(t + δt) as ensemble averages; and (d) use the results to estimate locally the time derivative dµ/dt. This is precisely the right-hand side of the explicitly unavailable closure, obtained not through a closed-form formula, but rather through short, judicious computational experiments with the original fine-scale dynamics/code. Given this local estimate of the coarse-grained observable time derivatives, we can now exploit the smoothness of their evolution in time (in the form of Taylor series) and take a single long projective forward Euler step:

$$\mu(t + \Delta t) = \mu(t) + \Delta t\,\frac{\mu(t + \delta t) - \mu(t)}{\delta t}. \tag{11}$$

The procedure then repeats itself: lifting, fine-scale evolution, restriction, estimation, and then (connecting with traditional continuum numerical analysis) a new forward Euler step. Beyond coarse projective forward Euler, many other coarse initial-value solvers (e.g. coarse projective Adams-Bashforth, and even implicit coarse solvers) have been implemented; the stability and accuracy study of such algorithms is progressing [14]. These developments allow us to construct a nonintrusive implementation of PDF moment closures, nonintrusive in the sense that we compute with the closures without explicitly obtaining them, but rather by intelligently chosen computational experiments with the original, fine-scale problem.
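The lift, evolve, restrict, project cycle can be written as a short loop. The sketch below assumes the projective step divides the short-burst increment µ(t + δt) − µ(t) by δt to form the derivative estimate and then advances the coarse time by ∆t; `coarse_timestepper` is a hypothetical user-supplied callable that hides steps (a) to (c), and the toy stand-in used here is deterministic so that the behavior is easy to check.

```python
import math


def projective_euler(mu0, coarse_timestepper, dt_fine, dt_proj, n_steps):
    """Coarse projective forward Euler.

    coarse_timestepper(mu, dt_fine) -> mu(t + dt_fine), obtained in practice by
    lifting, running the fine-scale model for dt_fine, and restricting.
    Returns the list of coarse states mu(t0), mu(t0 + dt_proj), ...
    """
    mu = mu0
    trajectory = [mu]
    for _ in range(n_steps):
        mu_short = coarse_timestepper(mu, dt_fine)   # short fine-scale burst
        dmu_dt = (mu_short - mu) / dt_fine           # local derivative estimate
        mu = mu + dt_proj * dmu_dt                   # long projective step
        trajectory.append(mu)
    return trajectory


# Stand-in for the burst: the exact short-time flow of d mu / dt = -mu.
def toy_timestepper(mu, dt_fine):
    return mu * math.exp(-dt_fine)


traj = projective_euler(1.0, toy_timestepper, dt_fine=1e-3, dt_proj=0.05, n_steps=20)
print(traj[-1], math.exp(-1.0))  # projective result next to the exact value at t = 1
```

Because `mu` is only combined through arithmetic, the same loop works unchanged when the moments are NumPy arrays rather than scalars.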
There is, however, an obvious objection to the equation-free implementation of moment closures. Using the same ingredients, one can clearly obtain an estimate of any statistics of interest (for example, the moment averages µ(t)) without the need of making any closure assumptions whatsoever. This can be done by the much simpler method of direct ensemble averaging. That is, one can sample an ensemble of initial conditions X from any chosen distribution P0(X), evolve each of these realizations according to the fine-scale dynamics of equation (3), and then evaluate any statistics of interest at time t by averaging over the ensemble of solutions X(t). It would seem that this direct ensemble approach is much more straightforward and accurate than the equation-free implementation of a moment closure, which introduces additional statistical hypotheses.

The response to this important objection is that the fine-scale dynamics (3) is often very stiff for the applications considered, in which the system contains many degrees of freedom interacting on a huge range of length- and time-scales. In contrast, the closure equation (10) is much less stiff, because of statistical averaging, and its solutions µ(t) are much smoother in time (and space). Thus, to evolve an ensemble of solutions of the fine-scale dynamics (3) from an initial time t0 to a final time t0 + T would require O(T/δt) integration steps, where the time-step δt is required to be very small by the intrinsic stiffness of the micro-dynamics. In the closure approach, the evolution of the moment equations (10) from time t0 to time t0 + T requires only O(T/∆t) integration steps, with (hopefully) ∆t ≫ δt. Each of these closure integration steps by an increment ∆t requires in the equation-free approach just one (or just a few) fine-scale integration steps by an increment δt. Thus, there is an overall savings by a (hopefully) large factor O(∆t/δt).
This crude estimate is based on a single-step coarse projective forward Euler algorithm; clearly, more sophisticated projective integration algorithms can be used. In all of them, however, the computational savings are predicated on the smoothness of the closure equations, and are governed by the ratio of the time that it takes to obtain a good local estimate of dµ/dt from full direct simulation to the time that we can (linearly or even polynomially) extrapolate µ(t) in time. It is also worth noting that a variety of additional computational tasks beyond projective integration (e.g. accelerated fixed point computation) can be performed within the equation-free framework.

In the next section we show by a concrete example how significant computational economy can be achieved with statistical moment closures implemented in the equation-free framework.

A NUMERICAL EXAMPLE

We illustrate here the equation-free implementation of moment closures for a canonical equation of phase-ordering kinetics [17], the stochastic time-dependent Ginzburg-Landau (TDGL) equation in one spatial dimension. This is written as

$$\partial_t \phi(x, t) = D\Delta\phi(x, t) - V'(\phi(x, t)) + \eta(x, t) \tag{12}$$

where φ(x, t) represents a local order parameter, e.g. a magnetization. The noise has mean zero and covariance $\langle \eta(x, t)\eta(x', t') \rangle = 2kT\,\delta(x - x')\delta(t - t')$. The potential V shall be chosen as $V(\phi) = \tfrac{1}{2}\phi^2 + \tfrac{1}{4}\phi^4$ to represent a single quartic/quadratic well. This stochastic dynamics has an invariant measure which is formally of Hamiltonian form $P_*[\phi] \propto \exp(-H[\phi]/kT)$ where

$$H[\phi] = \int \left[ \tfrac{1}{2} D|\nabla\phi(x)|^2 + V(\phi(x)) \right] dx. \tag{13}$$

The Gibbsian measure $P_*[\phi]$ is approached at long times for any random distribution $P_0[\phi]$ of initial states.

One of the simplest dynamical quantities of interest is the bulk magnetization $\bar{\phi}(t) = (1/V) \int \phi(x, t)\,dx$, where V is the total volume. If the initial statistics are space-homogeneous, then the ensemble average $\mu(t) = \langle \bar{\phi}(t) \rangle$ is also given by $\mu(t) = \langle \phi(x, t) \rangle$ for any space point x.
Equation (12) leads to a hierarchy of equations for statistical moments of φ(x, t). For example, the first moment satisfies the equation

$$\partial_t \langle \phi(x, t) \rangle = D\Delta\langle \phi(x, t) \rangle - \langle \phi(x, t) \rangle - \langle \phi^3(x, t) \rangle. \tag{14}$$

The evolution of the mean total magnetization is thus a function of the mean cubic total magnetization. One could write a time evolution equation for $\langle \phi^3 \rangle$, but it would involve a higher order term $\langle \phi^5 \rangle$, and so on. Each equation contains higher moments and therefore the hierarchy does not close.

To close the equation for µ(t) we assume a parametric PDF of the form $P[\phi; \alpha] \propto \exp(-H[\phi; \alpha]/kT)$ where

$$H[\phi; \alpha] = H[\phi] + \alpha \int \phi(x)\,dx$$

is a perturbation of the Hamiltonian (13) by a term proportional to the moment variable $\xi[\phi] = (1/V) \int \phi(x)\,dx$. This is a special case of a general “entropy-based” closure prescription proposed by Eyink and Levermore [16]. This closure scheme guarantees that α(t) → 0 at long times and therefore the PDF ansatz $P[\phi; \alpha(t)]$ relaxes to the correct stationary distribution $P_*[\phi]$ of the stochastic process. The determination of the parameter α given the moment µ is here accomplished by Legendre transform

$$\alpha = \operatorname{argmax}_{\alpha} [\alpha\mu - F(\alpha)], \tag{15}$$

where the “moment-generating function” is $F(\alpha) = \log \langle \exp[\alpha \int \phi(x)\,dx] \rangle_*$ and $\langle \cdot \rangle_*$ denotes average with respect to the invariant measure $P_*[\phi]$. The numerical optimization required for the Legendre transform is well-suited to gradient descent algorithms such as the conjugate gradient method, since

$$(\partial/\partial\alpha)[\alpha\mu - F(\alpha)] = \mu - \mu(\alpha),$$

where $\mu(\alpha) = \langle \xi \rangle_\alpha$ is the average of the moment function in the PDF ansatz $P[\phi; \alpha]$. In simple cases, F(α) and µ(α) = F′(α) may be given by closed analytical expressions. If not, then both of these averages may be determined together by Monte Carlo sampling techniques.

In the numerical calculations below, we discretize equation (12) using a forward Euler-Maruyama stochastic integrator and 3-point stencil for the Laplacian (other discretizations are possible).
$$\phi(x, t + \delta t) = \phi(x, t) - \delta t\,[\phi(x, t) + \phi^3(x, t)] + \frac{D\,\delta t}{(\delta x)^2} [\phi(x + \delta x, t) - 2\phi(x, t) + \phi(x - \delta x, t)] + \sqrt{2kT(\delta t/\delta x)}\,N(x, t) \tag{16}$$

where N(x, t) are independent, identically distributed standard normal random variables for each space-time point (x, t). The invariant distribution of the stochastic dynamics space-discretized in this manner has a Gibbsian form $\propto \exp(-H_\delta/kT)$ with discrete Hamiltonian

$$H_\delta[\phi] = \frac{D}{2\,\delta x} \sum_{\langle x, x' \rangle} (\phi(x) - \phi(x'))^2 + \delta x \sum_x \left[ \tfrac{1}{2}\phi^2(x) + \tfrac{1}{4}\phi^4(x) \right] \tag{17}$$

where $\langle x, x' \rangle$ are nearest-neighbor pairs. The closure ansatz can be adopted in the consistently discretized form $P_\delta[\phi; \alpha] \propto \exp(-H_\delta[\phi; \alpha]/kT)$ where $H_\delta[\phi; \alpha] = H_\delta[\phi] + \alpha \sum_x \delta x\,\phi(x)$.

In this numerical experiment, we integrate an N = 1000 member ensemble of solutions of equation (16), and measure the ensemble-averaged, global magnetization $\mu(t) = \langle \bar{\phi}(t) \rangle = (1/V) \sum_x \langle \phi(x, t) \rangle$ at each time-step. With this we compare the results of the entropy-based closure simulation implemented by the equation-free framework, also using an ensemble with N = 1000 samples. In this concrete example, the projective integration scheme works as follows. Suppose we are given the parameter α(t) at time t. The mean µ(t) is first calculated from the parametric ensemble at time t by Monte Carlo sampling. Next all N samples are integrated over a short time-step δt to create a time-advanced ensemble. From this ensemble µ(t + δt) is calculated, which yields an estimate of the local time derivative

$$\dot{\mu}_{\mathrm{app}}(t) = [\mu(t + \delta t) - \mu(t)]/\delta t.$$

A large, projective Euler time-step of the moment average is then taken via

$$\mu(t + \Delta t) = \mu(t) + \Delta t\,\dot{\mu}_{\mathrm{app}}(t).$$

The parameter is finally updated by using the Legendre transform inversion to obtain α(t + ∆t) from the known value µ(t + ∆t). The cycle may now be repeated to integrate the closure equations by successive time-steps of length ∆t.

A critical issue in the general application of projective integration is the criterion to determine the projective time-step ∆t.
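For reference, the fine-scale update (16) used in the experiment above can be sketched as a vectorized step acting on a whole ensemble at once. This is a sketch under assumptions rather than the authors' code: it takes V′(φ) = φ + φ³, a diffusive prefactor D δt/(δx)², noise amplitude √(2kT δt/δx), and periodic boundaries; the values of δx, kT, the lattice size and the initial field are made up for illustration.

```python
import numpy as np


def tdgl_step(phi, dt, dx, D, kT, rng):
    """One Euler-Maruyama step of the discretized TDGL equation.

    phi has shape (..., n_sites); the last axis is space, with periodic
    boundaries (an assumption of this sketch). Leading axes index ensemble
    members. The drift uses V'(phi) = phi + phi**3.
    """
    # 3-point stencil for the Laplacian.
    laplacian = (np.roll(phi, -1, axis=-1) - 2.0 * phi + np.roll(phi, 1, axis=-1)) / dx**2
    # Space-time white noise discretized on the lattice.
    noise = np.sqrt(2.0 * kT * dt / dx) * rng.standard_normal(phi.shape)
    return phi + dt * (D * laplacian - phi - phi**3) + noise


# Illustrative run; dx, kT, lattice size and the initial field are hypothetical.
rng = np.random.default_rng(0)
phi = np.full((1000, 64), 0.8)          # 1000-member ensemble, 64 lattice sites
for _ in range(25):                     # 25 fine steps of 0.0004 span one interval of 0.01
    phi = tdgl_step(phi, dt=0.0004, dx=1.0, D=1000.0, kT=1.0, rng=rng)
print(phi.mean())                       # ensemble- and space-averaged magnetization
```

With these illustrative values D δt/(δx)² = 0.4, which keeps the explicit diffusion step within the usual stability bound of 1/2.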
For stiff problems with time-scale separation, the projective time step for stability purposes is of the order of (1/fastest “slow group” eigenvalue), while the “preparatory” simulation time is of the order of (1/slowest “fast group” eigenvalue). Variants of the approach have been developed for problems with several gaps in their spectrum [18]. Accuracy considerations in real-time projective step selection can, in principle, be dealt with in the traditional way for integrators with adaptive step-size selection and error control: through on-line a posteriori error estimates. An additional “twist” arises from the error inherent in the estimation of the (unavailable) reduced time derivatives from the ensemble simulations; issues of variance reduction and even on-line hypothesis testing (are the data consistent with a local linear model?) must be considered. These are important research issues that are currently being explored by several research groups. Nevertheless, the main factor in computational savings comes from the effective smoothness of the unavailable closed equation: the separation of time scales between the low-order statistics we follow and the higher order statistics whose effect we model (and, eventually, the time scales of the direct simulation of the original model).

Figure 1 is a plot comparing Projective Integration with Entropy Closure and direct Ensemble Integration with equation (12) for diffusion constant D = 1000.0. We have selected both the “fine-scale” integration step δt and the “coarse-scale” projective integration step ∆t to be as large as possible, consistent with stability and accuracy. Thus, only steps small enough to avoid numerical blow-ups were considered. Then, values were selected both for δt and for ∆t so that the numerical integrations with those time-steps differed by at most a few percent from fully converged integrations with very small steps.
In this manner, the time step required for the Euler-Maruyama integration of (12) was determined to be δt = 0.0004. On the other hand, for projective integration of the closure equation a time step ∆t = 0.01 could be taken. This indicates a gain in time step by a factor of 25, which is also roughly the speed-up in the algorithm or savings in CPU time. The present example is not as stiff as equations that appear in more realistic applications, with a very broad range of length- and time-scales, where even greater computational economies might be expected.

In general, the moment-closure results need not agree so well with those of the direct ensemble approach, even when both are converged. In the example presented here, there is good agreement because the closure effectively captures the one-point PDF (see Fig. 2). This one-point PDF is the only statistical quantity that enters into equation (14) as long as the statistics are homogeneous and the Laplacian term vanishes.

FIG. 1: Mean total field as a function of time. Line (symbols): traditional (coarse projective) integration, respectively. See the text for a description of the stepsize selection.

FIG. 2: Comparison of the time dependent PDFs of the local field φ(x, t) for the exact solution (blue) and for the projective integration / closure solution (red).

CONCLUSIONS

In this paper, we have described how one can combine recently developed equation-free methods with statistical moment closures to model nonlinear problems. With this method we can numerically integrate complex nonlinear systems, for which closure equations may not be available in closed form. In the example presented here the specific entropy-based closure we selected has an H-theorem which guarantees relaxation to the equilibrium state of the original dissipative dynamics. However, we stress that the general approach outlined above can be used with a variety of closure methods.
The equation-free method has the potential to enhance the flexibility, power, and applications set of the statistical moment closure approach. Since little or no analytic work is required, the sophistication of statistical moment closures can be greatly enhanced beyond Gaussian PDF ansätze. The “practical usefulness” criterion for parametric PDF models, that they permit analytical calculations, is replaced by the criterion that they can be efficiently sampled. We believe that this approach can significantly increase the usefulness of closure methods.

In order to model systems like global climate, oceans, and reaction-diffusion processes in systems biology, one will have to construct more complex closures. These will likely include higher order moments, correlation functions of the relevant variables, highly non-Gaussian statistics, etc. As the closures become more complex, the lifting step will require more efficient sampling approaches. One will likely have to use nonlocal, accelerated sampling methods. One will also likely employ the latest in adaptive time and adaptive mesh methods to optimize performance for large-scale problems.

ACKNOWLEDGEMENTS

This work, LA-UR-07-2218, was carried out in part at Los Alamos National Laboratory under the auspices of the US National Nuclear Security Administration of the US Department of Energy. It was supported under contract number DE-AC52-06NA25396. The work of IGK was partially supported by DARPA and by US DOE (CMPD). G. Eyink was supported by NSF-ITR grant DMS-0113649.

[1] T. Schlick, R. D. Skeel, A. T. Brunger, L. V. Kale, J. A. Board, Jr., J. Hermans, and K. Schulten, J. Comp. Phys. 151, 9 (1999).
[2] M. Karplus and J. A. McCammon, Nature, Structural and Molecular Biology, 9, 646 (2002).
[3] P. G. Debenedetti and F. H. Stillinger, Nature 410, 259 (2001).
[4] D. T. Gillespie, J. Phys. Chem. 81, 2340 (1977).
[5] D. J.
Wilkinson, Stochastic Modeling for Systems Biology, Chapman & Hall / CRC Press, Boca Raton (2006).
[6] A. J. Majda and X. Wang, Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows, Cambridge University Press, Cambridge, UK (2006).
[7] J. P. Hansen and I. R. McDonald, Theory of Simple Liquids, Academic, New York (1986).
[8] S. B. Pope, Turbulent Flows, Cambridge University Press, Cambridge, UK (2000).
[9] A. J. Chorin, O. H. Hald, and R. Kupferman, Proceedings of the National Academy of Sciences of the United States of America 97, 2968 (2000).
[10] H. Chen, S. Chen, and R. H. Kraichnan, Phys. Rev. Lett. 63, 2657–2660 (1989).
[11] C. Yeung, Y. Oono, and A. Shinozaki, Phys. Rev. E 49, 2693 (1994).
[12] G. L. Eyink, Phys. Rev. E 54, 3419–3435 (1996).
[13] C. D. Levermore, J. Stat. Phys. 86, 1021–1065 (1996).
[14] I. G. Kevrekidis, C. W. Gear, J. M. Hyman, P. G. Kevrekidis, O. Runborg and K. Theodoropoulos, Comm. Math. Sciences 1(4), pp. 715-762 (2003); S. L. Lee and C. W. Gear, J. Comp. App. Math. 201, 258 (2007).
[15] I. G. Kevrekidis, C. William Gear and G. Hummer, A.I.Ch.E Journal 50(7), pp. 1346-1354 (2004).
[16] G. L. Eyink and C. D. Levermore (preprint), Entropy-Based Closures of Nonlinear Stochastic Dynamics, submitted to “Communications in Mathematical Sciences” (2006).
[17] A. J. Bray, Adv. in Phys. 43, 357 (1994).
[18] C. W. Gear and I. G. Kevrekidis, J. Comp. Phys. 187, 95 (2003).
[19] In the case of [10] the dynamics is an advection-reaction-diffusion equation for a scalar concentration field $X(t) = \{\theta(x, t) : x \in \mathbb{R}^d\}$. The moment functions are the “fine-grained PDF” $\xi_{\vartheta,x}[X, t] = \delta(\theta(x, t) - \vartheta)$, labelled by space point x and scalar value ϑ. The moment average $\mu_{\vartheta,x}(t) = \langle \delta(\theta(x, t) - \vartheta) \rangle$ is the 1-point PDF p(ϑ; x, t) which gives the distribution of scalar values ϑ at space-time point (x, t).
The parametric model P[X; α, t] is the distribution over scalar fields obtained by the ansatz θ(x, t) = X(θ_0(x, t), x, t) where θ_0(x, t) is a reference random field of known (Gaussian) statistics and X(·, x, t) : R → R is a “mapping function”. The latter function is the “parameter” α_{ϑ_0,x}(t) = X(ϑ_0, x, t) which determines (and is determined by) the “moment” µ_{ϑ,x}(t) from the relation p(X(ϑ_0, x, t); x, t) |∂X/∂ϑ_0| = p_0(ϑ_0, x, t). Here p_0 is the 1-point PDF of the reference Gaussian field θ_0(x, t). The approach of [11] is similar. The problem is phase-ordering dynamics as given, for example, by our equation (12) and X(t) = {φ(x, t) : x ∈ R^d}. The moment functions are the quadratic products ξ_r[X, t] = φ(r, t)φ(0, t), labelled by the displacement r ∈ R^d, and the moment averages µ_r(t) are the spatial correlation function C(r, t). The parametric model P[X; α, t] is the distribution obtained by the ansatz φ(x, t) = f(u(x, t)) where u(x, t) is a homogeneous Gaussian random field with mean zero and covariance G(r, t) = 〈u(r, t)u(0, t)〉 and f(z) is the stationary planar interface solution of the TDGL equation (12). In this case, it is the auxiliary correlation function G(r, t) which plays the role of the “parameter” α_r(t). It is shown in [11] for various cases how this function may be uniquely related to the “moment” µ_r(t) = C(r, t).
ABSTRACT
We present a general numerical scheme for the practical implementation of statistical moment closures suitable for modeling complex, large-scale, nonlinear systems. Building on recently developed equation-free methods, this approach numerically integrates the closure dynamics, the equations of which may not even be available in closed form. Although closure dynamics introduce statistical assumptions of unknown validity, they can have significant computational advantages as they typically have fewer degrees of freedom and may be much less stiff than the original detailed model.
The closure method can in principle be applied to a wide class of nonlinear problems, including strongly-coupled systems (either deterministic or stochastic) for which there may be no scale separation. We demonstrate the equation-free approach for implementing entropy-based Eyink-Levermore closures on a nonlinear stochastic partial differential equation. <|endoftext|><|startoftext|> Introduction System Model Relay Selection Via Limited Feedback Performance Impact of Varying System Parameters BER Analysis Conclusion References ABSTRACT It has been shown that a decentralized relay selection protocol based on opportunistic feedback from the relays yields good throughput performance in dense wireless networks. This selection strategy supports a hybrid-ARQ transmission approach where relays forward parity information to the destination in the event of a decoding error. Such an approach, however, suffers a loss compared to centralized strategies that select relays with the best channel gain to the destination. This paper closes the performance gap by adding another level of channel feedback to the decentralized relay selection problem. It is demonstrated that only one additional bit of feedback is necessary for good throughput performance. The performance impact of varying key parameters such as the number of relays and the channel feedback threshold is discussed. An accompanying bit error rate analysis demonstrates the importance of relay selection. <|endoftext|><|startoftext|> Introduction This paper describes the Fourth Edition of the Sloan Digital Sky Survey (SDSS; York et al. 2000) Quasar Catalog. Previous versions of the catalog (Schneider et al. 2002, 2003, 2005; hereafter Papers I, II, and III) were published with the SDSS Early Data Release (EDR; Stoughton et al. 2002), the SDSS First Data Release (DR1; Abazajian et al. 2003), and the SDSS Third Data Release (DR3; Abazajian et al. 2005), and contained 3,814, 16,713, and 46,420 quasars, respectively. 
The current catalog is the entire set of quasars from the SDSS-I Quasar Survey; the SDSS-I was completed on 30 June 2005 and the Fifth Data Release (DR5; Adelman-McCarthy et al. 2007) was made public on 30 June 2006. The catalog contains 77,429 quasars, the vast majority of which were discovered by the SDSS. The SDSS Quasar Survey is continuing via the SDSS-II Legacy Survey, which is an extension of the SDSS-I. The catalog in the present paper consists of the DR5 objects that have a luminosity larger than Mi = −22.0 (calculated assuming an H0 = 70 km s−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7 cosmology [Spergel et al. 2006], which will be used throughout this paper), and whose SDSS spectra contain at least one broad emission line (velocity FWHM larger than ≈ 1000 km s−1) or have interesting/complex absorption-line features. The catalog also has a bright limit of i ≈ 15.0. The quasars range in redshift from 0.08 to 5.41; 78% have redshifts below 2.0. The objects are denoted in the catalog by their DR5 J2000 coordinates; the format for the object name is SDSS Jhhmmss.ss+ddmmss.s. Since the image data used for the astrometric information can change between data releases (e.g., a region with poor seeing that is included in an early release is superseded by a newer observation in good seeing), the coordinates for an object can change at the 0.1′′ to 0.2′′ level; hence the designation of a given source can change between data releases. Except on very rare occasions (see §5.1), this change in position is much less than 1′′. When merging SDSS Quasar Catalogs with previous databases one should always use the coordinates, not object names, to identify unique entries. The DR5 catalog does not include classes of Active Galactic Nuclei (AGN) such as Type 2 quasars, Seyfert galaxies, and BL Lacertae objects; studies of these sources in the SDSS can be found in Zakamska et al. (2003) (Type 2), Kauffmann et al. (2003) and Hao et al. (2005) (Seyferts), and Collinge et al.
(2005) and Anderson et al. (2007) (BL Lacs). Spectra of the highest redshift SDSS quasars (z > 5.7; e.g., Fan et al. 2003, 2006a) were not acquired as part of the SDSS quasar survey (the objects were identified as candidates in the SDSS imaging data, but the spectra were not obtained with the SDSS spectrographs), so they are not included in the catalog. The observations used to produce the catalog are presented in Section 2; the construction of the catalog and the catalog format are discussed in Sections 3 and 4, respectively. Section 5 presents an overview of the catalog, and a summary is given in Section 6. The catalog is presented in an electronic table in this paper and can also be found at an SDSS public web site.¹
2. Observations
2.1. Sloan Digital Sky Survey
The Sloan Digital Sky Survey uses a CCD camera (Gunn et al. 1998) on a dedicated 2.5-m telescope (Gunn et al. 2006) at Apache Point Observatory, New Mexico, to obtain images in five broad optical bands (ugriz; Fukugita et al. 1996) over approximately 10,000 deg2 of the high Galactic latitude sky. The survey data-processing software measures the properties of each detected object in the imaging data in all five bands, and determines and applies both astrometric and photometric calibrations (Pier et al. 2003; Lupton et al. 2001; Ivezić et al. 2004). Photometric calibration is provided by simultaneous observations with a 20-inch telescope at the same site (see Hogg et al. 2001, Smith et al. 2002, Stoughton et al. 2002, and Tucker et al. 2006). The SDSS photometric system is based on the AB magnitude scale (Oke & Gunn 1983). The catalog contains photometry from 204 SDSS imaging runs acquired between 19 September 1998 (Run 94) and 13 May 2005 (Run 5326).
2.2. Target Selection
The SDSS filter system was designed to identify quasars at redshifts between zero and approximately six; most quasar candidates are selected based on their location in multidimensional SDSS color-space.
The Point Spread Function (PSF) magnitudes are used for the quasar target selection, and the selection is based on magnitudes and colors that have been corrected for Galactic extinction (using the maps of Schlegel, Finkbeiner, & Davis 1998). An i magnitude limit of 19.1 is imposed for candidates whose colors indicate a probable redshift of less than ≈ 3.0 (selected from the ugri color cube); high-redshift candidates (selected from the griz color cube) are accepted if i < 20.2 and the source is unresolved. The errors on the i measurements are typically 0.02–0.03 and 0.03–0.04 magnitudes at the brighter and fainter limits, respectively. In addition to the multicolor selection, unresolved objects brighter than i = 19.1 that lie within 2.0′′ of a FIRST radio source (Becker, White, & Helfand 1995) are also identified as primary quasar candidates. Target selection also imposes a maximum brightness limit (i ≈ 15.0) on quasar candidates; the spectra of objects that exceed this brightness could contaminate the adjacent spectra on the detectors of the SDSS spectrographs. A detailed description of the quasar selection process and possible biases can be found in Richards et al. (2002a). The primary sample described above was supplemented by quasars that were targeted by the following SDSS spectroscopic target selection algorithms: Galaxy and Luminous Red Galaxy (Strauss et al. 2002 and Eisenstein et al. 2001), X-ray (object near the position of a ROSAT All-Sky Survey [RASS; Voges et al. 1999, 2000] source; see Anderson et al. 2003), Star (point source with a color typical of an interesting class of star), or Serendipity (unusual color or FIRST matches).
¹ http://www.sdss.org/dr5/products/value_added/qsocat_dr5.html
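The magnitude limits quoted above can be collected into a small sketch. This is only an illustration of the numerical limits stated in the text, not the selection algorithm of Richards et al. (2002a): the `Candidate` record, its two color-outlier booleans, and the function name are hypothetical, and the multidimensional color tests themselves are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

# Limits quoted in the text (extinction-corrected PSF i magnitudes).
I_BRIGHT_LIMIT = 15.0            # brighter objects are not targeted
I_FAINT_LOW_Z = 19.1             # ugri-selected candidates (probable z < ~3)
I_FAINT_HIGH_Z = 20.2            # griz-selected candidates, unresolved only
FIRST_MATCH_RADIUS_ARCSEC = 2.0  # SDSS-FIRST matching radius


@dataclass
class Candidate:
    """Hypothetical record; the field names are assumptions for this sketch."""
    i_mag: float                     # extinction-corrected PSF i magnitude
    resolved: bool                   # True if the image is extended
    ugri_color_outlier: bool         # stand-in for the ugri color-cube test
    griz_color_outlier: bool         # stand-in for the griz color-cube test
    first_sep_arcsec: Optional[float] = None  # distance to nearest FIRST source


def passes_magnitude_limits(c: Candidate) -> bool:
    """Apply only the magnitude/morphology limits stated in the text."""
    if c.i_mag < I_BRIGHT_LIMIT:
        return False  # too bright: would contaminate adjacent spectra
    low_z = c.ugri_color_outlier and c.i_mag < I_FAINT_LOW_Z
    high_z = (c.griz_color_outlier and not c.resolved
              and c.i_mag < I_FAINT_HIGH_Z)
    # Boundary handling (<= at exactly 2.0 arcsec) is an assumption.
    radio = (c.first_sep_arcsec is not None
             and c.first_sep_arcsec <= FIRST_MATCH_RADIUS_ARCSEC
             and not c.resolved
             and c.i_mag < I_FAINT_LOW_Z)
    return low_z or high_z or radio
```

For instance, under these assumptions a griz-selected point source at i = 19.5 passes, while a ugri-selected source at the same magnitude does not.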
The SDSS is designed to be complete in the Galaxy, Luminous Red Galaxy and Quasar programs, (in practice various limitations reduce the completeness to about 90%) but no attempt at completeness was made for the other categories. Most of the DR5 quasars that fall below the magnitude limits of the quasar survey were selected by the serendipity algorithm (see §5). While the bulk of the catalog objects targeted as quasars were selected based on the algorithm of Richards et al. (2002a), during the early years of the SDSS the quasar selection software was undergoing constant modification to improve its efficiency. All of the sources in Papers I and II, and some of the Paper III objects, were not identified with the final selection algorithm. Once the final target selection software was installed, the algorithm was applied to the entire SDSS photometric database. Each DR5 quasar has two spectroscopic target selection flags listed in the catalog: BEST, which refers to the final algorithm, and TARGET, which is the target flag used in the actual spectroscopic targeting. There are also two sets of photometric measurements for each quasar: BEST, which refers to the measurements with the latest photometric software on the highest quality data, and TARGET, which are the values used at the time of the spectroscopic target selection. Extreme care must be exercised when constructing statistical samples from this catalog; if one uses the values produced by only the latest version of the selection software, not only must one drop the catalog quasars that were not identified as quasar candidates by the final selection software, one must also account for quasar candidates produced by the final version that were not observed in the SDSS spectroscopic survey (this can occur in regions of sky whose spectroscopic targets were identified by early versions of the selection software). 
The selection for the UV-excess quasars, which comprise the majority (≈ 80%) of the objects in the DR5 Catalog, has remained reasonably uniform; the changes to the selection algorithm were primarily designed to increase the effectiveness of the identification of 3.0 < z < 3.8 quasars. Extensive discussion of the completeness and efficiency of the selection can be found in Richards et al. (2002a) and Vanden Berk et al. (2005); Richards et al. (2006) describes the process for the construction of statistical SDSS quasar samples (see also Adelman-McCarthy et al. 2007). The survey efficiency (the ratio of quasars to quasar candidates) for the ultraviolet excess-selected candidates, which comprise the bulk of the quasar sample, is about 77%. (The catalog contains information on which objects can be used in a uniform sample; see Section 4.)
2.3. Spectroscopy
Spectroscopic targets chosen by the various SDSS selection algorithms (i.e., quasars, galaxies, stars, serendipity) are arranged onto a series of 3◦ diameter circular fields (Blanton et al. 2003). Details of the spectroscopic observations can be found in York et al. (2000), Castander et al. (2001), Stoughton et al. (2002), and Paper I. A total of 1458 spectroscopic fields, taken between 5 March 2000 and 14 June 2005, provided the quasars for the DR5 quasar catalog; the locations of the plate centers can be found from the information given by Adelman-McCarthy et al. (2007). The DR5 spectroscopic program attempted to cover, in a well-defined manner, an area of ≈ 5740 deg2. Spectroscopic plate 716 was the first spectroscopic observation that was based on the final version of the quasar target selection algorithm of Richards et al. (2002a); the detailed tiling information in the SDSS database must be consulted to identify those regions of sky targeted with the final selection algorithm (see Richards et al. 2006). The two SDSS double spectrographs produce data covering 3800–9200 Å at a spectral resolution of ≃ 2000.
The data, along with the associated calibration frames, are processed by the SDSS spectroscopic pipeline (see Stoughton et al. 2002). The calibrated spectra are classified into various groups (e.g., star, galaxy, quasar), and redshifts are determined by two independent software packages. Objects whose spectra cannot be classified by the software are flagged for visual inspection. Figure 1 shows the calibrated SDSS spectra of four previously unknown catalog quasars representing a range of properties. The processed DR5 spectra have not been corrected for Galactic extinction.
3. Construction of the SDSS DR5 Quasar Catalog
The quasars in the catalog were drawn from three sets of SDSS observations: 1) the primary survey area, 2) “Bonus” plates, which are spectroscopic observations of regions near to, but outside of, the primary survey area, and 3) “Special” plates, where the spectroscopic targets were not chosen by the standard SDSS target selection algorithms (e.g., a set of plates to investigate the structure of the Galaxy; see Adelman-McCarthy et al. 2006). The DR5 quasar catalog was constructed, as were the previous editions, in three stages: 1) Creation of a quasar candidate database, 2) Visual examination of the spectra of the quasar candidates, and 3) Application of luminosity and emission-line velocity width criteria. All three tasks were initially done without reference to the material in the previous SDSS Quasar Catalogs, although the results of each task were compared to the Paper III database (e.g., the construction of the quasar database was not viewed as complete until it was understood why any Paper III quasars were not included).
3.1. Creation of the DR5 Quasar Candidate Database
This catalog of bona-fide quasars, which have redshifts checked by eye and luminosities and line widths that meet the formal quasar definition, is constructed from a larger “master” table of confirmed quasars and quasar candidates.
This master table was created using an SQL query to the public SDSS-DR5 database (i.e., the Catalog Archive Server [CAS]; http://cas.sdss.org/astrodr5/). Two versions of the photometric database exist, which contain the properties of objects when targeted for spectroscopic observations (TARGET) and as determined in the latest processing (BEST). These databases are divided into multiple tables and subtables to facilitate access to only the most relevant data for a particular use. In the case of the quasar catalog construction, we have made use of the PhotoObjAll and SpecObjAll tables, which contain, respectively, the photometric information for all SDSS sources and for all SDSS spectra. In the case of PhotoObjAll, both the TARGET and BEST versions are queried. These tables include duplicate observations of objects and observations of objects that lie outside of the formal SDSS area (as compared to the PhotoObj and SpecObj tables, which include only sources in the formal SDSS area), and are the most complete database files. For example, in PhotoObjAll, two (or more) observations of a single object may exist; if so, one is classified as PRIMARY, the other(s) as SECONDARY. This master table contains all objects identified as quasar candidate targets for spectroscopy in either the TARGET or BEST photometric databases. Quasar candidates are those objects which have had one or more of the following flags set by the algorithm described by Richards et al. (2002a): TARGET_QSO_HIZ OR TARGET_QSO_CAP OR TARGET_QSO_SKIRT OR TARGET_QSO_FIRST_CAP OR TARGET_QSO_FIRST_SKIRT (= 0x0000001F, except for the “special” plates [see Adelman-McCarthy et al. 2006, 2007], where additional care is required in interpreting the flags). Objects flagged as TARGET_QSO_MAG_OUTLIER and TARGET_QSO_REJECT are not included, as these flags are meant only for diagnostic purposes.
(In the CAS documentation and the EDR paper, TARGET_QSO_MAG_OUTLIER is called TARGET_QSO_FAINT.) Furthermore, the master table includes any objects with spectra that have been classified by the spectroscopic pipeline as quasars (specClass=QSO or HIZ_QSO), that have UNKNOWN type, or that have redshifts greater than 0.6. (On rare occasions the spectroscopic pipeline measures the correct redshift for a quasar but classifies the object as a galaxy.) The query was run on the union of the database tables Target..PhotoObjAll, Best..SpecObjAll, and Best..PhotoObjAll. Multiple entries for a given object are retained at this stage. Ten objects in the DR3 Quasar Catalog were missed by this query. One omission was due to an “unmapped” fiber (a spectrum of a quasar was obtained, but because of a failure in the mapping of fiber number to location in the sky, we are no longer certain of the celestial position of the object); the other nine were low-redshift AGN that were not classified as quasars by the spectroscopic pipeline (this result provides an estimate of the incompleteness produced by the query). We were able to identify the information for all ten quasars in the database and add the material to the master table. Four automated cuts were made to the master table database of 329,884 candidates²: 1) Objects targeted as quasars but whose spectra had not yet been obtained by the closing date of DR5 (124,447 objects), 2) Candidates classified with high confidence as “stars” by the spectroscopic pipeline that had redshifts less than 0.002 (33,653), 3) Objects whose photometric measurements have not been loaded into the CAS (3106) and 4) Multiple spectra (coordinate agreement better than 1.0′′) of the same object (40,007).
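As a check on the bookkeeping, the candidate-flag test and the four cuts can be sketched in a few lines. The individual bit positions below are an assumption chosen so that their union equals the 0x0000001F mask quoted in the text, and the helper name `is_quasar_candidate` is hypothetical; the counts are the ones given in the text, and the total they leave matches the number of unique candidates the paper reports after these cuts (128,671).

```python
# Assumed bit assignments; only their union (0x0000001F) is given in the text.
TARGET_QSO_HIZ = 0x01
TARGET_QSO_CAP = 0x02
TARGET_QSO_SKIRT = 0x04
TARGET_QSO_FIRST_CAP = 0x08
TARGET_QSO_FIRST_SKIRT = 0x10

QSO_CANDIDATE_MASK = (TARGET_QSO_HIZ | TARGET_QSO_CAP | TARGET_QSO_SKIRT
                      | TARGET_QSO_FIRST_CAP | TARGET_QSO_FIRST_SKIRT)


def is_quasar_candidate(primtarget: int) -> bool:
    """True if any of the five quasar target bits is set (hypothetical helper)."""
    return (primtarget & QSO_CANDIDATE_MASK) != 0


# Counts quoted in the text for the master table and the four automated cuts.
MASTER_TABLE_SIZE = 329_884
CUTS = {
    "no spectrum by the DR5 closing date": 124_447,
    "high-confidence stars with z < 0.002": 33_653,
    "photometry not loaded into the CAS": 3_106,
    "duplicate spectra of the same object": 40_007,
}

remaining = MASTER_TABLE_SIZE - sum(CUTS.values())
print(hex(QSO_CANDIDATE_MASK), remaining)  # 0x1f 128671
```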
In cases of duplicate spectra of an object, the “science primary” spectrum is selected (i.e., the spectrum was obtained as part of normal science operations); when there is more than one science primary observation (or when none of the spectra have this flag set), the spectrum with the highest signal-to-noise ratio (S/N) is retained (see Stoughton et al. 2002 for a description of the science primary flag). These actions produced a list of 128,671 unique quasar candidates.
² The master table is known as the QSOConcordanceALL table, which can be found in the SDSS database; see http://cas.sdss.org/astrodr5/en/help/browser/description.asp?n=QsoConcordanceAll&t=U.
3.2. Visual Examination of the Spectra
The SDSS spectra of the remaining quasar candidates were manually inspected by several of the authors (DPS, PBH, GTR, MAS, and SFA); as in previous papers in this series, we found that the spectroscopic pipeline redshifts and classifications of the overwhelming majority of the objects are accurate. Tens of thousands of objects were dropped from the list because they were obviously not quasars (these objects tended to be low S/N stars, unusual stars, and a mix of absorption-line and narrow emission-line objects); this large number of candidates that are not quasars is due to the inclusive nature of our initial database query. Spectra for which redshifts could not be determined (low signal-to-noise ratio or subject to data-processing difficulties) were also removed from the sample. This visual inspection resulted in the revisions of the redshifts of 863 quasars; the changes in the individual redshifts were usually quite substantial, due to the spectroscopic pipeline misidentifying emission lines. An independent determination of the redshifts of 5,865 quasars with redshifts larger than 2.9 in the catalog was performed by Shen et al. (2007).
The redshift differences between the two sets of measurements follow a Gaussian distribution (with slightly extended wings), with a mean of 0.002 and a dispersion of 0.01. The catalog contains numerous examples of extreme Broad Absorption Line (BAL) Quasars (see Hall et al. 2002); it is difficult if not impossible to apply the emission-line width criterion for these objects, but they are clearly of interest, have more in common with “typical” quasars than with narrow-emission line galaxies, and have historically been included in quasar catalogs. We have included in the catalog all objects with broad absorption-line spectra that meet the Mi < −22.0 luminosity criterion.
3.3. Luminosity and Line Width Criteria
As in Papers II and III, we adopt a luminosity limit of Mi = −22.0. The absolute magnitudes were calculated by correcting the BEST i measurement for Galactic extinction (using the maps of Schlegel, Finkbeiner, & Davis 1998) and assuming that the quasar spectral energy distribution in the ultraviolet-optical can be represented by a power law (fν ∝ ν^α), where α = −0.5 (Vanden Berk et al. 2001). (In the 134 cases where BEST photometry was not available, the TARGET measurements were substituted for the absolute magnitude calculation.) This approach ignores the contributions of emission lines and the observed distribution in continuum slopes. Emission lines can contribute several tenths of a magnitude to the k-correction (see Richards et al. 2006), and variations in the continuum slopes can introduce a magnitude or more of error into the calculation of the absolute magnitude, depending upon the redshift. The absolute magnitudes will be particularly uncertain at redshifts near and above five, when the Lyman α emission line (with a typical observed equivalent width of ≈ 400–500 Å) and strong Lyman α forest absorption enter the i bandpass.
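The calculation described above can be sketched as follows. This is a minimal illustration, assuming the flat cosmology adopted in the paper (H0 = 70 km s−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7) and the standard K-correction for a pure power law, K(z) = −2.5 (1 + α) log10(1 + z); the catalog's actual code is not given in the text, and the function names and the Simpson-rule integration are choices made for this example only.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # km/s/Mpc
OMEGA_M = 0.3
OMEGA_L = 0.7
ALPHA = -0.5          # power-law index, f_nu proportional to nu**alpha


def luminosity_distance_mpc(z: float, steps: int = 1000) -> float:
    """Luminosity distance in a flat Lambda-CDM model (Simpson's rule)."""
    if z <= 0:
        raise ValueError("z must be positive")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = z / steps

    def inv_e(x: float) -> float:
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + x) ** 3 + OMEGA_L)

    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * inv_e(k * h)
    comoving = (C_KM_S / H0) * total * h / 3.0
    return (1.0 + z) * comoving


def k_correction(z: float, alpha: float = ALPHA) -> float:
    """K-correction for a pure power-law continuum (no emission lines)."""
    return -2.5 * (1.0 + alpha) * math.log10(1.0 + z)


def absolute_i_magnitude(i_psf: float, a_i: float, z: float) -> float:
    """M_i from the observed PSF magnitude, i-band extinction A_i, and z."""
    distance_modulus = 5.0 * math.log10(luminosity_distance_mpc(z) * 1.0e6 / 10.0)
    return i_psf - a_i - distance_modulus - k_correction(z)
```

Because the sketch ignores emission lines and the spread in continuum slopes, it inherits the uncertainties described in the text, which grow with redshift.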
Quasars near the Mi = −22.0 luminosity limit are often not enormously brighter in the i-band than the starlight produced by the host galaxy. Although the PSF-based SDSS photometry presented in the catalog is less susceptible to host galaxy contamination than are fixed-aperture measurements, the nucleus of the host galaxy can still contribute appreciably to this measurement for the lowest luminosity entries in the catalog (see Hao et al. 2005). An object of Mi = −22.0 will reach the i = 19.1 “low-redshift” selection limit at a redshift of ≈ 0.4. After visual inspection and application of the luminosity criterion had reduced the number of quasar candidates to under 80,000 objects, the remaining spectra were processed with an automated line-measuring routine. The spectra for objects whose maximum line width was less than 1000 km s−1 were visually examined; if the measurement was deemed to be an accurate reflection of the line (automated routines occasionally have spectacular failures when dealing with complex line profiles), the object was removed from the catalog.
4. Catalog Format
The DR5 SDSS Quasar Catalog is available in three types of files at the SDSS public web site listed in the introduction: 1) a standard ASCII file with fixed-size columns, 2) a gzipped compressed version of the ASCII file (which is smaller than the uncompressed version by a factor of more than four), and 3) a binary FITS table format. The following description applies to the standard ASCII file. All files contain the same number of columns, but the storage of the numbers differs slightly in the ASCII and FITS formats; the FITS header contains all of the required documentation. Table 1 provides a summary of the information contained in each of the columns in the ASCII catalog. The standard ASCII catalog (Table 2 of this paper) contains information on 77,429 quasars in a 36 MB file. The DR5 format is similar to that of DR3 with a few minor differences.
The first 80 lines consist of catalog documentation; this is followed by 77,429 lines containing information on the quasars. There are 74 columns in each line; a summary of the information is given in Table 1 (the documentation in the ASCII catalog header is essentially an expansion of Table 1). At least one space separates all the column entries, and, except for the first and last columns (SDSS designation and the object name if previously known), all entries are reported in either floating point or integer format.
Notes on the catalog columns:
1) The DR5 object designation, given by the format SDSS Jhhmmss.ss+ddmmss.s; only the final 18 characters are listed in the catalog (i.e., the “SDSS J” for each entry is dropped). The coordinates in the object name follow IAU convention and are truncated, not rounded.
2–3) The J2000 coordinates (Right Ascension and Declination) in decimal degrees. The positions for the vast majority of the objects are accurate to 0.1′′ rms or better in each coordinate; the largest expected errors are 0.2′′ (see Pier et al. 2003). The SDSS coordinates are placed in the International Celestial Reference System, primarily through the United States Naval Observatory CCD Astrograph Catalog (Zacharias et al. 2000), and have an rms accuracy of 0.045′′ per coordinate.
4) The quasar redshifts. A total of 863 of the CAS redshifts were revised during our visual inspection. A detailed description of the redshift measurements is given in Section 4.10 of Stoughton et al. (2002). A comparison of 299 quasars observed at multiple epochs by the SDSS (Wilhite et al. 2005) found an rms difference of 0.006 in the measured redshifts for a given object. It is well known that the redshifts of individual broad emission lines in quasars exhibit significant offsets from their systemic redshifts (e.g., Gaskell 1982, Richards et al. 2002b, Shen et al. 2007); the catalog redshifts attempt to correct for this effect in the ensemble average (see Stoughton et al. 2002).
5–14) The DR5 PSF magnitudes and errors (not corrected for Galactic extinction) from BEST photometry for each object in the five SDSS filters. Some of the relevant imaging scans, such as special scans through M31 (see the DR4 and DR5 papers), were never loaded into the CAS; therefore the BEST photometry is not available for them. Thus there are 134 quasars which have entries of “0.000” for their BEST photometric measurements. The effective wavelengths of the u, g, r, i, and z bandpasses are 3541, 4653, 6147, 7461, and 8904 Å, respectively (for an α = −0.5 power-law spectral energy distribution using the definition of effective wavelength given in Schneider, Gunn, & Hoessel 1983). The photometric measurements are reported in the natural system of the SDSS camera, and the magnitudes are normalized to the AB system (Oke & Gunn 1983). The measurements are reported as asinh magnitudes (Lupton, Gunn, & Szalay 1999); see Adelman-McCarthy et al. (2007) for additional discussion and references for the accuracy of the photometric measurements. The TARGET PSF photometric measurements are presented in columns 63–72.
15) The Galactic extinction in the u band based on the maps of Schlegel, Finkbeiner, & Davis (1998). For an R_V = 3.1 absorbing medium, the extinctions in the SDSS bands can be expressed as A_x = C_x E(B − V), where x is the filter (ugriz), and values of C_x are 5.155, 3.793, 2.751, 2.086, and 1.479 for ugriz, respectively (A_g, A_r, A_i, and A_z are 0.736, 0.534, 0.405, and 0.287 times A_u).
16) The logarithm of the Galactic neutral hydrogen column density along the line of sight to the quasar. These values were estimated via interpolation of the 21-cm data from Stark et al. (1992), using the COLDEN software provided by the Chandra X-ray Center. Errors associated with the interpolation are typically expected to be less than ≈ 1 × 10^20 cm^−2 (e.g., see §5 of Elvis, Lockman, & Fassnacht 1994).
17) Radio properties.
If there is a source in the FIRST catalog within 2.0′′ of the quasar position, this column contains the FIRST peak flux density at 20 cm encoded as an AB magnitude, AB = −2.5 log(fν / 3631 Jy) (see Ivezić et al. 2002). An entry of “0.000” indicates no match to a FIRST source; an entry of “−1.000” indicates that the object does not lie in the region covered by the final catalog of the FIRST survey. The catalog contains 6226 FIRST matches; 5729 DR5 quasars lie outside of the FIRST area.
18) The S/N of the FIRST source whose flux is given in column 17.
19) Separation between the SDSS and FIRST coordinates (in arc seconds).
20) In cases when the FIRST counterpart to an SDSS source is extended, the FIRST catalog position of the source may differ by more than 2′′ from the optical position. A “1” in column 20 indicates that no matching FIRST source was found within 2′′ of the optical position, but that there is significant detection (larger than 3σ) of FIRST flux at the optical position. This is the case for 2440 SDSS quasars.
21) A “1” in column 21 identifies the 1596 sources with a FIRST match in either columns 17 or 20 that also have at least one FIRST counterpart located between 2.0′′ (the SDSS-FIRST matching radius) and 30′′ of the optical position. Based on the average FIRST source surface density of 90 deg−2, we expect 50–60 of these matches to be chance superpositions.
22) The logarithm of the vignetting-corrected count rate (photons s−1) in the broad energy band (0.1–2.4 keV) in the ROSAT All-Sky Survey Faint Source Catalog (Voges et al. 2000) and the ROSAT All-Sky Survey Bright Source Catalog (Voges et al. 1999). The matching radius was set to 30′′; an entry of “−9.000” in this column indicates no X-ray detection. There are 4133 RASS matches in the DR5 catalog.
23) The S/N of the ROSAT measurement.
24) Separation between the SDSS and ROSAT All-Sky Survey coordinates (in arc seconds).
25–30) The JHK magnitudes and errors from the Two Micron All Sky Survey (2MASS; Skrutskie et al. 2006) All-Sky Data Release Point Source Catalog (Cutri et al. 2003) using a matching radius of 2.0′′. A non-detection by 2MASS is indicated by a “0.000” in these columns. Note that the 2MASS measurements are Vega-based, not AB, magnitudes. The catalog contains 9824 2MASS matches.
31) Separation between the SDSS and 2MASS coordinates (in arc seconds).
32) The absolute magnitude in the i band calculated by correcting for Galactic extinction and assuming H0 = 70 km s−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7, and a power-law (frequency) continuum index of −0.5.
33) The ∆(g − i) color, which is the difference in the Galactic extinction corrected (g − i) for the quasar and that of the mean of the quasars at that redshift. If ∆(g − i) is not defined for the quasar, which occurs for objects at either z < 0.12 or z > 5.12, the column will contain “−9.000”. See Section 5.2 for a description of this quantity.
34) Morphological information. If the SDSS photometric pipeline classified the image of the quasar as a point source, the catalog entry is 0; if the quasar is extended, the catalog entry is 1.
35) The SDSS SCIENCEPRIMARY flag, which indicates whether the spectrum was taken as a normal science spectrum (SCIENCEPRIMARY = 1) or for another purpose (SCIENCEPRIMARY = 0). The latter category contains Quality Assurance and calibration spectra, or spectra of objects located outside of the nominal survey area. Over 90% of the DR5 entries (69,762 objects) are SCIENCEPRIMARY = 1.
36) This flag provides information on whether the photometric object is designated PRIMARY (1), SECONDARY (2), or FAMILY (3; these are blended objects that have not been deblended).
During target selection, only PRIMARY objects are considered (except on occasion for objects located in fields that are not part of the nominal survey area); however, differences between TARGET and BEST photometric pipeline versions make it possible that the BEST photometric object belonging to a spectrum is either not detected at all, or is a non-primary object (see §3.1 above). Over 99% of the catalog entries are PRIMARY; 613 quasars are SECONDARY and 9 are FAMILY. There are 124 quasars with an entry of “0” in this column; each of these is an object that lacks BEST photometry. For statistical analysis, one should use only PRIMARY objects; SECONDARY and FAMILY objects are included in the catalog for the sake of completeness with respect to confirmed quasars.
37) The “uniform selection” flag, either 0 or 1; a “1” indicates that the object was identified as a primary quasar target (37,574 catalog entries) with the final target selection algorithm as given by Richards et al. (2002a). These objects constitute a statistical sample.
38) The 32-bit SDSS target-selection flag from BEST processing (PRIMTARGET; see Table 26 in Stoughton et al. 2002 for details); this is the flag produced by running the selection algorithm of Richards et al. (2002a) on the most recent processing of the image data. The target-selection flag from TARGET processing is found in column 55.
39–45) The spectroscopic target selection breakdown (BEST) for each object. The target selection flag in column 38 is decoded for seven groups: Low-redshift quasar, High-redshift quasar, FIRST, ROSAT, Serendipity, Star, and Galaxy. An entry of “1” indicates that the object satisfied the given criterion (see Stoughton et al. 2002). Note that an object can be, and often is, targeted by more than one selection algorithm.
The last two columns in Table 3 present the number of quasars identified by the individual BEST target selection algorithms; the column labeled “Sole” indicates the number of objects that were detected by only one of the seven listed selection algorithms.
46–47) The SDSS Imaging Run number and the Modified Julian Date (MJD) of the photometric observation used in the catalog. The MJD is given as an integer; all observations on a given night have the same integer MJD (and, because of the observatory’s location, the same UT date). For example, imaging run 94 has an MJD of 51075; this observation was taken on 1998 September 19 (UT).
48–50) Information about the spectroscopic observation (Modified Julian Date, spectroscopic plate number, and spectroscopic fiber number) used to determine the redshift. These three numbers are unique for each spectrum, and can be used to retrieve the digital spectra from the public SDSS database.
51–54) Additional SDSS processing information: the photometric processing rerun number, the camera column (1–6) containing the image of the object, the field number of the run containing the object, and the object identification number (see Stoughton et al. 2002 for descriptions of these parameters).
55) The 32-bit SDSS target selection flag from the TARGET processing, i.e., the value that was used when the spectroscopic plate was drilled. This may not match the BEST target selection flag because different versions of the selection algorithm were used, the selection was done with different image data (superior quality data of the field was obtained after the spectroscopic observations were completed), or different processings of the same data were used. Objects with no TARGET flag were identified as quasars as a result of Quality Assurance observations and/or from special plates with somewhat different targeting criteria (see Adelman-McCarthy et al. 2006).
56–62) The spectroscopic target-selection breakdown (TARGET) for each object; this is the same convention as followed in columns 39–45 for the BEST target-selection flag.
63–72) The DR5 PSF magnitudes and errors (not corrected for Galactic reddening) from TARGET photometry.
73) The 64-bit integer that uniquely describes the spectroscopic observation that is listed in the catalog (SpecObjID).
74) Name of object in the NASA/IPAC Extragalactic Database (NED). If there is a source in the NED quasar database within 5.0′′ of the quasar position, the NED object name is given in this column. The NED quasar database contains over 100,000 objects. Occasionally NED will list the SDSS name for objects that were not discovered by the SDSS.

5. Catalog Summary

The 77,429 objects in the catalog represent an increase of 31,009 quasars over the Paper III database; of the entries in the new catalog, 74,297 (96.0%) were discovered by the SDSS (with the caveat that NED is not complete). The catalog quasars span a wide range of properties: redshifts from 0.078 to 5.414, 14.94 < i < 22.36 (506 objects have i > 20.5; only 26 have i > 21.0), and −30.27 < Mi < −22.00. The catalog contains 6226, 4133, and 9824 matches to the FIRST, RASS, and 2MASS catalogs, respectively. The RASS and 2MASS catalogs cover essentially all of the DR5 area, but 5729 (7%) of the entries in the DR5 catalog lie outside of the FIRST region.

Figure 2 displays the distribution of the DR5 quasars in the i-redshift plane (the 26 objects with i > 21 are not plotted). Objects which NED indicates were previously discovered by investigations other than the SDSS are indicated with open circles. The curved cutoff on the left hand side of the graph is produced by the minimum luminosity criterion (Mi < −22.0).
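The position of this cutoff can be approximated from the adopted cosmology. The following is a minimal sketch, not the catalog's actual calculation: it assumes a flat universe with the parameters quoted for column 32 (H0 = 70 km s−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7), a pure power-law continuum with frequency index −0.5, and no emission-line or Galactic extinction corrections. All function names are illustrative.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # km/s/Mpc, as adopted for column 32
OMEGA_M = 0.3
OMEGA_L = 0.7         # flat universe assumed: OMEGA_M + OMEGA_L = 1
ALPHA = -0.5          # power-law continuum index, f_nu ~ nu**ALPHA


def e_inv(z):
    """Inverse of the dimensionless Hubble parameter E(z) for flat LCDM."""
    return 1.0 / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def luminosity_distance_mpc(z, steps=1000):
    """Luminosity distance in Mpc via Simpson's rule (steps must be even)."""
    h = z / steps
    total = e_inv(0.0) + e_inv(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * e_inv(k * h)
    comoving = (C_KM_S / H0) * total * h / 3.0
    return (1.0 + z) * comoving


def k_correction(z, alpha=ALPHA):
    """Continuum-only K-correction for a power law (emission lines ignored)."""
    return -2.5 * (1.0 + alpha) * math.log10(1.0 + z)


def apparent_i_at_limit(z, abs_mag=-22.0):
    """Apparent i magnitude of an object with M_i = abs_mag at redshift z > 0."""
    dist_mod = 5.0 * math.log10(luminosity_distance_mpc(z)) + 25.0
    return abs_mag + dist_mod + k_correction(z)


for z in (0.1, 0.2, 0.4):
    print(f"z = {z:.1f}: i at M_i = -22 is about {apparent_i_at_limit(z):.2f}")
```

Evaluating `apparent_i_at_limit` over a grid of redshifts traces a curve of the kind that bounds the left-hand side of Figure 2, under these simplifying assumptions.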
The ridge in the contours at i ≈ 19.1 for redshifts below three reflects the flux limit of the low-redshift sample; essentially all of the large number of z < 3 points with i > 19.1 are quasars selected via criteria other than the primary multicolor sample.

A histogram of the catalog redshifts is shown in the upper curve in Figure 3. A clear majority of quasars have redshifts below two (the median redshift is 1.48, the mode is ≈ 1.85), but there is a significant tail of objects extending out to redshifts beyond five (zmax = 5.41). The dips in the curve at redshifts of 2.7 and 3.5 arise because the SDSS colors of quasars at these redshifts are similar to the colors of stars; we decided to accept significant incompleteness at these redshifts rather than be overwhelmed by a large number of stellar contaminants in the spectroscopic survey. Improvements in the quasar target selection algorithm since the initial editions of the SDSS Quasar Catalog have increased the efficiency of target selection at redshifts near 3.5 (compare Figure 3 with Paper II’s Figure 4; see Richards et al. 2002a for a discussion of the incompleteness of the SDSS Quasar Survey).

This structure in the catalog redshift histogram can be understood by careful modelling of the selection effects (e.g., accounting for emission line effects and using only objects selected in regions whose spectroscopic observations were chosen with the final version of the quasar target selection algorithm; also see Figure 8 in Richards et al. 2006). Repeating the analysis of Richards et al. (2006) for the DR5 sample reveals no structure in the redshift distribution after selection effects have been included (see lower histogram in Figure 3); this is in contrast to the reported redshift structure found in the SDSS quasar survey by Bell & McDiarmid (2006).
To construct the lower histogram we have partially removed the effect of host galaxy contamination (by excluding extended objects), limited the sample to a uniform magnitude limit of i < 19.1 (accounting for emission-line effects), and have corrected for the known incompleteness near z ∼ 2.7 and z ∼ 3.5 due to quasar colors lying close to or in the stellar locus. Accounting for selection effects significantly reduces the number of objects as compared with the raw, more heterogeneous catalog, but the smaller, more homogeneous sample is what should be used for statistical analyses.

The distribution of the observed i magnitude (not corrected for Galactic extinction) of the quasars is given in Figure 4. The sharp drops in the histogram at i ≈ 19.1 and i ≈ 20.2 are due to the magnitude limits in the low- and high-redshift samples, respectively.

Figures 5 and 6 display the distribution of the absolute i magnitudes of the catalog quasars. There is a roughly symmetric peak centered at Mi = −26 with a FWHM of approximately one magnitude. The histogram declines sharply at high luminosities (only 1.5% of the objects have Mi < −28.0) and has a gradual decline toward lower luminosities, partially due to host-galaxy contribution.

A summary of the spectroscopic selection, for both the TARGET and the BEST algorithms, is given in Table 3. We report seven selection classes in the catalog (columns 39–45 for BEST, 56–62 for TARGET). Each selection version has two columns, the number of objects that satisfied a given selection criterion and the number of objects that were identified only by that selection class. About two-thirds of the catalog entries were selected based on the SDSS quasar selection criteria (either a low-redshift or high-redshift candidate, or both).
Slightly more than half of the quasars in the catalog are serendipity-flagged candidates, which is also primarily an “unusual color” algorithm; about one-fifth of the catalog was selected by the serendipity criteria alone.

Of the 50,093 DR5 quasars that have Galactic-absorption corrected TARGET i magnitudes brighter than 19.1, 48,593 (97.0%) were identified by the TARGET quasar multicolor selection; if one combines TARGET multicolor and FIRST selection (the primary quasar target selection criteria), all but 1015 of the i < 19.1 objects were selected. (The spectra of many of the last category of objects were obtained in observations that were not part of the primary survey.) The numbers are similar if one uses the BEST photometry and selection, although the completeness is not quite as high as with TARGET values.

5.1. Discrepancies Between the DR5 and Other Quasar Catalogs

The DR3 database is entirely contained in that of DR5, but there are 66 quasars from Paper III (out of 46,420 objects) that do not have a counterpart within 1.0′′ of a DR5 quasar. Three of these “missing” quasars are in the DR5 list; changes in celestial position of 1.1′′, 1.8′′, and 5.3′′ between DR3 and DR5 caused these quasars to be missed with the 1.0′′ matching criterion. The other 63 cases (0.14% of the DR3 total) were individually investigated. Three DR3 objects were dropped because the latest photometry reduced their luminosities below the catalog limit. The remaining 60 objects were removed because 1) the visual examination of the spectrum convinced us either that the object was not a quasar or that the S/N was insufficient to assign a redshift with confidence, or 2) the widest line in the latest fit to the spectrum had a FWHM of less than 1000 km s−1.
It should be noted that there have been no changes to the DR3 spectra in the DR5 database; the missing objects reflect the inherent uncertainties involved with interpreting objects that either lie near survey cutoffs or have spectra of marginal S/N.

There are 40 and 136 DR5 quasars that have redshifts that differ by more than 0.1 from the DR3 and NED values, respectively (there is, of course, considerable overlap in these two groups). In all cases the DR5 measurements are considered more reliable than those presented in previous publications. The 40 objects with |zDR5 − zDR3| > 0.1 are listed in Table 4.

5.2. Quasar Colors

It has long been known that the majority of quasars inhabit a restricted range in photometric color, and the large sample size and accurate photometry of the SDSS revealed a relatively tight color-redshift correlation for quasars (Richards et al. 2001). This SDSS color relation, recently presented in Hopkins et al. (2004), has led to considerable success in assigning photometric redshifts to quasars (e.g., Weinstein et al. 2004 and references therein). All photometric measurements used in these analyses have been corrected for Galactic extinction.

The dependence of the four standard SDSS colors on redshift for the DR5 quasars is given in Figure 7. The dashed line in each panel is the modal relation for the DR5 quasars; the modal relations are tabulated in Table 5, along with the values for (g − i). The figures show an impressively tight correlation of color with redshift, although the scatter dramatically increases when the Lyman α forest dominates the bluer of the passbands used to form the color. The distribution near the modal curve is roughly symmetric, but there is clearly a significant population of “red” quasars that has no “blue” counterpart.
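The modal relation is estimated in this work from the median and mean color in each redshift bin rather than from a binned histogram. As a rough sketch, the two empirical estimators involved, the usual rule (3 × median − 2 × mean) and the variant (2 × median − mean) that the authors found to trace the ridgeline better, can be written as follows. The sample colors and function names are hypothetical and serve only as an illustration.

```python
from statistics import mean, median


def mode_standard(values):
    """Usual empirical mode estimate: 3 * median - 2 * mean."""
    return 3.0 * median(values) - 2.0 * mean(values)


def mode_ridgeline(values):
    """Variant reported to trace the modal ridgeline better: 2 * median - mean."""
    return 2.0 * median(values) - mean(values)


# Hypothetical (g - i) colors in one redshift bin, with a single "red" outlier.
colors = [0.10, 0.20, 0.20, 0.30, 0.90]

print("median        :", median(colors))
print("mean          :", mean(colors))
print("mode, standard:", mode_standard(colors))
print("mode, variant :", mode_ridgeline(colors))
```

With a red tail the mean exceeds the median, so both estimators fall below the median; the variant applies a smaller correction than the standard rule.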
This table is an improvement over previous work in that it is based on a larger sample size (a factor of four increase since this relation was last published) and provides higher redshift resolution (0.01, except near the extrema). As in Hopkins et al. (2004), we compute the mode, rather than the mean or median, as the most representative quantity. However, a formal computation of the mode requires binning the data both in redshift and by color within redshift bins; therefore we estimated the mode from the mean and the median. Typically, the mode is estimated as (3 × median − 2 × mean), but we found empirically that (2 × median − mean) appeared to work better for this sample in terms of tracing the modal “ridgeline” with redshift.

For each of the DR5 quasars we provide the quantity ∆(g − i), which is defined by

$$\Delta(g-i) = (g-i)_{\rm QSO} - \langle (g-i) \rangle_{\rm redshift}$$

where 〈(g − i)〉redshift is the entry in Table 5 for the redshift of the quasar. This “differential color” provides an estimate of the continuum properties of the quasar (values above zero indicate that the object has a redder continuum than the typical quasar at that redshift).

5.3. Bright Quasars

Although the spectroscopic survey is limited to objects fainter than i ≈ 15, the SDSS continues to discover a number of “PG-class” (Schmidt & Green 1983) objects. The DR5 catalog contains 81 entries with i < 16.0; 14 of the quasars are either not in the NED database or are attributed to the SDSS by NED. The spectrum of the brightest post-DR3 discovery, SDSS J165551.37+214601.8 (i = 15.62, z = 0.15), is presented in Figure 1. Three of the SDSS-discovered objects in this catalog have been added since Paper III.

5.4. Luminous Quasars

There are 103 catalog quasars with Mi < −29.0 (3C 273 has Mi ≈ −26.6 in our adopted cosmology); 61 were discovered by the SDSS, and 18 are published here for the first time. The redshifts of these quasars lie between 1.3 and 5.0.
The most luminous quasar in the catalog is 2MASSI J0745217+473436 (= SDSS J074521.78+473436.2), at Mi = −30.27 and z = 3.22. Spectra of the two most luminous post-DR3 discoveries, with absolute i magnitudes of −29.94 and −29.65, are displayed in the upper two panels of Figure 1. The spectra of both quasars possess a considerable number of absorption features redward of the Lyman α emission line.

5.5. Broad Absorption Line Quasars

The SDSS quasar selection algorithm has proven to be effective at finding a wide variety of Broad Absorption Line (BAL) Quasars. An EDR sample of 118 BAL quasars was presented by Tolea, Krolik, & Tsvetanov (2002). There have been two editions of the SDSS BAL Quasar Catalog; the first, associated with Paper I, contained 224 BAL quasars (Reichard et al. 2003); the second was based on the Paper III catalog and presents 4787 BAL quasars (Trump et al. 2006). BAL quasars are usually recognized by the presence of C IV absorption features, which are only visible in SDSS spectra at z > 1.6; thus the frequency of the BAL quasar phenomenon cannot be found by simply taking the ratio of BAL quasars to the total number of quasars in the SDSS catalog. The SDSS has discovered a wide variety of extreme BAL quasars (see Hall et al. 2002); the lower right panel in Figure 1 presents the spectrum of an unusual FeLoBAL quasar with strong Balmer absorption (see Hall 2007 for a discussion of this object).

5.6. Quasars with Redshifts Below 0.15

The catalog contains 109 quasars with redshifts below 0.15. All of these objects are of low luminosity (Mi > −24.0, only three have Mi < −23.5) because of the i ≈ 15.0 limit for the spectroscopic sample. About three-quarters of these quasars (83) are extended in the SDSS image data. A total of 40 of the z < 0.15 quasars were found by the SDSS; 21 have been added since Paper III.

5.7.
High-Redshift (z ≥ 4) Quasars

At first light of the SDSS, the most distant known quasar was PC 1247+3406 at redshift of 4.897 (Schneider, Schmidt, & Gunn 1991), which had been discovered seven years earlier. Within a year of operation, the SDSS had discovered quasars with redshifts above five (Fan et al. 1999, 2000); the DR5 catalog contains 60 objects with redshifts greater than that of PC 1247+3406. In recent years the SDSS has identified quasars out to a redshift of 6.4 (Fan et al. 2003, 2006b). Quasars with redshifts larger than ≈ 5.7, however, cannot be found by the SDSS spectroscopic survey because at these redshifts the observed wavelength of the Lyman α emission line is redward of the i band; at this point quasars become single-filter (z) detections. At the typical z-band flux levels for redshift six quasars, there are simply too many “false-positives” to undertake automated targeting. The largest redshift in the DR5 catalog is SDSS J023137.65−072854.5 at z = 5.41, which was originally described by Anderson et al. (2001).

The DR5 catalog contains 891 quasars with redshifts larger than four; 36 entries have redshifts above five (11 above z = 5.2), which is more than a factor of two increase since Paper III. The spectra of the 20 highest redshift post-DR3 objects (all with redshifts greater than or equal to 4.99) are displayed in Figure 8. These redshift five spectra display a striking variety of emission line properties, and include an impressive BAL at z = 5.27.
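The redshift limit quoted above follows from the observed wavelength of Lyman α, λ_obs = λ_rest (1 + z). A small sketch, assuming the standard rest wavelength of 1215.67 Å (a value not stated in the text) and illustrative function names:

```python
LYA_REST_ANGSTROM = 1215.67  # assumed rest wavelength of Lyman alpha


def observed_wavelength(rest_angstrom, z):
    """Observed wavelength of a line emitted at rest_angstrom by a source at redshift z."""
    return rest_angstrom * (1.0 + z)


# Redshifts mentioned in the text: catalog maximum, spectroscopic limit, SDSS record.
for z in (5.41, 5.7, 6.4):
    lam = observed_wavelength(LYA_REST_ANGSTROM, z)
    print(f"z = {z:.2f}: Lyman alpha observed near {lam:.0f} Angstrom")
```

At z = 5.7 the line is observed near 8145 Å, which the text identifies as the point where it moves redward of the i band.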
We have used archival data from Chandra, ROSAT, and XMM-Newton to check for new X-ray detections of z > 4 quasars with unusual emission-line or absorption-line properties; we do not report all z > 4 X-ray detections here as there are now more than 110 already published.[3] We found three remarkable z > 4 X-ray detections: the z = 4.26 BAL quasar SDSS J133529.45+410125.9, the z = 4.11 BAL quasar SDSS J142305.04+240507.8, and the z = 4.50 quasar SDSS J150730.63+553710.8, which shows remarkably strong C IV emission. None of these objects has sufficient counts for detailed X-ray spectral analysis, but we have computed their point-to-point spectral slopes between rest-frame 2500 Å and 2 keV (αox), adopting the assumptions in §2 of Brandt et al. (2002).

SDSS J133529.45+410125.9 and SDSS J142305.04+240507.8 were serendipitously detected in archival Chandra ACIS observations and have αox = −2.19 and αox = −1.52, respectively. Comparing these values to the established relation between αox and 2500 Å luminosity (e.g., Steffen et al. 2006), we find that SDSS J133529.45+410125.9 is notably X-ray weak, indicating likely X-ray absorption as is often seen in BAL quasars (e.g., Gallagher et al. 2006) including those at z > 4 (Vignali et al. 2005). In contrast, the level of X-ray emission from SDSS J142305.04+240507.8 is consistent with that from normal, non-BAL quasars; its relatively narrow UV absorption, for a BAL quasar, may indicate a relatively small column density of obscuring material. SDSS J150730.63+553710.8 is weakly detected in a ROSAT PSPC observation and has αox = −1.47; this level of X-ray emission is nominal for a quasar of its luminosity. We have also checked all quasars with z > 5 for new X-ray detections and found none; 21 quasars with z > 5 have previously reported X-ray detections.

5.8.
Close Pairs

The mechanical constraint that SDSS spectroscopic fibers must be separated by 55′′ on a given plate makes it difficult for the spectroscopic survey to confirm close pairs of quasars. In regions that are covered by more than one plate, however, it is possible to obtain spectra of both components of a close pair; there are 346 pairs of quasars in the catalog with angular separation less than an arcminute (34 pairs with separations less than 20′′). Most of the pairs are chance superpositions, but there are many sets whose components have similar redshifts, suggesting that the quasars may be physically associated. The typical uncertainty in the measured value of the redshift difference between two quasars is 0.02; the catalog contains 18 quasar pairs with separations of less than an arcminute and with ∆z < 0.02. These pairs, which are excellent candidates for binary quasars, are listed in Table 6. Hennawi et al. (2006) identified over 200 quasar pairs in the SDSS, primarily through spectroscopic observations of SDSS quasar candidates (based on photometric measurements) near known SDSS quasars; statistical arguments based on a correlation-function analysis suggest that most of these pairs are indeed physically associated.

[3] See http://www.astro.psu.edu/users/niel/papers/highz-xray-detected.dat for a list of X-ray detections and references.

5.9. Morphology

The images of 3498 of the DR5 quasars are classified as extended by the SDSS photometric pipeline; 3291 (94%) have redshifts below one (there are nine resolved z > 3.0 quasars). The majority of the large-redshift “resolved” quasars are probably measurement errors, but this sample may also contain a mix of chance superpositions of quasars and foreground objects or possibly some small angle separation gravitational lenses (indeed, several lenses are present in the resolved quasar sample; see Paper II and Oguri et al. 2006).

5.10.
Matches with Non-optical Catalogs

A total of 6226 catalog objects are FIRST sources (defined by a SDSS-FIRST positional offset of less than 2.0′′). Note that 226 of the objects were selected (with TARGET) solely because they were FIRST matches (all unresolved SDSS sources brighter than i = 19.1 that lie within 2.0′′ of a FIRST source are targeted by the quasar spectroscopic selection algorithm). Extended radio sources may be missed by this matching.

The upper left panel in Figure 9 contains a histogram of the angular offsets between the SDSS and FIRST positions; the solid line is the expected distribution assuming a 0.21′′ 1σ Gaussian error in the relative SDSS/FIRST positions (found by fitting the points with a separation less than 1.0′′). The small-angle separations are well fit by the Rayleigh distribution, but outside of about 0.5′′ there is an obvious excess of observed separations. The number of chance superpositions was estimated by shifting the quasar positions by ±200′′ in declination and matching the new coordinates to the FIRST catalog; only about 0.1% of the reported FIRST matches are false. The large “tail” of this distribution is not likely to be due to measurement errors but probably arises from extended radio emission that may not be precisely centered on the optical image.

To recover radio quasars that have offsets of more than 2.0′′, we separately identify all objects with a greater than 3σ detection of FIRST flux at the optical position (2440 sources). For these objects as well as those with a FIRST catalog match within 2′′, we perform a second FIRST catalog search with 30′′ matching radius to identify possible radio lobes associated with the quasar, finding such matches for 1596 sources.

Matches with the ROSAT All-Sky Survey Bright and Faint Source Catalogs were made with a maximum allowed positional offset of 30′′; this is the positional coincidence required for the SDSS ROSAT target selection code.
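The comparison between the FIRST offsets and the Rayleigh distribution described above can be made concrete. The sketch below, which assumes only the fitted 1σ value of 0.21′′ and uses illustrative function names, evaluates the fraction of true matches that a pure Rayleigh error distribution would place beyond a given separation.

```python
import math

SIGMA_FIRST_ARCSEC = 0.21  # fitted 1-sigma positional error quoted in the text


def rayleigh_pdf(r, sigma):
    """Probability density of a radial offset r for 2-D Gaussian errors of width sigma."""
    return (r / sigma ** 2) * math.exp(-r ** 2 / (2.0 * sigma ** 2))


def rayleigh_tail(r, sigma):
    """Fraction of offsets expected to exceed r under the same model."""
    return math.exp(-r ** 2 / (2.0 * sigma ** 2))


for r in (0.5, 1.0, 2.0):
    frac = rayleigh_tail(r, SIGMA_FIRST_ARCSEC)
    print(f"offset > {r:.1f} arcsec: expected fraction {frac:.3g}")
```

Under this model only about 6% of matches would lie beyond 0.5′′ and essentially none beyond 2′′, so a substantial population at larger offsets has to come from something other than positional measurement error.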
The DR5 catalog contains 4133 RASS matches; approximately 1.3% are expected to be false identifications based on an analysis similar to that described in the previous paragraph. The SDSS-RASS offsets for the DR5 sample are presented in the upper right panel of Figure 9; the solid curve, which is the predicted distribution for a 1σ positional error of 11.1′′ (fit using all of the points), matches the data quite well.

JHK photometric measurements for 9824 DR5 quasars were found by using a matching radius of 2.0′′ in the 2MASS All-Sky Data Release Point Source Catalog. No infrared information was used to select the SDSS spectroscopic targets. The positional offset histogram, given in the lower left panel of Figure 9, is considerably tighter than that for the FIRST matches, although the Rayleigh fit to the separations less than 1.0′′ is virtually identical to the FIRST distribution (1σ of 0.21′′). There are very few 2MASS identifications with offsets between 1′′ and 2′′; virtually all of the infrared matches are correct.

6. Summary

The lower right panel in Figure 9 charts the progress of the SDSS Quasar Survey, denoted by the number of spectroscopically-confirmed quasars, over the duration of SDSS-I. Although SDSS-I has now been completed, the SDSS Quasar Survey is continuing under the SDSS-II project. By necessity the SDSS spectroscopy lags the SDSS imaging; at the conclusion of SDSS-I more than 2000 square degrees of SDSS image data in the Northern Galactic Cap lacked spectroscopic coverage (Adelman-McCarthy et al. 2007). A future edition of the SDSS Quasar Catalog will incorporate the observations from SDSS-II and should contain approximately 100,000 quasars.

The publication of this catalog marks the completion of the SDSS-I Quasar Survey, and we dedicate this work to the memory of John N. Bahcall. John was the initial co-chair of the SDSS Quasar Working Group, a position he held for nearly a decade.
He played a key role in the formation of the SDSS Collaboration and the design of the SDSS Quasar Survey, and was a mentor to many of the members of the Quasar Working Group.

We would like to thank Todd Boroson for suggesting several redshift adjustments to some of the DR3 Quasar Catalog redshifts. This work was supported in part by National Science Foundation grants AST-0307582 and AST-0607634 (DPS, DVB, JW), AST-0307384 (XF), and AST-0307409 (MAS), and by NASA LTSA grant NAG5-13035 (WNB, DPS). PBH acknowledges support by NSERC, and GTR was supported in part by a Gordon and Betty Moore Fellowship in Data Intensive Sciences at JHU. XF acknowledges support from an Alfred P. Sloan Fellowship and a David and Lucile Packard Fellowship in Science and Engineering. SJ was supported by the Max-Planck-Gesellschaft (MPI für Astronomie) through an Otto Hahn fellowship. CS was supported by the U.S. Department of Energy under contract DE-AC02-76CH03000.

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating Institutions.
The Participating Institutions are the American Museum of Natural History, Astrophysical Institute of Potsdam, University of Basel, Cambridge University, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

This research has made use of 1) the NASA/IPAC Extragalactic Database (NED) which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration, and 2) data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation.

REFERENCES

Abazajian, K., et al. 2003, AJ, 126, 2081 (DR1)
Abazajian, K., et al. 2005, AJ, 129, 1755 (DR3)
Adelman-McCarthy, J., et al. 2006, ApJS, 162, 38 (DR4)
Adelman-McCarthy, J., et al. 2007, ApJS, in press (DR5)
Anderson, S.F., et al. 2001, AJ, 122, 503
Anderson, S.F., et al. 2003, AJ, 126, 2209
Anderson, S.F., et al. 2007, AJ, 133, 313
Becker, R.H., White, R.L., & Helfand, D.J. 1995, ApJ, 450, 559
Bell, M.B., & McDiarmid, D. 2006, ApJ, 648, 140
Blanton, M.R., Lupton, R.H., Maley, F.M., Young, N., Zehavi, I., & Loveday, J. 2003, AJ, 125, 2276
Brandt, W.N., et al.
2002, ApJ, 569, L5
Castander, F.J., et al. 2001, AJ, 121, 2331
Collinge, M., et al. 2005, AJ, 129, 2542; Erratum AJ, 131, 3135
Cutri, R.M., Skrutskie, M.F., van Dyk, S., Beichman, C.A., et al. 2003, VizieR On-line Data Catalog: II/246, University of Massachusetts and Infrared Processing and Analysis Center
Eisenstein, D.J., et al. 2001, AJ, 122, 2267
Elvis, M., Lockman, F.J., & Fassnacht, C. 1994, ApJS, 95, 413
Fan, X., et al. 1999, AJ, 118, 1
Fan, X., et al. 2000, AJ, 119, 1
Fan, X., et al. 2003, AJ, 125, 1649
Fan, X., et al. 2006a, AJ, 131, 1203
Fan, X., et al. 2006b, AJ, 132, 171
Fukugita, M., Ichikawa, T., Gunn, J.E., Doi, M., Shimasaku, K., & Schneider, D.P. 1996, AJ, 111, 1748
Gallagher, S.C., Brandt, W.N., Chartas, G., Priddey, R., Garmire, G.P., & Sambruna, R.M. 2006, ApJ, 644,
Gaskell, C.M. 1982, ApJ, 263, 79
Gunn, J.E., et al. 1998, AJ, 116, 3040
Gunn, J.E., et al. 2006, AJ, 131, 2332
Hall, P.B. 2007, AJ, 133, 1271
Hall, P.B., et al. 2002, ApJS, 141, 267
Hennawi, J., et al. 2006, AJ, 131, 1
Hao, L., et al. 2005, AJ, 129, 1783
Hogg, D.W., Schlegel, D.J., Finkbeiner, D.P., & Gunn, J.E. 2001, AJ, 122, 2129
Hopkins, P.F., et al. 2004, AJ, 128, 1112
Ivezić, Ž., et al. 2002, AJ, 124, 2364
Ivezić, Ž., et al. 2004, AN, 325, 583
Kauffmann, G., et al. 2003, MNRAS, 346, 1055
Lupton, R.H., Gunn, J.E., Ivezić, Ž., Knapp, G.R., Kent, S., & Yasuda, N. 2001, in ASP Conf. Ser. 238, Astronomical Data Analysis Software and Systems, ed. F.R. Harnden, F.A. Primini, & H.E. Payne (San Francisco: ASP), 269
Lupton, R.H., Gunn, J.E., & Szalay, A. 1999, AJ, 118, 1406
Oke, J.B., & Gunn, J.E. 1983, ApJ, 266, 713
Oguri, M., et al. 2006, AJ, 132, 999
Pier, J.R., Munn, J.A., Hindsley, R.B., Hennessy, G.S., Kent, S.M., Lupton, R.H., & Ivezić, Ž. 2003, AJ, 125, 1559
Reichard, T.A., et al. 2003, AJ, 125, 1711
Richards, G.T., et al. 2001, AJ, 121, 2308
Richards, G.T., et al. 2002a, AJ, 123, 2945
Richards, G.T., et al. 2002b, AJ, 124, 1
Richards, G.T., et al.
2006, AJ, 131, 2766
Schlegel, D.J., Finkbeiner, D.P., & Davis, M. 1998, ApJ, 500, 525
Schmidt, M., & Green, R.F. 1983, ApJ, 269, 352
Schneider, D.P., Gunn, J.E., & Hoessel, J.G. 1983, ApJ, 264, 337
Schneider, D.P., Schmidt, M., & Gunn, J.E. 1991, AJ, 102, 837
Schneider, D.P., et al. 2002, AJ, 123, 567 (Paper I)
Schneider, D.P., et al. 2003, AJ, 126, 2579 (Paper II)
Schneider, D.P., et al. 2005, AJ, 130, 367 (Paper III)
Shen, Y., et al. 2007, AJ, in press
Skrutskie, M.F., et al. 2006, AJ, 131, 1163
Smith, J.A., et al. 2002, AJ, 123, 2121
Spergel, D.N., et al. 2006, ApJ, submitted (astro-ph/0603449)
Stark, A.A., Gammie, C.F., Wilson, R.W., Bally, J., Linke, R.A., Heiles, C., & Hurwitz, M. 1992, ApJS, 79, 77
Steffen, A.T., Strateva, I.V., Brandt, W.N., Alexander, D.M., Koekemoer, A.M., Lehmer, B.D., Schneider, D.P., & Vignali, C. 2006, AJ, 131, 2826
Stoughton, C., et al. 2002, AJ, 123, 485 (EDR)
Strauss, M.A., et al. 2002, AJ, 124, 1810
Tolea, A., Krolik, J.H., & Tsvetanov, Z. 2002, ApJ, 578, 31
Trump, J.R., et al. 2006, ApJS, 165, 1
Tucker, D., et al. 2006, AN, 327, 821
Vanden Berk, D.E., et al. 2001, AJ, 122, 549
Vanden Berk, D.E., et al. 2005, AJ, 129, 2047
Vignali, C., Brandt, W.N., Schneider, D.P., & Kaspi, S. 2005, AJ, 129, 2519
Voges, W., et al. 1999, A&A, 349, 389
Voges, W., et al. 2000, IAUC, 7432
Weinstein, M.A., et al. 2004, ApJS, 155, 243
Wilhite, B.C., et al. 2005, AJ, 633, 638
York, D.G., et al. 2000, AJ, 120, 1579
Zacharias, N., et al. 2000, AJ, 120, 2131
Zakamska, N.L., et al. 2003, AJ, 128, 1002

[Figure 1: four spectra panels, flux versus wavelength (Å, 4000–9000): SDSS J160843.90+071508.6 (z = 2.88), SDSS J163909.11+282447.1 (z = 3.82), SDSS J165551.37+214601.8 (z = 0.15), SDSS J125942.80+121312.6 (z = 0.75).]

Fig. 1.— SDSS spectra of four previously unreported quasars.
The spectral resolution of the data ranges from 1800 to 2100; a dichroic splits the beam at 6150 Å. The data have been rebinned to 5 Å pixel⁻¹ for display purposes. The upper two panels display the two most luminous of the newly discovered quasars; both objects have M_i < −29.6. SDSS J165551.37+214601.8 is the brightest (i = 15.62) of the new quasars; SDSS J125942.80+121312.6 is an unusual FeLoBAL quasar with Balmer-line absorption.

Fig. 2.— The observed i magnitude as a function of redshift for the 77,429 objects in the catalog. Open circles indicate quasars in NED that were recovered but not discovered by the SDSS. The 26 quasars with i > 21 are not plotted. The distribution is represented by a set of linear contours when the density of points in this two-dimensional space causes the points to overlap. The steep gradient at i ≈ 19 is due to the flux limit for the targeted low-redshift part of the survey; the dip in the counts at z ≈ 2.7 arises because of the high incompleteness of the SDSS Quasar Survey at redshifts between 2.5 and 3.0 (also see Figure 3).

Fig. 3.— The redshift histogram of the catalog quasars. The redshifts range from 0.08 to 5.41; the median redshift of the catalog is 1.48. The redshift bins have a width of 0.05. The dips at redshifts of 2.7 and 3.5 are caused by the reduced efficiency of the selection algorithm at these redshifts. The lower histogram is the redshift distribution of the i < 19.1 sample after correction for selection effects (see Section 5).

Fig. 4.— The i magnitude (not corrected for Galactic absorption) histogram of the 77,429 catalog quasars. The magnitude bins have a width of 0.108. The sharp drop that occurs at magnitudes slightly fainter than 19 is due to the flux limit for the low-redshift targeted part of the survey. Quasars fainter than the i = 20.2 high-redshift selection limit were found via other selection algorithms, primarily serendipity.
The SDSS Quasar Survey has a bright limit of i ≈ 15.0 imposed by the need to avoid saturation in the spectroscopic observations.

Fig. 5.— The absolute i magnitude as a function of redshift for the 77,429 objects in the catalog. Open circles indicate quasars in NED that were recovered but not discovered by the SDSS. The distribution is represented by a set of linear contours when the density of points in this two-dimensional space causes the points to overlap. The steep gradient that runs through the midst of the quasar distribution is produced by the i ≈ 19 flux limit for the targeted low-redshift part of the survey.

Fig. 6.— The luminosity distribution of the catalog quasars (horizontal axis: absolute i magnitude). The absolute magnitude bins have a width of 0.114. The most luminous quasar in the catalog has M_i ≈ −30.3. In the adopted cosmology 3C 273 has M_i ≈ −26.6.

Fig. 7.— The quasar color-redshift relation for the DR5 quasars (photometry corrected for Galactic extinction). Contours are used to represent the distribution when the density of points causes the points to overlap. The panels present the four standard SDSS colors; the dashed gray lines are the modal relations presented in Table 5. The influence of emission lines on the colors is readily apparent (in particular Hα in the (i − z) panel). The tightness of the correlations breaks down when the Lyman α forest region dominates the bluer of the two passbands (e.g., above redshifts of 2.2 in the (u − g) relation).
[Figure 8: twenty spectra plotted against Wavelength (Å), 6000–9000 Å. Panels: 005421.42-010921.6 (z = 5.09), 073103.12+445949.4 (z = 5.00), 084627.84+080051.7 (z = 5.03), 090245.76+085115.9 (z = 5.23), 092216.81+265359.0 (z = 5.03), 111920.64+345248.1 (z = 5.01), 113246.50+120901.6 (z = 5.17), 114657.79+403708.6 (z = 5.01), 115424.73+134145.7 (z = 5.01), 120207.78+323538.8 (z = 5.29), 123333.48+062234.2 (z = 5.29), 133412.56+122020.7 (z = 5.13), 133728.81+415539.8 (z = 5.01), 134015.03+392630.7 (z = 5.03), 134040.24+281328.1 (z = 5.34), 134141.45+461110.2 (z = 5.00), 142325.92+130300.6 (z = 5.04), 144350.66+362315.1 (z = 5.27), 162629.19+285857.5 (z = 5.02), 165902.11+270935.1 (z = 5.31).]

Fig. 8.— SDSS spectra of the 20 new quasars with the highest redshifts (z ≥ 4.99). The spectra have been rebinned to 10 Å pixel⁻¹ for display purposes. The wavelength region below 6000 Å has been removed because of the lack of signal below rest frame wavelengths of 1000 Å in these objects. Five of the quasars have redshifts larger than 5.25.

[Figure 9: four panels with horizontal axes FIRST/SDSS Offset (″), RASS/SDSS Offset (″), 2MASS/SDSS Offset (″), and Modified Julian Date (51600.0 = UTC 2000 Feb 26.0).]

Fig. 9.— a) Offsets between the 6226 SDSS and FIRST matches; the matching radius was set to 2.0″. The smooth curve is the expected distribution for a set of matches if the offsets between the objects are described by a Rayleigh distribution with σ = 0.21″ (best fit for points with separations of less than 1.0″). b) Offsets between the 4133 SDSS and RASS FSC/BSC matches; the matching radius was set to 30″. The smooth curve is the Rayleigh distribution fit (σ = 11.1″) to all of the points. c) Offsets between the 9824 SDSS and 2MASS matches; the matching radius was set to 2″. The smooth curve is a Rayleigh distribution with σ = 0.21″ based on the points with separations smaller than 1.0″.
d) The cumulative number of DR5 quasars as a function of time. The horizontal axis runs from February 2000 to June 2005. The periodic structure in the curve is caused by the yearly summer maintenance schedule. The total number of objects in the catalog is 77,429.

Table 1. SDSS DR5 Quasar Catalog Format

Column  Format   Description
1       A18      SDSS DR5 Designation hhmmss.ss+ddmmss.s (J2000)
2       F11.6    Right Ascension in decimal degrees (J2000)
3       F11.6    Declination in decimal degrees (J2000)
4       F7.4     Redshift
5       F7.3     BEST PSF u magnitude (not corrected for Galactic extinction)
6       F6.3     Error in BEST PSF u magnitude
7       F7.3     BEST PSF g magnitude (not corrected for Galactic extinction)
8       F6.3     Error in BEST PSF g magnitude
9       F7.3     BEST PSF r magnitude (not corrected for Galactic extinction)
10      F6.3     Error in BEST PSF r magnitude
11      F7.3     BEST PSF i magnitude (not corrected for Galactic extinction)
12      F6.3     Error in BEST PSF i magnitude
13      F7.3     BEST PSF z magnitude (not corrected for Galactic extinction)
14      F6.3     Error in BEST PSF z magnitude
15      F7.3     Galactic extinction in u band
16      F7.3     log N_H (logarithm of Galactic H I column density)
17      F7.3     FIRST peak flux density at 20 cm expressed as AB magnitude; 0.0 is no detection, −1.0 source is not in FIRST area
18      F8.3     S/N of FIRST flux density
19      F7.3     SDSS-FIRST separation in arc seconds
20      I3       > 3σ FIRST flux at optical position but no FIRST counterpart within 2″ (0 or 1)
21      I3       FIRST source located 2″-30″ from optical position (0 or 1)
22      F8.3     log RASS full band count rate; −9.0 is no detection
23      F7.3     S/N of RASS count rate
24      F7.3     SDSS-RASS separation in arc seconds
25      F7.3     J magnitude (2MASS); 0.0 indicates no 2MASS detection
26      F6.3     Error in J magnitude (2MASS)
27      F7.3     H magnitude (2MASS); 0.0 indicates no 2MASS detection
28      F6.3     Error in H magnitude (2MASS)
29      F7.3     K magnitude (2MASS); 0.0 indicates no 2MASS detection
30      F6.3     Error in K magnitude (2MASS)
31      F7.3     SDSS-2MASS separation in arc seconds
32      F8.3     M_i (H_0 = 70 km s⁻¹ Mpc⁻¹, Ω_M = 0.3, Ω_Λ = 0.7, α_ν = −0.5)
33      F7.3     Δ(g − i) = (g − i) − ⟨(g − i)⟩_redshift (Galactic extinction corrected)
34      I3       Morphology flag: 0 = point source, 1 = extended
35      I3       SDSS SCIENCEPRIMARY flag (0 or 1)
36      I3       SDSS MODE flag (blends, overlapping scans; 1, 2, or 3)
37      I3       Selected with final quasar algorithm (0 or 1)
38      I12      Target Selection Flag (BEST)
39      I3       Low-z Quasar selection flag (0 or 1)
40      I3       High-z Quasar selection flag (0 or 1)
41      I3       FIRST selection flag (0 or 1)
42      I3       ROSAT selection flag (0 or 1)
43      I3       Serendipity selection flag (0 or 1)
44      I3       Star selection flag (0 or 1)
45      I3       Galaxy selection flag (0 or 1)
46      I6       SDSS Imaging Run Number of photometric measurements
47      I6       Modified Julian Date of imaging observation
48      I6       Modified Julian Date of spectroscopic observation
49      I5       Spectroscopic Plate Number
50      I5       Spectroscopic Fiber Number
51      I4       SDSS Photometric Processing Rerun Number
52      I3       SDSS Camera Column Number
53      I5       SDSS Field Number
54      I5       SDSS Object Number
55      I12      Target Selection Flag (TARGET)
56      I3       Low-z Quasar selection flag (0 or 1)
57      I3       High-z Quasar selection flag (0 or 1)
58      I3       FIRST selection flag (0 or 1)
59      I3       ROSAT selection flag (0 or 1)
60      I3       Serendipity selection flag (0 or 1)
61      I3       Star selection flag (0 or 1)
62      I3       Galaxy selection flag (0 or 1)
63      F7.3     TARGET PSF u magnitude (not corrected for Galactic extinction)
64      F6.3     TARGET Error in PSF u magnitude
65      F7.3     TARGET PSF g magnitude (not corrected for Galactic extinction)
66      F6.3     TARGET Error in PSF g magnitude
67      F7.3     TARGET PSF r magnitude (not corrected for Galactic extinction)
68      F6.3     TARGET Error in PSF r magnitude
69      F7.3     TARGET PSF i magnitude (not corrected for Galactic extinction)
70      F6.3     TARGET Error in PSF i magnitude
71      F7.3     TARGET PSF z magnitude (not corrected for Galactic extinction)
72      F6.3     TARGET Error in PSF z magnitude
73      I21      Spectroscopic Identification flag (64-bit integer)
74      1X, A25  Object Name for previously known quasars; “SDSS” designates previously published SDSS object

Table 2. The SDSS Quasar Catalog IV^a

Object (SDSS J)     R.A. (deg)  Dec (deg)    Redshift  u       err    g       err    r       err    i       err    z       err
000006.53+003055.2  0.027228    0.515349     1.8227    20.389  0.066  20.468  0.034  20.332  0.037  20.099  0.041  20.053  0.121
000008.13+001634.6  0.033898    0.276304     1.8365    20.233  0.054  20.200  0.024  19.945  0.032  19.491  0.032  19.191  0.068
000009.26+151754.5  0.038605    15.298476    1.1986    19.921  0.042  19.811  0.036  19.386  0.017  19.165  0.023  19.323  0.069
000009.38+135618.4  0.039088    13.938447    2.2400    19.218  0.026  18.893  0.022  18.445  0.018  18.331  0.024  18.110  0.033
000009.42−102751.9  0.039269    −10.464428   1.8442    19.249  0.036  19.029  0.027  18.980  0.021  18.791  0.018  18.751  0.047

^a Table 2 is presented in its entirety in the electronic edition of the Astronomical Journal. A portion is shown here for guidance regarding its form and content. The full catalog contains 74 columns of information on 77,429 quasars.

Table 3. Spectroscopic Target Selection

Class        TARGET Selected  TARGET Sole Selection  BEST Selected  BEST Sole Selection
Low-z        49010            16422                  46460          14444
High-z       16383            5327                   16757          4411
FIRST        3501             226                    3619           209
ROSAT        4817             380                    4918           492
Serendipity  42109            15729                  41042          15950
Star         1970             187                    820            162
Galaxy       536              99                     601            80

Table 4. Quasars with |z_DR5 − z_DR3| > 0.1

SDSS J              z_DR5   SDSS J              z_DR5
005508.55−105206.2  1.381   133028.12+600811.7  1.992
013413.55+142900.1  1.195   133951.94+481651.3  0.911
031712.23−075850.3  2.696   134048.37+433359.8  2.069
075052.59+300334.1  3.990   135833.05+634122.6  3.180
075132.75+350535.0  2.077   140012.65+595823.3  2.061
083503.79+322242.0  0.728   140223.63+463604.9  0.925
085339.64+372203.6  1.950   140327.91+613654.2  2.023
090902.73+355334.8  1.638   141230.28+471103.7  2.078
091025.25+365921.3  2.004   142010.28+604722.3  1.345
092415.87+424632.2  0.559   143702.47+613437.0  2.064
093557.85+005528.1  1.301   144939.30+534212.1  1.805
093935.08−000801.1  0.909   151307.26−000559.3  2.030
094326.48+460226.8  2.093   151422.99+481936.3  2.071
100415.17+415802.6  1.977   153257.67+422047.1  1.950
102117.71+623010.1  1.949   160320.97+315248.3  0.727
103039.95+510923.3  1.649   165806.76+611858.9  2.631
103219.66+563456.8  2.017   170929.58+323826.9  1.902
115917.62+100921.5  2.028   205058.45+004709.9  0.932
124345.10+492645.3  1.982   212744.12+005720.3  4.386
131810.57+585416.9  1.900   225246.43+142525.8  4.904

Table 5. Quasar Colors as a Function of Redshift^a

z_bin  ⟨z⟩    N_QSO  (g − i)  (u − g)  (g − r)  (r − i)  (i − z)
0.18   0.181  183    0.567    −0.065   0.197    0.379    −0.037
0.21   0.210  290    0.580    0.032    0.223    0.355    −0.034
0.24   0.240  394    0.513    0.000    0.236    0.267    0.115
0.27   0.270  406    0.289    0.055    0.231    0.077    0.397
0.30   0.301  484    0.236    0.067    0.219    0.033    0.472

^a Table 5 is presented in its entirety in the electronic edition of the Astronomical Journal. A portion is shown here for guidance regarding its form and content.

Table 6. Candidate Binary Quasars^a

Quasar 1            Quasar 2            z_1    z_2    Δθ (″)
001201.87+005259.7  001202.35+005314.0  1.652  1.642  16.0
011757.99+002104.1  011758.83+002021.4  0.612  0.613  44.5
014110.40+003107.1  014111.62+003145.9  1.879  1.882  42.9
024511.93−011317.5  024512.12−011313.9  2.463  2.460  4.5
025813.65−000326.4  025815.54−000334.2  1.316  1.321  29.4
025959.68+004813.6  030000.57+004828.0  0.892  0.900  19.6
074336.85+205512.0  074337.28+205437.1  1.570  1.565  35.5
074759.02+431805.4  074759.66+431811.5  0.501  0.501  9.2
082439.83+235720.3  082440.61+235709.9  0.536  0.536  14.9
085625.63+511137.0  085626.71+511117.8  0.543  0.543  21.8
090923.12+000203.9  090924.01+000211.0  1.884  1.865  15.0
095556.37+061642.4  095559.02+061701.8  1.278  1.273  44.0
110357.71+031808.2  110401.48+031817.5  1.941  1.923  57.3
111610.68+411814.4  111611.73+411821.5  2.980  2.971  13.8
113457.73+084935.2  113459.37+084923.2  1.533  1.525  27.1
121840.47+501543.4  121841.00+501535.8  1.457  1.455  9.1
165501.31+260517.5  165502.02+260516.5  1.881  1.892  9.6
215727.26+001558.4  215728.35+001545.5  2.540  2.553  20.8

^a The quasar pairs were selected by a redshift difference of less than 0.02 and an angular separation less than 60″.

Introduction
Observations
Sloan Digital Sky Survey
Target Selection
Spectroscopy
Construction of the SDSS DR5 Quasar Catalog
Creation of the DR5 Quasar Candidate Database
Visual Examination of the Spectra
Luminosity and Line Width Criteria
Catalog Format
Catalog Summary
Discrepancies Between the DR5 and Other Quasar Catalogs
Quasar Colors
Bright Quasars
Luminous Quasars
Broad Absorption Line Quasars
Quasars with Redshifts Below 0.15
High-Redshift (z 4) Quasars
Close Pairs
Morphology
Matches with Non-optical Catalogs
Summary

ABSTRACT

We present the fourth edition of the Sloan Digital Sky Survey (SDSS) Quasar Catalog. The catalog contains 77,429 objects; this is an increase of over 30,000 entries since the previous edition.
The catalog consists of the objects in the SDSS Fifth Data Release that have luminosities larger than M_i = -22.0 (in a cosmology with H_0 = 70 km/s/Mpc, Omega_M = 0.3, and Omega_Lambda = 0.7), have at least one emission line with FWHM larger than 1000 km/s or have interesting/complex absorption features, are fainter than i = 15.0, and have highly reliable redshifts. The area covered by the catalog is 5740 sq. deg. The quasar redshifts range from 0.08 to 5.41, with a median value of 1.48; the catalog includes 891 quasars at redshifts greater than four, of which 36 are at redshifts greater than five. Approximately half of the catalog quasars have i < 19; nearly all have i < 21. For each object the catalog presents positions accurate to better than 0.2 arcsec rms per coordinate, five-band (ugriz) CCD-based photometry with typical accuracy of 0.03 mag, and information on the morphology and selection method. The catalog also contains basic radio, near-infrared, and X-ray emission properties of the quasars, when available, from other large-area surveys. The calibrated digital spectra cover the wavelength region 3800–9200 Å at a spectral resolution of ~2000. The spectra can be retrieved from the public database using the information provided in the catalog. The average SDSS colors of quasars as a function of redshift, derived from the catalog entries, are presented in tabular form. Approximately 96% of the objects in the catalog were discovered by the SDSS.
<|endoftext|><|startoftext|>
Contents

1 Introduction and Historical Perspective
2 QCD and the Nuclear Force
3 Effective Field Theory for Low-Energy QCD
  3.1 Symmetries of Low-Energy QCD
    3.1.1 Chiral Symmetry
    3.1.2 Explicit Symmetry Breaking
    3.1.3 Spontaneous Symmetry Breaking
  3.2 Chiral Effective Lagrangians Involving Pions
  3.3 Nucleon Contact Lagrangians
4 Nuclear Forces from EFT: Overview
  4.1 Chiral Perturbation Theory and Power Counting
  4.2 The Hierarchy of Nuclear Forces
5 Two-Nucleon Forces
  5.1 Pion-Exchange Contributions in ChPT
    5.1.1 Zeroth Order (LO)
    5.1.2 Second Order (NLO)
    5.1.3 Third Order (NNLO)
    5.1.4 Fourth Order (N3LO)
    5.1.5 Iterated One-Pion-Exchange
  5.2 NN Scattering in Peripheral Partial Waves Using the Perturbative Amplitude
  5.3 NN Contact Potentials
    5.3.1 Zeroth Order
    5.3.2 Second Order
    5.3.3 Fourth Order
  5.4 Constructing a Chiral NN Potential
    5.4.1 Conceptual Questions
    5.4.2 What Order?
    5.4.3 Charge-Dependence
    5.4.4 A Quantitative NN Potential at N3LO
6 Many-Nucleon Forces
  6.1 Three-Nucleon Forces
  6.2 Four-Nucleon Forces
7 Conclusions
A Fourth Order Two-Pion Exchange Contributions
  A.1 One-loop diagrams
    A.1.1 $c_i^2$ contributions
    A.1.2 $c_i/M_N$ contributions
    A.1.3 $1/M_N^2$ corrections
  A.2 Two-loop contributions
B Partial Wave Decomposition of the Fourth Order Contact Potential

∗Lecture series presented at the DAE-BRNS Workshop on Physics and Astrophysics of Hadrons and Hadronic Matter, Visva Bharati University, Santiniketan, West Bengal, India, November 2006.

1 Introduction and Historical Perspective

The theory of nuclear forces has a long history (cf. Table 1). Based upon the seminal idea by Yukawa [1], first field-theoretic attempts to derive the nucleon-nucleon (NN) interaction focused on pion-exchange. While the one-pion exchange turned out to be very useful in explaining NN scattering data and the properties of the deuteron [2], multi-pion exchange was beset with serious ambiguities [3, 4]. Thus, the “pion theories” of the 1950s are generally judged as failures, for reasons we understand today: pion dynamics is constrained by chiral symmetry, a crucial point that was unknown in the 1950s.

Historically, the experimental discovery of heavy mesons [5] in the early 1960s saved the situation. The one-boson-exchange (OBE) model [6, 7] emerged, which is still the most economical and quantitative phenomenology for describing the nuclear force [8, 9]. The weak point of this model, however, is the scalar-isoscalar “sigma” or “epsilon” boson, for which the empirical evidence remains controversial. Since this boson is associated with the correlated (or resonant) exchange of two pions, a vast theoretical effort that occupied more than a decade was launched to derive the 2π-exchange contribution to the nuclear force, which creates the intermediate-range attraction. For this, dispersion theory as well as field theory were invoked, producing the Stony Brook [10], Paris [11, 12], and Bonn [7, 13] potentials.

The nuclear force problem appeared to be solved; however, with the discovery of quantum chromodynamics (QCD), all “meson theories” were relegated to models and the attempts to derive the nuclear force started all over again.
The problem with a derivation from QCD is that this theory is non-perturbative in the low-energy regime characteristic of nuclear physics, which makes direct solutions impossible. Therefore, during the first round of new attempts, QCD-inspired quark models [14] became popular. These models are able to reproduce qualitatively and, in some cases, semi-quantitatively the gross features of the nuclear force [15, 16]. However, on a critical note, it has been pointed out that these quark-based approaches are nothing but another set of models and, thus, do not represent any fundamental progress. Equally well, one may then stay with the simpler and much more quantitative meson models.

Table 1: Seven Decades of Struggle: The Theory of Nuclear Forces

1935               Yukawa: Meson Theory
1950's             The “Pion Theories”
                   One-Pion Exchange: o.k.
                   Multi-Pion Exchange: disaster
1960's             Many pions ≡ multi-pion resonances: σ, ρ, ω, ...
                   The One-Boson-Exchange Model: success
1970's             Refined meson models, including sophisticated 2π exchange contributions (Stony Brook, Paris, Bonn)
1980's             Nuclear physicists discover QCD
                   Quark Cluster Models
1990's and beyond  Nuclear physicists discover EFT
                   Weinberg, van Kolck
                   Back to Pion Theory! But, constrained by Chiral Symmetry: success

A major breakthrough occurred when the concept of an effective field theory (EFT) was introduced and applied to low-energy QCD. As outlined by Weinberg in a seminal paper [17], one has to write down the most general Lagrangian consistent with the assumed symmetry principles, particularly the (broken) chiral symmetry of QCD. At low energy, the effective degrees of freedom are pions and nucleons rather than quarks and gluons; heavy mesons and nucleon resonances are “integrated out”.
So, the circle of history is closing and we are back to Yukawa’s meson theory, except that we have learned to add one important refinement to the theory: broken chiral symmetry is a crucial constraint that generates and controls the dynamics and establishes a clear connection with the underlying theory, QCD.

Following the first initiative by Weinberg [18], pioneering work was performed by Ordóñez, Ray, and van Kolck [19, 20], who constructed an NN potential in coordinate space based upon chiral perturbation theory at next-to-next-to-leading order. The results were encouraging and many researchers became attracted to the new field [21, 22, 23, 24, 25, 26, 27]. As a consequence, nuclear EFT has developed into one of the most popular branches of modern nuclear physics [28, 29].

It is the purpose of these lectures to describe in some detail the recent progress in our understanding of nuclear forces in terms of nuclear EFT.

2 QCD and the Nuclear Force

Quantum chromodynamics (QCD) is the theory of strong interactions. It deals with quarks, gluons and their interactions and is part of the Standard Model of Particle Physics. QCD is a non-Abelian gauge field theory with color SU(3) as the underlying gauge group. The non-Abelian nature of the theory has dramatic consequences. While the interaction between colored objects is weak at short distances or high momentum transfer (“asymptotic freedom”), it is strong at long distances (≳ 1 fm) or low energies, leading to the confinement of quarks into colorless objects, the hadrons. Consequently, QCD allows for a perturbative analysis at large energies, whereas it is highly non-perturbative in the low-energy regime. Nuclear physics resides at low energies and the force between nucleons is a residual QCD interaction. Therefore, in terms of quarks and gluons, the nuclear force is a very complicated problem.
3 Effective Field Theory for Low-Energy QCD

The way out of the dilemma of how to derive the nuclear force from QCD is provided by the effective field theory (EFT) concept. First, one needs to identify the relevant degrees of freedom. For the ground state and the low-energy excitation spectrum of an atomic nucleus as well as for conventional nuclear reactions, quarks and gluons are ineffective degrees of freedom, while nucleons and pions are the appropriate ones. Second, to make sure that this EFT is not just another phenomenology, the EFT must observe all relevant symmetries of the underlying theory. This requirement is based upon a ‘folk theorem’ by Weinberg [17]:

    If one writes down the most general possible Lagrangian, including all terms consistent with assumed symmetry principles, and then calculates matrix elements with this Lagrangian to any given order of perturbation theory, the result will simply be the most general possible S-matrix consistent with analyticity, perturbative unitarity, cluster decomposition, and the assumed symmetry principles.

Thus, the EFT program consists of the following steps:

1. Identify the degrees of freedom relevant at the resolution scale of (low-energy) nuclear physics: nucleons and pions.
2. Identify the relevant symmetries of low-energy QCD and investigate if and how they are broken.
3. Construct the most general Lagrangian consistent with those symmetries and the symmetry breaking.
4. Design an organizational scheme that can distinguish between more and less important contributions: a low-momentum expansion.
5. Guided by the expansion, calculate Feynman diagrams to the desired accuracy for the problem under consideration.

We will now elaborate on these steps, one by one.

3.1 Symmetries of Low-Energy QCD

In this section, we will give a brief introduction to (low-energy) QCD, its symmetries and symmetry breaking.
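A more detailed introduction can be found in the excellent lecture series by Scherer and Schindler [30].

3.1.1 Chiral Symmetry

The QCD Lagrangian reads
$$ \mathcal{L}_{\rm QCD} = \bar{q}\,(i\gamma^\mu D_\mu - \mathcal{M})\,q - \frac{1}{4}\, G_{\mu\nu,a}\, G^{\mu\nu}_{a} \tag{1} $$
with the gauge-covariant derivative
$$ D_\mu = \partial_\mu + i g\, \frac{\lambda_a}{2}\, A_{\mu,a} \tag{2} $$
and the gluon field strength tensor
$$ G_{\mu\nu,a} = \partial_\mu A_{\nu,a} - \partial_\nu A_{\mu,a} - g f_{abc}\, A_{\mu,b} A_{\nu,c} \,. \tag{3} $$
In the above, $q$ denotes the quark fields and $\mathcal{M}$ the quark mass matrix. Further, $g$ is the strong coupling constant and $A_{\mu,a}$ are the gluon fields. The $\lambda_a$ are the Gell-Mann matrices and the $f_{abc}$ the structure constants of the $SU(3)_{\rm color}$ Lie algebra ($a, b, c = 1, \ldots, 8$); summation over repeated indices is always implied. The gluon-gluon term in the last equation arises from the non-Abelian nature of the gauge theory and is the reason for the peculiar features of the color force.

On a typical hadronic scale, i.e., on a scale of low-mass hadrons which are not Goldstone bosons, e.g., $m_\rho = 0.78$ GeV $\approx 1$ GeV, the masses of the up (u), down (d), and, to a certain extent, strange (s) quarks are small [31]:
$$ m_u = 2 \pm 1 \ \mathrm{MeV} \tag{4} $$
$$ m_d = 5 \pm 2 \ \mathrm{MeV} \tag{5} $$
$$ m_s = 95 \pm 25 \ \mathrm{MeV} \tag{6} $$
It is therefore of interest to discuss the QCD Lagrangian in the limit of vanishing quark masses:
$$ \mathcal{L}^0_{\rm QCD} = \bar{q}\, i\gamma^\mu D_\mu\, q - \frac{1}{4}\, G_{\mu\nu,a}\, G^{\mu\nu}_{a} \,. \tag{7} $$
Defining right- and left-handed quark fields,
$$ q_R = P_R\, q \,, \qquad q_L = P_L\, q \,, \tag{8} $$
with
$$ P_R = \frac{1}{2}(1 + \gamma_5) \,, \qquad P_L = \frac{1}{2}(1 - \gamma_5) \,, \tag{9} $$
we can rewrite the Lagrangian as follows:
$$ \mathcal{L}^0_{\rm QCD} = \bar{q}_R\, i\gamma^\mu D_\mu\, q_R + \bar{q}_L\, i\gamma^\mu D_\mu\, q_L - \frac{1}{4}\, G_{\mu\nu,a}\, G^{\mu\nu}_{a} \,. \tag{10} $$
Restricting ourselves now to up and down quarks, we see that $\mathcal{L}^0_{\rm QCD}$ is invariant under the global unitary transformations
$$ q_R = \begin{pmatrix} u_R \\ d_R \end{pmatrix} \longmapsto \exp\left(-i\,\Theta^R_i\, \frac{\tau_i}{2}\right) \begin{pmatrix} u_R \\ d_R \end{pmatrix} \tag{11} $$
and
$$ q_L = \begin{pmatrix} u_L \\ d_L \end{pmatrix} \longmapsto \exp\left(-i\,\Theta^L_i\, \frac{\tau_i}{2}\right) \begin{pmatrix} u_L \\ d_L \end{pmatrix} , \tag{12} $$
where $\tau_i$ ($i = 1, 2, 3$) are the generators of $SU(2)_{\rm flavor}$, the usual Pauli spin matrices. The right- and left-handed components of massless quarks do not mix. This is $SU(2)_R \times SU(2)_L$ symmetry, also known as chiral symmetry. Noether’s Theorem implies the existence of six conserved currents: three right-handed currents
$$ R^\mu_i = \bar{q}_R\, \gamma^\mu\, \frac{\tau_i}{2}\, q_R \quad \text{with} \quad \partial_\mu R^\mu_i = 0 \tag{13} $$
and three left-handed currents
$$ L^\mu_i = \bar{q}_L\, \gamma^\mu\, \frac{\tau_i}{2}\, q_L \quad \text{with} \quad \partial_\mu L^\mu_i = 0 \,. \tag{14} $$
It is useful to consider the following linear combinations, namely, three vector currents
$$ V^\mu_i = R^\mu_i + L^\mu_i = \bar{q}\, \gamma^\mu\, \frac{\tau_i}{2}\, q \quad \text{with} \quad \partial_\mu V^\mu_i = 0 \tag{15} $$
and three axial-vector currents
$$ A^\mu_i = R^\mu_i - L^\mu_i = \bar{q}\, \gamma^\mu \gamma_5\, \frac{\tau_i}{2}\, q \quad \text{with} \quad \partial_\mu A^\mu_i = 0 \,, \tag{16} $$
which got their names from the fact that they transform as vectors and axial-vectors, respectively. Thus, the chiral $SU(2)_L \times SU(2)_R$ symmetry is equivalent to $SU(2)_V \times SU(2)_A$, where the vector and axial-vector transformations are given respectively by
$$ q = \begin{pmatrix} u \\ d \end{pmatrix} \longmapsto \exp\left(-i\,\Theta^V_i\, \frac{\tau_i}{2}\right) \begin{pmatrix} u \\ d \end{pmatrix} \tag{17} $$
and
$$ q = \begin{pmatrix} u \\ d \end{pmatrix} \longmapsto \exp\left(-i\,\Theta^A_i\, \frac{\tau_i}{2}\, \gamma_5\right) \begin{pmatrix} u \\ d \end{pmatrix} . \tag{18} $$
Obviously, the vector transformations are isospin rotations and, therefore, invariance under vector transformations can be identified with isospin symmetry. There are six conserved charges,
$$ Q^V_i = \int d^3x\; V^0_i = \int d^3x\; q^\dagger(t, \vec{x})\, \frac{\tau_i}{2}\, q(t, \vec{x}) \quad \text{with} \quad \frac{dQ^V_i}{dt} = 0 \tag{19} $$
and
$$ Q^A_i = \int d^3x\; A^0_i = \int d^3x\; q^\dagger(t, \vec{x})\, \gamma_5\, \frac{\tau_i}{2}\, q(t, \vec{x}) \quad \text{with} \quad \frac{dQ^A_i}{dt} = 0 \,, \tag{20} $$
which are also generators of $SU(2)_V \times SU(2)_A$.

3.1.2 Explicit Symmetry Breaking

The mass term $-\bar{q}\mathcal{M}q$ in the QCD Lagrangian Eq. (1) breaks chiral symmetry explicitly. To better see this, let us rewrite $\mathcal{M}$,
$$ \mathcal{M} = \begin{pmatrix} m_u & 0 \\ 0 & m_d \end{pmatrix} \tag{21} $$
$$ \phantom{\mathcal{M}} = \frac{1}{2}(m_u + m_d) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \frac{1}{2}(m_u - m_d) \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{22} $$
$$ \phantom{\mathcal{M}} = \frac{1}{2}(m_u + m_d)\, I + \frac{1}{2}(m_u - m_d)\, \tau_3 \,. \tag{23} $$
The first term in the last equation is invariant under $SU(2)_V$ (isospin symmetry) and the second term vanishes for $m_u = m_d$. Thus, isospin is an exact symmetry if $m_u = m_d$. However, both terms in Eq. (23) break $SU(2)_A$. Since the up and down quark masses are small as compared to the typical hadronic mass scale of $\approx 1$ GeV [cf. Eqs. (4) and (5)], the explicit chiral symmetry breaking due to non-vanishing quark masses is very small.

3.1.3 Spontaneous Symmetry Breaking

A (continuous) symmetry is said to be spontaneously broken if a symmetry of the Lagrangian is not realized in the ground state of the system. There is evidence that the chiral symmetry of the QCD Lagrangian is spontaneously broken, for dynamical reasons of nonperturbative origin which are not fully understood at this time. The most plausible evidence comes from the hadron spectrum.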
From chiral symmetry, one would naively expect the existence of degenerate hadron multiplets of opposite parity, i.e., for any hadron of positive parity one would expect a degenerate hadron state of negative parity and vice versa. However, these “parity doublets” are not observed in nature. For example, take the ρ-meson, a vector meson with negative parity ($1^-$) and mass 776 MeV. There does exist a $1^+$ meson, the $a_1$, but it has a mass of 1230 MeV and, thus, cannot be perceived as degenerate with the ρ. On the other hand, the ρ meson comes in three charge states (equivalent to three isospin states), the $\rho^\pm$ and the $\rho^0$, with masses that differ by at most a few MeV. In summary, in the QCD ground state (the hadron spectrum) $SU(2)_V$ (isospin symmetry) is well observed, while $SU(2)_A$ (axial symmetry) is broken. Or, in other words, $SU(2)_V \times SU(2)_A$ is broken down to $SU(2)_V$.

A spontaneously broken global symmetry implies the existence of (massless) Goldstone bosons with the quantum numbers of the broken generators. The broken generators are the $Q^A_i$ of Eq. (20), which are pseudoscalar. The Goldstone bosons are identified with the isospin triplet of the (pseudoscalar) pions, which explains why pions are so light. The pion masses are not exactly zero because the up and down quark masses are not exactly zero either (explicit symmetry breaking). Thus, pions are a truly remarkable species: they reflect spontaneous as well as explicit symmetry breaking.

3.2 Chiral Effective Lagrangians Involving Pions

The next step in our EFT program is to build the most general Lagrangian consistent with the (broken) symmetries discussed above. An elegant formalism for the construction of such Lagrangians was developed by Callan, Coleman, Wess, and Zumino (CCWZ) [32], who worked out the group-theoretical foundations of non-linear realizations of chiral symmetry. The Lagrangians given below are built upon the CCWZ formalism.
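As discussed, the relevant degrees of freedom are pions (Goldstone bosons) and nucleons. Since the interactions of Goldstone bosons must vanish at zero momentum transfer and in the chiral limit ($m_\pi \to 0$), the low-energy expansion of the Lagrangian is arranged in powers of derivatives and pion masses. This is chiral perturbation theory (ChPT).

The Lagrangian consists of one part that deals with the interaction among pions, $\mathcal{L}_{\pi\pi}$, and another one that describes the interaction between pions and the nucleon, $\mathcal{L}_{\pi N}$:
$$ \mathcal{L}_{\rm eff} = \mathcal{L}_{\pi\pi} + \mathcal{L}_{\pi N} \tag{24} $$
with
$$ \mathcal{L}_{\pi\pi} = \mathcal{L}^{(2)}_{\pi\pi} + \mathcal{L}^{(4)}_{\pi\pi} + \ldots \tag{25} $$
and
$$ \mathcal{L}_{\pi N} = \mathcal{L}^{(1)}_{\pi N} + \mathcal{L}^{(2)}_{\pi N} + \mathcal{L}^{(3)}_{\pi N} + \ldots \,, \tag{26} $$
where the superscript refers to the number of derivatives or pion mass insertions (chiral dimension) and the ellipsis stands for terms of higher dimension.

The leading order (LO) ππ Lagrangian is given by [33]
$$ \mathcal{L}^{(2)}_{\pi\pi} = \frac{f_\pi^2}{4}\, \mathrm{tr}\left[ \partial_\mu U\, \partial^\mu U^\dagger + m_\pi^2\, (U + U^\dagger) \right] \tag{27} $$
and the LO relativistic πN Lagrangian reads [34]
$$ \mathcal{L}^{(1)}_{\pi N} = \bar{\Psi} \left( i\gamma^\mu D_\mu - M_N + \frac{g_A}{2}\, \gamma^\mu \gamma_5\, u_\mu \right) \Psi \tag{28} $$
with
$$ D_\mu = \partial_\mu + \Gamma_\mu \tag{29} $$
$$ \Gamma_\mu = \frac{1}{2}\left( \xi^\dagger \partial_\mu \xi + \xi\, \partial_\mu \xi^\dagger \right) = \frac{i}{4 f_\pi^2}\, \boldsymbol{\tau} \cdot (\boldsymbol{\pi} \times \partial_\mu \boldsymbol{\pi}) + \ldots \tag{30} $$
$$ u_\mu = i \left( \xi^\dagger \partial_\mu \xi - \xi\, \partial_\mu \xi^\dagger \right) = -\frac{1}{f_\pi}\, \boldsymbol{\tau} \cdot \partial_\mu \boldsymbol{\pi} + \ldots \tag{31} $$
$$ U = \xi^2 = 1 + \frac{i}{f_\pi}\, \boldsymbol{\tau} \cdot \boldsymbol{\pi} - \frac{1}{2 f_\pi^2}\, \boldsymbol{\pi}^2 - \frac{i\alpha}{f_\pi^3}\, (\boldsymbol{\tau} \cdot \boldsymbol{\pi})^3 + \frac{8\alpha - 1}{8 f_\pi^4}\, \boldsymbol{\pi}^4 + \ldots \tag{32} $$
In Eq. (28) the chirally covariant derivative $D_\mu$ is applied, which introduces the “gauge term” $\Gamma_\mu$ (also known as chiral connection), a vector current that leads to a coupling of pions with the nucleon. Besides this, the Lagrangian includes a coupling term which involves the axial vector $u_\mu$. The SU(2) matrix $U = \xi^2$ collects the Goldstone pion fields.

In the above equations, $M_N$ denotes the nucleon mass, $g_A$ the axial-vector coupling constant, and $f_\pi$ the pion decay constant. Numerical values will be given later.

The coefficient α that appears in Eq. (32) is arbitrary. Therefore, diagrams with chiral vertices that involve three or four pions must always be grouped together such that the α-dependence drops out (cf. Fig. 4, below).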
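The arbitrariness of α in Eq. (32) can be illustrated numerically. The following is a minimal sketch, not part of the original lectures: it assumes the standard form of the expansion, U = 1 + (i/f_π) τ·π − π²/(2f_π²) − (iα/f_π³)(τ·π)³ + ((8α − 1)/(8f_π⁴)) π⁴, works in units where f_π = 1, and uses the hypothetical helper names `u_truncated` and `unitarity_defect`. It checks that the truncated U is unitary through order π⁴ for any α, and that α = 1/6 reproduces the series of the exponential parametrization exp(i τ·π/f_π) through fourth order.

```python
import numpy as np

# Pauli matrices tau_i, the generators of SU(2) flavor
TAU = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)


def u_truncated(pi, f_pi, alpha):
    """Pion-field matrix U of Eq. (32), truncated after the pi^4 term.

    pi    : 3-component isovector pion field (same units as f_pi)
    f_pi  : pion decay constant (illustrative value, assumption)
    alpha : the arbitrary coefficient of Eq. (32)
    """
    pi = np.asarray(pi, dtype=float)
    tau_pi = np.einsum("i,ijk->jk", pi, TAU)  # tau . pi (2x2 matrix)
    pi2 = float(pi @ pi)                      # pi^2 (isoscalar)
    one = np.eye(2, dtype=complex)
    return (
        one
        + 1j / f_pi * tau_pi
        - pi2 / (2 * f_pi**2) * one
        - 1j * alpha / f_pi**3 * (tau_pi @ tau_pi @ tau_pi)
        + (8 * alpha - 1) / (8 * f_pi**4) * pi2**2 * one
    )


def unitarity_defect(pi, f_pi, alpha):
    """Frobenius norm of U U^dagger - 1 for the truncated U."""
    u = u_truncated(pi, f_pi, alpha)
    return np.linalg.norm(u @ u.conj().T - np.eye(2))


if __name__ == "__main__":
    direction = np.array([1.0, 2.0, 2.0]) / 3.0  # unit isovector
    for alpha in (0.0, 1.0 / 6.0, 0.5):
        for eps in (0.1, 0.05):
            defect = unitarity_defect(eps * direction, 1.0, alpha)
            print(f"alpha={alpha:.4f}  |pi|/f_pi={eps:.2f}  defect={defect:.3e}")
```

In this sketch the defect shrinks by roughly a factor of 2⁶ when the field is halved, because all terms through order π⁴ cancel in U U† regardless of α; the α-dependence only enters at orders beyond the truncation.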
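We apply the heavy baryon (HB) formulation of chiral perturbation theory [35], in which the relativistic $\pi N$ Lagrangian is subjected to an expansion in powers of $1/M_N$ (kind of a nonrelativistic expansion), the lowest order of which is

$$ \widehat{\mathcal{L}}^{(1)}_{\pi N} = \bar N\left(iD_0 - \frac{g_A}{2}\,\vec\sigma\cdot\vec u\right)N = \bar N\left[i\partial_0 - \frac{1}{4f_\pi^2}\,\boldsymbol{\tau}\cdot(\boldsymbol{\pi}\times\partial_0\boldsymbol{\pi}) - \frac{g_A}{2f_\pi}\,\boldsymbol{\tau}\cdot(\vec\sigma\cdot\vec\nabla)\boldsymbol{\pi}\right]N + \ldots \tag{33} $$

In the relativistic formulation, the nucleon is represented by a four-component Dirac spinor field, $\Psi$, while in the HB version, the nucleon, $N$, is a Pauli spinor; in addition, all nucleon fields include Pauli spinors describing the isospin of the nucleon.

At dimension two, the relativistic $\pi N$ Lagrangian reads

$$ \mathcal{L}^{(2)}_{\pi N} = \sum_{i=1}^{4} c_i\,\bar\Psi\, O^{(2)}_i\, \Psi . \tag{34} $$

The various operators $O^{(2)}_i$ are given in Ref. [36]. The fundamental rule by which this Lagrangian, as well as all the other ones, is assembled is that it must contain all terms consistent with chiral symmetry and Lorentz invariance (apart from other trivial symmetries) at a given chiral dimension (here: order two). The parameters $c_i$ are known as low-energy constants (LECs) and are determined empirically from fits to $\pi N$ data.

The HB projected $\pi N$ Lagrangian at order two is most conveniently broken up into two pieces,

$$ \widehat{\mathcal{L}}^{(2)}_{\pi N} = \widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm fix}} + \widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm ct}} , \tag{35} $$

with

$$ \widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm fix}} = \bar N\left[\frac{1}{2M_N}\,\vec D\cdot\vec D + i\,\frac{g_A}{4M_N}\,\{\vec\sigma\cdot\vec D, u_0\}\right]N \tag{36} $$

$$ \widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm ct}} = \bar N\left[2c_1 m_\pi^2 (U + U^\dagger) + \left(c_2 - \frac{g_A^2}{8M_N}\right)u_0^2 + c_3\,u_\mu u^\mu + \frac{i}{2}\left(c_4 + \frac{1}{4M_N}\right)\vec\sigma\cdot(\vec u\times\vec u)\right]N . \tag{37} $$

Note that $\widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm fix}}$ is created entirely from the HB expansion of the relativistic $\mathcal{L}^{(1)}_{\pi N}$ and thus has no free parameters ("fixed"), while $\widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm ct}}$ is dominated by the new $\pi N$ contact terms proportional to the $c_i$ parameters, besides some small $1/M_N$ corrections.

At dimension three, the relativistic $\pi N$ Lagrangian can be formally written as

$$ \mathcal{L}^{(3)}_{\pi N} = \sum_{i=1}^{23} d_i\,\bar\Psi\, O^{(3)}_i\, \Psi , \tag{38} $$

with the operators, $O^{(3)}_i$, listed in Refs. [36, 37]; not all 23 terms are of interest here. The new LECs that occur at this order are the $d_i$.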
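Similar to the order two case, the HB projected Lagrangian at order three can be broken into two pieces,

$$ \widehat{\mathcal{L}}^{(3)}_{\pi N} = \widehat{\mathcal{L}}^{(3)}_{\pi N,\,{\rm fix}} + \widehat{\mathcal{L}}^{(3)}_{\pi N,\,{\rm ct}} , \tag{39} $$

with $\widehat{\mathcal{L}}^{(3)}_{\pi N,\,{\rm fix}}$ and $\widehat{\mathcal{L}}^{(3)}_{\pi N,\,{\rm ct}}$ given in Refs. [36, 37].

3.3 Nucleon Contact Lagrangians

Nucleon contact interactions consist of four nucleon fields (four nucleon legs) and no meson fields. Such terms are needed to renormalize loop integrals, to make results reasonably independent of regulators, and to parametrize the unresolved short-distance contributions to the nuclear force. For more about contact terms, see Sec. 5.3.

Because of parity, nucleon contact interactions come only in even numbers of derivatives, thus,

$$ \mathcal{L}_{NN} = \mathcal{L}^{(0)}_{NN} + \mathcal{L}^{(2)}_{NN} + \mathcal{L}^{(4)}_{NN} + \ldots \tag{40} $$

The lowest order (or leading order) $NN$ Lagrangian has no derivatives and reads [18]

$$ \mathcal{L}^{(0)}_{NN} = -\frac{1}{2}\,C_S\,\bar N N\,\bar N N - \frac{1}{2}\,C_T\,\bar N\vec\sigma N\cdot\bar N\vec\sigma N , \tag{41} $$

where $N$ is the heavy baryon nucleon field. $C_S$ and $C_T$ are unknown constants which are determined by a fit to the $NN$ data. The second order $NN$ Lagrangian is given by [19]

$$ \begin{aligned}
\mathcal{L}^{(2)}_{NN} = {} & -C'_1\left[(\bar N\vec\nabla N)^2 + (\overline{\vec\nabla N} N)^2\right] - C'_2\,(\bar N\vec\nabla N)\cdot(\overline{\vec\nabla N} N) \\
& - C'_3\,\bar N N\left[\bar N\vec\nabla^2 N + \overline{\vec\nabla^2 N} N\right] \\
& - iC'_4\left[\bar N\vec\nabla N\cdot(\overline{\vec\nabla N}\times\vec\sigma N) + (\overline{\vec\nabla N}) N\cdot(\bar N\vec\sigma\times\vec\nabla N)\right] \\
& - iC'_5\,\bar N N\,(\overline{\vec\nabla N}\cdot\vec\sigma\times\vec\nabla N) - iC'_6\,(\bar N\vec\sigma N)\cdot(\overline{\vec\nabla N}\times\vec\nabla N) \\
& - \left(C'_7\,\delta_{ik}\delta_{jl} + C'_8\,\delta_{il}\delta_{kj} + C'_9\,\delta_{ij}\delta_{kl}\right) \left[\bar N\sigma_k\partial_i N\,\bar N\sigma_l\partial_j N + \overline{\partial_i N}\sigma_k N\,\overline{\partial_j N}\sigma_l N\right] \\
& - \left(C'_{10}\,\delta_{ik}\delta_{jl} + C'_{11}\,\delta_{il}\delta_{kj} + C'_{12}\,\delta_{ij}\delta_{kl}\right) \bar N\sigma_k\partial_i N\,\overline{\partial_j N}\sigma_l N \\
& - \left(\frac{1}{2}\,C'_{13}\,(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{kj}) + C'_{14}\,\delta_{ij}\delta_{kl}\right) \left[\overline{\partial_i N}\sigma_k\partial_j N + \overline{\partial_j N}\sigma_k\partial_i N\right]\bar N\sigma_l N .
\end{aligned} \tag{42} $$

Similar to $C_S$ and $C_T$, the $C'_i$ are unknown constants which are fixed in a fit to the $NN$ data. Obviously, these contact Lagrangians blow up quite a bit with increasing order, which is why we do not give $\mathcal{L}^{(4)}_{NN}$ explicitly here.

4 Nuclear Forces from EFT: Overview

In the beginning of Sec. 3, we spelled out the steps we have to take to accomplish our EFT program for the derivation of nuclear forces. So far, we discussed steps one to three. What is left are steps four (low-momentum expansion) and five (Feynman diagrams).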
In this section, we will say more about the expansion we are using and give an overview of the Feynman diagrams that arise order by order.

4.1 Chiral Perturbation Theory and Power Counting

In ChPT, we analyze contributions in terms of powers of small momenta over the large scale: $(Q/\Lambda_\chi)^\nu$, where $Q$ stands for a momentum (nucleon three-momentum or pion four-momentum) or a pion mass and $\Lambda_\chi \approx 1$ GeV is the chiral symmetry breaking scale (hadronic scale). Determining the power $\nu$ at which a given diagram contributes has become known as power counting. For a non-iterative contribution involving $A$ nucleons, the power $\nu$ is given by

$$ \nu = -2 + 2A - 2C + 2L + \sum_i \Delta_i , \tag{43} $$

with

$$ \Delta_i \equiv d_i + \frac{n_i}{2} - 2 , \tag{44} $$

where $C$ denotes the number of separately connected pieces and $L$ the number of loops in the diagram; $d_i$ is the number of derivatives or pion-mass insertions and $n_i$ the number of nucleon fields involved in vertex $i$; the sum runs over all vertices contained in the diagram under consideration. Note that for an irreducible $NN$ diagram ($A = 2$, $C = 1$), the above formula reduces to

$$ \nu = 2L + \sum_i \Delta_i . \tag{45} $$

The power $\nu$ is bounded from below; e.g., for $A = 2$, $\nu \geq 0$. This fact is crucial for the power expansion to be of any use.

4.2 The Hierarchy of Nuclear Forces

Chiral perturbation theory and power counting imply that nuclear forces emerge as a hierarchy ruled by the power $\nu$, Fig. 1.

The $NN$ amplitude is determined by two classes of contributions: contact terms and pion-exchange diagrams. There are two contacts of order $Q^0$ [$O(Q^0)$] represented by the four-nucleon graph with a small-dot vertex shown in the first row of Fig. 1. The corresponding graph in the second row, four nucleon legs and a solid square, represents the seven contact terms of $O(Q^2)$. Finally, at $O(Q^4)$, we have 15 contact contributions represented by a four-nucleon graph with a solid diamond.
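The power counting of Eqs. (43)-(45) is simple enough to check by hand, but a small script makes it easy to tabulate several diagrams at once. The following Python sketch is only an illustration: the names `Vertex` and `chiral_power`, and the choice of example diagrams, are assumptions made here and do not come from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vertex:
    """One interaction vertex of a diagram (illustrative container)."""
    derivatives: int     # d_i: derivatives or pion-mass insertions
    nucleon_fields: int  # n_i: nucleon fields at the vertex (0, 2, 4, ...)

    @property
    def delta(self) -> int:
        # Eq. (44): Delta_i = d_i + n_i/2 - 2
        if self.nucleon_fields % 2:
            raise ValueError("n_i must be even")
        return self.derivatives + self.nucleon_fields // 2 - 2


def chiral_power(nucleons: int, pieces: int, loops: int,
                 vertices: Iterable[Vertex]) -> int:
    """Eq. (43): nu = -2 + 2A - 2C + 2L + sum_i Delta_i."""
    return (-2 + 2 * nucleons - 2 * pieces + 2 * loops
            + sum(v.delta for v in vertices))


# Vertex types used in the examples below (labels are hypothetical).
PI_NN_LO = Vertex(derivatives=1, nucleon_fields=2)     # leading-order pi-N vertex
PI_PI_NN_CI = Vertex(derivatives=2, nucleon_fields=2)  # c_i vertex, dimension two
CONTACT_Q0 = Vertex(derivatives=0, nucleon_fields=4)   # NN contact without derivatives
CONTACT_Q2 = Vertex(derivatives=2, nucleon_fields=4)   # NN contact with two derivatives

examples = {
    "1PE (tree)": chiral_power(2, 1, 0, [PI_NN_LO, PI_NN_LO]),
    "contact, no derivatives": chiral_power(2, 1, 0, [CONTACT_Q0]),
    "contact, two derivatives": chiral_power(2, 1, 0, [CONTACT_Q2]),
    "2PE box (one loop)": chiral_power(2, 1, 1, [PI_NN_LO] * 4),
    "2PE triangle with c_i": chiral_power(2, 1, 1, [PI_PI_NN_CI, PI_NN_LO, PI_NN_LO]),
    "3NF, 2PE with c_i (tree)": chiral_power(3, 1, 0, [PI_NN_LO, PI_PI_NN_CI, PI_NN_LO]),
}

for name, nu in examples.items():
    print(f"{name:28s} nu = {nu}")
```

The tree-level one-pion exchange and the derivative-free contact come out at $\nu = 0$, the one-loop box at $\nu = 2$, and diagrams with one $c_i$ insertion at $\nu = 3$, in line with the hierarchy described in Sec. 4.2.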
Now, turning to the pion contributions: At leading order [LO, $O(Q^0)$, $\nu = 0$], there is only the well-known static one-pion exchange (1PE), second diagram in the first row of Fig. 1. Two-pion exchange (2PE) starts at next-to-leading order (NLO, $\nu = 2$) and all diagrams of this leading-order two-pion exchange are shown. Further 2PE contributions occur in any higher order. Of this sub-leading 2PE, we show only two representative diagrams at next-to-next-to-leading order (NNLO) and three diagrams at next-to-next-to-next-to-leading order (N3LO). Finally, there is also three-pion exchange, which shows up for the first time at N3LO (two loops; one representative $3\pi$ diagram is included in Fig. 1). At this order, the $3\pi$ contribution is negligible [38].

Figure 1: Hierarchy of nuclear forces in ChPT (columns: 2N force, 3N force, 4N force). Solid lines represent nucleons and dashed lines pions. Further explanations are given in the text.

One important advantage of ChPT is that it makes specific predictions also for many-body forces. For a given order of ChPT, two-nucleon forces (2NF), three-nucleon forces (3NF), . . . are generated on the same footing (cf. Fig. 1). At LO, there are no 3NF, and at NLO, all 3NF terms cancel [18, 39]. However, at NNLO and higher orders, well-defined, nonvanishing 3NF occur [39, 40]. Since 3NF show up for the first time at NNLO, they are weak. Four-nucleon forces (4NF) occur first at N3LO and, therefore, they are even weaker.

5 Two-Nucleon Forces

In this section, we will elaborate in detail on the two-nucleon force contributions of which we have given a rough overview in the previous section.

5.1 Pion-Exchange Contributions in ChPT

The effective pion Lagrangians presented in Sec. 3.2 are the crucial ingredients for the evaluation of the pion-exchange contributions to the $NN$ interaction. We will derive these contributions now order by order.
We will state our results in terms of contributions to the momentum-space $NN$ amplitude in the center-of-mass system (CMS), which takes the general form

$$ \begin{aligned}
V(\vec p\,', \vec p) = {} & V_C + \boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2\, W_C \\
& + \left[V_S + \boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2\, W_S\right] \vec\sigma_1\cdot\vec\sigma_2 \\
& + \left[V_{LS} + \boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2\, W_{LS}\right] \left(-i\vec S\cdot(\vec q\times\vec k)\right) \\
& + \left[V_T + \boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2\, W_T\right] \vec\sigma_1\cdot\vec q\;\vec\sigma_2\cdot\vec q \\
& + \left[V_{\sigma L} + \boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2\, W_{\sigma L}\right] \vec\sigma_1\cdot(\vec q\times\vec k)\;\vec\sigma_2\cdot(\vec q\times\vec k) ,
\end{aligned} \tag{46} $$

where $\vec p\,'$ and $\vec p$ denote the final and initial nucleon momenta in the CMS, respectively; moreover,

$$ \vec q \equiv \vec p\,' - \vec p \ \ \text{is the momentum transfer,} \qquad \vec k \equiv \tfrac{1}{2}(\vec p\,' + \vec p) \ \ \text{the average momentum,} \qquad \vec S \equiv \tfrac{1}{2}(\vec\sigma_1 + \vec\sigma_2) \ \ \text{the total spin,} \tag{47} $$

and $\vec\sigma_{1,2}$ and $\boldsymbol{\tau}_{1,2}$ are the spin and isospin operators, respectively, of nucleon 1 and 2. For on-energy-shell scattering, $V_\alpha$ and $W_\alpha$ ($\alpha = C, S, LS, T, \sigma L$) can be expressed as functions of $q$ and $k$ (with $q \equiv |\vec q\,|$ and $k \equiv |\vec k|$) only.

Our formalism is similar to the one used by the Munich group [22, 41, 42] except for two differences: all our momentum space amplitudes differ by an over-all factor of $(-1)$ and our spin-orbit potentials, $V_{LS}$ and $W_{LS}$, differ by an additional factor of $(-2)$. Our conventions are more in tune with what is commonly used in nuclear physics.

In all expressions given below, we will state only the nonpolynomial contributions to the $NN$ amplitude. Note, however, that dimensional regularization typically generates also polynomial terms. These polynomials are absorbed by the contact interactions to be discussed in a later section and, therefore, they are of no interest here.

5.1.1 Zeroth Order (LO)

At order zero [$\nu = 0$, $O(Q^0)$, lowest order, leading order, LO], there is only the well-known static one-pion exchange, second diagram in the first row of Fig. 1, which is given by:

$$ V_{1\pi}(\vec p\,', \vec p) = -\frac{g_A^2}{4f_\pi^2}\,\boldsymbol{\tau}_1\cdot\boldsymbol{\tau}_2\,\frac{\vec\sigma_1\cdot\vec q\;\vec\sigma_2\cdot\vec q}{q^2 + m_\pi^2} . \tag{48} $$

At first order [$\nu = 1$, $O(Q)$], there are no pion-exchange contributions (and also no contact terms).

5.1.2 Second Order (NLO)

Non-vanishing higher-order graphs start at second order ($\nu = 2$, next-to-leading order, NLO).
The most efficient way to evaluate these loop diagrams is to use covariant perturbation theory and dimensional regularization. This is the method applied by the Munich group [22, 41, 42]. One starts with the relativistic versions of the $\pi N$ Lagrangians (cf. Sec. 3.2) and sets up four-dimensional (covariant) loop integrals. Relativistic vertices and nucleon propagators are then expanded in powers of $1/M_N$. The divergences that occur in conjunction with the four-dimensional loop integrals are treated by means of dimensional regularization, a prescription which is consistent with chiral symmetry and power counting. The results derived in this way are the same as those obtained when starting right away with the HB versions of the $\pi N$ Lagrangians. However, as it turns out, the method used by the Munich group is more efficient in dealing with the rather tedious calculations.

Two-pion exchange occurs first at second order, also known as leading-order $2\pi$ exchange. The graphs are shown in the first row of Fig. 2. Since a loop creates already $\nu = 2$, the vertices involved at this order can only be from the leading/lowest order Lagrangian $\widehat{\mathcal{L}}^{(1)}_{\pi N}$, Eq. (33), i.e., they carry only one derivative. These vertices are denoted by small dots in Fig. 2. Concerning the box diagram, we should note that we include only the non-iterative part of this diagram, which is obtained by subtracting the iterated 1PE contribution Eq. (65) or Eq. (66), below, but using $M_N^2/E_p \approx M_N^2/E_{p''} \approx M_N$ at this order (NLO).

Figure 2: Two-pion exchange contributions to the $NN$ interaction at order two (NLO) and three (NNLO) in small momenta. Solid lines represent nucleons and dashed lines pions. Small dots denote vertices from the leading order $\pi N$ Lagrangian $\widehat{\mathcal{L}}^{(1)}_{\pi N}$, Eq. (33). Large solid dots are vertices proportional to the LECs $c_i$ from the second order Lagrangian $\widehat{\mathcal{L}}^{(2)}_{\pi N,\,{\rm ct}}$, Eq. (37). Symbols with an open circle are relativistic $1/M_N$ corrections contained in the second order Lagrangian $\widehat{\mathcal{L}}^{(2)}_{\pi N}$, Eq. (35). Only a few representative examples of $1/M_N$ corrections are shown and not all.

Summarizing all contributions from irreducible two-pion exchange at second order, one obtains [22]:

$$ W_C = -\frac{L(q)}{384\pi^2 f_\pi^4}\left[4m_\pi^2\left(5g_A^4 - 4g_A^2 - 1\right) + q^2\left(23g_A^4 - 10g_A^2 - 1\right) + \frac{48\,g_A^4\,m_\pi^4}{w^2}\right] , \tag{49} $$

$$ V_T = -\frac{1}{q^2}\,V_S = -\frac{3g_A^4\,L(q)}{64\pi^2 f_\pi^4} , \tag{50} $$

where

$$ L(q) \equiv \frac{w}{q}\,\ln\frac{w + q}{2m_\pi} \tag{51} $$

and

$$ w \equiv \sqrt{4m_\pi^2 + q^2} . \tag{52} $$

5.1.3 Third Order (NNLO)

The two-pion exchange diagrams of order three ($\nu = 3$, next-to-next-to-leading order, NNLO) are very similar to the ones of order two, except that they contain one insertion from $\widehat{\mathcal{L}}^{(2)}_{\pi N}$, Eq. (35). The resulting contributions are typically either proportional to one of the low-energy constants $c_i$ or they contain a factor $1/M_N$. Notice that relativistic $1/M_N$ corrections can occur for vertices and nucleon propagators. In Fig. 2, we show in row 2 the diagrams with vertices proportional to $c_i$ (large solid dot), Eq. (37), and in rows 3 and 4 a few representative graphs with a $1/M_N$ correction (symbols with an open circle). The number of $1/M_N$ correction graphs is large and not all are shown in the figure. Again, the box diagram is corrected for a contribution from the iterated 1PE. If the iterative 2PE of Eq. (65) is used, the expansion of the factor $M_N^2/E_p = M_N - p^2/2M_N + \ldots$ is applied and the term proportional to $(-p^2/2M_N)$ is subtracted from the third order box diagram contribution. Then, one obtains for the full third order contribution [22]:

$$ V_C = \frac{3g_A^2}{16\pi f_\pi^4}\left\{\frac{g_A^2\,m_\pi^5}{16M_N w^2} - \left[2m_\pi^2(2c_1 - c_3) - q^2\left(c_3 + \frac{3g_A^2}{16M_N}\right)\right]\tilde w^2 A(q)\right\} , \tag{53} $$

$$ W_C = \frac{g_A^2}{128\pi M_N f_\pi^4}\left\{3g_A^2\,m_\pi^5\,w^{-2} - \left[4m_\pi^2 + 2q^2 - g_A^2\left(4m_\pi^2 + 3q^2\right)\right]\tilde w^2 A(q)\right\} , \tag{54} $$

$$ V_T = -\frac{1}{q^2}\,V_S = \frac{9g_A^4\,\tilde w^2 A(q)}{512\pi M_N f_\pi^4} , \tag{55} $$

$$ W_T = -\frac{1}{q^2}\,W_S = -\frac{g_A^2\,A(q)}{32\pi f_\pi^4}\left[\left(c_4 + \frac{1}{4M_N}\right)w^2 - \frac{g_A^2}{8M_N}\left(10m_\pi^2 + 3q^2\right)\right] , \tag{56} $$

$$ V_{LS} = \frac{3g_A^4\,\tilde w^2 A(q)}{32\pi M_N f_\pi^4} , \tag{57} $$

$$ W_{LS} = \frac{g_A^2\left(1 - g_A^2\right)}{32\pi M_N f_\pi^4}\,w^2 A(q) , \tag{58} $$

with

$$ A(q) \equiv \frac{1}{2q}\,\arctan\frac{q}{2m_\pi} \tag{59} $$

and

$$ \tilde w \equiv \sqrt{2m_\pi^2 + q^2} . \tag{60} $$

As discussed in Sec. 5.1.5, below, we prefer the iterative 2PE defined in Eq. (66), which leads to a different NNLO term for the iterative 2PE. This changes the $1/M_N$ terms in the above potentials. The changes are obtained by adding to Eqs. (53)-(56) the following terms:

$$ V_C = -\frac{3g_A^4}{256\pi f_\pi^4 M_N}\left(m_\pi w^2 + \tilde w^4 A(q)\right) \tag{61} $$

$$ W_C = \frac{g_A^4}{128\pi f_\pi^4 M_N}\left(m_\pi w^2 + \tilde w^4 A(q)\right) \tag{62} $$

$$ V_T = -\frac{1}{q^2}\,V_S = \frac{3g_A^4}{512\pi f_\pi^4 M_N}\left(m_\pi + w^2 A(q)\right) \tag{63} $$

$$ W_T = -\frac{1}{q^2}\,W_S = -\frac{g_A^4}{256\pi f_\pi^4 M_N}\left(m_\pi + w^2 A(q)\right) \tag{64} $$

5.1.4 Fourth Order (N3LO)

This order, which may also be denoted by next-to-next-to-next-to-leading order (N3LO), is very involved. Three-pion exchange (3PE) occurs for the first time at this order. The 3PE contribution at N3LO has been calculated by the Munich group and found to be negligible [38]. Therefore, we will ignore it.

The 2PE contributions at N3LO can be subdivided into two groups, one-loop graphs, Fig. 3, and two-loop diagrams, Fig. 4. Since these contributions are very complicated, we have moved them to Appendix A.

Figure 3: One-loop $2\pi$-exchange contributions to the $NN$ interaction at order four. Basic notation as in Fig. 2. Symbols with a large solid dot and an open circle denote $1/M_N$ corrections of vertices proportional to $c_i$. Symbols with two open circles mark relativistic $1/M_N^2$ corrections. Both corrections are part of the third order Lagrangian $\widehat{\mathcal{L}}^{(3)}_{\pi N}$, Eq. (39). Representative examples for all types of one-loop graphs that occur at this order are shown.

Figure 4: Two-loop $2\pi$-exchange contributions at order four. Basic notation as in Fig. 2. The oval stands for all one-loop $\pi N$ graphs, some of which are shown in the lower part of the figure. The solid square represents vertices proportional to the LECs $d_i$ which are introduced by the third order Lagrangian $\mathcal{L}^{(3)}_{\pi N}$, Eq. (38). More explanations are given in the text.

5.1.5 Iterated One-Pion-Exchange

Besides all the irreducible 2PE contributions presented above, there is also the reducible 2PE which is generated from iterated 1PE. This "iterative 2PE" is the only 2PE contribution which produces an imaginary part. Thus, one wishes to formulate this contribution such that relativistic elastic unitarity is satisfied. There are several ways to achieve this.

Kaiser et al. [22] define the iterative 2PE contribution as follows,

$$ V^{({\rm KBW})}_{2\pi,\,{\rm it}}(\vec p\,', \vec p) = \frac{M_N^2}{E_p}\int\frac{d^3p''}{(2\pi)^3}\,\frac{V_{1\pi}(\vec p\,', \vec p\,'')\,V_{1\pi}(\vec p\,'', \vec p)}{p^2 - p''^2 + i\epsilon} \tag{65} $$

with $V_{1\pi}$ given in Eq. (48).

Since we adopt the relativistic scheme developed by Blankenbecler and Sugar [43] (BbS) (see beginning of Sec. 5.4), we prefer the following formulation which is consistent with the BbS approach (and, of course, with relativistic elastic unitarity):

$$ V^{({\rm EM})}_{2\pi,\,{\rm it}}(\vec p\,', \vec p) = \int\frac{d^3p''}{(2\pi)^3}\,\frac{M_N^2}{E_{p''}}\,\frac{V_{1\pi}(\vec p\,', \vec p\,'')\,V_{1\pi}(\vec p\,'', \vec p)}{p^2 - p''^2 + i\epsilon} . \tag{66} $$

The iterative 2PE contribution has to be subtracted from the covariant box diagram, order by order. For this, the expansion $M_N^2/E_p = M_N - p^2/2M_N + \ldots$ is applied in Eq. (65) and $M_N^2/E_{p''} = M_N - p''^2/2M_N + \ldots$ in Eq. (66). At NLO, both choices for the iterative 2PE collapse to the same, while at NNLO there are obvious differences.

5.2 NN Scattering in Peripheral Partial Waves Using the Perturbative Amplitude

After the tedious mathematics of the previous section, it is time for more tangible affairs. The obvious question to address now is: How does the derived $NN$ amplitude compare to empirical information? Since our derivation includes only one- and two-pion exchanges, we are dealing here with the long- and intermediate-range part of the $NN$ interaction. This part of the nuclear force is probed in the peripheral partial waves of $NN$ scattering. Thus, in this section, we will calculate the phase shifts that result from the $NN$ amplitudes presented in the previous section and compare them to the empirical phase shifts as well as to the predictions from conventional meson theory. Besides the irreducible two-pion exchanges derived above, we must also include 1PE and iterated 1PE.
In this section [44], which is restricted to just peripheral waves, we will always consider neutron-proton ($np$) scattering and take the charge-dependence of 1PE due to pion-mass splitting into account, since it is appreciable. With the definition

$$ V_{1\pi}(m_\pi) \equiv -\frac{g_A^2}{4f_\pi^2}\,\frac{\vec\sigma_1\cdot\vec q\;\vec\sigma_2\cdot\vec q}{q^2 + m_\pi^2} , \tag{67} $$

the charge-dependent 1PE for $np$ scattering is

$$ V^{(np)}_{1\pi}(\vec p\,', \vec p) = -V_{1\pi}(m_{\pi^0}) + (-1)^{I+1}\,2\,V_{1\pi}(m_{\pi^\pm}) , \tag{68} $$

where $I$ denotes the isospin of the two-nucleon system. We use $m_{\pi^0} = 134.9766$ MeV, $m_{\pi^\pm} = 139.5702$ MeV [31], and

$$ M_N = \frac{2M_pM_n}{M_p + M_n} = 938.9182\ {\rm MeV} . \tag{69} $$

Also in the iterative 2PE, we apply the charge-dependent 1PE, i.e., in Eq. (66) we replace $V_{1\pi}$ with $V^{(np)}_{1\pi}$.

The perturbative relativistic T-matrix for $np$ scattering in peripheral waves is

$$ T(\vec p\,', \vec p) = V^{(np)}_{1\pi}(\vec p\,', \vec p) + V^{({\rm EM},np)}_{2\pi,\,{\rm it}}(\vec p\,', \vec p) + V_{2\pi,\,{\rm irr}}(\vec p\,', \vec p) , \tag{70} $$

where $V_{2\pi,\,{\rm irr}}$ refers to any or all of the irreducible 2PE contributions presented in Sec. 5.1, depending on the order at which the calculation is conducted. In the calculation of the irreducible 2PE, we use the average pion mass $m_\pi = 138.039$ MeV and, thus, neglect the charge-dependence due to pion-mass splitting. The charge-dependence that emerges from irreducible $2\pi$ exchange was investigated in Ref. [45] and found to be negligible for partial waves with $L \geq 3$.

For the T-matrix given in Eq. (70), we calculate phase shifts for partial waves with $L \geq 3$ and $T_{\rm lab} \leq 300$ MeV. At order four in small momenta, partial waves with $L \geq 3$ do not receive any contributions from contact interactions and, thus, the non-polynomial pion contributions uniquely predict the $F$ and higher partial waves. We use $f_\pi = 92.4$ MeV [31] and $g_A = 1.29$. Via the Goldberger-Treiman relation, $g_A = g_{\pi NN}\,f_\pi/M_N$, our value for $g_A$ is consistent with $g_{\pi NN}^2/4\pi = 13.63 \pm 0.20$, which is obtained from $\pi N$ and $NN$ analysis [46, 47].

The LECs used in this calculation are shown in Table 2, column "NN periph. Fig. 5". Note that many determinations of the LECs, $c_i$ and $\bar d_i$, can be found in the literature.
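The numerical inputs quoted above can be cross-checked against each other with a few lines of arithmetic. The Python sketch below is only a consistency check, not part of the calculation described in the text. The proton and neutron masses are assumed values that do not appear in the text, the average pion mass is assumed to be the charge-weighted mean $(m_{\pi^0} + 2m_{\pi^\pm})/3$, and all function names are hypothetical.

```python
import math

# Values quoted in the text (MeV where applicable).
M_PI0 = 134.9766
M_PI_CHARGED = 139.5702
F_PI = 92.4
G_A = 1.29

# Assumed proton and neutron masses in MeV (not given in the text).
M_PROTON = 938.2720
M_NEUTRON = 939.5653


def nucleon_mass(m_p: float, m_n: float) -> float:
    """Eq. (69): twice the reduced mass of the np system."""
    return 2.0 * m_p * m_n / (m_p + m_n)


def average_pion_mass(m_neutral: float, m_charged: float) -> float:
    """Average over the three charge states (assumed weighting 1:2)."""
    return (m_neutral + 2.0 * m_charged) / 3.0


def coupling_from_goldberger_treiman(g_a: float, f_pi: float, m_nucleon: float) -> float:
    """Return g_piNN^2 / (4 pi) from g_A = g_piNN * f_pi / M_N."""
    g_pi_nn = g_a * m_nucleon / f_pi
    return g_pi_nn ** 2 / (4.0 * math.pi)


def charged_pion_sign(isospin: int) -> int:
    """Sign factor (-1)^(I+1) multiplying the charged-pion term in Eq. (68)."""
    return (-1) ** (isospin + 1)


m_nucleon = nucleon_mass(M_PROTON, M_NEUTRON)
m_pion = average_pion_mass(M_PI0, M_PI_CHARGED)
coupling = coupling_from_goldberger_treiman(G_A, F_PI, m_nucleon)

print(f"M_N           = {m_nucleon:.4f} MeV")
print(f"average m_pi  = {m_pion:.3f} MeV")
print(f"g_piNN^2/4pi  = {coupling:.2f}")
```

With these inputs the script reproduces the nucleon mass of Eq. (69) and the average pion mass to the quoted digits, and the coupling implied by $g_A = 1.29$ falls inside the quoted band $13.63 \pm 0.20$.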
The most reliable way to determine the LECs from empirical $\pi N$ information is to extract them from the $\pi N$ amplitude inside the Mandelstam triangle (unphysical region), which can be constructed with the help of dispersion relations from empirical $\pi N$ data. This method was used by Büttiker and Meißner [48]. Unfortunately, the values for $c_2$ and all $\bar d_i$ parameters obtained in Ref. [48] carry uncertainties so large that the values cannot provide any guidance. Therefore, in Table 2, only $c_1$, $c_3$, and $c_4$ are from Ref. [48], while the other LECs are taken from Ref. [37], where the $\pi N$ amplitude in the physical region was considered. To establish a link between $\pi N$ and $NN$, we apply the values from the above determinations in our calculations of the $NN$ peripheral phase shifts. In general, we use the mean values; the only exception is $c_3$, where we choose a value that is, in terms of magnitude, about one standard deviation below the one from Ref. [48]. With the exception of $c_3$, phase shift predictions do not depend sensitively on variations of the LECs within the quoted uncertainties.

Table 2: Low-energy constants, LECs, used for a $NN$ potential at N3LO, Sec. 5.4.4, and in the calculation of the peripheral $NN$ phase shifts shown in Fig. 5 (column "NN periph. Fig. 5"). The $c_i$ belong to the dimension-two $\pi N$ Lagrangian, Eq. (37), and are in units of GeV$^{-1}$, while the $\bar d_i$ are associated with the dimension-three Lagrangian, Eq. (38), and are in units of GeV$^{-2}$. The column "πN empirical" shows determinations from $\pi N$ data.

LEC                          NN potential at N3LO    NN periph. Fig. 5    πN empirical
$c_1$                        -0.81                   -0.81                -0.81 ± 0.15 (a)
$c_2$                         2.80                    3.28                 3.28 ± 0.23 (b)
$c_3$                        -3.20                   -3.40                -4.69 ± 1.34 (a)
$c_4$                         5.40                    3.40                 3.40 ± 0.04 (a)
$\bar d_1 + \bar d_2$         3.06                    3.06                 3.06 ± 0.21 (b)
$\bar d_3$                   -3.27                   -3.27                -3.27 ± 0.73 (b)
$\bar d_5$                    0.45                    0.45                 0.45 ± 0.42 (b)
$\bar d_{14} - \bar d_{15}$  -5.65                   -5.65                -5.65 ± 0.41 (b)

(a) Table 1, Fit 1 of Ref. [48]. (b) Table 2, Fit 1 of Ref. [37].

In Fig. 5, we show the phase-shift predictions for neutron-proton scattering in $F$ waves for laboratory kinetic energies below 300 MeV (for $G$ and $H$ waves, see Ref. [26]). The orders displayed are defined as follows:

• Leading order (LO) is just 1PE, Eq. (68).

• Next-to-leading order (NLO) is 1PE, Eq. (68), plus iterated 1PE, Eq. (66), plus the contributions of Sec. 5.1.2 (order two), Eqs. (49) and (50).

• Next-to-next-to-leading order (denoted by N2LO in the figures) consists of NLO plus the contributions of Sec. 5.1.3 (order three), Eqs. (53)-(58) and (61)-(64).

• Next-to-next-to-next-to-leading order (denoted by N3LO in the figures) consists of N2LO plus the contributions of Sec. 5.1.4 (order four), Eqs. (99)-(112) and (115)-(124).

Figure 5: $F$-wave phase shifts of neutron-proton scattering for laboratory kinetic energies below 300 MeV (four panels; horizontal axis: Lab. Energy (MeV), 0 to 300). We show the predictions from chiral pion exchange at leading order (LO), next-to-leading order (NLO), next-to-next-to-leading order (N2LO), and next-to-next-to-next-to-leading order (N3LO). The solid dots and open circles are the results from the Nijmegen multi-energy $np$ phase shift analysis [49] and the VPI single-energy $np$ analysis SM99 [50], respectively.

It is clearly seen in Fig. 5 that the leading order $2\pi$ exchange (NLO) is a rather small contribution, insufficient to explain the empirical facts. In contrast, the next order (N2LO) is very large, several times NLO. This is due to the $\pi\pi NN$ contact interactions proportional to the LECs $c_i$ that are introduced by the second order Lagrangian $\mathcal{L}^{(2)}_{\pi N}$, Eq. (34). These contacts are supposed to simulate the contributions from intermediate $\Delta$-isobars and correlated $2\pi$ exchange, which are known to be large (see, e.g., Ref. [13]).

At N3LO a clearly identifiable trend towards convergence emerges. Obviously, $^1F_3$ and $^3F_4$ appear fully converged. However, in $^3F_2$ and $^3F_3$, N3LO differs noticeably from NNLO, but the difference is much smaller than the one between NNLO and NLO. This is what we perceive as a trend towards convergence.

Figure 6: $F$-wave phase shifts of neutron-proton scattering for laboratory kinetic energies below 300 MeV (four panels; horizontal axis: Lab. Energy (MeV), 0 to 300). We show the results from one-pion-exchange (OPE), and one- plus two-pion exchange as predicted by ChPT at next-to-next-to-next-to-leading order (N3LO) and by the Bonn Full Model [13] (Bonn). Note that the "Bonn" curve does not include the repulsive $\omega$ and $\pi\rho$ exchanges of the full model, since this figure serves the purpose to compare just predictions by different models/theories for the $\pi + 2\pi$ contribution to the $NN$ interaction. Empirical phase shifts (solid dots and open circles) as in Fig. 5.

In Fig. 6, we conduct a comparison between the predictions from chiral one- and two-pion exchange at N3LO and the corresponding predictions from conventional meson theory (curve "Bonn"). As representative for conventional meson theory, we choose the Bonn meson-exchange model for the $NN$ interaction [13], since it contains a comprehensive and thoughtfully constructed model for $2\pi$ exchange. This $2\pi$ model includes box and crossed box diagrams with $NN$, $N\Delta$, and $\Delta\Delta$ intermediate states as well as direct $\pi\pi$ interaction in $S$- and $P$-waves (of the $\pi\pi$ system) consistent with empirical information from $\pi N$ and $\pi\pi$ scattering. Besides this, the Bonn model also includes (repulsive) $\omega$-meson exchange and irreducible diagrams of $\pi$ and $\rho$ exchange (which are also repulsive). However, note that in the phase shift predictions displayed in Fig. 6, the "Bonn" curve includes only the $1\pi$ and $2\pi$ contributions from the Bonn model; the short-range contributions are left out since the purpose of the figure is to compare different models/theories for $\pi + 2\pi$. In all waves shown we see, in general, good agreement between N3LO and Bonn. In $^3F_2$ and $^3F_3$ above 150 MeV and in $^3F_4$ above 250 MeV the chiral model at N3LO is more attractive than the Bonn $2\pi$ model. Note, however, that the Bonn model is relativistic and, thus, includes relativistic corrections up to infinite orders. Thus, one may speculate that higher orders in ChPT may create some repulsion, moving the Bonn and the chiral predictions even closer together [51].

The $2\pi$ exchange contribution to the $NN$ interaction can also be derived from empirical $\pi N$ and $\pi\pi$ input using dispersion theory, which is based upon unitarity, causality (analyticity), and crossing symmetry. The amplitude $N\bar N \to \pi\pi$ is constructed from $\pi N \to \pi N$ and $\pi N \to \pi\pi N$ data using crossing properties and analytic continuation; this amplitude is then "squared" to yield the $N\bar N$ amplitude, which is related to $NN$ by crossing symmetry [52]. The Paris group [11, 12] pursued this path and calculated $NN$ phase shifts in peripheral partial waves. Naively, the dispersion-theoretic approach is the ideal one, since it is based exclusively on empirical information. Unfortunately, in practice, quite a few uncertainties enter into the approach. First, there are ambiguities in the analytic continuation and, second, the dispersion integrals have to be cut off at a certain momentum to ensure reasonable results. In Ref. [13], a thorough comparison was conducted between the predictions by the Bonn model and the Paris approach and it was demonstrated that the Bonn predictions always lie comfortably within the range of uncertainty of the dispersion-theoretic results.
Therefore, there is no need to perform a separate comparison of our chiral N3LO predictions with dispersion theory, since it would not add anything that we cannot conclude from Fig. 6.

Finally, we need to compare the predictions with the empirical phase shifts. In $F$ waves the N3LO predictions above 200 MeV are, in general, too attractive. Note, however, that this is also true for the predictions by the Bonn $\pi + 2\pi$ model. In the full Bonn model, besides $\pi + 2\pi$, (repulsive) $\omega$ and $\pi\rho$ exchanges are included which move the predictions right on top of the data. The exchange of an $\omega$ meson or combined $\pi\rho$ exchange are $3\pi$ exchanges. Three-pion exchange occurs first at chiral order four. It has been investigated by Kaiser [38] and found to be negligible at this order. However, $3\pi$ exchange at order five appears to be sizable [53] and may have impact on $F$ waves. Besides this, there is the usual short-range phenomenology. In ChPT, this short-range interaction is parametrized in terms of four-nucleon contact terms (since heavy mesons do not have a place in that theory). Contact terms of order four (N3LO) do not contribute to $F$-waves, but order six does. In summary, the remaining small discrepancies between the N3LO predictions and the empirical phase shifts may be straightened out in fifth or sixth order of ChPT.

5.3 NN Contact Potentials

In conventional meson theory, the short-range nuclear force is described by the exchange of heavy mesons, notably the $\omega(782)$. The qualitative short-distance behavior of the $NN$ potential is obtained by Fourier transform of the propagator of a heavy meson,

$$ \int d^3q\,\frac{e^{i\vec q\cdot\vec r}}{m_\omega^2 + \vec q^{\,2}} \sim \frac{e^{-m_\omega r}}{r} . \tag{71} $$

ChPT is an expansion in small momenta $Q$, too small to resolve structures like a $\rho(770)$ or $\omega(782)$ meson, because $Q \ll \Lambda_\chi \approx m_{\rho,\omega}$. But the latter relation allows us to expand the propagator of a heavy meson into a power series,

$$ \frac{1}{m_\omega^2 + Q^2} \approx \frac{1}{m_\omega^2}\left(1 - \frac{Q^2}{m_\omega^2} + \frac{Q^4}{m_\omega^4} - + \ldots\right) , \tag{72} $$

where the $\omega$ is representative for any heavy meson of interest.
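To get a feeling for how quickly the series of Eq. (72) converges at low momenta, one can compare the truncated expansion with the exact propagator. The Python sketch below does this for a momentum of the order of the pion mass; the function names and the chosen momentum of 138 MeV are illustrative assumptions, not values prescribed by the text.

```python
def heavy_meson_propagator(momentum: float, mass: float) -> float:
    """Exact static propagator 1 / (m^2 + Q^2)."""
    return 1.0 / (mass ** 2 + momentum ** 2)


def truncated_expansion(momentum: float, mass: float, order: int) -> float:
    """Geometric series of Eq. (72), keeping terms up to (Q/m)^(2*order)."""
    x = (momentum / mass) ** 2
    return sum((-x) ** n for n in range(order + 1)) / mass ** 2


M_OMEGA = 782.0  # MeV, the omega(782) named in the text
Q_SMALL = 138.0  # MeV, an illustrative low momentum (assumed)

exact = heavy_meson_propagator(Q_SMALL, M_OMEGA)
relative_errors = [
    abs(truncated_expansion(Q_SMALL, M_OMEGA, order) / exact - 1.0)
    for order in range(4)
]

for order, error in enumerate(relative_errors):
    print(f"terms up to (Q/m)^{2 * order}: relative error {error:.2e}")
```

Each additional term reduces the relative error by a factor $(Q/m_\omega)^2$, which is about 0.03 in this example. For $Q$ approaching $m_\omega$ the series converges slowly, and for $Q > m_\omega$ it diverges, consistent with the statement that the expansion is only useful for $Q \ll \Lambda_\chi$.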
The above expansion suggests that it should be possible to describe the short distance part of the nuclear force simply in terms of powers of $Q/m_\omega$, which fits in well with our over-all power scheme since $Q/m_\omega \approx Q/\Lambda_\chi$.

A second purpose of contact terms is renormalization. Dimensional regularization of the loop integrals of pion-exchanges (cf. Sec. 5.1) typically generates polynomial terms with coefficients that are, in part, infinite or scale dependent. Contact terms pick up infinities and remove scale dependence.

The partial-wave decomposition of a power $Q^\nu$ has an interesting property. First note that $Q$ can only be either the momentum transfer between the two interacting nucleons $q$ or the average momentum $k$ [cf. Eq. (47) for their definitions]. In any case, for even $\nu$,

$$ Q^\nu = f_{\nu/2}(\cos\theta) , \tag{73} $$

where $f_m$ stands for a polynomial of degree $m$ and $\theta$ is the CMS scattering angle. The partial-wave decomposition of $Q^\nu$ for a state of orbital-angular momentum $L$ involves the integral

$$ I^{(\nu)}_L = \int_{-1}^{+1} Q^\nu P_L(\cos\theta)\,d\cos\theta = \int_{-1}^{+1} f_{\nu/2}(\cos\theta)\,P_L(\cos\theta)\,d\cos\theta , \tag{74} $$

where $P_L$ is a Legendre polynomial. Due to the orthogonality of the $P_L$,

$$ I^{(\nu)}_L = 0 \quad {\rm for} \quad L > \frac{\nu}{2} . \tag{75} $$

Consequently, contact terms of order zero contribute only in $S$-waves, while order-two terms contribute up to $P$-waves, order-four terms up to $D$-waves, etc.

We will now present, one by one, the various orders of $NN$ contact terms together with their partial-wave decomposition [54]. Note that, due to parity, only even powers of $Q$ are allowed.
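The selection rule of Eq. (75) can be verified exactly with polynomial arithmetic. The Python sketch below takes $Q = q$ with $q^2 = p'^2 + p^2 - 2pp'\cos\theta$, builds $q^\nu$ as a polynomial in $z = \cos\theta$, and integrates it against $P_L(z)$ using rational numbers. All function names are hypothetical, and the momenta are arbitrary test values in arbitrary units.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def legendre_coeffs(L):
    """Ascending coefficients of P_L(z) from Bonnet's recurrence."""
    previous, current = [Fraction(1)], [Fraction(0), Fraction(1)]
    if L == 0:
        return previous
    for n in range(1, L):
        # (n + 1) P_{n+1}(z) = (2n + 1) z P_n(z) - n P_{n-1}(z)
        shifted = [Fraction(0)] + current
        following = [Fraction(2 * n + 1, n + 1) * c for c in shifted]
        for i, c in enumerate(previous):
            following[i] -= Fraction(n, n + 1) * c
        previous, current = current, following
    return current


def integrate_minus_one_to_one(poly):
    """Exact integral over [-1, 1]; odd powers of z integrate to zero."""
    return sum(c * Fraction(2, n + 1) for n, c in enumerate(poly) if n % 2 == 0)


def project_q_power(nu, L, p_out, p_in):
    """Integral of q^nu P_L(z) over z, with q^2 = p'^2 + p^2 - 2 p p' z."""
    if nu % 2:
        raise ValueError("only even powers are considered")
    p_out, p_in = Fraction(p_out), Fraction(p_in)
    q_squared = [p_out ** 2 + p_in ** 2, -2 * p_out * p_in]
    integrand = [Fraction(1)]
    for _ in range(nu // 2):
        integrand = poly_mul(integrand, q_squared)
    return integrate_minus_one_to_one(poly_mul(integrand, legendre_coeffs(L)))


for nu in (0, 2, 4):
    row = [project_q_power(nu, L, 2, 1) for L in range(4)]
    print(f"nu = {nu}: projections on L = 0..3 ->", [str(value) for value in row])
```

For every even $\nu$ the projections vanish identically once $L > \nu/2$, which is the statement that order-zero contacts feed only $S$-waves, order-two contacts reach up to $P$-waves, and order-four contacts up to $D$-waves. The same argument applies to powers of $k$, since $k^2$ is also linear in $\cos\theta$.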
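5.3.1 Zeroth Order

The contact potential at order zero reads:

$$ V^{(0)}(\vec p\,', \vec p) = C_S + C_T\,\vec\sigma_1\cdot\vec\sigma_2 \tag{76} $$

Partial wave decomposition yields:

$$ \begin{aligned}
V^{(0)}(^1S_0) &= \widetilde C_{^1S_0} = 4\pi\,(C_S - 3C_T) \\
V^{(0)}(^3S_1) &= \widetilde C_{^3S_1} = 4\pi\,(C_S + C_T)
\end{aligned} \tag{77} $$

5.3.2 Second Order

The contact potential contribution of order two is given by:

$$ \begin{aligned}
V^{(2)}(\vec p\,', \vec p) = {} & C_1\,q^2 + C_2\,k^2 + \left(C_3\,q^2 + C_4\,k^2\right)\vec\sigma_1\cdot\vec\sigma_2 + C_5\left(-i\vec S\cdot(\vec q\times\vec k)\right) \\
& + C_6\,(\vec\sigma_1\cdot\vec q)\,(\vec\sigma_2\cdot\vec q) + C_7\,(\vec\sigma_1\cdot\vec k)\,(\vec\sigma_2\cdot\vec k)
\end{aligned} \tag{78} $$

Second order partial wave contributions:

$$ \begin{aligned}
V^{(2)}(^1S_0) &= C_{^1S_0}\,(p^2 + p'^2) , & C_{^1S_0} &= 4\pi\left(C_1 + \tfrac{1}{4}C_2 - 3C_3 - \tfrac{3}{4}C_4 - C_6 - \tfrac{1}{4}C_7\right) \\
V^{(2)}(^3P_0) &= C_{^3P_0}\,pp' , & C_{^3P_0} &= 4\pi\left(-\tfrac{2}{3}C_1 + \tfrac{1}{6}C_2 - \tfrac{2}{3}C_3 + \tfrac{1}{6}C_4 - \tfrac{2}{3}C_5 + 2C_6 - \tfrac{1}{2}C_7\right) \\
V^{(2)}(^1P_1) &= C_{^1P_1}\,pp' , & C_{^1P_1} &= 4\pi\left(-\tfrac{2}{3}C_1 + \tfrac{1}{6}C_2 + 2C_3 - \tfrac{1}{2}C_4 + \tfrac{2}{3}C_6 - \tfrac{1}{6}C_7\right) \\
V^{(2)}(^3P_1) &= C_{^3P_1}\,pp' , & C_{^3P_1} &= 4\pi\left(-\tfrac{2}{3}C_1 + \tfrac{1}{6}C_2 - \tfrac{2}{3}C_3 + \tfrac{1}{6}C_4 - \tfrac{1}{3}C_5 - \tfrac{4}{3}C_6 + \tfrac{1}{3}C_7\right) \\
V^{(2)}(^3S_1) &= C_{^3S_1}\,(p^2 + p'^2) , & C_{^3S_1} &= 4\pi\left(C_1 + \tfrac{1}{4}C_2 + C_3 + \tfrac{1}{4}C_4 + \tfrac{1}{3}C_6 + \tfrac{1}{12}C_7\right) \\
V^{(2)}(^3S_1 - {}^3D_1) &= C_{^3S_1 - ^3D_1}\,p^2 , & C_{^3S_1 - ^3D_1} &= 4\pi\left(-\tfrac{2\sqrt{2}}{3}C_6 - \tfrac{\sqrt{2}}{6}C_7\right) \\
V^{(2)}(^3P_2) &= C_{^3P_2}\,pp' , & C_{^3P_2} &= 4\pi\left(-\tfrac{2}{3}C_1 + \tfrac{1}{6}C_2 - \tfrac{2}{3}C_3 + \tfrac{1}{6}C_4 + \tfrac{1}{3}C_5\right)
\end{aligned} \tag{79} $$

5.3.3 Fourth Order

The contact potential contribution of order four reads:

$$ \begin{aligned}
V^{(4)}(\vec p\,', \vec p) = {} & D_1\,q^4 + D_2\,k^4 + D_3\,q^2k^2 + D_4\,(\vec q\times\vec k)^2 \\
& + \left(D_5\,q^4 + D_6\,k^4 + D_7\,q^2k^2 + D_8\,(\vec q\times\vec k)^2\right)\vec\sigma_1\cdot\vec\sigma_2 \\
& + \left(D_9\,q^2 + D_{10}\,k^2\right)\left(-i\vec S\cdot(\vec q\times\vec k)\right) \\
& + \left(D_{11}\,q^2 + D_{12}\,k^2\right)(\vec\sigma_1\cdot\vec q)\,(\vec\sigma_2\cdot\vec q) \\
& + \left(D_{13}\,q^2 + D_{14}\,k^2\right)(\vec\sigma_1\cdot\vec k)\,(\vec\sigma_2\cdot\vec k) \\
& + D_{15}\,\vec\sigma_1\cdot(\vec q\times\vec k)\;\vec\sigma_2\cdot(\vec q\times\vec k)
\end{aligned} \tag{80} $$

The rather lengthy partial-wave expressions of this order have been relegated to Appendix B.

5.4 Constructing a Chiral NN Potential

5.4.1 Conceptual Questions

The two-nucleon system is non-perturbative as evidenced by the presence of a shallow bound state (the deuteron) and large scattering lengths. Weinberg [18] showed that the strong enhancement of the scattering amplitude arises from purely nucleonic intermediate states. He therefore suggested to use perturbation theory to calculate the $NN$ potential and to apply this potential in a scattering equation to obtain the $NN$ amplitude. We adopt this prescription.

Since the irreducible diagrams that make up the potential are calculated using covariant perturbation theory (cf. Sec. 5.1), it is consistent to start from the covariant Bethe-Salpeter (BS) equation [55] describing two-nucleon scattering.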
In operator notation, the BS equation reads

$$ T = \mathcal{V} + \mathcal{V}\,\mathcal{G}\,T \tag{81} $$

with $T$ the invariant amplitude for the two-nucleon scattering process, $\mathcal{V}$ the sum of all connected two-particle irreducible diagrams, and $\mathcal{G}$ the relativistic two-nucleon propagator. The BS equation is equivalent to a set of two equations

$$ T = V + V\,g\,T \tag{82} $$

$$ V = \mathcal{V} + \mathcal{V}\,(\mathcal{G} - g)\,V \tag{83} $$

$$ \phantom{V} = \mathcal{V} + \mathcal{V}_{1\pi}\,(\mathcal{G} - g)\,\mathcal{V}_{1\pi} + \ldots , \tag{84} $$

where $g$ is a covariant three-dimensional propagator which preserves relativistic elastic unitarity. We choose the propagator $g$ proposed by Blankenbecler and Sugar (BbS) [43] (for more details on relativistic three-dimensional reductions of the BS equation, see Ref. [7]). The ellipsis in Eq. (84) stands for terms of irreducible $3\pi$ and higher pion exchanges which we neglect.

Note that when we speak of covariance in conjunction with (heavy baryon) ChPT, we are not referring to manifest covariance. Relativity and relativistic off-shell effects are accounted for in terms of a $Q/M_N$ expansion up to the given order. Thus, Eq. (84) is evaluated in the following way,

$$ V \approx \mathcal{V}(\text{on-shell}) + \mathcal{V}_{1\pi}\,\mathcal{G}\,\mathcal{V}_{1\pi} - V_{1\pi}\,g\,V_{1\pi} , \tag{85} $$

where the pion-exchange content of $\mathcal{V}(\text{on-shell})$ is $V_{1\pi} + V'_{2\pi}$ with $V_{1\pi}$ the on-shell 1PE given in Eq. (48) and $V'_{2\pi}$ the irreducible $2\pi$ exchanges calculated in Sec. 5.1, but without the box. $\mathcal{V}_{1\pi}$ denotes the relativistic (off-shell) 1PE. Notice that the term $(\mathcal{V}_{1\pi}\,\mathcal{G}\,\mathcal{V}_{1\pi} - V_{1\pi}\,g\,V_{1\pi})$ represents what has been called "the (irreducible part of the) box diagram contribution" in Sec. 5.1, where it was evaluated at various orders.

The full chiral $NN$ potential $V$ is given by irreducible pion exchanges $V_\pi$ and contact terms $V_{\rm ct}$,

$$ V = V_\pi + V_{\rm ct} \tag{86} $$

with

$$ V_\pi = V_{1\pi} + V_{2\pi} + \ldots , \tag{87} $$

where the ellipsis denotes irreducible $3\pi$ and higher pion exchanges which are omitted. Two-pion exchange contributions appear in various orders

$$ V_{2\pi} = V^{(2)}_{2\pi} + V^{(3)}_{2\pi} + V^{(4)}_{2\pi} + \ldots \tag{88} $$

as calculated in Sec. 5.1. Contact terms come in even orders,

$$ V_{\rm ct} = V^{(0)}_{\rm ct} + V^{(2)}_{\rm ct} + V^{(4)}_{\rm ct} + \ldots \tag{89} $$

and were presented in Sec. 5.3.
The potential $V$ is calculated at a given order. For example, the potential at NNLO includes 2PE up to $V^{(3)}_{2\pi}$ and contacts up to $V^{(2)}_{ct}$. At N3LO, contributions up to $V^{(4)}_{2\pi}$ and $V^{(4)}_{ct}$ are included.

The potential $V$ satisfies the relativistic BbS equation, Eq. (82). Defining

$$\hat V(\vec p\,', \vec p) \equiv \frac{1}{(2\pi)^3} \sqrt{\frac{M_N}{E_{p'}}}\; V(\vec p\,', \vec p)\; \sqrt{\frac{M_N}{E_p}} \tag{90}$$

$$\hat T(\vec p\,', \vec p) \equiv \frac{1}{(2\pi)^3} \sqrt{\frac{M_N}{E_{p'}}}\; T(\vec p\,', \vec p)\; \sqrt{\frac{M_N}{E_p}} \tag{91}$$

with $E_p \equiv \sqrt{M_N^2 + \vec p^{\,2}}$ (the factor $1/(2\pi)^3$ is added for convenience), the BbS equation collapses into the usual, nonrelativistic Lippmann-Schwinger (LS) equation,

$$\hat T(\vec p\,', \vec p) = \hat V(\vec p\,', \vec p) + \int d^3p''\; \hat V(\vec p\,', \vec p\,'')\, \frac{M_N}{p^2 - p''^2 + i\epsilon}\, \hat T(\vec p\,'', \vec p) . \tag{92}$$

Since $\hat V$ satisfies Eq. (92), it can be used like a usual nonrelativistic potential, and $\hat T$ is the conventional nonrelativistic T-matrix.

Iteration of $\hat V$ in the LS equation requires cutting $\hat V$ off for high momenta to avoid infinities. This is consistent with the fact that ChPT is a low-momentum expansion which is valid only for momenta $Q \ll \Lambda_\chi \approx 1$ GeV. Thus, we multiply $\hat V$ with a regulator function

$$\hat V(\vec p\,', \vec p) \longmapsto \hat V(\vec p\,', \vec p)\; e^{-(p'/\Lambda)^{2n}}\; e^{-(p/\Lambda)^{2n}} \tag{93}$$

$$\approx \hat V(\vec p\,', \vec p) \left\{ 1 - \left[ \left(\frac{p'}{\Lambda}\right)^{2n} + \left(\frac{p}{\Lambda}\right)^{2n} \right] + \ldots \right\} \tag{94}$$

with the 'cutoff parameter' $\Lambda$ around 0.5 GeV. Equation (94) indicates that the exponential cutoff does not necessarily affect the given order at which the calculation is conducted. For sufficiently large $n$, the regulator introduces contributions that are beyond the given order. Assuming a good rate of convergence of the chiral expansion, such orders are small compared to the given order and, thus, do not affect the accuracy at the given order. In our calculations we use, of course, the full exponential, Eq. (93), and not the expansion. On a similar note, we also do not expand the square-root factors in Eqs. (90-91) because they are kinematical factors which guarantee relativistic elastic unitarity.

5.4.2 What Order?
Since nuclear EFT is a perturbative expansion, we must at some point ask to what order of ChPT we have to go to obtain the precision we need. To discuss this issue on firm grounds, we show in Table 3 the χ²/datum for the fit of the world np data below 290 MeV for a family of np potentials at NLO and NNLO. The NLO potentials produce a very large χ²/datum between 67 and 105, and the NNLO potentials lie between 12 and 27. The rate of improvement from one order to the next is very encouraging, but the quality of the reproduction of the np data at NLO and NNLO is obviously insufficient for reliable predictions.

Table 3: χ²/datum for the reproduction of the 1999 np database [56] by families of np potentials at NLO and NNLO constructed by the Juelich group [57].

Tlab bin (MeV)   # of np data   NLO        NNLO
0–100            1058           4–5        1.4–1.9
100–190          501            77–121     12–32
190–290          843            140–220    25–69
0–290            2402           67–105     12–27

Based upon these facts, Entem and Machleidt pointed out in 2002 [25, 26] that one has to proceed to N3LO. Consequently, the first N3LO potential was published in 2003 [27].

At N3LO, there are 24 contact terms (24 parameters) which contribute to the partial waves with L ≤ 2 (cf. Sec. 5.3). In Table 4, column 'Q^4/N3LO', we show how these terms/parameters are distributed over the various partial waves. For comparison, we also show the number of parameters used in the Nijmegen partial wave analysis (PWA93) [49] and in the high-precision CD-Bonn potential [9]. The table reveals that, for S and P waves, the numbers of parameters used in high-precision phenomenology and in EFT at N3LO are about the same. Thus, the EFT approach provides retroactively a justification for what the phenomenologists of the 1990s were doing. At NLO and NNLO, the number of parameters is substantially smaller than for PWA93 and CD-Bonn, which explains why these orders are insufficient for a quantitative potential.
This fact is also clearly reflected in Fig. 7 where phase shifts are shown for potentials constructed at NLO, NNLO, and N3LO.

5.4.3 Charge-Dependence

For an accurate fit of the low-energy pp and np data, charge-dependence is important. We include charge-dependence up to next-to-leading order of the isospin-violation scheme (NLØ, in the notation of Ref. [58]). Thus, we include the pion mass difference in 1PE and the Coulomb potential in pp scattering, which takes care of the LØ contributions. At order NLØ, we have the pion mass difference in 2PE at NLO, πγ exchange [59], and two charge-dependent contact interactions of order $Q^0$ which make possible an accurate fit of the three different $^1S_0$ scattering lengths, $a_{pp}$, $a_{nn}$, and $a_{np}$.

Table 4: Number of parameters needed for fitting the np data in phase-shift analysis and by a high-precision NN potential versus the number of NN contact terms of EFT based potentials at different orders.

Partial wave   Nijmegen partial-wave   CD-Bonn high-precision   Contact potentials
               analysis [49]           potential [9]            Q^0 (LO)   Q^2 (NLO/NNLO)   Q^4 (N3LO)
1S0            3                       4                        1          2                4
3S1            3                       4                        1          2                4
3S1-3D1        2                       2                        0          1                3
1P1            3                       3                        0          1                2
3P0            3                       2                        0          1                2
3P1            2                       2                        0          1                2
3P2            3                       3                        0          1                2
3P2-3F2        2                       1                        0          0                1
1D2            2                       3                        0          0                1
3D1            2                       1                        0          0                1
3D2            2                       2                        0          0                1
3D3            1                       2                        0          0                1
3D3-3G3        1                       0                        0          0                0
1F3            1                       1                        0          0                0
3F2            1                       2                        0          0                0
3F3            1                       2                        0          0                0
3F4            2                       1                        0          0                0
3F4-3H4        0                       0                        0          0                0
1G4            1                       0                        0          0                0
3G3            0                       1                        0          0                0
3G4            0                       1                        0          0                0
3G5            0                       1                        0          0                0
Total          35                      38                       2          9                24

Figure 7: Phase parameters for np scattering as calculated from NN potentials at different orders of ChPT, plotted against lab. energy (0–300 MeV). The dotted line is NLO [57], the dashed NNLO [57], and the solid N3LO [27]. Partial waves with total angular momentum J ≤ 2 are displayed. Solid dots represent the Nijmegen multienergy np phase shift analysis [49] and open circles are the GWU/VPI single-energy np analysis SM99 [50].

5.4.4 A Quantitative NN Potential at N3LO

NN Scattering. The fitting procedure starts with the peripheral partial waves because they depend on fewer parameters. Partial waves with L ≥ 3 are exclusively determined by 1PE and 2PE because the N3LO contacts contribute to L ≤ 2 only. 1PE and 2PE at N3LO depend on the axial-vector coupling constant, $g_A$ (we use $g_A = 1.29$), the pion decay constant, $f_\pi = 92.4$ MeV, and eight low-energy constants (LECs) that appear in the dimension-two and dimension-three πN Lagrangians, Eqs. (37) and (38). In the fitting process, we varied three of them, namely, $c_2$, $c_3$, and $c_4$. We found that the other LECs are not very effective in the NN system and, therefore, we kept them at the values determined from πN (cf. Table 2). The most influential constant is $c_3$, which has to be chosen on the low side (slightly more than one standard deviation below its πN determination) for an optimal fit of the NN data. As compared to a calculation that strictly uses the πN values for $c_2$ and $c_4$, our choices for these two LECs lower the $^3F_2$ and $^1F_3$ phase shifts, bringing them into closer agreement with the phase shift analysis. The other F waves and the higher partial waves are essentially unaffected by our variations of $c_2$ and $c_4$. Overall, the fit of all J ≥ 3 waves is very good.

We turn now to the lower partial waves. Here, the most important fit parameters are the ones associated with the 24 contact terms that contribute to the partial waves with L ≤ 2. In addition, we have two charge-dependent contacts which are used to fit the three different $^1S_0$ scattering lengths, $a_{pp}$, $a_{nn}$, and $a_{np}$.

In the optimization procedure, we first fit phase shifts, and then we refine the fit by minimizing the χ² obtained from a direct comparison with the data.

Table 5: χ²/datum for the reproduction of the 1999 np database [56] by various np potentials. (Numbers in parentheses are the values of cutoff parameters in units of MeV used in the regulators of the chiral potentials.)

Tlab bin (MeV)   # of np data   Idaho N3LO [27]   Juelich N3LO [60]     Argonne V18 [61]
                                (500–600)         (600/700–450/500)
0–100            1058           1.0–1.1           1.0–1.1               0.95
100–190          501            1.1–1.2           1.3–1.8               1.10
190–290          843            1.2–1.4           2.8–20.0              1.11
0–290            2402           1.1–1.3           1.7–7.9               1.04

Table 6: χ²/datum for the reproduction of the 1999 pp database [56] by various pp potentials. Notation as in Table 5.

Tlab bin (MeV)   # of pp data   Idaho N3LO [27]   Juelich N3LO [60]     Argonne V18 [61]
                                (500–600)         (600/700–450/500)
0–100            795            1.0–1.7           1.0–3.8               1.0
100–190          411            1.5–1.9           3.5–11.6              1.3
190–290          851            1.9–2.7           4.3–44.4              1.8
0–290            2057           1.5–2.1           2.9–22.3              1.4

Figure 8: Neutron-proton phase parameters as described by two potentials at N3LO, plotted against lab. energy (0–300 MeV). The solid curve is calculated from the Idaho N3LO potential [27] while the dashed curve is from the Juelich [60] one. Solid dots and open circles as in Fig. 7.

The χ²/datum for the fit of the np data below 290 MeV is shown in Table 5, and the corresponding one for pp is given in Table 6. These tables show that at N3LO a χ²/datum comparable to the high-precision Argonne V18 [61] potential can, indeed, be achieved.
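As a consistency check on how the bins in Tables 5 and 6 combine, the following sketch recomputes the 0–290 MeV entry for the Argonne V18 column from the three energy bins. It assumes only that the total χ² is additive over bins, so that the combined χ²/datum is the datum-weighted mean of the per-bin values; the function name is hypothetical, and because the tabulated inputs are rounded, agreement is expected only to the quoted precision.

```python
# Datum-weighted combination of per-bin chi^2/datum values.
# Inputs are (number of data, chi^2/datum) pairs copied from Tables 5 and 6
# (Argonne V18 column); the function name is illustrative.

def combined_chi2_per_datum(bins):
    """Combine per-bin chi^2/datum into the chi^2/datum of the full range."""
    total_chi2 = sum(n_data * chi2 for n_data, chi2 in bins)
    total_data = sum(n_data for n_data, _ in bins)
    return total_chi2 / total_data


av18_np = [(1058, 0.95), (501, 1.10), (843, 1.11)]  # Table 5
av18_pp = [(795, 1.0), (411, 1.3), (851, 1.8)]      # Table 6

if __name__ == "__main__":
    print(f"np, 0-290 MeV: {combined_chi2_per_datum(av18_np):.2f}")
    print(f"pp, 0-290 MeV: {combined_chi2_per_datum(av18_pp):.1f}")
```

The results round to 1.04 for np and 1.4 for pp, matching the 0–290 MeV rows of the two tables. The same check cannot be applied to the chiral-potential columns, since those quote ranges over cutoffs rather than single values.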
The "Idaho" N3LO potential [27] produces a χ²/datum = 1.1 for the world np data below 290 MeV, which compares well with the χ²/datum = 1.04 of the Argonne potential. In 2005, the Juelich group also produced several N3LO NN potentials [60], the best of which fits the np data with a χ²/datum = 1.7 and the worst with a χ²/datum = 7.9 (see Table 5). While 7.9 is clearly unacceptable for any meaningful application, a χ²/datum of 1.7 is reasonable, although it does not meet the precision standard that few-nucleon physicists established in the 1990s.

Turning to pp, the χ² for pp data are typically larger than for np because of the higher precision of pp data. Thus, the Argonne V18 produces a χ²/datum = 1.4 for the world pp data below 290 MeV and the best Idaho N3LO pp potential obtains 1.5. The fit by the best Juelich N3LO pp potential results in a χ²/datum = 2.9 which, again, is not quite consistent with the precision standards established in the 1990s. The worst Juelich N3LO pp potential produces a χ²/datum of 22.3 and is incompatible with reliable predictions.

Table 7: Deuteron properties as predicted by various NN potentials are compared to empirical information. (Deuteron binding energy $B_d$, asymptotic S state $A_S$, asymptotic D/S state η, deuteron radius $r_d$, quadrupole moment $Q$, D-state probability $P_D$; the calculated $r_d$ and $Q$ are without meson-exchange current contributions and relativistic corrections.)

                  Idaho N3LO [27]   Juelich N3LO [60]   CD-Bonn [9]   AV18 [61]   Empirical(a)
                  (500)             (550/600)
B_d (MeV)         2.224575          2.218279            2.224575      2.224575    2.224575(9)
A_S (fm^-1/2)     0.8843            0.8820              0.8846        0.8850      0.8846(9)
η                 0.0256            0.0254              0.0256        0.0250      0.0256(4)
r_d (fm)          1.975             1.977               1.966         1.967       1.97535(85)
Q (fm^2)          0.275             0.266               0.270         0.270       0.2859(3)
P_D (%)           4.51              3.28                4.85          5.76

(a) See Table XVIII of Ref. [9] for references; the empirical value for $r_d$ is from Ref. [62].

Phase shifts of np scattering from the best Idaho (solid line) and Juelich (dashed line) N3LO np potentials are shown in Figure 8.
The phase shifts confirm what the corresponding χ² have already revealed.

The Deuteron. The reproduction of the deuteron parameters is shown in Table 7. We present results for two N3LO potentials, namely, Idaho [27] and Juelich [60]. Remarkable are the predictions by the chiral potentials for the deuteron radius, which are in good agreement with the latest empirical value obtained by the isotope-shift method [62]. All NN potentials of the past (Table 7 includes two representative examples, namely, CD-Bonn [9] and AV18 [61]) fail to reproduce this very precise new value for the deuteron radius.

In Fig. 9, we display the deuteron wave functions derived from the N3LO potentials and compare them with wave functions based upon conventional NN potentials from the recent past. Characteristic differences are noticeable; in particular, the chiral wave functions are shifted towards larger r, which explains the larger deuteron radius.

Figure 9: Deuteron wave functions as a function of r (0–6 fm): the family of larger curves are S-waves, the smaller ones D-waves. The thick lines represent the wave functions derived from chiral NN potentials at order N3LO (thick solid: Idaho [27], thick dashed: Juelich [60]). The thin dashed, dash-dotted, and dotted lines refer to the wave functions of the CD-Bonn [9], Nijm-I [8], and AV18 [61] potentials, respectively.

6 Many-Nucleon Forces

As noted before, an important advantage of the EFT approach to nuclear forces is that it creates two- and many-nucleon forces on an equal footing.

6.1 Three-Nucleon Forces

The first non-vanishing 3NF terms occur at NNLO and are shown in Fig. 10 (cf. also Fig. 1, row 'Q^3/NNLO', column '3N Force'). There are three diagrams: the 2PE, 1PE, and 3N-contact interactions [39, 40].
The 2PE 3N-potential is given by

$$V^{3NF}_{2PE} = \left(\frac{g_A}{2f_\pi}\right)^2 \frac12 \sum_{i \neq j \neq k} \frac{(\vec\sigma_i \cdot \vec q_i)(\vec\sigma_j \cdot \vec q_j)}{(q_i^2 + m_\pi^2)(q_j^2 + m_\pi^2)}\; F^{\alpha\beta}_{ijk}\; \tau_i^\alpha \tau_j^\beta \tag{95}$$

with $\vec q_i \equiv \vec p_i\,' - \vec p_i$, where $\vec p_i$ and $\vec p_i\,'$ are the initial and final momenta of nucleon $i$, respectively, and

$$F^{\alpha\beta}_{ijk} = \delta^{\alpha\beta} \left[ -\frac{4c_1 m_\pi^2}{f_\pi^2} + \frac{2c_3}{f_\pi^2}\; \vec q_i \cdot \vec q_j \right] + \frac{c_4}{f_\pi^2} \sum_\gamma \epsilon^{\alpha\beta\gamma}\; \tau_k^\gamma\; \vec\sigma_k \cdot [\vec q_i \times \vec q_j] . \tag{96}$$

The vertex involved in this 3NF term is the two-derivative ππNN vertex (solid square in Fig. 10) which we encountered already in the 2PE contribution to the NN potential at NNLO. Thus, there are no new parameters and the contribution is fixed by the LECs used in NN.

Figure 10: The three-nucleon force at NNLO (from Ref. [40]).

The 1PE contribution is

$$V^{3NF}_{1PE} = D\, \frac{g_A}{8 f_\pi^2} \sum_{i \neq j \neq k} \frac{\vec\sigma_j \cdot \vec q_j}{q_j^2 + m_\pi^2}\; (\boldsymbol\tau_i \cdot \boldsymbol\tau_j)(\vec\sigma_i \cdot \vec q_j) \tag{97}$$

and, finally, the 3N contact term reads

$$V^{3NF}_{ct} = E\, \frac12 \sum_{j \neq k} \boldsymbol\tau_j \cdot \boldsymbol\tau_k . \tag{98}$$

The last two 3NF terms involve two new vertices (that do not appear in the 2N problem), namely, the πNNNN vertex with parameter $D$ and a 6N vertex with parameter $E$. To pin them down, one needs two observables that involve at least three nucleons. In Ref. [40], the triton binding energy and the nd doublet scattering length $^2a_{nd}$ were used. Alternatively, one may also choose the binding energies of $^3$H and $^4$He [63]. Once $D$ and $E$ are fixed, the results for other 3N, 4N, ... observables are predictions. In Refs. [64, 63], the first calculations of the structure of light nuclei ($^6$Li and $^7$Li) were reported. Recently, the structure of nuclei with A = 10–13 nucleons has been calculated using the ab initio no-core shell model and applying chiral two- and three-nucleon forces [65]. The results are very encouraging. Concerning the famous '$A_y$ puzzle', the above 3NF terms yield some improvement of the predicted nd $A_y$; however, the problem is not solved [40].

Note that the 3NF expressions given in Eqs. (95)-(98) above are the ones that occur at NNLO, and all calculations to date have included only those.
Since we have to proceed to N3LO for sufficient accuracy of the 2NF, consistency requires that we also consider the 3NF at N3LO. The 3NF at N3LO is very involved, as can be seen from Fig. 11, but it does not depend on any new parameters. It is presently under construction [66]. So, for the moment, we can only hope that the $A_y$ puzzle may be solved by a complete calculation at N3LO.

Figure 11: Three-nucleon force contributions at N3LO (from Ref. [66]).

6.2 Four-Nucleon Forces

In ChPT, four-nucleon forces (4NF) appear for the first time at N3LO (ν = 4). Thus, N3LO is the leading order for 4NF. Assuming a good rate of convergence, a contribution of order $(Q/\Lambda_\chi)^4$ is expected to be rather small. Thus, ChPT predicts 4NF to be essentially insignificant, consistent with experience. Still, nothing is fully proven in physics unless we have performed explicit calculations. Very recently, the first such calculation has been performed: the chiral 4NF, Fig. 12, has been applied in a calculation of the $^4$He binding energy and found to contribute a few hundred keV [68]. It should be noted that this preliminary calculation involves many approximations, but it certainly provides the right order of magnitude of the result, which is indeed very small compared to the full $^4$He binding energy of 28.3 MeV.

Figure 12: The four-nucleon force at N3LO, panels (a)-(p) (from Ref. [67]).

7 Conclusions

The theory of nuclear forces has made great progress since the turn of the millennium. Nucleon-nucleon potentials have been developed that are based on proper theory (EFT for low-energy QCD) and are, at the same time, of high precision. Moreover, the theory generates two- and many-body forces on an equal footing and provides a theoretical explanation for the empirically known fact that 2NF ≫ 3NF ≫ 4NF ... .

At N3LO [26, 27], the accuracy can be achieved that is necessary and sufficient for reliable microscopic nuclear structure predictions.
First calculations applying the N3LO NN potential [27] in the conventional shell model [69, 70], the ab initio no-core shell model [71, 72, 73], the coupled cluster formalism [74, 75, 76, 77, 78], and the unitary-model-operator approach [79] have produced promising results.

The 3NF at NNLO is known [39, 40] and has been applied in few-nucleon reactions [40, 80, 81] as well as the structure of light nuclei [64, 63, 65]. However, the famous '$A_y$ puzzle' of nucleon-deuteron scattering is not resolved by the 3NF at NNLO. Thus, one important outstanding issue is the 3NF at N3LO, which is under construction [66].

Another open question that needs to be settled is whether Weinberg power counting, which is applied in all current NN potentials, is consistent. This controversial issue is presently being debated in the literature [82, 83].

Acknowledgements

It is a pleasure to thank the organizers of this workshop, particularly Ananda Santra, for their warm hospitality. I gratefully acknowledge numerous discussions with my collaborator D. R. Entem. This work was supported in part by the U.S. National Science Foundation under Grant No. PHY-0099444.

A Fourth Order Two-Pion Exchange Contributions

The fourth order 2PE contributions consist of two classes: the one-loop (Fig. 3) and the two-loop diagrams (Fig. 4).

A.1 One-loop diagrams

This large pool of diagrams can be analyzed in a systematic way by introducing the following well-defined subdivisions.

A.1.1 $c_i^2$ contributions.

The only contribution of this kind comes from the football diagram with both vertices proportional to $c_i$ (first row of Fig. 3). One obtains [41]:

$$V_C = \frac{3L(q)}{16\pi^2 f_\pi^4} \left[ \left( \frac{c_2}{6} w^2 + c_3 \tilde w^2 - 4c_1 m_\pi^2 \right)^2 + \frac{c_2^2}{45} w^4 \right] , \tag{99}$$

$$W_T = -\frac{1}{q^2} W_S = \frac{c_4^2\, w^2 L(q)}{96\pi^2 f_\pi^4} . \tag{100}$$

A.1.2 $c_i/M_N$ contributions.

This class consists of diagrams with one vertex proportional to $c_i$ and one $1/M_N$ correction. A few graphs that are representative for this class are shown in the second row of Fig. 3.
Symbols with a large solid dot and an open circle denote 1/MN corrections of vertices proportional to ci. They are part of L̂(3)πN , Eq. (39). The result for this group of diagrams is [41]: VC = − g2A L(q) 32π2MNf4π (c2 − 6c3)q4 + 4(6c1 + c2 − 3c3)q2m2π +6(c2 − 2c3)m4π + 24(2c1 + c3)m , (101) WC = − 2L(q) 192π2MNf4π g2A(8m π + 5q 2) + w2 , (102) WT = − c4L(q) 192π2MNf4π g2A(16m π + 7q 2)− w2 , (103) VLS = 8π2MNf4π w2L(q) , (104) WLS = − c4L(q) 48π2MNf4π g2A(8m π + 5q 2) + w2 . (105) A.1.3 1/M2N corrections. These are relativistic 1/M2N corrections of the leading order 2π exchange diagrams. Typical examples for this large class are shown in row 3–6 of Fig. 3. This time, there is no correction from the iterated 1PE, Eq. (65) or Eq. (66), since the expansion of the factor M2N/Ep does not create a term proportional to 1/M2N . The total result for this class is [42], VC = − 32π2M2Nf 2m8πw −4 + 8m6πw −2 − q4 − 2m4π (106) WC = − 768π2M2Nf q4 + 3m2πq 2 + 3m4π − 6m −k2(8m2π + 5q + 4g4A k2(20m2π + 7q 2 − 16m4πw −2) + 16m8πw +12m6πw −2 − 4m4πq 2w−2 − 5q4 − 6m2πq 2 − 6m4π − 4k2w2 16g4Am , (107) VT = − g4A L(q) 32π2M2Nf q2 +m4πw , (108) WT = − 1536π2M2Nf 7m2π + q2 + 4m4πw − 32g2A m2π + , (109) VLS = g4A L(q) 4π2M2Nf q2 +m4πw , (110) WLS = 256π2M2Nf 16g2A m2π + 4m4πw q2 − 9m2π , (111) VσL = g4A L(q) 32π2M2Nf . (112) A.2 Two-loop contributions. The two-loop contributions are quite intricate. In Fig. 4, we attempt a graphical representation of this class. The gray disk stands for all one-loop πN graphs which are shown in some detail in the lower part of the figure. Not all of the numerous graphs are displayed. Some of the missing ones are obtained by permutation of the vertices along the nucleon line, others by inverting initial and final states. Vertices denoted by a small dot are from the leading order πN Lagrangian L̂(1)πN , Eq. (33), except for the four- pion vertices which are from L(2)ππ , Eq. (27). 
The solid square represents vertices proportional to the LECs di which are introduced by the third order Lagrangian L(3)πN , Eq. (38). The di vertices occur actually in one- loop NN diagrams, but we list them among the two-loop NN contributions because they are needed to absorb divergences generated by one-loop πN graphs. Using techniques from dispersion theory, Kaiser [41] calculated the imaginary parts of the NN amplitudes, Im Vα(iµ) and Im Wα(iµ), which result from analytic continuation to time-like momentum transfer q = iµ−0+ with µ ≥ 2mπ. From this, the momentum-space amplitudes Vα(q) and Wα(q) are obtained via the subtracted dispersion relations: VC,S(q) = − ImVC,S(iµ) µ5(µ2 + q2) , (113) VT (q) = ImVT (iµ) µ3(µ2 + q2) , (114) and similarly for WC,S,T . In most cases, the dispersion integrals can be solved analytically and the following expressions are obtained [26]: VC(q) = 3g4Aw̃ 2A(q) 1024π2f6π (m2π + 2q 2mπ + w̃ 2A(q) + 4g2Amπw̃ (115) WC(q) = W C (q) +W C (q) , (116) C (q) = 18432π4f6π 192π2f2πw 2g2Aw̃ (g2A − 1)w 6g2Aw̃ 2 − (g2A − 1)w 384π2f2π w̃2(d̄1 + d̄2) + 4m +L(q) 4m2π(1 + 2g A) + q 2(1 + 5g2A) (5 + 13g2A) + 8m π(1 + 2g (117) C (q) = − ImW (b)C (iµ) µ5(µ2 + q2) , (118) where ImW (b)C (iµ) = − 3µ(8πf2π)3 g2A(2m π − µ 2) + 2(g2A − 1)κ − 3κ2x2 + 6κx m2π + κ2x2 ln m2π + κ2x2 µ2 − 2κ2x2 − 2m2π m2π + κ2x2 ; (119) VT (q) = V T (q) + V T (q) VS(q) = − S (q) + V S (q) , (120) T (q) = − S (q) = − 2L(q) 32π2f4π (d̄14 − d̄15) (121) T (q) = − S (q) = ImV (b)T (iµ) µ3(µ2 + q2) , (122) where ImV (b)T (iµ) = − 2g6Aκ µ(8πf2π)3 dx(1− x2) m2π + κ2x2  ; (123) WT (q) = − WS(q) = 2A(q) 2048π2f6π w2A(q) + 2mπ(1 + 2g , (124) where κ ≡ µ2/4−m2π. Note that the analytic solutions hold modulo polynomials. We have checked the importance of those contributions where we could not find an analytic solution and where, therefore, the integrations have to be performed numerically. 
It turns out that the combined effect on NN phase shifts from C , V T , and V S is smaller than 0.1 deg in F and G waves and smaller than 0.01 deg in H waves, at Tlab = 300 MeV (and less at lower energies). This renders these contributions negligible. Therefore, we omit W (b)C , V and V (b)S in the construction of chiral NN potentials at order N In Eqs. (117) and (121), we use the scale-independent LECs, d̄i, which are obtained by combining the scale-dependent ones, dri (λ), with the chiral logarithm, ln(mπ/λ), or equivalently d̄i = dri (mπ). The scale-dependent LECs, dri (λ), are a consequence of renormalization. For more details about this issue, see Ref. [37]. B Partial Wave Decomposition of the Fourth Or- der Contact Potential The contact potential contribution of order four, Eq. (80), decomposes into partial-waves as follows. V (4)(1S0) = D̂1S0(p ′4 + p4) +D1S0p V (4)(3P0) = D3P0(p ′3p+ p′p3) V (4)(1P1) = D1P1(p ′3p+ p′p3) V (4)(3P1) = D3P1(p ′3p+ p′p3) V (4)(3S1) = D̂3S1(p ′4 + p4) +D3S1p V (4)(3D1) = D3D1p V (4)(3S1 −3 D1) = D̂3S1−3D1p 4 +D3S1−3D1p V (4)(1D2) = D1D2p V (4)(3D2) = D3D2p V (4)(3P2) = D3P2(p ′3p+ p′p3) V (4)(3P2 −3 F2) = D3P2−3F2p V (4)(3D3) = D3D3p ′2p2 (125) The coefficients in the above expressions are given by: D̂1S0 = D1 + D3 − 3D5 − D7 −D11 − D12 − D1S0 = D4 − 10D5 − D7 − 2D8 − D12 − D13 − D14 − D3P0 = − D10 + D11 + D12 − D1P1 = − D2 + 4D5 − D11 − D3P1 = − D10 − 2D11 − D12 + D̂3S1 = D1 + D3 +D5 + D11 + D12 + D3S1 = D12 + D13 + D14 + D3D1 = D10 − D11 + D12 + D13 − D14 − D̂3S1−3D1 = − D11 − D12 − D13 − D3S1−3D1 = − D11 + D12 + D13 − D14 + D1D2 = D12 + D13 − D14 + D3D2 = D10 + D11 − D12 − D13 + D14 + D3P2 = − D10 − D11 + D13 + D3P2−3F2 = D11 − D12 + D13 − D3D3 = D10 − D15 (126) References [1] H. Yukawa, Proc. Phys. Math. Soc. Japan 17, 48 (1935). [2] Prog. Theor. Phys. (Kyoto), Supplement 3 (1956). [3] M. Taketani, S. Machida, and S. Onuma, Prog. Theor. Phys. (Kyoto) 7, 45 (1952). [4] K. A. Brueckner and K. M. Watson, Phys. Rev. 
90, 699; 92, 1023 (1953).
[5] A. R. Erwin et al., Phys. Rev. Lett. 6, 628 (1961); B. C. Maglić et al., ibid. 7, 178 (1961).
[6] Prog. Theor. Phys. (Kyoto), Supplement 39 (1967); R. A. Bryan and B. L. Scott, Phys. Rev. 177, 1435 (1969); M. M. Nagels et al., Phys. Rev. D 17, 768 (1978).
[7] R. Machleidt, Adv. Nucl. Phys. 19, 189 (1989).
[8] V. G. J. Stoks et al., Phys. Rev. C 49, 2950 (1994).
[9] R. Machleidt, Phys. Rev. C 63, 024001 (2001).
[10] A. D. Jackson, D. O. Riska, and B. Verwest, Nucl. Phys. A249, 397 (1975).
[11] R. Vinh Mau, in Mesons in Nuclei, edited by M. Rho and D. H. Wilkinson (North-Holland, Amsterdam, 1979), Vol. I, p. 151.
[12] M. Lacombe, B. Loiseau, J. M. Richard, R. Vinh Mau, J. Côté, P. Pires, and R. de Tourreil, Phys. Rev. C 21, 861 (1980).
[13] R. Machleidt, K. Holinde, and Ch. Elster, Phys. Rep. 149, 1 (1987).
[14] F. Myhrer and J. Wroldsen, Rev. Mod. Phys. 60, 629 (1988).
[15] D. R. Entem, F. Fernandez, and A. Valcarce, Phys. Rev. C 62, 034002 (2000).
[16] G. H. Wu, J. L. Ping, L. J. Teng, F. Wang, and T. Goldman, Nucl. Phys. A673, 273 (2000).
[17] S. Weinberg, Physica 96A, 327 (1979).
[18] S. Weinberg, Phys. Lett. B 251, 288 (1990); Nucl. Phys. B363, 3 (1991); Phys. Lett. B 295, 114 (1992).
[19] C. Ordóñez, L. Ray, and U. van Kolck, Phys. Rev. Lett. 72, 1982 (1994); Phys. Rev. C 53, 2086 (1996).
[20] U. van Kolck, Prog. Part. Nucl. Phys. 43, 337 (1999).
[21] L. S. Celenza et al., Phys. Rev. C 46, 2213 (1992); C. A. da Rocha et al., ibid. 49, 1818 (1994); D. B. Kaplan et al., Nucl. Phys. B478, 629 (1996).
[22] N. Kaiser, R. Brockmann, and W. Weise, Nucl. Phys. A625, 758 (1997).
[23] N. Kaiser, S. Gerstendörfer, and W. Weise, Nucl. Phys. A637, 395 (1998).
[24] E. Epelbaum et al., Nucl. Phys. A637, 107 (1998); A671, 295 (2000).
[25] D. R. Entem and R. Machleidt, Phys. Lett. B 524, 93 (2002).
[26] D. R. Entem and R. Machleidt, Phys. Rev. C 66, 014002 (2002).
[27] D. R. Entem and R. Machleidt, Phys. Rev. C 68, 041001 (2003).
[28] R. Machleidt and D. R. Entem, J. Phys. G: Nucl. Phys. 31, S1235 (2005).
[29] P. F. Bedaque and U. van Kolck, Ann. Rev. Nucl. Part. Sci. 52, 339 (2002).
[30] S. Scherer and M. R. Schindler, arXiv:hep-ph/0505265.
[31] Review of Particle Physics, J. Phys. G: Nucl. Part. Phys. 33, 1 (2006).
[32] S. Coleman, J. Wess, and B. Zumino, Phys. Rev. 177, 2239 (1969); C. G. Callan, S. Coleman, J. Wess, and B. Zumino, ibid. 177, 2247 (1969).
[33] J. Gasser and H. Leutwyler, Ann. Phys. 158, 142 (1984).
[34] J. Gasser, M. E. Sainio, and A. Švarc, Nucl. Phys. B307, 779 (1988).
[35] V. Bernard, N. Kaiser, and U.-G. Meißner, Int. J. Mod. Phys. E 4, 193 (1995).
[36] N. Fettes, U.-G. Meißner, M. Mojžiš, and S. Steininger, Ann. Phys. (N.Y.) 283, 273 (2000); 288, 249 (2001).
[37] N. Fettes, U.-G. Meißner, and S. Steininger, Nucl. Phys. A640, 199 (1998).
[38] N. Kaiser, Phys. Rev. C 61, 014003 (1999); 62, 024001 (2000).
[39] U. van Kolck, Phys. Rev. C 49, 2932 (1994).
[40] E. Epelbaum et al., Phys. Rev. C 66, 064001 (2002).
[41] N. Kaiser, Phys. Rev. C 64, 057001 (2001).
[42] N. Kaiser, Phys. Rev. C 65, 017001 (2002).
[43] R. Blankenbecler and R. Sugar, Phys. Rev. 142, 1051 (1966).
[44] This section closely follows Ref. [26].
[45] G. Q. Li and R. Machleidt, Phys. Rev. C 58, 3153 (1998).
[46] V. Stoks, R. Timmermans, and J. J. de Swart, Phys. Rev. C 47, 512 (1993).
[47] R. A. Arndt, R. L. Workman, and M. M. Pavan, Phys. Rev. C 49, 2729 (1994).
[48] P. Büttiker and U.-G. Meißner, Nucl. Phys. A668, 97 (2000).
[49] V. G. J. Stoks, R. A. M. Klomp, M. C. M. Rentmeester, and J. J. de Swart, Phys. Rev. C 48, 792 (1993).
[50] R. A. Arndt, I. I. Strakovsky, and R. L. Workman, SAID, Scattering Analysis Interactive Dial-in computer facility, George Washington University (formerly Virginia Polytechnic Institute), solution SM99 (Summer 1999); for more information see, e.g., R. A. Arndt, I. I. Strakovsky, and R. L. Workman, Phys. Rev. C 50, 2731 (1994).
[51] In fact, preliminary calculations, which take an important class of diagrams of order five into account, indicate that the N4LO contribution may prevailingly be repulsive (N. Kaiser, private communication).
[52] G. E. Brown and A. D. Jackson, The Nucleon-Nucleon Interaction (North-Holland, Amsterdam, 1976).
[53] N. Kaiser, Phys. Rev. C 63, 044010 (2001).
[54] K. Erkelenz, R. Alzetta, and K. Holinde, Nucl. Phys. A176, 413 (1971); note that there is an error in equation (4.22) of this paper where it should read
$${}^{-}W^J_{LS} = 2qq'\, \frac{J-1}{2J-1} \left[ A^{J-2,(0)}_{LS} - A^{J,(0)}_{LS} \right] \quad \text{and} \quad {}^{+}W^J_{LS} = 2qq'\, \frac{J+2}{2J+3} \left[ A^{J+2,(0)}_{LS} - A^{J,(0)}_{LS} \right] .$$
[55] E. E. Salpeter and H. A. Bethe, Phys. Rev. 84, 1232 (1951).
[56] The 1999 NN data base is defined in Ref. [9].
[57] E. Epelbaum, W. Glöckle, and U.-G. Meißner, Eur. Phys. J. A19, 401 (2004).
[58] M. Walzl et al., Nucl. Phys. A693, 663 (2001).
[59] U. van Kolck et al., Phys. Rev. Lett. 80, 4386 (1998).
[60] E. Epelbaum, W. Glöckle, and U.-G. Meißner, Nucl. Phys. A747, 362 (2005).
[61] R. B. Wiringa et al., Phys. Rev. C 51, 38 (1995).
[62] A. Huber et al., Phys. Rev. Lett. 80, 468 (1998).
[63] A. Nogga, P. Navratil, B. R. Barrett, and J. P. Vary, Phys. Rev. C 73, 064002 (2006).
[64] A. Nogga et al., Nucl. Phys. A737, 236 (2004).
[65] P. Navratil, V. G. Gueorguiev, J. P. Vary, W. E. Ormand, and A. Nogga, arXiv:nucl-th/0701038.
[66] U.-G. Meißner, Proc. 18th International Conference on Few-Body Problems in Physics, Santos, SP, Brazil, August 2006, to be published in Nucl. Phys. A.
[67] E. Epelbaum, Phys. Lett. B 639, 456 (2006).
[68] D. Rozpedzik et al., Acta Phys. Polon. B37, 2889 (2006); arXiv:nucl-th/0606017.
[69] L. Coraggio et al., Phys. Rev. C 66, 021303 (2002).
[70] L. Coraggio et al., Phys. Rev. C 71, 014307 (2005).
[71] P. Navrátil and E. Caurier, Phys. Rev. C 69, 014311 (2004).
[72] C. Forssen et al., Phys. Rev. C 71, 044312 (2005).
[73] J. P. Vary et al., Eur. Phys. J. A 25 s01, 475 (2005).
[74] K. Kowalski et al., Phys. Rev. Lett. 92, 132501 (2004).
[75] D. J. Dean and M. Hjorth-Jensen, Phys. Rev. C 69, 054320 (2004).
[76] M. Wloch et al., J. Phys. G 31, S1291 (2005); Phys. Rev. Lett. 94, 21250 (2005).
[77] D. J. Dean et al., Nucl. Phys. 752, 299 (2005).
[78] J. R. Gour et al., Phys. Rev. C 74, 024310 (2006).
[79] S. Fujii, R. Okamato, and K. Suzuki, Phys. Rev. C 69, 034328 (2004).
[80] K. Ermisch et al., Phys. Rev. C 71, 064004 (2005).
[81] H. Witala, J. Golak, R. Skibinski, W. Glöckle, A. Nogga, E. Epelbaum, H. Kamada, A. Kievsky, and M. Viviani, Phys. Rev. C 73, 044004 (2006).
[82] A. Nogga, R. G. E. Timmermans, and U. van Kolck, Phys. Rev. C 72, 054006 (2005).
[83] E. Epelbaum and U.-G. Meißner, arXiv:nucl-th/0609037.
ABSTRACT

In this lecture series, I present the recent progress in our understanding of nuclear forces in terms of chiral effective field theory.

<|endoftext|><|startoftext|>

On a Conjecture of E. M. Stein on the Hilbert Transform on Vector Fields

Michael Lacey and Xiaochun Li

Michael Lacey, School of Mathematics, Georgia Institute of Technology, Atlanta GA 30332

Xiaochun Li, Department of Mathematics, University of Illinois, Urbana IL 61801
E-mail address: xcli@math.uiuc.edu

http://arxiv.org/abs/0704.0808v3

1991 Mathematics Subject Classification. Primary 42A50, 42B25

The authors are supported in part by NSF grants. M.L. was supported in part by the Guggenheim Foundation.

Abstract. Let v be a smooth vector field on the plane, that is, a map from the plane to the unit circle. We study sufficient conditions for the boundedness of the Hilbert transform

$$H_{v,\epsilon} f(x) := \mathrm{p.v.} \int_{-\epsilon}^{\epsilon} f(x - y\,v(x)) \,\frac{dy}{y} ,$$

where ε is a suitably chosen parameter, determined by the smoothness properties of the vector field. It is a conjecture, due to E. M. Stein, that if v is Lipschitz, there is a positive ε for which the transform above is bounded on L2. Our principal result gives a sufficient condition in terms of the boundedness of a maximal function associated to v, namely that this new maximal function be bounded on some Lp, for some 1 < p < 2. We show that the maximal function is bounded from L2 to weak L2 for all Lipschitz vector fields. The relationship between our results and other known sufficient conditions is explored.

Contents

Preface
Chapter 1. Overview of Principal Results
Chapter 2. Connections to Besicovitch Set and Carleson’s Theorem
  Besicovitch Set
  The Kakeya Maximal Function
  Carleson’s Theorem
  The Weak L2 Estimate in Theorem 1.15 is Sharp
Chapter 3. The Lipschitz Kakeya Maximal Function
  The Weak L2 Estimate
  An Obstacle to an Lp estimate, for 1 < p < 2
  Bourgain’s Geometric Condition
  Vector Fields that are a Function of One Variable
Chapter 4. The L2 Estimate for Hilbert Transform on Lipschitz Vector Fields
  Definitions and Principal Lemmas
  Truncation and an Alternate Model Sum
  Proofs of Lemmata
Chapter 5. Almost Orthogonality Between Annuli
  Application of the Fourier Localization Lemma
  The Fourier Localization Estimate
References

Preface

This memoir is devoted to a question in planar Harmonic Analysis, a subject which is a circle of problems all related to the Besicovitch set. This anomalous set has zero Lebesgue measure, yet contains a line segment of unit length in each direction of the plane. It has been known since the 1970s that such sets must necessarily have full Hausdorff dimension. The existence of these sets, and the full Hausdorff dimension, are intimately related to other, independently interesting issues [26]. An important tool to study these questions is the so-called Kakeya Maximal Function, in which one computes the maximal average of a function over rectangles of a fixed eccentricity and arbitrary orientation.

Most famously, Charles Fefferman showed [10] that the Besicovitch set is the obstacle to the boundedness of the disc multiplier in the plane. But as well, this set is intimately related to finer questions of Bochner-Riesz summability of Fourier series in higher dimensions and space-time regularity of solutions of the wave equation.

This memoir concerns one of the finer questions which center around the Besicovitch set in the plane. (There are not so many of these questions, but our purpose here is not to catalog them!) It concerns a certain degenerate Radon transform.
Given a vector field v on R2, one considers a Hilbert transform computed in the one dimensional line segment determined by v, namely the Hilbert transform of a function on the plane computed on the line segment {x + tv(x) | |t| ≤ 1}.

The Besicovitch set itself shows that the choice of v cannot be merely measurable, for one can choose the vector field to always point into the set. Finer constructions show that one cannot take it to be Hölder continuous of any index strictly less than one. Is the sharp condition of Hölder continuity of index one enough? This is the question of E. M. Stein, motivated by an earlier question of A. Zygmund, who asked the same for the question of differentiation of integrals.

The answer is not known under any condition of just smoothness of the vector field. Indeed, as is known, and as we explain, a positive answer would necessarily imply Carleson’s famous theorem on the convergence of Fourier series [6]. This memoir is concerned with reversing this implication: Given the striking recent successes related to Carleson’s Theorem, what can one say about Stein’s Conjecture? In this direction, we introduce a new object into the study, a Lipschitz Kakeya Maximal Function, which is a variant of the more familiar Kakeya Maximal Function, and which links the vector field v to the ‘Besicovitch sets’ associated to the vector field. One averages a function over rectangles of arbitrary orientation and, in contrast to the classical setting, arbitrary eccentricity. But the rectangle must suitably localize the directions in which the vector field points. This Maximal Function admits a favorable estimate on L2, and this is one of the main results of the Memoir.

On Stein’s Conjecture, we prove a conditional result: If the Lipschitz Kakeya Maximal Function associated with v satisfies an estimate a little better than our L2 estimate, then the associated Hilbert transform is indeed bounded.
Thus, the main question left open concerns the behavior of these novel Maximal Functions. While the main result is conditional, it does contain many of the prior results on the subject, and greatly narrows the possible avenues of a resolution of this conjecture.

The principal results and conjectures are stated in Chapter 1; following that we collect some of the background material for this subject, and prove some of the folklore results known about the subject. The remainder of the Memoir is taken up with the proofs of the Theorems stated in Chapter 1.

Acknowledgment. The efforts of a strikingly generous referee have resulted in corrections of arguments, and improvements in presentation throughout this manuscript. We are indebted to that person.

Michael T. Lacey and Xiaochun Li

CHAPTER 1

Overview of Principal Results

We are interested in singular integral operators on functions of two variables, which act by performing a one dimensional transform along a particular line in the plane. The choice of lines is to be variable. Thus, for a measurable map v from R2 to the unit circle in the plane, that is a vector field, and a Schwartz function f on R2, define

$$H_{v,\epsilon} f(x) := \mathrm{p.v.} \int_{-\epsilon}^{\epsilon} f(x - y\,v(x)) \,\frac{dy}{y} .$$

This is a truncated Hilbert transform performed on the line segment {x + tv(x) : |t| < 1}. We stress the limit of the truncation in the definition above as it is important to different scale invariant formulations of our questions of interest. This is an example of a Radon transform, one that is degenerate in the sense that we seek results independent of geometric assumptions on the vector field. We are primarily interested in assumptions of smoothness on the vector field.

Also relevant is the corresponding maximal function (1.1) Mv,ε f := sup 0 0 so that if $\epsilon^{-1} = K\|v\|_{\mathrm{Lip}}$, we have the weak type estimate

(1.4) $$\sup_{\lambda>0} \lambda\,|\{M_{v,\epsilon} f > \lambda\}|^{1/2} \lesssim \|f\|_2 .$$
The origins of this question go back to the discovery of the Besicovitch set in the 1920s, and in particular, constructions of this set show that the Conjecture is false under the assumption that v is Hölder continuous for any index strictly less than 1. These constructions, known since the 1920s, were the inspiration for A. Zygmund to ask if integrals of, say, L2(R2) functions could be differentiated in a Lipschitz choice of directions. That is, for Lipschitz v, and f ∈ L2, is it the case that

$$\lim_{\epsilon \to 0} (2\epsilon)^{-1} \int_{-\epsilon}^{\epsilon} f(x - y\,v(x)) \,dy = f(x) \quad \text{a.e. } x \,?$$

These and other matters are reviewed in the next chapter. Much later, E. M. Stein [25] raised the singular integral variant of this conjecture.

E. M. Stein Conjecture 1.5. There is an absolute constant K > 0 so that if $\epsilon^{-1} = K\|v\|_{\mathrm{Lip}}$, we have the weak type estimate

(1.6) $$\sup_{\lambda>0} \lambda\,|\{|H_{v,\epsilon} f| > \lambda\}|^{1/2} \lesssim \|f\|_2 .$$

These are very difficult conjectures. Indeed, it is known that if the Stein Conjecture holds for, say, C2 vector fields, then Carleson’s Theorem on the pointwise convergence of Fourier series [6] would follow. This folklore result is recalled in the next Chapter.

We will study these questions using modifications of the phase plane analysis associated with Carleson’s Theorem [15–20] and a new tool, which we term a Lipschitz Kakeya Maximal Function.

Associated with the Besicovitch set is the Kakeya Maximal Function, a maximal function over all rectangles of a given eccentricity. A key estimate is that the $L^2 \to L^{2,\infty}$ norm of this operator grows logarithmically in the eccentricity [27, 28].

Associated with a Lipschitz vector field, we define a class of maximal functions taken over rectangles of arbitrary eccentricity, but these rectangles are approximate level sets of the vector field. Perhaps surprisingly, these maximal functions admit an L2 bound that is independent of eccentricity. Let us explain.

A rectangle is determined as follows.
Fix a choice of unit vectors in the plane (e, e⊥), with e⊥ being the vector e rotated by π/2. Using these vectors as coordinate axes, a rectangle is a product of two intervals R = I × J. We will insist that |I| ≥ |J|, and use the notations

(1.7) L(R) = |I|, W(R) = |J|

for the length and width respectively of R.

The interval of uncertainty of R is the subarc EX(R) of the unit circle in the plane, centered at e, and of length W(R)/L(R). See Figure 1.1.

Figure 1.1. An example eccentricity interval EX(R). The circle on the left has radius one.

We now fix a Lipschitz map v of the plane into the unit circle. We only consider rectangles R with

(1.8) $$L(R) \le (100\|v\|_{\mathrm{Lip}})^{-1} .$$

For such a rectangle R, set $V(R) = R \cap v^{-1}(EX(R))$. It is essential to impose a restriction of this type on the length of the rectangles, for without it, one can modify constructions of the Besicovitch set to provide examples which would contradict the main results and conjectures of this work. For 0 < δ < 1, we consider the maximal functions

(1.9) $$M_{v,\delta} f(x) = \sup_{|V(R)| \ge \delta |R|} \frac{1_R(x)}{|R|} \int_R |f(y)| \,dy .$$

That is, we only form the supremum over rectangles for which the vector field lies in the interval of uncertainty for a fixed positive proportion δ of the rectangle; see Figure 1.2.

Weak L2 estimate for the Lipschitz Kakeya Maximal Function 1.10. The maximal function $M_{v,\delta}$ is bounded from $L^2(\mathbb{R}^2)$ to $L^{2,\infty}(\mathbb{R}^2)$ with norm at most $\lesssim \delta^{-1/2}$. That is, for any λ > 0, and $f \in L^2(\mathbb{R}^2)$, this inequality holds:

(1.11) $$\lambda^2\,|\{x \in \mathbb{R}^2 : M_{v,\delta} f(x) > \lambda\}| \lesssim \delta^{-1} \|f\|_2^2 .$$

The norm estimate in particular is independent of the Lipschitz vector field v.

A principal Conjecture of this work is:

Conjecture 1.12. For some 1 < p < 2, and some finite N and all 0 < δ < 1 and all Lipschitz vector fields v, the maximal function $M_{v,\delta}$ is bounded from $L^p(\mathbb{R}^2)$ to $L^{p,\infty}(\mathbb{R}^2)$ with norm at most $\lesssim \delta^{-N}$.

Figure 1.2.
A rectangle, with the vector field pointing in the long direction of the rectangle at three points.

We cannot verify this Conjecture, only establishing that the norm of the operator can be controlled by a slowly growing function of eccentricity.

In fact, this conjecture is stronger than what is needed below. Let us modify the definition of the Lipschitz Kakeya Maximal Function, by restricting the rectangles that enter into the definition to have an approximately fixed width. For 0 < δ < 1, and choice of 0 < w < 1 ‖v‖Lip, parameterizing the width of the rectangles we consider, define

(1.13) $$M_{v,\delta,w} f(x) = \sup_{\substack{|V(R)| \ge \delta |R| \\ w \le W(R) \le 2w}} \frac{1_R(x)}{|R|} \int_R |f(y)| \,dy .$$

We can restrict attention to this case as the primary interest below is the Hilbert transform on vector fields applied to functions with frequency support in a fixed annulus. By Fourier uncertainty, the width of the fixed annulus is the inverse of the parameter w above.

Conjecture 1.14. For some 1 < p < 2, and some finite N and all 0 < δ < 1, all Lipschitz vector fields v and 0 < w < 1 ‖v‖Lip, the maximal function $M_{v,\delta,w}$ is bounded from $L^p(\mathbb{R}^2)$ to $L^{p,\infty}(\mathbb{R}^2)$ with norm at most $\lesssim \delta^{-N}$.

These conjectures are stated so as to be universal over Lipschitz vector fields. On the other hand, we will state conditional results below in which we assume that a given vector field satisfies the Conjecture above, and then derive consequences for the Hilbert transform on vector fields. We also show that, e.g., real-analytic vector fields [3] satisfy these conjectures.

We turn to the Hilbert transform on vector fields. As it turns out, it is useful to restrict functions in frequency variables to an annulus. Such operators are given by

$$S_t f(x) = \int_{1/t \le |\xi| \le 2/t} \hat f(\xi)\, e^{i \xi \cdot x} \,d\xi .$$

The relevance is explained in part by this result of the authors [15], valid for measurable vector fields.

Theorem 1.15. For any measurable vector field v we have the L2 into weak L2 estimate

$$\sup_{\lambda>0} \lambda\,|\{|H_{v,\infty} \circ S_t f| > \lambda\}|^{1/2} \lesssim \|f\|_2 .$$
The inequality holds uniformly in t > 0.

It is critical that the Fourier restriction St enters in, for otherwise the Besicovitch set would provide a counterexample, as we indicate in the first section of Chapter 2. This is one point at which the difference between the maximal function and the Hilbert transform is striking. The maximal function variant of the estimate above holds, and is relatively easy to prove, yet the Theorem above contains Carleson’s Theorem on the pointwise convergence of Fourier series as a Corollary.

The weak L2 estimate is sharp for measurable vector fields, and so we raise the following conjecture.

Conjecture 1.16. There is a universal constant K for which we have the inequalities (1.17) sup 0 0.

While this is a conditional result, we shall see that it sheds new light on prior results, such as one of Bourgain [3] on real analytic vector fields. See Proposition 3.30, and the discussion of that Proposition.

The authors are not aware of any conceptual obstacles to the following extension of the Theorem above being true, namely that one can establish Lp estimates for all p > 2. As our argument currently stands, we could only prove this result for p sufficiently close to 2, because of our currently crude understanding of the underlying orthogonality arguments.

Conjecture 1.21. Assume that Conjecture 1.14 holds for a choice of vector field v with 1 + η > 1 derivatives. Then we have the inequalities below:

(1.22) $$\|H_{v,\epsilon}\|_p \lesssim (1 + \log \|v\|_{C^{1+\eta}})^2 , \qquad 2 < p < \infty .$$

In this case, $\epsilon = K/\|v\|_{C^{1+\eta}}$.

For a brief remark on what is required to prove this conjecture, see Remark 4.65.

The results of Christ, Nagel, Stein and Wainger [7] apply to certain vector fields v. This work is a beautiful culmination of the ‘geometric’ approach to questions concerning the boundedness of Radon transforms. Earlier, a positive result for analytic vector fields followed from Nagel, Stein and Wainger [22]. E.M.
Stein [25] specifically raised the question of the boundedness of Hv for smooth vector fields v. And the results of D. Phong and Stein [23, 24] also give results about Hv. J. Bourgain [3] considered real-analytic vector fields. N. H. Katz [13] has made an interesting contribution to the maximal function question. Also see the partial results of Carbery, Seeger, Wainger and Wright [5].

CHAPTER 2

Connections to Besicovitch Set and Carleson’s Theorem

Besicovitch Set

The Besicovitch set is a compact set that contains a line segment of unit length in each direction in the plane. Anomalous constructions of such sets show that they can have very small measure. Indeed, given ε > 0 one can select rectangles R1, . . . , Rn, with disjoint eccentricities, $|EX(R_j)| \simeq n^{-1}$, and of unit length, so that |B| ≤ ε for $B := \bigcup_{j=1}^{n} R_j$. On the other hand, letting ej ∈ EX(Rj), one has that the rectangles Rj + ej are essentially disjoint. See Figure 2.1.

Figure 2.1. A Besicovitch Set on the left, and its Reach on the right.

Call the ‘reach’ of the Besicovitch set

$$\mathrm{Reach} := \bigcup_{j=1}^{n} (R_j + e_j) .$$

This set has measure about one. On the Reach, one can define a vector field which points to a line segment contained in the Besicovitch set. Clearly, one has

$$|H_v 1_B(x)| \simeq 1 , \qquad x \in \mathrm{Reach} .$$

Further, constructions of this set permit one to take the vector field to be Lipschitz continuous of any index strictly less than one. And conversely, if one considers a Besicovitch set associated to a vector field of sufficiently small Lipschitz norm, of index one, the corresponding Besicovitch set must have large measure. Thus, Lipschitz estimates are critical.

The Kakeya Maximal Function

The Kakeya maximal function is typically defined as

(2.1) $$M_{K,\epsilon} f(x) := \sup_{|EX(R)| \ge \epsilon} \frac{1_R(x)}{|R|} \int_R |f(y)| \,dy , \qquad \epsilon > 0 .$$

One is forced to take ε > 0 due to the existence of the Besicovitch set. It is a critical fact that the norm of this operator admits a bound on L2 that is logarithmic in ε. See Córdoba and Fefferman [8], and
Strömberg [27, 28]. Subsequently, there have been several refinements of this observation; we cite only Nets H. Katz [12], Alfonseca, Soria and Vargas [1], and Alfonseca [2]. These papers contain additional references.

For the L2 norm, the following is the sharp result.

Theorem 2.2. We have the estimate below valid for all 0 < ε < 1.

$$\|M_{K,\epsilon}\|_{2 \to 2} \lesssim 1 + \log 1/\epsilon .$$

The standard example of taking f to be the indicator of a small disk shows that the estimate above is sharp, and that the norm grows as an inverse power of ε for 1 < p < 2.

Carleson’s Theorem

We explain the connection between the Hilbert transform on vector fields and Carleson’s Theorem on the pointwise convergence of Fourier series. Since smooth functions have a convergent Fourier expansion, the main point of Carleson’s Theorem is to provide for the control of an appropriate maximal function. We recall the maximal function in this Theorem.

Carleson’s Theorem 2.3. For all measurable functions N : R → R, the operator below maps L2 into itself.

$$C_N f(x) := \mathrm{p.v.} \int e^{i N(x) y} f(x - y) \,\frac{dy}{y}$$

The implied operator norm is independent of the choice of measurable N(x).

Figure 2.2. Deducing Carleson’s Theorem from Stein’s Conjecture.

For a fixed function f, an appropriate choice of N will give us

$$\sup_{N} \Bigl| \mathrm{p.v.} \int e^{i N y} f(x - y) \,\frac{dy}{y} \Bigr| \lesssim |C_N f(x)| .$$

Thus, in the Theorem above we have simply linearized the supremum. Also, we have stated the Theorem with the un-truncated integral. The content of the Theorem is unchanged if we make a truncation of the integral, which we will do below.

Let us now show how to deduce this Theorem from an appropriate bound on Hilbert transforms on vector fields. (This observation is apparently due to R. Coifman from the 1970s.)

Proposition 2.4. Assume that we have, say, the bound

$$\|H_{v,1}\|_{2 \to 2} \lesssim 1 ,$$

assuming that $\|v\|_{C^2} \le 1$. It follows that the Carleson maximal operator is bounded on L2(R).

Proof.
The Proposition and the proof are only given in their most obvious formulation. Set

$$\sigma(\xi) = \mathrm{p.v.} \int_{-1}^{1} e^{i \xi y} \,\frac{dy}{y} .$$

For a C2 function N : R → R we deduce that the operator with symbol σ(ξ − N(x)) maps L2(R) into itself with norm that is independent of the C2 norm of the function N(x). A standard limiting argument then permits one to conclude the same bound for all measurable choices of N(x), as is required for the deduction of Carleson’s inequality.

This argument is indicated in Figure 2.2. Take the vector field to be v(x1, x2) = (1, −N(x1)/n) where n is chosen much larger than the C2 norm of the function N(x1). Then, Hv,1 is bounded on L2(R2) with norm bounded by an absolute constant. The symbol of Hv,1 is

σ(ξ1, ξ2) = σ(ξ1 − ξ2N(x1)/n) .

The trace of this symbol along the line ξ2 = J defines a symbol of a bounded operator on L2(R). Taking J very large, we obtain a very good approximation to the symbol σ(ξ1 − ξ2N(x1)/n), deducing that it maps L2(R) into itself with a bounded constant. Our proof is complete. □

The Weak L2 Estimate in Theorem 1.15 is Sharp

An example shows that under the assumption that the vector field is measurable, the sharp conclusion is that Hv ◦ S1 maps L2 into L2,∞. And a variant of the approach to Carleson’s theorem by Lacey and Thiele [20] will prove this norm inequality. This method will also show, under only the measurability assumption, that Hv ◦ S1 maps Lp into itself for p > 2, as is shown by the current authors [15]. The results and techniques of that paper are critical to this one.

CHAPTER 3

The Lipschitz Kakeya Maximal Function

The Weak L2 Estimate

We prove Theorem 1.10, the weak L2 estimate for the maximal function defined in (1.9), by suitably adapting classical covering lemma arguments.

The Covering Lemma Conditions. We adopt the covering lemma approach of Córdoba and R. Fefferman [8]. To this end, we regard the choice of vector field v and 0 < δ < 1 as fixed.
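The two conditions imposed on rectangles in this chapter, the length restriction (1.8) and the density requirement |V(R)| ≥ δ|R|, are concrete enough to be checked numerically. The following Python sketch is an illustration only and plays no role in the argument. It assumes the vector field is given by its angle, samples R on a uniform grid, and takes EX(R) to be the arc of length W(R)/L(R) centered at the long direction of R; the function names, the grid size, and the angle representation are all choices made here, not notation from the text.

```python
import math

def density_in_uncertainty_arc(v_angle, center, e_angle, length, width, grid=64):
    """Estimate |V(R)| / |R| on a grid, where V(R) = R intersected with v^{-1}(EX(R)).

    v_angle(x, y) returns the angle of the unit vector v(x, y).
    R is centered at `center`, has long side `length` in direction e_angle,
    and short side `width` in the perpendicular direction.
    """
    half_arc = 0.5 * width / length          # EX(R) has arc length W(R)/L(R)
    ex, ey = math.cos(e_angle), math.sin(e_angle)
    hits = 0
    for i in range(grid):
        for j in range(grid):
            s = (i + 0.5) / grid - 0.5       # relative coordinate along e
            t = (j + 0.5) / grid - 0.5       # relative coordinate along e-perp
            x = center[0] + s * length * ex - t * width * ey
            y = center[1] + s * length * ey + t * width * ex
            a = v_angle(x, y) - e_angle
            d = math.atan2(math.sin(a), math.cos(a))   # wrap to (-pi, pi]
            if abs(d) <= half_arc:
                hits += 1
    return hits / (grid * grid)

def is_admissible(v_angle, lip_norm, center, e_angle, length, width, delta):
    """Check the length restriction (1.8) and the density condition |V(R)| >= delta |R|."""
    short_enough = 100.0 * lip_norm * length <= 1.0
    dense = density_in_uncertainty_arc(v_angle, center, e_angle, length, width) >= delta
    return short_enough and dense
```

For instance, with the hypothetical field of angle 0.5·x (so that the Lipschitz norm of v is at most 0.5), a rectangle of length 0.01 and width 0.001 along the x-axis passes both tests, while a rectangle of length 0.05 violates (1.8).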
Let R be any finite collection of rectangles obeying the conditions (1.8) and |V(R)| ≥ δ|R|. We show that R has a decomposition into disjoint collections R′ and R′′ for which these estimates hold.

(3.1) $$\Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_2^2 \lesssim \delta^{-1} \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_1 ,$$

(3.2) $$\Bigl| \bigcup_{R \in \mathcal{R}''} R \Bigr| \lesssim \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_1 .$$

The first of these conditions is the stronger one, as it bounds the L2 norm squared by the L1 norm; the verification of it will occupy most of the proof.

Let us see how to deduce Theorem 1.10. Take λ > 0 and f ∈ L2 which is non-negative and of norm one. Set R to be all the rectangles R of prescribed maximum length as given in (1.8), density with respect to the vector field, namely |V(R)| ≥ δ|R|, and

$$\int_R f(y) \,dy \ge \lambda |R| .$$

We should verify the weak type inequality

(3.3) $$\lambda \Bigl| \bigcup_{R \in \mathcal{R}} R \Bigr|^{1/2} \lesssim \delta^{-1/2} .$$

Apply the decomposition to R. Observe that

$$\lambda \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_1 \le \Bigl\langle f, \sum_{R \in \mathcal{R}'} 1_R \Bigr\rangle \le \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_2 \lesssim \delta^{-1/2} \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_1^{1/2} .$$

Here of course we have used (3.1). This implies that

$$\lambda \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_1^{1/2} \lesssim \delta^{-1/2} .$$

Therefore clearly (3.3) holds for the collection R′. Concerning the collection R′′, apply (3.2) to see that

$$\lambda \Bigl| \bigcup_{R \in \mathcal{R}''} R \Bigr|^{1/2} \lesssim \lambda \Bigl\| \sum_{R \in \mathcal{R}'} 1_R \Bigr\|_1^{1/2} \lesssim \delta^{-1/2} .$$

This completes our proof of (3.3). The remainder of the proof is devoted to the proof of (3.1) and (3.2).

The Covering Lemma Estimates.

Construction of R′ and R′′. In the course of the proof, we will need several recursive procedures. The first of these occurs in the selection of R′ and R′′. We will have need of one large constant κ, of the order of say 100, but whose exact value does not concern us. Using this notation hides distracting terms.

Let Mκ be a maximal function given as

$$M_\kappa f(x) = \max\Bigl\{ \sup_{s>0} s^{-2} \int_{x + sQ} |f(y)| \,dy ,\ \sup_{\omega \in \Omega} \sup_{s>0} (2s)^{-1} \int_{-s}^{s} |f(x + \sigma\omega)| \,d\sigma \Bigr\} .$$

Here, Q is the unit square in the plane, and Ω is a set of uniformly distributed points on the unit circle of cardinality equal to κ. It follows from the usual weak type bounds that this operator maps L1(R2) into weak L1(R2).

To initialize the recursive procedure, set

R′ ← ∅ , STOCK ← R .

Figure 3.1. The rectangle R′ would have been removed from STOCK upon the selection of R as a member of R′.

The main step is this while loop.
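The selection loop, whose steps are spelled out in the text that follows, can be sketched in Python. This is a schematic illustration under stated assumptions rather than the construction itself: rectangles are reduced to a name, a length and a width, and the removal test (membership in a level set of the maximal function Mκ applied to the selected rectangles) is abstracted into a caller-supplied predicate `is_covered`, which is a hypothetical stand-in introduced here.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

class Rect(NamedTuple):
    name: str
    length: float   # L(R)
    width: float    # W(R)

def uncertainty(r: Rect) -> float:
    """|EX(R)| = W(R) / L(R), the length of the interval of uncertainty."""
    return r.width / r.length

def select(
    rectangles: Sequence[Rect],
    is_covered: Callable[[Rect, List[Rect]], bool],
) -> Tuple[List[Rect], List[Rect]]:
    """Greedy selection: returns (selected, remainder), in order of selection."""
    stock = list(rectangles)
    selected: List[Rect] = []
    while stock:
        # First criterion: maximal length; second: minimal |EX(R)|.
        best = max(stock, key=lambda r: (r.length, -uncertainty(r)))
        selected.append(best)
        stock.remove(best)
        # Discard everything the selected rectangles already cover.
        stock = [r for r in stock if not is_covered(r, selected)]
    remainder = [r for r in rectangles if r not in selected]
    return selected, remainder
```

Since the stock is finite and shrinks by at least one rectangle per pass, the loop terminates; the order of the `selected` list records the order of selection, which the later argument refers to.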
While STOCK is not empty, select R ∈ STOCK subject to the criteria that first it have a maximal length L(R), and second that it have minimal value of |EX(R)|. Update R′ ← R′ ∪ {R}. Remove R from STOCK. As well, remove any rectangle R′ ∈ STOCK which is also contained in

$$\Bigl\{ M_\kappa \sum_{R \in \mathcal{R}'} 1_{\kappa R} \ge \kappa^{-1} \Bigr\} .$$

As the collection R is finite, the while loop will terminate, and at this point we set R′′ := R − R′. In the course of the argument below, we will refer to the order in which rectangles were added to R′.

With this construction, it is obvious that (3.2) holds, with a bound that is a function of κ. Yet κ is an absolute constant, so this dependence does not concern us. And so the rest of the proof is devoted to the verification of (3.1).

An important aspect of the qualitative nature of the interval of eccentricity is encoded into this algorithm. We will choose κ so large that this is true: Consider two rectangles R and R′ with R ∩ R′ ≠ ∅, L(R) ≥ L(R′), W(R) ≥ W(R′), |EX(R)| ≤ |EX(R′)| and EX(R) ⊂ 10EX(R′). Then we have

(3.4) R′ ⊂ κR .

See Figure 3.1.

Uniform Estimates. We estimate the left hand side of (3.1). In so doing we expand the square, and seek certain uniform estimates. Expanding the square on the left hand side of (3.1), we can estimate

$$\text{l.h.s. of (3.1)} \le \sum_{R \in \mathcal{R}'} |R| + 2 \sum_{(\rho, R) \in \mathcal{P}} |\rho \cap R| ,$$

where P consists of all pairs (ρ, R) ∈ R′ × R′ such that ρ ∩ R ≠ ∅, and ρ was selected to be a member of R′ before R was. It is then automatic that L(R) ≤ L(ρ). And since the density of all tiles is positive, it follows that

$$\mathrm{dist}(EX(\rho), EX(R)) \le 2\|v\|_{\mathrm{Lip}} L(\rho) < \tfrac{1}{50} .$$

We will split up the collection P into sub-collections {SR : R ∈ R′} and {Tρ : ρ ∈ R′}.

For a rectangle R ∈ R′, we take SR to consist of all rectangles ρ such that (a) (ρ, R) ∈ P; and (b) EX(ρ) ⊂ 10EX(R). We assert that

(3.5) $$\sum_{\rho \in S_R} |R \cap \rho| \le |R| , \qquad R \in \mathcal{R}' .$$

This estimate is in fact easily available to us.
Since the rectangles ρ ∈ SR were selected to be in R′ before R was, we cannot have the inclusion

(3.6) $$R \subset \Bigl\{ M_\kappa \sum_{\rho \in S_R} 1_{\kappa\rho} > \kappa^{-1} \Bigr\} .$$

Now the rectangles ρ are also longer. Thus, if (3.5) does not hold, we would compute the maximal function of $\sum_{\rho \in S_R} 1_{\kappa\rho}$ in a direction which is close, within an error of 2π/κ, to being orthogonal to the long direction of R. In this way, we will contradict (3.6).

The second uniform estimate that we need is as follows. For fixed ρ, set Tρ to be the set of all rectangles R such that (a) (ρ, R) ∈ P and (b) EX(ρ) ⊄ 10EX(R). We assert that

(3.7) $$\sum_{R \in T_\rho} |R \cap \rho| \lesssim \delta^{-1} |\rho| , \qquad \rho \in \mathcal{R}' .$$

The proof of this inequality is more involved, and is taken up in the next subsection.

Remark 3.8. In the proof of (3.7), it is not necessary that ρ ∈ R′. Writing ρ = Iρ × Jρ, in the coordinate basis e and e⊥, we could take any rectangle of the form I × Jρ.

These two estimates conclude the proof of (3.1). For any two distinct rectangles ρ, R ∈ P, we will have either ρ ∈ SR or R ∈ Tρ. Thus (3.1) follows by summing (3.5) on R and (3.7) on ρ.

The Proof of (3.7). We do not need this Lemma for the proof of (3.7), but this is the most convenient place to prove it.

Lemma 3.9. Let S be any finite collection of rectangles with L(R) ≤ 2 L(R′), and with |V(R)| ≥ δ|R| for all R, R′ ∈ S. Then it is the case (3.10) ≤ 2δ−1 .

Figure 3.2. Proof of Lemma 3.9.

Proof. Fix a point x at which we give an upper bound on the sum above. Let C(x) be any circle centered at x. We shall show that there exists at most one R in S such that V(R) ∩ C(x) ≠ ∅. By the assumption |V(R)| ≥ δ|R| this proves the Lemma.

We prove this last claim by contradiction of the Lipschitz assumption on the vector field v. Assume that there exist at least two rectangles R, R′ ∈ S for which the sets V(R) and V(R′) intersect C(x). Thus there exist y and y′ in C(x) such that v(y) ∈ EX(R) and v(y′) ∈ EX(R′).
Since v is Lipschitz, we have

$$|v(y) - v(y')| \le \|v\|_{\mathrm{Lip}} |y - y'| \le 4 \|v\|_{\mathrm{Lip}} L(R) |v(y) - v(y')| ,$$

but this is a contradiction to our assumption (1.8). See Figure 3.2. □

We fix ρ, and begin by making a decomposition of the collection Tρ. Suppose that the coordinate axes for ρ are given by eρ, associated with the long side of ρ, and e⊥ρ, with the short side. Write the rectangle as a product of intervals Iρ × J, where |Iρ| = L(ρ). Denote one of the endpoints of J as α. See Figure 3.3.

Figure 3.3. Notation for the proof of (3.7).

For rectangles R ∈ Tρ, let IR denote the orthogonal projection of R onto the line segment 2Iρ × {α}. Subsequently, we will consider different subsets of this line segment. The first of these is as follows. For R ∈ Tρ, let VR be the projection of the set V(R) onto 2Iρ × {α}. The angle θ between ρ and R is at most $|\theta| \le 2\|v\|_{\mathrm{Lip}} L(\rho) \le \tfrac{1}{50}$. It follows that

(3.11) $$\tfrac{1}{2} L(R) \le |I_R| \le 2 L(R) , \qquad \text{and} \qquad \delta L(R) \lesssim |V_R| .$$

A recursive mechanism is used to decompose Tρ. Initialize

STOCK ← Tρ , U ← ∅ .

While STOCK ≠ ∅ select R ∈ STOCK of maximal length. Update

(3.12)
U ← U ∪ {R},
U(R) ← {R′ ∈ STOCK : VR ∩ VR′ ≠ ∅},
STOCK ← STOCK − U(R).

When this while loop stops, it is the case that $T_\rho = \bigcup_{R \in U} U(R)$.

With this construction, the sets {VR : R ∈ U} are disjoint. By (3.11), we have

(3.13) $$\sum_{R \in U} L(R) \lesssim \delta^{-1} L(\rho) .$$

The main point is then to verify the uniform estimate

(3.14) $$\sum_{R' \in U(R)} |R' \cap \rho| \lesssim L(R) \cdot W(\rho) , \qquad R \in U .$$

Note that both estimates immediately imply (3.7).

Figure 3.4. Proof of Lemma 3.15: The rectangles R, R′ ∈ U(ρ), and so the angles R and R′ form with ρ are nearly the same.

Proof of (3.14). There are three important, and more technical, facts to observe about the collections U(R).

For any rectangle R′ ∈ U(R), denote its coordinate axes as eR′ and e⊥R′, associated to the long and short sides of R′ respectively.

Lemma 3.15. For any rectangle R′ ∈ U(R) we have

$$|e_{R'} - e_R| \le \tfrac{1}{2} |e_\rho - e_R| .$$

Proof.
There are, by construction, points x ∈ V(R) and x′ ∈ V(R′) which get projected to the same point on the line segment Iρ × {α}. See Figure 3.4. Observe that

$$\begin{aligned} |e_{R'} - e_R| &\le |EX(R')| + |EX(R)| + |v(x') - v(x)| \\ &\le |EX(R')| + |EX(R)| + \|v\|_{\mathrm{Lip}} \cdot L(R) \cdot |e_\rho - e_R| \\ &\le |EX(R')| + |EX(R)| + \tfrac{1}{100} |e_\rho - e_R| . \end{aligned}$$

Now, $|EX(R)| \le \tfrac{1}{10} |e_\rho - e_R|$, else we would have ρ ∈ SR. Likewise, $|EX(R')| \le \tfrac{1}{10} |e_{R'} - e_R|$. And this proves the desired inequality. □

Lemma 3.16. Suppose that there is an interval I ⊂ Iρ such that

(3.17) $$\sum_{\substack{R' \in U(R) \\ L(R') \ge 8|I|}} |R' \cap I \times J| \ge |I \times J| .$$

Then there is no R′′ ∈ U(R) such that L(R′′) < |I| and R′′ ∩ 4I × J ≠ ∅.

Proof. There is a natural angle θ between the rectangles ρ and R, which we can assume is positive, and is given by |eρ − eR|. Notice that we have θ ≥ 10|EX(R)|, else we would have ρ ∈ SR, which contradicts our construction.

Figure 3.5.

Moreover, there is an important consequence of Lemma 3.15: For any R′ ∈ U(R), there is a natural angle θ′ between R′ and ρ. These two angles are close. For our purposes below, these two angles can be regarded as the same.

For any R′ ∈ U(R), we will have

$$\frac{|\kappa R' \cap \rho|}{|I \times J|} \simeq \kappa \frac{W(R') \cdot W(\rho)}{\theta\, |I|\, W(\rho)} = \kappa \frac{W(R')}{\theta \cdot |I|} .$$

Recall that Mκ is larger than the maximal function over κ uniformly distributed directions. Choose a direction e′ from this set of κ directions that is closest to e⊥ρ. Take a line segment Λ in direction e′ of length κθ|I|, with the center of Λ in 4I × J. See Figure 3.5. Then we have

$$\frac{|\kappa R' \cap \Lambda|}{|\Lambda|} \ge \frac{W(R')}{\theta \cdot |I|} .$$

Thus by our assumption (3.17),

$$\sum_{R' \in U(R)} \frac{|\kappa R' \cap \Lambda|}{|\Lambda|} \gtrsim 1 .$$

That is, any of the lines Λ are contained in the set

$$\Bigl\{ M_\kappa \sum_{R' \in U(R)} 1_{\kappa R'} > \kappa^{-1} \Bigr\} .$$

Clearly our construction does not permit any rectangle R′′ ∈ U(R) contained in this set. To conclude the proof of our Lemma, we seek a contradiction. Suppose that there is an R′′ ∈ U(R) with L(R′′) < |I| and R′′ intersects 2I × J.

Figure 3.6. The proof of Lemma 3.18.

The range of line segments Λ we can permit
The only possibility permitted to us is that the rectangle R′′ is quite wide. We must have W(R′′) ≥ 1 |Λ| = κ · θ · |I|. This however forces us to have |EX(R′′)| ≥ κ θ. And this implies that ρ ∈ SR′′ , as in (3.5). This is the desired contradiction. Our third and final fact about the collection U(R) is a consequence of Lemma 3.15 and a geometric observation of J.-O. Stromberg [27, Lemma 2, p. 400]. Lemma 3.18. For any interval I ⊂ IR we have the inequality (3.19) R′∈U(R) L(R′)≤|I|≤ κL(R′) |R′ ∩ I × J | ≤ 5|I| ·W(ρ) . Proof. For each point x ∈ 4I × J , consider the square S centered at x of side length equal to κ · |I| · |eR− eρ|. See Figure 3.6. It is Stromberg’s observation that for R′ ∈ U(R) we have |κR′ ∩ I × J | |I × J | ≃ |S ∩ κR′| with the implied constant independent of κ. Indeed, by Lemma 3.15, we have that |κR′ ∩ I × J | |I × J | ≃ κW(R′) |eR− eρ| · |I| ≃ κW(R ′) · |I| · |eR− eρ| (|eR− eρ| · |I|)2 ≃ |S ∩ κR |S| , 20 3. LIPSCHITZ KAKEYA as claimed. Now, assume that (3.19) does not hold and seek a contradiction. Let U ′ ⊂ U(R) denote the collection of rectangles R′ over which the sum is made in (3.19). The rectangles in U ′ were added in some order to the collection R′, and in particular there is a rectangle R0 ∈ U ′ that was the last to be added to U ′. Let U ′′ be the collection U ′−{R0}. We certainly have ∑ R′∈U ′′ |R′′ ∩ I × J | ≥ 4|I × J |. Since we cannot have ρ ∈ SR0 , Stromberg’s observation implies that R′∈U ′′ 1κR′ > κ Here, we rely upon the fact that the maximal function Mκ is larger than the usual maximal function over squares. But this is a contradiction to our construction, and so the proof is complete. � The principal line of reasoning to prove (3.14) can now begin with it’s initial recursive procedure. Initialize C(R′)← R′ ∩ ρ . We are to bound the sum R′∈U(R)|C(R′)|. 
Initialize a collection of subintervals of IR to be ℐ ← ∅. WHILE there is an interval I ⊂ IR satisfying

(3.20) ∑_{R′∈V(I)} |C(R′) ∩ I × J| ≥ 40|I| · W(ρ) ,
(3.21) V(I) = {R′ ∈ U(R) | C(R′) ∩ I × J ≠ ∅ , L(R′) ≥ 8|I|} ,

we take I to be an interval of maximal length with this property, and update

ℐ ← ℐ ∪ {I} ;
C(R′, I) = C(R′) ∩ I × J , R′ ∈ V(I) ;
C(R′) ← C(R′) − I × J , R′ ∈ V(I) .

[We remark that this last updating is not needed in the most important special case when all rectangles have the same width. But in the case we are considering, rectangles can have variable widths, so that |C(R)| can be much larger than any |I| · |J| that would arise from this algorithm.]

Once the WHILE loop stops, we have

R′ ∩ ρ = C(R′) ∪ ⋃ {C(R′, I) | I ∈ ℐ , R′ ∈ V(I)} .

Here the union is over pairwise disjoint sets.

We first consider the collection of sets {C(R′) | R′ ∈ U(R)} that remain after the WHILE loop has finished. Since we must not have R′ ⊂ 1/4κ · ρ, it follows that the minimum value of L(R′) is 1 W(ρ). Thus, if in (3.20), we consider an interval I of length 1 W(ρ), the condition L(R′) ≥ 8|I| in the definition of V(I) in (3.21) is vacuous. Thus, we necessarily have

∑_{R′∈V(I)} |C(R′) ∩ I × J| ≤ 40|I| · W(ρ) .

For if this inequality failed, the WHILE loop would not have stopped. We can partition IR by intervals of length close to W(ρ), showing that we have

∑_{R′∈U(R)} |C(R′)| ≲ |IR| · W(ρ) .

Turning to the central components of the argument, namely the bound for the terms associated with the intervals in ℐ, consider I ∈ ℐ. The inequality (3.20) and Lemma 3.18 imply that each I ∈ ℐ must have length |I| ≤ κ^{−1/2}|Iρ|. But we choose intervals in ℐ to be of maximal length. Thus,

(3.22) ∑_{R′∈V(I)} |C(R′, I)| ≤ 100 · |I| · W(ρ) .

Indeed, suppose this last inequality fails. Let I ⊂ Ĩ ⊂ Iρ be an interval twice as long as I. By Lemma 3.18, we conclude that

∑_{R′∈V(I), L(R′)≤8|Ĩ|} |R′ ∩ Ĩ × J| ≤ 10|I| · W(ρ) .

Notice that we are restricting the sum on the left by the length of |Ĩ|.
Therefore, we have the inequalities

∑_{R′∈V(I), L(R′)>8|Ĩ|} |C(R′, I)| ≥ 90 · |I| · W(ρ) > 40 · |Ĩ| · W(ρ) .

That is, Ĩ would have been selected, contradicting our construction.

Lemmas 3.16 and 3.18 place significant restrictions on the collection of intervals ℐ. If we have I ≠ I′ ∈ ℐ with 3I ∩ 3I′ ≠ ∅, then we must have e.g. κ|I′| < |I|, as follows from Lemma 3.18. Moreover, V(I′) must contain a rectangle R′ with L(R′) < |I|. But this contradicts Lemma 3.16. Therefore, we must have

∑_{I∈ℐ} |I| ≲ |IR| ≲ L(R) .

With (3.22), this completes the proof of (3.14).

An Obstacle to an Lp estimate, for 1 < p < 2

We address one of the main conjectures of this memoir, namely Conjecture 1.12. Let us first observe

Proposition 3.23. We have the estimate below valid for all 0 < w < ‖v‖Lip.

λ|{Mv,w f > λ}|^{2/3} ≲ δ^{−1/3}(1 + log w^{−1}‖v‖Lip)^{1/3} ‖f‖_{3/2}

Proof. Let ‖v‖Lip = 1. This just relies upon the fact that with 0 < w < 1 fixed, there are only about log 1/w possible values of L(R). This leads very easily to the following two estimates. Following the earlier argument, consider an arbitrary collection of rectangles ℛ with each R ∈ ℛ satisfying (1.8) and |V(R)| ≥ δ|R|. We can then decompose ℛ into disjoint collections ℛ′ and ℛ′′ for which these estimates hold.

(3.24) … ≲ δ^{−1}(log 1/w) ,
(3.25) |⋃_{R∈ℛ′′} R| ≲ … .

Compare to (3.1) and (3.2). Following the same line of reasoning that was used to prove (3.3), we prove our Proposition.

We can devise proofs of smaller bounds on the norm of the maximal function than that given by this proposition. But no argument that we can find avoids some logarithmic term in the width of the rectangle.

Let us illustrate the difficulty in the estimate with an object pointed out to us by Ciprian Demeter. We term it a pocketknife, and it is pictured in Figure 3.7. A pocketknife comes with a handle, namely a rectangle Rhandle that is longer than any other rectangle in the pocketknife.
We call a collection of rectangles B a set of blades if these two conditions are met. In the first place,

(3.26) Rhandle ∩ R ≠ ∅ .

[Figure 3.7. A pocketknife. Labels: handle, blades, hinges.]

In the second place, we have

angle(R, Rhandle) ≃ angle(R′, Rhandle) , R, R′ ∈ B .

Let θ(B) denote the angle between Rhandle and the rectangles in the blade B. We refer to as a hinge a rectangle of dimensions w/θ(B) by w, in the same coordinate system as Rhandle, that contains the intersection in (3.26).

Now, let B be a collection of blades for the handle Rhandle. Our proof of the weak L2 estimate for the Lipschitz Kakeya Maximal function shows that we can assume

♯B · w² · θ(B)^{−1} ≲ |Rhandle| .

This is essentially the estimate (3.5). But, to follow the covering lemma approach to the L^{3/2} estimate for the maximal function, we need to control

(♯B)² · w² · θ(B)^{−1} .

We can only find control of expressions of this type in terms of some slowly varying function of w^{−1}.

Bourgain's Geometric Condition

Jean Bourgain [3] gives a geometric condition on the Lipschitz vector field that is sufficient for the L2 boundedness of the maximal function associated with v. We describe the condition, and show how it immediately proves that the corresponding Lipschitz Kakeya maximal function admits a weak type bound on L1. In particular our Conjecture 1.12 holds for these vector fields.

To motivate Bourgain's condition, let us recall the earlier condition considered by Nagel, Stein and Wainger in [22]. This condition imposes a restriction on the maximum and minimum curvatures of the integral curves of the vector field through the assumption that

sup_{x∈Ω} det[∇v(x)v(x), v(x)] / inf_{x∈Ω} det[∇v(x)v(x), v(x)] < ∞ .

Here, Ω is a domain in R2, and one can achieve an upper bound on the norm of the maximal function associated to v, appropriately restricted to Ω, in terms of this ratio.
Bourgain's condition permits the vector field to have integral curves which are flat. Suppose that v is defined on all of R2. Define

(3.27) ω(x; t) := |det[v(x + tv(x)), v(x)]| , |t| ≤ 1/‖v‖Lip .

Assume a uniform estimate of the following type: For absolute constants 0 < c, C < ∞ and 0 < ε0 < 1/(2‖v‖Lip),

(3.28) |{|t| ≤ ε | ω(x; t) < τ sup_{|s|≤ε} ω(x; s)}| ≤ Cτ^c ε ,

this condition holding for all x ∈ R2, 0 < τ < 1 and 0 < ε < ε0.

The interest in this condition stems from the fact [3] that real-analytic vector fields satisfy it. Also see Remark 3.35. Bourgain proved:

Theorem 3.29. Assume that (3.28) holds. Then, the maximal operator Mv,ε0 defined in (1.1) maps L2 into itself.

Bourgain's paper [3] claims that the same methods would prove the bounds

‖Mv,ε0 f‖p ≲ ‖f‖p , 1 < p < ∞ ,

and suggests that similar methods would apply to the localized Hilbert transform with respect to these vector fields. Here, we prove

Proposition 3.30. Assume that (3.28) holds. Then, the Lipschitz Kakeya Maximal Functions Mv,δ,w, 0 < δ < 1, 0 < w < ε0, defined in (1.13) satisfy the weak L1 estimate

λ|{Mv,δ,w f > λ}| ≲ δ^{−1}(1 + log 1/δ)‖f‖1 .

The implied constants depend upon the constants in (3.28).

That is, these vector fields easily fall within the scope of our analysis. As a corollary to Theorem 1.18, we see that Hv maps L2 into itself.

[Figure 3.8. Proof of (3.31).]

Proof. Let us assume that ‖v‖Lip = 1. Fix δ > 0 and 0 < w < ε0. Let ℛ be the class of rectangles with L(R) < κ and satisfying |V(R)| ≥ δ|R|. Say that ℛ′ ⊂ ℛ has scales separated by s > 3 iff for R, R′ ∈ ℛ′ the condition 4 L(R) < L(R′) implies that 2^s L(R) < L(R′). One sees that ℛ can be decomposed into ≃ s sub-collections with scales separated by s.

The fortunate observation is this: Assuming (3.28), and taking s ≃ log 1/δ, any subset ℛ′ ⊂ ℛ with scales separated by s further enjoys this property: If R, R′ ∈ ℛ′ with C EX(R) ∩ C EX(R′) = ∅, with C a fixed constant, then

(3.31) L(R) ≃ L(R′) or R ∩ R′ = ∅ .
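The decomposition of the class of rectangles into ≃ s sub-collections with scales separated by s can be made concrete by sorting rectangles according to the dyadic scale of their lengths. The sketch below is only an illustration under stated assumptions: each rectangle is represented by its length alone, s is taken to be a positive integer, and the helper names `dyadic_scale`, `split_by_scale` and `scales_separated` are hypothetical and do not appear in the text. Lengths are grouped by ⌊log2 L⌋ modulo s + 1, so two lengths in the same class either lie in the same dyadic band (ratio below 2) or differ by a factor larger than 2^s.

```python
import math
from collections import defaultdict


def dyadic_scale(length):
    """Return the integer k with 2**k <= length < 2**(k + 1) (length > 0)."""
    # frexp writes length = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    _mantissa, exponent = math.frexp(length)
    return exponent - 1


def split_by_scale(lengths, s):
    """Split lengths into at most s + 1 classes, each with scales separated by s."""
    classes = defaultdict(list)
    for length in lengths:
        classes[dyadic_scale(length) % (s + 1)].append(length)
    return list(classes.values())


def scales_separated(lengths, s):
    """Check that 4 * L < L' implies 2**s * L < L' for all pairs of lengths."""
    return all(b > 2 ** s * a for a in lengths for b in lengths if 4 * a < b)


# Illustrative data: lengths 2**(-j/3), which are not separated as a whole.
lengths = [2.0 ** (-j / 3) for j in range(60)]
classes = split_by_scale(lengths, s=4)
print(len(classes), all(scales_separated(c, 4) for c in classes))
```

Using s + 1 residue classes rather than s is a convenience of this sketch; it is consistent with the count ≃ s stated in the proof.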
Let us see why this is true, arguing by contradiction. Thus we assume that L(R′) ≤ 2^{−s} L(R), R ∩ R′ ≠ ∅ and C EX(R) ∩ C EX(R′) = ∅. Since the rectangles have an essentially fixed width, it follows that 2|EX(R′)| ≥ |EX(R)|. Fix a line ℓ in the long direction of 2R with

|{x ∈ ℓ | v(x) ∈ V(R)}| ≥ (δ/2) |ℓ| = δ L(R) .

Let x0 be in the set above, let x′′0 ∈ V(R′), and let x′0 be the projection of x′′0 onto the line ℓ. See Figure 3.8. Observe that we can estimate

(3.32) |v(x′′0) − v(x′0)| ≤ 2|v(x0) − v(x′′0)| L(R′) .

Therefore, we have

|v(x0) − v(x′0)| ≥ | |v(x′0) − v(x′′0)| − |v(x′′0) − v(x0)| |
≥ |v(x′′0) − v(x0)| (1 − 2 L(R′))
≥ |EX(R′)| ,

provided C is large enough. Now, after a moment's thought, one sees that

|det[v(x0), v(x′0)]| ≃ angle(v(x0), v(x′0)) .

Therefore, for any x ∈ ℓ,

sup_{s≤L(R)} ω(x; s) ≳ |EX(R′)| .

But the vector field satisfies (3.28), which we will apply with

τ ≃ |EX(R)| / |EX(R′)| ≃ L(R′) / L(R) .

It follows that

(δ/2) L(R) ≤ |{x ∈ ℓ | ω(x; s) ≤ cτ|EX(R′)|}|
≤ |{x ∈ ℓ | ω(x; s) ≤ τ sup_{|s|≤ε} ω(x; s)}|
≲ τ^c L(R) .

Therefore, we see (δ/2)^{1/c} ≲ L(R′)/L(R), which is a contradiction to ℛ′ having scales separated by s, with s ≃ 1 + log 1/δ.

Let us see how to prove the Proposition now that we have proved (3.31). Take s ≃ log 1/δ, and a finite sub-collection ℛ′ ⊂ ℛ of rectangles with scales separated by s. We may take a further subset ℛ′′ ⊂ ℛ′ such that

(3.33) ‖ ∑_{R′′∈ℛ′′} 1_{R′′} ‖_∞ ≲ δ^{−1} ,
(3.34) | ⋃_{R∈ℛ′−ℛ′′} R | ≲ ∑_{R′′∈ℛ′′} |R′′| .

These are precisely the covering estimates needed to prove the weak L1 estimate claimed in the proposition. But, in choosing ℛ′′ to satisfy (3.33), it is clear that we need only be concerned about rectangles with a fixed length, and the separation in scales and (3.31) will control rectangles of distinct lengths.

The procedure that we apply to select ℛ′′ is inductive. Set

ℛ′′ ← ∅ , S ← ∅ , STOCK ← ℛ′ .

WHILE STOCK ≠ ∅, select R ∈ STOCK with maximal length, and update ℛ′′ ← ℛ′′ ∪ {R}, as well as STOCK ← STOCK − {R}.
In addition, for any R′ ∈ STOCK with R′ ⊂ 4CR, where C ≥ 1 is the constant that ensures that (3.31) holds, remove these rectangles from STOCK and add them to S. Once the WHILE loop stops, we will have STOCK = ∅ and we have our decomposition of ℛ′. By construction, it is clear that (3.34) holds. We need only check that (3.33) holds.

Now, consider two rectangles R, R′ ∈ ℛ′ whose scales are separated, thus 2^s L(R′) < L(R). If it is the case that R ∩ R′ ≠ ∅ and C EX(R) ∩ C EX(R′) ≠ ∅, then R would have been selected to be in ℛ′′ first, whence R′ would have been placed in S. Therefore, C EX(R) ∩ C EX(R′) = ∅, but then (3.31) implies that R ∩ R′ = ∅.

Thus, the only contribution to the L∞ norm in (3.33) can come from rectangles of about the same length. But Lemma 3.9 then implies that such rectangles can overlap only about δ^{−1} times. Our proof is complete. (As the interest in (3.28) is in small values of c, it will be more efficient to use Lemma 3.9 to handle the case of the rectangles having approximately the same length.) □

Remark 3.35. To conclude that the Hilbert transform on vector fields is bounded, one could weaken Bourgain's condition (3.28) to

|{|t| ≤ ε | ω(x; t) < τ sup_{|s|≤ε} ω(x; s)}| ≤ C exp(−(log 1/τ)^c) ε .

This inequality is to hold universally in x ∈ R2, 0 < τ < 1, and 0 < ε < ‖v‖Lip. This is of interest for 0 < c < 1. The proof above can be modified to show that the maximal functions Mv,δ,w satisfy the weak L1 inequality, with constant at most ≲ δ^{−1−1/c}.

Vector Fields that are a Function of One Variable

We specialize to the vector fields that are a function of just one real variable. Assume that the vector field v is of the form

(3.36) v(x1, x2) = (v1(x2), v2(x2)) ,

and for the moment we do not impose the condition that the vector field take values in the unit circle.
The point is simply this: If we are interested in transforms where the kernel is not localized, the restriction on the vector field is immaterial. Namely, for any vector field v,

Hv,∞ f(x) = p.v. ∫ f(x − yv(x)) dy/y = p.v. ∫ f(x − yṽ(x)) dy/y , ṽ(x) = v(x)/|v(x)| .

We return to a theme implicit in the proof of Proposition 2.4. This proof only relies upon vector fields that are a function of one variable. Thus, it is a significant subcase of the Stein Conjecture to verify it for Lipschitz vector fields of just one variable. Indeed, the situation is this.

Proposition 3.37. Suppose that a choice of vector field v(x1, x2) = (1, v1(x1)) is just a function of, say, the first coordinate. Then, Hv,∞ maps L2(R2) into itself.

Proof. The symbol of Hv,∞ is

sgn(ξ1 + ξ2 v1(x1)) .

For each fixed ξ2, this is a bounded symbol. And in the special case of the L2 estimate, this is enough to conclude the boundedness of the operator. □

It is of interest to extend this result to any Lp, for p ≠ 2, for some reasonable choice of vector fields.

The corresponding questions for the maximal function are also of interest, and here the subject is much more developed. The paper [5] studies the maximal function Mv,∞. They proved the boundedness of this maximal function on Lp, p > 1, assuming that the vector field was of the form v(x) = (1, v2(x)), that D v2 was positive, and increasing, and satisfied a third more technical condition. More recently, [14] has shown that the third condition is not needed. Namely the following is true.

Theorem 3.38. Assume that v(x) = (1, v2(x)), and moreover that D v2 ≥ 0 and is monotonically increasing. Then, Mv,∞ is bounded on Lp, for 1 < p < ∞.

These vector fields present far fewer technical difficulties than a general Lipschitz vector field, and there is a richer set of proof techniques that one can bring to bear on them, as indicated in part in the proof of Proposition 3.37.
The papers [5, 14] cleverly exploit the Plancherel identity (in the independent variable), and other orthogonality considerations to prove their results.

These considerations are not completely consistent with the dominant theme of this monograph, in which the transforms are localized. Nevertheless, it would be interesting to explore methods, possibly modifications of this memoir, that could provide an extension of Proposition 3.37. In this direction, let us state a possible direction of study.

The definition of the sets V(R) for vector fields of magnitude 1 is given as V(R) = R ∩ v^{−1}(EX(R)). For vector fields of arbitrary magnitude, we define these sets to be

V(R) = {x ∈ R | v(x)/|v(x)| ∈ EX(R)} .

Define a maximal function, an extension of our Lipschitz Kakeya Maximal Function, by

(3.39) M̃v,δ f(x) = sup_{x∈R, |V(R)|≥δ|R|} |R|^{−1} ∫_R f(y) dy .

In this definition, we require the rectangles to have density δ, but do not restrict their eccentricities, or lengths.

Conjecture 3.40. Assume that the vector field is of the form v(x) = (1, v2(x2)), and the derivative D v2 ≥ 0 and is monotone. Then for all 0 < δ < 1, we have the estimate

‖M̃v,δ‖p ≲ δ^{−1} , 1 < p < ∞ .

One can construct examples which show that the L1 to weak L1 norm of the maximal function is not bounded in terms of δ. Indeed, recalling the 'pocketknife' examples of Figure 3.7, we comment that one can construct examples of vector fields with these properties, which we describe with the terminology associated with the pocketknife examples.

• The widths of all rectangles are fixed, and all rectangles have density δ.
• The 'handle' of the pocketknife has positive angle θ with the x1 axis.
• There is a 'hinge' whose blades have angles which are positive, and greater than θ. The number of blades can be unbounded, as the width of the rectangles decreases to zero.
The assumption that the vector field is only a function of x2 then greatly restricts, but does not completely forbid, the existence of additional hinges. So the combinatorics of these vector fields, as expressed in the Lipschitz Kakeya Maximal Function, are not so simple.

CHAPTER 4

The L2 Estimate for Hilbert Transform on Lipschitz Vector Fields

We prove one of our main conditional results about the Hilbert transform on Lipschitz vector fields, the inequality (1.19), which is the estimate at L2, for functions with frequency support in an annulus, assuming an appropriate estimate for the Lipschitz Kakeya Maximal Function. We begin the proof by setting notation appropriate for phase plane analysis for functions f on the plane supported on an annulus. With this notation, we can define appropriate discrete analogs of the Hilbert transform on vector fields. Lemmas 4.22 and 4.23 are the combinatorial analogs of our Theorem 1.15. We then take up the proofs. The main step in the proof is Lemma 4.50, which combines the (standard) orthogonality considerations with the conjectures about the Lipschitz Kakeya Maximal Functions.

Definitions and Principal Lemmas

Throughout this chapter, κ will denote a fixed small positive constant, whose exact value need not concern us. κ of the order of 10^{−3} would suffice. The following definitions are as in the authors' previous paper [15].

Definition 4.1. A grid is a collection of intervals G so that for all I, J ∈ G, we have I ∩ J ∈ {∅, I, J}. The dyadic intervals are a grid. A grid G is central iff for all I, J ∈ G, with I ⊊ J we have 500κ^{−20} I ⊂ J.

The reader can find the details on how to construct such a central grid structure in [11].

Let ρ be rotation on T by an angle of π/2. Coordinate axes for R2 are a pair of unit orthogonal vectors (e, e⊥) with ρ e = e⊥.

Definition 4.2. We say that ω ⊂ R2 is a rectangle if it is a product of intervals with respect to a choice of axes (e, e⊥) of R2.
We will say that ω is an annular rectangle if ω = (−2^{l−1}, 2^{l−1}) × (a, 2a) for an integer l with 2^l < κa, with respect to the axes (e, e⊥). The dimensions of ω are said to be 2^l × a. Notice that the face (−2^{l−1}, 2^{l−1}) × a is tangent to the circle |ξ| = a at the midpoint of the face, (0, a). We say that the scale of ω is scl(ω) := 2^l and that the annular parameter of ω is ann(ω) := a. In referring to the coordinate axes of an annular rectangle, we shall always mean (e, e⊥) as above.

[Figure 4.1. The two rectangles ωs and Rs whose product is a tile. The gray rectangles are other possible locations for the rectangle Rs.]

Annular rectangles will decompose our functions in the frequency variables. But our methods must be sensitive to spatial considerations; it is this and the uncertainty principle that motivate the next definition.

Definition 4.3. Two rectangles R and R̄ are said to be dual if they are rectangles with respect to the same basis (e, e⊥), thus R = r1 × r2 and R̄ = r̄1 × r̄2 for intervals ri, r̄i, i = 1, 2. Moreover, 1 ≤ |ri| · |r̄i| ≤ 4 for i = 1, 2. The product of two dual rectangles we shall refer to as a phase rectangle. The first coordinate of a phase rectangle we think of as a frequency component and the second as a spatial component.

We consider collections of phase rectangles AT which satisfy these conditions. For s, s′ ∈ AT we write s = ωs × Rs, and require that

(4.4) ωs is an annular rectangle,
(4.5) Rs and ωs are dual,
(4.6) the rectangles Rs are from the product of central grids,
(4.7) {1000κ^{−100}R | ωs × R ∈ AT} covers R2, for all ωs,
(4.8) ann(ωs) = 2^j for some integer j,
(4.9) ♯{ωs | scl(s) = scl, ann(s) = ann} ≥ c ,
(4.10) scl(s) ≤ κ ann(s).

[Figure 4.2. An annular rectangle ωs, and three associated subintervals ρωs1, ωs1, and ωs2.]

We assume that there are auxiliary sets ωs, ωs1, ωs2 ⊂ T associated to s, or more specifically to ωs, which satisfy these properties.
(4.11) Ω := {ωs, ωs1, ωs2 | s ∈ AT} is a grid in T,
(4.12) ωs1 ∩ ωs2 = ∅, |ωs| ≥ 32(|ωs1| + |ωs2| + dist(ωs1, ωs2)),
(4.13) ωs1 lies clockwise from ωs2 on T,
(4.14) |ωs| ≤ K scl(ωs)/ann(ωs),
(4.15) {ξ/|ξ| | ξ ∈ ωs} ⊂ ρωs1.

In the top line, the intervals ωs1 and ωs2 are small subintervals of the unit circle, and we can define their dilate by a factor of 2 in an obvious way. Recall that ρ is the rotation that takes e into e⊥. Thus, eωs ∈ ωs1. See Figure 4.1 and Figure 4.2 for an illustration of these definitions.

Note that |ωs| ≥ |ωs1| ≥ scl(ωs)/ann(ωs). Thus, eωs is in ωs1, and ωs serves as 'the angle of uncertainty associated to Rs.' Let us be more precise about the geometric information encoded into the angle of uncertainty. Let Rs = rs × rs⊥ be as above. Choose another set of coordinate axes (e′, e′⊥) with e′ ∈ ωs and let R′ be the product of the intervals rs and rs⊥ in the new coordinate axes. Then K0^{−1} R′ ⊂ Rs ⊂ K0 R′ for an absolute constant K0 > 1.

We say that annular tiles are collections AT satisfying the conditions (4.4)–(4.15) above. We extend the definition of eω, eω⊥, ann(ω) and scl(ω) to annular tiles in the obvious way, using the notation es, es⊥, ann(s) and scl(s).

A phase rectangle will have two distinct functions associated to it. In order to define these functions, set

Ty f(x) := f(x − y), y ∈ R2 (Translation operator)
Modξ f(x) := e^{iξ·x} f(x), ξ ∈ R2 (Modulation operator)
D^p_{R1×R2} f(x1, x2) := (|R1||R2|)^{−1/p} f(x1/|R1|, x2/|R2|) (Dilation operator).

In the last display, 0 < p ≤ ∞, and R1 × R2 is a rectangle, and the coordinates (x1, x2) are those of the rectangle. Note that the definition depends only on the side lengths of the rectangle, and not the location, and that it preserves the Lp norm.

For a function ϕ and tile s ∈ AT set

(4.16) ϕs := Mod_{c(ωs)} T_{c(Rs)} D^2_{Rs} ϕ .

We shall consider ϕ to be a Schwartz function for which ϕ̂ ≥ 0 is supported in a small ball, of radius κ, about the origin in R2, and is identically 1 on another smaller ball around the origin.
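The remark that the dilation operator preserves the Lp norm can be checked numerically. The following is a rough sketch under stated assumptions and is not part of the argument: it assumes the dilation has the standard form D^p_{R1×R2} f(x1, x2) = (|R1||R2|)^{−1/p} f(x1/|R1|, x2/|R2|), takes f to be a Gaussian, and approximates the integrals by midpoint Riemann sums. The function names, the side lengths and the grid parameters are illustrative choices only.

```python
import math


def gaussian(x1, x2):
    """Test function f(x) = exp(-|x|^2); an arbitrary choice for illustration."""
    return math.exp(-(x1 * x1 + x2 * x2))


def dilate(f, side1, side2, p):
    """Return D^p f for a rectangle with side lengths side1, side2 (assumed form)."""
    scale = (side1 * side2) ** (-1.0 / p)
    return lambda x1, x2: scale * f(x1 / side1, x2 / side2)


def lp_norm(f, p, half_width1, half_width2, n=200):
    """Midpoint Riemann sum for the L^p norm of f over a centered box."""
    h1 = 2.0 * half_width1 / n
    h2 = 2.0 * half_width2 / n
    total = 0.0
    for i in range(n):
        x1 = -half_width1 + (i + 0.5) * h1
        for j in range(n):
            x2 = -half_width2 + (j + 0.5) * h2
            total += abs(f(x1, x2)) ** p
    return (total * h1 * h2) ** (1.0 / p)


p = 1.5
side1, side2 = 3.0, 0.25
norm_f = lp_norm(gaussian, p, 6.0, 6.0)
# The dilated function lives on a box stretched by the side lengths.
norm_df = lp_norm(dilate(gaussian, side1, side2, p), p, 6.0 * side1, 6.0 * side2)
print(norm_f, norm_df)
```

The two printed values should agree up to rounding, whatever the side lengths, which reflects the fact that the normalization depends only on |R1||R2| and not on the location of the rectangle.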
(Recall that κ is a fixed small constant.)

We introduce the tool to decompose the singular integral kernels. In so doing, we consider a class of functions ψt, t > 0, so that

(4.17) each ψt is supported in frequency in [−θ − κ, −θ + κ],
(4.18) |ψt(x)| ≲ CN (1 + |x|)^{−N}, N > 1.

In the top line, θ is a fixed positive constant so that the second half of (4.19) is true. Define

(4.19) φs(x) := ∫ ϕs(x − yv(x)) ψs(y) dy = 1_{ωs2}(v(x)) ∫ ϕs(x − yv(x)) ψs(y) dy ,
(4.20) ψs(y) := scl(s) ψscl(s)(scl(s)y) .

An essential feature of this definition is that the support of the integral is contained in the set {v(x) ∈ ωs2}, a fact which can be routinely verified. That is, we can insert the indicator 1_{ωs2}(v(x)) without loss of generality. The set ωs2 serves to localize the vector field, while ωs1 serves to identify the location of ϕs in the frequency coordinate.

The model operator we consider acts on Schwartz functions f, and it is defined by

(4.21) Cann f := ∑_{s∈AT(ann), scl(s)≥‖v‖Lip} 〈f, ϕs〉φs .

In this display, AT(ann) := {s ∈ AT | ann(s) = ann}, and we have deliberately formulated the operator in a dilation invariant manner.

Lemma 4.22. Assume that the vector field is Lipschitz, and satisfies Conjecture 1.14. Then, for all ann ≥ ‖v‖Lip^{−1}, the operator Cann extends to a bounded map from L2 into itself, with norm bounded by an absolute constant.

We remind the reader that for 2 < p < ∞ the only condition needed for the boundedness of Cann is the measurability of the vector field, a principal result of Lacey and Li [15].

It is of course of great importance to add up the Cann over ann. The methods we use for doing this are purely L2 in nature, and lead to the estimate for C := ∑_{j=1}^{∞} C_{2^j}.

Lemma 4.23. Assume that the vector field is of norm at most one in C^α for some α > 1, and satisfies Conjecture 1.14. Then C maps L2 into itself. In addition we have the estimate below, holding for all values of scl.

(4.24) ‖ ∑_{ann=−∞}^{∞} ∑_{s∈AT(ann), scl(s)=scl} 〈f, ϕs〉φs ‖_2 ≲ (1 + log(1 + scl^{−1}‖v‖Cα)) ‖f‖2 .
Moreover, these operators are unconditionally convergent in s ∈ AT.

These are the principal steps towards the proof of Theorem 1.15. In the course of the proof, we shall not invoke the additional notation needed to account for the unconditional convergence, as it is entirely notational. It can be added in by the reader.

Observe that (4.24) is only of interest when scl < ‖v‖Cα. This inequality depends critically on the fact that the kernel scl ψ(scl y) has mean zero. Without this assumption, this inequality is certainly false.

The proof of Theorem 1.15 from these two lemmas is an argument in which one averages over translations, dilations and rotations of grids. The specifics of the approach are very close to the corresponding argument in [15]. The details are omitted.

The operators Cann and C are constructed from a kernel which is a smooth analog of the truncated kernel p.v. (1/t) 1_{|t|≤1}. Nevertheless, our main theorem follows,¹ due to the observation that we can choose a sequence of Schwartz kernels ψ_{(1+κ)^n}, for n ∈ Z, which satisfy (4.17) and (4.18), and so that for

K(t) := ∑_{n∈Z} an (1 + κ)^n ψ_{(1+κ)^n}((1 + κ)^n t)

we have p.v. (1/t) 1_{|t|≤1} = K(t) − K̄(t), where K̄ denotes the complex conjugate of K. Here, for n ≥ 0 we have |an| ≲ 1. And for n < 0, we have |an| ≲ (1 + κ)^n.

The principal sum is thus over n ≥ max(0, ‖v‖Cα), and this corresponds to the operator C. For those n < max(0, ‖v‖Cα), we use the estimate (4.24), and the rapid decay of the coefficient an.

Truncation and an Alternate Model Sum

There are significant obstacles to proving the boundedness of the model sum Cann on an Lp space, for 1 < p < 2. In this section, we rely upon some naive L2 estimates to define a new model sum which is bounded on Lp, for some 1 < p < 2.

Our next Lemma is indicative of the estimates we need. For choices of scl < κ ann, set

AT(ann, scl) := {s ∈ AT(ann) | scl(s) = scl} .

Lemma 4.25. For measurable vector fields v and all choices of ann and scl,

‖ ∑_{s∈AT(ann,scl)} 〈f, ϕs〉φs ‖_2 ≲ ‖f‖2 .

Proof.
The scale and annulus are fixed in this sum, making the Bessel inequality

∑_{s∈AT(ann,scl)} |〈f, ϕs〉|² ≲ ‖f‖2²

evident. For any two tiles s and s′ that contribute to this sum, if ωs ≠ ωs′, then φs and φs′ are disjointly supported. And if ωs = ωs′, then Rs and Rs′ are disjoint, but share the same dimensions and orientation in the plane. The rapid decay of the functions φs then gives us the estimate

‖ ∑_{s∈AT(ann,scl)} 〈f, ϕs〉φs ‖_2 ≲ [ ∑_{s∈AT(ann,scl)} |〈f, ϕs〉|² ]^{1/2} ≲ ‖f‖2 .

[Footnote 1: In the typical circumstance, one uses a maximal function to pass back and forth between truncated and smooth kernels. This route is forbidden to us; there is no appropriate maximal function to appeal to.]

Consider the variant of the operator (4.21) given by

(4.26) Φ f = ∑_{s∈AT(ann), scl(s)≥κ^{−1}‖v‖Lip} 〈f, ϕs〉φs .

As ann is fixed, we shall begin to suppress it in our notations for operators. The difference between Φ and Cann is the absence of the initial ≲ log(1 + ‖v‖Lip) scales in the former. The L2 bound for these missing scales is clearly provided by Lemma 4.25, and so it remains for us to establish

(4.27) ‖Φ‖2 ≲ 1 ,

the implied constant being independent of ann, and the Lipschitz norm of v. It is an important fact, the main result of Lacey and Li [15], that

(4.28) ‖Φ‖p ≲ 1 , 2 < p < ∞ .

This holds without the Lipschitz assumption.

We are now at a point where we can be more directly engaged with the construction of our alternate model sum. We only consider tiles with κ^{−1}‖v‖Lip ≤ scl(s) ≤ κ ann. A parameter is introduced which is used to make a spatial truncation of the functions ϕs; it is

(4.29) γs² := 100^{−2} scl(s)/‖v‖Lip .

Write ϕs = αs + βs where αs = (T_{c(Rs)} D^∞_{γsRs} ζ) ϕs, and ζ is a smooth Schwartz function supported on |x| < 1/2, and equal to 1 on |x| < 1/4.

Write for choices of tiles s,

(4.30) ψs(y) = ψs−(y) + ψs+(y) ,

where ψs−(y) is a Schwartz function on R, with

supp(ψs−) ⊂ γs (scl(s))^{−1} [−1, 1] ,

and equal to ψscl(s)(y) for |y| < (1/4) γs (scl(s))^{−1}.
Then define

(4.31) as±(x) = 1_{ωs2}(v(x)) ∫ ϕs(x − yv(x)) ψs±(y) dy .

Thus, φs = as− + as+. Recalling the notation Sann in Theorem 1.15, define

(4.32) A± f := ∑_{s∈AT(ann), scl(s)≥κ^{−1}‖v‖Lip} 〈Sann f, αs〉 as± .

We will write Φ = Φ Sann = A+ + A− + B, where B is an operator defined in (4.35) below. The main fact we need concerns A−.

Lemma 4.33. There is a choice of 1 < p0 < 2 so that

‖A−‖p ≲ 1 , p0 < p < ∞ .

The implied constant is independent of the value of ann, and the Lipschitz norm of v.

The proof of this Lemma is given in the next section, modulo three additional Lemmata stated therein. The following Lemma is important for our approach to the previous Lemma. It is proved below.

Lemma 4.34. For each choice of κ^{−1}‖v‖Lip < scl < κ ann, we have the estimate

∑_{s∈AT(ann,scl)} |〈Sann f, αs〉|² ≲ ‖f‖2² .

Define

(4.35) B f := ∑_{s∈AT(ann), scl(s)≥κ^{−1}‖v‖Lip} 〈Sann f, βs〉φs .

Lemma 4.36. For a Lipschitz vector field v, we have

‖B‖p ≲ 1 , 2 ≤ p < ∞ .

Proof. For choices of integers κ^{−1}‖v‖Lip ≤ scl < κ ann, consider the vector valued operator

Tj,k f := { (〈Sann f, βs〉 / √|Rs|) 1_{v(x)∈ωs2} T_{c(Rs)} D^∞_{Rs} ( 1/(1 + |·|²)^N )(x) | s ∈ AT(ann, scl) } ,

where N is a fixed large integer.

Recall that βs is supported off of γsRs. This is a bounded linear operator from L∞(R2) to ℓ∞(AT(ann, scl)). It has norm ≲ (scl/‖v‖Lip)^{−10}. Routine considerations will verify that Tj,k : L2(R2) → ℓ2(AT(ann, scl)) with a similarly favorable estimate on its norm. By interpolation, we achieve the same estimate for Tj,k from Lp(R2) into ℓp(AT(ann, scl)), 2 ≤ p < ∞. It is now very easy to conclude the Lemma by summing over scales in a brute force way, and using the methods of Lemma 4.25. □

We turn to A+, as defined in (4.32).

Lemma 4.37. We have the estimate

‖A+‖p ≲ 1 , 2 ≤ p < ∞ .

Proof. We redefine the vector valued operator Tj,k to be

Tj,k f := { (〈Sann f, αs〉 / √|Rs|) 1_{v(x)∈ωs2} T_{c(Rs)} D^∞_{Rs} ( 1/(1 + |·|²)^N )(x) | s ∈ AT(ann, scl) } ,

where N is a fixed large integer. This operator is bounded from

Lp(R2) → ℓp(AT(ann, scl)) , 2 ≤ p < ∞ .

Its norm is at most ≲ 1.
But, for s ∈ AT(ann, scl), we have

(4.38) |as+| ≲ (scl/‖v‖Lip)^{−10} |Rs|^{−1/2} (M 1_{Rs})^{100} .

Here M denotes the strong maximal function in the plane in the coordinates determined by Rs. This permits one to again adapt the estimate of Lemma 4.25 to conclude the Lemma. □

Now we conclude that ‖Φ‖2 ≲ 1. Since Φ = A− + A+ + B, it follows from the Lemmata of this section.

Proofs of Lemmata

Proof of Lemma 4.33. We have Φ = A− + A+ + B, so from (4.28), Lemma 4.36 and Lemma 4.37, we deduce that ‖A−‖p ≲ 1 for all 2 < p < ∞. It remains for us to verify that A− is of restricted weak type p0 for some choice of 1 < p0 < 2. That is, we should verify that for all sets F, G ⊂ R2 of finite measure

(4.39) |〈A− 1F, 1G〉| ≲ |F|^{1/p} |G|^{1−1/p} , p0 < p < 2 .

Since A− maps Lp into itself for 2 < p < ∞, it suffices to consider the case of |F| < |G|. Since we assume only that the vector field is Lipschitz, we can use a dilation to assume that 1 < |G| < 2, and so this set will not explicitly enter into our estimates.

We fix the data F ⊂ R2 of finite measure, ann, and vector field v with ‖v‖Lip ≤ κ ann. Take p0 = 2 − κ². We need a set of definitions that are inspired by the approach of Lacey and Thiele [20], and are also used in Lacey and Li [15].

For subsets S ⊂ Av := {s ∈ AT(ann) | κ^{−1}‖v‖Lip ≤ scl(s) < κ ann}, consider the sum

∑_{s∈S} 〈Sann 1F, αs〉 as− .

Set χ(x) = (1 + |x|)^{−1000/κ}. Define

(4.40) χ^{(p)}_{Rs} := χ^{(p)}_s = T_{c(Rs)} D^p_{Rs} χ , 0 ≤ p ≤ ∞ .

And set χ̃^{(p)}_s = 1_{γsRs} χ^{(p)}_s.

Remark 4.41. It is typical to define a partial order on tiles, following an observation of C. Fefferman [9]. In this case, there doesn't seem to be an appropriate partial order. Begin with this assumption on the order relation '<' on tiles:

(4.42) If ωs × Rs ∩ ωs′ × Rs′ ≠ ∅, then s and s′ are comparable under '<'.

It follows from transitivity of a partial order that one can have tiles s1, . . .
, sJ, with sj+1 < sj for 1 ≤ j < J, J ≃ log(‖v‖Lip · ann), and yet the rectangles RsJ and Rs1 can be far apart, namely RsJ ∩ (cJ)Rs1 = ∅, for a positive constant c. See Figure 4.3. (We thank the referee for directing us towards this conclusion.) Therefore, one cannot make the order relation transitive, and maintain control of the approximate localization of spatial variables, as one would wish. The partial order is essential to the argument of [9], but while it is used in [20], it is not essential to that argument.

We recall a fact about the eccentricity. There is an absolute constant K′ so that for any two tiles s, s′

(4.43) ωs ⊃ ωs′ , Rs ∩ Rs′ ≠ ∅ implies Rs ⊂ K′Rs′ .

Figure 3.1 illustrates this in the case where the two rectangles Rs and Rs′ have different widths, which is not the case here.

We define an order relation on tiles by s ≲ s′ iff ωs ⊋ ωs′ and Rs ⊂ κ^{−10}Rs′. Thus, (4.42) holds for this order relation, and it is certainly not transitive.

[Figure 4.3. The rectangles Rs1, . . . , RsJ of Remark 4.41.]

A tree is a collection of tiles T ⊂ Av, for which there is a (non-unique) tile ωT × RT ∈ AT(ann) with Rs ⊂ 100κ^{−10}RT, and ωs ⊃ ωT for all s ∈ T. Here, we deliberately use a somewhat larger constant 100κ^{−10} than we used in the definition of the order relation '≲'. For i = 1, 2, call T an i-tree if for all s, s′ ∈ T, if scl(s) ≠ scl(s′), then ωsi ∩ ωs′i = ∅. 1-trees are especially important. A few tiles in such a tree are depicted in Figure 4.4.

Remark 4.44. This remark about the partial order '≲' and trees is useful to us below. Suppose that we have two trees T, with top s(T), and T′ with top s(T′). Suppose in addition that s(T′) ≲ s(T). Then, it is the case that T ∪ T′ is a tree with top s(T). Indeed, we must necessarily have ωT ⊊ ωT′, since the Rs are from products of a central grid. Also, 100κ^{−1}RT′ ⊂ 100κ^{−1}RT. And so every tile in T′ could also be a tile in T.
Our proof is organized around these parameters and functions associated to tiles and sets of tiles. Of particular note here are the first definitions of 'density,' which have to be formulated to accommodate the lack of transitivity in the partial order. Note that in the first definition, the supremum is taken over tiles $s' \in \mathcal{AT}$ of the same annular parameter as $s$. We choose the collection $\mathcal{AT}$ as it is 'universal,' covering all scales in a uniform way, due to different assumptions including (4.7).

(4.45)  $\operatorname{dense}(\mathbf{S}) := \sup_{s' \in \mathcal{AT}} \Big\{ \int_{G \cap v^{-1}(\omega_{s'})} \tilde\chi^{(1)}_{s'}\, dx \;\Big|\; \exists\, s, s'' \in \mathbf{S} : \omega_s \supset \omega_{s'} \supset \omega_{s''},\ R_s \subset 100\kappa^{-10} R_{s'},\ R_{s'} \subset 100\kappa^{-10} R_{s''} \Big\}$

(4.46)  $\Delta(\mathbf{T})^2 := \sum_{s \in \mathbf{T}} \frac{|\langle S_{\mathrm{ann}} 1_F, \alpha_s \rangle|^2}{|R_s|}\, 1_{R_s}, \qquad \mathbf{T}$ is a 1-tree,

(4.47)  $\operatorname{size}(\mathbf{S}) := \sup_{\mathbf{T} \subset \mathbf{S},\ \mathbf{T} \text{ is a 1-tree}} \frac{1}{|R_{\mathbf{T}}|} \int_{R_{\mathbf{T}}} \Delta(\mathbf{T})\, dx.$

Observe that $\operatorname{dense}(\mathbf{S})$ only really applies to 'tree-like' sets of tiles, and that (this is important) the tiles $s'$ that appear in (4.45) are not in $\mathbf{S}$, but only assumed to be in $\mathcal{AT}$. Finally, note that

$\operatorname{dense}(s) \simeq \int_{G \cap v^{-1}(\omega_s)} \tilde\chi^{(1)}_s\, dx$

with the implied constants only depending upon $\kappa$, $\chi$, and other fixed quantities.

Observe these points about size. First, it is computed relative to the truncated functions $\alpha_s$, recall (4.29). Second, for $p > 1$,

(4.48)  $\|\Delta(\mathbf{T})\|_p \lesssim |F|^{1/p},$

because of a standard $L^p$ estimate for a Littlewood-Paley square function. Third, $\operatorname{size}(\mathcal{A}_v(\mathrm{ann})) \lesssim 1$. And fourth, one has an estimate of John-Nirenberg type.

Lemma 4.49. For a 1-tree $\mathbf{T}$ we have the estimate

$\|\Delta(\mathbf{T})\|_p \lesssim \operatorname{size}(\mathbf{T})\, |R_{\mathbf{T}}|^{1/p}, \qquad 1 < p < \infty.$

Proofs of results of this type are well represented in the literature. See [4, 11].

Figure 4.4. A few possible tiles in a 1-tree. Rectangles $\omega_s$ are on the left in different shades of gray. Possible locations of $R_s$ are in the same shade of gray.

Given a set of tiles, say that $\operatorname{count}(\mathbf{S}) < A$ iff $\mathbf{S}$ is a union of trees $\mathbf{T} \in \mathcal{T}$ for which

$\sum_{\mathbf{T} \in \mathcal{T}} |\operatorname{sh}(\mathbf{T})| < A.$

We will also use the notation $\operatorname{count}(\mathbf{S}) \lesssim A$, implying the existence of an absolute constant $K$ for which $\operatorname{count}(\mathbf{S}) \le KA$.

The principal organizational Lemma is
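Lemma 4.50. Any finite collection of tiles $\mathbf{S} \subset \mathcal{A}_v$ is a union of four subsets $\mathbf{S}_{\mathrm{light}}$, $\mathbf{S}_{\mathrm{small}}$, $\mathbf{S}^{\ell}_{\mathrm{large}}$, $\ell = 1, 2$. They satisfy these properties.

(4.51)  $\operatorname{size}(\mathbf{S}_{\mathrm{small}}) < \tfrac{1}{2} \operatorname{size}(\mathbf{S}),$

(4.52)  $\operatorname{dense}(\mathbf{S}_{\mathrm{light}}) < \tfrac{1}{2} \operatorname{dense}(\mathbf{S}),$

and both $\mathbf{S}^{\ell}_{\mathrm{large}}$ are unions of trees $\mathbf{T} \in \mathcal{T}^{\ell}$, for which we have the estimates

(4.53)  $\operatorname{count}(\mathbf{S}^1_{\mathrm{large}}) \lesssim \begin{cases} \operatorname{size}(\mathbf{S})^{-2-\kappa} |F| \\ \operatorname{size}(\mathbf{S})^{-p} \operatorname{dense}(\mathbf{S})^{-M} |F| + \operatorname{size}(\mathbf{S})^{1/\kappa} \operatorname{dense}(\mathbf{S})^{-1} \\ \operatorname{dense}(\mathbf{S})^{-1} \end{cases}$

(4.54)  $\operatorname{count}(\mathbf{S}^2_{\mathrm{large}}) \lesssim \begin{cases} \operatorname{size}(\mathbf{S})^{-2} (\log 1/\operatorname{size}(\mathbf{S}))^3 |F| \\ \operatorname{size}(\mathbf{S})^{\kappa/50} \operatorname{dense}(\mathbf{S})^{-1} \end{cases}$

What is most important here is the middle estimate in (4.53). Here, $p$ is as in Conjecture 1.14, and $M > 0$ is only a function of $N$ in that Conjecture.

The estimates that involve $\operatorname{size}(\mathbf{S})^{-2}|F|$ are those that follow from orthogonality considerations. The estimates in $\operatorname{dense}(\mathbf{S})^{-1}$ are those that follow from density considerations, which are less complicated. However, in the second half of (4.54), the small positive power of size is essential for us. All of these estimates are variants of those in [20].

The middle estimate of (4.53) is not of this type, and is the key ingredient that permits us to obtain an estimate below $L^2$. Note that it gives the best bound for collections with moderate density and size. For it we shall appeal to our assumed Conjecture 1.14.

Logarithms, such as those that arise in (4.54), come from our truncation arguments, associated with the parameters $\gamma_s$ in (4.29).

For individual trees, we need two estimates.

Lemma 4.55. If $\mathbf{T}$ is a 1-tree with $\frac{1}{|R_{\mathbf{T}}|} \int_{R_{\mathbf{T}}} \Delta(\mathbf{T})\, dx \ge \sigma$, then we have

(4.56)  $|F \cap \sigma^{-\kappa} R_{\mathbf{T}}| \gtrsim \sigma^{1+\kappa} |R_{\mathbf{T}}|.$

Lemma 4.57. For trees $\mathbf{T}$ we have the estimate

(4.58)  $\sum_{s \in \mathbf{T}} |\langle S_{\mathrm{ann}} 1_F, \alpha_s \rangle \langle a_{s-}, 1_G \rangle| \lesssim \Psi\big(\operatorname{dense}(\mathbf{T}) \operatorname{size}(\mathbf{T})\big)\, |\operatorname{sh}(\mathbf{T})|.$

Here $\Psi(x) = x\,|\log cx|$, and inside the logarithm, $c$ is a small fixed constant, to ensure that $c \operatorname{dense}(\mathbf{T}) \cdot \operatorname{size}(\mathbf{T}) < 1$, say.

Set

$\operatorname{Sum}(\mathbf{S}) := \sum_{s \in \mathbf{S}} |\langle S_{\mathrm{ann}} 1_F, \alpha_s \rangle \langle a_{s-}, 1_G \rangle|.$

We want to provide the bound $\operatorname{Sum}(\mathcal{A}_v) \lesssim |F|^{1/p}$ for $p_0 < p < 2$. We have the trivial bound

(4.59)  $\operatorname{Sum}(\mathbf{S}) \lesssim \Psi\big(\operatorname{dense}(\mathbf{S}) \operatorname{size}(\mathbf{S})\big) \operatorname{count}(\mathbf{S}).$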
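It is incumbent on us to provide a decomposition of $\mathcal{A}_v$ into sub-collections for which this last estimate is effective.

By inductive application of our principal organizational Lemma 4.50, $\mathcal{A}_v$ is the union of $\mathbf{S}^{\ell}_{\delta,\sigma}$, $\ell = 1, 2$, for $\delta, \sigma \in \mathbf{2} := \{ 2^n \mid n \in \mathbb{Z},\ n \le 0 \}$, satisfying

(4.60)  $\operatorname{dense}(\mathbf{S}^{\ell}_{\delta,\sigma}) \lesssim \delta,$

(4.61)  $\operatorname{size}(\mathbf{S}^{\ell}_{\delta,\sigma}) \lesssim \sigma,$

(4.62)  $\operatorname{count}(\mathbf{S}^{\ell}_{\delta,\sigma}) \lesssim \begin{cases} \min\big(\sigma^{-2-\kappa}|F|,\ \delta^{-M}\sigma^{-p}|F| + \sigma^{1/\kappa}\delta^{-1},\ \delta^{-1}\big) & \ell = 1, \\ \min\big(\sigma^{-2}(\log 1/\sigma)^3|F|,\ \delta^{-1}\sigma^{\kappa/50}\big) & \ell = 2. \end{cases}$

Using (4.59), we see that

(4.63)  $\operatorname{Sum}(\mathbf{S}^1_{\delta,\sigma}) \lesssim \min\big(\Psi(\delta)\sigma^{-1-\kappa}|F|,\ \delta^{-M+1}\sigma^{-p+1}|F| + \sigma^{1/\kappa+1},\ \sigma\big),$
        $\operatorname{Sum}(\mathbf{S}^2_{\delta,\sigma}) \lesssim \min\big(\Psi(\delta)\sigma^{-1}(\log 1/\sigma)^4|F|,\ \sigma^{1+\kappa/50}\big).$

One can check that for $\ell = 1, 2$,

(4.64)  $\sum_{\delta,\sigma \in \mathbf{2}} \operatorname{Sum}(\mathbf{S}^{\ell}_{\delta,\sigma}) \lesssim |F|^{1/p}, \qquad p_0 < p < 2.$

This completes the proof of Lemma 4.33, aside from the proof of Lemma 4.50.

Proof of (4.64). We can assume that $|G| = 1$, and that $|F| \le 1$, for otherwise the result follows from the known $L^p$ estimates, for $p > 2$ and measurable vector fields, see Theorem 1.15.

The case of $\ell = 2$ in (4.64) is straightforward. Notice that in (4.63), for $\ell = 2$, the two terms in the minimum are roughly comparable, ignoring logarithmic terms, for $\delta|F| \simeq \sigma^{2+\kappa/50}$. Therefore, we set

$T_1 = \{ (\delta,\sigma) \in \mathbf{2} \times \mathbf{2} \mid \delta|F| \le \sigma^{2+\kappa/50} \le |F| \},$
$T_2 = \{ (\delta,\sigma) \in \mathbf{2} \times \mathbf{2} \mid \sigma^{2+\kappa/50} \le \delta|F| \},$

and $T_3 = \mathbf{2} \times \mathbf{2} - T_1 - T_2$. We can estimate

$\sum_{(\delta,\sigma) \in T_1} \operatorname{Sum}(\mathbf{S}^2_{\delta,\sigma}) \lesssim \sum_{(\delta,\sigma) \in T_1} \Psi(\delta)\sigma^{-1}(\log 1/\sigma)^4|F| \lesssim \sum_{\sigma^{2+\kappa/50} \le |F|} \sigma^{1+\kappa/75} \lesssim |F|^{1/p_0}, \qquad p_0 = \frac{2+\kappa/50}{1+\kappa/75} < 2.$

Notice that we have absorbed harmless logarithmic terms into a slightly smaller exponent in $\sigma$ above. The second term is

$\sum_{(\delta,\sigma) \in T_2} \operatorname{Sum}(\mathbf{S}^2_{\delta,\sigma}) \lesssim \sum_{(\delta,\sigma) \in T_2} \sigma^{1+\kappa/50} \lesssim \sum_{\delta \in \mathbf{2}} (\delta|F|)^{1/p_1} \lesssim |F|^{1/p_1}, \qquad p_1 = \frac{2+\kappa/50}{1+\kappa/50} < 2.$

The third term is

$\sum_{(\delta,\sigma) \in T_3} \operatorname{Sum}(\mathbf{S}^2_{\delta,\sigma}) \lesssim \sum_{(\delta,\sigma) \in T_3} \Psi(\delta)\sigma^{-1}(\log 1/\sigma)^4|F| \lesssim \sum_{\sigma^{2+\kappa/50} \ge |F|} \sigma^{-1}|F|^{1-\kappa/75} \lesssim |F|^{1/p_0}.$

Here, we have again absorbed harmless logarithms into a slightly smaller power of $|F|$, and $p_0 < 2$ is as in the first term.

The novelty in this proof is the proof of (4.64) in the case of $\ell = 1$.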
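The balancing of the two terms in the minimum for $\ell = 2$ can be checked numerically. The sketch below is a simplified model, not the estimate itself: it drops the logarithmic factors, replaces $\Psi(\delta)$ by $\delta$, writes `a` for the small exponent $\kappa/50$ (the value `a = 0.1` is an arbitrary choice), and truncates the dyadic sums at an assumed `depth`. Under these assumptions the double sum should be comparable to $|F|^{(1+a)/(2+a)}$, which is the exponent $1/p_1$ above.

```python
def dyadic_sum(F, a, depth=200):
    """Sum min(delta*F/sigma, sigma**(1+a)) over delta, sigma in {2**-n : 0 <= n < depth}."""
    total = 0.0
    for n in range(depth):
        sigma = 2.0 ** -n
        cap = sigma ** (1 + a)           # second term of the minimum
        for m in range(depth):
            delta = 2.0 ** -m
            total += min(delta * F / sigma, cap)
    return total


def ratio(F, a=0.1):
    """Compare the dyadic sum with the predicted power F**((1+a)/(2+a))."""
    return dyadic_sum(F, a) / F ** ((1 + a) / (2 + a))


# The ratio should stay bounded above and below as |F| shrinks.
ratios = [ratio(2.0 ** -k) for k in (2, 10, 20, 40)]
print(ratios)
```

The ratios stay within a fixed range as $|F|$ decreases, which is the content of the claim that the crossover $\delta|F| \simeq \sigma^{2+\kappa/50}$ governs the sum.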
We comment that if one uses the proof strategy just employed, that is only relying upon the first and last estimates from the minimum in (4.63), in the case of ℓ = 1, one will only show that |F |1/2. In the definitions below, we will have a choice of 0 < τ < 1, where τ = τ(M, p) ≃ M−1·(2−p) will only depend uponM and p in (4.63). (τ enters into the definition of T4 and T5 below.) The choice of 0 < κ < τ will be specified below. T1 = {(δ, σ) ∈ 2× 2 | |F | (2+κ)(1+κ) ≤ σ} , T2 = {(δ, σ) ∈ 2× 2 | σ < |F | 2−κ , δ ≥ σ1/κ} , T3 = {(δ, σ) ∈ 2× 2 | σ < |F | 2−κ , δ > σ1/κ} , T4 = {(δ, σ) ∈ 2× 2 | |F | 2−κ ≤ σ < |F | (2+κ)(1+κ) , δ > στ} , T5 = {(δ, σ) ∈ 2× 2 | |F | 2−κ ≤ σ < |F | (2+κ)(1+κ) , δ ≤ στ} , T (T ) = (δ,σ)∈T Sum(S1δ,σ) . Note that for T1 we can use the first term in the minimum in (4.63). T (T1) . (δ,σ)∈T1 δσ−1−κ|F | σ≥|F | (2+κ)(1+κ) σ−1−κ|F | . |F |1− 2+κ . This last exponent on |F | is strictly larger than 1 as desired. The point of the definition of T1 is that when it comes time to use the middle term PROOFS OF LEMMATA 47 of the minimum for ℓ = 1 in (4.63), we can restrict attention to the δ−M+1σ−p+1|F | . For the collection T2, use the last term in the minimum in (4.63). T (T2) . (δ,σ)∈T2 σ≤|F | σ log 1/σ . |F | 2−κ/2 . Again, for 0 < κ < 1, the exponent on |F | above is strictly greater than 1/2. The term T3 can be controlled with the first term in the minimum in (4.63). T (T3) . (δ,σ)∈T3 δσ−1−κ|F | σ|F | . |F | . The term T4 is the heart of the matter. It is here that we use the middle term in the minimum of (4.63), and that the role of τ becomes clear. We estimate T (T4) . (δ,σ)∈T4 δ−Mσ−p+1|F | δ≥|F |τ δ−M |F |1− . |F |1− −Mτ . Recall that 1 < p < 2, so that 0 < p − 1 < 1. Therefore, for 0 < κ sufficiently small, of the order of 2− p, we will have 1− p− 1 2− κ > + 2−p Therefore, choosing τ ≃ (2 − p)/M will leave us with a power on |F | that is strictly larger than 1 The previous term did not specify κ > 0. 
Instead it shows that for 0 < κ < 1 sufficiently small, we can make a choice of τ , that is 48 4. L ESTIMATE FOR Hv independent of κ, for which T (T4) admits the required control. The bound in the last term will specify a choice of κ on us. We estimate T (T5) . (δ,σ)∈T5 δσ−1−κ|F | σ≥|F | σ−1−κ|F |1+τ . |F |1+τ+ Choosing κ = τ/6 will result in the estimate which is as required, so our proof is finished. � Remark 4.65. The resolution of Conjecture 1.21 would depend upon refinements of Lemma 4.50, as well as using the restricted weak type approach of [21]. Proof of Lemma 4.34. We only consider tiles s ∈ AT (ann, scl), and sets ω ∈ Ω which are associated to one of these tiles. For an element a = {as} ∈ ℓ2(AT (ann, scl)), s :ωs=ω asSannαs For |ωs| = |ωs′|, note that dist(ωs,ωs′) is measured in units of scl/ann. By a lemma of Cotlar and Stein, it suffices to provide the estimate ′‖2 . ρ−3, ρ = 1 + dist(ω,ω′). Now, the estimate ‖T ‖2 . 1 is obvious. For the case ω 6= ω′, by Schur’s test, it suffices to see that (4.66) sup s′ :ωs′=ω s :ωs=ω |〈Sannαs, Sannαs′〉| . ρ−3. For tiles s′ and s as above, recall that 〈ϕs, ϕs′〉 = 0, note that |Rs′ ∩ Rs| ann dist(ω,ω′) ≃ ρ−1, and in particular, for a fixed s′, let Ss′ be those s for which ρRs∩ρRs′ 6= ∅. Clearly, card(Ss′) . |ρRs| |2ρRs′ ∩ 2ρRs| ρ ≃ ρ2 PROOFS OF LEMMATA 49 If for r > 1, rRs ∩ rRs′ = ∅, then it is routine to show that |〈Sannαs, Sannαs′〉| . r−10 And so we may directly sum over those s 6∈ Ss′ , s 6∈Ss′ |〈Sannαs, Sannαs′〉| . ρ−3. For those s ∈ Ss′, we estimate the inner product in frequency vari- ables. Recalling the definition of αs = (Tc(Rs)D ζ)ϕs, we have α̂s = (Mod−c(Rs) D ζ̂) ∗ ϕ̂s. Recall that ζ is a smooth compactly supported Schwartz function. We estimate the inner product |〈Ŝannαs, Ŝannαs′〉| without appealing to cancellation. Since we choose the function λ̂ to be supported in an annulus 1 < |ξ| < 3 so that λ̂ann = λ̂(·/ann) is supported in the annulus 1 ann < |ξ| < 3 ann. 
We can restrict our attention to this same range of ξ. In the region |ξ| > ann/4, suppose, without loss of generality, that ξ is closer to ωs than ωs′. Then since ωs and ωs′ are separated by an amount & anndist(ω,ω |α̂s(ξ)α̂s′(ξ)| . χ(2)ωs (ξ)χ dist(ω,ω) . χ(2)ωs (ξ)χ (ξ)ρ−20. Here, χ is the non–negative bump function in (4.40). Hence, we have the estimate ∫ |λ̂ann(ξ)|2|α̂s(ξ)α̂s′(ξ)|dξ . ρ−10. This is summed over the . ρ2 possible choices of s ∈ Ss′ , giving the estimate ∑ s∈Ss′ |〈Sannαs, Sannαs′〉| . ρ−8 . ρ−3. This is the proof of (4.66). And this concludes the proof of Lemma 4.34. Proof of the Principal Organizational Lemma 4.50. Recall that we are to decompose S into four distinct subsets satisfying the favorable estimates of that Lemma. For the remainder of the proof set dense(S) := δ and size(S) := σ. Take Slight to be all those s ∈ S for which there is no tile s′ ∈ AT of density at least δ/2 for which s . s′. It is clear that this set so constructed has density at most δ/2, that this is a set of tiles, and that S1 := S− Slight is also . 50 4. L ESTIMATE FOR Hv The next Lemma and proof comment on the method we use to obtain the middle estimate in (4.53) which depends upon the Lipschitz Kakeya Maximal Function Conjecture 1.14. It will be used to obtain the important inequality (4.82) below. Lemma 4.67. Suppose we have a collection of trees T ∈ T , with these conditions. a: For T ∈ T there is a 1-tree T1 ⊂ T with (4.68) ∆(T1) dx ≥ κσ , b: Each tree has top element s(T) := ωT×RT of density at least c: The collections of tops {s(T) | T ∈ T } are pairwise incompa- rable under the order relation ‘.’. d: For all T ∈ T , γT = γωT×RT ≥ κ−1/2σ−κ/5N . Here, N is the exponent on δ in Conjecture 1.14. Then we have |RT| . δ−Np−1σ−p(1+κ/4)|F |+ σ1/κδ−1.(4.69) |RT| . δ−1.(4.70) Concerning the role of γT, recall from the definition, (4.29), that γs is a quantity that grows as does the ratio scl(s)/‖v‖Lip, hence there are only . 
log σ−1 scales of tiles that do not satisfy the assumption d above. Proof. Our primary interest is in (4.69), which is a consequence of our assumption about the Lipschitz Kakeya Maximal Functions, Con- jecture 1.12. s(T) := ωT × σ−κ/10NRT . Let us begin by noting that κ−1‖v‖Lip ≤ scl(s(T)) ≤ κann(s(T)), T ∈ T ,(4.71) dense(s(T)) ≥ δσκ/10N , T ∈ T ,(4.72) |F ∩Rs(T)| ≥ σ1+κ/4N |Rs(T)| .(4.73) The conclusion (4.71) is straightforward, as is (4.72). The inequality (4.73) follows from Lemma 4.55. PROOFS OF LEMMATA 51 Note that the length of σ−κ/10NRT satisfies σ−κ/10N L(RT) ≤ γT L(RT) scl(s) ‖v‖Lip ≤ (100‖v‖Lip)−1 . (4.74) This is the condition (1.8) that we impose in the definition of the Lipschitz Kakeya Maximal Functions. Observe that we can regard ann(s(T)) ≃ σκ/10ann as a constant independent of T. The point of these observations is that our assumption about the Lipschitz Kakeya Maximal Function applies to the maximal function formed over the set of tiles {s(T) | T ∈ T }. And it will be applied below. Let Tk be the collection of trees so that T ∈ Tk if k ≥ 0 is the smallest integer such that (4.75) |(2kRT) ∩ v−1(ωT) ∩G| ≥ 220k/κ 2−1δ|RT| . Then since the density of s(T) for every tree T ∈ T is at least δ, we have T = k=0 Tk. We can apply Conjecture 1.12 to these collections, with the value of δ in that Conjecture being 220k/κ 2−1δ. For each Tk, we decompose it by the following algorithm. Initialize T selectedk ← ∅, T stockk ← Tk . While T stockk 6= ∅, select T ∈ T stockk such that scl(s(T)) is minimal. Define Tk(T) by Tk(T) = {T′ ∈ Tk : (2kRT) ∩ (2kRT′) 6= ∅ and ωT ⊂ ωT′} . Update T selectedk ← T selectedk ∪ {T} , T stockk ← T stockk \Tk(T) . Thus we decompose Tk into T∈T selected T′∈Tk(T) {T′} . And ∑ |RT| = T∈T selected T′∈Tk(T) |RT′| . Notice that RT′ ’s are disjoint for all T ′ ∈ Tk(T) and they are contained in 5(2kRT). This is so, since the tops of the trees are assumed to be incomparable with respect to the order relation ‘.’ on tiles. 52 4. 
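The selection procedure just described (initialize the stock, repeatedly pick a tree of minimal scale, and discard everything it captures) can be sketched as follows. This is a toy model under stated assumptions: tops are one-dimensional, the class `Top` and the helpers `dilate`, `intersects`, `contains` and `greedy_select` are hypothetical names, and each group is taken from the current stock so that the groups partition the collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Top:
    """Toy tree top: frequency interval omega, spatial interval R, and a scale."""
    name: str
    omega: tuple  # (lo, hi)
    R: tuple      # (lo, hi)
    scale: float


def dilate(interval, factor):
    """Dilate an interval by `factor` about its centre."""
    lo, hi = interval
    c, h = (lo + hi) / 2, (hi - lo) / 2
    return (c - factor * h, c + factor * h)


def intersects(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def greedy_select(tops, k):
    """Map each selected top to the group of stock elements removed with it."""
    stock = list(tops)
    groups = {}
    while stock:
        T = min(stock, key=lambda t: t.scale)      # minimal scale in the stock
        dT = dilate(T.R, 2 ** k)
        # Captured: dilated rectangles meet and omega_T lies inside omega_U.
        group = [U for U in stock
                 if intersects(dT, dilate(U.R, 2 ** k)) and contains(U.omega, T.omega)]
        groups[T] = group                          # T itself is always captured
        stock = [U for U in stock if U not in group]
    return groups


tops = [
    Top("A", (0, 1), (0, 1), 1.0),
    Top("B", (0, 2), (1.5, 2.0), 2.0),
    Top("C", (4, 5), (0, 1), 1.0),
    Top("D", (0, 4), (50, 50.25), 4.0),
]
groups = greedy_select(tops, k=1)
for T, group in groups.items():
    print(T.name, [U.name for U in group])
```

In this example `A` is selected first and captures `B`; `C` and `D` are then selected on their own, since `C` has a disjoint frequency interval and `D` is spatially far away. The loop terminates because every selected top removes at least itself from the stock.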
L ESTIMATE FOR Hv Thus we have |RT| . T∈T selected 22k|RT| . δ−12−10k/κ T∈T selected |(2kRT) ∩ v−1(ωT) ∩G| . Observe that (2kRT)∩v−1(ωT)’s are disjoint for all T ∈ T selectedk . This and the fact that |G| ≤ 1 proves (4.70). To argue for (4.69), we see |RT| . δ−12−10k/κ T∈T selected (2kRT) ∩ v−1(ωT) ∩G . δ−12−10k/κ (2kRT) ∩G ∣∣∣∣ . At this point, Conjecture 1.12 enters. Observe that we can estimate ∣∣∣ . |{Mδ′,v,(σκ/10ann)−1 1F > σ1+κ/4N}| . (δ′)−Npσ−p(1+κ/4N)|F |. . (δ)−Np2−kσ−p(1+κ/4N)|F |. (4.76) Here, δ′ = 220k/κ 2−1δ, the choice of δ′ permitted to us by (4.75), and we have used (4.73) in the first line, to pass to the Lipschitz Kakeya Maximal Function. Hence, |sh(T)| . . δ−1 2−10k/κ (2kRT) ∩G . δ−1 k : 1≤2k≤σ−κ/10 (2kRT) + δ−1 k : 2k>σ−κ/10 2−10k/κ 2 |G| . On the first sum in the last line, we use (4.76), and on the second, we just sum the geometric series, and recall that |G| = 1. PROOFS OF LEMMATA 53 We can now begin the principal line of reasoning for the proof of Lemma 4.50. The Construction of S1large. We use an orthogonality, or TT ∗ argu- ment that has been used many times before, especially in [20] and [15]. (There is a feature of the current application of the argument that is present due to the fact that we are working on the plane, and it is detailed by Lacey and Li [15].) We may assume that all intervals ωs are contained in the upper half of the unit circle in the plane. Fix S ⊂ Av, and σ = size(S). We construct a collection of trees T 1large for the collection S1, and a corresponding collection of 1–trees T 1,1large, with particular properties. We begin the recursion by initializing T 1large ← ∅, T large ← ∅, S1large ← ∅, Sstock ← S1. In the recursive step, if size(Sstock) < 1 σ1+κ/100, then this recursion stops. Otherwise, we select a tree T ⊂ Sstock such that three conditions are met. a: The top of the tree s(T) (which need not be in the tree) satisfies dense(s(T)) ≥ δ/4. b: T contains a 1–tree T1 with (4.77) − ∆(T1) dx ≥ 1 σ1+κ/100 . 
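c: And that $\omega_{\mathbf{T}}$ is in the first place minimal and in the second most clockwise among all possible choices of $\mathbf{T}$. (Since all $\omega_s$ are in the upper half of the unit circle, this condition can be fulfilled.)

We take $\mathbf{T}$ to be the maximal tree in $\mathbf{S}^{\mathrm{stock}}$ which satisfies these conditions.

We then update

$\mathcal{T}^1_{\mathrm{large}} \leftarrow \{\mathbf{T}\} \cup \mathcal{T}^1_{\mathrm{large}}, \quad \mathcal{T}^{1,1}_{\mathrm{large}} \leftarrow \{\mathbf{T}_1\} \cup \mathcal{T}^{1,1}_{\mathrm{large}}, \quad \mathbf{S}^1_{\mathrm{large}} \leftarrow \mathbf{T} \cup \mathbf{S}^1_{\mathrm{large}}, \quad \mathbf{S}^{\mathrm{stock}} \leftarrow \mathbf{S}^{\mathrm{stock}} - \mathbf{T}.$

The recursion then repeats. Once the recursion stops, we update

$\mathbf{S}_1 \leftarrow \mathbf{S}^{\mathrm{stock}}.$

It is this collection that we analyze in the next subsection. Note that it is a consequence of the recursion, and Remark 4.44, that the tops of the trees $\{ s(\mathbf{T}) \mid \mathbf{T} \in \mathcal{T}^1_{\mathrm{large}} \}$ are pairwise incomparable under '$\lesssim$'.

The bottom estimate of (4.53) is then immediate from the construction and (4.70).

First, we turn to the deduction of the first estimate of (4.53). Let $\mathcal{T}^{1,(1)}_{\mathrm{large}}$ be the set

$\mathcal{T}^{1,(1)}_{\mathrm{large}} = \Big\{ \mathbf{T} \in \mathcal{T}^1_{\mathrm{large}} : \sum_{s \in \mathbf{T}} |\langle S_{\mathrm{ann}} 1_F, \beta_s \rangle|^2 < \tfrac{1}{16} \sigma^{2+\kappa/50} |R_{\mathbf{T}}| \Big\}.$

And let $\mathcal{T}^{1,(2)}_{\mathrm{large}}$ be the set

$\mathcal{T}^{1,(2)}_{\mathrm{large}} = \Big\{ \mathbf{T} \in \mathcal{T}^1_{\mathrm{large}} : \sum_{s \in \mathbf{T}} |\langle S_{\mathrm{ann}} 1_F, \beta_s \rangle|^2 \ge \tfrac{1}{16} \sigma^{2+\kappa/50} |R_{\mathbf{T}}| \Big\}.$

In the inner products, we are taking $\beta_s$, which is supported off of $\gamma_s R_s$. Since $\mathbf{T} \in \mathcal{T}^1_{\mathrm{large}}$ satisfies

(4.78)  $\frac{1}{|R_{\mathbf{T}}|} \int_{R_{\mathbf{T}}} \Delta(\mathbf{T})\, dx \ge \tfrac{1}{2} \sigma^{1+\kappa/100},$

we have

$\sum_{s \in \mathbf{T}} |\langle S_{\mathrm{ann}} 1_F, \alpha_s \rangle|^2 \ge \tfrac{1}{4} \sigma^{2+\kappa/50} |R_{\mathbf{T}}|.$

Thus, if $\mathbf{T} \in \mathcal{T}^{1,(1)}_{\mathrm{large}}$, we have

$\sum_{s \in \mathbf{T}} |\langle 1_F, \varphi_s \rangle|^2 \ge \tfrac{1}{8} \sigma^{2+\kappa/50} |R_{\mathbf{T}}|.$

The replacement of $\alpha_s$ by $\varphi_s$ in the inequality above is an important point for us. That we can then drop the $S_{\mathrm{ann}}$ is immediate.

With this construction and observation, we claim that

(4.79)  $\sum_{\mathbf{T} \in \mathcal{T}^{1,(1)}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim (\log 1/\sigma)^2 \sigma^{-2-\kappa/50} |F|.$

Proof of (4.79). This is a variant of the argument for the 'Size Lemma' in [15], and so we will not present all details.

Begin by making a further decomposition of the trees $\mathbf{T} \in \mathcal{T}^{1,(1)}_{\mathrm{large}}$. To each such tree, we have a 1-tree $\mathbf{T}_1 \subset \mathbf{T}$ which satisfies (4.77). We decompose $\mathbf{T}_1$. Set

$\mathbf{T}_1(0) = \Big\{ s \in \mathbf{T}_1 \;\Big|\; \frac{|\langle f, \varphi_s \rangle|}{\sqrt{|R_s|}} < \sigma^{1+\kappa/100} \Big\},$

$\mathbf{T}_1(j) = \Big\{ s \in \mathbf{T}_1 \;\Big|\; 4^{j-1} \sigma^{1+\kappa/100} \le \frac{|\langle f, \varphi_s \rangle|}{\sqrt{|R_s|}} < 4^{j} \sigma^{1+\kappa/100} \Big\}, \qquad 1 \le j \le j_0 = C \log 1/\sigma.$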
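Now, set $\mathcal{T}(j)$ to be those $\mathbf{T} \in \mathcal{T}^{1,(1)}_{\mathrm{large}}$ for which

(4.80)  $\sum_{s \in \mathbf{T}_1(j)} |\langle f, \varphi_s \rangle|^2 \ge (2 j_0)^{-1} \sigma^{2+\kappa/50} |R_{\mathbf{T}}|, \qquad 0 \le j \le j_0.$

It is the case that each $\mathbf{T} \in \mathcal{T}^{1,(1)}_{\mathrm{large}}$ is in some $\mathcal{T}(j)$, for $0 \le j \le j_0$.

The central case is that of $j = 0$. We can apply the 'Size Lemma' of [15] to deduce that

$\sum_{\mathbf{T} \in \mathcal{T}(0)} |R_{\mathbf{T}}| \le (2 j_0) \sigma^{-2-\kappa/50} \sum_{\mathbf{T} \in \mathcal{T}(0)} \sum_{s \in \mathbf{T}_1(0)} |\langle f, \varphi_s \rangle|^2 \lesssim (\log 1/\sigma)\, \sigma^{-2-\kappa/50} |F|.$

The point here is that to apply the argument in the 'Size Lemma' one needs an average case estimate, namely (4.80), as well as a uniform control, namely the condition defining $\mathbf{T}_1(0)$. This proves (4.79) in this case.

For $1 \le j \le j_0$, we can apply the 'Size Lemma' argument to the individual tiles in the collection

$\{ \mathbf{T}_1(j) \mid \mathbf{T} \in \mathcal{T}(j) \}.$

The individual tiles satisfy the definition of a 1-tree. And the defining condition of $\mathbf{T}_1(j)$ is both the average case estimate and the uniform control needed to run that argument. In this case we conclude that

$\sum_{\mathbf{T} \in \mathcal{T}(j)} \sum_{s \in \mathbf{T}_1(j)} |\langle f, \varphi_s \rangle|^2 \lesssim |F|.$

Thus, we can estimate

$\sum_{\mathbf{T} \in \mathcal{T}(j)} |R_{\mathbf{T}}| \lesssim (\log 1/\sigma)\, \sigma^{-1-\kappa/50} |F|.$

This summed over $1 \le j \le j_0 = C \log 1/\sigma$ proves (4.79). □

For $\mathcal{T}^{1,(2)}_{\mathrm{large}}$, we have

$\sum_{\mathbf{T} \in \mathcal{T}^{1,(2)}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim \sigma^{-2-\kappa/50} \sum_{\mathrm{scl} \ge \kappa^{-1}\|v\|_{\mathrm{Lip}}} \; \sum_{s : \mathrm{scl}(s) = \mathrm{scl}} |\langle S_{\mathrm{ann}} 1_F, \beta_s \rangle|^2 \lesssim \sigma^{-2-\kappa/50} |F|,$

where the sum over scales converges since $\beta_s$ has fast decay. The Bessel inequality in the last display can be obtained by using the same argument as in the proof of Lemma 4.34. Hence we get

(4.81)  $\sum_{\mathbf{T} \in \mathcal{T}^{1,(2)}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim \sigma^{-2-\kappa/50} |F|.$

Combining (4.79) and (4.81), we obtain the first estimate of (4.53).

Second, we turn to the deduction of the middle estimate of (4.53), which relies upon the Lipschitz Kakeya Maximal Function. Let $\mathcal{T}^{1,\mathrm{good}}_{\mathrm{large}}$ be the set

$\{ \mathbf{T} \in \mathcal{T}^1_{\mathrm{large}} : \gamma_{\mathbf{T}} \ge \kappa^{-1/2} \sigma^{-\kappa/5N} \}.$

And let $\mathcal{T}^{1,\mathrm{bad}}_{\mathrm{large}}$ be the set

$\{ \mathbf{T} \in \mathcal{T}^1_{\mathrm{large}} : \gamma_{\mathbf{T}} < \kappa^{-1/2} \sigma^{-\kappa/5N} \}.$

The 'good' collection can be controlled by facts which we have already marshaled together. In particular, we have been careful to arrange the construction so that Lemma 4.67 applies.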
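By the main conclusion of that Lemma, (4.69), we have

(4.82)  $\sum_{\mathbf{T} \in \mathcal{T}^{1,\mathrm{good}}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim \delta^{-M} \sigma^{-1-3\kappa/4} |F| + \sigma^{1/\kappa} \delta^{-1}.$

Here, $M$ is a large constant that only depends upon $N$ in Conjecture 1.14.

For $\mathbf{T} \in \mathcal{T}^{1,\mathrm{bad}}_{\mathrm{large}}$, there are at most $K = O(\log(\sigma^{-\kappa}))$ many possible scales for $\mathrm{scl}(\omega_{\mathbf{T}} \times R_{\mathbf{T}})$. Let $\mathrm{scl}(\mathbf{T}) = \mathrm{scl}(\omega_{\mathbf{T}} \times R_{\mathbf{T}})$. Thus we have

$\sum_{\mathbf{T} \in \mathcal{T}^{1,\mathrm{bad}}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim \sum_{m=0}^{K} \; \sum_{\mathbf{T} : \mathrm{scl}(\mathbf{T}) = 2^m \kappa^{-1} \|v\|_{\mathrm{Lip}}} |R_{\mathbf{T}}|.$

Since $\mathbf{T}$ satisfies (4.78), we have

$|F \cap \gamma_{\mathbf{T}} R_{\mathbf{T}}| \gtrsim \sigma^{1+\kappa/2} |R_{\mathbf{T}}|.$

Thus, we get

$\sum_{\mathbf{T} \in \mathcal{T}^{1,\mathrm{bad}}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim \sigma^{-1-\kappa/2} \sum_{m=0}^{K} \int_F \; \sum_{\mathbf{T} : \mathrm{scl}(\mathbf{T}) = 2^m \kappa^{-1} \|v\|_{\mathrm{Lip}}} 1_{\sigma^{-\kappa} R_{\mathbf{T}}}(x)\, dx.$

For the tiles with a fixed scale, we have the following inequality, which is a consequence of Lemma 4.25.

$\sum_{\mathbf{T} : \mathrm{scl}(\mathbf{T}) = 2^m \kappa^{-1} \|v\|_{\mathrm{Lip}}} 1_{\sigma^{-\kappa} R_{\mathbf{T}}} \lesssim \sigma^{-\kappa/5} \delta^{-1}.$

Hence we obtain

(4.83)  $\sum_{\mathbf{T} \in \mathcal{T}^{1,\mathrm{bad}}_{\mathrm{large}}} |R_{\mathbf{T}}| \lesssim \delta^{-1} \sigma^{-1-3\kappa/4} |F|.$

Combining (4.82) and (4.83), we obtain the middle estimate of (4.53). This completes the proof of (4.53).

The Construction of $\mathbf{S}^2_{\mathrm{large}}$. It is important to keep in mind that we have only removed trees of nearly maximal size, with tops of a given density. In the collection of tiles that remain, there can be trees of large size, but they cannot have a top with nearly maximal density.

We repeat the $TT^*$ construction of the previous step in the proof, with two significant changes.

We construct a collection of trees $\mathcal{T}^2_{\mathrm{large}}$ from the collection $\mathbf{S}_1$, and a corresponding collection of 1-trees $\mathcal{T}^{2,1}_{\mathrm{large}}$, with particular properties. We begin the recursion by initializing

$\mathcal{T}^2_{\mathrm{large}} \leftarrow \emptyset, \quad \mathcal{T}^{2,1}_{\mathrm{large}} \leftarrow \emptyset, \quad \mathbf{S}^2_{\mathrm{large}} \leftarrow \emptyset, \quad \mathbf{S}^{\mathrm{stock}} \leftarrow \mathbf{S}_1.$

In the recursive step, if $\operatorname{size}(\mathbf{S}^{\mathrm{stock}}) < \sigma/2$, then this recursion stops. Otherwise, we select a tree $\mathbf{T} \subset \mathbf{S}^{\mathrm{stock}}$ such that two conditions are met.

a: $\mathbf{T}$ satisfies $\|\Delta(\mathbf{T})\|_2 \ge \tfrac{\sigma}{2} |R_{\mathbf{T}}|^{1/2}$.

b: $\omega_{\mathbf{T}}$ is both minimal and most clockwise among all possible choices of $\mathbf{T}$.

We take $\mathbf{T}$ to be the maximal tree in $\mathbf{S}^{\mathrm{stock}}$ which satisfies these conditions. We take $\mathbf{T}_1 \subset \mathbf{T}$ to be a 1-tree so that

(4.84)  $\frac{1}{|R_{\mathbf{T}}|} \int_{R_{\mathbf{T}}} \Delta(\mathbf{T}_1)\, dx \ge \kappa\sigma.$

This last inequality must hold by Lemma 4.49. We then update

$\mathcal{T}^2_{\mathrm{large}} \leftarrow \{\mathbf{T}\} \cup \mathcal{T}^2_{\mathrm{large}}, \quad \mathcal{T}^{2,1}_{\mathrm{large}} \leftarrow \{\mathbf{T}_1\} \cup \mathcal{T}^{2,1}_{\mathrm{large}}, \quad \mathbf{S}^{\mathrm{stock}} \leftarrow \mathbf{S}^{\mathrm{stock}} - \mathbf{T}.$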
The recursion then repeats. 58 4. L ESTIMATE FOR Hv Once the recursion stops, it is clear that the size of Sstock is at most σ/2, and so we take Ssmall := Sstock. The estimate ∑ T∈T 2 large |RT| . σ−2|F | then is a consequence of the TT ∗ method, as indicated in the previous step of the proof. That is the first estimate claimed in (4.54). What is significant is the second estimate of (4.54), which involves the density. The point to observe is this. Consider any tile s of density at least δ/2. Let Ts be those trees T ∈ T 2large with top ωs(T) ⊃ ωs and Rs(T) ⊂ KRs. By the construction of S1large, we must have ∆(T1) dx ≤ σ1+κ/100 , for the maximal 1–tree T 1 contained in T∈Ts T. But, in addition, the tops of the trees in T 2large are pairwise incomparable with respect to the order relation ‘.,’ hence we conclude that |RT| . σ2+κ/50|Rs|. Moreover, by the construction of Slight, for each T ∈ T 2large we must be able to select some tile s with density at least δ/2 and ωs(T) ⊃ ωs and Rs(T) ⊂ KRs. Thus, we let S∗ be the maximal tiles of density at least δ/2. Then, the inequality (4.70) applies to this collection. And, therefore, T∈T 2 large |RT| ≤ σκ/50 |Rs| . σκ/50δ−1. This completes the proof of second estimate of (4.54). � The Estimates For a Single Tree. The Proof of Lemma 4.55. It is a routine matter to check that for any 1–tree we have ∑ |〈f, ϕs〉|2 . ‖f‖22. Indeed, there is a strengthening of this estimate relevant to our concerns here. Recalling the notation (4.40), we have (4.85) |〈f, ϕs〉|2 ]1/2∥∥∥ . ‖χ(∞)RT f‖p , 1 < p <∞ . PROOFS OF LEMMATA 59 This is variant of the Littlewood-Paley inequalities, with some addi- tional spatial localization in the estimate. Using this inequality for p = 1 + κ/100 and the assumption of the Lemma, we have σ1+κ/100 ≤ ∆T dx ]1+κ/100 1+κ/100 ≤ |RT|−1 |〈f, ϕs〉|2 ]1/2∥∥∥ 1+κ/100 1+κ/100 . |RT|−1 dx.(4.86) This inequality can only hold if |F ∩ σ−κRT| ≥ σ1+κ|RT|. � The Proof of Lemma 4.57. 
This Lemma is closely related to the Tree Lemma of [15]. Let us recall that result in a form that we need it. We need analogs of the definitions of density and size that do not incorporate truncations of the various functions involved. Define dense(s) := G∩v−1(ωs) (x) dx. (Recall the notation from (4.40).) dense(T) := sup dense(s). Likewise define size(T) := sup ′ is a 1–tree |RT′|−1 |〈1F , ϕs〉|2 Then, the proof of the Tree Lemma of [15] will give us this inequality: For T a tree, (4.87) |〈Sann 1F , ϕs〉〈φs, 1G〉| . dense(T) size(T)|RT|. Now, consider a tree T with dense(T) = δ, and size(T) = σ, where we insist upon using the original definitions of density and size. If in addition, γs ≥ K(σδ)−1 for all s ∈ T, we would then have the inequalities dense(T) . δ, size(T) . σ, 60 4. L ESTIMATE FOR Hv This places (4.87) at our disposal, but this is not quite the estimate we need, as the functions ϕs and φs that occur in (4.87) are not truncated in the appropriate way, and it is this matter that we turn to next. Recall that ϕs = αs + βs , αs(x− yv(x))ψs(y) dy = αs−(x) + αs+(y) . One should recall the displays (4.30), (4.31), and (4.38). As an immediate consequence of the definition of βs, we have∫ |βs(x)| dx . γ−2s |Rs|. Hence, if we replace ϕs by βs, we have |〈Sann 1F , βs〉〈φs, 1G〉| . |Rs||〈φs, 1G〉| γ−1s |Rs| . σδ|RT|. And by a very similar argument, one sees corresponding bounds, in which we replace the φs by different functions. Namely, recalling the definitions of as± in (4.31) and estimate (4.38), we have |Rs||〈as+, 1G〉| . σ (‖v‖Lip scl(s) (x) dx(4.88) (‖v‖Lip scl(s) . σδ|RT| . Similarly, we have |Rs||〈φs − as+ − as−, 1G〉| . σδ|RT|, Putting these estimates together proves our Lemma, in particular (4.58), under the assumption that γs ≥ K(σδ)−1 for all s ∈ T. Assume that T is a tree with scl(s) = scl(s′) for all s, s′ ∈ T. That is, the scale of the tiles in the tree is fixed. 
Then, T is in particular a 1– tree, so that by an application of the definitions and Cauchy–Schwartz, |〈Sann 1F , αs〉〈as−, 1G〉| ≤ δ |〈Sann 1F , αs〉| ≤ δσ|RT|. PROOFS OF LEMMATA 61 But, γs ≥ 1 increases as does scl(s). Thus, any tree T with γs ≤ K(σδ)−1 for all s ∈ T, is a union of O(|log δσ|) trees for which the last estimate holds. � CHAPTER 5 Almost Orthogonality Between Annuli Application of the Fourier Localization Lemma We are to prove Lemma 4.23, and in doing so rely upon a technical lemma on Fourier localization, Lemma 5.56 below. We can take a choice of 1 < α < 9 , and assume, after a dilation, that ‖v‖Cα = 1. The first inequality we establish is this. Lemma 5.1. Using the notation of of Lemma 4.23, and assuming that ‖v‖Cα . 1, we have the estimate ‖C‖2 . 1, where ann≥1 where the Cann are defined in (4.21). We have already established Lemma 4.22, and so in particular know that ‖Cann‖2 . 1. Due to the imposition of the Fourier restriction in the definition of these operators, it is immediate that CannC∗ann′ ≡ 0 for ann 6= ann′. We establish that Cann′‖2 . max(ann, ann′)−δ , δ = 1 (α− 1) , |log ann(ann′)−1| > 3 . (5.2) Then, it is entirely elementary to see that C is a bounded operator. Let Pann be the Fourier projection of f onto the frequencies ann < |ξ| < 2ann. Observe, ‖Cf‖22 = ann≥1 Cann Pann f ann≥1 ann′>1 〈Cann Pann f, Cann′ Pann′ f〉 ≤ 2‖f‖2 ann≥1 ann′>1 Cann′ Pann′ f‖2 . ‖f‖22 ann≥1 ann′>1 max(ann, ann′)−δ . ‖f‖22. 64 5. ALMOST ORTHOGONALITY There are only O(log ann) possible values of scl that contribute to Cann, and likewise for Cann′. Thus, if we define (5.3) Cann,sclf = s∈AT (ann) scl(s)=scl 〈f, ϕs〉φs , it suffices to prove Lemma 5.4. Using the notation of of Lemma 4.23, and assuming that ‖v‖Cα . 1, we have ann,sclCann′,scl′‖2 . (max(ann, ann′))−δ . Here, we can take δ′ = 1 (α − 1), and the inequality holds for all |log ann(ann′)−1| > 3, 1 < scl ≤ ann and 1 < scl′ ≤ ann′. Proof of Lemma 4.23. 
In this proof, we assume that Lemma 5.1 and Lemma 5.4 are established. The first Lemma clearly establishes the first (and more important) claim of the Lemma. Let us prove the inequality (4.24). Using the notation of this sec- tion, this inequality is as follows. (5.5) ann=−∞ Cann,scl . (1 + log(1 + scl−1‖v‖Cα)). This inequality holds for all choices of Cα vector fields v. Note that Lemma 5.4 implies immediately ann=3 Cann,scl . 1 , ‖v‖Cα = 1 . We are however in a scale invariant situation, so that this inequality implies this equivalent form, independent of assumption on the norm of the vector field. (5.6) ann≥8‖v‖Cα Cann,scl . 1 . On the other hand, Lemma 4.25, implies that independent of any assumption other than measurability, we have have the inequality ‖Cann,scl‖2 . 1 . To prove (5.5), use the inequality (5.6), and this last inequality together with the simple fact that for a fixed value of scl, there are at most . 1 + log(1 + scl−1‖v‖Cα)) values of ann with scl ≤ ann ≤ 8‖v‖Cα. APPLICATION OF THE FOURIER LOCALIZATION LEMMA 65 We use the notation AT (ann, scl) := {s ∈ AT (ann) : scl(s) = scl} , Observe that as the scale is fixed, we have a Bessel inequality for the functions {ϕs | s ∈ AT (ann, scl)}. Thus, ann,sclCann′,scl′f‖22 = s∈AT (ann,scl) s∈AT (ann′,scl′) 〈φs, φs′〉〈ϕs′, f〉ϕs s∈AT (ann,scl) s∈AT (ann′,scl′) 〈φs, φs′〉〈ϕs′, f〉 At this point, the Schur test suggests itself, and indeed, we need a quantitative version of the test, which we state here. Proposition 5.7. Let A = {ai,j} be a matrix acting on ℓ2(N) by ai,jxj Then, we have the following bound on the operator norm of A. ‖A‖2 . sup |ai,j| · sup |ai,j| We assume that 1 ≤ ann < 1 ′. For a subset S ⊂ AT (ann, scl)× AT (ann′, scl′) Consider the operator and definitions below. AS f = (s,s′)∈S 〈φs, φs′〉〈ϕs′, f〉ϕs , FL(s,S) = s′∈AT (ann′,scl′) |〈φs, φs′〉| , FL(S) = sup FL(s,S) . Here ‘FL’ is for ‘Fourier Localization’ as this term is to be controlled by Lemma 5.56. 
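The quantitative Schur test of Proposition 5.7 bounds the squared operator norm of a matrix by the product of its largest absolute row sum and its largest absolute column sum. A minimal numerical sketch for finite matrices follows; the helper names `schur_bound` and `norm_squared_estimate` are hypothetical, and the norm is only estimated from below by power iteration, so the comparison illustrates the inequality rather than proving it.

```python
import random


def schur_bound(A):
    """(max absolute row sum) * (max absolute column sum), an upper bound for ||A||^2."""
    row = max(sum(abs(x) for x in r) for r in A)
    col = max(sum(abs(r[j]) for r in A) for j in range(len(A[0])))
    return row * col


def norm_squared_estimate(A, iters=200, seed=0):
    """Estimate ||A||^2 from below by power iteration on A^T A."""
    rng = random.Random(seed)
    n = len(A[0])
    x = [rng.uniform(-1, 1) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        y = [sum(a * xj for a, xj in zip(r, x)) for r in A]               # y = A x
        z = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]  # z = A^T y
        est = sum(v * v for v in y) / sum(v * v for v in x)  # |Ax|^2 / |x|^2 <= ||A||^2
        nz = sum(v * v for v in z) ** 0.5
        if nz == 0:
            break
        x = [v / nz for v in z]
    return est


rng = random.Random(1)
A = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
print(norm_squared_estimate(A), "<=", schur_bound(A))
```

For the all-ones 2 by 2 matrix both quantities equal 4, so the bound can be sharp. In the argument of the text the gain comes from the asymmetry: one of the two factors may be large as long as the other is small enough to compensate.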
We will use the notations FL(s′,S), and FL′(S), which are defined similarly, with the roles of s and s′ reversed. By Proposition 5.7, we have the inequality (5.8) ‖AS‖22 . FL(S) · FL′(S) . We shall see that typically FL(S) will be somewhat large, but is bal- anced out by FL′(S). We partition AT (ann, scl) × AT (ann′, scl′) into three disjoint sub- collections Su, u = 1, 2, 3, defined as follows. In this display, (s, s′) ∈ 66 5. ALMOST ORTHOGONALITY AT (ann, scl)×AT (ann′, scl′). (s, s′) | scl ≥ scl ,(5.9) (s, s′) | scl , scl < scl′ ,(5.10) (s, s′) | scl , scl′ < scl .(5.11) A further modification to these collections must be made, but it is not of an essential nature. For an integer j ≥ 1, and (s, s′) ∈ Su, for u = 1, 2, 3, write (s, s′) ∈ Su,j if j is the smallest integer such that 2j+2Rs ∩ 2j+2Rs′ 6= ∅. We apply the inequalities (5.8) to the collections Su,j, to prove the inequalities (5.12) ‖ASu,j‖2 . 2−j(ann′)−δ where δ′ = 1 (α − 1). This proves Lemma 5.4, and so completes the proof of Lemma 5.1. In applying (5.8) it will be very easy to estimate FL(s,S), with a term that decreases like say 2−10j. The difficult part is to estimate either FL(s,S) or FL′(S) by a term with decreases faster than a small power of (ann′)−1. for which we use Lemma 5.56. Considering a term 〈φs, φs′〉, the inner product is trivially zero if ωs ∩ ωs′ = ∅. We assume that this is not the case below. To apply Lemma 5.56, fix e ∈ ω′s ∩ωs. Let α be a Schwartz function on R with α̂ supported on [ann′, 2ann′], and identically one on 3 [ann′, 2ann′]. Set β̂(θ) := α̂(θ − 3 ′). We will convolve φs with β in the direction e, and φs′ with α also in the direction e, thereby obtaining orthogonal functions. Define Ie g(x) = g(x− ye)β(y) dy,(5.13) ∆s = φs − Ie φs ∆s′ = φs′ − Ie φs′ (5.14) By construction, we have 〈φs, φs′〉 = 〈Ie φs +∆s, Ie φs′ +∆s′〉 = 〈Ie φs,∆s′〉+ 〈∆s, Ie φs′〉+ 〈∆s,∆s′〉 . 
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 67 It falls to us to estimate terms like s∈Sℓ,j |〈∆s, Ie φs′〉|,(5.15) s∈Sℓ,j |〈Ie φs,∆s′〉|,(5.16) s∈Sℓ,j |〈∆s,∆s′〉|.(5.17) as well as the dual expressions, with the roles of s and s′ reversed. The differences ∆s and ∆s′ are frequently controlled by Lemma 5.56. Concerning application of this Lemma to ∆s, observe that Mod−c(ωs)∆s = Mod−c(ωs) φs − [Mod−c(ωs) φs(x− ye)]β̃(y) dy where β̃(y) = e(c(ωs)·e)y β(y). Now the Fourier transform of β is identi- cally one in a neighborhood of the origin of width comparable to ann′, where as |c(ωs) · e| is comparable to ann. Since we can assume that ′ > ann+3, say, the function β̃ meets the hypotheses of Lemma 5.56, namely it is Schwarz function with Fourier transform identically one in a neighborhood of the origin, and the width of that neighborhood is comparable to ann′. And so ∆s is bounded by the bounded by the three terms in (5.57)—(5.59) below. In these estimates, we take 2k ≃ ann′ > 1. By a similar argument, one sees that Lemma 5.56 also applies to ∆s′. We will let ∆s,m, for m = 1, 2, 3, denote the terms that come from (5.57), (5.58), and (5.59) respectively. We use the corresponding no- tation for ∆s,m, for m = 1, 2, 3. A nice feature of these estimates, is that while ∆s and ∆s′ depend upon the choice of e ∈ ωs′ ∩ ωs, the upper bounds in the first two estimates do not depend upon the choice of e. While the third estimate does, the dependence of the set Fs on the choice of e is rather weak. In application of (5.58), the functions ∆s,2 will be very small, due to the term (ann′)−10 which is on the right in (5.58). This term is so much smaller than all other terms involved in this argument that these terms are very easy to control. So we do not explicitly discuss the case of ∆s,2, or ∆s′,2 below. In the analysis of the terms (5.15) and (5.16), we frequently only need to use an inequality such as |Ie φs′| . χ(2)Rs′ . 
When it comes to the analysis of (5.17), the function $\Delta_{s'}$ obeys the same inequality, so that these sums can be controlled by the same analysis that controls (5.15) or (5.16). So we will not explicitly discuss these cases below.

In order for $\langle \varphi_s, \varphi_{s'} \rangle \ne 0$, we must necessarily have $\omega_s \cap \omega_{s'} \ne \emptyset$. Thus, we update all $S_{\ell,j}$ as follows:

\[
S_{\ell,j} \leftarrow \{ (s, s') \in S_{\ell,j} \mid \omega_s \cap \omega_{s'} \ne \emptyset \}.
\]

The Proof of (5.12) for $S_{1,j}$, $j \ge 1$. Recall the definition of $S_{1,j}$ from (5.9). In particular, for $(s, s') \in S_{1,j}$, we must have $\omega_s \subset \omega_{s'}$. We will use the inequality (5.8), and show that for $0 < \epsilon < 1$,

\begin{align}
\mathrm{FL}(S_{1,j}) &\lesssim 2^{-10j} (\mathrm{ann}')^{-\tilde\alpha} \sqrt{\frac{\mathrm{scl}' \cdot \mathrm{ann}'}{\mathrm{scl} \cdot \mathrm{ann}}}, \tag{5.18} \\
\mathrm{FL}'(S_{1,j}) &\lesssim 2^{2j} (\mathrm{ann}')^{\epsilon} \sqrt{\frac{\mathrm{scl} \cdot \mathrm{ann}}{\mathrm{scl}' \cdot \mathrm{ann}'}}. \tag{5.19}
\end{align}

Notice that in the second estimate, we permit some slow increase in the estimates as a function of $2^j$ and $\mathrm{ann}'$. But, due to the form of the estimate of the Schur test in (5.8), this slow growth is acceptable. The terms inside the square root in these two estimates cancel out. These inequalities conclude the proof of the inequality (5.12) for the collection $S_{1,j}$, $j \ge 1$.

We prove (5.18). For this, we use Lemma 5.56. That is, we should bound the several terms

\begin{align}
&\sum_{s' \,:\, (s,s') \in S_{1,j}} |\langle \Delta_s, I_e \varphi_{s'} \rangle|, \tag{5.20} \\
&\sum_{s' \,:\, (s,s') \in S_{1,j}} |\langle I_e \varphi_s, \Delta_{s'} \rangle|, \tag{5.21} \\
&\sum_{s' \,:\, (s,s') \in S_{1,j}} |\langle \Delta_s, \Delta_{s'} \rangle|. \tag{5.22}
\end{align}

Here $\Delta_s$ and $\Delta_{s'}$ are as in (5.14), and $I_e$ is defined as in (5.13). We can regard the tile $s$ as fixed, and so fix a choice of $e \in \omega_s$.

In the next two cases, we will need to estimate the same expressions as above. In all three cases, Lemma 5.56 is applied with $2^k \simeq \mathrm{ann}'$, and we can take $\epsilon$ in this Lemma to be a fixed small multiple of $\alpha - 1$. For ease of notation, we set

\[
\tilde\alpha = (\alpha - 1)(1 - \epsilon)^2 - \epsilon > 0. \tag{5.23}
\]

As we have already mentioned, we do not explicitly discuss the upper bound on the estimate for (5.22).

The Upper Bound on (5.20). We write $\Delta_s = \Delta_{s,1} + \Delta_{s,2} + \Delta_{s,3}$, where these three terms are those on the right in (5.57)–(5.59) respectively. Note that (5.24) |Ie φs′| .
χ(2)Rs′ , since Ie is convolution in the long direction of Rs′ , at the scale of (ann′)−1, which is much smaller than the length of Rs′ in the direc- tion e. Therefore, we can estimate the term in (5.20) by s′ : (s,s′)∈S1,j |〈∆s,1, Ie φs′〉| . (ann′)−eα2−10j s′ : (s,s′)∈S1,j . (ann′)−eα2−10j ′ · ann′ scl · ann .(5.25) This is as required to prove (5.18) for these sums. For the terms associated with ∆s,3, we have s′ : (s,s′)∈S1,j |〈∆s,3, Ie φs′〉| . s′ : (s,s′)∈S1,j |Rs|−1/2 · χ(2)R′s dx . 2−10j|Fs| ann′ · scl′ · ann · scl . 2−10j(ann′)−α+ǫ ′ · ann′ scl · ann . That is, we only rely upon the estimate (5.60). This completes the analysis of (5.20). (As we have commented above, we do not explicitly discuss the case of ∆s,2.) The Upper Bound for (5.21). Since ωs ⊂ ωs′ , the only facts about ∆s′ we need are (ann′)ǫR′s |∆s′| dx . (ann′)−eα+ǫ ′ · ann′ , |∆s′(x)| . (ann′)−eαχ(2)Rs′ (x) , x 6∈ (ann ′)ǫRs′ . (5.26) Indeed, this estimate is a straightforward consequence of the various conclusions of Lemma 5.56. (We will return to this estimate in other cases below.) 70 5. ALMOST ORTHOGONALITY These inequalities, with |Ie φs| . χ(2)Rs , permit us to estimate (5.21) . 2−20j |Rs|−1/2 s′ : (s,s′)∈S1,j (ann′)ǫR′s |∆s′| dx . 2−20j(ann′)−eα scl · ann ′ · ann′ × ♯{s ′ : (s, s′) ∈ S1,j} . 2−20j(ann′)−eα ′ · ann′ scl · ann . which is the required estimate. Here of course we use the estimate ♯{s′ : (s, s′) ∈ S1,j} . 22j ′ · ann′ scl · ann . We now turn to the proof of (5.19), where it is important that we justify the small term √ scl · ann ′ · ann′ on the right in (5.19). We estimate the terms dual to (5.20)—(5.22), namely s : (s,s′)∈S1,j |〈∆s, Ies φs′〉| ,(5.27) s : (s,s′)∈S1,j |〈Ies φs,∆s′〉| ,(5.28) s : (s,s′)∈S1,j |〈∆s,∆s′〉| .(5.29) Here, for each choice of tile s, we make a choice of es ∈ ωs ⊂ ωs′. The Upper Bound on (5.27). We have an inequality analogous to (5.24). (5.30) |Ies φs′| . 
χ Note that as we can view s′ as fixed, all the tiles {s : (s, s′) ∈ S1,j} have the same approximate spatial location. Let us single out a tile s0 in this collection. Then, for all s, we have Rs ⊂ 2j+2Rs0 . Recalling the specific information about the support of the functions of ∆s from (5.57), (5.59) and (5.61), it follows that s : (s,s′)∈S1,j |∆s| . 22j(ann′)ǫχ(2)2j+2Rs0 . APPLICATION OF THE FOURIER LOCALIZATION LEMMA 71 In particular, we do not claim any decay in ann′ in this estimate. (The small growth of (ann′)ǫ above arises from the overlapping supports of the functions ∆s, as detailed in Lemma 5.56.) Therefore, we can esti- s : (s,s′)∈S1,j |〈∆s, Ies φs′〉| . 22j(ann′)ǫ 2jRs0 . 2−10j(ann′)ǫ scl · ann ′ · ann′ . This is as required in (5.19). Remark 5.31. It is the analysis of the term s : (s,s′)∈S1,j |〈∆s,3, Ies φs′〉| which prevents us from obtaining a decay in ann′, at least in some choices of the parameters scl , ann , scl′, and ann′. The Upper Bound on (5.28). The fact about ∆s′ we need is the simple inequality |∆s′| . χ(2)Rs′ . As in the previous case, we turn to the fact that all the tile {s : (s, s′) ∈ S1,j} have the same approximate spatial location. Single out a tile s0 in this collection, so that Rs ⊂ 2j+2Rs0 for all such s. Our claim is that (5.32) s : (s,s′)∈S1,j |Ies φs| . 22jχ 22jRs (We will have need of related inequalities below.) Suppose that s ∈ {s : (s, s′) ∈ S1,j}. These intervals all have the same length, namely scl/ann. And x 6∈ supp(φs) implies v(x) 6∈ ωs, so that by the Lipschitz assumption on the vector field dist(x, supp(φs)) & dist(v(x),ωs) . This means that (5.33) |Ies φs(x)| . χ 1 + ann′ · dist(v(x),ωs) Here, we recall that the operator Ie is dominated by the operator which averages on spatial scale (ann′)−1 in the direction e. Moreover, we have (5.34) ann′ · dist(ωs,ω) & scl . Here, we partition the unit circle into disjoint intervals ω ∈ Ω of length |ω| ≃ scl/ann, so that for all s ∈ {s : (s, s′) ∈ S1,j}, we have ωs ∈ Ω. 72 5. 
ALMOST ORTHOGONALITY Figure 5.1. The relative positions of Rs and Rs′ in for pairs (s, s′) ∈ Sℓ, for ℓ = 2 and ℓ = 3 respectively. In fact, the term on the left in (5.34) can be taken to be integer multiples of scl. Combining these observations proves (5.32). Indeed, we can estimate the term in (5.32) as follows. For x, fix ω ∈ Ω with v(x) ∈ ω. Then, s : (s,s′)∈S1,j |Ies φs| . s : (s,s′)∈S1,j 1 + ann′ · dist(ω,ωs) The important point is that the term involving the distance allows us to sum over the possible values of ωs ⊂ ωs′ to conclude (5.32). To finish this case, we can estimate s : (s,s′)∈S1,j |〈Ies φs,∆s′〉| . 2−10j scl · ann ′ · ann′ . This completes the upper bound on (5.28). The Proof (5.12) for S2,j, j ≥ 1. In this case, note that the assumptions imply that we can assume that ωs′ ⊂ ωs, and that di- mensions of the rectangle Rs′ are smaller than those for Rs in both directions. See Figure 5. We should show these two inequalities, in analogy to (5.18) and (5.19). FL(S2,j) . 2−10j(ann′)−eα ′ · ann′ scl · ann(5.35) FL′(S2,j) . 2−10j(ann′)−eα · scl · ann ′ · ann′ .(5.36) Here, α̃ is as in (5.23). APPLICATION OF THE FOURIER LOCALIZATION LEMMA 73 For the proof of (5.35), we should analyze the sums s′ : (s,s′)∈S2,j |〈∆s, Ies′ φs′〉| ,(5.37) s′ : (s,s′)∈S2,j |〈Ies′ φs,∆s′〉| ,(5.38) s′ : (s,s′)∈S2,j |〈∆s,∆s′〉| .(5.39) These inequalities are in analogy to (5.20)—(5.22), and es′ ∈ ωs′ ⊂ ωs. The Upper Bound on (5.37). Fix the tile s. Fix a translate Rs of Rs with 2 jRs ∩ 2jRs = ∅, but 2j+1Rs ∩ 2j+2Rs 6= ∅. Let us consider (5.40) S2,j = {(s, s′) ∈ S2,j | Rs′ ⊂ Rs} and we restrict the the sum in (5.37) to this collection of tiles. Note that with . 22j choices of Rs, we can exhaust the collection S2,j . So we will prove a slightly stronger estimate in the parameter 2j for the restricted collection S2,j. The point of this restriction is that we can appeal to an inequality similar to (5.32). Namely, (5.41) s′ : (s,s′)∈S2,j |Ie′s φs′| . 
′ · ann′ scl · ann χ Note that the term in the square root takes care of the differing L2 normalizations of φs′ and χ . Indeed, the proof of (5.32) is easily modified to give this inequality. Next, we observe that the analog of (5.26) holds for ∆s. Just replace s′ in (5.26) with s. It is a consequence that we have s′ : (s,s′)∈S2,j |〈∆s, Ies′ φs′〉| . 2 −12j(ann′)−eα ′ · ann′ scl · ann . This is enough to finish this case. The Upper Bound on (5.38). Let us again appeal to the notations Rs and S2,j as in (5.40). We have the estimates |Ies′ φs| . χ 74 5. ALMOST ORTHOGONALITY As for the sum over ∆s′ , we have an analog of the estimates (5.26). Namely, s′ : (s,s′)∈S2,j (ann′)ǫRs |∆s′,1| dx . (ann′)−eα+ǫ ′ · ann′ scl · ann s′ : (s,s′)∈S2,j |∆s′,1| . ′ · ann′ scl · ann χ , x 6∈ (ann′)ǫRs . Note that we again have to be careful to accommodate the different normalizations here. The proof of (5.26) can be modified to prove this estimate. Putting these two estimates together clearly proves that s′ : (s,s′)∈S2,j |〈Ies′ φs,∆s′〉| . 2 −10j(ann′)−eα ′ · ann′ scl · ann , as is required. We now turn to the proof of the inequality (5.36), which will follow from appropriate upper bounds on the sums below. s : (s,s′)∈S2,j |〈∆s, Ie φs′〉| ,(5.42) s : (s,s′)∈S2,j |〈Ie φs,∆s′〉| ,(5.43) s : (s,s′)∈S2,j |〈∆s,∆s′〉| .(5.44) Here, we can regard s′ as a fixed tile, and e ∈ ωs′ ⊂ ωs. In this case, observe that we have the inequality (5.45) ♯{s : (s, s′) ∈ S1,j} . 22j . This is so since Rs has larger dimensions in both directions than does The Upper Bound on (5.42). We use the decomposition of ∆s = ∆s,1 +∆s,2 +∆s,3. In the first case, we can estimate s : (s,s′)∈S2,j |〈∆s,1, Ie φs′〉| . 22j sup|〈∆s,1, Ie φs′〉| . 2−10j(ann′)−eα scl · ann ′ · ann′ . APPLICATION OF THE FOURIER LOCALIZATION LEMMA 75 For the last case, of ∆s,3, we estimate s : (s,s′)∈S2,j |〈∆s,3, Ie φs′〉| . 22j sup|〈∆s,3, Ie φs′〉| . 
22j min |Fs| · scl · ann · scl′ · ann′ , 2−30j scl · ann ′ · ann′ Examining the two terms of the minimum, note that by (5.61), |Fs| · scl · ann · scl′ · ann′ . (ann′)−α+ǫ ′ · ann′ scl · ann . (ann′)−α+1+ǫ ′ · ann′ · scl · ann . (ann′)−α+1+ǫ scl · ann ′ · ann′ . Here it is essential that we have the estimate (5.60) as stated, with |Fs| . (ann′)−α+ǫ|Rs|. This is an estimate of the desired form, but without any decay in the parameter j. The second term in the min- imum does have the decay in j, but does not have the decay in ann′. Taking the geometric mean of these two terms finishes the proof, pro- vided (α−ǫ)/2 > α̃, which we can assume by taking α sufficiently close to one. The Upper Bound on (5.43). Using the inequality |Ie φs| . χ(2)Rs , and the inequalities (5.26) and (5.45), it is easy to see that s : (s,s′)∈S2,j |〈Ie φs,∆s′〉| . 2−12j(ann′)−eα scl · ann ′ · ann′ . This is the required estimate. The Proof of (5.12) for S3,j, j ≥ 1. In this case, we have that the length of the rectangles Rs′ are greater than those of the rectangles Rs, as depicted in Figure 5. We show that FL(S3,j) . (ann′)ǫ ′ · ann′ scl · ann(5.46) FL′(S3,j) . 2−10j(ann′)−eα scl · ann ′ · ann′ .(5.47) In particular, we do not claim any decay in the term FL(S3,j), in fact permitting a small increase in the parameter ann′. Recall that 0 < ǫ < 1 is a small quantity. See (5.23). But due to the form of the estimate in 76 5. ALMOST ORTHOGONALITY Proposition 5.7, with the decay in 2j and ann′ in the estimate (5.47), these two estimates still prove (5.12) for S3,j . For the proof of (5.46), we analyze the sums s′ : (s,s′)∈S3,j |〈∆s, Ies′ φs′〉| ,(5.48) s′ : (s,s′)∈S3,j |〈Ies′ φs,∆s′〉| ,(5.49) s′ : (s,s′)∈S3,j |〈∆s,∆s′〉| .(5.50) Here, es′ ∈ ωs′ ⊂ ωs. The Upper Bound on (5.48). Regard s as fixed. We employ a vari- ant of the notation established in (5.40). Let R̃s be a rectangle with in the same coordinates axes as Rs. 
In the direction es, let it have length 1/scl′, that is the (longer) length of the rectangles Rs′, and let it have the same width of Rs. Further assume that 2 jRs ∩ R̃s = ∅ but 2j+4Rs ∩ R̃s 6= ∅. (There is an obvious change in these requirements for j = 1.) Then, define S̃3,j = {(s, s′) ∈ S3,j | Rs′ ⊂ R̃s} . With . 22j choices of R̃s, we can exhaust the collection S3,j . Thus, we prove a slightly stronger estimate in the parameter 2j for the collection S̃3,j. The main point here is that we have an analog of the estimate (5.32): s′ : (s,s′)∈ eS3,j |Ie φs′| . The term in the square root takes into account the differing L2 normal- izations between the φs′ and χ . The proof of (5.32) can be modified to prove the estimate above. We also have the analogs of the estimate (5.26). Putting these two together proves that s′ : (s,s′)∈S3,j |〈∆s, Ies′ φs′〉| . 2 −10j(ann′)−eα . 2−10j(ann′)−eα ′ · ann′ scl · ann . APPLICATION OF THE FOURIER LOCALIZATION LEMMA 77 That is, we get the estimate we want with decay in ann′, we do not claim in general. The Upper Bound on (5.49). We use the inequality |Ies′ φs| . χ And we use the decomposition ∆s′ = ∆s′,1 +∆s′,2 +∆s′,3. For the case of ∆s′,1, we have ωs′ ⊂ ωs. And the supports of the functions ∆s′ are well localized with respect to the vector field. See (5.57). Thus, in particular we have s′ : (s,s′)∈S3,j |∆s′| . (ann′)ǫ ′ · ann′ . Hence, we have s′ : (s,s′)∈S3,j |〈χ(2)Rs ,∆s′〉| . (ann ′ · ann′ scl · ann which is the desired estimate. Remark 5.51. It is the analysis of the sum s′ : (s,s′)∈S3,j |〈Ie φs,∆s′,3〉| that prevents us from obtaining decay in the parameter ann′ for cer- tain choices of parameters scl , ann , scl′ and ann′. This is why we have formulated (5.46) the way we have. For the proof of (5.47), we analyze the sums s : (s,s′)∈S3,j |〈∆s, Ie φs′〉| ,(5.52) s : (s,s′)∈S3,j |〈Ie φs,∆s′〉| ,(5.53) s : (s,s′)∈S3,j |〈∆s,∆s′〉| .(5.54) Here es′ ∈ ωs′ ⊂ ωs, and one can regard the interval ωs′ as fixed. 
It is essential that we obtain the decay in 2j and ann′ in these cases. Indeed, these cases are easier, as the sum is over s. For fixed s′, there is a unique choice of interval ωs ⊃ ωs′. And the rectangles Rs are shorter than Rs′, but wider. Hence, (5.55) ♯{s : (s, s′) ∈ S3,j} . 22j 78 5. ALMOST ORTHOGONALITY The Upper Bound on (5.52). We use the decomposition ∆s = ∆s,1+ ∆s,2 +∆s,3, and the inequality |Ies′ φs′, | . χ For the sum associated with ∆s,1, we have s : (s,s′)∈S3,j |〈∆s,1, Ies′ φs′〉| . (ann ′)−eα s : (s,s′)∈S3,j 〈χ(2)Rs , χ . 2−12j(ann′)−eα ′ · ann scl · ann′ × ♯{s : (s, s′) ∈ S3,j} . 2−10j(ann′)−eα ′ · ann′ scl · ann . This is the required estimate. For the sum associated with ∆s,3, the critical properties are those of the corresponding sets Fs, described in (5.60) and (5.61). Note that the sets ∑ s : (s,s′)∈S3,j 1Fs . (ann ′)2ǫ . On the other hand, s : (s,s′)∈S3,j |Fs| . 22j ′ sup s : (s,s′)∈S3,j . 22j(ann′)−α+ǫ ′ · ann . Here, we have used the estimate (5.55). This permits us to estimate s : (s,s′)∈S3,j |〈∆s,3, Ies′ φs′〉| . 2 −10j(ann′)−α+3ǫ scl · ann′ ′ · ann Note that the parity between the ‘primes’ is broken in this estimate. By inspection, one sees that this last term is at most . 2−10j(ann′)−eα ′ · ann′ scl · ann . Indeed, the claimed inequality amounts to (ann′)−α+3ǫscl . (ann′)−eαscl′ . We have to permit scl′ to be as small as 1, whereas scl can be as big as ann. But α > 1, and ann < ann′, so the inequality above is trivially true. This completes the analysis of (5.52). THE FOURIER LOCALIZATION ESTIMATE 79 The Upper Bound on (5.53). We only need to use the inequality |Ies′ φs| . χ , and the inequalities (5.26). It follows that s : (s,s′)∈S3,j |〈Ies′ φs,∆s′〉| . s : (s,s′)∈S3,j 〈χ(2)Rs , |∆s′|〉 . 2−10j(ann′)−eα ′ · ann′ scl · ann . The Fourier Localization Estimate The precise form of the inequalities quantifying the Fourier local- ization effect follows. Fourier Localization Lemma 5.56. 
Let 1 < α < 2, ǫ < (α− 1)/20, and v be a vector field with ‖v‖Cα ≤ 1. Let s be a tile with 1 < scl(s) = scl ≤ ann(s) = ann < 1 fs = Mod−c(ωs) φs Let ζ be a smooth function on R, with 1(−2,2) ≤ ζ̂ ≤ 1(−3,3) and set ζ2k(y) = 2 kζ(y2k). We have this inequality valid for all unit vectors e with |e− es| ≤ |ωs|. ∣∣∣fs(x)− fs(x− ye)ζ2k(y) dy .(scl2(α−1)k)−1+ǫχ (x)1fωs(v(x))(5.57) + (2kscl)−10χ (x)(5.58) + |Rs|−1/21Fs(x) ,(5.59) where ω̃s is a sub arc of the unit circle, with ω̃s = λωs, and 1 < λ < 2ǫk. Moreover, the sets Fs ⊂ R2 satisfy |Fs| . 2−(α−ǫ)k(1 + scl−1)α−1|Rs|,(5.60) Fs ⊂ 2ǫkRs ∩ v−1(ω̃s) ∩ ∂(v ·es⊥) ∣∣∣ > 2(1−ǫ)k .(5.61) The appearance of the set Fs is explained in part because the only way for the function φs to oscillate quickly along the direction es is that the vector field moves back and forth across the interval ωs very quickly. This sort of behavior, as it turns out, is the only obstacle to the frequency localization described in this Lemma. Note that the degree of localization improves in k. In (5.57), it is important that we have the localization in terms of the directions 80 5. ALMOST ORTHOGONALITY of the vector field. The terms in (5.58) will be very small in all the instances that we apply this lemma. The third estimate (5.59) is the most complicated, as it depends upon the exceptional set. The form of the exceptional set in (5.61) is not so important, but the size estimate, as a function of α > 1, in (5.60) is. Proof. We collect some elementary estimates. Throughout this argument, ~y := y e ∈ R2. (5.62) |y|>t2−k |y2k||ζ2k(y)| dy . t−N , t > 1. This estimate holds for all N > 1. Likewise, (5.63) |u|>tscl−1 |uscl||sclψ(scl u)| du . t−N , t > 1. More significantly, we have for all x ∈ R2, (5.64) eiξ0y ϕ (x− ~y)ζ2k(y) dy = ϕ (x) − 2k < ξ0 < 2k, where ϕ = Tc(Rs)D ϕ. This is seen by taking the Fourier transform. 
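The remark that (5.64) "is seen by taking the Fourier transform" can be sketched as follows. This is an illustration under assumptions: $g$ stands for the function $\varphi^{(2)}_{R_s}$, the operator name $T$ is introduced only for this sketch, and the Fourier transform is normalised without factors of $2\pi$, which may differ from the convention used elsewhere in the text.

```latex
% Assumed normalisation: \hat g(\xi) = \int_{\mathbb{R}^2} g(x)\, e^{-i x\cdot\xi}\,dx .
\begin{align*}
T g(x) &:= \int_{\mathbb{R}} e^{i\xi_0 y}\, g(x - y e)\, \zeta_{2^k}(y)\, dy, \\
\widehat{T g}(\xi)
  &= \hat g(\xi) \int_{\mathbb{R}} e^{-i y (e\cdot\xi - \xi_0)}\, \zeta_{2^k}(y)\, dy
   = \hat g(\xi)\, \hat\zeta\bigl(2^{-k}(e\cdot\xi - \xi_0)\bigr).
\end{align*}
% Since 1_{(-2,2)} \le \hat\zeta \le 1_{(-3,3)}, we have \hat\zeta = 1 on (-2,2),
% so the multiplier equals 1 whenever |e\cdot\xi - \xi_0| < 2^{k+1}.
% With |\xi_0| < 2^k this holds as soon as |e\cdot\xi| \le 2^k on the
% support of \hat g, and in that case T g = g.
```

The side condition $|e \cdot \xi| \le 2^k$ on the support of $\hat g$ is the hypothesis of this sketch; it reflects the small frequency support assumed for $\varphi$.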
Likewise, by (4.17), for vectors v0 of unit length,∫ e−2πiuλ0 ϕ (x− uv0)sclψ(scl u) du 6= 0 implies that (5.65) scl ≤ λ0 + ξ · v0 ≤ 98scl, for some ξ ∈ supp(ϕ̂ At this point, it is useful to recall that we have specified the fre- quency support of ϕ to be in a small ball of radius κ in (4.16). This has the implication that (5.66) |ξ · es| ≤ κscl, |ξ · es⊥| ≤ κann ξ ∈ supp(ϕ̂(2)Rs ) We begin the main line of the argument, which comes in two stages. In the first stage, we address the issue of the derivative below exceeding a ‘large’ threshold. e ·Dv(x) · es⊥ = ∂v · es⊥ We shall find that this happens on a relatively small set, the set Fs of the Lemma. Notice that due to the eccentricity of the rectangle Rs, we can only hope to have some control over the derivative in the long direction of the rectangle, and e essentially points in the long direction. We are interested in derivative in the direction es⊥ as that is THE FOURIER LOCALIZATION ESTIMATE 81 the direction that v must move to cross the interval ωs2. A substantial portion of the technicalities below are forced upon us due to the few choices of scales 1 ≤ scl ≤ 2εk, for some small positive ε.1 Let 0 < ε1, ε2 < ǫ to be specified in the argument below. In partic- ular, we take 0 < ε1 ≤ min , κα−1 , 0 < ε2 < (α− 1). We have the estimate |fs(x)|+ fs(x− ~y)ζ2k(y) dy ∣∣∣ . 2−10kχ(2)Rs (x), x 6∈ 2 ε1kRs. This follows from (5.62) and the fact that the direction e differs from es by an no more than the measure of the angle of uncertainty for Rs. This is as claimed in (5.58). We need only consider x ∈ 2ε1kRs. Let us define the sets Fs, as in (5.59). Define λs := 2ε1k scl < 2−2ε1k 8 otherwise Let λωs denote the interval on the unit circle with length λ|ωs|, and the same center as ωs. 2 This is our ω̃ of the Lemma; the set Fs of the Lemma is (5.67) Fs := 2 ε1kRs ∩ v−1(λsωs) ∩ ∂(v ·es⊥) ∣∣∣ > 2(1−ε2)k And so to satisfy (5.61), we should take ε1 < 1/1200. Let us argue that the measure of Fs satisfies (5.60). 
Fix a line ℓ in the direction of e. We should see that the one dimensional measure (5.68) |ℓ ∩ Fs| . 2−k(α−ǫ)(1 + scl−1)α−1scl−1. For we can then integrate over the choices of ℓ to get the estimate in (5.60). The set ℓ ∩ Fs is viewed as a subset of R. It consists of open intervals An = (an, bn), 1 ≤ n ≤ N . List them so that bn < an+1 for all n. Partition the integers {1, 2, . . . , N} into sets of consecutive integers Iσ = [mσ, nσ]∩N so that for all points x between the left-hand endpoint of Amσ and the right-hand endpoint of Anσ , the derivative ∂(v · es⊥)/∂e has the same sign. Take the intervals of integers Iσ to be maximal with respect to this property. 1The scales of approximate length one are where the smooth character of the vector field helps the least. The argument becomes especially easy in the case that√ ann ≤ scl, as in the case, |ωs| & scl−1. 2We have defined λs this way so that λsωs makes sense. 82 5. ALMOST ORTHOGONALITY For x ∈ Fs, the partial derivative of v, in the direction that is transverse to λsωs, is large with respect to the length of λsωs. Hence, v must pass across λsωs in a small amount of time: |Am| . 2−(1−ε1−ε2)k for all σ. Now consider intervals Anσ and A1+nσ = Amσ+1. By definition, there must be a change of sign of ∂v(x) · es⊥/∂e between these two intervals. And so there is a change in this derivative that is at least as big as 2(1−ε2)k scl . The partial derivative is also Hölder continuous of index α − 1, which implies that Anσ and Amσ+1 cannot be very close, specifically dist(Anσ , Amσ+1) ≥ 2(1−ε2)k As all of the intervals An lie in an interval of length 2 ε1kscl −1, it follows that there can be at most 1 ≤ σ . 2ε1kscl−1 2(1−ε2)k )−α+1 intervals Iσ. Consequently, |ℓ ∩ Fs| . 2−(1−2ε1−ε2+(1−ε2)(α−1))kscl−1 . 2−(α−2ε1−2ε2)kscl−α+1scl−1 We have already required 0 < ε1 < and taking 0 < ε2 < achieve the estimate (5.68). This completes the proof of (5.60). 
The second stage of the proof begins, in which we make a detailed estimate of the difference in question, seeking to take full advantage of the Fourier properties (5.62)—(5.65), as well as the derivative informa- tion encoded into the set Fs. We consider the difference in (5.57) in the case of x ∈ 2ε1kRs − v−1(λsωs). In particular, x is not in the support of fs, and due to the smoothness of the vector field, the distance of x to the support of fs is at least & 2ε1k so that by (5.62), we can estimate ∣∣∣fs(x)− fs(x− ~y)ζ2k(y) dy ∣∣∣ . (2ε1kscl)−N |Rs|−1/2 which is the estimate (5.58). THE FOURIER LOCALIZATION ESTIMATE 83 We turn to the proof of (5.57). For x ∈ 2ε1kRs ∩ v−1(λsωs), we always have the bound ∣∣∣fs(x)− fs(x− ~y)ζ2k(y) dy ∣∣∣ . 210ε1k/κχ(2)Rs (x) 101λsωs(x). It is essential that we have |e − es| ≤ |ωs| for this to be true, and κ enters in on the right hand side through the definition (4.40). We establish the bound ∣∣∣fs(x)− fs(x− ~y)ζ2k(y) dy ∣∣∣ . (scl2(α−1)k)−1|Rs|−1/2, x ∈ 2ε1kRs ∩ v−1(λsωs) ∩ F cs . (5.69) We take the geometric mean of these two estimates, and specify that 0 < ε1 < κ to conclude (5.57). It remains to consider x ∈ 2ε1kRs ∩ v−1(λsωs) ∩ F cs , and now some detailed calculations are needed. To ease the burden of notation, we exp(x) := e−2πiuc(ωs)·v(x), Φ(x, x′) = ϕ (x− uv(x′)), with the dependency on u being suppressed, and define w(du, d~y) := sclψ(scl u)ζ2k(y) du d~y. In this notation, note that fs = Mod−c(ωs) φs ec(ωs)(x−uv(x)−c(ωs))x ϕ (x− uv(x)) sclψ(sclu) du exp(x)Φ(x, x, )sclψ(sclu) du exp(x)Φ(x, x, )w(du, d~y) , since ζ has integral on R2. In addition, we have fs(x− ~ye)ζ2k(~y) d~y = ec(ωs)(x−uv(x−~y)−c(ωs))x × ϕ(2)Rs (x− uv(x− ~y)) sclψ(sclu) du d~y exp(x− ~y)Φ(x− ~y, x− ~y)w(du, d~y) . 84 5. 
ALMOST ORTHOGONALITY We are to estimate the difference between these two expressions, which is the difference of Diff1(x) := exp(x)Φ(x, x)− exp(x− ~y)Φ(x− ~y, x)w(du, d~y) Diff2(x) := exp(x− ~y){Φ(x− ~y, x− ~y)− Φ(x− ~y, x)}w(du, d~y) The analysis of both terms is quite similar. We begin with the first term. Note that by (5.64), we have Diff1(x) = {exp(x)− exp(x− ~y)}Φ(x− ~y, x)w(du, d~y). We make a first order approximation to the difference above. Observe exp(x)− exp(x− ~y) = exp(x){1− exp(x− ~y)exp(x)} = exp(x){1− e−2πiu[c(ωs)·Dv(x)·e]~y}(5.70) +O(|u|ann|~y|α). In the Big–Oh term, |u| is typically of the order scl−1, and |~y| is of the order 2−k. Hence, direct integration leads to the estimate of this term |u|ann|y|α|Φ(x− ~y, x)|·|w(du, d~y)| .|Rs|−1/2 .|Rs|−1/2(scl2(α−1)k)−1. This is (5.69). The term left to estimate is Diff ′1(x) := exp(x)(1−e−2πiu[c(ωs)·Dv(x)·e]~y) Φ(x− ~y, x)w(du, d~y) . Observe that by (5.64), the integral in y is zero if |u[c(ωs) ·Dv(x) · e]| ≤ 2k. Here we recall that c(ωs) = ann es⊥. By the definition of Fs, the partial derivative is small, namely |es⊥ ·Dv(x) · e| . 2(1−ε1)k THE FOURIER LOCALIZATION ESTIMATE 85 Hence, the integral in y in Diff ′1(x) can be non–zero only for scl|u| & 2ε1k. By (5.63), it follows that in this case we have the estimate |Diff ′1(x)| . 2−2k|Rs|−1/2 This estimate holds for x ∈ 2ε1kRs∩v−1(λsωs)∩F cs and this completes the proof of the upper bound (5.69) for the first difference. We consider the second difference Diff2. The term v(x− ~y) occurs twice in this term, in exp(x − ~y), and in Φ(x − ~y, x − ~y). We will use the approximation (5.70), and similarly, Φ(x− ~y, x−~y)− Φ(x− ~y, x) (x− ~y − uv(x− ~y))− ϕ(2)Rs (x− ~y − uv(x)) (x− ~y − uv(x)− uDv(x)~y) − ϕ(2)Rs (x− ~y − uv(x)) +O(ann |u||y| = ∆Φ(x, ~y) +O(ann |u||~y|α) The Big–Oh term gives us, upon integration in u and ~y, a term that is no more than . |Rs|−1/2 2−αk . |Rs|−1/2(scl2(α−1)k)−1. This is as required by (5.69). 
We are left with estimating

\[
\mathrm{Diff}_2'(x) := \iint e^{-2\pi i u\, c(\omega_s)\cdot(v(x) - Dv(x)\cdot \vec y)}\, \Delta\Phi(x, \vec y)\, w(du, d\vec y).
\]

By (5.64), the integral in $y$ is zero if both of these conditions hold:

\[
|u\, c(\omega_s) Dv(x) \cdot e| < 2^k, \qquad
|[u\, c(\omega_s) Dv(x) - \xi - u\, \xi Dv(x)] \cdot e| < 2^k, \quad \xi \in \operatorname{supp}(\hat\varphi^{(2)}_{R_s}).
\]

Both of these conditions are phrased in terms of the derivative, which is controlled as $x \notin F_s$. In fact, the first condition already occurred in the first case, and it is satisfied if $\mathrm{scl}\,|u| \lesssim 2^{\varepsilon_1 k}$. Recalling the conditions (5.66), the second condition is also satisfied for the same set of values for $u$. The application of (5.63) then yields a very small bound after integrating over $|u| \gtrsim 2^{\varepsilon_1 k} \mathrm{scl}^{-1}$. This completes the proof of our technical Lemma.

<|startoftext|> Wide Field Surveys and Astronomical Discovery Space

A. Lawrence
Institute for Astronomy, SUPA∗, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ

A review for publication in Astronomy and Geophysics, Feb 27th 2007

Abstract

I review the status of science with wide field surveys. For many decades surveys have been the backbone of astronomy, and the main engine of discovery, as we have mapped the sky at every possible wavelength. Surveys are an efficient use of resources. They are important as a fundamental resource; to map intrinsically large structures; to gain the necessary statistics to address some problems; and to find very rare objects. I summarise major recent wide field surveys (2MASS, SDSS, 2dFGRS, and UKIDSS) and look at examples of the exciting science they have produced, covering the structure of the Milky Way, the measurement of cosmological parameters, the creation of a new field studying substellar objects, and the ionisation history of the Universe. I then look briefly at upcoming projects in the optical-IR survey arena: VISTA, PanSTARRS, WISE, and LSST. Finally I ask, now we have opened up essentially all wavelength windows, whether the exploration of survey discovery space is ended. I examine other possible axes of discovery space, and find them mostly to be too expensive to explore or otherwise unfruitful, with two exceptions: the first is the time axis, which we have only just begun to explore properly; and the second is the possibility of neutrino astrophysics.

1 Why are wide field surveys important?

Some astronomical experiments are direct, in that measurements are made of some piece of sky, and these measurements are then used for a specific scientific analysis. The essence of a survey however is that extracting science is a two step process.
First we summarise the sky, usually by taking an image and then running pattern recognition software to produce a catalogue of objects, each with a set of measured parameters. Once this summary is made, we can do the science with the catalogue; the archive becomes the sky. There are many such archives, distributed around the world in online structured databases; querying such databases is a growing mode of scientific analysis. This of course is why survey databases have played such a central role in the worldwide Virtual Observatory initiatives.

Why is this two-step process a good thing to do? Firstly, it is cost effective, because we can perform many experiments using the same data. Secondly, surveys are a resource that can support other experiments. This can mean, for example, creating samples of objects which are ‘followed up’, i.e. observed in detail, on other facilities (e.g. getting spectra of galaxy samples). Conversely, interesting objects discovered by other experiments can be matched against objects in the standard survey catalogues, so that one quickly has the optical flux of a new gamma-ray source. (Should this be called follow-down?) Finally, surveying the sky can produce surprises. First looks in new corners of parameter space have often discovered completely new populations of objects. Historically, surveys have been the main engine of discovery for astronomy.

∗ Scottish Universities Physics Alliance
http://arxiv.org/abs/0704.0809v1
Wide Field Surveys : A.Lawrence

Table 1: Examples of major astronomical surveys from recent decades
Type        Survey Examples
Radio       3C, PKS, 4C, FIRST
IR          IRAS-PSC, ELAIS, 2MASS, UKIDSS
Optical     APM, SuperCOSMOS, SDSS, CFHTLS
X-ray       3U, 2A, HEAO-A, 1-XMM
z-surveys   CfA-z, QDOT, 2dFGRS, SDSS-z

Why are wide angle surveys important, as opposed to the deepest possible pencil beams? The key point here is that in Euclidean space, time spent surveying more area increases volume much faster than time spent going deeper.
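The scaling behind this claim can be sketched numerically. The snippet below is a minimal illustration and is not taken from the original text: it assumes identical sources in Euclidean space (so the distance reached scales as the limiting flux to the power -1/2) and background-limited observations in which the limiting flux falls as the square root of the exposure time. The function names are hypothetical.

```python
def volume_gain_wider(time_factor):
    """Volume gain when extra time is spent covering more area at fixed depth."""
    # Area, and hence volume, grows linearly with observing time.
    return time_factor


def volume_gain_deeper(time_factor):
    """Volume gain when extra time is spent integrating longer on the same area."""
    # Assumption: background-limited, so limiting flux scales as time**-0.5.
    flux_limit_factor = time_factor ** -0.5
    # Euclidean inverse-square law: distance reached scales as flux**-0.5.
    distance_factor = flux_limit_factor ** -0.5
    # Volume within a fixed solid angle scales as distance cubed.
    return distance_factor ** 3


for factor in (2, 4, 16):
    print(factor, volume_gain_wider(factor), round(volume_gain_deeper(factor), 2))
```

Under these assumptions the surveyed volume grows linearly with time when going wider, but only as the 3/4 power of time when going deeper.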
(The argument that wide angles produce large samples faster breaks down when the differential source count slope is flatter than 1, which for example occurs for galaxies fainter than about B ∼ 23. Also of course, sometimes one simply has to go deep, for example to survey at some given large redshift.)

Many astronomical problems need large samples of objects to address them. Sometimes this is because one wants accurate function estimation – for example to test theories of structure formation, one wants to estimate the galaxy clustering power spectrum to an accuracy of around 5% in many bins over a wide range of scale. Sometimes large samples are needed to recover a very weak signal from noise – for example the net alignment of many random galaxy ellipticities produced by weak lensing by intervening dark matter. The second reason for maximising volume as quickly as possible is to find rare objects, such as the hoped-for Y dwarfs and z = 7 quasars; to a given depth there might be only a handful over the whole sky. Finally, some objects of astronomical study simply have intrinsically large angular scale - for example the Milky Way, the galaxy clustering dipole, or open clusters of stars, which can be tens of degrees across.

2 Major surveys

Surveys are the core of astronomy. This has always been true of course, from Ptolemy through the New General Catalogue to the Carte du Ciel, but it has certainly been the case in the last few decades. Table 1 lists some of the best known imaging surveys in each wavelength regime. (I have also included a few redshift surveys as a distinct set.) This is only a selection, and is biased towards my own favourites, so apologies to those whose own surveys aren’t listed. The point to note is that these names are as immediately recognisable to every astronomer as are the names of famous telescopes and satellites - Palomar, AAT, Ariel-V, etc. The data in these catalogues are of everyday use and have been the source of many discoveries.
Many of the older surveys were classic examples of opening a completely new window on the Universe - 3C, IRAS, and 3U in particular, though I think it is also fair to include the CfA redshift survey in this category, as it gave us the first real feel for the three dimensional structure of the Universe, with bubbles, filaments, and walls. The 1-XMM catalogue is slightly different, in that it wasn’t planned as a coherent single survey, but is the uniformly processed summation of XMM pointings over the sky.

Over the last 5-10 years the most important major new surveys have been in the optical-IR - 2MASS, SDSS, 2dFGRS, and now UKIDSS, which started in 2005. I will summarise each of these briefly in turn. Some highlight science results are in the next section.

Figure 1: All sky distribution of 2MASS catalogues. Point sources are shown as white dots. Extended sources are coloured according to estimated redshift, based either on known values, or estimated from K magnitude. Blue are the nearest sources (z < 0.01); green are at moderate distances (0.01 < z < 0.04) and red are the most distant sources that 2MASS resolves (0.04 < z < 0.1). Taken from Jarrett (2004).

The Two Micron All Sky Survey : 2MASS.
2MASS broke new ground, as it was the first real sky survey at near-infrared wavelengths. At near-IR wavelengths we see roughly the same Universe as in the visible light regime, but with some key improvements. Extinction is much less; we can see pretty much clean through the Milky Way, and can find reddened versions of objects such as quasars. Cooler objects such as brown dwarfs can be found, with the most extreme objects essentially invisible in standard optical bands. Cleaner galaxy samples can be constructed, with high redshift objects easier to find. Colour combinations with optical bands have proved especially good at finding rare objects, such as the new T-dwarf class of brown dwarfs.
2MASS used two dedicated 1.3m telescopes, at Mt Hopkins, Arizona, and CTIO, Chile. Each telescope was equipped with a three-channel camera, each channel consisting of a 256×256 HgCdTe array, so that observations could be made simultaneously at J (1.25 microns), H (1.65 microns), and Ks (2.17 microns). One interesting innovation was the use of large pixels, which maximised survey speed but required micro-stepping to improve sampling. The survey started in June 1997 and was completed in February 2001. The full data release occurred in March 2003, including both an Atlas of images and a catalogue of almost half a billion sources. To a point source limit of 10σ, the catalogue depth is J=16, H=15, Ks=14.7, almost five orders of magnitude deeper than any comparable IR survey. However, for the colours of many astronomical objects, this is still two orders of magnitude shallower than modern optical surveys.

The core reference for 2MASS is Skrutskie et al. (2006). Further information can be found at the IPAC (http://www.ipac.caltech.edu/2mass/) and UMASS (http://pegasus.phast.umass.edu/) sites. Data access is through the IRSA system at http://irsa.ipac.caltech.edu/.

The Sloan Digital Sky Survey : SDSS.
The SDSS project has produced a survey of 8,000 square degrees of sky at visible light wavelengths, approximately two magnitudes deeper than the historic Schmidt surveys, and in addition has carried out a spectroscopic survey of objects selected from the imaging survey. The project used a dedicated 2.5m telescope at Apache Point, New Mexico, and a camera covering 1.5 square degrees. Survey operations used a novel drift scan approach; the 30 CCDs on the camera are arranged in five rows, each sensitive to a separate filter band (u,g,r,i,z); the telescope is parked in a given position and the sky allowed to drift past. The spectroscopic survey is carried out using a 600-fibre system, on separate nights spliced into the imaging programme.
This then required the data processing pipeline to keep up in almost real time. Public data access has been announced in a series of staged releases, culminating in June 2006 with DR5, which contains a catalogue of around 200 million objects, and spectra for around a million galaxies, quasars, and stars. An extended programme, cunningly called SDSS-II, has now commenced, and is expected to continue through 2008.

SDSS has been arguably the most successful survey project of recent times, with many hundreds of scientific papers based directly on its data, and having an impact on a very large range of astronomical topics - large scale structure, the highest redshift quasars, the structure of the Milky Way, and many other things besides. This may seem surprising, as visible light sky surveys covering the whole sky have been available for decades, and available as digitised queryable online databases for some years (e.g. the Digitised Sky Survey (DSS : see http://archive.stsci.edu/dss/) or the SuperCOSMOS Science Archive (SSA : see http://surveys.roe.ac.uk/ssa)). There are several reasons for the success of SDSS. The first reason is of course the spectroscopic database, matched only by 2dF (see below). The second reason is the wider wavelength range, with filters carefully chosen and calibrated to optimise various kinds of search. The third reason is the improvement in quality - not only is SDSS a magnitude or two deeper than the Schmidt surveys, but the seeing is markedly better. The fourth reason, shared by 2MASS, is the quality of the online interface - well calibrated, reliable, and documented data were available promptly, and with the ability to do online analysis rather than just downloading data. This has made it easy for astronomers all over the world to jump in and benefit from SDSS.

The core reference for SDSS is York et al. (2000).
Further information can be found at http://www.sdss.org, which also contains links to data access via SkyServer.

The UKIRT Infrared Deep Sky Survey : UKIDSS.
UKIDSS is the near-infrared equivalent of the SDSS, covering only part of the sky, but many times deeper than 2MASS. The project has been designed and implemented by a private consortium, but on behalf of the whole ESO member community, and after a short delay, the world. It uses the Wide Field Camera (WFCAM) on the UK Infrared Telescope (UKIRT) in Hawaii, and is taking roughly half the UKIRT time over 2005-2012. WFCAM has an instantaneous field of view of 0.21 sq.deg., much larger than any previous large facility IR camera. Put together with a 4m telescope, this makes possible an ambitious survey. It is estimated that the effective volume of UKIDSS will be 12 times that of 2MASS, and the effective amount of information collected 70 times larger.

UKIDSS is not a single survey, but a portfolio of five survey components. Three of these are wide shallow surveys, to K ∼ 18−19, and covering a total of ∼ 7000 sq.deg. - the Galactic Plane Survey (GPS); the Galactic Clusters Survey (GCS); and the high latitude Large Area Survey (LAS). Then there is a Deep Extragalactic Survey (DXS), covering 35 sq.deg. to K ∼ 21, and an Ultra Deep Survey (UDS), covering 0.77 sq.deg. to K ∼ 23. In all cases, there is the maximum possible overlap with other multiwavelength surveys and key areas, such as SDSS, the Lockman Hole, and the Subaru Deep Field. The aim of UKIDSS is to provide a public legacy database, but the design was targeted at some specific goals - for example, to measure the substellar mass function, and its dependence on metallicity; to find quasars at z = 7; to discover Population II brown dwarfs if they exist; to measure galaxy clustering at z = 1 and z = 3 with the same accuracy as at z = 0; and to determine the epoch of spheroid formation.

Like SDSS, data are being released in a series of stages.
At each stage the data are public to astronomers in all ESO member states, and world-public eighteen months later. Data are made available through a queryable interface at the WFCAM Science Archive (WSA : http://surveys.roe.ac.uk/wsa). Three data releases have occurred so far – the “Early Data Release”, and Data Releases One and Two (DR1 and DR2), which contain approximately 10% of the likely full dataset. UKIDSS is summarised in Lawrence et al. (2007), and technical details of the releases are described in Dye et al. (2006) and Warren et al. (2007).

Redshift surveys : 2MRS/6dFGRS; 2dFGRS and SDSS-z.
Systematic redshift surveys based on galaxy catalogues from imaging surveys were one of the big success stories of the 1970s–90s, culminating in the all-sky z-survey based on the IRAS galaxies, the PSC-z (Saunders et al. 2000). The most ambitious surveys to date have however been carried out over the last five years. The first example is the construction of a complete all-sky redshift survey based on galaxies in the 2MASS Extended Source Catalog (XSC) to a depth of Ks=12.2, containing roughly 100,000 galaxies. In the south, observations are carried out at the UK Schmidt, as part of the 6dFGRS project (Jones et al. 2004, http://www.aao.gov.au/local/www/6df/); in the north, observations are being carried out by a CfA team at Mt Hopkins, Arizona (see http://cfa-www.harvard.edu/~huchra/2mass/). The survey is part way through, but has already been used to measure the dipole anisotropy of the local universe (Erdogdu et al. 2005).

Two very successful projects have completed redshift surveys of smaller area, but reaching considerably deeper, containing hundreds of thousands of galaxies. The first, in the northern sky, is SDSS-z, the spectroscopic component of SDSS, as described above.
The second, in the southern sky, is the 2dF Galaxy Redshift Survey (2dFGRS; Colless et al. 2001). This was based on galaxies selected from the APM digitisation of UK Schmidt plates (Maddox et al. 199x), and observed using the Two Degree Field (2dF) facility at the Anglo-Australian Telescope, which has 400 independent fibres. The 2dFGRS obtained spectra for 245591 objects, mainly galaxies, brighter than a nominal extinction-corrected magnitude limit of bJ=19.45, covering 1500 square degrees in three regions. The final data release was in June 2003. More information, and data access, is available at http://www.mso.anu.edu.au/2dFGRS/.

These two surveys have produced a range of science, but have concentrated on making the best possible measurement of the power spectrum of galaxy clustering, and together with WMAP and supernova programme results, have produced the definitive estimates of the cosmological parameters, leading to the current ‘concordance cosmology’.

3 Recent survey science highlights

I have picked out a handful of results from the optical-IR surveys of the last few years, including the first results from UKIDSS, to illustrate the power of the survey approach.

Panoramic mapping : the structure of the Milky Way.
Two topics which clearly benefit from a map covering 4π sr, and with low extinction, are the structure of the Milky Way, and the structure of the local extragalactic universe. Figure 1, taken from Jarrett (2004), illustrates the impact 2MASS has on both these topics, showing both the Point Source Catalog (mostly stars) and the Extended Source Catalog (mostly galaxies at z < 0.1). For the first time, we can see the Milky Way looking like other external galaxies, with disc, bulge, and dust lane. Some of the most important scientific results however have come from looking at subsets of the stellar population. Figure 2, from Majewski et al.
(2003), shows the sky distribution of M giants selected from the 2MASS PSC, a selection which traces very large scale structures while removing the dilution of local objects, using a few thousand stars out of the catalogue of half a billion. From APM star counts we already knew of the existence of the Sagittarius dwarf, swallowed by the Milky Way (Ibata, Irwin and Gilmore 1994), but now we can see its complete structure including an extraordinary 150 degree tidal tail. Its orbital plane shows no precession, indicating that the Galactic potential within which it moves is spherical. The Earth is currently close to the debris, which means that some very nearby stars are actually members of the Sagittarius dwarf system. Interestingly, Sagittarius seems to contribute over 75% of high latitude halo M giants, with no evidence for M giant tidal debris from the Magellanic clouds.

SDSS, although not a panoramic survey, has also been very important for Galactic structure and stellar populations, with the five widely spread bands making it possible to derive stellar types and so photometric parallaxes. Juric et al. (2005) derive such parallaxes for 48 million stars. They fit a combination of oblate halo, thin disk, and thick disk, but also find significant ‘localised overdensities’, including the known Monoceros stream, and a new enhancement towards Virgo that covers 1000 sq.deg. This may then be another dwarf galaxy swallowed by the Milky Way.

Figure 2: Smoothed maps of the sky in equatorial coordinates showing the 2MASS point source catalogue optimally filtered to show the Sagittarius dwarf southern arc (top), and the Sagittarius dwarf northern arm (bottom). Two cycles around the sky are plotted to demonstrate the continuity of features. The top panel uses 11 < Ks < 12 and 1.00 < J − Ks < 1.05. The bottom panel uses 12 < Ks < 13 and 1.05 < J − Ks < 1.15. Taken from Majewski et al. (2003).

Figure 3: Cone diagram showing projected distribution of galaxies in 2dFGRS. Taken from Peacock (2002).

Figure 4: (a) Power spectrum from 2dFGRS, compared to various model predictions. Taken from Percival et al. (2001). (b) Correlation function of Luminous Red Galaxies in the SDSS-z sample, showing the first baryon acoustic oscillation peak. Taken from Eisenstein et al. (2005).

Large sample statistics : galaxy clustering and the cosmological parameters.
The SDSS-z and 2dFGRS surveys illustrate the power of the survey approach in two ways. First, significant volume is needed to map out large scale structures and overcome shot noise on the largest scales. Figure 3 is a cone diagram for all the 2dFGRS galaxies, showing the richness of structure that it is only possible to map out with both a large volume and a high density. Second, large numbers are needed to make a good enough estimate of the power spectrum of galaxy clustering. This is illustrated in Fig 4a, which shows the power spectrum derived from 2dFGRS compared to various model predictions (data from Percival et al. (2001), figure from Peacock (2002)). To distinguish models with differing matter density in the interesting range requires accuracy of a few percent over a very wide range of scales; to have a chance of measuring small scale features predicted by models including a significant baryon fraction requires many samples across this wide range, with of the order 10^3 galaxies per bin to achieve the required accuracy. These wiggles are due to acoustic oscillations in the baryon component of the universe at early times. In the Percival et al. paper, only a limit could be placed on these oscillations, but they were statistically detected in the final 2dFGRS data (Cole et al. 2005).
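As a rough check on the figure of roughly a thousand galaxies per bin, the sketch below assumes that the fractional uncertainty in a bin is set purely by Poisson counting statistics, i.e. 1/sqrt(N). This is a simplifying assumption made here for illustration (real power spectrum errors also depend on the number of independent modes and on the survey geometry), and the function name is hypothetical.

```python
import math


def poisson_fractional_error(n_objects):
    """Fractional counting error for a bin holding n_objects (Poisson assumption)."""
    return 1.0 / math.sqrt(n_objects)


for n in (100, 1000, 10000):
    print(n, f"{100 * poisson_fractional_error(n):.1f}%")
```

On this crude estimate a thousand objects per bin corresponds to an uncertainty of about 3%, in line with the "few percent" accuracy mentioned in the text.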
However, in another good example of filtering out a tracer sub-sample from a very large sample, the first baryon peak was much more clearly seen in the correlation function of Luminous Red Galaxies (LRGs) selected from SDSS-z (Eisenstein et al. 2005; Huetsi 2005; see Fig 4b).

2dFGRS and SDSS-z were the first redshift surveys to have large enough scale and depth to overlap the fluctuation measurements from the CMB, enabling degeneracies in the estimation of cosmological parameters to be broken, and accuracy to be increased by a factor of several. Several key papers made joint analyses of the galaxy and CMB datasets (Percival et al. 2002; Efstathiou et al. 2002; Tegmark et al. 2003; Pope et al. 2004), arriving at broadly consistent answers. We now know what kind of universe we live in: a geometrically flat universe dominated by vacuum energy (75%), with some kind of cold dark matter at about 21% and ordinary baryons at 4%. The equation of state parameter for the dark energy has been limited to w < −0.52 (Percival et al. 2002), and the total mass of the neutrinos to m < 1 eV (Tegmark et al. 2003; Elgaroy et al. 2002).

The Deep eXtragalactic Survey (DXS) of UKIDSS will produce a galaxy survey over a volume as large as that of 2dFGRS or SDSS, but at z = 1. A redshift survey of this sample is a prime target for future work.

Figure 5: (a) High resolution spectrum of the high redshift quasar SDSS J1030+0524 (z = 6.28) found in SDSS, plotted against wavelength from 7000Å to 10500Å; the dashed line is the LBQS composite and the dotted line the Telfer continuum. Gunn-Peterson troughs due to Lyα and Lyβ are the black sections from 8500Å to 9000Å and from 7000Å to 7500Å. Taken from White et al. (2003); original discovery spectrum in Becker et al. (2001). (b) Spectral energy distributions for a z = 7 quasar and a T-dwarf, compared to filter passbands from SDSS and UKIDSS. Taken from Lawrence et al. (2007).

Rare objects : Brown Dwarfs.
Infrared surveys have transformed the study of the substellar regime, blurring our idea of what it means to be a star. For many years, until the first discovery of the very faint IR companion of GL 229 (i.e. GL229B) by Nakajima et al. (1995), the possibility of star-like objects which never ignite nuclear burning was only a speculation. Within a year of the start of 2MASS, Kirkpatrick et al. (1999) had found 20 brown dwarfs in the field, increasing the number of known brown dwarfs by a factor of four, and had defined two new stellar spectral types - L and T. (These strange designations were determined by the fact that various odd stellar types had already used up nearly all the other letters of the alphabet.) The transition from M to L was defined by the change of key atmospheric spectral features from those of metal oxides to metal hydrides and neutral metals; the transition from L to T by the appearance of molecular features such as methane - as seen in solar system planets. The effective temperature for L dwarfs is in the range T ∼ 1500–2000 K, and for T-dwarfs T ∼ 1000–1500 K. As of the time of writing, almost 600 brown dwarfs are known. Most of these are L-dwarfs, but almost 60 T-dwarfs have now been found in a series of 2MASS papers (see Ellis et al. 2005 and references therein).

The much deeper UKIDSS search is expected to make significant further advances in two ways. The first is by pushing to ever cooler and fainter objects, hopefully finding examples of a putative new stellar class labelled ‘Y dwarfs’ (the last usable letter left ... see Hewett et al. 2006), finding T-dwarfs further than 10pc, and plausibly finding Population II brown dwarfs if they exist. The second advance expected from UKIDSS is the determination of the substellar mass function, through the Galactic Clusters Survey (GCS), and testing whether it is universal or not.
These hopes are already being borne out by early UKIDSS results; Warren et al. (2007b) report the discovery of the coolest known star, classified as T8.5; and in early results from the GCS programme, Lodieu et al. (2007) have found 129 new brown dwarfs in Upper Sco, a significant fraction of all known brown dwarfs, including a dozen below 20 Jupiter masses, finding the mass function in the range 0.3 – 0.01 solar masses to have a slope of index α = 0.6 ± 0.1.

Rare objects : the ionisation history of the Universe.
An excellent example of the ‘needle in a haystack’ search is looking for very high redshift quasars. Only the most extremely luminous quasars are detectable at these distances, but the space density of such objects is very low; even in a survey with thousands of square degrees there may be only a few present. Luminous and high redshift quasars are interesting for a variety of reasons, but a key target for four decades has been their use as beacons to detect the re-ionisation of the inter-galactic medium. The baryon content of the early universe must have become neutral as it cooled down, but something subsequently re-ionised it, as attempts to find the expected ‘Gunn-Peterson trough’ (Gunn and Peterson 1965) in the spectra of high redshift quasars had failed for many years. This finally changed in 2001 as SDSS broke the z = 6 quasar redshift barrier (Fan et al. 2001) and Becker et al. (2001) made the first detection of a Gunn-Peterson trough at z = 6.28. Figure 5 shows the improved spectrum from White et al. (2003). Unfortunately this exciting result seemed to conflict with the CMB measurements in the WMAP year-1 data. The degree of scattering required implied that ionisation had already taken place by z = 11–30 (Kogut et al. 2003).
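To see where these features fall in an observed spectrum, the rest wavelengths of the Lyman lines can be scaled by (1 + z). The sketch below uses the standard rest wavelengths of Lyα (1215.67 Å) and Lyβ (1025.72 Å), which are not quoted in the text; the function and constant names are illustrative only.

```python
LYMAN_ALPHA_REST_ANGSTROM = 1215.67
LYMAN_BETA_REST_ANGSTROM = 1025.72


def observed_wavelength(rest_wavelength, redshift):
    """Observed wavelength of a line emitted at rest_wavelength by a source at redshift."""
    return rest_wavelength * (1.0 + redshift)


for z in (6.28, 7.0):
    lya = observed_wavelength(LYMAN_ALPHA_REST_ANGSTROM, z)
    lyb = observed_wavelength(LYMAN_BETA_REST_ANGSTROM, z)
    print(f"z = {z}: Lya at {lya:.0f} A, Lyb at {lyb:.0f} A")
```

At z = 6.28 Lyα is observed near 8850 Å and Lyβ near 7470 Å, so absorption just blueward of each line falls in the 8500–9000 Å and 7000–7500 Å ranges given in the caption of Figure 5. At z = 7 Lyα moves to about 0.97 µm, at the red edge of the optical bands.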
Rather than this being a contradiction, it seems likely that re-ionisation was not a single sharp-edged event, but an extended and very likely complex affair, perhaps with multiple stages and even spatial inhomogeneity (see White et al. 2003). This opens an entire new field of investigation for understanding the history of the early universe. Rather than a single object locating the transition edge, it is now important to find as many beacons as possible at z > 6, and to find some beacons in the range z = 7–8. This is one of the key aims of UKIDSS, in combination with SDSS data, looking for z-dropouts. A problem however is that JHK colours of high-z quasars and T-dwarfs become very similar. For this reason, UKIDSS is using a Y-band filter centred at 1.0 µm. Figure 5b illustrates the point, comparing the spectrum of a quasar redshifted to z = 7 with that of a T-dwarf brown dwarf.

4 Next steps in optical-IR surveys

Three key optical-IR survey projects are to begin soon (VISTA, PanSTARRS, and WISE), with the ultimate in wide-field surveys (LSST) now in the planning stage. Here I briefly summarise each of these.

VISTA.
The Visible and Infrared Survey Telescope for Astronomy (VISTA) is a 4m aperture dedicated survey telescope on Paranal in Chile. It was originally a UK project, aimed at both optical and IR surveys, but became an IR-only ESO telescope during the accession of the UK to ESO. The infrared camera operates at Z, Y, J, H, and Ks, and contains 16 arrays, each of which has 2048×2048 pixels of 0.33 arcsec, covering 0.6 sq.deg. in each shot. VISTA therefore operates in the same parameter space as UKIDSS, but will survey three times faster, and furthermore, 100% of the time is dedicated to IR surveys. The majority (75%) of the telescope time is reserved for large public surveys.
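The 0.6 sq.deg. quoted for the VISTA camera follows directly from the detector format. The short calculation below multiplies out the numbers given above (16 arrays of 2048×2048 pixels at 0.33 arcsec per pixel); it counts active pixels only and ignores the gaps between arrays, and the function name is hypothetical.

```python
def mosaic_area_sq_deg(n_arrays, pixels_per_side, pixel_scale_arcsec):
    """Total sky area covered by the active pixels of a detector mosaic, in square degrees."""
    # Side length of one square array on the sky, converted from arcsec to degrees.
    side_deg = pixels_per_side * pixel_scale_arcsec / 3600.0
    return n_arrays * side_deg ** 2


vista_area = mosaic_area_sq_deg(n_arrays=16, pixels_per_side=2048, pixel_scale_arcsec=0.33)
print(f"{vista_area:.2f} sq.deg.")
```

The result is about 0.56 sq.deg., which rounds to the quoted 0.6 sq.deg.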
At the time of writing, these surveys are in the process of final approval, but are likely to include a complete hemisphere survey to K=18.5, surveys of the Galactic Bulge and the Milky Way, a thousand sq.deg. survey to K=19.5, a 30 sq.deg. survey to K=21.5, and a 1 sq.deg. survey to K=23. VISTA is expected to begin operations in late 2007. The VISTA web page is at http://www.vista.ac.uk/, and a recent reference is McPherson et al. (2006).

PanSTARRS.
The power of a survey facility is measured by its étendue, the product of collecting area and field of view. The cost of a telescope, and the difficulty of producing very wide fields, scale steeply with telescope aperture. The idea behind the ‘Panoramic Survey Telescope and Rapid Response System’ (PanSTARRS), a University of Hawaii project, is to produce the maximum étendue per unit cost by building several co-operating wide field telescopes of moderate size. The design has four 1.8m telescopes, each with a mosaic array of 64×64 CCD chips covering 7 sq.deg., which will produce an étendue an order of magnitude larger than the SDSS facility. As well as enabling one to produce deep surveys faster, this makes it plausible to cover very large areas of sky repeatedly - thousands of square degrees per night. The prime aim of PanSTARRS is to detect potentially hazardous NEOs, but it will also be used for stellar transits, microlensing studies, and locating distant supernovae to constrain the dark energy problem. The accumulated sky survey will be many times deeper than SDSS, and the expected image quality and stability from Hawaii should allow the best ever mapping of dark matter via weak lensing distortions.

A prototype single PanSTARRS system (‘PS1’) has recently been built and is being commissioned at the time of writing. The operation and science analysis for PS1 involves an extended ‘PS1 Science Consortium’ with additional partners from the US, UK and Germany.
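The étendue comparison between PanSTARRS and SDSS can be checked with a few lines of arithmetic. The sketch below assumes unobscured circular apertures and uses the figures quoted in this review: four 1.8m telescopes with 7 sq.deg. fields for the full PanSTARRS design, and a 2.5m telescope with a 1.5 sq.deg. camera for SDSS. Function and variable names are illustrative.

```python
import math


def etendue(aperture_m, field_sq_deg, n_telescopes=1):
    """Etendue in m^2 deg^2: collecting area times field of view, summed over telescopes."""
    # Assumption: a filled circular aperture with no central obscuration.
    collecting_area = math.pi * (aperture_m / 2.0) ** 2
    return n_telescopes * collecting_area * field_sq_deg


panstarrs = etendue(aperture_m=1.8, field_sq_deg=7.0, n_telescopes=4)
sdss = etendue(aperture_m=2.5, field_sq_deg=1.5)
print(f"PanSTARRS: {panstarrs:.1f}, SDSS: {sdss:.1f}, ratio: {panstarrs / sdss:.1f}")
```

The ratio comes out at roughly ten, consistent with the order-of-magnitude gain stated for the four-telescope design.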
Over three years, it is expected to produce a 3π steradian survey at grizy to z=23 with 12 visits, a Medium Deep Survey visiting twelve 7 sq.deg. fields with a 4 day cadence, building up a survey to z=26, and special stellar transit campaigns and microlensing monitoring of M31. The data will become public at the end of this science programme. Information about PanSTARRS can be found at http://pan-starrs.ifa.hawaii.edu/public/.

Figure 6: Spectral energy distributions of various objects compared to 5σ sensitivities of key sky surveys. Green triangles are JHK sensitivities for a proposed extension to the UKIDSS Large Area Survey, the UKIRT Hemisphere Survey (UHS). Blue circles are for the WISE mission, from Mainzer et al 2005. Red circles (PS1) are for the PanSTARRS-1 3π survey, taken from project documentation. The left hand frame compares extragalactic objects - a giant elliptical at z = 2; the mean quasar continuum SED from Elvis et al 1994 redshifted to z = 2; and a high redshift quasar spectrum redshifted to z = 7. The right hand frame compares two model brown dwarf spectra, from Burrows et al (2003, 2006). The red line (lower curve at low frequency) is for an object with effective temperature of 1000K and surface gravity of 4.5, placed at a distance of 50pc. The black line (upper curve at low frequency) is for an object with mass of 10 Jupiter masses and age 5 Gyr, placed at a distance of 1pc. Panel titles: ‘Sky survey comparisons : Extragalactic’ (left) and ‘Sky survey comparisons : Brown dwarfs’ (right); the horizontal axis of each is Log(ν) in Hz.

WISE.
The Wide Field Infrared Survey Explorer (WISE) is a NASA MIDEX mission scheduled for launch in 2009 that will fill the gap between UKIDSS/VISTA in the near-IR and IRAS and Akari in the far-IR, surveying the sky in four bands simultaneously (3.3, 4.7, 12, and 23µm).
The sky survey at 3 and 5µm is completely new territory; at 12 and 23µm WISE covers the same territory as IRAS but will be a thousand times deeper. WISE carries a 40cm cooled telescope with a 47 arcmin field of view. It is designed to have a relatively short lifetime - 7 months - but in this time will make a mid-infrared survey of the entire sky in all four bands. WISE will produce significant advances in a number of areas, but especially for objects expected to have temperatures in the hundreds of degrees - the very coolest brown dwarfs, protoplanetary discs, solar system bodies, and obscured quasars. Information about WISE can be found at http://wise.ssl.berkeley.edu/ and in Mainzer et al. (2006).

The depths of the PS1, UKIDSS-VISTA, and WISE surveys are well matched and will produce a stunning sky survey dataset over a factor of a hundred in wavelength. This is illustrated in Fig. 6, taken from a recent proposal to extend UKIDSS to a complete hemisphere survey.

LSST.
The Large Synoptic Survey Telescope (LSST) aims at the maximum possible étendue, targeting the same kind of science as PanSTARRS - hazardous NEOs, GRBs, supernovae, dark matter mapping via weak lensing - but a factor of several faster; it should be able to produce a survey equivalent to SDSS every few days. The design has an 8.4m telescope with a 10 sq.deg. field of view. The planned standard mode of use is to take 15 second exposures, and keep moving, covering the whole sky visible from LSST in bands ugrizy once every three days. This produces 15TB of imaging data every night. The aim is to keep up with this flow in quasi-real time, producing alerts for transient objects within minutes. This requires approximately 60 TFlops of processing power - a huge amount today, but following Moore’s Law, very likely to be equivalent to merely the 500th most powerful computer in the world by 2012.
The LSST data management plan has a hierarchy of archive and data centres, reminiscent of the LHC Grid, with the primary mission facility acting like a ‘beamline’, where a variety of research groups can rent space for their own experiments on the data flowing past. The LSST site has now been chosen (Cerro Pachon in Chile), but the project is not yet fully funded. More information can be found at http://www.lsst.org

5 The end of survey discoveries ?

In 1950, the universe seemed to consist of stars, and a sprinkling of dust. Over the last fifty years, the actual diverse and bizarre contents of the universe have been successively revealed as we surveyed the sky at a series of new wavelengths. Radio astronomy has shown us radio galaxies and pulsars; microwave observations have given us molecular clouds and the Big Bang fossil background; IR astronomy has shown us ultraluminous starburst galaxies and brown dwarfs; X-ray astronomy has given us collapsed object binaries and the intra-cluster medium; and submm astronomy has shown us debris disks and the epoch of galaxy formation. As well as revealing strange new objects, these surveys revealed new states of matter (relativistic plasma, degenerate matter, black holes) and new physical processes (bipolar ejection, matter-antimatter annihilation). Now that gamma-rays and the submm have been opened up with GRO and SCUBA, there are no new wavelength windows left. Has this amazing journey of discovery now finished ?

Wavelength is not the only possible axis of survey discovery space. Let us step through some other axes and examine their possibilities. In doing this, we will to some extent go over ground already trodden by Harwit (2003), but with a particular emphasis on surveys rather than discovery space in general, and with an eye to what is economically plausible.

Photon Flux.
Historically, going ever deeper has been as productive as opening new wavelength windows, the classic example of course being the existence of the entire extragalactic universe, which did not become apparent until reaching ten thousand times fainter than naked eye observations, requiring both large telescopes and the ability to integrate. We can now see things ten billion times fainter than the naked eye stars. However, we have reached the era of diminishing returns. The limiting flux reached by a telescope is inversely proportional to its diameter D, but its cost is proportional to D^3. Significant improvements can now only be achieved with world-scale facilities, and orders of magnitude improvements are unthinkable. The easy wins have been covered already - our detectors now achieve close to 100% quantum efficiency; we have gone into space and reduced sky background to a minimum; and multi-night integrations have been used many times. We will keep building bigger telescopes, but it no longer seems the fast track to discovery.

Spectral resolution. Detailed spectroscopy of individual objects is of course the key technique of modern astrophysics. Spectroscopic surveys of samples drawn from imaging surveys have been carried out at many wavelengths, and have been particularly important for measuring redshift and so mapping the Universe in 3D; we were not expecting the voids, bubbles and walls that we found in the galaxy distribution in the 1980s. This industry will continue, but there is no obvious new barrier to break. Narrow band imaging surveys centred on specific atomic or molecular features (21cm HI, CO, Hα) have been fruitful, but again it is not obvious there is anywhere new to go.

Polarisation. Polarisation measurements of individual objects are a very important physical diagnostic, but are polarisation surveys plausible ? Surveys of samples of known objects to the 0.1% level have been done, with interesting results but no big surprises.
Perhaps blank field imaging surveys in four Stokes parameters would turn up unexpected highly polarised objects ? This has essentially been done in radio astronomy but not at other wavelengths.

Spatial resolution. This is the dominant big-project target of the next few decades, and of course is the real point of Extremely Large Telescopes. Put together with multi-conjugate Adaptive Optics, we hope to achieve both depth and milli-arcsec resolution at the same time. However, the royal road to high spatial resolution is through interferometry. Surveys with radio interferometers in the twentieth century showed the existence of masers in space, and bulk relativistic outflow. In the twenty first century we will be doing microwave interferometry on the ground (ALMA) and IR interferometry in space (TPF/DARWIN), hoping to directly detect Earth-like planets around nearby stars. So there is excitement for at least some time to come; however, as with photon flux, we are hitting an economic brick wall. Significantly bigger and better experiments will be a very long time coming.

Time. The observation of temporal changes has repeatedly brought about revolutionary changes in astronomy, the classic examples being Tycho’s supernova, and the measurement of parallax. The last two decades have seen a renaissance in this area, with an impressive number of important discoveries from relatively cheap monitoring experiments - the discovery of extrasolar planets from velocity wobbles and transits; the discovery of the accelerating universe and dark energy from supernova campaigns; the location of substellar objects from survey proper motions; the existence of Trans-Neptunian Objects, and Near Earth Objects; the final pinning down of gamma-ray burst counterparts; and the limits on dark matter candidates from micro-lensing events.
The next decade or two will see more ambitious photometric monitoring experiments, such as PanSTARRS and LSST, and a series of astrometric missions, culminating in GAIA, which will see external galaxies rotating. Overall, the ‘time window’ is well and truly opened up. However, the temporal frequency axis is far from fully explored. My instinct is that this technique will continue to produce surprises for some time.

Non-light channels : particles. Cosmic ray studies have been important for many decades, but you can’t really do surveys - indeed the central mystery has always been where they come from. Dark matter experiments are confronting what is arguably the most important problem in physics, let alone astrophysics, but again no survey is plausible. The big hope is neutrino astrophysics. Neutrinos should emerge from deep in the most fascinating places that we could otherwise never see - supernova cores, the centres of stars, the interior of quasar accretion discs. Measurement of solar neutrinos has solved a long standing problem, and set a challenge for particle physics - but what about the rest of the Universe ? New experiments such as ANTARES (under the sea) and AMANDA (under the ice) seem to be clearly detecting cosmic neutrinos, but no distinct sources have yet emerged. Possibly the next generation (ICECUBE) will get there. This looks like the best bet for genuinely unexpected discoveries in the twenty first century.

Non-light channels : gravitational waves. As with neutrinos, we know that gravitational waves have to be there somewhere, and their existence has been indirectly proved by the famous binary pulsar timing experiment. However, after many years of exquisite technical development, we still have no direct detection of a gravitational wave. The space interferometer mission LISA should finally detect gravitational waves, unless current predictions are badly wrong. However even LISA will not produce a genuine survey.
We will detect many events and understand more astrophysics, but will have essentially no idea where they came from, except that hopefully some will correlate with gamma-ray bursts. If we see totally unexpected signals, it will be very hard to know what to do next.

Hyper-space planes : the Virtual Observatory. As we explore the various possible axes one by one, many if not most of them are running out of steam, or are too expensive to pursue. But we are a long way short of exploring the whole space - for example narrow line imaging in all Stokes parameters versus time. This exploration does not necessarily need complex new experiments. More survey-quality datasets come on line every year. As formats, access and query protocols, and analysis tool interaction protocols all get standardised, the virtual universe becomes easier for the e-astronomer to explore, and unexpected results will emerge. This, of course, is the agenda of the worldwide Virtual Observatory initiative.

6 Conclusions

Surveys are perhaps the most cost effective and productive way of doing astronomy. In recent years optical, infra-red, and redshift surveys have produced spectacular results in determining cosmological parameters, finding the smallest stellar objects, decoding the history of the Milky Way, and much else besides. Surveys underway now and over the next few years should also produce impressive science. Surveys have been the main engine of discovery for decades, but there is a worry now that we have already explored every axis of discovery space. The best hopes for unexpected discoveries may be in massive time domain surveys, in neutrino astrophysics, and in exploring the full multi-dimensional space through the Virtual Observatory.

7 REFERENCES

Becker, R.H., Fan, X., White, R.L., 2001, AJ, 122, 2850.
Burrows, A., Sudarsky, D., Lunine, J.I., 2003, ApJ, 596, 587.
Burrows, A., Sudarsky, D., Hubeny, I., 2006, ApJ, 640, 1063.
Cole, S., Percival, W.J., Peacock, J.A., et al. 2005, MNRAS, 363, 505.
Colless, M.M., Dalton, G.B., Maddox, S.J., et al. 2001, MNRAS, 328, 1039.
Dye, S., Warren, S.J., Hambly, N.C., et al. 2006, MNRAS, 372, 1227.
Eisenstein, D.J., Zehavi, I., Hogg, D.W., et al. 2005, ApJ, 633, 560.
Efstathiou, G., Moody, S., Baugh, C., et al. 2002, MNRAS, 330, 29.
Elgarøy, Ø., Lahav, O., Percival, W.J., et al. 2002, Phys.Rev.Lett., 89, 1301.
Ellis, S.C., Tinney, C.G., Burgasser, A.J., Kirkpatrick, J.D., McElwain, M.W., 2005, AJ, 130, 2347.
Elvis, M., Wilkes, B.J., McDowell, J.C., Green, R.F., Bechtold, J., Willner, S.P., Oey, M.S., Polomski, E., Cutri, R., 1994, ApJSupp, 95, 1.
Erdogdu, P., Huchra, J.P., Lahav, O., et al. 2006, MNRAS, 368, 1515.
Fan, X., Narayanan, V.K., Lupton, R.H., et al. 2001, AJ, 122, 2833.
Gunn, J.E., Peterson, B.A., 1965, ApJ, 142, 1633.
Harwit, M., 2003, Physics Today, November 2003, 38.
Hewett, P.C., Warren, S.J., Leggett, S.K., Hodgkin, S.L., 2006, MNRAS, 367, 454.
Huetsi, G., 2005, A&A, 449, 891.
Ibata, R.A., Gilmore, G., Irwin, M.I., 1994, Nature, 370, 194.
Jarrett, T.H., 2004, PASA, 21, 396.
Jones, D.H., Saunders, W., Colless, M., et al. 2004, MNRAS, 355, 747.
Juric, M., Ivezic, Z., Brooks, A., et al. 2005, ApJ submitted (astro-ph/0510520)
Kirkpatrick, J.D., Reid, I.N., Liebert, J., Cutri, R.M., Nelson, B., Beichman, C.A., Dahn, C.C., Monet, D.G., Gizis, J.E., Skrutskie, M.F., 1999, ApJ, 519, 802.
Kogut, A., Spergel, D.N., Barnes, C., et al. 2003, ApJSupp, 148, 161.
Lawrence, A., Warren, S.J., Almaini, O., et al. 2006, MNRAS submitted (astro-ph/0604426)
Lodieu, N., Hambly, N.C., Jameson, R.F., Hodgkin, S.T., Carraro, G., Kendall, T.R., 2007, MNRAS, 374, 372.
Maddox, S.J., Sutherland, W.J., Efstathiou, G., Loveday, J., 1990, MNRAS, 243, 692.
Mainzer, A.K., Eisenhardt, P., Wright, E.L., Liu, F-C., Irace, W., Heinrichsen, I., Cutri, R., Duval, V., 2006, Proc SPIE, 6256, 61.
Majewski, S.R., Skrutskie, M.F., Weinberg, M.D., Ostheimer, J.C., 2003, 599, 1082.
McPherson, A.M., Born, A., Sutherland, S., Emerson, J., Little, B., Jeffers, P., Stewart, M., Murray, J., Ward, K., 2006, Proc SPIE, 6267, 7.
Nakajima, T., Oppenheimer, B.R., Kulkarni, S.R., Golimowski, D.A., Matthews, K., Durrance, S.T., 1995, Nature, 378, 463.
Percival, W.J., Baugh, C.M., Bland-Hawthorn, J., et al. 2001, MNRAS, 327, 1297.
Percival, W.J., Sutherland, W.J., Peacock, J.A., et al. 2002, MNRAS, 337, 1068.
Peacock, J.A., 2002, ASP Conf Series, 283, 19.
Pope, A.C., Matsubara, T., Szalay, A.S., et al. 2004, ApJ, 607, 655.
Saunders, W., Sutherland, W.J., Maddox, S.J., et al. 2000, MNRAS, 317, 55.
Skrutskie M.F., Cutri R.M., Stiening, et al. 2006, AJ, 131, 1163.
Tegmark, M., Blanton, M., Strauss, M., et al. 2004, ApJ, 606, 702.
Warren, S.J., Hambly, N.C., Dye, S., et al. 2007a, MNRAS in press (astro-ph/0610191)
Warren, S.J., Mortlock, D.J., Leggett, S.K., et al. 2007b, MNRAS submitted
White, R.L., Becker, R.H., Fan, X., Strauss, M.A., 2003, AJ, 126, 1.
York, D.G., Adelman, J., Anderson, J.E., et al. 2000, AJ, 120, 1579.

ABSTRACT

I review the status of science with wide field surveys. For many decades surveys have been the backbone of astronomy, and the main engine of discovery, as we have mapped the sky at every possible wavelength. Surveys are an efficient use of resources.
They are important as a fundamental resource; to map intrinsically large structures; to gain the necessary statistics to address some problems; and to find very rare objects. I summarise major recent wide field surveys - 2MASS, SDSS, 2dFGRS, and UKIDSS - and look at examples of the exciting science they have produced, covering the structure of the Milky Way, the measurement of cosmological parameters, the creation of a new field studying substellar objects, and the ionisation history of the Universe. I then look briefly at upcoming projects in the optical-IR survey arena - VISTA, PanSTARRS, WISE, and LSST. Finally I ask, now we have opened up essentially all wavelength windows, whether the exploration of survey discovery space is ended. I examine other possible axes of discovery space, and find them mostly to be too expensive to explore or otherwise unfruitful, with two exceptions : the first is the time axis, which we have only just begun to explore properly; and the second is the possibility of neutrino astrophysics. <|endoftext|><|startoftext|> Preprint submitted to Elsevier Science, 30 October 2018 (http://arxiv.org/abs/0704.0810v2)

1 Introduction

Measurement of polarization anisotropies in the Cosmic Microwave Background (CMB) is one of the great challenges in cosmology today. Very sensitive measurements of these anisotropies, particularly at large angular scales, will provide unique constraints on the influence of gravitational waves on the production of structure in the very early Universe and information on the epoch of reionization.

Several experiments are running or in the planning stages, and long term development for a future space mission attacking CMB polarization is underway. To date, nearly all of the effort has been directed towards maximizing the number of detectors in the focal plane to achieve the required sensitivity.
Relatively little work is going into sub-orbital efforts to constrain polarization fluctuations at the largest angular scales, those most interesting for their impact on understanding the inflationary epoch and ionization history of the universe. This is primarily because of an unproven perception that very low multipoles will not be accessible to any but space-based missions. Indeed, large scale polarization has been searched for with ground based experiments over the last 30 years.

The COsmic Foreground Explorer (COFE) is a balloon-borne instrument to measure the low frequency and low-ℓ characteristics of some dominant polarized foregrounds. Good understanding of these foregrounds is critical both for interpreting recent results, e.g. Spergel et al. (2006), and for appropriately planning future CMB missions. The experiment also explores low-ℓ limits to CMB polarization measurements at moderate frequencies from non-space based platforms. We believe that balloon and ground-based measurements to characterize in detail the polarized microwave sky are essential to prepare a future space mission dedicated to CMB B-modes.

2 Science

The CMB radiation field is an observable that provides direct information from the early Universe. The temperature and polarization characteristics of this field impose constraints on cosmological scenarios relevant to understanding the origin and the structure of the Universe. Accurate measurements of the CMB are vital to improve our understanding of the geometry, mass-energy composition, and reionization of the Universe. Ultimately, the CMB could also provide indirect detection of a stochastic gravitational background and information from the inflationary epoch itself. Having this big picture in mind, several CMB experiments are now trying to constrain the tensor-to-scalar ratio value and to detect the B-mode signature.
Among all practical limitations to primordial tensor amplitude detection, contamination due to diffuse polarized microwave foreground emission is certainly the fundamental one. This emission presents spatial and frequency variations that are not well known, and the residuals from foreground subtraction are restricting our knowledge of CMB polarization. This is particularly true for future B-mode experiments, which will benefit if accurate determinations of the spatial and spectral characteristics of polarized foregrounds are made. For this reason, multifrequency measurements of the polarized foregrounds in the microwaves are now recognized as a key objective within the CMB community.

At low frequencies, foregrounds include synchrotron, free-free, and possible spinning dust emission. Synchrotron dominates the low frequency range of the microwave sky. Its emission is caused by relativistic charged particles interacting with the Galactic magnetic field and can be highly polarized. Synchrotron measurements provide better understanding of the Galactic magnetic field structure and the density of relativistic electrons across the Galaxy. Free-free emission becomes more important in the microwave intermediate frequency range, and it is due to electron-ion scattering. Free-free is expected to be unpolarized but this might not be true at the edges of HII clouds. Electrical dipole emission from spinning dust has also been suggested by recent observations at low microwave frequencies, e.g. Finkbeiner et al. (2004).

COFE is a balloon-borne microwave polarimeter to measure spatial and low-frequency characteristics of diffuse polarized foregrounds. This is an important effort toward characterizing the polarized foregrounds for future CMB experiments, in particular the ones that aim to detect primordial gravitational wave signatures in the CMB polarization angular power spectrum.
3 Instrumentation

3.1 Telescope

A modified BEAST telescope design is the basis for the COFE optics (Childers et al., 2005; Figueiredo et al., 2005; Meinhold et al., 2005; Mejía et al., 2005; O’Dwyer et al., 2005). It consists of an off-axis Gregorian configuration obeying the Dragone-Mizuguchi condition (Dragone, 1978; Mizuguchi et al., 1978). The telescope is optimized for minimal cross-polarization contamination and maximum focal plane area. The primary reflector is a 2.2 m off-axis parabolic reflector. The incoming radiation is reflected off the primary reflector towards a polarization modulating wave plate and then to the secondary reflector. The 0.9 m ellipsoidal secondary reflects the incoming radiation toward the array of scalar feed horns that couple the radiation to an array of cryogenic low noise amplifiers. The telescope will be mounted in a gondola that has been simplified from a standard balloon-borne design due to the very light carbon fiber optical elements. A schematic of the optics is shown in Figure 1.

3.2 Polarization modulator

COFE will employ a low-loss reflective polarization modulator for measuring both Q and U simultaneously. It consists of a linear polarizing wire grid mounted in front of a reflecting plate. The wire grid decomposes the input wave into components parallel and perpendicular to the wires, reflecting the parallel component with low loss. The perpendicular component passes through the wire grid and reflects off the back short, passes through the grid again and recombines with the parallel component. The distance between the plate and the grid introduces a phase shift between the two components, effectively rotating the plane of polarization of the input wave. A schematic of the polarization modulator is shown in Figure 2. Rotating the grid chops between the two polarization states four times per revolution as shown in Figure 3.
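As a rough illustration of how a signal modulated four times per revolution can be separated into Q and U in software, the following Python sketch assumes an idealized detector output d(θ) = I + Q cos 4θ + U sin 4θ, where θ is the grid rotation angle. The signal model, the sinusoidal reference waveforms, and the function names `simulate` and `demodulate` are assumptions made for illustration only; they are not taken from the COFE pipeline, which may use square-wave commutation instead.

```python
import math


def simulate(theta, i0, q, u):
    """Assumed ideal detector output at grid angle theta (radians)."""
    return i0 + q * math.cos(4 * theta) + u * math.sin(4 * theta)


def demodulate(samples, thetas):
    """Lock-in style demodulation against cos(4*theta) and sin(4*theta).

    Returns the estimated (I, Q, U) for samples taken uniformly over
    whole revolutions of the modulator.
    """
    n = len(samples)
    i0 = sum(samples) / n
    q = 2.0 / n * sum(s * math.cos(4 * t) for s, t in zip(samples, thetas))
    u = 2.0 / n * sum(s * math.sin(4 * t) for s, t in zip(samples, thetas))
    return i0, q, u


# One revolution, oversampled uniformly (256 samples is an arbitrary choice).
n_samples = 256
thetas = [2 * math.pi * k / n_samples for k in range(n_samples)]
samples = [simulate(t, i0=10.0, q=0.3, u=-0.2) for t in thetas]

i_est, q_est, u_est = demodulate(samples, thetas)
print(i_est, q_est, u_est)
```

Because the samples span a full revolution uniformly, the two references are orthogonal to each other and to a constant offset, so in this noiseless model the unpolarized level does not leak into the recovered Q and U.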
Tests of this modulator were performed at 41.5 GHz, using a 70 cm telescope. We measured beam patterns for the rotated polarization states and integrated for extended periods on the sky in Santa Barbara, CA. We were able to determine a 1/f knee lower than 50 mHz and very stable long term offsets. We also demodulated sky data to the two different states and calculated the correct combined sensitivity, as seen in Figure 4.

The polarization modulator has a broad bandwidth. We achieved 22 dB isolation at 20% bandwidth. The radiometric loss of the elements in the modulator can easily be made very low (of order 0.11%) up to relatively high frequencies. The system works for a very wide range of frequency bands.

3.3 Receiver

COFE will use InP MMIC (Indium Phosphide Monolithic Microwave Integrated Circuit) amplifiers integrated into simple total power receivers. All of the RF gain will be integrated into a small compact module inside the vacuum chamber. The module will contain 3 to 4 amplifiers (∼ 75 dB of gain), a band pass filter, a cryogenic detector diode, and an audio amplifier. The module avoids the need for cryo/vacuum waveguide feedthroughs on the dewar, simplifying the overall design. The audio amplifiers will be within the cryostat vacuum vessel for simplicity and noise reasons, but will be at ambient temperatures. COFE requires a modest number of feeds, and no orthomode transducers or hybrid tees, so the passive components are minimal. A schematic of the receiver is shown in Figure 5.

3.4 Data acquisition/demodulation

Data acquisition will use the same technique we have been using in our test system, namely synchronous sampling of analog integrators. We oversample the data by a large factor and perform the demodulation of Q and U Stokes parameters (and other modes for systematic error analysis) in software.
This yields the most information and allows a variety of post-flight tests including null signal analysis and analysis of the DC or total power components (contaminated with 1/f, but still useful for systematic tests).

3.5 Ground-based B-machine prototype

A prototype polarimeter for a B-mode project, named B-machine, is being deployed at the WMRS (White Mountain Research Station) Barcroft facility, CA (118°14′ W longitude, 37°35′ N latitude, 3800 m altitude). The WMRS facility is an excellent site for microwave observation because of a cold microwave zenith temperature, low precipitable water vapor, and a high percentage of clear days (Marvil et al., 2006).

Many of the components that will be used by the B-machine prototype are useful for COFE as well. For example, the prototype will allow systematic checks of the polarization modulator and the COFE scan strategy. The B-machine prototype will be able to yield some basic higher multipole results on the foregrounds as well as the polarization signature and establish a data analysis pipeline.

The prototype possesses telescope and detector technology identical to COFE. It has 2 Ka-band and 6 Q-band channels centered at 31 and 41.5 GHz with FWHM resolution of 28′ and 20′ respectively. The receiver has been previously used in anisotropy measurements (Childers et al., 2005). The telescope runs at constant elevation while continuously scanning the sky in azimuth. A photograph of the B-machine prototype is shown in Figure 6.

4 Performance

For any sub-orbital CMB experiment, minimizing atmospheric contamination is important. For the COFE bands, total atmospheric emission at our target altitude of 35 km is less than 1 mK. Common broad band bolometric atmospheric antenna temperature contributions at balloon altitudes are several hundred mK or more. Since the effective CMB antenna temperature drops with frequency, our effective atmospheric signal is approximately 1000 times less than for a bolometric balloon-borne system.
Hence low-ℓ information from a balloon-borne system is very clean by comparison. Figure 7 shows the atmosphere and predicted foreground emission over a range of frequencies interesting for CMB work (the foreground prediction is calculated from Bennett et al. (2003)).

4.1 Receiver bands and expected receiver sensitivity

Receiver sensitivity can be estimated according to the radiometer equation

$$\sigma_T = K \, \frac{T_{sys} + T_{sky}}{\sqrt{\Delta\nu \cdot \tau}} , \qquad (1)$$

where σ_T is the root-mean-square noise, T_sys is the system noise temperature, T_sky is the sky antenna temperature, ∆ν is the bandwidth, τ is the integration time, and K is the sensitivity constant of the receiver. For COFE and the B-machine prototype the sensitivity constant of each receiver is K = π […]. The signal is sine wave modulated, reducing the sensitivity by a factor […] as compared with a standard Dicke receiver, with an additional factor of […] from the standard definition for Q and U in the Rayleigh-Jeans regime of the CMB spectrum. Table 1 shows our estimation of the sensitivity of each receiver.

Table 1 – Instrument parameters.

                                           COFE                  B-machine
Central frequency (GHz)                    10     15     20      31     41.5
FWHM beam (arcmin)                         83     55     42      28     20
T_sys (K)                                  8      10     12      25     27
T_sky (K) at target altitude (a)           2.5    2.4    2.3     6.4    13.0
Bandwidth (GHz)                            4      4      5       10     7
Number of receivers                        3      6      10      2      6
Sensitivity per receiver (µK s^{1/2})      261    308    318     493    751
Aggregate sensitivity (µK s^{1/2})         151    126    100     348    307

(a) For COFE and B-machine (ground based), we compute the expected T_sky antenna temperature at target altitudes of 35 km and 3.8 km, respectively.

By increasing the number of receivers, future ground-based or balloon-borne experiments can significantly improve aggregate sensitivity. For instance, 30 detectors could reach 61 µK s^{1/2} and 107 µK s^{1/2} at 30 and 40 GHz, respectively.

4.2 Scan strategy, sky coverage and expected map sensitivity

COFE uses a simple scan strategy to cover the largest available sky area in each flight.
The telescope will be pointed nominally 45° from the horizon to minimize ground and balloon pickup, and the gondola will rotate constantly at approximately 1/2 rpm. The data acquisition sample rate will be synchronized with the polarization rotator (at ∼ 30 Hz). For instance, using this strategy, a 24 hour flight from Fort Sumner, NM, would cover 59% of the sky area with a median aggregate pixel sensitivity of 92 µK/deg^2, 77 µK/deg^2, and 61 µK/deg^2 at 10 GHz, 15 GHz, and 20 GHz respectively.

COFE will acquire data from nearly all of the sky (∼ 93%). This will be achieved in a set of 12 and/or 24 hour flights from the Northern and Southern Hemispheres. Figure 8 provides estimates for sensitivity per square degree pixel over the whole sky for our flight plans. Figure 9 illustrates the expected sky coverage.

The B-machine prototype focuses on higher multipoles but uses a similar scanning strategy from the ground. For a conservative 60 day observing campaign at WMRS, we expect to cover 56% of the sky with a median aggregate sensitivity of 27 µK/deg^2 and 23 µK/deg^2 at 31 GHz and 41.5 GHz, respectively.

5 Conclusion

Over the next few years we will field a balloon-borne telescope to map more than 90% of the sky. Both polarization anisotropy and polarized foregrounds will be measured over several bands. This is an important effort toward characterizing the polarized foregrounds for future CMB experiments.

In addition to foreground detection, COFE will better characterize the polarization modulation capability for measuring Q and U simultaneously. As discussed earlier, a large scale ground-based campaign will capitalize on the technology that has been developed by COFE and the B-machine prototype.

It is clear that our current understanding of the polarization foregrounds limits our ability to make accurate observations of the B-mode signature. COFE will lessen the effect that incomplete models of foregrounds will have on future experiments.
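Returning to the sensitivity estimates of Section 4.1, the per-receiver and aggregate values in Table 1 can be cross-checked with a short sketch of the radiometer equation (1). The value K = π/2 used here is an assumption: it is the constant that reproduces the tabulated per-receiver sensitivities, not a value read from the text. The aggregate figure is taken to be the per-receiver sensitivity divided by the square root of the number of receivers, which is also an assumption. All names in the code are hypothetical.

```python
import math

# Values transcribed from Table 1: frequency (GHz) -> instrument parameters.
TABLE_1 = {
    10.0: dict(t_sys=8.0, t_sky=2.5, bw_ghz=4.0, n_rx=3, per_rx=261, agg=151),
    15.0: dict(t_sys=10.0, t_sky=2.4, bw_ghz=4.0, n_rx=6, per_rx=308, agg=126),
    20.0: dict(t_sys=12.0, t_sky=2.3, bw_ghz=5.0, n_rx=10, per_rx=318, agg=100),
    31.0: dict(t_sys=25.0, t_sky=6.4, bw_ghz=10.0, n_rx=2, per_rx=493, agg=348),
    41.5: dict(t_sys=27.0, t_sky=13.0, bw_ghz=7.0, n_rx=6, per_rx=751, agg=307),
}

# Assumption: inferred from the tabulated numbers, not quoted from the paper.
K_ASSUMED = math.pi / 2


def receiver_sensitivity_uk(t_sys, t_sky, bw_ghz, tau_s=1.0, k=K_ASSUMED):
    """Radiometer equation: K (T_sys + T_sky) / sqrt(bandwidth * tau), in microkelvin."""
    sigma_kelvin = k * (t_sys + t_sky) / math.sqrt(bw_ghz * 1e9 * tau_s)
    return sigma_kelvin * 1e6


def aggregate_sensitivity_uk(per_receiver_uk, n_receivers):
    """Assumed combination of identical, independent receivers."""
    return per_receiver_uk / math.sqrt(n_receivers)


results = {}
for freq, row in TABLE_1.items():
    per_rx = receiver_sensitivity_uk(row["t_sys"], row["t_sky"], row["bw_ghz"])
    agg = aggregate_sensitivity_uk(per_rx, row["n_rx"])
    results[freq] = (per_rx, agg)
    print(f"{freq:5.1f} GHz: per receiver {per_rx:6.1f}, aggregate {agg:6.1f}")
```

Under these assumptions the computed values agree with the tabulated ones to within about 1 µK s^{1/2}, which suggests that the table was built from equation (1) with a one second integration time.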
6 Acknowledgments

We acknowledge support from the National Aeronautics and Space Administration (NASA), and the California Space Institute (CalSpace). T.V. and C.A.W. acknowledge CNPq Grants 305219/2004-9 and 307433/2004-8, respectively. Some of the results have been derived using the HEALPix (http://healpix.jpl.nasa.gov; Górski et al., 2005) package.

References

Bennett, C. L. et al. 2003, ApJS, 148, 97
Childers, J. et al. 2005, ApJS, 158, 124
Dragone, C., 1978, The Bell System Technical Journal, 57, 7, 2663
Figueiredo, N. et al. 2005, ApJS, 158, 118
Finkbeiner, D. P. et al. 2004, ApJ, 617, 350
Górski, K. M. et al. 2005, ApJ, 622, 759
Marvil, J. et al. 2006, New Astronomy, 11, 218
Meinhold, P. R. et al. 2005, ApJS, 158, 101
Mejía, J. et al. 2005, ApJS, 158, 109
Mizuguchi, Y., Akagawa, M., and Yokoi, H., 1978, Electronics and Communications In Japan, 61, 58
O’Dwyer, I. J. et al. 2005, ApJS, 158, 93
Spergel, D. N. et al. 2006, astro-ph/0603449

Fig. 1. Optical schematic for COFE and B-machine prototype telescopes, an off-axis Gregorian configuration optimized for minimal cross-polarization contamination. A 2.2 m parabolic reflector primary, a 0.9 m ellipsoidal secondary, and a 0.3 m rotator grid are shown.

Fig. 2. Schematic of the polarization modulator. The input wave is decomposed into its two linear polarization states, parallel and perpendicular to the wires (represented by dots just above the conducting reflector). The perpendicular component is phase shifted from the extra path length. When added back to the parallel component, the plane of polarization of the input wave is rotated.

Fig. 3. Sample signal from a polarized thermal source. A single revolution of the modulator is shown, along with the reference signal to be used for demodulation. Commutating using this signal yields Q, for instance, while demodulating with a reference phase shifted by π/4 gives U.

Fig. 4.
Sample data from our room temperature radiometer viewing the sky at 41.5 GHz. The undemodulated PSD displays the 1/f knee of the HEMT radiometer of 10 Hz and a white noise of 5.4 mK s^{1/2}. The demodulated data have no visible 1/f and a white noise level consistent with expectation.

Fig. 5. Radiometer layout for COFE.

Fig. 6. Picture of prototype telescope to be deployed at WMRS.

Fig. 7. Atmosphere, CMB, and predicted foreground emission from 5 to 300 GHz. COFE bands run from 10 to 20 GHz. The zenith atmosphere emission is shown at 3.8 and 35 km. The atmospheric emission and lines are mainly due to H2O, O2, and O3. For the target altitude of 35 km, we expect well under 1 mK total emission from the atmosphere. The foreground spectral indices β for free-free, synchrotron, and dust were assumed to be −2.15, −2.7, and 2.2, respectively.

Fig. 8. Integrated histogram of anticipated aggregate sensitivity per 1 deg^2 pixel assuming a 24 hour flight from the Northern Hemisphere (Fort Sumner, NM) and a 24 hour flight from the Southern Hemisphere (Alice Springs, Australia). For each COFE band, we plot the fraction of the entire sky measured with better than a given aggregate sensitivity. The change in the slope of the curves is due to the fact that 35% of the sky can be observed from both hemispheres using the COFE scan strategy.

Fig. 9. Sky coverage for COFE assuming a 24 hour flight from the Northern Hemisphere (Fort Sumner, NM) and a 24 hour flight from the Southern Hemisphere (Alice Springs, Australia). The region observed contains nearly the entire sky (93%). The darker strip shows the overlap between the two observations. For illustration purposes, we show the diffuse Galactic structure obtained by adding synchrotron, free-free and dust maps at 23 GHz (Bennett et al., 2003).
Introduction Science Instrumentation Telescope Polarization modulator Receiver Data acquisition/demodulation Ground-based B-machine prototype Performance Receiver bands and expected receiver sensitivity Scan strategy, sky coverage and expected map sensitivity Conclusion Acknowledgments

ABSTRACT

The COsmic Foreground Explorer (COFE) is a balloon-borne microwave polarimeter designed to measure the low-frequency and low-l characteristics of dominant diffuse polarized foregrounds. Short duration balloon flights from the Northern and Southern Hemispheres will allow the telescope to cover up to 80% of the sky with an expected sensitivity per pixel better than 100 $\mu K / deg^2$ from 10 GHz to 20 GHz. This is an important effort toward characterizing the polarized foregrounds for future CMB experiments, in particular the ones that aim to detect primordial gravity wave signatures in the CMB polarization angular power spectrum.

<|endoftext|><|startoftext|> To appear in “Massive Stars: Fundamental Parameters and Circumstellar Interactions (2007)” RevMexAA(SC)

JET INTERACTIONS IN MASSIVE X-RAY BINARIES

Gustavo E. Romero 1,2

SUMMARY

Massive X-ray binary systems consist of a compact object that accretes matter from the stellar wind of an early-type donor star. In some of these systems, called microquasars, jets of relativistic particles are ejected from the vicinity of the compact object. These jets interact with the photon field of the companion star, with the stellar wind and, at large distances, with the interstellar medium. This work summarizes the main results on such interactions, with special emphasis on the production of high-energy photons and neutrinos. The case of a particular system, LS I +61 303, is discussed in some detail. In addition, future prospects for observations of this type of object at different wavelengths are presented.
ABSTRACT

Massive X-ray binaries are formed by a compact object that accretes matter from the stellar wind of an early-type donor star. In some of these systems, called microquasars, relativistic jets are launched from the surroundings of the compact object. Such jets interact with the photon field of the companion star, the stellar wind, and, at large distances, with the interstellar medium. In this paper I will review the main results of such interactions with particular emphasis on the production of high-energy photons and neutrinos. The case of some specific systems, like LS I +61 303, will be discussed in some detail. Prospects for future observations of this type of object at different wavelengths will be presented.

Key Words: GAMMA RAYS: THEORY — GAMMA RAYS: OBSERVATIONS — JETS AND OUTFLOWS — STARS: BINARIES — STARS: MICROQUASARS

1. INTRODUCTION

Massive stars usually form binary systems. In such systems one of the stars evolves faster than the other. At the end of the lifetime of this star a supernova explosion will occur, and either a neutron star or a black hole will be left behind. If the system is not disrupted by the explosion, the compact object will start to accrete matter from the stellar wind of its early-type companion. Since the matter has angular momentum, it will form an accretion disk around the compact star. The matter will be heated in the disk, losing angular momentum and falling into the potential well. The hot disk will cool through the emission of X-rays. We say then that a massive X-ray binary (HMXRB) is born. There are around 120 HMXRBs detected in the Galaxy so far (Liu et al. 2006). Some of these systems present non-thermal radio emission.
This emission is thought to be synchrotron radiation produced by relativistic electrons in a jet that is somehow ejected from the surroundings of the compact object. When the jet is resolved at radio wavelengths through interferometric techniques or at X-rays, the HMXRB is called a high-mass microquasar (Mirabel et al. 1992).

The word ‘microquasar’ (MQ) was coined to emphasize the similarities between galactic jet sources and extragalactic quasars (Mirabel & Rodríguez 1998). These similarities, although important, should not make us overlook the equally important differences between the two types of objects. The main difference is, of course, the presence of a donor star in the case of MQs. In high-mass MQs, this star provides a strong photon field, a matter field in the form of a stellar wind, and a gravitational field that can act upon the accretion disk, producing a torque and inducing its precession. The photon and matter fields constitute targets for the relativistic particles in the jet. The interaction of the jets with these fields can produce a variety of phenomena that are absent in the case of extragalactic jets. The aim of the present article is to review these phenomena.

1 Facultad de Ciencias Astronómicas y Geofísicas, Universidad Nacional de La Plata, Paseo del Bosque, 1900 La Plata, Argentina (romero@fcaglp.unlp.edu.ar).
2 Instituto Argentino de Radioastronomía, Casilla de Correos No. 5, (1894) Villa Elisa, Buenos Aires, Argentina (romero@irma.iar.unlp.edu.ar).

http://arxiv.org/abs/0704.0811v1

2. WHAT IS A MICROQUASAR?

A microquasar is an accreting X-ray binary system with non-thermal jets. The basic ingredients of a MQ are shown in Figure 1. They are the compact object, the donor star, the accretion disk, the jets, which usually are relativistic or mildly relativistic, and a region of hot plasma called the ‘corona’ that surrounds the compact object. If the star is an early-type, hot star, the accretion can proceed through capture of the wind material. In the case of low-mass stars and in some close systems, the accretion occurs through the overflow of the Roche lobe. In what follows we will focus only on high-mass MQs.

Fig. 1. Sketch showing the different components of a microquasar and the energy bands at which they emit. Not to scale. From Fender & Maccarone (2004).

The donor star can produce radiation from the IR up to UV energies. The accretion disk produces soft X-rays, whereas the corona is responsible for hard X-rays that are likely generated by Comptonization of disk photons. The emission of the jets goes from radio wavelengths to, in some cases like LS 5039, gamma-rays. MQs, like blazars, can emit along the entire electromagnetic spectrum. Their spectral energy distribution (SED) is complex, being the result of a number of different radiative processes occurring on different size-scales in the MQs.

MQs present different spectral states at X-rays. The two basic states are the ‘soft’ state and the ‘hard’ state. In the former the SED is dominated by a greybody peak around E ∼ 1 keV, probably due to the contribution of the accretion disk, which extends in this state all the way down to the last stable orbit around the compact object. In the hard state the peak in the X-rays is shifted toward lower energies and a strong and hard power-law component is present up to energies ∼ 150 keV, in some cases even beyond. This emission is usually interpreted as soft X-ray Comptonization in the corona (e.g. Ichimaru 1977), although some authors have suggested that it could be produced in the jet through external inverse Compton (IC) interactions (Georganopoulos et al. 2002) or through the synchrotron mechanism (Markoff et al. 2001, 2003).

The sources spend most of the time in the hard state. It is in this state that a steady, self-absorbed radio jet is usually observed. The transition from one state to the other is commonly accompanied by the ejection of superluminal components that can be detected as moving radio blobs (Mirabel & Rodríguez 1994, Fender et al. 2004).

3. WHAT ARE JETS MADE OF?

One of the most important open issues concerning MQs is the nature of the matter content of the jets. We know for sure that relativistic leptons with a power-law distribution are present in the jets, since we can detect and measure their synchrotron radiation. The relativistic outflow can be made of relativistic electron-positron pairs, or alternatively it could be a relativistic proton-electron plasma. Another possibility is a plasma formed by a cold electron-proton fluid, where the particles would have a thermal distribution, plus a relativistic content, locally accelerated by shocks (Bosch-Ramon, Romero & Paredes 2006). In this kind of jet, the bulk of the momentum is carried by the cold plasma, which additionally confines the relativistic component.

In any case, the large perturbations observed in the interstellar medium (ISM) around some MQs like Cygnus X-1 (Gallo et al. 2005) and SS 433 (Dubner et al. 1998) strongly suggest that the jets are baryon loaded. The direct detection of iron lines in the case of the jets of SS 433 (Kotani et al. 1994, 1996; Migliari et al. 2002) clearly confirms that they contain hadrons, at least in this particular object. Since there seems to be a clear correlation between the accretion and ejection of matter in MQs (Mirabel et al. 1998), it is natural to assume that the content of the jets does not basically differ in nature from that of the accreting matter. All these considerations make the presence of relativistic protons in the jets of MQs quite likely.
Hence, their radiative signatures cannot be neglected in a serious analysis of the radiative processes in these sources.

4. JET INTERACTIONS

What happens when a relativistic jet passes through the medium that surrounds a hot, massive star? The radiation field of the star penetrates freely into the jet and the dominant UV photons will interact with relativistic particles in the outflow. The interaction of the stellar wind with the jet will form a boundary layer where shocks will likely be formed, but some level of fluid mixing is expected to occur. The interaction between relativistic particles from the jet and thermal particles of the wind will take place, producing high-energy emission. We can separate the microscopic interactions between the jet and the stellar environment into two groups, according to whether they are of leptonic or hadronic nature. Of course, both types of reactions will occur in a specific system, but according to the given conditions, one type or the other might dominate the high-energy output of the MQ. Let us briefly discuss both cases.

4.1. Leptonic interactions

Relativistic electrons and positrons in the jet will IC scatter soft photons up to high energies. The origin of these photons can be diverse: stellar UV photons, X-ray photons from the accretion disk and the hot corona around the compact object, or non-thermal photons produced in the jet by the synchrotron mechanism. At high energies, the interaction enters the Klein-Nishina regime, where the cross section decreases dramatically. Opacity effects on gamma-ray propagation due to the presence of the local photon fields can result in the generation of IC cascades within the binary system (Bednarek 2006a, Orellana et al. 2007). Relativistic leptons can interact with cold protons and nuclei from the stellar wind, producing high-energy emission through relativistic Bremsstrahlung.
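The drop of the cross section in the Klein-Nishina regime mentioned above can be illustrated with a minimal sketch. It evaluates the standard total Klein-Nishina cross section in units of the Thomson value as a function of x, the photon energy in the electron rest frame divided by the electron rest energy. The function names, the 10 eV stellar photon energy and the order-of-magnitude estimate x ≈ γε/(m_e c²) are assumptions made for illustration only; they are not taken from the models reviewed here.

```python
import math

M_E_C2_EV = 510_998.95     # electron rest energy in eV
STELLAR_PHOTON_EV = 10.0   # assumed typical stellar UV photon energy (illustrative)


def klein_nishina_ratio(x: float) -> float:
    """Total Klein-Nishina cross section divided by the Thomson cross section.

    x is the photon energy in the electron rest frame in units of m_e c^2.
    The closed form loses precision through cancellation for x far below 1e-3.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    log_term = math.log1p(2.0 * x)
    bracket = 2.0 * x * (1.0 + x) / (1.0 + 2.0 * x) - log_term
    return 0.75 * (
        (1.0 + x) / x**3 * bracket
        + log_term / (2.0 * x)
        - (1.0 + 3.0 * x) / (1.0 + 2.0 * x) ** 2
    )


def scattering_parameter(gamma: float, photon_ev: float = STELLAR_PHOTON_EV) -> float:
    """Order-of-magnitude estimate x ~ gamma * epsilon / (m_e c^2)."""
    return gamma * photon_ev / M_E_C2_EV


if __name__ == "__main__":
    # Electron Lorentz factors chosen only to span the Thomson and Klein-Nishina regimes.
    for gamma in (1e2, 1e4, 1e6):
        x = scattering_parameter(gamma)
        print(f"gamma = {gamma:.0e}  x = {x:.3g}  sigma/sigma_T = {klein_nishina_ratio(x):.3f}")
```

Under these assumptions the ratio stays close to 1 while x is much smaller than 1 and falls steadily once x exceeds unity, which is the suppression referred to in the text.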
A number of papers have been devoted to leptonic interactions in MQs in recent years, for instance, Atoyan & Aharonian (1999), Markoff et al. (2001, 2003), Georganopoulos et al. (2002), Kaufman Bernadó et al. (2002), Romero et al. (2002), Bosch-Ramon & Paredes (2004), Bosch-Ramon et al. (2005a, 2006), Paredes et al. (2006), Dermer & Böttcher (2006), Gupta et al. (2006), Bednarek (2006b), etc. The reader is referred to these papers and references therein for detailed discussions.

In Figure 2 we show the broadband SED expected from leptonic interactions in a high-mass MQ. The different contributions are indicated. It can be seen that the synchrotron emission can extend up to MeV energies and that in the GeV-TeV range the dominant process is IC upscattering of stellar photons. Figure 3 shows a detail of the SED at high energies. Notice that absorption by photon-photon annihilation has been taken into account in the final curve, yielding a soft spectrum around 100 GeV (Bosch-Ramon et al. 2006).

Fig. 2. Spectral energy distribution of a high-mass MQ. The different contributions to the total SED are shown. From Bosch-Ramon, Romero & Paredes (2006). [Horizontal axis: log(photon energy [eV]). Curves: observed SED, IC emission, seed photons, external and internal Bremsstrahlung, star IC, corona IC, disk IC, synchrotron, corona.]

Fig. 3. High-energy emission from high-mass MQ. Courtesy of V. Bosch-Ramon. [Horizontal axis: log(photon energy [eV]). Curves: observed SED, star IC, synchrotron, corona IC.]

4.2. Hadronic interactions

The main reaction for proton cooling in a high-mass MQ is the pp interaction, through the channels $pp \to p + p + \pi^0$ and $pp \to p + p + \xi(\pi^+ + \pi^-)$, where $\xi$ is the $\pi^\pm$ multiplicity. The neutral pions decay yielding gamma rays, $\pi^0 \to \gamma + \gamma$, whereas the charged pions produce neutrinos and $e^\pm$ pairs: $\pi^\pm \to \mu^\pm + \nu_\mu(\bar{\nu}_\mu)$, followed by $\mu^\pm \to e^\pm + \nu_e(\bar{\nu}_e) + \bar{\nu}_\mu(\nu_\mu)$. At high energies, the gamma-ray spectrum will mimic the spectrum of the parent relativistic proton population.
In general, since proton losses are not as severe as electron losses in the inner region of the source, we could expect a higher energy cutoff in hadronic-dominated sources.

Models for hadronic MQs have been developed by Romero et al. (2003), Romero et al. (2005), Romero & Orellana (2005) and Orellana & Romero (2007). Neutrino production in this kind of model is discussed by Romero & Orellana (2005), Aharonian et al. (2006), Bednarek (2005) and Christiansen et al. (2006). For photo-pion production of neutrinos, under rather extreme conditions, see Levinson & Waxman (2001).

Fig. 4. Spectral energy distributions for the hadronic emission of a high-mass MQ with a smooth stellar wind. Different curves correspond to different viewing angles. From Orellana et al. (2007). [Main panel: horizontal axis Log E [eV], curves for θ = 0º, 30º, 60º and 90º. Inset: beaming factor as a function of θ [º].]

Figure 4 shows different SEDs obtained from pp interactions for a high-mass MQ with a smooth spherical wind (Orellana et al. 2007). The various curves correspond to different viewing angles. The total jet power in relativistic protons, $L_{\rm rel}$, is $6 \times 10^{36}$ erg s$^{-1}$. The jet is assumed to be perpendicular to the orbital plane, but this constraint can be relaxed to allow, for instance, for a precessing jet. Actually, in some systems, the jet could even point in the direction of the star (Butt et al. 2003, Romero & Orellana 2005). In such a case, jet-induced nucleosynthesis can occur in the stellar atmosphere. The power of the stellar wind might be, for some stars, strong enough to stop the jet, creating a standing shock between the compact object and the star. Protons and electrons might be re-accelerated there up to very high energies, producing a detectable TeV source.

All existing models for hadronic MQs assume a smooth wind from the star (see footnote 3). However, it is quite possible that the wind has some structure, for instance in the form of clumps, a fact that would lead to gamma-ray variability on short time scales. If such variability could be detected by future Cherenkov telescope arrays, it might be used to infer the structure of the wind. The jet would act as a kind of lantern illuminating the wind in gamma-rays to the observer.

3 See, nonetheless, the paper by Aharonian & Atoyan (1996), which, although not framed in the context of MQs, discusses the interaction of a proton beam with a cloudy medium around a star.

Hadronic jets can propagate through the ISM, producing hot spots similar to those observed in the case of extragalactic sources (Bosch-Ramon et al. 2005b). Particles re-accelerated at the termination point of the jets can diffuse in the ambient medium, interacting with diffuse material and producing extended high-energy sources.

5. THE CONTROVERSIAL CASE OF LS I +61 303

LS I +61 303 is a puzzling Be/X-ray binary, which displays gamma-ray variability at high energies. The nature of the compact object and the origin of the high-energy emission are unclear. The detection of jet-like radio features by Massi et al. (2001, 2004) led to the classification of this source as a MQ. This has been recently challenged by Dhawan et al. (2006), who observed the source with the VLBA at different orbital phases, concluding that the direction of the jet-like feature during the periastron passage (opposed to the primary star) supports the scenario of a colliding wind model where the compact object is an energetic pulsar (wind power $\sim 10^{36}$ erg/s).

The system has been detected by the MAGIC telescope at E > 200 GeV. The variability is modulated with the orbital period. Contrary to expectations, the maximum of the gamma-ray emission occurred well after the periastron passage. The source was not clearly detected during the periastron (Albert et al. 2006).
The cause of this could be gamma-ray absorption in the combined photon field of the Be star and its decretion disk (e.g. Orellana & Romero 2007). Figure 5 shows the electromagnetic cascades that might develop close to the periastron passage (which occurs at phase 0.23). According to these simulations the source should be detectable during the periastron, but with longer exposures, and the spectrum will be softer than what was observed at phases 0.6-0.7.

Fig. 5. Electromagnetic cascades at different phases developed close to the periastron passage of the X-ray binary LS I +61 303 (from Orellana & Romero 2007).

The pulsar/Be binary interpretation is not free of severe problems. The flux at MeV-GeV energies observed by EGRET (Kniffen et al. 1997) accounts for a luminosity of $\sim 10^{36}$ erg/s, which would imply an impossible conversion efficiency from wind power to gamma-rays of ∼ 1. In addition, since the pulsar wind would be orders of magnitude stronger than the slow Be equatorial wind, the observed ‘cometary tail’ radio feature, if interpreted as synchrotron radiation from electrons accelerated at the colliding wind region, should point toward the primary star, and not opposite to it. It is clear that LS I +61 303 is an interesting and peculiar system that deserves more intensive studies in the near future.

6. CONCLUSIONS

MQs are outstanding natural laboratories to study a variety of physical phenomena such as particle acceleration, accretion physics, and particle interactions. Observations of gamma-ray emission of high-mass MQs can be used to probe the structure of stellar winds and the nature of the matter content of relativistic jets.

How many MQs are there in the Galaxy? It is difficult to answer this question, but it seems possible that a significant number of the yet-unidentified variable gamma-ray sources located on the galactic plane (Romero et al. 1999) could be associated with high-mass MQs (Romero et al. 2004, Bosch-Ramon et al. 2005a). In the next few years, new Cherenkov telescope arrays like HESS II, MAGIC II, and VERITAS, along with the satellite observatories AGILE and GLAST, will continue detecting these extraordinary objects at high energies and helping to penetrate their mysteries.

Acknowledgments

This work has been supported by the Agencies CONICET (PIP 5375) and ANPCyT (PICT 03-13291 BID 1728/OC-AR). I thank the organizers for a wonderful meeting and their warm hospitality.

REFERENCES

Albert, J. et al. (MAGIC coll.) 2006, Science, 312, 1771
Aharonian, F. A., & Atoyan, A. M. 1996, Space Sci. Rev., 75, 357
Aharonian, F. A., Anchordoqui, L. A., Khangulyan, D., & Montaruli, T. 2006, Journal of Physics: Conference Series, 39, 408
Atoyan, A. M., & Aharonian, F. A. 1999, MNRAS, 302,
Bednarek, W. 2005, ApJ, 631, 466
Bednarek, W. 2006a, MNRAS, 368, 579
Bednarek, W. 2006b, MNRAS, 371, 1737
Bosch-Ramon, V., & Paredes, J. M. 2004, A&A, 417, 1075
Bosch-Ramon, V., Romero, G. E., & Paredes, J. M. 2005a, A&A, 429, 267
Bosch-Ramon, V., Aharonian, F. A., & Paredes, J. M. 2005b, A&A, 432, 609
Bosch-Ramon, V., Romero, G. E., & Paredes, J. M. 2006, A&A, 447, 263
Butt, Y. M., Maccarone, T. J., & Prantzos, N. 2003, ApJ, 587, 748
Christiansen, H. R., Orellana, M., & Romero, G. E. 2006, PhRvD, 73, 063012
Dhawan, V., Mioduszewski, A., & Rupen, M. 2006, in Proc. of the VI Microquasar Workshop, Como-2006
Dermer, C., & Böttcher, M. 2006, ApJ, 643, 1081
Dubner, G. M., Holdaway, M., Goss, W. M., & Mirabel, I. F. 1998, AJ, 116, 1842
Fender, R., & Maccarone, T. 2004, in Cosmic Gamma-Ray Sources, ed. K. S. Cheng & G. E. Romero, Kluwer Academic Publishers, Dordrecht, p. 205
Fender, R. P., Belloni, T. M., & Gallo, E. 2004, MNRAS, 355, 1105
Gallo, E., Fender, R., & Kaiser, C. 2005, Nature, 436, 819
Georganopoulos, M., Aharonian, F. A., & Kirk, J. G. 2002, A&A, 388, L25
Gupta, S., Böttcher, M., & Dermer, C. D. 2006, ApJ, 644, 409
Ichimaru, S. 1977, ApJ, 214, 840
Kaufman Bernadó, M. M., Romero, G. E., & Mirabel, I. F. 2002, A&A, 385, L10
Kniffen, D. A., et al. 1997, ApJ, 486, 126
Kotani, T., Kawai, N., Aoki, T., et al. 1994, PASJ, 46,
Kotani, T., Kawai, N., Matsuoka, M., & Brinkmann, W. 1996, PASJ, 48, 619
Levinson, A., & Waxman, E. 2001, PhRvL, 87, 171101
Liu, Q. Z., van Paradijs, J., & van den Heuvel, E. P. J. 2006, A&A, 455, 1165
Markoff, S., Falcke, H., & Fender, R. P. 2001, A&A, 372,
Markoff, S., Nowak, M., Corbel, S., et al. 2003, A&A, 397, 645
Massi, M., et al. 2001, A&A, 376, 217
Massi, M., et al. 2004, A&A, 414, L1
Migliari, S., Fender, R., & Méndez, M. 2002, Science, 297,
Mirabel, I. F., Rodríguez, L. F., Cordier, B., Paul, J., & Lebrun, F. 1992, Nature, 358, 215
Mirabel, I. F., & Rodríguez, L. F. 1994, Nature, 371, 46
Mirabel, I. F., & Rodríguez, L. F. 1998, Nature, 392, 673
Mirabel, I. F., Dhawan, V., Chaty, S., et al. 1998, A&A, 330, L9
Orellana, M., & Romero, G. E. 2007, Ap&SS, in press
Orellana, M., Bordas, P., Bosch-Ramon, V., et al. 2007, A&A, submitted
Paredes, J. M., Bosch-Ramon, V., & Romero, G. E. 2006, A&A, 451, 259
Romero, G. E., Benaglia, P., & Torres, D. F. 1999, A&A, 348, 868
Romero, G. E., Kaufman Bernadó, M. M., & Mirabel, I. F. 2002, A&A, 393, L61
Romero, G. E., Torres, D. F., Kaufman Bernadó, M. M., & Mirabel, I. F. 2003, A&A, 410, L1
Romero, G. E., Grenier, I. A., Kaufman Bernadó, M. M., Mirabel, I. F., & Torres, D. F. 2004, ESA-SP, 552, 703
Romero, G. E., & Orellana, M. 2005, A&A, 439, 237
Romero, G. E., Christiansen, H. R., & Orellana, M. 2005, ApJ, 632, 1093

<|endoftext|><|startoftext|> Introduction Observations and stellar sample Observations and calibration of the spectra Stellar sample Pseudo-continuum Definition Relation with (B-V) Calibration Equivalent width Definition Relation with colour index (B-V) Comparison Determination of (B-V) Activity indexes and chromospheric activity N index Chromospheric activity Summary

ABSTRACT

We study the sodium D lines (D1: 5895.92 \AA; D2: 5889.95 \AA) in late-type dwarf stars. The stars have spectral types between F6 and M5.5 (B-V between 0.457 and 1.807) and metallicity between [Fe/H] = -0.82 and 0.6. We obtained medium resolution echelle spectra using the 2.15-m telescope at the Argentinian observatory CASLEO. The observations have been performed periodically since 1999. The spectra were calibrated in wavelength and in flux. A definition of the pseudo-continuum level is found for all our observations. We also define a continuum level for calibration purposes. The equivalent width of the D lines is computed in detail for all our spectra and related to the colour index (B-V) of the stars. When possible, we perform a careful comparison with previous studies. Finally, we construct a spectral index (R_D') as the ratio between the flux in the D lines and the bolometric flux. We find that, once corrected for the photospheric contribution, this index can be used as a chromospheric activity indicator in stars with a high level of activity. Additionally, we find that, by combining some of our results, we obtain a method to flux-calibrate stars of unknown colour.
<|endoftext|><|startoftext|> 1 Introduction

Bosonic systems at very low temperature are characterized by the fact that a macroscopic fraction of the particles collapses into a single one-particle state. Although this phenomenon, known as Bose-Einstein condensation, was already predicted in the early days of quantum mechanics, the first empirical evidence for its existence was only obtained in 1995, in experiments performed by groups led by Cornell and Wieman at the University of Colorado at Boulder and by Ketterle at MIT (see [2, 4]). In these important experiments, atomic gases were initially trapped by magnetic fields and cooled down to very low temperatures. Then the magnetic traps were switched off and the consequent time evolution of the gas was observed; for sufficiently small temperatures, the particles remained close together and the gas moved as a single particle, a clear sign of the existence of condensation.

In recent years important progress has also been achieved in the theoretical understanding of Bose-Einstein condensation. In [10], Lieb, Yngvason, and Seiringer considered a trapped Bose gas consisting of $N$ three-dimensional particles described by the Hamiltonian
$$ H^{\text{trap}}_N = \sum_{j=1}^N \left( -\Delta_j + V_{\text{ext}}(x_j) \right) + \sum_{i<j}^N V_a(x_i - x_j), \tag{1.1} $$
where $V_{\text{ext}}$ is an external confining potential and $V_a(x)$ is a repulsive interaction potential with scattering length $a$ (here and in the rest of the paper we use the notation $\nabla_j = \nabla_{x_j}$ and $\Delta_j = \Delta_{x_j}$). Letting $N \to \infty$ and $a \to 0$ with $Na = a_0$ fixed, they showed that the ground state energy $E(N)$ of (1.1) divided by the number of particles $N$ converges to
$$ \lim_{N \to \infty,\; Na = a_0} \frac{E(N)}{N} = \min_{\varphi \in L^2(\mathbb{R}^3):\ \|\varphi\| = 1} \mathcal{E}_{GP}(\varphi), $$
where $\mathcal{E}_{GP}$ is the Gross-Pitaevskii energy functional
$$ \mathcal{E}_{GP}(\varphi) = \int dx \left( |\nabla \varphi(x)|^2 + V_{\text{ext}}(x) |\varphi(x)|^2 + 4\pi a_0 |\varphi(x)|^4 \right). \tag{1.2} $$

http://arxiv.org/abs/0704.0813v1

Later, in [9], Lieb and Seiringer also proved that trapped Bose gases characterized by the Gross-Pitaevskii scaling $Na = a_0 = \text{const}$ exhibit Bose-Einstein condensation in the ground state.
More precisely, they showed that, if $\psi_N$ is the ground state wave function of the Hamiltonian (1.1) and $\gamma^{(1)}_N$ denotes the corresponding one-particle marginal (defined as the partial trace of the density matrix $\gamma_N = |\psi_N\rangle\langle\psi_N|$ over the last $N-1$ particles, with the convention that $\mathrm{Tr}\, \gamma^{(1)}_N = 1$ for all $N$), then
$$ \gamma^{(1)}_N \to |\phi_{GP}\rangle\langle\phi_{GP}| \quad \text{as } N \to \infty. \tag{1.3} $$
Here $\phi_{GP} \in L^2(\mathbb{R}^3)$ is the minimizer of the Gross-Pitaevskii energy functional (1.2). The interpretation of this result is straightforward; in the limit of large $N$, all particles, apart from a fraction vanishing as $N \to \infty$, are in the same one-particle state described by the wave function $\phi_{GP} \in L^2(\mathbb{R}^3)$. In this sense the ground state of (1.1) exhibits complete Bose-Einstein condensation into $\phi_{GP}$.

In joint works with L. Erdős and H.-T. Yau (see [5, 6, 7]), we prove that the Gross-Pitaevskii theory can also be used to describe the dynamics of Bose-Einstein condensates. In the Gross-Pitaevskii scaling (characterized by the fact that the scattering length of the interaction potential is of the order $1/N$) we show, under some conditions on the interaction potential and on the initial $N$-particle wave function, that complete Bose-Einstein condensation is preserved by the time evolution. Moreover we prove that the dynamics of the condensate wave function is governed by the time-dependent Gross-Pitaevskii equation associated with the energy functional (1.2).

As an example, consider the experimental set-up described above, where the dynamics of an initially confined gas is observed after removing the traps. Mathematically, the trapped gas can be described by the Hamiltonian (1.1), where the confining potential $V_{\text{ext}}$ models the magnetic traps. When cooled down to very low temperatures, the system essentially relaxes to the ground state $\psi_N$ of (1.1); from [9] it follows that at time $t = 0$, immediately before switching off the traps, the system exhibits complete Bose-Einstein condensation into $\phi_{GP}$ in the sense (1.3).
At time $t = 0$ the traps are turned off, and one observes the evolution of the system generated by the translation invariant Hamiltonian
$$ H_N = -\sum_{j=1}^N \Delta_j + \sum_{i<j}^N V_a(x_i - x_j). $$
Our results (stated in more detail in Section 3 below) imply that, if $\psi_{N,t} = e^{-iH_N t}\psi_N$ is the time evolution of the initial wave function $\psi_N$ and if $\gamma^{(1)}_{N,t}$ denotes the one-particle marginal associated with $\psi_{N,t}$, then, for any fixed time $t \in \mathbb{R}$,
$$ \gamma^{(1)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t| \quad \text{as } N \to \infty, $$
where $\varphi_t$ is the solution of the nonlinear time-dependent Gross-Pitaevskii equation
$$ i\partial_t \varphi_t = -\Delta \varphi_t + 8\pi a_0 |\varphi_t|^2 \varphi_t \tag{1.4} $$
with the initial data $\varphi_{t=0} = \phi_{GP}$. In other words, we prove that at arbitrary time $t \in \mathbb{R}$, the system still exhibits complete condensation, and the time evolution of the condensate wave function is determined by the Gross-Pitaevskii equation (1.4).

The goal of this manuscript is to illustrate the main ideas of the proof of the results obtained in [5, 6, 7]. The paper is organized as follows. In Section 2 we define the model more precisely, and we give a heuristic argument to explain the emergence of the Gross-Pitaevskii equation (1.4). In Section 3 we present our main results. In Section 4 we illustrate the general strategy used to prove the main results and, finally, in Sections 5 and 6 we discuss the two most important parts of the proof in some more detail.

2 Heuristic Derivation of the Gross-Pitaevskii Equation

To describe the interaction among the particles we choose a positive, spherically symmetric, compactly supported, smooth function $V(x)$. We denote the scattering length of $V$ by $a_0$. Recall that the scattering length of $V$ is defined through the spherically symmetric solution to the zero energy equation
$$ \left( -\Delta + \frac{1}{2} V(x) \right) f(x) = 0, \qquad f(x) \to 1 \ \text{ as } |x| \to \infty. \tag{2.1} $$
The scattering length of $V$ is then defined by
$$ a_0 = \lim_{|x| \to \infty} \left( |x| - |x| f(x) \right). $$
This limit can be proven to exist if $V$ decays sufficiently fast at infinity. Note that, since we assumed $V$ to have compact support, we have
$$ f(x) = 1 - \frac{a_0}{|x|} \tag{2.2} $$
for $|x|$ sufficiently large.
Another equivalent characterization of the scattering length is given by
\[ 8\pi a_0 = \int dx\, V(x) f(x) . \tag{2.3} \]
To recover the Gross-Pitaevskii scaling, we define $V_N(x) = N^2 V(Nx)$. By scaling it is clear that the scattering length of $V_N$ equals $a = a_0/N$. In fact, if $f(x)$ is the solution to (2.1), then $f_N(x) = f(Nx)$ solves
\[ \Big(-\Delta + \tfrac{1}{2} V_N(x)\Big) f_N(x) = 0 \tag{2.4} \]
with the boundary condition $f_N(x) \to 1$ as $|x| \to \infty$. From (2.2), we obtain
\[ f_N(x) = 1 - \frac{a_0}{N|x|} \]
for $|x|$ large enough. In particular the scattering length $a$ of $V_N$ is given by $a = a_0/N$.

We consider the dynamics generated by the translation invariant Hamiltonian
\[ H_N = \sum_{j=1}^{N} -\Delta_j + \sum_{i<j}^{N} V_N(x_i - x_j) \tag{2.5} \]
acting on the Hilbert space $L^2_s(\mathbb{R}^{3N}, dx_1 \dots dx_N)$, the bosonic subspace of $L^2(\mathbb{R}^{3N}, dx_1 \dots dx_N)$ consisting of all permutation symmetric functions (although it is possible to extend our analysis to include an external potential, to keep the discussion as simple as possible we only consider the translation invariant case (2.5)). We consider solutions $\psi_{N,t}$ of the $N$-body Schrödinger equation
\[ i\partial_t\psi_{N,t} = H_N\psi_{N,t} . \tag{2.6} \]
Let $\gamma_{N,t} = |\psi_{N,t}\rangle\langle\psi_{N,t}|$ denote the density matrix associated with $\psi_{N,t}$, defined as the orthogonal projection onto $\psi_{N,t}$. In order to study the limit $N \to \infty$, we introduce the marginal densities of $\gamma_{N,t}$. For $k = 1, \dots, N$, we define the $k$-particle density matrix $\gamma^{(k)}_{N,t}$ associated with $\psi_{N,t}$ by taking the partial trace of $\gamma_{N,t}$ over the last $N-k$ particles. In other words, $\gamma^{(k)}_{N,t}$ is defined as the positive trace class operator on $L^2_s(\mathbb{R}^{3k})$ with kernel given by
\[ \gamma^{(k)}_{N,t}(\mathbf{x}_k; \mathbf{x}'_k) = \int d\mathbf{x}_{N-k}\, \psi_{N,t}(\mathbf{x}_k, \mathbf{x}_{N-k})\, \overline{\psi}_{N,t}(\mathbf{x}'_k, \mathbf{x}_{N-k}) . \tag{2.7} \]
Here and in the rest of the paper we use the notation $\mathbf{x} = (x_1, x_2, \dots, x_N)$, $\mathbf{x}_k = (x_1, x_2, \dots, x_k)$, $\mathbf{x}'_k = (x'_1, x'_2, \dots, x'_k)$, and $\mathbf{x}_{N-k} = (x_{k+1}, x_{k+2}, \dots, x_N)$.

We consider initial wave functions $\psi_{N,0}$ exhibiting complete condensation in a one-particle state $\varphi$. Thus at time $t = 0$, we assume that
\[ \gamma^{(1)}_{N,0} \to |\varphi\rangle\langle\varphi| \quad \text{as } N \to \infty . \tag{2.8} \]
It turns out that the last equation immediately implies that
\[ \gamma^{(k)}_{N,0} \to |\varphi\rangle\langle\varphi|^{\otimes k} \quad \text{as } N \to \infty \tag{2.9} \]
for every fixed $k \in \mathbb{N}$ (the argument, due to Lieb and Seiringer, can be found in [9], after Theorem 1). It is also interesting to notice that the convergence (2.8) (and (2.9)) in the trace class norm is equivalent to the convergence in the weak* topology defined on the space of trace class operators on $L^2(\mathbb{R}^3)$ (or $L^2(\mathbb{R}^{3k})$, for (2.9)); we thank A. Michelangeli for pointing out this fact to us (the proof is based on general arguments, such as Grümm's Convergence Theorem).

Starting from the Schrödinger equation (2.6) for the wave function $\psi_{N,t}$, we can derive evolution equations for the marginal densities $\gamma^{(k)}_{N,t}$. The dynamics of the marginals is governed by a hierarchy of $N$ coupled equations usually known as the BBGKY hierarchy:
\[ i\partial_t\gamma^{(k)}_{N,t} = \sum_{j=1}^{k} \big[-\Delta_j, \gamma^{(k)}_{N,t}\big] + \sum_{1\le i<j\le k} \big[V_N(x_i - x_j), \gamma^{(k)}_{N,t}\big] + (N-k)\sum_{j=1}^{k} \mathrm{Tr}_{k+1} \big[V_N(x_j - x_{k+1}), \gamma^{(k+1)}_{N,t}\big] . \tag{2.10} \]
Here $\mathrm{Tr}_{k+1}$ denotes the partial trace over the $(k+1)$-th particle. Next we study the limit $N \to \infty$ of the density $\gamma^{(k)}_{N,t}$ for fixed $k \in \mathbb{N}$. For simplicity we fix $k = 1$. From (2.10), the evolution equation for the one-particle density matrix, written in terms of its kernel $\gamma^{(1)}_{N,t}(x_1; x'_1)$, is given by
\[ i\partial_t\gamma^{(1)}_{N,t}(x_1; x'_1) = \big(-\Delta_1 + \Delta'_1\big)\gamma^{(1)}_{N,t}(x_1; x'_1) + (N-1)\int dx_2\, \big(V_N(x_1 - x_2) - V_N(x'_1 - x_2)\big)\, \gamma^{(2)}_{N,t}(x_1, x_2; x'_1, x_2) . \tag{2.11} \]
Suppose now that $\gamma^{(1)}_{\infty,t}$ and $\gamma^{(2)}_{\infty,t}$ are limit points (with respect to the weak* topology) of $\gamma^{(1)}_{N,t}$ and, respectively, $\gamma^{(2)}_{N,t}$ as $N \to \infty$. Since, formally,
\[ (N-1)V_N(x) = (N-1)N^2 V(Nx) \simeq N^3 V(Nx) \to b_0\,\delta(x) \quad \text{with } b_0 = \int dx\, V(x) \]
as $N \to \infty$, we could naively expect the limit points $\gamma^{(1)}_{\infty,t}$ and $\gamma^{(2)}_{\infty,t}$ to satisfy the limiting equation
\[ i\partial_t\gamma^{(1)}_{\infty,t}(x_1; x'_1) = \big(-\Delta_1 + \Delta'_1\big)\gamma^{(1)}_{\infty,t}(x_1; x'_1) + b_0\int dx_2\, \big(\delta(x_1 - x_2) - \delta(x'_1 - x_2)\big)\, \gamma^{(2)}_{\infty,t}(x_1, x_2; x'_1, x_2) . \tag{2.12} \]
From (2.9) we have, at time $t = 0$,
\[ \gamma^{(1)}_{\infty,0}(x_1; x'_1) = \varphi(x_1)\overline{\varphi}(x'_1), \qquad \gamma^{(2)}_{\infty,0}(x_1, x_2; x'_1, x'_2) = \varphi(x_1)\varphi(x_2)\overline{\varphi}(x'_1)\overline{\varphi}(x'_2) . \tag{2.13} \]
If condensation is really preserved by the time evolution, then also at time $t \neq 0$ we have
\[ \gamma^{(1)}_{\infty,t}(x_1; x'_1) = \varphi_t(x_1)\overline{\varphi}_t(x'_1), \qquad \gamma^{(2)}_{\infty,t}(x_1, x_2; x'_1, x'_2) = \varphi_t(x_1)\varphi_t(x_2)\overline{\varphi}_t(x'_1)\overline{\varphi}_t(x'_2) . \tag{2.14} \]
Inserting (2.14) in (2.12), we obtain the self-consistent equation
\[ i\partial_t\varphi_t = -\Delta\varphi_t + b_0|\varphi_t|^2\varphi_t \tag{2.15} \]
for the condensate wave function $\varphi_t$. This equation has the same form as the time-dependent Gross-Pitaevskii equation (1.4), but a different coefficient in front of the nonlinearity ($b_0$ instead of $8\pi a_0$).

The reason why we obtain the wrong coupling constant in (2.15) is that going from (2.11) to (2.12), we took the two limits
\[ (N-1)V_N(x) \to b_0\,\delta(x) \quad \text{and} \quad \gamma^{(2)}_{N,t} \to \gamma^{(2)}_{\infty,t} \tag{2.16} \]
independently from each other. However, since the scattering length of the interaction is of the order $1/N$, the two-particle density $\gamma^{(2)}_{N,t}$ develops a short scale correlation structure on the length scale $1/N$, which is exactly the same length scale on which the potential $V_N$ varies. For this reason the two limits in (2.16) cannot be taken independently. In order to obtain the correct Gross-Pitaevskii equation (1.4) we need to take into account the correlations among the particles, and the short scale structure they create in the marginal density $\gamma^{(2)}_{N,t}$.

To describe the correlations among the particles we make use of the solution $f_N(x)$ to the zero energy scattering equation (2.4). Assuming that the function $f_N(x_i - x_j)$ gives a good approximation for the correlations between particles $i$ and $j$, we may expect that the one- and two-particle densities associated with the evolution of a condensate are given, for large but finite $N$, by
\[ \gamma^{(1)}_{N,t}(x_1; x'_1) \simeq \varphi_t(x_1)\overline{\varphi}_t(x'_1), \qquad \gamma^{(2)}_{N,t}(x_1, x_2; x'_1, x'_2) \simeq f_N(x_1 - x_2)\, f_N(x'_1 - x'_2)\, \varphi_t(x_1)\varphi_t(x_2)\overline{\varphi}_t(x'_1)\overline{\varphi}_t(x'_2) . \tag{2.17} \]
Inserting this ansatz into (2.11), we obtain a new self-consistent equation
\[ i\partial_t\varphi_t = -\Delta\varphi_t + \Big((N-1)\int dx\, f_N(x)V_N(x)\Big)|\varphi_t|^2\varphi_t \simeq -\Delta\varphi_t + \Big(N^3\int dx\, f(Nx)V(Nx)\Big)|\varphi_t|^2\varphi_t = -\Delta\varphi_t + 8\pi a_0|\varphi_t|^2\varphi_t \tag{2.18} \]
because of (2.3).
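The coupling constant in (2.18) comes from a change of variables. As a short check, assuming only $V_N(x) = N^2 V(Nx)$, $f_N(x) = f(Nx)$ and the identity (2.3), one can substitute $y = Nx$:

```latex
\begin{align*}
(N-1)\int dx\, f_N(x)\, V_N(x)
  &= (N-1)\, N^{2} \int dx\, f(Nx)\, V(Nx) \\
  &= \frac{N-1}{N} \int dy\, f(y)\, V(y)
     \qquad (y = Nx,\ dx = N^{-3}\, dy) \\
  &= \Big(1 - \frac{1}{N}\Big)\, 8\pi a_0
     \;\longrightarrow\; 8\pi a_0 \quad (N \to \infty).
\end{align*}
```

The same computation with $f$ replaced by the constant function $1$ gives $(N-1)\int dx\, V_N(x) = (1 - 1/N)\, b_0$, which is the coefficient appearing in (2.15).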
This is exactly the Gross-Pitaevskii equation (1.4), with the correct coupling constant in front of the nonlinearity.

Note that the presence of the correlation functions $f_N(x_1 - x_2)$ and $f_N(x'_1 - x'_2)$ in (2.17) does not contradict complete condensation of the system at time $t$. On the contrary, in the weak limit $N \to \infty$, the function $f_N$ converges to one, and therefore $\gamma^{(1)}_{N,t}$ and $\gamma^{(2)}_{N,t}$ converge to $|\varphi_t\rangle\langle\varphi_t|$ and $|\varphi_t\rangle\langle\varphi_t|^{\otimes 2}$, respectively. The correlations described by the function $f_N$ can only produce nontrivial effects on the macroscopic dynamics of the system because of the singularity of the interaction potential $V_N$.

From this heuristic argument it is clear that, in order to obtain a rigorous derivation of the Gross-Pitaevskii equation (2.18), we need to identify the short scale structure of the marginal densities and prove that, to a very good approximation, it can be described by the function $f_N$ as in (2.17). In other words, we need to show a very strong separation of scales in the marginal density $\gamma^{(2)}_{N,t}$ (and, more generally, in the $k$-particle density $\gamma^{(k)}_{N,t}$) associated with the solution of the $N$-body Schrödinger equation; the Gross-Pitaevskii theory can only be correct if $\gamma^{(k)}_{N,t}$ has a regular part, which factorizes for large $N$ into the product of $k$ copies of the orthogonal projection $|\varphi_t\rangle\langle\varphi_t|$, and a time independent singular part, due to the correlations among the particles, and described by products of the functions $f_N(x_i - x_j)$, $1 \le i, j \le k$.

3 Main Results

To prove our main results we need to assume the interaction potential to be sufficiently weak. To measure the strength of the potential, we introduce the dimensionless quantity
\[ \alpha = \sup_{x \in \mathbb{R}^3} |x|^2 V(x) + \int \frac{dx}{|x|}\, V(x) . \tag{3.1} \]
Apart from the smallness assumption on the potential, we also need to assume that the correlations characterizing the initial $N$-particle wave function are sufficiently weak. We therefore define the notion of asymptotically factorized wave functions.
We say that a family of permutation symmetric wave functions $\psi_N$ is asymptotically factorized if there exists $\varphi \in L^2(\mathbb{R}^3)$ and, for any fixed $k \ge 1$, there exists a family $\xi^{(N-k)}_N \in L^2_s(\mathbb{R}^{3(N-k)})$ such that
\[ \Big\| \psi_N - \varphi^{\otimes k} \otimes \xi^{(N-k)}_N \Big\| \to 0 \quad \text{as } N \to \infty . \tag{3.2} \]
It is simple to check that, if $\psi_N$ is asymptotically factorized, then it exhibits complete Bose-Einstein condensation in the one-particle state $\varphi$ (in the sense that the one-particle density associated with $\psi_N$ satisfies $\gamma^{(1)}_N \to |\varphi\rangle\langle\varphi|$ as $N \to \infty$). Asymptotic factorization is therefore a stronger condition than complete condensation, and it provides more control on the correlations of $\psi_N$.

Theorem 3.1. Assume that $V(x)$ is a positive, smooth, spherically symmetric, and compactly supported potential such that $\alpha$ (defined in (3.1)) is sufficiently small. Consider an asymptotically factorized family of wave functions $\psi_N \in L^2_s(\mathbb{R}^{3N})$, exhibiting complete Bose-Einstein condensation in a one-particle state $\varphi \in H^1(\mathbb{R}^3)$, in the sense that
\[ \gamma^{(1)}_N \to |\varphi\rangle\langle\varphi| \quad \text{as } N \to \infty , \tag{3.3} \]
where $\gamma^{(1)}_N$ denotes the one-particle density associated with $\psi_N$. Then, for any fixed $t \in \mathbb{R}$, the one-particle density $\gamma^{(1)}_{N,t}$ associated with the solution $\psi_{N,t}$ of the $N$-particle Schrödinger equation (2.6) satisfies
\[ \gamma^{(1)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t| \quad \text{as } N \to \infty , \tag{3.4} \]
where $\varphi_t$ is the solution to the time-dependent Gross-Pitaevskii equation
\[ i\partial_t\varphi_t = -\Delta\varphi_t + 8\pi a_0|\varphi_t|^2\varphi_t \tag{3.5} \]
with initial data $\varphi_{t=0} = \varphi$.

The convergence in (3.3) and (3.4) is in the trace norm topology (which in this case is equivalent to the weak* topology defined on the space of trace class operators on $L^2(\mathbb{R}^3)$). Moreover, from (3.4) we also get convergence of the higher marginals. For every $k \ge 1$, we have
\[ \gamma^{(k)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t|^{\otimes k} \quad \text{as } N \to \infty . \]
Theorem 3.1 can be used to describe the dynamics of condensates satisfying the condition of asymptotic factorization. The following two corollaries provide examples of such initial data.

The simplest example of an $N$-particle wave function satisfying the assumption of asymptotic factorization is given by a product state.
Corollary 3.2. Under the assumptions on $V(x)$ stated in Theorem 3.1, let $\psi_N(\mathbf{x}) = \prod_{j=1}^{N} \varphi(x_j)$ for an arbitrary $\varphi \in H^1(\mathbb{R}^3)$. Then, for any $t \in \mathbb{R}$,
\[ \gamma^{(1)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t| \quad \text{as } N \to \infty , \]
where $\varphi_t$ is a solution of the Gross-Pitaevskii equation (3.5) with initial data $\varphi_{t=0} = \varphi$.

The second application of Theorem 3.1 gives a mathematical description of the results of the experiments described in the introduction. Consider the Hamiltonian with traps
\[ H^{\mathrm{trap}}_N = \sum_{j=1}^{N} \big(-\Delta_j + V_{\mathrm{ext}}(x_j)\big) + \sum_{i<j}^{N} V_N(x_i - x_j) \tag{3.6} \]
with a confining potential $V_{\mathrm{ext}}$. Let $\psi_N$ be the ground state of $H^{\mathrm{trap}}_N$. By [9], $\psi_N$ exhibits complete Bose-Einstein condensation into the minimizer $\phi_{GP}$ of the Gross-Pitaevskii energy functional $\mathcal{E}_{GP}$ defined in (1.2). In other words,
\[ \gamma^{(1)}_N \to |\phi_{GP}\rangle\langle\phi_{GP}| \quad \text{as } N \to \infty . \]
In [5], we demonstrate that $\psi_N$ also satisfies the condition (3.2) of asymptotic factorization. From this observation, we obtain the following corollary.

Corollary 3.3. Under the assumptions on $V(x)$ stated in Theorem 3.1, let $\psi_N$ be the ground state of (3.6), and denote by $\gamma^{(1)}_{N,t}$ the one-particle density associated with the solution $\psi_{N,t} = e^{-iH_N t}\psi_N$ of the Schrödinger equation (2.6). Then, for any fixed $t \in \mathbb{R}$,
\[ \gamma^{(1)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t| \quad \text{as } N \to \infty , \]
where $\varphi_t$ is the solution of the Gross-Pitaevskii equation (3.5) with initial data $\varphi_{t=0} = \phi_{GP}$.

Although the second corollary describes physically more realistic situations, the first corollary also has interesting consequences. In Section 2, we observed that the emergence of the scattering length in the Gross-Pitaevskii equation is an effect due to the correlations. The fact that the Gross-Pitaevskii equation describes the dynamics of the condensate even if the initial wave function is completely uncorrelated, as in Corollary 3.2, implies that the $N$-body Schrödinger dynamics generates the singular correlation structure in very short times.
Of course, when the wave function develops correlations on the length scale $1/N$, the energy associated with this length scale decreases; since the total energy is conserved by the Schrödinger evolution, we must conclude that, together with the short scale structure at scales of order $1/N$, the $N$-body dynamics also produces oscillations on intermediate length scales $1/N \ll \ell \ll 1$. These oscillations carry the excess energy (the difference between the energy of the factorized wave function and the energy of the wave function with correlations on the length scale $1/N$) and have no effect on the macroscopic dynamics (because only variations of the wave function on length scales of order one and order $1/N$ affect the macroscopic dynamics described by the Gross-Pitaevskii equation).

4 General Strategy of the Proof and Previous Results

In this section we illustrate the strategy used to prove Theorem 3.1. The proof is divided into three main steps.

Step 1. Compactness of $\gamma^{(k)}_{N,t}$. Recall, from (2.7), the definition of the marginal densities $\gamma^{(k)}_{N,t}$ associated with the solution $\psi_{N,t} = \exp(-iH_N t)\psi_N$ of the $N$-body Schrödinger equation. By definition, for any $N \in \mathbb{N}$ and $t \in \mathbb{R}$, $\gamma^{(k)}_{N,t}$ is a positive operator in $\mathcal{L}^1_k = \mathcal{L}^1(L^2(\mathbb{R}^{3k}))$ (the space of trace class operators on $L^2(\mathbb{R}^{3k})$) with trace equal to one. For fixed $t \in \mathbb{R}$ and $k \ge 1$, it follows by a standard general argument (the Banach-Alaoglu Theorem) that the sequence $\{\gamma^{(k)}_{N,t}\}_{N \ge k}$ is compact with respect to the weak* topology of $\mathcal{L}^1_k$. Note here that $\mathcal{L}^1_k$ has a weak* topology because $\mathcal{L}^1_k = \mathcal{K}_k^*$, where $\mathcal{K}_k = \mathcal{K}(L^2(\mathbb{R}^{3k}))$ is the space of compact operators on $L^2(\mathbb{R}^{3k})$. To make sure that we can find subsequences of $\gamma^{(k)}_{N,t}$ which converge for all times in a certain interval, we fix $T > 0$ and consider the space $C([0,T], \mathcal{L}^1_k)$ of all functions of $t \in [0,T]$ with values in $\mathcal{L}^1_k$ which are continuous with respect to the weak* topology on $\mathcal{L}^1_k$.
Since $\mathcal{K}_k$ is separable, it follows that the weak* topology on the unit ball of $\mathcal{L}^1_k$ is metrizable; this allows us to prove the equicontinuity of the densities $\gamma^{(k)}_{N,t}$, and to obtain compactness of the sequences $\{\gamma^{(k)}_{N,t}\}_{N \ge k}$ in $C([0,T], \mathcal{L}^1_k)$.

Step 2. Convergence to an infinite hierarchy. By Step 1 we know that, as $N \to \infty$, the family of marginal densities $\Gamma_{N,t} = \{\gamma^{(k)}_{N,t}\}_{k=1}^{N}$ has at least one limit point $\Gamma_{\infty,t} = \{\gamma^{(k)}_{\infty,t}\}_{k \ge 1}$ in $\bigoplus_{k \ge 1} C([0,T], \mathcal{L}^1_k)$ with respect to the product topology. Next, we derive evolution equations for the limiting densities $\gamma^{(k)}_{\infty,t}$. Starting from the BBGKY hierarchy (2.10) for the family $\Gamma_{N,t}$, we prove that any limit point $\Gamma_{\infty,t}$ satisfies the infinite hierarchy of equations
\[ i\partial_t\gamma^{(k)}_{\infty,t} = \sum_{j=1}^{k} \big[-\Delta_j, \gamma^{(k)}_{\infty,t}\big] + 8\pi a_0 \sum_{j=1}^{k} \mathrm{Tr}_{k+1} \big[\delta(x_j - x_{k+1}), \gamma^{(k+1)}_{\infty,t}\big] \tag{4.1} \]
for $k \ge 1$. It is at this point, in the derivation of this infinite hierarchy, that we need to identify the singular part of the densities $\gamma^{(k+1)}_{N,t}$. The emergence of the scattering length in the second term on the right hand side of (4.1) is due to the short scale structure of $\gamma^{(k+1)}_{N,t}$.

It is worth noticing that the infinite hierarchy (4.1) has a factorized solution. In fact, it is simple to see that the infinite family
\[ \gamma^{(k)}_t = |\varphi_t\rangle\langle\varphi_t|^{\otimes k} \quad \text{for } k \ge 1 \tag{4.2} \]
solves (4.1) if and only if $\varphi_t$ is a solution to the Gross-Pitaevskii equation (3.5).

Step 3. Uniqueness of the solution to the infinite hierarchy. To conclude the proof of Theorem 3.1, we show that the infinite hierarchy (4.1) has a unique solution. This implies immediately that the densities $\gamma^{(k)}_{N,t}$ converge; in fact, a compact sequence with at most one limit point is always convergent. Moreover, since we know that the factorized densities (4.2) are a solution, it also follows that, for any $k \ge 1$,
\[ \gamma^{(k)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t|^{\otimes k} \quad \text{as } N \to \infty \]
with respect to the weak* topology of $\mathcal{L}^1_k$.
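The statement in Step 2 that factorized densities solve the infinite hierarchy can be checked by hand in the simplest case $k = 1$. The following is only a sketch at the level of kernels, assuming $\varphi_t$ is regular enough for the manipulations, with $\gamma^{(1)}_t(x_1; x'_1) = \varphi_t(x_1)\overline{\varphi}_t(x'_1)$ and $\gamma^{(2)}_t(x_1, x_2; x'_1, x'_2) = \varphi_t(x_1)\varphi_t(x_2)\overline{\varphi}_t(x'_1)\overline{\varphi}_t(x'_2)$:

```latex
\begin{align*}
\mathrm{Tr}_2\big[\delta(x_1 - x_2), \gamma^{(2)}_t\big](x_1; x_1')
  &= \int dx_2\, \big(\delta(x_1 - x_2) - \delta(x_1' - x_2)\big)\,
     \varphi_t(x_1)\varphi_t(x_2)\overline{\varphi}_t(x_1')\overline{\varphi}_t(x_2) \\
  &= \big(|\varphi_t(x_1)|^2 - |\varphi_t(x_1')|^2\big)\,
     \varphi_t(x_1)\overline{\varphi}_t(x_1') , \\[4pt]
i\partial_t\big(\varphi_t(x_1)\overline{\varphi}_t(x_1')\big)
  &= (i\partial_t\varphi_t)(x_1)\,\overline{\varphi}_t(x_1')
     - \varphi_t(x_1)\,\overline{(i\partial_t\varphi_t)(x_1')} \\
  &= \big(-\Delta_1 + \Delta_1'\big)\varphi_t(x_1)\overline{\varphi}_t(x_1')
     + 8\pi a_0\big(|\varphi_t(x_1)|^2 - |\varphi_t(x_1')|^2\big)\,
       \varphi_t(x_1)\overline{\varphi}_t(x_1') ,
\end{align*}
```

where the last line uses the Gross-Pitaevskii equation (3.5) for $\varphi_t$. Comparing the two computations gives (4.1) for $k = 1$; the case of general $k$ follows the same pattern, with one such term for each $j = 1, \dots, k$.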
Similar strategies have been used to obtain rigorous derivations of the nonlinear Hartree equation
\[ i\partial_t\varphi_t = -\Delta\varphi_t + (v * |\varphi_t|^2)\varphi_t \tag{4.3} \]
for the dynamics of initially factorized wave functions in bosonic many particle mean field models, characterized by the Hamiltonian
\[ H^{\mathrm{mf}}_N = \sum_{j=1}^{N} -\Delta_j + \frac{1}{N}\sum_{i<j}^{N} v(x_i - x_j) . \tag{4.4} \]
In this context, the approach outlined above was introduced by Spohn in [11], who applied it to derive (4.3) in the case of a bounded potential $v$. In [8], Erdős and Yau extended Spohn's result to the case of a Coulomb interaction $v(x) = \pm 1/|x|$ (partial results for the Coulomb case, in particular the convergence to the infinite hierarchy, were also obtained by Bardos, Golse, and Mauser, see [3]). More recently, Adami, Golse, and Teta used the same approach in [1] for one-dimensional systems with dynamics generated by a Hamiltonian of the form (4.4) with an $N$-dependent pair potential $v_N(x) = N^{\beta}V(N^{\beta}x)$, $\beta < 1$. In the limit $N \to \infty$, they obtain the nonlinear Schrödinger equation
\[ i\partial_t\varphi_t = -\Delta\varphi_t + b_0|\varphi_t|^2\varphi_t \quad \text{with } b_0 = \int V(x)\,dx . \]
Notice that the Hamiltonian (2.5) has the same form as the mean field Hamiltonian (4.4), with an $N$-dependent pair potential $v_N(x) = N^3V(Nx)$. Of course, one may also ask what happens if we consider the mean field Hamiltonian (4.4) with the $N$-dependent potential $v_N(x) = N^{3\beta}V(N^{\beta}x)$, for $\beta \neq 1$. If $\beta < 1$, the short scale structure developed by the solution of the Schrödinger equation is still characterized by the length scale $1/N$ (because the scattering length of $N^{3\beta-1}V(N^{\beta}x)$ is still of order $1/N$); but this time the potential varies on much larger scales, of the order $N^{-\beta} \gg N^{-1}$. For this reason, if $\beta < 1$, the scattering length does not appear in the effective macroscopic equation ($8\pi a_0$ is replaced by $b_0 = \int dx\, V(x)$). In [6] (and previously in [5] for $0 < \beta < 1/2$) we prove in fact that Corollary 3.2 can be extended to include the case $0 < \beta < 1$ as follows.

Theorem 4.1. Suppose $\psi_N(\mathbf{x}) = \prod_{j=1}^{N} \varphi(x_j)$, for some $\varphi \in H^1(\mathbb{R}^3)$.
Let $\psi_{N,t} = e^{-iH_{\beta,N}t}\psi_N$ with the mean-field Hamiltonian
\[ H_{\beta,N} = -\sum_{j=1}^{N} \Delta_j + \frac{1}{N}\sum_{i<j}^{N} N^{3\beta}V(N^{\beta}(x_i - x_j)) \]
for a positive, spherically symmetric, compactly supported, and smooth potential $V$ such that $\alpha$ (defined in (3.1)) is sufficiently small. Let $\gamma^{(1)}_{N,t}$ be the one-particle density associated with $\psi_{N,t}$. Then, if $0 < \beta \le 1$ we have, for any fixed $t \in \mathbb{R}$, $\gamma^{(1)}_{N,t} \to |\varphi_t\rangle\langle\varphi_t|$ as $N \to \infty$. Here $\varphi_t$ is the solution to the nonlinear Schrödinger equation
\[ i\partial_t\varphi_t = -\Delta\varphi_t + \sigma|\varphi_t|^2\varphi_t \]
with initial data $\varphi_{t=0} = \varphi$ and with
\[ \sigma = \begin{cases} 8\pi a_0 & \text{if } \beta = 1, \\ b_0 & \text{if } 0 < \beta < 1 . \end{cases} \]

5 Convergence to the Infinite Hierarchy

In this section we give some more details concerning Step 2 in the strategy outlined above. We consider a limit point $\Gamma_{\infty,t} = \{\gamma^{(k)}_{\infty,t}\}_{k \ge 1}$ of the sequence $\Gamma_{N,t} = \{\gamma^{(k)}_{N,t}\}_{k=1}^{N}$ and we prove that $\Gamma_{\infty,t}$ satisfies the infinite hierarchy (4.1). To this end we use that, for finite $N$, the family $\Gamma_{N,t}$ satisfies the BBGKY hierarchy (2.10), and we show the convergence of each term in (2.10) to the corresponding term in the infinite hierarchy (4.1) (the second term on the r.h.s. of (2.10) is of smaller order and can be proven to vanish in the limit $N \to \infty$).

The main difficulty consists in proving the convergence of the last term on the right hand side of (2.10) to the last term on the right hand side of (4.1). In particular, we need to show that in the limit $N \to \infty$ we can replace the potential $(N-k)N^2V(N(x_j - x_{k+1})) \simeq N^3V(N(x_j - x_{k+1}))$ in the last term on the r.h.s. of (2.10) by $8\pi a_0\,\delta(x_j - x_{k+1})$. In terms of kernels we have to prove that
\[ \int dx_{k+1}\, \Big(N^3V(N(x_j - x_{k+1})) - 8\pi a_0\,\delta(x_j - x_{k+1})\Big)\, \gamma^{(k+1)}_{N,t}(\mathbf{x}_k, x_{k+1}; \mathbf{x}'_k, x_{k+1}) \to 0 \tag{5.1} \]
as $N \to \infty$. It is enough to prove the convergence (5.1) in a weak sense, after testing the expression against a smooth $k$-particle kernel $J^{(k)}(\mathbf{x}_k; \mathbf{x}'_k)$. Note, however, that the observable $J^{(k)}$ does not help to perform the integration over the variable $x_{k+1}$.
The problem here is that, formally, the $N$-dependent potential $N^3V(N(x_j - x_{k+1}))$ does not converge towards $8\pi a_0\,\delta(x_j - x_{k+1})$ as $N \to \infty$ (it converges towards $b_0\,\delta(x_j - x_{k+1})$, with $b_0 = \int dx\, V(x)$). Eq. (5.1) is only correct because of the correlations between $x_j$ and $x_{k+1}$ hidden in the density $\gamma^{(k+1)}_{N,t}$. Therefore, to prove (5.1), we start by factoring out the correlations explicitly, and by proving that, as $N \to \infty$,
\[ \int dx_{k+1}\, \Big(N^3V(N(x_j - x_{k+1}))\, f_N(x_j - x_{k+1}) - 8\pi a_0\,\delta(x_j - x_{k+1})\Big)\, \frac{\gamma^{(k+1)}_{N,t}(\mathbf{x}_k, x_{k+1}; \mathbf{x}'_k, x_{k+1})}{f_N(x_j - x_{k+1})} \to 0 , \tag{5.2} \]
where $f_N(x)$ is the solution to the zero energy scattering equation (2.4). Then, in a second step, we use the fact that $f_N \to 1$ in the weak limit $N \to \infty$ to prove that the ratio $\gamma^{(k+1)}_{N,t}/f_N(x_j - x_{k+1})$ converges to the same limiting density $\gamma^{(k+1)}_{\infty,t}$ as $\gamma^{(k+1)}_{N,t}$. Eq. (5.2) now looks much better than (5.1) because, formally, $N^3V(N(x_j - x_{k+1}))\,f_N(x_j - x_{k+1})$ does converge to $8\pi a_0\,\delta(x_j - x_{k+1})$. To prove that (5.2) is indeed correct, we only need some regularity of the ratio $\gamma^{(k+1)}_{N,t}(\mathbf{x}_k, x_{k+1}; \mathbf{x}'_k, x_{k+1})/f_N(x_j - x_{k+1})$ in the variables $x_j$ and $x_{k+1}$. In terms of the $N$-particle wave function $\psi_{N,t}$ we need regularity of $\psi_{N,t}(\mathbf{x})/f_N(x_i - x_j)$ in the variables $x_i$, $x_j$, for any $i \neq j$. To establish the required regularity we use the following energy estimate.

Proposition 5.1. Consider the Hamiltonian $H_N$ defined in (2.5), with a positive, spherically symmetric, smooth and compactly supported potential $V$. Suppose that $\alpha$ (defined in (3.1)) is sufficiently small. Then there exists $C = C(\alpha) > 0$ such that
\[ \langle\psi, H_N^2\psi\rangle \ge C N^2 \int d\mathbf{x}\, \left| \nabla_i\nabla_j \frac{\psi(\mathbf{x})}{f_N(x_i - x_j)} \right|^2 \tag{5.3} \]
for all $i \neq j$ and for all $\psi \in L^2_s(\mathbb{R}^{3N}, d\mathbf{x})$.

Making use of this energy estimate it is possible to deduce strong a-priori bounds on the solution $\psi_{N,t}$ of the Schrödinger equation (2.6). These bounds have the form
\[ \int d\mathbf{x}\, \left| \nabla_i\nabla_j \frac{\psi_{N,t}(\mathbf{x})}{f_N(x_i - x_j)} \right|^2 \le C \tag{5.4} \]
uniformly in $N \in \mathbb{N}$ and $t \in \mathbb{R}$.
To prove (5.4) we use that, by (5.3), and because of the conservation of the energy along the time evolution,
\[ \int d\mathbf{x}\, \left| \nabla_i\nabla_j \frac{\psi_{N,t}(\mathbf{x})}{f_N(x_i - x_j)} \right|^2 \le C N^{-2}\langle\psi_{N,t}, H_N^2\psi_{N,t}\rangle = C N^{-2}\langle\psi_{N,0}, H_N^2\psi_{N,0}\rangle . \tag{5.5} \]
From (5.5), and using an approximation argument on the initial wave function to make sure that the expectation of $H_N^2$ at time $t = 0$ is of the order $N^2$, we obtain (5.4). The bounds (5.4) are then sufficient to prove the convergence (5.1) (using a non-standard Poincaré inequality; see Lemma 7.2 in [6]).

Note that the a-priori bounds (5.4) do not hold true if we do not divide the solution $\psi_{N,t}$ of the Schrödinger equation by $f_N(x_i - x_j)$ (replacing $\psi_{N,t}(\mathbf{x})/f_N(x_i - x_j)$ by $\psi_{N,t}(\mathbf{x})$, the integral in (5.4) would be of order $N$). It is only after removing the singular factor $f_N(x_i - x_j)$ from $\psi_{N,t}(\mathbf{x})$ that we can prove useful bounds on the regular part of the wave function.

It is through the a-priori bounds (5.4) that we identify the correlation structure of the wave function $\psi_{N,t}$ and that we show that, when $x_i$ and $x_j$ are close to each other, $\psi_{N,t}(\mathbf{x})$ can be approximated by the time independent singular factor $f_N(x_i - x_j)$, which varies on the length scale $1/N$, multiplied by a regular part (regular in the sense that it satisfies the bounds (5.4)). It is therefore through (5.4) that we establish the strong separation of scales in the wave function $\psi_{N,t}$ and in the marginal densities $\gamma^{(k)}_{N,t}$ which is of fundamental importance for the Gross-Pitaevskii theory.

Since it is quite short and it shows why the solution $f_N(x_i - x_j)$ to the zero energy scattering equation (2.4) can be used to describe the two-particle correlations, we reproduce in the following the proof of Proposition 5.1. Note that this is the only step in the proof of our main theorem where the smallness of the constant $\alpha$, measuring the strength of the interaction potential, is used. The positivity of the interaction potential, on the other hand, also plays an important role in many other parts of the proof.

Proof of Proposition 5.1.
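We decompose the Hamiltonian (2.5) as
\[ H_N = \sum_{j=1}^{N} h_j \quad \text{with} \quad h_j = -\Delta_j + \frac{1}{2}\sum_{i \neq j} V_N(x_i - x_j) . \]
For an arbitrary permutation symmetric wave function $\psi$ and for any fixed $i \neq j$, we have
\[ \langle\psi, H_N^2\psi\rangle = N\langle\psi, h_i^2\psi\rangle + N(N-1)\langle\psi, h_ih_j\psi\rangle \ge N(N-1)\langle\psi, h_ih_j\psi\rangle . \]
Using the positivity of the potential, we find
\[ \langle\psi, H_N^2\psi\rangle \ge N(N-1) \Big\langle\psi, \Big(-\Delta_i + \tfrac{1}{2}V_N(x_i - x_j)\Big)\Big(-\Delta_j + \tfrac{1}{2}V_N(x_i - x_j)\Big)\psi\Big\rangle . \tag{5.6} \]
Next, we define $\phi(\mathbf{x})$ by $\psi(\mathbf{x}) = f_N(x_i - x_j)\phi(\mathbf{x})$ ($\phi$ is well defined because $f_N(x) > 0$ for all $x \in \mathbb{R}^3$); note that the definition of the function $\phi$ depends on the choice of $i, j$. Then
\[ \frac{1}{f_N(x_i - x_j)}\,\Delta_i\big(f_N(x_i - x_j)\phi(\mathbf{x})\big) = \Delta_i\phi(\mathbf{x}) + \frac{(\Delta f_N)(x_i - x_j)}{f_N(x_i - x_j)}\,\phi(\mathbf{x}) + 2\,\frac{\nabla_i f_N(x_i - x_j)}{f_N(x_i - x_j)}\cdot\nabla_i\phi(\mathbf{x}) . \]
From (2.4) it follows that
\[ \frac{1}{f_N(x_i - x_j)}\Big(-\Delta_i + \tfrac{1}{2}V_N(x_i - x_j)\Big) f_N(x_i - x_j)\phi(\mathbf{x}) = L_i\phi(\mathbf{x}) \]
and analogously
\[ \frac{1}{f_N(x_i - x_j)}\Big(-\Delta_j + \tfrac{1}{2}V_N(x_i - x_j)\Big) f_N(x_i - x_j)\phi(\mathbf{x}) = L_j\phi(\mathbf{x}) , \]
where we defined
\[ L_\ell = -\Delta_\ell - 2\,\frac{\nabla_\ell f_N(x_i - x_j)}{f_N(x_i - x_j)}\cdot\nabla_\ell , \quad \text{for } \ell = i, j . \]
(The sign of the first-order term is the one forced by the product rule above, and it is also the sign for which $L_\ell$ is symmetric with respect to the measure $f_N^2(x_i - x_j)\,d\mathbf{x}$.) Note that, for $\ell = i, j$, the operator $L_\ell$ satisfies
\[ \int d\mathbf{x}\, f_N^2(x_i - x_j)\, \overline{L_\ell\phi(\mathbf{x})}\,\psi(\mathbf{x}) = \int d\mathbf{x}\, f_N^2(x_i - x_j)\, \overline{\phi(\mathbf{x})}\, L_\ell\psi(\mathbf{x}) = \int d\mathbf{x}\, f_N^2(x_i - x_j)\, \nabla_\ell\overline{\phi}(\mathbf{x})\cdot\nabla_\ell\psi(\mathbf{x}) . \]
Therefore, from (5.6), we obtain
\begin{align*}
\langle\psi, H_N^2\psi\rangle &\ge N(N-1)\int d\mathbf{x}\, f_N^2(x_i - x_j)\, \overline{L_i\phi(\mathbf{x})}\, L_j\phi(\mathbf{x}) \\
&= N(N-1)\int d\mathbf{x}\, f_N^2(x_i - x_j)\, \nabla_i\overline{\phi}(\mathbf{x})\cdot\nabla_i L_j\phi(\mathbf{x}) \\
&= N(N-1)\int d\mathbf{x}\, f_N^2(x_i - x_j)\, \nabla_i\overline{\phi}(\mathbf{x})\cdot L_j\nabla_i\phi(\mathbf{x}) + N(N-1)\int d\mathbf{x}\, f_N^2(x_i - x_j)\, \nabla_i\overline{\phi}(\mathbf{x})\cdot[\nabla_i, L_j]\phi(\mathbf{x}) \\
&= N(N-1)\int d\mathbf{x}\, f_N^2(x_i - x_j)\, |\nabla_j\nabla_i\phi(\mathbf{x})|^2 + 2N(N-1)\int d\mathbf{x}\, f_N^2(x_i - x_j)\, \Big(\nabla\frac{\nabla f_N}{f_N}\Big)(x_i - x_j)\, \nabla_i\overline{\phi}(\mathbf{x})\,\nabla_j\phi(\mathbf{x}) . \tag{5.7}
\end{align*}
To control the second term on the right hand side of the last equation we use bounds on the function $f_N$, which can be derived from the zero energy scattering equation (2.4):
\[ 1 - C\alpha \le f_N(x) \le 1, \qquad |\nabla f_N(x)| \le \frac{C\alpha}{|x|}, \qquad |\nabla^2 f_N(x)| \le \frac{C\alpha}{|x|^2} \tag{5.8} \]
for constants $C$ independent of $N$ and of the potential $V$ (recall the definition of the dimensionless constant $\alpha$ from (3.1)). Therefore, for $\alpha < 1$,
\begin{align*}
\left| \int d\mathbf{x}\, f_N^2(x_i - x_j)\, \Big(\nabla\frac{\nabla f_N}{f_N}\Big)(x_i - x_j)\, \nabla_i\overline{\phi}(\mathbf{x})\,\nabla_j\phi(\mathbf{x}) \right|
&\le C\alpha\int d\mathbf{x}\, \frac{1}{|x_i - x_j|^2}\, |\nabla_i\phi(\mathbf{x})|\,|\nabla_j\phi(\mathbf{x})| \\
&\le C\alpha\int d\mathbf{x}\, \frac{1}{|x_i - x_j|^2}\, \Big(|\nabla_i\phi(\mathbf{x})|^2 + |\nabla_j\phi(\mathbf{x})|^2\Big) \\
&\le C\alpha\int d\mathbf{x}\, |\nabla_i\nabla_j\phi(\mathbf{x})|^2 , \tag{5.9}
\end{align*}
where we used the Hardy inequality.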
Thus, from (5.7), and using again the first bound in (5.8), we obtain
\[ \langle\psi, H_N^2\psi\rangle \ge N(N-1)(1 - C\alpha)\int d\mathbf{x}\, |\nabla_i\nabla_j\phi(\mathbf{x})|^2 , \]
which implies (5.3).

6 Uniqueness of the Solution to the Infinite Hierarchy

In this section we discuss the main ideas used to prove the uniqueness of the solution to the infinite hierarchy (Step 3 in the strategy outlined in Section 4).

First of all, we need to specify in which class of families of densities $\Gamma_t = \{\gamma^{(k)}_t\}_{k \ge 1}$ we want to prove the uniqueness of the solution to the infinite hierarchy (4.1). Clearly, the proof of the uniqueness is simpler if we can restrict our attention to smaller classes. But of course, in order to apply the uniqueness result to prove Theorem 3.1, we need to make sure that any limit point of the sequence $\Gamma_{N,t} = \{\gamma^{(k)}_{N,t}\}_{k=1}^{N}$ is in the class for which we can prove uniqueness. We are going to prove uniqueness for all families $\Gamma_t = \{\gamma^{(k)}_t\}_{k \ge 1} \in \bigoplus_{k \ge 1} C([0,T], \mathcal{L}^1_k)$ with
\[ \|\gamma^{(k)}_t\|_{\mathcal{H}_k} := \mathrm{Tr}\, \Big| (1-\Delta_1)^{1/2} \dots (1-\Delta_k)^{1/2}\, \gamma^{(k)}_t\, (1-\Delta_k)^{1/2} \dots (1-\Delta_1)^{1/2} \Big| \le C^k \tag{6.1} \]
for all $t \in [0,T]$ and for all $k \ge 1$ (with a constant $C$ independent of $k$). The following proposition guarantees that any limit point of the sequence $\Gamma_{N,t}$ satisfies (6.1).

Proposition 6.1. Assume the same conditions as in Proposition 5.1. Suppose that $\Gamma_{\infty,t} = \{\gamma^{(k)}_{\infty,t}\}_{k \ge 1}$ is a limit point of $\Gamma_{N,t} = \{\gamma^{(k)}_{N,t}\}_{k=1}^{N}$ with respect to the product topology on $\bigoplus_{k \ge 1} C([0,T], \mathcal{L}^1_k)$. Then $\gamma^{(k)}_{\infty,t} \ge 0$ and there exists a constant $C$ such that
\[ \mathrm{Tr}\, (1-\Delta_1) \dots (1-\Delta_k)\, \gamma^{(k)}_{\infty,t} \le C^k \tag{6.2} \]
for all $k \ge 1$ and $t \in [0,T]$.

Because of Proposition 6.1, it is enough to prove the uniqueness of the infinite hierarchy (4.1) in the following sense.

Theorem 6.2. Suppose that $\Gamma = \{\gamma^{(k)}\}_{k \ge 1}$ is such that
\[ \|\gamma^{(k)}\|_{\mathcal{H}_k} \le C^k \tag{6.3} \]
for all $k \ge 1$ (the norm $\|\cdot\|_{\mathcal{H}_k}$ is defined in (6.1)). Then there exists at most one solution $\Gamma_t = \{\gamma^{(k)}_t\}_{k \ge 1} \in \bigoplus_{k \ge 1} C([0,T], \mathcal{L}^1_k)$ of (4.1) such that $\Gamma_{t=0} = \Gamma$ and
\[ \|\gamma^{(k)}_t\|_{\mathcal{H}_k} \le C^k \tag{6.4} \]
for all $k \ge 1$ and all $t \in [0,T]$ (with the same constant $C$ as in (6.3)).
In the next two subsections we explain the main ideas of the proofs of Proposition 6.1 and Theorem 6.2.

6.1 Higher Order Energy Estimates

The main difficulty in proving Proposition 6.1 is the fact that the estimate (6.2) does not hold true if we replace $\gamma^{(k)}_{\infty,t}$ by the marginal density $\gamma^{(k)}_{N,t}$. More precisely,
\[ \mathrm{Tr}\, (1-\Delta_1) \dots (1-\Delta_k)\, \gamma^{(k)}_{N,t} \le C^k \tag{6.5} \]
cannot hold true with a constant $C$ independent of $N$. In fact, for finite $N$ and $k > 1$, the $k$-particle density $\gamma^{(k)}_{N,t}$ still contains the short scale structure due to the correlations among the particles. Therefore, when we take derivatives of $\gamma^{(k)}_{N,t}$ as in (6.5), the singular structure (which varies on a length scale of order $1/N$) generates contributions which diverge in the limit $N \to \infty$.

To overcome this problem, we cut off the wave function $\psi_{N,t}$ when two or more particles come at distances smaller than some intermediate length scale $\ell$, with $N^{-1} \ll \ell \ll 1$ (more precisely, the cutoff will be effective only when one or more particles come close to one of the variables $x_j$ over which we want to take derivatives). For fixed $j = 1, \dots, N$, we define $\theta_j \in C^\infty(\mathbb{R}^{3N})$ such that
\[ \theta_j(\mathbf{x}) \simeq \begin{cases} 1 & \text{if } |x_i - x_j| \gg \ell \text{ for all } i \neq j, \\ 0 & \text{if there exists } i \neq j \text{ with } |x_i - x_j| \lesssim \ell . \end{cases} \]
It is important, for our analysis, that $\theta_j$ controls its derivatives (in the sense that, for example, $|\nabla_i\theta_j| \le C\ell^{-1}\theta_j$); for this reason we cannot use standard compactly supported cutoffs, but instead we have to construct appropriate functions which decay exponentially when particles come close together. Making use of the functions $\theta_j(\mathbf{x})$, we prove the following higher order energy estimates.

Proposition 6.3. Choose $\ell \ll 1$ such that $N\ell^2 \gg 1$. Suppose that $\alpha$ is small enough. Then there exist constants $C_1$ and $C_2$ such that, for any $\psi \in L^2_s(\mathbb{R}^{3N})$,
\[ \langle\psi, (H_N + C_1N)^k\psi\rangle \ge C_2 N^k \int d\mathbf{x}\, \theta_1(\mathbf{x}) \dots \theta_{k-1}(\mathbf{x})\, |\nabla_1 \dots \nabla_k\psi(\mathbf{x})|^2 . \tag{6.6} \]

The meaning of the bounds (6.6) is clear. We can control the $L^2$-norm of the $k$-th derivative $\nabla_1 \dots \nabla_k\psi$ by the expectation of the $k$-th power of the energy per particle, if we only integrate over configurations where the first $k-1$ particles are "isolated" (in the sense that there is no particle at distances smaller than $\ell$ from $x_1, x_2, \dots, x_{k-1}$). In this sense the energy estimate in Proposition 5.1 (which, compared with Proposition 6.3, is restricted to $k = 2$) is more precise than (6.6), because it identifies and controls the singularity of the wave function exactly in the region cut off from the integral on the right side of (6.6). The point is that, while Proposition 5.1 is used to identify the two-particle correlations in the marginal densities $\gamma^{(k)}_{N,t}$ (which are essential for the emergence of the scattering length $a_0$ in the infinite hierarchy (4.1)), we only need Proposition 6.3 to establish properties of the limiting densities; this is why we can introduce cutoffs in (6.6), provided we can show their effect to vanish in the limit $N \to \infty$.

Note that we can allow one "free derivative"; in (6.6) we take the derivative over $x_k$ although there is no cutoff $\theta_k(\mathbf{x})$. The reason is that the correlation structure becomes singular, in the $L^2$ sense, only when we differentiate it twice (if one uses the zero energy solution $f_N$ introduced in (2.4) to describe the correlations, this can be seen by observing that $\nabla f_N(x) \simeq 1/|x|$, which is locally square integrable). Note also that the condition $N\ell^2 \gg 1$ is a consequence of the fact that, if $\ell$ is too small, the error due to the localization of the kinetic energy on distances of order $\ell$ cannot be controlled. The proof of Proposition 6.3 is based on induction over $k$; for details see Section 9 in [6].

From the estimates (6.6), using the preservation of the expectation of $H_N^k$ along the time evolution and a regularization of the initial $N$-particle wave function $\psi_N$, we obtain the following bounds for the solution $\psi_{N,t} = e^{-iH_Nt}\psi_N$ of the Schrödinger equation (2.6):
\[ \int d\mathbf{x}\, \theta_1(\mathbf{x}) \dots \theta_{k-1}(\mathbf{x})\, |\nabla_1 \dots \nabla_k\psi_{N,t}(\mathbf{x})|^2 \le C^k \tag{6.7} \]
uniformly in $N$ and $t$, and for all $k \ge 1$. Translating these bounds into the language of the density matrix $\gamma_{N,t}$, we obtain
\[ \mathrm{Tr}\, \theta_1 \dots \theta_{k-1}\, \nabla_1 \dots \nabla_k\, \gamma_{N,t}\, \nabla_1^* \dots \nabla_k^* \le C^k . \tag{6.8} \]
The idea now is to use the freedom in the choice of the cutoff length $\ell$. If we fix the position of all particles but $x_j$, it is clear that the cutoff $\theta_j$ is effective in a volume at most of the order $N\ell^3$. If we now choose $\ell$ such that $N\ell^3 \to 0$ as $N \to \infty$ (which is of course compatible with the condition that $N\ell^2 \gg 1$), then we can expect that, in the limit of large $N$, the cutoff becomes negligible. This approach yields in fact the desired results; starting from (6.8), and choosing $\ell$ such that $N\ell^3 \ll 1$, we can complete the proof of Proposition 6.1 (see Proposition 6.3 in [6] for more details).

6.2 Expansion in Feynman Graphs

To prove Theorem 6.2, we start by rewriting the infinite hierarchy (4.1) in the integral form
\begin{align*}
\gamma^{(k)}_t &= \mathcal{U}^{(k)}(t)\gamma^{(k)}_0 - 8i\pi a_0 \sum_{j=1}^{k} \int_0^t ds\, \mathcal{U}^{(k)}(t-s)\, \mathrm{Tr}_{k+1}\big[\delta(x_j - x_{k+1}), \gamma^{(k+1)}_s\big] \\
&= \mathcal{U}^{(k)}(t)\gamma^{(k)}_0 + \int_0^t ds\, \mathcal{U}^{(k)}(t-s)\, B^{(k)}\gamma^{(k+1)}_s , \tag{6.9}
\end{align*}
where $\mathcal{U}^{(k)}(t)$ denotes the free evolution of $k$ particles,
\[ \mathcal{U}^{(k)}(t)\gamma^{(k)} = e^{it\sum_{j=1}^{k}\Delta_j}\, \gamma^{(k)}\, e^{-it\sum_{j=1}^{k}\Delta_j} , \]
and the collision operator $B^{(k)}$ maps $(k+1)$-particle operators into $k$-particle operators according to
\[ B^{(k)}\gamma^{(k+1)} = -8i\pi a_0 \sum_{j=1}^{k} \mathrm{Tr}_{k+1}\big[\delta(x_j - x_{k+1}), \gamma^{(k+1)}\big] \tag{6.10} \]
(recall that $\mathrm{Tr}_{k+1}$ denotes the partial trace over the $(k+1)$-th particle; the prefactor $-8i\pi a_0$ is the one obtained by dividing (4.1) by $i$). Iterating (6.9) $n$ times we obtain the Duhamel type series
\[ \gamma^{(k)}_t = \mathcal{U}^{(k)}(t)\gamma^{(k)}_0 + \sum_{m=1}^{n-1} \xi^{(k)}_{m,t} + \eta^{(k)}_{n,t} \tag{6.11} \]
with
\begin{align*}
\xi^{(k)}_{m,t} &= \int_0^t ds_1 \dots \int_0^{s_{m-1}} ds_m\, \mathcal{U}^{(k)}(t-s_1) B^{(k)}\, \mathcal{U}^{(k+1)}(s_1-s_2) B^{(k+1)} \dots B^{(k+m-1)}\, \mathcal{U}^{(k+m)}(s_m)\gamma^{(k+m)}_0 \\
&= (-8i\pi a_0)^m \sum_{j_1=1}^{k} \dots \sum_{j_m=1}^{k+m-1} \int_0^t ds_1 \dots \int_0^{s_{m-1}} ds_m\, \mathcal{U}^{(k)}(t-s_1)\, \mathrm{Tr}_{k+1}\Big[\delta(x_{j_1} - x_{k+1}), \mathcal{U}^{(k+1)}(s_1-s_2)\, \mathrm{Tr}_{k+2}\Big[\delta(x_{j_2} - x_{k+2}), \dots \mathrm{Tr}_{k+m}\Big[\delta(x_{j_m} - x_{k+m}), \mathcal{U}^{(k+m)}(s_m)\gamma^{(k+m)}_0\Big] \dots \Big]\Big] \tag{6.12}
\end{align*}
and the error term
\[ \eta^{(k)}_{n,t} = \int_0^t ds_1 \int_0^{s_1} ds_2 \dots \int_0^{s_{n-1}} ds_n\, \mathcal{U}^{(k)}(t-s_1) B^{(k)}\, \mathcal{U}^{(k+1)}(s_1-s_2) B^{(k+1)} \dots B^{(k+n-1)}\gamma^{(k+n)}_{s_n} . \tag{6.13} \]
Note that the error term (6.13) has exactly the same form as the terms in (6.12), with the only difference that the last free evolution is replaced by the full evolution $\gamma^{(k+n)}_{s_n}$.

[Figure 1: A Feynman graph in $\mathcal{F}_{m,k}$, with $2k$ roots and $2k+2m$ leaves, and its two types of vertices.]

To prove the uniqueness of the infinite hierarchy, it is enough to prove that the error term (6.13) converges to zero as $n \to \infty$ (in some norm, or even only after testing it against a sufficiently large class of smooth observables). The main problem here is that the delta function in the collision operator $B^{(k)}$ cannot be controlled by the kinetic energy (in the sense that, in three dimensions, the operator inequality $\delta(x) \le C(1-\Delta)$ does not hold true). For this reason, the a-priori estimates $\|\gamma^{(k)}_t\|_{\mathcal{H}_k} \le C^k$ are not sufficient to show that (6.13) converges to zero as $n \to \infty$. Instead, we have to make use of the smoothing effects of the free evolutions $\mathcal{U}^{(k+j)}(s_j - s_{j+1})$ in (6.13) (in a similar way, Strichartz estimates are used to prove the well-posedness of nonlinear Schrödinger equations). To this end, we rewrite each term in the series (6.11) as a sum of contributions associated with certain Feynman graphs, and then we prove the convergence of the Duhamel expansion by controlling each contribution separately.

The details of the diagrammatic expansion can be found in Section 9 of [5]. Here we only present the main ideas. We start by considering the term $\xi^{(k)}_{m,t}$ in (6.12). After multiplying it with a compact $k$-particle observable $J^{(k)}$ and taking the trace, we expand the result as
\[ \mathrm{Tr}\, J^{(k)}\xi^{(k)}_{m,t} = \sum_{\Lambda \in \mathcal{F}_{m,k}} K_{\Lambda,t} , \tag{6.14} \]
where $K_{\Lambda,t}$ is the contribution associated with the Feynman graph $\Lambda$. Here $\mathcal{F}_{m,k}$ denotes the set of all graphs consisting of $2k$ disjoint, paired, oriented, and rooted trees with $m$ vertices. An example of a graph in $\mathcal{F}_{m,k}$ is drawn in Figure 1.
Each vertex has one of the two forms drawn in Figure 1, with one "father"-edge on the left (closer to the root of the tree) and three "son"-edges on the right. One of the son-edges is marked (the one drawn on the same level as the father-edge; the other two son-edges are drawn below). Graphs in $\mathcal{F}_{m,k}$ have $2k + 3m$ edges, $2k$ roots (the edges on the very left), and $2k + 2m$ leaves (the edges on the very right). It is possible to show that the number of different graphs in $\mathcal{F}_{m,k}$ is bounded by $2^{4m+k}$.

The particular form of the graphs in $\mathcal{F}_{m,k}$ is due to the quantum mechanical nature of the expansion; the presence of a commutator in the collision operator (6.10) implies that, for every $B^{(k+j)}$ in (6.12), we can choose whether to write the interaction on the left or on the right of the density. When we draw the corresponding vertex in a graph in $\mathcal{F}_{m,k}$, we have to choose whether to attach it on the incoming or on the outgoing edge.

Graphs in $\mathcal{F}_{m,k}$ are characterized by a natural partial ordering among the vertices ($v \prec v'$ if the vertex $v$ is on the path from $v'$ to the roots); there is, however, no total ordering. The absence of total ordering among the vertices is the consequence of a rearrangement of the summands on the r.h.s. of (6.12); by removing the order between times associated with non-ordered vertices we significantly reduce the number of terms in the expansion. In fact, while (6.12) contains $(m+k)!/k!$ summands, in (6.14) we are only summing over $2^{4m+k}$ contributions. The price we have to pay is that the apparent gain of a factor $1/m!$ because of the ordering of the time integrals in (6.12) is lost in the new expansion (6.14). However, since the time integrations are already needed to smooth out singularities, and since they cannot be used simultaneously for smoothing and for gaining a factor $1/m!$, it seems very difficult to make use of the apparent factor $1/m!$ in (6.12).
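The counting statements above can be checked with a few lines of code. The sketch below is only an illustration: the function names are hypothetical, the graph is not actually constructed, and the two counts $2^{4m+k}$ and $(m+k)!/k!$ are simply taken from the text rather than derived.

```python
import math


def graph_shape(k, m):
    """Return (edges, leaves) of a graph with 2k roots after attaching m vertices."""
    edges = 2 * k   # with no vertices, each tree is a single edge ...
    leaves = 2 * k  # ... which is a root and a leaf at the same time
    for _ in range(m):
        # a vertex sits at the end of a leaf (its father-edge) and adds three son-edges
        edges += 3
        leaves += 2  # the father-edge is no longer a leaf, the three sons are
    return edges, leaves


def graph_count_bound(k, m):
    """Upper bound 2^(4m+k) on the number of graphs, as quoted in the text."""
    return 2 ** (4 * m + k)


def ordered_summand_count(k, m):
    """Number of summands (m+k)!/k! attributed to (6.12) in the text."""
    return math.factorial(m + k) // math.factorial(k)
```

For example, `graph_shape(2, 3)` reproduces $2k + 3m = 13$ edges and $2k + 2m = 10$ leaves. Comparing `graph_count_bound` with `ordered_summand_count` for fixed $k$ shows that the exponential bound can be the larger of the two for small $m$, while the factorial count dominates once $m$ is large, which is the regime relevant for the convergence of the expansion.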
In fact, we find that the expansion (6.14) is better suited for analyzing the cumulative space-time smoothing effects of the multiple free evolutions than (6.12). Because of the pairing of the 2k trees, there is a natural pairing between the 2k roots of the graph. Moreover, it is also possible to define a natural pairing of the leaves of the graph (this is evident in Figure 1); two leaves ℓ1 and ℓ2 are paired if there exists an edge e1 on the path from ℓ1 back to the roots, and an edge e2 on the path from ℓ2 to the roots, such that e1 and e2 are the two unmarked son-edges of the same vertex (or, if there is no unmarked sons in the path from ℓ1 and ℓ2 to the roots, if the two roots connected to ℓ1 and ℓ2 are paired). For Λ ∈ Fm,k, we denote by E(Λ), V (Λ), R(Λ) and L(Λ) the set of all edges, vertices, roots and, respectively, leaves in the graph Λ. For every edge e ∈ E(Λ), we introduce a three-dimensional momentum variable pe and a one-dimensional frequency variable αe. Then, denoting by γ̂ (k+m) 0 and by Ĵ (k) the kernels of the density γ (k+m) 0 and of the observable J (k) in Fourier space, the contribution KΛ,t in (6.14) is given by KΛ,t = e∈E(Λ) dpedαe αe − p2e + iτeµe v∈V (Λ) × exp e∈R(Λ) τe(αe + iτeµe)  Ĵ (k) {pe}e∈R(Λ) (k+m) {pe}e∈L(Λ) (6.15) Here τe = ±1, according to the orientation of the edge e. We observe from (6.15) that the momenta of the roots of Λ are the variables of the kernel of J (k), while the momenta of the leaves of Λ are the variables of the kernel of γ (k+m) 0 (this also explain why roots and leaves of Λ need to be paired). The denominators (αe−p e+iτeµe) −1 are called propagators; they correspond to the free evolutions in the expansion (6.12) and they enter the expression (6.15) through the formula eit(α+iµ) α− p2 + iµ (here and in (6.15) the measure dα is defined by dα = d′α/(2πi) where d′α is the Lebesgue measure on R). The regularization factors µe in (6.15) have to be chosen such that µfather = e= son µe at every vertex. 
The delta-functions in (6.15) express momentum and frequency conservation (the sum over e ∈ v denotes the sum over all edges adjacent to the vertex v; here $\pm\alpha_e = \alpha_e$ if the edge points towards the vertex, while $\pm\alpha_e = -\alpha_e$ if the edge points out of the vertex, and analogously for $\pm p_e$). An analogous expansion can be obtained for the error term $\eta^{(k)}_{n,t}$ in (6.13).

The problem now is to analyze the integral (6.15) (and the corresponding integral for the error term). Through an appropriate choice of the regularization factors $\mu_e$ one can extract the time dependence of $K_{\Lambda,t}$ and show that

$|K_{\Lambda,t}| \le C^{k+m}\, t^{m/4} \int \prod_{e \in E(\Lambda)} \frac{d\alpha_e\, dp_e}{\langle \alpha_e - p_e^2 \rangle} \prod_{v \in V(\Lambda)} \delta\Big(\sum_{e \in v} \pm \alpha_e\Big)\, \delta\Big(\sum_{e \in v} \pm p_e\Big)\, \Big|\hat{J}^{(k)}\big(\{p_e\}_{e \in R(\Lambda)}\big)\Big|\, \Big|\hat{\gamma}^{(k+m)}_0\big(\{p_e\}_{e \in L(\Lambda)}\big)\Big|$ (6.16)

where we introduced the notation $\langle x \rangle = (1 + x^2)^{1/2}$. Because of the singularity of the interaction at zero, we may be faced here with an ultraviolet problem; we have to show that all integrations in (6.16) are finite in the regime of large momenta and large frequency. Because of (6.3), we know that the kernel $\hat{\gamma}^{(k+m)}_0(\{p_e\}_{e \in L(\Lambda)})$ in (6.16) provides decay in the momenta of the leaves. From (6.3) we have, in momentum space,

$\int dp_1 \dots dp_n\, (p_1^2 + 1) \cdots (p_n^2 + 1)\, \hat{\gamma}^{(n)}_0(p_1, \dots, p_n; p_1, \dots, p_n) \le C^n$

for all n ≥ 1. Power counting implies that

$\big|\hat{\gamma}^{(k+m)}_0(\{p_e\}_{e \in L(\Lambda)})\big| \lesssim \prod_{e \in L(\Lambda)} \langle p_e \rangle^{-5/2}$. (6.17)

Using this decay in the momenta of the leaves and the decay of the propagators $\langle \alpha_e - p_e^2 \rangle^{-1}$, e ∈ E(Λ), we can prove the finiteness of all the momentum and frequency integrals in (6.15). Heuristically, this can be seen using a simple power counting argument. Fix κ ≫ 1, and cut off all momenta $|p_e| \ge \kappa$ and all frequencies $|\alpha_e| \ge \kappa^2$. Each $p_e$-integral then scales as $\kappa^3$, and each $\alpha_e$-integral scales as $\kappa^2$. Since we have 2k + 3m edges in Λ, we have 2k + 3m momentum- and frequency-integrations. However, because of the m delta functions (due to momentum and frequency conservation), we effectively only have to perform 2k + 2m momentum- and frequency-integrations.
Therefore the whole integral in (6.15) carries a volume factor of the order $\kappa^{5(2k+2m)} = \kappa^{10k+10m}$. Now, since there are 2k + 2m leaves in the graph Λ, the estimate (6.17) guarantees a decay of the order $\kappa^{-\frac{5}{2}(2k+2m)} = \kappa^{-5k-5m}$. The 2k + 3m propagators, on the other hand, provide a decay of the order $\kappa^{-2(2k+3m)} = \kappa^{-4k-6m}$. Choosing the observable $J^{(k)}$ so that $\hat{J}^{(k)}$ decays sufficiently fast at infinity, we can also gain an additional decay $\kappa^{-6k}$. Since

$\kappa^{10k+10m} \cdot \kappa^{-5k-5m-4k-6m-6k} = \kappa^{-m-5k} \ll 1$

for κ ≫ 1, we can expect (6.15) to converge in the large momentum and large frequency regime. Note the importance of the decay provided by the free evolution (through the propagators); without making use of it, we would not be able to prove the uniqueness of the infinite hierarchy.

This heuristic argument is clearly far from rigorous. To obtain a rigorous proof, we use an integration scheme dictated by the structure of the graph Λ; we start by integrating the momenta and the frequencies of the leaves (for which (6.17) provides sufficient decay). The point here is that when we perform the integrations over the momenta of the leaves we have to propagate the decay to the next edges on the left. We move iteratively from the right to the left of the graph, until we reach the roots; at every step we integrate the frequencies and momenta of the son-edges of a fixed vertex and as a result we obtain decay in the momentum of the father edge. When we reach the roots, we use the decay of the kernel $\hat{J}^{(k)}$ to complete the integration scheme.

[Figure 2: Integration scheme: a typical vertex. Recoverable labels: $\alpha_u, p_u$ and $\alpha_r, p_r$.]

In a typical step, we consider a vertex such as the one drawn in Figure 2 and we assume that we have decay in the momenta of the three son-edges, in the form $|p_e|^{-\lambda}$, e = u, d, w (for some 2 < λ < 5/2). Then we integrate over the frequencies $\alpha_u, \alpha_d, \alpha_w$ and the momenta $p_u, p_d, p_w$ of the son-edges and as a result we obtain a decaying factor $|p_r|^{-\lambda}$ in the momentum of the father edge.
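The heuristic power counting given before the integration scheme is pure bookkeeping of exponents of κ, and it can be checked mechanically. The sketch below is only an illustration under the scaling assumptions stated in the text (κ³ per momentum integral, κ² per frequency integral, κ^(-5/2) per leaf, κ^(-2) per propagator, κ^(-6k) from the observable); the function name is hypothetical.

```python
from fractions import Fraction


def power_counting_exponent(k: int, m: int) -> Fraction:
    """Net exponent of kappa in the heuristic estimate of the integral (6.15)."""
    leaves = 2 * k + 2 * m          # number of leaves of the graph
    edges = 2 * k + 3 * m           # number of edges, one propagator per edge
    free_integrations = edges - m   # the m delta functions remove m (p, alpha) pairs

    volume = 5 * free_integrations          # kappa^3 (momentum) times kappa^2 (frequency)
    leaf_decay = Fraction(-5, 2) * leaves   # decay of the initial density, see (6.17)
    propagator_decay = -2 * edges           # each propagator decays like kappa^(-2)
    observable_decay = -6 * k               # assumed decay of the observable kernel
    return volume + leaf_decay + propagator_decay + observable_decay


if __name__ == "__main__":
    for k, m in [(1, 1), (1, 5), (3, 10)]:
        print(f"k = {k}, m = {m}: kappa exponent = {power_counting_exponent(k, m)}")
```

For every k ≥ 1 and m ≥ 0 the exponent equals −m − 5k, in agreement with the estimate quoted in the text.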
More precisely, for a typical vertex as in Figure 2, we prove bounds of the form

$\int \frac{d\alpha_u\, d\alpha_d\, d\alpha_w\, dp_u\, dp_d\, dp_w}{|p_u|^{\lambda} |p_d|^{\lambda} |p_w|^{\lambda}}\; \frac{\delta(\alpha_r = \alpha_u + \alpha_d - \alpha_w)\, \delta(p_r = p_u + p_d - p_w)}{\langle \alpha_u - p_u^2 \rangle \langle \alpha_d - p_d^2 \rangle \langle \alpha_w - p_w^2 \rangle} \le \frac{\mathrm{const}}{|p_r|^{\lambda}}$. (6.18)

Power counting implies that (6.18) can only be correct if λ > 2. On the other hand, to start the integration scheme we need λ < 5/2 (from (6.17) this is the decay in the momenta of the leaves, obtained from the a-priori estimates). It turns out that, choosing λ = 2 + ε for a sufficiently small ε > 0, (6.18) can be made precise, and the integration scheme can be completed. After integrating all the frequency and momentum variables, from (6.16) we obtain that

$|K_{\Lambda,t}| \le C^{k+m}\, t^{m/4}$

for every $\Lambda \in \mathcal{F}_{m,k}$. Since the number of diagrams in $\mathcal{F}_{m,k}$ is bounded by $C^{k+m}$, it follows immediately that

$\big|\mathrm{Tr}\, J^{(k)} \xi^{(k)}_{m,t}\big| \le C^{k+m}\, t^{m/4}$.

Note that, from (6.12), one may expect $\xi^{(k)}_{m,t}$ to be proportional to $t^m$. The reason why we only get a bound proportional to $t^{m/4}$ is that we effectively use part of the time integration to control the singularity of the potentials.

Note that the only property of $\gamma^{(k+m)}_0$ used in the analysis of (6.15) is the estimate (6.3), which provides the necessary decay in the momenta of the leaves. However, since the a-priori bounds (6.4) hold uniformly in time, we can use a similar argument to bound the contribution arising from the error term $\eta^{(k)}_{n,t}$ in (6.13) (as explained above, $\eta^{(k)}_{n,t}$ can also be expanded analogously to (6.14), with contributions associated to Feynman graphs similar to (6.15); the difference, of course, is that these contributions will depend on $\gamma^{(k+n)}_s$ for all s ∈ [0, t], while (6.15) only depends on the initial data). Thus, we also obtain

$\big|\mathrm{Tr}\, J^{(k)} \eta^{(k)}_{n,t}\big| \le C^{k+n}\, t^{n/4}$. (6.19)

This bound immediately implies the uniqueness. In fact, given two solutions $\Gamma_{1,t} = \{\gamma^{(k)}_{1,t}\}_{k \ge 1}$ and $\Gamma_{2,t} = \{\gamma^{(k)}_{2,t}\}_{k \ge 1}$ of the infinite hierarchy (4.1), both satisfying the a-priori bounds (6.4) and with the same initial data, we can expand both in a Duhamel series of order n as in (6.11).
If we fix k ≥ 1 and consider the difference between $\gamma^{(k)}_{1,t}$ and $\gamma^{(k)}_{2,t}$, all terms (6.12) cancel out because they only depend on the initial data. Therefore, from (6.19), we immediately obtain that, for arbitrary (sufficiently smooth) compact k-particle operators $J^{(k)}$,

$\big|\mathrm{Tr}\, J^{(k)} \big(\gamma^{(k)}_{1,t} - \gamma^{(k)}_{2,t}\big)\big| \le 2\, C^{k+n}\, t^{n/4}$.

Since the left side is independent of n, while the right side tends to zero as n → ∞ whenever $C t^{1/4} < 1$, the left side has to vanish for all $t < 1/C^4$. This proves uniqueness for short times. But then, since the a-priori bounds hold uniformly in time, the argument can be repeated to prove uniqueness for all times.

References
[1] Adami, R.; Golse, F.; Teta, A.: Rigorous derivation of the cubic NLS in dimension one. Preprint: Univ. Texas Math. Physics Archive, www.ma.utexas.edu, No. 05-211.
[2] M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman, and E.A. Cornell, Science 269 (1995), 198.
[3] Bardos, C.; Golse, F.; Mauser, N.: Weak coupling limit of the N-particle Schrödinger equation. Methods Appl. Anal. 7 (2000), 275–293.
[4] K. B. Davis, M.-O. Mewes, M. R. Andrews, N. J. van Druten, D. S. Durfee, D. M. Kurn and W. Ketterle, Phys. Rev. Lett. 75 (1995), 3969.
[5] Erdős, L.; Schlein, B.; Yau, H.-T.: Derivation of the cubic non-linear Schrödinger equation from quantum dynamics of many-body systems. Invent. Math. 167 (2007), no. 3, 515–614.
[6] Erdős, L.; Schlein, B.; Yau, H.-T.: Derivation of the Gross-Pitaevskii equation for the dynamics of Bose-Einstein condensate. To appear in Ann. of Math. Preprint arXiv:math-ph/0606017.
[7] Erdős, L.; Schlein, B.; Yau, H.-T.: Rigorous derivation of the Gross-Pitaevskii equation. Phys. Rev. Lett. 98 (2007), no. 4, 040404.
[8] Erdős, L.; Yau, H.-T.: Derivation of the nonlinear Schrödinger equation from a many body Coulomb system. Adv. Theor. Math. Phys. 5 (2001), no. 6, 1169–1205.
[9] Lieb, E.H.; Seiringer, R.: Proof of Bose-Einstein condensation for dilute trapped gases. Phys. Rev. Lett. 88 (2002), no. 17, 170409.
[10] Lieb, E.H.; Seiringer, R.; Yngvason, J.: Bosons in a trap: a rigorous derivation of the Gross-Pitaevskii energy functional. Phys. Rev. A 61 (2000), no. 4, 043602.
[11] Spohn, H.: Kinetic Equations from Hamiltonian Dynamics. Rev. Mod. Phys. 52 (1980), no. 3, 569–615.

ABSTRACT We report on some recent results concerning the dynamics of Bose-Einstein condensates, obtained in a series of joint papers with L. Erdős and H.-T. Yau. Starting from many body quantum dynamics, we present a rigorous derivation of a cubic nonlinear Schrödinger equation known as the Gross-Pitaevskii equation for the time evolution of the condensate wave function. <|endoftext|><|startoftext|> Introduction.— It has been known since the ground-breaking work of Berry on geometric phases [1] that artificial gauge potentials can be induced if the spatial dynamics of a system that obeys a wave equation is confined in a certain way. For instance, if the internal Hamiltonian of neutral atoms contains an energy barrier but the spin eigenstates are spatially varying, gauge field dynamics can be induced [2]. In the limit of ray optics, moving atomic ensembles could simulate the propagation of light around a black hole or generate topological phase factors of the Aharonov-Bohm type [3], and inhomogeneous dielectric media could generally exhibit geometric effects such as an optical spin-Hall effect and the optical Magnus force [4].

In this paper, we propose to use electromagnetically induced transparency (EIT) to generate an artificial vector potential for the paraxial dynamics of signal photons that simulates quantum dynamics of charged particles in a static electromagnetic field.
Not only the ray of light but also its mode structure is affected, resulting in a paraxial wave equation that is equivalent to the Schrödinger equa- tion for charged particles. Furthermore, the form of the artificial vector potential can be easily controlled through spatial variations in the control fields. We suggest con- figurations that generate homogeneous quasi-electric and magnetic fields as well as a vector potential of Aharonov- Bohm type. Although the treatment in this paper is based on EIT, the effect presented here is more general: it will occur in any medium that supports a set of discrete eigenmodes for a propagating signal fields with different indices of refraction. If the parameters governing these eigenmodes vary in space, the signal modes will adiabatically follow, acquiring geometric phases that affect their paraxial dy- namics. Review of EIT with multi-Λ atoms.— The effect takes place in an atomic multi-Λ system, in which two ground states are coupled to Q excited states by Q pairs of con- trol (Ωq) and signal (âq) fields (Fig. 1). An experimen- tally relevant example of such system is the fundamental D1 transition in atomic rubidium, where both the ground and excited levels are split into two hyperfine sublevels [5]. We assume that the detunings are small so each sig- nal field âq interacts only with the respective transition |B〉 ↔ |Aq〉 with the associated atomic operator σ̂B,Aq and vacuum Rabi frequency gq. In this case, the parax- ial wave equation for each signal mode can be cast into the form âq = iNgqσ̂B,Aq , (1) where the wave propagates along the z axis, ∆⊥ = ∂ ∂2y , N is the number of atoms and k is the wavevector which we assume approximately independent of q. In Ref. 
[6] we have constructed a unitary transformation âq = Wqs b̂s (2) that maps the original field modes aq to a new set of modes b̂q, such that one and only one of the new modes, b̂Q = R∗q âq, (3) (where Rq ≡ Ωq/(gqΩ⊥) and Ω⊥ ≡ q=1 |Ωq/gq|2 de- pend on the control fields) couples only to an atomic dark state and experiences EIT [6, 7, 8]. All other superposi- tions of field modes are absorbed. This transformation is given explicitly by Wqq′ = γwqw q′−δqq′ , with γ = RQ+1 and wq = γ −1(δQq +Rq). The EIT mode b̂Q interacts with the multi-Λ atoms in the same fashion as does the signal field in a regu- lar 3-level system. While propagating through the EIT medium, it gives rise to a dark-state polariton associ- ated with zero interaction energy [9]. All other modes couple to atomic states whose energy levels are Stark- shifted by the interaction with either the pump field or the other signal modes b̂q (q 6= Q). The resulting energy gap guarantees that, if the amplitudes and phases of the control fields are slowly changed, the composition of the dark-state polariton, and hence the EIT mode b̂Q, will adiabatically follow. It has been proposed [6] and ex- perimentally demonstrated [5] that a variation in time of the control fields can therefore be used to adiabatically http://arxiv.org/abs/0704.0814v2 FIG. 1: Multi Λ-system: Q excited states |Aq〉 are each cou- pled by a classical control field Ωq to the ground state |C〉 and by a quantized field âq with detuning δ to state |B〉. transfer optical states between signal modes. In this pa- per, we focus on spatial propagation of the EIT mode under control fields that are constant in time, but varied in space. Derivation of the gauge potential.— We proceed by ex- pressing Eq. (1) in terms of the new signal modes b̂q. Employing the vector notation ~a = {â1, · · · , âQ} and ~σB,A = {g1σ̂B,A1 , · · · , gQσ̂B,AQ} we get b = iN~σB,A. (4) Throughout the paper, the double arrow denotes a Q × Q matrix. 
Because W depends on space and time, the differential operators have to be applied to both W and b. As a result, transformation (2) brings about additional terms into the equation of motion, that can be written in form of a minimal coupling scheme by introducing the Hermitian gauge field i ≡ i W †∂i W, (5) where i = t, x, y, z. We multiply both sides of Eq. (4) by W † and exploit the unitarity of W to show that ∂i W † = W †(∂i W † from which it follows that − W †∂2i i + i∂iA i. The dynamic equation for the b̂ modes can then be written as i∂t +A b = − ic∂z + cA b (6) (−i∇⊥ −A 2~̂b− W †N~σBA with ∇⊥ = (∂x, ∂y). This equation has the structure of a 2+2 dimensional field theory with minimal coupling. Under the assumption that the control fields do not depend on t and z we can make a temporal Fourier trans- formation of the slowly varying amplitudes, which results in the paraxial wave equation b(δ) = (−i∇⊥ −A ~σBA(δ). (7) The gauge potential is given explicitly by ⊥ = i R∗q(∇⊥Rq)~w~w† − iγ(∇⊥ ~w)~w† + iγ∗ ~w∇⊥ ~w† . The full matrix A ⊥ is a pure gauge: it has emerged solely as a consequence of the unitary transformation (2), which reflects our choice to describe the system in terms of the new modes b̂q rather than the original modes âq. How- ever, this choice is motivated by the fact that the EIT mode b̂Q is the only mode that is not absorbed. Ab- sorption of other modes b̂q (with q 6= Q) means that the index of refraction for these modes has a significant imaginary part. This separates the EIT mode b̂Q from other b-modes and ensures that it will adiabatically follow variations of the control fields. Therefore, when analyz- ing the evolution of b̂Q, we can neglect the off-diagonal terms in the matrix (−i∇⊥ −A 2 in Eq. (7) and write i∂z b̂Q(δ) = −( ~σBA)Q(δ)− b̂Q(δ) (9) (−i∇⊥ −A⊥)Qq(−i∇⊥ −A⊥)qQb̂Q(δ). 
This equation does not include the whole matrix A Consequently, this potential no longer acts like a pure gauge but attains physical significance in determining the spatial dynamics of the EIT mode. The first term on the right-hand side of Eq. (9), re- sponsible for the interaction of the light field with the EIT medium, takes the same form as the susceptibility of EIT in a single Λ-system. Neglecting decoherence, we can write it as [6] ( W † N ~σBA)Q(δ) = b̂Q, with the EIT group velocity vEIT = cΩ 2/(Ng2). Note that vEIT depends on the spatial position because Ω does. This transforms Eq. (9) to i∂z b̂Q = (−i∇⊥ −AQQ)2 − b̂Q (10) AQQ = i R∗q∇⊥Rq = − |Rq|2∇⊥Arg(Rq), |(A⊥)Qq|2 = −A2QQ + |∇⊥Rq|2 (11) being, respectively, the “quasi-vector” and “quasi-scalar” potentials. We see that the paraxial spatial evolution of the EIT signal mode is governed by the equation that is identi- cal (up to coefficients) to the Schrödinger equation of a charged particle in an electromagnetic field. This is the main result of this work. By arranging the control field in a certain configuration, one can control the spatial prop- agation of the signal mode through the EIT medium. Some steering of the EIT mode is possible even in a single-Λ system by affecting the term δ/vEIT in Eq. (10), which results in nonuniform refraction for this mode [10, 11]. The action of quasi-gauge fields (11) is fun- damentally different: deflection of the signal field occurs not due to refraction (the refraction index on resonance is 1), but due to adiabatic following. The case of two control fields: homogeneous electric and magnetic quasi-fields.— Of particular practical im- portance is the simplest non-trivial case with Q = 2. We parametrize the control fields by writing R1,2 = 1/2±Rei(φ±θ). The corresponding Rabi frequencies are then Ωi = h(x, y) giRi, with h(x, y) being an arbi- trary common prefactor. This parametrization yields the gauge potentials AQQ = −∇⊥φ− 2R∇⊥θ; (12) (∇⊥R)2 1− 4R2 + (∇⊥θ) 2(1 − 4R2). 
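The step from the general potentials (11) to the two-field expressions (12) can be checked numerically. The sketch below is our own illustration, not part of the original derivation. It assumes the reading $A_{QQ} = i \sum_q R_q^* \nabla_\perp R_q$ and $\Phi = \sum_q |\nabla_\perp R_q|^2 - A_{QQ}^2$ of Eq. (11), the parametrization $R_{1,2} = \sqrt{1/2 \pm R}\, e^{i(\phi \pm \theta)}$ (the square root is what the normalization $\sum_q |R_q|^2 = 1$ requires), and it replaces $\nabla_\perp$ by a derivative along a single transverse coordinate x. All function names and the test profiles are hypothetical.

```python
import cmath
import math


def control_ratios(x, R, theta, phi):
    """Normalized control-field ratios (R_1, R_2) at position x for Q = 2."""
    r, th, ph = R(x), theta(x), phi(x)
    r1 = math.sqrt(0.5 + r) * cmath.exp(1j * (ph + th))
    r2 = math.sqrt(0.5 - r) * cmath.exp(1j * (ph - th))
    return r1, r2


def derivative(f, x, h=1e-5):
    """Central finite difference of a real function of one variable."""
    return (f(x + h) - f(x - h)) / (2 * h)


def potentials_from_definition(x, R, theta, phi, h=1e-5):
    """A_QQ and Phi evaluated directly from the assumed form of Eq. (11)."""
    r1, r2 = control_ratios(x, R, theta, phi)
    r1p, r2p = control_ratios(x + h, R, theta, phi)
    r1m, r2m = control_ratios(x - h, R, theta, phi)
    d1 = (r1p - r1m) / (2 * h)
    d2 = (r2p - r2m) / (2 * h)
    a_qq = (1j * (r1.conjugate() * d1 + r2.conjugate() * d2)).real
    phi_scalar = abs(d1) ** 2 + abs(d2) ** 2 - a_qq ** 2
    return a_qq, phi_scalar


def potentials_closed_form(x, R, theta, phi):
    """A_QQ and Phi from the closed-form expressions (12)."""
    r = R(x)
    d_r, d_theta, d_phi = derivative(R, x), derivative(theta, x), derivative(phi, x)
    a_qq = -d_phi - 2 * r * d_theta
    phi_scalar = d_r ** 2 / (1 - 4 * r ** 2) + d_theta ** 2 * (1 - 4 * r ** 2)
    return a_qq, phi_scalar


if __name__ == "__main__":
    # Arbitrary smooth test profiles with |R| < 1/2 (illustrative only).
    R = lambda x: 0.3 * math.sin(x)
    theta = lambda x: x * x
    phi = lambda x: 0.5 * x
    print(potentials_from_definition(0.7, R, theta, phi))
    print(potentials_closed_form(0.7, R, theta, phi))
```

Under these assumptions the two evaluations agree up to finite-difference error, and the common phase φ enters only through the term $-\nabla_\perp \phi$ in $A_{QQ}$.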
Similarly to usual electrodynamics we can use a gauge transformation [12], A′QQ = AQQ + ∇⊥f , to eliminate the term ∇⊥φ from Eq. (12). The common phase φ of the control fields therefore does not contribute and can be set to zero. A simple way to generate a term that corresponds to a one-dimensional scalar potential V (x) for a Schrödinger particle is to choose R = 0 and θ = 2kV (x′) This choice of control fields leads to AQQ = 0 and Φ = 2kV (x). For the special case of a constant electric quasi-field along the x axis, V (x) = −Fx and subsequently θ = − 4kF |x|3/3, (13) where x < 0 is assumed for the region of interest. A res- onant (δ = 0) Gaussian solution to Eq. (10) is displayed in Fig. 2(a). The center of the Gaussian beam is shifted by an amount xctr = Fz 2/2k, which is equivalent to the motion of a charged particle in a constant electric field. The control field phase profile (13) can be implemented using, for example, a phase plate. The assumption that the control fields do not depend on z implies that the Fresnel number for these fields must be above 1, i.e. that the characteristic transverse distance over which these fields significantly change must be larger than ∼ where L is the EIT cell length. This imposes a limi- tation on the magnitude of the electric quasi-field: from Eq. (13) we find F <∼ λ−1/2L−3/2 and thus xctr <∼ Assuming that the signal field also has a Fresnel number of at least 1, and thus satisfies 2zR >∼ L (with Rayleigh length zR = kw 2/2, w being the signal beam width at the cell entrance), we find that in a realistic experiment, the maximum possible signal beam displacement due to the quasi-electric field is on the order of the signal beam width w. To generate a homogeneous magnetic quasi-field along the z-axis the quantity B = ∇ × AQQ = 2∇⊥θ × ∇⊥R should be constant. However, it seems difficult to si- multaneously achieve a vanishing electric quasi-field E = −∇⊥Φ. 
A choice that minimizes the electric quasi- field around the origin is given by θ = B/2x and B/2 y. The quasi-potentials then become AQQ = −B y ex, which corresponds to the Landau gauge in stan- dard electrodynamics, and Φ = B + 2B3y4 + O(y6). If Φ is neglected, a Gaussian solution to the paraxial wave equation is given by bQ = N cscu(z) exp cotu(z)∆x2 +∆x · pc ∆x∆y − 1 pc,xpc,y + ycpc,y where we have set ∆x ≡ (x − xc, y − yc), u(z) ≡ Bz/(2k) − i tanh−1(2η) and η ≡ Bw2/4. Here xc = x0 + (k/B)(x 0 sin(Bz/k) + x̃ 0(1 − cos(Bz/k)) denotes the classical spiral trajectory of a charged particle in a magnetic field, with x′c = dxc/dz, initial position x0 and initial velocity x′0. For convenience we also have defined x̃′0 = (y 0,−x′0) and the classical canonical momentum pc. We remark that pc,x is a constant of motion. The evolution of the signal mode is displayed in Fig. 2(b). A surprising feature of solution (14) is that the diffrac- tive divergence of the signal beam is reduced: the width squared of the Gaussian, Re(iB cotu) 1 + 4η2 − (1 − 4η2) cos(Bz varies periodically with z instead of monotonically in- creasing. This effect is known for electron wavepackets [13] and can be understood as a consequence of the cir- cular motion of particles in a magnetic field: instead of dispersing, two-dimensional particles in a magnetic field will simply move on circles of different size (depending on their velocity), but with the same angular velocity. The particle cloud will therefore not spread but “breathe”. It remains to show that non-adiabatic coupling to other modes can be suppressed for realistic experimental pa- rameters. This is the case if the strength of the gauge field terms coupling bQ to other modes, which for the quasi-magnetic field are of the order B/(2k), are much smaller than the difference in the respective linear sus- ceptibilities χ1. For the EIT mode bQ, χ1 = δ/vEIT with vEIT defined above Eq. 
(10); for the other modes it can be approximated by the susceptibility of a two-level medium, $\chi_1 = -4Ng^2(\delta - i\gamma/2)/(c\gamma^2)$. Evaluating this relation at resonance leads to the condition $\eta \ll (kw)^2 n\, 3\pi/2$, with $n \equiv N/(V k^3)$ being the number of atoms in the volume $k^{-3}$, which can easily be fulfilled in an experiment.

FIG. 2: Paraxial propagation of a signal beam over twice the Rayleigh length in the presence (solid) and absence (grey) of a constant (a) electric field along the x axis and (b) magnetic field along z. The dashed line represents the center of the grey beam. The effect of the fields is somewhat exaggerated.

Aharonov-Bohm potential for photons.— One of the most intriguing phenomena of charged quantum particles in electromagnetic fields is the Aharonov-Bohm (AB) effect [14]. Its two astonishing features are (i) a phase shift induced by the vector potential in a region in which electric and magnetic fields are absent, and (ii) its topological nature: the phase shift does not depend on the particle trajectory as long as it encloses a magnetic flux. Because (unlike genuine electromagnetism) the potential (5) is a differential function of the control fields, it is impossible to simulate feature (i) with quasi-charged photons. However, we will show here that a mathematically equivalent topological phase shift does exist for the optical case.

To generate an AB potential for photons we propose to use two counter-rotating Laguerre-Gaussian control fields, i.e., fields that possess an orbital angular momentum. If these control fields are spatially wider than the signal fields, the corresponding Rabi frequencies can be approximated in cylindrical coordinates (r, ϕ) by $\Omega_1 = g_1 s_1 r e^{i\varphi}$ and $\Omega_2 = g_2 s_2 r e^{-i\varphi}$. The gauge potentials (12) then become $A_{QQ} = -(2R/r)\, \vec{e}_\varphi$ and $\Phi = (1 - 4R^2)/r^2$, with $R = \frac{1}{2}(|s_1|^2 - |s_2|^2)/(|s_1|^2 + |s_2|^2)$. The potential $A_{QQ}$ corresponds exactly to an Aharonov-Bohm potential for charged particles as it is created by a solenoid.
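The topological character of this potential can be illustrated with a short numerical sketch: the circulation of $A_{QQ} = -(2R/r)\, \vec{e}_\varphi$ around any circle enclosing the origin equals $-4\pi R$ regardless of the radius, while it vanishes for a loop that does not enclose the origin. The code below is a hedged illustration with hypothetical function names; it uses a simple midpoint rule on circular loops only.

```python
import math


def vector_potential(x, y, R):
    """Cartesian components of A = -(2R/r) e_phi, with e_phi = (-y, x)/r."""
    r2 = x * x + y * y
    return 2.0 * R * y / r2, -2.0 * R * x / r2


def circulation(cx, cy, radius, R, steps=20000):
    """Midpoint-rule line integral of A along a counter-clockwise circle."""
    total = 0.0
    dt = 2.0 * math.pi / steps
    for i in range(steps):
        t = (i + 0.5) * dt
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        dx = -radius * math.sin(t) * dt
        dy = radius * math.cos(t) * dt
        ax, ay = vector_potential(x, y, R)
        total += ax * dx + ay * dy
    return total


if __name__ == "__main__":
    R = 0.25
    for radius in (0.5, 1.0, 5.0):
        print(f"loop around the origin, radius {radius}: {circulation(0.0, 0.0, radius, R):+.6f}")
    print(f"loop away from the origin: {circulation(3.0, 0.0, 1.0, R):+.6f}")
    print(f"expected enclosed value -4*pi*R: {-4.0 * math.pi * R:+.6f}")
```

The radius independence of the enclosed value is the analogue of the flux-dependent, path-independent Aharonov-Bohm phase discussed above.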
Solutions of the paraxial wave equation (10) can be found in cylindrical coordinates by expanding the field mode as bQ = r m∈ZZ Bm(z, r) exp(imϕ). Because of Ω ∼ r, the EIT group velocity can be written as vEIT = ṽ r2 with ṽ ≡ c |s1|2 + |s2|2/N . Exact solutions are given by Bessel functions, Bm = e −iκ2z/(2k) κrJν(κr) + βm κrYν(κr) with ν = 1 +m2 + 4Rm− 2kṽδ. For monochromatic signal fields this corresponds to a rotation of the trans- verse mode structure. For R = ±1/2 the potential trans- fers a unit amount of angular momentum to the signal light, but generally the amount can vary continuously be- tween −h̄ and h̄. Signal photons in the EIT mode there- fore form a two-dimensional bosonic quantum system in an Aharonov-Bohm potential. Conclusion.— We showed that EIT in a multi-Λ sys- tem can be used to generate a variety of geometric ef- fects on propagating signal pulses that mimic the be- havior of a charged particle in an electromagnetic field. We found specific arrangements of two spatially inhomo- geneous pump fields in a double-Λ system which gener- ate quasi-gauge potentials which correspond to constant electric and magnetic fields. Furthermore topological ef- fects like the Aharonov-Bohm phase shift can be induced. The latter is significantly different from the proposal of Ref. [3] in that it is based on spatially inhomogeneous pump fields rather than the Doppler effect in moving me- This paper investigated EIT in systems with two ground levels. In such a system, there is only one EIT mode, which results in an Abelian U(1) gauge theory, making the physics analogous to electromagnetism. By extending to multiple ground levels, it may be possible to obtain multiple EIT modes and model non-Abelian gauge potentials. This will be explored in a future publication. We thank David Feder and Alexis Morris for fruit- ful discussions. This work was supported by iCORE, NSERC, CIAR, QuantumWorks and CFI. [1] M. V. Berry, Proc. R. Soc. Lond. A 392, 45 (1984). [2] R. 
Dum and M. Olshanii, Phys. Rev. Lett. 76, 1788 (1996); J. Ruseckas et al., Phys. Rev. Lett. 95, 010404 (2005); K. Osterloh et al., Phys. Rev. Lett. 95, 010403 (2005).
[3] U. Leonhardt and P. Piwnicki, Phys. Rev. A 60, 4301 (1999).
[4] S. Murakami, N. Nagaosa, and S.-C. Zhang, Science 301, 1348 (2003); M. Onoda, S. Murakami, and N. Nagaosa, Phys. Rev. E 74, 066610 (2006); K. Y. Bliokh and Y. P. Bliokh, Phys. Rev. Lett. 96, 073903 (2006); K. Bliokh, Phys. Rev. Lett. 97, 043901 (2006); C. Duval, Z. Horvath, and P. Horvathy, J. Geom. Phys. 57, 925 (2007); C. Duval, Z. Horvathy, and P. A. Horvathy, Phys. Rev. D 74, 021701 (2006); S. Raghu and F. D. M. Haldane, cond-mat/0602501.
[5] F. Vewinger et al., quant-ph/0611181.
[6] J. Appel, K.-P. Marzlin, and A. I. Lvovsky, Phys. Rev. A 73, 013804 (2006).
[7] X.-J. Liu, H. Jing, and M.-L. Ge, Eur. Phys. J. D 40, 297 (2006); see also quant-ph/0403171.
[8] S. A. Moiseev and B. S. Ham, Phys. Rev. A 73, 033812 (2006).
[9] M. Fleischhauer and M. D. Lukin, Phys. Rev. A 65, 022314 (2002).
[10] A. G. Truscott et al., Phys. Rev. Lett. 82, 1438 (1999); R. Kapoor and G. S. Agarwal, Phys. Rev. A 61, 053818 (2000).
[11] L. Karpa and M. Weitz, Nature Phys. 2, 332 (2006).
[12] Note that this gauge transformation acts on the EIT mode $\hat{b}_Q$ only and is therefore different from the gauge transformation discussed above.
[13] H. Takagi, M. Ishida, and N. Sawaki, Jpn. J. Appl. Phys. 40, 1973 (2001).
[14] Y. Aharonov and D. Bohm, Phys. Rev. 115, 485 (1959).

ABSTRACT The Schrödinger motion of a charged quantum particle in an electromagnetic potential can be simulated by the paraxial dynamics of photons propagating through a spatially inhomogeneous medium. The inhomogeneity induces geometric effects that generate an artificial vector potential to which signal photons are coupled.
This phenomenon can be implemented with slow light propagating through a gas of double-Lambda atoms in an electromagnetically-induced transparency setting with spatially varied control fields. It can lead to a reduced dispersion of signal photons and a topological phase shift of Aharonov-Bohm type. <|endoftext|><|startoftext|> Introduction The engineering of quantum states of light fields and oscillators has become an interesting topic in recent years, due to its applications in: (i) fundamentals of quantum mechanics (preparation of Schrödinger-cat states [1], their superposition [2] and measurement of their decoherence [3], etc.); (ii) determination of certain properties of a system (phase distribution P(θ) [4], Wigner [5] and Husimi [6] functions, etc.); (iii) proposals for practical applications (quantum lithography [7], quantum communication [8], e.g., via hole-burning in Fock space [9], quantum teleportation [10], etc.). However, a difficult situation appears when one wants to prepare a state of a system offering hard access [11]. In this case the difficulty may be circumvented by coupling the system having hard access to a second system offering easy access, in which a desired state is prepared with subsequent transfer to the first one. The success of this operation depends on the model-Hamiltonian and on the initial state describing the whole system.

∗ Corresponding author, e-mail: sbd@cbpf.br

Although the problem of two interacting harmonic oscillators has been exhaustively studied in the literature, the discussion about exchange of nonclassical states between them is scarce. The coupled quantum oscillation problem was considered earlier in [12, 13, 14], where the authors of those papers were interested only in the energy of the system.
Later on, in Ref [15] a full exchange between quantum two-mode harmonic oscillators was presented, however the issue was only concerned with the particular transfer of coherent states. In Ref. [16] we have studied the transfer of certain properties (statistics and squeezing) and in Ref. [17] we have studied the transfer of the most relevant part of the state of a sub-system to another, through the simultaneous transfer of the number and phase distributions, Pn and P (θ) 1 [17]; the solutions were found numerically since the models were not exactly soluble. In the present work we employ a distinct model-Hamiltonian, allowing us to treat the problem analytically permitting us to analyze the transfer of generic states. We show in which way one can get exact exchange of the states between two interacting sub-systems. Exchange of states means simultaneous transfer of states in two opposite directions ; so, it is more significant than the transfer of states in one direction as studied in [17]. In the present case the transfer of a state from the “easy-oscillator” to the “hard-oscillator” is observed by simply monitoring the state of the easy-oscillator during the time evolution of the whole system. For brevity, hereafter the easy- and the hard-oscillator will be referred to as O1 and O2, respectively. The Sect. II introduces the model-Hamiltonian allowing us to obtain the evolution operator for this coupled system. In the Sect. III we consider differ- ent types of initial states describing the entire system to study the mentioned effect between the O1 and the O2 ( Sub-Sects. (A), (B),and (C) ), includ- ing superpositions of states representing the qubits |0〉 and |1〉. The Sect. IV contains the comments and conclusion. 
2 Model-Hamiltonian: evolution operator

We start from the Hamiltonian

$$H/\hbar = \omega_1 a_1^{+} a_1 + \omega_2 a_2^{+} a_2 + \lambda \left( a_1^{+} a_2 + a_1 a_2^{+} \right), \qquad (1)$$

where $a_i^{+}$ ($a_i$) stands for the raising (lowering) operator of the i-th oscillator, i = 1, 2; ωi and λ are real parameters standing for the i-th oscillator frequency and the coupling constant, respectively. The equations of motion for the operators a1(t) and a2(t) can be solved analytically,

$$a_1(t) = \left( c^2 e^{-i\omega_1' t} + s^2 e^{-i\omega_2' t} \right) a_1(0) + c s \left( e^{-i\omega_1' t} - e^{-i\omega_2' t} \right) a_2(0), \qquad (2)$$
$$a_2(t) = \left( c^2 e^{-i\omega_2' t} + s^2 e^{-i\omega_1' t} \right) a_2(0) + c s \left( e^{-i\omega_1' t} - e^{-i\omega_2' t} \right) a_1(0),$$

where

$$\omega_1' = \omega_1 + \lambda \frac{s}{c}, \qquad \omega_2' = \omega_2 - \lambda \frac{s}{c}, \qquad (3)$$
$$c = \left[ \frac{\sqrt{x^2+1} + x}{2\sqrt{x^2+1}} \right]^{1/2}, \qquad s = \left[ \frac{\sqrt{x^2+1} - x}{2\sqrt{x^2+1}} \right]^{1/2}, \qquad (4)$$
$$x = \frac{\omega_1 - \omega_2}{2\lambda}. \qquad (5)$$

The parameters s and c satisfy the condition $c^2 + s^2 = 1$ and define the auxiliary operators

$$a_1' = c\, a_1 + s\, a_2, \qquad a_2' = -s\, a_1 + c\, a_2, \qquad (6)$$

which decouple the above Hamiltonian. The following relations also hold:

$$\omega_1' + \omega_2' = \omega_1 + \omega_2, \qquad \omega_1' - \omega_2' = \frac{\lambda}{s c}. \qquad (7)$$

It is convenient for our purposes to find the time-dependent state vector or density operator in the Schrödinger picture. One formal prescription is to work with the Wigner representation of the state and obtain the time-dependent density operator from the Wigner function [19], for which the time evolution is easily obtained. However, it is a hard task to restore analytical or numerical values for the density matrix ρ(t) in the Fock basis from the time-dependent Wigner function. To overcome this difficulty we will show that for the Hamiltonian given by Eq. (1) there is an analytical expression for the evolution operator U(t), which defines the solution of the Schrödinger equation, allowing us to get directly the matrix ρ(t) in the Fock basis. This kind of approach was already used in Ref. [18], but only treating the system in the resonant case (ω1 = ω2).

¹ Since the number and phase are canonically conjugate operators they are complementary, in the sense that simultaneous transfer of number and phase distributions, Pn and P(θ), concerns the transfer of the major part of the state describing a system.
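The decoupling described above can be checked numerically. The following is a minimal sketch, assuming the rotation parameters take the form c² = (√(x²+1) + x)/(2√(x²+1)), s² = 1 − c² with x = (ω1 − ω2)/(2λ); this explicit form is an assumption consistent with Eqs. (6) and (7), and the function names `decouple` and `cross_term` are hypothetical. It verifies that the mixed term a1'⁺a2' vanishes after the rotation and that the shifted frequencies satisfy the sum and difference relations.

```python
import math


def decouple(omega1, omega2, lam):
    """Return (c, s, omega1p, omega2p) for the rotation a1' = c a1 + s a2.

    Assumes x = (omega1 - omega2) / (2 lam), which is one way to satisfy
    the decoupling condition; lam must be nonzero.
    """
    x = (omega1 - omega2) / (2.0 * lam)
    root = math.sqrt(x * x + 1.0)
    c = math.sqrt((root + x) / (2.0 * root))
    s = math.sqrt((root - x) / (2.0 * root))
    omega1p = omega1 + lam * s / c
    omega2p = omega2 - lam * s / c
    return c, s, omega1p, omega2p


def cross_term(omega1, omega2, lam, c, s):
    """Coefficient of a1'^+ a2' after writing H in terms of a1', a2'."""
    return (omega2 - omega1) * c * s + lam * (c * c - s * s)


if __name__ == "__main__":
    c, s, w1p, w2p = decouple(1.3, 0.8, 0.2)
    print("c, s =", c, s)
    print("cross term =", cross_term(1.3, 0.8, 0.2, c, s))
    print("w1' - w2' =", w1p - w2p, " lam/(s c) =", 0.2 / (s * c))
```

At resonance (ω1 = ω2) the sketch gives c = s = 1/√2, the case in which the product sc is largest.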
In [18] the authors studied the transfer of state starting from the particular one-photon state. Our results permit one to obtain an analytical expression for the matrix elements of U(t) for the Hamiltonian (1), not restricted to the resonant case and permitting easy application to a generic initial state. Consequently, the problem of transfer of states can be more comfortably discussed using the present results.

To obtain the operator U(t), we define the (auxiliary) unitary operator Us which is associated to a rotation and decouples the Hamiltonian,

$$U_s^{-1} a_i U_s = a_i'. \qquad (8)$$

We have

$$U_s^{-1} = U_{-s}, \qquad (9)$$

in view of the reverse transformation

$$a_1 = c\, a_1' - s\, a_2', \qquad a_2 = s\, a_1' + c\, a_2'. \qquad (10)$$

We denote by {|n1, n2〉0} the Fock basis, eigenvectors of the (old) number operators $N_i = a_i^{+} a_i$, whereas {|n1, n2〉s} is the same for the (new) number operators $N_i(s) = a_i'^{+} a_i'$. We have

$$U_s |n_1, n_2\rangle_s = |n_1, n_2\rangle_0, \qquad |n_1, n_2\rangle_s = U_{-s} |n_1, n_2\rangle_0. \qquad (11)$$

If we represent Us in the Fock basis {|n1, n2〉0}, we obtain

$$(U_s)^{n_1,n_2}_{m_1,m_2} = {}_0\langle n_1, n_2| U_s |m_1, m_2\rangle_0 = {}_s\langle n_1, n_2 | m_1, m_2\rangle_0. \qquad (12)$$

Next, to reconstruct the operator Us in the Fock basis, we start from

$${}_s\langle n_1, n_2| a_1' |m_1, m_2\rangle_0 = {}_s\langle n_1, n_2| (c\, a_1 + s\, a_2) |m_1, m_2\rangle_0. \qquad (13)$$

Since the operators $a_i'$ act on the basis {|n1, n2〉s} whereas the $a_i$ act on the basis {|n1, n2〉0}, we get

$$\sqrt{n_1+1}\; {}_s\langle n_1+1, n_2 | m_1, m_2\rangle_0 = c \sqrt{m_1}\; {}_s\langle n_1, n_2 | m_1-1, m_2\rangle_0 + s \sqrt{m_2}\; {}_s\langle n_1, n_2 | m_1, m_2-1\rangle_0, \qquad (14)$$

which, after using Eq. (12), leads to

$$(U_s)^{n_1,n_2}_{m_1,m_2} = c \sqrt{\frac{m_1}{n_1}}\, (U_s)^{n_1-1,n_2}_{m_1-1,m_2} + s \sqrt{\frac{m_2}{n_1}}\, (U_s)^{n_1-1,n_2}_{m_1,m_2-1}, \qquad (15)$$

and similarly, repeating the procedure for the operator $a_2'$, we find

$$(U_s)^{n_1,n_2}_{m_1,m_2} = -s \sqrt{\frac{m_1}{n_2}}\, (U_s)^{n_1,n_2-1}_{m_1-1,m_2} + c \sqrt{\frac{m_2}{n_2}}\, (U_s)^{n_1,n_2-1}_{m_1,m_2-1}. \qquad (16)$$

Using Eqs. (15) and (16) plus the unitarity condition $U_s^{\dagger} U_s = U_s U_s^{\dagger} = 1$ we obtain, after a lengthy calculation, the expression

$$(U_s)^{n_1,n_2}_{m_1,m_2} = \delta_{n_1+n_2,\, m_1+m_2} \sqrt{\frac{n_1!\, n_2!}{m_1!\, m_2!}}\, (-1)^{n_2} c^{m_1-n_2} s^{m_2+n_2} \sum_{k=\max(0,\, m_2-n_1)}^{\min(n_2,\, m_2)} (-1)^{-k} \binom{m_2}{k} \binom{m_1}{n_2-k} \left( \frac{c}{s} \right)^{2k}, \qquad (17)$$

$$(U_{-s})^{n_1,n_2}_{m_1,m_2} = (-1)^{m_2-n_2}\, (U_s)^{n_1,n_2}_{m_1,m_2}.$$
(18)

The time evolution operator U(t) may be written in the basis {|n1, n2〉s} as

$$U(t) = \sum_{k_1,k_2} |k_1, k_2\rangle_s\, e^{-i(k_1\omega_1' + k_2\omega_2')t}\; {}_s\langle k_1, k_2|, \qquad (19)$$

since H is diagonal in this basis. Finally, from Eqs. (12) and (19) we obtain the expression

$$U(t)^{n_1,n_2}_{m_1,m_2} = \sum_{k_1,k_2} e^{-i(k_1\omega_1' + k_2\omega_2')t}\, (U_{-s})^{n_1,n_2}_{k_1,k_2}\, (U_{-s})^{m_1,m_2}_{k_1,k_2}, \qquad (20)$$

restricted to n1 + n2 = k1 + k2 = m1 + m2, whereas $U(t)^{n_1,n_2}_{m_1,m_2} = 0$ otherwise.

The evolution operator obtained in Eq. (20) allows us to study the time evolution of the whole state describing our bipartite system composed of coupled oscillators, represented by the Hamiltonian in Eq. (1). In the next section we will study the exchange of states between these oscillators and, as a natural assumption, we will suppose O2 initially in its ground state |0〉. O1 is assumed to be previously prepared in various initial states, firstly starting from an arbitrary state |φ〉.

3 Exchange of generic state

Let us consider that the whole (bipartite) system is initially in the state

$$|\Psi(0)\rangle = |\phi\rangle \otimes |0\rangle, \qquad (21)$$

whose components in the Fock basis are given by

$$|\Psi(0)\rangle = \sum_{n} C_{n,0}(0)\, |n, 0\rangle, \qquad (22)$$

since $C_{n_1,n_2}(0) = 0$ for n2 ≠ 0. In the Schrödinger representation, the coefficients $C_{n_1,n_2}(t)$ are obtained from $C_{n_1,n_2}(t) = \langle n_1, n_2| U(t) |\Psi(0)\rangle$, which, using Eq. (22) and the constraint n1 + n2 = n, results in the form

$$C_{n_1,n_2}(t) = C_{n_1+n_2,0}(0)\, U(t)^{n_1,n_2}_{n_1+n_2,0}. \qquad (23)$$

In particular, we have that

$$C_{n,0}(t) = C_{n,0}(0)\, U(t)^{n,0}_{n,0}, \qquad (24)$$
$$C_{0,n}(t) = C_{n,0}(0)\, U(t)^{0,n}_{n,0}. \qquad (25)$$

The exchange of states between the oscillators will occur after an instant τ, when $C_{0,n}(\tau) = C_{n,0}(0)$ and

$$|\Psi(\tau)\rangle = \sum_{n} C_{0,n}(\tau)\, |0, n\rangle, \qquad (26)$$

or, |Ψ(τ)〉 = |0〉 ⊗ |φ〉. This shows that exchange of states allows us to verify the transfer of states to O2 by monitoring the time evolution of O1. From Eqs. (17) and (18) we have

$$(U_s)^{n,0}_{n-l,l} = \sqrt{\frac{n!}{(n-l)!\, l!}}\; c^{n-l} s^{l}, \qquad (27)$$
$$(U_s)^{0,n}_{n-l,l} = \sqrt{\frac{n!}{(n-l)!\, l!}}\; (-1)^{n-l} c^{l} s^{n-l}. \qquad (28)$$

The substitution of Eqs. (27) and (28) in Eq. (20) results in

$$U(t)^{0,n}_{n,0} = (-1)^{n} \sum_{l=0}^{n} \frac{n!}{(n-l)!\, l!}\, (-1)^{n-l} c^{n} s^{n}\, e^{-i(n-l)\omega_1' t}\, e^{-i l \omega_2' t}, \qquad (29)$$

where we recognize Newton's binomial expression,

$$U(t)^{0,n}_{n,0} = (-1)^{n} \left[ c s \left( e^{-i\omega_2' t} - e^{-i\omega_1' t} \right) \right]^{n}, \qquad (30)$$

or, replacing the auxiliary parameters $\omega_1'$, $\omega_2'$ by ω1, ω2 and λ (cf. Eq. (7)),

$$U(t)^{0,n}_{n,0} = \left[ e^{-i\frac{\omega_1+\omega_2}{2} t} \left( -2 i\, s c \right) \sin\!\left( \frac{\lambda t}{2 s c} \right) \right]^{n}, \qquad (31)$$

and, consequently,

$$C_{0,n}(t) = C_{n,0}(0) \left[ e^{-i\frac{\omega_1+\omega_2}{2} t} \left( -2 i\, s c \right) \sin\!\left( \frac{\lambda t}{2 s c} \right) \right]^{n}. \qquad (32)$$

In a similar way we get

$$C_{n,0}(t) = C_{n,0}(0) \left[ e^{-i\frac{\omega_1+\omega_2}{2} t} \left( c^2 e^{-i\frac{\lambda t}{2 s c}} + s^2 e^{i\frac{\lambda t}{2 s c}} \right) \right]^{n}. \qquad (33)$$

From Eq. (32) we see that a partial exchange of states will occur when λt/(sc) = (2k + 1)π, i.e., at the times τk = (sc/λ)(2k + 1)π. The effect attains the highest efficiency when the product sc is maximum, i.e., when s = c = 1/√2 and τk = (k + 1/2)π/λ. According to Eq. (4) this implies x = 0 and the resonance condition ω1 = ω2 = ω (cf. Eq. (5)),

$$C_{0,n}(\tau_k) = (-i)^{n}\, C_{n,0}(0)\, e^{-i \omega n \tau_k}. \qquad (34)$$

However, we note that even at resonance we obtain no exchange of states, due to the presence of the phase factor $\exp[-i(\omega\tau_k + \pi/2) n]$ affecting the coefficients of the state describing both oscillators in the Fock representation. In this general case we obtain $|C_{0,n}(\tau_k)| = |C_{n,0}(0)|$, which means exchange of statistics between the two oscillators. This can also be seen by comparing the reduced density matrices $\rho^{(2)}_{m_1,m_2}(\tau_k)$ and $\rho^{(1)}_{m_1,m_2}(0)$ in the Fock representation,

$$\rho^{(2)}_{m_1,m_2}(\tau_k) = e^{-i(\omega\tau_k + \frac{\pi}{2})(m_1 - m_2)}\, \rho^{(1)}_{m_1,m_2}(0), \qquad (35)$$

which exhibits the distinction between their off-diagonal elements. As is well known, while the state of a system offers its complete description, the same is not true for the statistics, which contains only partial information about the system.

3.1 The complete exchange of state

It was shown in the last section that it is not possible to have a complete exchange of states for a generic initial state because the phases are not transferred (see Eq. (35)). Here we show that when the state of oscillator O1 is given by the superposition C0|0〉 + CN|N〉 whereas O2 is in the vacuum state, complete exchange of states occurs.
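The chain from the matrix elements of Us to the closed form of U(t) can be cross-checked numerically. The sketch below is illustrative only: it assumes the binomial-sum reading of Eq. (17), written with non-negative powers as Σ_k (−1)^{n2−k} C(m2,k) C(m1,n2−k) c^{m1−n2+2k} s^{m2+n2−2k}, and the identity (U−s)^{n}_{k} = (Us)^{k}_{n} for a real orthogonal Us. The function names `u_s`, `u_t` and `closed_form` are hypothetical. It builds U(t) from Eq. (20) and compares the element connecting |n,0〉 to |0,n〉 with [cs(e^{−iω1't} − e^{−iω2't})]^n.

```python
import cmath
import math


def u_s(n1, n2, m1, m2, c, s):
    """Overlap s<n1,n2|m1,m2>_0, i.e. the element (U_s)^{n1,n2}_{m1,m2}."""
    if n1 + n2 != m1 + m2:
        return 0.0
    pref = math.sqrt(math.factorial(n1) * math.factorial(n2)
                     / (math.factorial(m1) * math.factorial(m2)))
    total = 0.0
    for k in range(max(0, m2 - n1), min(n2, m2) + 1):
        total += ((-1) ** (n2 - k)
                  * math.comb(m2, k) * math.comb(m1, n2 - k)
                  * c ** (m1 - n2 + 2 * k) * s ** (m2 + n2 - 2 * k))
    return pref * total


def u_t(n1, n2, m1, m2, t, c, s, w1p, w2p):
    """Element U(t)^{n1,n2}_{m1,m2}, summing over the decoupled basis."""
    total = n1 + n2
    if total != m1 + m2:
        return 0j
    acc = 0j
    for k1 in range(total + 1):
        k2 = total - k1
        phase = cmath.exp(-1j * (k1 * w1p + k2 * w2p) * t)
        acc += phase * u_s(k1, k2, n1, n2, c, s) * u_s(k1, k2, m1, m2, c, s)
    return acc


def closed_form(n, t, c, s, w1p, w2p):
    """Closed form of U(t)^{0,n}_{n,0} obtained from the binomial theorem."""
    return (c * s * (cmath.exp(-1j * w1p * t) - cmath.exp(-1j * w2p * t))) ** n


if __name__ == "__main__":
    c, s = math.cos(0.6), math.sin(0.6)
    for n in range(5):
        lhs = u_t(0, n, n, 0, 2.3, c, s, 1.37, 0.73)
        rhs = closed_form(n, 2.3, c, s, 1.37, 0.73)
        print(n, abs(lhs - rhs))
```

The same routine can be pointed at other elements, for example the survival amplitude in |n,0〉, to compare with the expression for C_{n,0}(t).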
Note that this state includes in particular the important case C0|0〉 + C1|1〉 using the qubits |0〉, |1〉, which has potential applications in quantum communication [20] and in quantum computation [21]. It was shown that this state exhibits squeezed fluctuations [22].

Next, let us consider the whole system initially in the superposed state

$$|\Psi(0)\rangle = C_{0,0}(0)\, |0, 0\rangle + C_{N,0}(0)\, |N, 0\rangle. \qquad (36)$$

In this case we verify perfect exchange of states between the oscillators for a convenient choice of the parameters involved. Assuming the resonance condition in Eq. (32) we have, for $C_{0,0}(t) = C_{0,0}(0)$,

$$C_{0,N}(t) = C_{N,0}(0)\, e^{-i(\omega t + \pi/2) N} \sin^{N}(\lambda t). \qquad (37)$$

Partial exchange of states will occur when t = τ0 = π/(2λ), which results in

$$C_{0,N}(\tau_0) = C_{N,0}(0)\, e^{-i\frac{\pi}{2}\left(\frac{\omega}{\lambda} + 1\right) N}, \qquad (38)$$

whose meaning is the exchange of statistics. The exchange of states becomes complete (exact) when $C_{0,N}(\tau_0) = C_{N,0}(0)$, namely, when

$$\frac{\omega}{\lambda} = \frac{4m}{N} - 1, \qquad (39)$$

with m integer. Taking m = 1 and ω in the microwave domain (ω ∼ 10^9 Hz), the time spent to transfer the state C0|0〉 + C1|1〉 from O1 to O2 is τ0 = π/(2λ) ∼ 10^−9 s, since λ = ω/3 (cf. Eq. (39)), which is smaller than the typical decoherence time for such systems (τd ∼ 10^−3 s), as it should be.

Note that the previous initial state C0|0〉 + CN|N〉 describing O1 includes the Fock states |N〉, obtained from C0 = 0 and CN = 1. In this case exact exchange of states no longer requires Eq. (39). The reason comes from the phase factor appearing in Eq. (38), which now becomes a global phase with no physical relevance. In this case the exchange of states is exact for any instant tk = τ0 + 2πk/λ.

4 Comments and Conclusion

An analytical procedure applied to a convenient model-Hamiltonian describing two coupled oscillators allows us to get the exact evolution operator for the entire system (Sect. II). This approach, through the use of distinct initial states and parameters (Sub-Sects. (A), (B) of Sect. III), makes it easy to study the exchange of states between such sub-systems.
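The role of the phase in the resonant transfer of C0|0〉 + CN|N〉 can be illustrated with a few lines of arithmetic. This is a sketch under the assumption that the transferred coefficient at resonance is C_{N,0}(0) e^{−i(ωt+π/2)N} sin^N(λt), as in Eq. (37); the function name `transfer_coefficient` and the numerical values are hypothetical. Requiring the phase at τ0 = π/(2λ) to be a multiple of 2π gives ω/λ = 4m/N − 1, so for N = 1 and m = 1 one expects λ = ω/3.

```python
import cmath
import math


def transfer_coefficient(N, omega, lam):
    """Ratio C_{0,N}(tau_0) / C_{N,0}(0) at resonance, with tau_0 = pi / (2 lam)."""
    tau0 = math.pi / (2.0 * lam)
    phase = cmath.exp(-1j * (omega * tau0 + math.pi / 2.0) * N)
    return phase * math.sin(lam * tau0) ** N


if __name__ == "__main__":
    # lam = omega / 3: the relative phase between |0> and |1> is restored.
    print(transfer_coefficient(1, 3.0, 1.0))
    # lam = omega / 2: modulus one, but the phase is not a multiple of 2 pi.
    print(transfer_coefficient(1, 2.0, 1.0))
```

In the second call only the statistics are exchanged: the modulus of the coefficient is one while its phase differs from that of the initial state.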
In all cases we have shown that the fidelity of the process is maximum when the resonance condition, ω1 = ω2, is attained. Assuming O2 always in the vacuum state we find, sub-Section by sub-Section, that: (A) partial exchange of states is achieved when the initial state of O1 is arbitrary, at the times t = τk = (k + 1/2)π/λ; the efficiency of partial exchange is maximum when the product sc is maximum (sc = 1/2); however, while the exchange of states is only partial, exchange of statistics is obtained exactly, as shown in Eqs. (34), (35); (B) exact exchange of states occurs when O1 starts from the initial superposed state C0|0〉 + CN|N〉, at the times tk = τ0 + 2πk/λ, with the requirement in Eq. (39). If Eq. (39) is not obeyed, exchange of states will occur at the same times, but now the effect is only partial. Exact exchange of states is also found in the particular case of (B) with C0 = 0 and CN = 1, which means O1 starting from a Fock state |N〉. In this case the exchange of states occurs exactly at the same times found in (B), whether or not Eq. (39) is obeyed.

As final remarks we mention that exchange of states and its efficiency could be investigated for other model-Hamiltonians and, as explained before, the effect goes beyond those studied in [16] and [17]. To our knowledge, exchange of states in coupled systems, and even exchange of certain properties, are subjects receiving little attention in the literature [23], with the remarkable exception of quantum teleportation [21], an effect having a very distinct nature (requiring the presence of quantum channels and entangled states), which occurs in the absence of coupling between the two sub-systems. In the context of teleportation, exchange of states appears with the names "identity interchange" [24] and "two-way teleportation" [25].
4.1 Acknowledgements

The authors thank the CNPq (SBD, BB) and FAPERJ (DPJ) for partial support.

4.2 References

[1] B. Yurke, D. Stoler, Phys. Rev. Lett. 57 (1986) 13.
[2] L. Davidovich et al., Phys. Rev. Lett. 71 (1993) 2360.
[3] M. Brune et al., Phys. Rev. Lett. 77 (1996) 4887; D.M. Meekhof et al., Phys. Rev. Lett. 76 (1996) 1796.
[4] D.T. Pegg, S.M. Barnett, Phys. Rev. Lett. 76 (1996) 4148.
[5] L.G. Lutterbach, L. Davidovich, Phys. Rev. Lett. 78 (1997) 2547.
[6] M.H.Y. Moussa, B. Baseia, Phys. Lett. A 238 (1998) 223.
[7] G. Bjork, L.L. Sanchez-Soto, J.D. Soderholm, Phys. Rev. Lett. 86 (2001) 4516.
[8] See, e.g., S.L. Braunstein, P. van Loock, Rev. Mod. Phys. 77 (2005) 513.
[9] B. Baseia, J.M.C. Malbouisson, Chinese Phys. Lett. 18 (2001) 1467; Phys. Lett. A 290 (2001) 234; A.T. Avelar, B. Baseia, Opt. Commun. 239 (2004) 281; Phys. Rev. A 72 (2005) 67508; B. Escher et al., Phys. Rev. A 70 (2004) 025801.
[10] B. Julsgaard et al., Nature 413 (2001) 400, and references therein.
[11] F. Dietrich et al., Phys. Rev. Lett. 62 (1989) 403; D.J. Heizein et al., Phys. Rev. Lett. 66 (1991) 2080.
[12] J. Tucker, D.F. Walls, Ann. Phys. (N.Y.) 52 (1969) 1.
[13] E.Y.C. Lu, Phys. Rev. A 8 (1973) 1053.
[14] M.S. Abdalla, J. Phys. A: Math. Gen. 29 (1996) 1997.
[15] Marcos C. de Oliveira et al., Journal Optics B 1 (1999) 610.
[16] H. Rodrigues et al., Physica A 311 (2002) 188.
[17] D. Portes Jr. et al., Physica A 329 (2003) 391.
[18] Lee E. Estes, Thomas H. Keil, Lorenzo M. Narducci, Physical Review 175, 1 (1968) 286.
[19] B.R. Mollow, Physical Review 162, 5 (1967) 1256.
[20] S.J. van Enk, J.I. Cirac, P. Zoller, Phys. Rev. Lett. 78 (1997) 4293.
[21] P.W. Shor, Phys. Rev. A 52 (1995) R2493.
[22] K. Wodkiewicz et al., Phys. Rev. A 35 (1987) 2567.
[23] A.S.M. de Castro, V.V. Dodonov, J. Opt. B: Quantum Semiclass. Opt. 4 (2002) 191.
[24] M.H.Y. Moussa, Phys. Rev. A 55 (1997) R3287.
[25] L. Vaidman, N. Yoran, Phys. Rev. A 59 (1999) 116.
ABSTRACT

Exchange of quantum states between two interacting harmonic oscillators during their time evolution is discussed. The conditions for such exchange are analyzed starting from a generic initial state, demonstrating that the effect occurs exactly only for the particular states C0|0>+CN|N>, which include the interesting qubit components |0>, |1>. The relation between the coupling constant and the characteristic frequencies of the oscillators required for the complete exchange is also determined.
<|endoftext|><|startoftext|>
Introduction

Recent studies of luminous radio quasars indicate that the power of the radio jet can exceed the bolometric luminosity associated with the accretion flow thermal emission (Punsly 2006b, 2007). This has proven to be quite challenging for current 3-D numerical simulations of MHD black hole magnetospheres. Based on table 4 of Hawley and Krolik (2006) and the related discussion of Punsly (2006b, 2007), the most promising 3-D simulations for achieving this level of efficiency are those of the highest spin, a/M ≈ 1 (where the black hole mass, M, and the angular momentum per unit mass, a, are in geometrized units). More generally, such high spins have been inferred in some black hole systems based on observational constraints (McClintock et al 2006). Thus, there is tremendous astronomical relevance to these highest spin configurations, in particular the physical origin of the relativistic Poynting jet.

http://arxiv.org/abs/0704.0816v1

The first generation of long term 3-D simulations produced one Poynting flux powerhouse, the a/M = 0.995 simulation, KDE (De Villiers et al 2003, 2005a; Hirose et al 2004; Krolik et al 2005). The source of most of the Poynting flux was clearly shown to be outside the event horizon in KDE (Punsly 2006a).
However, without access to the original data, the details of the physical mechanism could not be ascertained. A second generation of 3-D simulations was developed in Hawley and Krolik (2006). The highest spin case was KDJ, a/M = 0.99, with by far the most powerful Poynting jet within the new family of simulations: three times the Poynting flux (in units of the accretion rate of mass energy) of the next closest simulation, KDH, a/M = 0.95. The last three data dumps, at simulation times t = 9840 M, t = 9920 M and t = 10000 M, were generously made available to this author. The late time behavior of the simulations is established after t = 2000 M (when the large transients due to the funnel formation have died off), making these data dumps of particular interest for studying the Poynting jet (Hawley and Krolik 2006). This paper studies the origin of the Poynting jet at these late times.

The analysis of the data from the KDJ simulation clearly indicates that the Poynting flux in the outgoing jet is dominated by large flares. Typically, one expects the turbulence in the field variables to mask the dynamics of Poynting flux creation in an individual time slice of one of the 3-D simulations (Punsly 2006a). Surprisingly, the flares are of such a large magnitude that they clearly stand out above the background field fluctuations, as evidenced by figure 1. The flares are created in the equatorial accretion flow deep in the ergosphere, between the inner calculational boundary at r = 1.203 M and r = 1.6 M (the event horizon is at r = 1.141 M). Powerful beams of Poynting flux emerge perpendicular to the equatorial plane in the ergospheric flares and much of the energy flux is diverted outward along approximately radial trajectories that are closely aligned with the poloidal magnetic field direction in the jet (see figure 1).
The situation is unsteady: whenever some vertical magnetic flux is captured in the accretion flow it tends to be asymmetrically distributed and concentrated in either the northern or southern hemisphere. This hemisphere then receives a huge injection of electromagnetic energy on time scales ∼ 60 M.

The source of Poynting flux in KDJ resembles a nonstationary version of the ergospheric disk (see Punsly and Coroniti (1990) and chapter 8 of Punsly (2001) for a review). The ergospheric disk is modeled in the limit of negligible accretion and it is the most direct manifestation of gravitohydromagnetics (GHM) (Punsly 2001). A GHM dynamo arises when the magnetic field impedes the inflow of gas in the ergosphere, i.e., vertical flux in an equatorial accretion flow. The strong gravitational force will impart stress to the magnetic field in an effort to move the plasma through the obstructing flux. In particular, the metric induced frame dragging force will twist up the field azimuthally. These stresses are coupled into the accretion vortex around a black hole by large scale magnetic flux, and propagate outward as a relativistic Poynting jet. The more obstinate the obstruction, the more powerful the jet. There are two defining characteristics that distinguish the GHM dynamo from a Blandford-Znajek (B-Z) process (Blandford and Znajek 1977) on field lines that thread the ergosphere:

1. The B-Z process is electrodynamic, so there is no source within the ergosphere; it appears as if the energy flux is emerging from the horizon. In the GHM mechanism, the source of Poynting flux is in the ergospheric equatorial accretion flow.

2. In a B-Z process in a magnetosphere shaped by the accretion vortex, the field line angular velocity is ΩF ≈ ΩH/2 (where ΩH is the angular velocity of the horizon) near the pole and decreases with latitude to ≈ ΩH/5 near the equatorial plane of the inner ergosphere (Phinney 1983).
In GHM, since the magnetic flux is anchored by the inertia of the accretion flow in the inner ergosphere, frame dragging enforces dφ/dt ≈ ΩH. One therefore has the condition ΩF ≈ ΩH.

In order to understand the physical origin of the Poynting flux, these two issues are studied below.

2. The KDJ Simulation

The simulation is performed in the Kerr metric (that of a rotating, uncharged black hole), gµν. Calculations are carried out in Boyer-Lindquist (B-L) coordinates (r, θ, φ, t). The reader should refer to Hawley and Krolik (2006) for details of the simulation. We only give a brief overview. The initial state is a torus of gas in equilibrium that is threaded by concentric loops of weak magnetic flux that foliate the surfaces of constant pressure. The magnetic loops are twisted azimuthally by the differentially rotating gas. This creates significant magnetic stress that removes angular momentum from the gas, initiating a strong inflow that is permeated by magneto-rotational instabilities (MRI). The end result is that after t = a few hundred M, accreted poloidal magnetic flux gets trapped in the accretion vortex or funnel (with an opening angle of ∼ 60° at the horizon tapering to ∼ 35° at r > 20 M). This region is the black hole magnetosphere and it supports a Poynting jet. The surrounding accretion flow is very turbulent.

In order to understand the source of the strong flares of radial Poynting flux, one needs merely to consider the conservation of global, redshifted, or equivalently the B-L coordinate evaluated, energy flux (Thorne et al 1986). In general, the divergence of the time component of the stress-energy tensor in a coordinate system can be expanded as

$$T^{\nu}{}_{t\,;\nu} = \frac{1}{\sqrt{-g}} \frac{\partial \left( \sqrt{-g}\, T^{\nu}{}_{t} \right)}{\partial x^{\nu}} + \Gamma^{\mu}{}_{t\beta}\, T^{\beta}{}_{\mu},$$

where $\Gamma^{\mu}{}_{t\beta}$ is the connection coefficient and $g = -(r^2 + a^2\cos^2\theta)^2 \sin^2\theta$ is the determinant of the metric. However, the Kerr metric has a Killing vector (the metric is time stationary) dual to the B-L time coordinate. Thus, there is a conservation law associated with the time component of the divergence of the stress-energy tensor. Consequently, if one expands out the inhomogeneous connection coefficient term in the expression above, it will equate to zero.

Fig. 1. The source of Poynting flux. The left hand column is Sθ and the right hand column is Sr in KDJ, both averaged over azimuth, at (from top to bottom) t = 9840 M, t = 9920 M and t = 10000 M. The relative units (based on code variables) are in a color bar to the right of each plot for comparison of magnitudes between the six plots. The contours on the Sθ plots are of the density, scaled from the peak value within the frame at relative levels 0.5 and 0.1. The contours on the Sr plots are of Sθ scaled from the peak within the frame at relative levels 0.67 and 0.33. The inside of the inner calculational boundary (r = 1.203 M) is black. The calculational boundary near the poles is at 8.1° and 171.9°. Notice that any contribution from an electrodynamic effect associated with the horizon appears minimal. The white contour is the stationary limit surface. There is no data clipping, so plot values that exceed the limits of the color bar appear white.

Fig. 2. The central engine. The left hand column is Bθ and the right hand column is ΩF in KDJ, both averaged over azimuth, at (from top to bottom) t = 9840 M, t = 9920 M and t = 10000 M. The relative units (based on code variables) are in a color bar to the right of each plot for comparison of magnitudes between the plots. The calculational boundaries are the same as in figure 1. The contours on the Bθ plots are of the density, scaled from the peak value within the frame at relative levels 0.5 and 0.1. There is no data clipping, so plot values that exceed the limits of the color bar appear white.
The conservation of energy evaluated in B-L coordinates reduces to

$$\frac{\partial \left( \sqrt{-g}\, T^{\nu}{}_{t} \right)}{\partial x^{\nu}} = 0,$$

where the four-momentum $-T^{\nu}{}_{t}$ has two components: one from the fluid, $-(T^{\nu}{}_{t})_{\rm fluid}$, and one from the electromagnetic field, $-(T^{\nu}{}_{t})_{\rm EM}$. The reduction to a homogeneous equation with only partial derivatives is the reason why the global conservation of energy can be expressed in integral form in (3.70) of Thorne et al (1986). It follows that the poloidal components of the redshifted Poynting flux are $S^{\theta} = -\sqrt{-g}\,(T^{\theta}{}_{t})_{\rm EM}$ and $S^{r} = -\sqrt{-g}\,(T^{r}{}_{t})_{\rm EM}$. We can use these simple expressions to understand the primary source of the Poynting jet in KDJ.

Figure 1 is a plot of Sθ (on the left) and Sr (on the right) in KDJ at the last three time steps of data collection. Each frame is the average over azimuth of each time step. This greatly reduces the fluctuations, as the accretion vortex is a cauldron of strong MHD waves. The individual φ = constant slices show the same dominant behavior; however, it is embedded in large MHD fluctuations. In the left hand column of figure 1, density contours have been superimposed on the images to indicate the location of the equatorial accretion flow. The density is evaluated in B-L coordinates with contours at 0.5 and 0.1 of the peak value within r < 2.5 M. Notice that in all three left hand frames, Sθ is created primarily in regions of very high accretion flow density. In all three of the right hand frames of figure 1, there is an enhanced Sr that emanates from the ergosphere (defined by the interior of the stationary limit, $r_s = M + \sqrt{M^2 - a^2\cos^2\theta}$; note that there are 40 grid points between r = 1.203 M and rs at θ = π/2). This radial energy beam diminishes precipitously just outside the horizon, near the equatorial plane in all three time steps.
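The radii quoted for KDJ can be reproduced from the standard Kerr expressions. The following is a minimal sketch in geometrized units, assuming the usual formulas r₊ = M + √(M² − a²) for the event horizon, r_s = M + √(M² − a² cos²θ) for the stationary limit, and ΩH = a/(2 M r₊) for the horizon angular velocity; the function names are hypothetical and are not taken from the simulation code.

```python
import math


def horizon_radius(a, M=1.0):
    """Outer event horizon of a Kerr black hole, r_+ = M + sqrt(M^2 - a^2)."""
    return M + math.sqrt(M * M - a * a)


def stationary_limit(a, theta, M=1.0):
    """Stationary limit surface, r_s = M + sqrt(M^2 - a^2 cos^2(theta))."""
    return M + math.sqrt(M * M - (a * math.cos(theta)) ** 2)


def horizon_angular_velocity(a, M=1.0):
    """Angular velocity of the horizon, Omega_H = a / (2 M r_+)."""
    return a / (2.0 * M * horizon_radius(a, M))


if __name__ == "__main__":
    a = 0.99  # spin of the KDJ simulation, in units of M
    print("r_+ =", horizon_radius(a))
    print("r_s (equator) =", stationary_limit(a, math.pi / 2.0))
    print("Omega_H =", horizon_angular_velocity(a))
```

For a/M = 0.99 this gives a horizon radius of about 1.141 M, so the equatorial ergosphere extends from there out to 2 M, with the inner calculational boundary at 1.203 M lying just outside the horizon.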
The region in which Sr diminishes is adjacent to a region of strong Sθ that originates in the inertially dominated accretion flow in the inner ergosphere, 1.2 M < r < 1.6 M (this region is resolved by 28 radial grid zones). In fact, if one looks at the conservation of energy equation, the term ∂(Sθ)/∂θ is sufficiently large to be the source of ∂(Sr)/∂r at the base of the radial beam in all three frames. This does not preclude the transfer of energy to and from the plasma. It merely states that the magnitude is sufficient to source Sr. In general, the hydrodynamic energy flux is negligible in the funnel. In order to illustrate this, contours of Sθ are superimposed on the color plots of Sr. The contour levels are chosen to be 2/3 and 1/3 of the maximum value of Sθ emerging from the dense equatorial accretion flow. One clearly sees Sθ switching off where Sr switches on. We conclude that a vertical Poynting flux created in the equatorial accretion flow is the source of the strong beams of Sr. This establishes condition 1 of the Introduction.

The left column of figure 2 contains plots of the magnetic field component, Bθ ≡ Frφ, at the three time steps. At every location in which Sθ is strong in figure 1, there is a pronounced enhancement in Bθ in figure 2. Recall that the sign of Sθ is not determined by the sign of Bθ. These intense flux patches penetrate the inertially dominated equatorial accretion flow in all three frames. The density contours indicate that the regions of enhanced vertical field greatly disrupt the equatorial inflow. As noted in the Introduction, a GHM interaction is likely to occur when the magnetic field impedes the inflow in the ergosphere. The regions of large Bθ are compact compared to the global field configuration of the jet, only ∼ 1.0 M − 2.0 M long. Considering the turbulent, differentially rotating plasma in which they are embedded, these are most likely highly enhanced regions of twisted magnetic loops created by the MRI.
The strength of Bθ at the base of the flares is comparable to, or exceeds, the radial magnetic field strength. The situation is clearly very unsteady and vertical flux is constantly shifting from hemisphere to hemisphere. The time slice t = 10000 M, although primarily a southern hemisphere event, also has a significant contribution in the northern hemisphere (see the blue fan-like plume of vertical Poynting flux in figure 1). The GHM interaction is provided by the vertical flux that links the equatorial plasma to the relatively slowly rotating plasma of the magnetosphere within the accretion vortex. The vertical flux transmits huge torsional stresses from the accretion flow to the magnetosphere.

Further corroboration of this interpretation can be found by looking at the values of ΩF in the vicinity of the Sr flares. In a non-axisymmetric, non-time stationary flow, there is still a well defined notion of ΩF: the rate at which a frame of reference at fixed r and θ would have to rotate so that the poloidal component of the electric field, E⊥, that is orthogonal to the poloidal magnetic field, BP, vanishes. This was first derived in Punsly (1991) (see the extended discussion in Punsly (2001) for the various physical interpretations), and has recently been written out in B-L coordinates in Hawley and Krolik (2006) in terms of the plasma three-velocity, $v^i$, and the Faraday tensor as

$$\Omega_F = v^{\phi} - F_{\theta r}\, \frac{g_{rr}\, v^{r} F_{\phi\theta} + g_{\theta\theta}\, v^{\theta} F_{r\phi}}{(F_{\phi\theta})^2\, g_{rr} + (F_{r\phi})^2\, g_{\theta\theta}}. \qquad (2\text{-}1)$$

This expression was studied in the context of the simulation KDH, a/M = 0.95, in Hawley and Krolik (2006). They found that a long term time and azimuth average yielded ΩF ≈ ΩH/3 and there was no enhancement at high latitudes as was anticipated by Phinney (1983). The t = 10000 M time slice of KDH was generously provided to this author. At t = 10000 M, there are no strong flares emerging from the equatorial accretion flow. Inside the funnel at r < 10 M, at t = 10000 M, 0 < ΩF < 0.5 ΩH.
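The field line angular velocity of equation (2-1) is a pointwise algebraic combination of velocity, Faraday tensor and metric components, so it is straightforward to evaluate on simulation output. The sketch below assumes the reading ΩF = v^φ − F_θr (g_rr v^r F_φθ + g_θθ v^θ F_rφ) / ((F_φθ)² g_rr + (F_rφ)² g_θθ); the function name `field_line_angular_velocity` and the sample numbers are hypothetical and do not come from the KDJ or KDH data.

```python
def field_line_angular_velocity(v_r, v_theta, v_phi,
                                F_theta_r, F_phi_theta, F_r_phi,
                                g_rr, g_thth):
    """Evaluate Omega_F from three-velocity, Faraday and metric components.

    All arguments are values at a single grid point in Boyer-Lindquist
    coordinates. The poloidal field must not vanish, otherwise the
    denominator is zero and Omega_F is undefined.
    """
    numerator = g_rr * v_r * F_phi_theta + g_thth * v_theta * F_r_phi
    denominator = F_phi_theta ** 2 * g_rr + F_r_phi ** 2 * g_thth
    return v_phi - F_theta_r * numerator / denominator


if __name__ == "__main__":
    # Made-up numbers, only to show the call signature.
    omega_f = field_line_angular_velocity(
        v_r=0.1, v_theta=0.2, v_phi=0.3,
        F_theta_r=2.0, F_phi_theta=1.0, F_r_phi=0.5,
        g_rr=4.0, g_thth=2.0)
    print(omega_f)
```

When the toroidal field component F_θr vanishes, or when the poloidal velocity is zero, the expression reduces to ΩF = v^φ, i.e., the field lines rotate with the plasma.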
The right hand column of figure 2 is ΩF plotted at three different time steps for KDJ. By comparison to figure 1, notice that each flare in Sr is enveloped by a region of enhanced ΩF, typically 0.7 ΩH < ΩF < 1.2 ΩH. The regions of the funnel outside the ergosphere are devoid of large flares in Sr and typically have 0 < ΩF < 0.5 ΩH, similar to what is seen in KDH. Unlike KDH, there are huge enhancements in ΩF at lower latitudes in the funnel. It seems reasonable to associate this large difference in the peak values of ΩF in KDJ and KDH (at t = 10000 M) with the spatially and temporally coincident flares in Sr that occur in KDJ. Furthermore, this greatly enhanced value of ΩF indicates a different physical origin for ΩF in the flares than for the remainder of the funnel or in KDH at t = 10000 M. The most straightforward interpretation is that it is a direct consequence of the fact that the flares originate on magnetic flux that is locked into approximate corotation with the dense accreting equatorial plasma (i.e., the inertially dominated equatorial plasma anchors the magnetic flux). In the inner ergosphere, frame dragging enforces 0.7 ΩH < dφ/dt < 1.0 ΩH on the accretion flow. This establishes condition 2 of the Introduction.

3. Discussion

In this Letter we showed that in the last three data dumps of the 3-D MHD numerical simulation, KDJ, the dominant source of Poynting flux originated near the equatorial plane deep in the ergosphere. The phenomenon is unsteady and is triggered by large scale vertical flux that is anchored in the inertially dominated equatorial accretion flow. The situation typifies the ergospheric disk in virtually every aspect, even though there is an intense accretion flow. There is one exception: unlike the ergospheric disk, the anchoring plasma rarely achieves the global negative energy condition that is defined by the four-velocity, −Ut < 0, because of the flood of incoming positive energy plasma from the accretion flow.
The plasma attains −Ut < 0 only near the base of the strongest flares seen in the φ = constant slices.

The switch-on of a powerful beam of Sr outside the horizon at r ≈ 1.3 M in the a/M = 0.995 simulation, KDE, of Krolik et al (2005) was demonstrated in Punsly (2006a). It seems likely that the source of Sr in KDE is Sθ from an ergospheric disk. The ergospheric disk appears to switch on at a/M > 0.95, as evidenced by the factor of 3 weaker Poynting flux in KDH. Furthermore, if the funnel opening angle at the horizon in KDH at t = 10000 M is typical within ±5°, then figure 5 and table 4 of Hawley and Krolik (2006) indicate that only 35% to 40% of the funnel Poynting flux at large distances is created outside the horizon during the course of the simulation. A plausible reason is given by the plots of Bθ in figure 2. The vertical magnetic flux at the equatorial plane is located at r < 1.55 M. The power in the ergospheric disk jet is ∼ [Bθ (SA) (ΩH)]², where SA is the proper surface area of the equatorial plane threaded by vertical magnetic flux (Semenov et al 2004; Punsly 2001). The proper surface area in the ergospheric equatorial plane increases dramatically at high spin, diverging at a = M. For example, between the inner calculational boundary and 1.55 M the surface area is only significant for a/M > 0.95 and grows quickly with a/M, exceeding twice the surface area of the horizon for a/M = 0.99. Thus, if Bθ in the inner ergosphere were independent of spin to first order, then a strong ergospheric disk jet would switch on in the 3-D simulations at a/M > 0.95. Note that if the inner boundary were truly the event horizon instead of the inner calculational boundary, then this argument would indicate that the ergospheric disk would likely be very powerful even at a/M = 0.95 and the switch-on would occur at a/M ≈ 0.9.
The implication is that a significant amount of large scale magnetic flux threading the equatorial plane of the ergosphere (which implies a large black hole spin based on geometrical considerations) catalyzes the formation of the most powerful Poynting jets around black holes. Thus, we are now considering initial conditions in simulations that are conducive to producing significant vertical flux in the equatorial plane of the ergosphere. It should be noted that 2-D simulations from a similar initial state of tori threaded by magnetic loops have been studied in McKinney and Gammie (2004). However, the magnetic flux evolution can be much different in this setting, as discussed in Punsly (2006a), and poloidal flux configurations conducive to GHM could be highly suppressed. In summary, in 2-D there are no interchange instabilities, so flux tubes cannot pass by each other or move around each other in the extra degree of freedom provided by the azimuth. Thus, there is a tendency for flux tubes to get pushed into the hole by the accretion flow. This is in contrast to the formation of the ergospheric disk in Punsly and Coroniti (1990), in which buoyant flux tubes are created by reconnection at the inner edge of the ergospheric disk and recycle back out into the outer ergosphere by interchange instabilities. Ideally, a full 3-D simulation with a detailed treatment of resistive MHD reconnection is preferred for studying the relevant GHM physics.

I would like to thank Jean-Pierre DeVilliers for sharing his deep understanding of the numerical code and these simulations. I was also very fortunate that Julian Krolik and John Hawley were willing to share their data in the best spirit of science.

REFERENCES

Blandford, R. and Znajek, R. 1977, MNRAS 179, 433
De Villiers, J-P., Hawley, J., Krolik, J. 2003, ApJ 599, 1238
De Villiers, J-P., Hawley, J., Krolik, J., Hirose, S. 2005, ApJ 620, 878
De Villiers, J-P., Staff, J., Ouyed, R. 2005, astro-ph 0502225
Hawley, J., Krolik, J.
2006, ApJ 641 103 Hirose, S., Krolik, K., De Villiers, J., Hawley, J. 2004, ApJ 606, 1083 Krolik, K., Hawley, J., Hirose, S. 2005, ApJ 622, 1008 McKinney, J. and Gammie, C. 2004, ApJ 611 977 McClintock, J.E. et al 2006, ApJ 652, 518 Phinney, E.S. 1983, PhD Dissertation University of Cambridge. Punsly, B., Coroniti, F.V. 1990, ApJ 354 583 Punsly, B. 1991, ApJ 372 424 Punsly, B. 2001, Black Hole Gravitohydromagnetics (Springer-Verlag, New York) Punsly, B. 2006, MNRAS 366 29 Punsly, B. 2006, ApJL 651 L17 Punsly, B. 2007, MNRAS 374 10 Semenov, V., Dyadechkin, S. and Punsly, B. 2004, Science 305978 Thorne, K., Price, R. and Macdonald, D. 1986, Black Holes: The Membrane Paradigm (Yale University Press, New Haven) This preprint was prepared with the AAS LATEX macros v5.2. http://arxiv.org/abs/astro-ph/0502225 Introduction The KDJ Simulation Discussion ABSTRACT This Letter reports on 3-dimensional simulations of Kerr black hole magnetospheres that obey the general relativistic equations of perfect magnetohydrodynamics (MHD). In particular, we study powerful Poynting flux dominated jets that are driven from dense gas in the equatorial plane in the ergosphere. The physics of which has been previously studied in the simplified limit of an ergopsheric disk. For high spin black holes, $a/M > 0.95$, the ergospheric disk is prominent in the 3-D simulations and is responsible for greatly enhanced Poynting flux emission. Any large scale poloidal magnetic flux that is trapped in the equatorial region leads to an enormous release of electromagnetic energy that dwarfs the jet energy produced by magnetic flux threading the event horizon. The implication is that magnetic flux threading the equatorial plane of the ergosphere is a likely prerequisite for the central engine of powerful FRII quasars. <|endoftext|><|startoftext|> Introduction 2. Tableau facts and proof of the Main Theorem 2.1. Tableau sliding 2.2. Proof of the rule 3. 
An extended example of the main theorem Acknowledgments References ABSTRACT The classical Littlewood-Richardson coefficients C(lambda,mu,nu) carry a natural $S_3$ symmetry via permutation of the indices. Our "carton rule" for computing these numbers transparently and uniformly explains these six symmetries; previously formulated Littlewood-Richardson rules manifest at most three of the six. <|endoftext|><|startoftext|> Submitted to Physical Review Letters Two-scale structure of the electron dissipation region during collisionless magnetic reconnection M. A. Shay∗ Department of Physics & Astronomy, 217 Sharp Lab, University of Delaware, Newark, DE 19716 J. F. Drake, M. Swisdak University of Maryland, College Park, MD, 20742 (Dated: November 1, 2018) Particle in cell (PIC) simulations of collisionless magnetic reconnection are presented that demon- strate that the electron dissipation region develops a distinct two-scale structure along the outflow direction. The length of the electron current layer is found to decrease with decreasing electron mass, approaching the ion inertial length for a proton-electron plasma. A surprise, however, is that the electrons form a high-velocity outflow jet that remains decoupled from the magnetic field and extends large distances downstream from the x-line. The rate of reconnection remains fast in very large systems, independent of boundary conditions and the mass of electrons. PACS numbers: Valid PACS appear here Magnetic reconnection drives the release of magnetic energy in explosive events such as disruptions in labo- ratory experiments, magnetic substorms in the Earth’s magnetosphere and flares in the solar corona. Recon- nection in these events is typically collisionless because reconnection electric fields exceed the Dreicer runaway field. 
Since magnetic field lines reconnect in a boundary layer, the “dissipation region”, whose structure may limit the rate of release of energy, understanding the structure of this boundary layer and its impact on reconnection is critical to understanding the observations. Because of their ability to carry large currents the dynamics of electrons continues to be a topic of interest. Early sim- ulations of reconnection suggested that the rate of re- connection was not sensitive to electron dynamics [1, 2] and this insensitivity was attributed to the coupling to whistler dynamics at the small spatial scales of the dis- sipation region [3, 4]. The results of more recent kinetic PIC simulations have called into question these results by suggesting that the electron current layer stretches along the outflow direction and the rate of reconnection drops[5, 6]. The fast rates of reconnection obtained from earlier simulations[1, 3, 7] were attributed to the influ- ence of periodicity[5]. We present particle-in-cell (PIC) simulations with var- ious electron masses and computational domain sizes and an analytic model that demonstrate that collisionless re- connection remains fast even in very large collisionless systems. The reconnection rate stabilizes before the pe- riodicity of the boundary conditions can impact the dy- namics. The electron current layer develops a distinct two-scale structure along the outflow direction that had not been identified in earlier simulations. The out-of- plane electron current driven by the reconnection elec- ∗Electronic address: shay@udel.edu; URL: http://www.physics.udel.edu/~shay tric field has a length that decreases with the electron mass, scaling as (me/mi) 3/8, which extrapolates to about an ion inertial length di = c/ωpi for the electron-proton mass ratio. The surprise is that a jet of outflowing elec- trons with velocity close to the electron Alfven speed cAe extends up to several 10’s of di from the x-line. 
Remarkably, the electrons are able to jet across the magnetic field over such enormous distances because momentum transport transverse to the jet effectively “blocks” the flow of the out-of-plane current in this region. The momentum transport causing this “current blocking” effect has the same source (the off diagonal pressure tensor [1]), but is much stronger than that which balances the reconnection electric field at the x-line.

Our simulations are performed with the particle-in-cell code p3d [8, 9]. The results are presented in normalized units: the magnetic field to the asymptotic value of the reversed field, the density to the value at the center of the current sheet minus the uniform background density, velocities to the Alfvén speed vA, lengths to the ion inertial length di, times to the inverse ion cyclotron frequency $\Omega_{ci}^{-1}$, and temperatures to $m_i v_A^2$. We consider a system periodic in the x−y plane where flows into and away from the x-line are parallel to ŷ and x̂, respectively. The reconnection electric field is parallel to ẑ. The initial equilibrium consists of two Harris current sheets superimposed on an ambient population of uniform density. The reconnection magnetic field is given by Bx = tanh[(y − Ly/4)/w0] − tanh[(y − 3Ly/4)/w0] − 1, where w0 and Ly are the half-width of the initial current sheets and the box size in the ŷ direction. The electron and ion temperatures, Te = 1/12 and Ti = 5/12, are initially uniform. The initial density profile is the usual Harris form plus a uniform background of 0.2. The simulations presented here are two-dimensional, i.e., ∂/∂z = 0. Reconnection is initiated with a small initial magnetic perturbation that produces a single magnetic island on each current layer.

FIG. 1: (color online). Reconnection electric field versus time: (a) 204.8 × 102.4, (b) 102.4 × 51.2, (c) 51.2 × 25.6. w0 is the initial current sheet width.

We have explored the dependence of the rate of reconnection on the system size in a series of simulations with three different system sizes and three different mass ratios. For mi/me = 25, the grid scale ∆ = 0.05 and the speed of light c = 15. For mi/me = 100, ∆ = 0.025 and c = 20. For mi/me = 400, ∆ = 0.0125 and c = 40. The reconnection rate versus time is plotted for our simulations in Fig. 1. The reconnection rate is determined by taking the time derivative of the total magnetic flux between the x-line and the center of the magnetic island. The rate increases with time, undergoes a modest overshoot that is more pronounced in the smaller domains, and approaches a quasi-steady rate of around 0.14, independent of the domain size. Earlier suggestions [5] that reconnection rates would plunge until elongated current layers spawned secondary magnetic islands are not borne out in these simulations. The rates of reconnection approach constant values even in the absence of secondary islands, which for anti-parallel reconnection typically only occur transiently due to initial conditions [10]. Even these transient islands can be largely eliminated by a suitable choice of the initial current layer width w0 (a larger value of w0 is required for the larger domains).

A critical issue is whether the periodicity in the x direction can influence the rate of reconnection [5]. In each of the simulations we have identified the time at which the ion outflows from the x-line meet at the center of the magnetic island. This occurs at t ≈ 155 for the largest simulation shown in Fig. 1a. The plasma at the x-line cannot be affected by the downstream conditions until t ≈ 255, when a pressure perturbation can propagate back upstream to the x-line at the magnetosonic speed. This is well after the end of the simulation.
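The rate measurement described above (the time derivative of the magnetic flux between the x-line and the island center) can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name `reconnection_rate` and the synthetic flux history are hypothetical, and the finite-difference stencil is a generic choice rather than the diagnostic actually used with p3d.

```python
def reconnection_rate(times, flux):
    """Estimate d(flux)/dt at every sample.

    Uses centered differences in the interior and one-sided differences
    at the two end points. `times` must be strictly increasing.
    """
    if len(times) != len(flux) or len(times) < 2:
        raise ValueError("need two equally long series with >= 2 samples")
    last = len(times) - 1
    rates = []
    for i in range(len(times)):
        lo = max(i - 1, 0)
        hi = min(i + 1, last)
        rates.append((flux[hi] - flux[lo]) / (times[hi] - times[lo]))
    return rates


# Synthetic (hypothetical) flux history: linear growth at the
# quasi-steady normalized rate of 0.14 quoted in the text.
times = [0.5 * i for i in range(41)]
flux = [0.14 * t for t in times]
rates = reconnection_rate(times, flux)
```

With real simulation output the flux series would be noisy, so some smoothing before differencing is likely to be needed; that choice is left open here.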
The electrons are ejected from the x-line at a velocity of around cAe ≫ cA and therefore might be able to follow field lines back to the x-line. During the traversal time δt = Lx/cAe, the amount of reconnected flux is vinB0L/cAe, where vin is the inflow velocity into the x-line. Using the conservation of the canonical momentum in the z-direction, the condi- tion that an electron with a velocity cAe can not cross this flux to access the x-line reduces to L > di(cA/vin) ∼ 7di, which is easily satisfied for the simulations in Fig. 1. The fact that the reconnection rates for all of the simulation domains in Fig. 1 are essentially identical further sup- ports this conclusion. Also shown in Fig. 1 in the dashed lines are the rates of reconnection for mi/me = 100 in (b) and mi/me = 400 in (c). Consistent with simulations in smaller domains [1, 2], the rate of reconnection is insensitive to the electron mass. We now proceed to explore the structure of the electron current layer. Shown in Fig. 2 is a blow-up around the x- line of the out-of-plane electron velocity for mi/me = 25 and two simulation domains, 204.8 × 102.4 in (a) and 51.2 × 25.6 in (b), and for mi/me = 400 in a simula- tion domain of 51.2 × 25.6 in (c). All of the data is taken in the phase where the reconnection rate and the lengths of the region of intense out-of-plane current are stationary. Reconnection forms intense current layers that have a well-defined length (half widths of around 7di and independent of the size of computational domain formi/me = 25) and then open up forming the open out- flow jet that characterizes Hall reconnection [3, 7]. The current layer in the case of mi/me = 400 in Fig. 2c is dis- tinctly shorter than the smaller mass ratio current layers in Fig. 2a,b, suggesting that the length of the electron current layer depends on the electron mass and would be shorter for realistic proton-electron mass ratios. Shown in Fig. 
3a is a blow-up around the x-line of the electron outflow velocity vex for the mi/me = 25, 204.8 × 102.4 run corresponding to Fig. 2a. In contrast with the out-of-plane current the electrons form an out- flow jet that extends a very large distance downstream from the x-line. This outflow jet continued to grow in length until the end of the simulation. This simulation, along with others at differing mass ratios, reveals that the peak outflow velocity is very close to the electron Alfven speed [8, 11]. One might expect that because of the colli- mation of the outflow jet and its length, the reconnection rate would drop. However, this is not the case. While there is an intense jet in the core of the reconnection exhaust, the exhaust as a whole quickly begins to open up downstream of the current layer (Jz). The jet itself therefore does not act as a nozzle to limit the rate of FIG. 2: (color online). Blowups around the x-line of the out- of-plane electron velocity for: (a)mi/me = 25, simulation size 204.8× 102.4, (b) mi/me = 25, 51.2× 25.6, and (c) mi/me = 400, 51.2 × 25.6. FIG. 3: (color online). Blowups around the x-line for system size 204.8× 102.4 with mi/me = 25. (a) The electron outflow velocity vex. (b) Momentum flux vectors, Γ = pexzx̂ + peyzŷ (vectors in box surrounding x-line are multiplied by 20), with a background color plot of | (Ez + (ve ×B/c)z)/Ez | . reconnection: the rate of reconnection remains constant even as the length of the outflow jet varies in time. To understand how the electrons can form such an ex- tended outflow jet while the out-of-plane current layer remains localized, we examine the out-of-plane compo- nent of the fluid electron momentum equation along the symmetry line of the outflow direction. 
In steady state,

$$E_z = -\frac{m_e}{e}\, v_{ex}\frac{\partial v_{ez}}{\partial x} - \frac{1}{c}\, v_{ex} B_y - \frac{1}{n_e e}\,\nabla\cdot\Gamma, \qquad (1)$$

where ve is the electron bulk velocity and Γ = pexz x̂ + peyz ŷ is the flux of z-directed electron momentum in the reconnection plane (not including convection of momentum), with pe the electron pressure tensor. In Fig. 4a we plot all of the terms in this equation along a cut through the x-line along the outflow direction from a simulation with mi/me = 100 and Lx × Ly = 102.4 × 51.2. The data has been averaged between t = 116.2 and t = 117.0. The electric field (black) is balanced by the sum (red) of the electron inertia (dashed blue), the Lorentz force (solid blue) and the divergence of the momentum flux (green). The major contributions to momentum balance come from the Lorentz force and the divergence of the momentum flux. At the x-line the electric field drive is balanced by the momentum transport [1, 12]. The surprise is that the Lorentz force, rather than simply increasing downstream from the x-line to balance the reconnection electric field, instead strongly overshoots the reconnection electric field far downstream of the x-line. This tendency was seen in earlier simulations [12], but there was no clear separation of scales because of the small size of these earlier simulations. Downstream from the x-line the electrons are streaming much faster than the magnetic field lines. Thus, in a reference frame of the moving electrons the z-directed electric field has reversed direction compared with the x-line. This electric field tries to drive a current opposite to that at the x-line. Evidence for this reversed current appears downstream of the x-line in Fig. 2c. In spite of the strength of the effective electric field, the reversed current carried by the electrons is small. As at the x-line, the momentum transfer to electrons in this extended outflow region is balanced by momentum transport. The momentum flux around the x-line is shown as a 2-D vector plot in Fig. 3b for the same run as in Fig. 3a.
The momentum flux has been multiplied by 20 in the box surrounding the x-line. The data for this figure has been averaged between t = 172.5 and 174.5. The background color plot is of | (Ez + (ve × B/c)z)/Ez |, which is ≳ 1 where the electrons are not frozen-in. Evident is the outward flow of momentum around the x-line and the much stronger outward flow of negative momentum in an extended downstream region. The momentum transport is so large that the out-of-plane current downstream is effectively “blocked”. The force associated with this “blocking effect” drives the flow of the large-scale jet of electrons downstream of the x-line.

We define the length ∆x of the inner dissipation region as the distance from the x-line to the point where the Lorentz force vexBy/c crosses the reconnection electric field Ez. At this location the effective out-of-plane electric field seen by the electrons reverses sign, causing the electron current jez to be driven in reverse, which allows the separatrices to open up. Thus, the inner dissipation region defines the spatial extent of the magnetic nozzle that develops during reconnection. Since the simulations presented in this paper use artificial values of me, it is essential to understand the me scaling of ∆x so that this important length can be calculated for a proton-electron plasma. The momentum equation of electrons in the outflow direction yields a steady state equation for vex,

$$\frac{m_e c}{2e}\frac{\partial}{\partial x}\left(v_{ex}^2\right) = v_{ez} B_y, \qquad (2)$$

where vez ∼ cAe. Thus, the profile of By along the outflow direction and its dependence on me must be determined. This profile is shown for mi/me = 25 (system size 102.4 × 51.2), 100 (102.4 × 51.2) and 400 (51.2 × 25.6) in Fig. 4b. Surprisingly, the profile of By is apparently independent of me. Our original expectation, based on the continuity of the flow of magnetic flux into and out of the x-line, was that $B_y \sim B_0 v_{in}/c_{Ae} \propto m_e^{1/2}$, where the outflow velocity eventually rises to cAe.
However, since the electrons are not frozen into the magnetic field until far downstream, the expected scaling fails. To calculate vex we approximate By by a linear ramp, By ≈ By′ x, and integrate Eq. (2). Setting the Lorentz force equal to the reconnection electric field, we then obtain an equation for ∆x,

$$\Delta_x = \left(\frac{m_e}{m_i}\right)^{3/8}\left(\frac{c E_z}{B_0 c_A}\right)^{1/2}\left(\frac{B_0}{d_i B_y'}\right)^{3/4} d_i. \qquad (3)$$

For the three runs shown in Fig. 4b the simulations yield ∆x = 2.9di, 1.8di and 1.0di for mi/me = 25, 100 and 400, respectively, which is in reasonable accord with the scaling. Extrapolating to a mass ratio of 1836, we predict ∆x ∼ 0.6di. In contrast, the outer dissipation region can extend to 10’s of di.

We have shown that the electron current layer that forms during reconnection stabilizes at a finite length, independent of the periodicity of the simulation domain, and, aside from transients from initial conditions, remains largely stable to secondary island formation. Reconnection remains fast with normalized reconnection rates of around 0.14. The length of the electron current layer ∆x scales as $m_e^{3/8}$. Since the width δ of the current layer scales with the electron skin depth c/ωpe, the aspect ratio δ/∆x ∝ $(m_e/m_i)^{1/8}$. Extrapolating from our mi/me = 400 simulations to mi/me = 1836 should not significantly change the aspect ratio, and we therefore expect the current layer to remain stable for real mass ratios. The structure of the current layer is important to the design of NASA’s magnetospheric multiscale mission (MMS), which will be the first mission with the time resolution to measure the electron current layers that develop during reconnection. The length of the out-of-plane electron current layer projects to around c/ωpi for a proton-electron plasma, while the outflow jet, which supports a strong Hall (out-of-plane) magnetic field, extends 10’s of c/ωpi from the x-line.

FIG. 4: (color online). Results for simulation size 102.4×51.2 with mi/me = 25 and 100; and 51.2×25.6 with mi/me = 400.
(a) Cuts through the x-line of the contributions to Ohm’s law for mi/me = 100. 1 → −me/eve · ∇vez, 2 → −ẑ · ve × B/e, 3 → −ẑ · (∇ ·Pe)/(nee), 4 → sum of 1,2,3. (b) Cuts through x-line of By for the three different mi/me. This work was supported in part by NSF, NASA and Acknowledgments This work was supported in part by NASA and the NSF. Computations were carried out at the National Energy Research Scientific Computing Center. [1] M. Hesse et al., Phys. Plasmas 6, 1781 (1999). [2] M. A. Shay and J. F. Drake, Geophys. Res. Lett. 25, 3759 (1998). [3] J. Birn et al., J. Geophys. Res. 106, 3715 (2001). [4] B. N. Rogers et al., Phys. Rev. Lett. 87, 195004 (2001). [5] W. Daughton et al., Phys. Plasmas 13, 072101 (2006). [6] K. Fujimoto, Phys. Plasmas 13, 072904 (2006). [7] M. A. Shay et al., Geophys. Res. Lett. 26, 2163 (1999). [8] M. A. Shay et al., J. Geophys. Res. 106, 3751 (2001). [9] A. Zeiler et al., J. Geophys. Res. 107, 1230 (2002), doi:10.1029/2001JA000287. [10] J. F. Drake et al., Geophys. Res. Lett. 33, L13105 (2006), doi:10.1029/2006GL025957. [11] M. Hoshino et al., J. Geophys. Res. 106, 25979 (2001). [12] P. L. Pritchett, J. Geophys. Res. 106, 3783 (2001). ABSTRACT Particle in cell (PIC) simulations of collisionless magnetic reconnection are presented that demonstrate that the electron dissipation region develops a distinct two-scale structure along the outflow direction. The length of the electron current layer is found to decrease with decreasing electron mass, approaching the ion inertial length for a proton-electron plasma. A surprise, however, is that the electrons form a high-velocity outflow jet that remains decoupled from the magnetic field and extends large distances downstream from the x-line. The rate of reconnection remains fast in very large systems, independent of boundary conditions and the mass of electrons. <|endoftext|><|startoftext|> Accepted by The Astrophysical Journal Preprint typeset using LATEX style emulateapj v. 
03/07/07 POSITION–VELOCITY DIAGRAMS FOR THE MASER EMISSION COMING FROM A KEPLERIAN RING Lucero Uscanga, Centro de Radioastronomı́a y Astrof́ısica, Universidad Nacional Autónoma de México and Apartado Postal 3-72, 58089 Morelia, Michoacán, Mexico Jorge Cantó, Instituto de Astronomı́a, Universidad Nacional Autónoma de México and Apartado Postal 70-264, 04510 México, DF, Mexico Alejandro C. Raga Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México and Apartado Postal 70-543, 04510 México, DF, Mexico Accepted by The Astrophysical Journal ABSTRACT We have studied the maser emission from a thin, planar, gaseous ring in Keplerian rotation around a central mass observed edge-on. The absorption coefficient within the ring is assumed to follow a power law dependence with the distance from the central mass as, κ = κ0r −q. We have calculated position-velocity diagrams for the most intense maser features, for different values of the exponent q. We have found that, depending on the value of q, these diagrams can be qualitatively different. The most intense maser emission at a given velocity can either come mainly from regions close to the inner or outer edges of the amplifying ring or from the line perpendicular to the line of sight and passing through the central mass (as is commonly assumed). Particularly, when q > 1 the position-velocity diagram is qualitatively similar to the one observed for the water maser emission in the nucleus of the galaxy NGC 4258. In the context of this simple model, we conclude that in this object the absorption coefficient depends on the radius of the amplifying ring as a decreasing function, in order to have significant emission coming from the inner edge of the ring. Subject headings: galaxies: individual (NGC 4258) — galaxies: nuclei — masers 1. INTRODUCTION The a priori probability of seeing a thin disk nearly edge-on is very small. It is given by p ≃ 0.125 (h/R)2, where h is the thickness of the disk and R is its radius. 
Typically h/R ≃ 0.01 and thus p ≃ 1.25 × 10−5. Sur- prisingly however, the maser emission observed in sev- eral cosmic sources has been successfully modeled as coming from a ring or truncated disk in Keplerian ro- tation (around a massive object) seen edge-on. For in- stance: circumstellar disks in star-forming regions as in S255 (Cesaroni 1990) and MWC 349 (Ponomarev et al. 1994), and also circumnuclear disks around black holes of galactic nuclei as in NGC 4258 (Watson & Wallin 1994; Miyoshi et al. 1995). In general, the maser emis- sion from a Keplerian disk observed edge-on produces a triple-peaked spectrum (Elmegreen & Morris 1979); but Ponomarev et al. (1994) showed that, there is a transi- tion from triple- to double-peaked spectra as the width of the amplifying ring decreases. NGC 4258 is a Seyfert 2/LINER located at a dis- tance of 7.2 ± 0.3 Mpc (Herrnstein et al. 1999). The water maser emission (22 GHz) toward this galaxy was first detected by Claussen et al. (1984). Shortly after- wards it was shown that the water masers are confined in a very small region (∼1.3 pc) at the center of NGC 4258 (Claussen & Lo 1986). Subsequently, Nakai et al. (1993) discovered water maser emission with velocity off- Electronic address: l.uscanga@astrosmo.unam.mx Electronic address: raga@nucleares.unam.mx sets ±1000 km s−1 from the already known emission at the galactic systemic velocity of ≃472 km s−1. They sug- gested that the high-velocity emission could arise from masers orbiting a massive central black hole, or ejected in a bipolar outflow. Using the Very Long Baseline Array (VLBA), Miyoshi et al. (1995) simultaneously observed the systemic and high-velocity water maser emission in NGC 4258, finding that the spatial distribution and line- of-sight velocities of the water masers trace a thin molec- ular ring in Keplerian rotation around a massive black hole of 3.6×107 M⊙ seen nearly edge-on. 
The position-velocity (PV) diagram for the maser emission shows distinct Keplerian orbits (with deviations < 1%) defined by the high-velocity maser emission that arises on the ring diameter perpendicular to the line of sight, as well as a line traced by the systemic maser emission that arises from material on the inner edge of the amplifying ring; this linear dependence is a consequence of the change in the line-of-sight projection of the rotation velocity.

By monitoring both systemic and high-velocity water maser emission of NGC 4258 over periods of several years with different radio telescopes, a significant centripetal acceleration was observed only for the maser features near the galactic systemic velocity. The systemic maser features drift at a mean rate of ∼9 km s$^{-1}$ yr$^{-1}$ (Haschick et al. 1994; Greenhill et al. 1995; Nakai et al. 1995; Bragg et al. 2000), while the high-velocity maser features drift by ≲1 km s$^{-1}$ yr$^{-1}$ (Greenhill et al. 1995). In a recent spectroscopic study, Bragg et al. (2000) detected accelerations for the high-velocity features in the range of −0.77 to 0.38 km s$^{-1}$ yr$^{-1}$. These measurements indicate that the systemic water masers lie within a relatively narrow range of radii, on the near side of the ring close to its inner edge, while the high-velocity water masers are located near the ring diameter (between −13.6° and 9.3° of the mid-line, Bragg et al. 2000). In addition, the deviation of the high-velocity masers from a straight line passing through the systemic masers in the plane of the sky suggests that the rotating disk is slightly warped (Herrnstein et al. 1996, 1999, 2005).
Previously, Watson & Wallin (1994) demonstrated that the maser emission from a rapidly rotating, thin Keplerian ring viewed edge-on can reproduce the general features of the observed 22 GHz radiation from the nucleus of NGC 4258, including the high-velocity satellites. However, it is important to point out that their assumption of a uniform absorption coefficient within the amplifying ring results in a PV diagram for the most intense masers that is qualitatively different from the observed one. While their model predicts that the maser emission at velocities around the systemic velocity of the galaxy comes mainly from the outer edge of the ring, the observations indicate that this emission is actually coming from the inner parts of the truncated disk.

In this paper we show that this discrepancy can be resolved if the absorption coefficient decreases with distance from the central mass. The model is presented in §2. The main results are described in §3. Finally, the conclusions are discussed in §4.

2. MODEL

We study the maser emission that arises from a thin, planar, gaseous ring in Keplerian rotation around a massive central object when it is observed edge-on. The masing gas is located between R0 and R, the inner and outer radii of the ring, respectively. For simplicity, we assume that the disk is transparent to the maser radiation at radii smaller than R0 and greater than R, although the inner region is probably thermalized due to the higher gas density, and actually it would absorb a significant fraction of the maser radiation produced in the far side of the ring (see §4). The absorption coefficient is assumed to follow a power law function of the distance from the central mass within the amplifying ring as $\kappa = \kappa_0 r^{-q}$. The distances are measured in units of R, and the velocities are measured in units of vout, the rotation velocity at the outer edge of the ring (see Figure 1).
For the case of an unsaturated maser and neglecting the spontaneous emission, the intensity of the maser radiation from a line of sight with impact parameter y at a velocity vr is

$$I(v_r, y) = I_0\, e^{\tau(v_r, y)}, \qquad (1)$$

where the optical depth or gain along the line of sight is given by

$$\tau(v_r, y) = 2\kappa_0 \int_{x_{min}}^{x_{max}} (x^2 + y^2)^{-q/2} \exp\left[-\frac{(v - v_r)^2}{\Delta v_D^2}\right] dx, \qquad (2)$$

with

$$x_{min} = \begin{cases} \sqrt{r_0^2 - y^2} & \text{for } 0 \le |y| \le r_0, \\ 0 & \text{for } r_0 < |y| \le 1, \end{cases} \qquad x_{max} = \sqrt{1 - y^2}.$$

The line-of-sight velocity component of the gas at the position (x, y) can be expressed as $v = y/(x^2 + y^2)^{3/4}$. Here I0 and ∆vD are the background intensity and the Doppler width, respectively, which are supposed to be uniform inside the amplifying ring. The Doppler width ∆vD is related to the FWHM of the velocity distribution of the emitting particles as $\Delta v_D = \mathrm{FWHM}/\sqrt{4 \ln 2}$.

We have numerically solved equations (1) and (2), and we have also calculated the y-positions (impact parameters) of maximum maser intensity for each specific value of the velocity vr. When we have found two local maxima, we have kept both. With this information, we have constructed the PV diagrams using the positions of the observer’s line of sight with maximum emission at each velocity. This way of constructing the PV diagrams was previously used by Uscanga et al. (2005).

We show the results using the following values for the model parameters, which seem to be appropriate for modeling maser emission in the galaxy NGC 4258. The background intensity is $I_0 = 1.3 \times 10^{-5}$ Jy beam$^{-1}$, corresponding to a radio continuum source with a temperature of $10^6$ K (Watson & Wallin 1994). The dimensionless inner radius is r0 = 0.51, using the estimated values for the inner and outer radii of 4.1 and 8.0 mas respectively, given by Miyoshi et al. (1995). The Doppler width is ∆vD = 0.007vout, which combined with an outer rotation velocity of 770 km s$^{-1}$ (Miyoshi et al. 1995) gives a Doppler width ≃ 5 km s$^{-1}$, similar to the value used by Watson & Wallin (1994).
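The gain integral along a line of sight, and the search for the impact parameter of maximum gain at a given velocity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the midpoint quadrature, the grid sizes, and the default κ0 = 1 are all hypothetical choices, while r0 = 0.51 and ∆vD = 0.007 (in units of vout) are the values quoted above. Because the intensity grows monotonically with the gain, the position of maximum gain is also the position of maximum intensity.

```python
import math

R0 = 0.51      # dimensionless inner radius quoted in the text
DV_D = 0.007   # Doppler width in units of v_out, as quoted in the text


def los_velocity(x, y):
    """Line-of-sight velocity of Keplerian gas at (x, y), in units of v_out."""
    return y / (x * x + y * y) ** 0.75


def optical_depth(v_r, y, q, kappa0=1.0, n=2000):
    """Gain tau(v_r, y) for kappa = kappa0 * r**(-q), by the midpoint rule.

    The factor 2 accounts for the near and far halves of the ring,
    which contribute equally because the velocity is even in x.
    """
    if abs(y) >= 1.0:
        return 0.0
    x_min = math.sqrt(R0 * R0 - y * y) if abs(y) <= R0 else 0.0
    x_max = math.sqrt(1.0 - y * y)
    h = (x_max - x_min) / n
    total = 0.0
    for i in range(n):
        x = x_min + (i + 0.5) * h
        dv = (los_velocity(x, y) - v_r) / DV_D
        total += (x * x + y * y) ** (-0.5 * q) * math.exp(-dv * dv)
    return 2.0 * kappa0 * h * total


def peak_impact_parameter(v_r, q, n_y=99):
    """Impact parameter in (0, 1) with the largest gain at velocity v_r.

    Assumption: a coarse uniform scan in y; only the global maximum is
    returned, so a second local maximum would be missed.
    """
    candidates = [(k + 1) / (n_y + 1) for k in range(n_y)]
    return max(candidates, key=lambda y: optical_depth(v_r, y, q))
```

Repeating `peak_impact_parameter` over a grid of velocities would trace out one branch of a PV diagram; a finer scan and a local-maximum search would be needed to recover the cases with two maxima mentioned in the text.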
We have used some representative values of the exponent q, specifically q = 0, 1/2, 15/8, and 5 for Models I, II, III, and IV, respectively. In Model I, we study the simplest situation of a uniform absorption coefficient. In Model III, we choose q = 15/8, which corresponds to the radial density dependence of an accretion disk (Frank et al. 1992). Finally, in Models II and IV, we explore two other values of the exponent q in order to study how it changes the results. In all the models, the value of the absorption coefficient κ0 is mainly determined by the requirement that the intensity at the peak of the central component (13 Jy beam$^{-1}$) be compatible with the observational data when the background intensity is $1.3 \times 10^{-5}$ Jy beam$^{-1}$. Other values of I0, r0, ∆vD, and κ0 give qualitatively similar results.

We present the results in the next section, but let us first briefly discuss some concepts needed to understand them. In general, the observed emission at a given velocity coming from a specific position in a nebula has contributions from all the material along the line of sight. However, when the velocity gradient along the line of sight is greater than the dispersion velocity (thermal or turbulent) of the emitting material, the main contribution to the emission actually comes from a narrow region around the point with a line-of-sight velocity equal to the observation velocity. The estimated width of the region is 2l, where l is the correlation distance defined as

$$l \equiv \frac{\Delta v_D}{|dv/dx|}, \tag{3}$$

where dv/dx is the line-of-sight velocity gradient. In this approximation, known as Sobolev's approximation or the approximation of high velocity gradient, the observed intensity is given by

$$I(v_r) = I_0(v_r)\, e^{-\tau(v_r)} + S(v_r)\left(1 - e^{-\tau(v_r)}\right), \tag{4}$$

where I0 is the background intensity, S is the source function, and τ is the optical depth given by

$$\tau(v_r) = \kappa\,(2l), \tag{5}$$

where κ is the absorption coefficient.
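Equations (1) and (2) are straightforward to evaluate numerically. The following is a minimal sketch, not the authors' code: it integrates equation (2) with a midpoint rule and scans the impact parameter to find where the gain peaks at a given velocity vr. It returns the gain per unit κ0, since the position of the peak does not depend on κ0 (the intensity $I_0 e^{\tau}$ is monotonic in τ). The function names, the grid sizes, and the choice of a midpoint rule are assumptions made for illustration; r0 = 0.51 and ∆vD = 0.007 are the values quoted in the text.

```python
import math

R0 = 0.51      # dimensionless inner radius r0 (value quoted in the text)
DV_D = 0.007   # Doppler width in units of v_out (value quoted in the text)


def gain_per_kappa0(v_r, y, q, r0=R0, dv_d=DV_D, n_steps=1500):
    """Midpoint-rule estimate of tau(v_r, y) / kappa_0 from equation (2)."""
    y = abs(y)
    if y >= 1.0:
        return 0.0  # the line of sight misses the ring
    x_min = math.sqrt(max(0.0, r0 * r0 - y * y)) if y <= r0 else 0.0
    x_max = math.sqrt(1.0 - y * y)
    dx = (x_max - x_min) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = x_min + (i + 0.5) * dx
        r2 = x * x + y * y              # squared distance from the central mass
        v = y / r2 ** 0.75              # Keplerian line-of-sight velocity
        total += r2 ** (-0.5 * q) * math.exp(-((v - v_r) / dv_d) ** 2)
    return 2.0 * total * dx


def peak_impact_parameter(v_r, q, n_y=200):
    """Impact parameter y in (0, 1) with the largest gain at velocity v_r."""
    best_y, best_tau = 0.0, -1.0
    for j in range(1, n_y):
        y = j / n_y
        tau = gain_per_kappa0(v_r, y, q)
        if tau > best_tau:
            best_y, best_tau = y, tau
    return best_y
```

For vr = 0.5, a uniform coefficient (q = 0) puts the peak close to y ≈ vr, on the chord that grazes the outer edge, whereas a steep profile (q = 5) moves it close to $y \approx v_r r_0^{3/2}$, near the inner edge. This is the qualitative difference between Models I and IV.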
For the case of maser emission, the value of κ is intrinsically negative and τ is also negative; therefore the factor $e^{-\tau(v_r)}$ becomes an amplification factor. For this reason, the relative contribution of the correlation region at a given velocity is even more important with respect to the remainder of the emitting material than in the case of non-maser emission. Consequently, the approximation given by equations (4) and (5) is suitable for maser emission.

As shown in the next section, for a gaseous ring of inner radius R0 and outer radius R in Keplerian rotation and seen edge-on, the emission comes preferentially either from the inner or outer edges of the ring or from the line perpendicular to the line of sight and passing through the ring center. In the first two cases, it is easy to show that the expected PV diagram will be a straight line. When the emission comes from the outer edge, the slope of the straight line is equal to one (measuring the distances in units of the outer radius of the ring and the velocities in units of the rotation velocity at that point), whereas if the emission comes from the inner edge, the slope of the straight line is equal to $1/r_0^{3/2}$. On the other hand, when the emission arises from the line perpendicular to the line of sight, the PV diagram will be a curve of the form $v = 1/y^{1/2}$, where y is the impact parameter of the observation (see Figure 2).

3. RESULTS

The PV diagrams for the maser emission peak are point-symmetric; consequently, we only discuss positive velocities from now on (see Figures 3 and 4).

• Model I (q = 0) – With this value of the exponent q, we are considering the simplest situation, a uniform or constant absorption coefficient. The strongest maser emission comes mostly either from the outer edge of the ring at velocities lower than 1, or from the mid-line of the ring perpendicular to the line of sight and passing through the central mass at greater velocities.
The filled squares, circles, and triangles mark the regions of strongest maser emission at each velocity.
• Model II (q = 1/2) – The results are qualitatively similar to those of Model I.
• Model III (q = 15/8) – This value of exponent q corresponds to the density dependence with radius of an accretion disk ($\rho \propto r^{-15/8}$). The strongest maser emission either comes mainly from the inner edge of the ring at low velocities (velocities near the systemic velocity), or from the outer edge at velocities close to 1. On the other hand, at velocities greater than 1, the most intense emission comes predominantly from the mid-line of the ring perpendicular to the line of sight.
• Model IV (q = 5) – The strongest maser emission comes mainly from the inner edge of the ring at velocities lower than 1. At greater velocities, the most intense emission can either come mainly from the inner edge or from the mid-line of the ring perpendicular to the line of sight.

In summary, from the results of Models I–IV (see Figure 3), we found that the most intense maser emission can be around the inner or outer edges of the ring, or the mid-line of the ring perpendicular to the line of sight, depending on the velocity and also on the value of q. In fact, the PV diagrams are qualitatively different when q < 1 or q > 1. In the first case, for q < 1 (including the simplest situation with a uniform absorption coefficient, q = 0) and vr < 1, the PV diagram corresponds to a straight line with slope 1; for vr > 1, the diagram corresponds to a Keplerian curve. In the second case, for q > 1 and vr < 1, the PV diagram corresponds to a straight line with a slope that depends on the inner radius of the ring.
At velocities close to 1 the slope changes to 1; for vr > 1, the diagram corresponds to a Keplerian curve and also, under circumstances such as those in Model IV, to a straight line with a slope that depends on the inner radius.

It is also important to realize that when q > 1, the optical depth or gain presents two local maxima within a certain velocity range. Either local maximum may be the global maximum. For vr < 1, the local maxima can be at the inner and/or outer edges of the ring, while for vr > 1, they are located at the inner edge and/or mid-line of the ring (see Figure 5). As shown in the top panels of Figure 5 (Model III, q = 15/8), the relative difference between the two local maxima is not very significant. However, when the value of q is higher (as in Model IV, q = 5, shown in the bottom panels), the relative difference becomes more important.

In order to estimate the velocity vc at which the global maximum of the optical depth changes its locus, we have calculated analytical approximations for the largest value of the optical depth or gain that corresponds to the maximum intensity at the inner and outer edges of the ring, and also at the mid-line of the ring perpendicular to the line of sight. The detailed calculations are presented in the Appendix. The following equations give the local maximum of the optical depth as a function of the velocity in each neighborhood:

$$\tau(v_r) \simeq \frac{4\sqrt{\pi}\,\kappa_0\,\Delta v_D\, r_0^{1-q}}{3\, v_r \sqrt{1 - r_0 v_r^2}} \quad \text{(inner edge)}, \tag{6}$$

$$\tau(v_r) \simeq \frac{4\sqrt{\pi}\,\kappa_0\,\Delta v_D}{3\, v_r \sqrt{1 - v_r^2}} \quad \text{(outer edge)}, \tag{7}$$

$$\tau(v_r) \simeq 4\kappa_0 \sqrt{\frac{\Delta v_D}{3}}\; v_r^{\,2q - 5/2} \quad \text{(mid-line)}. \tag{8}$$

The velocity vc is estimated by combining equations (6) and (7), or equations (6) and (8), according to the value of vc (when vc < 1 or vc > 1, respectively). The results are

$$v_c^2 = \frac{r_0^{2(1-q)} - 1}{r_0^{2(1-q)} - r_0} \quad \text{for } v_c < 1, \tag{9}$$

$$\frac{\pi \Delta v_D}{3}\, r_0^{2(1-q)} - v_c^{\,4q-3} + r_0\, v_c^{\,4q-1} = 0 \quad \text{for } v_c > 1, \tag{10}$$

which are presented in Figure 6 for some representative values of the exponent q. The bottom plot of Figure 6 shows vc as a function of the inner radius r0 for different values of q, from equation (9).
For q < 1, there is no solution to equation (9). When q = 1, vc = 0 for any value of r0. That is, the optical depth has a single maximum whose locus is around the outer edge of the ring, and vc is meaningless as we have defined it. When q > 1, the optical depth presents two local maxima, and vc is different from zero and its value depends on r0. This velocity corresponds to the value at which the locus of the global maximum changes from the inner edge to the outer edge of the ring. As a consequence, there is a slope change in the PV diagrams at velocities lower than the rotation velocity at the outer edge of the ring. For instance, when q = 15/8 and r0 = 0.51, the slope change occurs at vc = 0.906. In other words, the locus of the global maximum of the optical depth changes from the inner to the outer edge of the ring at this velocity vc.

The top plot of Figure 6 also shows vc as a function of the inner radius r0 using specific values of q and ∆vD in equation (10); in this case, 5 and 0.007 vout, respectively. As an example, when ∆vD = 0.007 vout, q = 5, and r0 = 0.51, then vc = 1.077. Stated differently, at that velocity vc, the largest value of both the optical depth and the intensity changes its locus from the inner edge to the mid-line of the ring perpendicular to the line of sight.

The remarkable water maser emission in the nucleus of the galaxy NGC 4258 traces a PV diagram where the detected emission around the systemic velocity of the galaxy comes from the inner edge of the amplifying ring; this emission delineates a straight line just like the straight line that connects points C and D in Figure 2 (see Figure 3 of Miyoshi et al. 1995). According to our model results, this implies that the absorption coefficient within the molecular ring of NGC 4258 is not uniform; instead, it must be a decreasing function of the distance from the central mass, i.e., $\kappa = \kappa_0 r^{-q}$ with q > 1.
Moreover, the observed red/blue-shifted emission at high velocities that arises from the mid-line of the ring perpendicular to the line of sight traces a Keplerian curve, as indicated by the model results (see Figure 7). Simply stated, when q > 1 the PV diagram is qualitatively similar to the one observed for the water maser emission detected in the nucleus of NGC 4258.

As an example, in Figure 7 we show a comparison between the results of Model III (q = 15/8) and the water maser emission in NGC 4258. The detected emission arises from the inner edge of the amplifying ring and the mid-line perpendicular to the line of sight. The locus of the observed maser emission coincides with the locus of the most intense maser emission, as indicated by the sizes of the circles in Figure 7. The model results indicate that there is emission coming from the outer edge of the ring at velocities close to 1; nevertheless, the sizes of the circles indicate that this emission is very weak. This may be why maser emission is not detected from this locus.

Additionally, our model results also indicate that the intense maser emission at the inner edge of the ring extends neither to velocities very different from the systemic velocity nor to impact parameters very different from zero, as indicated by the size of the circles in the PV diagram shown in Figure 7. Furthermore, according to the size of the circles, the other locus of intense maser emission is the mid-line of the ring perpendicular to the line of sight, precisely the locus of the red/blue-shifted maser emission at high velocities that describes Keplerian curves in the PV diagram.

4. DISCUSSION AND CONCLUSIONS

In our model, we have assumed that the gas in the region inside the masing ring is transparent to the maser radiation.
This implies that the most intense maser emission at low velocities (velocities near the systemic velocity) comes mainly from the outer edge of the ring (for q < 1) or from the inner edge (for q > 1), on either the near or the far side of the ring, as indicated in Figure 4. Measurements of positive acceleration of the maser emission around the systemic velocity show that this emission certainly comes from the near side of the ring in the proximity of its inner edge (e.g., Greenhill et al. 1995). If we suppose that the gas inside the masing ring is thermalized, probably because of its higher density, then an important fraction of the maser emission from the backside of the ring would be absorbed, and the detected emission would come from the front side of the ring at the outer or inner edge, depending on the value of q. For instance, considering absorption and emission from the gas located inside the masing ring, the difference between the intensity for a line of sight that passes through both the inner absorbing region and the front side of the masing ring and the intensity for a line of sight that passes through both the backside of the masing ring and the inner absorbing region is $S(1 - e^{-\tau_2})(e^{\tau_1} - 1)$, where S is the source function of the gas inside the masing ring, τ2 is the optical depth in this region, and τ1 is the optical depth for the front side of the ring. If $\tau_2 \gg \tau_1$, then the detected emission would be the radiation amplified by the front side of the masing ring.

Also, we have made a simplifying assumption about the geometry of the masing ring in NGC 4258, considering that the amplifying ring is strictly flat. Although the observations indicate an apparent warp in the maser distribution of this galaxy, Kartje et al. (1999) presented a model in which the disk does not need to be physically warped for the masing gas to become exposed to the central continuum radiation.
In this scenario, dusty clouds provide the shielding of the high-energy continuum, which is required for the gas to remain molecular. They found that a flat-disk model of the irradiated ring could be applied to a source like NGC 4258 only if the water abundance is higher than the value implied by equilibrium photoionization-driven chemistry. A very important result from their study (based on radiative and kinematic considerations) was that, even if the disk in NGC 4258 is warped, the maser-emitting gas must be clumpy, instead of homogeneous as in the scenario previously proposed by Neufeld & Maloney (1995).

An important result of our model is that the commonly used assumption of a uniform or constant absorption coefficient within the masing ring in Keplerian rotation around the nucleus of NGC 4258 is not appropriate. For example, Wallin et al. (1998) supposed that κ was constant, considering that the locus of the maser emission from NGC 4258 was determined mainly by the velocity gradients in a Keplerian velocity field, indicating some uniformity of κ, at least on length scales comparable to the coherence or correlation length resulting from the Keplerian velocity gradients. On the contrary, from our analysis, we conclude that a constant absorption coefficient would result in a PV diagram qualitatively different from the observed one, since the most intense maser emission would come predominantly from a narrow region close to the outer edge of the ring instead of a narrow region close to the inner edge of the ring, as indicated by the observations. Necessarily, the absorption coefficient must be a decreasing function of distance from the central mass (i.e., $\kappa = \kappa_0 r^{-q}$ with q > 1) to have significant emission coming from the inner edge of the amplifying ring and hence explain the form of the PV diagram delineated by the water masers in NGC 4258.
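As a quick numerical check of the q > 1 requirement, the transition velocity of equation (9) can be evaluated in closed form. The sketch below assumes the form $v_c^2 = (r_0^{2(1-q)} - 1)/(r_0^{2(1-q)} - r_0)$, which follows from equating the inner-edge and outer-edge gains of equations (6) and (7). The function name is hypothetical, and returning `None` when no transition with 0 ≤ vc < 1 exists is a design choice made for this illustration.

```python
import math


def transition_velocity(q, r0):
    """Velocity v_c (< 1) at which the global gain maximum moves from the
    inner edge to the outer edge of the ring, or None if there is none."""
    a = r0 ** (2.0 * (1.0 - q))
    denom = a - r0
    if denom == 0.0:
        return None
    vc2 = (a - 1.0) / denom
    if vc2 < 0.0 or vc2 >= 1.0:
        return None
    return math.sqrt(vc2)


print(transition_velocity(15.0 / 8.0, 0.51))  # approximately 0.906 (Model III)
print(transition_velocity(1.0, 0.51))         # 0.0: the outer edge dominates
print(transition_velocity(0.0, 0.51))         # None: no transition for q < 1
```

With the Model III parameters (q = 15/8, r0 = 0.51) this reproduces the slope change at vc ≈ 0.906 quoted in §3. For q ≤ 1 the inner edge never dominates at low velocities, which is why a uniform coefficient cannot reproduce the observed systemic emission.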
When comparing our edge-on disk model with the observations of NGC 4258, it is clear that we need a $\kappa \propto r^{-q}$ radial dependence for the absorption coefficient with q > 1 (so as to favour the emission from the inner edge of the disk, see above) in order to reproduce the observations. In reality, the fact that the disk of NGC 4258 is warped introduces geometrical effects which might favour the inner disk edge emission (over that of the outer edge). One will need to compute more complex, 3D transfer models to see whether or not these geometrical effects are sufficient to explain the PV diagrams of the NGC 4258 masers without introducing the radially dependent absorption coefficient which is required by the edge-on disk models described in the present paper.

J. C. and A. C. R. acknowledge support from CONACyT grants 41320 and 43103, and DGAPA-UNAM. L. U. acknowledges support from DGAPA-UNAM. We sincerely thank J. M. Torrelles and Y. Gómez for useful comments, which helped improve an earlier version of this manuscript. L. U. gives special thanks to M. R. Pestalozzi and M. Elitzur for valuable comments on this work. We also thank an anonymous referee for helpful comments on the manuscript.

APPENDIX

ANALYTICAL APPROXIMATIONS FOR THE OPTICAL DEPTH

In this appendix, we describe how to obtain the analytical approximations for the optical depth given by equations (6)–(8). First, we define w = (v − vr)/∆vD; then we can change the variable in equation (2) and rewrite it as

$$\tau(v_r, y) = 2\kappa_0 \int_{w_{\min}}^{w_{\max}} (x^2 + y^2)^{-q/2} \exp(-w^2)\, \frac{dx}{dw}\, dw, \tag{A1}$$

where

$$w_{\min} = \frac{y/r_0^{3/2} - v_r}{\Delta v_D}, \qquad w_{\max} = \frac{y - v_r}{\Delta v_D}.$$

Note that wmin > wmax since r0 ≤ 1. Using the expression for the line-of-sight velocity, $v = y/(x^2 + y^2)^{3/4}$, and the previously defined variable w = (v − vr)/∆vD, we can write

$$x = \sqrt{\left(\frac{y}{v_r + w\,\Delta v_D}\right)^{4/3} - y^2}.$$

At zero order around w = 0 we obtain

$$(x^2 + y^2)^{-q/2}\, \frac{dx}{dw} \simeq -\frac{2\,\Delta v_D\, (y/v_r)^{2(2-q)/3}}{3\, v_r \sqrt{(y/v_r)^{4/3} - y^2}}. \tag{A2}$$

Additionally,

$$\int_{w_{\min}}^{w_{\max}} \exp(-w^2)\, dw = \frac{\sqrt{\pi}}{2} \left[\mathrm{erf}(w_{\max}) - \mathrm{erf}(w_{\min})\right], \tag{A3}$$

where erf(w) is the error function, defined as $\mathrm{erf}(w) \equiv (2/\sqrt{\pi}) \int_0^w \exp(-t^2)\, dt$. Finally, substituting equations (A2) and (A3) into (A1), we obtain the approximation for the optical depth

$$\tau(v_r, y) \simeq \frac{2\sqrt{\pi}\,\kappa_0\,\Delta v_D\, (y/v_r)^{2(2-q)/3}}{3\, v_r \sqrt{(y/v_r)^{4/3} - y^2}} \left[\mathrm{erf}(w_{\min}) - \mathrm{erf}(w_{\max})\right]. \tag{A4}$$

Around the inner edge of the ring, $v_r \simeq y/r_0^{3/2}$ and the maximum value of [erf(wmin) − erf(wmax)] is 2. Substituting these approximations into equation (A4), we obtain equation (6). Similarly, around the outer edge of the ring, vr ≃ y, and the maximum value of [erf(wmin) − erf(wmax)] also equals 2. Then, equation (A4) reduces to equation (7) for the optical depth at the outer edge of the ring.

In order to find an approximation for the local maximum of the optical depth at the mid-line of the ring perpendicular to the line of sight, we expand the expression for the velocity along the line of sight around x = 0 to obtain

$$v \simeq \frac{1}{y^{1/2}} \left(1 - \frac{3x^2}{4y^2}\right), \tag{A5}$$

therefore, with $v_r = y^{-1/2}$,

$$v - v_r \simeq -\frac{3x^2}{4y^{5/2}} = -\Delta v_D, \tag{A6}$$

and thus

$$x = \sqrt{\frac{4}{3}\Delta v_D}\; y^{5/4}. \tag{A7}$$

Then, using Sobolev's approximation

$$\tau = \kappa_0 (x^2 + y^2)^{-q/2}\,(2x), \tag{A8}$$

and substituting equation (A7) into (A8), we obtain the following expression

$$\tau \simeq \kappa_0 \left(\frac{4}{3}\Delta v_D\, y^{5/2} + y^2\right)^{-q/2} 4\sqrt{\frac{\Delta v_D}{3}}\; y^{5/4}. \tag{A9}$$

Since y ≪ 1 and ∆vD is small, $\frac{4}{3}\Delta v_D\, y^{5/2} \ll y^2$, hence

$$\tau \approx 4\kappa_0 \sqrt{\frac{\Delta v_D}{3}}\; y^{5/4 - q}. \tag{A10}$$

Considering that $y = v_r^{-2}$, we finally obtain the approximation for the local maximum of the optical depth at the mid-line of the ring given by equation (8).

REFERENCES

Bragg, A. E., Greenhill, L. J., Moran, J. M., & Henkel, C. 2000, ApJ, 535, 73
Cesaroni, R. 1990, A&A, 233, 513
Claussen, M. J., Heiligman, G. M., & Lo, K. Y. 1984, Nature, 310,
Claussen, M. J., & Lo, K.-Y. 1986, ApJ, 308, 592
Elmegreen, B. J., & Morris, M. 1979, ApJ, 229, 593
Frank, J., King, A., & Raine, D. 1992, Accretion Power in Astrophysics (Cambridge University Press)
Greenhill, L. J., Henkel, C., Becker, R., Wilson, T. L., & Wouterloot, J. G. A. 1995, A&A, 304, 21
Haschick, A. D., Baan, W. A., & Peng, E. W.
1994, ApJ, 437, L35
Herrnstein, J. R., Greenhill, L. J., & Moran, J. M. 1996, ApJ, 468,
Herrnstein, J. R., Moran, J. M., Greenhill, L. J., & Trotter, A. S. 2005, ApJ, 629, 719
Herrnstein, J. R., et al. 1999, Nature, 400, 539
Kartje, J. F., Königl, A., & Elitzur, M. 1999, ApJ, 513, 180
Miyoshi, M., Moran, J., Herrnstein, J., Greenhill, L., Nakai, N., Diamond, P., & Inoue, M. 1995, Nature, 373, 127
Nakai, N., Inoue, M., Miyazawa, K., Miyoshi, M., & Hall, P. 1995, PASJ, 47, 771
Nakai, N., Inoue, M., & Miyoshi, M. 1993, Nature, 361, 45
Neufeld, D. A., & Maloney, P. R. 1995, ApJ, 447, L17
Ponomarev, V. O., Smith, H. A., & Strelnitski, V. S. 1994, ApJ, 424, 976
Uscanga, L., Cantó, J., Curiel, S., Anglada, G., Torrelles, J. M., Patel, N. A., Gómez, J. F., & Raga, A. C. 2005, ApJ, 634, 468
Wallin, B. K., Watson, W. D., & Wyld, H. W. 1998, ApJ, 495, 774
Watson, W. D., & Wallin, B. K. 1994, ApJ, 432, L35

Fig. 1. Schematic diagram of a gaseous disk in Keplerian rotation. The masing gas exists between radii R0 and R. At radii smaller than R0 and greater than R the disk is transparent to the maser radiation. The observer is on the plane of the disk. All the distances are measured in units of R, the outer radius of the amplifying ring; therefore the variables x, y, and r0 are dimensionless.

Fig. 2. PV diagram for the maser emission of a gaseous ring with inner radius R0 and outer radius R in Keplerian rotation observed edge-on. The curves are labeled $v = y$, $v = 1/y^{1/2}$, and $v = y/r_0^{3/2}$. The straight line that connects points A and B has a slope equal to 1, while the straight line that connects points C and D has a slope equal to $1/r_0^{3/2}$.

Fig. 3. From top to bottom, results of Models I, II, III, and IV. The left panels show PV diagrams for the maser emission peak. The filled squares, circles, and triangles represent the strongest maser emission, which is coming from regions a, b, and c, respectively, indicated in Figure 4. The straight lines or curves represent the velocity dependences of the regions where this emission arises. The central panels show PV diagrams for the maser emission peak. Because of the point-symmetric shape of these diagrams, only positive velocities are shown. The radii of the open circles are proportional to the maximum maser intensity at each position and velocity. The right panels show the logarithm of the ratio between the maximum intensity and the background intensity as a function of the velocity.

Fig. 4. Schematic representation of the results of Models I, II, III, and IV (panels: Models I and II; Model III; Model IV; region labels: a, $v = y$; b, $v = 1/y^{1/2}$; c, $v = y/r_0^{3/2}$). The filled squares, circles, and triangles represent the most intense maser emission that is coming from regions a, b, and c, respectively. These regions are very narrow because the correlation distance is very small; that is, the width ∆vD is much smaller than the line-of-sight velocity gradient. Besides, the exponential amplification of the intensity emphasizes small changes in the optical depth.

Fig. 5. Left: PV diagrams for the maser emission in grey scale with intensity contours overlaid. The darker regions show the locus of the strongest emission in these diagrams, which potentially could be detected, depending on the sensitivity cutoff of the observations. Right: Close-up of the PV diagrams showing the maser emission at low velocities.

Fig. 6. vc as a function of the inner radius of the ring, r0. Bottom: For vc < 1, it is computed from equation (9) for some representative values of q. For q < 1, there is no solution to equation (9). Top: For vc > 1, it is computed from equation (10), using ∆vD = 0.007 vout and q = 5.

Fig. 7. Comparison between the calculated PV diagram for the maser emission peak in Model III (q = 15/8) and the PV diagram delineated by the water masers observed in NGC 4258 (Miyoshi et al. 1995). The radii of the open circles are proportional to the maser intensity at each position and velocity. The dots represent the observed maser spots in NGC 4258. We have subtracted the ring systemic velocity of 476 km s$^{-1}$ from the observed local standard of rest velocity of the maser spots in order to compare the observed PV diagram with the modeled one. The positions and velocities are in units of the outer radius of the ring (8 mas) and the rotation velocity at the outer edge of the ring (770 km s$^{-1}$), respectively.

ABSTRACT

We have studied the maser emission from a thin, planar, gaseous ring in Keplerian rotation around a central mass observed edge-on. The absorption coefficient within the ring is assumed to follow a power-law dependence on the distance from the central mass, $\kappa = \kappa_0 r^{-q}$. We have calculated position-velocity diagrams for the most intense maser features, for different values of the exponent q. We have found that, depending on the value of q, these diagrams can be qualitatively different. The most intense maser emission at a given velocity can either come mainly from regions close to the inner or outer edges of the amplifying ring or from the line perpendicular to the line of sight and passing through the central mass (as is commonly assumed). In particular, when q > 1 the position-velocity diagram is qualitatively similar to the one observed for the water maser emission in the nucleus of the galaxy NGC 4258. In the context of this simple model, we conclude that in this object the absorption coefficient must be a decreasing function of radius within the amplifying ring in order to have significant emission coming from the inner edge of the ring.
<|endoftext|><|startoftext|> Coupling of whispering-gallery modes in size-mismatched microdisk photonic molecules

Svetlana V. Boriskina

School of Radiophysics, V. Karazin Kharkov National University, Kharkov 61077, Ukraine

Mechanisms of whispering-gallery (WG) mode coupling in microdisk photonic molecules (PMs) with slight and significant size mismatch are numerically investigated. The results reveal two different scenarios of mode interaction depending on the degree of this mismatch and offer new insight into how PM parameters can be tuned to control and modify WG-mode wavelengths and Q-factors. From a practical point of view, these findings offer a way to fabricate PM microlaser structures that exhibit low thresholds and directional emission, and at the same time are more tolerant to fabrication errors than previously explored coupled-cavity structures composed of identical microresonators. © 2007 Optical Society of America.

OCIS codes: 130.3120, 140.3410, 140.4780, 140.5960, 230.5750, 260.3160

During the last decade, photonic molecules1 (clusters of electromagnetically-coupled optical microcavities) have come a long way from a useful illustration of parallels between the behavior of photons and electrons, and now hold promise of new insights into the physics of light-matter interactions and also of a variety of practical applications, including microlasers, tunable filters and switches, coupled-cavity waveguides, sensors, etc2-10. The simplest PM, composed of two identical optical microcavities, has been widely used as a test-bed to demonstrate shift and splitting of cavity modes and formation of a spectrum of bonding and anti-bonding PM supermodes1-4. I have recently shown how arranging identical WG-mode microdisks into pre-designed high-symmetry configurations yields quasi-single-mode PMs with dramatically increased Q-factors6, enhanced sensitivity to environmental changes7, and/or directional emission patterns8.
In all these structures, size uniformity of microcavities is an important issue in successful realization of PM-based optical components. The motivation for studying interactions of optical modes in a photonic molecule with size mismatch9,10 stems from two sources. First, precise and repeatable fabrication of identical microcavities, which in many cases are tiny structures just several microns in diameter, is highly challenging. Second, a systematic study of double-cavity PMs with various degrees of cavity size mismatch can reveal new mechanisms of manipulating their optical properties, thus paving the way to improving PM-based photonic devices or adding new functionalities to them. Such a study has never been performed before, and is the focus of this letter. Despite its simplicity, the double-cavity structure provides useful insight into the general mechanisms of WG-mode coupling and offers new design ideas for more complex structures.

The Muller boundary integral equations framework previously developed by the author to model PMs composed of identical cavities7 has been modified to study size-mismatched PMs. In the following, the term “microcavity mode” encompasses a complex value of the mode eigenfrequency and the corresponding eigenvector (i.e., modal spatial field distribution). The PM under study is composed of a pair of side-coupled microdisks of radii RA, RB and refractive indices nA, nB separated by an airgap of width w (Fig. 1a). Microdisks with thicknesses much smaller than their diameters are considered. Thus, instead of the 3-D boundary value problem for the Maxwell equations, we solve its 2-D equivalent. In the following analysis, we search for the TE (transverse-electric) WG-modes, which are dominant in thin disks.
At wavelength λ = 1.521 μm, a 2-D cavity with radius 1.1 μm and effective refractive index $n_{\mathrm{eff}}^{\mathrm{TE}} = 2.63$ (the 2-D equivalent of a 200 nm-thick GaInAsP disk)2 supports a WG8,1-mode with one radial field variation and eight azimuthal field variations (Q = 5243). This mode (like all other WG-modes in circular cavities) is double-degenerate due to the symmetry of the structure.

WG-mode degeneracy is removed if two (or more) cavities are brought close together1-9, and four non-degenerate WG-supermodes of different symmetry appear in the double-disk PM spectrum instead of every WG-mode of an isolated cavity. Fig. 1 (b and c) shows the wavelength migration and Q-factor change of these modes with the change of the radius of one of the cavities. The modes are labeled according to the symmetry of their field patterns along the y- and x-axes, respectively (e.g., the OE supermode has odd symmetry with respect to the y-axis and even symmetry with respect to the x-axis). OE and OO modes are termed “anti-bonding” modes, while EO and EE modes are termed “bonding” ones. Bonding and anti-bonding supermodes group into nearly-degenerate doublets, as seen in Fig. 1b. The real parts of the eigenfrequencies of the two modes in a doublet are so close to each other that they cannot be distinguished (Fig. 1b), while their imaginary parts differ, resulting in different Q-factors of these supermodes (Fig. 1c). Thus, in practice only two peaks are observed in a symmetrical double-cavity PM lasing spectrum (see Fig. 2 in Ref. 9), where the narrow high-intensity peak corresponds to the high-Q anti-bonding mode doublet, and the wider low-intensity peak corresponds to the bonding one.

Fig. 1. (a) Geometry of a PM composed of two microdisks of different radii (RA = 1.1 μm, RB, airgap width w); (b) wavelength migration and (c) Q-factor change of PM supermodes as a function of the radius of disk B, RB (μm) (RA = 1.1 µm, w = 400 nm).
The insets show the magnetic field distribution of the bonding (EE) and the anti-bonding (OE) WG8,1 supermodes in the symmetrical (RA = RB = 1.1 µm) PM. Here and thereafter, corresponding characteristics of the WG8,1 mode of an isolated cavity with radius 1.1 µm are plotted for comparison (dash-dot line).

Fig. 2. Supermode wavelengths (left) and Q-factors (right) as a function of the radius of microdisk B, RB (μm), in the vicinity of anti-crossing point AC1 (RB = RA). Mode switching (see the modal near-field distributions at points A and B shown in the insets) at the anti-crossing point and Q-factor enhancement of one of the supermodes can be observed.

Careful study of Figs. 1 b,c reveals a number of so-called exceptional points (corresponding to certain values of the varied parameter), where PM supermodes couple following either crossing (C) or avoided crossing (AC) scenarios. The behavior of wavelengths and Q-factors of the four supermodes in the vicinity of these exceptional points is shown in more detail in Figs. 2-4 (for the points AC1, AC2, and C1, respectively). The phenomenon of coupling of complex eigenvalues of matrices dependent on parameters under the change of these parameters is of a general nature and is observed in many physical systems11. Usually, frequency anti-crossing (crossing) is accompanied by crossing (anti-crossing) of the corresponding widths of the resonance states. Furthermore, at the points of avoided frequency crossing (points AC1-AC3 in Fig. 1), eigenmodes interchange their identities, i.e., Q-factors and field distributions.

Fig. 3. Supermode wavelengths (left) and Q-factors (right) as a function of the radius of microdisk B, RB (μm), in the vicinity of anti-crossing point AC2. Wavelength repulsion accompanied by linewidth crossing is observed.
The insets demonstrate mode switching at the anti-crossing point (the modal near-field distributions shown in the upper (lower) insets correspond to the complex frequency values at the points labeled A and D (B and C), respectively).

In the context of coupling of WG-modes in microcavities, this interchange offers exciting new prospects for manipulating the PM optical characteristics, e.g., for the realization of optical flip-flops9. For example, the coupling of modes with the avoided frequency crossing scenario observed in Figs. 2 and 3 makes it possible to switch the field intensity between the two microdisks. To realize such switching in practice, a carrier-induced refractive index change in one of the disks, produced by nonuniform pumping, can be used. This effect was observed experimentally11 in a PM composed of nearly identical microdisks (similar to the case shown in Fig. 2). If the microcavities are severely size-mismatched, their WG-modes couple with the frequency crossing scenario. This situation is demonstrated in Fig. 4, and the numerical data indicate that such coupling may significantly spoil the Q-factors of the high-Q modes in the larger microdisk. However, optical coupling between cavities with strongly detuned WG-modes makes possible broad spectral transmission effects in coupled resonator optical waveguides (CROWs)10, coupled-resonator induced transparency12, and a significant reduction of CROW bend radiation losses13.

Fig. 4. Supermode wavelengths (left) and Q-factors (right) versus the radius of microdisk B in the vicinity of the crossing point C1. Wavelength crossing accompanied by damping of the high-Q supermodes is observed. The insets show supermode near-field portraits at the crossing point.

Fig. 5. Resonance wavelength (left) and Q-factor (right) of an anti-bonding WG-supermode in a three-disk PM as a function of the disk-to-disk separation.
The central disk of radius 1.065 μm is separated from the side disks of radii 1.1 μm by airgaps of 400 nm width. Supermode near-field portrait and directional far-field emission pattern at the point labeled as A are shown in the inset. Finally, enhancement of the Q-factor of a single supermode in a double-disk PM in comparison to its single- cavity value can be observed in Fig. 2 in a relatively wide range of cavities radii detuning: 14 nm < ΔR < 53 nm (ΔR = |RA - RB|). Note that all the other PM supermodes have significantly lower Q-factors in this range of the parameter change. This effect offers a way for selective enhancement of the Q-factor of a single supermode that (unlike symmetry- enhanced Q-factor boost in polygonal PMs)6,7 does not rely on exact cavity size-matching. A possible realization of a PM-based structure designed by making use of this mechanism of selective mode enhancement is presented in Fig. 5. It consists of three coupled microcavities, with the central cavity radius detuned by ΔR = 35 nm from the side cavities radii. By adjusting the width of the airgaps between microcavities, noticeable Q-factor enhancement of one anti- bonding supermode is achieved without shifting the supermode wavelength (Fig. 5). Furthermore, such PM demonstrates directional light emission, which cannot be achieved in isolated WG-mode microdisks (see inset to Fig. 5). Our studies also indicate that this directional emission pattern is preserved if the disk-to-disk distance is varied. It should also be noted that other system parameters can be tuned to manipulate resonance wavelengths and Q- factors of microcavities through mode coupling at exceptional points. Among these are: the refractive index of the cavity substrate, and the size and/or position of a hole pierced in the cavity, which can be adjusted to enhance a WG-mode Q- factor14,15 or to achieve directional emission on a high-Q WG-mode16. 
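The statement that a frequency anti-crossing (crossing) is usually accompanied by a crossing (anti-crossing) of the resonance widths can be illustrated with a generic two-mode toy model. The sketch below is not the numerical method used in this paper: it simply diagonalizes a hypothetical 2x2 matrix with complex diagonal entries `w1`, `w2` and a real coupling `g`, all of which are made-up illustrative numbers. The Q-factor convention Q = Re(ω) / (2|Im(ω)|) is also an assumption.

```python
import cmath


def coupled_eigenfrequencies(w1, w2, g):
    """Eigenvalues of the 2x2 matrix [[w1, g], [g, w2]].

    w1, w2 are complex mode frequencies (Im < 0 means loss) and g is a
    coupling constant; all three are illustrative, hypothetical numbers.
    """
    mean = (w1 + w2) / 2
    half_diff = (w1 - w2) / 2
    root = cmath.sqrt(half_diff * half_diff + g * g)
    return mean + root, mean - root


def q_factor(w):
    """Q = Re(w) / (2 |Im(w)|), one common convention (assumed here)."""
    return w.real / (2 * abs(w.imag))


if __name__ == "__main__":
    # Two modes with equal real parts but different losses.
    w1, w2 = complex(1.0, -0.01), complex(1.0, -0.03)
    for g in (0.05, 0.005):  # strong vs. weak coupling
        a, b = coupled_eigenfrequencies(w1, w2, g)
        print(f"g={g}: Re split={abs(a.real - b.real):.4f}, "
              f"Im split={abs(a.imag - b.imag):.4f}, "
              f"Q=({q_factor(a):.1f}, {q_factor(b):.1f})")
```

In this toy model, a coupling larger than half the loss difference splits the real parts while the imaginary parts coincide (frequency anti-crossing with width crossing); a smaller coupling does the opposite.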
In summary, a comprehensive numerical study was performed to elucidate the mechanisms of modes coupling in PMs with various degrees of cavities size mismatch. The study offers an alternative approach to design novel PM- based components with improved functionalities. In contrast to PM structures composed of identical cavities that may require fabrication accuracy beyond the capabilities of modern technology, the proposed approach does not rely on precise cavities size-matching to achieve the desired device performance. This approach paves the way for new designs of more complex PM structures and arrays, which may eventually lead to new capabilities and applications in micro- and nano-photonics. I wish to thank Vasily Astratov for discussions and Jan Wiersig for bringing his recent paper16 to my attention. This work has been partially supported by the NATO Collaborative Linkage Grant CBP.NUKR.CLG 982430. Svetlana Boriskina’s e-mail address is SBoriskina@gmail.com. References 1. M. Bayer, T. Gutbrod, J. P. Reithmaier, A. Forchel, T. L. Reinecke, and P. A. Knipp, Phys. Rev. Lett. 81, 2582-2586 (1998). 2. A. Nakagawa, S. Ishii, and T. Baba, Appl. Phys. Lett. 86, 041112 (2005). 3. E. I. Smotrova, A. I. Nosich, T. M. Benson, and P. Sewell, IEEE J. Select. Topics Quantum. Electron. 12(1), 78-85 (2006). 4. Y. P. Rakovich, J. F. Donegan, M. Gerlach, A. L. Bradley, T. M. Connolly, J. J. Boland, N. Gaponik, and A. Rogach, Phys. Rev. A 70, 051801(R) (2004). 5. B. Moller, U. Woggon, and M. V. Artemyev, J. Opt. A: Pure Appl. Opt. 8, S113–S121 (2006). 6. S. V. Boriskina, Opt. Lett. 31(3), 338-340 (2006). 7. S. V. Boriskina, J. Opt. Soc. Am. B 23(8), 1565-1573 (2006). 8. S. V. Boriskina, T. M. Benson, P. Sewell, and A. I. Nosich, IEEE J. Select. Topics Quantum Electron., 12(6), (2006). 9. S. Ishii, A. Nakagawa, and T. Baba, IEEE J. Select. Topics Quantum. Electron. 12(1), 71-77 (2006). 10. A. V. Kanaev, V. N. Astratov, and W. Cai, Appl. Phys. Lett. 88, 111111 (2006). 11. W. D. 
Heiss, Phys. Rev. E 61, 929–932 (2000). 12. D. D. Smith, H. Chang, K. A. Fuller, A. T. Rosenberger, and R. W. Boyd, Phys. Rev. A 69, 063804 (2004). 13. S. V. Pishko, P. Sewell, T. M. Benson, and S. V. Boriskina, submitted to J. Lightwave Technol. (2007). 14. S. V. Boriskina, T. M. Benson, P. Sewell, and A. I. Nosich, J. Lightwave Technol. 20(8) 1563-1572 (2002). 15. X.-S. Luo, Y.-Z. Huang, and Q. Chen, Opt. Lett. 31(8), 1073-1075 (2006). 16. J. Wiersig and M. Hentschel, Phys. Rev. A 73, 031802 (2006). ABSTRACT Mechanisms of whispering-gallery (WG) modes coupling in microdisk photonic molecules (PMs) with slight and significant size mismatch are numerically investigated. The results reveal two different scenarios of modes interaction depending on the degree of this mismatch and offer new insight into how PM parameters can be tuned to control and modify WG-modes wavelengths and Q-factors. From a practical point of view, these findings offer a way to fabricate PM microlaser structures that exhibit low thresholds and directional emission, and at the same time are more tolerant to fabrication errors than previously explored coupled-cavity structures composed of identical microresonators. <|endoftext|><|startoftext|> Spin solid phases of spin 1 and spin 3/2 antiferromagnets on a cubic lattice. Karol Gregor and Olexei I. Motrunich Department of Physics, California Institute of Technology, Pasadena, CA 91125 (Dated: October 30, 2018) We study spin S = 1 and S = 3/2 Heisenberg antiferromagnets on a cubic lattice focusing on spin solid states. Using Schwinger boson formulation for spins, we start in a U(1) spin liquid phase proximate to Neel phase and explore possible confining paramagnetic phases as we transition away from the spin liquid by the process of monopole condensation. Electromagnetic duality is used to rewrite the theory in terms of monopoles. 
For spin 1 we find several candidate phases of which the most natural one is a phase with spins organized into parallel Haldane chains. For spin 3/2 we find that the most natural phase has spins organized into parallel ladders. As a by-product, we also write a Landau theory of the ordering in two special classical frustrated XY models on the cubic lattice, one of which is the fully frustrated XY model. In a particular limit our approach maps to a dimer model with 2S dimers coming out of every site, and we find the same spin solid phases in this regime as well. PACS numbers: I. INTRODUCTION A simple, nontrivial, and physically common example of a regular system of quantum objects is a collection of spins on a lattice. This is easiest to analyze if the in- teractions do not compete and all prefer the same spin state; the resulting phases have been known for a long time and include ferromagnetic and Neel states. A much richer situation of current interest is when interactions compete. The frustration together with quantum fluctu- ations can destroy the magnetic order and produce spin solid or spin liquid phases. In a spin solid, spins combine into larger singlet objects such as valence bonds which form an ordered pattern on a lattice. Such phases have been found in nature,1,2,3 and also in numerical studies of model Hamiltonians.4,5,6 A spin liquid, on the other hand, is a featureless paramagnet, which can be crudely viewed as a quantum superposition of many valence bond configurations, thus the name “resonating valence bonds” (RVB) state. So far there are only few experimental can- didates, but on the theoretical side the existence of spin liquids in many varieties and our understanding of them is well established (see Ref. 7 for a recent collection of references and also a very recent example of the so-called Coulomb phase in 3d, which is the spin liquid relevant to the present work). 
In this paper we look for natural spin solid phases of spin 1 and spin 3/2 on a cubic lattice. A direct study of spin Hamiltonians that can stabilize such phases is difficult but can be done in some cases with Quantum Monte Carlo. Which phases are realized will of course depend on the specific model: For example, Refs. 4,5 found valence bond solids in spin 1/2 systems with ring exchanges on the square and cubic lattices. Refs. 6,8 found spin solid phases for a spin 1 model with biquadratic interaction on the anisotropic square lattice, but only magnetically ordered phases on the isotropic square and cubic lattices. Here we follow instead a more phenomenological approach.9,10,11,12 A systematic and commonly used route to achieve this, and the one we start with, is to generalize the spins to a representation of higher symmetry group, here taken to be SU(N).9,10 The problem can be solved exactly in the N → ∞ limit and one can consider fluctuations around this limit to get long distance properties of the system. This approach, while difficult to connect with the actual microscopic SU(2) spin system, nevertheless gives us some guidance about what phases to expect and gives us a form of the effective field theory. Here it results in a gauge theory which naturally exhibits deconfined (liquid) and confined (solid) phases, and we expect that if a microscopic spin system has such phases, they should be described by this theory.

FIG. 1: The most natural spin solid phase for S = 1 on the cubic lattice. The thick lines denote links with large spin-spin correlations suggesting that the spins organize into Haldane chains along one lattice direction.

FIG. 2: The most natural spin solid phase for S = 3/2 on the cubic lattice. The drawn bold lines denote links with large spin-spin correlations suggesting that the spins organize into ladders.

http://arxiv.org/abs/0704.0821v1

One of the spin liquid phases expected in 3d is the so-called Coulomb phase. It is a compact U(1) gauge theory coupled to matter in the deconfined phase, where the matter fields (spinons) are gapped, gauge field (emergent photon) is gapless, and monopoles (which arise due to compactness) are gapped. In addition, importantly, there are spin Berry phases that lead to the presence of a background charge in the gauge theory formulation. This makes the confined phases nontrivial in that they break lattice symmetries and therefore correspond to various spin solids. The transition occurs because the monopoles condense, and the theory can be equivalently analyzed in terms of them by employing standard electro-magnetic duality. The background charge causes monopoles to acquire a phase when they hop around a plaquette.12 This leads to a nontrivial monopole condensation pattern, which then corresponds to a spin solid phase. In 2d the physics is similar, except that the monopoles are instantons and they always proliferate, so there is no Coulomb spin liquid.

This approach was first used by Read and Sachdev10 on the square lattice. The spin solids for spin 1/2 on the cubic lattice were analyzed in Ref. 12 and near several different Coulomb spin liquids in Ref. 13. Ref. 14 was led (in a different context) to a gauge theory with background charges on a diamond lattice which was attacked using analogous techniques.

For the spins on the cubic lattice, the analysis depends only on the spin magnitude. Any case can be mapped onto S = 0, 1/2, and 1 in 2d and S = 0, 1/2, 1, and 3/2 in 3d. Only the spin 1/2 case was considered so far, but these results cannot be transferred to the other spins since each requires a separate analysis. This is the task of the present work. We find that the most natural phases for spin 1 and 3/2 are the ones shown on Figures 1 and 2. In the S = 1 case the spins organize into Haldane chains.
This is easiest to understand in the standard picture where we break spin 1 into two spin 1/2’s and form singlets with spin 1/2’s of spins on either side. Similarly, in the S = 3/2 case we break spin 3/2 into three spin 1/2’s and form singlets on the bonds of the ladders. Several approaches that we have taken and used in different parameter regimes suggest the same spin solid states, which gives us confidence that these phases are very natural in the two cases.

II. SCHWINGER BOSONS, DUAL REFORMULATION, AND A BASIC PHASE DIAGRAM

A. Schwinger bosons

We begin by briefly reviewing the standard technique of large N for spins.9,10 This maps (approximately) our spin system into a theory of spinons coupled to a U(1) gauge field in the presence of static background charges. Our main work is the analysis of this theory, while the purpose of the review here is to establish the connection with the properties of the original spin system.

The basic steps in the derivation are as follows. We generalize the SU(2) spin to SU(N) spin and denote it by $S^{\beta}_{\alpha}(i)$. We write the spins in terms of Schwinger bosons:

$$
S^{\beta}_{\alpha}(i) = b^{\dagger}_{\alpha}(i)\, b^{\beta}(i) \quad \text{(sublattice A)}, \qquad
S^{\beta}_{\alpha}(j) = -\bar b^{\beta\dagger}(j)\, \bar b_{\alpha}(j) \quad \text{(sublattice B)}, \tag{1}
$$

where the $b$, $\bar b$’s are bosonic operators that transform under the fundamental representation of SU(N) if the index is on the top and under its conjugate if the index is on the bottom. To get the Hilbert space of the spins we need to restrict the boson occupations as

$$
b^{\dagger}_{\alpha}(i)\, b^{\alpha}(i) = n_c , \qquad \bar b^{\alpha\dagger}(j)\, \bar b_{\alpha}(j) = n_c , \tag{2}
$$

where $n_c$ corresponds to the spin length. The SU(N) spin Hamiltonian is

$$
H = \frac{J}{N} \sum_{\langle i,j \rangle} S^{\beta}_{\alpha}(i)\, S^{\alpha}_{\beta}(j) , \tag{3}
$$

which reduces to the SU(2) Heisenberg spin model for spin S when N = 2 and $n_c = 2S$.

Next we write the system in the path integral picture, imposing the constraints (2) by Lagrange multipliers.
The spin interaction contains quartic terms; to get an action that is quadratic in the boson fields, we use the Hubbard-Stratonovich transformation and obtain

$$
S = \int d\tau \Bigg\{ \sum_{i \in A} \left[ b^{\dagger}_{\alpha}(i) \left( \frac{d}{d\tau} + i\lambda(i) \right) b^{\alpha}(i) - i\lambda(i)\, n_c \right]
+ \sum_{j \in B} \left[ \bar b^{\alpha\dagger}(j) \left( \frac{d}{d\tau} + i\lambda(j) \right) \bar b_{\alpha}(j) - i\lambda(j)\, n_c \right]
+ \sum_{\langle i,j \rangle} \left[ \frac{N}{J} |Q_{ij}|^2 - Q^{*}_{ij}\, b^{\alpha}(i)\, \bar b_{\alpha}(j) + \text{h.c.} \right] \Bigg\} . \tag{4}
$$

The path integral goes over $b$, $\bar b$, $Q$, $\lambda$.

We can now integrate out the b’s. The resulting expression will have coefficient N in front of it. At large N it can be approximated by its saddle point value. Our departing point is such a “mean field” with uniform $Q_{r,r+\hat m}(\tau) = \bar Q$ and $\lambda(r,\tau) = \bar\lambda$ and assuming gapped b spectrum; this represents a Coulomb spin liquid, which is a stable phase in three dimensions. The effective theory is obtained by considering the fluctuations of the fields, $Q_{r,r+\hat m}(\tau) = [\bar Q + q_m(r,\tau)]\, e^{i\alpha_m(r,\tau)}$ and $\lambda(r,\tau) = \bar\lambda + i\alpha_0(r,\tau)$. Here r runs over all sites of the cubic lattice and $\hat m = \hat x, \hat y, \hat z$ denotes one of the directions in 3d. The amplitude fields $q_m$ are massive, and so are the fields $\alpha_m$ and $\alpha_0$ near the wavevector (0, 0, 0). On the other hand, the fields $\alpha_m$ and $\alpha_0$ near the wavevector (π, π, π) are massless and describe the gauge field (photon) of the Coulomb phase, $a_m \equiv \alpha_m^{(\pi,\pi,\pi)}$, $a_\tau \equiv \alpha_0^{(\pi,\pi,\pi)}$. For details of the derivation, see the original Ref. 10 (our notation is slightly different compared to these papers, which use a two-site unit cell labeling instead).

As emphasized in Refs. 10,11, we also have to consider the effect of Berry phases, which is crucial for the understanding of the spin solid states. A very convenient encapsulation of the low-energy degrees of freedom and the Berry phases is provided by the following re-latticized Euclidean action:15,16

$$
Z = \int D a_{i\mu}\, e^{-S_a - S_B} , \tag{5}
$$

$$
S_a = -\beta \sum_{i,\, \mu<\nu} \cos(\nabla_\mu a_\nu - \nabla_\nu a_\mu) , \qquad
S_B = i \sum_i \eta_i\, a_{i\tau} .
$$

Here we have a compact U(1) gauge field a residing on the links of a (3+1)d space-time lattice and described by the action term $S_a$.
The SB term comes from detailed consideration of the Berry phases, and ηi is 2S on one sublattice of the spatial lattice and −2S on the other one. In the Hamiltonian language this has a simple in- terpretation as a background charge of value 2S on one sublattice and −2S on the other one: H = u E2rm − κ r,m 0 (the latter choice is made for concreteness). In the ”Quartic unstable” region on the left the potential to quartic order is asymptotically negative and we would have to include sixth order terms to stabilize it. The cross denotes the pa- rameter point obtained by simply expanding the microscopic potential |Φ|4 in terms of the slowly varying fields φ1,2,3. Phase 1. There are three degenerate states. The values in one of them are φ1 = 1, φ2 = φ3 = 0; (19) B0 = 1; F1 = , F2 = 1; (20) ǫx = 2, ǫy = ǫz = 1; (21) fx = 4 3; fy = fz = −2 3. (22) The bond variables are drawn on the original spin lattice in Figure 1; they suggest that the spins are organized into Haldane chains along the x direction. The values of plaquette energies are consistent with this: the plaque- ttes in the xy and xz planes are the same and differ from the plaquettes in the yz plane, ǫz = ǫy 6= ǫx. Phase 2. There are six degenerate states. The values in one of them are φ1 = 0, φ2 = 1, φ3 = ±i; (23) B0 = 2; F1 = − 1√3 , F2 = −1; Nx = 2; (24) ǫx = 2, ǫy = ǫz = 3+ 2 3(−1)x; (25) fx = −4[ 3 + 2(−1)x], fy = fz = 2 3. (26) The corresponding drawing of the bond variables on the original spin lattice is in Figure 6, suggesting that in this phase the spins combine into singlets and form a colum- nar dimer state along one direction. Permuting the val- ues of φ1,2,3 gives six degenerate states that correspond to six possible ways of placing such columnar solid onto the cubic lattice. Phase 3. There are eight degenerate states specified as follows: φ1 = 1, φ2 = e iα2 , φ3 = e iα3 , (27) {α2, α3} = ±{2π/3,−2π/3}, ± {2π/3, π/3}, ± {π/3, 2π/3}, ± {π/3,−π/3}; (28) FIG. 6: Phase 2 of spin 1. 
The thick lines denote the positions where the bond variables are strongest and dashed lines where they are weakest. This suggests that the spins organize into singlets (dimers) and form a columnar order. B0 = 3; Dx = Dy = Dz = −1; Nx = Ny = Nz = 3 (29) ǫx = 4 + 2(−1)y+z + 3[(−1)y + (−1)z], etc., (30) fx = −4 3(−1)x, etc. (31) The nearest neighbor spin-spin correlation has higher ex- pectation value on the sides of the cubes shown in Figure 7, which suggests that this phase corresponds to a box state. There are eight possible ways of placing such box state onto the cubic lattice. FIG. 7: Phase 3 of spin 1. The bond variables have higher expectation values on the cubes shown. Phase 4. There are four degenerate states: φ1 = 1, φ2 = e iα2 , φ3 = e iα3 ; (32) α2 = 0, π; α3 = 0, π; (33) B0 = 3; Dx = Dy = Dz = 2; (34) ǫx = 4− 4(−1)y+z, ǫy = 4− 4(−1)z+x, ǫz = 4− 4(−1)x+y; (35) fx = fy = fz = 0. (36) This state breaks lattice symmetries as can be seen from the plaquette energies. However, because the bond vari- ables fx,y,z are zero, we do not know a simple interpre- tation of this phase in terms of the original spins; some finer characterization than what we use here is needed to establish this state. This concludes the discussion of the general phase dia- gram of the Landau theory including quadratic and quar- tic terms. Higher-order interactions may stabilize some other phases, but the presented states are the most nat- ural ones. The actual lowest-energy state depends on the parameters u, v, w, unknown apriori. If we are to guess which of the four phases is the most likely candidate in the specific frustrated XY model, we can consider the simplest microscopic quartic potential |Φ|4. When ex- panded in terms of the continuum fields, we find u = 2, v = −1, w = −1/2; this point is denoted by the cross in Figure 5 and lies in the Phase 1, i.e., the Haldane chains phase. 2. 
Analysis 2: The ground state of the XY model Minimizing the classical energy of the hard spin XY model as described in Sec. III A, we find that the ground state configurations coincide with the condensate wave- functions of the phase 1 and hence the state is that of the phase 1. In particular note that each wavefunction Ψ1,2,3 has the same length |Φ| on all sites. The XY angles of spins in this gauge in the three ground states are (0,−30,−30, 0, 60,−90,−90, 60) , (37) (0, 30, 30, 0,−60, 90, 90,−60) , (38) (0,−90, 90, 180, 0,−90, 90, 180) , (39) where the convention is that we vary position on the cube in the x direction first, then in the y direction, and then in the z. 3. Discussion and extension to anisotropic system Some remarks are in order. First, it is useful to note that the doublet F1,2 can be interpreted as an order parameter of the Haldane chains phase. Indeed, one can readily see that the transformation properties of F1 and F2 coincide with those of (Qx +Qy − 2Qz)/ 3 and Qx − Qy respectively, where Qm is the bond variable in the direction m̂. On the other hand, Nx transforms as (−1)xQx and similarly for Ny and Nz, so ~N can be viewed as an order parameter of the valence bond solids such as the columnar Phase 2 or the box Phase 3. In the columnar phase, it is suggestive to view each strong bond in Fig. 6 as representing a singlet formed by two spin-1’s, which can be also drawn as two spin-1/2 valence bonds connecting the two sites. However, we should be cautious with such interpretation, since we can only tell that the deviations of the bond variables from their mean value will have the displayed pattern. The actual state needs to be studied by constructing the corresponding spin wavefunction. For example, the Haldane phase of a spin 1 chain is stable to weak dimerization and should be viewed as a solid formed by single-strength bonds along the chains, so such distinct possibilities should be kept in mind. 
Let us now assume that the system is in the Phase 1. It is also interesting to ask what happens when we stretch the lattice in one of the axis directions, say the z-direction. In this case the Rx and Ry rotations are no longer symmetries but the other transformations are. At the quadratic level, the translation symmetries already prohibit all terms except B0 and F ’s. Then from Rz we see that only F1 is allowed. Thus at the quadratic level one new term is allowed. In principle we should look at the new allowed terms at the quartic level, however we will assume that this quadratic term is leading but small compared to the terms that were there before we broke the symmetry. We find that if the F1 comes with a positive pre-factor, out of the three ground states it selects the state with chains running along the z direction whereas if it comes with a negative pre-factor it selects the two states with chains running along the x and y directions. This has a simple interpretation in terms of spins. If the coupling in the z direction is stronger than in the other directions the state with maximum number of bonds in this directions is selected which is the state with chains running in the z-direction. In the opposite case, the states with fewest bonds in the z direction are se- lected which are the states with chains running in the x or y directions. C. Results: Spin 3/2 1. Analysis 1: Phase transition of the XY model We choose the gauge as shown on Figure 3 with S = 3/2. The hopping amplitudes are (−1)z 1 + i(−1)x+y , (40) (−1)z 1− i(−1)x+y , (41) tz = 1 . (42) The band structure has four minima and hence the space of the ground states of kinetic energy is four-dimensional. Unlike the spin 1 case where this space was three- dimensional and simple basis vectors were found corre- sponding to the three directions of the physical space, there is no such form in the spin 3/2 case. 
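As a quick sanity check on the gauge choice, one can verify that the hopping amplitudes of Eqs. (40)-(42) thread flux π through every plaquette, which is what makes the S = 3/2 problem the fully frustrated XY model. The sketch below assumes that Eq. (40) is t_x and Eq. (41) is t_y, and that the denominator missing from both is √2 so that |t| = 1; the normalization does not affect the flux. The function names are hypothetical.

```python
import cmath
import math
from itertools import product


def hop(r, direction):
    """Hopping amplitude on the link leaving site r = (x, y, z).

    The 1/sqrt(2) normalization is an assumption; it only sets |t| = 1
    and does not change the phase around a plaquette.
    """
    x, y, z = r
    s = (-1) ** (x + y)
    if direction == 0:  # x link, Eq. (40) (assumed assignment)
        return (-1) ** z * complex(1, s) / math.sqrt(2)
    if direction == 1:  # y link, Eq. (41) (assumed assignment)
        return (-1) ** z * complex(1, -s) / math.sqrt(2)
    return complex(1, 0)  # z link, Eq. (42)


def step(r, direction):
    """Neighbor of r one lattice spacing along the given direction."""
    r = list(r)
    r[direction] += 1
    return tuple(r)


def plaquette_product(r, mu, nu):
    """Ordered product of hoppings around the (mu, nu) plaquette at r."""
    return (hop(r, mu) * hop(step(r, mu), nu)
            * hop(step(r, nu), mu).conjugate() * hop(r, nu).conjugate())


def all_fluxes(size=2):
    """Phases of every plaquette product in a size^3 block of sites."""
    return [cmath.phase(plaquette_product(r, mu, nu))
            for r in product(range(size), repeat=3)
            for mu, nu in ((0, 1), (1, 2), (0, 2))]


if __name__ == "__main__":
    print(all(abs(abs(f) - math.pi) < 1e-9 for f in all_fluxes()))
```

Every plaquette product evaluates to −1 (up to rounding), i.e. flux π, for all three plaquette orientations.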
The four wavefunctions that give us relatively simple subsequent analysis are the following Ψ1 = (−1)x cosβ − i(−1)x+y+z sinβ Ψ2 = i(−1)y cosβ + i(−1)x+y+z sinβ 1 + i(−1)x+y√ cosβ − i(−1)x+y+z sinβ 1− i(−1)x+y√ cosβ + i(−1)x+y+z sinβ where cosβ = 3 + 1 , sinβ = . (43) We again write Φ(R) = i=1 φiΨi(R). The transfor- mation properties of the slow fields φ1,2,3,4 are derived in the same manner as in the spin 1 case. The symmetries Tx : ~φ → τ3σ0 ~φ∗ , (44) Ty : ~φ → τ0σ0 ~φ∗ , (45) Tz : ~φ → τ1σ0 ~φ∗ , (46) Ry : ~φ → τ1e−i ~φ∗ , (47) Rz : ~φ → e−i σ1 ~φ∗ . (48) Here ~φ, ~φ∗ are column vectors, and we introduced two sets of Pauli matrices: τ matrices that act on the blocs {1, 2} and {3, 4}, and σ matrices that act within each bloc (τ0 and σ0 are the corresponding identity matrices). At the quadratic order there is one invariant term V (2) = m |φi|2 . (49) At the quartic order there are five invariant terms. The expressions in terms of φ are rather complicated and not very illuminating, particularly since φ’s depend on the choice of gauge and the basis. Instead, we will use gauge invariant bilinears of φ to which we now turn. There are 16 bilinears and they can be conveniently organized using tensor product of the introduced two sets of Pauli matrices, namely φ†τµσνφ with µ, ν = 0, 1, 2, 3. These break up into irreducible representa- tions of the cubic lattice symmetry group. There are two one-dimensional, one two-dimensional, and four three- dimensional representations. The convenient combina- tions that we use are B0 = φ †τ0σ0φ , C = φ†τ0σ2φ , F1 = φ †τ0σ1φ , F2 = φ †τ0σ3φ , ~D = (Dx, Dy, Dz) = φ †~τσ2φ , ~N = (Nx, Ny, Nz) = φ †~τσ0φ , Mx = φ †τ1(−1 σ3)φ , My = φ †τ2(−1 σ3)φ , Mz = φ †τ3σ1φ , Kx = φ σ1 − 1 σ3)φ , Ky = φ †τ2(− σ3)φ , Kz = φ †τ3σ3φ . 
The transformation properties of these bilinears are in the following table Tx Ty Tz Rx Ry Rz B0 + + + + + + C − − − + + + F1 + + + − 12F1 + F2 − 12F1 − F2 + + + Dx + − − + Dz Dy Dy − + − Dz + Dx Dz − − + Dy Dx + Nx − + + + Nz Ny Ny + − + Nz + Nx Nz + + − Ny Nx + Mx − + + + Mz My My + − + Mz + Mx Mz + + − My Mx + Kx − + + − −Kz −Ky Ky + − + −Kz − −Kx Kz + + − −Ky −Kx − The energies and staggered curls of monopole cur- rents in term of these bilinears are B0 − 2(−1)y+zDx 2 [(−1)yMy + (−1)zMz] [(−1)yKy − (−1)zKz] , (50) fx = 2 2(F1 + 3F2) + 8(−1)x√ Nx . (51) The components in the other directions are obtained from these by simple rotations of the coordinates. Our general discussion following similar expressions (17) and (18) in the spin 1 case apply here as well (for ease of compari- son, we are using similar labels for objects with identical transformation properties in the two cases). However, a word of warning is in order here, which will be explained in Sec. III C 3 below. Observe, for example, that ~N and ~M have identical transformation properties and therefore should enter similarly in any expression. The absence of M ’s in the expression for ǫx and the absence of N ’s in the expression for fx is due to their different eigenval- ues under an additional artificial symmetry present in the frustrated XY model, namely a charge conjugation symmetry defined later, which is also present in our bare kinetic term and thus in the above expressions. This symmetry is not physical in the original spin model and will not be used here; it is therefore important to note that the degeneracy of the four slow modes obtained from the bare kinetic term is protected at the quadratic level by the physical lattice symmetries. There are five independent fourth order terms in φ al- lowed by translation and rotation symmetries: I1 = B 0 , (52) I2 = C 2 , (53) I3 = N z , (54) I4 = M z , (55) I5 = NxMx +NyMy +NzMz . 
(56) As we have said earlier, because the number of invariant terms is large, we will not attempt to draw the phase diagram of the Landau’s theory. Instead we look at the natural microscopic term V (4) = |Φ|4 = 4 I4 , (57) where the second equality is obtained after some calcu- lation keeping only non-oscillatory terms. This potential does not have any continuous symme- try left other than the global U(1) transformation of all fields. In fact the dimensions of the subgroups of SU(4) that keep the terms I1, . . . , I5 invariant are 15, 7, 6, 0, 0 respectively. The potential (57) achieves global mini- mum at twelve discrete points. As an illustration, we consider the following four minima that are associated with the z direction in the sense to become clear below: (φ1, φ2, φ3, φ4) = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1). (58) The four states can be related to each other by transla- tions in the z direction and rotations about the z axis. Besides B0 = 1, the only nonzero bilinears in these states are (F2, Nz,Kz) = (1, 1, 1), (−1, 1,−1), (1,−1,−1), and (−1,−1, 1) respectively. The energies are ∓ (−1)z , (59) ± (−1)z , (60) , (61) where the upper sign corresponds to the first and fourth minima and the lower sign to the other two. The staggered curls of monopole currents are respec- tively fx, fy, fz = 8(−1)z√ , (62) 8(−1)z√ , (63) 6,− 8(−1) , (64) 6,− 8(−1) . (65) The staggered curls are interpreted as the strength (above some mean) of the expectation value of nearest neighbor spin-spin correlation function. The above val- ues imply that the spins organize themselves into ladders as shown in Figure 2, obtained by drawing say the pos- itive bonds for the first of the above minima. The four listed states correspond to the four different positions of ladders with rungs oriented along the z-axis. The other eight minima are obtained by 90 degree rotations around the x and y axes and we will not write the specific values of the variables. 
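The bilinear values quoted for the four minima of Eq. (58) can be checked directly from the definitions B0 = φ†τ0σ0φ, F2 = φ†τ0σ3φ, Nz = φ†τ3σ0φ, and Kz = φ†τ3σ3φ. The following is a minimal sketch, assuming that τ acts on the block index and σ within each block, so that τ⊗σ is a Kronecker product with τ as the outer factor; the helper names are hypothetical.

```python
# Pauli matrices indexed 0..3 (0 is the identity), as nested lists.
PAULI = {
    0: [[1, 0], [0, 1]],
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def bilinear(phi, mu, nu):
    """phi^dagger (tau_mu x sigma_nu) phi for a 4-component vector phi."""
    m = kron(PAULI[mu], PAULI[nu])
    return sum(complex(phi[i]).conjugate() * m[i][j] * phi[j]
               for i in range(4) for j in range(4))


def ladder_bilinears(phi):
    """Return (B0, F2, Nz, Kz) as real numbers."""
    return tuple(bilinear(phi, mu, nu).real
                 for mu, nu in ((0, 0), (0, 3), (3, 0), (3, 3)))


if __name__ == "__main__":
    for phi in ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)):
        print(phi, ladder_bilinears(phi))
```

With this convention the four minima give (F2, Nz, Kz) = (1, 1, 1), (−1, 1, −1), (1, −1, −1), and (−1, −1, 1), matching the values stated in the text, with B0 = 1 in each case.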
The ladder state is natural for an S = 3/2 system, in the picture where spin 3/2 breaks up into three spin 1/2’s and each of them forms a bond with some other neighboring spin 1/2.

2. Analysis 2: The ground state of the XY model

We can use the same procedure as in the case of spin 1 to find the classical ground state of the appropriate XY model. In fact, this was already done in Ref. 20 because this problem is the fully frustrated XY model (FFXY), which is of interest by itself, and we can use the available results. We find that the ground state configurations coincide with the condensate wavefunctions obtained above. Thus, in each of the four displayed states (58), the microscopic boson field Φ is given precisely by one of the four wavefunctions $\Psi_{1,\dots,4}$. One can see that |Φ| = 1 on all lattice sites, and the complex phases of Φ can be interpreted as angles of the hard-spin XY model. For example, for $\Phi = \Psi_1$ the angles are

$$
(-\beta,\ \pi + \beta,\ \beta,\ \pi - \beta,\ \beta,\ \pi - \beta,\ -\beta,\ \pi + \beta) , \tag{66}
$$

listed in the same order as in Eq. (39). All other ground states can be obtained by appropriate symmetry transformations. The agreement of the two analyses suggests that there is only one ordered phase in the FFXY model, which is also supported by the available Monte Carlo studies.20,21

3. Remark on charge conjugation symmetry in the FFXY

It is worth pointing out that the fully frustrated XY model has an additional charge conjugation symmetry. Indeed, since π and −π fluxes are indistinguishable, $t_{RR'}$ and $t^{*}_{RR'}$ are related by a gauge transformation, $t_{RR'} = e^{i\gamma_R}\, t^{*}_{RR'}\, e^{-i\gamma_{R'}}$, and so the action remains invariant under the following unitary transformation:

$$
C : \Phi_R \to e^{i\gamma_R}\, \Phi^{*}_R . \tag{67}
$$

In terms of the continuum fields, this becomes

$$
C : \vec\phi \to \tau_2 \sigma_2\, \vec\phi^{\,*} . \tag{68}
$$

In particular, the bilinears $N_{x,y,z}$ are odd under C while $M_{x,y,z}$ are even, so if this symmetry is included, the $I_5$ quartic term is not allowed (this is why this term did not appear in Eq.
57 since both the microscopic |Φ|^4 and the bare quadratic terms in Eq. 10 have this additional symmetry). Thus, the complete field theory for the FFXY model is a φ^4 theory with four complex fields and independent quartic terms I1,...,4.

One consequence of the charge conjugation symmetry is that, for example, if we draw the Ψ1 state using negative values of the staggered curls fx,y,z, as opposed to the positive values used in Fig. 2, we would obtain another set of ladders that run perpendicular to the ones displayed and are shifted up by one lattice spacing. In other words, the Ψ1 and Ψ4 states, which can be related by a translation in the z direction followed by a rotation around the z axis, are also related by C. In this sense, each of the states in Eq. (58) does not define a direction in the x-y plane, since the correlations in the x and y directions are related by the charge conjugation symmetry.

Tracing back to the original gauge theory formulation, this symmetry is present in the simplest model Eq. (5) for S = 3/2 that we wrote down and the corresponding simplest “dimer model” Hamiltonian Eq. (6). Specifically, the transformation E → 1−E on the links oriented from one sublattice to the other, or equivalently 1 ↔ 0 in the dimer language, takes the model corresponding to spin S to the one corresponding to spin 3−S, while the S = 3/2 case maps back onto itself. This symmetry is useful in the specific models, but there is no corresponding symmetry in the microscopic derivation from the spin model, and therefore it was not used in the preceding analysis.

Let us look at what happens to the ground states when we add to the potential a small term that breaks the charge conjugation symmetry, the I5. Using general arguments it is easy to check that the twelve minima will shift but not split, and the twelve-fold degeneracy remains since all are related to each other by lattice symmetries.
Furthermore, each ground state stays translationally invariant along the ladders and perpendicular to the plane of ladders (if this were not true, there would be more than twelve states). In other words, the states still have the structure of ladders. However, since the charge conjugation is broken, it is no longer true that the negative bonds are of the same magnitude as the corresponding positive ones. This makes sense when interpreted in terms of spins. In the picture where spin 3/2 breaks up into three spin 1/2's and ladders of valence bonds are formed, the links that belong to these ladders are different from the links without bonds (which also form ladders). For example, the system is entangled along the former but not along the latter. Thus these two should not be related by any symmetry.

Explicitly, the four states in Eq. (58) become

(1, δ, 0, 0), (δ, 1, 0, 0), (0, 0, 1, δ), (0, 0, δ, 1), (69)

with appropriate δ obtained from minimization. There is now an additional non-zero bilinear Mz, and also both F1 and F2 are non-zero. The expressions for the energies and staggered curls in the x and y directions are no longer related, and we can then associate a unique x or y direction with each of the four states. These are ladders with rungs oriented in the z direction and are related to each other by the z translations and rotations.

4. Extension to anisotropic system

As in the spin 1 case, we ask what happens when we stretch the system along one axis, say the z direction. Again, the Rx and Ry rotations are no longer symmetries but the translations and Rz are. At the quadratic level, the translation symmetries already prohibit all terms except B0 and the F's. Then from Rz we see that only F1 is allowed. Thus at the quadratic level one new term is allowed.
We find that if the F1 comes with a positive prefactor, out of the twelve ground states it selects four with the ladders that lie entirely in the x-y plane, whereas if it comes with a negative prefactor it selects four states with the ladders running along the z direction. Note that this breaking up into groups of four is a consequence of the remaining symmetries in the system.

These results have a simple physical interpretation for the spin system. If the coupling in the z direction is weaker than in the other directions, the states with the fewest bonds in the z direction are selected, which are the states with the ladders lying in the x-y plane. On the other hand, if the coupling in the z direction is stronger, the states with the largest number of bonds in the z direction are selected, which are the states with ladders oriented in the z direction.

IV. ANALYSIS 3: MAPPING TO DIMERS AT λ ≫ 1/β ≫ 1

Here we look at the right-hand corner of the phase diagram Fig. 4 in the regime with λ ≫ 1/β ≫ 1, where, as we will see, the system can be mapped to dimers.17,18,19 The analysis proceeds as follows. First we gauge away the ∇θ in Eq. (9) to obtain

$$S_{\rm dual} = \sum \left[ \frac{(\partial L)^2}{8\pi^2\beta} - \lambda \cos(L + L_0) \right] . \tag{70}$$

Because we assume λ ≫ 1/β ≫ 1, the configurations that contribute significantly to the partition function can be written in the form L = −L0 + 2πn + δL, where n is an integer and δL is small. Note that the λ term does not depend on n and the 1/β term has a gauge invariance n → n + ∇m, where the m's are integers on sites. The partition function can be written as a sum over the gauge equivalence classes. These classes are in one-to-one correspondence with the fluxes j = ∂n, which are integers on plaquettes, where ∂n is the four-dimensional curl (∂n)µν = ∂µnν − ∂νnµ.

Consider first configurations with δL = 0. Some configurations of j minimize the action and we denote them by jgs. As we show below, there is an extensive number of them in all our cases.
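The claim that the fluxes j = ∂n label gauge equivalence classes rests on the lattice curl being unchanged under n → n + ∇m. A minimal numerical sketch of this check follows. It assumes a small periodic three-dimensional lattice for brevity (the text works in four dimensions) and forward differences for ∂; the helper names `shift`, `grad` and `curl` are hypothetical and not taken from the paper.

```python
import itertools
import random

SIZE = 3   # linear lattice size (assumption: small periodic lattice)
DIM = 3    # number of lattice directions used in this sketch

SITES = list(itertools.product(range(SIZE), repeat=DIM))

def shift(site, mu):
    """Neighbouring site one step along direction mu (periodic boundaries)."""
    s = list(site)
    s[mu] = (s[mu] + 1) % SIZE
    return tuple(s)

def grad(m):
    """Forward difference of a site field m: (grad m)_mu(x) = m(x + mu) - m(x)."""
    return {(x, mu): m[shift(x, mu)] - m[x] for x in SITES for mu in range(DIM)}

def curl(n):
    """Lattice curl (dn)_{mu nu}(x) = d_mu n_nu(x) - d_nu n_mu(x) of a link field n."""
    return {
        (x, mu, nu): (n[(shift(x, mu), nu)] - n[(x, nu)])
                     - (n[(shift(x, nu), mu)] - n[(x, mu)])
        for x in SITES for mu in range(DIM) for nu in range(mu + 1, DIM)
    }

rng = random.Random(0)
n = {(x, mu): rng.randint(-3, 3) for x in SITES for mu in range(DIM)}  # integer link field
m = {x: rng.randint(-3, 3) for x in SITES}                             # integer site field

g = grad(m)
n_shifted = {key: n[key] + g[key] for key in n}

# The flux is the same before and after the gauge transformation.
flux_unchanged = curl(n) == curl(n_shifted)
print(flux_unchanged)  # prints True
```

Integer arithmetic keeps the comparison exact; the same cancellation holds in any dimension because lattice shifts along different directions commute.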
The configurations with j that are not jgs are higher in energy by at least ∼ 1/β. Now turning on δL, if we show that the typical energy of excitation in δL around a given j is much smaller than 1/β, then we can neglect all configurations which are not around jgs. We will assume that this is true and show this self-consistently below.

We define Jgs = −∂L0/(2π) + jgs. We expand the action to second order and drop the terms that do not depend on Jgs, δL to obtain

$$\sum \left[ \frac{4\pi J^{gs}\cdot(\partial\delta L) + (\partial\delta L)^2}{8\pi^2\beta} + \frac{\lambda}{2}(\delta L)^2 \right] . \tag{71}$$

This is just a Gaussian integral. There are two quadratic terms: the first one has 1/β in front and contains two derivatives, while the second has λ in front and contains no derivatives. Since we are on a lattice the derivatives are of order one. Since λ ≫ 1/β, the first term can be neglected. Next we sum by parts and integrate out the δL. Before we do this, however, we notice that the coupling is ferromagnetic in the time direction, and that L0 has zero time components and its spatial components do not depend on time. This implies that the jgs and Jgs have zero time components and their spatial components do not depend on time. Thus we drop time components and time derivatives from the action and treat the Jgs and L0 as three-dimensional. Now we integrate out the δL and obtain

$$S_{\rm eff}[J^{gs}] = -\frac{1}{8\pi^2\beta^2\lambda} \sum (\nabla\times J^{gs})^2 \tag{72}$$

Thus, to obtain a ground state, we need to maximize the sum of the squares of curls of Jgs.

Let us check the consistency of our approach. From (71), δL ∼ ∇× Jgs/(λβ) and so the energy is ∼ 1/(λβ²). This needs to be much smaller than 1/β, which implies λ ≫ 1/β, which is what we assumed.

FIG. 8: a) L0/(2π), where S = 1/2, 1, 3/2 is the spin. The link variables switch orientations under elementary translation in the x or y direction. b) The fluxes (∇ × L0)/(2π). This figure is similar to Fig. 3 with 2π's removed to simplify the discussion of the dimer ground states.

Now let us turn to the specific cases of spins.
Since the spin 1/2 case has not been considered using this approach before, we add it here for completeness. The gauge choice for L0 and the fluxes ∇×L0/(2π) through the faces of the spatial cubes are shown in Figure 8 with S = 1/2. It is easy to see that the set of ground states consists of all configurations with precisely one −5/6 and five 1/6 fluxes Jgs coming out of every site of one sublattice of the original spin lattice (and coming into every site on the other sublattice). Jgs = −∇ × L0/(2π) is one such configuration in the spin 1/2 case, but there are many more. Associating the −5/6 plaquettes with dimers on the links of the original spin lattice, the set of the ground states is thus the set of dimer configurations with one dimer coming out of every site.

Now turn to the case of spin 1. The fluxes (∇ × L0)/(2π) are shown in Figure 8 with S = 1. If we try Jgs = −∇× L0/(2π), each cube contributes a 1/β energy term proportional to 5(1/3)² + (5/3)² = 10/3. However, we can do better. Using L = −L0 + 2πn, if we pick n = 1 on the upper link on the front face and zero elsewhere on the cube in Fig. 8, we lower the magnitude of the flux on the upper face, at the expense of increasing the flux through the front face. The energy of this cube is then 4(1/3)² + 2(2/3)² = 4/3, which is lower. It is easy to show that this is the lowest we can achieve and that the ground state configurations have two fluxes of value −2/3 and four fluxes of value 1/3 coming out of every site of one sublattice of the original spin lattice. Associating the −2/3 links with dimers, the set of the ground states is thus the set of dimer configurations with two dimers coming out of every spin site.

Finally, in the S = 3/2 case, it is easy to see that the ground state configurations have precisely three −1/2 and three 1/2 fluxes coming out of every site of one sublattice of the original cubic lattice.
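The per-cube energies and flux patterns quoted above can be checked by brute force. The sketch below is a hedged illustration, not the authors' procedure: it assumes that the six fluxes leaving a site of one sublattice can be written as S/3 − e_i with integers e_i that sum to 2S (a parametrization chosen here because it reproduces every flux pattern quoted in the text), and it minimizes the sum of squared fluxes. The function name `best_flux_patterns` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def best_flux_patterns(two_s, e_range=range(-2, 4)):
    """Minimize sum_i (S/3 - e_i)^2 over integers e_i on six faces with sum(e_i) = 2S.

    two_s is 2S (an integer), so S = two_s / 2.
    Returns (minimal energy, set of sorted flux patterns that achieve it).
    """
    s_over_3 = Fraction(two_s, 6)
    best_energy, best = None, set()
    for e in product(e_range, repeat=6):
        if sum(e) != two_s:
            continue
        fluxes = tuple(sorted(s_over_3 - ei for ei in e))
        energy = sum(f * f for f in fluxes)
        if best_energy is None or energy < best_energy:
            best_energy, best = energy, {fluxes}
        elif energy == best_energy:
            best.add(fluxes)
    return best_energy, best

# Spin 1: the trial configuration -curl(L0)/(2 pi) versus the improved one.
trial = 5 * Fraction(1, 3) ** 2 + Fraction(5, 3) ** 2         # 10/3
improved = 4 * Fraction(1, 3) ** 2 + 2 * Fraction(2, 3) ** 2  # 4/3

for two_s in (1, 2, 3):
    energy, patterns = best_flux_patterns(two_s)
    print(f"S = {Fraction(two_s, 2)}: minimal energy {energy}, patterns {sorted(patterns)}")
```

Exact rational arithmetic avoids rounding issues when comparing energies. Under the stated assumption the minimum is reached only when 2S of the e_i equal one and the rest vanish, which is the "2S dimers per site" rule.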
Associating the −1/2 links with dimers, the set of the ground states is thus the set of dimer configurations with three dimers coming out of every spin site.

Thus, as claimed, in each case there is an extensive number of Jgs's. To find the true ground state, we need to minimize (72) among these dimer configurations. It is not hard to show that for spin 1/2 we get the columnar state, for spin 1 the Haldane chain state of Fig. 1, and for spin 3/2 the ladder state of Fig. 2.

Finally, we note that, defining Egs = S/3 − Jgs, the set of Egs is the set of electric fields on links, cf. Eq. (6), with the property that the magnitude of each is either zero or one (which can be imposed by minimizing the energy term E²); the mapping between such electric fields and dimers above is the standard one on the cubic lattice.17,18,19 The final ground state selection is obtained by maximizing (∇× E)².

V. CONCLUSIONS

In this paper we looked for spin solid phases in the system of spin 1 and 3/2 on the cubic lattice. We wrote the spins in terms of Schwinger bosons, assumed the uniform Coulomb spin liquid phase, and by the process of monopole condensation transitioned into spin solid phases. Using the duality we rewrote the system in terms of monopoles coupled to a noncompact U(1) gauge field, Eq. (9), and analyzed this theory in three different limits shown in Figure 4.

In the first two limits the theory becomes a frustrated XY model. For spin 1 the frustrating flux through every plaquette is 2π/3, while for spin 3/2 it is π. In the first approach, using symmetries we wrote the Landau theory near the ordering transition. It is a φ^4 theory with φ a complex vector with three components for S = 1 and four components for S = 3/2. At the quadratic level only the rotationally invariant mass term is allowed. At the quartic level there are three allowed terms for spin 1 and five for spin 3/2. For spin 1 we draw a mean field phase diagram Figure 5.
For spin 3/2 we did not attempt it because of the large number of parameters. In both cases we also considered the most natural microscopic potential and found that it selects a state with parallel Haldane chains of Figure 1 for S = 1 and a state with parallel ladders of Figure 2 for S = 3/2. These are natural states for the spin systems to be in, in the picture where spin 1 breaks up into two and spin 3/2 into three spin 1/2's and each such spin 1/2 forms a singlet bond with another spin 1/2 of some neighbor.

In the second approach we looked at the classical ground states of the frustrated XY models and found that these actually describe the same phases as the most natural ones identified near the transition.

In the third approach the theory becomes a dimer model with 2S dimers coming out of every site. Dimer configurations with parallel lines for spin 1 and parallel ladders for spin 3/2 are selected, which is the same result as in the other two limits, suggesting that these are indeed the most natural valence bond solids in the corresponding spin systems.

It would be interesting to look for such spin solid phases in Quantum Monte Carlo studies of models on the cubic lattice.5,8

It is also worth noting14 that if we consider our quantum 3d systems at a finite temperature, we obtain simply the corresponding classical 3d dimer models, e.g., with the classical energy given by the first term in Eq. (6). Our results then provide an appropriate long-wavelength (dual) description of the dimer ordering patterns transitioning out of the so-called Coulomb phase of the classical dimer models,22,23,24 stressing in particular the composite character of the naive order parameters for the valence bond solid phases. It would be interesting to explore such 3d classical dimer models and their transitions further.
APPENDIX A: CLASSICAL U(1) DUALITY WITH BACKGROUND CHARGE

In this section we derive the duality for classical compact U(1) gauge theory.12,25 However, we will use a general notation of antisymmetric tensors, or differential forms, which are fields of antisymmetric tensors. Thus the derivation will work not only for the gauge theory, whose objects are one-dimensional, but for general n-dimensional objects. For n = 0 this is the vortex duality of the XY model and for n = 1 the duality of the gauge theory. A further advantage of this derivation is that the formulas are simpler and more transparent.

First we give the basic notation and properties of antisymmetric tensors. An n-dimensional antisymmetric tensor ω in d dimensions is a collection of numbers $\omega_{\mu_1,\mu_2,\ldots,\mu_n}$, where each index $\mu_i = 1, \ldots, d$, which is completely antisymmetric. A differential form ω(r⃗) is a field of these tensors.

We define two operations. The first is the exterior derivative ∂. The derivative of ω, denoted ∂ω, is the (n+1)-form

$$(\partial\omega)_{\mu_1,\mu_2,\ldots,\mu_{n+1}} = \sum_p (-1)^p\, \partial_{\mu_{p_1}} \omega_{\mu_{p_2},\ldots,\mu_{p_{n+1}}} \tag{A1}$$

where the sum is over all permutations of the n+1 indices and (−1)^p is −1 if the permutation is odd and 1 if it is even. Thus, for example, for n = 1, a vector field, (∂ω)12 = ∂1ω2 − ∂2ω1, and hence this is the curl of a vector field.

The second operation that we define is the star operator, which takes an n-form to a (d − n)-form:

$$(*\omega)_{\nu_1,\ldots,\nu_{d-n}} = \frac{1}{n!}\, \epsilon_{\nu_1,\ldots,\nu_{d-n},\mu_{d-n+1},\ldots,\mu_d}\, \omega_{\mu_{d-n+1},\ldots,\mu_d} \tag{A2}$$

where ε is the fully antisymmetric tensor in d dimensions and repeated indices are summed over. For example, in three dimensions for n = 2, (∗ω)1 = ½ (ω23 − ω32). Note that ∗∗ = (−1)^{n(d−n)}.

A common operator is the divergence, which in this notation is proportional to ∗∂∗. As is easily checked,

$$(\nabla\cdot\omega)_{\mu_1,\ldots,\mu_{n-1}} \equiv \partial_\nu\, \omega_{\nu,\mu_1,\ldots,\mu_{n-1}} \tag{A3}$$

$$= (-1)^{(n-1)(d-n)}\, (*\partial * \omega)_{\mu_1,\ldots,\mu_{n-1}} \tag{A4}$$

For a vector field this is the standard divergence.

We will work on the lattice. The variables are defined on discrete points.
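The two star-operator statements above, the d = 3 example and the rule ∗∗ = (−1)^{n(d−n)}, can be verified numerically. The following is a minimal sketch under stated assumptions: tensors are stored as plain dictionaries indexed from 0 (so the paper's component 1 is index 0 here), and the star operator carries a 1/n! normalization, which is the choice that reproduces the quoted example (∗ω)1 = ½(ω23 − ω32). The helper names `perm_sign`, `random_form` and `star` are hypothetical.

```python
import itertools
import math
import random
from fractions import Fraction

def perm_sign(perm):
    """Sign of a permutation given as a tuple of distinct integers."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def random_form(n, d, rng):
    """Random completely antisymmetric n-index tensor in d dimensions."""
    form = {idx: Fraction(0) for idx in itertools.product(range(d), repeat=n)}
    for combo in itertools.combinations(range(d), n):
        value = Fraction(rng.randint(-5, 5))
        for perm in itertools.permutations(combo):
            form[perm] = perm_sign(perm) * value
    return form

def star(form, n, d):
    """(*w)_nu = (1/n!) eps_{nu, mu} w_mu, summed over the n indices mu."""
    out = {}
    for nu in itertools.product(range(d), repeat=d - n):
        total = Fraction(0)
        if len(set(nu)) == len(nu):  # eps vanishes when an index repeats
            rest = [i for i in range(d) if i not in nu]
            for mu in itertools.permutations(rest):
                total += perm_sign(nu + mu) * form[mu]
        out[nu] = total / math.factorial(n)
    return out

rng = random.Random(1)
double_star_ok = []
for d, n in [(3, 1), (3, 2), (4, 1), (4, 2), (4, 3)]:
    w = random_form(n, d, rng)
    twice = star(star(w, n, d), d - n, d)
    sign = (-1) ** (n * (d - n))
    double_star_ok.append(all(twice[idx] == sign * w[idx] for idx in w))

w2 = random_form(2, 3, rng)
example_ok = star(w2, 2, 3)[(0,)] == Fraction(1, 2) * (w2[(1, 2)] - w2[(2, 1)])
print(double_star_ok, example_ok)
```

With a different normalization of the star operator the sign rule would pick up an extra numerical factor, which is why the prefactor is flagged as an assumption.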
We will define the coordinates of a given variable to be those of the center of the object the variable belongs to. For example, the x component of a one-form ω in d = 3 lies on a link pointing in the x direction and is denoted by ωx(x + 1/2, y, z). The ∂ now denotes the difference operator. For example, the curl of ω is

(∂ω)xy(x + 1/2, y + 1/2, z) = ωy(x + 1, y + 1/2, z) − ωy(x, y + 1/2, z) − ωx(x + 1/2, y + 1, z) + ωx(x + 1/2, y, z).

Finally, we will write the integration (summation) by parts

ω · ∂φ = − (∇ · ω) · φ + surface term (A5)

where the dot is the sum over the component-by-component product of two forms of the same n. Note that ∗ω1 · ∗ω2 = ω1 · ω2. Because we use periodic boundary conditions below, the surface term will be zero.

Now we are ready to turn to the duality. Let a be an n-form in d dimensions whose variables are defined on the unit circle. The action is

S = −β cos(∂a) − i η · a (A6)

In the first term one takes every component at every point, takes the cosine of it, and sums. In the second term the n-form η denotes the background charge. For the action considered in this paper, the first term is the Sa and the second term the SB in (5), while η is the four-dimensional vector with the time component being ±2S and the other components being zero.

The duality proceeds by the following steps.

cos(∂a)+i (∂a−2πp)2+i (∂a−2π∂−1q′)2+i J·(∂a−2π∂−1q′)+i J·∂−1q′ ×∆(∇ · J − η) (A7)

All numerical factors are dropped throughout, while the sign “≈” is used when an approximation is made that does not change the qualitative aspects. In the second line we use the Villain form of the cosine. In the third line we have written the field p = ∂α + ∂⁻¹q′ as a curl of α plus a field of a particular monopole current configuration q′, ∂⁻¹q′. Here ∂⁻¹q′ denotes a particular configuration of p that gives the monopole currents, that is, one that satisfies q′ = ∂p. Then we shifted a → a − α. The summation over α extends the integration of a over the whole real line.
The prime on q′ denotes the fact that we are summing over fields for which ∂q′ = 0. The third line can be obtained from the fourth one by completing the square, shifting J and integrating it out. In the fifth line, the ∆ denotes that the operator inside of it is zero. This line is obtained from the fourth one by integrating (summing) by parts and integrating out the a.

Next, as shown explicitly below, in our case there are fields J0 and L0 such that

η = ∇ · J0 (A8)
J0 = (∗P∂L0)/2π (A9)
∂J0 = 0 (A10)

The P shifts a real number by a multiple of 2π so that the result is in the interval (−π, π]. Using (A8) in (A7) we see ∂ ∗ (J − J0) = 0 and hence we can write

J = J0 + (∗∂L)/2π (A11)

for some field L. To substitute this into (A7) we notice the following:

J² = J0² + (∗∂L/2π)² + 2J0 · ∗∂L/2π ≃ J0² + (∗∂L/2π)².

The ≃ denotes that these expressions are equal under integration, which follows from Eq. (A10). Also

∂⁻¹q′ · ∗∂L ≃ − ∗∂∂⁻¹q′ · L = − ∗q′ · L ≡ −Q′ · L,

$$e^{i\,\partial^{-1}q'\cdot *P\partial L_0} = e^{i\,\partial^{-1}q'\cdot *\partial L_0} \simeq e^{-iQ'\cdot L_0},$$

where Q′ ≡ ∗q′, and we have dropped inconsequential ± signs; in the last line, the P can be removed because the resulting expression, which is in the exponent, differs from the original one by a multiple of 2π.

With this we can proceed to complete the duality

(∂L)2 ′·(L+L0) (∂L)2 Q·(L+L0−∂θ) (∂L)2 (L+L0−∂θ−2πp)2 (∂L)2 −λ cos(L+L0−∂θ) (A12)

In the first line the summation over Q′ is over integer fields Q′ with zero divergence, ∇ · Q′ = 0, i.e., currents. In the second line we introduced θ, which imposes this constraint as a Lagrange multiplier, and summed by parts. In the third line we added a small term Q²/2λ and assumed that it is not going to change the basic behavior of the system. Then we summed out Q, which introduced the integer p because Q is an integer (this is the Poisson summation formula). The second term is the Villain form of the cosine. In the last line we approximated it by the cosine.

To complete the derivation, it remains to find J0 and L0.
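The step that sums out the integer field Q uses the Poisson summation formula on a single link variable. A minimal numerical sketch of that identity is given below, with x standing for one component of L + L0 − ∂θ. It is an illustration under stated assumptions: the overall factor sqrt(2πλ) is one of the numerical factors the text drops, the sums are truncated at a hypothetical `cutoff`, and the function names are not from the paper.

```python
import cmath
import math

def sum_over_q(x, lam, cutoff=60):
    """Sum over integers Q of exp(-Q^2 / (2 lam) + i Q x), truncated at |Q| <= cutoff."""
    return sum(cmath.exp(-q * q / (2.0 * lam) + 1j * q * x)
               for q in range(-cutoff, cutoff + 1))

def sum_over_p(x, lam, cutoff=60):
    """sqrt(2 pi lam) * sum over integers p of exp(-lam (x - 2 pi p)^2 / 2) (Villain form)."""
    return math.sqrt(2.0 * math.pi * lam) * sum(
        math.exp(-0.5 * lam * (x - 2.0 * math.pi * p) ** 2)
        for p in range(-cutoff, cutoff + 1))

checks = []
for lam in (0.5, 2.0, 5.0):
    for x in (0.0, 0.7, 2.5):
        lhs = sum_over_q(x, lam)
        rhs = sum_over_p(x, lam)
        checks.append(abs(lhs - rhs) < 1e-9 * max(1.0, abs(rhs)))
print(all(checks))  # prints True
```

The right-hand side is periodic in x with period 2π, which is what allows it to be approximated by a cosine in the last line of (A12).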
The η has values ητ(x, y, z, τ + 1/2) = (−1)^{x+y+z} 2S and zero for other components. As is easily checked,

(J0)τx(x + 1/2, y, z, τ + 1/2) = (S/3) (−1)^{x+y+z} (A13)

and similarly for y and z, with other components (other than the ones obtained by permutation of indices) being zero. This gives the right η and satisfies ∂J0 = 0. The L0 can be chosen as in Fig. 3.

In the final expression (A12) the L is a 1-form and hence a gauge field. The θ is a 0-form, a number on a circle, and hence a matter field. Thus we obtained a noncompact U(1) gauge theory coupled to scalar fields of monopoles with frustrated hopping.

1 S. Taniguchi et al., J. Phys. Soc. Jpn. 64, 2758 (1995).
2 D. S. Chow, P. Wzietek, D. Fogliatti, B. Alavi, D. J. Tantillo, C. A. Merlic, and S. E. Brown, Phys. Rev. Lett. 81, 3984 (1998).
3 H. Kageyama, K. Yoshimura, R. Stern, N. V. Mushnikov, K. Onizuka, M. Kato, K. Kosuge, C. P. Slichter, T. Goto, and Y. Ueda, Phys. Rev. Lett. 82, 3168 (1999); H. Kageyama, M. Nishi, N. Aso, K. Onizuka, T. Yosihama, K. Nukui, K. Kodama, K. Kakurai, and Y. Ueda, Phys. Rev. Lett. 84, 5876 (2000).
4 A. W. Sandvik, S. Daul, R. R. P. Singh, and D. J. Scalapino, Phys. Rev. Lett. 89, 247201 (2002).
5 K. S. D. Beach and A. W. Sandvik, cond-mat/0612126.
6 K. Harada, N. Kawashima, and M. Troyer, J. Phys. Soc. Jpn. 76, 013703 (2007).
7 A. Banerjee, S. V. Isakov, K. Damle, and Y. B. Kim, cond-mat/0702029.
8 K. Harada and N. Kawashima, Phys. Rev. B 65, 052403 (2002).
9 D. P. Arovas and A. Auerbach, Phys. Rev. Lett. 61, 617 (1988); Phys. Rev. B 38, 316 (1988).
10 N. Read and S. Sachdev, Phys. Rev. Lett. 62, 1694 (1989); Phys. Rev. B 42, 4568 (1990).
11 F. D. M. Haldane, Phys. Rev. Lett. 61, 1029 (1988).
12 O. I. Motrunich and T. Senthil, Phys. Rev. B 71, 125102 (2005).
13 J.-S. Bernier, Y.-J. Kao, and Y. B. Kim, Phys. Rev. B 71, 184406 (2005).
14 D. L. Bergman, G. A. Fiete, and L. Balents, Phys. Rev. B 73, 134402 (2006).
15 S. Sachdev and R. Jalabert, Mod. Phys. Lett. B 4, 1043 (1990).
16 S. Sachdev and K. Park, Ann.
Phys. (N.Y.) 298, 58 (2002).
17 W. Zheng and S. Sachdev, Phys. Rev. B 40, 2704 (1989).
18 E. Fradkin and S. Kivelson, Mod. Phys. Lett. B 4, 225 (1990).
19 E. Fradkin, Field Theories of Condensed Matter Systems, Westview Press, 1991.
20 H. T. Diep, A. Ghazali, and P. Lallemand, J. Phys. C 18, 5881 (1985).
21 K. Kim and D. Stroud, Phys. Rev. B 73, 224504 (2006).
22 D. A. Huse, W. Krauth, R. Moessner, and S. L. Sondhi, Phys. Rev. Lett. 91, 167004 (2003).
23 M. Hermele, M. P. A. Fisher, and L. Balents, Phys. Rev. B 69, 064404 (2004).
24 F. Alet, G. Misguich, V. Pasquier, R. Moessner, and J. L. Jacobsen, Phys. Rev. Lett. 97, 030403 (2006).
25 M. Peskin, Ann. Phys. (NY) 113, 122 (1978); R. Savit, Rev. Mod. Phys. 52, 453 (1980).

http://arxiv.org/abs/cond-mat/0612126
http://arxiv.org/abs/cond-mat/0702029

ABSTRACT

We study spin S=1 and S=3/2 Heisenberg antiferromagnets on a cubic lattice focusing on spin solid states. Using Schwinger boson formulation for spins, we start in a U(1) spin liquid phase proximate to Néel phase and explore possible confining paramagnetic phases as we transition away from the spin liquid by the process of monopole condensation. Electromagnetic duality is used to rewrite the theory in terms of monopoles. For spin 1 we find several candidate phases of which the most natural one is a phase with spins organized into parallel Haldane chains. For spin 3/2 we find that the most natural phase has spins organized into parallel ladders. As a by-product, we also write a Landau theory of the ordering in two special classical frustrated XY models on the cubic lattice, one of which is the fully frustrated XY model. In a particular limit our approach maps to a dimer model with 2S dimers coming out of every site, and we find the same spin solid phases in this regime as well.

<|endoftext|><|startoftext|>

Introduction

Symmetry is one of the most important notions in quantum field theory.
In many examples, it is useful in investigating properties of quantum field theories non-perturbatively, is a guiding principle in constructing field theories for various purposes such as grand unification, or gives powerful methods for finding exact solutions. It also plays important roles in actual renormalization procedures. Therefore it should be interesting to study symmetries also in noncommutative field theories [1, 2, 3, 4, 5], which may result from some quantum gravity effects [6].

A difficulty in the study in this direction is the apparent violation of basic symmetries such as Poincaré symmetry in the noncommutativity of spacetime. For example, the Moyal plane $[x^\mu, x^\nu] = i\theta^{\mu\nu}$ is translationally invariant, but is not Lorentz or rotationally invariant. Another example is the three-dimensional spacetime with noncommutativity $[x^i, x^j] = i\kappa\,\epsilon^{ijk}x_k$ (i, j, k = 1, 2, 3) [7, 8, 9, 10] with a noncommutativity parameter κ. This noncommutative spacetime is Lorentz-invariant, but is not invariant under the translational transformation $x^i \to x^i + a^i$ with c-number $a^i$. In fact, a naive construction of noncommutative quantum field theory on this spacetime leads to rather disastrous violations of energy-momentum conservation [10]: the violations coming from the non-planar diagrams do not vanish in the commutative limit κ → 0, as in the UV/IR mixing phenomena [11].

In recent years, however, there has been interesting conceptual progress in understanding symmetries in noncommutative field theories: the symmetry transformations in noncommutative spacetime are not of the usual Lie-algebraic type, but should be generalized to have Hopf algebraic structures. The Moyal plane was pointed out to be invariant under the twisted Poincaré transformation in [12, 13, 14] and under the twisted diffeomorphism in [15, 16, 17, 18]. There have been various proposals to implement the twisted Poincaré invariance in quantum field theories [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30].
As for the noncommutative spacetime with $[x^i, x^j] = i\kappa\,\epsilon^{ijk}x_k$, a noncommutative quantum field theory was derived as the effective field theory of three-dimensional quantum gravity with matter [31]. Its essential difference from the naive construction mentioned above is the nontrivial braiding for each crossing in non-planar Feynman diagrams. With this braiding, there exists a kind of conserved energy-momentum in the amplitudes, and the energy-momentum operators have Hopf algebraic structures.

The aim of this paper is to systematically understand these Hopf algebraic symmetries and their consequences in noncommutative field theories in the framework of braided quantum field theories proposed by Oeckl [34]. In the usual quantum field theories, symmetries give non-perturbative relations among correlation functions. We will see that such relations have natural extensions to the Hopf algebraic symmetries in braided quantum field theories, and will obtain the four conditions for the relations to hold. These conditions should be interpreted as the criteria of the symmetries in braided quantum field theories.

This paper is organized as follows. In the following section, we review braided quantum field theory. This review part follows faithfully the original paper [34], but figures are more extensively used in the proofs and the explanations to make this paper self-contained and intuitively understandable. We start with braided categories and braided Hopf algebras. Then correlation functions of braided quantum field theory are represented in terms of them. Finally braided Feynman rules are given. In Section 3, we first review the axioms of action¹ of an algebra on vector spaces. Then we consider the relations among correlation functions in braided quantum field theory. We find that four algebraic conditions are required for the relations to hold.
Then, as concrete examples, we discuss whether the noncommutative field theories mentioned above have the Poincaré symmetry by checking the four conditions. In the former case, we find that the twisted Poincaré symmetry is implemented only after the introduction of a non-trivial braiding factor, which agrees with the previous proposal in [21, 35]. In the latter case, we find that the theory has a kind of translational symmetry, which is different from the usual one by multi-field contributions. We also give some examples of such relations among correlation functions and the implications. The final section is devoted to summary and comments. We comment on quantum field theory on κ-Minkowski spacetime, whose noncommutativity of coordinates is [x0, xj] = xj (j = 1, 2, 3) [36].

¹We use the italic symbol to distinguish it from the action S.

2 Review of braided quantum field theory

2.1 Braided categories and braided Hopf algebras

First of all, we review braided categories and braided Hopf algebras [34, 37]. Braided categories are composed of an object X, which is a vector space, a dual object X∗, which is a dual vector space, and morphisms

ev : X∗ ⊗ X → k (evaluation), (1)
coev : k → X ⊗ X∗ (coevaluation), (2)

where k is a c-number. The composition of the two morphisms in an obvious way makes the identity. The braided categories also have an invertible morphism

ψV,W : V ⊗ W → W ⊗ V (braiding), (3)

where V, W are any pair of vector spaces. Generally the inverse of the braiding is not equal to the braiding itself. The braiding is required to be compatible with the tensor product such that

$$\psi_{U,V\otimes W} = (\mathrm{id}\otimes\psi_{U,W})\circ(\psi_{U,V}\otimes\mathrm{id}), \qquad \psi_{U\otimes V,W} = (\psi_{U,W}\otimes\mathrm{id})\circ(\mathrm{id}\otimes\psi_{V,W}). \tag{4}$$

Figure 1: The evaluation, coevaluation, braiding and its inverse.

The braiding is also required to be intersectional under any morphisms in a Hopf algebra. For example,

$$\psi_{Z,W}\,(Q\otimes\mathrm{id}) = (\mathrm{id}\otimes Q)\,\psi_{V,W} \ \text{ for any } Q : V \to Z, \qquad \psi_{V,Z}\,(\mathrm{id}\otimes Q) = (Q\otimes\mathrm{id})\,\psi_{V,W} \ \text{ for any } Q : W \to Z, \tag{5}$$

where Z is a vector space.
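The compatibility condition (4) can be made concrete with a toy braiding. The sketch below is an illustration only and is not a braiding used in the paper: it assumes graded vector spaces whose basis vectors carry integer degrees, and a hypothetical braiding ψ(e ⊗ f) = q^{deg(e)·deg(f)} f ⊗ e, extended to tensor products by adding degrees. The parameter `Q`, the spaces `U`, `V`, `W` and the helper `braid` are all invented for this example.

```python
from fractions import Fraction
from itertools import product

Q = Fraction(3, 2)  # hypothetical braiding parameter; any number with Q**2 != 1 will do

# Basis vectors are (label, degree) pairs; a tensor-product basis element is a tuple of them.
U = [("u0", 0), ("u1", 1)]
V = [("v0", 0), ("v1", 1), ("v2", 2)]
W = [("w0", 0), ("w1", 1)]

def degree(block):
    """Total degree of a tuple of basis vectors."""
    return sum(deg for _, deg in block)

def braid(vector, start, len_a, len_b):
    """Swap block A (len_a factors from `start`) with the following block B (len_b factors),
    multiplying each basis coefficient by Q ** (deg A * deg B)."""
    out = {}
    for basis, coeff in vector.items():
        a = basis[start:start + len_a]
        b = basis[start + len_a:start + len_a + len_b]
        swapped = basis[:start] + b + a + basis[start + len_a + len_b:]
        out[swapped] = out.get(swapped, 0) + coeff * Q ** (degree(a) * degree(b))
    return out

# A generic vector in U (x) V (x) W with nonzero rational coefficients.
vec = {basis: Fraction(k + 1, 7) for k, basis in enumerate(product(U, V, W))}

# psi_{U, V(x)W} = (id (x) psi_{U,W}) o (psi_{U,V} (x) id)
hexagon_1 = braid(vec, 0, 1, 2) == braid(braid(vec, 0, 1, 1), 1, 1, 1)
# psi_{U(x)V, W} = (psi_{U,W} (x) id) o (id (x) psi_{V,W})
hexagon_2 = braid(vec, 0, 2, 1) == braid(braid(vec, 1, 1, 1), 0, 1, 1)
# Braiding twice is not the identity, so the inverse braiding differs from the braiding.
double_braid_is_identity = braid(braid(vec, 0, 1, 1), 0, 1, 1) == vec

print(hexagon_1, hexagon_2, double_braid_is_identity)  # prints True True False
```

Setting `Q = Fraction(1)` recovers the trivial braiding (the plain flip), for which braiding twice does give the identity.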
We can represent these axioms in pictorial ways [38]. We write the morphisms ev, coev, ψ downwards as in Figure 1. Thus the axioms (4) are represented as in Figure 2, and the axioms (5) are represented as in Figure 3.

Figure 2: The axioms of braiding (4).

Figure 3: The axioms of braiding (5).

Next we consider the polynomials of X,

$$\hat X := \bigoplus_{n=0}^{\infty} X^n, \quad \text{with } X^0 := \mathbf{1} \text{ and } X^n := \underbrace{X\otimes\cdots\otimes X}_{n\ \text{times}}, \tag{6}$$

where 1 is the trivial one-dimensional space. X̂ naturally has the structure of a braided Hopf algebra via

· (product) : X̂ ⊗̂ X̂ → X̂, (7)
η (unit) : k → X̂ ; η(1) = 1, (8)
∆ (coproduct) : X̂ → X̂ ⊗̂ X̂ ; ∆φ = φ ⊗̂ 1 + 1 ⊗̂ φ, and ∆(1) = 1 ⊗̂ 1, (9)
ε (counit) : X̂ → k ; ε(φ) = 0, and ε(1) = 1, (10)
S (antipode) : X̂ → X̂ ; Sφ = −φ, and S(1) = 1, (11)

where φ ∈ X. The tensor product ⊗ is the same as the usual product of Xs, while the new tensor product ⊗̂ is the tensor product of X̂s. The coproduct ∆, counit ε, and antipode S of the products of Xs are defined inductively by

∆ ◦ · = (· ⊗̂ ·) ◦ (id ⊗̂ ψ ⊗̂ id) ◦ (∆ ⊗̂ ∆), (12)
ε ◦ · = · ◦ (ε ⊗̂ ε), (13)
S ◦ · = · ◦ ψ ◦ (S ⊗̂ S). (14)

These axioms are diagrammatically represented in Figure 4.

Figure 4: The axioms of coproduct, counit, antipode for products.

2.2 Braided quantum field theory

Next we represent braided quantum field theory [34] in terms of the braided category and the braided Hopf algebra. We take the vector space X as the space of a field φ(x), where x denotes a general index for independent modes of the field. Thus X̂ is the space of polynomials of the fields such as φ(x1)φ(x2) · · · φ(xn), and 1 corresponds to the constant field of unit. We also take the dual vector space X∗ as the space of differentials δ/δφ(x). We take the evaluation and the coevaluation as follows,

$$\mathrm{ev} : \frac{\delta}{\delta\phi(x)} \otimes \phi(x') \to \delta(x - x'), \tag{15}$$

$$\mathrm{coev} : 1 \to \int dx\ \phi(x) \otimes \frac{\delta}{\delta\phi(x)}, \tag{16}$$

where the distribution and the integration should be understood symbolically, and their detailed forms, which may contain non-trivial measures, depend on each case.

Figure 5: The differentials on X̂.
The differential on X̂ is defined by

diff := (êv ⊗ id) ◦ (id ⊗ ∆) : X∗ ⊗ X̂ → X̂, (17)

where êv|X∗⊗X^n = ev for n = 1, and 0 for n ≠ 1. Diagrammatically this is given by Figure 5.

To see whether the map diff really gives the differential of products, let us compute the differential of φ(x)φ(y) as a simple example using the definition (17). This becomes

$$\begin{aligned}
\mathrm{diff}\Big(\frac{\delta}{\delta\phi(x')} \otimes \phi(x)\phi(y)\Big)
&= (\widehat{\mathrm{ev}} \otimes \mathrm{id}) \circ (\mathrm{id}\otimes\Delta)\Big(\frac{\delta}{\delta\phi(x')} \otimes \phi(x)\phi(y)\Big) \\
&= (\widehat{\mathrm{ev}} \otimes \mathrm{id}) \circ \Big(\frac{\delta}{\delta\phi(x')} \otimes \Delta(\phi(x)\phi(y))\Big) \\
&= (\widehat{\mathrm{ev}} \otimes \mathrm{id}) \circ \Big(\frac{\delta}{\delta\phi(x')} \otimes \big(\phi(x)\phi(y)\,\hat\otimes\,1 + \phi(x)\,\hat\otimes\,\phi(y) + \psi(\phi(x)\,\hat\otimes\,\phi(y)) + 1\,\hat\otimes\,\phi(x)\phi(y)\big)\Big) \\
&= \delta(x' - x)\otimes\phi(y) + (\widehat{\mathrm{ev}} \otimes \mathrm{id}) \circ \Big(\frac{\delta}{\delta\phi(x')} \otimes \psi(\phi(x)\,\hat\otimes\,\phi(y))\Big),
\end{aligned}$$

where we have used the axiom (12) in deriving the third line. If the braiding is trivial, we find that the differential (17) satisfies the usual Leibniz rule.

Figure 6: Diagram of ψn,m.

Generally we find a braided Leibniz rule

$$\partial(\alpha\beta) = \partial(\alpha)\beta + \psi^{-1}(\partial\otimes\alpha)(\beta) \tag{20}$$

$$\partial(\alpha) = (\mathrm{ev}\otimes\mathrm{id}^{n-1})(\partial\otimes[n]_\psi\,\alpha), \tag{21}$$

where ∂ ∈ X∗, α, β ∈ X̂, and we have used a simplified notation

∂(α) := diff(∂ ⊗ α). (22)

Here n is the degree of α, and [n]ψ is called a braided integer, defined by

$$[n]_\psi := \mathrm{id}^n + \psi\otimes\mathrm{id}^{n-2} + \cdots + \psi_{n-2,1}\otimes\mathrm{id} + \psi_{n-1,1}, \tag{23}$$

where ψn,m is a braiding morphism given in Figure 6. The proofs of the formulas (20), (21) are in Appendix A.

Now we define a Gaussian integration, which defines the path integral. The definition is given by

$$\int \partial(\alpha w) := 0 \quad \text{for } \partial\in X^*,\ \alpha\in\hat X, \tag{24}$$

where w ∈ X̂ is a Gaussian weight. In field theory, w is the exponential of the free part of the action, e^{−S0}. In order to obtain a formula for correlation functions, we define a morphism γ : X∗ → X such that

∂(w) := −γ(∂)w. (25)

This morphism is assumed to commute with the braiding as in (5). If w = e^{−S0}, γ(∂) = ∂(S0). In field theory, this is the kinetic part of the action, or the inverse of the propagator.

Starting from (24), we can represent correlation functions of a free field theory in terms of the braided category and the braided Hopf algebra. This is the analog of the Wick theorem in braided quantum field theory.
The free $n$-point correlation function is defined by

$$Z^{(0)}_n(\alpha) := \frac{\int \alpha w}{\int w}, \tag{26}$$

where the degree of $\alpha$ is $n$. Algebraically, this is given by

$$Z^{(0)}_2 = \mathrm{ev} \circ (\gamma^{-1} \otimes \mathrm{id}) \circ \psi, \tag{27}$$

$$Z^{(0)}_{2n} = (Z^{(0)}_2)^n \circ [2n-1]'_\psi!!, \tag{28}$$

$$Z^{(0)}_{2n-1} = 0, \tag{29}$$

where

$$[2n-1]'_\psi!! := ([1]'_\psi \otimes \mathrm{id}^{2n-1}) \circ ([3]'_\psi \otimes \mathrm{id}^{2n-3}) \circ \cdots \circ ([2n-1]'_\psi \otimes \mathrm{id}), \tag{30}$$

$$[n]'_\psi := \mathrm{id}^n + \mathrm{id}^{n-2} \otimes \psi^{-1} + \cdots + \psi^{-1}_{1,n-1} = \psi^{-1}_{1,n-1} \circ [n]_\psi. \tag{31}$$

The proofs of (27), (28), (29) are in Appendix B.

Next we consider correlation functions in the presence of an interaction. For $S = S_0 + \lambda S_{\rm int}$, a correlation function is given perturbatively by

$$Z_n(\alpha) = \frac{\int \alpha\, e^{-S}}{\int e^{-S}} = \frac{\int \alpha\,(1 - \lambda S_{\rm int} + \cdots)\, e^{-S_0}}{\int (1 - \lambda S_{\rm int} + \cdots)\, e^{-S_0}}, \tag{32}$$

where $\alpha \in X^n$. Introducing a morphism $S_{\rm int}: k \to X^k$, where the exponent $k$ is the degree of $S_{\rm int}$, the correlation function is given algebraically by

$$Z_n = \frac{Z^{(0)}_n - \lambda Z^{(0)}_{n+k} \circ (\mathrm{id}^n \otimes S_{\rm int}) + \frac{1}{2}\lambda^2 Z^{(0)}_{n+2k} \circ (\mathrm{id}^n \otimes S_{\rm int} \otimes S_{\rm int}) + \cdots}{1 - \lambda Z^{(0)}_k \circ S_{\rm int} + \frac{1}{2}\lambda^2 Z^{(0)}_{2k} \circ (S_{\rm int} \otimes S_{\rm int}) + \cdots}. \tag{33}$$

Applying $Z_n$ to $\alpha \in X^n$, we obtain the correlation function (32). One can obviously extend $S_{\rm int}$ to include various interaction terms.

2.3 Braided Feynman rules

From the results of the preceding subsection, a correlation function can be represented as a sum of diagrams obeying the following rules.

• An $n$-point function $Z_n$ is a morphism $X^n \to k$. Thus a Feynman diagram starts with $n$ strands at the top and must be closed at the bottom.

• The propagator $Z^{(0)}_2: X \otimes X \to k$ is represented by the left of Figure 7, which is the abbreviation of Figure 9.

• The interaction vertex $S_{\rm int}: k \to X^k$ is represented by the right of Figure 7. In general the order of the strands is noncommutative.

• The two kinds of crossings, which are represented in Figure 8, correspond to the braiding and its inverse.

• Any Feynman diagram is built out of propagators, vertices, and crossings, and is closed at the bottom.

Figure 7: Propagator (left) and vertex (right).

Figure 8: The braiding $\psi$ (left) and its inverse $\psi^{-1}$ (right).
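For orientation, the recursion behind (28) can be sketched in the special case of a trivial braiding, where $\psi$ is the plain flip and every term of a braided integer simply selects one partner strand. Under that assumption (28) reduces to the ordinary Wick theorem: the last strand is joined to each of the remaining ones in turn, and the rest is paired recursively. The function names `pairings` and `free_correlator`, and the callable `propagator`, are hypothetical and not taken from the paper; a non-trivial braiding would additionally attach a braid-dependent factor to each term, which this sketch does not model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Pairing = List[Tuple[int, int]]


def pairings(strands: Sequence[int]) -> Iterator[Pairing]:
    """Yield every complete pairing of the given strands.

    Mirrors the recursion Z_n = (Z_{n-2} (x) Z_2) o ([n-1]' (x) id) for a
    trivial braiding: the last strand is joined to each other strand in turn.
    """
    strands = list(strands)
    if not strands:
        yield []
        return
    if len(strands) % 2 == 1:
        return  # odd number of strands: no complete pairing, cf. (29)
    last = strands[-1]
    for i in range(len(strands) - 1):
        partner = strands[i]
        rest = strands[:i] + strands[i + 1:-1]
        for sub in pairings(rest):
            yield sub + [(partner, last)]


def free_correlator(points: Sequence[float],
                    propagator: Callable[[float, float], float]) -> float:
    """Sum over pairings of products of two-point functions (trivial braiding only)."""
    total = 0.0
    for pairing in pairings(range(len(points))):
        term = 1.0
        for i, j in pairing:
            term *= propagator(points[i], points[j])
        total += term
    return total


if __name__ == "__main__":
    # Number of diagrams for 2, 4 and 6 external strands.
    for n in (2, 4, 6):
        print(n, sum(1 for _ in pairings(range(n))))
```

The number of terms generated for $2n$ strands is $(2n-1)!!$, which is the counting encoded by the braided double factorial (30) when all braidings are trivial.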
3 Symmetries in braided quantum field theory

In this section, we discuss symmetries in braided quantum field theory. In order to represent symmetry transformations on fields, we review the general description of an action in Section 3.1. In Section 3.2, we study relations among correlation functions and find four conditions for such relations to follow from a symmetry algebra. In Sections 3.3 and 3.4, we treat two examples of (braided) noncommutative field theories and discuss their Poincaré symmetries.

Figure 9: The propagator, which is abbreviated in the left figure of Figure 7.

3.1 General description of an action

We review the action of a general Hopf algebra on vector spaces in mathematical language [37, 39]. An action $\alpha_V$ is a map $\alpha_V: A \otimes V \to V$, where $A$ is an arbitrary Hopf algebra and $V$ is a vector space (in our case, $A$ is a symmetry algebra, and $V = X$ or $X^*$). We denote the coproduct and the counit of the Hopf algebra (see footnote 2) by $\Delta'$ and $\epsilon'$ to distinguish them from those of the braided Hopf algebra of fields in Section 2. We do not list all the axioms of an action; the ones important for us are the following.

• $\alpha_V$ satisfies

$$\alpha_V \circ (\cdot \otimes \mathrm{id}) = \alpha_V \circ (\mathrm{id} \otimes \alpha_V), \tag{34}$$

where both sides act on $A \otimes A \otimes V$. This means that $\alpha_V((a \cdot b) \otimes V) = \alpha_V(a \otimes \alpha_V(b \otimes V))$, where $a, b \in A$. In short, we can write this as

$$(a \cdot b) \triangleright V = a \triangleright (b \triangleright V). \tag{35}$$

• An action on $1$, which is an element of a vector space, is defined by

$$\alpha_V(a \otimes 1) = \epsilon'(a)\,1, \tag{36}$$

where $\epsilon'(a)$ is the counit of $a \in A$.

• An action on a tensor product of vector spaces $V, W$ is defined by

$$\alpha_{V \otimes W}(a) := ((\alpha_V \otimes \alpha_W) \circ \Delta')(a) = \sum_i \alpha_V(a^i_{(1)}) \otimes \alpha_W(a^i_{(2)}), \quad a \in A, \tag{37}$$

where $\Delta'(a) = \sum_i a^i_{(1)} \otimes a^i_{(2)}$ is the coproduct of the Hopf algebra $A$. In the case of a usual Lie-algebraic transformation, the coproduct is $\Delta'(a) = a \otimes 1 + 1 \otimes a$, where $1$ is in $A$. This gives the usual Leibniz rule.

• Since a Hopf algebra is coassociative,

$$((\Delta' \otimes \mathrm{id}) \circ \Delta')(a) = ((\mathrm{id} \otimes \Delta') \circ \Delta')(a), \tag{38}$$

the action on a tensor product of several vector spaces, which is obtained by repeated application of $\Delta'$ to $a$, is unique. An important consequence is that one can split the action on a tensor product of vector spaces as

$$a \triangleright (V_1 \otimes \cdots \otimes V_{k-1} \otimes V_k \otimes \cdots \otimes V_n) = \sum_i a^i_{(1)} \triangleright (V_1 \otimes \cdots \otimes V_{k-1}) \otimes a^i_{(2)} \triangleright (V_k \otimes \cdots \otimes V_n) \tag{39}$$

for any $k$.

2 We omit the antipode.

3.2 Symmetry relations among correlation functions and their algebraic descriptions

The expression (33) for the correlation functions is perturbative in the interactions, but it is an all-order algebraic description. Therefore we can use this expression to discuss the symmetry of the theory and the implied relations among correlation functions. We may even expect that the relations hold non-perturbatively.

In usual quantum field theory, if a field theory has a certain symmetry, there is a relation among the correlation functions of the form

$$\sum_{i=1}^{n} \langle \phi(x_1) \cdots \delta_a\phi(x_i) \cdots \phi(x_n) \rangle = 0, \tag{40}$$

where $\delta_a\phi(x)$ is the variation of a field under a transformation $a$, on the assumption that the path integral measure and the action are invariant under the transformation. If the coproduct of a symmetry algebra is not of the usual Lie-algebraic type, so that the Leibniz rule is deformed, the relation will generally have the form

$$\begin{aligned}
& c_a^{(bi)} \langle \phi(x_1) \cdots \delta_b\phi(x_i) \cdots \phi(x_n) \rangle \\
&+ c_a^{(bi)(cj)} \langle \phi(x_1) \cdots \delta_b\phi(x_i) \cdots \delta_c\phi(x_j) \cdots \phi(x_n) \rangle \\
&+ c_a^{(bi)(cj)(dk)} \langle \phi(x_1) \cdots \delta_b\phi(x_i) \cdots \delta_c\phi(x_j) \cdots \delta_d\phi(x_k) \cdots \phi(x_n) \rangle + \cdots = 0,
\end{aligned} \tag{41}$$

where the $c_a^{\cdots}$ are some coefficients and repeated indices are summed. The essential difference from (40) is the presence of multi-field contributions. In our algebraic language, the relation can be written as

$$Z_n(a \triangleright \chi) = \epsilon'(a)\, Z_n(\chi), \quad \text{for } a \in A,\ \chi \in X^n. \tag{42}$$

This is equivalent to Figure 10 in our diagrammatic representation. We now ask what algebraic structure is required for (42) to hold for any $a$ and $\chi$, i.e., for the theory to be invariant under the Hopf algebra $A$.

Figure 10: A relation among correlation functions in the diagrammatic representation.
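$n$ is the number of external legs, $k$ is the order of the interaction, and $p$ is the order of the perturbation. $n + kp$ is even.

Let us write the coproduct of an element $a \in A$ as

$$\Delta'(a) = \sum_s f^s \otimes g^s, \tag{43}$$

where $f^s, g^s \in A$. Since the coproduct must satisfy the Hopf algebra axiom [37],

$$(\epsilon' \otimes \mathrm{id})\Delta'(a) = (\mathrm{id} \otimes \epsilon')\Delta'(a) = a, \tag{44}$$

$f^s, g^s$ must satisfy

$$\sum_s \epsilon'(f^s) \otimes g^s = \sum_s f^s \otimes \epsilon'(g^s) = a. \tag{45}$$

For all the relations among correlation functions to hold, we find the following four conditions, which must be satisfied for any $a \in A$.

• (Condition 1) $S_{\rm int}$ must satisfy

$$a \triangleright S_{\rm int} = \epsilon'(a)\, S_{\rm int}. \tag{46}$$

• (Condition 2) The braiding $\psi$ is an intertwining operator. That is,

$$\psi(a \triangleright (V \otimes W)) = a \triangleright \psi(V \otimes W). \tag{47}$$

• (Condition 3) $\gamma^{-1}$ and $a$ commute,

$$a \triangleright (\gamma^{-1}(V)) = \gamma^{-1}(a \triangleright V). \tag{48}$$

• (Condition 4) Under an action $a$, the evaluation map satisfies

$$\mathrm{ev}(a \triangleright (X^* \otimes X)) = \epsilon'(a)\, \mathrm{ev}(X^* \otimes X). \tag{49}$$

Conditions 1 to 4 are represented diagrammatically in Figure 11. Clearly, when the algebra $A$ is generated by a finite number of independent elements, it is enough for these generators to satisfy the conditions.

Condition 1 is the requirement of the symmetry at the classical level for the interaction. We can extend this condition to

$$(a \triangleright X^n) \otimes S_{\rm int}^p = a \triangleright (X^n \otimes S_{\rm int}^p). \tag{50}$$

The proof is as follows. From the coproduct (43) and its coassociativity (39), the right-hand side of (50) is equal to

$$\sum_s (f^s \triangleright (X^n \otimes S_{\rm int}^{p-1})) \otimes g^s \triangleright S_{\rm int}. \tag{51}$$

Since Condition 1 implies

$$g^s \triangleright S_{\rm int} = \epsilon'(g^s)\, S_{\rm int}, \tag{52}$$

(51) becomes

$$\sum_s (f^s \triangleright (X^n \otimes S_{\rm int}^{p-1})) \otimes \epsilon'(g^s)\, S_{\rm int} = a \triangleright (X^n \otimes S_{\rm int}^{p-1}) \otimes S_{\rm int}, \tag{53}$$

where we have used (45). Iterating this procedure, we obtain the left-hand side of (50). Conditions 2, 3, and 4 can also be extended to

$$[n + kp - 1]_\psi!! \circ (a \triangleright X^{n+kp}) = a \triangleright [n + kp - 1]_\psi!!\, X^{n+kp}, \tag{54}$$

$$(\gamma^{-1} \otimes \mathrm{id})^{\frac{n+kp}{2}} \circ (a \triangleright X^{n+kp}) = a \triangleright (\gamma^{-1} \otimes \mathrm{id})^{\frac{n+kp}{2}} X^{n+kp}, \tag{55}$$

$$\mathrm{ev}^{\frac{n+kp}{2}}\bigl(a \triangleright (X^* \otimes X)^{\frac{n+kp}{2}}\bigr) = \epsilon'(a)\, \mathrm{ev}^{\frac{n+kp}{2}}\bigl((X^* \otimes X)^{\frac{n+kp}{2}}\bigr). \tag{56}$$

The extended conditions (50), (54), (55), (56) can be represented as in Figure 12.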
In the diagrammatic language, the relation among correlation functions holds if an action can pass downwards through a Feynman diagram and satisfies (36).

Figure 11: Conditions 1, 2, 3, and 4.

Figure 12: A relation among correlation functions is satisfied if the four conditions (46), (47), (48), (49) are satisfied.

3.3 Symmetries of the effective noncommutative field theory of three-dimensional quantum gravity coupled with scalar particles

In this subsection, we discuss the Poincaré symmetry of the effective noncommutative field theory of three-dimensional quantum gravity coupled with scalar particles, which was obtained in [31] by studying the Ponzano-Regge model [40] coupled with spinless particles. The symmetry of this theory is also known as DSU(2), which was discussed in [32, 33].

We first review the field theory [10, 31]. Let $\phi(x)$ be a scalar field on a three-dimensional space $x = (x^1, x^2, x^3)$. Its Fourier transformation is given by

$$\phi(x) = \int dg\, \tilde{\phi}(g)\, e^{\frac{i}{2\kappa}\mathrm{tr}(Xg)}, \tag{57}$$

where $\kappa$ is a constant, $X = i x^i \sigma_i$, and $g = P^0 - i\kappa P^i \sigma_i \in SO(3)$ (see footnote 3) with Pauli matrices $\sigma_i$. Here $\int dg$ is the Haar measure of $SO(3)$ and $P^0 = \pm\sqrt{1 - \kappa^2 P^i P_i}$ by definition. In the following discussions, we deal only with the Euclidean case, but the Lorentzian case can be treated in a similar manner by replacing $SO(3)$ with $SL(2, R)$. The star product is defined by

$$e^{\frac{i}{2\kappa}\mathrm{tr}(Xg_1)} \star e^{\frac{i}{2\kappa}\mathrm{tr}(Xg_2)} := e^{\frac{i}{2\kappa}\mathrm{tr}(Xg_1g_2)}. \tag{58}$$

Differentiating both sides of (58) with respect to $P_1^i := P^i(g_1)$ and $P_2^j := P^j(g_2)$ and then taking the limit $P_1^i, P_2^j \to 0$, one finds the $SO(3)$ Lie-algebraic space-time noncommutativity [7, 8, 9],

$$[x^i, x^j]_\star = 2i\kappa\,\epsilon^{ijk} x_k. \tag{59}$$

For example, the action (see footnote 4) of a $\phi^3$ theory is

$$S = \frac{1}{8\pi\kappa^3}\int d^3x \left[\frac{1}{2}(\partial_i\phi \star \partial_i\phi)(x) - \frac{1}{2}M^2(\phi \star \phi)(x) + \frac{\lambda}{3!}(\phi \star \phi \star \phi)(x)\right], \tag{60}$$

where $M^2 = \sin^2(m\kappa)/\kappa^2$. Its momentum representation is

$$S = \frac{1}{2}\int dg\, \bigl(P^2(g) - M^2\bigr)\tilde{\phi}(g)\tilde{\phi}(g^{-1}) + \frac{\lambda}{3!}\int dg_1 dg_2 dg_3\, \delta(g_1g_2g_3)\,\tilde{\phi}(g_1)\tilde{\phi}(g_2)\tilde{\phi}(g_3), \tag{61}$$

from which it is straightforward to read off the Feynman rules. Some quantum properties of this scalar field theory were analyzed in [10].
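The star product (58) composes momenta through group multiplication, so the momentum carried by a product of two plane waves is $P(g_1g_2)$ rather than $P(g_1) + P(g_2)$. The following Python sketch checks this numerically for the parametrization $g = P^0 - i\kappa P^i\sigma_i$ with the positive branch of $P^0$, using $(a\cdot\sigma)(b\cdot\sigma) = a\cdot b + i(a\times b)\cdot\sigma$. All function names (`group_element`, `matmul`, `momentum`, `composed_closed_form`) are hypothetical helpers introduced only for this illustration, and the numerical values are arbitrary.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[complex]]

# Pauli matrices sigma_1, sigma_2, sigma_3.
SIGMA: List[Matrix] = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def group_element(P: Sequence[float], kappa: float) -> Matrix:
    """Build g = P0 - i*kappa*P^i sigma_i with P0 = +sqrt(1 - kappa^2 P.P)."""
    p0 = math.sqrt(1.0 - kappa ** 2 * sum(p * p for p in P))
    g: Matrix = [[p0, 0], [0, p0]]
    for p, s in zip(P, SIGMA):
        for r in range(2):
            for c in range(2):
                g[r][c] += -1j * kappa * p * s[r][c]
    return g


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def momentum(g: Matrix, kappa: float) -> Tuple[float, List[float]]:
    """Read (P0, P^i) back from g, using tr(sigma_i sigma_j) = 2 delta_ij."""
    p0 = (g[0][0] + g[1][1]).real / 2.0
    P = []
    for s in SIGMA:
        gs = matmul(g, s)
        trace = gs[0][0] + gs[1][1]          # tr(g sigma_i) = -2i*kappa*P^i
        P.append((1j * trace / (2.0 * kappa)).real)
    return p0, P


def composed_closed_form(P1: Sequence[float], P2: Sequence[float],
                         kappa: float) -> Tuple[float, List[float]]:
    """P0 and P^i of g1*g2 written out in components."""
    p01 = math.sqrt(1.0 - kappa ** 2 * sum(p * p for p in P1))
    p02 = math.sqrt(1.0 - kappa ** 2 * sum(p * p for p in P2))
    cross = [P1[1] * P2[2] - P1[2] * P2[1],
             P1[2] * P2[0] - P1[0] * P2[2],
             P1[0] * P2[1] - P1[1] * P2[0]]
    dot = sum(a * b for a, b in zip(P1, P2))
    P = [p01 * P2[i] + P1[i] * p02 + kappa * cross[i] for i in range(3)]
    return p01 * p02 - kappa ** 2 * dot, P


if __name__ == "__main__":
    kappa = 0.5
    P1, P2 = [0.1, -0.2, 0.3], [0.05, 0.4, -0.1]
    g12 = matmul(group_element(P1, kappa), group_element(P2, kappa))
    print(momentum(g12, kappa))
    print(composed_closed_form(P1, P2, kappa))
```

The two printed results agree to rounding error. The component form is the combination $P^0_1P^i_2 + P^i_1P^0_2 + \kappa\epsilon^{ijk}P_{1j}P_{2k}$ that reappears in the action on products of fields (75) and in the coproduct (77); the cross-product term is what makes the composition depend on the order of $g_1$ and $g_2$.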
As can be seen from (59), the naive translational symmetry is violated. In fact, the violation is rather disastrous. There exists a kind of conserved energy-momentum in the amplitudes of the tree and the planar loop diagrams, but this energy-momentum is not conserved in the non-planar loop diagrams. Moreover, the violation of the energy-momentum conservation does not vanish in the commutative limit $\kappa \to 0$, due to a mechanism similar to the UV/IR phenomena [11].

3 The identification $g \sim -g$ is implicitly assumed.

4 Since in the Ponzano-Regge model the weight of the partition function is defined as $e^{iS}$ even though the theory is Euclidean, the sign of the mass term is not the usual one.

In the effective field theory of quantum gravity coupled with spinless particles, however, the Feynman rules also contain a non-trivial braiding rule for each crossing, which comes from a flatness condition in a graph of intersecting particles [31]. This can be incorporated as a braiding between the scalar fields,

$$\psi(\tilde{\phi}(g_1)\tilde{\phi}(g_2)) = \tilde{\phi}(g_2)\tilde{\phi}(g_2^{-1}g_1g_2), \tag{62}$$

in the braided quantum field theory. From a direct analysis of the Feynman graphs with this braiding rule, one can easily find that the energy-momentum mentioned above is conserved also in the non-planar diagrams. This suggests the existence of a translational symmetry in the quantum field theory.

In what follows, we discuss the embedding of this field theory into the framework of braided quantum field theory, and check the four conditions for its translational and rotational symmetries. We use the momentum representation, and take $X$ as the space of $\tilde{\phi}(g)$ and $X^*$ as that of $\delta/\delta\tilde{\phi}(g)$. We take the braided Hopf algebra of the fields as follows,

$$\Delta: \tilde{\phi}(g) \to \tilde{\phi}(g) \hat{\otimes} 1 + 1 \hat{\otimes} \tilde{\phi}(g), \tag{63}$$

$$\epsilon: \tilde{\phi}(g) \to 0, \tag{64}$$

$$S: \tilde{\phi}(g) \to -\tilde{\phi}(g), \tag{65}$$

$$\psi: \tilde{\phi}(g_1) \otimes \tilde{\phi}(g_2) \to \tilde{\phi}(g_2) \otimes \tilde{\phi}(g_2^{-1}g_1g_2). \tag{66}$$

The evaluation and coevaluation maps are given by

$$\mathrm{ev}: \frac{\delta}{\delta\tilde{\phi}(g)} \otimes \tilde{\phi}(g') \to \delta(g^{-1}g'), \tag{67}$$

$$\mathrm{coev}: 1 \to \int dg\, \tilde{\phi}(g) \otimes \frac{\delta}{\delta\tilde{\phi}(g)}.$$
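(68)

From $\gamma(\partial) = \partial S_0$, we have

$$\gamma\left(\frac{\delta}{\delta\tilde{\phi}(g)}\right) = (P^2(g) - m^2)\,\tilde{\phi}(g^{-1}), \qquad \gamma^{-1}(\tilde{\phi}(g)) = \frac{1}{P^2(g^{-1}) - m^2}\,\frac{\delta}{\delta\tilde{\phi}(g^{-1})}. \tag{69}$$

From the algebraic consistency conditions in Figure 13, the braidings between $X$ and $X^*$ and the braiding between $X^*$s are determined to be

$$\psi\left(\frac{\delta}{\delta\tilde{\phi}(g_1)} \otimes \tilde{\phi}(g_2)\right) = \tilde{\phi}(g_2) \otimes \frac{\delta}{\delta\tilde{\phi}(g_2^{-1}g_1g_2)}, \tag{70}$$

$$\psi\left(\tilde{\phi}(g_1) \otimes \frac{\delta}{\delta\tilde{\phi}(g_2)}\right) = \frac{\delta}{\delta\tilde{\phi}(g_2)} \otimes \tilde{\phi}(g_2g_1g_2^{-1}), \tag{71}$$

$$\psi\left(\frac{\delta}{\delta\tilde{\phi}(g_1)} \otimes \frac{\delta}{\delta\tilde{\phi}(g_2)}\right) = \frac{\delta}{\delta\tilde{\phi}(g_2)} \otimes \frac{\delta}{\delta\tilde{\phi}(g_2g_1g_2^{-1})}. \tag{72}$$

Figure 13: The algebraic consistency conditions of coevaluation map and $X$, $X^*$.

In this derivation, we have used the invariance of the Haar measure, $d(g^{-1}g'g) = dg'$.

Now we consider a translational transformation of the field. If we shift $x^i$ to $x^i + \epsilon^i$, a field $\phi(x)$ becomes

$$\phi(x) \to \phi(x + \epsilon) = \int dg\, \tilde{\phi}(g)\, e^{i(x+\epsilon)^i P_i(g)} \simeq \int dg\, (1 + i\epsilon^i P_i(g))\,\tilde{\phi}(g)\, e^{ix^i P_i(g)}. \tag{73}$$

Thus in the momentum representation, the translational transformation corresponds to an action

$$P^i \triangleright \tilde{\phi}(g) = P^i(g)\,\tilde{\phi}(g), \qquad P^0 \triangleright \tilde{\phi}(g) = P^0(g)\,\tilde{\phi}(g). \tag{74}$$

From the requirement that the star product (58) conserve a kind of momentum, the action on a product of fields should be

$$P^i \triangleright (\tilde{\phi}(g_1)\tilde{\phi}(g_2)) = P^i(g_1g_2)\,\tilde{\phi}(g_1)\tilde{\phi}(g_2) = (P^0_1 P^i_2 + P^i_1 P^0_2 + \kappa\epsilon^{ijk} P_{1j} P_{2k})\,\tilde{\phi}(g_1)\tilde{\phi}(g_2), \tag{75}$$

$$P^0 \triangleright (\tilde{\phi}(g_1)\tilde{\phi}(g_2)) = (P^0_1 P^0_2 - \kappa^2 P^i_1 P_{2i})\,\tilde{\phi}(g_1)\tilde{\phi}(g_2). \tag{76}$$

This determines the coproduct of $P^i, P^0$ as

$$\Delta'(P^i) = P^0 \otimes P^i + P^i \otimes P^0 + \kappa\epsilon^{ijk} P_j \otimes P_k, \tag{77}$$

$$\Delta'(P^0) = P^0 \otimes P^0 - \kappa^2 P^i \otimes P_i. \tag{78}$$

This coproduct is coassociative, which essentially comes from the associativity of the group multiplication. Applying the axiom (44) to (77) and (78), the counit of $P^i, P^0$ is given by

$$\epsilon'(P^i) = 0, \qquad \epsilon'(P^0) = 1, \tag{79}$$

so that $P^i$ annihilates the unit element, as required by (36). Since the conservation of momentum under the coevaluation map (68) requires, by (36), that the action of $P^i$ on $\int dg\,(\tilde{\phi}(g) \otimes \frac{\delta}{\delta\tilde{\phi}(g)})$ vanish, the action of $P^i$ on $\frac{\delta}{\delta\tilde{\phi}(g)}$ must be

$$P^i \triangleright \frac{\delta}{\delta\tilde{\phi}(g)} = P^i(g^{-1})\,\frac{\delta}{\delta\tilde{\phi}(g)}. \tag{80}$$

In the following, we show that the momentum algebra satisfies the four conditions (46), (47), (48), (49).

Condition 1 is satisfied since

$$P^i \triangleright S_{\rm int} \propto \int dg_1 dg_2 dg_3\, \delta(g_1g_2g_3)\, P^i \triangleright (\tilde{\phi}(g_1)\tilde{\phi}(g_2)\tilde{\phi}(g_3)) = \int dg_1 dg_2 dg_3\, \delta(g_1g_2g_3)\, P^i(g_1g_2g_3)\,(\tilde{\phi}(g_1)\tilde{\phi}(g_2)\tilde{\phi}(g_3)) = 0.$$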
(81) Condition 2 is satisfied since ψ(P i ⊲ (φ̃(g1)φ̃(g2))) = P i(g1g2)(φ̃(g2)φ̃(g 2 g1g2)), P i ⊲ ψ(φ̃(g1)φ̃(g2)) = P i(g2g 2 g1g2)(φ̃(g2)φ̃(g 2 g1g2)). Condition 3 is satisfied since P i ⊲ γ−1(φ̃(g)) = P 2(g−1)−m2 P i(g) δφ̃(g−1) γ−1(P i ⊲ φ̃(g)) = P 2(g−1)−m2 P i(g) δφ̃(g−1) Condition 4 is satisfied since P i ⊲ δφ̃(g1) ⊗ φ̃(g2) = P i(g−11 g2) ev δφ̃(g1) ⊗ φ̃(g2) = 0. (82) Thus we find that the effective braided noncommutative field theory of three-dimensional quantum gravity coupled with spinless particles has the translational symmetry. Next we consider a rotational symmetry. The rotational symmetry corresponds to an action Λ ⊲ φ̃(g) = φ̃(h−1gh), (83) which is the usual Lie-group one. The action on the tensor product is Λ ⊲ (φ̃(g1)⊗ φ̃(g2)) = φ̃(h−1g1h)⊗ φ̃(h−1g2h). (84) Thus the coproduct of the rotational symmetry is given by ∆′(Λ) = Λ⊗ Λ. (85) From the axiom (44), the counit of Λ is given by ǫ′(Λ) = 1. (86) Condition 1 is satisfied since Λ ⊲ Sint dg1dg2dg3δ(g1g2g3)Λ ⊲ (φ̃(g1)φ̃(g2)φ̃(g3)) dg1dg2dg3δ(g1g2g3)(φ̃(h −1g1h)φ̃(h −1g2h)φ̃(h −1g3h)) =ǫ′(Λ)Sint. (87) Condition 2 is satisfied since ψ(Λ ⊲ (φ̃(g1)⊗ φ̃(g2))) = φ̃(h−1g2h)⊗ φ̃(h−1g−12 g1g2h) Λ ⊲ ψ(φ̃(g1)⊗ φ̃(g2)) = φ̃(h−1g2h)⊗ φ̃(h−1g−12 g1g2h). (88) Condition 3 is satisfied since Λ ⊲ γ−1(φ̃(g)) = P 2(g−1)−m2 δφ̃(h−1g−1h) γ−1(Λ ⊲ φ̃(g)) = P 2(h−1g−1h)−m2 δφ̃(h−1g−1h) P 2(g−1)−m2 δφ̃(h−1g−1h) . (89) Condition 4 is satisfied since δφ̃(g1) ⊗ φ̃(g2) δφ̃(h−1g1h) ⊗ φ̃(h−1g2h) = δ(g−11 g2) = ǫ′(Λ)ev δφ̃(g1) ⊗ φ̃(g2) . (90) Thus we find that this braided noncommutative field theory has also the rotational sym- metry. 3.4 Twisted Poincaré symmetry of noncommutative field theory on Moyal plane In this subsection, we discuss the twisted Poincaré symmetry of noncommutative field theory on Moyal plane [xµ, xν ] = iθµν . For example, the action of a φ3 theory is given by (∂µφ ∗ ∂µφ)(x)− m2(φ ∗ φ)(x) + λ (φ ∗ φ ∗ φ)(x) , (91) where the star product is given by φ(x) ∗ φ(x) = e θµν∂xµ∂ νφ(x)φ(y) . 
(92) In the momentum representation, the action is (p2 −m2)φ̃(p)φ̃(−p) dDp1d µνp2νδ(p1 + p2 + p3)φ̃(p1)φ̃(p2)φ̃(p3) . (93) We take X as the space of φ̃(p) and X∗ as that of δ δφ̃(p) . Then we take the braided Hopf algebra as follows: ∆ : φ̃(p) → φ̃(p)⊗̂1+ 1⊗̂φ̃(p), (94) ǫ : φ̃(p) → 0, (95) S : φ̃(p) → −φ̃(p). (96) From γ(∂) = ∂S0 = (p 2 −m2)φ̃(−p), γ−1(φ̃(p)) = p2 −m2 δφ̃(−p) . (97) Let us consider the twisted Poincaré symmetry [12, 13, 14]. The coproduct and the counit of the twisted Poincaré algebra is given by ∆′(P µ) = P µ ⊗ 1+ 1⊗ P µ, ǫ′(P µ) = 0, ∆′(Mµν) =Mµν ⊗ 1+ 1⊗Mµν θαβ [(δµαP ν − δναP µ)⊗ Pβ + Pα ⊗ (δ ν − δνβP µ)], ǫ′(Mµν) = 0. (98) Thus the action of the twisted Lorentz algebra on the tensor product is Mµν ⊲ (φ̃(p1)⊗ φ̃(p2)) =Mµν ⊲ φ̃(p1)⊗ φ̃(p2) + φ̃(p1)⊗Mµν ⊲ φ̃(p2) θαβ [(δµαP ν − δναP µ) ⊲ φ̃(p1)⊗ Pβ ⊲ φ̃(p2) + Pα ⊲ φ̃(p1)⊗ (δµβP ν − δνβP µ) ⊲ φ̃(p2)], (99) where Mµν ⊲ φ̃(p) = i(pµ∂/∂pν − pν∂/∂pµ)φ̃(p) and P µ ⊲ φ̃(p) = pµφ̃(p). The actions of Mµν and P µ on δ δφ̃(p) Mµν ⊲ δφ̃(p) = i(pµ∂/∂pν − pν∂/∂pµ) δφ̃(p) , (100) P µ ⊲ δφ̃(p) = −pµ δ δφ̃(p) . (101) One easily finds that three conditions (46), (48), (49) are satisfied, but (47) is not if the braiding is trivial. In order to keep the invariance, the braiding must be taken as ψ(φ̃(p1)⊗ φ̃(p2)) = eiθ αβp2α⊗p1β(φ̃(p2)⊗ φ̃(p1)). (102) This agrees with the previous proposal [21, 35]. We can easily check that the translational symmetry holds since the coproduct ∆′(P µ) follows the usual Leibniz rule. 3.5 Relations among correlation functions : Examples Now we have checked, in all orders of perturbation, that the two theories in the preceding sections have symmetry relations among correlation functions implied by the Hopf algebra symmetries. In Section 3.3 we gave how the translational generator acts on a product of fields in (75), (76) in the momentum representation. 
Since the physical meaning of this Hopf algebra transformation is not so clear, it is interesting to see the symmetry relations among correlation functions explicitly. The same is true of the twisted Lorentz symmetry in Section 3.4. In this subsection, we work out some relations among correlation functions explicitly in the two theories.

In the effective quantum field theory of quantum gravity, the action of the translational generators on a correlation function is given by

$$\langle \tilde{\phi}(g_1) \cdots \tilde{\phi}(g_n) \rangle \to i\epsilon^i P_i(g_1 \cdots g_n)\,\langle \tilde{\phi}(g_1) \cdots \tilde{\phi}(g_n) \rangle \tag{103}$$

in the momentum representation, where $\epsilon^i$ is an infinitesimal parameter. Thus we obtain the relation

$$P_i(g_1 \cdots g_n)\,\langle \tilde{\phi}(g_1) \cdots \tilde{\phi}(g_n) \rangle = 0. \tag{104}$$

This is a (modified) momentum conservation law: the correlation function has support only on the vanishing momentum subspace, $P_i(g_1 \cdots g_n) = 0$. This all-order relation in the quantum field theory is a simple but important implication of the Hopf algebraic translational symmetry. It provides a good example of the physical importance of a Hopf algebraic symmetry: a Hopf algebra symmetry leads to a (modified) conservation law.

It is also interesting to see the relations in the coordinate representation, where the fields are defined by $\phi(x) = \int dp\, e^{ip\cdot x}\tilde{\phi}(p)$. As explicitly noted in the preceding subsections, we stress that the bases of the spaces $X$ of the field variables in the path integrals are parameterized in terms of momenta, and that the $\phi(x)$ are defined as c-number linear combinations of them. Therefore, an element $a \in A$ of a symmetry transformation acts as

$$a \triangleright \phi(x) = \int dp\, e^{ip\cdot x}\,(a \triangleright \tilde{\phi}(p)), \tag{105}$$

and the symmetry relations among correlation functions can be obtained by inverse Fourier transformations (with possibly non-trivial measures) of those in the momentum representation.
For example, in the case of the two point function, after the inverse Fourier transforma- tion, the relation among correlation functions is given by 〈∂iφ(x1)φ(x2) + φ(x1)∂iφ(x2)〉 = 0, (106) where we have used the relation (104). Interestingly, this is the usual relation in a transla- tionally invariant quantum field theory. In the case of the three point function, however, the relation is given by 〈∂iφ(x1) 1 + κ2∂2φ(x2) 1 + κ2∂2φ(x3) + 1 + κ2∂2φ(x1)∂ iφ(x2) 1 + κ2∂2φ(x3) 1 + κ2∂2φ(x1) 1 + κ2∂2φ(x2)∂ iφ(x3) + iκǫ 1 + κ2∂2φ(x1)∂jφ(x2)∂kφ(x3) + iκǫijk∂jφ(x1) 1 + κ2∂2φ(x2)∂kφ(x3)− iκǫijk∂jφ(x1)∂kφ(x2) 1 + κ2∂2φ(x3) + κ2∂jφ(x1)∂ jφ(x2)∂ iφ(x3)− κ2∂iφ(x1)∂kφ(x2)∂kφ(x3) + κ2∂kφ(x1)∂ iφ(x2)∂ kφ(x3)〉 = 0. (107) This is quite a non-trivial relation among correlation functions, and would be hard to find, if the Hopf algebra symmetry in the quantum field theory was not noticed. This would be another interesting example implying the physical importance of a Hopf algebra symmetry. In general, the relation has the form, ∂xli − i κǫijk∂xlj∂xmk +O(κ 2))〈φ(x1) · · ·φ(xn)〉 = 0. (108) In the κ → 0 limit, the relation approaches the usual relation. Thus the Hopf algebra sym- metry is a kind of translational symmetry modified by adding κ dependent higher derivative multi-field contributions. We can proceed in a similar manner for the twisted Lorentz symmetry. We have a general form of such a symmetry relation as Mµν ⊲ 〈φ̃(p1) · · · φ̃(pn)〉 = 0. (109) In the case of the two point function, the relation is given by 〈(xµ1∂ν − xν1∂µ)φ(x1)φ(x2) + φ(x1)(x ν − xν2∂µ)φ(x2)〉 = 0, (110) where we have used the momentum conservation. This is the same relation as that in a Lorentz invariant quantum field theory. 
In the case of the three point function, the relation is given by 〈(xµ1∂ν − xν1∂µ)φ(x1)φ(x2)φ(x3) + φ(x1)(x ν − xν2∂µ)φ(x2)φ(x3) + φ(x1)φ(x2)(x ν − xν3∂µ)φ(x3) iθαµ(∂αφ(x1)∂ νφ(x2)φ(x3) + ∂αφ(x1)φ(x2)∂ νφ(x3) + φ(x1)∂αφ(x2)∂ νφ(x3) − ∂νφ(x1)∂αφ(x2)φ(x3)− ∂νφ(x1)φ(x2)∂αφ(x3)− φ(x1)∂νφ(x2)∂αφ(x3)) iθαν(∂αφ(x1)∂ µφ(x2)φ(x3) + ∂αφ(x1)φ(x2)∂ µφ(x3) + φ(x1)∂αφ(x2)∂ µφ(x3) − ∂µφ(x1)∂αφ(x2)φ(x3)− ∂µφ(x1)φ(x2)∂αφ(x3)− φ(x1)∂µφ(x2)∂αφ(x3))〉 = 0. (111) In general, the relation among correlation functions has the from, ((x1µ∂x1ν − x1ν∂x1µ) + · · ·+ (xnµ∂xnν − xnν∂xmν) +O(θ))〈φ(x1) · · ·φ(xn)〉 = 0 (112) in the coordinate representation. The leading terms corresponds to the usual Lorentz trans- formation xµ → xµ + ǫµνxν . The above symmetry relations on Moyal plane can be represented in similar manners as the usual commutative cases, if we use star products. In the papers [24, 25, 26, 27, 28, 29, 30], they have pointed out that in coordinate representation, correlation functions on Moyal plane should be defined with star products extended to non-coincident points (see also [43]) instead of usual products since the usual commutative commutation relation [x i , x j ] = 0 (i, j = 1, · · · , n) is not invariant under the twisted Poincaré transformation. Carrying out Fourier transformation of the symmetry relation (109) in momentum representation to such a noncommutative coordinate representation, we obtain the symmetry relations in star tensor products. 
Namely (110) becomes 〈((xµ1∂ν − xν1∂µ)φ(x1)) ∗ φ(x2) + φ(x1) ∗ ((x ν − xν2∂µ)φ(x2))〉 = 0, (113) and (111) becomes 〈((xµ1∂ν − xν1∂µ)φ(x1)) ∗ φ(x2) ∗ φ(x3) + φ(x1) ∗ ((xµ2∂ν − xν2∂µ)φ(x2)) ∗ φ(x3) + φ(x1) ∗ φ(x2) ∗ ((x ν − xν3∂µ)φ(x3)) iθαµ(∂αφ(x1) ∗ ∂νφ(x2) ∗ φ(x3) + ∂αφ(x1) ∗ φ(x2) ∗ ∂νφ(x3) + φ(x1) ∗ ∂αφ(x2) ∗ ∂νφ(x3) − ∂νφ(x1) ∗ ∂αφ(x2) ∗ φ(x3)− ∂νφ(x1) ∗ φ(x2)∂α ∗ φ(x3)− φ(x1) ∗ ∂νφ(x2) ∗ ∂αφ(x3)) iθαν(∂αφ(x1) ∗ ∂µφ(x2) ∗ φ(x3) + ∂αφ(x1) ∗ φ(x2)∂µ ∗ φ(x3) + φ(x1) ∗ ∂αφ(x2) ∗ ∂µφ(x3) − ∂µφ(x1) ∗ ∂αφ(x2) ∗ φ(x3)− ∂µφ(x1) ∗ φ(x2) ∗ ∂αφ(x3)− φ(x1) ∗ ∂µφ(x2) ∗ ∂αφ(x3))〉 = 0. (114) More generally we can derive the symmetry relations of correlation functions for tensor fields φα1···αn(x) ≡ ∂α1 · · ·∂αnφ(x). For example in the case of the three point function of the tensor fields, the symmetry relation becomes 〈((M1µν)α1···αl δ1···δlφδ1···δl(x1)) ∗ φβ1···βm(x2) ∗ φγ1···γn(x3) + φα1···αl(x1) ∗ ((M 2µν)β1···βm δ1···δmφδ1···δm(x2)) ∗ φγ1···γn(x3) + φα1···αl(x1) ∗ φβ1···βm(x2) ∗ ((M 3µν)γ1···γn δ1···δnφδ1···δn(x3)) θαµ[∂αφα1···αl(x1) ∗ ∂ νφβ1···βm(x2) ∗ φγ1···γn(x3) + ∂αφα1···αl(x1) ∗ φβ1···βm(x2) ∗ ∂ νφγ1···γn(x3) + φα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ ∂ νφγ1···γn(x3) − ∂νφα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ φγ1···γn(x3) − ∂νφα1···αl(x1) ∗ φβ1···βm(x2)∂α ∗ φγ1···γn(x3) − φα1···αl(x1) ∗ ∂ νφβ1···βm(x2) ∗ ∂αφγ1···γn(x3)] θαν [∂αφα1···αl(x1) ∗ ∂ µφβ1···βm(x2) ∗ φγ1···γn(x3) + ∂αφα1···αl(x1) ∗ φβ1···βm(x2)∂ µ ∗ φγ1···γn(x3) + φα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ ∂ µφγ1···γn(x3) − ∂µφα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ φγ1···γn(x3) − ∂µφα1···αl(x1) ∗ φβ1···βm(x2) ∗ ∂αφγ1···γn(x3) − φα1···αl(x1) ∗ ∂ µφβ1···βm(x2) ∗ ∂αφγ1···γn(x3)]〉 = 0, (115) where (Mµν)α1···αn β1···βn = (Lµν)α1···αn β1···βn + (Sµν)α1···αn β1···βn (Lµν)α1···αn β1···βn = i(xµ∂ν − xν∂µ)δα1β1 · · · δαnβn (Sµν)α1···αn β1···βn = i(ηνβ1δ{α1 β2 · · · δαn}βn − ηµβ1δ{α1νδα2β2 · · · δαn}βn) (116) If we bring the operators (M iµν)α1···αn β1···βn (i = 1, 2, 3) out of the star products, θµν depen- dent terms are canceled. 
The final expressions are just the usual Lorentz rotations on the coordinates and the tensorial indices in the correlation functions. This is fully consistent with the discussions in [29]. 3.6 Origin of Hopf algebra symmetries To study more the meaning of these additional terms, let us see closer the transformation properties of the star products. In the latter case, it is known that the θµν dependence of the twisted Lorentz transformation (99) comes from the Lorentz transformation of θµν itself [41]. To see this, let us consider an infinitesimal Lorentz transformation, Λµν = δ ν + ǫ The transformation of θµν is given by θµν → θµν + ǫµρθρν + ǫνρθµρ := θµν + δθµν . (117) If one considers not only the transformation of the coordinates, x ′µ = xµ + ǫµνxν , but also (117), and assumes that φ(x) ∗θ φ(x) and φ′(x′) ∗θ+δθ φ′(x′) be equal, one obtains, after the Fourier transformation, φ̃′(p1)⊗ φ̃′(p2) (ǫµνMµν ⊗ 1+ 1⊗ ǫµνMµν + δθµνPµ ⊗ Pν) φ̃(p1)⊗ φ̃(p2) ǫµν∆′Mµν φ̃(p1)⊗ φ̃(p2), (118) which agrees with (99). This shows that the additional part of the coproduct of Mµν takes into account the transformation of the non-dynamical background parameter θµν . The former case can be discussed in a similar manner. The definition of the star product is given by iPi(g1) ⋆x e ixiPi(g2) = eix iPi(g1g2), (119) where we have explicitly indicated the coordinate where the star product is taken. Then we recognize that ei(x+ǫ) iPi(g1) ⋆x+ǫ e i(x+ǫ)iPi(g2) and ei(x+ǫ) iPi(g1) ⋆x e i(x+ǫ)iPi(g2) give distinct values. Namely, if the coordinate of the star product is also shifted, ei(x+ǫ) iPi(g1) ⋆x+ǫ e i(x+ǫ)iPi(g2) = ei(x+ǫ) iPi(g1g2), (120) but, if not, ei(x+ǫ) iPi(g1) ⋆x e i(x+ǫ)iPi(g2) = eiǫ iPi(g1)eiǫ iPi(g2)eix iPi(g1g2). (121) Therefore, if we take the translational transformation as (120), and carry out the same procedure in deriving (59), we always obtain a translational invariant commutation relation5, [(x+ ǫ)i, (x+ ǫ)j ]⋆x+ǫ = 2iκǫ ijk(x+ ǫ)k. 
(122)

5 There is a similar discussion in [42].

Now, assuming that $\phi(x) \star_x \phi(x)$ and $\phi'(x') \star_{x'} \phi'(x')$ are equal under the translation $x^i \to x'^i = x^i + \epsilon^i$, we obtain, after the Fourier transformation,

$$\tilde{\phi}'(g_1)\tilde{\phi}'(g_2) = (1 - i\epsilon^i P_i(g_1g_2))\,\tilde{\phi}(g_1)\tilde{\phi}(g_2), \tag{123}$$

which is the same as (75). From these two examples, we anticipate that the multi-field contributions in (41) come from the transformation properties of the star products.

4 Summary and comments

We have discussed symmetries in noncommutative field theories in the framework of braided quantum field theory. We have obtained the algebraic conditions for a Hopf algebra to be a symmetry of a braided quantum field theory, by discussing the conditions under which the relations among correlation functions generated by the transformation algebra hold. We have then applied our discussion to the Poincaré symmetries in the effective noncommutative field theory of three-dimensional quantum gravity coupled with spinless particles and in the noncommutative field theory on the Moyal plane. In the former case, we can understand the braiding between fields, which was derived from the three-dimensional quantum gravity computation, from the viewpoint of the translational symmetry of the noncommutative field theory on a Lie-algebraic noncommutative spacetime. In the latter case, we have found that the twisted Lorentz symmetry on the Moyal plane is a symmetry of the quantum field theory only after the inclusion of the nontrivial braiding factor, in agreement with the previous proposal [28, 35]. We have also discussed the meaning of the Hopf algebra symmetries from the viewpoint of the coordinate representation.

A noncommutative field theory on $\kappa$-Minkowski spacetime has recently been discussed [36]. Since the noncommutativity of the coordinates is given by $[x^0, x^j] = \frac{i}{\kappa}x^j$, this noncommutative field theory does not have the naive translational symmetry. We may introduce a non-trivial braiding between fields, as in the effective field theory discussed in Section 3.3, to keep the momentum conservation. However, while the effective field theory has the braided category structure because of the invariance of the Haar measure, $d(g^{-1}g'g) = dg'$, the measure of the momentum space of the field theory on $\kappa$-Minkowski spacetime is only left-invariant [36]. Therefore it is not clear to us whether we can embed this field theory on $\kappa$-Minkowski spacetime into the framework of braided quantum field theory.

Acknowledgments

We would like to thank S. Terashima and S. Sasaki for useful discussions and comments, and would also like to thank L. Freidel for stimulating discussions and for explaining their recent results during his stay at the Yukawa Institute for Theoretical Physics after the 21st Nishinomiya-Yukawa Memorial Symposium. Y.S. was supported in part by JSPS Research Fellowships for Young Scientists. N.S. was supported in part by the Grant-in-Aid for Scientific Research No.13135213, No.16540244 and No.18340061 from the Ministry of Education, Science, Sports and Culture of Japan.

A The proofs of the formulas (20), (21)

We give the proofs of the formulas (20) and (21) using diagrams. First we use the formula

$$\widehat{\mathrm{ev}}(\partial \otimes \alpha\beta) = \widehat{\mathrm{ev}}(\partial \otimes \alpha)\,\epsilon(\beta) + \widehat{\mathrm{ev}}(\partial \otimes \beta)\,\epsilon(\alpha), \tag{124}$$

where $\alpha, \beta \in \hat{X}$. This is clear from the definition of $\widehat{\mathrm{ev}}$. Figure 14 gives the proof of (20). In the first line, we use the axiom (12), and in the second line we use the lemma (124). The last line follows from the property of the counit.

Next we prove (21). Applying the braided Leibniz rule (20) with $\alpha$ regarded as an element of $X \otimes \hat{X}$, the left-hand side of (21) becomes Figure 15. The first term of Figure 15 becomes $(\mathrm{ev} \otimes \mathrm{id}^{n-1})(\partial \otimes \mathrm{id}^n\alpha)$ by the definition of the coproduct (9). In the second term of Figure 15, we split $\hat{X}$ into $X \otimes \hat{X}$ and repeat the same step. For example, if the degree of $\hat{X}$ is 3, the second term of Figure 15 can be reduced as in Figure 16.
We have used $\Delta X = X \hat{\otimes} 1 + 1 \hat{\otimes} X$ in the second line of Figure 16. The result agrees with (21). In the same way, we can obtain the formula (21) in general.

Figure 14: The proof of (20).

Figure 15: The left-hand side of (21).

Figure 16: The second term of Figure 15.

B The proofs of (27), (28), (29)

From the definition of $\gamma$ (25), we find that

$$\alpha a w = -\alpha\,\mathrm{diff}(\gamma^{-1}(a) \otimes w), \tag{125}$$

for $a \in X$ and $\alpha \in \hat{X}$. On the other hand, adding $\gamma^{-1}$ and $\psi$ to the braided Leibniz rule (20) as in Figure 17, we find that

$$\alpha\,\mathrm{diff}(\gamma^{-1}(a) \otimes w) = \mathrm{diff}(\psi(\alpha \otimes \gamma^{-1}(a))\,w) - (\mathrm{diff} \circ \psi(\alpha \otimes \gamma^{-1}(a)))\,w. \tag{126}$$

Combining (125) and (126), we obtain

$$\alpha a w = -\mathrm{diff}(\psi(\alpha \otimes \gamma^{-1}(a))\,w) + (\mathrm{diff} \circ \psi(\alpha \otimes \gamma^{-1}(a)))\,w. \tag{127}$$

Integrating both sides of (127) and using (24), we find that

$$Z^{(0)}(\alpha a) = Z^{(0)}(\mathrm{diff} \circ \psi(\alpha \otimes \gamma^{-1}(a))). \tag{128}$$

If $\alpha$ is $b \in X$,

$$Z^{(0)}(ba) = Z^{(0)}(\mathrm{diff} \circ \psi(b \otimes \gamma^{-1}(a))) = \mathrm{ev} \circ \psi(b \otimes \gamma^{-1}(a)) = \mathrm{ev} \circ (\gamma^{-1} \otimes \mathrm{id}) \circ \psi(b \otimes a). \tag{129}$$

Thus we obtain (27). By putting $\alpha = 1$, it is clear that

$$Z^{(0)}_1(a) = 0. \tag{130}$$

Figure 17: The diagram obtained from adding $\gamma^{-1}$ and $\psi$ over the braided Leibniz rule.

Next we rewrite (128) for $\alpha \in X^{n-1}$ using the formula (21). Diagrammatically it is written as in Figure 18. The second equality is due to (21). Thus we obtain

$$Z^{(0)}_n = (Z^{(0)}_{n-2} \otimes Z^{(0)}_2) \circ ([n-1]'_\psi \otimes \mathrm{id}). \tag{131}$$

Iterating this, we find (28) for even $n$ and (29) for odd $n$.

Figure 18: Diagrammatic proof of (131).

References

[1] H. S. Snyder, “Quantized space-time,” Phys. Rev. 71, 38 (1947).

[2] C. N. Yang, “On Quantized Space-Time,” Phys. Rev. 72, 874 (1947).

[3] A. Connes and J. Lott, “Particle Models And Noncommutative Geometry (Expanded Version),” Nucl. Phys. Proc. Suppl. 18B, 29 (1991).

[4] S. Doplicher, K. Fredenhagen and J. E. Roberts, “The Quantum structure of space-time at the Planck scale and quantum fields,” Commun. Math. Phys. 172, 187 (1995) [arXiv:hep-th/0303037].

[5] N. Seiberg and E. Witten, “String theory and noncommutative geometry,” JHEP 9909, 032 (1999) [arXiv:hep-th/9908142].

[6] L. J. Garay, “Quantum gravity and minimum length,” Int.
J. Mod. Phys. A 10, 145 (1995) [arXiv:gr-qc/9403008].
[7] N. Sasakura, “Space-time uncertainty relation and Lorentz invariance,” JHEP 0005, 015 (2000) [arXiv:hep-th/0001161].
[8] J. Madore, S. Schraml, P. Schupp and J. Wess, “Gauge theory on noncommutative spaces,” Eur. Phys. J. C 16, 161 (2000) [arXiv:hep-th/0001203].
[9] L. Freidel and S. Majid, “Noncommutative harmonic analysis, sampling theory and the Duflo map in 2+1 quantum gravity,” arXiv:hep-th/0601004.
[10] S. Imai and N. Sasakura, “Scalar field theories in a Lorentz-invariant three-dimensional noncommutative space-time,” JHEP 0009, 032 (2000) [arXiv:hep-th/0005178].
[11] S. Minwalla, M. Van Raamsdonk and N. Seiberg, “Noncommutative perturbative dynamics,” JHEP 0002, 020 (2000) [arXiv:hep-th/9912072].
[12] M. Chaichian, P. P. Kulish, K. Nishijima and A. Tureanu, “On a Lorentz-invariant interpretation of noncommutative space-time and its implications on noncommutative QFT,” Phys. Lett. B 604, 98 (2004) [arXiv:hep-th/0408069].
[13] J. Wess, “Deformed coordinate spaces: Derivatives,” arXiv:hep-th/0408080.
[14] F. Koch and E. Tsouchnika, “Construction of theta-Poincare algebras and their invariants on M(theta),” Nucl. Phys. B 717, 387 (2005) [arXiv:hep-th/0409012].
[15] P. Aschieri, C. Blohmann, M. Dimitrijevic, F. Meyer, P. Schupp and J. Wess, “A gravity theory on noncommutative spaces,” Class. Quant. Grav. 22, 3511 (2005) [arXiv:hep-th/0504183].
[16] P. Aschieri, M. Dimitrijevic, F. Meyer and J. Wess, “Noncommutative geometry and gravity,” Class. Quant. Grav. 23, 1883 (2006) [arXiv:hep-th/0510059].
[17] X. Calmet and A. Kobakhidze, “Noncommutative general relativity,” Phys. Rev. D 72, 045010 (2005) [arXiv:hep-th/0506157].
[18] A. Kobakhidze, “Theta-twisted gravity,” arXiv:hep-th/0603132.
[19] M. Chaichian, P. Presnajder and A. Tureanu, “New concept of relativistic invariance in NC space-time: Twisted Poincare symmetry and its implications,” Phys. Rev. Lett. 94, 151602 (2005) [arXiv:hep-th/0409096].
[20] M. Chaichian, K. Nishijima and A. Tureanu, “An interpretation of noncommutative field theory in terms of a quantum shift,” Phys. Lett. B 633, 129 (2006) [arXiv:hep-th/0511094].
[21] A. P. Balachandran, G. Mangano, A. Pinzul and S. Vaidya, “Spin and statistics on the Groenwald-Moyal plane: Pauli-forbidden levels and transitions,” Int. J. Mod. Phys. A 21, 3111 (2006) [arXiv:hep-th/0508002].
[22] A. P. Balachandran, A. Pinzul and B. A. Qureshi, “UV-IR mixing in non-commutative plane,” Phys. Lett. B 634, 434 (2006) [arXiv:hep-th/0508151].
[23] F. Lizzi, S. Vaidya and P. Vitale, “Twisted conformal symmetry in noncommutative two-dimensional quantum field theory,” Phys. Rev. D 73, 125020 (2006) [arXiv:hep-th/0601056].
[24] A. Tureanu, “Twist and spin-statistics relation in noncommutative quantum field theory,” Phys. Lett. B 638, 296 (2006) [arXiv:hep-th/0603219].
[25] J. Zahn, “Remarks on twisted noncommutative quantum field theory,” Phys. Rev. D 73, 105005 (2006) [arXiv:hep-th/0603231].
[26] J. G. Bu, H. C. Kim, Y. Lee, C. H. Vac and J. H. Yee, “Noncommutative field theory from twisted Fock space,” Phys. Rev. D 73, 125001 (2006) [arXiv:hep-th/0603251].
[27] Y. Abe, “Noncommutative quantization for noncommutative field theory,” arXiv:hep-th/0606183.
[28] A. P. Balachandran, T. R. Govindarajan, G. Mangano, A. Pinzul, B. A. Qureshi and S. Vaidya, “Statistics and UV-IR mixing with twisted Poincare invariance,” Phys. Rev. D 75, 045009 (2007) [arXiv:hep-th/0608179].
[29] G. Fiore and J. Wess, “On ’full’ twisted Poincare symmetry and QFT on Moyal-Weyl spaces,” arXiv:hep-th/0701078.
[30] E. Joung and J. Mourad, “QFT with twisted Poincare invariance and the Moyal product,” arXiv:hep-th/0703245.
[31] L. Freidel and E. R. Livine, “Ponzano-Regge model revisited. III: Feynman diagrams and effective field theory,” Class. Quant. Grav. 23, 2021 (2006) [arXiv:hep-th/0502106].
[32] K. Noui, “Three dimensional loop quantum gravity: Towards a self-gravitating quantum field theory,” Class. Quant. Grav. 24, 329 (2007) [arXiv:gr-qc/0612145].
[33] K. Noui, “Three dimensional loop quantum gravity: Particles and the quantum double,” J. Math. Phys. 47, 102501 (2006) [arXiv:gr-qc/0612144].
[34] R. Oeckl, “Braided quantum field theory,” Commun. Math. Phys. 217, 451 (2001) [arXiv:hep-th/9906225].
[35] R. Oeckl, “Untwisting noncommutative R**d and the equivalence of quantum field theories,” Nucl. Phys. B 581, 559 (2000) [arXiv:hep-th/0003018].
[36] L. Freidel, J. Kowalski-Glikman and S. Nowak, “From noncommutative kappa-Minkowski to Minkowski space-time,” arXiv:hep-th/0612170.
[37] S. Majid, “Foundations of quantum group theory,” Cambridge, UK: Univ. Pr. (1995) 607 p.
[38] S. Majid, “Beyond supersymmetry and quantum symmetry: An Introduction to braided groups and braided matrices,” arXiv:hep-th/9212151.
[39] A. Klimyk and K. Schmudgen, “Quantum groups and their representations,” Berlin, Germany: Springer (1997) 552 p.
[40] G. Ponzano and T. Regge, in “Spectroscopic and Group Theoretical Methods in Physics,” ed. F. Bloch, North-Holland, Amsterdam (1968).
[41] L. Alvarez-Gaume, F. Meyer and M. A. Vazquez-Mozo, “Comments on noncommutative gravity,” Nucl. Phys. B 753, 92 (2006) [arXiv:hep-th/0605113].
[42] A. Agostini, G. Amelino-Camelia, M. Arzano, A. Marciano and R. A. Tacchi, “Generalizing the Noether theorem for Hopf-algebra spacetime symmetries,” arXiv:hep-th/0607221.
[43] R. J. Szabo, “Quantum field theory on noncommutative spaces,” Phys. Rept. 378, 207 (2003) [arXiv:hep-th/0109162].

ABSTRACT

Braided quantum field theories proposed by Oeckl can provide a framework for defining quantum field theories having Hopf algebra symmetries. In quantum field theories, symmetries lead to non-perturbative relations among correlation functions. We discuss Hopf algebra symmetries and such relations in braided quantum field theories.
We give the four algebraic conditions between Hopf algebra symmetries and braided quantum field theories, which are required for the relations to hold. As concrete examples, we apply our discussions to the Poincare symmetries of two examples of noncommutative field theories. One is the effective quantum field theory of three-dimensional quantum gravity coupled with spinless particles given by Freidel and Livine, and the other is noncommutative field theory on Moyal plane. We also comment on quantum field theory on kappa-Minkowski spacetime.
<|endoftext|><|startoftext|>
1. Introduction

Solar flares first revealed themselves as visual perturbations of the solar atmosphere (“white light flares”) and hence were immediately construed as a photospheric process. With the invention of spectroscopic techniques, though, it became clear that chromospheric emission lines such as Hα revealed flare presence much more readily. This led to the concept of the “chromospheric flare” and to a great deal of observational material on Hα flares and eruptions, as reviewed by Smith & Smith (1963), Zirin (1966), or Švestka (1976), for example. At some point, prior to the discovery of coronal flare effects, misinterpretations of the Hα line profile even led to the incorrect idea that a flare was a sudden cooling of the solar atmosphere. In any case, a perturbation of the lower solar atmosphere violent enough to affect the solar luminosity itself (“white light”) implies a large energy content.

Our view of flares now emphasizes the high temperatures and non-thermal effects seen in the corona, and we generally believe the chromospheric effects themselves to be secondary in nature. This may be true, but nonetheless the modern observations confirm the fact that the lower solar atmosphere dominates the radiant energy budget of a flare via the UV and white-light continua. Somehow, therefore, the energy stored in the solar corona rapidly focuses down into regions visible in chromospheric signatures; this accounts for the high contrast of flare effects there. Thus the “chromospheric flare” remains essential to our understanding of the overall processes involved.

http://arxiv.org/abs/0704.0823v1 (Hudson)

The chromosphere nowhere exists as a well-defined layer with a reproducible height structure. In this paper I use the term interchangeably with “lower solar atmosphere,” embracing the phenomena of the visible photosphere through the transition region. During flares the structure of these “layers” and the physical conditions within them may change drastically. The changes generally happen so fast and on such small spatial scales that we cannot observe them comprehensively. Understanding the impulsive phase in the chromosphere may therefore seem like something of a lost cause from the point of view of theory, especially in view of our inability to understand the quiet chromosphere any better than we do. The data repeatedly reveal that we simply have not yet resolved the spatial or temporal structures involved in the impulsive phase, and that without knowing the geometry of the physical structure, we cannot really comprehend its physics. The TRACE (Handy et al. 1999) and RHESSI (Lin et al. 2002) observations have provided more than one recent breakthrough, however, and it may be that we are beginning to understand the gradual phase of a flare at least.

This review is organized around several topics involving the behavior of the chromosphere during a flare. These include the process of “chromospheric evaporation” (Section 4), flare energetics (Section 5), the mechanisms of flare continuum emission (Section 6), and the inference of flare structure from the morphology of the chromospheric flare (Section 7 and Section 9).
In Section 2 and Section 3 we give an overview of the history of chromospheric flares and show a cartoon to establish a working model of a solar flare. Sections 8 and 11 discuss large-scale magnetic reconnection and theoretical ideas, and Section 10 presents a γ-ray mystery.

2. Historical Development

Although it was the white-light continuum that initially revealed the existence of solar flares, the advent of spectroscopy (e.g., Hale 1930) allowed their regular observation via the Hα line (see Švestka 1966 for a discussion of the historical development of these observations). This strong absorption line actually becomes an emission line during bright flares, and Hα limb observations frequently show prominences and eruptions. Hα observers came to recognize a particular flare morphology, the so-called two-ribbon flare. Bruzek (1964) described the patterns followed by these events, which provided strong evidence that the solar corona had to play a major role in flare development. Figure 1 reproduces one of Bruzek’s sketches, and then illustrates in a cartoon (due to Anzer & Pneuman 1982) how this morphology led to our standard magnetic-reconnection scenario that tries to embrace the X-ray observations and the coronal mass ejections (CMEs) as well as the chromospheric ribbon structures.

In this standard picture a solar flare develops in a complicated manner that involves restructuring of the coronal magnetic field in such a way as to release energy. The immediate effects of this energy release are to produce broad-band “impulsive phase” emissions and to drive chromospheric gas up into coronal magnetic loops, the process we term “chromospheric evaporation.” A part of the magnetic field structure may actually erupt and open out into the solar wind, in the sense that the field lines stretch out past the Alfvén critical point of the flow.
This opening may consist of rising loops which then take the form of a coronal mass ejection (CME), or it may involve interactions with previously open field (a process often termed “interchange reconnection” nowadays; see Heyvaerts et al. 1977). If a CME does accompany the flare, as it almost invariably does for flares of GOES class X or greater, the energy involved in mass motions may be comparable to the luminous energy (e.g., Emslie et al. 2005). Generally the observations are limited in resolution, both temporal and spatial, and especially in spectral coverage. Thus we often resort to a cartoon that serves to identify how the essential parts of a flare relate to one another.

Chromospheric Flares

Figure 1. Left: one of Bruzek’s (1964) sketches, showing a flare with ribbons on the disk and its equivalent Hα “loop prominence system” over the limb. This key observational pattern led directly to the formation of our standard flare model (right), in the form presented by Anzer & Pneuman (1982).

Soft X-ray observations show hot loops in the gradual phase of a flare. These result from the material “evaporated” from the chromosphere and have anomalously high gas pressure (but still low plasma β; however see Gary 2001). Whereas the pressure at the base of the corona normally is of order 0.1 dyne cm^-2, a bright flare loop can achieve 10^3 dyne cm^-2. This over-dense and over-hot coronal loop gradually cools, and in its final stages the remaining plasma returns to a more chromospheric state and suddenly becomes visible in Hα (Goldsmith 1971). The loops that have reached this state then form Bruzek’s Hα loop prominence system (Figure 1).

During the ribbon expansion another important phenomenon occurs: hard X-ray emission appears at the footpoints of the coronal loops that are in the process of being filled by chromospheric evaporation (Hoyng et al. 1981).
The hard X-rays show that a substantial part of flare energy appears in the form of non-thermal electrons (Kane & Donnelly 1971; Lin & Hudson 1976; Holman et al. 2003). The hard X-ray signature (and hence the energetic dominance of these electrons) is present whether or not the flare develops the two-ribbon morphology or has a CME association.

The hard X-ray emission occurs in the impulsive phase of the flare, contemporaneously with the period of chromospheric evaporation that fills the coronal loops and with the acceleration phase of the associated CME (Zhang et al. 2001). In Section 5 we describe this phase of the flare with the thick-target model (Kane & Donnelly 1971), which Hudson (1972) identified with the energy source of white-light flare continuum.

3. The Flare Spectrum

A (major) flare can be observed at almost any wavelength in a fast-rise/slow-decay time profile, with some (e.g., the white-light continuum) having a more impulsive variability, and others (e.g., the Balmer lines) having a more gradual pattern (Figure 2, right). We generally describe a flare as consisting of footpoint and ribbon structures in the lower atmosphere, coronal loops, and various kinds of ejecta. The impulsive phase is typically associated with the footpoint structures, and the gradual phase with the flare ribbons. Nowadays imaging spectroscopy in principle allows us to study these regions independently.

Figure 2. Left: Line widths of the Balmer-series lines, from the classic paper by Suemoto & Hiei (1959). The inferred densities added to the curves are log n_e = 13.5 and 13.3; the inferred filling factor is small, suggesting either filamentary structure or thin layering. Right: Typical time series of flare radiations, distinguishing the impulsive phase from the gradual phase (see Kane & Donnelly 1971).

Flare spectroscopy began with the observation of the Balmer series, which shows broad lines tending towards emission profiles as the flare gets more energetic. Early observations of the higher members of the sequence allowed the inference of a relatively high density and of a small filling factor (Suemoto & Hiei 1959; see the left panel of Figure 2). Such observations refer to what we would now call the gradual phase of the flare (see the right panel of Figure 2 for a sketch of the temporal development of a flare). In the impulsive phase the continuum appears in emission, as noted originally by Carrington and by Hodgson independently. The weak photospheric metallic lines may also go weakly into emission (or are filled in by continuum) and the recent observations of Xu et al. (2004) show that flare effects can appear even at the “opacity minimum” region of the spectrum, where one would expect much higher densities. In fact a single density could never properly describe such a heterogeneous structure, but each spectral band provides its own clues. At the time of writing no proper analysis of spectroscopic “response functions” (e.g., Uitenbroek 2005) for any of the signatures has yet been attempted, so our inference of flare structure from the spectroscopy alone is weak.

The continuum radiation seen in white light and the UV constitutes the bulk of flare radiated energy (Kane & Donnelly 1971; Woods et al. 2006). TRACE imaging of this emission component shows it to consist of unresolved, intensely bright fine structures (Hudson et al. 2006). The thick-target model invokes fast electrons (energies above about 10 keV) to transport coronal energy into the chromosphere. Here collisional losses provide the heating and footpoint emissions that accompany the hard X-ray bremsstrahlung. The thick-target model does not explain the particle acceleration, nor show how the footpoint sources can be so intermittent. We return to this question in Section 7.
The spectra emitted at the footpoints of the flaring coronal loops have contributions over an exceptionally broad wavelength range, as sketched in the right panel of Figure 2. The prototypical observable is the hard X-ray flux, which imaging observations show to be concentrated at the footpoints (Hoyng et al. 1981), but impulsive footpoint emissions also occur in many spectral windows ranging from the microwaves (limited presumably by opacity) to the γ-rays (limited presumably by detection sensitivity).

There is a large body of work on the Hα line alone, both observation and theory. Berlicki (2007) reviews the Hα spectroscopic material in detail in these proceedings. A strong absorption line forms across a wide range of continuum optical depths, and in principle this single line might provide sufficient information to infer the physical structure of the flare everywhere. In practice the complexities of the radiative transfer and of the flare motions, especially in the impulsive phase, make this information ambiguous (see Berlicki 2007).

4. Chromospheric Evaporation

The motions most directly relevant to the chromosphere are often called “chromospheric evaporation,” even though the direct Doppler signatures of this motion are normally found in lines formed at higher temperatures (but see Berlicki et al. 2005). That this process occurs (even if it is not “evaporation” strictly speaking) was suggested by the early observations of loop prominence systems (e.g., Bruzek 1964) with their “coronal rain,” and Neupert (1968) established its association with non-thermal processes such as bursts of microwave synchrotron radiation. The thermal microwave spectrum (e.g., Hudson & Ohki 1972) made it particularly clear that the gradual phase of a solar flare involves the temporary levitation of chromospheric material into the corona, as opposed to the process that might be imagined from the earlier term “sporadic coronal condensation” (e.g., Waldmeier 1963).
The flows involved in chromospheric evaporation are along the field direction and serve to create systems of coronal loops with relatively high gas pressure and therefore higher (but still probably low) plasma beta.

The early observational indications of chromospheric evaporation actually came from blueshifts in EUV and soft X-ray lines (e.g., Antonucci et al. 1982; Acton et al. 1982) such as those from FeXXV or CaXIX. Figure 3 shows an image-resolved view of Doppler shifts in an evaporative flow (Czaykowska et al. 1999). The chromospheric effects are more subtle and in fact the impulsive-phase evaporation is difficult to disentangle from other effects (Schmieder et al. 1987). The high-temperature blueshifts correspond to upward velocities of some hundreds of km/s and seldom appear in the absence of a stationary emission line; in other words, hot plasma has already accumulated in coronal loops as the process continues. Based on theory and simulations (Fisher et al. 1985) one can distinguish “explosive” and “gentle” evaporation, depending upon the physics of energy deposition (e.g., Abbett & Hawley 1999). In explosive evaporation, driven hypothetically by an electron beam, one has the additional complication of a “chromospheric condensation” that produces a redshift as well. Schmieder et al. (1987) and Berlicki et al. (2005) survey our overall understanding.

Figure 3. Imaging spectroscopy from SOHO/CDS of EUV emission lines in the gradual phase of a two-ribbon flare, showing the clear signature of blueshifted upflows in the expected locations along the flare ribbons. This is “gentle” evaporation not associated with strong hard X-ray emission (from Czaykowska et al. 1999). Note that CDS produces images by scanning in one spatial dimension, so that each image (while monochromatic) is not instantaneous.
It would be fair to comment that the explosive evaporation stage remains ill-understood, even though in principle it describes the key physics of sudden mass injections into flare loops.

Figure 4. Characteristic radiative cooling time (upper) as a function of height in the VAL-C model, crudely estimated as described in the text. The lower panel shows the temperature in this model. Horizontal axis of both panels: VAL-C height, km.

5. Energetics and Magnetic Field

We can use the standard VAL-C model (Vernazza et al. 1981), as discussed further in the Appendix, to discuss the energetics. First we establish that the chromosphere and the rest of the lower solar atmosphere (i.e., that for which τ5000 < 1) have negligible heat capacity and limited time scales. Figure 4 shows an estimate of the radiative time scale in the VAL-C model (Vernazza et al. 1981). This shows 3σ(z)kT/L⊙, where σ is the surface density as a function of height above the τ5000 = 1 layer, and L⊙ the solar luminosity. The time scale decreases below 1 sec only above z ∼ 515 km, near the temperature minimum in the VAL-C model. Above this height any energy injected into the system will tend to radiate rapidly, resulting in a direct energy balance between input and output energy, rather than a local storage and release. At lower altitudes we would not expect to see rapid variability.

The model also allows us to ask whether the chromosphere itself can store energy comparable to that released in a major flare or CME. Table 1 gives some order-of-magnitude properties for a chromospheric area of 10^19 cm^2, showing both possible sources (bold) and sinks (italics) of energy. For the magnetic field we simply assume 10 or 1000 G as representative cases. Using the total magnetic energy in this manner is an upper limit, since the actual free energy would depend on its degree of non-potentiality.
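As a rough cross-check of the numbers used in this section, the following Python sketch recomputes the radiative time scale near the temperature minimum and the magnetic and gravitational entries of Table 1. It is only an illustration under stated assumptions, not a reconstruction of the calculation behind the figure or the table: the time scale is read here as thermal energy per unit area, 3NkT with N = σ/m_H, divided by the solar surface flux; the solar constants, the temperature-minimum value of 4170 K, and the use of the hydrogen mass as the mean mass per particle are assumptions; the slab thickness of 2.5 × 10^8 cm and the masses are taken from Table 1 and its footnote.

```python
import math

# Physical constants in cgs units (standard values, assumed here).
K_B = 1.380649e-16   # Boltzmann constant, erg/K
M_H = 1.6735e-24     # hydrogen atom mass, g (assumed mean mass per particle)
L_SUN = 3.828e33     # solar luminosity, erg/s
R_SUN = 6.957e10     # solar radius, cm
G_SUN = 2.74e4       # solar surface gravity, cm/s^2

# Geometry taken from Table 1 and its footnote.
AREA = 1e19          # chromospheric area, cm^2
THICKNESS = 2.5e8    # vertical extent, cm

# Solar surface flux, used as the radiative loss rate per unit area.
F_SUN = L_SUN / (4.0 * math.pi * R_SUN**2)


def magnetic_energy(b_gauss, area=AREA, thickness=THICKNESS):
    """Total magnetic energy B^2/(8 pi) integrated over a uniform slab, in erg."""
    return b_gauss**2 / (8.0 * math.pi) * area * thickness


def gravitational_energy(mass_g, height=THICKNESS, g=G_SUN):
    """Potential energy needed to lift a mass through `height`, in erg."""
    return mass_g * g * height


def radiative_time(column_mass, temperature, flux=F_SUN):
    """Crude cooling time 3 N k T / F for a column of `column_mass` g/cm^2, in s."""
    n_column = column_mass / M_H  # particles per cm^2 (hydrogen only, assumed)
    return 3.0 * n_column * K_B * temperature / flux


if __name__ == "__main__":
    for b in (10.0, 1000.0):
        print(f"B = {b:6.0f} G: magnetic energy ~ {magnetic_energy(b):.1e} erg")
    print(f"Gravitational energy ~ {gravitational_energy(4e19):.1e} erg")
    # Column mass above the temperature minimum: Table 1 mass divided by area.
    sigma_tmin = 5e17 / AREA
    print(f"Radiative time at Tmin ~ {radiative_time(sigma_tmin, 4170.0):.2f} s")
```

Under these assumptions the magnetic energies come out near 10^28 and 10^32 erg for 10 and 1000 G, the gravitational energy near 3 × 10^32 erg, and the time scale at the temperature minimum just under 1 s, in line with the values quoted in the text and in Table 1.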
We find that magnetic energy storage limited to the volume of the chromosphere will not suffice, unless unobservably small-scale fields there somehow dominate. The gravitational potential energy also will not suffice. Estimates of this sort confirm the idea that the flare energy must reside in the corona prior to the event.

The estimate of gravitational potential energy is somewhat more ambiguous. The Table shows the value needed to displace the entire atmosphere by its total thickness, the equivalent of roughly 3′′ in the VAL-C model. There does not seem to be any evidence for such a displacement, although I am not aware of any searches. It is likely that the stresses that store energy in the coronal field have their origin deeper in the convection zone, rather than in the atmosphere (McClymont & Fisher 1989). Actually the observable changes of gravitational energy are even of the wrong sign, given that we normally observe only outward motions (against gravity) during a flare.

Table 1. Properties of a chromospheric volume of area 10^19 cm^2

Parameter                   VAL-C          VAL-C above Tmin
Mass                        4×10^19 g      5×10^17 g
Magnetic energy (10 G)      1×10^28 erg    8×10^27 erg
Magnetic energy (10^3 G)    1×10^32 erg    8×10^31 erg
Gravitational energy (a)    3×10^32 erg    3×10^30 erg
Thermal energy              2×10^31 erg    3×10^29 erg
Kinetic energy              3×10^29 erg    3×10^27 erg
Ionization energy           4×10^32 erg    6×10^30 erg

(a) Potential energy for a vertical displacement of 2.5 × 10^8 cm

From Table 1 one concludes that the chromosphere probably does not play a dominant role in the energetics of a solar flare, at least as described by a semi-empirical model such as VAL-C. This just restates the conventional wisdom, namely that the flare energy needs to be stored magnetically in the corona, rather than in the chromosphere where the radiation forms.
Note that this is backwards from the relationship for steady emissions: the requirement for chromospheric heating is larger than that for coronal heating, so it is possible to argue that the steady-state corona actually forms as a result of energy leakage from the process of chromospheric heating (e.g., Scudder 1994).

We can make a similar estimate for energy flowing up from the photosphere. The Alfvén speed at τ5000 = 1 ranges from 3 to 30 km s^-1, depending on the field strength, in the VAL-C model (see Appendix). Below the surface of the Sun vA drops rapidly because of the increase of hydrogen ionization. Thus chromospheric flare energy cannot have been stored just below the photosphere, since it could not propagate upwards rapidly enough (McClymont & Fisher 1989). This again supports the conventional wisdom that flare energy resides in the corona prior to the event.

To drive chromospheric radiations from coronal energy sources requires efficient energy transport, which is normally thought to be in the form of non-thermal particles (the “thick target model”, Brown 1971; Hudson 1972) or in the form of thermal conduction as in the formation of the classical transition region. Both of these mechanisms provide interesting physical problems, but the impulsive phase of the flare (where the thick-target model usually is thought to apply) certainly remains less understood. Section 11 comments on models.

The magnetic field in the chromosphere is decisively important but ill-understood. The plasma beta is generally low (see Appendix), so just as in the corona the dynamics depends more on the behavior of the field itself than on the other forces at work. Generally we believe that the subphotospheric field exists in fibrils, implying the existence of sheath currents that isolate the flux tubes from their unmagnetized environment.
On the other hand, the dominance of magnetic pressure in the chromosphere as well as the corona implies that the field must rapidly expand to become space-filling. Longcope & Welsch (2000) discuss the physics involved in this process as flux emerges from the interior. The effect of the flux emergence must be to create current systems linking the sources of magnetic stress below the photosphere with the non-potential fields containing the coronal free energy. A full theory of how this works does not exist, and we must add to the uncertainty the possibility of unresolved fields (e.g., Trujillo Bueno et al. 2004). Their suggested factor of 100 in B^2 would clearly affect the estimate of magnetic energy given in Table 1 and perhaps change everything. We note in this context that the “impulse response” flares (White et al. 1992) have scales so small that one could argue for an entirely chromospheric origin.

6. Energetics and the Formation of the Continuum

The formation of the optical/UV emission spectrum of a solar flare has from the outset presented a special challenge, since (a) it represents so much energy, and (b) it appears in what should be the stablest layer of the solar atmosphere. The recent observations of rapid variability and spatial intermittency make this all the more interesting, and these observations, now from space, also help to intercompare events; previous catalogs of white-light flares (e.g., Neidig 1989 and references therein) had to be based on spotty observations made with a wide variety of instruments.

Observationally, the continuum appears to have two classes, with most events (“Type I” spectra) showing evidence for recombination radiation via the presence of the Balmer edge and sometimes the Paschen edge as well. A few events (e.g., Machado & Rust 1974) show spectra with weak or unobservable Balmer jumps, implicating H− continuum as observed in normal photospheric radiation.
The spectra in the latter class (“Type II”) suggest a relationship to Ellerman bombs (Chen et al. 2001). However, the physics of Ellerman bombs appears to be quite different from that of solar flares (e.g., Pariat et al. 2004).

The strong suggestion from correlations is that non-thermal electrons physically transport flare energy from the corona, where it had been stored in the current systems of non-potential field structures, into the radiating layers. The hard X-ray bremsstrahlung results from the collisional energy losses of these particles, and other signatures (such as the optical/UV continuum) depend on secondary effects. Proposed mechanisms include direct heating, heating in the presence of non-thermal ionization, and radiative backwarming. In some manner these effects (or others not imagined) must provide the emissivity εν to support the observed spectrum. Note that the emissivity is often expressed in terms of the source function Sν = εν/κν via the opacity κν. In a steady state one would have energy balance between the input (e.g., electrons) and the continuum. Fletcher et al. (2007) have now shown that this implies energy transport by low-energy electrons, below 25 keV, as opposed to the 50 keV or higher suggested by some earlier authors. Such low-energy electrons have little penetrating power and could not directly heat the photosphere itself from a coronal acceleration site. Thus either the continuum arises from altered conditions in the chromosphere, or some mechanism must be devised to link the chromosphere and photosphere not involving the thick-target electrons.

“Radiative backwarming” – for example Balmer and Paschen continuum excited in the chromosphere and then penetrating down to and heating a deeper layer – could in principle provide a vertical step between energy source and sink.
One problem with this is that the weaker backwarming energy fluxes might not cause appreciable heating in the denser atmosphere, and thus not be able to contribute to the observed continuum excess, because of the short radiative time scale. This idea is a variant of the mechanism of non-thermal ionization originally proposed by Hudson (1972) in the “specific ionization approximation,” which involves no radiative-transfer theory and simply assumes ion-electron pairs to be created locally at a mean energy (∼30 eV per ion pair). Finally, the rapid variability observed in the continuum, even at 1.56 µm (Xu et al. 2006), provides a clear argument that the continuum forms at the temperature minimum or above (see Section 5, especially Figure 4).

Early proponents of particle heating as an explanation for white-light flares also considered protons as an energy source (Najita & Orrall 1970; Švestka 1970). This made sense, because protons at energies even below those characteristic of γ-ray emission-line excitation can penetrate more deeply than the electrons that produce hard X-rays. It makes even more sense now that we have the suggestion that ion acceleration in solar flares may rival electron acceleration energetically (Ramaty et al. 1995). Simnett & Haines (1990) suggest that particle acceleration in solar flares involves a neutral beam, implying that the major energy content (and hence the optical/UV continuum) would originate in the ion component. This idea does not appear to explain the apparent simultaneity of the footpoint sources (Sakao et al. 1996), and at present we do not understand the plasma physics of the particle acceleration and propagation well enough even to identify the location of the acceleration region.

7. Flare Structures Inferred from Chromospheric Signatures

The continuum kernels may move systematically for perhaps tens of seconds and generally have short lifetimes. We illustrate this in Figure 5 (from Fletcher et al. 2004).
This shows the motions of individual UV bright points within the flare ribbon structure. Such motions are only apparent motions, as in a deflagration wave, because they exceed the estimated photospheric Alfvén speed (see Section 4 and the Appendix). Figure 6 (from Hudson et al. 2006) makes the same point for a different flare, using TRACE white-light observations. The basic picture one gets from such observations is that the white light/UV continuum of a flare appears in compact structures that are essentially unresolved in space and in time within the present observational limits. These bright points contain enormous energy and thus must map directly to the energy source. We do not know if the fragmentation (intermittency) results from this mapping or is intrinsic to the basic energy-release mechanism.

How do the small chromospheric sources map into the corona, where the flare energy must reside on a large scale before its release? A strong literature has grown up regarding this point, interpreting the ribbon motions as measures of flux transfer in the standard magnetic-reconnection model (Poletto & Kopp 1986; see also literature cited, for example, by Isobe et al. 2005). The flux transfer in the photosphere is taken to measure the coronal inflow into the reconnecting current sheet, which appears to correlate with the radiated energy as seen in hard X-rays, UV, or Hα. Figure 1 (right) shows the assumed geometry linking the chromosphere and corona. The analysis extends to the multiple simultaneous UV footpoints apparently moving within the ribbons as they evolve, as noted in Figure 5 above. The analyses suggest a strong relationship between energy release and the inferred coronal Alfvén speed.

Figure 5. Flare footpoint apparent motions deduced from TRACE UV observations. Each squiggle represents the track of a bright point visible for several consecutive images at a few-second cadence, with the black dot showing the beginning of each track (Fletcher et al. 2004).

Figure 6. Intermittent structure seen in TRACE white-light images of an M-class flare on July 24, 2004. The individual frames have dimensions 32′′× 64′′. Note the presence of bright features that are consistent with the TRACE angular resolution and that change from frame to frame over the 30-second interval. These observations do not appear to resolve the fluctuations either in space or in time (Hudson et al. 2006).

Figure 7. Left: UV ribbons (TRACE observations) from a flare of November 23, 2000. The gray scale shows the time sequence of brightening. Right: Correlation between pixel brightness in Ribbon A and the inferred reconnection rate (from Saba et al. 2006).

8. Dynamics and Magnetic Reconnection

To release energy from the coronal magnetic field in a largely “frozen-field” plasma, a flare must involve mass motions. We often do observe apparent motions, both parallel and perpendicular to the field as indicated by the image striations (“loops”). Most of the observable motions are outward, leading to the idea of a “magnetic explosion” (e.g., Moore et al. 2001). Motions apparently perpendicular to the magnetic field may become coronal mass ejections (CMEs) and contain a great deal of energy (e.g., Emslie et al. 2005). These perpendicular motions also are involved in flare energy release; for example the large-scale magnetic reconnection involved in many flare models (Figure 1, right panel) necessarily involves “shrinkage” (e.g., Švestka et al. 1987; Forbes & Acton 1996). Note that this process is more of a magnetic implosion than a magnetic explosion (Hudson 2000).

The motion of flare footpoints and ribbons is (we believe) only apparent, because of the low Alfvén speed $v_A$ in the photosphere, where the field is temporarily anchored (“line-tied”). For $B = 1000$ G and $n = 10^{17}$ cm$^{-3}$ we find $v_A \sim 6$ km s$^{-1}$; observations often suggest motions an order of magnitude faster (e.g., Schrijver et al. 2006).
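As a quick check on this estimate, the Alfvén speed in Gaussian units is $v_A = B/\sqrt{4\pi\rho}$. The sketch below assumes a pure-hydrogen plasma with $\rho = n m_p$; that composition is an assumption made here for illustration, since the text does not state the mean mass per particle it used, and the function name is hypothetical.

```python
import math

M_PROTON_G = 1.6726e-24  # proton mass in grams


def alfven_speed_cm_s(b_gauss, n_cm3, mean_mass_g=M_PROTON_G):
    """Alfven speed v_A = B / sqrt(4 pi rho) in Gaussian-cgs units.

    Assumes a mass density rho = n * mean_mass_g (pure hydrogen by default).
    """
    rho = n_cm3 * mean_mass_g
    return b_gauss / math.sqrt(4.0 * math.pi * rho)


# Photospheric values quoted in the text: B = 1000 G, n = 1e17 cm^-3
v_a_km_s = alfven_speed_cm_s(1000.0, 1e17) / 1e5
print(f"v_A = {v_a_km_s:.1f} km/s")
```

With these inputs the result is close to 7 km/s, the same order as the value quoted above; a slightly larger mean mass per particle (for example, allowing for helium) lowers it toward 6 km/s.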
The motions therefore represent a wave-like conflagration moving through a relatively fixed magnetic-field structure. It is natural to imagine that this sequence of field lines links to the coronal energy-release site, which the standard model identifies with a current sheet that mediates large-scale magnetic reconnection. Figure 7 shows one example of the result of an analysis of the apparent motion of a flare ribbon (Saba et al. 2006). This and other similar analyses reveal a tendency for the “reconnection rate” to correlate with the pixel brightness. The reconnection rate is the rate at which flux is swept out in the ribbon motion, often expressed as an electric field from $E = v \times B$ (the so-called “reconnection electric field”). In this picture the flare ribbons are identified with “quasi-separatrix structures” where magnetic reconnection can take place most directly.

9. Surges, Sprays, and Jets

Chromospheric material also appears in the corona in the form of surges and sprays, which may have a close relationship to the flare process (e.g., Engvold 1980). In addition, of course, we observe filaments and prominences in chromospheric lines, and these also have a flare/CME association, but one too tangential for discussion in this review.

Surges and sprays are Hα ejecta, rising into the corona as a result of chromospheric magnetic activity. The literature traditionally distinguishes them by apparent velocity, with the faster-moving sprays taken to have stronger flare associations. Surges often appear to return to the Sun, while sprays accelerate beyond the escape velocity and do not return. Both appear to move along the magnetic field lines, but unlike the evaporation flow the surges and sprays incorporate material at chromospheric temperatures.

Modern soft X-ray and EUV data (Yohkoh, SOHO, and TRACE) have had sufficient time resolution to reveal the phenomenon of X-ray jets (Shibata et al.
1992); see also the UV observations of Dere et al. (1983). These tightly collimated structures at X-ray temperatures have a strong correlation with surges and sprays, and indeed presumably lead to the jet-like CMEs seen at much greater altitudes (Wang & Sheeley 2002). These events have a strong association with emerging flux, and indeed the X-ray jets invariably have an association with microflares and originate in the chromosphere near the microflare loop(s) (Shibata et al. 1992). As Zirin famously remarked, most emerging flux emerges within active regions, and that is where the jets preferentially occur. The site is frequently in the leading part of the sunspot group. Figure 8 (Canfield et al. 1996) shows the sequence of events in an explanation of these phenomena invoking magnetic reconnection to allow chromospheric material access to open fields. Note that this scenario imposes two requirements on the chromosphere: there must be open and closed fields juxtaposed, and a large-scale reconnection process must be able to proceed under chromospheric conditions. The Canfield et al. (1996) observations strongly imply that this process requires the presence of vertical electric currents supporting the observed twisting motions.

The surges, sprays, and jets, not to mention flares and CMEs, underscore the time dependence and three-dimensionality of what is often treated, for convenience, as a thin time-independent layer. The subject of spicules is outside the scope of this review, but we note that they represent a form of activity that occurs ubiquitously outside the magnetic active regions.

10. A Chromospheric γ-ray Mystery

The γ-ray observations of solar flares have begun, as did the radio and X-ray observations before them, to open new windows on flare physics. Share et al. (2004) have made a discovery that is difficult to understand and which involves chromospheric material.
They report observations of the line width of the 0.511 MeV γ-ray emission line formed by positron annihilation (Figure 9). This emission requires a complicated chain of events: the acceleration of high-energy ions, their collisional braking and nuclear interactions in the solar atmosphere, the emission of secondary positrons by the excited nuclei, the collisional braking of these energetic positrons in turn, and finally their recombination with ambient electrons to produce the 0.511 MeV γ-rays. Because the γ-ray observations are so insensitive, this process requires an energetically significant level of particle acceleration that is possibly distinct from the well-known electron acceleration in the impulsive phase.

Figure 8. A mechanism for jet/surge formation involving emerging flux (upper left), with magnetic reconnection against already-open fields (upper right), which may lead to a high-temperature ejection (the jet) entraining chromospheric material (the surge). The cartoon at lower right describes the observations of Canfield et al. (1996), who find a spinning motion suggesting that the process must occur in a 3D configuration rather than that of the cartoons left and above.

The mystery comes in the line width of the emission line. Surprisingly the pioneering RHESSI observations of Share et al. (2004) showed it to be broad enough to resolve. The likeliest source of this line broadening is Doppler motions in the positron-annihilation region. This requires the existence of a large column density (of order a gram cm$^{-2}$) at transition-region temperatures; the transition region under hydrostatic conditions would be many orders of magnitude thinner (see also Figure 11). According to Schrijver et al. (2006), the excitation of the footpoint regions during the time of intense particle acceleration only continues for some tens of seconds at most. This would represent the time scale for the apparent motion of a footpoint source across its diameter.
The γ-ray observations, on the other hand, require minutes of integration for a statistically significant line-profile measurement. We therefore are confronted with a major problem. What is the structure of the flaring atmosphere that permits the formation of the broad 0.511 MeV γ-ray line? Recent spectroscopic observations of the impulsive phase in the UV, as viewed off the limb (Raymond et al. 2007), make a conventional explanation difficult.

Figure 9. RHESSI γ-ray observations of the 0.511 MeV line of positron annihilation (Share et al. 2004), plotted against energy from 500 to 520 keV. The two line profiles are from different integrations in the late phase of the X17 flare of 28 October 2003; for the broader line the authors suggest thermal broadening, which would require a large column depth of transition-region temperatures during the flare.

11. Theory and Modeling

To understand the chromospheric spectrum of a solar flare we must understand the formation of the radiation and its transfer in the context of the motions produced by (or producing) the flare. The representation of the spectrum by a “semi-empirical model” represents one shortcut; in such an approach (e.g., the standard VAL model that we use in the Appendix) one attempts to construct a model atmosphere capable of describing the spectrum even if it may not be physically self-consistent. Such descriptions may however be sufficient in the gradual phase of a flare when the flare loops no longer have energy input and simply evolve by cooling and draining. Even here, however, we do not have a good understanding of the “moss” regions that form at the footpoints of these high-pressure loops (but see Berger et al. 1999). So far as I am aware there is no literature specifically on “spreading moss,” the similar structure that appears in association with flare ribbons.
A more complete approach to the physics comes from “radiation hydrodynamics” physical models, most recently those of Allred et al. (2005); see Berlicki (2007) for a fuller description. Such models solve the equations of hydrodynamics and radiative transfer simultaneously and can thus deal with chromospheric evaporation and the formation of the high-pressure flare loops. This framework is necessary if we are to be able to understand the flare impulsive phase (e.g., Heinzel 2003). Even these models do not have sufficient realism, though, since they work currently in one dimension and thus cannot follow the time development of the excitation properly; the high-resolution observations of UV and white light by TRACE clearly show that the energy release has unresolved scales. Further, as pointed out by Hudson (1972), the ionization of the chromosphere (and hence the formation of the continua) cannot be described by a fluid approximation, or even by non-LTE radiative transfer that assumes a unique temperature.

Figure 10. Continuum emission in the near infrared (1.56 µm, the “opacity minimum” region) during an X10 flare (Xu et al. 2004). Red shows the IR emissions, contours show the RHESSI 50-100 keV X-ray sources. The IR contrast relative to the preflare photosphere reached ∼20% in this event.

At present there has been little effort to create an electrodynamic theory of chromospheric flare processes, even though non-thermal particles are widely thought to provide the dominant energy in at least the impulsive phase. In the gradual phase there is interesting physics associated with heat conduction because the transition region would have to become so steep that classical conductivity estimates have difficulty (Shoub 1983). A more complete theory would have to take plasma effects into account and would probably contain elements of theories of the terrestrial aurora that are now largely missing from the solar lexicon.
This lack of self-consistency in the modeling probably means that we have major gaps in our understanding of, for example, the evaporation process as it affects the fractionation of the elements and of the ionization states of the flare plasma. The Appendix gives estimates of the ranges of some of the key plasma parameters in the chromosphere.

12. Conclusions

This article has reviewed chromospheric flare observations from the point of view of the newest available information – Yohkoh, SOHO, TRACE and RHESSI, for example, but not Hinode or STEREO (already launched), much less FASR or ATST (not launched yet at the time of writing). In spite of the high quality of the data prior to these missions, we still find major unsolved problems:

• How does the chromosphere obtain all of the energy that it radiates?
• How can flare effects appear at great depths in the photosphere?
• How is the anomalous 0.511 MeV line width produced?
• What are the elements of an electrodynamic theory of chromospheric flares?

In my view the solution of these problems cannot be found in chromospheric observations alone, because the physical processes involve much broader regions of the solar atmosphere. Even providing answers to these specific questions may not reveal the plasma physics responsible for flare occurrence, which may involve spatial scales too fine ever to resolve. But we can hope that new observations from space and from the ground, in wavelengths ranging from the radio to the γ-rays, will enable us to continue our current rapid progress, and can speculate that eventually numerical tools will supplement the theory well enough for us to achieve full comprehension of the important properties of flares.

To get to this point we will need to deal with the chromosphere, as messy as it is. One important task that is probably within our grasp now is the computation of response functions for physical models of flares.
At present these are restricted to very limited numerical explorations of the radiative transfer within the framework of one-dimensional radiation hydrodynamics (e.g., Allred et al. 2005). The energy transport in these models has been restricted to simplistic representations of particle beams, and does not take account of complicated flare geometries, waves, or various elements of plasma physics. Future developments of chromospheric flare theory will need to complete the picture in a more self-consistent manner.

Acknowledgments

This work has been supported by NASA under grant NAG5-12878 and contract NAS5-38099. I thank W. Abbett for a critical reading. I am also grateful to Rob Rutten for LaTeX instruction, and to Bart de Pontieu for meticulous keyboard entry.

Appendix: plasma parameters

The lower solar atmosphere marks the transition layer between regions of striking physical differences, and as one goes further up in height the tools of plasma physics should become more important. This Appendix evaluates for convenience several basic plasma-physics parameters for the conditions of the staple VAL-C atmospheric model (Vernazza et al. 1981; see footnote 1). This is a “semi-empirical model” which interprets a set of observations in terms of the theory of radiative transfer, but without any effort to have self-consistent physics. Such a model can accurately represent the spectrum but may or may not provide a good starting point for physical analysis. Because the optical depth of a spectral feature is the key parameter determining its structure, one often sees the model parameters plotted against continuum optical depth $\tau_{5000}$ evaluated at 5000 Å. Just for illustration, Figure 11 shows the VAL-C temperature separately as a function of height, column mass, and optical depth.
Note that features prominent in one display may appear to be negligible in another.

The VAL-C model is an “average quiet Sun” model, and like all static 1D models, it cannot describe the variability of the physical parameters that theory and observation require (see other papers in these proceedings, e.g., Carlsson’s review). Thus we should regard the plasma parameters estimated here as order-of-magnitude estimates only and note especially that the vertical scales, which depend in the model on the inferred optical-depth scale, may be systematically displaced.

Footnote 1: The VAL-C parameters are available within SolarSoft as the procedure VAL_C_MODEL.PRO.

Figure 11. Illustration of the structure of a semi-empirical model, using three different independent variables: the VAL-C temperature plotted against height, optical depth, and column mass.

The VAL-C model explicitly does not represent a chromosphere perturbed by a flare. Vernazza et al. (1981) and many other authors give more appropriate models derived by similar techniques for flares as well as other structures. As the discussion of the γ-ray signatures in Section 10 suggests, though, a powerful flare may be able to distort the lower solar atmosphere essentially beyond all recognition (especially in the impulsive phase). To estimate representative plasma parameters I have therefore chosen just to start with the basic VAL-C model, and simply to assume constant values of B at 10 G and 1000 G. The actual magnetic field may vary through this region (the “canopy”) but the details are little understood. The γ-ray literature usually uses a parametrization of the magnetic field strength $B \propto P_g^{\alpha}$ (Zweibel & Haber 1983), where $P_g$ is the gas pressure.
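The constant-field assumption can be turned into numbers with the standard Gaussian-cgs textbook expressions for the plasma beta, the electron plasma frequency, the electron gyrofrequency, and the Debye length. The following is a minimal sketch rather than the procedure used for the figures in this Appendix: the density and temperature in the example are illustrative placeholders, not values read from the VAL-C tables, the beta function takes a single total particle density instead of a species-weighted sum, and all function names are hypothetical.

```python
import math

# Physical constants in Gaussian-cgs units
E_CHARGE = 4.8032e-10    # electron charge, esu
M_ELECTRON = 9.1094e-28  # electron mass, g
C_LIGHT = 2.9979e10      # speed of light, cm/s
K_BOLTZ = 1.3807e-16     # Boltzmann constant, erg/K


def plasma_beta(n_particles_cm3, temp_k, b_gauss):
    """Gas pressure n k T divided by magnetic pressure B^2 / (8 pi)."""
    return n_particles_cm3 * K_BOLTZ * temp_k / (b_gauss**2 / (8.0 * math.pi))


def electron_plasma_frequency_hz(n_e_cm3):
    """Electron plasma frequency f_pe = omega_pe / (2 pi)."""
    omega = math.sqrt(4.0 * math.pi * n_e_cm3 * E_CHARGE**2 / M_ELECTRON)
    return omega / (2.0 * math.pi)


def electron_gyrofrequency_hz(b_gauss):
    """Electron Larmor (gyro) frequency f_ce = e B / (2 pi m_e c)."""
    return E_CHARGE * b_gauss / (2.0 * math.pi * M_ELECTRON * C_LIGHT)


def debye_length_cm(n_e_cm3, temp_k):
    """Electron Debye length sqrt(k T / (4 pi n_e e^2))."""
    return math.sqrt(K_BOLTZ * temp_k / (4.0 * math.pi * n_e_cm3 * E_CHARGE**2))


# Placeholder conditions (assumed for illustration only)
n_e, temp = 1e11, 1e4  # cm^-3, K
for b in (10.0, 1000.0):  # the two field strengths assumed in the text
    print(f"B = {b:6.0f} G: beta = {plasma_beta(n_e, temp, b):.2e}, "
          f"f_ce = {electron_gyrofrequency_hz(b):.2e} Hz")
print(f"f_pe = {electron_plasma_frequency_hz(n_e):.2e} Hz, "
      f"Debye length = {debye_length_cm(n_e, temp):.2e} cm")
```

Because beta scales as $B^{-2}$, moving from 10 G to 1000 G lowers it by four orders of magnitude at fixed density and temperature, which is why the choice of field strength dominates these estimates.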
The most complicated behavior of the plasma parameters happens preferentially near the top of the VAL-C model range (for example, Figure 12 shows that the collision frequency decreases below the plasma and Larmor frequencies) above the helium ionization level (or even below this level for strong magnetic fields). Because VAL-C ignores time dependences and 3D structure, and assumes $T_e = T_i$, we can expect that it has diminished fidelity as one approaches the unstable transition region; thus one should be especially careful not to take these approximations too literally. The following notes correspond to each panel of the figure. Most of the plasma-physics formulae used in this Appendix are from Chen (1984).

Figure 12. Various plasma parameters in the VAL-C model. We have assumed representative B values of 10 G and 1000 G. The different panels show the following, left to right and top to bottom: (a) Temperature. (b) Densities: solid, total hydrogen density; dotted, electron density; dashed, He I density; dash-dot, He II density. (c) Plasma beta: solid, for 1000 G; dotted, for 10 G; light solid, electron density as a fraction of total hydrogen density; dash-dot, the plasma parameter. (d) Frequencies: solid, electron and ion plasma frequencies; dotted, electron gyrofrequencies for 10 and 1000 G; dashed, electron and ion collision frequencies; dash-dot, electron/neutral collision frequency. (e) Velocities: solid, electron and ion thermal velocities; dashed, Alfvén speeds for 10 and 1000 G. (f) Scale lengths: solid, electron Larmor radii for 10 G and 1000 G; dotted, Debye length; dashed, ion and electron inertial lengths.

Temperature: The VAL-C model, like all of the semi-empirical models, sets $T_e = T_i$. It therefore cannot support plasma processes dependent upon different ion and electron temperatures, or more complicated particle distribution functions (e.g., Scudder 1994).
Densities: Total hydrogen density, electron density, and densities of He I and He II.

Dimensionless parameters: We approximate the plasma beta as $2(n_H + 2n_e)kT/(B^2/8\pi)$, with $n_H$ the hydrogen density and $n_e$ the electron density. Figure 12(c) also gives the number of electrons in a Debye sphere as the “plasma parameter” $\Lambda$.

Frequencies: The plasma frequency, the electron and proton Larmor frequencies, and the electron and ion collision frequencies $\nu_{ei} = 2.4 \times 10^{-6}\, n_e \ln\Lambda / T_{eV}^{3/2}$; $\nu_{ii} = 0.05 \times 4\nu_{ei}$; $\nu_{eH} = (n_H/n_e)\nu_e$, with $n_e$ in cm$^{-3}$ and $T_{eV}$ the temperature in eV, using $Z = 1.2$ and the Coulomb logarithm $\ln\Lambda = 23 - \ln(n_e^{0.5}\, T_{eV}^{-3/2})$ (Chen 1984; De Pontieu et al. 2001). Note that the collision frequencies are small compared with the plasma and Larmor frequencies above about 1000 km in this model. This means generally that plasma processes must have strong effects on the physical parameters of the atmosphere in this region.

Velocities: Electron and proton thermal velocities; Alfvén speeds $v_A$ for 10 and 1000 G.

Scale lengths: Electron Larmor radii for 10 and 1000 G, the ion inertial length $c/\omega_{pi}$, the electron inertial length $c/\omega_{pe}$, and the Debye length $\lambda_D$. The inertial lengths determine the scale for the particle demagnetization necessary for magnetic reconnection. For VAL-C parameters the ion inertial length increases to about 100 m in the transition region.

References

Abbett W. P., Hawley S. L., 1999, ApJ 521, 906
Acton L. W., Leibacher J. W., Canfield R. C., Gunkler T. A., Hudson H. S., Kiplinger A. L., 1982, ApJ 263, 409
Allred J. C., Hawley S. L., Abbett W. P., Carlsson M., 2005, ApJ 630, 573
Antonucci E., Gabriel A. H., Acton L. W., Leibacher J. W., Culhane J. L., Rapley C. G., Doyle J. G., Machado M. E., Orwig L. E., 1982, Solar Phys. 78, 107
Anzer U., Pneuman G. W., 1982, Solar Phys. 79, 129
Berger T. E., de Pontieu B., Fletcher L., Schrijver C. J., Tarbell T. D., Title A.
M., 1999, Solar Phys. 190, 409
Berlicki A., Heinzel P., Schmieder B., Mein P., Mein N., 2005, A&A 430, 679
Berlicki A., 2007, these proceedings
Brown J. C., 1971, Solar Phys. 18, 489
Bruzek A., 1964, ApJ 140, 746
Canfield R. C., Reardon K. P., Leka K. D., Shibata K., Yokoyama T., Shimojo M., 1996, ApJ 464, 1016
Chen F. F., 1984, Introduction to plasma physics, 2nd edition, New York: Plenum Press, 1984
Chen P.-F., Fang C., Ding M.-D., 2001, Chinese Journal of Astronomy and Astrophysics 1, 176
Czaykowska A., de Pontieu B., Alexander D., Rank G., 1999, ApJ 521, L75
De Pontieu B., Martens P. C. H., Hudson H. S., 2001, ApJ 558, 859
Dere K. P., Bartoe J.-D. F., Brueckner G. E., 1983, ApJ 267, L65
Emslie A. G., Dennis B. R., Holman G. D., Hudson H. S., 2005, Journal of Geophysical Research (Space Physics) 110, 11103
Engvold O., 1980, in M. Dryer, E. Tandberg-Hanssen (eds.), IAU Symp. 91: Solar and Interplanetary Dynamics, p. 173
Fisher G. H., Canfield R. C., McClymont A. N., 1985, ApJ 289, 434
Fletcher L., Pollock J. A., Potts H. E., 2004, Solar Phys. 222, 279
Forbes T. G., Acton L. W., 1996, ApJ 459, 330
Gary G. A., 2001, Solar Phys. 203, 71
Goldsmith D. W., 1971, Solar Phys. 19, 86
Hale G. E., 1930, ApJ 71, 73
Handy B. N. et al., 1999, Solar Phys. 187, 229
Heinzel P., 2003, Advances in Space Research 32, 2393
Heyvaerts J., Priest E. R., Rust D. M., 1977, ApJ 216, 123
Holman G. D., Sui L., Schwartz R. A., Emslie A. G., 2003, ApJ 595, L97
Hoyng P. et al., 1981, ApJ 246, L155
Hudson H. S., 1972, Solar Phys. 24, 414
Hudson H. S., 2000, ApJ 531, L75
Hudson H. S., Ohki K., 1972, Solar Phys. 23, 155
Hudson H. S., Wolfson C. J., Metcalf T. R., 2006, Solar Phys. 234, 79
Isobe H., Takasaki H., Shibata K., 2005, ApJ 632, 1184
Kane S. R., Donnelly R. F., 1971, ApJ 164, 151
Lin R. P., et al., 2002, Solar Phys. 210, 3
Lin R. P., Hudson H. S., 1976, Solar Phys. 50, 153
Longcope D. W., Welsch B. T., 2000, ApJ 545, 1089
Machado M. E., Rust D. M., 1974, Solar Phys. 38, 499
McClymont A.
N., Fisher G. H., 1989, in J. H. Waite Jr., J. L. Burch, R. L. Moore (eds.), Solar System Plasma Physics, p. 219
Moore R. L., Sterling A. C., Hudson H. S., Lemen J. R., 2001, ApJ 552, 833
Najita K., Orrall F. Q., 1970, Solar Phys. 15, 176
Neidig D. F., 1989, Solar Phys. 121, 261
Neupert W. M., 1968, ApJ 153, L59
Pariat E., Aulanier G., Schmieder B., Georgoulis M. K., Rust D. M., Bernasconi P. N., 2004, ApJ 614, 1099
Poletto G., Kopp R. A., 1986, in The Lower Atmosphere of Solar Flares, p. 453
Ramaty R., Mandzhavidze N., Kozlovsky B., Murphy R. J., 1995, ApJ 455, L193
Raymond J. C., Holman G., Ciaravella A., Panasyuk A., Ko Y.-K., Kohl J., 2007, ArXiv Astrophysics e-prints 1359
Saba J. L. R., Gaeng T., Tarbell T. D., 2006, ApJ 641, 1197
Sakao T., Kosugi T., Masuda S., Yaji K., Inda-Koide M., Makishima K., 1996, Advances in Space Research 17, 67
Schmieder B., Forbes T. G., Malherbe J. M., Machado M. E., 1987, ApJ 317, 956
Schrijver C. J., Hudson H. S., Murphy R. J., Share G. H., Tarbell T. D., 2006, ApJ 650,
Scudder J. D., 1994, ApJ 427, 446
Share G. H., Murphy R. J., Smith D. M., Schwartz R. A., Lin R. P., 2004, ApJ 615,
Shibata K., et al., 1992, PASJ 44, L173
Shoub E. C., 1983, ApJ 266, 339
Simnett G. M., Haines M. G., 1990, Solar Phys. 130, 253
Smith H. J., Smith E. V. P., 1963, Solar flares, New York: Macmillan, 1963
Suemoto Z., Hiei E., 1959, PASJ 11, 185
Trujillo Bueno J., Shchukina N., Asensio Ramos A., 2004, Nat 430, 326
Uitenbroek H., 2005, AGU Spring Meeting Abstracts
Švestka Z., 1966, Space Science Reviews 5, 388
Švestka Z., 1970, Solar Phys. 13, 471
Švestka Z., 1976, Solar Flares, Dordrecht: Reidel, 1976
Švestka Z. F., Fontenla J. M., Machado M. E., Martin S. F., Neidig D. F., 1987, Solar Phys. 108, 237
Vernazza J. E., Avrett E. H., Loeser R., 1981, ApJS 45, 635
Waldmeier M., 1963, Zeitschrift fur Astrophysik 56, 291
Wang Y.-M., Sheeley, Jr. N. R., 2002, ApJ 575, 542
White S. M., Kundu M. R., Bastian T. S., Gary D. E., Hurford G. J., Kucera T., Bieging J.
H., 1992, ApJ 384, 656
Woods T. N., Kopp G., Chamberlin P. C., 2006, Journal of Geophysical Research (Space Physics) 111, 10
Xu Y., Cao W., Liu C., Yang G., Jing J., Denker C., Emslie A. G., Wang H., 2006, ApJ 641, 1210
Xu Y., Cao W., Liu C., Yang G., Qiu J., Jing J., Denker C., Wang H., 2004, ApJ 607,
Zhang J., Dere K. P., Howard R. A., Kundu M. R., White S. M., 2001, ApJ 559, 452
Zirin H., 1966, The solar atmosphere, Blaisdell: Waltham, Mass., 1966
Zweibel E. G., Haber D. A., 1983, ApJ 264, 648

ABSTRACT

In this topical review I revisit the "chromospheric flare." This should currently be an outdated concept, because modern data seem to rule out the possibility of a major flare happening independently in the chromosphere alone, but the chromosphere still plays a major observational role in many ways. It is the source of the bulk of a flare's radiant energy - in particular the visible/UV continuum radiation. It also provides tracers that guide us to the coronal source of the energy, even though we do not yet understand the propagation of the energy from its storage in the corona to its release in the chromosphere. The formation of chromospheric radiations during a flare presents several difficult and interesting physical problems.

<|endoftext|><|startoftext|> Introduction

In this work we study deformations of the N-differential of an N-differential graded algebra. According to Kapranov [18] and Mayer [24, 25] an N-complex over a field k is a Z-graded k-vector space $V = \bigoplus_{n \in \mathbb{Z}} V_n$ together with a degree one linear map $d : V \to V$ such that $d^N = 0$. Remarkably, there are at least two generalizations of the notion of differential graded algebras to the context of N-complexes.
A choice, introduced first by Kerner in [20, 21] and further studied by Dubois-Violette [13, 14] and Kapranov [18], is to fix a primitive N-th root of unity q and define a q-differential graded algebra A to be a Z-graded associative algebra together with a linear operator $d : A \to A$ of degree one such that $d(ab) = d(a)b + q^{\bar{a}} a\,d(b)$ and $d^N = 0$. There are several interesting examples and constructions of q-differential graded algebras [1, 2, 6, 8, 9, 15, 16, 19, 21].

http://arxiv.org/abs/0704.0824v3

We work within the framework of N-differential graded algebras (N-dga) introduced in [4]. This notion does not depend on the choice of a primitive N-th root of unity, and thus it is better adapted for differential geometric applications. An N-differential graded algebra A consists of a Z-graded associative algebra $A = \bigoplus_{n \in \mathbb{Z}} A_n$ together with a degree one linear map $d : A \to A$ such that $d^N = 0$ and $d(ab) = d(a)b + (-1)^{\bar{a}} a\,d(b)$ for $a, b \in A$. The main question regarding this definition is whether there are interesting examples of N-differential graded algebras. Much work still needs to be done, but already a variety of examples has been constructed in [4, 5]. These examples may be classified as follows:

• Deformations of 2-dga into N-dga. This is the simplest and most direct way to construct N-differential graded algebras. Take a differential graded algebra A with differential d and consider the deformed derivation $d + e$, where $e : A \to A$ is a degree one derivation. It is possible to write down explicitly the equations that determine under which conditions $d + e$ is an N-differential, and thus turns A into an N-differential graded algebra. In other words one can explicitly write down the condition $(d + e)^N = 0$.

• N-flat connections. Let E be a vector bundle over a manifold M provided with a flat connection $\partial_E$. Differential forms on M with values in End(E) form a differential graded algebra.
An $\mathrm{End}(E)$-valued one form $T$ determines a deformation of this algebra into an $N$-differential graded algebra with differential of the form $\partial_E + [T, \ ]$ if and only if $T$ is an $N$-flat connection, i.e., the curvature of $T$ is $N$-nilpotent.
• Differential forms of depth $N \ge 3$. Attached to each affine manifold $M$ there is a $(\dim(M)(N-1)+1)$-differential graded algebra $\Omega_N(M)$, called the algebra of differential forms of depth $N$ on $M$, constructed as the usual differential forms allowing higher order differentials, i.e., for affine coordinates $x_i$ on $M$, there are higher order differentials $d^j x_i$ for $1 \le j \le N-1$.
• Deformations of $N$-differential graded algebras into $M$-differential graded algebras. If we are given an $N$-differential graded algebra $A$ with differential $d$, one can study under which condition a deformed derivation $d + e$, where $e$ is a degree one derivation of $A$, turns $A$ into an $M$-differential graded algebra, i.e., one can determine conditions ensuring that $(d + e)^M = 0$. In [4] we showed that $e$ must satisfy a system of non-linear equations, which we called the $(N, M)$ Maurer-Cartan equation.
• Algebras $A^N_\infty$. This is not so much an example of $N$-differential graded algebras but rather a homotopy generalization of such a notion. $A^N_\infty$ algebras are studied in [7].
This paper has three main goals. One is to introduce geometric examples of $N$-differential graded algebras. We first review the constructions of $N$-differential graded algebras outlined above and then proceed to consider the new examples:
• Differential forms on finitely generated simplicial sets. We construct a contravariant functor $\Omega_N : \mathrm{set}^{\Delta^{op}} \to N^{il}\mathrm{dga}$ from the category of simplicial sets generated in finite dimensions to $N^{il}\mathrm{dga}$, the category of nilpotent differential graded algebras, i.e., $N$-differential graded algebras for some $N \ge 1$. For a simplicial set $s$ we let $\Omega_N(s)$ be the algebra of algebraic differential forms of depth $N$ on the algebro-geometric realization of $s$.
For each integer $K$ we define a functor $\mathrm{Sing}_{\le K} : \mathrm{Top} \to \mathrm{set}^{\Delta^{op}}$; thus we obtain contravariant functors $\Omega_N \circ \mathrm{Sing}_{\le K} : \mathrm{Top} \to N^{il}\mathrm{dga}$ assigning to each topological space $X$ a nil-differential graded algebra.
• Difference forms on finitely generated simplicial sets. We construct a contravariant functor $D_N$ defined on $\mathrm{set}^{\Delta^{op}}$ with values in a category whose objects are graded algebras which are also $N$-complexes for some $N$, with the $N$-differential satisfying a twisted Leibnitz rule. For a simplicial set $s$ we let $D_N(s)$ be the algebra of difference forms of depth $N$ on the integral lattice in the algebro-geometric realization of $s$. Again, for each integer $K \ge 0$ we obtain a functor $D_N \circ \mathrm{Sing}_{\le K}$ defined on $\mathrm{Top}$ assigning to each topological space $X$ a twisted nil-differential graded algebra.
Our second goal is to study the construction of $N$-differential graded algebras as deformations of 3-differential graded algebras. Although in [4] a general theory solving this sort of problem was proposed, our aim here is to provide a solution that is as explicit as possible. We consider exact and infinitesimal deformations of 3-differentials in Section 3. Our final goal in this work is to find applications of $N$-differential graded algebras to Lie algebroids. In Section 4 we review the concept of Lie algebroids introduced by Pradines [27], which generalizes both Lie algebras and tangent bundles of manifolds. A Lie algebroid $E$ may be defined as a vector bundle together with a degree one differential $d$ on $\Gamma(\bigwedge E^*)$. We generalize this notion to the world of $N$-complexes, that is, we introduce the concept of $N$ Lie algebroids and construct several examples of such objects.

2 Examples of N-differential graded algebras

In this section we give a brief summary of the known examples of $N$-dgas and introduce new examples of $N$-dgas of geometric nature.
Definition 1. Let $N \ge 1$ be an integer.
An $N$-complex is a pair $(A, d)$, where $A$ is a $\mathbb{Z}$-graded vector space and $d : A \to A$ is a degree one linear map such that $d^N = 0$. Clearly an $N$-complex is also an $M$-complex for $M \ge N$. $N$-complexes are also referred to as $N$-differential graded vector spaces. An $N$-complex $(A, d)$ such that $d^{N-1} \neq 0$ is said to be a proper $N$-complex. Let $(A, d)$ be an $N$-complex and $(B, d)$ be an $M$-complex; a morphism $f : (A, d) \to (B, d)$ is a linear map $f : A \to B$ such that $df = fd$. One of the most interesting features of $N$-complexes is that they carry cohomological information. Let $(A, d)$ be an $N$-complex; an element $a \in A^i$ is $p$-closed if $d^p(a) = 0$, and is $p$-exact if there exists $b \in A^{i-N+p}$ such that $d^{N-p}(b) = a$, for $1 \le p < N$. The cohomology groups of $(A, d)$ are the spaces ${}_pH^i(A) = \mathrm{Ker}\{d^p : A^i \to A^{i+p}\} / \mathrm{Im}\{d^{N-p} : A^{i-N+p} \to A^i\}$, for $i \in \mathbb{Z}$ and $p = 1, 2, \ldots, N-1$.
Definition 2. An $N$-differential graded algebra ($N$-dga) over a field $k$ is a triple $(A, m, d)$ where $m : A \otimes A \to A$ and $d : A \to A$ are linear maps such that: 1. $d^N = 0$, i.e., $(A, d)$ is an $N$-complex. 2. $(A, m)$ is a graded associative algebra. 3. $d$ satisfies the graded Leibnitz rule $d(ab) = d(a)b + (-1)^{\bar{a}} a\,d(b)$.
The simplest way to obtain $N$-differential graded algebras is by deforming differential graded algebras. Let $\mathrm{Der}(A)$ be the Lie algebra of derivations on a graded algebra $A$. Recall that a degree one derivation $d$ on $A$ induces a degree one derivation, also denoted by $d$, on $\mathrm{End}(A)$. Let $A$ be a 2-dga and $e \in \mathrm{Der}(A)$. It is shown in [4] that $e$ defines a deformation of $A$ into an $N$-differential graded algebra if and only if $(d + e)^N = 0$, or equivalently, since $(d + e)^2 = d(e) + e^2$, if and only if the curvature $F_e = d(e) + e^2$ of $e$ satisfies $(F_e)^{N/2} = 0$ if $N$ is even, or $(F_e)^{(N-1)/2}(d + e) = 0$ if $N$ is odd. For example, consider the trivial bundle $M \times \mathbb{R}^n$ over $M$. A connection on $M \times \mathbb{R}^n$ is a $\mathfrak{gl}(n)$-valued one form $a$ on $M$, and its curvature is $F_a = da + [a, a]$. Let $\Omega(M, \mathfrak{gl}(n))$ be the graded algebra of $\mathfrak{gl}(n)$-valued forms on $M$.
Thus the pair $(\Omega(M, \mathfrak{gl}(n)), d + [a, \ ])$ defines an $N$-dga if and only if $(F_a)^{N/2} = 0$ for $N$ even, or $(F_a)^{(N-1)/2}(d + a) = 0$ for $N$ odd.

Differential forms of depth N on simplicial sets

Fix an integer $N \ge 3$. We are going to construct the $(n(N-1)+1)$-differential graded algebra $\Omega_N(\mathbb{R}^n)$ of algebraic differential forms of depth $N$ on $\mathbb{R}^n$. Let $x_1, \ldots, x_n$ be coordinates on $\mathbb{R}^n$ and for $1 \le i \le n$ and $0 \le j < N$, let $d^j x_i$ be a variable of degree $j$. We identify $d^0 x_i$ with $x_i$.
Definition 3. The $(n(N-1)+1)$-differential graded algebra $\Omega_N(\mathbb{R}^n)$ is given by
• $\Omega_N(\mathbb{R}^n) = \mathbb{R}[d^j x_i] / \langle d^j x_i\, d^k x_i \mid j, k \ge 1 \rangle$ as a graded algebra.
• The $(n(N-1)+1)$-differential $d : \Omega_N(\mathbb{R}^n) \to \Omega_N(\mathbb{R}^n)$ is given by $d(d^j x_i) = d^{j+1} x_i$ for $0 \le j \le N-2$, and $d(d^{N-1} x_i) = 0$.
One can show that $d$ is an $(n(N-1)+1)$-differential as follows: 1. It is easy to check that $\Omega_N(\mathbb{R})$ is an $N$-dga. 2. If $A$ is an $N$-dga and $B$ is a $P$-dga, then $A \otimes B$ is an $(N + P - 1)$-dga. 3. $\Omega_N(\mathbb{R}^n) = \Omega_N(\mathbb{R})^{\otimes n}$.
We often write $\Omega_N(x_1, \ldots, x_n)$ instead of $\Omega_N(\mathbb{R}^n)$ to indicate that a choice of affine coordinates $(x_1, \ldots, x_n)$ on $\mathbb{R}^n$ has been made. Let $\Delta$ be the category whose objects are non-negative integers; morphisms in $\Delta(n, m)$ are order preserving maps $f : \{0, \ldots, n\} \to \{0, \ldots, m\}$. The category of simplicial sets $\mathrm{Set}^{\Delta^{op}}$ is the category of contravariant functors $\Delta \to \mathrm{Set}$. Explicitly, a simplicial set $s : \Delta^{op} \to \mathrm{Set}$ is a functorial correspondence assigning:
• A set $s_n$ for each integer $n \ge 0$. Elements of $s_n$ are called simplices of dimension $n$.
• A map $s(f) : s_m \to s_n$ for each $f \in \Delta(n, m)$.
Let $\mathrm{Aff}$ be the category of affine varieties, and let $A : \Delta \to \mathrm{Aff}$ be the functor sending $n \ge 0$ into the affine variety $A(x_0, \ldots, x_n) = \Delta_n = \{(x_0, \ldots, x_n) \in \mathbb{R}^{n+1} \mid x_0 + \cdots + x_n = 1\}$. $A$ sends $f \in \Delta(n, m)$ into $A(f) : A(x_0, \ldots, x_n) \to A(x_0, \ldots, x_m)$ given by $A(f)^*(x_j) = \sum_{f(i) = j} x_i$, for $0 \le j \le m$.
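The pullback rule $A(f)^*(x_j) = \sum_{f(i)=j} x_i$ can be made concrete with a small computation. The following is only an illustrative sketch, not part of the original construction: the function names are hypothetical, an order preserving map $f$ is encoded as the Python list `[f(0), ..., f(n)]`, and a linear form in $x_0, \ldots, x_n$ is encoded as a `Counter` mapping the index $i$ to the coefficient of $x_i$. It checks that the pullback of $x_0 + \cdots + x_m$ is $x_0 + \cdots + x_n$, which is the reason the relation $x_0 + \cdots + x_n = 1$ is respected.

```python
from collections import Counter


def is_order_preserving(f):
    """True if the list f = [f(0), ..., f(n)] is weakly increasing."""
    return all(f[i] <= f[i + 1] for i in range(len(f) - 1))


def pullback_coordinates(f, m):
    """Return [A(f)^*(x_0), ..., A(f)^*(x_m)] for f : {0..n} -> {0..m}.

    Each entry is a Counter {i: coefficient of x_i}; it encodes the sum
    of the x_i with f(i) = j. (Hypothetical helper, for illustration.)
    """
    if not is_order_preserving(f) or any(not 0 <= v <= m for v in f):
        raise ValueError("f must be an order preserving map into {0, ..., m}")
    images = [Counter() for _ in range(m + 1)]
    for i, j in enumerate(f):
        images[j][i] += 1
    return images


def pullback_of_sum(f, m):
    """Pull back x_0 + ... + x_m by adding the images of the x_j."""
    total = Counter()
    for image in pullback_coordinates(f, m):
        total.update(image)
    return total


if __name__ == "__main__":
    f = [0, 0, 2]  # f(0) = 0, f(1) = 0, f(2) = 2, a morphism in Delta(2, 3)
    print(pullback_coordinates(f, 3))
    print(pullback_of_sum(f, 3))
```

For the map above, $x_0$ pulls back to $x_0 + x_1$, $x_2$ pulls back to $x_2$, and $x_1$, $x_3$ pull back to zero, so every $x_i$ appears exactly once in the pulled back sum.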
Forms of depth $N$ on the cosimplicial affine variety $A$ are defined by the functor $\Omega_N : \Delta^{op} \to N^{il}\mathrm{dga}$ sending $n \ge 0$ into $\Omega_N(n) = \Omega_N(x_0, \ldots, x_n) / \langle x_0 + \cdots + x_n - 1,\ dx_0 + \cdots + dx_n \rangle$. A map $f \in \Delta(n, m)$ induces a morphism $\Omega_N(f) : \Omega_N(m) \to \Omega_N(n)$ given for $0 \le j \le m$ by $\Omega_N(f)(x_j) = \sum_{f(i)=j} x_i$ and $\Omega_N(f)(dx_j) = \sum_{f(i)=j} dx_i$. Let $\mathrm{set}^{\Delta^{op}}$ be the full subcategory of $\mathrm{Set}^{\Delta^{op}}$ whose objects are simplicial sets generated in finite dimensions, i.e., simplicial sets $s$ for which there is an integer $K$ such that for each $p \in s_i$, $i \ge K$, there exists $q \in s_j$, $j \le K$, with $p = s(f)(q)$ for some $f \in \Delta(i, j)$. We are ready to define the contravariant functor $\Omega_N : \mathrm{set}^{\Delta^{op}} \to N^{il}\mathrm{dga}$ announced in the introduction. The nil-differential graded algebra $\Omega_N(s) = \bigoplus_{i=0}^{\infty} \Omega^i_N(s)$ associated with $s$ is given by $\Omega^i_N(s) = \{a \in \prod_{n \ge 0} \prod_{p \in s_n} \Omega^i_N(n) \mid a_{s(f)(p)} = \Omega_N(f)(a_p) \text{ for } p \in s_m \text{ and } f \in \Delta(n, m)\}$. A natural transformation $l : s \to t$ induces a map $\Omega_N(l) : \Omega_N(t) \to \Omega_N(s)$ given by the rule $[\Omega_N(l)(a)]_p = a_{l(p)}$ for $a \in \Omega_N(t)$ and $p \in s_n$. For each integer $K \ge 0$ there is a functor $(\ )_{\le K} : \mathrm{Set}^{\Delta^{op}} \to \mathrm{set}^{\Delta^{op}}$ sending a simplicial set $s$ into the simplicial set $s_{\le K}$ generated by simplices in $s$ of dimension less than or equal to $K$. The singular functor $\mathrm{Sing} : \mathrm{Top} \to \mathrm{Set}^{\Delta^{op}}$ sends a topological space $X$ into the simplicial set $\mathrm{Sing}(X)$ such that $\mathrm{Sing}_n(X) = \{f : \Delta_n \to X \mid f \text{ is continuous}\}$. Thus, for each pair of integers $N$ and $K$ we have constructed a functor $\Omega_N \circ (\ )_{\le K} \circ \mathrm{Sing} : \mathrm{Top} \to N^{il}\mathrm{dga}$ sending a topological space $X$ into the nil-differential graded algebra $\Omega_N(\mathrm{Sing}_{\le K}(X))$.

Difference forms of depth N on simplicial sets

Next we construct difference forms of higher depth on finitely generated simplicial sets. Difference forms on discrete affine space were introduced by Zeilberger in [28]. We proceed to construct a discrete analogue of the functors from topological spaces to nil-differential graded algebras introduced above. First, we construct $D_N(\mathbb{Z}^n)$, the algebra of difference forms of depth $N$ on $\mathbb{Z}^n$.
Let F (Zn,R) be the algebra concentrated in degree zero of R-valued functions on the lattice Zn. Introduce variables δjmi of degree j for 1 ≤ i ≤ n and 1 ≤ j < N . The graded algebra of difference forms of depth N on Zn is given by DN (Z n) = F (Zn,R)⊗ R[δjmi]/ δjmiδ kmi | j, k ≥ 1 A form ω ∈ DN (Z n) can be written as ω = I ωIdmI where I : {1, .., n} −→ N is any map and dmI = i=1 d I(i)mi. The degree of dmI is |I| = i=1 I(i). The finite difference ∆i(g) of g ∈ F (Zn,R) along the i-direction is given by ∆i(g)(m) = g(m+ ei)− g(m), where the vectors ei are the canonical generators of Z n and m ∈ Zn. The difference operator δ is defined for 1 ≤ j ≤ N − 2 by the rules δ(g) = ∆i(g)δmi, δ(δ jmi) = δ j+1mi and δ(δ N−1mi) = 0. It is not difficult to check that if ω = I ωIdmI , then δω = J(δω)JdmJ where (δω)J = J(i)=1 (−1)|Jn = s<1 = ∅. N (∞) is equal to where by convention N(0) = {∅}. Let A be a 3-dga and e be a degree one derivation on A. For s ∈ Nn we let e(s) = e(s1) · · · e(sn), where e(l) = dl(e) if l ≥ 1, e(0) = e and e∅ = 1. For N ∈ N, we set EN = s ∈ N(∞) | s 6= ∅ and |s|+ l(s) ≤ N and for s ∈ EN we let N(s) ∈ Z be given by N(s) = N − |s| − l(s). The following data defines a discrete quantum mechanical system: 1. The set of vertices is N(∞). 2. There is a unique directed edge from s to t if and only if t ∈ {(0, s), s, (s + ei)}, where ei ∈ N l(s) are the canonical vectors. 3. Edges are weighted according to the table: source target weight s (0, s) 1 s s (−1)|s|+l(s) s (s+ ei) (−1) |s i. The sign s(f, α) is given by s(f, α) = p( s i for i ∈ [N − 1], and f : [N ] −→ [m] such that |{j ∈ α−1(N + 1) ⊔ {N} | f(j) = i}| = I(i), for i ∈ [m]. The sign S(f, α) is given by S(f, α) = s(f, α)s(f |α−1(N+1)⊔{N}) . Corollary 19. (ai∂i) N = 0 if and only if cI = 0 for I as above. For example for N = 2 one gets (ai∂i) p(xiaj)aiaj∂i∂j + ai∂i(aj)∂j . 
For N = 3 we get that (ai∂i) i,j,k ai∂i(aj)∂j(ak)∂k + p(xiaj)aiaj∂i∂j(ak)∂k + p(xiak)aiaj∂j(ak)∂i∂k + p(xjak)ai∂i(aj)ak∂j∂k + xiaj)aiaj∂i(ak)∂j∂k + p(xjak + xiaj + xiak)aiajak∂i∂j∂k. For N = 4 the corresponding expression have 24 terms and we won’t spell it out. We return to the problem of defining N Lie algebroids. We need some general remarks on differential operators on associative algebras. Given an associative algebra A we let DO(A) be the algebra of differential operators on A, i.e., the subalgebra of End(A) generated by A ⊂ End(A) and Der(A) ⊂ End(A), the space of derivations of A. Thus DO(A) is generated as a vector space by operators of the form x1 ◦x2 ◦ · · · ◦xn ∈ End(A) where xi is in A⊔Der(A). Notice that DO(A) admits a natural filtration ∅ = DO≤−1(A) ⊆ DO≤0(A) ⊆ DO≤1(A) ⊆ · · · ⊆ DO≤k(A) ⊆ · · · ⊆ DO(A), whereDO≤k(A) ⊆ DO(A) is the subspace generated by operators x1◦x2◦· · ·◦xn, where at most k operators among the xi belong to Der(A). Thus DO(A) admits the following decomposition as graded vector space DO(A) = DOk(A) := DO≤k(A)/DO≤k−1(A). Clearly DO0(A) = A and if A is either commutative or graded commutative, then DO1(A) = Der(A). The projection map π1 : DO(A) −→ DO1(A) induces a non-associative product ⋄ : DO1(A)⊗DO1(A) −→ DO1(A) given by s ⋄ t = π1(s ◦ t) for s, t ∈ DO1(A). In particular if A is commutative or graded commutative we obtain a non-associative product ⋄ : Der(A)⊗Der(A) −→ Der(A). To avoid unnecessary use of parenthesis we assume that in the iterated applications of ⋄ we associate in the minimal form from right to left. Definition 20. A N Lie algebroid is a vector bundle E together with a degree one derivation d : Γ( E∗) −→ Γ( E∗), such that the result of N ⋄-compositions of d with itself vanishes, i.e., d ⋄ d ⋄ · · · ⋄ d = 0. The notions of Lie algebroids and 2 Lie algebroids agree; indeed it is easy to check that d ◦ d = d ⋄ d for any degree one derivation d : Γ( E∗) −→ Γ( E∗). 
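To see how the product $s \diamond t = \pi_1(s \circ t)$ behaves, it may help to work out the simplest ungraded case. The sketch below is an illustration under stated assumptions and not the setting of Definition 20: it takes $A = \mathbb{C}[x]$ with a single even variable, represents a polynomial by its list of coefficients `[c0, c1, ...]`, and represents the vector field $a\,\partial_x$ by the polynomial $a$. Since $(a\,\partial_x) \circ (b\,\partial_x) = ab\,\partial_x^2 + a b'\,\partial_x$, the projection to first order operators gives $(a\,\partial_x) \diamond (b\,\partial_x) = a b'\,\partial_x$. All function names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def derivative(p):
    """Derivative of the polynomial with coefficient list p."""
    return [k * c for k, c in enumerate(p)][1:] or [0]


def multiply(p, q):
    """Product of two polynomials given by coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def diamond(a, b):
    """Coefficient of (a d/dx) ⋄ (b d/dx), namely a * b'."""
    return trim(multiply(a, derivative(b)))


def diamond_power(a, n):
    """n-fold ⋄-composition of a d/dx, associated from right to left."""
    result = a
    for _ in range(n - 1):
        result = diamond(a, result)
    return result


def apply_power_to_x(a, n):
    """The polynomial (a d/dx)^n (x), using ordinary composition."""
    p = [0, 1]  # the polynomial x
    for _ in range(n):
        p = trim(multiply(a, derivative(p)))
    return p


if __name__ == "__main__":
    a = [0, 0, 1]  # a = x^2
    print(diamond_power(a, 3), apply_power_to_x(a, 3))
    print(diamond_power([1], 2))  # the vector field d/dx
```

In this one variable case the $n$-fold $\diamond$-power of $a\,\partial_x$ has the same coefficient as $(a\,\partial_x)^n(x)$. The vector field $\partial_x$ satisfies $\partial_x \diamond \partial_x = 0$ although $\partial_x \circ \partial_x = \partial_x^2$ is a nonzero operator, which shows how much weaker the $\diamond$ condition is when the variable is even.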
Let us now illustrate with an example the difference between the condition $d \circ d \circ \cdots \circ d = 0$ and the much weaker condition $d \diamond d \diamond \cdots \diamond d = 0$. Let $\mathbb{C}[x_1, \ldots, x_n]$ be the free graded algebra generated by graded variables $x_i$ for $1 \le i \le n$. A derivation on $\mathbb{C}[x_1, \ldots, x_n]$ is a vector field $\partial = a_i \partial_i$ where $a_i \in \mathbb{C}[x_1, \ldots, x_n]$. The condition $\partial^N = 0$ is rather strong and restrictive; it might be tackled with the methods provided above. In contrast, the condition $\partial \diamond \partial \diamond \cdots \diamond \partial = 0$ is much simpler, and indeed it is equivalent to the condition $\partial^N(x_i) = 0$ for $1 \le i \le n$.
Definition 21. An $N$ Lie algebra is a vector space $\mathfrak{g}$ together with a degree one derivation $d$ on $\bigwedge \mathfrak{g}^*$ such that the $N$-th $\diamond$-composition of $d$ with itself vanishes.
Our next result characterizes 3 Lie algebras in more familiar terms. For integers $k_1, k_2, \ldots, k_l$ such that $k_1 + k_2 + \cdots + k_l = n$, we let $\mathrm{Sh}(k_1, k_2, \ldots, k_l)$ be the set of permutations $\sigma : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that $\sigma$ is increasing on the intervals $[k_i + 1, k_{i+1}]$ for $0 \le i \le l$, $k_0 = 1$ and $k_{l+1} = n$. Assume we are given a map $[\ ,\ ] : \bigwedge^2 \mathfrak{g} \to \mathfrak{g}$.
Theorem 22. The pair $(\mathfrak{g}, [\ ,\ ])$ is a 3 Lie algebra if and only if for $v_1, v_2, v_3, v_4 \in \mathfrak{g}$ we have
$$\sum_{\sigma \in \mathrm{Sh}(2,1,1)} \mathrm{sgn}(\sigma)\,[[[v_{\sigma(1)}, v_{\sigma(2)}], v_{\sigma(3)}], v_{\sigma(4)}] = \sum_{\sigma \in \mathrm{Sh}(2,2)} \mathrm{sgn}(\sigma)\,[[v_{\sigma(1)}, v_{\sigma(2)}], [v_{\sigma(3)}, v_{\sigma(4)}]].$$
Proof. One can show that a degree one differential on $\bigwedge \mathfrak{g}^*$ is necessarily the Chevalley-Eilenberg operator
$$d\theta(v_1, \ldots, v_{n+1}) = \sum_{i<j} (-1)^{i+j}\, \theta([v_i, v_j], v_1, \ldots, \hat{v}_i, \ldots, \hat{v}_j, \ldots, v_{n+1}),$$
where $[\ ,\ ] : \bigwedge^2 \mathfrak{g} \to \mathfrak{g}$ is an antisymmetric operator. We remark that we are not assuming, at this point, that the bracket $[\ ,\ ]$ satisfies any further identity. The Jacobi identity arises when the square of $d$ is set equal to zero, but we do not impose that, since we want to investigate the weaker condition that the third $\diamond$-power of $d$ be equal to zero. For $\theta \in \bigwedge^1 \mathfrak{g}^* = \mathfrak{g}^*$ the Chevalley-Eilenberg operator takes the simple form $d\theta(v_1, v_2) = -\theta([v_1, v_2])$.
Moreover a further application of d to dθ yields d2θ(v1, v2, v3) = σ∈Sh(2,1) sgn(σ)θ([[vσ(1), vσ(2)], vσ(3)]). From the last equation it is evident that Jacobi identity is equivalent to the condition d2 = 0. We do not assume assume that Jacobi identity holds and proceed to compute the third ⋄-power of d. We obtain that d ⋄ d ⋄ dθ(v1, v2, v3, v4) = σ∈Sh(2,1,1) sgn(σ)θ([[[vσ(1), vσ(2)], vσ(3)]vσ(4)]) σ∈Sh(2,2) sgn(σ)θ([[vσ(1), vσ(2)], [vσ(3), vσ(4)]]). Thus d ⋄ d ⋄ d = 0 if and only if the condition from the statement of the Theorem holds. Using local coordinates θ1, ..., θm on the graded manifold g[−1], it is not hard to show that a vector field of degree one on g[−1] can be written as where the constants C may be identified with the structural constants of [·, ·]. The square of the vector field ∂ is given by Cσδ εθ Cσγ εθ αθβθε Cσδ γθ αθβθδ θαθβCσδ εθ Using the antisymmetry properties of C and the commutation rules for θα one can write together the first to terms. We find that ∂ ⋄ ∂ = Cσγ εθ αθβθε The condition ∂ ⋄ ∂ = 0 is equivalent to Jacobi identity. We assume that ∂ ⋄ ∂ 6= 0 and proceed to compute consider the condition ∂ ⋄ ∂ ⋄ ∂ = 0. We have that ∂ ◦ (∂ ⋄ ∂) = Cνλµθ Cσγ εθ αθβθε Using carefully the properties of C and θα we find that ∂ ◦ (∂ ⋄ ∂) = CνλµC Cσγ εθ λθµθβθε CνλµC Cσγ νθ λθµθαθβ CνλµC Cσγ εθ λθµθαθβθε Therefore we have shown that ∂ ⋄ (∂ ⋄ ∂) = CνλµC Cσγ ǫ − CσγαC θλθµθβθε Thus the condition ∂ ⋄ (∂ ⋄ ∂) = 0 is equivalent to the following equations for fixed σ: λ,µ,β,ε CνλµC Cσγ ǫ − CσγαC θλθµθβθε = 0 . Let us now go back to the case of Lie algebroids as opposed to Lie algebras. There is a natural degree one vector field on the graded manifold T[−1]R n, namely, de Rham differential. We now investigate whether it is possible to deform, infinitesimally, de Rham differential into a 3-differential. In local coordinates (x1, . . . , xn, θ1, . . . 
, θn) on T[−1]R n, with xi of degree zero and θi of degree 1, de Rham operator takes the form ∂ = δiαθ Let t be a formal infinitesimal parameter such that t2 = 0. We are going to show that any set of functions aiα of degree zero on T[−1]R n determine a deformation of de Rham operator into a 3-⋄ nilpotent operator given by δiα + ta Theorem 23. ∂a ⋄ ∂a = t θα θβ and ∂a ⋄ (∂a ⋄ ∂a) = 0. Proof. ∂2a = δiα + ta θα θβ θα θβ θα θβ Since t2 = 0 the third term on the right hand side of the expression above vanishes. The second term also vanishes because it is a contraction of even and odd indices. So we get that ∂a ⋄ ∂a = t θα θβ The third power of ∂a is given by ∂a ⋄ (∂a ⋄ ∂a) = t ∂xγ∂xα θγ θα θβ It also vanishes because it includes a contraction of even and odd indices. The nilpotency condition for the operator ∂a ⋄∂a is θα θβ = 0 for j = 1, . . . , n. It is not hard to find examples of matrices a such that ∂a ⋄ ∂a = 0, for example  (x4)2 x1 x1 x2 x2 x3 x2 x3 x3 x2 x4 x4 x4 x1 x4 x3  More importantly there are also matrices a such that ∂a ⋄ ∂a 6= 0, for example  x1 x4 x1 x1 x1 x2 x2 x4 x2 x2 x3 x3 x3 x4 x3 x4 x4 x4 x1 x4  We now consider full deformations as opposed to infinitesimal ones. Let δiα + a be a vector field. We think of ∂a as a deformation of de Rham differential with deformation parameters aiα. Theorem 24. ∂a ⋄ (∂a ⋄ ∂a) = δlγ + a ){∂aiα + aiα ∂xl∂xi θγθαθβ Proof. Since ∂2a = δiα + a ∂a ⋄ ∂a = δiα + a ) ∂ajβ we get ∂a ⋄ (∂a ⋄ ∂a) = δlγ + a δiα + a ) ∂ajβ δlγ + a ){∂aiα + aiα ∂xl∂xi θγθαθβ Corollary 25. ∂a ⋄ (∂a ⋄ ∂a) = 0 if for fixed indices α, β, λ, j the following identity holds δlγ + a ){∂aiα + aiα ∂xl∂xi θγθαθβ = 0. Corollary 26. Each matrix A = (A ) ∈ Mn(R) such that A 2 = 0 determines a 3 Lie algebroid structure on TRn with differential given by (δiα +A αxα)dx Our final result describes explicitly the conditions defining a 3 Lie algebroid. Let E be a vector bundle over M . 
A vector field on E[−1] of degree one is given in local coordinates by ∂ = ρiαθ where ρiα and C are functions of the bosonic variables only. Theorem 27. ∂ ⋄ (∂ ⋄ ∂) = 0 if and only if for fixed γ and i the following identity holds: Cασ µ) Cλµσ − CαλµC CαβµC θνθσθµθβ = 0 , Cǫσν− Cǫσν + ρ θσθνθγ = 0 . Proof. We sketch the rather long proof. For ∂ = ρiαθ θαθβ ∂ , we have ∂ ⋄ ∂ = θλθµθβ As in the previous theorem one finds that the condition ∂ ⋄ (∂ ⋄ ∂) = 0 is equivalent to the following identities Cασ µ) Cλµσ − Cλνσ − θνθσθµθβ = 0 , Cǫσν − ρ θσθνθγ Needless to say further research is necessary in order to have a better grasp of the meaning and applications of the notion of N Lie algebroids. We expect that this approach will lead towards new forms of infinitesimal symmetries, and for that reason alone it should find appli- cations in various problems in mathematical physics. In our forthcoming work [3] we are going to discuss some applications of N Lie algebroids in the context of Batalin-Vilkovisky algebras and the master equation. Acknowledgment Thanks to Takashi Kimura, Juan Carlos Moreno and Jim Stasheff. References [1] V. Abramov, R. Kerner, Exterior differentials of higher order and ther covariant generalization, J. Math. Phys. 41 (8) (2000) 5598-5614. [2] V. Abramov, R. Kerner, On certain realizations of the q-deformed exterior differential calculus, Rep. Math. Phys. 43 (1999) 179-194. [3] M. Angel, J. Camacaro, R. Dı́az, Batalin-Vilkovisky algebras andN -complexes, in preparation. [4] M. Angel, R. Dı́az, N-differential graded algebras, J. Pure App. Alg. 210 (3) (2007) 673-683. [5] M. Angel, R. Dı́az, N -flat connections, in S. Paycha, B. Uribe (Eds.), Geometric and Topological Methods for Quantum Field Theory, Contemp. Math. 432, Amer. Math. Soc., Providence, pp. 163- 172, 2007. [6] M. Angel, R. Dı́az. On the q-analoque of the Maurer-Cartan equation, Adv. Stud. Contemp. Math. 12 (2) (2006) 315-322. [7] M. Angel, R. 
Dı́az, AN -algebras, preprint, arXiv.math.QA/0612661. [8] N. Bazunova, Construction of graded differential algebra with ternary differential, in J. Fuchs, J. Mickelsson, Grigori Rozenblioum and Alexander Stolin (Eds.), Noncommutative geometry and rep- resentation theory in mathematical physics, Contemp. Math. 391, Amer. Math. Soc., Providence, pp. 1-9, 2005. [9] N. Bazunova, Non-coordinate case of graded differential algebra with ternary differential, J. Nonlinear Math. Phys. 13 (2006) 21-26. [10] J. R. Camacaro, Lie algebroid exterior algebra in gauge field theories, in Groups, Geometry and Physics, Monogr. Acad. Ci. Zaragoza 29, Zaragoza, pp. 57-64, 2006. [11] J.F. Cariñena, Lie groupoids and algebroids in classical and quantum mechanics, in Symmetries in Quantum Mechanics and Quantum Optics, Universidad de Burgos, Burgos, pp. 67-81, 1999. [12] A.C. da Silva, A. Weinstein, Lectures on geometrical models for noncommutative algebra, Berkeley Mathematical Lecture Notes 10, Amer. Math. Soc., Providence, 1999. [13] M. Dubois-Violette, Generalized differential spaces with dN = 0 and the q-differential calculus, Czech J. of Phys. 46 (1996) 1227- 1233. [14] M. Dubois-Violette, Generalized homologies for dN = 0 and graded q-differential algebras, in M. Henneaux, J. Krasil’shchik, A. Vinogradov (Eds.), Secondary Calculus and Cohomological Physics, Contemp. Maths. 219, Amer. Math. Soc., Providence, pp. 69-79, 1998. [15] M. Dubois-Violette, Lectures on differentials, generalized differentials and some examples re- lated to theoretical physics, in R. Coquereaux, A. Garcia, R. Trinchero (Eds.),Quantum Symmetries in Theoretical Physics and Mathematics, Contemp. Maths. 294, Amer. Math. Soc., Providence, pp. 59-94, 2002. [16] M. Dubois-Violette, R. Kerner, Universal q-differential calculus and q-analog of homological algebra, Acta Math. Univ. Comenianae LXV (2) (1996) 175-188. [17] N. P. Landsman, Lie groups and Lie algebroids in physics and noncommutative geometry, J. Geom. 
Phys. 56 (2006) 24-54. [18] M.M. Kapranov, On the q-analog of homological algebra, preprint, arXiv.q-alg/9611005. [19] C. Kassel, M. Wambst, Algèbre homologique des N-complexes et homologie de Hochschild aux racines de l'unité, Publ. Res. Inst. Math. Sci. Kyoto University 34 (2) (1998) 91-114. [20] R. Kerner, The cubic chessboard, Class. Quantum Grav. 14 (1997) A203-A225. [21] R. Kerner, Z3-graded exterior differential calculus and gauge theories of higher order, Lett. Math. Phys. 36 (1996) 441-454. [22] R. Kerner, B. Niemeyer, Covariant q-differential calculus and its deformations at $q^N = 1$, Lett. Math. Phys. 45 (1998) 161-176. [23] K. C. Mackenzie, General Theory of Lie Groupoids and Lie Algebroids, London Math. Soc. Lecture Note Series 213, Cambridge Univ. Press, Cambridge, 2005. [24] W. Mayer, A new homology theory I, Ann. of Math. 43 (1942) 370-380. [25] W. Mayer, A new homology theory II, Ann. of Math. 43 (1942) 594-605. [26] A. Sitarz, On the tensor product construction for q-differential algebras, Lett. Math. Phys. 44 (1998). [27] J. Pradines, Théorie de Lie pour les groupoïdes différentiables. Relations entre propriétés locales et globales, C. R. Acad. Sci. Paris Sér. I Math. 236 (1966) 907-910. [28] D. Zeilberger, Closed forms (pun intended!), in A tribute to Emil Grosswald: number theory and related analysis, Contemp. Math. 143, Amer. Math. Soc., Providence, pp. 579-607, 1993. mangel@euler.ciens.ucv.ve, jcama@usb.ve, ragadiaz@gmail.com Introduction Examples of N-differential graded algebras On the (3,N) curvature N Lie algebroids ABSTRACT Deformations of the 3-differential of 3-differential graded algebras are controlled by the (3,N) Maurer-Cartan equation. We find explicit formulae for the coefficients appearing in that equation, introduce new geometric examples of N-differential graded algebras, and use these results to study N Lie algebroids.
<|endoftext|><|startoftext|> Electronic structure of kinetic energy driven superconductors in the presence of bilayer splitting Yu Lan,1 Jihong Qin,2 and Shiping Feng1 Department of Physics, Beijing Normal University, Beijing 100875, China Department of Physics, Beijing University of Science and Technology, Beijing 100083, China (Dated: November 17, 2018) Within the framework of the kinetic energy driven superconductivity, the electronic structure of bilayer cuprate superconductors in the superconducting state is studied. It is shown that the electron spectrum of bilayer cuprate superconductors is split into bonding and antibonding components by the bilayer splitting, so that the observed peak-dip-hump structure around the [π, 0] point is mainly caused by this bilayer splitting, with the superconducting peak being related to the antibonding component and the hump being formed by the bonding component. The spectral weight increases with increasing doping concentration. In analogy to the normal state case, both the electron antibonding peak and the bonding hump have weak dispersions around the [π, 0] point. PACS numbers: 74.20.Mn, 74.20.-z, 74.25.Jb

I. INTRODUCTION

The parent compounds of cuprate superconductors are Mott insulators with antiferromagnetic (AF) long-range order (AFLRO); via charge carrier doping, one can drive these materials through a metal-insulator transition and into the superconducting (SC) dome [1,2,3]. It has become clear in the past twenty years that cuprate superconductors are among the most complex systems studied in condensed matter physics [1,2,3]. The complications arise mainly from (1) a layered crystal structure with one or more CuO$_2$ planes per unit cell separated by insulating layers, which leads to a quasi-two-dimensional electronic structure, and (2) extreme sensitivity of the physical properties to the composition (stoichiometry), which controls the carrier density in the CuO$_2$ plane [1,2,3].
As a consequence, both experimental investigation and theoretical understanding are extremely difficult. By virtue of systematic studies using angle-resolved photoemission spectroscopy (ARPES), the low-energy electronic structure of cuprate superconductors in the SC state is by now well established [2,3], and an agreement has emerged that the electronic quasiparticle-like excitations are well defined and are the entities participating in the SC pairing. In particular, the lowest energy states are located at the [π, 0] point of the Brillouin zone, where the d-wave SC gap function is maximal, so most of the contribution to the electron spectral function comes from the [π, 0] point [2,3]. Moreover, some ARPES experimental results unambiguously established the Bogoliubov-quasiparticle nature of the sharp SC quasiparticle peak near the [π, 0] point [4,5], so the SC coherence of the quasiparticle peak is described by the simple Bardeen-Cooper-Schrieffer (BCS) formalism [6]. However, there are numerous anomalies for different families of cuprate superconductors, which complicate the physical properties of the electronic structure [2,3]. Among these anomalies is the dramatic change in the spectral lineshape around the [π, 0] point first observed on the bilayer cuprate superconductor Bi$_2$Sr$_2$CaCu$_2$O$_{8+\delta}$, where a sharp quasiparticle peak develops at the lowest binding energy, followed by a dip and a hump, giving rise to the so-called peak-dip-hump (PDH) structure in the electron spectrum [7,8,9]. Later, this PDH structure was also found in YBa$_2$Cu$_3$O$_{7-\delta}$ [10] and in Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+\delta}$. Furthermore, although sharp quasiparticle peaks are identified in the SC state along the entire Fermi surface, the PDH structure is most strongly developed around the [π, 0] point [2,7,8,9,10,11]. The appearance of the PDH structure in bilayer cuprate superconductors in the SC state is a most remarkable effect; however, its full understanding is still a challenging issue.
The earlier works [2,12] gave the main impetus for a phenomenological description of the single-particle excitations in terms of an interaction between quasiparticles and collective modes, which is of fundamental relevance to the nature of superconductivity and the pairing mechanism in cuprate superconductors. However, a different interpretive scenario has been proposed [2,13]. This followed from the observation of the bilayer splitting (BS) for both normal and SC states in a wide doping range [14,15,16]. This BS of the CuO$_2$ plane splits the electronic structure into bonding and antibonding bands owing to the presence of CuO$_2$ bilayer blocks in the unit cell of bilayer cuprate superconductors, so the main features of the PDH structure are caused by the BS [13,14,15,16], with the peak and hump corresponding to the antibonding and bonding bands, respectively. Furthermore, some ARPES experimental data measured above and below the SC transition temperature show that this PDH structure is totally unrelated to superconductivity [14]. The recent ARPES experimental results reported by several groups support this scenario, and most convincingly suggest that the PDH structure originates from the BS at any doping level [17]. To the best of our knowledge, this PDH structure in bilayer cuprate superconductors has not been treated starting from a microscopic SC theory. Within the single layer t-t′-J model, the electronic structure of the single layer cuprate superconductors in the SC state has been discussed [18] based on the framework of the kinetic energy driven superconductivity [19], and the main features of the ARPES experiments on the single layer cuprate superconductors have been reproduced, including the doping and temperature dependence of the electron spectrum and quasiparticle dispersion. In this paper, we study the electronic structure of bilayer cuprate superconductors in the SC state along the same line.
Within the kinetic energy driven SC mechanism [19], we employ the t-t′-J model including the bilayer interaction, and show explicitly that the BS occurs due to this bilayer interaction. In this case, the electron spectrum is split into bonding and antibonding components by this BS; the SC peak is closely related to the antibonding component, while the hump is mainly formed by the bonding component. In other words, the well pronounced PDH structure in the electron spectrum of bilayer cuprate superconductors is mainly caused by the BS. Furthermore, the spectral weight at the [π, 0] point increases with increasing doping concentration. In analogy to the normal-state case [14,20,21,22], both the electron antibonding peak and the bonding hump have weak dispersions around the [π, 0] point, in qualitative agreement with the experimental observation on bilayer cuprate superconductors in the SC state [2,7,8,9,10,11]. The paper is organized as follows. The basic formalism is presented in Sec. II, where we generalize the kinetic energy driven superconductivity from the previous single layer case [18,19] to the bilayer case, and then evaluate explicitly the longitudinal and transverse components of the electron normal and anomalous Green's functions (hence the bonding and antibonding electron spectral functions). Within this theoretical framework, we discuss the electronic structure of bilayer cuprate superconductors in the SC state in Sec. III. It is shown that the striking PDH structure in bilayer cuprate superconductors is closely related to the BS. Finally, we give a summary and discussions in Sec. IV.

II. FORMALISM

It has been shown from the ARPES experiments [2,23] that the two-dimensional t-t′-J model is of particular relevance to the low energy features of cuprate superconductors.
For discussions of the physical properties of bilayer cuprate superconductors, the t-t′-J model can be expressed by including the bilayer interactions as,

$$H = -t\sum_{i\hat{\eta}a\sigma} C^{\dagger}_{ia\sigma}C_{i+\hat{\eta}a\sigma} + t'\sum_{i\hat{\tau}a\sigma} C^{\dagger}_{ia\sigma}C_{i+\hat{\tau}a\sigma} - \sum_{i\sigma} t_{\perp}(i)\,(C^{\dagger}_{i1\sigma}C_{i2\sigma} + {\rm H.c.}) + \mu\sum_{ia\sigma} C^{\dagger}_{ia\sigma}C_{ia\sigma} + J\sum_{i\hat{\eta}a} {\bf S}_{ia}\cdot{\bf S}_{i+\hat{\eta}a} + J_{\perp}\sum_{i} {\bf S}_{i1}\cdot{\bf S}_{i2}, \tag{1}$$

supplemented by an important on-site local constraint $\sum_{\sigma}C^{\dagger}_{ia\sigma}C_{ia\sigma} \leq 1$ to avoid double occupancy, where η̂ = ±x̂, ±ŷ denotes the nearest neighbors of a given site i, τ̂ = ±x̂ ± ŷ denotes the next nearest neighbors of a given site i, a = 1, 2 is the plane index, $C^{\dagger}_{ia\sigma}$ ($C_{ia\sigma}$) is the electron creation (annihilation) operator, ${\bf S}_{ia} = C^{\dagger}_{ia}\vec{\sigma}C_{ia}/2$ is the spin operator with the Pauli matrices $\vec{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$, µ is the chemical potential, and the interlayer coherent hopping has the form,

$$t_{\perp}({\bf k}) = \frac{t_{\perp}}{4}(\cos k_x - \cos k_y)^2, \tag{2}$$

which is strongly anisotropic and follows the theoretical predictions [24]. In particular, this momentum dependent form (2) has been experimentally verified [14,15]. For this t-t′-J model (1), it has been argued that the crucial requirement is to impose the electron single occupancy local constraint for a proper understanding of the physical properties of cuprate superconductors.
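As a quick numerical illustration of the anisotropy of the interlayer hopping in Eq. (2), the following minimal Python sketch evaluates t⊥(k) at the antinodal point [π, 0] and on the zone diagonal. It assumes the commonly used prefactor t⊥/4, so that the hopping equals t⊥ at [π, 0]. The function name `interlayer_hopping` and the default amplitude 0.35 (the ratio t⊥/t quoted later for the figures, with t as the energy unit) are illustrative choices, not part of the formalism.

```python
import math


def interlayer_hopping(kx, ky, amplitude=0.35):
    """Interlayer coherent hopping, assumed form (amplitude/4) * (cos kx - cos ky)^2."""
    return 0.25 * amplitude * (math.cos(kx) - math.cos(ky)) ** 2


# Antinodal point [pi, 0]: the hopping (and hence the bilayer splitting) is largest.
antinode = interlayer_hopping(math.pi, 0.0)

# Zone diagonal [pi/2, pi/2]: the hopping vanishes by symmetry.
diagonal = interlayer_hopping(math.pi / 2, math.pi / 2)

print(f"t_perp at [pi, 0]       = {antinode:.4f}")
print(f"t_perp at [pi/2, pi/2]  = {diagonal:.4f}")
```

With this form the interlayer coupling is concentrated near [π, 0], which is the region of momentum space where the bonding and antibonding components are compared in Sec. III.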
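To incorporate the electron single occupancy local constraint, the charge-spin separation (CSS) fermion-spin theory has been proposed [25], where the constrained electron operators are decoupled as $C_{ia\uparrow} = h^{\dagger}_{ia\uparrow}S^{-}_{ia}$ and $C_{ia\downarrow} = h^{\dagger}_{ia\downarrow}S^{+}_{ia}$. Here the spinful fermion operator $h_{ia\sigma} = e^{-i\Phi_{ia\sigma}}h_{ia}$ represents the charge degree of freedom together with some effects of the spin configuration rearrangements due to the presence of the doped hole itself (dressed holon), while the spin operator $S_{ia}$ represents the spin degree of freedom. The bilayer t-t′-J Hamiltonian (1) can then be expressed in this CSS fermion-spin representation as,

$$\begin{aligned} H = {} & t\sum_{i\hat{\eta}a}\left(h^{\dagger}_{i+\hat{\eta}a\uparrow}h_{ia\uparrow}S^{+}_{ia}S^{-}_{i+\hat{\eta}a} + h^{\dagger}_{i+\hat{\eta}a\downarrow}h_{ia\downarrow}S^{-}_{ia}S^{+}_{i+\hat{\eta}a}\right) \\ & - t'\sum_{i\hat{\tau}a}\left(h^{\dagger}_{i+\hat{\tau}a\uparrow}h_{ia\uparrow}S^{+}_{ia}S^{-}_{i+\hat{\tau}a} + h^{\dagger}_{i+\hat{\tau}a\downarrow}h_{ia\downarrow}S^{-}_{ia}S^{+}_{i+\hat{\tau}a}\right) \\ & + \sum_{i} t_{\perp}(i)\left(h^{\dagger}_{i2\uparrow}h_{i1\uparrow}S^{+}_{i1}S^{-}_{i2} + h^{\dagger}_{i1\uparrow}h_{i2\uparrow}S^{+}_{i2}S^{-}_{i1} + h^{\dagger}_{i2\downarrow}h_{i1\downarrow}S^{-}_{i1}S^{+}_{i2} + h^{\dagger}_{i1\downarrow}h_{i2\downarrow}S^{-}_{i2}S^{+}_{i1}\right) \\ & - \mu\sum_{ia\sigma}h^{\dagger}_{ia\sigma}h_{ia\sigma} + J_{\rm eff}\sum_{i\hat{\eta}a}{\bf S}_{ia}\cdot{\bf S}_{i+\hat{\eta}a} + J_{{\rm eff}\perp}\sum_{i}{\bf S}_{i1}\cdot{\bf S}_{i2}, \end{aligned} \tag{3}$$

where $J_{\rm eff} = J(1-\delta)^2$, $J_{{\rm eff}\perp} = J_{\perp}(1-\delta)^2$, and $\delta = \langle h^{\dagger}_{ia\sigma}h_{ia\sigma}\rangle = \langle h^{\dagger}_{ia}h_{ia}\rangle$ is the doping concentration. It has been shown that the electron single occupancy local constraint is satisfied in analytical calculations within this CSS fermion-spin theory, and double spinful fermion occupancy is ruled out automatically [25]. Although in the usual sense $h_{ia\sigma}$ is not a real spinful fermion, it behaves like a spinful fermion [25]. As in the single layer case [18], the kinetic energy terms in the bilayer t-t′-J model have been transferred into the dressed holon-spin interactions, which can induce the dressed holon pairing state (hence the electron Cooper pairing state) by exchanging spin excitations in the higher power of the doping concentration. Before calculating the electron normal and anomalous Green’s functions of the bilayer system in the SC state, we first introduce the SC order parameter.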
As mentioned above, there are two coupled CuO2 planes in the unit cell. In this case, the SC order parameter for the electron Cooper pair is a matrix $\Delta = \Delta_L + \sigma_x\Delta_T$, where the longitudinal and transverse SC order parameters can be expressed in the CSS fermion-spin theory as,

$$\Delta_L = \langle C^{\dagger}_{ia\uparrow}C^{\dagger}_{i+\hat{\eta}a\downarrow} - C^{\dagger}_{ia\downarrow}C^{\dagger}_{i+\hat{\eta}a\uparrow}\rangle = \langle h_{ia\uparrow}h_{i+\hat{\eta}a\downarrow}S^{+}_{ia}S^{-}_{i+\hat{\eta}a} - h_{ia\downarrow}h_{i+\hat{\eta}a\uparrow}S^{-}_{ia}S^{+}_{i+\hat{\eta}a}\rangle = -\chi_1\Delta_{hL}, \tag{4a}$$

$$\Delta_T = \langle C^{\dagger}_{i1\uparrow}C^{\dagger}_{i2\downarrow} - C^{\dagger}_{i1\downarrow}C^{\dagger}_{i2\uparrow}\rangle = \langle h_{i1\uparrow}h_{i2\downarrow}S^{+}_{i1}S^{-}_{i2} - h_{i1\downarrow}h_{i2\uparrow}S^{-}_{i1}S^{+}_{i2}\rangle = -\chi_{\perp}\Delta_{hT}, \tag{4b}$$

respectively, where the spin correlation functions are $\chi_1 = \langle S^{+}_{ia}S^{-}_{i+\hat{\eta}a}\rangle$ and $\chi_{\perp} = \langle S^{+}_{i1}S^{-}_{i2}\rangle$, and the longitudinal and transverse dressed holon pairing order parameters are $\Delta_{hL} = \langle h_{i+\hat{\eta}a\downarrow}h_{ia\uparrow} - h_{i+\hat{\eta}a\uparrow}h_{ia\downarrow}\rangle$ and $\Delta_{hT} = \langle h_{i2\downarrow}h_{i1\uparrow} - h_{i2\uparrow}h_{i1\downarrow}\rangle$.

Within the t-J type model, robust indications of superconductivity with d-wave symmetry in doped cuprates have been found by using numerical techniques [26]. On the other hand, it has been argued that the SC transition in doped cuprates is determined by the need to reduce the frustrated kinetic energy [27]. Although a strong coupling between the electron quasiparticles and a pairing boson is not necessary in these arguments [27], a series of inelastic neutron scattering experimental results provide a clear link between the electron quasiparticles and magnetic excitations [28,29]. In particular, an impurity-substitution effect on the low energy dynamics has been studied by ARPES measurements [30]; this impurity-substitution effect is a magnetic analogue of the isotope effect used for conventional superconductors. These experimental results [30] reveal that the impurity-induced changes in the electron self-energy show a good correspondence to those of the magnetic excitations, indicating the importance of the magnetic fluctuation to the electron pairing in cuprate superconductors.
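The matrix form ∆ = ∆L + σx∆T of the SC order parameter can be checked with a few lines of Python. The sketch below uses hypothetical numerical values for ∆L and ∆T. It applies the 2×2 matrix to plane-even and plane-odd vectors and shows that they are eigenvectors with eigenvalues ∆L + ∆T and ∆L − ∆T, which are the bonding and antibonding combinations that appear in Eqs. (11c) and (11d). The helper name `apply_plane_matrix` is an assumption made for this illustration.

```python
def apply_plane_matrix(longitudinal, transverse, vector):
    """Apply M = longitudinal * 1 + transverse * sigma_x to a (plane 1, plane 2) vector."""
    first, second = vector
    return (longitudinal * first + transverse * second,
            transverse * first + longitudinal * second)


# Hypothetical values, chosen only to make the arithmetic visible.
delta_L, delta_T = 0.10, 0.02

even = (1.0, 1.0)   # symmetric under exchange of the two planes (bonding)
odd = (1.0, -1.0)   # antisymmetric under exchange of the two planes (antibonding)

even_image = apply_plane_matrix(delta_L, delta_T, even)
odd_image = apply_plane_matrix(delta_L, delta_T, odd)

# Each image is proportional to the input vector; the ratio is the eigenvalue.
print("bonding eigenvalue    :", even_image[0] / even[0])
print("antibonding eigenvalue:", odd_image[0] / odd[0])
```

The same decomposition applies to any quantity written as a longitudinal part plus σx times a transverse part, including the Green’s functions introduced below.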
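Recently, we [19] have developed the kinetic energy driven SC mechanism based on the CSS fermion-spin theory [25], where the dressed holons interact directly through the kinetic energy by exchanging spin excitations, leading to a net attractive force between dressed holons. The electron Cooper pairs originating from the dressed holon pairing state are then due to the charge-spin recombination, and their condensation reveals the SC ground state. Within this SC mechanism [19], the doping and temperature dependence of the electron spectral function of the single layer cuprate superconductors in the SC state has been discussed [18]. In this section, our main goal is to generalize these analytical calculations from the single layer case to the bilayer system. As in the case of the SC order parameter, the full dressed holon normal and anomalous Green’s functions can also be expressed as $g({\bf k},\omega) = g_L({\bf k},\omega) + \sigma_x g_T({\bf k},\omega)$ and $\Im^{\dagger}({\bf k},\omega) = \Im^{\dagger}_L({\bf k},\omega) + \sigma_x\Im^{\dagger}_T({\bf k},\omega)$, respectively. We can now follow the previous discussions for the single layer case [18,19], and evaluate explicitly the corresponding longitudinal and transverse parts of the full dressed holon normal and anomalous Green’s functions as [see the Appendix],

$$g_L({\bf k},\omega) = \frac{1}{2}\sum_{\nu=1,2} Z^{(\nu)}_{hFA}\left(\frac{U^2_{h\nu{\bf k}}}{\omega - E_{h\nu{\bf k}}} + \frac{V^2_{h\nu{\bf k}}}{\omega + E_{h\nu{\bf k}}}\right), \tag{5a}$$

$$g_T({\bf k},\omega) = \frac{1}{2}\sum_{\nu=1,2} (-1)^{\nu+1} Z^{(\nu)}_{hFA}\left(\frac{U^2_{h\nu{\bf k}}}{\omega - E_{h\nu{\bf k}}} + \frac{V^2_{h\nu{\bf k}}}{\omega + E_{h\nu{\bf k}}}\right), \tag{5b}$$

$$\Im^{\dagger}_L({\bf k},\omega) = -\frac{1}{2}\sum_{\nu=1,2} Z^{(\nu)}_{hFA}\frac{\bar{\Delta}^{(\nu)}_{hz}({\bf k})}{2E_{h\nu{\bf k}}}\left(\frac{1}{\omega - E_{h\nu{\bf k}}} - \frac{1}{\omega + E_{h\nu{\bf k}}}\right), \tag{5c}$$

$$\Im^{\dagger}_T({\bf k},\omega) = -\frac{1}{2}\sum_{\nu=1,2} (-1)^{\nu+1} Z^{(\nu)}_{hFA}\frac{\bar{\Delta}^{(\nu)}_{hz}({\bf k})}{2E_{h\nu{\bf k}}}\left(\frac{1}{\omega - E_{h\nu{\bf k}}} - \frac{1}{\omega + E_{h\nu{\bf k}}}\right), \tag{5d}$$

where the dressed holon quasiparticle coherence factors $U^2_{h\nu{\bf k}} = [1 + \bar{\xi}_{\nu{\bf k}}/E_{h\nu{\bf k}}]/2$ and $V^2_{h\nu{\bf k}} = [1 - \bar{\xi}_{\nu{\bf k}}/E_{h\nu{\bf k}}]/2$, the dressed holon quasiparticle dispersion $E_{h\nu{\bf k}} = \sqrt{[\bar{\xi}_{\nu{\bf k}}]^2 + |\bar{\Delta}^{(\nu)}_{hz}({\bf k})|^2}$, the renormalized dressed holon excitation spectrum $\bar{\xi}_{\nu{\bf k}} = Z^{(\nu)}_{hFA}\xi_{\nu{\bf k}}$, with the mean-field (MF) dressed holon excitation spectrum $\xi_{\nu{\bf k}} = Zt\chi_1\gamma_{\bf k} - Zt'\chi_2\gamma'_{\bf k} - \mu + (-1)^{\nu+1}\chi_{\perp}t_{\perp}({\bf k})$, where the spin correlation function $\chi_2 = \langle S^{+}_{ia}S^{-}_{i+\hat{\tau}a}\rangle$, $\gamma_{\bf k} = (1/Z)\sum_{\hat{\eta}}e^{i{\bf k}\cdot\hat{\eta}}$, $\gamma'_{\bf k} = (1/Z)\sum_{\hat{\tau}}e^{i{\bf k}\cdot\hat{\tau}}$, Z is the number of the nearest neighbor or next nearest neighbor sites, the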
renormalized dressed holon pair gap func- tion ∆̄ (k) = Z [∆̄hL(k) + (−1) ν+1∆̄hT (k)], with ν = 1 ( ν = 2) for the bonding (antibonding) case, where ∆̄hL(k) = Σ 2L (k, ω) |ω=0= ∆̄hLγ , with γ (coskx − cosky)/2, ∆̄hT (k) = Σ 2T (k, ω) |ω=0= ∆̄hT , the dressed holon quasiparticle coherent weights Z (1)−1 hF1 − Z hF2, Z (2)−1 = Z−1 hF1 + Z hF2, with Z hF1 = 1 − Σ 1L (k0, ω) |ω=0, and Z hF2 = Σ 1T (k0, ω) |ω=0 , where k0 = [π, 0], Σ 1L (k, ω) and Σ 1T (k, ω) are the cor- responding antisymmetric parts of the longitudinal and transverse dressed holon self-energy functions Σ 1L (k, ω) and Σ 1T (k, ω), while the longitudinal and transverse parts of the dressed holon self-energy functions Σ 1 (k, ω) and Σ 2 (k, ω) have been evaluated as, 1L (k, iωn) = p+q+k gL(p+ k, ipm + iωn)ΠLL(p,q, ipm) p+q+k gT (p+ k, ipm + iωn)ΠTL(p,q, ipm)], (6a) 1T (k, iωn) = p+q+k gT (p+ k, ipm + iωn)ΠTT (p,q, ipm) p+q+k gL(p+ k, ipm + iωn)ΠLT (p,q, ipm)], (6b) 2L (k, iωn) = p+q+k L(−p− k,−ipm − iωn)ΠLL(p,q, ipm) p+q+k T (−p− k,−ipm − iωn)ΠTL(p,q, ipm)], (6c) 2T (k, iωn) = p+q+k (−p− k,−ipm − iωn)ΠTT (p,q, ipm) p+q+k L(−p− k,−ipm − iωn)ΠLT (p,q, ipm)], (6d) where R = [Z(tγk − t ′γ′k)] 2 + t2⊥(k), R = 2Z(tγk − )t⊥(k), and the spin bubbles Πη,η′(p,q, ipm) = (1/β) η (q, iqm)D η′ (q + p, iqm+ ipm), with η = L, T and η′ = L, T , and the MF spin Green’s function D(0)(k, ω) = D (k, ω) + σxD (k, ω), with the cor- responding longitudinal and transverse parts have been given by22, L (k, ω) = ν=1,2 ω2 − ω2 , (7a) (k, ω) = ν=1,2 (−1)ν+1 ω2 − ω2 , (7b) where Bνk = λ(A1γk−A2)−λ ′(2χz2γ k−χ2)−Jeff⊥[χ⊥+ 2χz⊥(−1) ν ][ǫ⊥(k)+(−1) ν ], A1 = 2ǫ‖χ 1+χ1, A2 = ǫ‖χ1+ 2χz1, λ = 2ZJeff , λ ′ = 4Zφ2t ′, ǫ‖ = 1+2tφ1/Jeff , ǫ⊥(k) = 1 + 4φ⊥t⊥(k)/Jeff⊥, the spin correlation functions χ 〈SziaS i+η̂a〉, χ 2 = 〈S i+τ̂a〉, χ ⊥ = 〈S i2〉, the dressed holon particle-hole order parameters φ1 = 〈h iaσhi+η̂aσ〉, φ2 = 〈h iaσhi+τ̂aσ〉, φ⊥ = 〈h i1σhi2σ〉, and the MF spin excitation spectrum, ω2νk = λ A4 − αǫ‖χ 1γk − αǫ‖χ1 (1 − ǫ‖γk) + αχz1 − 
αχ1γk (ǫ‖ − γk) Z − 1 γ′k + + λλ′α χz1(1 − ǫ‖γk)γ k − C2)(ǫ‖ − γk) + γ′k(C 2 − ǫ‖χ 2γk)− ǫ‖(C2 − χ2γk) + λJeff⊥α ǫ⊥(k)(ǫ‖ − γk)[C⊥ + χ1(−1) + (1− ǫ‖γk)[C ⊥ + χ 1ǫ⊥(k)(−1) ν ] + [ǫ⊥(k) + (−1) ǫ‖(C⊥ − χ⊥γk) + (C ⊥ − ǫ‖χ ⊥γk)(−1) + λ′Jeff⊥α γ′k[C ⊥ + χ 2ǫ⊥(k)(−1) ǫ⊥(k)[C ⊥ + χ2(−1) k − C ⊥) + χ k(−1) [ǫ⊥(k) + (−1) J2eff⊥[ǫ⊥(k) + (−1) ν ]2, (8) where A3 = αC1 + (1− α)/2Z, A4 = αC 1 + (1− α)/4Z, A5 = αC3 + (1 − α)/2Z, and the spin correla- tion functions C1 = (1/Z η̂η̂′ i+η̂aS i+η̂′a C2 = (1/Z i+η̂aS i+τ̂a〉, C3 = (1/Z2) τ̂ τ̂ ′ i+τ̂aS i+τ̂ ′a 〉, Cz1 = (1/Z2) η̂η̂′ 〈Szi+η̂aS i+η̂′a 〉, Cz2 = (1/Z2) 〈Szi+η̂aS i+τ̂a〉, C⊥ = (1/Z) 〈S+i1S i+η̂2〉, C′⊥ = (1/Z) 〈S+i1S i+τ̂2〉, C ⊥ = (1/Z) 〈Szi1S i+η̂2〉, and C′ ⊥ = (1/Z) 〈Szi1S i+τ̂2〉. In order to satisfy the sum rule of the spin correlation function 〈S+iaS ia〉 = 1/2 in the case without AFLRO, the important decoupling parameter α has been introduced in the above calcu- lation as in the single layer case18,19,22, which can be regarded as the vertex correction. With the help of the longitudinal and transverse parts of the full dressed holon normal and anomalous Green’s functions in Eq. (5) and MF spin Green’s function in Eq. 
(7), we now can calculate the electron nor- mal and anomalous Green’s functions G(i − j, t − t′) = 〈〈Ciσ(t);C ′)〉〉 = GL(i− j, t− t ′) + σxGT (i− j, t− t and Γ†(i−j, t−t′) = 〈〈C i↑(t);C ′)〉〉 = Γ L(i−j, t−t T (i−j, t−t ′), where these longitudinal and transverse parts are the convolutions of the corresponding longitudi- nal and transverse parts of the full dressed holon normal and anomalous Green’s functions and MF spin Green’s function in the CSS fermion-spin theory, and can be eval- uated explicitly as, GL(k, ω) = L(1)µν (k,p) U2hµp−k ω + Ehµp−k − ωνp V 2hµp−k ω − Ehµp−k + ωνp + L(2)µν (k,p) U2hµp−k ω + Ehµp−k + ωνp V 2hµp−k ω − Ehµp−k − ωνp , (9a) GT (k, ω) = (−1)µ+νZ L(1)µν (k,p) U2hµp−k ω + Ehµp−k − ωνp V 2hµp−k ω − Ehµp−k + ωνp + L(2)µν (k,p) U2hµp−k ω + Ehµp−k + ωνp V 2hµp−k ω − Ehµp−k − ωνp , (9b) L(k, ω) = (p− k) 2Ehµp−k L(1)µν (k,p) ω − Ehµp−k + ωνp ω + Ehµp−k − ωνp + L(2)µν (k,p) ω − Ehµp−k − ωνp ω + Ehµp−k + ωνp , (9c) (k, ω) = (−1)µ+νZ (p− k) 2Ehµp−k L(1)µν (k,p) ω − Ehµp−k + ωνp ω + Ehµp−k − ωνp + L(2)µν (k,p) ω − Ehµp−k − ωνp ω + Ehµp−k + ωνp , (9d) where L µν (k,p) = [coth(βωνp/2) − th(βEhµp−k/2)]/2 and L µν (k,p) = [coth(βωνp/2) + th(βEhµp−k/2)]/2, then the longitudinal and transverse parts of the electron spectral function AL(k, ω) = −2ImGL(k, ω) and AT (k, ω) = −2ImGT (k, ω) and SC gap func- tion ∆L(k) = (1/β) L(k, iωn) and ∆T (k) = (1/β) (k, iωn) are obtained as, AL(k, ω) = π {L(1)µν (k,p)[U hµp−kδ(ω + Ehµp−k − ωνp) + V hµp−kδ(ω − Ehµp−k + ωνp)] + L(2)µν (k,p)[U hµp−kδ(ω + Ehµp−k + ωνp) + V hµp−kδ(ω − Ehµp−k − ωνp)]}, (10a) AT (k, ω) = π (−1)µ+νZ {L(1)µν (k,p)[U hµp−kδ(ω + Ehµp−k − ωνp) + V hµp−kδ(ω − Ehµp−k + ωνp)] + L(2)µν (k,p)[U hµp−kδ(ω + Ehµp−k + ωνp) + V hµp−kδ(ω − Ehµp−k − ωνp)]}, (10b) ∆L(k) = − p,µ,ν (p− k) Ehµp−k βEhµp−k]coth[ βωνp], (10c) ∆T (k) = − p,µ,ν (−1)µ+νZ (p− k) Ehµp−k βEhµp−k]coth[ βωνp]. (10d) With the above longitudinal and transverse parts of the SC gap functions in Eqs. 
(10c) and (10d), the corresponding longitudinal and transverse SC gap parameters are obtained as $\Delta_L = -\chi_1\Delta_{hL}$ and $\Delta_T = -\chi_{\perp}\Delta_{hT}$, respectively. In the bilayer coupling case, the more appropriate classification is in terms of the spectral function and SC gap function within the basis of the antibonding and bonding components [13,14,15,16,17]. In this case, the electron spectral function and SC gap parameter can be transformed from the plane representation to the antibonding-bonding representation as,

$$A^{(a)}({\bf k},\omega) = \frac{1}{2}[A_L({\bf k},\omega) - A_T({\bf k},\omega)], \tag{11a}$$

$$A^{(b)}({\bf k},\omega) = \frac{1}{2}[A_L({\bf k},\omega) + A_T({\bf k},\omega)], \tag{11b}$$

$$\Delta^{(a)} = \Delta_L - \Delta_T, \tag{11c}$$

$$\Delta^{(b)} = \Delta_L + \Delta_T, \tag{11d}$$

respectively, so that the antibonding and bonding parts have odd and even symmetries, respectively.

III. ELECTRONIC STRUCTURE OF BILAYER CUPRATE SUPERCONDUCTORS

We now begin to discuss the effect of the bilayer interaction on the electronic structure in the SC state. We first plot, in Fig. 1, the antibonding (solid line) and bonding (dashed line) electron spectral functions at the [π, 0] point for parameters t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 with temperature T = 0.002J at the doping concentration δ = 0.15. In comparison with the single layer case [18], the electron spectrum of the bilayer system has been split into the bonding and antibonding components, with the bonding and antibonding SC quasiparticle peaks at the [π, 0] point located at different positions. In this sense, the differentiation between the bonding and antibonding components of the electron spectral function is essential. The antibonding spectrum consists of a low energy antibonding peak, corresponding to the SC peak, and the bonding spectrum has a higher energy bonding peak, corresponding to the hump, while the spectral dip lies between them. The total contribution to the electron spectrum from both the antibonding and bonding components then gives rise to the PDH structure.
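How two separate components can add up to a peak-dip-hump line shape can be seen in a toy model. The following Python sketch is not the calculation of this paper: it simply adds a sharp Lorentzian (standing in for the antibonding peak) and a broad Lorentzian at higher binding energy (standing in for the bonding hump), and then locates the dip between them. All positions and widths are hypothetical numbers in units of J, and the names `lorentzian`, `toy_spectrum`, and `find_dip` are assumptions made for this illustration.

```python
import math


def lorentzian(omega, center, half_width):
    """Normalized Lorentzian line shape centered at `center`."""
    return (half_width / math.pi) / ((omega - center) ** 2 + half_width ** 2)


# Hypothetical (position, half-width) pairs in units of J, not fitted to any data.
ANTIBONDING_PEAK = (-0.1, 0.03)  # sharp peak at low binding energy
BONDING_HUMP = (-0.5, 0.15)      # broad hump at higher binding energy


def toy_spectrum(omega):
    """Total toy spectrum: antibonding plus bonding contribution."""
    return lorentzian(omega, *ANTIBONDING_PEAK) + lorentzian(omega, *BONDING_HUMP)


def find_dip(steps=400):
    """Grid search for the minimum of the total spectrum between hump and peak."""
    low, high = BONDING_HUMP[0], ANTIBONDING_PEAK[0]
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    return min(grid, key=toy_spectrum)


dip = find_dip()
print(f"hump at {BONDING_HUMP[0]:+.2f}: {toy_spectrum(BONDING_HUMP[0]):.2f}")
print(f"dip  at {dip:+.2f}: {toy_spectrum(dip):.2f}")
print(f"peak at {ANTIBONDING_PEAK[0]:+.2f}: {toy_spectrum(ANTIBONDING_PEAK[0]):.2f}")
```

In this toy picture the dip is not a separate feature: it appears only because the two components are centered at different energies, which is the role played by the BS in the discussion above.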
Although the simple bilayer t-t′-J model (1) cannot be regarded as a comprehensive model for a quantitative comparison with bilayer cuprate superconductors, our present results for the SC state are in qualitative agreement with the major experimental observations on bilayer cuprate superconductors [2,7,8,9,10,11,16].

FIG. 1: The antibonding (solid line) and bonding (dashed line) electron spectral functions in the [π, 0] point for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 with T = 0.002J at δ = 0.15.

FIG. 2: The electron spectral functions at [π, 0] point for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 with T = 0.002J at δ = 0.09 (solid line), δ = 0.12 (dashed line), and δ = 0.15 (dotted line).

We now turn to discuss the doping evolution of the electron spectrum of bilayer cuprate superconductors in the SC state. We have calculated the electron spectrum at different doping concentrations, and the results for the electron spectral functions at the [π, 0] point for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 with T = 0.002J at δ = 0.09 (solid line), δ = 0.12 (dashed line), and δ = 0.15 (dotted line) are plotted in Fig. 2. In comparison with the corresponding ARPES experimental results for the bilayer cuprate superconductor Bi2Sr2CaCu2O8+δ in the SC state in Ref. [12], it is obvious that the doping evolution of the spectral weight of the bilayer superconductor Bi2Sr2CaCu2O8+δ is reproduced. With increasing doping concentration, both the SC peak and the hump become sharper, and the spectral weights increase in intensity.

FIG. 3: The positions of the antibonding peaks and bonding humps in the electron spectrum as a function of momentum along the direction [−0.2π, π] → [0, π] → [0.2π, π] with T = 0.002J at δ = 0.15 for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35.
Furthermore, we have also calculated the electron spectrum at different temperatures, and the results show that the spectral weights of both the SC peak and the hump are suppressed with increasing temperature. These results are also qualitatively consistent with the ARPES experimental results on bilayer cuprate superconductors in the SC state [2,9,12].

To better understand the anomalous form of the antibonding and bonding electron spectral functions as a function of energy ω for k in the vicinity of the [π, 0] point, we have made a series of calculations of the electron spectral function at different momenta. The results show that the sharp SC peak from the antibonding spectral function and the hump from the bonding spectral function persist in a very large region of momentum space around the [π, 0] point. To show this point clearly, we plot the positions of the antibonding peak and bonding hump in the electron spectrum as a function of momentum along the direction [−0.2π, π] → [0, π] → [0.2π, π] with T = 0.002J at δ = 0.15 for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 in Fig. 3. Our result shows that there are two branches in the quasiparticle dispersion, with the upper branch corresponding to the antibonding quasiparticle dispersion and the lower branch corresponding to the bonding quasiparticle dispersion. Furthermore, the BS reaches its maximum at the [π, 0] point. Our present result also shows that, in analogy to the two flat bands that appear in the normal state [22], both the electron antibonding peak and the bonding hump have a weak dispersion around the [π, 0] point, in qualitative agreement with the ARPES experimental measurements on bilayer cuprate superconductors in the SC state [2,7,8,9,10,11,14].

In the above calculations, we find that although the antibonding SC peak and bonding hump have different dispersions, the transverse part of the SC gap parameter ∆T ≈ 0. To show this point clearly, we plot the antibonding and bonding gap parameters in Eqs.
(11c) and (11d) as a function of the doping concentration with T = 0.002J for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 in Fig. 4. As seen from Fig. 4, both the antibonding and bonding gap parameters have the same d-wave SC gap magnitude at a given doping concentration, i.e., $\Delta^{(a)} \approx \Delta^{(b)}$. This result shows that although there is a single electron interlayer coherent hopping (2) in bilayer cuprate superconductors in the SC state, the electron interlayer pairing interaction vanishes. This reflects that in the present kinetic energy driven SC mechanism, the weak dressed holon-spin interaction due to the interlayer coherent hopping (2) from the kinetic energy terms in Eq. (3) does not induce the dressed holon interlayer pairing state by exchanging spin excitations in the higher power of the doping concentration. This is different from the dressed holon-spin interaction due to the intralayer hopping from the kinetic energy terms in Eq. (3), which can induce superconductivity by exchanging spin excitations in the higher power of the doping concentration [19]. This result is also consistent with the ARPES experimental results for the bilayer cuprate superconductor Bi(Pb)2Sr2CaCu2O8+δ [14,16], where the SC gap has been measured separately for the bonding and antibonding bands, and it is found that the d-wave SC gaps from the antibonding and bonding components are identical within the experimental uncertainties.

FIG. 4: The antibonding (solid line) and bonding (dashed line) gap parameters as a function of the doping concentration with T = 0.002J for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35.

To our present understanding, the two main reasons why the electronic structure of bilayer cuprate superconductors in the SC state can be described qualitatively in the framework of the kinetic energy driven superconductivity by considering the bilayer interaction are as follows.
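Firstly, the bilayer interaction causes the BS, so that the full electron normal (anomalous) Green’s function is divided into longitudinal and transverse parts, and the bonding and antibonding electron spectral functions (SC gap functions) are then obtained from these longitudinal and transverse parts of the electron normal (anomalous) Green’s function. Although the transverse part of the SC gap parameter ∆T ≈ 0, the antibonding peak around the [π, 0] point is always at lower binding energy than the bonding peak (hump) due to the BS. In this sense, the PDH structure in bilayer cuprate superconductors in the SC state is mainly caused by the BS. Secondly, the SC state in the kinetic energy driven SC mechanism is conventional BCS-like, as in the single layer case [18,19]. This can be understood from the electron normal and anomalous Green’s functions in Eq. (9). Since the spins center around the [π, π] point at the MF level [18,19,22], the main contribution from the spins comes from the [π, π] point. In this case, the longitudinal and transverse parts of the electron normal and anomalous Green’s functions in Eq. (9) can be approximately reduced, in terms of $\omega_{\nu{\bf p}=[\pi,\pi]} \sim 0$ and one of the self-consistent equations [22] $1/2 = \langle S^{+}_{ia}S^{-}_{ia}\rangle = [1/(4N)]\sum_{\nu,{\bf k}}(B_{\nu{\bf k}}/\omega_{\nu{\bf k}})\coth[(1/2)\beta\omega_{\nu{\bf k}}]$, as,

$$G_L({\bf k},\omega) \approx \frac{1}{2}\sum_{\nu=1,2} Z^{(\nu)}_{FA}\left(\frac{U^2_{\nu{\bf k}}}{\omega - E_{\nu{\bf k}}} + \frac{V^2_{\nu{\bf k}}}{\omega + E_{\nu{\bf k}}}\right), \tag{12a}$$

$$G_T({\bf k},\omega) \approx \frac{1}{2}\sum_{\nu=1,2} (-1)^{\nu+1} Z^{(\nu)}_{FA}\left(\frac{U^2_{\nu{\bf k}}}{\omega - E_{\nu{\bf k}}} + \frac{V^2_{\nu{\bf k}}}{\omega + E_{\nu{\bf k}}}\right), \tag{12b}$$

$$\Gamma^{\dagger}_L({\bf k},\omega) = \frac{1}{2}\sum_{\nu=1,2} Z^{(\nu)}_{FA}\frac{\bar{\Delta}^{(\nu)}_{z}({\bf k})}{2E_{\nu{\bf k}}}\left(\frac{1}{\omega - E_{\nu{\bf k}}} - \frac{1}{\omega + E_{\nu{\bf k}}}\right), \tag{12c}$$

$$\Gamma^{\dagger}_T({\bf k},\omega) = \frac{1}{2}\sum_{\nu=1,2} (-1)^{\nu+1} Z^{(\nu)}_{FA}\frac{\bar{\Delta}^{(\nu)}_{z}({\bf k})}{2E_{\nu{\bf k}}}\left(\frac{1}{\omega - E_{\nu{\bf k}}} - \frac{1}{\omega + E_{\nu{\bf k}}}\right), \tag{12d}$$

where the electron coherent weights $Z^{(\nu)}_{FA} = Z^{(\nu)}_{hFA}/2$, the electron quasiparticle coherence factors $U^2_{\nu{\bf k}} \approx V^2_{h\nu{\bf k}-{\bf k}_A}$ and $V^2_{\nu{\bf k}} \approx U^2_{h\nu{\bf k}-{\bf k}_A}$, the SC gap function $\bar{\Delta}^{(\nu)}_{z}({\bf k}) \approx \bar{\Delta}^{(\nu)}_{hz}({\bf k}-{\bf k}_A)$, and the electron quasiparticle spectrum $E_{\nu{\bf k}} \approx E_{h\nu{\bf k}-{\bf k}_A}$, with ${\bf k}_A = [\pi, \pi]$.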
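The BCS-like form above is controlled by the coherence factors and the quasiparticle energy. As a minimal sketch, assuming the standard BCS expressions U² = (1 + ξ/E)/2, V² = (1 − ξ/E)/2, and E = √(ξ² + ∆²) (the same form quoted for the dressed holons in Sec. II) together with the d-wave factor (cos kx − cos ky)/2, the Python snippet below evaluates these quantities for a few band energies at the [π, 0] point. The numerical inputs are hypothetical and are not the self-consistent values of this paper; the function names are illustrative.

```python
import math


def d_wave_gap(kx, ky, amplitude):
    """d-wave gap function: amplitude * (cos kx - cos ky) / 2."""
    return amplitude * (math.cos(kx) - math.cos(ky)) / 2.0


def bcs_factors(xi, gap):
    """Return (U^2, V^2, E) for band energy xi and gap value gap."""
    energy = math.hypot(xi, gap)  # E = sqrt(xi^2 + gap^2)
    if energy == 0.0:
        # Gapless point exactly on the Fermi surface: equal weights by convention.
        return 0.5, 0.5, 0.0
    u2 = 0.5 * (1.0 + xi / energy)
    v2 = 0.5 * (1.0 - xi / energy)
    return u2, v2, energy


# Hypothetical gap amplitude (in units of J), evaluated at the antinode [pi, 0].
gap_at_antinode = d_wave_gap(math.pi, 0.0, amplitude=0.1)

for xi in (-0.2, 0.0, 0.2):
    u2, v2, energy = bcs_factors(xi, gap_at_antinode)
    print(f"xi = {xi:+.2f}: U^2 = {u2:.3f}, V^2 = {v2:.3f}, E = {energy:.3f}")
```

The two weights always add up to one, and at ξ = 0 they are equal while the quasiparticle energy reduces to the gap magnitude, which is why the peak position near [π, 0] is sensitive to both the band splitting and the gap.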
As in the single layer case [18,19], this reflects that the hole-like dressed holon quasiparticle coherence factors $V_{h\nu{\bf k}}$ and $U_{h\nu{\bf k}}$ and the hole-like dressed holon quasiparticle spectrum $E_{h\nu{\bf k}}$ have been transferred into the electron quasiparticle coherence factors $U_{\nu{\bf k}}$ and $V_{\nu{\bf k}}$ and the electron quasiparticle spectrum $E_{\nu{\bf k}}$, respectively, by the convolutions of the corresponding longitudinal and transverse parts of the MF spin Green’s function and the full dressed holon normal and anomalous Green’s functions due to the charge-spin recombination [27]. As a result, the electron normal and anomalous Green’s functions in Eq. (12) are of the typical bilayer BCS form [6]. This also reflects that, as in the single layer case [18,19], the dressed holon pairs condense with d-wave symmetry over a wide range of doping concentration; the electron Cooper pairs originating from the dressed holon pairing state are then due to the charge-spin recombination, and their condensation automatically gives the electron quasiparticle character. This is why the basic bilayer BCS formalism [6] is still valid in discussions of the SC coherence of the quasiparticle peak and hump, although the pairing mechanism is driven by the intralayer kinetic energy by exchanging spin excitations, and other exotic magnetic scattering [28,29] is beyond the BCS formalism.

IV. SUMMARY AND DISCUSSIONS

We have studied the electronic structure of bilayer cuprate superconductors in the SC state based on the kinetic energy driven SC mechanism [19]. Our results show that the electron spectrum of bilayer cuprate superconductors is split into the bonding and antibonding components by the BS, so that the observed PDH structure around the [π, 0] point is mainly caused by this BS, with the SC peak being related to the antibonding component and the hump being formed by the bonding component. The spectral weight increases with increasing doping concentration.
In analogy to the two flat bands that appear in the normal state, the antibonding and bonding quasiparticles around the [π, 0] point disperse weakly with momentum, in qualitative agreement with the experimental observations on bilayer cuprate superconductors [2,7,8,9,10,11]. These results also show that the bilayer interaction contributes significantly to the electronic structure of bilayer cuprate superconductors in the SC state.

It has been shown from the ARPES experiments [2,14] that the BS is detected in both the normal and SC states, and that the electron spectral functions display the double-peak structure in the normal state and the PDH structure in the SC state. Recently, we [22] have studied the electron spectrum of bilayer cuprate superconductors in the normal state, and shown that the double-peak structure in the electron spectrum in the normal state is dominated by the BS. On the other hand, although the antibonding and bonding SC peaks have different dispersions, the antibonding and bonding parts have the same d-wave SC gap amplitude, as mentioned above. Combining our previous discussions for the normal state [22] with the present study of the SC state, we therefore find that one of the important roles of the interlayer coherent hopping (2) is to split the electron spectrum of the bilayer system into the bonding and antibonding components in both the normal and SC states. As a consequence, the well pronounced PDH structure of bilayer cuprate superconductors in the SC state and the double-peak structure in the normal state are mainly caused by the BS.

Acknowledgments

The authors would like to thank Dr. H. Guo and Dr. L. Cheng for helpful discussions. This work was supported by the National Natural Science Foundation of China under Grant No. 90403005, and by funds from the Ministry of Science and Technology of China under Grant Nos. 2006CB601002 and 2006CB921300.
APPENDIX A: DRESSED HOLON BCS TYPE NORMAL AND ANOMALOUS GREEN’S FUNCTIONS IN BILAYER CUPRATE SUPERCONDUCTORS

In the single layer case, it has been shown [19] that the dressed holon-spin interactions from the kinetic energy terms of the t-t′-J model are quite strong, and in the case without AFLRO, these interactions can induce the dressed holon pairing state (hence the electron Cooper pairing state) by exchanging spin excitations in the higher power of the doping concentration. Following these discussions [18,19], we obtain in terms of Eliashberg’s strong coupling theory [31] the self-consistent equations satisfied by the full dressed holon normal and anomalous Green’s functions in the bilayer system in the SC state as,

$$g({\bf k},\omega) = g^{(0)}({\bf k},\omega) + g^{(0)}({\bf k},\omega)\left[\Sigma^{(h)}_1({\bf k},\omega)g({\bf k},\omega) - \Sigma^{(h)}_2(-{\bf k},-\omega)\Im^{\dagger}({\bf k},\omega)\right], \tag{A1a}$$

$$\Im^{\dagger}({\bf k},\omega) = g^{(0)}(-{\bf k},-\omega)\left[\Sigma^{(h)}_1(-{\bf k},-\omega)\Im^{\dagger}(-{\bf k},-\omega) + \Sigma^{(h)}_2(-{\bf k},-\omega)g({\bf k},\omega)\right], \tag{A1b}$$

respectively, where the MF dressed holon normal Green’s function [22] is $g^{(0)}({\bf k},\omega) = g^{(0)}_L({\bf k},\omega) + \sigma_x g^{(0)}_T({\bf k},\omega)$, with the longitudinal and transverse parts evaluated as $g^{(0)}_L({\bf k},\omega) = (1/2)\sum_{\nu=1,2}(\omega - \xi_{\nu{\bf k}})^{-1}$ and $g^{(0)}_T({\bf k},\omega) = (1/2)\sum_{\nu=1,2}(-1)^{\nu+1}(\omega - \xi_{\nu{\bf k}})^{-1}$, respectively, while the dressed holon self-energy functions are $\Sigma^{(h)}_1({\bf k},\omega) = \Sigma^{(h)}_{1L}({\bf k},\omega) + \sigma_x\Sigma^{(h)}_{1T}({\bf k},\omega)$ and $\Sigma^{(h)}_2({\bf k},\omega) = \Sigma^{(h)}_{2L}({\bf k},\omega) + \sigma_x\Sigma^{(h)}_{2T}({\bf k},\omega)$, with the corresponding longitudinal and transverse parts given in Eq. (6). In the previous discussions of the electronic structure of the single layer cuprate superconductors in the SC state [18], it has been shown that the self-energy function $\Sigma^{(h)}_2({\bf k},\omega)$ describes the effective dressed holon pair gap function, while the self-energy function $\Sigma^{(h)}_1({\bf k},\omega)$ describes the quasiparticle coherence. Since $\Sigma^{(h)}_2({\bf k},\omega)$ is an even function of ω, while $\Sigma^{(h)}_1({\bf k},\omega)$ is not, for convenience the self-energy function $\Sigma^{(h)}_1({\bf k},\omega)$ can be broken up into its symmetric and antisymmetric parts as $\Sigma^{(h)}_1({\bf k},\omega) = \Sigma^{(h)}_{1e}({\bf k},\omega) + \omega\Sigma^{(h)}_{1o}({\bf k},\omega)$, so that both $\Sigma^{(h)}_{1e}({\bf k},\omega)$ and $\Sigma^{(h)}_{1o}({\bf k},\omega)$ are even functions of ω.
Now we can define the dressed holon quasiparticle coherent weights in the present bilayer system as $Z^{-1}_{hF1}({\bf k},\omega) = 1 - \Sigma^{(ho)}_{1L}({\bf k},\omega)$ and $Z^{-1}_{hF2}({\bf k},\omega) = \Sigma^{(ho)}_{1T}({\bf k},\omega)$, where $\Sigma^{(ho)}_{1L}$ and $\Sigma^{(ho)}_{1T}$ denote the longitudinal and transverse parts of the antisymmetric self-energy. As in the single layer case [18], we only discuss the low-energy behavior of the electronic structure of bilayer cuprate superconductors, which means that the effective dressed holon pair gap functions and quasiparticle coherent weights can be discussed in the static limit, i.e., $\bar{\Delta}_h({\bf k}) = \Sigma^{(h)}_2({\bf k},\omega)|_{\omega=0} = \bar{\Delta}_{hL}({\bf k}) + \sigma_x\bar{\Delta}_{hT}({\bf k})$, $Z^{-1}_{hF1}({\bf k}) = 1 - \Sigma^{(ho)}_{1L}({\bf k},\omega)|_{\omega=0}$ and $Z^{-1}_{hF2}({\bf k}) = \Sigma^{(ho)}_{1T}({\bf k},\omega)|_{\omega=0}$. As in the single layer case [18], although $Z_{hF1}({\bf k})$ and $Z_{hF2}({\bf k})$ are still functions of k, the wave vector dependence may be unimportant. This follows from the ARPES experiments [2], which show that in the SC state of bilayer cuprate superconductors the lowest energy states are located at the [π, 0] point, indicating that the majority contribution to the electron spectrum comes from the [π, 0] point. In this case, the wave vector k in $Z_{hF1}({\bf k})$ and $Z_{hF2}({\bf k})$ can be chosen as $Z^{-1}_{hF1} = 1 - \Sigma^{(ho)}_{1L}({\bf k})|_{{\bf k}=[\pi,0]}$ and $Z^{-1}_{hF2} = \Sigma^{(ho)}_{1T}({\bf k})|_{{\bf k}=[\pi,0]}$. With the help of the above discussions, the corresponding longitudinal and transverse parts of the dressed holon normal and anomalous Green’s functions in Eqs.
(A1a) and (A1b) now can be obtained explicitly as, gL(k, ω) = ν=1,2 U2hνk ω − Ehνk V 2hνk ω + Ehνk , (A2a) gT (k, ω) = ν=1,2 (−1)ν+1Z U2hνk ω − Ehνk V 2hνk ω + Ehνk , (A2b) L(k, ω) = − ν=1,2 2Ehνk ω − Ehνk ω + Ehνk , (A2c) T (k, ω) = − ν=1,2 (−1)ν+1Z 2Ehνk ω − Ehνk ω + Ehνk , (A2d) with the dressed holon effective gap parameters and quasiparticle coherent weights satisfy the following four equations, ∆̄hL = − k,q,p ν,ν′,ν′′ k−p+qCνν′′ (k+ q) (ν′′) Bν′pBνq ων′pωνq (ν′′) νν′ν′′(q,p) + F νν′ν′′(k,q,p) [ων′p − ωνq]2 − E hν′′k νν′ν′′(q,p) + F νν′ν′′(k,q,p) [ων′p + ωνq]2 − E hν′′k , (A3a) ∆̄hT = − k,q,p ν,ν′,ν′′ (−1)ν+ν ′+ν′′+1Cνν′′(k+ q) (ν′′) Bν′pBνq ων′pωνq (ν′′) νν′ν′′(q,p) + F νν′ν′′(k,q,p) [ων′p − ωνq]2 − E hν′′k νν′ν′′(q,p) + F νν′ν′′(k,q,p) [ων′p + ωνq]2 − E hν′′k , (A3b) = 1 + ν,ν′,ν′′ [1 + (−1)ν+ν ′+ν′′+1]Cνν′′ (p+ k0) (ν′′) Bν′pBνq ων′pωνq νν′ν′′(q,p) [ων′p − ωνq + Ehν′′p−q+k0 ] νν′ν′′(q,p) [ων′p − ωνq − Ehν′′p−q+k0 ] νν′ν′′(q,p) [ων′p + ωνq + Ehν′′p−q+k0] νν′ν′′ (q,p) [ων′p + ωνq − Ehν′′p−q+k0 ] , (A3c) = 1 + ν,ν′,ν′′ [1− (−1)ν+ν ′+ν′′+1]Cνν′′ (p+ k0) (ν′′) Bν′pBνq ων′pωνq νν′ν′′(q,p) [ων′p − ωνq + Ehν′′p−q+k0 ] νν′ν′′(q,p) [ων′p − ωνq − Ehν′′p−q+k0 ] νν′ν′′(q,p) [ων′p + ωνq + Ehν′′p−q+k0] νν′ν′′ (q,p) [ων′p + ωνq − Ehν′′p−q+k0 ] , (A3d) where Cνν′′ (k) = [Z(tγk − t ′γ′k) + (−1) ν+ν′′t⊥(k)] νν′ν′′ (q,p) = nB(ωνq)+nB(ων′p)+2nB(ωνq)nB(ων′p), νν′ν′′(k,q,p) = [2nF (Ehν′′k) − 1][ων′p − ωνq][nB(ωνq) − nB(ων′p)]/Ehν′′k, F νν′ν′′(q,p) = 1 + nB(ωνq) + nB(ων′p) + 2nB(ωνq)nB(ων′p), νν′ν′′ (k,q,p) = [2nF (Ehν′′k) − 1][ων′p + ωνq][1 + nB(ωνq) + nB(ων′p)]/Ehν′′k, H νν′ν′′(q,p) = nF (Ehν′′p−q+k0)[nB(ων′p) − nB(ωνq)] + nB(ωνq)[1 + nB(ων′p)], H νν′ν′′(q,p) = nF (Ehν′′p−q+k0)[nB(ωνq) − nB(ων′p)] + nB(ων′p)[1 + nB(ωνq)], H νν′ν′′(q,p) = [1 − nF (Ehν′′p−q+k0)][1 + nB(ωνq) + nB(ων′p)] + nB(ωνq)nB(ων′p), H νν′ν′′ (q,p) = nF (Ehν′′p−q+k0)[1 + nB(ωνq)+nB(ων′p)]+nB(ωνq)nB(ων′p), and k0 = [π, 0]. 
These four equations must be solved self-consistently in combination with the other equations, as in the single layer case [18,19], so that all order parameters, the decoupling parameter α, and the chemical potential µ are determined by the self-consistent calculation.

[1] See, e.g., M.A. Kastner, R.J. Birgeneau, G. Shirane, and Y. Endoh, Rev. Mod. Phys. 70, 897 (1998), and references therein.
[2] See, e.g., A. Damascelli, Z. Hussain, and Z.-X. Shen, Rev. Mod. Phys. 75, 473 (2003), and references therein.
[3] See, e.g., J. Campuzano, M. Norman, and M. Randeria, in Physics of Superconductors, vol. II, edited by K. Bennemann and J. Ketterson (Springer, Berlin Heidelberg New York, 2004), p. 167, and references therein.
[4] J. Campuzano, H. Ding, M. R. Norman, M. Randeria, A. F. Bellman, T. Yokoya, T. Takahashi, H. Katayama-Yoshida, T. Mochiku, and K. Kadowaki, Phys. Rev. B 53, R14737 (1996).
[5] H. Matsui, T. Sato, T. Takahashi, S.-C. Wang, H.-B. Yang, H. Ding, T. Fujii, T. Watanabe, and A. Matsuda, Phys. Rev. Lett. 90, 217002 (2003).
[6] J.R. Schrieffer, Theory of Superconductivity, Benjamin, New York, 1964.
[7] D.S. Dessau, B.O. Wells, Z.-X. Shen, W.E. Spicer, A.J. Arko, R.S. List, D.B. Mitzi, and A. Kapitulnik, Phys. Rev. Lett. 66, 2160 (1991); Y. Hwu, L. Lozzi, M. Marsi, S. La Rosa, M. Winokur, P. Davis, M. Onellion, H. Berger, F. Gozzo, F. Lévy, and G. Margaritondo, Phys. Rev. Lett. 67, 2573 (1991).
[8] Mohit Randeria, Hong Ding, J-C. Campuzano, A. Bellman, G. Jennings, T. Yokoya, T. Takahashi, H. Katayama-Yoshida, T. Mochiku, and K. Kadowaki, Phys. Rev. Lett. 74, 4951 (1995); H. Ding, T. Yokoya, J-C. Campuzano, T. Takahashi, M. Randeria, M. R. Norman, T. Mochiku, K. Kadowaki, and J. Giapintzakis, Nature 382, 51 (1996).
[9] A.V. Fedorov, T. Valla, P.D. Johnson, Q. Li, G.D. Gu, and N. Koshizuka, Phys. Rev. Lett. 82, 2179 (1999).
[10] D.H. Lu, D.L. Feng, N.P. Armitage, K.M. Shen, A. Damascelli, C. Kim, F. Ronning, Z.-X. Shen, D.A. Bonn, R. Liang, W.N. Hardy, A.I. Rykov, and S. Tajima, Phys. Rev.
Lett. 86, 4370 (2001). 11 T. Sato, H. Matsui, S. Nishina, T. Takahashi, T. Fujii, T. Watanabe, and A. Matsuda, Phys. Rev. Lett. 89, 67005 (2002); D.L. Feng, A. Damascelli, K.M. Shen, N. Mo- toyama, D.H. Lu, H. Eisaki, K. Shimizu, J.-i. Shimoyama, K. Kishio, N. Kaneko, M. Greven, G.D. Gu, X.J. Zhou, C. Kim, F. Ronning, N.P. Armitage, and Z.-X Shen, Phys. Rev. Lett. 88, 107001 (2002). 12 J.C. Campuzano, H. Ding, M.R. Norman, H.M. Fretwell, M. Randeria, A. Kaminski, J. Mesot, T. Takeuchi, T. Sato, T. Yokoya, T. Takahashi, T. Mochiku, K. Kadowaki, P. Guptasarma, D.G. Hinks, Z. Konstantinovic, Z.Z. Li, and H. Raffy, Phys. Rev. Lett. 83, 3709 (1999); M.R. Nor- man, H. Ding, J.C. Campuzano, T. Takeuchi, M. Randeria, T. Yokoya, T. Takahashi, T. Mochiku, and K. Kadowaki, Phys. Rev. Lett. 79, 3506 (1997). 13 A.A. Kordyuk, S.V. Borisenko, T.K. Kim, K.A. Nenkov, M. Knupfer, J. Fink, M.S. Golden, H. Berger, and R. Fol- lath, Phys. Rev. Lett. 89, 077003 (2002); A.D. Gromko, Y.-D. Chuang, A.V. Fedorov, Y. Aiura, Y. Yamaguchi, K. Oka, Yoichi Ando, D.S. Dessau, cond-mat/0205385. 14 D.L. Feng, N.P. Armitage, D.H. Lu, A. Damascelli, J.P. Hu, P. Bogdanov, A. Lanzara, F. Ronning, K.M. Shen, H. Eisaki, C. Kim, Z.-X. Shen, J.-i. Shimoyama, and K. Kishio, Phys. Rev. Lett. 86, 5550 (2001). 15 Y.-D. Chuang, A.D. Gromko, A. Fedorov, Y. Aiura, K. Oka, Yoichi Ando, H. Eisaki, S.I. Uchida, and D.S. Dessau, Phys. Rev. Lett. 87, 117002 (2001); P.V. Bogdanov, A. Lanzara, X.J. Zhou, S.A. Kellar, D.L. Feng, E.D. Lu, H. Eisaki, J.-I. Shimoyama, K. Kishio, Z. Hussain, and Z.X. Shen, Phys. Rev. B 64, 180505(R) (2001). 16 S.V. Borisenko, A.A. Kordyuk, T.K. Kim, S. Legner, K.A. Nenkov, M. Knupfer, M.S. Golden, J. Fink, H. Berger, and R. Follath, Phys. Rev. B 66, 140509(R) (2002). 17 D.L. Feng, C. Kim, H. Eisaki, D.H. Lu, A. Damascelli, K.M. Shen, F. Ronning, N.P. Armitage, N. Kaneko1, M. Greven, J.-i. Shimoyama, K. Kishio, R. Yoshizaki, G.D. Gu, and Z.-X. Shen, Phys. Rev. B 65, 220501(R) (2002); A.A. 
Kordyuk, S.V. Borisenko, M.S. Golden, S. Legner, K.A. Nenkov, M. Knupfer, J. Fink, H. Berger, L. Forró, and R. Follath, Phys. Rev. B 66, 014502 (2002); Y.-D. Chuang, A.D. Gromko, A.V. Fedorov, Y. Aiura, K. Oka, Yoichi Ando, D.S. Dessau, cond-mat/0107002. 18 Huaiming Guo and Shiping Feng, Phys. Lett. A 361, 382 (2007); Shiping Feng and Tianxing Ma, Phys. Lett. A 350, 138 (2006). 19 Shiping Feng, Phys. Rev. B68, 184501 (2003); Shiping Feng, Tianxing Ma, and Huaiming Guo, Physica C 436, 14 (2006). 20 A.A. Kordyuk, S.V. Borisenko, M. Knupfer, and J. Fink, Phys. Rev. B 67, 064504 (2003); A.A. Kordyuk and S.V. Borisenko, Low Temp. Phys. 32, 298 (2006). 21 M. Mori, T. Tohyama, and S. Maekawa, Phys. Rev. B 66, 064502 (2002). 22 Yu Lan, Jihong Qin, and Shiping Feng, Phys. Rev. B 75, 134513 (2007). 23 C. Kim, P.J. White, Z.-X. Shen, T. Tohyama, Y. Shibata, S. Maekawa, B.O. Wells, Y.J. Kim, R.J. Birgeneau, and M.A. Kastner, Phys. Rev. Lett. 80, 4245 (1998). 24 O.K. Anderson, A.I. Liechtenstein, O. Jepsen, and F. Paulsen, J. Phys. Chem. Solids 56, 1573 (1995); A.I. Liechtenstein, O. Gunnarsson, O.K. Anderson, and R.M. Martin, Phys. Rev. B 54, 12505 (1996); S. Chakarvarty, A. Sudbo, P.W. Anderson, and S. Strong, Science 261, 337 (1993). 25 Shiping Feng, Jihong Qin, and Tianxing Ma, J. Phys. Con- dens. Matter 16, 343 (2004); Shiping Feng, Tianxing Ma, and Jihong Qin, Mod. Phys. Lett. B 17, 361 (2003). 26 S. Sorella, G.B. Martins, F. Becca, C. Gazza, L. Capriotti, A. Parola, and E. Dagotto, Phys. Rev. Lett. 88, 117002 (2002). 27 P.W. Anderson, Phys. Rev. Lett. 67, 2092 (1991); Science 288, 480 (2000). 28 P. Dai, H.A. Mook, R.D. Hunt, and F. Dog̃an, Phys. Rev. B 63, 54525 (2001); Ph. Bourges, B. Keimer, S. Pailhés, L.P. Regnault, Y. Sidis, and C. Ulrich, Physica C 424, 45 (2005). 29 M. Arai, T. Nishijima, Y. Endoh, T. Egami, S. Tajima, K. Tomimoto, Y. Shiohara, M. Takahashi, A. Garret, and S.M. Bennington, Phys. Rev. Lett. 83, 608 (1999); S.M. Hayden, H.A. Mook, P. Dai, T.G. 
Perring, and F. Doğan, Nature 429, 531 (2004); C. Stock, W.J. Buyers, R.A. Cowley, P.S. Clegg, R. Coldea, C.D. Frost, R. Liang, D. Peets, D. Bonn, W.N. Hardy, and R.J. Birgeneau, Phys. Rev. B 71, 24522 (2005). 30 K. Terashima, H. Matsui, D. Hashimoto, T. Sato, T. Takahashi, H. Ding, T. Yamamoto, and K. Kadowaki, Nature Phys. 2, 27 (2006). 31 G.M. Eliashberg, Sov. Phys. JETP 11, 696 (1960); D.J. Scalapino, J.R. Schrieffer, and J.W. Wilkins, Phys. Rev. 148, 263 (1966). http://arxiv.org/abs/cond-mat/0205385 http://arxiv.org/abs/cond-mat/0107002

ABSTRACT Within the framework of kinetic energy driven superconductivity, the electronic structure of bilayer cuprate superconductors in the superconducting state is studied. It is shown that the electron spectrum of bilayer cuprate superconductors is split into bonding and antibonding components by the bilayer splitting; the observed peak-dip-hump structure around the $[\pi,0]$ point is then mainly caused by this bilayer splitting, with the superconducting peak being related to the antibonding component and the hump being formed by the bonding component. The spectral weight increases with increasing doping concentration. In analogy to the normal state case, both the electron antibonding peak and the bonding hump have weak dispersions around the $[\pi,0]$ point. <|endoftext|><|startoftext|> Submitted to ApJ Letters (Revised Version including Referee’s Comments)

9.7 µm SILICATE ABSORPTION IN A DAMPED LYMAN-α ABSORBER AT Z = 0.52

Varsha P. Kulkarni1, Donald G. York2,3, Giovanni Vladilo4, Daniel E. Welty2

ABSTRACT We report a detection of the 9.7 µm silicate absorption feature in a damped Lyman-α (DLA) system at zabs = 0.524 toward AO0235+164, using the Infrared Spectrograph (IRS) onboard the Spitzer Space Telescope.
The feature shows a broad shallow profile over ≈ 8-12 µm in the absorber rest frame and appears to be > 15 σ significant in equivalent width. The feature is fit reasonably well by the silicate absorption profiles for laboratory amorphous olivine or diffuse Galactic interstellar clouds. To our knowledge, this is the first indication of 9.7 µm silicate absorption in a DLA. We discuss potential implications of this finding for the nature of the dust in quasar absorbers. Although the feature is relatively shallow (τ9.7 ≈ 0.08-0.09), it is ≈ 2 times deeper than expected from extrapolation of the τ9.7 vs. E(B − V) relation known for diffuse Galactic interstellar clouds. Further studies of the 9.7 µm silicate feature in quasar absorbers will open a new window on the dust in distant galaxies. Subject headings: Quasars: absorption lines–ISM:dust

1. INTRODUCTION

Damped Lyman-alpha (DLA) absorption systems in quasar spectra dominate the neutral gas content in galaxies and offer venues for studying the evolution of metals and dust in galaxies. Recent observations, however, suggest that the majority of DLAs have low metallicities at all redshifts studied (0 ≲ z ≲ 4), with the mean metallicity reaching at most ≈ 10-20% solar at the lowest redshifts (see, e.g., Prochaska et al. 2003; Kulkarni et al. 2005, 2007; Péroux et al. 2006; and references therein). These results appear to contradict the predictions of a near-solar global mean interstellar metallicity of galaxies at z ∼ 0 in most chemical evolution models based on the cosmic star formation history inferred from galaxy imaging surveys such as the Hubble Deep Field (HDF) (e.g., Madau et al. 1996). Furthermore, for a large fraction of the DLAs, the star formation rates (SFRs) inferred from emission-line imaging searches fall far below the global predictions (e.g., Kulkarni et al. 2006, and references therein).

A possible explanation of these puzzles is that the current DLA samples are biased due to dust selection effects, i.e.
that the more dusty and more metal-rich absorbers obscure the background quasars more, making them harder to observe (e.g., Fall & Pei 1993; Boissé et al. 1998; Vladilo & Péroux 2005). DLAs are known to have some dust, based on both the (generally mild) depletions of refractory elements and the (typically slight) reddening of the background quasars (e.g., Pei et al. 1991; Pettini et al. 1997; Kulkarni et al. 1997). Combining ∼ 800 quasar spectra from the Sloan Digital Sky Survey (SDSS), York et al. (2006b) found a small but significant amount of dust in absorbers at 1 < z < 2, with E(B − V) of 0.02-0.09 for 9 of their 27 sub-samples (see also Khare et al. 2007). York et al. (2006b) also showed that the extinction in the composite spectra is best fitted by a Small Magellanic Cloud (SMC) curve (with no 2175 Å bump). Some recent studies suggest that dusty DLAs could hide as much as 17% of the total metal content at z ∼ 2, and more at lower z (Bouché et al. 2005). To understand whether this is the case, and to understand the role of dust in quasar absorbers in general, it is essential to directly probe the basic properties of the dust.

1 Department of Physics and Astronomy, University of South Carolina, Columbia, SC 29208; E-mail: kulkarni@sc.edu
2 Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637
3 Also, Enrico Fermi Institute
4 INAF, Osservatorio Astronomico di Trieste, Trieste, Italy

Recently, a small number of very dusty quasar absorbers have been discovered, via various signatures of the dust in optical and UV observations: substantial reddening of the background quasars, large element depletions (e.g., for Cr, Fe), and/or a detectable 2175 Å bump (e.g., Junkkarinen et al. 2004; Wang et al. 2004). It is not yet clear, however, whether the dust in these systems is similar to that in the Milky Way, the SMC, or the Large Magellanic Cloud (LMC). The 2175 Å bump is generally, though not conclusively, attributed to carbonaceous grains.
The silicate component of the dust, believed to comprise ≈ 70% of the core mass of interstellar dust grains in the Milky Way (see, e.g., Draine 2003), has not yet been probed in quasar absorbers. A unique opportunity to study this important dust component is provided by the Spitzer IRS (Werner et al. 2004; Houck et al. 2004), which provides the spectral coverage, sensitivity, and resolution needed for the detection of the strongest of the silicate spectral features near 9.7 µm. The 9.7 µm feature, thought to arise in Si-O stretching vibrations, is seen in a wide range of Galactic and extragalactic environments (e.g., Whittet 1987 and references therein; Spoon et al. 2006; Imanishi et al. 2007). We have been carrying out an exploratory study of the silicate dust in quasar absorbers by searching for the 9.7 µm absorption feature with the Spitzer IRS. Here we report on the detection of the 9.7 µm feature in one of the systems studied, while the remaining three systems observed recently will be reported in a separate paper (Kulkarni et al. 2007b, in preparation).

http://arxiv.org/abs/0704.0826v2

2. OBSERVATIONS AND DATA ANALYSIS

The DLA at zabs = 0.524 (Junkkarinen et al. 2004) toward the blazar AO 0235+164 (zem = 0.94) offers an excellent venue for comparing dust in a distant galaxy with that in near-by galaxies. It has one of the largest H I column densities seen in DLAs (log NHI = 21.70) and shows 21-cm absorption (Roberts et al. 1976). It also shows X-ray absorption, consistent with a metallicity of 0.7 solar (Junkkarinen et al. 2004). Candidate absorber galaxies (much fainter than the blazar) within a few arcseconds from the blazar sightline have been detected (e.g., Smith et al. 1977; Yanny et al. 1989; Chun et al. 2006). This absorber is one of a very few DLAs that produce appreciable reddening [E(B − V) = 0.23 in the absorber rest frame] and show a strong, broad 2175 Å extinction bump (Junkkarinen et al. 2004).
Finally, this absorber is the only DLA with detections of several diffuse interstellar bands (Junkkarinen et al. 2004; York et al. 2006a). All of these data suggest that this absorber is very dusty and may contain molecular gas.

The observations were obtained with the Spitzer IRS on January 30, 2006 (UT) as GO program 20757 (PI V. P. Kulkarni). IRS modules Short-Low 1 (SL1) and Long-Low 2 (LL2) were used to cover 7.5-21.4 µm in the observed frame (4.9-14.1 µm in the DLA rest frame). The target was acquired with high-accuracy peakup using a near-by bright star. The IRS standard staring mode was used, with 2-pixel slit widths of 3.6″ for SL1 and 10.5″ for LL2. Integration times were 60 s × 8 cycles for SL1 and 120 s × 11 cycles for LL2. For each cycle, observations were performed at both nod positions A and B (offset by 1/3 the slit length), so the total integration times were 960 s and 2640 s, respectively, for SL1 and LL2.

The data were processed using the IRS S15.0 calibration pipeline (the latest version available at present), the Image Reduction and Analysis Facility (IRAF; see footnote 5), and the Interactive Data Language (IDL). As detailed below, the S15.0 pipeline yielded significant improvements for the reliable detection and measurement of weak, broad features in our spectra. The pipeline performs a number of standard processing steps to produce the basic calibrated data (BCD) files (see, e.g., the IRS Data Handbook at http://ssc.spitzer.caltech.edu/irs/dh). Subtraction of the sky (mostly zodiacal light) was performed by subtracting the coadded frames at nod position B from those at nod position A, and vice versa. The one-dimensional spectra were extracted from the two-dimensional images using the Spitzer IRS Custom Extraction (SPICE) software with the default extraction windows, and flux calibrated using the standard S15.0 flux calibration files.
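As a quick check of the exposure bookkeeping quoted above, the totals follow from ramp time × number of cycles × two nod positions. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative choices and are not part of any Spitzer software.

```python
def total_integration_s(ramp_s, cycles, nod_positions=2):
    """Total on-source time in seconds: ramp duration x cycles x nod positions."""
    return ramp_s * cycles * nod_positions


# Values quoted in the text: 60 s x 8 cycles (SL1), 120 s x 11 cycles (LL2),
# each observed at nod positions A and B.
sl1_total = total_integration_s(60, 8)
ll2_total = total_integration_s(120, 11)

print(f"SL1: {sl1_total} s, LL2: {ll2_total} s")
```

The two results should match the 960 s and 2640 s totals stated for SL1 and LL2.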
The spectra from the two nod positions were averaged together, and the corresponding flux uncertainties calculated using both measurement uncertainties and “sampling uncertainties” between the two nod positions.

The absolute flux levels in the different IRS modules were scaled to match the continuum levels in the overlapping regions, using the bonus segment available in the LL2 images. There was no mismatch between the SL1 and LL2 flux levels; we used the SL1 data for λ < 14.23 µm and LL2 data for λ > 14.23 µm. The bonus-segment data at λ > 20 µm had to be scaled up by 5.5% to match the LL2 data at λ < 20 µm. Fig. 1(a) shows the final merged spectrum of AO0235+164. The S/N achieved per unbinned pixel in the final spectrum, determined from rms fluctuations in the continuum regions, is ≈ 100. The error bars denote 1 σ uncertainties.

5 IRAF is distributed by the National Optical Astronomy Observatories, which are operated by the Association of Universities for Research in Astronomy, Inc. (AURA), under cooperative agreement with the National Science Foundation.

The dashed line in Fig. 1(a) shows an estimate of the power-law continuum of the quasar. This line joins the observed continuum fluxes at 5.6 and 7.1 µm in the absorber rest frame and is extrapolated to the remaining wavelength region. These wavelengths are chosen to be in regions free of any other potential emission or absorption features (e.g., Imanishi et al. 2007). In principle, significant 9.7 µm emission at the quasar redshift could affect continuum determination redward of the suspected silicate absorption feature from the DLA. However, (a) our spectrum does not extend that far to the red, (b) the power law provides a good fit to the continuum in our data, and (c) the 9.7 µm emission is not particularly strong in most quasars (e.g., Hao et al. 2007).

3. RESULTS

The spectrum shown in Fig.
1(a) exhibits a broad absorption feature between about 12.4 and 18.3 µm relative to the power law continuum. The flux decline from the continuum begins near the long wavelength end of SL1 and continues smoothly into the LL2 data. The broad feature is centered at 15.41 µm (10.11 µm in the DLA rest frame). The observed frame equivalent width is 0.31 µm, with a 1 σ uncertainty of 0.014-0.020 µm, including contributions from photon noise and continuum fitting uncertainties (Sembach & Savage 1992).

We have performed several checks of our data analysis to see whether the observed broad feature could be an artifact. Since the possible silicate feature is broad and shallow, extending from the long wavelength end of SL1 through most of LL2, flux calibration and continuum fitting are critical issues. In the S14 pipeline version of these data, the possible silicate feature was somewhat stronger than in the S15 version. These differences are due to a low-level non-linearity problem in the S14 pipeline, which produces a 4% tilt in LL2 spectra and a 5% mismatch at the SL1/LL2 boundary. This problem has been eliminated in the S15 pipeline, and we find no mismatch at the SL1/LL2 boundary in the S15 data.

The possible silicate feature does not show any visible signature of the “teardrop” feature known to exist near 14.1 µm in some SL1 data (see, e.g., the IRS data handbook). The beginning of the decline in flux at the long-wavelength end of SL1 matches smoothly with the flux at the short-wavelength end of LL2 (which does not suffer from the teardrop problem). Our results do not change much even if the SL1 data are truncated at 14 µm to avoid the region potentially affected by the teardrop (the region 14-14.23 µm is a small fraction of the whole feature stretching out to 18.3 µm in the observed frame). Inaccuracies in pointing (which can affect SL1 and LL2 fluxes at the ±1% level) also do not appear to be significant for our data.
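The rest-frame values and the significance quoted for the feature earlier in this section can be cross-checked with a few lines of Python. This is only an illustrative sketch: the names are hypothetical, the significance is taken simply as the ratio of the equivalent width to its 1 σ uncertainty, and the rest-frame equivalent width uses the standard 1/(1 + z) scaling, a value the text itself does not quote.

```python
Z_ABS = 0.524  # absorber redshift from the text


def to_rest_frame(value_obs_um, z=Z_ABS):
    """Convert an observed-frame wavelength or equivalent width (um) to the rest frame."""
    return value_obs_um / (1.0 + z)


# Feature centre and approximate extent, observed frame (values from the text).
center_rest = to_rest_frame(15.41)
extent_rest = (to_rest_frame(12.4), to_rest_frame(18.3))

# Observed-frame equivalent width and the quoted range of its 1-sigma uncertainty.
ew_obs = 0.31
sigma_range = (0.014, 0.020)
significance = tuple(ew_obs / s for s in sigma_range)

# Rest-frame equivalent width (assumption: standard 1/(1+z) scaling).
ew_rest = to_rest_frame(ew_obs)

print(f"centre: {center_rest:.2f} um, extent: {extent_rest[0]:.1f}-{extent_rest[1]:.1f} um")
print(f"significance: {min(significance):.1f}-{max(significance):.1f} sigma")
print(f"rest-frame equivalent width: {ew_rest:.2f} um")
```

Under these assumptions the centre falls at about 10.11 µm, the extent at roughly 8-12 µm, and even the larger uncertainty gives a significance above 15 σ, in line with the values stated in the text.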
Based on an examination of the spectral images and the pointing difference keywords in the data file headers, the telescope pointing was accurate to within 0.09-0.11″ for LL2 and within 0.22-0.29″ for SL1. Integrating a Gaussian intensity distribution from a point source with the Spitzer point spread function over the known slit dimensions (57″ × 3.6″ for SL1, 168″ × 10.5″ for LL2), we estimate that the effect of such an offset would be about 0.26% for SL1 and 0.05% for LL2, far too small to account for the observed feature.

We also compared our results with IRS spectra from the literature for quasars without strong absorption systems (e.g., Sturm et al. 2006; Hao et al. 2007), and did not find the broad absorption feature from our data in those other quasars. In fact, quasar spectra in general show no silicate absorption, but rather (generally relatively weak) silicate emission at the quasar emission redshift. We also compared our IRS data for AO0235+164 with those for other targets in our study. The feature seen in AO0235+164 is not seen at the same observed wavelength in the other objects, suggesting that it is not an instrumental artifact. [In fact, in Kulkarni et al. 2007b (in prep.), we will report the possible detection of redshifted broad 9.7 µm silicate absorption in other parts of the Spitzer spectral coverage toward other quasars.]

Given the results of the above tests and the fact that the DLA toward AO0235+164 is already known to be dusty (from detection of the 2175 Å bump and diffuse interstellar bands, and reddening of the background quasar), it seems very likely that the feature detected is the broad 9.7 µm silicate feature arising in the absorber galaxy.

4.
DISCUSSION

The suggested silicate feature in the DLA toward AO0235+164 is relatively shallow/weak compared to the silicate features typically observed in Galactic interstellar material (ISM) because of the modest reddening and lower amounts of dust in quasar absorbers than in the Milky Way. Indeed, the dust-to-gas ratio in the DLA toward AO0235+164 is estimated to be 0.19 times the Galactic value (Junkkarinen et al. 2004). On the other hand, the observed feature is stronger than expected from E(B − V) = 0.23 ± 0.01 for this absorber (Junkkarinen et al. 2004). In Galactic diffuse interstellar clouds, the peak optical depth in the 9.7 µm silicate feature (τ9.7) is observed to correlate with the reddening along the line of sight, with τ9.7 = AV/18.5 (e.g., Whittet 1987). Extrapolating this relation, and assuming RV = 3.1, one would expect τ9.7 ≈ 0.039 for the DLA in AO0235+164. Our observations, however, indicate τ9.7 ≈ 0.08 for this DLA, ∼ 2 times higher than expected from the relation for Galactic diffuse ISM. The dust in this absorber may thus be somewhat richer in silicates than typical Galactic dust. We note, however, that the silicate feature is also known to be stronger in the Galactic Center region, perhaps due to fewer carbon stars (and thus less carbonaceous dust) there (e.g., Roche & Aitken 1985). If future observations of other DLAs also reveal material richer in silicates, it might indicate that those DLAs probe denser regions near the centers of the respective galaxies.

The Galactic interstellar 9.7 µm feature is generally broad and relatively featureless, which is taken as an indication that interstellar silicates are largely amorphous. (Crystalline silicates would produce structure within the broad feature.)
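The expected optical depth quoted earlier in this section follows from AV = RV × E(B − V) combined with τ9.7 = AV/18.5. A minimal Python sketch of that arithmetic, with illustrative (hypothetical) function and variable names:

```python
def expected_tau_97(ebv, r_v=3.1, av_per_tau=18.5):
    """Expected peak 9.7 um optical depth from the Galactic diffuse-ISM relation.

    A_V = R_V * E(B-V), then tau_9.7 = A_V / 18.5 (relation quoted in the text).
    """
    a_v = r_v * ebv
    return a_v / av_per_tau


tau_expected = expected_tau_97(0.23)  # E(B-V) = 0.23 for this absorber
tau_observed = 0.08                   # approximate observed value from the text
ratio = tau_observed / tau_expected

print(f"expected tau_9.7 = {tau_expected:.3f}")
print(f"observed / expected = {ratio:.1f}")
```

With RV = 3.1 this gives an expected τ9.7 of about 0.039, so the observed value is roughly twice the extrapolated Galactic one, as stated in the text.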
In principle, silicate grains may be composed of a mixture of pyroxene-like [(MgxFe1−x)SiO3] and olivine-like [(MgxFe1−x)2SiO4] silicates, with the shape and central wavelength of the 9.7 µm absorption somewhat dependent on the exact composition (e.g., Kemper et al. 2004; Chiar & Tielens 2006). Fig. 1(b) shows a closer view of the data, normalized by the power law continuum shown in Fig. 1(a), and binned by a factor of 3. The dotted and short-dashed curves are fits based on silicate emissivities derived from observations of the M supergiant µ Cep and of the Orion Trapezium region (e.g., Roche & Aitken 1984; Hanner et al. 1995), which are taken to be representative of diffuse Galactic ISM and denser molecular material, respectively. The long-dashed and dot-dashed curves are fits based on the silicate absorption profile observed toward the Galactic Center Source GCS3, and on laboratory measurements for amorphous olivine (Spoon et al. 2006). The shape of the silicate profile observed toward AO 0235+164 is most similar to that of laboratory amorphous olivine, but the µ Cep and GCS3 templates also yield reasonable fits. The DLA silicate profile does not exhibit the redward extension seen for the Trapezium profile, suggesting that the DLA dust resembles dust in diffuse Galactic clouds more than that in molecular clouds. Using χ2 minimization for 8.0-13.3 µm in the DLA rest frame, the peak optical depth values τ9.7 for the laboratory olivine, GCS3, µ Cep, and Trapezium templates are 0.081 ± 0.018, 0.088 ± 0.020, 0.083 ± 0.018, and 0.071 ± 0.016, respectively, for the binned data (0.081 ± 0.020, 0.091 ± 0.023, 0.084 ± 0.021, and 0.069 ± 0.017, respectively, for the unbinned data). The error bars on τ9.7 correspond to optical depths that give reduced χ2 larger by 1.0 than the minimum values. The respective reduced χ2 values are 1.22, 1.32, 1.51, and 1.92 for the binned data (1.82, 2.08, 2.10, and 2.65 for the unbinned data).
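The fitting procedure described above can be illustrated with a small, self-contained Python sketch. It is not the code used for the paper: the minimizer (a simple grid search), the model form (normalized flux = exp(−τ × template), with the template scaled to unit peak), the Gaussian stand-in for a silicate template, and the synthetic noise-free data are all assumptions made for illustration. Only the 8.0-13.3 µm fitting range, the S/N of about 100, and the error criterion (reduced χ2 larger by 1.0 than the minimum) are taken from the text.

```python
import math


def chi2(tau, flux, sigma, profile):
    """Chi-square of the model exp(-tau * profile) against the normalized flux."""
    return sum(((f - math.exp(-tau * p)) / s) ** 2
               for f, s, p in zip(flux, sigma, profile))


def fit_peak_tau(flux, sigma, profile, tau_grid):
    """Grid-search the peak optical depth.

    Returns (best_tau, tau_lo, tau_hi, reduced_chi2_min), where [tau_lo, tau_hi]
    spans the grid values whose reduced chi-square lies within 1.0 of the minimum.
    """
    dof = len(flux) - 1  # one free parameter (tau)
    red = [chi2(t, flux, sigma, profile) / dof for t in tau_grid]
    best = min(range(len(tau_grid)), key=red.__getitem__)
    inside = [t for t, r in zip(tau_grid, red) if r <= red[best] + 1.0]
    return tau_grid[best], min(inside), max(inside), red[best]


# Synthetic demonstration only (NOT the paper's data or templates):
# a unit-peak Gaussian stands in for a silicate optical-depth template.
wave = [8.0 + 0.1 * i for i in range(54)]            # 8.0-13.3 um, rest frame
profile = [math.exp(-0.5 * ((w - 9.7) / 1.0) ** 2) for w in wave]
true_tau = 0.08
flux = [math.exp(-true_tau * p) for p in profile]    # noise-free normalized flux
sigma = [0.01] * len(wave)                           # assumed errors (S/N ~ 100)
tau_grid = [0.001 * i for i in range(301)]           # trial values 0.000-0.300

best_tau, tau_lo, tau_hi, red_min = fit_peak_tau(flux, sigma, profile, tau_grid)
print(f"tau = {best_tau:.3f} (range {tau_lo:.3f}-{tau_hi:.3f}), "
      f"reduced chi2 = {red_min:.3g}")
```

In practice the Gaussian would be replaced by each tabulated template (laboratory olivine, GCS3, µ Cep, Trapezium) interpolated onto the data wavelengths, and the fit repeated for the binned and unbinned spectra.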
It is interesting to note that the best-fit astronomical template is GCS3, consistent with the enhanced τ9.7/E(B − V) ratio seen both in the DLA and toward the Galactic center. While the minimum reduced χ2 values are greater than 1.0, they are similar to those found in other studies of the silicate absorption toward both Galactic and extragalactic sources (e.g., Hanner et al. 1995; Bowey et al. 1998; Roche et al. 2006, 2007). Indeed, we do not expect a perfect fit, since possible differences in dust grain size and chemical composition can alter the shape of the silicate feature, including the peak wavelength and the FWHM (Bowey et al. 1998 and references therein). Higher S/N and higher resolution data would be needed to shed further light on the specific types of silicates present in DLAs.

With a larger absorber sample, it would be possible to explore correlations between the strengths of the 9.7 µm silicate feature and the 2175 Å extinction bump (which is thought to be produced by a carbonaceous component of the dust). For example, it would be interesting to understand whether the relative amounts of silicate and carbonaceous dust vary with redshift or with the gas-phase abundances of C or Si. High-S/N observations of other possible features (e.g., the 18.5 µm silicate feature or the 3.0 µm H2O ice feature) would provide additional constraints on dust composition. (While those features are generally weaker than the 9.7 µm feature in the Milky Way, the 3.0 µm feature can be stronger than the 9.7 µm feature in highly reddened molecular sightlines.)

Our exploratory study has demonstrated the potential of the Spitzer IRS to study dust in quasar absorbers. It would be very interesting to obtain similar spectra for other dusty quasar absorbers. The E(B − V) values for dusty absorbers such as that reported here (0.23) are much larger than those for typical Mg II absorbers [E(B − V) of 0.002; York et al. 2006b].
These relatively large reddening values are comparable to some of those for Lyman-break galaxies (LBGs), which show E(B − V) up to 0.4 and a median E(B − V) of ≈ 0.15 at z ∼ 2 and z ∼ 3 (Shapley et al. 2001, 2005; Papovich et al. 2001). Such dusty absorbers appear to be chemically more evolved (Wild et al. 2006) than typical DLAs, and may possibly provide a link in terms of SFRs, masses, metallicities, and dust content between the primarily metal-poor and dust-poor general DLA population with low SFRs and the actively star-forming, metal-rich, and dust-rich LBGs. Further Spitzer IRS observations of more dusty quasar absorbers thus will help to open a new window on this interesting class of distant galaxies.

This work is based on observations made with the Spitzer Space Telescope, which is operated by the Jet Propulsion Laboratory, California Institute of Technology under a contract with NASA. Support for this work was provided by NASA through an award issued by JPL/Caltech. VPK acknowledges support from NSF grant AST-0607739 to University of South Carolina. DEW acknowledges support from NASA LTSA grant NAG5-11413 to the University of Chicago. We are grateful to the Spitzer Science Center staff for helpful advice on data analysis and to an anonymous referee for helpful comments.

Facilities: SST (IRS).

REFERENCES

Boissé, P., Le Brun, V., Bergeron, J., & Deharveng, J.-M. 1998, A&A, 333, 841 Bouché, N., Lehnert, M. D., & Péroux, C. 2005, MNRAS, 364, Bowey, J. E., Adamson, A. J., & Whittet, D. C. B. 1998, MNRAS, 298, 131 Chiar, J. E., & Tielens, A. G. G. M. 2006, ApJ, 637, 774 Chun, M. R. et al. 2006, AJ, 131, 686 Draine, B. T. 2003, ARAA, 41, 241 Fall, S. M., & Pei, Y. C. 1993, ApJ, 402, 479 Hao, L., Weedman, D. W., Spoon, H. W. W., Marshall, J. A., Levenson, N., Elitzur, M., & Houck, J. R. 2007, ApJ, 655, L77 Hanner, M. S., Brooke, T. Y., & Tokunaga, A. T. 1995, ApJ, 438, Houck, J. R. et al. 2004, ApJS, 154, 18 Imanishi, M., Dudley, C.
C., Maiolino, R., Maloney, P. R., Nakagawa, T., & Risaliti, G. 2007, ApJ, in press Junkkarinen, V. T., Cohen, R. D., Beaver, E. A., Burbidge, E. M., Lyons, R. W., & Madejski, G. 2004, ApJ, 614, 658 Kemper, F., Vriend, W. J., & Tielens, A. G. G. M. 2004, ApJ, 609, 826 (erratum 633, 534 [2005]) Khare, P., Kulkarni, V. P., Péroux, C., York, D. G., Lauroesch, J. T., & Meiring, J. D. 2007, A&A, 464, 487 Kulkarni, V. P., Fall, S. M. & Truran, J. W. 1997, ApJ, 484, L7 Kulkarni, V. P., Fall, S. M., Lauroesch, J. T., York, D. G., Welty, D. E., Khare, P., & Truran, J. W. 2005, ApJ, 618, 68 Kulkarni, V. P., Woodgate, B. E., York, D. G., Thatte, D. G., Meiring, J., Palunas, P., & Wassell, E. 2006, ApJ, 636, 30 Kulkarni, V. P., Khare, P., Péroux, C., York, D. G., Lauroesch, J. T., & Meiring, J. D. 2007, ApJ, in press (astro-ph/0608126) Madau, P., Ferguson, H. C., Dickinson, M. E., Giavalisco, M., Steidel, C. C., & Fruchter, A. 1996, MNRAS, 283, 1388 Papovich et al. 2001, ApJ, 559, 620 Pei, Y. C., Fall, S. M., & Bechtold, J. 1991, ApJ, 378, 6 Péroux, C., Kulkarni, V. P., Meiring, J., Ferlet, R., Khare, P., Lauroesch, J., Vladilo, G., & York, D. G. 2006, A&A 450, 53 Pettini, M. Smith, L. J., King, D. L., & Hunstead, R. W. 1997, ApJ, 486, 665 Prochaska, J. X., Gawiser, E., Wolfe, A. M., Cooke, J., & Gelino, D. 2003, ApJS, 147, 227 Roberts, M. S. et al. 1976, AJ, 81, 293 Roche, P. F., & Aitken, D. K. 1984, MNRAS, 208, 481 Roche, P. F., & Aitken, D. K. 1985, MNRAS, 215, 425 Roche, P. F., Packham, C., Aitken, D. K., & Mason, R. E. 2007, MNRAS, 375, 99 Roche, P. F., Packham, C., Telesco, C. M., Radomski, J. T., Alonso-Herroro, A., Aitken, D. K., Colina, L., & Perlman, E. 2006, MNRAS, 367, 1689 Sembach, K. R., & Savage, B. D. 1992, ApJS, 83, 147 Shapley, A. et al. 2001, ApJ, 562, 95 Shapley, A. E., Steidel, C. C., Erb, D. K., Reddy, N. A., Adelberger, K. L., Pettini, M., Barmby, P., & Huang, J. 2005, ApJ, 626, 698 Smith, H. E. et al. 1977, ApJ, 218, 611 Spoon, H. W. W. et al. 
2006, ApJ, 638, 759 Sturm, E., Hasinger, G., Lehmann, I., Mainieri, V., Genzel, R., Lehnert, M. D., Lutz, D., & Tacconi, L. J. 2006, ApJ, 642, 81 Vladilo, G., & Péroux, C. 2005, A&A, 444, 461 Wang, J., Hall, P. B., Ge, J., Li, A., & Schneider, D. P. 2004, ApJ, 609, 589 Werner, M. W. et al. 2004, ApJS, 154, 1 Whittet, D. C. B. 1987, QJRAS, 28, 303 Whittet, D. C. B., Bode, M. F., Longmore, A. J., Adamson, A. J., McFadzean, A. D., Aitken, D. K., & Roche, P. F. 1988, 233, 321 Wild, V., & Hewett, P. C. 2005, MNRAS, 361, L30 Wild, V., Hewett, P. C., & Pettini, M. 2006, MNRAS, 367, 211 Yanny, B., York, D. G., & Gallagher, J. S. 1989, ApJ, 338, 735 York, B. A., Ellison, S. L., Lawton, B., Churchill, C. W., Snow, T. P., Johnson, R. A., & Ryan, S. G. 2006a, ApJ, 647, L29 York, D. G. et al. 2006b, MNRAS, 367, 945

[Figure 1, two panels. Recoverable labels: Q0235+164, zabs = 0.524; spectral segments SL order 1, LL order 2, and LL bonus order; axes log λobserved and Rest Wavelength (µm); template legend τ9.7 = 0.083 (µ Cep), τ9.7 = 0.071 (Trapezium), τ9.7 = 0.088 (GCS3), τ9.7 = 0.081 (Lab Olivine).]

Fig. 1. (a) Left: Spitzer IRS spectrum of AO0235+164. The lower scale for the abscissa denotes the logarithm of the observed wavelength in µm; rest frame wavelengths at the absorber redshift are shown at the top. The error bars denote 1 σ flux uncertainties. The dashed line shows a power law estimate of the continuum. (b) Right: A closer look at the suggested silicate feature. The abscissa denotes the rest frame wavelength at the DLA redshift. The data points show the spectrum, normalized by the power law continuum and binned by a factor of 3. The error bars denote 1 σ uncertainties. The smooth curves show profiles for four templates of silicate optical depth, based on observations for three Galactic sightlines and laboratory measurements for amorphous olivine.
<|endoftext|><|startoftext|> Introduction The luminosity function (LF) is an observational tool used for analyzing the post-main sequence evolutionary phases of low-mass (≈ 0.5-0.8M⊙) metal-poor stars in Galactic globular clusters (GGC). Because of their age and richness, GGC typically contain hundreds of stars that have evolved off the main sequence. The numbers of stars in evolved phases are directly related to the evolutionary timescales and fuel consumed in each phase (Renzini & Fusi Pecci 1988), so that they present us with an opportunity to test this aspect of stellar evolution models. The results of the most stringent tests have been mixed. Repeated studies of the metal-poor cluster M30 (Bolte 1994; Bergbusch 1996; Guhathakurta et al. 1998; Sandquist et al. 1999) have found an excess number of red giant branch (RGB) stars relative to main sequence (MS) stars.
Stetson (1991) also uncovered an apparent excess of stars in a combined LF of the metal-poor clusters M68, NGC 6397, and M92. However, the LFs of more metal-rich clusters show no discrepancy (M5: Sandquist et al. 1996; M3: Rood et al. 1999; M12: Hargis, Sandquist, & Bolte 2004). In a survey of 18 clusters, Zoccali & Piotto (2000) found good agreement with model predictions with the possible exception of clusters at the high metallicity end.

In this paper we present BVI photometry of NGC 5466, a high galactic latitude globular cluster (l = 42.2° and b = 73.6°) located in the constellation of Boötes (α = 14h05m27.4s, δ = +28°32′04″, at a distance of R = 15.9 kpc; Harris 1996). NGC 5466 is a loose cluster (r_c = 1.′64) with extremely low metallicity ([Fe/H] = −2.22) and subject to little or no reddening (E(B − V) ≃ 0) (Harris 1996). In §2, we describe the process leading to the calibrated photometry, and compare with previous studies of the cluster. In §3, we compare the observed color-magnitude diagram and observed luminosity function with theoretical models, focusing on the relative number of stars on the lower RGB and around the MS turnoff. Finally, in §4, we present a new examination of the blue straggler population of NGC 5466.

2. Observations and Data Reduction

The data used in this study were obtained with the Kitt Peak National Observatory (KPNO) 0.9 m telescope (0.″68 pix^−1) on the nights of UT dates 1995 May 4, May 5, and May 9. A complete list of the image frames, exposure times, and observing conditions is given in Table 1. The images obtained on the three nights were processed using IRAF^1 tasks and packages. The reduction involved subtraction and trimming of the overscan region of all images, subtraction of a master bias frame from flats and object frames, and flat fielding of the object frames using images taken at twilight. Profile-fitting photometry was done using the DAOPHOTII/ALLSTAR programs (Stetson 1987).
We also reduced archival ground-based photometry of the cluster core taken with the High-Resolution Camera (HRCam) on the 3.6 m Canada-France-Hawaii Telescope (CFHT). The V and I images were taken on 1992 May 30 and 31 (observers J. Heasley and C. Christian), and have not previously been described in the literature. The CCD images had 1024 × 1024 pixels at 0.″13 per pixel, and excellent seeing (0.4-0.5 arcsec). The images were reduced using the archived bias and twilight flat frames, following a procedure similar to that for the KPNO data. This allowed us to get excellent photometry to 2 magnitudes below the turnoff in the cluster core. These images were used exclusively for blue straggler identification (see §4).

2.1. Calibration against Primary Standard Stars

The conditions at KPNO on 1995 May 9 were photometric, and Landolt standard star fields were observed at a range of air masses to determine photometric transformation coefficients. The standard values used for the calibration were chosen from the large compilation of Stetson (2000), which is set to be on the same photometric scale as the earlier Landolt (1992) values.

We conducted photometry on the standard stars and isolated cluster stars using multiple synthetic apertures. We then used the DAOGROW (Stetson 1990) program to construct growth curves to extrapolate measurements to a common aperture size.

^1 IRAF (Image Reduction and Analysis Facility) is distributed by the National Optical Astronomy Observatories, which are operated by the Association of Universities for Research in Astronomy, Inc., under contract with the National Science Foundation.
Using the CCDSTD program, the standard star transformation equations were found to be:

b = B + a_0 + (−0.069 ± 0.005)(B − V) + (0.255 ± 0.014)(X − 1.0)
v = V + b_0 + (0.027 ± 0.004)(B − V) + (0.172 ± 0.010)(X − 1.0)
i = I + c_0 + (−0.013 ± 0.005)(V − I) + (0.145 ± 0.018)(X − 1.0)

where X is the airmass, b, v, and i are instrumental magnitudes, B, V, and I are standard magnitudes, and a_0, b_0, and c_0 are the zero-point terms. These calibration equations are different from those used for our analyses of M10 (Pollard et al. 2005) and M12 (Hargis et al. 2004), which were observed on the same night, because our I-band exposures did not go as deep as the B and V exposures. As a result, the (B − V) color was a better choice for calibrating the V photometry down to our faintest observed stars.

The calibrated measurements for the standard stars are compared with catalogue values in Fig. 1. We note that there is slight evidence of a trend in the I-band residuals versus magnitude, which might indicate nonlinearity. This impression is caused by one observation of the PG1323-086 field. We did, however, have an additional observation of the same field on the same night, with the same exposure time, that does not show the same (small) trend. Because we do not have any reason to eliminate the frame and because its elimination has a minimal effect on the transformation coefficients, we have decided to retain the measurements from the image.

2.2. Calibration against Secondary Standard Stars

Aperture photometry for 165 cluster stars was used to calibrate the point-spread function (PSF) photometry for the cluster. These secondary standard stars were chosen based on relatively low measurement errors and location in relatively uncrowded regions of the cluster. They were chosen from the asymptotic giant branch (AGB), upper RGB, and horizontal branch (HB) of the cluster in order to cover the entire range of colors covered by cluster stars.
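As an illustration of how the B and V transformation equations of §2.1 can be applied in practice, the following minimal sketch inverts them to recover standard magnitudes from instrumental ones. The color and airmass coefficients are those quoted in the text; the zero points (`A0`, `B0`), the function names, and the sample magnitudes are hypothetical and chosen only for the example, since the zero-point values are not given here.

```python
# Color and airmass coefficients quoted in Section 2.1.
K_B, E_B = -0.069, 0.255   # b equation: (B - V) term, (X - 1) term
K_V, E_V = 0.027, 0.172    # v equation: (B - V) term, (X - 1) term


def to_instrumental(B, V, X, a0, b0):
    """Standard -> instrumental magnitudes, following the quoted equations."""
    color = B - V
    b = B + a0 + K_B * color + E_B * (X - 1.0)
    v = V + b0 + K_V * color + E_V * (X - 1.0)
    return b, v


def to_standard(b, v, X, a0, b0):
    """Instrumental -> standard magnitudes.

    Subtracting the v equation from the b equation gives a linear
    equation in (B - V); V then follows from the v equation.
    """
    color = ((b - v) - (a0 - b0) - (E_B - E_V) * (X - 1.0)) / (1.0 + K_B - K_V)
    V = v - b0 - K_V * color - E_V * (X - 1.0)
    return V + color, V


# Hypothetical zero points and a hypothetical star, for illustration only.
A0, B0 = 2.50, 2.30
b_inst, v_inst = to_instrumental(17.00, 16.50, 1.20, A0, B0)
B_std, V_std = to_standard(b_inst, v_inst, 1.20, A0, B0)
print(round(B_std, 6), round(V_std, 6))
```

The round trip recovers the input magnitudes, which is a quick consistency check on the algebra. The sketch ignores the quoted coefficient uncertainties.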
The PSF-fitting photometry for the three nights of data was combined and averaged after zero-point differences among the frames had been determined and corrected. The zero-point corrections to the standard system were determined after fixing the color-dependent terms at the values measured in the primary standard star calibration. (This was also done in our studies of M10 and M12.) In Fig. 2, it can be seen that this procedure does not introduce systematic color- or magnitude-dependent errors.

2.3. Comparison with Previous Studies

We compared our photometric data set to those of Jeon et al. (2004), Rosenberg et al. (2000), and Stetson (2000). The magnitude and color comparisons (BVI in this study versus VI in Stetson and Rosenberg et al., and BV in Jeon et al.) as functions of magnitude and color are shown in Figs. 3-5. Though our calibrated magnitudes are slightly brighter than those of Stetson (2000), the differences are small, and there is no color trend. The offsets compared to the Rosenberg et al. (2000) data are larger, but again there are no clear color trends. The offsets compared to the Jeon et al. (2004) data are also significant, but more notable are slight trends with color.

2.4. Calculation of the Luminosity Function

Artificial star tests were performed to empirically measure the precision of our photometry and to correct for incompleteness in the detection of stars. We followed the procedure described by Hargis et al. (2004) for the calculation of incompleteness corrections as a function of position and magnitude. The inputs used for producing the artificial star tests were the reduced B and V CCD frames, PSFs for each object frame, fiducial lines, and an estimate for the initial LF (Sandquist et al. 1996). Artificial stars were randomly placed in cells on a spatial grid and the entire grid was then shifted randomly from run to run in order to ensure the whole imaged field was tested (Piotto & Zoccali 1999).
Each star was placed in a consistent position relative to the cluster center on each image. If a detected star was found to coincide with the input position of an artificial star, it was added to the archive. The new images were reduced using the same procedure applied to the original data set. In this study, a total of 100,000 artificial stars from 50 separate runs were added. The number of artificial stars per trial was chosen so that the effects of crowding on the photometry were qualitatively unchanged.

The recovered artificial stars were used to calculate 1) median magnitude and color biases (δ_V and δ_{B−V}, where δ = median[output − input]), 2) median external error estimates (σ_ext(V) = median|δ_V − median(δ_V)|/0.6745, and σ_ext(B − V)), and 3) total recovery probabilities (F(V), which is the fraction of the stars that were recovered with any output magnitude) in bins according to projected radius and magnitude. The values for the above quantities are plotted in Figs. 6-8.

Finally, an initial estimate of the “true” LF and the error distribution, magnitude biases, and the total recovery probability (F) were used to compute the completeness fraction f (the ratio of the predicted number of stars to the actual number of observed stars). The completeness fraction results are shown in Figure 9. We then interpolated to compute f for the radial distance and magnitude of each detected star. For each observed star, f^−1 was added to the appropriate magnitude bin to determine the observed LF. (Note that the completeness fraction was set to 1.0 for stars brighter than the turnoff.) The observed LF along with the upper and lower 1σ error bars on log N are listed in Table 2.

3. Discussion

3.1. Reddening, Metallicity, Distance Modulus, and Age

Because NGC 5466 resides at high galactic latitude, it suffers little if any reddening. Though Schlegel et al.
(1998) found a reddening of E(B − V) = 0.02 from the maps of dust IR emission, we adopted E(B − V) = 0.0 (Rosenberg et al. 2000). For our purposes in this paper, the small difference is of little importance. Most of the comparisons below between observations and theory are relative, in which reference points (like the turnoff) are used to determine magnitude and/or color shifts. This has the benefit of minimizing the influence of uncertainties in reddening and distance modulus (see below).

As for abundances, there is only one high-resolution measurement for a cluster star, and it is for the anomalous Cepheid V19. McCarthy & Nemec (1997) find [Fe/H] = −1.92 ± 0.05, while Pritzl et al. (2005) find [Fe/H] = −2.05 using the same data. Typically quoted metallicity values include [Fe/H] = −2.17 (Zinn 1980) from photoelectric photometry of integrated light in selected filter bands, and [Fe/H] = −2.22, which was derived by Zinn & West (1984) (ultimately from low-resolution spectral scans by Searle & Zinn 1978). When converted to the widely-used metallicity scale of Carretta & Gratton (1997), this becomes [Fe/H] = −2.14. More work could certainly be done on the composition of NGC 5466 stars, but the evidence so far points to an abundance [Fe/H] ≲ −2.0. Though the range in the above quoted metallicity values is relatively large for a globular cluster, the exact value is not critical for our purposes since we will primarily be concerned with relative comparisons.

Our photometry does not extend faint enough to derive a new distance modulus from subdwarf fitting to the main sequence. Harris (1996) obtained (m − M)_V = 16.0 by calibrating the observed luminosity level of the horizontal branch with the relation M_V(HB) = 0.15[Fe/H] + 0.80 and adopting a reddening E(B − V) = 0.0 and a metallicity [Fe/H] = −2.22. Ferraro et al.
(1999) determined a distance modulus (m − M)_V = 16.16 from their zero-age HB estimate, assuming no reddening and metallicity on the Carretta & Gratton (1997) scale. We will consider distance moduli in this range.

Most previous estimates of NGC 5466’s age make it older than the recent determination of the age of the universe (13.7^{+0.13}_{−0.17} Gyr) obtained by the Wilkinson Microwave Anisotropy Probe (WMAP) team (Spergel et al. 2006). Recent homogeneous studies of GGCs indicate that NGC 5466 is coeval with clusters of similar metallicity (Salaris & Weiss 2002; Rosenberg et al. 2000). As a result, we will primarily consider ages in the range of 12 to 13 Gyr.

3.2. The Color-Magnitude Diagram

The color-magnitude diagrams (CMDs) for NGC 5466 show well-defined RGB, AGB, and HB sequences (see Fig. 10), and stars extending from the tip of the RGB down to V ≈ 22.5. Fiducial sequences for the MS and lower RGB were determined from the mode of the color distribution of stars in magnitude bins. The SGB position was determined using the magnitude distribution of the stars in color bins. The fiducial line for the rest of the RGB was obtained from the mean color of stars in magnitude bins. The fiducial points are listed in Table 3.

A comparison of the fiducial points derived for NGC 5466 with theoretical isochrones for a range of ages from the Teramo (Cassisi et al. 2004), Victoria-Regina (VandenBerg et al. 2006), and Yonsei-Yale (Demarque et al. 2004) groups is displayed in Figures 11 and 12. The isochrones have been shifted in color and magnitude (aligning the turnoff colors and the magnitudes of the main sequence point 0.05 mag redder than the turnoff) according to the technique of Vandenberg et al. (1990). This has the advantage of removing some of the systematic uncertainties associated with the color-T_eff transformations.
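The alignment step described above can be sketched as follows. This is only an illustrative reading of the procedure (match the turnoff colors, then match the magnitudes of the main-sequence point 0.05 mag redder than the turnoff), not the authors' code. The function names, the linear interpolation, and the toy sequence are assumptions made for the example; the sequences are assumed to be ordered from bright to faint.

```python
def turnoff_reference(colors, mags, offset=0.05):
    """Return (turnoff color, magnitude of the MS point `offset` mag redder).

    The turnoff is taken as the bluest point of the sequence. The reference
    magnitude is found by linear interpolation along the main sequence,
    i.e. among the points fainter than the turnoff.
    """
    i_to = min(range(len(colors)), key=colors.__getitem__)
    c_to = colors[i_to]
    target = c_to + offset
    for j in range(i_to, len(colors) - 1):
        c0, c1 = colors[j], colors[j + 1]
        if c0 <= target <= c1 and c1 > c0:
            frac = (target - c0) / (c1 - c0)
            return c_to, mags[j] + frac * (mags[j + 1] - mags[j])
    raise ValueError("sequence does not reach the reference color")


def shifts_to_match(isochrone, fiducial, offset=0.05):
    """Color and magnitude shifts that move the isochrone onto the fiducial."""
    c_iso, m_iso = turnoff_reference(isochrone[0], isochrone[1], offset)
    c_fid, m_fid = turnoff_reference(fiducial[0], fiducial[1], offset)
    return c_fid - c_iso, m_fid - m_iso


# Toy isochrone (color, absolute magnitude), bright to faint; hypothetical values.
iso_colors = [0.60, 0.45, 0.40, 0.42, 0.48, 0.60]
iso_mags = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
# Toy "fiducial": the same sequence displaced by known amounts.
fid_colors = [c + 0.02 for c in iso_colors]
fid_mags = [m + 16.0 for m in iso_mags]

d_color, d_mag = shifts_to_match((iso_colors, iso_mags), (fid_colors, fid_mags))
print(d_color, d_mag)
```

Because the toy fiducial is just a displaced copy of the toy isochrone, the recovered shifts should equal the imposed displacements to within rounding. Real fiducials are noisier and would likely need a smoother interpolation near the turnoff.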
[In our comparisons, we found that the Yonsei-Yale models could not match the fiducial line for any reasonable set of input parameters when the transformations of Lejeune et al. (1998) were used. Therefore, we only utilize models using the Green, Demarque, & King (1987) transformations. Even then, we could not find a match with the slope of the upper giant branch for reasonable metallicities ([Fe/H] ≲ −1.9).] On the whole, the shape of the fiducial matches the models well on the main sequence, subgiant branch, and lower giant branch. Neither the Teramo nor the Victoria-Regina models include element diffusion processes, while the Yonsei-Yale isochrones only include He diffusion. However, differences in T_eff-color transformations are likely to be the cause of some of the differences seen.

3.3. The Luminosity Function

The number of stars at a given luminosity in post-main-sequence phases is directly proportional to the lifetime spent at that luminosity. It is well known that the LF of the RGB probes the chemical stratification inside a star because the hydrogen abundance being sampled by the thin hydrogen-fusion shell affects the rate of evolution, and hence star counts on the RGB (Renzini & Fusi Pecci 1988). Setting aside the short pause at the RGB bump, RGB evolution accelerates in a very regular way that is ultimately related to the structure of the degenerate core. The relationship between core mass and radius forces the fusion shell to function at strictly controlled density and temperature conditions, which leads to a relationship between core mass and luminosity. This causes the LF to be particularly sensitive to certain physical details, which we discuss in §3.3.1 below.

Fig. 13 shows the observed luminosity function compared to theoretical LFs for the labeled values of metallicity, exponent of the initial mass function, and a range of age estimates for NGC 5466, assuming our preferred distance modulus (m − M)_V = 16.00.
The theoretical models were normalized to the observed LF at V ≈ 21.3 (sufficiently faint that stellar evolution effects are minimized). The models agree well with the observed LF, implying an age of approximately 12-13 Gyr.

3.3.1. Relative RGB and MS Numbers

Gallart et al. (2005) recently discussed theoretical luminosity functions calculated by different groups. One of the primary differences they noted was in the number of giant stars relative to main sequence stars. In order to show these differences in a parameter-independent way, we follow the method of Vandenberg et al. (1990). In Fig. 14, the theoretical LFs were shifted so that a point on the main sequence 0.05 mag redder than the turnoff color point matched the corresponding point on the cluster fiducial line. The reason for using the point (V_TO+0.05) rather than the MSTO itself is that the MS has a significant slope and curvature at this point, making it possible to accurately measure the point in both observational data and isochrones (Vandenberg et al. 1990). The theoretical models were normalized to the two bins in the observed LF on either side of the turnoff (V = 19.83 and 20.13). As can be seen, age-related differences between the theoretical LFs nearly disappear when this procedure is applied (see also Stetson 1991; Vandenberg et al. 1998). However, when different sets of models are compared, there are small differences in the number of RGB stars relative to MS stars, with the Victoria-Regina models predicting the smallest number of giants and the Yonsei-Yale models predicting the largest.

To quantify the differences, we computed the ratio of the number of stars on the lower giant branch to the number of stars near the main sequence. Pollard et al. (2005) introduced this ratio and showed that it is insensitive to age and heavy-element abundance.
For the main sequence population, we used star counts in the two bins on either side of the turnoff (19.682 < V < 20.282), and for the red giant branch we used the counts in the range 16.982 < V < 18.482. We derived model values from the same magnitude ranges relative to the VTO+0.05 point on the main sequence. Values are compared in Table 4. The error in the observed value is dominated by Poisson statistical scatter. The Yonsei-Yale models are in best agreement with the observations, the Victoria-Regina models are out of agreement by more than 2 σ, and the Teramo models are in between. It is worth examining the possible causes of this difference both because it may help improve the physics inputs for the models and because red giant stars are some of the largest contributors to the integrated light of old stellar populations. Gallart et al. (2005) tabulated most of the main physics inputs for the most widely-used model sets2. Earlier studies (Stetson 1991; Vandenberg et al. 1998) have shown that the LF-shifting method used above eliminates nearly all sensitivity to model input parameters like mass function, age, convective mixing length, and composition inputs with the exception of helium abundance, which we examine first. Older models (e.g. Fig. 9 of Ratcliff 1987, Fig. 7 of Stetson 1991, Fig. 3 of Vandenberg et al. 1998) seem to agree that an increase in initial helium abundance Y in non-diffusive models by 0.1 results in an increase in the relative number of stars on the RGB (more precisely, a reduction in the relative number of main sequence stars) by about 0.07-0.08 in logN . The Teramo models (Y = 0.245) predict about 12% more giant stars relative to main sequence stars compared to the Victoria-Regina models (Y = 0.235)3. This difference corresponds to a shift of 0.05 in logN , which is about an order of magnitude too large for the helium abundance difference. 
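The arithmetic behind the last comparison can be written out explicitly. The short sketch below uses only the numbers quoted above (a 12% excess of giants, helium abundances of 0.245 and 0.235, and a sensitivity of 0.07-0.08 in log N per 0.1 in Y); the function names are illustrative, and scaling the helium sensitivity linearly down to ΔY = 0.01 is an assumption of this example.

```python
import math


def dlogn_from_excess(fractional_excess):
    """Shift in log N that corresponds to a fractional excess of giants."""
    return math.log10(1.0 + fractional_excess)


def dlogn_from_helium(delta_y, per_0p1=(0.07, 0.08)):
    """Expected shift in log N for a helium difference delta_y,
    assuming the quoted 0.07-0.08 per 0.1 in Y scales linearly."""
    low, high = per_0p1
    return low * delta_y / 0.1, high * delta_y / 0.1


# Teramo (Y = 0.245) versus Victoria-Regina (Y = 0.235).
observed = dlogn_from_excess(0.12)
expected = dlogn_from_helium(0.245 - 0.235)

print(f"offset between model sets: {observed:.3f} in log N")
print(f"expected from helium:      {expected[0]:.3f} to {expected[1]:.3f}")
print(f"ratio: {observed / expected[1]:.1f} to {observed / expected[0]:.1f}")
```

Under these assumptions the offset between the two model sets is several times larger than the shift attributable to the helium difference, in line with the statement that helium abundance alone falls well short of explaining it.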
^2 Since the Gallart et al. (2005) review was published, the Teramo group found an error in the evolution scheme they used on the giant branch, which brings their models into better agreement with other groups. We use their updated models in the comparisons here.

^3 The Teramo models also assume a larger α-element enhancement ([α/Fe] = 0.4) than the Victoria-Regina models ([α/Fe] = 0.3), which would tend to reduce the number of giants relative to main sequence stars. However, because the relative number of RGB and MS stars is not sensitive to small differences in heavy element abundance, this difference is probably unimportant.

The Yonsei-Yale models have the lowest assumed helium abundance (Y = 0.23), but are the only set of the three that include helium diffusion. The inclusion of helium diffusion reduces the age derived from the turnoff of a globular cluster by about 10-15% (Proffitt & Vandenberg 1991; Straniero et al. 1997; VandenBerg et al. 2002), thanks to the inward motion of helium. According to theoretical models (e.g. Fig. 8 of Proffitt & Vandenberg 1991), diffusion has a small effect on the LFs (∼ a few times 10^−2 in log N, increasing for increasing age), but it does increase the number of giants relative to MS stars. He diffusion reduces the total core hydrogen fuel supply available to an MS star, but in itself this does not strongly modify the LF; it just changes the brightness of the turnoff. This magnitude change is eliminated in our LF shifting procedure. The chemical composition profile left in the star after it leaves the main sequence has a greater impact. Diffusion reduces the H abundance in the fusion regions, thereby decreasing the evolutionary timescale. According to the models of Proffitt & Michaud (1991), the changes to the He abundance profile are most considerable immediately below the surface convection zone, and just outside the nuclear fusion regions (where the composition gradient slows the inward settling of helium).
However, over most of the star, the changes in Y are limited to 0.01-0.02. Because the core portion of the composition profile is consumed on the subgiant branch, the evolution timescales for giant stars are only affected in a minor way, and the appearance of a deep convection zone on the lower giant branch wipes out most of the effects of diffusion for the upper giant branch evolution.

In spite of this, the Yonsei-Yale models have almost 23% more giants than the Victoria-Regina models, and more than 9% more giants than the Teramo models (relative to MS stars). The lower helium abundance in the Yonsei-Yale models compared to the other models should partially counteract what effects helium diffusion might have had on the RGB/MS ratio as well. Thus, it appears that neither He abundance nor He diffusion can completely explain the differences between the Yonsei-Yale models and the other groups.

We can ask whether the LFs show similar disagreements at other metallicities. Hargis et al. (2004) made comparisons between the Victoria-Regina and Yonsei-Yale theoretical LFs and observational LFs for the clusters M3 (Rood et al. 1999), M5 (Sandquist et al. 1996), M12, and M30. The overall impression from those comparisons was again that the Yonsei-Yale models (having He diffusion) predict more giant stars relative to main-sequence stars than do the Victoria-Regina models. In Fig. 15, we compare the LFs for these clusters with the Teramo models. The degree of agreement or disagreement can be quantified with number ratios of lower RGB and MSTO stars, similar to the ones we computed earlier for comparisons with NGC 5466. Our calculations are shown in Table 4. As can be seen, uncertainties in the metallicity scale have some effect on the comparisons with the observations. The Carretta & Gratton (1997) scale has higher [Fe/H] values than the Zinn & West (1984) scale, and thus results in lower number ratios. Fig.
16 shows the results of comparing the observed ratios with the models for different [Fe/H] scales. On the Zinn & West (1984) scale (right panels), the observed values seem to be in agreement to within about 1-1.5σ for most of the models, with the exception of the lowest metallicity clusters (M30 and NGC 5466) and the lowest helium abundance (Victoria-Regina) models. On the Carretta & Gratton (1997) scale, the Yonsei-Yale models have the best overall agreement, although the Teramo models only deviate noticeably for the lowest metallicity clusters. The Victoria-Regina models predict too few giants for all of the clusters.

The differences from model to model (as opposed to models versus observation) point toward deficiencies elsewhere in the physics or computational algorithms used in the stellar evolution codes.^4 The RGB LF is a robust prediction of the models because there is a strong core mass-luminosity relationship: the conditions in the hydrogen fusion shell of the giant are strongly dependent on the structure of the degenerate core and are almost independent of the details of the mass or structure of the envelope. As a result of this, we can focus on factors affecting core structure. (As a non-standard physics example, Vandenberg et al. 1998 describe the way in which core rotation relieves a giant star of some of the need to support itself by gas pressure, which reduces the core temperature and lengthens the evolutionary timescale.) Because model-to-model differences appear even on the faint end of the giant branch, we can set aside factors that only become important to the structure of the star near the tip of the giant branch [such as neutrino losses and conductive opacities; see Bjork & Chaboyer 2006, for example], even though there are significant theoretical uncertainties in these quantities.
Nuclear reaction rates in the fusion shell can also be neglected, partly because the uncertainties in the reactions appear to be relatively small (Adelberger et al. 1998), but also because small changes in the reaction rates require only tiny changes in the shell temperature to get the same energy production.

This leads us to examine the equation of state (EoS) in the core. Although the behavior of degenerate electrons is thought to be very well understood, their interactions with nuclei can have a measurable effect on the pressure. Particles of like charge tend to cluster together, which modifies the free energy of the gas and reduces the gas pressure for given density and temperature. Harpaz & Kovetz (1988) looked at the effects of the inclusion of Coulomb interactions on giant stars, and their results are corroborated by those of Cassisi et al. (2003). They found that for a given core mass the fusion shell temperature was higher when the Coulomb interactions were included, which leads to faster processing of hydrogen. Thus this is another example (like core rotation) where modification of the pressure support of the core affects the evolutionary timescales, which results in changes to the luminosity function.

^4 For a comparison of these physical inputs for the different theoretical groups, see Table 1 of Gallart et al. (2005).

The Coulomb corrections to the pressure become more important with increasing density for the core, but are small compared to the contribution of the degenerate electrons. All of the model sets we have considered here incorporate Coulomb interactions in some form. The Teramo group used the most sophisticated “EOS1” version of FreeEOS^5, which incorporates Coulomb corrections in a form that matches both the weak (Debye-Hückel) and strong (one-component plasma) Coulomb interaction limits, and which also includes (less importantly) electron exchange interactions.
The strong interaction limit is most relevant for giant star cores, as judged by the strong interaction parameter Γ = ζ²e²/(r_0 kT), where ζ is the rms nuclear charge and r_0 is the average internuclear distance. The Yonsei-Yale group used the OPAL EoS tables (Rogers et al. 1996), but falls back on the group’s older EoS (e.g. Guenther et al. 1992) for high densities and temperatures outside the OPAL tables. While the most recent OPAL tables probably contain the most complete physical description of the Coulomb effect, the OPAL tables they used (Y.-C. Kim, private communication) were computed prior to recent improvements to account for relativistic electrons (Rogers & Nayfonov 2002), and as a result do not extend beyond log ρ = 5.0. The Yale EoS at higher densities only includes the Coulomb effect in the weak Debye-Hückel limit, which is not the relevant limit for the highest densities in the core. The Victoria-Regina models also use a modified version of the EFF EoS (Eggleton et al. 1973), with a correction for Coulomb interactions in the weak Debye-Hückel limit (VandenBerg et al. 2000).

The differences in the implementation of the Coulomb effect may explain the fact that the Teramo models generally predict more giants (relative to the main sequence) than the Victoria-Regina models do. However, the smaller Coulomb corrections in the Yonsei-Yale models would tend to result in fewer giants than the Teramo models (although the effects of helium diffusion work in the opposite direction). So, we are unable to completely reconcile the differences in the luminosity functions from the three groups.

Obviously more detailed study is needed by all of the modelling groups to identify the causes of the differences, but such a study is beyond the scope of this paper. Still, we believe that helium diffusion and strong interaction Coulomb corrections are physical effects that should be considered first.
There is, for example, good evidence from helioseismology for helium diffusion in the Sun (Bahcall et al. 1995), despite the surface convection and meridional flow (e.g., Hathaway 1996). It is expected that helium diffusion should also act in globular cluster stars.

^5 FreeEOS is available at http://freeeos.sourceforge.net/, and the discussion of the implementation of the Coulomb effect can be found at http://freeeos.sourceforge.net/coulomb.pdf.

A detailed study of the effect of equation of state uncertainties has yet to be done [see, for example, Bjork & Chaboyer (2006) for a study of uncertainties in other physical inputs]. Use of FreeEOS would make a study of equation of state effects most stringent since it appears to be capable of modelling the most sophisticated tabular EoS (OPAL), while also having the flexibility to allow individual bits of the physics to be “turned off”.

As a final warning about the observations, we should remember the LF of the cluster M10. Pollard et al. (2005) found unusual variations in the numbers of RGB stars at different brightness levels in M10 (a virtual twin to M12). In particular there seemed to be a significant excess in the number of stars near the RGB bump in brightness, while the lower RGB appeared normal (compared to Victoria-Regina and Yonsei-Yale models). A similar excess may be present in the RGB LF of M13 (Cho et al. 2005). These kinds of variations cannot be explained by the “global” physics that should apply to all globular cluster stars. These anomalies point toward fluctuations in the stellar initial mass function or composition-dependent effects.

3.3.2. The RGB Bump

A second feature of the LF presented here is a noticeable RGB bump. Typically, the RGB bump appears as a peak in the differential LF and as a change of slope in the cumulative LF (CLF).
The bump provides a measure of the maximum depth reached by the outer convection zone during first dredge-up since it is the result of a pause in the star’s evolution when the shell fusion source begins consuming material of constant, lower helium content (Fusi Pecci et al. 1990). Unfortunately, the number of stars occupying the bump gets smaller and the luminosity of the bump increases as the metallicity of the cluster decreases, making the bump harder to detect in metal-poor clusters. A small peak appears in our differential LF at V ≈ 16.2, and a significant (2.5-3.5σ) change in slope of the cumulative LF occurs at the same position, as shown in Fig. 17.

The relative brightness of the bump can be measured by comparing to the V magnitude of the HB at the level of the RR Lyrae instability strip, ΔV_HB = V_bump − V_HB (Ferraro et al. 1999). This indicator is a function of the total metallicity and the age of the cluster: an increase in metallicity and/or a decrease in age are accompanied by a decrease in luminosity of the bump (Ferraro & Montegriffo 2000). We find V_bump = 16.20 ± 0.05 mag, and V_HB = 16.52 ± 0.11 (from interpolation between the average magnitudes of non-variable HB stars at the blue and red ends of the RR Lyrae instability strip), giving ΔV_HB = −0.32 ± 0.12. In the compilation of Ferraro et al. (1999), a zero-age HB reference point was calculated using the relation V_ZAHB = V_HB + 0.106[Fe/H]^2 + 0.236[Fe/H] + 0.193. Ferraro et al. found V_ZAHB = 16.62 ± 0.10, which is consistent with the value obtained here (V_ZAHB = 16.65 ± 0.11). Our value of ΔV_ZAHB = −0.45 ± 0.12 is considerably smaller in magnitude than tabulated values for other clusters with similar metallicities (M68: −0.60 ± 0.07; M92: −0.65 ± 0.12; M15: −0.65 ± 0.09). In a separate compilation, Zoccali et al. (1999) measured a smaller value, ΔV_ZAHB = −0.45 ± 0.11, for M15, in better agreement with the value for NGC 5466. (The difference is primarily because Zoccali et al.
measured the bump position to be 0.16 mag fainter than Ferraro et al.) Clearly there is still some need for more precise comparisons of bumps in metal-poor clusters with theory.

We believe, however, that the brightness of the bump should ultimately be judged using hydrogen-fusing stars as references because it avoids any effects of the poorly-understood physical processes (such as the helium flash and/or mass loss) associated with the creation of an HB star. Using the cluster LF (as seen in Figure 14), we again find that the observed RGB bump is fainter than model values by at least 0.3 mag when the models are shifted to match the cluster’s main sequence. Hargis et al. (2004) did similar comparisons between theoretical models and luminosity functions for M3, M5, M12, and M30 taken from the literature. With the exception of M30 (because the bump could not be identified), the position of the bump relative to the turnoff region agreed well with theory. NGC 5466 is thus the most metal-poor cluster for which this comparison has been done. So at present we are left with the question of whether this might result from the cluster’s low metallicity, or whether we have been the unfortunate victims of a fluctuation in the number of giant stars in this low-mass cluster. We therefore encourage the examination of the luminosity function of more massive metal-poor clusters to settle the question.

3.3.3. Mass Function Exponent

Two recent papers (Belokurov et al. 2006; Grillmair & Johnson 2006) reported the discovery of tidal streams covering many degrees around NGC 5466 in Sloan Digital Sky Survey images. Gnedin & Ostriker (1997) examined the Milky Way globular clusters and found NGC 5466 to be a cluster that has probably been strongly affected by disk shocking in the recent past. In our examination of the LF, we found a rather low value for the global main sequence mass function slope.
The mass function for a cluster is typically expressed as a power law ($N(M) \propto M^{-(1+x)}$), where the slope x = 1.35 is the standard Salpeter value. Generally, the present-day power-law index x varies from cluster to cluster. The mass function slopes that best fit the upper LF of NGC 5466 around and above the MSTO have $-1 \lesssim x \lesssim 0$. (Note that the best fit slope does depend on the models being used: the Yonsei-Yale models require a flatter slope than the Victoria-Regina and Teramo models.) Such a shallow mass function slope is unusual for a metal-poor cluster. For example, the cluster NGC 5053 has similar metallicity, position relative to the galactic center and plane, and density structure, but still has a steep x ∼ 2 mass function (Fahlman et al. 1991). Djorgovski et al. (1993) found that mass function slopes in the range of 0.5 ≤ M ≤ 0.8 are influenced primarily by the cluster’s position in the galaxy, and to some extent by cluster metallicity. Based on both of those factors, NGC 5466 should have a larger mass function slope (x ∼ 3 according to the multivariate formula in Djorgovski et al. 1993). Like those of other halo clusters, NGC 5466’s orbit is quite eccentric and will take the cluster more than 30 kpc away from the Galactic center (Dinescu et al. 1999), but it is currently on its way back into the halo after two relatively recent passes through the Galactic disk. Recent losses of low-mass stars may explain the recent identification of strong tidal tails near the cluster by Belokurov et al. (2006).

4. Blue Stragglers

Blue stragglers (BSSs) were first identified by Sandage (1953) in the globular cluster M3. These stars are more massive than the turnoff mass and occupy the space in the CMD just bluer and brighter than the MSTO. Blue stragglers are found in clusters, and relatively more frequently in lower-luminosity clusters (Ferraro et al. 1993; Preston & Sneden 2000; Piotto et al. 2004; Sandquist 2005).
Of the various models proposed for the formation mechanism of BSSs, the “collision” theory (involving strong gravitational interactions between previously unassociated single or binary stars) and the “mass-transfer” theory (in which the more massive star in a binary evolves and during its expansion transfers mass to its companion) are the strongest possibilities. There is a continuing interest in the study of BSSs because they may provide insight into the recent dynamical history of a cluster.

In order to identify BSSs over the entire observed area of the cluster out to a radius of 11.6′, we used photometry from three datasets. In the core of the cluster we used the CFHT data presented here for the first time. Outside of the CFHT field, we used the BV photometry of Jeon et al. (2004), which covered a field 11.6′ on a side centered on the cluster. Finally, we used our KPNO data for the least crowded outskirts of the cluster. Even though NGC 5466 is a very low density cluster, the spatial resolution of the KPNO data was such that blends of stars would have resulted in the spurious identification of 10 objects as BSSs in the intermediate portion of the field.

Nemec & Harris (1987) identified 48 BSS candidates in NGC 5466, all located between 0.1′ and 5.6′ from the cluster center. In spite of the low cluster density, we find new BSS candidates at all radii and luminosity levels, and find that several of their candidates are spurious. According to the CFHT photometry, the object with ID 45 from Nemec & Harris is a blend of several fainter stars, none of which is a BSS. In addition, IDs 6 and 24 were identified as blends of stars using the Jeon et al. (2004) dataset. Our BSS list is presented in Table 5. The list includes the nine known SX Phoenicis stars (IDs 27, 29, 35, 38, 39, and 49, Nemec & Mateo 1990; IDs 3 (SX Phe 3), 36 (SX Phe 2), and 50 (SX Phe 1), Jeon et al. 2004) and the three eclipsing binaries (IDs 19, 30, and 31; Mateo et al. 1990).
New straggler candidates were given ID numbers that build upon the Nemec & Harris (1987) list. Figure 18 shows the CMDs used to select the 94 identified BSSs in each of the three datasets.

In order to use the BSSs to constrain cluster dynamics, we compared the normalized cumulative radial distribution of the BSSs to the population of the giant branch, as shown in Fig. 19. Nemec & Harris (1987) found a 97.8% probability that their BSS sample was more centrally concentrated than red giants in the same magnitude range. Because their photometry was taken in conditions of poorer seeing (compared to our CFHT photometry and that of Jeon et al.), their samples are likely to be somewhat incomplete near the cluster center. Our RGB sample contains 350 stars with magnitude V < 18.5. Kolmogorov-Smirnov (K-S) probability tests were used to test the hypothesis that both populations were drawn from the same parent population. The K-S probability that the BSSs are drawn from the same radial distribution as the RGB population is 8.1 × 10^−7, and 2.4 × 10^−4 for the comparison with the HB population. By contrast there is a probability of 0.27 that the RGB and HB samples are drawn from the same population. The concentration of BSSs toward the cluster center as compared to the RGB and HB samples is consistent with the idea that they are more massive than individual RGB stars, and as a result have been segregated by mass deeper within the cluster potential well.

Piotto et al. (2004) recently used samples of stragglers from the cores of 56 globular clusters to show that there was a strong correlation between $F^{\rm BSS}_{\rm HB} = N_{\rm BSS}/N_{\rm HB}$ and integrated cluster V magnitude, and a weaker anti-correlation with central density. Sandquist (2005) examined an additional 13 low-luminosity globular clusters using similar selection criteria. NGC 5466 is an interesting cluster in relation to these samples because it has an integrated luminosity that puts it at the faint end of the Piotto et al.
sample ($M_{V_t} = -6.96$; Harris 1996), but a central density that is nearly an order of magnitude lower than that of any of their clusters [$\log(\rho_0/(L_{V,\odot}\,{\rm pc}^{-3})) = 0.88$] and comparable to those of clusters in the Sandquist sample.

To put NGC 5466 in the context of the Piotto et al. and Sandquist samples, we selected a subset of our BSSs that satisfied the selection criteria in those studies (brighter than the MSTO, and bluer than the MSTO by 0.05 in B − V color). From mode fitting to the turnoff region in the Jeon et al. (2004) and CFHT data, we find $(B-V)_{\rm TO} = 0.367$ and $(V-I)_{\rm TO} = 0.511$. A color offset of 0.05 in B − V corresponds to an offset of about 0.075 in V − I (VandenBerg & Clem 2003) for NGC 5466’s metallicity. We find that 75 BSSs are brighter than the cluster turnoff ($V_{\rm TO} = 19.99 \pm 0.05$) and more than 0.05 mag bluer than the cluster turnoff in B − V. We have identified 97 HB stars in our datasets for NGC 5466, which gives a specific frequency $F^{\rm BSS}_{\rm HB} = 0.77 \pm 0.12$ (with the error estimate from Poisson statistics).

When compared to the Piotto et al. values (see Fig. 20), NGC 5466 falls within the general trend versus $M_{V_t}$ despite the cluster’s low central density. On the other hand, NGC 5466 has a lower $F^{\rm BSS}_{\rm HB}$ value than other clusters of similar central density (but lower total luminosity). As discussed by Sandquist (2005), this provides additional evidence that the plateau in $F^{\rm BSS}_{\rm HB}$ seen for clusters with $\log \rho_0 \lesssim 2.5$ is a result of the correlation between cluster integrated magnitude and central density. The lowest luminosity clusters in the Sandquist sample (E3 and Palomar 13) have central densities comparable to that of NGC 5466, but BSS frequencies that are several times higher. Another moderate-luminosity, low-density cluster (NGC 5053; Hiner et al., in preparation) similar to NGC 5466 shows a comparably low straggler frequency. BSSs produced via purely collisional means are not likely to show this kind of behavior. More likely is the scenario proposed by Davies et al.
(2004) in which binary stars that would normally produce BSSs are destroyed earlier in the cluster’s history. More direct observational support for that hypothesis is needed, though: for example, a detailed study of the variation of the binary star fractions as a function of integrated cluster magnitude.

Fig. 21 shows the number and frequency $R_{\rm BSS} = (N_{\rm BSS}/N^{\rm tot}_{\rm BSS})/(L_{\rm sample}/L^{\rm tot}_{\rm sample})$ of BSSs relative to the integrated V-band flux (derived from a King model profile) as a function of radius. Both frequencies increase toward the cluster center, and neither shows signs of rising toward larger radii. As recent studies of denser clusters show (M3: Ferraro et al. 1993; M55: Zaggia et al. 1997; 47 Tuc: Ferraro et al. 2004; NGC 6752: Sabbi et al. 2004), the BSS frequency generally decreases at intermediate radii and rises again at larger distances. However, the cluster Palomar 13 (Clark et al. 2004), which has a central density similar to NGC 5466, shows no sign of an increase in straggler frequency at large distance.

In more massive clusters, the minimum in the BSS frequency is reached approximately where the timescale for dynamical friction equals the age of the cluster (Warren et al. 2006). A similar calculation for the current structure of NGC 5466 indicates that this occurs at about 270′′ (about 2.8 $r_c$). This appears to be in the outer reaches of the core straggler distribution. Because it is likely that NGC 5466 has lost a significant fraction of its mass, we expect that the current density structure of the cluster has not existed throughout its history and that NGC 5466 might have been able to dynamically relax stragglers to its core from larger distances earlier in its history. This may be showing that the global BSS population differs significantly between low-density/low-mass clusters and high-density/high-mass clusters.
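The specific frequency quoted earlier in this section (75 selected BSSs and 97 HB stars) can be checked with simple counting statistics. The sketch below is only an illustration and assumes that the quoted Poisson error is obtained by adding the fractional sqrt(N) uncertainties of the two counts in quadrature; the function name `specific_frequency` is hypothetical and not taken from the analysis itself.

```python
import math


def specific_frequency(n_bss, n_hb):
    """Return (F, sigma_F) for F = N_BSS / N_HB.

    Assumption: both counts are independent Poisson variables, so the
    fractional error on the ratio is sqrt(1/N_BSS + 1/N_HB).
    """
    ratio = n_bss / n_hb
    fractional_error = math.sqrt(1.0 / n_bss + 1.0 / n_hb)
    return ratio, ratio * fractional_error


# Counts quoted in the text: 75 BSSs meeting the selection criteria, 97 HB stars.
f_bss, sigma_f = specific_frequency(75, 97)
print(f"F = {f_bss:.2f} +/- {sigma_f:.2f}")  # prints "F = 0.77 +/- 0.12"
```

Under this assumption the result matches the value of 0.77 ± 0.12 given in the text.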
A lack of stragglers at large distance may be a signature of large-scale tidal stripping of the cluster, which would remove both stragglers that would normally have formed in primordial binaries in the outer reaches of the cluster and ones that formed in the core but were given velocity kicks into orbits that would take them into the outer reaches. The case against a rise at large radius in Palomar 13 is stronger because the cluster has been surveyed out to 19 core radii, while in NGC 5466, we have only surveyed out to about 10 core radii (or 7.5 half-mass radii). Still, NGC 5466 probably should have an even more concentrated distribution of stragglers if its current density structure has existed for most of its history. In more massive clusters, the secondary rise in straggler frequency is observed between 8 and 10 $r_c$. Unfortunately, further study of the stragglers in NGC 5466 will probably be complicated by the strong tidal tails observed in the cluster.

5. Conclusions

Examinations of the luminosity functions of globular clusters continue to produce interesting tests of astrophysics. In this study, we found that NGC 5466 has a luminosity function that is in better overall agreement with theoretical models than the anomalous cluster M30, which has a similar low metallicity. In addition, we found that the relative numbers of red giant and main sequence stars may produce a fairly sensitive test of the physics near the core of red giants, specifically helium diffusion and Coulomb interactions. However, we are not yet able to fully explain the differences between sets of theoretical models.

Recent discoveries of large tidal tails associated with NGC 5466 suggest that this cluster has been strongly disrupted by interactions with the Galaxy. Our measured flat ($-1 \lesssim x \lesssim 0$) main-sequence luminosity function is unusual for a low-metallicity halo cluster. It is, however, consistent with the emerging picture of mass-segregation followed by tidal stripping.
We have thoroughly re-examined the blue straggler population in the cluster, and detected a total of 94. The radial distribution of stragglers is clearly more centrally concentrated than the RGB and HB populations. The frequency of blue stragglers in the cluster is relatively low, consistent with the observed anti-correlation between frequency and cluster luminosity, in spite of the cluster’s very low central density.

We would like to thank the anonymous referee for helpful comments on the manuscript, Y. Jeon for providing us with an electronic copy of his photometric dataset, Y.-C. Kim for information on the Yonsei-Yale isochrones, and S. Cassisi for providing us with access to the Teramo set of models. This work has been funded through grants AST 00-98696 and 05-07785 from the National Science Foundation to E.L.S. and M.B.

REFERENCES

Adelberger, E. G., et al. 1998, Rev. Mod. Phys., 70, 1265
Bahcall, J. N., Pinsonneault, M. H., & Wasserburg, G. J. 1995, Rev. Mod. Phys., 67, 781
Behr, B. B. 2003, ApJS, 149, 67
Belokurov, V., Evans, N. W., Irwin, M. J., Hewett, P. C., & Wilkinson, M. I. 2006, ApJ, 637, L29
Bergbusch, P. A. 1996, AJ, 112, 1061
Bjork, S. R., & Chaboyer, B. 2006, ApJ, 641, 1102
Bolte, M. 1994, ApJ, 431, 223
Bono, G., Cassisi, S., Zoccali, M., & Piotto, G. 2001, ApJ, 546, L109
Burles, S., Nollett, K. M., & Turner, M. S. 2001, ApJ, 552, L1
Carretta, E., & Gratton, R. G. 1997, A&AS, 121, 95
Cassisi, S., Salaris, M., Castelli, F., & Pietrinferni, A. 2004, ApJ, 616, 498
Cassisi, S., Salaris, M., & Irwin, A. W. 2003, ApJ, 588, 862
Cho, D.-H., Lee, S.-G., Jeon, Y.-B., & Sim, K. J. 2005, AJ, 129, 1922
Clark, L. L., Sandquist, E. L., & Bolte, M. 2004, AJ, 128, 3019
D’Antona, F., Bellazzini, M., Caloi, V., Pecci, F. F., Galleti, S., & Rood, R. T. 2005, ApJ, 631, 868
Davies, M. B., Piotto, G., & de Angeli, F. 2004, MNRAS, 349, 129
degl’Innocenti, S., Weiss, A., & Leone, L. 1997, A&A, 319, 487
Demarque, P., Woo, J.-H., Kim, Y.-C., & Yi, S. K. 2004, ApJS, 155, 667
Dinescu, D. I., Girard, T. M., & van Altena, W. F. 1999, AJ, 117, 1792
Djorgovski, S., Piotto, G., & Capaccioli, M. 1993, AJ, 105, 2148
Eggleton, P. P., Faulkner, J., & Flannery, B. P. 1973, A&A, 23, 325
Fahlman, G. G., Richer, H. B., & Nemec, J. 1991, ApJ, 380, 124
Ferraro, F. R., Beccari, G., Rood, R. T., Bellazzini, M., Sills, A., & Sabbi, E. 2004, ApJ, 603, 127
Ferraro, F. R., Messineo, M., Fusi Pecci, F., de Palo, M. A., Straniero, O., Chieffi, A., & Limongi, M. 1999, AJ, 118, 1738
Ferraro, F. R., & Montegriffo, P. 2000, AJ, 119, 1282
Ferraro, F. R., Paltrinieri, B., & Cacciari, C. 1999, Mem. Soc. Astron. Italiana, 70, 599
Ferraro, F. R., Fusi Pecci, F., Cacciari, C., Corsi, C., Buonanno, R., Fahlman, G. G., & Richer, H. B. 1993, AJ, 106, 2324
Fusi Pecci, F., Ferraro, F. R., Crocker, D. A., Rood, R. T., & Buonanno, R. 1990, A&A, 238, 95
Gallart, C., Zoccali, M., & Aparicio, A. 2005, ARA&A, 43, 387
Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
Grillmair, C. J., & Johnson, R. 2006, ApJ, 639, L17
Guenther, D. B., Demarque, P., Kim, Y.-C., & Pinsonneault, M. H. 1992, ApJ, 387, 372
Guhathakurta, P., Webster, Z. T., Yanny, B., Schneider, D. P., & Bahcall, J. N. 1998, AJ, 116, 1757
Hargis, J. R., Sandquist, E. L., & Bolte, M. 2004, ApJ, 608, 243
Harpaz, A., & Kovetz, A. 1988, ApJ, 331, 898
Harris, W. E. 1996, AJ, 112, 1487
Hathaway, D. H. 1996, ApJ, 460, 1027
Jeon, Y., Lee, M. G., Kim, S., & Lee, H. 2004, AJ, 128, 287
Johnson, J. A., & Bolte, M. 1998, AJ, 115, 693
Kim, Y., Demarque, P., Yi, S. K., & Alexander, D. R. 2002, ApJS, 143, 499
Landolt, A. U. 1992, AJ, 104, 340
Lejeune, T., Cuisinier, F., & Buser, R. 1998, A&AS, 130, 65
Mateo, M., Harris, H. C., Nemec, J., & Olszewski, E. W. 1990, AJ, 100, 469
McCarthy, J. K., & Nemec, J. M. 1997, ApJ, 482, 203
Nemec, J. M., & Harris, H. C. 1987, ApJ, 316, 172
Nemec, J., & Mateo, M. 1990, ASP Conf. Ser. 11: Confrontation Between Stellar Pulsation and Evolution, 11, 64
Norris, J. E. 2004, ApJ, 612, L25
Olive, K. A., & Skillman, E. D. 2004, ApJ, 617, 29
Piotto, G., De Angeli, F., King, I. R., Djorgovski, S. G., Bono, G., Cassisi, S., Meylan, G., Recio-Blanco, A., Rich, R. M., & Davies, M. B. 2004, ApJ, 604, 109
Piotto, G., & Zoccali, M. 1999, A&A, 345, 485
Pollard, D. L., Sandquist, E. L., Hargis, J. R., & Bolte, M. 2005, ApJ, 628, 729
Preston, G. W., & Sneden, C. 2000, AJ, 120, 1014
Pritzl, B. J., Venn, K. A., & Irwin, M. 2005, AJ, 130, 2140
Proffitt, C. R., & Michaud, G. 1991, ApJ, 371, 584
Proffitt, C. R., & Vandenberg, D. A. 1991, ApJS, 77, 473
Pryor, C., McClure, R. D., Fletcher, J. M., & Hesser, J. E. 1991, AJ, 102, 1026
Ratcliff, S. J. 1987, ApJ, 318, 196
Renzini, A., & Fusi Pecci, F. 1988, ARA&A, 26, 199
Rogers, F. J., & Nayfonov, A. 2002, ApJ, 576, 1064
Rogers, F. J., Swenson, F. J., & Iglesias, C. A. 1996, ApJ, 456, 902
Rood, R. T., et al. 1999, ApJ, 523, 752
Rosenberg, A., Aparicio, A., Saviane, I., & Piotto, G. 2000, A&AS, 145, 451
Sabbi, E., Ferraro, F. R., Sills, A., & Rood, R. T. 2004, ApJ, 617, 1296
Salaris, M., Riello, M., Cassisi, S., & Piotto, G. 2004, A&A, 420, 911
Salaris, M., & Weiss, A. 2002, A&A, 388, 492
Sandage, A. R. 1953, AJ, 58, 61
Sandquist, E. L., Bolte, M., Langer, G. E., Hesser, J. E., & Mendes de Oliveira, C. 1999, ApJ, 158, 262
Sandquist, E. L., Bolte, M., Stetson, P. B., & Hesser, J. E. 1996, ApJ, 470, 910
Sandquist, E. L. 2005, ApJ, 635, 73
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Searle, L., & Zinn, R. 1978, ApJ, 225, 357
Spergel, D. N., et al. 2006, submitted
Stetson, P. B. 2000, PASP, 112, 925
Stetson, P. B. 1991, ASP Conf. Ser. 13: The Formation and Evolution of Star Clusters, 13,
Stetson, P. B. 1990, PASP, 102, 932
Stetson, P. B. 1987, PASP, 99, 191
Straniero, O., Chieffi, A., & Limongi, M. 1997, ApJ, 490, 425
VandenBerg, D. A., Bergbusch, P. A., & Dowler, P. D. 2006, ApJS, 162, 375
Vandenberg, D. A., Bolte, M., & Stetson, P. B. 1990, AJ, 100, 445
VandenBerg, D. A., & Clem, J. L. 2003, AJ, 126, 778
Vandenberg, D. A., Larson, A. M., & de Propris, R. 1998, PASP, 110, 98
VandenBerg, D. A., Richard, O., Michaud, G., & Richer, J. 2002, ApJ, 571, 487
VandenBerg, D. A., Swenson, F. J., Rogers, F. J., Iglesias, C. A., & Alexander, D. R. 2000, ApJ, 532, 430
Warren, S. R., Sandquist, E. L., & Bolte, M. 2006, ApJ, 648, 1026
Zaggia, S. R., Piotto, G., & Capaccioli, M. 1997, A&A, 327, 1004
Zoccali, M., Cassisi, S., Piotto, G., Bono, G., & Salaris, M. 1999, ApJ, 518, L49
Zoccali, M., & Piotto, G. 2000, A&A, 358, 943
Zinn, R., & West, M. J. 1984, ApJS, 55, 45
Zinn, R. 1980, ApJ, 241, 602

Fig. 1. Photometric residuals (in the sense of this study minus those of Landolt 1992 and Stetson 2000) of primary standard stars. The median residuals are listed in the panels with the semi-interquartile range (half the magnitude difference between the 25% and 75% points in the ordered list of residuals) given in parentheses.

Fig. 2. Photometric residuals (in the sense of the final PSF photometry minus standard aperture photometry values) of secondary standard stars.

Fig. 3. Residuals (in the sense of this study minus Stetson 2000) from the star-by-star comparison. The median residuals are listed in the panels with the semi-interquartile range (see Fig. 1) given in parentheses.

Fig. 4. Residuals (in the sense of this study minus Rosenberg et al. 2000) from the star-by-star comparison. The median residuals and the plots versus color have been restricted to brighter stars (V < 20 and I < 19) to make the comparisons clearer. The numbers in parentheses are the semi-interquartile ranges (see Fig. 1).

Fig. 5. Residuals (in the sense of this study minus Jeon et al. 2004) from the star-by-star comparison. The median residuals and the plots versus color have been restricted to brighter stars (B < 20 and V < 19.5) to make the comparisons clearer.
The numbers in parentheses are the semi-interquartile ranges (see Fig. 1).

Fig. 6. External V magnitude errors σext(V) as a function of radius and magnitude determined from artificial star tests, with exponential fits shown by the solid lines.

Fig. 7. Magnitude biases δ(V) determined from artificial star tests as a function of radius and magnitude.

Fig. 8. Total recovery probability F(V) determined from artificial star tests as a function of radius and magnitude.

Fig. 9. Completeness fraction f(V) determined from artificial star tests as a function of radius and magnitude.

Fig. 10. Color-magnitude diagrams for all stars measured in the KPNO and CFHT images. The BV fiducial (Table 3) is also plotted in the left panel.

Fig. 11. Comparison of the observed fiducial sequence of NGC 5466 with the isochrones of the Teramo, Victoria-Regina, and Yonsei-Yale groups. The isochrones have been shifted horizontally so that the turnoff colors align, and shifted vertically to align the main sequence point 0.05 mag redder than the turnoff. On the giant branch, the ages increase from the reddest to the bluest isochrone.

Fig. 12. Same as Fig. 11, except for more metal-rich models.

Fig. 13. Comparison of the observed V-band LF of NGC 5466 with theoretical models of the Yonsei-Yale, Teramo, and Victoria-Regina groups assuming (m−M)V = 16.

Fig. 14. Comparison of the observed V-band LF of NGC 5466 with theoretical models of the Victoria-Regina, Yonsei-Yale, and Teramo groups using magnitude shifts that bring the main sequence point 0.05 mag redder than the turnoff into alignment. The models have been normalized to the two bins on either side of the turnoff (V = 19.94).

Fig. 15. Comparison of the observed LFs of M3, M5, M12, and M30 with theoretical models of the Teramo group using magnitude shifts that bring the main sequence point 0.05 mag redder than the turnoff into alignment. The models have been normalized to bins on either side of the turnoff.

Fig. 16. Fractional difference between the observed RGB-MS number ratios (for M5, M12, M3, M30, and NGC 5466, from left to right) and the theoretical predictions from the Yonsei-Yale, Teramo, and Victoria-Regina models. The left panels use the Carretta & Gratton (1997) metallicity scale, and the right panels use the Zinn & West (1984) scale. The sense is (observed − theoretical) / observed.

Fig. 17. The cumulative luminosity function for bright RGB stars derived from the photometry of Jeon et al. (2004; 286 stars) and from the KPNO dataset (338 stars) presented here. Dotted lines show fits to the data for stars above and below the position of the apparent bump.

Fig. 18. Blue straggler selection for NGC 5466. The stars plotted in each panel show the entire sample used for the selection: stars from Jeon et al. (2004) in the middle panel are only those stars outside the CFHT field, and KPNO stars in the right panel are only those outside the Jeon et al. field. Open squares show stragglers identified by Nemec & Harris (1987), and open circles are new candidates.

Fig. 19. Normalized cumulative radial distributions for RGB stars (dashed line), HB stars (dotted line), and BSSs (solid line).

Fig. 20. Relative frequencies of blue stragglers as a function of cluster absolute magnitude and central density. The solid square is NGC 5466, the open squares are globular clusters from Sandquist (2005), and all other points are from Piotto et al. (2004). Open circles are post-core-collapse clusters. In the left panel, symbols represent clusters in different ranges of central density from the Piotto et al. sample: log ρ0 < 2.8: open triangles; 2.8 < log ρ0 < 3.6: filled triangles; 3.6 < log ρ0 < 4.4: stars; log ρ0 > 4.4: filled circles.

Fig. 21. Frequency of BSSs relative to the integrated V-band flux of detected cluster stars (top panel) and specific frequency of blue stragglers as a function of radius (bottom panel).

Table 1. Photometric Observation Log for NGC 5466

UT Date       Filters   N       Exposure Time (s)   Airmass
1995 May 4    B,V       1,1     60                  1.01,1.12
1995 May 4    B         2       300                 1.03,1.03
1995 May 4    B,V       2,1     600                 1.0,1.11,1.0
1995 May 5    B,I       2,2     300                 1.03,1.01,1.02,1.01
1995 May 9    B,V,I     1,1,1   60                  1.12,1.13,1.14

Table 2. V-Band Luminosity Function

V        log N     σ_high   σ_low
13.532    0.5005   0.1761   0.3010
13.832   −0.1015   0.3010   1.0000
14.132   −0.1015   0.3010   1.0000
14.432    0.5975   0.1605   0.2575
14.732    0.8985   0.1193   0.1651
15.032    0.6767   0.1487   0.2279
15.332    0.6767   0.1487   0.2279
15.632    0.5976   0.1606   0.2575
15.932    0.8986   0.1193   0.1651
16.232    1.2410   0.0839   0.1041
16.532    1.0127   0.1063   0.1411
16.832    1.0451   0.1029   0.1351
17.132    1.3306   0.0764   0.0928
17.432    1.4432   0.0678   0.0804
17.732    1.4777   0.0728   0.0876
18.032    1.6136   0.0630   0.0738
18.332    1.7575   0.0540   0.0617
18.632    1.7577   0.0540   0.0617
18.932    1.9624   0.0433   0.0481
19.232    2.2928   0.0301   0.0324
19.532    2.5143   0.0236   0.0249
19.832    2.6996   0.0192   0.0201
20.132    2.7738   0.0178   0.0185
20.432    2.8830   0.0159   0.0165
20.732    2.9327   0.0151   0.0157
21.032    3.0036   0.0142   0.0146
21.332    3.0496   0.0137   0.0142
21.632    3.1179   0.0132   0.0136

Table 3.
Fiducial sequence for NGC 5466

V        B − V   N(a)
22.198   0.578   518
21.999   0.564   598
21.798   0.549   724
21.596   0.532   710
21.398   0.498   673
21.200   0.470   670
21.004   0.459   653
20.802   0.450   614
20.600   0.431   568
20.400   0.418   546
20.199   0.404   470
20.001   0.395   453
19.805   0.396   382
19.599   0.403   273
19.400   0.436   235
19.206   0.495   155
19.005   0.543    84
18.823   0.586    70
18.588   0.607    43
18.401   0.616    52
18.204   0.621    40
17.989   0.636    34
17.820   0.647    31
17.586   0.654    23
17.403   0.672    20
17.194   0.677    19
16.988   0.693    11
16.784   0.722    10
16.613   0.725     6
16.400   0.752    12
16.199   0.771    16
16.066   0.779     5
15.825   0.809     7
15.602   0.827     4
15.407   0.857     6
15.111   0.917     1
15.006   0.930     5
14.786   0.946     2
14.590   0.986    11
14.438   1.044     2

(a) Number of stars used to determine fiducial point.

Table 4. RGB-MSTO Number Ratios

Source(a)   TO Sample             RGB Sample            Y       N_RGB/N_MSTO    [Fe/H]
NGC 5466    19.682 < V < 20.282   16.982 < V < 18.482   ...     0.162 ± 0.013   ...
VR          ...                   ...                   0.235   0.132           −2.22
VR          ...                   ...                   0.235   0.130           −2.14
T           ...                   ...                   0.245   0.148           −2.22
T           ...                   ...                   0.245   0.146           −2.14
YY          ...                   ...                   0.230   0.162           −2.22
YY          ...                   ...                   0.230   0.160           −2.14
M3          18.80 < V < 19.40     16.40 < V < 18.00     ...     0.168 ± 0.008   ...
VR          ...                   ...                   0.235   0.170           −1.66
VR          ...                   ...                   0.235   0.158           −1.34
T           ...                   ...                   0.246   0.181           −1.66
T           ...                   ...                   0.248   0.170           −1.34
YY          ...                   ...                   0.230   0.185           −1.66
YY          ...                   ...                   0.230   0.162           −1.34
M5          19.13 < B < 19.73     16.33 < B < 17.93     ...     0.110 ± 0.006   ...
VR          ...                   ...                   0.235   0.106           −1.40
VR          ...                   ...                   0.235   0.097           −1.11
T           ...                   ...                   0.248   0.114           −1.40
T           ...                   ...                   0.251   0.106           −1.11
YY          ...                   ...                   0.230   0.119           −1.40
YY          ...                   ...                   0.230   0.105           −1.11
M12         18.14 < V < 18.74     15.59 < V < 17.24     ...     0.158 ± 0.011   ...
VR          ...                   ...                   0.235   0.144           −1.40
VR          ...                   ...                   0.235   0.118           −1.14
T           ...                   ...                   0.248   0.155           −1.40
T           ...                   ...                   0.251   0.144           −1.14
YY          ...                   ...                   0.230   0.150           −1.40
YY          ...                   ...                   0.230   0.137           −1.14
M30         18.33 < V < 18.93     15.78 < V < 17.43     ...     0.214 ± 0.017   ...
VR          ...                   ...                   0.235   0.174           −2.13
VR          ...                   ...                   0.235   0.153           −1.91
T           ...                   ...                   0.245   0.173           −2.13
T           ...                   ...                   0.246   0.165           −1.91
YY          ...                   ...                   0.230   0.194           −2.13
YY          ...                   ...                   0.230   0.180           −1.91

(a) VR: Victoria-Regina models (no diffusion); YY: Yonsei-Yale models (He diffusion); T: Teramo models (no diffusion). All models are for an age of 12 Gyr.

Table 5.
Selected Star Populations in NGC 5466

Blue Stragglers

ID   ∆α(′′)    ∆δ(′′)   B         σB       V         σV       I         σI       Alternate ID   Ref.(a)   Notes
1    137.74    32.86    19.1477   ...      19.0038   ...      ...       ...      605            J
1    137.74    32.86    19.1446   0.0122   19.0182   0.0280   18.7606   0.0400   809            K
2    −12.18    −5.60    ...       ...      18.7591   0.0116   18.5972   0.0061   10313          C
2    −12.18    −5.60    18.8461   0.0147   18.7209   0.0258   18.6245   0.0504   2176           K
3    −8.68     14.99    ...       ...      19.1710   0.0100   18.9595   0.0059   9339           C         SX Phe (3)
3    −8.68     14.99    19.4373   0.0155   19.2441   0.0296   19.1073   0.0563   2129           K
4    −77.31    83.56    19.2057   ...      19.0546   ...      ...       ...      646            J
4    −77.31    83.56    19.2266   0.0123   19.0548   0.0256   18.8465   0.0460   2880           K
5    −81.70    65.06    18.5511   ...      18.2697   ...      ...       ...      389            J
5    −81.70    65.06    18.5632   0.0121   18.2740   0.0243   17.8309   0.0255   2895           K
7    −132.96   −3.18    18.8010   ...      18.7046   ...      ...       ...      504            J
7    −132.96   −3.18    18.8482   0.0113   18.7379   0.0217   18.5538   0.0386   3289           K
8    −144.85   3.07     18.8436   ...      18.6640   ...      ...       ...      498            J
8    −144.85   3.07     18.8726   0.0120   18.6345   0.0234   18.3367   0.0298   3358           K
9    −90.90    83.38    19.1757   ...      19.0256   ...      ...       ...      626            J
9    −90.90    83.38    19.1385   0.0144   18.9711   0.0257   18.7508   0.0387   2973           K
10   −44.94    96.04    19.3627   ...      19.1882   ...      ...       ...      751            J
10   −44.94    96.04    19.4145   0.0139   19.2190   0.0261   18.9077   0.0494   2520           K

(a) Photometry Sources: K: KPNO data from this paper, C: CFHT data from this paper, J: Jeon et al. (2004)

Note. The complete version of this table is in the electronic edition of the Journal. The printed edition contains only a sample.

Introduction; Observations and Data Reduction; Calibration against Primary Standard Stars; Calibration against Secondary Standard Stars; Comparison with Previous Studies; Calculation of the Luminosity Function; Discussion; Reddening, Metallicity, Distance Modulus, and Age; The Color-Magnitude Diagram; The Luminosity Function; Relative RGB and MS Numbers; The RGB Bump; Mass Function Exponent; Blue Stragglers; Conclusions

ABSTRACT

We present wide-field BVI photometry for about 11,500 stars in the low-metallicity cluster NGC 5466. We have detected the red giant branch bump for the first time, although it is at least 0.2 mag fainter than expected relative to the turnoff.
The number of red giants (relative to main sequence turnoff stars) is in excellent agreement with stellar models from the Yonsei-Yale and Teramo groups, and slightly high compared to Victoria-Regina models. This adds to evidence that an abnormally large ratio of red giant to main-sequence stars is not correlated with cluster metallicity. We discuss theoretical predictions from different research groups and find that the inclusion or exclusion of helium diffusion and strong limit Coulomb interactions may be partly responsible. We also examine indicators of dynamical history: the mass function exponent and the blue straggler frequency. NGC 5466 has a very shallow mass function, consistent with large mass loss and recently-discovered tidal tails. The blue straggler sample is significantly more centrally concentrated than the HB or RGB stars. We see no evidence of an upturn in the blue straggler frequency at large distances from the center. Dynamical friction timescales indicate that the stragglers should be more concentrated if the cluster's present density structure has existed for most of its history. NGC 5466 also has an unusually low central density compared to clusters of similar luminosity. In spite of this, it has a specific frequency of blue stragglers that puts it right on the frequency versus cluster M_V relation observed for other clusters.
<|endoftext|><|startoftext|> Quark-Antiquark and Diquark Condensates in Vacuum in a 3D Two-Flavor Gross-Neveu Model∗

Zhou Bang-Rong
College of Physical Sciences, Graduate School of the Chinese Academy of Sciences, Beijing 100049, China and CCAST (World Laboratory), P.O.Box 8730, Beijing 100080, China
(Dated:)

An effective potential analysis indicates that, in a 3D two-flavor Gross-Neveu model in vacuum, the ground state depends on whether the ratio $G_S/H_P$ is smaller or larger than the critical value 2/3, where $G_S$ and $H_P$ are the coupling constants of the scalar quark-antiquark channel and the pseudoscalar diquark channel, respectively. The system has a ground state with pure diquark condensates in the first case and with pure quark-antiquark condensates in the second, but never one in which the two forms of condensates coexist. The similarities and differences in the interplay between the quark-antiquark and the diquark condensates in vacuum in the 2D, 3D and 4D two-flavor four-fermion interaction models are summarized.

PACS numbers: 12.38.Aw; 12.38.Lg; 12.10.Dm; 11.15.Pg
Keywords: 3D Gross-Neveu model, quark-antiquark and diquark condensates, effective potential

I. INTRODUCTION

It has been shown by the effective potential approach that in a two-flavor 4D Nambu-Jona-Lasinio (NJL) model [1], even when the temperature T = 0 and the quark chemical potential µ = 0, i.e. in vacuum, there can be mutual competition between the quark-antiquark condensates and the diquark condensates [2]. A similar situation also emerges in a 2D two-flavor Gross-Neveu (GN) model [3], apart from some differences in the details of the results [4]. An interesting question is whether such mutual competition between the two forms of condensates is a general characteristic of this kind of two-flavor four-fermion interaction model. To answer this question, building on the research on the 4D NJL model and the 2D GN model, we examine a 3D two-flavor GN model in a similar way.
The results will certainly deepen our understanding of the features of the four-fermion interaction models. We will use the effective potential in the mean field approximation, which is equivalent to the leading order of the 1/N expansion. It has been shown that a 3D GN model is renormalizable in the 1/N expansion [5].

II. MODEL AND ITS SYMMETRIES

The Lagrangian of the model is expressed by

\[ \mathcal{L} = \bar{q}\, i\gamma^{\mu}\partial_{\mu} q + G_S\left[(\bar{q}q)^2 + (\bar{q}\vec{\tau}q)^2\right] + H_P \sum_{A=2,5,7} (\bar{q}\,\tau_2\lambda_A\, q^C)(\bar{q}^C \tau_2\lambda_A\, q). \tag{1} \]

All the notation used in Eq. (1) is the same as in the 2D GN model given in Ref. [4], except that the dimension of space-time is changed from 2 to 3 and the coupling constant $H_S$ of the scalar diquark interaction channel is replaced by the coupling constant $H_P$ of the pseudoscalar diquark interaction channel. Now the matrices $\gamma^{\mu}$ ($\mu = 0, 1, 2$) and the charge conjugate matrix C are taken to be 2 × 2 ones and have the explicit forms , γ1 = , γ2 =

It is emphasized that, in the 3D case, no "$\gamma_5$" matrix can be defined, hence the third term on the right-hand side of Eq. (1) is the only possible color-anti-triplet diquark interaction channel which could lead to Lorentz-invariant diquark condensates, where we note that the matrix $C\tau_2\lambda_A$ is antisymmetric. Without "$\gamma_5$", the Lagrangian (1) has no chiral symmetry. Apart from this, it is not difficult to verify that the symmetries of $\mathcal{L}$ include:

1. continuous flavor and color symmetries $SU_f(2) \otimes SU_c(3) \otimes U_f(1)$;
2. discrete symmetry R: $q \to -q$;
3. parity P: $q(t, \vec{x}) \to \gamma^0 q(t, -\vec{x})$ and $q^C(t, \vec{x}) \to -\gamma^0 q^C(t, -\vec{x})$;
4. time reversal T: $q(t, \vec{x}) \to \gamma^2 q(-t, \vec{x})$ and $q^C(t, \vec{x}) \to -\gamma^2 q^C(-t, \vec{x})$;
5. charge conjugation C: $q \leftrightarrow q^C$;
6. special parity $P_1$: $q(t, x^1, x^2) \to \gamma^1 q(t, -x^1, x^2)$ and $q^C(t, x^1, x^2) \to -\gamma^1 q^C(t, -x^1, x^2)$;
7. special parity $P_2$: $q(t, x^1, x^2) \to \gamma^2 q(t, x^1, -x^2)$ and $q^C(t, x^1, x^2) \to -\gamma^2 q^C(t, x^1, -x^2)$.

∗The project supported by the National Natural Science Foundation of China under Grant No.10475113.
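If the quark-antiquark condensates $\langle\bar{q}q\rangle$ are formed, then the time reversal T and the special parities $P_1$ and $P_2$ will be spontaneously broken [6]. If the diquark condensates $\langle\bar{q}^C\tau_2\lambda_2 q\rangle$ are formed, then the color symmetry $SU_c(3)$ will be spontaneously broken down to $SU_c(2)$ and the flavor number $U_f(1)$ will be spontaneously broken, but a "rotated" electric charge $U(1)$ and a "rotated" quark number $U'_q(1)$ remain unbroken [7]. In addition, the parity P will be spontaneously broken, though all the other discrete symmetries survive. This implies that the diquark condensate $\langle\bar{q}^C\tau_2\lambda_2 q\rangle$ is a pseudoscalar. In this paper we will not discuss the Goldstone bosons induced by the breakdown of the continuous symmetries and will concentrate on the interplay between the above two forms of condensates.

http://arxiv.org/abs/0704.0829v2

III. EFFECTIVE POTENTIAL IN MEAN FIELD APPROXIMATION

Define the order parameters in the 3D GN model by

\[ \sigma = -2G_S\langle\bar{q}q\rangle \quad\text{and}\quad \Delta = -2H_P\langle\bar{q}^C\tau_2\lambda_2 q\rangle, \tag{3} \]

then in the mean field approximation the Lagrangian (1) can be rewritten as

\[ \mathcal{L} = \bar{\Psi}(x)S^{-1}(x)\Psi(x) - \frac{\sigma^2}{4G_S} - \frac{|\Delta|^2}{4H_P}, \tag{4} \]

where

\[ \Psi(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} q(x) \\ q^C(x) \end{pmatrix} \quad\text{and}\quad \bar{\Psi}(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} \bar{q}(x) & \bar{q}^C(x) \end{pmatrix} \]

are the expressions of the quark fields in the Nambu-Gorkov basis [8]. In momentum space, the inverse propagator $S^{-1}(p)$ for the quark fields may be expressed as

\[ S^{-1}(p) = \begin{pmatrix} \not{p} - \sigma & -\tau_2\lambda_2\Delta \\ -\tau_2\lambda_2\Delta^{*} & \not{p} - \sigma \end{pmatrix}, \qquad \not{p} = \gamma^{\mu}p_{\mu}. \tag{5} \]

The effective potential corresponding to $\mathcal{L}$ given by Eq. (4) becomes

\[ V(\sigma, |\Delta|) = \frac{\sigma^2}{4G_S} + \frac{|\Delta|^2}{4H_P} + \frac{i}{2}\int\frac{d^3p}{(2\pi)^3}\, \mathrm{Tr}\ln S^{-1}(p)S_0(p). \tag{6} \]

Similar to the case of the 2D GN model [4], the calculations of Tr for the (red, green) and blue color degrees of freedom can be made separately, so Eq. (6) reduces to

\[ V(\sigma, |\Delta|) = \frac{\sigma^2}{4G_S} + \frac{|\Delta|^2}{4H_P} + 2i\int\frac{d^3p}{(2\pi)^3}\left[\ln\frac{p^2 - (\sigma - |\Delta|)^2 + i\varepsilon}{p^2 + i\varepsilon} + \ln\frac{p^2 - (\sigma + |\Delta|)^2 + i\varepsilon}{p^2 + i\varepsilon} + \ln\frac{p^2 - \sigma^2 + i\varepsilon}{p^2 + i\varepsilon}\right]. \tag{7} \]

After the Wick rotation, we may define and calculate in 3D Euclidean momentum space

\[ I(a^2) = \int\frac{d^3\bar{p}}{(2\pi)^3}\ln\frac{\bar{p}^2 + a^2}{\bar{p}^2} = \frac{1}{6\pi^2}\left[\Lambda^3\ln\frac{\Lambda^2 + a^2}{\Lambda^2} + 2a^2\Lambda - 2a^3\arctan\frac{\Lambda}{a}\right] \simeq \frac{1}{2\pi^2}\left(a^2\Lambda - \frac{\pi}{3}|a|^3\right), \quad\text{if } \Lambda \gg |a|, \tag{8} \]

where Λ is the 3D Euclidean momentum cut-off.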
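The cut-off integral of Eq. (8) can be cross-checked numerically. The following is a minimal sketch, assuming the radial form $I(a^2) = \frac{1}{2\pi^2}\int_0^{\Lambda}\bar{p}^2\ln(1 + a^2/\bar{p}^2)\,d\bar{p}$ and $a > 0$; the function names and the sample values $a = 1$, $\Lambda = 50$ are illustrative choices, not taken from the paper.

```python
import math


def i_numeric(a, cutoff, n=100_000):
    """Midpoint-rule estimate of I(a^2) with a sharp 3D Euclidean cut-off."""
    h = cutoff / n
    total = 0.0
    for k in range(n):
        p = (k + 0.5) * h
        # radial measure: d^3p / (2 pi)^3 -> p^2 dp / (2 pi^2)
        total += p * p * math.log1p(a * a / (p * p))
    return total * h / (2.0 * math.pi ** 2)


def i_exact(a, cutoff):
    """Closed form of the same integral (assumes a > 0)."""
    return (cutoff ** 3 * math.log1p(a * a / cutoff ** 2)
            + 2.0 * a * a * cutoff
            - 2.0 * a ** 3 * math.atan(cutoff / a)) / (6.0 * math.pi ** 2)


def i_large_cutoff(a, cutoff):
    """Leading behaviour for cutoff >> |a|."""
    return (a * a * cutoff - math.pi * abs(a) ** 3 / 3.0) / (2.0 * math.pi ** 2)


if __name__ == "__main__":
    a, cutoff = 1.0, 50.0
    print(i_numeric(a, cutoff), i_exact(a, cutoff), i_large_cutoff(a, cutoff))
```

The difference between the closed form and the large-cut-off form is of relative order $a^2/\Lambda^2$, which is why the condition $\Lambda \gg |a|$ is needed before the cubic terms of the effective potential can be read off.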
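Assume that $\Lambda \gg |\sigma - |\Delta||$, $\Lambda \gg \sigma + |\Delta|$ and $\Lambda \gg \sigma$; then by means of Eq. (8) we obtain the final expression of the effective potential in the 3D GN model

\[ V(\sigma, |\Delta|) = \frac{\sigma^2}{4G_S} + \frac{|\Delta|^2}{4H_P} - \frac{1}{\pi^2}(3\sigma^2 + 2|\Delta|^2)\Lambda + \frac{1}{3\pi}\left[6\sigma^2|\Delta| + 2|\Delta|^3 + \sigma^3 + 2\theta(\sigma - |\Delta|)(\sigma - |\Delta|)^3\right]. \tag{9} \]

IV. GROUND STATES

Equation (9) makes it possible to discuss the ground states of the model analytically. The extreme value conditions $\partial V(\sigma, |\Delta|)/\partial\sigma = 0$ and $\partial V(\sigma, |\Delta|)/\partial|\Delta| = 0$ lead to the equations

\[ \sigma\left[\frac{1}{2G_S} - \frac{6\Lambda}{\pi^2} + \frac{\sigma + 4|\Delta|}{\pi}\right] + \frac{2}{\pi}\theta(\sigma - |\Delta|)(\sigma - |\Delta|)^2 = 0, \tag{10} \]

\[ |\Delta|\left[\frac{1}{2H_P} - \frac{4\Lambda}{\pi^2} + \frac{2|\Delta|}{\pi}\right] + \frac{2}{\pi}\left[\sigma^2 - \theta(\sigma - |\Delta|)(\sigma - |\Delta|)^2\right] = 0. \tag{11} \]

Define the expression

\[ K = \begin{vmatrix} A & B \\ B & C \end{vmatrix} = AC - B^2, \]

where A, B and C represent the second order derivatives of $V(\sigma, |\Delta|)$ with the explicit expressions

\[ A = \frac{\partial^2 V}{\partial\sigma^2} = \frac{1}{2G_S} - \frac{6\Lambda}{\pi^2} + \frac{2}{\pi}(\sigma + 2|\Delta|) + \frac{4}{\pi}\theta(\sigma - |\Delta|)(\sigma - |\Delta|), \]
\[ B = \frac{\partial^2 V}{\partial\sigma\,\partial|\Delta|} = \frac{\partial^2 V}{\partial|\Delta|\,\partial\sigma} = \frac{4}{\pi}\left[\sigma - \theta(\sigma - |\Delta|)(\sigma - |\Delta|)\right], \]
\[ C = \frac{\partial^2 V}{\partial|\Delta|^2} = \frac{1}{2H_P} - \frac{4\Lambda}{\pi^2} + \frac{4|\Delta|}{\pi} + \frac{4}{\pi}\theta(\sigma - |\Delta|)(\sigma - |\Delta|). \tag{12} \]

Equations (10) and (11) have four different solutions, which are discussed in order as follows.

(i) $(\sigma, |\Delta|) = (0, 0)$. It is a maximum point of $V(\sigma, |\Delta|)$, since in this case we have

\[ A = \frac{1}{2G_S} - \frac{6\Lambda}{\pi^2} < 0 \quad\text{and}\quad K = A\left(\frac{1}{2H_P} - \frac{4\Lambda}{\pi^2}\right) > 0, \]

assuming Eqs. (10) and (11) have solutions with non-zero σ and |∆|.

(ii) $(\sigma, |\Delta|) = (\sigma_1, 0)$, where the non-zero $\sigma_1$ satisfies the equation

\[ \frac{1}{2G_S} - \frac{6\Lambda}{\pi^2} + \frac{3\sigma_1}{\pi} = 0. \tag{13} \]

When Eq. (13) is used, we obtain

\[ A = \frac{3\sigma_1}{\pi} > 0, \qquad K = A\left(\frac{1}{2H_P} - \frac{1}{3G_S} + \frac{2\sigma_1}{\pi}\right) > 0, \quad\text{if } \frac{G_S}{H_P} > \frac{2}{3}. \]

Hence $(\sigma_1, 0)$ will be a minimum point of $V(\sigma, |\Delta|)$ when $G_S/H_P > 2/3$.

(iii) $(\sigma, |\Delta|) = (0, \Delta_1)$, where the non-zero $\Delta_1$ obeys the equation

\[ \frac{1}{2H_P} - \frac{4\Lambda}{\pi^2} + \frac{2\Delta_1}{\pi} = 0. \tag{14} \]

By using Eq. (14) we get

\[ A = \frac{1}{2G_S} - \frac{3}{4H_P} + \frac{\Delta_1}{\pi}, \qquad K = A\,\frac{2\Delta_1}{\pi}. \]

Obviously, $(0, \Delta_1)$ will be a minimum point of $V(\sigma, |\Delta|)$ when $G_S/H_P < 2/3$.

(iv) $(\sigma, |\Delta|) = (\sigma_2, \Delta_2)$. In view of the function $\theta(\sigma - |\Delta|)$ in Eqs. (10) and (11), we have to consider the cases $\sigma_2 > \Delta_2$ and $\sigma_2 < \Delta_2$ separately.

(a) $\sigma_2 > \Delta_2$. In this case, Eqs. (10) and (11) become

\[ \frac{1}{2G_S} - \frac{6\Lambda}{\pi^2} + \frac{1}{\pi\sigma_2}(3\sigma_2^2 + 2\Delta_2^2) = 0, \qquad \frac{1}{2H_P} - \frac{4\Lambda}{\pi^2} + \frac{4\sigma_2}{\pi} = 0. \]

From them we get

\[ A = \frac{3\sigma_2^2 - 2\Delta_2^2}{\pi\sigma_2} > 0, \qquad K = -\frac{16\Delta_2^2}{\pi^2} < 0. \]

Thus $(\sigma_2, \Delta_2)$ is neither a maximum nor a minimum point of $V(\sigma, |\Delta|)$ if $\sigma_2 > \Delta_2$.

(b) $\sigma_2 < \Delta_2$. Now Eqs. (10) and (11) become

\[ \frac{1}{2G_S} - \frac{6\Lambda}{\pi^2} + \frac{4\Delta_2 + \sigma_2}{\pi} = 0, \tag{15} \]

\[ \frac{1}{2H_P} - \frac{4\Lambda}{\pi^2} + \frac{2}{\pi\Delta_2}(\Delta_2^2 + \sigma_2^2) = 0. \tag{16} \]

Hence we have the results

\[ A = \frac{\sigma_2}{\pi} > 0, \qquad K = \frac{2\sigma_2}{\pi^2\Delta_2}(\Delta_2^2 - \sigma_2^2 - 8\sigma_2\Delta_2), \]

from which it may be deduced that only if

\[ \sigma_2 < (\sqrt{17} - 4)\Delta_2 \tag{17} \]

is $(\sigma_2, \Delta_2)$ a minimum point of $V(\sigma, |\Delta|)$. On the other hand, from Eqs. (15) and (16) obeyed by $\sigma_2$ and $\Delta_2$ we get

\[ \frac{1}{2H_P} - \frac{1}{3G_S} = \frac{2}{\pi\Delta_2}\left(\frac{\sqrt{13} - 1}{6}\Delta_2 + \sigma_2\right)\left(\frac{\sqrt{13} + 1}{6}\Delta_2 - \sigma_2\right). \tag{18} \]

Equation (18) indicates that for the minimum point $(\sigma_2, \Delta_2)$ satisfying Eq. (17) one certainly has $G_S/H_P > 2/3$. Taking this and the result obtained in case (ii) into account, we see that if $G_S/H_P > 2/3$ the effective potential $V(\sigma, |\Delta|)$ has two possible minimum points, $(\sigma_1, 0)$ and $(\sigma_2, \Delta_2)$. To determine which of the two is the least value point of V, we must compare $V(\sigma_1, 0)$ and $V(\sigma_2, \Delta_2)$ under the constraint given by Eq. (17). In fact, it is easy to find that when Eq. (13) is used,

\[ V(\sigma_1, 0) = -\frac{\sigma_1^3}{2\pi}, \tag{19} \]

and that when Eqs. (15) and (16) are used,

\[ V(\sigma_2, \Delta_2) = -\frac{1}{3\pi}\left(\Delta_2^3 + 3\sigma_2^2\Delta_2 + \frac{\sigma_2^3}{2}\right). \tag{20} \]

By comparing Eq. (13) with Eq. (15) we obtain the relation

\[ 3\sigma_1 = \sigma_2 + 4\Delta_2. \tag{21} \]

By means of Eqs. (19)-(21) it is easy to verify that

\[ V(\sigma_1, 0) - V(\sigma_2, \Delta_2) = -\frac{1}{27\pi}\left[23\Delta_2^3 - 4\sigma_2^3 + 3\sigma_2\Delta_2(8\Delta_2 - 7\sigma_2)\right] < 0 \]

when Eq. (17) is satisfied. This result indicates that when $G_S/H_P > 2/3$, the least value point of $V(\sigma, |\Delta|)$ is $(\sigma_1, 0)$ and not $(\sigma_2, \Delta_2)$. In summary, if the necessary conditions $G_S\Lambda > \pi^2/12$ and $H_P\Lambda > \pi^2/8$ for non-zero σ and ∆ are satisfied, then the least value points of the effective potential $V(\sigma, |\Delta|)$ are at

\[ (\sigma, |\Delta|) = \begin{cases} (0, \Delta_1) & \text{if } 0 \le G_S/H_P < 2/3, \\ (\sigma_1, 0) & \text{if } G_S/H_P > 2/3. \end{cases} \tag{22} \]

As a result, in the ground state of the 3D two-flavor GN model, depending on whether the ratio $G_S/H_P$ is larger or smaller than 2/3, one has either pure quark-antiquark condensates or pure diquark condensates, but coexistence of the two forms of condensates cannot happen.

V. CONCLUDING REMARKS

The result (22) in the 3D GN model can be compared with the ones in the 4D NJL model and in the 2D GN model.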
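The ground-state pattern of Eq. (22) can be illustrated by brute force. The sketch below is only an illustration under stated assumptions: it codes the effective potential in the form $V = \sigma^2/4G_S + |\Delta|^2/4H_P - (3\sigma^2 + 2|\Delta|^2)\Lambda/\pi^2 + [6\sigma^2|\Delta| + 2|\Delta|^3 + \sigma^3 + 2\theta(\sigma - |\Delta|)(\sigma - |\Delta|)^3]/3\pi$ (the reading of Eq. (9) assumed here), works in units $\Lambda = 1$, and uses hypothetical coupling values chosen only so that $G_S\Lambda > \pi^2/12$ and $H_P\Lambda > \pi^2/8$. It minimizes V on a grid rather than solving Eqs. (10) and (11).

```python
import math


def potential(sigma, delta, gs, hp, cutoff=1.0):
    """Effective potential V(sigma, |Delta|) in the assumed form of Eq. (9)."""
    step_term = (sigma - delta) ** 3 if sigma > delta else 0.0
    return (sigma ** 2 / (4.0 * gs) + delta ** 2 / (4.0 * hp)
            - (3.0 * sigma ** 2 + 2.0 * delta ** 2) * cutoff / math.pi ** 2
            + (6.0 * sigma ** 2 * delta + 2.0 * delta ** 3 + sigma ** 3
               + 2.0 * step_term) / (3.0 * math.pi))


def ground_state(gs, hp, cutoff=1.0, vmax=0.6, n=300):
    """Grid point (sigma, |Delta|) in [0, vmax]^2 with the lowest potential."""
    h = vmax / n
    best = min((potential(i * h, j * h, gs, hp, cutoff), i * h, j * h)
               for i in range(n + 1) for j in range(n + 1))
    return best[1], best[2]


def sigma1(gs, cutoff=1.0):
    """Pure quark-antiquark solution of 1/(2 G_S) - 6 cutoff/pi^2 + 3 sigma/pi = 0."""
    return 2.0 * cutoff / math.pi - math.pi / (6.0 * gs)


def delta1(hp, cutoff=1.0):
    """Pure diquark solution of 1/(2 H_P) - 4 cutoff/pi^2 + 2 Delta/pi = 0."""
    return 2.0 * cutoff / math.pi - math.pi / (4.0 * hp)


if __name__ == "__main__":
    # Hypothetical couplings: G_S/H_P = 4/3 (above 2/3) and G_S/H_P = 1/2 (below 2/3).
    print(ground_state(gs=1.3 * 4.0 / 3.0, hp=1.3))
    print(ground_state(gs=1.0, hp=2.0))
```

For the first pair of couplings the grid minimum sits on the σ axis near `sigma1`, and for the second on the |∆| axis near `delta1`, in line with the two branches of Eq. (22). The chosen couplings do not satisfy $\Lambda \gg \sigma$ very well, so the sketch tests the algebra of the potential rather than the physical regime.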
The minimal points of the effective potential $V(\sigma, |\Delta|)$ for the 4D NJL model and the 2D GN model have been obtained and are located respectively at

\[ (\sigma, |\Delta|) = \begin{cases} (0, \Delta_1) & \text{if } 0 \le G_S/H_S < 2/[3(1 + C)], \\ (\sigma_2, \Delta_2) & \text{if } 2/[3(1 + C)] < G_S/H_S < 2/3, \\ (\sigma_1, 0) & \text{if } G_S/H_S > 2/3, \end{cases} \tag{23} \]

with $C = (2H_S\Lambda_4^2/\pi^2 - 1)/3$ and $\Lambda_4$ denoting the 4D Euclidean momentum cutoff in the 4D two-flavor NJL model, if the necessary conditions $G_S\Lambda_4^2 > \pi^2/3$ and $H_S\Lambda_4^2 > \pi^2/2$ for non-zero σ and ∆ are satisfied [2], and

\[ (\sigma, |\Delta|) = \begin{cases} (0, \Delta_1) & \text{if } G_S/H_S = 0, \\ (\sigma_2, \Delta_2) & \text{if } 0 < G_S/H_S < 2/3, \\ (\sigma_1, 0) & \text{if } G_S/H_S > 2/3, \end{cases} \tag{24} \]

in the 2D two-flavor GN model [4]. In Eqs. (23) and (24), $G_S$ and $H_S$ represent the coupling constants in the scalar quark-antiquark channel and the scalar diquark channel, respectively.

A comparison among Eqs. (22)-(24) shows that the three models lead to very similar results. In all three models, the interplay between the quark-antiquark and the diquark condensates in vacuum depends on the ratio $G_S/H_D$ (D = S for the 4D and 2D models and D = P for the 3D model). In particular, the diquark condensates can emerge (in a separate or coexistent pattern) only if $G_S/H_D < 2/3$. This is probably a general characteristic of the considered two-flavor four-fermion models, since in these models the color number of the quarks participating in the diquark condensates and in the quark-antiquark condensates is just 2 and 3 respectively. However, there are also some differences among the three models in the pattern realizing the diquark condensates, though the pure quark-antiquark condensates arise only if $G_S/H_D > 2/3$ in all of them. In the 2D GN model, the pure diquark condensates emerge only if $G_S/H_S = 0$, and this is different from the 4D NJL model, where the pure diquark condensates may arise if $G_S/H_S$ is in a finite region below 2/3. Another difference is that in the 3D GN model there is no coexistence of the quark-antiquark condensates and the diquark condensates, whereas such coexistence is clearly displayed in the 4D and 2D models.
This implies that in the 3D GN model, $G_S/H_P = 2/3$ becomes the critical value which separates the ground states with pure diquark condensates from those with pure quark-antiquark condensates.

It is also indicated that if the two-flavor four-fermion interaction models are assumed to be simulations of QCD (of course, only the 4D NJL model is just the true one) and the four-fermion interactions are supposed to come from the heavy color gluon exchange interactions $-g(\bar{q}\gamma^{\mu}\lambda^{a}q)^2$ ($a = 1, \cdots, 8$; $\mu = 0, \cdots, D - 1$) via the Fierz transformation [7], then one finds that in all three models, for the case of two flavors and three colors, the ratio $G_S/H_D$ is always equal to 4/3, which is larger than the above critical value 2/3. From this we can conclude that there will be only pure quark-antiquark condensates and no diquark condensates in the ground states of all these models in vacuum.

[1] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122 (1961) 345; 124 (1961) 246.
[2] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 95.
[3] D.J. Gross and A. Neveu, Phys. Rev. D 10 (1974) 3235.
[4] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 520.
[5] B. Rosenstein, B. J. Warr, and S. H. Park, Phys. Rep. 205 (1991) 59.
[6] Bang-Rong Zhou, Phys. Lett. B444 (1998) 455.
[7] M. Buballa, Phys. Rep. 407 (2005) 205.
[8] Y. Nambu, Phys. Rev. 117 (1960) 648; L. P. Gorkov, JETP 7 (1958) 993.

ABSTRACT

An effective potential analysis indicates that, in a 3D two-flavor Gross-Neveu model in vacuum, the ground state depends on whether the ratio $G_S/H_P$ is smaller or larger than the critical value 2/3, where $G_S$ and $H_P$ are the coupling constants of the scalar quark-antiquark channel and the pseudoscalar diquark channel, respectively. The system has a ground state with pure diquark condensates in the first case and with pure quark-antiquark condensates in the second, but never one in which the two forms of condensates coexist.
The similarities and differences in the interplay between the quark-antiquark and the diquark condensates in vacuum in the 2D, 3D and 4D two-flavor four-fermion interaction models are summarized. <|endoftext|><|startoftext|> Introduction
Throughput of random linear coding
Performance without pre-coding
Performance with pre-coding
Discussion
References

ABSTRACT

We assess the practicality of random network coding by illuminating the issue of overhead and considering it in conjunction with increasingly long packets sent over the erasure channel. We show that the transmission of increasingly long packets, consisting of either an increasing number of symbols per packet or an increasing symbol alphabet size, results in a data rate approaching zero over the erasure channel. This result is due to an erasure probability that increases with packet length. Numerical results for a particular modulation scheme demonstrate a data rate of approximately zero for a large, but finite-length packet. Our results suggest a reduction in the performance gains offered by random network coding. <|endoftext|><|startoftext|> Introduction

The discovery of extrasolar planets during the past decade has confronted astronomers with many new challenges. The diverse and surprising dynamical characteristics of many of these objects have made scientists wonder to what extent the current theories of planet formation can be applied to other planetary systems. A major challenge of planetary science is now to explain how such planets were formed, how they acquired their unfamiliar dynamical state, whether there are habitable extrasolar planets, and how to detect such habitable worlds.

In this respect, one of the most surprising discoveries is the detection of planets in binary star systems. Among the currently known extrasolar planet-hosting stars approximately 25% are members of binaries (Table 1). With the exception of the pulsar planetary system PSR B1620-26 (Sigurdsson et al. 2003; Richer et al.
2003; Beer et al. 2004), and possibly the system of HD202206 (Correia et al. 2005), planets in these binary systems revolve only around one of the stars. While the majority of these binaries are wide (i.e., with separations between 250 and 6500 AU, where the perturbative effect of the stellar companion on planet formation around the other star is negligible), the detection of Jovian-type planets in the two binaries γ Cephei (separation of 18.5 AU, see Hatzes et al. 2003) and GJ 86 (separation of 21 AU, see Els et al. 2001) has brought to the forefront questions on the formation of giant planets and the possibility of the existence of smaller bodies in moderately close binary and multiple star systems. Given that more than half of main sequence stars are members of binaries/multiples (Duquennoy & Mayor 1991; Mathieu et al. 2000), and the frequency of planets in binary/multiple systems is comparable to that around single stars (Bonavita & Desidera 2007), such questions have realistic grounds.

At present, the sensitivity of the detection techniques does not allow routine discovery of Earth-sized objects around binary and multi-star systems. However, with the advancement of new techniques, and with the recent launch of CoRoT and the launch of Kepler in late 2008, the detection of more planets (possibly terrestrial-class objects) in such systems is on the horizon.

Table 1- Binary and multi-star systems with extrasolar planets (Haghighipour 2006)

Star | Star | Star | Star
HD142 (GJ 9002) | HD3651 | HD9826 (υAnd) | HD13445 (GJ 86)
HD19994 | HD22049 (εEri) | HD27442 | HD40979
HD41004 | HD75732 (55 Cnc) | HD80606 | HD89744
HD114762 | HD117176 (70 Vir) | HD120136 (τBoo) | HD121504
HD137759 | HD143761 (ρCrb) | HD178911 | HD186472 (16 Cyg)
HD190360 (GJ 777 A) | HD192263 | HD195019 | HD213240
HD217107 | HD219449 | HD219542 | HD222404 (γCeph)
HD178911 | PSR B1257-20 | PSR B1620-26 | HD202206

See http://www.obspm.fr/planets for complete list of extrasolar planets with their corresponding references.

Fig 1.
The time of ejection vs. the initial semimajor axis of an Earth-like planet in a co-planar arrangement in the γ Cephei system. The binary consists of a 1.59 solar-masses K1 IV subgiant as its primary (Fuhrmann 2003) and a probable red M dwarf with a mass of 0.41 solar-masses (Neuhauser et al 2007) as its secondary. The semimajor axis and eccentricity of the binary are 18.5 AU and 0.36, respectively (Hatzes et al. 2003). The primary star is host to a 1.7 Jupiter-masses object in an orbit with a semimajor axis of 2.13 AU and eccentricity of 0.12. The habitable zone of the primary is within 3 AU to 3.7 AU from this star (Haghighipour 2006). As shown here, the orbit of an Earth-sized object in the primary’s habitable zone is unstable. However, such an object can have a long-term stable orbit at distances closer to the primary star.

Theoretical studies and numerical modeling of terrestrial and habitable planet formation in such dynamically complex environments are, therefore, necessary to gain fundamental insights into the prospects for life in such systems and have great strategic impact on NASA science and missions. Several lines of investigation are needed to ensure progress in understanding the formation of terrestrial and habitable planets in binary and multi-star systems.

Fig 2. Results of simulations of the formation of Earth-like objects in the habitable zone of the primary of a binary star system. The stars of the binary are Sun-like and the primary is host to a Jupiter-sized object on a circular orbit at 5 AU. Simulations show the results for different values of the eccentricity and semimajor axis of the stellar companion (Haghighipour & Raymond 2007). As shown here, the orbital motion of the secondary star disturbs the orbit of the giant planet, which in turn affects the final assembly and water contents of the terrestrial objects. This figure also shows that binary systems with larger perihelia are more favorable for forming and harboring habitable planets.
The quantities ab and eb represent the semimajor axis and eccentricity of the binary.

1) Computational Modeling

Extensive numerical studies are necessary to i) map the parameter-space of binary and/or multiple star systems to identify regions where giant and terrestrial planets can have long-term stable orbits, ii) simulate the collision and growth of planetesimals to form protoplanetary objects, iii) simulate the formation of planetesimals in circumbinary and circumstellar disks, iv) develop models of protoplanet disk chemistry that ensure delivery of water to terrestrial-class planets in the habitable zone, v) simulate the interaction of planetary embryos and the late stage of terrestrial planet formation.

Fig 3. Histograms of the number of final terrestrial planets formed in binary star systems with periastron distances of 5 AU (top), 7.5 AU (middle), and 10 AU (bottom). The color red corresponds to simulations in a binary in which the primary and secondary stars are 0.5 solar-masses. The color blue represents a binary with 1 solar-mass stars, and the color yellow corresponds to a binary with a 0.5 solar-masses primary and a 1 solar-mass secondary. The black line in the middle panel shows the results of simulations when the primary star is 1 solar-mass and the secondary is 0.5 solar-masses. As shown here, the typical number of final planets clearly increases in systems with larger stellar periastra, and also when the companion star is less massive than the primary (Quintana et al. 2007).

The parameter-space is large and includes the masses and orbital parameters of the stars and planets. It is, therefore, necessary to develop a systematic approach, based on the results of current research, to avoid unnecessary simulations, particularly if the computational resources are limited.
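One inexpensive way to prune such a parameter space before running full simulations is an empirical stability fit. The sketch below uses the fit of Holman & Wiegert (1999), a reference cited in this document, for the largest stable semimajor axis of a test particle orbiting one star of a binary; the coefficients are quoted as recalled from that paper and should be checked against the original before use. The function names are hypothetical, and the fit ignores any giant planet in the system, so it gives only an outer bound.

```python
def critical_semimajor_axis(a_binary, e_binary, m_host, m_companion):
    """Approximate outer stability limit (same units as a_binary) for a
    planet orbiting the host star, after Holman & Wiegert (1999).
    Assumed valid roughly for 0.1 <= mu <= 0.9 and 0 <= e_binary <= 0.8."""
    mu = m_companion / (m_host + m_companion)
    ratio = (0.464
             - 0.380 * mu
             - 0.631 * e_binary
             + 0.586 * mu * e_binary
             + 0.150 * e_binary ** 2
             - 0.198 * mu * e_binary ** 2)
    return ratio * a_binary


def periastron(a_binary, e_binary):
    """Closest approach of the two stars."""
    return a_binary * (1.0 - e_binary)


if __name__ == "__main__":
    # gamma Cephei values quoted in this document: 18.5 AU, e = 0.36,
    # 1.59 and 0.41 solar masses.
    print(critical_semimajor_axis(18.5, 0.36, 1.59, 0.41))
    print(periastron(18.5, 0.36))
```

For the γ Cephei values quoted in this document the fit gives an outer limit of roughly 4 AU from the binary alone. Stability inside that limit still depends on the giant planet, which this fit does not include.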
Current research has indicated that terrestrial-class planets can have long-term stable orbits as long as they are closer to their host stars and their orbits lie outside the influence zone of the giant planet of the system (figure 1, also see Holman & Wiegert 1999; David et al. 2003; Haghighipour 2006). This implies that, for such systems to be habitable, the habitable zone of the planet-hosting star has to be considerably closer to it than the orbit of its giant planet(s). Given that the location of the habitable zone is a function of the luminosity of a star, the above-mentioned criterion can be used to constrain stellar properties. Recent numerical simulations have also shown that (1) water-delivery is more efficient when the perihelion of the binary is large and the orbit of the giant planet is close to a circle (figures 2 and 3, also see Quintana et al. 2007, Haghighipour & Raymond 2007), and (2) habitable planets can form in the habitable zone of a star during the migration of giant planets (figure 4, see Raymond, Mandell & Sigurdsson, 2006). Since many stars are formed in clusters, their mutual interactions may change their orbital configurations and cause their giant planets to revolve around their host stars in unconventional orbits. Theoretical studies are essential to identify systems capable of forming and harboring habitable planets.

Fig 4. Habitable planet formation in the presence of giant planet migration (Raymond, Mandell & Sigurdsson 2006). The system consists of a Sun-like star and a Jupiter-sized giant planet. The figure shows snapshots in time of the evolution of one simulation. Each panel plots the orbital eccentricity versus semimajor axis for each surviving body. The size of each body is proportional to its physical size (except for the giant planet, shown in black). The vertical "error bars" represent the sine of each body's inclination on the y-axis scale.
The color of each dot corresponds to its water content (as per the color bar), and the dark inner dot represents the relative size of its iron core. For scale, the Earth's water content is roughly $10^{-3}$. As shown here, an Earth-like object can form in the habitable zone of the star while the giant planet migrates to closer distances.

2) Theoretical Analysis of Observation Data

Recent observations of binary star systems, using the Spitzer Space Telescope, show evidence of debris disks in these environments (Trilling et al. 2007). As shown by these authors, approximately 60% of their observed close binary systems (separation smaller than 3 AU) have excess in their thermal emissions, implying on-going collisions in their planetesimal regions. Future space-, air-, and ground-based telescopes such as ALMA, SOFIA and JWST will be able to detect more of such disks and will also be able to resolve their fine structures. Numerical simulations, similar to those for debris disks around single stars (Telesco et al. 2005), will be necessary in order to understand the dynamics of such planet-forming environments, and also to identify the source of their disk features (e.g., embedded planets, and/or on-going planetesimal collisions). Due to the complex nature of these systems, such numerical studies require more advanced computational codes and more powerful computers. Developing theories of disk evolution in close binary systems is also essential.

3) Computational Resources

Given the extent and complexity of simulations of planet formation in multi-star systems, and the high dimensionality of the parameter space of initial conditions, support for developing computational resources with the primary focus of conducting numerical analysis of terrestrial planet formation is essential. Reliable simulations of collisional growth of planetesimals and planetary embryos require integration of the orbits of several hundred thousand such objects.
With the current technology, such simulations may take several months to a year to complete. It is therefore necessary to develop (i) faster integration routines, and (ii) major computational facilities with the primary focus of simulating terrestrial planet formation. Strategic Impact to NASA Missions Understanding terrestrial and habitable planet formation in binary and multiple star systems has implications for investigating the habitability of extra-solar planets. It ties directly into near future NASA missions, in particular Kepler, and JWST as well as complementary ongoing and planned NSF and privately funded surveys that include transit, and radial velocity. It is also closely coupled with the scientific aspect of the Space Exploration Vision and aligns with the 2006 NASA Science Program implementation of the Strategic sub-goal 3D: “Discover the origin, structure, evolution, and destiny of the universe and search for earth-like planets.” The strategic relevance to the NASA missions is in the prospects for detection of habitable Earth-like planets. Studies such as those presented here underlie hypotheses regarding the likelihood of the existence of such planets, the origin of life in the habitable zones of their host stars, and theories of evolution and persistence of life after initiation, at the presence of a stellar companion. Earth-like objects in and around binary star systems allow testing of theories of extrasolar habitability and origin of life. Prospects for testing of extrasolar life are intrinsically exciting and valuable to the NASA community and the public, and the systems to be explored, once found, provide calibration targets for future NASA missions. References Beer, M. E., King, A. P., & Pringle, J. E. 2004, MNRAS, 355, 1244 Bonavita, M., & Desidera, S. 2007, submitted to A&A (astro-ph/0703754) Correia, A. C. M., Udry, S., Mayor, M., Laskar, J., Naef, D., Pepe, F., Queloz, D., & Santos, N. C., 2005, A&A, 440, 751 David, E., Quintana, E. 
V., Fatuzzo, M., Adams, F. C., 2003, PASP, 115, 825 Duquennoy, A., & Mayor, M. 1991, A&A, 248, 485 Els, S. G., Sterzik, M. F., Marchis, F., Pantin, E., Endl, M., & Kurster, M. 2001, A&A, 370, L1 Fuhrmann, K. 2003, Astron.Nachr. 323, 392 Haghighipour, N. 2006, ApJ, 644, 543 Haghighipour, N., & Raymond, S. N., to appear in ApJ (astro-ph/0702706) Holman, M. J., & Wiegert, P. A. 1999, AJ, 117, 621 Hatzes, A. P., Cochran, W. D., Endl, M., McArthur, B., Paulson, D. B., Walker, G. A. H., Campbell, B., & Yang, S. 2003, ApJ, 599, 1383 Mathieu, R. D., Ghez, A. M., Jensen, E. L. N., & Simon, M. 2000, in Protostars and Planets IV, ed. V. Mannings, A. P. Boss, & S. S. Russell (Tucson: Univ. Arizona Press), 703 Neuhauser, R., Mugrauer, M., Fukagawa, M., Torres, G., Schmidt, T., 2007, A&A, 462, 777 Quintana, E. V., Adams, F. C., Lissauer, J. J., Chambers, J. E. 2007, ApJ, to appear in vol. 660 Raymond, S. N., Mandell, A. M., & Sigurdsson, S., 2006, Science, 313, 1413 Richer, H. B., Ibata, R., Fahlman, G., G., & Huber, M. 2003, ApJ, 597, L45 Sigurdsson, S., Richer, H. B., Hansen, B., M., Stairs, I. H., & Thorsett, S. E. 2003, Science, 301, 103 Telesco, C. M., Fisher, R. S., Wyatt, M. C., Dermott, S. F., Kehoe, T. J. J., Novotny. S., Mariñas, N., Radomski, J. T., Packham, C., De Buizer, J., Hayward, T. L., 2005, Nature, 433, 133 Trilling, D. E., Stansberry, J. A., Stapelfeldt, K. R., Rieke, G. H., Su, K. Y. L., Gray, R. O., Corbally, C. J., Bryden, G., Chen, C. H., Boden, A., Beichman, C. A. 2007, 658, 1289 ABSTRACT One of the most surprising discoveries of extrasolar planets is the detection of planets in moderately close binary star systems. The Jovian-type planets in the two binaries of Gamma Cephei and GJ 86 have brought to the forefront questions on the formation of giant planets and the possibility of the existence of smaller bodies in such dynamically complex environments. 
The diverse dynamical characteristics of these objects have made scientists wonder to what extent the current theories of planet formation can be applied to binaries and multiple star systems. At present, the sensitivity of the detection techniques does not allow routine discovery of Earth-sized bodies in binary systems. However, with the advancement of new techniques, and with the recent launch of CoRoT and the launch of Kepler in late 2008, the detection of more planets (possibly terrestrial-class objects) in such systems is on the horizon. Theoretical studies and numerical modeling of terrestrial and habitable planet formation are, therefore, necessary to gain fundamental insights into the prospects for life in such systems and have great strategic impact on NASA science and missions. <|endoftext|><|startoftext|> Introduction and statement of results

The theory of nonlinear dispersive equations (local and global existence, regularity, scattering theory) is vast and has been studied extensively by many authors. Almost exclusively, the techniques developed so far are restricted to Cauchy problems with initial data in a Sobolev space, mainly because of the crucial role played by the Fourier transform in the analysis of partial differential operators. For a sample of results and a nice introduction to the field, we refer the reader to Tao’s monograph [12] and the references therein.

In this note, we focus on the Cauchy problem for the nonlinear Schrödinger equation (NLS), the nonlinear wave equation (NLW), and the nonlinear Klein-Gordon equation (NLKG) in the realm of modulation spaces. Generally speaking, Cauchy data in a modulation space are rougher than any given data in a fractional Bessel potential space, and this low regularity is desirable in many situations. Modulation spaces were introduced by Feichtinger in the 80s [6] and have asserted themselves lately as the “right” spaces in time-frequency analysis.
Furthermore, they provide an excellent substitute in estimates that are known to fail on Lebesgue spaces. This is not entirely surprising, if we consider their analogy with Besov spaces, since modulation spaces arise essentially by replacing dilation with modulation.

The equations that we will investigate are:

\[ (NLS)\qquad i\frac{\partial u}{\partial t} + \Delta_x u + f(u) = 0, \quad u(x, 0) = u_0(x), \tag{1} \]

\[ (NLW)\qquad \frac{\partial^2 u}{\partial t^2} - \Delta_x u + f(u) = 0, \quad u(x, 0) = u_0(x), \quad \frac{\partial u}{\partial t}(x, 0) = u_1(x), \tag{2} \]

\[ (NLKG)\qquad \frac{\partial^2 u}{\partial t^2} + (I - \Delta_x)u + f(u) = 0, \quad u(x, 0) = u_0(x), \quad \frac{\partial u}{\partial t}(x, 0) = u_1(x), \tag{3} \]

where $u(x, t)$ is a complex valued function on $\mathbb{R}^d \times \mathbb{R}$, $f(u)$ (the nonlinearity) is some scalar function of u, and $u_0$, $u_1$ are complex valued functions on $\mathbb{R}^d$.

Date: October 30, 2018. 2000 Mathematics Subject Classification. Primary 35Q55; Secondary 35C15, 42B15, 42B35. Key words and phrases. Fourier multiplier, weighted modulation space, short-time Fourier transform, nonlinear Schrödinger equation, nonlinear wave equation, nonlinear Klein-Gordon equation, conservation of energy. http://arxiv.org/abs/0704.0833v1 Á. Bényi and K. A. Okoudjou

The nonlinearities considered in this paper will be either power-like

\[ p_k(u) = \lambda|u|^{2k}u, \quad k \in \mathbb{N},\ \lambda \in \mathbb{R}, \tag{4} \]

or exponential-like

\[ e_\rho(u) = \lambda(e^{\rho|u|^2} - 1)u, \quad \lambda, \rho \in \mathbb{R}. \tag{5} \]

Both nonlinearities considered have the advantage of being smooth. The corresponding equations having power-like nonlinearities $p_k$ are sometimes referred to as algebraic nonlinear (Schrödinger, wave, Klein-Gordon) equations. The sign of the coefficient λ determines the defocusing, absent, or focusing character of the nonlinearity, but, as we shall see, this character will play no role in our analysis on modulation spaces.

The classical definition of (weighted) modulation spaces that will be used throughout this work is based on the notion of short-time Fourier transform (STFT). For $z = (x, \omega) \in \mathbb{R}^{2d}$, we let $M_\omega$ and $T_x$ denote the operators of modulation and translation, and $\pi(z) = M_\omega T_x$ the general time-frequency shift. Then, the STFT of f with respect to a window g is $V_g f(z) = \langle f, \pi(z)g\rangle$.
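A finite, discrete analogue can make the STFT definition concrete. The sketch below works on the cyclic group $\mathbb{Z}_N$ and assumes the common conventions $T_x g(t) = g(t - x)$ and $M_\omega g(t) = e^{2\pi i \omega t/N} g(t)$, which are not spelled out in the text; the function names, the Gaussian window and the sample signal are illustrative only.

```python
import cmath
import math


def stft(f, g, x, omega):
    """Discrete analogue of V_g f(x, omega) = <f, M_omega T_x g> on Z_N."""
    n = len(f)
    return sum(
        f[t] * complex(g[(t - x) % n]).conjugate()
        * cmath.exp(-2j * math.pi * omega * t / n)
        for t in range(n)
    )


def gaussian_window(n, width):
    """Periodized Gaussian window centered at index 0."""
    return [math.exp(-math.pi * (min(t, n - t) / width) ** 2) for t in range(n)]


def dominant_frequency(f, g, x):
    """Frequency index where |V_g f(x, .)| is largest."""
    return max(range(len(f)), key=lambda w: abs(stft(f, g, x, w)))


if __name__ == "__main__":
    n = 64
    tone = [cmath.exp(2j * math.pi * 5 * t / n) for t in range(n)]
    window = gaussian_window(n, 8.0)
    print(dominant_frequency(tone, window, 0))
```

For a pure tone the magnitude of the transform is concentrated at the tone's frequency for every time shift x, which is the kind of time-frequency concentration that modulation-space norms quantify.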
Modulation spaces provide an effective way to measure the time-frequency concentration of a distribution through size and integrability conditions on its STFT. For $s, t \in \mathbb{R}$ and $1 \le p, q \le \infty$, we define the weighted modulation space $M^{p,q}_{t,s}(\mathbb{R}^d)$ to be the Banach space of all tempered distributions $f$ such that, for a nonzero smooth rapidly decreasing function $g \in \mathcal{S}(\mathbb{R}^d)$, we have

$$\|f\|_{M^{p,q}_{t,s}} = \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} |V_g f(x,\omega)|^p \langle x \rangle^{tp}\, dx \right)^{q/p} \langle \omega \rangle^{qs}\, d\omega \right)^{1/q} < \infty.$$

Here, we use the notation $\langle x \rangle = (1 + |x|^2)^{1/2}$. This definition is independent of the choice of the window, in the sense that different window functions yield equivalent modulation-space norms. When both $s = t = 0$, we will simply write $M^{p,q} = M^{p,q}_{0,0}$. It is well-known that the dual of a modulation space is also a modulation space, $(M^{p,q}_{s,t})' = M^{p',q'}_{-s,-t}$, where $p', q'$ denote the dual exponents of $p$ and $q$, respectively.

The definition above can be appropriately extended to exponents $0 < p, q \le \infty$ as in the works of Kobayashi [9], [10]. More specifically, let $\beta > 0$ and $\chi \in \mathcal{S}$ such that $\operatorname{supp} \hat{\chi} \subset \{|\xi| \le 1\}$ and $\sum_{k \in \mathbb{Z}^d} \hat{\chi}(\xi - \beta k) = 1$ for all $\xi \in \mathbb{R}^d$. For $0 < p, q \le \infty$ and $s > 0$, the modulation space $M^{p,q}_{0,s}$ is the set of all tempered distributions $f$ such that

(6) $\left( \sum_{k \in \mathbb{Z}^d} \left( \int_{\mathbb{R}^d} |f * (M_{\beta k} \chi)(x)|^p\, dx \right)^{q/p} \langle \beta k \rangle^{sq} \right)^{1/q} < \infty.$

NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES

When $1 \le p, q \le \infty$ this is an equivalent norm on $M^{p,q}_{0,s}$, but when $0 < p, q < 1$ this is just a quasi-norm. We refer to [9] for more details. For another definition of the modulation spaces for all $0 < p, q \le \infty$ we refer to [5, 15]. For a discussion of the cases when $p$ and/or $q = 0$, see [4]. These extensions of modulation spaces have recently been rediscovered and many of their known properties reproved via different methods by Baoxiang et al. [1], [2].

There exist several embedding results between Lebesgue, Sobolev, or Besov spaces and modulation spaces, see for example [11], [13]; also [1], [2].
We note, in particular, that the Sobolev space $H^2_s$ coincides with $M^{2,2}_{0,s}$. For further properties and uses of modulation spaces, the interested reader is referred to Gröchenig’s book [8].

The goal of this note is twofold: to improve some recent results of Baoxiang, Lifeng and Boling [1] on the local well-posedness of the nonlinear equations stated above, by allowing the Cauchy data to lie in any modulation space $M^{p,1}_{0,s}$, $p > \frac{d}{d+1}$, $s \ge 0$, and to simplify the methods of proof by employing well-established tools from time-frequency analysis. Ideally, one would like to adapt these methods to deal with global well-posedness as well. We plan to address these issues in a future work.

In what follows, we assume that $d \ge 1$, $k \in \mathbb{N}$, $\frac{d}{d+1} < p \le \infty$, $\lambda, \rho \in \mathbb{R}$ and $s \ge 0$ are given. With $p_k$ and $e_\rho$ defined by (4) and (5) respectively, our main results are the following.

Theorem 1. Assume that $u_0 \in M^{p,1}_{0,s}(\mathbb{R}^d)$ and $f \in \{p_k, e_\rho\}$. Then, there exists $T^* = T^*(\|u_0\|_{M^{p,1}_{0,s}})$ such that (1) has a unique solution $u \in C([0, T^*], M^{p,1}_{0,s}(\mathbb{R}^d))$. Moreover, if $T^* < \infty$, then $\limsup_{t \to T^*} \|u(\cdot, t)\|_{M^{p,1}_{0,s}} = \infty$.

Theorem 2. Assume that $u_0, u_1 \in M^{p,1}_{0,s}(\mathbb{R}^d)$ and $f \in \{p_k, e_\rho\}$. Then, there exists $T^* = T^*(\|u_0\|_{M^{p,1}_{0,s}}, \|u_1\|_{M^{p,1}_{0,s}})$ such that (2) has a unique solution $u \in C([0, T^*], M^{p,1}_{0,s}(\mathbb{R}^d))$. Moreover, if $T^* < \infty$, then $\limsup_{t \to T^*} \|u(\cdot, t)\|_{M^{p,1}_{0,s}} = \infty$.

Theorem 3. Assume that $u_0, u_1 \in M^{p,1}_{0,s}(\mathbb{R}^d)$ and $f \in \{p_k, e_\rho\}$. Then, there exists $T^* = T^*(\|u_0\|_{M^{p,1}_{0,s}}, \|u_1\|_{M^{p,1}_{0,s}})$ such that (3) has a unique solution $u \in C([0, T^*], M^{p,1}_{0,s}(\mathbb{R}^d))$. Moreover, if $T^* < \infty$, then $\limsup_{t \to T^*} \|u(\cdot, t)\|_{M^{p,1}_{0,s}} = \infty$.

Remark 1. In Theorem 1 we can replace the (NLS) equation with the following more general (NLS) type equation:

(7) (NLS)$_\alpha$ $i\frac{\partial u}{\partial t} + \Delta_x^{\alpha/2} u + f(u) = 0, \quad u(x,0) = u_0(x),$

for any $\alpha \in [0, 2]$ and $p \ge 1$. The operator $\Delta_x^{\alpha/2}$ is interpreted as a Fourier multiplier operator (with $t$ fixed), $\widehat{\Delta_x^{\alpha/2} u}(\xi, t) = |\xi|^\alpha \hat{u}(\xi, t)$. This strengthening will become evident from the preliminary Lemma 1 of the next section.

Remark 2.
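Theorems 1.1 and 1.2 of [1] are particular cases of Theorem 1 with $p = 2$ and $s = 0$.

2. Fourier multipliers and multilinear estimates

The generic scheme in the local existence theory is to establish linear and nonlinear estimates on appropriate spaces that contain the solution $u$. As indicated by the main theorems above, the spaces we consider here are $M^{p,1}_{0,s}$, and we present the appropriate estimates in the lemmas below. In fact, we will need estimates on Fourier multipliers on modulation spaces. As proved in [3] and [7], a function $\sigma(\xi)$ is a symbol of a bounded Fourier multiplier on $M^{p,q}$ for $1 \le p, q \le \infty$ if $\sigma \in W(\mathcal{F}L^1, \ell^\infty)$ (see the proofs of the following two lemmas for a definition of this space). As we shall indicate below, this condition can be naturally extended to give a sufficient criterion for the boundedness of the Fourier multiplier operator on $M^{p,q}_{0,s}$ for $0 < p, q \le \infty$ and $s \ge 0$. The notation $A \lesssim B$ stands for $A \le cB$ for some positive constant $c$ independent of $A$ and $B$.

Lemma 1. Let $\sigma$ be a function defined on $\mathbb{R}^d$ and consider the Fourier multiplier operator $H_\sigma$ defined by

$$H_\sigma f(x) = \int_{\mathbb{R}^d} \sigma(\xi)\, \hat{f}(\xi)\, e^{2\pi i \xi \cdot x}\, d\xi.$$

Let $\chi \in \mathcal{S}$ such that $\operatorname{supp} \hat{\chi} \subset \{|\xi| \le 1\}$. Let $d \ge 1$, $s \ge 0$, $0 < q \le \infty$, and $0 < p < 1$. If $\sigma \in W(\mathcal{F}L^p, \ell^\infty)(\mathbb{R}^d)$, i.e.,

$$\|\sigma\|_{W(\mathcal{F}L^p, \ell^\infty)} = \sup_{n \in \mathbb{Z}^d} \|\sigma \cdot T_{\beta n} \chi\|_{\mathcal{F}L^p} < \infty$$

for $\beta > 0$, then $H_\sigma$ extends to a bounded operator on $M^{p,q}_{0,s}(\mathbb{R}^d)$.

Proof. We use the definition of the modulation spaces given by (6) (see also [9]). In particular, let $\chi \in \mathcal{S}$ such that $\operatorname{supp} \hat{\chi} \subset \{|\xi| \le 1\}$, and define $g \in \mathcal{S}$ by $\hat{g} = \hat{\chi}^2$. Denote $\tilde{g}(x) = g(-x)$. For $f \in \mathcal{S}$, $\beta > 0$, $k \in \mathbb{Z}^d$ and $x \in \mathbb{R}^d$ we have:

$$\begin{aligned}
|H_\sigma f * (M_{\beta k} \tilde{g})(x)| &= |V_g H_\sigma f(x, \beta k)| = |\langle \sigma \hat{f}, M_{-x} T_{\beta k} \hat{g} \rangle| = |\langle \sigma \hat{f}, M_{-x} T_{\beta k} \hat{\chi}^2 \rangle| \\
&\le |\mathcal{F}^{-1}(\sigma \cdot T_{\beta k} \hat{\chi})| * |\mathcal{F}^{-1}(\hat{f} \cdot T_{\beta k} \hat{\chi})|(x) \\
&\le |\mathcal{F}^{-1}(\sigma \cdot T_{\beta k} \hat{\chi})| * |f * (M_{\beta k} \tilde{\chi})|(x).
\end{aligned}$$

Now, observe that $\operatorname{supp} \sigma \cdot T_{\beta k} \hat{\chi} \subset \Gamma_k := \beta k + \{|\xi| \le 1\}$ and $\operatorname{supp} \hat{f} \cdot T_{\beta k} \hat{\chi} \subset \Gamma_k$. Moreover, by assumption we know that $\sigma \in W(\mathcal{F}L^p, \ell^\infty)$ and so $\mathcal{F}^{-1}(\sigma \cdot T_{\beta k} \hat{\chi}) \in L^p$ and $f * (M_{\beta k} \tilde{\chi}) \in L^p$.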
Consequently, by [9, Lemma 2.6] we have the following estimate ‖Hσf ∗ (Mβkg̃)‖Lp ≤ C ‖F−1(σ · Tβkχ̂)‖Lp‖f ∗ (Mβkχ̃)‖Lp, where C is a positive constant that depends only on the diameter of Γk and p. Clearly, the diameter of Γk is independent of k, and this makes C a constant depending only on the dimension d and the exponent p. Therefore, for 0 < q ≤ ∞ we have ‖Hσf‖Mp,q0,s . sup ‖F−1(σ · Tβkχ̂)‖Lp ‖f‖Mp,q0,s = ‖σ‖W (FLp,ℓ∞) ‖f‖Mp,q0,s . NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 5 The result then follows from the density of S in Mp,q0,s for p, q < ∞; see [9, Theorem 3.10]. � We are now ready to state and prove the boundedness of Fourier multipliers that will be needed in establishing our main results. Lemma 2. Let d ≥ 1, s ≥ 0, and 0 < q ≤ ∞ be given. Define mα(ξ) = ei|ξ| 1 ≤ p ≤ ∞ and α ∈ [0, 2], then the Fourier multiplier operator Hmα extends to a bounded operator on Mp,q0,s(Rd). Moreover, If α ∈ {1, 2} and d < p ≤ ∞, then the Fourier multiplier operator Hmα extends to a bounded operator on M 0,s(R Proof. First, we prove the result when 1 ≤ p ≤ ∞, and 0 < q ≤ ∞. Let g ∈ S(Rd) and define χ ∈ S by χ̂ = g2. For f ∈ S, we have |VχHmαf(x, ξ)| mα(t) f̂(t) e 2πix·t χ̂(t− ξ) dt mα(t) Tξg(t) < t > s f̂(t) <ξ> s N < t− ξ > N g(t− ξ) e2πix·t dt mα(t) Tξg(t)φN(ξ, t) ̂< D >s f(t) TξgN(t) dt ∣∣∣∣F mα · Tξg φN(ξ, ·) ̂< D >s f TξgN ∣∣∣∣F(mα · Tξg) ∗ F2(φN(ξ, ·)) ∗ F( ̂< D >s f · TξgN)(−x) ∣∣∣∣, where N > 0 is an integer to be chosen later, gN(t) =< t > N g(t), φN(ξ, t) = s N , and < D > s is the Fourier multiplier defined by ̂< D >s f(ξ) =< ξ >s f̂(ξ). We also denote by Φ2,N (ξ, ·) := F2(φN(ξ, ·)) the Fourier transform in the second variable of φN(ξ, ·) 6 Á. Bényi and K. A. 
Okoudjou We can therefore estimate the weighted modulation norm of Hmαf as follows: ‖Hmαf‖Mp,q0,s |Vχf(x, ξ)|p dx < ξ >qs ∣∣∣∣F(mα · Tξg) ∗ Φ2,N (ξ, ·) ∗ F( ̂< D >s f · TξgN)(−x) ‖F−1(mα · Tξg)‖qL1 ‖Φ2,N(ξ, ·)‖ ‖F( ̂< D >s f · TξgN)‖qLp dξ ≤ sup ‖F−1(mα · Tξg)‖L1 sup ‖Φ2,N(ξ, ·)‖L1 ‖F−1( ̂< D >s f · TξgN)‖qLp dξ ≤ sup ‖F(mα · Tξg)‖L1 sup ‖Φ2,N (ξ, ·)‖L1 ‖f‖Mp,q0,s . Now, it follows from [3, Lemma 8] that, for α ∈ [0, 2], ‖F−1(mα · Tξg)‖L1 := ‖mα‖W (FL1,ℓ∞) < ∞. Moreover (see, for example, [13, Lemma 3.1] or [14, Lemma 2.1]), we can select a sufficiently large N > 0 such that ‖Φ2,N (ξ, ·)‖L1 ≤ |Φ2,N (ξ, x))|dx < ∞. Hence, using (8), we get ‖Hmαf‖Mp,q0,s ≤ Cα‖f‖Mp,q0,s . To prove the second part of the result we shall use Lemma 1. In particular, we need to show that for α ∈ {1, 2} and d < p < 1, mα ∈ W (FLp, ℓ∞). This, however, follows by straightforward adaptations of the proofs of [3, Theorems 9 and 11], which we leave to the interested reader. � In analogy to the proof of the previous lemma, we can prove the following weighted version of [3, Theorem 16]. Lemma 3. Let d ≥ 1, s ≥ 0, d < p ≤ ∞ and 0 < q ≤ ∞ be given, and let m(1)(ξ) = sin(|ξ|) |ξ| and m (2)(ξ) = cos(|ξ|), for ξ ∈ Rd. Then, the Fourier multiplier operators Hm(1) , Hm(2) can be extended as bounded operators on M A “smooth” version of Lemma 3 is obtained by replacing |ξ| with < ξ >. Lemma 4. Let d ≥ 1, s ≥ 0, d < p ≤ ∞ and 0 < q ≤ ∞ be given, and let m(ξ) = ei<ξ>, m(1)(ξ) = sin(<ξ>) and m(2)(ξ) = cos(< ξ >), for ξ ∈ Rd. Then, the Fourier multiplier operators Hm, Hm(1), Hm(2) can be extended as bounded operators on Mp,q0,s. NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 7 Proof. It is clear that m,m(1), m(2) are C∞(Rd) functions and that all their derivatives are bounded. Therefore, m,m(1), m(2) ∈ Cd+1(Rd) ⊂ M∞,1(Rd) ⊂ W (FL1, ℓ∞)(Rd) [8, 11]. Thus, for 1 ≤ p ≤ ∞, and 0 < q ≤ ∞ the result follows from [3] and Lemma 2. 
For $\frac{d}{d+1} < p < 1$ and $0 < q \le \infty$, it can be shown that $m, m^{(1)}, m^{(2)} \in C^{d+1}(\mathbb{R}^d) \subset W(\mathcal{F}L^p, \ell^\infty)(\mathbb{R}^d)$. Indeed, this follows from obvious modifications to the proof of the embedding $C^{d+1}(\mathbb{R}^d) \subset M^{\infty,1}(\mathbb{R}^d) \subset W(\mathcal{F}L^1, \ell^\infty)(\mathbb{R}^d)$ [8, 11]. Furthermore, if we modify, for example, the multiplier $m$ to $m_t(\xi) = e^{it\langle \xi \rangle}$, $t \in \mathbb{R}$, we have for $\frac{d}{d+1} < p \le 1$

(9) $\|m_t\|_{W(\mathcal{F}L^p, \ell^\infty)} \le (1 + |t|)^{d+1},$

and similar estimates hold for the modified multipliers $m^{(1)}_t$ and $m^{(2)}_t$. □

Finally, we state a crucial multilinear estimate that will be used in our proofs. Although the estimate will be needed only in the particular case of a product of functions (see Corollary 1), we present it here in its full generality that applies to multilinear pseudodifferential operators.

An $m$-linear pseudodifferential operator is defined a priori through its (distributional) symbol $\sigma$ to be the mapping $T_\sigma$ from the $m$-fold product of Schwartz spaces $\mathcal{S} \times \cdots \times \mathcal{S}$ into the space $\mathcal{S}'$ of tempered distributions given by the formula

(10) $T_\sigma(u_1, \ldots, u_m)(x) = \int_{\mathbb{R}^{md}} \sigma(x, \xi_1, \ldots, \xi_m)\, \hat{u}_1(\xi_1) \cdots \hat{u}_m(\xi_m)\, e^{2\pi i x \cdot (\xi_1 + \cdots + \xi_m)}\, d\xi_1 \cdots d\xi_m,$

for $u_1, \ldots, u_m \in \mathcal{S}$. The pointwise product $u_1 \cdots u_m$ corresponds to the case $\sigma = 1$.

Lemma 5. If $\sigma \in M^{\infty,1}_{0,s}(\mathbb{R}^{(m+1)d})$, then the $m$-linear pseudodifferential operator $T_\sigma$ defined by (10) extends to a bounded operator from $M^{p_1,q_1}_{0,s} \times \cdots \times M^{p_m,q_m}_{0,s}$ into $M^{p_0,q_0}_{0,s}$ when $\frac{1}{p_1} + \cdots + \frac{1}{p_m} = \frac{1}{p_0}$, $\frac{1}{q_1} + \cdots + \frac{1}{q_m} = m - 1 + \frac{1}{q_0}$, and $0 < p_i \le \infty$, $1 \le q_i \le \infty$ for $0 \le i \le m$.

This result is a slight modification of [4, Theorem 3.1]. Its proof proceeds along the same lines, and therefore it is omitted here. Note that if $\sigma \in M^{\infty,1}_{0,s}$, and we pick $u_1 = \cdots = u_m = u$ (some of them could be equal to $\bar{u}$ since the modulation norm is preserved), $p_1 = \cdots = p_m = mp$, $0 < p \le \infty$, and $q_1 = \cdots = q_m = 1$ we have

(11) $\|T_\sigma(u, \ldots, u)\|_{M^{p,1}_{0,s}} \lesssim \|u\|^m_{M^{mp,1}_{0,s}} \lesssim \|u\|^m_{M^{p,1}_{0,s}},$

where we used the obvious embedding $M^{p,1}_{0,s} \subseteq M^{mp,1}_{0,s}$.
In particular, if we select $\sigma = 1$ (the constant function 1), then $\sigma \in M^{\infty,1}_{0,s} \subset M^{\infty,1}$, and we obtain

Corollary 1. Let $0 < p \le \infty$. If $u \in M^{p,1}_{0,s}$, then $u^m \in M^{p,1}_{0,s}$. Furthermore, $\|u^m\|_{M^{p,1}_{0,s}} \lesssim \|u\|^m_{M^{p,1}_{0,s}}$.

This is of course just a particular case of the more general multilinear estimate

$$\Big\| \prod_{i=1}^m u_i \Big\|_{M^{p_0,q_0}_{0,s}} \lesssim \prod_{i=1}^m \|u_i\|_{M^{p_i,q_i}_{0,s}},$$

where the exponents satisfy the same relations as in Lemma 5.

When we consider the power nonlinearity $f(u) = p_k(u) = \lambda |u|^{2k} u = \lambda u^{k+1} \bar{u}^k$, Corollary 1 becomes

Corollary 2. Let $0 < p \le \infty$. If $u \in M^{p,1}_{0,s}$, then $p_k(u) \in M^{p,1}_{0,s}$. Furthermore, $\|p_k(u)\|_{M^{p,1}_{0,s}} \lesssim \|u\|^{2k+1}_{M^{p,1}_{0,s}}$.

For a different proof of the estimate in Corollary 2, see [1, Corollary 4.2]. It is important to note that the previous estimate allows us to control the exponential nonlinearity $e_\rho$ as well. Indeed, since

$$e_\rho(u) = \lambda \left(e^{\rho |u|^2} - 1\right) u = \sum_{k=1}^{\infty} \frac{\rho^k}{k!}\, p_k(u),$$

if we now apply the modulation norm on both sides and use the triangle inequality, we arrive at

Corollary 3. Let $0 < p \le \infty$. If $u \in M^{p,1}_{0,s}$, then $e_\rho(u) \in M^{p,1}_{0,s}$. Furthermore, $\|e_\rho(u)\|_{M^{p,1}_{0,s}} \lesssim \|u\|_{M^{p,1}_{0,s}} \left(e^{|\rho| \|u\|^2_{M^{p,1}_{0,s}}} - 1\right)$.

3. Proofs of the main results

We are now ready to proceed with the proofs of our main theorems. We will only prove our results for the power nonlinearities $f = p_k$, by making use of Corollary 2. The case of the exponential nonlinearity $f = e_\rho$ is treated similarly, by now employing Corollary 3. In all that follows we assume that $u : [0, T) \times \mathbb{R}^d \to \mathbb{C}$ where $0 < T \le \infty$ and that $f(u) = p_k(u) = \lambda |u|^{2k} u$.

3.1. The nonlinear Schrödinger equation: Proof of Theorem 1.

We start by noting that (1) can be written in the equivalent form

(13) $u(\cdot, t) = S(t) u_0 - i A f(u)$

where

(14) $S(t) = e^{it\Delta}, \quad A = \int_0^t S(t - \tau)\, \cdot\, d\tau.$

Consider now the mapping

$$\mathcal{J} u = S(t) u_0 - i \int_0^t S(t - \tau)(p_k(u))(\tau)\, d\tau.$$

It follows from Lemma 2 (see also [3, Corollary 18]) that

$$\|S(t) u_0\|_{M^{p,1}_{0,s}} \le C (t^2 + 4\pi^2)^{d/4}\, \|u_0\|_{M^{p,1}_{0,s}},$$

where $C$ is a universal constant depending only on $d$.
Therefore,

(15) $\|S(t) u_0\|_{M^{p,1}_{0,s}} \le C_T\, \|u_0\|_{M^{p,1}_{0,s}},$

where $C_T = \sup_{t \in [0,T)} C (t^2 + 4\pi^2)^{d/4}$. Moreover, we have

(16) $\Big\| \int_0^t S(t - \tau)(p_k(u))(\tau)\, d\tau \Big\|_{M^{p,1}_{0,s}} \le \int_0^t \|S(t - \tau)(p_k(u))(\tau)\|_{M^{p,1}_{0,s}}\, d\tau \le T\, C_T \sup_{t \in [0,T]} \|p_k(u)(t)\|_{M^{p,1}_{0,s}}.$

By using now Corollary 2, we can further estimate in (16) to get

(17) $\Big\| \int_0^t S(t - \tau)(p_k(u))(\tau)\, d\tau \Big\|_{M^{p,1}_{0,s}} \lesssim C_T\, T\, \|u(t)\|^{2k+1}_{M^{p,1}_{0,s}}.$

Consequently, using (15) and (17) we have

(18) $\|\mathcal{J} u\|_{C([0,T], M^{p,1}_{0,s})} \le C_T \left( \|u_0\|_{M^{p,1}_{0,s}} + cT\, \|u\|^{2k+1}_{M^{p,1}_{0,s}} \right),$

for some universal positive constant $c$. We are now in the position of using a standard contraction argument to arrive at our result. For completeness, we sketch it here. Let $B_M$ denote the closed ball of radius $M$ centered at the origin in the space $C([0, T], M^{p,1}_{0,s})$. We claim that $\mathcal{J} : B_M \to B_M$, for a carefully chosen $M$. Indeed, if we let $M = 2 C_T \|u_0\|_{M^{p,1}_{0,s}}$ and $u \in B_M$, from (18) we obtain

$$\|\mathcal{J} u\|_{C([0,T], M^{p,1}_{0,s})} \le \frac{M}{2} + c\, C_T\, T\, M^{2k+1}.$$

Now let $T$ be such that $c\, C_T\, T\, M^{2k} \le 1/2$, that is, $T \le \tilde{T}(\|u_0\|_{M^{p,1}_{0,s}})$. We obtain

$$\|\mathcal{J} u\|_{C([0,T], M^{p,1}_{0,s})} \le \frac{M}{2} + \frac{M}{2} = M,$$

that is, $\mathcal{J} u \in B_M$. Furthermore, a similar argument gives

$$\|\mathcal{J} u - \mathcal{J} v\|_{C([0,T], M^{p,1}_{0,s})} \le \frac{1}{2}\, \|u - v\|_{C([0,T], M^{p,1}_{0,s})}.$$

This last estimate follows in particular from the following fact:

$$p_k(u)(\tau) - p_k(v)(\tau) = \lambda (u - v) |u|^{2k}(\tau) + \lambda v (|u|^{2k} - |v|^{2k})(\tau).$$

Therefore, using Banach’s contraction mapping principle, we conclude that $\mathcal{J}$ has a fixed point in $B_M$ which is a solution of (13); this solution can now be extended up to a maximal time $T^*(\|u_0\|_{M^{p,1}_{0,s}})$. The proof is complete.

3.2. The nonlinear wave equation: Proof of Theorem 2.

Equation (2) can be written in the equivalent form

(19) $u(\cdot, t) = \tilde{K}(t) u_0 + K(t) u_1 - B f(u)$

where

(20) $K(t) = \frac{\sin(t \sqrt{-\Delta})}{\sqrt{-\Delta}}, \quad \tilde{K}(t) = \cos(t \sqrt{-\Delta}), \quad B = \int_0^t K(t - \tau)\, \cdot\, d\tau.$

Consider the mapping

$$\mathcal{J} u = \tilde{K}(t) u_0 + K(t) u_1 - B f(u).$$

Recall that $f = p_k$.
If we now use Lemma 3 (see also [3, Corollary 21]) for the first two inequalities below and Corollary 2 for the last estimate, we can write

$$\|\tilde{K}(t) u_0\|_{M^{p,1}_{0,s}} \le C_T \|u_0\|_{M^{p,1}_{0,s}}, \quad \|K(t) u_1\|_{M^{p,1}_{0,s}} \le C_T \|u_1\|_{M^{p,1}_{0,s}}, \quad \|B f(u)\|_{M^{p,1}_{0,s}} \le cT\, C_T\, \|u\|^{2k+1}_{M^{p,1}_{0,s}},$$

where $c$ is some universal positive constant. The constants $T$ and $C_T$ have the same meaning as before. The standard contraction mapping argument applied to $\mathcal{J}$ completes the proof.

3.3. The nonlinear Klein-Gordon equation: Proof of Theorem 3.

The equivalent form of equation (3) is

(22) $u(\cdot, t) = \tilde{K}(t) u_0 + K(t) u_1 + C f(u)$

where now

(23) $K(t) = \frac{\sin\left(t (I - \Delta)^{1/2}\right)}{(I - \Delta)^{1/2}}, \quad \tilde{K}(t) = \cos\left(t (I - \Delta)^{1/2}\right), \quad C = \int_0^t K(t - \tau)\, \cdot\, d\tau.$

Consider the mapping

$$\mathcal{J} u = \tilde{K}(t) u_0 + K(t) u_1 + C f(u).$$

Using Lemma 4 and the notations above, we can write

$$\|\tilde{K}(t) u_0\|_{M^{p,1}_{0,s}} \le C_T \|u_0\|_{M^{p,1}_{0,s}}, \quad \|K(t) u_1\|_{M^{p,1}_{0,s}} \le C_T \|u_1\|_{M^{p,1}_{0,s}}, \quad \|C f(u)\|_{M^{p,1}_{0,s}} \le cT\, C_T\, \|u\|^{2k+1}_{M^{p,1}_{0,s}}.$$

The standard contraction mapping argument applied to $\mathcal{J}$ completes the proof.

References
[1] W. Baoxiang, Z. Lifeng, and G. Boling, Isometric decomposition operators, function spaces $E^\lambda_{p,q}$ and applications to nonlinear evolution equations, J. Funct. Anal. 233 (2006), no. 1, 1–39.
[2] W. Baoxiang and H. Hudzik, The global Cauchy problem for the NLS and NLKG with small rough data, J. Diff. Equations 232 (2007), 36–73.
[3] Á. Bényi, K. Gröchenig, K. A. Okoudjou, and L. G. Rogers, Unimodular Fourier multipliers for modulation spaces, J. Funct. Anal. (2007), to appear.
[4] Á. Bényi, K. Gröchenig, C. Heil, and K. Okoudjou, Modulation spaces and a class of bounded multilinear pseudodifferential operators, J. Operator Theory 54 (2005), no. 2, 387–399.
[5] Y. V. Galperin and S. Samarah, Time-frequency analysis on modulation spaces $M^{p,q}_m$, $0 < p, q \le \infty$, Appl. Comput. Harmon. Anal. 16 (2004), 1–18.
[6] H. G. Feichtinger, Modulation spaces on locally Abelian groups, in: “Proc. Internat. Conf. on Wavelets and Applications” (Radha, R.; Krishna, M.; Thangavelu, S.
eds.), New Delhi Allied Publishers (2003), 1–56.
[7] H. G. Feichtinger and G. Narimani, Fourier multipliers of classical modulation spaces, Appl. Comput. Harmon. Anal. 21 (2006), no. 3, 349–359.
[8] K. Gröchenig, Foundations of Time-Frequency Analysis, Birkhäuser, Boston MA, 2001.
[9] M. Kobayashi, Modulation spaces $M^{p,q}$ for $0 < p, q \le \infty$, J. Func. Spaces Appl. 4 (2006), no. 2, 329–341.
[10] M. Kobayashi, Dual of modulation spaces, J. Func. Spaces Appl., to appear.
[11] K. A. Okoudjou, Embeddings of some classical Banach spaces into the modulation spaces, Proc. Amer. Math. Soc. 132 (2004), no. 6, 1639–1647.
[12] T. Tao, Nonlinear dispersive equations: Local and global analysis, CBMS Regional Conference Series in Mathematics, no. 106, American Mathematical Society, 2006.
[13] J. Toft, Convolutions and embeddings for weighted modulation spaces, Advances in pseudo-differential operators, 165–186, Oper. Theory Adv. Appl. 155, Birkhäuser, Basel, 2004.
[14] J. Toft, Continuity properties for modulation spaces, with applications to pseudo-differential calculus, II, Ann. Global Anal. Geom. 26 (2004), no. 1, 73–106.
[15] H. Triebel, Modulation spaces on the euclidean n-space, Z. Anal. Anwendungen 2 (1983), no. 5, 443–457.

Árpád Bényi, Department of Mathematics, 516 High Street, Western Washington University, Bellingham, WA 98225, USA
E-mail address: arpad.benyi@wwu.edu

Kasso A. Okoudjou, Department of Mathematics, University of Maryland, College Park, MD 20742, USA
E-mail address: kasso@math.umd.edu
ABSTRACT By using tools of time-frequency analysis, we obtain some improved local well-posedness results for the NLS, NLW and NLKG equations with Cauchy data in modulation spaces $M^{p,1}_{0,s}$. <|endoftext|><|startoftext|> Introduction

The arithmetic coding algorithm in its modern version was published in Communications of the ACM in June 1987 [Witten], but the authors, Ian Witten, Radford Neal and John Cleary, referred to [Abrahamson] as “the first reference to what was to become the method of arithmetic coding”. So we may say that it has been known “for more than forty years”. The algorithm is now common knowledge: it has been published in numerous textbooks (see for example [Salomon, Sayood]), several reviews have been published [Bodden, Said], Dr. Dobb’s Journal popularized it [Nelson], the wiki [wiki] contains an article about it, and a lot of sources can be found on the web. So why one more paper on this subject, and what is this “p-adic arithmetic”?

Let us go back to the original idea of arithmetic coding. In arithmetic coding a message is represented as a subinterval [b, e) of the unit semi-interval [0, 1). (We will give all definitions later.) When a new symbol s comes, a new subinterval [b(s), e(s)) of [b, e) is constructed. The common method to calculate a new subinterval is to divide the current interval into |A| subintervals (A is an alphabet, |A| is the number of symbols), where each subinterval represents a symbol from A and has length proportional to the probability of this symbol. For a new symbol s the corresponding subinterval [b(s), e(s)) will be returned by the encoder. Thus encoding is a process of narrowing intervals (we will call them message intervals) starting from the unit interval:

$[0, 1) \equiv [b_0, e_0), [b_1, e_1), [b_2, e_2), \ldots, [b_t, e_t)$

where

$0 = b_0 \le b_1 \le b_2 \le \ldots \le b_t$
$1 = e_0 \ge e_1 \ge e_2 \ge \ldots \ge e_t$

All $b_i$ and $e_i$ are real numbers. The last constructed subinterval may be used as the final output, or alternatively any point x from the last subinterval together with the message length.
But usually a special symbol EOM (End Of Message), which does not belong to the alphabet, is used as the termination symbol of a message. In this case only a point x is needed as the coding result.

Decoding is also a process of narrowing intervals. It starts with the unit interval and a point x inside it. The decoder divides the current interval into |A| subintervals and finds the one that contains the point x, say $[b_1, e_1)$. The symbol $s_1$ corresponding to this interval is pushed into an output buffer, and $[b_1, e_1)$ is used as the new current interval. And so on, until the EOM symbol is received.

4/5/2007

But here is a problem: one has to use infinite precision real numbers to implement this algorithm, and there is no such thing as efficient infinite precision real arithmetic. This problem was always considered a technical one. The solution is simple: just use integers instead. There is a canonical implementation, first written in C [Witten], which was later reproduced in other languages, but no analysis has been published of what happens to the algorithm after moving it from the field of real numbers to the ring of integers.

In this paper we introduce p-adic arithmetic coding, which is based on mapping a message to a path on a p-tree (a tree with p outgoing branches at each vertex; we also assume that p is a prime number). This path is constructed as the common part of the paths to the left and right edges of a subinterval $[g_l(s), g_r(s))$, where $g_l(s)$ and $g_r(s)$ are from a special equidistant grid G on [0, 1). This semi-interval is constructed according to the same rules as in real number arithmetic coding, but in contrast to it, the edges are not arbitrary real numbers, but belong to the grid.

A path on a p-tree can be naturally represented as a p-adic integer. The p-adic distance proves to be a natural measure on paths: the longer the common part of two paths, the smaller the p-adic distance between them. The function ordP, also known as the p-adic logarithm, gives the length of the common path.
A path can also be identified by its final point on a grid. A grid point g can be represented by an integer index k from a finite integer ring as $g = k\,|G|^{-1}$ (here |G| is the number of elements in G). The crucial point of this algorithm is how to calculate a path from an index and, vice versa, an index from a path. The IP (Index-Path) mapping, described in this article, provides an elegant and efficient way to do this.

Now we are ready to give a brief sketch of how the algorithm works. As an initial step we have to define an alphabet A, a model M, an output buffer (it will contain a p-adic integer B) and a grid G on [0, 1). Start with the unit coding semi-interval represented by two indexes l=0 (left) and r=0 (right), and B=0. When a new symbol s comes, the model calculates a new subinterval [l(s), r(s)) (l and r are indexes from a finite integer ring, while l^ and r^ are paths represented as p-adic integers). Using the IP transformation (we use the symbol ^ for this transformation) we can calculate the p-adic representations l(s)^ and r(s)^ of the paths to these edges. If the p-adic distance between them is equal to 1, we continue encoding using [l(s), r(s)) as the new current encoding interval. If the distance is less than 1, then l(s)^ and r(s)^ have a common path of length c = ordP(l(s)^ - r(s)^). This means that the path to any point inside [l(s), r(s)) has the same first (least significant) c digits as l(s)^. We can push this common path to the output buffer, adding its digits as the new most significant part of the p-adic number B. We can also drop the c least significant digits of l(s)^ and r(s)^. Both of these operations are possible because p-adic numbers are read from left to right, i.e. the less significant digits (those that are multiplied by lower powers of p) are in the left part of the buffer. This feature of p-adic integers explains why p-adic arithmetic coding and decoding are incremental. Now we can continue encoding with the truncated l(s)^ and r(s)^.
To do this we must calculate the new subinterval corresponding to the new paths. This can also be done using the IP transformation. Encoding will continue using [(l(s)^)^, (r(s)^)^) as the new current message interval from some grid. We will call this procedure PR rescaling. In the case of p=2 this procedure is similar to the well-known E1/E2 rescaling [Bodden]. But PR rescaling gives better insight into this mechanism, connects it with the p-adic norm and can be used for any prime p. Moreover, PR rescaling is more accurate on boundaries, and because of this the algorithm is able to reproduce Huffman codes for certain models. We will also generalize E3 rescaling, which is based on the usual Archimedean norm (absolute value in this case), to any prime p.

The p-adic arithmetic coding algorithm generalizes not only arithmetic coding. For a special class of models, the p-adic coding algorithm works exactly as Huffman’s algorithm [Huffman]. In these models the weight of each symbol should be equal to $p^{-n}$, where n is a positive integer (possibly different for each symbol), and the sum of all weights is equal to 1. In other words, the symbols are leaves of a Huffman code tree. For a special model and a one-symbol alphabet, p-adic arithmetic coding reproduces Golomb-Rice codes [Golomb, Rice].

Definitions

Alphabet

Alphabet A: a non-empty set of symbols $a_i$. |A| is the number of symbols in A. In most examples below the 4-symbol alphabet [a, b, c, d] will be used. Other examples: the binary alphabet [0, 1], 128-character ASCII, and the alphabet of 256 different eight-bit characters. The last one is used in all tests. Even an alphabet containing only one symbol makes sense: as will be shown, the algorithm creates exactly the Golomb-Rice [Golomb, Rice] codes in this case.

Message

Message M: a sequence of symbols from the alphabet A, $M = (a_0, a_1, \ldots, a_i, \ldots, a_n)$, where each $a_i$ belongs to A. Example: (a, b, a, a, b, c, d, a).

Semi interval [l, r): includes l, but not r. The notation [,) means that the left point is included in the interval, while the right one is not.
Below we always deal with subintervals of [0, 1).

Grid

Later we will subdivide [0, 1) into $P^N$ (P and N are natural numbers) semi-intervals of equal length. Each has length $P^{-N}$ and can be identified by its left edge. These left edge points form a grid $G(P^N)$. The coordinate of the point of the grid with index k is evidently $kP^{-N}$. We will use the notation $g_k(N)$ for points from $G(P^N)$.

Picture 1. Grid $P^2$ (P=2)

These indexes will play an important role in our discussion. In fact, all calculations will be done using indexes. The range of indexes is

$0 \le k < P^N$

In other words, indexes are nonnegative numbers modulo $P^N$. Negative numbers are defined in this ring as

$-k = P^N - k$

Weight interval

Following the main idea of arithmetic coding, let us map the alphabet A to the semi-interval [0, 1), which we will refer to as a weight interval. To do this, enumerate the symbols from A in any order (the order is not important in the sense that the compression rate does not depend on it) and divide the interval into |A| semi-intervals. The semi-interval $[w_i, w_{i+1})$ corresponds to the symbol $a_i$. To make compression effective, the lengths of these intervals must be equal to the probabilities of the symbols in a message:

$|w_i - w_{i+1}| = p_i$

where $p_i$ is the probability of the symbol $a_i$.

Picture 2. Weight interval

To define weight intervals we will use the notation:
{ symbol_0:[semi-interval_0), … , symbol_{|A|-1}:[semi-interval_{|A|-1}) }
For example: {a:[0, 0.5), b:[0.5, 0.75), c:[0.75, 0.875), d:[0.875, 1)}

Message interval

Let us fix a natural number N and a prime number P, and create a grid $G(P^N)$. Messages will be mapped to semi-intervals [l, r) of this interval, where $0 \le l < r < 1$ and l, r belong to $G(P^N)$. Arithmetic coding is just a process of narrowing a message interval. When a new symbol comes, the current message interval is divided into |A| subintervals proportional to the weight interval, and then the subinterval corresponding to the new symbol is selected as the new message interval.
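The narrowing step just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's integer implementation: it uses exact rational arithmetic (`fractions.Fraction`) instead of grid indexes, and the names `WEIGHTS`, `narrow` and `message_interval` are hypothetical helpers introduced here. Because the example weights are powers of 1/2, every edge it produces happens to lie on a grid $G(2^N)$.

```python
from fractions import Fraction

# Weight interval from the example in the text:
# each symbol owns a semi-interval [w_lo, w_hi) of [0, 1).
WEIGHTS = {
    "a": (Fraction(0), Fraction(1, 2)),
    "b": (Fraction(1, 2), Fraction(3, 4)),
    "c": (Fraction(3, 4), Fraction(7, 8)),
    "d": (Fraction(7, 8), Fraction(1)),
}


def narrow(interval, symbol, weights=WEIGHTS):
    """Select the part of `interval` that corresponds to `symbol`."""
    left, right = interval
    width = right - left
    w_lo, w_hi = weights[symbol]
    # Scale the symbol's weight interval into the current message interval.
    return (left + width * w_lo, left + width * w_hi)


def message_interval(message, weights=WEIGHTS):
    """Narrow [0, 1) once per symbol and return the final semi-interval."""
    interval = (Fraction(0), Fraction(1))  # empty message
    for symbol in message:
        interval = narrow(interval, symbol, weights)
    return interval


if __name__ == "__main__":
    for prefix in ("", "a", "ab", "aba"):
        left, right = message_interval(prefix)
        print(f"{prefix!r:6} -> [{left}, {right})")
```

Each call to `narrow` shrinks the interval by the weight of the symbol, so the width of the final interval equals the product of the symbol probabilities in the message.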
Thus, starting with the interval [0, 1) (empty message), we end up with a subinterval corresponding to the whole message. For example, let us see how the short message {a, b, a} may be coded (here we use the weight interval from the previous section):

Picture 3. Message interval

We may also present this in a table (using actual values of $g_k(N)$):

Message | Semi interval | Numerical value
{} | [ $g_0(0)$, $g_0(0)$ ) | [ 0, 1 )
{a} | [ $g_0(1)$, $g_1(1)$ ) | [ 0, 1/2 )
{a, b} | [ $g_2(3)$, $g_3(3)$ ) | [ 2/8, 3/8 )
{a, b, a} | [ $g_4(4)$, $g_5(4)$ ) | [ 4/16, 5/16 )

An important difference from the original idea of real number arithmetic coding is that here we use only points from a grid as subinterval edges.

Coding tree

Consider a tower of grids

$G(P^0) \subset G(P^1) \subset G(P^2) \subset \ldots \subset G(P^n) \subset \ldots$

By construction, if a point belongs to $G(P^n)$, then it belongs to $G(P^{n+1}), G(P^{n+2}), \ldots, G(P^k)$ (k > n). $G(P^0)$ consists of only one point.

Now let us construct a coding tree. Start with the root, which is evidently the only point from $G(P^0)$: $g_0(0)$. Then comes the first level: points from $G(P^1)$. Link the root $g_0(0)$ with the points from $G(P^1)$: $g_0(1), g_1(1), \ldots, g_{P-1}(1)$. This gives us the first level of the coding tree. Now we can continue. Let us assume that the tree is built up to a level n. To create a new level n+1, we have to:

Construct a new grid $G(P^{n+1})$ as a new bottom level (as usual, the tree grows downwards).
Link points from the last level (i.e. points from $G(P^n)$) to points from $G(P^{n+1})$ according to the following rule: $g_k(n)$ links to the points $g_{kP}(n+1), g_{kP+1}(n+1), \ldots, g_{kP+P-1}(n+1)$.

Picture 4. Coding tree

Here, as in most other illustrations, we use P=2 to simplify drawing. Now we can use the grid and the tree to code the simple message {a, b, a} using the weights {a:[0, 1/2), b:[1/2, 3/4), c:[3/4, 7/8), d:[7/8, 1)}

Picture 5.
Paths on coding tree

Message | Semi interval | Path
{} | [ $g_0(0)$, $g_0(0)$ ) | [ { $g_0(0)$ }, { $g_0(0)$ } )
{a} | [ $g_0(1)$, $g_1(1)$ ) | [ { $g_0(0)$, $g_0(1)$ }, { $g_0(0)$, $g_1(1)$ } )
{a, b} | [ $g_2(3)$, $g_3(3)$ ) | [ { $g_0(0)$, $g_0(1)$, $g_1(2)$, $g_2(3)$ }, { $g_0(0)$, $g_0(1)$, $g_1(2)$, $g_3(3)$ } )
{a, b, a} | [ $g_4(4)$, $g_5(4)$ ) | [ { $g_0(0)$, $g_0(1)$, $g_1(2)$, $g_2(3)$, $g_4(4)$ }, { $g_0(0)$, $g_0(1)$, $g_1(2)$, $g_2(3)$, $g_5(4)$ } )

We need a more convenient way to refer to grid points and tree paths. Grid points can be easily represented as indexes, i.e. ordinary nonnegative integers, while for paths we will use p-adic integers.

Representation of paths as p-adic integer numbers

From any point $g_k(n)$ we have P different links to the next level (n+1). We can mark our choice at each level with a nonnegative integer $m_j$:

$0 \le m_j < P, \quad 0 \le j \le n$

Now we can represent any (finite) path on the coding tree as a vector

$M = \{m_0, m_1, m_2, \ldots, m_n\}$

This vector can be mapped to a nonnegative number x:

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \ldots + m_n P^n$

This mapping is evidently one to one. The number x may be considered as a p-adic integer. These numbers are not well known among programmers. One can find an introduction to p-adic numbers in [Baker, Koblitz]. A very helpful way to visualize some unusual properties of p-adic mathematics may be found in [Holly]. The first coefficient $m_0$ tells us to which top level subinterval of [0, 1) the point belongs. The next one, $m_1$, tells us to which subinterval of this interval the point belongs, and so on.

Picture 6. p-adic representation of binary tree.

Level 0 | Paths | {}
Level 1 | Paths | {0} {1}
Level 1 | p-adic number | 0 1
Level 2 | Paths | {0,0} {0,1} {1,0} {1,1}
Level 2 | p-adic number | 0 2 1 3
Level 3 | Paths | {0,0,0} {0,0,1} {0,1,0} {0,1,1} {1,0,0} {1,0,1} {1,1,0} {1,1,1}
Level 3 | p-adic number | 0 4 2 6 1 5 3 7

A tree for P=3 is shown in the next picture.

Picture 6. p-adic representation of a tree; P=3.
Level 0   Paths           {}
Level 1   Paths           {0}    {1}    {2}
          p-adic number   0      1      2
Level 2   Paths           {0,0}  {0,1}  {0,2}  {1,0}  {1,1}  {1,2}  {2,0}  {2,1}  {2,2}
          p-adic number   0      3      6      1      4      7      2      5      8

In our algorithm we will use the p-adic distance. This distance can be defined with the help of the p-adic logarithm function, usually called $\mathrm{ord}_P$:

$\mathrm{ord}_P(x) = \max\{\, r : x \bmod P^r = 0 \,\}, \quad x \ne 0$

The p-adic norm is

$|x|_P = P^{-\mathrm{ord}_P(x)}$ for $x \ne 0$, and $|0|_P = 0$,

and the distance is

$d_P(x, y) = |x - y|_P$

It can be shown [Koblitz] that these are a "real" norm and distance, i.e. all three axioms are valid for them. For two paths x and y

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_n P^n$
$y = k_0 P^0 + k_1 P^1 + k_2 P^2 + \dots + k_n P^n$

$\mathrm{ord}_P(x - y)$ gives the number of common links. $d_P(x, y)$ has a very intuitive meaning: the more links two paths have in common, the closer they are in terms of p-adic distance.

Index representation

Now let us return to the first method of mapping paths: by end points. An end point belongs to a grid $G(P^n)$, so it is defined by its index a, which is just a plain nonnegative integer. How is this index connected to the path leading to this point? Let x be a path

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_n P^n, \quad 0 \le m_j < P, \quad 0 \le j \le n$

Then the path x ends at a point g at level n:

$g = m_0 P^{-1} + m_1 P^{-2} + m_2 P^{-3} + \dots + m_n P^{-n-1}$

We just negate the powers and subtract 1. This can be proved by simple induction. Let a be the index corresponding to point g on the grid:

$g = a P^{-n-1}$
$a = m_0 P^n + m_1 P^{n-1} + m_2 P^{n-2} + \dots + m_n P^0$

We can rewrite a in the usual form

$a = u_0 P^0 + u_1 P^1 + u_2 P^2 + \dots + u_n P^n$

If we consider x and a as integers, then the mapping just reverses the vector of coefficients. We will call this mapping the IP (Index-Path) mapping.

We will introduce some useful notation in the next section and use this feature intensively in the algorithms: we can perform ordinary integer arithmetic on indexes when calculating a new subinterval and immediately get the paths to its edges. Mapping paths to points was considered in a more general form by S.V. Kozyrev [Kozyrev].
Following his notation we will use the symbol ρ for it. The ρ mapping has a very important feature:

$|\rho(x) - \rho(y)| \le |x - y|_P$

where $|\rho(x) - \rho(y)|$ is the Archimedean (our usual) distance, which in our case is the absolute value. This means that if two paths are close to each other, then the corresponding end points are also close in our usual Archimedean norm. A proof of this can be found in [Kozyrev]. The ρ mapping is not one to one, whereas the IP (Index-Path) mapping, which deals with finite sums, is.

Let, as previously, a and g be

$a = m_0 P^n + m_1 P^{n-1} + m_2 P^{n-2} + \dots + m_n P^0$
$g = a P^{-n-1}$

To which level-1 subinterval of [0, 1) does the point belong? It depends on the value of $m_0$ only. The straightforward calculation

$m_1 P^{n-1} + m_2 P^{n-2} + \dots + m_n P^0 \le (P-1)(P^0 + P^1 + P^2 + \dots + P^{n-1}) = (P-1)\frac{P^n - 1}{P-1} = P^n - 1$

shows that the sum of all less significant terms cannot move the point to another subinterval. With $m_0$ fixed, we can conclude that $m_1$ alone defines the subinterval inside that subinterval, and so on.

Let us continue with the example above (with P=2) by adding the indexes of grid points to the table:

Level 0   Paths           {}
Level 1   Paths           {0}      {1}
          p-adic number   0        1
          Index           0        1
          Points          0        1/2
Level 2   Paths           {0,0}    {0,1}    {1,0}    {1,1}
          p-adic number   0        2        1        3
          Index           0        1        2        3
          Points          0        1/4      2/4      3/4
Level 3   Paths           {0,0,0}  {0,0,1}  {0,1,0}  {0,1,1}  {1,0,0}  {1,0,1}  {1,1,0}  {1,1,1}
          p-adic number   0        4        2        6        1        5        3        7
          Index           0        1        2        3        4        5        6        7
          Points          0        1/8      2/8      3/8      4/8      5/8      6/8      7/8

As an illustration let us compare Archimedean and p-adic distances for the following three points: $g_2(3)$ (or {0, 1, 0}), $g_3(3)$ ({0, 1, 1}) and $g_4(3)$ ({1, 0, 0}). The Archimedean distances we are all used to are:

Points (index)    2      3      4
2                 0      1/8    1/4
3                 1/8    0      1/8
4                 1/4    1/8    0

The p-adic logarithm (ord) of the differences has the values (it is not defined on the diagonal):

Points (p-adic)   2      6      1
2                 -      2      0
6                 2      -      0
1                 0      0      -

and the p-adic distances are:

Points (p-adic)   2      6      1
2                 0      1/4    1
6                 1/4    0      1
1                 1      1      0

From this we may observe that the longer the common path, the closer the points are in the p-adic norm.
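The definitions above can be checked with a few lines of code. The following is a minimal Python sketch, not the authors' implementation; the helper names digits, flip, ord_p, norm_p and dist_p are introduced here only for illustration. It assumes that a point of level n is described by exactly n base-P digits, as in the tables above, and it reproduces the p-adic numbers and distances of the three points $g_2(3)$, $g_3(3)$ and $g_4(3)$.

```python
from fractions import Fraction


def digits(x, P, n):
    """Return the n base-P digits of x, least significant first."""
    out = []
    for _ in range(n):
        x, d = divmod(x, P)
        out.append(d)
    return out


def flip(x, P, n):
    """IP (Index-Path) mapping: reverse the n base-P digits of x."""
    reversed_digits = digits(x, P, n)[::-1]
    return sum(d * P**j for j, d in enumerate(reversed_digits))


def ord_p(x, P):
    """Largest r such that P**r divides x; undefined for x == 0."""
    if x == 0:
        raise ValueError("ord is not defined for 0")
    r = 0
    while x % P == 0:
        x //= P
        r += 1
    return r


def norm_p(x, P):
    """p-adic norm |x|_P, with |0|_P = 0."""
    return Fraction(0) if x == 0 else Fraction(1, P ** ord_p(x, P))


def dist_p(x, y, P):
    """p-adic distance d_P(x, y) = |x - y|_P."""
    return norm_p(x - y, P)


P, n = 2, 3
paths = [flip(a, P, n) for a in (2, 3, 4)]  # indexes of g_2(3), g_3(3), g_4(3)
print(paths)                                # [2, 6, 1]
print(dist_p(2, 6, P), dist_p(2, 1, P), dist_p(6, 1, P))
```

Because flip only reverses a fixed number of digits, applying it twice returns the original value, which is why the same function serves for both directions of the IP mapping.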
Operators ^ and []

Let us define the operator ^ to transform points from index representation to path representation, or, in other words, from nonnegative integers modulo $P^N$ to p-adic integers, and back, from path to index representation:

x = a^

It is convenient to rewrite a and x in the form of a scalar product. Consider the (N+1)-element vectors $M_N$ and $P_N$:

$M_N = (m_0, m_1, m_2, \dots, m_N)$, where $0 \le m_j < P$, $0 \le j \le N$
$P_N = (P^0, P^1, P^2, \dots, P^N)$

Then

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_N P^N$

can be represented as the scalar product of two vectors

$x = M_N \cdot P_N^T$

where T, as usual, means matrix transposition (i.e. changing rows to columns). In the same way

$a = m_0 P^N + m_1 P^{N-1} + m_2 P^{N-2} + \dots + m_N P^0$
$a = M_N^R \cdot P_N^T = M_N \cdot (P_N^R)^T$

Here R means reversing the elements of a vector. It is obvious that the operator ^ is an involution, i.e. applying it twice returns the original value:

(x^)^ = a^ = x

The important thing about this trivial operation is that we can perform arithmetic operations on points of a grid and then immediately find the path to the result by applying the operator ^, and vice versa: for a given path we can find the corresponding grid point.

It is convenient to define the operator [] as the coefficient in the scalar representation. Let x be as previously

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_N P^N$

Then

$x[i] = m_i$

It is easy to see that

x[i] = x^[N-i]

Mapping subintervals to paths

Now any subinterval can be mapped to a pair of paths on a coding tree, provided that the edge points of the subinterval belong to some $G(P^N)$. We will use the notation [l, r] for intervals presented as a pair of indexes, where l ≤ r, [l^, r^] for the same interval presented as a pair of paths, and [ , ) for semi intervals.

A simple fact, just to note: if an interval [l, r] lies inside an interval [l1, r1], then

$d_P$(l1^, r1^) ≤ $d_P$(l^, r^)

In other words, the paths to the edges of a subinterval are closer than the paths to the edges of the enveloping interval. So, if the edges of an interval have a common path, then the paths to the edges of any subinterval have a common path of at least the same length. The length of the common path can be calculated as $\mathrm{ord}_P$(r^ - l^).
As an example see Picture 5.

Let us discuss in more detail the rightmost semi interval, i.e. a subinterval which ends at the point 1. This point has index equal to $P^N$. Because we are working in the ring of integers mod $P^N$, the index is equal to 0 in this ring. So the path to 1 has the form {0, 0, …, 0}, and the general form of the rightmost interval is [l, 0]. What is the p-adic length of a rightmost interval? By definition

$d_P(l, 0) = |0 - l|_P = |-l|_P$

and the common length is $\mathrm{ord}_P(-l)$. What is a "negative" path in our case? A path is always a path to some point on a grid. We use indexes for representing points, so a negative path may be defined as the path to the point represented by the negated index:

-l = (-(l^))^

By definition, negative numbers in the ring mod $P^N$ satisfy

l^ + (-(l^)) = 0 mod $P^N$

Common paths

Consider the common part of all paths to points of a semi interval [l, r) (l and r are indexes here); both of them belong to $G(P^N)$. All these paths end at the corresponding points $p_i$:

$l \le p_i \le r-1$

(remember that r does not belong to the subinterval). What is the common path to all these points? First consider the length of the common path. To find it we may first find the maximum p-adic distance among all pairs:

max(|p^ - q^|_P); l ≤ p < q ≤ r-1

We can use the ultrametric feature of the p-adic norm (see, for example, [Koblitz, Holly]):

$|x - y|_P \le \max(|x|_P, |y|_P)$

In our case we can use it as

|p^ - q^|_P = |(p^ - l^) + (l^ - q^)|_P ≤ max(|p^ - l^|_P, |l^ - q^|_P); l ≤ p < q ≤ r-1

So all we need is to find

max(|p_i^ - l^|_P); l < p_i ≤ r-1

which, by construction, is:

|(r-1)^ - l^|_P

Now we can calculate the length of the common path. The special case l = r - 1 is important but trivial: the length here is simply the length of l. If l ≠ r - 1 then it is equal to $\mathrm{ord}_P$((r-1)^ - l^). Because the function $\mathrm{ord}_P$ is not defined for a zero argument, we introduce a function com, defined on p-adic numbers:

com_{P,N}(l, r) = N if ( l == r ) else ord_P(r - l)

If l and r belong to $G(P^N)$, then the length of the common path is calculated as com_{P,N}(l^, r^).
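The function com and the resulting common path can be sketched in Python as follows. This is only an illustration under stated assumptions: the names flip, ord_p, com and common_path are hypothetical helpers, a point of level N is described by N base-P digits, and the index of the right edge is reduced modulo $P^N$ so that r = 0 stands for the point 1.

```python
def flip(x, P, N):
    """Operator ^: reverse the N base-P digits of x (index <-> path)."""
    y = 0
    for _ in range(N):
        x, d = divmod(x, P)
        y = y * P + d
    return y


def ord_p(x, P):
    """Largest r such that P**r divides the non-zero integer x."""
    r = 0
    while x % P == 0:
        x //= P
        r += 1
    return r


def com(lp, rp, P, N):
    """com_{P,N} for two paths given as p-adic integers modulo P**N."""
    diff = (rp - lp) % P**N
    return N if diff == 0 else ord_p(diff, P)


def common_path(l, r, P, N):
    """Digits shared by all paths into the semi interval [l, r) of indexes."""
    lp = flip(l, P, N)
    rp = flip((r - 1) % P**N, P, N)   # r itself is not part of the interval
    n = com(lp, rp, P, N)
    return [(lp // P**j) % P for j in range(n)]


# Level 2, P = 2: the interval [0, 2/4) has common path {0},
# while [0, 3/4) has no common path at all.
print(common_path(0, 2, 2, 2))  # [0]
print(common_path(0, 3, 2, 2))  # []
```

Working modulo $P^N$ inside com keeps the argument of ord_p nonnegative, which matches the ring arithmetic used for the rightmost interval.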
Finally we have: paths to points of a semi interval [l, r) have a common path of length com_{P,N}(l^, (r-1)^). The common path is the sub path of l^ of length com_{P,N}(l^, (r-1)^), starting from the root. The following table shows examples of different intervals of level 2 from Picture 6 and their common paths. The first two columns give l^ and r^ as paths; the remaining columns are integers modulo 4.

l^ (path)  r^ (path)  l   r   r-1   l^   (r-1)^ - l^   com_{2,2}(l^, (r-1)^)   Common path
{0, 0}     {0, 1}     0   1   0     0    0             2                       {0, 0}
{0, 0}     {1, 0}     0   2   1     0    2             1                       {0}
{0, 0}     {1, 1}     0   3   2     0    1             0                       {}
{0, 0}     {0, 0}     0   0   3     0    3             0                       {}
{0, 1}     {1, 0}     1   2   1     2    0             2                       {0, 1}
{0, 1}     {1, 1}     1   3   2     2    3             0                       {}
{0, 1}     {0, 0}     1   0   3     2    1             0                       {}
{1, 0}     {1, 1}     2   3   2     1    0             2                       {1, 0}
{1, 0}     {0, 0}     2   0   3     1    2             1                       {1}
{1, 1}     {0, 0}     3   0   3     3    0             2                       {1, 1}

Rescaling based on P-adic distance (PR)

Consider two paths x = {0, 0, 1} and y = {0, 1, 1} or, as p-adic numbers:

$x = 0 \cdot 2^0 + 0 \cdot 2^1 + 1 \cdot 2^2 = 4$
$y = 0 \cdot 2^0 + 1 \cdot 2^1 + 1 \cdot 2^2 = 6$

or, as grid points,

x^ · $2^{-3}$ = $(1 \cdot 2^0 + 0 \cdot 2^1 + 0 \cdot 2^2)/8 = 1/8$
y^ · $2^{-3}$ = $(1 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2)/8 = 3/8$

Because all subsequent subintervals in the coding process will lie inside [1/8, 3/8) (this is how coding works), all paths to these subintervals will have a common part. We can calculate the common path of x and y according to the procedure described above:

y^ - 1 = $0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 = 2$
(y^ - 1)^ = $0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 = 2$
(y^ - 1)^ - x = 2 - 4 = -2

And finally

com_{2,3}(x, y) = ord_2(-2) = ord_2(2) = 1

We can store this path as a vector of coefficients and proceed with the remaining part. To make further descriptions shorter we introduce two operators: extracting and rescaling.

Extracting is a trivial operator: it creates a vector of the first j coefficients of x:

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_n P^n$
$\mathrm{ext}(x, j) = \{m_0, m_1, m_2, \dots, m_{j-1}\}$

If the second argument is omitted, then all coefficients are extracted:

$\mathrm{ext}(x) = \{m_0, m_1, m_2, \dots, m_n\}$

One more operation on the vector representation: cut(x, n, m) removes m P-bits starting at position n and shrinks the vector of coefficients.
Rescaling is just omitting the first j terms of x and removing the common factor $P^j$:

$\mathrm{res}(x, j) = m_j P^0 + m_{j+1} P^1 + m_{j+2} P^2 + \dots + m_n P^{n-j}$

Why do we call it rescaling? Because we can continue with level n-j as the first level and not care about the previous steps. Let us see what happens with the corresponding index:

res(x, j)^ = $m_j P^{n-j} + m_{j+1} P^{n-j-1} + m_{j+2} P^{n-j-2} + \dots + m_n P^0$

As an integer it is smaller than the original one. Rescaling keeps numbers from growing and makes it possible to use the computer's integer arithmetic (not infinite precision) for calculations, which makes this algorithm robust.

Continuing our example (do not forget to remove the common factor!):

$\mathrm{res}(x, 1) = (0 \cdot 2^1 + 1 \cdot 2^2)/2 = 0 \cdot 2^0 + 1 \cdot 2^1$
$\mathrm{res}(y, 1) = (1 \cdot 2^1 + 1 \cdot 2^2)/2 = 1 \cdot 2^0 + 1 \cdot 2^1$

The indexes will be

res(x, 1)^ = 1
res(y, 1)^ = 3

and the grid points

res(x, 1)^ · $2^{-2}$ = 1/4
res(y, 1)^ · $2^{-2}$ = 3/4

We can show how this works for the weight interval {c:[0, 0.125), b:[0.125, 0.375), d:[0.375, 0.5), a:[0.5, 1)} and the message {b}.

Picture 7. Rescaling

In arithmetic coding the analogous procedure (see [Bodden]) is called E1/E2. We will call this rescaling PR (p-adic rescaling).

A trivial but important case is when an interval occupies a whole subinterval of level K. In this case x^ = y^ - 1 and the interval can be rescaled by its full length to the starting interval [0, 1). This fact will be used later in the discussion of how p-adic coding corresponds to the Huffman algorithm.

Lifting

In all our previous considerations and examples we used grids of minimal level. We may as well fix a level deep enough to perform all calculations. In fact, adding or removing trailing zeros in the path representation does not change the p-adic representation of a point, but, of course, it changes its index representation. We will call the operation of adding or removing zeros lifting. The reason for this name is that on a picture it looks like moving points in the vertical direction.

There are several advantages of using a fixed level in calculations:
• It may be coded more easily and more efficiently, especially for P=2.
• The model may be unable to present results as numbers of the current ring $G(P^N)$; in this case a special procedure must be implemented for changing the level on the model's demand.

Let x be from $G(P^N)$. Lifting is a mapping of x to $G(P^{N+j})$:

$x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_N P^N$
$\mathrm{lift}(x, j) \Rightarrow x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_N P^N + 0 \cdot P^{N+1} + \dots + 0 \cdot P^{N+j}$, where $j \ge 0$

Evidently x does not change as an integer, but as an index it changes dramatically. An important but trivial feature of lifting is that it does not change common paths. Lifting can also be defined for a negative argument:

$\mathrm{lift}(x, -j) \Rightarrow x = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_{N-j} P^{N-j}$, where $j \ge 0$

If the last j coefficients were zero, negative lifting also does not change x as an integer. To use negative lifting without changing results we need to know the order of the last non-zero coefficient:

$\mathrm{lnz}(x) = \min\{\, j : m_k = 0 \text{ for } k > j \,\}$

This function gives us the highest possible level for x. A semi interval [x, y) may be positioned at level

$\mathrm{hpl}(x, y) = \max(\mathrm{lnz}(x), \mathrm{lnz}(y))$

Our procedure for calculating the common path length of a semi interval was defined for intervals at the hpl level. To extend it to the case when an interval belongs to a fixed level we first need to lift it back to hpl:

com_{P,N}(x, y) = N if ( x == y ) else ord_P(x - y)
com_{P,N}(x, y) = com_{P,hpl(x,y)}( lift(x, hpl(x,y) - N), lift(y, hpl(x,y) - N) )

Fortunately we do not have to go into that complication. The reason is that lifting does not change the number of common links. Let us explore the previous example, restricting all calculations to level 4.

Picture 7a.
Rescaling on level 4

Consider the two paths x and y from the previous example, but with the level fixed at 4. On this level x and y can be presented as {0, 0, 1, 0} and {0, 1, 1, 0} or, as p-adic numbers:

$x = 0 \cdot 2^0 + 0 \cdot 2^1 + 1 \cdot 2^2 + 0 \cdot 2^3 = 4$
$y = 0 \cdot 2^0 + 1 \cdot 2^1 + 1 \cdot 2^2 + 0 \cdot 2^3 = 6$

To determine the common path length we need to calculate

y^ - 1 = 5
(y^ - 1)^ = 10
(y^ - 1)^ - x = 6

And finally

com_{2,4}(x, y) = ord_2(6) = 1

After rescaling we have new x and y:

$x = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 = 2$
$y = 1 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 = 3$

But they belong to level 3. To return x and y back to level 4, lifting is needed:

$\mathrm{lift}(x, 1) = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 0 \cdot 2^3$
$\mathrm{lift}(y, 1) = 1 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 0 \cdot 2^3$

Finding the shortest path point

When coding is over we can choose the path to any point of the final semi interval as the result. But points from the same semi interval may have paths of different length after dropping trailing zeros. Let us take a simple example in which a message finally ends with the semi interval [$g_5(4)$, $g_{10}(4)$). Because we can drop trailing zeros, point $g_8(4)$ is the best choice: after dropping trailing zeros it becomes $g_1(1)$.

Picture 8. Shortest path point

A shortest path point in a semi interval [l, r) can be defined as a point with minimum level:

lv(x^) = max(i: $m_i \ne 0$)
g = min(lv(x^): l ≤ x < r)

But let us consider paths as integers. From this point of view a point with a minimal path is just a minimal p-adic integer. So

g = min(x^: l ≤ x < r)

We can check this for our example:

Point           g_5(4)        g_6(4)        g_7(4)        g_8(4)        g_9(4)        g_10(4)
Path            {0, 1, 0, 1}  {0, 1, 1, 0}  {0, 1, 1, 1}  {1, 0, 0, 0}  {1, 0, 0, 1}  {1, 0, 1, 0}
p-adic number   10            6             14            1             9             5
Index           5             6             7             8             9             10

Model

Model is just an abstraction for a set of functions.
One function calculates a new subinterval on the basis of the incoming symbol and the current interval in a predefined grid:

M.code(a, l, r) => lnew, rnew

Another takes a point and the current interval as arguments and returns a new subinterval and a symbol:

M.decode(g, l, r) => lnew, rnew, a

Here l, r and g belong to the grid $G(P^N)$, and a belongs to the alphabet A. The model operates with indexes from the ring of nonnegative integers modulo $P^N$, so there are three possible ways in which one subinterval on the ring can be situated inside another:

$0 \le l \le l_{new} < r_{new} \le r < P^N$
$r = 0;\ 0 \le l \le l_{new} < r_{new} < P^N$
$r = 0;\ r_{new} = 0;\ 0 \le l \le l_{new} < P^N$

And, of course, there are some technical things: initialization and taking care of the end of the message.

M.init(A, P, N, x)

where A is the alphabet, P and N are the characteristics of the grid $G(P^N)$, and x is an optional parameter: some auxiliary information which may be used by the model for optimization. The code and decode functions may update the model, but they must do it in sync.

Input and Output

I (Input) and O (Output) are abstractions for receiving and pushing information. To keep the notation short we introduce an ugly term, P-bit, which means one of the symbols 0, …, P-1. For P=2 it is obviously a normal bit. Now let us describe the input and output operations.

I.getC        => returns the next character from the input stream or EOM (End Of Message)
I.getB(n)     => returns a vector of the next n P-bits from the input stream
O.pushB(U)    pushes all P-bits from vector U
O.pushB(p, n) pushes P-bit p n times
O.pushC(a)    pushes a symbol to the output stream

Algorithms

Now we are in a position to describe the p-adic coding algorithm. The main idea of this algorithm is the same as in arithmetic coding: a message is mapped to an interval in [0, 1). There are two parts of the algorithm, encoding and decoding, but whichever we are doing, the first step is to initialize a model:

M.init(A, N)

Coding

Start with an empty message: no symbols. An empty message is coded as [0, 1), as the empty path U = {}, or as [0, 0):

l, r = 0, 0

When a symbol a comes

a = I.getC

the model calculates a new interval.
l, r = M.code(a, l, r)

Now calculate the common path length

n = com_{P,N}(l^, (r-1)^)

If n > 0 we can push the common path to the output

O.pushB(ext(l^, n))

and do rescaling:

l^, r^ = res(l^, n), res(r^, n)

We also need to lift the rescaled values back to level N and convert them to index representation:

l, r = lift(l^, n)^, lift(r^, n)^

Now we can read the next symbol and repeat the steps.

Pseudo code

M.init(A, P, N)
l, r = 0, 0
while ( ( a = I.getC ) != EOM ) {
    l, r = M.code(a, l, r)
    n = com_{P,N}(l^, (r-1)^)
    if ( n > 0 ) {
        O.pushB(ext(l^, n))
        l, r = lift(res(l^, n), n)^, lift(res(r^, n), n)^
    } //if
} //while
l, r = M.code(EOM, l, r)
q = selectPoint(l, r)
O.pushB(ext(q, lnz(q)))

We do not specify here what selectPoint does. The only requirement is to return a grid point from the final semi interval [l, r), but of course it is a good idea to return a point with the shortest path. As follows from the previous discussion, all we need is to find the minimal integer in p-adic representation. So, to select a point with minimal path we should define

selectPoint(l, r) = min(x^: l ≤ x < r)

lnz is used here in order not to push trailing zeros.

Decoding

Start with an empty message: no symbols. An empty message is coded as [0, 1), as the empty path U = {}, or as a pair of indexes:

l, r = 0, 0

As the first step, read the first N P-bits from the input stream and construct a number from the vector. We also need to transform the path we are getting from the stream into an index, so we use the operator ^:

g = (I.getB(N) • $P_N^T$)^

where $P_N^T$ is the transposed vector $P_N = (P^0, P^1, P^2, \dots, P^N)$. The model calculates a new interval and a symbol a:

M.decode(g, l, r) => l, r, a

Now a is a new decoded symbol and can be pushed into the stream of decoded symbols:

O.pushC(a)

Next, as in the coding algorithm, calculate the common path length

n = com_{P,N}(l^, (r-1)^)

If n > 0 we can drop the common path and do rescaling.
l, r, g = lift(res(l^, n), n)^, lift(res(r^, n), n)^, lift(res(g^, n), n)^

Then read n additional P-bits and recalculate g:

g = g + (I.getB(n) • $P_n^T$)^

Now we can repeat all steps.

Pseudo code

M.init(A, P, N)
l, r = 0, 0
g = (I.getB(N) • $P_N^T$)^
while ( true ) {
    l, r, a = M.decode(g, l, r)
    if ( a == EOM ) break
    O.pushC(a)
    n = com_{P,N}(l^, (r-1)^)
    if ( n > 0 ) {
        l, r, g = lift(res(l^, n), n)^, lift(res(r^, n), n)^, lift(res(g^, n), n)^
        g = g + (I.getB(n) • $P_n^T$)^
    } //if
} //while

One important particular case: Huffman codes

Now we are prepared to show that the p-adic coding algorithm gives exactly the same codes as Huffman's algorithm [Huffman] if the weight interval is prepared in a special way. Let us assume that for a given alphabet and symbol probabilities a Huffman code tree has been constructed. For example:

Symbol (s):               a     b     c     d     e
Codeword (h(s)):          000   001   10    01    11
Grid level (cl(s)):       3     3     2     2     2
Starting index in grid:   0     1     2     1     3

We can map the tree to a weight interval using the same technique as we used for coding messages.

Picture 9. Mapping Huffman code tree to weight interval

After lifting all intervals to the highest grid (N=3):

Symbol (s):                        a     b     c     d     e
Codeword (lift(h(s), N-cl(s))):    000   001   100   010   110
Starting index in grid:            0     1     4     2     6

The algorithm for constructing weight intervals from a Huffman code tree for an alphabet A is simple. Let cl(s) be the length of the Huffman code of symbol s, N = max(cl(s)) among all s from A, and h(s) the Huffman code of s. Then symbol s occupies the semi interval starting at the point with index lift(h(s), N-cl(s))^ and ending at the starting point of the next symbol, or at 1.

The constructed weight interval has an important property: every subinterval occupies a whole grid interval of some level. It was shown above that in this situation the left and right ends have an entire path in common, so PR rescaling will push all of it into the output, and the next symbol will be coded starting with the interval [0, 1).
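This behaviour can be tried out with a small Python sketch of the coding loop with PR rescaling only. It is an illustration under several assumptions rather than the authors' implementation: the model is a static table WEIGHTS built from the lifted Huffman table above, the helper names (to_path, to_index, com, model_code, encode) are hypothetical, the common path length is computed as the common prefix of the paths to l and r-1 (which is what com_{P,N}(l^, (r-1)^) measures), and the EOM symbol and the final selectPoint step are left out.

```python
P, N = 2, 3
SIZE = P ** N

# Weight intervals as level-N indexes [lo, hi); hi == SIZE stands for the point 1.
WEIGHTS = {"a": (0, 1), "b": (1, 2), "d": (2, 4), "c": (4, 6), "e": (6, 8)}


def to_path(a):
    """Path to the grid point with index a: base-P digits, most significant first."""
    return [(a // P ** (N - 1 - j)) % P for j in range(N)]


def to_index(path):
    """Inverse of to_path."""
    return sum(m * P ** (N - 1 - j) for j, m in enumerate(path))


def com(l, r):
    """Common path length of the semi interval [l, r) given by indexes."""
    a, b = to_path(l), to_path((r - 1) % SIZE)
    n = 0
    while n < N and a[n] == b[n]:
        n += 1
    return n


def model_code(sym, l, r):
    """Static model: narrow [l, r) to the part that belongs to sym (r == 0 means 1)."""
    width = (r - l) % SIZE or SIZE
    lo, hi = WEIGHTS[sym]
    return (l + width * lo // SIZE) % SIZE, (l + width * hi // SIZE) % SIZE


def encode(message):
    out, l, r = [], 0, 0                 # [0, 0) stands for the whole interval [0, 1)
    for sym in message:
        l, r = model_code(sym, l, r)
        n = com(l, r)
        if n > 0:
            out += to_path(l)[:n]                    # O.pushB(ext(l^, n))
            l = to_index(to_path(l)[n:] + [0] * n)   # lift(res(l^, n), n)^
            r = to_index(to_path(r)[n:] + [0] * n)
    return "".join(map(str, out))


print(encode("cab"))  # 10000001, i.e. the codewords 10, 000 and 001 concatenated
```

With these weights every symbol interval is a whole grid interval, so after each symbol the coder returns to l = r = 0 and the output is simply the concatenation of the Huffman codewords.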
The argument above shows that for this particular choice of weight interval p-adic coding works identically to Huffman's algorithm.

Another particular case: Golomb-Rice codes

Surprisingly enough, the p-adic coding algorithm produces Golomb-Rice [Golomb, Rice] codes when supplied with a single-symbol alphabet and a special model; no changes to the algorithm itself are needed.

If an alphabet contains only one symbol, the only information a message may contain is its length. So coding a message is equivalent to coding a natural number, the length. We will use the symbol * to identify the only entry. The model is trivial:

M.code(*, l, r) => l, r-1
M.code(EOM, l, r) => r-1, r
M.decode(g, l, r) => if (g == r-1) then l, r, EOM else l, r-1, *

The algorithm will do all the work. Let us start with P=2 and consider a grid $2^{N+1}$. The coding procedure starts with

l = r = 0

If a message is empty, we have to encode EOM. To do this we need to calculate r-1 = 0 - 1, which is $2^{N+1}-1$, and return the path to $2^{N+1}-1$. This path consists of N+1 ones: {1, 1, …, 1, 1}. This is our new representation of zero. If a symbol comes, the model recalculates r and l:

l = 0
r = $2^{N+1}-1$

If it was the only symbol in the message, then the model returns $2^{N+1}-2$, $2^{N+1}-1$, and the code is the path to the point with index $2^{N+1}-2$: {1, 1, …, 1, 0}.

This procedure may be continued while the message's length is less than $2^N$. At that point the model returns

l = 0
r = $2^N$

Because

com_{P,N}(0^, ($2^N$ - 1)^) = 1

PR rescaling will be used; one 0 will be pushed to the output buffer, and l and r return to their initial values l = r = 0. The coder is in its initial state and ready to receive a new symbol.

The encoder stays almost without changes. We only define

selectPoint(l, r) = l

and drop the lnz call in the last pushB operation to keep trailing zeros:

O.pushB(ext(q))

If a message of length W comes, $W/2^N$ (integer division) zeros will be pushed into the output buffer; the rest of the output will contain the path to the point whose index is 0 - (W % $2^N$).
After encoding EOM we have to move the point one step to the left. So finally the index will be 0 - ((W % $2^N$) + 1). For example, for N=3 we have:

W   Code      W    Code
0   1111      8    01111
1   1110      9    01110
2   1101      10   01101
3   1100      11   01100
4   1011      12   01011
5   1010      13   01010
6   1001      14   01001
7   1000      15   01000

The codes look very much like Golomb-Rice codes. Indeed, they may be transformed into each other by replacing 1 with 0 and 0 with 1 (binary NOT). There is no magic in changing the unary representation and the delimiter: there is no difference between counting the number of 0s before the first 1 and counting the number of 1s before the first 0. The transformation of the remaining part, after the delimiter, may be less clear. In the ring of integers modulo $2^N$

$0 - (R + 1) = (2^N - 1) - R$

where R = (W % $2^N$), R < $2^N$. In binary representation ($2^N$ - 1) is a vector U of N ones. Now

NOT(U - R) = R

This proves that after the NOT transformation the rightmost part of the code transforms to W % $2^N$.

Any prime P can be used with this model. But this generalization does not look very promising. In fact, the reason why we discuss Huffman and Golomb-Rice codes here is to emphasize that the most popular entropy codes have a common base: they all map messages to p-adic integers.

Rescaling based on Archimedean distance (AR)

So far we have been careful to select the weight interval most convenient for us:

{a:[0, 0.5), b:[0.5, 0.75), c:[0.75, 0.875), d:[0.875, 1)}

The compression rate does not depend on the order of subintervals, but the calculations and the resulting codes do. Let us shuffle the weight interval:

{b:[0, 0.25), a:[0.25, 0.75), c:[0.75, 0.875), d:[0.875, 1)}

Now the subinterval a:[0.25, 0.75) covers the center point 1/2. Consider a message containing only symbols a. It can easily be shown that the left edge of the message interval will always be less than 1/2, while the right one will always be greater. From the p-adic point of view this means that $\mathrm{ord}_P$(r^ - l^) is always zero, so there is no common path and, as a consequence, rescaling will never happen.
If we continue coding {a, a, …, a} we will end up with an integer overflow error or will be forced to use infinite precision arithmetic. To save our integer arithmetic from huge numbers we have to use the fact that the Archimedean length in this case is less than or equal to 1/2.

Picture 10. Coding {a, a, a}

For P ≠ 2 the situation is more complex. A semi interval can include any grid point of level 1 with index n, 0 < n < P. In the following example (P = 3) an interval has Archimedean length 2/9, but p-adic length 1.

Picture 11. Before rescaling

Now let us explore the case when a subinterval lies in the smallest interval of level 2 which includes a point of level 1 with index n. The p-adic representation of the left edge l and the right edge r of such a subinterval is

l = {n-1, P-1, …}
r = {n, 0, …}

Its Archimedean length is less than or equal to $2/P^2$. We want to map it to a bigger interval, precisely to the interval [n-1, n+1) from level 1. This can be done by a linear transformation:

$Y(X) = X P^1 - n P^0 + n P^{-1}$

Let us consider how a semi interval defined in p-adic representation as

$l = m_0 P^0 + m_1 P^1 + m_2 P^2 + \dots + m_N P^N$
$r = k_0 P^0 + k_1 P^1 + k_2 P^2 + \dots + k_N P^N$

transforms under this mapping. The first thing we need to do is to transform paths to points. We can do it by using the IP transformation:

$a = m_0 P^{-1} + m_1 P^{-2} + m_2 P^{-3} + \dots + m_N P^{-N-1}$
$b = k_0 P^{-1} + k_1 P^{-2} + k_2 P^{-3} + \dots + k_N P^{-N-1}$

Now we can apply the linear transformation:

$Y(a) = (m_0 - n) P^0 + (m_1 + n) P^{-1} + m_2 P^{-2} + \dots + m_N P^{-N}$
$Y(b) = (k_0 - n) P^0 + (k_1 + n) P^{-1} + k_2 P^{-2} + \dots + k_N P^{-N}$

For this subinterval we have:

$m_0 = n - 1;\ m_1 = P - 1$
$k_0 = n;\ k_1 = 0$

$Y(a) = 0 \cdot P^0 + (n - 1) P^{-1} + m_2 P^{-2} + \dots + m_N P^{-N}$
$Y(b) = 0 \cdot P^0 + n P^{-1} + k_2 P^{-2} + \dots + k_N P^{-N}$

Rescaling will drop the first zero terms. Reverting back from points to paths we can find how this transformation works on paths:

$Y(l) = (n - 1) P^0 + m_2 P^1 + \dots + m_N P^{N-1}$
$Y(r) = n P^0 + k_2 P^1 + \dots + k_N P^{N-1}$

Or in vector representation:

Y(l) = {n-1, P-1, …} => {n-1, …}
Y(r) = {n, 0, …} => {n, …}

we just remove the second (counting from the left) elements.
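The effect of Y on paths can be checked numerically. The short Python sketch below is only an illustration: point and Y are hypothetical helper names, paths are plain digit lists {m0, m1, …}, and exact fractions are used to avoid rounding. For P = 3 and n = 1 it confirms that applying Y to the end point of a path of the form {n-1, P-1, …} or {n, 0, …} gives the end point of the same path with its second element removed.

```python
from fractions import Fraction


def point(path, P):
    """End point of the path {m0, m1, ...}: the sum of m_j * P**-(j+1)."""
    return sum(Fraction(m, P ** (j + 1)) for j, m in enumerate(path))


def Y(X, n, P):
    """The linear AR map Y(X) = X*P - n + n/P."""
    return X * P - n + Fraction(n, P)


P, n = 3, 1
l = [n - 1, P - 1, 1, 2]   # left edge  {n-1, P-1, ...}
r = [n, 0, 2, 0]           # right edge {n, 0, ...}

for path in (l, r):
    shortened = path[:1] + path[2:]          # drop the second element
    print(Y(point(path, P), n, P) == point(shortened, P))
```

Both comparisons print True, in agreement with the vector form of Y given above.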
It is also easy to verify that the center point {n, 0, 0, …, 0} of this mapping is a stable point, i.e. Y maps it to itself:

Y: {n, 0, 0, …, 0} => {n, 0, …, 0}

The new interval [l, r) contains the stable point. Coming back to the example (here n=1) we can draw the picture after rescaling:

Picture 11a. After rescaling

We will refer to this rescaling as AR. An important difference between AR and PR rescaling is that AR does not push anything into the output buffer.

It is convenient to introduce a special predicate AR? for testing whether AR rescaling can be applied to an interval:

AR?(l, r, P) = (r[0] - l[0] == 1) AND (l[1] == P-1) AND (r[1] == 0)

To continue coding we must remember the applied mapping. This can be done by storing only two parameters: n, the stable point, and u, the number of times rescaling was applied.

What may happen if we continue coding?
1. [l, r) still contains n
1.1. the value of the AR? predicate is false
1.2. the value of the AR? predicate is true
2. [l, r) does not contain n
2.1. n lies to the right of r; toRight?(n, r) == true
2.2. n lies to the left of l; toLeft?(n, l) == true

To test conditions 2.1 and 2.2 we introduce two predicates, toRight? and toLeft?. These predicates are supposed to receive a path, i.e. a p-adic integer, as the second argument; the first argument is an integer:

toRight?(n, r) = ( r[0] < n ) OR ( r^ == n )
toLeft?(n, l) = l[0] ≥ n

Now let us discuss the situations mentioned above:
1.1. This is the simplest case. We just continue coding.
1.2. Increase u: u = u + 1; do AR rescaling and continue coding.
2.1. This means that the whole interval lies in [{n-1, P-1, …, P-1}, {n, 0, …, 0}), where P-1 is added u times. Any subinterval of this interval has the common path {n-1, P-1, …, P-1}, so we can now push this path into the output and rescale l and r one more time, removing the first digits.
2.2. This means that the whole interval lays in [{n, 0, … , 0, 1}, { n, 0, … , 0,1}); where 0 is added u times.
Any subinterval of this interval has the common path {n, 0, …, 0}, so we can now push this path into the output and rescale l and r one more time, removing the first digits.

The AR and PR rescaling procedures together guarantee that the current coding interval will never be smaller than $2/P^2 - 1/P^N$. This means that the maximum value of indexes is $2P^{N-2} - 1$.

Algorithms revised

Coding with AR

A new feature here, compared to the first variant of the p-adic encoding algorithm, is that we need to track the AR transformation. To do this we introduce two new variables, sp and spn.
• sp: the stable point of AR; it is a point of level 1 and may be represented as a positive integer (not a path), 0 < sp < P.
• spn: the number of times AR was applied.

Some additional operations should be done at the final step. First of all we need to check, as in the main loop, whether the final interval is situated to the left or to the right of the stable point and, if this is the case, do the necessary pushing and then proceed to the usual final search for the minimal point. If not, and spn is not zero, then we are lucky: we already have a point from level 1, and all we need to do is push out sp.
Pseudo code

M.init(A, N)
l, r = 0, 0
sp, spn = 0, 0
while ( ( a = I.getC ) != EOM ) {
    l, r = M.code(a, l, r)
    if ( spn ≠ 0 ) {
        if ( toLeft?(sp, l^) ) {
            O.pushB(sp, 1)
            O.pushB(0, spn)
            l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^
            sp, spn = 0, 0
        } //if
        if ( toRight?(sp, r^) ) {
            O.pushB(sp - 1, 1)
            O.pushB(P - 1, spn)
            l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^
            sp, spn = 0, 0
        } //if
    } //if
    // PR rescaling
    if ( spn == 0 ) {
        n = com_{P,N}(l^, (r - 1)^)
        if ( n > 0 ) {
            O.pushB(ext(l^, n))
            l, r = lift(res(l^, n), n)^, lift(res(r^, n), n)^
        } //if
    } //if
    // AR rescaling
    while ( AR?(l^, r^) ) {
        sp = r^[0] if sp == 0
        spn = spn + 1
        l, r = lift(cut(l^, 1, 1), 1)^, lift(cut(r^, 1, 1), 1)^
    } //while
} //while
l, r = M.code(EOM, l, r)
if ( spn ≠ 0 ) {
    if ( toLeft?(sp, l^) ) {
        O.pushB(sp, 1)
        O.pushB(0, spn)
        l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^
        sp, spn = 0, 0
    } //if
    if ( toRight?(sp, r^) ) {
        O.pushB(sp - 1, 1)
        O.pushB(P - 1, spn)
        l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^
        sp, spn = 0, 0
    } //if
} //if
if ( spn == 0 ) {
    q = selectPoint(l, r)
    O.pushB(ext(q, lnz(q)))
} else {
    O.pushB(sp, 1) // we already have a point of level 1
} //if
// the End

Decoding with AR

AR rescaling is simpler for the decoding process, because we do not care about pushing anything out, and the final step is the simplest possible: we just finish decoding. The only thing which is new is additional reading from the input stream.
Pseudo code

M.init(A, N)
l, r = 0, 0
spn = sp = 0
g = (I.getB(N) • $P_N^T$)^
while ( true ) {
    l, r, a = M.decode(g, l, r)
    if ( a == EOM ) break
    O.pushC(a)
    if ( spn ≠ 0 ) {
        if ( toLeft?(sp, l^) OR toRight?(sp, r^) ) {
            l, r, g = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^, lift(res(g^, 1), 1)^
            g = g + (I.getB(1) • $P_1^T$)^
            sp, spn = 0, 0
        } //if
    } //if
    // PR rescaling
    n = com_{P,N}(l^, (r-1)^)
    if ( n > 0 ) {
        l, r, g = lift(res(l^, n), n)^, lift(res(r^, n), n)^, lift(res(g^, n), n)^
        g = g + (I.getB(n) • $P_n^T$)^
    } //if
    // AR rescaling
    while ( AR?(l^, r^) ) {
        sp = r^[0] if sp == 0
        spn = spn + 1
        l, r, g = lift(cut(l^, 1, 1), 1)^, lift(cut(r^, 1, 1), 1)^, lift(cut(g^, 1, 1), 1)^
        g = g + (I.getB(1) • $P_1^T$)^
    } //while
} //while
// the End

Of course, $P_1^T$ is just 1 and we can also omit the ^ operator. The operation

g = g + (I.getB(1) • $P_1^T$)^

can be replaced (in two places) by

g = g + I.getB(1)

Implementation

We have implemented all algorithms and all tests in Ruby [Ruby], a popular interpreted, dynamically typed, pure object-oriented scripting language. Ruby proved to be very helpful: we would hardly have been able to try so many variants and run so many tests in any other language.

Now let us discuss the practical case of P=2. All of the previous discussion remains valid; this is just a special case. This case has a most important advantage: we can use real bits and binary vectors, which is extremely convenient. All algorithms remain the same. Only a small improvement can be made for AR rescaling. Because the only possible value for n is 1, there is no need to store it as sp. When toLeft? returns true we have to push a 1 and a number of 0s; when toRight? returns true we have to push a 0 and a number of 1s.

Arithmetic coding

We can see that arithmetic coding is just a special case of p-adic coding for P=2. All conditions expressed there as arithmetic operations can be checked at the bit level. In fact, many practical implementations use shifts instead.
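For P = 2 the basic operations reduce to simple bit manipulation. The following Python sketch illustrates this under stated assumptions: indexes are N-bit integers, N = 31 is only an example value, and the names flip, ord2, common_bits and rescale are hypothetical helpers, not part of any published implementation. It uses the fact that the lowest set bit of a difference of two paths coincides with the lowest set bit of their XOR, so the common path length can be read directly from the indexes l and r-1.

```python
N = 31
MASK = (1 << N) - 1


def flip(a):
    """Operator ^ for P = 2: reverse the N low bits of a."""
    return int(format(a, f"0{N}b")[::-1], 2)


def ord2(x):
    """Number of trailing zero bits of a non-zero integer x."""
    return (x & -x).bit_length() - 1


def common_bits(l, r):
    """Common path length of [l, r): equal leading bits of the indexes l and r-1."""
    diff = l ^ ((r - 1) & MASK)
    return N - diff.bit_length()


def rescale(a, n):
    """lift(res(a^, n), n)^ on an index: shift left by n and keep N bits."""
    return (a << n) & MASK


l, r = 0, 1 << (N - 1)          # the interval [0, 1/2)
n = common_bits(l, r)
print(n)                        # 1
print(rescale(l, n), rescale(r, n))
```

Here r rescales to 0, which in the ring modulo $2^N$ stands for the point 1, so the interval [0, 1/2) is mapped back to [0, 1) by a single shift.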
Let us examine the E1 condition:

mHigh < g_Half, where g_Half = 0x40000000

This condition means that the most significant bit in the binary representation of mHigh must be 0. The same is true for mLow, because mLow < mHigh. Returning to paths, we can see that the most significant digits of both mLow and mHigh in the p-adic representation are equal to 0, so the p-adic distance is less than 1 and the PR condition is fulfilled. However, in the p-adic coding algorithm PR rescaling also works for mHigh equal to g_Half. It is this small difference that makes the p-adic coding algorithm work exactly as the Huffman algorithm for certain models. Arithmetic coding in this situation does not provide optimal compression (see the discussion in [Bodden]). AR rescaling is similar to E3. The AR? predicate is equivalent to

(g_FirstQuarter <= mLow) AND (mHigh < g_ThirdQuarter)

We have implemented the same model as proposed in [Bodden] and obtain the same compression for all standard tests.

Results

Standard tests

For testing we used the Calgary/Canterbury text compression corpus, a popular set of tests first discussed in [Bell]. It contains the files bib, book1, book2, geo, news, obj1, obj2, paper1, paper2, paper3, paper4, paper5, paper6, pic, progc, progl, progp and trans. These files may be obtained from [Canterbury corpus].

Comparison with Arithmetic coding

We used the program code published in [Bodden] to obtain the results for arithmetic coding. This program adds 4 extra bytes to the output file; in most examples these 4 bytes are the only difference between the arithmetic and p-adic coding results. In our tests we use p-adic coding with P=2 and N=31.

Conclusion

The tree is a well-known and widely used data structure in computer science. Arithmetic, Huffman and Golomb-Rice coding are likewise well-known algorithms that have been widely used for a long time. p-adic numbers and ultrametric spaces are not so popular in computer science; even in pure mathematics they are relatively new. Is there any connection between them?
We hope that we have shown this connection and that it is quite natural and fundamental. A message, as a sequence of symbols, may be considered as a path on a tree. There are numerous ways to construct this mapping. It is a fundamental and widely used way of representing messages and is very popular in computer science and its applications. On the other hand, trees are great models of p-adic numbers; many strange and unusual features of ultrametric spaces can be understood and visualized on trees [Holly]. This also works in the reverse direction: p-adic numbers are a convenient tool for indexing paths, and the p-adic norm is a natural measure on trees.

A message can also be mapped to a subinterval of the unit interval; this is what the real-number arithmetic coding algorithm does. While theoretically clear and simple, this method was never used in practice because of its inefficiency, which is due to problems with computer-based real arithmetic. Integer arithmetic coding solved this problem by introducing practical recipes for using integers instead of real numbers. The resulting algorithm proved to be efficient and robust, and perhaps because of this no theoretical analysis has been done. The integer version looks much like the original algorithm, but in fact the difference between them is considerable: while the real-number algorithm works over a field, its integer variant deals with a finite ring.

The p-adic coding algorithm explicitly works with numbers from the finite ring of integers modulo P^N. These numbers, when mapped to the unit interval, create an equidistant grid G(P^N). The next step is to create a path from a root through grid points of upper levels (G(P^k); k=0..N-1) to points of this grid. This construction creates a bridge between the ultrametric space of tree paths and the Archimedean space of grid points. Now we can identify any grid point not only by its index, but by a path, i.e.
by some p-adic integer; the reverse is also true: any path can be identified by its end point on the grid, and hence by an index. This dualism is the real basis of the p-adic arithmetic coding algorithm. We found a simple and elegant way to transform paths to indexes and back. We called it the IP transformation. As a transformation from paths to points, the IP transformation can be considered as Kozyrev's transformation for finite paths, but the IP transformation is reversible.

The p-adic arithmetic coding algorithm works as a bridge between two spaces: the ultrametric space of paths and the Archimedean space of grid points. The model calculates intervals with end points on the grid, and the IP transformation then maps them to paths; if these paths are close to each other as p-adic numbers, their common path is pushed to the output buffer, the p-adic numbers are truncated, and the IP transformation maps them back onto the grid.

For P=2, PR rescaling works much like E1/E2 rescaling, but it has one small improvement. It is this improvement that makes it possible to show that for certain models and alphabets the p-adic coding algorithm works as the Huffman and Golomb-Rice algorithms. For P=2 and general models it works as arithmetic coding. So we may say that the three most popular entropy coding algorithms can be considered as special cases of one algorithm, p-adic coding, working with different P, models and alphabets. This also gives an answer to the question at the beginning of this section: the arithmetic, Huffman and Golomb-Rice coding algorithms map messages to the ultrametric space of p-adic numbers. They are “speaking in prose”!

References

1. Abrahamson, N., "Information theory and coding”, McGraw-Hill, New York 1963.
2. Baker A.J., “Introduction to p-adic Numbers and p-adic Analysis”, Department of Mathematics, University of Glasgow G12 8QW, Scotland
3. Bell, T.C., Witten, I.H. and Cleary, J.G., "Modeling for text compression", Computing Surveys 21(4): 557-591; December 1989.
4.
Bodden Eric, Clasen Malte, Kneis Joachim, "Arithmetic Coding Revealed. A guided tour from theory to practice”, Translated and updated version, May, 2001.
5. Canterbury corpus, http://links.uwaterloo.ca/calgary.corpus.html .
6. Golomb S.W., "Run-length encoding”, IEEE Transactions on Information Theory, IT-12:399-401, July 1966.
7. Holly J.E., ”Pictures of Ultrametric Spaces, the p-adic Numbers, and Valued fields”, Amer. Math. Monthly 108 (2001) 721-728
8. Huffman D.A., “A method for construction of minimum-redundancy codes”, Proc. Inst. Radio Eng. 40, 9 (Sept. 1952), 1098-1101
9. Koblitz Neal, "p-adic numbers, p-adic analysis and zeta-functions”, Springer-Verlag, 1977.
10. Koc C.K., "A Tutorial on p-adic Arithmetic”, Electrical & Computer Engineering, Oregon State University, Corvallis, Oregon 97331, April 2002.
11. Kozyrev S.V., “Wavelet theory as p-adic spectral analysis”, Izvestiya: Mathematics 66:2 367-376
12. Moffat A, Neal R. M. and Witten I. H., “Arithmetic coding revisited,” ACM Transactions on Information Systems, vol. 16, no. 3, pp. 256-294, 1998
13. Nelson Mark, "Arithmetic Coding + Statistical Modeling = Data Compression”, Dr. Dobb’s Journal, February, 1991.
14. Rice R.F., "Some Practical Universal Noiseless Coding Techniques”, Technical Report JPL Publication 79-22, JPL, March 1979.
15. Ruby, "Ruby”, http://www.ruby-lang.org/en/
16. Said Amir, “Introduction to Arithmetic Coding – Theory and Practice”, Imaging Systems Laboratory, HP Laboratories, Palo Alto, HPL-2004-76, April 21, 2004
17. Salomon David, "Data compression. The complete reference”, Springer-Verlag, 2004
18. Sayood Khalid, "Introduction to data compression. The complete reference”, Elsevier, 2006
19. wiki, "Arithmetic_coding”, http://en.wikipedia.org/wiki/Arithmetic_coding
20. Witten, I.H., Neal, R. and Cleary, J.G. (1987) “Arithmetic coding for data compression.” Communications of the ACM, 30(6), pp. 520-540, June.
Reprinted in C Gazette 2(3) 4-25, December 1987

ABSTRACT

A new incremental algorithm for data compression is presented. For a sequence of input symbols, the algorithm incrementally constructs a p-adic integer as its output. The decoding process starts with the less significant part of the p-adic integer and incrementally reconstructs the sequence of input symbols. The algorithm is based on certain features of p-adic numbers and the p-adic norm. The p-adic coding algorithm may be considered a generalization of a popular compression technique, arithmetic coding. It is shown that for p = 2 the algorithm works as an integer variant of arithmetic coding; for a special class of models it gives exactly the same codes as Huffman's algorithm; and for another special model and a specific alphabet it gives Golomb-Rice codes.
<|endoftext|><|startoftext|> Compton X-ray and γ-ray Emission from Extended Radio Galaxies

C. C. Cheung¹

Kavli Institute for Particle Astrophysics and Cosmology, Stanford University, Stanford, CA 94305, USA

Abstract. The extended lobes of radio galaxies are examined as sources of X-ray and γ-ray emission via inverse Compton scattering of 3K background photons. The Compton spectra of two exemplary sources, Fornax A and Centaurus A, are estimated using available radio measurements in the range from tens of MHz to tens of GHz. For average lobe magnetic fields of ∼0.3–1 µG, the lobe spectra are predicted to extend into the soft γ-rays, making them likely detectable with the GLAST LAT. If detected, their large angular extents (∼1° and 8°) will make it possible to “image” the radio lobes in γ-rays. Similarly, this process operates in more distant radio galaxies, and the possibility that such systems will be detected as unresolved γ-ray sources with GLAST is briefly considered.
Keywords: gamma-ray sources (astronomical); radiofrequency spectra; imaging; radiogalaxies
PACS: 98.54.Gr, 98.58.Fd

INVERSE COMPTON "IMAGES" OF LARGE RADIO GALAXIES

Inverse Compton (IC) scattering of the CMB is a mandatory process in synchrotron emitting sources. This emission becomes most prominent in regions of weaker B-field like the extended lobes of radio galaxies. Many such IC/CMB lobe X-ray sources are now known (e.g., Croston et al. 2005; Kataoka & Stawarz 2005) and we explore the possibility of the IC spectra extending into the γ-ray band. This is independent of possible γ-ray emission from the unresolved nuclei of radio galaxies, i.e., from the misaligned blazar (Sreekumar et al. 1999; Bai & Lee 2001; Foschini et al. 2005).

The case of the nearby (D=18.6 Mpc) double-lobed radio galaxy Fornax A was discussed in Cheung (2007). Radio flux density measurements down to ∼30 MHz (Isobe et al. 2006) were used to estimate the IC/CMB spectra of the lobes. Normalizing the IC spectra to the X-ray detections of the lobes (which indicate B∼1.5 µG on average; Feigelson et al. 1995, Isobe et al. 2006), the presence of high frequency radio emission observed in the ≳10–90 GHz range with WMAP (with Fν ∝ ν^−1.5) implies a detectable soft γ-ray signal. As this emission is not expected to be time variable, the LAT can simply integrate on this position during its normal scanning mode to test this prediction.

Here, we similarly consider the case of Centaurus A, which is only 3.5 Mpc away. It has long been known to have structure extended over ∼8° in declination (Cooper et al. 1965, and references therein). We use the extensive compilation by Alvarez et al. (2000) of the various components of the radio source; Figure 1 shows a low resolution 408 MHz image from Haslam et al. (1982). The outer (degree-scale) giant lobes (GLs) visible in Figure 1 account for ≳2/3 of the total 408 MHz emission at ∼1000 Jy each; the arcmin-scale inner lobes (ILs) are only 3–4 times fainter than each GL.
The northern GL was searched for such IC emission with ASCA data but the extended X-rays could not be uniquely attributed to such a process (Isobe et al. 2001). Repeating the analysis as for Fornax A, it appears that the extended components of Cen A will also emit γ-rays at a level detectable by GLAST. The various data from ∼10 MHz to 43 GHz are consistent with a single spectral index α=0.7. Since the luminosities of both the ILs and GLs are similar (within ∼20%), only the SEDs of the southern ones are plotted in Figure 1. Utilizing these radio measurements, the expected IC/CMB spectra for example B-field strengths are drawn. The integrated Compton Gamma-Ray Observatory (CGRO) COMPTEL detections of Cen A (Steinle et al. 1998) at ∼10^21 Hz already limit B ≳ 1 µG for both the northern and southern GLs (since they have similar radio spectra); a similar extrapolation for the ILs gives B ≳ 0.3 µG.

Thermal emission will be a complicating factor at energies below ∼10 keV, so hard X-ray and soft γ-ray measurements are better suited for detecting the suspected IC/CMB emission. Additionally, since Fornax A and Cen A are quite extended in the sky (∼1° and 8°), if they are detected with GLAST, the contributions from the two lobes will be separable with the LAT, making IC/CMB γ-ray “images” of these radio galaxies possible. These γ-ray images will appear most similar to radio maps at frequencies ν ≳ 10 GHz; such radio maps of Cen A’s extended components have already been obtained by WMAP (Page et al. 2007, Fig. 2 therein) and are available for this comparison.

[Figure 1, left panel labels: Centaurus A, 408 MHz, 0.85 deg; GL (north), GL (south); horizontal axis Right Ascension (J2000).]

FIGURE 1. [Left] Radio image of Cen A at 0.85° resolution, which is comparable to the angular resolution of GLAST/LAT. [Center] SEDs of the multiple components of the Cen A radio source with lines indicating Fν ∝ ν^−0.7 spectra. The data points at > 10^18 Hz are the integrated detections with CGRO, with lines indicating the expected IC/CMB spectra of the southern giant lobe for different average B-fields. [Right] IC/CMB X-ray and γ-ray flux predictions for 17 of the highest-z radio galaxies discussed in the text. Typical Chandra “snapshots” are 5–10 ksec exposures, so these sources are all expected to be easily detectable in the X-rays for the indicated field strengths. GLAST detections require electrons with γ ≳ 10^5 in a 1 µG or smaller field, which is optimistic.

¹ Jansky Postdoctoral Fellow. The National Radio Astronomy Observatory is operated by Associated Universities, Inc. under a cooperative agreement with the U.S. National Science Foundation.

http://arxiv.org/abs/0704.0835v1

THE HIGHEST-REDSHIFT RADIO GALAXIES

Using the above examples as a guide, we can gauge the feasibility of detecting even more distant radio galaxies at higher energies. Utilizing the recent large compilation of bright z > 2.5 radio sources by Carson et al. (2007), we consider the highest-redshift (z > 3.5) radio galaxies for illustration. The observed monochromatic Compton (X-ray, γ-ray) to synchrotron (radio) flux ratio for IC/CMB emission has a strong redshift dependence: for α=1, it is simply fc/fs ≃ ucmb/uB ≃ 10 (1+z)^4 δ^2 / B_µG^2 (fν ≡ νFν, and δ is the Doppler factor, which is set to 1). We use the NVSS (Condon et al. 1998) 1.4 GHz fluxes for fs. Most of the considered sources (13/17) are detected at 74 MHz in the VLSS database (Cohen et al. 2005), giving α(74 MHz–1.4 GHz) ∼ 0.9–1.2, so the approximate relation is applicable.

As in the nearby sources, these distant radio galaxies are expected to be IC/CMB X-ray sources unless B ≫ 10 µG (Fig. 1). Chandra observations should easily detect this emission to constrain the lobe B-fields, and thus the lobe energetics. In one of the highest-redshift (z = 3.8) radio galaxies observed so far with Chandra (Scharf et al.
2003), it was necessary to remove the contribution from a bright nucleus (spatially) and extended IC emission from other sources of seed photons (by spectral fitting). Such X-ray observations can guide our determination of the expected level of (soft) γ-ray emission from the IC/CMB process; at the moment, the estimates (Fig. 1) are rather crude.

REFERENCES

1. H. Alvarez, J. Aparici, J. May, & P. Reich, Astron. Astrophys. 355, pp. 863–872 (2000).
2. J. M. Bai & M. G. Lee, Astrophys. Journal Lett. 549, pp. L173–L177 (2001).
3. J. E. Carson, T. M. Arias, & C. C. Cheung, in preparation (2007).
4. C. C. Cheung, in The Central Engine of Active Galactic Nuclei, edited by L. C. Ho & J.-M. Wang, ASP Conf. Series, in press, arXiv:astro-ph/0612372 (2007).
5. A. S. Cohen, et al., in From Clark Lake to the Long Wavelength Array: Bill Erickson’s Radio Science, edited by N. Kassim et al., ASP Conf. Series 345, 299–303 (2005).
6. B. F. C. Cooper, R. M. Price, & D. J. Cole, Australian Journal of Physics 18, pp. 589–625 (1965).
7. J. J. Condon, W. D. Cotton, E. W. Greisen, Q. F. Yin, R. A. Perley, G. B. Taylor, & J. J. Broderick, Astron. Journal 115, pp. 1693–1716 (1998).
8. J. H. Croston, et al., Astrophys. Journal 626, pp. 733–747 (2005).
9. E. D. Feigelson, S. A. Laurent-Muehleisen, R. I. Kollgaard, & E. B. Fomalont, Astrophys. Journal Lett. 449, pp. L149–L152 (1995).
10. L. Foschini et al., Astron. Astrophys. 433, pp. 515–518 (2005).
11. C. G. T. Haslam, C. J. Salter, H. Stoffel, & W. E. Wilson, Astron. Astrophys. Suppl. 47, pp. 1–142 (1982).
12. N. Isobe, K. Makishima, M. Tashiro, & H. Kaneda, in Particles and Fields in Radio Galaxies, edited by R. A. Laing & K. M. Blundell, ASP Conf. Series 250, pp. 394–399 (2001).
13. N. Isobe, K. Makishima, M. Tashiro, K. Itoh, N. Iyomoto, I. Takahashi, & H. Kaneda, Astrophys. Journal 645, pp. 256–263 (2006).
14. J. Kataoka, & Ł. Stawarz, Astrophys. Journal 622, pp. 797–810 (2005).
15. L.
Page, et al., Astrophys. Journal, in press (2007).
16. C. Scharf, et al. Astrophys. Journal 596, pp. 105–113 (2003).
17. P. Sreekumar, D. L. Bertsch, R. C. Hartman, P. L. Nolan, & D. J. Thompson, Astroparticle Physics 11, pp. 221–223 (1999).
18. H. Steinle, et al. Astron. Astrophys. 330, pp. 97–107 (1998).

ABSTRACT

The extended lobes of radio galaxies are examined as sources of X-ray and gamma-ray emission via inverse Compton scattering of 3K background photons. The Compton spectra of two exemplary examples, Fornax A and Centaurus A, are estimated using available radio measurements in the ~10's MHz - 10's GHz range. For average lobe magnetic fields of >~0.3-1 micro-G, the lobe spectra are predicted to extend into the soft gamma-rays making them likely detectable with the GLAST LAT. If detected, their large angular extents (~1 deg and 8 deg) will make it possible to ``image'' the radio lobes in gamma-rays. Similarly, this process operates in more distant radio galaxies and the possibility that such systems will be detected as unresolved gamma-ray sources with GLAST is briefly considered.
<|endoftext|><|startoftext|> Introduction

In this paper we construct a new Z-basis for the space of quasisymmetric functions, QSym, and study its properties. For instance, we show that it has nonnegative structure constants, and that it behaves well with respect to the quasisymmetric functions associated to matroids by the Hopf algebra morphism Mat → QSym described by Billera, Jia, and Reiner [3]. We also answer in the affirmative a question regarding rank two matroids posed in [3, Question 7.10], and give an affirmative answer to [3, Question 7.12] in the case of rank two matroids.

In [3], Billera, Jia, and Reiner describe an invariant for matroids in the form of a quasisymmetric function.
They show that the mapping F : Mat → QSym is in fact a morphism of combinatorial Hopf algebras (given a suitable choice of character on Mat; see [1]), where Mat is the Hopf algebra of matroids introduced by Schmitt [15], and studied by Crapo and Schmitt [4], [5], [6], [7]. Billera, Jia, and Reiner show that, while the mapping F is not surjective over integer coefficients, it is surjective over rational coefficients.

http://arxiv.org/abs/0704.0836v2

Our new basis for QSym is “matroid-friendly” in that it reflects the rank of loopless matroids as well as the size of the ground sets: for every 1 ≤ r ≤ n, there is a set N^n_r of basis vectors such that for every loopless matroid M of rank r on an n-element ground set, F(M) ∈ span N^n_r; moreover, QSym decomposes as the direct sum of these subspaces. This provides us with a new product grading of QSym, according to matroid rank r. (The usual grading of QSym by degree corresponds to the size n of the matroid ground set.) Also, as with the monomial and fundamental bases of QSym, for every matroid M, F(M) has nonnegative coefficients in our basis.

The paper has two main parts. The first part (Sections 2–4) presents the new basis and relevant background material. In Section 2, we recount background material from the literature regarding posets and quasisymmetric functions. In Section 3, we present a definition for our new basis for QSym by means of a construction, and highlight several of its important features. There we also prove that it is a Z-basis for QSym. In Section 4, we build necessary machinery regarding computing the quasisymmetric function associated to a labeled poset, in the form of alternative decompositions, and apply these tools to prove that the structure constants of the new basis are nonnegative.

The second part (Sections 5–7) discusses matroids and their quasisymmetric functions.
In Section 5, we recall some of the concepts, terminology, and results from the paper [3], and prove our claims regarding the quasisymmetric functions of matroids vis-a-vis our new basis. In Section 6, we recall the context of [3, Section 7] regarding the relationship between decompositions of the quasisymmetric function associated to a matroid and decompositions of its matroid base polytope, and recall the statement of [3, Question 7.10] regarding the functions associated to rank two matroids. We develop a formula for the quasisymmetric function of a loopless rank two matroid in terms of the new basis, and apply it to show (1) that the morphism F : Mat → QSym distinguishes isomorphism classes of rank two matroids, (2) that the two types of decompositions mirror each other, i.e. an affirmative answer to [3, Question 7.12] for the case of rank two matroids, and (3) that [3, Question 7.10] has an affirmative answer. In Section 7, we make additional observations regarding matroid functions and the new basis. We also compare the new basis with the other QSym bases discussed in Section 10 of [3], and sketch an alternate proof of the surjectivity of the map Mat → QSym over rational coefficients.

2 Preliminaries

In this section we quote certain concepts, terminology, and facts from the literature, as well as establish certain conventions which will be used in the remainder of the paper.

2.1 Compositions

A composition α is a finite sequence of positive integers, i.e. α ∈ P^m for some m ∈ N. The number of parts of α, m, is the length of α, and is denoted by ℓ(α). The weight of α = (α_1, . . . , α_m) is |α| = α_1 + · · · + α_m. Included in our definition is the composition having no parts, which we denote by the (bold font) symbol 0. We have ℓ(0) = |0| = 0, the only composition with these properties. Note that for small examples where individual parts are less than 10, we will often write a composition as a sequence of digits, with no separating commas.
For example, we may write (1, 5, 6, 3, 2, 3) as 156323 when the context is clear. We adopt a similar convention for the one-line notation of permutations in S_n when n < 10.

There is a natural bijection between compositions of weight |α| = n and subsets of [n − 1] (where [n] = {1, 2, 3, . . . , n}), given by (α_1, . . . , α_m) ↔ {α_1, α_1 + α_2, α_1 + α_2 + α_3, . . . , α_1 + · · · + α_{m−1}}. We say that β is a refinement of α, or that β refines α (denoted β ≼ α), if |α| = |β| and A ⊂ B, where A and B are the sets associated to α and β respectively.

To any permutation π ∈ S_n there is an associated composition of weight n, which we denote C(π), whose parts give the lengths of successive increasing runs in the one-line notation of π. For example, for π = 934756218 ∈ S_9, we have C(π) = 13212.

In this paper, we mildly generalize the notion of a permutation to be any sequence of distinct positive integers. Given a set of positive integers X, we let S(X) denote the set of all permutations of all the elements of X. The run length operator C(π) extends to these general permutations in the obvious way. If X and Y are two sets of positive integers of the same cardinality n, then every bijection f : X → Y induces a mapping f : S(X) → S(Y) given by f(x_1, . . . , x_n) = (f(x_1), . . . , f(x_n)). If f is an increasing function, then we have C(f(π)) = C(π) for every π ∈ S(X).

2.2 Well-known QSym bases

The algebra of quasisymmetric functions QSym (or QSym(x) when we want to emphasize the variable set) forms a subring of the power series ring R[[x]], where x = (x_1, x_2, x_3, ...) is a linearly ordered set of variables indexed by the positive integers, and R is a (fixed) commutative ring. In this paper we only deal with the cases where R is either Z or Q, assuming coefficients in Q unless otherwise stated. We often suppress the variables in our notation, writing simply f ∈ QSym rather than f(x) ∈ QSym(x). There are a number of well-known bases for QSym, all indexed by compositions.
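Since these bases are indexed by compositions, it may help to make the conventions of Section 2.1 concrete. The following is a minimal sketch of the composition/subset bijection, the refinement test, and the run-length operator C(π); the function names are hypothetical and compositions are represented as Python tuples, which is an assumption of this sketch rather than anything prescribed by the paper. The relation A ⊂ B is read as (not necessarily proper) containment.

```python
from itertools import accumulate


def composition_to_subset(alpha):
    """Map (a1, ..., am) to {a1, a1+a2, ..., a1+...+a_{m-1}}, a subset of [n-1]."""
    return set(list(accumulate(alpha))[:-1])


def subset_to_composition(subset, n):
    """Inverse map: cut [n] after each element of the subset."""
    if n == 0:
        return ()  # the composition with no parts
    cuts = [0] + sorted(subset) + [n]
    return tuple(b - a for a, b in zip(cuts, cuts[1:]))


def refines(beta, alpha):
    """True when beta refines alpha: equal weights and subset(alpha) contained in subset(beta)."""
    return (sum(alpha) == sum(beta)
            and composition_to_subset(alpha) <= composition_to_subset(beta))


def run_lengths(pi):
    """C(pi): lengths of the successive maximal increasing runs of the sequence pi."""
    if not pi:
        return ()
    runs = [1]
    for prev, cur in zip(pi, pi[1:]):
        if cur > prev:
            runs[-1] += 1      # the current increasing run continues
        else:
            runs.append(1)     # a descent starts a new run
    return tuple(runs)


if __name__ == "__main__":
    print(composition_to_subset((1, 5, 6, 3, 2, 3)))
    print(run_lengths((9, 3, 4, 7, 5, 6, 2, 1, 8)))
```

Applied to π = 934756218, run_lengths reproduces the composition 13212 quoted in Section 2.1.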
(For the two considered here, see [10].) The best-known is the basis of monomial quasisymmetric functions, which here we denote {xα}. Given a composition α with ℓ(α) = k, xα is defined by xα := 1≤i1 y if i is odd.

Lemma 4.2. Let K be an alternating ordered partition of type τ. Then F(P_K) = N_τ.

Proof. Each rank K_i of P_K = K_1 ⊕ · · · ⊕ K_k (where ℓ(τ) = k) is an antichain. Hence F(P_K) depends only on the relative ordering of the elements between adjacent ranks K_i and K_{i+1} (see Remark 2.1). Since K is alternating, we can relabel its elements in each rank as we do in the construction of P_τ (as in Definition 3.1) and still maintain the same relative ordering between elements in adjacent ranks. Thus F(P_K) = F(P_τ) = N_τ.

4.2 Unordered partitions of X ⊂ P

Let T = {T_1, . . . , T_m} be an unordered partition of the set X ⊂ P. We say that an ordered partition K is a refinement of T if K, considered as an unordered partition, is a refinement of T. For every permutation π ∈ S(X), T induces a unique segmentation of π where each segment is contained in a block of T and this segmentation is least (coarsest), with respect to refinement, among all such segmentations. Corresponding to this segmentation there is a unique ordered partition K_T(π), which clearly is a refinement of T. We say that T induces the ordered partition K_T(π) on π.

Example 4.3. Let X = [9], T = { {1, 4}, {2, 6, 8, 9}, {3, 5, 7} }, π = 965412378. Then K_T(π) = ({6, 9}, {5}, {1, 4}, {2}, {3, 7}, {8}).

Let P be a labeled poset, and T an unordered partition of P. Define K_{P,T} to be the set of induced ordered partitions K_{P,T} := {K_T(π) | π ∈ L(P)}. We say that T is antichain-inducing if for every ordered partition K ∈ K_{P,T}, every block K_i of K is an antichain in P.

Lemma 4.4. Let T be an antichain-inducing unordered partition of a labeled poset P. Then

F(P) = Σ_{K ∈ K_{P,T}} F(P_K). (5)

We call this the decomposition of F(P) with respect to T.

Proof.
By Lemma 4.1 it suffices to show that L(P) = ⊔_{K ∈ K_{P,T}} K^{−1}(K). The “⊂”-direction is trivial. Indeed, T induces some ordered partition on every permutation, and by definition K_{P,T} includes all such partitions as permutations range over L(P). Also, clearly K^{−1}(K) ∩ K^{−1}(J) = ∅ if K ≠ J, since K_T is a well-defined map on L(P), and so the union on the right is indeed a disjoint union. For the “⊃”-direction, let K ∈ K_{P,T}. By definition of K_{P,T}, there exists π ∈ L(P) ∩ K^{−1}(K). Let s = s_{τ(K)}(π). Since T is antichain-inducing, the unordered set of elements K_i of each segment s_i is an antichain. It follows that if we form a new permutation π̂ by permuting the elements of s_i arbitrarily within s_i (and thus within π), we must also have that π̂ ∈ L(P). Since this holds true for each segment of s, we have K^{−1}(K) ⊂ L(P).

Remark. In the extreme case where T consists of all singleton sets, K_T(π) is the list of singleton sets in the order specified by π, and K^{−1}(K_T(π)) = {π}. We can identify K_T(π) with π itself, and similarly K_{P,T} with L(P), and the lemma is then equivalent to the formula (1).

4.3 Structure constants for the new basis

Following the notation of [3] and [17], given labeled posets P and Q on sets X and Y respectively, we denote by P + Q any disjoint sum of the posets, constructed as follows. We first form the poset whose set of elements is the disjoint union of the sets of elements of P and Q, retaining all partial order relations of the two posets but adding no new relations. In order to ensure that all labels are distinct, we then relabel the elements in any fashion subject to the restriction that the resulting labels are all distinct and preserve the relative order of labels at all covering relations (see Remark 2.1). While the disjoint sum of the labeled posets is not uniquely defined, all disjoint sums so constructed will have the same quasisymmetric function. It is well-known and easy to prove (see, for example, [10]) that

F(P + Q) = F(P) · F(Q). (6)

We are now in a position to prove the nonnegativity of the structure constants for our new basis.

Theorem 4.5. The quasisymmetric function algebra QSym is graded by the rank of the compositions indexing the basis {N_α}. Furthermore, the structure constants for {N_α} are nonnegative. That is, in the expansion

N_α N_β = Σ_ν c^ν_{α,β} N_ν,

all the constants c^ν_{α,β} are nonnegative integers.

Proof. We first prove the statement regarding structure constants. Since N_0 = 1, the claim holds trivially if α = 0 or β = 0. Thus we assume α = (α_1, . . . , α_s) ≠ 0 and β = (β_1, . . . , β_t) ≠ 0. By (6) we have that

N_α N_β = F(P_α) F(P_β) = F(P_α + P_β). (7)

We write P_α = A_1 ⊕ · · · ⊕ A_s and P_β = B_1 ⊕ · · · ⊕ B_t, and identify the A_i and B_j subsets with their canonical inclusions in P_α + P_β. We form a new poset Q by relabeling the elements of the A_i and B_j subsets while maintaining their ordering relations: first label the even-indexed A_i and B_j in order, with the numbers from [m], where m = |α| + |β| − r(α) − r(β) and r(α) is the rank function from Definition 3.3, then label the odd-indexed A_i and B_j in order, with the numbers from {m + 1, . . . , |α| + |β|}. Since F(P_α + P_β) depends only on the relative ordering of elements between adjacent ranks A_i and A_{i+1} for 1 ≤ i < s and between adjacent ranks B_j and B_{j+1} for 1 ≤ j < t, we have

F(P_α + P_β) = F(Q). (8)

We consider the unordered partition T = {T_1, T_2} of Q given by

T_1 = (⋃_{odd i} A_i) ∪ (⋃_{odd j} B_j), and T_2 = (⋃_{even i} A_i) ∪ (⋃_{even j} B_j).

Note that T is antichain-inducing, so we may apply Lemma 4.4:

F(Q) = Σ_{K ∈ K_{Q,T}} F(P_K). (9)

On the other hand, the labeling of Q implies that every ordered partition K ∈ K_{Q,T} is alternating, so applying Lemma 4.2, we have

F(Q) = Σ_{K ∈ K_{Q,T}} N_{τ(K)}. (10)

Combining Equations (7)–(10) yields the positivity claim. In particular, c^ν_{α,β} = |{K ∈ K_{Q,T} : τ(K) = ν}|.

To prove the statement regarding the grading of QSym by composition rank, we simply note that for every K ∈ K_{Q,T}, we have r(τ(K)) = |T_1| = r(α) + r(β).
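The structure constants in this proof count induced ordered partitions K_T(π), so it may be useful to see how K_T(π) from Section 4.2 can be computed. The sketch below is illustrative only: the function names are hypothetical, blocks are represented as Python sets, and the type τ(K) is assumed here to be the composition of block sizes (|K_1|, . . . , |K_k|).

```python
def induced_ordered_partition(pi, blocks):
    """K_T(pi): the coarsest segmentation of pi in which each segment lies in one block of T.

    pi     -- a sequence of distinct positive integers
    blocks -- the blocks of the unordered partition T, as an iterable of sets
    """
    # Record which block of T each element belongs to.
    block_of = {x: i for i, block in enumerate(blocks) for x in block}
    segments = []
    last_block = None
    for x in pi:
        if segments and block_of[x] == last_block:
            segments[-1].add(x)    # same block as the previous entry: extend the segment
        else:
            segments.append({x})   # block changes: start a new segment
        last_block = block_of[x]
    return segments


def partition_type(ordered_partition):
    """Assumed type tau(K): the composition of block sizes."""
    return tuple(len(block) for block in ordered_partition)


if __name__ == "__main__":
    T = [{1, 4}, {2, 6, 8, 9}, {3, 5, 7}]
    pi = (9, 6, 5, 4, 1, 2, 3, 7, 8)
    K = induced_ordered_partition(pi, T)
    print(K, partition_type(K))
```

Run on the data of Example 4.3, this returns the ordered partition ({6, 9}, {5}, {1, 4}, {2}, {3, 7}, {8}) given there; with T consisting of singletons it returns the entries of π one by one, as in the Remark after Lemma 4.4.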
5 Matroids This section begins the second part of the paper. Here we review some of the concepts, terminology, and results from [3], and prove our claims regarding the quasisymmetric functions of matroids vis-a-vis our new basis. For general background in matroid theory we refer the reader to standard texts such as Oxley’s [14]. We review several of the terms here. The direct sum of matroids M1 and M2, denoted M1⊕M2, has as its ground set the disjoint union E(M1 ⊕M2) = E(M1) ⊔ E(M2), and as its bases B(M1 ⊕M2) = {B1 ⊔B2 : B1 ∈ B(M1), B2 ∈ B(M2)}. A circuit is a minimal dependent set. If we declare two elements of a matroid to be equivalent if and only if they are both contained in some circuit, then the equivalence classes of elements are the components of the matroid. We say that the matroid is connected if it has only one component, and disconnected otherwise. A matroid is the direct sum of its components. 5.1 The quasisymmetric function of a matroid Billera, Jia, and Reiner [3] describe an invariant for isomorphism classes of matroids in the form of a quasisymmetric function. Rather than give the defi- nition from [3], we describe it in terms of a formula which is shown in [3] to be equivalent to the definition. Fix a matroid M , one of its bases B ∈ B(M), and let Bc = E(M)−B (the cobase of B). Define the poset PB on the ground set E(M) where e y. Similarly, a labeled poset is naturally labeled if for all x, y ∈ P , x

3, choose index j such that 1 < j < m − 1. Let a = tj and b = n − tj, and define compositions µ = (a, b), α = (a, λj+1, . . . , λm), and β = (λ1, . . . , λj, b), all of which have weight n. Consider the hyperplane

$$H' = \Big\{ x \in \mathbb{R}^n : \sum_{i=1}^{t_j} x_i = 1 \Big\}.$$

Then H′ ∩ Q(Mλ) = Q(Mµ), giving us a hyperplane split Q(Mλ) = Q(Mα) ∪ Q(Mβ). It follows from the above that F(Mλ) = F(Mα) + F(Mβ) − F(Mµ), and, modulo m², F(Mλ) = F(Mα) + F(Mβ). We can summarize this in the following proposition. The relations given in the proposition remain true even if λ has only two or three parts, but in that case the resulting relations are trivial.

Proposition 6.1. Let λ = (λ1, . . . , λt) be a composition with at least two parts. Let 1 ≤ s < t, $a = \sum_{i=1}^{s} \lambda_i$, and $b = \sum_{i=s+1}^{t} \lambda_i$. Consider compositions α = (a, λs+1, . . . , λt), β = (λ1, . . . , λs, b), and µ = (a, b). We then have F(Mλ) = F(Mα) + F(Mβ) − F(Mµ), and, modulo m², F(Mλ) = F(Mα) + F(Mβ). Moreover, there is a split of matroid base polytopes Q(Mλ) = Q(Mα) ∪ Q(Mβ).

The splitting process can be repeated on the constituent matroid base polytopes until we have decomposed Q(Mλ) into the union of matroid base polytopes of type Q(Mα) where ℓ(α) = 3. Consequently, modulo m², F(Mλ) can be written as a positive sum

$$F(M_\lambda) = \sum_{i} F(M_i),$$

where each Mi is a loopless rank 2 matroid indexed by a partition of length 3. In this setting, Billera, Jia, and Reiner pose the following question [3, Question 7.10]: Fix n and consider the semigroup generated by F(M) within QSymn/m² as one ranges over all matroids M of rank 2 on n elements. Is the Hilbert basis for this semigroup indexed by those M for which λ(M) has exactly 3 parts?

By repeated application of Proposition 6.1, the set {F(Mλ) : ℓ(λ) = 3} generates the semigroup in question, so the point of the question is whether this generating set is minimal, and whether distinct indices yield distinct functions. We prove that this is the case as a corollary of Theorem 6.2.
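As a small consistency check of Proposition 6.1 at the level of bases, the following Python sketch enumerates the bases of a loopless rank two matroid Mλ as the pairs of elements lying in distinct parallelism classes, and verifies that the bases of Mα and Mβ cover those of Mλ and overlap exactly in the bases of Mµ. The function names (`rank_two_bases`, `split_compositions`) and the labeling of the ground set by 0, . . . , n − 1 in the order of the parts are illustrative assumptions, not notation from the paper; the check concerns vertex sets only and says nothing about F itself.

```python
from itertools import combinations


def rank_two_bases(parts):
    """Bases of the loopless rank two matroid whose parallelism classes
    have the given sizes; elements are labeled 0..n-1 in order of the parts."""
    classes = []
    start = 0
    for size in parts:
        classes.append(range(start, start + size))
        start += size
    bases = set()
    # A base consists of two elements from two distinct parallelism classes.
    for class_i, class_j in combinations(classes, 2):
        for e in class_i:
            for f in class_j:
                bases.add(frozenset((e, f)))
    return bases


def split_compositions(parts, s):
    """Return (alpha, beta, mu) as in Proposition 6.1 for a cut after part s."""
    a = sum(parts[:s])
    b = sum(parts[s:])
    alpha = (a,) + tuple(parts[s:])
    beta = tuple(parts[:s]) + (b,)
    mu = (a, b)
    return alpha, beta, mu


if __name__ == "__main__":
    lam = (2, 1, 3, 2)
    alpha, beta, mu = split_compositions(lam, 2)
    B_lam = rank_two_bases(lam)
    B_alpha = rank_two_bases(alpha)
    B_beta = rank_two_bases(beta)
    B_mu = rank_two_bases(mu)
    print(alpha, beta, mu)
    print(len(B_lam), len(B_alpha), len(B_beta), len(B_mu))
    print(B_alpha | B_beta == B_lam, B_alpha & B_beta == B_mu)
```

Because the ground set is labeled in the order of the parts, merging the first s classes (for α) or the last t − s classes (for β) keeps all four matroids on a common ground set, which mirrors the choice of representatives used for the polytope split.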
6.2 Results for rank two matroids In this section, we prove that the morphism F : Mat → QSym distinguishes isomorphism classes of rank two matroids and that decomposability of F (M) for a rank two matroid M implies decomposability of Q(M), as stated in the following theorem. Theorem 6.2. Let λ ⊢ n with ℓ(λ) ≥ 3, and let J be a multiset of partitions of n, all of length three or more, such that F ([Mλ]) = F ([Mµ]), (16) where [Mτ ] denotes the isomorphism class of (loopless) rank two matroids on n elements indexed by the partition τ . Then, taking the set of standard basis vec- tors of Rn as the common ground set, there exists a collection of representative matroids on this ground set, Mλ ∈ [Mλ] and Mµ ∈ [Mµ] for all µ ∈ J which form a decomposition of matroid base polytopes Q(Mλ) = Q(Mµ). (17) Before the main proof of this theorem, we establish some preliminary results. We begin by developing a formula for F (Mλ) in terms of the new basis {Nα}. We define the following quasisymmetric functions in V n2 = span{Nα : |α| = n, r(α) = 2}. For all 1 ≤ k ≤ n− 1 let T nk := N(2,n−2) + k − 1 N(1,j,1,n−2−j), (18) where we understand N(1,j,1,n−2−j) to be N(1,n−2,1) when j = n − 2. We also define quasisymmetric functions Unk := k(n− k)T Note that each of the sets {T nk } and {U k } forms a basis for the subspace V where we consider QSym to have rational coefficients. Lemma 6.3. Let Mλ be the rank two matroid on n elements indexed by the partition λ = (λ1, . . . , λm). Then F (Mλ) = Unλi . (19) Proof. We write c(λi) to denote the parallelism class of elements in Mλ cor- responding to the part λi. A typical base B ∈ B(Mλ) is B = {ei, ej}, where ei ∈ c(λi) and ej ∈ c(λj) are in distinct parallelism classes. The Hasse diagram of PB has two minimal elements, ei and ej. There are edges from ei to all elements of the cobase Bc = E(Mλ) − B except for the λj − 1 elements which are in the same parallelism class c(λj) as ej. 
Similarly, there are edges from ej to all elements of the cobase except for the λi − 1 elements which are in the same parallelism class c(λi) as ei. We can analyze F (PB) as in the proof of Lemma 5.1 by applying a strict labeling γ : E(Mλ) → [n] such that γ(ei) = n, γ(ej) = n − 1, and the cobase elements are arbitrarily labeled with {1, 2, . . . , n − 2}. We take T = {B,Bc} to be our antichain-inducing partition of (PB , γ). There is one induced ordered partition (of [n]) of type (2, n − 2), namely K = (B,Bc), classifying one set of permutations in L(PB , γ), and thus contributing one N(2,n−2) term to the expansion of F (PB). For each 1 ≤ k < λj , and for each k-set A ⊂ B c(λj), there is an induced ordered partition K = ({ej}, A, {ei}, B c −A) of type (1, k, 1, n − 2 − k) contributing a term N(1,k,1,n−2−k) to the expansion. Thus there are such terms N(1,k,1,n−2−k) corresponding to ordered partitions K of type (1, k, 1, n−2−k) withK1 = {ej}. Likewise there are such terms N(1,k,1,n−2−k) corresponding to ordered partitions K of type (1, k, 1, n− 2− k) with K1 = {ei}. All the Nα ∈ N 2 are of one of these types, and we know that the terms of F (PB) must lie in V 2 , so these are the only types appearing in the expansion for F (PB). There can be no other terms than these due to the order relations in PB. Thus F (PB) = N(2,n−2) + λi − 1 λj − 1 N(1,k,1,n−2−k). (20) Using Equation (18), we can rewrite this as F (PB) = T + T nλj . Finally, there are λiλj such bases B ∈ c(λi)× c(λj). Summing over all pairs of parallelism classes of the matroid yields the formula F (Mλ) = λi(n− λi)T Unλi . Next we develop a similar formula for F (Mλ) in QSymn/m 2. Our starting point is the following corollary. Corollary 6.4. Let a and b be positive integers such that a+ b = n. Then ab ·N(1,a−1) ·N(1,b−1) = U a + U Proof. Let λ = (a, b). ThenMλ = U1,a⊕U1,b, where U1,m is the uniform matroid of rank one onm elements. As discussed in Example 5.3, F (U1,m) = mN(1,m−1). 
Therefore by the Hopf algebra morphism, we have F (Mλ) = F (U1,a) · F (U1,b) = aN(1,a−1) · bN(1,b−1). On the other hand, by Lemma 6.3 we have F (Mλ) = U a + U b . Equating right hand sides yields the desired formula. Since QSym with respect to its product structure is graded by composition rank as well as degree, the vector subspace V n2 ∩m 2 is spanned by the vectors {N(1,a−1) ·N(1,b−1) : a+ b = n}. Thus a basis for V n2 ∩ m 2 is {Unk + U n−k : 1 ≤ k ≤ }. For expressing our formula for F (Mλ), we find it convenient to define vectors U k as follows: Unk = Unk if k < 0 if k = n −Unn−k if k > Thus the set {Unk : 1 ≤ k < } forms a basis (over rational coefficients) for V n2 /m 2. We have the immediate corollary of Lemma 6.3: Corollary 6.5. Let Mλ be the rank two matroid on n elements indexed by the partition λ = (λ1, . . . , λm). Then F (Mλ) = Unλi . (22) The next proposition provides a necessary step for the main result, but may be of interest in its own right. Proposition 6.6. Let M2 be the set of matroid isomorphism classes (including those with loops) of rank two matroids. Let Matc be the vector subspace of Mat spanned by the isomorphism classes of connected matroids, and let M2c be the set of matroid isomorphism classes of connected rank two matroids. Then the algebra morphism F : Mat → QSym is injective when restricted to M2, and the induced quotient map of vector spaces F : Matc → QSym/m 2 is injective when restricted to M2c. Proof. We show that we can recover the isomorphism class of the matroid from its respective function. Suppose we are given F (M) for a rank two matroid M . We know that F (M) is a non-zero homogeneous function of degree n = |E(M)|, and so we recover the size of the ground set. Clearly, n ≥ 2. It is possible that M may have loops or coloops. By Lemma 5.4 we can recover the total number s of loops and coloops of M from F (M) by Equation (13). If s = n, then M consists of two coloops and n − 2 loops. 
Otherwise s ≤ n− 2, and we may factor F (M) as F (M) = N(s) · F (M where (s) is the one-part composition of s, and M ′ is the matroid obtained from M by removing all loops and coloops. If now F (M ′) ∈ V n−s1 , we have M ′ ∼= U1,n−s and M has one coloop and s− 1 loops. Otherwise M has s loops, no coloops, F (M ′) ∈ V n−s2 , and M ′ is a loopless rank two matroid on n − s elements. So now without loss of generality, we assume that M has no loops or coloops and thus is isomorphic to Mλ for some λ ⊢ n. We expand F (M) as F (M) = k . (23) This expansion can be determined since the set of {Unk } form a basis of V 2 . Per Lemma 6.3, for each k, the coefficient tk is the number of parts of λ that are equal to k, and so we recover λ from F (Mλ). The argument for recoveringM from F (M) for a connected rank two matroid M is similar. Since M is connected, it has no loops or coloops, and so again M is isomorphic to Mλ for some λ ⊢ n with ℓ(λ) ≥ 3, where n is the degree of F (M). We expand F (M) = ⌊(n−1)/2⌋∑ k . (24) This expansion can be determined since the set {Unk : 1 ≤ k < } forms a basis for the subspace V n2 /m 2. Note that λ cannot have a pair of parts with values k and n − k. Using this fact together with Corollary 6.5, we see that if the coefficient tk is nonnegative, then λ has exactly tk parts with value k. From this we can determine all the parts of λ which are < n . Since λ cannot have more than one part ≥ n , this allows us to determine the remaining part of λ, if Proof of Theorem 6.2. We write A⊔B to denote the disjoint union of multisets A and B. Note that a partition may be considered to be a multiset of integers. We fix n > 2 and λ ⊢ n with ℓ(λ) ≥ 3, and proceed by induction on |J |. The base case |J | = 1 follows from Proposition 6.6. So we assume that the statement holds for |J | < m for some fixed m > 1. Suppose now that F ([Mλ]) = F ([Mµ]), (25) where |J | = m. 
Say that a pair of elements µ, ν ∈ J are matching if for some value 1 < k < n − 1 we have k ∈ µ and n − k ∈ ν. If µ, ν are a matching pair, then we can apply Proposition 6.1 to form a new relation of type (25) by replacing J with J ′ = (J − {µ, ν}) ⊔ {τ}, where τ = (µ ⊔ ν) − {k, n − k}. At the same time, Proposition 6.1 tells us that we also have a decomposition of base polytopes Q(Mτ ) = Q(Mµ) ∪ Q(Mν). Since |J ′| < m, we can apply our induction hypothesis, and we are done. It remains to show that there exists a matching pair in J . For a partition τ ⊢ n, define the multiset g(τ) = {τi : τi > 1, τi 6= Define multisets L = g(λ) and R = µ∈J g(µ). Per Corollary 6.5 we expand F (Mλ) = ℓ(λ)∑ Unλi , and we similarly expand each F (Mµ) on the right hand side of (25). Since the set {Unk : 1 ≤ k < } forms a basis for V n2 /m 2, with Unk = −U n−k, we conclude that L ⊆ R and that the parts in R − L can be matched into complementary pairs of the form (k, n− k). Since no partition in J can contain both parts of a complementary pair, there exists a matching pair in J if R− L 6= ∅. We are assuming that |J | ≥ 2, and that R and L contain all the parts not equal to n or 1 on the respective sides of (25). The parts equal to 1 on both sides must match since all of the partitions have at least three parts and hence no part equal to (n− 1). The only way to have R−L = ∅ is if there exist µ, ν ∈ J each of which contains a part equal to n , in which case they are matching. Thus in all cases, there exists a matching pair µ, ν ∈ J , and the result follows by induction. Now we can give an affirmative answer to [3, Question 7.10]. Corollary 6.7. For a fixed n, the Hilbert basis for the semigroup in QSym/m2 generated by the set S = {F (Mλ) : λ ⊢ n, ℓ(λ) ≥ 3} is indexed by those Mλ for which ℓ(λ) = 3. Proof. Let T = {F (Mλ) : λ ⊢ n, ℓ(λ) = 3}. It follows from Proposition 6.1 that for ℓ(λ) > 3, F (Mλ) is decomposable into a sum µ F (Mµ), where for all µ, ℓ(µ) < ℓ(λ). 
Hence T generates the same semigroup as S. As noted in [3, Section 7], if ℓ(λ) = 3, then Q(Mλ) is indecomposable. Theorem 6.2 then implies that F(Mλ) must also be indecomposable, so T is the minimal generating set, i.e., the Hilbert basis of the semigroup. By Proposition 6.6, distinct indexing partitions yield distinct images, establishing the claim.

7 Additional observations

In this section we discuss additional aspects of our new basis, especially regarding the expansion of F(M) for a matroid M.

7.1 Matroid duality, loops, and coloops

Although we describe the basis {Nα} as ‘matroid-friendly’, things are slightly less friendly when considering matroid duality in the presence of coloops. This is due to the fact, mentioned in Section 5.1, that the mapping F : Mat → QSym factors through the quotient Mat → Mat/∼ → QSym, where ∼ denotes loop-coloop equivalence. For example, a fact proved in [3] is that, for any matroid M, in terms of the monomial basis for QSym (written here as {M_α}, with coefficients c_α) the following relationship holds:

$$F(M) = \sum_{\alpha} c_\alpha M_\alpha \;\Longrightarrow\; F(M^*) = \sum_{\alpha} c_\alpha M_{\alpha^*}, \qquad (26)$$

where α∗ is the reversal of α, obtained by writing the parts of α in reverse order. If M is a matroid of rank r on n elements having no loops or coloops, then we have the analogous relationship

$$F(M) = \sum_{\alpha} m_\alpha N_\alpha \;\Longrightarrow\; F(M^*) = \sum_{\alpha} m_\alpha N_{\alpha^*}. \qquad (27)$$

However, this relationship breaks down if M has loops or coloops. We showed in Theorem 5.2 that if M is a loopless matroid of rank r on n elements, then $F(M) \in V^n_r$. More generally, if M is of rank r on n elements and has exactly ℓ loops, then $F(M) \in V^n_{r+\ell}$. Thus if M has exactly c coloops, then we have the duality relationship $F(M) \in V^n_r \Longrightarrow F(M^*) \in V^n_{n-r+c}$.

7.2 Comultiplication

The matroid Hopf algebra is graded by matroid rank as well as ground set size. Let $W^n_r$ be the subspace of Mat spanned by the classes of matroids of rank r on n elements. Then $W^n_r \cdot W^m_s \subset W^{n+m}_{r+s}$. For any matroid M and A ⊆ E(M), r(M) = r(M|A) + r(M/A). So comultiplication in Mat also respects these gradings.
(For general background on Hopf algebras, see [8].) That is, ∆Wnr ⊆ a+b=n, s+t=r W as ⊗W . (28) One might wonder whether the standard comultiplication of the Hopf algebra QSym respects the grading by the rank function for our new basis, that is, whether ∆V nr ⊆ a+b=n, s+t=r V as ⊗ V . (29) This is not the case. For the simplest example, consider n = 2 and r = 1. We have N 00 = {N0} = {1} and N 1 = {N11} = {x 11}. Note that there is no Nm0 (or rather, Nm0 = ∅) for m > 0. The basis vectors corresponding to the right hand side of (29) are N11 ⊗N0 = x 11 ⊗ 1, and N0 ⊗N11 = 1⊗ x However, ∆N11 = ∆x 11 = x11 ⊗ 1 + x1 ⊗ x1 + 1⊗ x11, which clearly does not lie in the span of the above vectors. The failure of the comultiplication to respect the rank grading can be viewed as another artifact of loop-coloop equivalence under the morphism F , as evi- denced by the fact that the rank grading is respected by comultiplication in the quotient space corresponding to matroids with neither loops nor coloops. Let J ⊂ QSym be the ideal generated by degree one elements, i.e. by {N1} = {x Similarly, let I ⊂ Mat be the ideal generated by degree one elements, i.e. by {[U0,1], [U1,1]}. Both I and J are Hopf ideals in their respective Hopf algebras, hence Mat/I and QSym/J (with their naturally induced comultiplications) are Hopf algebras. Moreover, I = F−1(J), so F : Mat → QSym induces a sur- jective Hopf algebra morphism Mat/I → QSym/J . Note that a natural basis for Mat/I is the set of all matroid isomorphism classes that have neither loops nor coloops, while a natural basis for QSym/J is {Nα : ℓ(α) is even}. Taking appropriate images under the quotient map, the relation (29) holds in QSym/J . The duality formula (27) also holds in QSym/J . 7.3 Comparison with other QSym bases In the course of their proof in Section 10 of [3], the authors introduce two new Z-bases for QSym. They also compare their bases to another Z-basis due to Stanley [19]. 
Our new basis is different from these three, as evidenced by the report by those authors that all three of these bases have some negative structure con- stants, whereas our new basis does not. However, of the three, ours most closely resembles that of Stanley. Stanley’s basis element indexed by a composition α = (α1, . . . , αm) is F (P ) where, as with our basis, P = A1 ⊕ · · · ⊕ Am, is the ordered sum of antichains A1, · · · , Am on α1, . . . , αm elements respectively. However, Stanley applies a natural labeling to P , whereas we apply an alter- nating labeling to the ranks in the poset for our basis. 7.4 Surjectivity of the Hopf algebra morphism Billera, Jia, and Reiner devote [3, Section 10] to showing that the morphism F : Mat → QSym is surjective over rational coefficients. In this subsection we sketch one way to shorten their proof somewhat using our new basis. The reader will need to consult [3] to have the full context. Define an ordering on compositions as follows. To each composition α we assign the binary word b(α) that begins with α1 zeros followed by α2 ones, then α3 zeros, then α4 ones, etc. We then linearly order compositions according to their binary words: α < β if b(α) <|startoftext|> Bremsstrahlung Radiation At a Vacuum Bubble Wall Jae-Weon Lee∗ School of Computational Sciences, Korea Institute for Advanced Study, 207-43 Cheongnyangni 2-dong, Dongdaemun-gu, Seoul 130-722, Korea Kyungsub Kim and Chul H. Lee Department of Physics, Hanyang University, Seoul 133-791, Korea Ji-ho Jang Korea Atomic Energy Research Institute Yuseong, Daejeon 305-353, Korea When charged particles collide with a vacuum bubble, they can radiate strong electromagnetic waves due to rapid deceleration. Owing to the energy loss of the particles by this bremsstrahlung radiation, there is a non-negligible damping pressure acting on the bubble wall even when thermal equilibrium is maintained. 
In the non-relativistic region, this pressure is proportional to the velocity of the wall and could have influenced the bubble dynamics in the early universe.

PACS numbers: 12.15.Ji, 98.80.Cq

There have been many studies on the cosmological roles of first-order phase transitions, which proceed by the nucleation and collision of vacuum bubbles[1]. For example, in electroweak baryogenesis models[2] rapid bubble expansion can provide a non-equilibrium environment, which may result in an asymmetry between matter and antimatter. Furthermore, in some inflationary models[3, 4, 5], the speed of expanding vacuum bubbles determines how long the inflation period lasts. To understand the bubble kinematics in a hot plasma, it is important to study particle scattering at a moving bubble wall. To calculate the velocity of electroweak bubbles[6, 7, 8, 9, 10, 11] and the CP-violating charge transport rate by the wall available for baryogenesis[2], one should know the reaction force acting on the wall due to the scattered particles, such as quarks and gauge bosons[12, 13]. (For a supersymmetric model see, for example, Ref. 14.)

At a first-order cosmological phase transition, the false vacuum decays to the true vacuum, which has lower energy, by making a vacuum bubble. When it is created, the wall of the bubble is at rest. As the free energy difference between the inner and the outer parts of the bubble fuels the wall, the velocity of the wall increases toward the speed of light unless there is a damping force. In the literature, it is generally believed that a non-trivial damping force is caused by a deviation of the particle population from the thermal equilibrium one. In this paper, we study the effect of bremsstrahlung radiation emitted by particles on the pressure acting on a bubble wall (not necessarily an electroweak bubble) during cosmological first-order phase transitions.
The aim of this work is to show that, contrary to the usual arguments, the radiation damping could give a non-negligible pressure even when the particles maintain thermal equilibrium. Bremsstrahlung (braking radiation) is a radiation due to the acceleration or deceleration of a charged particle[15]. Entering a true vacuum through a bubble wall, particles interact with the wall and could acquire mass and be decelerated. For example, a fermion field ψ can get mass through the well-known Yukawa term gψ̄φψ = mψ̄ψ, where φ is a Higgs field. At this time, if the particle is charged electromagnetically, it can radiate strong electromagnetic waves due to the deceleration. Let us calculate the pressure from the scattering. For simplicity, we assume a linear profile for the bubble wall, i.e., gφ(x) ≡ m(x) = m0x/d when 0 < x < d. (See Fig.1.) and choose the coordinates of the rest frame of the bubble wall. ∗Electronic address: scikid@kias.re.kr http://arxiv.org/abs/0704.0837v1 mailto:scikid@kias.re.kr This approximation is good for the usual tanh profile of the wall. The radiation power of an accelerated particle is given by a relativistic version of the Larmor’s formula[16]: dErad , (1) where A = 2e2/3c3 ≃ 0.0611 in the natural units (~ = c = k = 1) and ~k = (kx, ky, kz) is the 3-momentum of a particle. We assume a situation where this classical description of bremsstrahlung is good enough. Also, assuming that the wall is planar and parallel to the y-z plane, we can treat the bubble as a 1-dimensional one along the x-axis. The energy, momentum, and mass of the particle satisfy the usual relation E2 ≡ m2(x) + ~k2(x). (2) Let us denote the x-component of the momentum (kx) as k from now on. Differentiating the above equation with time t and using dx/dt ≡ v and k = Ev, we get the force acting on the wall due to the particles , (3) which is the starting point of the pressure calculation[6]. 
However, if we also consider the energy carried away by the radiation Erad, then the total energy conserved is Etot ≡ E+Erad and the force and, hence, the pressure should be changed. From dEtot/dt = 0, we obtain = 0, (4) which has a solution for the force . (5) Up to O(A), one can expand the square root term and obtain . (6) The second term represents the radiation damping. Then, the total pressure due to the collisions of the particles in the plasma is given by[6] (2π)3 f(E(k)) , (7) where f(E) = (exp(βE) ± 1)−1 is a distribution function of fermions and bosons, respectively. First, let us briefly review the well-known results without radiation damping. When the mean velocity of the plasma fluid V relative to the wall (or the negative of the bubble wall velocity relative to the fluid ) is zero, the first term of Eq. (6) contributes dm2(x) (2π)3 eβE ± 1 = F (m0, T )− F (0, T ), (8) where F (φ, T ) is a free energy of φ at a temperature T = β−1. When V 6= 0, the distribution function is changed to f [γ(E − V k)] = eβγ(E−V k) ± 1 . (9) Here, γ = (1−V 2)− 2 . However, using the fact that the phase factor d3~k/E is a Lorentz invariant and changing the integration variable to k′ = γ(k − V E) and defining E′ ≡ γ(E − V k), one can find that the V dependency of P1 disappears [6]. From this, it is generally believed that to get non-trivial pressure on the wall, one needs to consider a non-equilibrium deviation of f [7]. Our work indicates this is not necessarily true for some phase transitions. To see this, consider the effect of the radiation (the second term of Eq. (6)). When V = 0, the term contributes to the pressure P2 = 2A dm(x) (2π)3 Ek(eβE ± 1) which also vanishes because the second integrand is an odd function of k. However, when V 6= 0, one can easily check that, due to the 1/k term, the V dependency survives even under the change of the integration variable. Thus, in this case, P2 = 2A dm(x) (2π)3 Ek(eβγ(E−V k) ± 1) dm(x) dx I2(x). 
(10) To be more concrete, let us calculate an approximate value of the integration when V ≪ 1 for fermions. In this case, we can expand f [γ(E − V k)] ≃ f(E) − V βkf(E)[f(E) − 1] = f(E) + V βkf2(E)exp(−βE). The integration of the first term gives zero, and the second term contributes I2 = V β (2π)3 f2(E)exp(−βE) ≃ (ln2)TV because (2π)3 f2(E)exp(−βE) ≃ (ln2)T 2 , (12) to lowest oder in (m/T )2 (See Ref. 7). Therefore, for the wall described in Fig. 1 the pressure by the radiation is (ln2)Am20T V, (13) which is comparable to the result of numerical integration of Eq. (10) for V ≪ 1, as shown in Fig. 2. (During the numerical study it is useful to change the measure from dkydkz to 2πEdE.) This pressure is proportional to the wall velocity up to the moderately relativistic case and exists even when the system is in a thermal equilibrium. During the electroweak phase transition, a particle’s electromagnetic charge is not definite, so the A value in Lamor’s formula can not be a constant. In this paper , however, to perform a rough calculation, we have assumed that A is a constant during the phase transition. For illustration of high temperature effects on electric charges, now we consider a Debye screening of electric charge by plasma during the phase transition, which is given by effective coupling αeff = α/(1 − 2α ln(k/Λ)/3π) ≃ 0.97α, where we used averaged momentum 〈k〉 ≃ 3T and Λ of order electron mass at the last approximation (see Eq. (42) of [17]). Thus, we obtain A = 0.0599 which is slightly smaller than the zero temperature value. We also plot the pressure with this A value. It is noteworthy that the pressure caused by the radiation damping (Eq. (13)) is of order O(α), which is bigger than the pressure due to a departure from thermal equilibriums[7, 18, 19] (O(α2))[18], and hence non-negligible. Here, α is the fine structure constant. 
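The small-V expansion used above can be checked numerically. The following Python sketch compares the boosted Fermi-Dirac distribution f[γ(E − V k)] with its first-order approximation f(E) − V βk f(E)[f(E) − 1], and evaluates the dimensionless integral ∫₀^∞ x f(x)[1 − f(x)] dx, which equals ln 2 and appears to be the origin of the ln 2 factor in the massless limit. The function names, the choice β = 1, and the trapezoid-rule parameters are assumptions made for this illustration; this is a sanity check, not the numerical integration of Eq. (10) reported in Fig. 2.

```python
import math


def fermi(E, beta=1.0):
    """Fermi-Dirac distribution f(E) = 1 / (exp(beta * E) + 1)."""
    return 1.0 / (math.exp(beta * E) + 1.0)


def boosted_fermi(E, k, V, beta=1.0):
    """Distribution seen in the wall frame: f[gamma * (E - V * k)]."""
    gamma = 1.0 / math.sqrt(1.0 - V * V)
    return fermi(gamma * (E - V * k), beta)


def first_order_fermi(E, k, V, beta=1.0):
    """First-order expansion in V: f(E) - V*beta*k*f(E)*(f(E) - 1)."""
    f = fermi(E, beta)
    return f - V * beta * k * f * (f - 1.0)


def thermal_integral(upper=60.0, steps=60000):
    """Trapezoid estimate of the integral of x * f(x) * (1 - f(x)) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        f = fermi(x)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * f * (1.0 - f)
    return total * h


if __name__ == "__main__":
    E, k, V = 2.0, 1.0, 0.01
    print(boosted_fermi(E, k, V), first_order_fermi(E, k, V))
    print(thermal_integral(), math.log(2.0))
```

The two distribution values agree up to terms of order V², and the integral agrees with ln 2 to well within the accuracy of the trapezoid rule used here.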
Note also that the power of bremsstrahlung due to bubble walls is much stronger (O(α)) than that of ordinary bremsstrahlung of electrons colliding with ions in a plasma (O(α3))[20].

Since the electroweak phase transition is a complicated phenomenon, our work is by no means a full calculation of the pressure acting on electroweak bubbles. The purpose of this paper is to present the general idea that radiation damping (although usually ignored in the many related works on bubble wall velocity calculations) could give rise to significant frictional forces even in thermal equilibrium states at some cosmological phase transitions. To include the effects of other particles (e.g., gluons and W/Z particles) in our work, we need to modify Larmor’s formula by using some sort of group factor. Even in this case, it is unlikely that the pressures from radiation damping in different gauge sectors exactly cancel each other. Hence, one can expect an O(α) viscosity to survive. Since bubbles are slow initially, they are supposed to be in a thermal equilibrium state at that stage. An ordinary calculation shows no friction at this time, but the radiation damping force exists in this stage, and hence this pressure can significantly change the early evolution of the vacuum bubbles and the nature of electroweak baryogenesis or inflationary cosmology.

ACKNOWLEDGEMENTS

The authors are thankful to Myongtak Choi for helpful discussions. This work was supported in part by the Korean Science and Engineering Foundation and Korea Research Foundation (BSRI-98-2441).

[1] C. H. Lee, J. Korean Phys. Soc. 33, 588 (1998).
[2] M. Trodden, Rev. Mod. Phys. 71, 1463 (1999).
[3] D. La and P. J. Steinhardt, Phys. Rev. Lett. 62, 376 (1989).
[4] D. S. Goldwirth and H. W. Zaglauer, Phys. Rev. Lett. 67, 3639 (1991).
[5] S. Koh, J. Korean Phys. Soc. 49, 787 (2005).
[6] N. Turok, Phys. Rev. Lett. 68, 1803 (1992).
[7] G. Moore and T. Prokopec, Phys. Rev. Lett. 75, 777 (1995).
[8] P. J. Steinhardt, Phys. Rev. D 25, 2074 (1982).
[9] K. Enqvist, J. Ignatius, K. Kajantie, and K. Rummukainen, Phys. Rev. D 45, 3415 (1992).
[10] M. Dine, R. G. Leigh, P. Huet, A. Linde, and D. Linde, Phys. Rev. D 46, 550 (1992).
[11] C. H. Lee, J. Korean Phys. Soc. 32, 861 (1998).
[12] A. G. Cohen, D. B. Kaplan, and A. E. Nelson, Nucl. Phys. B 349, 727 (1991).
[13] G. R. Farrar and M. E. Shaposhnikov, Phys. Rev. D 50, 774 (1994).
[14] P. John and M. G. Schmidt, Nucl. Phys. B 598, 291 (2001).
[15] K. T. Byun, K. Y. Kim, and H. Y. Kwak, J. Korean Phys. Soc. 47, 1010 (2005).
[16] J. Jackson, Classical Electrodynamics, 2nd ed. (Wiley, New York, 1975).
[17] R. A. Schneider, Phys. Rev. D 66, 036003 (2002).
[18] G. D. Moore, JHEP 0003, 006 (2000).
[19] G. D. Moore and T. Prokopec, Phys. Rev. D 52, 7182 (1995).
[20] S. Ichimaru, Basic Principles of Plasma Physics (W. A. Benjamin, Reading, MA, 1973).

FIG. 1: The effective mass of the particle m(x) in the wall rest frame.

FIG. 2: The pressure by the radiation damping of fermions colliding with the linear bubble wall as a function of the wall velocity. The thick line shows numerical integration of Eq. (10) and the dotted line shows the approximate formula in Eq. (13). Here we set 1/d = 1 = m0 = T for simplicity. The dashed line represents the result with Debye screening of charge.
<|endoftext|><|startoftext|> Introduction The classical setting of the universal lossless compression problem [5], [8], [9] assumes that a se- quence xn of length n that was generated by a source θ is to be compressed without knowledge of the particular θ that generated xn but with knowledge of the class Λ of all possible sources θ. The average performance of any given code, that assigns a length function L(·), is judged on the basis of the redundancy function Rn (L,θ), which is defined as the difference between the expected code length of L (·) with respect to (w.r.t.) the given source probability mass function Pθ and the nth-order entropy of Pθ normalized by the length n of the uncoded sequence. A class of sources is said to be universally compressible in some worst sense if the redundancy function diminishes for this worst setting. Another approach to universal coding [29] considers the individual sequence redundancy R̂n (L, x n), defined as the normalized difference between the code length obtained by L(·) for xn and the negative logarithm of the maximum likelihood (ML) probability of the sequence xn, where the ML probability is within the class Λ. We thereafter refer to this negative logarithm as the ML description length of xn. The individual sequence redundancy is defined for each sequence that can be generated by a source θ in the given class Λ. Classical literature on universal compression [5], [8], [9], [23], [29] considered compression of sequences generated by sources over finite alphabets. In fact, it was shown by Kieffer [15] (see also [13]) that there are no universal codes (in the sense of diminishing redundancy) for sources over infinite alphabets. Later work (see, e.g., [21], [25]), however, bounded the achievable redundancies for identically and independently distributed (i.i.d.) sequences generated by sources over large and infinite alphabets. 
Specifically, while it was shown that the redundancy does not decay if the alphabet size is of the same order of magnitude as the sequence length n or greater, it was also shown that the redundancy does decay for alphabets of size o(n).¹

¹ For two functions f(n) and g(n), f(n) = o(g(n)) if ∀c, ∃n0, such that, ∀n > n0, f(n) < cg(n); f(n) = O(g(n)) if ∃c, n0, such that, ∀n > n0, 0 ≤ f(n) ≤ cg(n); f(n) = Θ(g(n)) if ∃c1, c2, n0, such that, ∀n > n0, c1g(n) ≤ f(n) ≤ c2g(n).

While there is no universal code for infinite alphabets, recent work [20] demonstrated that if one considers the pattern of a sequence instead of the sequence itself, universal codes do exist in the sense of diminishing redundancy. A pattern of a sequence, first considered, to the best of our knowledge, in [1], is a sequence of indices, where the index ψi at time i represents the order of first occurrence of letter xi in the sequence x^n. Further study of universal compression of patterns [20], [21], [26], [28] provided various lower and upper bounds to various forms of redundancy in universal compression of patterns. Another related study is that of compression of data, where the order of the occurring data symbols is not important, but their types and empirical counts are [30]-[31].

This paper considers universal compression of data sequences generated by distributions that are known a-priori to be monotonic. Hence, the order of probabilities of the source symbols is known in advance to both encoder and decoder and can be utilized as side information to improve universal compression performance. Monotonic distributions are common for distributions over the integers, including the geometric distribution and others. Such distributions do occur in image compression problems (see, e.g., [18], [19]), and in other applications that compress residual signals.
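To make the notion of a pattern mentioned earlier in this section concrete, the following Python sketch maps a sequence to its pattern by replacing each letter with the order in which that letter first occurred. The function name `pattern` and the use of 1-based indices are choices made for this illustration only.

```python
def pattern(sequence):
    """Return the pattern of a sequence: each symbol is replaced by the
    order (1, 2, 3, ...) of its first occurrence in the sequence."""
    first_seen = {}
    indices = []
    for symbol in sequence:
        if symbol not in first_seen:
            # A new symbol receives the next unused index.
            first_seen[symbol] = len(first_seen) + 1
        indices.append(first_seen[symbol])
    return indices


if __name__ == "__main__":
    print(pattern("lossless"))  # [1, 2, 3, 3, 1, 4, 3, 3]
```

Two sequences over entirely different alphabets can share the same pattern, which is why the alphabet itself plays no role once only patterns are coded.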
A specific application one can consider for the results in this paper is compression of the list of last or first names in a given city of a given population. One can usually find some monotonicity for such a distribution in the given population, which both encoder and decoder may be aware of a priori. For example, the last name “Smith” can be expected to be much more common than the last name “Shannon”. Another example is the compression of a sequence of observations of different species, where one has prior knowledge of which species are more common and which are rare. Finally, one can consider compressing data for which side information given to the decoder through a different channel gives the monotonicity order.

In contrast to the case of patterns, Foster, Stine, and Wyner showed in [10] that there are no universal block codes in the standard sense for the complete class of monotonic distributions. The main reason is that there exist such distributions for which much of the statistical weight lies in symbols that have very low probability, most of which will not occur in a given sequence. Thus, in practice, even though one has the prior knowledge of the monotonicity of the distribution, this monotonicity is not necessarily retained in an observed sequence. Therefore, actual coding can be very similar to compressing with infinite alphabets, and the additional prior knowledge of the monotonicity is not very helpful in reducing redundancy. Despite that, Foster, Stine, and Wyner demonstrated codes that obtained universal per-symbol redundancy of o(1) as long as the source entropy is fixed (i.e., neither increasing with n nor infinite). However, instead of considering redundancy in the standard sense, the study of monotonic distributions resorted to studying relative redundancy, which bounds the ratio between average assigned code length and the source entropy. This approach dates back to work by Elias [7], Rissanen [22], and Ryabko [24].
The work in [10] studied coding sequences (or blocks) generated by i.i.d. monotonic distributions, and designed codes for which the relative block redundancy could be (upper) bounded. Unlike that work, the focus in [7], [22], and [24] was on designing codes that minimize the redundancy or relative redundancy for a single symbol generated by a monotonic distribution. Specifically, in [22], minimax codes, which minimize the relative redundancy for the worst possible monotonic distribution over a given alphabet size, were derived. In [24], it was shown that redundancy of O(log log k), where k is the alphabet size, can be obtained with minimax per-symbol codes. Very recent work [16] considered per-symbol codes that minimize an average redundancy over the class of monotonic distributions for a given alphabet size. Unlike [10], all these papers study per-symbol codes. Therefore, the codes designed always pay non-diminishing per-symbol redundancy. A different line of work on monotonic distributions considered optimizing codes for a known monotonic distribution but with unknown parameters (see [18], [19] for design of codes for two-sided geometric distributions). In this line of work, the class of sources is very limited and consists of only the unknown parameters of a known distribution. In this paper, we consider a general class of monotonic distributions that is not restricted to a specific type. We study standard block redundancy for coding sequences generated by i.i.d. monotonic distributions, i.e., a setting similar to the work in [10]. We do, however, restrict ourselves to smaller subsets of the complete class of monotonic distributions. First, we consider monotonic distributions over alphabets of size k, where k is either small w.r.t. n, or of O(n). Then, we extend the analysis to show that under minimal restrictions of the monotonic distribution class, there exist universal codes in the standard sense, i.e., with diminishing per-symbol redundancy. 
In fact, not only do universal codes exist, but under mild restrictions, they achieve the same redundancy as obtained for alphabets of size O(n). The restrictions on this subclass imply that some types of fast decaying monotonic distributions are included in it, and therefore, sequences generated by these distributions (without prior knowledge of either the distribution or of its parameters) can still be compressed universally in the class of monotonic distributions.

The main contributions of this paper are the development of codes and the derivation of upper bounds on their redundancies for coding i.i.d. sequences generated by monotonic distributions. Specifically, the paper gives a complete characterization of the redundancy in coding with monotonic distributions over “small” alphabets (k = o(n^{1/3})) and “large” alphabets (k = O(n)). Then, it shows that these redundancy bounds carry over (in first order) to fast decaying distributions. Next, a code that achieves good redundancy rates for even slower decaying monotonic distributions is derived, and is used to study achievable redundancy rates for such distributions. Lower bounds are also presented to complete the characterization, and are shown to meet the upper bounds in the first three cases (small alphabets, large alphabets, and fast decaying distributions). The lower bounds turn out to result from lower bounds obtained for coding patterns. The relationship to patterns is demonstrated in the proofs of those lower bounds. Finally, individual sequences are considered. It is shown that under mild conditions, there exist universal codes w.r.t. the monotonic ML description length for sequences that contain the O(n) more likely symbols, even if their empirical distributions are not monotonic.

The outline of this paper is as follows. Section 2 describes the notation and basic definitions. Then, in Section 3, lower bounds on the redundancy for monotonic distributions are derived.
Next, in Section 4, we propose codes and upper bound their redundancy for coding monotonic distributions over small and large alphabets. These bounds are then extended to fast decaying monotonic distributions in Section 5. Finally, in Section 6, we consider individual sequence redundancy.

2 Notation and Definitions

Let x^n = (x_1, x_2, . . . , x_n) denote a sequence of n symbols over the alphabet Σ of size k, where k can go to infinity. Without loss of generality, we assume that Σ = {1, 2, . . . , k}, i.e., it is the set of positive integers from 1 to k. The sequence x^n is generated by an i.i.d. distribution of some source, determined by the parameter vector θ = (θ_1, θ_2, . . . , θ_k), where θ_i is the probability of X taking value i. The components of θ are non-negative and sum to 1. The distributions we consider in this paper are monotonic. Therefore, θ_1 ≥ θ_2 ≥ . . . ≥ θ_k. The class of all monotonic distributions will be denoted by M. The class of monotonic distributions over an alphabet of size k is denoted by M_k. It is assumed that prior to coding x^n both encoder and decoder know that θ ∈ M or θ ∈ M_k, and also know the order of the probabilities in θ. In the more restrictive setting, k is known in advance and it is known that θ ∈ M_k. We do not restrict ourselves to this setting.

In general, boldface letters will denote vectors, whose components will be denoted by their indices in the vector. Capital letters will denote random variables. We will denote an estimator by the hat sign. In particular, θ̂ will denote the ML estimator of θ which is obtained from x^n. The probability of x^n generated by θ is given by

\[ P_{\theta}(x^n) \triangleq \Pr(x^n \mid \Theta = \theta). \]

The average per-symbol² nth-order redundancy obtained by a code that assigns length function L(·) for θ is

\[ R_n(L, \theta) \triangleq \frac{1}{n} E_{\theta} L[X^n] - H_{\theta}[X], \tag{1} \]

where E_θ denotes expectation w.r.t. θ, and H_θ[X] is the (per-symbol) entropy (rate) of the source

² In this paper, redundancy is defined per-symbol (normalized by the sequence length n).
However, when we refer to redundancy in overall bits, we address the block redundancy cost for a sequence.

(H_θ[X^n] is the nth-order sequence entropy of θ, and for i.i.d. sources, H_θ[X^n] = nH_θ[X]). With entropy coding techniques, assigning a universal probability Q(x^n) is identical to designing a universal code for coding x^n where, up to negligible integer length constraints that will be ignored, the negative logarithm to the base of 2 of the assigned probability is considered as the code length.

The individual sequence redundancy (see, e.g., [29]) of a code with length function L(·) per sequence x^n is

\[ \hat{R}_n(L, x^n) \triangleq \frac{1}{n}\left\{ L(x^n) + \log P_{ML}(x^n) \right\}, \tag{2} \]

where the logarithm function is taken to the base of 2, here and elsewhere, and P_ML(x^n) is the probability of x^n given by the ML estimator θ̂_Λ ∈ Λ of the governing parameter vector Θ. The negative logarithm of this probability is, up to integer length constraints, the shortest possible code length assigned to x^n in Λ. It will be referred to as the ML description length of x^n in Λ. In the general case, one considers the i.i.d. ML. However, since we only consider θ ∈ M, i.e., restrict the sequence to one governed by a monotonic distribution, we define θ̂_M ∈ M as the monotonic ML estimator. Its associated shortest code length will be referred to as the monotonic ML description length. The estimator θ̂_M may differ from the i.i.d. ML θ̂, in particular, if the empirical distribution of x^n is not monotonic. The individual sequence redundancy in M is thus defined w.r.t. the monotonic ML description length, which is the negative logarithm of

\[ P_{ML}(x^n) \triangleq P_{\hat{\theta}_M}(x^n) = \Pr\left(x^n \mid \Theta = \hat{\theta}_M \in M\right). \]

The average minimax redundancy of some class Λ is defined as

\[ R^{+}_n(\Lambda) = \min_{L} \sup_{\theta \in \Lambda} R_n(L, \theta). \tag{3} \]

Similarly, the individual minimax redundancy is that of the best code L(·) for the worst sequence x^n,

\[ \hat{R}^{+}_n(\Lambda) = \min_{L} \sup_{\theta \in \Lambda} \max_{x^n} \frac{1}{n}\left\{ L(x^n) + \log P_{\theta}(x^n) \right\}. \tag{4} \]

The maximin redundancy of Λ is

\[ R^{-}_n(\Lambda) = \sup_{w} \min_{L} \int_{\Lambda} w(d\theta)\, R_n(L, \theta), \tag{5} \]

where w(·) is a prior on Λ.
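The difference between the i.i.d. ML estimator θ̂ and the monotonic ML estimator θ̂_M can be illustrated numerically. The sketch below relies on an assumption that is not stated in this paper: that the ML estimate under the constraint θ_1 ≥ θ_2 ≥ . . . is obtained by pooling adjacent violators of the empirical frequencies, a standard result from order-restricted inference. The function names `monotone_ml` and `description_length` are hypothetical.

```python
import math


def monotone_ml(counts):
    """Non-increasing ML estimate from symbol counts n_x(1), n_x(2), ...
    computed by pooling adjacent violators (assumed technique)."""
    n = sum(counts)
    blocks = []  # each block is [sum of counts, number of pooled symbols]
    for c in counts:
        blocks.append([c, 1])
        # Merge while the previous block's mean is below the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            total, width = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += width
    theta = []
    for total, width in blocks:
        theta.extend([total / (width * n)] * width)
    return theta


def description_length(counts, theta):
    """-log2 of the i.i.d. probability, under theta, of any sequence with these counts."""
    return -sum(c * math.log2(t) for c, t in zip(counts, theta) if c > 0)


counts = [3, 5, 2]                       # symbol 2 occurs more often than symbol 1
empirical = [c / sum(counts) for c in counts]
monotone = monotone_ml(counts)
print(monotone)                          # [0.4, 0.4, 0.2]
print(description_length(counts, empirical))
print(description_length(counts, monotone))
```

In this example the empirical distribution is not monotonic, so the monotonic ML description length is longer than the i.i.d. ML description length; when the counts are already non-increasing the two coincide.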
In [5], it was shown that R+n (Λ) ≥ R−n (Λ). Later, however, [6], [11], [24] the two were shown to be essentially equal. 3 Lower Bounds Lower bounds on various forms of the redundancy for the class of monotonic distributions can be obtained with slight modifications of the proofs for the lower bounds on the redundancy of coding patterns in [14], [20], [21], and [26]. The bounds are presented in the following three theorems. For the sake of completeness, the main steps of the proofs of the first two theorems are presented in appendices, and the proof of the third theorem is presented below. The reader is referred to [14], [20], [21], [25] and [26] for more details. Theorem 1 Fix an arbitrarily small ε > 0, and let n → ∞. Then, the nth-order average max- imin and minimax universal coding redundancies for i.i.d. sequences generated by a monotonic distribution with alphabet size k are lower bounded by R−n (Mk) ≥ log n + k−1 log πe log k , for k ≤ πn1−ε )1/3 · (1.5 log e) · n(1−ε)/3 , for k > πn1−ε )1/3 . (6) Theorem 2 Fix an arbitrarily small ε > 0, and let n→ ∞. Then, the nth-order average universal coding redundancy for coding i.i.d. sequences generated by monotonic distributions with alphabet size k is lower bounded by Rn (L,θ) ≥ log n − k−1 log 8π log k , for k ≤ 1 1.5 log e 2π1/3 · n(1−ε)/3 , for k > 1 )1/3 (7) for every code L(·) and almost every i.i.d. source θ ∈ Mk, except for a set of sources Aε (n) whose relative volume in Mk goes to 0 as n→ ∞. Theorems 1 and 2 give lower bounds on redundancies of coding over monotonic distributions for the class Mk. However, the bounds are more general, and the second region applies to the whole class of monotonic distributions M. As in the case of patterns [20], [26], the bounds in (6)-(7) show that each parameter costs at least 0.5 log(n/k3) bits for small alphabets, and the total universality cost is at least Θ(n1/3−ε) bits overall for larger alphabets. 
Unlike the currently known results on patterns, however, we show in Section 4 that for k = O(n) these bounds are achievable for monotonic distributions. The proofs of Theorems 1 and 2 are presented in Appendix A and in Appendix B, respectively. Theorem 3 Let n → ∞. Then, the nth-order individual minimax redundancy for i.i.d. sequences with maximal letter k w.r.t. the monotonic ML description length with alphabet size k is lower bounded by R̂+n (Mk) ≥ log n log e 23/12 log k , for k ≤ e5/18 (2π)1/3 · n1/3 e5/18 (2π)1/3 (log e) · n1/3 , for n > k > e (2π)1/3 · n1/3 (log e) · n1/3 , for k ≥ n. Theorem 3 lower bounds the individual minimax redundancy for coding a sequence believed to have an empirical monotonic distribution. The alphabet size is determined by the maximal letter that occurs in the sequence, i.e., k = max {x1, x2, . . . , xn}. (If k is unknown, one can use Elias’ code for the integers [7] using O(log k) bits to describe k. However this is not reflected in the lower bound.) The ML probability estimate is taken over the class of monotonic distributions, i.e., the empirical probability (standard ML) estimate θ̂ is not θ̂M in case θ̂ does not satisfy the monotonicity that defines the class M. While the average case maximin and minimax bounds of Theorem 1 also apply to R̂+n (Mk), the bounds of Theorem 3 are tighter for the individual redundancy and are obtained using individual sequence redundancy techniques. Proof of Theorem 3: Using Shtarkov’s normalized maximum likelihood (NML) approach [29], one can assign probability Q (xn) yn Pθ̂M maxθ′∈M Pθ′ (x yn maxθ′∈M Pθ′ (y to sequence xn. This approach minimizes the individual minimax redundancy, giving individual redundancy of R̂n (Q,x maxθ′∈M Pθ′ (x Q (xn) Pθ′ (y to every xn, specifically achieving the individual minimax redundancy. It is now left to bound the logarithm of the sum in (10). 
For the first two regions, we follow the approach used in Theorem 2 in [21] for bounding the redundancy for standard compression of i.i.d. sequences over large alphabets, but adjust it to monotonic distributions. Alternatively, one can derive the same bounds following the approach used for bounding the individual minimax redundancy of patterns in proving Theorem 12 in [20]. Let nℓx = (nx(1), nx(2), . . . , nx(ℓ)) denote the occurrence counts of the first ℓ letters of the alphabet Σ in xn. For ℓ = k, i=1 nx(i) = n. Now, following (10), nR̂+n (Mk) ≥ log yn:θ̂(yn)∈M ≥ log ny(1), . . . , ny(ℓ) ny(i) )ny(i) ≥ log ny(1), . . . , ny(k) ny(i) )ny(i) ≥ log ek/12 · (2π)k/2 nx(i) ≥ log k − 1 ek/12 ≥ k − 1 + k log e23/12√ −O (log k) (11) where (a) follows from including only sequences yn that have a monotonic empirical (i.i.d. ML) distribution in Shtarkov’s sum. Inequality (b) follows from partitioning the sequences yn into types as done in [21], first by the number of occurring symbols ℓ, and then by the empirical distribution. Unlike standard i.i.d. distributions though, monotonicity implies that only the first ℓ symbols in Σ occur, and thus the choice of ℓ out of k in the proof in [21] is replaced by 1. Like in coding patterns, we also divide by ℓ! because each type with ℓ occurring symbols can be ordered in at most ℓ! ways, where only some retain the monotonicity. (Note that this step is the reason that step (b) produces an inequality, because more than one of the orderings may be monotonic if equal occurrence counts occur.) Except the division by ℓ!, the remaining steps follow those in [21]. Retaining only the term ℓ = k yields inequality (c). Inequality (d) follows from Stirling’s bound 2πm · ≤ m! ≤ 2πm · · exp . (12) Then, (e) follows from the relation between arithmetic and geometric means, and from expressing the number of types as the number of ordered partitions of n into k parts k − 1 . 
Finally, (f) follows from applying (12) again and by lower bounding k − 1 The first region in (8) results directly from (11). The behavior is similar to patterns as shown in [1] for this region. As mentioned in [20], to obtain the second region, the bound is maximized by retaining ℓ̂ = n1/3e5/18 /(2π)1/3 instead of k in step (c) of (11), for every k ≥ ℓ̂. The bounds obtained are equal to those obtained for patterns because the first step (a) in (11) discards all the sequences whose contributions to Shtarkov’s sum are different between patterns and monotonic distributions. A similar step is effectively done deriving the bounds for patterns. The difference is that in the case of patterns, components of Shtarkov’s sum are reduced, but all are retained in the sum, while here, we omit components from the sum, corresponding to sequences with non- monotonic i.i.d. ML estimates. The analysis in [20] that also attains the second region of the bound in (8) is still valid here. It differs from the steps taken above by lower bounding a pattern probability by a larger probability than the ML i.i.d. probability corresponding to the pattern. The bound used in the derivation of Theorem 12 in [20] adds a multiplicative factor to each pattern probability which equals the number of sequences with the same pattern and an equal i.i.d. ML probability. However, this similar effect is included in Shtarkov’s sum for monotonic distributions since all these sequences do have a corresponding i.i.d. ML estimate which is monotonic, and are thus not omitted by step (a) of the derivation. The analysis in [14] yields the third region of the bound in (8), since, for k ≥ n, R̂+n (Mk) = Ψ(yn) 1.5n1/3 log e log n , (13) where Ψ(yn) is the pattern of the sequence yn. 
Inequality (a) holds because each pattern corresponds to at least one sequence whose ML probability parameter estimates are ordered, i.e., θ̂_i ≥ θ̂_{i+1}, ∀i, where the most probable index represents i = 1, the second most probable index i = 2, and so on. Note that the sum element on the right hand side is for a probability of a sequence, not a pattern, but the sum is over all patterns. The left hand side also includes sequences for which the probabilities are unordered. Furthermore, exchanging the letters that correspond to two indices with the same occurrence count will not violate monotonicity. Thus the inequality follows. Step (b) in (13) is taken from [14], where the sum on the left hand side was shown to equal the right hand side. This was true when summing over all patterns with up to n indices, thus requiring k ≥ n. Note that this requirement does not mean that n distinct symbols must occur in x^n, only that the maximal symbol in x^n is n or greater. This concludes the proof of Theorem 3. □

4 Upper Bounds for Small and Large Alphabets

In this section, we demonstrate codes that asymptotically achieve the lower bounds for θ ∈ M_k and k = O(n). We begin with a theorem that shows the achievable redundancies, and devote the remainder of the section to describing the codes and deriving upper bounds on their redundancies. The theorem is stated assuming no initial knowledge of k. The proof first considers the setting where k is known, and then shows how the same bounds are achieved even when k is unknown in advance, as long as it satisfies the conditions.

Theorem 4 Fix an arbitrarily small ε > 0, and let n → ∞. Then, there exists a code with length function L*(·) that achieves redundancy

\[ R_n(L^{*}, \theta) \le \begin{cases} (1+\varepsilon)\,\dfrac{k-1}{2n}\,\log\dfrac{n(\log n)^{2}}{k^{3}}, & \text{for } k \le n^{1/3}, \\[2mm] (1+\varepsilon)\,(\log n)\left(\log\dfrac{k}{n^{1/3-\varepsilon}}\right)\dfrac{n^{1/3}}{n}, & \text{for } n^{1/3} < k = o(n), \\[2mm] (1+\varepsilon)\,\dfrac{2}{3}\,(\log n)^{2}\,\dfrac{n^{1/3}}{n}, & \text{for } n^{1/3} < k = O(n), \end{cases} \tag{14} \]

for i.i.d. sequences generated by any source θ ∈ M_k.
Slightly tighter bounds are possible in the first and second regions and between them. The bounds presented, however, are inclusive for each of the regions. Note that the third region contains the second, but if k = o(n), a tighter bound is possible in the second region.

The code designed to code a sequence x^n is a two-part code [23] that quantizes a distribution that minimizes the cost, and uses it to code x^n. The total redundancy cost consists of the cost of describing the quantized distribution and the quantization cost. The second is bounded through the quantized true distribution of the sequence, which cannot result in lower cost than that of the chosen distribution (which minimizes the cost). In order to achieve the low costs of the lower bound, the probability parameters are quantized non-uniformly, where the smaller the probability the finer the quantization. This approach was used in [25] and [26] to obtain upper bounds on the redundancy for coding over large alphabets and for coding patterns, respectively. The method used in [25] and [26], however, is insufficient here, because it still results in too many quantization points due to the polynomial growth in quantization spacing. Here, we use an exponential growth of the spacing as the parameters increase. This general idea was used in [28] to improve an upper bound on the redundancy of coding patterns. Here, however, we improve on the method presented in [28]. Another key step in the proof here is the fact that since both encoder and decoder know the order of the probabilities a priori, this order need not be coded. It is sufficient to encode the quantized probabilities of the monotonic distribution, and the decoder can identify which probability is associated with which symbol using the monotonicity of the distribution.

Proof of Theorem 4: We start with k ≤ n^{1/3}, assuming k is known. Let β = 1/(log n) be a parameter (other values can also be chosen).
Partition the probability space into J1 = ⌈1/β⌉ intervals, n(j−1)β , 1 ≤ j ≤ J1. (15) Note that I1 = [1/n, 2/n), I2 = [2/n, 4/n), . . . , Ij = [2 j−1/n, 2j/n). Let kj = |θi ∈ Ij| denote the number of probabilities in θ that are in interval Ij. In interval j, take a grid of points with spacing . (16) Note that to complete all points in an interval, the spacing between two points at the boundary of an interval may be smaller. There are ⌈log n⌉ intervals. Ignoring negligible integer length constraints (here and elsewhere), in each interval, the number of points is bounded by |Ij | ≤ , ∀j : j = 1, 2, . . . , J1, (17) where | · | denotes the cardinality of a set. Let the grid τ = (τ1, τ2, . . .) = , . . . , , . . . be a vector that takes all the points from all intervals, with cardinality = |τ | ≤ 1 ⌈log n⌉ . (19) Now, let ϕ = (ϕ1, ϕ2, . . . , ϕk) be a monotonic probability vector, such that ϕi = 1, ϕ1 ≥ ϕ2 ≥ · · · ≥ ϕk ≥ 0, and also the smaller k−1 components of ϕ are either 0 or from τ , i.e., ϕi ∈ (τ ∪ {0}), i = 2, 3, . . . , k. One can code xn using a two part code, assuming the distribution governing xn is given by the parameter ϕ. The code length required (up to integer length constraints) is L (xn|ϕ) = log k + LR(ϕ)− log Pϕ (xn) , (20) where log k bits are needed to describe how many letter probabilities are greater than 0 in ϕ, and LR(ϕ) is the number of bits required to describe the quantized points of ϕ. The vector ϕ can be described by a code as follows. Let k̂ϕ be the number of nonzero letter probabilities hypothesized by ϕ. Let bi denote the index of ϕi in τ , i.e., ϕi = τbi . Then, we will use the following differential code. For ϕ we need at most 1 + log b + 2 log(1 + log b bits to code its index in τ using Elias’ coding for the integers [7]. 
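The cost 1 + log b + 2 log(1 + log b) quoted for describing an index b has the form of the length of the Elias delta code. The paper only refers to Elias' coding for the integers [7], so the choice of the delta variant in the following sketch is an assumption, and the function names are hypothetical.

```python
import math


def elias_delta(b):
    """Elias delta codeword (as a bit string) of an integer b >= 1."""
    if b < 1:
        raise ValueError("Elias delta codes are defined for integers >= 1")
    n_bits = b.bit_length()          # floor(log2 b) + 1
    len_bits = n_bits.bit_length()   # number of bits needed to write n_bits
    # Unary-style prefix of zeros, then n_bits in binary, then b without its leading 1.
    return "0" * (len_bits - 1) + format(n_bits, "b") + format(b, "b")[1:]


def elias_delta_decode(bits):
    """Inverse of elias_delta for a single codeword."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    n_bits = int(bits[zeros:2 * zeros + 1], 2)
    rest = bits[2 * zeros + 1:2 * zeros + n_bits]  # the n_bits - 1 trailing bits
    return int("1" + rest, 2)


def length_bound(b):
    """The bound 1 + log b + 2 log(1 + log b) used in the text."""
    return 1 + math.log2(b) + 2 * math.log2(1 + math.log2(b))


for b in (1, 2, 10, 1000):
    code = elias_delta(b)
    print(b, code, len(code), round(length_bound(b), 2))
```

Because the codeword length grows only logarithmically in b, small index displacements between consecutive quantized parameters are cheap to describe, which is what a differential description of the grid indices exploits.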
For ϕi−1, we need at most 1 + log(bi−1 − bi + 1) + 2 log[1 + log(bi−1 − bi + 1)] bits to code the index displacement from the index of the previous parameter, where an additional 1 is added to the difference in case the two parameters share the same index. Summing up all components of ϕ, and taking b k̂ϕ+1 LR(ϕ) ≤ k̂ϕ − 1 + log (bi − bi+1 + 1) + 2 log [1 + log (bi − bi+1 + 1)] ≤ (k − 1) + (k − 1) log B1 + k − 1 + 2(k − 1) log log B1 + k − 1 + o(k) = (1 + ε) k − 1 n (log n) . (21) Inequality (a) is obtained by applying Jensen’s inequality once on the first sum, twice on the second sum utilizing the monotonicity of the logarithm function, and by bounding k̂ϕ by k and absorbing low order terms in the resulting o(k) term. Then, low order terms are absorbed in ε, and (19) is used to obtain (b). To code xn, we choose ϕ which minimizes the expression in (20) over all ϕ, i.e., L∗ (xn) = min L (xn|ϕ) △= L (xn|ϕ̂) . (22) The pointwise redundancy for xn is given by nRn (L ∗, xn) = L∗ (xn) + log Pθ (x n) = log k + L∗R (ϕ̂) + log Pθ (x Pϕ̂ (x . (23) Note that the pointwise redundancy differs from the individual one, since it is defined w.r.t. the true probability of xn. To bound the third term of (23), let θ′ be a quantized still monotonic version of θ onto τ , i.e., θ′i ∈ (τ ∪ {0}), i = 2, 3, . . . , k, where if θi > 0 ⇔ θ′i > 0 as well. Define the quantization error, δi = θi − θ′i. (24) The quantization is performed from the smallest parameter θk to the largest, where monotonicity is retained, as well as minimal absolute quantization error. This implies that θi will be quantized to one of the two nearest grid points (one smaller and one greater than it). It also guarantees that |δ1| ≤ ∆ , where j2 is the index of the interval in which θ2 is contained, i.e., θ2 ∈ Ij2 . Now, since θ′ is included in the minimization of (22), we have, for every xn, L∗ (xn) ≤ L xn|θ′ , (25) and also nRn (L ∗, xn) ≤ log k + LR + log Pθ (x Pθ′ (x . 
(26) Averaging over all possible xn, the average redundancy is bounded by nRn (L ∗,θ) = log k + EθL R (ϕ̂) + Eθ log Pθ (X Pϕ̂ (X ≤ log k + EθLR ′)+ Eθ log Pθ (X Pθ′ (X . (27) The second term of (27) is bounded with the bound of (21), and we proceed with the third term. Eθ log Pθ (X Pθ′ (X θi log θ′i + δi ≤ n(log e) θ′i + δi = n(log e) ≤ k log e+ 2(log e)k kj · njβ ≤ 5(log e)k. (28) Equality (a) is since the argument in the logarithm is fixed, thus expectation is performed only on the number of occurrences of letter i for each letter. Representing θi = θ i + δi yields equation (b). We use ln(1+x) ≤ x to obtain (c). Equality (d) is obtained since all the quantization displacements must sum to 0. The first term of inequality (e) is obtained under a worst case assumption that θi ≪ 1/n for i ≥ 2. Thus it is quantized to θ′i = 1/n, and the bound |δi| ≤ 1/n is used. The second term is obtained by separating the terms into their intervals. In interval j, the bounds θ′i ≥ n(j−1)β/n, and |δi| ≤ knjβ/n1.5 are used, and also nβ = 2. Inequality (f) is obtained since j ≤ 2n. (29) Inequality (29) is obtained since k1 ≤ n, k2 ≤ (n− k1)/2, k3 ≤ (n− k1)/4− k2/2, and so on, until kJ1 ≤ 2J1−1 2J1−ℓ j ≤ 2n. (30) The reason for these relations are the lower limits of the J1 intervals that restrict the number of parameters inside the interval. The restriction is done in order of intervals, so that the used probabilities are subtracted, leading to the series of equations. Plugging the bounds of (21) and (28) into (27), we obtain, nRn (L ∗,θ) ≤ log k + (1 + ε) k − 1 n (log n) + 5(log e)k 1 + ε′ ) k − 1 n (log n) , (31) where we absorb low order terms in ε′. Replacing ε′ by ε normalizing the redundancy per symbol by n, the bound of the first region of (14) is proved. We now consider the larger values of k, i.e., n1/3 < k = O(n). The idea of the proof is the same. 
However, we need to partition the probability space to different intervals, the spacing within an interval must be optimized, and the parameters’ description cost must be bounded differently, because now there are more parameters quantized than points in the quantization grid. Define the jth interval as n(j−1)β , 1 ≤ j ≤ J2, (32) where J2 = ⌈2/β⌉ = ⌈2 log n⌉. Again, let kj = |θi ∈ Ij| denote the number of probabilities in θ that are in interval Ij. It could be possible to use the intervals as defined in (15), but this would not guarantee bounded redundancy in the rate we require if there are very small probabilities θi ≪ 1/n. Therefore, the interval definition in (15) can be used for larger alphabets only if the probabilities of the symbols are known to be bounded. Define the spacing in interval j as , (33) where α is a parameter to be optimized. Similarly to (17), the interval cardinality here is |Ij| ≤ 0.5 · nα, ∀j : j = 1, 2, . . . , J2, (34) In a similar manner to the definition of τ in (18), we define η = (η1, η2, . . .) = , . . . , , . . . . (35) The cardinality of η is = |η| ≤ 0.5 · nα ⌈2 log n⌉ ≤ nα ⌈log n⌉ . (36) We now perform the encoding similarly to the small k case, where we allow quantization to nonzero values to the components of ϕ up to i = n2. (This is more than needed but is possible since η1 = 1/n 2.) Encoding is performed similarly to the small k case. Thus, similarly to (27), we nRn (L ∗,θ) ≤ 2 log n+ EθLR ′)+ Eθ log Pθ (X Pθ′ (X , (37) where the first term is due to allowing up to k̂ = n2. Since usually in this region k ≥ B2 (except the low end), the description of vectors ϕ and θ′ is done by coding the cardinality of |ϕi = ηj | and |θ′i = ηj |, respectively, i.e., for each grid point the code describes how many letters have probability quantized to this point. This idea resembles coding profiles of patterns, as done in [20]. 
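The non-uniform grid η for the large alphabet case can be sketched numerically. The sketch assumes β = 1/log n (so that n^β = 2), takes interval j to be [2^(j−1)/n², 2^j/n²), consistent with η_1 = 1/n², and takes the spacing inside interval j to be 2^j/n^(2+α), the value suggested by the quantization error bound used later in the proof. These readings of (32) and (33), the rounding of the number of points, and the name `build_grid` are assumptions made for illustration.

```python
import math


def build_grid(n, alpha=1 / 3):
    """Quantization grid with exponentially growing spacing (assumed form)."""
    num_intervals = math.ceil(2 * math.log2(n))                  # J2 = ceil(2 log n)
    points_per_interval = math.ceil(0.5 * n ** alpha - 1e-9)     # about 0.5 * n^alpha
    grid = []
    for j in range(1, num_intervals + 1):
        low = 2 ** (j - 1) / n ** 2          # lower limit of interval j
        spacing = 2 ** j / n ** (2 + alpha)  # spacing doubles from one interval to the next
        grid.extend(low + i * spacing for i in range(points_per_interval))
    return grid


n = 1000
grid = build_grid(n)
print(len(grid), n ** (1 / 3) * math.ceil(math.log2(n)))  # grid size vs. n^alpha * ceil(log n)
print(grid[0], grid[-1])
```

The grid is finest near 1/n² and coarsest near 1, so the relative quantization error stays roughly constant across intervals while the total number of points remains of order n^α log n.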
However, unlike the method in [20], here, many probability parameters of symbols with different occurrences are mapped to the same grid point by quantization. The number of parameters mapped to a grid point of η is coded using Elias’ representation of the integers. Hence, in a similar manner to (21), 1 + log |θ′i = ηj |+ 1 + 2 log 1 + log |θ′i = ηj |+ 1 ≤ B2 +B2 log k +B2 + 2B2 log log k +B2 + o (B2) (1 + ε)(log n) log k nα, for nα < k = o(n), (1 + ε)(1 − α) (log n)2 nα, for nα < k = O(n). The additional 1 term in the logarithm in (a) is for 0 occurrences, (b) is obtained similarly to step (a) of (21), absorbing all low order terms in the last term. To obtain (c), we first assume, for the first region, that knε ≫ B2 (an assumption that must be later validated with the choice of α). Then, low order terms are absorbed in ε. The extra nε factor is unnecessary if k ≫ B2. The second region is obtained by upper bounding k without this factor. It is possible to separate the first region into two regions, eliminate this factor in the lower region, and obtain a more complicated, yet tighter, expression in the upper region, where k ∼ Θ(n1/3). Now, similarly to (28), we obtain Eθ log Pθ (X Pθ′ (X ≤ n(log e) ≤ O(1) + 2 log e n1+2α ≤ 4(log e)n1−2α +O(1). (39) The first term of inequality (a) is obtained under the assumption that k = O(n), θ′i ≥ 1/n2, and |δi| ≤ 1/n2. For the second term |δi| ≤ njβ/n2+α, and θ′i ≥ n(j−1)β/n2. Inequality (b) is obtained in a similar manner to inequality (f) of (28), where the sum is shown similarly to be 2n2. Summing up the contributions of (38) and (39) in (37), it is clear that α = 1/3 minimizes the total cost (to first order). This choice of α also satisfies the assumption of step (c) in (38). Using α = 1/3, absorbing all low order terms in ε and normalizing by n, we obtain the remaining two regions of the bound in (14). It should be noted that the proof here would give a bound of O(n1/3+ε) up to k = O(n4/3). 
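The choice α = 1/3 can be checked by comparing the growth exponents of the two costs. The following is a first-order sketch that ignores constants and logarithmic factors:

```latex
\begin{align*}
\text{description cost, from (38):}\quad & O\!\left(n^{\alpha}(\log n)^{2}\right), \\
\text{quantization cost, from (39):}\quad & O\!\left(n^{1-2\alpha}\right), \\
\text{total cost exponent:}\quad & \max\{\alpha,\; 1-2\alpha\}, \\
\text{minimized when}\quad & \alpha = 1-2\alpha \;\Longrightarrow\; \alpha = \tfrac{1}{3}.
\end{align*}
```

For α < 1/3 the quantization term n^{1−2α} grows faster than n^{1/3}, and for α > 1/3 the description term does, so either deviation increases the first-order cost.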
If the intervals in (15) were used for bounded distributions, the coefficients of the last two regions would be reduced by a factor of 2. Additional manipulations on the grid η may reduce the coefficients further (see, e.g., [28]).

The proof up to this point assumes that k is known in advance. This is important for the code resulting in the bound for the first region because the quantization grid depends on k. Specifically, if k is underestimated in building the grid, the description cost of ϕ increases. If k is overestimated, the quantization cost will increase. Also, if the code of the second region is used for a smaller k, a larger bound than necessary results. To solve this, the optimization that chooses L*(x^n) is done over all possible values of k (greater than or equal to the maximal symbol occurring in x^n), i.e., every greater k in the first region, and the construction of the code for the other regions. For every k in the first region, a different construction is done, using the appropriate k to determine the spacing in each interval. The value of k yielding the shortest code word is then used, and O(log n) additional bits are used at the prefix of the code to inform the decoder which k is used. The analysis continues as before. This does not change the redundancy to first order, giving all three regions of the bound in (14), even if k is unknown in advance. This concludes the proof of Theorem 4. □

5 Upper Bounds for Fast Decaying Distributions

This section shows that with some mild conditions on the source distribution, the same redundancy upper bounds achieved for finite monotonic distributions can be achieved even if the monotonic distribution is over an infinite alphabet. The key observation that allows this is that a distribution that decays fast enough will result in only a small number of occurrences of unlikely letters in a sequence.
These letters may very likely be out of order, but since there are very few of them, they can be handled without increasing the asymptotic behavior of the coding cost. More precisely, fast decaying monotonic distributions can be viewed as if they have some effective bounded alphabet size, where occurrences of symbols outside this limited alphabet are rare. We present two theorems and a corollary that show how one can upper bound the redundancy obtained when coding with some unknown distribution. The first theorem provides a slightly stronger bound (with a smaller coefficient) even for $k = O(n)$, where the smaller coefficient is attained by improved bounding that weights the quantization cost for minimal probabilities more uniformly. In the weaker version of the results presented here, if the distribution decays more slowly and there are more low-probability symbols, the redundancy order does increase due to the penalty of identifying these symbols in a sequence. However, we show, consistently with the results in [10], that as long as the entropy of the source is finite, a universal code, in the sense of diminishing redundancy per symbol, does exist. We begin by stating the two theorems and the corollary; the proofs follow. The section concludes with three examples of typical monotonic distributions over the integers, to which the bounds are applied.

5.1 Upper Bounds

We begin with some notation. Fix an arbitrarily small $\varepsilon > 0$, and let $n \to \infty$. Define $m \triangleq n^{\rho}$ as the effective alphabet size, where $\rho > \varepsilon$. (Note that $\rho = (\log m)/(\log n)$.) Let
\[
R_n(m) \triangleq
\begin{cases}
\dfrac{m-1}{2n}\,\log\dfrac{n}{m^{3}}, & \text{for } m = o\left(n^{1/3}\right), \\[2mm]
\dfrac{1}{2}\left(\rho+\dfrac{2}{3}\right)\left(\rho+\varepsilon-\dfrac{1}{3}\right)\dfrac{(\log n)^{2}}{n}\, n^{1/3}, & \text{otherwise.}
\end{cases}
\tag{40}
\]

Theorem 5 I. Fix an arbitrarily small $\varepsilon > 0$, and let $n \to \infty$. Let $x^n$ be generated by an i.i.d. monotonic distribution $\theta \in \mathcal{M}$. If there exists $m^*$ such that
\[
n\sum_{i>m^*}\theta_i \log i = o\left[n R_n(m^*)\right],
\tag{41}
\]
then there exists a code with length function $L^*(\cdot)$ such that
\[
R_n(L^*,\theta) \le (1+\varepsilon)\,R_n(m^*)
\tag{42}
\]
for the monotonic distribution $\theta$.
II. If there exists $m^*$ for which $\rho^* = o\left(n^{1/3}/(\log n)\right)$, such that
\[
\sum_{i>m^*}\theta_i \log i = o(1),
\tag{43}
\]
then there exists a universal code with length function $L^*(\cdot)$ such that
\[
R_n(L^*,\theta) = o(1).
\tag{44}
\]

Theorem 5 implies that if a monotonic distribution decays fast enough, its effective alphabet size does not exceed $O(n^{\rho})$, and, as long as $\rho$ is fixed, bounds of the same order as those obtained for finite alphabets are achievable. Specifically, very fast decaying distributions, although over infinite alphabets, may even behave like monotonic distributions with $o(n^{1/3})$ symbols. The condition in (41) merely means that the cost of coding very rare symbols, those larger than the effective alphabet size, is negligible w.r.t. the total cost incurred by the other, more likely, symbols. Note that for $m = n$, the bound is tighter than that of the third region of Theorem 4, and a constant of $5/9$ replaces $2/3$. The second part of the theorem states that if the decay is slow, but the cost of coding rare symbols is still diminishing per symbol, a universal code still exists for such distributions. However, in this case the redundancy will be dominated by coding the rare (out of order) symbols. This result leads to the following corollary:

Corollary 1 As $n \to \infty$, sequences generated by monotonic distributions with $H_\theta(X) = O(1)$ are universally compressible in the average sense.

Corollary 1 shows that sequences generated by finite entropy monotonic distributions can be compressed in the average with diminishing per-symbol redundancy. This result is consistent with the results shown in [10].

While Theorem 5 bounds the redundancy decay rate with two extremes, a more general theorem gives the best redundancy decay rate to which a code can adapt for an unknown monotonic distribution that governs the data. As the examples at the end of this section show, the next theorem is very useful for slower decaying distributions.
Theorem 6 Fix an arbitrarily small $\varepsilon > 0$, and let $n \to \infty$. Let $x^n$ be generated by an i.i.d. monotonic distribution $\theta \in \mathcal{M}$. Then there exists a code with length function $L^*(\cdot)$ that achieves redundancy
\[
n R_n(L^*,\theta) \le (1+\varepsilon)\cdot\min_{\alpha,\rho:\,\rho\ge\alpha+\varepsilon}\left\{\frac{1}{2}\cdot(\rho+2\alpha)(\rho-\alpha)(\log n)^{2}n^{\alpha} + 5(\log e)\,n^{1-2\alpha} + \left(1+\frac{1}{\rho}\right) n\sum_{i>n^{\rho}}\theta_i\log i\right\}
\tag{45}
\]
for coding sequences generated by the source $\theta$.

We continue with proving the two theorems and the corollary.

Proof: The idea of the proof of both theorems is to separate the more likely symbols from the unlikely ones. First, the code determines the point of separation $m = n^{\rho}$. (Note that $\rho$ can be greater than 1.) Then, all symbols $i \le m$ are considered likely and are quantized in a similar manner as in the codes for smaller alphabets. Unlike bounded alphabets, though, a more robust grid is used here to allow larger values of $m$. Coding of occurrences of these symbols uses the quantized probabilities. The unlikely symbols are coded hierarchically. They are first merged into a single symbol, and then are coded within this symbol, where the full cost of conveying to the decoder which rare symbols occur in the sequence is required. Thus, they are described by giving their actual values. As long as the decay is fast enough, the average cost of conveying these symbols becomes negligible w.r.t. the cost of coding the likely symbols. If the decay is slower, but still fast enough, as in the case described by condition (43), the coding cost of the rare symbols dominates the redundancy, but diminishing redundancy can still be achieved. In order to determine the best value of $m$ for a given sequence, all values are tried and the one yielding the shortest description is used for coding the specific sequence $x^n$.

Let $m \ge 2$ determine the number of likely symbols in the alphabet. For a given $m$, define
\[
S_m \triangleq \sum_{i>m}\theta_i
\tag{46}
\]
as the total probability of the remaining symbols.
Given $\theta$, $m$, and $S_m$, a probability
\[
P(x^n \mid m, S_m, \theta) \triangleq \prod_{i=1}^{m}\theta_i^{\,n_x(i)}\cdot S_m^{\,n_x(x>m)}\cdot\prod_{i>m}\left(\frac{n_x(i)}{n_x(x>m)}\right)^{n_x(i)}
\tag{47}
\]
can be computed for $x^n$, where $n_x(i)$ is the occurrence count of symbol $i$ in $x^n$, and $n_x(x>m)$ is the count of all symbols greater than $m$ in $x^n$. This probability mass function clusters all large symbols (with small probabilities) greater than $m$ into one symbol. Then, it uses the ML estimate of each of the large symbols to distinguish among them within the clustered symbol.

For every $m$, we can define a quantization grid $\xi_m$ for the first $m$ probability parameters of $\theta$. The idea is similar to that used for all probability parameters in the proof of Theorem 4. If $m = o(n^{1/3})$, we use $\xi_m = \tau_m$, where $\tau_m$ is the grid defined in (18) with $m$ replacing $k$. Otherwise, we can use the definition of $\eta$ in (35). However, to obtain tighter bounds for large $m$, we define a different grid for the larger values of $m$, following similar steps to those in (32)-(36). First, define the $j$th interval as
\[
I_j = \left[\frac{n^{(j-1)\beta}}{n^{\rho+2\alpha}},\ \frac{n^{j\beta}}{n^{\rho+2\alpha}}\right), \quad 1 \le j \le J_\rho,
\tag{48}
\]
where $\rho = (\log m)/(\log n)$ as defined above, $\alpha$ is a parameter, and $\beta = 1/(\log n)$ as before. Within the $j$th interval, we define the spacing in the grid by
\[
\Delta_j = \frac{n^{j\beta}}{n^{\rho+3\alpha}}.
\tag{49}
\]
As in (34),
\[
|I_j| \le 0.5\cdot n^{\alpha}, \quad \forall j: j = 1, 2, \ldots, J_\rho,
\tag{50}
\]
and the total number of intervals is
\[
J_\rho = \left\lceil(\rho+2\alpha)\log n\right\rceil.
\tag{51}
\]
Similarly to (35), $\xi_m$ is defined as
\[
\xi_m = (\xi_1, \xi_2, \ldots) = \left(\frac{1}{n^{\rho+2\alpha}},\ \frac{1}{n^{\rho+2\alpha}}+\frac{2}{n^{\rho+3\alpha}},\ \ldots,\ \frac{2}{n^{\rho+2\alpha}},\ \frac{2}{n^{\rho+2\alpha}}+\frac{4}{n^{\rho+3\alpha}},\ \ldots\right).
\tag{52}
\]
The cardinality of $\xi_m$ is thus
\[
B_\rho \triangleq |\xi_m| \le 0.5\cdot n^{\alpha}\left\lceil(\rho+2\alpha)\log n\right\rceil.
\tag{53}
\]

An $m$th order quantized version $\theta'_m$ of $\theta$ is obtained by quantizing $\theta_i$, $i = 2, 3, \ldots, m$, onto $\xi_m$, such that $\theta'_i \in \xi_m$ for these values of $i$. Then, the remaining cluster probability $S_m$ is quantized into $S'_m \in [1/n, 2/n, \ldots, 1]$. The parameter $\theta'_1$ is constrained by the quantization of the other parameters. Quantization is performed in a similar manner as before, to minimize the accumulating cost and retain monotonicity.
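As a rough numerical check of the grid construction in (48)-(53), the following sketch builds the grid with exact rational arithmetic and counts its points. It rests on assumptions that are not stated in the text: logarithms are taken to base 2 (so that $n^{\beta} = 2$), the intervals are half-open, and $\rho \log n$ and $\alpha \log n$ are integers. The function name `build_grid` and its parameters are hypothetical and serve only for illustration.

```python
from fractions import Fraction


def build_grid(rho_bits: int, alpha_bits: int) -> list:
    """Build the quantization grid with exact fractions.

    Assumptions (illustrative only): base-2 logarithms, so n**beta == 2;
    rho_bits = rho * log2(n) and alpha_bits = alpha * log2(n) are integers,
    with alpha_bits >= 1.
    """
    denom_bits = rho_bits + 2 * alpha_bits   # log2 of n**(rho + 2*alpha)
    num_intervals = denom_bits               # J_rho = (rho + 2*alpha) * log2(n)
    grid = []
    for j in range(1, num_intervals + 1):
        lower = Fraction(2 ** (j - 1), 2 ** denom_bits)            # left end of I_j
        upper = Fraction(2 ** j, 2 ** denom_bits)                  # right end of I_j
        spacing = Fraction(2 ** j, 2 ** (denom_bits + alpha_bits)) # spacing in I_j
        point = lower
        while point < upper:
            grid.append(point)
            point += spacing
    return grid


# Example: n = 2**12, rho = 1, alpha = 1/3, so n**alpha = 16.
example_grid = build_grid(rho_bits=12, alpha_bits=4)
cardinality_bound = (2 ** 4) * (12 + 2 * 4) // 2   # 0.5 * n**alpha * (rho + 2*alpha) * log2(n)
```

Under these assumptions every interval holds exactly $0.5\,n^{\alpha}$ points, so the cardinality bound in (53) is met with equality in the example.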
Now, for any $m \ge 2$, let $\varphi_m$ be any monotonic probability vector of cardinality $m$ whose last $m-1$ components are quantized into $\xi_m$, and let $\sigma_m \in [1/n, 2/n, \ldots, 1]$ be a quantized estimate of the total probability of the remaining symbols, such that $\sum_{i=1}^{m}\varphi_{i,m} + \sigma_m = 1$, where $\varphi_{i,m}$ is the $i$th component of $\varphi_m$. If $m$, $\sigma_m$, and $\varphi_m$ are known, a given $x^n$ can be coded using $P(x^n \mid m, \sigma_m, \varphi_m)$ as defined in (47), where $\sigma_m$ replaces $S_m$, and the $m$ components of $\varphi_m$ replace the first $m$ components of $\theta$. However, in the universal setting, none of these parameters are known in advance. Furthermore, neither the symbols greater than $m$ nor their conditional ML probabilities are known in advance. Therefore, the total cost of coding $x^n$ using these parameters requires universality costs for describing them. The cost of universally coding $x^n$, assigning probability $P(x^n \mid m, \sigma_m, \varphi_m)$ to it, thus requires the following five components:
1) $m$ should be described using Elias' representation with at most $1 + \rho\log n + 2\log(1+\rho\log n)$ bits.
2) The value of $\sigma_m$ in its quantization grid should be coded using $\log n$ bits.
3) The $m$ components of $\varphi_m$ require $L_R(\varphi_m)$ bits (a quantity bounded below).
4) The number $c_x(x>m)$ of distinct letters in $x^n$ greater than $m$ is coded using $\log n$ bits.
5) Each letter $i > m$ in $x^n$ is coded. Elias' coding for the integers using $1 + \log i + 2\log(1+\log i)$ bits can be used, but to simplify the derivation we can also use the code, also presented in [7], that uses no more than $1 + 2\log i$ bits to describe $i$. In addition, at most $\log n$ bits are required for describing $n_x(i)$ in $x^n$.

For $n \to \infty$, $m \gg 1$, and $\varepsilon > 0$ arbitrarily small, this yields a total cost of
\[
L(x^n \mid m, \sigma_m, \varphi_m) \le -\log P(x^n \mid m, \sigma_m, \varphi_m) + L_R(\varphi_m) + \left[(1+\varepsilon)\rho + c_x(x>m) + 2\right]\log n + c_x(x>m) + 2\sum_{i>m,\,i\in x^n}\log i,
\tag{54}
\]
where we assume $m$ is large enough to bound the cost of describing $m$ by $(1+\varepsilon)\rho\log n$.

The description cost of $\varphi_m$ for $m = o(n^{1/3})$ is bounded by
\[
L_R(\varphi_m) \le (1+\varepsilon)\,\frac{m-1}{2}\,\log\frac{n}{m^{3}}
\tag{55}
\]
using (21), where $m$ replaces $k$.
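The $(\log n)^2$ factor in (21) can be absorbed in $\varepsilon$ since we limit $m$ to $o(n^{1/3})$, unlike the derivation in (21). For larger values of $m$, we describe symbol probabilities of $\varphi_m$ in the grid $\xi_m$ in a similar manner to the description of $O(n)$ symbol probabilities in the grid $\eta$. Similarly to (38), we thus have
\[
\begin{aligned}
L_R(\varphi_m) &\le B_\rho + B_\rho\log\frac{n^{\rho}+B_\rho}{B_\rho} + 2B_\rho\log\log\frac{n^{\rho}+B_\rho}{B_\rho} + o(B_\rho) \\
&\overset{(a)}{\le} (1+\varepsilon)\,\frac{1}{2}\,(\rho+2\alpha)(\rho+\varepsilon-\alpha)(\log n)^{2}n^{\alpha},
\end{aligned}
\tag{56}
\]
where to obtain inequality (a), we first multiply $n^{\rho}$ by $n^{\varepsilon}$ in the numerator of the argument of the logarithm. This is only necessary for $\rho \to \alpha$ to guarantee that $n^{\rho+\varepsilon} \gg B_\rho$. Substituting the bound on $B_\rho$ from (53) and absorbing low order terms in the leading $\varepsilon$ yields the bound.

A sequence $x^n$ can now be coded using the universal parameters that minimize the length of the sequence description, i.e.,
\[
L^*(x^n) \triangleq \min_{m'\ge 2}\ \min_{\sigma_{m'}\in[\frac{1}{n},\ldots,1]}\ \min_{\varphi_{m'}:\,\varphi_i\in\xi_{m'},\,i\ge 2} L\left(x^n \mid m', \sigma_{m'}, \varphi_{m'}\right) \le L\left(x^n \mid m, S'_m, \theta'_m\right),
\tag{57}
\]
where $\theta'_m$ and $S'_m$ are the true source parameters quantized as described above, and the inequality holds for every $m$. Note that the minimization over $m'$ should be performed only up to the maximal symbol that occurs in $x^n$. Following (54)-(57), up to negligible integer length constraints, the average redundancy using $L^*(\cdot)$ is bounded, for every $m \ge 2$, by
\[
\begin{aligned}
n R_n(L^*,\theta) &= E_\theta\left[L^*(X^n) + \log P_\theta(X^n)\right] \\
&\overset{(a)}{\le} E_\theta\left[L\left(X^n \mid m, S'_m, \theta'_m\right) + \log P_\theta(X^n)\right] \\
&\overset{(b)}{\le} E_\theta\log\frac{P_\theta(X^n)}{P\left(X^n \mid m, S'_m, \theta'_m\right)} + L_R(\theta'_m) + 2\sum_{i>m}P_\theta(i\in X^n)\log i \\
&\quad + (1+\varepsilon)\left[E_\theta C_x(X>m) + \rho + 2\right]\log n,
\end{aligned}
\tag{58}
\]
where (a) follows from (57), and (b) follows from averaging on (54) with $\sigma_m = S'_m$ and $\varphi_m = \theta'_m$, where the average on $c_x(x>m)$ is absorbed in the leading $\varepsilon$.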
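Expressing $P_\theta(x^n)$ as
\[
P_\theta(x^n) = \prod_{i=1}^{m}\theta_i^{\,n_x(i)}\cdot S_m^{\,n_x(x>m)}\cdot\prod_{i>m}\left(\frac{\theta_i}{S_m}\right)^{n_x(i)},
\tag{59}
\]
and defining $\delta_S = S_m - S'_m$, the first term of (58) is bounded, for the upper region of $m$, by
\[
\begin{aligned}
E_\theta\log\frac{P_\theta(X^n)}{P\left(X^n \mid m, S'_m, \theta'_m\right)}
&\le E_\theta\left[\sum_{i=1}^{m}N_x(i)\log\frac{\theta_i}{\theta'_{i,m}} + N_x(X>m)\log\frac{S_m}{S'_m} + \sum_{i>m}N_x(i)\log\frac{\theta_i/S_m}{N_x(i)/N_x(X>m)}\right] \\
&\overset{(a)}{\le} n\cdot\sum_{i=1}^{m}\theta_i\log\frac{\theta_i}{\theta'_{i,m}} + nS_m\log\frac{S_m}{S'_m} \\
&\overset{(b)}{\le} n(\log e)\left[\sum_{i=1}^{m}\frac{\delta_i^{2}}{\theta'_{i,m}} + \frac{\delta_S^{2}}{S'_m}\right] \\
&\overset{(c)}{\le} (\log e)\cdot n\cdot\frac{n^{\rho}}{n^{\rho+2\alpha}} + 2(\log e)\,n^{1-\rho-4\alpha}\cdot\sum_{j=1}^{J_\rho}k_j n^{j\beta} + \log e \\
&\overset{(d)}{\le} 5(\log e)\,n^{1-2\alpha} + \log e,
\end{aligned}
\tag{60}
\]
where (a) holds since, for the third term, the conditional ML probability used for coding is greater than the actual conditional probability assigned to all letters greater than $m$ for every $x^n$. Hence, the third term is bounded by 0. For the other terms, expectation is performed. Inequality (b) is obtained similarly to (28), where quantization includes the first $m$ components of $\theta$ and the parameter $S_m$. Then, inequality (c) follows the same reasoning as step (a) of (39). The first term bounds the worst case in which all $n^{\rho}$ symbols are quantized to $1/n^{\rho+2\alpha}$ with $|\delta_i| \le 1/n^{\rho+2\alpha}$. The second term is obtained where $\theta'_{i,m} \ge n^{(j-1)\beta}/n^{\rho+2\alpha}$ and $|\delta_i| \le n^{j\beta}/n^{\rho+3\alpha}$ for $\theta_i \in I_j$, and $k_j = |\theta_i \in I_j|$ as before. The last term holds since $S'_m \ge 1/n$ and $|\delta_S| \le 1/n$. Finally, (d) is obtained similarly to step (b) of (39), where, as in (29), $\sum_j k_j n^{j\beta} \le 2n^{\rho+2\alpha}$. For $m = o(n^{1/3})$, the same initial steps up to step (b) in (60) are applied, and then the remaining steps in (28) are applied to the left sum with $m$ replacing $k$, yielding a total quantization cost of $5(\log e)m + \log e$.

To bound the third and fourth terms of (58), we realize that
\[
P_\theta(i\in X^n) = 1 - (1-\theta_i)^{n} \le n\theta_i.
\tag{61}
\]
Similarly,
\[
E_\theta C_x(X>m) = \sum_{i>m}P_\theta(i\in X^n) \le nS_m.
\tag{62}
\]
Combining the dominant terms of the third and fourth terms of (58), we have
\[
\begin{aligned}
2\sum_{i>m}P_\theta(i\in X^n)\log i + (1+\varepsilon)E_\theta C_x(X>m)\log n
&\overset{(a)}{=} \sum_{i>m}P_\theta(i\in X^n)\left[2\log i + (1+\varepsilon)\log n\right] \\
&\overset{(b)}{\le} \left(2+\frac{1+\varepsilon}{\rho}\right)\sum_{i>m}P_\theta(i\in X^n)\log i \\
&\overset{(c)}{\le} \left(2+\frac{1+\varepsilon}{\rho}\right)n\sum_{i>m}\theta_i\log i,
\end{aligned}
\tag{63}
\]
where (a) is because $E_\theta C_x(X>m) = \sum_{i>m}P_\theta(i\in X^n)$, (b) is because for $i > m = n^{\rho}$, $\log i > \rho\log n$, and (c) follows from (61).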
Given $\rho > \varepsilon$ for an arbitrary fixed $\varepsilon > 0$, the resulting coefficient above is upper bounded by some constant $\kappa$. Summing up the contributions of the terms of (58) from (28), (55), and (63), absorbing low order terms in a leading $\varepsilon'$, we obtain that for $m = o(n^{1/3})$,
\[
n R_n(L^*,\theta) \le (1+\varepsilon')\,\frac{m-1}{2}\,\log\frac{n}{m^{3}} + \kappa n\sum_{i>m}\theta_i\log i.
\tag{64}
\]
For the second region, substituting $\alpha = 1/3$, and summing up the contributions of (60), (56), and (63) to (58), absorbing low order terms in $\varepsilon'$, we obtain
\[
n R_n(L^*,\theta) \le (1+\varepsilon')\,\frac{1}{2}\left(\rho+\frac{2}{3}\right)\left(\rho+\varepsilon'-\frac{1}{3}\right)(\log n)^{2}\,n^{1/3} + \kappa n\sum_{i>m}\theta_i\log i.
\tag{65}
\]
Since (64)-(65) hold for every $m > n^{\varepsilon}$, there exists $m^*$ for which the minimal bound is obtained. To bound the redundancy, we choose this $m^*$. Now, if the condition in (41) holds, then the second term in (64) and (65) is negligible w.r.t. the first term. Absorbing it in a leading $\varepsilon$ and normalizing by $n$ yields the upper bound of (42), and concludes the proof of Part I of Theorem 5. For Part II of Theorem 5, we consider the bound of the second region in (65). If there exists $\rho^* = o\left(n^{1/3}/(\log n)\right)$ for which the condition in (43) holds, then both terms of (65) are $o(n)$, yielding a total redundancy per symbol of $o(1)$. The proof of Theorem 5 is concluded. $\Box$

To prove Corollary 1, we use Wyner's inequality [32], which implies that for a finite entropy monotonic distribution,
\[
\sum_{i\ge 1}\theta_i\log i = E_\theta[\log X] \le H_\theta[X].
\tag{66}
\]
Since the sum on the left hand side of (66) is finite if $H_\theta[X]$ is finite, there must exist some $n_0$ such that $\sum_{i>n_0}\theta_i\log i = o(1)$. Let $n > n_0$; then for $m^* = n$ and $\rho^* = 1$, condition (43) is satisfied. Therefore, (44) holds, and the proof of Corollary 1 is concluded. $\Box$

We now consider only the upper region in (58) with parameters $\alpha$ and $\rho$ taking any valid value. (The code leading to the bound of the upper region can be applied even if the actual effective alphabet size is in the lower region.) We can sum up the contributions of (60), (56), and (63) to (58), absorbing low order terms in $\varepsilon$. Equation (56) is valid without the middle $\varepsilon$ term as long as $\rho \ge \alpha + \varepsilon$.
Since, in the upper region of $m$, $i \ge m$ is large enough, Elias' code for the integers can be used, costing $(1+\varepsilon)\log i$ to code $i$, with $\varepsilon > 0$ which can be made arbitrarily small. Hence, the leading coefficient of the bound in (63) can be replaced by $(1+\varepsilon)(1+1/\rho)$. This yields the expression bounding the redundancy in (45). This expression applies to every valid choice of $\alpha$ and $\rho$, including the choice that minimizes the expression. Thus the proof of Theorem 6 is concluded. $\Box$

5.2 Examples

We demonstrate the use of the bounds of Theorems 5 and 6 with three typical distributions over the integers. We specifically show that the redundancy rate of $O\left(n^{1/3+\varepsilon}\right)$ bits overall is achievable when coding many of the typical monotonic distributions, and, in fact, for many distributions faster convergence rates are achievable with the codes provided in proving the theorems above. The assumption that very few unlikely symbols are likely to appear in a sequence generated by a monotonic distribution, which is reflected in the conditions in (41) and (43), is very realistic even in practical examples. Specifically, in the phone book example, there may be many rare names, but only very few of them may occur in a certain city, and the more common names constitute most of any possible phone book sequence.

5.2.1 Fast Decaying Distributions Over the Integers

Consider the monotonic distributions over the integers of the form
\[
\theta_i = \frac{a}{i^{1+\gamma}}, \quad i = 1, 2, \ldots,
\tag{67}
\]
where $\gamma > 0$, and $a$ is a normalization coefficient that guarantees that the probabilities over all integers sum to 1. It is easy to show by approximating summation by integration that for some $m \to \infty$,
\[
S_m \le (1+\varepsilon)\,\frac{a}{\gamma m^{\gamma}},
\tag{68}
\]
\[
\sum_{i>m}\theta_i\log i \le (1+\varepsilon)\,\frac{a\log m}{\gamma m^{\gamma}}.
\tag{69}
\]
For $m = n^{\rho}$ and fixed $\rho$, the sum in (41) is thus $O\left(n^{1-\rho\gamma}\log n\right)$, which is $o\left(n^{1/3}(\log n)^{2}\right)$ for every $\rho \ge 2/(3\gamma)$. Specifically, as long as $\gamma \le 2$ (slow decay), the minimal value of $\rho$ required to guarantee negligibility of the sum in (41) is at least $1/3$.
Using Theorem 5, this implies that for $\gamma \le 2$, the second (upper) region of the upper bound in (42) holds with the minimal choice of $\rho^* = 2/(3\gamma)$. Plugging this value into the second region of (40) (i.e., in (42)) yields the upper bound shown below for this region. For $\gamma > 2$, $2/(3\gamma) < 1/3$. Hence, (41) holds for $m^* = o(n^{1/3})$. This means that for the distribution in (67) with $\gamma > 2$, the effective alphabet size is $o(n^{1/3})$, and thus the achievable redundancy is in the first region of the bound of (42). Thus, even though the distribution is over an infinite alphabet, its compressibility behavior is similar to that of a distribution over a relatively small alphabet. To find the exact redundancy rate, we balance between the contributions of (55) and (63) in (58). As long as $1 - \rho\gamma < \rho$, condition (41) holds, and the contribution of low-probability letters in (63) is negligible w.r.t. the other terms of the redundancy. Equality, implying $\rho^* = 1/(1+\gamma)$, achieves the minimal redundancy rate. Thus, for $\gamma > 2$,
\[
\begin{aligned}
n R_n(L^*,\theta) &\overset{(a)}{\le} (1+\varepsilon)\left[\frac{a(2\rho^*+1)}{\gamma}\,n^{1-\rho^*\gamma}\log n + \frac{n^{\rho^*}}{2}\,(1-3\rho^*)\log n\right] \\
&\overset{(b)}{=} (1+\varepsilon)\left[\frac{a(\gamma+3)}{\gamma(\gamma+1)} + \frac{\gamma-2}{2(\gamma+1)}\right]n^{\frac{1}{1+\gamma}}\log n,
\end{aligned}
\tag{70}
\]
where the first term in (a) follows from the bounds in (63) and (69), with $m = n^{\rho^*}$, and the second term from that in (55), and (b) follows from $\rho^* = 1/(1+\gamma)$. Note that for a fixed $\rho^*$, the factor 3 in the first term can be reduced to 2 with Elias' coding for the integers. The results described are summarized in the following corollary:

Corollary 2 Let $\theta \in \mathcal{M}$ be defined in (67). Then there exists a universal code with length function $L^*(\cdot)$ that has only prior knowledge that $\theta \in \mathcal{M}$, that can achieve universal coding redundancy
\[
R_n(L^*,\theta) \le
\begin{cases}
(1+\varepsilon)\,\dfrac{1}{3}\left(1+\dfrac{1}{\gamma}\right)\left(\dfrac{2}{3\gamma}+\varepsilon-\dfrac{1}{3}\right)\dfrac{n^{1/3}(\log n)^{2}}{n}, & \text{for } \gamma \le 2, \\[2mm]
(1+\varepsilon)\left[\dfrac{a(\gamma+3)}{\gamma(\gamma+1)} + \dfrac{\gamma-2}{2(\gamma+1)}\right]\dfrac{n^{\frac{1}{1+\gamma}}\log n}{n}, & \text{for } \gamma > 2.
\end{cases}
\tag{71}
\]

Corollary 2 gives the redundancy rates for all distributions defined in (67). For example, if $\gamma = 1$, the redundancy is $O\left(n^{1/3}(\log n)^{2}\right)$ bits overall with coefficient $2/9$. For $\gamma = 3$, $O(n^{1/4}\log n)$ bits are required. For faster decays (greater $\gamma$) even smaller redundancy rates are achievable.
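The two numerical examples above can be reproduced with a short sketch. It assumes the leading coefficient of the upper region is $\frac{1}{2}(\rho^*+\frac{2}{3})(\rho^*-\frac{1}{3})$ in the limit $\varepsilon \to 0$, which is how the coefficient $2/9$ quoted for $\gamma = 1$ is obtained here; the function names are hypothetical.

```python
from fractions import Fraction


def rho_star(gamma):
    """Exponent rho* of the effective alphabet size m* = n**rho*."""
    gamma = Fraction(gamma)
    if gamma <= 2:
        return Fraction(2) / (3 * gamma)   # slow decay: rho* = 2 / (3 * gamma)
    return Fraction(1) / (1 + gamma)       # fast decay: rho* = 1 / (1 + gamma)


def redundancy_exponent(gamma):
    """Power of n in the total redundancy, ignoring logarithmic factors."""
    gamma = Fraction(gamma)
    return Fraction(1, 3) if gamma <= 2 else rho_star(gamma)


def upper_region_coefficient(gamma):
    """Assumed leading coefficient for gamma <= 2 with epsilon -> 0."""
    gamma = Fraction(gamma)
    if gamma > 2:
        raise ValueError("the upper region applies only for gamma <= 2")
    rho = rho_star(gamma)
    return Fraction(1, 2) * (rho + Fraction(2, 3)) * (rho - Fraction(1, 3))


coefficient_gamma_1 = upper_region_coefficient(1)   # Fraction(2, 9)
exponent_gamma_3 = redundancy_exponent(3)           # Fraction(1, 4)
```

At $\gamma = 2$ the two regimes meet: $\rho^* = 1/3$ under either formula.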
5.2.2 Geometric Distributions

Geometric distributions given by
\[
\theta_i = p\,(1-p)^{i-1}; \quad i = 1, 2, \ldots,
\tag{72}
\]
where $0 < p < 1$, decay even faster than the distribution over the integers in (67). Thus their effective alphabet sizes are even smaller. This implies that a universal code can have even smaller redundancy than that presented in Corollary 2 when coding sequences generated by a geometric distribution (even if this is unknown in advance, and the only prior knowledge is that $\theta \in \mathcal{M}$). Choosing $m = \ell\cdot\log n$, the contribution of low probability symbols in (63) to (58) can be upper bounded by
\[
\begin{aligned}
2n\sum_{i>m}\theta_i\,(\log i + \log n) &\overset{(a)}{\le} 2n(1-p)^{m}\log n + O\left(n(1-p)^{m}\log m\right) \\
&\overset{(b)}{=} 2n^{1+\ell\log(1-p)}(\log n) + O\left(n^{1+\ell\log(1-p)}\log\log n\right),
\end{aligned}
\tag{73}
\]
where (a) follows from computing $S_m$ using geometric series, and bounding the second term, and (b) follows from substituting $m = \ell\log n$ and representing $(1-p)^{\ell\log n}$ as $n^{\ell\log(1-p)}$. As long as $\ell \ge 1/(-\log(1-p))$, the expression in (73) is $O(\log n)$, thus negligible w.r.t. the redundancy upper bound of (42) with $m^* = \ell^*\log n = (\log n)/(-\log(1-p))$. Substituting this $m^*$ in (42), we obtain the following corollary:

Corollary 3 Let $\theta \in \mathcal{M}$ be a geometric distribution defined in (72). Then there exists a universal code with length function $L^*(\cdot)$ that has only prior knowledge that $\theta \in \mathcal{M}$, that can achieve universal coding redundancy
\[
R_n(L^*,\theta) \le \frac{1+\varepsilon}{-2\log(1-p)}\cdot\frac{(\log n)^{2}}{n}.
\tag{74}
\]

Corollary 3 shows that if $\theta$ parameterizes a geometric distribution, sequences governed by $\theta$ can be coded with average universal coding redundancy of $O\left((\log n)^{2}\right)$ bits. Their effective alphabet size is $O(\log n)$, implying that larger symbols are very unlikely to occur. For example, for $p = 0.5$, the effective alphabet size is $\log n$, and $0.5(\log n)^{2}$ bits are required for a universal code. For $p = 0.75$, the effective alphabet size is $(\log n)/2$, and $(\log n)^{2}/4$ bits are required by a universal code.
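The two numerical examples for $p = 0.5$ and $p = 0.75$ can be checked with a few lines. The sketch assumes base-2 logarithms and drops the $(1+\varepsilon)$ factor; the function names are hypothetical.

```python
import math


def geometric_effective_alphabet(n: int, p: float) -> float:
    """Effective alphabet size m* = log2(n) / (-log2(1 - p))."""
    return math.log2(n) / (-math.log2(1.0 - p))


def geometric_redundancy_bits(n: int, p: float) -> float:
    """Leading term of the total redundancy: (log2 n)**2 / (-2 * log2(1 - p))."""
    return math.log2(n) ** 2 / (-2.0 * math.log2(1.0 - p))


def expected_rare_occurrences(n: int, p: float) -> float:
    """n * S_m at m = m*, with S_m = (1 - p)**m for the geometric distribution."""
    m = geometric_effective_alphabet(n, p)
    return n * (1.0 - p) ** m


n_example = 2 ** 20   # log2(n) = 20
for p_example in (0.5, 0.75):
    print(p_example,
          geometric_effective_alphabet(n_example, p_example),
          geometric_redundancy_bits(n_example, p_example))
```

At $m = m^*$ the expected number of occurrences of symbols beyond the effective alphabet, $nS_m$, is about one, which is why their description cost stays at $O(\log n)$.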
5.2.3 Slow Decaying Distributions Over the Integers

Up to now, we considered fast decaying distributions, which all achieved the $O(n^{1/3+\varepsilon}/n)$ redundancy rate. We now consider a slowly decaying monotonic distribution over the integers, given by
\[
\theta_i = \frac{a}{i\,(\log i)^{2+\gamma}}, \quad i = 2, 3, \ldots,
\tag{75}
\]
where $\gamma > 0$ and $a$ is a normalizing factor (see, e.g., [12], [27]). This distribution has finite entropy only if $\gamma > 0$ (but is a valid infinite entropy distribution for $-1 < \gamma \le 0$). Unlike the previous distributions, we need to use Theorem 6 to bound the redundancy for coding sequences generated by this distribution. Approximating the sum with an integral, the sum in the third term of (45) is of order
\[
\sum_{i>m}\theta_i\log i = O\left(\frac{1}{(\log m)^{\gamma}}\right).
\tag{76}
\]
In order to minimize the redundancy bound of (45), we define $\rho = n^{\ell}$. For the minimum rate, all terms of (45) must be balanced. To achieve that, we must have
\[
\alpha + 2\ell = 1 - 2\alpha = 1 - \gamma\ell.
\tag{77}
\]
The solution is $\alpha = \gamma/(4+3\gamma)$, and $\ell = 2/(4+3\gamma)$. Substituting these values in the expression of (45), with $\rho = n^{\ell}$, results in the first term in (45) dominating, and yields the following corollary:

Corollary 4 Let $\theta \in \mathcal{M}$ be defined in (75) with $\gamma > 0$. Then there exists a universal code with length function $L^*(\cdot)$ that has only prior knowledge that $\theta \in \mathcal{M}$, that can achieve universal coding redundancy
\[
R_n(L^*,\theta) \le (1+\varepsilon)\,\frac{1}{2}\cdot\frac{n^{\frac{\gamma+4}{3\gamma+4}}(\log n)^{2}}{n}.
\tag{78}
\]

Due to the slow decay rate of the distribution in (75), the effective alphabet size is much greater here. For $\gamma = 1$, for example, it is $n^{n^{2/7}}$. This implies that very large symbols are likely to appear in $x^n$. As $\gamma$ increases, though, the effective alphabet size decreases, and as $\gamma \to \infty$, $m \to n$. The redundancy rate increases due to the slow decay. For $\gamma \ge 1$, it is $O\left(n^{5/7}(\log n)^{2}/n\right)$. As $\gamma \to \infty$, since the distribution tends to decay faster, the redundancy rate tends to the finite alphabet rate $O\left(n^{1/3}(\log n)^{2}/n\right)$. However, as the decay rate becomes slower ($\gamma \to 0$), a non-diminishing redundancy rate is approached.
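The balance condition in (77) and the resulting exponents can be verified exactly. The sketch below only checks the algebra stated in the text; the function names are hypothetical.

```python
from fractions import Fraction


def balanced_exponents(gamma):
    """Return (alpha, ell) solving alpha + 2*ell = 1 - 2*alpha = 1 - gamma*ell."""
    gamma = Fraction(gamma)
    alpha = gamma / (4 + 3 * gamma)
    ell = Fraction(2) / (4 + 3 * gamma)
    return alpha, ell


def total_redundancy_exponent(gamma):
    """Power of n in the total redundancy, alpha + 2*ell, ignoring log factors."""
    alpha, ell = balanced_exponents(gamma)
    return alpha + 2 * ell


alpha_1, ell_1 = balanced_exponents(1)   # (1/7, 2/7) for gamma = 1
# All three exponents in the balance condition coincide at 5/7 for gamma = 1.
balance_gamma_1 = (alpha_1 + 2 * ell_1, 1 - 2 * alpha_1, 1 - 1 * ell_1)
```

For $\gamma = 1$ this gives the exponent $5/7$ and $\ell = 2/7$, matching the rate and the effective alphabet size quoted in the text; as $\gamma$ grows the exponent $(\gamma+4)/(3\gamma+4)$ decreases toward $1/3$.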
Note that the proof of Theorem 6 does not limit the distribution to a finite entropy one. Therefore, the bound of (78) applies, in fact, also to −1 < γ ≤ 0. However, for γ ≤ 0, the per-symbol redundancy is no long diminishing. 6 Individual Sequences In this section, we first show that individual sequences whose empirical distributions obey the monotonicity constraints can be universally compressed as well as the average case. We then study compression of sequences whose empirical distributions may diverge from monotonic. We demonstrate that under mild conditions, similar in nature to those of Theorems 5 and 6, redundancy that diminishes (slower than in the average case) w.r.t. the monotonic ML description length can be obtained. However, these results are only useful when the monotonic ML description length diverges only slightly from the (standard) ML description length of a sequence, i.e., the empirical distribution of a sequence only mildly violates monotonicity. Otherwise, the penalty of using an incorrect monotone model overwhelms the redundancy gain. We begin with sequences that obey the monotonicity constraints. Theorem 7 Fix an arbitrarily small ε > 0, and let n→ ∞. Let xn be a sequence for which θ̂ ∈ M, i.e., θ̂1 ≥ θ̂2 ≥ . . .. Let k = k̂ be the number of letters occurring in xn. Then, there exists a code L∗ (·) that achieves individual sequence redundancy w.r.t. θ̂M = θ̂ for xn which is upper bounded R̂n (L ∗, xn) ≤ (1 + ε) k−1 n(logn) , for k ≤ n1/3, (1 + ε) (log n) log k n1/3−ε , for n1/3 < k = o(n), (1 + ε) 1 (log n) 2 n1/3 , for n1/3 < k = O(n). Note that by the monotonicity constraint, the number of symbols k̂ occurring in xn also equals to the maximal symbol in xn. Since, in the individual sequence case, this maximal symbol defines the class considered and also to be consistent with Theorem 3, we use k to characterize the alphabet size of a given sequence. 
(The maximal symbol in the individual sequence case is equivalent to the alphabet size in the average case.) Finally, since $\hat{\theta}$ is monotonic, $\hat{\theta}_{\mathcal{M}} = \hat{\theta}$.

Proof of Theorem 7: The result in Theorem 7 follows directly from the proof of Theorem 4. Both regions of the proof apply here, where instead of quantizing $\theta$ to $\theta'$, we quantize $\hat{\theta}$ to $\hat{\theta}'$ in a similar manner, and do not need to average over all sequences. In fact, instead of using any general $\hat{\varphi}$ to code $x^n$, we can use $\hat{\theta}'$ without any additional optimizations, where $\log n$ bits describe $k$. The description costs of $\hat{\theta}'$ are almost the same as those of $\theta'$. The factor 2 reduction in the last region is because it is sufficient here to replace $n^2$ by $n$ in the denominators of (32). This is because for every occurring symbol $\hat{\theta}'_i \ge 1/n$ and $\delta_i \le 1/n$; thus the first term of step (a) in (39) holds with the new grid, and $B_2$ in (36) reduces by a factor of 2. The quantization costs bounded in (28) and (39) are thus bounded similarly, where $\hat{\theta}$ replaces $\theta$ and $\hat{\theta}'$ replaces $\theta'$. This results in the bounds in (79) and concludes the proof of Theorem 7. $\Box$

If one knows a priori that $x^n$ is likely to have been generated by a monotonic distribution, the case considered in Theorem 7 is with high probability the typical one. However, a typical sequence can also be one for which $\hat{\theta} \notin \mathcal{M}$, where $\hat{\theta}$ mildly violates the monotonicity. In the pure individual sequence setting (where no underlying distribution is assumed but some monotonicity assumption is reasonable for the empirical distribution of $x^n$), one can still observe sequences that have empirical distributions that are either monotonic or diverge slightly from monotonic. Coding for this more general case can apply the methods described in Section 5 to the individual sequence case. If the divergence from monotonicity is small, one may still achieve bounds of the same order as those presented in Theorem 7, with an additional negligible cost of relaying which symbols are out of order.
The next theorem, however, provides a general upper bound in the form of the bounds of Theorems 5 and 6 for the individual sequence redundancy w.r.t. the monotonic ML description length, as defined in (10). We begin, again, with some notation. Recall the definition of an effective alphabet size m = nρ (where ρ = (logm)/(log n).) Now, use this definition for a specific individual sequence xn. Let R̂n(m) log n , m ≤ n1/3, m log n , n1/3 < m = o ( minα<ρ ρ+1+α (ρ− α) (log n)2 nα + 3(log e)n1−α , otherwise. Theorem 8 Fix an arbitrarily small ε > 0, and let n→ ∞. Then, there exists a code with length function L∗(·), that achieves individual sequence redundancy w.r.t. the monotonic ML description length of xn (as defined in (10)) bounded by R̂n (L ∗, xn) ≤ 1 + ε R̂n (nρ) + i>nρ,i∈xn log i for every xn. Theorem 8 shows that if one can find a relatively small effective alphabet of the symbols that occur in xn, and the symbols outside this alphabet are small enough, xn can be described with diminishing per-symbol redundancy w.r.t. its monotonic ML description length. This implies that as long as the occurring symbols are not too large, there exist a universal code w.r.t. a monotonic ML distribution for any such sequence xn. This is unlike standard individual sequence compression w.r.t. the i.i.d. ML description length. Specifically, if the effective alphabet size is O(n), and only a small number of symbols which are only polynomial in n occur, the universality cost is n(log n)2) bits overall, which gives diminishing per-symbol redundancy of O((log n)2/ This redundancy is much better than what can be achieved in standard compression. The penalty, of course, is when the empirical distribution of an individual sequence diverges significantly away from a monotonic one. 
While the monotonic redundancy can be made diminishing under mild conditions, there is a non-diminishing divergence cost by using the monotonic ML description length instead of the ML description length in that case. This implies that one should compress a sequence as generated by a monotonic distribution only if the total description length required to code xn as such is shorter than the total description length required to code xn with standard methods. As shown in the proof of Theorem 8, one prefix bit can inform the decoder which type of description is used. Theorem 8 shows that as long as the effective alphabet size is polynomial in n, α = 0.5 optimizes the third region of the upper bound, thus yielding the rate shown above, unless very large symbols occur in xn. For small effective alphabets (the first region), there is no redundancy gain in using the monotonic ML description length over the ML description length. The reason, again, is that the bound is obtained for cases where the actual empirical distribution of a sequence may not be monotonic. One can still use an i.i.d. ML estimate w.r.t. only the effective alphabet, if the additional cost of symbols outside this alphabet is negligible, to better code such sequences. Theorem 8 also shows that if a very large symbol, such as i = an; a > 1, occurs in xn, xn cannot be universally compressed even w.r.t. its monotonic ML description length. This is because it is impossible to avoid the cost of (1+ε) log i = (1+ε)n log a bits to describe this symbol to the decoder. The bound above and its proof below give a very powerful method to individually compress sequences that have an almost monotonic empirical distribution but may have some limited disorder, for which the monotonic ML description length diverges only negligibly from the ML description length. Proof of Theorem 8: The proof follows the same steps as the proof of Theorems 5 and 6. 
Each value of $m$ is tested and the best one is chosen, where the same coding costs described in those proofs are computed for each $m$. In addition, one can test the cost of coding $x^n$ using the description lengths for both $\hat{\theta}$ and $\hat{\theta}_{\mathcal{M}}$. Then, one bit can be used to relay which ML estimator is used. If $\hat{\theta}$ is used, the codes for coding individual sequences over large alphabets in either [21] or [25] can be used. In the first region in (81), the bound in [25] is obtained since $\log P_{\hat{\theta}}(x^n) \ge \log P_{\hat{\theta}_{\mathcal{M}}}(x^n)$ for every $x^n$. This bound yields smaller redundancy for this region than that obtained using $\hat{\theta}_{\mathcal{M}}$ if $\hat{\theta}_{\mathcal{M}} \ne \hat{\theta}$. It implies that for small alphabets, if $x^n$ does not have a monotonic empirical distribution, it is better coded, even in terms of universal coding redundancy, using standard universal compression methods without taking advantage of a monotonicity assumption. For the other two regions, we start with a lemma.

Lemma 6.1 Let $\hat{\theta}_{\mathcal{M}} = \left(\hat{\theta}_{1,\mathcal{M}}, \hat{\theta}_{2,\mathcal{M}}, \ldots, \hat{\theta}_{k,\mathcal{M}}\right)$ be the monotonic ML estimator of $\theta$ from $x^n$, i.e., $\hat{\theta}_{1,\mathcal{M}} \ge \hat{\theta}_{2,\mathcal{M}} \ge \cdots \ge \hat{\theta}_{k,\mathcal{M}}$, where $k = \max\{x_1, x_2, \ldots, x_n\}$. Then,
\[
\hat{\theta}_{k,\mathcal{M}} \ge \frac{1}{kn}.
\tag{82}
\]

Lemma 6.1 provides a lower bound on the minimal nonzero probability component of the monotonic ML estimator. This bound helps in designing the grid of points used to quantize the monotonic ML distribution of $x^n$, while maintaining bounded quantization costs. The proof of Lemma 6.1 is in Appendix C.

For $m$ in the second region, we cannot use the grid in (18). The reason is that, here, the quantization cost is affected by both $\hat{\theta}$ and $\hat{\theta}_{\mathcal{M}}$. This is unlike the average case, where the average respective vectors merge. To limit the quantization cost for very small probabilities, using Lemma 6.1, the minimal grid point must be $1/n^2$ or smaller. To make the quantization cost negligible w.r.t. the cost of describing the quantized ML, the ratio $\Delta_j/\varphi_{i,\mathcal{M}}$ between the spacing in interval $j$ and a quantized version $\varphi_{i,\mathcal{M}}$ of $\hat{\theta}_{i,\mathcal{M}}$ in the $j$th interval must be $O(m/n)$.
Using the same methodology as in the proof of Theorems 5 and 6, we therefore define the jth interval for an effective alphabet m = nρ = o ( n) as Îj = n(j−1)β , 1 ≤ j ≤ Ĵρ. (83) The spacing in the jth interval is . (84) This gives a total of B̂ρ ≤ log n (85) quantization points. Using the same methodology as in (21), this yields a representation cost of LR (ϕm) ≤ (1 + ε)m log where ϕm is the quantized version of θ̂M in which only the first m components of θ̂M are considered. Using the quantization with the grid defined in (83)-(86) in a code similar to the one used in the proof of Theorems 5 and 6, the individual quantization cost is given by P (xn|m,S′m,ϕm) θ̂i log θ̂i,M + log e ≤ n(log e) + log e ≤ (log e) · n ·mn+ (log e) · n · mn + log e = 3m(log e) + log e. (87) where (a) follows the same steps as in (60), (b) follows from $\ln(1+x) \leq x$, and then $x \leq |x|$, where $\delta_i = \hat{\theta}_{i,M} - \varphi_{i,m}$, and (c) follows from Lemma 6.1 and the definition of $\hat{I}_j$ in (83) (for the worst case first term, $|\delta_i| \leq 1/n^2$ and $\varphi_{i,m} \geq 1/(mn)$), from (84) and (83) (the second term), and since $\sum_i \hat{\theta}_i = 1$.

The only additional non-negligible cost of coding sequences using a code as defined in the proof of Theorems 5 and 6 for a given $m$ is the cost of coding all symbols $i > m$ that occur in $x^n$. Using a similar derivation to (54), with Elias' asymptotic code for the integers, this yields an additional cost of $(1+\epsilon)(1+1/\rho)\sum_{i > n^{\rho},\, i \in x^n} \log i$ code bits. Combining all costs, absorbing low order terms in $\epsilon$, and normalizing by $n$, yields the second region of the bound in (81). Note that this bound also applies to the first region, but in that region, a tighter bound is obtained by using a code that uses the standard i.i.d. ML estimator $\hat{\theta}$. This is because very fine quantization is needed to offset the cost of mismatch between $\hat{\theta}$ and $\hat{\theta}_M$. This quantization requires higher description costs than the description of a quantized type of a sequence when using standard compression.
(This is not the case when θ̂ obeys the monotonicity, as in Theorem 7. Even if θ̂ does not obey monotonicity in the upper regions of the bound, this is not the case.)

For the last region of the bound, we follow the same steps as were used for the upper region of the bound in Theorem 5 with a parameter $\alpha$. The intervals are chosen, again, to guarantee bounded quantization costs. Hence,
$$\hat{I}_j = \left[ \frac{n^{(j-1)\beta}}{n^{\rho+1+\alpha}},\ \frac{n^{j\beta}}{n^{\rho+1+\alpha}} \right), \quad 1 \leq j \leq \hat{J}_\rho. \tag{88}$$
The spacing in the $j$th interval is
$$\Delta_j = \frac{n^{j\beta}}{n^{\rho+1+2\alpha}}. \tag{89}$$
This gives a total of
$$\hat{B}_\rho \leq 0.5\, n^{\alpha} \left\lceil (\rho + 1 + \alpha) \log n \right\rceil \tag{90}$$
quantization points. Using the same methodology as in (56), this yields a representation cost of LR (ϕm) ≤ (1 + ε) ρ+ 1 + α (ρ+ ε− α) (log n)2nα. (91) Similarly to (87), P (xn|m,S′m,ϕm) ≤ (log e) nρ+1+α + (log e)2n1−α + log e = 3(log e)n1−α + log e (92) where (a) follows from similar steps to (a)-(c) of (87). Using Lemma 6.1, $\varphi_{i,m} \geq 1/n^{\rho+1}$ and $|\delta_i| \leq 1/n^{\rho+1+\alpha}$, leading to the first term. Bounding $|\delta_i| \leq n^{j\beta}/n^{\rho+1+2\alpha}$ and $\varphi_{i,m} \geq n^{(j-1)\beta}/n^{\rho+1+\alpha}$ leads to the second term. Note that, as before, $m$ is used here in place of $k$, because using an effective alphabet $m$, all greater symbols are packed together as one symbol, and the additional cost to describe them is reflected in an additional term. Adding this additional term with an identical expression to that in the lower regions, absorbing low order terms in $\epsilon$, and normalizing by $n$, yields the third region of the bound in (81). Since the bound holds for every $\alpha$ and every $\rho > \alpha$, it can be optimized to give the values that attain the minimum, concluding the proof of Theorem 8. □

7 Summary and Conclusions

Universal compression of sequences generated by monotonic distributions was studied. We showed that for finite alphabets, if one has the prior knowledge of the monotonicity of a distribution, one can reduce the cost of universality.
For alphabets of $o(n^{1/3})$ letters, this cost reduces from $0.5 \log(n/k)$ bits per unknown probability parameter to $0.5 \log(n/k^3)$ bits per unknown probability parameter. Otherwise, for alphabets of $O(n)$ letters, one can compress such sources with overall redundancy of $O(n^{1/3+\epsilon})$ bits. This is a significant decrease in redundancy from the $O(k \log n)$ or $O(n)$ bits overall that can be achieved if no side information is available about the source distribution. Redundancy of $O(n^{1/3+\epsilon})$ bits overall can also be achieved for much larger alphabets, including infinite alphabets, for fast decaying monotonic distributions. Sequences generated by slower decaying distributions can also be compressed with diminishing per-symbol redundancy costs under some mild conditions, and specifically if they have finite entropy rates. Examples for well-known monotonic distributions demonstrated how the diminishing redundancy decay rates can be computed by applying the bounds that were derived. Finally, the average case results were extended to individual sequences. Similar convergence rates were shown for sequences that have empirical monotonic distributions. Furthermore, universal redundancy bounds w.r.t. the monotonic ML description length of a sequence were also derived for the more general case. Under some mild conditions, these bounds still exhibit diminishing per-symbol redundancies.

Appendix A – Proof of Theorem 1

The proof follows the same steps used in [25] and [26] to lower bound the maximin redundancies for large alphabets and patterns, respectively, using the weak version of the redundancy-capacity theorem [5]. This version ties the maximin universal coding redundancy to the capacity of a channel defined by the conditional probability $P_\theta(x^n)$. We define a set $\Omega_{\mathcal{M}_k}$ of points $\theta \in \mathcal{M}_k$. We then show that these points are distinguishable by observing $X^n$, i.e., the probability that $X^n$ generated by $\theta \in \Omega_{\mathcal{M}_k}$ appears to have been generated by another point $\theta' \in \Omega_{\mathcal{M}_k}$ diminishes with $n$.
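Then, using Fano's inequality [3], the number of such distinguishable points is a lower bound on $R_n^-(\mathcal{M}_k)$. Since $R_n^+(\mathcal{M}_k) \geq R_n^-(\mathcal{M}_k)$, it is also a lower bound on the average minimax redundancy. The two regions in (6) result from a threshold phenomenon, where there exists a value $k_m$ of $k$ that maximizes the lower bound, and can be applied to all $\mathcal{M}_k$ for $k \geq k_m$.

We begin with defining $\Omega_{\mathcal{M}_k}$. Let $\omega$ be a vector of grid components, such that the last $k-1$ components $\theta_i$, $i = 2, \ldots, k$, of $\theta \in \Omega_{\mathcal{M}_k}$ must satisfy $\theta_i \in \omega$. Let $\omega_b$ be the $b$th point in $\omega$, and define $\omega_0 = 0$ and
$$\omega_b = \sum_{j=1}^{b} \frac{2\left(j - \frac{1}{2}\right)}{n^{1-\epsilon}}, \quad b = 1, 2, \ldots. \tag{A.1}$$
Then, for the $b$th point in $\omega$,
$$\omega_b = \frac{b^2}{n^{1-\epsilon}}. \tag{A.2}$$
To count the number of points in $\Omega_{\mathcal{M}_k}$, let us first consider the standard i.i.d. case, where there is no monotonicity requirement, and count the number of points in $\Omega$, which is defined similarly, but without the monotonicity requirement (i.e., $\Omega_{\mathcal{M}_k} \subseteq \Omega$). Let $b_i$ be the index of $\theta_i$ in $\omega$, i.e., $\theta_i = \omega_{b_i}$. Then, from (A.1)-(A.2) and since the components of $\theta$ are probabilities,
$$\sum_{i=2}^{k} \frac{b_i^2}{n^{1-\epsilon}} = \sum_{i=2}^{k} \omega_{b_i} = \sum_{i=2}^{k} \theta_i \leq 1. \tag{A.3}$$
It follows that for $\theta \in \Omega$,
$$\sum_{i=2}^{k} b_i^2 \leq n^{1-\epsilon}. \tag{A.4}$$
Hence, since the components $b_i$ are nonnegative integers,
$$|\Omega| \geq \sum_{b_2=0}^{\left\lfloor \sqrt{n^{1-\epsilon}} \right\rfloor} \ \sum_{b_3=0}^{\left\lfloor \sqrt{n^{1-\epsilon}-b_2^2} \right\rfloor} \cdots \sum_{b_k=0}^{\left\lfloor \sqrt{n^{1-\epsilon}-\sum_{i=2}^{k-1} b_i^2} \right\rfloor} 1 \overset{(a)}{\geq} \int_0^{\sqrt{n^{1-\epsilon}}} \int_0^{\sqrt{n^{1-\epsilon}-x_2^2}} \cdots \int_0^{\sqrt{n^{1-\epsilon}-\sum_{i=2}^{k-1} x_i^2}} dx_k \cdots dx_3\, dx_2 \overset{(b)}{=} \frac{V_{k-1}\left(\sqrt{n^{1-\epsilon}}\right)}{2^{k-1}} \tag{A.5}$$
where $V_{k-1}\left(\sqrt{n^{1-\epsilon}}\right)$ is the volume of a $k-1$ dimensional sphere with radius $\sqrt{n^{1-\epsilon}}$, (a) follows from monotonic decrease of the function in the integrand for all integration arguments, and (b) follows since its left hand side computes the volume of the positive quadrant of this sphere. Note that this is a different proof from that used in [25]-[26] for this step. Applying the monotonicity constraint, all permutations of $\theta$ that are not monotonic must be taken out of the grid. Hence,
$$M_{\mathcal{M}_k} = |\Omega_{\mathcal{M}_k}| \geq \frac{V_{k-1}\left(\sqrt{n^{1-\epsilon}}\right)}{k! \cdot 2^{k-1}}, \tag{A.6}$$
where dividing by $k!$ is a worst case assumption, yielding a lower bound and not an equality. This leads to a lower bound equal to that obtained for patterns in [26] on the number of points in $\Omega_{\mathcal{M}_k}$.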
Specifically, the bound achieves a maximal value for $k_m = \left(\pi n^{1-\epsilon}/2\right)^{1/3}$ and then decreases to eventually become smaller than 1. However, for $k > k_m$, one can consider a monotonic distribution for which all components $\theta_i$, $i > k_m$, of $\theta$ are zero, and use the bound for $k_m$.

Distinguishability of $\theta \in \Omega_{\mathcal{M}_k}$ is a direct result of distinguishability of $\theta \in \Omega$, which is shown in Lemma 3.1 in [25], i.e., there exists an estimator $\hat{\Theta}_g(X^n) \in \Omega$ for which the estimate $\hat{\theta}_g$ satisfies $\lim_{n\to\infty} P_\theta\left(\hat{\theta}_g \neq \theta\right) = 0$ for all $\theta \in \Omega$. Since this is true for all points in $\Omega$, it is also true for all points in $\Omega_{\mathcal{M}_k} \subseteq \Omega$, where now, $\hat{\theta}_g \in \Omega_{\mathcal{M}_k}$. Assuming all points in $\Omega_{\mathcal{M}_k}$ are equally probable to generate $X^n$, we can define an average error probability $P_e \triangleq \Pr\left\{\hat{\Theta}_g(X^n) \neq \Theta\right\} = \sum_{\theta \in \Omega_{\mathcal{M}_k}} P_\theta\left(\hat{\theta}_g \neq \theta\right) / M_{\mathcal{M}_k}$. Using the redundancy-capacity theorem,
$$n R_n^-[\mathcal{M}_k] \geq C[\mathcal{M}_k \to X^n] \overset{(a)}{\geq} I[\Theta; X^n] = H[\Theta] - H[\Theta \mid X^n] \overset{(b)}{=} \log M_{\mathcal{M}_k} - H[\Theta \mid X^n] \overset{(c)}{\geq} (1 - P_e)\left(\log M_{\mathcal{M}_k}\right) - 1 \overset{(d)}{\geq} (1 - o(1)) \log M_{\mathcal{M}_k}, \tag{A.7}$$
where $C[\mathcal{M}_k \to X^n]$ denotes the capacity of the respective channel and $I[\Theta; X^n]$ is the mutual information induced by the joint distribution $\Pr(\Theta = \theta) \cdot P_\theta(X^n)$. Inequality (a) follows from the definition of capacity, equality (b) from the uniform distribution of $\Theta$ in $\Omega_{\mathcal{M}_k}$, inequality (c) from Fano's inequality, and (d) follows since $P_e \to 0$. Lower bounding the expression in (A.6) for the two regions (obtaining the same bounds as in [26]), then using (A.7), normalizing by $n$, and absorbing low order terms in $\epsilon$, yields the two regions of the bound in (6). The proof of Theorem 1 is concluded. □

Appendix B – Proof of Theorem 2

To prove Theorem 2, we use the random-coding strong version of the redundancy-capacity theorem [17]. The idea is similar to the weak version used in Appendix A. We assume that grids $\Omega_{\mathcal{M}_k}$ of points are uniformly distributed over $\mathcal{M}_k$, and one grid is selected randomly. Then, a point in the selected grid is randomly selected under a uniform prior to generate $X^n$.
Showing distinguishability within a selected grid, for every possible random choice of $\Omega_{\mathcal{M}_k}$, implies that a lower bound on the cardinality of $\Omega_{\mathcal{M}_k}$ for every possible choice is essentially a lower bound on the overall sequence redundancy for most sources in $\mathcal{M}_k$.

The construction of $\Omega_{\mathcal{M}_k}$ is identical to that used in [26] to construct a grid of sources that generate patterns. We pack spheres of radius $n^{-0.5(1-\epsilon)}$ in the parameter space defining $\mathcal{M}_k$. The set $\Omega_{\mathcal{M}_k}$ consists of the center points of the spheres. To cover the space $\mathcal{M}_k$, we select a random shift of the whole lattice under a uniform distribution. The cardinality of $\Omega_{\mathcal{M}_k}$ is lower bounded by the ratio between the volume of $\mathcal{M}_k$, which equals (as shown in [26]) $1/[(k-1)!\,k!]$, and the volume of a single sphere, also factoring in a packing density (see, e.g., [2]). This yields eq. (55) in [26],
$$M_{\mathcal{M}_k} \geq \frac{1}{(k-1)! \cdot k! \cdot V_{k-1}\left(n^{-0.5(1-\epsilon)}\right) \cdot 2^{k-1}}, \tag{B.1}$$
where $V_{k-1}\left(n^{-0.5(1-\epsilon)}\right)$ is the volume of a $k-1$ dimensional sphere with radius $n^{-0.5(1-\epsilon)}$ (see, e.g., [2] for computation of this volume).

For distinguishability, it is sufficient to show that there exists an estimator $\hat{\Theta}_g(X^n) \in \Omega_{\mathcal{M}_k}$ such that $\lim_{n\to\infty} P_\Theta\left(\hat{\Theta}_g(X^n) \neq \Theta\right) = 0$ for every choice of $\Omega_{\mathcal{M}_k}$ and for every choice of $\Theta \in \Omega_{\mathcal{M}_k}$. This is already shown in Lemma 4.1 in [25] for a larger grid $\Omega$ of i.i.d. sources, which is constructed identically to $\Omega_{\mathcal{M}_k}$ over the complete $k-1$ dimensional probability simplex. By the monotonicity requirement, for every $\Omega_{\mathcal{M}_k}$ there exists such an $\Omega$ with $\Omega_{\mathcal{M}_k} \subseteq \Omega$. Since Lemma 4.1 in [25] holds for $\Omega$, it then must also hold for the smaller grid $\Omega_{\mathcal{M}_k}$. Note that distinguishability is easier to prove here than for patterns because $\hat{\Theta}_g(X^n)$ is obtained directly from $X^n$ and not from its pattern as in [26].
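The sphere volume that enters (B.1) becomes extremely small for the radii used here, so it is convenient to evaluate it in the log domain. The sketch below is illustrative only: it uses the standard closed form $V_d(r) = \pi^{d/2} r^d / \Gamma(d/2 + 1)$ for the volume of a $d$-dimensional ball, and the function name and the sample values of $n$, $\epsilon$ and $k$ are assumptions made for this example.

```python
import math

def log2_ball_volume(dim, radius):
    """log2 of the volume of a dim-dimensional Euclidean ball of the given radius."""
    # ln V = (dim/2) ln(pi) + dim ln(radius) - ln Gamma(dim/2 + 1)
    ln_volume = (0.5 * dim * math.log(math.pi)
                 + dim * math.log(radius)
                 - math.lgamma(0.5 * dim + 1.0))
    return ln_volume / math.log(2.0)

# Hypothetical values, chosen only to show the call.
n, eps, k = 10**6, 0.1, 10
radius = n ** (-0.5 * (1.0 - eps))
log2_v = log2_ball_volume(k - 1, radius)
# log2_v is strongly negative, so its negative contributes a large positive
# term to the logarithm of a cardinality bound that divides by this volume.
```

Working with `math.lgamma` avoids overflow in the Gamma function when the dimension $k-1$ is large.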
Now, since all the conditions of the strong random-coding version of the redundancy-capacity theorem hold, taking the logarithm of the bound in (B.1), absorbing low order terms in $\epsilon$, and normalizing by $n$, leads to the first region of the bound in (7). More detailed steps follow those found in [26].

The second region of the bound is handled in a manner related to the second region of the bound of Theorem 1. However, here, we cannot simply set the probability of all symbols $i > k_m$ to zero, because all possible valid sources must be included in one of the grids $\Omega_{\mathcal{M}_k}$ to generate a complete covering of $\mathcal{M}_k$. As was done in [26], we include sources with $\theta_i > 0$ for $i > k_m$ in the grids $\Omega_{\mathcal{M}_k}$, but do not include them in the lower bound on the number of grid points. Instead, for $k > k_m$, we bound the number of points in a $k_m$-dimensional cut of $\mathcal{M}_k$ for which the remaining $k - k_m$ components of $\theta$ are very small (and insignificant). This analysis is valid also for $k > n$. Distinguishability for $k > k_m$ is shown for i.i.d. non-monotonically restricted distributions in the proof of Lemma 6.1 in [26]. As before, it carries over to monotonic distributions, since for each $\Omega_{\mathcal{M}_k}$ there exists an unrestricted corresponding $\Omega$ such that $\Omega_{\mathcal{M}_k} \subseteq \Omega$. The choice of $k_m = 0.5\left(n^{1-\epsilon}/\pi\right)^{1/3}$ gives the maximal bound w.r.t. $k$. Since, again, all conditions of the strong version of the redundancy-capacity theorem are satisfied, the second region of the bound is obtained. Again, more detailed steps can be found in [26]. This concludes the proof of Theorem 2. □

Appendix C – Proof of Lemma 6.1

For cardinality $k$, we consider the largest component $\hat{\theta}_{1,M}$ of $\hat{\theta}_M$ as the constraint component, i.e., $\hat{\theta}_{1,M} = 1 - \sum_{i=2}^{k} \hat{\theta}_{i,M}$. For any given probability parameter $\varphi$ of cardinality $k$ with $\varphi_1 > 0$, we can write
$$P_\varphi(x^n) = \varphi_1^{n_x(1)} (1 - \varphi_1)^{n - n_x(1)} \cdot \prod_{i=2}^{k} \left( \frac{\varphi_i}{1 - \varphi_1} \right)^{n_x(i)} \triangleq \varphi_1^{n_x(1)} (1 - \varphi_1)^{n - n_x(1)} \prod_{i=2}^{k} \vartheta_i^{n_x(i)} \tag{C.1}$$
where we recall that $n_x(i)$ is the occurrence count of $i$ in $x^n$. Therefore, maximization of $P_\varphi(x^n)$ w.r.t.
$\varphi_1$ is independent of the maximization over $\vartheta_i$, $i > 1$, and is obtained for $\varphi_1 = \hat{\theta}_1 = n_x(1)/n$. Since for all $i > 1$, $\hat{\theta}_{1,M} \geq \hat{\theta}_{i,M}$, $\hat{\theta}_{1,M}$ can thus only increase from $\hat{\theta}_1$ by the monotonicity constraint. (Note that the monotonicity constraint implies a water filling [3] optimization to achieve $\hat{\theta}_M$.) Hence, $\hat{\theta}_{1,M} \geq n_x(1)/n$.

Now, using the result above, we show that the derivative of $\ln P_{\varphi_M}(x^n)$ w.r.t. $\varphi_{k,M}$ is positive for $\varphi_{k,M} < 1/(kn)$ and a monotonic $\varphi_M$. A component of a parameter vector $\varphi_M$, which is monotonic, can be expressed as
$$\varphi_{i,M} = \sum_{\ell=i}^{k} \varphi'_\ell, \quad \varphi'_\ell \geq 0. \tag{C.2}$$
Hence,
$$\left. \frac{\partial \ln P_{\varphi_M}(x^n)}{\partial \varphi_{k,M}} \right|_{\varphi_{1,M} = \hat{\theta}_{1,M}} \overset{(a)}{=} \left. \frac{\partial \ln P_{\varphi_M}(x^n)}{\partial \varphi'_k} \right|_{\varphi_{1,M} = \hat{\theta}_{1,M}} \overset{(b)}{=} \sum_{i=2}^{k} \frac{n_x(i)}{\varphi_{i,M}} - \frac{(k-1)\, n_x(1)}{\hat{\theta}_{1,M}} \overset{(c)}{>} \frac{k\, n_x(k)}{\hat{\theta}_k} - \frac{k\, n_x(1)}{\hat{\theta}_1} \overset{(d)}{=} 0 \tag{C.3}$$
where (a) follows from $\varphi_{k,M}$ being the smallest nonzero component of $\varphi_M$, (b) is since by (C.2), $\varphi'_k$ is included in all terms, and
$$\varphi_{1,M} = 1 - \sum_{i=2}^{k} \varphi_{i,M} = 1 - \sum_{i=2}^{k-1} (i-1)\varphi'_i - (k-1)\varphi_{k,M}, \tag{C.4}$$
where the last equality follows from (C.2), (c) follows by omitting all terms of the sum except $i = k$, from the assumption that $\varphi_{k,M} < 1/(nk) \leq \hat{\theta}_k/k$, and since $\hat{\theta}_{1,M} \geq n_x(1)/n = \hat{\theta}_1$, and (d) follows since its left hand side is 0 for the (i.i.d.) ML parameter values. Hence, $P_{\varphi_M}(x^n)$ must increase, with $\varphi_{1,M}$ taking its optimal value, for all $\varphi_M$ for which $\varphi_{k,M} < 1/(nk)$, and the maximum is thus achieved for $\hat{\theta}_{k,M} \geq 1/(nk)$. □

References
[1] J. Åberg, Y. M. Shtarkov, and B. J. M. Smeets, “Multialphabet coding with separate alphabet description,” in Proceedings of Compression and Complexity of Sequences, pp. 56-65, Jun. 1997.
[2] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer-Verlag, Third Edition, 1998.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, second edition, John Wiley & Sons, 2006.
[4] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[5] L. D. Davisson, “Universal noiseless coding,” IEEE Trans. Inform. Theory, vol. IT-19, no. 6, pp. 783-795, Nov. 1973.
[6] L. D. Davisson, and A.
Leon-Garcia, “A source matching approach to finding minimax codes,” IEEE Trans. Inform. Theory, vol. IT-26, no. 2, pp. 166-174, Mar. 1980.
[7] P. Elias, “Universal codeword sets and representation of the integers,” IEEE Trans. Inform. Theory, vol. IT-21, no. 2, pp. 194-203, March 1975.
[8] B. M. Fitingof, “Optimal coding in the case of unknown and changing message statistics,” Probl. Inform. Transm., vol. 2, no. 2, pp. 1-7, 1966.
[9] B. M. Fitingof, “The compression of discrete information,” Probl. Inform. Transm., vol. 3, no. 3, pp. 22-29, 1967.
[10] D. P. Foster, R. A. Stine, and A. J. Wyner, “Universal codes for finite sequences of integers drawn from a monotone distribution,” IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1713-1720, June 2002.
[11] R. G. Gallager, “Source coding with side information and universal coding,” unpublished manuscript, September 1976.
[12] G. M. Gemelos and T. Weissman, “On the entropy rate of pattern processes,” IEEE Trans. Inform. Theory, vol. 52, no. 9, pp. 3994-4007, Sept. 2006.
[13] L. Györfi, I. Páli, and E. C. van der Meulen, “There is no universal code for an infinite source alphabet,” IEEE Trans. Inform. Theory, vol. 40, no. 1, pp. 267-271, Jan. 1994.
[14] N. Jevtić, A. Orlitsky, and N. P. Santhanam, “A lower bound on compression of unknown alphabets,” Theoret. Comput. Sci., vol. 332, no. 1-3, pp. 293-311, 2005.
[15] J. C. Kieffer, “A unified approach to weak universal source coding,” IEEE Trans. Inform. Theory, vol. IT-24, no. 6, pp. 674-682, Nov. 1978.
[16] M. Khosravifard, H. Saidi, M. Esmaeili, and T. A. Gulliver, “The minimum average code for finite memoryless monotone sources,” IEEE Trans. Inform. Theory, vol. 52, no. 3, pp. 955-975, Mar. 2007.
[17] N. Merhav and M. Feder, “A strong version of the redundancy-capacity theorem of universal coding,” IEEE Trans. Inform. Theory, vol. 41, no. 3, pp. 714-722, May 1995.
[18] N. Merhav, G. Seroussi, and M. J.
Weinberger, “Optimal prefix codes for sources with two-sided geometric distributions,” IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 121-135, Jan. 2000.
[19] N. Merhav, G. Seroussi, and M. J. Weinberger, “Coding of sources with two-sided geometric distributions and unknown parameters,” IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 229-236, Jan. 2000.
[20] A. Orlitsky, N. P. Santhanam, and J. Zhang, “Universal compression of memoryless sources over unknown alphabets,” IEEE Trans. Inform. Theory, vol. 50, no. 7, pp. 1469-1481, July 2004.
[21] A. Orlitsky, and N. P. Santhanam, “Speaking of infinity,” IEEE Trans. Inform. Theory, vol. 50, no. 10, pp. 2215-2230, Oct. 2004.
[22] J. Rissanen, “Minimax codes for finite alphabets,” IEEE Trans. Inform. Theory, vol. IT-24, no. 3, pp. 389-392, May 1978.
[23] J. Rissanen, “Universal coding, information, prediction, and estimation,” IEEE Trans. Inform. Theory, vol. IT-30, no. 4, pp. 629-636, Jul. 1984.
[24] B. Ya. Ryabko, “Coding of a source with unknown but ordered probabilities,” Problems of Information Transmission, vol. 15, no. 2, pp. 134-138, Oct. 1979.
[25] G. I. Shamir, “On the MDL principle for i.i.d. sources with large alphabets,” IEEE Trans. Inform. Theory, vol. 52, no. 5, pp. 1939-1955, May 2006.
[26] G. I. Shamir, “Universal lossless compression with unknown alphabets - the average case,” IEEE Trans. Inform. Theory, vol. 52, no. 11, pp. 4915-4944, Nov. 2006.
[27] G. I. Shamir, “Patterns of sequences and their entropy,” submitted to IEEE Trans. Inform. Theory. Also in Arxiv:cs.IT/0605046.
[28] G. I. Shamir, “A new redundancy bound for universal lossless compression of unknown alphabets,” in Proceedings of The 38th Annual Conference on Information Sciences and Systems, Princeton, New-Jersey, U.S.A., pp. 1175-1179, Mar. 17-19, 2004.
[29] Y. M. Shtarkov, “Universal sequential coding of single messages,” Problems of Information Transmission, 23(3):3-17, Jul.-Sep. 1987.
[30] L. R. Varshney and V. K.
Goyal, “Ordered and disordered source coding,” in Information Theory & Applications Workshop (ITA), San Diego, California, Feb. 6-10, 2006.
[31] L. R. Varshney and V. K. Goyal, “On universal coding of unordered data,” in Information Theory & Applications Workshop (ITA), San Diego, California, Jan. 29-Feb. 2, 2007.
[32] A. D. Wyner, “An upper bound on the entropy series,” Inform. Contr., vol. 20, pp. 176-181, 1972.

ABSTRACT
We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size $k$, each probability parameter costs essentially $0.5 \log (n/k^3)$ bits, where $n$ is the coded sequence length, as long as $k = o(n^{1/3})$. Otherwise, for $k = O(n)$, the total average sequence redundancy is $O(n^{1/3+\epsilon})$ bits overall. We then show that there exists a sub-class of monotonic distributions over infinite alphabets for which redundancy of $O(n^{1/3+\epsilon})$ bits overall is still achievable. This class contains fast decaying distributions, including many distributions over the integers and geometric distributions. For some slower decays, including other distributions over the integers, redundancy of $o(n)$ bits overall is achievable, where a method to compute specific redundancy rates for such distributions is derived. The results are specifically true for finite entropy monotonic distributions. Finally, we study individual sequence redundancy behavior assuming a sequence is governed by a monotonic distribution.
We show that for sequences whose empirical distributions are monotonic, individual redundancy bounds similar to those in the average case can be obtained. However, even if the monotonicity in the empirical distribution is violated, diminishing per symbol individual sequence redundancies with respect to the monotonic maximum likelihood description length may still be achievable.
<|endoftext|><|startoftext|> Introduction: smooth tropical varieties

In this section we follow the definitions of [5] and [4]. The underlying algebra of tropical geometry is given by the semifield $\mathbb{T} = \mathbb{R} \cup \{-\infty\}$ of tropical numbers. The tropical arithmetic operations are “$a + b$” $= \max\{a, b\}$ and “$ab$” $= a + b$. The quotation marks are used to distinguish between the tropical and classical operations. With respect to addition $\mathbb{T}$ is a commutative semigroup with zero “$0_{\mathbb{T}}$” $= -\infty$. With respect to multiplication $\mathbb{T}^\times = \mathbb{T} \setminus \{0_{\mathbb{T}}\} \approx \mathbb{R}$ is an honest commutative group with the unit “$1_{\mathbb{T}}$” $= 0$. Furthermore, the addition and multiplication satisfy the distributive law “$a(b+c)$” $=$ “$ab+ac$”, $a, b, c \in \mathbb{T}$. These operations may be viewed as a result of the so-called dequantization of the classical arithmetic operations that underlies the patchworking construction, see [3] and [8].

These tropical operations allow one to define tropical Laurent polynomials. Namely, a tropical Laurent polynomial is a function $f : \mathbb{R}^n \to \mathbb{R}$,
$$f(x) = \text{“}\sum_j a_j x^j\text{”} = \max_j\, (a_j + jx),$$
where $jx$ denotes the scalar product, $x \in (\mathbb{T}^\times)^n \approx \mathbb{R}^n$, $j \in \mathbb{Z}^n$ and only finitely many coefficients $a_j \in \mathbb{T}$ are non-zero (i.e. not $-\infty$).

http://arxiv.org/abs/0704.0839v1
GRIGORY MIKHALKIN

Affine-linear functions with integer slopes (for brevity we call them simply affine functions) form an important subcollection of all Laurent polynomials. Namely, these are the functions $f : \mathbb{R}^n \to \mathbb{R}$ such that both $f$ and “$1_{\mathbb{T}}/f$” $= -f$ are tropical Laurent polynomials. We equip $\mathbb{T}^n \approx [-\infty, \infty)^n$ with the Euclidean topology. Let $U \subset \mathbb{T}^n$ be an open set.

Definition 1.1.
A continuous function $f : U \to \mathbb{T}$ is called regular if its restriction to $U \cap \mathbb{R}^n$ coincides with a restriction of some tropical Laurent polynomial to $U \cap \mathbb{R}^n$. We denote the sheaf of regular functions on $\mathbb{T}^n$ with $\mathcal{O}$ (or sometimes $\mathcal{O}_{\mathbb{T}^n}$ to avoid confusion). Any subset $X \subset \mathbb{T}^n$ gets an induced regular sheaf $\mathcal{O}_X$ by restriction. For our purposes we restrict our attention only to the case when $X$ is a polyhedral complex, i.e. when $X$ is the closure of a union of convex polyhedra (possibly unbounded) in $\mathbb{R}^n$ such that the intersection of any number of such polyhedra is their common face. We say that $X$ is a $k$-dimensional polyhedral complex if it is obtained from a union of $k$-dimensional polyhedra. These polyhedra are called the facets of $X$.

Let $V \subset X$ be an open set and $f \in \mathcal{O}_X(V)$ be a regular function in $V$. A point $x \in V$ is called a “zero point” of $f$ if the restriction of “$1_{\mathbb{T}}/f$” $= -f$ to $W \subset V$ is not regular for any open neighborhood $W \ni x$. Note that it may happen that $x$ is a “zero point” for $\varphi : U \to \mathbb{T}$, but not for $\varphi|_{X \cap U}$. It is easy to see that if $X$ is a $k$-dimensional polyhedral complex then the “zero locus” $Z_f$ of $f$ is a $(k-1)$-dimensional polyhedral subcomplex.

To each facet of $Z_f$ we may associate a natural number, called its weight (or degree). To do this we choose a “zero point” $x$ inside such a facet. We say that $x$ is a “simple zero” for $f$ if for any local decomposition into a sum (i.e. the tropical product) of regular functions $f =$ “$gh$” $= g + h$ on $V$ near $x$ we have either $g$ or $h$ affine (i.e. without a “zero”). We say that the weight is $l$ if $f$ can be locally decomposed into a tropical product of $l$ functions with a simple zero at $x$.

A regular function $f$ allows us to make the following modification on its domain $V \subset X \subset \mathbb{T}^n$.
Consider the graph $\Gamma_f \subset V \times \mathbb{T} \subset \mathbb{T}^{n+1}$. It is easy to see that the “zero locus” $\bar{\Gamma}_f \subset V \times \mathbb{T}$ of the (regular) function “$y + f(x)$” (defined on $V \times \mathbb{T}$), where $x$ is the coordinate on $V$ and $y$ is the coordinate on $\mathbb{T}$, coincides with the union of $\Gamma_f$ and the undergraph $U\Gamma_{f,Z} = \{(x, y) \in V \times \mathbb{T} \mid x \in Z_f,\ y \leq f(x)\}$. Furthermore, the weight of a facet $F \subset \bar{\Gamma}_f$ is 1 if $F \subset \Gamma_f$ (recall that as $V$ is an unweighted polyhedral complex all the weights of its facets are equal to one) and is the weight of the corresponding facet of $Z_f$ if $F \subset U\Gamma_{f,Z}$. We view $\bar{\Gamma}_f$ as a “tropical closure” of the set-theoretical graph $\Gamma_f$. Note that we have a map $\bar{\Gamma}_f \to V$. We set $\tilde{V} = \bar{\Gamma}_f$ to be the result of the tropical modification $\mu_f : \tilde{V} \to V$ along the regular function $f$. The locus $Z_f$ is called the center of tropical modification.

The weights of the facets of $\tilde{V}$ cause some inconvenience, as they should be incorporated in the definition of the regular sheaf $\mathcal{O}_{\tilde{V}}$ on $\tilde{V}$. Namely, the affine functions defined by $\mathcal{O}_{\tilde{V}}$ on a facet of weight $w$ should contain the group of functions that come as restrictions to this facet of the affine functions on $\mathbb{T}^{n+1}$ as a subgroup of index $w$.

MODULI SPACES OF RATIONAL TROPICAL CURVES

Sometimes one can get rid of the weights of $\tilde{V}$ by a reparameterization with the help of a map $\bar{V} \to \tilde{V}$ that is given by locally linear maps in the corresponding charts. Indeed, the restriction of $\mu_g : \bar{V} \to \tilde{V}$ to a facet is locally given by a linear function between two $k$-dimensional affine-linear spaces defined over $\mathbb{Z}$. If its determinant equals $w$ then the push-forward of $\mathcal{O}_{\bar{V}}$ supplies an extension of $\mathcal{O}_{\tilde{V}}$ required by the weights. Note however that if $w > 1$ then the inverse map is not defined over $\mathbb{Z}$ and thus is not given by elements of $\mathcal{O}_{\tilde{V}}$.

Tropical modifications give the basic equivalence relation in Tropical Geometry.
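As a concrete illustration of the tropical operations and tropical Laurent polynomials used throughout this section, the following sketch evaluates “$\sum_j a_j x^j$” $= \max_j (a_j + jx)$ at a point of $\mathbb{R}^n$. The function names and the dictionary representation of the coefficients are assumptions made for this example and do not come from the text.

```python
NEG_INF = float("-inf")  # the tropical zero "0_T"

def t_add(a, b):
    """Tropical addition: "a + b" = max{a, b}."""
    return max(a, b)

def t_mul(a, b):
    """Tropical multiplication: "ab" = a + b."""
    return a + b

def trop_poly(coeffs, x):
    """Evaluate a tropical Laurent polynomial at x in R^n.

    coeffs maps integer exponent tuples j in Z^n to coefficients a_j;
    exponents that are absent are treated as having coefficient -infinity.
    """
    return max(a + sum(ji * xi for ji, xi in zip(j, x))
               for j, a in coeffs.items())

# The tropical line "0 + x + y" = max{0, x, y} in two variables.
line = {(0, 0): 0.0, (1, 0): 0.0, (0, 1): 0.0}
value = trop_poly(line, (2.0, -1.0))  # max{0, 2, -1}

# The distributive law "a(b + c)" = "ab + ac" on sample values.
a, b, c = 1.5, -2.0, 4.0
assert t_mul(a, t_add(b, c)) == t_add(t_mul(a, b), t_mul(a, c))
```

In this picture, a point where the maximum is attained by at least two monomials is a point where the function fails to be locally affine.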
It can be shown that if we start from $\mathbb{T}^k$ and do a number of tropical modifications on it then the result is a $k$-dimensional polyhedral complex $Y \subset \mathbb{T}^N$ that satisfies the following balancing property (cf. Property 3.3 in [4] where balancing is restated in an equivalent way).

Property 1.2. Let $E \subset Y \cap \mathbb{R}^N$ be a $(k-1)$-dimensional face and $F_1, \ldots, F_l$ be the facets of $Y$ adjacent to $E$ whose weights are $w_1, \ldots, w_l$. Let $L \subset \mathbb{R}^N$ be an $(N-k)$-dimensional affine-linear space with an integer slope and such that it intersects $E$. For a generic (real) vector $v \in \mathbb{R}^N$ the intersection $F_j \cap (L + v)$ is either empty or a single point. Let $\Lambda_{F_j} \subset \mathbb{Z}^N$ be the integer vectors parallel to $F_j$ and $\Lambda_L \subset \mathbb{Z}^N$ be the integer vectors parallel to $L$. Set $\lambda_j$ to be the product of $w_j$ and the index of the subgroup $\Lambda_{F_j} + \Lambda_L \subset \mathbb{Z}^N$. We say that $Y \subset \mathbb{T}^N$ is balanced if for any choice of $E$, $L$ and a small generic $v$ the sum
$$\iota_L = \sum_{j \,\mid\, F_j \cap (L+v) \neq \emptyset} \lambda_j$$
is independent of $v$. We say that $Y$ is simply balanced if in addition for every $j$ we can find $L$ and $v$ so that $F_j \cap (L + v) \neq \emptyset$, $\iota_L = 1$ and for every small $v$ there exists an affine hyperplane $H_v \subset L$ such that the intersection $Y \cap (L + v)$ sits entirely on one side of $H_v + v$ in $L + v$ while the intersection $Y \cap (H_v + v)$ is a point.

Definition 1.3 (cf. [5], [4]). A topological space $X$ enhanced with a sheaf of tropical functions $\mathcal{O}_X$ is called a (smooth) tropical variety of dimension $k$ if for every $x \in X$ there exist an open set $U \ni x$ and an open set $V$ in a simply balanced polyhedral complex $Y \subset \mathbb{T}^N$ such that the restrictions $\mathcal{O}_X|_U$ and $\mathcal{O}_Y|_V$ are isomorphic.

Tropical varieties are considered up to the equivalence generated by tropical modifications. It can be shown that a smooth tropical variety of dimension $k$ can be locally obtained from $\mathbb{T}^k$ by a sequence of tropical modifications centered at smooth tropical varieties of dimension $(k-1)$. This follows from the following proposition.

Proposition 1.4.
Any $k$-dimensional simply balanced polyhedral complex $X \subset \mathbb{R}^n$ can be obtained from $\mathbb{T}^k$ by a sequence of consecutive tropical modifications whose centers are simply balanced $(k-1)$-dimensional polyhedral complexes.

Proof. We prove this proposition by induction on $n$. Without loss of generality we may assume that $X$ is a fan, i.e. each convex polyhedron of $X$ is a cone centered at the origin. The base of the induction, when $n = k$, is trivial. If $n > k$ let us take an $(n-k)$-dimensional affine-linear subspace $L \subset \mathbb{R}^n$ given by Property 1.2. Choose a linear projection $\lambda : \mathbb{R}^n \to \mathbb{R}^{n-1}$ defined over $\mathbb{Z}$ and such that $\ker(\lambda)$ is a line contained in $L$. The image $\lambda(X) \subset \mathbb{R}^{n-1}$ is a $k$-dimensional polyhedral complex since $L$ is transversal to some facets of $X$. We claim that $\lambda|_X : X \to \lambda(X)$ is a tropical modification once we identify $\mathbb{R}^n$ and $\mathbb{R}^{n-1} \times \mathbb{R}$. The center of this modification is the locus $Z_f = \{x \in \mathbb{R}^{n-1} \mid \dim(\lambda^{-1}(x) \cap X) > 0\}$. Here we use the dimension in the usual topological sense. Note that the $(k-1)$-dimensional complex $Z_f \subset \mathbb{R}^{n-1}$ is simply balanced: the existence of the needed $(n-k)$-dimensional affine-linear spaces follows from the fact that $X \subset \mathbb{R}^n$ is simply balanced.

To justify our claim we note that near any point $x \in Z_f$ the subcomplex $Y \subset X$ obtained as the (Euclidean topology) closure of $X \setminus \lambda^{-1}(Z_f)$ is a (set-theoretical) graph of a convex function. This, once again, follows from the fact that $X \subset \mathbb{R}^n$ is simply balanced, this time applied to the points in the facets of $X \setminus Y$. Thus it gives a regular tropical function $f$ and it remains only to show that the weight of any facet $E \subset Z_f$ is 1. But this follows, in turn, from the balancing condition at $\lambda^{-1}(E) \cap Y$. □

2. Tropical curves and their moduli spaces

The definition of tropical variety is especially easy in dimension 1.
Tropical modifications take a graph into a graph (with arbitrary valence of its vertices) and the tropical structure carried by the sheaf $\mathcal{O}_X$ amounts to a complete metric on the complement of the set of 1-valent vertices of the graph $X$ (cf. [5], [6], [1]). Thus, each 1-valent vertex of a tropical curve $X$ is adjacent to an edge of infinite length. A tropical modification allows one to contract such an edge or to attach it at any point of $X$ other than a 1-valent vertex. If we have a finite collection of marked points on $X$ then, by passing to an equivalent model if needed, we may assume that the set of marked points coincides with the set of 1-valent vertices. (Of course, if $X$ is a tree then we need at least two marked points to make such an assumption.)

The genus of a tropical curve $X$ is $\dim H_1(X)$. Let $\mathcal{M}_{g,n}$ be the set of all tropical curves $X$ of genus $g$ with $n$ distinct marked points. Fixing a combinatorial type of a graph $\Gamma$ with $n$ marked leaves defines a subset $U_\Gamma \subset \mathcal{M}_{g,n}$ consisting of marked tropical curves with this combinatorics. The length of any non-leaf edge of $\Gamma$ defines a real-valued function on $U_\Gamma$. Such functions are called edge-length functions. To avoid difficulties caused by self-automorphisms of $X$, from now on we restrict our attention to the case $g = 0$.

Definition 2.1. The combinatorial type of a tropical curve $X$ is its equivalence class up to homeomorphisms respecting the markings.

Combinatorial types partition the set $\mathcal{M}_{0,n}$ into disjoint subsets. The edge-length functions define the structure of the polyhedral cone $\mathbb{R}^M_{\geq 0}$ in each of those subsets (as the lengths have to be positive). The number $M$ here is the number of the bounded (non-leaf) edges in $X$. By the Euler characteristic reasoning it is equal to $n - 3$ if $X$ is (1- and) 3-valent, and it is smaller if $X$ has vertices of higher valence.
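The edge count above can be verified directly on small examples. The sketch below is only an illustration: it represents a connected graph as a list of edges (a representation chosen for this example, not taken from the text), computes the genus as the first Betti number, and counts the leaves and the bounded (non-leaf) edges.

```python
def curve_stats(edges):
    """Return (genus, leaves, bounded_edges) for a connected graph.

    edges is a list of (u, v) pairs; the vertex labels are arbitrary.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    leaves = sum(1 for d in degree.values() if d == 1)
    # First Betti number of a connected graph: #edges - #vertices + 1.
    genus = len(edges) - len(degree) + 1
    # An edge is bounded when neither endpoint is a 1-valent vertex.
    bounded = sum(1 for u, v in edges if degree[u] > 1 and degree[v] > 1)
    return genus, leaves, bounded

# A 3-valent tree with n = 5 marked leaves (1..5) and inner vertices a, b, c.
trivalent = [(1, "a"), (2, "a"), ("a", "b"), (3, "b"),
             ("b", "c"), (4, "c"), (5, "c")]
# A tree with a single 4-valent vertex and n = 4 leaves.
star = [(1, "v"), (2, "v"), (3, "v"), (4, "v")]
```

For `trivalent` the count of bounded edges is $5 - 3 = 2$, while for `star` it is $0 < 4 - 3$, in line with the remark about vertices of higher valence.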
Furthermore, any face of the polyhedral cone R^M_{≥0} coincides with the cone corresponding to another combinatorial type, the one where we contract some of the edges of X to points. This gives the adjacency (fan-like) structure on M_{0,n}, so M_{0,n} is a (non-compact) polyhedral complex. In particular, it is a topological space.
Theorem 1. The set M_{0,n} for n ≥ 3 admits the structure of an (n−3)-dimensional tropical variety such that the edge-length functions are regular within each combinatorial type. Furthermore, the space M_{0,n} can be tropically embedded in R^N for some N (i.e. M_{0,n} can be presented as a simply balanced complex).
Proof. This theorem is trivial for n = 3 as M_{0,3} is a point. Otherwise, any two disjoint ordered pairs of marked points can be used to define a global regular function on M_{0,n} with values in R = T^×. Namely, each such ordered pair defines an oriented path on the tropical curve X connecting the corresponding marked points. These paths can be embedded. Since the two pairs of marked points are disjoint, the intersection of the two corresponding paths has finite length. We take this length with the positive sign if the orientations agree and with the negative sign otherwise. This defines a function on M_{0,n}. We call such functions the double ratio functions. Take all possible disjoint pairs of marked points and use them as coordinates for our embedding ι : M_{0,n} → R^N, where N is the number of all possible choices of two disjoint pairs among the n marked points. The theorem now follows from the following two lemmas.
Note that, strictly speaking, each coordinate in R^N depends not only on the choice of two disjoint pairs of marked points but also on the order of points in each pair. However, changing the order in one of the pairs only reverses the sign of the double ratio. Taking an extra coordinate for such a change of order would be redundant.
Indeed, for any balanced complex Y ⊂ R^N and any affine-linear function λ : R^N → R with an integer slope the graph of λ is a balanced complex in R^{N+1} isomorphic to the initial complex Y.
Lemma 2.2. The map ι is a topological embedding.
Proof. First, let us prove that ι is an embedding. The combinatorial type of X is determined by the set of the coordinates that do not vanish on X. Indeed, any non-leaf edge E of the tree X separates the leaves (i.e. the set of markings) into two classes corresponding to the components of X ∖ E. Let us take a coordinate in R^N that corresponds to four marked points (the union of the two disjoint pairs) such that two of these points belong to one class and two to the other class. We call such a coordinate an E-compatible coordinate. Note that an E-compatible coordinate vanishes on X if and only if the pairs of markings defined by the coordinate agree with the pairs defined by the classes. This observation suffices to reconstruct the combinatorial type of X. Furthermore, the length of E equals the minimal non-zero absolute value of the E-compatible coordinates. This implies that ι is an embedding. □
Lemma 2.3. The image ι(M_{0,n}) is a simply balanced complex in R^N.
Proof. This is a condition on codimension 1 faces of M_{0,n}. First we shall check it for the case n = 4. There are three ways to split the four marked points into two disjoint pairs. Accordingly, there are three combinatorial types of 3-valent trees with four marked leaves. Thus our space M_{0,4} is homeomorphic to the tripod, or the “interior” of the letter Y, see Figure 1. Each ray of this tripod corresponds to a combinatorial type of a 3-valent tree with 4 leaves while the vertex corresponds to the 4-valent tree.
Figure 1. The tropical moduli space M_{0,4} and its points on the corresponding edges.
Up to sign we have a total of three double ratios for n = 4. Let us e.g.
take those defined by the following ordered pairs: {(12), (34)}, {(13), (24)} and {(14), (23)}. Each vanishes on the corresponding ray of the tripod. Let us parameterize each ray of the tripod by its only edge-length t ≥ 0 and compute the corresponding map to R^3. We have the following embeddings on the three rays: t ↦ (0, t, t), t ↦ (t, 0, −t), t ↦ (−t, −t, 0). The sum of the primitive integer vectors parallel to the resulting directions is 0 and thus ι(M_{0,4}) is balanced.
In the case n > 4 the codimension 1 faces of M_{0,n} correspond to the combinatorial types of X with a single 4-valent vertex. Near a point inside such a face F the space M_{0,n} looks like the product of M_{0,4} and R^{n−4}. The factor R^{n−4} comes from the edge-lengths on F (its combinatorial type has n−4 bounded edges) while the factor M_{0,4} comes from perturbations of the 4-valent vertex (which result in a new bounded edge in one of the three possible combinatorial types of the result).
We have a well-defined map from the union U of the F-adjacent facets to F by contracting the new edge to a point. Note that the edge-length functions exhibit F as the positive quadrant in R^{n−4}. Furthermore, in the combinatorial type of F we may choose 4 leaves such that contracting all other leaves will take place outside of the 4-valent vertex (see Figure 2). This contraction defines a map U → M_{0,4}.
Figure 2. One of the possible contractions of a tree with a 4-valent vertex to the tree corresponding to the origin O ∈ M_{0,4}.
The lemma now follows from the observation that the resulting decomposition into M_{0,4} × R^{n−4} agrees with the double ratio functions. Indeed, note that the complement of the 4-valent vertex for a curve in the combinatorial type F is composed of four components.
If the double ratio is such that its four markings are in one-to-one correspondence with these components then on U it coincides with the sum of the pull-back of the corresponding double ratio in F and the pull-back of the corresponding double ratio in M_{0,4}. If one of the four components lacks a marking from the double ratio ρ then ρ|_U coincides with the corresponding pull-back from F. □
Remark 2.4. The functions Z_{x_i,x_j} from [4] do not define regular functions on M_{0,n}, contrary to what is written in [4]. These functions were the result of an erroneous simplification of the double ratio functions. They cannot be regular as they are always positive, and Proposition 5.12 of [4] is not correct. Even the projectivization of the embedding is not a balanced complex, already for M_{0,5}. One should use the (non-simplified) double ratios instead.
Clearly, the space M_{0,n} is non-compact. However it is easy to compactify it by allowing the lengths of bounded edges to assume infinite values. Let M̄_{0,n} be the space of connected trees with n (marked) leaves such that each edge of this tree is assigned a length 0 < l ≤ +∞ so that each leaf necessarily has length equal to +∞.
Corollary 2.5. The space M̄_{0,n} is a smooth compact tropical variety.
To verify that M̄_{0,n} is smooth near a point x at the boundary ∂M̄_{0,n} = M̄_{0,n} ∖ M_{0,n} we need to examine those double ratios that are equal to ±∞ at x. There we use only those signs that result in −∞, so that the map takes values in T^N.
Remark 2.6. Note that the compactification M̄_{0,n} ⊃ M_{0,n} corresponds to the Deligne-Mumford compactification in the complex case: under the 1-parametric family collapse of a Riemann surface to a tropical curve, the tropical length of an edge corresponds to the rate of growth of the complex modulus of the holomorphic annulus collapsing to that edge.
Furthermore, similarly to the complex story the infinite edges decompose a tropical curve into components (where the non-leaf edges are finite). Any tropical map from an infinite edge which is bounded would have to be constant and thus the image would have to split as a union of several tropical curves in the target. Such decompositions were used by Gathmann and Markwig in their deduction of the tropical WDVV equation in R^2, see [1].
3. Tropical ψ-classes
Note that we do have the forgetting maps ft_j : M_{0,n+1} → M_{0,n} for j = 1, . . . , n + 1 by contracting the leaf with the j-marking. This map is sometimes called the universal curve. Each marking k ≠ j defines a section σ_k of ft_j. The conormal bundle to σ_k defines the ψ_k-class in complex geometry (to avoid ambiguity we take j = n+1). This notion can be adapted to our tropical setup.
Recall that so far our choice of tropical models in their equivalence class was such that the leaves of the tropical curves were in 1-1 correspondence with the markings. For this choice we have the images σ_k(M_{0,n}) contained in the boundary part of M̄_{0,n+1}. This presentation is compatible with the point of view in which line bundles in tropical geometry are given by H^1(X, O^×). Here X is the base of the bundle and O^× is the sheaf of “non-vanishing” tropical regular functions. Such functions are given in the charts to R^N by affine-linear functions with integer slopes, see [6]. (Recall that T^× = R is an honest group with respect to tropical multiplication, i.e. the classical addition.)
However, the following alternative construction allows one to obtain the ψ-classes more geometrically (as we’ll illustrate in an example in the next section). This approach is based on contracting the leaves marked by number k.
The canonical class of a tropical curve is supported at its vertices, namely we take each vertex with the multiplicity equal to its valence minus 2, cf. [6].
Furthermore, the cotangent bundle near a 3-valent vertex point can be viewed as a neighborhood of the origin for the line given by the tropical polynomial “x + y + 1_T” in R^2, so the +1 self-intersection of the line gives the required multiplicity for the canonical class at any 3-valent vertex. Thus we can use the intersections with the corresponding codimension 1 faces in M_{0,n} to define the ψ-classes there.
In other words, tropical ψ-classes will be supported on the (n−4)-dimensional faces in M_{0,n}. Namely, for a ψ_k-class we have to collect those codimension 1 faces in M_{0,n} whose only 4-valent vertex is adjacent to the leaf marked by k. After a contraction of this leaf we get a 3-valent vertex, thus the multiplicity of every face in a ψ-divisor is 1. We arrive at the following definition.
Definition 3.1. The tropical ψ_k-divisor Ψ_k ⊂ M_{0,n} is the union of those (n−4)-dimensional faces that correspond to tropical curves with a 4-valent vertex adjacent to the leaf marked by k, k = 1, . . . , n. Each such face is taken with the multiplicity 1.
Proposition 3.2. The subcomplex Ψ_k is a divisor, i.e. satisfies the balancing condition.
Proof. Recall that the balancing condition is a condition at (n−5)-dimensional faces. In M_{0,n} there are two types of such faces, one corresponding to tropical curves with two 4-valent vertices and one corresponding to a tropical curve with a 5-valent vertex.
Near the faces of the first type the moduli space M_{0,n} is locally a product of two copies of M_{0,4} and R^{n−5}. The Ψ-divisor is a product of R^{n−5}, one copy of M_{0,4} and the central (3-valent) point in the other copy of M_{0,4} (this is the point corresponding to the 4-valent vertex adjacent to the leaf marked by k). Thus the balancing condition holds trivially in this case.
Near the faces of the second type the moduli space M_{0,n} is locally a product of M_{0,5} and R^{n−5}.
As in the proof of Theorem 1 each double ratio decomposes into the sum of the corresponding double ratio in M_{0,5} (perhaps trivial if two of the markings for the double ratio correspond to the same edge adjacent to the 5-valent vertex) and an affine-linear function in R^{n−5}. Thus it suffices to check only the balancing condition for the Ψ-divisors in M_{0,5}. This example is considered in detail in the next section. The balancing condition there follows from Proposition 4.1. □
Conjecturally, the tropical Ψ-divisors are limits of some natural representatives of the divisors for the complex ψ-classes under the collapse of the complex moduli space onto the corresponding tropical moduli space M_{0,n}. Note that our choice for the tropical Ψ-divisor is not contained in the boundary ∂M̄_{0,n} ⊂ M̄_{0,n} (cf. the calculus of the complex boundary classes in [2]), but comes as a closure of a divisor in M_{0,n}.
4. The space M_{0,5}
We have already described the moduli space M_{0,4} as the tripod of Figure 1. It has only one 0-dimensional face O ∈ M_{0,4}. This point (considered as a divisor) coincides with the divisors Ψ_1 = Ψ_2 = Ψ_3 = Ψ_4. The description of M_{0,5} is somewhat more interesting.
There are 15 combinatorial types of 3-valent trees with 5 marked leaves. If we forget about the markings there is only one homeomorphism class for such a curve (see Figure 3). To get the number of non-isomorphic markings we take the number of all possible reorderings of the markings (equal to 5! = 120) and divide by 2^3 = 8 as there is an 8-fold symmetry of reordering. Indeed there is one symmetry interchanging the left two leaves, one interchanging the right two leaves and the central symmetry around the central leaf of the 3-valent tree on top of Figure 3.
Figure 3. Adjunction of combinatorial types corresponding to the quadrant connecting the rays (45) and (12).
Figure 4. The link of the origin in M_{0,5}.
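The count of 15 types can be cross-checked by direct enumeration. The following is a minimal Python sketch, assuming (for illustration only) that a 3-valent type is recorded by the two pairs of leaves attached to its two outer vertices; the names `cherries` and `types` are ours and do not appear in the paper.

```python
from itertools import combinations
from math import factorial

markings = range(1, 6)

# A 3-valent tree with 5 marked leaves has two outer vertices, each carrying
# a pair of leaves; the fifth leaf is attached to the middle vertex.
cherries = [frozenset(p) for p in combinations(markings, 2)]

# A combinatorial type is an unordered choice of two disjoint pairs of leaves.
types = [frozenset((p, q)) for p, q in combinations(cherries, 2) if not (p & q)]

# The count from the text: 5! reorderings modulo the 8-fold symmetry.
count_by_symmetry = factorial(5) // 2**3

print(len(types), count_by_symmetry)
```

Both numbers agree, so the symmetry argument and the enumeration describe the same set of types.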
Thus the space M_{0,5} is a union of 15 quadrants R^2_{≥0}. These quadrants are attached along the rays which correspond to the combinatorial types of curves with one 4-valent vertex. Such curves also have one 3-valent vertex which is adjacent to two leaves and the only bounded edge of the curve, see the bottom of Figure 3. Such combinatorial types are determined by the markings of the two leaves emanating from the 3-valent vertex. Thus we have a total of (5 choose 2) = 10 such rays. The two boundary edges of the quadrant correspond to contractions of the bounded edges of the combinatorial type as shown in Figure 3.
The global picture of adjacency of quadrants and rays is shown in Figure 4, where the reader may recognize the well-known Petersen graph, cf. the related tropical Grassmannian picture in [7]. Vertices of this graph correspond to the rays of M_{0,5} while the edges correspond to the quadrants. Thus the whole picture may be interpreted as the link of the only vertex O ∈ M_{0,5} (the point O corresponds to the tree with a 5-valent vertex adjacent to all the leaves).
To locate the Ψ_k-divisor we recall that the kth leaf has to be adjacent to a 4-valent vertex if it appears in Ψ_k. This means that Ψ_k consists of 6 rays that are marked by pairs not containing k.
Proposition 4.1. The subcomplex Ψ_k ⊂ M_{0,5} is a divisor.
Proof. Since the whole M_{0,5} is S_5-symmetric it suffices to check the balancing condition only for Ψ_1. The embedding M_{0,5} ⊂ R^N is given by the double ratios, so it suffices to check that for each double ratio function the sum of its gradients on the six rays of Ψ_1 vanishes.
If the double ratio is determined by two pairs disjoint from the marking 1, e.g. by {(23), (45)}, then its restriction to the six rays of Ψ_1 is the same as its restriction to the three rays of M_{0,4} taken twice and thus balanced. Namely its gradient is 1 on the rays (24) and (35); −1 on the rays (25) and (34); and 0 on the rays (23) and (45).
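The gradients just listed, and the vanishing of the sum for every other double ratio, can be verified mechanically. The following is a minimal Python sketch under stated assumptions: a ray is recorded by the pair of markings on its 3-valent vertex, a double ratio by two ordered pairs, and the sign is + when the two oriented paths cross the bounded edge in the same direction (the convention from the proof of Theorem 1). The function names `gradient` and `double_ratios` are ours, not the paper's.

```python
from itertools import combinations

def gradient(double_ratio, ray):
    """Slope of a double ratio {(a b), (c d)} along the ray whose bounded
    edge separates the two markings in `ray` from all the other markings."""
    (a, b), (c, d) = double_ratio
    side = set(ray)
    # A path runs through the bounded edge only if the edge separates its ends.
    if (a in side) == (b in side) or (c in side) == (d in side):
        return 0
    # Both paths use the edge: compare the directions in which they cross it.
    return 1 if (a in side) == (c in side) else -1

def double_ratios(n):
    """One double ratio (up to sign) for each splitting of 4 markings into 2 pairs."""
    result = []
    for w, x, y, z in combinations(range(1, n + 1), 4):
        result += [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return result

# Sanity check on M_{0,4}: directions of the three rays of the tripod in R^3.
tripod = [tuple(gradient(r, ray) for r in double_ratios(4))
          for ray in [(1, 2), (1, 3), (1, 4)]]

# Psi_1 in M_{0,5}: the six rays marked by pairs not containing the marking 1.
psi1_rays = list(combinations(range(2, 6), 2))
sums = {r: sum(gradient(r, ray) for ray in psi1_rays) for r in double_ratios(5)}

print(tripod)
print(all(s == 0 for s in sums.values()))
```

The first printed list reproduces the directions (0, 1, 1), (1, 0, −1), (−1, −1, 0) found for M_{0,4}; the second line reports whether all 15 coordinate sums over the six rays vanish.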
If the four markings of the double ratio contain the marking 1 then thanks to the symmetry we may assume that the double ratio is given by {(12), (34)}. It vanishes on the rays (34), (35), (45) and (25); it has gradient +1 on the ray (24) and gradient −1 on the ray (23). Once again, the balancing condition holds. □
As our final example of the paper we would like to describe explicitly the universal curve ft_5 : M_{0,5} → M_{0,4}. This is presented in Figure 5. Once again, we interpret the Petersen graph as the link L of the vertex O ∈ M_{0,5}. Similarly, the link of the origin in M_{0,4} consists of three points. Thus L is the union of the fibers of ft_5 (away from a neighborhood of infinity) over these three points and four copies of a neighborhood of the origin in M_{0,4} corresponding to the four sections σ_1, σ_2, σ_3 and σ_4 of the universal curve. Figure 5 depicts the fibers in L with solid lines and the sections with dashed lines.
Figure 5. The three fibers and four sections of the universal curve ft_5 : M_{0,5} → M_{0,4}.
Acknowledgements. I am thankful to Valery Alexeev and Kristin Shaw for discussions related to geometry of tropical moduli spaces. My research is supported in part by NSERC.
References
[1] Gathmann, A., Markwig, H., Kontsevich’s formula and the WDVV equations in tropical geometry, http://arxiv.org/abs/math.AG/0509628.
[2] Keel, S., Intersection theory of moduli space of stable N-pointed curves of genus zero, Transactions of the AMS 330 (1992), 545–574.
[3] Litvinov, G. L., The Maslov dequantization, idempotent and tropical mathematics: a very brief introduction. In Idempotent mathematics and mathematical physics, Contemp. Math., 377, Amer. Math. Soc., Providence, RI, 2005, 1–17.
[4] Mikhalkin, G., Tropical Geometry and its application, to appear in the Proceedings on the ICM-2006, Madrid; http://arxiv.org/abs/math/0601041.
[5] Mikhalkin, G., Tropical Geometry, book in preparation.
[6] Mikhalkin, G., Zharkov, I., Tropical curves, their Jacobians and Theta functions, http://arxiv.org/abs/math/0612267.
[7] Speyer, D., Sturmfels, B., The tropical Grassmannian. Adv. Geom. 4 (2004), no. 3, 389–411.
[8] Viro, O. Ya., Dequantization of real algebraic geometry on logarithmic paper. In European Congress of Mathematics, Vol. I (Barcelona, 2000), Progr. Math., 201, Birkhäuser, Basel, 2001, 135–146.
Department of Mathematics, University of Toronto, 40 St George St, Toronto ON M5S 2E4 Canada
ABSTRACT This note is devoted to the definition of moduli spaces of rational tropical curves with n marked points. We show that this space has a structure of a smooth tropical variety of dimension n-3. We define the Deligne-Mumford compactification of this space and tropical $\psi$-class divisors. <|endoftext|><|startoftext|> Difermion condensates in vacuum in 2-4D four-fermion interaction models
Bang-Rong Zhou†
College of Physical Sciences, Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
In any four-fermion (denoted by q) interaction model, the couplings of (qq)^2-form can always coexist with the ones of (q̄q)^2-form via the Fierz transformations. Hence, even in vacuum, there could be interplay between the condensates 〈q̄q〉 and 〈qq〉. Theoretical analysis of this problem is generally made by relativistic effective potentials in the mean field approximation in 2D, 3D and 4D models with two flavor and Nc color massless fermions.
It is found that in ground states of these models, interplay between the two condensates mainly depends on the ratio GS/HS for the 2D and 4D cases or GS/HP for the 3D case, where GS, HS and HP are respectively the coupling constants in a scalar (q̄q), a scalar (qq) and a pseudoscalar (qq) channel. In ground states of all the models, only pure 〈q̄q〉 condensates could exist if GS/HS or GS/HP is bigger than the critical value 2/Nc, the ratio of the color numbers of the fermions entering into the condensates 〈qq〉 and 〈q̄q〉. Below it, differences of the models will manifest themselves. In the 4D Nambu-Jona-Lasinio (NJL) model, as GS/HS decreases to the region below 2/Nc, one will first have a coexistence phase of the two condensates, then a pure 〈qq〉 condensate phase. Similar results come from a renormalized effective potential in the 2D Gross-Neveu model, except that the pure 〈qq〉 condensates could exist only if GS/HS = 0. In a 3D Gross-Neveu model, when GS/HP < 2/Nc, the phase transition similar to the 4D case can arise only if Nc > 4, and for smaller Nc, only a pure 〈qq〉 condensate phase exists but no coexistence phase of the two condensates happens. The GS−HS (or GS−HP) phase diagrams in these models are given.
The results deepen our understanding of dynamical phase structure of four-fermion interaction models in vacuum. In addition, in view of the absence of difermion condensates in vacuum of QCD, they will also imply a real restriction to any given two-flavor QCD-analogous NJL model, i.e. in the model, the derived smallest ratio GS/HS via the Fierz transformations in the Hartree approximation must be bigger than 2/3.
The project supported by the National Natural Science Foundation of China under Grant No.10475113.
Electronic mailing address: zhoubr@163bj.com
http://arxiv.org/abs/0704.0841v3
I.
MAIN RESULTS
We have studied the interplay between the fermion (q)-antifermion (q̄) condensates 〈q̄q〉 and the difermion condensates 〈qq〉 in vacuum in 2D, 3D and 4D four-fermion interaction models with two flavor and Nc color massless fermions. It is found that the ground states of the systems could be in different phases shown in the following GS−HS and GS−HP phase diagrams [Fig.(a)–Fig.(d)], where
GS: coupling constant of the scalar (q̄q)^2 channel
HS: coupling constant of the scalar color Nc(Nc−1)/2-plet (qq)^2 channel (4D, 2D case)
HP: coupling constant of the pseudoscalar color Nc(Nc−1)/2-plet (qq)^2 channel (3D case)
Λ: Euclidean momentum cutoff of loop integrals (4D, 3D case)
(σ1, 0): pure 〈q̄q〉 phase
(0, ∆1): pure 〈qq〉 phase
(σ2, ∆2): mixed phase with both 〈q̄q〉 and 〈qq〉
Fig. (a). 4D NJL model: phases (σ1, 0), (σ2, ∆2) and (0, ∆1) in the plane x = HSΛ^2/π^2, y = GSΛ^2/π^2, with the line y = 2x/Nc and the curve R: y = x/[1+(Nc−2)x].
Fig. (b). 2D GN model: phases (σ1, 0), (σ2, ∆2) and (0, ∆1) in the plane x = HS/π, y = GS/π, with the line y = 2x/Nc.
Fig. (c). 3D GN model, Nc ≤ 4: phases (σ1, 0) and (0, ∆1) in the plane x = HPΛ/π, y = GSΛ/π, with the line y = 2x/Nc.
Fig. (d). 3D GN model, Nc ≥ 5: phases (σ1, 0), (σ2, ∆2) and (0, ∆1) in the plane x = HPΛ/π, y = GSΛ/π, with the line y = 2x/Nc.
Main Conclusions
1. In all the models, the pure 〈q̄q〉 phase happens if GS/HS (GS/HP) > 2/Nc (also GS must be large enough in the 3D and 4D models).
2. The phases with condensates 〈qq〉, including the pure 〈qq〉 phase and the mixed phase with 〈q̄q〉 and 〈qq〉, arise only if GS/HS (GS/HP) < 2/Nc.
3. In the 3D Gross-Neveu model, no mixed phase with 〈q̄q〉 and 〈qq〉 exists for Nc ≤ 4.
II. Motivation and general approach
• In any four-fermion interaction model [1, 2], the couplings of (qq)^2-form and (q̄q)^2-form can always coexist via the Fierz transformations, hence there must be interplay between the condensates 〈q̄q〉 and 〈qq〉 in the ground state of the system.
• In the vacuum, despite the absence of net fermions, based on a relativistic quantum field theory, it is possible that the condensates 〈qq〉 and 〈q̄q̄〉 are generated simultaneously.
• The mean field approximation has been taken.
In this case, we have used the Fierz transformed four-fermion couplings in the Hartree approximation to avoid double counting [3].
• In selecting the couplings of (qq)^2-form, we always simulate the SU(Nc) gauge interaction, where two fermions are attractive in the antisymmetric Nc(Nc−1)/2-plet.
• Euclidean momentum cutoffs in 3D and 4D models have been used so as to maintain Lorentz invariance of effective potentials in the vacuum.
• In the massless fermion limit, all the discussions can be made analytically.
• The coupling constants GS and HS (or HP) are viewed as independent parameters.
III. 4D Nambu-Jona-Lasinio model
With 2 flavors and Nc color massless fermions, the Lagrangian is
L = q̄ iγ^µ∂_µ q + GS [(q̄q)^2 + (q̄ iγ5 τ⃗ q)^2] + HS (q̄ iγ5 τ2 λA q^C)(q̄^C iγ5 τ2 λA q), (1)
where the fermion fields q are in the doublet of SUf(2) and the Nc-plet of SUc(Nc), i.e. i = 1, · · · , Nc, (2)
q^C is the charge conjugate of q and τ⃗ = (τ1, τ2, τ3) are the Pauli matrices acting in two-flavor space. The matrices λA run over all the antisymmetric generators of SUc(Nc). Assume that the four-fermion interactions can lead to the scalar condensates
〈q̄q〉 = φ (3)
with all the Nc color fermions entering them, and the scalar color Nc(Nc−1)/2-plet difermion and di-antifermion condensates (after a global SUc(Nc) transformation)
〈q̄^C iγ5 τ2 λ2 q〉 = δ, 〈q̄ iγ5 τ2 λ2 q^C〉 = δ*, (4)
with only two color fermions entering them. The corresponding symmetry breaking is SUfL(2) ⊗ SUfR(2) → SUf(2), SUc(Nc) → SUc(2), and a "rotated" electric charge UQ̃(1) and a "rotated" quark number U′q(1) are left unbroken. It should be noted that in the case of vacuum, the Goldstone bosons induced by spontaneous breaking of SUc(Nc) could be some combinations of difermions and di-antifermions. Define σ = −2GSφ, ∆ = −2HSδ, ∆* = −2HSδ*.
(5) With standard technique and a 4D Euclidean momentum cutoff Λ [4], we obtain the relativistic effective potential V4(σ, |∆|) = 2 + 2|∆|2)Λ2 − (Nc − 2) −(σ2 + |∆|2)2 σ2 + |∆|2 . (6) The ground states of the system, i.e. the minimum points of V4(σ, |∆|), will be at (σ, |∆|) = (0, ∆1) (σ2, ∆2) (σ1, 0) , 0 ≤ 1 + (Nc − 2) 1 + (Nc − 2) , (7) Eq.(7) gives the phase diagram Fig.(a) of the 4D NJL model. IV. 2D Gross-Neveu model The Lagrangian is expressed by L = q̄iγµ∂µq +GS [(q̄q) 2 + (q̄iγ5~τq) 2] +HS(q̄iγ5τSλAq C)(q̄Ciγ5τSλAq), (8) All the denotations are the same as ones in 4D NJL model, except that in 2D space-time , γ1 = = −C, γ5 = γ and τS = (τ0 ≡ 1, τ1, τ3) are flavor-triplet symmetric matrices. It is indicated that the product matrix Cγ5τSλA is antisymmetric. Assume that the four-fermion interactions could lead to the scalar quark-antiquark conden- sates 〈q̄q〉 = φ, (9) which will break the discrete symmetries χD : q(t, x) → γ5q(t, x), P1 : q(t, x) → γ1q(t,−x), and that the coupling with HS can lead to the scalar color Nc(Nc − 1) -plet difermion con- densates and the scalar color anti- Nc(Nc − 1) -plet di-antifermion condensates (after a global transformation in flavor and color space) 〈q̄C iγ51fλ2q〉 = δ, 〈q̄iγ51fλ2q C〉 = δ∗ (10) which will break discrete symmetries Zc (center of SUc(3)) and Z (center of SUf (2)), besides χD and P1. Noting that in a 2D model, no breaking of continuous symmetry needs to be considered on the basis of Mermin-Wagner-Coleman theorem [5]. The model is renormalizable. In the space-time dimension regularization approach, we can write down the renormalized L in D = 2− 2ε dimension space-time by the replacements GS → GSM 2−DZG, HS → HSM 2−DZH , with the scale parameter M , the renormalization constants ZG and ZH . In addition, the γ L will become 2D/2 × 2D/2 matrices. Define the order parameters σ = −2GSM 2−DZGφ, ∆ = −2HSM 2−DZHδ, (11) which will be finite if ZG and ZH are selected so as to cancel the UV divergences in φ and δ. 
In the minimal substraction scheme, ZG = 1− 2NcGS , ZH = 1− . (12) By similar derivation to the one made in Ref.[6], the corresponding renormalized effective po- tential in the mean field approximation up to one-loop order becomes V2(σ, |∆|) = σ2 + |∆|2 + (Nc − 2) ln σ2 + |∆|2 , M̄2 = 2πe−γM2, (13) where γ is the Euler constant. The ground states of the system i.e. the minimal points of V2(σ, |∆|) will be at (σ, |∆|) = (0, ∆1) (σ2, ∆2) (σ1, 0) GS/HS = 0 0 < GS/HS < 2/Nc GS/HS > 2/Nc Eq.(14) gives the phase diagram Fig.(b) of 2D GN model. In 2D case, the GS -HS phase structure has the following feature: 1. The pure 〈qq〉 phase (0,∆1) could appear only if GS/HS = 0; 2. Formations of the condensates do not call for that the coupling constant GS and HS have some lower bounds. V. 3D Gross-Neveu model The Lagrangian is expressed by L = q̄iγµ∂µq +GS [(q̄q) 2 + (q̄~τq)2] +HP (q̄τ2λAq C)(q̄Cτ2λAq), (15) where γµ(µ = 0, 1, 2) are taken to be 2× 2 matrices , γ1 = , γ2 = It is noted that the product matrix Cτ2λA is antisymmetric, and since without the ”γ5” matrix, the only possible color Nc(Nc − 1) -plet difermion interaction channel is pseudoscalar one. The condensates 〈q̄q〉 will break time reversal symmetry T : q(t, ~x) → γ2q(−t, ~x), special parity P1 : q(t, x 1, x2) → γ1q(t,−x1, x2), special parity P2 : q(t, x 1, x2) → γ2q(t, x1,−x2). The difermion condensates 〈q̄Cτ2λ2q〉 (after a global rotation in the color space) will break SUc(Nc) → SUc(2) and leave a ”rotated” electrical charge U (1) and a ”rotated” fermion number U ′q(1) unbroken. It also breaks parity P : q(t, ~x) → γ0q(t,−~x) and this shows pseudoscalar feature of the difermion condensates. Define the order parameters in the 3D GN model σ = −2GS〈q̄q〉, ∆ = −2HP 〈q̄ Cτ2λ2q〉, (16) on bases of the same method used in Ref.[7], we find out the effective potential in the mean field approximation V3(σ, |∆|) = 2 + 2|∆|2)Λ 6σ2|∆|+ 2|∆|3 + (Nc − 2)σ 3 + 2θ(σ − |∆|)(σ − |∆|)3 , (17) where Λ is a 3D Euclidean momentum cutoff. 
The ground states of the system correspond to the least value points of V3(σ, |∆|) which will respectively be at (σ, |∆|) = (0,∆1), (0,∆1), (σ2,∆2), (σ1, 0), , for Nc ≤ 4 for Nc > 4 , for all Nc Eq.(18) gives the GS −HP phase diagrams Fig.(c) and Fig.(d) of the 3D GN model. VI. Summary • Present research deepens our theoretical understanding of the four-fermion interaction models: 1. Even in vacuum, it is possible that the difermion condensates are generated as long as the coupling constants of the difermion channel are strong enough (bigger than zero or some finite values). 2. Interplay between the condensates 〈q̄q〉 and 〈qq〉 mainly depends on GS/HS (or GS/HP ), the ratio of the coupling constants of scalar fermion-antifermion channel and scalar (or pseudoscalar ) difermion channel. 3. In all the discussed 2-flavor models, if GS/HS (GS/HP ) > 2/Nc, the ratio of the color numbers of the fermions entering into the condensates 〈qq〉 and 〈q̄q〉, (and also with sufficiently large GS in 4D and 3D model), then only pure 〈q̄q〉 condensates phase may exist. Below 2/Nc, (and also with sufficiently large HS or HP in 4D or 3D model), one will always first have a mixed phase with condensates 〈q̄q〉 and 〈qq〉, then a pure 〈qq〉 condensate phase, except that in the 3D GN model, no the mixed phase appears when Nc ≤ 4. • In view of absence of 〈qq〉 condensates in vacuum of QCD, the result here also implies a real restriction to any given two-flavor QCD-analogue NJL model: in such model, the derived smallest ratio GS/HS via the Fierz transformation in the Hartree approximation must be bigger than 2/3 [4]. [1] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122 (1961) 345; 124 (1961) 246. [2] D.J. Gross and A. Neveu, Phys. Rev. D 10 (1974) 3235. [3] M. Buballa, Phys. Rep. 407 (2005) 205. [4] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 95. [5] N. D. Mermin and H. Wagner, Phys. Rev. Lett. 17 (1966) 1133; S. Coleman, Commun. Math. Phys. 31 (1973) 259. [6] Zhou Bang-Rong, Commun. Theor. Phys. 
47 (2007) 520. [7] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 695.
ABSTRACT Theoretical analysis of interplay between the condensates $<\bar{q}q>$ and $<qq>$ in vacuum is generally made by relativistic effective potentials in the mean field approximation in 2D, 3D and 4D models with two flavor and $N_c$ color massless fermions. It is found that in ground states of these models, interplay between the two condensates mainly depends on the ratio $G_S/H_S$ for the 2D and 4D cases or $G_S/H_P$ for the 3D case, where $G_S$, $H_S$ and $H_P$ are respectively the coupling constants in a scalar $(\bar{q}q)$, a scalar $(qq)$ and a pseudoscalar $(qq)$ channel. In ground states of all the models, only pure $<\bar{q}q>$ condensates could exist if $G_S/H_S$ or $G_S/H_P$ is bigger than the critical value $2/N_c$, the ratio of the color numbers of the fermions entering into the condensates $<qq>$ and $<\bar{q}q>$. As $G_S/H_S$ or $G_S/H_P$ decreases to the region below $2/N_c$, differences of the models will manifest themselves. Depending on the model, and also on $N_c$ in the 3D model, one will or will not have a coexistence phase of the two condensates, besides the pure $<qq>$ condensate phase. The $G_S-H_S$ (or $G_S-H_P$) phase diagrams in these models are given. The results also imply a real constraint on two-flavor QCD-analogous NJL models. <|endoftext|><|startoftext|> Oscillation bands of condensates on a ring: Beyond the mean field theory
C. G. Bao
Center of Theoretical Nuclear Physics, National Laboratory of Heavy Ion Collisions, Lanzhou 730000, P. R. China
The State Key Laboratory of Optoelectronic Materials and Technologies, Zhongshan University, Guangzhou, 510275, P.R. China
Abstract: The Hamiltonian of an N-boson system confined on a ring with zero spin and repulsive interaction is diagonalized. The excitation of a pair of p-wave-particles rotating reversely appears to be a basic mode.
The fluctuation of many of these excited pairs provides a mechanism of oscillation, and the states can thereby be classified into oscillation bands. The particle correlation is studied intuitively via the two-body densities. Bose-clustering originating from the symmetrization of wave functions is found, which leads to the appearance of 1-, 2-, and 3-cluster structures. The motion is divided into collective and relative parts; this leads to the establishment of a relation between the very high vortex states and the low-lying states.
After the experimental realization of Bose-Einstein condensation [1], various condensates confined under different circumstances have been extensively studied theoretically and experimentally. Mostly, the condensates are considered to be confined in a harmonic trap. Condensates trapped by a periodic potential have also been studied owing to the appearance of optical lattices [2]. It is believed that condensates confined in particular geometries can be realized. Experimentally, the particle interactions can now be tuned from very weak to very strong [3-8], which implies that the particle correlation may become important. Theoretically, in response, going beyond the mean field Gross-Pitaevskii (GP) theory is desirable, and the condensates confined in particular geometries also deserve consideration. Along this line, in addition to the ground state, the yrast states have been studied both analytically and numerically [9-17]. The condensation on a ring has also been studied recently [12]. The present paper is also dedicated to N-boson systems confined on a ring with weak interaction; its scope is broader and covers the whole low-lying spectra. A similar system has been investigated analytically by Lieb and Liniger [16,17].
However, the emphasis of their papers is different from that of the present one, which is placed on analyzing the structures of the excited states to find out their distinctions and similarities, and to find out the modes of excitation. Based on the analysis, an effort is made to classify the excited states. Traditionally, the particle correlation and its effect on the geometry of N-boson systems is a topic scarcely studied if N is large. In this paper, the correlation is studied intuitively so that the geometric features inherent in the excited states can be understood. Traditionally, a separation between the collective and internal motions is seldom considered if N is large. In this paper such a separation is made and leads to the establishment of a relation between the vortex states and the low-lying states.
It is assumed that the N identical bosons confined on a ring have mass m, spin zero, and a square-barrier interaction. The ring has a radius R, and N is set to 100, 20 and 10000. Let G = ħ²/(2mR²) be the unit of energy. The Hamiltonian then reads H = − i<|startoftext|> Kadowaki-Woods Ratio of Strongly Coupled Fermi Liquids Takuya Okabe Faculty of Engineering, Shizuoka University, 3-5-1 Johoku, Hamamatsu 432-8561, Japan (Dated: November 28, 2018) On the basis of the Fermi liquid theory, the Kadowaki-Woods ratio A/γ² is evaluated by using a first-principles band calculation for typical itinerant d and f electron systems. It is found, as observed, that the ratio for the d electron systems is significantly smaller than that for the normal f systems, even without considering their relatively weak correlation. The difference in the ratio value comes from different characters of the Fermi surfaces. By comparing Pd and USn3 as typical cases, we discuss the importance of the Fermi surface dependence of the quasiparticle transport relaxation.
PACS numbers: 71.10.Ay, 71.18.+y, 71.20.Be, 71.27.+a, 72.15.-v
It is widely known as a universal feature of heavy fermion systems that there holds the Kadowaki-Woods (KW) relation A/γ² ≃ 1 × 10⁻⁵ µΩ cm (mol K/mJ)² between the electronic specific heat coefficient γ of C = γT and the coefficient A of the resistivity ρ = AT² in the clean and low temperature limit.[1] According to the Fermi liquid theory, this is interpreted as an indication of the fact that A is proportional to the square of the quasiparticle mass enhancement due to strong electron correlation. On the other hand, transition metal systems have long been reported to obey a similar relation with a value of A/γ² that is more than an order of magnitude smaller.[2, 3] In view of the observation that there seem to exist several types of systems in this regard, the recent finding by Tsujii et al.[4] that many Yb-based compounds show a KW ratio A/γ² as small as that of the transition metals is quite impressive. Kontani derived the small ratio as a result of the large orbital degeneracy of the 4f¹³ state of trivalent Yb by applying the dynamical mean field approximation to a periodic Anderson model of orbitally degenerate f electron states coupled with a single conduction band.[5]
To discuss the KW ratio A/γ² and the many-body mass enhancement effect, a simple model is usually adopted at the cost of neglecting material-specific individual factors. In the present work, we are interested in such an effect as caused by a system-dependent factor, that is, the Fermi surface dependence of quasiparticle current relaxation. The system should have a large enough Fermi surface relative to the Brillouin zone boundary in order for the quasiparticle current to dissipate effectively into the underlying lattice through mutual quasiparticle scatterings. In other words, the effectiveness of the transport relaxation may depend on the size and shape of the Fermi surface.
To investigate this point definitely, we discuss the quasiparticle transport by taking account of the momentum dependence of quasiparticle scattering on the basis of realistic band structures. This has been hampered so far by the task required for Fermi surfaces of many-band systems that are not simple enough to be modelled analytically. In terms of fairly realistic energy bands obtained from a first-principles calculation, we evaluate those quantities which are not affected severely by the electron correlation effect. The theory in use is essentially within the phenomenological Fermi liquid theory described by renormalized quantities, and unlike a model calculation no bare microscopic quantities appear explicitly. Schematic results using simple abstract models have been given before, in which a tight binding square lattice model and a two-band model are investigated.[6, 7, 8]
For the ratio A/γ² we make use of the expression $A/\gamma^2 = 21.3\,\alpha F a\ [\mu\Omega\,(\mathrm{mol\,K/mJ})^2]$, (1) which corresponds to Eq. (4.11) in Ref. 7, where we set a = 4 Å for the lattice constant. In what follows we substitute a calculated value for a. Below we follow how to derive αF, where α is a coupling constant, and F is a factor determined by the Fermi surface.
Following a microscopic analysis of the quasiparticle transport with vertex corrections properly taken into account,[9] we may derive a phenomenological linearized Boltzmann equation.[7] Generalizing the theory to take a many-band effect into account, in the low temperature limit T → 0 we end up with the equation vipµ = (πT ) pp′kρ p′+k(l pµ + l p′µ − l p′+kµ − l p−kµ), (2) where $v^i_{p\mu}$ and $\rho^i_p = \delta(\mu - \varepsilon^i_p)$ are the velocity component and the local density of states of the renormalized (mass-enhanced) quasiparticle with the crystal momentum p in the i-th band. [∗Electronic address: ttokabe@ipc.shizuoka.ac.jp; http://arxiv.org/abs/0704.0843v2]
The superscripts i and j are the band indices, while the subscripts µ = x, y, z denote Cartesian coordinates. On the right-hand side of Eq. (2), the 2nd to 4th terms in the parenthesis represent vertex corrections in the microscopic formulation. In terms of the solution $l^i_{p\mu}$, which physically represents the stationary deviation of the Fermi surface in an applied electric field Eµ, the conductivity is given by σ ≡ σµ = 2e pµ, (3) The above equations (2) and (3) correspond to Eqs. (3.10) and (3.15) of Ref. 8, respectively. We may suppress the index µ (= x) in Eq. (3) as we discuss cubic systems in what follows.
Instead of solving the simultaneous matrix equations (2) exactly, we use trial functions for $l^i_{p\mu}$ as commonly applied in a variational principle formulation of transport problems.[10] Assuming lipµ ∝ e |vipµ| we obtain αijci,j ρ2|vx| , (4) where ci,j = k1,k2,k3,k4 k1+k2=k3+k4 ρik1ρ ρik4(e −eik4) 2/4ρiρj , ρ|vx| ≡ ρip|v px|. (6) We define coupling constants $\alpha^{ij} = \rho^i\rho^j\langle W^{ij}\rangle/\pi$, where $\rho^i = \sum_p \rho^i_p$ is the density of states of the i-th band at the Fermi level and $\langle W^{ij}\rangle$ denotes the quasiparticle scattering probability $W_{pp'k}$ averaged over the momenta p, p′ and k. As the double sum in (2), dominated by Umklapp processes, covers a complicated shaped phase space over the Fermi surface, it is generally a good approximation to take $W_{pp'k}$ out of the momentum sum as an averaged quantity. The total density of states $\rho = \sum_i \rho^i$ is substituted into γ = 2π²ρ/3.
In heavy fermion systems, the momentum dependence of $W_{pp'k}$ can generally be neglected, for the quasiparticle scattering $W_{pp'k}$ is primarily caused by the strong on-site Coulomb repulsion U. Then we can make an order of magnitude estimate of $\alpha^{ii}$ in terms of the Landau parameters $F^s_0$ and $F^a_0$. For an anisotropic Fermi liquid, as in an isotropic case, one can derive that the charge and spin susceptibilities are given by $\chi^i_c = 2\rho^i/(1+F^s_0)$ and $\chi^i_s = 2\rho^i/(1+F^a_0)$, respectively.
Thus, for the systems in which charge fluctuations are suppressed, $\chi^i_c \to 0$, we obtain $F^s_0 \gg 1$. On the other hand, in terms of A /(1 +F ), one obtains a rough estimate of the coupling αii = 1 )2 + 1 . Therefore, under the normal condition that the spin enhancement is moderate, $(1+F^a_0)^{-1} \sim 1$, $\alpha^{ii}$ should universally stay around a constant of order unity.[7] This corresponds to the condition that makes the Wilson ratio R_W = 2 in the impurity model.[11, 12] We discuss a normal state in which the system is well away from critical instabilities, around which A/γ² will be strongly enhanced at variance with the experimental results under consideration.[13]
We evaluate F numerically for α = αij = 1 to obtain A/γ², and investigate the Fermi surface dependence. It is noted that the factor F is determined by the shape and extent of the Fermi surfaces relative to the Brillouin zone boundary. Microscopically, the mass enhancement due to the many-body effect is represented by the ω-derivative of the electron self-energy Σ(q, ω), or by the renormalization factor $z^i_p$ as $\rho^i_p = \rho^i_{0,p}/z^i_p$, where $\rho^i_{0,p}$ is a bare density of states. It is easily checked that the factor z cancels in F when $z^i_p$ is independent of i. Otherwise, in case a dominant contribution to the resistivity comes from an electron-correlated main band, the other bands may be neglected and A/γ² becomes independent of z of the main band. As we see below numerically, it is found indeed that F is dominated by a few scattering channels within a main band or two. Hence, we elaborate on a numerical estimate of F on the basis of a realistic band calculation reproducing reliable Fermi surfaces of the relevant bands, even if it may not take account of local many-body correlation effects fully enough for renormalized quantities like $\rho^i$ and $v^i_p$ to be separately compared with experiments.
As a matter of course, we must exclude the extreme case in which strong correlation modifies electron states around the Fermi level qualitatively from those of a band calculation. We apply our theory to those itinerant electron systems in which the correlation strength is not negligible but not so strong.
To calculate F for some typical cubic d and f itinerant electron systems in the fcc and Cu3Au structures, we have performed ab initio band calculations within density functional theory using the plane wave pseudopotential code VASP with the Perdew-Wang 1991 generalized gradient approximation to the exchange correlation functional Exc.[14, 15, 16, 17] By minimizing the total energy we obtain the lattice constant a, which is accurate enough to be used in Eq. (1).

TABLE I: Calculated results.
        a (Å)   ρ|vx| (a)   F      N   A/γ² (b)
USn3    4.60    3.1         4.0    3   0.39
UIn3    4.61    4.9         1.6    3   0.16
UGa3    4.24    3.9         2.5    3   0.23
Pd      3.86    7.4         0.23   3   0.019
Pt      3.91    8.4         0.15   4   0.012
(a) In units of a = 1. (b) In units of 10⁻⁵ µΩ cm (mol K/mJ)².

To evaluate F numerically, we have to broaden the delta function $\rho^i_p = \delta(\mu - \varepsilon^i_p)$ by ∆ to pick up electron states around the Fermi level. The width ∆, of the order of the real temperature, should be decreased as the number of k-points is increased until we confirm a convergent result. For the number L of subdivisions along reciprocal lattice vectors, band calculations are performed with Lband ∼ 50, from which we obtain the band energies $\varepsilon^i_k$ on the finer k-mesh of L ∼ 200 by interpolation. As the four-fold k-sum in the numerator of Eq. (4), especially for the most important terms coming from the main d or f correlated bands, constitutes the most time consuming part of the calculation, we have to reduce the numerical task by some symmetry considerations not only on the cubic symmetry of the quasiparticle states, but on the relative directions of the four momentum vectors of the scattering quasiparticle states and the x-direction of the current flow.
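As a quick consistency check on Table I, the last column can be recomputed from F and a. The sketch below assumes that Eq. (1) reads A/γ² = 21.3 αFa with a expressed in Å, so that the result carries units of µΩ Å (mol K/mJ)² and 1 Å = 10⁻⁸ cm converts it to the units of the table; this reading is an inference from the tabulated numbers, and the function name `kw_ratio` is purely illustrative.

```python
# Recompute the A/gamma^2 column of Table I from F and the lattice constant a.
# Assumption: Eq. (1) is A/gamma^2 = 21.3 * alpha * F * a with a in Angstrom,
# giving muOhm * Angstrom * (mol K/mJ)^2; 1 Angstrom = 1e-8 cm.

# (a in Angstrom, F, tabulated A/gamma^2 in 1e-5 muOhm cm (mol K/mJ)^2)
TABLE_I = {
    "USn3": (4.60, 4.0, 0.39),
    "UIn3": (4.61, 1.6, 0.16),
    "UGa3": (4.24, 2.5, 0.23),
    "Pd": (3.86, 0.23, 0.019),
    "Pt": (3.91, 0.15, 0.012),
}


def kw_ratio(F, a_angstrom, alpha=1.0):
    """Return A/gamma^2 in units of 1e-5 muOhm cm (mol K/mJ)^2."""
    micro_ohm_angstrom = 21.3 * alpha * F * a_angstrom  # muOhm Angstrom (mol K/mJ)^2
    micro_ohm_cm = micro_ohm_angstrom * 1e-8            # convert Angstrom to cm
    return micro_ohm_cm / 1e-5                          # express in table units


if __name__ == "__main__":
    for name, (a, F, tabulated) in TABLE_I.items():
        print(f"{name:5s} computed {kw_ratio(F, a):.3f}  tabulated {tabulated}")
```

Under this assumption the recomputed values agree with the tabulated column to within its two-digit rounding, which also makes explicit that the d-electron entries are small because F is small, not because a differs.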
The reduction is particularly effective for the intra-band scatterings i = j.
The calculated results are shown in Table I, where F and A/γ² for α = αij = 1 are shown along with the lattice constant a, the number N of metallic bands contributing to the resistivity, and ρ|vx| defined in Eq. (6). We find that our results explain well the experimental tendency toward values of the ratio A/γ² that are an order of magnitude smaller for the transition metal systems. As for the absolute values of the ratio, our results are uniformly a few times smaller than observed, but accuracy at this level should not be taken seriously here. Among other things, the results indicate that different characters of the Fermi surfaces play an important role.
To show the relative contribution to the resistivity from relevant bands, relative magnitudes of ci,j in the numerator of Eq. (4) are shown for Pd and USn3 in Figs. 1 and 2, respectively. For Pd, the contribution to F comes from the 4th to 6th bands, among which the 5th hole band of 4d character is dominant. Similarly, the 5th band contributes dominantly not only to ρ, i.e., ρ5 ≃ 5.4ρ4 ≃ 12ρ6, but also to ρ|vx| in Eq. (6). On the other hand, for USn3, while the 14th heavy electron band plays a central role, the 12th and 13th hole bands also make non-negligible contributions through the inter-band scatterings. Hence, as the first point to note, the numerical importance of the inter-band contributions makes F large in the f electron system. This is partly because ρi for i = 12, 13, 14 are comparable with each other, namely, ρ14 ≃ 2ρ13 ≃ 3ρ12. Moreover, it is remarked that the large and nearly spherical shape of the Fermi surfaces is essential too.
As the second point to note, the importance of the Fermi surface geometry can be understood within a single band model by comparing the contribution from the main band. We find that c5,5/ρ 5 = 0.097 for Pd is an order of magnitude smaller than c14,14/ρ 14 = 0.93 for USn3. The difference comes from the different characters of the Fermi surfaces. According to the elementary formula σ = e²ρv²τ = e²ρvl, the conductivity σ depends on ρv as well as l. In this context, the mean free path l is not a single particle property determined by a lifetime of the particle state, but it is the transport property which characterizes how efficiently the total electric current decays into a lattice system, e.g., in our case, through mutual Umklapp scattering processes between the current carriers. In particular, regardless of interaction, electrons in free space will not have resistivity.[9] Thus, to evaluate the transport property l correctly, it is crucial to take account of the momentum dependence of the scattering states and their conservation modulo the reciprocal lattice vectors.
FIG. 1: cij (i, j = 4, 5, 6) for Pd. The contribution from the 5th band is dominant for the resistivity.
FIG. 2: cij (i, j = 12, 13, 14) for USn3. The interband contribution with the 14th band is important too.
FIG. 3: The intersection of the Fermi surfaces of Pd d-hole states with the (111̄) plane.
FIG. 4: The intersection of the Fermi surfaces of USn3 with the (100) plane.
Note that ρ|vx| defined in Eq. (6) is related to the surface area S of the Fermi surfaces, as ρdε = Sdk⊥/(2π)³. Hence, ρ|vx| too is independent of the mass renormalization z, as F is, and for free electrons we obtain ρ|vx| ∝ k_F² ∝ n^{2/3}. One can see a correlation between F and ρ|vx| in Table I. In fact, Pd and Pt have twice as large ρ|vx| as the uranium compounds. The difference cannot be simply explained by the difference in the Fermi surface volume n. It is caused by the fact that the f-electron systems have nearly isotropic Fermi surfaces while the d-electron systems have complicated ones with relatively large area compared to their total volume, as indicated in Figs. 3 and 4.
The different characters of the surfaces affect not only the single particle quantity ρ|vx| but also the transport property of the total current relaxation. As the order of magnitude difference in F is not explained merely by ρ|vx|, we have to resort to the other factor, that is, the transport property depending on the Fermi surfaces. It originates from the detailed k-dependence of the scattering states, as represented in ci,j, or from the phase space volume available for all possible scattering channels under strict restrictions of energy and momentum conservation. Thus our quantitative analysis establishes the important effect on the quasiparticle transport due to the shape and complexity of the Fermi surfaces.
In summary, we evaluated the Kadowaki-Woods ratio A/γ² of some itinerant d and f electron systems numerically on the basis of the Fermi liquid theory using quasiparticle Fermi surfaces obtained by band calculations. In a single framework, we find that the d electron systems have a smaller ratio than the f systems, as observed, and among other things we pointed out an important effect on the transport coefficient A originating from a commonly neglected specific feature depending on the characters of the Fermi surfaces. The effect is not understood fully as a single-particle property of interacting systems, but we stress the importance of the phase space restriction due to momentum conservation in the two-body scattering processes that dissipate a total electric current. In short, to realize effective dissipation, the system should have a large and regularly shaped Fermi surface. In future work we will examine the possibility that the Fermi-surface dependent efficiency of mutual quasiparticle scatterings may depend on the type of transport current to be relaxed.
Acknowledgment
The author is grateful to N. Fujima, S. Kokado and T. Hoshino for providing assistance in the numerical calculations.
He also acknowledges computational resources offered from YITP computer system in Kyoto University. [1] K. Kadowaki and S. B. Woods, Solid State Commun. 58, 507 (1986). [2] M. J. Rice, Phys. Rev. Lett. 20, 1439 (1968). [3] K. Miyake, T. Matsuura, and C. M. Varma, Solid State Commun. 71, 1149 (1989). [4] N. Tsujii, H. Kontani, and K. Yoshimura, Phys. Rev. Lett. 94, 057201 (2005). [5] H. Kontani, J. Phys. Soc. Jpn. 73, 515 (2004). [6] T. Okabe, J. Phys. Soc. Jpn. 67, 2792 (1998). [7] T. Okabe, J. Phys. Soc. Jpn. 67, 4178 (1998). [8] T. Okabe, J. Phys. Soc. Jpn. 68, 2721 (1999). [9] K. Yamada and K. Yosida, Prog. Theor. Phys. 76, 621 (1986). [10] J. M. Ziman, Electrons and Phonons (Clarendon Press, Oxford, 1960). [11] P. Nozières, J. Low. Temp. Phys. 17, 31 (1974). [12] K. Yosida and K. Yamada, Prog. Theor. Phys. 53, 1286 (1975). [13] T. Takimoto and T. Moriya, Solid State Commun. 99, 457 (1996). [14] G. Kresse and J. Furthmüller, Comput. Mater. Sci. 6, 15 (1996). [15] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169 (1996). [16] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999). [17] J. P. Perdew, J. A. Chevary, S. H. Vosko, K. A. Jackson, M. R. Pederson, D. J. Singh, and C. Fiolhais, Phys. Rev. B 46, 6671 (1992). ABSTRACT On the basis of the Fermi liquid theory, the Kadowaki-Woods ratio $A/\gamma^2$ is evaluated by using a first principle band calculation for typical itinerant $d$ and $f$ electron systems. It is found as observed that the ratio for the $d$ electron systems is significantly smaller than the normal $f$ systems, even without considering their relatively weak correlation. The difference in the ratio value comes from different characters of the Fermi surfaces. By comparing Pd and USn$_3$ as typical cases, we discuss the importance of the Fermi surface dependence of the quasiparticle transport relaxation. 
<|endoftext|><|startoftext|> Structure of Strange Dwarfs with Color Superconducting Core Masayuki Matsuzaki∗ and Etsuchika Kobayashi Department of Physics, Fukuoka University of Education, Munakata, Fukuoka 811-4192, Japan Abstract We study effects of two-flavor color superconductivity on the structure of strange dwarfs, which are stellar objects with masses and radii similar to those of ordinary white dwarfs but stabilized by the strange quark matter core. We find that unpaired quark matter is a good approximation to the core of strange dwarfs. PACS numbers: 95.30.-k ∗matsuza@fukuoka-edu.ac.jp http://arxiv.org/abs/0704.0844v1
Witten made a conjecture that the absolute ground state of quantum chromodynamics (QCD) is not 56Fe but strange quark matter, which is a plasma composed of almost equal numbers of deconfined u, d, and s quarks [1]. Although this conjecture has been neither confirmed nor rejected, if it is true, then since deconfinement is expected in the high density cores of compact stars, there could exist stars that contain strange quark matter converted from two-flavor quark matter via the weak interaction. Strange quark stars whose radii are about 10 km, with or without a thin nuclear crust, have long been investigated. Glendenning et al. proposed a new class of compact stars containing strange quark matter and thick nuclear crusts ranging from a few hundred to ten thousand km [2, 3, 4]. They named them strange dwarfs because their radii correspond to those of white dwarfs. Alcock et al. discussed the mechanism by which the strange quark core supports the crust [5]. Since the mass of the s quark is larger than those of u and d, strange quark matter is positively charged. In order to electrically neutralize the core, electrons are bound to the surface of the core. They estimated that the thickness of this electric dipole layer is a few hundred fm. This layer can then support a nuclear crust. Although Alcock et al.
considered only thin crusts, Glendenning et al. considered thick crusts up to about ten thousand km. Very recently, Mathews et al. identified eight candidates for strange dwarfs from observed data [6].
A theoretical facet whose importance in nuclear physics was recognized later is color superconductivity in quark matter. At asymptotically high density, the color-flavor locking (CFL) is believed to be the ground state [7]. At realistic densities, however, the two-flavor color superconductivity (2SC) is thought to be realized even when electric neutrality is imposed if the coupling constant is strong [8]. Thus, in the present paper, we discuss effects of the 2SC phase in the strange quark matter core on the structure of strange dwarfs.
In order to determine the structure of compact stars, we solve the general relativistic Tolman-Oppenheimer-Volkoff (TOV) equation,
$$\frac{dp(r)}{dr} = -\frac{G\epsilon(r)M(r)}{c^2 r^2}\left[1+\frac{p(r)}{\epsilon(r)}\right]\left[1+\frac{4\pi r^3 p(r)}{M(r)c^2}\right]\left[1-\frac{2GM(r)}{c^2 r}\right]^{-1}, \quad (1)$$
$$M(r) = \frac{4\pi}{c^2}\int_0^r \epsilon(r')\,r'^2\,dr', \quad (2)$$
for the pressure p(r), the energy density ǫ(r), and the mass enclosed within the radius r, M(r). Here G is the gravitational constant and c is the speed of light. The equation is closed when an equation of state (EOS), a relation between p and ǫ, is specified. In the present case, strange dwarfs are composed of the strange quark matter core and the nuclear crust. Accordingly two parameters, the pressure at the center and that at the core-crust boundary, must be specified to integrate the TOV equation. The latter must be equal to or less than that corresponding to the nucleon drip density ǫdrip. Otherwise neutrons drip and gravitate to the core. In the present calculation we take a p_crust calculated from ǫcrust = ǫdrip. We assume zero temperature throughout this paper.
As for the EOS of the quark core, we adopt the MIT bag model without any QCD corrections (see Ref. [4], for example).
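The TOV integration described above can be sketched numerically for a bare quark core. The following is a minimal illustration, not the calculation of this paper: it assumes the massless limit of the bag model, ǫ = 3p + 4B, omits the nuclear crust entirely (the integration stops where p reaches zero), works in units G = c = 1 with lengths in km, and uses a fixed-step fourth-order Runge-Kutta scheme. The function names and the step size are illustrative choices.

```python
import math

HBARC = 197.327             # MeV fm
MEVFM3_TO_KM2 = 1.3234e-6   # (G/c^4) * 1 MeV/fm^3, expressed in km^-2
MSUN_KM = 1.4766            # G * Msun / c^2 in km


def bag_constant(b_quarter_mev):
    """Bag constant B in MeV/fm^3 from B^(1/4) given in MeV."""
    return b_quarter_mev ** 4 / HBARC ** 3


def tov_rhs(r, p, m, bag):
    """Right-hand sides dp/dr and dm/dr of the TOV equations (G = c = 1)."""
    eps = 3.0 * p + 4.0 * bag  # massless bag-model EOS (simplifying assumption)
    dpdr = -(eps + p) * (m + 4.0 * math.pi * r ** 3 * p) / (r * (r - 2.0 * m))
    dmdr = 4.0 * math.pi * r ** 2 * eps
    return dpdr, dmdr


def integrate_core(eps_central_mevfm3, b_quarter_mev=160.0, dr=1e-3):
    """Integrate outward until p <= 0; return (radius in km, mass in Msun)."""
    bag = bag_constant(b_quarter_mev) * MEVFM3_TO_KM2
    eps_c = eps_central_mevfm3 * MEVFM3_TO_KM2
    p = (eps_c - 4.0 * bag) / 3.0
    r = dr                                    # start slightly off the center
    m = 4.0 / 3.0 * math.pi * r ** 3 * eps_c  # mass of the innermost sphere
    while p > 0.0:
        k1p, k1m = tov_rhs(r, p, m, bag)
        k2p, k2m = tov_rhs(r + dr / 2, p + dr / 2 * k1p, m + dr / 2 * k1m, bag)
        k3p, k3m = tov_rhs(r + dr / 2, p + dr / 2 * k2p, m + dr / 2 * k2m, bag)
        k4p, k4m = tov_rhs(r + dr, p + dr * k3p, m + dr * k3m, bag)
        p += dr * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        m += dr * (k1m + 2 * k2m + 2 * k3m + k4m) / 6.0
        r += dr
    return r, m / MSUN_KM


if __name__ == "__main__":
    radius_km, mass_msun = integrate_core(1000.0)  # central energy density in MeV/fm^3
    print(f"R = {radius_km:.2f} km, M = {mass_msun:.2f} Msun")
```

Adding a crust would mean continuing the integration from the core surface pressure p_crust with a second, tabulated EOS, which this sketch deliberately leaves out.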
For unpaired free quark matter,
$$p = -B + \sum_f \frac{1}{4\pi^2}\left[\mu_f k_{Ff}\left(\mu_f^2 - \frac{5}{2}m_f^2\right) + \frac{3}{2}m_f^4 \ln\frac{\mu_f + k_{Ff}}{m_f}\right], \quad (3)$$
$$\epsilon = B + \sum_f \frac{3}{4\pi^2}\left[\mu_f k_{Ff}\left(\mu_f^2 - \frac{1}{2}m_f^2\right) - \frac{1}{2}m_f^4 \ln\frac{\mu_f + k_{Ff}}{m_f}\right], \quad (4)$$
where $m_f$, $k_{Ff}$, and $\mu_f = \sqrt{m_f^2 + k_{Ff}^2}$ are the mass, the Fermi momentum, and the chemical potential of quarks of each flavor, respectively, and f runs over u, d, and s. Hereafter we put c = h̄ = 1. The quantity B is the bag constant.
The effect of color superconductivity is incorporated as a chemical potential dependent effective bag constant. In the 2SC case [9],
$$B_{\rm eff} = B - \frac{1}{\pi^2}\Delta^2(\mu)\mu^2, \quad (5)$$
where ∆(µ) is the quark pairing gap as a function of a chemical potential µ, whose relation to µf is specified later. The pairing gap is obtained as a function of the Fermi momentum by solving the gap equation [10]
$$\Delta(k_F) = -\frac{1}{8\pi^2}\int_0^\infty \bar{v}(k_F, k)\,\frac{\Delta(k)}{E'(k)}\,k^2\,dk, \quad (6)$$
$$E'(k) = \sqrt{(E_k - E_{k_F})^2 + 3\Delta^2(k)}, \quad (7)$$
with $k_F = k_{Fu} = k_{Fd}$, $E_k = \sqrt{k^2 + m_q^2}$, and $m_q = m_u = m_d$. The one gluon exchange pairing interaction is given by v̄(p, k) = − pkEpEk 2EpEk + 2m q + p 2 + k2 +m2E (p+ k)2 +m2E (p− k)2 +m2E 6EpEk − 6m q − p 2 − k2 m2E = 2, (8) where p and k are the magnitudes of 3-momenta. The running coupling constant is given by [11] q2max+q q = p− k, qmax = max{p, k}. (9)
As for the EOS of the crust, we adopt the tabulated one for β-equilibrium nuclear matter of Baym, Pethick, and Sutherland [12] (BPS), conforming to Refs. [2, 3, 4].
The positively charged strange quark matter in the core is simply approximated by µ = µu = µd = µs. Quark masses are given by mu = md = 10 MeV, ms = 150 MeV. The bag constant is chosen to be B^{1/4} = 160 MeV. Parameters entering into the pairing interaction are q_c² = 1.5Λ²_QCD and ΛQCD = 400 MeV. The nucleon drip density is ǫdrip = 4.3 × 10¹¹ g/cm³.
FIG. 1: Equations of state of free and 2SC quark matter and β-equilibrium nuclear matter. The latter is tabulated in Refs. [12] and [4]. (Axis: log ǫ in J/m³; curves: free quark, 2SC quark.)
The adopted EOS is displayed in Fig. 1. The logarithm is to base 10 throughout this paper.
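To make the unpaired bag-model EOS concrete, the sketch below evaluates the pressure and energy density for a common chemical potential µ = µu = µd = µs with the quoted masses and bag constant. It assumes the standard zero-temperature free Fermi gas expressions with degeneracy 6 (spin times color) per flavor, in units c = ħ = 1; the function names are illustrative and the pairing correction is not included. The thermodynamic identity ǫ + p = Σ_f µ_f n_f, with n_f = k_Ff³/π², serves as a self-check.

```python
import math

HBARC = 197.327                                  # MeV fm, converts MeV^4 to MeV/fm^3
MASSES = {"u": 10.0, "d": 10.0, "s": 150.0}      # MeV, as quoted in the text
B_QUARTER = 160.0                                # B^(1/4) in MeV, as quoted in the text


def flavor_terms(mu, m):
    """Pressure, energy density and number density (MeV^4, MeV^4, MeV^3)
    of one free quark flavor with degeneracy 6 at chemical potential mu."""
    kf = math.sqrt(max(mu * mu - m * m, 0.0))
    if kf == 0.0:
        return 0.0, 0.0, 0.0
    log_term = math.log((mu + kf) / m)
    p = (mu * kf * (mu ** 2 - 2.5 * m ** 2) + 1.5 * m ** 4 * log_term) / (4 * math.pi ** 2)
    eps = 3 * (mu * kf * (mu ** 2 - 0.5 * m ** 2) - 0.5 * m ** 4 * log_term) / (4 * math.pi ** 2)
    n = kf ** 3 / math.pi ** 2
    return p, eps, n


def bag_eos(mu, masses=MASSES, b_quarter=B_QUARTER):
    """Return (p, eps, sum_f mu*n_f) in MeV^4 for a common chemical potential mu."""
    bag = b_quarter ** 4
    p, eps, mu_n = -bag, bag, 0.0
    for m in masses.values():
        pf, ef, nf = flavor_terms(mu, m)
        p += pf
        eps += ef
        mu_n += mu * nf
    return p, eps, mu_n


def to_mev_fm3(x_mev4):
    """Convert a quantity in MeV^4 to MeV/fm^3."""
    return x_mev4 / HBARC ** 3


if __name__ == "__main__":
    p, eps, _ = bag_eos(400.0)
    print(f"p = {to_mev_fm3(p):.1f} MeV/fm^3, eps = {to_mev_fm3(eps):.1f} MeV/fm^3")
```

Because the bag constant cancels in ǫ + p, the identity check is insensitive to B and tests only the kinetic terms; an effective bag constant could be substituted for `bag` once a gap function is available.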
The quark matter EOS describes the core and the BPS EOS describes the crust. At the boundary, the pressure is common whereas the energy density jumps discontinuously. In order to obtain the EOS for 2SC matter, the pairing gap must be calculated at each kF beforehand. This is shown in Fig. 2 left. The effective bag constant determined by the pairing gap is shown in Fig. 2 right. The resulting 2SC EOS is included in Fig. 1.
FIG. 2: Left: color superconducting pairing gap and right: effective bag constant, as functions of the quark chemical potential µ (MeV).
FIG. 3: Mass-radius relation of strange dwarfs and white dwarfs (horizontal axis: log R in km).
Figure 3 presents the mass-radius relation obtained by integrating the TOV equation with a fixed p_crust, determined from ǫcrust = ǫdrip, and various central pressures. This result can be classified into three regions. The first region (larger central pressures), an almost vertical curve at around R ∼ 10 km, describes strange stars with thin crusts. In this region, color superconductivity makes the maximum mass and radius larger because the pairing gap reduces the bag constant and consequently the energy density decreases and the pressure increases. This is consistent with another calculation with the CFL phase [13]. The second region, horizontal at around M/Msun ∼ 10⁻², and the third region, vertical at around R ∼ 10⁴ km up to the maximum mass, correspond to strange dwarfs. In the second region, color superconducting quark cores support slightly larger masses than unpaired free quark cores. In the third region, the effect of color superconductivity is negligible. In Fig. 3, the mass-radius relation of ordinary white dwarfs without quark matter cores calculated by adopting the BPS EOS is also shown, although it is known that the BPS EOS is not very suitable for white dwarfs. As the central pressure decreases, the quark matter core shrinks (Fig.
4 left) and eventually strange dwarfs reduce to ordinary white dwarfs. When their masses are the same, the former is more compact than the latter (see also Fig. 5 right) because of the gravity of the core. Mathews et al. paid attention to this difference in the mass-radius relation and classified the observed data of dwarfs [6]. According to their work, eight of them are classified as strange dwarfs.
FIG. 4: Left: core radius and right: mass of strange dwarfs, as functions of the central pressure (horizontal axes: log p0 in J/m³).
FIG. 5: Left: mass of strange dwarfs as a function of the central energy density (horizontal axis: log ǫ0 in g/cm³). Right: energy profile of a strange dwarf with M/Msun = 0.465 and that of a white dwarf with M/Msun = 0.466 (horizontal axis: log r in km).
Figure 4 right indicates that strange dwarfs, in particular those of 10³ km < R < 10⁴ km, are realized in a very narrow range of the central pressure. This is reflected in the density of calculated points. During this rapid structure change from the second to the third region, the core radius hardly changes; see Fig. 4 left. Figure 5 left also graphs M/Msun, as Fig. 4 right does, but as a function of the central energy density. The difference between these two figures at the low pressure/energy density side can be understood from the quark matter EOS in Fig. 1, in which the pressure decreases steeply at the lowest energy density. Figure 5 left indicates that strange dwarfs have central energy densities just below those of the lowest stable compact strange stars and several orders of magnitude larger than those of ordinary white dwarfs. This is clearly demonstrated in Fig. 5 right.
To summarize, we have solved the Tolman-Oppenheimer-Volkoff equation for strange dwarfs with ǫcrust = ǫdrip and a wide range of the central pressure. We have examined effects of the two-flavor color superconductivity in the strange quark matter core in a simplified manner.
The obtained results indicate that, aside from a slight increase of the minimum mass, the effect of color superconductivity is negligible in the mass-radius relation. This is consistent with the conjecture given in Ref. [6]. As a function of the central energy density, however, strange dwarfs are realized at slightly lower energy densities than in the unpaired free quark case, reflecting the effect on the equation of state. Recently Usov discussed that electric fields are also generated on the surface of color-flavor locked matter [14]. This suggests that strange dwarfs with color-flavor locked cores might also be possible, although this is expected only at relatively high densities. Since the pairing gap enters into the calculation only through the effective bag constant, aside from a possible slight change in chemical potentials, it can surely be expected that the effect of color-flavor locking does not differ much from that of the two-flavor color superconductivity. In conclusion, unpaired quark matter is a good approximation to the core of strange dwarfs.
Another aspect that might be affected by color superconductivity is the cooling [15]. This is beyond the scope of the present study.
[1] E. Witten, Phys. Rev. D 30 (1984), 272.
[2] N. K. Glendenning, Ch. Ketter and F. Weber, Phys. Rev. Lett. 74 (1995), 3519.
[3] N. K. Glendenning, Ch. Ketter and F. Weber, Astrophys. J. 450 (1995), 253.
[4] N. K. Glendenning, Compact Stars (Springer, New York, 1996).
[5] C. Alcock, E. Farhi and A. Olinto, Astrophys. J. 310 (1986), 261.
[6] G. J. Mathews, I. -S. Suh, B. O’Gorman, N. Q. Lan, W. Zech, K. Otsuki and F. Weber, J. Phys. G 32 (2006), 747.
[7] K. Rajagopal and F. Wilczek, Phys. Rev. Lett. 86 (2001), 3492.
[8] H. Abuki and T. Kunihiro, Nucl. Phys. A 768 (2006), 118.
[9] M. Alford and K. Rajagopal, J. High Energy Phys. 06 (2002), 031.
[10] M. Matsuzaki, Phys. Rev. D 62 (2000), 017501.
[11] K. Higashijima, Prog. Theor. Phys. Suppl. 104 (1991), 1.
[12] G. Baym, C.
Pethick and P. Sutherland, Astrophys. J. 170 (1971), 299. [13] G. Lugones and J. E. Horvath, Astron. and Astrophys. 403 (2003), 173. [14] V. V. Usov, Phys. Rev. D 70 (2004), 067301. [15] O. G. Benvenuto and L. G. Althaus, Astrophys. J. 462 (1996), 364. References ABSTRACT We study effects of two-flavor color superconductivity on the structure of strange dwarfs, which are stellar objects with similar masses and radii with ordinary white dwarfs but stabilized by the strange quark matter core. We find that unpaired quark matter is a good approximation to the core of strange dwarfs. <|endoftext|><|startoftext|> Information entropic superconducting microcooler A. O. Niskanen,1, 2 Y. Nakamura,1, 3, 4 and J. P. Pekola5 1CREST-JST, Kawaguchi, Saitama 332-0012,Japan 2VTT Technical Research Centre of Finland, Sensors, PO BOX 1000, 02044 VTT, Finland 3NEC Fundamental Research Laboratories, Tsukuba, Ibaraki 305-8501, Japan 4The Institute of Physical and Chemical Research (RIKEN), Wako, Saitama 351-0198, Japan 5Low Temperature Laboratory, Helsinki University of Technology, PO BOX 3500, 02015 TKK, Finland (Dated: October 25, 2018) We consider a design for a cyclic microrefrigerator using a superconducting flux qubit. Adiabatic modulation of the flux combined with thermalization can be used to transfer energy from a lower temperature normal metal thin film resistor to another one at higher temperature. The frequency selectivity of photonic heat conduction is achieved by including the hot resistor as part of a high frequency LC resonator and the cold one as part of a low-frequency oscillator while keeping both circuits in the underdamped regime. We discuss the performance of the device in an experimentally realistic setting. This device illustrates the complementarity of information and thermodynamic entropy as the erasure of the quantum bit directly relates to the cooling of the resistor. 
PACS numbers: 74.50.+r, 85.80.Fi, 03.67.-a

For the purpose of quantum computing, the coherence properties of superconducting quantum bits (qubits) should be optimized by decoupling them from all noise sources as well as possible. However, many interesting experiments can be envisioned also when the decoupling is far from perfect. One such experiment closely related to coherence optimization is using a qubit as a spectrometer [1, 2, 3] for the environmental noise by monitoring the effect of the environment on the quantum two-level system. Here we focus on the opposite phenomenon, i.e. the effect of a qubit on the environment. Recently a superconducting flux qubit [4, 5] with a quite small tunneling energy from the point of view of quantum computing was cooled using sideband cooling and a third level [6] from about 400 mK down to 3 mK. Motivated by this experiment we consider the possibility of using a single quantum bit as a cyclic refrigerator for environmental degrees of freedom. The utilized heat conduction mechanism is photonic, which was recently studied also in experiment [7]. Besides the possible practical uses, the device is interesting physically as it directly illustrates the connection between information entropy and thermodynamical entropy. For related superconducting high-frequency cooler concepts see e.g. Refs. [8, 9].

Here we study a flux qubit coupled inductively to two different loops shown in Fig. 1a. In loop j (j = 1, 2) we have a resistor $R_j$ in series with an inductance $L_j$ and a capacitance $C_j$. These form two damped harmonic oscillators. The resistors are in general at different temperatures $T_1$ and $T_2$. The coupling of the qubit to these two admittances $Y_1$ and $Y_2$ is assumed to be sufficiently large to dominate the relaxation of the qubit. This assumption can be easily validated by e.g. increasing the mutual inductance.
The flux qubit is an otherwise superconducting loop except for three or four Josephson junctions with suitably picked parameters. In particular one of the junctions is made smaller than others to form a two-level system. When biased close to half of the flux quantum $\Phi_0 = h/2e$, the qubit can be described (in persistent current basis) by the Hamiltonian

$$H/\hbar = -\frac{1}{2}\left(\Delta\sigma_x + \varepsilon\sigma_z\right) \qquad (1)$$

where $\sigma_x$ and $\sigma_z$ are Pauli matrices, $\hbar\varepsilon = 2I_p(\Phi-\Phi_0/2)$ is the flux-tunable energy bias and $\Phi$ is the controllable flux threading the qubit loop. Away from $\Phi = \Phi_0/2$ the eigenstates have the persistent currents $\pm I_p$ circulating in the loop. The tunneling energy $\hbar\Delta$ results in an anticrossing at $\Phi = \Phi_0/2$ and there the energy eigenstates do not carry average current. The resonant angular frequency of the qubit is $\omega = \sqrt{\varepsilon^2+\Delta^2}$.

FIG. 1: (color online) Principle of the flux-qubit cooler. (a) Layout of the circuit. (b) Energy band diagram. (c) Schematic of the cooling cycle in the qubit temperature-entropy plane.

http://arxiv.org/abs/0704.0845v1

Consider the ideal cycle shown in Fig. 1b-c where the bias of the flux qubit is swept slowly (slower than $\Delta/2\pi$) between two extreme values $\varepsilon_1$ and $\varepsilon_2$ corresponding to two different energy level separations $\hbar\omega_1$ and $\hbar\omega_2$. Let us further assume that $\omega_j \approx \omega_{LC_j}$ and $Q_j \gg 1$, where $\omega_{LC_j} = 1/\sqrt{L_jC_j}$ and $Q_j = \sqrt{L_j/C_j}/R_j$. This choice guarantees that the qubit mainly couples to resistor $R_1$ ($R_2$) at bias point 1 (2). The cooling cycle consists of steps O, P, Q and R. First in step O the qubit has the angular frequency $\omega_2$ and is allowed to thermalize. Because of the bandwidth limitations imposed by the reactive elements, the qubit tends to thermalize with resistor $R_2$ to temperature $T_2$. In the next step P the flux bias is adiabatically changed to point 1 such that the level populations do not change but the energy eigenstates do. The sweep is assumed to be however faster than relaxation. In point 1 the angular frequency is reduced to $\omega_1$.
Because the level populations and therefore the Boltzmann factors do not change the qubit must now be at lower temperature $\tilde{T}_2$ given by $\tilde{T}_2 = T_2\omega_1/\omega_2$ in order to compensate for the change of the qubit splitting. Note that the quantum mechanical adiabaticity implies also thermodynamical adiabaticity: while the energy eigenbasis changes the level populations and thus also entropy do not change. In step Q the qubit is allowed to thermalize to temperature $T_1$ which results in heating of the qubit and in cooling of resistor 1 if $\tilde{T}_2 < T_1$. At this point the ideally pure quantum state of the qubit gets erased and information stored is lost. The entropy of the qubit increases, but locally the entropy of resistor 1 decreases such that one can say that some information is "stored" in the resistor as it cools but naturally with some loss. Finally in step R the qubit is adiabatically shifted back to frequency $\omega_2$ which results in heating of the qubit to the effective temperature $\tilde{T}_1 = T_1\omega_2/\omega_1$ which is assumed to be higher than $T_2$. The excess energy is dumped to admittance 2 when the cycle starts again from the beginning. Note that due to the condition $\tilde{T}_2 < T_1$ resistor 1 can never be cooled below $T_2\omega_1/\omega_2$. Since there is no isothermal stage in the above cycle the present device is not even in principle a Carnot cooler but rather an Otto-type device [10].

The density matrix of the qubit with the resonant angular frequency $\omega$ at temperature $T$ ($\beta = (k_BT)^{-1}$) is given by

$$\rho_{eq}(\beta,\varepsilon) = \frac{1}{2}\left[1 + \tanh\!\left(\frac{\beta\hbar\omega}{2}\right)\frac{\Delta\sigma_x+\varepsilon\sigma_z}{\omega}\right]. \qquad (2)$$

Using this the cooling power and the efficiency of the ideal cycle in Fig. 1c can be easily calculated. It is given by the area of the shaded region in the entropy-temperature plane below points P and Q. In principle one could solve for the effective temperature of the qubit along the line between points P and Q as a function of entropy given by $S = -k_B\mathrm{Tr}(\rho\ln\rho)$.
Alternatively, we can simply note that the expectation value of the energy stored in the qubit in point P is $E_P = \mathrm{Tr}(\rho_{eq}(\beta_2\omega_2/\omega_1,\varepsilon_1)H_1)$ while after relaxation we have $E_Q = \mathrm{Tr}(\rho_{eq}(\beta_1,\varepsilon_1)H_1)$, where $H_1 = H(\varepsilon_1)$ is the Hamiltonian at point 1. We thus get for the ideal cooling power

$$P/f = E_Q - E_P = \frac{\hbar\omega_1 e^{-\beta_1\hbar\omega_1}}{e^{-\beta_1\hbar\omega_1}+1} - \frac{\hbar\omega_1 e^{-\beta_2\hbar\omega_2}}{e^{-\beta_2\hbar\omega_2}+1} \le \frac{\hbar\omega_1}{2} \qquad (3)$$

where $f$ is the pump frequency. The cooling power achieves the maximum value of $\hbar\omega_1 f/2$ when the thermal population in step O (and P) is small and when the population in step Q is large, i.e. when $\beta_2\hbar\omega_2 \gg 1$ and $\beta_1\hbar\omega_1 \ll 1$. Naturally a practical device has to be designed to fulfill the first condition always, in which case the smallest achievable temperature is on the order of $\hbar\omega_1/k_B$ below which the cooling power decreases rapidly. The dynamic range could be made wider by a tunable $\Delta$ which can be achieved by splitting the smallest junction into a dc SQUID geometry. Another figure of merit is the ratio $\eta$ of the heat removed from resistor 1 divided by the heat added to resistor 2. It can be obtained as the ratio of the shaded area divided by the sum of the hatched area and the shaded area, i.e., $\eta = (E_Q-E_P)/(E_R-E_O)$ where $E_O = \mathrm{Tr}(\rho_{eq}(\beta_2,\varepsilon_2)H_2)$ and $E_R = \mathrm{Tr}(\rho_{eq}(\beta_1\omega_1/\omega_2,\varepsilon_2)H_2)$. This simplifies neatly to $\eta = \omega_1/\omega_2 < 1$ which is in harmony with the second law of thermodynamics.

For more quantitative analysis we have to consider the details of the relaxation rates due to the baths. The Golden Rule transition rates due to resistor j are given by

$$\Gamma^j_{\downarrow,\uparrow} = \frac{|\langle 0|dH/d\Phi|1\rangle|^2 M_j^2 S^j_I(\pm\omega)}{\hbar^2} = \frac{I_p^2\Delta^2}{\hbar^2\omega^2}M_j^2 S^j_I(\pm\omega) \qquad (4)$$

where the positive sign corresponds to relaxation. The total thermalization rate is $\Gamma^j_{th} = \Gamma^j_\uparrow + \Gamma^j_\downarrow$. Here the unsymmetrized noise spectrum is given by

$$S^j_I(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t}\langle\delta I_j(0)\delta I_j(t)\rangle\,dt = \frac{2\hbar\omega\,\mathrm{Re}Y_j(\omega)}{1-\exp(-\beta_j\hbar\omega)}, \qquad (5)$$

where $\mathrm{Re}Y_j(\omega) = R_j^{-1}/[1+Q_j^2(\omega/\omega_{LC_j}-\omega_{LC_j}/\omega)^2]$ is the real part of admittance of circuit j. The total relaxation rate is thus

$$\Gamma^j_{th} = \frac{2(I_p\Delta M_j)^2\coth\!\left(\beta_j\hbar\omega/2\right)}{\hbar\omega R_j\left[1+Q_j^2\left(\omega/\omega_{LC_j}-\omega_{LC_j}/\omega\right)^2\right]}. \qquad (6)$$

To model the behavior of the device we utilize the Bloch master equation (see e.g. Ref. [11]) given in our case by

$$\dot{\vec{M}} = -\vec{B}\times\vec{M} - \Gamma^1_{th}(\vec{M}_\parallel-\vec{M}_{T_1}) - \Gamma^2_{th}(\vec{M}_\parallel-\vec{M}_{T_2}) - \Gamma_2\vec{M}_\perp, \qquad (7)$$

where $\vec{M} = \mathrm{Tr}(\vec{\sigma}\rho)$ is the "magnetization" of the qubit, and $\vec{B} = \Delta\hat{x}+\varepsilon\hat{z}$ is the fictitious magnetic field. Note however that the z-component of $\vec{B}$ and $\vec{M}$ do correspond to real magnetic field and magnetization, respectively. In Eq. (7) $\vec{M}_\parallel$ and $\vec{M}_\perp$ are the components of the magnetization parallel and perpendicular to $\vec{B}$, respectively. These are explicitly

$$\vec{M}_\parallel = \frac{(\Delta M_x+\varepsilon M_z)(\Delta\hat{x}+\varepsilon\hat{z})}{\omega^2} \qquad (8)$$

$$\vec{M}_\perp = \frac{\varepsilon^2M_x-\Delta\varepsilon M_z}{\omega^2}\hat{x} + M_y\hat{y} + \frac{\Delta^2M_z-\Delta\varepsilon M_x}{\omega^2}\hat{z}. \qquad (9)$$

Here $\vec{M}_T$ stands for the $\varepsilon$-dependent equilibrium magnetization of a qubit at temperature $T$ given explicitly by

$$\vec{M}_T = \tanh\!\left(\frac{\beta\hbar\omega}{2}\right)\frac{\Delta\hat{x}+\varepsilon\hat{z}}{\omega} \qquad (10)$$

and $\Gamma_2 = (\Gamma^1_{th}+\Gamma^2_{th})/2+\Gamma_\varphi$ is the dephasing rate. The possibility of pure dephasing at the rate $\Gamma_\varphi$ has been included. In the simulation we neglect pure dephasing due to the intentionally large dominating thermalization rate. Equation (7) describes relaxation towards instantaneous equilibrium with two competing rates due to two different thermal baths. Equations of this type are usually used in the stationary case, but for driving frequencies slower than $\Delta/\hbar$ it should be also valid. As is obvious from Eq. (7), the qubit actually tends to relax towards an effective $\varepsilon$-dependent equilibrium magnetization $(\Gamma^1_{th}\vec{M}_{T_1}+\Gamma^2_{th}\vec{M}_{T_2})/(\Gamma^1_{th}+\Gamma^2_{th})$ at the rate $\Gamma^1_{th}+\Gamma^2_{th}$.

To illustrate the practical potential of the device we show in Fig. 2 the simulated cooling power with sinusoidal driving of $\varepsilon(t)$ compared to the ideal case along with the actual loop in the entropy-temperature plane. The heat flow $P_j$ from resistor j to the qubit is simply obtained by integrating the product of the thermalization rate and the energy deficit, i.e., $P_j = f\int_0^{1/f}\Gamma^j_{th}\left[\mathrm{Tr}(\rho_{eq}(\beta_j,\varepsilon(t))H)-\mathrm{Tr}(\rho(t)H)\right]dt$. The density matrix $\rho(t) = \frac{1}{2}\left(1+\vec{M}(t)\cdot\vec{\sigma}\right)$ is solved numerically using the Bloch equation (system is followed over a few periods until it has converged to the limit cycle).
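The ideal-cycle expressions above lend themselves to a quick numerical check. The following is a minimal sketch, not the authors' simulation: it evaluates the per-cycle heat of Eq. (3) and the efficiency $\eta = \omega_1/\omega_2$ from thermal two-level populations, using the frequencies and temperatures quoted in the Fig. 2 caption. The function names and the 1 GHz pump frequency in the demo are illustrative assumptions.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K


def excited_population(omega, temperature):
    """Thermal excited-state population 1/(exp(hbar*omega/(kB*T)) + 1)."""
    return 1.0 / (math.exp(HBAR * omega / (KB * temperature)) + 1.0)


def ideal_heat_per_cycle(omega1, omega2, t1, t2):
    """E_Q - E_P of Eq. (3): heat taken from resistor 1 per ideal cycle, in joules."""
    p_q = excited_population(omega1, t1)  # population after thermalizing with resistor 1
    p_p = excited_population(omega2, t2)  # population frozen during the adiabatic sweep
    return HBAR * omega1 * (p_q - p_p)


def ideal_efficiency(omega1, omega2):
    """Heat removed from resistor 1 divided by heat added to resistor 2."""
    return omega1 / omega2


# Values quoted in the Fig. 2 caption: omega_1/2pi = 5 GHz, omega_2/2pi = 20.62 GHz,
# T1 = T2 = 0.3 * hbar*omega_2/kB.
OMEGA1 = 2 * math.pi * 5e9
OMEGA2 = 2 * math.pi * 20.62e9
T2 = 0.3 * HBAR * OMEGA2 / KB
T1 = T2

if __name__ == "__main__":
    pump_frequency = 1e9  # Hz, illustrative choice
    heat = ideal_heat_per_cycle(OMEGA1, OMEGA2, T1, T2)
    print("ideal cooling power (W):", heat * pump_frequency)
    print("ideal efficiency:", ideal_efficiency(OMEGA1, OMEGA2))
```

The sketch makes the two limits of Eq. (3) easy to probe: the heat per cycle vanishes when $T_1 = T_2\omega_1/\omega_2$ and never exceeds $\hbar\omega_1/2$.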
We see that the actual simulated behavior does not significantly deviate at low f from the ideal behavior and that cooling powers on the order of fW can be achieved with reasonable sample parameters. The oscillatory behavior at high f is interpreted as Landau-Zener interference [12, 13].

FIG. 2: (color online) Example of the simulated cooling power with $\omega_1/2\pi = \Delta/2\pi = 5$ GHz ($\varepsilon = 0$ GHz), $\omega_2/2\pi = 20.62$ GHz ($\varepsilon = 20$ GHz), $Q_1 = Q_2 = 10$, $\omega_j = \omega_{LC_j}$ and $2(I_p\Delta M_j)^2/(R_j\hbar\omega_j) = 20\times10^9\,\mathrm{s}^{-1}$. This can be achieved e.g. with $I_p = 200$ nA, $M_1 = 29$ pH, $M_2 = 59$ pH and $R_1 = R_2 = 1$ Ω. The driving is sinusoidal. (a) The solid line illustrates the path in the T-S plane (entropy axis from 0 to ln 2) for the ideal cycle described in the text while the dashed (dotted) line is a result of simulation for f = 0.05 GHz (f = 1 GHz) with $T_1 = T_2 = 0.3\times\hbar\omega_2/k_B \approx 300$ mK. (b) Simulated cooling power vs. f (0 to 1.5 GHz) for the same temperatures as in (a) is shown with the dashed line while the solid line is the ideal result of Eq. (3). (c-d) Same as (a-b) but with $T_1 = 0.5\times T_2 \approx 150$ mK. The cooling threshold at 0.14 GHz in (d) is caused by finite Q-factor.

However, the cooling power has to be compared with realistic heat loads to evaluate the utility of the flux qubit cooler. On one hand, resistor 1 is subject to heat load from the phonons of the substrate on which the device rests. On the other hand, resistor 2 should be coupled well enough to phonon bath such that the unavoidable work done on it does not raise $T_2$ excessively. The heat flow between the electron system of resistor j and the phonon system is given by $P^j_{el-ph} = \Sigma V_j(T_j^5-T_{ph}^5)$ where $V_j$ is the volume of resistor j and $\Sigma$ is typically on the order of $10^9\,\mathrm{W\,m^{-3}K^{-5}}$. Thus resistor 1 needs to have a sufficiently small volume while resistor 2 should be large enough physically in order to serve as a heat sink.
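The balance between cooling power and phonon heat load can be sketched numerically. The example below is a simplified illustration, not the paper's calculation: it uses the ideal cooling power of Eq. (3) in place of the simulated one, holds $T_2 = T_{ph}$ fixed, and finds by bisection the $T_1$ at which the cooling power equals the electron-phonon heat load $\Sigma V_1(T_{ph}^5 - T_1^5)$. The resistor volume is the one quoted in the Fig. 3 caption; the function names and the 1 GHz pump frequency are assumptions made for illustration.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K
SIGMA = 1e9             # electron-phonon coupling, W m^-3 K^-5 (order of magnitude from the text)


def excited_population(omega, temperature):
    """Thermal excited-state population of a two-level system."""
    return 1.0 / (math.exp(HBAR * omega / (KB * temperature)) + 1.0)


def ideal_cooling_power(t1, t2, omega1, omega2, pump_frequency):
    """Pump frequency times the per-cycle heat of Eq. (3), in watts."""
    heat = HBAR * omega1 * (excited_population(omega1, t1)
                            - excited_population(omega2, t2))
    return pump_frequency * heat


def phonon_heat_load(t1, t_phonon, volume, sigma=SIGMA):
    """Heat flowing from the phonon bath into the electrons of resistor 1, in watts."""
    return sigma * volume * (t_phonon**5 - t1**5)


def equilibrium_temperature(t_phonon, omega1, omega2, pump_frequency, volume,
                            sigma=SIGMA, iterations=100):
    """Bisect for the T1 at which ideal cooling power balances the phonon load."""
    low = t_phonon * omega1 / omega2  # cooling power vanishes here
    high = t_phonon                   # heat load vanishes here
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        net = (ideal_cooling_power(mid, t_phonon, omega1, omega2, pump_frequency)
               - phonon_heat_load(mid, t_phonon, volume, sigma))
        if net > 0:
            high = mid  # cooling still wins, so the balance point is colder
        else:
            low = mid
    return 0.5 * (low + high)


OMEGA1 = 2 * math.pi * 5e9
OMEGA2 = 2 * math.pi * 20.62e9
T_PH = 0.3 * HBAR * OMEGA2 / KB  # about 300 mK

if __name__ == "__main__":
    t1 = equilibrium_temperature(T_PH, OMEGA1, OMEGA2, 1e9, 1e-21)
    print("bath temperature (K):", T_PH)
    print("equilibrium T1 (K):", t1)
```

Because the ideal cooling power ignores the finite-Q and finite-sweep-rate effects captured by the Bloch-equation simulation, the temperature found this way should be read only as a rough estimate.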
In addition the photonic heat conduction between the resistors due to temperature gradient may in principle contribute also. Following an analysis similar to Ref. [14], the heat flow $P_\gamma$ from admittance $Y_2(\omega)$ to $Y_1(\omega)$ can be written as a frequency integral whose integrand contains the factor $\mathrm{Re}Y_1(\omega)\mathrm{Re}Y_2(\omega)(n_2(\omega)-n_1(\omega))$, where $n_j(\omega) = [\exp(\beta_j\hbar\omega)-1]^{-1}$ are the boson occupation factors and $M$ is the mutual inductance between the loops. For detuned high-Q resonators the photonic heat conduction turns out to be quite negligible. For instance for the values of Fig. 3 with $M = 5$ pH and $R_1 = R_2 = 1$ Ω we get only $P_\gamma = 2\times10^{-18}$ W even if $T_1 = 0$ K and $T_2 = 300$ mK.

FIG. 3: (color online) Equilibrium temperature as a function of pump frequency (0 to 1.5 GHz) for three different phonon bath temperatures. The temperature of resistor 1 (volume $10^{-21}\,\mathrm{m}^3$) is shown with dashed line while the temperature of resistor 2 (volume $10^{-18}\,\mathrm{m}^3$) is shown with solid line. The bath temperatures $T_{ph} \approx T_2$ from top to bottom are $0.3\times\hbar\omega_2/k_B$, $0.2\times\hbar\omega_2/k_B$ and $0.1\times\hbar\omega_2/k_B$. Otherwise the parameters are like in Fig. 2.

Figure 3 illustrates the calculated equilibrium temperature versus operation frequency obtained numerically by finding the balance between the dominating phononic heat conduction and the integrated cooling power. We see that almost a factor of 2 reduction of $T_1$ is possible with realistic parameters. In practice the drop of $T_1$ can be measured e.g. using an additional SINIS thermometer, in which resistor 1 will serve as the normal metal N. Its reading is sensitive to the electronic temperature of N only, and self-heating can be made very small. The resistors should be made out of thin film normal metal such as copper or gold with typically sub 1 Ω square resistance. Volume can be picked freely. To get the resonant frequencies and quality factor as above we need $L_1 = 320$ pH, $C_1 = 3.2$ pF, $L_2 = 80$ pH and $C_2 = 0.8$ pF which are also realistic.
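As a quick consistency check of the component values just quoted, the sketch below computes the resonance frequencies $\omega_{LC_j}/2\pi = 1/(2\pi\sqrt{L_jC_j})$ and quality factors $Q_j = \sqrt{L_j/C_j}/R_j$ of the two series circuits, taking $R_1 = R_2 = 1$ Ω from the Fig. 2 caption. The function names are illustrative assumptions.

```python
import math


def resonance_frequency_hz(inductance, capacitance):
    """Resonance frequency 1/(2*pi*sqrt(L*C)) of an LC circuit, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def quality_factor(inductance, capacitance, resistance):
    """Quality factor sqrt(L/C)/R of a series RLC circuit."""
    return math.sqrt(inductance / capacitance) / resistance


# Component values quoted in the text; R = 1 ohm from the Fig. 2 caption.
L1, C1 = 320e-12, 3.2e-12
L2, C2 = 80e-12, 0.8e-12
R = 1.0

if __name__ == "__main__":
    for name, (ind, cap) in {"loop 1": (L1, C1), "loop 2": (L2, C2)}.items():
        print(name,
              "f_LC (GHz):", resonance_frequency_hz(ind, cap) / 1e9,
              "Q:", quality_factor(ind, cap, R))
```

Both circuits come out with Q = 10, and the resonances land close to the 5 GHz and roughly 20 GHz qubit splittings used in the numerical example.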
For the inductor one may use either Josephson or the kinetic inductance of superconducting wire while the capacitance values are similar to those in typical flux qubits [2]. To satisfy the conditions of the above numerical example we need quite large mutual inductances which however can be easily achieved using e.g. kinetic inductance [15]. The strong driving requires also rather large inductance between the microwave line and the qubit, which should not result in uncontrolled relaxation. For instance, $M_{mw} = 5$ pH coupling to the control line is acceptable as it would result in at most $3\times10^7\,\mathrm{s}^{-1}$ relaxation rate assuming a 50 Ω environment at 0.3 K. This choice will not degrade the performance of the device significantly since driving is much faster. Yet sufficiently strong driving can be achieved with a modest 3 µA ac current. The fabrication process will most likely require three lithography steps.

In conclusion, we have described a method of using a superconducting flux qubit driven strongly at microwave frequency to cool an external metal resistor. Here we considered LC resonators to achieve the required frequency selectivity but a coplanar wave-guide resonator or a mechanical oscillator could be used in principle, too. We demonstrated by a numerical example that it is possible to observe the associated temperature decrease experimentally. This effect is directly related to the loss of information and thus to the increase of entropy of the quantum bit.

J.P.P thanks NanoSciERA project "NanoFridge" of EU for financial support.

[1] O. Astafiev, Yu. A. Pashkin, Y. Nakamura, T. Yamamoto, and J. S. Tsai, Phys. Rev. Lett. 93, 267007 (2004).
[2] F. Yoshihara, K. Harrabi, A. O. Niskanen, Y. Nakamura, and J. S. Tsai, Phys. Rev. Lett. 97, 167001 (2006).
[3] P. Bertet, I. Chiorescu, G. Burkard, K. Semba, C. J. P. M. Harmans, D. P. DiVincenzo, and J. E. Mooij, Phys. Rev. Lett. 95, 257002 (2005).
[4] J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H. van der Wal, S.
Lloyd, Science 285, 1036 (1999).
[5] I. Chiorescu, Y. Nakamura, C. J. P. M. Harmans, and J. E. Mooij, Science 299, 1869 (2003).
[6] S. O. Valenzuela, W. D. Oliver, D. M. Berns, K. K. Berggren, L. S. Levitov, and T. P. Orlando, Science 314, 1589 (2006).
[7] M. Meschke, W. Guichard, and J. P. Pekola, Nature (London) 444, 187 (2006).
[8] J. Hauss, A. Fedorov, C. Hutter, A. Shnirman, and G. Schön, cond-mat/0701041.
[9] J. P. Pekola, F. Giazotto, and O. P. Saira, Phys. Rev. Lett. 98, 037201 (2007).
[10] H. T. Quan, Y. X. Liu, C. P. Sun, and Franco Nori, quant-ph/0611275.
[11] Yu. Makhlin, G. Schön, and A. Shnirman, in New Directions in Mesoscopic Physics (Towards Nanoscience), edited by R. Fazio, V. F. Gantmakher, and Y. Imry (Kluwer, 2003), p. 197; cond-mat/0309049.
[12] M. Sillanpää, T. Lehtinen, A. Paila, Yu. Makhlin, and P. Hakonen, Phys. Rev. Lett. 96, 187002 (2006).
[13] W. D. Oliver, Y. Yu, J. C. Lee, K. K. Berggren, L. S. Levitov, and T. P. Orlando, Science 310, 1653 (2005).
[14] D. R. Schmidt, R. J. Schoelkopf, and A. N. Cleland, Phys. Rev. Lett. 93, 045901 (2004).
[15] A. O. Niskanen, K. Harrabi, F. Yoshihara, Y. Nakamura, and J. S. Tsai, Phys. Rev. B 74, 220503(R) (2006).
<|endoftext|><|startoftext|> Introduction

Presented here is a new technique for analyzing skew polynomial rings satisfying a polynomial identity with an eye toward discovering their PI degrees. It combines and extends the methods of Jøndrup [21] and Cauchon [5], who introduced techniques of "deleting derivations" in skew polynomial rings, by means of which they showed that some properties of certain types of iterated skew polynomial ring $A = k[x_1][x_2;\tau_2,\delta_2]\cdots[x_n;\tau_n,\delta_n]$ are determined by the corresponding ring $A' = k[x_1][x_2;\tau_2]\cdots[x_n;\tau_n]$. Jøndrup's results imply that $A$ and $A'$ have the same PI degree under certain hypotheses, including characteristic zero for the base field. Cauchon developed an algorithm that gives an isomorphism between certain localizations of $A$ and $A'$, but this requires a $q_i$-skew condition on each $(\tau_i,\delta_i)$ with $q_i$ not a root of unity, which usually precludes $A$ from satisfying a polynomial identity. We relax the restrictions placed on the base field and its chosen scalars by Jøndrup and Cauchon, respectively, by introducing the notion of a higher q-skew τ-derivation.

If we "twist" the multiplication in the (commutative) coordinate ring of affine, symplectic, or Euclidean n-space over a field k, we get a (noncommutative) quantized coordinate ring which has the structure of an iterated skew polynomial ring with coefficients in k. This structure is also exhibited in the quantized Weyl algebras and in the quantized coordinate ring of n×n matrices over k. Letting A represent one of these k-algebras, the quantum Gel'fand-Kirillov conjecture asserts that Fract A is isomorphic to the quotient division ring of a quantum affine space over a purely transcendental extension of k. For more information on the quantum Gel'fand-Kirillov conjecture and proofs of conditions under which the result holds, see [1] [7] [23] [28] [32] [33]. We will confirm some of these cases in a new way.

HEIDI HAYNAL
http://arxiv.org/abs/0704.0846v1
1991 Mathematics Subject Classification. 16R99; 16S36; 81R50; 16P40.
Key words and phrases. noncommutative rings; skew polynomial rings; quantum algebras.
This research will form a part of the author's PhD dissertation at the University of California at Santa Barbara.

The first section sets up the conventions under which we work, including definitions and an established result concerning the PI degree of quantum affine space. We assume that the reader has some familiarity with the subject, so we do not give an exhaustive collection of definitions. A comprehensive discussion of any unfamiliar terms can be found in [16] [4] and [27]. In the second section we define higher τ-derivations and give necessary and sufficient conditions for their existence. Of particular interest are higher τ-derivations which satisfy a q-skew relation. In the third section we present a structure theorem for a localization of q-skew polynomial rings. This extends the work of Cauchon [5], and the calculations are simplified by the presence of higher q-skew τ-derivations. In the fourth section we deal with the structure of iterated skew polynomial rings. Sometimes it is advantageous to rearrange the order in which the indeterminates appear, so we establish a sufficient condition that allows such reordering. The main theorem there asserts that if A is an iterated q-skew polynomial ring with certain higher τ-derivations, then there is a finitely generated Ore set $T \subseteq A$ such that $AT^{-1}$ is isomorphic to a localization of a much "nicer" iterated skew polynomial ring. In the fifth section, we use the tools developed in the previous sections to confirm certain cases of the quantum Gel'fand-Kirillov conjecture and to find the PI degree of some quantized coordinate rings and quantized Weyl algebras.
In the last section, we follow up with a structure theorem for completely prime factors of iterated skew polynomial rings. We also present an open question which, if answered positively, would show that the quantum Gel'fand-Kirillov conjecture holds for certain of the prime factor algebras we study.

PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS

Throughout, k will denote a field of arbitrary characteristic, $q \in k$ a nonzero element. The following assumptions apply to all skew polynomial rings that we will consider:
• all coefficient rings are k-algebras
• all automorphisms are k-algebra automorphisms
• all skew derivations are k-linear
• in all skew polynomial rings $R[x;\tau,\delta]$, τ is an automorphism, not just an endomorphism.

To say that $R[x;\tau,\delta]$ is a q-skew polynomial ring means that the automorphism and skew derivation satisfy the relation $\delta\tau = q\tau\delta$. The reader will note that this is opposite to Cauchon's conventions, but it matches the presentation in [10] and others. To say that δ is locally nilpotent means that for every $r \in R$ there is an integer $n_r \ge 0$ such that $\delta^{n_r}(r) = 0$, and $\delta^p(r) \ne 0$ for $p < n_r$. Such $n_r$ is called the δ-nilpotence index of r. The symbol $\mathbb{N}$ refers to the set of positive integers. For a real number m we use the notation $\lfloor m\rfloor$ in section five to indicate the integer part of m.

Definition 1.1. We say that two rings R and S exhibit PI degree parity when these two conditions are satisfied:
(1) R is a PI ring if and only if S is a PI ring,
(2) PIdeg R = PIdeg S.

For a field k and multiplicatively antisymmetric $\lambda \in M_n(k)$, the corresponding multiparameter quantum affine space is the k-algebra $\mathcal{O}_\lambda(k^n)$ with generators $x_1,\dots,x_n$ and relations $x_ix_j = \lambda_{ij}x_jx_i$ for all i, j. The corresponding multiparameter quantum torus is the k-algebra $\mathcal{O}_\lambda((k^\times)^n)$ given by generators $x_1^{\pm1},\dots,x_n^{\pm1}$ and the same relations. The multiplicative set generated by $x_1,\dots,x_n$ in $\mathcal{O}_\lambda(k^n)$ is a denominator set, and $\mathcal{O}_\lambda((k^\times)^n)$ is a localization of $\mathcal{O}_\lambda(k^n)$ with respect to this set.

In this paper we'll show that iterated skew polynomial algebras covering a large class of standard examples have PI degree parity with $\mathcal{O}_\lambda(k^n)$ for an appropriately chosen λ. To find out what that PI degree may be, we utilize a result of De Concini and Procesi. In [8, Proposition 7.1], they establish the following formula for calculating the PI degree of a quantum affine space $\mathcal{O}_\lambda(k^n)$. Their assumption of characteristic zero from [8, Section 4] is not used in this result.

Theorem 1.2. [De Concini - Procesi] Let $\lambda = (\lambda_{ij})$ be a multiplicatively antisymmetric n × n matrix over k.
(1) The quantum affine space $\mathcal{O}_\lambda(k^n)$ is a PI ring if and only if all the $\lambda_{ij}$ are roots of unity. In this case, there exist a primitive root of unity $q \in k^\times$ and integers $a_{ij}$ such that $\lambda_{ij} = q^{a_{ij}}$ for all i, j.
(2) Suppose $\lambda_{ij} = q^{a_{ij}}$ for all i, j, where $q \in k$ is a primitive ℓth root of unity and the $a_{ij} \in \mathbb{Z}$. Let h be the cardinality of the image of the homomorphism

$$\mathbb{Z}^n \xrightarrow{\ (a_{ij})\ } \mathbb{Z}^n \xrightarrow{\ \pi\ } (\mathbb{Z}/\ell\mathbb{Z})^n$$

where π denotes the canonical epimorphism. Then $\text{PI-deg}(\mathcal{O}_\lambda(k^n)) = \sqrt{h}$.

2. Higher q-Skew τ-Derivations

Before the featured definition, a brief discussion of a tool used to study q-skew polynomial rings is needed. Having the q-skew relation $\delta\tau = q\tau\delta$ in place allows us to group terms of the same degree when we do skew polynomial arithmetic. The means to do this are provided by the q-Leibniz rules.

Definition 2.1. For an indeterminate t, and integers $n \ge m \ge 0$, we define the following polynomial functions:

$$(m)_t = t^{m-1}+t^{m-2}+\cdots+t+1 \qquad (1)$$

$$(m)!_t = (m)_t(m-1)_t\cdots(1)_t, \quad\text{and}\quad (0)!_t = 1 \qquad (2)$$

$$\binom{n}{m}_t = \frac{(n)!_t}{(m)!_t\,(n-m)!_t} \qquad (3)$$

The expressions $\binom{n}{m}_t$ are called the t-binomial coefficients, or Gaussian polynomials. The t-binomial coefficients have properties similar to those of the regular binomial coefficients.
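Definition 2.1 can be explored with a few lines of exact polynomial arithmetic. The sketch below is an illustration only: it stores a polynomial in t as a list of integer coefficients (lowest degree first), builds $(m)_t$ and $(m)!_t$ directly from formulas (1) and (2), and obtains $\binom{n}{m}_t$ from (3) by exact long division. All function names are hypothetical.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    result = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


def poly_divexact(numerator, denominator):
    """Exact quotient by a monic polynomial; raises if the remainder is nonzero."""
    remainder = list(numerator)
    quotient = [0] * (len(numerator) - len(denominator) + 1)
    for k in range(len(quotient) - 1, -1, -1):
        coeff = remainder[k + len(denominator) - 1]  # denominator has leading coefficient 1
        quotient[k] = coeff
        for j, d in enumerate(denominator):
            remainder[k + j] -= coeff * d
    if any(remainder):
        raise ValueError("division is not exact")
    return quotient


def t_integer(m):
    """(m)_t = 1 + t + ... + t^(m-1), for m >= 1."""
    return [1] * m


def t_factorial(m):
    """(m)!_t = (m)_t (m-1)_t ... (1)_t, with (0)!_t = 1."""
    result = [1]
    for j in range(1, m + 1):
        result = poly_mul(result, t_integer(j))
    return result


def t_binomial(n, m):
    """Gaussian polynomial (n choose m)_t for n >= m >= 0."""
    denominator = poly_mul(t_factorial(m), t_factorial(n - m))
    return poly_divexact(t_factorial(n), denominator)


def evaluate(poly, value):
    """Evaluate a coefficient list at a number using Horner's rule."""
    total = 0
    for coeff in reversed(poly):
        total = total * value + coeff
    return total


if __name__ == "__main__":
    print(t_binomial(4, 2))               # coefficients of (4 choose 2)_t
    print(evaluate(t_binomial(5, 2), 1))  # value at t = 1
```

Evaluating at t = 1 recovers the ordinary binomial coefficient, which is one of the similarities with regular binomial coefficients mentioned above.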
Two that will be useful for this work are:

$$\binom{n}{0}_t = \binom{n}{n}_t = 1 \quad\text{for all } n \ge 0 \qquad (4)$$

$$\binom{n}{m}_t = \binom{n-1}{m}_t + t^{n-m}\binom{n-1}{m-1}_t \quad\text{for all } 0 < m < n \qquad (5)$$

Proofs for these identities may be found in combinatorics texts such as [39]. When we evaluate the t-binomial coefficients at t = q, we obtain the q-binomial coefficients that we need for studying q-skew polynomial rings. As shown in [10, Section 6], the following q-Leibniz rules hold for any q-skew polynomial ring $R[x;\tau,\delta]$:

$$\delta^n(rs) = \sum_{i=0}^{n}\binom{n}{i}_q\tau^{n-i}\delta^i(r)\,\delta^{n-i}(s) \quad\text{for all } r,s \in R \text{ and } n = 0,1,2,\dots$$

$$x^nr = \sum_{i=0}^{n}\binom{n}{i}_q\tau^{n-i}\delta^i(r)\,x^{n-i} \quad\text{for all } r \in R \text{ and } n = 0,1,2,\dots$$

Now, taking a cue from the study of Schmidt differential operator rings, for instance [25], we define a sequence of k-linear maps that allows us to broaden the class of rings for which we may derive results like those of Jøndrup and Cauchon.

Definition 2.2. A higher q-skew τ-derivation (h.q-s.τ-d.) on a k-algebra R is a sequence $d_0, d_1, d_2, \dots$ of k-linear operators on R such that
• $d_0$ is the identity,
• $d_n(rs) = \sum_{i=0}^{n}\tau^{n-i}d_i(r)\,d_{n-i}(s)$ for all $r,s \in R$ and all n,
• $d_i\tau = q^i\tau d_i$ for all i.

If a sequence of k-linear maps satisfies the first two conditions, we refer to it as a higher τ-derivation. We abbreviate the sequence $\{d_i\}_{i=0}^{\infty}$ usually as just $\{d_i\}$. A h.q-s.τ-d. is locally nilpotent if for all $r \in R$, there exists an integer $n \ge 0$ such that $d_i(r) = 0$ for all $i \ge n$, and $d_p(r) \ne 0$ for $p < n$. In this case, n is called the d-nilpotence index of r. A h.q-s.τ-d. is iterative if $d_id_j = \binom{i+j}{i}_q d_{i+j}$ for all i, j. This implies that the $d_i$ commute with each other. A q-skew τ-derivation δ on R extends to a h.q-s.τ-d. if there is a h.q-s.τ-d. $\{d_i\}$ on R with $d_1 = \delta$.

For example, consider the k-algebra with two generators x and y, and one relation $xy - qyx = 1$, where $q \in k^\times$. We'll assume that $q \ne 1$ and recognize this algebra as a q-skew polynomial ring $k[y][x;\tau,\delta]$ with $\tau(y) = qy$ and $\delta(y) = 1$, commonly known as a quantized Weyl algebra and denoted $A_1^q(k)$.
If q is not a root of unity, then the

$$d_i = \frac{\delta^i}{(i)!_q} \qquad (7)$$

comprise an iterative higher q-skew τ-derivation that extends δ on $k[y]$. The properties of a higher q-skew τ-derivation follow directly from the fact that δ is a q-skew τ-derivation and the first q-Leibniz rule. This particular h.q-s.τ-d. is also locally nilpotent because

$$d_i(y^n) = \begin{cases}\binom{n}{i}_q y^{n-i} & \text{when } i \le n,\\ 0 & \text{when } i > n.\end{cases}$$

Proposition 2.3. Let $\{d_i\}$ be a sequence of k-linear maps on a k-algebra R with $d_0 = \mathrm{id}_R$, and let $R[[x;\tau^{-1}]]$ be the skew power series ring where τ is a k-linear automorphism of R, the coefficients are written on the right of the variable x, and $rx = x\tau(r)$ for all $r \in R$.
(a) Then $\{d_i\}$ is a higher τ-derivation on R if and only if the map $\Psi : R \to R[[x;\tau^{-1}]]$ given by $r \mapsto \sum_{i=0}^{\infty}x^id_i(r)$ is a ring homomorphism.
(b) Extend τ to an automorphism of $R[[x;\tau^{-1}]]$ such that $\tau(x) = xq$. Assume that $\{d_i\}$ is a higher τ-derivation. Then the sequence $\{d_i\}$ is a h.q-s.τ-d. if and only if this diagram is commutative:

$$\begin{array}{ccc} R & \xrightarrow{\ \Psi\ } & R[[x;\tau^{-1}]] \\ \tau\downarrow & & \downarrow\tau \\ R & \xrightarrow{\ \Psi\ } & R[[x;\tau^{-1}]] \end{array}$$

Proof. (a) Suppose $\{d_i\}$ is a higher τ-derivation on R. Consider any $r,s \in R$. It is clear that Ψ is additive and $\Psi(1) = 1$. Applying Definition 2.2 gives

$$\Psi(rs) = \sum_{i=0}^{\infty}x^id_i(rs) = \sum_{i=0}^{\infty}x^i\sum_{m=0}^{i}\tau^{i-m}d_m(r)\,d_{i-m}(s).$$

Power series multiplication, with $rx = x\tau(r)$, gives

$$\Psi(r)\Psi(s) = \left(\sum_{i=0}^{\infty}x^id_i(r)\right)\left(\sum_{i=0}^{\infty}x^id_i(s)\right) = \sum_{i=0}^{\infty}x^i\sum_{m=0}^{i}\tau^{i-m}d_m(r)\,d_{i-m}(s).$$

So Ψ preserves products. Therefore, Ψ is a ring homomorphism. To demonstrate the other implication, suppose Ψ is a ring homomorphism. Then $\Psi(r)\Psi(s) = \Psi(rs)$ implies that $d_n(rs) = \sum_{i=0}^{n}\tau^{n-i}d_i(r)\,d_{n-i}(s)$ for all $r,s \in R$. Therefore, $\{d_i\}$ is a higher τ-derivation.

(b) Suppose that $\{d_i\}$ is a h.q-s.τ-d. Then the relations $d_i\tau = q^i\tau d_i$ imply that $\tau\Psi(r) = \sum_{i=0}^{\infty}x^iq^i\tau d_i(r) = \sum_{i=0}^{\infty}x^id_i(\tau(r)) = \Psi\tau(r)$, for all $r \in R$. Now if the diagram is commutative, then comparing the coefficients of $\tau\Psi(r) = \sum_{i=0}^{\infty}x^iq^i\tau d_i(r)$ and $\Psi\tau(r) = \sum_{i=0}^{\infty}x^id_i(\tau(r))$ for all $r \in R$ yields that $d_i\tau = q^i\tau d_i$. □

Remark 2.4.
If $\{d_i\}$ is locally nilpotent on R, we observe that claims analogous to the proposition can be made for the map $\Psi : R \to R[x;\tau^{-1}]$.

Proposition 2.5. Let $\{d_i\}$ be a h.q-s.τ-d. on a k-algebra R, where τ is an automorphism, and let S be a right denominator set in R with $\tau(S) = S$. Then $\{d_i\}$ can be uniquely extended to a h.q-s.τ-d. on $RS^{-1}$.

Proof. It has been established that τ and $d_1$ extend uniquely to $RS^{-1}$ by $\tau(rs^{-1}) = \tau(r)\tau(s)^{-1}$ and $d_1(rs^{-1}) = d_1(r)s^{-1} - \tau(rs^{-1})d_1(s)s^{-1}$ in [10, Lemma 1.3]. Suppose that $\{d_i\}$ extends to a h.q-s.τ-d. on $RS^{-1}$. For $r \in R$ and $s \in S$, we apply $d_n$ to the equation $r1^{-1} = (rs^{-1})(s1^{-1})$ to get

$$d_n(r)1^{-1} = d_n\!\left((rs^{-1})(s1^{-1})\right) = \sum_{j=0}^{n}\tau^{n-j}d_j(rs^{-1})\,d_{n-j}(s1^{-1}) = \tau^n(rs^{-1})d_n(s)1^{-1} + \cdots + d_n(rs^{-1})s1^{-1}.$$

This implies that

$$d_n(rs^{-1}) = \left[d_n(r) - \sum_{j=0}^{n-1}\tau^{n-j}d_j(rs^{-1})\,d_{n-j}(s)\right]s^{-1}.$$

So we have uniqueness in case of existence.

To show existence, let $\Psi : R \to R[[x;\tau^{-1}]]$ be the map defined in Proposition 2.3, and let $\varphi : R[[x;\tau^{-1}]] \to RS^{-1}[[x;\tau^{-1}]]$ be the natural map. Consider the composite map $\Phi = \varphi\Psi : R \to RS^{-1}[[x;\tau^{-1}]]$. For any $s \in S$, the constant term of $\Phi(s)$ is a unit. So we may inductively solve for the coefficients of an inverse for $\Phi(s)$ in $RS^{-1}[[x;\tau^{-1}]]$. Details, as in [37, 1.2], are left to the reader. Hence, Φ extends to a ring homomorphism $\Phi' : RS^{-1} \to RS^{-1}[[x;\tau^{-1}]]$ such that $\Phi'(rs^{-1}) = \Phi(r)\Phi(s)^{-1}$, and we consider the diagram:

$$\begin{array}{ccc} RS^{-1} & \xrightarrow{\ \Phi'\ } & RS^{-1}[[x;\tau^{-1}]] \\ \tau\downarrow & & \downarrow\tau \\ RS^{-1} & \xrightarrow{\ \Phi'\ } & RS^{-1}[[x;\tau^{-1}]] \end{array}$$

where τ has been extended to an automorphism of $RS^{-1}[[x;\tau^{-1}]]$ as in Proposition 2.3. Since $\Phi(r) = \sum_{i=0}^{\infty}x^id_i(r)1^{-1}$, and $\{d_i\}$ is a h.q-s.τ-d. on R, we have

$$\tau\Phi(r) = \sum_{i=0}^{\infty}x^iq^i\tau d_i(r)1^{-1} = \sum_{i=0}^{\infty}x^id_i(\tau(r))1^{-1} = \Phi\tau(r)$$

for all $r \in R$. It follows directly that $\tau\Phi'(rs^{-1}) = \Phi'\tau(rs^{-1})$. So, indeed, the diagram is commutative. Define a sequence $\{d_i\}$ on $RS^{-1}$ such that $d_i(t)$ equals the coefficient of $x^i$ in $\Phi'(t)$ for all $t \in RS^{-1}$. Then by Proposition 2.3 we conclude that this sequence is a h.q-s.τ-d. on $RS^{-1}$ extending $\{d_i\}$ on R. □

Lemma 2.6.
Let $A$ be a $k$-algebra, $B \subseteq A$ a $k$-subalgebra generated by $\{b_1, b_2, \dots\}$, $\tau$ a $k$-linear automorphism of $A$, and $\{d_i\}$ a higher $\tau$-derivation on $A$. If $d_i(b_j) \in B$ and $\tau(b_j) \in B$ for all $i, j \in \mathbb{N}$, then $d_i(B) \subseteq B$ for all $i$.

Proof. First, observe that $\tau(b_j) \in B$ for all $j$ implies that $\tau(B) \subseteq B$. Since the $d_i$ are $k$-linear maps, it suffices to check monomials in the $b_j$, using induction on their length. Suppose, inductively, that for integers $m \ge 1$ and $1 \le \ell \le m-1$, we have $d_i(b_{j_1} \cdots b_{j_\ell}) \in B$ for all $i$ and all $j_1, \dots, j_\ell$. Then the product rule for a higher $\tau$-derivation gives
\[ d_n(b_{j_1} \cdots b_{j_m}) = \sum_{i=0}^{n} \tau^{n-i}d_i(b_{j_1} \cdots b_{j_{m-1}})\,d_{n-i}(b_{j_m}) \in B \]
for all $n$ and all $j_1, \dots, j_m$, by the induction hypothesis. □

Lemma 2.7. Let $A$ be a $k$-algebra with a set $\{x_j\}$ of generators, $\tau$ an automorphism of $A$, and $\{d_i\}$ a h.$q$-s.$\tau$-d. on $A$. If $\{d_i\}$ is locally nilpotent on each $x_j$, then $\{d_i\}$ is locally nilpotent on $A$.

Proof. It suffices to check monomials in the $x_j$ because the $d_i$ are $k$-linear maps. We proceed by induction on the length of such monomials. For a given $x_n$, let $i(n)$ be its nilpotence index, so $d_i(x_n) = 0$ for all $i \ge i(n)$. Suppose inductively that for $n \ge 2$, all integers $\ell$ with $1 \le \ell \le n-1$, and all choices of $j_1, \dots, j_\ell$, there exists an integer $m$ such that $d_i(x_{j_1} \cdots x_{j_\ell}) = 0$ for all $i \ge m$. For instance, $m = i(j_1) + \cdots + i(j_\ell)$ will suffice, although the $d$-nilpotence index of $x_{j_1} \cdots x_{j_\ell}$ may be less than this sum. Then, for $p \ge m + i(j_n)$, we have
\[ d_p(x_{j_1} \cdots x_{j_n}) = \sum_{i=0}^{p} \tau^{p-i}d_i(x_{j_1} \cdots x_{j_{n-1}})\,d_{p-i}(x_{j_n}) = 0, \]
completing the induction. □

Consider again the quantized Weyl algebra $A_1^q(k)$. In case $q$ is an $\ell$-th root of unity, the $d_\ell$ given in (7) would be undefined due to the occurrence of a zero denominator. However, realizing $A_1^q(k)$ as a factor of a quantized Weyl algebra over $k[t^{\pm1}]$ allows us to define a h.$q$-s.$\tau$-d. on $A_1^q(k)$ nonetheless.

The $k[t^{\pm1}]$-algebra $A_1^t(k[t^{\pm1}])$ has generators $x$ and $y$ and one relation $xy - tyx = 1$.
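This is a $t$-skew polynomial ring $k[t^{\pm1}][y][x;\bar\tau,\bar\delta]$ where $\bar\tau(y) = ty$, $\bar\tau(t) = t$, $\bar\delta(y) = 1$, and $\bar\delta(t) = 0$. Note that
\[ \bar\delta^{\,i}(y^n) = \begin{cases} \dfrac{(n)!_t}{(n-i)!_t}\, y^{n-i} & \text{when } i \le n, \\ 0 & \text{when } i > n, \end{cases} \]
implying that $\bar\delta^{\,i}\bigl(k[t^{\pm1}][y]\bigr) \subseteq (i)!_t\, k[t^{\pm1}][y]$. So the assignment $\bar d_i = \bar\delta^{\,i}/(i)!_t$ defines an iterative, locally nilpotent h.$t$-s.$\bar\tau$-d. $\{\bar d_i\}$ on $k[t^{\pm1}][y]$. Now, the relation $xy - tyx = 1$ is equivalent to the relation $xy - qyx = 1$ modulo $\langle t - q\rangle$. Hence we have
\[ A_1^t(k[t^{\pm1}])/\langle t - q\rangle \cong A_1^q(k). \]
When $q$ is an $\ell$th root of unity, we have $\bar\delta^{\,\ell}\bigl(k[t^{\pm1}][y]\bigr) \subseteq \langle t - q\rangle k[t^{\pm1}][y]$. Nonetheless, the h.$t$-s.$\bar\tau$-d. $\{\bar d_i\}$ on $k[t^{\pm1}][y]$ induces a h.$q$-s.$\tau$-d. $\{d_i\}$ on $k[y]$, also iterative and locally nilpotent, with $d_1 = \delta$. Note that even though $\delta^\ell = 0$ in this algebra, we have $d_i(y^i) = 1$ for all $i$.

This phenomenon is not unique to the quantized Weyl algebras. The conditions that drive it are codified in the following theorem.

Theorem 2.8. Let $R$ be a $k$-algebra and $R[x;\tau,\delta]$ a $q$-skew polynomial ring where $q \in k$, $q \ne 1$. Suppose there exist a torsion-free $k[t^{\pm1}]$-algebra $\overline{R}$ and a $t$-skew polynomial ring $\overline{R}[x;\bar\tau,\bar\delta]$ such that $\overline{R}/\langle t - q\rangle\overline{R} \cong R$, with $\bar\tau$ and $\bar\delta$ reducing to $\tau$ and $\delta$. Suppose further that $\bar\delta^{\,i}(\overline{R}) \subseteq (i)!_t\,\overline{R}$ for all $i$. Then $\delta$ extends to an iterative h.$q$-s.$\tau$-d. $\{d_i\}$ on $R$. If $\bar\delta$ is locally nilpotent, then so is $\{d_i\}$. If $q$ is not a root of unity, then $d_i = \delta^i/(i)!_q$ for all $i$. If $q$ is a primitive $\ell$th root of unity, then $d_i = \delta^i/(i)!_q$ for $i < \ell$.

Proof. The assumption $\bar\delta^{\,i}(\overline{R}) \subseteq (i)!_t\,\overline{R}$ for all $i$ implies that the maps $\bar d_i = \bar\delta^{\,i}/(i)!_t$ make up a well-defined iterative h.$t$-s.$\bar\tau$-d. on $\overline{R}$. When $q$ is a primitive $\ell$th root of unity, it also implies that $\bar\delta^{\,\ell}(\overline{R}) \subseteq \langle t - q\rangle\overline{R}$, because $(\ell)_t \equiv (\ell)_q = 0$ modulo $\langle t - q\rangle$. Since $\bar\tau$ and $\bar\delta$ reduce to $\tau$ and $\delta$ modulo $\langle t - q\rangle$, we have an isomorphism
\[ \bigl(\overline{R}/\langle t - q\rangle\overline{R}\bigr)[x;\bar\tau,\bar\delta] \cong R[x;\tau,\delta], \]
whereby $\{\bar d_i\}$ induces an iterative h.$q$-s.$\tau$-d. $\{d_i\}$ on $R$. The reduction of the maps from $\overline{R}$ to $R$ also implies the remaining results.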
□

We will find that all of the conditions assumed above are satisfied by the common quantized coordinate rings and related examples, which will be discussed in a subsequent section.

3. The τ-Derivation Removing Homomorphism

Following the pattern in [5], let $A = R[x;\tau,\delta]$, and suppose that $\delta$ is locally nilpotent. Set $S = \{x^n \mid n \in \mathbb{N} \cup \{0\}\} \subset A$.

Lemma 3.1. The set $S$ is a denominator set in $A$.

Proof. Clearly, $S$ is a multiplicative set in $A$. And, since $S$ contains only regular elements of $A$, it is left and right reversible. It remains to show that $S$ is an Ore set. Let $a = \sum_{i=0}^{n} r_i x^i$ be an element of $A$ with $r_n \ne 0$. For each $r_i$ in the expression of $a$, and each $m_i \ge 0$, we have
\[ x^{m_i} r_i = \sum_{j=0}^{m_i} \binom{m_i}{j}_q \tau^{m_i-j}\delta^j(r_i)\,x^{m_i-j} = a_i' x + \delta^{m_i}(r_i) \]
for some $a_i' \in A$. Since $\delta$ is locally nilpotent, we may choose $m_i$ to be the $\delta$-nilpotence index of $r_i$ to conclude that $x^{m_i} r_i = a_i' x$ for some $a_i' \in A$. Set $m_a = \max\{m_i \mid 0 \le i \le n\}$. Then for each $r_i$, we have $x^{m_a} r_i = \tilde a_i x$, and hence $x^{m_a} a = \tilde a x$ for some $\tilde a \in A$.

Now suppose, inductively, that for a given $a \in A$ and $x^p \in S$ we can find elements $x^{m_a} \in S$ and $\bar a \in A$ such that $x^{m_a} a = \bar a x^p$, say $\bar a = \sum_i \bar r_i x^i$. We know that there exists an element $x^{m_{\bar a}}$ such that $x^{m_{\bar a}} \bar a = a' x$ for some $a' \in A$. So $x^{m_a} a = \bar a x^p$ implies $x^{m_{\bar a} + m_a} a = a' x^{p+1}$, completing the induction. Hence, for any $a \in A$ and $s \in S$, we have $Sa \cap As \ne \emptyset$. So $S$ is a left Ore set in $A$. We see that $S$ is a right Ore set by applying the same argument to $A^{\mathrm{op}} = R^{\mathrm{op}}[x;\tau^{-1},-\delta\tau^{-1}]$. □

Suppose also that the derivation $\delta$ extends to an iterative, locally nilpotent higher $q$-skew $\tau$-derivation $\{d_i\}$ on $R$ and that $q \ne 1$. Denote $\hat A = AS^{-1} = S^{-1}A$, the localization of $A$ with respect to $S$, and define a map $f : R \to \hat A$ by
\[ f(r) = \sum_{n=0}^{\infty} q^{\frac{n(n+1)}{2}} (q-1)^{-n}\, d_n\tau^{-n}(r)\,x^{-n}, \]
noting that the sum is finite because $\{d_i\}$ is locally nilpotent, and that $q - 1$ is invertible. If $q$ is not a root of unity and $\{d_i\}$ is obtained from a $q$-skew $\tau$-derivation $\delta$ as in (7), the formula for $f$ can be rewritten as
\[ f(r) = \sum_{n=0}^{\infty} \frac{q^{\frac{n(n+1)}{2}} (q-1)^{-n}}{(n)!_q}\, \delta^n\tau^{-n}(r)\,x^{-n}. \]
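As a quick sanity check of the formula for $f$ (a worked instance added for illustration, under the assumption that $R = k[y]$ with $\tau(y) = qy$, $\delta(y) = 1$ and $q \ne 1$, as in the quantized Weyl algebra $A_1^q(k)$ discussed earlier), note that $d_n(y) = 0$ for $n \ge 2$, so only two terms of the sum survive:

```latex
\begin{align*}
f(y) &= y + q\,(q-1)^{-1}\, d_1\tau^{-1}(y)\, x^{-1}
      = y + q\,(q-1)^{-1}\, q^{-1}\delta(y)\, x^{-1}
      = y + (q-1)^{-1} x^{-1}, \\
x\, f(y) &= xy + (q-1)^{-1}
          = qyx + 1 + (q-1)^{-1}
          = qyx + q\,(q-1)^{-1}
          = q\, f(y)\, x .
\end{align*}
```

The second line uses only the defining relation $xy = qyx + 1$. It shows that $x$ and $f(y)$ $q$-commute, so the constant term contributed by $\delta$ has been absorbed into $f(y)$.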
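The rewritten formula matches the one presented in [5, Section 2] when $q$ is replaced by $q^{-1}$ to account for the difference between $\delta\tau = q\tau\delta$ (used here) and $\tau\delta = q\delta\tau$ (used in [5]). We will show that $f$ is a homomorphism and that the multiplication in $\operatorname{im} f$ is made simpler than that in $A$ by removing the derivation, as seen in the following.

Proposition 3.2. If $r \in R$, then $xf(r) = f(\tau(r))\,x$ in $\hat A$.

Proof. Using the hypothesis that $\{d_i\}$ is iterative, we compute that
\begin{align*}
xf(r) &= \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, x\, d_n\tau^{-n}(r)\,x^{-n} \\
&= \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n} \bigl[\tau d_n\tau^{-n}(r)\,x + d_1 d_n\tau^{-n}(r)\bigr] x^{-n} \\
&= \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n} q^{-n}\, d_n\tau^{-n+1}(r)\,x^{-n+1} + \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n}(n+1)_q\, d_{n+1}\tau^{-n}(r)\,x^{-n} \\
&= \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n} q^{-n}\, d_n\tau^{-n}(\tau(r))\,x^{-n+1} + \sum_{n\ge1} q^{\frac{n(n-1)}{2}}(q-1)^{-n+1}(n)_q\, d_n\tau^{-n}(\tau(r))\,x^{-n+1} \\
&= \tau(r)x + \sum_{n\ge1} \Bigl[ q^{\frac{n(n+1)}{2}}(q-1)^{-n}q^{-n} + q^{\frac{n(n-1)}{2}}(q-1)^{-n+1}(n)_q \Bigr] d_n\tau^{-n}(\tau(r))\,x^{-n+1} \\
&= \tau(r)x + \sum_{n\ge1} (q-1)^{-n} \Bigl[ q^{\frac{n(n-1)}{2}} + q^{\frac{n(n-1)}{2}}(q^n-1) \Bigr] d_n\tau^{-n}(\tau(r))\,x^{-n+1} \\
&= \tau(r)x + \sum_{n\ge1} (q-1)^{-n} q^{\frac{n(n+1)}{2}}\, d_n\tau^{-n}(\tau(r))\,x^{-n+1} \\
&= \Bigl[ \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\tau^{-n}(\tau(r))\,x^{-n} \Bigr] x = f(\tau(r))\,x,
\end{align*}
which gives the result. □

From Proposition 3.2, it follows by routine induction that
\[ x^m f(r) = f(\tau^m(r))\,x^m \quad \forall\, m \in \mathbb{Z}. \tag{8} \]
This is what we need in order to show that our map is indeed a $k$-algebra homomorphism.

Proposition 3.3. The map $f : R \to \hat A$ is a $k$-algebra homomorphism.

Proof. It is immediate that $f$ is $k$-linear ($\tau$ and $\{d_i\}$ are $k$-linear), and that $f(1) = 1$. We'll show that $f$ is multiplicative. If $r, s \in R$, then using Proposition 3.2,
\begin{align*}
f(r)f(s) &= \sum_{i\ge0} q^{\frac{i(i+1)}{2}}(q-1)^{-i}\, d_i\tau^{-i}(r)\,x^{-i} f(s) \\
&= \sum_{i\ge0} q^{\frac{i(i+1)}{2}}(q-1)^{-i}\, d_i\tau^{-i}(r)\, f(\tau^{-i}(s))\,x^{-i} \\
&= \sum_{i\ge0,\ j\ge0} q^{\frac{i(i+1)+j(j+1)}{2}}(q-1)^{-(i+j)}\, d_i\tau^{-i}(r)\, d_j\tau^{-(i+j)}(s)\,x^{-(i+j)}.
\end{align*}
For $n \in \mathbb{N}$, the coefficient of $x^{-n}$ in the sum above is
\begin{align*}
\sum_{\substack{i\ge0,\ j\ge0 \\ i+j=n}} q^{\frac{i(i+1)+j(j+1)}{2}}(q-1)^{-n}\, d_i\tau^{-i}(r)\, d_j\tau^{-n}(s)
&= \sum_{p=0}^{n} q^{\frac{(n-p)^2+p^2+n}{2}}(q-1)^{-n}\, d_{n-p}\tau^{p-n}(r)\, d_p\tau^{-n}(s) \\
&= q^{\frac{n(n+1)}{2}}(q-1)^{-n} \sum_{p=0}^{n} q^{p(p-n)}\, d_{n-p}\tau^{p}\tau^{-n}(r)\, d_p\tau^{-n}(s) \\
&= q^{\frac{n(n+1)}{2}}(q-1)^{-n} \sum_{p=0}^{n} \tau^{p}d_{n-p}(\tau^{-n}(r))\, d_p(\tau^{-n}(s)) \\
&= q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\bigl(\tau^{-n}(r)\tau^{-n}(s)\bigr) \\
&= q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\tau^{-n}(rs),
\end{align*}
computed by putting $p = j$ and using the second condition in Definition 2.2.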
In summary,
\[ f(r)f(s) = \sum_{n=0}^{\infty} q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\tau^{-n}(rs)\,x^{-n} = f(rs). \]
□

Proposition 3.4. (1) The map $f$ extends uniquely to an algebra homomorphism, also denoted $f$, of $R[y;\tau]$ to $\hat A$ satisfying $f(y) = x$.
(2) The extended homomorphism is injective.

Proof. (1) This result follows from Proposition 3.2 and the universal property of Ore extensions.
(2) Let $P = p_m y^m + \cdots + p_1 y + p_0$ be a nonzero element of $R[y;\tau]$, where each $p_i \in R$, $m \ge 0$, $p_m \ne 0$. Then $f(P) = f(p_m)x^m + \cdots + f(p_1)x + f(p_0)$. Since
\[ f(p_i) = \sum_{n\ge0} q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\tau^{-n}(p_i)\,x^{-n} \in AS^{-1}, \]
we know that there exists an integer $l \ge 0$ such that each $f(p_i)x^l$ is a nonzero element of $A$ of degree $l$ (in $x$) whenever $p_i \ne 0$. (Because $\{d_i\}$ is locally nilpotent, we may choose an $l$ large enough.) It follows that $f(P)x^l$ is a nonzero element of $\hat A$ of degree $m + l$, hence $f(P) \ne 0$. □

Definition 3.5. The algebra homomorphism $f : R[y;\tau] \to \hat A = AS^{-1}$ is called the derivation removing homomorphism.

The image of $f$, call it $A'$, is the subalgebra of $\hat A = AS^{-1}$ generated by $x$ and $f(R)$, and is isomorphic (as an algebra) to $R[y;\tau]$ by the derivation removing homomorphism $f$. Observe that $A'$ contains the multiplicative system $S = \{x^n \mid n \in \mathbb{N} \cup \{0\}\}$. Since equation (8) holds and $f(y) = x$, the elements of this set are normal in $A'$. Hence, $S$ satisfies the (two-sided) Ore condition in $A'$. The elements of $S$ are regular in $A'$ because they are regular in $\hat A$, and thus:

Proposition 3.6. $A'S^{-1} = AS^{-1}$.

Proof. We have $A'S^{-1} \subseteq AS^{-1}$ because $A' = \operatorname{im}(f) \subseteq AS^{-1}$. To show the other inclusion, it suffices to show that $R \subseteq A'S^{-1}$. (This suffices because $A$ is built up from $R$ by $x, x^2, \dots$. So if $R \subseteq A'S^{-1}$, then $AS^{-1} \subseteq A'S^{-1}$.) Consider any $r \in R$ and let $\ell$ be the $d$-nilpotence index of $r$. We show that $r \in A'S^{-1}$ with an induction argument on $\ell$. If $\ell \le 1$, then $d_1(r) = 0$, whence $f(r) = r \in A' \subseteq A'S^{-1}$. If $\ell \ge 2$, we write
\[ f(r) = r + \sum_{n=1}^{\ell-1} r_n x^{-n}, \quad \text{with } r_n = q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\tau^{-n}(r) \in R. \]
We'll show that $\sum_{n=1}^{\ell-1} r_n x^{-n} \in A'S^{-1}$ in order to conclude that $r \in A'S^{-1}$, because $f(r) - \sum_{n=1}^{\ell-1} r_n x^{-n} = r$. That is, we need to show that each $r_n \in A'S^{-1}$. Suppose, inductively, that for any element $\tilde r \in R$ with $d$-nilpotence index $m$ such that $m < \ell$, we have $\tilde r \in A'S^{-1}$. Note that for $n \in \{1, \dots, \ell\}$, we have
\[ d_{\ell-n}(r_n) = q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_{\ell-n}d_n\tau^{-n}(r) = q^{\frac{n(n+1)}{2}}(q-1)^{-n} \binom{\ell}{n}_q q^{-n\ell}\,\tau^{-n}d_\ell(r) = 0 \]
because $d_\ell(r) = 0$ by hypothesis. Hence, by the induction hypothesis, each $r_n \in A'S^{-1}$ for $1 \le n \le \ell - 1$. It follows that $r = f(r) - \sum_{n=1}^{\ell-1} r_n x^{-n}$ also belongs to $A'S^{-1}$. □

This equality of quotient rings reveals that if $A$ is a PI ring, then
\[ \operatorname{PIdeg} A = \operatorname{PIdeg} A' = \operatorname{PIdeg} R[y;\tau], \]
with the second equality arising from the derivation removing homomorphism $f$. This recovers the result of Jøndrup [21] without the assumption that $k$ has characteristic zero. We summarize the results of this section in the following theorem.

Theorem 3.7. Let $k$ be a field, $R$ a $k$-algebra and $A = R[x;\tau,\delta]$ a $q$-skew polynomial ring in which $\delta$ extends to a locally nilpotent, iterative h.$q$-s.$\tau$-d. $\{d_i\}$ on $R$ for some $q \in k^\times$, $q \ne 1$. Let $S$ be the Ore set in $A$ generated by $x$, and define a map $f : R \to AS^{-1}$ by
\[ f(r) = \sum_{n=0}^{\infty} q^{\frac{n(n+1)}{2}}(q-1)^{-n}\, d_n\tau^{-n}(r)\,x^{-n}. \]
Then $f$ is a $k$-algebra homomorphism, and it extends to an injective homomorphism $f : R[y;\tau] \to AS^{-1}$ sending $y$ to $x$. Furthermore, the extension $f : R[y^{\pm1};\tau] \to AS^{-1}$ is an isomorphism. So there is PI degree parity between $A$ and $R[y;\tau]$. Moreover, if $R$ is a noetherian domain, then $\operatorname{Fract} A \cong \operatorname{Fract} R[y;\tau]$.

4. Main Theorem

In the case where $A$ is an iterated skew polynomial ring, we would like to apply repeatedly the method presented above to remove all of the derivations and compare the resulting Ore localizations. We must first establish some facts about the behavior of h.$q$-s.$\tau$-d. when the variables adjoined to the coefficient ring are rearranged, and about iterated localization.
The results of these lemmas will ensure that after the induction step in the proof of the main theorem we are left with a ring to which the method of the preceding section applies. The first parts of the following lemmas hold in a broader class of skew polynomial rings and also when the $q$-skew condition is imposed. The final parts assert that h.$q$-s.$\tau$-d. are preserved when rearranging of the variables is permissible.

Lemma 4.1. Let $S = R[x;\tau,\delta]$, $A = R[x;\tau,\delta][y;\sigma]$, and $\hat A = R[x;\tau,\delta][y^{\pm1};\sigma]$, where $\sigma(R) = R$ and $\sigma(x) = \lambda x$ for some $\lambda \in k^\times$.
(1) Then $A = R[y;\sigma'][x;\tau',\delta']$ and $\hat A = R[y^{\pm1};\sigma'][x;\tau',\delta']$, where $\sigma' = \sigma|_R$, $\tau'|_R = \tau$, $\delta'|_R = \delta$, $\tau'(y) = \lambda^{-1}y$, and $\delta'(y) = 0$.
(2) If $(\tau,\delta)$ is $q$-skew, then so is $(\tau',\delta')$.
(3) Suppose further that $\delta$ extends to a h.$q$-s.$\tau$-d. $\{d_i\}$ on $R$, and that $\sigma d_i = \lambda^i d_i\sigma$ for all $i$. Then the $\tau'$-derivation $\delta'$ extends to a h.$q$-s.$\tau'$-d. $\{d_i'\}$ on $R[y^{\pm1};\sigma']$ such that the restrictions of the $d_i'$ to $R$ coincide with $d_i$, and $d_i'(y) = 0$ for all $i \ge 1$. Moreover, $\{d_i'\}$ restricts to a h.$q$-s.$\tau'$-d. on $R[y;\sigma']$.
  (a) If $\{d_i\}$ is iterative, then $\{d_i'\}$ is iterative.
  (b) If $\{d_i\}$ is locally nilpotent, then $\{d_i'\}$ is locally nilpotent.

Proof. (1) Routine details omitted so as not to try the patience of the reader.

(2) Suppose that $(\tau,\delta)$ is $q$-skew on $R$. We'll check that the two $\tau'$-derivations $\tau'^{-1}\delta'\tau'$ and $q\delta'$ agree on $R[y^{\pm1};\sigma']$. It suffices to check their agreement on a set of generators, $R \cup \{y, y^{-1}\}$. It is clear that $\tau'^{-1}\delta'\tau'(r) = q\delta'(r)$ for all $r \in R$. Since $\delta'(y) = 0$, they agree on $\{y, y^{-1}\}$ as well. So $(\tau',\delta')$ is $q$-skew.

(3) Define a sequence of maps $d_i' : R[y^{\pm1};\sigma'] \to R[y^{\pm1};\sigma']$ by
\[ d_i'\Bigl(\sum_j r_j y^j\Bigr) = \sum_j d_i(r_j)\,y^j. \]
Clearly these are $k$-linear maps, $d_i'(r) = d_i(r)$ for all $r \in R$; also $d_i'(y) = d_i(1)y = 0$ for $i \ge 1$, and $d_0'$ is the identity on $R[y^{\pm1};\sigma']$. Because $\delta$ extends to $\{d_i\}$ on $R$, we get
\[ d_1'\Bigl(\sum_j r_j y^j\Bigr) = \sum_j d_1(r_j)\,y^j = \sum_j \delta(r_j)\,y^j = \delta'\Bigl(\sum_j r_j y^j\Bigr) \]
for all $r_j \in R$. So $d_1' = \delta'$ on $R[y^{\pm1};\sigma']$.
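Now, for integers $j, m, n$, and elements $r, s \in R$,
\begin{align*}
d_n'\bigl((ry^j)(sy^m)\bigr) &= d_n'\bigl(r\sigma^j(s)\,y^{j+m}\bigr) = d_n\bigl(r\sigma^j(s)\bigr)\,y^{j+m} \\
&= \sum_{i=0}^{n} \tau^{n-i}d_i(r)\, d_{n-i}(\sigma^j(s))\,y^{j+m} \\
&= \sum_{i=0}^{n} \tau^{n-i}d_i(r)\, y^j \sigma^{-j}d_{n-i}(\sigma^j(s))\,y^m \\
&= \sum_{i=0}^{n} \tau^{n-i}d_i(r)\, y^j \lambda^{-j(n-i)} d_{n-i}(s)\,y^m \\
&= \sum_{i=0}^{n} (\tau')^{n-i}\bigl(d_i(r)\,y^j\bigr)\, d_{n-i}'(sy^m) \\
&= \sum_{i=0}^{n} (\tau')^{n-i}d_i'(ry^j)\, d_{n-i}'(sy^m).
\end{align*}
So $\{d_i'\}$ satisfies the product rule for a higher $\tau'$-derivation on $R[y^{\pm1};\sigma']$. Furthermore,
\[ \tau' d_i'\Bigl(\sum_j r_j y^j\Bigr) = \tau'\Bigl(\sum_j d_i(r_j)\,y^j\Bigr) = \sum_j \tau d_i(r_j)\,\lambda^{-j}y^j, \]
and
\[ d_i'\tau'\Bigl(\sum_j r_j y^j\Bigr) = d_i'\Bigl(\sum_j \tau(r_j)\,\lambda^{-j}y^j\Bigr) = \sum_j d_i\tau(r_j)\,\lambda^{-j}y^j = q^i \sum_j \tau d_i(r_j)\,\lambda^{-j}y^j, \]
giving the $q$-skew relation $d_i'\tau' = q^i\tau' d_i'$ on $R[y^{\pm1};\sigma']$. It follows directly from the definition of the maps $\{d_i'\}$ that their restrictions to the $k$-subalgebra $R[y;\sigma']$ also exhibit the properties of Definition 2.2.

If $\{d_i\}$ is iterative on $R$, then
\[ d_\ell' d_i'(ry^m) = d_\ell'\bigl(d_i(r)\,y^m\bigr) = d_\ell d_i(r)\,y^m = \binom{\ell+i}{i}_q d_{\ell+i}(r)\,y^m = \binom{\ell+i}{i}_q d_{\ell+i}'(ry^m) \]
for all $r \in R$, $m \in \mathbb{Z}$, and non-negative integers $\ell, i$. Hence, $\{d_i'\}$ is iterative on $R[y^{\pm1};\sigma']$.

Suppose that $\{d_i\}$ is locally nilpotent on $R$. By Lemma 2.7 we need only check that $\{d_i'\}$ is locally nilpotent on $R \cup \{y, y^{-1}\}$, a set of generators for $R[y^{\pm1};\sigma']$. This is clear because $d_i'(r) = d_i(r)$ for all $r \in R$, and $d_i'(y) = 0$ for all $i \ge 1$ by construction. □

Lemma 4.2. Let
\[ A = R[x_1;\tau_1,\delta_1][x_2;\tau_2,\delta_2] \cdots [x_n;\tau_n,\delta_n][y;\sigma], \qquad \hat A = R[x_1;\tau_1,\delta_1][x_2;\tau_2,\delta_2] \cdots [x_n;\tau_n,\delta_n][y^{\pm1};\sigma], \]
where $\sigma(R) = R$, and for all $i \in \{1, \dots, n\}$, $\sigma(x_i) = \lambda_i x_i$ for some nonzero $\lambda_i \in k$. Let $A_j = R[x_1;\tau_1,\delta_1][x_2;\tau_2,\delta_2] \cdots [x_j;\tau_j,\delta_j]$ for $j = 1, 2, \dots, n$, and $A_0 = R$.
(1) Then
\[ A = R[y;\sigma^*][x_1;\tau_1',\delta_1'][x_2;\tau_2',\delta_2'] \cdots [x_n;\tau_n',\delta_n'], \qquad \hat A = R[y^{\pm1};\sigma^*][x_1;\tau_1',\delta_1'][x_2;\tau_2',\delta_2'] \cdots [x_n;\tau_n',\delta_n'], \]
where $\sigma^* = \sigma|_R$, $\tau_i'|_{A_j} = \tau_i|_{A_j}$, $\delta_i'|_{A_j} = \delta_i|_{A_j}$, $\tau_i'(y) = \lambda_i^{-1}y$, and $\delta_i'(y) = 0$ for all $1 \le i \le n$ and $j \le i-1$.
(2) If $(\tau_i,\delta_i)$ is $q_i$-skew for any $1 \le i \le n$, then $(\tau_i',\delta_i')$ is also $q_i$-skew.
(3) Suppose that each $\delta_i$ extends to an h.$q_i$-s.$\tau_i$-d. $\{d_{i,p}\}_{p=0}^{\infty}$, and that $\sigma d_{i,p} = \lambda_i^p d_{i,p}\sigma$ on $A_{i-1}$ for all $i$ and $p$. Then each $\delta_i'$ extends to a h.$q_i$-s.$\tau_i'$-d. $\{d_{i,p}'\}_{p=0}^{\infty}$ on the algebra $R\langle y, y^{-1}, x_1, \dots, x_{i-1}\rangle$, where $d_{i,p}'$ coincides with $d_{i,p}$ on $A_j$, for $j < i$, and $d_{i,p}'(y) = 0$ for $p \ge 1$.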
Moreover, $\{d_{i,p}'\}$ restricts to a h.$q_i$-s.$\tau_i'$-d. on $R\langle y, x_1, \dots, x_{i-1}\rangle$.
  (a) If $\{d_{i,p}\}$ is iterative for any $1 \le i \le n$, then $\{d_{i,p}'\}$ is iterative.
  (b) If $\{d_{i,p}\}$ is locally nilpotent for any $1 \le i \le n$, then $\{d_{i,p}'\}$ is locally nilpotent.

Proof. (1) The condition $\sigma(x_i) = \lambda_i x_i$ for all $i$ implies that $\sigma(A_i) = A_i$. We will use induction on $n$ to prove the result. Lemma 4.1 proves the case $n = 1$. Suppose the result holds for all $m < n$, and consider $A = A_{n-1}[x_n;\tau_n,\delta_n][y;\sigma]$. Application of Lemma 4.1, and then the induction hypothesis, gives
\begin{align*}
A = A_{n-1}[x_n;\tau_n,\delta_n][y;\sigma] &= A_{n-1}[y;\sigma'][x_n;\tau_n',\delta_n'] \\
&= R[x_1;\tau_1,\delta_1] \cdots [x_{n-1};\tau_{n-1},\delta_{n-1}][y;\sigma'][x_n;\tau_n',\delta_n'] \\
&= R[y;\sigma^*][x_1;\tau_1',\delta_1'] \cdots [x_n;\tau_n',\delta_n'],
\end{align*}
with the desired conditions met by the automorphisms and derivations, completing the induction. Similarly, $\hat A = R[y^{\pm1};\sigma^*][x_1;\tau_1',\delta_1'] \cdots [x_n;\tau_n',\delta_n']$.

(2) Consider the two $\tau_i'$-derivations $\tau_i'^{-1}\delta_i'\tau_i'$ and $q_i\delta_i'$ on the ring
\[ R[y^{\pm1};\sigma^*][x_1;\tau_1',\delta_1'] \cdots [x_{i-1};\tau_{i-1}',\delta_{i-1}'] \]
for $1 \le i \le n$. Since $(\tau_i,\delta_i)$ is $q_i$-skew, it is clear that these two $\tau_i'$-derivations agree on $A_{i-1}$. And since $\delta_i'(y) = 0$ for all $i = 1, \dots, n$, these two $\tau_i'$-derivations agree on a full set of generators of $R[y^{\pm1};\sigma^*][x_1;\tau_1',\delta_1'] \cdots [x_{i-1};\tau_{i-1}',\delta_{i-1}']$. Hence, $\delta_i'\tau_i' = q_i\tau_i'\delta_i'$.

(3) Suppose the result holds for the algebra $R[x_1;\tau_1,\delta_1] \cdots [x_{n-1};\tau_{n-1},\delta_{n-1}][y^{\pm1};\sigma]$. Then Lemma 4.1 may be applied, with $A_{n-1}$ providing the coefficients, to get
\[ A_{n-1}[x_n;\tau_n,\delta_n][y^{\pm1};\sigma] = A_{n-1}[y^{\pm1};\sigma'][x_n;\tau_n',\delta_n'], \]
where $\delta_n'$ extends to a h.$q_n$-s.$\tau_n'$-d. $\{d_{n,p}'\}$ on $A_{n-1}[y^{\pm1}]$. The induction hypothesis gives the result. □

Definition 4.3. For a $k$-algebra $A$ and $a, b \in A$, we say that $a$ and $b$ scalar commute if there is an element $\alpha \in k^\times$ such that $ab = \alpha ba$. We may also say that $a$ and $b$ $\alpha$-commute.

In the following two lemmas, we let $D$ denote the division ring of fractions for the noetherian domain $A$. When comparing localizations of $A$, we identify them as subrings of $D$.

Lemma 4.4.
Let $A$ be a noetherian domain, $S \subseteq A \setminus \{0\}$ an Ore set. Let $T$ be an Ore set in $AS^{-1} \setminus \{0\}$ with $S \subseteq T$.
(1) Then there exists an Ore set $\tilde T \subseteq A \setminus \{0\}$ with $S \subseteq \tilde T$ such that $A\tilde T^{-1} = (AS^{-1})T^{-1}$.
(2) Suppose $A$ is a $k$-algebra and $S$ is generated by $s_1, \dots, s_n$ satisfying $s_i s_j = \gamma_{ij}s_j s_i$ for all $i, j$ and some $\gamma_{ij} \in k^\times$. Further suppose that $T$ is generated by $S \cup \{t\}$ for some $t \in AS^{-1}$ that satisfies $s_i t = \lambda_i t s_i$ for all $i$ and some $\lambda_i \in k^\times$. Then there exist a cyclic Ore set $\hat T \subseteq A \setminus \{0\}$ and an $(n+1)$-generator Ore set $\hat S \subseteq A \setminus \{0\}$ such that $S \subseteq \hat S$, and $(AS^{-1})T^{-1} = A\hat T^{-1} = A\hat S^{-1}$.

Proof. (1) Consider $T \cap A$, the subset in $T$ of elements with a denominator of 1. Clearly, this is a multiplicative set in $A$ which contains $S$. Set $\tilde T = T \cap A$. Let $a \in \tilde T$ and $\alpha \in A$. Then $a \in T$, and since $\alpha \in AS^{-1}$, there exist $b' \in T$ and $\beta' \in AS^{-1}$ such that $a\beta' = \alpha b'$. By [16, 10.2], there exist $y \in S$ and $b, \beta \in A$ such that $\beta' = \beta y^{-1}$ and $b' = by^{-1}$; hence, $a\beta y^{-1} = \alpha by^{-1}$ in $AS^{-1}$. It follows that $a\beta = \alpha b$ in $A$. So $\tilde T$ satisfies the right Ore condition in $A$, and the left Ore condition by symmetry. By the universal property, $A\tilde T^{-1} \cong (AS^{-1})T^{-1}$. As subrings of $D$, we have $A\tilde T^{-1} = (AS^{-1})T^{-1}$.

(2) The generating element $t$ has the form $t = \bar a(s_1^{m_1}s_2^{m_2} \cdots s_n^{m_n})^{-1}$ for some $m_i \in \mathbb{N}$ and $\bar a \in A$. For any $s_i \in S$, we have
\[ s_i\bar a(s_1^{m_1}s_2^{m_2} \cdots s_n^{m_n})^{-1} = \lambda_i\bar a(s_1^{m_1}s_2^{m_2} \cdots s_n^{m_n})^{-1}s_i = \mu\lambda_i\bar a s_i(s_1^{m_1}s_2^{m_2} \cdots s_n^{m_n})^{-1}, \]
where $\mu$ is a product of powers of the $\gamma_{ij}$. So $\bar a$ scalar commutes with the generators of $S$ via the relations $s_i\bar a = \mu\lambda_i\bar a s_i$. Let $\hat S$ be the multiplicative set generated by $\bar a, s_1, \dots, s_n$ in $A$, and $\hat T$ the multiplicative set generated by $\bar a s_1 s_2 \cdots s_n$ in $A$. Recall that $(AS^{-1})T^{-1} = A\tilde T^{-1}$, where $\tilde T = T \cap A$ from part (1). From the scalar commuting relations it follows that any element $a\tilde t^{-1} \in A\tilde T^{-1}$ may be written in the form $b(\bar a s_1 \cdots s_n)^{-m}$ for some $m \in \mathbb{N} \cup \{0\}$, $b \in A$, or in the form $c\,\bar a^{-\ell_{n+1}}s_1^{-\ell_1} \cdots s_n^{-\ell_n}$, for $\ell_j \in \mathbb{N} \cup \{0\}$, $c \in A$.
So we conclude that $\hat S$ and $\hat T$ are Ore sets in $A$ and that $(AS^{-1})T^{-1} = A\hat T^{-1} = A\hat S^{-1}$. □

Lemma 4.5. Let $A$ be a noetherian domain, $S_1 \subseteq A \setminus \{0\}$ an Ore set, and for integers $j = 2, \dots, n$ let $S_j$ be an Ore set in $((AS_1^{-1}) \cdots )S_{j-1}^{-1} \setminus \{0\}$ with $S_{j-1} \subseteq S_j$.
(1) Then there exists an Ore set $T \subseteq A \setminus \{0\}$ such that $AT^{-1} = (((AS_1^{-1})S_2^{-1}) \cdots )S_n^{-1}$.
(2) Suppose $A$ is a $k$-algebra, $S_1$ is generated by $s_1$, and for $j = 2, \dots, n$, $S_j$ is generated by $S_{j-1} \cup \{s_j\}$, where $s_i s_j = \gamma_{ij}s_j s_i$ for some multiplicatively antisymmetric matrix $(\gamma_{ij}) \in M_n(k^\times)$. Then there are a cyclic Ore set $\hat T \subseteq A$ and an $n$-generator Ore set $\hat S \subseteq A$ such that $S_1 \subseteq \hat S$, and $((AS_1^{-1})S_2^{-1}) \cdots S_n^{-1} = A\hat T^{-1} = A\hat S^{-1}$.

Proof. (1) The proof proceeds by induction on $n$. The case $n = 1$ is covered in the lemma above. Suppose that for all $j \le n-1$ there exists an Ore set $T_j \subseteq A \setminus \{0\}$ such that $AT_j^{-1} = (((AS_1^{-1})S_2^{-1}) \cdots )S_j^{-1}$. Then the equality $AT_{n-1}^{-1} = (((AS_1^{-1})S_2^{-1}) \cdots )S_{n-1}^{-1}$ identifies an Ore set $T_n \subseteq AT_{n-1}^{-1} \setminus \{0\}$ such that
\[ (AT_{n-1}^{-1})T_n^{-1} = (((AS_1^{-1})S_2^{-1}) \cdots S_{n-1}^{-1})S_n^{-1}. \]
Furthermore, Lemma 4.4 implies the existence of an Ore set $T \subseteq A \setminus \{0\}$ such that
\[ AT^{-1} = (AT_{n-1}^{-1})T_n^{-1} = (((AS_1^{-1})S_2^{-1}) \cdots S_{n-1}^{-1})S_n^{-1}. \]

(2) Suppose, inductively, that the following hold:
  (i) there is a cyclic Ore set $\hat T_{n-1} \subseteq A \setminus \{0\}$ generated by $s_1\bar a_2 \cdots \bar a_{n-1}$;
  (ii) there is an $(n-1)$-generator Ore set $\hat S_{n-1} \subseteq A \setminus \{0\}$ with $S_1 \subseteq \hat S_{n-1}$ and generators $s_1, \bar a_2, \bar a_3, \dots, \bar a_{n-1}$;
  (iii) the $\bar a_i$ scalar commute with $s_1$ and with each other;
  (iv) $((AS_1^{-1})S_2^{-1}) \cdots S_{n-1}^{-1} = A\hat T_{n-1}^{-1} = A\hat S_{n-1}^{-1}$ as subrings of $D$.
Then $s_n = \bar a_n(s_1\bar a_2 \cdots \bar a_{n-1})^{-r}$ for some $\bar a_n \in A$ and $r \in \mathbb{N}$. Using the relations $s_i s_j = \gamma_{ij}s_j s_i$, routine calculations show that the $\bar a_i$ scalar commute with the $s_j$, and also with each other, for all $i, j$. Let $\hat T$ be the multiplicative set generated by $s_1\bar a_2 \cdots \bar a_n$, and let $\hat S$ be the multiplicative set generated by $s_1, \bar a_2, \bar a_3, \dots, \bar a_n$. Then
\[ ((AS_1^{-1})S_2^{-1}) \cdots S_n^{-1} = (A\hat T_{n-1}^{-1})S_n^{-1} = AT^{-1} \]
from part (1).
Using Lemma 4.4, we conclude that $\hat T$ and $\hat S$ are Ore sets in $A$ and that $AT^{-1} = A\hat T^{-1} = A\hat S^{-1}$. □

In the proof of the main theorem, we will use without mention the facts gathered here. For greater detail on these statements, see [16, 10X, 10Y] and [10, 1.4].
(1) Given a noetherian ring $A$ and a normal element $x \in A$, the multiplicative set generated by $x$ is an Ore set.
(2) The multiplicative set generated by a nonempty family of right Ore sets is a right Ore set.
(3) Let $A = R[x;\tau,\delta]$, and $S$ a right denominator set in $R$ such that $\tau(S) = S$. Then $S$ is a right denominator set in $A$ and the identity map on $AS^{-1}$ extends to an isomorphism of $AS^{-1}$ onto $(RS^{-1})[x;\tau,\delta]$ sending $x1^{-1}$ to $x$. Note that if $A$ is a $k$-algebra, $\tau$, $\delta$ are $k$-linear, and $\tau(k^\times S) = k^\times S$, then the result holds because $S$ is a denominator set if and only if $k^\times S$ is a denominator set.

Theorem 4.6. Let $R$ be a $k$-algebra and noetherian domain, $A = R[x_1;\tau_1,\delta_1] \cdots [x_n;\tau_n,\delta_n]$, where each $\tau_i$ is a $k$-linear automorphism of $R\langle x_1, \dots, x_{i-1}\rangle$ such that $\tau_i(x_j) = \lambda_{ij}x_j$ for all $i, j$ with $1 \le j < i \le n$ and some $\lambda_{ij} \in k^\times$, and where each $\delta_i$ is a $k$-linear $\tau_i$-derivation. Assume that there exist elements $q_i \in k^\times$ with $q_i \ne 1$ such that $\delta_i\tau_i = q_i\tau_i\delta_i$, and that $\delta_i$ extends to a locally nilpotent, iterative h.$q_i$-s.$\tau_i$-d. on $R\langle x_1, \dots, x_{i-1}\rangle$ for $i = 1, \dots, n$.
(1) Then there exists an Ore set $T \subseteq A$ generated by $n$ elements of $A$ such that
\[ AT^{-1} \cong R[y_1^{\pm1};\tau_1][y_2^{\pm1};\tau_2'] \cdots [y_n^{\pm1};\tau_n'], \]
where $\tau_i'|_R = \tau_i$ and $\tau_i'(y_j) = \lambda_{ij}y_j$ for all $i, j$ with $1 \le j < i \le n$.
(2) There is PI degree parity between $A$ and $R[y_1;\tau_1][y_2;\tau_2'] \cdots [y_n;\tau_n']$. Moreover, these algebras have isomorphic division rings of fractions.

Proof. (1) Suppose, inductively, that we have
\[ R[x_1;\tau_1,\delta_1][y_2^{\pm1};\tau_2'] \cdots [y_n^{\pm1};\tau_n'] \cong AS_2^{-1}, \]
where the restriction of $\tau_i'$ to $R\langle x_1\rangle$ coincides with $\tau_i$, $\tau_i'(y_m) = \lambda_{im}y_m$ for $2 \le i \le n$ and $1 < m < i$, and $S_2$ is an Ore set in $A$ generated by $n-1$ elements from $A$.
Then by Lemma 4.2
\[ AS_2^{-1} \cong R[y_2^{\pm1};\tau_2''] \cdots [y_n^{\pm1};\tau_n''][x_1;\tau_1',\delta_1'] \tag{9} \]
where the restrictions of $\tau_1'$ and $\delta_1'$ to $R$ coincide with $\tau_1$ and $\delta_1$, $\tau_1'(y_j) = \lambda_{j1}^{-1}y_j$, $\delta_1'(y_j) = 0$, and $\tau_i''$ coincides with the restriction of $\tau_i$ to $R\langle y_2, \dots, y_{i-1}\rangle$ for $2 \le i \le n$. Observe that by Lemmas 4.2 and 2.7 we also have $\delta_1'\tau_1' = q_1\tau_1'\delta_1'$, and that $\delta_1'$ extends to a locally nilpotent iterative h.$q_1$-s.$\tau_1'$-d. on $R\langle y_2^{\pm1}, \dots, y_n^{\pm1}\rangle$. Then applying the derivation removing homomorphism to the right hand side of (9) gives an isomorphism
\[ (AS_2^{-1})T_1^{-1} \cong R[y_2^{\pm1};\tau_2'] \cdots [y_n^{\pm1};\tau_n'][y_1^{\pm1};\tau_1'], \]
where $T_1 \subseteq AS_2^{-1}$ is an Ore set generated by one element of $AS_2^{-1}$. Then Lemma 4.5 and a reordering of variables show the existence of an Ore set $T \subseteq A$, generated by $n$ elements of $A$, such that
\[ AT^{-1} \cong R[y_1^{\pm1};\tau_1][y_2^{\pm1};\tau_2'] \cdots [y_n^{\pm1};\tau_n']. \]
(2) This follows from part (1). □

Corollary 4.7. Let $A = k[x_1;\tau_1,\delta_1] \cdots [x_n;\tau_n,\delta_n]$ with the hypotheses as in Theorem 4.6. Set $\lambda = (\lambda_{ij})$. Then
(1) $A$ and $\mathcal{O}_\lambda(k^n)$ have isomorphic division rings of fractions.
(2) $A$ is a PI-algebra if and only if all the $\lambda_{ij}$ are roots of unity, in which case $A$ and $\mathcal{O}_\lambda(k^n)$ have the same PI degree.

In general, identification of the generators for the Ore set $T$ in Theorem 4.6 is very cumbersome. To illustrate the computations on a fairly short iterated skew polynomial ring, we consider the multiparameter second quantized Weyl algebra $A_2^{Q,\Gamma}(k)$. Here, $Q = (q_1, q_2) \in (k^\times)^2$, $q_i \ne 1$ for all $i$, and $\Gamma = (\gamma_{ij}) \in M_2(k^\times)$ with $\gamma_{ii} = 1$ and $\gamma_{21} = \gamma_{12}^{-1}$. The algebra $A_2^{Q,\Gamma}(k)$ may be presented as an iterated skew polynomial ring of the form
\[ k[y_1][x_1;\tau_2,\delta_2][y_2;\tau_3][x_2;\tau_4,\delta_4], \]
where the $\tau_i$ are $k$-linear automorphisms and the $\delta_{2i}$ are $k$-linear $\tau_{2i}$-derivations such that
\begin{align*}
\tau_2(y_1) &= q_1y_1, & \delta_2(y_1) &= 1, \\
\tau_3(y_1) &= \gamma_{12}^{-1}y_1, & & \\
\tau_3(x_1) &= \gamma_{12}x_1, & & \\
\tau_4(y_1) &= q_1\gamma_{12}y_1, & \delta_4(y_1) &= 0, \\
\tau_4(x_1) &= q_1^{-1}\gamma_{21}x_1, & \delta_4(x_1) &= 0, \\
\tau_4(y_2) &= q_2y_2, & \delta_4(y_2) &= (q_1-1)y_1x_1 + 1.
\end{align*}
For greater detail about this algebra, the reader is referred to [1], [23], [12], and [15]. Routine computations show that the pair $(\tau_2,\delta_2)$ is a $q_1$-skew derivation and that $(\tau_4,\delta_4)$ is a $q_2$-skew derivation. To show that $\delta_2$ and $\delta_4$ are locally nilpotent, it suffices to check for local nilpotence on a set of generators. Given their definitions, this is accomplished by verifying their action on powers of $y_1$ and $y_2$:
\[ \delta_2^i(y_1^n) = \begin{cases} \dfrac{(n)!_{q_1}}{(n-i)!_{q_1}}\, y_1^{n-i} & i \le n, \\ 0 & i > n, \end{cases} \qquad \delta_4^i(y_2^n) = \begin{cases} \dfrac{(n)!_{q_2}}{(n-i)!_{q_2}}\, [\delta_4(y_2)]^i\, y_2^{n-i} & i \le n, \\ 0 & i > n. \end{cases} \]
Using Theorem 2.8 we have a h.$q_1$-s.$\tau_2$-d. $\{d_{2,i}\}$ extending $\delta_2$, and a h.$q_2$-s.$\tau_4$-d. $\{d_{4,i}\}$ extending $\delta_4$, both of which are iterative and locally nilpotent.

Let $S_2 \subseteq A_2^{Q,\Gamma}(k)$ be the multiplicative set generated by $x_2$. The derivation removing homomorphism induces an isomorphism
\[ \Phi : k[y_1][x_1;\tau_2,\delta_2][y_2;\tau_3][z_2^{\pm1};\tau_4] \longrightarrow A_2^{Q,\Gamma}(k)S_2^{-1} \]
whose action on generators is given by
\begin{align*}
y_1 &\mapsto y_1, & z_2 &\mapsto x_2, \\
x_1 &\mapsto x_1, & y_2 &\mapsto y_2 + (q_2-1)^{-1}\bigl[(q_1-1)y_1x_1 + 1\bigr]x_2^{-1}.
\end{align*}
For simplicity, label the domain of $\Phi$ as $BZ^{-1}$. Let $X_1 \subseteq BZ^{-1}$ be the Ore set generated by $z_2$ and $x_1$. Applying the derivation removing homomorphism to $BZ^{-1}$ induces an isomorphism
\[ \Psi : k[y_1][z_1^{\pm1};\tau_2][y_2;\tau_3][z_2^{\pm1};\tau_4'] \longrightarrow (BZ^{-1})X_1^{-1} \]
whose action on generators is given by
\begin{align*}
z_1 &\mapsto x_1, & y_2 &\mapsto y_2, \\
z_2 &\mapsto z_2, & y_1 &\mapsto y_1 + (q_1-1)^{-1}x_1^{-1}.
\end{align*}
The derivation removing homomorphism need not be employed again to achieve the result. Through iterated localization we find that there is an Ore set $T \subseteq A_2^{Q,\Gamma}(k)$ such that
\[ A_2^{Q,\Gamma}(k)T^{-1} \cong k[y_1^{\pm1}][x_1^{\pm1};\tau_2][y_2^{\pm1};\tau_3][x_2^{\pm1};\tau_4], \]
and $T$ is generated by the four elements $x_2$, $x_1$, $y_2x_2(q_2-1) + y_1x_1(q_1-1) + 1$, and $y_1x_1(q_1-1) + 1$. Note that we recover the result of [22, Theorem 5].

5. Examples

We will demonstrate how each of the following $k$-algebras satisfies all the conditions of Theorem 2.8.
Then Corollary 4.7 is applied to obtain an isomorphism of quotient division rings (thereby confirming the quantum Gel'fand-Kirillov conjecture) and PI degree parity with a multiparameter quantum affine space.

When calculating the PI degree of a quantum affine space, we encounter an antisymmetric, or skew-symmetric, integral matrix. As proved in [30, Theorem IV.1], such a matrix is congruent to a matrix in skew normal form.

Theorem 5.1. [Newman] Let $A$ be a skew-symmetric matrix of rank $r$ which belongs to $M_n(R)$, where the commutative principal ideal domain $R$ is not of characteristic 2. Then $r = 2s$ and $A$ is congruent to a matrix in block diagonal form
\[ S = \begin{pmatrix}
0 & h_1 & & & & & & & \\
-h_1 & 0 & & & & & & & \\
 & & 0 & h_2 & & & & & \\
 & & -h_2 & 0 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & 0 & h_s & & \\
 & & & & & -h_s & 0 & & \\
 & & & & & & & 0 & \\
 & & & & & & & & \ddots
\end{pmatrix} \]
where $h_i \mid h_{i+1}$, $1 \le i \le s-1$.

The same result, in the language of alternating bilinear forms, can be found in [3, Section 5.1]. The matrix $S$ in Theorem 5.1 is clearly equivalent to the more familiar Smith normal form, $\operatorname{diag}(h_1, h_1, h_2, h_2, \dots, h_s, h_s, 0, 0, \dots, 0)$, where the diagonal entries are the invariant factors of the matrix $A$. In the examples that follow, we outline the operations necessary to obtain the Smith normal form.

Definition 5.2. Let $A = k[x_1;\tau_1,\delta_1] \cdots [x_n;\tau_n,\delta_n]$ and $A' = k[x_1;\tau_1] \cdots [x_n;\tau_n]$ be iterated skew polynomial rings.
(1) If there exists $Q = (q_1, \dots, q_n) \in (k^\times)^n$ such that $\delta_i\tau_i = q_i\tau_i\delta_i$ for $i = 1, \dots, n$, then $A$ is called an iterated $Q$-skew polynomial ring.
(2) If there exist $\lambda_{ji} \in k^\times$ such that $\tau_j(x_i) = \lambda_{ji}x_i$ for all $i < j$, then set $\lambda_{ij} = \lambda_{ji}^{-1}$ and $\lambda_{ii} = 1$ for all $i$. We call $\Lambda = (\lambda_{ij}) \in M_n(k^\times)$ the matrix of relations for $A'$.

Lemma 5.3. Let $C$ be a commutative $k$-algebra, $A$ a $C$-algebra, $B \subseteq A$ a $C$-subalgebra generated by $\{b_1, b_2, \dots\}$. Let $\tau$ be a $C$-algebra automorphism of $A$, and $\delta$ a $u$-skew $\tau$-derivation on $A$ for some unit $u \in C$. If $\tau(b_j) \in B$ and $\delta^n(b_j) \in (n)!_u B$ for all $j, n$, then $\delta^n(B) \subseteq (n)!_u B$ for all $n$.

Proof.
Note that $\tau(b_j) \in B$ for all $j$ implies that $\tau(B) \subseteq B$ and hence we have $\tau\bigl((j)!_u B\bigr) \subseteq (j)!_u B$ for all $j$. Suppose that for integers $m \ge 1$ and $1 \le \ell \le m-1$, we have $\delta^i(b_{j_1} \cdots b_{j_\ell}) \in (i)!_u B$ for all $i$, and all choices of $j_1, \dots, j_\ell$. Then
\[ \delta^n(b_{j_1} \cdots b_{j_m}) = \sum_{i=0}^{n} \binom{n}{i}_u \tau^{n-i}\delta^i(b_{j_1} \cdots b_{j_{m-1}})\,\delta^{n-i}(b_{j_m}) \in \sum_{i=0}^{n} \binom{n}{i}_u (i)!_u (n-i)!_u B \subseteq (n)!_u B \]
for all $n$ and all $j_1, \dots, j_m$ by induction. □

For a first family of examples, we take odd-dimensional quantum Euclidean spaces. The even-dimensional ones will be covered in Example 5.4.

5.1. The coordinate ring of odd-dimensional quantum Euclidean space; $\mathcal{O}_q(\mathfrak{o}k^{2n+1})$.

For $q \in k^\times$, assuming $q$ has a (fixed) square root $q^{1/2} \in k$, the $k$-algebra $\mathcal{O}_q(\mathfrak{o}k^{2n+1})$ may be presented as an iterated skew polynomial ring
\[ k[w][y_1;\sigma_1][x_1;\tau_1,\delta_1] \cdots [y_n;\sigma_n][x_n;\tau_n,\delta_n] \]
with automorphisms $\sigma_i$, $\tau_i$ and derivations $\delta_i$ defined by
\begin{align*}
\sigma_i(w) &= q^{-1}w && \text{all } i, & \tau_i(w) &= qw && \text{all } i, \\
\sigma_i(y_j) &= q^{-1}y_j && j < i, & \sigma_i(x_j) &= q^{-1}x_j && j < i, \\
\tau_i(y_j) &= qy_j && i \ne j, & \tau_i(x_j) &= qx_j && j < i, \\
\tau_i(y_i) &= y_i && \text{all } i, & \delta_i(w) &= \delta_i(x_j) = \delta_i(y_j) = 0 && j < i, \\
\delta_i(y_i) &= (q^{1/2} - q^{3/2})w^2 + (1 - q^2)\sum_{\ell < i} y_\ell x_\ell && \text{all } i. & & &&
\end{align*}
Quantum Euclidean spaces have been studied since 1990, when they were introduced by Reshetikhin et al. in [36]. The three-dimensional case has applications to the structure of space-time at small distances. Musson simplified the original set of relations in [29], and Oh further simplified them, renaming the generators $\omega, x_i, y_i$ in [31]. Here, we have made a change to Oh's variables, $y_i \mapsto q^i y_i$, to obtain the relations in our presentation of $\mathcal{O}_q(\mathfrak{o}k^{2n+1})$.

Routine computations show that $\tau_i^{-1}\delta_i\tau_i(y_i) = q^{-2}\delta_i(y_i)$ for all $i$, and so we conclude that each $(\tau_i,\delta_i)$ is a $q^{-2}$-skew derivation. We may present the analogous $k[t^{\pm1}]$-algebra $\mathcal{O}_t(\mathfrak{o}k[t^{\pm1}]^{2n+1})$ as an iterated skew polynomial ring with coefficient ring $k[t^{\pm1}]$ and generators w, yi, xi for i = 1, . . .
, n,
\[ k[t^{\pm1}][w][y_1;\bar\sigma_1][x_1;\bar\tau_1,\bar\delta_1] \cdots [y_n;\bar\sigma_n][x_n;\bar\tau_n,\bar\delta_n] \]
where the automorphisms and derivations are defined analogously to those of the algebra $\mathcal{O}_q(\mathfrak{o}k^{2n+1})$ with $t \in k[t^{\pm1}]$ replacing $q \in k^\times$. So each $(\bar\tau_i,\bar\delta_i)$ is a $t^{-2}$-skew derivation. It is immediate that
\[ \mathcal{O}_t(\mathfrak{o}k[t^{\pm1}]^{2n+1})/\langle t - q\rangle \cong \mathcal{O}_q(\mathfrak{o}k^{2n+1}) \]
with each $\bar\tau_i$ and $\bar\delta_i$ reducing to $\tau_i$ and $\delta_i$ respectively. Let $A_j$ denote the $k[t^{\pm1}]$-subalgebra generated by $w$, $y_m$, $x_m$ for $m < j$, and $y_j$. To show that $\bar\delta_j^{\,i}(A_j) \subseteq (i)!_{t^{-2}}A_j$, we apply Lemma 5.3, noting that $\bar\delta_j^{\,i}(y_j)$ has been given for $i = 1$ and is zero for $i > 1$. So, by Theorem 2.8, each $\delta_i$ in our presentation of $\mathcal{O}_q(\mathfrak{o}k^{2n+1})$ extends to an iterative, locally nilpotent h.$q^{-2}$-s.$\tau_i$-d. on an appropriate subalgebra. Then Corollary 4.7 gives
\[ \operatorname{Fract}\mathcal{O}_q(\mathfrak{o}k^{2n+1}) \cong \operatorname{Fract}\mathcal{O}_B(k^{2n+1}), \]
where the matrix of relations is
\[ B = \begin{pmatrix}
1 & q & q^{-1} & q & q^{-1} & \cdots & q & q^{-1} \\
q^{-1} & 1 & 1 & q & q^{-1} & \cdots & q & q^{-1} \\
q & 1 & 1 & q & q^{-1} & \cdots & q & q^{-1} \\
q^{-1} & q^{-1} & q^{-1} & 1 & 1 & \cdots & q & q^{-1} \\
q & q & q & 1 & 1 & \cdots & q & q^{-1} \\
\vdots & & & & & \ddots & & \vdots \\
q^{-1} & q^{-1} & q^{-1} & q^{-1} & q^{-1} & \cdots & 1 & 1 \\
q & q & q & q & q & \cdots & 1 & 1
\end{pmatrix}. \]
If $q \in k^\times$ is a root of unity, we may assume without loss of generality that it is a primitive $r$th root of unity. Then the powers of $q$ from the matrix $B$ become the entries of a $(2n+1) \times (2n+1)$ integer matrix
\[ B' = \begin{pmatrix}
0 & 1 & -1 & 1 & -1 & \cdots & 1 & -1 \\
-1 & 0 & 0 & 1 & -1 & \cdots & 1 & -1 \\
1 & 0 & 0 & 1 & -1 & \cdots & 1 & -1 \\
-1 & -1 & -1 & 0 & 0 & \cdots & 1 & -1 \\
1 & 1 & 1 & 0 & 0 & \cdots & 1 & -1 \\
\vdots & & & & & \ddots & & \vdots \\
-1 & -1 & -1 & -1 & -1 & \cdots & 0 & 0 \\
1 & 1 & 1 & 1 & 1 & \cdots & 0 & 0
\end{pmatrix}. \]
Now, $\operatorname{PIdeg}\mathcal{O}_q(\mathfrak{o}k^{2n+1})$ can be computed from Theorem 1.2(2) using the matrix $B'$. The cardinality of the image will not be changed if we first perform some row reductions on $B'$. Letting $N = 2n+1$, $n > 2$, we manipulate the rows as follows.
• For i = 2, 4, 6, . . . , N − 1, replace row i with row i + row (i + 1).
• For i = N, N − 2, N − 4, . . . , 5, replace row i with row i − row (i − 2).
• Replace row 5 with row 5 − row 1.
• For i = 2, 4, 6, . . . , N − 5, replace row i with row i − 2 row (i + 5).
• Multiply the even numbered rows, except row 2n− 2, by −1. The resulting matrix has 2n pivots and one zero row. We put the rows in this order 3, 1, 5, 7, 2, 9, 4, 11, 6, 13, . . . , 2i, 2i+ 7, . . . , N,N − 5, N − 3, N − 1 to place the pivots on the main diagonal and the zero row in the last position. Then we have a matrix of this form  1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 0 1 −1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 0 0 2 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 0 0 0 1 1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ 0 0 0 0 4 ∗ ∗ ∗ ∗ ∗ ∗ ∗ 0 0 0 0 0 1 1 ∗ ∗ ∗ ∗ ∗ 0 0 0 0 0 0 4 ∗ ∗ ∗ ∗ ∗ 0 0 0 0 0 0 0 . . . ∗ ∗ ∗ ∗ 0 0 0 0 0 0 0 0 1 1 ∗ ∗ 0 0 0 0 0 0 0 0 0 4 ∗ ∗ 0 0 0 0 0 0 0 0 0 0 2 −2 0 0 0 0 0 0 0 0 0 0 0 0  The diagonal entries of this echelon matrix do not yet reveal the size of its image because the pivot in row three does not divide all of the (suppressed) entries in its row when n ≥ 3. So more row reduction is needed. First replace row 3 with row 3 + row(4i+ 2). For n even and j = 5, 7, 9, . . . , 2n− 3, replace row j as follows: for j = 4p+ 1, p ≥ 1, use row j + i=p+1 2 · row(4i) + row(2n); for j = 4p+ 3, p ≥ 1, use row j + i=p+1 2 · row(4i+ 2). 26 HEIDI HAYNAL For n odd and j = 5, 7, 9, . . . , 2n− 5, replace row j as follows: for j = 4p+ 1, p ≥ 1, use row j + i=p+1 2 · row(4i) + 2 · row(2n); for j = 4p+ 3, p ≥ 1, use row j + i=p+1 2 · row(4i+ 2) + row(2n). Then add row(2n) to row(2n − 3), and add 2·row(2n) to row(2n − 1). For integers 4 ≤ j ≤ 2n − 1, with j 6≡ 2(mod 4), add (−1)jcol 3 to col j. Subtract col(2n + 1) from col 3; add row 3 to row(2n − 2); and subtract 2·row 3 from row(2n). The result is an upper echelon matrix in which each pivot divides all the nonzero entries in its row. So it is trivial to diagonalize by column operations. The Smith normal form for n odd is diag(1, 1, . . . , 1, 4, 4, . . . , 4, 0) with n+1 ones and n− 1 fours. The Smith normal form for n even is diag(1, 1, . . . , 1, 2, 2, 4, 4, . . . , 4, 0) with n ones, two twos, and n − 2 fours. 
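The step from a Smith normal form to a PI degree can be checked numerically. The following is a minimal sketch, assuming (as the computations in this section use it) that Theorem 1.2(2) gives the PI degree as the square root of the cardinality of the image of the map $\mathbb{Z}^N \to (\mathbb{Z}/r\mathbb{Z})^N$ induced by $B'$. Under that assumption an invariant factor $d$ contributes a factor $r/\gcd(d, r)$ to the cardinality. The function names are illustrative and do not come from the paper.

```python
from math import gcd, isqrt


def image_cardinality(invariant_factors, r):
    """Size of the image of Z^N -> (Z/rZ)^N for an integer matrix whose
    Smith normal form is diag(invariant_factors).

    Each invariant factor d contributes r // gcd(d, r); a zero invariant
    factor contributes 1 because gcd(0, r) == r.
    """
    size = 1
    for d in invariant_factors:
        size *= r // gcd(d, r)
    return size


def pi_degree(invariant_factors, r):
    """Square root of the image cardinality (assumed form of Theorem 1.2(2))."""
    size = image_cardinality(invariant_factors, r)
    root = isqrt(size)
    if root * root != size:
        raise ValueError("image cardinality is not a perfect square")
    return root


def snf_odd_euclidean(n):
    """Smith normal form stated in the text for the (2n+1)-dimensional case."""
    if n % 2 == 1:
        # n odd: n + 1 ones, n - 1 fours, one zero
        return [1] * (n + 1) + [4] * (n - 1) + [0]
    # n even: n ones, two twos, n - 2 fours, one zero
    return [1] * n + [2, 2] + [4] * (n - 2) + [0]
```

For instance, `pi_degree(snf_odd_euclidean(3), 6)` evaluates to 108, which is $6^3/2$.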
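For the cases $n = 1, 2$, the row-reduced matrices are, respectively,
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 1 & 0 & 0 & 1 & -1 \\ 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 2 & -2 & 2 \\ 0 & 0 & 0 & 2 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]
Hence we have, for all $n > 0$,
\[
\operatorname{PIdeg} \mathcal{O}_q(\mathfrak{o}k^{2n+1}) =
\begin{cases}
r^n, & r \text{ odd} \\
r^n / 2^{\lfloor n/2 \rfloor}, & r \text{ even},\ r \notin 4\mathbb{Z} \\
r^n / 2^{n-1}, & r \in 4\mathbb{Z}.
\end{cases}
\]
5.2. The multiparameter quantized Weyl algebras; $A_n^{Q,\Gamma}(k)$. For a fixed $n$-tuple $Q = (q_1, \dots, q_n) \in (k^\times)^n$ and $\Gamma = (\gamma_{ij})$ a multiplicatively antisymmetric $n \times n$ matrix over $k$, the algebra $A_n^{Q,\Gamma}(k)$, studied in [23] and [26], may be presented as an iterated skew polynomial ring
\[
k[y_1][x_1; \tau_1, \delta_1][y_2; \sigma_2][x_2; \tau_2, \delta_2] \cdots [y_n; \sigma_n][x_n; \tau_n, \delta_n]
\]
where the automorphisms and derivations are defined by
\[
\begin{aligned}
\sigma_i(y_j) &= \gamma_{ji} y_j, & j &< i \\
\sigma_i(x_j) &= \gamma_{ij} x_j, & j &< i \\
\tau_i(y_j) &= q_j \gamma_{ji} y_j, & j &< i \\
\tau_i(x_j) &= q_j^{-1} \gamma_{ij} x_j, & j &< i \\
\tau_i(y_i) &= q_i y_i, & &\text{all } i \\
\delta_i(x_j) &= \delta_i(y_j) = 0, & j &< i \\
\delta_i(y_i) &= 1 + \sum_{\ell < i} (q_\ell - 1) y_\ell x_\ell, & &\text{all } i.
\end{aligned}
\]
Routine computations show that $\tau_i^{-1} \delta_i \tau_i(y_i) = q_i \delta_i(y_i)$ for all $i$, and so we conclude that each $(\tau_i, \delta_i)$ is a $q_i$-skew derivation. We may present the $k[t_1^{\pm 1}, \dots, t_n^{\pm 1}]$-algebra $A_n^{T,\Gamma}(k[t_1^{\pm 1}, \dots, t_n^{\pm 1}])$ as an iterated skew polynomial ring
\[
k[t_1^{\pm 1}, \dots, t_n^{\pm 1}][y_1][x_1; \bar\tau_1, \bar\delta_1][y_2; \bar\sigma_2][x_2; \bar\tau_2, \bar\delta_2] \cdots [y_n; \bar\sigma_n][x_n; \bar\tau_n, \bar\delta_n]
\]
where the automorphisms and derivations are defined analogously to those of $A_n^{Q,\Gamma}(k)$ with $t_i \in k[t_1^{\pm 1}, \dots, t_n^{\pm 1}]$ replacing $q_i \in k^\times$. So each $(\bar\tau_i, \bar\delta_i)$ is a $t_i$-skew derivation. It is immediate that
\[
A_n^{T,\Gamma}(k[t_1^{\pm 1}, \dots, t_n^{\pm 1}]) / \langle t_1 - q_1, \dots, t_n - q_n \rangle \cong A_n^{Q,\Gamma}(k)
\]
with each $\bar\tau_i$ and $\bar\delta_i$ reducing to $\tau_i$ and $\delta_i$ respectively. Let $A_j$ denote the $k[t_1^{\pm 1}, \dots, t_n^{\pm 1}]$-subalgebra generated by $y_m, x_m$ for $m < j$, and $y_j$. To show that $\bar\delta_j^{\,i}(A_j) \subseteq (i)!_{t_j} A_j$, it suffices to check $\bar\delta_j^{\,i}(y_j)$ by Lemma 5.3. But this is given by definition for $i = 1$ and is zero for $i > 1$. So, by Theorem 2.8, each $\delta_i$ in our presentation of $A_n^{Q,\Gamma}(k)$ extends to an iterative, locally nilpotent h.$q_i$-s.$\tau_i$-d. on the appropriate subalgebra.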
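Then Corollary 4.7 gives $\operatorname{Fract} A_n^{Q,\Gamma}(k) \cong \operatorname{Fract} \mathcal{O}_\Lambda(k^{2n})$, where the $2n \times 2n$ matrix of relations $\Lambda$ is composed of $2 \times 2$ blocks
\[
B_{ii} = \begin{pmatrix} 1 & q_i^{-1} \\ q_i & 1 \end{pmatrix} \text{ for all } i; \quad
B_{ij} = \begin{pmatrix} \gamma_{ji} & q_i^{-1}\gamma_{ji} \\ \gamma_{ij} & q_i\gamma_{ij} \end{pmatrix} \text{ for } i < j; \quad
B_{ij} = \begin{pmatrix} \gamma_{ji} & \gamma_{ij} \\ q_j\gamma_{ji} & q_j^{-1}\gamma_{ij} \end{pmatrix} \text{ for } i > j.
\]
If $\gamma_{ij}$ and $q_i$ are roots of unity for all $i, j$, then $\mathcal{O}_\Lambda(k^{2n})$ is a PI algebra. Assuming that $\gamma_{ij}$ is an $r_{ij}$-th root of unity and that $q_i$ is an $r_i$-th root of unity, we let $r = \operatorname{lcm}\{r_{ij}, r_i \mid i, j = 1, \dots, n\}$. Then there exist a primitive $r$th root of unity $q \in k$ and integers $b_i, b_{ij}$ such that $q_i = q^{b_i}$ and $\gamma_{ij} = q^{b_{ij}}$ for $i, j = 1, \dots, n$. The powers of this $q$ from the matrix $\Lambda$ give a $2n \times 2n$ integer matrix $\Lambda'$ composed of $2 \times 2$ blocks
\[
B'_{ii} = \begin{pmatrix} 0 & -b_i \\ b_i & 0 \end{pmatrix} \text{ for all } i; \quad
B'_{ij} = \begin{pmatrix} b_{ji} & b_{ji} - b_i \\ b_{ij} & b_{ij} + b_i \end{pmatrix} \text{ for } i < j; \quad
B'_{ij} = \begin{pmatrix} b_{ji} & b_{ij} \\ b_j + b_{ji} & b_{ij} - b_j \end{pmatrix} \text{ for } i > j.
\]
Then $\operatorname{PIdeg} A_n^{Q,\Gamma}(k)$ can be computed using the matrix $\Lambda'$ in Theorem 1.2(2).

Consider the single parameter case, denoted $A_n^q(k)$, where $q_i = q$ for all $i$, and $\gamma_{ij} = 1$ for $i < j$, reducing the $\sigma_i$ to identity maps. Assuming that $q$ is a primitive $r$th root of unity, we have $\delta_i(y_i^r) = 0$ and $\tau_i(y_i^r) = y_i^r$ for all $i$, implying that $y_i^r$ is central. The definition of the $\tau_i$, along with the $q$-Leibniz rule, implies that $x_i^r$ is central for all $i$. So the algebra $A_n^q(k)$ is a finitely generated module over the central subring $k[y_1^r, x_1^r, \dots, y_n^r, x_n^r]$. To find the PI degree in this case, the integer matrix becomes
\[
\begin{pmatrix}
0 & -1 & 0 & -1 & 0 & -1 & \cdots & 0 & -1 \\
1 & 0 & 0 & 1 & 0 & 1 & \cdots & 0 & 1 \\
0 & 0 & 0 & -1 & 0 & -1 & \cdots & 0 & -1 \\
1 & -1 & 1 & 0 & 0 & 1 & \cdots & 0 & 1 \\
\vdots & & & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & -1 \\
1 & -1 & 1 & -1 & 1 & -1 & \cdots & 1 & 0
\end{pmatrix}
\]
which is seen to have a trivial kernel after these row reductions:
• Replace row $2n$ with row $2n$ − row $(2n-2)$ − row $(2n-3)$.
• For $j = n-1, n-2, \dots, 2$, replace row $2j$ with row $2j$ − row $(2j-2)$ − row $(2j-3)$.
• Rearrange the rows to order $2, 1, 4, 3, 6, 5, \dots, 2n, 2n-1$.
The resulting matrix is upper triangular with diagonal entries alternating between $1$ and $-1$,
\[
\begin{pmatrix}
1 & 0 & & & \ast \\
0 & -1 & & & \\
 & & \ddots & & \\
 & & & 1 & 0 \\
0 & & & 0 & -1
\end{pmatrix},
\]
thus verifying that $\operatorname{PIdeg} A_n^q(k) = r^n$.

5.3.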
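The multiparameter coordinate ring of quantum $n \times n$ matrices; $\mathcal{O}_{\lambda,p}(M_n(k))$.
The multiparameter coordinate ring of quantum $n \times n$ matrices was introduced by Artin, Schelter, and Tate in [2]. The $k$-algebra $\mathcal{O}_{\lambda,p}(M_n(k))$ is defined by generators $x_{ij}$ for $i, j = 1, \dots, n$ and relations
\[
x_{\ell m} x_{ij} =
\begin{cases}
p_{\ell i} p_{jm} x_{ij} x_{\ell m} + (\lambda - 1) p_{\ell i} x_{im} x_{\ell j} & (\ell > i,\ m > j) \\
\lambda p_{\ell i} p_{jm} x_{ij} x_{\ell m} & (\ell > i,\ m \le j) \\
p_{jm} x_{ij} x_{\ell m} & (\ell = i,\ m > j),
\end{cases}
\]
where $\lambda \in k^\times$ and $p = (p_{ij}) \in M_n(k^\times)$ is multiplicatively antisymmetric. It can also be presented as an iterated skew polynomial ring
\[
k[x_{11}][x_{12}; \tau_{12}] \cdots [x_{ij}; \tau_{ij}, \delta_{ij}] \cdots [x_{nn}; \tau_{nn}, \delta_{nn}]
\]
where each $\tau_{\ell m}$ and $\delta_{\ell m}$ is $k$-linear and satisfies
\[
\tau_{\ell m}(x_{ij}) =
\begin{cases}
p_{\ell i} p_{jm} x_{ij} & \text{when } \ell > i \text{ and } m \ne j \\
\lambda p_{\ell i} p_{jm} x_{ij} & \text{when } \ell > i \text{ and } m = j \\
p_{jm} x_{ij} & \text{when } \ell = i \text{ and } m > j,
\end{cases}
\qquad
\delta_{\ell m}(x_{ij}) =
\begin{cases}
(\lambda - 1) p_{\ell i} x_{im} x_{\ell j} & \text{when } \ell > i \text{ and } m > j \\
0 & \text{otherwise.}
\end{cases}
\]
Routine computations show $\tau_{\ell m}^{-1} \delta_{\ell m} \tau_{\ell m}(x_{ij}) = \lambda^{-1} \delta_{\ell m}(x_{ij})$ as in [9, Section 5], and so we conclude that each $(\tau_{\ell m}, \delta_{\ell m})$ is a $\lambda^{-1}$-skew derivation. We may present the $k[t^{\pm 1}]$-algebra $\mathcal{O}_{t,p}(M_n(k[t^{\pm 1}]))$ as an iterated skew polynomial ring with generators $x_{ij}$ for $i, j = 1, \dots, n$,
\[
k[t^{\pm 1}][x_{11}][x_{12}; \bar\tau_{12}] \cdots [x_{ij}; \bar\tau_{ij}, \bar\delta_{ij}] \cdots [x_{nn}; \bar\tau_{nn}, \bar\delta_{nn}]
\]
where the automorphisms and derivations are defined analogously to those of the algebra $\mathcal{O}_{\lambda,p}(M_n(k))$ with $t \in k[t^{\pm 1}]$ replacing $\lambda \in k^\times$. So each $(\bar\tau_{\ell m}, \bar\delta_{\ell m})$ is a $t^{-1}$-skew derivation. It is immediate that
\[
\mathcal{O}_{t,p}(M_n(k[t^{\pm 1}])) / \langle t - \lambda \rangle \cong \mathcal{O}_{\lambda,p}(M_n(k))
\]
with each $\bar\tau_{\ell m}$ and $\bar\delta_{\ell m}$ reducing to $\tau_{\ell m}$ and $\delta_{\ell m}$ respectively. Let $A^-_{\ell m}$ denote the $k[t^{\pm 1}]$-subalgebra generated by the $x_{ij}$ with $(i, j) < (\ell, m)$ in the lexicographic order. Lemma 5.3 allows us to verify that $\bar\delta^{\,s}_{\ell m}(A^-_{\ell m}) \subseteq (s)!_{t^{-1}}(A^-_{\ell m})$ by checking only that $\bar\delta^{\,s}_{\ell m}(x_{ij})$ is contained in $A^-_{\ell m}$. This is immediate from the formula for $\bar\delta_{\ell m}$ given above. Thus, by Theorem 2.8, each $\delta_{\ell m}$ in our presentation of $\mathcal{O}_{\lambda,p}(M_n(k))$ extends to an iterative, locally nilpotent h.$\lambda^{-1}$-s.$\tau_{\ell m}$-d. on the appropriate $k$-subalgebra.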
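Then Corollary 4.7 gives $\operatorname{Fract} \mathcal{O}_{\lambda,p}(M_n(k)) \cong \operatorname{Fract} \mathcal{O}_\Lambda(k^{n^2})$, where the matrix of relations $\Lambda = (B_{ij}) \in M_{n^2}(k)$ is composed of $n \times n$ blocks
\[
B_{ii} = \begin{pmatrix}
1 & p_{21} & p_{31} & \cdots & p_{n1} \\
p_{12} & 1 & p_{32} & \cdots & p_{n2} \\
p_{13} & p_{23} & 1 & \cdots & p_{n3} \\
\vdots & & & \ddots & \vdots \\
p_{1n} & p_{2n} & p_{3n} & \cdots & 1
\end{pmatrix} \text{ for all } i,
\]
\[
B_{ij} = \begin{pmatrix}
\lambda^{-1}p_{ij} & p_{ij}p_{21} & p_{ij}p_{31} & \cdots & p_{ij}p_{n1} \\
\lambda^{-1}p_{ij}p_{12} & \lambda^{-1}p_{ij} & p_{ij}p_{32} & \cdots & p_{ij}p_{n2} \\
\lambda^{-1}p_{ij}p_{13} & \lambda^{-1}p_{ij}p_{23} & \lambda^{-1}p_{ij} & \cdots & p_{ij}p_{n3} \\
\vdots & & & \ddots & \vdots \\
\lambda^{-1}p_{ij}p_{1n} & \lambda^{-1}p_{ij}p_{2n} & \lambda^{-1}p_{ij}p_{3n} & \cdots & \lambda^{-1}p_{ij}
\end{pmatrix} \text{ for } i < j,
\]
\[
B_{ij} = \begin{pmatrix}
\lambda p_{ij} & \lambda p_{ij}p_{21} & \lambda p_{ij}p_{31} & \cdots & \lambda p_{ij}p_{n1} \\
p_{ij}p_{12} & \lambda p_{ij} & \lambda p_{ij}p_{32} & \cdots & \lambda p_{ij}p_{n2} \\
p_{ij}p_{13} & p_{ij}p_{23} & \lambda p_{ij} & \cdots & \lambda p_{ij}p_{n3} \\
\vdots & & & \ddots & \vdots \\
p_{ij}p_{1n} & p_{ij}p_{2n} & p_{ij}p_{3n} & \cdots & \lambda p_{ij}
\end{pmatrix} \text{ for } i > j.
\]
If $\lambda$ and $p_{ij}$ are roots of unity for all $i, j$, then $\mathcal{O}_\Lambda(k^{n^2})$ is a PI algebra. In this case we may assume that $\lambda$ is an $s$th root of unity and that $p_{ij}$ is an $r_{ij}$-th root of unity, and let $r = \operatorname{lcm}\{s, r_{ij} \mid i, j = 1, \dots, n\}$. Then there exist a primitive $r$th root of unity $q \in k$ and integers $b, b_{ij}$ such that $\lambda = q^b$ and $p_{ij} = q^{b_{ij}}$. The powers of this $q$ from the matrix $\Lambda$ provide entries for an $n^2 \times n^2$ integer matrix $\Lambda'$ made up of $n \times n$ blocks
\[
B'_{ii} = \begin{pmatrix}
0 & b_{21} & b_{31} & \cdots & b_{n1} \\
b_{12} & 0 & b_{32} & \cdots & b_{n2} \\
b_{13} & b_{23} & 0 & \cdots & b_{n3} \\
\vdots & & & \ddots & \vdots \\
b_{1n} & b_{2n} & b_{3n} & \cdots & 0
\end{pmatrix} \text{ for all } i,
\]
\[
B'_{ij} = \begin{pmatrix}
b_{ij} - b & b_{ij} + b_{21} & b_{ij} + b_{31} & \cdots & b_{ij} + b_{n1} \\
b_{ij} + b_{12} - b & b_{ij} - b & b_{ij} + b_{32} & \cdots & b_{ij} + b_{n2} \\
b_{ij} + b_{13} - b & b_{ij} + b_{23} - b & b_{ij} - b & \cdots & b_{ij} + b_{n3} \\
\vdots & & & \ddots & \vdots \\
b_{ij} + b_{1n} - b & b_{ij} + b_{2n} - b & b_{ij} + b_{3n} - b & \cdots & b_{ij} - b
\end{pmatrix} \text{ for } i < j,
\]
\[
B'_{ij} = \begin{pmatrix}
b_{ij} + b & b_{ij} + b_{21} + b & b_{ij} + b_{31} + b & \cdots & b_{ij} + b_{n1} + b \\
b_{ij} + b_{12} & b_{ij} + b & b_{ij} + b_{32} + b & \cdots & b_{ij} + b_{n2} + b \\
b_{ij} + b_{13} & b_{ij} + b_{23} & b_{ij} + b & \cdots & b_{ij} + b_{n3} + b \\
\vdots & & & \ddots & \vdots \\
b_{ij} + b_{1n} & b_{ij} + b_{2n} & b_{ij} + b_{3n} & \cdots & b_{ij} + b
\end{pmatrix} \text{ for } i > j.
\]
Then $\operatorname{PIdeg} \mathcal{O}_{\lambda,p}(M_n(k))$ can be calculated using $\Lambda'$ in Theorem 1.2(2).

The single parameter quantized coordinate ring of $n \times n$ matrices, $\mathcal{O}_q(M_n(k))$, is defined over $k$ analogously to $\mathcal{O}_{\lambda,p}(M_n(k))$, but with relations that are recovered by setting $\lambda = q^{-2}$ and $p_{ij} = q$ for all $i > j$.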
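For small matrices the recipe "compute the PI degree from $\Lambda'$ via Theorem 1.2(2)" can be carried out by brute force. The sketch below assumes that the theorem takes the PI degree to be the square root of the cardinality of the image of $\mathbb{Z}^N \to (\mathbb{Z}/r\mathbb{Z})^N$ induced by the integer matrix, and it simply enumerates the subgroup generated by the columns. The function names are illustrative and do not come from the paper. The test matrix is the exponent matrix of the quantum plane $xy = qyx$, a standard example that is not one of the algebras treated here.

```python
from math import isqrt


def image_size(matrix, r):
    """Cardinality of the image of v -> matrix * v from Z^N to (Z/rZ)^N.

    The image is the subgroup generated by the columns reduced mod r, so we
    close {0} under adding columns; in a finite group this reaches every
    element of the generated subgroup.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [tuple(row[j] % r for row in matrix) for j in range(n_cols)]
    zero = (0,) * n_rows
    seen = {zero}
    frontier = [zero]
    while frontier:
        v = frontier.pop()
        for c in columns:
            w = tuple((a + b) % r for a, b in zip(v, c))
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return len(seen)


def pi_degree(matrix, r):
    """Square root of the image size (assumed form of Theorem 1.2(2))."""
    size = image_size(matrix, r)
    root = isqrt(size)
    if root * root != size:
        raise ValueError("image cardinality is not a perfect square")
    return root


# Exponent matrix of the quantum plane xy = q yx (illustrative example).
quantum_plane = [[0, 1], [-1, 0]]
print(pi_degree(quantum_plane, 5))
```

The enumeration visits every element of the image, so it is only practical when $r^N$ is small; for larger cases a Smith normal form computation is the better tool.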
When $k$ has characteristic zero and $q$ is a primitive $m$th root of unity for $m$ odd, Jakobsen and Zhang found in [20] that $\operatorname{PIdeg} \mathcal{O}_q(M_n(k)) = m^{n(n-1)/2}$ by using the tool of De Concini and Procesi given in Theorem 1.2. This result is reproved in [19] using results of De Concini and Procesi and also Jøndrup's work from [21]. Now we can recover $\operatorname{PIdeg} \mathcal{O}_q(M_n(k))$ without the assumption that $k$ has characteristic zero. In the single parameter case of $n \times n$ quantum matrices, the matrix that we use to calculate the PI degree is
\[
\Lambda' = \begin{pmatrix}
A_n & I_n & I_n & I_n & \cdots & I_n \\
-I_n & A_n & I_n & I_n & \cdots & I_n \\
-I_n & -I_n & A_n & I_n & \cdots & I_n \\
\vdots & & & & \ddots & \vdots \\
-I_n & -I_n & -I_n & -I_n & \cdots & A_n
\end{pmatrix}
\quad\text{where}\quad
A_n = \begin{pmatrix}
0 & 1 & 1 & 1 & \cdots & 1 \\
-1 & 0 & 1 & 1 & \cdots & 1 \\
-1 & -1 & 0 & 1 & \cdots & 1 \\
\vdots & & & & \ddots & \vdots \\
-1 & -1 & -1 & \cdots & -1 & 0
\end{pmatrix}
\]
is $n \times n$ and $I_n$ is the $n \times n$ identity matrix. For any $n$, the characteristic polynomial of $A_n$ is the sum of the terms of degree $\equiv n \pmod 2$ in the binomial expansion of $(x+1)^n$, so in fact
\[
\chi_n(x) = \tfrac{1}{2}(x+1)^n + \tfrac{1}{2}(x-1)^n.
\]
But there is also a recursion formula for the characteristic polynomial for $n \ge 3$, given by
\[
\chi_n(x) = \chi_{n-1}(x)(x+1) - (x-1)^{n-1},
\]
which will be useful in the linear algebra that follows.

We will perform the following row reductions on the rows of blocks of $\Lambda'$. For ease of notation, we denote the $j$th row of blocks by $BR_j$, the interchange of $BR_i$ and $BR_j$ by $BR_i \leftrightarrow BR_j$, and the addition of a multiple of $BR_i$ to $BR_j$ by $M\,BR_i + BR_j \mapsto BR_j$, where $M \in M_n(\mathbb{Z})$.
• $BR_1 \leftrightarrow BR_n$.
• $-I_n BR_1 \mapsto BR_1$.
• For $i = 2, \dots, n-1$, $BR_1 + BR_i \mapsto BR_i$.
• $BR_n - A_n BR_1 \mapsto BR_n$.
This yields the matrix
\[
\begin{pmatrix}
I_n & I_n & I_n & I_n & \cdots & -A_n \\
0 & A_n + I_n & 2I_n & 2I_n & \cdots & I_n - A_n \\
0 & 0 & A_n + I_n & 2I_n & \cdots & I_n - A_n \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & A_n + I_n & I_n - A_n \\
0 & I_n - A_n & I_n - A_n & I_n - A_n & \cdots & I_n + A_n^2
\end{pmatrix}
\]
which can be reduced further by $n - 2$ block row operations, each of which produces one zero block in the $n$th row. We list the first three here along with the resulting $(n, n)$ block.
• $(A_n + I_n)BR_n - (I_n - A_n)BR_2 \mapsto BR_n$ : $A_n^3 + 3A_n$
• $(A_n + I_n)BR_n + (I_n - A_n)^2 BR_3 \mapsto BR_n$ : $A_n^4 + 6A_n^2 + I_n$
• $(A_n + I_n)BR_n - (I_n - A_n)^3 BR_4 \mapsto BR_n$ : $A_n^5 + 10A_n^3 + 5A_n$
In general, the block row operations that we need to perform in order to obtain a block upper triangular matrix are:
• For $i = 2, \dots, n-1$, $(A_n + I_n)BR_n + (-1)^{i-1}(I_n - A_n)^{i-1}BR_i \mapsto BR_n$.
These row operations are justified when $m$ is odd because $A_n + I_n$ is invertible in $M_n(\mathbb{Z}/m\mathbb{Z})$ in that case, as will be shown below. After applying this step to the $i$th row, the $(n, n)$ block is $\chi_{i+1}(A_n)$. So the resulting block upper triangular matrix is
\[
\begin{pmatrix}
I_n & I_n & I_n & I_n & \cdots & -A_n \\
0 & A_n + I_n & 2I_n & 2I_n & \cdots & I_n - A_n \\
0 & 0 & A_n + I_n & 2I_n & \cdots & I_n - A_n \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & A_n + I_n & I_n - A_n \\
0 & 0 & 0 & 0 & \cdots & \chi_n(A_n)
\end{pmatrix}
\]
where $\chi_n(A_n)$ is the $n \times n$ zero matrix, by the Cayley-Hamilton theorem. Each block on the diagonal is
\[
A_n + I_n = \begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
-1 & 1 & 1 & 1 & \cdots & 1 \\
-1 & -1 & 1 & 1 & \cdots & 1 \\
\vdots & & & & \ddots & \vdots \\
-1 & -1 & -1 & -1 & \cdots & 1
\end{pmatrix}
\]
which can be row reduced just by adding row 1 to rows 2 through $n$ to yield the matrix
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
0 & 2 & 2 & 2 & \cdots & 2 \\
0 & 0 & 2 & 2 & \cdots & 2 \\
\vdots & & & & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 2
\end{pmatrix}.
\]
In particular this shows that $A_n + I_n$ is invertible in $M_n(\mathbb{Z}/m\mathbb{Z})$ for $m$ odd. Hence $\Lambda'$ can be reduced through row operations to an upper triangular $n^2 \times n^2$ matrix with $2n - 2$ ones, $(n-1)(n-2)$ twos, and $n$ zeroes on the diagonal. Assuming that $q \in k$ is a primitive $m$th root of unity, and recalling Theorem 1.2, the cardinality of the image in $(\mathbb{Z}/m\mathbb{Z})^{n^2}$ is $m^{n^2 - n}$ if $m$ is odd. Thus we conclude that
\[
\operatorname{PIdeg} \mathcal{O}_q(M_n(k)) = m^{n(n-1)/2},
\]
recovering the result of Jakobsen and Zhang [20] in characteristic zero. By similar methods, one can show that
\[
\operatorname{PIdeg} \mathcal{O}_q(M_n(k)) = \frac{m^{n(n-1)/2}}{2^{(n-1)(n-2)/2}}
\]
when $m$ is even. For details on this result see [20] or [17].

5.4. The algebra $K^{P,Q}_{n,\Gamma}(k)$, which generalizes the coordinate rings of even-dimensional quantum Euclidean space and quantum symplectic space. For $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_n)$ in $(k^\times)^n$ with $p_i \ne q_i$ for all i = 1, . . . 
, n, and 34 HEIDI HAYNAL Γ = (γij) ∈ Mn(k×) multiplicatively antisymmetric, the k-algebra KP,Qn,Γ (k) introduced in [18] is defined by generators xi, yi for i = 1, . . . , n and relations yiyj = γijyjyi all i, j xixj = qip j γijxjxi i < j xiyj = pjγjiyjxi i < j xiyj = qjγjiyjxi i > j xiyi = qiyixi + (qℓ − pℓ)yℓxℓ all i. This algebra may be presented in the form of an iterated skew polynomial ring k[y1][x1; τ1][y2; σ2][x2; τ2, δ2] · · · [yn; σn][xn; τn, δn] where the automorphisms τi, σi and derivations δi are defined by σi(yj) = γijyj j < i σi(xj) = p i γjixj j < i τi(yj) = qjγjiyj j < i τi(xj) = q j piγijxj j < i τi(yi) = qiyi all i δi(xj) = δi(yj) = 0 j < i δi(yi) = (qℓ − pℓ)yℓxℓ all i. Routine computations show that τ−1i δiτi(yi) = qip i δi(yi) for all i, and so we conclude that each (τi, δi) is a qip i -skew derivation. For ease of notation we now shall let k = k[t±11 , . . . , t n , u 1 , . . . , u n ] with T = (t1, . . . , tn) ∈ k and U = (u1, . . . , un) ∈ k. We may present the k-algebra K n,Γ (k) as an iterated skew polynomial ring k[y1][x1; τ̄1][y2; σ̄2][x2; τ̄2, δ̄2] · · · [yn; σ̄n][xn; τ̄n, δ̄n] where the automorphisms and derivations are defined analogously to those of K n.Γ (k) with ti replacing pi and ui replacing qi. Let I ⊆ KT,Un,Γ (k) be the ideal generated by the 2n monomials ti − pi, ui − qi for i = 1, . . . , n. It is immediate that n,Γ (k)/I ∼= KP,Qn,Γ (k), with each τ̄i, δ̄i, σ̄i reducing to τi, δi, σi respectively. Let Aj denote the subalgebra of K n,Γ (k) generated by ym, xm for m < j and yj. To show that δ̄ij(Aj) ⊆ (i)!ujt−1j Aj , it suffices to check that δ̄ j(yj) is an element of (i)!ujt−1j PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 35 by Lemma 5.3. This is given for i = 1 by the formula for δ̄j and is zero for i > 1. So, by Theorem 2.8, each δi in our presentation of K n,Γ (k) extends to an iterative, locally nilpotent h.qip i -s.τi-d. on the appropriate subalgebra. 
Then Corollary 4.7 gives FractK n,Γ (k) ∼= FractOΛ(k2n) where the 2n× 2n matrix of relations Λ = (Bij) is comprised of 2× 2 blocks Bii = 1 q−1i , for all i; Bij = γij q i γji pjγji qip j γij , for i < j; Bij = γij p i γij qjγji q j piγij , for i > j. If the qi, pi and γi are all roots of unity, then OΛ(k2n) is a PI algebra. Suppose qi is an rthi root of unity, pi is an s i root of unity, and γij is an r ij root of unity for all i, j. Let r = lcm{ri, si, γij | i, j = 1, . . . , n}. Then there extsis a primitive rth root of unity q ∈ k and integers bi, ci, bij such that qi = q bi , pi = q ci, and γij = q bij for all i, j. The powers of q from the matrix Λ provide the entries for an integer matrix Λ′ comprised of 2 × 2 blocks B′ii = 0 −bi , for all i; B′ij = bij bji − bi bji + cj bi + bij − cj , for i < j; B′ij = bij bij − ci bji + bj bij + ci − bj , for i > j. Then PIdegK n,Γ (k) can be calculated using Λ ′ in Theorem 1.2 (2). The coordinate ring of quantum Euclidean 2n-space over k, Oq(ok2n), is formed by setting qi = 1, pi = q −2 for all i, and γij = q −1 for i < j in the parameters Q, P , and Γ 36 HEIDI HAYNAL (see [18], Example 2.6). Then the integer matrix, Λ′, is  0 0 −1 1 −1 1 . . . −1 1 0 0 −1 1 −1 1 . . . −1 1 1 1 0 0 −1 1 −1 1 −1 −1 0 0 −1 1 ... 1 1 1 1 0 0 −1 −1 −1 −1 0 0 . . . 1 1 1 1 1 . . . 0 0 −1 −1 −1 −1 −1 . . . 0 0  We perform the following row reductions that preserve the size of the image of the homomorphism Z2n −→ Z2n given by Λ′: • For j = 2n, 2n− 1, 2n− 2, . . . , 4, replace row j with row j + row (j − 1) • Replace row 2 with row 2− row 1 • Replace the (new) row 5 with row 5 + row 1 • For j = 4, 6, 8, . . . , 2n− 4, replace row j with row j + 2row (j + 3) • For n ≥ 4, rearrange the rows to order 3, 1, 5, 7, 4, 9, 6, 11, . . . , 2i, 2i+ 5, . . . , 2n− 4, 2n− 2, 2, 2n. The resulting matrix has the form  0 −1 1 0 2 ∗ 0 1 1 . . . 
0 1 1 0 0 4 0 −2 2  When n is even, the pivot in the third row does not divide all the entries in its row, so more elementary row and column operations are needed before it becomes clear that the matrix can be diagonalized. By a method similar to that used in Example 5.1, suppressed here in the interest of saving space but listed explicitly in [17], we obtain the Smith normal form diag(1, 1, . . . , 1, 4, 4, . . . , 4, 0, 0) with n ones and n− 2 fours when n PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 37 is even; and diag(1, 1, . . . , 1, 2, 2, 4, 4, . . . , 4, 0, 0), with n− 1 ones and n− 3 fours when n is odd. Thus we have PIdegOq(ok2n) = rn−1, r odd rn−1/2⌊ ⌋, r even /∈ 4Z rn−1/2n−2, r ∈ 4Z . (10) The low-dimensional cases do not fit the same pattern, but the matrices for the cases n = 2 and n = 3 are readily transformed to 1 1 0 0 0 0 −1 1 0 0 0 0 0 0 0 0  and  1 1 0 0 −1 1 0 0 −1 1 −1 1 0 0 0 2 0 0 0 0 0 0 −2 2 0 0 0 0 0 0 0 0 0 0 0 0  respectively. Therefore, formula (10) holds for all n ≥ 2. As a specific case ofK n,Γ (k), quantum symplectic spaceOq(sp(k2n)) is formed by setting qi = q −2 and pi = 1 for all i, and γij = q for i < j (see [18], Example 2.4). With these parameters, the 2n× 2n integer matrix Λ′ is  0 2 1 1 1 1 . . . 1 1 −2 0 −1 −1 −1 −1 . . . −1 −1 −1 1 0 2 1 1 1 1 −1 1 −2 0 −1 −1 −1 −1 −1 1 −1 1 0 2 ... −1 1 −1 1 −2 0 . . . −1 1 −1 1 −1 1 . . . 0 2 −1 1 −1 1 −1 1 . . . −2 0  We perform the following row reductions that preserve the size of the image of the homomorphism Z2n −→ Z2n given by Λ′: • For j = 2n, 2n− 1, . . . , 4, replace row j with row j − row (j − 1) • Replace row 2 with −(row 2− 2row 3 + row 1) • For j = 4, 6, 8, . . . , 2n− 2, replace row j with row j + 2row (j + 1) • For n ≥ 3, order the rows 3, 1, 5, 2, 7, 4, 9, . . . , 2j, 2j + 5, . . . , 2n− 4, 2n, 2n− 2. 38 HEIDI HAYNAL This yields a matrix whose image is more easily measured:  0 1 1 ∗ . . . 
0 0 4 −2 −2  But the pivot in row 2 is problematic because it does not always divide the other entries in its row. With further elementary row and column operations, full details of which can be found in [17], we can bring this matrix into Smith normal form diag(1, 1, . . . , 1, 4, 4, . . . , 4) with n ones and n fours when n is even; or the form diag(1, 1, . . . , 1, 2, 2, 4, 4, . . . , 4) with n − 1 ones, two twos, and n − 1 fours when n is odd. For n = 1, 2, the row reduced matrices are, respectively, −1 1 0 2 0 1 1 1 0 0 −4 −4 0 0 0 −4  . Hence we have, for all n, PIdegOq(sp(k2n)) = rn, r odd rn/2⌊ ⌋, r even, r /∈ 4Z rn/2n, r ∈ 4Z 6. Prime Factor Localizations In this section we present a structure theorem for completely prime factors of iterated skew polynomial rings analogous to the main theorem of section four. Applying this result to the algebras studied in section five, we’d like to strengthen it to the form of the quantum Gel’fand-Kirillov conjecture. Recall that the assumptions about skew polynomial rings from section one are still in effect. Theorem 6.1. Let A = R[x; τ, δ], where R is noetherian and δτ = qτδ for some q ∈ k×. Assume that δ extends to a locally nilpotent, iterative h.q-s.τ -d., {di}, on R. Let P ∈ specA be completely prime. Then PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 39 (1) there exists a cyclic Ore set S in A/P such that (A/P )S−1 ∼= R[y; τ ]/Q Y −1 for some completely prime Q ∈ specR[y; τ ] and cyclic Ore set Y , (2) FractA/P ∼= FractR[y; τ ]/Q. Proof. The completely prime ideal P naturally satisfies one of two cases: x ∈ P or x /∈ P . If x ∈ P , then xA ⊆ P and Ax ⊆ P . So the relation xr = τ(r)x + δ(r) implies that δ(r) ∈ P for all r ∈ R. Hence, there is a completely prime ideal I ∈ R such that A/P ∼= R/I ∼= R[y; τ ]/(I + 〈y〉). In this case, we can take S = Y = {1} and localize. If x /∈ P , then xi /∈ P for all i ∈ N ∪ {0} because A/P is a domain. Letting S = {1, x, x2, . . . 
}, which is a known denominator set in A, we have P ∩ S = ∅. Since extension and contraction provide inverse bijections between the sets specAS−1 and {I ∈ specA | I ∩ S = ∅}, we know that P e ∈ specAS−1. From Theorem 3.7, we have AS−1 ∼= R[y±1; τ ], a localization of R[y; τ ]. So there is a completely prime ideal Q̄⊳R[y±1; τ ] such that AS−1/P e ∼= R[y±1; τ ]/Q̄. Setting Y = {1, y, y2, . . . , }, contraction to R[y; τ ] gives a completely prime ideal Q, where Q∩ Y = ∅, such that R[y±1; τ ]/Q̄ is isomorphic to (R[y; τ ]/Q)Y −1. The canonical projection π : AS−1 −→ (A/P )S−1 gives AS−1/P e ∼= (A/P )S−1. Thus (A/P )S−1 ∼= R[y; τ ]/Q Y −1. � Theorem 6.2. Let R be a noetherian k-algebra, and let A = R[x1, τ1, δ1] · · · [xn; τn, δn] be an iterated skew polynomial ring where, for j < i and λij ∈ k×, τi(xj) = λijxj, and δi is a qi-skew τi-derivation, qi 6= 1, which extends to a locally nilpotent, iterative h.qi-s.τi-d. {di,p}∞p=0 on R[x1; τ1, δ1] · · · [xi−1; τi−1, δi−1] for all i. Let A′ = R[y1; τ ′1][y2; τ ′2] · · · [yn; τ ′n] where τ ′i(yj) = λijyj for all i with j < i and the same units λij as above. Let P be a completely prime ideal in A. Then (1) there exists a finitely generated Ore set Sn in A/P such that (A/P )S n is isomorphic Y −1n for some completely prime ideal Q ⊆ A′ and finitely generated Ore set (2) FractA/P ∼= FractA′/Q. Proof. The case n = 1 has been established in Theorem 6.1. Suppose the result holds for the case n− 1, and let An−1 = R[x1, τ1, δ1] · · · [xn−1; τn−1, δn−1] ⊆ A. Then we have A = An−1[xn; τn, δn]. If xn ∈ P , then as in Theorem 6.1 there is a completely prime ideal I ⊆ An−1 such that A/P ∼= An−1/I ∼= An−1[yn; τ ′n]/(I + 〈yn〉). The induction hypothesis and Lemma 4.2 imply that An−1[yn; τ n]/(I + 〈yn〉) S−1 ∼= Y −1 for some finitely generated Ore sets S and Y . Hence there is a finitely generated Ore set Sn in A such that (A/P )S ∼= (A′/Q)Y −1. If xn /∈ P , let Sn = {1, xn, x2n, . . . } ⊆ A and Yn = {1, yn, y2n, . . . } ⊆ An−1[yn; τn]. 
Then from the single-variable result, it follows that ( An−1[yn; τ n]/Q̄ Y −1n , (11) 40 HEIDI HAYNAL for a completely prime ideal Q̄ ⊆ An−1[yn; τ ′n]. From Lemma 4.2, we have An−1[yn; τ n] = R[yn; τ n][x1; τ 1] · · · [xn−1; τ ′n−1, δ′n−1], which is an iterated skew polynomial ring in n − 1 variables over the coefficient ring R[yn; τ n] that satisfies the current assumptions. So, we apply the induction hypothesis and rearrange variables to obtain An−1[yn; τn]/Q̄ Y −1n R[yn; τ n][y1; τ 1] · · · [yn−1; τ ′n−1]/Q R[y1; τ 1][y2; τ 2] · · · [yn; τ ′n]/Q for a completely prime ideal Q ⊆ R[y1; τ ′1][y1; τ ′1] · · · [yn; τ ′n] and a denominator set Z ⊆ R[y1; τ ′1][y1; τ ′1] · · · [yn; τ ′n]/Q. This, along with isomorphism (11) gives the re- sult. � When R is replaced by k, we have the following result. Corollary 6.3. Let A = k[x1, τ1, δ1] · · · [xn; τn, δn], where τi(xj) = λijxj and δiτi = qiτiδi, qi 6= 1, for λij , qi ∈ k× and all i with j < i. Assume that each δi extends to a locally nilpotent, iterative h.qi-s.τi-d. {di,m}∞m=0 on the subalgebra k[x1; τ1, δ1] · · · [xi−1; τi−1, δi−1]. Let P be a completely prime ideal in A and set λii = 1 and λji = λ ij . Then for λ = (λij) ∈ Mn(k), and an appropriate completely prime ideal Q ⊆ Oλ(kn), we have FractA/P ∼= FractOλ(kn)/Q. We summarize how this applies to the k-algebras of quantized coordinate type. Corollary 6.4. Let A be any of the examples discussed in sections 5.1 - 5.4, and let P be a completely prime ideal of A. Then there exist a positive integer N , a multiplicatively antisymmetric N ×N matrix λ over k, and a completely prime ideal Q ∈ Oλ(kN) such that FractA/P ∼= FractOλ(kN)/Q. To complete the question posed by the corollary, one might ask how far the quantum Gel’fand-Kirillov conjecture extends to prime factor algebras. For instance: Question 6.5. 
Find conditions under which we can conclude that for any positive integer $n$, multiplicatively antisymmetric matrix $\lambda \in M_n(k^\times)$, and completely prime ideal $Q \in \operatorname{spec} \mathcal{O}_\lambda(k^n)$, we have
\[
\operatorname{Fract} \mathcal{O}_\lambda(k^n)/Q \cong \operatorname{Fract} \mathcal{O}_p(K^m)
\]
for some field extension $K \supseteq k$, integer $m \le n$, and $m \times m$ matrix $p$ over $K$.

The case $n = 1$ is trivial. When $n = 2$ and $Q$ contains $x_1$ or $x_2$, then $\operatorname{Fract} \mathcal{O}_\lambda(k^2)/Q$ is isomorphic either to $\operatorname{Fract} \mathcal{O}_p(k(y))$ where $p = (1)$, or to $k$ itself. In fact, for any $n$, if $Q$ is generated by a subset $S$ of $\{x_1, \dots, x_n\}$, then the result holds, with $p$ the submatrix of $\lambda$ formed by deleting the $i$th row and column for $x_i \in S$, and $K = k$. When $x_i \notin Q$ for all $i$, answering the question fully will likely require different methods depending on the presence of roots of unity among the $\lambda_{ij}$. A positive answer in the generic case has been provided in the proof of [13, Theorem 2.1]:

Theorem 6.6. [Goodearl - Letzter] Let $k$ be a field, $\lambda = (\lambda_{ij})$ a multiplicatively antisymmetric $n \times n$ matrix over $k^\times$, and $\Lambda$ the subgroup of $k^\times$ generated by the $\lambda_{ij}$. If $\Lambda$ is torsionfree, then all of the prime ideals $Q$ of $\mathcal{O}_\lambda(k^n)$ are completely prime.

In their proof, they showed that $\operatorname{Fract} \mathcal{O}_\lambda(k^n)/Q \cong \operatorname{Fract} \mathcal{O}_p(K^m)$, and identified $K$ as the quotient field of a commutative domain embedded in the center of $\mathcal{O}_\lambda((k^\times)^n)/Q'$, where $Q'$ is the prime ideal in $\mathcal{O}_\lambda((k^\times)^n)$ induced by localization.

Quantum affine space is included in a class called quantum solvable algebras by A. N. Panov. The main theorem of [34, Section 3] states that when the group generated by the $\lambda_{ij}$ is torsionfree, then $\operatorname{Fract} \mathcal{O}_\lambda(k^n)/Q$ is isomorphic to the quotient division ring of a quantum torus. The main theorem of [35, Section 3] allows roots of unity and states that when $Q$ satisfies the extra condition of being stable under a certain set of derivations, then $\operatorname{Fract} \mathcal{O}_\lambda(k^n)/Q$ is isomorphic to the quotient division ring of a quantum torus.
Cauchon’s work may also be specialized to apply to quantum affine space when the group generated by the $\lambda_{ij}$ is torsionfree. The result of [5, Theorem 6.1.1] indicates that $\operatorname{Fract} \mathcal{O}_\lambda(k^n)/Q$ is isomorphic to $\operatorname{Fract} \mathcal{O}_p(K^m)$, which specializes to this result.

But the division ring of real quaternions provides an example showing that Question 6.5 needs to have some conditions imposed. Note that $\mathbb{H} \cong \mathcal{O}_\lambda(\mathbb{R}^3)/Q$, where
\[
\lambda = \begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{pmatrix}
\]
and $Q = \langle x_1^2 + 1,\ x_2^2 + 1,\ x_3^2 + 1 \rangle$. Therefore, we cannot obtain the desired isomorphism of quotient division rings in this case, illustrating the necessity of an extra condition such as the one imposed by Panov in [35].

Acknowledgments
The author thanks her dissertation advisor, Ken Goodearl, for his direction that was so freely given in many inspiring discussions.

References
[1] J. Alev and F. Dumas, Sur le corps de fractions de certaines algèbres quantiques, J. Algebra 170 (1994), 229-265
[2] M. Artin, W. Schelter, and J. Tate, Quantum deformations of GLn, Comm. Pure Appl. Math. 44 (1991), 879-895
[3] N. Bourbaki, Éléments de mathématique, Livre II, Algèbre, Chapitre 9, Formes sesquilinéaires et formes quadratiques, Hermann, Paris, 1959
[4] K. A. Brown and K. R. Goodearl, Lectures on Algebraic Quantum Groups, Birkhäuser Verlag, Basel - Boston, 2002
[5] G. Cauchon, Effacement des dérivations et spectres premiers des algèbres quantiques, J. Algebra 260 (2003), 476-518
[6] G. Cauchon, Spectre premier de Oq(Mn(k)) image canonique et séparation normale, J. Algebra 260 (2003), 519-569
[7] G. Cliff, The division ring of quotients of the coordinate ring of the quantum general linear group, J. London Math. Soc. (2) 51 (1995), 503-513
[8] C. De Concini and C. Procesi, Quantum Groups, in D-modules, Representation Theory, and Quantum Groups (Venezia, June 1992) (G. Zampieri and A. D’Agnolo, eds.), Lecture Notes in Math. 1565, Springer-Verlag, Berlin, 1993, 31-140
[9] K. R.
Goodearl, Uniform ranks of prime factors of skew polynomial rings, in Ring Theory, Proc. Biennial Ohio State - Denison Conf., 1992 (S. K. Jain and S. T. Rizvi, eds.), World Scientific, Singapore, 1993, 182-199
[10] K. R. Goodearl, Prime ideals in skew polynomial rings and quantized Weyl algebras, Trans. Amer. Math. Soc. 352 (2000), 1381-1403
[11] K. R. Goodearl, Prime spectra of quantized coordinate rings, in Interactions between Ring Theory and Representations of Algebras (Murcia 1998) (F. Van Oystaeyen and M. Saorín, eds.), Dekker, New York, 2000, pp. 205-237
[12] K. R. Goodearl and T. H. Lenagan, Catenarity in quantum algebras, J. Pure and Appl. Algebra 111 (1996), 123-142
[13] K. R. Goodearl and E. S. Letzter, Prime factor algebras of the coordinate ring of quantum matrices, Proc. Amer. Math. Soc. 121 (1994), 1017-1025
[14] K. R. Goodearl and E. S. Letzter, Prime ideals in skew and q-skew polynomial rings, Mem. Amer. Math. Soc. 521 (1994)
[15] K. R. Goodearl and E. S. Letzter, The Dixmier-Moeglin equivalence in quantum coordinate rings and quantized Weyl algebras, Trans. Amer. Math. Soc. 352 (2000), 1381-1403
[16] K. R. Goodearl and R. B. Warfield, Jr., An Introduction to Noncommutative Noetherian Rings, 2nd ed., Cambridge Univ. Press, Cambridge, 2004
[17] H. A. Haynal, PI degree parity in q-skew polynomial rings, Ph.D. Thesis, to appear (2007), University of California, Santa Barbara
[18] K. L. Horton, The prime and primitive spectra of multiparameter quantum symplectic and euclidean spaces, Comm. Algebra 31 (10) (2003), 4713-4743
[19] H. P. Jakobsen and S. Jøndrup, Quantized rank r matrices, J. Algebra 246 (2001), 70-96, arXiv:math.QA/9902133 v3, 23 May 2001
[20] H. P. Jakobsen and H. Zhang, The center of the quantized matrix algebra, J. Algebra 196 (1997), 458-474
[21] S. Jøndrup, Representations of skew polynomial algebras, Proc. Amer. Math. Soc. 128 (2000), 1301-1305
[22] S. Jøndrup, Representations of some PI algebras, Comm. Algebra 31 (6) (2003), 2587-2602
[23] D. A. Jordan, A simple localization of the quantized Weyl algebra, J. Algebra 174 (1995), 267-281
[24] T. Y. Lam, Lectures on Modules and Rings, Springer, New York, 1999
[25] D. R. Malm, Simplicity of partial and Schmidt differential operator rings, Pacific J. Math. 132 (1998), no. 1, 85-112
[26] G. Maltsiniotis, Calcul différentiel quantique, Groupe de travail, Université Paris VII (1992)
[27] J. C. McConnell and J. C. Robson, Noncommutative Noetherian Rings, Wiley-Interscience, Chichester - New York, 1987
[28] V. G. Mosin and A. N. Panov, Division rings of quotients and central elements of multiparameter quantizations, Sbornik: Mathematics 187:6 (1996), 835-855
[29] I. M. Musson, Ring theoretic properties of the coordinate rings of quantum symplectic and Euclidean space, in Ring Theory, Proc. Biennial Ohio State - Denison Conf., 1992 (S. K. Jain and S. T. Rizvi, eds.), World Scientific, Singapore, 1993, 248-258
[30] M. Newman, Integral Matrices, Academic Press, 1972
[31] S. Q. Oh, Catenarity in a class of iterated skew polynomial rings, Comm. Algebra 25 (1) (1997), 37-49
[32] A. N. Panov, Skew fields of twisted rational functions and the skew field of rational functions on GLq(n,K), St. Petersburg Math. J. 7 (1) (1996), 129-143
[33] A. Panov, Fields of fractions of quantum solvable algebras, J. Algebra 236 (2001), 110-121
[34] A. Panov, Stratification of prime spectrum of quantum solvable algebras, Comm. Algebra 29 (9) (2001), 3801-3827
[35] A. Panov, Quantum solvable algebras. Ideals and representations at roots of 1, Transformation Groups 7, no. 4 (2002), 379-402
[36] N. Yu. Reshetikhin, L. A. Takhtadzhyan, and L. D. Faddeev, Quantization of Lie Groups and Lie Algebras, Leningrad Math. J. 1 (1990), 193-225
[37] L. H. Rowen, Ring Theory, Volumes I and II, Academic Press, Boston, 1988
[38] S. P.
Smith, Quantum groups: An introduction and survey for ring theorists, in Noncommutative Rings (S. Montgomery and L. W. Small, eds.), pp. 131-178, MSRI Publ. 24, Springer-Verlag, Berlin (1992)
[39] R. P. Stanley, Enumerative Combinatorics, Vol. I, Wadsworth & Brooks/Cole, Monterey, CA,

Department of Mathematics, University of California, Santa Barbara, California 93106
E-mail address: heidi@softerhardware.com

1. Introduction
2. Higher q-Skew τ-Derivations
3. The τ-Derivation Removing Homomorphism
4. Main Theorem
5. Examples
5.1. The coordinate ring of odd-dimensional quantum Euclidean space; O_q(o k^{2n+1})
5.2. The multiparameter quantized Weyl algebras; A_n^{Q,Γ}(k)
5.3. The multiparameter coordinate ring of quantum n × n matrices; O_{λ,p}(M_n(k))
5.4. The algebra K_{n,Γ}^{P,Q}(k), which generalizes the coordinate rings of even-dimensional quantum Euclidean space and quantum symplectic space
6. Prime Factor Localizations
Acknowledgments
References

ABSTRACT For k a field of arbitrary characteristic, and R a k-algebra, we show that the PI degree of an iterated skew polynomial ring R[x_1;\tau_1,\delta_1]\cdots[x_n;\tau_n,\delta_n] agrees with the PI degree of R[x_1;\tau_1]\cdots[x_n;\tau_n] when each (\tau_i,\delta_i) satisfies a q_i-skew relation for q_i \in k^{\times} and extends to a higher q_i-skew \tau_i-derivation. We confirm the quantum Gel'fand-Kirillov conjecture for various quantized coordinate rings, and calculate their PI degrees. We extend these results to completely prime factor algebras.

<|endoftext|><|startoftext|> Semi-spheroidal Quantum Harmonic Oscillator

D. N. Poenaru,1,2,∗ R. A. Gherghescu,1,2 A. V. Solov’yov,1 and W. Greiner1
1 Frankfurt Institute for Advanced Studies, J. W. Goethe Universität, Max-von-Laue-Str. 1, D-60438 Frankfurt am Main, Germany
2 Horia Hulubei National Institute of Physics and Nuclear Engineering (IFIN-HH), P.O.
Box MG-6, RO-077125 Bucharest-Magurele, Romania
(Dated: November 15, 2018)

A new single-particle shell model is derived by solving the Schrödinger equation for a semi-spheroidal potential well. Only the negative parity states of the Z(z) component of the wave function are allowed, so that new magic numbers are obtained for oblate semi-spheroids, the semi-sphere and prolate semi-spheroids. The semi-spherical magic numbers are identical with those obtained at the oblate spheroidal superdeformed shape: 2, 6, 14, 26, 44, 68, 100, 140, ... The superdeformed prolate magic numbers of the semi-spheroidal shape are identical with those obtained at the spherical shape of the spheroidal harmonic oscillator: 2, 8, 20, 40, 70, 112, 168 ...

PACS numbers: 03.65.Ge, 21.10.Pc, 31.10.+z

The spheroidal harmonic oscillator has been used in various branches of physics. Of particular interest is the famous single-particle Nilsson model [1], very successful in nuclear physics, and its variants [2, 3, 4] for atomic clusters. Major spherical shells N = 2, 8, 20, 40, 58, 92 have been found [2] in the mass spectra of sodium clusters of N atoms per cluster, and Clemenger's shell model [3] was able to explain this sequence of spherical magic numbers.

In the present paper we write explicitly the analytical relationships for the energy levels of the spheroidal harmonic oscillator and derive the corresponding solutions for a semi-spheroidal harmonic oscillator, which may be useful for studying atomic clusters deposited on planar surfaces.

For spheroidal equipotential surfaces, generated by a potential with cylindrical symmetry, the states of the valence electrons were found [3] by using an effective single-particle Hamiltonian with the potential

$$V = \frac{M\omega_0^2 R_0^2}{2}\left[\rho^2\left(\frac{2+\delta}{2-\delta}\right)^{2/3} + z^2\left(\frac{2-\delta}{2+\delta}\right)^{4/3}\right] \tag{1}$$

In order to get analytical solutions we shall neglect an additional term proportional to $(l^2 - \langle l^2\rangle_n)$. We plan to include such a term, which requires a numerical solution, in the future. K. L.
Clemenger introduced the deformation δ by expressing the two dimensionless semiaxes (in units of the radius of a sphere with the same volume, $R_0 = r_s N^{1/3}$, where $r_s$ is the Wigner-Seitz radius, 2.117 Å for Na [5, 6]) as

$$a = \left(\frac{2-\delta}{2+\delta}\right)^{1/3}; \qquad c = \left(\frac{2+\delta}{2-\delta}\right)^{2/3} \tag{2}$$

The spheroid surface equation in dimensionless cylindrical coordinates ρ and z is given by

$$\frac{\rho^2}{a^2} + \frac{z^2}{c^2} = 1 \tag{3}$$

where a is the minor (major) semiaxis for a prolate (oblate) spheroid and c is the major (minor) semiaxis for a prolate (oblate) spheroid. Volume conservation leads to $a^2c = 1$.

One can separate the variables in the Schrödinger equation, HΨ = EΨ, written in cylindrical coordinates. As a result the wave function [7, 8] may be written as

$$\Psi(\eta,\xi,\varphi) = \psi_{n_r}^{m}(\eta)\,\Phi_m(\varphi)\,Z_{n_z}(\xi) \tag{4}$$

where each component of the wave function is orthonormalized, leading to

$$\Phi_m(\varphi) = \frac{1}{\sqrt{2\pi}}\,e^{im\varphi} \tag{5}$$

$$\psi_{n_r}^{m}(\eta) = N_{n_r}^{m}\,\eta^{|m|/2}e^{-\eta/2}L_{n_r}^{|m|}(\eta), \qquad N_{n_r}^{m} = \frac{1}{\alpha_\perp}\left[\frac{2\,n_r!}{(n_r+|m|)!}\right]^{1/2} \tag{6}$$

in which $\eta = R_0^2\rho^2/\alpha_\perp^2$ and the quantum numbers $m = (n_\perp - 2i)$ with i = 0, 1, ... up to $(n_\perp-1)/2$ for an odd $n_\perp$ or to $(n_\perp-2)/2$ for an even $n_\perp$. $L_n^m(x)$ is the associated Laguerre polynomial and the constant $\alpha_\perp = \sqrt{\hbar/(M\omega_\perp)}$ has the dimension of a length.

$$Z_{n_z}(\xi) = N_{n_z}\,e^{-\xi^2/2}H_{n_z}(\xi), \qquad N_{n_z} = \frac{1}{\left(\alpha_z\sqrt{\pi}\,2^{n_z}n_z!\right)^{1/2}} \tag{7}$$

where $\xi = R_0 z/\alpha_z$, $\alpha_z = \sqrt{\hbar/(M\omega_z)}$, and the main quantum number $n = n_\perp + n_z = 0, 1, 2, \ldots$ The eigenvalues are

$$E_n = \hbar\omega_\perp(n_\perp + 1) + \hbar\omega_z(n_z + 1/2) \tag{8}$$

The parity of the Hermite polynomials $H_{n_z}(\xi)$ is given by $(-1)^{n_z}$, meaning that the even-order Hermite polynomials are even functions, $H_{2n_z}(-\xi) = H_{2n_z}(\xi)$, and the odd-order Hermite polynomials are odd functions, $H_{2n_z+1}(-\xi) = -H_{2n_z+1}(\xi)$. There is a recurrence relationship $2zH_n = H_{n+1} + 2nH_{n-1}$. One has $H_0 = 1$, $H_1 = 2z$, $H_2 = 4z^2 - 2$, $H_3 = 8z^3 - 12z$, $H_4 = 16z^4 - 48z^2 + 12$, $H_5 = 32z^5 - 160z^3 + 120z$, etc.

FIG. 1: LEFT: Spheroidal harmonic oscillator energy levels in units of $\hbar\omega_0$ vs. the deformation parameter δ (horizontal axis: spheroidal deformation, from −0.6 to 0.8). Only 6 major shells (N = 0, 1, 2, ..., 5) have been considered. Each level is labeled by the $n, n_\perp$ quantum numbers and is $(2n_\perp+2)$-fold degenerate. The labels are 0,0; 1,0, 1,1; 2,0, 2,1, 2,2; 3,0, 3,1, 3,2, 3,3; 4,0, 4,1, 4,2, 4,3, 4,4, etc. RIGHT: Semi-spheroidal harmonic oscillator energy levels in units of $\hbar\omega_0$ vs. the deformation coordinate δ. Only 9 major shells (N = 0, 1, 2, ..., 8) have been considered. Each level is labeled by the $n, n_\perp$ quantum numbers (with $n_z = n - n_\perp = 1, 3, 5, \ldots$) and is $(2n_\perp + 2)$-fold degenerate. The labels are 1,0; 2,1; 3,2, 3,0; 4,3, 4,1; 5,4, 5,2, 5,0; 6,5, 6,3, 6,1, etc. The semi-spherical magic numbers are identical with those obtained at the oblate spheroidal superdeformed shape (δ = −2/3): 2, 6, 14, 26, 44, 68, 100, 140, ...

In units of $\hbar\omega_0$ the eigenvalues, $\epsilon = E/(\hbar\omega_0)$, are given by

$$\epsilon = \frac{2n + 3 + \delta\,(2n_\perp - n + 1/2)}{(2-\delta)^{1/3}(2+\delta)^{2/3}} \tag{9}$$

For a prolate spheroid, δ > 0, at $n_\perp = 0$ the energy level decreases with deformation except for n = 0, but when $n_\perp = n$ it increases. For a given prolate deformation and a maximum energy $\epsilon_m$, there are $n_{min}$ closed shells and other levels for higher-order shells up to $n_{max}$:

$$n_{min} = \frac{(2-\delta)^{1/3}(2+\delta)^{2/3}\epsilon_m - 3 - \delta/2}{2+\delta} \tag{10}$$

$$n_{max} = \frac{(2-\delta)^{1/3}(2+\delta)^{2/3}\epsilon_m - 3 - \delta/2}{2-\delta} \tag{11}$$

and similar formulae for oblate deformations, δ < 0.

FIG. 2: LEFT: Harmonic oscillator potential V = V(ξ), the wave functions $Z_{n_z} = Z_{n_z}(\xi)$ for $n_z$ = 0, 1, 2, 3, 4, 5 and the corresponding contributions to the total energy levels $\epsilon_{z\,n_z} = E_{n_z}/\hbar\omega_z = (n_z + 1/2)$, i.e. 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, for spherical shapes, δ = 0. Here $\xi = R_0 z/\alpha_z$ with $\alpha_z = \sqrt{\hbar/(M\omega_z)}$. RIGHT: The similar functions for a semi-spherical harmonic oscillator potential. Only negative parity states are retained; they vanish at ξ = 0, where the potential wall is infinitely high.

The low-lying energy levels for the six shells (main quantum number n = 0, 1, 2, 3, 4, 5) can be seen in figure 1. Each level, labelled by $n_\perp, n$, may accommodate $2n_\perp + 2$ particles.
One has $\sum_{n_\perp=0}^{n} 2(n_\perp+1) = (n+1)(n+2)$ nucleons in a completely filled shell characterized by n, and the total number of states of the low-lying n + 1 shells is $\sum_{n'=0}^{n}(n'+1)(n'+2) = (n+1)(n+2)(n+3)/3$, leading to the magic numbers 2, 8, 20, 40, 70, 112, 168, ... for a spherical shape. Besides the important degeneracy at a spherical shape (δ = 0), one also has degeneracies at some superdeformed shapes, e.g. for prolate shapes at the ratio c/a = (2 + δ)/(2 − δ) = 2, i.e. δ = 2/3. More details may be found in Table I. The first five shells can reproduce the experimental magic numbers mentioned above; in order to describe the other shells Clemenger introduced the term proportional to $(l^2 - \langle l^2\rangle_n)$.

Let us consider a particular shape (half of an oblate or prolate spheroid) of a semi-spheroidal cluster deposited on a surface, with the z axis perpendicular to the surface and the ρ axis in the surface plane. Then the semi-spheroidal surface equation is given by

$$\rho^2 = \begin{cases} (a/c)^2(c^2 - z^2) & z \ge 0 \\ 0 & z < 0 \end{cases} \tag{12}$$

FIG. 3: Variation of shell corrections with N for Na clusters (horizontal axis: N, from 20 to 160). TOP: δ = 0. The remaining panels correspond to δ = −1, −2/3, −0.4, −0.8/3 (oblate) and δ = 2/3, 0.4, 0.8/3 (prolate). The semi-spherical magic numbers are identical with those obtained at the oblate spheroidal superdeformed shape: 2, 6, 14, 26, 44, 68, 100, 140, ... For a prolate superdeformed (δ = 2/3) shape the magic numbers are identical with those obtained at the spherical shape: 2, 8, 20, 40, 70, 112, 168, ... Other magic numbers are given in Table I. Oblate and prolate shapes are considered on the left-hand side and right-hand side, respectively.

The radius of the semi-sphere obtained for the deformation δ = 0 is $R_s$, given by the volume conservation, $(1/2)(4\pi R_s^3/3) = 4\pi R_0^3/3$, leading to $R_s = 2^{1/3}R_0$. We shall give ρ, z, a, c in units of $R_s$ instead of $R_0$. According to the volume conservation, $a^2cR_s^3/2 = R_0^3$, so that $a^2c = 1$.
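The level counting described above can be checked numerically. The following is only a sketch under stated assumptions, not the authors' procedure: it uses the eigenvalues of Eq. (8), assumes the frequency ratio ω⊥/ωz = (2 + δ)/(2 − δ) (taken here from the axis ratio c/a quoted above, with frequencies inversely proportional to the semiaxes), gives each level the degeneracy 2n⊥ + 2, and reports the cumulative particle number at every distinct energy. The function name `closed_shell_counts` and the flag `odd_nz_only` are hypothetical; the flag implements the restriction to negative-parity (odd n_z) states stated in the abstract.

```python
from fractions import Fraction
from itertools import accumulate


def closed_shell_counts(delta, n_max, odd_nz_only=False):
    """Cumulative particle numbers at each distinct oscillator energy.

    delta        deformation, given as an int, Fraction or string such as "2/3"
    n_max        highest main quantum number n = n_perp + n_z that is enumerated
    odd_nz_only  keep only odd n_z (assumed semi-spheroidal restriction)
    """
    delta = Fraction(delta)

    def energy(n_perp, n_z):
        # Eq. (8) with omega_perp : omega_z = (2 + delta) : (2 - delta);
        # a common positive factor is dropped, so only the ordering matters.
        return (2 + delta) * (n_perp + 1) + (2 - delta) * Fraction(2 * n_z + 1, 2)

    # Levels with n > n_max are not enumerated, so keep only energies below
    # the lowest level of shell n_max + 1 to guarantee complete groups.
    e_cut = min(energy(k, n_max + 1 - k) for k in range(n_max + 2))

    occupancy = {}
    for n_perp in range(n_max + 1):
        for n_z in range(n_max + 1 - n_perp):
            if odd_nz_only and n_z % 2 == 0:
                continue
            e = energy(n_perp, n_z)
            if e < e_cut:
                # each (n_perp, n_z) level holds 2*n_perp + 2 particles
                occupancy[e] = occupancy.get(e, 0) + 2 * n_perp + 2

    return list(accumulate(occupancy[e] for e in sorted(occupancy)))


if __name__ == "__main__":
    print(closed_shell_counts(0, 6))                          # [2, 8, 20, 40, 70, 112, 168]
    print(closed_shell_counts("2/3", 9))                      # [2, 4, 10, 16, 28, 40, 60, 80, 110, 140]
    print(closed_shell_counts(0, 8, odd_nz_only=True))        # [2, 6, 14, 26, 44, 68, 100, 140]
    print(closed_shell_counts("2/3", 13, odd_nz_only=True))   # [2, 8, 20, 40, 70, 112, 168]
```

Exact rational arithmetic is used so that degenerate levels are grouped without a floating-point tolerance. For δ = 0 the output agrees with the closed form (n+1)(n+2)(n+3)/3. For deformations where the frequency ratio is not a small integer, every distinct energy is reported, so the list can contain more entries than a table that records only the pronounced shell gaps.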
Other kinds of shapes, obtained from a spheroid by removing less or more than its half (as in the liquid drop calculations [9]), will be considered in the future; in this case it is not possible to obtain analytical solutions.

The new potential well we have to consider in order to solve the quantum mechanical problem is shown on the right-hand side of figure 2. The potential along the symmetry axis, $V_z(z)$, has a wall of infinitely large height at z = 0, and concerns only positive values of z:

$$V_z = \begin{cases} \infty & z = 0 \\ MR_s^2\omega_z^2z^2/2 & z > 0 \end{cases} \tag{13}$$

In this case the wave functions should vanish at the origin, where the potential wall is infinitely high, so that only negative parity Hermite polynomials ($n_z$ odd) should be taken into consideration. From the energy levels given in figure 1 we have to select only those corresponding to this condition. In this way the former lowest level with n = 0, $n_\perp$ = 0 should be excluded. From the two levels with n = 1 we can retain the level with $n_\perp$ = 0, i.e. $n_z$ = 1. This will be the lowest level for the semi-spherical harmonic oscillator and will accommodate $2n_\perp + 2 = 2$ atoms. From the three levels with n = 2 only the one with $n_z = n_\perp = 1$, with degeneracy $2n_\perp + 2 = 4$, is retained, so that the first two magic numbers at spherical shape (δ = 0) are now 2 followed by 6, etc. Some deformed magic numbers may be found in Table I and as positions of minima in Fig. 3.

Each level, labelled by $n_\perp, n$, may accommodate $2n_\perp + 2$ particles. When n is an odd number, one should only have even $n_\perp$ in order to select the odd $n_z = n - n_\perp$. The contribution of the shells with odd n to the semi-spherical magic numbers will be

$$\sum_{n_\perp\,\mathrm{even}} (2n_\perp + 2) = \frac{(n+1)^2}{2} \tag{14}$$

leading to the sequence 2, 8, 18 for n = 1, 3, 5. The contribution of the shells with even n to the semi-spherical magic numbers will be

$$\sum_{n_\perp\,\mathrm{odd}} (2n_\perp + 2) = \frac{n(n+2)}{2} \tag{15}$$

which gives the sequence 4, 12, 24 for n = 2, 4, 6. This should be interlaced with the preceding one so that the magic numbers will be 2, 2+4 = 6, 6+8 = 14, 14+12 = 26, 26+18 = 44, 44+24 = 68, as shown on the right-hand side of Fig. 1.

TABLE I: TOP: Deformed magic numbers of the spheroidal harmonic oscillator. BOTTOM: Deformed magic numbers of the semi-spheroidal harmonic oscillator.

Spheroidal harmonic oscillator (TOP)
Shape | δ | a/c | Magic numbers
OBLATE | −0.8/3 | 17/13 | 2, 8, 18, 20, 34, 38, 58, 64, 92, 100, 136, 148, ...
OBLATE | −0.4 | 1.5 | 2, 6, 8, 14, 18, 28, 34, 48, 58, 76, 90, 114, 132, ...
OBLATE | −2/3 | 2 | 2, 6, 14, 26, 44, 68, 100, 140, ...
OBLATE | −1 | 3 | 2, 6, 12, 22, 36, 54, 78, 108, 144,
PROLATE | 0.8/3 | 13/17 | 2, 8, 20, 22, 42, 46, 76, 82, 124, 134 ...
PROLATE | 0.4 | 2/3 | 2, 8, 10, 22, 26, 46, 54, 66, 84, 96, 114, 138, 156, ...
PROLATE | 2/3 | 0.5 | 2, 4, 10, 16, 28, 40, 60, 80, 110, 140, ...
PROLATE | 1 | 1/3 | 4, 12, 18, 24, 36, 48, 60, 80, 100, 120, 150, ...

Semi-spheroidal harmonic oscillator (BOTTOM)
Shape | δ | a/c | Magic numbers
OBLATE | −0.8/3 | 17/13 | 2, 6, 12, 22, 26, 36, 42, 56, 64, 82, 92, 114, 126, 154, ...
OBLATE | −0.4 | 1.5 | 2, 6, 12, 22, 36, 54, 78, 108, 144,
OBLATE | −2/3 | 2 | 2, 6, 12, 20, 32, 48, 68, 92, 122, 158, ...
OBLATE | −1 | 3 | 2, 6, 20, 30, 42, 58, 78, 102, 130,
PROLATE | 0.8/3 | 13/17 | 2, 6, 8, 14, 18, 28, 34, 48, 58, 76, 90, 114, 132, ...
PROLATE | 0.4 | 2/3 | 2, 8, 18, 20, 34, 38, 50, 58, 64, 80, 92, 100, ...
PROLATE | 2/3 | 0.5 | 2, 8, 20, 40, 70, 112, 168, ...
PROLATE | 1 | 1/3 | 2, 8, 10, 14, 22, 26, 46, 54, 66, 84, 96, 114, 138, 156, ...

Equation (9) for the harmonic oscillator, in units of $\hbar\omega_0$, is still valid, but one should only allow the values of n and $n_\perp$ for which $n_z = n - n_\perp \ge 1$ is an odd number. The orthonormalization condition of the $Z_{n_z}$ component of the wave function becomes

$$\int_0^\infty Z_{n'_z}(z)Z_{n_z}(z)\,dz = \delta_{n'_z n_z} \tag{16}$$

with $n_z$ = 1, 3, 5, ..., n for odd n and $n_z$ = 1, 3, 5, ..., n − 1 for even n. Consequently the normalization factor is $\sqrt{2}$ times the preceding one:

$$Z_{n_z}(\xi) = \sqrt{2}\,N_{n_z}e^{-\xi^2/2}H_{n_z}(\xi), \qquad N_{n_z} = \frac{1}{\left(\alpha_z\sqrt{\pi}\,2^{n_z}n_z!\right)^{1/2}} \tag{17}$$

For a nucleus with mass number A the shell gap is given by $\hbar\omega_0^0 = 41A^{-1/3}$ MeV. For an atomic cluster [10] the single-particle shell gap is given by

$$\hbar\omega_0(N) = \frac{13.72\ \mathrm{eV\,Å^2}}{r_s^2N^{1/3}}\left[1 + \frac{t}{r_sN^{1/3}}\right]^{-2} \tag{18}$$

whose prefactor is $3.0613\,N^{-1/3}$ eV in the case of Na clusters. Since we consider solely monovalent elements, N in this equation
is the number of atoms and t denotes the electronic spillout for the neutral cluster according to [10].

The shell correction energy, δU [11], in figure 3 shows minima at the oblate and prolate magic numbers given in the lower part of Table I. The striking result is that the superdeformed prolate magic numbers of the semi-spheroidal shape are identical with those obtained at the spherical shape of the spheroidal harmonic oscillator. We expect that this kind of symmetry will no longer be present for the Hamiltonian including the term proportional to $(l^2 - \langle l^2\rangle_n)$ and/or for the more complex equipotential surfaces we shall study in the future.

∗ poenaru@fias.uni-frankfurt.de
[1] S. G. Nilsson, Det Kongelige Danske Videnskabernes Selskab (Dan. Mat. Fys. Medd.) 29 (1955).
[2] W. D. Knight, K. Clemenger, W. A. de Heer, W. A. Saunders, M. Y. Chou, and M. L. Cohen, Phys. Rev. Lett. 52, 2141 (1984).
[3] K. L. Clemenger, Phys. Rev. B 32, 1359 (1985).
[4] S. M. Reimann, M. Brack, and K. Hansen, Z. Phys. D 28, 235 (1993).
[5] M. Brack, Phys. Rev. B 39, 3533 (1989).
[6] C. Yannouleas and U. Landman, Phys. Rev. B 51, 1902 (1995).
[7] A. J. Rassey, Phys. Rev. 109, 949 (1958).
[8] D. Vautherin, Phys. Rev. C 7, 296 (1973).
[9] V. V. Semenikhina, A. G. Lyalin, A. V. Solov’yov, and W. Greiner, to be published (2007).
[10] K. L. Clemenger, Ph.D. Dissertation (1985), University of California, Berkeley.
[11] V. M. Strutinsky, Nuclear Physics A 95, 420 (1967).

ABSTRACT A new single-particle shell model is derived by solving the Schr\"odinger equation for a semi-spheroidal potential well. Only the negative parity states of the $Z(z)$ component of the wave function are allowed, so that new magic numbers are obtained for oblate semi-spheroids, semi-sphere and prolate semi-spheroids. The semi-spherical magic numbers are identical with those obtained at the oblate spheroidal superdeformed shape: 2, 6, 14, 26, 44, 68, 100, 140, ...
The superdeformed prolate magic numbers of the semi-spheroidal shape are identical with those obtained at the spherical shape of the spheroidal harmonic oscillator: 2, 8, 20, 40, 70, 112, 168 ...

<|endoftext|><|startoftext|> Introduction to mathematics of quasicrystals, edited by M. V. Jaric (Academic Press, Boston, 1989), pp. 53–80.
[6] S. Dworkin and J.-I. Shieh, Commun. Math. Phys. 168, 337 (1995).
[7] R. Penrose, Bull. Inst. Math. and Its Appl. 10, 266 (1974).
[8] M. V. Jaric and M. Ronchetti, Phys. Rev. Lett. 62, 1209 (1989).
[9] G. van Ophuysen, M. Weber, and L. Danzer, J. Phys. A 28, 281 (1995).
[10] J. E. S. Socolar, in Quasicrystals, The State of Art (2nd Ed.), edited by D. DiVincenzo and P. J. Steinhardt (World Scientific, Singapore, 1999), pp. 225–250.
[11] M. Gardner, Sci. Am. 236, 110 (1977).
[12] H.-C. Jeong and E. D. Williams, Surface Science Reports 34, 171 (1999).
[13] R. Penrose, Emperor’s New Mind (Oxford University Press, New York, 2002), p. 640ff.
[14] T. Dotera, H.-C. Jeong, and P. J. Steinhardt, in Methods of structural analysis of modulated structures and quasicrystals, edited by J. M. Perez-Mato et al. (World Scientific, Singapore, 1991), pp. 660–663.
[15] K. Ingersent, Ph.D. thesis, University of Pennsylvania, 1990.
[16] We introduce the concept of sticky sites as a mathematical device to avoid the mistake of copying a tile in a flipped worm, but it may mimic the real growth process of quasicrystals. Recently, Fournée et al. observed “traps” or sticky sites on which adatoms are easily captured for some quasicrystal surfaces [21].
[17] Covering of all ends of the semi-infinite worms further ensures that all sticky sites in the decapod tiling stay outside of the semi-infinite worms.
[18] From Fig. 2(b), one can see that an active decapod tiling is obtained by flipping a semi-infinite worm in a cartwheel tiling, while flipping a half of an infinite worm results in an inactive decapod tiling.
[19] The requirement of the underneath tiles is added to ensure that the nucleation happens from the third layer. In real growth, this requirement may not be needed. Nucleation is likely to happen only on the top layer (hence from the third layer), which can be large enough to wait for the slow nucleation process.
[20] The sticky sites (Fig. 2(a)) are formed only on a compact cluster of tiles. Therefore, the vertical growth by Rule-V, which produces isolated tiles, cannot continue for more than one layer height without a nucleation process.
[21] V. Fournée et al., Phys. Rev. B 67, 033406 (2003).

ABSTRACT A local growth algorithm for a decagonal quasicrystal is presented. We show that a perfect Penrose tiling (PPT) layer can be grown on a decapod tiling layer by a three dimensional (3D) local rule growth. Once a PPT layer begins to form on the upper layer, successive 2D PPT layers can be added on top resulting in a perfect decagonal quasicrystalline structure in bulk with a point defect only on the bottom surface layer. Our growth rule shows that an ideal quasicrystal structure can be constructed by a local growth algorithm in 3D, contrary to the necessity of non-local information for a 2D PPT growth.

<|endoftext|><|startoftext|> Introduction

Cosmological models representing the early stages of the Universe have been studied by several authors. An LRS (Locally Rotationally Symmetric) Bianchi type-V spatially homogeneous space-time is of particular interest because its structure is richer, both physically and geometrically, than that of the standard perfect fluid FRW models. An LRS Bianchi type-V universe is a simple generalization of the Robertson-Walker metric with negative curvature. Most cosmological models assume that the matter in the universe can be described by ’dust’ (a pressureless distribution) or at best a perfect fluid. However, bulk viscosity is expected to play an important role at certain stages of an expanding universe [1]−[3].
It has been shown that bulk viscosity leads to inflationary-like solutions [4] and acts like a negative energy field in an expanding universe [5]. Furthermore, there are several processes which are expected to give rise to viscous effects. These are the decoupling of neutrinos during the radiation era and the decoupling of radiation and matter during the recombination era. Bulk viscosity is associated with the Grand Unification Theories (GUT) phase transition and string creation. Thus, we should consider the presence of a material distribution other than a perfect fluid to have realistic cosmological models (see Grøn [6] for a review on cosmological models with bulk viscosity). A number of authors have discussed cosmological solutions with bulk viscosity in various contexts [7]−[9].

Models with a relic cosmological constant Λ have received considerable attention recently among researchers for various reasons (see Refs. [10]−[14] and references therein). Some of the recent discussions on the cosmological constant “problem” and on the consequences for cosmology of a time-varying cosmological constant, by Ratra and Peebles [15], Dolgov [16]−[18] and Sahni and Starobinsky [19], have pointed out that in the absence of any interaction with matter or radiation, the cosmological constant remains a “constant”. However, in the presence of interactions with matter or radiation, a solution of the Einstein equations and the assumed equation of covariant conservation of stress-energy with a time-varying Λ can be found. For these solutions, conservation of energy requires a decrease in the energy density of the vacuum component to be compensated by a corresponding increase in the energy density of matter or radiation. Earlier research on this topic is contained in Zeldovich [20], Weinberg [11] and Carroll, Press and Turner [21]. Recent observations by Perlmutter et al. [22] and Riess et al.
[23] strongly favour a significant and positive value of Λ. Their findings arise from the study of more than 50 type Ia supernovae with redshifts in the range 0.10 ≤ z ≤ 0.83, and these suggest Friedmann models with negative pressure matter such as a cosmological constant (Λ), domain walls or cosmic strings (Vilenkin [24], Garnavich et al. [25]). Recently, Carmeli and Kuzmenko [26] have shown that the cosmological relativistic theory (Behar and Carmeli [27]) predicts the value Λ = 1.934 × 10^{−35} s^{−2} for the cosmological constant. This value of Λ is in excellent agreement with the measurements recently obtained by the High-Z Supernova Team and the Supernova Cosmology Project (Garnavich et al. [25], Perlmutter et al. [22], Riess et al. [23], Schmidt et al. [28]). The main conclusion of these observations is that the expansion of the universe is accelerating.

Several ansätze have been proposed in which the Λ term decays with time (see Refs. Gasperini [29, 30], Berman [31], Freese et al. [14], Özer and Taha [14], Peebles and Ratra [32], Chen and Wu [33], Abdussattar and Vishwakarma [34], Gariel and Le Denmat [35], Pradhan et al. [36]). Of special interest is the ansatz Λ ∝ S^{−2} (where S is the scale factor of the Robertson-Walker metric) by Chen and Wu [33], which has been considered or modified by several authors (Abdel-Rahaman [37], Carvalho et al. [14], Waga [38], Silveira and Waga [39], Vishwakarma [40]).

Recently Bali and Yadav [41] obtained LRS Bianchi type-V viscous fluid cosmological models in general relativity. Motivated by the situations discussed above, in this paper we focus upon the exact solutions of Einstein’s field equations in the presence of a bulk viscous fluid in an expanding universe. We do this by extending the work of Bali and Yadav [41] to include a time-dependent cosmological term Λ in the field equations. We have also assumed the coefficient of bulk viscosity to be a power function of mass density. This paper is organized as follows.
The metric and the field equations are presented in section 2. In section 3 we deal with the solution of the field equations in presence of viscous fluid. The sections 3.1 and 3.2 contain the two different cases and also con- tain some physical aspects of these models respectively. Section 4 describe two models under suitable transformations. Finally in section 5 concluding remarks have been given. 2 The Metric and Field Euations We consider LRS Bianchi type-V metric in the form ds2 = −dt2 +A2dx2 +B2e2x(dy2 + dz2), (1) where A and B are functions of t alone. The Einstein’s field equations (in gravitational units c = 1, G = 1) read as i + Λg i = −8πT i , (2) where R i is the Ricci tensor; R = g ijRij is the Ricci scalar; and T i is the stress energy-tensor in the presence of bulk stress given by i = (ρ+ p)viv j + pg i − (v i; + v ;i + v jvℓvi;ℓ + viv ξ − 2 vℓ;ℓ(g i + viv j). (3) Here ρ, p, η and ξ are the energy density, isotropic pressure, coefficients of shear viscosity and bulk viscous coefficient respectively and vi the flow vector satisfying the relations ivj = −1. (4) The semicolon (; ) indicates covariant differentiation. We choose the coordinates to be comoving, so that vi = δi4. The Einstein’s field equations (2) for the line element (1) has been set up as = −8π p− 2ηA4 ξ − 2 − Λ, (5) = −8π p− 2η − Λ, (6) 2A4B4 = −8πρ− Λ, (7) = 0. (8) The suffix 4 after the symbols A, B denotes ordinary differentiation with respect to t and θ = vℓ;ℓ 3 Solutions of the Field Eqations In this section, we have revisited the solutions obtained by Bali and Yadav [41]. Equations (5) - (8) are four independent equations in seven unknowns A, B, p, ρ, ξ, η and Λ. For complete determinacy of the system, we need three extra conditions. Eq. (8), after integration, reduce to A = Bk, (9) where k is an integrating constant. Equations (5) and (6) lead to − A44 − A4B4 = −16πη . (10) Using Eq. (9) in (10), we obtain k + 1 f = −16πη, (11) where B4 = f(B). Eq. 
(11) leads to f = − 16πη (k + 2) , (12) where L is an integrating constant. Eq. (12) again leads to B = (k + 2) k1 − k2e−16πηt k+2 , (13) where , (14) , (15) N being constant of integration. From Eqs. (9) and (13), we obtain A = (k + 2) k1 − k2e−16πηt k+2 . (16) Hence the metric (1) reduces to the form ds2 = −dt2 + (k + 2) k1 − k2e−16πηt k+2 dx2 + e2x(k + 2) k1 − k2e−16πηt k+2 (dy2 + dz2). (17) The pressure and density of the model (17) are obtained as 8πp = (8π)(16πη)k2e −16πηt 3(k + 2)2(k1 − k2e−16πηt)2 k1(k + 2) 2(4η + 3ξ)− {k2(4η + 3ξ) +4k(η+3ξ)+2(5η+6ξ)}k2e−16πηt [(k + 2)(k1 − k2e−16πηt)] −Λ, (18) 8πρ = − (2k + 1) (k + 2)2 (16πη)2k22 e−32πηt (k1 − k2e−16πηt)2 [(k + 2)(k1 − k2e−16πηt)] + Λ. (19) The expansion θ in the model (17) is obtained as (16πη)k2e −16πηt (k1 − k2e−16πηt) . (20) For complete determinacy of the system we have to consider three extra condi- tions. Firstly we assume that the coefficient of shear viscosity is constant, i.e., η = η0 (say). For the specification of Λ(t), we secondly assume that the fluid obeys an equation of state of the form p = γρ, (21) where γ(0 ≤ γ ≤ 1) is a constant. Thirdly bulk viscosity (ξ) is assumed to be a simple power function of the energy density [42]−[45]. ξ(t) = ξ0ρ n, (22) where ξ0 and n are constants. For small density, n may even be equal to unity as used in Murphy’s work [46] for simplicity. If n = 1, Eq. (22) may correspond to a radiative fluid [47]. Near the big bang, 0 ≤ n ≤ 1 is a more appropriate assumption [48] to obtain realistic models. For simplicity and realistic models of physical importance, we consider the following two cases (n = 0, 1): 3.1 Model I: Solution for n = 0 When n = 0, Eq. (22) reduces to ξ = ξ0 = constant. Hence, in this case Eqs. (18) and (19), with the use of (21), lead to 8π(1 + γ)ρ = k1(k + 2) 2(4η0 + 3ξ0)− {k2(4η0 + 3ξ0) + 4k(η0 + 3ξ0) + 2(5η0 + 6ξ0)}k2e−16πη0t − (2k + 1)M . (23) Eliminating ρ(t) between Eqs. 
(19) and (23), we obtain (1 + γ)Λ = k1(k + 2) 2(4η0 + 3ξ0)− {k2(4η0 + 3ξ0) + 4k(η0 + 3ξ0) + 2(5η0 + 6ξ0)}k2e−16πη0t + (2k + 1)γ (1− 3γ) , (24) where M = 16πk2η0e −16πη0t, N = (k + 2)(k1 − k2e−16πη0t), P = 2k2 + 2k + 5, Q = k2 + 4k + 4. (25) 3.2 Model II: Solution for n = 1 When n = 1, Eq. (22) reduces to ξ = ξ0ρ . Hence, in this case Eqs. (18) and (19), with the use of (21), leads to 8πρ = 16πM{2k1(k + 2)2η0 − Pk2η0e−16πη0t} 3 [(1 + γ)N2 −M{k1(k + 2)2ξ0 −Qk2ξ0e−16πη0t}] k+2 − (2k + 1)M2 [(1 + γ)N2 −M{k1(k + 2)2ξ0 −Qk2ξ0e−16πη0t}] . (26) Eliminating ρ(t) between Eqs. (19) and (26), we get Λ = 16πM [2k1(k + 2) 2η0 − Pk2η0e−16πη0t] + γ(2k + 1) (1 + γ) (1− 3γ) (1 + γ)N M [k1(k + 2) 2ξ0 −Qk2ξ0e−16πη0t]{4N k+2 − (2k + 1)M2} (1 + γ)N2 [(1 + γ)N2 −M{k1(k + 2)2ξ0 −Qk2ξ0e−16πη0t}] From Eqs. (23) and (26), we note that ρ(t) is a decreasing function of time and ρ > 0 for all time in both models. The behaviour of the universe in these models will be determined by the cosmological term Λ; this term has the same effect as a uniform mass density ρeff = −Λ/4πG, which is constant in space and time. A positive value of Λ corresponds to a negative effective mass density (repulsion). Hence, we expect that in the universe with a positive value of Λ, the expansion will tend to accelerate; whereas in the universe with negative value of Λ, the expansion will slow down, stop and reverse. From Eqs. (24) and (27), we observe that the cosmological term Λ in both models is a decreasing function of time and it approaches a small positive value as time increase more and more. This is a good agreement with recent observations of supernovae Ia (Garnavich et al. [25], Perlmutter et al. [22], Riess et al. [23], Schmidt et al. [28]). The shear σ in the model (17) is given by (k − 1)M√ . 
(28) The non-vanishing components of conformal curvature tensor are given by C2323 = −C1414 = (k − 1)M [kM − 16πη0k1(k + 2)], (29) C1313 = −C2424 = (k − 1)M [16πη0k1(k + 2)− kM ], (30) C1212 = −C3434 = (k − 1)M [16πη0k1(k + 2)− kM ]. (31) Equations (20) and (28) lead to (k − 1)√ 3(k + 2) = constant. (32) The model (17) is expanding, non-rotating and shearing. Since σ = conatant, hence the model does not approach isotropy. The space-time (17) is Petrov type D in presence of viscosity. 4 Other Models After using the transformation k1 − k2e−16πηt = sin (16πητ), k + 2 = 1/16πη, (33) the metric (17) reduces to ds2 = − cos (16πητ) k1 − sin (16πητ) dτ2 + sin (16πητ) ]2(1−32πη) + e2x sin (16πητ) ](32πη) (dy2 + dz2). (34) The pressure (p), density (ρ) and the expansion (θ) of the model (34) are ob- tained as 8πp = (16πη)2{k1 − sin (16πητ) 3 sin2 (16πητ) 2k1−2(1−48πη+1152π2η2){k1−sin (16πητ)} (16πη)(8πξ){k1 − sin (16πητ)} sin (16πητ) sin (16πητ) ]2(1−32πη) − Λ, (35) 8πρ = 2(24πη − 1)(16πη)3{k1 − sin (16πητ)}2 sin2 (16πητ) sin (16πητ) ]2(1−32πη) (16πη){k1 − sin (16πητ)} sin (16πητ) . (37) 4.1 Model I: Solution for n = 0 When n = 0, Eq. (22) reduces to ξ = ξ0 = constant. Hence, in this case Eqs. (35) and (36), with the use of (21), lead to 8π(1 + γ)ρ = 2(16πη0) 3 sin2(16πη0τ) [k1 − P1M1] + (16πη0)(8πξ0)M1 sin(16πη0τ) + 4N1 + 2(24πη0 − 1)(16πη0)3M21 sin2(16πη0τ) . (38) Eliminating ρ(t) between Eqs. (36) and (38), we obtain (1 + γ)Λ = 2(16πη0) 3 sin2(16πη0τ) [k1 − P1M1] + (16πη0)(8πξ0)M1 sin(16πη0τ) + (1− 3γ)N1 + 2γ(24πη0 − 1)(16πη0)3M21 sin2(16πη0τ) . (39) 4.2 Model II: Solution for n = 1 When n = 1, Eq. (22) reduces to ξ = ξ0ρ . Hence, in this case Eqs. (35) and (36), with the use of (21), lead to 8πρ = 2(16πη0) 2M1[(k1 − P1M1) + 3(24πη0 − 1)(16πη0)M1] 3 sin(16πη0τ)[(1 + γ) sin(16πη0τ) − 16πη0ξ0M1] 4N1 sin(16πη0τ) [(1 + γ) sin(16πη0τ) − 16πη0ξ0M1] . (40) Eliminating ρ(t) between Eqs. 
(36) and (40), we obtain 2(16πη0) 2M1(k1 − P1M1) 3 sin(16πη0τ)[(1 + γ) sin(16πη0τ)− 16πη0ξ0M1] [3(16πη0ξ0)M1 + (1− 3γ) sin(16πη0τ)] [(1 + γ) sin(16πη0τ)− 16πη0ξ0M1] 2(24πη0 − 1)(16πη0)3M21 [γ(1 + γ) sin(16πη0τ) − (1− γ)(16πη0ξ0)M1] (1 + γ) sin2(16πη0τ)[(1 + γ) sin(16πη0τ)− (16πη0ξ0)M1] , (41) where M1 = k1 − sin(16πη0τ), 16πη0 sin (16πη0τ) ]2(1−32πη) P1 = 1− 48πη0 + 1152π2η20 . (42) The shear (σ) in the model (34) is obtained as (1− 48πη0)(16πη0)[k1 − sin(16πη0τ)√ 3 sin(16πη0τ) . (43) The models descibed in cases 4.1 and 4.2 preserve the same properties as in the cases of 3.1 and 3.2. 5 Conclusions We have obtained a new class of LRS Bianchi type-V cosmological models of the universe in presence of a viscous fluid distribution with a time dependent cosmological term Λ. We have revisited the solutions obtained by Bali and Ya- dav [41] and obtained new solutions which also generalize their work. The cosmological constant is a parameter describing the energy density of the vacuum (empty space), and a potentially important contribution to the dy- namical history of the universe. The physical interpretation of the cosmological constant as vacuum energy is supported by the existence of the “zero point” energy predicted by quantum mechanics. In quantum mechanics, particle and antiparticle pairs are consistently being created out of the vacuum. Even though these particles exist for only a short amount of time before annihilating each other they do give the vacuum a non-zero potential energy. In general relativity, all forms of energy should gravitate, including the energy of vacuum, hence the cosmological constant. A negative cosmological constant adds to the attractive gravity of matter, therefore universes with a negative cosmological constant are invariably doomed to re-collapse [49]. A positive cosmological constant resists the attractive gravity of matter due to its negative pressure. 
For most universes, the positive cosmological constant eventually dominates over the attraction of matter and drives the universe to expand exponentially [50]. The cosmological constants in all models given in Sections 3.1 and 3.2 are decreasing functions of time, and they all approach a small positive value at late times. This behaviour is supported by the recent type Ia supernova observations of the High-z Supernova Team and the Supernova Cosmology Project (Garnavich et al. [25], Perlmutter et al. [22], Riess et al. [23], Schmidt et al. [28]). Thus, with our approach, we obtain a physically relevant decay law for the cosmological term, unlike other investigations in which ad hoc laws were used to arrive at a mathematical expression for the decaying vacuum energy. Our derived models are in good agreement with the observational results. We have derived a value for the cosmological constant Λ and attempted to formulate a physical interpretation for it.

Acknowledgements

The authors wish to thank the Harish-Chandra Research Institute, Allahabad, India, for providing the facilities where part of this work was done. We also thank Professor Raj Bali for his fruitful suggestions and comments on the first draft of the paper.

References

[1] C. W. Misner, Astrophys. J. 151, 431 (1968).
[2] G. F. R. Ellis, in General Relativity and Cosmology, Enrico Fermi Course, ed. R. K. Sachs (Academic Press, New York, 1979).
[3] B. L. Hu, in Advance in Astrophysics, eds. L. J. Fung and R. Ruffini (World Scientific, Singapore, 1983).
[4] T. Padmanabhan and S. M. Chitre, Phys. Lett. A 120, 433 (1987).
[5] V. B. Johri and R. Sudarshan, Proc. Int. Conf. on Mathematical Modelling in Science and Technology, eds. L. S. Srinath et al. (World Scientific, Singapore, 1989).
[6] Ø. Grøn, Astrophys. Space Sci. 173, 191 (1990).
[7] A. Pradhan, V. K. Yadav and I. Chakrabarty, Int. J. Mod. Phys. D 10, 339 (2001). I. Chakrabarty, A. Pradhan and N. N. Saste, Int. J. Mod. Phys.
D 10, 741 (2001). A. Pradhan and I. Aotemashi, Int. J. Mod. Phys. D 11, 1419 (2002). A. Pradhan and H. R. Pandey, Int. J. Mod. Phys. D 12 , 941 (2003). [8] L. P. Chimento, A. S. Jakubi and D. Pavon, Class. Quant. Grav. 16, 1625 (1999). [9] G. P. Singh, S. G. Ghosh and A. Beesham, Aust. J. Phys. 50, 903 (1997). [10] S. Weinberg, Rev. Mod. Phys. 61, 1 (1989). [11] S. Weinberg, Gravitation and Cosmology, (Wiley, New York, 1972). [12] J. A. Frieman and I. Waga, Phys. Rev. D 57, 4642 (1998). [13] R. Carlberg, et al., Astrophys. J. 462, 32 (1996). [14] M. Özer and M. O. Taha, Nucl. Phys. B 287, 776 (1987). K. Freese, F. C. Adams, J. A. Frieman and E. Motta, ibid. B 287, 1797 (1987). J. C. Carvalho, J. A. S. Lima and I. Waga, Phys. Rev.D 46, 2404 (1992). V. Silviera and I. Waga, ibid. D 50, 4890 (1994). [15] B. Ratra and P. J. E. Peebles, Phys. Rev. D 37, 3406 (1988). [16] A. D. Dolgov, in The Very Early Universe, eds. G. W. Gibbons, S. W. Hawking and S. T. C. Siklos, (Cambridge Univerity Press, 1983). [17] A. D. Dolgov, M. V. Sazhin and Ya. B. Zeldovich, Basics of Modern Cosmology, (Editions Frontiers, 1990). [18] A. D. Dolgov, Phys. Rev. D 55, 5881 (1997). [19] V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 9, 373 (2000). [20] Ya. B. Zeldovich, Sov. Phys.-Uspekhi 11, 381 (1968). [21] S. M. Carroll, W. H. Press and E. L. Turner, Ann. Rev. Astron. Astrophys. 30, 499 (1992). [22] S. Perlmutter et al., Astrophys. J. 483, 565 (1997), Supernova Cosmology Project Collaboration (astro-ph/9608192); S. Perlmutter et al., Nature 391, 51 (1998), Supernova Cosmology Project Collaboration (astro-ph/9712212); S. Perlmutter et al., Astrophys. J. 517, 565 (1999), Project Collaboration (astro-ph/9608192). [23] A. G. Riess et al., Astron. J. 116, 1009 (1998); Hi-Z Supernova Team Collaboration (astro-ph/9805201). [24] A. Vilenkin, Phys. Rep. 121, 265 (1985). [25] P. M. Garnavich et al., Astrophys. J. 493, L53 (1998a), Hi-z Supernova Team Collaboration (astro-ph/9710123); P. M. 
Garnavich et al., Astrophys. J. 509, 74 (1998b), Hi-z Supernova Team Collaboration (astro-ph/9806396).
[26] M. Carmeli and T. Kuzmenko, Int. J. Theor. Phys. 41, 131 (2002).
[27] S. Behar and M. Carmeli, Int. J. Theor. Phys. 39, 1375 (2002).
[28] B. P. Schmidt et al., Astrophys. J. 507, 46 (1998), Hi-z Supernova Team Collaboration (astro-ph/9805200).
[29] M. Gasperini, Phys. Lett. B 194, 347 (1987).
[30] M. Gasperini, Class. Quant. Grav. 5, 521 (1988).
[31] M. S. Berman, Int. J. Theor. Phys. 29, 567 (1990); M. S. Berman, Int. J. Theor. Phys. 29, 1419 (1990); M. S. Berman, Phys. Rev. D 43, 75 (1991). M. S. Berman and M. M. Som, Int. J. Theor. Phys. 29, 1411 (1990). M. S. Berman, M. M. Som and F. M. Gomide, Gen. Rel. Grav. 21, 287 (1989). M. S. Berman and F. M. Gomide, Gen. Rel. Grav. 22, 625 (1990).
[32] P. J. E. Peebles and B. Ratra, Astrophys. J. 325, L17 (1988).
[33] W. Chen and Y. S. Wu, Phys. Rev. D 41, 695 (1990).
[34] Abdussattar and R. G. Vishwakarma, Pramana J. Phys. 47, 41 (1996).
[35] J. Gariel and G. Le Denmat, Class. Quant. Grav. 16, 149 (1999).
[36] A. Pradhan and A. Kumar, Int. J. Mod. Phys. D 10, 291 (2001). A. Pradhan and V. K. Yadav, Int. J. Mod. Phys. D 11, 983 (2002). A. Pradhan and O. P. Pandey, Int. J. Mod. Phys. D 12, 941 (2003). A. Pradhan, S. K. Srivastava and K. R. Jotania, Czech. J. Phys. 54, 255 (2004). A. Pradhan, A. K. Yadav and L. Yadav, Czech. J. Phys. 55, 503 (2005). A. Pradhan and P. Pandey, Czech. J. Phys. 55, 749 (2005). A. Pradhan and P. Pandey, Astrophys. Space Sci. 301, 221 (2006). G. S. Khadekar, A. Pradhan and M. R. Molaei, Int. J. Mod. Phys. D 15, 95 (2006). A. Pradhan, K. Srivastava and R. P. Singh, Fizika B (Zagreb) 15, 141 (2006). C. P. Singh, S. Kumar and A. Pradhan, Class. Quantum Grav.
24, 455 (2007). A. Pradhan, A. K. Singh and S. Otarod, Romanian J. Phys. 52, 415 (2007).
[37] A.-M. M. Abdel-Rahaman, Gen. Rel. Grav. 22, 655 (1990); Phys. Rev. D 45, 3492 (1992).
[38] I. Waga, Astrophys. J. 414, 436 (1993).
[39] V. Silveira and I. Waga, Phys. Rev. D 50, 4890 (1994).
[40] R. G. Vishwakarma, Class. Quant. Grav. 17, 3833 (2000).
[41] R. Bali and M. K. Yadav, J. Raj. Acad. Phys. Sci. 1, 47 (2002).
[42] D. Pavon, J. Bafaluy and D. Jou, Class. Quant. Grav. 8, 357 (1991); Proc. Hanno Rund Conf. on Relativity and Thermodynamics, ed. S. D. Maharaj (University of Natal, Durban, 1996), p. 21.
[43] R. Maartens, Class. Quant. Grav. 12, 1455 (1995).
[44] W. Zimdahl, Phys. Rev. D 53, 5483 (1996).
[45] N. O. Santos, R. S. Dias and A. Banerjee, J. Math. Phys. 26, 878 (1985).
[46] G. L. Murphy, Phys. Rev. D 8, 4231 (1973).
[47] S. Weinberg, Astrophys. J. 168, 175 (1971).
[48] U. A. Belinskii and I. M. Khalatnikov, Sov. Phys. JETP 42, 205 (1976).
[49] S. M. Carroll, W. H. Press and E. L. Turner, ARA&A 30, 499 (1992).
[50] C. S. Kochanek, Astrophys. J. 384, 1 (1992).

ABSTRACT

An LRS Bianchi type-V cosmological model representing a viscous fluid distribution with a time-dependent cosmological term $\Lambda$ is investigated. To get a determinate solution, the viscosity coefficient of the bulk viscous fluid is assumed to be a power function of the mass density. It turns out that the cosmological term $\Lambda(t)$ is a decreasing function of time, which is consistent with recent observations of type Ia supernovae. Various physical and kinematic features of these models have also been explored.
<|endoftext|><|startoftext|> Introduction The spin-1/2 antiferromagnetic Heisenberg XXZ chain is one of the most fundamental models for one-dimensional quantum magnetism, which is given by the Hamiltonian Sxj S j+1 + S j+1 +∆S , (1.1) where Sαj = σ j /2 with σ j being the Pauli matrices acting on the j-th site and ∆ is the anisotropy parameter. For ∆ > 1, it is called the massive XXZ model where the system is gapful. Meanwhile for −1 < ∆ ≤ 1 case, the system is gapless and called the massless XXZ model. Especially we call it XXX model for the isotropic case ∆ = 1. The exact eigenvalues and eigenvectors of this model can be obtained by the Bethe Ansatz method [1, 2]. Many physical quantities in the thermodynamic limit such as specific heat, magnetic susceptibility, elementary excitations, etc..., can be exactly evaluated even at finite temperature by the Bethe ansatz method [2]. The exact calculation of the correlation functions, however, is still a difficult problem. The exceptional case is ∆ = 0, where the system reduces to a lattice free-fermion model by the Jordan-Wigner transformation. In this case, we can calculate arbitrary correlation functions by means of Wick’s theorem [3, 4]. Recently, however, there have been rapid developments in the exact evaluations of correlation functions for ∆ 6= 0 case also, since Kyoto Group (Jimbo, Miki, Miwa, Nakayashiki) derived a multiple integral representation for arbitrary correlation functions. Using the representation theory of the quantum affine algebra Uq(ŝl2), they first derived a multiple integral representation for massive XXZ antiferromagnetic chain in 1992 [5, 6], which is before long extended to the XXX case [7, 8] and the massless XXZ case [9]. Later the same integral representations were reproduced by Kitanine, Maillet, Terras [10] in the framework of Quantum Inverse Scattering Method. They have also succeeded in generalizing the integral representations to the XXZ model with an external magnetic field [10]. 
More recently the multiple integral formulas were extended to dynamical correlation functions as well as finite temperature correlation functions [11, 12, 13, 14]. In this way it has now been established that the correlation functions of the XXZ model are represented by multiple integrals in general. However, these multiple integrals are difficult to evaluate both numerically and analytically.

For general anisotropy ∆, it has been shown that the multiple integrals up to four dimensions can be reduced to one-dimensional integrals [15, 16, 17, 18, 19, 20, 21]. As a result all the density matrix elements within four lattice sites have been obtained for general anisotropy [21]. Reducing the multiple integrals to one dimension, however, involves heavy calculation, which makes it difficult to obtain correlation functions on more than four lattice sites. On the other hand, at the isotropic point ∆ = 1, an algebraic method based on the qKZ equation has been devised [22] and all the density matrix elements up to six lattice sites have been obtained [23, 24]. Moreover, as for the spin-spin correlation functions, results up to the seventh-neighbour correlation $\langle S^z_1 S^z_8 \rangle$ for the XXX chain have been obtained from the generating functional approach [25, 26]. It is desirable that this algebraic method be generalized to the case ∆ ≠ 1. Actually, Boos, Jimbo, Miwa, Smirnov and Takeyama have derived an exponential formula for the density matrix elements of the XXZ model, which does not contain multiple integrals [27, 28, 29, 30, 31]. It still seems hard, however, to evaluate the formula for general density matrix elements.

Among the general ∆ ≠ 0, there is a special point ∆ = 1/2, where some intriguing properties have been observed. Let us define a correlation function called the Emptiness Formation Probability (EFP) [8], which signifies the probability to find a ferromagnetic string of length n:

\[ P(n) \equiv \left\langle \prod_{j=1}^{n} \left( \frac{1}{2} + S^z_j \right) \right\rangle . \]
(1.2)

The explicit general formula for P(n) at ∆ = 1/2 was conjectured in [33]:

\[ P(n) = 2^{-n^2} \prod_{k=0}^{n-1} \frac{(3k+1)!}{(n+k)!}, \tag{1.3} \]

which is proportional to the number of alternating sign matrices of size n × n. Later this conjecture was proved by the explicit evaluation of the multiple integral representing the EFP [34]. Remarkably, one can also obtain the exact asymptotic behavior as n → ∞ from this formula, which is the only such example apart from the free fermion point ∆ = 0. Note also that, as for the longitudinal two-point correlation functions at ∆ = 1/2, results up to the eighth-neighbour correlation function $\langle S^z_1 S^z_9 \rangle$ have been obtained in [32] by use of the multiple integral representation for the generating function. Most strikingly, all the results are expressed as single rational numbers.

These results motivated us to calculate other correlation functions at ∆ = 1/2. Actually we have obtained all the density matrix elements up to six lattice sites by the direct evaluation of the multiple integrals. All the results can be written as single rational numbers, as expected. A direct evaluation of the multiple integrals is possible owing to a special feature of the case ∆ = 1/2, as is explained below.

2 Analytical evaluation of multiple integral

Here we shall describe how we analytically obtain the density matrix elements at ∆ = 1/2 from the multiple integral formula.
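Any correlation function can be expressed as a sum of density matrix elements $P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n}$, which are defined by the ground state expectation value of the product of elementary matrices:

\[ P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n} \equiv \langle E_1^{\epsilon'_1 \epsilon_1} \cdots E_n^{\epsilon'_n \epsilon_n} \rangle, \tag{2.1} \]

where $E_j^{\epsilon' \epsilon}$ are the 2 × 2 elementary matrices acting on the j-th site:

\[ E_j^{++} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}_{[j]} = \frac{1}{2} + S^z_j, \qquad E_j^{--} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}_{[j]} = \frac{1}{2} - S^z_j, \]
\[ E_j^{+-} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}_{[j]} = S^+_j = S^x_j + i S^y_j, \qquad E_j^{-+} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}_{[j]} = S^-_j = S^x_j - i S^y_j . \]

The multiple integral formula of the density matrix element for the massless XXZ chain reads [9]

\[ P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n} = (-\nu)^{-n(n-1)/2} \int_{-\infty}^{\infty} \frac{dx_1}{2\pi} \cdots \int_{-\infty}^{\infty} \frac{dx_n}{2\pi} \prod_{a>b} \frac{\sinh(x_a - x_b)}{\sinh[(x_a - x_b - i f_{ab} \pi)\nu]} \prod_{k=1}^{n} \frac{\sinh^{y_k - 1}[(x_k + i\pi/2)\nu] \, \sinh^{n - y_k}[(x_k - i\pi/2)\nu]}{\cosh^n x_k}, \tag{2.2} \]

where the parameter ν is related to the anisotropy as ∆ = cos πν, and $f_{ab}$ and $y_k$ are determined as

\[ f_{ab} = \left( 1 + \mathrm{sign}[(s' - a + 1/2)(s' - b + 1/2)] \right)/2, \]
\[ y_1 > y_2 > \cdots > y_{s'}, \quad \epsilon'_{y_i} = +, \qquad y_{s'+1} > \cdots > y_n, \quad \epsilon_{n+1-y_i} = - . \tag{2.3} \]

In the case ∆ = 1/2, namely ν = 1/3, a significant simplification occurs in the multiple integrals owing to the trigonometric identity

\[ \sinh(x_a - x_b) = 4 \sinh[(x_a - x_b)/3] \, \sinh[(x_a - x_b + i\pi)/3] \, \sinh[(x_a - x_b - i\pi)/3]. \tag{2.4} \]

Indeed, if we note that the parameter $f_{ab}$ takes the value 0 or 1, the first factor in the multiple integral at ν = 1/3 can be decomposed as

\[ \frac{\sinh(x_a - x_b)}{\sinh[(x_a - x_b - i\pi)/3]} = 4 \sinh\frac{x_a - x_b}{3} \, \sinh\frac{x_a - x_b + i\pi}{3} = -1 + \omega e^{\frac{2}{3}(x_a - x_b)} + \omega^{-1} e^{-\frac{2}{3}(x_a - x_b)}, \tag{2.5} \]
\[ \frac{\sinh(x_a - x_b)}{\sinh[(x_a - x_b)/3]} = 4 \sinh\frac{x_a - x_b + i\pi}{3} \, \sinh\frac{x_a - x_b - i\pi}{3} = 1 + e^{\frac{2}{3}(x_a - x_b)} + e^{-\frac{2}{3}(x_a - x_b)}, \tag{2.6} \]

where $\omega = e^{i\pi/3}$. Expanding the hyperbolic functions in the second factor into exponentials,

\[ \sinh^{y-1}[(x + i\pi/2)/3] \, \sinh^{n-y}[(x - i\pi/2)/3] = 2^{1-n} \left( \omega^{1/2} e^{x/3} - \omega^{-1/2} e^{-x/3} \right)^{y-1} \left( \omega^{-1/2} e^{x/3} - \omega^{1/2} e^{-x/3} \right)^{n-y} \]
\[ = 2^{1-n} \sum_{l=0}^{y-1} \sum_{m=0}^{n-y} (-1)^{l+m} \binom{y-1}{l} \binom{n-y}{m} \omega^{y - l + m - (n+1)/2} \, e^{\frac{1}{3}(n - 2l - 2m - 1)x}, \tag{2.7} \]

we can explicitly evaluate the multiple integral by use of the formula

\[ \int_{-\infty}^{\infty} \frac{e^{\alpha x} \, dx}{\cosh^n x} = 2^{n-1} B\left( \frac{n+\alpha}{2}, \frac{n-\alpha}{2} \right), \qquad \mathrm{Re}(n \pm \alpha) > 0, \tag{2.8} \]

where B(p, q) is the beta function defined by

\[ B(p, q) = \int_0^1 t^{p-1} (1-t)^{q-1} \, dt, \qquad \mathrm{Re}(p), \mathrm{Re}(q) > 0. \]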
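(2.9)

Table 1: Comparison with the asymptotic formula of the transverse correlation function

              ⟨S^x_1 S^x_2⟩   ⟨S^x_1 S^x_3⟩   ⟨S^x_1 S^x_4⟩   ⟨S^x_1 S^x_5⟩   ⟨S^x_1 S^x_6⟩
Exact         −0.156250       0.0800781       −0.0671234      0.0521997       −0.0467664
Asymptotics   −0.159522       0.0787307       −0.0667821      0.0519121       −0.0466083

In this way we have succeeded in calculating all the density matrix elements up to six lattice sites. All the results are represented by single rational numbers, which are presented in Appendix A. As for the spin-spin correlation functions, we have newly obtained the fourth- and fifth-neighbour transverse two-point correlation functions:

\[ \langle S^x_1 S^x_2 \rangle = -\frac{5}{32} = -0.15625, \]
\[ \langle S^x_1 S^x_3 \rangle = \frac{41}{512} = 0.080078125, \]
\[ \langle S^x_1 S^x_4 \rangle = -\frac{4399}{65536} = -0.0671234130859375, \]
\[ \langle S^x_1 S^x_5 \rangle = \frac{1751531}{33554432} = 0.0521996915340423583984375, \]
\[ \langle S^x_1 S^x_6 \rangle = -\frac{3213760345}{68719476736} = -0.046766368104727007448673248291015625. \]

The asymptotic formula of the transverse two-point correlation function for the massless XXZ chain is established in [35, 36]:

\[ \langle S^x_1 S^x_{1+n} \rangle \sim A_x(\eta) \frac{(-1)^n}{n^{\eta}} - \tilde{A}_x(\eta) \frac{1}{n^{\eta + 1/\eta}} + \cdots, \qquad \eta = 1 - \nu, \]
\[ A_x(\eta) = \frac{1}{8(1-\eta)^2} \left[ \frac{\Gamma\left( \frac{\eta}{2-2\eta} \right)}{2\sqrt{\pi} \, \Gamma\left( \frac{1}{2-2\eta} \right)} \right]^{\eta} \exp\left[ -\int_0^{\infty} \frac{dt}{t} \left( \frac{\sinh(\eta t)}{\sinh(t) \cosh[(1-\eta)t]} - \eta e^{-2t} \right) \right], \]
\[ \tilde{A}_x(\eta) = \frac{1}{2\eta(1-\eta)} \left[ \frac{\Gamma\left( \frac{\eta}{2-2\eta} \right)}{2\sqrt{\pi} \, \Gamma\left( \frac{1}{2-2\eta} \right)} \right]^{\eta + 1/\eta} \exp\left[ -\int_0^{\infty} \frac{dt}{t} \left( \frac{\cosh(2\eta t) e^{-2t} - 1}{2 \sinh(\eta t) \sinh(t) \cosh[(1-\eta)t]} + \frac{1}{\sinh(\eta t)} - \frac{\eta^2 + 1}{\eta} e^{-2t} \right) \right], \tag{2.10} \]

which produces a good numerical value even for small n, as is shown in Table 1. Note that the longitudinal correlation function was obtained up to the eighth-neighbour correlation $\langle S^z_1 S^z_9 \rangle$ from the multiple integral representation for the generating function [32]. Note also that both the longitudinal and transverse correlation functions up to third neighbours were obtained for general anisotropy ∆ in [21].

3 Reduced density matrix and entanglement entropy

Below let us discuss the reduced density matrix for a sub-chain and the entanglement entropy.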
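For orientation, the entanglement entropy studied in this section is the base-2 von Neumann entropy of the eigenvalues of the reduced density matrix. The following Python sketch is illustrative only and is not the computation used in the paper. The helper name `entanglement_entropy` is our own. The n = 1 input assumes that the one-site reduced density matrix is diag(1/2, 1/2), which follows from P(1) = 1/2 together with spin-reversal symmetry. The value of S(6) is copied from Table 2, and the constant term is estimated as S(6) − (1/3) log2 6, as quoted later in this section.

```python
import math


def entanglement_entropy(eigenvalues):
    """Base-2 von Neumann entropy, -sum(w * log2(w)).

    Zero eigenvalues are skipped, using the convention 0 * log 0 = 0.
    """
    return -sum(w * math.log2(w) for w in eigenvalues if w > 0)


# n = 1: assumed reduced density matrix diag(1/2, 1/2).
s1 = entanglement_entropy([0.5, 0.5])

# Constant term of the logarithmic law, estimated from the tabulated S(6).
S6 = 1.9144714710902746  # value quoted in Table 2
k_estimate = S6 - math.log2(6) / 3

if __name__ == "__main__":
    print("S(1) =", s1)
    print("k estimate =", round(k_estimate, 4))
```

The first value reproduces the entry S(1) = 1 of Table 2, and the second agrees with the estimate 1.0528 for the constant term at ∆ = 1/2.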
The density matrix for the infinite system at zero temperature has the form

\[ \rho_T \equiv |GS\rangle \langle GS|, \tag{3.1} \]

where |GS⟩ denotes the ground state of the total system. We consider a finite sub-chain of length n, the rest of which is regarded as an environment. We define the reduced density matrix for this sub-chain by tracing out the environment from the infinite chain:

\[ \rho_n \equiv \mathrm{tr}_E \, \rho_T = \left[ P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n} \right]_{\epsilon_j, \epsilon'_j = \pm} . \tag{3.2} \]

We have numerically evaluated all the eigenvalues $\omega_\alpha$ ($\alpha = 1, 2, \cdots, 2^n$) of the reduced density matrix $\rho_n$ up to n = 6. We show the distribution of the eigenvalues in Figure 1. The distribution is less degenerate compared with the isotropic case ∆ = 1 shown in [24]. In the odd n case, all the eigenvalues are two-fold degenerate due to the spin-reversal symmetry.

Figure 1: Eigenvalue-distribution of density matrices

Subsequently we exactly evaluate the von Neumann entropy (entanglement entropy) defined as

\[ S(n) \equiv -\mathrm{tr} \, \rho_n \log_2 \rho_n = -\sum_{\alpha=1}^{2^n} \omega_\alpha \log_2 \omega_\alpha . \tag{3.3} \]

The exact numerical values of S(n) up to n = 6 are shown in Table 2.

Table 2: Entanglement entropy S(n) of a finite sub-chain of length n

n    S(n)
1    1
2    1.3716407621868583
3    1.5766810784924767
4    1.7179079372711414
5    1.8262818282012363
6    1.9144714710902746

By analyzing the behaviour of the entanglement entropy S(n) for large n, we can see how far quantum correlations reach [37]. In the massive region ∆ > 1, the entanglement entropy saturates as n grows, because the correlation length is finite. This means the ground state is well approximated by a subsystem of finite length corresponding to the large eigenvalues of the reduced density matrix. On the other hand, in the massless case −1 < ∆ ≤ 1, conformal field theory predicts that the entanglement entropy shows a logarithmic divergence [38]:

\[ S(n) \sim \frac{1}{3} \log_2 n + k_\Delta . \]
(3.4)

Our exact results up to n = 6 agree quite well with the asymptotic formula, as shown in Figure 2.

Figure 2: Entanglement entropy S(n) of a finite sub-chain of length n (exact values compared with the asymptotic formula)

We estimate the numerical value of the constant term $k_{\Delta=1/2}$ as $k_{\Delta=1/2} \sim S(6) - \frac{1}{3} \log_2 6 = 1.0528$. This numerical value is slightly smaller than in the isotropic case ∆ = 1, where the constant is estimated as $k_{\Delta=1} \sim 1.0607$ from the exact data for S(n) up to n = 6 [24]. At the free fermion point ∆ = 0, the exact asymptotic formula has been obtained in [39]:

\[ S(n) \sim \frac{1}{3} \log_2 n + k_{\Delta=0}, \qquad k_{\Delta=0} = \frac{1}{3} - \int_0^{\infty} dt \left\{ \frac{e^{-3t}}{3t} + \frac{1}{t \sinh^2(t/2)} - \frac{\cosh(t/2)}{2 \sinh^3(t/2)} \right\} \Big/ \ln 2 . \tag{3.5} \]

In this case the numerical value of the constant term is $k_{\Delta=0} = 1.0474932144 \cdots$.

4 Summary and discussion

We have succeeded in obtaining all the density matrix elements on six lattice sites for the XXZ chain at ∆ = 1/2. In particular, we have newly obtained the fourth- and fifth-neighbour transverse spin-spin correlation functions. Our exact results for the transverse correlations show good agreement with the asymptotic formula established in [35, 36]. Subsequently we have calculated all the eigenvalues of the reduced density matrix $\rho_n$ up to n = 6. From these results we have exactly evaluated the entanglement entropy, which shows good agreement with the asymptotic formula derived via conformal field theory.

Finally, we remark that similar procedures to evaluate the multiple integrals are also possible at ν = 1/n for n = 4, 5, 6, · · · , since there are trigonometric identities similar to (2.4). We will report the calculation of correlation functions for these cases in subsequent papers.

Acknowledgement

The authors are grateful to K. Sakai for valuable discussions. This work is supported in part by Grant-in-Aid for Scientific Research (B) No. 18340112 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
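Appendix A Density matrix elements up to n = 6

In this appendix we present all the independent density matrix elements defined in eq. (2.1) up to n = 6. Other elements can be computed from the relations

\[ P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n} = 0 \quad \text{if} \quad \sum_{j=1}^{n} \epsilon_j \neq \sum_{j=1}^{n} \epsilon'_j, \tag{A.1} \]
\[ P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n} = P^{\epsilon_1,\cdots,\epsilon_n}_{\epsilon'_1,\cdots,\epsilon'_n} = P^{-\epsilon'_1,\cdots,-\epsilon'_n}_{-\epsilon_1,\cdots,-\epsilon_n} = P^{\epsilon'_n,\cdots,\epsilon'_1}_{\epsilon_n,\cdots,\epsilon_1}, \tag{A.2} \]
\[ P^{+,\epsilon'_1,\cdots,\epsilon'_n}_{+,\epsilon_1,\cdots,\epsilon_n} + P^{-,\epsilon'_1,\cdots,\epsilon'_n}_{-,\epsilon_1,\cdots,\epsilon_n} = P^{\epsilon'_1,\cdots,\epsilon'_n,+}_{\epsilon_1,\cdots,\epsilon_n,+} + P^{\epsilon'_1,\cdots,\epsilon'_n,-}_{\epsilon_1,\cdots,\epsilon_n,-} = P^{\epsilon'_1,\cdots,\epsilon'_n}_{\epsilon_1,\cdots,\epsilon_n}, \tag{A.3} \]

and the formula for the EFP [33, 34]

\[ P(n) = P^{+,\cdots,+}_{+,\cdots,+} = 2^{-n^2} \prod_{k=0}^{n-1} \frac{(3k+1)!}{(n+k)!} . \tag{A.4} \]

Appendix A.1 n ≤ 4

$P^{-+}_{+-} = -5/16 = -0.3125$,
$P^{-++}_{++-} = 41/512 = 0.0800781$,
$P^{-+++}_{+-++} = -221/8192 = -0.0269775$,
$P^{-+++}_{++-+} = 1579/65536 = 0.0240936$,
$P^{-+++}_{+++-} = -289/32768 = -0.00881958$,
$P^{+-++}_{+-++} = 1037/16384 = 0.0632935$,
$P^{+-++}_{++-+} = -2005/32768 = -0.0611877$,
$P^{--++}_{+-+-} = -3821/65536 = -0.0583038$,
$P^{--++}_{++--} = 1393/65536 = 0.0212555$,
$P^{-+-+}_{+-+-} = 4883/32768 = 0.149017$,
$P^{-++-}_{+--+} = 3091/32768 = 0.0943298$.

Appendix A.2 n = 5

$P^{-++++}_{+-+++} = -14721/8388608 = -0.00175488$,
$P^{-++++}_{++-++} = 37335/16777216 = 0.00222534$,
$P^{-++++}_{+++-+} = -48987/33554432 = -0.00145993$,
$P^{-++++}_{++++-} = 13911/33554432 = 0.00041458$,
$P^{+-+++}_{+-+++} = 179699/33554432 = 0.00535545$,
$P^{+-+++}_{++-++} = -120337/16777216 = -0.00717264$,
$P^{+-+++}_{+++-+} = 165155/33554432 = 0.004922$,
$P^{++-++}_{++-++} = 168313/16777216 = 0.0100322$,
$P^{--+++}_{+--++} = 31069/2097152 = 0.0148149$,
$P^{--+++}_{+-+-+} = -411583/16777216 = -0.0245323$,
$P^{--+++}_{+-++-} = 196569/16777216 = 0.0117164$,
$P^{--+++}_{++-+-} = -281271/33554432 = -0.00838253$,
$P^{--+++}_{+++--} = 79673/33554432 = 0.00237444$,
$P^{-+-++}_{+--++} = -1441787/33554432 = -0.0429686$,
$P^{-+-++}_{+-++-} = -1261655/33554432 = -0.0376002$,
$P^{-+-++}_{++-+-} = 59459/2097152 = 0.0283523$,
$P^{-++-+}_{+-++-} = 1575515/33554432 = 0.046954$,
$P^{-+++-}_{+--++} = -696151/33554432 = -0.0207469$,
$P^{-+++-}_{+-+-+} = 1366619/33554432 = 0.0407284$.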
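The closed form (A.4) for the EFP is easy to evaluate in exact rational arithmetic, which gives a quick consistency check on the tabulated elements. The following Python sketch is illustrative only. The function names `asm_count` and `efp` are our own, and the code assumes the formula $P(n) = 2^{-n^2} \prod_{k=0}^{n-1} (3k+1)!/(n+k)!$ of [33, 34], in which the product counts the alternating sign matrices of size n × n.

```python
from fractions import Fraction
from math import factorial


def asm_count(n):
    """Product over k = 0..n-1 of (3k+1)!/(n+k)!, kept as an exact fraction.

    The product is the number of n x n alternating sign matrices.
    """
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(factorial(3 * k + 1), factorial(n + k))
    return result


def efp(n):
    """Emptiness formation probability P(n) at Delta = 1/2, as a Fraction."""
    return asm_count(n) / 2 ** (n * n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, asm_count(n), efp(n), float(efp(n)))
```

For example, `efp(1)` is 1/2 and `efp(2)` is 1/8, so every P(n) is a single rational number whose denominator is a power of 2, in line with the other elements listed in this appendix.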
Appendix A.3 n = 6 P−++++++−++++ = − 1546981 34359738368 = −0.0000450231, P−+++++++−+++ = 5095899 68719476736 = 0.0000741551, P−++++++++−++ = − 2366275 34359738368 = −0.0000688677, P−+++++++++−+ = 2455833 68719476736 = 0.0000357371, P−++++++++++− = − 284577 34359738368 = −8.28228× 10−6, P+−+++++−++++ = 2927709 17179869184 = 0.000170415, P+−++++++−+++ = − 20086627 68719476736 = −0.000292299, P+−+++++++−++ = 19268565 68719476736 = 0.000280395, P+−++++++++−+ = − 10295153 68719476736 = −0.000149814, P++−+++++−+++ = 17781349 34359738368 = 0.000517505, P++−++++++−++ = − 35087523 68719476736 = −0.000510591, P−−+++++−−+++ = 48421023 34359738368 = 0.00140924, P−−+++++−+−++ = − 214080091 68719476736 = −0.00311528, P−−+++++−++−+ = 88171589 34359738368 = 0.00256613, P−−+++++−+++− = − 57522267 68719476736 = −0.000837059, P−−++++++−−++ = 56776545 34359738368 = 0.00165241, P−−++++++−+−+ = − 154538459 68719476736 = −0.00224883, P−−++++++−++− = 60809571 68719476736 = 0.000884896, P−−+++++++−−+ = 6708473 8589934592 = 0.000780969, P−−+++++++−+− = − 33366621 68719476736 = −0.000485548, P−−++++++++−− = 3860673 34359738368 = 0.00011236, P−+−++++−−+++ = − 85706851 17179869184 = −0.0049888, P−+−++++−+−++ = 12211375 1073741824 = 0.0113727, P−+−++++−++−+ = − 332557469 34359738368 = −0.0096787, P−+−++++−+++− = 56183761 17179869184 = 0.00327033, P−+−+++++−−++ = − 430452959 68719476736 = −0.00626391, P−+−+++++−+−+ = 606065059 68719476736 = 0.00881941, P−+−+++++−++− = − 123612511 34359738368 = −0.0035976, P−+−++++++−−+ = − 108202041 34359738368 = −0.00314909, P−+−++++++−+− = 70061315 34359738368 = 0.00203905, P−++−+++−−+++ = 7860495 1073741824 = 0.00732066, P−++−+++−+−++ = − 591759525 34359738368 = −0.0172225, P−++−+++−++−+ = 1044016671 68719476736 = 0.0151924, P−++−+++−+++− = − 367905053 68719476736 = −0.00535372, P−++−++++−−++ = 676957849 68719476736 = 0.00985103, P−++−++++−+−+ = − 988973861 68719476736 = −0.0143915, P−++−++++−++− = 6581795 1073741824 = 0.00612977, P−++−+++++−−+ = 363618785 
68719476736 = 0.00529135, P−+++−++−−+++ = − 185522333 34359738368 = −0.00539941, P−+++−++−+−++ = 901633567 68719476736 = 0.0131205, P−+++−++−++−+ = − 103539423 8589934592 = −0.0120536, P−+++−++−+++− = 38524625 8589934592 = 0.00448486, P−+++−+++−−++ = − 267901987 34359738368 = −0.00779697, P−+++−+++−+−+ = 12750645 1073741824 = 0.011875, P−+++−++++−−+ = − 309855965 68719476736 = −0.004509, P−++++−+−−+++ = 29410257 17179869184 = 0.0017119, P−++++−+−+−++ = − 296882461 68719476736 = −0.00432021, P−++++−+−++−+ = 35985105 8589934592 = 0.00418922, P−++++−++−−++ = 92176287 34359738368 = 0.00268268, P+−−++++−−+++ = 202646807 34359738368 = 0.0058978, P+−−++++−+−++ = − 972245985 68719476736 = −0.014148, P+−−++++−++−+ = 217687057 17179869184 = 0.0126711, P+−−+++++−+−+ = − 211696415 17179869184 = −0.0123224, P+−−++++++−−+ = 78922695 17179869184 = 0.00459391, P+−+−+++−+−++ = 1196499417 34359738368 = 0.0348227, P+−+−+++−++−+ = − 2209522727 68719476736 = −0.0321528, P+−+−++++−+−+ = 1108384987 34359738368 = 0.0322582, P+−++−++−++−+ = 530683585 17179869184 = 0.0308899, P+−++−+++−−++ = 347202525 17179869184 = 0.0202098, P−−−++++−−++− = − 268623007 68719476736 = −0.00390898, P−−−++++−+−+− = 46285135 8589934592 = 0.0053883, P−−−++++−++−− = − 136974885 68719476736 = −0.00199325, P−−−+++++−+−− = 19939391 17179869184 = 0.00116063, P−−−++++++−−− = − 18442085 68719476736 = −0.000268368, P−−+−+++−−++− = 1018463205 68719476736 = 0.0148206, P−−+−+++−+−+− = − 1454513249 68719476736 = −0.021166, P−−+−+++−++−− = 277721503 34359738368 = 0.00808276, P−−+−++++−+−− = − 335265249 68719476736 = −0.00487875, P−−++−++−−++− = − 369408975 17179869184 = −0.0215024, P−−++−++−+−+− = 1104236607 34359738368 = 0.0321375, P−−++−++−++−− = − 880560357 68719476736 = −0.0128138, P−−++−+++−−+− = − 876924641 68719476736 = −0.0127609, P−−+++−+−−−++ = 113631201 17179869184 = 0.00661421, P−−+++−+−−+−+ = − 292857807 17179869184 = −0.0170466, P−−+++−+−+−−+ = 548645951 34359738368 = 0.0159677, P−−+++−++−−−+ = − 377925345 
68719476736 = −0.00549954, P−+−+−++−−++− = 1719255909 34359738368 = 0.0500369, P−+−+−++−+−+− = − 5350158879 68719476736 = −0.0778551, P−+−++−+−−+−+ = 1565770597 34359738368 = 0.0455699, P−+−++−+−+−−+ = − 3059753503 68719476736 = −0.0445253, P−++−−++−−++− = − 2117554719 68719476736 = −0.0308145. References [1] H.A. Bethe, Z. Phys. 71 (1931) 205. [2] M. Takahashi, Thermodynamics of One-Dimensional Solvable Models, Cambridge Uni- versity Press, Cambridge, 1999. [3] E. Lieb, T. Schultz, D. Mattis, Ann. Phys. (N.Y.) 16 (1961) 407. [4] B.M. McCoy, Phys. Rev. 173 (1968) 531. [5] M. Jimbo, K. Miki, T. Miwa, A. Nakayashiki, Phys. Lett. A 168 (1992) 256. [6] M. Jimbo, T. Miwa, Algebraic Analysis of Solvable Lattice Models, CBMS Regional Con- ference Series in Mathematics vol.85, American Mathematical Society, Providence, 1994. [7] A. Nakayashiki, Int. J. Mod. Phys. A 9 (1994) 5673. [8] V.E. Korepin, A. Izergin, F.H.L. Essler, D. Uglov, Phys. Lett. A 190 (1994) 182. [9] M. Jimbo, T. Miwa, J. Phys. A: Math. Gen. 29 (1996) 2923. [10] N. Kitanine, J.M. Maillet, V. Terras, Nucl. Phys. B 567 (2000), 554. [11] N. Kitanine, J.M. Maillet, N.A. Slavnov, V. Terras, Nucl. Phys. B 729 (2005) 558. [12] F.Göhmann, A. Klümper, A. Seel, J. Phys. A: Math. Gen 37 (2004) 7625. [13] F.Göhmann, A. Klümper, A. Seel, J. Phys. A: Math. Gen 38 (2005) 1833. [14] K. Sakai, “Dynamical correlation functions of the XXZ model at finite temperature”, cond-mat/0703319. [15] H.E. Boos, V.E. Korepin, J. Phys. A: Math. Gen. 34 (2001) 5311. [16] H.E. Boos, V.E. Korepin, “Evaluation of integrals representing correlators in XXX Heisenberg spin chain” in. MathPhys Odyssey 2001, Birkhäuser, Basel, (2001) 65. [17] H.E. Boos, V.E. Korepin, Y. Nishiyama, M. Shiroishi, J. Phys. A: Math. Gen 35 (2002) 4443. [18] K. Sakai, M. Shiroishi, Y. Nishiyama, M. Takahashi, Phys. Rev. E 67 (2003) 065101. [19] G. Kato, M. Shiroishi, M. Takahashi, K. Sakai, J. Phys. A: Math. Gen. 36 (2003) L337. [20] M. Takahashi, G. Kato, M. 
Shiroishi, J. Phys. Soc. Jpn. 73 (2004) 245.
[21] G. Kato, M. Shiroishi, M. Takahashi, K. Sakai, J. Phys. A: Math. Gen. 37 (2004) 5097.
[22] H.E. Boos, V.E. Korepin, F.A. Smirnov, Nucl. Phys. B 658 (2003) 417.
[23] H.E. Boos, M. Shiroishi, M. Takahashi, Nucl. Phys. B 712 (2005) 573.
[24] J. Sato, M. Shiroishi, M. Takahashi, J. Stat. Mech. 0612 (2006) P017.
[25] J. Sato, M. Shiroishi, J. Phys. A: Math. Gen. 38 (2005) L405.
[26] J. Sato, M. Shiroishi, M. Takahashi, Nucl. Phys. B 729 (2005) 441, hep-th/0507290.
[27] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Algebra Anal. 17 (2005)
[28] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Commun. Math. Phys. 261 (2006) 245.
[29] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, J. Phys. A: Math. Gen. 38 (2005) 7629.
[30] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Lett. Math. Phys. 75 (2006)
[31] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Annales Henri Poincare 7 (2006) 1395.
[32] N. Kitanine, J.M. Maillet, N.A. Slavnov, V. Terras, J. Stat. Mech. 0509 (2005) L002.
[33] A.V. Razumov, Yu.G. Stroganov, J. Phys. A: Math. Gen. 34 (2001) 3185.
[34] N. Kitanine, J.M. Maillet, N.A. Slavnov, V. Terras, J. Phys. A: Math. Gen. 35 (2002) L385.
[35] S. Lukyanov, A. Zamolodchikov, Nucl. Phys. B 493 (1997) 571.
[36] S. Lukyanov, V. Terras, Nucl. Phys. B 654 (2003) 323.
[37] G. Vidal, J.I. Latorre, E. Rico, A. Kitaev, Phys. Rev. Lett. 90 (2003) 227902.
[38] C. Holzhey, F. Larsen, F. Wilczek, Nucl. Phys. B 424 (1994) 443.
[39] B.-Q. Jin, V.E. Korepin, J. Stat. Phys. 116 (2004) 79.

ABSTRACT

We have analytically obtained all the density matrix elements up to six lattice sites for the spin-1/2 Heisenberg XXZ chain at $\Delta=1/2$.
We use the multiple integral formula of the correlation function for the massless XXZ chain derived by Jimbo and Miwa. As for the spin-spin correlation functions, we have newly obtained the fourth- and fifth-neighbour transverse correlation functions. We have calculated all the eigenvalues of the density matrix and analyze the eigenvalue-distribution. Using these results the exact values of the entanglement entropy for the reduced density matrix up six lattice sites have been obtained. We observe that our exact results agree quite well with the asymptotic formula predicted by the conformal field theory. <|endoftext|><|startoftext|> Counting on Rectangular Areas Milan Janjić, Faculty of Natural Sciences and mathematics, Banja Luka, Republic of Srpska, Bosnia and Herzegovina. Counting on Rectangular Areas Abstract In the first section of this paper we prove a theorem for the number of columns of a rectangular area that are identical to the given one. A special case, concerning (0, 1)-matrices, is also stated. In the next section we apply this theorem to derive several combina- torial identities by counting specified subsets of a finite set. This means that the obtained identities will involve binomial coefficients only. We start with a simple equation which is, in fact, an immediate consequence of Binomial theorem, but it is derived independently of it. The second result concerns sums of binomial coefficients. In a special case we obtain one of the best known binomial identity dealing with alternating sums. Klee’s identity is also obtained as a special case as well as some formu- lae for partial sums of binomial coefficients, that is, for the numbers of Bernoulli’s triangle. 1 A counting theorem The set of natural numbers {1, 2, . . . , n} will be denoted by [n], and by |X | will be denoted the number of elements of the set X. 
For the proof of the main theorem we need the following simple result:

$$\sum_{I \subseteq [n]} (-1)^{|I|} = 0, \qquad (1)$$

where $I$ runs over all subsets of $[n]$ (the empty set included). This may be easily proved by induction or by using the Binomial theorem. The proof by induction, however, makes all further investigations independent even of the Binomial theorem. Let $A$ be an $m \times n$ rectangular matrix filled with elements which belong to a set $\Omega$. By an i-column of $A$ we shall mean each column of $A$ that is equal to $c = [c_1, c_2, \ldots, c_m]^T$, where $c_1, c_2, \ldots, c_m \in \Omega$ are given. We shall denote the number of i-columns of $A$ by $\nu_A(c)$ or simply by $\nu(c)$. For $I = \{i_1, i_2, \ldots, i_k\} \subseteq [m]$, $A(I)$ will denote the number of columns $j$ of $A$ such that $a_{ij} \neq c_i$ for all $i \in I$. We also define $A(\emptyset) = n$. (http://arxiv.org/abs/0704.0851v1)

Theorem 1. The number $\nu(c)$ of i-columns of $A$ is equal to

$$\nu(c) = \sum_{I \subseteq [m]} (-1)^{|I|} A(I), \qquad (2)$$

where the summation is taken over all subsets $I$ of $[m]$.

Proof. The theorem may be proved by the standard combinatorial method, by counting the contribution of each column of $A$ to the sum on the right side of (2). We give here a proof by induction. First, the formula will be proved in the cases $\nu(c) = 0$ and $\nu(c) = n$. In the case $\nu(c) = n$ it is obvious that $A(I) = 0$ for $I \neq \emptyset$, which implies

$$\sum_{I} (-1)^{|I|} A(I) = n + \sum_{I \neq \emptyset} (-1)^{|I|} A(I) = n.$$

In the case $\nu(c) = 0$ we use induction on $n$. If $n = 1$ then the matrix $A$ has only one column, which is not equal to $c$. It follows that there exists $i_0 \in \{1, 2, \ldots, m\}$ such that $a_{i_0,1} \neq c_{i_0}$. Denote by $I_0$ the set of all such indices. Then $A(I) = 1$ if and only if $I \subseteq I_0$. From this and (1) we obtain

$$\sum_{I} (-1)^{|I|} A(I) = \sum_{I \subseteq I_0} (-1)^{|I|} = 0.$$

Suppose now that the formula is true for matrices with $n$ columns and that $A$ has $n + 1$ columns, with $\nu_A(c) = 0$. Omitting the first column, the matrix $B$ with $n$ columns remains.
If $I_0$ is the same as in the case $n = 1$, then

$$\sum_{I} (-1)^{|I|} A(I) = \sum_{I \not\subseteq I_0} (-1)^{|I|} A(I) + \sum_{I \subseteq I_0} (-1)^{|I|} A(I) = \sum_{I \not\subseteq I_0} (-1)^{|I|} B(I) + \sum_{I \subseteq I_0} (-1)^{|I|} (B(I) + 1) = \sum_{I} (-1)^{|I|} B(I) + \sum_{I \subseteq I_0} (-1)^{|I|} = 0,$$

since the first sum is equal to zero by the induction hypothesis, and the second by (1).

For the rest of the proof we use induction on $n$ again. For $n = 1$ the matrix $A$ has only one column, which is either equal to $c$ or not. In both cases the theorem is true, by the preceding. Suppose that the theorem holds for $n$, and that the matrix $A$ has $n + 1$ columns. We may suppose that $\nu(c) \geq 1$. Omitting one of the i-columns we obtain a matrix $B$ with $n$ columns. By the induction hypothesis the theorem is true for $B$. On the other hand, it is clear that $A(I) = B(I)$ for each nonempty subset $I$. Furthermore, $A$ has one i-column more than $B$, which implies

$$\nu(c) = \nu_A(c) = \nu_B(c) + 1 = 1 + \sum_{I} (-1)^{|I|} B(I) = 1 + n + \sum_{I \neq \emptyset} (-1)^{|I|} B(I) = 1 + n + \sum_{I \neq \emptyset} (-1)^{|I|} A(I).$$

Since $A(\emptyset) = n + 1$, this gives

$$\nu(c) = \sum_{I} (-1)^{|I|} A(I),$$

and the theorem is proved.

If the number $A(I)$ does not depend on the elements of the set $I$, but only on its size $|I|$, then equation (2) may be written in the form

$$\nu(c) = \sum_{i=0}^{m} (-1)^i \binom{m}{i} A(i), \qquad (3)$$

where $|I| = i$. Our object of investigation will be (0, 1)-matrices. Let $c$ be an i-column of such a matrix $A$. Take $I_0 \subseteq [m]$, $|I_0| = k$, such that

$$c_i = \begin{cases} 1, & i \in I_0, \\ 0, & i \notin I_0. \end{cases} \qquad (4)$$

Then the number $A(I)$ is equal to the number of columns of $A$ having 0's in the rows labelled by the set $I \cap I_0$, and 1's in the rows labelled by the set $I \setminus I_0$. Suppose that the number $A(I)$ depends only on $|I \cap I_0|$ and $|I \setminus I_0|$. If we denote $|I \cap I_0| = i_1$, $|I \setminus I_0| = i_2$, $A(I) = A(i_1, i_2)$, then (2) may be written in the form

$$\nu(c) = \sum_{i_1=0}^{k} \sum_{i_2=0}^{m-k} (-1)^{i_1+i_2} \binom{k}{i_1} \binom{m-k}{i_2} A(i_1, i_2). \qquad (5)$$

2 Counting subsets of a finite set Suppose that a finite set $X = \{x_1, x_2, \ldots, x_n\}$ is given. Label all subsets of $X$ by $1, 2, \ldots, 2^n$ in an arbitrary way and define an $n \times 2^n$ matrix $A$ in the following way:

$$a_{ij} = \begin{cases} 1 & \text{if } x_i \text{ lies in the set labelled by } j, \\ 0 & \text{otherwise}. \end{cases} \qquad (6)$$

Take $I_0 \subseteq [n]$, $|I_0| = k$, and form the submatrix $B$ of $A$ consisting of those rows of $A$ whose indices belong to $I_0$.
Let $c$ be an arbitrary i-column of $B$. Define

$$I_0' = \{i \in I_0 : c_i = 1\}, \quad I_0'' = \{i \in I_0 : c_i = 0\}. \qquad (7)$$

The number $\nu(c)$ is equal to the number of subsets that contain the set $\{x_i : i \in I_0'\}$ and do not intersect the set $\{x_i : i \in I_0''\}$. There are obviously $\nu(c) = 2^{n-k}$ such sets. Furthermore, if $I \subseteq I_0$ then the number $B(I)$ is equal to the number of subsets that contain the set $\{x_i : i \in I \cap I_0''\}$ and do not meet the set $\{x_i : i \in I \cap I_0'\}$. It is clear that there are $B(I) = 2^{n-|I|}$ such subsets, so that formula (2), in the form (3), may be applied. It follows that

$$2^{n-k} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} 2^{n-i}.$$

Thus we have

Proposition 2.1. For each nonnegative integer $k$,

$$1 = \sum_{i=0}^{k} (-1)^i \binom{k}{i} 2^{k-i}.$$

Note 2.1. The preceding equation is a trivial consequence of the Binomial theorem. But here it is obtained independently of this theorem.

The preceding Proposition shows that counting i-columns over all subsets of $X$ always produces the same result. We shall now make some restrictions on the subsets of $X$ that are counted. Take $0 \leq m_1 \leq m_2 \leq n$ fixed, and consider the submatrix $C$ of $A$ consisting of the rows whose indices belong to $I_0$, and the columns corresponding to those subsets of $X$ that have $m$ elements, $m_1 \leq m \leq m_2$. Let $c$ be an i-column of $C$. Define $I_0' = \{i \in I_0 : c_i = 1\}$, $|I_0'| = l$. The number $\nu(c)$ is equal to the number of sets that contain $\{x_i : i \in I_0'\}$ and do not intersect the set $\{x_i : i \in I_0 \setminus I_0'\}$. We thus have

$$\nu(c) = \sum_{i=m_1-|I_0'|}^{m_2-|I_0'|} \binom{n-|I_0|}{i}.$$

On the other hand, for $I \subseteq I_0$ the number $C(I)$ is the number of sets that contain $\{x_i : i \in I \setminus I_0'\}$ and do not intersect $\{x_i : i \in I \cap I_0'\}$. This number is equal to

$$C(I) = \sum_{i_3=m_1-|I \setminus I_0'|}^{m_2-|I \setminus I_0'|} \binom{n-|I|}{i_3}.$$

It follows that formula (5) may be applied. We thus have

Proposition 2.2. For $0 \leq m_1 \leq m_2 \leq n$ and $0 \leq l \leq k$,

$$\sum_{i=m_1-l}^{m_2-l} \binom{n-k}{i} = \sum_{i_1=0}^{l} \sum_{i_2=0}^{k-l} \sum_{i_3=m_1-i_2}^{m_2-i_2} (-1)^{i_1+i_2} \binom{l}{i_1} \binom{k-l}{i_2} \binom{n-i_1-i_2}{i_3}. \qquad (8)$$

In the special case $k = l$, $m_1 = m_2 = m$ we obtain

Corollary 2.1. For arbitrary nonnegative integers $m, n, k$,

$$\binom{n-k}{m-k} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} \binom{n-i}{m}. \qquad (9)$$

Note 2.2. The preceding is one of the best-known binomial identities. It appears in the book [1] in many different forms.
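As a sanity check on Theorem 1, Proposition 2.1 and identity (9), the following is a minimal brute-force sketch in Python. The function names (`count_directly`, `count_by_theorem`, `prop_2_1`, `identity_9_rhs`) and the random test matrix are illustrative assumptions and are not part of the paper; the code only evaluates both sides of each formula for small parameters.

```python
from itertools import combinations
from math import comb
import random


def count_directly(A, c):
    """Number of columns of A (given as a list of rows) equal to the vector c."""
    m, n = len(A), len(A[0])
    return sum(all(A[i][j] == c[i] for i in range(m)) for j in range(n))


def A_of(A, c, I):
    """A(I): number of columns j with A[i][j] != c[i] for every i in I."""
    n = len(A[0])
    return sum(all(A[i][j] != c[i] for i in I) for j in range(n))


def count_by_theorem(A, c):
    """Right-hand side of (2): signed sum of A(I) over all subsets I of rows."""
    m = len(A)
    total = 0
    for size in range(m + 1):
        for I in combinations(range(m), size):
            total += (-1) ** size * A_of(A, c, I)
    return total


def prop_2_1(k):
    """Right-hand side of Proposition 2.1."""
    return sum((-1) ** i * comb(k, i) * 2 ** (k - i) for i in range(k + 1))


def identity_9_rhs(n, k, m):
    """Right-hand side of identity (9)."""
    return sum((-1) ** i * comb(k, i) * comb(n - i, m) for i in range(k + 1))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, arbitrary choice
    m, n = 4, 30
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    c = [1, 0, 1, 1]
    assert count_by_theorem(A, c) == count_directly(A, c)
    assert all(prop_2_1(k) == 1 for k in range(10))
    # Identity (9), checked only in the range k <= m <= n.
    assert all(
        identity_9_rhs(n, k, m) == comb(n - k, m - k)
        for n in range(8)
        for k in range(n + 1)
        for m in range(k, n + 1)
    )
    print("all checks passed")
```

The subset loop costs $2^m$ evaluations of $A(I)$, so this is only meant for small $m$; it illustrates the statement and is not a replacement for the inductive proof.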
Taking $m_1 = m_2 = m$ in (8) one gets

Corollary 2.2. For arbitrary nonnegative integers $m, n, k, l$ with $l \leq k$,

$$\binom{n-k}{m-l} = \sum_{i_1=0}^{l} \sum_{i_2=0}^{k-l} (-1)^{i_1+i_2} \binom{l}{i_1} \binom{k-l}{i_2} \binom{n-i_1-i_2}{m-i_2}. \qquad (10)$$

For $l = 0$ we obtain

$$\binom{n-k}{m} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} \binom{n-i}{m-i}, \qquad (11)$$

which is only another form of (9). Taking $n = 2k$, $l = k$ in (10) we obtain

$$\binom{k}{m-k} = \sum_{i_1=0}^{k} (-1)^{i_1} \binom{k}{i_1} \binom{2k-i_1}{m}.$$

Substituting $i$ for $k - i_1$ we obtain

Corollary 2.3 (Klee's identity, [2], p. 13).

$$(-1)^k \binom{k}{m-k} = \sum_{i=0}^{k} (-1)^i \binom{k}{i} \binom{k+i}{m}.$$

From (8) we may obtain different formulae for partial sums of binomial coefficients, that is, for the numbers of Bernoulli's triangle. For instance, taking $l = 0$, $m_1 = 0$, $m_2 = m$ and replacing $n$ by $n + k$, we obtain

Corollary 2.4. For any $0 \leq m \leq n$ and arbitrary nonnegative integer $k$,

$$\sum_{i=0}^{m} \binom{n}{i} = \sum_{i_1=0}^{k} \sum_{i_2=0}^{m-i_1} (-1)^{i_1} \binom{k}{i_1} \binom{n+k-i_1}{i_2}. \qquad (12)$$

Note 2.3. The number $k$ in the preceding equation may be considered as a free variable that takes nonnegative integer values. In particular, for $k = 1$ the equation represents the standard recursion formula for the numbers of Bernoulli's triangle.

Taking $k = l = m_1$, $m_2 = m$, and again replacing $n$ by $n + k$, one obtains

$$\sum_{i=0}^{m-k} \binom{n}{i} = \sum_{i_1=0}^{k} \sum_{i_2=k}^{m} (-1)^{i_1} \binom{k}{i_1} \binom{n+k-i_1}{i_2}. \qquad (13)$$

Note 2.4. The formulae (12) and (13) differ in the range of the index $i_2$.

References [1] J. Riordan, Combinatorial Identities. New York: Wiley, 1979.

ABSTRACT In the first section of this paper we prove a theorem for the number of columns of a rectangular area that are identical to the given one. In the next section we apply this theorem to derive several combinatorial identities by counting specified subsets of a finite set. <|endoftext|><|startoftext|> Introduction Photons have an extremely long mean free path and escape from the hot matter without rescattering. By measuring their Bose-Einstein (or Hanbury-Brown Twiss, HBT) correlations one can extract the space-time dimensions of the hottest central part of the collision [1,2,3,4,5], in contrast to hadron HBT correlations, which measure the size of the system at the moment of its freeze-out.
Moreover, photons emitted at different stages of the collision dominate in different ranges of transverse momentum [6]; therefore, by measuring photon correlation radii at various average transverse momenta ($K_T$) one can scan the space-time dimensions of the system at various times and thus trace the evolution of the hot matter. Photons emitted directly by the hot matter (direct photons) constitute only a small fraction of the total photon yield, while the dominant contribution comes from decays of the final state hadrons, mainly $\pi^0 \to 2\gamma$ and $\eta \to 2\gamma$. Fortunately, the lifetime of these hadrons is extremely long, so the width of the Bose-Einstein correlations between the decay photons is of the order of a few eV and cannot obscure the direct photon correlations. This feature can be used to extract the direct photon yield [3]: assuming that direct photons are emitted incoherently, the photon correlation strength parameter can be related to the proportion of direct photons as $\lambda = \frac{1}{2}\left(N_\gamma^{dir}/N_\gamma^{tot}\right)^2$. This approach is probably the only way to experimentally measure the direct photon yield at very small $p_T$. Presently, the only experiment to have measured direct photon Bose-Einstein correlations in ultrarelativistic heavy ion collisions is WA98 [7]. An invariant correlation radius was extracted and the direct photon yield was measured in Pb+Pb collisions at $\sqrt{s_{NN}} = 17$ GeV.

∗For the full list of the PHENIX collaboration and acknowledgments, see [9]. http://arxiv.org/abs/0704.0852v1 November 9, 2018 19:7 WSPC/INSTRUCTION FILE DPeressounko-ggHBT-T D. Peressounko

Since the strength of the direct photon Bose-Einstein correlation is typically a few tenths of a percent, it is important to exclude all background contributions which could distort the photon correlation function. These contributions can be classified as follows: apparatus effects (close-cluster interference, i.e. attraction of close clusters in the calorimeter during reconstruction) and correlations caused by real particles.
The latter in turn can be divided into contributions due to "splitting" of particles (processes like antineutron annihilation in the calorimeter and photon conversion in detector material in front of the calorimeter); contamination by correlated hadrons (e.g. Bose-Einstein-correlated $\pi^\pm$); and background correlations of decay photons. In this paper we consider all of these contributions in detail and describe how to control for them in the PHENIX experiment.

2. Analysis This analysis is based on the data taken by PHENIX in Run3 (d+Au) and Run4 (Au+Au). The total collected statistics is ≈ 3 billion d+Au events and ≈ 900 M Au+Au events. Details of the PHENIX configuration in these runs can be found in references 8 and 9, respectively.

2.1. Apparatus effects Since correlation functions are rapidly rising functions at small relative momenta, any small distortion of the relative momentum for real pairs, for example because of errors in the reconstruction of close clusters in the calorimeter ("cluster attraction"), can lead to the appearance of a fake bump in the correlation function. To explore the influence of cluster interference in the calorimeter EMCAL, we construct a set of correlation functions by applying different cuts on the minimal distance between photon clusters in EMCAL. To quantify the difference between these correlation functions we fit them with a Gaussian and compare the extracted correlation parameters. We find that for correlation functions that include clusters with small relative distances there is a strong dependence on the minimal distance cut, but for distance cuts above 24 cm (4-5 modules) the correlation parameters are independent of the relative distance cut. This implies that with this distance cut the apparatus effects are sufficiently small.

2.2. Photon conversion, $\bar{n}$ annihilation, and similar backgrounds The next class of possible backgrounds consists of processes in which one real particle produces several clusters in the calorimeter close to each other.
These are processes like $\bar{n}$ annihilation in the calorimeter producing several separated clusters, photon conversion in front of the calorimeter, or residual correlations between photons that belong to different $\pi^0$ in decays like $\eta \to 3\pi^0 \to 6\gamma$.

Bose-Einstein correlations of direct photons in Au+Au collisions

Fig. 1. Two-photon correlation function measured in d+Au collisions at $\sqrt{s_{NN}} = 200$ GeV scaled to reproduce the height of the $\pi^0$ peak in Au+Au collisions, compared to the same correlation function measured in Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV. Curves: "Min.Bias, Au+Au" and "Min.Bias, d+Au, scaled"; the horizontal axis is in GeV and the vertical axis is C. Absolute vertical scale is omitted in this technical plot.

The common feature of this type of process is that their strength is proportional to the number of particles per event and not to the square of the number of particles, as would be the case for Bose-Einstein correlations. To estimate the upper limit on these contributions, we compare two-photon correlation functions calculated in d+Au and Au+Au collisions. For the moment we assume that all correlations at small relative momenta seen in d+Au collisions are due to the background effects under consideration. Then we scale the correlation function obtained in d+Au collisions with the number of $\pi^0$ (that is, we reproduce the height of the $\pi^0$ peak in Au+Au collisions):

$$C_2^{scaled} = 1 - \frac{h_\pi^{Au+Au}}{h_\pi^{d+Au}} (C_2 - 1). \qquad (1)$$

The result of this operation is shown in Fig. 1. We find that the scaled d+Au correlation function stays close to unity and lies well below the correlation function calculated for Au+Au collisions at small relative momenta. From this we conclude that the contribution from effects with strength proportional to the first power of the number of particles is negligible in Au+Au collisions.

2.3.
Charged and neutral hadron contamination Another possible source of distortion of the photon correlation function is contamination by (correlated) hadrons. Although we use rather strict identification criteria for photons, there may still be some admixture of correlated hadrons contributing to the region of small relative momenta.

Fig. 2. Comparison of two-photon correlation functions measured in Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV by two different methods: both photons are registered in the EMCAL (closed symbols, "EMCAL + EMCAL") and one photon is registered in EMCAL while the other is reconstructed through its external conversion (open symbols, "Converted + EMCAL"). The horizontal axis is in GeV and the vertical axis is C. Absolute vertical scale is omitted in this technical plot.

To exclude this possibility, we construct the two-photon correlation function using one photon registered in the calorimeter EMCAL and reconstructing the second photon from its conversion into an $e^+e^-$ pair in the material of the beam pipe. The photon sample constructed using external conversions is completely free from hadron contamination, so comparison of the standard correlation function with the pure one allows us to estimate the contribution from non-photon contamination. This comparison is shown in Fig. 2. We find that the correlation function constructed with the purer photon sample demonstrates a slightly larger correlation strength. This demonstrates that the observed correlation is indeed a photon correlation, while hadron contamination in the photon sample just increases the combinatorial background and reduces the correlation strength. In addition, this comparison shows that we have properly excluded the region of cluster interference.
Due to deflection by the magnetic field, the electrons of the $e^+e^-$ conversion pair hit the calorimeter far from the location of the partner photon used in the correlation function, and thus effects related to the interference of close clusters are absent.

2.4. Photon residual correlations The last possible source of distortion of the photon correlation function is residual correlations between photons. We have already demonstrated that the contribution of residual correlations between photons in decays like $\eta \to 3\pi^0 \to 6\gamma$, with strength proportional to $N_{part}$ and not $N_{part}^2$, is negligible in Au+Au collisions. Below we consider other effects which may cause photon correlations. These are collective flow (and jet-like correlations) and correlations between photons originating from decays of Bose-Einstein-correlated mesons. Collective (elliptic) flow as well as jet-like correlations are long-range effects, resulting in correlations at relative angles much larger than those under consideration here (for example, the opening angle of a photon pair with 20 MeV mass and $K_T = 500$ MeV is ∼ 5 degrees). Monte-Carlo simulations demonstrate that the flow and jet-like contributions are indeed negligible.

Fig. 3. Comparison of two-photon correlation functions measured in Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV with Monte-Carlo simulations of the contribution of residual correlations due to decays of Bose-Einstein-correlated neutral pions. Curves: "Min.Bias Au+Au, Data" and simulated $\pi^0$ HBT residual correlations; the horizontal axis is in GeV and the vertical axis is C. Absolute vertical scale is omitted in this technical plot.

Potentially, the most serious distortion of the photon correlation function is residual correlations between decay photons of HBT-correlated $\pi^0$s. Monte-Carlo simulations show that this contribution is not negligible, but has a rather specific shape (see Fig.
3), so that it does not distort the photon correlation function at small $Q_{inv}$. This result can be explained as follows. Let us consider two $\pi^0$s with zero relative momentum. The distribution of decay photons is isotropic in their rest frame, and the probability of finding a collinear photon pair ($Q_{inv} = 0$) is suppressed for phase-space reasons. The photon pair mass distribution has a maximum at $2/3\, m_\pi$, not at zero. After convolution with the pion correlation function we find a step-like two-photon correlation function [3]. On the other hand, if one artificially chooses photons with momentum along the direction of the parent $\pi^0$ (e.g. by looking at photon pairs at very large $K_T$), then the shape of the decay photon correlation function will reproduce the shape of the parent $\pi^0$ correlation. This probably explains the different shape of the residual correlations due to decays of HBT-correlated $\pi^0$ found in [10].

3. Conclusions We have presented the current status of the analysis of direct photon Bose-Einstein correlations in the PHENIX experiment. We are able to measure the two-photon correlation function with a precision sufficient to extract the direct photon correlations. Correlation measurements in which one photon of the pair has converted to an $e^+e^-$ pair have been used to provide an important cross-check. We have demonstrated that all known backgrounds are under control. The extraction of the correlation parameters of direct photon pairs is in progress.

References
1. A.N. Makhlin, JETP Lett. 46:55 (1987); A.N. Makhlin, Sov.J.Nucl.Phys. 49:151 (1989).
2. D.K. Srivastava, J. Kapusta, Phys.Rev. C48:1335 (1993); D.K. Srivastava, C. Gale, Phys.Lett. B319:407 (1993); D.K. Srivastava, Phys.Rev. D49:4523 (1994); D.K. Srivastava, J. Kapusta, Phys.Rev. C50:505 (1994); D.K. Srivastava, Phys.Rev. C71:034905 (2005); S. Bass, B. Muller, D.K. Srivastava, Phys.Rev.Lett. 93:162301 (2004).
3. D.
Peressounko, Phys.Rev. C67:014905 (2003).
4. J. Alam et al., Phys.Rev. C67:054902 (2003); J. Alam et al., Phys.Rev. C70:054901 (2004).
5. T. Renk, Phys.Rev. C71:064905 (2005); hep-ph/0408218.
6. D. d'Enterria and D. Peressounko, Eur.Phys.J. C46:451 (2006).
7. M.M. Aggarwal et al., Phys.Rev.Lett. 93:022301 (2004).
8. S.S. Adler et al. (PHENIX collaboration), Phys.Rev.Lett. 98:012002.
9. S. Bathe et al. (PHENIX collaboration), Nucl.Phys. A774:731 (2006).
10. D. Das et al., nucl-ex/0511055.

ABSTRACT The current status of the analysis of direct photon Bose-Einstein correlations in Au+Au collisions at $\sqrt{s_{NN}}=200$ GeV done by the PHENIX collaboration is summarized. All possible sources of distortion of the two-photon correlation function are discussed and methods to control them in the PHENIX experiment are presented. <|endoftext|><|startoftext|> Introduction Let $(M, g)$ be a Riemannian manifold of dimension 2. The normalized Ricci flow is

$$\frac{\partial g_{ij}}{\partial t} = (r - R) g_{ij},$$

where $R$ is the scalar curvature and $r$ is some constant. For a compact surface, $r$ is the average of the scalar curvature. In this case, Hamilton [4] and Chow [2] proved that the normalized Ricci flow from any initial metric exists for all time and converges to a metric of constant curvature. It is therefore natural to ask whether such a result holds for noncompact surfaces. Recently, a preprint of Ji and Sesum [14] generalized the above result to complete surfaces with logarithmic ends. Such surfaces have infinities like hyperbolic cusps. In particular, they have finite volume and are therefore parabolic, in the sense that there exists no positive Green's function.
One of their results shows that the normalized Ricci flow from such a metric exists for all time and converges to a hyperbolic metric. In this paper, we study nonparabolic complete surfaces, i.e. surfaces admitting a positive Green's function. In contrast to [14], such surfaces have at least one nonparabolic end and have infinite volume. For a discussion of parabolic and nonparabolic ends and their geometric characterization, see Li's survey paper [6]. Here we choose $r = -1$ because, if the flow converges, the limit metric will have constant curvature $r$. Since we are considering noncompact surfaces, $r$ cannot be positive. If $r = 0$, the limit will be flat $\mathbb{R}^2$ or a quotient of it. However, it is well known that these flat surfaces are parabolic. On the other hand, whether a surface is parabolic or nonparabolic is invariant under quasi-isometries. Since the limit metric is quasi-isometric to the initial one whenever the normalized Ricci flow converges, we know that $r$ cannot be zero. (For the definition of quasi-isometry, see also [6].) If $r < 0$, we can always assume $r = -1$ by scaling. The main result of this paper is

Theorem 1.1. Let $(M, g)$ be a nonparabolic surface with bounded curvature. If the infinity is close to a hyperbolic metric in the sense that

$$\int_M |R + 1| \, dV < +\infty,$$

then the normalized Ricci flow converges to a metric of constant scalar curvature $-1$.

http://arxiv.org/abs/0704.0853v2 HAO YIN

As in [14], we try to apply the above result to prove results along the lines of the Uniformization theorem. That amounts to proving the existence of a complete hyperbolic metric within a given conformal class of a noncompact surface. In [14], the authors proved a uniformization theorem for Riemann surfaces obtained from a compact Riemann surface by removing finitely many points, and remarked that a similar result should be true for Riemann surfaces obtained from compact ones by removing finitely many disjoint disks and points.
Our theorem can be used to prove the same result in the case where at least one disk is removed. In fact, we will give a unified proof, which includes and simplifies the proof of [14]. Precisely, we will show

Corollary 1.2. Let $M$ be a Riemann surface obtained from a compact Riemann surface by removing finitely many disjoint disks and/or points. If no disk is removed, then we further assume that the Euler number of $M$ is less than zero. Then there exists on $M$ a complete hyperbolic metric compatible with the conformal structure.

The proof of Theorem 1.1 is along the same lines as [14]. The method was initiated by Hamilton in [4]. There, Hamilton considered only the compact case. To generalize this method to the complete case, we need to overcome some analytic difficulties. Precisely, one needs to solve Poisson equations and obtain estimates for the solutions, for all $t$. Those growth estimates for the solutions are needed to apply the maximum principle. As for the maximum principle, there are many versions on complete manifolds. Since we will be working on a complete manifold with a changing metric, the version closest to our needs is in [1]. We still need a little modification.

Theorem 1.3. Suppose $g(t)$ is a smooth family of complete metrics defined on $M$, $0 \leq t \leq T$, with Ricci curvature bounded from below and $\left|\frac{\partial g}{\partial t}\right| \leq C$ on $M \times [0, T]$. Suppose $f(x, t)$ is a smooth function defined on $M \times [0, T]$ such that

$$\Delta_t f - \frac{\partial f}{\partial t} \geq 0$$

whenever $f(x, t) > 0$, and

$$\int_M \exp(-a r_t^2(o, x)) f_+^2(x, t) \, dV_t < \infty \qquad (1)$$

for some $a > 0$. If $f(x, 0) \leq 0$ for all $x \in M$, then $f \leq 0$ on $M \times [0, T]$.

Although there is no detail in [1], one can prove it using the method of Ecker and Huisken in [3] and Ni and Tam in [12]. To solve the Poisson equation $\Delta u = R + 1$ for $t = 0$, we use a result of Ni [10]; see Theorem 3.1. That is the reason why we assume $\int_M |R + 1| \, dV < +\infty$. Moreover, we prove a growth estimate for the solution under the further assumption that the Ricci curvature is bounded from below.
This result is true for all dimensions. For the growth estimate, an estimate of Green's function is proved under the assumption that the Ricci curvature is bounded from below. This estimate may be of independent interest; see the discussion in Section 2. Instead of solving $\Delta_t u(x, t) = R(x, t) + 1$ for later $t$, we solve an evolution equation for $u$. Thanks to the recent preprint of Chau, Tam and Yu [1], we can solve this evolution equation with a changing metric.

RICCI FLOW ON SURFACES

Following a method in [11], we show that $u$, $|\nabla u|$ and $\Delta u$ satisfy a growth estimate like that in equation (1). With these preparations, we proceed to show that $u(x, t)$ is indeed the potential function we need. Theorem 1.1 then follows from the approach of Hamilton and repeated use of Theorem 1.3. The paper is organized as follows: In Section 2, we prove the crucial estimate of Green's function needed for the growth estimate. In Section 3, we solve the Poisson equation and prove the relevant growth estimates. In the last section, we prove Theorem 1.1 and discuss results related to the Uniformization theorem.

2. An estimate of Green's function In this section we prove the following.

Theorem 2.1. Let $(M, g)$ be a complete noncompact manifold with Ricci curvature bounded from below by $-K$. Assume that $M$ admits a positive Green's function $G(x, y)$. Let $x_0$ be a fixed point in $M$. Then there exist constants $A > 0$ and $B > 0$, which may depend on $M$ and $x_0$, such that

$$\int_{\{G(x,y) > e^{A r(y,x_0)}\}} G(x, y) \, dx \leq B e^{A r(y,x_0)},$$

where $r(y, x_0)$ is the distance from $y$ to $x_0$.

Remark 2.2. It is impossible to get an estimate of this kind with constants depending only on $K$. Consider a family of nonparabolic manifolds $M_i$ which are becoming less and less "nonparabolic", i.e. their infinities are closing up. For any $A, B > 0$, there exist $M_i$ and some $x_i \in M_i$ such that

$$\int_{\{G_i(x,x_i) > A\}} G_i(x, x_i) \, dx > B.$$

See [8]. Remark 2.3.
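To the best of the author's knowledge, known estimates on Green's function in terms of the volume of balls require the Ricci curvature to be non-negative; see [9]. There could be an estimate of this type for Ricci curvature bounded from below, in light of [1]. If so, our estimate should be a corollary. The following proof is a direct one. We begin with a lemma.

Lemma 2.4. There is a constant $C$ depending only on $K$ and the dimension, such that if the Ricci curvature on $B(x, 1)$ is bounded from below by $-K$ and $G(x, y)$ is the Dirichlet Green's function on $B(x, 1)$, then

$$\int_{B(x,1)} G(x, y) \, dy < C.$$

Proof. Let $H(x, y, t)$ be the Dirichlet heat kernel of $B(x, 1)$. It is easy to see that

$$\int_{B(x,1)} H(x, y, t) \, dy \leq 1 \quad \text{for all } t > 0.$$

Now we prove that $H(x, y, 2)$ is bounded from above. The proof is by Moser iteration, which has appeared several times. Here we follow the computations in [17]. Since we have the Dirichlet boundary condition, we do not need a cut-off function in space. Let $0 < \tau < 2$ and $0 < \delta \leq 1/2$ be constants, $\sigma_k = (1 - (1/2)^k \delta)\tau$, and let $\eta_i$ be a smooth function on $[0, \infty)$ such that 1) $\eta_i = 0$ on $[0, \sigma_i]$, 2) $\eta_i = 1$ on $[\sigma_{i+1}, \infty)$ and 3) $\eta_i' \leq 2^{i+3}(\delta\tau)^{-1}$. Let $p_i = (1 + \frac{2}{n})^i$. Since $H$ is a solution to the heat equation, $H^p$ is a subsolution to the heat equation for $p > 1$:

$$\left(\frac{\partial}{\partial t} - \Delta_y\right) H^p(x, y, t) \leq 0.$$

Multiply by $\eta_i^2 H^{p_i}$ and integrate:

$$\int_0^T \!\! \int_{B(x,1)} \eta_i^2 H^{p_i} \left(\frac{\partial}{\partial t} - \Delta_y\right) H^{p_i} \, dy \, dt \leq 0.$$

Routine computation gives

$$\int_{\sigma_{i+1}}^T \!\! \int_{B(x,1)} |\nabla_y H^{p_i}|^2 \, dy \, dt + \int_{B(x,1)} H^{2p_i}(x, y, T) \, dy \leq 2^{i+3}(\tau\delta)^{-1} \int_{\sigma_i}^T \!\! \int_{B(x,1)} H^{2p_i} \, dy \, dt. \qquad (2)$$

The Sobolev inequality in [13] implies

$$\left(\int_{B(x,1)} (H^{p_i})^{\frac{2n}{n-2}} \, dy\right)^{\frac{n-2}{n}} \leq C V^{-2/n} \int_{B(x,1)} \left(|\nabla_y H^{p_i}|^2 + H^{2p_i}\right) dy,$$

where $V$ is the volume of $B(x, 1)$. By the Hölder inequality,

$$\int_{B(x,1)} H^{2p_{i+1}} \, dy \leq \left(\int_{B(x,1)} (H^{p_i})^{\frac{2n}{n-2}} \, dy\right)^{\frac{n-2}{n}} \left(\int_{B(x,1)} H^{2p_i} \, dy\right)^{2/n} \leq \left(C V^{-2/n} \int_{B(x,1)} \left(|\nabla_y H^{p_i}|^2 + H^{2p_i}\right) dy\right) \left(\int_{B(x,1)} H^{2p_i} \, dy\right)^{2/n}.$$

By (2), integrating over time,

$$\int_{\sigma_{i+1}}^{2} \!\! \int_{B(x,1)} H^{2p_{i+1}} \, dy \, dt \leq C V^{-2/n} c_0^{i+3} (\delta\tau)^{-(1+2/n)} \left(\int_{\sigma_i}^{2} \!\! \int_{B(x,1)} H^{2p_i} \, dy \, dt\right)^{1+\frac{2}{n}},$$

where $c_0 = 2^{1+2/n}$. A standard Moser iteration gives

$$\sup_{t \in [\tau, 2],\; y \in B(x,1)} H^2(x, y, t) \leq C V^{-1} (\delta\tau)^{-\frac{n+2}{2}} \int_{(1-\delta)\tau}^{2} \!\! \int_{B(x,1)} H^2(x, y, t) \, dy \, dt.$$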
An iteration process as given in [7] implies the $L^1$ mean value inequality. In particular,

$$\sup_{y \in B(x,1)} H(x, y, 2) \leq C V^{-1} \int \!\! \int_{B(x,1)} H(x, y, t) \, dy \, dt \leq C V^{-1}.$$

Hence,

$$\int_{B(x,1)} H^2(x, y, 2) \, dy \leq C V^{-1}.$$

By a Poincaré inequality in [7],

$$\frac{d}{dt} \int_{B(x,1)} H^2(x, y, t) \, dy = \int_{B(x,1)} 2 H \Delta_y H \, dy = -2 \int_{B(x,1)} |\nabla_y H|^2 \, dy \leq -C \int_{B(x,1)} H^2(x, y, t) \, dy.$$

This differential inequality implies

$$\int_{B(x,1)} H^2(x, y, t) \, dy \leq \int_{B(x,1)} H^2(x, y, 2) \, dy \times e^{-C(t-2)} \leq C V^{-1} e^{-C(t-2)}.$$

The Hölder inequality shows

$$\int_{B(x,1)} H(x, y, t) \, dy \leq V(B(x, 1))^{1/2} \left(\int_{B(x,1)} H^2(x, y, t) \, dy\right)^{1/2} \leq C e^{-C(t-2)}$$

for $t \geq 2$. The lemma follows from

$$\int_{B(x,1)} G(x, y) \, dy = \int_0^\infty \!\! \int_{B(x,1)} H(x, y, t) \, dy \, dt.$$

Now let us turn to the proof of Theorem 2.1.

Proof. The key tool in the proof is the gradient estimate for harmonic functions. Recall that if $u$ is a positive harmonic function on $B(x, 2R)$, then

$$\sup_{B(x,R)} |\nabla \log u|^2 \leq C_1 K + C_2 R^{-2}.$$

This is to say that outside $B(x, 0.1)$, the Green function as a function of $y$ decays or increases at most exponentially, with rate $\sqrt{C_1 K + 100 C_2}$.

(1) Consider $G(x_0, y)$. Set

$$p = \max_{y \in \partial B(x_0,1)} G(x_0, y).$$

As pointed out by Li and Tam in their paper constructing the Green function, $G(x_0, y) \leq p$ for $y \notin B(x_0, 1)$. Since the Green function is symmetric, for any point $y$ far out at infinity, $G(y, x_0) \leq p$.

(2) If the theorem is not true, then for any large $A$ and $B$ there is a point $y$ (far away) such that

$$\int_{\{G(x,y) > e^{A r(y,x_0)}\}} G(x, y) \, dx > B e^{A r(y,x_0)}.$$

We will derive a contradiction with (1). Claim: the inclusion $\{x \mid G(x, y) > e^{Ar}\} \subset B(y, 1)$ is not true. If it were true, consider the Dirichlet Green function $G_1(z, y)$ on $B(y, 1)$. It is well known that $G(z, y) - G_1(z, y)$ is a harmonic function. Notice that this harmonic function has boundary values less than $e^{Ar}$. Therefore, its integral over $B(y, 1)$ is less than $e^{Ar} \times Vol(B(y, 1))$. Since we assume a Ricci lower bound, $Vol(B(y, 1))$ is less than a universal constant depending on $K$. Therefore,

$$\int_{\{G(x,y) > e^{Ar}\}} G(x, y) \, dx \leq \int_{B(y,1)} G(x, y) \, dx \leq Vol(B(y, 1)) \times e^{Ar} + \int_{B(y,1)} G_1(x, y) \, dx \leq C(K, n) \times e^{Ar},$$

where we used Lemma 2.4 for the last inequality.
If we choose $B$ to be any number larger than $C(K, n)$ in the above inequality, then the choice of $y$ gives a contradiction, which implies that the claim is true.

(3) There is a $z \in \{x \mid G(x, y) > e^{Ar}\}$ such that $d(z, y) = 1$, because the set $\{G(x, y) > e^{Ar}\}$ is connected. This follows from the maximum principle and the construction of Green's function.

(3.1) If $|d(y, x_0) - d(z, x_0)| < 0.3$, let $\sigma$ be the minimal geodesic connecting $z$ and $x_0$. Claim: the distance from $y$ to $\sigma$ is no less than 0.1. If not, let $w$ be a point on $\sigma$ such that $d(y, w) < 0.1$. Since $d(y, z) = 1$, we have $d(w, z) > 0.9$. Now $w$ is on the minimal geodesic from $z$ to $x_0$, so

$$d(w, x_0) \leq d(z, x_0) - 0.9$$

and

$$d(y, x_0) < d(w, x_0) + d(y, w) < d(z, x_0) - 0.8.$$

This is a contradiction, so the claim is true. We can use the gradient estimate along the segment $\sigma$. (Notice that $d(z, x_0) < r(y, x_0) + 1$.)

$$G(y, x_0) > G(y, z) \exp\left(-\sqrt{C_1 K + 100 C_2}\,(r + 1)\right).$$

This is a contradiction if we choose $A \gg \sqrt{C_1 K + 100 C_2}$.

(3.2) If $d(z, x_0) \leq d(y, x_0) - 0.3$, then the distance from $y$ to the minimal geodesic connecting $z$ and $x_0$ is larger than 0.1, and the above argument gives a contradiction.

(3.3) If $d(z, x_0) \geq d(y, x_0) + 0.3$, then, since $G(z, y) > e^{Ar}$, we move the center to $z$, using the symmetry of the Green function:

$$G(y, z) > e^{A r(y,x_0)} > e^{A' r(z,x_0)}.$$

This is case (3.2), and we get a contradiction at $z$. This finishes the proof of the estimate of the Green function. □

3. Poisson equations $\Delta u = R + 1$ This section is divided into two parts. The first part solves the Poisson equation for $t = 0$. The second part solves it for $t > 0$, before the maximal time, in an indirect way. First, we use Theorem 2.1 to obtain a growth estimate for the solution of the Poisson equation $\Delta u = R + 1$ at $t = 0$. In the following theorem, the existence part, without the curvature restriction and the boundedness of $f$, is due to Lei Ni [10].

Theorem 3.1. Let $M$ be a complete nonparabolic manifold with Ricci curvature bounded from below by $-K$.
For a non-negative bounded continuous function $f$, the Poisson equation $\triangle u = -f$ has a non-negative solution $u \in W^{2,n}_{loc}(M) \cap C^{1,\alpha}_{loc}(M)$ $(0 < \alpha < 1)$ if $f \in L^1(M)$. Moreover, for any fixed $x_0 \in M$, there exist $A > 0$ and $C > 0$ such that $u(x) \le Ce^{Ar(x,x_0)}$.

Proof. Let $G(x,y)$ be the positive Green's function.
$\int_M G(x,y)f(y)\,dy = \int_{\{G(x,y)\le e^{Ar(x,x_0)}\}} G(x,y)f(y)\,dy + \int_{\{G(x,y)>e^{Ar(x,x_0)}\}} G(x,y)f(y)\,dy \le Ce^{Ar(x,x_0)}$.
For the first term, we use the assumption that $f$ is integrable; for the second term, we use the boundedness of $f$ and Theorem 2.1. The estimate above shows that the Poisson equation is solvable with the required estimate. □

Corollary 3.2. Let $M$ be a surface satisfying the assumptions in Theorem 1.1. There exists a solution $u_0$ to the equation $\triangle u_0 = R(x) + 1$ satisfying
$\int_M \exp(-ar^2(x,x_0))\,u_0^2(x)\,dV < \infty$,
$\int_M \exp(-br^2(x,x_0))\,|\nabla u_0|^2(x)\,dV < \infty$,
where $a$ and $b$ are some positive constants.

Proof. Solve the Poisson equation for the positive part and the negative part of $R + 1$ respectively. Then subtract the solutions. The first integral estimate follows from the pointwise growth estimate and volume comparison. Let $R > 1$. Choose a cut-off function $\varphi$ such that $\varphi(x) = 1$ for $x \in B(x_0,R)$, $\varphi(x) = 0$ for $x \notin B(x_0,2R)$, and $|\nabla\varphi|^2 \le C_1\varphi$. Multiply the equation by $\varphi u_0$ and integrate over $M$,
$\int_M \varphi u_0 \triangle u_0\,dV = \int_M (R+1)\varphi u_0\,dV$,
which implies
$\int_M \varphi|\nabla u_0|^2\,dV + \int_M u_0 \nabla\varphi\cdot\nabla u_0\,dV = -\int_M (R+1)\varphi u_0\,dV$.
Hence
$\int_M (\varphi - |\nabla\varphi|)\,|\nabla u_0|^2\,dV \le C\int_{B(x_0,2R)} u_0^2\,dV + C\int_{B(x_0,2R)} |u_0|\,dV \le C\int_{B(x_0,2R)} u_0^2\,dV + C\,Vol(B(x_0,2R))$.
From the integral estimate of $u_0$,
$\int_{B(x_0,2R)} u_0^2\,dV \le Ce^{4aR^2}$.
By the choice of $\varphi$,
$\int_{B(x_0,R)} |\nabla u_0|^2\,dV \le Ce^{\tilde{a}R^2}$.
From here, it's not difficult to see the estimate we need. □

Now let's look at the case of $t > 0$. In fact, it's not difficult to show that the above method can be used for $t > 0$. This amounts to showing that $M$ is still nonparabolic for $t > 0$ and that $\int_M |R+1|\,dV$ is still finite. The first claim is trivial and the second follows from the evolution equation and the maximum principle. Assume the solutions are $u(t)$.
We have trouble in deriving the evolution equation for u(t), due to the possible existence of nontrivial harmonic functions. This explains why we use the following indirect way. Lemma 3.3. Assume the normalized Ricci flow exists for t ∈ [0, Tmax). The following equation has a solution u(x, t) (0 ≤ t < Tmax) with initial value u0, = △u− u, where △ is the Laplace operator of metric g(t). Moreover, there exists a > 0 depending on T such that for any T < Tmax exp(−ar2(x, x0))u2(x, t)dVt < ∞. Similar estimates hold for |∇u| and △u with different constants. Remark 3.4. Since g(0) and g(t) are equivalent up to a constant depending on T , it doesn’t matter whether we estimate ∇u or ∇tu and whether we use r to stand for distance at g(0) or g(t) if t ∈ [0, T ]. Proof. In [1], the authors considered a class of evolution equation with changing metric. ∂u = △u− u with the underling metric evolving by normalized Ricci flow is in this class. They proved, among other things, that the fundamental solution Z(x, t; y, s) has a Gaussian upper bound, i.e Z(x, t; y, x) ≤ C t− s) r2(x,y) D(t−s) . These constants depends on the solution of normalized Ricci flow and T . See Corollary 5.2 in [1]. For simplicity, denote Z(x, t; y, 0) by H(x, y, t), then to solve the equation, it suffices to show the following integral converges, u(x, t) = H(x, y, t)u0(y)dy. RICCI FLOW ON SURFACES 9 Bt(x,1) H(x, y, t)u0(y)dy ≤ CeAr(x,x0), because the integral of H on Bt(x, 1) is less than 1 and u0 grows at most exponen- tially by Theorem 2.1. M\Bt(x,1) H(x, y, t)u0(y)dy M\Bt(x,1) r2(x,y) Dt |u0(y)| dy. By volume comparison, Vx(1) ≥ C1e−A1r(x,x0)Vx0(1) t) ≥ C2e−A1r(x,x0) min(1, tn/2). Therefore M\Bt(x,1) H(x, y, t)u0(y)dy M\Bt(x,1) CeA1r(x,x0)e− r2(x,y) 2DT |u0(y)| dy M\Bt(x,1) CeA2r(x,x0)eAr(x,y)e− r2(x,y) 2DT dy ≤ CeA2r(x,x0). In summary, |u(x, t)| ≤ CeAr(x,x0), where A means a different constant. Volume comparison then implies exp(−ar2(x, x0))u2(x, t)dVt < ∞. 
For estimates on derivatives, note first that etu(x, t) is a solution of heat equation (with evolving metric) with initial value u0. Since we allow constants depend on T , it’s equivalent to prove estimates for etu(x, t). Therefore, from now on, to the end of this proof, we assume u(x, t) is a solution of heat equation. Then (2) (△− ∂ )u2 = 2 |∇u| . Assume that ϕ : R+ → R+ satisfies 1) ϕ(x) = 1 for x ≤ 1; 2) ϕ(x) = 0 for x ≥ 2. Choose the cut-off function ϕ( r(x,x0) )(R > 1). Multiplying this to the equation (2) and integrate, r(x, x0) ) |∇u|2 dVtdt ≤ r(x, x0) )u2dVtdt. △ϕ( r ) = div(ϕ′( = ϕ′′( |∇r|2 + ϕ′( 10 HAO YIN By definition of ϕ, we know ϕ( r ) vanishes unless R ≤ r(x, x0) ≤ 2R. Laplacian comparison implies (curvature is bounded from below −k) △r ≤ (n− 1) kcoth( kr) ≤ C. Therefore, ϕ△u2dVt ≤ C B(x0,2R) u2dVt. Let dVt = e FdV0, (ϕu2)eF dV0dt ≥ (ϕu2eF )dtdV0 − Cϕu2dVtdt ϕu2(x, T )dVT − ϕu2(x, 0)dV0 − C ϕu2dVtdt ϕu20(x)dV0 − C ϕu2dVtdt. Here we have used the fact that ∂e is bounded. Combined with equation (3) and B(x0,R) |∇u|2 dVtdt ≤ C B(x0,2R) u2dVtdt+ B(x0,2R) u20(x)dV0. From here it’s easy to see the type of estimate in Theorem 1.3. For △u, it suffices to consider ∣. The Bochner formula in this case is (remember we have assumed that u is a solution of the heat equation), (△− ∂ ) |∇u|2 = 2 2 − |∇u|2 . The same argument as before works for Lemma 3.5. For t ∈ [0, Tmax), △tu(x, t) = R(x, t) + 1. Proof. We know for t = 0 it’s true. Calculation shows (△tu−R(t)− 1) = (R+ 1)△tu+△t(△tu− u)−△tR−R(R+ 1) = △t(△tu−R(t)− 1) +R(△tu−R− 1) By previous lemma, we have growth estimate for △tu−R(t)−1. If △tu−R−1 ≥ 0, −△t)(△tu−R(t)− 1) ≤ C(△tu−R− 1). If △tu−R− 1 ≤ 0, then −△t)(△tu−R(t)− 1) ≥ C(△tu−R− 1). Apply maximum principle for △tu − R − 1, which is zero at t = 0. We know it’s zero forever. � RICCI FLOW ON SURFACES 11 4. Proof of the main theorem and the corollary Assume we have a surface satisfying the assumptions of Theorem 1.1. Short time existence is known, see [15]. 
The long time existence and convergence follow exactly by an argument of Hamilton in [4]. For completeness, we outline the steps. Solve the Poisson equations $\triangle_t u(x,t) = R(x,t) + 1$ as we did. Consider the evolution equation for $H = R + 1 + |\nabla u|^2$,
$\frac{\partial}{\partial t} H = \triangle H - 2|M|^2 - H$,
where $M = \nabla\nabla u - \frac{1}{2}\triangle u \cdot g$. Since we have a growth estimate for $H$, the maximum principle says
$R + 1 \le H \le Ce^{-t}$.
Therefore, after some time $R$ will be negative everywhere. Applying the maximum principle again to the evolution equation of the scalar curvature,
$\frac{\partial}{\partial t} R = \triangle R + R(R+1)$,
will prove Theorem 1.1.

Next, we discuss the application of the above theorem to the Uniformization theorem. Let $S$ be a compact Riemann surface. Let $p_1, \cdots, p_k$ be $k$ different points in $S$ and $D_1, \cdots, D_l$ be $l$ domains on $S$ such that all of them are disjoint and each $D_i$ is diffeomorphic to a disk. Denote $S \setminus \cup_i D_i \setminus \{p_1, \cdots, p_k\}$ by $M$. The aim is to show that there exists a complete hyperbolic metric on $M$ compatible with the conformal structure.

The approach is to construct an initial metric $g_0$ on $M$ compatible with the conformal structure so that the normalized Ricci flow starting from $g_0$ will converge to a hyperbolic metric. Assume there is a metric $h$ in the given conformal class of $S$. Note that $h$ is incomplete as a metric on $M$. For $p_i$, there is an isothermal coordinate $(x,y)$ around $p_i$. By a conformal change of $h$, one can ask $g_0$ to be
$\frac{dx^2 + dy^2}{(x^2+y^2)\log^2(x^2+y^2)}$
in a small neighborhood $U_i$ of $p_i$.

Remark 4.1. This is called a hyperbolic cusp metric in [14] and it has scalar curvature $-1$.

For $D_j$, let $r$ be the distance to $\partial D_j$ on $M$ with respect to $h$. Let $V_j$ be a neighborhood of $\partial D_j$ in $M$. Let $(r,\theta)$ be the Fermi coordinates for $\partial D_j$ so that
$h_0 = dr^2 + A(r,\theta)\,d\theta^2$.
We will find $\rho = \rho(r,\theta)$ such that
1) $\rho = 0$ on $\partial D_j$;
2) $d\rho \neq 0$ on $\partial D_j$;
3) $g_0 = \rho^{-2}h$ is asymptotically hyperbolic to high order.
Let $K$ and $K_0$ be the Gaussian curvatures of $h$ and $g_0$ respectively. We have the formula
$K_0 = \rho^2(\triangle_h \log\rho + K)$.
In order that $K_0 = -1$,
$1 - |\nabla\rho|^2 + \rho\triangle\rho + \rho^2 K = 0$.
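The passage from the curvature formula to the equation for $\rho$ is a routine expansion. It can be spelled out as below, assuming that $\triangle$ and $\nabla$ are both taken with respect to $h$ (the subscript $h$ is added here only for emphasis).

```latex
\begin{align*}
\triangle_h \log\rho
  &= \operatorname{div}_h\!\Bigl(\frac{\nabla\rho}{\rho}\Bigr)
   = \frac{\triangle_h\rho}{\rho} - \frac{|\nabla\rho|^2}{\rho^2}, \\
K_0
  &= \rho^2\bigl(\triangle_h\log\rho + K\bigr)
   = \rho\,\triangle_h\rho - |\nabla\rho|^2 + \rho^2 K, \\
K_0 = -1
  &\iff 1 - |\nabla\rho|^2 + \rho\,\triangle_h\rho + \rho^2 K = 0 .
\end{align*}
```

In particular, wherever $\rho = 0$ the last line reduces to $|\nabla\rho|^2 = 1$, which is the form the condition takes on $\partial D_j$.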
In terms of r and θ, |∇ρ|2 = (∂ρ )2 +A−1(r, θ)( △ρ = ∂ Here A, B, C and D are smooth functions of r and θ. The equation now becomes (5) ρ + 1− ( )2 −A−1( )2 + ρ2K = 0. If equation (5) is true at r = 0, then (r, θ) = 1. Here we used that fact that ρ > 0. Set η(r, θ) = ρ . Equation (6) implies η(0, θ) = 1. Equation (5) becomes +Brη +Br2 ∂η + Cr2 ∂η +Dr2 ∂ + 1−η )2 −A−1 r )2 + ηr2K = 0 For the convinience of formal calculation, this equation is rewritten as (7) (D2 −D − 2)η + F [r, η] = 0, where D = r ∂ F [r, η] = Brη +Br2 + Cr2 (1− η)2 )2 −A−1 r )2 + ηr2K. Equation (7) is a very typical form of Fuchsian type PDE. Formal solutions of this kind of equation has been discussed many times. For example, Kichenassamy [5] and Yin [16]. We will only outline the main steps here, for details see [5] and [16]. Consider formal solution with the following expansion, (8) η(r, θ) = 1 + aij(θ)r i(log r)j . We will call the sum j=0 aijr i(log r)j the i-level of the expansion. Note that D maps i-level to i-level. Details on formal calculation could be find in [5] and [16]. A common feature of all terms in F [r, η], which is crutial in obtaining a formal solution, is that the k-level of F [r, η] could be calculated with knowledge of only l-level of η with l < k. For example, consider (1 − η)2/η. It’s the multiplication of three formal series, two 1 − η and 1/η. In order the k-level of η appears in the RICCI FLOW ON SURFACES 13 k-level of (1 − η)2/η, the only possibility is that two of the three series contribute zero level and one k-level. However, the zero level of 1− η vanishes. The only thing we need is that there exists a formal solution and furthermore due to Borel’s Lemma as in [16], there is an approximate solution so that (D2 −D − 2)η + F [r, η] = o(rk) for any k. In terms of ρ, (9) K0 + 1 = 1 + ρ 2(△h log ρ+K) = o(ρk) for any k. This metric g0 near ∂Dj has Gaussian curvature -1 asymptotically. By a scaling, we assume it has scalar curvature -1 asymptotically. 
We construct $g_0$ by doing the above at every point $p_i$ and disk $D_j$. If there is at least one disk removed, we know that $M$ is nonparabolic. $\int_M |R+1|\,dV$ is finite because of equation (9). Therefore, Theorem 1.1 proves the Uniformization in this case.

If there is no disk removed, i.e. $M = S \setminus \{p_1, \cdots, p_k\}$, and $M$ has negative Euler number, then it is proved in [14] that there exists a hyperbolic metric in the conformal class. A large part of [14] is devoted to solving
(10) $\triangle_{g_0} u = R_{g_0} + 1$
with $|\nabla u| < \infty$. Observe that the above equation is equivalent to
(11) $\triangle_h u = \frac{dV_0}{dV_h}(R_{g_0} + 1)$.
Since every end of $(M,g_0)$ is a hyperbolic cusp, the Gauss-Bonnet theorem says
(12) $\int_M R_{g_0}\,dV_0 = 2\pi\chi(M) < 0$.
There exists a function $f$ of compact support on $M$ such that the volume of $(M, e^f g_0)$ is $-2\pi\chi(M)$, because $(M,g_0)$ has finite volume. Denote $e^f g_0$ again by $g_0$; since the metric near infinity is not changed, equation (12) is still true. Now, the volume of $(M,g_0)$ is $-2\pi\chi(M)$. This implies
$\int_M (R_{g_0} + 1)\,dV_0 = 0$.
Therefore
$\int_S \frac{dV_0}{dV_h}(R_{g_0} + 1)\,dV_h = 0$.
By the construction of $g_0$, we know that $(R_{g_0} + 1)$ is zero near $p_i$. So $(R_{g_0} + 1)$ is a smooth function on $S$. Therefore, equation (11) is solvable. Since $u$ is a smooth function on the compact surface $S$, $u$ has bounded gradient with respect to $h$. The relation between $h$ and $g_0$ near $p_i$ is explicit. It's straightforward to check that $u$ has bounded gradient as a function on $(M,g_0)$. This simplifies the proof in [14].

Remark 4.2. In the case that there is at least one disk removed, by the construction of $g_0$, $R_{g_0} + 1$ vanishes to high order near $\partial D_j$. Then one can extend the definition of $(R_{g_0} + 1)$ to $S$ so that
$\int_S \frac{dV_0}{dV_h}(R_{g_0} + 1)\,dV_h = 0$.
The rest is the same as in the previous case. This method of solving the Poisson equation depends on the conformal structure of $M$; therefore Theorem 3.1 and Theorem 1.1 are not covered by the above discussion.

References
[1] Chau, A., Tam, L.-F. and Yu, C., Pseudolocality for the Ricci flow and applications, preprint, DG/0701153.
[2] Chow, B., The Ricci flow on the 2-sphere, J. Diff. Geom. 33(1991), 325-334.
[3] Ecker, K.
and Huisken, G., Interior estimates for hypersurfaces moving by mean curvature, Invent. Math., 105(1991), 547-569.
[4] Hamilton, R., The Ricci flow on surfaces. Mathematics and General Relativity, Contemporary Mathematics, 71(1988), 237-261.
[5] Kichenassamy, S., On a conjecture of Fefferman and Graham, Adv. In Math., 184(2004), 268-288.
[6] Li, P., Curvature and function theory on Riemannian manifolds, Surveys in Differential Geometry, Vol VII, International Press(2000), 375-432.
[7] Li, P. and Schoen, R., Lp and mean value properties of subharmonic functions on Riemannian manifolds, Acta Math., 153(1984), no.3-4, 279-301.
[8] Li, P. and Tam, L.-F., Symmetric Green’s function on complete manifolds, Amer. J. Math., 109(1987), 1129-1154.
[9] Li, P. and Yau, S.-T., On the parabolic kernel of the Schrödinger operator, Acta. Math., 156(1986), 139-168.
[10] Ni, L., Poisson equation and Hermitian-Einstein metrics on holomorphic vector bundles over complete noncompact Kähler manifolds, Indiana Univ. Math. Jour., 51 (2002), 670-703.
[11] Ni, L. and Tam, L.-F., Plurisubharmonic functions and the structure of complete Kähler manifolds with nonnegative curvature, J. Diff. Geom., 64(2003), 457-524.
[12] Ni, L. and Tam, L.-F., Kähler Ricci flow and Poincaré Lelong equation, Comm. Anal. Geom., 12(2004), no 1. 111-141.
[13] Saloff-Coste, L., Uniform elliptic operators on Riemannian manifolds, J. Diff. Geom., 36(1992), 417-450.
[14] Ji, L.-Z. and Sesum, N., Uniformization of conformally finite Riemann surfaces by the Ricci flow, preprint, DG/0703357.
[15] Shi, W.-X., Deforming the metric on complete Riemannian manifolds, J. Diff. Geom., 30(1989), 223-301.
[16] Yin, H., Boundary regularity of harmonic maps from Hyperbolic space into nonpositively curved manifolds, to appear in Pacific. J. Math.
[17] Zhang, Q.-S., Some gradient estimates for the heat equation on domains and for an equation by Perelman, preprint, DG/0605518.
ABSTRACT This paper studies normalized Ricci flow on a nonparabolic surface, whose scalar curvature is asymptotically -1 in an integral sense. By a method initiated by R. Hamilton, the flow is shown to converge to a metric of constant scalar curvature -1. A relative estimate of Green's function is proved as a tool. <|endoftext|><|startoftext|> Polarization properties of subwavelength hole arrays consisting of rectangular holes

Xi-Feng Ren, Pei Zhang, Guo-Ping Guo⋆, Yun-Feng Huang, Zhi-Wei Wang, Guang-Can Guo
Key Laboratory of Quantum Information, University of Science and Technology of China, Hefei 230026, People’s Republic of China
⋆ E-mail: gpguo@ustc.edu.cn

Abstract The influence of hole shape on extraordinary optical transmission was investigated using hole arrays consisting of rectangular holes with different aspect ratios. It was found that the transmission could be tuned continuously by rotating the hole array. Furthermore, a phase was generated in this process, and linear polarization states could be changed to elliptical polarization states. This phase was correlated with the aspect ratio of the holes. An intuitive model was presented to explain these results.

PACS numbers: 78.66.Bz, 73.20.Mf, 71.36.+c

1 Introduction

In metal films perforated with a periodic array of subwavelength apertures, it has long been observed that there is an unusually high optical transmission [1]. It is believed that the metal surface plays a crucial role, that the phenomenon is mediated by surface plasmon polaritons (SPPs), and that there is a process of transforming a photon to an SPP and back to a photon [2,3,4]. This phenomenon can be used in various applications, for example, sensors, optoelectronic devices, etc. [5,6,7,8,9,10].
Polarization properties of nanohole arrays have been studied in many works [11,12,13]. Recently, the orbital angular momentum of photons was explored to investigate the spatial mode properties of surface-plasmon-assisted transmission [14,15]. It has also been shown that the entanglement of photon pairs can be preserved when they respectively travel through a hole array [15,16,17]. Therefore, the macroscopic surface plasmon polaritons, a collective excitation wave involving typically $10^{10}$ free electrons propagating at the surface of conducting matter, have a true quantum nature. However, the increasing use of extraordinary optical transmission (EOT) requires further understanding of the phenomenon.

http://arxiv.org/abs/0704.0854v2

The polarization of the incident light determines the mode of the excited SPP, which is also related to the periodic structure. For the manipulation of light at a subwavelength scale with periodic arrays of holes, two ingredients exist: shape and periodicity [2,3,4,11,18,19,20]. The influence of asymmetric periodicity on EOT was discussed in [21]. The influence of the hole shape on EOT was also observed recently [18,20], where the authors mainly focused on the transmission spectra. In this work, we used rectangle hole arrays to investigate the influence of hole shape on the polarization properties of EOT. It is found that linear polarization states could be changed to elliptical polarization states and that a phase could be added between the two eigenmode directions. The phase changed when the aspect ratio of the rectangle holes was varied. The hole array was also rotated in the plane perpendicular to the illuminating beam. The optical transmission changed in this process. It strongly depended on the rotation angle, in other words, on the angle between the polarization of the incident light and the axis of the hole array, as in the case of an asymmetric hole array structure [21].
2 Experimental results and modeling

2.1 Relation between transmission efficiency and photon polarization

Fig. 1(a) is a scanning electron microscope picture of part of our hole arrays. The hole arrays are produced as follows: after successively evaporating a 3-nm titanium bonding layer and a 135-nm gold layer onto a 0.5-mm-thick silica glass substrate, a focused ion beam etching system is used to produce rectangle holes (100 nm × 100 nm, 100 nm × 150 nm, 100 nm × 200 nm, and 100 nm × 300 nm respectively) arranged as a square lattice (520 nm period). The area of each hole array is 10 µm × 10 µm.

Transmission spectra of the hole arrays were recorded by a silicon avalanche photodiode single photon counter coupled with a spectrograph through a fiber. White light from a stabilized tungsten-halogen source passed through a single mode fiber and a polarizer (only vertically polarized light can pass), then illuminated the sample. The hole arrays were set between two lenses of 35 mm focal length, so that the light was normally incident on the hole array with a cross sectional diameter of about 10 µm and covered hundreds of holes. The light exiting from the hole array was launched into the spectrograph. The hole arrays were rotated anti-clockwise in the plane perpendicular to the illuminating light, as shown in Fig. 1(b). Transmission spectra of the hole arrays for rotation angles θ = 0° and 90° are given in Fig. 2. There was a large difference between the two cases, which was also observed in [18].

Fig. 1 (Color online) The rectangle hole arrays. (a) Scanning electron microscope pictures. (b) Rotation direction. S (L) is the axis of the short (long) edge of the rectangle hole; H (V) is the horizontal (vertical) axis.

Further, the typical hole array (100 nm × 300 nm holes) was rotated anti-clockwise in the plane perpendicular to the illuminating light (see Fig. 1(b)).
Transmission efficiencies of H and V photons (702 nm wavelength) were measured with rotation angles θ = 0°, 30°, 45°, 60°, and 90° respectively, as shown in Fig. 3. They varied with θ.

Fig. 2 (Color online) Hole array transmittance as a function of wavelength (nm) for rotation angle θ = 0° (black square dots) and 90° (red round dots). Holes for panels a, b, c, and d are 100 nm × 100 nm, 100 nm × 150 nm, 100 nm × 200 nm, and 100 nm × 300 nm respectively. The dashed vertical lines indicate the wavelength of 702 nm used in the experiment.

To explain the results, we give a simple model. For our sample, photons with 702 nm wavelength will excite the SPP eigenmodes (0,±1) and (±1,0). Since the SPPs were excited in the directions of the long (L) and short (S) edges of the rectangle holes, we suspected that these two directions were the eigenmode directions for our sample. The polarization of the illuminating light was projected onto the two eigenmode directions to excite SPPs. After that, the two kinds of SPPs passed through the holes and radiated light with different transmission efficiencies $T_L$ and $T_S$ respectively. For light whose polarization makes an angle θ with the S direction, the transmission efficiency $T_\theta$ will be
$T_\theta = T_S \cos^2(\theta) + T_L \sin^2(\theta)$. (1)
This equation was also given in [20,21]. Due to the unequal values of $T_L$ and $T_S$, the overall transmission efficiency varied with the angle θ. So if we know the transmission spectra for the eigenmode directions (here L and S), we can calculate the transmission spectra (including the heights and locations of peaks) for any

Fig. 3 (Color online) Transmittance (counts) as a function of rotation angle θ (degrees) for photons of 702 nm wavelength (100 nm × 300 nm holes). Red round dots and black square dots are the counts for V and H photons respectively.
The lines come from the theoretical calculation.

θ. The theoretical calculations are also given in Fig. 3 and agree well with the experimental data. Similar results were also observed when the other hole arrays (100 nm × 150 nm and 100 nm × 200 nm) were used. With this model, the transmission efficiency can be continuously tuned in a certain range.

2.2 Influence of hole shape on photon polarization

To investigate the polarization property of the hole array, we used the method of polarization state tomography. The experimental setup is shown in Fig. 4. White light from a stabilized tungsten-halogen source passed through a single mode fiber and a 4 nm filter (center wavelength 702 nm) to generate 702 nm wavelength photons. The polarization of the input light was controlled by a polarizer, a HWP (half wave plate, 702 nm) and a QWP (quarter wave plate, 702 nm). The hole array was set between two lenses of 35 mm focal length. Symmetrically, a QWP, a HWP and a polarizer were combined to analyze the polarization of the transmitted photons. For arbitrary input states, the output states were measured in the four bases: $|H\rangle$, $|V\rangle$, $\frac{1}{\sqrt{2}}(|H\rangle + |V\rangle)$, and $\frac{1}{\sqrt{2}}(|H\rangle + i|V\rangle)$. With these experimental data, we could get the density matrix of the output states, which gave the full polarization characteristics of the transmitted photons. For example, in the case of θ = 0°, for the input state $\frac{1}{\sqrt{2}}(|H\rangle + e^{i0.5\pi}|V\rangle)$, four counts (8943, 31079, 3623 and 21760) were recorded when we used the four detection bases. The density matrix was calculated as
$\begin{pmatrix} 0.223 & -0.410-0.043i \\ -0.410+0.043i & 0.777 \end{pmatrix}$, (2)
which had a fidelity of 0.997 with the pure state $0.472|H\rangle + 0.882e^{i0.967\pi}|V\rangle$. Comparing this state with the input state, we found that not only was the ratio of $|H\rangle$ and $|V\rangle$ changed, but also a phase ϕ = 0.467π was added between them.
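As a check on the numbers above, the density matrix of Eq. (2) can be approximately recovered from the four counts by simple linear inversion. The sketch below is only an illustration: it assumes the counts are listed in the same order as the four bases, that the H and V counts together provide the normalization, and the function names are hypothetical. The paper does not say which estimator was used, so small differences (of order $10^{-3}$) from the reported values are to be expected.

```python
import cmath
import math


def density_matrix(n_h, n_v, n_d, n_r):
    """Linear-inversion tomography of a single polarization qubit.

    n_h, n_v: counts in the |H> and |V> bases
    n_d: counts in the (|H> + |V>)/sqrt(2) basis
    n_r: counts in the (|H> + i|V>)/sqrt(2) basis
    Returns the 2x2 density matrix as nested lists (basis order H, V).
    """
    total = n_h + n_v                 # assumed normalization
    rho_hh = n_h / total
    rho_vv = n_v / total
    re_hv = n_d / total - 0.5         # P_D = 1/2 + Re(rho_HV)
    im_hv = 0.5 - n_r / total         # P_R = 1/2 - Im(rho_HV)
    rho_hv = complex(re_hv, im_hv)
    return [[rho_hh, rho_hv], [rho_hv.conjugate(), rho_vv]]


def fidelity(rho, psi):
    """Fidelity <psi|rho|psi> of rho with a pure state psi (normalized here)."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in psi))
    psi = [complex(c) / norm for c in psi]
    value = sum(psi[i].conjugate() * rho[i][j] * psi[j]
                for i in range(2) for j in range(2))
    return value.real


# Counts quoted in the text for theta = 0 and input (|H> + e^{i 0.5 pi}|V>)/sqrt(2)
rho = density_matrix(8943, 31079, 3623, 21760)
target = [0.472, 0.882 * cmath.exp(1j * 0.967 * math.pi)]

# Phase of the V component relative to H in the output, in units of pi
phase_out = cmath.phase(rho[1][0]) / math.pi
phase_added = phase_out - 0.5         # input state carried a phase of 0.5 pi

print("rho =", rho)
print("fidelity with the quoted pure state:", fidelity(rho, target))
print("output phase / pi:", phase_out, " added phase / pi:", phase_added)
```

Under these assumptions the reconstructed matrix elements, the fidelity and the added phase come out close to the values quoted in the text, which suggests that the reported phase ϕ is simply the difference between the output and input relative phases of $|V\rangle$ with respect to $|H\rangle$.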
A similar phenomenon was also observed when the input state was $\frac{1}{\sqrt{2}}(|H\rangle + |V\rangle)$, and in this case ϕ = 0.442π. We also considered the cases θ = 30°, 45°, 60°, and 90°. The experimental density matrices all had fidelities larger than 0.960 with the theoretical calculations, where ϕ = (0.462 ± 0.053)π. It can be seen that the phase ϕ was hardly influenced by the rotation.

To study the dependence of the phase ϕ on the hole shape, we performed the same measurements on the other hole arrays shown in Fig. 1. It was found that ϕ changed with the aspect ratio of the rectangle holes. Fig. 5 gives the relation between ϕ and the aspect ratio. The phases are 0, (0.227 ± 0.032)π, (0.357 ± 0.020)π and (0.462 ± 0.053)π for aspect ratios 1, 1.5, 2.0 and 3.0 respectively. As mentioned above, the period is another important parameter in EOT experiments. Since no similar result was observed for hole arrays with symmetrical periods, a special quadrate hole array (see Fig. 1 of [21]) was also investigated to show the influence of the hole period. We found that even though the periods were different in the two directions, there was no birefringent phenomenon (ϕ = 0).

Fig. 4 (Color online) Experimental setup to investigate the polarization property of our rectangle hole array (elements shown: source, SMF, filter, polarization controller, HA, polarization analyzer, SMF, detector). The polarization of the input light was controlled by a polarizer, a HWP and a QWP. The hole array was set between two lenses of 35 mm focal length. Symmetrically, a QWP, a HWP and a polarizer were combined to analyze the polarization of the transmitted photons.

This birefringent phenomenon might be explained by the propagation of SPPs on the metal surface.
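The two observations reported so far, the rotation dependence of Eq. (1) and a phase ϕ between the two eigenmode directions, can be combined in a single Jones-matrix picture. The sketch below is an illustration under assumptions that are not stated in the paper: the array is modeled as a diagonal Jones matrix in the (S, L) basis with amplitude transmissions $\sqrt{T_S}$ and $\sqrt{T_L}$ and a relative phase ϕ on the L component. The values of `T_S` and `T_L` are hypothetical; ϕ = 0.462π is the value reported for the 100 nm × 300 nm holes.

```python
import cmath
import math


def transmit(theta, t_s, t_l, phi):
    """Send unit-intensity linearly polarized light through the model array.

    theta: angle (radians) between the input polarization and the S axis
    t_s, t_l: intensity transmission efficiencies along S and L
    phi: phase added to the L component relative to the S component
    Returns (out_s, out_l, total): output field amplitudes along S and L
    and the total transmitted intensity.
    """
    out_s = math.sqrt(t_s) * math.cos(theta)
    out_l = math.sqrt(t_l) * cmath.exp(1j * phi) * math.sin(theta)
    total = abs(out_s) ** 2 + abs(out_l) ** 2
    return out_s, out_l, total


T_S, T_L = 0.2, 0.7            # hypothetical efficiencies, for illustration only
PHI = 0.462 * math.pi          # phase reported for the 100 nm x 300 nm holes

for deg in (0, 30, 45, 60, 90):
    theta = math.radians(deg)
    out_s, out_l, total = transmit(theta, T_S, T_L, PHI)
    # Eq. (1) evaluated directly, for comparison with the Jones result
    eq1 = T_S * math.cos(theta) ** 2 + T_L * math.sin(theta) ** 2
    print(deg, total, eq1)
```

In this picture the total transmission depends only on $T_S$, $T_L$ and θ, as in Eq. (1), while the phase ϕ shows up only in the polarization state: for any θ other than 0° or 90° the two output components acquire a relative phase ϕ, so a linear input becomes elliptical.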
As we know, the interaction of the incident light with the surface plasmon is made possible by coupling through the grating momentum and obeys conservation of momentum,
$\vec{k}_{sp} = \vec{k}_0 \pm i\,\vec{G}_x \pm j\,\vec{G}_y$, (3)
where $\vec{k}_{sp}$ is the surface plasmon wave vector, $\vec{k}_0$ is the component of the incident wave vector that lies in the plane of the sample, $\vec{G}_x$ and $\vec{G}_y$ are the reciprocal lattice vectors, and $i$, $j$ are integers. Usually, $G_x = G_y = 2\pi/d$ for a square lattice, and the relation $k_{sp} d = m\pi$ is satisfied, where $m$ is the band index [22]. For our rectangle hole arrays, however, the length of the holes in the L direction was changed from 150 nm to 300 nm, so it was not the same as that in the S direction. Though $G_x = G_y = 2\pi/d$ for our rectangle hole array, the time for a surface plasmon polariton to propagate in the L direction must be influenced by the aspect ratio of the hole shape, and could not be the same as that in the S direction. A phase difference ϕ was generated between the two directions, leading to the birefringent phenomenon. Due to the absorption or scattering of the SPPs and scattering at the hole edges, it is hard to give the accurate value of the phase or the exact relation between the phase and the aspect ratio of the holes. Even so, ϕ could be controlled by changing the hole shape. In contrast, there was no birefringent phenomenon observed when the quadrate hole array (see Fig. 1 of [21]) was used. The reason was that the phase $G_x d_x$ is always equal to $G_y d_y$, even though $G_x \neq G_y$ for the quadrate hole array.

Fig. 5 (Color online) Relation between the birefringent phase ϕ and the hole shape aspect ratio. ϕ becomes larger when the aspect ratio increases.

3 Conclusion

In conclusion, rectangle hole arrays were explored to study the influence of hole shape on EOT, especially the properties of photon polarization. Because of the asymmetry of the hole shape, a birefringent phenomenon was observed.
The phase was determined by the hole shape, which gives us a potential method to control this birefringent process. It was also found that the transmission efficiency can be tuned continuously by rotating the hole array. These results might be explained using an intuitive model based on surface plasmon eigenmodes.

This work was funded by the National Fundamental Research Program, the National Natural Science Foundation of China (10604052), the Program for New Century Excellent Talents in University, the Innovation Funds from the Chinese Academy of Sciences, and the Program of the Education Department of Anhui Province (Grant No. 2006kj074A). Xi-Feng Ren also thanks the China Postdoctoral Science Foundation (20060400205) and the K. C. Wong Education Foundation, Hong Kong.

References
1. T. W. Ebbesen, H. J. Lezec, H. F. Ghaemi, T. Thio, and P. A. Wolff, Nature 391, 667 (1998).
2. H. Raether, Surface Plasmons on Smooth and Rough Surfaces and on Gratings, Vol. 111 of Springer Tracts in Modern Physics, Springer, Berlin (1988).
3. D. E. Grupp, H. J. Lezec, T. W. Ebbesen, K. M. Pellerin, and Tineke Thio, Appl. Phys. Lett. 77, 1569 (2000).
4. M. Moreno, F. J. García-Vidal, H. J. Lezec, K. M. Pellerin, T. Thio, J. B. Pendry, and T. W. Ebbesen, Phys. Rev. Lett. 86, 1114 (2001).
5. S. M. Williams, K. R. Rodriguez, S. Teeters-Kennedy, A. D. Stafford, S. R. Bishop, U. K. Lincoln, and J. V. Coe, J. Phys. Chem. B 108, 11833 (2004).
6. A. G. Brolo, R. Gordon, B. Leathem, and K. L. Kavanagh, Langmuir 20, 4813 (2004).
7. A. Nahata, R. A. Linke, T. Ishi, and K. Ohashi, Opt. Lett. 28, 423 (2003).
8. X. Luo and T. Ishihara, Appl. Phys. Lett. 84, 4780 (2004).
9. S. Shinada, J. Hasijume and F. Koyama, Appl. Phys. Lett. 83, 836 (2003).
10. C. Genet and T. W. Ebbesen, Nature 445, 39 (2007).
11. J. Elliott, I. I. Smolyaninov, N. I. Zheludev, and A. V. Zayats, Opt. Lett. 29, 1414 (2004).
12. R. Gordon, A. G. Brolo, A. McKinnon, A. Rajora, B. Leathem, and K. L. Kavanagh, Phys. Rev. Lett.
92, 037401 (2004).
13. E. Altewischer, C. Genet, M. P. van Exter, and J. P. Woerdman, Opt. Lett. 30, 90 (2005).
14. X. F. Ren, G. P. Guo, Y. F. Huang, Z. W. Wang, and G. C. Guo, Opt. Lett. 31, 2792 (2006).
15. X. F. Ren, G. P. Guo, Y. F. Huang, C. F. Li, and G. C. Guo, Europhys. Lett. 76, 753 (2006).
16. E. Altewischer, M. P. van Exter and J. P. Woerdman, Nature 418, 304 (2002).
17. S. Fasel, F. Robin, E. Moreno, D. Erni, N. Gisin and H. Zbinden, Phys. Rev. Lett. 94, 110501 (2005).
18. K. J. Klein Koerkamp, S. Enoch, F. B. Segerink, N. F. van Hulst and L. Kuipers, Phys. Rev. Lett. 92, 183901 (2004).
19. Zhichao Ruan and Min Qiu, Phys. Rev. Lett. 96, 233901 (2006).
20. M. Sarrazin, J. P. Vigneron, Opt. Commun. 240, 89 (2004).
21. X. F. Ren, G. P. Guo, Y. F. Huang, Z. W. Wang, and G. C. Guo, Appl. Phys. Lett. 90, 161112 (2007).
22. F. L. Tejeira, S. G. Rodrigo, L. M. Moreno, F. J. G. Vidal, E. Devaux, T. W. Ebbesen, J. R. Krenn, I. P. Radko, S. I. Bozhevolnyi, M. U. Gonzalez, J. C. Weeber, and A. Dereux, Nature Physics 3, 324 (2007).
<|endoftext|><|startoftext|> Introduction

Cooling and trapping alkaline-earth atoms offer interesting alternatives to alkali atoms. Indeed, the singlet-triplet forbidden lines can be used for optical frequency measurement and related subjects [1].
Moreover, the spinless ground state of the most abundant bosonic isotopes can lead to simpler, or at least different, cold collision problems than with alkali atoms [2]. Considering fermionic isotopes, the long-lived and isolated nuclear spin can be controlled by optical means [3] and has been proposed to implement quantum logic gates [4]. It has also been shown that the ultimate performance of Doppler cooling can be greatly improved by using narrow transitions whose photon recoil frequency shifts $\omega_r$ are larger than their natural widths $\Gamma$ [5]. This is the case for the $^1S_0 \rightarrow {}^3P_1$ spin-forbidden line of Magnesium ($\omega_r \approx 1100\Gamma$) or Calcium ($\omega_r \approx 36\Gamma$). Unfortunately, neither atomic species can be held in a standard magneto-optical trap (MOT), because the radiation pressure force is not strong enough to overcome gravity. This imposes the use of an extra quenching laser, as demonstrated for Ca [6]. For Strontium, the natural width of the intercombination transition ($\Gamma = 2\pi \times 7.5$ kHz) is slightly broader than the recoil shift ($\omega_r = 2\pi \times 4.7$ kHz). The radiation pressure force is higher than gravity, but at the same time the final temperature is still in the µK range [7,8]. In parallel, the narrow transition partially prevents multiple scattering processes and the related atomic repulsive force [10]. Hence important improvements in the spatial density have been reported [7]. However, despite experimental efforts, such as adding an extra confining optical potential, pure optical methods have not yet made it possible to reach the quantum degeneracy regime with Strontium atoms [9].

In this paper, we will discuss some performances, essentially in terms of temperatures, sizes and loading rates, of a Strontium 88 MOT using the 689 nm $^1S_0 \rightarrow {}^3P_1$ intercombination line. Initially the atoms are precooled in a MOT on the spin-allowed 461 nm $^1S_0 \rightarrow {}^1P_1$ transition (natural width $\Gamma = 2\pi \times 32$ MHz) as discussed in [11].
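The orders of magnitude quoted above for Strontium can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions that are not taken from the text: the recoil frequency is defined as $\omega_r = \hbar k^2 / 2m$, the recoil temperature as $\hbar^2 k^2 / (m k_B)$ (other conventions differ by a factor of 2), the maximum radiation pressure force of a two-level atom is taken as $\hbar k \Gamma / 2$, and the mass of $^{88}$Sr is taken as about 87.906 u.

```python
import math

# Physical constants (SI units)
HBAR = 1.054571817e-34       # reduced Planck constant, J s
K_B = 1.380649e-23           # Boltzmann constant, J/K
AMU = 1.66053906660e-27      # atomic mass unit, kg
G_EARTH = 9.81               # gravitational acceleration, m/s^2

# Assumed parameters for the 88Sr intercombination line
MASS = 87.9056 * AMU                 # mass of 88Sr
WAVELENGTH = 689e-9                  # transition wavelength, m
GAMMA = 2 * math.pi * 7.5e3          # natural width, rad/s (value from the text)

k = 2 * math.pi / WAVELENGTH                     # photon wave number
omega_r = HBAR * k ** 2 / (2 * MASS)             # recoil frequency, rad/s
recoil_khz = omega_r / (2 * math.pi) / 1e3       # same, in kHz
t_recoil = HBAR ** 2 * k ** 2 / (MASS * K_B)     # recoil temperature, K
a_max = HBAR * k * GAMMA / (2 * MASS)            # max radiative acceleration

print("recoil frequency (kHz):", recoil_khz)
print("Gamma / omega_r:", GAMMA / omega_r)
print("recoil temperature (microkelvin):", t_recoil * 1e6)
print("max radiative acceleration in units of g:", a_max / G_EARTH)
```

With these conventions the recoil frequency comes out slightly below 5 kHz and the natural width is a little larger than the recoil shift, consistent with the values in the text, while the maximum radiative acceleration exceeds gravity by roughly an order of magnitude. This is why the intercombination MOT can hold the atoms against gravity even though gravity is far from negligible.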
Then the atoms are transferred into the 689 nm intercombination MOT. To achieve a high loading rate, Katori et al. [7] used a laser spectrum broadened by frequency modulation, so that the velocity capture range of the 689 nm MOT matches the typical velocity in the 461 nm MOT. They report a transfer efficiency of 30%. The same transfer efficiency is also reported in reference [8]. In our set-up, 50% of the atoms initially in the blue MOT are transferred into the red one. In section 3 we present a systematic study of the transfer efficiency as a function of the parameters of the frequency modulation. In order to discuss the intrinsic limitations of the loading efficiency, we compare our experimental results to a simple model. In particular, we show that our transfer efficiency is limited by the size of the red MOT beams, and that it could be increased up to 90% with realistic laser power (25 mW per beam).

T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line (http://arxiv.org/abs/0704.0855v2)

The minimum temperature achieved in the broadband MOT is about 2.5 µK. In order to reduce the temperature down to the photon recoil limit (0.5 µK), we apply a second cooling stage, using a single frequency laser, and observe temperatures, and detuning and intensity dependencies, similar to those reported in the literature (see references [7], [8], [12] and [13]). In those publications, the role of gravity in the cooling and trapping dynamics along the vertical z direction has been discussed. In this paper we compare the steady state behaviour along the vertical (z) direction to that in the horizontal (x−y) plane, where gravity indirectly plays a crucial role (section 4). Details about the dynamics are given in references [8], [12]. In particular the authors establish three regimes. In regime (I) the laser detuning |δ| is larger than the power-broadened linewidth Γ_E = Γ√(1 + s).
Regime (II) on the contrary corresponds to Γ_E > |δ|. In both regimes (I) and (II), Γ_E ≫ Γ, ω_r and the semiclassical limit is a good approximation. In regime (III) the saturation parameter is small and a full quantum treatment is required. We will focus here on the semiclassical regime (I). In this regime, we confirm that the temperature along the z direction is independent of the detuning δ. Following Loftus et al. [12], we have also found (see section 4.1) that this behavior is due to the balance between the gravitational force and the radiation pressure force produced by the upward pointing laser (gravity defining the downward direction). The center of mass of the atomic cloud is shifted downward from the center of the magnetic field quadrupole. As a consequence, cooling and trapping in the horizontal plane occur in a strong bias magnetic field that is mostly perpendicular to the cooling plane. This unusual situation is studied in detail (section 4.2). Despite different friction and diffusion coefficients along the horizontal and the vertical directions, the horizontal temperature is found to be the same as the vertical one (see section 4.3). In reference [12], the trapping potential is predicted to have a box shape whose walls are given by the laser detuning. This is indeed the case without a bias magnetic field along the z axis. It is actually different for the regime (I) described in this paper. Here we have found that the trapping potential remains harmonic. This leads to a cloud width in the horizontal direction which is proportional to √|δ| (section 4.2).

2 Experimental set-up

Our blue MOT set-up (on the broad ¹S₀ → ¹P₁ transition at 461 nm) is described in references [14,15]. Briefly, it is composed of six independent laser beams, typically 10 mW/cm² each. The magnetic field gradient is about 70 G/cm. The blue MOT is loaded from an atomic beam extracted from an oven at 550 °C and longitudinally slowed down by a Zeeman slower.
The loading rate of our blue MOT is about 10⁹ atoms/s, and we trap about 2 × 10⁶ atoms in a cloud of 0.6 mm rms radius when no repumping lasers are used [16]. To optimize the transfer into the red MOT, the temperature of the blue MOT should be as low as possible. As previously observed [11], this temperature depends strongly on the optical field intensity. We therefore decrease the intensity by a factor of 5 (see figure 1) 4 ms before switching off the blue MOT. The rms velocity right before the transfer stage is thus reduced to σ_b = 0.6 m/s, whereas the rms size remains unchanged. Similar two-stage cooling in a blue MOT is also reported in reference [13].

The 689 nm laser source is an anti-reflection coated laser diode in a 10 cm long extended cavity, closed by a diffraction grating. It is locked to a ULE cavity using the Pound-Drever-Hall technique [17]. The unity gain of the servo loop is obtained at a frequency of 1 MHz. From the noise spectrum of the error signal, we derive a frequency noise power. In the range of interest, namely 1 Hz to 100 kHz, it shows an upper limit of 160 Hz²/Hz, which is low enough for our purpose. The light transmitted by the ULE cavity is injected into a 20 mW slave laser diode. The noise components at frequencies higher than the ULE cavity cut-off (300 kHz) are thereby filtered. It is important to note that the sidebands used for the lock are also removed. Those sidebands, at 20 MHz from the carrier, are generated by directly modulating the current of the master laser diode. A saturated spectroscopy set-up on the ¹S₀ → ³P₁ intercombination line is used to compensate the long-term drift of 10 to 50 Hz/s, mainly due to the daily temperature change of the ULE cavity. The slave beam is sent through an acousto-optical modulator mounted in a double-pass configuration. The laser detuning can then be tuned within a range of a few hundred linewidths around the resonance.
This acousto-optical modulator is also used for frequency modulation (FM) of the laser, as required during the loading phase (see section 3). The red MOT is made of three retroreflected beams with a waist of 0.7 cm. The maximum intensity per beam is about 4 mW/cm² (the saturation intensity being I_s = 3 µW/cm²). The magnetic gradient used for the red MOT is varied from 1 to 10 G/cm.

To probe the cloud (number of atoms and temperature) we use a resonant 40 µs pulse of blue light (see figure 1). The total emitted fluorescence is collected onto an avalanche detector. From this measurement, we deduce the number of atoms and then evaluate the transfer rate into the red MOT. At the same time, an image of the cloud is taken with an intensified CCD camera. The typical spatial resolution of the camera is 30 µm. Varying the dark period (time of flight) between the red MOT phase and the probe, we get the ballistic expansion of the cloud. We then derive the rms velocity and the corresponding temperature.

3 Broadband loading of the red MOT

The loading efficiency of a MOT depends strongly on the width of the transition. With a broad transition, the maximum radiation pressure acceleration is typically a_m = v_r Γ/2 ≈ 10⁴ × g, where v_r is the recoil velocity [18]. Hence, over l ≈ 1 cm (usual MOT beam waist), an atom with a velocity v_c = √(2 a_m l) ≈ 30 m/s can be slowed down to zero and then be captured. During the deceleration, the atom always remains close to resonance because the Doppler shift is comparable to the linewidth. Thus MOTs can be directly loaded from a thermal vapor or a slow atomic beam using single frequency lasers. Moreover, typical magnetic field gradients of a few tens of G/cm usually do not drastically change the loading because the Zeeman shift over the trapping region is also comparable to the linewidth.
Efficient loading is more complex to achieve with a narrow transition. For Strontium, the maximum radiation pressure acceleration of a single laser is only a_m ≈ 15 × g. Assuming the force is maximum during the whole capture process, one gets v_c = √(2 a_m l) ≈ 1.7 m/s. Hence, precooling in the blue MOT is almost mandatory. In that case the initial Doppler shift will be v_c λ⁻¹ ≈ 2.5 MHz, about 300 times larger than the linewidth. In order to keep the laser on resonance during the capture phase, the red MOT lasers must thus be spectrally broadened. Because of the low value of the saturation intensity, the spectral power density can easily be kept large enough to maintain a maximum force with a reasonable total power (a few milliwatts).

The magnetic field gradient of the MOT may also affect the velocity capture range. To illustrate this point, let us consider an atom initially in the blue MOT at the center of the trap with a velocity v_c = 1.7 m/s. During the deceleration, the Doppler shift decreases whereas the Zeeman shift increases. However, the magnetic field gradient does not affect the capture velocity as long as the total shift (Doppler + Zeeman) is still decreasing. This condition is fulfilled if the magnetic field gradient is lower than [19]:

$b_c = \frac{a_m}{\lambda g_e \mu_b v_c} \approx 0.6\ \mathrm{G/cm}$ (1)

where g_e = 1.5 is the Landé factor of the ³P₁ level and µ_b = 1.4 MHz/G is the Bohr magneton. In practice we use a magnetic field gradient which is larger than b_c. In that case, it is necessary to increase the width of the laser spectrum so that the optimum transfer rate is not limited by the Zeeman shift (see section 3.2). An alternative solution may consist of ramping the magnetic field gradient during the loading [7].

3.1 Transfer rate: experimental results

In this section we present the experimental results regarding the loading efficiency of the red MOT from the blue MOT. To optimize the transfer rate, the laser spectrum is broadened using frequency modulation (FM).
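The narrow-line estimates quoted above (capture velocity, initial Doppler shift and the critical gradient of equation (1)) can be checked numerically. The following is a minimal sketch using only the values given in the text; the function and variable names are illustrative, and g = 9.81 m/s² is an assumption made here.

```python
import math

# Values quoted in the text (g is an assumed standard value).
G = 9.81                  # gravitational acceleration, m/s^2 (assumption)
WAVELENGTH = 689e-9       # intercombination line wavelength, m
LINEWIDTH_HZ = 7.5e3      # natural width Gamma/2pi, Hz
A_MAX = 15 * G            # maximum acceleration of a single laser, m/s^2
STOPPING_LENGTH = 1e-2    # usual MOT beam waist l, m
G_E = 1.5                 # Lande factor of the 3P1 level
MU_B_HZ_PER_G = 1.4e6     # Bohr magneton, Hz/G


def capture_velocity(a_max, length):
    """Largest velocity that a constant deceleration a_max stops over `length`."""
    return math.sqrt(2.0 * a_max * length)


def doppler_shift_hz(velocity, wavelength):
    """Doppler shift v / lambda, in Hz."""
    return velocity / wavelength


def critical_gradient_g_per_cm(a_max, velocity, wavelength, g_e, mu_b):
    """Gradient b_c = a_m / (lambda g_e mu_b v_c) of equation (1), in G/cm."""
    gradient_g_per_m = a_max / (wavelength * g_e * mu_b * velocity)
    return gradient_g_per_m / 100.0  # convert G/m to G/cm


v_c = capture_velocity(A_MAX, STOPPING_LENGTH)
shift_hz = doppler_shift_hz(v_c, WAVELENGTH)
b_c = critical_gradient_g_per_cm(A_MAX, v_c, WAVELENGTH, G_E, MU_B_HZ_PER_G)

print(f"capture velocity : {v_c:.2f} m/s")
print(f"Doppler shift    : {shift_hz / 1e6:.2f} MHz "
      f"({shift_hz / LINEWIDTH_HZ:.0f} linewidths)")
print(f"critical gradient: {b_c:.2f} G/cm")
```

With these inputs the sketch gives about 1.7 m/s, about 2.5 MHz (roughly 330 linewidths) and about 0.6 G/cm, in line with the rounded figures in the text.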
The instantaneous laser detuning is then ∆(t) = δ + ∆ν sin(ν_m t), where ∆ν and ν_m are the frequency deviation and the modulation frequency respectively, and δ is the carrier detuning. Here, the modulation index ∆ν/ν_m is always larger than 1, so the so-called wideband limit is well fulfilled. Hence one can assume the FM spectrum to be mainly enclosed in the interval [δ − ∆ν; δ + ∆ν].

As shown in figure 2, the transfer rate increases with ν_m up to 15 kHz, where we observe a plateau at 45% transfer efficiency. On the one hand, when ν_m is larger than the linewidth, the atoms are in the non-adiabatic regime where they interact with all the Fourier components of the laser spectrum. Moreover, the typical intensity per Fourier component always remains higher than the saturation intensity I_s = 3 µW/cm². As a consequence, the radiation pressure force should be close to its maximum value for any atomic velocity. On the other hand, when ν_m < Γ/2π, the atoms interact with a chirped intense laser, for which the mean radiation pressure force (over a period 2π/ν_m) is clearly smaller than in the case ν_m > Γ/2π. As a consequence, the transfer rate is reduced when ν_m decreases.

In figure 3, the transfer rate is measured as a function of ∆ν. The carrier detuning is δ = −1 MHz and the modulation frequency is kept larger than the linewidth (ν_m = 25 kHz). Starting from no deviation (∆ν = 0), we observe (figure 3) an increase of the transfer rate with ∆ν (in the range 0 < ∆ν < 500 kHz). After reaching its maximum value, the transfer rate no longer depends on ∆ν; the capture process is then no longer limited by the laser spectrum. If we further increase the frequency deviation ∆ν, the transfer becomes less efficient and finally decreases down to zero. This reduction occurs as soon as ∆ν > |δ|, i.e. when some components of the spectrum are blue detuned. This frequency configuration should obviously affect the MOT steady-state regime by adding extra heating at zero velocity (see section 3.3).
We can see that it also affects the transfer rate. To confirm this point, figure 4 shows the same experiment but with larger detunings, δ = −1.5 MHz and δ = −2 MHz for figures 4a and 4b respectively. Again the transfer rate decreases as soon as ∆ν > |δ|. The transfer rate is also very small on the other side, for small values of ∆ν. In that case the entire spectrum of the laser is too far red detuned: the radiation pressure forces are significant only for velocities larger than the capture velocity, and no steady state is expected. Keeping now the deviation fixed and varying the detuning as shown in figure 5, we observe a maximum transfer rate when the detuning is close to the deviation frequency, ∆ν ≃ |δ|. Closer to resonance (|δ| < ∆ν), the blue detuned components prevent an efficient loading of the MOT.

The magnetic field gradient also plays a crucial role in the loading. We indeed observe (figure 6) that the transfer rate decreases when the magnetic field gradient increases. At very low magnetic field gradient (b < 1 G/cm) the reduction of the transfer rate is most likely due to a lack of stability within the trapping region. In that case we actually observe a strong displacement of the center of mass of the cloud. This is induced by imperfections of the set-up, such as unbalanced laser intensities, which are critical at low magnetic gradient. Hence, the optimum magnetic field gradient is found to be the smallest one which ensures the stability of the cloud in the MOT.

3.2 Theoretical model and comparison with the experiments

To clearly understand the processes limiting the transfer rate, we compare the experimental data to a simple 1D theoretical model based on the following assumptions:

- An atom undergoes a radiation pressure force, and thus a deceleration, if the modulus of its velocity is between v_min and v_max, with

$v_{max} = \lambda(|\delta| + \Delta\nu), \qquad v_{min} = \max\{\lambda(|\delta| - \Delta\nu);\ \lambda(-|\delta| + \Delta\nu)\}$ (2)

and a_m = 0 elsewhere. We simply write that the Doppler shift is contained within the FM spectrum. We add the condition v_min = λ(−|δ| + ∆ν) when some components are blue detuned (∆ν > |δ|). In this case, we consider the simple ideal situation where the two counter-propagating lasers are assumed to be perfectly balanced and thus compensate each other in the region of spectral overlap.
- Even in the semiclassical model, it is difficult to calculate the acceleration as a function of the velocity for an FM spectrum. However, for all the data presented here, the saturation parameter is larger than one. Hence the deceleration is set to a constant value −a_m/3 when v_min < |v| < v_max. The prefactor 1/3 takes into account the saturation by the 3 counter-propagating laser beam pairs.
- The magnetic field gradient is included by giving a spatial dependence to the detuning δ in expression (2).
- An atom is trapped if its velocity changes sign within a distance shorter than the beam waist.

In figures 3-6 the results of the model are compared to the experimental data. The agreement between the model and the experimental data is satisfactory except at large frequency deviation (figures 3 and 4) or at low detuning (figure 5). In those cases the spectrum has some blue detuned components. As mentioned before, this is a complex situation where the assumptions of the simple model no longer hold. Fortunately those cases are of no practical interest because they do not correspond to the optimum transfer efficiency. At the optimum, the model suggests that the transfer is limited by the beam waist (see captions of figures 3-6). Moreover, for all the situations explored in figures 3-5, the magnetic field gradient is strong enough (b = 1 G/cm) to have an impact on the capture process, as suggested by inequality (1).
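The assumptions of the 1D model can be turned into a few lines of code. The sketch below is a deliberately simplified illustration, not the authors' implementation: it assumes a zero magnetic field gradient (so the velocity window of expression (2) does not move in space), an atom starting at the trap center, and a capture test that only makes sense when v_min = 0. The names `velocity_window`, `slowing_distance` and `is_captured` are hypothetical, and g = 9.81 m/s² is assumed.

```python
WAVELENGTH = 689e-9   # m
G = 9.81              # m/s^2 (assumed standard value)
A_M = 15 * G          # maximum single-laser acceleration quoted in the text
WAIST = 0.7e-2        # red MOT beam waist, m


def velocity_window(delta_hz, dnu_hz, wavelength=WAVELENGTH):
    """Return (v_min, v_max) of expression (2) for detuning delta and deviation dnu."""
    v_max = wavelength * (abs(delta_hz) + dnu_hz)
    v_min = max(wavelength * (abs(delta_hz) - dnu_hz),
                wavelength * (dnu_hz - abs(delta_hz)))
    return v_min, v_max


def slowing_distance(v0, delta_hz, dnu_hz, a_m=A_M):
    """Distance needed to slow from |v0| to v_min at the constant rate a_m / 3.

    Returns None when |v0| lies outside the window, where the model force is zero.
    """
    v_min, v_max = velocity_window(delta_hz, dnu_hz)
    speed = abs(v0)
    if not (v_min < speed < v_max):
        return None
    return (speed ** 2 - v_min ** 2) / (2.0 * a_m / 3.0)


def is_captured(v0, delta_hz, dnu_hz, waist=WAIST, a_m=A_M):
    """Zero-gradient capture test: the atom must be brought to rest within the waist.

    Without a field gradient the velocity can only reach zero if v_min == 0,
    i.e. if the frequency deviation equals |delta|.
    """
    v_min, _ = velocity_window(delta_hz, dnu_hz)
    distance = slowing_distance(v0, delta_hz, dnu_hz, a_m)
    return distance is not None and v_min == 0.0 and distance < waist


if __name__ == "__main__":
    delta, dnu = -1e6, 1e6  # Hz, the setting used in several figures
    print(velocity_window(delta, dnu))
    for v0 in (0.6, 1.3, 1.5):
        print(v0, slowing_distance(v0, delta, dnu), is_captured(v0, delta, dnu))
```

For δ = −1 MHz and ∆ν = 1 MHz, an atom at the blue MOT rms velocity (0.6 m/s) is stopped over a few millimetres, whereas an atom at 1.3 m/s would need more than the 0.7 cm waist. This is the sense in which the beam size limits the transfer in this simplified picture.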
The magnetic field gradient is nevertheless not the factor limiting the transfer, because the Zeeman shift is easily compensated by a larger frequency excursion or by a larger detuning.

Increasing the beam waist would definitely improve the transfer efficiency, as shown in figure 7. If the saturation parameter remained large for all values of the beam waist, more than 90% of the atoms would be transferred for a 2 cm beam waist. 25 mW of power per beam should be sufficient to achieve this goal. In our experimental set-up, the power is limited to 3 mW per beam, so the saturation parameter is necessarily reduced once the waist is increased. To take this into account and get a more realistic estimate of the efficiency for larger beams, we replace the previous acceleration by the expression a_m s/(1 + 3s), with s = I/I_s the saturation parameter per beam. In this case, the transfer efficiency reaches a maximum of 70% for a beam waist of 1.5 cm.

3.3 Temperature

Cooling with a broadband FM spectrum on the intercombination line decreases the temperature by three orders of magnitude in comparison with the blue MOT: from 3 mK (σ_b = 0.6 m/s) to 2.5 µK (see figure 8). For small detuning, the temperature increases strongly when the spectrum has some blue detuned components (∆ν > |δ|). Indeed the cooling force and the heating rate are strongly modified in the vicinity of zero detuning. This effect is illustrated in figure 8. On the other side, at large detuning (δ < −1.5 MHz), the temperature becomes constant. This regime corresponds to a detuning-independent steady state, as also observed in single frequency cooling (see reference [12] and section 4).

4 Single frequency cooling

About half of the atoms initially in the 461 nm MOT are recaptured in the red one using a broadband laser. The final temperature is 2.5 µK, i.e. 5 times larger than the photon recoil temperature T_r = 460 nK. To further decrease the temperature one has to switch to single frequency cooling (for time sequences, see figure 1).
As we will see in this section, the minimum temperature is now about 600 nK, close to the 0.8 T_r expected in a 1D molasses [5]. Moreover, one has to note that, under proper conditions described in reference [12], the transfer between the broadband and the single frequency red MOT can be almost lossless.

In the steady-state regime of the single frequency red MOT, one has kσ_v ≈ ω_r ≈ Γ. Thus, there is no clear separation of time scales as in MOTs operated on a broad transition, where ω_r ≪ kσ_v ≪ Γ. However, here the saturation parameter s always remains high. This corresponds to the so-called regimes (I) and (II) presented in reference [12]. Thus ω_r ≪ Γ√(1 + s) and the semiclassical Doppler theory properly describes the experimental situations encountered.

To ensure efficient trapping, the parameter values of the single frequency red MOT are different from those of a usual broad transition MOT: the magnetic field gradient is higher, typically 1000 Γ/cm. Moreover, gravity is no longer negligible in comparison with the typical radiation pressure. Those features lead to an unusual behavior of the red MOT, as we will explain in this section. We first analyze independently the MOT properties along the vertical dimension (section 4.1) and then in the horizontal plane (section 4.2), to finally compare those two situations (section 4.3).

4.1 Vertical direction

In regime (I), i.e. at large negative detuning and high saturation (see examples in figure 9a), the temperature is indeed constant. As explained in reference [12], this behavior is due to the balance between gravity and the radiation pressure force of the upward laser. At large negative detuning, the downward laser is too far detuned to give a significant contribution. In the semiclassical regime, an atom undergoes a net force

$F_z = \hbar k \frac{\Gamma}{2} \frac{s}{1 + s_T + 4(\delta - g_e \mu_B b z - k v_z)^2/\Gamma^2} - mg$ (3)

Considering the velocity dependence of the force, the first order term is:

$F_z \approx -\gamma_z v_z$ (4)

$\gamma_z = -4 \frac{\hbar k^2 \delta_{eff}}{\Gamma} \frac{s}{(1 + s_T + 4\delta_{eff}^2/\Gamma^2)^2}$ (5)

where the effective detuning δ_eff = δ − g_e µ_B b⟨z⟩ is defined such that

$\hbar k \frac{\Gamma}{2} \frac{s}{1 + s_T + 4\delta_{eff}^2/\Gamma^2} = mg$ (6)

s_T is the total saturation parameter including all the beams, and ⟨z⟩ is the mean vertical position of the cold cloud. Hence δ_eff is independent of the laser detuning δ, and the vertical temperature at large detuning depends only on the intensity, as shown in figures 9a and 9b.

The spatial properties of the cloud are also related to the effective detuning δ_eff, which is independent of δ. The mean vertical position depends linearly on the detuning, so that one has:

$\frac{d\langle z \rangle}{d\delta} = \frac{1}{g_e \mu_B b}$ (7)

The predicted vertical displacement is compared to the experimental data in figure 10a. The agreement is excellent (the only adjustable parameter is the unknown origin of the vertical axis). Because the radiation pressure force for an atom at rest does not depend on the laser detuning δ, the vertical rms size should also be independent of δ. This point is also verified experimentally (see figure 10b).

4.2 x−y horizontal plane

Let us now study the behavior of the cold cloud in the x−y plane at large laser detuning. As explained in section 4.1, the position of the cloud is shifted vertically downward with respect to the center of the magnetic field quadrupole (see figure 11). The dynamics in the x−y plane thus occurs in the presence of a high bias magnetic field. To derive the expression of the semiclassical force in this unusual situation, one first has to project the circular polarization states of the horizontal lasers onto the eigenstates. Defining the quantization axis along the magnetic field, one gets:

$e^+_x = \frac{1 + \sin\alpha}{2} e^-_z - \frac{\cos\alpha}{\sqrt{2}} \pi_z + \frac{1 - \sin\alpha}{2} e^+_z$ (8)

$e^-_x = \frac{1 - \sin\alpha}{2} e^-_z + \frac{\cos\alpha}{\sqrt{2}} \pi_z + \frac{1 + \sin\alpha}{2} e^+_z$ (9)

where e⁻_i, π_i and e⁺_i represent respectively the left-handed, linear and right-handed polarisations along the i axis. The angle α between the vertical axis and the local magnetic field is shown in figure 11.
For large detuning, α is always small (α ≪ 1) and we write α ≈ −x/⟨z⟩, considering only the dynamics along the x dimension. For simplicity the magnetic field gradient b is considered as spatially isotropic with b > 0, as sketched in figure 11b. The expression of the radiation pressure force is then:

$F_x = \hbar k \frac{\Gamma}{2} \left[ \frac{s(1 - \sin\alpha)^2/4}{1 + s_T + 4(\delta - g_e \mu_B b \langle z \rangle (1 - \tan\alpha) - k v_x)^2/\Gamma^2} - \frac{s(1 + \sin\alpha)^2/4}{1 + s_T + 4(\delta - g_e \mu_B b \langle z \rangle (1 - \tan\alpha) + k v_x)^2/\Gamma^2} \right]$ (10)

Note that this expression is not restricted to small α values. We expect six terms in expression (10): three terms for each laser, corresponding to the three e⁻, π and e⁺ polarisation eigenstates. However only two terms, corresponding to the e⁺ state, are close to resonance and thus have a dominant contribution. As for the vertical dimension, the off-resonant terms are removed from expression (10). One also has to note that the effective detuning δ_eff = δ − g_e µ_B b⟨z⟩ is actually the same as the one along the vertical dimension. The first order expansion of (10) in α and kv_x/Γ gives the expression of the horizontal radiation pressure force:

$F_x \approx -\kappa_\alpha \alpha - \gamma_x v_x = -\kappa_x x - \gamma_x v_x$ (11)

$\kappa_\alpha = -\langle z \rangle \kappa_x = \hbar k \frac{\Gamma}{2} \frac{s}{1 + s_T + 4\delta_{eff}^2/\Gamma^2} = mg$ (12)

$\gamma_x = -2 \frac{\hbar k^2 \delta_{eff}}{\Gamma} \frac{s}{(1 + s_T + 4\delta_{eff}^2/\Gamma^2)^2}$ (13)

As for the vertical dimension (equation (6)), the force depends on δ_eff but, at the position of the MOT, does not depend on the laser detuning δ. Hence, at large detuning, the horizontal temperature depends only on the intensity, as observed in figures 9a and 9b.

To understand the trapping mechanism in the x−y plane, we now consider an atom at rest located at a position x ≠ 0 (corresponding to α ≠ 0), i.e. not at the center of the MOT. The transition rates of the two counter-propagating laser beams are no longer balanced. This is due to the opposite signs of the α dependence in the prefactors of expression (10). This mechanism leads to a restoring force in the x−y plane, which is at the origin of the spatial confinement (equation 11).
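The step from the full two-beam force to the linearized coefficients κ_α and γ_x can be spelled out explicitly. The following is only a sketch of that expansion: the shorthand D for the common denominator is introduced here for convenience and is not part of the original notation, and the α dependence of the denominators is dropped because it is identical in both terms and only enters at second order.

```latex
% D is a shorthand introduced for this sketch (not the authors' notation).
\begin{align*}
D &\equiv 1 + s_T + 4\delta_{\mathrm{eff}}^2/\Gamma^2 , \\[4pt]
F_x\big|_{v_x = 0}
  &\approx \hbar k \frac{\Gamma}{2}\,\frac{s}{4D}
     \left[(1-\sin\alpha)^2 - (1+\sin\alpha)^2\right]
   = -\hbar k \frac{\Gamma}{2}\,\frac{s}{D}\,\sin\alpha
   \approx -\kappa_\alpha\,\alpha , \\[4pt]
\kappa_\alpha &= \hbar k \frac{\Gamma}{2}\,\frac{s}{D} = mg
   \quad \text{(using the vertical balance condition, equation (6))} , \\[4pt]
\left.\frac{\partial F_x}{\partial v_x}\right|_{\alpha = 0,\ v_x = 0}
  &= 2 \times \hbar k \frac{\Gamma}{2}\,\frac{s}{4}\,
     \frac{8 k\,\delta_{\mathrm{eff}}}{\Gamma^2 D^2}
   = \frac{2 \hbar k^2 \delta_{\mathrm{eff}}\, s}{\Gamma D^2}
   = -\gamma_x .
\end{align*}
```

Since δ_eff is negative, γ_x is positive, and it is half the vertical friction coefficient obtained from the single upward beam. With α ≈ −x/⟨z⟩, the first line also gives κ_x = −mg/⟨z⟩, which is positive because ⟨z⟩ < 0.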
Applying the equipartition theorem, one gets the horizontal rms size of the cloud:

$x_{rms}^2 = \frac{k_B T}{\kappa_x} = -\frac{\langle z \rangle k_B T}{mg}$ (14)

Without any free adjustable parameter, the agreement with the experimental data is very good, as shown in figure 10b. On the other hand, there is no displacement of the center of mass in the x−y plane, whatever the detuning δ, as long as the balance of the counter-propagating beam intensities is preserved (figure 10a).

4.3 Comparing the temperatures along horizontal and vertical axes

As seen in sections 4.1 and 4.2, gravity has a dominant impact on cooling in a MOT operated on the intercombination line, not only along the vertical axis but also in the horizontal plane. Even so, we expect different behaviors along these directions, essentially because gravity renders the trapping potential anisotropic. This is indeed the case for the spatial distribution (figures 10a and 10b), whereas the temperatures are surprisingly the same (figures 9a and 9b). We now give a few simple arguments to explain this last point physically.

In the semiclassical approximation, the temperature is defined as the ratio of the diffusion term to the friction term:

$k_B T_i = \frac{D^{abs}_i + D^{spo}_i}{\gamma_i}$ with i = x, y, z (15)

D^abs and D^spo correspond to the diffusion coefficients induced by absorption and spontaneous emission events respectively. The friction coefficients have already been derived (equations 5 and 13):

$\gamma_z = 2\gamma_{x,y}$ (16)

Indeed, cooling along an axis in the x−y plane results from the action of two counter-propagating beams, each four times less coupled than the single upward laser beam. The same argument holds for the absorption term of the diffusion coefficient:

$D^{abs}_z = 2 D^{abs}_{x,y}$ (17)

The spontaneous emission contribution to the diffusion coefficient can be derived from the differential cross-section dσ/dΩ of the emitting dipole [20].
With a strong bias magnetic field along the vertical direction, this calculation is particularly simple as e⁺_z is the only quasi-resonant state. Hence

$d\sigma/d\Omega \propto (1 + \cos^2\varphi)$ (18)

where φ is the angle between the vertical axis and the direction of observation. After a straightforward integration, one finds a contribution that is again two times larger along the vertical axis:

$D^{spo}_z = 2 D^{spo}_{x,y}$ (19)

From those considerations, the temperature is expected to be isotropic, as observed experimentally (see figures 9a and 9b).

In the so-called regime (I), the minimum temperature is given by the semiclassical Doppler theory:

$T = N_R \frac{\hbar\Gamma}{2 k_B} \sqrt{s}$ (20)

where N_R is a numerical factor which should be close to two [12]. This solution is represented in figure 9 by a dashed line, which nicely matches the experimental data for s > 8 but with N_R = 1.2. Similar results, i.e. with unexpectedly low N_R values, have been found in [12]. For s ≤ 8 we observe a plateau in the final temperature, slightly higher than the low saturation theoretical prediction [5]. We cannot explain why the temperature does not decrease further, as reported in [12]. For a quantitative comparison with the theory, more detailed studies in a horizontal 1D molasses are required.

4.4 Conclusions

Cooling of Strontium atoms using the intercombination line is an efficient technique to reach the recoil temperature in three dimensions by optical methods. Unfortunately, loading from a thermal beam cannot be done directly with a single frequency laser because of the narrow velocity capture range. We have shown experimentally that more than 50% of the atoms initially in a blue MOT on the dipole-allowed transition are recaptured in the red MOT using a frequency-broadened spectrum. Using a simple model, we conclude that the transfer is limited by the size of the laser beam. If the total power of the beams at 689 nm were higher, transfer rates up to 90% could be expected by tripling our laser beam size.
The final temperature in the broadband regime is found to be as low as 2.5 µK, i.e. only 5 times larger than the photon recoil temperature. The gain in temperature in comparison with the blue MOT (1 to 10 mK) is appreciable. So in the absence of strong requirements on the temperature, broadband cooling is very efficient and reasonably fast (less than 100 ms). The requirements on the frequency noise of the laser are also much less stringent than for single frequency cooling.

Using a subsequent single frequency cooling stage, it is possible to reduce the temperature down to 600 nK, slightly above the photon recoil temperature. Analyzing the large detuning regime, we focus in particular on the comparison between the vertical and horizontal directions. We show how gravity indirectly influences the horizontal parameters of the steady-state MOT, and find that the trapping potential remains harmonic along all directions, but with an anisotropy.

Gravity has a major impact on the MOT as it counterbalances the radiation pressure of the upward laser (making the steady state independent of the detuning). We show that gravity thus affects the final temperature, which remains isotropic despite different cooling dynamics along the vertical and horizontal directions.

5 Acknowledgments

The authors wish to thank J.-C. Bernard and J.-C. Bery for valuable technical assistance. This research is financially supported by the CNRS (Centre National de la Recherche Scientifique) and the former BNM (Bureau National de Métrologie), now LNE (Laboratoire national de métrologie et d'essais), contract N° 03 3 005.

References

1. F. Ruschewitz, J. L. Peng, H. Hinderthür, N. Schaffrath, K. Sengstock, and W. Ertmer, Phys. Rev. Lett. 80, 3173 (1998); G. Ferrari, P. Cancio, R. Drullinger, G. Giusfredi, N. Poli, M. Prevedelli, C. Toninelli, and G. M. Tino, Phys. Rev. Lett. 91, 243002 (2003); M. Yasuda and H. Katori, Phys. Rev. Lett. 92, 153004 (2004); T. Ido, T. H. Loftus, M. M. Boyd, A. D. Ludlow, K. W. Holman, and J. Ye, Phys. Rev. Lett. 94, 153001 (2005); R. Le Targat, X. Baillard, M. Fouché, A. Brusch, O. Tcherbakoff, G. D. Rovera, and P. Lemonde, Phys. Rev. Lett. 97, 130801 (2006).
2. J. Weiner, V. Bagnato, S. Zilio, and P. S. Julienne, Rev. Mod. Phys. 71, 1 (1999); T. Dinneen, K. R. Vogel, E. Arimondo, J. L. Hall, and A. Gallagher, Phys. Rev. A 59, 1216 (1999); A. R. L. Caires, G. D. Telles, M. W. Mancini, L. G. Marcassa, V. S. Bagnato, D. Wilkowski, and R. Kaiser, Braz. J. Phys. 34, 1504 (2004).
3. M. M. Boyd, T. Zelevinsky, A. D. Ludlow, S. M. Forman, T. Ido, and J. Ye, Science 314, 1430 (2006).
4. D. Hayes, P. Julienne, and I. Deutsch, arXiv:quant-ph/0609111.
5. Y. Castin, H. Wallis, and J. Dalibard, J. Opt. Soc. Am. B 6, 2046 (1989).
6. T. Binnewies, G. Wilpers, U. Sterr, F. Riehle, J. Helmcke, T. E. Mehlstäubler, E. M. Rasel, and W. Ertmer, Phys. Rev. Lett. 87, 123002 (2001).
7. H. Katori, T. Ido, Y. Isoya, and M. Kuwata-Gonokami, Phys. Rev. Lett. 82, 1116 (1999).
8. T. H. Loftus, T. Ido, A. D. Ludlow, M. M. Boyd, and J. Ye, Phys. Rev. Lett. 93, 073003 (2004).
9. T. Ido, Y. Isoya, and H. Katori, Phys. Rev. A 61, 061403 (2000).
10. D. W. Sesko, T. G. Walker, and C. E. Wieman, J. Opt. Soc. Am. B 8, 946 (1991).
11. T. Chanelière, J.-L. Meunier, R. Kaiser, C. Miniatura, and D. Wilkowski, J. Opt. Soc. Am. B 22, 1819 (2005).
12. T. H. Loftus, T. Ido, M. M. Boyd, A. D. Ludlow, and J. Ye, Phys. Rev. A 70, 063413 (2004).
13. K. R. Vogel, Ph.D. Thesis, University of Colorado, Boulder, CO 80309 (1999).
14. Y. Bidel, B. Klappauf, J. C. Bernard, D. Delande, G. Labeyrie, C. Miniatura, D. Wilkowski, and R. Kaiser, Phys. Rev. Lett. 88, 203902 (2002).
15. B. Klappauf, Y. Bidel, D. Wilkowski, T. Chanelière, and R. Kaiser, Appl. Opt. 43, 2510 (2004).
16. D. Wilkowski, Y. Bidel, T. Chanelière, R. Kaiser, B. Klappauf, and C. Miniatura, SPIE Proceedings 5866, 298 (2005).
17. N. Poli, G. Ferrari, M. Prevedelli, F. Sorrentino, R. E. Drullinger, and G. M. Tino, Spectro. Acta Part A 63, 981 (2006). 18. H.J. Metcalf, P. van der Straten, Laser cooling and trap- ping, Springer, (1999). 19. C. Dedman, J. Nes, T. Hanna, R. Dall, K. Baldwin, and A. Truscott, Rev. Mod. Phys., 75, 5136 (2004). 20. J.D. Jackson, Classical Electrodynamics (J. Wiley and sons, third edition New York, 1999). Blue MOT Laser Red MOT Laser Red MOT Laser Magnetic field gradient 70 G/cm 1-10 G/cm Du~k vbD 70 ms40 ms 80 ms Fig. 1. Time sequence and cooling stages of Strontium with the dipole-allowed transition and with the intercombination line. 0 5 10 15 20 25 30 35 40 45 Modulation frequency (kHz) Fig. 2. Transfer rate as a function of the modulation frequency. The other parameters are fixed: P = 3mW, δ = −1000 kHz, b = 1G/cm and ∆ν = 1000 kHz http://arxiv.org/abs/quant-ph/0609111 8 T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line 0 500 1000 1500 2000 2500 3000 Frequency deviation (kHz) Fig. 3. Transfer rate as a function of the frequency deviation (squares). The other parameters are fixed: P = 3mW, δ = −1000 kHz, b = 1G/cm and νm = 25 kHz. The dash and solid line correspond to a simple model prediction (see text). The transfer rate is limited by the frequency deviation of the broad laser spectrum for the dash line and by the waist of the MOT beam for the solid line. 0 500 1000 1500 2000 2500 3000 Frequency deviation (kHz) Frequency deviation (kHz) 0 500 1000 1500 2000 2500 3000 (a) (b) Fig. 4. Transfer rate as a function of the frequency deviation (squares). δ = −1500 kHz and δ = −2000 kHz for (a) and (b) respectively, the other parameters and the definitions are the same than for figure 3. 0 500 1000 1500 2000 2500 Detuning (kHz) Fig. 5. Transfer rate as a function of the detuning (squares). The other parameters are fixed: P = 3mW, ∆ν = 1000 kHz, b = 1G/cm and νm = 25 kHz. 
The dashed and solid lines have the same meaning as in figure 3.

Fig. 6. Transfer rate as a function of the magnetic gradient b in G/cm (squares). The other parameters are fixed: P = 3 mW, δ = −1000 kHz, ∆ν = 1000 kHz and νm = 25 kHz. The transfer rate is limited by the waist of the MOT beam for all values. The dotted lines represent the case where the magnetic field gradient does not affect the deceleration.

Fig. 7. Transfer rate as a function of the beam waist (in cm). The solid lines correspond to a high saturation parameter whereas the dashed line corresponds to a constant power of P = 3 mW. The other parameters are fixed: δ = −1000 kHz, ∆ν = 1000 kHz and b = 0.1 G/cm.

Fig. 8. Measured temperature as a function of the detuning δ (in kHz) for an FM spectrum. The other parameters are fixed: P = 3 mW, b = 1 G/cm, ∆ν = 1000 kHz and νm = 25 kHz.

Fig. 9. Measured temperature for single frequency cooling as a function of the detuning in kHz (a) with I = 4Is or I = 15Is, and as a function of the intensity (b) with δ = −100 kHz. The circles (respectively stars) correspond to the temperature along one of the horizontal (respectively vertical) axes. The magnetic field gradient is b = 2.5 G/cm.

Fig. 10. Displacement (a) and rms radius (b) of the cold cloud in single frequency cooling, as functions of the detuning in kHz, along the z axis (stars) and in the x−y plane (circles). The intensity per beam is I = 20Is and the magnetic gradient is b = 2.5 G/cm along the strong axis in the x−y plane. The linear displacement prediction corresponds to the solid line (graph a). In graph b, the solid curve corresponds to the rms radius prediction based on the equipartition theorem.

Fig. 11. (a) Images of the cold cloud in the red MOT. The cloud position for δ = −100 kHz coincides roughly with the center of the MOT whereas it is shifted downward for δ = −1000 kHz. The spatial position of the resonance corresponds to the dotted circle. (b) Sketch representing the large detuning case. The coupling efficiency of the MOT lasers is encoded in the size of the empty arrows. The laser from below has maximum efficiency whereas the one pointing downward is absent because it is too detuned. Along a horizontal axis, the lasers are less coupled because they do not have the correct polarization. The angle α is the angular position of an atom M with respect to O, the center of the MOT.

ABSTRACT The intercombination line of Strontium at 689 nm is successfully used in laser cooling to reach the photon recoil limit with Doppler cooling in a magneto-optical trap (MOT). In this paper we present a systematic study of the loading efficiency of such a MOT. Comparing the experimental results to a simple model allows us to discuss the actual limitation of our apparatus. We also study in detail the final MOT regime, emphasizing the role of gravity on the position, size and temperature along the vertical and horizontal directions. At large laser detuning, one finds an unusual situation where cooling and trapping occur in the presence of a high bias magnetic field. <|endoftext|><|startoftext|> Approximate Selection Rule for Orbital Angular Momentum in Atomic Radiative Transitions

I.B. Khriplovich and D.V. Matvienko
Budker Institute of Nuclear Physics, 630090 Novosibirsk, Russia, and Novosibirsk University

Abstract
We demonstrate that radiative transitions with ∆l = −1 strongly dominate for all values of n and l, except for a small region where l ≪ n.
It is well known that the selection rule for the orbital angular momentum l in electromagnetic dipole transitions, which dominate in atoms, is ∆l = ±1, i.e. in these transitions the angular momentum can either increase or decrease by unity. Meanwhile, the classical radiation of a charge in the Coulomb field is always accompanied by a loss of angular momentum. Thus, at least in the semiclassical limit, the probability of dipole transitions with ∆l = −1 is higher. Here we discuss how strongly, and under exactly what conditions, the transitions with ∆l = −1 dominate in atoms. (To simplify the presentation, here and below we always mean the radiation of a photon, i.e. transitions with ∆n < 0. Obviously, in the case of photon absorption, i.e. for ∆n > 0, the angular momentum predominantly increases.)

The analysis of numerical values for the transition probabilities in hydrogen presented in [1] has demonstrated that even for n and l comparable with unity, i.e. in a nonclassical situation, radiation with ∆l = −1 can be much more probable than that with ∆l = 1. Later, the relation between the probabilities of transitions with ∆l = −1 and ∆l = 1 was investigated in [2] by analyzing the corresponding matrix elements in the semiclassical approximation. The conclusion made therein is also that the transitions with ∆l = −1 dominate, and that the dominance is especially strong when $l > n^{2/3}$.

Here we present a simple solution of the problem using classical electrodynamics and, of course, the correspondence principle. Our results are not restricted to the semiclassical regime. Remarkably enough, they agree, at least qualitatively, with the results of [1], although the latter refer to transitions with |∆n| ∼ n ∼ 1 and l ∼ 1, which are not classical at all.

We start our analysis with a purely classical problem.
Let a particle with mass m and charge −e move in an attractive Coulomb field, created by a charge e, along an ellipse with semi-major axis a and eccentricity ε. It is known [3] that the radiation intensity at a given harmonic ν is here
$$I_\nu = \frac{4e^2\omega_0^4\nu^4a^2}{3c^3}\left[\xi_\nu^2 + \eta_\nu^2\right]; \qquad (1)$$
$$\xi_\nu = \frac{1}{\nu}\,J'_\nu(\nu\varepsilon), \qquad \eta_\nu = \frac{\sqrt{1-\varepsilon^2}}{\nu\varepsilon}\,J_\nu(\nu\varepsilon). \qquad (2)$$
In expressions (2), $J_\nu(\nu\varepsilon)$ is the Bessel function, and $J'_\nu(\nu\varepsilon)$ is its derivative. We use the Fourier transformation in the following form:
$$x(t) = a\sum_{\nu=-\infty}^{\infty}\xi_\nu e^{i\nu\omega_0 t} = 2a\sum_{\nu=1}^{\infty}\xi_\nu\cos\nu\omega_0 t,$$
$$y(t) = -ia\sum_{\nu=-\infty}^{\infty}\eta_\nu e^{i\nu\omega_0 t} = 2a\sum_{\nu=1}^{\infty}\eta_\nu\sin\nu\omega_0 t,$$
where all dimensionless Fourier components $\xi_\nu$ and $\eta_\nu$ are real, and $\xi_{-\nu} = \xi_\nu$, $\eta_{-\nu} = -\eta_\nu$. We note that the Cartesian coordinates x and y are related here to the polar coordinates r and φ as follows: x = r cos φ, y = r sin φ, where φ increases with time. Thus, the angular momentum is directed along the z axis (but not in the opposite direction). We note also that, since 0 ≤ ε ≤ 1, both $J_\nu(\nu\varepsilon)$ and $J'_\nu(\nu\varepsilon)$ are reasonably well approximated by the first term of their series expansion in the argument. Therefore, all the Fourier components $\xi_\nu$ and $\eta_\nu$ are positive.

In the quantum problem (where ν = |∆n|), the probability of transition per unit time is
$$W_\nu = \frac{I_\nu}{\hbar\omega_0\nu} = \frac{4e^2\omega_0^3\nu^3a^2}{3c^3\hbar}\left[\xi_\nu^2 + \eta_\nu^2\right], \qquad \omega_0 = \frac{me^4}{\hbar^3n^3}. \qquad (3)$$
Now, the loss of angular momentum with radiation is [3]
$$\dot{\mathbf{M}} = \frac{2e^2}{3c^3}\,\overline{\mathbf{r}\times\dddot{\mathbf{r}}}.$$
Going over here to the Fourier components, we obtain
$$\dot{\mathbf{M}}_\nu = -\frac{4e^2\omega_0^2\nu^2}{3c^3}\,\mathbf{r}^{*}_\nu\times\dot{\mathbf{r}}_\nu,$$
or (with our choice of the direction of coordinate axes, and with the angular momentum measured in units of ħ)
$$\dot{M}_\nu = -\frac{4e^2\omega_0^3\nu^3a^2}{3c^3\hbar}\,2\xi_\nu\eta_\nu. \qquad (4)$$
Obviously, the last expression is nothing but the difference between the probabilities per unit time of transitions with ∆l = 1 and ∆l = −1:
$$\dot{M}_\nu = W^+_\nu - W^-_\nu. \qquad (5)$$
Of course, the total probability (3) can be written as
$$W_\nu = W^+_\nu + W^-_\nu. \qquad (6)$$
From explicit expressions (3) and (4) it is clear that the inequality $W^+_\nu \ll W^-_\nu$ holds if $2\xi_\nu\eta_\nu \approx \xi_\nu^2 + \eta_\nu^2$, or $\eta_\nu \approx \xi_\nu$. The last relation is valid for ε ≪ 1, i.e. for orbits close to circular ones.
(The simplest way to check this is to use in formulae (2) the explicit expression for the Bessel function at small argument: $J_\nu(\nu\varepsilon) = (\nu\varepsilon)^\nu/(2^\nu\,\nu!)$.) This conclusion looks quite natural from the quantum point of view. Indeed, it is the state with the orbital quantum number l equal to n − 1 (i.e. with the maximum possible value for a given n) which corresponds to the circular orbit. As a result of radiation n decreases, and therefore l should decrease as well.

The surprising fact, however, is that the probabilities $W^-_\nu$ of transitions with ∆l = −1 dominate numerically everywhere, except in a small vicinity of the maximum possible eccentricity ε = 1. For instance, if ε ≃ 0.9 (which is much closer to 1 than to 0!), then at ν = 1 the discussed probability ratio is very large:
$$\frac{W^-_\nu}{W^+_\nu} = \left(\frac{\xi_\nu + \eta_\nu}{\xi_\nu - \eta_\nu}\right)^2 \simeq 12. \qquad (7)$$
The change with ε of the ratio of $W^+_\nu$ to $W^-_\nu$ for two values of ν is illustrated in Fig. 1. The curves therein demonstrate in particular that with the increase of ν the region where $W^-_\nu$ and $W^+_\nu$ are comparable gets more and more narrow, i.e. as ν grows, the corresponding curves tend more and more to a right angle.

Fig. 1. [Plot; horizontal axis: ε, from 0.2 to 1.0.]

Let us go over now to the quantum problem. In the semiclassical limit, the classical expression for the eccentricity is rewritten with the usual relations $E = -me^4/(2\hbar^2n^2)$ and $M = \hbar l$ as
$$\varepsilon = \sqrt{1 - \frac{l^2}{n^2}}. \qquad (8)$$
In fact, the exact expression for ε, valid for arbitrary l and n, is [3]:
$$\varepsilon = \sqrt{1 - \frac{l(l+1)+1}{n^2}}. \qquad (9)$$
Clearly, in the semiclassical approximation the eccentricity is close to unity only under the condition l ≪ n. If this condition does not hold, one may expect that in the semiclassical limit the transitions with ∆l = −1 dominate. In other words, as long as l ≪ n, the probabilities of transitions with decrease and increase of the angular momentum are comparable. But if the angular momentum is not small, it is predominantly lost in radiation. This situation looks quite natural.
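The numbers quoted above are easy to check numerically. The following is a minimal sketch, not part of the original analysis: it evaluates $\xi_\nu$ and $\eta_\nu$ from expressions (2), forms the ratio $W^-_\nu/W^+_\nu = [(\xi_\nu+\eta_\nu)/(\xi_\nu-\eta_\nu)]^2$ that follows from (3)-(6), and evaluates the eccentricities (8) and (9). The function names and the truncated power series used for the Bessel function are choices made here for illustration only.

```python
import math


def bessel_j(nu, x, terms=40):
    """Bessel function J_nu(x) for integer nu >= 0, summed from its power series."""
    return sum(
        (-1) ** k * (x / 2) ** (2 * k + nu)
        / (math.factorial(k) * math.factorial(k + nu))
        for k in range(terms)
    )


def bessel_j_prime(nu, x):
    """Derivative J'_nu(x) = (J_{nu-1}(x) - J_{nu+1}(x)) / 2, valid for nu >= 1."""
    return 0.5 * (bessel_j(nu - 1, x) - bessel_j(nu + 1, x))


def fourier_components(nu, eps):
    """Dimensionless Fourier components (xi_nu, eta_nu) of expressions (2), 0 < eps < 1."""
    xi = bessel_j_prime(nu, nu * eps) / nu
    eta = math.sqrt(1.0 - eps ** 2) / (nu * eps) * bessel_j(nu, nu * eps)
    return xi, eta


def ratio_minus_to_plus(nu, eps):
    """Classical estimate of W^-_nu / W^+_nu = ((xi + eta) / (xi - eta))^2."""
    xi, eta = fourier_components(nu, eps)
    return ((xi + eta) / (xi - eta)) ** 2


def eccentricity_semiclassical(n, l):
    """Expression (8): eps = sqrt(1 - l^2 / n^2)."""
    return math.sqrt(1.0 - l ** 2 / n ** 2)


def eccentricity_exact(n, l):
    """Expression (9): eps = sqrt(1 - (l(l+1) + 1) / n^2)."""
    return math.sqrt(1.0 - (l * (l + 1) + 1) / n ** 2)


if __name__ == "__main__":
    # eps = 0.9, nu = 1: the ratio should come out close to 12.
    print("W-/W+ at eps=0.9, nu=1:", ratio_minus_to_plus(1, 0.9))
    # Example of (9) with l = 1 and a non-integer (averaged) n = 3.5.
    eps_bar = eccentricity_exact(3.5, 1)
    print("eps from (9), l=1, n=3.5:", eps_bar)
    print("W-/W+ at that eps, nu=1:", ratio_minus_to_plus(1, eps_bar))
```

The ratio diverges as ε → 0 (where $\xi_\nu = \eta_\nu$) and tends to 1 as ε → 1 (where $\eta_\nu$ vanishes), which is the behaviour described in the text.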
The next point is that with the increase of |∆n| = ν, the region where $W^-_\nu$ and $W^+_\nu$ are comparable gets more and more narrow, in agreement with the observation made in [...]. However, we do not see any hint of some special role (advocated in [2]) of the condition $l > n^{2/3}$ for the dominance of transitions with ∆l = −1.

As mentioned already, the analysis of the numerical values of transition probabilities [1] demonstrates that even for n and l comparable with unity and |∆n| ≃ n, i.e. in the absolutely nonclassical regime, the transitions with ∆l = −1 are still much more probable than those with ∆l = 1. The results of this analysis for the ratio $W^-/W^+$ in some transitions are presented in Table 3.1 (first line). Then we indicate in Table 3.1 (last line) the values of these ratios obtained in the naive (semi)classical approximation. Here for the eccentricity ε̄ we use the value of expression (9), calculated with l corresponding to the initial state; as to n, we take the average of its values in the initial and final states. The table starts with the smallest possible quantum numbers for which transitions differing in the sign of ∆l occur, i.e. with the ratio W4p→3s/W4p→3d.

Table 3.1
                      W4p→3s/W4p→3d  W5p→4s/W5p→4d  W5d→4p/W5d→4f  W6f→5d/W6f→5g  W5p→3s/W5p→3d  W6p→3s/W6p→3d
exact value           10             3.75           28             72             10.67          13.7
ε̄                     0.87           0.92           0.81           0.75           0.90           0.92
ν = |∆n|              1              1              1              1              2              3
semiclassical value   17.6           8.7            34             58             17.2           15.7

This table demonstrates that the ratio of the classical results to the exact quantum-mechanical ones remains everywhere within a factor of about two. In fact, if one uses as ε̄ expression (8), calculated in the analogous way, the numbers in the last line change considerably. It is clear, however, that the classical approximation describes here, at least qualitatively, the real situation.

References
[1] H.A. Bethe and E.E. Salpeter, Quantum Mechanics of One- and Two-Electron Atoms, Springer, 1957; §63.
[2] N.B. Delone and V.P. Krainov, FIAN Preprint No. 18, 1979; J. Phys.
B 27, (1994) 4403.
[3] L.D. Landau and E.M. Lifshitz, The Classical Theory of Fields, Nauka, 1973; §70, problem 2 to §72.
[4] L.D. Landau and E.M. Lifshitz, Quantum Mechanics, Nauka, 1974; §36.
<|endoftext|><|startoftext|> Introduction
Being a ‘close relative’ of General Relativity (GR), Absolute Parallelism (AP) has many interesting features: a larger symmetry group of equations; field irreducibility with respect to this group; a vast list of compatible second-order equations (discovered by Einstein and Mayer [1]) not restricted to Lagrangian ones. There is only one variant of Absolute Parallelism whose solutions are free of arising singularities, if D=5 (there is no room for changes; this variant of AP does not have a Lagrangian, nor does it match GR); in this case AP has the topological features of a nonlinear sigma-model.

In order to give a clear presentation and a full picture of the theory's scope, many items should be sketched: instability of the trivial solution and expanding O4-symmetrical ones; the tensor $T_{\mu\nu}$ (positive energy, but only three polarizations of 15 carry momentum and angular momentum; how to quantize such a stuff?) and PN-effects; topological classification of symmetric 5D field configurations (alighting on evident parallels with the Standard Model's particle combinatorics) and ‘quantum phenomenology on expanding classical background’ (coexistence); ‘plain’ $R^2$-gravity on a very thick brane and the change in Newton's Law: $1/r^2$ goes to $1/r$ with distance (not with acceleration, as it is in MOND [2]). Finally, an experiment with single-photon interference is discussed as another way to observe the very, very long (and very undeveloped) extra dimension.
2 Unique 5D equation of AP (free of singularities in solutions)
There is one unique variant of AP (non-Lagrangian, with the unique D; D=5) whose solutions of general position seem to be free of arising singularities. The formal integrability test [3] can be extended to the cases of degeneration of either the co-frame matrix, $h^a{}_\mu$, (co-singularities) or the contravariant frame (or contra-frame density of some weight), serving as the local and covariant (no coordinate choice) test for singularities of solutions. In AP this test singles out the next equation (and D=5, see [4]; $\eta_{ab} = \mathrm{diag}(-1, 1, \ldots, 1)$, then $h = \det h^a{}_\mu = \sqrt{-g}$):
$$E_{a\mu} = L_{a\mu\nu;\nu} - \tfrac{1}{3}\,(f_{a\mu} + L_{a\mu\nu}\Phi_\nu) = 0, \qquad (1)$$
where (see [4] for a more detailed introduction to AP and an explanation of the notations used)
$$L_{a\mu\nu} = L_{a[\mu\nu]} = \Lambda_{a\mu\nu} - S_{a\mu\nu} - \tfrac{2}{3}\,h_{a[\mu}\Phi_{\nu]}, \quad \Lambda_{a\mu\nu} = 2h_{a[\mu,\nu]}, \quad S_{\mu\nu\lambda} = 3\Lambda_{[\mu\nu\lambda]}, \quad \Phi_\mu = \Lambda_{aa\mu}, \quad f_{\mu\nu} = 2\Phi_{[\mu,\nu]} = 2\Phi_{[\mu;\nu]}. \qquad (2)$$
Comma "," and semicolon ";" denote the partial derivative and the usual covariant differentiation with the symmetric Levi-Civita connection, respectively. One should retain the identities [which follow from the definitions (2)]:
$$\Lambda_{a[\mu\nu;\lambda]} \equiv 0, \qquad h_{a\lambda}\Lambda_{abc;\lambda} \equiv f_{cb}\ (= f_{\mu\nu}h^\mu_c h^\nu_b), \qquad f_{[\mu\nu;\lambda]} \equiv 0. \qquad (3)$$
The equation $E_{a\mu;\mu} = 0$ gives a ‘Maxwell-like equation’ (we prefer to omit $g^{\mu\nu}$ ($\eta^{ab}$) in contractions so as not to keep redundant information, when only covariant differentiation is in use):
$$(f_{a\mu} + L_{a\mu\nu}\Phi_\nu)_{;\mu} = 0, \quad \text{or} \quad f_{\mu\nu;\nu} = (S_{\mu\nu\lambda}\Phi_\lambda)_{;\nu}\ \left(= -\tfrac{1}{2}S_{\mu\nu\lambda}f_{\nu\lambda},\ \text{see below}\right). \qquad (4)$$
Actually Eq. (4) follows from the symmetric part of the equation, $E_{(ab)}$, because the skew-symmetric one gives just the identity: $2E_{[\nu\mu]} = S_{\mu\nu\lambda;\lambda} = 0$, $E_{[\mu\nu];\nu} \equiv 0$; note also that the trace part becomes irregular (the principal derivatives vanish) if D = 4 (this number of dimensions is forbidden, and the next number, D = 5, is the most preferable):
$$E_{\mu\mu} = E_{a\mu}h_a^\mu = \frac{4-D}{3}\,\Phi_{\mu;\mu} + (\Lambda^2) = 0.$$

∗E-mail: zhogin at inp.nsk.su; http://zhogin.narod.ru
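The coefficient of $\Phi_{\mu;\mu}$ in the trace can be checked directly from the definitions (2). The short computation below is a sketch added for illustration; it uses only the total antisymmetry of $S$, the antisymmetry of $f$, and $h_{a\mu}h_{a\mu} = D$.

```latex
\begin{align*}
L_{a\mu\nu}h_{a\mu}
  &= \Lambda_{aa\nu} - S_{aa\nu}
     - \tfrac{2}{3}\cdot\tfrac{1}{2}\,\bigl(h_{a\mu}\Phi_\nu - h_{a\nu}\Phi_\mu\bigr)h_{a\mu} \\
  &= \Phi_\nu - 0 - \tfrac{1}{3}\,(D-1)\,\Phi_\nu
   = \frac{4-D}{3}\,\Phi_\nu , \\[4pt]
E_{a\mu}h_{a\mu}
  &= \bigl(L_{a\mu\nu}h_{a\mu}\bigr)_{;\nu} - L_{a\mu\nu}h_{a\mu;\nu}
     - \tfrac{1}{3}\bigl(f_{\mu\mu} + L_{a\mu\nu}h_{a\mu}\Phi_\nu\bigr) \\
  &= \frac{4-D}{3}\,\Phi_{\nu;\nu} + (\Lambda^2),
\end{align*}
```

since $f_{\mu\mu} = 0$ and $h_{a\mu;\nu}$ is itself linear in $\Lambda$, so every remaining term is quadratic. For D = 4 the only second-derivative term drops out, which is the irregularity mentioned above.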
The system (1) remains compatible under adding $f_{\mu\nu} = 0$, see (4); this is not the case for the other covariants, S, Φ, or (some irreducible part of) the Riemannian curvature, which relates to Λ as usual:
$$R_{a\mu\nu\lambda} = 2h_{a\mu;[\nu;\lambda]}; \qquad h_{a\mu}h_{a\nu;\lambda} = S_{\mu\nu\lambda} - \Lambda_{\lambda\mu\nu}.$$

3 Tensor $T_{\mu\nu}$ (despite Lagrangian absence) and PN-effects
One might rearrange $E_{(\mu\nu)} = 0$ so as to pick out (into the LHS) the Einstein tensor, $G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R$, but the remaining terms are not a proper energy-momentum tensor: they contain linear terms $\Phi_{(\mu;\nu)}$ (no positive energy (!); another presentation of the ‘Maxwell equation’ (4) is possible instead, as the divergence of a symmetrical tensor). However, the prolonged equation $E_{(\mu\nu);\lambda;\lambda} = 0$ can be written as ‘plain’ (no R-term) $R^2$-gravity:
$$\left(-h^{-1}\,\delta(hR_{\mu\nu}G_{\mu\nu})/\delta g^{\mu\nu} =\right)\ G_{\mu\nu;\lambda;\lambda} + G_{\epsilon\tau}\left(2R_{\epsilon\mu\tau\nu} - \tfrac{1}{2}g_{\mu\nu}R_{\epsilon\tau}\right) = T_{\mu\nu}(\Lambda'^2, \ldots), \qquad T_{\mu\nu;\nu} = 0; \qquad (5)$$
up to quadratic terms,
$$T_{\mu\nu} = \tfrac{2}{9}\left(\tfrac{1}{4}g_{\mu\nu}f^2 - f_{\mu\lambda}f_{\nu\lambda}\right) + A_{\mu\epsilon\nu\tau}(\Lambda^2)_{;(\epsilon;\tau)} + (\Lambda^2\Lambda', \Lambda^4);$$
the tensor A has the symmetries of the Riemann tensor, so the term A′′ adds nothing to momentum and angular momentum. It is worth noting that:
(a) the theory does not match GR, but shows ‘plain’ $R^2$-gravity (sure, (5) does not contain all the theory);
(b) only the f-component (three transverse polarizations in D=5) carries D-momentum and angular momentum (‘powerful’ waves); the other 12 polarizations are ‘powerless’, or ‘weightless’ (this is a very unusual feature, impossible in the Lagrangian tradition; how to quantize?
let us not try this, leaving the theory ‘as is’);
(c) the f-component feels only the metric and the S-field (‘contorsion’, not ‘torsion’ Λ, to label them somehow), see (4), but S has an effect only on the polarization of f: $S_{[\mu\nu\lambda]}$ does not enter the eikonal equation, and f moves along the usual Riemannian geodesic (if the background has f=0); one may think that all ‘quantum fields’ (phenomenological quantized fields accounting for topological (quasi)charges and carrying some ‘power’; see further) inherit this property;
(d) the trace $T_{\mu\mu} \propto f_{\mu\nu}f_{\mu\nu}$ can be non-zero if $f^2 \neq 0$, and this seemingly depends on the S-component [which enters the current in (4)]; in other words, the ‘mass distribution’ is to depend on the distribution of the f- and S-components;
(e) it should be stressed that the f-component is not the usual (quantum) EM-field, just an important covariant responsible for energy-momentum (suffice it to say that there is no gauge invariance for f).

4 Linear domain: instability of trivial solution (with powerless waves)
Another strange feature is the instability of the trivial solution: some ‘powerless’ polarizations grow linearly with time in the presence of ‘powerful’ f-polarizations. Indeed, from the linearized Eq. (1) and the identity (3) one can write (the following equations should be understood as linearized):
$$\Phi_{a,a} = 0\ (D \neq 4), \qquad 3\Lambda_{abd,d} = \Phi_{a,b} - 2\Phi_{b,a}, \qquad \Lambda_{a[bc,d],d} \equiv 0 \ \Rightarrow\ 3\Lambda_{abc,dd} = -2f_{bc,a}.$$
The last ‘D'Alembert equation’ has the ‘source’ on its right-hand side. Some components of Λ (the most symmetrical irreducible parts) do not grow (as well as the curvature), because (again, linearized equations are implied below)
$$S_{abc,dd} = 0, \qquad \Phi_{a,dd} = 0, \qquad f_{ab,dd} = 0, \qquad R_{abcd,ee} = 0,$$
but the least symmetrical components of the tensor Λ do grow with time (due to terms $\sim t\,e^{-i\omega t}$; three growing polarizations which are ‘imponderable’, or powerless) if the ‘ponderable’ waves (three f-polarizations) do not vanish (and this should be the case for solutions of ‘general position’).
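The origin of the terms $\sim t\,e^{-i\omega t}$ can be illustrated by a minimal sketch (an illustration added here under simplifying assumptions, not the full tensor computation): a wave equation whose source itself satisfies the free wave equation is driven exactly at resonance. Take a single plane wave with $\omega = |\mathbf{k}|$, let $\lambda$ stand for one growing component of Λ, and let the constant $s$ stand for the corresponding amplitude of $-\tfrac{2}{3}f_{bc,a}$; with $\eta_{ab} = \mathrm{diag}(-1, 1, \ldots, 1)$ one has $\lambda_{,dd} = (-\partial_t^2 + \nabla^2)\lambda$.

```latex
\begin{align*}
(-\partial_t^2 + \nabla^2)\,\lambda &= s\,e^{i(\mathbf{k}\mathbf{x} - \omega t)},
  \qquad \omega = |\mathbf{k}|, \\
\lambda = C\,t\,e^{i(\mathbf{k}\mathbf{x} - \omega t)}
  \ \Rightarrow\
(-\partial_t^2 + \nabla^2)\,\lambda
  &= \bigl[\,2i\omega C + (\omega^2 - \mathbf{k}^2)\,C\,t\,\bigr]\,
     e^{i(\mathbf{k}\mathbf{x} - \omega t)}
   = 2i\omega C\,e^{i(\mathbf{k}\mathbf{x} - \omega t)}, \\
C &= \frac{s}{2i\omega}.
\end{align*}
```

So a particular solution grows linearly in time with amplitude $|s|\,t/(2\omega)$, whereas any component obeying the homogeneous equation (S, Φ, f, or the curvature) stays bounded.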
5 Expanding O4-symmetrical (single wave) solutions and cosmology
The unique symmetry of the AP equations gives scope for symmetrical solutions. In contrast to GR, this variant of AP has non-stationary spherically (O4-) symmetric solutions. The O4-symmetric frame field can be generally written as follows [4]:
$$h^a{}_\mu(t, x^i) = \begin{pmatrix} a & b\,n_i \\ c\,n_i & e\,n_in_j + d\,\Delta_{ij} \end{pmatrix}; \qquad i, j = (1, 2, 3, 4), \qquad n_i = \frac{x^i}{r}. \qquad (6)$$
Here a, ..., e are functions of time, $t = x^0$, and radius r, $\Delta_{ij} = \delta_{ij} - n_in_j$, $r^2 = x^ix^i$. As functions of radius, b and c are odd, while the others are even; other boundary conditions: e = d at r = 0, and $h^a{}_\mu \to \delta^a{}_\mu$ as r → ∞. Setting b = 0, e = d in (6) (the other interesting choice is b = c = 0) and performing the integrations, one can arrive at the next system (resembling the dynamics of a Chaplygin gas; dot and prime denote differentiation with respect to time and radius, respectively; $A = a/e = e^{1/2}$, $B = -c/e$):
$$\dot{A} = AB' - BA' + 3AB/r, \qquad \dot{B} = AA' - BB' - 2B^2/r. \qquad (7)$$
This system does not suffer from gradient catastrophe and has non-stationary solutions; a single-wave solution of proper ‘amplitude’ might serve as a suitable cosmological (expanding) background. The condition $f_{\mu\nu} = 0$ is a must for solutions with such a high symmetry (as well as $S_{\mu\nu\lambda} = 0$); so, these O4-solutions carry no energy, that is, weigh nothing (some lack of gravity! In this theory the expansion of the universe seemingly has little in common with gravity, GR and its dark energy [5]).

A more realistic cosmological model might look like a single O4-wave (or a sequence of such waves) moving along the radius and being filled with chaos, or stochastic waves, both powerful (weak, ∆h ≪ 1) and powerless (∆h < 1, but intense enough to lead to non-linear fluctuations with ∆h ∼ 1), which form statistical ensemble(s) having a few characteristic parameters (after ‘thermalization’). The development and examination of the stability of such a model is an interesting problem. The metric variation in a cosmological O4-wave can serve as a time-dependent ‘shallow dielectric guide’ for those weak noise waves.
The ponderable waves (which slightly ‘decelerate’ the O4-wave) should have wave-vectors almost tangent to the $S^3$-sphere of the wave-front in order to be trapped inside this (‘shallow’) wave-guide; the imponderable waves can grow, and partly escape from the wave-guide, and their wave-vectors can be less tangent to the $S^3$-sphere. The waveguide thickness can be small for an observer in the center of O4-symmetry, but in co-moving coordinates it can be very large (due to a relativistic effect), though still small with respect to the radius of the sphere, L ≪ R. It seems that the radial dimension has to be very ‘undeveloped’; that is, there are no other characteristic scales, smaller than L, along this extra dimension.

6 Non-linear domain: topological charges and quasi-charges
Let the AP-space be of trivial topology: no worm-holes, no compactified space dimensions, no singularities. One can continuously deform the frame field h(x) to a field of rotation matrices (the metric can be diagonalized and ‘square-rooted’), $h^a{}_\mu(x) \to s^a{}_\mu(x) \in SO(1, m)$; m = D−1. Further deformation can remove boosts too, and so, for any space-like (Cauchy) surface, this gives a (pointed) map,
$$s: \mathbb{R}^m \cup \infty = S^m \to SO_m; \qquad \infty \mapsto 1_m \in SO_m.$$
The set of such maps consists of homotopy classes forming the group of topological charge, Π(m):
$$\Pi(m) = \pi_m(SO_m); \qquad \Pi(3) = Z, \quad \Pi(4) = Z_2 + Z_2. \qquad (8)$$
Here Z is the infinite cyclic group, and $Z_2$ is the cyclic group of order two. It is important that the deformation to the s-field can keep the symmetry of a field configuration. Definition: a localized field (pointed map) $s(x): \mathbb{R}^m \to SO(m)$, $s(\infty) = 1_m$, is G-symmetric if, in some coordinates,
$$s(\sigma x) = \sigma s(x)\sigma^{-1} \quad \forall\ \sigma \in G \subset O(m). \qquad (9)$$
The set of such fields, $C^{(m)}_G$, generally consists of separate, disconnected components: homotopy classes forming the ‘topological quasi-charge group’ denoted here as $\Pi(G; m) \equiv \pi_0(C^{(m)}_G)$. These QC-groups classify symmetrical localized configurations of the frame field.
Since the field equation does not break symmetry, quasi-charge is conserved; if the symmetry is not exact (because of distant regions), quasi-charge is not an exactly conserved quantity, and a quasi-particle (of zero topological charge) can annihilate (or be created) in a collision with another quasi-particle.

The other problem. Let $G_1 \supset G_2$, such that there is a mapping (embedding) $i: C^{(m)}_{G_1} \to C^{(m)}_{G_2}$ which induces the homomorphism of QC-groups, $i_*: \Pi(G_1; m) \to \Pi(G_2; m)$; so one has to describe this morphism.

Let us consider the simple (discrete) symmetry group $P_1$ with a plane of reflection symmetry: $P_1 = \{1, p_{(1)}\}$, where $p_{(1)} = \mathrm{diag}(-1, 1, \ldots, 1) = p_{(1)}^{-1}$. It is necessary to set the field s(x) on the half-space $\tfrac{1}{2}\mathbb{R}^m = \{x_1 \ge 0\}$, with an additional condition imposed on the surface $\mathbb{R}^{m-1} = \{x_1 = 0\}$ (the stationary points of the $P_1$ group), where s has to commute with the symmetry [see (9)]:
$$p_{(1)}x = x \ \Rightarrow\ s(x) = p_{(1)}\,s\,p_{(1)} \ \Rightarrow\ s \in 1 \times SO_{m-1}.$$
Hence, accounting for the localization requirement, we have a diad map (relative spheroid; here $D^m$ is an m-ball and $S^{m-1}$ its surface) $(D^m; S^{m-1}) \to (SO_m; SO_{m-1})$, and topological classification of such maps leads to the relative (or diad) homotopy group ([6]; the last equality below follows from the fibration $SO_m/SO_{m-1} = S^{m-1}$):
$$\Pi(P_1; m) = \pi_m(SO_m; SO_{m-1}) = \pi_m(S^{m-1}).$$
Similar considerations (of group orbits and stationary points) lead to the following result:
$$\Pi(O_l; m) = \pi_{m-l+1}(SO_{m-l+1}; SO_{m-l}) = \pi_{m-l+1}(S^{m-l}).$$
If l > 3, there is the equality $\Pi(SO_l; m) = \Pi(O_l; m)$, while for l = 2, 3 one can find:
$$\Pi(SO_3; m) = \pi_{m-2}(SO_2 \times SO_{m-2}; SO_{m-3}) = \pi_{m-2}(S^1 \times S^{m-3}),$$
$$\Pi(SO_2; m) = \pi_{m-1}(SO_m; SO_{m-2} \times SO_2) = \pi_{m-1}(RG_+(m, 2)).$$
The set of quaternions with absolute value one, $H_1 = \{f, |f| = 1\}$, forms a group under quaternion multiplication, $H_1 \cong SU_2 = S^3$, and any $s \in SO_4$ can be represented as a pair of such quaternions [6], $(f, g) \in S^3_{(l)} \times S^3_{(r)}$, |f| = |g| = 1:
$$x^* = sx \ \Leftrightarrow\ x^* = f\,x\,g^{-1} = f\,x\,\bar{g}; \qquad |x| = |x^*|.$$
The pairs (f, g) and (−f, −g) correspond to the same rotation s, that is, $SO_4 = S^3_{(l)} \times S^3_{(r)}/\pm$.
Note that the symmetry condition (9) also splits into two parts:
$$f(axb^{-1}) = a\,f(x)\,a^{-1}, \qquad g(axb^{-1}) = b\,g(x)\,b^{-1} \qquad \forall\,(a, b) \in G \subset SO_4. \qquad (10)$$

7 Example of SO2-symmetric quaternion field
Let us consider an example of an SO2{2, 3}-symmetric f-field configuration (g = 1), which carries both charge and SO2-quasi-charge (left, of course), $f(x): H = \mathbb{R}^4 \to H_1$; f(∞) = 1. The symmetry condition (10) reads
$$f(e^{i\phi/2}\,x\,e^{-i\phi/2}) = e^{i\phi/2}\,f(x)\,e^{-i\phi/2}. \qquad (11)$$
We switch to ‘double-axial’ coordinates: $x = a\,e^{i\varphi} + b\,e^{i\psi}j$. Let us use imaginary quaternions q as stereographic coordinates on $H_1$, and take a symmetrical field q(x) consistent with Eq. (11):
$$q(x) = x\,i\,\bar{x} + i = -\bar{q}, \qquad f(x) = -\frac{1+q}{1-q}. \qquad (12)$$
It is easy to find the ‘center of quasi-soliton’ (a 1-submanifold, $S^1$),
$$S^1 = f^{-1}(-1) = q^{-1}(0) = \{a = 0, b = 1\} = \{x_0(\psi) = e^{i\psi}j\},$$
and the ‘vector equipment’ on this circle:
$$dx|_{x_0} = da\,e^{i\varphi} + (db + i\,d\psi)\,e^{i\psi}j, \qquad \tfrac{1}{4}\,df = i\,db - k\,e^{i(\varphi+\psi)}\,da;$$
the i-vector always points along the radius b (parallel translation along the circle $S^1$; this is a ‘trivial’, or ‘flavor’-vector). The two others (‘phase’-vectors) make a 2π-rotation along the circle. In fact, the field (12) also has the symmetry SO2{1, 4}, and this feature restricts the possible directions of the ‘flavor’-vector (two ‘flavors’ are possible, ±; the P2{1, 4}-symmetry (this is the π-rotation of $x_1$, $x_4$) gives the same effect). The other interesting observation is that the equipped circle can also be located at the stationary points of the SO2-symmetry (this increases the number of ‘flavors’).

8 Quasi-charges and their morphisms (in 5D, i.e. m = 4)
If $G \subset SO_4$, the QC-group has two isomorphic parts, left and right: $\Pi(G) = \Pi_{(l)}(G) + \Pi_{(r)}(G)$. The Table below describes quasi-charge groups for $G \subset G_0 = (O_3 \times P_4) \cap SO_4$ ($P_4$ is spatial inversion; the 4th coordinate is the extra dimension of the $G_0$-symmetric expanding cosmological background).

Table. QC-groups $\Pi_{(l)}(G)$ and their morphisms to the preceding group; $G \subset G_0$.
G                                Π_l(G) → Π_l(G*)                            ‘label’
SO{1, 2}                         Z(e) $\xrightarrow{e}$ Z_2                  e
SO{1, 2} × P{3, 4}               Z(ν) + Z(H) $\xrightarrow{i,\,m_2}$ Z(e)    ν0; H0 → e + e
SO{1, 2} × P{2, 3}               Z(W) $\xrightarrow{0}$ Z(e)                 W → e + ν0
SO{1, 2} × P{2, 4}               Z(Z) $\xrightarrow{0}$ Z(e)                 Z0 → e + e
SO{1, 2} × P{3, 4} × P{2, 3}     Z(γ) $\xrightarrow{0}$ Z(H)                 γ0 → H0 + H0
                                 Z(γ) $\xrightarrow{0}$ Z(W)                 γ0 → W + W

‘Quasi-particles’ whose symmetry includes $P_4$ seem to be truly neutral (neutrinos, Higgs particles, photon). One can assume further that a hadron bag is a specific place where $G_0$-symmetry does not work, and the bag's symmetry is isomorphic to $O_4$. This assumption can lead to another classification of quasi-solitons (somewhat doubling the above scheme), where self-dual and anti-self-dual one-parameter groups take the place of the SO2-group. The total set of quasi-particle parameters (parameters of the equipped 1-manifold (loop) plus parameters of the group) for (anti)self-dual groups, $G(4, 2) \times RP^2$, is larger than the analogous set for groups $SO_2 \subset G_0$, which is just $O_3 \times G(3, 1) = RP^2$. If the number of ‘flavor’-parameters (which are not degenerate and have some preferable particular values; this should be sensitive to the discrete part of G, at least photons have the same flavor) is the same as in the case of ‘white’ quasi-particles, the remaining parameters (degenerate, or ‘phase’) can give room for ‘color’ (in addition to spin). So, perhaps one might think about ‘color neutrinos’ (in the context of the pomeron and the baryon spin puzzle), ‘color W, Z, and Higgs’ (another context: B-mesons), and so on. Note that in this picture the very notion of a quasi-particle depends on the background symmetry (note also that there are no ‘quanta of torsion’ per se). On the other hand, large clusters of quasi-particles (matter) can disturb the background, and waves of such small disturbances (with wavelength larger than the thickness L, perhaps) can be generated as well (but these waves do not carry (quasi)charges, that is, are not quantized).
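Returning to the quaternion example of Section 7, the stated properties of the field (12) can be checked numerically. The sketch below is an illustration added here, not the author's code. It assumes the reading $q(x) = x\,i\,\bar{x} + i$, $f(x) = -(1+q)(1-q)^{-1}$ of Eq. (12), represents quaternions as 4-tuples (w, x, y, z) with the Hamilton product, and uses helper names that are purely hypothetical.

```python
import math
import random


def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qconj(p):
    return (p[0], -p[1], -p[2], -p[3])


def qadd(p, q):
    return tuple(u + v for u, v in zip(p, q))


def qscale(s, p):
    return tuple(s * u for u in p)


def qnorm(p):
    return math.sqrt(sum(u * u for u in p))


def qinv(p):
    return qscale(1.0 / sum(u * u for u in p), qconj(p))


ONE = (1.0, 0.0, 0.0, 0.0)
I = (0.0, 1.0, 0.0, 0.0)


def q_field(x):
    """q(x) = x i conj(x) + i, an imaginary quaternion (assumed reading of Eq. (12))."""
    return qadd(qmul(qmul(x, I), qconj(x)), I)


def f_field(x):
    """f(x) = -(1 + q)(1 - q)^(-1); well defined because q is imaginary."""
    q = q_field(x)
    return qscale(-1.0, qmul(qadd(ONE, q), qinv(qadd(ONE, qscale(-1.0, q)))))


def double_axial(a, phi, b, psi):
    """x = a e^{i phi} + b e^{i psi} j in components (w, x, y, z)."""
    return (a * math.cos(phi), a * math.sin(phi), b * math.cos(psi), b * math.sin(psi))


def rotate(phi, x):
    """x -> e^{i phi/2} x e^{-i phi/2}, the SO2 action appearing in Eq. (11)."""
    u = (math.cos(phi / 2), math.sin(phi / 2), 0.0, 0.0)
    return qmul(qmul(u, x), qconj(u))


def dist(p, q):
    return qnorm(qadd(p, qscale(-1.0, q)))


if __name__ == "__main__":
    x0 = double_axial(0.0, 0.0, 1.0, 0.7)          # a point on the circle a = 0, b = 1
    print("f on the circle:", f_field(x0))          # expected to be close to -1
    x = tuple(random.uniform(-2, 2) for _ in range(4))
    print("|f(x)|:", qnorm(f_field(x)))             # expected to be close to 1
    print("symmetry defect:", dist(f_field(rotate(1.3, x)), rotate(1.3, f_field(x))))
```

The check of Eq. (11) works because $e^{i\phi/2}$ commutes with i, so the conjugation passes through both terms of q(x) and hence through f(x).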
9 Coexistence: phenomenological ‘quantum fields’ on classical background
The non-linear, particle-like field configurations with quasi-charges (quasi-particles) should be very elongated along the extra dimension (all of the same size L), while being small along the usual dimensions, λ ≪ L. The motion of such a spaghetti-like quasi-particle should be very complicated and stochastic due to ‘strong’ imponderable noise, such that different parts of the spaghetti follow their own paths. At the same time, a quasi-particle can acquire ‘its own’ energy-momentum due to scattering of ponderable waves (whose wave-vectors are almost tangent to the usual 3D (sub)space); so, it seems that the scattering amplitudes$^1$ of those parts of the spaghetti which have the same 3D-coordinates can be summed, providing an auxiliary, secondary field.

So, the imponderable waves provide stochasticity (of the motion of the spaghetti's parts), while the ponderable waves ensure superposition (with secondary fields). The phenomenology of secondary fields could be of Lagrangian type, with positive energy acquired by quasi-particles, so as to ensure stability (of the whole waveguide with its infill, with respect to quasi-particle production; the least action principle is deeply connected with Lyapunov stability and is deducible, in principle, from the path integral approach).

10 ‘Plain’ $R^2$ gravity on very thick brane and change in the Newton's Law of Gravitation
Let us start with the 4d (from 5D) bi-Laplace equation with a δ-source [as the weak field, non-relativistic (stationary) approximation (it is assumed that ‘mass is possible’) for $R^2$-gravity (5)] and its solution (R is the 4d distance, radius):
$$\Delta^2\varphi = -\frac{a}{R^3}\,\delta(R); \qquad \varphi(R^2) = \frac{a}{8}\ln R^2 - \frac{b}{R^2}\ (+\,c,\ \text{but } c \text{ does not matter}); \qquad (13)$$
the attracting force between two point masses is $F_{\rm point} = \dfrac{d\varphi}{dR} = \dfrac{a}{4R} + \dfrac{2b}{R^3}$; a, b should be proportional to both masses. Now let us suppose that all masses are distributed along the extra dimension with a ‘universal function’, μ(p), $\int\mu(p)\,dp = 1$.
Then the attracting (gravitational) force takes the next form [see (13); r is the usual 3d distance]:
$$F(r) = \frac{\partial}{\partial r}\iint\varphi\bigl(r^2 + (p-q)^2\bigr)\,\mu(p)\mu(q)\,dp\,dq = \frac{a}{4}\,rV - b\,V', \qquad V(r) = \iint\frac{\mu(p)\mu(q)\,dp\,dq}{r^2 + (p-q)^2}. \qquad (14)$$
(Note that V(r) can be restored if F(r) is measured.) Taking $\mu_1(p) = \pi^{-1}/(1 + p^2)$ (the typical scale along the extra dimension is taken as the unit, L = 1; it seems that L should be greater than ten AU), one can find $rV_1(r) = 1/(2 + r)$ and
$$F(r) = \frac{a}{8 + 4r} + \frac{2b(1 + r)}{r^2(2 + r)^2}; \quad \text{or (now } L \neq 1\text{)} \quad F(r) = \frac{1}{r^2} + \frac{r}{2L(2L + r)^2},$$
where $a = b = 2/L^2$. Fig. 1, curve (a) shows $\delta F = F - 1/r^2$ (the deviation from Newton's Law; a/b is chosen so that δF(0) = 0); two other curves, (b) and (c), correspond to $\mu_2 = 2\pi^{-1}/(1 + p^2)^2$ and $\mu_3 = 2\pi^{-1}p^2/(1 + p^2)^2$ (also with δF(0) = 0; residues help to find $rV_2 = (10 + 6r + r^2)/(2 + r)^3$, $rV_3 = (2 + 2r + r^2)/(2 + r)^3$).

Fig. 1. Deviation $\delta F = F - 1/r^2$ for different μ(p), see Eq. (14) and the text above.

We see that in principle this theory can explain galaxy rotation curves, $v^2(r) \propto rF \to \mathrm{const}$ as r → ∞, without need for Dark Matter (or MOND [2]; about rotation curves and DM see [7]; DM is being looked for in the Solar system too [8]).
Q: Can the ‘coherence of mass’ along the extra dimension be disturbed (the flyby anomaly, the Pioneer anomaly [9])? Can μ(p) be negative in some domains of p?

$^1$ These amplitudes can depend on additional vector-parameters (‘equipment vectors’) relating to the differential of the field mapping at a ‘quasi-particle center’, where the quasi-charge density is largest (if it has covariant sense).

11 How to register ‘powerless’ waves
This section is added perhaps for some funny recreation (or still not? who knows). We have learnt that S-waves do not carry momentum and angular momentum, so they cannot perform any work or spin flip. But let us conceive that these waves can effect a flip-flop of two neighboring spins. So, a ‘detector’ could be a medium with two sorts of spins, A and B.
Let sA = sB = 1/2 but gA 6= gB, and let the initial state is prepared as follows: {,}(0) = {1/2,−1/2}. Then the process of spin relaxation starts; turning on appropriate magnetic field Hz (and alternating fields of proper frequencies) one can measure the detector’s state and find the time of spin relaxation. The next step. Skilled experimenters try to generate S-waves and to register an effect of these waves on spin relaxation. The generation of intense ‘coherent’ S-waves could be proceeded perhaps with a similar spin system subjected to alternating polarization. 12 Single photon experiment (that to feel huge extra dimension), and Conclusion Today, many laboratories have sources of single (heralded) photons, or entangled bi-photons (say, for Bell-type experiments [10]); some students can perform laboratory works with single photons, having convinced on their own experience that light is quantized (the Grangier experiment)[11]. It is being suggested a minor modification of the single (polarized) photon interference exper- iment, say, in a Mach-Zehnder fiber interferometer with ‘long’ (the fibers may be rolled) enough arms. The only new element is a fast-acting shutter placed at the beginning of (one of) the inter- ferometer’s arms (the closing-opening time of the shutter should be smaller than the flight time in the arms). For example, a fast electro-optical modulator in combination with polarizer (or a number of such combinations) can be used with polarized photons. Both Quantum mechanics (no particle’s ontology) and Bohmian mechanics (wave-particle dou- ble ontology)[12] exclude any change in the interference figure as a result of separating activity of such a fast shutter (while the photon’s ‘halves’ are making their ways to the place of a meet- ing). 
However, if a photon has non-local spaghetti-like ontology (along the extra dimension) and fragments of this spaghetti are moving along both arms at once, then the shutter should tear up this spaghetti (mainly without photon absorption), tear out its fragments (which will dissolve in ‘zero-point oscillations’). Hence, if the absorption factor of the shutter (the extinction ratio of polarizer) is large enough, the 50/50-proportion (between the photon’s amplitudes in the arms) will be changed and a significant decrease of the interference visibility should be observed. QM is everywhere (where we can see, of course), and, so, non-linear 5D-field fluctuations, looking like spaghetti-anti-spaghetti loops, should exist everywhere. (This omnipresence can be related to the universality of ‘low-level heat death’, restricted by the presence of topological quasi- solitons – some as the 2D computer experiment by Fermi, Pasta, and Ulam, where the process of thermalization was restricted by the existence of solitons. See also the sections 5–8 (and [4]) for arguments in favor of phenomenological (quantized) ‘secondary fields’ accounting for topological (quasi)charges and obeying superposition, path integral and so on.) AP, at least at the level of its symmetry, seems to be able to cure the gap between the two branches of physics – General Relativity (with coordinate diffeomorphisms) and Quantum Mechanics (with Lorentz invariance).2 Most people give all the rights of fundamentality to quanta, and so, they are trying to quantize gravity, and the very space-time (probing loops, and strings, and branes; see also the warning polemic by Schroer [14]). The other possibility is that quanta have the specific phenomenological origin relating to topological (quasi)charges. 2Rovelli writes[13]: In spite of their empirical success, GR and QM offer a schizophrenic and confused under- standing of the physical world. References [1] A. Einstein and W. Mayer, Sitzungsber. preuss. Akad. Wiss. 
Kl 257–265 (1931). [2] M. Milgrom, The modified dynamics – a status review, arXiv: astro-ph/9810302. [3] J. F. Pommaret, Systems of Partial Differentiation Equations and Lie Pseudogroups (Math. and its Applications, Vol. 14, New York 1978). [4] I. L. Zhogin, Topological charges and quasi-charges in AP, arXiv: gr-qc/0610076; spherical symmetry: gr-qc/0412130; 3-linear equations (contra-singularities): gr-qc/0203008. [5] S.M. Carroll, Why is the Universe Accelerating ? arXiv: astro-ph/0310342 [6] B.A. Dubrovin, A.T. Fomenko and S.P. Novikov, Modern Geometry – Methods and Applica- tions, Springer-Verlag, 1984. [7] M.E. Peskin, Dark Matter: What is it ? Where is it ? Can we make it in the lab ? http://www.slac.stanford.edu/grp/th/mpeskin/Yale1.pdf; M. Battaglia, M.E. Peskin, The Role of the ILC in the Study of Cosmic Dark Matter, hep-ph/0509135 [8] L. Iorio, Solar System planetary orbital motions and dark matter, arXiv: gr-qc/0602095; I.B. Khriplovich, Density of dark matter in Solar system and perihelion precession of planets, astro-ph/0702260. [9] C. Lämmerzahl, O. Preuss, and H. Dittus, Is the physics within the Solar system really understood ? arXiv: gr-qc/0604052; A. Unzicker, Why do we Still Believe in Newton’s Law ? Facts, Myths and Methods in Gravitational Physics, gr-qc/0702009. [10] G. Weihs, T. Jennewein, C. Simon, H. Weinfurter, and A. Zeilinger, Phys. Rev. Lett. 81, 5039 (1998); quant-ph/9810080; W. Tittel, G. Weihs, Photonic Entanglement for Fundamental Tests and Quantum Communication, quant-ph/0107156. [11] See the next links: departments.colgate.edu/physics/research/Photon/root/ , marcus.whitman.edu/ beckmk/QM . [12] H. Nikolić, Quantum mechanics: Myths and facts, arXiv: quant-ph/0609163 . [13] C. Rovelli, Unfinished revolution, gr-qc/0604045 . [14] B. Schroer, String theory and the crisis in particle physics (a Samisdat on particle physics), arXiv: physics/0603112; the other sources of contra-string polemic are seemingly the books: P. 
Woit, Not even wrong; L. Smolin, The Trouble with Physics (and the blog math.columbia.edu/∼woit/wordpress).

ABSTRACT Galactic rotation curves and lack of direct observations of Dark Matter may indicate that General Relativity is not valid (on galactic scale) and should be replaced with another theory.
There is the only variant of Absolute Parallelism which solutions are free of arising singularities, if D=5 (there is no room for changes). This variant does not have a Lagrangian, nor match GR: an equation of `plain' R^2-gravity (ie without R-term) is in sight instead. Arranging an expanding O_4-symmetrical solution as the basis of 5D cosmological model, and probing a universal_function of mass distribution (along very-very long the extra dimension) to place into bi-Laplace equation (R^2 gravity), one can derive the Law of Gravitation: 1/r^2 transforms to 1/r with distance (not with acceleration). <|endoftext|><|startoftext|> Introduction During the last decade, several initiatives have been developed to monitor and collect real world data about malicious activities on the Internet, e.g., the Internet Motion Sensor project [1], CAIDA [2] and Dshield [3]. The CADHo project [4] in which we are involved is complementary to these initiatives and is aimed at: • deploying a distributed platform of honeypots [5] that gathers data suitable to analyze the attack processes targeting a large number of machines on the Internet; • validating the usefulness of this platform by carrying out various analyses, based on the collected data, to characterize the observed attacks and model their impact on security. A honeypot is a machine connected to a network but that no one is supposed to use. If a connection occurs, it must be, at best an accidental error or, more likely, an attempt to attack the machine. The first stage of the project focused on the deployment of a data collection environment (called Leurré.com [6]) based on low-interaction honeypots. As of today, around 40 honeypot platforms have been deployed at various sites from academia and industry in almost 30 different countries over the five continents. Several analyses and interesting conclusions have been derived based on the collected data as detailed e.g., in [4,5,7-9]. 
Nevertheless, with such honeypots, hackers can only scan ports and send requests to fake servers without ever succeeding in taking control over them. The second stage of our project is aimed at setting up and deploying high-interaction honeypots to allow us to analyze and model the behavior of malicious attackers once they have managed to compromise and get access to a new host, under strict control and monitoring. We are mainly interested in observing the progress of real attack processes and the activities carried out by the attackers in a controlled environment. In this paper, we describe the lessons learned from the development and deployment of such a honeypot. The main contributions are threefold. First, we do confirm the findings discussed in [9] showing that different sets of compromised machines are used to carry out the various stages of planned attacks. Second, we do outline the fact that, despite this apparent sophistication, the actors behind those actions do not seem to be extremely skillful, to say the least. Last, the geographical location of the machines involved in the last step of the attacks and the link with some phishing activities shed a geopolitical and socio-economical light on the results of our analysis. The paper is organized as follows. Section 2 presents the architecture of our high-interaction honeypot and the design rationales for our solution. The lessons learned from the attacks observed over a period of almost 4.5 months are discussed in Section 3. Finally, Section 4 concludes and discusses future work. An extended version of this paper detailing the context of this work and the related state-of-the art is available in [10]. 2. Architecture of our honeypot In our implementation, we decided to use VMware [11] and to install virtual operating system upon it. 
Compared to solutions based on physical machines, virtual honeypots provide a cost-effective and flexible solution that is well suited for running experiments to observe attacks.

The objective of our experiment is to analyze the behavior of the attackers who succeed in breaking into a machine. The vulnerability that they exploit is not as crucial as the activity they carry out once they have broken into the host. That is why we chose to use a simple vulnerability: weak passwords for ssh user accounts. Our honeypot is not particularly hardened, for two reasons. First, we are interested in analyzing the behavior of the attackers even when they exploit a buffer overflow and become root. If we used a kernel patch such as Pax [12], our system would be more secure but it would be impossible to observe some behavior. Second, if the system is too hardened, the intruders may suspect something abnormal and give up.

In our setup, only ssh connections to the virtual host are authorized, so that the attacker can exploit this vulnerability. A firewall blocks all connection attempts from the Internet except those to port 22 (ssh). Also, any connection from the virtual host to the outside is blocked, to prevent intruders from attacking remote machines from the honeypot. This does not prevent the intruder from downloading code using the ssh connection (see footnote 1).

Proceedings of the Sixth European Dependable Computing Conference (EDCC'06) 0-7695-2648-9/06 $20.00 © 2006

Our honeypot is a standard GNU/Linux installation, with kernel 2.6 and the usual binary tools. No additional software was installed except the Apache http server. The kernel was modified as explained below. The real host executing VMware uses the same GNU/Linux distribution and is isolated from the outside.

In order to log what the intruders do on the honeypot, we modified some driver functions (tty_read and tty_write), as well as the exec system call in the Linux kernel.
The modifications of tty_read and tty_write enable us to intercept the activity on all the terminals of the system. The modification of the exec system call enables us to record the system calls used by the intruder. These functions are modified in such a way that the captured information is logged directly into a buffer of the kernel memory of the honeypot itself. Moreover, in order to record all the logins and passwords tried by the attackers to break into the honeypot we added a new system call into the kernel of the virtual operating system and we modified the source code of the ssh server so that it uses this new system call. The logins and passwords are logged in the kernel memory, in the same buffer as the information related to the commands used by the attackers. The activities of the intruder logged by the honeypot are preprocessed and then stored into an SQL database. The raw data are automatically processed to extract relevant information for further analyses, mainly: i) the IP address of the attacking machine, ii) the login and the password tested, iii) the date of the connection, iv) the terminal associated (tty) to each connection, and v) each command used by the attacker. 3. Experimental results This section presents the results of our experiments. First, we give global statistics in order to give an overview of the activities observed on the honeypot, then we characterize the various intrusion processes. Finally, we analyze in detail the behavior of the attackers once they break into the honeypot. In this paper, an intrusion corresponds to the activities carried out by an intruder who has succeeded to break into the system. 3.1. Global statistics The high-interaction honeypot has been deployed on the Internet and has been running for 131 days during which 480 IP addresses have tried to contact its ssh port. 
It is worth comparing this value to the number of hits observed against port 22 on all the other low-interaction honeypot platforms we have deployed in the rest of the world (40 platforms). On average, each platform has received hits on port 22 from approximately 100 different IPs during the same period of time. Only four platforms have been contacted by more than 300 different IP addresses on that port, and only one was hit by more visitors than our high-interaction honeypot. Even better, the low-interaction platform maintained in the same subnet as the high-interaction honeypot experienced only 298 visits, i.e., less than two thirds of what the high-interaction honeypot saw. This very simple first observation confirms the fact already described in [9] that some attacks are driven by the fact that attackers know in advance, thanks to scans done by other machines, where potentially vulnerable services are running. The existence of such a service on a machine will trigger more attacks against it. This is what we observe here: the low-interaction machines do not have the ssh service open, as opposed to the high-interaction one, and therefore get attacked less than the one where a target has been identified.

Footnote 1: We have sometimes authorized http connections for a short time, while checking that the attackers were not trying to attack other remote hosts.

The number of ssh connection attempts to the honeypot that we have recorded is 248717 (we do not consider here the scans on the ssh port). This represents about 1900 connection attempts a day. Among these 248717 connection attempts, only 344 were successful. Table 1 shows the ten user accounts that were tried most often, as well as the number of different passwords that have been tested by the attackers. It is noteworthy that many user accounts corresponding to usual first names have also regularly been tested on our honeypot. The total number of accounts tested is 41530.
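The rates quoted above follow directly from the reported counts. The short Python sketch below redoes that arithmetic as a sanity check; the constant names are ours, and the numbers are simply copied from the text rather than re-measured.

```python
# Counts reported in Section 3.1 (copied from the text, not re-measured).
DAYS = 131                   # duration of the experiment
ATTEMPTS = 248717            # ssh connection attempts (scans excluded)
SUCCESSFUL = 344             # successful connection attempts
HIGH_INTERACTION_IPS = 480   # IPs that contacted the high-interaction honeypot
LOW_INTERACTION_IPS = 298    # visits seen by the low-interaction platform in the same subnet

attempts_per_day = ATTEMPTS / DAYS
success_rate = SUCCESSFUL / ATTEMPTS
visit_ratio = LOW_INTERACTION_IPS / HIGH_INTERACTION_IPS

print(f"connection attempts per day: {attempts_per_day:.0f}")
print(f"share of successful attempts: {success_rate:.3%}")
print(f"low/high interaction visit ratio: {visit_ratio:.2f}")
```

The daily rate comes out just under 1900, and the visit ratio is below 2/3, in line with the statements in the text.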
Account      Connection attempts   % of connection attempts   Passwords tested
root         34251                 13.77%                     12027
admin        4007                  1.61%                      1425
test         3109                  1.25%                      561
user         1247                  0.50%                      267
guest        1128                  0.45%                      201
info         886                   0.36%                      203
mysql        870                   0.35%                      211
oracle       857                   0.34%                      226
postgres     834                   0.33%                      194
webmaster    728                   0.29%                      170

Table 1: ssh connection attempts and number of passwords tested

Before the real beginning of the experiment (for approximately one and a half months), we had deployed a machine with a correctly configured ssh server, offering no weak account and password. We took advantage of this observation period to determine which accounts were tried most often by automated scripts. Using this acquired knowledge, we created 17 user accounts and started looking for successful intrusions. Some of the created accounts were among the most attacked ones and others were not. As explained in Section 2, we deliberately created user accounts with weak passwords (except for the root account). Then, we measured the time between the creation of the account and the first successful connection to this account, and the duration between the first successful connection and the first real intrusion (as explained in section 3.2, the first successful connection is very seldom a real intrusion but rather an automatic script which tests passwords).

Table 2 summarizes these durations (UAi means User Account i).
User Account   Duration between creation and     Duration between first successful
               first successful connection       connection and first intrusion
UA1            1 day                             4 days
UA2            Half a day                        4 minutes
UA3            15 days                           1 day
UA4            5 days                            10 days
UA5            5 days                            null
UA6            1 day                             4 days
UA7            5 days                            8 days
UA8            1 day                             9 days
UA9            1 day                             12 days
UA10           3 days                            2 minutes
UA11           7 days                            4 days
UA12           1 day                             8 days
UA13           5 days                            17 days
UA14           5 days                            13 days
UA15           9 days                            7 days
UA16           1 day                             14 days
UA17           1 day                             12 days

Table 2: History of breaking accounts

The last column indicates that there is usually a gap of several days between the time when a weak password is found and the time when someone logs into the system with this password to issue some commands on the now compromised host. This is a somewhat surprising fact and is described in more detail below. The particular case of the UA5 account is explained as follows: an intruder succeeded in breaking the UA4 account. This intruder looked at the contents of the /etc/passwd file in order to see the list of user accounts for this machine. He immediately decided to try to break the UA5 account and he was successful. Thus, for this account, the first successful connection is also the first intrusion.

3.2. Intrusion process

In this section, we present the conclusions of our analyses regarding the process used to exploit the weak password vulnerability of our honeypot. The observed attack activities can be grouped into three main categories: 1) dictionary attacks, 2) interactive intrusions, 3) other activities such as scanning, etc.

Figure 3: Classification of observed IP addresses

As illustrated in Figure 3, among the 480 IP addresses that were seen on the honeypot, 197 performed dictionary attacks and 35 performed real intrusions on the honeypot (see below for details). The remaining 248 IP addresses were used for scanning activity or activity that we did not clearly identify. Among the 197 IP addresses that made dictionary attacks, 18 succeeded in finding passwords.
The others (179) did not find the passwords either because their dictionary did not include the accounts we created or because the corresponding weak password had already been changed by a previous intruder. We have also represented in Figure 3 the corresponding number of IP addresses that were also seen on the low-interaction honeypot deployed in the context of the project in the same network (between brackets). Whereas most of the IP addresses seen on the high interaction honeypot are also observed on the low interaction honeypot, none of the 35 IPs used to really log into our machine to launch commands have ever been observed on any of the low interaction honeypots that we do control in the whole world! This striking result is discussed hereafter. 3.2.1. Dictionary attack. The preliminary step of the intrusion consists in dictionary attacks2. In general, it takes only a couple of days for newly created accounts to be compromised. As shown in Figure 3, these attacks have been launched from 197 IP addresses. By analysing more precisely the duration between the different ssh connection attempts from the same attacking machine, we can say that these dictionary attacks are executed by automatic scripts. As a matter of fact, we have noted that these attacking machines try several hundreds, even several thousands of accounts in a very short time. We have made then further analyses regarding the machines that succeed in finding passwords, i.e., the 18 IP addresses. By searching the leurré.com database containing information about the activities of these addresses against the other low interaction honeypots we found four important elements of information. First, we note that none of our low interaction honeypot has an ssh server running, none of them replies to requests sent to port 22. These machines are thus scanning machines without any prior knowledge on their open ports. 
Second, we found evidence that these IPs were scanning, in a simple sequential way, all the addresses in a block of addresses. Moreover, the comparison of the fingerprints left on our low-interaction honeypots highlights the fact that these machines are running tools that behave the same way, if not the same tool. Third, these machines are only interested in port 22; they have never been seen connecting to other ports. Fourth, there is no apparent correlation as far as their geographical location is concerned: they are located all over the world. In other words, this analysis suggests that these IPs are used to run a well-known program. The detailed analysis of this specific tool is outside the scope of the paper but, nevertheless, it is worth mentioning that the activities linked to that tool, as observed in our Leurré.com database, indicate that it is unlikely to be a worm but rather an easy-to-use and widely spread tool.

Footnote 2: We consider as a "dictionary attack" any attack that tries more than 10 different accounts and passwords.

3.2.2. Interactive attack: intrusion. The second step of the attack consists in the real intrusion. We have noted that, several days after the guessing of a weak password, an interactive ssh connection is executed on our honeypot to issue several commands. We believe that, in those situations, a real human being, as opposed to an automated script, is connected to our machine. This is explained and justified in Section 3.3.1. As shown in Figure 3, these intrusions come from 35 IP addresses never observed on any of the low-interaction honeypots. Whereas the geographic location of the machines performing dictionary attacks is very blurred, the machines that are used by a human being for the interactive ssh connection are, most of the time, clearly identified.
We have a precise idea of their country, their geographic address, and the party responsible for the corresponding domain. Surprisingly, half of these machines come from the same country, a European country not usually seen as one of the most attacking ones, as reported, for instance, by the www.leurrecom.org web site. We then made analyses in order to see whether these IP addresses had tried to connect to other ports of our honeypot apart from these interactive connections; the answer is no. Furthermore, the machines that make interactive ssh connections to our honeypot do not make any other kind of connection to this honeypot, i.e., no scan or dictionary attack. Further analyses, using the data collected from the low-interaction honeypots deployed in the CADHo project, revealed that none of the 35 IP addresses has ever been observed on any of our platforms deployed in the world. This is interesting because it shows that these machines are totally dedicated to this kind of attack (they only targeted our high-interaction honeypot, and only when they knew at least one login and password on this machine).

We can conclude from these analyses that we face two groups of attacking machines. The first group is composed of machines that are specifically in charge of making dictionary attacks. The results of these dictionary attacks are then published somewhere. Then, another group of machines, which has no intersection with the first group, comes to exploit the weak passwords discovered by the first group. This second group of machines is, as far as we can see, clearly geographically identified, and commands are executed by a human being. A similar two-step process was already observed in the CADHo project when analyzing the data collected from the low-interaction honeypots (see [9] for more details).

3.3. Behavior of attackers

This section is dedicated to the analysis of the behavior of the intruders. We first characterize the intruders, i.e.
we try to know if they are humans or programs. Then, we present in more detail the various actions they have carried out on the honeypot. Finally, we try to figure out what their skill level seems to be.

We concentrate the analyses on the last three months of our experiment. During this period, some intruders have visited our honeypot only once, others have visited it several times, for a total of 38 ssh intrusions. These intrusions were initiated from 16 IP addresses and 7 accounts were used. Table 3 presents the number of intrusions per account, together with the number of IP addresses and passwords used for these intrusions. It is of course difficult to be sure that all the intrusions for the same account are initiated by the same person. Nevertheless, in our case, we noted that:

• most of the time, after his first login, the attacker changes the weak password into a strong one which, from then on, remains unchanged;
• when two different IP addresses access the same account (with the same password), they are very close and belong to the same country or company.

These two remarks lead us to believe that there is in general only one person associated with the intrusions for a particular account.

Account   Number of intrusions   Number of passwords   Number of IP addresses
UA2       1                      1                     1
UA4       13                     2                     2
UA5       1                      1                     1
UA8       1                      1                     1
UA10      9                      2                     2
UA13      6                      1                     5
UA16      5                      1                     3
UA17      2                      1                     1

Table 3: Number of intrusions per account

3.3.1. Type of the attackers: humans or programs. Before analyzing what intruders do when connected, we can try to identify who they are. They can be of two different natures: either they are humans, or they are programs which reproduce simple behaviors. For all intrusions but 12, intruders have made mistakes when typing commands. Mistakes are identified when the intruder uses the backspace to erase a previously entered character. So, it is very likely that such activities are carried out by a human, rather than by programs.
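The backspace criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we assume the input captured for each session is available as a byte string, and that a backspace shows up as the BS (0x08) or DEL (0x7F) byte; the paper does not describe the actual log format, and the function and session names below are hypothetical.

```python
# Byte values that terminals commonly send for the backspace key
# (assumption: the capture stores raw keystroke bytes).
BACKSPACE_BYTES = {0x08, 0x7F}


def contains_typing_mistake(keystrokes: bytes) -> bool:
    """Return True if the captured input contains at least one backspace."""
    return any(byte in BACKSPACE_BYTES for byte in keystrokes)


def sessions_with_mistakes(sessions: dict[str, bytes]) -> list[str]:
    """Return the identifiers of the sessions in which a mistake was corrected."""
    return [sid for sid, data in sessions.items() if contains_typing_mistake(data)]


# Hypothetical captures: the first session corrects a typo, the second does not.
example_sessions = {
    "session-a": b"ls -al\x7f\x7fla\n",
    "session-b": b"w\n",
}
flagged = sessions_with_mistakes(example_sessions)
print(flagged)
```

A session that is not flagged is not necessarily automated; it only means that this criterion alone is inconclusive for it.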
When an intruder did not make any mistake, we analyzed how the data were transmitted from the attacker's machine to the honeypot. We can note that, for ssh communications, data transmission between the client and the server is asynchronous. Most of the time, the ssh client implementation uses the function select() to get user input. So, when the user presses a key, this function returns and the program sends the corresponding value to the server. In the case of a copy and paste into the terminal running the client, the select() function also returns, but the program sends to the server all the values contained in the buffer used for the paste. We can assume that, when tty_read() returns more than one character, these values have been sent after a copy and paste. If all the activities during a connection are due to copy and paste, we can strongly assume that they are due to an automatic script. Otherwise, this is quite likely a human being who uses shortcuts from time to time (such as CTRL-V to paste commands into their ssh session). For 7 out of the 12 activities without mistakes, intruders have entered several commands on a character-by-character basis. This, once again, seems to indicate that a human being is entering the commands. For the 5 others, the activities are not significant enough to conclude: the intruders have only launched a single command, like w, which is not long enough to highlight a copy and paste.

3.3.2. Attacker activities. The first significant remark is that all of the intruders change the password of the hacked account. The second remark is that most of them start by downloading some files. In all cases but one, the attackers tried to download some malware to the compromised machines.
In a single case, the attacker first tried to download an innocuous, yet large, file to the machine (the binary for a driver coming from a known web site). This is probably a simple way to assess the connectivity quality of the compromised host. The command used by the intruders to download the software is wget. To be more precise, 21 of the 38 intrusions include the wget command. These 21 intrusions concern all the hacked accounts. As mentioned in section 2, outgoing http connections are forbidden by the firewall. Nevertheless, the intruders still have the possibility to download files through the ssh connection using the sftp command (instead of wget). Surprisingly, we noted that only 30% of the intruders did use this ssh connection. 70% of the attackers were unable to download their malware due to the absence of http connectivity! Three explanations can be envisaged at this stage. First, they follow some simplistic cookbook and do not even know the other methods at their disposal to upload a file. Second, the machines where the malware resides do not support sftp. Third, the lack of http connectivity made the attacker suspicious and he decided to leave our system. Surprisingly, the first explanation seems to be the right one in our case, as we noticed that the attackers leave after an unsuccessful wget and come back a few hours or days later, trying the same command again as if they were hoping it would work this time. Some of them have been seen trying this several times. It can be concluded that: i) they are apparently unable to understand why the command fails, ii) they are not afraid to come back to the machine despite the lack of http connectivity, iii) applying such a brute-force approach reveals that they are not aware of any other method to upload the file. Once the attackers manage to download their malware using sftp, they try to install it (by decompressing or extracting files for example).
75% of the intrusions that installed software did not install it on the hacked account but rather on standard directories such as /tmp, /var/tmp or /dev/shm (which are directories with write access for everybody). This makes the hacker activity more difficult to identify because these directories are regularly used by the operating system itself and shared by all the users. Additionally, we have identified four main activities of the intruders. The first one is launching ssh scans on other networks but these scans have never tested local machines. Their idea is to use the targeted machine to scan other networks, so that it is more difficult for the administrator of the targeted network to localize them. The program used by most intruders, which is easy to find on the Internet, is pscan.c. The second type of activity consists in launching irc clients, e.g., emech [13] and psyBNC. Names of binary files have regularly been changed by intruders, probably in order to hide them. For example, the binary files of emech have been changed to crond or inetd, which are well known Unix binary file names and processes. The third type of activity is trying to become root. Surprisingly, such attempts have been observed for 3 intrusions only. Two rootkits were used. The first one exploits two vulnerabilities: a vulnerability which concerns the Linux kernel memory management code of the mremap system call [14] and a vulnerability which concerns the internal kernel function used to manage process's memory heap [15]. This exploit could not succeed because the kernel version of our honeypot does not correspond to the version of the exploit. The intruder should have realized this because he checked the version of the kernel of the honeypot (uname -a). However, he launched this rootkit anyway and failed. The other rootkit used by intruders exploits a vulnerability in the program ld. Thanks to this exploit, three intruders became root but the buffer overflow succeeded only partially. 
Even if they apparently became root, they could not launch all desired programs (removing files for example caused access control errors). The last activity observed in the honeypot is related to phishing. It is difficult to draw precise conclusions because only one intruder has attempted to launch such an attack. He downloaded a forged email and tried to send it through the local smtp agent. But, as far as we could understand, it looked like a preliminary step of the attack because the list of recipient emails was very short. It seems that it was just a preliminary test before the real deployment of the attack.

3.3.3. Attackers skill.

Intruders can, roughly speaking, be classified into two main categories. The largest one consists of script kiddies. They are inexperienced hackers who use programs found on the Internet without really understanding how they work. The other category represents intruders who are more dangerous. They are known as “black hats”. They can cause serious damage to systems because they are experts in security and they know how to exploit vulnerabilities on various systems. As already presented in §3.3.2 (use of wget and sftp), we have observed that intruders are not as clever as expected. For example, for two hacked accounts, the intruders do not seem to really understand Unix file access rights (this is very obvious, for example, when they try to erase some files for which they do not have the required privileges). For these same two accounts, the intruders also try to kill the processes of other users. Many intruders do not try to delete the file containing the history of their commands, nor do they try to deactivate this history function (this file depends on the login shell used; it is .bash_history, for example, for bash). Among the 38 intrusions, only 14 were cleaned by the intruders (11 have deactivated the history function and 3 have deleted the .bash_history file).
This means that 24 intrusions left behind them a perfectly readable summary of their activity within the honeypot. The IP address of the honeypot is private and we have started another honeypot on this network. This second honeypot is not directly accessible from the outside; it is only accessible from the first honeypot. We have modified the /etc/motd file of the first honeypot (which is automatically printed on the screen during the login process) and added the following message: “In order to use the software XXX, please connect to A.B.C.D”. In spite of this message, only one intruder has tried to connect to the second honeypot. We would expect an experienced hacker to try to use this information. More generally, we have very seldom seen an intruder looking for other active machines on the same network. One important thing to note relates to fingerprinting activity. No intruder has tried to check for the presence of VMware software. For three hacked accounts, the intruders have read the contents of the file /proc/cpuinfo but that is all. None of the methods discussed on the Internet to identify the presence of VMware software [16,17] was tested. This probably means that the intruders are not experienced hackers.

4. Conclusion

In this paper, we have presented the results of an experiment carried out over a period of 6 months during which we have observed the various steps that lead an attacker to successfully break into a vulnerable machine and his behavior once he has managed to take control over the machine. The findings are somewhat consistent with the informal know-how shared by security experts. The contributions of the paper reside in performing an experiment and rigorous analyses that confirm some of these informal assumptions. Also, the precise analysis of the observed attacks reveals several interesting facts.
First of all, the complementarity between high and low interaction honeypots is highlighted, as some explanations can be found by combining information coming from both setups. Second, it appears that most of the observed attacks against port 22 were only partially automated and carried out by script kiddies. This is very different from what can be observed against other ports, such as 445, 139 and others, where worms have been designed to completely carry out the tasks required for the infection and propagation. Last, honeypot fingerprinting does not seem to be a high priority for attackers, as none of them has tried the known techniques to check if they were under observation. It is also worth mentioning a couple of important missing observations. First, we did not observe scanners detecting the presence of the open ssh port and providing this information to other machines in charge of running the dictionary attack. This is different from previous observations reported in [9]. Second, as most of the attacks follow very simple and repetitive patterns, we did not observe anything that could be used to derive sophisticated attack scenarios that could be analyzed by an intrusion detection correlation engine. Of course, at this stage it is too early to derive definite conclusions from this observation. Therefore, it would be interesting to continue this experiment over a longer period of time to see if things change, for instance if more efficient automation takes place. We would have to solve the problem of weak passwords being replaced by strong ones, though, in order to see more people succeeding in breaking into the system. Also, it would be worth running the same experiment by opening another vulnerability in the system and verifying whether the identified steps remain the same and whether the types of attackers are similar. Could it be, on the contrary, that some ports are preferably chosen by script kiddies while others are reserved for more elite attackers?
This is something that we are in the process of assessing. Acknowledgement. This work has been partially supported by: 1) CADHo, a research action funded by the French ACI “Securité & Informatique” (www.cadho.org), 2) the CRUTIAL IST-027513 project (crutial.cesiricerca.it), and 3) the ReSIST IST- 026764 project (www.resist-noe.org). 5. References [1] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, The Internet motion sensor - a distributed blackhole monitoring system. Network and Distributed Systems Security Symp. (NDSS 2005), San Diego, USA, 2005. [2] CAIDA Project. Home Page of the CAIDA Project, http://www.caida.org. [3] http://www.dshield.org. Home page of the DShield.org Distributed Intrusion Detection System. [4] E. Alata, M. Dacier, Y. Deswarte, M. Kaaniche, K. Kortchinsky, V. Nicomette, V. Hau Pham, and F. Pouget, Collection and analysis of attack data based on honeypots deployed on the Internet. QOP 2005, 1st Workshop on Quality of Protection (co-located with ESORICS and METRICS), Sept. 15, Milan, Italy, 2005. [5] F. Pouget, M. Dacier, V. Hau Pham. Leurre.com: on the advantages of deploying a large scale distributed honeypot platform. In Proc. of ECCE'05, E-Crime and Computer Conference, Monaco, 2005. [6] Home page of Leurré.com: http://www.leurre.org. [7] Project Leurré.com. Publications web page: http://www.leurrecom.org/paper.htm. [8] M. Dacier, F. Pouget, H. Debar. Honeypots: practical means to validate malicious fault assumptions. 10th IEEE Pacific Rim Int. Symp., pp. 383--388, Tahiti, 2004. [9] F. Pouget, M. Dacier, V. Hau Pham, “Understanding threats: a prerequisite to enhance survivability of computing systems”, Int. Infrastructure Survivability Workshop IISW'04, (25th IEEE Int. Real-Time Systems Symp. (RTSS 04)), Lisboa, Portugal, 2004. [10] E. Alata, V. Nicomette, M. Kaaniche, M. Dacier, M. Herrb, Lessons learned from the deployment of a high- interaction honeypot: Extended version. LAAS Report, July 2006. [11] Inc. VMware. 
Available on: http://www.vmware.com [12] The PaX Team. Available on: http://pax.grsecurity.net. [13] EnergyMech team. Energymech. Available on: http://www.energymech.net. [14] US-CERT. Linux kernel mremap(2) system call does not properly check return value from do_munmap() function. Available on: http://www.kb.cert.org/vuls/id/981222. [15] US-CERT. Linux kernel do_brk() function contains integer overflow. http://www.kb.cert.org/vuls/id/981222. [16] J. Corey, Advanced honeypot identification and exploitation. Phrack, N 63, Available on: http://www.phrack.org/fakes/p63/p63-0x09.txt. [17] T. Holz and F. Raynal, Detecting honeypots and other suspicious environments. In Systems, Man and Cybernetics (SMC) Information Assurance Workshop. Proc. from the Sixth Annual IEEE, pages 29--36, 2005. Proceedings of the Sixth European Dependable Computing Conference (EDCC'06) 0-7695-2648-9/06 $20.00 © 2006 ABSTRACT This paper presents an experimental study and the lessons learned from the observation of the attackers when logged on a compromised machine. The results are based on a six months period during which a controlled experiment has been run with a high interaction honeypot. We correlate our findings with those obtained with a worldwide distributed system of lowinteraction honeypots. <|endoftext|><|startoftext|> Introduction The idea behind abstract (linear) potential theory, as developed by Choquet [4], Fuglede [9] and Ohtsuka [15], is to replace the Euclidian space Rd by some locally compact space X and the well-known Newto- nian kernel by some other kernel function k : X×X → R∪{+∞}, and ∗ This work was started during the 3rd Summerschool on Potential Theory, 2004, hosted by the College of Kecskemét, Faculty of Mechanical Engineering and Automa- tion (GAMF). Both authors would like to express their gratitude for the hospitality and the support received during their stay in Kecskemét. 
† The second named author was supported by the Hungarian Scientific Research Fund; OTKA 49448 http://arxiv.org/abs/0704.0859v1 to look at which “potential theoretic” assertions remain true in this gen- erality (see the monograph of Landkof [12]). This approach facilitates general understanding of certain potential theoretic phenomena and allows also the exploration of fundamental principles like Frostman’s maximum principle. Although there is a vast work done considering energy integrals and different notions of energies, the familiar notions of transfinite diame- ter and Chebyshev constants in this abstract setting are sporadically found, sometimes indeed inaccessible, in the literature, see Choquet [4] or Ohtsuka [17]. In [4] Choquet defines transfinite diameter and proves its equality with the Wiener energy in a rather general situation, which of course covers the classical case of the logarithmic kernel on C. We give a slightly different definition for the transfinite diameter that, for infinite sets, turns out to be equivalent with the one of Choquet. The primary aim of this note is to revisit the above mentioned notions and related results and also to partly complement the theory. We already remark here that Zaharjuta’s generalisation of transfi- nite diameter and Chebyshev constant to Cn is completely different in nature, see [24], whereas some elementary parts of weighted potential theory (see, e.g., Mhaskar, Saff [13] and Saff, Totik [20]) could fit in this framework. The power of the abstract potential analytic tools is well illustrated by the notion of the average distance number from metric analysis, see Gross [11], Stadje [21]. The surprising phenomenon noticed by Gross is the following: If (X, d) is a compact connected metric space, there al- ways exists a unique number r(X) (called the average distance number or the rendezvous number of X), with the property that for any finite point system x1, . . . 
, xn ∈ X there is another point x ∈ X with average distance d(xj , x) = r(X). Stadje generalised this to arbitrary continuous, symmetric functions replacing d. Actually, it turned out, see the series of papers [6, 5, 7] and the references therein, that many of the known results concerning av- erage distance numbers (existence, uniqueness, various generalisations, calculation techniques etc.), can be proved in a unified way using the works of Fuglede and Ohtsuka. We mention for example that Frost- man’s Equilibrium Theorem is to be accounted for the existence for certain invariant measures (see Section 5 below). In these investigations the two variable versions of Chebyshev constants and energies and even their minimax duals had been needed, and were also partly available due to the works of Fuglede [10] and Ohtsuka [16, 17], see also [6]. Another occurrence of abstract Chebyshev constants is in the study of polarisation constants of normed spaces, see Anagnostopoulos, Ré- vész [1] and Révész, Sarantopoulos [19]. Let us settle now our general framework. A kernel in the sense of Fuglede is a lower semicontinuous function k : X × X → R ∪ {+∞} [9, p. 149]. In this paper we will sometimes need that the kernel is symmetric, i.e., k(x, y) = k(y, x). This is for example essential when defining potential and Chebyshev constant, otherwise there would be a left- and right-potential and the like. Another assumption, however a bit of technical flavour, is the pos- itivity of the kernel. This we need, because we would like to avoid technicalities when integrating not necessarily positive functions. This assumption is nevertheless not very restrictive. Since we usually con- sider compact sets of X ×X, where by lower semicontinuity k is nec- essarily bounded from below, we can assume that k ≥ 0. Indeed, as we will see, energy, nth diameter and nth Chebyshev constant are linear in constants added to k. 
Denote the set of compactly supported Radon measures on X by M(X), that is

M(X) := {µ : µ is a regular Borel measure on X, µ has compact support, ‖µ‖ < +∞}.

Further, let M1(X) be the set of positive unit measures from M(X),

M1(X) := {µ ∈ M(X) : µ ≥ 0, µ(X) = 1}.

We say that µ ∈ M1(X) is supported on H if supp µ, which is a compact subset of X, is in H. The set of (probability) measures supported on H are denoted by M(H) (M1(H)).

Before recalling the relevant potential theoretic notions from [9] (see also [15]), let us spend a few words on integrals (see [2, Ch. III-IV.]). Let µ be a positive Radon measure on X. Then the integral of a compactly supported continuous function with respect to µ is the usual integral. The upper integral of a positive l.s.c. function f is defined as

$$\int_X f \, d\mu := \sup_{\substack{0 \le h \le f \\ h \in C_c(X)}} \int_X h \, d\mu.$$

This definition works well, because by standard arguments (see, e.g., [2, Ch. IV., Lemma 1]) one has

$$k(x, y) = \sup_{\substack{0 \le h \le k \\ h \in C_c(X \times X)}} h(x, y),$$

where, because of the symmetry assumption, it suffices to take only symmetric functions h in the supremum. What should be here noted, is that this notion of integral has all useful properties that we are used to in case of Lebesgue integrals (note also the necessity of the positivity assumptions).

The usual topology on M is the so-called vague topology which is a locally convex topology defined by the family $\{\mu \mapsto \left|\int_X f \, d\mu\right| : f \in C_c(X)\}$ of seminorms. We will only encounter this topology in connection with families M of measures supported on subsets of the same compact set K ⊂ X. In this case, the weak∗-topology (determined by C(K)) and the vague topology coincide on M, Fuglede [9].

For a potential theoretic kernel k : X × X → R+ ∪ {0} Fuglede [9] and Ohtsuka [15] define the potential and the energy of a measure µ

$$U^\mu(x) := \int_X k(x, y) \, d\mu(y), \qquad W(\mu) := \int_X \int_X k(x, y) \, d\mu(y) \, d\mu(x).$$

The integrals exist in the above sense, although may attain +∞ as well.
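For a finitely supported probability measure the potential and the energy reduce to finite sums, which makes the two definitions easy to experiment with. The following is only a minimal sketch under assumptions that are not part of the text: the two kernels, the support points and the weights are chosen purely for illustration. It also shows that the energy of a unit measure shifts by c when a constant c is added to the kernel, and that the energy can indeed be +∞.

```python
import math

def potential(kernel, support, weights, x):
    """U^mu(x) = sum over y of k(x, y) * mu({y}) for a finitely supported mu."""
    return sum(kernel(x, y) * m for y, m in zip(support, weights))

def energy(kernel, support, weights):
    """W(mu) = sum over x, y of k(x, y) * mu({y}) * mu({x})."""
    return sum(potential(kernel, support, weights, x) * m
               for x, m in zip(support, weights))

def bounded_kernel(x, y):
    # Illustrative kernel (assumption): positive, symmetric, finite everywhere.
    return 1.0 / (1.0 + abs(x - y))

def singular_kernel(x, y):
    # Illustrative kernel (assumption): positive, symmetric, +inf on the diagonal.
    return math.inf if x == y else 1.0 / abs(x - y)

# Hypothetical example data: three atoms with strictly positive masses summing to 1.
support = [0.0, 1.0, 3.0]
weights = [0.5, 0.25, 0.25]

if __name__ == "__main__":
    print("U(0) =", potential(bounded_kernel, support, weights, 0.0))
    print("W    =", energy(bounded_kernel, support, weights))
    print("W for the singular kernel =", energy(singular_kernel, support, weights))
```

Since every atom carries positive mass, the singular kernel gives an infinite energy for any such discrete measure, in line with the remark that the integrals may attain +∞.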
For a given set H ⊂ X its Wiener energy is

$$w(H) := \inf_{\mu \in M_1(H)} W(\mu), \qquad (1)$$

see [9, (2) on p. 153]. One also encounters the quantities (see [9, p. 153])

$$U(\mu) := \sup_{x \in X} U^\mu(x), \qquad V(\mu) := \sup_{x \in \operatorname{supp} \mu} U^\mu(x).$$

Accordingly one defines the following energy functions

$$u(H) := \inf_{\mu \in M_1(H)} U(\mu), \qquad v(H) := \inf_{\mu \in M_1(H)} V(\mu).$$

In general, one has the relation w ≤ v ≤ u ≤ +∞, where in all places strict inequality may occur. Nevertheless, under our assumptions we have the equality of the energies v and w, being generally different, see [9, p. 159]. More importantly, our set of conditions suffices to have a general version of Frostman’s equilibrium theorem, see Theorem 9. In fact, at a certain point (in §4), we will also assume Frostman’s maximum principle, which will trivially guarantee even u = v, that is, the equivalence of all three energies treated by Fuglede.

Definition. The kernel k satisfies the maximum principle, if for every measure µ ∈ M1, U(µ) = V(µ).

As our examples show in §5, this is essential also for the equivalence of the Chebyshev constant and the transfinite diameter. Carleson [3, Ch. III.] gives a class of examples satisfying the maximum principle: Let Φ(r), r = |x|, x ∈ R^d be the fundamental solution of the Laplace equation, i.e., Φ(|x−y|) the Newtonian potential on R^d. For a positive, continuous, increasing, convex function H assume also that $\int H(\Phi(r)) \, r^{d-2} \, dr < +\infty$. Then H ◦ Φ satisfies the maximum principle; see [3, Ch. III.] and also Fuglede [9] for further examples.

Let us now turn to the systematic treatment of the Chebyshev constant and the transfinite diameter. We call a function g : X → R log-polynomial, if there exist w1, . . . , wn ∈ X such that $g(x) = \sum_{j=1}^{n} k(x, w_j)$ for all x ∈ X. Accordingly, we will call the wj's and n the zeros and the degree of g(x), respectively. Obviously the sum of two log-polynomials is a log-polynomial again.
The terminology here is motivated by the case of the logarithmic kernel k(x, y) = − log |x − y|, where the log-polynomials correspond to negative logarithms of algebraic polynomials. Log-polynomials give access to the definition of transfinite diameter and the Chebyshev constant, see Carleson [3], Choquet [4], Fekete [8], Ohtsuka [17] and Pólya, Szegő [18]. First we start with the “degree n” versions, whose convergence will be proved later.

Definition. Let H ⊂ X be fixed. We define the nth diameter of H as

$$D_n(H) := \inf_{w_1,\dots,w_n \in H} \frac{1}{(n-1)n} \sum_{1 \le j \ne l \le n} k(w_j, w_l); \qquad (2)$$

or, if the kernel is symmetric

$$D_n(H) = \inf_{w_1,\dots,w_n \in H} \frac{2}{(n-1)n} \sum_{1 \le i < j \le n} k(w_i, w_j).$$

[…] m, there is always a point from the diagonal ∆ = {(x, x) : x ∈ H} in the definition of Dn(H). This possibility is completely excluded by Choquet in [4], thus allowing only infinite sets.

Definition. For an arbitrary H ⊂ X the nth Chebyshev constant of H is defined as

$$M_n(H) := \sup_{w_1,\dots,w_n \in H} \inf_{x \in H} \frac{1}{n} \sum_{k=1}^{n} k(x, w_k).$$

We are going to show that both nth diameters and nth Chebyshev constants converge from below to some number (or +∞), which are respectively called the transfinite diameter D(H) and the Chebyshev constant M(H). The aim of this paper is to relate these quantities as well as the Wiener energy of a set.

2. Chebyshev constant and transfinite diameter

We define the Chebyshev constant and the transfinite diameter of a set H ⊂ X and proceed analogously to the classical case. It turns out, though not very surprisingly, that in general the equality of these two quantities does not hold. First, we prove the convergence of nth diameters and nth Chebyshev constants. This is for both cases classical, we give the proof only for the sake of completeness, see, e.g., Carleson [3], Choquet [4], Fekete [8], Ohtsuka [17] and Pólya, Szegő [18].

PROPOSITION 1. The sequence of nth diameters is monotonically increasing.

Proof. Choose x1, . . . , xn ∈ H arbitrarily. If we leave out any index i = 1, 2, . . . 
, n, then for the remaining n − 1 points we obtain by the definition of Dn−1(H) that

$$\frac{1}{(n-1)(n-2)} \sum_{\substack{1 \le j \ne l \le n \\ j \ne i,\ l \ne i}} k(x_j, x_l) \ge D_{n-1}(H).$$

After summing up for i = 1, 2, . . . , n this yields

$$\frac{1}{n-1} \sum_{1 \le j \ne l \le n} k(x_j, x_l) \ge n \cdot D_{n-1}(H),$$

for each term k(xj, xl) occurs exactly n − 2 times. Now taking the infimum for all possible x1, . . . , xn ∈ H, we obtain n · Dn(H) ≥ n · Dn−1(H), hence the assertion.

The limit $D(H) := \lim_{n\to\infty} D_n(H)$ is the transfinite diameter of H. Similarly, the nth Chebyshev constants converge, too.

PROPOSITION 2. For any H ⊂ X, the Chebyshev constants Mn(H) converge in the extended sense.

Proof. The sum of two log-polynomials, $p(z) = \sum_{i=1}^{n} k(z, x_i)$ with degree n and $q(z) = \sum_{j=1}^{m} k(z, y_j)$ with degree m, is also a log-polynomial with degree n + m. Therefore

$$(n+m) M_{n+m} \ge n M_n + m M_m \qquad (3)$$

for all n, m follows at once. Should Mn(H) be infinity for some n, then all succeeding terms Mn′(H), n′ ≥ n are infinity as well, hence the convergence is obvious. We assume now that Mn(H) is a finite sequence. At this point, for the sake of completeness, we can repeat the classical argument of Fekete [8]. Namely, let m, n be fixed integers. Then there exist l = l(n, m) and r = r(n, m), 0 ≤ r < m nonnegative integers such that n = l · m + r. Iterating the previous inequality (3) we get

$$n \cdot M_n \ge l \cdot m M_m + r M_r = n M_m + r (M_r - M_m).$$

Fixing now the value of m, the possible values of r remain bounded by m, and the finitely many values of Mr − Mm's are finite, too. Hence dividing both sides by n, and taking $\liminf_{n\to\infty}$, we are led to

$$\liminf_{n\to\infty} M_n \ge \liminf_{n\to\infty} \left( M_m + \frac{r}{n} (M_r - M_m) \right) = M_m.$$

This holds for any fixed m ∈ N, so taking $\limsup_{m\to\infty}$ on the right hand side we obtain

$$\liminf_{n\to\infty} M_n \ge \limsup_{m\to\infty} M_m,$$

that is, the limit exists.

$M(H) := \lim_{n\to\infty} M_n(H)$ is called the Chebyshev constant of H. In the following, we investigate the connection between the Chebyshev constant M(H) and the transfinite diameter D(H).

THEOREM 3. Let k be a positive, symmetric kernel. For any n ∈ N and H ⊂ X we have Dn(H) ≤ Mn(H), thus also D(H) ≤ M(H).
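On a small finite set with a finite kernel, the monotonicity of Proposition 1 and the inequality stated in Theorem 3 can be checked numerically by brute force. The sketch below is an illustration only: it assumes that Dn(H) is the infimum, over n-tuples with repetitions allowed, of the average of k(wj, wl) over ordered pairs j ≠ l, and that Mn(H) is the supremum over n-tuples of the infimum over x ∈ H of the average of k(x, wj). The kernel and the four-point set are hypothetical choices.

```python
from itertools import product

def nth_diameter(kernel, points, n):
    """D_n(H): min over n-tuples (repetitions allowed) of the average of
    k(w_j, w_l) over ordered pairs j != l. Requires n >= 2."""
    best = float("inf")
    for w in product(points, repeat=n):
        total = sum(kernel(w[j], w[l])
                    for j in range(n) for l in range(n) if j != l)
        best = min(best, total / (n * (n - 1)))
    return best

def nth_chebyshev(kernel, points, n):
    """M_n(H): max over n-tuples w of min over x in H of (1/n) * sum_j k(x, w_j)."""
    best = float("-inf")
    for w in product(points, repeat=n):
        inf_x = min(sum(kernel(x, wj) for wj in w) / n for x in points)
        best = max(best, inf_x)
    return best

def bounded_kernel(x, y):
    # Illustrative positive, symmetric, finite kernel (an assumption of this sketch).
    return 1.0 / (1.0 + abs(x - y))

H = [0, 1, 2, 3]  # hypothetical finite set

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, nth_diameter(bounded_kernel, H, n), nth_chebyshev(bounded_kernel, H, n))
```

The search is exponential in n, so it is only meant for very small n and very small sets.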
Proof. If Mn(H) = +∞, then the assertion is trivial. So assume Mn(H) < +∞. By the quasi-monotonicity (see (3)) we have that for all m ≤ n also Mm(H) is finite. We use this fact to recursively find w1, . . . wn ∈ H such that k(wi, wj) < +∞ for all i < j ≤ n. At the end we arrive at 1≤i 0, we take, as we may, an “approximate n-Fekete point system” w1, . . . , wn (n− 1)n 1≤i 6=j≤n k(wi, wj) < Dn + ε. (4) For any x ∈ H the points x,w1, . . . , wn form a point system of n + 1 points, so by the definition of Dn+1 we have k(x,wi) + 1≤i 6=j≤n k(wi, wj) ≥ n(n+ 1)Dn+1 ≥ n(n+ 1)Dn, using also the monotonicity of the sequence Dn. This together with (4) lead to pn(x) := k(x,wi) ≥ n(n+ 1) n(n− 1) Dn + ε Taking infimum of the left hand side for x ∈ H we obtain pn(x) ≥ nDn − n(n− 1)ε By the very definition of the nth Chebyshev constant, n · Mn ≥ infx∈H pn(x) holds, hence Mn ≥ Dn − (n− 1)ε/2 follows. As this holds for all ε > 0, we conclude Mn ≥ Dn. Later we will show that, unlike the classical case of C, the strict inequality D < M is well possible. 3. Transfinite diameter and energy We study the connection between the energy w and the transfinite diameter D. Without assuming the maximum principle we can prove the equivalence of these two quantities for compact sets. This result can actually be found in a note of Choquet [4]. There is however a slight difference to the definitions of Choquet in [4]. There the diagonal was completely excluded from the definition of D, that is the infimum in (2) is taken over wi 6= wj, i 6= j and not for systems of arbitrary wj’s . This means, among others, that in [4] the transfinite diameter is only defined for infinite sets. The other assumption of Choquet is that the kernel is infinite on the diagonal. This is completely the contrary to what we assume in Theorem 8. Indeed, with our definitions of the transfinite diameter one can even prove equality for arbitrary sets if the kernel is finite-valued. THEOREM 4. 
Let k be an arbitrary kernel and H ⊂ X be any set. Then D(H) ≤ w(H). Proof. Let µ ∈ M1(H) be arbitrary, and define ν := j=1 µ the product measure on the product space Xn. We can assume that the kernel is positive because supp µ, and hence supp ν, is compact so we can add a constant to k such that it will be positive on these supports. Consider the following lower semicontinuous functions g and h on Xn g : (x1, . . . , xn) 7→ Dn(H) := inf (w1,...,wn)∈Xn n(n−1) 1≤i 6=j≤n k(wi, wj) h : (x1, . . . , xn) 7→ n(n−1) 1≤i 6=j≤n k(xi, xj). Since 0 ≤ g ≤ h, by the definition of the upper integral the following holds true Dn(H) ≤ n(n− 1) 1≤i 6=j≤n k(xi, xj) dν(x1, . . . , xn) n(n− 1) 1≤i 6=j≤n k(xi, xj) dµ(xi) dµ(xj) = W (µ). Taking infimum in µ yields Dn(H) ≤ w(H), hence also D(H) ≤ w(H). To establish the converse inequality we need a compactness as- sumption. With the slightly different terminology, Choquet proves the following for kernels being +∞ on the diagonal ∆. The arguments there are very similar, except that the diagonal doesn’t have to be taken care of in [4]. We give a detailed proof. PROPOSITION 5 (Choquet [4]). For an arbitrary kernel function k the inequality D(K) ≥ w(K) holds for all K ⊆ X compact sets. Proof. First of all the l.s.c. function k attains its infimum on the compact set K × K. So by shifting k up we can assume that it is positive, and the validity of the desired inequality is not influenced by this. If D(K) = +∞, then by Theorem 4 we have w(K) = +∞, thus the assertion follows. Assume therefore D(K) < +∞, and let n ∈ N, ε > 0 be fixed. Let us choose a Fekete point system w1, . . . , wn from K. Put µ := µn := 1/n i=1 δwi where δwi are the Dirac measures at the points wi, i = 1, . . . , n. For a continuous function 0 ≤ h ≤ k with compact support, we have h dµ dµ = i,j=1 h(wi, wj) h(wi, wi) + i,j=1 h(wi, wj) h(wi, wi) + i,j=1 k(wi, wj) i,j=1 k(wi, wj) Dn(K) ≤ +D(K) using, in the last step, also the monotonicity of the sequenceDn (Propo- sition 1). 
In fact, we obtain for n ≥ N = N(‖h‖, ε) the inequality h dµ dµ ≤ D + ε. (5) It is known, essentially by the Banach-Alaoglu Theorem, that for a compact set K the measures of M1(K) form a weak ∗-compact subset of M, hence there is a cluster point ν ∈ M1(K) of the set MN := {µn : n ≥ N} ⊂ M1(K). Let {να}α∈I ⊆ MN be a net converging to ν. Recall that να⊗να weak ∗-converges to ν⊗ν. We give the proof. For a function g ∈ C(K ×K), g(x, y) = g1(x) · g2(y) it is obvious that g dνα dνα → g dν dν. (6) The set A of such product-decomposable functions g(x, y) = g1(x)g2(y) is a subalgebra of C(K ×K), which also separates X ×X, since it is already coordinatewise separating. By the Stone–Weierstraß theorem A is dense in C(K ×K). From this, using also that the family MN of measures is norm-bounded, we immediately get the weak∗-convergence (6). All these imply h dν dν ≤ D(K) + ε, w(K) ≤ W (ν) := kdνdν = sup 0 ≤ h ≤ k h ∈ Cc(X ×X) hdνdν ≤ D(K)+ε, for all ε > 0. This shows w(K) ≤ D(K). COROLLARY 6 (Choquet [4]). For arbitrary kernel k and compact set K ⊂ X, the equality D(K) = w(K) holds. Proof. By compactness we can shift k up and therefore assume it is positive. Then we apply Theorem 4 and Proposition 5. The assumptions of Choquet [4] are the compactness of the set plus the property that the kernel is +∞ on the diagonal (besides it is continuous in the extended sense). This ensures, loosely speaking, that for a set K of finite energy an energy minimising measure µ (i.e., for whichW (µ) = w(K)) is necessarily non-atomic, moreover µ ⊗ µ is not concentrated on the diagonal. Therefore to show equality of w with D, one has to exclude the diagonal completely from the definition of the transfinite diameter. We however allow a larger set of choices for the point system in the definition of D. Indeed, we allow Fekete points to coincide, and this also makes it possible to define the transfinite diameter of finite sets. 
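To see how coinciding Fekete points behave on a finite set, consider a two-point set W = {a, b} with a symmetric kernel that is finite on the diagonal. The sum in the definition of Dn(W) then depends only on how many of the n points sit at a, so Dn(W) can be computed exactly, and the energy w(W) is the minimum of a quadratic in the mass p placed at a. The following sketch is hedged accordingly: the kernel values k(a, a) = k(b, b) = 1, k(a, b) = 0 are illustrative assumptions, and the energy is only approximated on a grid.

```python
def nth_diameter_two_points(k_aa, k_bb, k_ab, n):
    """D_n of W = {a, b} for a symmetric kernel: minimise over the number m
    of points placed at a (the other n - m points are placed at b)."""
    best = float("inf")
    for m in range(n + 1):
        r = n - m
        # Ordered pairs j != l: m(m-1) pairs at a, r(r-1) pairs at b, 2mr mixed.
        total = m * (m - 1) * k_aa + r * (r - 1) * k_bb + 2 * m * r * k_ab
        best = min(best, total / (n * (n - 1)))
    return best

def energy_two_points(k_aa, k_bb, k_ab, steps=10_000):
    """Grid approximation of w(W) = min over p in [0, 1] of
    W(p * delta_a + (1 - p) * delta_b)."""
    return min(
        p * p * k_aa + (1 - p) * (1 - p) * k_bb + 2 * p * (1 - p) * k_ab
        for p in (i / steps for i in range(steps + 1))
    )

if __name__ == "__main__":
    # Hypothetical kernel values, chosen only for illustration.
    for n in (2, 4, 20, 200):
        print(n, nth_diameter_two_points(1.0, 1.0, 0.0, n))
    print("energy ~", energy_two_points(1.0, 1.0, 0.0))
```

With these values the nth diameters increase towards 1/2, which is also the minimal energy, attained by the measure giving mass 1/2 to each point.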
With this setup the inequality D ≤ w is only simpler than in the case handled by Choquet. Whereas, however surprisingly, the equality D(K) = w(K) is still true for compact sets K but without the assumption on the diagonal values of the kernel. We will see in §5 Example 13 that even assuming the maximum prin- ciple but lacking the compactness allows the strict inequality D < w. This phenomena however may exist only in case of unbounded kernels, as we will see below. In fact, we show that if the kernel is finite on the diagonal, thenD = w holds for arbitrary sets. For this purpose, we need the following technical lemma, which shows certain inner regularity properties of D and is also interesting in itself. LEMMA 7. Assume that the kernel k is positive and finite on the diagonal, i.e., k(x, x) < +∞ for all x ∈ X. Then for an arbitrary H ⊂ X we have D(H) = inf K ⊂ H K compact D(K) = inf W ⊂ H #W < ∞ D(W ). (7) Proof. The inequality infD(K) ≤ infD(W ) is clear. For H ⊇ K the inequality D(H) ≤ D(K) is obvious, so we can assume D(H) < +∞. For ε > 0 let W = {w1, . . . , wn} be an approximate n-Fekete point set of H satisfying (4). Then D(W ) = lim Dmn(W ) ≤ lim mn(mn− 1) 1≤i′ 6=j′≤mn k(wi′ , wj′), where wi′ := . . . ′ = i+ rn, r = 0, . . . ,m− 1 . . . Set C := max{k(x, x) : x ∈ W}. So we find D(W ) ≤ lim mn(mn−1) 1≤i 6=j≤n k(wi, wj) + mn(mn−1) 1≤i≤n k(wi, wi) 1≤i 6=j≤n k(wi, wj) lim mn(mn−1) + Cn lim mn(mn−1) 1≤i 6=j≤n k(wi, wj) ≤ (Dn(H) + ε) ≤ D(H) + ε. This being true for all ε > 0, taking infimum we finally obtain W ⊂ H #W < ∞ D(W ) ≤ D(H). Clearly, if k(x, x) = +∞ for all x ∈ W with a finite set #W = n, then for all m > n we have Dm(W ) = +∞. Thus in particular for kernels with k : ∆ → {+∞}, the above can not hold in general, at least as regards the last part with finite subsets. Now, completely contrary to Choquet [4] we assume that the kernel is finite on the diagonal and prove D = w for any set. 
Hence an example of D < w (see §5 Example 13) must assume k(x, x) = +∞ at least for some point x. THEOREM 8. Assume that the kernel k is positive and is finite on the diagonal, that is k(x, x) < +∞ for all x ∈ X. Then for arbitrary sets H ⊂ X, the equality D(H) = w(H) holds. Proof. By Theorem 4 we have D(H) ≤ w(H). Hence there is nothing to prove, if D(H) = +∞. Assume D(H) < +∞, and let ε > 0 be arbitrary. By Lemma 7 we have for some n ∈ N a finite set W = {w1, w2 . . . , wn} with D(H) + ε ≥ D(W ). In view of Proposition 5 we have D(W ) ≥ w(W ), and by monotonicity also w(W ) ≥ w(H). It follows that D(H) + ε ≥ w(H) for all ε > 0, hence also the “≥” part of the assertion follows. 4. Energy and Chebyshev constant To investigate the relationship between the energy and the Cheby- shev constant the following general version of Frostman’s Equilibrium Theorem [9, Theorem 2.4] is fundamental for us. THEOREM 9 (Fuglede). Let k be a positive, symmetric kernel and K ⊂ X be a compact set such that w(K) < +∞. Every µ which has minimal energy (µ ∈ M1(K),W (µ) = w(K)) satisfy the following properties Uµ(x) ≥ w(K) for nearly every1 x ∈ K, Uµ(x) ≤ w(K) for every x ∈ supp µ, Uµ(x) = w(K) for µ-almost every x ∈ X. Moreover, if the kernel is continuous, then Uµ(x) ≥ w(K) for every x ∈ K. THEOREM 10. Let H ⊂ X be arbitrary. Assume that the kernel k is positive, symmetric and satisfies the maximum principle. Then we have Mn(H) ≤ w(H) for all n ∈ N, whence also M(H) ≤ w(H) holds true. Proof. Let n ∈ N be arbitrary. First let K be any compact set. We can assume w(K) < +∞, since otherwise the inequality holds irrespective of the value of Mn(K). Consider now an energy-minimising measure νK of K, whose existence is assured by the lower semicontinu- ity of µ 7→ k dµ dµ and the compactness of M1(K), see [9, Theorem 2.3]. By the Frostman-Fuglede theorem (Theorem 9) we have UνK (x) ≤ w(K) for all x ∈ supp νK , so V (νK) ≤ w(K), and by the maximum principle even UνK (x) ≤ w(K) for all x ∈ X. 
1 The set A of exceptional points is small in the sense w(A) = +∞. Then for all w1, . . . , wn ∈ K k(x,wj) ≤ k(x,wj) dνK(x) ≤ w(K) . Taking supremum for w1, . . . , wn ∈ K, we obtain w1,...,wn∈K k(x,wj) ≤ w(K). So Mn(K) ≤ w(K) for all n ∈ N. Next let H ⊂ X be arbitrary. In view of the last form of (1), for all ε > 0 there exists a measure µ ∈ M1(H), compactly supported in H, with w(µ) ≤ w(H) + ε. Let W = {w1, . . . , wn} ⊂ H be arbitrary and define pW (x) := i k(x,wi). Consider the compact set K := W ∪ supp µ ⊂ H. By definition of the energy, supp µ ⊂ K implies w(K) ≤ w(µ), hence w(K) ≤ w(H) + ε. Combining this with the above, we come to Mn(K) ≤ w(H) + ε. Since W ⊂ K, by definition of Mn(K) we also have pW (x) ≤ Mn(K). (8) The left hand side does not increase, if we extend the inf over the whole of H, and the right hand side is already estimated from above by w(H) + ε. Thus (8) leads to pW (x) ≤ w(H) + ε. This holds for all possible choices of W = {w1, . . . , wn} ⊂ H, hence is true also for the sup of the left hand side. By definition of Mn(H) this gives exactly Mn(H) ≤ w(H) + ε, which shows even Mn(H) ≤ w(H). Remark. In [6] it is proved that M(H) = q(H), where q(H) = inf µ∈M1(H) Uµ(x). The idea behind is a minimax theorem, see also [16, 17]. Trivially w(H) ≤ q(H) ≤ u(H). So the maximum principle implies M(H) = w(H) = q(H) = u(H). 5. Summary of the Results. Examples In this section, we put together the previous results, thus proving the equality of the three quantities being studied, under the assumption of the maximum principle for the kernel. Further, via several instruc- tive examples we investigate the necessity of our assumptions and the sharpness of the results. THEOREM 11. Assume that the kernel k is positive, symmetric and satisfies the maximum principle. Let K ⊂ X be any compact set. Then the transfinite diameter, the Chebyshev constant and the energy of K coincide: D(K) = M(K) = w(K). Proof. 
We presented a cyclic proof above, consisting of M ≥ D (Theorem 3), D ≥ w (Proposition 5) and finally w ≥ M (Theorem 10).

THEOREM 12. Assume that the kernel k is positive, finite and satisfies the maximum principle. For an arbitrary subset H ⊂ X the transfinite diameter, the Chebyshev constant and the energy of H coincide: D(H) = M(H) = w(H).

Proof. By finiteness D = w, due to Theorem 8. This with D ≤ M and M ≤ w (Theorems 3 and 10) proves the assertion.

Remark. In the above theorem, logically it would suffice to assume that the kernel be finite only on the diagonal. But if this was the case, the maximum principle would then immediately imply the finiteness of the kernel everywhere.

Let us now discuss how sharp the results of the preceding sections are. In the first example we show that, if we drop the assumption of compactness, the assertions of Theorem 3, Theorem 4 and Theorem 10 are in general the strongest possible.

Example 13. Let X = N ∪ {0} endowed with discrete topology and the kernel
$$k(n,m) := \begin{cases} +\infty & \text{if } n = m,\\ 0 & \text{if } 0 \ne n \ne m \ne 0,\\ 1 & \text{otherwise.} \end{cases}$$
The kernel is symmetric, l.s.c. and has the maximum principle. This latter can be seen by noticing that for a probability measure µ ∈ M1(X) the potential is +∞ on the support of µ. Indeed, since X is countable, all measures µ ∈ M1(X) are necessarily atomic, and if for some point ℓ ∈ X we have µ({ℓ}) > 0, then by definition $\int_X k(x, y)\,d\mu(y) = +\infty$. We calculate the studied quantities of the set H = X (also as in all the examples below). Since the kernel is positive, Dn ≥ 0. On the other hand, choosing w1 := 1, . . . , wn := n, all the values k(wi, wj) will be exactly 0, so it follows that Dn = 0, n = 1, 2, . . ., and hence D = 0. The Chebyshev constant can be estimated from below, if we compute the infimum of a suitably chosen log-polynomial. Consider the log-polynomial p(x) with all zeros placed at 0, that is with w1 = . . . = wn = 0. Then the log-polynomial p(x) is $\sum_j k(x, w_j) = n \cdot k(x, 0)$.
If x 6= 0, we have p(x) = n, which gives M ≥ 1. The upper estimate of M is also easy: suppose that in the system w1, . . . wn there are exactly m points being equal to 0 (say the first m). Then p(x) = +∞ x = w1, . . . , wn, n x = 0, x 6= w1, . . . , wn (if m = 0) m x 6= 0, x 6= w1, . . . , wn This shows for the corresponding log-polynomial inf p(x) = m, so Mn ≤ 1, whence M = 1. The energy is computed easily. Using the above reasoning on the maximum principle, we see W (µ) = +∞ for any µ ∈ M1(X), hence w(X) = +∞. Thus we have an example of +∞ = w > M > D = 0. The above example completes the case of the kernel with maximum principle. Let us now drop this assumption and look at what can happen. Example 14. Let X := {−1, 0, 1} be endowed with the discrete topol- ogy. We define the kernel by k(x, y) := 2 if 0 ≤ |x− y| < 2, 0 if 2 = |x− y|. Then k is continuous and bounded on X×X. This, in any case, implies D = w by Theorem 8. Note that k does not satisfy the maximum principle. To see this, consider, e.g., the measure µ = 1 δ1. Then for the potential Uµ one has Uµ(1) = Uµ(−1) = 1 and Uµ(0) = 2, which shows the failure of the maximum principle. To estimate the nth diameter from above, let us consider the point system {wi} of n = 2m points with m points falling at −1 and m points falling at 1, while no points being placed at 0. Then by definition of Dn := Dn(X) one can write n(n− 1) Dn ≤ 2 · 2 +m2 · 0 = Applying this estimate for all even n = 2m as n → ∞, it follows that D = lim Dn ≤ 1. (9) Next we estimate the Chebyshev constants from below by computing the infimum of some special log-polynomials. For pn(x) = k(x, 0) one has pn(x) ≡ 2 = inf pn. We thus find Mn ≥ 2 and M ≥ 2, showing M > D, as desired. Example 15. Let X := N with the discrete topology. Then X is a locally compact Hausdorff space, and all functions are continuous, hence l.s.c. on X. Let k : X ×X → [0,+∞] be defined as k(n,m) := +∞ if n = m, 2−n−m if n 6= m. Clearly k is an admissible kernel function. 
For the energy we have again w(X) = +∞, see Example 13. On the other hand let n ∈ N be any fixed number, and compute the nth diameter Dn(X). Clearly if we choose wj := m+ j, with m a given (large) number to be chosen, then we get Dn(H) ≤ (n− 1)n 1≤i 6=j≤n 2−i−j−2m ≤ (n− 1)n ≤ 2−2m , hence we find that the nth diameter is Dn(X) = 0, so D(X) = 0, too. For any log-polynomial p(x) we have inf p(x) = limx→∞ p(x) = 0, hence M(X) = 0. That is we have D(X) = M(X) = 0 < w(X) = +∞. The example shows how important the diagonal, excluded in the definition of D but taken into account in w, may become for particular cases. We can even modify the above example to get finite energy. Example 16. Let X := (0, 1], equipped with the usual topology, and let xn = 1/n. We take now k(x, y) := +∞ if x = y, 2−n−m if x = xn and y = xm (xn 6= xm), − log |x− y| otherwise Compared to the l.s.c. logarithmic kernel, this k assumes different, smaller values at the relatively closed set of points {(xn, xm) : n 6= m} ⊂ X ×X only, hence it is also l.s.c. and thus admissible as kernel. If a measure µ ∈ M1(X) has any atom, say if for some point z ∈ X we have µ({z}) > 0, then by definition X k(x, y) dµ(y) = +∞, hence also w(µ) = +∞. Since for all µ ∈ M1(X) with any atomic component w(µ) = +∞, we find that for the set H := X we have w(H) := inf µ∈M1(H) w(µ) = inf µ∈M1(H) µ not atomic w(µ). But for measures without atoms, the countable set of the points xn are just of measure zero, hence the energy equals to the energy with respect to the logarithmic kernel. Thus we conclude w(H) = e−cap(H) = e−1/4, as cap((0, 1]) = 1/4 is well-known. On the other hand if n ∈ N is any fixed number, we can compute the nth diameter Dn(H) exactly as above in Example 15. Hence it is easy to see that Dn(H) = 0, whence also D(H) = 0. Similarly, we find M(H) = 0, too. This example shows that even in case w(H) < +∞ we can have w(H) > D(H) = M(H). 6. 
Average distance number and the maximum principle

In the previous section, we showed the equality of the Chebyshev constant M and the transfinite diameter D, using essentially elementary inequalities and the only theoretically deeper ingredient, the assumption of the maximum principle. We have also seen examples showing that the lack of the maximum principle for the kernel allows strict inequality between M and D. These observations testify to the relevance of this principle in our investigations. Indeed, in this section we show the necessity of the maximum principle in case of continuous kernels for having M(K) = D(K) for all compact sets K. We need some preparation first.

Recall from the introduction the notion of the average distance (or rendezvous) number. Actually, a more general assertion than there can be stated, see Stadje [21] or [6]. For a compact connected set K and a continuous, symmetric kernel k, the average distance number r(K) is the uniquely existing number with the property that for every probability measure µ supported in K there is a point x ∈ K with Uµ(x) = ∫_K k(x, y) dµ(y) = r(K). This can be even further generalised by dropping the connectedness, see Thomassen [22] and [6]. Even for not necessarily connected but compact spaces K with symmetric, continuous kernel k there is a unique number r(K) with the property that whenever a probability measure µ on K and a positive ε are given, there are points x1, x2 ∈ K such that Uµ(x1) − ε ≤ r(K) ≤ Uµ(x2) + ε. This number is called the (weak) average distance number, and is particularly easy to calculate when a probability measure with constant potential is available. Such a measure µ is then called an invariant measure. In this case the average distance number r(K) is trivially just the constant value of the potential Uµ, see Morris, Nickolas [14] or [7]. It was proved in [7] that one always has M(K) = r(K), so once we have an invariant measure, the Chebyshev constant is again easy to determine.
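On a finite space the potential Uµ(x) reduces to a finite sum over the points y of k(x, y) µ({y}), so checking whether a given measure is invariant is a short computation. The following Python sketch is purely illustrative: the helper names `potential` and `is_invariant` are hypothetical, and the input is the three-point kernel of Example 14 (k = 2 when |x − y| < 2 and k = 0 when |x − y| = 2), used here only as test data.

```python
from typing import List, Sequence


def potential(kernel: Sequence[Sequence[float]], mu: Sequence[float]) -> List[float]:
    """Return U^mu(x_i) = sum_j k(x_i, x_j) * mu({x_j}) for each point x_i."""
    return [sum(k_ij * m_j for k_ij, m_j in zip(row, mu)) for row in kernel]


def is_invariant(kernel: Sequence[Sequence[float]], mu: Sequence[float],
                 tol: float = 1e-12) -> bool:
    """True if mu is a probability vector whose potential is constant."""
    if abs(sum(mu) - 1.0) > tol or min(mu) < 0:
        return False
    u = potential(kernel, mu)
    return max(u) - min(u) <= tol


# Kernel of Example 14 on X = {-1, 0, 1} (illustrative input only).
POINTS = [-1, 0, 1]
KERNEL = [[2.0 if abs(x - y) < 2 else 0.0 for y in POINTS] for x in POINTS]

half_half = [0.5, 0.0, 0.5]   # (1/2) delta_{-1} + (1/2) delta_{1}
delta_zero = [0.0, 1.0, 0.0]  # Dirac measure at 0

print(potential(KERNEL, half_half), is_invariant(KERNEL, half_half))
print(potential(KERNEL, delta_zero), is_invariant(KERNEL, delta_zero))
```

In this toy case the Dirac measure at 0 has constant potential 2, so the average distance number of the three-point space is 2, which is consistent with the bound M ≥ 2 obtained in Example 14.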
Also the Wiener energy w(K) has connection to invariant measures, as shown by the following result, which is a simplified version of a more general statement from [7], see also Wolf [23].

THEOREM 17. Let ∅ ≠ K ⊂ X be a compact set and k be a continuous, symmetric kernel. Then we have r(K) ≥ w(K). Furthermore, if r(K) = w(K), then there exists an invariant measure in M1(K).

As mentioned above, we have r(K) = M(K), so the inequality r(K) ≥ w(K) in the first assertion of the above theorem is also the consequence of Theorems 3 and 8. For the proof of the second assertion one can use the Frostman-Fuglede Equilibrium Theorem 9 with the obvious observation that “nearly every” in this context means indeed “every”. Actually any probability measure µ ∈ M1(K) which minimises ν ↦ sup_K Uν is an invariant measure and its potential is constant M(K), see [7, Thm. 5.2] (such measures undoubtedly exist because of compactness of M1(K)). Henceforth we will interchangeably use the terms energy minimising or invariant for expressing this property of measures.

THEOREM 18. Suppose that the kernel k is symmetric and continuous. If M(K) = D(K) for all compact sets K ⊆ X, then the kernel has the maximum principle.

Proof. Recall from Corollary 6 that D(K) = w(K) for all K ⊆ X compact. So we can use Theorem 17 throughout the following arguments. We first prove the assertion in the case when X is a finite set. The proof is by induction on n = #X. For n = 1 the assertion is trivial. Let now #X = 2, X = {a, b}. Assume without loss of generality that k(a, a) ≤ k(b, b). Then we only have to prove that for µ = δa the maximum principle, i.e., the inequality k(a, b) ≤ k(a, a) holds. To see this we calculate M(X) and D(X). We certainly have D(X) ≤ k(a, a). On the other hand for an energy minimising probability measure νp := pδa + (1 − p)δb on X we know that its potential is constant over X, hence pk(a, a) + (1 − p)k(b, a) = pk(a, b) + (1 − p)k(b, b) = M(X) = D(X) ≤ k(a, a).
Here if p = 1, then k(a, a) = k(a, b). If p < 1, then we can write (1− p)k(b, a) ≤ (1− p)k(a, a), hence k(b, a) ≤ k(a, a), so the maximum principle holds. Assume now that the assertion is true for all sets with at most n elements and for all kernels, and let #X = n + 1. For a probability measure µ on X we have to prove supx∈X U µ(x) = supx∈ supp µ U µ(x). If supp µ = X, then there is nothing to prove. Similarly, if there are two distinct points x1 6= x2, x1, x2 ∈ X \ supp µ, then by the induction hypothesis we have x∈X\{x1} Uµ(x) = sup x∈ supp µ Uµ(x) = sup x∈X\{x2} Uµ(x). So for a probability measure µ defying the maximum principle we must have # supp µ = n, say supp µ = X \ {xn+1}; let µ be such a measure. Set K = supp µ and let µ′ be an invariant measure on K. We claim that all such measures µ′ are also violating the maximum principle. If µ = µ′, we are done. Assume µ 6= µ′ and consider the linear combinations µt := tµ+(1− t)µ ′. There is a τ > 1, for which µτ is still a probability measure and supp µτ ( supp µ. By the inductive hypothesis (as # supp µτ < n) we have U µτ (xn+1) ≤ U µτ (a) for some a ∈ supp µτ . We also know that U µ(xn+1) = U µ1(xn+1) > U µ1(a). Hence for the linear function Φ(t) := Uµt(xn+1) − U µt(a) we have Φ(1) > 0 and also Φ(τ) ≤ 0 (τ > 1). This yields Φ(0) > 0, i.e., (xn+1) = U µ0(xn+1) > U µ0(a) = Uµ (y) for all y ∈ K. We have therefore shown that all energy minimising (invariant) measures on K must defy the maximum principle. Let now ν be an invariant measure on X. We have M(X) = Uν(y) = sup Uν(x) = D(X) ≤ D(K) = sup (x) = Uµ (z) < Uµ (xn+1) for all y ∈ X, z ∈ K. Thus we can conclude Uν(y) ≤ Uµ (y) for all y ∈ X and even “<” for y = xn+1. Integrating with respect to ν would yield k dν dν = M(X) < k dµ′ dν = k dν dµ′ = M(X), hence a contradiction, unless ν({xn+1}) = 0. If ν({xn+1}) = 0 held, then ν would be an energy minimising measure on K. 
This is because obviously supp ν ⊆ K holds, and the potential of ν is constant M(X) over K, so M(X) = k dν dµ′ = k dµ′ dν = M(K) holds. As we saw above, then ν would not satisfy the maximum principle, a contradiction again, since the potential of ν is constant on X. The proof of the case of finite X is complete. We turn now to the general case of X being a locally compact space with continuous kernel. Let µ be a compactly supported probability measure on X and y 6∈ supp µ. Set K = supp µ and note that both M1(K) ∋ ν 7→ supK U ν and ν 7→ Uν(y) are continuous mappings with respect to the weak∗-topology on M1(K). If supK U µ < Uµ(y) were true, we could therefore find, by a standard approximation argument, see for example [6, Lemma 3.8], a finitely supported probability measure µ′ on K for which x∈ supp µ′ (x) ≤ sup (x) < Uµ This is nevertheless impossible by the first part of the proof, thus the assertion of the theorem follows. Acknowledgement The authors are deeply indebted to Szilárd Révész for his insightful suggestions and for the motivating discussions. References 1. Anagnostopoulos, V. and Sz. Gy. Révész: 2006, ‘Polarization constants for products of linear functionals over R2 and C2 and Chebyshev constants of the unit sphere’. Publ. Math. Debrecen 68(1–2), 75–83. 2. Bourbaki, N.: 1965, Intégration, Éléments de Mathématique XIII., Vol. 1175 of Actualités Sci. Ind. Paris: Hermann, 2nd edition. 3. Carleson, L.: 1967, Selected Problems on Exceptional Sets, Vol. 13 of Van Nostrand Mathematical Studies. D. Van Nostrand Co., Inc. 4. Choquet, G.: 1958/59, ‘Diamètre transfini et comparaison de diverses ca- pacités’. Technical report, Faculté des Sciences de Paris. 5. Farkas, B. and Sz. Gy. Révész: 2005, ‘Rendezvous numbers in normed spaces’. Bull. Austr. Math. Soc. 72, 423–440. 6. Farkas, B. and Sz. Gy. Révész: 2006a, ‘Potential theoretic approach to rendezvous numbers’. Monatshefte Math 148, 309–331. 7. Farkas, B. and Sz. Gy. 
Révész: 2006b, ‘Rendezvous numbers of metric spaces – a potential theoretic approach’. Arch. Math. (Basel) 86, 268–281. 8. Fekete, M.: 1923, ‘Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzahligen Koeffizienten’. Math. Z. 17, 228–249. 9. Fuglede, B.: 1960, ‘On the theory of potentials in locally compact spaces’. Acta Math. 103, 139–215. 10. Fuglede, B.: 1965, ‘Le théorème du minimax et la théorie fine du potentiel’. Ann Inst. Fourier 15, 65–87. 11. Gross, O.: 1964, ‘The rendezvous value of a metric space’. Ann. of Math. Stud. 52, 49–53. 12. Landkof, N. S.: 1972, Foundations of modern potential theory, Vol. 180 of Die Grundlehren der mathematischen Wissenschaften. New York, Heidelberg: Springer. 13. Mhaskar, H. N. and E. B. Saff: 1992, ‘Weighted analogues of capacity, transfinite diameter and Chebyshev constants’. Constr. Approx. 8(1), 105–124. 14. Morris, S. A. and P. Nickolas: 1983, ‘On the average distance property of compact connected metric spaces’. Arch. Math. 40, 459–463. 15. Ohtsuka, M.: 1961, ‘On potentials in locally compact spaces’. J. Sci. Hiroshima Univ. ser A 1, 135–352. 16. Ohtsuka, M.: 1965, ‘An application of the minimax theorem to the theory of capacity’. J. Sci. Hiroshima Univ. ser A 29, 217–221. 17. Ohtsuka, M.: 1967, ‘On various definitions of capacity and related notions’. Nagoya Math. J. 30, 121–127. 18. Pólya, Gy. and G. Szegő: 1931, ‘Über den transfiniten Durchmesser (Ka- pazitätskonstante) von ebenen und räumlichen Punktmengen’. J. Reine Angew. Math. 165, 4–49. 19. Révész, Sz. Gy. and Y. Sarantopoulos: 2004, ‘Plank problems, polarization, and Chebyshev constants’. J. Korean Math. Soc. 41(1), 157–174. 20. Saff, E. B. and V. Totik: 1997, Logarithmic potentials with external fields, Vol. 316 of Grundlehren der Mathematischen Wissenschaften. Springer, Berlin. 21. Stadje, W.: 1981, ‘A property of compact, connected spaces’. Arch. Math. 36, 275–280. 22. 
Thomassen, C.: 2000, ‘The rendezvous number of a symmetric matrix and a compact connected metric space’. Amer. Math. Monthly 107(2), 163–166. 23. Wolf, R.: 1997, ‘On the average distance property and certain energy integrals’. Ark. Mat. 35, 387–400. 24. Zaharjuta, V. P.: 1975, ‘Transfinite diameter, Chebishev constants, and capacity for compacta in Cn’. Math. USSR Sbornik 25(3), 350–364. ABSTRACT We study the relationship between transfinite diameter, Chebyshev constant and Wiener energy in the abstract linear potential analytic setting pioneered by Choquet, Fuglede and Ohtsuka. It turns out that, whenever the potential theoretic kernel has the maximum principle, then all these quantities are equal for all compact sets. For continuous kernels even the converse statement is true: if the Chebyshev constant of any compact set coincides with its transfinite diameter, the kernel must satisfy the maximum principle. An abundance of examples is provided to show the sharpness of the results. <|endoftext|><|startoftext|> Introduction Event logs have been widely used to analyze the error/failure behavior of computer-based systems and to estimate their dependability. Event logs include a large amount of information about the occurrence of various types of events that are collected concurrently with normal system operation, and as such reflect actual workload and usage. Some of the events are informational and are issued from the normal activity of the target systems, whereas others are recorded when errors and failures affect local or distributed resources, or are related to system shutdown and start-up. The latter events are particularly useful for dependability analysis. Computer system dependability analysis based on event logs has been the focus of several published papers [1, 2, 4, 5, 7, 8, 9]. Various types of systems have been studied (Tandem, VAX/VMS, Unix, Windows NT, Windows 2000, etc.) including mainframes and largely deployed commercial systems. 
The issues addressed in these studies cover a large spectrum, including the development of techniques and methodologies for the extraction of relevant information from the event logs, the identification of error patterns, their causes and their effects, and the statistical assessment of dependability measures such as failure and recovery rates, reliability and availability. It is widely recognized that such event log based dependability analyses provide useful feedback to software and system designers. Nevertheless, it is important to note that the results obtained are intimately related to the quality and the accuracy of the data recorded in the logs. The study reported in [1] points out various problems that might affect the data included in the event logs and make incorrect conclusions likely, considering as an example the VAX/VMS system. Thus, extreme care is needed to identify deficiencies in the data and to prevent them from leading to incorrect conclusions. In this paper, we show that similar problems can be observed in the event logs maintained by the SunOS/Solaris Unix operating system, and we present a novel approach that aims to address such problems and to improve the dependability estimates based on such event logs. These results are illustrated using field data collected during a 4-year period from 373 SunOS/Solaris Unix workstations and servers interconnected through a LAN. The data corresponds to event logs recorded via the syslog daemon. In particular, we use the /var/adm/messages log files. We focus on the evaluation of machine uptimes, downtimes and availability based on the identification of failures that caused a total service interruption of the machine. In this study, we show that relying only on the information recorded in the /var/adm/messages log files may lead to dependability estimations that do not faithfully reflect reality, due to incomplete or imperfect data recorded in the corresponding logs.
For the estimation of these measures, we start with the assumption that machine failures can be identified by the last events recorded in the event log before the machine goes down and then is rebooted. This assumption was considered in the study reported in [3]. However, the validity of this assumption is questionable in the following situations: 1) the machine has a real activity between the last event logged and the reboot without generating events in the logs, 2) the time when the failure occurs is earlier than the timestamp of the last event logged on the machine. To address these problems and to obtain more realistic estimations, we propose a solution based on the use of additional information obtained from wtmpx Unix files, as well as data characterizing the state of the machines included in the data collection, recorded on a regular basis during the data collection procedure. The results clearly show that the combined use of this additional information and syslogd log files has a significant impact on the estimations. To our knowledge, the approach discussed in this paper and the corresponding results have not been addressed in the previous studies published on the exploitation of syslogd log files for the dependability analysis of Unix based systems, including our paper.

The rest of the paper is structured into 5 sections. Section 2 describes the event logging mechanism in Unix and the data collection procedure that we have used in our study. Section 3 presents the dependability measures that we have considered and discusses different approaches and assumptions to estimate them from the collected data. Section 4 presents some results illustrating the benefits of the proposed approach, as well as various statistics characterizing the dependability of the Unix systems considered in our study.

2. Event logging and data collection

For the Unix operating system, the event logging mechanism is implemented by the syslog daemon (denoted as syslogd).
Running as a background process, this daemon listens for the events generated by different sources: kernel, system components (disk, memory, network interfaces), daemons and applications that are configured to communicate with syslogd. These events inform about the normal activity of the system as well as its behavior under the occurrence of errors and failures including reboot and shutdown events. The configuration file /etc/syslog.conf specifies the destination of each event received by syslogd, depending on its severity level and its origin. The destination could be one or several log files, the administration console or the operator. The events that are relevant to our study are generally stored in the /var/adm/messages log file. Each message stored in a log file refers to an event that occurred on the system due to the local activity or its interaction with other systems on the network. It contains the following information: the date and time of the event, the machine name on which the event is logged and a description of the message. An example of an event recorded in the log file is given below: Mar 2 10:45:12 elgar automountd[124]: server mahler not responding The SunOS/Solaris Unix operating system limits the size of the log files. Generally, only the log files corresponding to the last 5 weeks of activity are kept. It is necessary to set up a data collection strategy in order to archive a large amount of data. This is essential to obtain representative results for the dependability measures characterizing the monitored systems. In our study, we have included all the SunOS/Solaris machines connected through the LAAS local area network, excluding those used for experimental testbeds or maintenance activities. We have developed a data collection strategy to automatically collect the /var/adm/messages log files stored on these machines. 
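The message layout described above (date and time, machine name, description) can be split with a short regular expression. The following Python sketch is only an illustration and not the parser used in the study: the names `LogEvent` and `parse_syslog_line` are hypothetical, and because the logged timestamp carries no year, the `year` argument (set arbitrarily to 2002 in the usage line) is an assumption that the caller has to supply.

```python
import re
from datetime import datetime
from typing import NamedTuple

# "Mon D HH:MM:SS host message"; the day may be padded with extra spaces.
_LINE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")


class LogEvent(NamedTuple):
    timestamp: datetime
    host: str
    message: str


def parse_syslog_line(line: str, year: int) -> LogEvent:
    """Split one /var/adm/messages line into timestamp, host and message."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    stamp, host, message = match.groups()
    stamp = " ".join(stamp.split())  # collapse runs of spaces before parsing
    timestamp = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return LogEvent(timestamp, host, message)


# The example event quoted in the text; the year is an arbitrary assumption.
event = parse_syslog_line(
    "Mar 2 10:45:12 elgar automountd[124]: server mahler not responding",
    year=2002,
)
print(event.host, "|", event.message)
```

Keeping the year outside the parser makes the missing information explicit: an archive that spans several years has to derive it from context, for example from the date of the collection campaign.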
This strategy takes into account the frequent evolution of the network configuration during the observation period in terms of variation of the number of connected systems, updates or changes of the operating system versions, modification of software configurations, etc. A shell script executed each week via the cron mechanism implements the strategy: it remotely copies the log files from each system included in the study and archives them on a dedicated machine. After each data collection campaign, a text file (named DCSummary) containing a summary of the data collection campaign is created. This summary indicates the status of each machine included in the campaign and how the collection of the corresponding log file has been done. For each machine, the status information reported in the summary is one of the following:
• alive_OK: the machine is alive and the copy of its log file succeeded;
• alive_KO: the machine is alive but the copy of its log file failed. For this case, a description of the failure symptom and cause is also included: shell problem, connection ended by tiers, etc.;
• no_answer: the machine did not answer a ping request before expiration of the default timeout period.
The information included in the DCSummary file is used to verify each data collection campaign and solve the problems that may appear during the collection. It is also useful to improve the accuracy of dependability measures estimation (see Section 3.2). More detailed information about the syslogd mechanism and the data collection strategy is reported in [6].

3. Dependability measures estimation and assumptions

Various types of dependability analyses can be carried out based on the information contained in the log files and several quantitative measures can be considered to characterize the dependability of the target machines: machine uptimes and downtimes, reliability, availability, failure and recovery rates, etc.
In order to evaluate these measures, it is necessary to identify from the log files the failure occurrences and the corresponding service degradation durations. Such a task is tedious and requires the development of heuristics and predefined failure criteria. An example of such analysis is reported in [7]. In our study, we have focused on the availability analysis of the individual machines included in the data collection. In this context, we have considered machine failures leading to a total interruption of the service delivered to the users, followed by a reboot. The time between the failure occurrence and the end of the reboot corresponds to the total service interruption period of the system. Apart from these periods, the system is considered to be in the normal functioning state where it delivers an appropriate service to the users. In order to evaluate the availability of the machines included in the study, we need to estimate for each machine the corresponding uptimes (denoted as UTi) and downtimes (DTi), based on the information recorded in the event logs. Each downtime value DTi corresponds to the total service interruption period associated with the i-th failure. It is composed of the service degradation period due to the failure occurrence and the reboot period. Each uptime value corresponds to the period between two successive downtimes. Using the uptime and downtime estimates for each machine j, we can evaluate the corresponding availability (noted Aj) and the unavailability (noted UAj). These measures are computed with the following formulas:

Aj = Σi UTi ⁄ Σi (UTi + DTi) and UAj = 1 − Aj (1)

3.1. Machine uptimes and downtimes estimation

The estimation of machine uptimes and downtimes is carried out in two steps:
1) Identification of machine reboots and their duration.
2) Identification of failures associated to each reboot and of the corresponding service interruption period.
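Formula (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tooling used in the study: the function name `availability` and the representation of the uptimes and downtimes of one machine as two lists of durations in seconds are hypothetical, and the sample values are invented for the example.

```python
from typing import Sequence, Tuple


def availability(uptimes: Sequence[float],
                 downtimes: Sequence[float]) -> Tuple[float, float]:
    """Return (A_j, UA_j) for one machine, following formula (1).

    uptimes and downtimes hold the durations UT_i and DT_i in a common
    unit (seconds here); this list representation is an assumption.
    """
    total = sum(uptimes) + sum(downtimes)
    if total <= 0:
        raise ValueError("total observed time must be positive")
    a_j = sum(uptimes) / total
    return a_j, 1.0 - a_j


# Invented data: three uptime periods, each followed by a downtime.
a_j, ua_j = availability([86_400.0, 172_800.0, 43_200.0],
                         [600.0, 1_200.0, 300.0])
print(f"A_j = {a_j:.5f}, UA_j = {ua_j:.5f}")
```

Summing the durations before dividing, instead of averaging per-period ratios, weights long periods more heavily, which is what formula (1) prescribes.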
To identify the occurrence of machine reboots and their duration, we have developed an algorithm based on the sequential parsing and matching of each event recorded in the system log files to specific patterns or sequences of patterns characterizing the occurrence of reboots. Indeed, whereas some reboots can be explicitly identified by a “reboot” or a “shutdown” event, many others can be detected only by identifying the sequence of the initialization events that are generated by the system when it is restarted. The algorithm is described in [4, 6]. It gives, for each reboot i identified in the event logs and for each machine, the timestamp of the reboot start (dateSBi), the timestamp of the reboot end (dateEBi) and the associated service interruption duration. The identification of the timestamp of the failure associated to each reboot and the corresponding service interruption period is more problematic. In the study reported in [3], it was assumed that the timestamp of the last event recorded before the reboot (denoted as dateEBRi) identifies the failure occurrence time. With this assumption, each uptime UTi and downtime DTi can be evaluated as follows: UTi = dateEBRi – dateEBi-1 and DTi = dateEBi - dateEBRi (2) where i is the index of the current reboot, i-1 the index of the previous reboot. The consideration of EBR for the estimation of UTi and DTi parameters may not be realistic in the following situations (denoted as S1 and S2): S1) The system could be in a normal functioning state during a period of time between EBR and the following reboot although it does not generate any event into the log files during that period. S2) The beginning of the service interruption period for the users could be prior to the timestamp of the EBR event. 
This happens for instance when a critical failure affects the machine in such a way that it becomes completely unusable to the users, without preventing the event logging mechanisms from recording some messages into the log files. A careful analysis of the data collected during our study revealed that the above situations are common. To address this problem and to improve downtime and uptime estimation accuracy, it is necessary to use auxiliary data that provides complementary information on the activity of the target machines. In this paper, we present a solution based on the correlation of data collected from the /var/adm/messages log files, with data issued from wtmpx files also maintained by the SunOS/Solaris operating system. We also use the information recorded in the DCSummary file (see Section 2). The following section presents the method developed to extract the data from the wtmpx file and how we used this data to adjust the estimation of machine uptimes and downtimes. 3.2. Uptime and downtime estimations refinement 3.2.1. wtmpx files. The SunOS/Solaris Unix operating system records into the /var/adm/wtmpx binary file information identifying the users login/logout. Through the pseudo-user reboot it also records information on the system reboots. The wtmpx file is organized into records (named also entries) with a fixed size. Each record has the format of a data structure with the following fields: • the user login name: “user”; • the id associated to the current record in the /etc/inittab file: “init_id”; • the device name (console, lnxx): “device”; • the process id: “pid”; • the record type: “proc_type”; • the exit status for a process marked as DEAD_PROCESS: “exit_status” and “term_status”; • the timestamp of the record: “date”; • the session id: “session_id”; • the length of the machine’s name: “length”; • the machine’s name used by the user to connect, if it is a remote one: “host”. 
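The record fields listed above can be mirrored in a small data structure. The Python sketch below is hypothetical: it does not read the binary /var/adm/wtmpx file, whose on-disk layout is not specified here, but parses a text rendering in the `key=value` style of the records shown in Figure 1. The names `WtmpxRecord` and `parse_record`, the regular expression, and the choice to keep only a subset of the fields are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# key=value pairs; a value may contain spaces (e.g. "device=system boot"),
# so it extends lazily up to the next " key=" or to the end of the line.
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


@dataclass
class WtmpxRecord:
    date: datetime
    user: str
    host: str
    device: str
    pid: int
    proc_type: int


def parse_record(line: str) -> WtmpxRecord:
    """Parse one text record of the assumed form '<date> user=... host=...'."""
    stamp, _, rest = line.strip().partition(" user=")
    fields = dict(_PAIR.findall("user=" + rest))
    return WtmpxRecord(
        date=datetime.strptime(stamp.strip(), "%Y %b %d %H:%M:%S"),
        user=fields.get("user", ""),
        host=fields.get("host", ""),
        device=fields.get("device", ""),
        pid=int(fields.get("pid") or 0),
        proc_type=int(fields.get("proc_type") or 0),
    )


record = parse_record(
    "2001 Nov 6 16:41:39 user= host= length=0 init_id= "
    "device=system boot pid=0 proc_type=2 term_status=0"
)
print(record.date.isoformat(), repr(record.device), record.proc_type)
```

A record whose device field reads "system boot" is the kind of entry that marks a reboot in this rendering, so filtering parsed records on that field is one simple way to list reboot timestamps.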
We developed a specific algorithm that collects the wtmpx file of each machine included in the study on a regular basis and processes the binary file to extract the information that is relevant to our study. The results of the algorithm are kept in a separate file for each machine. Figure 1 presents examples of records obtained for a machine of our network. The first two records show that the root user connected to the local system from the system named cubitus on November 6, 2001 at 16h 37mn 41s, using the rlogin command. The next records indicate the occurrence of a reboot event about 3 minutes later. The third record shows that this reboot was done via a shutdown command, probably executed by the root user. The sequence of records corresponding to a reboot event is much longer than this example. The whole sequence is not presented in Figure 1; the aim of the illustration is to show some examples of records as extracted from wtmpx files by our algorithm.
In the following, we outline the approach that we developed to use the information extracted from the wtmpx files together with the information from the DCSummary files in order to refine the uptime and downtime estimations, considering situations S1 and S2 discussed in Section 3.1.
2001 Nov 6 16:37:41 user=.rlogin host= length=0 init_id=r100 device=/dev/pts/1 pid=25220 proc_type=6 term_status=0
2001 Nov 6 16:37:41 user=root host=cubitus length=8 init_id=r100 device=/dev/pts/1 pid=25220 proc_type=7 term_status=0
2001 Nov 6 16:40:35 user=shutdown host= length=0 init_id= device=~ pid=0 proc_type=0 term_status=0 exit_status=0
2001 Nov 6 16:41:39 user= host= length=0 init_id= device=system boot pid=0 proc_type=2 term_status=0
2001 Nov 6 16:42:09 user= host= length=0 init_id= device=run-level 3 pid=0 proc_type=1 term_status=0
Figure 1. Examples of records from /var/adm/wtmpx obtained with our algorithm
3.2.2. Situation S1: an operational activity exists between EBR and SB events.
The detailed analysis of the data collected from the log files, and its comparison with the information extracted from the wtmpx files, showed that real activity quite often exists between the last event recorded before a reboot (EBR) and the event identifying the start of the following reboot (SB event) in the /var/adm/messages log files. This situation occurs when the machine functions normally but its activity does not produce any message in the log file maintained by the syslogd daemon. The cause could be that the applications or services run by the users are not configured to communicate with the syslogd daemon.
To better understand this case, Figure 2 gives an example of a sequence of events characterizing the state of the corresponding system, taking into account the information extracted from the /var/adm/messages, wtmpx and DCSummary files. For each event, we indicate the timestamp when it is logged, a short description and the source file from which the event is extracted. For wtmpx events, we present only the fields that are useful to identify the system activity; the other fields are not significant for this analysis.
For this example, the events recorded in the /var/adm/messages log file suggest that the system had no activity between December 8 at 18:06 (EBR event) and December 9 at 15:30, the timestamp of the reboot start. However, the analysis of the DCSummary and wtmpx files shows that the system was actually active between the EBR and SB events. In fact, we see that the data collection campaign was successfully carried out on December 9 at 6:43.
Event date | Event description | File where the event is logged
2002 Dec 8 18:06:08 | last event before reboot | var/adm/messages log file
2002 Dec 9 06:43:34 | alive_ok | DCSummary
2002 Dec 9 13:18:45 | user=UserC; device=pts/0; pid=2362; proc_type=7 | wtmpx
2002 Dec 9 13:35:21 | user=UserB; device=pts/1; pid=2379; proc_type=7 | wtmpx
2002 Dec 9 13:47:57 | user=UserB; device=pts/1; pid=2379; proc_type=8 | wtmpx
2002 Dec 9 13:48:48 | user=UserA; device=pts/1; pid=2434; proc_type=7 | wtmpx
2002 Dec 9 15:18:46 | user=UserA; device=pts/1; pid=2434; proc_type=8 | wtmpx
2002 Dec 9 15:29:20 | user=UserB; device=console; pid=2644; proc_type=7 | wtmpx
2002 Dec 9 15:29:25 | user=UserB; device=console; pid=338; proc_type=8 | wtmpx
2002 Dec 9 15:29:25 | user=UserB; device=console; pid=2644; proc_type=8 | wtmpx
2002 Dec 9 15:29:27 | user=LOGIN; device=console; pid=2742; proc_type=6 | wtmpx
.................. | .................. | ..................
2002 Dec 9 15:29:52 | user=troot; device=console; pid=334; proc_type=7 | wtmpx
2002 Dec 9 15:30:52 | user=sac; device=; pid=333; proc_type=8 | wtmpx
2002 Dec 9 15:30:52 | user=troot; device=console; pid=334; proc_type=8 | wtmpx
2002 Dec 9 15:30:52 | user=; device=run-level 6; pid=0; proc_type=1 | wtmpx
2002 Dec 9 15:30:52 | user=rc6; device=; pid=2899; proc_type=5 | wtmpx
2002 Dec 9 15:30:53 | reboot start | var/adm/messages log file
Figure 2. Example illustrating situation S1
Moreover, the records from the wtmpx file show, for example, that UserA used the system on December 9 between 13:48 (proc_type field equal to 7, meaning that the process with pid=2434 started at the time of this record) and 15:18 (proc_type=8, meaning that the same process ended at the time of this record), corresponding to a utilization period of nearly one hour and a half. In this situation, the EBR event as defined earlier does not correspond to the beginning of the total service interruption period. Thus, the downtime estimated using the assumption discussed in Section 3.1 does not faithfully reflect the real service interruption period. Based on the correlation of the information provided by the three data source files, a refined and more accurate estimation of machine downtimes and uptimes can be obtained.
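One plausible reading of the S1 case is that the failure time moves from the EBR event to the latest sign of activity found in any of the three sources before the reboot starts, after which equation (2) of Section 3.1 is applied unchanged. The sketch below follows that reading with timestamps taken from Figure 2. The function names and the two reboot end times are hypothetical, since Figure 2 does not give them.

```python
from datetime import datetime


def refined_failure_time(reboot_start, activity_times):
    """Latest activity timestamp, from any source, that precedes the reboot start."""
    earlier = [t for t in activity_times if t < reboot_start]
    return max(earlier)


def uptime_downtime(prev_reboot_end, failure_time, reboot_end):
    """Equation (2), with failure_time used in place of dateEBR_i."""
    return failure_time - prev_reboot_end, reboot_end - failure_time


# Timestamps taken from Figure 2.
ebr = datetime(2002, 12, 8, 18, 6, 8)             # last event in /var/adm/messages
reboot_start = datetime(2002, 12, 9, 15, 30, 53)  # SB event
other_activity = [
    datetime(2002, 12, 9, 6, 43, 34),    # alive_ok in DCSummary
    datetime(2002, 12, 9, 15, 18, 46),   # UserA process end in wtmpx
    datetime(2002, 12, 9, 15, 30, 52),   # last wtmpx record before SB
]

# Hypothetical values: the reboot end times are not shown in Figure 2.
prev_reboot_end = datetime(2002, 12, 1, 9, 0, 0)
reboot_end = datetime(2002, 12, 9, 15, 35, 0)

failure = refined_failure_time(reboot_start, [ebr] + other_activity)
ut_before, dt_before = uptime_downtime(prev_reboot_end, ebr, reboot_end)
ut_after, dt_after = uptime_downtime(prev_reboot_end, failure, reboot_end)
print("downtime before refinement:", dt_before)
print("downtime after refinement: ", dt_after)
```

With these inputs the downtime shrinks and the uptime grows by the same amount, which is the direction of change expected for S1.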
The refinement consists in associating the failure occurrence time with the timestamp of the last event recorded before the reboot, based on the information contained in the /var/adm/messages, wtmpx and DCSummary files.
3.2.3. Situation S2: the service interruption period starts before the EBR event.
This situation occurs when critical failures affect the system in such a way that it becomes completely unusable, without preventing the event logging mechanisms from recording some messages into the log files. During the recovery phase, the actions performed by the system administrators may include several unsuccessful reboot attempts that are not recorded in the /var/adm/messages log file, but some events referring to them are written in the wtmpx file. Using this information, just like in the previous case, we can refine the downtime and uptime estimations by associating the failure occurrence time with the timestamps of the events recorded in the wtmpx file that better reflect the start of the service interruption. An example of a sequence of events illustrating this case is given in Figure 3.
Event # | Event date | Event description | File where the event is logged
1 | 2003 Jan 9 10:18:59 | user=root; device=console; pid=2370; proc_type=7 | wtmpx
2 | 2003 Jan 9 10:21:39 | user=sac; device=; pid=425; proc_type=8 | wtmpx
3 | 2003 Jan 9 10:21:39 | user=root; device=console; pid=2370; proc_type=8 | wtmpx
4 | 2003 Jan 9 10:21:39 | user=; device=run-level 5; pid=0; proc_type=1 | wtmpx
5 | 2003 Jan 9 10:21:39 | user=rc5; device=; pid=25952; proc_type=5 | wtmpx
6 | 2003 Jan 9 10:21:48 | user=UserC; device=pts/3; pid=11584; proc_type=8 | wtmpx
7 | 2003 Jan 9 10:21:48 | user=UserC; device=pts/1; pid=11359; proc_type=8 | wtmpx
8 | 2003 Jan 9 10:22:05 | last event before reboot | var/adm/messages log file
9 | 2003 Jan 9 10:22:13 | user=rc5; device=; pid=25953; proc_type=8 | wtmpx
10 | 2003 Jan 9 10:22:13 | user=uadmin; device=; pid=26121; proc_type=5 | wtmpx
11 | 2003 Jan 9 10:22:16 | reboot start | var/adm/messages log file
Figure 3. Example illustrating situation S2
We can identify the events extracted from the wtmpx file that indicate the stop of the system: event #2 with user field “sac” and proc_type “8” (dead process), followed by events #3, #4, and #5 notifying the system run-level change to run-level 5 (which is used to properly stop the system). This example shows that the start of the service interruption period is prior to the EBR event recorded in the /var/adm/messages log file. The refinement of the uptime and downtime estimations corresponding to such situations consists in associating the failure occurrence time with the timestamp of the last event recorded in the wtmpx file before the start of the reboot sequence.
4. Experimental results
The analyses presented in this section are based on /var/adm/messages log file data collected during 45 months (October 1999 – July 2003) from 418 SunOS/Solaris Unix workstations and servers interconnected through the LAAS local area computing network. As shown in Figure 4, the data collection period differed significantly from one machine to another due to the dynamic evolution of the network. For more than 70 % of the machines, the data collection period was longer than 21 months. On the other hand, some machines have a quite short data collection period. In order to obtain statistically significant results, we excluded from the analysis the machines for which the data collection period was shorter than 2000 hours (about 3 months). Consequently, the results presented in the following concern 373 Unix machines. Among these machines, 17 correspond to major servers for the entire network or a sub-set of users: WWW, NIS+, NFS, FTP, SMTP, file servers, printing servers, etc.
Figure 4.
Examples of records from /var/adm/wtmpx obtained with our algorithm The application of the reboot identification algorithm on the collected data allowed us to identify 12805 reboots for the 373 machines, only 476 reboots concern the 17 servers. Based on the information provided by the reboot identification algorithm, we evaluated for each machine the associated uptimes UTi and downtimes DTi, and the availability measure. The collection of wtmpx files started later than the /var/adm/messages log files. For this reason, we were able to analyze the impact of uptimes and downtimes estimation refinement algorithms only on a subset of UTi and DTi values associated with the reboots identified from the log files. Among the 12805 reboots, this analysis concerned 6163 reboots (48.13%). For the remaining 6642 reboots, the corresponding data from the wtmpx files was not available. In the following, we first present in Section 4.1 the results of machine uptimes and downtimes estimation based on the processing of the set of 6163 reboots focusing on the impact of the estimation refinement algorithms. Then, global results taking into account the whole data collected during our study are presented in Section 4.2 in order to give an overall picture on the availability and the rate of occurrence of reboots characterizing the Unix machines included in our study. 4.1. 
Machine uptimes and downtimes estimation and refinement
The correlation of the information contained in the /var/adm/messages log files, the wtmpx files, and the DCSummary files revealed that both situations S1 and S2 discussed in Section 3.2 are common:
• Situation S1 was observed for 79.35% of the analyzed reboots;
• Situation S2 was observed for 10.77% of the analyzed reboots.
For the remaining 9.88% of reboots, the assumption that the EBR recorded in the /var/adm/messages file identifies the last event recorded on the machine before the reboot was consistent with the information available in the wtmpx and the DCSummary files.
In order to analyze the impact of the estimation refinement algorithms on the results, Table 1 gives the Mean, Median and Standard Deviation of uptime and downtime values, before and after the application of the estimation refinement algorithms discussed in Section 3. Considering the median of the downtime values, it can be seen that the refinement algorithms have a significant impact on the results. The median estimated after the refinement is 66 times lower than the value obtained without the refinement. The refinement algorithms also have an impact on the uptime estimation, but as expected the improvement factor is lower than the one observed for the downtime values (1.8 compared to 66).
Table 1. Machine uptimes and downtimes estimates before and after refinement
 | Uptimes UTi, before refinement | Uptimes UTi, after refinement | Downtimes DTi, before refinement | Downtimes DTi, after refinement
Mean | 28.3 days | 1.1 month | 5.9 days | 1.9 days
Median | 6.1 days | 10.8 days | 8.9 hours | 8.1 min
Standard Deviation | 1.7 months | 1.8 months | 24.1 days | 21.1 days
The impact of the estimation refinement algorithms on availability is summarized in Table 2. It can be seen that the estimated average unavailability after the refinement is about three times lower than the value estimated based only on the information in the /var/adm/messages log files. Clearly, the difference is significant and cannot be ignored.
Table 2.
Impact of the estimation refinement algorithms on Availability and Unavailability
 | before refinement | after refinement
A | 89.3 % | 96.3 %
UA | 39.0 days/year | 13.7 days/year
4.2. Availability and reboot rates estimated from the whole data set
This section presents some results concerning the reboot rates and the availability of the 373 SunOS/Solaris Unix machines included in our study, taking into account the whole set of 12805 reboots identified from the /var/adm/messages files. When the wtmpx files were not available (this concerned 6642 reboots), the estimation of the UTi, DTi, availability and reboot rates was based only on the information in the /var/adm/messages files, using the assumption discussed in Section 3.1. In the other case (i.e., for the 6163 reboots), we applied the estimation refinement algorithms presented in Section 3.2.
Figure 5 plots the reboot rates estimated for each machine as a function of the data collection period. The estimated reboot rate for each machine corresponds to the average number of reboots recorded during the corresponding observation period. It can be seen that the reboot rates are uniformly distributed between 10 /hour and 10 /hour.
Figure 5. Unix machines reboot rates as a function of the data collection period
As indicated in Table 3, the mean value of machine reboot rates is 1.3 10 /hour, when considering all Unix machines including workstations and servers. If we take into account only the servers, the mean reboot rate is 1.5 times lower (7.7 10 /hour), corresponding to one reboot every two months.
Table 3. Reboot rate statistics
 | Mean | Median | Std. Dev.
SunOS/Solaris machines (Workstations + Servers) | 1.3 10 /h | 1.0 10 /h | 1.3 10
Servers only | 7.7 10 /h | 6.4 10 /h | 5.6 10
The results illustrating the availability and unavailability of the Unix machines including workstations and servers are given in Figure 6 and Table 4. The mean availability is 97.81 %, corresponding to an average unavailability of 8 days per year.
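The correspondence between an availability percentage and a number of unavailable days per year is a direct conversion, sketched below. The function name is hypothetical and a 365-day year is assumed; the text does not state which year length was used.

```python
def unavailability_days_per_year(availability_pct, days_per_year=365.0):
    """Convert an availability percentage into unavailable days per year.

    The 365-day year is an assumption made for this sketch.
    """
    return (1.0 - availability_pct / 100.0) * days_per_year


# Availability figures quoted in the text and in Table 2.
for a in (97.81, 89.3, 96.3):
    print(f"A = {a} %  ->  UA = {unavailability_days_per_year(a):.2f} days/year")
```

Small differences from the tabulated unavailability values are to be expected, because the availability percentages are themselves rounded.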
Detailed analysis shows that only 15 among the 373 Unix machines included in the study have an availability lower than 90%. When considering only the servers, the estimated availability varies between 99.36% and 99.1% with an average unavailability of 12 hours per year.
Figure 6. SunOS/Solaris Unix machines availability distribution
Table 4. Availability and Unavailability statistics
 | Mean | Median | Std. Dev.
A | 97.81 % | 98.79 % | 3.07 %
UA | 7.99 day/year | 4.41 day/year | 11.20 day/year
6. Conclusion
Dependability analyses based on event logs provide useful feedback to software and system designers. Nevertheless, the results obtained are intimately related to the quality and the completeness of the information recorded in the logs. As the information contained in such event logs could be incomplete or imperfect, it is important to use additional sources of information to ensure that the conclusions derived from such analyses faithfully reflect reality. The approach investigated in this paper aims to fulfill this objective, considering SunOS/Solaris Unix systems as an example. In particular, we have shown that the combined use of the data contained in the syslogd files and the information recorded in the wtmpx files, or obtained through the monitoring of system state during the data collection campaigns, provides uptime and downtime estimations that are closer to reality than the estimations based on syslogd files only. This result is illustrated based on a large set of field data collected from 373 machines during a 45-month observation period. In our future work, we will investigate the applicability of the approach proposed in this paper to other operating systems such as Linux, Windows 2K and Mac OS X.
References
[1] M. F. Buckley, D. P. Siewiorek, “VAX/VMS Event Monitoring and Analysis”, 25th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-25), (Pasadena, CA, USA), pp. 414-423, IEEE Computer Society, 1995.
[2] R. K. Iyer, D.
Tang, “Experimental Analysis of Computer System Dependability”, in Fault-Tolerant Computer System Design, D. K. Pradhan, Ed., Prentice Hall PTR, 1996, pp. 282-392. [3] M. Kalyanakrishnam, Z. Kalbarczyk, R. K. Iyer, “Failure Data Analysis of a LAN of Windows NT Based Computers”, 18th IEEE Symp. on Reliable Distributed Systems (SRDS-18), (Lausanne, Switzerland), pp. 178-187, 1999. [4] C. Simache, M. Kaâniche, “Measurement-based Availability Analysis of Unix Systems in a Distributed Environment”, The 12th Int. Symp. on Software Reliability Engineering (ISSRE-2001), (Hong Kong, China), pp. 346-355, IEEE Computer Society, 2001. [5] C. Simache, M. Kaâniche, “Event Log based Dependability Analysis of Windows NT and 2K Systems”, 2002 Pacific Rim Int. Symposium on Dependable Computing (PRDC-2002), (Tsukuba, Japan), pp. 311-315, IEEE Computer Society, 2002. [6] C. Simache, “Dependability evaluation of Unix and Windows Systems based on operational data: A Method and Application”, PhD Thesis, LAAS Report N°04333, 2004 (in French). [7] A. Thakur, R. K. Iyer, “Analyze-NOW — An Environment for Collection & Analysis of Failures in a Network of Workstations”, IEEE Transactions on Reliability, vol. 45, pp. 561-570, 1996. [8] M. Tsao, D. P. Siewiorek, “Trend Analysis on System Error Files”, 13th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-13), (Milano, Italy), pp. 116-119, IEEE Computer Society, 1983. [9] J. Xu, Z. Kalbarczyk, R. K. Iyer, “Networked Windows NT System Field Failure Data Analysis”, Proc. 1999 IEEE Pacific Rim Int. Symp. on Dependable Computing (PRDC-1999), (Los Alamitos, CA), pp. 
178-185, 1999.
ABSTRACT This paper presents a measurement-based availability assessment study using field data collected during a 4-year period from 373 SunOS/Solaris Unix workstations and servers interconnected through a local area network. We focus on the estimation of machine uptimes, downtimes and availability based on the identification of failures that caused total service loss.
Data corresponds to syslogd event logs that contain a large amount of information about the normal activity of the studied systems as well as their behavior in the presence of failures. It is widely recognized that the information contained in such event logs might be incomplete or imperfect. The solution investigated in this paper to address this problem is based on the use of auxiliary sources of data obtained from wtmpx files maintained by the SunOS/Solaris Unix operating system. The results obtained suggest that the combined use of wtmpx and syslogd log files provides more complete information on the state of the target systems that is useful to provide availability estimations that better reflect reality. <|endoftext|><|startoftext|> Introduction Several initiatives have been developed during the last decade to monitor malicious threats and activities on the Internet, including viruses, worms, denial of service attacks, etc. Among them, we can mention the Internet Motion Sensor project [1], CAIDA [2], DShield [3], and CADHo [4]. These projects provide valuable information on security threats and the potential damage that they might cause to Internet users. Analysis and modeling methodologies are necessary to extract the most relevant information from the large set of data collected from such monitoring activities that can be useful for system security administrators and designers to support decision making. The designers are mainly interested in having representative and realistic assumptions about the kind of threats and vulnerabilities that their system will have to cope with once it is used in operation. Knowing who are the enemies and how they proceed to defeat the security of target systems is an important step to be able to build systems that can be resilient with respect to the corresponding threats. 
From the system security administrators’ perspective, the collected data should be used to support the development of efficient early warning and intrusion detection systems that will enable them to better react to the attacks targeting their systems. As of today, there is still a lack of methodologies and significant results to fulfill the objectives described above, although some progress has been achieved recently in this field. The CADHo project “Collection and Analysis of Data from Honeypots” [4], an ongoing research action started in September 2004, is aimed at contributing to filling such a gap by carrying out the following activities: 1) deploying a distributed platform of honeypots [5] that gathers data suitable to analyze the attack processes targeting a large number of machines connected to the Internet; 2) developing analysis methodologies and modeling approaches to validate the usefulness of this platform by carrying out various analyses, based on the collected data, to characterize the observed attacks and model their impact on security. A honeypot is a machine connected to a network but that no one is supposed to use. In theory, no connection to or from that machine should be observed. If a connection occurs, it must be, at best an accidental error or, more likely, an attempt to attack the machine. The Leurré.com data collection environment [5], set up in the context of the CADHo project, has deployed, as of to date, thirty five honeypot platforms at various locations from academia and industry, in twenty five countries over the five continents. Several analyses carried out based on the data collected so far from these honeypots have revealed that very interesting observations and conclusions can be derived with respect to the attack activities observed on the Internet [4, 6-9]. In addition, several automatic data analyses and clustering techniques have been developed to facilitate the extraction of relevant information from the collected data. 
A list of papers detailing the methodologies used and the results of these analyses is available in [6]. This paper focuses on modeling-related activities based on the data collected from the honeypots. We first discuss the objectives of such activities and the challenges that need to be addressed. Then we present some examples of models obtained from the data. The paper is organized as follows. Section 2 presents the data collection environment. Section 3 focuses on the modeling of attacks based on the data collected from the honeypots deployed. Modeling examples are presented in Section 4. Finally, Section 5 discusses future work. 2. The data collection environment The data collection environment (called Leurré.com [5]) deployed in the context of the CADHo project is based on low-interaction honeypots using the freely available software called honeyd [10]. Since September 2004, 35 honeypot platforms have been progressively deployed on the Internet at various geographical locations. Each platform emulates three computers running Linux RedHat, Windows 98 and Windows NT, respectively, and various services such as ftp, web, etc. A firewall ensures that connections cannot be initiated from the computers, only replies to external solicitations are allowed. All the honeypot platforms are centrally managed to ensure that they have exactly the same configuration. The data gathered by each platform are securely uploaded to a centralized database with the complete content, including payload of all packets sent to or from these honeypots, and additional information to facilitate its analysis, such as the IP geographical localization of packets’ source addresses, the OS of the attacking machine, the local time of the source, etc. 3. Modeling objectives Modeling involves three main steps: 1) The definition of the objectives of the modeling activities and the quantitative measures to be evaluated. 
2) The development of one (or several) models that are suitable to achieve the specified objectives. 3) The processing of the models and the analysis of the results to support system design or operation activities. The data collected from the honeypots can be processed in various ways to characterize the attack processes and perform predictive analyses. In particular, modeling activities can be used to: • Identify the probability distributions that best characterize the occurrence of attacks and their propagation through the Internet. • Analyze whether the data collected from different platforms exhibit similar or different malicious attack activities. • Model the time relationships that may exist between attacks coming from different sources (or to different destinations). • Predict the occurrence of new waves of attacks on a given platform based on the history of attacks observed on this platform as well as on the other platforms. For the sake of illustration, we present in the following sections simple preliminary models based on the data collected from our honeypots that are aimed at fulfilling such objectives. 4. Examples The examples presented in the following address: 1) The analysis of the time evolution of the number of attacks taking into account the geographic location of the attacking machine. 2) The characterization and statistical modeling of the times between attacks. 3) The analysis of the propagation of attacks throughout the honeypot platforms. The data considered for the examples has been collected from January 1st, 2004 to April 17, 2005, corresponding to a data collection period of 320 days. We take into account the attacks observed on 14 honeypot platforms among those deployed so far. The selected honeypots correspond to those that have been active for almost the whole considered period. The total number of attacks observed on these honeypots is 816476. These attacks are not uniformly distributed among the platforms. 
In particular, the data collected from three platforms represent more than fifty percent of the total attack activity.
4.1 Attack occurrence and geographic distribution
The preliminary models presented in this sub-section address: i) the time-evolution modeling of the number of attacks observed on different honeypot platforms, and ii) the analysis of potential correlations for the attack processes observed on the different platforms, taking into account the geographic location of the attacking machines and the proportion of attacks observed on each platform with respect to the global attack activity. Let us denote by:
− Y(t) the function describing the evolution of the number of attacks per unit of time observed on all the honeypots during the observation period,
− Xj(t) the function describing the evolution of the number of attacks per unit of time observed on all the honeypots during the observation period for which the IP address of the attacking machine is located in country j.
In a first stage, we have plotted, for various time periods, Y(t) and the curves Xj(t) corresponding to different countries j. Visual inspection showed surprising similarities between Y(t) and some Xj(t). To confirm such empirical observations, we then analyzed this phenomenon rigorously using linear regression models.
Considering a linear regression model, we have investigated whether Y(t) can be estimated from the combination of the attacks described by Xj(t), taking into account a limited number of countries j. Let us denote by Y*(t) the estimated model. Formally, Y*(t) is defined as follows:
$$Y^*(t) = \sum_{j=1}^{k} \alpha_j X_j(t) + \beta \qquad (1)$$
Constants αj and β correspond to the parameters of the linear model that provide the best fit with the observed data, and k is the number of countries considered in the regression.
The quality of fit of the model is measured by the statistic R² defined by:
$$R^2 = \frac{\sum_i \left(Y^*(i) - Y_{av}\right)^2}{\sum_i \left(Y(i) - Y_{av}\right)^2} \qquad (2)$$
Y(i) and Y*(i) correspond to the observed and estimated number of attacks for unit of time i, respectively. Yav is the average number of attacks per unit of time, taking into account the whole observation period. Indeed, R is the correlation factor between the estimated model and the observed values. The closer the R² value is to 1, the better the estimated model fits the collected data.
We have applied this model considering linear regressions involving one, two or more countries. Surprisingly, the results reveal that a good fit can be obtained by considering the attacks from one country only. For example, the models providing the best fit taking into account the total number of attacks from all the platforms are obtained by considering the attacks issued from either UK, USA, Russia or Germany only. The corresponding R² values are of the same order of magnitude (0.944 for UK, 0.939 for USA, 0.930 for Russia and 0.920 for Germany), denoting a very good fit of the estimated models to the collected data. For example, the estimated model obtained when considering the attacks from Russia only is defined by equation (3):
$$Y^*(t) = 44.568\, X_1(t) + 1555.67 \qquad (3)$$
X1(t) represents the evolution of the number of attacks from Russia. Figure 1 plots the evolution of the observed and estimated number of attacks per unit of time during the data collection period considered in this example. The unit of time corresponds to 4 days. It is noteworthy that similar conclusions are obtained if we consider another granularity for the unit of time, for example one day or one week.
These results are all the more surprising given that the attacks from Russia and UK represent only a small proportion of the total number of attacks (1.9% and 3.7% respectively). Concerning the USA, although the proportion is higher (about 18%), it is not sufficient to explain the linear model.
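The single-country case of equation (1), with k = 1, and the R² statistic of equation (2) can be sketched in a few lines. This is a minimal illustration using ordinary least squares on synthetic counts; the function names are hypothetical and the numbers are not data from the honeypot platforms.

```python
def fit_single_regressor(x, y):
    """Least-squares fit of y ~ alpha * x + beta (equation (1) with k = 1)."""
    n = len(x)
    x_av, y_av = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_av) * (yi - y_av) for xi, yi in zip(x, y))
    sxx = sum((xi - x_av) ** 2 for xi in x)
    alpha = sxy / sxx
    return alpha, y_av - alpha * x_av


def r_squared(y, y_est):
    """Equation (2): variation of the estimates over variation of the observations."""
    y_av = sum(y) / len(y)
    return (sum((e - y_av) ** 2 for e in y_est)
            / sum((o - y_av) ** 2 for o in y))


# Synthetic counts per unit of time, not data from the paper.
x1 = [12, 15, 9, 20, 17, 11, 25, 14]                    # attacks from one country
y = [2100, 2230, 1950, 2450, 2320, 2040, 2680, 2170]    # attacks from all sources

alpha, beta = fit_single_regressor(x1, y)
y_est = [alpha * xi + beta for xi in x1]
print(f"alpha={alpha:.3f} beta={beta:.2f} R2={r_squared(y, y_est):.3f}")
```

For a least-squares fit with an intercept, this ratio equals the squared correlation between the observed and estimated values, which matches the remark that R is the correlation factor.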
Figure 1 - Evolution of the number of attacks per unit of time observed on all the platforms and estimated model considering attacks from Russia only

We have applied similar analyses to each honeypot platform individually, in order to investigate whether similar conclusions can be derived by comparing the attack activity per source country to the global attack activity of the platform. The results are summarized in Table 1. The second column identifies the source country that provides the best fit. The corresponding R2 value is given in the third column. Finally, the last three columns give the R2 values obtained when considering UK, USA, or Russia in the regression model.

It can be noticed that the quality of the regressions measured when considering attacks from Russia only is generally low for all platforms (R2 less than 0.80). This indicates that the property observed at the global level is not visible when looking at the local activities observed on each platform. However, for the majority of the platforms, the best regression models often involve one of the three following countries: USA, Germany or UK, which also provide the best regressions when analyzing the global attack activity considering all the platforms together. Two exceptions are found with P6 and P8 for which the observed attack activities exhibit different characteristics with respect to the origin of the attacks (Taiwan, China), compared to the other platforms.

The trends discussed above have also been observed when considering a different granularity for the unit of time (e.g., 1 day or 1 week) as well as different data observation periods.
Platform   Country providing    R2 of the     R2      R2      R2
           the best model       best model    UK      USA     Russia
P1         Germany              0.895         0.873   0.858   0.687
P2         USA                  0.733         0.464   0.733   0.260
P4         Germany              0.722         0.197   0.373   0.161
P5         Germany              0.874         0.869   0.872   0.608
P6         UK                   0.861         0.861   0.699   0.656
P8         Taiwan               0.796         0.249   0.425   0.212
P9         Germany              0.754         0.630   0.624   0.631
P11        China                0.746         0.303   0.664   0.097
P13        Germany              0.738         0.574   0.412   0.389
P14        Germany              0.708         0.510   0.546   0.087
P20        USA                  0.912         0.787   0.912   0.774
P21        Spain                0.791         0.620   0.727   0.720
P22        USA                  0.870         0.176   0.870   0.111
P23        USA                  0.874         0.659   0.874   0.517
Global     UK                   0.944         0.944   0.939   0.930

Table 1 – Estimated models for each platform: correlation factors for the countries providing the best fit and for UK, USA and Russia

To summarize, two main findings can be derived from the results presented above:

1) Some trends exhibited at the global level, considering the attack processes on all the platforms together, are not observed when analyzing each platform individually (this is the case, for example, of attacks from Russia). On the other hand, we have also observed the opposite situation, where the trends observed globally are visible locally on the majority of the platforms (this is the case, for example, of attacks from USA, UK and Germany).

2) The attack processes observed on each platform are very often highly correlated with the attack processes originating from a particular country. The country providing the best regressions locally does not necessarily exhibit high correlations when considering other platforms or the global level. These trends seem to result from specific factors that govern the attack processes observed on each platform.

4.2 Distribution of times between attacks

In this example, we focus on the analysis and the modeling of the times between attacks observed on different honeypot platforms. Let us denote by ti the time separating the occurrence of attack i and attack (i-1).
Each attack is associated with an IP address, and its occurrence time is defined as the time when the first packet is received from the corresponding address at one of the three virtual machines of the honeypot platform. All the packets received from the same IP address within 24 hours are assumed to belong to the same attack session.

We have analyzed the distribution of the times between attacks observed on each honeypot platform. Our objective was to find analytical models that faithfully reflect the empirical data collected from each platform. In the following, we summarize the results obtained for the 5 platforms on which we observed the highest attack activity.

4.2.1 Empirical analyses

Table 2 gives the number of intervals of times between attacks observed at each platform considered in the analysis as well as the corresponding number of IP addresses. As illustrated by Figure 2, most of these addresses have been observed only once at a given platform. Nevertheless, some IP addresses have been observed several times; the maximum number of visits per IP address for the five platforms was 57, 96, 148, 183, and 83 (respectively). Indeed, the curves plotting the number of IP addresses as a function of the number of attacks for each address follow a heavy-tailed power law distribution. It is noteworthy that such distributions have been observed in many performance and dependability related studies in the context of the Internet, e.g., transfer and interarrival times, burst sizes, sizes of files transferred over the web, error rates in web servers, etc.
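The definition of an attack and of the times ti can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the processing used for the paper: the 24-hour window is assumed to start at the first packet of a session, timestamps are assumed to be in seconds, and the names `attack_times` and `inter_attack_times` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

SESSION_WINDOW = 24 * 3600  # 24 hours, in seconds

def attack_times(packets: Iterable[Tuple[float, str]]) -> List[float]:
    """Return the occurrence times of attacks on one platform.

    packets: (timestamp_in_seconds, source_ip) pairs, in any order.
    An attack starts with the first packet from an IP address; later
    packets from that address within SESSION_WINDOW of the session start
    belong to the same attack (assumption: window anchored at the first
    packet, not at the most recent one).
    """
    session_start: Dict[str, float] = {}
    starts: List[float] = []
    for ts, ip in sorted(packets):
        first = session_start.get(ip)
        if first is None or ts - first >= SESSION_WINDOW:
            session_start[ip] = ts
            starts.append(ts)
    return starts

def inter_attack_times(starts: List[float]) -> List[float]:
    """Times t_i separating attack i from attack i-1."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]

# Hypothetical packets: the second packet from 10.0.0.1 falls in its
# first session; the packet at t = 90000 s opens a new session.
packets = [(0.0, "10.0.0.1"), (10.0, "10.0.0.2"),
           (20.0, "10.0.0.1"), (90000.0, "10.0.0.1")]
starts = attack_times(packets)
print(starts, inter_attack_times(starts))
```

Anchoring the window at the most recent packet instead would merge long-running scans into a single attack; the source text does not say which convention was used.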
                          P5       P6       P9       P20      P23
Number of ti              85890    148942   46268    224917   51580
Number of IP addresses    79549    90620    42230    162156   47859

Table 2 - Numbers of intervals of times between attacks (ti) and of different IP addresses observed at each platform

Figure 2 - Number of IP addresses versus the number of attacks per IP address observed at each platform (log-log scale)

4.2.2 Modeling

Finding tractable analytical models that faithfully reflect the observed times between attacks is useful to characterize the observed attack processes and to find appropriate indicators that can be used for prediction purposes. We have investigated several candidate distributions, including the Weibull, Lognormal, Pareto, and Exponential distributions, which are traditionally used in reliability related studies. The best fit for each platform has been obtained using a mixture model combining a Pareto and an exponential distribution.

Let us denote by T the random variable corresponding to the time between the occurrence of two consecutive attacks at a given platform, and by t a realization of T. Assuming that the probability density function pdf(t) associated with T is characterized by a mixture distribution combining a Pareto distribution and an exponential distribution, pdf(t) is defined as follows:

$$\mathrm{pdf}(t) = P_a \, \frac{k}{(t+1)^{k+1}} + (1 - P_a)\, \lambda e^{-\lambda t}$$

k is the index parameter of the Pareto distribution, λ is the rate associated with the exponential distribution and Pa is a probability.

We have used the R statistical package [11] to estimate the parameters that provide the best fit to the collected data. The quality of fit is assessed by applying the Kolmogorov-Smirnov statistical test. The results are presented in Figure 3. It can be noticed that for all the platforms, the mixture distribution provides a good fit to the observed data whereas the exponential distribution is not suitable to describe the observed attack processes.
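As an illustration of the mixture model, the sketch below evaluates its density and cumulative distribution. It assumes the Pareto component has the form k/(t+1)^(k+1) for t ≥ 0, which matches the parameters k, λ and Pa named in the text; the function names are hypothetical, and the parameter values are those reported for platform P5 in Figure 3.

```python
import math

def mixture_pdf(t: float, pa: float, k: float, lam: float) -> float:
    """Density of the Pareto/exponential mixture at t >= 0."""
    pareto = k / (t + 1.0) ** (k + 1.0)      # Pareto component, index k
    expo = lam * math.exp(-lam * t)          # exponential component, rate lam
    return pa * pareto + (1.0 - pa) * expo

def mixture_cdf(t: float, pa: float, k: float, lam: float) -> float:
    """Cumulative distribution, obtained by integrating each component."""
    pareto = 1.0 - (t + 1.0) ** (-k)
    expo = 1.0 - math.exp(-lam * t)
    return pa * pareto + (1.0 - pa) * expo

# Parameters reported for platform P5 (Figure 3, panel a).
pa, k, lam = 0.0051, 0.173, 0.121
for t in (1.0, 31.0, 271.0):
    print(t, mixture_pdf(t, pa, k, lam), mixture_cdf(t, pa, k, lam))
```

A Kolmogorov-Smirnov comparison would then measure the largest gap between `mixture_cdf` and the empirical distribution of the observed ti. Because k is small, the Pareto term decays slowly and dominates the tail once the exponential term has vanished.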
Thus, the traditional assumption considered in hardware reliability evaluation studies, namely that failures occur according to a Poisson process, does not seem to be satisfactory when considering the data observed from our honeypots. These results have also been confirmed when considering the data collected during other observation periods.

[Figure 3: five panels, each plotting the data, the mixture (Pareto, Exp.) model and the exponential model against the time between attacks (seconds). Fitted parameters per panel:]
a) P5: Pa = 0.0051, k = 0.173, λ = 0.121/sec., p-value = 0.90
b) P6: Pa = 0.0115, k = 0.1183, λ = 0.1364/sec., p-value = 0.999
c) P9: Pa = 0.0019, k = 0.1668, λ = 0.276/sec., p-value = 0.99
d) P20: Pa = 0.0144, k = 0.0183, λ = 0.0136/sec., p-value = 0.90
e) P23: Pa = 0.0031, k = 0.1240, λ = 0.275/sec., p-value = 0.985

Figure 3 - Observed and estimated times between attacks probability density functions.

4.3 Propagation of attacks

Besides analyzing the attack activities observed at each platform in isolation, it is useful to identify phenomena that reflect propagation of attacks through different platforms. In this section, we analyze simple scenarios where a propagation between two platforms is assumed to occur when the IP address of an attacking machine observed at a given platform is also observed at another platform. Such a situation might occur, for example, as a result of a scanning activity or of the propagation of worms. For the sake of illustration, we restrict the analysis to the five platforms considered in the previous example.
For each attacking IP address in the data collected from the five platforms during the period of the study, we identified: 1) all the occurrences with the same source address, 2) the times of each occurrence and 3) the platform on which each occurrence has been reported. A propagation is said to occur for this IP address from platform Pi to platform Pj when the next occurrence of this address is observed on Pj after visiting Pi. Based on this information we build a propagation graph where each node identifies a platform and a transition between two nodes identifies a propagation between the nodes. A probability is associated with each transition to characterize its likelihood of occurrence.

Figure 4 presents the propagation graph obtained for the five platforms included in the analysis. Considering platforms P6 and P20, it can be seen that only a few IP addresses that attacked these platforms have been observed on the other platforms. The situation is different when considering platforms P5, P9, and P23. In particular, it can be noticed that propagation between P5 and P9 is highly probable. This is related in particular to the fact that the addresses of the corresponding platforms belong to the same /8 network domain. More thorough and detailed analyses are currently being carried out based on the propagation graph in order to take into account timing information for the corresponding transitions and also the types of attacks observed, in order to better explain the propagation phenomena illustrated by the graph.

Figure 4 - Propagation graph

5. Conclusion

This paper presented simple examples and preliminary models illustrating various types of empirical analysis and modeling activities that can be carried out based on the data collected from honeypots in order to characterize attack processes. The honeypot platforms deployed so far in our project belong to the family of so-called “low interaction honeypots”.
Thus, hackers can only scan ports and send requests to fake servers without ever succeeding in taking control over them. In our project, we are also interested in running experiments with “high interaction” honeypots where attackers can really compromise the targets. Such honeypots are suitable to collect data that would enable us to study the behaviors of attackers once they have managed to get access to a target and try to progress in the intrusion process to get additional privileges. Future work will focus on the deployment of such honeypots and the exploitation of the collected data to better characterize attack scenarios and analyze their impact on the security of the target systems. The ultimate objective would be to build representative stochastic models that will enable us to evaluate the ability of computing systems to resist attacks and to validate them based on real attack data.

Acknowledgement. This work has been carried out in the context of the CADHo project, an ongoing research action funded by the French ACI “Securité & Informatique” (www.cadho.org). It is partially supported by the ReSIST European Network of Excellence (www.resist-noe.org).

References

[1] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. Watson, “The Internet Motion Sensor: A Distributed Blackhole Monitoring System”, Proc. 12th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2005.
[2] Home Page of the CAIDA Project, http://www.caida.org/
[3] DShield Distributed Detection System homepage, http://www.honeynet.org/
[4] E. Alata, M. Dacier, Y. Deswarte, M. Kaâniche, K. Kortchinsky, V. Nicomette, V.H. Pham, F. Pouget, “Collection and Analysis of Attack data based on honeypots deployed on the Internet”, 1st Workshop on Quality of Protection, Milano, Italy, September 2005.
[5] F. Pouget, M. Dacier, V. H. Pham, “Leurré.com: On the Advantages of Deploying a Large Scale Distributed Honeypot Platform”, Proc.
E-Crime and Computer Evidence Conference (ECCE 2005), Monaco, March 2005.
[6] L. Spitzner, Honeypots: Tracking Hackers, Addison-Wesley, ISBN 0-321-10895-7, 2002.
[7] Project Leurré.com. Publications web page, http://www.leurrecom.org/paper.htm
[8] M. Dacier, F. Pouget, H. Debar, “Honeypots: Practical Means to Validate Malicious Fault Assumptions on the Internet”, Proc. 10th IEEE International Symposium Pacific Rim Dependable Computing (PRDC10), Tahiti, March 2004, pages 383-388.
[9] M. Dacier, F. Pouget, H. Debar, “Attack Processes found on the Internet”, Proc. OTAN Symp. on Adaptive Defense in Unclassified Networks, Toulouse, France, April 2004.
[10] Honeyd Home page, http://www.citi.umich.edu/u/provos/honeyd/
[11] R statistical package Home page, http://www.r-project.org

ABSTRACT

Honeypots are increasingly used to collect data on malicious activities on the Internet and to better understand the strategies and techniques used by attackers to compromise target systems. Analysis and modeling methodologies are needed to support the characterization of attack processes based on the data collected from the honeypots. This paper presents some empirical analyses based on the data collected from the Leurré.com honeypot platforms deployed on the Internet and presents some preliminary modeling studies aimed at fulfilling such objectives.

<|endoftext|><|startoftext|>

Draft version October 24, 2018

THE LOW CO CONTENT OF THE EXTREMELY METAL POOR GALAXY I ZW 18

Adam Leroy, John Cannon, Fabian Walter, Alberto Bolatto, Axel Weiss

ABSTRACT

We present sensitive molecular line observations of the metal-poor blue compact dwarf I Zw 18 obtained with the IRAM Plateau de Bure interferometer. These data constrain the CO J = 1 → 0 luminosity within our 300 pc (FWHM) beam to be LCO < 1 × 10⁵ K km s⁻¹ pc² (ICO < 1 K km s⁻¹), an order of magnitude lower than previous limits.
Although I Zw 18 is starbursting, it has a CO luminosity similar to or less than nearby low-mass irregulars (e.g. NGC 1569, the SMC, and NGC 6822). There is less CO in I Zw 18 relative to its B-band luminosity, H I mass, or star formation rate than in spiral or dwarf starburst galaxies (including the nearby dwarf starburst IC 10). Comparing the star formation rate to our CO upper limit reveals that unless molecular gas forms stars much more efficiently in I Zw 18 than in our own galaxy, it must have a very low CO-to-H2 ratio, ∼ 10⁻² times the Galactic value. We detect 3 mm continuum emission, presumably due to thermal dust and free-free emission, towards the radio peak.

Subject headings: galaxies: individual (I Zw 18); galaxies: ISM; galaxies: dwarf, radio lines: ISM

1. INTRODUCTION

With the lowest nebular metallicity in the nearby universe (12 + log O/H ≈ 7.2, Skillman & Kennicutt 1993), the blue compact dwarf I Zw 18 plays an important role in our understanding of galaxy evolution. Vigorous ongoing star formation implies the presence of molecular gas, but direct evidence has been elusive. Vidal-Madjar et al. (2000) showed that there is not significant diffuse H2, but Cannon et al. (2002) found ∼ 10³ M⊙ of dust organized in clumps with sizes 50 – 100 pc. Vidal-Madjar et al. (2000) did not rule out compact, dense molecular clouds, and Cannon et al. (2002) argued that this dust may indicate the presence of molecular gas.

Observations by Arnault et al. (1988) and Gondhalekar et al. (1998) failed to detect CO J = 1 → 0 emission, the most commonly used tracer of H2. This is not surprising. The low dust abundance and intense radiation fields found in I Zw 18 may have a dramatic impact on the formation of H2 and structure of molecular clouds. A large fraction of the H2 may exist in extended envelopes surrounding relatively compact cold cores. In these envelopes, H2 self-shields while CO is dissociated (Maloney & Black 1988).
The result may be that in such galaxies [CII] or FIR emission trace H2 better than CO (Madden et al. 1997; Israel 1997a; Pak et al. 1998). Further, H2 may simply be underabundant, as there is a lack of grains on which to form while photodissociation is enhanced by an intense UV field. Indeed, Bell et al. (2006) found that at Z = Z⊙/100, a molecular cloud may take as long as a Gyr to reach chemical equilibrium.

A low CO content in I Zw 18 is then expected, and a stringent upper limit would lend observational support to predictions for molecular cloud structure at low metallicity. However, while the existing upper limits are sensitive in an absolute sense, they do not even show I Zw 18 to have a lower normalized CO content than a spiral galaxy (e.g. less CO per B-band luminosity). The low luminosity (MB ≈ −14.7, Gil de Paz et al. 2003) and large distance (d = 14 Mpc, Izotov & Thuan 2004) of this system require very sensitive observations to set a meaningful upper limit.

In this letter we present observations, obtained with the IRAM Plateau de Bure Interferometer (PdBI)⁵, that constrain the CO luminosity, LCO, to be equal to or less than that of nearby CO-poor (non-starbursting) dwarf irregulars.

1 Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117, Heidelberg, Germany; email: leroy@mpia-hd.mpg.de
2 Astronomy Department, Wesleyan University, Middletown, CT 06459, cannon@astro.wesleyan.edu
3 Radio Astronomy Lab, UC Berkeley, 601 Campbell Hall, Berkeley, CA, 94720
4 MPIfR, Auf dem Hügel 69, 53121, Bonn, Germany

2. OBSERVATIONS

I Zw 18 was observed with the IRAM Plateau de Bure Interferometer on 17, 21, and 27 April and 13 May 2004 for a total of 11 hours. The phase calibrators were 0836+710 (Fν(115 GHz) ≈ 1.1 Jy) and 0954+556 (Fν(115 GHz) ≈ 0.35 Jy). One or more calibrators with known fluxes were also observed during each track. The data were reduced at the IRAM facility in Grenoble using the GILDAS software package; maps were prepared using AIPS.
The final CO J = 1 → 0 data cube has beam size 5.59″ × 3.42″, and a velocity (frequency) resolution of 6.5 km s⁻¹ (2.5 MHz). The velocity coverage stretches from vLSR ≈ 50 to 1450 km s⁻¹. The data have an RMS noise of 3.77 mJy beam⁻¹ (18 mK; 1 Jy beam⁻¹ = 4.8 K). The 44″ (FWHM) primary beam completely covers the galaxy. Based on variation of the relative fluxes of the calibrators, we estimate the gain uncertainty to be < 15%.

5 Based on observations carried out with the IRAM Plateau de Bure Interferometer. IRAM is supported by INSU/CNRS (France), MPG (Germany) and IGN (Spain).

http://arxiv.org/abs/0704.0862v1

3. RESULTS

3.1. Upper Limit on CO Emission

To search for significant CO emission, we smooth the cube to 20 km s⁻¹ velocity resolution, a typical line width for CO at our spatial resolution (e.g., Helfer et al. 2003). The noise per channel map in this smoothed cube is σ20 ≈ 0.25 K km s⁻¹. Over the H I velocity range (710 – 810 km s⁻¹, van Zee et al. 1998), there are no regions with ICO,20 > 1 K km s⁻¹ (4σ) within the primary beam.

We pick a slightly conservative upper limit for two reasons. First, if there were CO emission with this intensity we would be certain of detecting it. Second, the noise in the cube is slightly non-Gaussian, so that the false positive rate for ICO,20 > 1 K km s⁻¹ (estimated from the negatives and the channel maps outside the H I velocity range) is ∼ 0.2%, very close to that of a 3σ deviate.

For d = 14 Mpc, the synthesized beam has a FWHM of 300 pc and an area of 1.0 × 10⁵ pc². Our intensity limit, ICO < 1 K km s⁻¹, therefore translates to a CO luminosity limit of LCO < 1 × 10⁵ K km s⁻¹ pc².

Fig. 1. CO 1 → 0 spectra of I Zw 18 towards the radio continuum/Hα peak (left) and the highest significance spectrum (right), which is still too faint to classify as more than marginal. The locations of both spectra are shown in Figure 2. Dashed horizontal lines show the magnitude of the RMS noise.
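The beam quantities quoted above can be checked with a short calculation. The sketch below is not the authors' reduction code: it assumes an elliptical Gaussian beam, for which the area is π/(4 ln 2) times the product of the FWHM axes, the standard Rayleigh-Jeans conversion T[K] = 1.222 × 10⁶ S[Jy/beam] / (ν²[GHz] θmaj θmin [arcsec]), and the CO J = 1 → 0 rest frequency of 115.271 GHz. Function names are hypothetical.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def gaussian_beam_area_pc2(bmaj_arcsec, bmin_arcsec, distance_mpc):
    """Area of an elliptical Gaussian beam projected at the given distance."""
    d_pc = distance_mpc * 1.0e6
    bmaj_pc = bmaj_arcsec * ARCSEC_TO_RAD * d_pc   # FWHM major axis in pc
    bmin_pc = bmin_arcsec * ARCSEC_TO_RAD * d_pc   # FWHM minor axis in pc
    return math.pi / (4.0 * math.log(2.0)) * bmaj_pc * bmin_pc

def co_luminosity(i_co_k_kms, area_pc2):
    """L_CO in K km/s pc^2 for uniform intensity I_CO over the beam area."""
    return i_co_k_kms * area_pc2

def kelvin_per_jy_beam(freq_ghz, bmaj_arcsec, bmin_arcsec):
    """Brightness temperature (K) corresponding to 1 Jy/beam."""
    return 1.222e6 / (freq_ghz ** 2 * bmaj_arcsec * bmin_arcsec)

area = gaussian_beam_area_pc2(5.59, 3.42, 14.0)
l_co_limit = co_luminosity(1.0, area)              # for I_CO < 1 K km/s
gain = kelvin_per_jy_beam(115.271, 5.59, 3.42)
print(f"beam area ~ {area:.3g} pc^2, L_CO limit ~ {l_co_limit:.3g} K km/s pc^2")
print(f"1 Jy/beam ~ {gain:.2f} K")
```

Under these assumptions the beam area comes out close to the 1.0 × 10⁵ pc² quoted in the text, and the flux-to-temperature factor close to 4.8 K per Jy beam⁻¹.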
There is a marginal signal toward the southern knot of Hα emission (9h34m02s.4, 55°14′23″.0). This emission has the largest |ICO,20| found over the H I velocity range, corresponding to LCO ∼ 8 × 10⁴ K km s⁻¹ pc², just below our limit. This same line of sight also shows |ICO| > 2σ over three consecutive channels, a feature seen along only one other line of sight (in negative) over the H I velocity range. The marginal signal is suggestively located in the southeast of I Zw 18, where Cannon et al. (2002) identified several potential sites of molecular gas from regions of relatively high extinction. While tantalizing, the signal is not strong enough to be categorized as a detection. Figure 1 shows CO spectra towards the Hα/radio continuum peak (Cannon et al. 2002, 2005; Hunt et al. 2005a, see Figure 2) and this marginal signal.

3.2. Continuum Emission

We average the data over all channels and produce a continuum map with noise σ115GHz = 0.35 mJy beam⁻¹. The highest value in the map is I115GHz = 1.06 ± 0.35 mJy beam⁻¹ at α2000 = 9h34m02s.1, δ2000 = +55°14′27″.0. This is within a fraction of a beam of the 1.4 GHz peak identified by Cannon et al. (2005, α2000 = 9h34m02s.1, δ2000 = +55°14′28″.06) and Hunt et al. (2005a, α2000 = 9h34m02s, δ2000 = +55°14′29″.06). Figure 2 shows the radio continuum peak and 115 GHz continuum contours plotted over Hα emission from I Zw 18 (Cannon et al. 2002). There is only one other region with |I115GHz| > 3σ115GHz within the primary beam, and the star-forming extent of I Zw 18 occupies ≈ 10% of the primary beam. Therefore, we estimate the chance of a false positive coincident with the galaxy to be only ∼ 10%.

4. DISCUSSION

Here we discuss the implications of our CO upper limit and continuum detection. We adopt the following properties for I Zw 18, all scaled to d = 14 Mpc: MB = −14.7 (Gil de Paz et al. 2003), MHI = 1.4 × 10 (van Zee et al. 1998), Hα luminosity log10 Hα = 39.9 erg s⁻¹ (Cannon et al. 2002; Gil de Paz et al.
2003), 1.4 GHz flux F1.4 = 1.79 mJy (Cannon et al. 2005).

4.1. Point Source Luminosity

Our upper limit along each line of sight, LCO < 1 × 10⁵ K km s⁻¹ pc², matches the luminosity of a fairly massive Galactic giant molecular cloud (Blitz 1993). For a Galactic CO-to-H2 conversion factor, 2 × 10²⁰ cm⁻² (K km s⁻¹)⁻¹, the corresponding molecular gas mass is MMol ≈ 4.4 × 10⁵ M⊙, similar to the mass of the Orion-Monoceros complex (e.g. Wilson et al. 2005).

4.2. Comparison With More Luminous Galaxies

In galaxies detected by CO surveys, the CO content per unit B-band luminosity is fairly constant. Figure 3 shows the CO luminosity normalized by B-band luminosity, LCO/LB, as a function of absolute B-band magnitude (LB is extinction corrected). LCO/LB is nearly constant over two orders of magnitude in LB, though with substantial scatter (much of it due to the extrapolation from a single pointing to LCO).

Based on these data and assuming that LCO is not a function of the metallicity of the galaxy, we may extrapolate to an expected CO luminosity for I Zw 18. For MB,IZw18 ≈ −14.7 the CO luminosity corresponding to the median value of LCO/LB (dashed line) in Figure 3 is LCO,IZw18 ≈ 1.7 × 10⁶ K km s⁻¹ pc².

The Hα, 1.4 GHz, and H I luminosities lead to similar predictions. Young et al. (1996) found MH2/LHα ≈ 10 L⊙/M⊙ for Sd–Irr galaxies, which implies LCO,IZw18 ∼ 4 × 10⁶ K km s⁻¹ pc². Murgia et al. (2005) measured FCO/F1.4 ≈ 10 Jy km s⁻¹ (mJy)⁻¹ for spirals, which would imply LCO,IZw18 ∼ 10⁷ K km s⁻¹. For Sd/Sm galaxies, MH2/MHI ≈ 0.2 (Young & Scoville 1991), leading to LCO,IZw18 ∼ 5 × 10⁶ K km s⁻¹ pc². Both MH2/LHα and MH2/MHI tend to be even higher in earlier-type spirals.

Therefore, surveys would predict LCO,IZw18 ≳ 2 × 10⁶ K km s⁻¹ pc², very close to the previously established upper limits of 2 − 3 × 10⁶ K km s⁻¹ pc² (Arnault et al. 1988; Gondhalekar et al. 1998).
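The molecular gas mass quoted in Section 4.1 follows from converting the column-density form of the CO-to-H2 factor into a mass-to-light ratio. The sketch below is a worked check under one assumption not stated in the text: a factor of 1.36 is applied to account for helium, a common convention. Constant and function names are hypothetical.

```python
M_H_G = 1.6735e-24       # mass of a hydrogen atom, in grams
PC_CM = 3.0857e18        # one parsec, in centimeters
MSUN_G = 1.989e33        # one solar mass, in grams
HELIUM_FACTOR = 1.36     # assumed correction for helium

def alpha_co(x_co, helium=HELIUM_FACTOR):
    """Mass per unit CO luminosity, in Msun per (K km/s pc^2).

    x_co: CO-to-H2 conversion factor in cm^-2 (K km/s)^-1.
    Each H2 molecule carries two hydrogen masses; multiplying the mass
    surface density by the area of one square parsec gives a mass.
    """
    return x_co * 2.0 * M_H_G * helium * PC_CM ** 2 / MSUN_G

def molecular_mass(l_co, x_co):
    """Molecular gas mass in Msun for L_CO in K km/s pc^2."""
    return alpha_co(x_co) * l_co

alpha_gal = alpha_co(2.0e20)
m_mol_limit = molecular_mass(1.0e5, 2.0e20)
print(f"alpha_CO ~ {alpha_gal:.2f} Msun per K km/s pc^2")
print(f"M_mol ~ {m_mol_limit:.2e} Msun for L_CO = 1e5 K km/s pc^2")
```

With the helium factor included, the result is close to the 4.4 × 10⁵ M⊙ quoted for the Galactic conversion factor; setting `helium=1.0` gives the mass in H2 alone.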
With the present observations, we constrain LCO < 1 × 10⁵ K km s⁻¹ pc² and thus clearly rule out LCO ∼ 10⁶ K km s⁻¹ pc². This may be seen in Figure 3; even if I Zw 18 has the highest possible CO content, it will still have a lower LCO/LB than 97% of the survey galaxies.

4.3. Comparison With Nearby Metal-Poor Dwarfs

The subset of irregular galaxies detected by CO surveys tend to be CO-rich and actively star-forming, resembling scaled-down versions of spiral galaxies (Young et al. 1995, 1996; Leroy et al. 2005). Such galaxies may not be representative of all dwarfs. Because they are nearby, several of the closest dwarf irregulars have been detected despite very small LCO. With their low masses and metallicities, they may represent good points of comparison for I Zw 18. Table 1 and Figure 3 show CO luminosities and LCO/LB for four nearby dwarfs: NGC 1569, the Small Magellanic Cloud (SMC), NGC 6822, and IC 10. The SMC, NGC 1569, and NGC 6822 have LCO ∼ 10⁵ K km s⁻¹ pc², close to our upper limit, and occupy a region of LCO/LB-LB parameter space similar to I Zw 18. All four of these galaxies have active star formation but very low CO content relative to their other properties.

We test whether our observations would have detected CO in NGC 1569, the SMC, and IC 10 at the plausible lower limit of 10 Mpc (from H0 = 72 km s⁻¹) or our adopted distance of 14 Mpc.

Fig. 2. V-band (left) and Hα (right, Cannon et al. 2002) images of I Zw 18. Overlays on the left image show the size of the synthesized beam and the locations of the spectra shown in Figure 1. Contours on the right image show continuum emission in increments of 0.5σ significance and the location of the radio continuum peak. The primary beam is larger than the area shown. Both optical maps are on linear stretches. V-band data obtained from the MAST Archive (originally observed for GO program 9400, PI: T. Thuan).
We convolve the integrated intensity maps to resolutions of 210 and 300 pc and measure the peak integrated intensity. The results appear in columns 4 and 5 of Table 1. The PdBI observations of NGC 1569 resolve out most of the flux, so we also apply this test to a distribution with the size and luminosity derived by Greve et al. (1996) from single dish observations. Our observations would detect an analog to IC 10 but not the SMC, with NGC 1569 an intermediate case. With a factor of ∼ 3 better sensitivity (requiring ∼ 10 times more observing time) we would expect to detect all three nearby galaxies. However, achieving such sensitivity with present instrumentation will be quite challenging. ALMA will likely be necessary to place stronger constraints on CO in galaxies like I Zw 18.

IC 10 may be the nearest blue compact dwarf (Richer et al. 2001), so it may be telling that we would detect it at the distance of I Zw 18. The blue compact galaxies that have been detected in CO have LCO/LB similar to IC 10 (Gondhalekar et al. 1998, the diamonds in Figure 3). Most searches for CO towards BCDs have yielded nondetections, so those detected may not be representative, but I Zw 18 is clearly not among the “CO-rich” portion of the BCD population.

4.4. Interpretation of the Continuum

We measure continuum intensity of F115GHz = 1.06 ± 0.35 mJy towards the radio continuum peak. The continuum is detected along only one line of sight, so we refer to it here as a point source and compare it to integrated values for I Zw 18.

F115GHz is expected to be the product of mainly two types of emission: thermal free-free emission and thermal dust emission. At long wavelengths, the integrated thermal free-free emission is F1.4GHz(free-free) ≈ 0.52 – 0.75 mJy (Cannon et al. 2005; Hunt et al. 2005a), implying F115GHz(free-free) = 0.36 – 0.51 mJy at 115 GHz (Fν ∝ ν⁻⁰·¹). The Hα flux predicts a similar value, F115GHz(free-free) = 0.34 mJy (Cannon et al. 2005, Equation 1).
Hunt et al. (2005b) placed an upper limit of Fν(850) < 2.5 mJy on dust continuum emission at 850 µm; this is consistent with the ∼ 5 × 10³ M⊙ estimated by Cannon et al. (2002) given almost any reasonable dust properties. Extrapolating this to 2.6 mm assuming a pure blackbody spectrum, the shallowest plausible SED, constrains thermal emission from dust to be < 0.25 mJy at 115 GHz.

Based on these data, we would predict F115GHz ≲ 0.75 mJy. Thus our measured F115GHz is consistent with, but somewhat higher than, the thermal free-free plus dust emission expected based on optical, centimeter, and submillimeter data.

4.5. Relation to Star Formation

I Zw 18 has a star formation rate ∼ 0.06 – 0.1 M⊙ yr⁻¹, based on Hα and cm radio continuum measurements (Cannon et al. 2002; Kennicutt 1998a; Hunt et al. 2005a). Our continuum flux suggests a slightly higher value ≈ 0.15 – 0.2 M⊙ yr⁻¹ (following Hunt et al. 2005a; Condon 1992), with the exact value depending on the contribution from thermal dust emission. For any value in this range, the star formation rate per CO luminosity, SFR/LCO, is much higher in I Zw 18 than in spirals. For comparison, our upper limit and the molecular “Schmidt Law” derived by Murgia et al. (2002) predict a star formation rate ≲ 2 × 10⁻⁴ M⊙ yr⁻¹. Fits by Young et al. (1996) and Kennicutt (1998b, applied to just the molecular limit) yield similar values. Again, I Zw 18 is similar to the SMC and NGC 6822, which have star formation rates of 0.05 M⊙ yr⁻¹ and 0.04 M⊙ yr⁻¹ (Wilke et al. 2004; Israel 1997b) and LCO ∼ 10⁵ K km s⁻¹ pc².

Fig. 3. CO luminosity normalized by absolute blue magnitude for galaxies with Hubble Type Sb or later (black circles, Young et al. 1995; Elfhag et al. 1996; Böker et al. 2003; Leroy et al. 2005). We also plot nearby dwarfs from Table 1 (crosses) and blue compact galaxies compiled by Gondhalekar et al. (1998, diamonds). The shaded region shows our upper limit for I Zw 18, with the range in MB for distances from 10 to 20 Mpc. The dashed line and light shaded region show the median value and 1σ scatter in LCO/LB for spirals and dwarf starbursts. Methodology: We extrapolate from ICO in central pointings to LCO assuming the CO to have an exponential profile with scale length 0.1 d25 (Young et al. 1995), including only galaxies where the central pointing measures > 20% of LCO. We adopt B magnitudes (corrected for internal and Galactic extinction), distances (Tully-Fisher when available, otherwise Virgocentric-flow corrected Hubble flow), and radii from LEDA (Paturel et al. 2003).

4.6. Variations in XCO

Several calibrations of the CO-to-H2 conversion factor, XCO, as a function of metallicity exist in the literature. The topic has been controversial and these calibrations range from little or no dependence (e.g. Walter 2003; Rosolowsky et al. 2003) to very steep dependence (e.g., XCO ∝ Z⁻²·⁷, Israel 1997a). Comparing the star formation rate to our CO upper limit, we may rule out that I Zw 18 has a Galactic XCO unless molecular gas in I Zw 18 forms stars much more efficiently than in the Galaxy. Either the ratio of CO-to-H2 is low in I Zw 18 or molecular gas in this galaxy forms stars with an efficiency two orders of magnitude higher than that in spiral galaxies.

5. CONCLUSIONS

We present new, sensitive observations of the metal-poor dwarf galaxy I Zw 18 at 3 mm using the Plateau de Bure Interferometer. These data constrain the integrated CO J = 1 → 0 intensity to be ICO < 1 K km s⁻¹ over our 300 pc (FWHM) beam and the luminosity to be LCO < 1 × 10⁵ K km s⁻¹ pc².

I Zw 18 has less CO relative to its B-band luminosity, H I mass, or SFR than spiral galaxies or dwarf starbursts, including more metal-rich blue compact galaxies such as IC 10 (ZIC 10 ∼ Z⊙/4, Lee et al. 2003). Because of its small size and large distance, these are the first observations to impose this constraint.
We show that I Zw 18 should be grouped with several local analogs — NGC 1569, the SMC, NGC 6822 — as a galaxy with active star formation but a very low CO content relative to its other properties. In these galax- ies, observations suggest that the environment affects the molecular gas and these data suggest that the same is true in I Zw 18. A simple comparison of star formation rate to CO content shows that this must be true at a basic level: either the ratio of CO to H2 is dramatically low in I Zw 18 or molecular gas in this galaxy forms stars with an efficiency two orders of magnitude higher than that in spiral galaxies. We detect 3mm continuum with F115 GHz = 1.06 ± 0.35 mJy coincident with the radio peak identified by Cannon et al. (2005) and Hunt et al. (2005a). This flux is consistent with but somewhat higher than the thermal free-free plus dust emission one would predict based on centimeter, submillimeter, and optical measurements. Finally, we note that improving on this limit with cur- rent instrumentation will be quite challenging. The order of magnitude increase in sensitivity from ALMA will be needed to place stronger constraints on CO in galaxies like I Zw 18. We thank Roberto Neri for his help reducing the data. We acknowledge the usage of the HyperLeda database (http://leda.univ-lyon1.fr). REFERENCES Arnault, P., Kunth, D., Casoli, F., & Combes, F. 1988, A&A, 205, 41 Bell, T. A., Roueff, E., Viti, S., & Williams, D. A. 2006, MNRAS, 371, 1865 Blitz, L. 1993, Protostars and Planets III, 125 Böker, T., Lisenfeld, U., & Schinnerer, E. 2003, A&A, 406, 87 Cannon, J. M., Skillman, E. D., Garnett, D. R., & Dufour, R. J. 2002, ApJ, 565, 931 Cannon, J. M., Walter, F., Skillman, E. D., & van Zee, L. 2005, ApJ, 621, L21 Condon, J. J. 1992, ARA&A, 30, 575 Gil de Paz, A., Madore, B. F., & Pevunova, O. 2003, ApJS, 147, Elfhag, T., Booth, R. S., Hoeglund, B., Johansson, L. E. B., & Sandqvist, A. 1996, A&AS, 115, 439 Gondhalekar, P. M., Johansson, L. E. 
B., Brosch, N., Glass, I. S., & Brinks, E. 1998, A&A, 335, 152
Greve, A., Becker, R., Johansson, L. E. B., & McKeith, C. D. 1996, A&A, 312, 391
Helfer, T. T., Thornley, M. D., Regan, M. W., Wong, T., Sheth, K., Vogel, S. N., Blitz, L., & Bock, D. C.-J. 2003, ApJS, 145,
Hunt, L. K., Dyer, K. K., & Thuan, T. X. 2005a, A&A, 436, 837
Hunt, L., Bianchi, S., & Maiolino, R. 2005b, A&A, 434, 849
Israel, F. P. 1997, A&A, 328, 471
Israel, F. P. 1997, A&A, 317, 65
Izotov, Y. I., & Thuan, T. X. 2004, ApJ, 616, 768

TABLE 1
CO in Nearby Low Mass Galaxies

Galaxy     M_B     L_CO                I_CO,210 (a)   I_CO,300 (a)   Reference
           (mag)   (K km s^-1 pc^2)    (K km s^-1)    (K km s^-1)
NGC 1569   −16.5   1.2 × 10^5          1.1            0.8            Greve et al. (1996)
NGC 1569   −16.5   0.2 × 10^5          0.8            0.5            Taylor et al. (1999)
SMC        −16     1.5 × 10^5          0.5            0.4            Mizuno et al. (2001, 2006)
NGC 6822   −16     1.2 × 10^5          ···            ···            Israel (1997b)
IC 10      −16.5   2.2 × 10^6          3.8            2.2            Leroy et al. (2006)
I Zw 18    −14.7   < 2 × 10^6          ···            ···            Arnault et al. (1988); Gondhalekar et al. (1998)
I Zw 18    −14.7   ≲ 1 × 10^5          < 1            < 1            this paper

(a) Peak integrated intensity at 210 and 300 pc, corresponding to our beam size at 10 and 14 Mpc, respectively.

Kennicutt, R. C., Jr. 1998a, ARA&A, 36, 189
Kennicutt, R. C., Jr. 1998b, ApJ, 498, 541
Lee, H., McCall, M. L., & Richer, M. G. 2003, AJ, 125, 2975
Leroy, A., Bolatto, A. D., Simon, J. D., & Blitz, L. 2005, ApJ, 625, 763
Leroy, A., Bolatto, A., Walter, F., & Blitz, L. 2006, ApJ, 643, 825
Madden, S. C., Poglitsch, A., Geis, N., Stacey, G. J., & Townes, C. H. 1997, ApJ, 483, 200
Maloney, P., & Black, J. H. 1988, ApJ, 325, 389
Mizuno, N., Rubio, M., Mizuno, A., Yamaguchi, R., Onishi, T., & Fukui, Y. 2001, PASJ, 53, L45
Mizuno, N., et al. 2006, in prep.
Murgia, M., Crapsi, A., Moscadelli, L., & Gregorini, L. 2002, A&A, 385, 412
Murgia, M., Helfer, T. T., Ekers, R., Blitz, L., Moscadelli, L., Wong, T., & Paladino, R. 2005, A&A, 437, 389
Pak, S., Jaffe, D. T., van Dishoeck, E. F., Johansson, L. E. B., & Booth, R. S.
1998, ApJ, 498, 735 Paturel, G., Petit, C., Prugniel, P., Theureau, G., Rousseau, J., Brouty, M., Dubois, P., & Cambrésy, L. 2003, A&A, 412, 45 Richer, M. G., et al. 2001, A&A, 370, 34 Rosolowsky, E., Engargiola, G., Plambeck, R., & Blitz, L. 2003, ApJ, 599, 258 Skillman, E. D., & Kennicutt, R. C., Jr. 1993, ApJ, 411, 655 Taylor, C. L., Kobulnicky, H. A., & Skillman, E. D. 1998, AJ, 116, 2746 Taylor, C. L., Hüttemeister, S., Klein, U., & Greve, A. 1999, A&A, 349, 424 van Zee, L., Westpfahl, D., Haynes, M. P., & Salzer, J. J. 1998, AJ, 115, 1000 Vidal-Madjar, A., et al. 2000, ApJ, 538, L77 Walter, F. 2003, IAU Symposium, 221, 176P Wilke, K., Klaas, U., Lemke, D., Mattila, K., Stickel, M., & Haas, M. 2004, A&A, 414, 69 Wilson, B. A., Dame, T. M., Masheder, M. R. W., & Thaddeus, P. 2005, A&A, 430, 523 Young, J. S., et al. 1995, ApJS, 98, 219 Young, J. S., & Scoville, N. Z. 1991, ARA&A, 29, 581 Young, J. S., Allen, L., Kenney, J. D. P., Lesser, A., & Rownd, B. 1996, AJ, 112, 1903 ABSTRACT We present sensitive molecular line observations of the metal-poor blue compact dwarf I Zw 18 obtained with the IRAM Plateau de Bure interferometer. These data constrain the CO J=1-0 luminosity within our 300 pc (FWHM) beam to be L_CO < 1 \times 10^5 K km s^-1 pc^2 (I_CO < 1 K km s^-1), an order of magnitude lower than previous limits. Although I Zw 18 is starbursting, it has a CO luminosity similar to or less than nearby low-mass irregulars (e.g. NGC 1569, the SMC, and NGC 6822). There is less CO in I Zw 18 relative to its B-band luminosity, HI mass, or star formation rate than in spiral or dwarf starburst galaxies (including the nearby dwarf starburst IC 10). Comparing the star formation rate to our CO upper limit reveals that unless molecular gas forms stars much more efficiently in I Zw 18 than in our own galaxy, it must have a very low CO-to-H_2 ratio, \sim 10^-2 times the Galactic value. 
We detect 3mm continuum emission, presumably due to thermal dust and free-free emission, towards the radio peak. <|endoftext|><|startoftext|> Introduction The Model The BPS code and the formation of hot subdwarfs Monte-Carlo simulation parameters Spectral library Observables from the model Simulations Standard simulation set Simulation sets with varying model parameters Simulation sets for composite stellar populations Results and Discussion Simple stellar populations The model for composite stellar populations Theory versus observations The UV-upturn and metallicity Comparison with previous models Summary and Conclusion REFERENCES ABSTRACT The discovery of a flux excess in the far-ultraviolet (UV) spectrum of elliptical galaxies was a major surprise in 1969. While it is now clear that this UV excess is caused by an old population of hot helium-burning stars without large hydrogen-rich envelopes, rather than young stars, their origin has remained a mystery. Here we show that these stars most likely lost their envelopes because of binary interactions, similar to the hot subdwarf population in our own Galaxy. We have developed an evolutionary population synthesis model for the far-UV excess of elliptical galaxies based on the binary model developed by Han et al (2002, 2003) for the formation of hot subdwarfs in our Galaxy. Despite its simplicity, it successfully reproduces most of the properties of elliptical galaxies with a UV excess: the range of observed UV excesses, both in $(1550-V)$ and $(2000-V)$, and their evolution with redshift. We also present colour-colour diagrams for use as diagnostic tools in the study of elliptical galaxies. The model has major implications for understanding the evolution of the UV excess and of elliptical galaxies in general. 
In particular, it implies that the UV excess is not a sign of age, as had been postulated previously, and predicts that it should not be strongly dependent on the metallicity of the population, but exists universally from dwarf ellipticals to giant ellipticals. <|endoftext|><|startoftext|> Baltic Astronomy, vol.12, XXX–XXX, 2003.

THE REDSHIFT OF LONG GRBS’

Z. Bagoly1 and I. Csabai2 and A. Mészáros3 and P. Mészáros4 and I. Horváth5 and L. G. Balázs6 and R. Vavrek7

1 Lab. for Information Technology, Eötvös University, H-1117 Budapest, Pázmány P. s. 1./A, Hungary
2 Dept. of Physics for Complex Systems, Eötvös University, H-1117 Budapest, Pázmány P. s. 1./A, Hungary
3 Astronomical Institute of the Charles University, V Holešovičkách 2, CZ-180 00 Prague 8, Czech Republic
4 Dept. of Astronomy & Astrophysics, Pennsylvania State University, 525 Davey Lab., University Park, PA 16802, USA
5 Dept. of Physics, Bolyai Military University, H-1456 Budapest, POB 12, Hungary
6 Konkoly Observatory, H-1505 Budapest, POB 67, Hungary
7 Max-Planck-Institut für Astronomie, D-69117 Heidelberg, 17 Königstuhl, Germany

Received October 20, 2003

Abstract. The low energy spectra of some gamma-ray bursts show excess components beside the power-law dependence. The consequences of such a feature allow one to estimate the gamma photometric redshift of the long gamma-ray bursts in the BATSE Catalog. There is good correlation between the measured optical and the estimated gamma photometric redshifts. The estimated redshift values for the long bright gamma-ray bursts are up to z = 4, while for the faint long bursts, which should extend up to z = 20, the redshifts cannot be determined unambiguously with this method. The redshift distribution of all the gamma-ray bursts with known optical redshift agrees quite well with the BATSE based gamma photometric redshift distribution.

Key words: Cosmology - Gamma-ray burst

1.
INTRODUCTION

In this article we present a new method, called gamma photometric redshift (GPZ) estimation, for estimating the redshifts of the long GRBs. We utilize the fact that broadband fluxes change systematically as characteristic spectral features redshift into, or out of, the observational bands. The situation is in some sense similar to the optical observations of galaxies, where for galaxies and quasars the photometric redshift estimation (Csabai et al. 2000; Budavári et al. 2001) has achieved great success in estimating redshifts from photometry only.

We construct the template spectrum that will be used in the GPZ process in the following manner: let the spectrum be the sum of the Band function and of a low energy soft excess power-law function, which has been observed in several cases (Preece et al. 2000). The low energy cross-over is at Ecr = 90 keV, Eo = 500 keV, and the spectral indices are α = 3.2, β = 0.5 and γ = 3.0.

Let us introduce the peak flux ratio (PFR hereafter) in the following way:

$$\mathrm{PFR} = \frac{l_{34} - l_{12}}{l_{34} + l_{12}},$$

where l_ij is the BATSE DISCSC flux in the energy channel E_i < E < E_j; here E_1 = 25 keV, E_2 = E_3 = 55 keV, E_4 = 100 keV.

Fig. 1. The theoretical PFR curves calculated from the template spectrum (α = 3.2, β = 0.5, Ecr = 90 keV) using the average detector response matrix.

The spectra change quite rapidly with time; the typical timescale for the time variation is ≃ (0.5 − 2.5) s (Ryde & Svensson 1999, 2000). Therefore, we will consider the spectra in the 320 ms time interval centered around the peak flux. If we redshift the template spectrum and use the detector response matrix of the given burst, we can obtain the observed flux and the PFR value for any redshift. In Fig. 1
we plot the theoretical PFR curves calculated from the above defined template spectrum using the average detector response matrices for the 8 bursts that have both BATSE data and measured redshifts (Klose 2000). In the used range of z (i.e., for z < 4) the relation between z and PFR is invertible, hence we can use it to estimate the gamma photometric redshift (GPZ) from a measured PFR. For the 7 considered GRBs (leaving out the GRB associated with the supernova and the GRB having only an upper redshift limit) the estimation error between the real z and the GPZ is ∆z ≈ 0.33.

2. ESTIMATION OF THE REDSHIFTS

Here we restrict ourselves to long and not very faint GRBs with T90 > 10 s and F256 > 0.65 photon/(cm^2 s) to avoid the problems with the instrumental threshold (Pendleton et al. 1997; Hakkila et al. 2000). Introducing another cut at F256 > 2.00 photon/(cm^2 s), we can investigate roughly the brighter half of this sample.

As the soft-excess range redshifts out of the BATSE DISCSC energy channels around z ≈ 4, the theoretical curves converge to a constant value; for higher z the PFR starts to decrease. This means that the method is ambiguous: for a given value of PFR one may have two redshifts, one below and one above z ≈ 4. Because values above z ≈ 4 are practically excluded for the bright GRBs, the method is usable for them. Using only the 25 − 55 keV and 55 − 100 keV BATSE energy channels, this method can be used to estimate GPZ only in the redshift range z < 4.

Fig. 2. The distribution of the GPZ estimators of the long GRBs having DISCSC data, for F256 > 0.65 ph/(cm^2 s) and F256 > 2.0 ph/(cm^2 s).

Let us assume for a moment that all the long bursts selected above have z < 4. Then we can simply calculate the z_GPZ redshift for any GRB that has a PFR from the DISCSC data. Fig. 2 shows the distribution of the estimated redshifts under the assumption that all GRBs are below z ≈ 4.
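The two steps described above, computing the PFR from the two channel fluxes and inverting a monotonic PFR(z) curve, can be sketched as follows. This is only an illustration: the tabulated curve `toy_pfr` is a made-up placeholder and is not the template curve of Fig. 1, which requires the template spectrum and the detector response matrix. The function names are hypothetical.

```python
from bisect import bisect_left

def peak_flux_ratio(l12: float, l34: float) -> float:
    """PFR = (l34 - l12) / (l34 + l12) from the 25-55 keV and 55-100 keV fluxes."""
    total = l34 + l12
    if total <= 0:
        raise ValueError("total flux must be positive")
    return (l34 - l12) / total

def invert_pfr(pfr: float, z_grid: list, pfr_grid: list) -> float:
    """Linearly interpolate z at a given PFR on a tabulated curve.

    The curve must be strictly increasing, which is the sense in which the
    z-PFR relation is 'invertible' on the restricted range z < 4.
    """
    if any(b <= a for a, b in zip(pfr_grid, pfr_grid[1:])):
        raise ValueError("PFR(z) is not strictly increasing on this grid")
    if not pfr_grid[0] <= pfr <= pfr_grid[-1]:
        raise ValueError("PFR outside the tabulated range")
    i = bisect_left(pfr_grid, pfr)
    if i == 0:
        return z_grid[0]
    x0, x1 = pfr_grid[i - 1], pfr_grid[i]
    t = (pfr - x0) / (x1 - x0)
    return z_grid[i - 1] + t * (z_grid[i] - z_grid[i - 1])

# Placeholder curve for illustration only (NOT the curve of Fig. 1).
toy_z = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_pfr = [0.0, 0.25, 0.5, 0.75, 1.0]

pfr = peak_flux_ratio(l12=1.0, l34=3.0)
z_gpz = invert_pfr(pfr, toy_z, toy_pfr)
print(pfr, z_gpz)
```

The monotonicity check makes the degeneracy discussed above explicit: a curve that rises and then falls beyond z ≈ 4 would be rejected, which is why the inversion is restricted to z < 4.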
The distribution has a clear peak value around PFR ≈ 0.2, which corresponds to z ≈ (1.5 − 2.0). Although there is a problem with the degeneracy (i.e., two possible redshift values), we think that the great majority of the values of z obtained for the bright half are correct. This opinion is supported by the following argument: the distribution of GRBs in z obtained for the bright half is very similar to the distributions obtained by Schmidt (2001) and Schaefer et al. (2001). A further problem with placing the bright GRBs in the z > 4 regime is the extremely high GRB luminosities that would be implied, ≃ 10^53 erg/s (Mészáros & Mészáros 1996).

Fig. 3. The redshift distribution of the 17 GRBs with known z and the distributions from the GPZ estimators (F256 > 0.65 ph/cm^2/s).

As an additional statistical test we compared the redshift distribution of the 17 GRBs with observed redshift with our reconstructed GRB z distributions (limited to the z < 4 range). For the F256 > 0.65 photon/(cm^2 s) group the KS test suggests a 38% probability, i.e., the observed N(< z) probability distribution agrees quite well with the GPZ reconstructed function.

ACKNOWLEDGMENTS

Useful remarks from Drs. T. Budavári, S. Klose, D. Reichart, and A.S. Szalay are gratefully acknowledged. This research was supported in part through OTKA grants T024027 (L.G.B.), F029461 (I.H.) and T034549, Czech Research Grant J13/98: 113200004 (A.M.), and NASA grant NAG5-9192 (P.M.).

REFERENCES

Budavári, T., Csabai, I., Szalay, A.S. et al. 2001, AJ, 122, 1163
Csabai, I., Connolly, A.J., Szalay, A.S. et al. 2000, AJ, 119, 69
Hakkila, J., Haglin, D. J., Pendleton, G. N. et al. 2000, ApJ, 538,
Klose, S. 2000, Reviews in Modern Astronomy 13, Astronomische Gesellschaft, Hamburg, p.129
Mészáros, A., & Mészáros, P. 1996, ApJ, 466, 29
Preece, R.D., Briggs, M.S., Pendleton, G.N., et al. 1996, ApJ, 473,
Preece, R.D., Briggs, M.S., Mallozzi, et.
al, 2000, ApJS, 126, 19
Ryde, F., & Svensson, R. 1999, ApJ, 512, 693
Ryde, F., & Svensson, R. 2000, ApJ, 529, L13
Schaefer, B. E., Deng, M. & Band, D. L., 2001, ApJ, 563, L123
Schmidt, M. 2001, ApJ, 552, 36

ABSTRACT The low energy spectra of some gamma-ray bursts show excess components beside the power-law dependence. The consequences of such a feature allow one to estimate the gamma photometric redshift of the long gamma-ray bursts in the BATSE Catalog. There is good correlation between the measured optical and the estimated gamma photometric redshifts. The estimated redshift values for the long bright gamma-ray bursts are up to z=4, while for the faint long bursts, which should extend up to z=20, the redshifts cannot be determined unambiguously with this method. The redshift distribution of all the gamma-ray bursts with known optical redshift agrees quite well with the BATSE based gamma photometric redshift distribution. <|endoftext|><|startoftext|> Introduction The increasing complexity of software systems raises major concerns in various critical application domains, in particular with respect to the validation and analysis of performance, timing and dependability requirements. Model-driven engineering approaches based on architecture description languages (ADLs) aim at mastering this complexity at the design level. Over the last decade, considerable research has been devoted to ADLs, leading to a large number of proposals [1]. In particular, AADL (Architecture Analysis and Design Language) [2] has received increasing interest from the safety-critical industry (e.g., Honeywell, Rockwell Collins, Lockheed Martin, the European Space Agency, Airbus) in recent years. It has been standardized under the auspices of the International Society of Automotive Engineers (SAE) to support the design and analysis of complex real-time safety-critical applications.
AADL provides a standardized textual and graphical notation, for describing architectures with functional interfaces, and for performing various analyses to determine the behavior and performance of the system being modeled. AADL has been designed to be extensible to accommodate analyses that the core language does not support, such as dependability and performance. In critical application domains, one of the challenges faced by the software engineers concerns: 1) the description of the software architecture and its dynamic behavior taking into account the impact of errors and failures, and 2) the evaluation of quantitative measures of relevant dependability properties such as reliability, availability and safety, allowing them to assess the impact of errors and failures on the service. For pragmatic reasons, the designers using an AADL-based engineering approach are interested in using an integrated set of methods and tools to describe specifications and designs, and to perform dependability evaluations. The AADL Error Model Annex [3] has been defined to complement the description capabilities of the AADL core language standard by providing features with precise semantics to be used for describing dependability-related characteristics in AADL models (faults, failure modes and repair assumptions, error propagations, etc.). AADL and the AADL Error Model Annex are supported by the Open Source AADL Tool Environment (OSATE)1. At the current stage, there is a lack of methodologies and guidelines to help the developers, using an AADL based engineering approach, to use the notations defined in the standard for describing complex dependability models reflecting real-life systems with multiple dependencies between components. The objective of this paper is to propose a structured method for AADL dependability model construction. The AADL model is built and validated iteratively, taking into account progressively the dependencies between the components. 
The approach proposed in this paper is complementary to other research studies focused on the extension of the AADL language capabilities to support formal verifications and analyses (see e.g. [4]). Also, it is intended to be complementary to other studies focused on the integration of formal verification, dependability and performance related activities in the general context of model driven engineering approaches based on ADLs and on UML (see e.g., [5-9]).

¹ http://lwww.aadl.info/OpenSourceAADLToolEnvironment.html

The remainder of the paper is organized as follows. Section 2 presents the AADL concepts that are necessary for understanding our modeling approach. Section 3 gives an overview of our framework for system dependability modeling and evaluation using AADL. Section 4 presents the iterative approach for building the AADL dependability model. Section 5 illustrates some of the concepts of our approach on a small example and section 6 concludes the paper.

2. AADL concepts

The AADL core language allows analyzing the impact of different architecture choices (such as scheduling policy or redundancy scheme) on a system’s properties [10]. An architecture specification in AADL is a hierarchical collection of interacting components (software and compute platform) combined in subsystems. Each AADL component is modeled at two levels: in the component type and in one or more component implementations corresponding to different implementation structures of the component in terms of subcomponents and connections. The AADL core language is designed to describe static architectures with operational modes for their components. However, it can be extended to associate additional information to the architecture. AADL error models are an extension intended to support (qualitative and quantitative) analyses of dependability attributes. The AADL Error Model Annex defines a sublanguage to declare reusable error models within an error model annex library.
The AADL architecture model serves as a skeleton for error model instances. Error model instances can be associated with components of the system and with the system itself. The component error models describe the behavior of the components with which they are associated, in the presence of internal faults and recovery events, as well as in the presence of external propagations from the component’s environment. Error models have two levels of description: the error model type and the error model implementation. The error model type declares a set of error states, error events (internal to the component) and error propagations² (events that propagate, from one component to other components, through the connections and bindings between components of the architecture model). Propagations have associated directions (in or out or in out). Error model implementations declare transitions between states, triggered by events and propagations declared in the error model type. Both the type and the implementation can declare Occurrence properties that specify the arrival rate or the occurrence probability of events and propagations. An out propagation occurs according to a specified Occurrence property when it is named in a transition and the current state is the origin of the transition. If the source state and the destination state of a transition triggered by an out propagation are the same, the propagation is sent out of the component but does not influence the state of the sender component. An in propagation occurs as a consequence of an out propagation from another component. Figure 1 shows an error model example.

² Error states can also model error-free states, error events can also model repair events, and error propagations can model all kinds of notifications.
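The transition semantics described above can be read as a small state machine. The sketch below encodes the states and triggers of the example error model of Figure 1 (`Error_Free`, `Failed`, `Fail`, `Recover`, `in KO`, `out KO`) in a lookup table. It is an illustrative reading aid, not part of the AADL Error Model Annex or of any AADL tool; the class `ErrorModelInstance` and its `fire` method are hypothetical names, and the Occurrence properties (rates and probabilities) are deliberately ignored.

```python
from dataclasses import dataclass

# (source state, trigger) -> destination state, following the transitions
# of the `simple.general` error model implementation in Figure 1.
TRANSITIONS = {
    ("Error_Free", "Fail"): "Failed",
    ("Error_Free", "in KO"): "Failed",
    ("Failed", "Recover"): "Error_Free",
    ("Failed", "out KO"): "Failed",  # self-loop: propagation is sent, state unchanged
}

@dataclass
class ErrorModelInstance:
    """Hypothetical stand-in for an error model instance attached to one component."""
    name: str
    state: str = "Error_Free"

    def fire(self, trigger: str) -> bool:
        """Apply a trigger; return False if no transition is enabled in the current state."""
        destination = TRANSITIONS.get((self.state, trigger))
        if destination is None:
            return False
        self.state = destination
        return True

component = ErrorModelInstance("Component 1")
component.fire("Fail")      # Error_Free -> Failed
component.fire("out KO")    # stays Failed while the propagation is sent out
component.fire("Recover")   # Failed -> Error_Free
print(component.state)
```

The `out KO` entry illustrates the case where source and destination states coincide: the propagation leaves the component without changing the sender's state.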
Error Model Type [simple]

    error model simple
    features
      Error_Free: initial error state;
      Failed: error state;
      Fail: error event {Occurrence => Poisson λ};
      Recover: error event {Occurrence => Poisson µ};
      KO: in out error propagation {Occurrence => fixed p};
    end simple;

Error Model Implementation [simple.general]

    error model implementation simple.general
    transitions
      Error_Free-[Fail]->Failed;
      Error_Free-[in KO]->Failed;
      Failed-[Recover]->Error_Free;
      Failed-[out KO]->Failed;
    end simple.general;

Figure 1. Simple error model

Error model instances can be customized to fit a particular component through the definition of Guard properties that control and filter propagations by means of Boolean expressions. The system error model is defined as a composition of a set of concurrent finite stochastic automata corresponding to components. In the same way as the entire architecture, the system error model is described hierarchically. The state of a system that contains subcomponents can be specified as a function of its subcomponents’ states (i.e., the system has a derived error model).

3. Overview of the modeling framework

For complex systems, the main difficulty in building a dependability model arises from dependencies between the system components. Dependencies can be of several types, identified in [11]: functional, structural or related to the recovery and maintenance strategies. Exchange of data or transfer of intermediate results from one component to another is an example of functional dependency. The fact that a thread runs on a processor induces a structural dependency between the thread and the processor. Sharing a recovery or maintenance facility between several components leads to a recovery or maintenance dependency. Functional and structural dependencies can be grouped into an architecture-based dependency class, as they are triggered by physical or logical connections between the dependent components at architectural level.
Instead, recovery and maintenance dependencies are not always visible at architectural level. A structured approach is necessary to model dependencies in a systematic way, to promote model reusability, to avoid errors in the resulting model of the system and to facilitate its validation. In our approach, the AADL dependability-oriented model is built in a progressive and iterative way. More concretely, in a first iteration, we propose to build the model of the system’s components, representing their behavior in the presence of their own faults and recovery events only. The components are thus modeled as if they were isolated from their environment. In the following iterations, dependencies between basic error models are introduced progressively. This approach is part of a complete framework that allows the generation of dependability analysis and evaluation models from AADL models. An overview of this framework is presented in Figure 2. Figure 2. Modeling framework The first step is devoted to the modeling of the application architecture in AADL (in terms of components and operational modes of these components). The AADL architecture model may be available if it has been already built for other purposes. The second step concerns the specification of the application behavior in the presence of faults through AADL error models associated with components of the architecture model. The error model of the application is a composition of the set of component error models. The architecture model and the error model of the application form the dependability-oriented AADL model, referred to as the AADL dependability model. The third step aims at building an analytical dependability evaluation model, from the AADL dependability model, based on model transformation rules. The fourth step is devoted to the dependability evaluation model processing that aims at evaluating quantitative measures characterizing dependability attributes. 
This step is entirely based on existing processing tools. The iterative approach can be applied to the second step of the modeling framework only or to the second and third steps together. In the latter case, semantic validation based on the analytical model, after each iteration, is helpful to identify specification errors in the AADL dependability model. Due to space limitations, we focus only on the first and second steps in this paper. A transformation from AADL to generalized stochastic Petri nets (GSPN) for dependability evaluation purposes is presented in [12]. 4. AADL dependability model construction To illustrate the proposed approach, the rest of this section presents successively guidelines for modeling an architecture-based dependency (structural or functional) and a recovery and maintenance dependency. More general practical aspects for building the AADL dependability model are given at the end of this section. Note that we illustrate the principles using the graphical notation for AADL composite components (system components). However, they apply to all types of components and connections. 4.1. Architecture-based dependency The dependency is modeled in the error models associated with the dependent components, by specifying respectively outgoing and incoming propagations and their impact on the corresponding error model. An example is shown in Figure 3: Component 1 sends data to Component 2, thus we assume that, at the error model level, the behavior of Component 2 depends on that of Component 1. Figure 3. Architecture-based dependency Instances of the same error model, shown in Figure 1, are associated both with Component 1 and with Component 2. However, the AADL dependability model is asymmetric because of the unidirectional connection between Component 1 and Component 2. Thus, the out propagation KO declared in the error model instance associated with Component 2 is inactive (i.e., even if it occurs, it cannot propagate to Component 1). 
The out propagation KO from the error model instance of Component 1, together with its Occurrence property and the AADL transition triggered by it, forms the “sender” part of the dependency. It means that when Component 1 fails, it sends a propagation through the unidirectional connection. The in propagation KO from the error model instance of Component 2, together with the AADL transition triggered by it, forms the “receiver” part of the dependency. Thus, an incoming propagation KO causes the failure of the receiving component. In real applications, architecture-based dependencies usually require more advanced control and filtering of propagations through Guard properties. In particular, Boolean expressions can be defined to specify the consequences of a set of propagations occurring in a set of sender components on a receiver component.

4.2. Recovery and maintenance dependency

Recovery and maintenance dependencies need to be described when recovery and maintenance facilities are shared between components or when the maintenance activity of some components has to be carried out according to a given order or a specified strategy (e.g., a thread can be restarted only if another thread is running). Components that are not dependent at architectural level may become dependent due to the recovery and maintenance strategy. Thus, the AADL dependability model might need some adjustments to support the description of dependencies related to the maintenance strategy. As error models interact only via propagations through architectural features (i.e., connections, bindings), the recovery and maintenance dependency between components’ error models must be supported by the architecture model. Thus, besides the architecture components, we may need to model (at architectural level) a component that describes the recovery and maintenance strategy. Figure 4-a shows an example of an AADL dependability model.
In this architecture, Component 3 and Component 4 do not interact at the architecture level. However, if we assume that they share a recovery and maintenance facility, the recovery and maintenance strategy has to be taken into account in the error model of the application. Thus, it is necessary to represent the recovery and maintenance facility at the architectural level, as shown in Figure 4-b, in order to model explicitly the dependency between Component 3 and Component 4. Also, the error models of components that are dependent with regard to the recovery and maintenance strategy might need some adjustments. For example, to represent the fact that Component 3 can only restart if Component 4 is running, one needs to distinguish between a failed state of Component 3 and a failed state where Component 3 is allowed to restart.

Figure 4. Maintenance dependency (panels a and b)

4.3. Practical aspects

The order for modeling dependencies does not impact the final AADL dependability model. However, it may impact the reusability of parts of the model. Thus, the order may be chosen according to the context of the targeted analysis. For example, if the analysis is meant to help the user to choose the best-adapted structure for a system whose functions are completely defined, it may be convenient to introduce first functional dependencies between components and then structural dependencies, as the model corresponding to functional dependencies is to be reused. Generally, recovery and maintenance dependencies are modeled at the end, as one important aim of the dependability evaluation is to find the best-suited recovery and maintenance strategies for an application. Recovery and maintenance dependencies may have an impact on the system’s structure. Not all the details of the architecture model are necessary for the AADL dependability model. Only components that have associated error models and all connections and bindings between them are necessary.
This allows a designer to evaluate dependability measures at different stages in the development cycle by moving from a lower fidelity AADL dependability model to a detailed one. In some cases, not all components having associated error models are part of the AADL dependability model. The AADL Error Model Annex offers two useful abstraction options for error models of components composed of subcomponents: − The first option is to declare an abstract error model for a system component. In this case, the corresponding component is seen as a black box (i.e., the detailed subcomponents’ error models are not part of the AADL dependability model). This option is useful to abstract away modeling details in case an architecture model with too detailed error models associated with components does exist for other purposes. Issues linked to the relationship between abstract and concrete error models have been mentioned in [13]. − The second option is to define the state of a system component as a function of its subcomponents’ states. This option can be used to specify state classes for the overall application. These classes are useful in the evaluation of measures. If the user wishes to evaluate reliability or availability, it is necessary to specify the system states that are to be considered as failed states. If in addition, the user wishes to evaluate safety, it is necessary to specify the system states that are considered as catastrophic. 5. Example In this section we illustrate our modeling approach on a small software architecture representing a process whose functional role is to compute a result. The computation is divided in three sub computations, each of them being performed by a thread. The thread Compute2 uses the result obtained by the thread Compute1 and the thread Compute3 uses the result obtained by the thread Compute2 to compute the result expected from the process. 
The three threads are connected through data connections according to the pipe and filter architectural style [14]. Due to space limitations, we only take into account two dependencies:
− An architecture-based dependency between the computing threads: a failure in one of the computing threads may cause the failure of the following thread (with a probability p). In some cases, cascading failures can occur.
− A recovery dependency: Compute3 can only recover if Compute1 and Compute2 are error free. We assume that Compute2 can recover even if Compute1 is not error free.

The AADL dependability model of this application is shown in Figure 5 using the AADL graphical notation.

Figure 5. AADL dependability model

The AADL dependability model of this application is built in three iterations. The computing threads’ behavior in the presence of their own fault and recovery events is represented in the first iteration. The propagation KO, together with the corresponding transitions, is added in a second iteration to represent the architecture-based dependency. The thread Compute1 can have an impact on Compute2, and Compute2 can have an impact on Compute3. Recall that the opposite is not possible, as the connections between threads are unidirectional. The recovery dependency is modeled in the third iteration. It requires the existence of a Recovery thread in the architecture model (see the light grey part of Figure 5). Its role is to send (through the out port to3) a RecoverAuthorize propagation to Compute3 if Compute1 and Compute2 are error free.

Figure 6-a shows the error model Comp.general associated with threads Compute1 and Compute2. Figure 6-b shows the error model Comp3.general associated with the thread Compute3. The three iterations are highlighted. Each line tagged with a (+) sign is added to the error model corresponding to the previous iteration, while each line tagged with a (-) sign is removed from it during the current iteration.
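The (+)/(-) tagging convention can be read as a sequence of deltas applied to the previous iteration's model. The following Python sketch is only an illustration under that reading: the helper `apply_iterations` and the list `COMP3_ITERATIONS` are hypothetical names, not part of AADL or of the OSATE toolset, and the transitions are transcribed as plain strings from the Comp3.general listing of Figure 6-b.

```python
# Hypothetical helper (not part of AADL or OSATE): replay "(+)"/"(-)" tagged
# lines, iteration by iteration, to obtain the transition set of an error model.

# Tagged deltas transcribed from the Comp3.general listing (Figure 6-b).
COMP3_ITERATIONS = [
    [("+", "Error_Free-[Fail]->Failed"),
     ("+", "Failed-[Recover]->Error_Free")],
    [("+", "Error_Free-[in KO]->Failed"),
     ("+", "Failed-[out KO]->Failed")],
    [("-", "Failed-[Recover]->Error_Free"),
     ("+", "Failed-[RecoverAuthorize]->CanRecover"),
     ("+", "CanRecover-[Recover]->Error_Free")],
]


def apply_iterations(iterations):
    """Return the ordered list of transitions after replaying every delta."""
    transitions = []
    for number, delta in enumerate(iterations, start=1):
        for tag, transition in delta:
            if tag == "+":
                # A (+) line must introduce something new.
                if transition in transitions:
                    raise ValueError(f"iteration {number}: duplicate {transition}")
                transitions.append(transition)
            elif tag == "-":
                # A (-) line must remove something declared earlier.
                if transition not in transitions:
                    raise ValueError(f"iteration {number}: cannot remove {transition}")
                transitions.remove(transition)
            else:
                raise ValueError(f"iteration {number}: unknown tag {tag!r}")
    return transitions


final_model = apply_iterations(COMP3_ITERATIONS)
```

Replaying only the first one or two entries of `COMP3_ITERATIONS` yields the intermediate models, which matches the idea of validating the model after each iteration.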
The first and second iterations are the same for all three computing threads. In the third iteration, it is necessary to distinguish between a failed state and a failed state from which Compute3 is authorized to restart. This leads to removing a transition declared in the first iteration, and adding a state (CanRecover) and two transitions linking it to the state machine. Figure 7 shows the Guard_Out property applied to port to3 of the Recovery thread in the third iteration. This property specifies that a RecoverAuthorize propagation is sent to Compute3 through port to3 when OK propagations are received through ports in1 and in2 (meaning that Compute1 and Compute2 are error free). The Recovery thread has an associated error model that is not shown here. It declares in and out propagations used in the Guard_Out property. The main idea of this method is to verify and validate the model at each iteration. If a problem arises during iteration i, only the part of the current AADL dependability model corresponding to iteration i is questioned. Thus, the validation process is facilitated especially in the context of complex systems. 6. Conclusion This paper presented an iterative approach for system dependability modeling using AADL. This approach is meant to ease the task of analyzing dependability characteristics and evaluating dependability measures for the AADL users community. Our approach assists the user in the structured construction of the AADL dependability model (i.e., architecture model and dependability-related information). To support and trace model evolution, this approach proposes that the user builds the model iteratively. Components’ behaviors in the presence of faults are modeled in the first iteration as if they were isolated. Then, each iteration introduces a new dependency between system components. 
Error models representing the behavior of several types of system components and several types of dependencies may be placed in a library and then instantiated to minimize the modeling effort and maximize the reusability of models. The OSATE toolset is able to support our modeling approach. It also allows choosing component models and error models from libraries. For the sake of illustration, we used simple examples in this paper. We have already applied the iterative modeling approach to a system with multiple dependencies in [12] and we plan to validate it against other complex case studies.

a: Error Model for Compute1 and Compute2

    Error Model Type [Comp]
    error model Comp
    features
      -- iteration 1
      (+) Error_Free: initial error state;
      (+) Failed: error state;
      (+) Fail: error event
      (+)   {Occurrence => Poisson λ};
      (+) Recover: error event
      (+)   {Occurrence => Poisson µ};
      -- iteration 2
      (+) KO: in out error propagation
      (+)   {Occurrence => fixed p};
      -- iteration 3
      (+) OK: out error propagation
      (+)   {Occurrence => fixed 1};
    end Comp;

    Error Model Implementation [Comp.general]
    error model implementation Comp.general
    transitions
      -- iteration 1
      (+) Error_Free-[Fail]->Failed;
      (+) Failed-[Recover]->Error_Free;
      -- iteration 2
      (+) Error_Free-[in KO]->Failed;
      (+) Failed-[out KO]->Failed;
      -- iteration 3
      (+) Error_Free-[out OK]->Error_Free;
    end Comp.general;

b: Error Model for Compute3

    Error Model Type [Comp3]
    error model Comp3
    features
      -- iteration 1
      (+) Error_Free: initial error state;
      (+) Failed: error state;
      (+) Fail: error event
      (+)   {Occurrence => Poisson λ};
      (+) Recover: error event
      (+)   {Occurrence => Poisson µ};
      -- iteration 2
      (+) KO: in out error propagation
      (+)   {Occurrence => fixed p};
      -- iteration 3
      (+) CanRecover: error state;
      (+) OK: in error propagation;
    end Comp3;

    Error Model Implementation [Comp3.general]
    error model implementation Comp3.general
    transitions
      -- iteration 1
      (+) Error_Free-[Fail]->Failed;
      (+) Failed-[Recover]->Error_Free;
      -- iteration 2
      (+) Error_Free-[in KO]->Failed;
      (+) Failed-[out KO]->Failed;
      -- iteration 3
      (-) Failed-[Recover]->Error_Free;
      (+) Failed-[RecoverAuthorize]->CanRecover;
      (+) CanRecover-[Recover]->Error_Free;
    end Comp3.general;

Figure 6. Error model for Compute1 / Compute2

    Guard_Out [port Recovery.to3]
      -- iteration 3
      (+) Guard_Out =>
      (+)   RecoverAuthorize when
      (+)     (from1[OK] and from2[OK])
      (+)   mask when others
      (+) applies to to3;

Figure 7. Guard_Out property (port Recovery.to3)

Acknowledgements

This work is partially supported by 1) the European Commission (European integrated project ASSERT No. IST 004033 and network of excellence ReSIST No. IST 026764) and 2) the European Social Fund.

References

[1] N. Medvidovic and R. N. Taylor, A classification and comparison framework for Software Architecture Description Languages, IEEE Transactions on Software Engineering, 26, 2000, 70-93.
[2] SAE-AS5506, Architecture Analysis and Design Language, Society of Automotive Engineers, 2004.
[3] SAE-AS5506/1, Architecture Analysis and Design Language (AADL) Annex Volume 1, Annex E: Error Model Annex, Society of Automotive Engineers, 2006.
[4] J.-M. Farines, et al., The Cotre project: rigorous software development for real time systems in avionics, 27th IFAC/IFIP/IEEE Workshop on Real Time Programming, Zielona Gora, Poland, 2003.
[5] R. Allen and D. Garlan, A Formal Basis for Architectural Connection, ACM Transactions on Software Engineering and Methodology, 6, 1997, 213-249.
[6] M. Bernardo, P. Ciancarini, and L. Donatiello, Architecting Families of Software Systems with Process Algebras, ACM Transactions on Software Engineering and Methodology, 11, 2002, 386-426.
[7] A. Bondavalli, et al., Dependability Analysis in the Early Phases of UML Based System Design, Int. Journal of Computer Systems - Science & Engineering, 16, 2001, 265-275.
[8] S. Bernardi, S. Donatelli, and J. Merseguer, From UML Sequence Diagrams and Statecharts to analysable Petri Net models, 3rd Int. Workshop on Software and Performance (WOSP 2002), Rome, Italy, 2002, 35-45.
[9] P. King and R. Pooley, Using UML to Derive Stochastic Petri Net Models, 15th annual UK Performance Engineering Workshop, 1999, 45-56.
[10] P. H. Feiler, et al., Pattern-Based Analysis of an Embedded Real-time System Architecture, 18th IFIP World Computer Congress, ADL Workshop, Toulouse, France, 2004, 83-91.
[11] K. Kanoun and M. Borrel, Fault-tolerant systems dependability. Explicit modeling of hardware and software component-interactions, IEEE Transactions on Reliability, 49, 2000, 363-376.
[12] A. E. Rugina, K. Kanoun, and M. Kaâniche, AADL-based Dependability Modelling, LAAS-CNRS Research Report n°06209, April 2006, 85p.
[13] P. Binns and S. Vestal, Hierarchical composition and abstraction in architecture models, 18th IFIP World Computer Congress, ADL Workshop, Toulouse, France, 2004, 43-52.
[14] M. Shaw and D. Garlan, Software Architecture: Perspectives on an Emerging Discipline (Prentice-Hall, 1996).

ABSTRACT

For efficiency reasons, software system designers wish to use an integrated set of methods and tools to describe specifications and designs, and also to perform analyses such as dependability, schedulability and performance. AADL (Architecture Analysis and Design Language) has proved to be efficient for software architecture modeling. In addition, AADL was designed to accommodate several types of analyses. This paper presents an iterative dependency-driven approach for dependability modeling using AADL. It is illustrated on a small example. This approach is part of a complete framework that allows the generation of dependability analysis and evaluation models from AADL models to support the analysis of software and system architectures in critical application domains.

<|endoftext|><|startoftext|>

Introduction

Let X be a compact connected Kähler manifold of dimension n ∈ N∗.
Throughout the article ω denotes a smooth closed form of bidegree (1, 1) which is nonnegative and big, i.e. such that $\int_X \omega^n > 0$. We continue the study started in [GZ 2], [EGZ] of the complex Monge-Ampère equation

$(MA)_\mu \qquad (\omega + dd^c\varphi)^n = \mu,$

where ϕ, the unknown function, is ω-plurisubharmonic: this means that $\varphi \in L^1(X)$ is upper semi-continuous and $\omega + dd^c\varphi \ge 0$ is a positive current. We let PSH(X,ω) denote the set of all such functions (see [GZ 1] for their basic properties). Here µ is a fixed positive Radon measure of total mass $\mu(X) = \int_X \omega^n$, and $d = \partial + \bar\partial$, $d^c = \frac{1}{2i\pi}(\partial - \bar\partial)$.

Following [GZ 2] we say that an ω-plurisubharmonic function ϕ has finite weighted Monge-Ampère energy, $\varphi \in \mathcal{E}(X,\omega)$, when its Monge-Ampère measure $(\omega + dd^c\varphi)^n$ is well defined, and there exists an increasing function $\chi : \mathbb{R}^- \to \mathbb{R}^-$ such that $\chi(-\infty) = -\infty$ and $\chi \circ \varphi \in L^1((\omega + dd^c\varphi)^n)$. In general χ has very slow growth at infinity, so that ϕ is far from being bounded.

The purpose of this article is twofold. First we extend one of the main results of [GZ 2] by showing

THEOREM A. There exists $\varphi \in \mathcal{E}(X,\omega)$ such that $\mu = (\omega + dd^c\varphi)^n$ if and only if µ does not charge pluripolar sets.

This result was established in [GZ 2] when ω is a Kähler form. It is important for applications to complex dynamics and Kähler geometry to consider as well forms ω that are less positive (see [EGZ]).

http://arxiv.org/abs/0704.0866v2
S. BENELKOURCHI & V. GUEDJ & A. ZERIAHI

We then look for conditions on the measure µ which ensure that the solution ϕ is almost bounded. Following the seminal work of S. Kolodziej [K 2,3], we say that µ is dominated by the Monge-Ampère capacity $\mathrm{Cap}_\omega$ if there exists a function $F : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\lim_{t \to 0^+} F(t) = 0$ and

$(\dagger) \qquad \mu(K) \le F(\mathrm{Cap}_\omega(K)),$ for all Borel subsets K ⊂ X.

Here $\mathrm{Cap}_\omega$ denotes the global version of the Monge-Ampère capacity introduced by E. Bedford and A. Taylor [BT] (see section 2). Observe that µ does not charge pluripolar sets since F(0) = 0. When $F(x) \lesssim x^\alpha$ vanishes at order α > 1 and ω is Kähler, S. Kolodziej has proved [K 2] that the solution ϕ ∈ PSH(X,ω) of $(MA)_\mu$ is continuous. The boundedness part of this result was extended in [EGZ] to the case when ω is merely big and nonnegative. If $F(x) \lesssim x^\alpha$ with 0 < α < 1, two of us have proved in [GZ 2] that the solution ϕ has finite χ-energy, where $\chi(t) = -(-t)^p$, p = p(α) > 0. This result was first established by U. Cegrell in a local context [Ce].

Another objective of this article is to fill in the gap between Cegrell’s and Kolodziej’s results, by considering all intermediate dominating functions F. Write $F_\varepsilon(x) = x[\varepsilon(-\ln(x)/n)]^n$ where $\varepsilon : \mathbb{R} \to [0,\infty[$ is nonincreasing. Our second main result is:

THEOREM B. If $\mu(K) \le F_\varepsilon(\mathrm{Cap}_\omega(K))$ for all Borel subsets K ⊂ X, then $\mu = (\omega + dd^c\varphi)^n$ where ϕ ∈ PSH(X,ω) satisfies $\sup_X \varphi = 0$ and

$\mathrm{Cap}_\omega(\varphi < -s) \le \exp(-nH^{-1}(s)).$

Here $H^{-1}$ is the reciprocal function of $H(x) = e\int_0^x \varepsilon(t)\,dt + s_0$, where $s_0 = s_0(\varepsilon, \omega) \ge 0$ only depends on ε and ω.

This general statement has several useful consequences:
• if $\int_0^{+\infty} \varepsilon(t)\,dt < +\infty$, then $H^{-1}(s) = +\infty$ for $s \ge s_\infty := e\int_0^{+\infty} \varepsilon(t)\,dt + s_0$, hence $\mathrm{Cap}_\omega(\varphi < -s) = 0$. This means that ϕ is bounded from below by $-s_\infty$. This result is due to S. Kolodziej [K 2,3] when ω is Kähler, and [EGZ] when ω ≥ 0 is merely big;
• the condition (†) is easy to check for measures with density in $L^p$, p > 1. Our result thus gives a simple proof (Corollary 3.2), following the seminal approach of S. Kolodziej ([K2]), of the $C^0$-a priori estimate of S.T. Yau [Y], which is crucial for proving the Calabi conjecture (see [T] for an overview);
• when $\int_0^{+\infty} \varepsilon(t)\,dt = +\infty$, the solution ϕ is generally unbounded. The faster ε(t) decreases towards zero, the faster the growth of $H^{-1}$ at infinity, hence the closer ϕ is to being bounded;
• the special case ε ≡ 1 is of particular interest. Here $\mu(\cdot) \le \mathrm{Cap}_\omega(\cdot)$, and our result shows that $\mathrm{Cap}_\omega(\varphi < -s)$ decreases exponentially fast, hence ϕ has “loglog-singularities”.
These are the type of singularities of the metrics used in Arakelov geometry in relation with measures µ = f dV whose density has Poincaré-type singularities (see [Ku], [BKK]).

We prove Theorem B in section 3, after establishing Theorem A in section 2.1 and recalling some useful facts from [GZ 2], [EGZ] in section 2.2. We then test the sharpness of our estimates in section 4, where we give examples of measures fulfilling our assumptions: these are absolutely continuous with respect to $\omega^n$, and their densities do not belong to $L^p$, for any p > 1.

A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS

2. Weakly singular quasiplurisubharmonic functions

The class $\mathcal{E}(X,\omega)$ of ω-psh functions with finite weighted Monge-Ampère energy has been introduced and studied in [GZ 2]. It is the largest subclass of PSH(X,ω) on which the complex Monge-Ampère operator $(\omega + dd^c\,\cdot)^n$ is well-defined and the comparison principle is valid. Recall that $\varphi \in \mathcal{E}(X,\omega)$ if and only if $(\omega + dd^c\varphi_j)^n(\varphi \le -j) \to 0$, where $\varphi_j := \max(\varphi, -j)$.

2.1. The range of the Monge-Ampère operator.

The range of the operator $(\omega + dd^c\,\cdot)^n$ acting on $\mathcal{E}(X,\omega)$ has been characterized in [GZ 2] when ω is a Kähler form. We extend here this result to the case when ω is merely nonnegative and big.

Theorem 2.1. Assume ω is a smooth closed nonnegative (1,1) form on X, and µ is a positive Radon measure such that $\mu(X) = \int_X \omega^n > 0$. Then there exists $\varphi \in \mathcal{E}(X,\omega)$ such that $\mu = (\omega + dd^c\varphi)^n$ if and only if µ does not charge pluripolar sets.

Proof. We can assume without loss of generality that µ and ω are normalized so that $\mu(X) = \int_X \omega^n = 1$. Consider, for A > 0,

$\mathcal{C}_A(\omega) := \{\nu \text{ probability measure} \,/\, \nu(K) \le A \cdot \mathrm{Cap}_\omega(K), \text{ for all } K \subset X\},$

where $\mathrm{Cap}_\omega$ denotes the Monge-Ampère capacity introduced by E. Bedford and A. Taylor in [BT] (see [GZ 1] for this compact setting).
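Recall that

$\mathrm{Cap}_\omega(K) := \sup\left\{ \int_K (\omega + dd^c u)^n \,/\, u \in PSH(X,\omega),\ 0 \le u \le 1 \right\}.$

We first show that a measure $\nu \in \mathcal{C}_A(\omega)$ is the Monge-Ampère measure of a function $\psi \in \mathcal{E}^p(X,\omega)$, for any 0 < p < 1, where

$\mathcal{E}^p(X,\omega) := \left\{ \psi \in \mathcal{E}(X,\omega) \,/\, \psi \in L^p\big((\omega + dd^c\psi)^n\big) \right\}.$

Indeed, fix $\nu \in \mathcal{C}_A(\omega)$, 0 < p < 1, and $\omega_j := \omega + \varepsilon_j\Omega$, where Ω is a Kähler form on X, and $\varepsilon_j > 0$ decreases towards zero. Observe that $PSH(X,\omega) \subset PSH(X,\omega_j)$, hence $\mathrm{Cap}_\omega(\cdot) \le \mathrm{Cap}_{\omega_j}(\cdot)$, so that $\nu \in \mathcal{C}_A(\omega_j)$. It follows from Propositions 3.6 and 2.7 in [GZ 1] that there exists $C_0 > 0$ such that for any $v \in PSH(X,\omega_j)$ normalized by $\sup_X v = -1$, we have

$\mathrm{Cap}_{\omega_j}(v < -t) \le \frac{C_0}{t},$ for all t ≥ 1.

This yields $\mathcal{E}^p(X,\omega_j) \subset L^p(\nu)$: if $v \in \mathcal{E}^p(X,\omega_j)$ with $\sup_X v = -1$, then

$\int_X (-v)^p\,d\nu = p\int_0^{+\infty} t^{p-1}\nu(v < -t)\,dt \le pA\int_1^{+\infty} t^{p-1}\mathrm{Cap}_\omega(v < -t)\,dt + C_p \le \frac{pAC_0}{1-p} + C_p < +\infty.$

It follows therefore from Theorem 4.2 in [GZ 2] that there exists $\varphi_j \in \mathcal{E}^p(X,\omega_j)$ with $\sup_X \varphi_j = -1$ and $(\omega_j + dd^c\varphi_j)^n = c_j \cdot \nu$, where $c_j = \int_X \omega_j^n \ge 1$ decreases towards 1 as $\varepsilon_j$ decreases towards zero. We can assume without loss of generality that $1 \le c_j \le 2$. Observe that the $\varphi_j$’s have uniformly bounded energies, namely

$\int_X (-\varphi_j)^p(\omega_j + dd^c\varphi_j)^n \le 2\int_X (-\varphi_j)^p\,d\nu \le 2\left[\frac{pAC_0}{1-p} + C_p\right].$

Since $\sup_X \varphi_j = -1$, we can assume (after extracting a convergent subsequence) that $\varphi_j \to \varphi$ in $L^1(X)$, where ϕ ∈ PSH(X,ω), $\sup_X \varphi = -1$. Set $\phi_j := (\sup_{l \ge j} \varphi_l)^*$. Thus $\phi_j \in PSH(X,\omega_j)$, and $\phi_j$ decreases towards ϕ. Since $\phi_j \ge \varphi_j$, it follows from the “fundamental inequality” (Lemma 2.3 in [GZ 2]) that

$\int_X (-\phi_j)^p(\omega_j + dd^c\phi_j)^n \le 2^n\int_X (-\varphi_j)^p(\omega_j + dd^c\varphi_j)^n \le C' < +\infty.$

Hence it follows from stability properties of the class $\mathcal{E}^p(X,\omega)$ that $\varphi \in \mathcal{E}^p(X,\omega)$ (see Proposition 5.6 in [GZ 2]). Moreover

$(\omega_j + dd^c\phi_j)^n \ge \inf_{l \ge j}(\omega_l + dd^c\varphi_l)^n \ge \nu,$

hence $(\omega + dd^c\varphi)^n = \lim(\omega_j + dd^c\phi_j)^n \ge \nu$. Since $\int_X \omega^n = \nu(X) = 1$, this yields $\nu = (\omega + dd^c\varphi)^n$ as claimed above.

We can now prove the statement of the theorem. One implication is obvious: if $\mu = (\omega + dd^c\varphi)^n$, $\varphi \in \mathcal{E}(X,\omega)$, then µ does not charge pluripolar sets, as follows from Theorem 1.3 in [GZ 2]. So we assume now that µ does not charge pluripolar sets.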
Since $\mathcal{C}_1(\omega)$ is a compact convex set of probability measures which contains all measures $(\omega + dd^c u)^n$, u ∈ PSH(X,ω), 0 ≤ u ≤ 1, we can project µ onto $\mathcal{C}_1(\omega)$ and get, by a generalization of the Radon-Nikodym theorem (see [R], [Ce]),

$\mu = f \cdot \nu, \quad \nu \in \mathcal{C}_1(\omega), \quad 0 \le f \in L^1(\nu).$

Now $\nu = (\omega + dd^c\psi)^n$ for some $\psi \in \mathcal{E}^{1/2}(X,\omega)$, ψ ≤ 0, as follows from the discussion above. Replacing ψ by $e^\psi$ shows that we can actually assume ψ to be bounded (see Lemma 4.5 in [GZ 2]). We can now apply line by line the same proof as that of Theorem 4.6 in [GZ 2] to conclude that $\mu = (\omega + dd^c\varphi)^n$ for some $\varphi \in \mathcal{E}(X,\omega)$. □

2.2. High energy and capacity estimates.

Given $\chi : \mathbb{R}^- \to \mathbb{R}^-$ an increasing function, we consider, following [GZ 2],

$\mathcal{E}_\chi(X,\omega) := \left\{ \varphi \in \mathcal{E}(X,\omega) \,/\, \int_X (-\chi)(-|\varphi|)\,(\omega + dd^c\varphi)^n < +\infty \right\}.$

Alternatively a function ϕ ≤ 0 belongs to $\mathcal{E}_\chi(X,\omega)$ if and only if

$\sup_j \int_X (-\chi) \circ \varphi_j\,(\omega + dd^c\varphi_j)^n < +\infty,$

where $\varphi_j := \max(\varphi, -j)$ is the canonical approximation of ϕ by bounded ω-psh functions. When $\chi(t) = -(-t)^p$, $\mathcal{E}_\chi(X,\omega)$ is the class $\mathcal{E}^p(X,\omega)$ used in the previous section.

The properties of the classes $\mathcal{E}_\chi(X,\omega)$ are quite different depending on whether the weight χ is convex (slow growth at infinity) or concave. In previous works [GZ 2], two of us were mainly interested in weights χ of moderate growth at infinity (at most polynomial). Our main objective in the sequel is to construct solutions ϕ of $(MA)_\mu$ which are “almost bounded”, i.e. in classes $\mathcal{E}_\chi(X,\omega)$ for concave weights χ of arbitrarily high growth.

For this purpose it is useful to relate the property $\varphi \in \mathcal{E}_\chi(X,\omega)$ to the rate of decrease of $\mathrm{Cap}_\omega(\varphi < -t)$, as t → +∞. We set

$\hat{\mathcal{E}}_\chi(X,\omega) := \left\{ \varphi \in PSH(X,\omega) \,/\, \int_0^{+\infty} t^n\chi'(-t)\,\mathrm{Cap}_\omega(\varphi < -t)\,dt < +\infty \right\}.$

An important tool in the study of the classes $\mathcal{E}_\chi(X,\omega)$ are the “fundamental inequalities” (Lemmas 2.3 and 3.5 in [GZ 2]), which allow one to compare the weighted energy of two ω-psh functions ϕ ≤ ψ. These inequalities are only valid for weights of slow growth (at most polynomial), while they become immediate for the classes $\hat{\mathcal{E}}_\chi(X,\omega)$.
So are the convexity properties of $\hat{\mathcal{E}}_\chi(X,\omega)$. We summarize this and compare these classes in the following:

Proposition 2.2. The classes $\hat{\mathcal{E}}_\chi(X,\omega)$ are convex and stable under maximum: if $\hat{\mathcal{E}}_\chi(X,\omega) \ni \varphi \le \psi \in PSH(X,\omega)$, then $\psi \in \hat{\mathcal{E}}_\chi(X,\omega)$. One always has $\hat{\mathcal{E}}_\chi(X,\omega) \subset \mathcal{E}_\chi(X,\omega)$, while $\mathcal{E}_{\hat\chi}(X,\omega) \subset \hat{\mathcal{E}}_\chi(X,\omega)$, where $\chi'(t-1) = t^n\hat\chi'(t)$.

Since we are mainly interested in the sequel in weights with (super) fast growth at infinity, the previous proposition shows that $\hat{\mathcal{E}}_\chi(X,\omega)$ and $\mathcal{E}_\chi(X,\omega)$ are roughly the same: a function ϕ ∈ PSH(X,ω) belongs to one of these classes if and only if $\mathrm{Cap}_\omega(\varphi < -t)$ decreases fast enough, as t → +∞.

Proof. The convexity of $\hat{\mathcal{E}}_\chi(X,\omega)$ follows from the following simple observation: if $\varphi, \psi \in \hat{\mathcal{E}}_\chi(X,\omega)$ and 0 ≤ a ≤ 1, then

$\{a\varphi + (1-a)\psi < -t\} \subset \{\varphi < -t\} \cup \{\psi < -t\}.$

The stability under maximum is obvious. Assume $\varphi \in \hat{\mathcal{E}}_\chi(X,\omega)$. We can assume without loss of generality ϕ ≤ 0 and χ(0) = 0. Set $\varphi_j := \max(\varphi, -j)$. It follows from Lemma 2.3 below that

$\int_X (-\chi) \circ \varphi_j\,(\omega + dd^c\varphi_j)^n = \int_0^{+\infty} \chi'(-t)\,(\omega + dd^c\varphi_j)^n(\varphi_j < -t)\,dt \le \int_0^{+\infty} \chi'(-t)\,t^n\,\mathrm{Cap}_\omega(\varphi < -t)\,dt < +\infty.$

This shows that $\varphi \in \mathcal{E}_\chi(X,\omega)$. The other inclusion goes similarly, using the second inequality in Lemma 2.3 below. □

If $\varphi \in \mathcal{E}_\chi(X,\omega)$ (or $\hat{\mathcal{E}}_\chi(X,\omega)$), then the bigger the growth of χ at −∞, the smaller $\mathrm{Cap}_\omega(\varphi < -t)$ when t → +∞, hence the closer ϕ is to being bounded. Indeed ϕ ∈ PSH(X,ω) is bounded iff it belongs to $\mathcal{E}_\chi(X,\omega)$ for all weights χ, as was observed in [GZ 2], Proposition 3.1. Similarly

$PSH(X,\omega) \cap L^\infty(X) = \bigcap_\chi \hat{\mathcal{E}}_\chi(X,\omega),$

where the intersection runs over all concave increasing functions χ. We will make constant use of the following result:

Lemma 2.3. Fix $\varphi \in \mathcal{E}(X,\omega)$. Then for all s > 0 and 0 ≤ t ≤ 1,

$t^n\,\mathrm{Cap}_\omega(\varphi < -s-t) \le \int_{(\varphi < -s)} (\omega + dd^c\varphi)^n \le s^n\,\mathrm{Cap}_\omega(\varphi < -s),$

where the second inequality is true only for s ≥ 1.

The proof is a direct consequence of the comparison principle (see Lemma 2.2 in [EGZ] and [GZ 2]).

3. Measures dominated by capacity

From now on µ denotes a positive Radon measure on X whose total mass is $\mathrm{Vol}_\omega(X)$: this is an obvious necessary condition in order to solve $(MA)_\mu$. To simplify numerical computations, we assume in the sequel that µ and ω have been normalized so that

$\mu(X) = \mathrm{Vol}_\omega(X) = \int_X \omega^n = 1.$

When $\mu = e^h\omega^n$ is a smooth volume form and ω is a Kähler form, S.T. Yau has proved [Y] that $(MA)_\mu$ admits a unique smooth solution ϕ ∈ PSH(X,ω) with $\sup_X \varphi = 0$. Smooth measures are easily seen to be nicely dominated by the Monge-Ampère capacity (see the proof of Corollary 3.2 below).

Measures dominated by the Monge-Ampère capacity have been extensively studied by S. Kolodziej in [K 2,3,4]. Following S. Kolodziej ([K3], [K4]) with slightly different notations, fix $\varepsilon : \mathbb{R} \to [0,\infty[$ a continuous decreasing function and set

$F_\varepsilon(x) := x[\varepsilon(-\ln x/n)]^n, \quad x > 0.$

We will consider probability measures µ satisfying the following condition: for all Borel subsets K ⊂ X,

$\mu(K) \le F_\varepsilon(\mathrm{Cap}_\omega(K)).$

The main result achieved in [K 2] can be formulated as follows: if ω is a Kähler form and $\int_0^{+\infty} \varepsilon(t)\,dt < +\infty$, then $\mu = (\omega + dd^c\varphi)^n$ for some continuous function ϕ ∈ PSH(X,ω).

The condition $\int_0^{+\infty} \varepsilon(t)\,dt < +\infty$ means that ε decreases fast enough towards zero at infinity. This gives a quantitative estimate on how fast $\varepsilon(-\ln \mathrm{Cap}_\omega(K)/n)$, hence µ(K), decreases towards zero as $\mathrm{Cap}_\omega(K) \to 0$.

When $\int_0^{+\infty} \varepsilon(t)\,dt = +\infty$, it follows from Theorem 2.1 that $\mu = (\omega + dd^c\varphi)^n$ for some function $\varphi \in \mathcal{E}(X,\omega)$, but ϕ will generally be unbounded. Our second main result measures how far ϕ is from being bounded:

Theorem 3.1. Assume for all compact subsets K ⊂ X,

$(3.1) \qquad \mu(K) \le F_\varepsilon(\mathrm{Cap}_\omega(K)).$

Then $\mu = (\omega + dd^c\varphi)^n$ where $\varphi \in \mathcal{E}(X,\omega)$ is such that $\sup_X \varphi = 0$ and

$\mathrm{Cap}_\omega(\varphi < -s) \le \exp(-nH^{-1}(s)),$ for all s > 0.

Here $H^{-1}$ is the reciprocal function of $H(x) = e\int_0^x \varepsilon(t)\,dt + s_0$, where $s_0 = s_0(\varepsilon, \omega) \ge 0$ is a constant which only depends on ε and ω. In particular $\varphi \in \mathcal{E}_\chi(X,\omega)$ where $-\chi(-t) = \exp(nH^{-1}(t)/2)$.
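To make the statement concrete, here is a worked instance of the function H for one specific weight. The choice of ε below is an assumption made purely for illustration (it is not taken from the paper); it is continuous, nonincreasing, and has a divergent integral, so it falls in the case where ϕ is generally unbounded.

```latex
% Illustrative choice (an assumption, not taken from the paper):
% \varepsilon(t) = 1/(1+t) for t \ge 0, and \varepsilon(t) = 1 for t < 0.
\[
  \int_0^{+\infty} \varepsilon(t)\,dt = +\infty, \qquad
  H(x) = e\int_0^x \frac{dt}{1+t} + s_0 = e\,\log(1+x) + s_0 ,
\]
\[
  H^{-1}(s) = \exp\!\Big(\frac{s-s_0}{e}\Big) - 1 \quad (s \ge s_0), \qquad
  \mathrm{Cap}_\omega(\varphi < -s) \le
  \exp\!\Big(-n\Big[\exp\!\Big(\frac{s-s_0}{e}\Big) - 1\Big]\Big).
\]
```

Under this assumed weight, the estimate of Theorem 3.1 gives a doubly exponential decay of the capacity of sublevel sets, with the constant $s_0$ left unspecified as in the statement.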
Recall that here, and throughout the article, ω ≥ 0 is merely big. Before proving this result we make a few observations.
• It is interesting to consider as well the case when ε(t) increases towards +∞. One can then obtain solutions ϕ such that $\mathrm{Cap}_\omega(\varphi < -t)$ decreases at a polynomial rate. When e.g. ω is Kähler and $\mu(K) \le \mathrm{Cap}_\omega(K)^\alpha$, 0 < α < 1, it follows from Proposition 5.3 in [GZ 2] that $\mu = (\omega + dd^c\varphi)^n$ where $\varphi \in \mathcal{E}^p(X,\omega)$ for some $p = p_\alpha > 0$. Here $\mathcal{E}^p(X,\omega)$ denotes the Cegrell type class $\mathcal{E}_\chi(X,\omega)$, with $\chi(t) = -(-t)^p$.
• When ε(t) ≡ 1, $F_\varepsilon(x) = x$ and $H(x) \asymp e \cdot x$. Thus Theorem 3.1 reads
$\mu \le \mathrm{Cap}_\omega \Rightarrow \mu = (\omega + dd^c\varphi)^n,$ where $\mathrm{Cap}_\omega(\varphi < -s) \lesssim \exp(-ns/e).$
This is precisely the rate of decrease corresponding to functions which look locally like $-\log(-\log\|z\|)$, in some local chart $z \in U \subset \mathbb{C}^n$. This class of ω-psh functions with “loglog-singularities” is important for applications (see [Ku], [BKK]).
• If ε(t) decreases towards zero, then $\mathrm{Cap}_\omega(\varphi < -t)$ decreases at a superexponential rate. The faster ε(t) decreases towards zero, the slower the growth of H, hence the faster the growth of $H^{-1}$ at infinity. When $\int_0^{+\infty} \varepsilon(t)\,dt < +\infty$, the function ε decreases so fast that $\mathrm{Cap}_\omega(\varphi < -t) = 0$ for $t \gg 1$, thus ϕ is bounded. This is the case when $\mu(K) \le \mathrm{Cap}_\omega(K)^\alpha$ for some α > 1 [K 2], [EGZ].
• When $\int_0^{+\infty} \varepsilon(t)\,dt = +\infty$, the solution ϕ may well be unbounded (see the Examples in section 4). At the critical case where $\mu \le F_\varepsilon(\mathrm{Cap}_\omega)$ for all functions ε such that $\int_0^{+\infty} \varepsilon(t)\,dt = +\infty$, we obtain $\mu = (\omega + dd^c\varphi)^n$ with $\varphi \in PSH(X,\omega) \cap L^\infty(X)$, as follows from Proposition 3.1 in [GZ 2]. This partially explains the difficulty in describing the range of Monge-Ampère operators on the set of bounded (quasi-)psh functions.

Proof. The assumption on µ implies in particular that it vanishes on pluripolar sets. It follows from Theorem 2.1 that there exists a function $\varphi \in \mathcal{E}(X,\omega)$ such that $\mu = (\omega + dd^c\varphi)^n$ and $\sup_X \varphi = 0$. Set

$g(s) := -\frac{1}{n}\log \mathrm{Cap}_\omega(\varphi < -s), \quad \forall s > 0.$

The function g is increasing on [0,+∞] and g(+∞) = +∞, since $\mathrm{Cap}_\omega$ vanishes on pluripolar sets. Observe also that g(s) ≥ 0 for all s ≥ 0, since

$g(0) = -\frac{1}{n}\log \mathrm{Cap}_\omega(X) = -\frac{1}{n}\log \mathrm{Vol}_\omega(X) = 0.$

It follows from Lemma 2.3 and (3.1) that for all s > 0 and 0 ≤ t ≤ 1,

$t^n\,\mathrm{Cap}_\omega(\varphi < -s-t) \le \mu(\varphi < -s) \le F_\varepsilon(\mathrm{Cap}_\omega(\varphi < -s)).$

Therefore for all s > 0 and 0 ≤ t ≤ 1,

$(3.2) \qquad \log t - \log \varepsilon \circ g(s) + g(s) \le g(s+t).$

We define an increasing sequence $(s_j)_{j \in \mathbb{N}}$ by induction, setting

$s_{j+1} = s_j + e\,\varepsilon \circ g(s_j),$ for all $j \in \mathbb{N}$.

The choice of $s_0$. Recall that (3.2) is only valid for 0 ≤ t ≤ 1. We choose $s_0 \ge 0$ large enough so that

$(3.3) \qquad e\,\varepsilon \circ g(s_0) \le 1.$

This will allow us to use (3.2) with $t = t_j = s_{j+1} - s_j \in [0, 1]$, since $\varepsilon \circ g$ is decreasing, while $s_j \ge s_0$ is increasing, hence

$0 \le t_j = e\,\varepsilon \circ g(s_j) \le e\,\varepsilon \circ g(s_0) \le 1.$

We must ensure that $s_0 = s_0(\varepsilon, \omega)$ can be chosen independently of ϕ. This is a consequence of Proposition 2.7 in [GZ 1]: since $\sup_X \varphi = 0$, there exists $c_1(\omega) > 0$ so that $0 \le \int_X (-\varphi)\,\omega^n \le c_1(\omega)$, hence

$g(s) := -\frac{1}{n}\log \mathrm{Cap}_\omega(\varphi < -s) \ge \frac{1}{n}\log s - \frac{1}{n}\log(n + c_1(\omega)).$

Therefore $g(s_0) \ge \varepsilon^{-1}(1/e)$ for $s_0 = s_0(\varepsilon, \omega) := (n + c_1(\omega))\exp(n\varepsilon^{-1}(1/e))$, which is independent of ϕ. This yields $e\,\varepsilon \circ g(s_0) \le 1$, as desired.

The growth of $s_j$. We can now apply (3.2) and get $g(s_j) \ge j + g(s_0) \ge j$. Thus $\lim g(s_j) = +\infty$. There are two cases to be considered. If $s_\infty = \lim s_j \in \mathbb{R}^+$, then g(s) ≡ +∞ for $s > s_\infty$, i.e. $\mathrm{Cap}_\omega(\varphi < -s) = 0$, $\forall s > s_\infty$. Therefore ϕ is bounded from below by $-s_\infty$, in particular $\varphi \in \mathcal{E}_\chi(X,\omega)$ for all χ.

Assume now (second case) that $s_j \to +\infty$. For each s > 0, there exists $N = N_s \in \mathbb{N}$ such that $s_N \le s < s_{N+1}$. We can estimate $s \mapsto N_s$:

$s \le s_{N+1} = \sum_{j=0}^{N}(s_{j+1} - s_j) + s_0 = \sum_{j=0}^{N} e\,\varepsilon \circ g(s_j) + s_0 \le e\sum_{j=0}^{N}\varepsilon(j) + s_0 \le e\,\varepsilon(0) + e\int_0^N \varepsilon(t)\,dt + s_0 =: H(N).$

Therefore $H^{-1}(s) \le N \le g(s_N) \le g(s)$, hence

$\mathrm{Cap}_\omega(\varphi < -s) \le \exp(-nH^{-1}(s)).$

Set now $-\chi(-t) = \exp(nH^{-1}(t)/2)$. Then tnχ′(−t)Capω(ϕ < −t)dt ε(H−1(t)) + s̃0 exp(−nH−1(t)/2)dt tn exp(−nt/2)dt < +∞. This shows that $\varphi \in \mathcal{E}_\chi(X,\omega)$ where $\chi(t) = -\exp(nH^{-1}(-t)/2)$.
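The comparison between the sum and the integral used to bound $s_{N+1}$ by H(N) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the weight `eps`, the starting level `s0` and the value `g0` are hypothetical choices, and the sequence is built in the extremal case g(s_j) = j + g0, which is the slowest growth of g compatible with g(s_j) ≥ j + g(s_0).

```python
import math


# Hypothetical weight, chosen only for illustration: continuous, nonincreasing,
# with divergent integral at infinity.
def eps(t):
    return 1.0 / (1.0 + max(t, 0.0))


def H(x, s0):
    # H(x) = e*eps(0) + e * integral_0^x eps(t) dt + s0,
    # written in closed form for this particular eps.
    return math.e * eps(0.0) + math.e * math.log1p(x) + s0


def extremal_sequence(s0, g0, count):
    """Build s_0..s_count with s_{j+1} = s_j + e*eps(g(s_j)), taking the
    slowest growth allowed by the proof, g(s_j) = j + g0."""
    seq = [s0]
    for j in range(count):
        seq.append(seq[-1] + math.e * eps(j + g0))
    return seq


s0 = 5.0            # arbitrary starting level (assumption)
g0 = math.e - 1.0   # here e * eps(g0) = 1, so condition (3.3) holds
seq = extremal_sequence(s0, g0, 50)
steps = [b - a for a, b in zip(seq, seq[1:])]
```

With these choices every step $t_j$ stays in [0, 1], as required to apply (3.2), and each $s_{N+1}$ stays below H(N).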
It follows from the proof above that when $\int_0^{+\infty} \varepsilon(t)\,dt < +\infty$, the solution ϕ is bounded since in this case we have

$s_\infty := \lim s_j \le s_0(\varepsilon, \omega) + e\,\varepsilon(0) + e\int_0^{+\infty} \varepsilon(t)\,dt < +\infty,$

where $s_0(\varepsilon, \omega)$ is an absolute constant satisfying (3.3) (see above). □

Let us emphasize that Theorem 3.1 also yields a slightly simplified proof of the following result [K 2], [EGZ]: if $\mu(K) \le F_\varepsilon(\mathrm{Cap}_\omega(K))$ for some decreasing function $\varepsilon : \mathbb{R} \to \mathbb{R}^+$ such that $\int_0^{+\infty} \varepsilon(t)\,dt < +\infty$, then the sequence $(s_j)$ above is convergent, hence $\mu = (\omega + dd^c\varphi)^n$, where ϕ ∈ PSH(X,ω) is bounded.

For the reader’s convenience we indicate a proof of the following important particular case:

Corollary 3.2. Let $\mu = f\omega^n$ be a measure with density $0 \le f \in L^p(\omega^n)$, where p > 1 and $\int_X f\omega^n = \int_X \omega^n$. Then there exists a unique bounded function ϕ ∈ PSH(X,ω) such that $(\omega + dd^c\varphi)^n = \mu$, $\sup_X \varphi = 0$ and

$0 \le \|\varphi\|_{L^\infty(X)} \le C(p, \omega)\,\|f\|^{1/n}_{L^p(\omega^n)},$

where C(p, ω) > 0 only depends on p and ω.

This a priori bound is a crucial step in the proof by S.T. Yau of the Calabi conjecture (see [Ca], [Y], [A], [T], [Bl]). The proof presented here follows Kolodziej’s new and decisive pluripotential approach (see [K2]). Let us stress that the dependence $\omega \mapsto C(p, \omega)$ is quite explicit, as we shall see in the proof. This is important when considering degenerate situations [EGZ].

Proof. We claim that there exists $C_1(\omega)$ such that

$(3.4) \qquad \mu(K) \le [C_1(\omega)]^n\,\|f\|_{L^p(\omega^n)}\,[\mathrm{Cap}_\omega(K)]^2,$ for all Borel sets K ⊂ X.

Assuming this for the moment, we can apply Theorem 3.1 with $\varepsilon(x) = C_1(\omega)\|f\|^{1/n}_{L^p(\omega^n)}\exp(-x)$, which yields, as observed at the end of the proof of Theorem 3.1,

$\|\varphi\|_{L^\infty(X)} \le M(f, \omega),$

where

$M(f, \omega) := s_0(\varepsilon, \omega) + e\,\varepsilon(0) + e\int_0^{+\infty} \varepsilon(t)\,dt = s_0(\varepsilon, \omega) + 2e\,C_1(\omega)\|f\|^{1/n}_{L^p(\omega^n)}$

and $s_0 = s_0(\varepsilon, \omega)$ is a large number $s_0 > 1$ satisfying the inequality (3.3). In order to give the precise dependence of the uniform bound M(f, ω) on the $L^p$-norm of the density f, we need to choose $s_0$ more carefully.

Observe that condition (3.3) can be written

$\mathrm{Cap}_\omega(\{\varphi \le -s_0\}) \le \exp(-n\varepsilon^{-1}(1/e)).$

Since $n\varepsilon^{-1}(1/e) = \log\left(e^n C_1(\omega)^n \|f\|_{L^p(\omega^n)}\right)$, we must choose $s_0 > 0$ so that

$(3.5) \qquad \mathrm{Cap}_\omega(\{\varphi \le -s_0\}) \le \frac{1}{e^n C_1(\omega)^n \|f\|_{L^p(\omega^n)}}.$

We claim that for any N ≥ 1 there exists a uniform constant $C_2(N, p, \omega) > 0$ such that for any s > 0,

$(3.6) \qquad \mathrm{Cap}_\omega(\{\varphi \le -s\}) \le C_2(N, p, \omega)\,s^{-N}\,\|f\|_{L^p(\omega^n)}.$

Indeed observe first that by Hölder’s inequality,

$\int_X (-\varphi)^N \omega_\varphi^n = \int_X (-\varphi)^N f\omega^n \le \|f\|_{L^p(\omega^n)}\,\|\varphi\|^N_{L^{Nq}(\omega^n)}.$

Since ϕ belongs to the compact family $\{\psi \in PSH(X,\omega);\ \sup_X \psi = 0\}$ ([GZ2]), there exists a uniform constant $C_2'(N, p, \omega) > 0$ such that $\|\varphi\|^N_{L^{Nq}(\omega^n)} \le C_2'(N, p, \omega)$, hence

$\int_X (-\varphi)^N \omega_\varphi^n \le C_2'(N, p, \omega)\,\|f\|_{L^p(\omega^n)}.$

Fix u ∈ PSH(X,ω) with −1 ≤ u ≤ 0 and N ≥ 1 to be specified later. It follows from Tchebysheff and energy inequalities ([GZ2]) that

$\int_{\{\varphi \le -s\}} (\omega + dd^c u)^n \le s^{-N}\int_X (-\varphi)^N (\omega + dd^c u)^n \le c_N\,s^{-N}\max\left\{\int_X (-\varphi)^N \omega_\varphi^n,\ \int_X (-u)^N \omega_u^n\right\} \le c_N\,s^{-N}\max\{C_2'(N, p, \omega), 1\}\,\|f\|_{L^p(\omega^n)}.$

We have used here the fact that $\|f\|_{L^p(\omega^n)} \ge 1$, which follows from the normalization: $1 = \int_X f\omega^n \le \|f\|_{L^p(\omega^n)}$. This proves the claim.

Set N = 2n. It follows from (3.6) that

$s_0 := \left[C_1(\omega)^n e^n C_2(2n, p, \omega)\,\|f\|^2_{L^p(\omega^n)}\right]^{1/2n}$

satisfies the required condition (3.5), which implies the estimate of the theorem.

We now establish the estimate (3.4). Observe first that Hölder’s inequality yields

$(3.7) \qquad \mu(K) \le \|f\|_{L^p(\omega^n)}\,[\mathrm{Vol}_\omega(K)]^{1/q},$ where 1/p + 1/q = 1.

Thus it suffices to estimate the volume $\mathrm{Vol}_\omega(K)$. Recall the definition of the Alexander-Taylor capacity, $T_\omega(K) := \exp(-\sup_X V_{K,\omega})$, where

$V_{K,\omega}(x) := \sup\{\psi(x) \,/\, \psi \in PSH(X,\omega),\ \psi \le 0 \text{ on } K\}.$

This capacity is comparable to the Monge-Ampère capacity, as was observed by H. Alexander and A. Taylor [AT] (see Proposition 7.1 in [GZ 1] for this compact setting):

$(3.8) \qquad T_\omega(K) \le e\,\exp\left(-\frac{1}{\mathrm{Cap}_\omega(K)^{1/n}}\right).$

It thus remains to show that $\mathrm{Vol}_\omega(K)$ is suitably bounded from above by $T_\omega(K)$. This follows from Skoda’s uniform integrability result: set

$\nu(\omega) := \sup\{\nu(\psi, x) \,/\, \psi \in PSH(X,\omega),\ x \in X\},$

where ν(ψ, x) denotes the Lelong number of ψ at the point x.
This actually only depends on the cohomology class $\{\omega\} \in H^{1,1}(X,\mathbb{R})$. It is a standard fact that goes back to H. Skoda (see [Z]) that there exists $C_2(\omega) > 0$ so that
$$\int_X \exp\left(-\frac{\psi}{\nu(\omega)}\right)\omega^n \le C_2(\omega),$$
for all functions $\psi \in PSH(X,\omega)$ normalized by $\sup_X \psi = 0$. We infer
(3.9) $$Vol_\omega(K) \le \int_X \exp\left(-\frac{V^*_{K,\omega}}{\nu(\omega)}\right)\omega^n \le C_2(\omega)\,[T_\omega(K)]^{1/\nu(\omega)}.$$
It now follows from (3.7), (3.8), (3.9) that
$$\mu(K) \le \|f\|_{L^p}\,[C_2(\omega)]^{1/q}\, e^{1/q\nu(\omega)} \exp\left(-\frac{1}{q\nu(\omega)\,Cap_\omega(K)^{1/n}}\right).$$
The conclusion follows by observing that $\exp(-1/x^{1/n}) \le C_n x^2$ for some explicit constant $C_n > 0$. □

4. Examples

4.1. Measures invariant by rotations. In this section we produce examples of radially invariant functions/measures which show that our previous results are essentially sharp. The first example is due to S. Kolodziej [K 1].

Example 4.1. We work here on the Riemann sphere $X = \mathbb{P}^1(\mathbb{C})$, with $\omega = \omega_{FS}$, the Fubini-Study volume form. Consider $\mu = f\omega$, a measure with density $f$ which is smooth and positive on $X \setminus \{p\}$, and such that
$$f(z) \simeq \frac{c}{|z|^2(\log|z|)^2}, \quad c > 0,$$
in a local chart near $p = 0$. A simple computation yields $\mu = \omega + dd^c\varphi$, where $\varphi \in PSH(\mathbb{P}^1,\omega)$ is smooth in $\mathbb{P}^1 \setminus \{p\}$ and $\varphi(z) \simeq -c' \log(-\log|z|)$ near $p = 0$, $c' > 0$, hence
$$\log Cap_\omega(\varphi < -t) \simeq -t.$$
Here $a \simeq b$ means that $a/b$ is bounded away from zero and infinity. This is to be compared to our estimate $\log Cap_\omega(\varphi < -t) \lesssim -t/e$ (Theorem 3.1), which can be applied, as it was shown by S. Kolodziej in [K 1] that $\mu \lesssim Cap_\omega$. Thus Theorem 3.1 is essentially sharp when $\varepsilon \equiv 1$.

We now generalize this example and show that the estimate provided by Theorem 3.1 is essentially sharp in all cases.

Example 4.2. Fix ε as in Theorem 3.1. Consider $\mu = f\omega$ on $X = \mathbb{P}^1(\mathbb{C})$, where $\omega = \omega_{FS}$ is the Fubini-Study volume form, $f \ge 0$ is continuous on $\mathbb{P}^1 \setminus \{p\}$, and
$$f(z) \simeq \frac{\varepsilon(\log(-\log|z|))}{|z|^2(\log|z|)^2}$$
in local coordinates near $p = 0$. Here $\varepsilon : \mathbb{R} \to \mathbb{R}^+$ decreases towards 0 at $+\infty$. We claim that there exists $A > 0$ such that
(4.1) $$\mu(K) \le A\,Cap_\omega(K)\,\varepsilon(-\log Cap_\omega(K)), \quad \text{for all } K \subset X.$$
This is clear outside a small neighborhood of $p = 0$, since the measure µ is dominated there by a smooth volume form. So it suffices to establish this estimate when $K$ is included in a local chart near $p = 0$. Consider
$$\tilde K := \{r \in [0,R] \,;\ K \cap \{|z| = r\} \ne \emptyset\}.$$
It is a classical fact (see e.g. [Ra]) that the logarithmic capacity $c(K)$ of $K$ can be estimated from below by the length of $\tilde K$, namely
$$\frac{l(\tilde K)}{4} \le c(\tilde K) \le c(K).$$
Using that ε is decreasing, hence $0 \le -\varepsilon'$, we infer, for some constant $C > 0$,
$$\mu(K) \le 2\pi \int_0^{l(\tilde K)} f(r)\,r\,dr \le C \int_0^{l(\tilde K)} \frac{\varepsilon(\log(-\log r)) - \varepsilon'(\log(-\log r))}{r(\log r)^2}\,dr = C\,\frac{\varepsilon(\log(-\log l(\tilde K)))}{-\log l(\tilde K)} \le C\,\frac{\varepsilon(\log(-\log 4c(K)))}{-\log 4c(K)}.$$
Recall now that the logarithmic capacity $c(K)$ is equivalent to Alexander-Taylor's capacity $T_\Delta(K)$, which in turn is equivalent to the global Alexander-Taylor capacity $T_\omega(K)$ (see [GZ 1]): $c(K) \simeq T_\Delta(K) \simeq T_\omega(K)$. The Alexander-Taylor comparison theorem [AT] reads
$$-\log 4c(K) \simeq -\log T_\omega(K) \simeq 1/Cap_\omega(K),$$
thus $\mu(K) \le A\,Cap_\omega(K)\,\varepsilon(-\log Cap_\omega(K))$.

We can therefore apply Theorem 3.1. It guarantees that $\mu = (\omega + dd^c\varphi)$, where $\varphi \in PSH(\mathbb{P}^1,\omega)$ satisfies
$$\log Cap_\omega(\varphi < -s) \simeq -nH^{-1}(s), \quad \text{with } H(s) = eA\int_0^s \varepsilon(t)\,dt + s_0.$$
On the other hand a simple computation shows that φ is continuous in $\mathbb{P}^1 \setminus \{p\}$ and $\varphi \simeq -H(\log(-\log|z|))$ near $p = 0$. The sublevel set $(\varphi < -t)$ therefore coincides with the ball of radius $\exp(-\exp(H^{-1}(t)))$, hence $\log Cap_\omega(\varphi < -s) \simeq -H^{-1}(s)$.

4.2. Measures with density. Here we consider the case when $\mu = f\,dV$ is absolutely continuous with respect to a volume form.

Proposition 4.3. Assume $\mu = f\omega^n$ is a probability measure whose density satisfies $f[\log(1+f)]^n \in L^1(\omega^n)$. Then $\mu \lesssim Cap_\omega$. More generally, if
$$f\,[\log(1+f)/\varepsilon(\log(1+|\log f|))]^n \in L^1(\omega^n)$$
for some continuous decreasing function $\varepsilon : \mathbb{R} \to \mathbb{R}^+_*$, then for all $K \subset X$,
$$\mu(K) \le F_\varepsilon(Cap_\omega(K)), \quad \text{where } F_\varepsilon(x) = Ax\left[\varepsilon\left(-\frac{\ln x}{n}\right)\right]^n, \ A > 0.$$

Proof. With slightly different notations, the proof is identical to that of Lemma 4.2 in [K 4], to which we refer the reader.
□

We now give examples showing that Proposition 4.3 is almost optimal.

Example 4.4. For simplicity we give local examples. The computations to follow can also be performed in a global compact setting. Consider $\varphi(z) = -\log(-\log\|z\|)$, where $\|z\| = \sqrt{|z_1|^2 + \ldots + |z_n|^2}$ denotes the Euclidean norm in $\mathbb{C}^n$. One can check that φ is plurisubharmonic in a neighborhood of the origin in $\mathbb{C}^n$, and that there exists $c_n > 0$ so that
$$\mu := (dd^c\varphi)^n = f\,dV_{eucl}, \quad \text{where } f(z) = \frac{c_n}{\|z\|^{2n}(-\log\|z\|)^{n+1}}.$$
Observe that $f[\log(1+f)]^{n-\alpha} \in L^1$ for all $\alpha > 0$, but $f[\log(1+f)]^n \notin L^1$.

When $n = 1$ it was observed by S. Kolodziej [K 1] that $\mu(K) \lesssim Cap_\omega(K)$. Proposition 4.3 yields here $\mu(K) \lesssim Cap_\omega(K)(|\log Cap_\omega(K)| + 1)$.

For $n \ge 1$, it follows from Proposition 4.3 and Theorem 3.1 that $\log Cap_\omega(\varphi < -s) \lesssim -nH^{-1}(s)$. On the other hand, one can directly check that $\log Cap_\omega(\varphi < -s) \simeq -nH^{-1}(s)$.

One can get further examples by considering $\varphi(z) = \chi \circ \log\|z\|$, so that
$$(dd^c\varphi)^n = c_n\,\frac{(\chi' \circ \log\|z\|)^{n-1}\,\chi''(\log\|z\|)}{\|z\|^{2n}}\,dV_{eucl}.$$

References

[AT] H. ALEXANDER & B.A. TAYLOR: Comparison of two capacities in C^n. Math. Zeit. 186 (1984), 407-417.
[A] T. AUBIN: Équations du type Monge-Ampère sur les variétés kählériennes compactes. Bull. Sci. Math. (2) 102 (1978), no. 1, 63-95.
[BT] E. BEDFORD & B.A. TAYLOR: A new capacity for plurisubharmonic functions. Acta Math. 149 (1982), no. 1-2, 1-40.
[Bl] Z. BLOCKI: On uniform estimate in Calabi-Yau theorem. Sci. China Ser. A 48 (2005), suppl., 244-247.
[BKK] G. BURGOS & J. KRAMER & U. KUHN: Arithmetic characteristic classes of automorphic vector bundles. Doc. Math. 10 (2005), 619-716.
[Ca] E. CALABI: On Kähler manifolds with vanishing canonical class. Algebraic geometry and topology. A symposium in honor of S. Lefschetz, pp. 78-89. Princeton Univ. Press, Princeton, N. J. (1957).
[Ce] U. CEGRELL: Pluricomplex energy. Acta Math. 180 (1998), no. 2, 187-217.
[EGZ] P. EYSSIDIEUX & V. GUEDJ & A. ZERIAHI: Singular Kähler-Einstein metrics. Preprint arxiv math.AG/0603431, http://arxiv.org/abs/math/0603431.
[GZ 1] V. GUEDJ & A. ZERIAHI: Intrinsic capacities on compact Kähler manifolds. J. Geom. Anal. 15 (2005), no. 4, 607-639.
[GZ 2] V. GUEDJ & A. ZERIAHI: The weighted Monge-Ampère energy of quasiplurisubharmonic functions. J. Funct. Anal. 250 (2007), 442-482.
[K 1] S. KOLODZIEJ: The range of the complex Monge-Ampère operator. Indiana Univ. Math. J. 43 (1994), no. 4, 1321-1338.
[K 2] S. KOLODZIEJ: The complex Monge-Ampère equation. Acta Math. 180 (1998), no. 1, 69-117.
[K 3] S. KOLODZIEJ: The Monge-Ampère equation on compact Kähler manifolds. Indiana Univ. Math. J. 52 (2003), no. 3, 667-686.
[K 4] S. KOLODZIEJ: The complex Monge-Ampère equation and pluripotential theory. Mem. Amer. Math. Soc. 178 (2005), no. 840, x+64 pp.
[Ku] U. KUHN: Generalized arithmetic intersection numbers. J. Reine Angew. Math. 534 (2001), 209-236.
[R] J. RAINWATER: A note on the preceding paper. Duke Math. J. 36 (1969), 799-800.
[Ra] T. RANSFORD: Potential theory in the complex plane. London Mathematical Society Student Texts, 28. Cambridge University Press, Cambridge, 1995. x+232 pp.
[T] G. TIAN: Canonical metrics in Kähler geometry. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel (2000).
[Y] S.T. YAU: On the Ricci curvature of a compact Kähler manifold and the complex Monge-Ampère equation. I. Comm. Pure Appl. Math. 31 (1978), no. 3, 339-411.
[Z] A. ZERIAHI: Volume and capacity of sublevel sets of a Lelong class of psh functions. Indiana Univ. Math. J. 50 (2001), no. 1, 671-703.

Slimane BENELKOURCHI & Vincent GUEDJ & Ahmed ZERIAHI
Laboratoire Emile Picard UMR 5580, Université Paul Sabatier
118 route de Narbonne
31062 TOULOUSE Cedex 09 (FRANCE)
benel@math.ups-tlse.fr
guedj@math.ups-tlse.fr
zeriahi@math.ups-tlse.fr

ABSTRACT Let $X$ be a compact K\"ahler manifold and $\om$ a smooth closed form of bidegree $(1,1)$ which is nonnegative and big. We study the classes ${\mathcal E}_{\chi}(X,\om)$ of $\om$-plurisubharmonic functions of finite weighted Monge-Amp\`ere energy. When the weight $\chi$ has fast growth at infinity, the corresponding functions are close to being bounded. We show that if a positive Radon measure is suitably dominated by the Monge-Amp\`ere capacity, then it belongs to the range of the Monge-Amp\`ere operator on some class ${\mathcal E}_{\chi}(X,\om)$. This is done by establishing a priori estimates on the capacity of sublevel sets of the solutions. Our result extends U.Cegrell's and S.Kolodziej's results and puts them into a unifying frame. It also gives a simple proof of S.T.Yau's celebrated a priori ${\mathcal C}^0$-estimate. <|endoftext|><|startoftext|> Density oscillation in highly flattened quantum elliptic rings and tunable strong dipole radiation

S.P. Situ, Y.Z. He, and C.G. Bao∗
The State Key Laboratory of Optoelectronic Materials and Technologies, Zhongshan University, Guangzhou, 510275, P.R. China

A narrow elliptic ring containing an electron threaded by a magnetic field B is studied. When the ring is highly flattened, an increase of B leads to a big energy gap between the ground and excited states, and therefore to a strong emission of dipole photons. The photon frequency can be tuned over a wide range by changing B and/or the shape of the ellipse. The particle density is found to oscillate back and forth between one pattern of distribution and another as B varies. This is a new kind of Aharonov-Bohm oscillation originating from symmetry breaking and is different from the usual oscillation of persistent current.

∗Corresponding author

It is recognized that micro-devices are important to micro-techniques.
Various kinds of micro-devices, including quantum rings [1], have been extensively studied theoretically and experimentally in recent years. Quantum rings differ from other devices owing to their special geometry. A distinctive phenomenon of the ring is the Aharonov-Bohm (A-B) oscillation of the ground state energy and persistent current [2-5]. It is believed that geometry affects the properties of small systems. Therefore, in addition to circular rings, elliptic rings or other rings subjected to specific topological transformations deserve to be studied, because new and special properties might be found. A number of papers have been devoted to elliptic quantum dots [6-9] and rings [10-12]. It was found that elliptic rings have two distinctive features: (i) the avoided crossing of the levels and the suppression of the A-B oscillation; (ii) the appearance of localized states, which are related to bound states in infinite wires with bends [13]. These features become more pronounced if the eccentricity is larger and the ring is narrower.

On the other hand, for a micro-device the optical properties are obviously essential to its application. One may expect that very narrow rings with a high eccentricity have special optical properties; this point remains to be clarified, and this paper is dedicated to it. It turns out that the optical properties of a highly flattened narrow ring differ greatly from those of a circular ring owing to a tunable energy gap, which leads to strong dipole transitions with a wavelength tunable over a very broad range (say, from 0.1 to 0.001 cm). In addition, a kind of A-B density oscillation originating from symmetry breaking was found, as reported below.

We consider an electron with an effective mass m∗ confined on a one-dimensional elliptic ring with a half major axis rax and an eccentricity ε.
Let us introduce an argument θ so that a point (x, y) on the ring is related to θ by $x = r_{ax}\cos\theta$ and $y = r_{ay}\sin\theta$, where $r_{ay} = r_{ax}\sqrt{1-\varepsilon^2}$ is the half minor axis. A uniform magnetic field B, confined inside a cylinder of radius $r_{in}$ and perpendicular to the plane of the ring, is applied. The associated vector potential reads $\mathbf{A} = B r_{in}^2\,\mathbf{t}/(2r)$, where $\mathbf{t}$ is a unit vector normal to the position vector $\mathbf{r}$. Then the Hamiltonian reads
$$H = \frac{G}{1-\varepsilon^2\cos^2\theta}\left[-\frac{d^2}{d\theta^2} - i2\alpha\,\frac{\sqrt{1-\varepsilon^2}}{1-\varepsilon^2\sin^2\theta}\,\frac{d}{d\theta} + \alpha^2\,\frac{1-\varepsilon^2\cos^2\theta}{1-\varepsilon^2\sin^2\theta}\right] \qquad (1)$$
where $G = \hbar^2/(2m^* r_{ax}^2)$, $\alpha = \phi/\phi_o$, $\phi = \pi r_{in}^2 B$ is the flux, and $\phi_o = hc/e$ is the flux quantum. The eigenstates are expanded as $\Psi_j = \sum_{k=k_{min}}^{k_{max}} C^{(j)}_k e^{ik\theta}$, where k is an integer ranging from $k_{min}$ to $k_{max}$, and j = 1, 2, ... denotes the ground state, the second state, and so on. The coefficients $C^{(j)}_k$ are obtained via the diagonalization of H. In practice, B takes positive values, $k_{min} = -100$ and $k_{max} = 10$. This range of k ensures that the numerical results have at least four significant figures. The energy of the j-th state is
$$E_j = \langle H\rangle_j \equiv \int d\theta\,(1-\varepsilon^2\cos^2\theta)\,\Psi_j^* H \Psi_j \qquad (2)$$
where the eigenstate is normalized as
$$\int d\theta\,(1-\varepsilon^2\cos^2\theta)\,\Psi_j^*\Psi_j = 1 \qquad (3)$$
In what follows the units meV, nm, and Tesla are used, m∗ = 0.063me (for InGaAs rings), and rin is fixed at 25.

When rax = 50 and ε = 0 or 0.4, the evolution of the low-lying spectra with B is given in Fig.1. When ε = 0.4, the effect of eccentricity is still small: the spectrum changes only slightly from the case ε = 0, but the avoided crossing of levels can be seen [10,11]. In particular, the A-B oscillation exists and the period of φ remains φo. However, when ε becomes large, three remarkable changes emerge, as shown in Fig.2. (i) The A-B oscillation of the ground state vanishes gradually. (ii) The energy of the second state becomes closer and closer to that of the ground state. (iii) There is an energy gap lying between the ground state and the third state, and the gap width increases nearly linearly with B. The existence of the gap is a remarkable feature which has not previously been found for rings with a finite width. This feature is crucial to the optical properties, as shown later.

FIG. 1: Low-lying spectrum E (in meV) of a one-electron system on an elliptic ring against B (in Tesla). rax = 50 nm and ε = 0 (a) and 0.4 (b). The period of the flux φo = hc/e is associated with B = 2.106 Tesla.

FIG. 2: Similar to Fig.1 but with rax = 50 and ε = 0.8. The lowest eight levels are included, where a great energy gap lies between the ground and the third states.

Fig.3 demonstrates further how the gap varies with ε, rax, and B, where B ranges from 0 to 30 (or φ from 0 to 14.24φo). One can see that, when ε is large and rax is small, the increase of B leads to a very large gap.

FIG. 3: Evolution of the energy gap E3 − E1 (in meV) against B (in Tesla) when rax and ε are given.

The A-B oscillation of the ground state energy is given in Fig.4. The change of ε does not affect the period (2.106 Tesla). However, when ε is large, the amplitude of the oscillation is rapidly suppressed. Thus, for a highly flattened elliptic ring, the A-B oscillation appears only when B is small.

FIG. 4: The A-B oscillation of the ground state energy against B (in Tesla) for rax = 50. The solid, dash-dot-dot, and dot lines are for ε = 0, 0.4, and 0.8, respectively.

The persistent current of the j-th state reads [14]
$$J_j = \frac{G}{\hbar}\left[\Psi_j^*\left(-i\frac{\partial}{\partial\theta} + \alpha\,\frac{\sqrt{1-\varepsilon^2}}{1-\varepsilon^2\sin^2\theta}\right)\Psi_j + c.c.\right] \qquad (4)$$
The A-B oscillation of Jj is plotted in Fig.5. When ε is small (≤ 0.4), just as in Fig.4, the effect of ε is small, as shown in 5a. When ε is large there are three noticeable points: (i) The oscillation of the ground state current becomes weaker and weaker when B increases. (ii) The current of the second state has an amplitude similar to that of the ground state, but in opposite phase.
(iii) The third (and higher) state has a much stronger oscillation of current.

FIG. 5: The A-B oscillation of the persistent current J against B (in Tesla). (a) is for the ground state with ε = 0 (solid line), 0.4 (dash-dot-dot), and 0.8 (dot). (b) is for the first (ground), second and third states (marked by 1, 2, and 3 by the curves) with rax = 50 and ε fixed at 0.8. The ordinate is 10^6 times J/c in nm^−1.

For elliptic rings, the angular momentum L is not conserved. However, it is useful to define $(L)_j = \langle -i\,\partial/\partial\theta \rangle_j$ (refer to eq.(2)). This quantity tends to an integer if ε → 0. It was found that: (i) When ε is small (≤ 0.4), (L)1 of the ground state decreases step by step with B, each step by one, just as in the case of circular rings. However, when ε is large, (L)1 decreases continuously and nearly linearly. (ii) When ε is small, |(L)i − (L)1| is close to 1 if 2 ≤ i ≤ 3, and not close to 1 otherwise. Since L is changed by ±1 under a dipole transition, the ground state therefore essentially jumps to the second and third states. Accordingly, the dipole photon has essentially two energies, namely E2 − E1 and E3 − E1. However, this is not exactly true when ε is large.

There is a relation between the dipole photon energies and the persistent current [15]. For ε = 0, the ground state with L = k1 has the current $J_1 = G(k_1 + \alpha)/\pi\hbar$, while the ground state energy is $E(k_1) = G(k_1 + \alpha)^2$. Accordingly the second and third states have L = k1 ± 1, therefore we have
$$|E_3 - E_2| = |E(k_1 + 1) - E(k_1 - 1)| = 2hJ_1 \qquad (5)$$
This relation implies that the current can be accurately measured simply by measuring the energy difference of the photons emitted in dipole transitions. For elliptic rings, this relation holds approximately when ε is small (≤ 0.4), as shown in Fig.6a. However, the deviation is quite large when ε is large, as shown in 6c.

FIG. 6: E3 − E2 and the persistent current of the ground state against B (in Tesla), for (a) ε = 0.3, (b) ε = 0.5, and (c) ε = 0.7. The solid line denotes $(E_3 - E_2)/(2hc)\cdot 10^6$, the dash-dot-dot line denotes $|J|/c\cdot 10^6$. They nearly overlap if ε < 0.3.

The probability of dipole transition from Ψj to Ψj′ satisfies
$$P^{\pm}_{j',j} \propto (\omega/c)^3\,|\langle x \mp iy\rangle_{j',j}|^2 \qquad (6)$$
where $\hbar\omega = E_{j'} - E_j$ is the photon energy and
$$\langle x \mp iy\rangle_{j',j} = r_{ax}\int d\theta\,(1-\varepsilon^2\cos^2\theta)\,\Psi_{j'}^*\left[\cos\theta \mp i\sqrt{1-\varepsilon^2}\,\sin\theta\right]\Psi_j \qquad (7)$$
The probability of the transition of the ground state to the j′-th state is shown in Fig.7. When ε is small (≤ 0.4) and B is not very large (≤ 10), the allowed final states are essentially Ψ2 and Ψ3, and the oscillation of the probability is similar to the case of circular rings with the same period, as shown in 7a and 7b. In particular, $P^{\pm}_{3,1}$ is considerably larger than $P^{\pm}_{2,1}$ owing to the larger photon energy, thus the third state is particularly important to the optical properties. When ε is large (Fig.7c), the oscillation disappears gradually with B, while the probability increases very rapidly due to the factor (ω/c)^3. Since E3 − E1 is nearly proportional to B, as shown in Fig.3, the probability is nearly proportional to B^3. This leads to a very strong emission (absorption). Furthermore, in Fig.7c the black solid curve is much higher than the dash-dot-dot curve, which implies that the final states can be higher than Ψ3; this leads to an even larger probability.

FIG. 7: Evolution of the probability of dipole transition of the ground state against B (in Tesla), for (a) ε = 0, (b) ε = 0.4, and (c) ε = 0.8. The green line is for the Ψ1 to Ψ2 transition, the red line for Ψ1 to Ψ3, the dash-dot-dot line is for the sum of the above two, and the solid black line is for the total probability.

For circular rings, the particle densities ρ of all the eigenstates are uniform under arbitrary B. However, for elliptic rings, ρ is no longer uniform, as shown in Fig.8. For the ground state (8a), when φ = 0, the non-uniformity is slight and ρ is a little smaller at the two ends of the major axis (θ = 0, π).
When φ increases, the density at the two ends of the minor axis (θ = π/2, 3π/2) increases as well. When φ = 4φo the non-uniformity is very strong, as shown by curve 9, where ρ ≈ 0 when θ ≈ 0 or π. The second state has a parity opposite to the ground state, but their densities are similar. For the third state (8b), ρ is peaked not at the ends of the major and minor axes but in between. In particular, when B increases, ρ oscillates from one pattern (say, the violet lines) to another pattern (the green lines), and repeatedly. The density oscillation becomes stronger in higher states (8c). The period of oscillation remains φo, thus it is a new type of A-B oscillation without analogue in circular rings (where ρ remains uniform). Incidentally, the density oscillation does not need to be driven by a strong field; a small change of φ from 0 to φo is sufficient.

FIG. 8: Particle densities ρ as functions of the arc length in nm (the corresponding change of θ is from 0 to π), for ε = 0.8: (a) ground state, (b) third state, (c) fourth state. The fluxes are given as φ = (i − 1)φo/2, where i is an integer from 1 to 9 marked by the curves. The first group of curves (in violet) have φ/φo = integer, the second group (in green) have φ/φo = half-integer. When φ increases, the curve of ρ jumps from the first group to the second group and jumps back, and repeatedly.

Let us evaluate Ej roughly by using (L)j to replace the operator $-i\,\partial/\partial\theta$ in eq.(2). Then,
$$E_j \approx G\int d\theta\left\{\left[(L)_j + \alpha\,\frac{\sqrt{1-\varepsilon^2}}{1-\varepsilon^2\sin^2\theta}\right]^2 + \left[\frac{\alpha\varepsilon^2\sin 2\theta}{2(1-\varepsilon^2\sin^2\theta)}\right]^2\right\}\Psi_j^*\Psi_j \qquad (8)$$
There are two terms on the right, each of which is the square of a bracket (for circular rings the second term does not exist). Recall that, while α = φ/φo is taken positive, (L)j is negative. Thus there is a cancellation inside the first term. Therefore, when ε and α are large, the second term is more important.
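The step leading to eq.(8) can be checked numerically. The sketch below is illustrative only: it assumes that the bracket of eq.(1) consists of the three terms $-d^2/d\theta^2$, $-i2\alpha\sqrt{1-\varepsilon^2}/(1-\varepsilon^2\sin^2\theta)\,d/d\theta$ and $\alpha^2(1-\varepsilon^2\cos^2\theta)/(1-\varepsilon^2\sin^2\theta)$, replaces $-i\,d/d\theta$ by a number L, and compares the result with the sum of the two squares in eq.(8). The function names and the values of L, α and ε are hypothetical choices made for this example. It also locates where the second term peaks within the first quadrant.

```python
import numpy as np

def eq1_bracket(theta, L, alpha, eps):
    """Assumed three-term bracket of eq.(1), with -i d/dtheta replaced by the number L."""
    s = np.sqrt(1.0 - eps**2)
    D = 1.0 - eps**2 * np.sin(theta)**2
    W = 1.0 - eps**2 * np.cos(theta)**2
    return L**2 + 2.0 * alpha * L * s / D + alpha**2 * W / D

def first_term(theta, L, alpha, eps):
    """First squared bracket of eq.(8)."""
    s = np.sqrt(1.0 - eps**2)
    D = 1.0 - eps**2 * np.sin(theta)**2
    return (L + alpha * s / D)**2

def second_term(theta, alpha, eps):
    """Second squared bracket of eq.(8); it vanishes for a circular ring (eps = 0)."""
    D = 1.0 - eps**2 * np.sin(theta)**2
    return (alpha * eps**2 * np.sin(2.0 * theta) / (2.0 * D))**2

# Illustrative values only (not taken from the paper's figures).
L_val, alpha, eps = -3.0, 4.0, 0.8

theta = np.linspace(0.0, 2.0 * np.pi, 4001)
identity_holds = np.allclose(
    eq1_bracket(theta, L_val, alpha, eps),
    first_term(theta, L_val, alpha, eps) + second_term(theta, alpha, eps),
)

# Position of the maximum of the second term between the major and minor axes.
quadrant = np.linspace(0.0, np.pi / 2.0, 2001)
theta_peak = quadrant[np.argmax(second_term(quadrant, alpha, eps))]

print("identity holds:", identity_holds)
print("second term peaks at theta (deg):", np.degrees(theta_peak))
```

Under these assumptions the second term vanishes at the ends of both axes and, by a short calculation, peaks where $\cos 2\theta = -\varepsilon^2/(2-\varepsilon^2)$, that is, between the axes and closer to the minor axis as ε grows.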
Recall that both Ψ1 and Ψ2 are mainly distributed around θ = π/2 and 3π/2 (refer to Fig.8a), where the second term is zero due to the factor sin 2θ. Accordingly the energies of Ψ1 and Ψ2 are lower. On the contrary, both Ψ3 and Ψ4 are distributed close to the peaks of the second term (refer to Fig.8b and 8c), which leads to a higher energy. This effect is greatly amplified by αε^2, which leads to the large energy gap shown in Fig.3.

In summary, the optical properties of highly flattened narrow elliptic rings were found to be greatly different from those of circular rings. For the latter, both the energy of the dipole photon and the probability of transition are low, and they oscillate within small domains. On the contrary, for the former, neither the energy nor the probability is limited: the energy (probability) is nearly proportional to B (B^3), and both are tunable by changing ε, rax and/or B. This implies that a strong source of light with frequency adjustable over a wide domain can be designed by using highly flattened, narrow, and small rings. Furthermore, a new type of A-B oscillation, namely the density oscillation, originating from symmetry breaking, was found. This is a noticeable point because the density oscillation might be common in systems with broken symmetry (e.g., with C3 symmetry).

Acknowledgment: The support under the grants 10574163 and 90306016 by NSFC is appreciated.

References

[1] S. Viefers, P. Koskinen, P. Singha Deo, M. Manninen, Physica E 21, 1 (2004)
[2] U.F. Keyser, C. Fühner, S. Borck, R.J. Haug, M. Bichler, G. Abstreiter, and W. Wegscheider, Phys. Rev. Lett. 90, 196601 (2003)
[3] D. Mailly, C. Chapelier, and A. Benoit, Phys. Rev. Lett. 70, 2020 (1993)
[4] A. Fuhrer, S. Lüscher, T. Ihn, T. Heinzel, K. Ensslin, W. Wegscheider, and M. Bichler, Nature (London) 413, 822 (2001)
[5] A.E. Hansen, A. Kristensen, S. Pedersen, C.B. Sorensen, and P.E. Lindelof, Physica E (Amsterdam) 12, 770 (2002)
[6] M. van den Broek, F.M. Peeters, Physica E 11, 345 (2001)
[7] E. Lipparini, L. Serra, A. Puente, European Phys. J. B 27, 409 (2002)
[8] J. Even, S. Loualiche, P. Miska, J. of Phys.: Cond. Matt. 15, 8737 (2003)
[9] C. Yannouleas, U. Landman, Physica Status Solidi A 203, 1160 (2006)
[10] D. Berman, O. Entin-Wohlman, and M. Ya. Azbel, Phys. Rev. B 42, 9299 (1990)
[11] D. Gridin, A.T.I. Adamou, and R.V. Craster, Phys. Rev. B 69, 155317 (2004)
[12] A. Bruno-Alfonso and A. Latgé, Phys. Rev. B 71, 125312 (2005)
[13] J. Goldstone and R.L. Jaffe, Phys. Rev. B 45, 14100 (1992)
[14] Eq.(4) originates from a 2-dimensional system via the following steps. (i) The components of the current along the X- and Y-axes are first obtained from the conservation of mass, as is well known. (ii) Then the component along the tangent of the ellipse, jθ, can be obtained. (iii) jθ is integrated along the normal of the ellipse under the assumption that the wave function is restricted to a very narrow region along the normal, which leads to eq.(4).
[15] Y.Z. He, C.G. Bao (submitted to PRB)

ABSTRACT A narrow elliptic ring containing an electron threaded by a magnetic field B is studied. When the ring is highly flattened, the increase of B would lead to a big energy gap between the ground and excited states, and therefore lead to a strong emission of dipole photons. The photon frequency can be tuned in a wide range by changing B and/or the shape of the ellipse. The particle density is found to oscillate from a pattern of distribution to another pattern back and forth against $B$. This is a new kind of Aharonov-Bohm oscillation originating from symmetry breaking and is different from the usual oscillation of persistent current. <|endoftext|><|startoftext|> Effect of electron-electron interaction on the phonon-mediated spin relaxation in quantum dots

Juan I. Climente,1,∗ Andrea Bertoni,1 Guido Goldoni,1,2 Massimo Rontani,1 and Elisa Molinari1,2
1CNR-INFM National Center on nanoStructures and bioSystems at Surfaces (S3), Via Campi 213/A, 41100 Modena, Italy
2Dipartimento di Fisica, Università degli Studi di Modena e Reggio Emilia, Via Campi 213/A, 41100 Modena, Italy
(Dated: October 21, 2018)

We estimate the spin relaxation rate due to spin-orbit coupling and acoustic phonon scattering in weakly-confined quantum dots with up to five interacting electrons. The Full Configuration Interaction approach is used to account for the inter-electron repulsion, and Rashba and Dresselhaus spin-orbit couplings are exactly diagonalized. We show that electron-electron interaction strongly affects spin-orbit admixture in the sample. Consequently, relaxation rates strongly depend on the number of carriers confined in the dot. We identify the mechanisms which may lead to improved spin stability in few electron (> 2) quantum dots as compared to the usual one and two electron devices. Finally, we discuss recent experiments on triplet-singlet transitions in GaAs dots subject to external magnetic fields. Our simulations are in good agreement with the experimental findings, and support the interpretation of the observed spin relaxation as being due to spin-orbit coupling assisted by acoustic phonon emission.

PACS numbers: 73.21.La, 71.70.Ej, 72.10.Di, 73.22.Lp

I. INTRODUCTION

There is currently interest in manipulating electron spins in quantum dots (QDs) for quantum information and quantum computing purposes [1-3]. A major goal in this research line is to optimize the spin relaxation time (T1), which sets the upper limit of the spin coherence time (T2): T2 ≤ 2T1 [4]. Therefore, designing two-level spin systems with long spin relaxation times is an important step towards the realization of coherent quantum operations and read-out measurements.
To date, spin relaxation has been investigated almost exclusively in single-electron [4-19] and two-electron [20-30] QDs. Spin relaxation in QDs with a larger number of electrons has seldom been considered [28,31], even though Coulomb blockade makes it possible to control the exact number of carriers confined in a QD [32]. Yet, recent theoretical works suggest that Coulomb interaction renders few-electron charge degrees of freedom more stable than single-electron ones [33], which leads to the question of whether similar findings hold for spin degrees of freedom. Moreover, in weakly-confined QDs, acoustic phonon emission assisted by spin-orbit (SO) interaction has been identified as the dominant spin relaxation mechanism when cotunneling and nuclei-mediated relaxation are reduced [6,8,31]. The combined effect of Coulomb interaction and SO coupling has been shown to influence the energy spectrum of few-electron QDs profoundly [34-36], but the consequences on the spin relaxation remain largely unexplored [37].

In Ref. 28 we investigated the effect of a magnetic field on the triplet-singlet (TS) spin relaxation in two- and four-electron QDs with SO coupling, so as to understand related experimental works. Motivated by the very different response observed for different numbers of confined particles, in this work we focus on the role of electron-electron interaction in spin relaxation processes, extending our analysis to different numbers of carriers and highlighting, in particular, the different physics involved for even and odd numbers of confined electrons. Furthermore, we explicitly compare the predictions of our theoretical model with very recent experiments on spin relaxation in two-electron GaAs QDs [29].

We study theoretically the energy structure and spin relaxation of N interacting electrons (N = 1-5) in parabolic GaAs QDs with SO coupling, subject to axial magnetic fields.
Both Rashba [38] and Dresselhaus [39] SO terms are considered, and the electron-electron repulsion is accounted for via the Full Configuration Interaction method [40,41]. By focusing on the two lowest spin states, two different classes of systems are distinguished. For N odd (1, 3, 5) and weak magnetic fields, the ground state is a doublet, and the two-level system is then defined by the Zeeman-split sublevels of the lowest orbital. For N even (2, 4), the two-level system is defined by a singlet and a triplet. We analyze these two classes of systems separately because, as we shall comment below, the physics involved in the spin transition differs. Thus, we compare the phonon-induced spin relaxation of N = 1, 3, 5 electrons and that of N = 2, 4 separately. As a general rule, the larger the number of confined carriers, the stronger the SO mixing, owing to the increasing density of electronic states. This would normally yield faster relaxation rates. However, we note that this is not necessarily the case, and few-electron states may display comparable or even slower relaxation than their single-electron and two-electron counterparts. This is due to characteristic features of the few-particle energy spectra which tend to weaken the admixture between the initial and final spin states. In N-odd systems, it is the presence of low-energy quadruplets for N > 1 that reduces the admixture between the Zeeman sublevels of the (doublet) ground state, hence inhibiting the spin flipping. In N-even systems, electronic correlations partially quench phonon emission [33], and the relaxation can be further suppressed for N > 2 if one selects initial and final spin states differing by more than one quantum of angular momentum, which inhibits direct triplet-singlet SO mixing via linear Rashba and Dresselhaus SO terms [28]. Notably, all these effects are connected with Coulomb interaction between confined carriers.

The paper is organized as follows.
In Section II we give details about the theoretical model we use. In Section III we study the energy structure and spin relaxation of a QD with an odd number of electrons (N = 1, 3, 5). In Section IV we do the same for QDs with an even number of electrons (N = 2, 4). In Section V we compare our numerical simulations with experimental data recently reported for N = 2 GaAs QDs. Finally, in Section VI we present the conclusions of this work.

II. THEORY

We consider weakly-confined GaAs/AlGaAs QDs, which are the kind of samples usually fabricated by different groups to investigate spin relaxation processes [7,8,20,22]. In these structures, the dot and the surrounding barrier have similar elastic properties, and the lateral confinement (which we approximate as circular) is much weaker than the vertical one. A number of useful approximations can be made for such QDs. First, since the weak lateral confinement gives inter-level spacings within the range of a few meV, only acoustic phonons have significant interaction with bound carriers, while optical phonons can be safely neglected. Second, the elastically homogeneous materials are not expected to induce phonon confinement, which allows us to consider three-dimensional bulk phonons. Finally, the different energy scales of vertical and lateral electronic confinement allow us to decouple vertical and lateral motion in the building of single-electron spin-orbitals. Thus, we take a parabolic confinement profile in the in-plane (x, y) direction, with single-particle energy gaps h̄ω0, which yields the Fock-Darwin states [42]. In the vertical direction (z) the confinement is provided by a rectangular quantum well of width Lz and height determined by the band-offset between the QD and barrier materials (the zero of energy is then the bottom of the conduction band). The quantum well eigenstates are derived numerically.
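As an illustration of what such a numerical derivation can look like, the following minimal sketch diagonalizes a finite rectangular well on a uniform grid with a three-point finite-difference Laplacian. It is not the authors' procedure. The well width (10 nm), barrier height (243 meV), effective mass (0.067 me, taken to be the same in well and barrier), box size and grid are all hypothetical values chosen for the example, and `finite_well_levels` is a name introduced here.

```python
import numpy as np

HBAR2_OVER_2ME = 38.0998  # hbar^2 / (2 m_e) in meV nm^2

def finite_well_levels(width_nm, barrier_meV, m_eff, box_nm=60.0, npts=1201, nlevels=3):
    """Lowest levels (meV) and ground state of a rectangular well, by finite differences.

    Assumptions: constant effective mass across the interface, hard walls at the
    edges of the computational box, zero of energy at the bottom of the well.
    """
    z = np.linspace(-box_nm / 2.0, box_nm / 2.0, npts)
    dz = z[1] - z[0]
    potential = np.where(np.abs(z) <= width_nm / 2.0, 0.0, barrier_meV)
    t = HBAR2_OVER_2ME / (m_eff * dz**2)          # kinetic hopping energy in meV
    hamiltonian = (np.diag(2.0 * t + potential)
                   - t * np.eye(npts, k=1)
                   - t * np.eye(npts, k=-1))
    energies, states = np.linalg.eigh(hamiltonian)
    xi0 = states[:, 0] / np.sqrt(dz)              # normalised so that sum |xi0|^2 dz = 1
    return z, energies[:nlevels], xi0

# Hypothetical parameters, for illustration only.
z, levels, xi0 = finite_well_levels(width_nm=10.0, barrier_meV=243.0, m_eff=0.067)
print("lowest levels (meV):", levels)
```

The ground-state array `xi0` plays the role of the lowest eigenstate of the well. For a position-dependent mass the kinetic term would have to be discretized differently.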
In cylindrical coordinates, the single-electron spin-orbitals can be written

$$\psi_\mu(\rho,\theta,z;s_z) = e^{im\theta}\, R_{n,m}(\rho)\, \xi_0(z)\, \chi_{s_z}, \tag{1}$$

where $\xi_0$ is the lowest eigenstate of the quantum well, $\chi_{s_z}$ is the spinor eigenvector of the spin z-component with eigenvalue $s_z$, and $R_{n,m}$ is the n-th Fock-Darwin orbital with azimuthal angular momentum m,

$$R_{n,m}(\rho) = \frac{1}{l_0}\sqrt{\frac{n!}{\pi\,(n+|m|)!}}\,\left(\frac{\rho}{l_0}\right)^{|m|} e^{-\rho^2/2l_0^2}\; L_n^{|m|}\!\left(\frac{\rho^2}{l_0^2}\right). \tag{2}$$

In the above expression $L_n^{|m|}$ denotes a generalized Laguerre polynomial and $l_0 = \sqrt{\hbar/m^*\omega_0}$ is the effective length scale, with $m^*$ standing for the electron effective mass. The energy of the single-particle Fock-Darwin states is given by $E_{n,m} = (2n+1+|m|)\,\hbar\Omega_c + \frac{m}{2}\,\hbar\omega_c$, where $\omega_c = eB/m^*$ is the cyclotron frequency (in SI units) and $\Omega_c = \sqrt{\omega_0^2 + (\omega_c/2)^2}$ is the total (spatial plus magnetic) confinement frequency.

With regard to Coulomb interaction, we need to go beyond mean-field approximations in order to properly include electronic correlations, which play an important role in determining the phonon-induced electron scattering rate [43]. Moreover, since we are interested in the relaxation time of excited states, we need to know both ground and excited states with comparable accuracy. Our method of choice is the Full Configuration Interaction approach: the few-electron wave functions are written as linear combinations $|\Psi_a\rangle = \sum_i c_{ai}|\Phi_i\rangle$, where the Slater determinants $|\Phi_i\rangle = \prod_{\mu_i} c^\dagger_{\mu_i}|0\rangle$ are obtained by filling in the single-electron spin-orbitals μ with the N electrons in all possible ways consistent with symmetry requirements; here $c^\dagger_\mu$ creates an electron in the level μ. The fully interacting Hamiltonian is numerically diagonalized, exploiting orbital and spin symmetries [40,41]. The few-electron states can then be labeled by the total azimuthal angular momentum M = 0, ±1, ±2, ..., the total spin S and its z-projection Sz.

The inclusion of SO terms follows a scheme similar to that of Ref. 44, although here we consider not only Rashba but also linear Dresselhaus terms.
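The Fock-Darwin energy formula given above can be evaluated directly. The following is a minimal Python sketch, assuming SI units with ω_c = eB/m* and the GaAs values used in this paper (m* = 0.067 free-electron masses, ħω0 = 4 meV); the function name `fock_darwin_energy` and the constant names are illustrative choices, not part of the original work.

```python
import math

# Physical constants (SI, CODATA values).
HBAR = 1.054571817e-34       # J*s
E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg
MEV = 1e-3 * E_CHARGE        # joules per meV


def fock_darwin_energy(n, m, b_tesla, hw0_mev=4.0, m_eff=0.067):
    """Single-particle Fock-Darwin energy E_{n,m} in meV.

    n: radial quantum number, m: azimuthal angular momentum,
    b_tesla: axial magnetic field, hw0_mev: lateral confinement energy,
    m_eff: effective mass in units of the free-electron mass (assumed values).
    """
    w0 = hw0_mev * MEV / HBAR                      # confinement frequency
    wc = E_CHARGE * b_tesla / (m_eff * M_E)        # cyclotron frequency
    omega = math.sqrt(w0 ** 2 + (wc / 2.0) ** 2)   # total confinement frequency
    energy_j = (2 * n + 1 + abs(m)) * HBAR * omega + 0.5 * m * HBAR * wc
    return energy_j / MEV


if __name__ == "__main__":
    for b in (0.0, 1.0, 3.0):
        levels = {(n, m): fock_darwin_energy(n, m, b)
                  for n in (0, 1) for m in (-2, -1, 0, 1, 2)}
        lowest = sorted(levels.items(), key=lambda kv: kv[1])[:4]
        print(f"B = {b} T:", [(nm, round(e, 3)) for nm, e in lowest])
```

At B = 0 the shells are degenerate in ±m, and a finite field lowers the negative-m states, which is why the low-lying few-electron states discussed later carry negative M.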
For a quantum well grown along the [001] direction, these terms read [38,39]:

$$H_R = \alpha\,(k_y s_x - k_x s_y), \tag{3}$$

$$H_D = \gamma_c\,\langle k_z^2\rangle\,(k_y s_y - k_x s_x), \tag{4}$$

where α and $\gamma_c$ are coupling constants, while $s_j$ and $k_j$ are the j-th Cartesian projections of the electron spin and canonical momentum, respectively, along the main crystallographic axes ($\langle k_z^2\rangle = (\pi/L_z)^2$ for the lowest eigenstate of the quantum well). The momentum operator includes a magnetic field B applied along the vertical direction z. Other SO terms may also be present in the conduction band of a QD, such as the contribution arising from the system inversion asymmetry in the lateral dimension or the cubic Dresselhaus term. However, in GaAs QDs with strong vertical confinement, $H_R$ and $H_D$ account for most of the SO interaction [36]. We rewrite Eqs. (3,4) in terms of ladder operators as:

$$H_R = \alpha\,(k_+ s_- - k_- s_+), \tag{5}$$

$$H_D = \beta\,(k_+ s_+ + k_- s_-), \tag{6}$$

where $k_\pm$ and $s_\pm$ change m and $s_z$ by one quantum, respectively, and $\beta = \gamma_c\,(\pi/L_z)^2$ is the Dresselhaus in-plane coupling constant. It is worth mentioning that when only Rashba (Dresselhaus) coupling is present, the total angular momentum j = m + sz (j = m − sz) is conserved. However, in the general case, when both coupling terms are present and α ≠ β, all symmetries are broken. Still, SO interaction in a large-gap semiconductor such as GaAs is rather weak, and the low-lying states can be safely labelled by their approximate quantum numbers (M, S, Sz) except in the vicinity of the level anticrossings [11,26,45]. Since the few-electron M and Sz quantum numbers are given by the algebraic sum of the single-particle m and sz quantum numbers, it is clear from Eqs. (5,6) that Rashba interaction mixes (M, Sz) states with (M ± 1, Sz ∓ 1) ones, while Dresselhaus interaction mixes (M, Sz) with (M ± 1, Sz ± 1). The SO terms of Eqs.
(5,6) can be expanded on a basis of correlated few-electron states [46]. The SO matrix elements are then given by sums of single-particle contributions of the form:

$$\langle n'm's'_z|\,H_R + H_D\,|n\,m\,s_z\rangle = C_R^*\,O^{+}_{n'm',nm}\,\delta_{m',m+1}\,\delta_{s'_z,s_z-1} + C_R\,O^{-}_{n'm',nm}\,\delta_{m',m-1}\,\delta_{s'_z,s_z+1} + C_D^*\,O^{+}_{n'm',nm}\,\delta_{m',m+1}\,\delta_{s'_z,s_z+1} + C_D\,O^{-}_{n'm',nm}\,\delta_{m',m-1}\,\delta_{s'_z,s_z-1}.$$

Here $C_R = \alpha$ and $C_D = -i\beta$ are constants for the Rashba and Dresselhaus interactions, respectively, and $O^\pm$ are form factors expressed through integrals involving the radial functions $R_{nm}(t)$, with $t = \rho^2/l_0^2$. These form factors have analytical expressions which depend on the set of quantum numbers {n'm', nm}. The resulting SO-coupled eigenvectors are then linear combinations of the correlated states, $|\Psi^{SO}_A\rangle = \sum_a c_{Aa}|\Psi_a\rangle$.

We assume zero temperature, which suffices to capture the main features of one-phonon processes [9,16]. Indeed, it is one-phonon processes that account for most of the low-temperature experimental observations in the SO coupling regime [2,6,8,28,29,31]. We evaluate the relaxation rate between the initial (occupied) and final (empty) SO-coupled few-electron states, B and A, using the Fermi Golden Rule:

$$\tau^{-1}_{B\to A} = \frac{2\pi}{\hbar}\sum_{\nu\mathbf{q}}\Big|\sum_{a,b} c^*_{Bb}\,c_{Aa}\sum_{i,j} c^*_{bi}\,c_{aj}\,\langle\Phi_i|V_{\nu\mathbf{q}}|\Phi_j\rangle\Big|^2\,\delta(E_B - E_A - \hbar\omega_q),$$

where the electron states $|\Psi^{SO}_K\rangle$ (K = A, B) have been written explicitly as linear combinations of Slater determinants, $E_K$ stands for the energy of electron state K and $\hbar\omega_q$ represents the phonon energy. $V_{\nu\mathbf{q}}$ is the interaction operator of an electron with an acoustic phonon of momentum q via the mechanism ν, which can be either deformation potential or piezoelectric field interaction. Details about the electron-phonon interaction matrix elements can be found elsewhere [33].

In this work we study GaAs/Al0.3Ga0.7As QDs, using the following material parameters [47]: electron effective mass m* = 0.067 (in units of the free-electron mass), band-offset Vc = 243 meV, crystal density d = 5310 kg/m³, acoustic deformation potential constant D = 8.6 eV, effective dielectric constant ε = 12.9, and piezoelectric constant h14 = 1.41 · 10^9 V/m.
The Landé factor is g = −0.44 [5]. As for the GaAs sound speed, we take cLA = 4.72 · 10^3 m/s for longitudinal phonon modes and cTA = 3.34 · 10^3 m/s for transverse modes [48]. Unless otherwise stated, a lateral confinement of ħω0 = 4 meV and a quantum well width of Lz = 10 nm are assumed for the QD under study, and a Dresselhaus coupling parameter γc = 25.5 eV·Å³ is taken [49], so that β ≈ 25 meV·Å. The value of the Rashba coupling constant can be modulated externally, e.g. with electric fields. Here we investigate systems both with and without Rashba interaction. When present, we shall mostly consider α = 50 meV·Å, to represent the case where Rashba effects prevail over Dresselhaus ones.

Few-body correlated states (M, S, Sz) are obtained using a basis set composed of the Slater determinants (SDs) which result from all possible combinations of 42 single-electron spin-orbitals (i.e., from the six lowest energy shells of the Fock-Darwin spectrum at B = 0) filled with N electrons. For N = 5, this means that the basis rank may reach ∼ 2 · 10^5. The SO Hamiltonian is then diagonalized in a basis of up to 56 few-electron states, which keeps the convergence error of the spin relaxation rate below 2%. Since SO terms break the spin and angular momentum symmetries, the SO-coupled states |Ψ^SO_K⟩ are described by a linear combination of SDs coming from different (M, S, Sz) subspaces. Thus, for N = 5, the states are described by up to ∼ 8.5 · 10^5 SDs. To evaluate the electron-phonon interaction matrix elements, we note that only a small percentage of the huge number of possible pairs of SDs (∼ 7 · 10^11 for N = 5) may give non-zero matrix elements, owing to spin-orbital orthogonalities. We scan all pairs of SDs and filter those which may give non-zero matrix elements by writing the determinants in binary representation and using efficient bitwise algorithms [40,41]. The matrix elements of the remaining pairs (∼ 2 · 10^6 for N = 5) are evaluated using massively parallel computation.
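The basis-size figures quoted above, and the idea of screening determinant pairs with bit operations, can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes only that each Slater determinant is stored as an integer bitmask over the 42 spin-orbitals, and that a one-body operator (such as the electron-phonon coupling) can connect two determinants only if they differ in at most one occupied spin-orbital. The function names are hypothetical, and the further spin and symmetry filters that bring the count down to the ∼ 2 · 10^6 pairs mentioned in the text are not modelled.

```python
from itertools import combinations
from math import comb

N_ORBITALS = 42  # single-electron spin-orbitals in the basis


def n_determinants(n_electrons, n_orbitals=N_ORBITALS):
    """Number of Slater determinants: ways to place N electrons in the spin-orbitals."""
    return comb(n_orbitals, n_electrons)


def may_couple_one_body(det_a, det_b):
    """Screen a pair of determinants stored as bitmasks (bit k = orbital k occupied).

    A one-body operator can only connect determinants that differ by at most
    one spin-orbital substitution, i.e. whose XOR has at most two set bits.
    """
    return bin(det_a ^ det_b).count("1") <= 2


def determinants(n_orbitals, n_electrons):
    """Yield every determinant of a (small) model space as an integer bitmask."""
    for occupied in combinations(range(n_orbitals), n_electrons):
        yield sum(1 << k for k in occupied)


def count_candidate_pairs(n_orbitals, n_electrons):
    """Count distinct determinant pairs that survive the bitwise screening."""
    dets = list(determinants(n_orbitals, n_electrons))
    return sum(may_couple_one_body(a, b) for a, b in combinations(dets, 2))


if __name__ == "__main__":
    n_sd = n_determinants(5)
    print("SDs for N = 5:", n_sd)               # about 8.5e5
    print("ordered SD pairs:", n_sd ** 2)       # about 7e11
    print("toy space (6 orbitals, 3 electrons):",
          count_candidate_pairs(6, 3), "of", comb(comb(6, 3), 2), "pairs survive")
```

The binomial coefficient for 5 electrons in 42 spin-orbitals reproduces the ∼ 8.5 · 10^5 determinants quoted above, and its square gives the ∼ 7 · 10^11 pairs; the XOR-and-popcount test is what makes scanning such a number of pairs affordable.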
FIG. 1: Low-lying energy levels in a QD with N = 1, 3, 5 interacting electrons, as a function of an axial magnetic field. The SO interaction coefficients are α = 50 meV·Å and β = 25 meV·Å. The dot has ħω0 = 4 meV and Lz = 10 nm. Note the increasing size of the SO-induced anticrossing gaps and zero-field splittings with increasing N. [Horizontal axis: B (T).]

III. SPIN RELAXATION IN A QD WITH N ODD

A. Energy structure

When the number of electrons confined in the QD is odd and the magnetic field is weak enough, the ground and first excited states are usually the Zeeman sz = 1/2 and sz = −1/2 sublevels of a doublet [Fig. 1]. Since the initial and final spin states belong to the same orbital, ∆M = 0 and SO mixing (which requires ∆M = ±1) is only possible with higher-lying states. In addition, the phonon energy (corresponding to the electron transition energy) is typically small (on the µeV scale). In this case, the relaxation rate is determined essentially by the phonon density, the strength and nature of the SO interaction, and the proximity of higher-lying states [9,11]. In order to gain some insight into the influence of these factors, in Fig. 1 we compare the energy structure of a QD with N = 1, 3, 5 vs. an axial magnetic field, in the presence of Rashba and Dresselhaus interactions [55]. One can see that increasing the number of particles changes the energy magneto-spectrum drastically. This is because the quantum numbers of the low-lying energy levels change, resulting in a different field dependence, and because Coulomb interaction leads to an increased density of electron states, as well as to a more complicated spectrum.

At first sight, the energy spectra of Fig. 1 closely resemble those in the absence of SO effects.
For instance, the N = 1 spectrum is very similar to the pure Fock-Darwin spectrum [42]. Rashba and Dresselhaus interactions are expected to split the degenerate |m| > 0 shells at B = 0, shift the positions of the level crossings and turn them into anticrossings [36,52,53,54], but here such signatures are hardly visible because SO interaction is weak in GaAs. In fact, the SO-induced zero-field energy splittings and the anticrossing gaps are only a few µeV, and SO effects simply add fine features to the N = 1 spectrum [52].

A significantly different picture arises in the N = 3 and N = 5 cases. Here, the increased density of electronic states enhances SO mixing as compared to the single-electron case [56]. As a result, the anticrossing gaps can be as large as 30 µeV (N = 3) and 60 µeV (N = 5). Moreover, unlike in the N = 1 case, where the ground state orbital has m = 0, here it has |M| = 1. Therefore, the Zeeman sublevels involved in the fundamental spin transition are subject to SO-induced zero-field splittings. To illustrate this point, in Fig. 2 we zoom in on the energy spectrum of the four lowest states of N = 3 and N = 5 under weak magnetic fields, without (left panels) and with (right panels) Rashba interaction. Clearly, the four-fold degeneracy of the |M| = 1 spin-orbitals at B = 0 has been lifted by SO interaction [36]. One can also see that the order of the two lowest sublevels at B ∼ 0 changes when Rashba interaction is switched on. Thus, for N = 3 and α = 0, the two lowest sublevels are (M = −1, Sz = 1/2) and (M = −1, Sz = −1/2), but this order is reversed when α = 50 meV·Å. The opposite level order as a function of α is found for N = 5. This behavior constitutes a qualitative difference with respect to the N = 1 case in two aspects. First, the phonon energy (i.e., the energy of the fundamental spin transition) is no longer given by the bare Zeeman splitting.
Instead, it has a more complicated dependence on the magnetic field, and it is greatly influenced by the particular values of α and β. This is apparent in the N = 5 panels, where the energy splitting between the two lowest states differs strongly depending on the relative value of α and β. Second, it is possible to find situations where the ground state at B ∼ 0 has Sz = −1/2 and the first excited state has Sz = 1/2 (e.g. N = 3 when α > β or N = 5 when α < β). In these cases, the Zeeman splitting leads to a weak anticrossing of the two sublevels (highlighted with dashed circles in Fig. 2) which has no counterpart in single-electron systems. This kind of B-induced (i.e., not phonon-induced) ground state spin mixing, also referred to as “intrinsic spin mixing”, has been previously reported for singlet-triplet transitions in N = 2 QDs [58]. Here we show that it may also exist in few-electron QDs with N odd.

FIG. 2: The four lowest energy levels in a QD with N = 3, 5 interacting electrons, as a function of an axial magnetic field, without (left column) and with (right column) Rashba SO interaction. The approximate quantum numbers (M, S) of the levels are shown, with arrows denoting the spin projection Sz = 1/2 (↑) and Sz = −1/2 (↓). The dashed circles highlight the region of intrinsic spin mixing of the ground state. [Panels: left column α = 0, β = 25; right column α = 50, β = 25; rows N = 3 and N = 5. Horizontal axis: B (T).]

Figure 1 puts forward yet another qualitative difference between SO coupling in single- and few-electron QDs: while in the former low-energy anticrossings are due to Rashba interaction [11,36,52], in few-electron QDs, when S = 3/2 states come into play, both Rashba and Dresselhaus terms may induce anticrossings.
For example, the (M = −1, Sz = 1/2) sublevel couples directly to both the (M = −2, Sz = −1/2) and (M = −2, Sz = 3/2) sublevels, via the Dresselhaus and Rashba interaction, respectively. Coupling to S = 3/2 states is a characteristic feature of N > 1 systems, which has important effects on the spin relaxation rate, as we will discuss below.

B. Spin relaxation between Zeeman sublevels

In Fig. 3 we compare the magnetic field dependence of the spin relaxation rate between the two lowest Zeeman sublevels of N = 1, 3, 5. Dashed lines (solid lines) are used for systems without (with) Rashba interaction [59]. While for N = 1 the well-known exponential dependence on B is found [2,6,9], and the main effect of Rashba coupling is to shift the curve upwards (i.e., to accelerate the relaxation), for N = 3 and N = 5 the relaxation rate exhibits complicated trends which strongly depend on the values of the SO coupling parameters.

FIG. 3: Spin relaxation rate in a QD with N = 1, 3, 5 interacting electrons as a function of an axial magnetic field. Solid (dashed) lines stand for the system with (without) Rashba interaction. Note the strong influence of the SO interaction on the shape of the relaxation curve for N > 1. [Legend: α = 50, β = 25 and α = 0, β = 25. Horizontal axis: B (T).]

To understand this result, one has to bear in mind that spin relaxation processes involve two distinct and complementary ingredients, namely SO interaction and phonon emission. Phonon emission ensures the conservation of energy in the electron relaxation, but phonons have zero spin and therefore cannot couple states with different spin. It is the SO interaction that turns pure spin states into mixed ones, thus enabling the phonon-induced transition. The overall efficiency of the scattering event is then given by the combination of the two phenomena: the phonon emission efficiency modulated by the extent of the SO mixing.
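The direct-coupling rules used in this section (Rashba mixes (M, Sz) with (M ± 1, Sz ∓ 1), Dresselhaus mixes (M, Sz) with (M ± 1, Sz ± 1)) can be written as a small lookup. The Python sketch below is only an illustration: the function name `linear_so_coupling` is hypothetical, and it checks ΔM and ΔSz alone, ignoring the total spin S and the magnitude of the matrix element.

```python
from fractions import Fraction


def linear_so_coupling(state_a, state_b):
    """Return which linear SO term can directly couple two (M, Sz) states.

    Rashba:      (M, Sz) <-> (M + 1, Sz - 1) or (M - 1, Sz + 1)
    Dresselhaus: (M, Sz) <-> (M + 1, Sz + 1) or (M - 1, Sz - 1)
    Any other pair has no direct coupling through the linear terms.
    """
    d_m = state_b[0] - state_a[0]
    d_sz = state_b[1] - state_a[1]
    if abs(d_m) != 1 or abs(d_sz) != 1:
        return None
    return "Rashba" if d_m == -d_sz else "Dresselhaus"


if __name__ == "__main__":
    half = Fraction(1, 2)
    doublet_up = (-1, half)
    # Couplings of the (M = -1, Sz = 1/2) sublevel quoted in the text.
    print(linear_so_coupling(doublet_up, (-2, -half)))         # Dresselhaus
    print(linear_so_coupling(doublet_up, (-2, 3 * half)))      # Rashba
    # The two Zeeman sublevels of the same orbital are not directly coupled.
    print(linear_so_coupling(doublet_up, (-1, -half)))         # None
```

The last call makes explicit why the Zeeman sublevels of the ground doublet can only mix through higher-lying states: they share the same M, so ΔM = 0.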
The shape of the spin relaxation curves shown in Fig. 3 can be directly related to the energy dispersion of the phonon, which corresponds to the splitting between the two lowest levels of the electron spectrum. Thus, for N = 1, the phonon energy is simply proportional to B through the Zeeman splitting, but for N = 3 and N = 5 it has a non-trivial dependence on B, as shown in Fig. 2. In fact, the relaxation minima in Fig. 3 are connected with the magnetic field values where the two lowest levels anticross in Fig. 2. In these magnetic field windows, in spite of the fact that SO coupling is strong, the phonon density is so small that the relaxation rate is greatly suppressed [28]. Similarly, the relaxation rate fluctuations of N = 3 at B ∼ 3 T are signatures of the anticrossings with high-angular momentum states. For larger fields (B > 3 T), the ground state approaches the maximum density droplet configuration and high-spin states are possible [44]. In this work, however, we restrict ourselves to the magnetic field regime where the ground state is a doublet.

FIG. 4: (Color online). Spin relaxation rate in a QD with N = 1, 3, 5 interacting electrons as a function of the energy splitting between the two lowest spin states. Top panel: α = 0, β = 25 meV·Å. Bottom panel: α = 50 meV·Å, β = 25 meV·Å. The relaxation of N = 3 is slower than that of N = 1 for a wide range of ∆12. The irregular data distribution is due to the irregular relaxation rates vs. magnetic field. For example, the strongly deviated points of N = 3 come from the peaks at B ∼ 3 T in Fig. 3. [Horizontal axis: ∆12 (µeV).]

For a more direct comparison between the relaxation rates of N = 1, 3, 5, in Fig. 4 we replot the data of Fig. 3 as a function of the energy splitting between the two lowest states, ∆12, without (top panel) and with (bottom panel) Rashba interaction.
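For N = 1 the splitting ∆12 is essentially the bare Zeeman energy, which makes the conversion between the field axis of Fig. 3 and the energy axis of Fig. 4 easy to estimate. The sketch below assumes ∆ = |g| µ_B B with the Landé factor g = −0.44 quoted in Section II and the standard value of the Bohr magneton; the function name is illustrative, and SO-induced corrections to the splitting are neglected.

```python
MU_B_UEV_PER_T = 57.88   # Bohr magneton in micro-eV per tesla
G_GAAS = -0.44           # Lande factor used in this paper


def zeeman_splitting_uev(b_tesla, g=G_GAAS):
    """Bare Zeeman splitting |g| * mu_B * B in micro-eV (no SO corrections)."""
    return abs(g) * MU_B_UEV_PER_T * b_tesla


if __name__ == "__main__":
    for b in (0.5, 1.0, 2.0, 3.0):
        print(f"B = {b:.1f} T  ->  Delta_12 ~ {zeeman_splitting_uev(b):.1f} micro-eV")
```

With these numbers, fields up to about 3 T correspond to splittings of a few tens of µeV, which is the energy scale on which the N = 1 data are compared with N = 3 and N = 5.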
Since the phonon energy is identical for all points with the same ∆12, differences in the relaxation rate arise exclusively from the different strength of SO interaction. ∆12 is also a relevant parameter from the experimental point of view, since it is usually required to be large enough for the states to be resolvable. In this sense, it is worth noting that, even if the inter-level splittings shown in Fig. 4 are fairly small, a number of experiments have successfully addressed this regime [5,8,21].

The most striking feature observed in the figure is that, for most values of ∆12, the N = 3 relaxation rate is clearly slower than the N = 1 one. Likewise, N = 5 shows a similar (or slightly faster) relaxation rate than N = 1. These are interesting results, for they suggest that improved spin stability may be achieved using few-electron QDs instead of the single-electron ones typically employed to date [8]. At first sight the results are surprising, because the higher density of states in the few-electron systems implies smaller inter-level spacings, and hence stronger SO mixing, which should translate into enhanced relaxation. It then follows that another physical mechanism must be acting upon the few-electron systems, which reduces the transition probability between the initial and final spin states, and may even make it smaller than for N = 1. Here we propose that this mechanism is the SO admixture with low-lying quadruplet (S = 3/2) states, which become available for N > 1. By coupling to S = 3/2 levels, the projection of the doublet Sz = 1/2 levels onto Sz = −1/2 ones is reduced, and this partly inhibits the transition between the lowest doublet sublevels.

Let us explain this by comparing the spin transition for N = 1 and N = 3. For N = 1, the spin configuration of the initial and final states, in the absence of SO coupling, is |Sz = −1/2⟩ and |Sz = +1/2⟩, respectively. The transition between these states is spin-forbidden.
However, when SO coupling is switched on, the two states become admixed with higher-lying S = 1/2 states fulfilling the ∆Sz = ±1 condition. The transition between the initial and final states can then be represented schematically as:

$$c_a\,|S_z = -1/2\rangle + c_b\,|S_z = +1/2\rangle \;\Rightarrow\; c_r\,|S_z = +1/2\rangle + c_s\,|S_z = -1/2\rangle,$$

where the $c_i$ are the admixture coefficients (in general $c_a \gg c_b$ and $c_r \gg c_s$). Clearly, both spin configurations of the initial state now have a finite overlap with the final state, and so the transition is possible. Let us next consider the N = 3 case. In the absence of SO coupling, the initial and final states are again the Sz = −1/2 and Sz = +1/2 doublets, respectively, and the transition is spin-forbidden. When we switch on SO coupling, we note that the ∆Sz = ±1 condition allows for mixing not only with Sz = ±1/2 states (either doublets or quadruplets) but also with Sz = ±3/2 quadruplets, so that the transition can be represented as:

$$c_a\,|S_z = -1/2\rangle + c_b\,|S_z = +1/2\rangle + c_c\,|S_z = -3/2\rangle \;\Rightarrow\; c_r\,|S_z = +1/2\rangle + c_s\,|S_z = -1/2\rangle + c_t\,|S_z = +3/2\rangle,$$

where, in general, $c_a \gg c_b, c_c$ and $c_r \gg c_s, c_t$. In this case, |Sz = −3/2⟩ has no overlap with the final state configurations. Likewise, |Sz = +3/2⟩ has no overlap with the initial state configurations. Therefore, these quadruplet configurations are inactive from the point of view of the transition, and the more important they are (i.e., the stronger the SO coupling with quadruplet states), the less likely the transition is.

To support this argument quantitatively, in Fig. 5 we illustrate the spin relaxation of N = 3 calculated by diagonalization of the SO Hamiltonian including and excluding the low-lying S = 3/2 states from the basis set. As expected, when the quadruplets are not considered, the transition is visibly faster. For N = 5, low-lying S = 3/2 levels are also available, but in this case they barely compensate for the large density of electron states, so that the overall scattering rate turns out to be comparable to that of N = 1.
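The schematic argument above can be turned into a toy calculation. The Python sketch below is an illustration under strong assumptions, not the model used in this paper: every orbital matrix element is set to 1, the initial and final states are taken to be mirror images of each other (c_b = c_s = eps, c_c = c_t = q), and the dominant coefficients follow from normalization. Only spin configurations with equal Sz are allowed to overlap, since phonons do not flip the spin.

```python
import math


def schematic_rate(eps, q):
    """Relative transition probability in the schematic doublet picture.

    eps: amplitude of the opposite-spin doublet admixture (c_b = c_s = eps)
    q:   amplitude of the Sz = -3/2 (initial) and +3/2 (final) quadruplet admixture
    The dominant amplitude c_a = c_r is fixed by normalization.
    """
    major = math.sqrt(1.0 - eps ** 2 - q ** 2)
    initial = {-0.5: major, +0.5: eps, -1.5: q}   # amplitudes labelled by Sz
    final = {+0.5: major, -0.5: eps, +1.5: q}
    # Spin-conserving overlap: only components with the same Sz contribute.
    overlap = sum(amp * final.get(sz, 0.0) for sz, amp in initial.items())
    return overlap ** 2


if __name__ == "__main__":
    eps = 0.05
    reference = schematic_rate(eps, 0.0)
    for q in (0.0, 0.1, 0.3):
        print(f"quadruplet amplitude {q:.1f}: relative rate "
              f"{schematic_rate(eps, q) / reference:.3f}")
```

In this toy picture the Sz = ±3/2 components carry weight but never contribute to the overlap, so any weight transferred to them lowers the transition probability, which is the qualitative trend of the comparison with and without S = 3/2 states.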
FIG. 5: (Color online). Spin relaxation rate in a QD with N = 3 interacting electrons as a function of the energy splitting between the two lowest spin states. α = 0 and β = 25 meV·Å. Symbol + (×) stands for the SO Hamiltonian diagonalized in a basis which includes (excludes) S = 3/2 states. Clearly, the inclusion of S = 3/2 states slows down the relaxation. [Legend: without S = 3/2, with S = 3/2. Horizontal axis: ∆12 (µeV).]

To test the robustness of the few-electron spin state stability predicted above, in Fig. 6 we also compare the relaxation rates of N = 1 and N = 3 in a QD with a different confinement, namely ħω0 = 6 meV. Since the lateral confinement of the dot is now stronger, (M = −1, S = 1/2) is the N = 3 ground state up to large values of the magnetic field (B ∼ 5 T). This allows us to investigate larger Zeeman splittings (i.e., larger ∆12), which may be easier to resolve experimentally. As seen in the figure, the relaxation rate of N = 3 is again slower than that of N = 1 for a wide range of ∆12, the behavior being very similar to that of Fig. 4, albeit extended towards larger inter-level spacings. The crossing between the N = 3 and N = 1 relaxation rates at large ∆12 values, both in Fig. 4 and Fig. 6, is due to the proximity of high-angular momentum levels coming down in energy for N = 3 when the magnetic field (and hence the Zeeman splitting) is large. Such levels bring about strong SO admixture and thus fast relaxation (see middle panel of Fig. 3 at B ∼ 3 T).

FIG. 6: (Color online). Spin relaxation rate in a QD with N = 1, 3 interacting electrons as a function of the energy splitting between the two lowest spin states. The QD has ħω0 = 6 meV. Top panel: α = 0, β = 25 meV·Å. Bottom panel: α = 50 meV·Å, β = 25 meV·Å. As for the weaker-confined dot of Fig. 4, the relaxation of N = 3 is slower than that of N = 1 for a wide range of ∆12. [Horizontal axis: ∆12 (µeV).]

IV. SPIN RELAXATION IN A QD WITH N EVEN

A. Energy structure

When the number of electrons confined in the QD is even and the magnetic field is not very strong, the ground and first excited states are usually a singlet (S = 0) and a triplet (S = 1) with three Zeeman sublevels (Sz = +1, 0, −1). Unlike in the previous section, here the initial and final states of the spin transition may have different orbital quantum numbers, and the inter-level splitting ∆12 may be significantly larger (on the meV scale). Under these conditions, the phonon emission efficiency no longer exhibits a simple proportionality with the phonon density, but it further depends on the ratio between the phonon wavelength and the QD dimensions [50,51]. Moreover, SO interaction is sensitive to the quantum numbers of the initial and final electron states [26,28]. Therefore, in this class of spin transitions the details of the energy structure are also relevant to determine the relaxation rate.

In Fig. 7 we plot the energy levels vs. magnetic field for a QD with N = 2, 4 in the presence of Rashba and Dresselhaus interactions. The approximate quantum numbers (M, S) of the lowest-lying states are written in parentheses. For N = 2 and weak fields, the ground state is the (M = 0, S = 0) singlet, and the first excited state is the (M = −1, S = 1) triplet. As in the previous section, SO interaction introduces small zero-field splittings and anticrossings in the energy levels with |M| > 0 [36]. As a consequence, when α > β, the zero-field ordering of the (M = −1, S = 1) Zeeman sublevels is such that they anticross in the presence of an external magnetic field. This anticrossing is highlighted in the figure by a dashed circle. On the other hand, as B increases the singlet-triplet energy spacing is gradually reduced, and then the singlet experiences a series of weak anticrossings with all three Zeeman sublevels of the triplet. These anticrossings are due to the fact that (M = 0, S = 0, Sz = 0) couples to the (M = −1, S = 1, Sz = −1) sublevel via Dresselhaus interaction, to the (M = −1, S = 1, Sz = +1) sublevel via Rashba interaction, and finally to the (M = −1, S = 1, Sz = 0) sublevel indirectly through higher-lying states [26,28].

FIG. 7: Low-lying energy levels in a QD with N = 2, 4 interacting electrons as a function of an axial magnetic field. α = 50 meV·Å and β = 25 meV·Å. The approximate quantum numbers (M, S) of the lowest states are shown. The dashed circle in N = 2 highlights the anticrossing between M = −1 Zeeman sublevels. [Level labels: (−1,1), (−2,0), (−3,1), (0,1), (0,0). Horizontal axis: B (T).]

For N = 4, the density of electronic states is larger than for N = 2, which is again reflected in larger anticrossing gaps due to the enhanced SO interaction. The ground state at B = 0 is a triplet, (M = 0, S = 1), but soon after it anticrosses with a singlet, (M = −2, S = 0). After this, and before the formation of Landau levels, two different branches of the first excited state can be distinguished: when B < 1 T, the first excited state is (M = 0, S = 1), and when B > 1 T it is (M = −3, S = 1). It is worth pointing out that the complexity of the N = 4 spectrum, as compared to the simple N = 2 one, implies a greater flexibility to select initial and final spin states by means of external fields. As we shall discuss below, this degree of freedom has important consequences for the relaxation rate.

B. Triplet-singlet spin relaxation

In a recent work, we investigated the magnetic field dependence of the triplet-singlet (TS) relaxation due to SO coupling and phonon emission in N = 2 and N = 4 QDs [28]. Here we study this kind of transition from a different perspective, namely we compare the spin relaxation of two- and four-electron systems in order to highlight the changes introduced by inter-electron repulsion. Increasing the number of electrons confined in the QD has three important consequences for the TS transition. First, it increases the density of electronic states (and hence the SO mixing), leading to faster relaxation.
Second, as mentioned in the previous section, it introduces a wider choice of orbital quantum numbers for the singlet and triplet states. Third, it increases the strength of electronic correlations. Since now the initial and final spin states have different orbital wave functions, the latter factor effectively reduces phonon scattering, in a similar fashion to charge relaxation processes [33] (this effect has been recently pointed out in Ref. 30 as well). To find out the overall combined effect of these three factors, in this section we analyze quantitative simulations of correlated […]

We focus on the magnetic field regions where the ground state is a singlet and the excited state is a triplet. A complete description of the TS transition should then include spin relaxation between the Zeeman-split sublevels of the triplet. However, for the weak fields we consider, this relaxation is orders of magnitude slower than the TS one (compare Figs. 3 and 8) [60], the reason being the small Zeeman energy and the fact that the Zeeman sublevels are not directly coupled by Rashba and Dresselhaus terms, as mentioned in Section III. Therefore, it is a good approximation to assume that all three triplet Zeeman sublevels are equally populated and that they relax directly to the singlet [26].

FIG. 8: Spin relaxation rate in a QD with N = 2, 4 interacting electrons as a function of an axial magnetic field. Solid (dashed) lines stand for the system with (without) Rashba interaction. The relaxation of N = 4 when B < 1 T is slower than that of N = 2. [Legend: α = 50, β = 25 and α = 0, β = 25. Horizontal axis: B (T).]

Figure 8 represents the TS relaxation rate in a QD with N = 2, 4, after averaging the relaxation from the three triplet sublevels. Solid (dashed) lines stand for the case with (without) Rashba interaction [59]. The main effect of Rashba and Dresselhaus interactions is to accelerate the spin transition by shifting the relaxation curve upwards.
This is in contrast to the N-odd case, where these terms may induce drastic changes in the shape of the relaxation rate curve (see Fig. 3). Figure 8 also reveals a different behavior of the N = 2 and N = 4 TS relaxation rates. The former increases gradually with B and then drops in the vicinity of the TS anticrossing, due to the small phonon energies [28,29,30]. For N = 4, by contrast, an additional feature is found, namely an abrupt step at B ∼ 1 T. This is due to the change of angular momentum of the excited triplet. For B < 1 T the triplet has M = 0, and for B > 1 T it has M = −3. Since the ground state is a singlet with M = −2, the M = 0 triplet does not fulfill the ∆M = ±1 condition for linear SO coupling. This inhibits direct spin mixing between initial and final states and reduces the relaxation rate by about one order of magnitude [28].

FIG. 9: (Color online). Spin relaxation rate in a QD with N = 2, 4 interacting electrons as a function of the energy spacing between the singlet and the triplet. Here M stands for the angular momentum of the triplet. Top panel: α = 0, β = 25 meV·Å. Bottom panel: α = 50 meV·Å, β = 25 meV·Å. The relaxation of N = 4 is comparable to that of N = 2 when the triplet has M = −3, and it is much slower when M = 0. [Legend: N = 4, M = 0 and N = 4, M = −3. Horizontal axis: ∆ (meV).]

Notably, the choice of states differing by more than one quantum of angular momentum is only possible for N > 2 QDs. One may then wonder whether it is more convenient to use these systems instead of the N = 2 ones dominating the experimental literature to date [20,21,29], i.e. whether this advantage compensates for the increased density of electronic states. Interestingly, Fig. 8 predicts slower relaxation for the N = 4 QD with an M = 0 triplet than for N = 2. To verify that this arises from weakened SO coupling rather than from different phonon energy values, in Fig. 9 we replot the spin relaxation rate of N = 2, 4 as a function of the TS energy splitting.
In the figure, the top and bottom panels represent the situations without and with Rashba interaction, respectively. While N = 4 shows a relaxation rate similar to that of N = 2 when the triplet has M = −3, the relaxation is slower by about one order of magnitude when the triplet has M = 0. This result indicates that the weakening of SO mixing due to the violation of the ∆M = ±1 condition clearly exceeds the strengthening due to the higher density of states, confirming that N = 4 systems are more attractive than N = 2 ones for obtaining long triplet lifetimes. We also point out that, in spite of the different density of states, the relaxation rates of the N = 2 and N = 4, M = −3 triplets are quite similar. This can be ascribed to the reduction of phonon scattering by electronic correlations [33], which may also explain the fact that the experimentally resolved TS relaxation rates of N = 8 QDs and N = 2 QDs are quite similar [20,31].

V. COMPARISON WITH N = 2 EXPERIMENTS

Whereas, to our knowledge, no experiments have yet measured transitions between Zeeman-split sublevels in N > 1 systems, a number of works have dealt with TS relaxation in QDs with few interacting electrons. In Ref. 28 we showed that our model correctly predicts the trends observed in experiments with N = 2 and N = 8 QDs subject to axial magnetic fields [20,21,31]. In this section, we extend the comparison to new experiments available for N = 2 TS relaxation in QDs [29], which for the first time provide continuous measurements of the average triplet lifetime against axial magnetic fields, from B = 0 to the vicinity of the TS anticrossing. By using a simple model, the authors of the experimental work showed that the measurements are in clear agreement with the behavior expected from SO coupling plus acoustic phonon scattering.
However, in that model: (i) the TS energy splitting was taken directly from the experimental data, (ii) the SO coupling effect was accounted for by parametrizing the admixture of the lowest singlet and triplet states only, and (iii) the B-dependence of the SO-induced admixture was neglected. Approximation (ii) may overlook the correlation-induced reduction of phonon scattering,30,33 which we have shown above to be significant, and which may have an important contribution from higher excited states in weakly-confined QDs. In turn, approximation (iii) may overlook the important influence of SO coupling on the B-dependence of the triplet lifetime, as we had anticipated in Ref. 28. Here we compare with the experimental findings using our model, which includes these effects properly. We assume a QD with an effective well width Lz = 30 nm, as expected by the authors of Ref. 29, and a lateral confinement parabola of ħω0 = 2 meV which, as we shall see next, fits the position of the TS anticrossing well. Yet, the comparison is limited by the lack of detailed information about the Rashba and Dresselhaus interaction constants, and because we deal with circular QDs instead of elliptical ones (the latter effect introduces simple deviations from the circular case26). In addition, in the experiment a tilted magnetic field of magnitude B∗, forming an angle of 68° with the vertical direction, was used. Here we consider the vertical component of the field (B = 0.37B∗), which is mainly responsible for the changes in the energy structure; the in-plane component enters via the Zeeman splitting only. Figure 10 illustrates the average triplet lifetime for N = 2. 
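The factor 0.37 quoted above is the projection of the tilted field onto the vertical axis. As a short worked check (the symbol $B_{\parallel}$ for the in-plane component is introduced here for illustration only and is not the paper's notation):

```latex
% Assumption: \theta = 68^\circ is measured between the tilted field B^* and the vertical axis.
\begin{align}
  B             &= B^{*}\cos 68^{\circ} \approx 0.37\,B^{*}, \\
  B_{\parallel} &= B^{*}\sin 68^{\circ} \approx 0.93\,B^{*}.
\end{align}
```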
The bottom axis shows the vertical magnetic field B, while the top axis shows the value to be compared with the experiment, B∗.59 As can be seen, the triplet lifetime first decreases with the field and then abruptly increases in the vicinity of the TS anticrossing, due to the small phonon density.28 This behavior is in clear agreement with the experiment (cf. Fig. 3 of Ref. 29). The position of the anticrossing (B∗ ∼ 2.9 T) is also close to the experimental value (B∗ ∼ 2.8 T), which confirms that ħω0 = 2 meV is similar to the mean confinement frequency of the experimental sample. A departure from the experimental trend appears at weak fields (B < 0.5 T), where we observe a continuous increase of T1 with decreasing B, while the experiment reports a plateau. This is most likely due to the ellipticity of the experimental sample, which renders the electron states (and consequently the relaxation rate) insensitive to the field in the B∗ = 0 − 0.5 T region (see Fig. 1a in Ref. 29). In any case, Fig. 10 clearly confirms the role of phonon-induced relaxation in the experiments, using a realistic model for the description of correlated electron states, SO admixture and phonon scattering. A comment is in order here on the magnitude of the SO coupling terms. In Fig. 10, we obtain good agreement with the experimental relaxation times by using small values of the SO coupling parameters. In particular, a close fit is obtained using β = 1 meV·Å and α = 0.5 meV·Å, which yields a spin-orbit length λSO = 48 µm. This value, which agrees with the experimental estimate (λSO ≈ 50 µm), indicates that SO coupling is several times weaker than that reported for other GaAs QDs.8 Typical GaAs parameters are often larger. 
For instance, measurements of the Rashba and Dresselhaus constants by analysis of the weak antilocalization in clean GaAs/AlGaAs two-dimensional gases revealed α = 4−5 meV·Å, and γc = 28 eV·Å³ (i.e., β = 3 meV·Å for our quantum well of Lz = 30 nm).61 Indeed, the small SO coupling parameters in the experiment have a major influence on the lifetime scale; compare, e.g., the β = 1 and β = 5 meV·Å curves in Fig. 10. We note that an accurate comparison with the timescale reported for other GaAs samples31 is also possible within our model, but only by assuming stronger SO coupling constants.28 In Ref. 29, it was suspected that the weak SO coupling inferred from the experimental data could be the result of the exclusion of higher orbitals and the magnetic field dependence of SO admixture in their model (higher states reduce the effective SO coupling constants by decreasing the phonon-induced scattering30,33). Here we have considered both these effects and still small SO coupling constants are needed to reproduce the experiment. Therefore, understanding the origin of their small value remains an open question. One possibility could be that the particular direction of the tilted magnetic field used in the experiment corresponded to a reduced degree of SO admixture.30

[Figure 10: plot with magnetic field axes in T; curve labels α = 0, α = 0.5, β = 1, β = 2, β = 5.]

FIG. 10: Average triplet lifetime in a QD with N = 2 electrons as a function of an axial magnetic field. Only the field region before the TS anticrossing is shown. α and β are in meV·Å units. B is the applied axial magnetic field, and B∗ is the equivalent tilted magnetic field, for comparison with the experiment of Ref. 29.

VI. CONCLUSIONS

We have investigated theoretically the energy structure and spin relaxation rate of weakly-confined QDs with N = 1 − 5 interacting electrons, subject to axial magnetic fields, in the presence of linear Rashba and Dresselhaus SO interactions. 
It has been shown that the number of electrons confined in the dot introduces changes in the energy spectrum which significantly influence the intensity of the SO admixture, and hence the spin relaxation. In general, the larger the number of confined carriers, the higher the density of electronic states. This decreases the energy splitting between consecutive levels and thus enhances SO admixture, which should lead to faster spin relaxation. However, we find that this is not necessarily the case, and slower relaxation rates may be found for few-electron QDs as compared to the usual single- and two-electron QDs used to date. The physical mechanisms responsible for this have been identified. For N-odd systems, when the spin transition takes place between Zeeman-split sublevels, it is the presence of low-energy S = 3/2 states for N > 1 that reduces the projection of the doublet Sz = 1/2 sublevels onto Sz = −1/2 ones, thus partly inhibiting the spin transition. For N-even systems, when the spin transition takes place between triplet and singlet levels, there are two underlying mechanisms. On the one hand, electronic correlations tend to reduce phonon emission efficiency. On the other hand, for N > 2 a magnetic field can be used to select a pair of singlet-triplet states which do not fulfill the ∆M = ±1 condition of direct SO admixture, which significantly weakens the SO mixing. Last, we have compared our estimates with recent experimental data for TS relaxation in N = 2 QDs.29 Our results support the interpretation of the experiment in terms of SO admixture plus acoustic phonon scattering, even though quantitative agreement with the experiment requires assuming much weaker SO coupling than that reported for similar GaAs structures.

Acknowledgments

We acknowledge support from the Italian Ministry for University and Scientific Research under FIRB RBIN04EY74, Cineca Calcolo parallelo 2006, and Marie Curie IEF project NANO-CORR MEIF-CT-2006-023797. 
∗ Electronic address: climente@unimore.it; URL: www.nanoscience.unimore.it
1 I. Zutic, J. Fabian, and S. Das Sarma, Rev. Mod. Phys. 76, 323 (2004).
2 D. Heiss, M. Kroutvar, J.J. Finley, and G. Abstreiter, Solid State Comm. 135, 519 (2005).
3 D. Loss, and D.P. DiVincenzo, Phys. Rev. A 57, 120 (1998).
4 V.N. Golovach, A. Khaetskii, and D. Loss, Phys. Rev. Lett. 93, 016601 (2004).
5 R. Hanson, B. Witkamp, L.M.K. Vandersypen, L.H. Willems van Beveren, J.M. Elzerman, and L.P. Kouwenhoven, Phys. Rev. Lett. 91, 196802 (2003).
6 M. Kroutvar, Y. Ducommun, D. Heiss, M. Bichler, D. Schuh, G. Abstreiter, and J.J. Finley, Nature (London) 432, 81 (2004).
7 J.M. Elzerman, R. Hanson, L.H. Willems van Beveren, B. Witkamp, L.M.K. Vandersypen, and L.P. Kouwenhoven, Nature (London) 430, 431 (2004).
8 S. Amasha, K. MacLean, I. Radu, D.M. Zumbühl, M.A. Kastner, M.P. Hanson, and A.C. Gossard, cond-mat/0607110.
9 A.V. Khaetskii, and Y.V. Nazarov, Phys. Rev. B 64, 125316 (2001).
10 J.L. Cheng, M.W. Wu, and C. Lü, Phys. Rev. B 69, 115318 (2004).
11 D.V. Bulaev, and D. Loss, Phys. Rev. B 71, 205324 (2005).
12 C.F. Destefani, and S.E. Ulloa, Phys. Rev. B 72, 115326 (2005).
13 P. Stano, and J. Fabian, Phys. Rev. B 74, 045320 (2006).
14 Y.Y. Wang, and M.W. Wu, Phys. Rev. B 74, 165312 (2006).
15 E. Ya. Sherman, and D.J. Lockwood, Phys. Rev. B 72, 125340 (2005).
16 L.M. Woods, T.L. Reinecke, and Y. Lyanda-Geller, Phys. Rev. B 66, 161318(R) (2002).
17 I.A. Merkulov, Al. L. Efros, and M. Rosen, Phys. Rev. B 65, 205309 (2002).
18 S.I. Erlingsson, and Y.V. Nazarov, Phys. Rev. B 66, 155327 (2002).
19 P. San-Jose, G. Zarand, A. Shnirman, and G. Schön, Phys. Rev. Lett. 97, 076803 (2006).
20 T. Fujisawa, D.G. Austing, Y. Tokura, Y. Hirayama, and S. Tarucha, Nature (London) 419, 278 (2002); T. Fujisawa, D.G. Austing, Y. Tokura, Y. Hirayama, and S. Tarucha, J. Phys.: Cond. Matter 15, R1395 (2003).
21 R. Hanson, L.H. Willems van Beveren, I.T. Vink, J.M. Elzerman, W.J.M. Naber, F.H.L. Koppens, L.P. 
Kouwenhoven, and L.M.K. Vandersypen, Phys. Rev. Lett. 94, 196802 (2005).
22 J.R. Petta, A.C. Johnson, J.M. Taylor, E.A. Laird, A. Yacoby, M.D. Lukin, C.M. Marcus, M.P. Hanson, and A.C. Gossard, Science 309, 2180 (2005).
23 A.C. Johnson, J.R. Petta, J.M. Taylor, A. Yacoby, M.D. Lukin, C.M. Marcus, M.P. Hanson, and A.C. Gossard, Nature (London) 435, 925 (2005).
24 J.R. Petta, A.C. Johnson, A. Yacoby, C.M. Marcus, M.P. Hanson, and A.C. Gossard, Phys. Rev. B 72, 161301(R) (2005).
25 W.A. Coish, and D. Loss, Phys. Rev. B 72, 125337 (2005).
26 M. Florescu, S. Dickman, M. Ciorga, A. Sachrajda, and P. Hawrylak, Physica E (Amsterdam) 22, 414 (2004); M. Florescu, and P. Hawrylak, Phys. Rev. B 73, 045304 (2006).
27 D. Chaney and P.A. Maksym, Phys. Rev. B 75, 035323 (2007).
28 J.I. Climente, A. Bertoni, G. Goldoni, M. Rontani, and E. Molinari, Phys. Rev. B 75, 081303(R) (2007).
29 T. Meunier, I.T. Vink, L.H. Willems van Beveren, K.J. Tielrooij, R. Hanson, F.H.L. Koppens, H.P. Tranitz, W. Wegscheider, L.P. Kouwenhoven, and L.M.K. Vandersypen, Phys. Rev. Lett. 98, 126601 (2007).
30 V.N. Golovach, A. Khaetskii, and D. Loss, cond-mat/0703427 (unpublished).
31 S. Sasaki, T. Fujisawa, T. Hayashi, and Y. Hirayama, Phys. Rev. Lett. 95, 056803 (2005).
32 M. Ciorga, A.S. Sachrajda, P. Hawrylak, C. Gould, P. Zawadzki, S. Jullian, Y. Feng, and Z. Wasilewski, Phys. Rev. B 61, R16315 (2000); H. Drexler, D. Leonard, W. Hansen, J.P. Kotthaus, P.M. Petroff, Phys. Rev. Lett. 73, 2252 (1994).
33 A. Bertoni, M. Rontani, G. Goldoni, and E. Molinari, Phys. Rev. Lett. 95, 066806 (2005); J.I. Climente, A. Bertoni, M. Rontani, G. Goldoni, and E. Molinari, Phys. Rev. B 74, 125303 (2006).
34 T. Chakraborty, and P. Pietiläinen, Phys. Rev. B 71, 113305 (2005).
35 P. Pietiläinen, and T. Chakraborty, Phys. Rev. B 73, 155315 (2006).
36 C.F. Destefani, S.E. 
Ulloa, and G.E. Marques, Phys. Rev. B 70, 205315 (2004).
37 During the finalization of this paper we have learned about a parallel work investigating the influence of Coulomb interaction in two-electron TS relaxation.30 Many of the findings in such paper are in agreement with our numerical results.
38 Y.A. Bychkov, and E.I. Rashba, J. Phys. C 17, 6039 (1984).
39 G. Dresselhaus, Phys. Rev. 100, 580 (1955).
40 M. Rontani, C. Cavazzoni, D. Bellucci, and G. Goldoni, J. Chem. Phys. 124, 124102 (2006).
41 http://www.s3.infm.it/donrodrigo
42 L. Jacak, P. Hawrylak, and A. Wojs, Quantum Dots, (Springer Verlag, Berlin, 1998).
43 M. Brasken, S. Corni, M. Lindberg, J. Olsen, and D. Sundholm, Mol. Phys. 100, 911 (2002).
44 P. Lucignano, B. Jouault, and A. Tagliacozzo, Phys. Rev. B 69, 045314 (2004).
45 The (M, S, Sz) quantum numbers of few-electron states are a good approximation for the lowest-lying states only. For higher-lying states, the energy spectrum becomes denser and the SO interaction becomes very strong even for GaAs, which leads to important departures from the SO-free picture. This does not occur in single-electron parabolic QDs because the energy levels are equally spaced.
46 The convenience of using exact diagonalization procedures, instead of perturbational approaches, to account for the SO coupling in GaAs QDs has been claimed in Ref. 10.
47 C.S. Ting (ed.), Physics of Hot Electron Transport in Semiconductors, (World Scientific, 1992).
48 Landolt-Börnstein: Numerical Data and Functional Relationships in Science and Technology, Vol. 17. Semiconductors, Group IV Elements and III-V Compounds, edited by O. Madelung, (Springer-Verlag, 1982).
49 M. Cardona, N.E. Christensen, and G. Fasol, Phys. Rev. B 38, 1806 (1988).
50 U. Bockelmann, Phys. Rev. B 50, 17271 (1994).
51 J.I. Climente, A. Bertoni, G. Goldoni, and E. Molinari, Phys. Rev. B 74, 035313 (2006).
52 P. Stano, and J. Fabian, Phys. Rev. B 72, 155410 (2005).
53 O. Voskoboynikov, C.P. Lee, and O. 
Tretyak, Phys. Rev. B 63, 165306 (2001).
54 W.H. Kuan, and C.S. Tang, J. Appl. Phys. 95, 6368 (2004).
55 The energy magneto-spectrum of GaAs parabolic QDs with SO interaction and up to four interacting electrons was also investigated in Ref. 35, but considering Rashba interaction only.
56 Coulomb-enhanced SO interaction was previously predicted for higher-dimensional structures.57 Here we report it for QDs.
57 G.H. Chen, and M.E. Raikh, Phys. Rev. B 60, 4826 (1999).
58 C.F. Destefani, S.E. Ulloa, and G.E. Marques, Phys. Rev. B 69, 125302 (2004).
59 For simplicity of the discussion, in Figs. 3, 8 and 10, the near vicinity of B = 0 T is not shown. In that range one finds damped phonon-induced relaxation rates due to degeneracies arising from the time-reversal symmetry and the circular symmetry of the confinement we have assumed. We do not expect these features to be observable in experiments, because QDs are not perfectly circular and because hyperfine interaction is expected to be the dominant spin relaxation mechanism for very weak fields (see Refs. 18,23).
60 Greatly suppressed TS spin relaxation, comparable to that of inter-Zeeman sublevels at very weak B, may be achieved by means of geometrically or field-induced acoustic phonon emission minima.27,28
61 J.B. Miller, D.M. Zumbühl, C.M. Marcus, Y.B. Lyanda-Geller, D. Goldhaber-Gordon, K. Campman, and A.C. Gossard, Phys. Rev. Lett. 90, 076807 (2003).

ABSTRACT

We estimate the spin relaxation rate due to spin-orbit coupling and acoustic phonon scattering in weakly-confined quantum dots with up to five interacting electrons. The Full Configuration Interaction approach is used to account for the inter-electron repulsion, and Rashba and Dresselhaus spin-orbit couplings are exactly diagonalized. We show that electron-electron interaction strongly affects spin-orbit admixture in the sample. Consequently, relaxation rates strongly depend on the number of carriers confined in the dot. 
We identify the mechanisms which may lead to improved spin stability in few-electron (N > 2) quantum dots as compared to the usual one- and two-electron devices. Finally, we discuss recent experiments on triplet-singlet transitions in GaAs dots subject to external magnetic fields. Our simulations are in good agreement with the experimental findings, and support the interpretation of the observed spin relaxation as being due to spin-orbit coupling assisted by acoustic phonon emission. <|endoftext|><|startoftext|> Introduction

The Asymmetric Simple Exclusion Process (ASEP) is a lattice model of particles with hard core interactions. Due to its simplicity, the ASEP appears as a minimal model in many different contexts such as one-dimensional transport phenomena, molecular motors and traffic models. From a theoretical point of view, this model has become a paradigm in the field of non-equilibrium statistical mechanics; many exact results have been derived using various methods, such as continuous limits, Bethe Ansatz and matrix Ansatz (for reviews, see, e.g., Spohn 1991, Derrida 1998, Schütz 2001, Golinelli and Mallick 2006). In a recent work (Golinelli and Mallick 2007), we applied the algebraic Bethe Ansatz technique to the Totally Asymmetric Exclusion Process (TASEP). http://arxiv.org/abs/0704.0869v1 Golinelli, Mallick — Connected Operators for TASEP 2 This method allowed us to construct a hierarchy of ‘generalized Hamiltonians’ that contain the Markov matrix and commute with each other. Using the algebraic relations satisfied by the local jump operators, we derived explicit formulae for the transfer matrix and the generalized Hamiltonians, generated from the transfer matrix. We showed that the transfer matrix can be interpreted as the generator of a discrete time Markov process and we described the actions of the generalized Hamiltonians. These actions are non-local because they involve non-connected bonds of the lattice. 
However, connected operators are generated by taking the logarithm of the transfer matrix. We conjectured for the connected operators a combinatorial formula that was verified for the first ten connected operators by using a symbolic calculation program. The aim of the present work is to present an analytical calculation of the connected operators and to prove the formula that was proposed in (Golinelli and Mallick 2007). This paper is a sequel to our previous work; however, in Section 2, we briefly review the main definitions and results already obtained so that this work can be read in a fairly self-contained manner. In Section 3, we derive the general expression of the connected operators.

2 Review of known results

We first recall the dynamical rules that define the TASEP with n particles on a periodic 1-d ring with L sites labelled i = 1, . . . , L. The particles move according to the following dynamics: during the time interval [t, t + dt], a particle on a site i jumps with probability dt to the neighboring site i + 1, if this site is empty. This exclusion rule, which forbids more than one particle per site, mimics a hard-core interaction between particles. Because the particles can jump only in one direction, this process is called totally asymmetric. The total number n of particles is conserved. The TASEP being a continuous-time Markov process, its dynamics is entirely encoded in a $2^L \times 2^L$ Markov matrix M, which describes the evolution of the probability distribution of the system at time t. The Markov matrix can be written as

$M = \sum_{i=1}^{L} M_i$ , (1)

where the local jump operator $M_i$ affects only the sites i and i + 1 and represents the contribution to the dynamics of jumps from the site i to i + 1.

2.1 The TASEP algebra

The local jump operators satisfy a set of algebraic equations:

$M_i^2 = -M_i$, (2)

$M_i M_{i+1} M_i = M_{i+1} M_i M_{i+1} = 0$, (3)

$[M_i, M_j] = 0$ if $|i - j| > 1$. 
(4)

These relations can be obtained as a limiting form of the Temperley-Lieb algebra. On the ring we have periodic boundary conditions: $M_{i+L} = M_i$. The local jump matrices define an algebra. Any product of the $M_i$'s will be called a word. The length of a given word is the minimal number of operators $M_i$ required to write it. A word that cannot be simplified further using the algebraic rules above will be called a reduced word. Consider any word W and call I(W) the set of indices i of the operators $M_i$ that compose it (indices are enumerated without repetitions). We remark that, if W is not annihilated by application of rule (3), the simplification rules (2) and (4) do not alter the set I(W), i.e., these rules do not introduce any new index or suppress any existing index in I(W). This crucial property is not valid for the algebra associated with the partially asymmetric exclusion process (see Golinelli and Mallick 2006). Using the relation (2) we observe that for any i and any real number $\lambda \neq 1$ we have

$(1 + \lambda M_i)^{-1} = (1 + \alpha M_i)$ with $\alpha = \frac{\lambda}{\lambda - 1}$. (5)

2.2 Simple words

A simple word of length k is defined as a word $M_{\sigma(1)} M_{\sigma(2)} \cdots M_{\sigma(k)}$, where σ is a permutation on the set {1, 2, . . . , k}. The commutation rule (4) implies that only the relative position of $M_i$ with respect to $M_{i\pm 1}$ matters. A simple word of length k can therefore be written as $W_k(s_2, s_3, \ldots, s_k)$ where the boolean variable $s_j$ for 2 ≤ j ≤ k is defined as follows: $s_j = 0$ if $M_j$ is on the left of $M_{j-1}$ and $s_j = 1$ if $M_j$ is on the right of $M_{j-1}$. Equivalently, $W_k(s_2, s_3, \ldots, s_k)$ is uniquely defined by the recursion relation

$W_k(s_2, s_3, \ldots, s_{k-1}, 1) = W_{k-1}(s_2, s_3, \ldots, s_{k-1}) \, M_k$ , (6)

$W_k(s_2, s_3, \ldots, s_{k-1}, 0) = M_k \, W_{k-1}(s_2, s_3, \ldots, s_{k-1})$ . (7)

The set of the $2^{k-1}$ simple words of length k will be called $\mathcal{W}_k$. 
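The algebraic relations (2)-(4) and the inverse formula (5) can be checked numerically on a small ring. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, sites and bonds are numbered from 0 instead of 1, the matrix of $M_i$ is built with the usual gain/loss convention for a Markov generator acting on column vectors, and the value $\alpha = \lambda/(\lambda - 1)$ is the one that follows from relation (2).

```python
from fractions import Fraction

def jump_operator(i, L):
    """Matrix of the local jump operator for bond i on a ring of L sites.

    Sites and bonds are numbered 0..L-1 here (the text uses 1..L).
    Column c is the starting configuration; bit s of c is the occupation of site s.
    """
    dim = 2 ** L
    j = (i + 1) % L                      # periodic boundary: last bond links site L-1 to site 0
    M = [[0] * dim for _ in range(dim)]
    for c in range(dim):
        if (c >> i) & 1 and not (c >> j) & 1:
            target = c ^ (1 << i) ^ (1 << j)   # particle hops from site i to site j
            M[target][c] += 1                  # gain term
            M[c][c] -= 1                       # loss term
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def lincomb(a, A, b, B):
    """Return the matrix a*A + b*B."""
    n = len(A)
    return [[a * A[r][c] + b * B[r][c] for c in range(n)] for r in range(n)]

def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]

def check_algebra(L):
    """True if relations (2), (3) and (4) hold for all bonds of a ring of L >= 3 sites."""
    dim = 2 ** L
    Z = [[0] * dim for _ in range(dim)]
    M = [jump_operator(i, L) for i in range(L)]
    for i in range(L):
        j = (i + 1) % L
        if matmul(M[i], M[i]) != lincomb(-1, M[i], 0, Z):      # relation (2)
            return False
        if matmul(matmul(M[i], M[j]), M[i]) != Z:              # relation (3)
            return False
        if matmul(matmul(M[j], M[i]), M[j]) != Z:
            return False
        for k in range(L):
            ring_distance = min((i - k) % L, (k - i) % L)
            if ring_distance > 1 and matmul(M[i], M[k]) != matmul(M[k], M[i]):
                return False                                   # relation (4)
    return True

def check_inverse(L, lam):
    """True if (1 + lam*M_i)(1 + alpha*M_i) = 1 with alpha = lam/(lam - 1), for all i."""
    I = identity(2 ** L)
    alpha = lam / (lam - 1)
    for i in range(L):
        Mi = jump_operator(i, L)
        product = matmul(lincomb(1, I, lam, Mi), lincomb(1, I, alpha, Mi))
        if product != I:
            return False
    return True

if __name__ == "__main__":
    print(check_algebra(4), check_inverse(4, Fraction(1, 3)))
```

Exact rational arithmetic (`Fraction`) is used so that the inverse check is an equality test rather than a floating-point comparison.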
For a simple word $W_k$, we define $u(W_k)$ to be the number of inversions in $W_k$, i.e., the number of times that $M_j$ is on the left of $M_{j-1}$:

$u(W_k(s_2, s_3, \ldots, s_k)) = \sum_{j=2}^{k} (1 - s_j)$ . (8)

We remark that simple words are connected: they cannot be factorized into two (or more) commuting words.

2.3 Ring-ordered product

Because of the periodic boundary conditions, products of local jump operators must be ordered adequately. In the following we shall need to use a ring-ordered product $\mathcal{O}(\cdot)$ which acts on words of the type

$W = M_{i_1} M_{i_2} \cdots M_{i_k}$ with $1 \le i_1 < i_2 < \cdots < i_k \le L$ , (9)

by changing the positions of matrices that appear in W according to the following rules:

(i) If $i_1 > 1$ or $i_k < L$, we define $\mathcal{O}(W) = W$. The word W is well-ordered.

(ii) If $i_1 = 1$ and $i_k = L$, we first write W as a product of two blocks, W = AB, such that $B = M_b M_{b+1} \cdots M_L$ is the maximal block of matrices with consecutive indices that contains $M_L$, and $A = M_1 M_{i_2} \cdots M_{i_a}$, with $i_a < b - 1$, contains the remaining terms. We then define

$\mathcal{O}(W) = \mathcal{O}(AB) = BA = M_b M_{b+1} \cdots M_L M_1 M_{i_2} \cdots M_{i_a}$ . (10)

(iii) The previous definition makes sense only for k < L. Indeed, when k = L, we have $W = M_1 M_2 \cdots M_L$ and it is not possible to split W into two different blocks A and B. For this special case, we define

$\mathcal{O}(M_1 M_2 \cdots M_L) = |1, 1, \ldots, 1\rangle\langle 1, 1, \ldots, 1|$ , (11)

which is the projector on the ‘full’ configuration with all sites occupied.

The ring-ordering $\mathcal{O}(\cdot)$ is extended by linearity to the vector space spanned by words of the type described above.

2.4 Transfer matrix and generalized Hamiltonians $H_k$

The algebraic Bethe Ansatz allows one to construct a one-parameter commuting family of transfer matrices, t(λ), that contains the translation operator T = t(1) and the Markov matrix M = t′(0). 
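The simple-word recursion (6)-(7), the inversion count (8) and the ring-ordering rules (i)-(ii) of Section 2.3 are purely combinatorial, so they can be sketched by manipulating lists of indices. In the minimal sketch below, a word is represented by the list of its indices read from left to right; the function names are hypothetical, and the special case k = L of rule (iii) is deliberately left out because it is not a reordering.

```python
from itertools import product

def simple_word(s):
    """Indices, left to right, of the simple word W_k(s_2, ..., s_k).

    s is the tuple (s_2, ..., s_k); the empty tuple gives the word M_1.
    """
    word = [1]
    for k, s_k in enumerate(s, start=2):
        if s_k == 1:
            word = word + [k]      # recursion (6): M_k is appended on the right
        else:
            word = [k] + word      # recursion (7): M_k is placed on the left
    return word

def inversions(s):
    """Number of inversions u(W_k), Eq. (8): how many M_j sit on the left of M_{j-1}."""
    return sum(1 - s_j for s_j in s)

def ring_order(indices, L):
    """Ring-ordered form of a word with strictly increasing indices and k < L."""
    idx = list(indices)
    if not idx or len(idx) >= L:
        raise ValueError("this sketch handles only 1 <= k < L")
    if idx[0] > 1 or idx[-1] < L:
        return idx                 # rule (i): the word is already well-ordered
    # rule (ii): locate the maximal block of consecutive indices ending at L
    b = len(idx) - 1
    while b > 0 and idx[b - 1] == idx[b] - 1:
        b -= 1
    return idx[b:] + idx[:b]       # B A

if __name__ == "__main__":
    for s in product((0, 1), repeat=2):
        print(s, simple_word(s), inversions(s))
    print(ring_order([1, 2, 4, 5], 5))
```

Representing words by index lists ignores the commutation rule (4), so two different lists may denote the same operator; the lists produced here are just one representative of each word.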
For 0 ≤ λ ≤ 1, the operator t(λ) can be interpreted as a discrete time process with non-local jumps: a hole located on the right of a cluster of p particles can jump a distance k in the backward direction, with probability $\lambda^k (1 - \lambda)$ for 1 ≤ k < p, and with probability $\lambda^p$ for k = p. The probability that this hole does not jump at all is 1 − λ. This model is equivalent to the 3-D anisotropic percolation model of Rajesh and Dhar (1998) and to a 2-D five-vertex model. It is also an adaptation on a periodic lattice of the ASEP with a backward-ordered sequential update (Rajewsky et al. 1996, Brankov et al. 2004), and equivalently of an asymmetric fragmentation process (Rákos and Schütz 2005). The operator t(λ) is a polynomial in λ of degree L given by

$t(\lambda) = 1 + \sum_{k=1}^{L} \lambda^k H_k$ , (12)

where the generalized Hamiltonians $H_k$ are non-local operators that act on the configuration space. [We emphasize that the notation used here is different from that of our previous work: t(λ) was denoted by $t_g(\lambda)$ in (Golinelli and Mallick 2007).] We have $H_1 = M$ and more generally, as shown in (Golinelli and Mallick 2007), $H_k$ is a homogeneous sum of words of length k 1≤i1